Face-detection and face-recognition algorithms have progressed enormously over the past few years. A significant contributor to that surge is the coupling of algorithms modeled on mammalian brain processing functions — so-called neural networks — with vastly increased computing power that makes possible lightning-quick comparisons of a viewed image with a dataset of millions of existing images. Despite that developmental leap, the absence of apples-to-apples evaluations has long left law enforcement and the security industry in doubt as to exactly which algorithms work best for different face-analysis functions (e.g., face detection vs. face recognition) and in different face-comparison environments (e.g., small mugshots vs. fuzzy facial images, or off-center shots from airport security cameras).
More precise knowledge of how alternative face algorithms work can improve product performance and better inform users and prospective buyers, while bringing more fact-based evidence to the public discussion on appropriate limits of face recognition technology use.
An Apples-to-Apples Comparison of Face Algorithms’ Effectiveness
Noting the potential of layered, deep neural networks, the National Institute of Justice (NIJ) set out to facilitate a precise, apples-to-apples comparison of deep neural network algorithms designed for face detection or recognition with other, older products, including PittPatt, the conventional algorithm long widely used by law enforcement for detection and recognition functions. NIJ first funded work by Carnegie Mellon University (CMU) to develop a new generation of deep neural networks especially adept at face recognition, then supported comparative testing of the performance of the CMU algorithms and other deep neural networks against that of the industry workhorse, PittPatt. The evaluations were performed by the National Criminal Justice Technology Research, Test, and Evaluation Center (RT&E Center) at the Johns Hopkins University Applied Physics Laboratory. (The RT&E Center is also supported by NIJ.)
The evaluation yielded significant new insights, among them that (1) deep neural network algorithms were generally shown to be superior in key respects, including in comparison to PittPatt; (2) the suite of new deep neural network algorithms developed by CMU, named Ultron, showed particular promise for recognizing faces where the view is less than optimal because it is off-angle, partially obscured, or of low quality (taken in low light, at low resolution, etc.); and (3) testing of deep neural network algorithms tailored to “periocular reconstruction” — a method of identifying individuals by comparing images restricted to the facial area around and including the eyes — was also highly encouraging.
Results of Comparison: Neural Networks Generally Perform Better
The RT&E Center team designed evaluations affording one-to-one comparisons of the object algorithms and benchmark algorithms by using identical image datasets and metrics to test and compare algorithm performance. The research focused on three elements of face analysis:
- Face detection — determining whether an observed image is in fact a human face.
- Face recognition — comparing an observed face to a dataset of face images.
- Periocular face reconstruction — identifying an individual by referencing images capturing only the area around and including the eyes.
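The recognition step in particular reduces to a nearest-neighbor search: the observed face is converted to a numeric feature vector (an “embedding”) and compared against the vectors of enrolled faces. The following is a minimal, illustrative sketch of that comparison only; the four-dimensional embeddings, identity names, and similarity threshold are all made up for illustration, and a real system would use much higher-dimensional embeddings produced by a trained network.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(probe, gallery, threshold=0.9):
    """Return the enrolled identity most similar to the probe, or None if no
    similarity clears the threshold (an 'open-set' rejection)."""
    name, score = max(((n, cosine_similarity(probe, v)) for n, v in gallery.items()),
                      key=lambda item: item[1])
    return name if score >= threshold else None

# Hypothetical 4-D embeddings; a trained network would produce these from images.
gallery = {
    "subject_A": [0.9, 0.1, 0.3, 0.4],
    "subject_B": [0.1, 0.8, 0.5, 0.2],
}
probe = [0.85, 0.15, 0.35, 0.38]  # a slightly different image of subject_A
print(best_match(probe, gallery))  # → subject_A
```

The threshold is what separates recognition from mere ranking: below it, the system reports no match rather than forcing the closest enrolled identity.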
In the face detection realm, the team compared four algorithms: CMU’s new Ultron; two other deep neural network algorithms, TinyFaces and YOLO; and a traditional algorithm extensively used by law enforcement, PittPatt. Ultron, TinyFaces, and YOLO are all open source, meaning their source code is freely available. For each of the four face detection algorithms, the same three datasets were used to test performance.
The detection algorithm evaluation results indicated that two deep learning neural networks, Ultron and TinyFaces, were superior to PittPatt and YOLO. The evaluators concluded that Ultron’s performance was comparable to TinyFaces’s, PittPatt’s performance was inferior to Ultron’s and TinyFaces’s, and YOLO had the lowest performance.
For the face recognition phase, the CMU deep neural network algorithm, part of its Ultron suite, was measured against the benchmark algorithms OpenFace and PittPatt. Ultron was specifically designed to excel at recognizing difficult face images — faces viewed off-angle, largely obscured, in low light, or otherwise of low quality. CMU’s Ultron algorithms outperformed OpenFace and PittPatt. Though positive, the assessment of the Ultron algorithms left an important question unanswered: “It does not provide any indication of the performance relative to other state-of-the-art face recognition algorithms circa 2018, such as the Intelligence Advanced Research Projects Activity’s Janus,” said Christopher Rigano, an NIJ computer scientist who helped manage the RT&E Center’s research.
For the face reconstruction evaluation phase, the CMU algorithm, called Dimensionally Weighted K-SVD (DWK-SVD), was measured against the benchmark algorithm Principal Component Analysis, using face recognizers Kernel Class-Dependence Feature Analysis and PittPatt, with four datasets. The CMU algorithm was shown to generate better matching results than the preexisting technologies. The RT&E Center researchers recommended that, before the DWK-SVD periocular algorithm could be turned into a viable product, “it should be retrained on a larger dataset of representative images of the end-use case.”
Neural Network Potential Unlocked by Vast Increases in Computing Power
In effect, deep neural networks applied to face detection and recognition train themselves to be hugely proficient at rifling through massive volumes of images in existing datasets, in seconds, and at spotting stored images matching (or nearly matching) the image being viewed. The emergence of deep neural networks as game changers was made possible by a recent quantum leap in computing power, facilitating fast, efficient processing of mega-datasets. That enhanced computing power has allowed scientists to model many more layers of virtual neurons, yielding neural networks that are ultrafast and deeply perceptive, a capability known as “deep learning.” As noted in a recent MIT Technology Review article on deep learning neural networks, “With this greater depth, they are producing remarkable advances in speech and image recognition.”
Despite the historical opacity of proprietary algorithms in the area of face detection and recognition, work by another federal agency revealed that, collectively, face recognition algorithm performance has improved dramatically in a short time. A study by the National Institute of Standards and Technology (NIST) applied its own test to assess industry performance over time and determined that 25 developers’ recognition algorithms in 2018 were superior to the most accurate algorithm tested in 2014, and that just 0.2% of searches by all of the algorithms tested failed in 2018, as compared with failure rates of 4% in 2014 and 5% in 2010. Since 2000, NIST stated, overall face recognition accuracy has improved “massively.” (See accompanying article on NIJ’s history of support for face detection and face recognition technology research.)
To better understand the benefits of artificial intelligence in terms of processing speed, NIJ is partnering with NIST and the FBI to assess the speed of facial identification by humans as compared with identification by algorithms. The project has assessed the face recognition performance of human examiners alone, algorithms alone, and human examiners aided by algorithms. Preliminary results suggest that when human examiners limited their recognition time to 30 seconds, algorithms performed comparably to the examiners. Examiners aided by algorithms delivered the best results.
For optimal results, dataset selection for training algorithms should reflect their intended use. For example, an algorithm designed for face recognition in a closed circuit TV environment will be trained and tested, optimally, on a dataset of closed circuit TV images.
A rigorous, NIJ-supported comparison of face-analysis algorithms, including CMU’s Ultron deep neural network suite designed for difficult face recognition challenges, concluded that the neural networks generally outperformed the conventional face-analysis algorithm, PittPatt, and that CMU’s periocular reconstruction algorithm produced better matching results than preexisting products, warranting further research.
Deep neural networks — brain-mimicking algorithms of unprecedented speed and analytical acumen — appear likely to transform face detection and recognition. However, more work is needed to understand, measure, and compare competing algorithms. That will require more and better access to the “black box” of proprietary commercial algorithms, as well as a commitment to performing exacting, apples-to-apples comparisons of algorithms’ strengths, weaknesses, and comparative merits. Knowing what is driving recognition results will promote a better picture of the potential strengths of these tools, and the best ways for law enforcement to utilize them.
About this Article
The research described in this article was funded by NIJ award 2013-MU-CX-K111, awarded to the National Criminal Justice Technology Research, Test, and Evaluation Center at the Johns Hopkins University Applied Physics Laboratory. This article is based on the grantee final report “NIJ Face Algorithm Assessment – Phases I, II, and III, Version 1.0” (March 2019).
[note 1] Manuel Günther et al., “Unconstrained Face Detection and Open-Set Face Recognition Challenge,” errata version of paper presented at the International Joint Conference on Biometrics, 2017, arXiv: 1708.02337v3. TinyFaces is described at the top of the third page.
[note 2] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” 2016, arXiv: 1506.02640v5.
[note 3] The datasets used to test the face-detection algorithms are listed on page iii in the research report: The National Criminal Justice Technology Research, Test, and Evaluation Center, “NIJ Face Algorithm Assessment – Phases I, II, and III, Version 1.0,” report to the National Institute of Justice, grant number 2013-MU-CX-K111, March 2019, NCJ 252825.
[note 4] The Johns Hopkins researchers added a caveat to the face-detection assessments: The face-image algorithm research community recognizes that it is not uncommon for evaluation mismatches to occur between detection algorithms and testing datasets — for example, where an algorithm tends to generate face images showing a full head of hair and both ears, but the testing dataset provides only closely cropped images. A mismatch influences the accuracy of the face-detection function. In this case, the researchers concluded, such a mismatch initially caused an inaccurate comparison showing the older tested algorithm to perform better than Carnegie Mellon’s new deep neural network algorithm. A widely used “normalization” process was applied to correct for the mismatch, with the final result showing that two deep neural networks, including the Carnegie Mellon product, outperformed the conventional detection algorithm, PittPatt.
[note 5] The National Criminal Justice Technology Research, Test, and Evaluation Center, “NIJ Face Algorithm Assessment,” iii.
[note 6] Principal Component Analysis (PCA) is a statistical method used to reveal the structure of data in a way that best captures data variance. PCA allows projection of an object from an optimal viewpoint, in terms of variances. It is a method commonly used in pattern recognition. Marios Savvides, “Kernel Correlation Feature Analysis: A New Advanced Correlation Filter Approach for Recognizing Uncontrolled Face Image Data in FRGC - Phase II,” presentation deck, Carnegie Mellon University Electrical and Computer Engineering, 11. Kernel Class-Dependence Feature Analysis (KCFA) helps reduce “noise” in image recognition data by capturing the consistent aspects of training images while de-emphasizing inconsistent parts. KCFA is well suited to image recognition, as it represents an optimal tradeoff between discrimination ability and noise tolerance. Chunyan Xie and B.V.K. Vijaya Kumar, “Comparison of Kernel Class-Dependence Feature Analysis (KCFA) with Kernel Discriminant Analysis (KDA) for Face Recognition,” in IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS), Institute of Electrical and Electronics Engineers, 2007, doi:10.1109/BTAS.2007.4401947.
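To make the variance-capturing idea in the note above concrete, here is a short, hedged sketch of textbook PCA using NumPy: synthetic two-dimensional points (standing in for image features) are centered, the covariance matrix is eigendecomposed, and each sample is projected onto the direction of greatest variance. The data and dimensions are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data whose variance lies mostly along the (1, 1) direction.
base = rng.normal(size=(200, 1))
data = np.hstack([base, base]) + rng.normal(scale=0.1, size=(200, 2))

# PCA: center the data, then eigendecompose its covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # re-sort so the largest variance comes first
components = eigvecs[:, order]

top = components[:, 0]                   # direction of greatest variance
projection = centered @ top              # 1-D representation of each sample
print(top)                               # roughly ±(0.707, 0.707) for this data
```

The projection keeps the most informative single axis of the data, which is the sense in which PCA offers an “optimal viewpoint.”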
[note 7] The National Criminal Justice Technology Research, Test, and Evaluation Center, “NIJ Face Algorithm Assessment,” ix.
[note 8] Robert D. Hof, “Deep Learning,” MIT Technology Review, April 23, 2013.
[note 9] Patrick Grother, Mei Ngan, and Kayee Hanaoka, “Ongoing Face Recognition Vendor Test (FRVT) Part 2: Identification,” Washington, DC: U.S. Department of Commerce, National Institute of Standards and Technology, November 2018, NISTIR 8238; National Institute of Standards and Technology, “NIST Evaluation Shows Advance in Face Recognition Software’s Capabilities,” Washington, DC: U.S. Department of Commerce, National Institute of Standards and Technology, November 2018; and Katie Kaye, “This Little-Known Facial-Recognition Accuracy Test Has Big Influence,” International Association of Privacy Professionals, January 7, 2019.
[note 10] Grother, Ngan, and Hanaoka, “Ongoing Face Recognition,” 3.
[note 11] Christopher Rigano, “Using Artificial Intelligence to Address Criminal Justice Needs,” NIJ Journal 280, October 2018.