Face-detection and face-recognition algorithms have progressed enormously over the past few years. A significant contributor to that surge is the coupling of algorithms modeled on mammalian brain processing functions — so-called neural networks — with vastly increased computing power that makes possible lightning-quick comparisons of a viewed image with a dataset of millions of existing images. Despite that developmental leap, the absence of apples-to-apples evaluations has long left law enforcement and the security industry in doubt as to exactly which algorithms work best for different face-analysis functions (e.g., face detection vs. face recognition) and in different face-comparison environments (e.g., small mugshots vs. fuzzy facial images, or off-center shots from airport security cameras).
More precise knowledge of how alternative face algorithms work can improve product performance and better inform users and prospective buyers, while bringing more fact-based evidence to the public discussion on appropriate limits of face recognition technology use.
An Apples-to-Apples Comparison of Face Algorithms’ Effectiveness
Noting the potential of layered, deep neural networks, the National Institute of Justice (NIJ) set out to facilitate a precise, apples-to-apples comparison of deep neural network algorithms designed for face detection or recognition with other, older products, including PittPatt, the conventional algorithm long widely used by law enforcement for detection and recognition functions. NIJ first funded work by Carnegie Mellon University (CMU) to develop a new generation of deep neural networks especially adept at face recognition, then supported comparative testing of the performance of the CMU algorithms and other deep neural networks against that of the industry workhorse, PittPatt. The evaluations were performed by the National Criminal Justice Technology Research, Test, and Evaluation Center (RT&E Center) at the Johns Hopkins University Applied Physics Laboratory. (The RT&E Center is also supported by NIJ.)
The evaluation yielded significant new insights, among them that (1) deep neural network algorithms were generally shown to be superior in key respects, including in comparison to PittPatt; (2) the suite of new deep neural network algorithms developed by CMU, named Ultron, showed particular promise for recognizing faces where the view is less than optimal because it is off-angle, partly obscured, or of low quality (taken in low light, with a low pixel count, etc.); and (3) testing of deep neural network algorithms tailored to “periocular reconstruction” — a method of identifying individuals from images restricted to the facial area around and including the eyes — was also highly encouraging.
Results of Comparison: Neural Networks Generally Perform Better
The RT&E Center team designed evaluations affording one-to-one comparisons of the object algorithms and benchmark algorithms by using identical image datasets and metrics to test and compare algorithm performance. The research focused on three elements of face analysis:
- Face detection — determining whether an observed image contains a human face.
- Face recognition — comparing an observed face to a dataset of face images.
- Periocular face reconstruction — identifying an individual by referencing images capturing only the area around and including the eyes.
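A one-to-one comparison of the kind described above typically scores every detection algorithm with identical metrics on identical datasets, for example precision and recall under an intersection-over-union (IoU) match criterion. The sketch below is illustrative only — the report's actual metrics and match rules are not specified here — and assumes axis-aligned bounding boxes and a simple greedy match:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(detections, ground_truth, iou_thresh=0.5):
    """Score one detector's output against ground truth via greedy
    one-to-one matching; each truth box may be claimed only once."""
    matched, tp = set(), 0
    for det in detections:
        best, best_iou = None, iou_thresh
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            score = iou(det, gt)
            if score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(detections) if detections else 1.0
    recall = tp / len(ground_truth) if ground_truth else 1.0
    return precision, recall
```

Because the same `precision_recall` function and the same image sets score every algorithm, the resulting numbers are directly comparable across detectors.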
In the face detection realm, the team compared four algorithms: CMU’s new Ultron; two other deep neural network algorithms, TinyFaces and YOLO; and a traditional algorithm extensively used by law enforcement, PittPatt. Ultron, TinyFaces,[1] and YOLO[2] are all open-source, meaning their source code is freely available. For each of the four face detection algorithms, the same three datasets were used to test performance.[3]
The detection evaluation results indicated that two deep neural networks, Ultron and TinyFaces, were superior to PittPatt and YOLO: Ultron performed comparably to TinyFaces, both outperformed PittPatt, and YOLO performed worst.[4]
For the face recognition phase, the CMU deep neural network algorithm, part of its Ultron suite, was measured against the benchmark algorithms OpenFace and PittPatt. Ultron was specifically designed to excel at recognizing faces when the view is suboptimal: off-angle, largely obscured, poorly lit, or otherwise of low image quality.[5] CMU’s Ultron algorithms outperformed OpenFace and PittPatt. Though positive, the assessment of the Ultron algorithms left an important question unanswered: “It does not provide any indication of the performance relative to other state-of-the-art face recognition algorithms circa 2018, such as the Intelligence Advanced Research Projects Activity’s Janus,” said Christopher Rigano, an NIJ computer scientist who helped manage the RT&E Center’s research.
For the face reconstruction evaluation phase, the CMU algorithm, called Dimensionally Weighted K-SVD (DWK-SVD), was measured against the benchmark algorithm Principal Component Analysis, using face recognizers Kernel Class-Dependence Feature Analysis[6] and PittPatt, with four datasets. The CMU algorithm was shown to generate better matching results than the preexisting technologies. The RT&E Center researchers recommended that, before the DWK-SVD periocular algorithm could be turned into a viable product, “it should be retrained on a larger dataset of representative images of the end-use case.”[7]
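As a reference point, the Principal Component Analysis benchmark mentioned above works by projecting an image patch onto a learned low-dimensional subspace and mapping it back to pixel space. The following is a minimal numpy sketch of that idea; the patch dimensions, component count, and random stand-in data are all illustrative, not taken from the source report:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: 200 periocular patches, flattened to 256-dim vectors
train = rng.normal(size=(200, 256))

# Fit PCA: center the data and keep the leading principal components
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:32]  # 32 leading components (rows are orthonormal)

def reconstruct(patch):
    """Project a patch onto the PCA subspace and map it back to pixel space."""
    coeffs = components @ (patch - mean)
    return mean + components.T @ coeffs
```

The quality of `reconstruct` output, fed to a downstream face recognizer, is one way such a benchmark can be scored against a newer reconstruction algorithm on the same datasets.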
Neural Network Potential Unlocked by Vast Increases in Computing Power
In effect, deep neural networks applied to face detection and recognition train themselves to become hugely proficient at rifling through massive volumes of images in existing datasets, in seconds, and at spotting stored images that match (or nearly match) the image being viewed. The emergence of deep neural networks as game changers was made possible by a recent quantum leap in computing power, which enables fast, efficient processing of mega-datasets. That enhanced computing power has allowed scientists to model many more layers of virtual neurons, yielding neural networks that are ultrafast, deeply perceptive, and capable of what is known as “deep learning.” As noted in a recent MIT Technology Review article on deep learning neural networks, “With this greater depth, they are producing remarkable advances in speech and image recognition.”[8]
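Concretely, recognition systems of this kind typically reduce each face to a compact embedding vector and match a probe image against an enrolled gallery by vector similarity. A minimal sketch of that matching step, using random stand-in embeddings (a real system would produce them with a trained network; the names, sizes, and threshold here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical gallery: each enrolled identity maps to a 128-dim embedding
gallery = {f"person_{i}": rng.normal(size=128) for i in range(1000)}

def identify(probe, gallery, threshold=0.5):
    """Return the best-matching identity and score, or (None, threshold)
    if no gallery embedding is similar enough to the probe."""
    best_id, best_score = None, threshold
    for identity, emb in gallery.items():
        score = cosine_similarity(probe, emb)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id, best_score
```

Production systems replace this linear scan with indexed nearest-neighbor search, which is what makes sub-second lookups over millions of images feasible.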
Despite the historical opacity of proprietary algorithms in the area of face detection and recognition, work by another federal agency revealed that collectively, face recognition algorithm performance has improved dramatically in a short time. A study by the National Institute of Standards and Technology (NIST) applied its own test to assess industry performance over time and determined that 25 developers’ recognition algorithms in 2018 were superior to the most accurate algorithm tested in 2014, and that just 0.2% of searches by all of the algorithms tested failed in 2018, as compared with failure rates of 4% in 2014 and 5% in 2010.[9] Since 2000, NIST stated, overall face recognition accuracy has improved “massively.”[10] (See accompanying article on NIJ’s history of support for face detection and face recognition technology research.)
To better understand the processing-speed benefits of artificial intelligence, NIJ is partnering with NIST and the FBI to compare facial identification by humans with identification by algorithms. The project assessed the face recognition performance of human examiners alone, algorithms alone, and human examiners aided by algorithms. Preliminary results suggest that when human examiners limited their recognition time to 30 seconds, algorithms performed comparably to the examiners.[11] Examiners aided by algorithms delivered the best results.
For optimal results, dataset selection for training algorithms should reflect their intended use. For example, an algorithm designed for face recognition in a closed circuit TV environment will be trained and tested, optimally, on a dataset of closed circuit TV images.
Conclusion
A rigorous, NIJ-supported comparison of face-analysis algorithms, including Ultron, CMU’s deep neural network algorithm designed to take on difficult face recognition challenges, concluded that the neural networks generally outperformed the conventional face-analysis algorithm, PittPatt, and that CMU’s periocular reconstruction algorithm produced better matching results than preexisting products, warranting further research.
Deep neural networks — brain-mimicking algorithms of unprecedented speed and analytical acumen — appear likely to transform face detection and recognition. However, more work is needed to understand, measure, and compare competing algorithms. That will require more and better access to the “black box” of proprietary commercial algorithms, as well as a commitment to performing exacting, apples-to-apples comparisons of algorithms’ strengths, weaknesses, and comparative merits. Knowing what is driving recognition results will promote a better picture of the potential strengths of these tools, and the best ways for law enforcement to utilize them.
About this Article
The research described in this article was funded by NIJ award 2013-MU-CX-K111, awarded to the National Criminal Justice Technology Research, Test, and Evaluation Center at the Johns Hopkins University Applied Physics Laboratory. This article is based on the grantee final report “NIJ Face Algorithm Assessment – Phases I, II, and III, Version 1.0” (March 2019).