NIST Study Evaluates Effects of Race, Age, Sex on Face Recognition Software

A recent NIST study examined how accurately facial recognition algorithms match faces in both one-to-one and one-to-many settings, and in particular how the demographics of the people in the images affect the results. One-to-one matching verifies a person's identity by comparing two different images of them; one-to-many matching identifies a person by searching for their image in a large database. The study ran 18.27 million images of 8.49 million people (with demographic information such as race, age, and sex included) through various algorithms, some developed in the United States and others abroad. Its goal was to measure demographic differentials in how accurately algorithms match two images of the same person, and to give policymakers and algorithm developers additional context for improving and/or regulating the technology. Altogether, 189 algorithms from 99 developers were tested against the same database of images.

The results were evaluated in terms of false positives and false negatives, whose real-life implications differ drastically. A false negative (failing to match two images of the same person) can usually be resolved by a second attempt and carries few serious consequences. A false positive (matching images of two different people), however, can prove life-changing: in a one-to-many database search, a false positive can lead to situations like the one detailed in this New York Times article, in which someone is wrongly accused and potentially faces undeserved legal consequences. The study is also the first to consider demographic effects on one-to-many matching, where those consequences are most acute. Its findings, generally, were as follows (a brief sketch after the list illustrates the two matching modes and how these error rates are tallied).

1) Images of Asian and African American people faced higher rates of false positives than those of Caucasians for one-to-one matching.

2) Algorithms developed in the United States showed similarly high false positive rates in one-to-one matching for Asians, African Americans, Native Americans, American Indians, Alaskan Indians, and Pacific Islanders, with American Indians showing the highest rates.

3) For one-to-one matching, the drastic difference in false positives between Asian and Caucasian faces was not present in algorithms developed in Asia, possibly because of the data used to train those algorithms.

4) African American women faced higher rates of false positives than other demographics in one-to-many matching.

5) Different algorithms produced different rates of false positives across demographics; not every algorithm showed these differentials.
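
To make the distinction between the two matching modes and the two error types concrete, here is a minimal sketch (not NIST's evaluation code, and not any particular vendor's algorithm) of how one-to-one verification, one-to-many identification, and per-demographic error rates could be computed, assuming face images have already been reduced to embedding vectors. The function names, similarity threshold, and group labels are all illustrative assumptions.

```python
# Illustrative sketch only: assumes face images are already embedded as vectors.
import numpy as np

def cosine_similarity(a, b):
    """Similarity score between two face embeddings (higher = more alike)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(emb_a, emb_b, threshold=0.6):
    """One-to-one matching: do two images depict the same person?"""
    return cosine_similarity(emb_a, emb_b) >= threshold

def identify(probe_emb, gallery):
    """One-to-many matching: rank everyone in a gallery {person_id: embedding}
    by similarity to the probe image."""
    scored = [(person_id, cosine_similarity(probe_emb, emb))
              for person_id, emb in gallery.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)

def error_rates_by_group(pairs, threshold=0.6):
    """Tally false positive and false negative rates per demographic group.
    `pairs` is a list of (emb_a, emb_b, same_person, group) tuples."""
    counts = {}  # group -> [false_pos, impostor_pairs, false_neg, genuine_pairs]
    for emb_a, emb_b, same_person, group in pairs:
        fp, imp, fn, gen = counts.setdefault(group, [0, 0, 0, 0])
        match = verify(emb_a, emb_b, threshold)
        if same_person:
            gen += 1
            if not match:   # failed to match two images of the same person
                fn += 1
        else:
            imp += 1
            if match:       # matched images of two different people
                fp += 1
        counts[group] = [fp, imp, fn, gen]
    return {g: {"false_positive_rate": fp / imp if imp else 0.0,
                "false_negative_rate": fn / gen if gen else 0.0}
            for g, (fp, imp, fn, gen) in counts.items()}
```

In terms of this sketch, a demographic differential of the kind the study reports is simply a large gap between groups' false positive rates at the same threshold.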

Overall, in the words of the report's author, Patrick Grother, the results serve as “an encouraging sign that more diverse training data may produce more equitable outcomes.” Such equitable outcomes only grow in importance as the dangers of inaccurate facial recognition rise.
