Facial recognition systems are becoming commonplace in modern society. Many major cities have integrated such systems into their CCTV networks, ostensibly to combat crime.
The accuracy of these systems has been questioned. Trials in London resulted in a 96% false positive rate, where the software incorrectly matched a person with a photo in the database. The city of San Francisco has banned the use of facial recognition technology by its city departments over privacy and other ethical concerns.
Mugshot.py simulates such a facial recognition system using dlib's face recognition model, which has a reported accuracy of 99.38% on the Labeled Faces in the Wild benchmark. Its facial encoding dataset was built from publicly accessible mugshot images and the charges associated with each image. The user submits an image containing one or more faces, and Mugshot.py returns the closest match, an accuracy percentage and the charges associated with that match. By releasing it as an open source application, I hope to make the methods and limitations of facial recognition systems apparent.
Mugshot imagery is first collected from JailBase, a service that provides publicly available arrest information from US states where publication of such data is legal. Each time the collection process runs, images are taken from the 10 most recent entries for each US county. Every entry is anonymised, so that neither the user nor the creator can trace it back to the original JailBase entry.
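The collection step can be sketched as follows. This is a minimal sketch under stated assumptions: the raw record fields (`county`, `booking_date`, `image`, `charges`, plus identifiers such as `name` and `id`) are hypothetical stand-ins, not JailBase's actual data format, and `collect_anonymised` is a hypothetical helper rather than Mugshot.py's own code. It keeps the 10 most recent entries per county, retains only the image and the charges, and shuffles the result so its order reveals nothing about where an entry came from.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def collect_anonymised(records: List[Dict], per_county: int = 10,
                       seed: Optional[int] = None) -> List[Dict]:
    """Hypothetical helper: keep the newest entries per county, minus identifiers."""
    # Group the raw records (hypothetical field names) by county.
    by_county = defaultdict(list)
    for rec in records:
        by_county[rec["county"]].append(rec)

    kept = []
    for county_records in by_county.values():
        # ISO 8601 date strings ("2020-01-31") sort chronologically,
        # so reverse order puts the most recent bookings first.
        newest = sorted(county_records, key=lambda r: r["booking_date"], reverse=True)
        for rec in newest[:per_county]:
            # Only the image and charges survive; name, ID, county and booking
            # date are dropped, so the fields identifying the entry are gone.
            kept.append({"image": rec["image"], "charges": list(rec["charges"])})

    # Shuffle so the output order does not reveal county or booking order.
    random.Random(seed).shuffle(kept)
    return kept
```

Dropping identifiers at collection time, rather than later, means the identifying fields never reach the stored dataset at all.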
Each mugshot image is then analysed. If an image is invalid or no face is found in it, it is not inserted into the dataset; likewise, entries without any listed charges are excluded. For each valid entry, the face's location in the image, its unique encoding and the associated charges are inserted into the dataset.
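The filtering rules above amount to a simple loop. In the sketch below, `build_dataset`, `load_image` and `detect_faces` are hypothetical names. In an implementation built on face_recognition, `load_image` could wrap `face_recognition.load_image_file` and `detect_faces` could pair `face_recognition.face_locations` with `face_recognition.face_encodings`, but how Mugshot.py actually wires this up is an assumption. The sketch also assumes each mugshot shows a single person, so only the first detected face is kept.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

# (top, right, bottom, left): the box order face_recognition uses for face locations.
Location = Tuple[int, int, int, int]
Face = Tuple[Location, Sequence[float]]


def build_dataset(
    entries: Iterable[Dict],
    load_image: Callable[[str], Optional[object]],
    detect_faces: Callable[[object], List[Face]],
) -> List[Dict]:
    """Hypothetical helper: turn anonymised entries into dataset records."""
    dataset = []
    for entry in entries:
        # Entries without any listed charges are excluded.
        if not entry.get("charges"):
            continue
        # ASSUMPTION: load_image returns None for an unreadable or invalid image.
        image = load_image(entry["image"])
        if image is None:
            continue
        faces = detect_faces(image)
        # No face found: the entry is not inserted.
        if not faces:
            continue
        # ASSUMPTION: a mugshot shows one person, so keep the first face only.
        location, encoding = faces[0]
        dataset.append({
            "location": location,
            "encoding": list(encoding),
            "charges": list(entry["charges"]),
        })
    return dataset
```

Checking the charges before loading the image skips the comparatively expensive face detection for entries that would be discarded anyway.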
When the user submits an image, Mugshot.py searches it for faces and returns the nearest match for each face found. The user can then see the submitted image with the encodings visualised, along with the accuracy percentage of the nearest match and its associated charges. The accuracy percentage is calculated using this method with a threshold of 0.5; a face distance below this threshold indicates a strong match.
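Matching a submitted face against the dataset is a nearest-neighbour search over encodings. The sketch below assumes the distance is the Euclidean distance between encodings (which is what `face_recognition.face_distance` computes) and converts it to a percentage with a formula shared on the face_recognition project wiki, with the threshold set to the 0.5 mentioned above. Whether this is exactly the method Mugshot.py uses is an assumption, and `nearest_match` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def face_distance_to_conf(face_distance: float, face_match_threshold: float = 0.5) -> float:
    """Map a face distance to a score (formula from the face_recognition wiki)."""
    if face_distance > face_match_threshold:
        # Weaker than the threshold: linear fall-off from 0.5 to 0 at distance 1.0
        # (and below 0 for larger distances).
        span = 1.0 - face_match_threshold
        return (1.0 - face_distance) / (span * 2.0)
    # At or within the threshold: a linear score between 0.5 and 1.0,
    # then boosted towards 1.0 so strong matches score high.
    span = face_match_threshold
    linear_val = 1.0 - (face_distance / (span * 2.0))
    return linear_val + (1.0 - linear_val) * math.pow((linear_val - 0.5) * 2, 0.2)


def nearest_match(encoding: Sequence[float], dataset: List[Dict],
                  threshold: float = 0.5) -> Dict:
    """Hypothetical helper: find the closest dataset entry by Euclidean distance."""
    best = min(dataset, key=lambda rec: math.dist(encoding, rec["encoding"]))
    distance = math.dist(encoding, best["encoding"])
    return {
        "charges": best["charges"],
        "distance": distance,
        "accuracy_percent": 100.0 * face_distance_to_conf(distance, threshold),
        "strong_match": distance <= threshold,
    }
```

For an image with several faces, calling `nearest_match` once per face encoding gives one result per face. Note that the "accuracy percentage" is a rescaled distance, not a probability that the match is correct.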
It would be incorrect to say the dataset is without issues. It only contains data from US states where arrest information can be legally published. Jurisdictions where this is not allowed include California, the District of Columbia, Vermont and a handful of others, so a significant proportion of arrestee data is excluded from the outset. As JailBase does not provide any ethnicity data, it is impossible to analyse the dataset's ethnic makeup. However, if we consider the dataset a representative cross-section of the general arrestee population, the US Bureau of Justice Statistics estimates that of all arrests made in 2014, 69% of arrestees were white, 28% were black, 2% were American Indian/Alaska Native and 1% were Asian/Pacific Islander; 73% were men and 27% were women. Faces from the larger groups would therefore be far more common in the dataset, so the likelihood of a close match differs depending on the ethnicity, skin color and sex of the individual: a white man is more likely to return a match than an Asian woman.
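The claim that some groups are less likely to find a match rests on how often each group appears in the dataset. The calculation below uses the percentages quoted above and assumes, purely for illustration, that ethnicity and sex are independent (the statistics do not say this); the dataset size of 10,000 is also hypothetical.

```python
# Shares of 2014 arrestees, as quoted from the US Bureau of Justice Statistics.
ETHNICITY = {
    "white": 0.69,
    "black": 0.28,
    "american_indian_alaska_native": 0.02,
    "asian_pacific_islander": 0.01,
}
SEX = {"male": 0.73, "female": 0.27}


def group_share(ethnicity: str, sex: str) -> float:
    """Expected share of a group, ASSUMING ethnicity and sex are independent."""
    return ETHNICITY[ethnicity] * SEX[sex]


white_men = group_share("white", "male")                       # about 0.504
asian_women = group_share("asian_pacific_islander", "female")  # about 0.0027
ratio = white_men / asian_women                                 # about 187

# Expected counts in a hypothetical dataset of 10,000 faces.
dataset_size = 10_000
print(round(white_men * dataset_size), round(asian_women * dataset_size))  # 5037 27
```

Under these assumptions, a randomly chosen entry is roughly 187 times more likely to be a white man than an Asian or Pacific Islander woman, leaving far fewer similar faces against which an Asian woman's photo could produce a close match.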
The face recognition model has flaws too. The dlib model was trained on public datasets in which not all countries are equally represented. As a result, its analyses of Asian individuals are less accurate than those of European individuals.
Mugshot.py was created by Haryo Sukmawanto, a multimedia artist based in Ghent, Belgium. The source code is available on GitHub. Particular thanks go to Adam Geitgey for his face_recognition library. The facial encoding data was created using resources from JailBase. Data acquired from JailBase has been anonymised and cannot be traced back to its source.