Original article was published on Deep Learning on Medium
AI pseudoscience and scientific racism
Recent attempts to predict criminality from facial features recall a long tradition of unethical and racist pseudoscience
A recent paper about to be published by Harrisburg University caused quite a stir earlier this month. Titled “A Deep Neural Network Model to Predict Criminality Using Image Processing,” the paper promised:
With 80 percent accuracy and with no racial bias, the software can predict if someone is a criminal based solely on a picture of their face. The software is intended to help law enforcement prevent crime.
This claim is absolutely ludicrous, as will be discussed below.
Wide-ranging criticism from various researchers caused the paper to be pulled. Another paper from 2016 made similar promises — it too was widely criticized, causing the authors to prepend a response to their original paper on arXiv.
And yet, there was another paper published just months ago in January 2020 which has largely escaped similar notice and outrage. Hashemi and Hall’s “Criminal tendency detection from facial images and the gender bias effect” published by the Journal of Big Data claims the following:
[T]his paper explores the deep learning’s capability in distinguishing between criminal and non-criminal facial images . . . CNN [convolutional neural network] achieves a tenfold cross-validation accuracy of 97%.
Predicting criminality from a person’s photograph with 97% accuracy is a bold claim. As we will see, due to problems in both its conception and execution, this claim is utterly unfounded.
This will be shown by looking at the pseudoscientific framework of the study as well as practical problems with the experiment, including an attempt to reproduce similar results by training a deep learning model. It turns out that we can get similarly excellent results that have nothing to do with “criminality” but with fundamental experimental errors.
What is a crime?
Crime is a social construct. This means that what we call a “crime” or a “criminal act” is actually an evolving definition in society depending on the place and time we are referring to. It has nothing to do with biology and has no higher, unchanging definition outside of how society defines it.
Gambling and prostitution are criminal acts in California but are legal industries in Nevada. The sale and consumption of marijuana are legal in California, which was not the case until recently, and they remain federal crimes.
But these are victimless crimes, one might argue. What about murder?
If you plant a bomb in your neighbor’s house and it explodes and kills them, you could be found guilty of arson and murder. But if the president declares war on Iraq and bombs an entire neighborhood in Baghdad, that might not be considered a criminal act (under US law) although some of us would find it morally objectionable.
It used to be legal for a slaveowner to kill their slave, and entire economies and cultures saw this imbalance of power as a fundamental right of their society. Now in the US it is a crime to enslave somebody in the first place. A person who is kidnapped, held captive for a period of time and brutalized might be entirely justified, both morally and legally, in killing their captor in order to win back their freedom. A person who otherwise would never take another human life might be willing to do so in such circumstances.
In other words, there is nothing inherently biological about crime. It has nothing to do with your genes, much less your face.
No machine learning algorithm, not even a really good convolutional neural network, can predict crime based on a photo or other biological information about a person because “crime” has less to do with you than with how a society judges and responds to your actions.
Does this mean that if somebody kills another human being we just throw our hands up and say, “Well, it’s just a social construct!” No, of course not. But we have to put it into context and look at what sort of social harm was caused. In some cases, killing is inhumane and unconscionable, while in others it may be an act of self-defense. Sometimes the only difference is not the action itself but the society in which it occurs.
The point here is that “criminality” is an incredibly vague and broad concept and the acts that we might deem criminal (or not criminal) change over time for better and for worse.
How do Hashemi and Hall address these ambiguities? They do not address them at all. In fact, they have not even built a data set which consistently labels faces correctly as criminals or non-criminals, even using the most basic and widely accepted definition used in the US.
Innocent until proven useful to data science
Hashemi and Hall use two sets of faces to train their models. First, their “criminal” faces are gathered from a set of mugshots made available by the National Institute of Standards and Technology (NIST). Second, their “non-criminal” faces are gathered from a number of generic non-mugshot sources managed by facial recognition researchers at various universities.
The problem is, the mugshots do not necessarily depict criminals. These are people who have been arrested, not necessarily convicted of crimes. This is a database of people who the police think are guilty of crimes, or in some cases, people that the police just felt like arresting.
On a side note, this also raises ethical issues as to whether the people in the mugshots have consented to have their images used for such purposes. Police regularly take mugshots and make them public in order to shame the people they have arrested and many journalists, to their great discredit, reprint them without permission. Again, mugshots are taken well before a person is convicted of a crime, if they are ever convicted. It is not at all clear that any of them have either been convicted of a crime or have consented to having their image widely distributed.
On the other hand, Hashemi and Hall’s generic image databases of supposed non-criminals do not necessarily exclude people who have been convicted of a crime. We have no idea one way or the other. In fact, nothing prevents a person from the mugshot group from also appearing in one of the generic databases, other than the statistical unlikelihood of such an overlap.
A Justice Department study estimated that one-third of working age adults in the US have an arrest record. It is statistically very likely that many of the “non-criminals” have been charged and some convicted of crimes at one time in their life, though their photos used in the experiment would show them in much happier times than when their mugshot was taken.
This is a stunning oversight. One does not need to be a sociologist to appreciate this problem with the data. One merely needs to have the most naive definition of what a crime is.
In spite of this, Hashemi and Hall claim to have trained a model that can accurately predict whether an image comes from the mugshot group or the non-mugshot group with 97% accuracy. How did they do this?
One critique of their paper has noted a number of differences between the two datasets that a neural network would pick up on and that have nothing to do with criminality or arrest:
All images for the Criminal category come from the NIST dataset, and all images for the Non-Criminal category come from a set of five datasets from other sources.
All of the images labeled Criminal are photographs of printed images and are taken in a controlled manner with the same camera model, and all of the images labeled Non-Criminal are photographs of live persons taken by various cameras.
All of the images labeled Criminal were in (lossless) PNG format, and all of the images labeled Non-Criminal were in (lossy) JPG format.
All of the images labeled Criminal started out as grayscale; all of the images labeled Non-Criminal were converted from color to grayscale by the investigators.
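The four differences listed above share one property: each of them separates the two classes completely. A minimal sketch of how such an audit could be run before any training is shown here. The records, the field names, and the extra "cropped" attribute are hypothetical and only illustrate the check; they are not the paper's actual metadata.

```python
from collections import defaultdict

# Hypothetical metadata records, one per image. Field names and the
# "cropped" attribute are illustrative assumptions, not the paper's schema.
records = [
    {"label": "criminal", "source": "NIST", "format": "png", "native_gray": True, "cropped": True},
    {"label": "criminal", "source": "NIST", "format": "png", "native_gray": True, "cropped": False},
    {"label": "non-criminal", "source": "dataset_a", "format": "jpg", "native_gray": False, "cropped": True},
    {"label": "non-criminal", "source": "dataset_b", "format": "jpg", "native_gray": False, "cropped": False},
]

def perfectly_predicts_label(records, attribute):
    """Return True if each value of `attribute` co-occurs with exactly one label."""
    labels_by_value = defaultdict(set)
    for record in records:
        labels_by_value[record[attribute]].add(record["label"])
    return all(len(labels) == 1 for labels in labels_by_value.values())

# Attributes that reveal the label on their own are confounds.
confounds = [
    attribute
    for attribute in ("source", "format", "native_gray", "cropped")
    if perfectly_predicts_label(records, attribute)
]
print(confounds)
```

Any attribute that ends up in `confounds` is something a classifier can use instead of the face itself, so a high accuracy on such data says nothing about what the model has actually learned.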
Hashemi and Hall note, “Such disparities which are not related to facial structure, though negligible in majority of cases, might have slightly contributed in training the classifier and helping the classifier to distinguish between the two categories.”
But even here, they are only referring specifically to the first two points (the different sources) and not the last two.
These last two points were almost certainly detected by the neural network and are very likely to have been the primary cause of the 97% accuracy. We can show this in practice.
Reproducing the results
It would be quite easy to build a neural network under conditions similar to Hashemi and Hall’s and train a model that simply discriminates between JPGs and PNGs.
In an afternoon, I did just that and the resulting Jupyter notebook can be found here.
Starting with a sample project from the book Deep Learning with Python by François Chollet, I was able to build a neural network to distinguish between “criminal” and “non-criminal” images.
Instead of using mugshots, I used the MNIST database of images of handwritten digits retrieved via the Keras Python library.
First, I randomly labelled each image as being a criminal (0) or non-criminal (1), then I trained a model to predict this arbitrarily defined criminality. Note that every image was given a criminality designation regardless of the content of the image, that is, regardless of what number was displayed in the image. So a handwritten “7” in one image was labelled as “criminal” and a “7” in another case was labelled as “non-criminal.” This was completely arbitrary and random and without any knowledge of whether the handwriting was from people with criminal convictions.
The model succeeded in correctly determining which of the handwritten digits I had designated as “criminal” versus “non-criminal” only about 50% of the time, no better than a random guess or coin toss. This is what we would expect because these designations were arbitrary.
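The chance-level result can be illustrated with a much smaller stand-in than the notebook. The sketch below is not the original experiment: it assumes synthetic random "images" instead of MNIST and a NumPy logistic regression instead of a Keras network, and all sizes and hyperparameters are arbitrary choices. The only thing it shares with the setup above is that labels are assigned by coin flip.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_pixels = 2000, 1000, 64

# Synthetic stand-ins for flattened images (assumption: not MNIST itself).
x = rng.random((n_train + n_test, n_pixels))
# Labels assigned by coin flip, independent of image content.
y = rng.integers(0, 2, size=n_train + n_test)

x_tr, y_tr = x[:n_train], y[:n_train]
x_te, y_te = x[n_train:], y[n_train:]

# Logistic regression trained with plain batch gradient descent.
w = np.zeros(n_pixels)
b = 0.0
learning_rate = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(x_tr @ w + b)))  # predicted P(label == 1)
    grad = p - y_tr                            # gradient of the log loss
    w -= learning_rate * (x_tr.T @ grad) / n_train
    b -= learning_rate * grad.mean()

# Accuracy on held-out images the model never saw during training.
test_acc = ((x_te @ w + b > 0).astype(int) == y_te).mean()
print(f"held-out accuracy: {test_acc:.3f}")
```

Because the labels carry no information about the pixels, the held-out accuracy should hover around one half, whatever the model fits on the training split.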
But after I transformed all of my “criminal” images into PNG and “non-criminal” images to JPG, the resulting model that I trained succeeded over 99% of the time!
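Why a storage format is so easy to detect can be sketched without writing real PNG or JPG files. The code below rests on several assumptions: it imitates only the core lossy step of JPEG (quantizing 8x8 block DCT coefficients) with a single hypothetical step size `Q_STEP`, it uses synthetic noise images, and it replaces the neural network with a hand-built fingerprint plus a nearest-centroid rule. It is a stand-in for the idea, not a reproduction of the notebook.

```python
import numpy as np

BLOCK = 8     # JPEG-style block size
Q_STEP = 16   # hypothetical uniform step; real JPEG uses per-frequency tables

def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II matrix D, so that coeffs = D @ block @ D.T."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

D = dct_matrix()

def block_coeffs(img):
    """DCT coefficients of every 8x8 block of an (H, W) image."""
    h, w = img.shape
    blocks = img.astype(float).reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK)
    return D @ blocks.swapaxes(1, 2) @ D.T

def lossy_round_trip(img):
    """Quantize the block DCT coefficients, invert, and round back to uint8."""
    coeffs = np.round(block_coeffs(img) / Q_STEP) * Q_STEP
    blocks = D.T @ coeffs @ D
    out = blocks.swapaxes(1, 2).reshape(img.shape)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

def fingerprint(img):
    """Mean distance of block DCT coefficients from a multiple of Q_STEP."""
    c = block_coeffs(img)
    return np.abs(c - np.round(c / Q_STEP) * Q_STEP).mean()

rng = np.random.default_rng(0)
n, n_train = 400, 200
images = rng.integers(64, 192, size=(n, 16, 16)).astype(np.uint8)  # pure noise
labels = rng.integers(0, 2, size=n)  # arbitrary labels, unrelated to content

# Confound: label 1 goes through the lossy path, label 0 is stored untouched.
stored = np.array([lossy_round_trip(img) if y == 1 else img
                   for img, y in zip(images, labels)])
features = np.array([fingerprint(img) for img in stored])

# Nearest-centroid classifier on the single fingerprint feature.
f_tr, y_tr = features[:n_train], labels[:n_train]
f_te, y_te = features[n_train:], labels[n_train:]
centroids = np.array([f_tr[y_tr == k].mean() for k in (0, 1)])
pred = np.abs(f_te[:, None] - centroids[None, :]).argmin(axis=1)
accuracy = (pred == y_te).mean()
print(f"held-out accuracy: {accuracy:.3f}")
```

The images contain nothing but noise and the labels are random, yet the classifier separates the two groups almost perfectly, because the lossy path leaves coefficients clustered near multiples of the quantization step. A neural network does not need to be told about this fingerprint; it only needs some statistic of this kind to be available in the pixels.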
Is this a weird and arbitrary experiment? Yes, but no more so than attempting to detect criminality from a person’s face. In fact, a version of my experiment could be performed to attempt to detect criminality from handwriting. Taking handwriting samples from people in prison and other samples from people with no criminal record, we could reproduce the results above with the same accuracy — so long as the “criminal” samples were PNGs and the “non-criminal” samples were JPGs. Instead of handwriting samples, of course, we could also use faces.
In other words, what Hashemi and Hall built was probably nothing more than a sophisticated PNG vs JPG image format detector. They happened to train theirs on images of faces while I happened to train mine on images of handwritten digits. They achieved 97% accuracy with a convolutional neural network and 89% accuracy with a standard feedforward neural network — the latter being similar to how I trained my model with 99% accuracy.
The question then is not why they succeeded so well, but rather why their model fared so poorly.
Putting all of these problems aside, there remain inherent racial issues with their experiment and its very conception.
For example, FBI statistics show that Black people are four times more likely to be arrested than white people in the San Francisco Bay Area. This is true in many other large cities and metropolitan areas as well.
However, Hashemi and Hall note that “no control has been imposed on race, due to our small dataset and the difficulty and occasionally subjectivity of identifying the race from low-quality facial images.” A model trained as described could easily associate darker-skinned people with criminality due to their disproportionately high presence in the mugshots dataset and nothing else.
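How an arrest disparity turns into over-representation in a mugshot dataset is simple arithmetic. The sketch below applies the "four times more likely" figure cited above to a few hypothetical population shares, which are chosen only for illustration and are not taken from any dataset. It also treats everyone outside the group as a single comparison group, which is a simplifying assumption.

```python
def share_among_arrested(population_share, relative_arrest_rate):
    """Fraction of arrests involving a group, given its share of the
    population and how many times more often its members are arrested
    than everyone else."""
    weighted = population_share * relative_arrest_rate
    return weighted / (weighted + (1.0 - population_share))

RELATIVE_ARREST_RATE = 4.0  # the "four times more likely" figure cited above

# Hypothetical population shares, chosen only for illustration.
for share in (0.05, 0.10, 0.25):
    arrested = share_among_arrested(share, RELATIVE_ARREST_RATE)
    print(f"{share:.0%} of population -> {arrested:.1%} of arrests")
```

Under these assumptions a group making up a tenth of the population would account for roughly three in ten arrests, so a model trained on mugshots sees that group far more often in the "criminal" class regardless of anyone's behavior.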
This would simply reflect the existing racism in the criminal justice system and further help to increase it rather than teaching us anything about inherent criminality. Such experiments also reinforce the often racist assumptions both police and juries make when believing that somebody “looks guilty.”
The return of scientific racism
This experiment harkens back to the scientific racism of the 19th and early 20th centuries. It was not so much scientific as it was a form of racism justified with pseudoscientific theories claiming to be based in Darwinian evolutionary theory but in fact having no scientific basis whatsoever.
These theories justified a number of racist policies including limits on immigration and forced sterilization of the poor and other “undesirables” who were disproportionately non-white. Much of this work originated in the US and became an inspiration for Hitler and the Third Reich.
After the horrors of World War II, these theories, which had once received a hearing in academia, were completely discredited. Hashemi and Hall do not seem to have gotten the message, instead placing their work in the tradition of Cesare Lombroso.
“This study is triggered by Lombroso’s research,” they uncritically state in the introduction, “which showed that criminals could be identified by their facial structure and emotions.”
In fact, Lombroso’s work failed to show these things and has been completely discredited. Stephen Jay Gould, the late evolutionary biologist, dedicated most of a chapter in his classic work The Mismeasure of Man to debunking Lombroso’s theory, and described it with words based on Lombroso’s own twisted thinking:
Criminals are evolutionary throwbacks in our midst. Germs of an ancestral past lie dormant in our heredity. In some unfortunate individuals, the past comes to life again. These people are innately driven to act as a normal ape or savage would, but such behavior is deemed criminal in our civilized society. Fortunately, we may identify born criminals because they bear anatomical signs of their apishness.
According to a Wired article from 2014:
Lombroso took Darwin’s recently published theory of evolution and added a horrifying twist that would reverberate for decades. You’d be hard-pressed to find an upside to his argument that criminals in fact express the physical qualities of our ancestors, bringing them closer to the dispositions of an ape than a human. Or see what good came from the towering whirlwind of racism that accompanied his hypothesis. Or in profiling people with big earlobes, “as in the ancient Egyptians,” as born criminals.
These signs of criminality included a number of traits he identified in Africans and Asians, and he even treated missing earlobes and hooked noses in some Europeans as signs of criminality.
The very foundation of Hashemi and Hall’s article is rooted in pseudoscience and scientific racism and should be rejected as such.
One wonders what sort of peer review process would allow so many of these problems to make it into an article in the Journal of Big Data.
Much more concerning, though, is the future this sort of research has in academia and the harm it can cause to living, breathing human beings dealing with an already biased criminal justice system every day.