The Counter-intuitiveness of Fairness in Machine Learning

Originally published by Wai On in Artificial Intelligence on Medium

Review of a Statistical Framework for Implementing Fairness

The idea that what happened in the past can serve as a good predictor of the future is the central tenet behind much of the incredible success of Machine Learning (ML). However, this “past-as-prelude” approach to predicting behavior is increasingly under scrutiny due to its perceived failures relating to bias, discrimination, and fairness: for example, the revelation that the Apple credit card algorithm offered women lower credit limits than men, or the accusation that widely used software for assessing recidivism risk discriminated against black defendants. These reports not only grab news headlines; they also rile our hardwired sense of fairness¹.


In criminal justice, there is a sustained debate on whether existing anti-discrimination laws are adequate for the oversight of predictive algorithms, and a burgeoning research community is examining how we can safeguard fairness in predictive algorithms under these laws (e.g. Barocas & Selbst, Huq). As recent unrest and protests over systemic discrimination have shown, the consequences of getting it wrong are severe. As more and more of our lives become automated, there is an urgent need to accelerate our efforts to make this technology acceptable to society as a whole.

In this article, I will review an idea on how to bring fairness to ML. The idea appears to be counter-intuitive at first blush, but as a couple of articles have illustrated, it has a solid statistical and legal grounding.

This article is intended for anyone who has a basic understanding of ML and is interested in how we can work towards implementing fairness in ML.

What is Fair?

Fairness is a social ideal. As one scholar puts it,

Fairness is whatever people say it is, as long as they agree.

In a free society, this ideal is both contentious and ever-evolving. It is not surprising, then, that we have so many definitions of fairness in the ML literature (e.g. Verma & Rubin, Narayanan).

Despite the lack of a definitive agreement on fairness, there are well-established laws in the US that protect the basic rights of individuals regardless of their differences. Most notably, attributes such as race, gender, and religion define a set of “protected groups” whose members must be treated “fairly”, particularly in the areas of employment (Title VII of the Civil Rights Act of 1964) and criminal justice (the Equal Protection Clause of the Fourteenth Amendment).

Current Approaches to Implementing Fairness

As a consequence of these laws and the history of past discrimination, the mainstream approach to bringing fairness to ML, and to predictive algorithms in general, is to eliminate the use of data from protected groups. Yang and Dobbie² compiled a list of the eight commercially available tools most commonly used in the criminal justice system and found that all of them exclude race (a protected attribute) as an input.

Eliminating protected groups as input, however, is not sufficient. As a number of scholars point out, we are still left with correlated variables whose use is tantamount to using the protected variables themselves; ZIP code, for example, can act as a proxy for race. Yet there is no agreement on whether these should also be excluded. In the same list of commercially available tools, Yang and Dobbie found that only three out of eight exclude correlated variables. In other words, a high proportion of the most commonly used predictive algorithms, 5/8 (62.5%), can be argued to be breaching established laws that protect individuals with protected attributes. These are potential lawsuits waiting to happen!

The three formulations can be summarized as follows:

  • Benchmark: include all variables.
  • Common: exclude protected groups.
  • Restrictive: exclude both protected groups and correlated variables.

The quandary for algorithm makers is that eliminating further variables based on their correlation with protected groups will ultimately render the algorithm virtually useless. In Yang and Dobbie’s study, for example, every input variable was correlated with the protected variables, so following the mainstream elimination approach to its logical conclusion would deprive the ML algorithm of any input!

The problem we have is this: how do we bring fairness into ML while at the same time preserving its utility?

A Statistical Framework for Implementing Fairness

Pope and Sydnor³ introduced a simple statistical framework for eliminating the effects of protected variables and their proxies. This framework was used and further examined by Yang and Dobbie² in the context of pretrial predictions.

It works like this: let’s say we are predicting, for example, the likelihood that a defendant will reoffend before their trial. Based on this prediction, we can then decide whether to release the defendant before the trial.

In this framework, we represent defendants’ characteristics by three types of variables:

  • Protected (Xp): variables that represent protected groups. E.g., race, gender, national origin, religion.
  • Correlated (Xc): variables that correlate with a protected variable. E.g., ZIP code or education level, which can be a proxy for race.
  • Uncorrelated (Xu): variables that represent data that is uncorrelated with protected groups and their proxies.

For simplicity, the authors assume a predictive model using Linear Regression (Ordinary Least Squares):

Y = β0 + β1·Xu + β2·Xc + β3·Xp + ε

In other words, the prediction equals the sum of a constant (Beta-0) plus the uncorrelated, correlated, and protected variables with their weight coefficients (Beta-1, Beta-2, Beta-3), plus an error term (ε). The authors also discuss how this model can be extended to more complex non-linear models under the proposed framework.

The method for making a prediction under this framework consists of two steps:

  1. Train a predictive model and obtain coefficient estimates, that is, Beta-0, Beta-1, Beta-2, and Beta-3 from the equation above.
  2. Make predictions using the coefficient estimates from step 1 and the average values of the protected variables.

That’s all there is to it! Not only is this method deceptively simple, it is also counter-intuitive in the sense that we are using protected variables and their proxies as part of the algorithm to ensure fairness!
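To make the two steps concrete, here is a minimal sketch in Python using simulated data and NumPy’s least-squares solver. All variable names, coefficients, and sample values below are illustrative assumptions for the demo, not taken from the papers:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Simulated data (the coefficients 1.0, 0.5, 1.5, 2.0 below are made up).
x_p = rng.binomial(1, 0.5, n).astype(float)  # protected variable (e.g., group membership)
x_c = 0.8 * x_p + rng.normal(0, 1, n)        # correlated variable: a proxy for x_p
x_u = rng.normal(0, 1, n)                    # uncorrelated variable
y = 1.0 + 0.5 * x_u + 1.5 * x_c + 2.0 * x_p + rng.normal(0, 1, n)

# Step 1: train on ALL variables, including the protected one,
# to obtain the coefficient estimates (Beta-0 ... Beta-3).
X = np.column_stack([np.ones(n), x_u, x_c, x_p])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: predict with the protected variable replaced by its population mean.
xp_mean = x_p.mean()

def fair_predict(xu, xc):
    """Prediction that is 'blind' to the individual's protected attribute."""
    return beta[0] + beta[1] * xu + beta[2] * xc + beta[3] * xp_mean

print(fair_predict(0.2, 0.3))
```

Because step 2 feeds the same population mean of the protected variable into every prediction, any two individuals with identical Xu and Xc values receive identical scores, regardless of their protected attributes.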

Why Does This Work?

Let’s start backwards with the second step since this step contains the major change and is actually quite intuitive.

Apart from using the coefficients estimated in the first step, the only difference in this step is that instead of using each individual’s actual protected characteristics, we use their population average.

More precisely, we are using the population-average vector of the protected variables. This means that two individuals who differ only in a protected characteristic will not receive different predictions. For example, if race is a protected variable, then the model effectively does not know the racial profile of the individual; it is said to be “blind” to the effects of the protected variable. This is precisely what we want from a fair ML algorithm.

But what about the correlated term, β2·Xc? Doesn’t this include the proxy effects of the protected variable? The answer is no, because of the first step. Pope and Sydnor explain it like this:

First, let’s look at the estimate for the commonly accepted approach:

Ŷ = γ0 + γ1·Xu + γ2·Xc (i)

That is, the sum of a constant and the uncorrelated and correlated variables with their associated coefficients, which we know contains proxy effects due to the inclusion of the correlated variables.

Now compare this to the estimate for the benchmark equation:

Ŷ = β0 + β1·Xu + β2·Xc + β3·Xp (ii)

Assuming that Beta-3 is non-zero, we see that Gamma-2 cannot be equal to Beta-2.

This is because in (ii) we are estimating with the protected variable and in (i) we are estimating without it. Our intuition tells us that the coefficient Gamma-2 is carrying the estimation power of the proxy; in other words, Gamma-2 absorbs a term that allows Xc to act as a stand-in for Xp.

Let’s unpack this further. Since Xc is correlated with Xp, we can write the auxiliary regression of Xp on Xc:

Xp = α0 + αc·Xc + u (iii)
Using the standard omitted-variable-bias formula from the economics literature, we can substitute (iii) into the benchmark equation:

Y = (β0 + β3·α0) + β1·Xu + (β2 + β3·αc)·Xc + (β3·u + ε)

Comparing this with the commonly accepted approach (i), and ignoring the constant and error terms, we see that Gamma-2 estimates towards Beta-2 plus Beta-3 times Alpha-C, where Beta-2 is the coefficient on Xc free of proxy effects (what the authors call the orthogonal coefficient) and Alpha-C captures the correlation between Xc and Xp. That is,

γ2 → β2 + β3·αc

By including the protected variable Xp in step 1, we are actually making the coefficient for Xc independent of Xp. Yang and Dobbie put it like this:

“Estimating this benchmark model allows us to obtain predictive weights on correlated characteristics that are not contaminated by proxy effects, exactly because we explicitly include X_protected. Thus, this first estimation step ensures that we eliminate all proxy effects from including X_correlated” (p.34)

We can therefore be confident that Beta-2 is not “contaminated” by the correlates. In other words, it does not contain proxy effects from the protected variables.
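This argument can be checked numerically. The following sketch (simulated data again; the coefficients 0.5, 1.0, 2.0 and the 0.7 correlation are arbitrary choices for the demo) confirms that the “common” regression’s coefficient on Xc absorbs the proxy effect, Gamma-2 ≈ Beta-2 + Beta-3 · Alpha-C, while the benchmark regression recovers a clean Beta-2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Illustrative data: the true beta2 (clean effect of x_c) is 1.0, beta3 is 2.0.
x_p = rng.normal(0, 1, n)
x_c = 0.7 * x_p + rng.normal(0, 1, n)  # proxy: correlated with x_p
x_u = rng.normal(0, 1, n)
y = 0.5 * x_u + 1.0 * x_c + 2.0 * x_p + rng.normal(0, 1, n)

def ols(regressors, target):
    """OLS with an intercept; returns [constant, coefficients...]."""
    X = np.column_stack([np.ones(len(target))] + regressors)
    return np.linalg.lstsq(X, target, rcond=None)[0]

beta = ols([x_u, x_c, x_p], y)   # benchmark: includes x_p
gamma = ols([x_u, x_c], y)       # common: omits x_p
alpha_c = ols([x_c], x_p)[1]     # auxiliary regression of x_p on x_c

# gamma2 is pushed away from beta2 by the proxy term beta3 * alpha_c.
print(gamma[2], beta[2] + beta[3] * alpha_c)
```

Here `gamma[2]` lands well above the clean value of 1.0, and the gap matches `beta[3] * alpha_c`, exactly as the omitted-variable-bias formula predicts.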

How Accurate is This?

Using the sum of squared errors as the measure, Pope and Sydnor analyzed the predictive accuracy of the different formulations of the statistical framework. Their results show that the proposed framework is the third most accurate, behind the Benchmark and Common models but ahead of the Restrictive model:

  1. Benchmark model
  2. Common model
  3. Proposed model
  4. Restrictive model

As expected, the Benchmark and Common models were more accurate than the Proposed model, since they make use of the protected variables and their proxies. However, because of the tenuous legal standing of such a practice, algorithm makers may otherwise need to use a more restrictive model (provided they can find variables that are genuinely uncorrelated with the protected ones). The Proposed model offers a way to avoid resorting to this restrictive approach.
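The ordering can be reproduced with a small simulation (purely illustrative; this is not the authors’ data, and the made-up coefficients determine how large the gaps are). The Proposed model gives up roughly the direct contribution of the protected variable, while the Restrictive model also discards the signal carried by the correlated variable:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x_p = rng.normal(0, 1, n)
x_c = 0.7 * x_p + rng.normal(0, 1, n)
x_u = rng.normal(0, 1, n)
y = 0.5 * x_u + 1.0 * x_c + 2.0 * x_p + rng.normal(0, 1, n)  # made-up coefficients

def fit(cols):
    """OLS fit with intercept; returns the design matrix and coefficients."""
    X = np.column_stack([np.ones(n)] + cols)
    return X, np.linalg.lstsq(X, y, rcond=None)[0]

X_bench, beta = fit([x_u, x_c, x_p])  # Benchmark: all variables
X_common, gamma = fit([x_u, x_c])     # Common: drop x_p
X_restr, delta = fit([x_u])           # Restrictive: drop x_p and x_c

# Proposed: benchmark coefficients, with x_p fixed at its population mean.
X_prop = np.column_stack([np.ones(n), x_u, x_c, np.full(n, x_p.mean())])

sse = lambda pred: float(np.sum((y - pred) ** 2))
scores = {
    "benchmark": sse(X_bench @ beta),
    "common": sse(X_common @ gamma),
    "proposed": sse(X_prop @ beta),
    "restrictive": sse(X_restr @ delta),
}
print(scores)
```

In this simulation the sum of squared errors comes out in the same order as in the papers: Benchmark ≤ Common ≤ Proposed ≤ Restrictive, with a much larger jump at the Restrictive step.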

Yang and Dobbie applied the models to a large dataset of pretrial cases between 2008 and 2013 from New York City. They also painstakingly cross-referenced pretrial cases to whether the defendant actually appeared in court. The result is a data corpus of around 200,000 defendants.

Their findings corroborate the accuracy results from Pope and Sydnor. Furthermore, although it is true that we are sacrificing accuracy for fairness, Yang and Dobbie showed that the differences in accuracy between the algorithms are extremely small. For example, at a 50% release rate, the rates of failure to appear in court were:

  • Benchmark: 8.35
  • Common: 8.38
  • Proposed: 8.40

Under the dataset examined, the proposed model would result in only eight additional failures-to-appear.

Is it Legally Sound?

Since this framework uses data from protected groups, does it potentially violate the law? For example, the Equal Protection Clause of the Fourteenth Amendment provides two fundamental protections with respect to race (see Huq⁴), prohibiting:

  1. Racial classifications
  2. Racialized intentions

So if race is used under this framework, does it violate discrimination laws? Yang and Dobbie argued that it does not. Even though the framework uses protected groups, it does so in order to achieve a “race-neutral prediction”.

Classifications based on “protected characteristics” are given “strict scrutiny” by the courts (i.e., #1 above), and the Constitution “bars all racial classifications, except as a remedy for specific wrongdoing”. Nevertheless, Yang and Dobbie argued that the framework

“should not be subject to strict scrutiny given that the use/consideration of race is not meant to distinguish or treat individuals differently on the basis of membership in a particular racial group, but the exact opposite.” (p.37)

Even if it does get flagged, they argue that it would withstand any legal scrutiny since the purpose of the procedure is “tailored towards the aim of remedying and correcting for proxy effects and historical biases that can be ‘baked in’ to an algorithm…” (i.e., #2 above, see p.37)

Yang and Dobbie acknowledge, however, that a “lack of understanding of the underlying statistical properties of direct and proxy effects in algorithms may lead a naïve observer to conclude that both proposals are illegal because they run up against the widely accepted prohibition on the use or consideration of protected characteristics.” (p.36) The onus is on us, the ML and related communities, to educate and communicate the framework in a way that is easily understood by the Courts and by the public at large.


It is often said that AI, and ML in particular, will profoundly change the world. However, what kinds of relationships we establish with this nascent technology, and how our lives will change, are still to be determined. The recent pushback against predictive algorithms, e.g. the banning of a benefit-fraud prediction tool in the Netherlands, the banning of face-recognition software in parts of the US, and the withdrawal of predicted A-level grades in the UK, shows the struggle we face as a free society to craft a symbiosis that is acceptable to everyone.

We cannot stumble into the future without being cognizant of our past, nor should we stifle technologies that grant us unprecedented opportunities to transform society for the better. For ML to mature, we need to learn, educate, and come to an informed decision about what the desirable characteristics of this technology should be.

“Man will only become better when you make him see what he is like.” A. Chekhov.

In this article, I highlighted a statistical framework that aims to bring fairness to ML whilst retaining the usefulness of the algorithm. Although counterintuitive at first glance, Pope and Sydnor’s framework has been argued to be both statistically and legally sound. A model built with this framework does lose some accuracy compared to legally contentious models; however, it does so in a way that eliminates the influence of the protected variables and their proxies.

This work also raises an important question about what “accuracy” means in ML. For example, is accuracy a simple question of the number of correct predictions made by an algorithm? As Pope and Sydnor put it, the question becomes:

“is it more important to predict the outcomes correctly (“get it right on average”) or to weight the different characteristics properly (“get it right at the margins”)?”

Seen this way, many of the heated debates in algorithmic fairness mentioned earlier can essentially be thought of as differences in emphasis. Such debates have a long history in philosophy and ethics, and as long as we differ with respect to the ideals to which we subscribe, they will likely continue. The work highlighted here therefore deserves more attention: it provides a practical solution that theoretically satisfies the demands of the law without severely compromising the predictive power of the algorithm.