Face Recognition and the Pyramid of Power

Spoiler Alert: Prisoners are at the bottom.

Last week, IBM released a landmark dataset of 1 million faces, each annotated with age, gender, and skin tone (along with various other “craniofacial features” that include measurements of the things you’d expect to see).

Why is this a big deal? Well, for an AI practitioner like me, bias isn’t just a buzzword; it’s a daily reality. Every model you try to build will be biased; this is guaranteed. The key to launching a successful product is to ensure that your particular mode of bias isn’t, as the community loves to say, “problematic.”

An example of problematic bias would be if all the training data for your wedding model is made up of heteronormative photos, and as a result, your model can’t predict same-sex couples as “brides.” An example of unproblematic bias would be if your cat model can’t recognize Scottish Folds (apologies to the Scottish Fold community if this offends).

If you want to release a product that involves face recognition and you don’t want to be problematic, the only way to check whether it will perform differently on visually distinct groups of people is to test it on a statistically significant set of photos whose ground-truth annotations (age, gender, skin tone) are known to be accurate. And before the IBM dataset, there weren’t many great options out there.
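To make that concrete, here’s a minimal sketch of what such a check looks like in practice: compare error rates across annotated subgroups. Everything in it (the predict() function, the field names, the records) is a hypothetical placeholder, not any particular product’s API.

```python
# Minimal sketch of a disaggregated evaluation: compare a face model's
# error rate across annotated subgroups. predict(), the field names,
# and the records are hypothetical placeholders.
from collections import defaultdict

def error_rates_by_group(records, predict, group_key="skin_tone"):
    """records: dicts with an image, a ground-truth label, and subgroup annotations."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = r[group_key]
        totals[group] += 1
        if predict(r["image"]) != r["label"]:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# A result like {"lighter": 0.01, "darker": 0.12} is exactly the kind of
# disparity you want to catch before shipping, and you can only trust it
# if the subgroup annotations themselves are accurate.
```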

As a practitioner, it was my job to think deeply about how to disentangle the bias that I knew existed in our models. And then it was my job to go out and find photos to help us do that.

Easier said than done.

The first thing I had to do was decide which groups to test for. Age? Gender? Race? The first seems fairly straightforward (it’s not), but the other two… where would I even begin? It occurred to me that we weren’t even in a position to start thinking about disability or deformity, because were we to include those criteria, the project would be prohibitively expensive.

So maybe let’s focus on race and gender? OK, but which races and which genders come first? And since I was doing this in a business setting, I had to think about how this product would be used. Could it be incorporated into predictive policing models? Or loan applications? What would it mean to neglect a protected community, and what effect would that have on the people subject to the model’s predictions?

If I decide not to specifically test for Pacific Islanders, for instance, and instead nest them under a blanket category of “Asian,” would they be less likely to get accepted into a college if that school used our model as part of the application process?
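Here’s a toy illustration of why that nesting matters (the numbers are invented for the example and don’t come from any real model or dataset): a coarse category can look fine in aggregate while a small subgroup inside it is badly served.

```python
# Invented numbers, purely illustrative: (errors, total evaluated) per subgroup.
groups = {
    "East Asian":       (30, 1000),
    "South Asian":      (40, 1000),
    "Pacific Islander": (25,  100),   # small subgroup, high error rate
}

agg_errors = sum(e for e, _ in groups.values())
agg_total = sum(n for _, n in groups.values())
print(f"'Asian' (aggregate): {agg_errors / agg_total:.1%} error")  # ~4.5%

for name, (e, n) in groups.items():
    print(f"{name}: {e / n:.1%} error")  # Pacific Islander: 25.0% error
```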

If you’re a researcher or a startup, there aren’t many things you can do without regard to cost. And so this IBM dataset seems like a huge win for the community, a solid contribution and a good start to help other researchers and practitioners check for bias that might affect certain people disproportionately.

But as you might expect, they stole the idea from someone. And of course, she’s a woman of color.

Joy Buolamwini may not yet be a household name on the order of IBM, but if you’re in AI or around it, you should definitely know who she is. Her explosive and important research, Gender Shades, was the first in the AI community to publicly assess and report on how facial recognition products perform differently across gender and skin tone.

A whole calendar year later, her 2018 research is still under attack. IBM didn’t credit her work, which inspired the need for this dataset, anywhere in the contribution. Amazon came out and explicitly lied about her in order to undermine the work she’s done, because it makes them look bad. And there really isn’t much she can do about it. They have all the power.

Even if we just think about creating the dataset itself, IBM had to spend a great deal of money to run that million photos through an extensive annotation pipeline. I would guess it cost them something on the order of $100,000 (roughly ten cents per photo), based on market rates for annotation companies today. IBM happened to have enough money to throw at the issue, and in doing so got some great press that makes them seem like a pioneer in the field of AI Fairness.

So, at the top of the pyramid we have big tech companies who can throw money around to generate headlines about how “fair” they are, and how much they love “diversity.” The money and power flow from them all the way down to the most disadvantaged people on the globe.

Slightly below AppAmaGoo (et al.) are practitioners like me and researchers like Joy. We are working on the same issues with limited resources, trying to be respectful of the communities we might affect. If I could’ve thrown money at the issue of de-biasing our facial recognition products, my employer would have loved to create and open-source a diversity dataset in order to set a new standard that would help the community at large. I’ve never been in a position to do so. It is worth mentioning, though, that as a white woman with a fancy education, I was the one in a position of power to decide which parts of the population we’d want to investigate for fairness first.

As practitioners and researchers, we often need to make use of image-labeling services like iMerit or Samasource, both of which are amazing companies that provide full-time jobs to underprivileged segments of India’s and Africa’s populations (respectively). But if you think for a minute about how these countries came to be so riddled with underprivileged people in need of jobs, Western colonialism has to be part of your answer. Long ago, the native populations didn’t have enough power to stop conquering foreigners from taking over, and the effects of that colonization are still widely felt.

One level down the pyramid from the people creating the AI are the people feeding us one label at a time: in giant, nondescript office buildings all around the world, clicking on images, all day every day, for less than $1 an hour.

And by Indian and African standards, these jobs are actually really good.

But one last thing... If not for this amazing gift from IBM, those of us building things or doing research on these issues would have been stuck with the tools that were available before it existed. In fact, a lot of practitioners and researchers have used the popular datasets put out by the National Institute of Standards and Technology. NIST offers one of the few datasets out there with annotations for skin tone, which sidesteps the more ambiguous and potentially problematic issues inherent in labeling for “race.” They have other datasets too that practitioners often use for training facial recognition products. One dataset in particular…

is made up of mugshots.

Thousands of faces.

Not one instance of consent.

Let’s not forget what makes facial recognition technology so scary: government entities are already hard at work trying to integrate this technology into law enforcement body cams, traffic lights, and CCTV with the intention of tracking people of interest and making surveillance and oppression a whole lot easier.

That’s the pyramid: the prisoners are the faces. Impoverished populations make the labels. Employees and researchers make decisions about how and what to build. Big tech takes all the credit, and in the end we’re all surveilled.

So yeah, prisoners are at the bottom of the pyramid, but in a way, we all are.

That’s why I’m writing to Congress and telling my local officials that we don’t want facial recognition technology in our public cameras. I hope you’ll join me, before it’s too late.