Karen Hao recently published an article in MIT Technology Review on the grave implications of recent research that found sexist bias in computer vision AI. Read it here.
Image-generation algorithms, powered by computer vision, are beginning to display some of the same bias previously found in language-generation ones. Language-generating algorithms are trained on text from the internet, which means they also learn from its hate speech and misinformation. Ryan Steed and Aylin Caliskan analyzed two image-generation models, both of which, importantly, used unsupervised learning. This means the algorithms were not trained with human labels (like the label “cat” for a photo of a cat); the shift to unsupervised learning was motivated in part by the discovery of disturbing language targeting women and minorities in these human-generated labels. The researchers studied the algorithms in much the same way natural language processing (NLP) models are studied. NLP models use word embeddings to cluster words that are often found together and separate those that are rarely found together. Similarly, the unsupervised image-generating algorithms studied here cluster or separate pixels based on how often they are found together.
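To make the embedding idea concrete, here is a minimal sketch of how closeness between embeddings can be turned into a bias score. It is not the researchers' code: the function names `cosine` and `association` are hypothetical, and the three-dimensional vectors are invented toy values rather than embeddings from any real model. The score simply asks whether a target embedding sits closer, on average, to one group of attribute embeddings than to another.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(target, attrs_a, attrs_b):
    """Mean similarity to group A minus mean similarity to group B.

    A positive value means the target is closer to group A,
    a negative value means it is closer to group B.
    """
    mean_a = sum(cosine(target, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(target, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b

# Toy embeddings, invented purely for illustration (assumption).
career = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]
appearance = [[0.1, 1.0, 0.0], [0.2, 0.9, 0.1]]
target_x = [0.8, 0.3, 0.1]  # hypothetical embedding near the "career" group
target_y = [0.2, 0.8, 0.1]  # hypothetical embedding near the "appearance" group

score_x = association(target_x, career, appearance)
score_y = association(target_y, career, appearance)
print(f"target_x: {score_x:+.3f}, target_y: {score_y:+.3f}")
```

With these toy values, the first target scores positive and the second negative. In a real audit the vectors would come from the model under study, and a consistent gap between groups of targets would be the signal of learned bias.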
What Steed and Caliskan found was that 43% of the time, a photo of a man cropped below his neck would be autocompleted with him wearing a suit, whereas 53% of the time, a photo of a woman, even US Representative Alexandria Ocasio-Cortez, would be autocompleted with an image of her wearing a bikini or low-cut top. Just as hate speech on places like Reddit and Twitter encoded bias into language-generating algorithms, an overrepresentation of photos of scantily clad women, and of men in ties and suits, led to these biased autocompletions.
The researchers identified a problem common to the field of AI at large: algorithms trained on biased data that often over- or underrepresents certain groups can propagate that bias. When such models are deployed in fields like policing and hiring, bias of the kind found by Steed and Caliskan can disproportionately harm women. They called for more responsible sourcing of training datasets, more testing of these models, greater transparency from those developing them, and potentially open-sourcing the models so that the academic community can scrutinize them.