Unintended Machine Learning Biases as Social Barriers for Persons with Disabilities
Ben Hutchinson, Vinodkumar Prabhakaran, Remi Denton, Kellie Webster, Yu Zhong, Stephen Denuyl · 2020 · SIGACCESS Accessibility and Computing · doi:10.1145/3386296.3386305
Summary
This paper from Google's Ethical AI team provides concrete empirical evidence that widely deployed NLP models encode measurable biases against people with disabilities, creating social barriers through technology. The authors examine three layers of bias using a perturbation methodology with 56 linguistic phrases for referring to people with various disabilities (compiled from guidelines published by the Anti-Defamation League, SIGACCESS, and the ADA National Network). First, they demonstrate that the Perspective API toxicity model, which is used to moderate online conversations, classifies texts mentioning disability as more toxic. The sentence "I am a person with mental illness" receives a toxicity score of 0.62, compared to 0.03 for "I am a tall person" and 0.08 for "I am a person." Similarly, "I will fight for people with mental illness" scores 0.54, versus 0.14 for "I will fight for people." In a moderation setting, this means innocuous comments discussing disability are more likely to be flagged, potentially suppressing disability-related speech online. Second, a sentiment analysis model rates texts mentioning disability as systematically more negative across all disability categories. Third, the authors examine BERT, a foundational language model underlying a wide range of downstream NLP applications, using fill-in-the-blank analysis. When prompted with "[disability phrase] is ___", BERT's top-10 word predictions produce negative sentiment scores at significantly higher rates for disability phrases than for non-disability phrases, with mobility (35%), short stature (34%), and cerebral palsy (34%) showing the highest rates of negative associations, compared with 18% for "person without a disability."
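The perturbation idea can be expressed as a score difference: fill the same sentence template once with a neutral phrase and once with a disability phrase, then compare the model's scores. The sketch below assumes a generic `score(text) -> float` function. Because it does not call the Perspective API, the hypothetical `lookup_score` simply returns the toxicity values quoted above. The paper's actual procedure (its 56 phrases, its sentence set, and its aggregation) is not reproduced here, and every helper name is illustrative.

```python
from statistics import mean
from typing import Callable, List, Tuple

# (template, neutral fill, perturbed fill); "{X}" marks the slot being swapped.
Perturbation = Tuple[str, str, str]


def score_shift(score: Callable[[str], float], item: Perturbation) -> float:
    """Score change caused by swapping the neutral fill for the perturbed fill."""
    template, neutral, perturbed = item
    return score(template.replace("{X}", perturbed)) - score(template.replace("{X}", neutral))


def mean_shift(score: Callable[[str], float], items: List[Perturbation]) -> float:
    """Average shift over several templates; > 0 means the phrase raises the score."""
    return mean(score_shift(score, item) for item in items)


# Hypothetical stand-in scorer: a lookup of the toxicity scores quoted in the summary.
# A real audit would call a toxicity classifier here instead.
REPORTED_TOXICITY = {
    "I am a person": 0.08,
    "I am a tall person": 0.03,
    "I am a person with mental illness": 0.62,
    "I will fight for people": 0.14,
    "I will fight for people with mental illness": 0.54,
}


def lookup_score(text: str) -> float:
    return REPORTED_TOXICITY[text]


mental_illness = [
    ("I am {X}", "a person", "a person with mental illness"),
    ("I will fight for {X}", "people", "people with mental illness"),
]
tall = [("I am {X}", "a person", "a tall person")]  # non-disability control

print(round(mean_shift(lookup_score, mental_illness), 2))  # 0.47
print(round(mean_shift(lookup_score, tall), 2))  # -0.05
```

A positive mean shift means the substituted phrase raises the toxicity score, while the "tall person" control lowers it slightly. Replacing `lookup_score` with a real classifier call would turn the same loop into a small audit over many templates and phrases.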
Key findings
The bias appears at each of the three levels of the NLP pipeline the authors examine: in the training data, in foundational language representations, and in deployed classification models. Analysis of the Jigsaw toxicity classification dataset revealed that discussions about mental illness are statistically over-represented alongside topics such as homelessness (log-odds score 12.2), gun violence (8.4), drugs (6.2), and addiction (4.7). These social topics carry negative connotations and offer a plausible explanation for why models learn to associate disability with toxicity. Crucially, even "recommended" disability terminology (person-first, respectful language) triggers bias in these models: the toxicity model associates recommended phrases with increased toxicity, and BERT generates more negative word associations for recommended disability phrases than for neutral phrases. Non-recommended (derogatory) phrases produce even stronger bias effects, with an average toxicity increase of 0.06 versus 0.01 for recommended terms. The authors frame these as representational harms (as distinct from allocative harms) that perpetuate harmful stereotypes about disabled people and can shape how persons with disabilities engage with technology, for example by suppressing their online speech about their own experiences.
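Co-occurrence findings of this kind can be sketched as a log-odds comparison: how much more likely a term is to appear in comments that mention mental illness than in the remaining comments. The summary does not specify the exact statistic behind scores such as 12.2, so the sketch below assumes a simple add-one smoothed log-odds ratio over document frequencies (the paper may use a different, for example prior-weighted or standardized, variant). It runs on a handful of invented comments, and its values are not on the same scale as the reported ones.

```python
import math
from collections import Counter
from typing import List


def doc_freq(docs: List[List[str]]) -> Counter:
    """Number of documents containing each term (each doc counted once per term)."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(set(tokens))
    return counts


def smoothed_log_odds(term: str, target: List[List[str]], rest: List[List[str]],
                      alpha: float = 1.0) -> float:
    """Log-odds that a target doc contains `term`, minus the same for a rest doc.

    Add-alpha smoothing keeps the result finite for terms absent from one side.
    """
    in_target, in_rest = doc_freq(target)[term], doc_freq(rest)[term]
    odds_target = (in_target + alpha) / (len(target) - in_target + alpha)
    odds_rest = (in_rest + alpha) / (len(rest) - in_rest + alpha)
    return math.log(odds_target) - math.log(odds_rest)


# Invented toy comments, for illustration only (not from the Jigsaw dataset).
comments = [
    "mental illness is common among the homeless",
    "homeless shelters need mental illness support",
    "the shooting suspect had a mental illness",
    "the weather is nice today",
    "homeless people deserve housing",
    "great game last night",
]
docs = [c.split() for c in comments]


def mentions_mental_illness(tokens: List[str]) -> bool:
    return "mental" in tokens and "illness" in tokens


target = [d for d in docs if mentions_mental_illness(d)]
rest = [d for d in docs if not mentions_mental_illness(d)]

print(round(smoothed_log_odds("homeless", target, rest), 2))  # 0.81
print(round(smoothed_log_odds("weather", target, rest), 2))  # -0.98
```

Positive values mark terms over-represented near mental-illness mentions ("homeless" here), and negative values mark under-represented ones. On a corpus the size of the Jigsaw dataset, ranking every term by such a score would be a natural way to surface associated topics; with only six comments, smoothing dominates and a full ranking would not be meaningful.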
Relevance
This paper provides the most direct empirical evidence in our collection that ML models actively encode and perpetuate ableist attitudes from their training data. The practical implications are immediate and serious: content moderation systems powered by these models may silence disability-related discourse online, sentiment analysis tools may skew public opinion monitoring about disability topics, and BERT's biased representations can propagate into the downstream applications built on top of it, from search engines to hiring tools to chatbots. For accessibility practitioners, the key message is that ML fairness work must explicitly include disability as a protected category, not just race and gender. The finding that even respectful disability terminology triggers bias means the problem cannot be solved simply by choosing better words; it calls for changes to training data, model architectures, and evaluation practices. The paper complements Guo et al.'s research roadmap and White's philosophical analysis by providing concrete, measurable evidence of the harms they theorize.
Tags: machine learning bias · NLP · toxicity detection · sentiment analysis · disability representation · content moderation · BERT · word embeddings · social barriers · stereotyping