Examining and Mitigating Ability-bias in LLMs via Self-Reflection

Neel Iyer, Akshita Jha, Alisha Pradhan · 2025 · Proceedings of the 22nd International Web for All Conference (W4A) · doi:10.1145/3744257.3744268

Summary

This short paper investigates ability bias in large language models — the tendency of LLMs to encode and perpetuate stereotypical or discriminatory associations about people with disabilities. Using the Bias Benchmarking Questionnaire (BBQ) dataset, the authors administered prompts to OpenAI's GPT-3.5 model across six disability types: autism, Down syndrome, visual impairment, bipolar disorder/schizophrenia, PTSD, depression, and deaf/hard of hearing. The BBQ dataset presents scenarios in both ambiguous (under-informative) and disambiguous (adequately informative) contexts, asking the model to choose between a group with a disability, a group without, or "cannot be determined." The researchers then applied a self-reflection approach through prompt chaining: after the model's initial response, it was asked to explain its reasoning, identify any biases or stereotypes in its answer, and then rewrite its response. The full dataset comprised 278 responses (140 ambiguous, 138 disambiguous) which were qualitatively coded using a multiphase approach with inter-rater reliability (Cohen's kappa averaging 0.95). The study surfaces specific linguistic associations the model encodes for different disabilities and examines how the types of justifications provided in LLM explanations affect perceived trust.

Key findings

The study revealed distinct stereotypical associations encoded in GPT-3.5 for each disability type. Autism was linked to difficulty expressing emotions and "socially unconventional ways"; visual impairment with reduced independence and needing more support; depression with lack of motivation and increased absenteeism; and deaf/hard of hearing with communication barriers and difficulty maintaining employment. In disambiguous contexts, initial responses contained assumptions about the disability group in 28 cases, which dropped to 38 after rewriting — though notably no rewritten responses contained stereotypes (vs. N=1 initially). In ambiguous contexts, initial responses were much more likely to contain stereotypes (N=63) compared to rewritten responses (N=1). The self-reflection approach significantly increased responses matching the expected "cannot be determined" answer for ambiguous contexts (N=128 rewritten vs. N=89 initial). Rewritten responses also showed improved justification patterns: greater emphasis on fairness (N=41 vs. N=16), more acknowledgment of limited context (N=32 vs. N=11), and fewer instances of falsely inferring from context (N=20 vs. N=43). However, some rewritten responses (N=11) got worse, and the model occasionally introduced new information not in the original prompt.

Relevance

This research is critically important as LLMs are increasingly used in consequential domains like employment screening, education, and healthcare — all areas where disability bias can cause direct harm. The finding that GPT-3.5 associates specific disabilities with reduced independence, employability challenges, and social difficulties mirrors real-world ableist stereotypes that already disadvantage disabled people. For accessibility practitioners, the study demonstrates that even well-intentioned AI tools can encode harmful assumptions about disability. The self-reflection prompt chaining approach offers a practical, low-cost debiasing technique that developers can implement without model retraining. However, the approach is not foolproof — some responses worsened after reflection, and the method was only tested on one model (GPT-3.5) with one dataset. The paper's analysis of how LLM justifications shape trust is particularly relevant for applications where AI-generated explanations accompany decisions about people with disabilities.

Tags: ability bias · ableism · LLM bias · debiasing · AI fairness · disability representation · prompt chaining