Shiny Stories, Hidden Struggles: Investigating the Representation of Disability Through the Lens of LLMs

Marco Bombieri, Simone Paolo Ponzetto, Marco Rospocher · 2026 · ACM Transactions on Intelligent Systems and Technology · doi:10.1145/3806202

Summary

This paper investigates how Large Language Models (LLMs) represent disability by comparing AI-generated social media posts with self-descriptions from real people with disabilities on Reddit. The study addresses a critical gap in bias research: while prior work has focused on detecting negative stereotypes in AI systems, relatively little attention has been paid to positive idealizations, cases where debiasing efforts overcorrect and produce unrealistically optimistic portrayals that erase the complexity and struggles of marginalized communities.

The authors constructed two datasets. The first, REDd, contains 1,250 Reddit posts from six disability-related subreddits (r/disability, r/blind, r/autism, r/depression, r/deaf, r/cerebralpalsy), filtered to first-person self-descriptions by individuals with disabilities, with inter-annotator agreement of 0.875 (Fleiss' Kappa). The second, LLMd/LLMnd, contains posts generated by three models (Gemini-1.5F, GPT-4o-mini, and Mixtral-8B) prompted as either a person with a specific disability (P1–P6) or a generic person (P7) across six activity types, with 360 posts per model per condition. Posts were analyzed using three quantitative metrics: sentiment (VADER), emotion distribution (NRC EmoLex), and depression indicators (LT-EDI-ACL2022 model). Statistically distinctive words were identified using z-scores (the "Fightin' Words" methodology), and thematic clusters were generated using GPT-4o-mini. This multi-dimensional analysis enables a comprehensive comparison of how disability is depicted in real versus AI-generated discourse.
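The distinctive-word step can be illustrated with a minimal sketch. It assumes the common "Fightin' Words" formulation (a log-odds ratio with a Dirichlet prior, scaled to a z-score) with a uniform prior and whitespace tokenization; the summary does not state which prior or preprocessing the authors used, and the function name, the `prior` parameter, and the toy posts below are invented for illustration only.

```python
import math
from collections import Counter


def fightin_words_z(corpus_a, corpus_b, prior=0.01):
    """Return {word: z-score}; positive z marks words distinctive of corpus_a.

    Assumption: a uniform Dirichlet prior (`prior` pseudo-counts per word)
    and naive lowercase whitespace tokenization, not the paper's pipeline.
    """
    counts_a = Counter(tok for doc in corpus_a for tok in doc.lower().split())
    counts_b = Counter(tok for doc in corpus_b for tok in doc.lower().split())
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    prior_total = prior * len(vocab)  # sum of pseudo-counts over the vocabulary

    z_scores = {}
    for word in vocab:
        y_a = counts_a[word] + prior  # smoothed count in corpus A
        y_b = counts_b[word] + prior  # smoothed count in corpus B
        log_odds_a = math.log(y_a / (n_a + prior_total - y_a))
        log_odds_b = math.log(y_b / (n_b + prior_total - y_b))
        delta = log_odds_a - log_odds_b
        variance = 1.0 / y_a + 1.0 / y_b  # approximate variance of delta
        z_scores[word] = delta / math.sqrt(variance)
    return z_scores


# Toy, invented posts (not taken from REDd or LLMd).
llm_posts = ["so grateful for my amazing community", "grateful and inspired today"]
reddit_posts = ["the doctor said more surgery", "pain again and the doctor was late"]

z = fightin_words_z(llm_posts, reddit_posts)
ranked = sorted(z, key=z.get, reverse=True)
print("most LLM-like:", ranked[:3])
print("most Reddit-like:", ranked[-3:])
```

Words with large positive scores are over-represented in the first corpus and words with large negative scores in the second, while words used at similar rates in both stay near zero. On corpora this small the scores are far from any significance threshold; the sketch only shows the mechanics.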

Key findings

The results reveal a stark divergence between real and AI-generated representations. LLM-generated posts about people with disabilities are overwhelmingly positive in sentiment: 96.39%–99.72% positive across the three models, compared to only 46.94% positive for Reddit posts, with the remaining 53.06% expressing overall negative sentiment. Depression analysis showed that LLMs produce almost no signs of depression (0%–4.17% moderate), versus 20.42% severe and 26.26% moderate depression indicators in the Reddit dataset. Distinctive words in LLM-generated content cluster around joy, gratitude, community, inspiration, strength, and resilience, closely matching inspiration porn tropes. Real Reddit posts feature words related to pain, medical systems (doctor, hospital, surgery), financial precarity (unemployed, homeless, money), and emotional distress (suicidal, anxiety, crying). For RQ2, which compares LLM posts about disabled versus generic personas, LLMs associate disability with disproportionately more negative emotions than they do generic personas. The models thus idealize disability through inspirational narratives while also marking it negatively: they over-index on struggle and advocacy and under-represent everyday topics such as career, entertainment, and leisure that feature prominently in non-disability posts. The data and code are publicly available at https://github.com/marcobombieri/LLM-disability-representation.

Relevance

This paper is highly relevant to accessibility practitioners and AI developers building systems that interact with or represent people with disabilities. LLMs are increasingly embedded in content generation, accessibility tools, chatbots, and support services, and if these models systematically misrepresent disability through toxic positivity or inspiration porn, they can inadvertently harm the communities they aim to serve. The paper extends the disability representation literature into the AI domain, providing empirical evidence that seemingly "safe" positive outputs can perpetuate exclusionary narratives by erasing complexity. The openly released datasets offer a resource for future accessibility-relevant AI bias research. For practitioners building accessible AI tools, the findings underscore the need for community co-evaluation of AI-generated content, nuanced debiasing approaches that preserve emotional authenticity, and explicit design policies ensuring AI systems can represent the full spectrum of human experience, including pain, frustration, and systemic barriers, rather than defaulting to sanitized optimism.

Tags: AI bias · large language models · disability representation · inspiration porn · toxic positivity · sentiment analysis · inclusion · social media · debiasing