Examining Age-Bias and Stereotypes of Aging in LLMs

Sherwin Dewan, Ismail Shaikh, Connie Shaw, Abhilash Sahoo, Akshita Jha, Alisha Pradhan · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3663547.3746464

Summary

This paper investigates how large language models encode and reproduce age-related stereotypes about older adults. Using prompts from the Bias Benchmark for QA (BBQ), a widely used fairness dataset, the researchers administered 1,648 age-bias prompts to ChatGPT (GPT-3.5) in two types of context: ambiguous scenarios, where the information given is insufficient to determine an answer, and disambiguated scenarios, where the answer is clear. Each prompt was queried ten times to account for variation in the model's sampled output. The responses were embedded using GPT-3.5 embeddings and then clustered with K-Means to select representative responses. The qualitative analysis used a multi-phase coding process in which four researchers independently coded responses, achieving a Krippendorff's alpha of 0.94. The study examined how the LLM responded to scenarios involving intergenerational interactions across domains including technology proficiency, cognitive and physical decline, resistance to change, workplace roles, and conservative values. The researchers analyzed both the responses that exhibited explicit age-bias and those categorized as unbiased, and found that even ostensibly neutral responses often alluded subtly to ageist stereotypes.
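The representative-selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the summary does not say how a representative was chosen within each cluster, so the rule used here (take the response closest to each cluster centroid) is an assumption, as are the function names, the plain Lloyd's-algorithm K-Means written with the standard library, and the toy two-dimensional vectors that stand in for real embeddings.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, cluster index per vector)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = None
    for _ in range(iterations):
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged: no vector changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean_vector(members)
    return centroids, assignments


def select_representatives(responses, embeddings, k=2, seed=0):
    """Assumed selection rule: for each cluster, return the response nearest
    its centroid, as (cluster size, response), largest cluster first."""
    centroids, assignments = kmeans(embeddings, k, seed=seed)
    picks = []
    for c in range(k):
        members = [i for i, a in enumerate(assignments) if a == c]
        if not members:
            continue
        best = min(
            members, key=lambda i: squared_distance(embeddings[i], centroids[c])
        )
        picks.append((len(members), responses[best]))
    picks.sort(key=lambda pick: -pick[0])
    return picks


# Toy data (invented for illustration): five repeated answers to one prompt,
# with made-up 2-D vectors in place of real embedding vectors.
responses = ["response A", "response B", "response C", "response D", "response E"]
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]

picks = select_representatives(responses, embeddings, k=2)
for size, text in picks:
    print(size, text)
```

In practice the ten responses to each prompt would be embedded with a real embedding model, and a library implementation of K-Means would likely replace the hand-written loop. The sketch only shows how clustering repeated outputs reduces them to a few representative answers for manual coding.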

Key findings

The study identified four major categories of age-bias in LLM responses. Technology proficiency stereotypes were the most prevalent (N=175 responses): older groups were portrayed negatively 105 times versus 0 times for younger groups, and were depicted as "old-timers" who are "overwhelmed" by technology and "relied on the assistance of their grandchildren." Cognitive and physical decline stereotypes appeared in 104 responses, which associated older adults with being forgetful, "struggling with simple tasks," and dependent on younger people. Resistance to change stereotypes appeared in 73 responses. Workplace stereotypes appeared in 56 responses, which associated younger people with technology jobs such as "software engineer" while assigning older people managerial or retired roles. Crucially, the study found that even responses without explicit age-bias contained problematic patterns: "educational" responses (N=191) that avoided picking a group still casually mentioned stereotypes, such as older adults finding it "harder to keep up with the rapid pace of change." Some responses (N=22) subtly alluded to aging stereotypes while technically appearing unbiased. The LLM also frequently fabricated fictional content (N=37 in technology, N=10 in cognitive decline) that reinforced stereotypes. A preliminary quantitative comparison showed age-bias dropping from ~49% with GPT-3.5 to ~17% with GPT-4.0, but stereotypes and assumptions persisted qualitatively.
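To put the reported model comparison in perspective, the short calculation below derives the absolute and relative change from the two approximate figures quoted above (~49% and ~17%). The helper names are illustrative, and because the inputs are rounded estimates from a preliminary comparison, the outputs are rough as well.

```python
def percentage_point_drop(before_pct, after_pct):
    """Absolute change between two rates, in percentage points."""
    return before_pct - after_pct


def relative_reduction(before_pct, after_pct):
    """Share of the original rate that was eliminated (0.0 to 1.0)."""
    return (before_pct - after_pct) / before_pct


# Approximate rates as quoted in the summary; not exact measurements.
rate_gpt35 = 49.0
rate_later_model = 17.0

drop = percentage_point_drop(rate_gpt35, rate_later_model)
rel = relative_reduction(rate_gpt35, rate_later_model)
print(f"{drop:.0f} percentage points, {rel:.0%} relative reduction")
```

This works out to a drop of about 32 percentage points, or roughly a 65% relative reduction. Even so, about one response in six would still be flagged as age-biased at the ~17% rate, which is consistent with the observation that stereotypes persisted qualitatively.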

Relevance

This research has direct implications for the growing use of LLMs in applications that affect older adults, from healthcare tools and reminder systems to employment screening and educational content. The finding that even "unbiased" responses subtly encode ageist assumptions is particularly concerning, as it means bias mitigation efforts focused solely on explicit stereotyping will miss more insidious forms of age discrimination. For accessibility practitioners, this work underscores the need to critically evaluate AI-generated content when designing tools for or about older adults. The paper recommends involving older adults as human-in-the-loop annotators in dataset creation and model evaluation, while cautioning about internalized ageism that older adults themselves may hold. The suggestion to leverage educational explanations within LLM responses as a debiasing strategy offers a pragmatic path forward. As LLM-based assistive technologies increasingly target older users, understanding and mitigating these embedded stereotypes becomes essential for equitable design.

Tags: ageism · AI bias · large language models · older adults · stereotypes · fairness