I Can Do It: Exploring Voice Assistants for Adults with Intellectual Disabilities
Madhuka Nadeeshani, Jacqueline Johnstone, Kirsten Ellis, Swamy Ananthanarayan · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3790573
Summary
This CHI 2026 paper reports an eight-week deployment of two screen-based voice assistants (VAs), the Google Nest Hub and the Amazon Echo Show 8, across five sites of an Australian disability-support organisation running a STEAM (Science, Technology, Engineering, Arts, Mathematics) program. Participants were 17 adults with intellectual disabilities (ID), some with co-occurring autism spectrum disorder, cerebral palsy, Rett syndrome, or partial trisomy 9p, and 4 support workers ("coaches"). The paper addresses a gap in the voice-assistant literature: most prior work on specific-need groups studied older adults or blind users, and participants with ID were typically engaged in short, structured tasks rather than naturalistic long-term use. The study used a three-phase design: preliminary interviews, deployment interviews at week 4, and post-reflection interviews at week 8. Data sources included 240 hours of multimodal interaction observation, voice-interaction logs retrieved from Google My Activity and Alexa Voice History, and transcripts, which were analysed using inductive reflexive thematic analysis (Braun and Clarke). Participants initiated 260 logged interactions (234 on Nest Hub, 26 on Echo Show) across 17 topic categories; usage peaked in week 4 and then stabilised. The paper ends with six design considerations: auto-correction for mispronunciation and grammar, customisable listening windows, simplified and easy-to-understand responses (using Lexile-score adaptation), support for VA routines, scaffolding of attention and engagement in voice user interfaces (VUIs), and multimodal input and feedback.
Key findings
Participants used VAs for information retrieval, entertainment, and learning; 62% of queries were general-purpose and 38% were STEAM-related. Over the eight weeks, error rates dropped substantially for participants who adapted their strategy. For example, P3's error rate fell from 39% (weeks 1-4) to 12% (weeks 5-8) after she began preparing and rehearsing queries in advance. Across all logged Google Assistant interactions, the overall error rate was roughly 15% (35 errors across the 234 logged Nest Hub interactions), driven by mispronunciation (16 errors), ambiguous questions (10), and timeouts (9). Three dominant mispronunciation patterns were identified: minimal-pair confusion ("leader" vs "ladder"), phonemic substitution ("sandwich" vs "Swedish"), and named-entity distortion ("Makey Makey" vs "Mickey Mickey"). Peer scaffolding emerged alongside coach support as a critical and under-reported mechanism: peers repeated commands, modelled phrasings, and suggested retries. Several participants formed parasocial or companion-like relationships with the devices (greeting them, commenting on responses). The authors interpret this cautiously: sustained use supported perceived social connectedness without displacing human contact, because interactions were embedded in group sessions. Comprehension breakdowns were common. VA responses (e.g., an explanation of "apple" at roughly a Grade 7-8 reading level) routinely exceeded participants' comprehension, which motivates the Lexile-score-based simplification recommendation. Non-verbal and minimally verbal participants (P5, P7, P8) faced persistent barriers, reinforcing the need for multimodal input.
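The error figures above can be checked with a few lines of arithmetic. The sketch below assumes the 35 errors are measured against the 234 Nest Hub interactions reported in the Summary; that choice of denominator is an assumption of this example, and the variable names are illustrative.

```python
# Error counts by cause, as reported in the key findings.
errors_by_cause = {"mispronunciation": 16, "ambiguous question": 10, "timeout": 9}

# Assumption: errors are counted against the 234 logged Nest Hub interactions.
nest_hub_interactions = 234


def cause_shares(counts):
    """Return each cause's share of all errors as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}


total_errors = sum(errors_by_cause.values())
error_rate = total_errors / nest_hub_interactions
shares = cause_shares(errors_by_cause)

print(f"total errors: {total_errors}")    # 16 + 10 + 9 = 35
print(f"error rate: {error_rate:.1%}")    # 35 / 234, about 15%
for cause, share in shares.items():
    print(f"{cause}: {share:.0%} of errors")
```

Under this assumption the three causes sum to the reported 35 errors, and mispronunciation alone accounts for a little under half of them.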
Relevance
For practitioners building voice interfaces, the paper offers concrete, specification-level design changes rather than general principles: make the silence threshold (endpoint detection) configurable so users who stammer or pause are not cut off; expose a "simplify further" control on screen devices; integrate repair mechanisms that surface candidate matches when confidence is low; and tune responses to Lexile levels 1-3 for users with ID. The finding that peer scaffolding, not just caregiver mediation, sustains adoption is a useful corrective for researchers planning 1:1 VA deployments. The six design considerations also transfer to older adults with mild cognitive impairment (MCI), non-native speakers, and users with specific learning disabilities. Limitations to flag: the two devices were introduced at different times (Nest Hub first, Echo Show three weeks in), which confounds device comparisons; the structured disability-support setting may have elevated engagement above what homes or workplaces would see; wake-word failures were not visible in the logs, so real error rates are likely higher than reported; and the authors acknowledge that no team member identifies as having an ID, so the design considerations still need to be validated through co-design with users with ID.
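Two of these recommendations, the configurable silence threshold and the low-confidence repair prompt, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation or any vendor's API: the voice-activity flags, the 20 ms frame size, the timeout values, the confidence threshold, and the `KNOWN_TERMS` vocabulary are all hypothetical, and fuzzy string matching from Python's standard `difflib` module stands in for whatever matching a real recogniser would use.

```python
from difflib import get_close_matches

# Hypothetical session vocabulary a coach might preload for a STEAM activity.
KNOWN_TERMS = ["Makey Makey", "ladder", "leader", "sandwich"]


def end_of_utterance(frames_voiced, frame_ms=20, silence_ms=700):
    """Return the frame index at which listening would stop, or None.

    frames_voiced holds one boolean per audio frame, True where a
    (hypothetical) voice-activity detector heard speech. Listening stops
    once the silence following speech lasts silence_ms, which is the
    setting a user or coach would be allowed to change.
    """
    needed = max(1, silence_ms // frame_ms)  # silent frames required to stop
    heard_speech = False
    silent_run = 0
    for i, voiced in enumerate(frames_voiced):
        if voiced:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


def suggest_repairs(heard, confidence, threshold=0.8, max_candidates=3):
    """Return known terms to show on screen when recogniser confidence is low."""
    if confidence >= threshold:
        return []  # confident enough: act on the transcript as heard
    by_lowercase = {term.lower(): term for term in KNOWN_TERMS}
    matches = get_close_matches(
        heard.lower(), list(by_lowercase), n=max_candidates, cutoff=0.5
    )
    return [by_lowercase[m] for m in matches]


# A speaker who talks for 0.2 s, pauses for 1 s, then finishes the query.
frames = [True] * 10 + [False] * 50 + [True] * 10 + [False] * 200
short_window = end_of_utterance(frames, silence_ms=700)
long_window = end_of_utterance(frames, silence_ms=2000)
candidates = suggest_repairs("Mickey Mickey", confidence=0.4)
```

With these sample frames, the 700 ms window stops listening at frame 44, in the middle of the speaker's pause, while the 2000 ms window waits until frame 169, after the query is complete. The repair helper offers "Makey Makey" as the on-screen candidate for the low-confidence transcript "Mickey Mickey", mirroring the named-entity distortion reported in the study.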
Tags: voice assistant · intellectual disability · smart display · cognitive accessibility · qualitative research · speech recognition · peer support