Situated Understanding of Errors in Older Adults' Interactions with Voice Assistants: A Month-Long, In-Home Study

Amama Mahmood, Junxiang Wang, Chien-Ming Huang · 2026 · ACM Transactions on Accessible Computing · doi:10.1145/3796236

Summary

This month-long field study examined how 15 older adults (ages 66-94) interact with voice assistants in their homes, with a particular focus on errors and conversational breakdowns. The researchers deployed Amazon Echo Dot smart speakers augmented with custom audio recording devices that captured complete interactions plus 10 seconds of post-interaction audio to record users' immediate reactions and recovery attempts. The study collected 2,552 one-turn interactions over four weeks. During weeks 3-4, researchers also deployed a ChatGPT-powered Alexa skill to explore how LLM integration affects older adults' interaction dynamics. The methodology represents a significant advance over traditional approaches that rely on usage logs and post-hoc interviews, which fail to capture the nuances of real-time error encounters and spontaneous user reactions. The research analyzed error types, resolution rates, recovery strategies, and how these patterns evolved over time. Qualitative analysis of post-interaction interviews explored participants' perceptions of error causes, blame attribution, and expectations for VA behavior.

Key findings

Nearly one in four interactions (24.76%) resulted in errors, with VA errors (intent recognition failures) being most common (32.3% of errors), followed by speech recognition errors (24.4%) and human errors (11.4%). Critically, error rates remained consistent across all four weeks: older adults did not improve at avoiding or recovering from errors through experience alone. Only 25.47% of errors were resolved on the immediate retry attempt. Participants showed that they had recognized approximately 80% of errors through verbal or vocal reactions or recovery attempts, yet VAs rarely leveraged these cues to initiate self-repair. Speech recognition errors had particularly low resolution rates (31.8%), reflecting VAs' limited accommodation of older adults' slower speech patterns and age-related speech characteristics. The ChatGPT-powered VA showed higher error resolution rates (14.1% vs 8.7%) and better contextual continuity. However, it introduced new interaction challenges: older adults struggled to distinguish between standard and LLM-enabled capabilities, and the conversational framing ("let's chat") created mental model mismatches where users perceived the VA as a social partner rather than a functional tool, paradoxically reducing their willingness to attempt error recovery.

Relevance

This research provides crucial evidence for designing voice assistants that work effectively for older adults, a population with high VA adoption rates who stand to benefit significantly from voice interfaces. The finding that error rates don't decrease with experience challenges assumptions that users will "learn" VA interaction patterns over time. The study's nine design considerations offer actionable guidance, including that VAs should leverage users' immediate vocal and verbal reactions for error detection; accommodate slower speech patterns and extended speaking time; proactively suggest actions rather than merely reacting to commands; provide transparent information about capabilities rather than anthropomorphic framing; and implement context-sensitive feedback during error recovery. For accessibility practitioners, the study highlights that LLM integration alone won't solve interaction challenges for older adults; in fact, more sophisticated conversational capabilities can introduce new barriers. Effective design requires understanding age-specific factors like forgetfulness, speech difficulties, and preference for natural conversation styles, then building systems that explicitly accommodate these characteristics rather than expecting users to adapt.

Tags: voice assistants · older adults · conversational AI · errors · smart speakers · large language models · aging · speech recognition · in-home study