NarrAid: Supporting Storytelling of People with Aphasia via Generative Visual Scene Displays

Xiangfei Hu, Xiuqi Zheng, Qi Liu, Zejian Li, Ying Zhang, Lu Wang, Xipei Ren · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26) · doi:10.1145/3772363.3798625

Summary

NarrAid is a generative AI-driven Visual Scene Display (VSD) system designed to support storytelling, not just basic wants and needs, for people with aphasia (PWA). The authors argue that existing AAC tools, including traditional VSDs, support functional communication well but offer little scaffolding for narrative, and that VSD-based approaches in particular depend heavily on caregivers manually preparing personal photos that meet VSD compositional rules (e.g., contextualized scenes with the speaker engaged in activity, no direct gaze at the camera). The team conducted a formative study using speed dating with three storyboards (voice, sketch, and click-card interaction) involving Speech-Language Pathologists and caregivers, and produced a codebook of 11 challenges grouped into language barriers, lack of confidence, and interaction/technology constraints. Based on those findings they built NarrAid as a tablet web app with three core capabilities: structured narrative guidance using six narrative elements (time, place, people, actions, objects, emotions) with a free-expression fallback; multimodal input via voice plus categorized AAC pictographic image cards (sourced from ARASAAC) and five core buttons (Yes / No / Skip / Not clear / Generate); and a generative VSD pipeline that uses an LLM to synthesize a coherent narrative and a text-to-image model to generate VSD-compliant high-context scene images. A within-subjects preliminary user study (N=6 PWA-caregiver pairs, mild-to-moderate nonfluent aphasia, mean age 56.5) compared NarrAid against a non-AI baseline VSD over two sessions per participant.
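As a rough illustration of how the six-element scaffold could feed the image-generation step, the sketch below collects one- or two-word answers per element and folds them into a scene prompt that carries the VSD compositional rules mentioned above. This is not the authors' implementation: `StoryDraft`, `build_scene_prompt`, and the prompt wording are all hypothetical, and the paper does not specify how NarrAid phrases its prompts.

```python
from dataclasses import dataclass, field

# The six narrative elements named in the paper, in the order it lists them.
NARRATIVE_ELEMENTS = ("time", "place", "people", "actions", "objects", "emotions")

# VSD compositional rules taken from the summary; the exact wording is an assumption.
VSD_CONSTRAINTS = (
    "contextualized scene; people engaged in the activity; "
    "no direct gaze at the camera"
)


@dataclass
class StoryDraft:
    """Answers collected so far. A skipped element is simply absent."""
    answers: dict = field(default_factory=dict)

    def record(self, element, words):
        """Store the words chosen (by voice or image card) for one element."""
        if element not in NARRATIVE_ELEMENTS:
            raise ValueError(f"unknown narrative element: {element}")
        cleaned = [w.strip() for w in words if w.strip()]
        if cleaned:  # an empty answer is treated like pressing "Skip"
            self.answers[element] = cleaned

    def missing(self):
        """Elements that have not been answered yet, in prompting order."""
        return [e for e in NARRATIVE_ELEMENTS if e not in self.answers]


def build_scene_prompt(draft):
    """Hypothetical prompt builder: answered elements first, then VSD rules."""
    parts = [
        f"{element}: {', '.join(draft.answers[element])}"
        for element in NARRATIVE_ELEMENTS
        if element in draft.answers
    ]
    return "; ".join(parts) + " | " + VSD_CONSTRAINTS


draft = StoryDraft()
draft.record("place", ["park"])
draft.record("people", ["daughter"])
print(draft.missing())
print(build_scene_prompt(draft))
```

Keeping skipped elements out of the draft, rather than storing placeholders, means the prompt only ever contains what the person actually expressed; in a real system an LLM would presumably expand these fragments into a coherent narrative before image generation.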

Key findings

On a 5-point scale, NarrAid scored substantially higher than the baseline VSD on overall usability (M=4.35 vs 2.57, p=0.02) and on every story-creation dimension assessed: overall experience (4.67 vs 2.67), complexity of story (4.33 vs 2.67), agency (4.67 vs 1.50), motivation (4.83 vs 1.83), and accuracy (4.50 vs 2.67), all p≤0.03. For storytelling, NarrAid was rated higher on confidence (4.17 vs 2.83) and ability (4.00 vs 2.17); engagement was also rated higher (4.50 vs 1.67), but the difference did not reach conventional significance (p=0.06). Qualitatively, PWA preferred selecting image cards over speech input because cards reduced linguistic load, and the structured six-element questioning let them answer with one or two words. Generated images established common ground with communication partners and reduced misunderstandings; ambiguity in generated images sometimes sparked useful conversation rather than impeding it. Limitations also surfaced: text-to-image models occasionally produced stereotyped or inaccurate depictions (gender mismatches caused more user confusion than race mismatches), and caregivers were hesitant to edit prompt text. The authors propose a generate-verify-display pipeline using a Multimodal LLM and content-safety APIs, plus human escalation for high-uncertainty outputs.
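The proposed generate-verify-display pipeline is described only at a high level, so the following is a minimal sketch of one way its control flow could work, not the authors' design. The `Verdict` fields, the confidence threshold, the retry limit, and the `generate` / `verify` callables (stand-ins for a text-to-image model and a Multimodal LLM plus content-safety check) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    consistent: bool   # image matches the narrative (e.g. the right people and actions)
    safe: bool         # image passed the content-safety check
    confidence: float  # verifier's confidence in its own judgment, in [0, 1]


@dataclass
class Outcome:
    action: str            # "display" or "escalate"
    image: Optional[str]   # image reference, or None if nothing passed
    attempts: int


def generate_verify_display(
    narrative: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    min_confidence: float = 0.7,  # hypothetical threshold
    max_attempts: int = 3,        # hypothetical retry limit
) -> Outcome:
    """Generate an image, verify it, and display, retry, or escalate."""
    for attempt in range(1, max_attempts + 1):
        image = generate(narrative)
        verdict = verify(narrative, image)
        if verdict.confidence < min_confidence:
            # High uncertainty: hand the candidate to a human reviewer.
            return Outcome("escalate", image, attempt)
        if verdict.consistent and verdict.safe:
            return Outcome("display", image, attempt)
        # Confidently rejected: fall through and regenerate.
    # Every attempt was rejected, so a human decides what happens next.
    return Outcome("escalate", None, max_attempts)
```

In this sketch a low-confidence verdict escalates immediately rather than retrying, on the assumption that an uncertain verifier is more likely to stay uncertain than a rejected image is to be regenerated correctly; a real deployment would need to tune both the threshold and who the human reviewer is (caregiver, clinician, or the user).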

Relevance

This paper is directly useful for accessibility practitioners working on AAC, aphasia, and AI-assisted communication. It articulates a concrete gap in current AAC: tools optimize for transactional needs while narrative, identity, and social connection go unsupported, and it shows that generative AI can plausibly fill that gap without requiring caregivers to curate personal photo libraries. The six-narrative-element scaffold and the categorized image-card taxonomy (Time, Places, People, Actions, Objects, Emotions) are transferable design patterns for any AAC tool aiming at richer expression. The paper also surfaces a real risk that any team deploying generative AI in assistive contexts must plan for: text-to-image models can amplify gender, occupation, and race stereotypes, which is particularly harmful when the output is meant to represent the user's own life. As an extended abstract with N=6 over short sessions, the evidence is preliminary; longitudinal deployment in home and therapy contexts is needed to assess whether benefits persist.

Tags: aphasia · augmentative and alternative communication · visual scene display · storytelling · generative AI · large language models · text-to-image · human-AI co-creation · assistive technology