ACES: Aphasia Emulation, Realism, and the Turing Test

Joshua Hailpern, Marina Danilevsky, Karrie Karahalios · 2011 · Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2011) · doi:10.1145/2049536.2049553

Summary

This paper validates the realism of ACES (Aphasia Characteristics Emulation Software), a system that distorts a user's instant messages to simulate the communication effects of aphasia, allowing caregivers, therapists, and family members to experience first-hand what it is like to communicate with a language disorder. A previous study with 64 participants had shown that ACES significantly increased empathy and awareness of aphasia, but it had not tested whether the generated distortions were realistic. ACES uses a configurable probabilistic model, grounded in the cognitive psychology and speech and hearing science literature, to apply distortions across five conceptual categories, and it can emulate different types and severities of aphasia. The system was calibrated for each individual with aphasia by adjusting parameters until the distorted output visually resembled that person's actual transcripts.

The paper presents two experiments with 24 participants from Speech and Hearing Science departments (students, faculty, and professionals, all specifically trained to identify and distinguish aphasia subtypes). The Aphasia Turing Test asked participants to label 24 text samples as "Human" or "Computer" in origin; half were actual transcripts from people with aphasia and half were ACES-generated distortions. The "How Human" Test asked participants to rate 24 pairs of distorted text on a 1-5 Likert scale from "Definitely Human" to "Definitely Computer."
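To make the idea of a configurable probabilistic distortion model concrete, the following is a minimal sketch and not the ACES implementation. The summary does not name the five distortion categories or their parameters, so the two distortions used here (word omission and word substitution), the `DistortionProfile` fields, and the filler word are all hypothetical stand-ins chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class DistortionProfile:
    """Hypothetical per-word probabilities; NOT the parameters ACES uses."""
    p_omit: float = 0.0        # chance of dropping a word entirely
    p_substitute: float = 0.0  # chance of swapping a word for a filler
    filler: str = "thing"      # placeholder substitute word (assumption)


def distort(message: str, profile: DistortionProfile, rng: random.Random) -> str:
    """Draw one random number per word and apply at most one distortion.

    Assumes p_omit + p_substitute <= 1.
    """
    out = []
    for word in message.split():
        roll = rng.random()  # uniform in [0, 1)
        if roll < profile.p_omit:
            continue  # word is omitted
        if roll < profile.p_omit + profile.p_substitute:
            out.append(profile.filler)  # word is replaced by the filler
        else:
            out.append(word)  # word passes through unchanged
    return " ".join(out)


# Example: a mild, made-up profile applied to one instant message.
mild = DistortionProfile(p_omit=0.1, p_substitute=0.1)
print(distort("I would like a cup of coffee please", mild, random.Random(42)))
```

In a sketch like this, raising or lowering the probabilities plays the role of the calibration step described above, in which parameters were adjusted until the output resembled a particular person's transcripts.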

Key findings

In the Turing Test, participants correctly classified text samples only 52.26% of the time overall, barely above the 50% chance level, and there was no statistically significant difference between their ability to identify human-generated and ACES-generated distortions (z=-0.75, p=0.46). In this sense ACES effectively passed the Turing Test: even speech and hearing science experts could not reliably distinguish computer-generated aphasic distortions from real ones. Results varied by aphasia type. Anomic aphasia samples were correctly identified 60% of the time when they came from the Human Group and 54% of the time when they came from the Computer Group (no significant difference). Agrammatic aphasia samples showed a surprising reversal: participants correctly identified only 41% of human-generated Agrammatic samples, significantly worse than chance (p=0.04), suggesting they may have found real Agrammatic aphasia harder to recognise than the computer emulation. In the "How Human" Test, overall ratings showed no significant difference between human and computer-generated distortions (means of 2.94 vs 3.05 on the 5-point scale, p=0.24). However, when stratified by aphasia type, Anomic ACES distortions were rated slightly more computer-like than human ones (p<0.001), while Agrammatic ACES distortions were paradoxically rated as more human than the actual human transcripts (p=0.02).
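As a rough check on what "barely above chance" means for the overall accuracy, the sketch below runs a two-sided z-test of a single proportion against 50%. It is not the test reported in the paper (the z=-0.75, p=0.46 figure compares the human and computer groups with each other). It also rests on two assumptions not stated in the summary: that all 24 participants labeled all 24 samples, giving 576 judgments, of which 301 correct matches the reported 52.26%; and that those judgments can be treated as independent, which repeated responses from the same participant are not, strictly speaking.

```python
import math


def proportion_z_test(correct: int, total: int, chance: float = 0.5):
    """Two-sided z-test of an observed proportion against a fixed chance level."""
    observed = correct / total
    standard_error = math.sqrt(chance * (1 - chance) / total)
    z = (observed - chance) / standard_error
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return observed, z, p_value


total = 24 * 24  # ASSUMPTION: 24 participants x 24 samples, all answered
correct = 301    # ASSUMPTION: inferred count; 301 / 576 rounds to 52.26%

observed, z, p_value = proportion_z_test(correct, total)
print(f"accuracy={observed:.4f}  z={z:.2f}  p={p_value:.2f}")
```

Under these assumptions z works out to 13/12 (about 1.08) with a two-sided p of roughly 0.28, which is consistent with the summary's description of overall accuracy as indistinguishable from guessing.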

Relevance

ACES represents an important approach to building empathy for people with communication disabilities through experiential simulation rather than description alone. The validation that expert judges cannot distinguish ACES distortions from real aphasic text strengthens the case for using such software in clinical training, caregiver education, and awareness programmes. For accessibility practitioners, this work highlights a broader principle: understanding a disability through first-hand experience (even simulated) can be more effective at building empathy than explanation or instruction. The technology could be adapted for training customer service representatives, healthcare workers, or emergency responders who interact with people with aphasia. The finding that even experts trained in aphasia diagnosis struggled to distinguish real from simulated distortions underscores both the quality of the emulation and the inherent variability and complexity of aphasic language production. The surprising result that Agrammatic ACES distortions were rated more human-like than actual Agrammatic transcripts raises interesting questions about how well clinical literature captures the full range of real-world aphasic communication.

Tags: aphasia · empathy · disability simulation · language disorders · instant messaging · Turing test · speech and hearing science · caregiver training