Giving Meaning to Movements: Challenges and Opportunities in Expanding Communication by Pairing Unaided AAC with Speech Generated Messages
Imran Kabir, Sharon Ann Redmon, Lynn R. Elko, Kevin Williams, Mitchell A. Case, Dawn J. Sowers, Krista Wilkinson, Syed Masum Billah · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3791273
Summary
Kabir and colleagues tackle a long-standing split in Augmentative and Alternative Communication (AAC). Aided AAC (speech-generating devices, symbol boards, tablet apps) produces standardised, intelligible output but is slow, visually demanding, and awkward when partners are out of line of sight. Unaided AAC (body-based gestures, facial expressions) is fast and natural but depends on familiar partners who share the user's idiosyncratic vocabulary. The authors aim to bridge the two by translating gestures captured with a wrist-worn IMU into speech output. The work grew out of 18 months of participatory design (PD) with three community advisors (A1-A3), all AAC users with motor and/or vision impairments, together with speech-language pathologists, assistive-technology (AT) developers, and designers. PD surfaced five target use cases (quick messages like "I need a drink", communication at a distance or around obstacles, emotion signalling, rapid onboarding of new caregivers, and optional wheelchair control) and four core technical challenges (synchronising video and sensor data for annotation, clutching to prevent false activations, automatic sensor-orientation calibration, and adapting to personal, context-dependent gesture repertoires). The result is AllyAAC, an Android app paired with a 9.4 g Movesense IMU that lets users or aides record template gestures, semi-automatically annotate them against synchronised video using MediaPipe, and train either a rule-based baseline (LCSS/K-means template matching) or a personalised Transformer-based deep model with a choice of three self-supervised pretraining strategies (contrastive learning, contrastive predictive coding (CPC), or masked reconstruction). The system was evaluated with 14 participants at the ATIA'25 conference (eight AAC users with motor impairments, two of whom also had visual impairments, plus six aides/researchers), producing a released dataset of more than 600,000 multimodal data points, reportedly the first IMU-plus-video corpus of atypical gestures from motor-impaired users.
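To make the rule-based baseline more concrete, the following is a minimal sketch of LCSS (longest common subsequence) template matching over IMU samples. It is not the authors' implementation: the matching tolerance `eps`, the index band `delta`, the acceptance `threshold`, and the function names are all assumptions chosen for illustration, and the K-means step mentioned above is not shown.

```python
import math
from typing import Dict, List, Optional, Sequence

Sample = Sequence[float]  # one IMU reading, e.g. (ax, ay, az)


def lcss_length(a: List[Sample], b: List[Sample], eps: float, delta: int) -> int:
    """Length of the longest common subsequence of a and b.

    Two samples "match" when their Euclidean distance is at most eps
    and their positions differ by at most delta (both are assumed values).
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = math.dist(a[i - 1], b[j - 1]) <= eps
            if close and abs(i - j) <= delta:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_similarity(a: List[Sample], b: List[Sample],
                    eps: float = 0.5, delta: int = 5) -> float:
    """LCSS length normalised by the shorter sequence, in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))


def classify(window: List[Sample], templates: Dict[str, List[Sample]],
             threshold: float = 0.7) -> Optional[str]:
    """Return the best-matching template label, or None if nothing is similar enough."""
    best_label, best_score = None, 0.0
    for label, template in templates.items():
        score = lcss_similarity(window, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Hypothetical templates and a noisy recording of the "drink" gesture.
templates = {
    "drink": [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 1)],
    "wave": [(5, 5, 5), (6, 5, 5), (5, 5, 5), (6, 5, 5)],
}
noisy_drink = [(0.1, 0, 1), (0, 1.1, 1), (1, 1, 0.9), (1, 0.1, 1)]
print(classify(noisy_drink, templates))
```

Because a single tolerance and threshold must serve every template, a matcher of this kind has little room to separate gestures whose trajectories overlap, which is in keeping with the baseline's weak results for users with multiple or overlapping gestures.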
Key findings
The rule-based baseline reached a mean F1 of 0.589 across participants but collapsed for users with multiple or overlapping gestures, dropping to 0.22 for P05 and 0.26 for P07. The personalised Transformer model, pretrained with one of three self-supervised strategies and fine-tuned on 3-20 annotated examples per gesture, reached a mean F1 of 0.871 (a gain of about 28 percentage points) and F1 > 0.82 for seven of eight motor-impaired participants. The best pretraining strategy varied by user: contrastive learning won for globally distinct gestures; CPC (contrastive predictive coding) won for gestures with longer distinctive trajectories (e.g., air-traced letters B/P/R); and masked reconstruction won when gestures differed in localised motion. A motor-impairment gap remained: non-disabled participants scored F1 = 0.972 versus 0.796 for motor-impaired users, and t-SNE visualisations showed gesture embeddings that overlapped for motor-impaired participants but were cleanly separable for non-disabled ones. In a human-in-the-loop review of the best model, six independent raters yielded a mean precision of 0.88 (inter-rater agreement AC1 = 0.92). The semi-automatic annotation pipeline achieved F1 = 0.95 (IoU = 0.88) against manual annotation and cut annotation time by 66% (10.2 s vs. 30.5 s per gesture). All three community advisors adopted AllyAAC into daily use alongside their existing AAC devices. Qualitative findings highlighted usability strengths (video + sensor annotation felt empowering) and friction points (data transfer, model import, sensor placement variability, and the need for clutching and sensor-orientation compensation).
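As an illustration of how annotation quality figures such as IoU and F1 can be computed for time segments, here is a minimal sketch that compares automatically proposed gesture segments with manual ones. The greedy matching rule and the `iou_threshold` of 0.5 are assumptions made for this example; the summary does not state which matching criterion the paper used.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of one annotated gesture


def interval_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def segment_f1(predicted: List[Segment], reference: List[Segment],
               iou_threshold: float = 0.5) -> float:
    """F1 where a predicted segment counts as correct if it overlaps an
    unmatched reference segment with IoU >= iou_threshold (assumed rule)."""
    matched = set()
    true_pos = 0
    for p in predicted:
        best_j, best_iou = None, 0.0
        for j, r in enumerate(reference):
            if j in matched:
                continue
            iou = interval_iou(p, r)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(reference) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


# Hypothetical example: one proposal lines up with a manual label, one does not.
auto = [(0.0, 2.0), (5.0, 7.0)]
manual = [(0.1, 2.1), (10.0, 12.0)]
print(interval_iou(auto[0], manual[0]), segment_f1(auto, manual))
```

Under this scheme, IoU describes how tightly a proposed boundary follows the manual one, while F1 describes how many gestures were found at all, so the two numbers answer different questions about the same pipeline.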
Relevance
For AAC practitioners and assistive-technology developers, this paper is one of the clearest demonstrations that the aided/unaided dichotomy is an artefact of tooling, not user need, and that personalised on-device ML can bridge the two without requiring users to conform their bodies to a generic gesture vocabulary. Concrete takeaways for broader design practice: publish IMU-plus-video datasets from disabled users to counteract the able-bodied bias in human-activity-recognition benchmarks (UCI-HAR, OPPORTUNITY, PAMAP2, UTD-MHAD, Berkeley MHAD); treat end-user annotation as an empowerment affordance but automate data transfer and model deployment because end-user programming burden was the sharpest complaint; include clutching, calibration, and error-recovery gestures as first-class features; and use self-supervised pretraining over unlabelled user recordings to make few-shot personalisation realistic. Important limitations to carry forward: 14 participants, mostly community-advisor-mediated; lab-style evaluation at a conference rather than longitudinal home deployment; no fine-grained hand/finger articulation (the IMU is wrist-only); and gesture meaning that shifts with body posture, seating, or surrounding activity remains brittle. The released dataset and code (github.com/Imran2205/AllyAAC) are a concrete resource for the disability AI community.
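One way to picture clutching as a first-class feature is a small gate between the recogniser and the speech output. The sketch below is purely illustrative and is not taken from AllyAAC: the class name `ClutchGate`, the dedicated `"clutch"` label, and the 3-second arming window are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClutchGate:
    """Only pass a recognised gesture to speech output if a clutch gesture
    was seen shortly before it (hypothetical design, not the paper's)."""
    clutch_label: str = "clutch"   # assumed name of the arming gesture
    window_s: float = 3.0          # assumed time the gate stays armed
    _armed_at: Optional[float] = None

    def feed(self, label: Optional[str], t: float) -> Optional[str]:
        """Take one recogniser output at time t; return a label to speak or None."""
        if label is None:
            return None  # nothing recognised in this window
        if label == self.clutch_label:
            self._armed_at = t  # arm (or re-arm) the gate
            return None
        armed = self._armed_at is not None and (t - self._armed_at) <= self.window_s
        self._armed_at = None  # one message per clutch
        return label if armed else None


gate = ClutchGate()
print(gate.feed("drink", 0.0))   # not armed yet, so suppressed
print(gate.feed("clutch", 1.0))  # arms the gate, nothing is spoken
print(gate.feed("drink", 2.0))   # armed and within the window, so passed on
```

The trade-off is an extra movement per message in exchange for fewer false activations, which is why the length of the arming window would likely need tuning per user.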
Tags: AAC · augmentative and alternative communication · motor impairment · speech impairment · wearable technology · IMU · gesture recognition · machine learning · personalization · participatory design · datasets