Co-Designing Multimodal Systems for Accessible Asynchronous Dance Instruction

Ujjaini Das, Shreya Kappala, Meng Chen, Mina Huh, Amy Pavel · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3791376

Summary

This paper investigates how to design multimodal systems that make asynchronous dance instruction accessible to blind and low vision (BLV) learners. While online exercise videos have proliferated, particularly since COVID-19, dance tutorials rely heavily on visual demonstrations that BLV users cannot see, and audio description alone struggles to convey the continuous mechanics, timing, fluidity, and expressiveness that dance requires. The authors argue that asynchronous settings amplify these challenges because learners cannot ask questions or receive real-time instructor adaptation. To address this, the team ran three in-person co-design workshops with 28 participants: 13 BLV dancers, 11 dance teachers, and 4 technical experts in sound design, haptics, and audio description. Each three-hour workshop followed a five-stage structure: pre-session survey, introduction, small-group teaching of a 2.5-minute dance clip, small-group system design using prototyping materials (3D figures, a custom soundboard, tactile stickers, voice recorders), and group presentations with physical re-enactments. The dance styles covered were salsa, contemporary, hip-hop, pop, and cha-cha. Eight groups produced eight distinct system designs. The authors performed thematic analysis across transcripts, survey results, and recordings to identify shared strategies for conveying movement structure, timing, and expressiveness, and to surface how verbal narration, non-verbal audio, and haptic or tactile input can complement one another across learning and practice phases.

Key findings

Across all eight co-designed systems, participants converged on several strategies. First, groups built multimodal movement vocabularies that combined named movements with verbal descriptions, sound-to-movement mappings, and haptic cue patterns so that complex routines could be referenced as shorthand. Second, staged learning emerged as essential: early learning phases prioritized detailed verbal descriptions of mechanics, orientation, and local body-part motion, while later practice phases emphasized non-verbal sound and haptic cues, eventually layered with the actual dance music. Third, participants assigned modalities to specific roles: verbal narration for movement structure; sound for rhythm, pacing, transitions, and emotional expression; and haptics for spatial cues, limb placement, direction, weight distribution, and timing. Tactile metaphors (e.g., "your hand is a sword slicing through the air") were the most common metaphor type. Groups warned that overloading haptic cues increased cognitive load and that mapping too many motion parameters to sound reduced clarity. The authors distilled 14 design implications (DI1-DI14) spanning movement structure, timing and expressiveness, phased instruction, feedback, and customization, including the need to tailor systems to dance style and learner skill level.

Relevance

This work is highly relevant for accessibility practitioners building instructional media, assistive technologies, or audio description pipelines for movement-heavy content. It expands the scope of video accessibility beyond descriptive narration by showing how sound design and wearable or off-body haptics can carry information that words alone cannot convey: timing, fluidity, weight, and expressiveness. Practitioners working on exercise, fitness, and educational video platforms can apply the staged-learning model and the verbal/sound/haptic division of labor directly. The paper also reinforces that customization by prior experience, dance style, and modality preference is not optional but foundational. Limitations include the exploratory nature of the co-design outcomes (no high-fidelity prototype evaluation), an in-person workshop setting that does not fully mirror true asynchronous use, and limited coverage of improvisational dance forms. Future empirical validation of the design implications in deployed systems would strengthen the findings.

Tags: blind and low vision · audio description · haptics · multimodal instruction · co-design · dance education · video accessibility · sound design