Remotely Co-Designing Features for Communication Applications using Automatic Captioning with Deaf and Hearing Pairs

Matthew Seita, Sooyeon Lee, Sarah Andrew, Kristen Shinohara, Matt Huenerfauth · 2022 · Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI '22) · doi:10.1145/3491102.3501843

Summary

This CHI 2022 paper addresses two intertwined problems. The first is methodological: how can co-design research involving both Deaf/Hard-of-Hearing (DHH) and hearing participants be conducted remotely during and beyond COVID-19, when in-person sessions are not possible and masks interfere with speechreading and sign reading? The second is substantive: what features should be built into communication applications supported by automatic speech recognition (ASR), both for mobile use in impromptu workplace conversations and for videoconferencing tools like Zoom, to make them genuinely accessible to DHH users? The authors ran 18 two-hour online co-design workshops, each pairing one DHH participant with one hearing participant (36 participants in total, recruited through Rochester Institute of Technology and NTID channels). Half the pairs designed mobile ASR apps (for post-COVID in-person use) and half designed videoconferencing ASR features. Sessions used Zoom for conversation, American Sign Language (ASL) interpreters, and the Miro online whiteboard for collaborative sketching, and followed a structured four-phase procedure: individual brainstorming, a dessert-drawing icebreaker, individual sketching, and collaborative prototyping. Two design dimensions framed the work: (1) how errors in ASR output should be indicated and corrected, and (2) how to implement notification systems that nudge hearing speakers to adjust their speaking behaviors (speed, volume, clarity). Data came from video recordings, exit interviews, and the Miro prototypes the pairs produced, and were analyzed through iterative inductive coding by two researchers.

Key findings

Three successful mixed-ability remote communication strategies emerged. (1) Hearing Draw and DHH Reinforce/Critique (5 pairs): the hearing partner narrated while drawing and the DHH partner watched and commented, with Miro and Zoom positioned side by side on the DHH participant's screen. (2) Agree First then Divide Responsibility (3 pairs): pairs discussed until they reached consensus, then split the drawing duties. (3) Simultaneous Prototyping Alongside Text-Chat (2 pairs): both partners drew while chatting in the text channel, which was fast but required strong typing skills. Communication modalities varied across the 18 pairs: 4 communicated orally, 8 relied on the ASL interpreter, 4 primarily used text chat, and 2 mixed interpreter and chat.

Prototype outcomes: For error correction in ASR captions, all but one pair agreed that mistranscribed words should be visually indicated, but there was no consensus on how. Suggestions included squiggly underlines, colored boxes, triangle icons, colored text, highlighting, and bolding; correction methods ranged from re-recording verbally to typing, dictionary suggestions, and pop-ups with suggested words. For speaker-behavior notifications, three visual-salience tiers emerged: Icons (low salience, least disruptive), Pop-ups (medium), and full-video Overlays (high salience, most likely to get the hearing speaker's attention but also most disruptive to DHH viewers reading captions). A central tension is that notifications must be salient enough for hearing speakers to notice yet unobtrusive enough not to block captions or lipreading for DHH users. Mobile and videoconferencing prototypes converged on very similar features, differing mainly in interaction affordances.
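The salience tiers and the tension between noticeability and obtrusiveness can be made concrete with a small sketch. This is an illustration only, not something the paper implements: the names `Level`, `NotificationTier`, `pick_tier`, and `escalate` are hypothetical, the numeric levels are assumed orderings, the "medium" disruption value for pop-ups is inferred from their middle position, and the escalation policy is an assumption about how a designer might use the tiers.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Ordinal scale used for both salience and disruption (assumed)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class NotificationTier:
    name: str
    salience: Level    # how likely the hearing speaker is to notice it
    disruption: Level  # how much it interferes with reading captions


# The three tiers named in the findings, ordered from least to most salient.
# Pop-up disruption is an inferred value, not reported in the summary.
TIERS = (
    NotificationTier("icon", Level.LOW, Level.LOW),
    NotificationTier("pop-up", Level.MEDIUM, Level.MEDIUM),
    NotificationTier("overlay", Level.HIGH, Level.HIGH),
)


def pick_tier(max_disruption: Level) -> NotificationTier:
    """Return the most salient tier whose disruption fits the budget."""
    allowed = [t for t in TIERS if t.disruption <= max_disruption]
    return max(allowed, key=lambda t: t.salience)


def escalate(current: NotificationTier) -> NotificationTier:
    """Hypothetical policy: step up one tier if a nudge was ignored."""
    index = TIERS.index(current)
    return TIERS[min(index + 1, len(TIERS) - 1)]


if __name__ == "__main__":
    start = pick_tier(Level.LOW)
    print(start.name, "->", escalate(start).name)
```

Under these assumptions, a DHH user who does not want captions obscured would set a low disruption budget and start with an icon; the application would move to a pop-up or overlay only if the speaker kept ignoring the nudge.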

Relevance

For accessibility researchers, this paper contributes a concrete, reusable methodology for running mixed-ability remote co-design workshops, useful not just during pandemics but whenever geographic dispersion, scheduling, or mobility would prevent in-person participation. Its specific recommendations are directly actionable: pre-label workspace frames, use a dual-screen or side-by-side setup for DHH participants, prioritize DHH communication preferences, give participants a choice of ASL interpretation or text chat, and consider a dedicated designer-sketcher role. For product designers of Zoom, Microsoft Teams, Google Meet, Live Transcribe, Otter, and similar tools, the emergent prototypes offer a DHH-informed feature shortlist: visual indicators for mistranscribed words, in-place correction affordances, and tiered notification systems for speaker-behavior nudges. The finding that mobile and videoconferencing prototypes converged suggests that cross-modality design consistency is achievable. Limitations: the two-hour sessions left little time for partners to get to know each other and to learn Miro; the 36 participants were young, tech-literate, and university-affiliated; and the social and environmental factors shaping the prototypes were not explicitly probed.

Tags: automatic speech recognition · deaf and hard of hearing · participatory design · co-design · videoconferencing · captioning · captions · mixed-ability · accessibility research · remote research