Deaf Individuals' Views on Speaking Behaviors of Hearing Peers when Using an Automatic Captioning App

Matthew Seita, Matt Huenerfauth · 2020 · Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (CHI EA '20) · doi:10.1145/3334480.3383083

Summary

This CHI 2020 Late-Breaking Work paper investigates what behaviors hearing speakers should ideally exhibit when holding in-person conversations with Deaf or deaf people using an Automatic Speech Recognition (ASR) captioning app on a mobile device. The authors position the study within a broader research thread on ASR accessibility. Prior work has shown that ASR accuracy is improving rapidly and that apps like Google Live Transcribe offer a convenient channel for impromptu conversations where professional captioning or sign-language interpretation is not available. It has also shown that hearing people tend to modify their speech (loudness, articulation, rate) when they know they are being transcribed or are speaking to a Deaf or hard of hearing (DHH) partner. The gap the authors address is whether DHH users actually notice these behavior shifts and whether they view them favorably. Eight Deaf/deaf participants at Rochester Institute of Technology, all ASL-primary signers with strong English literacy, held short scripted conversations with a hearing actor who systematically varied six behaviors (speech rate, voice intensity, over-enunciation, eye contact, gesturing, and intermittent pausing) while the participant used Google Live Transcribe on a 10-inch tablet. A native-ASL-signing researcher conducted interviews; participants then reported which behaviors they noticed and which they preferred, and assigned a 1-to-10 priority score to each category. A Friedman rank-sum test with post-hoc Wilcoxon tests was used to compare priority scores across the six behavior categories.
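
For readers unfamiliar with the analysis named in the summary, the following is a minimal pure-Python sketch of how a Friedman statistic could be computed over a participants-by-behaviors table of priority scores, along with the list of behavior pairs that a post-hoc step would compare. It is not the authors' analysis script. The function names are hypothetical, the tie correction is omitted, and no p-value is computed (that would require comparing the statistic against a chi-square distribution with k - 1 degrees of freedom, or using a library routine such as `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon`).

```python
from itertools import combinations

# The six behavior categories named in the summary.
BEHAVIORS = [
    "speech rate", "voice intensity", "over-enunciation",
    "eye contact", "gesturing", "intermittent pausing",
]


def average_ranks(scores):
    """Rank one participant's scores (1 = lowest), averaging ranks across ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square statistic, without tie correction.

    `table` has one row per participant and one column per behavior.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 * sum_sq / (n * k * (k + 1)) - 3.0 * n * (k + 1)


def posthoc_pairs(behaviors=BEHAVIORS):
    """All unordered behavior pairs a post-hoc pairwise test would compare."""
    return list(combinations(behaviors, 2))
```

As a sanity check on the formula: if all eight participants gave the same strict ordering of the six behaviors, the statistic would reach its maximum of n(k - 1) = 40, and if every participant gave all six behaviors the same score it would be 0. Six categories yield 15 unordered pairs for the post-hoc comparisons.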

Key findings

Noticeability varied considerably by behavior. Of the eight participants, all eight noticed gesturing and six noticed both eye contact and over-enunciation, whereas only two noticed voice intensity and five noticed speech rate or intermittent pausing. On priority scores, the only statistically significant pairwise difference was that participants prioritized Speech Rate over Voice Intensity (p = 0.001). Open-ended comments gave rich qualitative detail. Participants wanted a moderate, natural speech rate: neither too fast (ASR errors pile up) nor too slow (it feels patronizing). Voice intensity mattered less: one cochlear implant user asked explicitly not to be shouted at because loudness does not help clarity. Over-enunciation was actively disliked by speechreaders, who said exaggerated lip shapes made lipreading harder and felt rude. Eye contact was strongly valued; participants said its absence made them feel the speaker was sad or disengaged. Gesturing was helpful only when purposeful; constant gesturing distracted from lipreading and reading the screen. Occasional, short pauses after sentences were valued because they gave the participant time to read the caption and respond, but long pauses disrupted flow.

Relevance

For designers and evaluators of ASR-based captioning apps (Google Live Transcribe, Otter, Ava, Microsoft Translator, and similar), this study argues that optimizing purely for recognition accuracy misses an important piece of the user experience: the communication behaviors of the hearing speaker. It provides a practical, DHH-centered list of desirable hearing-speaker behaviors (natural rate, normal volume, minimal over-enunciation, strong eye contact, purposeful gesturing, brief sentence-end pauses) that could inform in-app coaching, training materials for hearing colleagues, or evaluation protocols. The findings also caution that behaviors triggered in the hearing speaker by ASR use may raise ASR accuracy while simultaneously harming the DHH user's subjective experience, exposing a trade-off that future app design should negotiate explicitly. Limitations include the small sample (n=8), the scripted single-behavior-at-a-time design, and the controlled lab setting; the findings need replication in naturalistic, multi-behavior workplace conversations.

Tags: automatic speech recognition · deaf and hard of hearing · captioning · captions · speaking behavior · mobile accessibility · communication · user research · accessibility research