Who is speaking: Unpacking In-text Speaker Identification Preference of Viewers who are Deaf and Hard of Hearing while Watching Live Captioned Television Program

Akhter Al Amin, Joseph Mendis, Raja Kushalnagar, Christian Vogler, Matt Huenerfauth · 2023 · Proceedings of the 20th International Web for All Conference (W4A '23) · doi:10.1145/3587281.3587286

Summary

This study systematically investigates how the number of speakers onscreen in live television programs affects Deaf and Hard of Hearing (DHH) viewers' preferences among in-text speaker identification methods in captions. Live TV news and interviews often feature multiple people speaking with rapid turn-taking, which makes it difficult for DHH viewers reading captions to track who is saying what. Broadcasters use various in-text methods to indicate speaker changes, including double chevrons (>>), speaker names, text color changes, voice narration labels, and emojis, but no prior study had systematically examined how the number of speakers influences which method DHH viewers prefer. The researchers conducted an empirical study with 17 DHH participants (ages 19-27, recruited at Gallaudet University) who watched 12 videos spanning four speaker identification methods (Double Chevron with Speaker's Name, Color, Voice Narration, and Emoji) and three speaker counts (2, 4, and 6 speakers). Videos were sourced from CNN, FOX, CBS, Good Morning America, and Good Morning Britain, with content selected to avoid political or emotionally charged topics. A Graeco-Latin square design was used to counterbalance ordering effects. Each video was followed by three subjective questions (easy to follow, enjoyable, useful) and one open-ended question, and a semi-structured interview concluded the study.
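To illustrate what Graeco-Latin square counterbalancing looks like, the following minimal sketch builds an order-4 Graeco-Latin square from two orthogonal Latin squares. It is not the authors' procedure: this summary does not say how the four methods and three speaker counts were mapped onto the square, so the participant groups, viewing positions, and the `VIDEO_SETS` labels below are hypothetical.

```python
# Elements of GF(4) are encoded as 0..3; addition is bitwise XOR and
# GF4_TIMES_2[i] is i multiplied by the field element "2".
GF4_TIMES_2 = [0, 2, 3, 1]

METHODS = ["Double Chevron with Speaker's Name", "Color", "Voice Narration", "Emoji"]
VIDEO_SETS = ["set A", "set B", "set C", "set D"]  # hypothetical labels, not from the study


def graeco_latin_square_4():
    """Return a 4x4 grid of (first, second) index pairs.

    The first components form one Latin square (r + c in GF(4)) and the
    second components form an orthogonal one (2*r + c in GF(4)), so every
    ordered pair occurs exactly once in the grid.
    """
    return [[(r ^ c, GF4_TIMES_2[r] ^ c) for c in range(4)] for r in range(4)]


def schedule():
    """Map index pairs to labels: row = participant group, column = viewing position."""
    return [[(METHODS[a], VIDEO_SETS[b]) for a, b in row]
            for row in graeco_latin_square_4()]


if __name__ == "__main__":
    for group, row in enumerate(schedule(), start=1):
        print(f"group {group}: {row}")
```

In such a design every method appears once in each viewing position and once with each video set, so order effects and material effects are balanced across groups.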

Key findings

A non-parametric two-way ANOVA (Aligned Rank Transform) revealed a significant interaction between speaker identification method and number of onscreen speakers for the "Enjoyable" and "Easy-to-follow" questions. In other words, preferences shift as speaker count changes. "Double Chevron with Speaker's Name" showed an upward trend: preference increased as speakers increased from 2 to 4 to 6, making it the preferred method for complex multi-speaker scenarios. Conversely, "Voice Narration" and "Emoji" showed downward trends, becoming less effective as speaker count grew. For 2 speakers, "Voice Narration" was preferred for ease of following. For 4 speakers, "Voice Narration" was still preferred, but "Color" emerged as a viable alternative. For 6 speakers, "Emoji" was rated ineffective and should be avoided. Qualitative analysis revealed trade-offs for each method. Color-coding reduces cognitive effort and avoids extra text but raises concerns for viewers with color vision deficiency. Double chevron with names aids comprehension but disrupts reading fluency during rapid turn-taking. Voice narration is space-efficient, but numbered labels ("Speaker 1") become confusing when many similar-looking speakers are onscreen. Emojis were expected to convey emotion rather than identity, which caused confusion when they were used as speaker identifiers.
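For readers unfamiliar with the Aligned Rank Transform, the sketch below shows only its first two steps for the interaction effect in a two-factor design: align each response by removing the estimated main effects, then assign average ranks. It is a simplified illustration under stated assumptions, not the authors' analysis code. It ignores the repeated-measures structure of the study, omits the final ANOVA on the ranks, and uses invented toy ratings (`TOY_ROWS`) with placeholder method names.

```python
from collections import defaultdict
from statistics import fmean


def align_for_interaction(rows):
    """Align responses for the A x B interaction.

    rows: list of (level_a, level_b, response).
    aligned = residual + estimated interaction effect, where
    residual = y - cell mean and
    interaction = cell mean - mean(A level) - mean(B level) + grand mean.
    """
    cell, by_a, by_b = defaultdict(list), defaultdict(list), defaultdict(list)
    for a, b, y in rows:
        cell[(a, b)].append(y)
        by_a[a].append(y)
        by_b[b].append(y)
    grand = fmean([y for _, _, y in rows])
    aligned = []
    for a, b, y in rows:
        cell_mean = fmean(cell[(a, b)])
        residual = y - cell_mean
        interaction = cell_mean - fmean(by_a[a]) - fmean(by_b[b]) + grand
        # Round so that floating-point noise does not break ties when ranking.
        aligned.append(round(residual + interaction, 9))
    return aligned


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Invented toy ratings (method, speaker count, rating); NOT data from the study.
TOY_ROWS = [
    ("method_a", 2, 5), ("method_a", 2, 4), ("method_a", 6, 2), ("method_a", 6, 1),
    ("method_b", 2, 3), ("method_b", 2, 3), ("method_b", 6, 5), ("method_b", 6, 4),
]

if __name__ == "__main__":
    aligned = align_for_interaction(TOY_ROWS)
    print(list(zip(aligned, average_ranks(aligned))))
```

In a full analysis the ranks would be fed into a factorial ANOVA, and only the effect the data were aligned for (here, the interaction) would be interpreted.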

Relevance

This research provides the first evidence-based decision framework for captioners and broadcasters selecting speaker identification methods based on the number of onscreen speakers. The practical guidelines are immediately actionable: use voice narration for 2-speaker scenarios, consider color or double chevron with names for 4 speakers, and avoid emoji for 6 speakers, the largest count tested. The qualitative findings surface important accessibility considerations: color-coding may exclude viewers with color vision deficiency, and gender-based voice narration labels ("Female Voice") should be avoided unless the speakers' gender identity is known. The study also highlights a fundamental tension in caption design between providing enough information to identify speakers and maintaining reading fluency. For captioning standards bodies, the findings suggest that current guidelines need updating to treat speaker count as a factor when selecting a speaker identification method. The study is limited by its university-based recruitment at Gallaudet, and future work should explore preferences across different video genres, devices, and demographic groups.
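The guidance reported in this summary can be sketched as a small lookup keyed by the speaker counts the study tested. This is an illustrative encoding, not a tool from the paper: the `Guidance` type and `suggest` function are hypothetical names, the 4-speaker entry merges the statements in the Key findings and Relevance sections, and falling back to the nearest tested count for untested values is an assumption that the study does not validate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Guidance:
    consider: Tuple[str, ...]
    avoid: Tuple[str, ...]
    tested: bool = True  # False when extrapolated from the nearest tested count


# Speaker counts tested in the study, with the guidance as reported in this summary.
TESTED = {
    2: Guidance(("voice narration",), ()),
    4: Guidance(("voice narration", "color", "double chevron with name"), ()),
    6: Guidance(("double chevron with name",), ("emoji",)),
}


def suggest(speaker_count: int) -> Guidance:
    """Return guidance for a speaker count.

    Untested counts fall back to the nearest tested count (ties go to the
    smaller count) and are flagged with tested=False. That fallback is an
    assumption of this sketch, not a finding of the study.
    """
    if speaker_count < 1:
        raise ValueError("speaker_count must be at least 1")
    if speaker_count in TESTED:
        return TESTED[speaker_count]
    nearest = min(TESTED, key=lambda n: (abs(n - speaker_count), n))
    base = TESTED[nearest]
    return Guidance(base.consider, base.avoid, tested=False)


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        print(n, suggest(n))
```

Keeping the `tested` flag explicit makes it clear to a caller when a recommendation goes beyond the 2, 4, and 6 speaker conditions that participants actually rated.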

Tags: deaf and hard of hearing · captioning · live captioning · television accessibility · caption quality · caption customization · media accessibility · user preferences · Deaf community

Standards referenced: DCMP Captioning Key · BBC Mobile Accessibility Guidelines