Exploring Community-Driven Descriptions for Making Livestreams Accessible

Daniel Killough, Amy Pavel · 2023 · ASSETS '23: Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3597638.3608425

Summary

This paper investigates the feasibility of using livestream community members (sighted viewers with domain expertise in the content being streamed) to provide real-time descriptions that make livestreams accessible to viewers with visual impairments. Livestreams pose accessibility challenges that recorded video does not: they combine multiple simultaneous visual streams (main video, webcams, on-screen overlays, and chat), they are unedited and often hours long, and their content is highly domain-specific (e.g., competitive gaming, digital art, cooking). Traditional audio description approaches designed for recorded media cannot be applied directly, because they rely on post-production editing and on fitting descriptions into natural pauses. The researchers conducted two complementary studies. In the first, 18 sighted community members with domain expertise described livestreams across seven categories (four video game genres, chess, digital art, and makeup) using three methods: synchronous text, synchronous audio, and asynchronous text. The researchers built a Chrome extension, the Describer Extension, that let participants type timestamped descriptions alongside a Twitch livestream player. In the second study, 9 livestream viewers with visual impairments shared their current viewing practices and challenges, then evaluated the community-written descriptions in a co-watching exercise. All nine participants in this audience study used screen readers; five were blind and four had low vision.

Key findings

Sighted community members produced an average of 21.9 descriptions per 5-minute video clip, for a total of 1,183 descriptions across 54 sessions. Describers used two notable strategies: "state descriptions" that provided high-level context (56 descriptions) and "play-by-play descriptions" that captured real-time updates (215 descriptions). Describers frequently used domain-specific terminology and shorthand (e.g., "dair" for "down air" in fighting games) to keep pace with fast action. Asynchronous text was the most preferred method (mean rank 1.5), followed by synchronous audio (2.0) and synchronous text (2.33); the difference between asynchronous and synchronous text was statistically significant. However, synchronous audio produced significantly more descriptions per minute (6.28, versus 4.22 for asynchronous text and 3.20 for synchronous text) and more words per minute. Viewers with visual impairments rated the accessibility of videos at 2.3 out of 7 without descriptions and 5.2 with them. Viewers reported that livestreams were most accessible when streamers narrated clearly and had good audio quality, and inaccessible when a streamer's speech diverged from their visual actions, when on-screen text went undescribed, or when unidentified sounds played from overlays. Viewers wanted descriptions to prioritize the main content first, then the streamer's appearance and facial expressions. They preferred text descriptions, which they could control through screen reader settings, over voice descriptions, which contained filler words and awkward pauses.
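The rate metrics above (descriptions and words per minute) are simple ratios over logs of timestamped descriptions like those the Describer Extension collects. The following is a minimal Python sketch, assuming a hypothetical `Description` record and hypothetical helper names (the paper's actual data format is not described in this summary); the sample descriptions are illustrative only. It also checks the summary's own arithmetic that 1,183 descriptions over 54 sessions averages about 21.9 per clip.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Description:
    """One timestamped description (hypothetical schema, not the paper's)."""
    timestamp_s: float  # seconds into the clip when the description was entered
    text: str
    method: str  # e.g. "sync_text", "sync_audio", "async_text"


def descriptions_per_minute(descs: Sequence[Description], duration_s: float) -> float:
    """Number of descriptions divided by session length in minutes."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(descs) / (duration_s / 60.0)


def words_per_minute(descs: Sequence[Description], duration_s: float) -> float:
    """Total whitespace-separated words divided by session length in minutes."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total_words = sum(len(d.text.split()) for d in descs)
    return total_words / (duration_s / 60.0)


# Illustrative (made-up) log for one 5-minute clip, using shorthand like "dair".
session = [
    Description(3.0, "Player lands a dair offstage", "sync_text"),
    Description(41.5, "Score is now 2-1", "sync_text"),
]

print(descriptions_per_minute(session, 300))  # 2 descriptions / 5 min = 0.4
print(words_per_minute(session, 300))         # 9 words / 5 min = 1.8

# Average descriptions per clip from the totals reported above.
print(round(1183 / 54, 1))  # 21.9
```

Keeping the raw timestamped records, rather than only counts, would also allow a rate to be computed over any time window, for example to compare bursts of play-by-play descriptions during fast action.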

Relevance

This research opens an important new direction for making live video content accessible, a problem that will only grow as livestreaming becomes more central to online social interaction, education, and entertainment. The community-sourcing model is pragmatic because it draws on people who already have the domain knowledge needed to describe specialized content, whereas professional audio describers would need extensive preparation. For accessibility practitioners, the study shows that viewers with visual impairments already use creative workarounds to access livestreams (asking questions in chat, relying on friends, monitoring notification sounds), but that these strategies are insufficient. The finding that even imperfect community descriptions substantially improved perceived accessibility (from 2.3 to 5.2 out of 7) suggests that platforms should invest in description tools even before they can achieve professional-quality output. The work also highlights platform-level accessibility barriers: participants with visual impairments largely avoided Twitch because of its poor screen reader support, and preferred YouTube despite its fewer live features.

Tags: audio description · livestreaming · blind and low vision · crowdsourcing · video accessibility · community sourcing · screen readers