AccessComics2: Understanding the User Experience of an Accessible Comic Book Reader for Blind People with Textual Sound Effects

Yun Jung Lee, Hwayeon Joh, Suhyeon Yoo, Uran Oh · 2023 · ACM Transactions on Accessible Computing · doi:10.1145/3555720

Summary

This research investigates how to make digital comic books accessible to people with visual impairments through AccessComics, a web-based comic reader that provides scene descriptions, synthesized character voices, and sound effects. The study addresses a gap in accessibility research, which has focused heavily on photos and graphs while largely ignoring comics, a medium that combines visual art with sequential narrative.

The researchers conducted three studies. First, a formative online survey of 68 blind and low-vision participants explored their experiences with audiobooks and eBooks; it found that more than half had read comics but that accessibility barriers were significant. Second, an interview study with 8 blind participants evaluated the AccessComics prototype, which achieved a System Usability Scale score of 76.6 ("good"). Third, a comparative study with 16 participants (8 blind, 8 sighted) examined the effects of scene descriptions and textual sound effects across four conditions.

AccessComics is implemented as a cross-platform web application using HTML, JavaScript, and CSS, with Amazon Polly for text-to-speech. The system provides an introduction page with character descriptions and voice previews, configurable reading units (panel, strip, or page), and filtering options that let users choose which kinds of information they receive. Each character is assigned a different synthesized voice based on gender and age, while a narrator voice reads scene descriptions covering background composition, character positions, and facial expressions.
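The voice assignment, reading units, and information filtering described above can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation (which is a JavaScript web application using Amazon Polly). The data model, the voice labels, the age threshold of 13, and every function name are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional

NARRATOR_VOICE = "narrator"  # hypothetical label, not a real TTS voice ID


@dataclass
class Character:
    name: str
    gender: str  # "female" or "male" (assumed categories)
    age: int


@dataclass
class Element:
    kind: str  # "description" or "dialogue" (assumed information types)
    text: str
    speaker: Optional[str] = None  # character name, for dialogue only


@dataclass
class Panel:
    page: int
    strip: int
    elements: list


def pick_voice(character):
    """Choose a voice label from gender and age (the threshold is an assumption)."""
    age_group = "child" if character.age < 13 else "adult"
    return f"{age_group}_{character.gender}"


def group_panels(panels, unit):
    """Group panels, in reading order, into the chosen reading unit."""
    if unit not in ("panel", "strip", "page"):
        raise ValueError(f"unknown reading unit: {unit}")
    if unit == "panel":
        return [[p] for p in panels]
    groups = {}  # dicts keep insertion order, so reading order is preserved
    for p in panels:
        key = p.page if unit == "page" else (p.page, p.strip)
        groups.setdefault(key, []).append(p)
    return list(groups.values())


def build_queue(group, characters, enabled):
    """Return (voice, text) pairs for the enabled information types only."""
    queue = []
    for panel in group:
        for el in panel.elements:
            if el.kind not in enabled:
                continue  # the user filtered this information type out
            if el.kind == "dialogue":
                voice = pick_voice(characters[el.speaker])
            else:
                voice = NARRATOR_VOICE
            queue.append((voice, el.text))
    return queue


# Illustrative data, not taken from the study.
characters = {"Mia": Character("Mia", "female", 9),
              "Tom": Character("Tom", "male", 40)}
panels = [
    Panel(1, 1, [Element("description", "Mia stands at the door, smiling."),
                 Element("dialogue", "I'm home!", "Mia")]),
    Panel(1, 2, [Element("dialogue", "Welcome back.", "Tom")]),
]
pages = group_panels(panels, "page")
queue = build_queue(pages[0], characters, enabled={"dialogue"})
```

Here a reader who selects the page unit and turns scene descriptions off would hear only the two dialogue lines, each in its character's voice. In a real reader, each queued pair would be passed to a text-to-speech service.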

Key findings

The study found that scene descriptions significantly improved both immersion and situation understanding. The two conditions that included scene descriptions (SW-desc and SE-desc) received significantly higher ratings than those without (p < .05). Participants reported that without descriptions they could not concentrate on the content or understand "what's going on." Thirteen of 16 participants preferred versions with scene descriptions.

For textual sound effects (onomatopoeia such as "BOOM" and "THUMP"), most participants found actual sound effects more realistic and immersive than hearing the words read aloud by a synthesized voice. Even so, only sound effects combined with scene descriptions (SE-desc) were rated significantly higher for situation understanding than every other condition, which suggests that sound effects alone cannot compensate for missing scene descriptions.

Participants identified specific information they wanted when reading comics: character details (appearance, personality, relationships) and a story preview were requested most often (10/16 each), followed by speech balloon information (9/16), character posture (5/16), and facial expressions (5/16). Notably, facial expressions were considered essential because "sometimes characters talk about something serious or make a vague statement" and readers "cannot tell if they actually mean it" without knowing the expression.

The survey found that preferences for audiobooks and eBooks were nearly evenly split, with each format having distinct advantages: audiobooks offer emotional voice acting and background sounds, while eBooks allow speed control, spelling verification, and keyword search.
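The four conditions of the comparative study can be read as a 2x2 design: scene descriptions on or off, crossed with onomatopoeia either read aloud as a word or replaced by an actual sound effect. The sketch below shows one way to express that design. Only the labels SW-desc and SE-desc appear in this summary; the labels "SW" and "SE" for the conditions without descriptions, the event tuples, the ordering of events, and the sound file names are assumptions made for illustration.

```python
from itertools import product


def condition_label(sound_mode, with_description):
    """sound_mode: "SW" (word read aloud) or "SE" (audio sound effect)."""
    return f"{sound_mode}-desc" if with_description else sound_mode


# Cross the two factors to obtain the four conditions.
CONDITIONS = [condition_label(mode, desc)
              for mode, desc in product(("SW", "SE"), (False, True))]


def render_panel(description, dialogue, onomatopoeia, condition):
    """Return the playback events for one panel under a given condition."""
    sound_mode, _, suffix = condition.partition("-")
    events = []
    if suffix == "desc":
        # The scene description is spoken only in the "-desc" conditions.
        events.append(("speak", "narrator", description))
    for speaker, line in dialogue:
        events.append(("speak", speaker, line))
    for word in onomatopoeia:
        if sound_mode == "SE":
            # Hypothetical file naming: one audio clip per onomatopoeia.
            events.append(("play", f"{word.lower()}.mp3"))
        else:
            # The word itself is read aloud by a synthesized voice.
            events.append(("speak", "narrator", word))
    return events


# Illustrative panel content, not taken from the study.
events = render_panel(
    description="A door slams behind Tom.",
    dialogue=[("Tom", "Who's there?")],
    onomatopoeia=["BOOM"],
    condition="SE-desc",
)
```

Comparing the output for "SE" and "SE-desc" shows what the finding above turns on: the sound effect is identical in both, and the only difference is whether the narrator first explains the scene.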

Relevance

This research provides a practical framework for making comics and other sequential visual narratives accessible. The finding that scene descriptions are essential rather than optional has implications for any visual medium in which context beyond dialogue matters. Comics publishers and platform developers should consider accessibility from the start rather than treating it as an afterthought.

The design implications offer concrete guidance: accessible comic readers should support a variety of character voices, customizable reading speed compatible with screen reader preferences, auto-reading at larger units (pages rather than panels), filtering options for different information types, and scene descriptions that include character appearance, posture, and facial expressions. The study also suggests that textual sound effects (onomatopoeia) can be converted to actual audio for a more immersive experience.

For practitioners, this work demonstrates that accessibility features benefit both blind and sighted users: the study found no significant differences between the two groups in their preferences for scene descriptions and sound effects. This supports universal design principles, under which accessibility enhancements improve the experience for everyone. The research also highlights opportunities for automation, using computer vision and scene understanding to generate descriptions and map sound effects at scale.
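One possible approach to mapping textual sound effects to audio at scale is to normalize each onomatopoeia before looking it up in a sound library, and to fall back to reading the word aloud when no match exists. The sketch below assumes a hypothetical library and hypothetical file names. The normalization rule (lowercase, drop non-letters, collapse repeated letters) is an illustrative heuristic and not a method reported in the study.

```python
import re


def normalize(word):
    """Reduce stylized spellings such as "BOOOOM!!" to a canonical key."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    # Collapse runs of a repeated letter so "boooom" and "boom" match.
    return re.sub(r"(.)\1+", r"\1", letters)


# Hypothetical sound library; keys are normalized once at load time.
_RAW_LIBRARY = {"boom": "explosion.wav", "thump": "thump.wav"}
SOUND_LIBRARY = {normalize(key): path for key, path in _RAW_LIBRARY.items()}


def resolve(word):
    """Return a play event if a clip is known, otherwise a speak event."""
    path = SOUND_LIBRARY.get(normalize(word))
    if path is not None:
        return ("play", path)
    return ("speak", word)  # fallback: read the onomatopoeia aloud
```

With this rule, `resolve("BOOOOM!!")` and `resolve("boom")` reach the same clip, while an unknown word such as "ZAP" is still read aloud rather than dropped. Collapsing repeated letters can merge distinct words, so a production system would need a more careful matching strategy.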

Tags: comics · audiobooks · eBooks · screen readers · blind users · scene descriptions · sound effects · onomatopoeia · media accessibility