Tag Thunder: Towards Non-Visual Web Page Skimming

Elena Manishina, Jean-Marc Lecarpentier, Fabrice Maurel, Stéphane Ferrari, Maxence Busson · 2016 · Proceedings of the 18th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '16) · doi:10.1145/2982142.2982152

Summary

This demonstration paper presents Tag Thunder, an audio equivalent of visual tag clouds designed to bring skimming capabilities to blind web users. When sighted people access a web page, they first skim the content, rapidly scanning layout, typography, and visual emphasis to identify relevant information before reading in detail. Screen reader users lack access to these visual quick-reading strategies and must instead navigate sequentially through content, which is far less efficient. Tag Thunder addresses this by extracting key terms from a web page and vocalizing them simultaneously using concurrent speech, leveraging the Cocktail Party Effect: the human ability to selectively focus on one voice among many concurrent speakers. The system has three modules: (1) page segmentation, which uses K-means++ clustering to group visible HTML elements into zones based on Euclidean distance and computed CSS styles; (2) key term extraction, which combines TF-IDF scores with visual prominence (font size, weight, color) to identify the most important terms in each zone; and (3) tag thunder vocalization, which places the extracted terms on an audio track, with spatial positioning and audio effects reflecting the visual position and properties of each zone on the page.
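To make the segmentation step concrete, the following is a minimal sketch of K-means++ clustering over page elements. It is not the authors' implementation. The feature vector (x, y, font size in pixels), the sample elements, and the function name kmeans_pp are assumptions chosen for illustration; the paper states only that zones are formed from Euclidean distance and computed CSS styles.

```python
import math
import random


def kmeans_pp(points, k, iterations=20, seed=0):
    """Cluster feature vectors: k-means++ seeding, then Lloyd iterations."""
    rng = random.Random(seed)
    # Seeding: first centre is uniform; each later centre is drawn with
    # probability proportional to its squared distance to the nearest centre.
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        if sum(d2) == 0:  # fewer distinct points than k: stop early
            break
        centres.append(rng.choices(points, weights=d2, k=1)[0])

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres


# Hypothetical elements: (x centre, y centre, font size), all in pixels.
elements = [
    (500.0, 20.0, 32.0), (520.0, 25.0, 28.0),    # header-like
    (50.0, 400.0, 14.0), (60.0, 420.0, 14.0),    # sidebar-like
    (600.0, 400.0, 16.0), (620.0, 430.0, 16.0),  # main-content-like
]
zone_labels, zone_centres = kmeans_pp(elements, k=3)
print(zone_labels)
```

Because coordinates and font sizes sit on different scales, a real system would probably normalize or weight the features before clustering; this sketch leaves them raw to stay short.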

Key findings

This is a short demonstration paper that primarily describes the system architecture rather than presenting detailed evaluation results. The system segments pages into configurable zones, extracts key terms weighted by both textual importance (TF-IDF) and visual styling, and produces a concurrent audio output in which multiple terms are spoken simultaneously, analogous to how a tag cloud presents multiple terms visually at once. Evaluation results are described as showing "the viability of the tag thunder concept," though specific metrics are not detailed in this two-page paper. The implementation was available online for demonstration. The concept builds on two established research areas, content summarization and concurrent speech synthesis, and combines them into a novel non-visual content overview strategy.
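One way to picture the key term weighting is the sketch below, which multiplies a smoothed TF-IDF score by a visual prominence factor. The paper does not specify how the two signals are combined, so the multiplication, the 1.5 bold multiplier, the 16 px base size, and the function name key_terms are all assumptions. Color, which the paper also mentions, is left out for brevity.

```python
import math
from collections import Counter


def key_terms(zone, all_zones, top_n=3, base_size=16.0):
    """Rank terms in one zone by TF-IDF times a visual prominence factor.

    Each zone is a list of (term, font_size_px, is_bold) tuples, and every
    zone is treated as one document when computing document frequency.
    """
    tokens = [term for term, _, _ in zone]
    documents = [{term for term, _, _ in z} for z in all_zones]
    counts = Counter(tokens)

    # Hypothetical prominence: relative font size, boosted when bold.
    prominence = {}
    for term, size, bold in zone:
        weight = (size / base_size) * (1.5 if bold else 1.0)
        prominence[term] = max(prominence.get(term, 0.0), weight)

    scores = {}
    for term, count in counts.items():
        df = sum(1 for doc in documents if term in doc)
        idf = math.log((1 + len(documents)) / (1 + df)) + 1.0  # smoothed IDF
        scores[term] = (count / len(tokens)) * idf * prominence[term]

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


zone_a = [("accessibility", 32, True), ("the", 16, False),
          ("the", 16, False), ("web", 16, False)]
zone_b = [("the", 16, False), ("news", 16, False)]
print(key_terms(zone_a, [zone_a, zone_b]))
```

In this toy input, the large bold term outranks the more frequent but visually plain one, which is the behavior the combined weighting is meant to produce.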

Relevance

Tag Thunder addresses one of the most significant efficiency gaps between sighted and blind web browsing: the inability to skim. While screen reader navigation shortcuts (jumping by headings, landmarks, etc.) partially mitigate sequential reading, they do not replicate the rapid, parallel processing that visual skimming provides. The Cocktail Party Effect approach is a creative application of auditory perception research to accessibility — rather than presenting information sequentially (the screen reader paradigm), it presents key information concurrently and relies on the listener's attentional mechanisms to focus on relevant terms. For accessibility practitioners, this highlights that the sequential nature of screen reader output is itself an accessibility barrier, not just a necessary feature of audio interfaces. The integration of visual properties (font size, weight) into term importance scoring is notable because it bridges visual design intent with non-visual presentation — a designer's decision to make a heading large and bold translates into greater audio prominence in the tag thunder output.
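As an illustration of how visual position and styling could carry over into audio, the sketch below maps a zone's horizontal position to left and right channel gains with constant-power panning and scales loudness by font size. This is a generic audio technique, not the mapping used in the paper, which is not detailed there. The function names, the 16 px base size, and the loudness formula are assumptions.

```python
import math


def stereo_gains(zone_center_x, page_width):
    """Constant-power pan: page left edge maps to the left channel only,
    page right edge to the right channel only."""
    position = min(max(zone_center_x / page_width, 0.0), 1.0)
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


def place_term(term, zone_center_x, page_width, font_size, base_size=16.0):
    """Return per-channel gains for one spoken term (hypothetical mapping)."""
    left, right = stereo_gains(zone_center_x, page_width)
    # Assumed loudness rule: body-size text plays at half gain, and text at
    # twice the base size or larger plays at full gain.
    loudness = min(font_size / (2 * base_size), 1.0)
    return {"term": term, "left": left * loudness, "right": right * loudness}


print(place_term("accessibility", 500, 1000, 32))
```

With constant-power panning the squared gains sum to one at every position, so a term does not sound quieter merely because its zone sits in the middle of the page.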

Tags: blindness · screen reader · web accessibility · speech · natural language processing · sonification · auditory interface · content accessibility