Annotation-Based Transcoding for Nonvisual Web Access

Chieko Asakawa, Hironobu Takagi · 2000 · Proceedings of the Fourth International ACM Conference on Assistive Technologies (Assets '00) · doi:10.1145/354324.354588

Summary

This companion paper to the same authors' transcoding proxy paper (also at Assets '00) gives a detailed technical description of the annotation-based component of their IBM Japan transcoding system. The core problem is visual fragmentation: web pages organize content into visual groupings using background colors, layout tables, spacing, and horizontal lines. Sighted users perceive these groupings at a glance, but blind users reading content in HTML tag order cannot. Two specific problems follow. First, visually fragmented groupings cannot be recognized through tag-order reading, because content from different visual groups may be interleaved in the HTML. Second, the roles of groups (header, navigation, main content, advertisement) are conveyed through visual design rather than semantic markup. The system relies on two types of annotations, created by sighted volunteers with a WYSIWYG authoring tool. Structural annotations identify visually fragmented groupings, using XPath expressions to define group membership; they also assign each group a role (proper content, header, footer, advertisement, general index, updated index, layout table, etc.) and an importance value from -1 to 1. Commentary annotations provide textual descriptions of HTML elements and are particularly useful for images without alt text, image maps without area labels, and forms whose purpose is conveyed only through images. The architecture consists of a proxy server built on IBM WebSphere Transcoding Publisher, an annotation manager that matches URLs to annotation files, and an annotation server that receives files from volunteers and stores them in a database.
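
The two annotation types described above can be pictured as small records keyed by an XPath expression. The following is a minimal sketch and not the authors' file format: the class names, field names, and the `apply_commentary` helper are hypothetical, and it assumes the page is well-formed enough to parse as XML and that a commentary annotation is applied by filling in a missing `alt` attribute. Python's `xml.etree.ElementTree` supports only a subset of XPath, which is enough for simple positional paths.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class StructuralAnnotation:
    """Hypothetical record for one visually fragmented grouping."""
    xpath: str         # selects the element(s) that form the group
    role: str          # e.g. "proper content", "header", "advertisement"
    importance: float  # the summary gives the range as -1 to 1

    def __post_init__(self):
        # Reject values outside the range stated in the summary.
        if not -1.0 <= self.importance <= 1.0:
            raise ValueError(f"importance out of range: {self.importance}")


@dataclass(frozen=True)
class CommentaryAnnotation:
    """Hypothetical record attaching a textual description to one element."""
    xpath: str
    comment: str


def apply_commentary(root, notes):
    """Set each comment as alt text on the first element its XPath selects.

    Assumption: existing alt text is left untouched, and annotations whose
    XPath no longer matches anything are skipped. Returns the number applied.
    """
    applied = 0
    for note in notes:
        target = root.find(note.xpath)
        if target is not None and not target.get("alt"):
            target.set("alt", note.comment)
            applied += 1
    return applied


# Illustrative page fragment (invented for this sketch).
PAGE = (
    "<html><body><table><tr>"
    '<td><img src="logo.gif"/></td>'
    "<td><p>Story text</p></td>"
    "</tr></table></body></html>"
)

root = ET.fromstring(PAGE)
group = StructuralAnnotation("./body/table/tr/td[2]", "proper content", 0.8)
note = CommentaryAnnotation("./body/table/tr/td[1]/img", "Site logo")
apply_commentary(root, [note])
```

Keying both record types on an XPath keeps the annotations external to the page, which is what lets volunteers describe content they do not own.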

Key findings

The annotation system has several notable technical features. Because structural annotations identify page elements with XPath, they can survive minor page updates. The system can also check DOM tree similarity between the annotated page and other pages on the site, so one annotation file can cover multiple similar pages through wildcard URL notation. The transcoding module reorders page content based on group roles and importance values: groups annotated as "proper content" with high importance (0.8) are placed at the top, while advertisements with negative importance (-0.8) are moved to the bottom. Between groups the system inserts delimiters, implemented as zero-sized images whose alternative text announces the role, along with "end of group" markers to aid navigation. Commentary annotations address a persistent gap in web content: the paper shows examples from CNN.com and AltaVista where forms, image maps, and navigation elements are completely opaque to screen readers because their purpose is communicated solely through images. The WYSIWYG authoring tool lets annotators select visual groupings by clicking objects on the rendered page and generates the XPath expressions automatically, which makes annotation feasible for non-technical volunteers. The paper also proposes using Cascading Style Sheets to produce transcoded pages that closely resemble the originals, addressing copyright concerns about creating modified copies of web pages.
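
The reordering and delimiter steps can be illustrated with a short sketch. It is an approximation under stated assumptions rather than the system's actual transcoding module: the `Group` record, the function names, the `spacer.gif` file name, and the choice to close every group with an "end of group" marker are all hypothetical, and the wildcard matcher assumes that `*` stands for any run of characters, since the summary does not specify the notation.

```python
import re
from dataclasses import dataclass
from html import escape


@dataclass
class Group:
    """Hypothetical record for one annotated group, already cut out of the page."""
    role: str
    importance: float  # -1.0 to 1.0, as assigned by the annotator
    html: str


def url_matches(pattern: str, url: str) -> bool:
    """Return True if url matches pattern, treating '*' as a wildcard.

    Every other character, including '?' and '.', is matched literally.
    """
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, url) is not None


def delimiter(text: str) -> str:
    """Build a zero-sized image whose alt text a screen reader will announce."""
    alt = escape(text, quote=True)
    return f'<img src="spacer.gif" width="0" height="0" alt="{alt}">'


def transcode(groups: list[Group]) -> str:
    """Emit groups from most to least important, wrapped in delimiters."""
    # sorted() is stable, so groups with equal importance keep page order.
    ordered = sorted(groups, key=lambda g: g.importance, reverse=True)
    parts = []
    for group in ordered:
        parts.append(delimiter(group.role))      # announces the role
        parts.append(group.html)
        parts.append(delimiter("end of group"))  # closing marker
    return "\n".join(parts)


# Illustrative input, using the importance values quoted in the summary.
page_groups = [
    Group("advertisement", -0.8, "<p>Banner</p>"),
    Group("proper content", 0.8, "<p>Story</p>"),
]
if url_matches("http://example.com/news/*", "http://example.com/news/a.html"):
    print(transcode(page_groups))
```

Sorting on a single numeric importance value keeps the annotator's job simple: moving a group up or down the transcoded page only requires changing one number.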

Relevance

This paper is a significant early attempt to solve a problem that persists today: the gap between visual page structure and semantic document structure. The concept of visually fragmented groupings that are invisible to screen readers anticipated the later development of WAI-ARIA landmarks and HTML5 semantic elements (header, nav, main, footer, aside), which essentially formalize the roles that this system assigned through external annotations. The crowdsourced annotation approach foreshadowed later projects like WebAnywhere and Social Accessibility, and the insight that annotation should be as simple as possible to encourage volunteer participation applies to any crowdsourced accessibility effort. For modern practitioners, the paper is a reminder that even with semantic HTML5 and ARIA available, many sites still rely on visual layout rather than semantic structure. The commentary annotation concept, which supplies descriptions for visual-only elements, is directly analogous to modern image description crowdsourcing efforts.

Tags: web accessibility · screen reader · transcoding · non-visual access · blindness and low vision · annotations · content adaptation · web navigation · crowdsourcing · authoring tools

Standards referenced: WCAG 1.0 · HTML 4.0 · WAI Guidelines