Interactive audio documents

T. V. Raman, David Gries · 1994 · Proceedings of the First Annual ACM Conference on Assistive Technologies (Assets '94) · doi:10.1145/191028.191045

Summary

This paper describes the browsing component of AsTeR (Audio System for Technical Readings), an interactive computing system developed by T. V. Raman that audio-formats electronic documents written in LaTeX to produce navigable audio documents. The paper addresses a fundamental asymmetry in information access: a printed document is a passive object explored by an active reader who can skim, skip ahead, and jump between sections, whereas traditional audio makes the listener passive, because information flows past in a fixed linear order. AsTeR reverses this by enabling "active listening", in which the listener interactively controls what is heard. The system parses a LaTeX document into an internal attributed tree representation, then provides a browser that lets the user traverse this structure with simple tree-traversal commands (move to parent, child, or sibling). A key design insight is that visual browsing of printed documents, while it appears random, is actually directed by the underlying logical structure: section headings, paragraphs, displayed formulas. AsTeR makes this same structure navigable in audio. The system was implemented in CLOS (the Common Lisp Object System, an object-oriented extension of Lisp) and operated from an Emacs window, and the first author used it extensively to listen to computer science and mathematics texts.
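
The tree-traversal idea in the summary can be illustrated with a minimal sketch. This is not AsTeR's code (which was written in CLOS); the names Node, Browser, and summary, and the wording of the spoken summaries, are assumptions made for illustration. Each command is a bare verb that moves a current selection, and every move returns a short summary giving the type of the selected object and the object that contains it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity-based equality, so list.index() finds this exact
# node and never recurses through parent/children links.
@dataclass(eq=False)
class Node:
    """One document object: a section, a paragraph, a formula, and so on."""
    kind: str                                   # e.g. "section", "math"
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


class Browser:
    """Holds a current selection and moves it with verb-only commands."""

    def __init__(self, root: Node) -> None:
        self.current = root

    def _move(self, target: Optional[Node]) -> str:
        # If the move is impossible, stay put, but still report the position.
        if target is not None:
            self.current = target
        return self.summary()

    def _sibling(self, offset: int) -> Optional[Node]:
        up = self.current.parent
        if up is None:
            return None
        i = up.children.index(self.current) + offset
        return up.children[i] if 0 <= i < len(up.children) else None

    def parent(self) -> str:
        return self._move(self.current.parent)

    def child(self) -> str:
        kids = self.current.children
        return self._move(kids[0] if kids else None)

    def next_sibling(self) -> str:
        return self._move(self._sibling(+1))

    def previous_sibling(self) -> str:
        return self._move(self._sibling(-1))

    def summary(self) -> str:
        """Type of the selection, its context, and its text if it has any."""
        node = self.current
        where = f" in {node.parent.kind}" if node.parent else ""
        label = f": {node.text}" if node.text else ""
        return f"{node.kind}{where}{label}"


# Usage: a tiny document with one section holding a paragraph and a formula.
doc = Node("document")
sec = doc.add(Node("section", "Introduction"))
sec.add(Node("paragraph", "First paragraph"))
sec.add(Node("math", "x^2"))

browser = Browser(doc)
print(browser.child())         # section in document: Introduction
print(browser.child())         # paragraph in section: First paragraph
print(browser.next_sibling())  # math in section: x^2
print(browser.parent())        # section in document: Introduction
```

Because the commands name only a direction and never an object type, the same four verbs would apply unchanged if the tree under a math node were the parse tree of an expression rather than a list of paragraphs.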

Key findings

The browser uses a minimal command set of the form <verb> rather than <verb> + <noun>, which makes it flexible and extensible: the same commands work for both general document structure (sections, paragraphs) and complex mathematical expressions. When a browsing action is executed, AsTeR speaks a concise summary that includes context (where the current selection sits in the document) and type (what kind of object it is), which prevents the "lost in space" problem common when traversing complex structures non-visually. The system supports "relative renderings", in which an element can be spoken in its original audio context (e.g., a subscript rendered at a lower pitch); this works because the audio system state is saved with each document object. For cross-references, a major challenge in technical documents, AsTeR lets the listener assign a meaningful label to a referenced object (e.g., labeling "theorem 1.1" as "Fermat's last theorem"), so that later references use the human-meaningful name rather than an opaque number. The system also provides bookmarks for marking and returning to positions of interest. The authors report that AsTeR far surpassed audio cassettes from Recording for the Blind (RFB) in speed of availability, precision of rendering, and interactivity.
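
Two of these mechanisms, relative renderings and listener-assigned cross-reference labels, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: AudioState, Obj, annotate, speak, and CrossRefs are hypothetical names, the state holds only a pitch and a stereo position, and the 0.8 pitch factor for subscripts is an arbitrary choice.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AudioState:
    """Hypothetical speech parameters; the real system's state may differ."""
    pitch: float = 1.0   # relative pitch multiplier
    pan: float = 0.0     # stereo position, -1.0 (left) to 1.0 (right)


@dataclass
class Obj:
    kind: str
    text: str = ""
    children: List["Obj"] = field(default_factory=list)
    saved: Optional[AudioState] = None   # filled in by annotate()


def annotate(obj: Obj, state: AudioState = AudioState()) -> None:
    """Record on each object the audio state it would be spoken in."""
    if obj.kind == "subscript":
        # Assumed rule: subscripts are spoken at a lower pitch.
        state = replace(state, pitch=state.pitch * 0.8)
    obj.saved = state
    for child in obj.children:
        annotate(child, state)


def speak(obj: Obj, relative: bool = True) -> str:
    """Describe how the object alone would be spoken.

    With relative=True the saved state is restored, so a subscript that is
    selected on its own still sounds like a subscript.
    """
    use_saved = relative and obj.saved is not None
    state = obj.saved if use_saved else AudioState()
    return f"{obj.text} [pitch={state.pitch:.2f}]"


class CrossRefs:
    """Maps opaque reference numbers to labels chosen by the listener."""

    def __init__(self) -> None:
        self.labels: Dict[str, str] = {}

    def label(self, ref: str, name: str) -> None:
        self.labels[ref] = name

    def speak(self, ref: str) -> str:
        # Fall back to the printed reference when no label was assigned.
        return self.labels.get(ref, ref)


# Usage: the expression x_i, then a labeled theorem reference.
sub = Obj("subscript", "i")
expr = Obj("math", "x sub i", children=[Obj("variable", "x"), sub])
annotate(expr)
print(speak(sub))                  # i [pitch=0.80]
print(speak(sub, relative=False))  # i [pitch=1.00]

refs = CrossRefs()
refs.label("theorem 1.1", "Fermat's last theorem")
print(refs.speak("theorem 1.1"))   # Fermat's last theorem
print(refs.speak("theorem 2.3"))   # theorem 2.3
```

Storing the state on the object, rather than recomputing it from the path to the root at browse time, is one way to read "saving the audio system state with each document object"; either approach would give the same result for this small example.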

Relevance

This paper is a seminal contribution to document accessibility, particularly for STEM content. T. V. Raman, who is blind, built AsTeR from his lived experience of needing to access complex mathematical documents, which makes the work both a technical achievement and an example of disability-led innovation. The concept of treating document structure as a navigable tree that can be traversed through simple atomic actions laid groundwork for how modern screen readers handle structured documents. The insight that audio documents must be interactive, not just linear readings, anticipated features now standard in digital accessibility, from navigable headings in screen readers to structured navigation in EPUB. For practitioners working on mathematical or STEM accessibility today, AsTeR's approach to rendering mathematical notation through audio parameters (pitch, stereo positioning) rather than verbose descriptions remains influential and relevant to ongoing MathML accessibility work.

Tags: mathematical accessibility · audio formatting · screen readers · document accessibility · blind and low vision · text-to-speech · interactive browsing