Lessons from Developing Audio HTML Interfaces
Frankie James · 1998 · Proceedings of the Third International ACM Conference on Assistive Technologies (Assets '98) · doi:10.1145/274497.274504
Summary
This paper presents the AHA (Audio HTML Access) framework, a set of principles for choosing sounds to use in audio-based HTML interfaces designed for blind and visually impaired users. The research builds on earlier work at Stanford University exploring how web content can be rendered entirely through audio rather than visual display. James developed the AHA framework through iterative user studies comparing different audio marking techniques applied to HTML document structures. The first study used a Wizard of Oz format with mock-up interfaces to test various marking techniques on novice users, while the second study tested more experienced users with fully implemented browser prototypes. The framework identifies three core principles for sound selection in audio interfaces: Vocal Source Identity (using different voices to distinguish content types), Recognizability (ensuring sounds are immediately identifiable), and Distraction (minimising cognitive overload from audio cues). The paper demonstrates how these principles interact with user-specific factors such as tasks, goals, background knowledge, and browsing context. Two detailed user scenarios illustrate how the framework can guide the design of personalised audio interfaces: one for a blind Stanford undergraduate navigating academic content, and another for a sighted professional using audio browsing while multitasking. The research represents an early and systematic attempt to establish design guidelines for non-visual web browsing.
Key findings
The studies revealed that the number of distinct speaking voices in an audio interface should be kept small — interfaces using many voices caused confusion rather than clarity. Context switches between speakers were generally not appropriate for marking items within a coherent text section, though they worked well for signalling structural boundaries. Sound identity proved more important than salient feature identity for reducing distraction; users who could quickly recognise a sound spent less cognitive effort processing it. The research found that the same HTML structures may require different audio treatments depending on the user population — novice users needed more explicit marking while experienced users preferred subtler cues. Heading levels were effectively conveyed through tonal sequences mapped to pitch contours, with each heading level assigned a distinct pitch pattern. Link markings were most effective when kept brief and unobtrusive, as users encountered them frequently. The paper also identified that musical sounds generally rated lower for distraction than speech-based markers for structural elements, though speech remained preferable for conveying semantic content.
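As a rough illustration of the heading and link findings above, the sketch below assigns each HTML heading level its own tone sequence and gives links a single brief cue. It is a minimal sketch, not the mapping used in the paper: the base pitch, the interval, the one-tone-per-level scheme, the 50 ms link click, and the names heading_tones and cue_for are all assumptions made here for illustration.

```python
import re

# All constants below are illustrative assumptions, not values from the paper.
BASE_HZ = 880.0          # assumed starting pitch (A5)
STEP = 2 ** (-2 / 12)    # assumed interval: one whole tone down per note


def heading_tones(level):
    """Return a hypothetical tone sequence (in Hz) for heading level 1-6.

    Level n gets n descending tones, so every level has a distinct pattern.
    """
    if not 1 <= level <= 6:
        raise ValueError("HTML heading levels run from 1 to 6")
    return [round(BASE_HZ * STEP ** i, 1) for i in range(level)]


def cue_for(tag):
    """Map an HTML tag name to a hypothetical non-speech audio cue."""
    tag = tag.lower()
    match = re.fullmatch(r"h([1-6])", tag)
    if match:
        # Headings: a distinct pitch pattern per level.
        return {"kind": "tones", "hz": heading_tones(int(match.group(1)))}
    if tag == "a":
        # Links occur often, so the cue is kept brief and unobtrusive.
        return {"kind": "click", "ms": 50}
    # Everything else is left unmarked to limit distraction.
    return {"kind": "none"}


if __name__ == "__main__":
    for tag in ["h1", "h3", "a", "p"]:
        print(tag, cue_for(tag))
```

Leaving most elements unmarked is a deliberate choice in this sketch, in keeping with the finding that a small number of well-chosen cues worked better than many distinct markers. A real design would tune these choices per user, as the framework recommends.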
Relevance
This 1998 paper is a foundational contribution to audio interface design for web accessibility. While screen readers have evolved significantly since this research, the AHA framework's core principles remain relevant to modern auditory display design. The finding that fewer, well-chosen audio cues outperform an abundance of distinct markers applies directly to contemporary screen reader verbosity settings and notification design. The emphasis on user-centred customisation — recognising that different users need different audio presentations based on expertise and context — anticipates modern accessibility personalisation efforts. For practitioners designing audio-first or voice-first interfaces today, including voice assistants and audio descriptions, the framework offers a structured approach to sound selection that balances informativeness with cognitive load. The research also demonstrates the value of iterative, user-involved design in assistive technology, a methodology that remains best practice in accessibility work.
Tags: audio interfaces · non-visual web access · sonification · speech synthesis · blind users · auditory display · HTML accessibility · sound design
Standards referenced: HTML