The HearSay Non-Visual Web Browser
Yevgen Borodin, Jalal Mahmud, I. V. Ramakrishnan, Amanda Stent · 2007 · Proceedings of the 2007 International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1243441.1243444
Summary
This paper presents the original HearSay non-visual web browser (version 1/2), developed at Stony Brook University in collaboration with the Helen Keller Services for the Blind. HearSay is a free, open-source, cross-platform browser written in Java that uses Mozilla for web rendering, FreeTTS for text-to-speech, and CMU Sphinx for speech recognition. Its most innovative feature is context-directed browsing: when the user follows a link, HearSay automatically starts reading the destination page from the most relevant section rather than from the beginning, so the user can skip banners, ads, and menus. Conventional screen readers, by contrast, read content sequentially from the top of the page. The browser uses a geometrical clustering algorithm to partition web pages into meaningful segments (menus, ads, tables, articles) between which users can navigate, giving them a structural view of the page rather than a flat sequential one. User interaction is managed by an extensible VoiceXML dialog system (VXMLSurfer) that supports voice commands, text commands, and keyboard shortcuts, treating non-visual web navigation as a dialog between user and browser.
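To make context-directed browsing concrete, here is a minimal sketch that picks the segment to start reading from. It assumes the destination page has already been split into text segments, and it assumes relevance is plain bag-of-words cosine similarity between the followed link's context (for example, its anchor text) and each segment. The names `Segment`, `tokens`, `cosine`, and `start_segment` are hypothetical, and the similarity measure is a stand-in: this summary does not say which relevance model HearSay actually uses.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical tiny stop list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is"}


@dataclass
class Segment:
    label: str  # hypothetical segment type, e.g. "menu", "ad", "article"
    text: str   # the segment's text content


def tokens(text: str) -> Counter:
    """Lower-cased bag of words with stop words removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def start_segment(link_context: str, segments: list[Segment]) -> int:
    """Index of the segment to start reading from.

    Falls back to 0 (the top of the page) when there are no segments
    or when no segment shares any word with the link context.
    """
    if not segments:
        return 0
    ctx = tokens(link_context)
    scores = [cosine(ctx, tokens(s.text)) for s in segments]
    best = max(range(len(segments)), key=lambda i: scores[i])
    return best if scores[best] > 0 else 0


if __name__ == "__main__":
    page = [
        Segment("menu", "Home News Sports Weather Contact"),
        Segment("ad", "Buy cheap flights now"),
        Segment("article", "Storm hits coast: the hurricane made landfall on Tuesday"),
    ]
    i = start_segment("Hurricane makes landfall on the coast", page)
    print(page[i].label)  # article
```

Falling back to the top of the page when nothing matches keeps the behavior no worse than a conventional screen reader; a real system would likely use a richer relevance model and the link's surrounding text as well.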
Key findings
HearSay provides two navigation modes tailored to different levels of user experience: continuous-reading mode, which keeps reading without stopping as users navigate and suits experienced users, and pausing mode, which stops after each item or sentence to await instructions and is better for novices. The browser marks page elements either with verbal announcements or with earcons; earcons avoid ambiguity when words such as "link" appear in the page text itself. The approximate keyword search feature reuses the context-browsing algorithm to locate the page section most relevant to the search terms, going beyond simple text matching. Form filling is standardized: textbox values appear in a large editable input window whose size and color can be customized for low-vision users, while radio buttons and combo boxes are presented as navigable lists. The system supports multiple user profiles, each with its own settings, history, and favorites. The VoiceXML-based interface architecture makes HearSay highly extensible: experienced users can reassign shortcuts, modify actions, change system feedback, and create new menus. To avoid interfering with other accessibility software running at the same time, the browser automatically pauses speech when its window loses focus.
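The two reading modes and the two ways of marking element types can be sketched as a function that builds a speech queue. This is a minimal illustration under stated assumptions, not HearSay's actual API: the element kinds, the `<earcon:link>`-style placeholders (standing in for short sounds), and the `WAIT` marker (standing in for "stop and wait for the user's next command") are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical placeholders: a real browser would play short sounds instead.
EARCONS = {"link": "<earcon:link>", "heading": "<earcon:heading>"}
# Hypothetical marker meaning "stop speaking and wait for the next command".
WAIT = "<wait>"


@dataclass
class Item:
    kind: str  # hypothetical element kind: "text", "link", "heading", ...
    text: str


def render(item: Item, use_earcons: bool) -> str:
    """Announce an element's type with a sound cue or with a spoken word."""
    if item.kind == "text":
        return item.text
    if use_earcons:
        return f"{EARCONS.get(item.kind, '<earcon:other>')} {item.text}"
    return f"{item.kind}: {item.text}"  # verbal announcement, e.g. "link: Sports"


def speech_queue(items: list[Item], pausing: bool, use_earcons: bool) -> list[str]:
    """Build the sequence of utterances for a run of page items.

    Continuous-reading mode emits the items back to back; pausing mode
    inserts WAIT after each item so reading stops until the user acts.
    """
    queue = []
    for item in items:
        queue.append(render(item, use_earcons))
        if pausing:
            queue.append(WAIT)
    return queue


if __name__ == "__main__":
    items = [Item("text", "See the link below."), Item("link", "Sports")]
    print(speech_queue(items, pausing=False, use_earcons=False))
    # ['See the link below.', 'link: Sports']
    print(speech_queue(items, pausing=True, use_earcons=True))
    # ['See the link below.', '<wait>', '<earcon:link> Sports', '<wait>']
```

With verbal announcements, the page text "See the link below." and the announcement "link: Sports" both contain the spoken word "link"; in the earcon variant the element type is carried by a sound instead, which removes that overlap.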
Relevance
This paper documents the foundational version of HearSay, which evolved through versions 2 and 3 (covered in later W4A papers) into an increasingly sophisticated platform for non-visual web access. Context-directed browsing, which automatically identifies and jumps to the most relevant content section when a link is followed, addressed a real and persistent pain point: on every page load, screen reader users spend significant time navigating past repetitive headers, navigation menus, and advertisements. The idea is related to author-provided skip-to-content links and anticipated the ARIA landmark navigation that later became standard accessibility practice; HearSay, however, finds the relevant section automatically rather than relying on page authors to mark it up. The dialog-based interaction model using VoiceXML was forward-thinking, treating web browsing as a conversation rather than passive content consumption. The collaboration with Helen Keller Services for the Blind helped ground the research in real user needs. For accessibility researchers, HearSay demonstrates the value of purpose-built non-visual browsers, which can implement intelligent navigation strategies that are difficult to achieve with generic screen readers layered on top of visual browsers. Its open-source, cross-platform design philosophy also offered a model for accessible technology development.
Tags: non-visual web browser · screen readers · blind users · web accessibility · natural language processing · machine learning · VoiceXML · speech recognition · navigation · assistive technology · open source
Standards referenced: VoiceXML