Wizard-of-Oz evaluation of speech-driven web browsing interface for people with vision impairments

Vikas Ashok, Yevgen Borodin, Svetlana Stoyanchev, Yury Puzis, I. V. Ramakrishnan · 2014 · Proceedings of the 11th Web for All Conference (W4A) · doi:10.1145/2596695.2596699

Summary

This paper presents a Wizard-of-Oz study with 24 blind participants to evaluate the usability and effectiveness of speech-driven web browsing as an alternative to traditional keyboard-based screen reader interaction. The study was motivated by three key shortcomings of current screen reader browsing: browsing fatigue from excessive keyboard presses, cognitive overload from memorising many shortcuts, and information overload from listening to irrelevant content sequentially. Participants completed six realistic web tasks — shopping on Amazon, filling out a university application, booking flights on Kayak, reserving hotel rooms, searching job listings on Monster.com, and sending email on Gmail — under three conditions: keyboard only, voice only, and a combination of both. In the voice condition, participants spoke unrestricted natural language commands ranging from low-level ("click the search button") to high-level ("buy this product"), which were secretly interpreted and executed by a human wizard using a screen reader, simulating an ideal speech interface. The study used the Capti Narrator screen reader with Apple's "Alex" text-to-speech voice at 220 words per minute. Participants believed they were interacting with a real system — many expressed interest in getting a copy.
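To make the difference between low-level and high-level commands concrete, here is a minimal sketch of how a spoken command might expand into browser steps. It is an illustration only: in the study a human wizard did the interpreting, not software. The `Step` type, the `HIGH_LEVEL_PLANS` table, the `interpret` function, and the specific steps listed for "buy this product" are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One low-level browser action (hypothetical representation)."""
    action: str  # e.g. "click"
    target: str  # spoken description of the page element


# Hypothetical expansion of a high-level goal into low-level steps.
# The paper does not specify these steps; they are assumed for illustration.
HIGH_LEVEL_PLANS = {
    "buy this product": [
        Step("click", "add to cart button"),
        Step("click", "proceed to checkout button"),
    ],
}


def interpret(command: str) -> list[Step]:
    """Map a spoken command to a list of low-level steps.

    High-level commands are looked up in the plan table; a low-level
    "click the ..." command maps to a single step. Anything else
    returns an empty list, where a real system would ask for clarification.
    """
    text = command.strip().lower()
    if text in HIGH_LEVEL_PLANS:
        return list(HIGH_LEVEL_PLANS[text])
    prefix = "click the "
    if text.startswith(prefix):
        return [Step("click", text[len(prefix):])]
    return []


if __name__ == "__main__":
    print(interpret("click the search button"))
    print(interpret("buy this product"))
```

A low-level command yields one step, while a high-level command yields several. That is the burden the wizard took off participants in the voice condition.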

Key findings

The voice condition was significantly faster than keyboard across all groups, with overall mean completion times of 231 seconds (voice) versus 358 seconds (keyboard), a reduction of about 35%. System Usability Scale (SUS) scores strongly favoured voice (82.2) over keyboard (46.4), and the difference was statistically significant across all demographic groups. Notably, 67% of participants preferred voice-only over the combination condition, even though combination offered strictly more capability; participants found switching between modalities confusing and distracting. In the combination condition, voice was the first modality used most of the time, and in 29 of 48 tasks the keyboard was never touched. Four expert users who attempted keyboard shortcuts in combination mode got lost, switched to voice, and never returned to the keyboard. Participants over 40 preferred voice-only even more strongly than younger users. The study also produced a dialog corpus of natural language web browsing commands. Post-study questionnaires showed near-universal enthusiasm: average agreement of 4.79/5 for "I would like to use voice-enabled browsing in the future" and 4.68/5 for "I wish working with a web browser was like working with a human assistant." Some expert users still preferred the keyboard for form filling, and participants noted that voice would be impractical in shared office spaces for privacy reasons.
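For readers unfamiliar with SUS, the sketch below shows the standard System Usability Scale scoring rule (ten items rated 1 to 5, giving a score from 0 to 100) and the relative differences implied by the means reported above. The function names are assumptions made for this example. Only the four reported means come from the summary, and the sample questionnaire responses in the usage lines are invented.

```python
def sus_score(responses):
    """Score one SUS questionnaire: ten 1-5 ratings, item 1 first.

    Odd-numbered items contribute (rating - 1), even-numbered items
    contribute (5 - rating); the sum is scaled by 2.5 to give 0-100.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten responses")
    total = 0
    for item, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("each rating must be between 1 and 5")
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5


def relative_change(new, old):
    """Fractional change of `new` relative to `old`."""
    return (new - old) / old


# Means reported in the summary above.
VOICE_SECONDS, KEYBOARD_SECONDS = 231, 358
VOICE_SUS, KEYBOARD_SUS = 82.2, 46.4

time_saved = -relative_change(VOICE_SECONDS, KEYBOARD_SECONDS)  # about 0.35
sus_gain = relative_change(VOICE_SUS, KEYBOARD_SUS)             # about 0.77

if __name__ == "__main__":
    # Invented responses, for illustration only.
    print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # best possible: 100.0
    print(sus_score([3] * 10))                        # all neutral: 50.0
    print(f"time saved with voice: {time_saved:.1%}")
    print(f"SUS gain with voice:   {sus_gain:.1%}")
```

On these figures, voice tasks took about 35% less time and the mean SUS score was about 77% higher than with the keyboard.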

Relevance

This study provides strong empirical evidence that speech-driven web browsing could substantially improve the web experience for blind users: mean SUS scores rose from 46.4 with the keyboard to 82.2 with voice, an increase of roughly 77%. The finding that even expert screen reader users got lost using keyboard shortcuts but succeeded easily with voice commands underscores how much cognitive overhead the current shortcut-based paradigm imposes. The study is particularly prescient given the subsequent rise of voice assistants and large language models: the "ideal" speech interface simulated by the wizard, which understood high-level goals like "buy this product" and translated them into multi-step browser actions, is increasingly achievable with current AI technology. The dialog corpus produced by the study provides reference data for building such systems. The privacy concern that participants raised about using voice in shared spaces remains a real design consideration for voice-first accessibility tools.

Tags: blindness · screen readers · speech interface · voice interface · web navigation · web automation · wizard of Oz · dialog systems