Spindex (Speech Index) Improves Auditory Menu Acceptance and Navigation Performance
Myounghoon Jeon, Bruce N. Walker · 2011 · ACM Transactions on Accessible Computing · doi:10.1145/1952383.1952385
Summary
This paper introduces the "spindex" (speech index), a novel auditory cue designed to improve navigation through spoken menus on mobile devices. The spindex pronounces the first letter of each menu item before the full text-to-speech (TTS) rendering, analogous to the visual index tabs on the edge of a dictionary or phone book. When users scroll rapidly through an alphabetized list, they hear "A...A...A...B...B...C...", which helps them quickly identify their location without listening to complete item names.

The researchers conducted three experiments with iterative design refinements. The first experiment, with 25 sighted undergraduate participants, established that spindex cues significantly improved navigation speed compared to TTS alone, with the benefit more pronounced for longer lists (150 items versus 50 items).

The second experiment tested four spindex design variants: basic (full volume on every item), attenuated (20 dB quieter after the first cue in each letter category), decreased (gradually fading within each letter category), and minimal (cue only on category boundaries). All variants outperformed TTS-only, but users preferred the attenuated and decreased versions for their reduced intrusiveness while maintaining navigational benefit.

The third experiment extended testing to 16 blind and visually impaired adults recruited from the Center for the Visually Impaired in Atlanta and Georgia Industries for the Blind. This population was essential for validating the design with the primary target users of auditory interfaces.
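The four design variants described above can be sketched as a small function that decides, for each item in an alphabetized list, which letter cue plays and at what gain. This is a minimal illustration, not the authors' implementation: the function name `spindex_cues`, the tuple layout, and the fade step `FADE_STEP_DB` are assumptions (the summary gives the 20 dB attenuation but no fade rate for the decreased variant).

```python
from typing import List, Optional, Tuple

ATTENUATION_DB = 20.0  # from the summary: 20 dB quieter after the first cue
FADE_STEP_DB = 2.0     # assumption: the summary does not state a fade rate

Cue = Tuple[Optional[str], float, str]  # (cue letter or None, gain in dB, item)


def spindex_cues(items: List[str], variant: str = "basic") -> List[Cue]:
    """Plan the letter cue for each item of an already alphabetized list."""
    plan: List[Cue] = []
    prev_letter = None
    position = 0  # index of the item within its letter category
    for item in items:
        if not item:
            raise ValueError("menu items must be non-empty strings")
        letter = item[0].upper()
        position = position + 1 if letter == prev_letter else 0
        prev_letter = letter
        first = position == 0

        if variant == "basic":
            # Full-volume cue on every item.
            cue, gain = letter, 0.0
        elif variant == "attenuated":
            # Full volume on the first item of a category, quieter afterwards.
            cue, gain = letter, 0.0 if first else -ATTENUATION_DB
        elif variant == "decreased":
            # Cue fades step by step within a category.
            cue, gain = letter, 0.0 if first else -FADE_STEP_DB * position
        elif variant == "minimal":
            # Cue only when the letter category changes.
            cue, gain = (letter if first else None), 0.0
        else:
            raise ValueError(f"unknown variant: {variant}")
        plan.append((cue, gain, item))
    return plan


if __name__ == "__main__":
    menu = ["Abba", "Adele", "Beck", "Bjork", "Blur"]
    for cue, gain, item in spindex_cues(menu, "attenuated"):
        print(cue, gain, item)
```

A playback layer would then speak each cue at the planned gain, followed by the full item name via TTS.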
Key findings
The spindex consistently improved menu navigation performance across all three experiments. In Experiment 1, the TTS+spindex condition averaged 10,292 ms compared to 11,606 ms for TTS-only, a statistically significant improvement that increased with list length. Critically, spindex required less learning time than plain TTS: performance with spindex plateaued by Block 2, while TTS-only continued improving through Block 3. For visually impaired participants in Experiment 3, TTS+spindex averaged 21,308 ms versus 28,117 ms for TTS-only, a 24% reduction in navigation time.

When the visually impaired participants were asked to choose their preferred design, none selected TTS alone. Eight chose the basic spindex and six chose the attenuated version, with only one selecting decreased or minimal. This contrasted with sighted participants, who preferred the attenuated version for its aesthetic qualities.

A revealing difference emerged in subjective ratings: visually impaired participants rated both conditions less annoying than sighted participants did (1.35 vs 4.92 for TTS-only; 1.88 vs 6.24 for TTS+spindex). The authors attribute this to visually impaired users viewing auditory displays as a necessity rather than an optional enhancement, leading to greater tolerance and appreciation. Visually impaired users also valued clarity over the "fun" factor that sighted users enjoyed in attenuated designs.
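As a quick check on the size of these effects, the relative time savings can be computed directly from the mean times quoted above. This is only arithmetic on the reported means; it does not reproduce the paper's statistical analysis, and the helper name `percent_reduction` is illustrative.

```python
def percent_reduction(baseline_ms: float, treatment_ms: float) -> float:
    """Relative reduction in mean navigation time, as a percentage of baseline."""
    return (baseline_ms - treatment_ms) / baseline_ms * 100.0


# Mean times as quoted in this summary (TTS-only vs TTS+spindex).
exp1 = percent_reduction(11_606, 10_292)  # Experiment 1, sighted participants
exp3 = percent_reduction(28_117, 21_308)  # Experiment 3, visually impaired participants

print(f"Experiment 1: {exp1:.1f}% less time with spindex")  # 11.3%
print(f"Experiment 3: {exp3:.1f}% less time with spindex")  # 24.2%
```

On these figures the relative benefit for visually impaired participants was roughly twice that seen with sighted participants in Experiment 1, although the two experiments differ in more than the participant group, so the comparison is only indicative.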
Relevance
The spindex represents a practical, low-cost enhancement that can be implemented with minimal programming: TTS engines can generate the cues dynamically without storing additional audio files. This makes it immediately applicable to mobile applications, MP3 players with large libraries, and any alphabetized auditory interface.

The research highlights an important tension in assistive technology design: performance optimization versus user preference. All spindex variants improved navigation speed, but the minimal version, despite equivalent performance, was rejected by users who wanted continuous feedback about their location. This validates the principle that assistive technology must feel helpful, not just be measurably effective.

The study also demonstrates the importance of including target users throughout the design process rather than only using blindfolded sighted proxies. Visually impaired participants showed distinctly different preferences (favoring clarity over aesthetic refinement) and had higher baseline acceptance of auditory interfaces. For practitioners, the attenuated spindex offers the best compromise between sighted and visually impaired user preferences, making it suitable for universal design contexts where both populations will use the same interface.
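To illustrate why no stored audio files are needed, the sketch below generates each cue on the fly by passing the item's first letter to the same speech call used for the full name. Everything here is hypothetical: `SpindexMenu`, `move_down`, and the injected `speak` and `cancel` callables stand in for whatever a real TTS engine provides and do not describe any particular library's API.

```python
from typing import Callable, List


class SpindexMenu:
    """Hypothetical menu wrapper that speaks a letter cue before each item."""

    def __init__(
        self,
        items: List[str],
        speak: Callable[[str], None],   # assumption: queues text for synthesis
        cancel: Callable[[], None],     # assumption: stops any pending speech
    ) -> None:
        self.items = sorted(items, key=str.lower)
        self.index = 0
        self._speak = speak
        self._cancel = cancel

    def _announce(self) -> None:
        item = self.items[self.index]
        # Interrupt whatever is still playing so fast scrolling yields
        # mostly letter cues rather than a backlog of full names.
        self._cancel()
        self._speak(item[0].upper())  # spindex cue, synthesized dynamically
        self._speak(item)             # full item name follows the cue

    def move_down(self) -> None:
        """Advance one item (stopping at the end of the list) and announce it."""
        if self.index < len(self.items) - 1:
            self.index += 1
        self._announce()


if __name__ == "__main__":
    spoken: List[str] = []
    menu = SpindexMenu(["Beck", "Abba", "Adele"], spoken.append,
                       lambda: spoken.append("<cancel>"))
    menu.move_down()
    menu.move_down()
    print(spoken)
```

Injecting the speech functions keeps the sketch independent of any platform; in a real application they would wrap the engine's own speak and stop calls.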
Tags: auditory interfaces · mobile accessibility · text-to-speech · screen readers · user interface design · blindness · sonification