For one or for all? Survey of educator perceptions of Web Speech-based auditory description in science interactives

Brett L. Fiedler, Taliesin L. Smith, Jesse Greenberg, Emily B. Moore · 2022 · Proceedings of the 19th International Web for All Conference (W4A) · doi:10.1145/3493612.3520456

Summary

This paper investigates how educators perceive a Web Speech API-based auditory description feature called Voicing, implemented in two PhET Interactive Simulations (John Travoltage and Gravity Force Lab: Basics) used globally for science education. Unlike traditional screen reader-dependent auditory descriptions, the Voicing feature uses the browser-native Web Speech API to deliver spoken descriptions directly within the simulation, making it available to any user without requiring assistive technology. The feature is optional and highly customizable: users can toggle it on, choose what types of information are spoken (object details, context changes, helpful hints), adjust speech rate and pitch, and access on-demand descriptions via a sidebar toolbar. The researchers surveyed over 2,000 educators through two Qualtrics surveys, each embedding one of the two simulations with the Voicing feature enabled at one of three preset detail levels (PA: full object and context changes, PB: context changes only, PC: object names and on-screen text only). Educators interacted with the simulation for 30 seconds, then rated 14 Likert-style statements about their experience, answered questions about perceived benefit for student populations, and provided open-ended comments. The study builds on an existing description design framework developed by the PhET team that distinguishes between state descriptions (static, browseable summaries) and responsive descriptions (dynamic, triggered by user interaction), adapting these concepts for speech output rather than screen reader delivery.
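As a rough illustration of the pattern described above (user-selectable information types, adjustable rate and pitch, and speech delivered through the browser's Web Speech API), the following is a minimal sketch and not PhET's actual implementation. The names `VoicingPrefs`, `ResponsePart`, `composeResponse`, `speak`, and `PRESETS` are hypothetical. How the survey presets map onto individual toggles is an assumption based only on the short descriptions of PA, PB, and PC given here; in particular, where helpful hints and on-screen text fall is a guess. Only `speechSynthesis`, `SpeechSynthesisUtterance`, and its `rate` and `pitch` properties come from the Web Speech API itself.

```typescript
// Hypothetical sketch: names and preset mappings are assumptions, not PhET code.
type ResponseKind = "name" | "objectDetail" | "contextChange" | "hint";

interface VoicingPrefs {
  enabled: boolean;                      // Voicing is optional and off unless toggled on
  speak: Record<ResponseKind, boolean>;  // which kinds of information are spoken
  rate: number;                          // Web Speech API allows 0.1 to 10
  pitch: number;                         // Web Speech API allows 0 to 2
}

interface ResponsePart {
  kind: ResponseKind;
  text: string;
}

// Assumed mapping of the three survey presets onto toggles (hints left off as a guess).
const PRESETS: Record<"PA" | "PB" | "PC", Record<ResponseKind, boolean>> = {
  PA: { name: true, objectDetail: true, contextChange: true, hint: false },
  PB: { name: false, objectDetail: false, contextChange: true, hint: false },
  PC: { name: true, objectDetail: false, contextChange: false, hint: false },
};

// Keep only the parts the user has opted into and join them into one utterance string.
function composeResponse(parts: ResponsePart[], prefs: VoicingPrefs): string {
  if (!prefs.enabled) return "";
  return parts
    .filter((part) => prefs.speak[part.kind])
    .map((part) => part.text)
    .join(" ");
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Speak the text if the browser exposes speech synthesis; return false otherwise.
function speak(text: string, prefs: VoicingPrefs): boolean {
  const synth = (globalThis as any).speechSynthesis;
  const Utterance = (globalThis as any).SpeechSynthesisUtterance;
  if (!text || !synth || !Utterance) return false;
  const utterance = new Utterance(text);
  utterance.rate = clamp(prefs.rate, 0.1, 10);
  utterance.pitch = clamp(prefs.pitch, 0, 2);
  synth.cancel(); // drop any queued speech so the newest response is heard first
  synth.speak(utterance);
  return true;
}

// Example: a response with three parts, filtered by the assumed PB preset.
const exampleParts: ResponsePart[] = [
  { kind: "name", text: "Mass 1." },
  { kind: "objectDetail", text: "Position 2 kilometers." },
  { kind: "contextChange", text: "Force arrows get bigger." },
];
const pbPrefs: VoicingPrefs = { enabled: true, speak: PRESETS.PB, rate: 1, pitch: 1 };
const spokenText = composeResponse(exampleParts, pbPrefs);
speak(spokenText, pbPrefs); // no-op outside a browser that supports speech synthesis
```

Keeping the filtering step separate from the call into the speech engine means the user's choices can be applied and checked without a browser. The example strings are invented for illustration and are not taken from the simulations.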

Key findings

Educators generally approved of the Voicing feature, though ratings were more moderate than the researchers expected: all average statement ratings fell within one point of neutral (3 on a 5-point scale), with no extreme agreement or disagreement. The highest agreement was for statements about the feature being interesting, helpful, and natural to hear. Educators perceived the feature as beneficial both broadly and for specific populations: approximately 25% identified learners with difficulty interpreting visual content, 24.6% identified learners needing guidance with simulations (younger learners, those with intellectual or developmental disabilities), 23.2% identified learners with low vision, and 21.8% identified learners with diverse auditory and visual processing needs. Notably, fewer than 1% selected "None of the above," and educators frequently wrote in open-ended responses that "all learners" would benefit. Thematic analysis of 509 comments revealed that the most common theme was a desire for Voicing to be optional, followed by recognition of benefits for specific populations and concerns about speech engine quality. The interactive design of the simulation influenced perceptions: Gravity Force Lab: Basics received consistently higher ratings than John Travoltage, likely due to its simpler interaction pattern with fewer unexpected contextual changes. Significant differences between presets were found primarily when comparing the highest and lowest detail levels (PA vs. PC), suggesting that the amount of spoken description does influence educator perception, though not as strongly as predicted.

Relevance

This research has significant implications for how accessibility features are designed and positioned in web-based educational tools. The finding that educators see Voicing as beneficial for a wide range of learners, not just those with visual impairments traditionally associated with auditory description, supports the electronic curb-cut effect and universal design principles. For accessibility practitioners, this validates the approach of building speech-based description features that are optional, customizable, and available without requiring screen reader software. The study also highlights practical challenges: speech engine quality and the "robotic" nature of synthesized voices were frequent concerns, and the feature was limited to English, reducing its value in non-English-speaking contexts. The emphasis on user control over the amount and type of spoken information provides a useful design pattern for any interactive web content seeking to add auditory descriptions. Practitioners building educational or interactive tools should consider that auditory descriptions can serve as pedagogical scaffolding, not just as an accessibility accommodation.

Tags: auditory description · web speech · science education · interactive simulations · universal design · educator perceptions · surveys

Standards referenced: WCAG 2.1 · Web Speech API