Evaluating Prosodic Cues as a Means to Disambiguate Algebraic Expressions: An Empirical Study
Ed Gellenbeck, Andreas Stefik · 2009 · Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '09) · doi:10.1145/1639642.1639668
Summary
This paper investigates whether inserting pauses into text-to-speech renderings of mathematical expressions helps listeners distinguish between algebraic expressions that differ in structure but sound alike when read aloud. The core problem is that written mathematics relies on two-dimensional spatial layout and specialized symbols to convey structure. For example, "x plus 1 over x minus 1" is ambiguous when spoken because the listener cannot tell whether it means (x + 1)/(x - 1), with the whole sum in the numerator, or x + 1/x - 1, with only the "1" in the numerator. The authors developed XSL transformation rules that automatically convert Presentation MathML into Speech Synthesis Markup Language (SSML), inserting pauses at structurally meaningful boundaries. This work is part of a larger open-source project to produce a DAISY-formatted digital talking book reader that renders MathML content through a web browser. The target audience includes college students with learning disabilities, particularly dyslexia and ADHD, who represent over 40% of college freshmen reporting disabilities and who benefit significantly from synchronized audio-visual reading of textbooks. The approach deliberately avoids lexical cues (e.g., saying "begin fraction" or "open parenthesis") in favor of prosodic cues alone, on the grounds that pauses are simpler, more natural, and easier to automate.
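The MathML-to-SSML step described above can be illustrated with a minimal sketch. It is not the authors' XSL stylesheet: it uses Python's standard library in place of XSLT, handles only a handful of Presentation MathML elements, and the pause length (300 ms), the pause placement around fractions, and the spoken operator words are all assumptions chosen for illustration. The only SSML feature it relies on is the standard `<break time="..."/>` element.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed values, not taken from the paper: pause length and spoken words.
PAUSE = '<break time="300ms"/>'
OPERATOR_WORDS = {"+": "plus", "-": "minus", "\u2212": "minus", "=": "equals"}


def local_name(element):
    """Strip a namespace prefix such as '{http://www.w3.org/1998/Math/MathML}'."""
    return element.tag.rsplit("}", 1)[-1]


def speak(element):
    """Return a list of SSML fragments for one Presentation MathML element."""
    name = local_name(element)
    text = (element.text or "").strip()
    if name in ("mi", "mn"):
        return [escape(text)]
    if name == "mo":
        return [OPERATOR_WORDS.get(text, escape(text))]
    if name == "mfrac":
        numerator, denominator = list(element)
        # Assumed rule: pauses bracket the fraction and surround the word "over".
        return [PAUSE, *speak(numerator), PAUSE, "over",
                PAUSE, *speak(denominator), PAUSE]
    # math, mrow, and anything unrecognised: speak the children in order.
    fragments = []
    for child in element:
        fragments.extend(speak(child))
    return fragments


def mathml_to_ssml(mathml):
    """Wrap the spoken fragments in an SSML 1.0 <speak> root element."""
    body = " ".join(speak(ET.fromstring(mathml)))
    return ('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
            'xml:lang="en-US">' + body + "</speak>")


MATHML_NS = 'xmlns="http://www.w3.org/1998/Math/MathML"'
# (x + 1)/(x - 1): the whole sum is the numerator.
WHOLE = (f"<math {MATHML_NS}><mfrac><mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>"
         "<mrow><mi>x</mi><mo>-</mo><mn>1</mn></mrow></mfrac></math>")
# x + 1/x - 1: only the "1" is the numerator.
INNER = (f"<math {MATHML_NS}><mrow><mi>x</mi><mo>+</mo>"
         "<mfrac><mn>1</mn><mi>x</mi></mfrac><mo>-</mo><mn>1</mn></mrow></math>")

print(mathml_to_ssml(WHOLE))
print(mathml_to_ssml(INNER))
```

Both inputs produce the same sequence of words; only the positions of the `<break>` elements differ, which is the distinction the study asks listeners to hear.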
Key findings
The between-subjects experiment with 16 male computer science students (mean age 22.69) showed that inserting pauses markedly improved disambiguation of algebraic expressions. The pauses group rated correct expressions as significantly better matches than the no-pauses group did (M = 164.375 vs. M = 129.125, t(14) = -3.751, p = .002). Crucially, the no-pauses group could not reliably distinguish between the correct and incorrect expression for a given audio stimulus: the difference in their ratings was non-significant (t(14) = 1.113, p = .285). In contrast, the pauses group showed a highly significant difference between correct and incorrect matches (t(14) = 7.521, p < .001). The negative correlation between correct and incorrect ratings was strong in the pauses group (r(16) = -.895, p < .001) but weak and non-significant in the no-pauses group (r(16) = -.285, p = .285), which is consistent with pauses enabling listeners to discriminate between similar expressions. The authors note that their approach is limited to algebraic expressions and that more complex mathematics (calculus, set theory, automata theory) may require additional strategies beyond pauses alone.
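As a sanity check on the reported significance levels, the two-tailed p-value implied by a t statistic and its degrees of freedom can be recomputed with the standard library alone. This sketch is not the authors' analysis: it assumes the reported tests are two-tailed, takes the t values and df = 14 directly from the summary above, and integrates the Student's t density numerically with Simpson's rule. The function names are illustrative.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coefficient = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the density over [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_mass = 2 * (h / 3) * total  # probability that |T| <= |t|
    return 1 - central_mass


# t statistics as reported in the summary, all with 14 degrees of freedom.
for t in (-3.751, 1.113, 7.521):
    print(f"t(14) = {t:+.3f} -> two-tailed p = {two_tailed_p(t, 14):.6f}")
```

If the reported tests are indeed two-tailed, the first two values should land close to the reported p = .002 and p = .285, and the third should fall well below .001.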
Relevance
This research addresses a fundamental gap in how screen readers and audio-based tools handle mathematical content, a barrier that affects not only blind users but also the large and growing population of college students with learning disabilities who rely on text-to-speech for reading. The finding that simple pauses can make otherwise ambiguous spoken expressions distinguishable has direct implications for screen reader developers and DAISY content producers. The approach is practical because it builds on existing standards (a MathML-to-SSML transformation) and can be automated, unlike solutions that require human narration or extensive lexical markup. The work also highlights the importance of the DAISY standard and MathML for accessible educational content. While the study is small and focused on algebra, it establishes a proof of concept that prosodic manipulation is a viable and effective strategy for spoken mathematics, a finding that subsequent research has continued to build on.
Tags: mathematical accessibility · text-to-speech · screen readers · MathML · SSML · DAISY · prosody · learning disability · dyslexia
Standards referenced: MathML · SSML · DAISY 3