Proposing New Metrics to Evaluate Web Usability for the Blind

Kentarou Fukuda, Shin Saito, Hironobu Takagi, Chieko Asakawa · 2005 · CHI '05 Extended Abstracts on Human Factors in Computing Systems · doi:10.1145/1056808.1056923

Summary

This four-page CHI '05 extended abstract argues that existing automated accessibility checkers (typified in 2005 by Bobby) had plateaued: they verified the presence of ALT attributes, labels, and markup conformance, yet real blind users still found many 'compliant' web pages difficult to use. The authors propose supplementing pass/fail conformance checking with two quantitative usability metrics computed directly from the HTML and rendered output. *Navigability* measures how well structured a page is for non-visual reading. It combines the reaching time required to arrive at the main content via a voice browser (penalising pages where main content sits more than ~90 seconds of listening from the top), the presence and correctness of skip links and heading tags, the ratio of accessible links on the page, and the correct use of FORM labels and TABLE headers. *Listenability* measures how appropriate the rendered text is when read aloud: whether ALT attributes are meaningful rather than placeholder strings ('spacer gif', 'image'), whether text is duplicated redundantly near image links, and whether characters are space-separated in ways that confuse speech synthesisers (a particular problem in Japanese, where a space-separated '国 際' is read as 'kuni sai' instead of 'kokusai'). Both metrics are built into the aDesigner tool, which reuses the visualisation engine from the authors' reaching-time work. To demonstrate the metrics' value, the authors compute both scores across snapshots of seven major websites (US government, news, e-commerce, IT companies, and IBM) retrieved from the Wayback Machine between 1997 and 2004.
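The listenability checks described above can be approximated with a few lines of standard-library Python. This is a minimal sketch, not the aDesigner implementation: the placeholder list `PLACEHOLDER_ALTS`, the class `AltAudit`, and the helpers `audit_alts` and `has_spaced_out_text` are all hypothetical names, and the rules are assumptions inferred from the summary rather than the paper's actual scoring.

```python
import re
from html.parser import HTMLParser

# Assumed placeholder strings; the summary names only 'spacer gif' and 'image'.
PLACEHOLDER_ALTS = {"spacer", "spacer gif", "spacer.gif", "image", "img"}


class AltAudit(HTMLParser):
    """Classify the ALT text of every <img> on a page."""

    def __init__(self):
        super().__init__()
        self.counts = {"missing": 0, "empty": 0, "placeholder": 0, "meaningful": 0}

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        alt = dict(attrs).get("alt")
        if alt is None:
            self.counts["missing"] += 1       # no ALT attribute at all
        elif not alt.strip():
            self.counts["empty"] += 1         # ALT="" (may be decorative)
        elif alt.strip().lower() in PLACEHOLDER_ALTS:
            self.counts["placeholder"] += 1   # present but uninformative
        else:
            self.counts["meaningful"] += 1


def audit_alts(html):
    """Return a dict of ALT-text categories for the given HTML string."""
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser.counts


def has_spaced_out_text(text, min_chars=3):
    """Heuristic: detect runs of single characters separated by single spaces."""
    pattern = r"(?<!\S)\S(?: \S){%d,}(?!\S)" % (min_chars - 1)
    return re.search(pattern, text) is not None
```

A conformance checker that only tests for the presence of ALT would treat the 'placeholder' and 'meaningful' categories as equivalent; separating them is the point of the listenability framing. The `min_chars` threshold is a tunable guess: lowering it to 2 catches two-character cases such as '国 際' at the cost of more false positives in English text.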

Key findings

The historical analysis showed that WCAG Priority 1 error counts from Bobby tracked poorly with actual blind-user experience: several sites with few reported errors scored badly on navigability and listenability, and vice versa. Navigability scores for 'News 1' and 'News 2' collapsed from 88–92 in 1999–2000 to 8–30 through 2001–2004 as the sites moved to denser two-dimensional layouts without adding heading or skip-link structure. IBM's own site was a clear outlier, scoring close to 100 across almost every year. 'US Gov. 2' dropped to a navigability score of 24 in 2002 because its skip links had no accessible target text. Listenability revealed problems Bobby could not detect: IBM's site lost listenability in 1999 because many images had ALT="" applied blanket-style, while 'IT Co.' improved dramatically once it removed 'spacer' ALT strings in 2002. The authors conclude that accessible markup alone does not guarantee usable speech output, and that listenability must be tracked separately. They flag that neither metric yet reflects the cognitive effort of navigation (number of operations to reach content) and commit to validating both with direct user evaluation in future work.

Relevance

This short paper made one of the earliest concrete arguments that the accessibility community should move beyond 'number of WCAG errors detected' toward usability-oriented metrics grounded in how blind users actually experience pages, a position since reinforced by empirical studies of automated-tool coverage, most recently the WebAIM Million reports. For practitioners building accessibility dashboards or QA pipelines, navigability and listenability remain useful framings: they encourage measuring time-to-main-content, scoring the semantic quality of ALT text rather than just its presence, and tracking how design changes affect non-visual reading over time. The methodology of applying automated metrics to historical snapshots from the Wayback Machine also anticipates later longitudinal accessibility studies. Limitations include reliance on the aDesigner visualisation engine (so the metrics are not independently reproducible), no direct validation against blind-user task performance, and no treatment of dynamic content, ARIA, or mobile layouts. The metrics are best seen as complements to, not replacements for, WCAG conformance and user testing.
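For a QA pipeline, time-to-main-content can be roughly estimated by counting the words a voice browser would read before the main content starts. The sketch below is an illustration under stated assumptions, not the authors' reaching-time algorithm: the speech rate of 180 words per minute, the convention that main content is marked by an element with `id="main"`, and the names `WordsBeforeMain` and `reaching_time_seconds` are all hypothetical. Only the ~90 second threshold comes from the summary above.

```python
from html.parser import HTMLParser

WORDS_PER_MINUTE = 180   # assumed speech rate; not taken from the paper
THRESHOLD_SECONDS = 90   # approximate penalty threshold quoted in the summary


class WordsBeforeMain(HTMLParser):
    """Count words read aloud before the element marking the main content."""

    def __init__(self, main_id):
        super().__init__()
        self.main_id = main_id
        self.words = 0
        self.reached = False
        self._skip_depth = 0  # non-zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if self.reached:
            return
        attr_map = dict(attrs)
        if attr_map.get("id") == self.main_id:
            self.reached = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img" and attr_map.get("alt"):
            # ALT text is spoken, so it adds to the listening time.
            self.words += len(attr_map["alt"].split())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self.reached and not self._skip_depth:
            self.words += len(data.split())


def reaching_time_seconds(html, main_id="main", wpm=WORDS_PER_MINUTE):
    """Estimate listening time before main content; None if no marker is found."""
    parser = WordsBeforeMain(main_id)
    parser.feed(html)
    parser.close()
    if not parser.reached:
        return None
    return parser.words * 60.0 / wpm
```

Comparing the result against `THRESHOLD_SECONDS` gives a simple pass/warn signal per page. The estimate ignores skip links, heading navigation, and pauses, all of which a real voice-browser model would need to account for.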

Tags: web accessibility · accessibility evaluation · accessibility metrics · screen readers · voice browser · blindness and low vision · automated accessibility · alt text · accessibility testing

Standards referenced: WCAG 1.0 · Section 508