Are Users the Gold Standard for Accessibility Evaluation?

Amaia Aizpurua, Myriam Arrue, Simon Harper, Markel Vigo · 2014 · Proceedings of the 11th Web for All Conference (W4A) · doi:10.1145/2596695.2596705

Summary

This paper critically examines whether user testing with blind users is a reliable "gold standard" for web accessibility evaluation. Drawing on an exploratory study in which 11 legally blind participants (10 JAWS users and 1 VoiceOver user, aged 21 to 64) navigated four restaurant websites of varying accessibility, the authors demonstrate a striking disconnect between guideline conformance and user perception. For two of the websites, participants' accessibility ratings matched the expert evaluations. The most revealing case, however, was a website with significant accessibility violations that most participants perceived as very accessible; conversely, some participants did not perceive a highly accessible website as such. The paper identifies multiple sources of bias that can undermine the reliability of user testing, grouped into three phases: before the test (participant recruitment, unreliable self-reported expertise), during the test (recording limitations, difficulty eliciting feedback, language barriers between evaluators and users), and after the test (interpreting ambiguous results, distinguishing accessibility issues from usability issues, and accounting for contextual factors such as network problems or screen reader version differences).

Key findings

User expertise emerged as the strongest predictor of how accurately participants could identify and report accessibility barriers. Experienced users could point to the specific HTML elements or technologies causing problems, while novice users lacked the vocabulary and conceptual frameworks to articulate what was wrong. Intermediate users exhibited a self-blame pattern: when they encountered problems, they assumed the fault was their own rather than the site's poor accessibility, which inflated their positive perception of inaccessible sites. Novice users, paradoxically, were more likely to blame the website or screen reader than themselves, but they frequently misidentified the source of problems (one participant tried ejecting a USB flash drive, thinking it would fix an Adobe Flash content issue). The authors found that subjective factors, including user expectations, preconceptions about a website's brand, mood, affective state, and self-confidence, all influenced accessibility perceptions. Self-reported expertise was unreliable: some participants were too humble about their skills, while others overestimated their abilities. Only 15% of accessibility surveys conducted between 2000 and 2005 employed task-based user evaluation with disabled people, suggesting that user testing remains uncommon despite being widely recommended.

Relevance

This paper delivers an important and nuanced message for anyone conducting accessibility evaluations: user testing is essential but deeply flawed as a standalone method. The finding that most test participants can perceive a website full of WCAG violations as accessible, and that some can perceive a highly accessible website as inaccessible, should give pause to anyone who treats user satisfaction as proof of accessibility. For practitioners, the practical implications are significant: accessibility evaluation should combine conformance review with carefully designed user testing; evaluators need both technical accessibility knowledge and the social skills to elicit meaningful feedback; self-reported expertise cannot be taken at face value; and contextual factors (screen reader version, browser, network, location) must be controlled or documented. The paper's recommendation of participatory accessibility evaluation, in which evaluator and user work collaboratively and combine the evaluator's technical knowledge with the user's lived experience, offers a practical path forward. This work challenges simplistic narratives in both directions: neither "just follow WCAG" nor "just test with users" is sufficient on its own.

Tags: user testing · accessibility evaluation · blind users · screen readers · research methodology · bias · WCAG 2.0 · participatory design · usability

Standards referenced: WCAG 2.0 · WCAG-EM 1.0