WAEM: A Web Accessibility Evaluation Metric Based on Partial User Experience Order

Shuyi Song, Can Wang, Liangcheng Li, Zhi Yu, Xiao Lin, Jiajun Bu · 2017 · Proceedings of the 14th International Web for All Conference (W4A) · doi:10.1145/3058555.3058576

Summary

This paper introduces WAEM (Web Accessibility Experience Metric), an accessibility metric that derives checkpoint weights from actual user experience data rather than from WCAG priority levels. The authors show that existing metrics such as WAB and WAQM, which weight checkpoints by WCAG priority level (Priority 1 highest, Priority 3 lowest), produce website rankings that correlate poorly with how people with disabilities actually experience those websites. WAEM addresses this with Partial User Experience Orders (PUEXOs): pairwise comparisons in which a user indicates which of two websites provides the better browsing experience. The metric scores each website as the dot product of its checkpoint pass rates and a checkpoint weight vector. Finding the weights that satisfy as many PUEXOs as possible is cast as an optimization problem whose objective has the same form as that of a Support Vector Machine (SVM): the weight vector plays the role of the SVM normal vector, and the error tolerances play the role of slack variables. The study evaluated 45 Chinese government websites against 30 WCAG checkpoints (11 Priority 1, 8 Priority 2, 11 Priority 3) and collected user experience data from 23 web accessibility experts and 7 volunteers with disabilities (visual impairment of varying degrees, hearing impairment, speech disability, and movement impairment).
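The scoring and weight-learning idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (score, learn_weights), the hyperparameters (c, epochs, lr), the plain subgradient-descent solver, and the three-site toy data are all assumptions made for illustration. The sketch only shows how pairwise "better than" judgments can drive a soft-margin, SVM-style objective over differences of pass-rate vectors.

```python
def score(pass_rates, weights):
    """Weighted accessibility score: dot product of pass rates and weights."""
    return sum(p * w for p, w in zip(pass_rates, weights))


def learn_weights(pass_rates, puexos, c=1.0, epochs=200, lr=0.05):
    """Fit checkpoint weights from pairwise comparisons (illustrative solver).

    pass_rates: one list of checkpoint pass rates per website.
    puexos: list of (better, worse) website-index pairs.
    Minimizes 0.5 * ||w||^2 + c * sum(max(0, 1 - w . (x_better - x_worse)))
    with plain subgradient descent. The hinge terms stand in for the slack
    variables of a soft-margin SVM; the solver choice is an assumption.
    """
    n = len(pass_rates[0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularizer 0.5 * ||w||^2
        for better, worse in puexos:
            diff = [a - b for a, b in zip(pass_rates[better], pass_rates[worse])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin < 1.0:  # pair not yet satisfied with margin: hinge is active
                for k in range(n):
                    grad[k] -= c * diff[k]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical toy data: 3 websites, 2 checkpoints.
sites = [[0.9, 0.2], [0.3, 0.9], [0.1, 0.1]]
puexos = [(0, 1), (1, 2), (0, 2)]  # site 0 > site 1 > site 2
weights = learn_weights(sites, puexos)
print([round(score(x, weights), 3) for x in sites])
```

On this toy data the learned weights favor the first checkpoint, because the users' pairwise preferences track it more closely than the second. In practice an off-the-shelf linear SVM trained on the difference vectors would likely replace the hand-written loop.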

Key findings

WAEM outperformed all comparison metrics in reflecting user experience. Performance was measured by the satisfied percentage (SP), the proportion of pairwise comparisons that a metric predicts correctly. WAEM's SP exceeded the 75th percentile of randomly weighted metrics in all 10 cross-validation folds, while the Equal Metric, Priority Metric (WAB-style), and Random Metric all performed worse. In the direct comparisons (Figure 2), almost all points representing other metrics fell below the bisecting line when plotted against WAEM, confirming its advantage. Analysis of the learned checkpoint weights showed which checkpoints most affect user experience. The highest weights went to Time-Based Media in Text Alternatives, Page Titled in Navigable, Keyboard in Keyboard Accessible, Non-Text Content in Text Alternatives, and Error Prevention in Input Assistance. These results are intuitive: keyboard accessibility is critical for users who cannot use a mouse, and page titles are essential for screen reader navigation. The analysis also found that websites with low pass rates on highly weighted checkpoints consistently received poor user experience ratings, which supports the claim that WAEM captures real accessibility impact.
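As a rough illustration of the satisfied percentage and the random-weight baseline, the sketch below computes SP for a given score vector and estimates the 75th percentile of SP over randomly weighted metrics. The function names, the uniform sampling of weights in [0, 1), the number of trials, and the treatment of ties as unsatisfied are assumptions; the summary does not say how the paper handles these details.

```python
import random
import statistics


def weighted_scores(pass_rates, weights):
    """Score every website as the dot product of its pass rates and weights."""
    return [sum(p * w for p, w in zip(row, weights)) for row in pass_rates]


def satisfied_percentage(scores, puexos):
    """Fraction of (better, worse) pairs whose order the scores reproduce.

    Ties count as not satisfied (an assumption made for this sketch).
    """
    if not puexos:
        raise ValueError("need at least one pairwise comparison")
    hits = sum(1 for better, worse in puexos if scores[better] > scores[worse])
    return hits / len(puexos)


def random_metric_p75(pass_rates, puexos, trials=1000, seed=0):
    """75th percentile of SP over metrics with uniformly random weights."""
    rng = random.Random(seed)
    n = len(pass_rates[0])
    sps = []
    for _ in range(trials):
        weights = [rng.random() for _ in range(n)]
        sps.append(satisfied_percentage(weighted_scores(pass_rates, weights), puexos))
    return statistics.quantiles(sps, n=4)[2]  # third quartile cut point


# Hypothetical toy data: 3 websites, 2 checkpoints.
sites = [[0.9, 0.2], [0.3, 0.9], [0.1, 0.1]]
puexos = [(0, 1), (1, 2), (0, 2)]
equal_sp = satisfied_percentage(weighted_scores(sites, [1.0, 1.0]), puexos)
print("Equal-weight SP:", equal_sp)
print("Random-weight 75th percentile:", random_metric_p75(sites, puexos))
```

A candidate metric would then be compared against this baseline on held-out comparisons, once per cross-validation fold.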

Relevance

WAEM is the precursor to the same research group's RA-WAEM (published at W4A 2018). Together they mark an important shift in how accessibility is measured: by alignment with actual user experience rather than by standards-derived assumptions about barrier severity. The key practical insight is that WCAG priority levels do not reliably predict which barriers most affect users; some Priority 2 and 3 checkpoints may have greater real-world impact than certain Priority 1 checkpoints. For organizations conducting accessibility audits, this suggests that remediation should be prioritized using user experience data, not conformance level alone. The SVM-based approach to deriving weights from pairwise comparisons is elegant and practical because it requires users to make only relative judgments ("site A is better than site B") rather than absolute ratings. The checkpoint weight analysis offers actionable guidance: keyboard accessibility, text alternatives for time-based media, page titles, and error prevention emerge as the highest-impact areas. The main limitation is that WAEM does not account for varying evaluator reliability, which the follow-up RA-WAEM paper addresses.

Tags: accessibility metrics · user experience · accessibility evaluation · machine learning · SVM · web accessibility · WCAG compliance · China · checkpoint weights · visual impairment · hearing disability · motor disability · speech disability

Standards referenced: WCAG 2.0