
Optimising the Website Accessibility Conformance Evaluation Methodology

Alexander Hambley, Yeliz Yesilada, Markel Vigo, Simon Harper · 2022 · Proceedings of the 19th International Web for All Conference (W4A) · doi:10.1145/3493612.3520452

Summary

This paper critically examines the Website Accessibility Conformance Evaluation Methodology (WCAG-EM), the W3C's five-step process for evaluating website conformance to WCAG. The authors identify several methodological weaknesses in WCAG-EM and propose a parallel framework that introduces statistical rigour to the evaluation process. The core concern is that WCAG-EM relies on non-probabilistic, subjective sampling: auditors manually select pages for evaluation, which introduces potential bias and makes it impossible to generalise findings to the broader website with any statistical confidence. The paper also challenges WCAG-EM's "principle of website enclosure" and "scope of applicability" concepts, arguing that population definition methods from statistical research, with formal inclusion and exclusion criteria, would be more appropriate for scoping evaluations.

The proposed framework runs alongside WCAG-EM's five steps but adds systematic, data-driven processes at each stage. For population sourcing, three methods are compared: server log files (capturing pages users actually visit), breadth-first web crawling (visiting all URLs at each depth level before going deeper), and depth-first crawling (following links to their deepest point before backtracking). The authors use techniques such as t-SNE for dimensionality reduction and DBSCAN for clustering to analyse page populations, and the Pa11y tool (run with axe-core) for automated accessibility evaluation of large page sets.
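The difference between the two crawling strategies comes down to the order in which discovered URLs are taken off the frontier. The sketch below is a minimal illustration, not the authors' crawler: it walks a small in-memory link graph (the `LINKS` dictionary, the `crawl` function and the `limit` parameter are all hypothetical names introduced here) instead of fetching real pages.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to (illustrative, not from the paper).
LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/about/team"],
    "/products": ["/products/a", "/products/b"],
    "/about/team": [],
    "/products/a": [],
    "/products/b": [],
}


def crawl(start, links, strategy="bfs", limit=None):
    """Return pages in visit order.

    'bfs' treats the frontier as a FIFO queue (finish each depth level first);
    'dfs' treats it as a LIFO stack (follow one branch to its end first).
    `limit` caps the number of pages visited, like a crawl budget.
    """
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier and (limit is None or len(order) < limit):
        page = frontier.popleft() if strategy == "bfs" else frontier.pop()
        order.append(page)
        outgoing = links.get(page, [])
        # For DFS, push links in reverse so the first link on the page is followed first.
        for nxt in (outgoing if strategy == "bfs" else reversed(outgoing)):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return order


if __name__ == "__main__":
    print(crawl("/", LINKS, "bfs", limit=4))
    print(crawl("/", LINKS, "dfs", limit=4))
```

With a budget of four pages, the breadth-first run reaches both top-level sections before descending, while the depth-first run exhausts the first section before touching the second. That is one plausible reason a capped breadth-first crawl would yield a more varied population.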

Key findings

The paper proposes six metrics for evaluating and comparing population-sourcing methods in Step 1 of their framework: (1) Coverage: measured by clustering page token matrices using DBSCAN after t-SNE dimensionality reduction, then comparing the number of distinct clusters each sourcing method captures; (2) Representativeness: how well the population reflects all pages on the website, noting that log files miss unvisited pages and crawlers miss non-hyperlinked pages; (3) Complexity: quantified by the proportion of embedded and interactive HTML content elements on each page; (4) Popularity: page hit-rates from server logs, which allow frequently visited pages to be prioritised for evaluation (available only from log files, not crawlers); (5) Freshness: whether sourced pages are still current and live, which is particularly important for rate-limited crawlers operating over extended periods; and (6) Accessibility: automated barrier detection using axe-core severity levels (critical, serious, moderate, minor) to prioritise pages for manual review. Preliminary findings suggest that breadth-first crawling produces more heterogeneous page populations with wider tag variation, while depth-first crawling yields more similar pages. Server log files uniquely provide popularity data, but what they capture depends on the time window and the volume of user traffic.
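As an illustration of the Complexity metric, the sketch below computes the share of a page's start tags that are embedded or interactive elements, using only the Python standard library. The tag sets are a simplified assumption loosely based on the HTML content categories, and `TagCounter` and `complexity` are hypothetical names; the paper's exact definition may differ.

```python
from html.parser import HTMLParser

# Simplified tag sets (an assumption): the HTML spec makes some of these
# conditional on attributes, e.g. <a> only counts as interactive with an href.
EMBEDDED = {"audio", "canvas", "embed", "iframe", "img", "math",
            "object", "picture", "svg", "video"}
INTERACTIVE = {"a", "button", "details", "input", "label", "select", "textarea"}


class TagCounter(HTMLParser):
    """Counts all start tags and those in the embedded/interactive sets."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.flagged = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.total += 1
        if tag in EMBEDDED or tag in INTERACTIVE:
            self.flagged += 1


def complexity(html):
    """Proportion of elements that are embedded or interactive (0.0 if no tags)."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.flagged / counter.total if counter.total else 0.0


if __name__ == "__main__":
    page = "<html><body><p>hi</p><img src='x'><button>go</button></body></html>"
    print(complexity(page))
```

A score like this could be computed for every page in a sourced population and compared across sourcing methods, though any threshold for calling a page "complex" would have to be chosen by the evaluator.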

Relevance

This research directly addresses a practical problem faced by every accessibility professional who conducts website audits: how to select which pages to evaluate. The current WCAG-EM approach relies heavily on auditor judgement, which varies between practitioners and cannot produce statistically defensible claims about overall site accessibility. For organisations facing legal compliance requirements or procurement audits under standards like EN 301 549, the inability to make generalisable statements about a website's accessibility is a significant limitation. The proposed framework offers a path toward more rigorous, reproducible audits in which automated tools handle population analysis and page clustering, freeing human auditors to focus their expertise on the pages that matter most. The emphasis on server log files as a population source is particularly practical: it surfaces the pages real users actually visit, which is arguably more relevant than what a crawler discovers. However, the framework is still theoretical at this stage, with future work needed to validate the metrics against real websites. For accessibility teams, the immediate takeaway is to critically examine their own page selection methods and consider incorporating automated crawling and statistical sampling into their audit workflows.
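To make "statistical sampling" concrete, the sketch below draws a simple random sample from a page population and puts a confidence interval around the share of sampled pages that show a barrier. Simple random sampling and the Wilson score interval are choices made here for illustration; the paper does not prescribe them, and `sample_pages` and `wilson_interval` are hypothetical helper names.

```python
import math
import random


def sample_pages(population, n, seed=0):
    """Draw a simple random sample of n pages without replacement.

    A fixed seed keeps the selection reproducible between auditors.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%).

    Ignores the finite-population correction, so it is conservative
    when the sample is a large share of the site.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = failures / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical population of 500 URLs, e.g. from a crawl or log file.
    population = [f"/page/{i}" for i in range(500)]
    sample = sample_pages(population, 40)
    # Suppose 12 of the 40 sampled pages showed at least one barrier.
    low, high = wilson_interval(12, len(sample))
    print(f"Estimated share of pages with barriers: {low:.2f} to {high:.2f}")
```

Unlike a hand-picked page set, a sample drawn this way supports a statement about the whole population within a stated margin, which is the kind of generalisable claim the authors argue WCAG-EM cannot currently provide.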

Tags: accessibility evaluation · WCAG-EM · sampling methodology · automated testing · web crawling · conformance evaluation · accessibility audit · statistical methods

Standards referenced: WCAG 2.0 · WCAG-EM 1.0 · EN 301 549