Towards Real-Time Measurements of Internet Health: Optimizing Large-Scale Web Accessibility Evaluations

Luís P. Carvalho, Tiago Guerreiro, Shaun Lawson, Kyle Montague · 2023 · Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2023) · doi:10.1145/3597638.3608403

Summary

This paper investigates how to optimize page sampling strategies for large-scale web accessibility evaluations, with the aim of making near real-time measurements of Internet Health feasible. Current large-scale accessibility studies are costly and time-consuming, and they typically evaluate only home pages or select pages within a website inconsistently. The researchers ran an automated accessibility evaluation of 1,500 websites drawn from the DomCop Top 10 Million list using the Home+ sampling method, which evaluates the home page plus all linked pages belonging to the same domain. Using the axe-core accessibility engine (61 rules mapped to 22 WCAG 2.1 success criteria), they crawled the websites with Puppeteer distributed across five machines and captured accessibility data from 987 websites, totaling 48,335 pages and 1,346,557 axe-core rule evaluations. They then compared the baseline results against sub-sampled datasets at increments from 10% to 90%, with and without the home page included, using Cohen's Weighted Kappa to measure agreement across four custom metrics: WCAG violation types per website, number of websites per WCAG violation type, median number of accessibility violations, and accessibility severity.
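
The sub-sampling and agreement steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the assumption that the first entry of a page list is the home page, the linear weighting scheme, and the use of ordinal rating lists are all choices made for this example, since the review does not state how the paper maps each metric onto rating categories.

```python
import random
from collections import Counter


def subsample_pages(pages, fraction, include_home=True, seed=0):
    """Draw a random sub-sample of one site's pages.

    Assumption of this sketch: pages[0] is the home page.
    """
    rng = random.Random(seed)
    home, rest = pages[0], pages[1:]
    k = max(1, round(len(pages) * fraction))  # target sample size
    if include_home:
        # Home page always kept; the remaining slots are filled at random.
        return [home] + rng.sample(rest, min(k - 1, len(rest)))
    return rng.sample(rest, min(k, len(rest)))


def weighted_kappa(a, b, categories):
    """Cohen's kappa with linear disagreement weights.

    `a` and `b` are equal-length lists of ordinal ratings, e.g. a metric
    computed per website from the full crawl and from a sub-sample.
    `categories` lists every possible rating in ascending order.
    """
    n = len(a)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # Observed weighted disagreement between the paired ratings.
    observed = sum(abs(idx[x] - idx[y]) for x, y in zip(a, b)) / (n * (k - 1))
    # Weighted disagreement expected by chance from the two marginals.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(
        count_a[x] * count_b[y] * abs(idx[x] - idx[y])
        for x in categories
        for y in categories
    ) / (n * n * (k - 1))
    return 1.0 - observed / expected if expected else 1.0


if __name__ == "__main__":
    pages = ["home"] + [f"page-{i}" for i in range(9)]
    print(subsample_pages(pages, 0.2))  # home page plus one random page
    baseline = [1, 2, 3, 3, 2]          # hypothetical ratings, full crawl
    sampled = [1, 2, 3, 2, 2]           # hypothetical ratings, 20% sample
    print(weighted_kappa(baseline, sampled, [1, 2, 3]))
```

On the commonly used Landis and Koch scale, kappa values between 0.61 and 0.80 are read as substantial agreement; whether the paper applies exactly these thresholds is not stated in this review.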

Key findings

The study found that a sub-sample of just 20% of pages (including the home page) achieves substantial agreement with full-site evaluation results, significantly reducing the resources needed for large-scale assessments. The most commonly violated WCAG success criterion was 4.1.2 Name, Role, Value (922 of 987 websites non-conforming), followed by 1.4.3 Contrast (Minimum) (883 websites) and 2.4.4 Link Purpose (In Context) (833 websites). Of these, 4.1.2 and 2.4.4 are Level A requirements, while 1.4.3 is Level AA. Most websites had a severity rating between 2.9 and 3.4, which falls between the Serious and Critical categories and means that users with disabilities face barriers that partially or fully prevent access to fundamental content. Although home page accessibility consistency was high (at least 83.6%), only 30% of websites were fully consistent between the home page and their other pages, so evaluating the home page alone paints an overly optimistic picture. Accessibility violations repeated both within and between websites, which suggests shared templates, frameworks, or external libraries as common sources. Only 13% of all rule evaluations resulted in violations, but the severity of those violations was disproportionately high, showing that even a small number of barriers can substantially impair usability.
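
To make the severity rating concrete, here is a minimal sketch of how such a score could be derived from axe-core output. axe-core labels each violation with an impact of minor, moderate, serious, or critical; the numeric mapping of 1 to 4 and the use of a plain mean are assumptions of this example (chosen so that 3 corresponds to Serious and 4 to Critical, as the reported range implies), not the paper's confirmed formula. The function name `site_severity` is hypothetical.

```python
# Assumed ordinal mapping of axe-core impact labels to scores.
IMPACT_SCORE = {"minor": 1, "moderate": 2, "serious": 3, "critical": 4}


def site_severity(page_results):
    """Mean impact score over all violations found on a site's pages.

    `page_results` is a list of axe-core result objects (as dicts), each
    with a "violations" list whose entries carry an "impact" label.
    Returns None when the site has no scored violations.
    """
    scores = [
        IMPACT_SCORE[violation["impact"]]
        for page in page_results
        for violation in page.get("violations", [])
        if violation.get("impact") in IMPACT_SCORE
    ]
    return sum(scores) / len(scores) if scores else None


if __name__ == "__main__":
    # Hypothetical results for a two-page site.
    results = [
        {"violations": [{"id": "color-contrast", "impact": "serious"},
                        {"id": "button-name", "impact": "critical"}]},
        {"violations": [{"id": "link-name", "impact": "serious"}]},
    ]
    print(site_severity(results))  # mean of 3, 4 and 3
```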

Relevance

This research has significant practical implications for organizations that monitor web accessibility at scale, including government regulators enforcing the EU Web Accessibility Directive and researchers tracking accessibility trends. If a 20% page sample yields reliable results, organizations can run roughly five times as many evaluations with the same resources, which moves them closer to real-time Internet Health monitoring. The analogy the authors draw between accessibility barrier tracking and software vulnerability databases is particularly compelling: just as GitHub security scanning alerts developers to known vulnerabilities in dependencies, a similar system could alert developers when shared libraries or templates introduce accessibility barriers. For practitioners, the data reinforces that home-page-only testing is insufficient and that the most pervasive violations (missing accessible names, insufficient contrast, unclear link purpose) remain stubbornly common across the web. The proposed metrics for accessibility severity and consistency offer a more nuanced framework for benchmarking website accessibility than simple pass/fail counts.

Tags: web accessibility · large-scale evaluation · automated testing · internet health · WCAG · axe-core · page sampling

Standards referenced: WCAG 2.1