CodeA11y: Making AI Coding Assistants Useful for Accessible Web Development

Peya Mowar, Yi-Hao Peng, Jason Wu, Aaron Steinfeld, Jeffrey P. Bigham · 2025 · Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25) · doi:10.1145/3706598.3713335

Summary

This paper addresses a persistent problem: despite decades of accessibility standards and tools, ~96% of web pages contain accessibility violations. The authors argue that AI coding assistants like GitHub Copilot represent an untapped opportunity because developers already use them, so there is no adoption barrier. Through a formative study with 16 developers without accessibility training, they identified three key failures in AI-assisted coding: (1) developers never prompted AI for accessibility, with prompts centred on visual appearance ("add a grey patch") rather than semantic structure; (2) developers blindly accepted incomplete AI suggestions, such as pasting code with empty alt attributes or placeholder labels without replacing them with meaningful values; (3) developers could not verify whether their code was accessibility-compliant. Even the two participants who knew about WCAG never prompted Copilot for accessibility. Copilot occasionally generated accessible code incidentally (e.g., adding form labels from training data patterns) but also introduced new violations (e.g., insufficient hover contrast).

Based on these findings, the authors built CodeA11y, a GitHub Copilot Extension with a multi-agent architecture comprising three LLM agents (all using GPT-4o): a Responder Agent that generates accessibility-compliant code by default (prompted to assume the developer is unfamiliar with WCAG 2.1 AA), a Correction Agent that parses axe DevTools Accessibility Linter logs to surface relevant violations, and a Reminder Agent that identifies manual validation steps needed after pasting AI-generated code.
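The three-agent division of labour can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the `respond` function, the `AssistantReply` type, the order in which the agents run, and the `LLM` callable (any function that takes a system prompt and a user message and returns text) are all assumptions made for illustration. The linter output is passed through as opaque text because its format is not described in this summary.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: (system_prompt, user_message) -> model text.
LLM = Callable[[str, str], str]

# Illustrative prompt wording only; not the prompts used in the paper.
RESPONDER_PROMPT = (
    "You are a coding assistant. Assume the developer is unfamiliar with "
    "WCAG 2.1 AA and produce code that conforms to it by default."
)
CORRECTION_PROMPT = (
    "Given accessibility linter output, list the violations that are "
    "relevant to the developer's current request."
)
REMINDER_PROMPT = (
    "Given AI-generated code, list the manual checks the developer must "
    "still perform, such as replacing placeholder alt text or labels."
)


@dataclass
class AssistantReply:
    code_suggestion: str       # output of the Responder role
    violations: Optional[str]  # output of the Correction role, if a log was given
    reminders: str             # output of the Reminder role


def respond(llm: LLM, request: str, linter_log: Optional[str] = None) -> AssistantReply:
    """Run the three roles in sequence (the ordering here is an assumption)."""
    # Responder: accessible-by-default code, even if the request never mentions it.
    suggestion = llm(RESPONDER_PROMPT, request)

    # Correction: only runs when linter output is available.
    violations = None
    if linter_log:
        violations = llm(
            CORRECTION_PROMPT,
            f"Request: {request}\nLinter output:\n{linter_log}",
        )

    # Reminder: manual follow-up steps for the code that was just suggested.
    reminders = llm(REMINDER_PROMPT, suggestion)
    return AssistantReply(suggestion, violations, reminders)
```

Injecting the model as a plain callable keeps the sketch independent of any particular LLM client, and makes it easy to exercise with a stub function in place of a real model.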

Key findings

In a controlled evaluation with 20 novice developers, CodeA11y significantly improved accessibility outcomes compared to baseline GitHub Copilot across multiple tasks. For colour contrast, CodeA11y users scored significantly higher (mean 1.3 vs 0.75, p<0.05), with CodeA11y automatically ensuring contrasting button state colours. Form labelling improved significantly (mean 1.5 vs 0.88, p<0.05), with CodeA11y facilitating automatic addition of form labels. Alt-text quality improved (mean 0.7 vs 0.25, p<0.05), though 50% of CodeA11y users still submitted uninformative alt-texts, highlighting the limits of automation for content-dependent attributes. Critically, no CodeA11y user submitted completely empty alt attributes, meaning automated checkers would at least flag them, unlike the formative study where blank alts passed unnoticed. Satisfaction and ease-of-use ratings were comparable between CodeA11y and Copilot, indicating the accessibility features did not degrade the developer experience. Most revealing: only 4 of 20 participants in the evaluation study noticed that CodeA11y was providing accessibility guidance, demonstrating the tool's ability to "silently" improve code accessibility without developers needing to be conscious of it. However, many developers still dismissed manual validation reminders and prioritised visual appearance over accessibility suggestions.
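For context on the colour contrast task, the check itself is mechanical. The sketch below computes the contrast ratio as defined in WCAG 2.1 (relative luminance of each colour, then (L1 + 0.05) / (L2 + 0.05)) and compares it with the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names and the six-digit hex input format are illustrative choices, not part of CodeA11y.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.1 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour given as '#rrggbb'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colours; the lighter one goes in the numerator."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2.1 AA contrast minimum."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold
```

Black on white gives the maximum ratio of 21:1. A button's hover and focus states each need their own check, since every text and background pair is evaluated independently; this is how a default state can pass while a hover state fails.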

Relevance

This paper is essential reading for anyone working on organisational web accessibility strategy. It provides empirical evidence for a pragmatic approach: rather than trying to train every developer on accessibility (a strategy that has demonstrably failed over 20+ years), embed accessibility into the tools developers already use. The finding that developers never prompt AI for accessibility, even those who know about WCAG, confirms that accessibility awareness alone is insufficient; it must be built into default workflows. The multi-agent architecture (generate accessible code, detect violations, remind about manual steps) provides a replicable pattern for any AI coding tool. The tension between "silent" improvement and genuine understanding is important: CodeA11y improved code but did not necessarily educate developers, raising questions about long-term sustainability. The paper's framing, that AI assistants installed over 20 million times could "silently" start producing more accessible code, points to what is perhaps the most scalable intervention in web accessibility to date. The open-source tool (github.com/peyajm29/codea11y) is immediately actionable for development teams.

Tags: web accessibility · AI coding assistants · developer tools · WCAG · automated testing · GitHub Copilot · large language models · accessibility education · software development

Standards referenced: WCAG 2.1