Towards Automated Accessibility Report Generation for Mobile Apps

Amanda Swearngin, Jason Wu, Xiaoyi Zhang, Esteban Gomez, Jen Coughenour, Rachel Stukenborg, Bhavya Garg, Greg Hughes, Adriana Hilliard, Jeffrey P. Bigham, Jeffrey Nichols · 2024 · ACM Transactions on Computer-Human Interaction · doi:10.1145/3674967

Summary

This paper presents a system for automatically generating whole-app accessibility reports for mobile apps, addressing key limitations of existing accessibility scanning tools. The work begins with formative interviews with eight accessibility QA professionals at a large technology company, which revealed three core pain points: current tools scan only one screen at a time, which requires laborious manual navigation; they produce noisy results full of duplicates and false positives; and they provide no app-wide overview of issues. From these findings, the authors derived three design goals: reduce manual scanning effort, provide an overall accessibility report across the entire app, and enable noise reduction by letting users ignore issues. The system combines an app crawler (or a manual recording tool) with Apple's Accessibility Inspector to collect screenshots and scan results across an app, then uses ML models and heuristics to generate a summarized, de-duplicated report. The technical pipeline has four parts: a screen grouping model that clusters instances of the same screen to build an app storyboard, UI element matching heuristics that de-duplicate issues across screen instances, a pixel-based ignore feature for managing known issues, and a UI element detection model that filters out false positives reported for elements not visible on screen.
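The de-duplication step in this pipeline can be illustrated with a minimal sketch. It is not the authors' heuristic, which this summary does not specify: it assumes each issue carries a screen-group id, an issue type, and a pixel bounding box, and it treats two issues as duplicates when they share group and type and their boxes overlap strongly. The names `Issue`, `iou`, and `dedupe`, and the 0.8 overlap threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom) in screenshot pixels
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Issue:
    screen_group: str  # hypothetical id of the cluster of same-screen instances
    kind: str          # hypothetical issue type, e.g. "missing_label"
    box: Box           # bounding box of the affected UI element


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dedupe(issues: List[Issue], threshold: float = 0.8) -> List[Issue]:
    """Keep the first issue of each (group, kind, overlapping box) set."""
    kept: List[Issue] = []
    for issue in issues:
        is_duplicate = any(
            k.screen_group == issue.screen_group
            and k.kind == issue.kind
            and iou(k.box, issue.box) >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(issue)
    return kept


# Two captures of the same "home" screen report the same unlabeled button
# at slightly different pixel offsets; only one report should survive.
example_issues = [
    Issue("home", "missing_label", (0, 0, 100, 50)),
    Issue("home", "missing_label", (2, 1, 101, 50)),
    Issue("home", "small_target", (0, 0, 100, 50)),
    Issue("settings", "missing_label", (0, 0, 100, 50)),
]
unique_issues = dedupe(example_issues)
```

Matching on pixel geometry rather than view-hierarchy identifiers mirrors the pixel-based emphasis of the system, but a real matcher would likely also compare element appearance or text, since layouts shift between instances of a screen.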

Key findings

The screen grouping model achieved 96.9% accuracy (88.8% F1-score) on a new dataset of 226K screens from 6,332 apps, which the authors report as an improvement over prior baselines. The UI element matching heuristics achieved 97% accuracy (98.2% F1-score) on 138K element correspondences. In a user study with 19 accessibility engineers and QA testers, 13 preferred the automated report (AC) over single-screen (SS) or multi-screen (MS) manual scanning. Participants were significantly more satisfied with audit lists created using the AC tool (mean 4.27/5) than with those created using MS (3.61) or SS (3.50). The AC was rated best for discovering issues quickly (14 participants), finding the most common issues (14 participants), and finding the most important issues (7 participants). Participants using the manual scanning modes scanned only 3-5 screens in the allotted time, while the crawler covered 39-63 screens per app. Experience reports from five internal developers confirmed that the system found real, high-impact issues, including missing Dynamic Type support and incorrect accessibility labels.
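Accuracy and F1-score can diverge in either direction, as they do for the two components reported above. One plausible explanation, which this summary does not confirm, is class imbalance in the evaluation data. The sketch below computes both metrics from confusion-matrix counts; the counts are invented for illustration and are not the paper's data.

```python
from typing import Tuple


def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (accuracy, F1) for a binary classifier from its confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Toy case 1: positives are rare, so the many true negatives lift accuracy
# above F1 (F1 ignores true negatives entirely).
rare_positive = accuracy_and_f1(tp=80, fp=10, fn=10, tn=900)

# Toy case 2: positives dominate, so the few negatives drag accuracy
# slightly below F1.
common_positive = accuracy_and_f1(tp=900, fp=15, fn=15, tn=70)
```

Because F1 depends only on true positives, false positives, and false negatives, it is the more informative figure when one class is much rarer than the other.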

Relevance

This research directly addresses a major bottleneck in mobile accessibility practice: the gap between knowing that accessibility testing should be done and actually doing it efficiently at scale. Current tools like Accessibility Inspector and Accessibility Scanner require tedious screen-by-screen manual operation, which discourages regular use and makes comprehensive auditing impractical for large apps. By automating data collection through crawling and generating summarized reports with de-duplicated, prioritized issues, this system makes accessibility testing more actionable for development teams. The pixel-based approach is particularly significant because it works on apps with incomplete or missing view hierarchies, which are exactly the apps most in need of accessibility repair. The ignore feature supports regression monitoring, enabling teams to track accessibility over time. For organizations building accessibility into their development workflows, this work points to a practical path toward continuous accessibility monitoring integrated into CI/CD pipelines.

Tags: automated testing · accessibility testing · mobile accessibility · app crawling · machine learning · computer vision · quality assurance · accessibility metadata · screen readers

Standards referenced: WCAG