Bridging the Gap between Automated Intervention and Actual User Experience: A Mixed-Methods Study on Mobile Accessibility Issues for Screen Reader Users

Syed Fatiul Huq, Ziyao He, Yirui He, Sam Malek · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3791293

Summary

This CHI 2026 paper by Huq, He, He, and Malek (UC Irvine) argues that existing automated accessibility testing tools for mobile apps do not faithfully represent what blind screen reader users actually experience, and proposes a user-aware categorisation to bridge the gap. The authors note that WCAG was designed for the web and translates imperfectly to native mobile apps, and that prior automated tools cover only roughly 20% of the issue types users actually report. The study uses a mixed-methods design structured around two research questions: RQ1, what automated interventions exist for mobile screen reader accessibility; and RQ2, how the issues those interventions target actually manifest to, and affect, users.

For RQ1, the team ran a PRISMA-style systematic literature review across ACM DL, IEEE Xplore, ScienceDirect, Scopus, and Web of Science, screening 367 papers down to 31 published between 2017 and 2025. They classify the interventions into four techniques (automated crawlers, automation support, label generators, and UI annotators) and group the 22 issue types these tools address into four categories (labelling, navigation, activation, and dynamic change), mapping each issue type to WCAG 2.2 success criteria.

For RQ2, they conducted 20 user studies with blind participants, recruited via Fable Tech Labs and Program-L, on four open-source Android apps (Money Manager, uHabits, AnkiDroid, and Aard2). The sessions followed a think-aloud protocol and were analysed with hybrid deductive/inductive thematic analysis, coding each issue for its impact, how users perceived it, users' suggestions, and the workarounds they used. The synthesis yields the Mobile Content Accessibility Guidelines (MCAG), organised under WCAG 2.2's four principles (Perceivable, Operable, Understandable, Robust), with 11 guidelines covering at least 24 concrete issue types.

Key findings

The systematic literature review revealed a clear shift in automated interventions from static source-code analysis toward dynamic, runtime, and screenshot-based approaches, with LLMs emerging as a detection oracle for subjective issues such as inadequate descriptions and unnatural navigation order.

The user studies surfaced issue types that automated tools do not capture, most notably a new feedback-related category comprising missing action feedback, inadequate instructions, and inadequate progress indication. Navigation issues were the most frequent (89 instances), while activation issues were the most severe, with 43% of them causing blockers. The most severe problems came from custom views and WebViews: unfocusable elements, non-default gestures such as long-press without progress cues, and state changes conveyed only visually. Users often perceived several distinct technical issues as a single usability problem. When label, focus-order, and feedback defects coincided on the same component, the combined effect was a blocker even if each defect alone was minor. Users preferred developer-written labels over auto-generated ones and frequently suggested non-visual alternatives, such as tabular layouts for charts, text entry instead of sliders, earcons, and haptic feedback. The authors conclude that automation is effective for definitive rules (missing label, unfocusable element) but cannot yet reliably detect subjective issues (label quality, focus intuitiveness, feedback adequacy) without human involvement or LLM-based inference.

Relevance

For mobile accessibility practitioners, QA testers, and developer-tooling teams, this paper delivers two practical artefacts: a taxonomy of automatically detectable issue types mapped to WCAG 2.2, and the MCAG guideline set, framed specifically around the screen reader user experience on mobile. Practitioners can use MCAG directly as a checklist for TalkBack/VoiceOver testing, especially in areas WCAG covers poorly: state descriptions, alternative interaction mechanisms for custom views, feedback adequacy, and focus consistency across screens. Product teams relying on Accessibility Scanner or Espresso tests should read this as a caution. Automated tools cover only around 20% of user-reported issue types, and the most severe issues (custom views, dynamic changes, missing feedback) are systematically under-detected. The paper also provides empirical justification for investing in user testing with screen reader users (e.g., via Fable) and in LLM-based detection of subjective issues. Limitations include the Android-only focus and the small sample (four small open-source apps, 13 North American testers), so findings may not generalise to large commercial apps or to iOS. Nonetheless, MCAG is an actionable starting point that the field has been missing.

Tags: mobile accessibility · screen readers · TalkBack · automated accessibility testing · software accessibility · accessibility user testing · Android accessibility · WCAG · systematic literature review · blindness and low vision

Standards referenced: WCAG 2.2