Automated Generation of Accessible PDF

Shaban Zulfiqar, Safa Arooj, Umar Hayat, Suleman Shahid, Asim Karim · 2020 · Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2020) · doi:10.1145/3373625.3418045

Summary

This demonstration paper presents AGAP (Automated Generation of Accessible PDF), an open-source tool that automates the generation of accessible PDFs from LaTeX source files while keeping the authoring process itself accessible to people with vision impairments (PVIs). LaTeX is the dominant document preparation system in STEM fields, yet the PDFs it generates are not inherently accessible: by default, LaTeX does not produce the structured, tagged PDFs that screen readers need for navigation. The "accessibility" package can generate tagged PDFs and the "Axessibility" package handles narration of mathematical content, but together they cover only part of an 86-point accessibility checklist derived from WCAG 2.0 and Section 508 guidelines.

AGAP is built as an extension of ALAP (Accessible LaTeX-based Authoring and Presentation), an existing open-source desktop system that makes LaTeX editing accessible to PVIs. AGAP adds a "PDF accessibility mode" (toggled via CAPS+A) that analyzes the document on compilation and reports accessibility violations as warning messages, with line numbers, in the problem window. When the text-to-speech (TTS) engine is enabled, AGAP also narrates the violations aloud. The modified LaTeX parser handles accessibility violations through the same mechanism it uses for errors: they are raised as exceptions that are given priority but do not halt the build. Violations that AGAP detects include use of non-recommended fonts or watermarks, missing alt text (tooltips) for images, missing labels on form fields, broken URLs, and blank table cells.
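To make the compile-time idea concrete, the following is a minimal Python sketch of a checker that scans LaTeX source and returns line-numbered warnings without stopping anything. It is not AGAP's implementation, which the paper describes only at a high level. The `Violation` type, the `check_accessibility` function, and the three rules are assumptions chosen for illustration; in particular, treating `\pdftooltip` on the same line as the marker for image alt text is a simplification.

```python
import re
from typing import List, NamedTuple


class Violation(NamedTuple):
    line: int      # 1-based line number in the LaTeX source
    message: str   # text that a problem window (or TTS engine) could present


def check_accessibility(source: str) -> List[Violation]:
    """Return warnings for the source; never raises, so a build could continue."""
    violations: List[Violation] = []

    # Illustrative rule: the "accessibility" package should be loaded.
    if not re.search(r"\\usepackage(\[[^\]]*\])?\{accessibility\}", source):
        violations.append(Violation(1, "accessibility package not imported"))

    in_tabular = False
    for number, text in enumerate(source.splitlines(), start=1):
        # Illustrative rule: an image needs a tooltip on the same line.
        if "\\includegraphics" in text and "\\pdftooltip" not in text:
            violations.append(Violation(number, "image has no alt text (tooltip)"))

        if "\\begin{tabular}" in text:
            in_tabular = True
            continue
        if "\\end{tabular}" in text:
            in_tabular = False
            continue

        # Illustrative rule: a table row must not contain an empty cell.
        if in_tabular and "&" in text:
            row = text.split("\\\\")[0]  # drop the trailing row terminator
            if any(cell.strip() == "" for cell in row.split("&")):
                violations.append(Violation(number, "blank table cell"))

    return violations


# Hypothetical input document used only to exercise the sketch.
SAMPLE = r"""\documentclass{article}
\begin{document}
\includegraphics{plot.png}
\begin{tabular}{ll}
a & \\
b & c \\
\end{tabular}
\end{document}"""

for violation in check_accessibility(SAMPLE):
    print(f"line {violation.line}: warning: {violation.message}")
```

Returning a list of warnings rather than raising mirrors the behavior described above, where violations are surfaced to the author but the document still builds.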

Key findings

Of the 86 accessibility points identified from the combined WCAG 2.0 and Section 508 checklists, LaTeX handles approximately 20 by default, 48 are not applicable to PDF/LaTeX contexts (e.g., OCR), and AGAP addresses the remaining 18. In a comparative evaluation, a proficient sighted LaTeX user generated PDFs from the same source code using both TeXlipse (a common desktop LaTeX editor) and ALAP with AGAP enabled. TeXlipse produced no accessibility prompts, leaving all violations unaddressed. AGAP generated approximately four compile-time warnings covering issues such as missing image tooltips, the accessibility package not being imported, missing form field labels, broken URLs, and blank table cells, all of which the participant was able to fix. When both resulting PDFs were evaluated with the Adobe Acrobat Accessibility Checker across its four categories (Document; Page Content; Forms, Tables & Lists; and Alternate Text & Headings), the TeXlipse-generated PDF had 23 accessibility issues while the AGAP-generated PDF had 12, a reduction of about 48%. AGAP is usable by both sighted users (who can toggle TTS off via CAPS+Q) and PVIs (who receive spoken feedback), and its keyboard-shortcut-based interaction means no mouse is required.
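The figures above can be cross-checked with a few lines of arithmetic. The short Python sketch below uses only the numbers reported in this review; the variable names are illustrative.

```python
# Checklist breakdown as reported in the review.
total_points = 86
handled_by_latex = 20
not_applicable = 48
addressed_by_agap = 18

# The three groups should account for the whole checklist.
assert handled_by_latex + not_applicable + addressed_by_agap == total_points

# Issue counts from the Adobe Acrobat Accessibility Checker comparison.
texlipse_issues = 23
agap_issues = 12

issues_removed = texlipse_issues - agap_issues
reduction = issues_removed / texlipse_issues

print(f"{issues_removed} fewer issues, a {reduction:.1%} reduction")
# prints "11 fewer issues, a 47.8% reduction"
```

The reduction of roughly 48% therefore corresponds to 11 of the 23 issues found in the TeXlipse-generated PDF.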

Relevance

AGAP addresses a critical gap in academic accessibility: STEM researchers produce millions of LaTeX documents annually, and the resulting PDFs are overwhelmingly inaccessible to screen reader users. This creates a significant barrier for blind and low-vision researchers, students, and reviewers (as documented in the Shinohara et al. paper on doctoral students in computing). By catching accessibility violations at compile time, when authors can most easily fix them, AGAP shifts accessibility from a post-production remediation task to an integrated part of the authoring workflow. Surfacing violations as compiler warnings that do not stop the build is a pragmatic choice: authors are informed and guided but not blocked from generating their document. For accessibility practitioners, the finding that a standard LaTeX editor provided no accessibility feedback at all helps explain why inaccessible PDFs remain the default in academia. Limitations include the short demo format (3 pages), the single-participant evaluation conducted with a sighted user, the fact that AGAP itself addresses only 18 of the 86 checkpoints (the rest being handled by LaTeX by default or out of scope), and the 12 issues that remained in the AGAP-generated PDF.

Tags: PDF accessibility · document accessibility · screen readers · STEM accessibility · automated testing · blindness and low vision · web accessibility · software development

Standards referenced: WCAG 2.0 · Section 508