
Analyzing Visual Layout for a Non-Visual Presentation-Document Interface

Tatsuya Ishihara, Hironobu Takagi, Takashi Itoh, Chieko Asakawa · 2006 · Proceedings of the 8th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '06) · doi:10.1145/1168987.1169016

Summary

This paper from IBM Japan tackles a fundamental accessibility challenge: presentation documents (such as PowerPoint slides) convey information primarily through visual layout, which makes them inherently difficult for blind screen reader users to understand. Screen readers typically read slide objects in z-order (the stacking order), which rarely reflects the meaningful visual structure. The authors propose a method that automatically analyses the visual layout of presentation slides and generates metadata describing three types of relationships between objects: parent-child relationships (where one object visually contains another), sibling relationships (where objects are closely located and aligned), and directional relationships (where arrows connect objects or groups). The analysis uses a five-step pipeline: grouping overlapping objects by area ratios, detecting sibling groups using a graph-cut algorithm, determining distance thresholds using Otsu's method, grouping nearby objects whose distance falls below the threshold, and finally detecting arrow source-destination relationships. The generated metadata is then presented through DocExplorer, a prototype adaptive interface that displays the slide's visual structure as a navigable tree view, an interface pattern blind users already know from file explorers and similar applications. DocExplorer synchronises with the original OpenOffice.org Impress application, allowing both reading and editing of presentation documents.
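To make two of the pipeline steps more concrete, here is a minimal Python sketch of an area-ratio containment test and of Otsu's method applied to pairwise object distances. It is an illustration only, not the authors' implementation. The `Box` type, the `containment_ratio` and `gap` helpers, the choice of edge-to-edge distance, and the sample coordinates are all assumptions made for this example. The graph-cut and arrow-detection steps are not covered.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a slide object (hypothetical type)."""
    name: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def area(self) -> float:
        return self.w * self.h


def containment_ratio(inner: Box, outer: Box) -> float:
    """Fraction of inner's area that lies inside outer (0.0 to 1.0)."""
    ix = max(0.0, min(inner.x + inner.w, outer.x + outer.w) - max(inner.x, outer.x))
    iy = max(0.0, min(inner.y + inner.h, outer.y + outer.h) - max(inner.y, outer.y))
    return (ix * iy) / inner.area if inner.area > 0 else 0.0


def gap(a: Box, b: Box) -> float:
    """Edge-to-edge distance between two boxes (0 if they touch or overlap).
    Assumption: the review does not say which distance measure the paper uses."""
    dx = max(0.0, max(a.x, b.x) - min(a.x + a.w, b.x + b.w))
    dy = max(0.0, max(a.y, b.y) - min(a.y + a.h, b.y + b.h))
    return math.hypot(dx, dy)


def otsu_threshold(values: list[float]) -> float:
    """Otsu's method on raw 1-D samples: choose the split of the sorted
    values that maximises between-class variance, and return the midpoint
    between the two values on either side of that split."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(xs)
    best_score, best_cut = -1.0, xs[0]
    left_sum = 0.0
    for i in range(1, n):
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no valid split between equal values
        w0, w1 = i / n, (n - i) / n
        m0, m1 = left_sum / i, (total - left_sum) / (n - i)
        score = w0 * w1 * (m0 - m1) ** 2  # between-class variance
        if score > best_score:
            best_score, best_cut = score, (xs[i - 1] + xs[i]) / 2
    return best_cut


# Illustrative data: three boxes in a tight row and one far to the right.
boxes = [
    Box("A", 0, 0, 10, 10),
    Box("B", 15, 0, 10, 10),
    Box("C", 30, 0, 10, 10),
    Box("D", 200, 0, 10, 10),
]
distances = [gap(a, b) for a, b in combinations(boxes, 2)]
threshold = otsu_threshold(distances)
close_pairs = [(a.name, b.name) for a, b in combinations(boxes, 2)
               if gap(a, b) < threshold]
print(threshold)    # 90.0
print(close_pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

With this sample data the threshold separates the short gaps inside the row from the long gaps to the distant box, so A, B, and C end up as candidates for one group while D stays on its own. A `containment_ratio` close to 1.0 would mark a candidate parent-child pair. The cut-off the paper applies to that ratio is not given in this review.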

Key findings

The visual analysis method was evaluated on two datasets of presentation slides collected from the web. On Dataset 1 (90 slides, 474 arrows, curated by the authors), the system correctly analysed 71% of slides completely and 83% of individual arrows. On Dataset 2 (37 slides, 264 arrows, with ground truth from independent evaluators), accuracy was 41% for complete slides and 78% for individual arrows. Performance decreased with slide complexity: slides with fewer than 10 objects achieved 96% slide-level accuracy, while those with more than 20 objects dropped to 67%. The gap between slide-level and arrow-level accuracy indicates that most errors involve only a small number of misdetected relationships per slide, so minor manual corrections could fix most issues. The tree view interface required no training for blind users because it uses a familiar navigation paradigm. DocExplorer also supported multiple navigation orders (x-order, y-order) that reflect the actual visual layout rather than the arbitrary z-order used by screen readers. The system was implemented in Java using OpenOffice.org's UNO API bridge.
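The difference between z-order and layout-based navigation can be illustrated with a short Python sketch. It assumes that x-order and y-order simply mean sorting by left edge and by top edge respectively, which the review does not spell out. The `SlideObject` type, the `reading_order` function, and the sample slide are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlideObject:
    """Minimal stand-in for a slide shape (hypothetical type)."""
    name: str
    x: float  # left edge
    y: float  # top edge
    z: int    # stacking index, lowest is drawn first


def reading_order(objects: list[SlideObject], order: str = "z") -> list[str]:
    """Return object names in the requested navigation order.
    Assumption: x-order sorts left to right, y-order sorts top to bottom,
    with the other coordinate used only to break ties."""
    keys = {
        "z": lambda o: o.z,
        "x": lambda o: (o.x, o.y),
        "y": lambda o: (o.y, o.x),
    }
    if order not in keys:
        raise ValueError(f"unknown order: {order!r}")
    return [o.name for o in sorted(objects, key=keys[order])]


# Illustrative slide: the stacking order is unrelated to the visual layout.
slide = [
    SlideObject("Title", x=50, y=10, z=2),
    SlideObject("Step 1", x=20, y=100, z=3),
    SlideObject("Step 2", x=200, y=100, z=1),
    SlideObject("Footer", x=50, y=300, z=0),
]
print(reading_order(slide, "z"))  # ['Footer', 'Step 2', 'Title', 'Step 1']
print(reading_order(slide, "y"))  # ['Title', 'Step 1', 'Step 2', 'Footer']
print(reading_order(slide, "x"))  # ['Step 1', 'Title', 'Footer', 'Step 2']
```

In this example a screen reader following z-order would announce the footer first and the title third, while y-order follows the slide from top to bottom.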

Relevance

This research addresses a problem that persists today: visual diagrams in presentations remain largely inaccessible to screen reader users. While modern presentation tools have improved somewhat (PowerPoint now has an accessibility checker and reading order pane), the core issue, that diagrams, flowcharts, and arrow-connected objects lack semantic structure, is still prevalent. The paper's approach of automatically inferring visual relationships and presenting them as navigable tree structures was ahead of its time and anticipated modern efforts in AI-based document understanding. The work is particularly significant because it comes from IBM's accessibility research team (including Chieko Asakawa, a pioneer in web accessibility), and it moves beyond simply requiring authors to add alternative text toward automated structural analysis. For practitioners, it reinforces that presentation accessibility requires attention to diagram structure, not just reading order and alt text.

Tags: screen readers · presentation accessibility · diagram accessibility · visual layout analysis · metadata · non-visual access · document accessibility · alternative interface

Standards referenced: SVG 1.1 · CSS 2.1 · OASIS ODF