AltCanvas: A Tile-Based Editor for Visual Content Creation with Generative AI for Blind or Visually Impaired People

Seonghee Lee, Maho Kohga, Steve Landau, Sile O'Modhrain, Hari Subramonyam · 2024 · Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/3663548.3675600

Summary

This paper presents AltCanvas, a generative AI-powered illustration tool that enables blind or visually impaired (BVI) users to create visual content through a novel tile-based interface. The system addresses a gap between two existing approaches: accessible line-by-line drawing tools (suitable for simple graphics but tedious for expressive artwork) and text-to-image AI tools (capable of expressive output but lacking precise compositional control). AltCanvas combines the two through a dynamic tile-based interface in which each tile represents one object in a visual scene. Users construct scenes incrementally by adding objects via voice commands, then edit each object's position, size, and arrangement with keyboard shortcuts while receiving speech descriptions and sonification feedback. The tile view provides an alternative spatial representation of the canvas: users navigate directionally through tiles to understand relative object positions without needing to comprehend absolute coordinates. The system uses GPT-4o for image generation and description, with separate prompt pipelines for tactile graphics (following BANA guidelines for clear outlines and simplified details) and color illustrations. The research involved 14 BVI participants across three iterative studies: a formative study with 5 experienced blind visual content creators, a preliminary design feedback study with 6 participants, and a final usability evaluation with 8 participants.
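The directional tile navigation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `TileView` class, its method names, and the spoken strings are all hypothetical, and it assumes each tile holds exactly one object and that new objects are placed next to an existing one.

```python
from dataclasses import dataclass, field

# Hypothetical direction offsets as (row delta, column delta).
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class TileView:
    """Hypothetical tile grid: each occupied cell holds one scene object."""
    tiles: dict = field(default_factory=dict)  # (row, col) -> object label
    cursor: tuple = (0, 0)

    def _find(self, label):
        """Return the grid position of the tile holding `label`."""
        for pos, name in self.tiles.items():
            if name == label:
                return pos
        raise KeyError(label)

    def add_object(self, label, relative_to=None, direction=None):
        """Place the first object at the origin, later ones beside an existing tile."""
        if not self.tiles:
            pos = (0, 0)
        else:
            anchor = self._find(relative_to)
            dr, dc = DIRECTIONS[direction]
            pos = (anchor[0] + dr, anchor[1] + dc)
            if pos in self.tiles:
                raise ValueError(f"tile {direction} of {relative_to} is occupied")
        self.tiles[pos] = label
        self.cursor = pos  # the cursor follows the newly added object
        return pos

    def move(self, direction):
        """Move the cursor one tile and return the text to be spoken."""
        dr, dc = DIRECTIONS[direction]
        target = (self.cursor[0] + dr, self.cursor[1] + dc)
        if target not in self.tiles:
            return f"Empty tile ({direction})"  # the cursor stays where it was
        self.cursor = target
        return self.tiles[target]


view = TileView()
view.add_object("house")
view.add_object("tree", relative_to="house", direction="right")
view.add_object("sun", relative_to="house", direction="up")
print(view.move("down"))   # from the sun down to the house
print(view.move("right"))  # from the house right to the tree
```

The sketch stores only relative grid positions, which mirrors the point that users reason about which object is next to which rather than about canvas coordinates.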

Key findings

The formative study with experienced BVI content creators revealed five key design requirements: precise compositional control without manual coordinate calculation, easy tactile graphic compatibility, dimension and feature-based editing, minimized print iterations, and verbal/auditory dialogic feedback. In the usability evaluation, participants took an average of 10.4 minutes to complete simple scenes (3 objects) and 21.5 minutes to complete complex scenes (5 objects with 8 edit interactions). Usability ratings were high: voice commands (avg 6.5/7), tool understandability (avg 6.5/7), ease of editing (avg 6.3/7), and tile-based interface intuitiveness (avg 5.8/7). Sonification was particularly effective: participants used distinct sounds to detect canvas edges, object overlaps, and spatial relationships, with one participant noting "sounds are like colors to us." Participants used image descriptions strategically throughout editing, requesting them more often after edit operations and toward task completion. The tile-based paradigm supported spatial cognition, with all participants able to accurately articulate object positions. Printed tactile graphics aligned well with participants' mental models, with one stating the output was "almost precisely how I imagined it would be placed." However, experienced creators who wanted precise control found the unpredictability of AI-generated outputs frustrating.

Relevance

AltCanvas advances accessible creative tools by demonstrating how generative AI can be combined with constructive editing approaches to give BVI users genuine agency over visual content creation. For accessibility practitioners and tool designers, the tile-based interaction paradigm offers a transferable model for making spatial interfaces accessible; it could extend to presentation software, educational materials, game development, and urban planning visualizations. The sonification design principles (edge collision sounds, directional navigation tones, frequency-mapped size changes) provide a practical vocabulary for non-visual spatial editing. The dual rendering capability, which generates both tactile graphics and color illustrations from the same composition, is particularly valuable for BVI professionals who need to produce content for diverse audiences. The research also highlights the tension between AI-generated creative output and user agency, suggesting that hybrid approaches combining generative capabilities with fine-grained constructive control best serve users with clear creative visions.
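Two of the sonification principles named above, frequency-mapped size changes and edge collision sounds, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's design: the function names, the 220 to 880 Hz range, the size bounds, the canvas dimensions, and the choice that larger objects sound higher are all hypothetical.

```python
def size_to_frequency(size, min_size=50.0, max_size=500.0,
                      low_hz=220.0, high_hz=880.0):
    """Map an object size to a tone frequency in Hz.

    Assumption: larger objects get a higher pitch, and pitch is
    interpolated on a logarithmic scale so that equal size steps
    produce equal musical intervals.
    """
    clamped = max(min_size, min(max_size, size))
    t = (clamped - min_size) / (max_size - min_size)  # 0.0 .. 1.0
    return low_hz * (high_hz / low_hz) ** t


def edge_hits(x, y, width, height, canvas_w=1000.0, canvas_h=1000.0):
    """Return the canvas edges that an object's bounding box touches or crosses.

    (x, y) is the top-left corner of the box; an editor could play a
    distinct collision sound for each edge in the returned list.
    """
    hits = []
    if x <= 0:
        hits.append("left")
    if y <= 0:
        hits.append("top")
    if x + width >= canvas_w:
        hits.append("right")
    if y + height >= canvas_h:
        hits.append("bottom")
    return hits


print(size_to_frequency(275.0))       # midpoint size, one octave above low_hz
print(edge_hits(960, 960, 50, 50))    # box crossing the right and bottom edges
```

Clamping the size before mapping keeps the tone inside an audible, predictable range even when an object is resized past the assumed bounds.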

Tags: visual impairment · generative AI · image creation · tactile graphics · sonification · accessible design tools · text-to-image · spatial cognition

Standards referenced: BANA Guidelines for Tactile Graphics