Investigating 'Touch and Talk' for Blind and Low Vision People: Science Communication Assistance Through Exploring Multiple Tactile Objects

Ayaka Tsutsui, Xiyue Wang, Hironobu Takagi, Chieko Asakawa · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3663547.3746373

Summary

This paper investigates how interactive dialogue and multiple tactile 3D models can enhance science communication for blind and low vision (BLV) users. The research addresses a gap in existing tactile learning tools: most Interactive 3D Models (I3Ms) rely on pre-defined audio commands and fixed interaction flows, limiting users to passive consumption rather than active exploration. The authors propose a "Touch and Talk" system that combines multiple tactile objects with voice-based conversational interaction powered by GPT-4.

The research comprises two studies. Study 1 conducted semi-structured interviews with 22 tactile guidance experts (science museum communicators, tactile museum staff, teachers of the visually impaired (TVIs), and accessibility researchers) to identify effective instructional strategies for tactile exploration. Key findings included the importance of storytelling narration (guided sequential explanations), question-based interaction (user-driven inquiry), and adaptive communication strategies that adjust to individual learner needs, visual conditions, and familiarity levels.

Study 2 employed a Wizard-of-Oz technology probe with 10 BLV participants (7 totally blind, 3 low vision) exploring two scientific themes: earthquake and tsunami mechanisms, and the Hayabusa2 asteroid exploration mission. Each theme used three tactile 3D models that participants could touch, rotate, and disassemble while receiving voice-based explanations. The system offered two interaction modes, storytelling narration (sequential guided explanation) and question-based interaction (open-ended Q&A), that participants could switch between freely. A human wizard monitored interactions and could intervene when the AI system produced errors, following a strict protocol for handling uncertainty.
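The two-mode interaction described for Study 2 can be pictured as a small state machine. The sketch below is only an illustration of that idea, not the authors' implementation: the class and method names (`Session`, `switch`, `ask`), the `answer_fn` stand-in for the language model, and the `wizard_review` hook standing in for the human wizard are all hypothetical, and the rule that questions are only accepted in question mode is an assumption made to keep the example simple.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Mode(Enum):
    STORYTELLING = "storytelling"  # sequential guided explanation
    QUESTION = "question"          # open-ended Q&A


@dataclass
class Session:
    """Hypothetical session state; none of these names come from the paper."""
    script: List[str]                 # ordered narration steps for one theme
    mode: Mode = Mode.STORYTELLING
    step: int = 0                     # index of the next narration step

    def switch(self, mode: Mode) -> None:
        # Participants could switch modes freely, so there is no guard here.
        self.mode = mode

    def next_narration(self) -> Optional[str]:
        """Return the next narration step, or None if not narrating or finished."""
        if self.mode is not Mode.STORYTELLING or self.step >= len(self.script):
            return None
        line = self.script[self.step]
        self.step += 1
        return line

    def ask(
        self,
        question: str,
        answer_fn: Callable[[str], str],
        wizard_review: Optional[Callable[[str, str], str]] = None,
    ) -> str:
        """Answer a question; an optional wizard hook may replace the draft answer."""
        if self.mode is not Mode.QUESTION:
            raise ValueError("switch to question mode before asking")  # assumption
        draft = answer_fn(question)
        if wizard_review is not None:
            # Stand-in for the human wizard correcting an erroneous response.
            return wizard_review(question, draft)
        return draft


if __name__ == "__main__":
    session = Session(script=["Step one of the story.", "Step two of the story."])
    print(session.next_narration())
    session.switch(Mode.QUESTION)
    print(session.ask("What is this part?", lambda q: "A placeholder answer."))
    session.switch(Mode.STORYTELLING)
    print(session.next_narration())  # narration resumes where it left off
```

Keeping the narration index separate from the mode means a switch to questions and back does not lose the participant's place in the story, which is one plausible way to support free switching.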

Key findings

Nine of ten participants preferred storytelling narration for reducing cognitive load and supporting structured learning, though one participant gravitated toward question-based interaction for more autonomous exploration. All participants used spontaneous verbal feedback during narration ("Ah, I see," "Thank you," "That's interesting"), demonstrating real-time engagement.

Significant challenges emerged with spatial guidance: when the system said "move to the right," participants often overshot, and corrective instructions caused further deviation and stress. Participants described this as "timing stress": the system's spoken instructions did not align with the rhythm of their hand movements, disrupting concentration. Five participants explicitly noted transcription and GPT-4 response delays as disruptive. Some participants adapted by slowing their exploration to match the system's pace, reflecting a strong desire to maintain agency over interaction flow.

Understanding scores improved measurably across both themes. For the earthquake theme, average understanding increased from 2.8 to 4.8 (on a 7-point scale); for Hayabusa2, scores rose from 2.0 to 4.4. Interest also increased, from 5.4 to 5.8 for earthquakes and 4.4 to 6.4 for Hayabusa2.

Multiple tactile models within a single theme helped participants form integrated mental representations, though some struggled to understand cross-model relationships without clear sequencing and spatial framing. Participants who were totally blind required additional structural guidance and spatial cues compared to low-vision participants. Braille labels on models proved ineffective for 5 of 10 participants who could not read braille, with one mistaking braille for decorative features.

The paper proposes six design implications including dynamic personalization through onboarding, use of analogies for scale, real-time hand tracking, adaptive pacing, multimodal feedback for disassemblable models, and reflection prompts.
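To make the reported rating changes easier to compare, the short sketch below computes the before/after difference for each theme and measure. It uses only the mean values quoted above; the dictionary layout and the `gains` helper are illustrative choices, not something taken from the paper.

```python
# Mean self-reported ratings (before, after) as quoted in this summary.
REPORTED = {
    ("earthquake", "understanding"): (2.8, 4.8),
    ("earthquake", "interest"): (5.4, 5.8),
    ("hayabusa2", "understanding"): (2.0, 4.4),
    ("hayabusa2", "interest"): (4.4, 6.4),
}


def gains(reported):
    """Return the after-minus-before difference for each (theme, measure) pair."""
    return {
        key: round(after - before, 1)  # round away floating-point noise
        for key, (before, after) in reported.items()
    }


if __name__ == "__main__":
    for (theme, measure), gain in gains(REPORTED).items():
        before, after = REPORTED[(theme, measure)]
        print(f"{theme:<10} {measure:<13} {before} -> {after} (gain {gain})")
```

On these figures, understanding rose by 2.0 points for the earthquake theme and 2.4 for Hayabusa2, while interest rose by 0.4 and 2.0 respectively. Because only means are quoted here, the sketch says nothing about variance or statistical significance.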

Relevance

This research has significant implications for accessible science education and museum experiences. It demonstrates that combining multiple tactile objects with conversational AI creates richer learning experiences than single-model approaches, but highlights critical design challenges that must be addressed for effective deployment. The "timing stress" finding is particularly important for any voice-guided tactile system: audio instructions must synchronize with users' natural exploration rhythms rather than imposing the system's pace. This parallels broader accessibility principles about respecting user agency and pace. The paper's finding that braille labels were ineffective for half the participants is a practical reminder that BLV populations are diverse: design assumptions about braille literacy can exclude many users.

The six design recommendations provide a concrete roadmap for building autonomous Touch and Talk systems, including a three-stage technical progression from Wizard-of-Oz to camera-based finger tracking to fully adaptive systems. For practitioners working on accessible exhibits, educational tools, or AI-powered guidance systems, this paper offers evidence-based guidance on balancing structured narration with user-driven exploration, managing the temporal dynamics of audio-tactile interaction, and supporting the full spectrum of visual conditions within the BLV community.

Tags: tactile graphics · blind and low vision · science communication · 3D printed models · multimodal interaction · Wizard-of-Oz · spatial orientation · voice interaction · tactile exploration · assistive technology