Direct or Immersive? Comparing Smartphone-based Museum Guide Systems for Blind Visitors
Xiyue Wang, Seita Kayukawa, Hironobu Takagi, Giorgia Masoero, Chieko Asakawa · 2024 · Proceedings of the 21st International Web for All Conference (W4A) · doi:10.1145/3677846.3677856
Summary
This paper presents the first head-to-head comparison of two smartphone-based museum guide paradigms for blind visitors: a "direct" system that uses turn-by-turn navigation with VoiceOver-controlled audio descriptions, and an "immersive" system that uses spatialized sound navigation with automatically playing narration and ambient audio. The study was conducted at Miraikan, Japan's National Museum of Emerging Science and Innovation, with seven totally blind participants (ages 20 to 67) who experienced both systems across two exhibitions, one on biology and one on earth science, each featuring tactile exhibits. The direct guide tracks the user's position with iOS ARKit visual-inertial odometry, gives spoken turn-by-turn directions with sonified confirmation cues, and presents exhibit information through a chapter-based interface that users control via the VoiceOver screen reader. The immersive guide localizes the user with a markerless Visual Positioning System and emits a spatialized "ding-ding" bell sound from the direction of the next exhibit, which grows louder as the user approaches. Upon arrival, it automatically plays vivid narration with ambient sounds (such as ocean echoes for marine exhibits) and provides spoken tactile guidance. Both systems ran on iPhone 12 Pro devices worn around the neck, paired with open-ear Bluetooth earphones that preserve environmental awareness. The researchers collected Likert-scale ratings and post-study interview data, and analyzed video recordings for navigation performance, information recall, and tactile exhibit interactions.
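To make the immersive guide's bell cue concrete, the following is a minimal sketch of how a beacon's direction and loudness could be derived from the user's pose. It is not the authors' implementation: the function name `beacon_cue`, the flat 2D coordinate frame, the linear loudness ramp, and the `max_dist_m` and `min_gain` parameters are all assumptions made for illustration. A real system would pass values like these to a spatial audio engine.

```python
import math


def beacon_cue(user_xy, heading_deg, target_xy, max_dist_m=10.0, min_gain=0.1):
    """Return (relative_bearing_deg, gain) for a sound beacon at the target.

    Hypothetical helper, not taken from the paper.
    heading_deg: user's facing direction, 0 = +y axis, clockwise positive.
    relative bearing: in [-180, 180); negative = target on the left,
    positive = target on the right.
    gain: linear ramp from min_gain (at max_dist_m or farther) to 1.0
    (at the target), an assumed mapping for "louder when closer".
    """
    dx = target_xy[0] - user_xy[0]
    dy = target_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)

    # atan2(dx, dy) measures the angle clockwise from the +y axis.
    absolute_bearing = math.degrees(math.atan2(dx, dy))

    # Wrap the difference into [-180, 180) so left/right is unambiguous.
    relative = (absolute_bearing - heading_deg + 180.0) % 360.0 - 180.0

    # 0.0 when far away, 1.0 when standing at the target.
    closeness = 1.0 - min(distance, max_dist_m) / max_dist_m
    gain = min_gain + (1.0 - min_gain) * closeness
    return relative, gain


if __name__ == "__main__":
    # User at the origin facing +x (heading 90), exhibit 5 m away along +y.
    bearing, gain = beacon_cue((0.0, 0.0), 90.0, (0.0, 5.0))
    print(f"bearing {bearing:.0f} deg, gain {gain:.2f}")
```

In this sketch the user only needs to turn until the bearing is near zero and then walk toward the sound, which is one plausible reason a continuous cue produces smoother paths than discrete turn instructions.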
Key findings
Spatialized sound navigation was more effective than turn-by-turn instructions and was preferred by participants. Five of seven participants needed additional assistance with turn-by-turn navigation (especially when turning), while only one needed help with spatialized sound. Spatialized sound produced a faster walking pace (mean 4.21 s/m vs 6.05 s/m, where lower is faster) and smoother, curved walking trajectories, whereas turn-by-turn users walked in zigzag patterns and stopped frequently to reorient. Participants rated spatialized sound higher on ease of understanding (median 6 vs 5), ease of use (median 7 vs 5), and enjoyment (median 7 vs 5). For information provision, autoplay achieved better recall: six of seven participants recalled all exhibits correctly with autoplay, versus three with screen reader controls. Participants attributed this to autoplay's natural voice narration and lower cognitive load compared with VoiceOver's monotone delivery. However, participants valued screen reader controls for the autonomy they offer, since they could pause, skip, and replay content at their own pace. Three participants felt rushed by autoplay when interacting with tactile exhibits. Both systems struggled with tactile exhibit guidance: over half the participants failed to locate exhibits or touched incorrect locations, and simple text-based touch instructions proved inadequate. Participants requested hand-tracking technology, tactile floor markers, and spatial overviews of exhibit layouts. Overall, four participants preferred the immersive system and three preferred the direct system, but all participants expressed a desire for a hybrid approach that combines immersive navigation with direct control options.
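The pace figures are reported in seconds per meter, so lower values mean faster walking. As a quick reading aid, the sketch below converts the two reported means into approximate speeds and a relative time saving. The helper names are hypothetical, and the reciprocal of a mean pace is only an approximation of the mean of per-participant speeds, which this summary does not report.

```python
# Reported mean paces from the study, in seconds per meter.
PACE_SPATIALIZED = 4.21
PACE_TURN_BY_TURN = 6.05


def pace_to_speed(pace_s_per_m):
    """Convert a pace in s/m to a speed in m/s (hypothetical helper)."""
    return 1.0 / pace_s_per_m


def relative_time_saving(faster_pace, slower_pace):
    """Fraction of travel time saved per meter by the faster condition."""
    return (slower_pace - faster_pace) / slower_pace


if __name__ == "__main__":
    print(f"spatialized:  {pace_to_speed(PACE_SPATIALIZED):.2f} m/s")
    print(f"turn-by-turn: {pace_to_speed(PACE_TURN_BY_TURN):.2f} m/s")
    saving = relative_time_saving(PACE_SPATIALIZED, PACE_TURN_BY_TURN)
    print(f"time saved per meter: {saving:.0%}")
```

Under these assumptions the means correspond to roughly 0.24 m/s versus 0.17 m/s, or about 30% less time per meter with spatialized sound.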
Relevance
This study provides actionable design guidance for anyone developing accessible museum experiences or indoor navigation systems for blind users. The key practical insight is that neither paradigm alone is sufficient: the ideal system combines spatialized audio for intuitive navigation with on-demand direct controls for information access and autonomy. For museum professionals, the findings highlight that tactile exhibits remain a major challenge even with guide technology, because current text-based instructions cannot adequately convey spatial layout and precise touch locations. The study also suggests that VoiceOver's monotone speech reduces engagement compared with expressive narration and ambient sounds, which implies that audio guide content design matters as much as the underlying technology. The small sample size (n=7) limits generalizability, and each system was tested in a different exhibition rather than both in the same one, which introduces potential confounds. Nevertheless, the six proposed design considerations, which include crowd-aware navigation, voice commands, and hand-tracking for tactile interaction, offer a practical roadmap for advancing museum accessibility for blind visitors.
Tags: museum accessibility · blindness · indoor navigation · spatialized audio · screen readers · tactile exhibits · audio description · assistive technology