When Audio Is Enough: Design-Tradeoffs in Multi-Story MR Navigation
Bilgehan Cagiltay, Selim Balcisoy · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3790943
Summary
This CHI '26 exploratory study asks when an augmented-reality navigation aid can rely on audio alone. The authors argue that visual-first MR interfaces impose extra cognitive load and attention tunneling in high-stakes, cognitively demanding settings (e.g., first responders, visually impaired users, hands-busy work) and that spatialised sound - which already provides 360-degree awareness - is an underused channel in complex 3D spaces. They built a custom AR navigation system on HoloLens 2 and Steam Audio, using a ray-traced digital twin of a three-floor Sabanci University building to compute realistic reverberation and occlusion. To reduce front/back ambiguity, a low-pass dampening was linearly interpolated from 0 dB at 90 degrees to -10 dB at 180 degrees and applied only when the audio source was in line of sight. Twenty-eight engineering students (four groups of seven) were pseudo-randomly assigned to one of four conditions: no-AR (paper floor plans only), VisualOnlyAR (holographic path overlay), AudioOnlyAR (a spatialised sonar ping anchored at the target, with volume increasing on approach), or AudioVisualAR (both). Each participant completed a practice task plus four main navigation tasks, which together covered single-floor routes and movement up or down between floors. During each task the researcher interrupted participants at four predetermined points to probe environmental awareness, asking whether four objects (a white stand, coat hanger, orange backpack, yellow trash can) had been noticed - a method inspired by SAGAT (the Situation Awareness Global Assessment Technique). Outcome measures were distance travelled, task time, number of correctly noticed objects, NASA-TLX workload, and open-ended feedback.
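The front/back dampening rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and two behaviours the summary does not state are assumed here, namely that no extra dampening is applied in the frontal hemisphere (below 90 degrees) and that 0 dB is returned when the source is out of line of sight. How the resulting value drives the low-pass filter inside Steam Audio is not modelled.

```python
def front_back_dampening_db(angle_deg: float, in_line_of_sight: bool) -> float:
    """Dampening in dB for a source at angle_deg from the listener's forward axis.

    Linear interpolation from 0 dB at 90 degrees to -10 dB at 180 degrees,
    as described in the summary. Assumptions (not from the source): 0 dB in
    the frontal hemisphere, and 0 dB when the source is not in line of sight.
    """
    if not in_line_of_sight:
        return 0.0
    # Fold any angle into [0, 180]: 0 = straight ahead, 180 = directly behind.
    folded = abs((angle_deg + 180.0) % 360.0 - 180.0)
    if folded <= 90.0:
        return 0.0
    return -10.0 * (folded - 90.0) / 90.0


def db_to_linear_gain(db: float) -> float:
    """Convert a dB value to a linear amplitude gain (hypothetical helper)."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    for angle in (0, 90, 135, 180, 270):
        db = front_back_dampening_db(angle, in_line_of_sight=True)
        print(f"{angle:>3} deg -> {db:6.2f} dB, gain {db_to_linear_gain(db):.3f}")
```

Folding the angle keeps the rule symmetric for sources to the left and right of the listener, so a source at 270 degrees is treated the same as one at 90 degrees.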
Key findings
AudioOnlyAR was non-inferior to VisualOnlyAR on distance travelled within a 5% margin in the more decision-dense tasks, and both conditions significantly outperformed no-AR in task 3 (the only task that forced complex vertical navigation). Task time showed no significant differences between conditions. On environmental awareness, participants using AudioOnlyAR and no-AR noticed roughly the same number of real-world objects (means around 1.7-2.0 of 4), while VisualOnlyAR and AudioVisualAR participants noticed fewer (around 1.1-1.5) - suggesting that holographic path overlays induce attention tunneling that standalone spatial audio does not. On NASA-TLX, AudioOnlyAR produced significantly lower mental demand than AudioVisualAR (mean diff -1.71 on a 7-point scale, p = .040) and the lowest mental demand overall. AudioOnlyAR users also reported feeling the least rushed and gave comparable success ratings. Qualitatively, several participants called VisualOnlyAR 'distracting' and said they felt compelled to stare at the path even when they did not need to; AudioOnlyAR users reported that the sonar ping was excellent for confirming direction of travel and detecting the correct floor, but weaker for fine-grained turn decisions at junctions, where some found the reverberation overwhelming at close range. The authors propose four design guidelines: use audio as an always-on, low-occlusion compass; reserve visuals for precision moments; mitigate visual tunneling by fading overlays when they are not needed; and use dynamic audio whose volume and sound type change with task state.
Relevance
The authors explicitly flag accessibility for blind and low-vision (BLV) users as a primary downstream application, alongside first responders operating in smoke, darkness, or cognitively saturated conditions. The paper provides empirical evidence that a purely auditory MR aid can match a visual path overlay on navigation performance in a real multi-story building while preserving environmental awareness and reducing mental workload - an important counterweight to the assumption that AR must be visual. For accessibility practitioners, the specific design decisions worth studying are: the sonar sound, sourced from Pixabay and chosen for its low cognitive load and association with search; the 90-180 degree front/back dampening to combat localisation ambiguity; and the dynamic-volume proximity cue. Major caveats: the 28-participant sample had no disability requirement beyond sight/hearing, engineering students may not generalise to BLV or first-responder users, and the study used a generic HRTF (personalised HRTFs remain an open accessibility question). Follow-up work with BLV participants is needed before treating these guidelines as accessibility recommendations.
Tags: mixed reality · audio augmented reality · spatial audio · indoor navigation · wayfinding · situational awareness · cognitive load · attention tunneling · blind and low vision · multimodal interaction · non-visual navigation · first responders