Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People

Jazmin Collins, Sharon Y Lin, Tianqi Liu, Andrea Stevenson Won, Shiri Azenkot · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3791143

Summary

Collins and colleagues present the first empirical user study of an AI-powered 'sighted guide' for blind and low-vision (BLV) users in social virtual reality. Social VR platforms like VRChat (40,000 concurrent players) are largely inaccessible: they require interpreting avatars, tracking changing spatial contexts, and reading non-verbal cues, and existing accessibility work has focused mostly on low-level cues (footstep sonification, collision haptics) rather than on high-level social and spatial understanding. Building on their prior design study, the authors refined a GPT-4-based conversational guide with three selectable embodied personas (a human assistant, a guide dog, and a robot), all called 'Giddy'. The guide runs inside a Unity VR environment on a Meta Quest 2 headset, uses OpenAI Whisper for speech recognition, OpenAI TTS for the three persona voices, and Unity NavMesh for avatar pathing. Users can request navigation, visual description, audio beacons, social interaction mediation, confirmation, clarification, and auditory description.

The authors ran a 90-minute session with each of 16 BLV participants (recruited through the LightHouse for the Blind and Visually Impaired in San Francisco; 7 women, 9 men; ages 22-75; diverse etiologies including retinopathy of prematurity (ROP), retinitis pigmentosa, optic atrophy, cortical visual impairment, and diabetic retinopathy). Each participant completed two tasks in two virtual parks: an exploratory task alone and a social tour task with two sighted-acting confederates. Each participant used both the dog persona and the robot persona, in counterbalanced order.

Key findings

The guide handled 301 of 476 queries correctly (63.2% accuracy), with average latencies of 6.1-11.2 seconds. Participants made 799 requests, dominated by navigation (60.4%) and visual description (31.4%), with small shares for clarification, audio beacons, social interaction, confirmation, and auditory description. The central finding is behavioural: participants treated the same guide differently depending on social context. Alone, 74.6% of queries were utilitarian commands ('Take me to the fountain'), 15.8% were polite, and 9.6% were friendly. In the social tour, most participants (12/16) became friendlier, giving the guide nicknames (Prince, Rufus, Jerry, Gabby, Robey, Diego), using gendered pronouns, and encouraging confederates to interact with it. Ten of the 16 began role-playing with the guide based on its persona, rationalising its errors in character (Maritza: 'My dog went to sleep'). The dog persona was seen as 'cuter' or 'empowering' (a blindness symbol); the robot was dismissed when it erred. Participants held AI guides to a different standard than human guides, treating the AI as a reactive failsafe rather than a proactive navigator. Likert means: Usability 3.2, Usefulness 3.5, Joy of Use 4.1, Social Comfort 3.7, Scene Understanding 3.6, Object Perception 3.6, Navigation 3.1.
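
As a quick consistency check, the headline percentages can be recomputed from the counts reported in this summary. The short Python sketch below is only a reader's aid: the variable names are illustrative, every number is copied from the paragraph above, and no new data is introduced. Note that the accuracy figure is computed over 476 queries while the request breakdown is over 799 requests; this summary does not state how the two totals relate, so the sketch treats them separately.

```python
# All figures are copied from the summary; variable names are illustrative only.

# Accuracy: correctly handled queries out of those evaluated.
correct, evaluated = 301, 476
accuracy_pct = round(100 * correct / evaluated, 1)

# Request types (percent of 799 requests). The five smaller categories
# are not broken out in the summary, so only their combined share is derived.
navigation_pct, visual_description_pct = 60.4, 31.4
other_types_pct = round(100 - navigation_pct - visual_description_pct, 1)

# Tone of queries in the solo exploratory task (percent).
solo_tone = {"utilitarian": 74.6, "polite": 15.8, "friendly": 9.6}
solo_tone_total = round(sum(solo_tone.values()), 1)

# Participants who became friendlier during the social tour.
friendlier_pct = 100 * 12 / 16

print(accuracy_pct, other_types_pct, solo_tone_total, friendlier_pct)
```

The recomputed accuracy matches the reported 63.2%, the solo-task tone shares sum to 100%, the five minor request types together account for the remaining 8.2% of requests, and 12 of 16 participants corresponds to 75%.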

Relevance

This is a foundational empirical paper for anyone building AI accessibility tools that users will deploy around other people. The most actionable finding is that social context changes how BLV users use and relate to their tool: friendliness, nicknaming, and role-play are not quirks but strategies for rationalising AI errors, absorbing embarrassment, and inviting sighted peers into a shared experience. Product implications: AI guides should foster interactive social capabilities (petting the dog, responding to role-play), align persona appearance with the system's actual capabilities (a small puppy persona can lower expectations and encourage user 'training' behaviour), and provide explicit failsafe mechanisms (a 'teleport me back to safe location' button) rather than expecting users to phrase recovery as a conversational request. The work also surfaces a social-comfort tension: the visibility of an AI guide may disclose a user's disability in VR in ways they would rather control, echoing findings around canes and guide dogs in the physical world. Limitations include the 63.2% accuracy bottleneck driven by an older-generation LLM and STT stack (mid-2023 GPT-4), the lack of longitudinal data on how behaviour changes with familiarity, and a cohort that was new to social VR.

Tags: virtual reality · social VR · blind and low vision · AI guide · large language model · sighted guide · embodied agent · avatar · human-AI interaction · user study