AIGuide: An Augmented Reality Hand Guidance Application for People with Visual Impairments
Nelson Daniel Troncoso Aldas, Sooyeon Lee, Chonghan Lee, Mary Beth Rosson, John M. Carroll, Vijaykrishnan Narayanan · 2020 · Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2020) · doi:10.1145/3373625.3417028
Summary
This paper presents AIGuide, a self-contained, offline iOS smartphone application that uses augmented reality (ARKit) to help people with visual impairments locate, navigate to, and pick up objects in their surroundings. Existing object detection apps (Seeing AI, Aipoly) identify objects but provide neither relative 3D position nor hand guidance, and remote sighted assistance services (Aira, Be My Eyes) require an internet connection and raise privacy concerns. AIGuide instead addresses the "last meter problem": guiding the user's hand from detection to physical grasping. The app uses ARKit's 3D object detection to recognize pre-scanned objects and track them in real time relative to the phone's camera. It then guides the user through four phases: (1) Selection, where the user chooses a target object from a VoiceOver-accessible searchable list; (2) Localization, where the user scans the environment by moving the phone until the object is detected; (3) Guidance, where multimodal feedback (speech for left/right/up/down corrections, a beeping sound whose frequency increases with proximity, and haptic taps) guides the hand to the object; (4) Confirmation, where the user verifies that the correct object was grasped by holding it before the camera or shaking the phone. The app works entirely offline with no external hardware, using feature-point-based object recognition from pre-scanned .arobject files embedded in the application.
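The Guidance phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds, the linear mapping, and the reading of "frequency" as beep rate rather than pitch are all assumptions made for illustration. The only input assumed is the tracked object's position in the camera's coordinate frame, in meters.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
CENTER_TOLERANCE_M = 0.05   # offsets smaller than this trigger no spoken correction
MIN_BEEP_INTERVAL_S = 0.1   # fastest beeping, used when the phone is at the object
MAX_BEEP_INTERVAL_S = 1.0   # slowest beeping, used at or beyond MAX_RANGE_M
MAX_RANGE_M = 2.0


def direction_cue(x, y):
    """Return a spoken correction for the larger offset, or None if centered.

    x, y: object position in the camera frame, in meters
    (+x is to the right of the camera, +y is above it).
    """
    if abs(x) <= CENTER_TOLERANCE_M and abs(y) <= CENTER_TOLERANCE_M:
        return None
    if abs(x) >= abs(y):
        return "right" if x > 0 else "left"
    return "up" if y > 0 else "down"


def beep_interval(distance_m):
    """Linearly map distance to the pause between beeps (closer = faster)."""
    clamped = max(0.0, min(distance_m, MAX_RANGE_M))
    span = MAX_BEEP_INTERVAL_S - MIN_BEEP_INTERVAL_S
    return MIN_BEEP_INTERVAL_S + span * (clamped / MAX_RANGE_M)


def guidance_step(x, y, z):
    """Combine both channels for one tracked position (x, y, z) in meters."""
    distance = math.sqrt(x * x + y * y + z * z)
    return direction_cue(x, y), beep_interval(distance)
```

Keeping direction on a discrete speech channel and distance on a continuous audio channel mirrors the split of modalities the paper describes; a real app would call something like `guidance_step` on every tracking update and throttle the speech output.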
Key findings
A remote at-home user study (conducted via Zoom due to COVID-19) with 10 participants with visual impairments (9 totally blind, 1 legally blind; ages 22-45; 5 male, 5 female; all VoiceOver users) evaluated three feedback modes: sound only, haptic only, and combined sound+haptic. Each participant completed 9 trials finding and picking up three grocery items (cereal box, tea canister, fruit bars box) placed on home surfaces. Only 5 tasks failed out of over 90 trials, indicating high accuracy for finding objects and guiding the hand to them. Average task completion times (sound: 19.61 s, haptic: 28.11 s, combined: 20.60 s) showed no statistically significant differences across feedback modes, though sound-only was numerically fastest and haptic-only slowest. Even so, 5 of 10 participants preferred the combined sound+haptic mode for its redundancy and completeness. A disconnect emerged between performance and preference: haptic-only was perceived as cognitively easier but produced the slowest performance, while sound-only was fastest but was preferred by only 2 participants, consistent with the well-known finding that performance and preference do not always align. Eight of 10 participants found the location information (distance in feet, direction in degrees) extremely helpful: "It gives enough detail that you can snap yourself to attention as supposed to think your phone is jibber-jabbering." Participants envisioned broad applications: finding lost keys, locating medication, shopping for groceries, and identifying misplaced household items. All preferred the smartphone form factor over wearables for portability and avoiding stigma: "So many people use it now."
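To put the reported figures in proportion, the short calculation below derives relative differences from the mean completion times and a lower bound on the success rate. It is only descriptive arithmetic on the numbers quoted above, not a statistical test, and the study found no significant differences between modes. The trial count of 90 is an assumption taken from the stated design (10 participants, 9 trials each); since the source says "over 90 trials", the true success rate would be at least the value computed here.

```python
# Mean completion times in seconds, as reported in the summary.
mean_time_s = {"sound": 19.61, "haptic": 28.11, "combined": 20.60}

# Assumed from the study design: 10 participants x 9 trials each.
participants, trials_each, failed_tasks = 10, 9, 5
designed_trials = participants * trials_each


def percent_slower(mode, baseline="sound"):
    """How much longer `mode` took than `baseline`, as a percentage."""
    return (mean_time_s[mode] / mean_time_s[baseline] - 1.0) * 100.0


def success_rate(total_trials, failures):
    """Share of trials that did not fail, as a percentage."""
    return (total_trials - failures) / total_trials * 100.0


if __name__ == "__main__":
    for mode in ("haptic", "combined"):
        print(f"{mode}: {percent_slower(mode):.1f}% slower than sound")
    print(f"success rate: at least {success_rate(designed_trials, failed_tasks):.1f}%")
```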
Relevance
AIGuide demonstrates that commodity smartphone hardware (ARKit on iPhone) can solve the last-meter problem of guiding a blind person's hand to a specific object, bridging the gap between knowing that an object exists (what current detection apps provide) and physically acquiring it. For accessibility practitioners, the key design insights are: (1) different feedback modalities serve different information types: using sound for distance (continuous) and speech for direction (discrete corrections) reduced cognitive overload compared to conveying both through the same channel; (2) confirmation phases should be fast and intuitive (shaking the phone was unintuitive to participants); (3) location information in feet/inches was valued but degree measurements were confusing; (4) the scanning phase needs better progress feedback and timeouts. The remote Zoom-based user study methodology, developed out of COVID-19 necessity, is itself a methodological contribution: it used two camera angles (side and front views) to observe hand-object interactions remotely. Limitations include the need to pre-scan objects (limiting generalizability), the small sample size, uncontrolled home environments, and the current lack of generic object detection.
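Design insight (3) suggests announcing distance in feet and inches. The helper below is a hypothetical sketch of such a conversion, not code from AIGuide: the function name and the rounding to the nearest inch are assumptions. It takes meters as input because ARKit expresses positions in meters.

```python
INCHES_PER_METER = 1 / 0.0254  # one inch is exactly 0.0254 m


def spoken_distance(meters):
    """Format a non-negative distance in meters as a feet/inches phrase."""
    total_inches = round(meters * INCHES_PER_METER)
    feet, inches = divmod(total_inches, 12)
    parts = []
    if feet:
        parts.append(f"{feet} {'foot' if feet == 1 else 'feet'}")
    # Always say the inches when there are no whole feet to announce.
    if inches or not feet:
        parts.append(f"{inches} {'inch' if inches == 1 else 'inches'}")
    return " ".join(parts)
```

For example, `spoken_distance(1.0)` yields "3 feet 3 inches". A real app would likely round more coarsely at long range so the announcement does not change on every frame.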
Tags: visual accessibility · augmented reality · blindness and low vision · mobile accessibility · computer vision · assistive technology · multimodal interaction · independent living