Blind Users Accessing Their Training Images in Teachable Object Recognizers

Jonggi Hong, Jaina Gandhi, Ernest Essuah Mensah, Farnaz Zamiri Azar, Kyungjun Lee, Hernisa Kacorri · 2022 · Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '22) · doi:10.1145/3517428.3544824

Summary

This paper introduces MYCam, an open-source iOS testbed application designed to help blind users build and improve personalized object recognizers. Teachable object recognizers let users train custom models by photographing the objects they want recognized. Blind users, however, face a fundamental challenge: they cannot visually check whether their training photos are good enough for accurate recognition. The research addresses this gap with real-time "data descriptors", which are automated feedback mechanisms that analyze training photo quality along five dimensions: whether the target object is partially cut off by the frame (cropped), whether the object appears too small, whether the user's hand is visible in the photo, whether the image is blurred, and how much variation exists across the set of training photos. MYCam uses Inception V3 with transfer learning for classification, YOLOv3 object detection to assess cropping and size, a hand segmentation model to detect hand presence, and the variance of the Laplacian to detect blur. Users capture 30 training photos per object category.

The researchers conducted a remote user study with 12 blind participants, who trained recognizers for three visually similar snack bags (Fritos, Cheetos, and Lays). Participants first trained without descriptors, then received descriptor feedback and could choose whether to retrain. The study examined how descriptors affected photo-taking behavior and whether participants found the feedback useful for improving their training data.
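Once a detector has produced a bounding box, the cropping, size, and blur descriptors described above reduce to simple checks. The sketch below is a hypothetical illustration, not MYCam's source. It assumes a grayscale image given as nested lists, a detector box in pixel coordinates `(x_min, y_min, x_max, y_max)` such as a YOLOv3-style detector would produce after post-processing, and threshold defaults (`threshold=100.0`, `margin=1.0`, `min_fraction=0.1`) that are illustrative assumptions rather than values reported in the paper. Blur is scored with the variance of the Laplacian: sharp photos have strong, varied edge responses, while blurred photos score low.

```python
# Hypothetical sketch of per-photo descriptors in the spirit of MYCam.
# NOT MYCam's actual code. Assumptions: grayscale image as a list of rows
# (at least 3x3), detector box as (x_min, y_min, x_max, y_max) in pixels,
# and illustrative threshold defaults that were not taken from the paper.
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def laplacian_variance(gray: List[List[float]]) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    Sharp photos have many strong edges (large, varied responses);
    blurred photos have weak edges, so the variance is low.
    """
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] applied at (x, y).
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurred(gray: List[List[float]], threshold: float = 100.0) -> bool:
    # Threshold is an illustrative assumption; real apps would tune it.
    return laplacian_variance(gray) < threshold


def is_cropped(box: Box, width: int, height: int, margin: float = 1.0) -> bool:
    # Treat the object as cut off if its box touches (or nearly touches) an edge.
    x0, y0, x1, y1 = box
    return (x0 <= margin or y0 <= margin
            or x1 >= width - margin or y1 >= height - margin)


def is_too_small(box: Box, width: int, height: int,
                 min_fraction: float = 0.1) -> bool:
    # Treat the object as too small if its box covers < min_fraction of the frame.
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0) / (width * height) < min_fraction


if __name__ == "__main__":
    flat = [[128] * 8 for _ in range(8)]  # uniform image: no edges at all
    checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
    print(is_blurred(flat), is_blurred(checker))  # True False
    print(is_cropped((0, 10, 50, 60), 100, 100))  # True
    print(is_too_small((40, 40, 50, 50), 100, 100))  # True
```

Nested lists keep the sketch dependency-free; on real camera frames an image library's convolution routine would be the practical choice, and the hand descriptor would additionally need a trained segmentation model, which is out of scope here.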

Key findings

The study found that blind users' training photos frequently had quality problems: cropped objects, insufficient variation, and hand occlusion were common. When given data descriptors, participants significantly reduced the number of cropped photos when retraining, showing that targeted feedback changed photo-taking behavior. Five of the twelve participants chose to retrain after seeing their descriptors, mainly prompted by high cropping rates or low variation scores. Average classification accuracy on participants' own test images was 0.65, reflecting how difficult it is to distinguish visually similar objects. Most participants found the descriptors easy to understand (92%) and useful (83%), though they differed considerably in which descriptors they considered most important. Some developed their own strategies in response to the feedback, such as using tactile landmarks to frame objects better. The variation descriptor proved particularly informative, because many participants had not realized they were repeatedly taking very similar photos. However, retraining did not always improve accuracy; descriptor-guided improvements in photo quality did not translate directly into better classification, suggesting that photo quality is necessary but not sufficient for good recognition.
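Unlike the other descriptors, variation is a property of the whole set of training photos rather than of any single photo. The sketch below shows one plausible way to quantify it, the mean pairwise cosine distance between per-photo feature vectors (for example, embeddings from the classifier's penultimate layer). This metric and the function names are assumptions chosen for illustration, not the measure MYCam is documented to use. Under such a measure, a set of near-identical shots scores close to zero, which is exactly the situation many participants were unaware of.

```python
# Hypothetical set-level variation score: mean pairwise cosine distance
# between per-photo feature vectors (e.g. classifier embeddings).
# An illustrative stand-in, not the metric MYCam is documented to use.
import math
from itertools import combinations
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumes both vectors are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def variation_score(features: List[Sequence[float]]) -> float:
    """0.0 when all photos point the same way in feature space;
    grows as photos differ (at most 2.0 for opposite vectors)."""
    pairs = list(combinations(features, 2))
    if not pairs:
        return 0.0  # a single photo has no variation to measure
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    repeated = [[1.0, 0.0]] * 3        # near-duplicate shots
    varied = [[1.0, 0.0], [0.0, 1.0]]  # two clearly different shots
    print(variation_score(repeated), variation_score(varied))  # 0.0 1.0
```

Averaging over all pairs makes the score independent of how many photos were taken, so a fixed cut-off could flag low-variation sets; any such cut-off would be a design assumption to validate with users.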

Relevance

This research has significant implications for making AI-powered assistive tools accessible to the people they are meant to serve. As teachable machine learning becomes more common in accessibility applications, ensuring that blind users can independently build and maintain their own models is essential for autonomy and self-determination. The data descriptor approach offers a practical framework that developers can adopt when building camera-based AI tools for blind users: provide structured, non-visual feedback about image quality instead of assuming the user can verify images visually. The open-source MYCam platform supports further research and development in this space. The findings also highlight broader design considerations: automated feedback must be calibrated to be actionable without being overwhelming, and users' information needs differ depending on their experience and confidence. For accessibility practitioners, this work reinforces that making AI tools accessible requires going beyond basic screen reader compatibility to rethink interaction paradigms, such as visual verification, that assume sight.

Tags: teachable AI · object recognition · blind users · machine learning · camera interaction · data quality · mobile accessibility