A TensorFlow-based Assistive Technology System for Users with Visual Impairments

Davide Mulfari · 2018 · Proceedings of the 15th International Web for All Conference (W4A 2018) · doi:10.1145/3192714.3196314

Summary

This extended abstract presents a wearable computer vision system that uses deep learning to classify objects in a blind user’s surroundings and describe them aloud via text-to-speech. The system addresses a limitation of smartphone-based object recognition apps: people with visual impairments have difficulty pointing a handheld camera at a target object they cannot see. The proposed solution uses a low-cost pair of camera glasses (Technaxx Video Sport Sunglasses) worn by the user and connected to a Raspberry Pi 3 Model B single-board computer running Google’s TensorFlow framework through its Python API. All image processing occurs locally on the device, with no internet connection required. While slowly moving their head, the user can activate the system to classify what the camera sees and then hear a computer-generated voice describe the result. The system is designed for specific controlled environments such as museums, where custom image classifiers can be trained on the particular artworks or exhibits present. Training uses transfer learning: starting from a pre-trained Inception v3 convolutional neural network (CNN) that has already learned general image features from millions of images, only the final layer is retrained on a custom dataset specific to the target environment. This approach requires roughly 30 or more images per object category (captured from different angles) and approximately 30 minutes of training time on the Raspberry Pi.
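
The transfer-learning step described above can be illustrated with a minimal sketch. It does not use TensorFlow or Inception v3. Instead, it assumes a frozen pre-trained network has already turned each image into a fixed feature vector (hypothetical three-dimensional toy vectors here) and trains only a final softmax layer on top of those features, which is the part that gets retrained in this kind of setup. All function names, category labels, and feature values are illustrative assumptions, not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, features):
    """Linear layer: one score per class for a single feature vector."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def train_final_layer(features, labels, num_classes, epochs=200, lr=0.5, seed=0):
    """Fit only the final softmax layer with plain gradient descent.

    `features` stand in for the outputs of a frozen pre-trained network;
    nothing upstream of this layer is updated.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the most probable class."""
    probs = softmax(class_scores(weights, biases, x))
    return probs.index(max(probs))


# Hypothetical feature vectors for three made-up categories (two images each).
CATEGORIES = ["painting_a", "painting_b", "painting_c"]
TRAIN_X = [
    [1.0, 0.1, 0.0], [0.9, 0.0, 0.1],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
    [0.1, 0.0, 1.0], [0.0, 0.1, 0.9],
]
TRAIN_Y = [0, 0, 1, 1, 2, 2]

WEIGHTS, BIASES = train_final_layer(TRAIN_X, TRAIN_Y, num_classes=len(CATEGORIES))
print(CATEGORIES[predict(WEIGHTS, BIASES, [0.05, 0.95, 0.05])])
```

Because only the small final layer is fitted, the number of trainable parameters is tiny compared with the full network, which helps explain why a few dozen images per category and a short training run on modest hardware can be enough.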

Key findings

The system was tested using two custom-trained classifiers: one for distinguishing between six different paintings in a simulated museum scenario, and another for classifying five common office objects. The painting classifier achieved 91.4% accuracy with only 214 training images across 6 categories, and the office object classifier achieved 93% accuracy. The Raspberry Pi processed each image classification in approximately 10 seconds, which the author acknowledges needs improvement for real-time use but considers acceptable for a proof-of-concept on low-cost embedded hardware. The total hardware cost was minimal, consisting only of the Raspberry Pi (~$35) and the camera sunglasses. The system works entirely offline, addressing privacy concerns about streaming visual data to cloud servers and eliminating dependence on network connectivity. The author envisions the system being used in museums where a pre-trained model specific to the exhibits could be loaded onto the device, allowing blind visitors to independently identify and learn about artworks by simply looking in their direction.
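
As a quick sanity check on the reported figures, the short calculation below derives the average number of training images per painting category and the classification throughput implied by the 10-second latency. It uses only numbers stated above; the per-category split is not given, so the first result is an average, not the actual distribution.

```python
PAINTING_IMAGES = 214      # training images reported for the painting classifier
PAINTING_CATEGORIES = 6    # number of paintings
SECONDS_PER_IMAGE = 10     # approximate classification time on the Raspberry Pi

images_per_category = PAINTING_IMAGES / PAINTING_CATEGORIES
classifications_per_minute = 60 / SECONDS_PER_IMAGE

print(f"Average images per category: {images_per_category:.1f}")
print(f"Classifications per minute: {classifications_per_minute:.0f}")
```

The average of about 36 images per painting is in line with the guideline of roughly 30 or more images per category, and a rate of about six classifications per minute shows why the latency rules out continuous, real-time description.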

Relevance

This brief proof of concept demonstrates how the convergence of low-cost embedded computing (Raspberry Pi), open-source deep learning frameworks (TensorFlow), and transfer learning techniques can create affordable, wearable assistive technology for object recognition. Training custom classifiers for specific environments, rather than attempting general-purpose object recognition, is a pragmatic design decision that achieves high accuracy (>90%) with minimal training data. For accessibility practitioners working with museums, galleries, or other curated spaces, this suggests that environment-specific visual recognition systems can be created relatively easily using transfer learning. The wearable glasses-mounted camera addresses a genuine usability problem of smartphone-based solutions, which require users to aim a camera at a target they cannot see. However, as a two-page extended abstract, the work is preliminary: there is no user evaluation with blind participants, the 10-second classification latency limits practical utility, the system only classifies single objects rather than describing scenes, and the museum use case assumes a controlled environment with a finite set of recognisable items. Nevertheless, it illustrates an accessible entry point for creating custom assistive technology using commodity hardware and open-source software.

Tags: computer vision · deep learning · blind · visual impairment · wearable technology · object recognition · Internet of Things · Raspberry Pi · museum accessibility · TensorFlow