Marker-based image recognition of dynamic content for the visually impaired

Andréa Britto Mattos, Carlos Cardonha, Diego Gallo, Priscilla Avegliano, Ricardo Herrmann, Sergio Borger · 2014 · Proceedings of the 11th Web for All Conference (W4A) · doi:10.1145/2596695.2596707

Summary

This paper from IBM Research Brazil introduces a marker-based image recognition technique that helps visually impaired people read public panels and boards whose layout is fixed but whose content changes, such as vending machines, split-flap airport displays, electronic transit signs, and bus stop boards. The core challenge is that general-purpose object recognition struggles in complex real-world scenes with variable lighting, perspective distortion, and visual clutter. The authors' approach places four BCH fiducial markers (similar to those used in augmented reality) on or around the panel. The markers serve four purposes: they locate target objects in the image without a search of the entire scene; they allow perspective distortion to be corrected with a homography transformation; they identify which specific panel the user is photographing, which limits the training set for supervised learning; and they let the system give spoken feedback on how to reposition the camera when not all markers are detected (e.g., "move the camera down" if only the upper markers are visible). The system runs on mobile devices without requiring internet connectivity.
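
The camera guidance described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the corner names, the function name `camera_hint`, and every hint other than "move the camera down" are assumptions made for illustration. The only rule taken from the summary is that seeing just the upper markers should prompt the user to move the camera down.

```python
# Hypothetical corner labels for the four markers placed around a panel.
CORNERS = frozenset({"top_left", "top_right", "bottom_left", "bottom_right"})


def camera_hint(detected):
    """Return a spoken hint based on which corner markers were detected.

    Assumption: if every detected marker lies on one side of the panel,
    the rest of the panel is out of frame on the opposite side, so the
    camera should move towards that opposite side.
    """
    detected = set(detected) & CORNERS
    if detected == CORNERS:
        return "hold still"
    if not detected:
        return "no markers found, sweep the camera slowly"

    moves = []
    # Vertical direction: only upper markers visible -> move down, and vice versa.
    if all(name.startswith("top") for name in detected):
        moves.append("down")
    elif all(name.startswith("bottom") for name in detected):
        moves.append("up")
    # Horizontal direction: only left markers visible -> move right, and vice versa.
    if all(name.endswith("left") for name in detected):
        moves.append("right")
    elif all(name.endswith("right") for name in detected):
        moves.append("left")

    if not moves:
        # Markers on opposite sides are visible but some are missing:
        # no single direction helps, so suggest stepping back (assumed fallback).
        return "move the camera back"
    return "move the camera " + " and ".join(moves)


print(camera_hint({"top_left", "top_right"}))  # prints "move the camera down"
```

Because the hint depends only on which marker IDs were seen, guidance of this kind needs no image understanding beyond marker detection, which fits the offline, on-device setting the paper targets.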

Key findings

Tests on 240 markers across 60 images, taken under varying scale, illumination, rotation, and perspective, achieved a 99.16% marker detection rate, showing that detection remains robust even in poor lighting and with blur. For product recognition in vending machines, the system correctly identified 89.85% of 700 product images across two machines with different layouts, using a relatively simple histogram-based matching approach (colour in HSV space and Local Binary Patterns for texture). The recognition algorithm could stay this simple precisely because the markers solve the harder location problem: once the system knows where each product slot is, matching against a restricted training set becomes tractable. The system also detected empty slots via background removal. One source of errors was perspective misalignment: the markers sat on the glass surface rather than co-planar with the products behind it, and a homography computed from the markers is exact only for points on the markers' plane. The authors note that this is a low-cost solution, since fiducial markers can be printed on regular paper.
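
To make the histogram-based matching concrete, here is a minimal Python sketch using only the standard library. The summary states only that colour histograms in HSV space and Local Binary Patterns were used; the bin count, the histogram-intersection similarity, the equal weighting of colour and texture, and all function names and product labels below are assumptions, not details from the paper.

```python
import colorsys


def hue_histogram(pixels, bins=8):
    """Normalized hue histogram for (r, g, b) tuples with values in 0..255."""
    hist = [0.0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(h * bins), bins - 1)] += 1
    total = sum(hist) or 1.0
    return [count / total for count in hist]


def lbp_histogram(gray):
    """Normalized histogram of basic 8-neighbour LBP codes (interior pixels only)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0.0] * 256
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if gray[y + dy][x + dx] >= gray[y][x]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1.0
    return [count / total for count in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def describe(rgb_image):
    """Colour and texture descriptors for an image given as rows of (r, g, b)."""
    pixels = [p for row in rgb_image for p in row]
    gray = [[(r + g + b) / 3 for r, g, b in row] for row in rgb_image]
    return hue_histogram(pixels), lbp_histogram(gray)


def classify(slot_image, references):
    """Pick the reference label whose descriptors best match the slot image."""
    colour, texture = describe(slot_image)

    def score(label):
        ref_colour, ref_texture = references[label]
        return intersection(colour, ref_colour) + intersection(texture, ref_texture)

    return max(references, key=score)


def solid(colour, size=4):
    """Helper for the demo: a size x size image filled with one colour."""
    return [[colour] * size for _ in range(size)]


# Restricted reference set for one known panel (labels are made up).
references = {
    "cola": describe(solid((255, 0, 0))),
    "water": describe(solid((0, 0, 255))),
}
print(classify(solid((200, 0, 0)), references))  # prints "cola"
```

The reference dictionary stands in for the restricted training set: since the markers identify the panel, only that panel's products need to be compared, which is what keeps a simple nearest-histogram match workable.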

Relevance

This research addresses a practical everyday challenge for visually impaired people: independently accessing information in public spaces when that information changes over time and therefore cannot be pre-described or tagged with static QR codes. The marker-based approach offers advantages over both crowdsourcing (which requires internet connectivity and suffers from response-time delays) and pure computer vision (which struggles with real-world complexity). The concept of situational disability is also relevant, since the same technique helps tourists who cannot read local signage. While this 2014 work predates the dramatic improvements in deep learning-based computer vision, the underlying design principles remain valuable: using environmental markers to constrain and simplify the recognition problem, providing camera guidance to help users take usable photographs, and designing for offline mobile operation. The approach demonstrates how modest physical modifications to public infrastructure (adding printable markers) can significantly improve accessibility.

Tags: computer vision · visual impairment · mobile accessibility · object recognition · situational disability · independent living · assistive technology