VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging
Naoki Hirabayashi, Masakazu Iwamura, Zheng Cheng, Kazunori Minatani, Koichi Kise · 2023 · Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/3597638.3608422
Summary
This paper introduces VisPhoto, a novel photography system that rethinks how people with visual impairments (PVI) take photographs. Rather than helping users aim a conventional camera at a target in real time (the dominant approach in prior work), VisPhoto separates photography into two stages: capture and post-production. During capture, the user presses the shutter button on a Ricoh Theta omnidirectional camera, which records the entire 360-degree surrounding scene. A voice memo is recorded at the same time so the user can note what they intended to photograph. In post-production, the system projects the omnidirectional image onto 24 perspective images, runs object detection (Google Cloud Vision API) on them, and presents the results through an accessible web interface. Users can manually select which detected objects to include via the web page, or use an automatic mode in which speech recognition matches object names spoken in the voice memo to detected objects. The system then crops an optimized region centered on the selected objects and outputs it as a conventional-looking photograph. This approach eliminates the need to aim the camera, automatically corrects camera skew, enables photographing moving targets, and allows computationally intensive processing on a server without real-time constraints.
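The automatic mode described above (match spoken object names to detected objects, then crop around the matches) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Detection` type and the `match_memo` and `crop_box` functions are hypothetical names introduced here, the transcript is assumed to have already been produced by speech recognition, and a padded bounding box around the selected objects stands in for the paper's "optimized region", whose exact criterion is not given in this summary. The sketch also works in the pixel coordinates of a single flat image and ignores the wraparound of an omnidirectional image.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a label plus a pixel bounding box (hypothetical type)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def match_memo(transcript, detections):
    """Keep detections whose label occurs in the transcript as a whole word or phrase."""
    matched = []
    for det in detections:
        pattern = r"\b" + re.escape(det.label) + r"\b"
        if re.search(pattern, transcript, flags=re.IGNORECASE):
            matched.append(det)
    return matched


def crop_box(selected, width, height, margin_px=20):
    """Return (left, top, right, bottom) enclosing all selected objects.

    The box is the union of the selected bounding boxes, padded by
    margin_px on every side and clamped to the image bounds.
    """
    if not selected:
        raise ValueError("no objects selected")
    left = min(d.x_min for d in selected) - margin_px
    top = min(d.y_min for d in selected) - margin_px
    right = max(d.x_max for d in selected) + margin_px
    bottom = max(d.y_max for d in selected) + margin_px
    return (max(0, left), max(0, top), min(width, right), min(height, bottom))
```

For example, given detections labeled "dog", "cup", and "chair" and the transcript "I wanted the dog and the cup", `match_memo` keeps only the dog and the cup, and `crop_box` returns one region containing both. Manual mode would skip `match_memo` and pass the user's own selection to `crop_box` directly.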
Key findings
A user study with 24 PVI (15 completely blind, 9 low vision; ages 24-75) compared VisPhoto against the standard iOS camera and tfCam (a conventional audio-guided app modeled on BlindCamera). Twenty sighted evaluators then assessed the resulting photographs. VisPhoto was significantly faster for taking the initial photo across all target types, since users only needed to press a button rather than aim. In the quality evaluation by sighted people, VisPhoto (both auto and manual modes) produced significantly higher-quality photographs than the iOS camera for all four target categories (within reach, out of reach, multiple, and moving), and significantly outperformed tfCam for multiple targets and moving targets. Participant preferences were more divided: for targets within reach (easy subjects), 71% preferred tfCam, but for difficult targets (out of reach, multiple, moving), 81% preferred VisPhoto on average. The reasons reflected fundamentally different values about photography. Those preferring tfCam emphasized self-expression and the feeling of personally creating the photograph, with comments like "Photography is a form of self-expression" and "tfCam gave me the feeling that I took photographs by myself, but VisPhoto did not." Those preferring VisPhoto valued reliability and speed, noting "there is no need to aim at the target object." Notably, two former professional photographers with low vision disagreed: one wanted personal compositional control (preferring tfCam), while the other prioritized reliably capturing targets (preferring VisPhoto), illustrating how individual photographic values shape tool preferences.
Relevance
This study reframes accessible photography as a post-production problem rather than a real-time one, which has implications for assistive technology design more broadly. The finding that 83% of PVI participants use cameras but only 42% share photographs, while 79% would share if technology improved, reveals a large unmet need in creative self-expression for blind and low vision users. For practitioners, the tension between user agency and automation is a key design insight: some users value the feeling of personally creating a photograph even if quality suffers, while others prioritize the result. This mirrors broader debates in assistive technology about independence versus efficiency. The privacy concerns raised are also significant: omnidirectional cameras capture everything surrounding the user, including content they may not want photographed, and blind users cannot visually verify what was captured before uploading to a server. Future accessible photography tools should consider offering both guided (real-time aiming) and post-production approaches to accommodate these different user values.
Tags: visual impairment · blindness · photography · computer vision · object detection · omnidirectional camera · assistive technology · self-expression · creative accessibility