Pedestrian Detection with Wearable Cameras for the Blind: A Two-way Perspective

Kyungjun Lee, Daisuke Sato, Saki Asakawa, Hernisa Kacorri, Chieko Asakawa · 2020 · Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems · doi:10.1145/3313831.3376398

Summary

Wearable cameras, embedded in commercial products like OrCam, Aira, and eSight, promise blind users equitable access to visual information about the people around them: who is approaching, where they are looking, whether eye contact is possible. But the always-on nature of such devices creates a second, rarely studied stakeholder: the bystander who may be recorded and computationally analysed without ever meeting the user. Lee and colleagues tackle this tension as a two-sided social acceptance problem, running two studies that deliberately vary camera visibility (a conspicuous GoPro with a head strap vs. near-invisible Vuzix Blade smart glasses), type of exposure (watching a video vs. physically passing a blind person in a corridor), and the granularity of features the system extracts (from presence and distance only up to age, gender, ethnicity, and facial expression). The first study is a 2×2 factorial online survey in which 206 U.S. Mechanical Turk workers view passerby- and user-perspective videos. The second study is an in-situ experiment in which 10 blind participants wear GlAccess, a working prototype built on Vuzix Blade smart glasses that uses MTCNN face detection, InsightFace embeddings, and a head-pose model to deliver audio descriptions such as 'Near, looking at you, female, young'. While they wear it, 40 sighted passersby walk past them in a 20-metre corridor and answer social-acceptance questionnaires. The paper frames the results as a design-space map rather than a yes/no verdict.
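
To make the idea of feature granularity concrete, here is a minimal Python sketch of how per-person attributes might be turned into a short spoken description, with each feature switchable on or off. It is not the GlAccess implementation: the `Detection` fields, the feature names, the "Far" label, and the 2-metre "near" cut-off are all assumptions chosen for illustration. Only the output format follows the example quoted above.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Assumed cut-off for "Near"; the summary does not state the real value.
NEAR_THRESHOLD_M = 2.0


@dataclass
class Detection:
    """Hypothetical per-person output of a detection pipeline."""
    distance_m: float
    looking_at_user: bool
    gender: Optional[str] = None
    age_group: Optional[str] = None


def describe(det: Detection, enabled: Set[str]) -> str:
    """Build a comma-separated description using only the enabled features."""
    parts = []
    if "distance" in enabled:
        parts.append("Near" if det.distance_m <= NEAR_THRESHOLD_M else "Far")
    if "head_pose" in enabled:
        parts.append("looking at you" if det.looking_at_user else "not looking at you")
    if "gender" in enabled and det.gender:
        parts.append(det.gender)
    if "age" in enabled and det.age_group:
        parts.append(det.age_group)
    return ", ".join(parts)


person = Detection(distance_m=1.2, looking_at_user=True, gender="female", age_group="young")
full = describe(person, {"distance", "head_pose", "gender", "age"})
minimal = describe(person, {"distance", "head_pose"})
print(full)
print(minimal)
```

Restricting the enabled set to distance and head pose yields the coarsest description, which is one way to picture the low end of the granularity range the studies varied.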

Key findings

Blind and sighted participants consistently disagreed on which visual features the wearable should extract. Blind users prioritised head pose, distance, and position (indicators of a person's availability for conversation) and rated age estimation relatively low; sighted bystanders strongly resisted the system detecting ethnicity (top objection) and facial expression, with many rejecting age detection as well. Camera visibility mattered: the near-invisible smart glasses drew significantly more discomfort about recording (p<0.01, r=0.715 for camera noticeability), as sighted participants objected to being recorded without knowing about it. Direct, in-person exposure shifted opinions more than watching a video: in-situ sighted participants became significantly more uncomfortable with assistive use (UncomfortableAssistUse, p<0.05) after learning what the system actually captured, while becoming more permissive about one-time image use (OkaySingleUse, p<0.01). Both groups generally agreed that camera-based assistive technology for blind people is acceptable (OkayAssistUse means µ=2.2 online, µ=2.4 in-situ on a -3 to 3 scale), but roughly 30% of sighted participants refused any recognition of themselves, and one in-situ participant flipped from 'Strongly agree' to 'Moderately disagree' after seeing the user-perspective video. The GlAccess prototype itself achieved F-scores of 0.77–1.0 on name, gender, head-pose, and position estimation.

Relevance

This paper is essential reading for anyone designing or deploying camera-based assistive technology in shared public space. Its main contribution is methodological as much as empirical: studying social acceptance without the bystander in the room misses exactly the reactions that matter. For practitioners, the concrete design takeaways are (1) make the camera visible rather than hidden if social deployment is the goal — invisibility amplifies distrust; (2) limit extracted features to what blind users actually need for social initiation (head pose, distance, position) and treat ethnicity, facial expression, and unconstrained face recognition as high-risk additions; (3) consider a 'known people only' recognition model, which over 85% of bystanders accepted. Limitations include small, educated, campus-based samples and a single use case (corridor pedestrian detection). Nevertheless, the findings directly inform consent frameworks, feature toggles, and privacy defaults for products like OrCam, Envision Glasses, and Meta Ray-Bans.
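
The 'known people only' model in takeaway (3) can be pictured as a gate in front of the recognition step: a name is announced only when a face embedding matches someone the user has enrolled, and strangers are never identified. The Python sketch below illustrates that idea under stated assumptions. The function names, the toy two-dimensional embeddings, and the 0.6 similarity threshold are hypothetical; they do not come from the paper or from any particular face-recognition library.

```python
import math
from typing import Dict, List, Optional

# Assumed similarity cut-off; a real system would tune this on its own data.
MATCH_THRESHOLD = 0.6


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_known_person(
    embedding: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = MATCH_THRESHOLD,
) -> Optional[str]:
    """Return the best-matching enrolled name, or None for anyone not enrolled."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy embeddings for two people who agreed to be enrolled (hypothetical data).
enrolled = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}
known = identify_known_person([0.9, 0.1], enrolled)
stranger = identify_known_person([-1.0, -1.0], enrolled)
print(known, stranger)
```

With an empty enrolment list the function always returns None, so a privacy-protective default falls out of the design: no bystander is named until someone has been added deliberately.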

Tags: wearable camera · pedestrian detection · social acceptance · face recognition · privacy · visual impairment · assistive technology · smart glasses · bystander privacy · crowdsourcing