"Not There Yet": Feasibility and Challenges of Mobile Sound Recognition to Support Deaf and Hard-of-Hearing People

Jeremy Zhengqi Huang, Hriday Chhabria, Dhruv Jain · 2023 · Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/3597638.3608431

Summary

This paper presents the first longitudinal field study of a mobile sound recognition system used by Deaf and hard-of-hearing (DHH) people in their daily lives. The researchers deployed SoundWatch, a smartwatch-based app that uses a deep learning model (Google YAMNet architecture) to classify environmental sounds in real time across 20 categories (dog bark, door knock, fire alarm, speech, etc.) and deliver visual and vibrational feedback to the wearer. Ten DHH participants (average age 48.6, range 21 to 75) wore the smartwatch for three weeks across diverse real-world contexts, including home, work, social situations, vehicles, outdoors, and commercial spaces. The study consisted of an initial interview and demo, three weeks of daily use with weekly surveys and ongoing text/email communication, and an exit interview with 25 questions across 10 categories. The app logged sound events automatically, recording an average of 112.7 sound events per day per participant. The researchers also showed participants two mid-fidelity design prototypes for handling AI errors: one displaying prediction uncertainty as multiple possible sound identities, and another letting users report and correct inaccurate predictions. This work extends prior lab-based and controlled evaluations by capturing the messy reality of sound recognition in everyday acoustic environments.
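The uncertainty prototype described above (showing several possible sound identities instead of a single label) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `k` and `min_score` parameters, and the example scores are all hypothetical, and the input is assumed to be a mapping from class label to a classifier score between 0 and 1.

```python
from typing import Dict, List, Tuple


def top_candidates(
    scores: Dict[str, float], k: int = 3, min_score: float = 0.1
) -> List[Tuple[str, float]]:
    """Return up to k labels with the highest scores, dropping weak ones.

    `k` and `min_score` are illustrative defaults, not values from the paper.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(label, score) for label, score in ranked[:k] if score >= min_score]


def format_alert(scores: Dict[str, float]) -> str:
    """Build a short alert string that exposes the model's uncertainty."""
    candidates = top_candidates(scores)
    if not candidates:
        return "unknown sound"
    return " or ".join(f"{label} ({score:.0%})" for label, score in candidates)


# Hypothetical scores for a beep that could be either of two sources.
example_scores = {"fire alarm": 0.48, "microwave beep": 0.41, "speech": 0.05}
print(format_alert(example_scores))
```

With the hypothetical scores shown, the alert lists both the fire alarm and the microwave beep, leaving the wearer to resolve the ambiguity from context.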

Key findings

The study revealed a stark gap between lab performance and real-world utility. All 10 participants found the app helpful in specific scenarios (monitoring children, noticing door knocks, detecting approaching traffic, being reminded of running water), yet all 10 also reported sound recognition errors, and five explicitly stated that the technology is not ready for daily adoption. Errors fell into four categories: false positives (most common, N=6, e.g., clothes rustling triggering bird predictions, car sounds registered as duck/goose), false negatives (N=4, e.g., missed emergency vehicles due to ambient noise), misattribution (N=3, e.g., microwaves and medical device beeps classified as fire alarms), and background noise interference (e.g., rain, TV sounds). The frequent false positives caused a dangerous desensitization effect: participants began ignoring notifications entirely, creating a "boy who cried wolf" dynamic that could cause them to miss critical alerts. Usage declined over time for some participants due to "growing frustration" and "burnout." Privacy concerns were nuanced: seven of 10 participants reported no concerns after three weeks, but three expressed worry at some point, and social tensions emerged (one participant hid the watch when friends visited). Participants strongly endorsed user-programmable sound recognition, with all 10 appreciating the ability to train the system on personal sounds. Nine of 10 preferred the error correction prototype over the uncertainty display, finding it more empowering to actively collaborate with the AI.

Relevance

This study delivers a critical reality check for the assistive technology field: sound recognition models that perform well in controlled settings fail in the acoustic complexity of real life. For practitioners and developers building sound awareness tools, the four error categories identified (false positives, false negatives, misattribution, background noise) provide a concrete taxonomy for testing and improvement. The desensitization finding is particularly alarming for safety-critical applications: if users learn to ignore notifications because of frequent errors, they may miss fire alarms or approaching vehicles. The design recommendations are immediately actionable: support end-user customization of sound models (letting users train on their specific microwave, doorbell, etc.), embed context awareness (adjusting confidence thresholds based on location), display AI uncertainty transparently, and enable collaborative error correction. The study also highlights an important tension in assistive AI: DHH users who cannot hear the sounds may struggle to verify or correct the AI, which poses a unique challenge for human-in-the-loop approaches that assume users can assess model accuracy.
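Two of the recommendations above, context-dependent confidence thresholds and collaborative error correction, can be combined in a small sketch. It is an illustration under stated assumptions rather than anything taken from the paper or from SoundWatch: the class name, the context names, and every numeric value (default threshold, step, ceiling) are hypothetical.

```python
from typing import Dict, Optional, Tuple


class ContextThresholds:
    """Decide whether to notify, using per-context confidence thresholds.

    All defaults are illustrative assumptions, not values from the study.
    """

    def __init__(
        self,
        default: float = 0.6,
        per_context: Optional[Dict[str, float]] = None,
        step: float = 0.05,
        ceiling: float = 0.95,
    ) -> None:
        self.default = default
        self.per_context = dict(per_context or {})
        self.step = step
        self.ceiling = ceiling
        # Thresholds raised by user error reports, keyed by (label, context).
        self._overrides: Dict[Tuple[str, str], float] = {}

    def threshold(self, label: str, context: str) -> float:
        base = self.per_context.get(context, self.default)
        return self._overrides.get((label, context), base)

    def should_notify(self, label: str, confidence: float, context: str) -> bool:
        return confidence >= self.threshold(label, context)

    def report_error(self, label: str, context: str) -> None:
        """User flags a wrong prediction: demand more confidence next time."""
        raised = min(self.threshold(label, context) + self.step, self.ceiling)
        self._overrides[(label, context)] = raised


# Hypothetical usage: a noisy cafe needs more confidence than home.
thresholds = ContextThresholds(per_context={"cafe": 0.8})
thresholds.report_error("bird", "home")  # e.g. clothes rustling flagged as bird
```

In this sketch a reported error only raises the bar for that label in that context, so one bad "bird" prediction at home does not suppress other alerts. It assumes the user can tell that a prediction was wrong, which the study notes is not always possible for DHH users.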

Tags: deaf and hard of hearing · sound recognition · wearable technology · smartwatch · assistive technology · machine learning · field study · human-AI interaction · privacy