Sona: Towards Context-Aware, Real-Time Personalization of Acoustic Environments for Noise Sensitivity

Jeremy Zhengqi Huang, Emani Hicks, Sidharth, Gillian R Hayes, Dhruv Jain · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26) · doi:10.1145/3772363.3799034

Summary

Huang and colleagues (University of Michigan and UC Irvine) reframe acoustic accessibility for people with noise sensitivity as selective, user-steerable soundscape mediation rather than wholesale noise cancellation. The authors argue that current commercial tools - active noise cancellation and transparency modes on AirPods Pro and similar hearables - treat the environment as a single stream, forcing a binary trade-off between comfort and situational awareness. This is the wrong trade-off for noise-sensitive users (an estimated 50-70% of autistic adults, large fractions of people with ADHD, and the roughly one-fifth of the UK population reporting misophonia symptoms), who report wanting to stay aware of their surroundings while muting specific triggering sounds. Sona builds on the recent 'semantic hearing' research line, which demonstrated that real-time target sound extraction is feasible, but inverts the usual framing: rather than foregrounding a target stream (typically speech), Sona is residual-first - the ambient soundscape is the default listening signal, and individual nuisance sounds are selectively attenuated on top of it. The system has three coordinated layers: a Contextual Sensing layer that uses on-device YAMNet to recognize sounds and surface them as one-tap shortcuts (shifting the interaction from recall to recognition during sensory overload); a Live Audio Filtering layer built on a DCCRN suppression model that is conditioned on the target sound via FiLM, using embeddings extracted from AudioSep, and that supports adjustable per-class suppression strength; and a Personalization layer that lets users record short audio samples of idiosyncratic triggers (a specific neighbor's dog, a particular appliance) and register them as new filter classes without retraining. A formative survey of 68 noise-sensitive participants screened with the SSSQ-2 informed the initial 25 sound classes.
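
To make the conditioning step concrete, here is a minimal sketch of FiLM (feature-wise linear modulation) in its general form: a target-sound embedding is mapped to one scale (gamma) and one shift (beta) per feature channel, and these modulate the suppression network's intermediate features. This is not the authors' implementation. The function names, the tiny dimensions, and the weight values are hypothetical, and the summary does not say where inside the DCCRN the modulation is applied.

```python
def linear(embedding, weights, bias):
    """Dense layer: map an embedding to one value per feature channel."""
    return [sum(w * e for w, e in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def film(features, gamma, beta):
    """Apply FiLM: scale and shift every activation in each channel.

    features: list of channels, each a list of activations over time.
    gamma, beta: one scale and one shift per channel.
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per channel")
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Hypothetical 2-dimensional embedding standing in for an AudioSep embedding.
target_embedding = [1.0, 0.0]

# Hypothetical learned projections from embedding to FiLM parameters.
gamma = linear(target_embedding, [[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
beta = linear(target_embedding, [[0.0, 0.0], [1.0, 0.0]], [0.0, 0.0])

# Two feature channels, two time steps each.
features = [[1.0, 2.0], [3.0, 4.0]]
modulated = film(features, gamma, beta)
```

Because only the embedding changes between targets, the same network weights can serve a built-in class or a user-recorded one, which is the property the summary attributes to the embedding-based design.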

Key findings

This is a system-and-formative-study paper without a deployment evaluation; the empirical contributions sit at three levels. (1) Survey findings on triggering sounds: among 68 SSSQ-2-positive respondents, sudden loud/impact sounds (fireworks, thunder) and sharp high-pitched sounds (metal scraping) were rated most distressing, followed by mouth sounds and tools/construction sounds, with lawn mowers (65%), leaf blowers (61%), and general construction noise (58%) topping the frequency list. (2) Three design goals derived from the literature - fine-grained selective regulation, low-friction control during distress, and end-user personalization - map directly to Sona's three layers. (3) Technical feasibility: the AudioSep+DCCRN+FiLM architecture runs at ~10-15 ms inference time and ~75 ms end-to-end latency on an iPhone 16 Pro Max, making real-time interactive suppression viable on commodity hardware, and the embedding-based approach lets a single causal network handle both built-in and user-defined targets. The user-controlled suppression strength alpha cleanly separates 'what to suppress' from 'how strongly to suppress' by interpolating between the original and attenuated signals. The authors position Sona as collaborative - personal context recordings can be made with help from family or a support network during calmer moments rather than during acute sensory overload.
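
The suppression-strength control can be illustrated with a short sketch. It assumes the interpolation is linear and applied per sample, with alpha in [0, 1]; the summary says only that alpha interpolates between the original and attenuated signals, so the exact form and the function name `apply_suppression` are assumptions.

```python
def apply_suppression(original, attenuated, alpha):
    """Blend the original frame with the model's attenuated frame.

    alpha = 0.0 returns the original audio unchanged;
    alpha = 1.0 returns the fully attenuated output.
    Linear per-sample blending is an assumption of this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(original) != len(attenuated):
        raise ValueError("frames must have the same length")
    return [(1.0 - alpha) * o + alpha * a
            for o, a in zip(original, attenuated)]


# Hypothetical three-sample frame and the model's output for it.
frame = [0.5, -0.25, 1.0]
suppressed = [0.25, -0.25, 0.0]

half = apply_suppression(frame, suppressed, 0.5)
```

Under this assumption the model decides what to remove (the difference between `frame` and `suppressed`), while alpha alone decides how much of that difference reaches the listener, so the strength can be changed without re-running or re-conditioning the network.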

Relevance

For accessibility practitioners and product teams working on hearables, hearing tech, and neurodivergent-focused tools, this paper articulates a clear interaction model that consumer ANC products are not delivering: selective, graduated, context-aware suppression that preserves environmental awareness. The recall-to-recognition shift (surface what the system already detects rather than asking the user to specify it) is a directly applicable design pattern for any assistive tool used during cognitive overload, and the few-shot embedding-based personalization approach is a clean answer to the well-known limit of fixed sound taxonomies for idiosyncratic triggers. The framing of acoustic accessibility as a noise-sensitivity issue - spanning autism, ADHD, misophonia, and hyperacusis - usefully expands hearing accessibility beyond Deaf/hard-of-hearing users into a much larger neurodivergent population that consumer ANC underserves. Caveats: no user evaluation yet, the system is a CHI EA prototype rather than a deployed product, real-world latency and acoustic artifacts in the suppressed output are not characterized in deployment, and effectiveness for the personalized few-shot recording flow has not been measured. Deployment studies, ideally longitudinal and in real triggering environments, are the obvious next step.

Tags: noise sensitivity · misophonia · hyperacusis · autism · neurodivergence · acoustic accessibility · assistive technology · machine learning · sensory overload