Turning the Knobs of Musical Emotion: Designing Emotion-Oriented Audio Control Interface for Cochlear Implant Users

Hyojin Kim, Taein Song, Kyung Myun Lee · 2026 · CHI EA '26: Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems · doi:10.1145/3772363.3798345

Summary

Kim, Song, and Lee (KAIST) tackle a persistent but under-designed problem in hearing accessibility: cochlear implant (CI) users can hear music but receive it in a spectrally coarse, pitch-degraded form that blunts emotional perception. Rather than pursue the well-trodden sensory-substitution route (sound visualisation, haptics), the authors build a sound-manipulation interface that lets CI users tune the music themselves. The study runs in two stages. First, the authors run an acoustic-feature analysis in which 22 CI users and 22 normal-hearing (NH) listeners rated 5-second music excerpts on seven emotion categories and on continuous valence and arousal scales; low-level features (loudness, tempo, brightness, roughness, spectral centroid, RMS energy) were extracted with Librosa and linked to the ratings with linear mixed-effects models. Second, they design two sliders, one for valence and one for arousal, each with seven levels, driven by 49 pre-rendered versions of every excerpt produced in a digital audio workstation. The valence slider modulates spectral-centroid variability (brightness); the arousal slider modulates tempo and loudness (RMS). Fourteen CI users then completed an online study of 12 excerpts spanning the cheerful, tense, calm, and sad quadrants of the valence-arousal space, followed by a post-questionnaire, the System Usability Scale (SUS), and interviews with nine participants that were analysed thematically. The authors frame the work as a first step toward personalised, emotion-oriented CI music listening that does not assume audio-engineering literacy.
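
The two-slider design implies a lookup from a pair of slider positions to one pre-rendered file. The following is a minimal sketch, assuming the 49 versions correspond to the 7 × 7 combinations of valence and arousal levels (the summary does not say so explicitly). The function names and the file-naming scheme are hypothetical and are not taken from the paper.

```python
LEVELS = 7  # seven levels per slider, as reported in the summary


def version_index(valence_level: int, arousal_level: int) -> int:
    """Map two 1-based slider levels to a 0-based index in 0..48.

    Assumption: versions are laid out row-major, valence first.
    """
    for name, level in (("valence", valence_level), ("arousal", arousal_level)):
        if not 1 <= level <= LEVELS:
            raise ValueError(f"{name} level must be in 1..{LEVELS}, got {level}")
    return (valence_level - 1) * LEVELS + (arousal_level - 1)


def version_filename(excerpt_id: str, valence_level: int, arousal_level: int) -> str:
    """Build a file name for a pre-rendered version (hypothetical naming scheme)."""
    version_index(valence_level, arousal_level)  # reuse the range check
    return f"{excerpt_id}_v{valence_level}_a{arousal_level}.wav"


if __name__ == "__main__":
    # Every slider combination resolves to a distinct render.
    indices = {
        version_index(v, a)
        for v in range(1, LEVELS + 1)
        for a in range(1, LEVELS + 1)
    }
    print(len(indices))
    print(version_filename("excerpt01", 4, 4))
```

Pre-rendering keeps playback simple, since moving a slider only swaps files, at the cost of storing 49 renders per excerpt. Which level corresponds to the unmodified excerpt is not stated in this summary.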

Key findings

CI users' emotion ratings clustered more tightly and closer to neutral in valence-arousal space than those of NH listeners (two-way ANOVA group-by-emotion interaction: valence F(6,294)=10.08, arousal F(6,294)=9.59, both p<.001). CI participants showed less negative valence for sad and tense music and attenuated arousal for tense and energetic music, confirming that CI users experience a compressed emotional range. The predictive acoustic features diverged by group: for CI listeners, spectral-centroid variability was the strongest predictor of valence, and tempo plus RMS energy drove arousal, consistent with prior findings that CI users lean on temporal rather than pitch cues. In the interface study, participants actively modified the music in 83.48% of trials. Subjective scores were positive: perceived effectiveness M=5.50/7, clarity of change M=5.36, trial-level emotional adequacy M=5.89, intuitiveness M=5.14, satisfaction M=5.71, willingness to reuse M=5.79, and SUS 76.79 (above the 68 benchmark). Cheerful and sad excerpts drew consistent same-direction slider use, but calm and tense excerpts produced opposite-direction valence adjustments. Interviews traced this to perceptual crosstalk: users heard the valence filter as changing loudness or density. Participants also appropriated the interface beyond emotion, citing vocal clarity, reduced listening fatigue, and auditory self-awareness.
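
For readers unfamiliar with how a SUS figure such as 76.79 is derived, the standard scoring rule for the ten-item questionnaire can be sketched as follows. The function name is illustrative and the example responses are invented; the paper's raw data are not reproduced here.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's SUS questionnaire on the 0-100 scale.

    `responses` holds the ten item ratings in order, each from 1 to 5.
    Odd-numbered items are positively worded and contribute (rating - 1);
    even-numbered items are negatively worded and contribute (5 - rating).
    The summed contributions (0-40) are multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError(f"item {item_number}: rating must be in 1..5")
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


if __name__ == "__main__":
    # Invented responses for illustration only.
    print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 2]))  # 80.0
```

A study-level SUS value is conventionally the mean of the per-participant scores, which is why it need not be a multiple of 2.5.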

Relevance

For accessibility practitioners this paper is a useful counterweight to the dominant DHH music-access paradigm of visualisation and haptics: it shows CI users can meaningfully benefit from direct audio-parameter control when parameters are abstracted into emotion-oriented, jargon-free sliders rather than exposed as raw EQ and compression knobs. The specific finding that spectral-centroid variability and tempo/RMS predict CI emotion ratings is actionable for anyone designing streaming players, hearing-aid companion apps, or music-therapy tools for CI users. The limitations are significant: the interface evaluation involved only 14 CI users, excerpts are 5 seconds long, stimuli are Western-pop-leaning, and filter crosstalk means the valence and arousal dimensions are not cleanly separable, so practitioners should not present them as orthogonal in a shipping product. The spontaneous appropriation for vocal clarity and fatigue management is arguably the more important design signal: CI users want a general-purpose listening-personalisation layer, not just an emotional one.

Tags: cochlear implant · music accessibility · deaf and hard of hearing · audio interface · musical emotion · valence-arousal · personalization · HCI

Standards referenced: System Usability Scale (SUS) · Russell's Circumplex Model of Affect