LaboSignes: an Interactive French Sign Language Recognition Interface

Jules Françoise, Julie Lascar, Cyril Verrechia, Sidonie Minodier, Michèle Gouiffès, Annelies Braffort · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26) · doi:10.1145/3772363.3799328

Summary

LaboSignes is a web-based, interactive French Sign Language (LSF) recognition system built for search-by-sign. It addresses the fact that most online resources, including bilingual dictionaries and news sites, can be navigated only through written French, which is a second language for many Deaf users. The team designed the system around three constraints that they argue prior sign-language ML work has ignored. It must run from a standard webcam (no depth cameras or data gloves). Its latency must be low enough to feel interactive. It must also surface alternative results clearly so that users can recover from recognition errors.

The recognizer is a pose-based pipeline. MediaPipe Holistic landmarks are extracted client-side in JavaScript, and a subset of body, hand, and face keypoints is selected and normalized. A lightweight Transformer encoder (2 layers, 4 heads) then classifies the sequence into one of 445 LSF labels drawn from the Mediapi-RGB news corpus (~15,000 clips). End-to-end inference takes roughly 21 ± 7 ms.

The recording interface was designed iteratively with a Deaf engineer. A posture-feedback disk turns green when the user is at the right distance and correctly framed. Recording is triggered by a thumb-up gesture and stops automatically when both hands leave the frame. The top-5 results appear as an interactive bar chart linked to reference videos and, for 196 signs, to 3D avatars driven by motion capture. A within-subjects study with 16 LSF signers (10 Deaf, 6 hearing interpreters or experienced signers) compared three recording triggers: the thumb-up gesture, automatic posture-based triggering, and a button.
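To make two of the client-side steps concrete, here is a minimal Python sketch of keypoint normalization and of a recording segmenter that starts on a thumb-up and stops when no hand is visible. It is written in Python for readability, whereas the paper's client runs in JavaScript. Only the MediaPipe Pose shoulder indices (11 and 12) come from the real landmark model. The shoulder-centered normalization, the hypothetical `Recorder` class, and its `absent_frames` threshold are assumptions, not the authors' implementation. The thumb-up and hand-visibility flags are taken as given rather than detected.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from math import hypot

Point = tuple[float, float]

# Shoulder indices in MediaPipe's 33-landmark Pose model.
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12


def normalize_frame(points: list[Point], pose: list[Point]) -> list[Point]:
    """Center keypoints on the shoulder midpoint and scale by shoulder width
    (an assumed scheme, so signer position and distance matter less)."""
    lx, ly = pose[LEFT_SHOULDER]
    rx, ry = pose[RIGHT_SHOULDER]
    cx, cy = (lx + rx) / 2, (ly + ry) / 2
    width = hypot(lx - rx, ly - ry) or 1.0  # avoid dividing by zero
    return [((x - cx) / width, (y - cy) / width) for x, y in points]


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    DONE = auto()


@dataclass
class Recorder:
    """Hypothetical trigger logic: a thumb-up starts recording, and recording
    stops once no hand has been seen for `absent_frames` consecutive frames."""
    absent_frames: int = 5  # assumed threshold, not taken from the paper
    state: State = State.IDLE
    frames: list = field(default_factory=list)
    missing: int = 0

    def step(self, frame, thumb_up: bool, hands_visible: bool) -> State:
        if self.state is State.IDLE and thumb_up:
            self.state = State.RECORDING  # the trigger frame itself is not kept
        elif self.state is State.RECORDING:
            self.frames.append(frame)
            self.missing = 0 if hands_visible else self.missing + 1
            if self.missing >= self.absent_frames:
                # Trim the trailing hand-less frames before classification.
                del self.frames[-self.absent_frames:]
                self.state = State.DONE
        return self.state


rec = Recorder(absent_frames=2)
stream = [("f0", False, True), ("f1", True, True), ("f2", False, True),
          ("f3", False, False), ("f4", False, False)]
for frame, thumb, hands in stream:
    rec.step(frame, thumb, hands)
print(rec.state.name, rec.frames)  # DONE ['f2']
```

Trimming the hand-less tail is one way to keep the idle frames that trigger the stop out of the classifier's input. A real system would also need a debounce on the thumb-up to limit the involuntary starts the study reports.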

Key findings

Top-1 recognition accuracy on the held-out Mediapi test set was 90.87% and Top-5 was 98.56%. On the user-study data (30 labels recorded with webcams under less controlled conditions), Top-1 dropped to 71.65% and Top-5 to 87.03%. This suggests reasonable robustness to the domain shift from coarticulated signs in studio news footage to isolated signs recorded on a webcam.

The thumb-up trigger was preferred by 10 of 16 participants (versus 3 for the automatic trigger and 2 for the button), even though accuracy did not differ significantly across triggers. The automatic trigger started recordings involuntarily for 7 participants and the thumb-up did so for 3, while 4 participants had trouble deliberately activating the thumb-up. A Mann-Whitney test showed that Deaf participants had significantly higher Top-1 accuracy than hearing signers (U=10, p=0.031, r=0.53). The authors interpret this as cleaner articulation by L1 signers. A bug in the data-augmentation pipeline initially produced much lower accuracy for the only left-handed participant, and fixing it and retraining resolved the problem.

Errors clustered into four patterns:
- another person in the frame, or stray motion at the end of a recording;
- depth-encoded grammar (movement forward or backward to mark future or past) that is lost in 2D pose;
- genuine sign ambiguities (e.g., "dog" and "Bordeaux");
- regional variation.

Participants explicitly framed regional variation as a political concern. They asked that ML systems not enforce a Paris-centric standard at the expense of local LSF dialects rooted in regional Deaf schools.
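The Mann-Whitney result reported above can be sanity-checked from the group sizes alone (10 Deaf, 6 hearing). The sketch below makes three assumptions: there are no ties among per-participant accuracies, the p-value is exact and two-sided, and the effect size is r = |Z|/√N with Z taken from the continuity-corrected normal approximation. The summary does not say which variant the authors used. Under these assumptions the sketch reproduces the reported p ≈ 0.031 and r ≈ 0.53.

```python
from functools import lru_cache
from math import comb, sqrt


@lru_cache(maxsize=None)
def count_u(u: int, m: int, n: int) -> int:
    """Number of orderings of m values from group X and n from group Y
    (no ties) whose Mann-Whitney U, the count of (x, y) pairs with x > y,
    equals u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest value is either an X (beating all n Ys) or a Y (beating none).
    return count_u(u - n, m - 1, n) + count_u(u, m, n - 1)


def exact_p_two_sided(u: int, m: int, n: int) -> float:
    """Exact two-sided p-value under the null of identical distributions."""
    total = comb(m + n, m)
    lower = sum(count_u(k, m, n) for k in range(u + 1)) / total
    upper = sum(count_u(k, m, n) for k in range(u, m * n + 1)) / total
    return min(1.0, 2 * min(lower, upper))


def effect_size_r(u: int, m: int, n: int) -> float:
    """r = |Z| / sqrt(N), Z from the continuity-corrected normal approximation."""
    mean = m * n / 2
    sd = sqrt(m * n * (m + n + 1) / 12)
    z = max(abs(u - mean) - 0.5, 0.0) / sd
    return z / sqrt(m + n)


# 10 Deaf vs 6 hearing participants, U = 10 as reported.
print(round(exact_p_two_sided(10, 10, 6), 3))  # 0.031
print(round(effect_size_r(10, 10, 6), 2))      # 0.53
```

With only 16 participants, the exact null distribution (125 of 8,008 orderings give U ≤ 10) is preferable to the normal approximation for the p-value. The normal approximation without continuity correction would give a slightly larger r (about 0.54).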

Relevance

For accessibility practitioners, the most useful contribution here is not the model. It is the framing of sign-language recognition as an HCI problem rather than purely a machine-learning benchmark. The authors show that recognition accuracy alone does not determine user preference: the gesture trigger won although accuracy did not differ significantly across triggers. They also show that running everything in the browser from a commodity webcam is feasible. This lowers the deployment barrier for Deaf-targeted tools that have historically required specialized hardware.

The paper is also a clear example of De Meulder's "good enough" problem. ML systems trained on whatever data is available tend to standardize a single dialect. Deaf participants pushed back precisely because regional LSF variation is a Deaf cultural and political asset, shaped by the fragmentation of French Deaf education after the 1880 Milan Congress. Practitioners building any Deaf-facing tool should treat regional variation not as noise to filter out but as a data-collection and governance requirement.

Limitations:
- The 445-label vocabulary is small, drawn from a single news corpus, and skewed toward news topics.
- The study involved only 16 participants and 30 test signs.
- The system handles only isolated lexical signs, not depicting signs, classifiers, or continuous signing, which together carry much of LSF's expressive load.

Tags: sign language · French Sign Language · sign language recognition · Deaf accessibility · gesture recognition · interface design · transformer · pose estimation · web accessibility