Effect of Automatic Sign Recognition Performance on the Usability of Video-Based Search Interfaces for Sign Language Dictionaries

Oliver Alonzo, Abraham Glasser, Matt Huenerfauth · 2019 · Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2019) · doi:10.1145/3308561.3353791

Summary

This paper investigates how the performance of automatic sign recognition technology affects user satisfaction when searching for unfamiliar words in an ASL-to-English dictionary. Looking up an unknown sign in ASL is fundamentally harder than looking up an unknown written word: ASL has no standard written form and no intuitive alphabetical ordering, so users cannot simply type in what they saw. Video-based dictionary search systems allow users to perform a sign into a webcam and receive a ranked list of possible matches. However, researchers developing such systems report performance inconsistently — some report top-4 accuracy, others top-20 or top-375 — and no prior work has examined how system performance relates to actual user satisfaction. The authors conducted two Wizard-of-Oz studies using a simulated webcam-based ASL dictionary where results were pre-determined rather than generated by real recognition technology, allowing precise control over two variables: the placement of the desired sign in the results list (positions 1, 5, 10, or 20 in the "placement study") and the precision of surrounding results (how similar the distractor signs were to the desired sign in the "precision study"). Participants were hearing ASL students (16 in the placement study, 10 in the precision study) who viewed stimulus videos of a native ASL signer performing 32 relatively advanced signs, then performed each sign into the webcam to "search" for it.

Key findings

In the placement study, both user satisfaction with ranking and perceived relevance of results decreased significantly as the desired sign appeared lower in the list (Friedman tests, p<0.01 for both measures across all four placement conditions). Critically, satisfaction dropped below the midpoint of the scale somewhere between positions 10 and 20, suggesting researchers should focus on placing the desired sign within the top 10 or better. In the precision study (where placement was held nearly constant at position 10±2), the similarity of surrounding signs to the desired sign also significantly affected both satisfaction (χ²=16.526, p<0.01) and perceived relevance (χ²=35.438, p<0.01). High-precision lists (filled with visually similar signs sharing handshape and location) received significantly higher satisfaction and relevance ratings than medium- or low-precision lists, even though the desired sign was in the same position. This was an unexpected and important finding: users don’t just care about finding the right answer; they also want the entire results list to look coherent and relevant. When comparing information retrieval metrics, normalised Discounted Cumulative Gain (nDCG), which incorporates both placement and the graded relevance of all items in the list, correlated significantly with user judgements in both studies, while binary DCG (bDCG, which treats only the exact desired word as relevant) failed to correlate with user judgements in the precision study. This suggests researchers should report nDCG rather than simple top-k accuracy.
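The difference between the two metrics can be illustrated with a minimal sketch. It assumes the common log2 discount and a hypothetical 0-3 relevance scale (3 for the desired sign, 2 for a distractor sharing handshape and location, 0 for an unrelated sign); the function names, the scale, and the choice to normalise by the best ordering of the same list are illustrative assumptions, not the paper's exact formulation.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: sum of gain / log2(rank + 1), ranks from 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG divided by the DCG of the same gains sorted best-first (assumed convention)."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def binary_dcg(position):
    """Binary DCG: only the desired sign counts (gain 1), discounted by its rank."""
    return 1 / math.log2(position + 1)

# Two hypothetical 10-item result lists, desired sign (gain 3) at position 10.
high_precision = [2] * 9 + [3]  # distractors resemble the desired sign
low_precision = [0] * 9 + [3]   # distractors are unrelated signs

print("bDCG (both lists):", binary_dcg(10))
print("nDCG high precision:", ndcg(high_precision))
print("nDCG low precision:", ndcg(low_precision))
```

Under these assumptions the binary score is identical for both lists because the desired sign sits at the same rank, while nDCG is higher for the list with similar distractors. This is consistent with the reported result that only nDCG tracked user judgements when precision varied and placement did not.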

Relevance

This paper provides essential methodological guidance for researchers building sign language recognition and dictionary search systems. The finding that users care about both the position of the correct result and the quality of surrounding results has direct implications for algorithm design: recognition systems should not only optimise for placing the correct sign near the top, but also for producing coherent, visually similar result sets. The finding that satisfaction falls below the scale midpoint somewhere between positions 10 and 20 gives recognition researchers a concrete performance target. For the broader Deaf and hard of hearing (DHH) community, improved ASL dictionary search is significant: there are approximately 500,000 ASL users in the U.S. and 28 million people who are DHH, and increasing ASL knowledge facilitates greater communication and inclusion. A limitation of the study is that participants were hearing ASL students rather than DHH users, who may have different search behaviours and preferences; future work with DHH participants is essential. The recommendation to adopt nDCG as a standard evaluation metric could help unify a fragmented research field where inconsistent reporting makes cross-system comparison nearly impossible.

Tags: sign language · ASL · sign language recognition · dictionary · Deaf and hard of hearing · information retrieval · Wizard of Oz · usability · computer vision · evaluation methods