Sign Transition Modeling and a Scalable Solution to Continuous Sign Language Recognition for Real-World Applications
Kehuang Li, Zhengyu Zhou, Chin-Hui Lee · 2016 · ACM Transactions on Accessible Computing · doi:10.1145/2850421
Summary
This paper presents a scalable framework for continuous sign language recognition (SLR) designed to work in real-world conditions on affordable hardware. The researchers address a fundamental challenge in SLR: modeling the transitions between signs. Unlike spoken language, where transitions between sounds are brief and handled through coarticulation modeling, sign language transitions involve complex hand movements that are highly variable and difficult to characterize. The authors adapt Hidden Markov Model (HMM) techniques from Automatic Speech Recognition (ASR), introducing a universal transition model that implicitly classifies all transition signals through Gaussian splitting. This approach allows the system to scale to large vocabularies without the computational explosion that plagued previous methods requiring multiple explicit transition models. Data was collected using low-cost digital gloves (~$150) equipped with gyroscopes and accelerometers, far cheaper than the $17,000+ CyberGloves used in prior research, which makes the technology potentially accessible to deaf and hard of hearing communities.
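To make the idea of Gaussian splitting concrete, here is a minimal sketch of one common mixture-splitting heuristic from HMM-based speech recognition: the heaviest component of a diagonal-covariance Gaussian mixture is replaced by two copies whose means are nudged apart by a fraction of the standard deviation. This summary does not describe the authors' exact procedure, so the function name `split_heaviest`, the 0.2 offset, and the toy two-dimensional features are all assumptions made for illustration. In practice each split would be followed by re-estimation of the model parameters on training data, which is omitted here.

```python
import math


def split_heaviest(weights, means, variances, offset=0.2):
    """Split the highest-weight diagonal Gaussian into two components.

    Hypothetical helper: the offset of 0.2 standard deviations is an
    assumed value, not one taken from the paper.
    """
    # Pick the component with the largest mixture weight (first one on ties).
    k = max(range(len(weights)), key=lambda i: weights[i])
    half = weights[k] / 2.0
    # Perturb the mean by +/- offset standard deviations in every dimension.
    shift = [offset * math.sqrt(v) for v in variances[k]]
    up = [m + s for m, s in zip(means[k], shift)]
    down = [m - s for m, s in zip(means[k], shift)]
    # Both children inherit the parent's variances and half of its weight.
    new_weights = weights[:k] + [half, half] + weights[k + 1:]
    new_means = means[:k] + [up, down] + means[k + 1:]
    new_variances = (
        variances[:k]
        + [list(variances[k]), list(variances[k])]
        + variances[k + 1:]
    )
    return new_weights, new_means, new_variances


# Start from a single Gaussian for one transition-model state (toy values)
# and grow it to three components with two successive splits.
weights, means, variances = [1.0], [[0.0, 0.0]], [[1.0, 4.0]]
for _ in range(2):
    weights, means, variances = split_heaviest(weights, means, variances)

print(len(weights), weights)
```

The point of the sketch is that a single shared transition model can grow extra mixture components to cover different kinds of transition movement, rather than requiring a separate explicit model for each kind.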
Key findings
The system achieved 87.4% word accuracy on a 510-word Chinese sign language vocabulary, with real-time processing averaging just 0.69 seconds per sentence. Testing involved 1,024 sentences from five deaf/hard of hearing signers plus one hearing sign language teacher. The universal transition model outperformed both no-transition approaches (13.9% accuracy) and extended-sign models (79.2%), while remaining more efficient than multi-transition-model approaches, which required minutes per sentence. Scalability was demonstrated by extending the vocabulary from 86 to 510 words with minimal accuracy loss and consistent real-time speed. Cross-validation on unseen signers showed 73% average accuracy, with native signers reaching 89.5%, a gap comparable to the way ASR systems perform worse on speakers with strong accents. The dominant hand carried significantly more information: using only right-hand features achieved 77% accuracy, though optimal performance required both hands.
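The word accuracy figures above can be read with the following sketch in mind. It assumes the standard ASR definition, accuracy = (N - S - D - I) / N, where N is the number of reference words and S, D, I are substitutions, deletions, and insertions in the best alignment. This summary does not state which definition the authors used, so that choice is an assumption, as are the function name `word_accuracy` and the example sign glosses.

```python
def word_accuracy(reference, hypothesis):
    """Word accuracy under the assumed ASR-style definition (N - S - D - I) / N.

    The minimum total of substitutions, deletions, and insertions is the
    word-level Levenshtein distance between the two sequences.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return (n - prev[m]) / n


# Hypothetical glosses: the recognizer drops one of four signs.
ref = ["I", "WANT", "DRINK", "WATER"]
hyp = ["I", "WANT", "WATER"]
print(word_accuracy(ref, hyp))
```

Under this definition insertions are penalized as well, so accuracy can fall below zero when the recognizer emits many spurious words.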
Relevance
This research demonstrates that practical, affordable sign language recognition is achievable for medium-vocabulary real-world applications. The use of low-cost gloves rather than expensive laboratory equipment or camera-based systems (which have lighting and background limitations) is significant for actual deployment in deaf communities. The framework's compatibility with existing ASR infrastructure means it can leverage decades of speech recognition advances, including deep learning techniques. For accessibility practitioners, this work shows progress toward bridging communication between sign language users and non-signers. Limitations include the focus on Chinese Sign Language (though the framework is language-agnostic), the requirement for users to wear gloves, and the current 510-word vocabulary ceiling. Future directions include larger vocabularies, improved glove design, and signer adaptation techniques.
Tags: sign language recognition · hidden Markov models · machine learning · deaf and hard of hearing · wearable technology · gesture recognition · assistive technology