Dual Level Intraframe Coding for Increased Video Telecommunication Bandwidth

David M. Saxe, Richard A. Foulds, Arthur W. Joyce · 1998 · Proceedings of the Third International ACM Conference on Assistive Technologies (Assets '98) · doi:10.1145/274497.274523

Summary

This paper from the Applied Science and Engineering Laboratories at the University of Delaware presents a dual-level video compression approach designed to make sign language transmission viable over bandwidth-limited telephone networks. The authors identify a fundamental mismatch between how standard video conferencing systems prioritise data and what sign language communication requires. Standard systems prioritise audio quality first, then image quality, then frame rate, whereas sign language needs the reverse: a high frame rate (at least 10-12 fps for biological motion to be perceived as smooth), reasonable image quality only around the hands and face, and no audio at all. At the time, over 750,000 people in the US used sign language as a first language, but their remote communication was limited to TDD text devices that forced native signers to communicate in written English, which is slower than signing and a serious barrier for those with limited English literacy. The paper's technical contribution is a skin detection algorithm using HSV colour space histogram matching that segments each video frame into skin regions (hands and face) and non-skin regions (background, clothing). The skin regions retain full image quality while non-skin regions are heavily blurred, allowing standard compression algorithms (JPEG, MPEG) to achieve much higher compression on the degraded portions while preserving the visual detail needed to identify handshapes and facial expressions.
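The dual-level idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes hand-picked seed pixels rather than the paper's automatic seed generation, a single pass rather than iterative matching, and a simple box blur. All function names, the bin count, and the threshold are hypothetical choices made for illustration.

```python
import colorsys

def hs_bin(rgb, bins=8):
    """Map an (r, g, b) pixel (0-255 channels) to a (hue, saturation) histogram bin."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)  # value is ignored to reduce lighting sensitivity
    return min(int(h * bins), bins - 1), min(int(s * bins), bins - 1)

def build_histogram(seed_pixels, bins=8):
    """Normalised hue-saturation histogram of seed pixels (assumed to be skin)."""
    hist = {}
    for px in seed_pixels:
        key = hs_bin(px, bins)
        hist[key] = hist.get(key, 0) + 1
    return {key: count / len(seed_pixels) for key, count in hist.items()}

def skin_mask(image, hist, threshold=0.05, bins=8):
    """True where a pixel's bin is common enough in the seed histogram."""
    return [[hist.get(hs_bin(px, bins), 0.0) >= threshold for px in row] for row in image]

def box_blur(image, radius=1):
    """Mean filter over a (2*radius+1) square window, clipped at the image borders."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(tuple(sum(p[c] for p in window) // len(window) for c in range(3)))
        out.append(row)
    return out

def dual_level(image, seed_pixels, radius=1):
    """Keep skin pixels at full quality and replace everything else with a blurred copy."""
    mask = skin_mask(image, build_histogram(seed_pixels))
    blurred = box_blur(image, radius)
    out = [[image[y][x] if mask[y][x] else blurred[y][x]
            for x in range(len(image[0]))]
           for y in range(len(image))]
    return out, mask

# Toy 4x4 frame: a blue/green checkerboard background with a 2x2 skin-toned patch.
SKIN = (224, 172, 105)
frame = [[(30, 60, 200) if (x + y) % 2 == 0 else (40, 180, 60) for x in range(4)]
         for y in range(4)]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = SKIN

processed, mask = dual_level(frame, seed_pixels=[SKIN, (210, 160, 100)])
skin_fraction = sum(v for row in mask for v in row) / 16
print(f"{skin_fraction:.2f}")  # 0.25
```

The processed frame would then be handed unchanged to an ordinary encoder; the blurred regions carry less high-frequency detail, which is what lets the encoder spend fewer bits on them.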

Key findings

The skin detection algorithm proved robust across varying skin pigmentation and complex backgrounds, using an iterative histogram matching process with automatic seed generation. In typical sign language video, about 10-20% of the image was classified as skin. The dual-level preprocessing achieved approximately 25% additional compression beyond standard MPEG-1, reducing average frame sizes from approximately 1,290 bytes to 950 bytes (about 0.75 of the MPEG-1-only size). This matters because on ISDN's 64 kbps per-direction bandwidth, standard MPEG-1 alone could manage only about 6 frames per second, while the preprocessed video approached the critical 10 fps threshold needed for intelligible sign language. The approach was designed as a preprocessor compatible with any existing compression standard (JPEG, MPEG, H.261, proprietary algorithms), so it required no modification to conferencing hardware. The authors noted that the upcoming MPEG-4 standard, with its native support for multi-level compression of regions within a frame, would be particularly well-suited to this segmentation approach. A 1995 Sprint "Relay Texas" trial of ISDN-based sign language video interpreting had validated the concept but confirmed that frame rate was the primary user complaint, with fingerspelling most affected by low frame rates.
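The frame-rate figures follow from simple arithmetic on the reported average frame sizes. The sketch below assumes that 64 kbps means 64,000 bits per second, that the whole channel carries video, and that there is no container or protocol overhead; these are simplifying assumptions and not statements from the paper.

```python
# Assumption: 64 kbps = 64,000 bit/s, all of it available for video frames.
ISDN_BITS_PER_SECOND = 64_000

def frames_per_second(avg_frame_bytes, link_bps=ISDN_BITS_PER_SECOND):
    """Sustainable frame rate when every frame has the given average size."""
    return link_bps / (avg_frame_bytes * 8)

mpeg1_only = frames_per_second(1290)    # average frame size with MPEG-1 alone
preprocessed = frames_per_second(950)   # average frame size after dual-level preprocessing
size_ratio = 950 / 1290                 # compressed size relative to MPEG-1 alone

print(f"{mpeg1_only:.1f} fps")    # 6.2 fps
print(f"{preprocessed:.1f} fps")  # 8.4 fps
print(f"{size_ratio:.2f}")        # 0.74
```

Under these assumptions the preprocessing lifts the rate from roughly 6 fps to a little over 8 fps, which is consistent with the summary's wording that the result approached, rather than reached, the 10 fps threshold.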

Relevance

This paper addresses a problem that was critical for Deaf telecommunications in the late 1990s and whose underlying principles remain relevant. The insight that sign language video has fundamentally different compression priorities from voice-based conferencing (frame rate over audio, and selective image quality for hands and face) influenced later developments in video relay services and sign language video standards. While bandwidth constraints have eased dramatically since 1998, the core idea of region-of-interest compression for sign language video has been adopted in modern codecs and is relevant wherever bandwidth remains constrained (mobile networks, developing regions). The paper also highlights an important accessibility equity issue: standard video conferencing was designed around spoken language needs, and Deaf users were disadvantaged by those design assumptions. For practitioners today, the paper serves as a reminder that communication technologies must be evaluated against the specific needs of sign language users, not just hearing users. That principle applies to modern video platforms, WebRTC implementations, and telehealth systems.

Tags: sign language · deafness · video conferencing · video compression · telecommunications · assistive technology · computer vision

Standards referenced: MPEG-1 · MPEG-4 · JPEG · H.261