Shadow Speaking

Also known as: Shadow Captioning, Respeaking

A captioning technique in which a trained human operator listens to live speech and repeats (or "respeaks") it clearly into an automatic speech recognition (ASR) system, which then generates real-time captions. The shadow speaker simplifies and normalizes the speech, so the recognizer receives one clear voice without the overlapping dialogue, background noise, and unclear pronunciation of the original audio, and produces more accurate output than it would from that audio directly. Shadow speaking is widely used for live event captioning, broadcast television, and classroom accessibility. It bridges the gap between fully manual stenography, which requires highly specialized skills, and fully automatic speech recognition, which struggles with spontaneous, multi-speaker environments. It is more cost-effective than stenographic captioning, but because it still requires a trained human operator, it is significantly more expensive than fully automated approaches.

Category: captioning · assistive technology · deaf and hard of hearing

Related: Automatic Speech Recognition · Closed Captions · CART

Sources

https://doi.org/10.1145/2207016.2207053