Automatic Captions

Also known as: Auto-Generated Captions, Auto Captions, ASR Captions

Captions produced by automatic speech recognition (ASR) systems without human transcription, typically generated by the hosting platform (e.g., YouTube, Zoom, Microsoft Teams) as an optional layer on uploaded or live video. Automatic captions have dramatically expanded caption coverage but remain error-prone, particularly for accented speech, overlapping speech, technical vocabulary, homophones, and non-English languages. Auto-translate pipelines compound these errors when captions are translated between languages. Research consistently shows that automatic captions fall short of educational and deaf and hard-of-hearing (DHH) accessibility requirements without human review, and that Word Error Rate (WER) alone underestimates their impact on comprehension: error salience, synchronisation, and caption speed shape viewer experience as strongly as raw accuracy.
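
As a rough illustration of why Word Error Rate alone can understate the effect of caption errors, the sketch below computes WER in its usual form: word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The function name, the lowercase whitespace tokenisation, and the example sentences are assumptions made for illustration and do not come from the cited source.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumption: simple lowercase, whitespace tokenisation.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from caption
            insertion = curr[j - 1] + 1  # extra word in caption
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences: one substituted word in each caption.
reference = "the patient should not take this medication daily"
meaning_reversed = "the patient should now take this medication daily"
meaning_kept = "a patient should not take this medication daily"

print(word_error_rate(reference, meaning_reversed))  # 0.125
print(word_error_rate(reference, meaning_kept))      # 0.125
```

Both captions score the same WER, yet the first reverses the instruction while the second is harmless. Synchronisation and caption speed do not enter the calculation at all.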

Category: Captioning · Automatic Speech Recognition · Video Accessibility · AI and accessibility

Related: Automatic Speech Recognition · Captions · Closed Captions · Word Error Rate · User-Generated Captions

Sources

https://doi.org/10.1145/3772318.3791868