Automatic Caption Evaluation

Also known as: ACE, ACE Framework, ACE Metric

A caption-quality evaluation framework introduced by Sushant Kafle and Matt Huenerfauth (2017-2018) that scores automatically generated captions by their usability for Deaf and Hard-of-Hearing (DHH) readers rather than by a raw count of transcription errors. For each mismatch between a reference word and the recognised word, ACE combines two sub-scores: a word-importance score (how critical the reference word is to the meaning of the sentence, estimated from its predictability under an n-gram or neural language model) and a semantic-distance score (how far the recognised word drifts from the reference, computed from the cosine similarity of word-embedding vectors such as word2vec). The two sub-scores are combined using a tuning weight α fitted to DHH participants' comprehension data. ACE correlates more strongly than Word Error Rate (WER) with DHH viewers' subjective ratings of caption quality, and it has been proposed as a training and procurement metric for ASR-based captioning systems that target DHH users.
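The per-error combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the convex combination α·importance + (1 − α)·distance, the use of 1 − cosine similarity as the distance, the averaging over aligned word pairs, and all function names, embeddings, and importance values are assumptions made for illustration. Only substitutions between already-aligned word pairs are handled.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def semantic_distance(ref_vec, hyp_vec):
    """Assumed distance: 1 - cosine similarity, clamped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - cosine_similarity(ref_vec, hyp_vec)))

def error_impact(importance, distance, alpha):
    """Assumed combination rule: alpha weights importance against distance."""
    return alpha * importance + (1.0 - alpha) * distance

def caption_penalty(aligned_pairs, importance, embeddings, alpha=0.5):
    """Average impact over aligned (reference, recognised) word pairs.

    Matching pairs contribute 0. A recognised word with no embedding is
    treated as maximally distant (1.0), which is an assumption of this sketch.
    """
    if not aligned_pairs:
        return 0.0
    total = 0.0
    for ref, hyp in aligned_pairs:
        if ref == hyp:
            continue
        if ref in embeddings and hyp in embeddings:
            dist = semantic_distance(embeddings[ref], embeddings[hyp])
        else:
            dist = 1.0
        total += error_impact(importance[ref], dist, alpha)
    return total / len(aligned_pairs)

# Hypothetical toy data: 2-d "embeddings" and hand-picked importance scores.
toy_embeddings = {
    "the": [1.0, 1.0], "a": [2.0, 2.0],
    "cancelled": [1.0, 0.0], "candled": [0.0, 1.0],
}
toy_importance = {"the": 0.1, "meeting": 0.8, "cancelled": 0.9}
toy_pairs = [("the", "a"), ("meeting", "meeting"), ("cancelled", "candled")]

print(caption_penalty(toy_pairs, toy_importance, toy_embeddings, alpha=0.5))
```

In this toy caption both substitutions would count equally under Word Error Rate, whereas the sketch penalises the error on the high-importance, semantically distant word ("cancelled" recognised as "candled") far more than the error on "the". In practice α would be fitted to comprehension data rather than set to 0.5.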

Category: Captioning · Accessibility Metrics · Deaf and Hard of Hearing · Automated Accessibility

Related: Word Error Rate · Caption Quality · Automatic Speech Recognition · Word Importance · NER Model · Keyword Reading Strategy

Sources