Wav2Vec

Also known as: Wav2Vec2, Wav2Vec 2.0

A family of self-supervised speech representation models from Meta AI that learn rich acoustic embeddings directly from raw waveform audio without requiring transcribed training data. Wav2Vec 2.0, introduced in 2020, became a backbone for low-resource automatic speech recognition, speaker identification, and speech emotion recognition. In accessibility tools, Wav2Vec embeddings underpin many recent captioning, sound classification, and emotion-tagging systems for Deaf, hard-of-hearing, and noise-sensitive users, because they generalize to non-speech audio events and to languages and dialects with limited labeled data. Performance is uneven across speakers and remains subject to documented racial and dialectal disparities in downstream ASR systems built on top of it.

Category: AI and accessibility · Speech Technology · Machine Learning · Captioning

Related: Automatic speech recognition · Speech Emotion Recognition

Sources

https://arxiv.org/abs/2006.11477
https://huggingface.co/docs/transformers/model_doc/wav2vec2