Speech Synthesis Markup Language

Also known as: SSML

Speech Synthesis Markup Language (SSML) is an XML-based markup language, published as a W3C Recommendation, for controlling how text-to-speech (TTS) engines render synthetic speech. SSML provides elements for specifying pronunciation, volume, pitch, speaking rate, emphasis, pauses, and voice characteristics (such as gender and age), giving content authors fine-grained control over how synthesized speech sounds. In accessibility contexts, SSML can be used to produce more natural and intelligible audio output for screen readers, audio descriptions, and other assistive technology applications. SSML is supported by many commercial and open-source TTS engines and by cloud speech services such as Amazon Polly, Google Cloud Text-to-Speech, and Microsoft Azure Speech. W3C accessibility work has also explored embedding SSML in web content to supply speech rendering hints.
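As a sketch of what these elements look like in practice, the following Python example builds a small SSML 1.1 document with the standard library's xml.etree.ElementTree and prints it. The function name build_ssml, the sentence text, the requested voice gender, and the prosody values are all hypothetical illustrations. Engines and cloud services differ in which voices and attribute values they honor, so the same markup is not guaranteed to sound identical everywhere.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml() -> str:
    """Return a small, illustrative SSML 1.1 document as a string (hypothetical helper)."""
    # Root element: SSML version, the SSML namespace, and the document language.
    speak = ET.Element("speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": "en-US"})

    # Voice characteristics: request a voice by gender; which installed voice
    # is actually chosen is up to the engine.
    voice = ET.SubElement(speak, "voice", {"gender": "female"})
    voice.text = "Welcome. "

    # Pronunciation: <sub> tells the engine to speak the alias instead of the written text.
    sub = ET.SubElement(voice, "sub", {"alias": "World Wide Web Consortium"})
    sub.text = "W3C"
    sub.tail = " publishes this standard."

    # Pause: a half-second break before the next phrase.
    pause = ET.SubElement(voice, "break", {"time": "500ms"})
    pause.tail = " Please listen "

    # Emphasis: stress a single word.
    emph = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emph.text = "carefully"
    emph.tail = ". "

    # Prosody: speaking rate, pitch, and volume, using SSML's predefined labels.
    prosody = ET.SubElement(voice, "prosody", {"rate": "slow", "pitch": "high", "volume": "loud"})
    prosody.text = "This sentence is spoken slowly."

    # Serialize to a str (no XML declaration is emitted with encoding="unicode").
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml())
```

Running the script prints a single-line SSML document rooted at a speak element; that string can then be handed to a TTS engine or cloud service that accepts SSML input. Setting xmlns as a plain attribute keeps the element names unprefixed without registering a global namespace prefix in ElementTree.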

Category: Web Standards · Speech

Related: Text-to-Speech · Audio Description · WAI-ARIA · Screen Reader

Sources

https://www.w3.org/TR/speech-synthesis11/