Unit Selection Synthesis

Also known as: Concatenative Unit Selection, Unit Selection TTS

A text-to-speech synthesis approach that generates speech by selecting variable-length segments of pre-recorded human speech from a large database and concatenating them to match the input text. Because the output is built from actual human voice recordings, unit selection synthesizers generally sound more natural than formant-based systems. However, speeding up the output beyond natural speaking rates requires additional signal processing, which can degrade quality. Well-known unit selection systems include AT&T Natural Voices and IVONA. The approach is relevant to accessibility because screen reader users who listen at high speeds may find that intelligibility depends on how the synthesizer handles the speed increase.
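
The selection step described above is commonly framed as a lowest-cost path search: each candidate unit is scored on how well it fits the requested sound (a target cost) and how smoothly it joins its neighbor (a join cost). The following is a minimal sketch of that idea, not the method of any particular system. The `Unit` fields, the use of pitch as the only feature, and the cost functions are all simplifying assumptions made for illustration; real systems use richer features and weighted costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded segment in a hypothetical unit database."""
    phone: str          # phone label this recording realizes
    start_pitch: float  # pitch in Hz at the start of the segment
    end_pitch: float    # pitch in Hz at the end of the segment


def target_cost(unit, desired_pitch):
    # Assumed cost: distance between the unit's mean pitch and the request.
    return abs((unit.start_pitch + unit.end_pitch) / 2 - desired_pitch)


def join_cost(left, right):
    # Assumed cost: pitch discontinuity at the concatenation point.
    return abs(left.end_pitch - right.start_pitch)


def select_units(targets, database):
    """Return (units, total_cost) for the cheapest sequence of units.

    targets is a list of (phone, desired_pitch) pairs.
    """
    candidates = [[u for u in database if u.phone == phone]
                  for phone, _ in targets]
    for (phone, _), cands in zip(targets, candidates):
        if not cands:
            raise ValueError(f"no recorded unit for phone {phone!r}")

    # costs[i][j]: cheapest total cost of any path ending at candidate j
    # of position i. back[i][j]: index of its predecessor at position i - 1.
    costs = [[target_cost(u, targets[0][1]) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]

    for i in range(1, len(targets)):
        prev = candidates[i - 1]
        row_costs, row_back = [], []
        for unit in candidates[i]:
            best_k = min(
                range(len(prev)),
                key=lambda k: costs[i - 1][k] + join_cost(prev[k], unit),
            )
            row_costs.append(
                costs[i - 1][best_k]
                + join_cost(prev[best_k], unit)
                + target_cost(unit, targets[i][1])
            )
            row_back.append(best_k)
        costs.append(row_costs)
        back.append(row_back)

    # Trace back from the cheapest final candidate.
    j = min(range(len(costs[-1])), key=costs[-1].__getitem__)
    total = costs[-1][j]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total
```

Calling `select_units` with a list of `(phone, desired_pitch)` targets and a list of `Unit` records returns the chosen units in order. The dynamic-programming search keeps the work proportional to the number of candidate pairs at adjacent positions rather than to the number of possible sequences, which is what makes selection from a large database tractable.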

Category: Speech Technology · Assistive Technology · Auditory Interface

Related: Text-to-Speech · Formant Synthesis · Concatenative Synthesis · Screen Reader · Speech Intelligibility

Sources

https://doi.org/10.1145/2049536.2049574