Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

Accelerated Speech (also: Time-Compressed Speech, Speed-Up Speech): Audio output played at a faster-than-normal speaking rate, commonly used by people with visual impairments when interacting with screen readers and other audio-based assistive technologies. Research shows that experienced screen reader users can comprehend speech at up to 500…
Computer-Assisted Language Learning (also: CALL, Computer-Aided Language Learning): Computer-Assisted Language Learning (CALL) refers to the use of computers and digital technology to support language education and pronunciation training. CALL systems often incorporate automatic speech recognition to provide feedback on learner pronunciation, detect…
Deaf Speech (also: Deaf Accent, Deaf Voice): Accented speech produced by many individuals who are deaf or significantly hard of hearing, resulting from incomplete acoustic feedback from their own voices. Because deaf speakers cannot fully hear themselves, their speech patterns often differ from those of hearing speakers in…
Forced Alignment (also: Phonetic Alignment, Phone-Level Alignment): Forced alignment is an automatic speech processing technique that aligns a speech recording with its known transcription at the phoneme or word level. Unlike free speech recognition, which determines the most likely sequence of sounds, forced alignment constrains the recognizer…
Formant (also: Vocal Formant, Formant Frequency): A concentration of acoustic energy around a particular frequency in the speech signal, produced by the resonance of the vocal tract. Formants are labeled sequentially (F1, F2, F3, etc.) from lowest to highest frequency and are key to distinguishing different vowel sounds and…
Fundamental Frequency (also: F0, Pitch Frequency, Voice Pitch): The lowest frequency of a periodic sound wave, corresponding to the rate at which the vocal folds vibrate during voiced speech. Fundamental frequency (F0) is perceived by listeners as pitch and is a primary component of prosody (the rhythm, stress, and intonation of speech). F0…
Gaussian Mixture Model (also: GMM): A Gaussian Mixture Model (GMM) is a probabilistic model that represents data as a weighted combination of multiple Gaussian (normal) distributions. Each component Gaussian has its own mean and covariance, allowing GMMs to model complex, multimodal distributions. In speech…
Hyperarticulation (also: Clear Speech, Over-Articulation): A speaking style in which a person exaggerates the clarity of their pronunciation by moving their tongue and mouth to more extreme positions, producing more distinct vowel and consonant sounds. Hyperarticulation occurs naturally when speakers perceive that their listener is…
Iterative Crowdsourcing (also: Iterative Human Computation, Multi-Round Crowdsourcing): A human computation workflow in which multiple rounds of crowd workers build iteratively upon each other's responses to collectively achieve higher quality results than any individual worker could produce alone. In each iteration, workers are shown the previous round's outputs…
Perceptual Linear Prediction (also: PLP, PLP Coefficients): Perceptual Linear Prediction (PLP) is an acoustic feature extraction technique used in speech processing that models human auditory perception. PLP analysis applies psychoacoustic principles including critical band frequency resolution, equal-loudness pre-emphasis, and…
Speech Synthesis (also: Synthetic Speech, TTS Engine): The artificial production of human speech by computer, most commonly used in text-to-speech (TTS) systems that convert written text into spoken audio. Speech synthesis is foundational to screen readers and other assistive technologies used by people with visual impairments and…
Stammering (also: Stuttering, Stammer, Stutter): A neurological condition that affects the rhythmic flow of speech, causing involuntary repetitions, prolongations, or blocks of sounds, syllables, or words. Blocking describes audible or silent moments when a person is unable to produce a specific sound despite intending to.…
Supervector (also: GMM Supervector): A supervector is a high-dimensional feature representation created by concatenating the mean vectors from all components of a Gaussian Mixture Model (GMM) adapted to a specific speaker or utterance. This concatenation transforms variable-length speech into a fixed-length vector…
UAspeech Database (also: UAspeech, UA-Speech, Universal Access Speech): The UAspeech Database is a standardized corpus of dysarthric speech recordings created for research in accessible speech technology. It contains isolated word recordings from speakers with cerebral palsy exhibiting varying degrees of dysarthria, along with matched control…
Universal Background Model (also: UBM): A Universal Background Model (UBM) is a large Gaussian Mixture Model trained on speech from many speakers to represent speaker-independent acoustic characteristics. The UBM serves as a reference distribution against which individual speaker models are compared, typically using…
Voice Activity Detection (also: VAD, Speech Detection): A signal processing technique that automatically determines whether a segment of audio contains human speech or not. In accessibility applications, voice activity detection is used in audio description timing systems to identify non-speech segments where descriptions can be…
Voice Interface (also: Speech Interface, Voice User Interface, VUI): An interface that allows users to interact with a system using spoken natural language commands rather than keyboard, mouse, or touch input. Voice interfaces range from simple command-and-control systems that recognise fixed phrases to conversational assistants that interpret…

17 results.