Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

SHAP (also: SHapley Additive exPlanations): A unified framework for feature-importance explanations of machine-learning models, introduced by Lundberg and Lee in 2017, grounded in Shapley values from cooperative game theory. For any model and input, SHAP assigns each feature a value representing its contribution to that…
Scene Segmentation (also: Scene Detection, Shot Boundary Detection): Scene segmentation is the process of automatically dividing a video into discrete scenes or segments based on visual changes such as cuts, transitions, or the appearance of new elements in the frame. In the context of accessibility, scene segmentation is a foundational component…
Seeing AI: A free AI-powered app developed by Microsoft for blind and low vision users that uses computer vision and AI to describe the visual world. Features include reading short text, documents, and handwriting; identifying products via barcodes; recognizing people and their emotions;…
Self-Debiasing (also: Model Self-Debiasing, Autonomous Debiasing): A class of techniques where AI systems, particularly large language models, are prompted or configured to identify and reduce their own biased outputs without external model modification or retraining. Self-debiasing approaches include prompting models to reflect on whether…
Semantic Analysis (also: Semantic Content Analysis, Semantic Similarity): The computational process of determining meaning and relationships within text, images, or other content by analyzing their semantic properties rather than just surface-level features. In accessibility, semantic analysis enables automated tools to go beyond detecting the…
Semantic Data Extraction (also: Structured Data Extraction, Information Extraction): The process of extracting structured, meaningful data from unstructured or semi-structured sources such as images, documents, web pages, or natural language text, preserving the semantic relationships between data elements. In accessibility, semantic data extraction is used to…
Semantic Segmentation (also: Pixel-Level Classification, Scene Parsing): A computer vision technique that classifies every pixel in an image into a predefined category, producing a detailed map of what objects are present and where they are located. Unlike object detection (which draws bounding boxes around objects), semantic segmentation provides…
Sensory augmentation (also: Sensory substitution system, Sensory augmentation technology): Technology that provides information from one sensory channel through an alternative modality accessible to the user, such as converting visual scenes to audio descriptions for blind users or translating sounds to visual or haptic alerts for deaf users. AI-powered sensory…
Sentiment Analysis (also: Opinion Mining): A natural language processing technique that identifies and extracts subjective information from text, classifying it as positive, negative, or neutral. In accessibility research, sentiment analysis can be applied to social media posts, product reviews, and online discussions to…
SigLIP (also: Sigmoid Loss for Language Image Pre-Training): A vision-language model that aligns images with text descriptions using a pairwise sigmoid loss instead of the softmax-based contrastive loss used by CLIP. SigLIP improves upon CLIP with a more efficient training objective that scores each image-text pair independently, so it does not require large batch sizes. In accessibility…
Sign Language Processing (also: SLP, Sign Language Technology): A field of artificial intelligence and computer science focused on developing computational systems that can understand, generate, and translate sign languages. Sign language processing encompasses sign language recognition (detecting and interpreting signs from video input),…
Sign Language Recognition (also: SLR, Automatic Sign Recognition): A computer vision and machine learning task focused on automatically detecting and classifying signs from video input. Sign language recognition ranges from isolated sign recognition (identifying individual signs) to continuous sign recognition (interpreting sequences of signs…
Sign Language Translation (also: SLT, Sign-to-Text Translation, Sign-to-Speech Translation): The task of converting between a sign language and a spoken or written language, in either direction. Sign-to-spoken/written translation (e.g., ASL to English) involves recognizing signs from video and producing equivalent text or speech. Spoken/written-to-sign translation…
Sign language avatar (also: Signing avatar, Virtual signer): A computer-generated animated character that produces sign language from text or speech input. While sign language avatars hold potential for scaling deaf accessibility, their premature deployment raises significant concerns: the World Federation of the Deaf and World…
Small Language Model (also: SLM): A language model, typically ranging from tens of millions to a few billion parameters, designed to run on consumer or edge devices rather than in centralized cloud data centers. Small language models sacrifice some of the broad general knowledge of frontier large language models…
Sound Classification (also: Sound Event Detection, Audio Classification): The automated process of identifying and categorizing sounds into predefined categories such as speech, music, alarms, animal sounds, or environmental noise. Sound classification is a foundational capability in sound awareness technologies for deaf and hard of hearing users,…
Sound Event Detection (also: Audio Tagging, Automatic Sound Recognition): A machine learning technique that automatically identifies and classifies sounds within an audio stream, such as music, applause, laughter, environmental noises, and other non-speech audio events. In accessibility contexts, sound event detection can complement automatic speech…
Sound awareness (also: Sound recognition, Environmental sound detection): Technology that detects and identifies sounds in the user's environment and conveys that information through alternative modalities such as visual notifications or haptic alerts. For deaf and hard-of-hearing users, sound awareness systems can identify doorbells, fire alarms,…
Speaker Diarization (also: Speaker Segmentation): The process of partitioning an audio stream into segments according to speaker identity, determining "who spoke when" in a multi-speaker recording or conversation. Speaker diarization is important for accessibility because deaf and hard of hearing individuals need to distinguish…
Speaker-dependent speech recognition (also: User-adapted ASR, Personalized speech recognition): A speech recognition approach that trains or adapts its acoustic models to a specific individual's voice characteristics, rather than relying solely on general population models. For people with cognitive disabilities, dysarthria, or other speech differences, speaker-dependent…
Speech Language Model (also: SLM, Audio Language Model, Speech Foundation Model): A class of large neural models that processes both speech and text in a single end-to-end framework, integrating tasks (automatic speech recognition, spoken language understanding, dialogue, and speech generation) that traditionally required separate modular systems. Examples…
Speech Recognition (also: Voice Recognition, STT, Speech-to-Text): Technology that converts spoken language into text or commands by analyzing audio input. Speech recognition powers dictation systems, voice assistants, and voice-controlled interfaces. For accessibility, speech recognition enables text input and device control for users who…
Speech-to-Text (also: STT, Speech Recognition, Automatic Speech Recognition): Technology that converts spoken language into written text, enabling voice-based input for digital systems. In accessibility, speech-to-text serves multiple roles: it powers voice command interfaces for users who cannot use keyboard or touch input, generates real-time captions…
Stable Diffusion: An open-weights latent text-to-image diffusion model released by Stability AI in 2022. It operates by iteratively denoising a random latent tensor, conditioned on text embeddings produced by a frozen CLIP encoder, until the latent can be decoded by a VAE into a coherent image.…
Subjective Image Description (also: Subjective Visual Assessment): An image description that involves opinion, aesthetic judgment, or interpretation rather than purely factual content. Examples include assessing whether an outfit matches, whether a room setting looks nice, or whether a photograph is aesthetically pleasing. Subjective image…
Support Indicator (also: Agreement Indicator): A visual or textual cue that communicates the degree of agreement across multiple AI model responses for a particular claim. Support indicators help BLV users assess claim reliability by showing how many or which models agree. Research has explored four types: source-based ("3…
Support Vector Machine (also: SVM): A supervised machine learning algorithm used for classification and regression tasks. SVMs work by finding the optimal hyperplane that separates data points into distinct categories in a high-dimensional feature space. In accessibility research, SVMs have been used to detect…
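
Example (Support Vector Machine): the separating-hyperplane idea in the entry above can be illustrated with a minimal sketch. It assumes a linear, soft-margin SVM trained by plain subgradient descent on the regularized hinge loss, which is one simple way to fit such a model and not necessarily what any cited accessibility study used. The toy data, the function names `train_linear_svm` and `predict`, and the hyperparameters `lam`, `lr`, and `epochs` are all illustrative assumptions.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=500):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss (labels must be +1 or -1)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (dot(w, x) + b)
            # Regularization step: shrink w slightly toward zero.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: only points inside the margin push on w and b.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Classify x by which side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


# Toy, linearly separable data (hypothetical): two clusters on either side of the origin.
points = [(-2.0, -1.0), (-1.0, -2.0), (-1.5, -1.5),
          (2.0, 1.0), (1.0, 2.0), (1.5, 1.5)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
```

On this toy data the learned hyperplane separates the two clusters, so `predictions` matches `labels`. Production SVM implementations typically solve the underlying optimization problem with dedicated solvers and support kernels for non-linear boundaries; this sketch covers only the linear case.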

27 results.
