Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Category: computer vision

VQA (also: Visual Question Answering): VQA (Visual Question Answering) is an AI task in which a system answers natural-language questions about the content of an image. In assistive contexts, VQA systems such as Be My AI, Seeing AI, and Aira let blind and low-vision users ask about their visual surroundings - from…
Video Inpainting (also: Video Fill, Content-Aware Video Fill): A computer vision technique that fills in removed or missing regions of a video frame with plausible content generated based on surrounding visual information. Video inpainting is used in accessibility applications to seamlessly remove distracting visual elements (overlays,…
Video Segmentation (also: Scene Segmentation, Video Scene Detection): The process of dividing a video into meaningful segments or scenes based on visual changes, content shifts, or thematic transitions. Video segmentation enables granular customization and navigation, allowing viewers to apply different settings to different parts of a video or…
Vision Language Model (also: VLM, Vision-Language Model, Multimodal Large Language Model): A machine-learning model trained to take both images and natural-language text as input and to produce natural-language output. Modern VLMs, such as GPT-4o, Gemini, and Claude, can describe a photo, read text inside an image, answer questions about a scene, identify objects,…
Visual Dialogue (also: Visual Dialog, VisDial): Visual dialogue is an AI task that involves holding a multi-turn natural language conversation about visual content such as an image or video frame. Unlike single-turn visual question answering (VQA), visual dialogue systems maintain context across multiple exchanges, using…
Visual Document Understanding (also: VDU, Document Understanding): A field of AI research focused on the interpretation and analysis of visually rich digital documents such as forms, tables, menus, reports, receipts, and academic papers. Visual document understanding goes beyond basic OCR text extraction by comprehending the spatial layout,…
Visual Grounding (also: Grounded Visual Understanding): The ability of an AI model to connect its language output to specific elements actually present in the visual input, ensuring that descriptions and responses are anchored to real objects and scenes rather than generated from learned patterns or assumptions. Poor visual grounding…
Visual Inertial Odometry (also: VIO): A motion tracking technique that combines camera-based visual tracking with inertial sensor data (gyroscopes and accelerometers) to estimate a device’s position and orientation in 3D space with high accuracy. VIO works by tracking salient visual features across consecutive video…
Visual Interpreter (also: Visual Interpreter Service, Visual Description Service, VIDS): A visual interpreter or description service (VIDS) is a technology or human-powered service that provides people who are blind or have low vision with descriptions of their visual surroundings, typically by receiving camera feeds from the user's smartphone or smart glasses.…
Visual Language Model (also: VLM, Vision-Language Model): AI models that can process and reason about both visual and textual information, combining computer vision with large language model capabilities. VLMs could potentially enhance assessment descriptors by providing contextually rich and customizable descriptions of visual…
Visual Layout Analysis (also: Layout Analysis, Document Layout Analysis): The automated process of examining the spatial arrangement and visual properties of elements within a document to infer meaningful structural relationships between them. In accessibility contexts, visual layout analysis is used to automatically generate metadata about how…
Visual Question Answering (also: VQA): A task in which a system receives an image and a natural language question about that image, then generates a natural language answer. VQA emerged as a key accessibility paradigm through services like VizWiz, where blind users could submit photos with questions and receive…
Visual Saliency (also: Saliency Detection, Visual Attention Prediction): A computer vision concept referring to the degree to which visual elements attract attention compared to their surroundings. Saliency detection models predict which parts of an image or video frame will draw the viewer's eye first, based on factors like contrast, color, motion,…
Visual Saliency (also: Saliency, Saliency Detection, Saliency Map): A computational measure of how much a particular region of an image or video stands out from its surroundings and attracts visual attention. Saliency models predict where people are most likely to look based on factors such as contrast, colour, motion, and semantic content. In…
Visual question answering (also: VQA, Visual QA): A computer vision and natural language processing task in which a system answers natural language questions about the content of an image or video. In accessibility contexts, VQA enables blind and visually impaired users to query visual content interactively, asking specific…
Visual-Inertial Odometry (also: VIO): A computer vision technique that combines camera imagery with motion sensor data (accelerometer and gyroscope) to track a device's position and orientation in 3D space. In accessibility applications, VIO enables smartphones to maintain awareness of object positions even when…

16 results.