Open-Vocabulary Detection

Also known as: Open-Vocabulary Object Detection, OVD

A class of object detection models that accept arbitrary text queries at inference time rather than being restricted to a fixed set of classes defined during training. Instead of recognizing only, for example, the 80 COCO categories, an open-vocabulary detector such as YOLO-World takes user-supplied text prompts like 'cup', 'wheelchair ramp', or 'sparrow' and returns bounding boxes for matching objects. In accessibility tools for blind and low-vision users, this matters because the vocabulary can be tuned to the task and environment, which reduces irrelevant announcements and auditory clutter and lets the user control what the system reports.
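As a rough illustration of the idea, open-vocabulary detectors commonly score each candidate image region against an embedding of each text prompt and keep the regions whose best match is strong enough. The sketch below is a toy version of that matching step only, not the implementation of YOLO-World or any other model. The 3-dimensional embeddings, the function names `cosine` and `detect`, and the 0.8 threshold are all hypothetical values chosen for illustration; a real system would obtain region and text embeddings from trained image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def detect(regions, prompt_embeddings, threshold=0.8):
    """Return (label, box, score) for each region whose best-matching
    prompt reaches the similarity threshold (hypothetical default)."""
    results = []
    for box, feature in regions:
        # Pick the user-supplied prompt most similar to this region.
        label, score = max(
            ((name, cosine(feature, emb)) for name, emb in prompt_embeddings.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            results.append((label, box, score))
    return results

# Toy embeddings, made up for this example. The user's vocabulary is
# just the set of prompts supplied here, not a fixed class list.
prompts = {"cup": [1.0, 0.0, 0.0], "sparrow": [0.0, 1.0, 0.0]}

# Each region is (bounding box as x1, y1, x2, y2, region feature vector).
regions = [
    ((10, 20, 50, 80), [0.9, 0.1, 0.0]),    # close to the "cup" prompt
    ((100, 40, 160, 90), [0.1, 0.1, 0.9]),  # matches nothing in the vocabulary
]

detections = detect(regions, prompts)
for label, box, score in detections:
    print(f"{label} at {box} (similarity {score:.2f})")
```

In this sketch the second region is dropped because it resembles none of the prompts, which mirrors how a narrow, task-specific vocabulary keeps an assistive tool from announcing objects the user did not ask about.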

Category: Computer Vision · AI and accessibility · Machine Learning · Assistive Technology

Related: YOLO · Object recognition · Scene Description · Assistive technology

Sources

https://arxiv.org/abs/2401.17270
https://huggingface.co/docs/transformers/en/tasks/zero_shot_object_detection