Visual Question Answering

Also known as: VQA

A task in which a system receives an image and a natural language question about that image, then generates a natural language answer. VQA became a key accessibility paradigm through services such as VizWiz, where blind users could submit photos with questions and receive answers from crowd workers or automated systems. Modern VQA has moved beyond analyzing single images to interactive querying of live video streams through multimodal language models. This shift changes how blind users can ask about and understand their visual environment.

Category: artificial intelligence · visual impairment · computer vision

Related: Voice and Video-Capable Language Model · Be My Eyes · Visual Grounding

Sources

https://doi.org/10.1145/3663547.3749833