Visual Question Answering
Also known as: VQA
A task in which a system receives an image and a natural language question about that image, then generates a natural language answer. VQA became a key accessibility paradigm through services like VizWiz, where blind users could submit a photo with a question and receive an answer from crowd workers or AI. With multimodal language models, VQA has extended from analyzing a single image to interactive questioning over a live video stream, changing how blind users can query and understand their visual environment.
Category: artificial intelligence · visual impairment · computer vision
Related: Voice and Video-Capable Language Model · Be My Eyes · Visual Grounding