Image Captioning

Also known as: Automatic Image Description, AI Image Description

A computer vision task in which an AI model generates a natural-language description of an image's content. In accessibility contexts, image captioning lets blind and low-vision users understand visual content by converting images into text that screen readers or text-to-speech systems can read aloud. Modern image captioning relies on vision-language models such as BLIP and GIT, which pair a visual encoder that extracts image features with a language decoder that generates text. Although AI-generated captions are improving rapidly, they can still misidentify content, miss the context in which an image is used, and overlook subjective or culturally relevant details. For this reason, human-authored alt text remains the gold standard for satisfying the text-alternative requirements of the Web Content Accessibility Guidelines (WCAG).
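
As a rough illustration of how a generated caption might reach a screen reader, the sketch below builds an HTML img tag whose alt text comes from a human author when one is supplied and from a captioning model otherwise. It is a minimal sketch, not a reference implementation: the names Captioner, stub_captioner, and build_img_tag are hypothetical, and the stub returns a fixed string in place of a real vision-language model.

```python
from html import escape
from typing import Callable, Optional

# Hypothetical type: any function that maps an image path to a caption string.
# A real system would wrap a vision-language model behind this interface.
Captioner = Callable[[str], str]


def stub_captioner(image_path: str) -> str:
    """Stand-in for a real captioning model (illustrative only)."""
    return "a dog running on a beach"


def build_img_tag(
    image_path: str,
    captioner: Captioner,
    human_alt: Optional[str] = None,
) -> str:
    """Return an <img> tag, preferring human-authored alt text when given."""
    if human_alt is not None:
        # An empty string is kept as-is: alt="" marks a decorative image.
        alt = human_alt.strip()
    else:
        # No human-authored text exists, so fall back to the generated caption.
        alt = captioner(image_path).strip()
    # Escape both values so quotes or angle brackets cannot break the markup.
    src = escape(image_path, quote=True)
    return f'<img src="{src}" alt="{escape(alt, quote=True)}">'


if __name__ == "__main__":
    print(build_img_tag("dog.jpg", stub_captioner))
    print(build_img_tag("dog.jpg", stub_captioner, human_alt="Rex fetching a stick"))
```

Passing the captioner in as a parameter keeps the fallback logic independent of any particular model, so the human-authored text always takes precedence and the model is consulted only when no such text exists.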

Category: artificial intelligence · assistive technology · computer vision

Related: Alternative Text · Visual Question Answering · Vision-Language Model · Object Detection · Screen Reader

Sources

https://www.w3.org/WAI/tutorials/images/