Table Recognition

Also known as: Table Detection, Table Extraction

The automated process of detecting and reconstructing tabular data structures from document images, scanned PDFs, or other non-structured formats. Table recognition goes beyond basic OCR by identifying the spatial relationships between text elements — rows, columns, cells, headers, and merged regions — and converting them into structured, semantically meaningful formats like HTML tables. This is critical for document accessibility because tabular data in image-only documents is completely inaccessible to screen readers, which require proper markup (th, td, scope attributes) to convey the row-column relationships that give the data meaning.

Category: document accessibility · image processing

Related: Optical Character Recognition · Document accessibility · Alternative Text

Sources

https://dl.acm.org/doi/10.1145/3058555.3058581