
ALTO

Also known as: Analyzed Layout and Text Object, ALTO XML

An open XML format, maintained by an editorial board hosted at the US Library of Congress, that stores the output of OCR (optical character recognition) together with the layout of each scanned page: block and line geometry, the positions of words and (in recent versions) individual glyphs, confidence scores, alternative recognition candidates, and reading order. ALTO is commonly paired with METS for digital-library deliverables, with METS describing the document as a whole and ALTO describing each page, and it is used throughout the cultural-heritage sector. In accessibility contexts ALTO matters because it preserves the coordinates of the recognized text: downstream tools can regenerate accessible derivatives (EPUB, DAISY) and correct OCR errors incrementally without discarding earlier correction work. Extensions have been proposed to handle non-Latin scripts, right-to-left and vertical writing modes, and language-specific features such as ruby annotations.
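
As a rough illustration of how a correction or conversion tool might read this data, the following Python sketch pulls each recognized word, its bounding box, and its word confidence out of an ALTO document. The embedded sample is a hand-written, simplified fragment (not schema-complete and not real OCR output), and the function name `extract_words` and the 0.8 threshold are assumptions chosen for the example. The element and attribute names (`String`, `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`, `WC`) come from the ALTO schema; the namespace wildcard `{*}` requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified ALTO fragment for illustration only.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1" PHYSICAL_IMG_NR="1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1" HPOS="100" VPOS="200" WIDTH="420" HEIGHT="40">
            <String CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40" WC="0.97"/>
            <SP HPOS="280" VPOS="200" WIDTH="20"/>
            <String CONTENT="wor1d" HPOS="300" VPOS="200" WIDTH="220" HEIGHT="40" WC="0.58"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def extract_words(alto_xml):
    """Return one dict per <String> element: text, bounding box, confidence."""
    root = ET.fromstring(alto_xml)
    words = []
    # "{*}String" matches String in any namespace, so the same code can read
    # files that declare a different ALTO version namespace.
    for s in root.findall(".//{*}String"):
        wc = s.get("WC")  # word confidence, 0.0 to 1.0; optional in ALTO
        words.append({
            "text": s.get("CONTENT", ""),
            "x": float(s.get("HPOS", 0)),
            "y": float(s.get("VPOS", 0)),
            "width": float(s.get("WIDTH", 0)),
            "height": float(s.get("HEIGHT", 0)),
            "confidence": float(wc) if wc is not None else None,
        })
    return words


if __name__ == "__main__":
    words = extract_words(SAMPLE_ALTO)
    # Flag low-confidence words for human review (0.8 is an arbitrary cutoff).
    for w in words:
        if w["confidence"] is not None and w["confidence"] < 0.8:
            print("review:", w["text"], "at", (w["x"], w["y"]))
```

Because each word keeps its coordinates, a reviewer's fix can be written back to the `CONTENT` attribute of the same element, leaving the geometry intact for any derivative regenerated later.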

Category: file formats · digitization · OCR

Related: Optical Character Recognition · EPUB · DAISY

Sources