I'm Always a Little Skeptical of It: Verification Practices of Blind Users When Working with Generative AI in Spreadsheets
Minoli Perera, Swamy Ananthanarayan, Cagatay Goncu, Kim Marriott · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3790988
Summary
This CHI 2026 paper reports a remote study with 12 blind screen reader users (11 totally blind, 1 legally blind) examining how they verify outputs produced by Generative AI tools when working on accuracy-critical spreadsheet tasks. Spreadsheets are pervasive in workplace and educational settings but are notoriously difficult for screen reader users, who must navigate a two-dimensional grid cell by cell with no easy way to form a holistic view. GenAI assistants such as Microsoft Copilot in Excel, Gemini in Google Sheets, ChatGPT, and Claude promise to lower this barrier by letting users describe tasks in natural language, but they also hallucinate, especially on numeric and visual tasks, and shift responsibility for correctness onto the user. Participants completed one of two scenarios: analyzing 2013–2023 World Bank inflation data across 162 countries (information extraction, trend analysis, chart creation) or modifying a 100-student marksheet (formula generation, conditional formatting, visual formatting). Sessions lasted about two hours and were conducted via Zoom with screen sharing and audio recording; participants used their own preferred AI tool (ChatGPT most often, followed by Copilot and Gemini) and screen reader (JAWS or NVDA). The authors applied inductive thematic analysis to the observational and interview data, with two researchers cross-coding, to identify verification methods, workflows, and error-response patterns, and produced detailed workflow diagrams for each task.
Key findings
No participant skipped verification; all described 'never fully trusting' GenAI output. Five verification strategies emerged: (1) manual checks using screen reader and Excel features (sorting, Find, Filter by Color, cell inspection); (2) same-AI verification: asking the same model follow-up questions, requesting code or chain-of-thought reasoning, or starting a fresh chat to avoid anchoring; (3) cross-AI verification: feeding the output to a different model (9/12 participants), especially for visual tasks, using Picture Smart AI, Be My AI, Seeing AI, and NVDA AI Content Describer; (4) sighted human verification (every participant except P9 said they would seek it for visual tasks); and (5) leveraging prior knowledge of the data. Errors were concentrated in visual tasks, which accounted for 15 of the 18 observed errors, including missing charts, incorrect data points, duplicated chart legends, cut-off chart content, missing or incorrect conditional formatting colors, and overextended columns. Participants did not notice roughly half of the visual errors. One striking finding was that more accessible and reputable interfaces induced trust bias: P7 acknowledged agreeing with the output even when the data was wrong. Agreement across chats or models sometimes produced false confidence, because models can fail in correlated ways. Chain-of-thought reasoning helped 10 of 12 participants build confidence, but it was often hard to reach: screen readers skipped past the reasoning controls and read the answer directly.
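Why cross-model agreement can mislead is easy to see in a toy example. The sketch below is illustrative only and is not the authors' method: it assumes a hypothetical extraction question (which country has the highest value), uses made-up values rather than the study's World Bank data, and uses hypothetical helper names. It contrasts an agreement check with an independent recomputation from the data, which is loosely analogous to what participants' manual checks with sorting achieve.

```python
from typing import Mapping, Sequence


def models_agree(answers: Sequence[str]) -> bool:
    """True when every model (or fresh chat) gave the same answer."""
    return len(answers) > 0 and len(set(answers)) == 1


def independent_check(claimed: str, data: Mapping[str, float]) -> bool:
    """Recompute the answer from the data itself (here: the key with the
    largest value) instead of relying on any model's output."""
    actual = max(data, key=lambda k: data[k])
    return claimed == actual


# Toy values for illustration only; NOT the World Bank data used in the study.
toy_inflation = {"Country A": 3.1, "Country B": 7.4, "Country C": 5.0}

# Two hypothetical model answers that fail in the same (correlated) way.
answers = ["Country C", "Country C"]

print(models_agree(answers))                          # True
print(independent_check(answers[0], toy_inflation))   # False
```

The agreement check passes while the recomputation fails: unanimity only measures consistency between models, not correctness against the sheet.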
Relevance
For anyone designing or evaluating AI assistants for blind users, this study operationalizes 'skeptical trust' as a concrete taxonomy of verification behaviors and shows where current tools fail these users. Practical implications for developers: surface chain-of-thought reasoning through accessible, discoverable controls; build multi-model cross-checking directly into the interface so blind users do not have to pipe output between tools manually; let visual-description AIs ingest the full spreadsheet rather than only screenshots, so that off-screen errors become detectable; and help users craft verification prompts rather than requiring expert prompt engineering. For accessibility testers and procurement teams, the paper is evidence that GenAI in Excel and Google Sheets does not close the accessibility gap for visually intensive tasks such as charts and conditional formatting: sighted assistance remains the de facto last resort, at a real cost to independence. Limitations: the 12 English-speaking participants came from the US, Australia, Canada, and India and skewed toward intermediate proficiency; no longitudinal data were collected; and the researchers were sighted.
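One way to act on the cross-checking recommendation is to collect answers from several models and announce the result as short, linear sentences that a screen reader can read top to bottom. The sketch below is an illustration under stated assumptions, not the authors' design: model names are hypothetical placeholders, answers are assumed to be short strings that have already been retrieved, and how they are retrieved is out of scope. Because models can fail in correlated ways, even a unanimous result carries a reminder to check the sheet.

```python
from collections import Counter


def cross_check_report(answers: dict[str, str]) -> str:
    """Summarise several models' answers as plain sentences
    (no tables, no colour cues) suitable for a screen reader."""
    if not answers:
        return "No model answers to compare."
    counts = Counter(answers.values())
    top_answer, top_count = counts.most_common(1)[0]
    total = len(answers)
    lines = []
    if top_count == total:
        lines.append(f"All {total} models answered {top_answer}.")
        # Unanimity can still be wrong: models may fail in correlated ways.
        lines.append("Agreement is not proof; check the answer against the sheet.")
    else:
        lines.append(f"Models disagree. {top_count} of {total} answered {top_answer}.")
        # Name each dissenting model so the user knows where to look.
        for model, answer in answers.items():
            if answer != top_answer:
                lines.append(f"{model} answered {answer}.")
    return "\n".join(lines)


# Hypothetical model names and answers, for illustration only.
print(cross_check_report({"model_a": "B", "model_b": "B", "model_c": "C"}))
# Models disagree. 2 of 3 answered B.
# model_c answered C.
```

Keeping the disagreement in the first sentence means a user who stops listening early still hears the most important fact.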
Tags: blindness · screen readers · Generative AI · spreadsheets · AI accessibility · assistive technology · hallucinations · verification · Microsoft Copilot · ChatGPT