
Notably Inaccessible — Data Driven Understanding of Data Science Notebook (In)Accessibility

Venkatesh Potluri, Sudheesh Singanamalla, Nussara Tieanklin, Jennifer Mankoff · 2023 · Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '23) · doi:10.1145/3597638.3608417

Summary

This paper presents the first large-scale analysis of the accessibility of computational notebooks — interactive documents combining code, text, and data visualizations that have become the standard tool for data science work. The researchers analyzed 100,000 Jupyter notebooks randomly sampled from a dataset of 10 million hosted on GitHub, focusing on three dimensions: data artifact accessibility, authoring practices, and the impact of distribution infrastructure. Their analysis pipeline extracted source code, output figures (342,722 images classified into 28 categories), and tables from the notebooks, then assessed accessibility through automated scanning with aXe and HTML Code Sniffer engines across six popular IDE themes, complemented by manual screen reader testing with JAWS, NVDA, and VoiceOver across multiple browsers. The research addresses a significant gap: while extensive work has examined how data scientists use notebooks and how to make notebook editor interfaces more accessible, almost nothing was known about the accessibility barriers embedded in the notebook artifacts themselves — the published outputs that blind and visually impaired (BVI) users encounter when consuming shared notebooks. The study intentionally used optimistic upper-bound metrics, meaning the actual accessibility situation is likely worse than reported.

Key findings

The results paint a stark picture of notebook inaccessibility. Of 342,722 programmatically generated images, 99.81% lack alternative text entirely, and only one image had ALT text generated from code. Matplotlib, the most popular charting library (used in over 70,000 function calls), does not support embedding ALT text in its outputs, even though doing so is technically feasible. Only 34.1% of notebooks contain data tables, although tables are critical for making visualizations accessible to screen reader users. Just 4.53% of notebooks with figures include both markdown descriptions and accompanying tables near their images. Regarding navigation, 48.36% of notebooks contain a heading in the first cell, but only 59.67% of those use the correct H1 level, meaning screen reader users may skip useful content or encounter unexpected document structure. Theme choice has a significant impact on accessibility: the VSCode default Horizon theme produced 84.95% fewer accessibility errors than the Jupyter default Light theme. Manual screen reader testing revealed that large notebooks (above the 85th percentile in file size) can crash screen readers or browser tabs entirely. JAWS and NVDA on Windows skip all base64-encoded images in code output cells, while VoiceOver on Mac performed more reliably across notebook sizes.
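To make the heading finding concrete, the following is a minimal sketch of how one might test whether a notebook opens with an H1 heading. It is not the paper's pipeline. It assumes the notebook has already been loaded as a dict in the nbformat 4 JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`), and it only recognises Markdown `#` headings, not HTML `<h1>` tags. The function name `first_cell_heading_level` is hypothetical.

```python
import re

# Markdown ATX heading: one to six '#' characters followed by whitespace and text.
HEADING_RE = re.compile(r"^(#{1,6})\s+\S")


def first_cell_heading_level(notebook):
    """Return the heading level (1-6) that opens the first cell, or None.

    Assumes an nbformat-4-style dict; HTML headings are not detected.
    """
    cells = notebook.get("cells", [])
    if not cells or cells[0].get("cell_type") != "markdown":
        return None
    source = cells[0].get("source", "")
    # nbformat stores source either as one string or as a list of line strings.
    text = "".join(source) if isinstance(source, list) else source
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Only the first non-empty line counts as the opening of the cell.
            match = HEADING_RE.match(stripped)
            return len(match.group(1)) if match else None
    return None
```

A result of 1 corresponds to the "correct H1 level" case described above; any other integer indicates a heading at the wrong level, and None indicates no opening heading at all.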

Relevance

This research is essential reading for anyone involved in data science education, open data publishing, or accessibility advocacy in technical communities. The findings reveal that the data science ecosystem has a fundamental accessibility gap: the tools most commonly used to create and share analytical work produce outputs that are almost entirely inaccessible to blind and visually impaired users. For practitioners, the actionable takeaways are clear — add ALT text to visualizations, include data tables alongside charts, use proper heading hierarchy in markdown cells, and choose accessible themes when exporting notebooks. For tool developers, the paper demonstrates that relatively small changes (like enabling ALT text in matplotlib's PNG backend via EXIF metadata) could have outsized impact given how widely these libraries are used. Organizations publishing notebooks should consider integrating accessibility checks into their CI/CD pipelines.
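As an illustration of what a CI accessibility check for notebooks might look like, here is a rough sketch under stated assumptions. It flags code cells that emit an image output but are not immediately followed by a markdown cell, on the assumption that such a cell is where a text description would normally live. This heuristic, and the names `cells_with_undescribed_images` and `main`, are hypothetical and are not taken from the paper, which used aXe and HTML Code Sniffer for its automated scans. The sketch again assumes the nbformat 4 JSON layout, where each output may carry a `data` dict keyed by MIME type.

```python
import json


def cells_with_undescribed_images(notebook):
    """Return indices of code cells that output an image and are not
    immediately followed by a markdown cell (a crude proxy for a description)."""
    cells = notebook.get("cells", [])
    flagged = []
    for index, cell in enumerate(cells):
        if cell.get("cell_type") != "code":
            continue
        # display_data / execute_result outputs keep their payloads under "data".
        has_image = any(
            mime.startswith("image/")
            for output in cell.get("outputs", [])
            for mime in output.get("data", {})
        )
        if not has_image:
            continue
        next_cell = cells[index + 1] if index + 1 < len(cells) else None
        if next_cell is None or next_cell.get("cell_type") != "markdown":
            flagged.append(index)
    return flagged


def main(paths):
    """Check each notebook file; return 1 if any cell was flagged, else 0."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            notebook = json.load(handle)
        for index in cells_with_undescribed_images(notebook):
            print(f"{path}: cell {index} outputs an image with no following markdown cell")
            failed = True
    return 1 if failed else 0
```

In a pipeline, a small wrapper could pass the notebook paths to `main` and use its return value as the process exit code. A passing result only shows that some markdown follows each figure, not that the text actually describes it, so a check like this would complement manual review rather than replace it.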

Tags: computational notebooks · data science · screen readers · blind and visually impaired · data visualization · alternative text · automated testing · web accessibility

Standards referenced: WCAG 2.0 · WCAG 2AA