AutoChemplete - Making Chemical Structural Formulas Accessible (Extended Abstract)
Merlin Knaeble, Gabriel Sailer, Zihan Chen, Thorsten Schwarz, Kailun Yang, Mario Nadj, Rainer Stiefelhagen, Alexander Maedche · 2023 · Proceedings of the 20th International Web for All Conference (W4A) · doi:10.1145/3587281.3588145
Summary
This extended abstract presents AutoChemplete, an interactive labeling tool designed to make chemical structural formulas accessible to blind and low vision (BLV) students. The paper highlights a stark gap in STEM education: while 69% of US BLV students express interest in STEM during high school, only 8% pursue related college degrees, largely due to inaccessible materials. Chemical structural formulas (visual representations of atoms and bonds) are essential to studying chemistry but are almost entirely inaccessible in standard PDF documents. Even students with slight color vision impairments find approximately 40% of published figures inaccessible, and 85% of BLV STEM students report receiving course materials later than peers because of the time required for manual accessibility remediation.
AutoChemplete addresses this by combining machine learning with human verification in an autocomplete-style workflow. The tool ingests a bitmap image of a chemical structural formula, uses a transformer-based encoder-decoder model to predict its SMILES (Simplified Molecular-Input Line-Entry System) string representation, then performs a similarity search against the PubChem database to suggest candidate molecules. Users can accept a suggestion directly, edit the predicted SMILES string, or refine suggestions iteratively. Each manual edit triggers a refreshed similarity search, embodying a "global autocomplete" paradigm where the system suggests entire molecular solutions rather than partial ones.
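The suggest-edit-refresh loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say which fingerprint or similarity measure AutoChemplete uses, so this example assumes Tanimoto similarity over character bigrams of the SMILES string as a crude stand-in for a molecular fingerprint, and it replaces PubChem with a five-entry dictionary. The names `ngrams`, `tanimoto`, `suggest`, and `TOY_DB` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def ngrams(smiles: str, n: int = 2) -> Set[str]:
    """Character n-grams of a SMILES string.

    Assumption: a toy substitute for a real molecular fingerprint.
    """
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def suggest(query: str, database: Dict[str, str], k: int = 3) -> List[Tuple[str, str, float]]:
    """Return the k database entries most similar to the query SMILES."""
    q = ngrams(query)
    scored = [(tanimoto(q, ngrams(s)), name, s) for name, s in database.items()]
    # Highest similarity first; ties broken alphabetically by name.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, s, score) for score, name, s in scored[:k]]


# Hypothetical stand-in for the PubChem database.
TOY_DB = {
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
    "phenol": "c1ccccc1O",
    "toluene": "Cc1ccccc1",
}

predicted = "c1ccccc1"               # hypothetical output of the image-to-SMILES model
first = suggest(predicted, TOY_DB)   # initial suggestions

edited = predicted + "O"             # the user edits the string by hand
second = suggest(edited, TOY_DB)     # the edit triggers a refreshed search

print(first[0][0], second[0][0])
```

In this toy run the top suggestion changes from benzene to phenol after a one-character edit, which is the behavior the "global autocomplete" idea relies on: every suggestion is a complete molecule, and each edit re-ranks the whole candidate list.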
Key findings
The design of AutoChemplete was informed by semi-structured interviews with three BLV chemists and four accessibility annotation professionals, yielding five core requirements: exactness of annotations (R1), speed (R2), support for less-skilled users (R3), diverse output formats without redundant work (R4), and integration into existing workflows (R5). The transformer-based ML model achieved an exact match accuracy of 83.36% for predicting SMILES strings from structural formula images, surpassing the 67-83% accuracy reported for prior work. The tool generates multiple output formats from a single correct SMILES string: textual representations (colloquial and IUPAC names) for screen readers and Braille displays, configurable vector graphics for visual or tactile output, and machine-readable formats like InChI for export. A user study with 15 participants of varying chemical expertise, ranging from a music student to a chemistry professor, revealed seven themes. Most notably, AutoChemplete enabled participants to annotate structural formulas successfully regardless of expertise. Novices compared input and suggestions at the level of individual atoms and bonds, while experts compared at the substructure level. Participants across the skill spectrum found the tool entertaining to use.
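For readers unfamiliar with the metric, exact match accuracy is the fraction of predictions that are identical to the reference string. The sketch below shows one way to compute it, assuming plain string comparison after trimming whitespace. Whether the authors canonicalize SMILES before comparing is not stated in this summary, and the function name and sample data are hypothetical.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference string."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical sample data, not taken from the paper.
preds = ["CCO", "c1ccccc1", "CC(=O)O", "OCC"]
refs = ["CCO", "c1ccccc1", "CC(O)=O", "CCO"]

accuracy = exact_match_accuracy(preds, refs)
print(f"{accuracy:.2%}")
```

The last two pairs count as misses even though each pair denotes the same molecule (acetic acid and ethanol), because one molecule can be written as several valid SMILES strings. A stricter evaluation would canonicalize both sides with a cheminformatics toolkit before comparing.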
Relevance
AutoChemplete demonstrates a compelling design pattern for accessibility tooling: combining imperfect AI with human-in-the-loop verification through an autocomplete metaphor. This approach is generalizable well beyond chemistry — any domain where visual content in documents needs accessible annotation could benefit from a similar pipeline of ML prediction, similarity search, and interactive refinement. For accessibility practitioners working in education, the tool directly addresses the chronic bottleneck of making STEM materials accessible in a timely manner. The finding that non-expert users can successfully annotate chemical formulas with AutoChemplete is particularly significant, as it could dramatically expand the pool of people who can contribute to accessibility remediation. The multi-format output approach — generating visual, textual-auditory, and tactile representations from a single canonical form — is an efficient model for serving diverse user needs without duplicating effort.
Tags: STEM accessibility · chemistry accessibility · blind and low vision · interactive labeling · machine learning · document accessibility · tactile graphics