Characterizing Visual Intents for People with Low Vision through Eye Tracking

Ru Wang, Ruijia Chen, Anqiao Erica Cai, Zhiyuan Li, Sanbrita Mondal, Yuhang Zhao · 2025 · Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2025) · doi:10.1145/3663547.3746391

Summary

This study investigates how people with low vision use their gaze when viewing images, with the goal of understanding their visual intents: the immediate, in-the-moment goals behind their eye movements. While assistive technologies like magnifiers and contrast enhancement exist for low vision users, these tools apply uniformly without adapting to what the user is actually trying to do. Understanding visual intents could enable smarter, context-aware assistive tools that provide the right support at the right time.

The researchers conducted an eye-tracking-based retrospective think-aloud study with 20 low vision participants (ages 21-82, with conditions including macular degeneration, retinitis pigmentosa, glaucoma, cone dystrophy, and others) and 20 sighted controls. Participants viewed images from five everyday contexts (news, e-commerce, social media, travel, productivity) on a 24-inch display with a Tobii Pro Fusion eye tracker while answering questions designed to elicit different levels of visual information processing. Crucially, after completing each task, participants watched playback of their own gaze trajectories and reflected on what they were doing and why, allowing the researchers to ground the taxonomy in participants' subjective experiences rather than purely algorithmic classification.

From qualitative coding of these reflections (achieving Cohen's Kappa of 0.73), the researchers derived a taxonomy of five visual intents shared by both groups: searching (directing fixations toward a target object), observing (concentrated fixations on a single object to identify details), traversing (sequential fixations across adjacent objects for counting or reading), comparing (shifting back and forth between objects to identify relationships), and exploring (widely distributed fixations to gather contextual information across the entire image).
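For readers unfamiliar with the agreement statistic mentioned above, the following is a minimal sketch of how Cohen's Kappa can be computed for two coders who each assign one intent label per segment. The function name and the example labels are illustrative assumptions; the summary does not describe the authors' coding procedure or tooling.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(coder_a)
    # Observed agreement: share of items on which both coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels, for illustration only (not data from the study).
coder_1 = ["searching", "searching", "observing", "observing"]
coder_2 = ["searching", "observing", "observing", "observing"]
kappa = cohens_kappa(coder_1, coder_2)
```

Kappa corrects raw agreement for the agreement expected by chance, so a value of 0.73 indicates agreement well above what the coders' label frequencies alone would produce.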

Key findings

The study revealed significant differences in gaze behavior across visual intents. Observing produced longer fixation durations and lower stationary entropy (more concentrated attention) than other intents. Comparing showed longer saccade amplitudes and more uneven attention distribution across objects. Searching and exploring involved more time spent on image backgrounds rather than foreground objects.

Differences between the two participant groups were also clear. Low vision participants visited significantly more objects overall than sighted participants, likely to compensate for their visual conditions. During traversing and exploring, low vision participants scanned more broadly than sighted participants, who could gather contextual information through peripheral vision without fixating directly. Sighted participants also showed lower stationary entropy during traversing and exploring, indicating more efficient scanning.

Visual ability had measurable effects: low visual acuity was associated with shorter saccades (suggesting higher cognitive load), and peripheral vision loss led to visiting more objects (compensatory scanning). Low vision participants also demonstrated unique behaviors beyond the five intents, such as deliberately shifting gaze to higher-contrast regions to "recalibrate" their color perception, and experiencing visual confusion from object misidentification that led to repeated re-examination.
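The gaze metrics named in these findings (fixation duration, saccade amplitude, stationary entropy) can be illustrated with a small sketch. It rests on several assumptions that are not stated in this summary: fixations are already detected and labeled with the object (area of interest) they land on, saccade amplitude is approximated by the pixel distance between consecutive fixations rather than degrees of visual angle, and the stationary distribution is approximated by each object's share of fixations. The `Fixation` type and all function names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # screen position in pixels (assumed unit)
    y: float
    duration_ms: float
    aoi: str            # label of the object or region the fixation lands on


def mean_fixation_duration(fixations):
    if not fixations:
        raise ValueError("need at least one fixation")
    return sum(f.duration_ms for f in fixations) / len(fixations)


def mean_saccade_amplitude(fixations):
    # Approximation: distance between consecutive fixation centers.
    hops = [math.dist((a.x, a.y), (b.x, b.y))
            for a, b in zip(fixations, fixations[1:])]
    return sum(hops) / len(hops) if hops else 0.0


def stationary_entropy(fixations):
    # Shannon entropy (bits) of the share of fixations per AOI.
    # Lower values mean attention is concentrated on fewer objects.
    if not fixations:
        raise ValueError("need at least one fixation")
    counts = Counter(f.aoi for f in fixations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this approximation, a sequence that stays on a single object has an entropy of 0 bits, while fixations spread evenly over four objects give 2 bits, which matches the reading of lower entropy as more concentrated attention.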

Relevance

This research provides a foundational framework for designing intent-aware assistive technologies for low vision users. Rather than applying blanket augmentations like magnification everywhere, future tools could detect what the user is trying to do (searching, observing, or comparing, for instance) and tailor support accordingly. For example, a system detecting comparing intent could magnify and enhance contrast on the specific objects being compared, while exploring intent might trigger a broader scene description. The five-intent taxonomy is practical and grounded in real user behavior across diverse visual conditions. For practitioners building low vision tools, the key takeaway is that visual ability (both acuity and peripheral vision) significantly shapes gaze patterns, meaning intent recognition models trained only on sighted users will likely fail for low vision users. The study also highlights that current eye tracking calibration methods need adaptation for low vision users, as standard approaches assume precise fixation ability that many low vision people lack.
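The idea of tailoring support to a detected intent could be sketched as a simple lookup. This is only an illustration under stated assumptions: the enum mirrors the five intents from the taxonomy, the augmentation names are hypothetical, and only the two examples given above (comparing and exploring) are mapped. How an intent would be detected from gaze data is outside the scope of this sketch.

```python
from enum import Enum


class VisualIntent(Enum):
    SEARCHING = "searching"
    OBSERVING = "observing"
    TRAVERSING = "traversing"
    COMPARING = "comparing"
    EXPLORING = "exploring"


# Hypothetical augmentation identifiers. Only the two pairings suggested in
# the text are filled in; the remaining intents are deliberately left unmapped.
AUGMENTATIONS = {
    VisualIntent.COMPARING: "magnify_and_enhance_contrast_on_compared_objects",
    VisualIntent.EXPLORING: "describe_scene",
}


def choose_augmentation(intent):
    """Return the augmentation for an intent, or None if none is defined."""
    return AUGMENTATIONS.get(intent)
```

Returning `None` for unmapped intents keeps the sketch from asserting design choices that the summary does not make.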

Tags: low vision · eye tracking · visual intent · gaze behavior · assistive technology · user research · image accessibility