Task Mode: Dynamic Filtering for Task-Specific Web Navigation using LLMs

Ananya Gubbi Mohanbabu, Yotam Sechayk, Amy Pavel · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3663547.3746401

Summary

This paper presents Task Mode, a Chrome browser extension that filters web content dynamically according to a user-specified goal. It uses a large language model (GPT-4o) to identify and prioritize task-relevant elements while suppressing distractions. The system addresses a fundamental accessibility disparity: sighted users can visually skim a webpage in seconds to find relevant content, whereas screen reader users must navigate sequentially through all elements and can spend minutes traversing irrelevant content.

Task Mode extracts the complete HTML structure of a webpage and decomposes the user's natural-language task description into structured components (entity, constraints, actions, defaults, fallbacks). GPT-4o then assigns each element a relevance score from 0 to 100 based on how well it supports task completion. Textual elements, images (scored using CLIP similarity combined with alt text analysis), and SVG icons are processed separately. Three visualization modes are offered: a color gradient that shows relevance as a heatmap overlay, varying opacity that fades irrelevant content, and a task-specific-only mode that hides low-relevance elements completely using aria-hidden attributes. Crucially, Task Mode preserves the original page layout and structure rather than restructuring the DOM, which maintains users' spatial mental models. The system persists task context across multiple pages, supports real-time task updates, and lets users adjust the relevance threshold to control how much content is displayed.

The extension is implemented in JavaScript with a Python Flask backend and a Firebase database. It processes pages with an average latency of 10.68 seconds at a cost of $0.10 per page.
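The relevance-based presentation described above can be illustrated with a short sketch. This is not the authors' implementation: the `ScoredElement` type, the `present` function, the mode names, the default threshold of 50, and the opacity floor are all assumptions made for illustration. Only the 0 to 100 relevance scale, the adjustable threshold, the three visualization modes, and the use of aria-hidden come from the summary.

```python
from dataclasses import dataclass


@dataclass
class ScoredElement:
    """A page element with a relevance score (hypothetical type)."""
    selector: str   # CSS selector for the element in the unchanged DOM
    relevance: int  # 0-100, higher means more useful for the task


def present(element: ScoredElement, mode: str, threshold: int = 50) -> dict:
    """Return the attributes or styles to apply to one element.

    The page layout is never restructured: an element is only tinted,
    faded, or hidden from assistive technology in place.
    """
    if not 0 <= element.relevance <= 100:
        raise ValueError("relevance must be in 0..100")
    score = element.relevance / 100
    if mode == "gradient":
        # Heatmap overlay: overlay strength grows with relevance (assumed mapping).
        return {"overlay_alpha": round(score, 2)}
    if mode == "opacity":
        # Fade low-relevance content; the 0.15 floor is an assumption that
        # keeps faded content faintly visible.
        return {"opacity": round(max(0.15, score), 2)}
    if mode == "task_only":
        # Elements below the user-adjustable threshold are hidden via aria-hidden.
        hidden = element.relevance < threshold
        return {"aria-hidden": "true" if hidden else "false"}
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical scores for two elements on a search results page.
elements = [
    ScoredElement("#search-results", 92),
    ScoredElement("#newsletter-banner", 8),
]
plan = {e.selector: present(e, "task_only", threshold=50) for e in elements}
```

Raising or lowering `threshold` changes how many elements survive in the task-specific-only mode, which mirrors the user-controlled filtering the summary describes.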

Key findings

A user study with 12 participants (6 vision users, 6 screen reader users) compared Task Mode with traditional browsing. Screen reader users completed tasks in about 52% less time with Task Mode, with mean task time falling from 211 seconds to 102 seconds (p < 0.05). The gap in task completion time between screen reader users and vision users narrowed from 2x to 1.2x, substantially reducing the accessibility disparity. Screen reader users also reported significantly reduced mental demand (2.28 vs 3.72, p = 0.034) and physical demand (1.44 vs 2.11, p = 0.035) on the NASA-TLX scale. For vision users, the improvement in completion time (107 to 84 seconds) was not statistically significant, but Task Mode was preferred over a human-agent collaborative browsing approach, which users found unreliable and frustrating due to its lack of transparency.

Eleven of the 12 participants wanted to use Task Mode in the future. Screen reader users particularly valued that Task Mode complemented rather than replaced their existing navigation strategies (heading and landmark navigation), while vision users appreciated the reduced cognitive load and fine-grained filtering control. A key design insight is that the system was designed for both visual and non-visual access from the beginning rather than being retrofitted, an approach the authors argue produces better outcomes than the typical pattern of designing for sighted users first and adding accessibility later.
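The headline figures can be checked against the reported mean task times. The sketch below uses only the rounded means quoted above; the function names are illustrative, and small differences from the reported percentages are expected because the inputs are rounded.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage decrease in task time from `before` to `after`."""
    return (before - after) / before * 100


def gap_ratio(screen_reader_s: float, vision_s: float) -> float:
    """How many times longer screen reader users took than vision users."""
    return screen_reader_s / vision_s


# Mean task times in seconds, as quoted in the summary.
sr_before, sr_after = 211, 102    # screen reader users
vis_before, vis_after = 107, 84   # vision users

print(f"{percent_reduction(sr_before, sr_after):.1f}% less time")  # 51.7% less time
print(f"gap before: {gap_ratio(sr_before, vis_before):.2f}x")      # gap before: 1.97x
print(f"gap after: {gap_ratio(sr_after, vis_after):.2f}x")         # gap after: 1.21x
```

The results agree with the reported figures of roughly 52% and a gap that narrows from about 2x to about 1.2x.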

Relevance

Task Mode represents a significant advance in using LLMs to bridge the accessibility gap in web navigation. Its approach of designing for screen reader users and vision users simultaneously, rather than retrofitting, offers a model for inclusive technology development. For practitioners, the finding that over 95% of top websites have WCAG failures underscores the need for tools that can improve web accessibility on the user's side. The system preserves page structure and complements existing screen reader navigation strategies (headings, landmarks), a design that respects established assistive technology workflows. Limitations include the processing latency (10.68 seconds per page), the cost ($0.10 per page), the inability to handle WebGL/Canvas content, and occasional relevance-scoring errors in which the LLM misses important interface elements or misunderstands context. The small sample size (N=12) limits statistical power, and the controlled study tasks may not fully represent the complexity of real-world browsing.

Tags: web navigation · screen reader · large language model · content filtering · task-based browsing · browser extension · inclusive design · web accessibility · GPT-4o · DOM manipulation

Standards referenced: WCAG · WAI-ARIA