Accessible maps

Seven years of work on something the accessibility field has effectively abandoned: real spatial cognition for non-sighted users, not turn-by-turn navigation. Seven working demos, from a single building interior to the whole of Canada: four rendered maps explored with a screen reader, keyboard, touch or voice, and three described maps that speak — from fixed descriptions to a free-form conversation. Deliberately different from one another, because there is no one right way to render a map; it depends on what the map is for. What they share is an approach to spatial cognition, and one theoretical contribution about how coordinate systems collapse under modality conversion.

The position

Maps, just like websites — and any other modal experience — need CISNA. You’re trying to give the spatial knowledge of a place to a person who cannot see; the inventory of features, the navigation between them, and the semantics of each are all in play, and the existing field has barely got past “turn-by-turn directions if you happen to be on this exact bus.” I do not do these things by half.

The tagline is “do Google Maps right.” The commercial mapping companies have made only small-beer progress on accessibility over a decade. The accessibility-focused alternatives have invested heavily in step-by-step navigation while leaving spatial cognition essentially unsolved. The maps work here does the part the field has abandoned.

Origin: this is not a retrofit

“Maps need CISNA” is not a contemporary framing applied to a new project. The doctoral Design Language chapter from around 2009 cites Google Maps as the worked example of CISNA’s composite-content handling: “The maps presented on Google Maps would be a good example of this, as each map is a composite of images and text.” Howell’s 2005 paper on spatial metaphors for speech-based mobile city-guide services is cited alongside it as precedent reading. The CISNA architecture was being mapped onto interactive geographic content in the working papers fifteen years before the current SVG-tile platform shipped. What follows is the worked example of a 2009 claim.

The field critique

Before naming where the field has stalled, an integrity note about the evidence: Bob’s positions on Audiom, GoodMaps, and BlindSquare are observer-grade, based on the academic literature, published material, and direct field interaction. CNIB Access Labs has not formally evaluated any of them. The only competitor Bob has tested in a structured way is Navilens, via a small-scale installation usability test with two lived-experience testers plus Bob trying it out (CNIB Access Labs engagement; not a formal audit). The phrase Bob uses about it: “I wouldn’t be prepared to call it an audit.” That asymmetry of evidence matters in both directions: the Navilens position is more grounded than observer-only commentary, but it should not be inflated into formal-audit language.

The four products below are CNIB Access Labs partners, recommended case-by-case depending on the environment and the kind of movement the user needs. Each represents a distinct class of navigation-and-wayfinding tool with its own pros and cons; they are not like-for-like alternatives to one another.

The field map:

Audiom (XR Navigation) — the closest existing work and the most accomplished commercial team in the space. Pin-as-datum, arrow-key movement, configurable step size, surface-underfoot announcement. Backed by 13 academic studies, 150 blind + 40 sighted co-design participants, third-party VPAT, deployed at the Wisconsin Geological Survey, Georgia Tech, NASA, and the University of Washington. Genuine strengths in empirical validation and procurement readiness that the work here does not yet have.
Navilens — a real-world signage augmentation via proprietary visual codes, not a digital map at all. Massive deployment scale (MTA, Barcelona Metro, Heathrow, Coca-Cola packaging, hundreds of brands). The structural limit: Navilens cannot give spatial knowledge of a place you haven’t visited yet — the codes are physically placed; the product augments a route once a user is already walking it.
GoodMaps — indoor wayfinding for venues mapped with their LiDAR-based 3D point-cloud technology, deployed at airports (MidAmerica St. Louis), university campuses (York University’s Glendon Campus among others), and other commercial venues. Three surfaces: a mobile app for in-venue turn-by-turn with foot-level positioning, a web platform offering interactive 3D venue maps that can be previewed before a visit, and an SDK letting venue partners embed the positioning in their own apps. The map exploration is real but venue-bounded — the user gets a map of the venue they are entering, not a cognitive model of general space or unmapped places.
BlindSquare — positional awareness in real time. As the user moves, the app announces nearby points of interest, intersections, and venue features, letting them build a mental picture of the world immediately around them. Outdoors, the positioning is GPS plus OpenStreetMap and Foursquare data; indoors, it is Apple iBeacons that venues install, each beacon programmed to describe its location (door, service counter, washroom, vestibule). Every Service Canada location in Canada is BlindSquare-enabled, alongside the Yonge & St. Clair neighbourhood deployment in Toronto and other sites. Not turn-by-turn routing; not a pre-built spatial map — the user assembles the model from in-the-moment announcements about what is right here, right now.

The gap, summarised: the most frustrating thing about accessible maps is how little real progress there has been on spatial cognition specifically. Navigation gets the attention. Cognition gets the concession.

The research literature shows the same split. Manaswi Saha, Jon Froehlich, and colleagues’ 2022 CHI study of multi-stakeholder accessibility-map visualizations is careful, empirical, top-venue work — how policymakers, department officials, advocates, caregivers, and people with mobility impairments make sense of sidewalk-accessibility data across seven map types. Its own stated limitation is the tell: the visualizations, the authors note, were not designed to support people with different visual abilities, a gap they name explicitly and defer to future work. The data is about accessibility; the map is not accessible to a non-sighted reader. That deferred piece — a map a non-sighted person can actually read and reason over — is where this work starts.

Seven working demos

Each demo runs at its own URL; the pages here are the briefs that frame them and link out. The demos divide into two families by how the map reaches the reader: four rendered maps draw the space and make it addressable, and three described maps have no graphics at all — they speak.

Rendered maps

What the four share is the approach to spatial cognition — pin-as-datum at viewport centre, dual-mode interaction (Cartesian via touch, polar via keyboard) — not the rendering or the feature set, which differ on purpose.

The difference is fit-for-purpose. In the search and map pin demo, what matters are the pinned points of interest — where the properties are — not the detail of the streets around them; so it renders a raster base with an interactive pin overlay drawn on top, and only the pins need to be addressable. The East End Toronto streetmap and the terminal map are about exploring the detailed space itself, so there everything is drawn as addressable SVG, and the richer affordances — ARIA landmarks, category filters, the rotor, the F6 region cycle — live there rather than in the pin demo. There is no one perfect solution; the right rendering follows the job the map is doing.

Search and map pin demo — the simplest, and the demo that produced the theoretical finding. By far the most stripped-down: residential streets, no interior detail. The simplicity is what exposed the asymmetry between visual scanning and blind navigation.
East End Toronto streetmap — the earliest OSM-rendering demo, first shown at a 45-minute in-person session at the 2019 Guelph Accessibility Conference. The conceptual model the family of maps shares — ARIA Landmarks, filters, rotor — originated here. Rendering is deliberately basic; the contribution is the SVG structure for screen-reader navigation.
Tiled Toronto map — the architectural successor to East End Toronto, taken to city scale: pre-rendered SVG tiles from a custom tile server, and a live, context-aware viewer with far more content, including a search over everything the map knows. Live, covering the whole of Toronto.
Terminal map — interior airport-terminal wayfinding (the worked example is YVR’s Level 3 departures). The most feature-rich of the rendered demos: gates, security, washrooms, retail, services. The terminal-grade demonstration that the approach scales beyond residential subdivisions.

Described maps

Built for a phone in the hand rather than a desktop, these start from where you are standing — or any place you name — and answer from a search index built in the same offline OSM parse as the tiled map’s tiles, covering the whole of Canada and a set of trial cities elsewhere. They describe and orient; none of them navigates, and each says so before it starts.

Context Map — your surroundings in three fixed spoken descriptions: a quick sketch, a continuous commentary as you move, and a detailed read-out. Assembled on the site’s own server from the map index; nothing you ask ever leaves the site. The fixed menu is also the limit — it can only tell you what it was built to tell you.
Conversational map — removes the buttons: ask in plain language, typed or spoken, about where you are or anywhere on the map. A language model interprets the question and chooses the map lookups; every distance and direction is computed from the map, never guessed. The trade is stated before you start: your words and location are sent to a hosted model to be understood.
Knowledge map — the same conversation with more behind it: accessibility detail down to mapped barriers in your vicinity, unnamed roads, paths and buildings for area context, transit routes and schedule patterns from published timetables, house numbers real and estimated, a hands-free voice conversation, the Context Map’s follow-me narration as you walk, cited place knowledge from Wikipedia and Wikivoyage — each such answer naming its source and its age — and a personal memory (“remember where I am”) kept on your own device.

Spatial cognition under modality conversion

The theoretical contribution. When spatial information is rendered through a modality that is sequential rather than parallel (audio, screen reader, haptic) and that the user occupies rather than observes, the spatial reference frame collapses from Cartesian to first-person polar coordinates centred on the user. Cartesian space is a sighted observer’s frame; polar space is an embodied user’s frame. The modality shift forces the frame shift.

The finding is the same one the audio Tetris work produced in different vocabulary. Converting a visual game to audio shifted the player from third-person observational to first-person immersive; converting a visual map to screen-reader-mediated audio shifted the coordinate system from Cartesian to polar centred on a chosen reference point. POIs became (name, distance, compass direction) arranged in onion-skin order from a chosen centre. Same asymmetry expressed in coordinate-system terms.
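
The conversion described above, from a list of POIs with coordinates to (name, distance, compass direction) triples in onion-skin order, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the demos' code: every type and function name is hypothetical, the Earth is treated as a sphere, and eight compass points are assumed as the spoken granularity.

```typescript
// Hypothetical types and names for illustration; not taken from the demos' source.
interface LatLon { lat: number; lon: number; }
interface Poi extends LatLon { name: string; }
interface PolarPoi { name: string; distanceM: number; bearingDeg: number; compass: string; }

const EARTH_RADIUS_M = 6371000; // mean Earth radius (spherical approximation)
const COMPASS: string[] = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"];
const toRad = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance in metres (haversine formula).
function haversineM(a: LatLon, b: LatLon): number {
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Initial bearing from a to b, in degrees clockwise from north (0 up to 360).
function initialBearingDeg(a: LatLon, b: LatLon): number {
  const dLon = toRad(b.lon - a.lon);
  const y = Math.sin(dLon) * Math.cos(toRad(b.lat));
  const x =
    Math.cos(toRad(a.lat)) * Math.sin(toRad(b.lat)) -
    Math.sin(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.cos(dLon);
  return ((Math.atan2(y, x) * 180) / Math.PI + 360) % 360;
}

// Snap a bearing to the nearest of eight compass points.
function toCompass(bearing: number): string {
  return COMPASS[Math.round(bearing / 45) % 8] ?? "N";
}

// Re-express POIs relative to the chosen centre, nearest first (onion-skin order).
function polarSweep(centre: LatLon, pois: Poi[]): PolarPoi[] {
  return pois
    .map((p) => {
      const bearing = initialBearingDeg(centre, p);
      return {
        name: p.name,
        distanceM: haversineM(centre, p),
        bearingDeg: bearing,
        compass: toCompass(bearing),
      };
    })
    .sort((a, b) => a.distanceM - b.distanceM);
}
```

Sorting by distance alone is the simplest reading of "onion-skin order"; a fuller version might bucket POIs into rings and order each ring by bearing.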

It is not modality alone — it is modality plus interaction model. Touch as input mode preserves Cartesian even when output is audio, because the finger gives direct spatial reference. The fuller picture:

Visual + Cartesian — trivially the sighted user’s case.
Audio + sequential traversal (keyboard / screen-reader-only) — polar, centred on a chosen reference. The original finding.
Audio + touch exploration — Cartesian via touch (the finger is the spatial reference; each location announces what is under it) plus polar on tap (when the user interrogates a specific POI, the polar coordinates describe its surroundings).
Audio + live egocentric (in-situ navigation) — polar centred on the user’s actual GPS location, with compass orientation. Two distinct polar systems exist: allocentric (centred on a chosen reference, declarative, exploratory) and egocentric (centred on the user, dynamic, navigational).
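
The difference between the two polar systems in the last item comes down to what the angle is measured from. A hedged sketch, with hypothetical function names and with clock-face phrasing chosen purely for illustration (the source specifies only compass orientation): the allocentric description measures from north around the chosen reference, while the egocentric one subtracts the user's current heading first.

```typescript
// Hypothetical names; clock-face wording is an illustrative assumption, not the demos' output.
const COMPASS_WORDS: string[] = [
  "north", "north-east", "east", "south-east",
  "south", "south-west", "west", "north-west",
];

// Bearing of a target relative to the direction the user is facing (0 = dead ahead).
function relativeBearingDeg(absoluteBearingDeg: number, headingDeg: number): number {
  return (((absoluteBearingDeg - headingDeg) % 360) + 360) % 360;
}

// Map a relative bearing to a clock-face hour: 12 is ahead, 3 is right, 9 is left.
function clockFace(relativeDeg: number): number {
  const hour = Math.round(relativeDeg / 30) % 12;
  return hour === 0 ? 12 : hour;
}

// Allocentric: fixed frame, measured from north around a chosen reference point.
function describeAllocentric(name: string, distanceM: number, bearingDeg: number): string {
  const word = COMPASS_WORDS[Math.round(bearingDeg / 45) % 8] ?? "north";
  return `${name}, ${Math.round(distanceM)} metres ${word} of the pin`;
}

// Egocentric: the same target, re-expressed around the user's own heading.
function describeEgocentric(
  name: string,
  distanceM: number,
  bearingDeg: number,
  headingDeg: number,
): string {
  const hour = clockFace(relativeBearingDeg(bearingDeg, headingDeg));
  return `${name}, ${Math.round(distanceM)} metres, at ${hour} o'clock`;
}
```

The allocentric string stays the same however the user turns; the egocentric string changes with every change of heading, which is why it suits in-situ use rather than exploration.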

The pin-as-datum is the embodiment of all of this in the UI. In all four rendered demos the pin sits at the centre of the viewport; the map orbits the pin. That makes the pin the visible signifier of four things at once: the visual marker (sighted users see it at centre); the polar origin (all distances and directions are relative to it); the datum (fixed reference the map orbits); and the user’s agent in the multi-agent / Community-of-Practice framing, negotiating on behalf of user capability and preference. Wheelchair users have agents that prioritise gradients, ramps, accessible washrooms; blind users have agents that prioritise accessible crossings and green spaces for guide-dog rest breaks. Same OSM data, same pin, same datum, but the map adapts differently because the agent at the centre is negotiating differently. That is CISNA plus the four-model capability framework plus the multi-agent CoP framing, applied to spatial cognition.

The build-level account of this — the digraph beneath the pins, the circuit that orders the polar sweep, and the markup that exposes them — is set out in how an accessible map is built.

This is paper-shaped substance that has not yet been written up. Working title: “Maps need CISNA: applying capability modelling and multi-agent communities of practice to accessible cartography.” A research direction, not a published claim.

Technical foundation

Addressable rendering where the goal is to explore the space. Commercial maps moved to raster tiles for performance; raster is opaque to screen readers. SVG elements are individually addressable, focusable, semantically labellable, scalable without resampling. Where the job is to explore the detailed space — the East End Toronto streetmap, the terminal map, the tiled Toronto map — everything is drawn as SVG, the opposite of the field’s performance-driven raster choice. Where the job is to find pinned points of interest rather than explore the surrounding detail — the search and map pin demo — a raster base carries an addressable pin overlay, and only the pins need to be vector. The accessible layer is always addressable; whether the base is SVG follows the map’s purpose, not dogma.
Pre-rendered SVG; OpenStreetMap never queried at runtime. Nothing on the platform queries OpenStreetMap (or an Overpass endpoint) at runtime. The published demos use one-time static OSM pulls, rendered offline, and served as plain assets — the East End Toronto streetmap, for instance, is a single SVG generated from one long-ago OSM extract; the data isn’t refreshed. The tiled Toronto map extends the same principle to a city: OSM data is processed offline into 0.01° geographic squares (~1km²), each rendered as a compressed SVG.gz file with ARIA labels pre-built at generation time, served from a tile server Bob maintains. The viewer fetches tiles from that server as the viewport pans; the spatial database is touched only at tile-generation time, never at view time. The map search and the described maps do query at runtime — but a self-hosted index, built in the same offline parse as the tiles; OpenStreetMap itself is still never touched at view time.
CSS-based filtering for clutter management. Visibility toggles run at CSS speed, not JavaScript speed.
OpenStreetMap as the data source. Community-maintained, openly licensed, with the fine-grained tagging the indoor and pedestrian pieces of the maps work depend on.
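
The 0.01° tiling described above implies a simple viewer-side calculation: from the viewport's bounds, work out which pre-rendered squares to fetch. The sketch below shows one way that could look. The names and the file-naming scheme are assumptions made for illustration; the actual tile server's layout is not documented here.

```typescript
// Hypothetical names throughout; the real tile server's naming scheme is an assumption.
const TILES_PER_DEGREE = 100; // 0.01-degree squares, as described above

interface TileIndex { row: number; col: number; }
interface Viewport { south: number; west: number; north: number; east: number; }

// Index of the 0.01-degree square containing a point.
// Math.floor keeps negative (western) longitudes in the correct square.
// Points exactly on a tile edge may fall either side due to floating-point rounding.
function tileIndex(lat: number, lon: number): TileIndex {
  return {
    row: Math.floor(lat * TILES_PER_DEGREE),
    col: Math.floor(lon * TILES_PER_DEGREE),
  };
}

// Assumed file naming: "<row>_<col>.svg.gz".
function tileFile(t: TileIndex): string {
  return `${t.row}_${t.col}.svg.gz`;
}

// Every tile file the viewport overlaps, south to north, then west to east.
function tilesForViewport(v: Viewport): string[] {
  const sw = tileIndex(v.south, v.west);
  const ne = tileIndex(v.north, v.east);
  const files: string[] = [];
  for (let row = sw.row; row <= ne.row; row++) {
    for (let col = sw.col; col <= ne.col; col++) {
      files.push(tileFile({ row, col }));
    }
  }
  return files;
}
```

Because the grid is fixed in degrees, the lookup is pure arithmetic: no spatial database is needed at view time, which matches the principle stated above.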

Universal-design discipline across four user populations

Not the usual one or two. Across the body of work, the interaction model addresses four user populations — screen-reader users, keyboard users, voice-control users (via Dragon NaturallySpeaking), and touch users — each with first-class affordances rather than a fallback experience.

The concepts below are distributed across the demos: this is the current state-of-play of the accessible maps work as a whole, not a feature list any single demo implements end-to-end. Each demo carries some subset, and each new demo has been the surface on which one or another of these ideas was first expressed in code.

Rotor (iOS VoiceOver style) for narrowing tab order to a chosen POI class. Borrowed directly from the idiom users already know.
F6 landmark cycle. Three-position cycle (sidebar → map → controls), with last-position memory at each landmark. Two F6 taps from a selected map POI return the user to the sidebar where they were. Borrowed from Windows / Microsoft Office.
Voice control via Dragon NaturallySpeaking. Rotor includes a Dragon-optimised mode with voice-friendly category names. The voice population is often skipped; not skipped here.
Context-adapted skip-links. Standard skip-to-content / skip-to-map-controls augmented with domain-specific landmarks (e.g. “skip to Pier A / B / C / D / E” in the terminal map, with focus moving to the lowest-numbered gate in that pier).
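
The F6 landmark cycle with last-position memory is small enough to sketch as a state machine. This is an illustration of the behaviour described above, not the demos' implementation: the class, its method names, and the element ids are hypothetical, and wiring it to real key events and focus calls is left out.

```typescript
// Hypothetical sketch; class, method names, and element ids are illustrative only.
type Landmark = "sidebar" | "map" | "controls";
const CYCLE: Landmark[] = ["sidebar", "map", "controls"];

class LandmarkCycle {
  private current: Landmark = "sidebar";
  // Last focused element id per landmark, filled in as the user moves around.
  private lastFocus: Partial<Record<Landmark, string>> = {};

  // Call whenever focus lands on an element inside the current landmark.
  remember(elementId: string): void {
    this.lastFocus[this.current] = elementId;
  }

  // Handle one F6 press: advance to the next landmark and report where focus should go.
  // focusId is undefined when the landmark has not been visited yet.
  next(): { landmark: Landmark; focusId: string | undefined } {
    const i = CYCLE.indexOf(this.current);
    this.current = CYCLE[(i + 1) % CYCLE.length] ?? "sidebar";
    return { landmark: this.current, focusId: this.lastFocus[this.current] };
  }
}
```

Starting in the sidebar on a search result, one press moves to the map; after selecting a POI there, two further presses pass through the controls and land back on the same search result.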

The seven-year arc

The arc begins with the search and map pin demo: accessible spatial information about a residential subdivision — the work that produced the polar-coordinate finding. The East End Toronto streetmap followed: first publicly shown at a 45-minute in-person session at the 2019 Guelph Accessibility Conference (a low-fidelity, black-and-white, file:///-served rendering of an east Toronto streetmap) and the demo that introduced the ARIA Landmarks + filters + rotor model the family of maps now shares. The tiled Toronto map followed as the direct architectural successor of East End Toronto, scaling the single-tile pipeline to a full city with its own SVG tile server — live now, covering the whole of Toronto. The terminal map carried the conceptual model into an indoor airport surface (worked example: YVR’s Level 3 departures). Most recently the family changed modality: the Context Map reads your surroundings aloud in three fixed descriptions, the Conversational map puts the same index behind free-form questions, and the Knowledge map adds transit schedule patterns and cited place knowledge to the conversation. Same design vocabulary throughout; materially improved engineering at each step — first in rendering, now in description.

Known gaps

Surface-under-foot announcement — Audiom has it (Esri facility data carries surface metadata). The newer demos now read OSM’s surface and smoothness tags where they exist — the tiled map can filter on them, and the described maps speak them — but OSM doesn’t carry them consistently for pedestrian-relevant features, so coverage is patchy. A data-source limitation, not a design oversight.
Configurable step size on arrow-key movement — currently a TypeScript constant; should be user-configurable (city scale needs 50–100m steps; building scale needs 1–2m steps).
Direction-of-flow indication for unidirectional corridors on the terminal map — needed for any traveller who shouldn’t have to discover the direction by walking it. Known gap.
Right-click menu for non-drag pin placement — designed but not implemented.
No third-party VPAT or empirical usability validation at scale — Audiom has both; the work here has neither. Honest gap; the demos are working evidence, not procurement-ready artefacts.
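
The configurable step size listed among the gaps above is mostly a unit conversion: the user thinks in metres, the pin moves in degrees. A minimal sketch under stated assumptions follows. All names are hypothetical, the presets simply pick values from the ranges quoted above, and a spherical Earth is assumed.

```typescript
// Hypothetical names; the demos currently hard-code the step as a constant.
type Direction = "north" | "south" | "east" | "west";
interface LatLon { lat: number; lon: number; }

const EARTH_RADIUS_M = 6371000; // mean Earth radius (spherical approximation)
const METRES_PER_DEGREE_LAT = (Math.PI * EARTH_RADIUS_M) / 180; // about 111 km

// Illustrative presets picked from the ranges named above.
const STEP_PRESETS_M = { building: 2, city: 100 };

// Move the pin one step of stepM metres in a cardinal direction.
function stepPin(pin: LatLon, dir: Direction, stepM: number): LatLon {
  const dLat = stepM / METRES_PER_DEGREE_LAT;
  // A degree of longitude shrinks with cos(latitude), so east-west steps need more degrees.
  const dLon = stepM / (METRES_PER_DEGREE_LAT * Math.cos((pin.lat * Math.PI) / 180));
  switch (dir) {
    case "north": return { lat: pin.lat + dLat, lon: pin.lon };
    case "south": return { lat: pin.lat - dLat, lon: pin.lon };
    case "east": return { lat: pin.lat, lon: pin.lon + dLon };
    case "west": return { lat: pin.lat, lon: pin.lon - dLon };
  }
}
```

Passing the step in metres, rather than baking a degree delta into the code, is what would let one arrow-key handler serve both a city map and a terminal interior.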

Reading on

How an accessible map is built — the shared model the four demos are instances of: the digraph, the polar circuit, and how it is exposed to assistive technology.
Search and map pin demo — simplest demo, polar finding origin.
East End Toronto streetmap — 2019 origin; introduced the ARIA Landmarks + filters + rotor model the family shares.
Tiled Toronto map — East End Toronto taken to city scale; pre-rendered SVG tiles from a custom server, live and context-aware.
Terminal map — interior wayfinding, airport scale.
Context Map — your surroundings read aloud in three fixed descriptions; self-hosted end to end.
Conversational map — free-form questions over the same map index; the model phrases, the map measures.
Knowledge map — the conversation plus richer map detail, transit schedule patterns, and cited place knowledge.
The CISNA Model — the methodological substrate the maps work applies.
The 2029 framework — the multi-agent CoP framing the pin-as-datum embodies.