Speaking the map, and finding your place

Two problems sit underneath an accessible map and are still genuinely open: how a feature is announced when a reader explores by touch, and how a reader stays oriented — finding focus when the map is zoomed out, and knowing where they are when there is no single anchor to describe everything against. This page works through both, and is honest that they are design questions, not settled practice.

Who announces, and who draws focus

The trick this map plays is to expose the SVG so that the screen reader announces the content on explore-by-touch. The map is real, first-class SVG on the page, each feature carrying its ARIA label, so when a reader moves a finger (or a mouse) over it, the screen reader itself speaks what is underneath. The announcement is the assistive technology’s job, done well, for free.

There is a cost to that delegation: if the assistive technology does the announcing, it also draws the focus indicator — and screen readers are not sophisticated about this yet. The focus outline is usually a rectangle, regardless of the actual SVG shape it surrounds, so an irregular building or a curving road gets a bounding box. Its default thickness and colour are tuned for text on a page, and they work poorly over the busy, varied fills of a map. Hand the announcement to the screen reader and you lose control of the focus indicator’s shape, weight, and colour — which matters a great deal on a map.

The terminal map takes the other path: it handles explore-by-touch in its own code and announces through an ARIA live region. Because the map is now in control, it can draw its own shape-aware highlight, in the right weight and colour — the focus-indicator problem goes away. But a new one arrives: a live region queues, and does not interrupt. A reader exploring a busy map by touch gets stuck behind a stale list — the live region reads out every feature their finger has already passed over before it catches up to where the finger is now.

Neither approach is ideal. Delegating to the screen reader gives accurate announcements and explore-by-touch for free, but a rectangular, poorly contrasting focus outline you cannot style. Owning it in code gives full control of the highlight, but inherits the live region’s staleness. And the “announce only where I am now, interrupting whatever came before” behaviour that would fix the second problem is intrinsic to the screen reader’s own focus and explore-by-touch engine — it is exactly what the first approach gets for free, and exactly what the live-region approach structurally loses. It is not a feature you can bolt back onto a live region.

It is tempting to think a live region could just be made to interrupt, but it can’t, reliably. aria-live="assertive" only raises the priority of an announcement, and even its interrupt behaviour varies across NVDA, JAWS, VoiceOver, and TalkBack. There is no property that flushes an already-queued backlog, and replacing the region’s text doesn’t guarantee the screen reader discards speech it has already started. Live regions are specified for low-frequency status messages, not for high-frequency positional tracking. The realistic mitigations are only partial. You can throttle to dwell: announce only when the finger settles, not on every feature passed over, which never generates the backlog but loses the continuous “drag and hear everything” feel. Or you can move real DOM focus to the feature under the finger, which restores the screen reader’s self-interruption but lands you straight back on its rectangular focus outline, the other horn of the trade-off.
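The throttle-to-dwell mitigation can be sketched as a small state machine. This is an illustration under assumptions, not the terminal map’s code: the class name DwellDetector, the feature ids, and the 300 ms threshold used in the usage note are all hypothetical. Time is passed in explicitly so the logic can be read and tested without a browser.

```typescript
// Hypothetical sketch: announce a feature only once the finger has rested on it.
// DwellDetector and its parameter names are illustrative, not from the demo.
class DwellDetector {
  private dwellMs: number;
  private currentId: string | null = null;
  private enteredAt = 0;
  private announced = false;

  constructor(dwellMs: number) {
    this.dwellMs = dwellMs;
  }

  // Call with the id of the feature under the finger (null over empty map)
  // and the current time in milliseconds.
  // Returns the id to announce now, or null if nothing should be spoken yet.
  update(featureId: string | null, nowMs: number): string | null {
    if (featureId !== this.currentId) {
      // Moved onto a different feature: restart the dwell clock, say nothing.
      this.currentId = featureId;
      this.enteredAt = nowMs;
      this.announced = false;
      return null;
    }
    const rested = nowMs - this.enteredAt >= this.dwellMs;
    if (featureId !== null && !this.announced && rested) {
      this.announced = true; // announce each dwell at most once
      return featureId;
    }
    return null;
  }
}
```

With a threshold of 300 ms, features the finger merely crosses never reach the live region, so no backlog builds up. A stationary finger produces no move events, so a real implementation would probably also need to call update from a timer.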

A possible answer: the Web Speech API and audio ducking

The candidate fix — proposed, not yet built or tested — is to drop the live region and announce through the Web Speech API (speechSynthesis) instead. A live region cannot flush its backlog; speechSynthesis can. Calling speechSynthesis.cancel() clears the current and queued utterances, and you immediately speak() the new one. So on explore-by-touch you cancel-then-speak on every move, and the reader only ever hears where the finger is now — the self-interrupting behaviour the live region lacks, recovered in code.
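A minimal sketch of cancel-then-speak, with the same caveat that nothing here has been built or tested against real screen readers. The helper name makeAnnouncer and the SpeechLike interface are hypothetical. The speech engine is passed in so the logic can run outside a browser; in a page it would be window.speechSynthesis and SpeechSynthesisUtterance. One assumption is added to the description above: a move that stays on the same feature is ignored, because restarting the utterance on every pointer event would stop the reader ever hearing it finish.

```typescript
// Minimal shape of the parts of speechSynthesis this sketch relies on.
interface SpeechLike<U> {
  cancel(): void;             // drops the current and all queued utterances
  speak(utterance: U): void;  // queues an utterance for speaking
}

// Hypothetical helper: returns a function that announces a feature label,
// interrupting whatever was being spoken before.
function makeAnnouncer<U>(
  synth: SpeechLike<U>,
  makeUtterance: (text: string) => U
): (label: string) => void {
  let lastLabel = "";
  return (label: string): void => {
    if (label === lastLabel) return; // still on the same feature: let it finish
    lastLabel = label;
    synth.cancel();                  // flush the backlog
    synth.speak(makeUtterance(label));
  };
}

// In a browser the wiring would look roughly like this:
// const announce = makeAnnouncer(
//   window.speechSynthesis,
//   (text) => new SpeechSynthesisUtterance(text)
// );
```

Calling the returned function from the explore-by-touch handler means at most one utterance is ever queued, which is the property the live region cannot offer.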

It is not a clean win. The page’s synthesis voice will not be the reader’s own screen-reader voice, so they hear a mix of two voices — the map in one, their screen reader in another. And because a map that just talks would be constantly chatty to everyone, the behaviour has to sit behind an opt-in accessibility toggle rather than be on by default.

What it does not need is any attempt to silence the screen reader. Audio ducking handles the overlap: when the screen reader speaks, other audio is lowered so the screen reader stays legible, and the map’s synthesis should be lowered with it. Ducking is a user setting in the major screen readers rather than a guarantee, so this needs checking on each platform. When it works, it is exactly what you want: the screen reader should be able to talk at the same time, for system messages, and remain clearly on top. The division of labour is clean: the page owns feature announcement, the screen reader owns its own system and chrome messages, and ducking arbitrates the overlap. The one genuinely open detail is that speechSynthesis will not inherit the reader’s chosen voice, rate, or verbosity, so the map voice may feel foreign or wrongly paced unless those are exposed as settings.

Finding focus when the map is zoomed out

Zoom on this demo is set by the reader. When the map is zoomed out, it can be very hard to see where focus is moving, because the features are, by definition, very small. The demo today does the simplest thing: a location indicator of a fixed size, regardless of zoom. What should happen when focus lands on a feature too small to see is an open question, with four candidate answers:

A magnification bubble over the current location — like the old macOS Dock magnification effect (now off by default, but presumably still available), magnifying the focused area in place.
Zoom on tab to guarantee a minimum indicator size — when focus moves, zoom the map so the indicator is always at least some readable size. This is the opposite of the current fixed-size approach.
Contextual zoom — zoom by the graphical size and the semantic meaning of the focused feature. A park, a school ground, or hospital grounds would be framed together with some of the surrounding locality. The open sub-questions: what to do with a long road or street, which doesn’t frame neatly, and how much context to give a point feature like a transit stop.
Leave the zoom alone and improve the marker — keep a static indicator but make it findable at any zoom, the way ZoomText offers a cross-hairs locator alongside its enlarged-pointer variations.

Bob’s preference is the third, contextual zoom, but with two honest caveats: it isn’t fully built (the terminal map only gestures at the idea), and there is no user data yet to decide on.
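For comparison, the second candidate (zoom on tab to a minimum indicator size) reduces to a small calculation. The sketch below is hypothetical: the function name, the idea of measuring a feature by the shorter side of its on-screen bounding box, and the 24-pixel minimum in the usage note are assumptions for illustration, not values from the demo.

```typescript
// Hypothetical sketch: the zoom scale needed so that a focused feature's
// indicator is at least minPx across. It never zooms out.
//
// featureSizePx: shorter side of the feature's bounding box, in on-screen
//                pixels at the current scale (an assumed way to measure it).
// currentScale:  the map's current zoom factor.
// minPx:         smallest indicator size considered readable.
function scaleForMinimumIndicator(
  featureSizePx: number,
  currentScale: number,
  minPx: number
): number {
  if (featureSizePx <= 0) {
    // Point features have no extent; they need a separate rule.
    return currentScale;
  }
  if (featureSizePx >= minPx) {
    return currentScale; // already readable, leave the zoom alone
  }
  return currentScale * (minPx / featureSizePx);
}
```

A feature 6 pixels across at scale 1 with a 24-pixel minimum gives a target scale of 4. Contextual zoom would replace the single minPx rule with one that also looks at the feature’s type and surroundings.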

Knowing where you are

The search and map pin demo had it easy: every pin is described relative to the chosen property, so even without explore-by-touch a reader gets a sense of the spatial relationships — this amenity is north-east of the property, two hundred metres away. There is a natural anchor to describe everything against. A full streetmap has no such anchor; tab into it and you are, in effect, stuck in Cartesian coordinates inside a graph. Conveying where you are needs two things at once.

A relative account. Describe each feature by its nearest neighbours, by their relative importance, and by cardinal direction — which is what the terminal map does, each point of interest naming its nearest others and their compass direction. This map does not do it yet, and the reason is the data: the OpenStreetMap extract is too incomplete and unstructured to rank importance or to trust the neighbours it offers — as the East End Toronto streetmap page sets out in full.
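The geometry behind a relative account is simple even where the data is not. The sketch below assumes every feature already has a position in metres on a local east/north grid, which is an assumption made for the example and not how the demo stores its data. The Point type and the function names are hypothetical, and importance ranking is left out because, as noted, the extract cannot support it.

```typescript
// Hypothetical feature position on a local grid, in metres.
interface Point {
  name: string;
  east: number;
  north: number;
}

const COMPASS = [
  "north", "north-east", "east", "south-east",
  "south", "south-west", "west", "north-west",
];

function distanceMetres(a: Point, b: Point): number {
  const dx = b.east - a.east;
  const dy = b.north - a.north;
  return Math.sqrt(dx * dx + dy * dy);
}

// Eight-point compass direction of `to` as seen from `from`.
function compassDirection(from: Point, to: Point): string {
  const dx = to.east - from.east;
  const dy = to.north - from.north;
  // atan2(dx, dy) measures clockwise from north; normalise to 0..360 degrees.
  const degrees = ((Math.atan2(dx, dy) * 180) / Math.PI + 360) % 360;
  return COMPASS[Math.round(degrees / 45) % 8];
}

// The `count` closest features to `from`, nearest first.
function nearestNeighbours(from: Point, others: Point[], count: number): Point[] {
  return others
    .filter((p) => p !== from)
    .sort((a, b) => distanceMetres(from, a) - distanceMetres(from, b))
    .slice(0, count);
}

// One sentence of relative description, distance rounded to the nearest 10 m.
function describeRelative(from: Point, to: Point): string {
  const rounded = Math.round(distanceMetres(from, to) / 10) * 10;
  return `${to.name} is ${rounded} metres ${compassDirection(from, to)} of ${from.name}`;
}
```

The same functions would serve a reader-planted anchor: pass the anchor as `from` and every description becomes relative to it.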

An absolute account. Alongside the relative description, the reader should know the Cartesian reality: how large a slice of map they actually have. That means how many metres or kilometres (feet or miles) the view spans side to side and top to bottom and, at least roughly, how much real distance the width of their finger represents. That last one translates the physical act of exploring by touch into real-world scale: a finger-width is so many metres, so a reader can feel how far they have moved, not just what they are over.
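The arithmetic for the absolute account is a single ratio, metres per on-screen pixel, applied three ways. In this sketch the function names and the ViewScale type are hypothetical, and the finger width is a parameter because CSS pixels do not map reliably to physical millimetres across devices; the 48-pixel figure in the usage note is only a placeholder.

```typescript
// Hypothetical result type for the absolute description of the current view.
interface ViewScale {
  widthMetres: number;
  heightMetres: number;
  metresPerFinger: number;
}

// viewWidthMetres:  real-world distance the view spans side to side.
// viewWidthPx / viewHeightPx: size of the view on screen, in CSS pixels.
// fingerWidthPx:    assumed on-screen width of a fingertip, in CSS pixels.
function describeScale(
  viewWidthMetres: number,
  viewWidthPx: number,
  viewHeightPx: number,
  fingerWidthPx: number
): ViewScale {
  if (viewWidthPx <= 0) {
    throw new RangeError("viewWidthPx must be positive");
  }
  const metresPerPx = viewWidthMetres / viewWidthPx;
  return {
    widthMetres: viewWidthMetres,
    heightMetres: metresPerPx * viewHeightPx,
    metresPerFinger: metresPerPx * fingerWidthPx,
  };
}

// One sentence a map could announce when the view changes.
function scaleSentence(scale: ViewScale): string {
  const w = Math.round(scale.widthMetres);
  const h = Math.round(scale.heightMetres);
  const f = Math.round(scale.metresPerFinger);
  return `The view spans ${w} metres across and ${h} metres top to bottom; one finger-width is about ${f} metres.`;
}
```

A view 1200 metres wide drawn 600 pixels wide and 400 pixels tall gives 2 metres per pixel, so the view is 800 metres top to bottom and a 48-pixel finger covers about 96 metres.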

Or create an anchor. A third option is to let the reader plant one — essentially what you do in Google Maps when you drag Street View onto a point on the map. Drop a movable reference point and everything can be described relative to it, which restores the search and map pin demo’s “relative to the property” advantage on a map that has no built-in anchor. Bob is experimenting with this in the tiled Toronto map.

Reading on

East End Toronto streetmap — the demo these notes are about.
How an accessible map is built — the shared model: typed nodes, the convenience graph, and how a circuit is read.
Maps — the wider accessible maps work.