Conversational map

The Context Map gives you three fixed descriptions: quick, continuous, and detailed. Useful, but one size fits all. The Conversational map removes the buttons and lets you simply ask — in plain language, by typing or speaking aloud — about where you are now, or anywhere on the map at all.

A test, not a finished demo — the next step past the three-button model, being tried out and learned from.

Try the interactive demo

Open the Conversational map (opens in a new window)

You will be asked to read and accept the notice, then to allow location access. It opens in its own window; close it to come back here.

What it is

A plain text box. You ask a question — “what’s near me?”, “is there a step-free entrance to the library?”, “how far is the CN Tower, and which way?” — and it answers from the same map database behind the other maps. The fixed descriptions could only ever tell you the handful of things they were built to tell you. This lets you ask the question you actually have.

And it is not limited to where you are standing. Because every named place, address and feature in the index can be looked up by name, you can ask about anywhere — what surrounds a station across the city, whether a park has accessible paths, how a neighbourhood you have never been to is laid out.

How it works

Behind the text box, a language model interprets your question and decides what to look up — it does not invent answers, and it does not do the geography itself. It calls a small set of map tools that query the index and return the facts already worked out: what is near a point, what a place contains, the distance and direction from one spot to another. The model’s job is to understand what you asked and put the answer in plain words; every distance and direction comes from the map, computed, not guessed.
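The division of labour can be sketched as a small tool registry and a dispatcher. This is a minimal sketch under stated assumptions. The tool names (`place_details`, `places_of_kind`), the in-memory `INDEX` standing in for the map index, and the `{"tool": ..., "args": ...}` shape of a model's request are all hypothetical. They are not the site's actual tools or any particular model API. The sketch shows the rule in the paragraph above: the model may only name a tool and its arguments, and anything outside the registry is refused rather than improvised.

```python
from typing import Any, Callable

# Hypothetical stand-in for the map index: place name -> stored record.
INDEX: dict[str, dict[str, Any]] = {
    "Central Library": {"kind": "library", "step_free_entrance": True},
    "Riverside Park": {"kind": "park", "accessible_paths": False},
}


def place_details(name: str) -> dict[str, Any]:
    """Hypothetical tool: return what the index holds for one named place."""
    record = INDEX.get(name)
    return {"found": record is not None, "name": name, **(record or {})}


def places_of_kind(kind: str) -> dict[str, Any]:
    """Hypothetical tool: list the indexed places of one kind."""
    names = sorted(n for n, r in INDEX.items() if r["kind"] == kind)
    return {"kind": kind, "names": names}


# The complete set of operations the model is allowed to request.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "place_details": place_details,
    "places_of_kind": places_of_kind,
}


def run_tool_call(call: dict[str, Any]) -> dict[str, Any]:
    """Execute one model-requested call; unknown tools are refused, not improvised."""
    tool = TOOLS.get(call.get("tool", ""))
    if tool is None:
        return {"error": f"unknown tool: {call.get('tool')}"}
    try:
        return tool(**call.get("args", {}))
    except TypeError as exc:  # wrong or missing arguments from the model
        return {"error": str(exc)}
```

The returned dictionary goes back to the model, whose only remaining job is to put those facts into words.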

That division is deliberate. Language models are unreliable at spatial reasoning — distances, bearings, what lies between two places — so none of that is left to the model. It chooses the questions; the map does the measuring.
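On the map side, distance and direction are ordinary spherical geometry. This is a minimal sketch that assumes a spherical Earth. It uses the haversine formula for distance and the initial great-circle bearing for direction, which is accurate to within about half a percent. The function names are illustrative, not the site's code.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius
COMPASS = ["north", "north-east", "east", "south-east",
           "south", "south-west", "west", "north-west"]


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Direction to set off in, degrees clockwise from true north (0-360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


def compass_word(bearing_deg: float) -> str:
    """Nearest of the eight compass points (45 degrees apart)."""
    return COMPASS[round(bearing_deg / 45) % 8]
```

The model never runs anything like this itself. It receives the numbers these functions produce and phrases them.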

Sending your words elsewhere to answer

Understanding free-form questions needs a capable language model, and that runs as a hosted service rather than on the page. So to answer you, your question is sent over the internet to a third-party service to be processed, together with your current location when you have shared it. If you speak rather than type, your voice is first streamed, as you say it, to a separate speech-to-text service to be turned into words. The rest of the site is self-hosted and sends nothing to anyone; this one feature is the exception, and the notice before you start says so plainly. Do not type or say anything you would not want handled that way.

The same map underneath

There is still no map to look at. The answers come from the OpenSearch index that powers the tiled Toronto map and the Context Map — every shop, crossing, bench and water’s edge as a record with its position, its kind, and its accessibility detail. The index now reaches well beyond Toronto: it covers the whole of Canada, so the same question works in St. John’s, Yellowknife or Victoria as in Cambridge.
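A question like "what's near me?" becomes a geo query against that index. This is a minimal sketch of the request body. It assumes each record stores its position in a `geo_point` field named `location`; that field name is a guess. `geo_distance` filtering and `_geo_distance` sorting are standard OpenSearch query DSL. The body would be sent to the index's `_search` endpoint, and that call is left out here.

```python
import json
from typing import Any


def nearby_query(lat: float, lon: float, radius_m: int = 200, size: int = 20) -> dict[str, Any]:
    """Build an OpenSearch search body: records within radius_m of a point, nearest first."""
    point = {"lat": lat, "lon": lon}
    return {
        "size": size,
        "query": {
            "bool": {
                # A filter, not a scored match: every hit inside the circle qualifies.
                "filter": [
                    {"geo_distance": {"distance": f"{radius_m}m", "location": point}}
                ]
            }
        },
        # Order hits by distance from the point, reported in metres.
        "sort": [
            {"_geo_distance": {"location": point, "order": "asc", "unit": "m"}}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(nearby_query(43.6453, -79.3806), indent=2))
```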

Built on OpenStreetMap

All of the place data comes from OpenStreetMap, the crowd-sourced map of the world. Its limitations come straight through: a building nobody has traced is missing, a shop that changed hands may carry the old name. The map can only ever be as current and complete as the data underneath. As on the Context Map, it only tells you what is mapped — silence means “not mapped”, never “not there”.

A test, not a tool

This is unfinished, untested software, and it says so before you can use it. Every time you open it you read and accept a notice — that it can be wrong, that it can misjudge distance or direction or answer incorrectly, that it sends your words and location to an outside service, and that it is not for navigation or any safety decision. Keep using your usual ways of getting around at all times.

You can type your question or speak it aloud and hear the answer read back — with clock-face directions relative to the way you are facing, the heading work from the Context Map carried straight over.
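A compass bearing becomes a clock-face direction through one subtraction and one division: twelve o'clock is straight ahead, and each hour is 30 degrees. This is a minimal sketch. It assumes the target's bearing and your heading are both in degrees clockwise from north. The names are illustrative, not the Context Map's code.

```python
HOUR_WORDS = ["twelve", "one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "ten", "eleven"]


def clock_position(target_bearing_deg: float, heading_deg: float) -> int:
    """Clock hour (1-12) of a target, relative to the way you are facing."""
    relative = (target_bearing_deg - heading_deg) % 360  # 0 means dead ahead
    hour = round(relative / 30) % 12                     # 30 degrees per hour
    return 12 if hour == 0 else hour


def clock_phrase(target_bearing_deg: float, heading_deg: float) -> str:
    hour = clock_position(target_bearing_deg, heading_deg)
    return f"about {HOUR_WORDS[hour % 12]} o'clock"
```

Suppose you face north and the target is north-east (45 degrees). The sketch gives "about two o'clock", because 45 / 30 = 1.5 and Python's `round` sends exact halves to the even neighbour. A different tie-breaking rule would say one o'clock.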

Colophon

A colophon is the note at the back of a book about how it was made. Each map in this family gets one, because the decisions behind an accessible map — what to store, what to match, what to leave out — are the interesting part, and worth showing rather than burying. Several shaped this one: what it borrows from the maps before it, how you speak to it and where that sends your voice, why a chatbot is the interface at all, how the search copes when a spoken name arrives mis-spelled, and whether to put back the unnamed paths and buildings most maps drop.

Built on the maps before it

Open the Conversational map, ask a question aloud, and most of what happens around the answer is not new. The answer read out in a synthetic voice, the clock-face directions relative to the way you are facing, the screen kept awake while you listen — all of it is machinery lifted, almost unchanged, from the Context Map. These maps are a family, not separate builds, and the family shares its parts.

Spoken answers use the browser’s own built-in speech, with a fallback: on a phone with no voice of its own — a de-Googled Android, say — the answer is written instead into a quiet, polite live region, and the reader’s own screen reader speaks it. The two never sound at once. That arrangement was worked out on the Context Map and carried straight over. So was the compass: a tilt-compensated reading of the phone’s magnetometer turns “north-east” into “about two o’clock” — relative to where you are actually facing, which is what a walker needs — and that is the Context Map’s code, reused whole.
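Tilt compensation rotates the magnetometer reading back to the horizontal before its angle is taken, using the phone's roll and pitch from the accelerometer. This is a minimal sketch of the standard eCompass equations, as set out in NXP/Freescale application note AN4248. It assumes calibrated raw vectors in a sensor frame where x points forward (out of the top edge), y points right and z points down. It also assumes a gravity vector that reads (0, 0, 1) with the phone lying flat. A browser usually hands over an orientation rather than raw vectors, so this shows the underlying idea, not the Context Map's code.

```python
import math


def tilt_compensated_heading(gravity: tuple[float, float, float],
                             mag: tuple[float, float, float]) -> float:
    """Heading in degrees clockwise from magnetic north, corrected for tilt.

    gravity: direction of gravity in the sensor frame (x forward, y right,
    z down), so a phone lying flat reads roughly (0, 0, 1).
    mag: calibrated magnetometer reading in the same frame.
    """
    gx, gy, gz = gravity
    mx, my, mz = mag
    roll = math.atan2(gy, gz)
    pitch = math.atan2(-gx, gy * math.sin(roll) + gz * math.cos(roll))
    # Rotate the magnetic vector back into the horizontal plane.
    bx = (mx * math.cos(pitch)
          + my * math.sin(pitch) * math.sin(roll)
          + mz * math.sin(pitch) * math.cos(roll))
    by = mz * math.sin(roll) - my * math.cos(roll)
    return math.degrees(math.atan2(by, bx)) % 360
```

Without the correction, tipping the phone toward you while it faces north would swing the reading. With it, the heading stays at north.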

And the screen wake lock: ask a question and the phone must not lock halfway through reading you the answer, so the page holds the screen awake while it is open and in front of you. It can only do that while it is the visible tab — it cannot keep the screen on with the phone pocketed or locked, which would need a native app, and this is a web page. The same limit, and the same code, as the Context Map it came from.

None of this is remarkable on its own. It is in the colophon because the reuse is the point: a new idea — the conversation — resting on settled, tested machinery rather than rebuilt from nothing.

Hearing the question, and where your voice goes

You can type, or tap Speak and talk. Speech streams as you go: the words appear in the box in real time, and the question is sent the moment you stop (a pause of about a second is taken as "finished"), so most of the time there is no second tap. Speak and Stop are both still there for when you want them (Stop sends straight away). A short rising tone marks the microphone going live and a falling one marks it stopping, so a blind user knows the state without watching the screen. What was heard is read back before it is acted on, so a mishear is caught by ear.
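A rising or falling tone is a short frequency sweep. This is a minimal sketch of a linear chirp computed as raw samples. The frequencies and duration are illustrative; the page's actual tones are not specified here. In a browser the same shape would more likely come from the Web Audio API, ramping an oscillator's frequency, than from computed samples.

```python
import math


def chirp(f_start_hz: float, f_end_hz: float,
          duration_s: float = 0.15, rate_hz: int = 16_000) -> list[float]:
    """Samples of a linear sweep from f_start_hz to f_end_hz, amplitude 1."""
    n = round(duration_s * rate_hz)
    sweep = (f_end_hz - f_start_hz) / duration_s  # Hz per second
    samples = []
    for i in range(n):
        t = i / rate_hz
        # Phase is the integral of the instantaneous frequency f_start + sweep * t.
        samples.append(math.sin(2 * math.pi * (f_start_hz * t + 0.5 * sweep * t * t)))
    return samples


def mic_on_tone() -> list[float]:
    return chirp(440.0, 880.0)  # rising: microphone live (illustrative pitches)


def mic_off_tone() -> list[float]:
    return chirp(880.0, 440.0)  # falling: microphone stopped
```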

The hard case is that a blind user is often somewhere loud (a march, a platform, a busy street), and the microphone hears all of it, not just them. Two things handle that. First, the speech service separates the voices it picks up, and the app locks onto the first one to say a few words: you, holding the phone. It keeps only your words and drops the conversation happening behind you. Second, it decides you have finished not from silence, which never comes in a crowd, but from the gaps between your words, so the chatter around you does not stop it knowing you have stopped. It is a heuristic, not a guarantee: a bystander who gets a sentence in first could fool the lock. But for a phone you are holding and talking into, it is right nearly always, and it is the whole reason for using a speech service strong in noise rather than the one built into the browser.
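Both rules can be sketched over the word-level results a voice-separating speech service returns. This is a minimal sketch. It assumes each recognised word arrives with start and end times and a speaker label. The `Word` type is hypothetical, and so are the three-word lock and the one-second gap, which only echo "a few words" and "about a second" above. None of it is the app's actual code or Deepgram's response format.

```python
from dataclasses import dataclass
from typing import Optional

LOCK_AFTER_WORDS = 3  # hypothetical: "the first one to say a few words"
END_GAP_S = 1.0       # hypothetical: "about a second's pause"


@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the stream
    end: float
    speaker: int   # label from the service's speaker separation


def locked_speaker(words: list[Word]) -> Optional[int]:
    """The first speaker to reach LOCK_AFTER_WORDS words, in arrival order."""
    counts: dict[int, int] = {}
    for w in words:
        counts[w.speaker] = counts.get(w.speaker, 0) + 1
        if counts[w.speaker] >= LOCK_AFTER_WORDS:
            return w.speaker
    return None


def your_utterance(words: list[Word], now: float) -> tuple[Optional[str], bool]:
    """Return (your words only, whether you have finished)."""
    speaker = locked_speaker(words)
    if speaker is None:
        return None, False
    mine = [w for w in words if w.speaker == speaker]
    # Other voices may still be talking; only the gap since your last word counts.
    finished = now - mine[-1].end >= END_GAP_S
    return " ".join(w.text for w in mine), finished
```

Background speech never resets the clock here. The question ends when your own words stop, even if the crowd does not.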

Turning speech into text is a job the language model cannot do: it does not transcribe audio, so transcription is a separate service, Deepgram. A spoken question therefore leaves the device twice, once as sound to be turned into words and once as those words to be answered, where a typed one leaves it only once.

A typed question leaves the device once:

    your words + location  ->  language model     (understands, answers)

A spoken question leaves it twice:

    your voice (audio)     ->  speech-to-text      (becomes words)
    your words + location  ->  language model

Streaming also improved where the audio goes. Your voice now goes straight from your browser to the speech service, not through my server on the way. The key that would let anyone run up a bill on that service never reaches your phone. Instead, my server hands the browser a token good for about thirty seconds, long enough to open the connection and no longer. So no audio passes through my server, the path is as direct and as quick as it can be, and the credential stays mine.
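The principle is a long-lived secret that stays on the server and a short-lived credential that is all the browser ever holds. It can be sketched generically. This is a minimal sketch of an HMAC-signed token that carries its own expiry, with an illustrative secret. It is not how Deepgram's temporary credentials are made: those are issued by Deepgram's own service, which this sketch does not call. Only the shape of the idea carries over. Anything the browser holds stops working about thirty seconds later.

```python
import hashlib
import hmac
import time
from typing import Optional

TOKEN_TTL_S = 30  # "about thirty seconds"


def issue_token(secret: bytes, now: Optional[float] = None, ttl_s: int = TOKEN_TTL_S) -> str:
    """Server side: a token that names its own expiry, signed with the secret."""
    issued = time.time() if now is None else now
    expires = str(int(issued + ttl_s))
    signature = hmac.new(secret, expires.encode(), hashlib.sha256).hexdigest()
    return f"{expires}.{signature}"


def token_is_valid(token: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Accept only an untampered token whose expiry has not passed."""
    try:
        expires, signature = token.split(".", 1)
        expires_at = int(expires)
    except ValueError:
        return False
    expected = hmac.new(secret, expires.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return False
    current = time.time() if now is None else now
    return current < expires_at
```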

The service is Deepgram, for plain practical reasons: I already had a key, its models hold up well in exactly the noisy conditions above, and being someone else’s cloud it puts no load on the small server everything else here runs on. The honest preference is still to run the speech-to-text on my own machine one day, so the audio never leaves at all; that needs a bigger box than the site sits on now, so for the moment it is a hosted service — openly disclosed, and told to you before you start.

Why a chatbot is the interface

All of that — your words leaving the device, twice over for a spoken question — follows from one decision: to make a language-model chatbot the interface at all.

The Context Map answers with three fixed buttons: quick, continuous, detailed. They always work, they never surprise you, and — this is the part that matters here — they need no outside help. The descriptions are assembled on my own server from my own map data, and nothing about your question ever leaves. The price of that is that they can only ever tell you the handful of things they were built to tell you.

The chatbot trades that property for its opposite. There are no fixed questions, so you can ask the one you actually have — “is the library’s side entrance step-free?”, “where’s the nearest bench in the shade?” — and to understand a question phrased any way at all takes a model too large to run on a phone, or realistically on a small server. So it runs as a hosted service, and understanding your question means sending it there. The flexibility and the privacy cost are one decision seen from two sides; you cannot take the first without the second.

A language model as an interface carries a second cost: it can be confidently wrong. The guard against it is the division of labour described further up this page — the model is allowed to choose what to look up and to put the answer into words, and nothing else. It never measures a distance or a direction; those come from the map, computed. It is handed the facts and asked to phrase them, not asked to know them. That does not make it incapable of error — so the map says so plainly, every time you open it — but it keeps the errors to wording, not invented geography.

So the chatbot is not free, and it is the one place this site reaches outside itself. The rest of the site is self-hosted and sends nothing to anyone; this map, to answer a question it was never specifically built for, sends your words to a model that can. Whether that trade is worth it depends on the question you have — which is exactly why the fixed-button Context Map is not being retired, but kept alongside it. Two answers to the same need, each giving up something different.

When the map mishears a name

Ask a question out loud and it goes through speech-to-text first, which is good at ordinary words and bad at proper nouns it has never seen — street names most of all. A real example: spoken aloud, “Hannaford Street” came back as “Hanaford”, a letter short, and the street was not found.

The obvious diagnosis was that the search should match names by sound, not spelling. It turned out to be wrong about what had actually failed. The spelling-tolerant match had already found Hannaford Street; the problem was which one it picked. With no sense of where the question was being asked from, a far-off look-alike the same number of edits away (a “Handford” near Ottawa) tied the real street next to you and won. It was a ranking problem wearing a spelling problem’s clothes.

The fix was to anchor every lookup to where you are standing and let closeness break the tie, so the local street wins. With that in place, ordinary one- or two-letter mishears are absorbed by the spelling-tolerant match anyway — “Spadeena” finds Spadina, “Bathert” finds Bathurst — and the right feature beside you comes back.
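The fix can be sketched as a two-key sort: fewest spelling edits first, then nearest first. This is a minimal sketch, not the site's search. It uses plain Levenshtein distance, whereas OpenSearch's fuzzy matching also counts a transposition as one edit by default. It tolerates 0, 1 or 2 edits by term length, the same bands as OpenSearch's `AUTO` fuzziness. It measures closeness with a flat-Earth approximation. The function names and the coordinates in the usage below are illustrative.

```python
import math
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def allowed_edits(term: str) -> int:
    """Edits tolerated for a term of this length (OpenSearch AUTO bands)."""
    n = len(term)
    return 0 if n <= 2 else 1 if n <= 5 else 2


def approx_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Equirectangular approximation; enough to rank by closeness."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(mean_lat)
    dy = math.radians(b[0] - a[0])
    return 6371.0 * math.hypot(dx, dy)


def rank(query: str, here: tuple[float, float],
         places: Iterable[tuple[str, float, float]]) -> list[str]:
    """Names within the edit allowance, fewest edits first, then nearest first."""
    q = query.lower()
    hits = []
    for name, lat, lon in places:
        edits = min(edit_distance(q, word) for word in name.lower().split())
        if edits <= allowed_edits(q):
            hits.append((edits, approx_km(here, (lat, lon)), name))
    hits.sort()
    return [name for _, _, name in hits]
```

"Hanaford" is one edit from both "Hannaford" and "Handford". Asked from a point in Toronto, the tie goes to Hannaford Street on distance; asked from Ottawa, the order flips.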

That leaves the harder mishears, where the spoken word lands more than a letter or two from the real name. The textbook tool for those is phonetic matching: index every name by how it sounds, and match on the sound. I built it, and measured it, before deciding whether to keep it.
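The indexing pattern itself is simple: compute a sound-code for every name once, then look names up by the code of what was heard. This is a minimal sketch using American Soundex as a swapped-in stand-in, because it fits in a few lines. It is much cruder than either encoder discussed here (double metaphone or Beider-Morse) and is not what was built or measured. The function names are illustrative.

```python
from collections import defaultdict

# American Soundex digit for each consonant; vowels, h, w and y have none.
_DIGITS = {ch: d for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                     "4": "l", "5": "mn", "6": "r"}.items()
           for ch in letters}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _DIGITS.get(letters[0], "")
    for c in letters[1:]:
        digit = _DIGITS.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate repeated digits; vowels do
            prev = digit
    return (code + "000")[:4]


def build_phonetic_index(names: list[str]) -> dict[str, list[str]]:
    """Sound-code -> every name that encodes to it."""
    index: dict[str, list[str]] = defaultdict(list)
    for name in names:
        index[soundex(name)].append(name)
    return index


def sounds_like(query: str, index: dict[str, list[str]]) -> list[str]:
    return sorted(index.get(soundex(query), []))
```

Here "Yong", "Yonge" and "Young" all encode to Y520. That is the collision problem in miniature: the code puts them in one bucket, and something else still has to choose between them.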

The common phonetic encoder, double metaphone, proved too coarse to help: it reduces a word to a short sound-code, but the codes collide, so a search for the misheard word drags in hundreds of unrelated names. A precise encoder, Beider-Morse, is far cleaner — it keeps genuine sound-alikes together while letting nonsense fall away.

Matching by sound means giving every name a sound-code. The common
encoder, double metaphone, is too coarse — the codes collide:

    Yonge  ->  ANJ  ANK
    Young  ->  ANK
    Wing   ->  ANK          one code, shared by hundreds of words

A precise encoder, Beider-Morse, keeps real sound-alikes together
and lets nonsense fall away:

    Yong   ->  iank  ionk           (the misheard input)
    Yonge  ->  iank  iongi ...      shares "iank" with Yong
    Young  ->  ionk  iunk  ...      shares "ionk" with Yong
    Hong   ->  ank   onk   ...      shares nothing with Yong

But clean or not, it changed nothing where it mattered. Stand on Yonge Street, say “Yonge”, and speech-to-text writes “Young”. “Young” is a real word — there are Young Cafés and Young Drivers, and they match it exactly. Phonetic matching pulls Yonge Street into the running, but it sits below those exact matches with the sound-code or without it.

You are standing on Yonge Street. You say "Yonge"; speech-to-text
writes "Young". The top matches, with the sound-code and without:

    with phonetic              without (what runs today)
    ------------------------   ------------------------
    Young Drivers of Canada    Young Drivers of Canada
    Young Cafe                 Young Cafe
    Way Young Tech             Way Young Tech
  > Yonge Street               Yonge Street <
    ...                        ...

Identical order. The real word "Young" matches exactly and wins
either way; what puts Yonge on the list at all is that you are
standing on it.

So I left phonetic search on the shelf. The honest reason is that no sound-code can — or should — make “Yonge” beat an exact “Young”; that would break every real search for Young. What actually resolves it is context, and the conversational map already has it: it knows you are standing on Yonge Street, and can simply say so — “you’re on Yonge Street; did you mean that, or Young Drivers of Canada, two hundred metres away?” Re-processing every record in the index, for a heavier index and a result that reorders nothing, was a cost without a benefit. The simpler machinery — spelling-tolerance, closeness, and the model’s knowing where you are — carries it. Phonetic search here is a thing I tried, measured, and chose against, which is a different thing from one I never thought of.

Putting the unnamed map back in

Maps, and map searches especially, are built around names. A named street is findable — you type it and there it is. An unnamed service lane, a footpath cutting across a park, a building nobody has labelled: these are usually dropped from the searchable map, because there is nothing to type to find them.

For a sighted reader that loss is invisible — they see the laneway, the alley, the dense row of buildings, named or not. Reading the map through description, a blind user gets none of it unless it is in the data, and it is exactly the orientation a sighted reader has for free: that you are hemmed in by buildings, that a footpath cuts off to your left, that this block is dense and the next one open. The principle these maps hold to is that the non-visual reader gets the same map the sighted one does; the unnamed texture is part of that map, so it has to go back in.

Putting it back takes care, because there are millions of these features and they must not clutter a search for named places. So they go in marked description-only, carrying no searchable text — the same way the map already handles unnamed water, woodland and parkland: as character that colours a description without ever surfacing in a name search. Within that, the two kinds are stored differently, by how much detail earns its keep.

Unnamed paths and laneways keep their full shape and their accessibility tags. An unnamed footpath’s surface, width and steps are the whole point of an accessible map, so they are worth the space — enough for the map to say “a footpath about twelve metres to your left”.
Anonymous buildings are kept deliberately thin: a centre point and a coarse size — small, medium or large — and nothing more, no outline. That is enough to feel a place’s density (“dozens of buildings within a hundred metres, a couple of them large”) without the index ballooning under the sheer number of them.
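The thinning and the density reading can be sketched together. This is a minimal sketch with a record layout and size cut-offs invented for illustration; the real thresholds and field names are not given here. Each anonymous building becomes a centre point plus a small, medium or large label, flagged description-only. A description then counts what falls within a radius.

```python
import math
from typing import Any

# Hypothetical size bands by footprint area; the real cut-offs are not given.
SMALL_MAX_M2 = 150.0
MEDIUM_MAX_M2 = 1_000.0


def size_bucket(area_m2: float) -> str:
    if area_m2 < SMALL_MAX_M2:
        return "small"
    if area_m2 < MEDIUM_MAX_M2:
        return "medium"
    return "large"


def thin_building(lat: float, lon: float, area_m2: float) -> dict[str, Any]:
    """Keep only a centre point and a coarse size; the outline is discarded."""
    return {
        "kind": "building",
        "location": {"lat": lat, "lon": lon},
        "size": size_bucket(area_m2),
        "description_only": True,  # hypothetical flag: never returned by a name search
    }


def metres_between(a: dict[str, float], b: dict[str, float]) -> float:
    """Equirectangular approximation, fine at a few hundred metres."""
    mean_lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(mean_lat)
    dy = math.radians(b["lat"] - a["lat"])
    return 6_371_000.0 * math.hypot(dx, dy)


def density(here: dict[str, float], buildings: list[dict[str, Any]],
            radius_m: float = 100.0) -> dict[str, int]:
    """How many thinned buildings lie within radius_m, and how many are large."""
    near = [b for b in buildings if metres_between(here, b["location"]) <= radius_m]
    return {"buildings": len(near), "large": sum(b["size"] == "large" for b in near)}
```

The counts are what a description would turn into words such as "dozens of buildings within a hundred metres, a couple of them large".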

The cost is real but modest, and I measured it rather than guessed. Recovering the unnamed features roughly doubles the feature count in a dense city, but because the thinned buildings are so cheap to store, it adds only a few gigabytes across the whole of Canada. A deliberate trade: more to keep, in exchange for a map that can describe the spaces between the named things, not only the named things themselves.

Both decisions are recent. The spelling-tolerant, closeness-ranked search is already live; the unnamed features are being folded into the map now. This is a test, learned from in the open — the reasoning is written down because a decision you can see is one you can argue with.

Source

GPL-3.0, part of the tiled Toronto map project — github.com/bobdodd/tiled-toronto-map. The place data is derived from OpenStreetMap, © OpenStreetMap contributors, under ODbL.