Audio Tetris

A Java/JOAL audio rendering of the most visual of games, built as the doctoral framework’s deliberate falsification test. Along the way it revealed, by accident, that the modality shift turned a third-person observational game into a first-person immersive one.

The person

The audio Tetris was built as the rendering case study in the doctoral work. Audio testers included Bob’s husband, Taodi, who appears by name in the case-study chapter: “Taodi took a while to understand...” A working tool tested on a real listener, not a theoretical exercise.

The constraint

Tetris is a paradigmatically visual game: falling tiles, terrain shape, fit quality, line completion. The rendering case study had to express all of that in audio, richly enough that the player could actually play the game. The information channels needed: type of falling tile, position, orientation, terrain shape under the falling tile, fit quality of any given placement, gravity (rate of fall), line completion, scoring. Audio is sequential by default, whereas the visual scene is parallel.

Tetris was deliberately chosen as a falsification test, not as a teaching example. From the case study:

“What really defeats existing assistive technology is the proximal content inherent in the game — rotating and guiding falling shapes to match gaps on the floor of the grid. If the approach in this research to accessibility is truly better than existing AT, then one would expect to see it succeed in this proximal context.”

The doctoral framework hung on Tetris working.

The artefact

A Java implementation using JOAL (Java bindings for the OpenAL audio API), with seven audio metaphors, each developed and iterated:

Aside — whispering the type of the next tile and the contents of the hold box into the player’s right ear.
Musical sonar — a single note for each column of the falling tile’s width, played in sequence around the user; higher notes mean better fit. “It works surprisingly well. Well, once you get the idea.”
Dancing margins — sounds left and right of the player, with 3D distance expressing the grid distance to the play-area edges. Iterated several times; the 3D audio engine wasn’t great, so the implementation settled on a dance in music rather than a dance in location.
Talking scrollbar — the falling-tile sound played left, middle, or right to locate the tile horizontally.
Direction-as-direction — animated sounds passing the player in N/S/E/W directions to indicate orientation. Eventually replaced with a separate spoken voice in a different register from the tile description, because the 3D audio quality wouldn’t support the directional metaphor.
Gravity as waterfall — ambient sound of falling water, manipulating volume and pitch over time so the water feels nearer. Implemented as a point source after experimentation.
Braided audio — interleaved play-out of musical sonar and dancing margins, with prioritisation (two scans of sonar to one of margins) as a way to share the audio resource and express importance simultaneously. Adapted from Schmandt’s “Audio Hallway” navigation approach.
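Four of these metaphors (musical sonar, talking scrollbar, gravity as waterfall and the two-to-one braid) can be made concrete with a small Java sketch. It is not the thesis code. The class and method names, the terrain representation (column heights), the fit measure (1 / (1 + empty cells left under a column)), the MIDI note range, the three-zone scrollbar split and the waterfall ramp are all hypothetical choices for illustration. The sketch only decides what to play; a real renderer would hand these values to JOAL sources, for example as gain and pitch settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of four of the audio metaphors. It only computes what
 * to play (notes, pan zone, gain/pitch, ordering); it makes no JOAL calls.
 */
public class AudioTetrisSketch {

    /**
     * Musical sonar: one MIDI note per column under the falling tile.
     * heights[c] = filled cells in terrain column c (assumed representation);
     * bottom[i]  = height of the tile's lowest cell in its i-th column;
     * left       = grid column of the tile's leftmost column.
     * Assumed fit measure: 1 / (1 + gap), where gap is the number of empty
     * cells the tile would leave under that column. Better fit, higher note.
     */
    static int[] sonarNotes(int[] heights, int[] bottom, int left) {
        // Row at which the tile comes to rest: the highest support it meets.
        int rest = Integer.MIN_VALUE;
        for (int i = 0; i < bottom.length; i++) {
            rest = Math.max(rest, heights[left + i] - bottom[i]);
        }
        int[] notes = new int[bottom.length];
        for (int i = 0; i < bottom.length; i++) {
            int gap = rest + bottom[i] - heights[left + i]; // never negative
            double fit = 1.0 / (1 + gap);                   // 1.0 = flush
            notes[i] = 60 + (int) Math.round(12 * fit);     // MIDI 60..72
        }
        return notes;
    }

    /** Talking scrollbar: -1 = left, 0 = middle, +1 = right third of the grid. */
    static int scrollbarZone(int tileCentreCol, int gridWidth) {
        return tileCentreCol * 3 / gridWidth - 1;
    }

    /**
     * Gravity as waterfall: assumed linear ramp, louder and slightly higher
     * as the tile falls. Returns {gain, pitch} as multipliers of the kind
     * OpenAL's AL_GAIN and AL_PITCH source properties take.
     */
    static float[] waterfall(int rowsFallen, int rowsTotal) {
        float t = Math.min(1f, (float) rowsFallen / rowsTotal);
        return new float[] {0.2f + 0.8f * t, 1.0f + 0.25f * t};
    }

    /**
     * Braided audio: interleave whole scans, sonarPerMargin sonar scans per
     * margins scan (2 in the text), so each sweep is heard intact and the
     * more important stream simply gets more of the shared channel.
     */
    static List<String> braid(List<String> sonarScan, List<String> marginScan,
                              int sonarPerMargin, int cycles) {
        List<String> out = new ArrayList<>();
        for (int c = 0; c < cycles; c++) {
            for (int s = 0; s < sonarPerMargin; s++) {
                out.addAll(sonarScan);
            }
            out.addAll(marginScan);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] terrain = {0, 0, 1, 2, 2, 1, 0, 0, 0, 0};
        int[] flatS = {0, 0, 1}; // horizontal S tile: right column sits one higher
        System.out.println(Arrays.toString(sonarNotes(terrain, flatS, 0))); // [72, 72, 72]
        System.out.println(Arrays.toString(sonarNotes(terrain, flatS, 3))); // [72, 72, 64]
        System.out.println(scrollbarZone(8, 10));                           // 1
        System.out.println(Arrays.toString(waterfall(10, 20)));
        System.out.println(braid(List.of("s1", "s2"), List.of("mL", "mR"), 2, 1));
        // [s1, s2, s1, s2, mL, mR]
    }
}
```

Braiding whole scans rather than individual notes is one reading of the “two scans of sonar to one of margins” rule: each sweep stays intact for the listener, and priority shows up purely as share of airtime.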

The insight

Converting Tetris from visual to audio turned a third-person observational game into a first-person immersive one. And it wasn’t deliberate.

From the chapter:

“The game became immersive because the player became the centre of all interaction modalities. The tile moves relative to the player (and simultaneously, the distance of the margins from the tile are described relatively to the position of the user), gravity ebbs and flows towards the user, and the sonar plays out around the player.”

The audio metaphors were built to test specific information-channel hypotheses. The immersion re-framing was the by-product, and it raises a question that the original research question didn’t anticipate: when assistive tech translates from one modality to another, is it merely changing the channel, or is it changing the experience itself?

The teaching

Two pieces, both of which generalise beyond Tetris.

The sonic design space is naturally immersive. Bob looked for observational audio metaphors as alternatives to the immersive ones, and “came up empty.” When the modality is audio, the player is at the centre of the perceptual field by default; the observational stance that the visual version of Tetris encouraged is not available without active engineering effort to suppress immersion. The bias of the modality matters.

Current screen-reader assistive tech is therefore an extremely narrow slice of what audio accessibility could be. Most of today’s observational tools (screen readers, text-to-speech of visual UIs) translate visual content into a single linear audio channel, losing parallelism, losing positional information, losing the option for immersion. Audio interfaces can be richly immersive; the dominant assistive-tech approaches just don’t take that option. When we build assistive tech, the question worth asking is not just “can the user access the content?” but “what experience are we offering?” Two different questions, two different success criteria.

The closure

A perfect closure exists in the Personas appendix of the doctoral framework. From David Furness’s persona — profoundly deaf, protanopic — comes the line:

“Even a simple game such as Tetris is a problem on versions with a black background as one of the standard shapes — a long red rectangle is essentially invisible to him.”

The very game chosen as the framework’s hardest test of non-visual play also fails, in its standard visual form, for a colour-blind user. The framework exists to handle exactly this case. The case study and the Personas appendix close the loop on each other.

Reading on

Tetris as accessibility testbed — the methodology framing that this artefact tested.