A Multi-Path Architecture for Machine Translation of English Text into American Sign Language Animation

Matt Huenerfauth · 2004 · Proceedings of the Student Research Workshop at HLT-NAACL 2004 · doi:10.5555/1614038.1614043

Summary

Huenerfauth's 2004 student-research-workshop paper proposes a 'multi-path' (or 'pyramidal') architecture for English-to-American Sign Language (ASL) machine translation. It combines the three classical MT paradigms (direct, transfer, and interlingua) in a single system and routes each input sentence to the pathway best suited to its linguistic demands. The motivation is accessibility: most deaf U.S. high school graduates read English at a fourth-grade level, so accessibility aids that assume strong English literacy (closed captioning, teletype telephony) exclude users who are fluent in ASL but struggle with written English. The translation problem is unusually hard because ASL is a visual-spatial language with no standard written form and no naturally occurring text corpora, which makes statistical MT impractical. ASL also uses 'classifier predicates': hand movements that iconically trace objects and paths in the 3D space in front of the signer, so that 'the car drove up the bumpy road' becomes a single hand tracing a bumpy up-and-down path. The paper proposes an interlingua based on a 3D virtual-reality scene model, reusing the Natural Language Instructions (NLI) system and its Parameterized Action Representations (PARs), originally built to drive animated agents from English commands. In this architecture, simple sentences take the direct pathway (bilingual lexicon plus reordering), moderately complex sentences take the transfer pathway (syntactic/semantic analysis with transfer rules), and spatially descriptive sentences that require classifier predicates take the interlingual pathway, which builds a 3D scene model and uses it to generate topologically faithful hand movements for the signing avatar.
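The routing idea can be illustrated with a minimal Python sketch. The paper does not specify how sentences are assigned to pathways, so the keyword cue lists, the `choose_pathway` function, and the `Pathway` enum below are all hypothetical stand-ins; a real system would presumably rely on syntactic and semantic analysis rather than keyword matching.

```python
import re
from enum import Enum


class Pathway(Enum):
    DIRECT = "direct"            # bilingual lexicon plus reordering
    TRANSFER = "transfer"        # syntactic/semantic analysis with transfer rules
    INTERLINGUA = "interlingua"  # 3D scene model driving classifier predicates


# Hypothetical cue lists invented for this sketch; they are not from the paper.
SPATIAL_CUES = ("drove", "parked", "next to", "behind", "between", "turn left")
DIVERGENCE_CUES = ("because", "although", "which", "if")


def _contains_cue(sentence: str, cues) -> bool:
    """Return True if any cue occurs as a whole word or phrase in the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    padded = " " + " ".join(words) + " "
    return any(f" {cue} " in padded for cue in cues)


def choose_pathway(sentence: str) -> Pathway:
    """Pick the simplest pathway that can plausibly handle the sentence."""
    # Spatial descriptions need classifier predicates, so only the
    # interlingual pathway can handle them.
    if _contains_cue(sentence, SPATIAL_CUES):
        return Pathway.INTERLINGUA
    # Non-spatial structural divergence is left to the transfer pathway.
    if _contains_cue(sentence, DIVERGENCE_CUES):
        return Pathway.TRANSFER
    # Everything else falls through to the cheapest pathway.
    return Pathway.DIRECT


for text in ("The car drove up the bumpy road.",
             "She stayed home because it rained.",
             "I like coffee."):
    print(text, "->", choose_pathway(text).value)
```

The checks run from the most demanding pathway to the least, so a sentence only reaches the direct pathway when nothing suggests it needs more machinery.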

Key findings

Because this is a design paper rather than an evaluation, its contributions are architectural rather than empirical. Huenerfauth argues that any single MT paradigm is ill-suited to English-to-ASL: direct systems cannot handle anything beyond simple reorderings and default to Signed Exact English-style word-for-word output; transfer systems can handle some non-spatial divergence but cannot generate classifier predicates; and a pure interlingua would require prohibitively expensive linguistic and world-knowledge resources if it tried to cover arbitrary English. The multi-path architecture sidesteps this by deliberately limiting each pathway's coverage: the interlingual pathway only has to represent the narrow class of domains where classifier predicates matter (vehicle motion, furniture arrangement, giving directions, spatial descriptions), while the transfer and direct pathways catch the rest. A second contribution is the proposal that a 3D virtual-reality scene model can itself act as an interlingua for the classifier-predicate subset of ASL, with object coordinates and motion paths serving as a language-neutral semantic representation. Classifier predicates occur at least once every 100 signs (Morford and MacFarlane 2003), so ignoring them, as existing transfer-based systems do, yields output that is fluent only for a limited subset of English input. The paper also notes that non-topological use of signing space (e.g., pronominal reference tokens positioned in space) is a separate, non-trivial problem, and one that the VR model can also accommodate.
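To make the "scene model as interlingua" idea concrete, here is a minimal Python sketch in which an object's motion path in world coordinates is shrunk into a small signing-space volume in front of the signer. The `SceneObject` class, the `to_signing_space` function, the 0.4 m box size, and the choice of y as the vertical axis are all assumptions made for illustration; the paper's actual representation builds on the NLI system and PARs and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z); y is treated as "up" (assumption)


@dataclass
class SceneObject:
    name: str
    classifier: str     # hypothetical label for the handshape class, e.g. "vehicle"
    path: List[Point]   # motion path in world coordinates; assumed non-empty


def to_signing_space(path: List[Point], box_size: float = 0.4) -> List[Point]:
    """Uniformly scale and centre a world path inside a cube of side box_size.

    Uniform scaling plus translation keeps the shape of the path, so bumps
    and turns in the scene remain bumps and turns in the hand trajectory.
    """
    mins = [min(p[i] for p in path) for i in range(3)]
    maxs = [max(p[i] for p in path) for i in range(3)]
    # Largest extent along any axis; fall back to 1.0 for a stationary object.
    extent = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    scale = box_size / extent
    centre = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    return [tuple((p[i] - centre[i]) * scale for i in range(3)) for p in path]


# "The car drove up the bumpy road": the height alternates along the route.
car = SceneObject(
    name="car",
    classifier="vehicle",
    path=[(0.0, 0.0, 0.0), (10.0, 2.0, 0.0), (20.0, 0.0, 0.0), (30.0, 2.0, 0.0)],
)
hand_path = to_signing_space(car.path)
print(car.classifier, hand_path)
```

Nothing in this representation is English or ASL specific: the same coordinates could in principle be filled in from an English description and read out as a hand trajectory, which is what makes the scene model a candidate interlingua for this narrow class of sentences.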

Relevance

For accessibility practitioners, this paper is valuable for framing why English-to-ASL captioning and translation systems have not simply been solved by the general MT progress of the past two decades: ASL's spatial grammar creates phenomena no text-to-text MT architecture can produce, so generic commercial MT cannot substitute for human ASL interpreters on the core cases that matter most. The literacy-gap framing reinforces that English captioning is not equivalent to ASL access for a substantial fraction of deaf users. The multi-path insight, routing each sentence to the simplest pathway that can handle it, remains relevant to current sign-language translation pipelines and signing-avatar products, which still struggle with classifier predicates and spatial grammar. The limitations are typical of early design papers: no implementation, no evaluation with Deaf users, and an architecture that presumes English input rather than addressing bidirectional ASL-to-English needs.

Tags: ASL · American Sign Language · deaf accessibility · sign language · sign language animation · sign language generation · sign language machine translation · machine translation · MT architecture · interlingua · transfer machine translation · direct machine translation · signing avatar · animation · computational linguistics · virtual reality · classifier predicates · English literacy · natural language generation