Demo of VJ-Voicebot: Control of Robotic Arm with the Vocal Joystick

Brandi House, Jon Malkin, Jeff Bilmes · 2007 · Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '07) · doi:10.1145/1296843.1296895

Summary

This demonstration paper from the University of Washington presents VJ-Voicebot, a system that lets individuals with motor disabilities continuously control a 5-degree-of-freedom robotic arm using non-verbal vocal sounds. The system builds on the Vocal Joystick (VJ) engine, which exploits the flexibility of the human vocal tract to provide continuous control parameters without the constraints of spoken language. The VJ engine extracts pitch, loudness, and vowel quality from continuous vocal sounds in real time: vowel quality maps to a 2D direction space, while loudness controls speed. This approach differs fundamentally from spoken-language commands (which are discrete and limited) and from brain-computer interfaces (which are invasive and expensive). The hardware is a Lynx6 hobbyist robotic arm by Lynxmotion with five joints (shoulder rotation, shoulder bend, elbow bend, wrist rotation, and wrist bend) plus a two-prong gripper for manipulating small objects. Beyond the arm itself, only a standard headset microphone and sound card are needed. Two control methods are implemented. In inverse kinematics mode, the user specifies the gripper position in Cartesian space using four vowels and pitch, and the joint angles are calculated automatically. In forward kinematics mode, the user explicitly controls each joint angle, which requires more thought but offers direct control.
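
To make the inverse kinematics idea concrete, here is a minimal sketch of how joint angles can be computed from a Cartesian gripper target. It is not the authors' implementation: it assumes a simplified arm with a rotating base and two links (shoulder bend and elbow bend), ignores the wrist joints, and uses placeholder link lengths rather than the Lynx6's real dimensions. The function names `inverse_kinematics` and `forward_kinematics` are hypothetical.

```python
import math


def inverse_kinematics(x, y, z, upper_len=0.12, fore_len=0.12):
    """Return (base, shoulder, elbow) angles in radians for a target point.

    Assumption: a base rotating about the vertical axis plus a two-link
    planar arm. Link lengths (metres) are placeholders, not Lynx6 values.
    """
    base = math.atan2(y, x)   # rotate the arm's plane toward the target
    r = math.hypot(x, y)      # horizontal reach within that plane
    d2 = r * r + z * z        # squared shoulder-to-target distance

    # Law of cosines gives the elbow angle.
    cos_elbow = (d2 - upper_len**2 - fore_len**2) / (2 * upper_len * fore_len)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target is out of reach")
    elbow = math.acos(cos_elbow)  # one of the two mirror-image solutions

    # Shoulder angle: direction to the target minus the offset caused by the elbow.
    shoulder = math.atan2(z, r) - math.atan2(
        fore_len * math.sin(elbow), upper_len + fore_len * math.cos(elbow)
    )
    return base, shoulder, elbow


def forward_kinematics(base, shoulder, elbow, upper_len=0.12, fore_len=0.12):
    """Return the (x, y, z) gripper position for the given joint angles."""
    r = upper_len * math.cos(shoulder) + fore_len * math.cos(shoulder + elbow)
    z = upper_len * math.sin(shoulder) + fore_len * math.sin(shoulder + elbow)
    return r * math.cos(base), r * math.sin(base), z
```

Feeding the angles returned by `inverse_kinematics` into `forward_kinematics` should reproduce the original target, which is a convenient sanity check for any solver of this kind.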

Key findings

The system demonstrated real-time vocal control of all five arm joints plus the gripper. In inverse kinematics mode, the vowels "uw" (boot) and "ae" (cat) move the gripper closer to or farther from the base, "iy" (feet) and "aw" (flaw) move it left or right, and pitch changes raise or lower it. A "ch" sound opens and closes the gripper, and a "k" sound switches to wrist orientation mode. In forward kinematics mode, only three degrees of freedom (DOF) can be controlled simultaneously: the two shoulder joints are each controlled by vowel sounds, and elbow bend is controlled by pitch. As future work, the authors proposed semi-autonomous control in which VJ-Voicebot learns trajectories for frequently used movements (e.g., training the arm to perform a "drink" command), reducing vocal fatigue. They also noted that practical deployment would require sensors for collision detection and strain protection, analogous to proprioception in the human nervous system.
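
The inverse kinematics mapping described above can be sketched as a single update step that turns one frame of vocal features into a new gripper target. This is an illustrative sketch, not the authors' code: the function name `step_gripper`, the sign convention assigned to each vowel, the normalized loudness range, and the gain values are all assumptions.

```python
# Assumed sign convention: the summary pairs "uw"/"ae" with closer/farther and
# "iy"/"aw" with left/right, so each pair is mapped in that order here.
VOWEL_DIRECTIONS = {
    "uw": (-1.0, 0.0),  # toward the base (assumed)
    "ae": (1.0, 0.0),   # away from the base (assumed)
    "iy": (0.0, 1.0),   # left (assumed)
    "aw": (0.0, -1.0),  # right (assumed)
}


def step_gripper(position, vowel, loudness, pitch_delta, dt,
                 max_speed=0.05, pitch_gain=0.01):
    """Return the new (x, y, z) gripper target after one control step.

    position    -- current (x, y, z) target in metres
    vowel       -- detected vowel label; unknown labels cause no planar motion
    loudness    -- normalized loudness in [0, 1] (assumed scale), sets speed
    pitch_delta -- pitch change since the last frame (assumed units)
    dt          -- time step in seconds
    """
    x, y, z = position
    dx, dy = VOWEL_DIRECTIONS.get(vowel, (0.0, 0.0))
    speed = max_speed * min(max(loudness, 0.0), 1.0)  # louder means faster
    return (
        x + dx * speed * dt,
        y + dy * speed * dt,
        z + pitch_gain * pitch_delta * dt,  # rising pitch raises the gripper
    )
```

In a full system, the resulting target would be passed to an inverse kinematics solver on every frame, so that sustained vowels produce smooth, proportional motion.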

Relevance

This paper applies continuous voice-based control to assistive robotics, addressing a real need for people with severe motor impairments who cannot operate physical joysticks or use their limbs but retain vocal ability. The non-verbal vocal approach is significant because it provides continuous, proportional control (like a joystick) rather than the discrete, command-by-command interaction typical of speech recognition, which enables smooth, real-time manipulation of physical objects. For practitioners, the contrast between inverse kinematics (intuitive Cartesian positioning) and forward kinematics (direct joint control) illustrates a design tradeoff in assistive robotics between ease of use and precision. The suggestion of learned semi-autonomous trajectories for common tasks points toward a broader principle: assistive systems should adapt to reduce user effort over time, particularly when the input method is physically fatiguing.

Tags: assistive robotics · voice interface · motor impairment · continuous voice control · robotic arm · vocal joystick · alternative input · independent living