Accuracy and Reliability of At-Home Quantification of Motor Impairments Using a Computer-Based Pointing Task with Children with Ataxia-Telangiectasia
Vineet Pandey, Nergis C. Khan, Anoopum S. Gupta, Krzysztof Z. Gajos · 2023 · ACM Transactions on Accessible Computing · doi:10.1145/3581790
Summary
This paper validates at-home, unsupervised use of Hevelius, a web-based active digital phenotyping system that quantifies motor impairment in the dominant arm through mouse pointing tasks. The study addresses a critical gap: while prior research showed that crowdsourced behavioral data produces valid aggregate results, it remained unclear whether individual-level measurements could be accurate and reliable enough for clinical or accessibility applications when collected without professional supervision. Hevelius presents participants with a series of pointing tasks—clicking on targets of varying sizes and distances—and extracts 32 measures from the mouse cursor movement trajectories, including movement time, peak speed, normalized jerk (movement smoothness), number of pauses, and directional changes. These measures are converted to age-specific z-scores using normative data from 229,017 participants collected via LabintheWild, then fed into a regression model that estimates the dominant arm component of the Brief Ataxia Rating Scale (BARS), a clinician-administered assessment. The study involved 13 children with ataxia-telangiectasia (A-T)—a rare, progressive neurological disorder causing increasing motor impairment—and 9 healthy siblings as controls. Children first completed a supervised session at a clinic event, then used Hevelius at home weekly for up to 14 weeks with caregiver assistance but without researcher supervision. The system was adapted for at-home use with personalized minimum target sizes (calibrated during the supervised session) and a "test drive" mode for caregivers.
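The scoring pipeline described above (raw measure, then age-specific z-score, then regression estimate) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the normative values in NORMS, the function names age_z_score and linear_estimate, and the choice of a plain linear model with made-up weights. This summary does not specify the form or coefficients of the regression model that Hevelius actually uses.

```python
from statistics import mean, stdev

# Hypothetical normative samples of one measure (movement time, in seconds),
# keyed by age in years. Real norms would come from the LabintheWild data.
NORMS = {
    8: [1.0, 1.2, 1.4],
    12: [0.7, 0.8, 0.9],
}


def age_z_score(value, age, norms=NORMS):
    """Express a raw measure as standard deviations from the same-age mean."""
    sample = norms[age]
    return (value - mean(sample)) / stdev(sample)


def linear_estimate(z_scores, weights, intercept):
    """Combine z-scored measures with a linear model (an assumed model form)."""
    return intercept + sum(w * z for w, z in zip(weights, z_scores))


if __name__ == "__main__":
    # The same raw movement time is far more unusual for a 12-year-old
    # than for an 8-year-old, which is the point of age-specific norms.
    print(age_z_score(1.4, 8))
    print(age_z_score(1.4, 12))
```

Normalizing against same-age peers means a slow movement time counts against a child only to the extent that it is slow for that age, which helps keep developmental differences from being read as impairment.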
Key findings
Single unsupervised sessions produced BARS estimates with higher mean absolute error (MAE = 0.57) than single supervised sessions (MAE = 0.53), as expected given the potential for interruptions and distractions at home. However, taking a median of just 2 consecutive unsupervised sessions reduced MAE to 0.49—better than single supervised sessions. MAE continued improving with more sessions, reaching 0.46 with 10 sessions aggregated. Test-retest reliability (measured via Intraclass Correlation Coefficient) was moderate for single sessions (ICC = 0.67 for A-T participants) but reached "good" levels (ICC = 0.81) when aggregating 2 consecutive sessions and "excellent" (ICC = 0.92) with 4 sessions. Six of the 32 individual measures showed good reliability (ICC ≥ 0.75) with just 2 sessions: movement time, number of pauses, longest pause duration, execution time, click duration, and normalized jerk. Other measures, including movement offset and variability measures, showed poor reliability even with 4 sessions aggregated. Participant mood, fatigue, and sleep quality (self-reported) did not significantly explain session-to-session variability in BARS estimates, suggesting other unmeasured factors drive the variance. The median session took 11 minutes; caregivers reported some children found the task frustrating, and participants developed compensatory strategies like using their non-dominant hand to stabilize their mouse arm.
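The aggregation step behind these results, taking the median of k consecutive session estimates and comparing it with a reference score, can be sketched as follows. The session values and the reference score in the example are invented for illustration, and the helper names rolling_median and mean_absolute_error are hypothetical; the sketch shows the general technique, not the authors' analysis code.

```python
from statistics import median


def rolling_median(estimates, k):
    """Median of every window of k consecutive session estimates."""
    return [median(estimates[i:i + k]) for i in range(len(estimates) - k + 1)]


def mean_absolute_error(predictions, reference):
    """Average absolute difference between predictions and one reference score."""
    return sum(abs(p - reference) for p in predictions) / len(predictions)


if __name__ == "__main__":
    # Invented weekly estimates for one participant, and an invented
    # clinician-assigned reference score.
    sessions = [1.0, 2.0, 1.5, 3.0]
    reference = 1.5

    single_mae = mean_absolute_error(sessions, reference)
    paired_mae = mean_absolute_error(rolling_median(sessions, 2), reference)
    print(single_mae, paired_mae)
```

The median is less sensitive than the mean to a single session disrupted by an interruption or distraction, which is one plausible reason aggregating even a couple of home sessions can offset the noise of unsupervised use.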
Relevance
This research has significant implications for both accessibility and clinical applications. For ability-based assistive technology design, the 32 measures provide a comprehensive framework for quantifying motor performance that could inform personalized interface adaptations—the measures distinguish different aspects of impairment (speed, accuracy, smoothness, consistency) that may require different accommodations. The finding that 2-3 unsupervised sessions produce reliable individual measurements opens possibilities for longitudinal motor assessment in accessibility research without requiring lab visits. This is particularly valuable for populations with rare conditions (like A-T) where participants are geographically dispersed, and for tracking progressive conditions where frequent monitoring is beneficial. The age-specific z-score approach—comparing performance against normative data for that age—is methodologically important for separating disability effects from developmental effects in pediatric populations. For practitioners, the specific measures identified as reliable (movement time, pause characteristics, jerk) could be prioritized in simpler assessment tools, while unreliable measures (movement offset, variability metrics) may not be suitable for individual-level decisions without substantial aggregation. The study also demonstrates the value of disease foundation partnerships for recruiting rare disease participants.
Tags: motor impairment · digital phenotyping · remote assessment · ataxia · pointing tasks · pediatric · ability-based design · mouse input