Usage of Subjective Scales in Accessibility Research

Shari Trewin, Diogo Marques, Tiago Guerreiro · 2015 · ASSETS '15: Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility · doi:10.1145/2700648.2809867

Summary

This methodological paper investigates whether positive response bias affects subjective ratings in accessibility research, a critical concern given how widely Likert-type items are used to evaluate assistive technologies. The authors took two complementary approaches: a systematic analysis of ASSETS papers from 2010-2014, and an experimental study comparing responses from participants with visual impairment with those from students without disabilities.

The literature review examined 51 studies that used Likert-type items with participants facing accessibility challenges. Only 8% allowed anonymous response submission; 94% used custom-designed items rather than validated instruments such as SUS or NASA-TLX; 75% of items were positively worded; and the mean rating (3.64 on a 5-point scale) exceeded the typical HCI study baseline (3.55). Ratings of proposed innovations were higher still (3.74), whereas ratings of existing technologies averaged 2.9. Most studies administered questions verbally, which is often necessary for accessibility but can increase social desirability bias.

The experimental study recruited 16 participants with visual impairment and 16 graduate students to use two telephone information systems: one well-designed ("Usable") and one deliberately degraded with usability problems. Both groups completed four information-retrieval tasks on each system and rated the ease of each task using Sauro's validated Single Ease Question (7-point scale).

Key findings

The experimental results revealed striking differences between the groups. Both groups completed all tasks successfully, but the student group rated the degraded system significantly lower than the usable system (mean difference 0.86, p<0.0001), demonstrating sensitivity to usability problems. The visually impaired group showed no significant difference in ratings between the two systems (mean difference 0.13, p>0.025), even though they took longer and made more navigation errors on the degraded system. The correlation between subjective ratings and objective task completion times was also substantially weaker for participants with visual impairment (r=0.50) than for students (r=0.70). Individual cases illustrate the disconnect: one participant spent over 200 seconds on a task after following a suboptimal navigation path, then rated its ease at 6/7; another took unnecessary steps and still rated the task 7/7.

The authors propose several explanations: daily experience with accessibility barriers may create a lower usability baseline, so that accessible-but-flawed systems seem "very easy" by comparison; participants may want to encourage the researchers; social desirability bias may differ between populations; or the tasks may have been genuinely easier for participants accustomed to auditory interfaces. Whatever the cause, the ceiling effect produced by uniformly high ratings masked genuine usability differences that were detectable in the objective measures.
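The two comparisons reported above (the mean within-participant rating difference between systems, and the correlation between ease ratings and completion times) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data below are invented, not taken from the paper, and the helper names `mean_paired_difference` and `pearson_r` are hypothetical. With ease ratings, longer times tend to go with lower ratings, so the coefficient in this toy data is negative; the strength of the relationship is read from its magnitude.

```python
from math import sqrt
from statistics import mean


def mean_paired_difference(usable, degraded):
    """Mean of per-participant rating differences (usable minus degraded)."""
    return mean(u - d for u, d in zip(usable, degraded))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented 7-point ease ratings for four participants per group.
sensitive_usable, sensitive_degraded = [7, 6, 7, 6], [6, 5, 6, 5]
ceiling_usable, ceiling_degraded = [7, 7, 6, 7], [7, 6, 7, 7]

# Invented task times (seconds) and the ratings given after each task.
times = [30, 45, 60, 90, 120, 200]
sensitive_ratings = [7, 6, 6, 4, 3, 1]  # ratings fall as tasks take longer
ceiling_ratings = [7, 7, 6, 7, 6, 6]    # ratings stay near the top of the scale

print("rating drop, sensitive group:",
      mean_paired_difference(sensitive_usable, sensitive_degraded))
print("rating drop, ceiling group:",
      mean_paired_difference(ceiling_usable, ceiling_degraded))
print("r, sensitive group:", round(pearson_r(times, sensitive_ratings), 2))
print("r, ceiling group:", round(pearson_r(times, ceiling_ratings), 2))
```

When ratings cluster at the top of the scale, there is little variance left for the ratings to track task time, which is one way a ceiling effect can weaken the observed correlation.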

Relevance

This paper is essential reading for anyone conducting user research with people with disabilities. The findings suggest that standard usability metrics and their benchmarks may not transfer directly to accessibility contexts: a SUS score of 70 (typically considered "good") may represent a different threshold for assistive technologies. Researchers who receive strongly positive subjective ratings should consider whether bias is inflating the results before concluding that their innovations are effective. The authors' actionable recommendations include: use validated scales with balanced item wording; consider electronic rather than verbal administration where feasible; use verbal labels rather than just numbers when presenting scales verbally; report presentation methods precisely; and interpret positive ratings cautiously. The finding that 94% of accessibility studies use custom items rather than validated instruments points to a significant methodological gap the field should address. For practitioners, the study highlights a nuanced point: when subjective ratings seem at odds with objective measures, this may reflect a genuine difference between disabled users' priorities and researchers' assumptions, not just bias. However, distinguishing meaningful preference differences from response artifacts requires careful study design and interpretation.
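To make the recommendation about balanced item wording concrete, the sketch below applies the standard SUS scoring rule. This rule is general knowledge about SUS rather than something described in this summary: odd-numbered items are positively worded and contribute (response - 1), even-numbered items are negatively worded and contribute (5 - response), and the sum is multiplied by 2.5. The function name `sus_score` is a hypothetical helper.

```python
def sus_score(responses):
    """Score one SUS questionnaire: ten responses, each from 1 to 5."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: agreement raises the score.
            total += response - 1
        else:
            # Even items are negatively worded: agreement lowers the score.
            total += 5 - response
    return total * 2.5


# A respondent who strongly agrees with every statement, whatever its wording.
print(sus_score([5] * 10))

# A respondent whose answers are consistently favourable to the system.
print(sus_score([5, 1] * 5))
```

Because half the items are reverse-scored, a participant who simply agrees with everything lands at the midpoint of the scale instead of the top, so indiscriminate agreement cannot by itself produce a high score. Custom items that are all positively worded offer no such protection.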

Tags: research methodology · Likert scales · response bias · usability testing · accessibility research · user studies · visual impairment · evaluation methods