Perturbation testing

Also known as: Counterfactual testing, Template-based testing

A bias evaluation methodology for NLP models that systematically substitutes identity-related terms (e.g., disability phrases) into otherwise identical sentences to measure whether the model's predictions change because of the identity mention alone. Because all other text is held constant and only the identity reference varies, any difference in output can be attributed to the model's learned associations with the mentioned group, and the size of that difference quantifies the bias.
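The procedure described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `perturbation_gaps` helper, the `{phrase}` placeholder convention, the example templates and phrases, and the deliberately biased `toy_score` function standing in for a real model. It does not reproduce the setup of any particular study.

```python
from statistics import mean
from typing import Callable, Dict, List


def perturbation_gaps(
    score: Callable[[str], float],
    templates: List[str],
    phrases: List[str],
    baseline: str,
) -> Dict[str, float]:
    """Return the mean score shift per phrase, relative to a baseline phrase.

    Each template is filled once with the identity phrase and once with the
    baseline, so the two sentences differ only in that mention.
    """
    gaps = {}
    for phrase in phrases:
        diffs = [
            score(t.format(phrase=phrase)) - score(t.format(phrase=baseline))
            for t in templates
        ]
        gaps[phrase] = mean(diffs)
    return gaps


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a model: scores any mention of 'deaf' higher."""
    return 0.9 if "deaf" in text else 0.1


# Illustrative templates and phrases (assumptions, not a published benchmark).
templates = ["I am {phrase}.", "My neighbor is {phrase}."]
gaps = perturbation_gaps(
    toy_score,
    templates,
    phrases=["a deaf person", "a tall person"],
    baseline="a person",
)

for phrase, gap in gaps.items():
    print(f"{phrase}: mean score shift = {gap:+.2f}")
```

A gap near zero suggests the score is insensitive to the substituted phrase, while a large gap indicates that the identity mention alone shifts the prediction. In practice `toy_score` would be replaced by a call to the model under evaluation, and many more templates would be used.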

Category: artificial intelligence · evaluation · ethics

Related: Algorithmic bias · Benchmark dataset · Toxicity detection · Representational harm

Sources

https://doi.org/10.1145/3386296.3386305