Red Teaming

Also known as: Generative Red-Teaming, AI Red Teaming

A structured evaluation practice in which an adversarial team probes a system — traditionally a network or application, increasingly an AI model or conversational agent — with realistic attack scenarios to find failures before malicious actors do. Generative red-teaming specifically targets LLM and generative-AI outputs with jailbreak, prompt-injection, instruction-hierarchy, and role-play attacks. In accessible AI products, red-teaming should explicitly cover disability-relevant misuse patterns (impersonation of a caregiver, extraction of health disclosures, misinformation that exploits information asymmetries).

Category: Security · AI · Research Methods

Related: Jailbreak · Prompt Injection · Threat Modeling

Sources

https://www.nist.gov/itl/ai-risk-management-framework