Jailbreak

Also known as: LLM Jailbreak, AI Jailbreak

In the context of generative AI, a class of adversarial input designed to bypass a model's safety rules, instruction-following constraints, or content policy. Typical examples are prompts that tell the model to "ignore previous rules" or to role-play as an unrestricted assistant. Related to but distinct from prompt injection: a jailbreak targets the model's own safeguards, whereas prompt injection smuggles attacker instructions into the input that an application passes to the model. Relevant to accessibility products that embed LLMs because a jailbreak can cause an assistive agent to give clinically harmful, misleading, or biased output to users who may have fewer resources to detect or correct the error.

Category: AI · Security · Human-AI Collaboration

Related: Prompt Injection · Large Language Model · Hallucination

Sources