AI Red Teaming

AI Red Teaming

Definition

Structured, adversarial testing of an AI system by dedicated teams or tools that deliberately surface security, bias, hallucination and misuse patterns. Distinct from classical software pentesting in its focus on model-inherent risks — prompt injection, jailbreaks, harmful outputs, data leakage through generative responses.

Noise — Signal

Red teaming gets sold as "we're testing the AI for security". The phrase obscures a decisive distinction: provider red teaming before model release tests different risks than deployer red teaming before an application goes live. Most security gaps in productive AI applications arise from the specific combination of model, tools, permissions and data — and are not covered by provider red teaming. Outsourcing compliance to the model provider shifts a responsibility that cannot, in fact, be handed off.

The right question

Not: "Has the provider red-teamed the model?" But: "Which attack vectors arise from the specific combination of our application — tools, data access, permissions, workflows — and who tests that combination before it goes into production?"

Definition

Noise — Signal

The right question

Related terms

Guardrails

Prompt Injection

Evaluation (Eval)

Model Governance