AI Red Teaming
Definition
Structured, adversarial testing of an AI system by dedicated teams or tools that deliberately surface security, bias, hallucination and misuse patterns. Distinct from classical software pentesting in its focus on model-inherent risks — prompt injection, jailbreaks, harmful outputs, data leakage through generative responses.
Noise — Signal
Red teaming gets sold as "we're testing the AI for security". The phrase obscures a decisive distinction: provider red teaming before model release tests different risks than deployer red teaming before an application goes live. Most security gaps in productive AI applications arise from the specific combination of model, tools, permissions and data — and are not covered by provider red teaming. Outsourcing compliance to the model provider shifts a responsibility that cannot, in fact, be handed off.
The right question
Not: "Has the provider red-teamed the model?" But: "Which attack vectors arise from the specific combination of our application — tools, data access, permissions, workflows — and who tests that combination before it goes into production?"