Prompt Injection

Prompt Injection

Definition

An attack technique in which an attacker crafts inputs so that the model ignores, overrides or turns its original system instructions against the operator. Direct (in the user input) or indirect (in content the model processes — documents, web content, tool outputs).

Noise — Signal

Prompt injection is often dismissed as something that "can be solved with a few filters". Today it is the OWASP top-1 risk factor for LLM applications, and there is no complete technical mitigation. Indirect prompt injection — instructions an attacker embeds in a document or email that the model later processes — is particularly relevant to agentic AI architectures and enterprise search. An application that processes external, untrusted content and is at the same time allowed to perform privileged actions is structurally vulnerable.

The right question

Not: "How do we prevent prompt injection?" But: "At which points does our system process untrusted content, which actions is the model allowed to trigger at those points, and which permission design reduces the blast radius in the event of a successful injection?"

Definition

Noise — Signal

The right question

Related terms

Guardrails

AI Red Teaming

Agentic AI

Model Governance