Guardrails

Definition

Mechanisms before, during or after model inference that filter, restrict or escalate undesired inputs or outputs — from simple regex filters through classifiers to specialised guardrail models and policy engines.
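As a rough illustration of the simplest end of that range, the sketch below wraps a model call with one regex check on the input and one on the output. It is a minimal sketch under stated assumptions, not a recommended rule set: the names `Verdict`, `Decision`, `check_input`, `check_output` and `guarded_call`, both patterns, and the escalation threshold are all hypothetical and chosen only for the example.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"  # hand the case to a human reviewer


@dataclass
class Decision:
    verdict: Verdict
    text: str
    reason: str = ""


# Hypothetical input rule: one obvious instruction-override phrase.
OVERRIDE_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
# Hypothetical output rule: anything shaped roughly like an e-mail address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_input(prompt: str) -> Decision:
    """Filter step before inference."""
    if OVERRIDE_PATTERN.search(prompt):
        return Decision(Verdict.BLOCK, prompt, "override phrase")
    return Decision(Verdict.ALLOW, prompt)


def check_output(answer: str) -> Decision:
    """Restrict (mask) or escalate step after inference."""
    masked, hits = EMAIL_PATTERN.subn("[redacted]", answer)
    if hits > 2:  # assumed threshold, purely illustrative
        return Decision(Verdict.ESCALATE, masked, "many addresses in output")
    return Decision(Verdict.ALLOW, masked, "masked" if hits else "")


def guarded_call(prompt: str, model: Callable[[str], str]) -> Decision:
    """Run the input check, then the model, then the output check."""
    before = check_input(prompt)
    if before.verdict is not Verdict.ALLOW:
        return before  # the model is never called
    return check_output(model(prompt))
```

A classifier or a dedicated guardrail model would slot into the same two functions in place of the regular expressions. The three-way verdict is the design choice worth noting, because it keeps "escalate" a first-class outcome instead of forcing every case into allow or block.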

Noise — Signal

Guardrails are often sold as "the safety layer that makes the model safe". They are not. They are a layer of additional heuristics between input and model, or between model and user. They reduce risk, they do not eliminate it, and they have their own failure modes: false positives that block legitimate requests, false negatives that let problematic content through. In regulated industries, guardrails do not replace risk management — they are one building block within it.

The right question

Not: "Which guardrails do we need?" But: "Which risks do we address at which layer (input, model, output, workflow), how do we measure the hit rate in live operation, and which risks remain structurally outside what guardrails can deliver?"
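One way to make the measurement part of that question concrete is to sample production traffic, have a human label each sampled event, and compute rates per layer. The sketch below assumes such a labelled sample already exists. `ReviewedEvent`, `rates_by_layer` and the field names are hypothetical, and "hit rate" is interpreted here as the share of harmful events the guardrail flagged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReviewedEvent:
    layer: str     # e.g. "input", "model", "output" or "workflow"
    flagged: bool  # what the guardrail decided at the time
    harmful: bool  # what a human reviewer decided afterwards


def rates_by_layer(events):
    """Return hit rate and false-positive rate for each layer."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for e in events:
        # "t"/"f": did the guardrail agree with the reviewer?
        # "p"/"n": did the guardrail flag the event?
        key = ("t" if e.flagged == e.harmful else "f") + ("p" if e.flagged else "n")
        counts[e.layer][key] += 1

    report = {}
    for layer, c in counts.items():
        harmful = c["tp"] + c["fn"]
        benign = c["fp"] + c["tn"]
        report[layer] = {
            # None means the sample contains no cases to compute the rate from.
            "hit_rate": c["tp"] / harmful if harmful else None,
            "false_positive_rate": c["fp"] / benign if benign else None,
        }
    return report
```

The sample has to include traffic the guardrail did not flag. If only flagged events are reviewed, false negatives never show up and the hit rate cannot be estimated at all.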

Related terms

Prompt Injection

AI Red Teaming

Agentic AI

Evaluation (Eval)