Guardrails
Definition
Mechanisms applied before, during or after model inference that filter, restrict or escalate undesired inputs or outputs. They range from simple regex filters through classifiers to specialised guardrail models and policy engines.
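To make the simplest end of that range concrete, here is a minimal sketch of a regex-based input guardrail that can allow, block or escalate. The rule names, patterns and severity ordering are assumptions made up for illustration; they do not come from any particular guardrail product.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"  # hand over to a human or a stricter check
    BLOCK = "block"


# Assumed ordering: when several rules fire, the most severe action wins.
SEVERITY = {Action.ALLOW: 0, Action.ESCALATE: 1, Action.BLOCK: 2}


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    action: Action


# Hypothetical rules, for illustration only.
INPUT_RULES = [
    # Any run of 13 to 16 digits, optionally separated by spaces or hyphens.
    Rule("card_number", re.compile(r"\b(?:\d[ -]?){13,16}\b"), Action.BLOCK),
    Rule("legal_threat", re.compile(r"\b(?:lawsuit|sue)\b", re.I), Action.ESCALATE),
]


def check(text, rules):
    """Return the most severe action and the names of all rules that fired."""
    fired = [rule for rule in rules if rule.pattern.search(text)]
    action = max(
        (rule.action for rule in fired),
        key=SEVERITY.get,
        default=Action.ALLOW,
    )
    return action, [rule.name for rule in fired]


if __name__ == "__main__":
    for prompt in (
        "What is the weather today?",
        "Can I sue my landlord?",
        "My card is 4111 1111 1111 1111",
        "Order 1234567890123 shipped",
    ):
        print(prompt, "->", check(prompt, INPUT_RULES))
```

The last prompt is a harmless 13-digit order number, yet the card rule blocks it. A heuristic this crude produces false positives by construction, which is why classifiers and policy engines sit further along the same range.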
Noise — Signal
Guardrails are often sold as "the safety layer that makes the model safe". They are not. They are a layer of additional heuristics between input and model, or between model and user. They reduce risk but do not eliminate it, and they have their own failure modes: false positives that block legitimate requests, and false negatives that let problematic content through. In regulated industries, guardrails do not replace risk management; they are one building block within it.
The right question
Not: "Which guardrails do we need?" But: "Which risks do we address at which layer (input, model, output, workflow), how do we measure the hit rate and the false-positive rate in live operation, and which risks remain structurally outside what guardrails can deliver?"
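One way to approach the measurement part of that question is to have humans label a sample of live traffic and compare their labels with the guardrail decisions, layer by layer. The sketch below assumes a hypothetical AuditRecord structure and reads "hit rate" as recall, that is, the share of genuinely harmful items the guardrail flagged. Both are assumptions for illustration, not an established schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    layer: str     # e.g. "input", "model", "output", "workflow"
    flagged: bool  # what the guardrail decided
    harmful: bool  # what a human reviewer decided afterwards


def _ratio(numerator, denominator):
    """Return a ratio, or None when there is no data to divide by."""
    return numerator / denominator if denominator else None


def rates_by_layer(records):
    """Compute hit rate (recall) and false-positive rate for each layer."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for record in records:
        if record.flagged and record.harmful:
            key = "tp"  # harmful and caught
        elif record.flagged:
            key = "fp"  # legitimate but blocked
        elif record.harmful:
            key = "fn"  # harmful and missed
        else:
            key = "tn"  # legitimate and passed
        counts[record.layer][key] += 1

    return {
        layer: {
            "hit_rate": _ratio(c["tp"], c["tp"] + c["fn"]),
            "false_positive_rate": _ratio(c["fp"], c["fp"] + c["tn"]),
            "sample_size": sum(c.values()),
        }
        for layer, c in counts.items()
    }


if __name__ == "__main__":
    # Made-up sample: 3 caught, 1 missed, 1 wrongly blocked, 5 correctly passed.
    sample = (
        [AuditRecord("input", True, True)] * 3
        + [AuditRecord("input", False, True)]
        + [AuditRecord("input", True, False)]
        + [AuditRecord("input", False, False)] * 5
    )
    print(rates_by_layer(sample))
```

A table like this only covers risks that a guardrail could in principle flag and a reviewer could label. Risks that remain structurally outside what guardrails can deliver do not show up in it and have to be tracked elsewhere in risk management.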