Harness
Definition
A structured software layer that connects an AI model to the outside world through defined inputs, tool calls, eval routines, safety checks, and output processing. In 2026 the term is used in two main contexts: as an eval harness (systematic testing infrastructure such as lm-evaluation-harness or OpenAI Evals) and as an agent harness (runtime scaffolding that turns a model into an agentic system: tool calls, memory, escalation paths).
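The agent-harness side of the definition can be made concrete with a short sketch. All names here are illustrative, not any vendor's API: a stub `call_model` stands in for a chat-completion endpoint, and the harness owns the loop around it, including tool routing, conversation memory, and an escalation path.

```python
import json

def call_model(messages):
    """Stand-in for any chat-completion API: returns either a tool
    request or a final answer as a JSON string."""
    # A real harness would wrap a vendor SDK here; this stub requests
    # one tool call, then answers once a tool result is in memory.
    if any(m["role"] == "tool" for m in messages):
        return json.dumps({"answer": "done"})
    return json.dumps({"tool": "search", "args": {"query": "harness"}})

# Tool routing: the harness, not the model, decides what is callable.
TOOLS = {
    "search": lambda args: f"3 results for {args['query']!r}",
}

def run_agent(user_input, max_steps=5):
    memory = [{"role": "user", "content": user_input}]  # conversation memory
    for _ in range(max_steps):
        reply = json.loads(call_model(memory))
        if "tool" not in reply:                 # model produced a final answer
            return reply.get("answer")
        tool = TOOLS.get(reply["tool"])
        if tool is None:                        # escalation path: unknown tool
            return "escalate: unknown tool " + reply["tool"]
        memory.append({"role": "tool", "content": tool(reply["args"])})
    return "escalate: step budget exhausted"    # escalation path: runaway loop
```

Everything outside `call_model` is harness, which is why two systems built on the same model can behave so differently.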
Noise — Signal
When vendors say "we deliver foundation model X", that is only half the story: the quality and safety of a production system come to a substantial degree from the surrounding harness. Two applications with identical models can deliver very different results depending on how the harness exposes tools, curates context, catches hallucinations, and builds in eval loops. The common shortcut in the market: the model is bought, the harness is "somehow built in-house". That is exactly where lock-in, security gaps, and migration risk emerge.
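The "builds in eval loops" part is the easiest harness component to make tangible. A minimal sketch, in the spirit of eval harnesses like lm-evaluation-harness but not using its actual API: score a model function against a fixed task set and gate deployment on a threshold.

```python
def eval_harness(model_fn, cases, threshold=0.9):
    """Run model_fn over (prompt, expected) pairs; return the pass rate
    and whether it clears the deployment threshold."""
    passed = sum(1 for prompt, expected in cases if model_fn(prompt) == expected)
    score = passed / len(cases)
    return score, score >= threshold

# Illustrative task set and a toy model that only knows one answer.
cases = [("2+2", "4"), ("capital of France", "Paris")]
score, ok = eval_harness(lambda p: {"2+2": "4"}.get(p, "?"), cases)
# score == 0.5, so the gate fails and deployment is blocked
```

The same loop, run on every model or prompt change, is what turns "the model seems fine" into a measurable regression check.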
The right question
Not: "Which model are we deploying?" But: "Which components of our harness — tool routing, context curation, eval loop, guardrails, escalation — do we build ourselves, which do we buy, and where does lock-in arise if we want to swap the model?"
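One way to keep the swap-the-model option open is a thin adapter boundary. A sketch with hypothetical names: the harness codes against a small interface, so replacing the vendor touches one class rather than the tool routing, guardrails, and eval loop built around it.

```python
from typing import Protocol

class ChatModel(Protocol):
    """The only surface the harness is allowed to depend on."""
    def complete(self, messages: list[dict]) -> str: ...

class VendorA:
    def complete(self, messages):
        return "answer from vendor A"   # would wrap vendor A's SDK

class VendorB:
    def complete(self, messages):
        return "answer from vendor B"   # would wrap vendor B's SDK

def answer(model: ChatModel, question: str) -> str:
    # Harness-side logic (context curation, guardrails, logging) stays
    # model-agnostic; only the adapter classes know a vendor exists.
    return model.complete([{"role": "user", "content": question}])
```

Lock-in arises wherever harness logic bypasses this boundary and calls vendor-specific features directly; those are the components worth inventorying before answering the build-or-buy question.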