Distillation
Definition
A technique for training a smaller "student" model to approximate the behaviour of a larger "teacher" model. Goal: comparable quality at significantly lower inference cost and latency. Frequently used in combination with fine-tuning on domain-specific tasks.
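The core training signal can be sketched as the classic soft-label objective: the student is trained to match the teacher's temperature-softened output distribution. This is a minimal stdlib-only illustration, not the document's own implementation; the function names and the temperature value are assumptions for the example.

```python
import math

def softmax(logits, temperature=1.0):
    # Soften the distribution: a higher temperature exposes the
    # teacher's relative confidence in near-miss classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between the teacher's and the student's
    # softened output distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scale by T^2 so the gradient magnitude stays comparable
    # when the temperature is changed.
    return temperature ** 2 * kl

# A student that reproduces the teacher's logits exactly has zero loss.
teacher = [3.0, 1.0, 0.2]
print(distillation_loss(teacher, teacher))  # 0.0
```

In practice this term is usually mixed with a standard cross-entropy loss on ground-truth labels, which is one reason distillation combines well with fine-tuning on domain-specific tasks.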
Noise — Signal
Distillation is often reduced to "just use a smaller model and it gets cheaper". In practice, success depends on three conditions: high-quality training data or an accessible teacher model, a tightly defined task, and systematic evaluation. Distillation without a clearly bounded use case produces a model that looks stable on benchmarks but breaks on edge cases.
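One way to make "systematic evaluation" concrete is a release gate that compares student and teacher predictions on a held-out set and blocks deployment when agreement drops. A minimal sketch; the function names and the 0.95 threshold are illustrative assumptions, not a prescribed standard.

```python
def agreement_rate(student_preds, teacher_preds):
    # Fraction of held-out examples where the student's prediction
    # matches the teacher's.
    matches = sum(s == t for s, t in zip(student_preds, teacher_preds))
    return matches / len(student_preds)

def passes_quality_gate(student_preds, teacher_preds, threshold=0.95):
    # Block a release when agreement falls below the threshold,
    # so quality drift is caught before it reaches the end customer.
    return agreement_rate(student_preds, teacher_preds) >= threshold

# Example: 9 of 10 predictions agree -> 0.9 agreement, gate fails.
teacher = ["a", "b", "a", "c", "a", "b", "a", "a", "c", "b"]
student = ["a", "b", "a", "c", "a", "b", "a", "a", "c", "a"]
print(agreement_rate(student, teacher))        # 0.9
print(passes_quality_gate(student, teacher))   # False
```

A real harness would track this metric over time per sub-task rather than as a single snapshot, since drift is exactly what a one-off benchmark misses.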
The right question
Not: "Can we distil the model to save costs?" But: "Which sub-task is defined tightly enough for a smaller model to handle it reliably, and do we have the evaluation infrastructure to detect quality drift before it reaches the end customer?"