Inference Cost / TCO
Definition
Inference cost denotes the ongoing cost of model use per request — typically per token, per image, or per second of compute. Total cost of ownership (TCO) additionally covers development, data preparation, evaluation, hosting, monitoring, retraining, and compliance overhead across the lifecycle.
Noise — Signal
AI business cases are routinely built on list prices ("$5 per million tokens"). In production, actual costs typically run three to ten times higher: long prompts, multiple model calls per user action (routing, reasoning, verification), retries, eval calls, and monitoring pipelines. Add infrastructure scaling costs, which foundation-model APIs fold into the provider's price but which remain fully visible in on-premises setups.
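The gap between list price and effective cost can be made concrete with a small model. The sketch below is purely illustrative: the token counts, call counts, retry rate, and overhead factor are hypothetical assumptions, not vendor figures — only the "$5 per million tokens" headline rate comes from the text above.

```python
# Illustrative sketch: why effective cost per user action exceeds the list price.
# All parameter values are hypothetical assumptions chosen for the example.

LIST_PRICE_PER_MTOK = 5.00  # the "$5 per million tokens" headline rate

def cost_per_action(tokens_per_call: int,
                    calls_per_action: float,  # routing + reasoning + verification
                    retry_rate: float,        # fraction of calls retried
                    overhead_factor: float    # eval calls + monitoring pipelines
                    ) -> float:
    """Effective dollar cost of one productive user action."""
    effective_calls = calls_per_action * (1 + retry_rate)
    tokens = tokens_per_call * effective_calls
    base = tokens / 1_000_000 * LIST_PRICE_PER_MTOK
    return base * (1 + overhead_factor)

# Naive business-case estimate: one short call at the list price.
naive = 1_000 / 1_000_000 * LIST_PRICE_PER_MTOK

# A more realistic pipeline: long prompts, multiple calls, retries, overhead.
realistic = cost_per_action(tokens_per_call=3_000,
                            calls_per_action=2,
                            retry_rate=0.1,
                            overhead_factor=0.2)

print(f"naive estimate:  ${naive:.4f} per action")
print(f"realistic model: ${realistic:.4f} per action ({realistic / naive:.1f}x)")
```

Even with these moderate assumptions, the effective cost lands at roughly eight times the naive single-call estimate — inside the three-to-ten-times range observed above.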
The right question
Not: "What does the model cost us?" But: "What are the full costs per productive user action across the entire application path — and how does the ratio change when we scale by a factor of 10 or 100?"
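The scaling part of the question can be sketched with a simple fixed-plus-marginal cost model. The dollar amounts below are hypothetical placeholders: fixed costs (hosting, monitoring, retraining, compliance) amortize with volume, while per-call costs scale roughly linearly, so the per-action TCO converges toward the marginal rate as usage grows.

```python
# Illustrative sketch: how TCO per productive user action shifts with scale.
# FIXED_MONTHLY and MARGINAL_PER_ACTION are hypothetical figures for the example.

FIXED_MONTHLY = 20_000.0     # hosting, monitoring, retraining, compliance
MARGINAL_PER_ACTION = 0.04   # model calls per productive user action

def tco_per_action(actions_per_month: int) -> float:
    """Amortized fixed cost plus marginal cost per action."""
    return FIXED_MONTHLY / actions_per_month + MARGINAL_PER_ACTION

for scale in (1, 10, 100):
    n = 50_000 * scale
    print(f"{n:>9,} actions/month -> ${tco_per_action(n):.4f} per action")
```

At low volume the fixed overhead dominates; at a hundredfold scale the per-action cost has collapsed toward the marginal rate — which is exactly why the ratio, not the list price, is the number to track.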