The Problem
Pick the version that's yours.
TEAM: You shipped an agent doing PR review across your dev team. $0.30/review in QA. The team was happy, finance was happy, you went home Friday. Tuesday morning: $11,400 on your Anthropic key in 36 hours from one tool-call loop.
SOLO: You shipped one ticket-triage agent for your two-person startup. $0.30/ticket in testing. You felt good about the math. Tuesday morning: $4,800 from a single ticket thread the agent kept “clarifying” overnight.
BOT: You shipped customer-facing AI completing invoice line items. $0.02/completion in QA. You shipped to all 12,000 users on Monday. Tuesday morning: a Substack post brought 200 new signups, three of them used your AI as a personal assistant, and your monthly margin is now -180%.
Then Monday's invoice arrived. Three modes, one failure: cost was a runtime constraint and you didn't enforce one.
The Core Insight
Cost is a runtime constraint, not a model-pick decision. The cheapest agent is the one with a hard ceiling.
Most teams treat cost as a model-selection problem (“Sonnet vs Opus”) or a context-shaping problem (“trim the prompt”). Both matter. Neither saves you when an agent loops.
The pattern that saves you is a cost envelope attached to every task: a hard token budget, a hard wall-clock budget, and an attribution tag. When the agent hits either limit, the harness kills it — before the model gets a chance to reason about whether it should keep going. The whole point is to make cost discipline a non-negotiable runtime check, not a polite suggestion the agent might honour if it remembers.
// across actors
TEAM: Per-engineer Copilot budgets + alerts when one developer's spend doubles week-over-week.
SOLO: Per-task envelope on your agent runs ($X for code review, $Y for triage) + a kill-switch when a single run exceeds 5× the historical median.
BOT: Customer-facing AI feature with a per-customer cost ceiling so one power user can't bankrupt the unit economics.