2026-05-06 · claudecert.com
Shipping your first agent to production
A pragmatic checklist: evals, observability, budgets, kill switch.
Before flipping the agent on for users:
- Golden set. 20–50 representative tasks with expected outcomes. Run before every prompt change.
- Observability. Log model, tokens, latency, tool calls, errors. Dashboard the top 5.
- Cost budget. Per-user and per-task hard limits. Refuse with a friendly message when hit.
- Kill switch. A feature flag that turns the agent off without a deploy.
- Human review queue. Sample 1–5% of completions and grade them weekly.
- Rollout plan. Shadow → 1% → 10% → 100% with quality + cost gates at each step.
Agents that get production traction are the boring well-instrumented ones.
