Claude Certification
2026-03-19 · claudecert.com

Prompt cache economics: why static-first ordering matters

The 5-minute cache cuts input cost by ~90% on hits, yet teams routinely forfeit those hits by putting volatile content first.

Prompt caching matches on prefix. The cache key is the exact byte sequence from the start of the prompt up to the marked breakpoint. Any drift in those bytes — date strings, user IDs, timestamps — invalidates the cache.
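To see how little drift it takes, compare two otherwise identical prompts whose opening line embeds the date. This is a minimal sketch of a volatile-first layout; the prompt text and function name are illustrative:

```python
import os

def build_prompt(today: str) -> str:
    # Volatile-first layout: the date lands in the opening bytes.
    return f"Today is {today}.\nYou are a support assistant for Acme.\n..."

yesterday_prompt = build_prompt("2026-03-18")
today_prompt = build_prompt("2026-03-19")

# The shared prefix ends inside the date string, so a breakpoint set
# anywhere after it can never match yesterday's cache entry.
shared = len(os.path.commonprefix([yesterday_prompt, today_prompt]))
print(shared, len(today_prompt))
```

Everything after the first divergent byte is dead weight for caching purposes, no matter how stable the rest of the prompt is.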

The fix is structural: put your stable instructions first, then your session-stable data (the user's profile, tenant config), then your per-call content (the new message, today's date, anything dynamic). Mark the end of the stable section with a cache_control breakpoint; everything after it can vary freely without invalidating the cache.

This isn't a micro-optimization. With a 4K-token stable prefix and 100 calls per session, roughly 360K input tokens get the ~90% discount instead of full price, about $1 per session at Sonnet's $3 per million input tokens. Across a fleet, that adds up fast.
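The full accounting, treating Sonnet's published multipliers (cache writes at ~1.25× base input price, cache reads at ~0.1×) as assumptions:

```python
PRICE_IN = 3.00 / 1_000_000      # Sonnet base input price, $/token (assumed)
CACHE_WRITE = 1.25 * PRICE_IN    # writing the prefix carries a ~25% premium
CACHE_READ = 0.10 * PRICE_IN     # reading it back costs ~10% of base

prefix_tokens = 4_000
calls = 100

uncached = calls * prefix_tokens * PRICE_IN             # full price every call
cached = (prefix_tokens * CACHE_WRITE                   # one write on the first call
          + (calls - 1) * prefix_tokens * CACHE_READ)   # 99 discounted reads
savings = uncached - cached
print(f"${uncached:.2f} -> ${cached:.2f}, saving ${savings:.2f}")
```

The write premium barely dents the result: the session drops from $1.20 to about $0.13 in prefix cost.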