Brent Haskins / Applied AI

Caching Is a Product Decision, Not an Infrastructure Toggle

Q: How do I decide between serving stale content from cache vs waiting for a fresh response from origin?

Start with user expectations. If the data rarely changes and slight staleness is invisible (e.g., a QR code for a static URL), serve stale while revalidating. If the data is real-time and accuracy matters (e.g., order book depth), never serve stale. Use stale-while-revalidate for the former and strict no-cache for the latter. The product contract dictates the cache policy, not the other way around.

Q: What is stale-if-error and when should I use it?

stale-if-error tells the CDN it may serve a stale cached response when the origin returns an error (5xx) or times out, for up to a specified number of seconds after the response expires (for example, 86400, one full day). Use it for any endpoint where a slightly old response is better than a blank error page: product listings, reference data, or loading states. Never use it for financial transactions or user-specific payments, where accuracy is non-negotiable.

Q: How do I measure the user-perceived latency vs cache-hit ratio?

Cache-hit ratio alone is misleading; it doesn't capture delayed origin fetches. Measure Time to First Byte (TTFB) for cached vs uncached requests at the edge, and use Real User Monitoring (RUM) to track actual user experience across regions. For a product engineer, a 95th-percentile TTFB under 500ms globally is the real target, not a 99% hit ratio that still leaves some users waiting 2+ seconds.

June 25, 2026 · 6 min read · By Brent Haskins

Most teams treat caching as a CDN configuration problem. But the real product impact comes from decisions about staleness, fallback strategies, and latency budgets. Drawing from production experience and recent edge-caching benchmarks (QR code generation under global distribution, 2026), this post argues that caching headers like stale-while-revalidate and stale-if-error are product contracts that define how users perceive speed and reliability. For product engineers shipping SaaS or AI interfaces, the difference between a 200ms response and a 2s one is often a matter of intentional staleness policy, not raw infrastructure horsepower.

Performance + UX
Product Thinking
AI Product Engineering

The short answer

Caching is not a CDN toggle. It’s a product contract you write with HTTP headers. Every Cache-Control directive, every stale-while-revalidate window, and every stale-if-error fallback is a statement about what your users will see when your origin blinks. And your origin will blink.

In 2026, global edge networks are table stakes. AWS CloudFront offers 600+ points of presence. Services like URL-QR generate images at the edge with low latency worldwide: in recent benchmarks, every test returned a TTFB under 50ms because requests with identical query parameters were served from the edge cache. But that’s easy when the resource is static. The hard part, the product part, is deciding how to handle dynamic data, partial updates, and origin failures.

If you treat caching as an ops concern, you’ll optimize for cache-hit ratio and miss the real metric: perceived latency when the cache misses or the origin fails. The product engineer’s job is to define the staleness budget: how old is too old before the user sees a spinner or an error? That’s a product decision, not an infrastructure one.

Key takeaways

Use stale-while-revalidate for any endpoint where a slightly stale response is better than a wait. The 10-minute window in a max-age=3600, stale-while-revalidate=600 policy hides revalidation latency from most users.
stale-if-error is your best friend for graceful degradation. Setting it to 86400 gives a full day of fallback content when origin returns 5xx, turning edge nodes into resilience layers.
Cache keys matter more than CDN provider. Include only the query parameters that change the response — omitting tracking params, session IDs, and non-essential headers prevents cache fragmentation.
Latency testing must target origin paths, not cached edges. For real-time apps like Polymarket order books, testing the /book endpoint bypasses caching and reveals actual origin performance. CDN metrics alone lie.
Product engineers should own cache policies alongside backend teams. A stale product listing is an annoyance; a stale payment amount is a bug. The policy reflects the data’s criticality.

The real problem: caching as a product contract

Most caching advice focuses on performance: faster loads, lower origin load. That’s table stakes. The real problem is that caching defines reliability for users. When your database is in us-east-1 and a user in Sydney makes a request, the round trip is 200ms before any compute. If the edge serves fresh content, you’re golden. If the cache misses, they wait. If the origin fails, they get an error — unless you’ve planned for it.

A product engineer sees caching as a UX interface. The three directives from the 2026 caching reference — max-age, stale-while-revalidate, stale-if-error — compose into a policy that says: “We accept one hour of freshness, with a ten-minute window to hide revalidation latency, and up to a day of last-good content if everything breaks.” That’s a product promise, not a config file.
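That sentence reads naturally as a decision table. Below is a minimal Python sketch of it, assuming a single cached copy whose age in seconds is known and a boolean saying whether the origin is currently healthy. `CachePolicy` and `decide` are hypothetical names, and real CDNs layer more rules (request directives, heuristic freshness, provider-specific limits) on top of this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    """Directive values in seconds, as they would appear in Cache-Control."""
    max_age: int
    stale_while_revalidate: int
    stale_if_error: int


def decide(age: float, origin_healthy: bool, policy: CachePolicy) -> str:
    """Return what a cache following this policy would do for one request."""
    if age < policy.max_age:
        return "fresh"  # serve from cache, no origin contact
    staleness = age - policy.max_age  # seconds past expiry
    if staleness <= policy.stale_while_revalidate:
        return "stale-revalidate"  # serve cached copy, refresh in background
    if origin_healthy:
        return "origin"  # user waits for a fresh response
    if staleness <= policy.stale_if_error:
        return "stale-error"  # origin failing: fall back to last-good copy
    return "error"  # too old even for the error fallback


# One hour fresh, ten minutes to hide revalidation, a day of last-good content.
POLICY = CachePolicy(max_age=3600, stale_while_revalidate=600, stale_if_error=86400)

for age, healthy in [(1800, True), (3900, True), (7200, True), (7200, False), (100000, False)]:
    print(age, healthy, decide(age, healthy, POLICY))
```

Under this policy, a two-hour-old copy is refetched while the origin is healthy but still served while the origin is returning 5xx; only after a further day past expiry does the user finally see an error.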

For AI-powered products, caching is even trickier. Timestamps, confidence scores, and generation parameters change rapidly. You can’t cache a model response that depends on user context. But you can cache the template UI, the static assets, and the reference data the model queries. Know what to cache and what to skip.

Tradeoffs: when stale is better than slow

Staleness is not binary. It’s a spectrum from “exactly right now” to “good enough for a pricing decision.” For a QR code generator that converts URLs to static images, caching the response for hours is fine: the QR code encodes the same data every time. For a real-time dashboard showing order book depth, any stale quote is dangerous.

The product tradeoff: user tolerance for staleness vs. origin latency. If your origin takes 2 seconds to generate a personalized widget, and the worst-case staleness is 5 minutes, serving stale while revalidating in the background (using stale-while-revalidate) keeps the user in a productive flow. If you refresh the widget on every page load, they stare at loading states for 2 seconds — and bounce.
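To make "revalidating in the background" concrete, here is a minimal in-process Python sketch of stale-while-revalidate semantics. It is an illustration under assumptions, not how a CDN implements the directive: `SWRCache`, the injectable `clock`, and the thread pool are hypothetical choices. A fresh entry is returned directly; an entry inside the stale window is returned immediately while a single background refresh is scheduled; anything older makes the caller wait for the origin.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, Set, Tuple


class SWRCache:
    """In-memory cache with max-age / stale-while-revalidate semantics."""

    def __init__(self, fetch: Callable[[Hashable], Any], max_age: float,
                 swr: float, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch              # slow origin call, e.g. a 2 s widget render
        self._max_age = max_age
        self._swr = swr
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}  # key -> (value, stored_at)
        self._inflight: Set[Hashable] = set()                  # keys being refreshed
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=4)

    def get(self, key: Hashable) -> Tuple[Any, str]:
        with self._lock:
            entry = self._entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = self._clock() - stored_at
            if age < self._max_age:
                return value, "fresh"
            if age - self._max_age <= self._swr:
                self._refresh_in_background(key)
                return value, "stale"    # user sees content now, not a spinner
        value = self._fetch(key)         # miss, or too stale: the user waits
        self._store(key, value)
        return value, "miss"

    def _store(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._entries[key] = (value, self._clock())

    def _refresh_in_background(self, key: Hashable) -> None:
        with self._lock:
            if key in self._inflight:    # collapse concurrent refreshes of one key
                return
            self._inflight.add(key)
        self._pool.submit(self._refresh, key)

    def _refresh(self, key: Hashable) -> None:
        try:
            self._store(key, self._fetch(key))  # a failed fetch leaves the old entry
        finally:
            with self._lock:
                self._inflight.discard(key)

    def close(self) -> None:
        self._pool.shutdown(wait=True)   # wait for pending refreshes on shutdown
```

In this sketch the slow origin call blocks a user only on a cold miss or once an entry is older than `max_age + swr`; inside the window, the refresh happens off the request path.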

I’ve seen teams implement aggressive caching on admin dashboards because the data changed only every 15 minutes. Users didn’t notice stale data, but they did notice the spinner. The origin was slow because it recomputed aggregates. The fix: max-age=900 with stale-while-revalidate=300. The dashboard felt instant, and the origin load dropped by 60%.

How this looks in a shipped product

Consider the QR-code generation service benchmarked in June 2026. The same GET request — https://url-qr.com/?url=…&format=svg&size=12 — was tested from multiple global locations. Responses came back in under 50ms. No region parameter, no CDN purging. Why? The edge caches the SVG response keyed on the exact URL, format, and size. If a second user requests the same URL, they get the cached SVG instantly.

That’s the easy case. The hard case is when the URL changes per user (e.g., contains a session token). Then caching breaks: every user produces a distinct cache key, so nearly every request misses. The product decision becomes: do we strip the session token from the cache key? Only if the resource is identical for all users. For a QR code, yes. For an AI-generated image, probably not.

The lesson: design your API cache keys alongside your product requirements. If two different user inputs produce the same output, share the cache. If they differ, don’t. And communicate that with your frontend team so they know when to add cache-busting parameters.
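One way to encode "share the cache only when inputs produce the same output" is an allowlist of query parameters. The Python sketch below is illustrative: the `qr.example.com` host, the `RESPONSE_PARAMS` set, and `cache_key` are all hypothetical. It keeps host, path, and allowlisted parameters, drops everything else (tracking params, session tokens), and sorts what survives so parameter order does not fragment the cache. Most CDNs let you configure an equivalent query-string allowlist in their cache-key settings.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical allowlist: the only query parameters that change the response body.
RESPONSE_PARAMS = frozenset({"url", "format", "size"})


def cache_key(request_url: str, allowed: frozenset = RESPONSE_PARAMS) -> str:
    """Build a cache key from host, path, and allowlisted query params only."""
    parts = urlsplit(request_url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed  # utm_*, session ids, etc. never reach the key
    )
    return f"{parts.netloc}{parts.path}?{urlencode(kept)}"


a = cache_key("https://qr.example.com/?url=https%3A%2F%2Fexample.com"
              "&format=svg&size=12&utm_source=mail&session=abc123")
b = cache_key("https://qr.example.com/?size=12&session=zzz"
              "&format=svg&url=https%3A%2F%2Fexample.com")
print(a == b)
```

An allowlist fails safe in one direction: a new tracking parameter cannot fragment the cache, but a new parameter that *does* change the output must be added deliberately, which is exactly the conversation with the frontend team the paragraph above calls for.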

What to evaluate: latency budgets and fallback strategies

Latency budgets must account for cache misses. A CDN feature list (like CloudFront’s) means nothing if your origin can’t serve in time. Measure the 95th percentile origin TTFB for your critical paths. If it’s above 500ms, you need aggressive caching or you need to optimize origin — likely both.
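A minimal Python sketch of that measurement, assuming you already collect `(cache_status, ttfb_ms)` pairs from RUM or edge logs. The `"HIT"`/`"MISS"` labels are hypothetical and would come from whatever cache-status header your CDN emits. It uses the nearest-rank percentile and reports p95 per cache status, because a blended p95 can hide slow misses.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def p95_by_cache_status(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group (cache_status, ttfb_ms) samples and return p95 TTFB per status."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for status, ttfb_ms in samples:
        grouped[status].append(ttfb_ms)
    return {status: percentile(vals, 95) for status, vals in grouped.items()}


def over_budget(p95s: Dict[str, float], budget_ms: float = 500.0) -> List[str]:
    """Statuses whose p95 TTFB exceeds the latency budget."""
    return sorted(status for status, p95 in p95s.items() if p95 > budget_ms)


samples = [("HIT", 30.0)] * 97 + [("MISS", 2100.0)] * 3
print(percentile([ttfb for _, ttfb in samples], 95))  # blended p95
print(p95_by_cache_status(samples))
print(over_budget(p95_by_cache_status(samples)))
```

With 97 hits at 30 ms and 3 misses at 2,100 ms, the blended p95 is 30 ms and looks healthy, while the MISS p95 is 2,100 ms: those are the users a 97% hit ratio quietly leaves waiting.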

Evaluate your error fallback coverage. Which endpoints have stale-if-error? Which return a 503 immediately? For a product engineer, this is a risk inventory. Map every API response to a staleness tolerance: values (prices, scores), content (text, images), and metadata (categories, labels). Assign a policy. Test it with a synthetic outage.
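The mapping can live as data next to the code that sets headers. In this minimal Python sketch, the endpoint paths and the `metadata` numbers are hypothetical; the `content` row reuses the one-hour / ten-minute / one-day policy discussed earlier, and `values` uses `no-store` on the assumption that any stale price or score is a bug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StalenessPolicy:
    """How stale a class of data may be, as Cache-Control directives (seconds)."""
    max_age: int = 0
    stale_while_revalidate: int = 0
    stale_if_error: int = 0
    no_store: bool = False  # for data where any staleness is a bug

    def header(self) -> str:
        if self.no_store:
            return "no-store"
        parts = [f"max-age={self.max_age}"]
        if self.stale_while_revalidate:
            parts.append(f"stale-while-revalidate={self.stale_while_revalidate}")
        if self.stale_if_error:
            parts.append(f"stale-if-error={self.stale_if_error}")
        return ", ".join(parts)


# Hypothetical tolerances for the three data classes.
POLICIES = {
    "values": StalenessPolicy(no_store=True),           # prices, scores
    "content": StalenessPolicy(3600, 600, 86400),       # text, images
    "metadata": StalenessPolicy(86400, 3600, 604800),   # categories, labels
}

# Hypothetical endpoint inventory mapping each path to a data class.
ENDPOINTS = {
    "/api/prices": "values",
    "/api/articles": "content",
    "/api/categories": "metadata",
}


def cache_control_for(path: str) -> str:
    return POLICIES[ENDPOINTS[path]].header()


for path in ENDPOINTS:
    print(path, "->", cache_control_for(path))
```

A synthetic outage test then has a clear expectation per endpoint: anything with a non-zero `stale_if_error` should keep answering while the origin returns 5xx, and everything else should fail visibly.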

In practice, I keep a spreadsheet of endpoints with their cache directives, latency budgets, and fallback behavior. It’s part of the product spec, not the ops runbook. When a founder asks “Why did the app show a 2-hour-old price?” I can point to the policy tradeoff: we chose availability over freshness for that endpoint, so when the origin slowed down, users kept seeing the last cached price. The fix was tuning the origin query, not the cache.

Closing: a concrete next step

This week, audit your app’s most-loaded API endpoint. Check its Cache-Control header. Does it have stale-while-revalidate? stale-if-error? If not, ask your backend team why. If they say “we don’t need it,” test what happens when the origin takes 2 seconds. That wait is your product’s latency budget, left unspoken. Write it down, set a policy, and own it.
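For the audit itself, a small helper can flag what is missing. This Python sketch is deliberately naive and hypothetical: `parse_cache_control` splits on commas, so it will mis-handle quoted directive values that contain commas. Paste in the header value from `curl -I` output or your browser’s network panel.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: value-or-None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def audit(header: str) -> list:
    """Flag the staleness directives discussed in this post that the header lacks."""
    d = parse_cache_control(header)
    if "no-store" in d:
        return ["no-store: never cached; confirm the data really needs it"]
    findings = []
    for directive in ("stale-while-revalidate", "stale-if-error"):
        if directive not in d:
            findings.append(f"missing {directive}")
    return findings


print(audit("public, max-age=3600"))
print(audit("max-age=3600, stale-while-revalidate=600, stale-if-error=86400"))
```

An empty result is not a verdict that the policy is right, only that it exists; whether 600 or 86400 seconds is acceptable is still the product decision.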
