This will work automatically with Batcache, as it only instructs CloudFront to change its behaviour, rather than everything.
mattheu
left a comment
Looking to implement something similar I went digging around for any prior art and came across this. But I'm not sure about your implementation here.
TL;DR: I think it would be better to keep maxage a bit shorter, combined with longer swr thresholds.
You're still using the long cache maxage as implemented in #1, and then adding a really short stale-while-revalidate threshold (both the long and short thresholds are pretty short).
But would it not be better to keep the maxage a bit shorter, and increase the swr threshold depending on the type of content?
In this example, some pages are cached for a very long time, e.g. 14 days, and then you have a tiny 60s window in which you're willing to serve stale content. I'm not sure that makes a lot of sense. For this type of thing, why not set maxage to something shorter, say 1 hour, and stale-while-revalidate to 14 days?
In my use case I've been thinking about the following:
- New/dynamic content: maxage of 5 mins, swr of 30s.
- Medium content: maxage of 5 mins, swr of 1 hour.
- Old/static content: maxage of 1 hour, swr of 72 hours.
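For illustration, the three tiers above could be turned into Cache-Control header values roughly like this. This is a minimal sketch in Python, not the plugin's code: the tier names and the `cache_control` helper are hypothetical, and only the lifetimes come from the list.

```python
# Hypothetical tier names; the (max-age, stale-while-revalidate) lifetimes
# in seconds are the ones proposed in the list above.
TIERS = {
    "new": (5 * 60, 30),             # 5 mins, swr of 30s
    "medium": (5 * 60, 60 * 60),     # 5 mins, swr of 1 hour
    "old": (60 * 60, 72 * 60 * 60),  # 1 hour, swr of 72 hours
}


def cache_control(tier: str) -> str:
    """Build a Cache-Control header value for one of the tiers above."""
    max_age, swr = TIERS[tier]
    return f"max-age={max_age}, stale-while-revalidate={swr}"


if __name__ == "__main__":
    for name in TIERS:
        print(name, "->", cache_control(name))
```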
mattheu
left a comment
Also - you're setting both a stale-while-revalidate directive AND a must-revalidate directive. Do these two not conflict?
The behaviour I'm trying to create here is to set normal-ish cache lifetimes, with an swr that covers regeneration time. After maxage, the cache is expired (technically stale), but swr gives us a window in which the cache server will refresh the cached data while still serving the expired content.
The case for maxage = 5 mins + swr = 30s is new, highly-trafficked content. Here we want the content to refresh every 5 mins, but we're protecting the backend from a thundering herd when it expires. 30s covers the page generation time.
The longer maxage = 14 days + swr = 60s is for slower, older content with fewer requests. We give it a slightly larger window since, for example, the backend may not have all the data in the object cache, so page generation may take longer.
You can do it both ways, depending on what you want to achieve. I think this is probably best demonstrated with two examples, one low traffic and one high traffic.
Let's say we have an old post which is low traffic and gets a few requests per day, spread out across the day - let's say 10 req/day. With maxage = 14 days + swr = 60s, the page will be served from the cache every day until the day it expires, when it will miss the cache and hit the backend directly (unless the request is perfectly timed to land in the swr window). Across the 14 days, we'll get 1 backend request, but users will get the fresh content right away. With maxage = 1h + swr = 14 days, the page will be revalidated on almost every request, while users receive the previous content. We'll get 140 backend requests across the 14 days, and users will get stale content basically every time.
Now, let's say we have an old post which is still very popular - let's say 20k req/day (~14/min). With maxage = 14 days + swr = 60s, the page will be served from the cache for almost every user, and the content will still be refreshed when it expires - and due to the higher traffic, there'll be no cache misses. Across 14 days, we'd get 1 backend request. With maxage = 1h + swr = 14 days, the page will be served from the cache for almost every user, but the content will be constantly refreshed, even though it's unlikely to change. Across 14 days, we'd get 336 backend requests, even though it doesn't really need revalidation.
You could of course do something like maxage = 14 days + swr = 2 days, but there are diminishing returns here, since it only affects the long tail of low traffic content - which could get ejected from the cache anyway. You basically need to guess how frequently your content is accessed. I could be convinced that 13 days + 1 day swr might be better here, but it ultimately matters a little less.
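The request counts in these examples can be reproduced with a rough back-of-the-envelope model. The sketch below is an assumption-laden simplification, not a simulation of CloudFront: it assumes requests are spread evenly, that the cache fetches from the backend at most once per maxage window, and that nothing is evicted early. The function name is hypothetical, and it does not distinguish a foreground miss from a background revalidation.

```python
SECONDS_PER_DAY = 86_400


def estimate_backend_requests(requests_per_day: int, max_age_seconds: int, days: int = 14) -> int:
    """Rough estimate of backend fetches over `days` for one cached page.

    The backend is hit at most once per max-age window, and never more
    often than the page is actually requested.
    """
    total_requests = requests_per_day * days
    freshness_windows = max((days * SECONDS_PER_DAY) // max_age_seconds, 1)
    return min(total_requests, freshness_windows)


if __name__ == "__main__":
    fourteen_days = 14 * SECONDS_PER_DAY
    one_hour = 3600
    for traffic in (10, 20_000):
        for max_age in (fourteen_days, one_hour):
            print(traffic, max_age, estimate_backend_requests(traffic, max_age))
```

Under these assumptions, the low-traffic post with a 1 hour maxage is bounded by its own request count, while the popular post is bounded by the number of 1 hour windows in 14 days.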
I think you're right, I hadn't fully tested this. I think CloudFront might ignore the must-revalidate, but per the standard, it seems like must-revalidate should override stale-while-revalidate - which is not what we want.
From some further reading and checking the nginx, Caddy, and Varnish source, none of them use must-revalidate as a signal at all. Firefox and Chrome both give must-revalidate precedence over stale-while-revalidate. This needs some further testing to see if must-revalidate is used in CloudFront - it's possible that must-revalidate only applies to private caches rather than shared caches, but I don't see anything in the spec that would indicate why that would be the case. Edit: Indeed, it looks like section 3.5 permits shared caches to store even with must-revalidate, so I think the header is correct as-is. Still, needs testing.
stale-while-revalidate time is basically added to max-age, so if max-age is 600 and swr is 60, the item will last in CloudFront for 660 seconds. I picked 30/60s here as thresholds; it's diminishing returns past that. It shouldn't take more than 30s to regenerate a request anyway, so for frequently accessed data, it should basically always be served from the cache now.
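As a sketch of the arithmetic above, the lifetime of a cached item can be split into three states by age. This is an illustrative model only: the `cache_state` function and its state names are hypothetical, and the exact boundary handling in CloudFront is an assumption.

```python
def cache_state(age: int, max_age: int, swr: int) -> str:
    """Classify a cached response by its age in seconds.

    - "fresh": served from cache with no backend request
    - "stale-while-revalidate": stale copy served while the cache refreshes it
    - "expired": past max-age + swr, so the request goes to the backend
    """
    if age < max_age:
        return "fresh"
    if age < max_age + swr:
        return "stale-while-revalidate"
    return "expired"


if __name__ == "__main__":
    # max-age of 600 and swr of 60: the item is usable for 660 seconds in total.
    for age in (0, 599, 600, 659, 660):
        print(age, cache_state(age, max_age=600, swr=60))
```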