Polishing your cache with Varnish
David Smalley, Co-Founder of Litmus
I’m David Smalley - co-founder of Litmus
Talking about our newest site, Doctype.
http://doctype.com
Doctype is the newest project from Litmus
It’s a web design q&a site
Heavily inspired by Stackoverflow et al.
In fact, when we got in touch with Jeff Atwood, he proposed we join his
Web League of Justice
We knew we’d get big traffic from the stackoverflow affiliation
Most people would be anonymous users
Wanted to avoid embarrassment
Didn’t want to spend time+money on a big cluster pre-emptively
After we made the rails site as efficient as possible we went looking at...
Caching
Most visitors would be anonymous: arriving from a search engine, browsing and leaving
Hopefully lots of spidering traffic as a result of good quality, fresh content and our association with Stackoverflow
Wikipedia set a good standard for caching
"Squid cache servers handle about 78% of requests, almost all which are made by viewers who are not logged in to the site.
During load surges from media mentions, the Squids handle almost all of the traffic."
http://meta.wikimedia.org/wiki/Cache_strategy
<Read quote>
Old quote from Wikimedia’s meta wiki, from 2005-ish
Caching
We all know about Rails caching
- page caching
- action caching
- fragment caching
Page caching
- Best caching for the anonymous access strategy
- Page gets rendered once and written to disk
- All subsequent requests get served by the web server from disk
- Problem is we have anonymous AND non-anonymous users
- Can't distinguish between the two with page caching
- ALSO: each server in a future cluster would have to look after its own cache. A cache clear would lead to the page being re-rendered and recached on each app server
Action Caching
- Caches output of an action to a Rails cache store
- Lets us run filters etc. first, so we can distinguish between logged in/out
- Can use memcached so things are only cached once amongst the cluster
- Still has to hit Rails and process before serving the cached content
- Potentially still runs all the queries you have in your controller
Fragment Caching
- Cache bits of the page into the Rails cache store
- Would be good to cache the post-markdown processed questions/answers
- Still runs all the queries in the controller
- Still hitting Rails
Not happy with the options I went back to Wikipedia
"Squid cache servers handle about 78% of requests..."
http://meta.wikimedia.org/wiki/Cache_strategy
So I went and researched reverse proxy caches and came across Varnish
“Varnish is a reverse Web accelerator designed for content-heavy dynamic web sites. In contrast to other HTTP accelerators, many of which began life as client-side proxies or origin servers, Varnish was designed from the ground up as an accelerator for incoming traffic.”
Used by search.twitter.com, hulu.com, wikia.com
As I dug deeper I found a Ruby library, klarlack, that handles cache purging via the Varnish CLI interface
Klarlack basically means varnish in German
Advantages of a reverse proxy cache server
Can load balance between app servers
Only caches things once across the cluster
Can use logic to determine how and when to cache and serve from cache
“There are only two hard things in Computer Science: cache invalidation and naming things”
- Phil Karlton
We’ve all heard this quote before
But we had a few things in our favour
Cache sweepers + a good ruby library for communicating with varnish
Will fit right in with the way we normally handle our caches in Rails
We also had advantages in Doctype
Comments
Answers
Questions
Simple object model
Questions, Answers & Comments
Everything basically centres around a question page. Change a question, an answer or a comment, and you just purge the question page it relates to
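As a rough sketch of that idea in plain Ruby: the three Struct classes below are hypothetical stand-ins for the real models, and both the `commentable` association and the `/questions/:id` URL scheme are assumptions made for illustration, not details taken from Doctype.

```ruby
# Hypothetical stand-ins for the three models; the real app would use ActiveRecord.
Question = Struct.new(:id)
Answer   = Struct.new(:id, :question)
Comment  = Struct.new(:id, :commentable) # commentable: a Question or an Answer (assumption)

# Walk up from any record to the question that owns it.
def owning_question(record)
  case record
  when Question then record
  when Answer   then record.question
  when Comment  then owning_question(record.commentable)
  else raise ArgumentError, "no owning question for #{record.class}"
  end
end

# Path of the single page to purge when the record changes.
# The /questions/:id scheme is assumed.
def purge_path_for(record)
  "/questions/#{owning_question(record).id}"
end

question = Question.new(42)
answer   = Answer.new(7, question)
comment  = Comment.new(3, answer)
puts purge_path_for(comment)
```

Whatever changed, the result is always one path, which is what keeps invalidation manageable here.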
Cache Sweeping
With Rails cache sweepers and the klarlack library in hand, I wrote a plugin as some glue between the two
MDF
Imaginatively, on the varnish/wood theme I called it MDF
YAML file holds details of the cache servers and which port they are running the varnish CLI
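A config file along those lines might look something like the following. The key names, addresses and per-environment layout are all assumptions for illustration; the real MDF file may be shaped differently. Port 6082 is only a commonly used Varnish management port.

```ruby
require "yaml"

# Hypothetical file contents; key names and addresses are assumptions.
CONFIG_YAML = <<~YAML
  production:
    servers:
      - host: 10.0.0.1
        port: 6082
      - host: 10.0.0.2
        port: 6082
  development:
    servers: []
YAML

# Returns "host:port" strings for every cache server in the given environment.
def varnish_servers(yaml_text, environment)
  config = YAML.safe_load(yaml_text)
  config.fetch(environment).fetch("servers").map do |server|
    "#{server.fetch('host')}:#{server.fetch('port')}"
  end
end

p varnish_servers(CONFIG_YAML, "production")
```

Leaving the development list empty means purges become a no-op when no Varnish is running locally.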
The plugin then basically calls the purge command against each of the servers listed in the YAML file. I modified it slightly to include the HTTP host in the purge because my Varnish servers handle a few different sites
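The loop can be sketched roughly as below. `RecordingClient` is a hypothetical stand-in that only records commands; it is not the klarlack API. The purge expression is modelled on the Varnish 2.x CLI and its exact syntax may differ between Varnish versions, so treat the string as illustrative.

```ruby
# Hypothetical stand-in for a Varnish CLI connection. It only records commands;
# a real client would send them to the management port.
class RecordingClient
  attr_reader :address, :commands

  def initialize(address)
    @address  = address
    @commands = []
  end

  def purge(expression)
    @commands << "purge #{expression}"
  end
end

# Match on host as well as URL, because one Varnish may front several sites.
# Expression syntax is an assumption modelled on the Varnish 2.x CLI.
def purge_expression(host, path)
  %(req.http.host == "#{host}" && req.url ~ "^#{Regexp.escape(path)}$")
end

# Send the same purge to every cache server.
def purge_everywhere(clients, host, path)
  expression = purge_expression(host, path)
  clients.each { |client| client.purge(expression) }
end

clients = ["10.0.0.1:6082", "10.0.0.2:6082"].map { |addr| RecordingClient.new(addr) }
purge_everywhere(clients, "doctype.com", "/questions/42")
clients.each { |client| puts "#{client.address}: #{client.commands.last}" }
```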
Normal looking cache sweeper, just passes the purge path through to the MDF plugin
Rails doesn’t like caching
Just look at the default headers you get back from it
It says everything is private, with a max-age of zero, which means no proxy will cache it
We need to fix this in our code.
I added this method to application_controller
Unless we’re in development mode or someone is logged in, it sets a default cache age of 30 minutes and also sets “public” in the Cache-Control header, which tells proxies they may cache the response.
What is s-maxage?
Basically, it's a max-age directive that only shared caches (proxies like Varnish) listen to, not browser caches. This ensures we retain control over expiry with our backend cache purging antics
You *need* to call cache_control on every action you want to cache. Think carefully before you do this
Using our cache_control method
Throw a call to cache_control into any method we want to cache. With no options itʼll just do the default and set the age to 30 minutes
Using our cache_control method
On some actions we may only want to cache for a short amount of time. Here, as the sphinx index is updated via cron every 5 minutes and doesnʼt tie into a cache sweeper, we set the cache time to 5 minutes.
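A minimal sketch of what such a helper could compute, written as a plain Ruby function so it runs outside Rails. The function name, its keyword arguments and the `max-age=0` directive (added here to keep browsers from holding their own copy) are assumptions; the source only says the method sets "public" and a 30 minute default.

```ruby
DEFAULT_MAX_AGE = 30 * 60 # 30 minutes, in seconds

# Returns the Cache-Control value for a cacheable response, or nil when the
# response should keep the framework's default (uncached) headers.
# In a controller this value would be assigned to the Cache-Control header.
def cache_control_value(logged_in:, development:, max_age: DEFAULT_MAX_AGE)
  return nil if development || logged_in

  # s-maxage applies to shared caches only; max-age=0 is an assumption.
  "public, max-age=0, s-maxage=#{max_age}"
end

# Default: 30 minutes for an anonymous visitor.
puts cache_control_value(logged_in: false, development: false)

# Search results: match the 5 minute index refresh.
puts cache_control_value(logged_in: false, development: false, max_age: 5 * 60)
```

Returning nil for logged-in users means their responses are never marked public, so the proxy has no reason to store them.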
Back to Varnish config...
How do we make varnish do anonymous only caching?
Logged in users have a user_credentials cookie.
As weʼre using authlogic, any logged in users have a user_credentials cookie so letʼs differentiate on that
but....
Caches donʼt like cookies
If Varnish sees a cookie in the request then it won’t cache. That’s for safety, to ensure you don’t cache a user’s private data
however....
Varnish can meddle with the request - we know when cookies are needed and when they are not, so we can create a varnish config that handles that correctly
Snippet of varnish config
If itʼs an image, css, javascript or an icon - unset any cookies
If the user has a user_credentials cookie - skip the cache
If the user hits one of these urls - skip the cache
If the request isn’t a HEAD or a GET, skip the cache
Otherwise, bin the cookie and check the cache
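Those five rules can be modelled in plain Ruby to make the order of the checks explicit. This is a model of the decision logic, not VCL and not the actual config: the `Request` struct, the file extension list and the uncached URL list are all hypothetical.

```ruby
# Hypothetical request model; only the fields the rules look at.
Request = Struct.new(:method, :path, :cookies, keyword_init: true)

STATIC_ASSET   = /\.(png|gif|jpe?g|css|js|ico)(\?.*)?\z/ # assumed extension list
UNCACHED_PATHS = %r{\A/(login|logout|signup)}            # assumed URL list

# Returns [action, cookies_forwarded]: :pass skips the cache, :lookup checks it.
def route(request)
  cookies = request.cookies
  cookies = {} if request.path.match?(STATIC_ASSET)            # rule 1: static files never need cookies
  return [:pass, cookies] if cookies.key?("user_credentials")  # rule 2: logged-in user
  return [:pass, cookies] if request.path.match?(UNCACHED_PATHS) # rule 3: listed URLs
  return [:pass, cookies] unless %w[GET HEAD].include?(request.method) # rule 4: writes
  [:lookup, {}]                                                 # rule 5: bin the cookie, check the cache
end

anonymous = Request.new(method: "GET", path: "/questions/42", cookies: { "_session" => "abc" })
member    = Request.new(method: "GET", path: "/questions/42", cookies: { "user_credentials" => "xyz" })
p route(anonymous)
p route(member)
```

Stripping cookies from static files first means even logged-in users get images and stylesheets from the cache.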
Snippet of varnish config
This means that users can hit the login page, get a cookie which comes back in response to their POST request (not cached remember) and then once theyʼve got the cookie, theyʼre cache free.
Otherwise, cache with extreme prejudice
I think youʼll find itʼs a bit more complicated than that...
Actually, Varnish config is quite complicated. There’s a vcl_recv section and a vcl_fetch section. The first deals with incoming client requests, the second with the responses coming back from the backend.
Because both our clients and our backend can send cookies, we need to have the cookie filtering block repeated in those two sections.
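The response side can be modelled the same way. The sketch below is plain Ruby rather than VCL, and the function name and header hash are hypothetical; it only illustrates why the filter is needed twice: a backend response that carries a Set-Cookie header would otherwise stop the page from being cached.

```ruby
# Model of the response-side filter: when the request was judged cacheable,
# drop any Set-Cookie header the backend sent so the response can be stored.
def filter_backend_headers(headers, cacheable)
  return headers unless cacheable

  headers.reject { |name, _value| name.casecmp?("Set-Cookie") }
end

backend_headers = {
  "Content-Type"  => "text/html",
  "Cache-Control" => "public, s-maxage=1800",
  "Set-Cookie"    => "_session=abc; path=/"
}

p filter_backend_headers(backend_headers, true)
p filter_backend_headers(backend_headers, false)
```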
Iʼll post my varnish config along with this presentation on my blog
Stats
Actually, I have no real stats to show you.
During peak times and search spidering the site definitely benefits from being cached.
However, in our case we definitely cached prematurely. We had good reason to do so.
We wanted to avoid a hammering when Jeff Atwood announced us on the Stackoverflow blog. But ultimately you don’t need this kind of caching to start out with, unless you’re expecting a big traffic spike from being mentioned in a national newspaper or something.
Final thoughts
One thing I’ve learned in my time running successful commercial websites, and working for two hosting companies, is that caching is not a crutch
If your site is slow and shit before you cache, itʼll be slow, shit and temperamental after you cache.
Badly written sites that cache heavily tend to fall to their knees when the cache is cleared for some reason - the expensive action gets hammered and multiple concurrent requests to recache it are triggered.
Iʼve seen too many people use caching as a way to make a badly written site hobble along
What you should be aiming for is writing an efficient site and then, once the load starts to build, applying caching selectively to maintain the same level of efficiency and throughput.
Not everything has to be cached, focus on your most hit actions.
Different caching strategies for different situations
Varnish is just one of the caching strategies you can look at
Itʼs particularly suited for a site where most of the users are anonymous
For mostly private sites you should be looking at fragment caching using memcached
Further reading:
I was heavily inspired by this presentation: http://www.slideshare.net/schoefmax/caching-with-varnish-1642989
Varnish: http://varnish.projects.linpro.no/
Wikimedia Caching Strategy (a bit old): http://meta.wikimedia.org/wiki/Cache_strategy
Rails Guide to Caching: http://guides.rubyonrails.org/caching_with_rails.html
This talk, and some config snippets, will be posted on my blog: http://davidsmalley.com
Questions?