Distributed Release Management: Deploying etsy.com 40+ times per day
Mike Brittain
Engineering Director, Etsy
@mikebrittain mikebrittain.com/talks
1st Day Assignment: Put your face on etsy.com/about
What I’m showing you tonight is the result of four years of iteration.
Small incremental changes to the application
“Dark” features: new classes, methods, controllers
Graphics, stylesheets, templates
Copy/content changes
App deploys
Turning flags on, off, or % ramp up
Config deploys
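A flag that can be on, off, or ramped to a percentage of users can be sketched roughly as below. This is a minimal illustration, not Etsy's implementation: the flag names, the `FLAGS` table, and the hash-based bucketing are all assumptions made for the example.

```python
import hashlib

# Hypothetical flag table: each flag is "on", "off", or a percentage (0-100).
FLAGS = {
    "new_checkout": "off",
    "new_search": "on",
    "new_listing_page": 25,  # ramped up to 25% of users
}


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha1(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, flags=FLAGS) -> bool:
    """Return True if the flag is on for this user."""
    setting = flags.get(flag, "off")  # unknown flags stay dark
    if setting == "on":
        return True
    if setting == "off":
        return False
    # Percentage ramp: users whose bucket falls below the threshold see it.
    return bucket(flag, user_id) < setting
```

Because the bucket depends only on the flag name and the user, a user who sees a feature at 25% keeps seeing it as the percentage is raised, and changing the setting is a config deploy rather than an app deploy.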
Latent bugs and security holes
Traffic management, load shedding
Adding and removing infrastructure
Tweaking config flags or releasing patches.
“Operating” the site
IRC, #push
/topic mbrittain | jgoulah | rsnyder | ekastner
/topic mbrittain, jgoulah, rsnyder | ekastner
Keep real people in the loop
Queue, with max batch size of seven.
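The channel topic doubles as the queue: `|` separates batches and `,` separates the people inside one batch. A rough sketch of that bookkeeping follows. The class and method names are hypothetical, and the rule that someone can join an existing batch only while it has room is an assumption based on the batch limit of seven.

```python
MAX_BATCH_SIZE = 7  # from the talk: "max batch size of seven"


class PushQueue:
    """Hypothetical model of the #push topic; batches[0] deploys next."""

    def __init__(self, max_batch_size=MAX_BATCH_SIZE):
        self.max_batch_size = max_batch_size
        self.batches = []  # list of lists of usernames

    def _find(self, user):
        for batch in self.batches:
            if user in batch:
                return batch
        return None

    def join(self, user):
        """Queue up in a new batch at the end of the line."""
        if self._find(user) is not None:
            raise ValueError(f"{user} is already queued")
        self.batches.append([user])

    def join_with(self, user, leader):
        """Join the leader's batch, or start a new one if it is full or gone."""
        if self._find(user) is not None:
            raise ValueError(f"{user} is already queued")
        batch = self._find(leader)
        if batch is None or len(batch) >= self.max_batch_size:
            self.batches.append([user])
        else:
            batch.append(user)

    def done(self):
        """Remove and return the batch that just finished deploying."""
        return self.batches.pop(0) if self.batches else []

    def topic(self):
        """Render the queue in the same shape as the channel topic."""
        return " | ".join(", ".join(batch) for batch in self.batches)
```

For example, `join("mbrittain")`, `join_with("jgoulah", "mbrittain")`, `join_with("rsnyder", "mbrittain")`, then `join("ekastner")` renders as `mbrittain, jgoulah, rsnyder | ekastner`.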
Automated deployment run by humans
4 people in this deploy.
“I’ve pushed my changes to master.”
“Everyone has checked in.”
Build QA and Pre-prod
Build progress
Status in #push
Git SHA1 for each env.
Date, username, deploy log, changeset, link to dashboard from time of deploy
Reporting what’s going on in Deployinator, and who triggered
Status from build cluster
Pre-prod (“princess”) has been deployed.
SHA1 of the change
Time it took to deploy
Link to changeset in GitHub
Log of the deploy script
Btw, there are three bots talking in channel at this point. O_o
Queuing for next deploy
Humans talk to other humans from time to time.
Talking to pushbot.
Pushbot knows some Spanish… because, ya know, why not?
Link to test results for CI environment, along with how long the tests took. Alerting by name.
8 minutes have elapsed… We’ve built and tested our release in the CI environment (“QA”).
QA build failed our 5 min. SLA for tests.
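A bot can make that kind of call with very little logic. The sketch below is only an illustration: the function name and the message wording are invented for the example, and only the 5-minute threshold comes from the talk.

```python
TEST_SLA_SECONDS = 5 * 60  # from the talk: 5 min. SLA for tests


def sla_report(env, elapsed_seconds, sla_seconds=TEST_SLA_SECONDS):
    """Build a (hypothetical) channel message about how long tests took."""
    minutes, seconds = divmod(int(elapsed_seconds), 60)
    took = f"{minutes}m{seconds:02d}s"
    if elapsed_seconds > sla_seconds:
        # Call out the miss in channel so slow tests stay visible.
        return f"{env} tests took {took}, over the {sla_seconds // 60} min. SLA"
    return f"{env} tests took {took}"
```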
“Try” is our pre-commit testing cluster.
Bots help reinforce our values. This is especially helpful for new people on the team.
Still 8 minutes elapsed… Pre-prod has been deployed and tested.
This ran in parallel with our QA build and tests.
Cross-traffic: In a separate channel (#config), our app config files were deployed to pre-prod.
Cross-traffic: Ops team deployed a configuration change.
And, yes… another non-human.
Code is live.
Link to dashboard.
13 minutes elapsed… Code is now in production with public traffic.
Who committed code in the last deploy? And how many lines did each of them change?
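One way to answer both questions is to total the per-file counts that git already reports. The sketch below assumes the text comes from `git log --format='author:%an' --numstat OLD_SHA..NEW_SHA`, where `OLD_SHA` and `NEW_SHA` stand in for the previous and current deploy. The `author:` prefix and the function name are choices made for this example, not what the bot in the talk actually does.

```python
from collections import Counter


def lines_changed_by_author(numstat_log: str) -> Counter:
    """Total added plus deleted lines per author from git numstat output."""
    totals = Counter()
    author = None
    for line in numstat_log.splitlines():
        if line.startswith("author:"):
            # Header line produced by --format='author:%an'.
            author = line[len("author:"):].strip()
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank lines or anything that is not a numstat row
        added, deleted, _path = parts
        # Binary files report "-" for both counts; skip them.
        if added.isdigit() and deleted.isdigit():
            totals[author] += int(added) + int(deleted)
    return totals
```

Calling `most_common()` on the returned `Counter` lists committers from most to fewest lines changed.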
Handoff for the next deploy.
Entire app deploy took 15 minutes.
4 people running the deployment
8 committers
Config deploy and Chef change deployed in parallel.
Optimal queue size
Normalized communication
Improved visibility
Historical record is ideal for post-mortems
Organic evolution
Hold up the queue (.hold)
Work the issue with the people available in #push
Additional help always available in #sysops
Buddy-system for off-hours deploys
Ops-on-call, dev-on-call
When something goes wrong?
25 Million Items listed
60+ Million Monthly unique visitors
200 Countries with annual transactions
175+ Committers, everyone deploys
Items by anjaysdesigns, betwixxt, OneStarLeatherGoods, mediumcontrol, TheDesignPallet
Deployments per day: app code and config files
Start small. (We did.)
Automated tests and production monitoring.
Have a story around maintaining quality.
“We can always go back to the old way.”
Demonstrate value to leadership.
Go write your own story.
Thank you.