Distributed Release Management Deploying etsy.com 40+ times per day Mike Brittain Engineering Director, Etsy @mikebrittain mikebrittain.com/talks

Distributed Release Management

May 08, 2015

Mike Brittain

Full Stack Engineering Meetup in NYC, May 27, 2014.
Transcript
Page 1: Distributed Release Management

Distributed Release Management Deploying etsy.com 40+ times per day

Mike Brittain

Engineering Director, Etsy

@mikebrittain mikebrittain.com/talks

Page 2: Distributed Release Management

1st Day Assignment Put your face on etsy.com/about

Page 3: Distributed Release Management

What I’m showing you tonight is the result of four years of iteration.

Page 4: Distributed Release Management

Small incremental changes to the application
“Dark” features: new classes, methods, controllers
Graphics, stylesheets, templates
Copy/content changes

App deploys

Turning flags on, off, or % ramp up

Config deploys
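A minimal sketch of how a flag that can be on, off, or ramped up by percentage might be evaluated. The flag names, the config layout, and the hashing scheme are all assumptions made for illustration; the slides do not describe Etsy's actual flag implementation.

```python
import hashlib

# Hypothetical flag config: names and values are illustrative only.
FLAGS = {
    "about_page_v2": {"enabled": "on"},
    "old_search": {"enabled": "off"},
    "new_checkout": {"enabled": "rampup", "percent": 25},
}


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def is_enabled(flag_name: str, user_id: str, flags=FLAGS) -> bool:
    """Return True if the flag is on for this user; unknown flags are off."""
    flag = flags.get(flag_name, {"enabled": "off"})
    state = flag["enabled"]
    if state == "on":
        return True
    if state == "rampup":
        # The same user always lands in the same bucket, so raising
        # "percent" only ever adds users to the enabled group.
        return bucket(flag_name, user_id) < flag["percent"]
    return False
```

With this shape, a config deploy only changes the `FLAGS` data, while the "dark" code that checks the flag ships separately in an app deploy.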

Page 5: Distributed Release Management

Latent bugs and security holes
Traffic management, load shedding
Adding and removing infrastructure

Tweaking config flags or releasing patches.

“Operating” the site

Page 6: Distributed Release Management

IRC, #push

Page 7: Distributed Release Management

/topic mbrittain | jgoulah | rsnyder | ekastner

Page 8: Distributed Release Management

/topic mbrittain, jgoulah, rsnyder | ekastner
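One possible reading of the two topics above is that `|` separates deploy batches and commas group people who will deploy together. Under that assumption (the slides do not spell out the format), a topic could be parsed like this; `parse_topic` is a hypothetical helper, not part of any bot shown in the talk.

```python
from typing import List


def parse_topic(topic: str) -> List[List[str]]:
    """Split a #push topic into batches of usernames.

    Assumes "|" separates batches and "," separates people in one batch.
    """
    batches = []
    for chunk in topic.split("|"):
        members = [name.strip() for name in chunk.split(",") if name.strip()]
        if members:
            batches.append(members)
    return batches


before = parse_topic("mbrittain | jgoulah | rsnyder | ekastner")
after = parse_topic("mbrittain, jgoulah, rsnyder | ekastner")
```

Here `before` holds four single-person batches, while `after` holds one batch of three followed by one batch of one.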

Page 9: Distributed Release Management

Keep real people in the loop

Queue, with max batch size of seven.

Automated deployment run by humans
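The queue with a maximum batch size of seven can be sketched as below. Only the limit of seven comes from the slide; the `PushQueue` class, its method names, and the rule of filling the last open batch first are assumptions for illustration.

```python
from collections import deque

MAX_BATCH_SIZE = 7  # from the slide


class PushQueue:
    """Hypothetical deploy queue that groups people into bounded batches."""

    def __init__(self, max_batch_size=MAX_BATCH_SIZE):
        self.max_batch_size = max_batch_size
        self.batches = deque()

    def join(self, user):
        """Add a user to the last batch if it has room, else open a new one.

        Returns the index of the batch the user landed in.
        """
        if not self.batches or len(self.batches[-1]) >= self.max_batch_size:
            self.batches.append([])
        self.batches[-1].append(user)
        return len(self.batches) - 1

    def next_batch(self):
        """Remove and return the batch at the head of the queue."""
        return self.batches.popleft() if self.batches else []
```

Capping the batch keeps each deploy small enough that the people in it can still reason about every change going out.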

Page 10: Distributed Release Management

4 people in this deploy.

“I’ve pushed my changes to master.”

“Everyone has checked in.”

Page 11: Distributed Release Management

Build QA and Pre-prod

Build progress

Status in #push

Git SHA1 for each env.

Date, username, deploy log, changeset, link to dashboard from time of deploy

Page 12: Distributed Release Management
Page 13: Distributed Release Management

Reporting what’s going on in Deployinator, and who triggered it

Status from build cluster

Page 14: Distributed Release Management

Pre-prod (“princess”) has been deployed.

SHA1 of the change
Time it took to deploy
Link to changeset in GitHub
Log of the deploy script
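The four pieces of information in that announcement map naturally onto a small record. This is a sketch only: `DeployReport`, its field names, and the message layout are hypothetical, and the URLs in the usage example are placeholders rather than real Etsy links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployReport:
    """Hypothetical record of one environment's deploy."""

    environment: str
    sha1: str
    duration_seconds: int
    changeset_url: str
    log_url: str

    def status_line(self) -> str:
        """Render a one-line summary suitable for posting in a chat channel."""
        minutes, seconds = divmod(self.duration_seconds, 60)
        return (
            f"{self.environment} deployed {self.sha1[:7]} "
            f"in {minutes}m{seconds:02d}s | "
            f"changeset: {self.changeset_url} | log: {self.log_url}"
        )


report = DeployReport(
    environment="princess",
    sha1="0123456789abcdef0123456789abcdef01234567",  # made-up SHA1
    duration_seconds=125,
    changeset_url="https://example.invalid/compare",  # placeholder
    log_url="https://example.invalid/deploy-log",     # placeholder
)
line = report.status_line()
```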

Page 15: Distributed Release Management

Btw, there are three bots talking in channel at this point. O_o

Page 16: Distributed Release Management

Queuing for next deploy

Humans talk to other humans from time to time.

Page 17: Distributed Release Management

Talking to pushbot.

Pushbot knows some Spanish… because, ya know, why not?

Page 18: Distributed Release Management

Link to test results for CI environment, along with how long the tests took.

Alerting by name.

Page 19: Distributed Release Management

8 minutes have elapsed… We’ve built and tested our release in the CI environment (“QA”).

QA build failed our 5 min. SLA for tests.
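A check like the 5-minute test SLA could be expressed as follows. Only the 5-minute figure comes from the slide; the function name, message wording, and the sample duration are assumptions.

```python
from datetime import timedelta

TEST_SLA = timedelta(minutes=5)  # from the slide


def sla_report(suite: str, elapsed: timedelta, sla: timedelta = TEST_SLA) -> str:
    """Describe whether a test run stayed within its time budget."""
    if elapsed <= sla:
        return f"{suite}: tests finished in {elapsed}, within the {sla} SLA"
    return f"{suite}: tests took {elapsed}, over the {sla} SLA by {elapsed - sla}"


# Hypothetical run time; the slides do not give the actual test duration.
message = sla_report("QA", timedelta(minutes=6, seconds=10))
```

Reporting the overage, rather than failing the deploy outright, matches the slide's framing of the SLA as something the bot announces in channel.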

Page 20: Distributed Release Management

“Try” is our pre-commit testing cluster.

Page 21: Distributed Release Management

Bots help reinforce our values. This is especially helpful for new people on the team.

Page 22: Distributed Release Management
Page 23: Distributed Release Management

Still 8 minutes elapsed… Pre-prod has been deployed and tested.

This ran in parallel with our QA build and tests.
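Running the QA build and the pre-prod deploy side by side is why the clock still reads 8 minutes. A sketch of that shape using Python's standard thread pool follows; the two stage functions are stand-ins that only sleep, and their names and return values are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_and_test_qa(sha1):
    """Stand-in for the QA build and test run."""
    time.sleep(0.1)
    return ("qa", sha1, "tests passed")


def deploy_and_test_preprod(sha1):
    """Stand-in for the pre-prod ("princess") deploy and test run."""
    time.sleep(0.1)
    return ("princess", sha1, "tests passed")


def run_pre_production_stages(sha1):
    """Run both stages concurrently and return their results in submit order."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(build_and_test_qa, sha1),
            pool.submit(deploy_and_test_preprod, sha1),
        ]
        # result() blocks until each stage finishes and re-raises its errors.
        return [future.result() for future in futures]


results = run_pre_production_stages("abc1234")
```

The total wall-clock time is roughly that of the slower stage rather than the sum of both.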

Page 24: Distributed Release Management
Page 25: Distributed Release Management

Cross-traffic: In a separate channel (#config), our app config files were deployed to pre-prod.

Page 26: Distributed Release Management
Page 27: Distributed Release Management
Page 28: Distributed Release Management
Page 29: Distributed Release Management
Page 30: Distributed Release Management

Cross-traffic: Ops team deployed a configuration change.

And, yes… another non-human.

Page 31: Distributed Release Management
Page 32: Distributed Release Management

Code is live.

Link to dashboard.

Page 33: Distributed Release Management
Page 34: Distributed Release Management

13 minutes elapsed… Code is now in production with public traffic.

Page 35: Distributed Release Management

Who committed code in the last deploy? And how many lines did each of them change?
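One way to answer both questions is to total the added and deleted lines per author from `git log --numstat --format=author:%an <old>..<new>` output. The sketch below parses such output from a string; the `author:` prefix is a convention chosen here to make author lines easy to spot, and the sample authors and paths are invented. The slides do not say how Etsy's tooling computes this.

```python
from collections import Counter


def lines_changed_by_author(numstat_output: str) -> Counter:
    """Sum added + deleted lines per author from git numstat output."""
    totals = Counter()
    author = None
    for line in numstat_output.splitlines():
        if line.startswith("author:"):
            author = line[len("author:"):].strip()
            totals[author] += 0  # list committers even with no counted lines
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank lines or anything that is not a numstat row
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # git prints "-" for binary files
        totals[author] += int(added) + int(deleted)
    return totals


# Invented sample in the shape git produces for the format above.
SAMPLE = "\n".join([
    "author:alice", "", "10\t2\tlib/Cart.php", "3\t0\ttemplates/cart.tpl",
    "author:bob", "", "-\t-\timages/logo.png", "1\t1\tcss/site.css",
    "author:alice", "", "0\t5\tlib/Old.php",
])
totals = lines_changed_by_author(SAMPLE)
```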

Page 36: Distributed Release Management
Page 37: Distributed Release Management
Page 38: Distributed Release Management

Handoff for the next deploy.

Page 39: Distributed Release Management

Entire app deploy took 15 minutes.

4 people running the deployment
8 committers
Config deploy and Chef change deployed in parallel.

Page 40: Distributed Release Management

Optimal queue size

Normalized communication

Improved visibility

Historical record is ideal for post-mortems

Organic evolution

Page 41: Distributed Release Management

Hold up the queue (.hold)

Work the issue with the people available in #push

Additional help always available in #sysops

Buddy-system for off-hours deploys

Ops-on-call, dev-on-call

When something goes wrong?

Page 42: Distributed Release Management

25 Million: Items listed
60+ Million: Monthly unique visitors
200: Countries with annual transactions

175+ Committers, everyone deploys

Items by anjaysdesigns, betwixxt, OneStarLeatherGoods, mediumcontrol, TheDesignPallet

Page 43: Distributed Release Management

@mikebrittain

Chart: deployments per day (app code, config files)

Page 44: Distributed Release Management

Start small. (We did.)

Automated tests and production monitoring.

Have a story around maintaining quality.

“We can always go back to the old way.”

Demonstrate value to leadership.

Page 45: Distributed Release Management

Go write your own story.

Page 46: Distributed Release Management

Thank you.

Mike Brittain

Engineering Director, Etsy

@mikebrittain mikebrittain.com/talks