Scaling with Continuous Deployment. Web 2.0 Expo, New York, NY, September 29, 2010. Brett G. Durrett (@bdurrett), Vice President Engineering & Operations, IMVU, Inc.
Scaling Continuous Deployment at IMVU

May 13, 2015

Brett Durrett

Scaling with Continuous Deployment as presented at the Web 2.0 Expo, New York, September 29, 2010
Transcript
Page 1: Scaling Continuous Deployment at IMVU

Scaling with Continuous Deployment

Web 2.0 Expo

New York, NY, September 29, 2010

Brett G. Durrett (@bdurrett)

Vice President Engineering & Operations, IMVU, Inc.

Page 2: Scaling Continuous Deployment at IMVU

An online community where members use 3D avatars to meet new people, chat, create and have fun with their friends

Page 3: Scaling Continuous Deployment at IMVU

Who is my audience?

Mix of engineering / product?

Page 4: Scaling Continuous Deployment at IMVU

Who is my audience?

Mix of engineering / product?

How many from a startup?

Page 5: Scaling Continuous Deployment at IMVU

Who is my audience?

Mix of engineering / product?

How many from a startup?

How many believe iterating on your product is critical to the success of your business?

Page 6: Scaling Continuous Deployment at IMVU

How quickly can your business iterate?

Page 7: Scaling Continuous Deployment at IMVU

Can I interest you in some Continuous Deployment?

Page 8: Scaling Continuous Deployment at IMVU

In a Nutshell

What is Continuous Deployment?

• Engineer commits code

• 20 minutes later it is live in production

• Repeat for about 50 commits per day

Page 9: Scaling Continuous Deployment at IMVU

Does This Really Work?

“Maybe this is just viable for a single developer … your site will be down. A lot.”

“It seems like the author either has no customers or very understanding customers”

Responses to February 2009 blog posting about Continuous Deployment at IMVU (at the time IMVU had a $12 million run rate)

Page 10: Scaling Continuous Deployment at IMVU

Benefits

• Regressions easy to find, correct

• Releases have zero overhead

• Rapid iteration using real customer metrics

Page 11: Scaling Continuous Deployment at IMVU

Finding and Fixing Problems

• Each release has few changes, 1-3 commits

• Production issues correlate with check-in timestamp

• No overhead to producing a new release to correct issue

Identifying cause takes minutes
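
As an illustration of correlating a production issue with check-in timestamps, here is a minimal Python sketch. The deploy log, the revision names, and the 30-minute window are all hypothetical and are not taken from IMVU's tooling. It only shows why small releases keep the list of suspects short.

```python
from datetime import datetime, timedelta

# Hypothetical deploy log: (revision, time the revision went live).
DEPLOYS = [
    ("r101", datetime(2010, 9, 29, 10, 0)),
    ("r102", datetime(2010, 9, 29, 10, 20)),
    ("r103", datetime(2010, 9, 29, 10, 40)),
    ("r104", datetime(2010, 9, 29, 11, 0)),
]


def suspect_revisions(deploys, incident_start, window=timedelta(minutes=30)):
    """Return revisions that went live within `window` before the incident."""
    earliest = incident_start - window
    return [rev for rev, live_at in deploys
            if earliest <= live_at <= incident_start]


# An alert fires at 10:45, so only revisions live since 10:15 are suspects.
suspects = suspect_revisions(DEPLOYS, datetime(2010, 9, 29, 10, 45))
print(suspects)
```

With 1-3 commits per release, a short window like this usually leaves only a handful of changes to inspect.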

Page 12: Scaling Continuous Deployment at IMVU

CD at IMVU: Simple Overview

1. Local tests pass, engineer commits code

2. Lots and lots of tests run

3. All tests pass?

– No: Revert commit (Blocks)

– Yes: Code deployed to % of servers

4. Metrics good?

– No: Rollback (Blocks)

– Yes: Code deployed to all servers

5. Metrics still good?

– No: Rollback (Blocks)

– Yes: Win!
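
The flow on this slide can be sketched in Python as a chain of gates. This is a rough sketch under assumptions. It assumes the boxes connect in the order tests, partial deploy, metrics check, full deploy, second metrics check. The function name and the three callables are hypothetical stand-ins for the real test cluster and monitoring, which the slides do not describe in detail.

```python
def run_pipeline(tests_pass, partial_metrics_good, full_metrics_good):
    """Walk one commit through the flow and return the steps taken.

    Each argument is a callable returning True or False. They are
    placeholders for the real test run and the two metrics checks.
    """
    steps = ["commit", "run tests"]
    if not tests_pass():
        return steps + ["revert commit"]      # failing tests: revert
    steps.append("deploy to % of servers")
    if not partial_metrics_good():
        return steps + ["rollback"]           # bad metrics on partial deploy
    steps.append("deploy to all servers")
    if not full_metrics_good():
        return steps + ["rollback"]           # bad metrics on full deploy
    return steps + ["win"]


ok = lambda: True
bad = lambda: False
print(run_pipeline(ok, ok, ok))
print(run_pipeline(ok, bad, ok))
```

Each gate returns early, so a failure at one stage never reaches the next one.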

Page 13: Scaling Continuous Deployment at IMVU

CD at IMVU: Detailed Overview

Page 14: Scaling Continuous Deployment at IMVU

Getting Started – Extreme Basics

1. Continuous integration system

2. Production monitoring and alerting

– System performance

– Business metrics

– Trending is nice too

3. Simple deploy / roll-back system

Page 15: Scaling Continuous Deployment at IMVU

Commit to Making Forward Progress

• Require coverage for all new code

• Add coverage for bugs / regressions

• Understand and fix root cause of failures

Page 16: Scaling Continuous Deployment at IMVU

Expect Some Hurdles

• Production outages

• New overhead

– Tests

– Build systems

• Production outages

• Frustration

• Production outages

(but well worth it)

Page 17: Scaling Continuous Deployment at IMVU

Dealing with SQL

Problems

• Difficult to roll-back schema

• Alter statements lock / impact customers

Solutions

• New schema has formal review process

• No alter on large tables, create new table

– Copy on read

– Complete migration with background job
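
The "create new table, copy on read, finish with a background job" approach might look roughly like the Python sketch below. It uses the standard-library sqlite3 module purely so the example is self-contained. The table names, the added locale column and its default value, and the batch size are assumptions made for illustration. The slides do not show IMVU's actual schema or migration code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_old (id INTEGER PRIMARY KEY, name TEXT);
    -- The new table carries the schema change instead of an ALTER.
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, locale TEXT);
    INSERT INTO users_old VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
""")


def copy_row(user_id):
    """Copy one row into the new table unless it is already there."""
    conn.execute(
        "INSERT OR IGNORE INTO users_new (id, name, locale) "
        "SELECT id, name, 'en' FROM users_old WHERE id = ?",
        (user_id,),
    )


def read_user(user_id):
    """Copy on read: migrate the row the first time it is requested."""
    copy_row(user_id)
    return conn.execute(
        "SELECT id, name, locale FROM users_new WHERE id = ?", (user_id,)
    ).fetchone()


def migrate_batch(limit=2):
    """One background-job step: copy up to `limit` unmigrated rows."""
    rows = conn.execute(
        "SELECT id FROM users_old "
        "WHERE id NOT IN (SELECT id FROM users_new) ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    for (user_id,) in rows:
        copy_row(user_id)
    return len(rows)


first_read = read_user(2)    # row 2 is migrated on demand
while migrate_batch():       # the background job copies the rest
    pass
```

Small batches keep each step short, so no single statement holds a long lock the way an ALTER on a large table would.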

Page 18: Scaling Continuous Deployment at IMVU

Big Features

• Developed on trunk, not branch

– “hidden” from customers by A/B experiment

– 100% control, add QA to experiment

• Deployed daily during development

• Slow roll-out by increasing experiment %

– Experiment closed = fully launched
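
One way to hide a feature behind an experiment and raise its percentage over time is stable hash bucketing, sketched below in Python. The slides do not describe IMVU's experiment system, so the function names, the SHA-256 bucketing, and the QA allow-list are all assumptions.

```python
import hashlib


def bucket(experiment, user_id):
    """Map a user to a stable bucket in [0, 100) for this experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def feature_enabled(experiment, user_id, rollout_percent, qa_users=frozenset()):
    """Return True if this user should see the feature."""
    if user_id in qa_users:          # QA is added to the experiment directly
        return True
    return bucket(experiment, user_id) < rollout_percent


# 0% hides the feature from customers while QA can still exercise it.
print(feature_enabled("new_shop", 42, 0))
print(feature_enabled("new_shop", 42, 0, qa_users={42}))
# 100% is the equivalent of closing the experiment: fully launched.
print(feature_enabled("new_shop", 42, 100))
```

Because a user's bucket never changes, raising the percentage only adds users. Nobody who already has the feature loses it during the roll-out.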

Page 19: Scaling Continuous Deployment at IMVU

Test Speed

Slow tests are a burden to scaling

• Can’t run all tests in sandbox

• Faster to debug on build cluster

If possible…

• Keep tests fast

• Keep tests specific

Page 20: Scaling Continuous Deployment at IMVU

The cost of failing tests

As the team grows…

• More likely to have test failures

• More people blocked as a result

Intermittent failures very bad

Eliminate the root cause
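
A first step toward eliminating an intermittent failure is confirming that the test really is intermittent. The Python sketch below reruns a test and labels the result. The helper name, the run count, and the deliberately flaky example test are hypothetical and are not part of IMVU's test runner.

```python
import itertools


def classify(test, runs=20):
    """Run `test` repeatedly and label it by how often it passes."""
    passes = sum(1 for _ in range(runs) if test())
    if passes == runs:
        return "stable pass"
    if passes == 0:
        return "stable fail"
    return f"intermittent ({passes}/{runs} passed)"


_calls = itertools.count()


def flaky_test():
    # Fails on every fifth call. This stands in for a test that depends
    # on timing or on state left behind by another test.
    return next(_calls) % 5 != 4


result = classify(flaky_test)
print(result)
```

Rerunning only detects the problem. The slide's point is that the fix is to remove the root cause, not to retry until the build goes green.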

Page 21: Scaling Continuous Deployment at IMVU

Other Issues

• Won’t catch issues that fail slowly

– SELECT * FROM growing_table WHERE 1

• Some critical areas cause hard lock-ups

– MySQL

– Memcached

• Lack of test coverage of older code

– Not an issue if you start with test coverage

Page 22: Scaling Continuous Deployment at IMVU

Does Continuous Deployment Scale?

• Technical staff ~50 people

• 10 million monthly unique visitors

• Peak ~130K concurrent IM client logins

• It’s a real business!

– $40 million run rate

– Profitable and doubled revenue in 2009

Page 23: Scaling Continuous Deployment at IMVU

Newer Scaling Challenges

Biggest challenges come with growth of the engineering organization

Page 24: Scaling Continuous Deployment at IMVU

SLA for Build Systems

Build systems are a critical service

Page 25: Scaling Continuous Deployment at IMVU

SLA for Build Systems

Build systems are a critical service

Run them that way

Page 26: Scaling Continuous Deployment at IMVU

Build and Push Times

Page 27: Scaling Continuous Deployment at IMVU

Overall Availability

Page 28: Scaling Continuous Deployment at IMVU

http://www.flickr.com/photos/onebigchickenman/4869442019/

Page 29: Scaling Continuous Deployment at IMVU

Build Throughput

• Initial implementation: sequential builds

– Scaled okay to ~20 engineers

– Like trains running every 20 minutes

– One “red” blocks all following builds

• Solution: build isolation

– Enable testing single build without deploy

– “Red” build pulled, allow other builds to pass
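
The difference between sequential builds and isolated builds can be shown with a toy Python model. It is a simplification under stated assumptions. Each build is reduced to a name and a pass/fail flag, and builds are treated as independent of one another. The function names are hypothetical.

```python
def sequential(builds):
    """Sequential model: the first red build blocks everything after it."""
    deployed = []
    for name, passes in builds:
        if not passes:
            break                      # the train stops here
        deployed.append(name)
    return deployed


def isolated(builds):
    """Isolated model: each build is tested alone and red builds are pulled."""
    deployed = [name for name, passes in builds if passes]
    pulled = [name for name, passes in builds if not passes]
    return deployed, pulled


BUILDS = [("b1", True), ("b2", False), ("b3", True), ("b4", True)]
print(sequential(BUILDS))
print(isolated(BUILDS))
```

In the sequential model one red build holds back two good ones. With isolation only the red build waits for a fix.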

Page 30: Scaling Continuous Deployment at IMVU

Web Build Software

• Custom test-file runner with JS GUI

• PHP SimpleTest

• Python's built-in unittest

• Selenium Core with in-house API wrapper

• YUI Test for browser JS unit tests

• Erlang EUnit

• Buildbot

Page 31: Scaling Continuous Deployment at IMVU

Current Systems

• > 15,000 tests

• 86 web build servers

– 62 Linux

– 24 Windows

• ~ 10 minutes on build servers

• Deploy to cluster of ~700 servers

Page 32: Scaling Continuous Deployment at IMVU

Conclusion

• Continuous Deployment is possible!

• Starting earlier is easier - baby steps

• The value of being able to iterate outweighs the challenges

Page 33: Scaling Continuous Deployment at IMVU

Questions?

Page 34: Scaling Continuous Deployment at IMVU

Thank You!

Brett G. Durrett

[email protected]

Twitter: @bdurrett

IMVU recognized as:

Inc. 500

http://bit.ly/dv52wK

Red Herring 100:

http://bit.ly/bbz5Ex

Best Place to Work:

http://bit.ly/aAVdp8

(and we're hiring)

http://www.imvu.com/jobs