Why Everyone Needs DevOps Now: My Fourteen Year Journey Studying High Performing IT Organizations - Gene Kim, Author of The Phoenix Project

Post on 31-Aug-2014

DESCRIPTION

How do great IT organizations simultaneously deliver stellar service levels and a fast flow of new features into production? It requires creating a “super-tribe”, where development, test, IT operations and information security genuinely work together to solve business objectives as opposed to throwing each other under the bus. In this talk, Gene Kim will describe what successful development organization transformations look like, and how they were achieved from a Dev and Ops perspective. Drawing upon a 14 year study of high performing IT organizations, Gene will share the best known methods, recipes and case studies of how to implement successful DevOps-style transformations.

See Gene Kim's Edge Presentation: http://www.akamai.com/html/custconf/edgetv-developers.html#gene-kim

The Akamai Edge Conference is a gathering of the industry revolutionaries who are committed to creating leading edge experiences, realizing the full potential of what is possible in a Faster Forward World. From customer innovation stories, industry panels, technical labs, partner and government forums to Web security and developers' tracks, there’s something for everyone at Edge 2013. Learn more at http://www.akamai.com/edge

Transcript

@RealGeneKim, genek@realgenekim.me

Gene Kim

Why Everyone Needs DevOps Now:

My Fourteen Year Journey Studying High Performing IT Organizations

Where Did The High Performers Come From?

Visible Ops: Playbook of High Performers

• The IT Process Institute has been studying high-performing organizations since 1999

• What is common to all the high performers?

• What is different between them and average and low performers?

• How did they become great?

www.ITPI.org

Act 1: IT Ops Fixing Fragile Artifacts

The Product Managers

Act 2: The Developers

IT Ops And Dev At War

Nothing Left For Infosec

The Downward Spiral…

So, CEOs Don’t Trust IT…

• “If IT fails I don't know why… and if IT succeeds I don't know why.”

• “By managing inputs and outputs, I can hold any area of the business accountable – except for IT…”

• “Large investments in IT projects that eventually fail, without warning. And the CIO is the first to say, ‘I told you so.’”

• “I can’t hold IT accountable – IT is way too ‘slippery.’”

Source: Gene Kim 2012

The IT Core Chronic Conflict

• Every IT organization is pressured to simultaneously:

  • Respond more quickly to urgent business needs

  • Provide stable, secure and predictable IT service

Source: The authors acknowledge that Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written extensively on the theory and practice of identifying and resolving core, chronic conflicts.

Every Company Is An IT Company…

• 95% of all capital projects have an IT component…

• 50% of all capital spending is technology-related

We are here…

Where we need to be…

IT is always in the way (again…)

The Urgency Of This Business Problem

“Of the Fortune 500 companies in 1955, 87% are gone...

“In 1958, the Fortune 500 tenure was 61 years; now it’s 18 years…”

–Richard Foster, “Creative Destruction”

Obama campaign’s tech team beat Romney by using opposite strategy—“insourcing.”

Even taken with the software and Web hosting expenses, the Obama campaign spent a seventh of what the Romney campaign spent on digital….

In the end, the deciding factor wasn’t what the Obama campaign spent money on, but what it did with all that money. Insourcing gave the campaign a strategic flexibility that the Romney campaign lacked….

“This is the difference...between a well run professional machine and a gaggle of amateurs....I would be shocked if such a chasm exists next cycle between the parties—these aren’t mistakes to be repeated if you want to do things like win elections.”

http://arstechnica.com/information-technology/2012/11/how-team-obamas-tech-efficiency-left-romney-it-in-dust/

How Team Obama’s tech efficiency left Romney IT in dust

Technologies accelerate business-practice changes

Hired campaign staff engineers from Facebook, Twitter, Google, Microsoft, and technology startups.

“We ran the election 66,000 times every night,” said a senior official, describing the computer simulations the campaign ran to figure out Obama’s odds of winning each swing state. “And every morning we got the spit-out — here are your chances of winning these states. And that is how we allocated resources.”

Surveys used live interviewers, very large sample sizes and very short questionnaires, which focused on vote preference and strength of support, with no more than a handful of additional substantive questions.

The massive scope of its polling effort helped guide the Obama campaign in ways that would be impossible with conventional polling…three-day rolling-average tracking in each state.

Build. Measure. Learn.

http://www.theatlantic.com/technology/archive/2012/11/when-the-nerds-go-marching-in/265325/

http://www.huffingtonpost.com/2012/11/21/obama-campaign-polls-2012_n_2171242.html

http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/

Act 3: There Must Be A Better Way…

Source: John Allspaw

Source: Theo Schlossnagle

Source: John Jenkins, Amazon.com

Who Is Doing DevOps?

• Google, Amazon, Netflix, Etsy, Akamai, Twitter, Facebook, Pinterest …

• BNY Mellon, Bank of America, World Bank, Paychex, Intuit…

• The Gap, Nordstrom, REI, Macy’s, GameStop, Target …

• Portland State University, Seton Hill University, Kansas State University…

• Who else?

High Performing DevOps Teams

• They’re more agile

  • 30x more frequent deployments

  • 8,000x faster lead time than their peers

• They’re more reliable

  • 2x the change success rate

  • 12x faster MTTR

Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic

How Can We Better Sell DevOps?

Eric Passmore, former SVP Global Engineering, AOL (2007)

The Downward Spiral

Operations Sees…

• Fragile applications are prone to failure

• Long time required to figure out “which bit got flipped”

• Detective control is a salesperson

• Too much time required to restore service

• Too much firefighting and unplanned work

• Planned project work cannot complete

• Frustrated customers leave

• Market share goes down

• Business misses Wall Street commitments

• Business makes even larger promises to Wall Street

Dev Sees…

• More urgent, date-driven projects put into the queue

• Even more fragile code put into production

• More releases have increasingly “turbulent installs”

• Release cycles lengthen to amortize “cost of deployments”

• Failing bigger deployments more difficult to diagnose

• Most senior and constrained IT ops resources have less time to fix underlying process problems

• Ever increasing backlog of infrastructure projects that could fix root cause and reduce costs

• Ever increasing amount of tension between IT Ops and Development

These aren’t IT Operations problems…

These are business problems!

Gene Kim, CTO, Tripwire, Inc. (2006)

Anonymous Product Manager / UX (2011)

Anonymous Infosec Officer (2012)

“This book will have a profound effect on IT, just as The Goal did for manufacturing.” –Jez Humble, co-author of Continuous Delivery

“This is the IT swamp draining manual for anyone who is neck deep in alligators.” –Adrian Cockcroft, Cloud Architect at Netflix

“This is The Goal for our decade, and is for any IT professional who wants their life back.” –Charles Betz, IT architect, author of “Architecture and Patterns for IT”

The First Way: Flow

• Understand the flow of work

• Always seek to increase flow

• Never unconsciously pass defects downstream

• Never allow local optimization to cause global degradation

• Achieve profound understanding of the system

“Annual business planning sessions can be maddening. They think IT Operations is an ‘all you can eat buffet.’”

– Ben Rockwood, Director Systems Engineering, Joyent

Define The Work and Make It Visible

• Business projects (e.g., new order system)

• Internal IT projects (e.g., configuration management, automation, debt reduction)

• Changes (e.g., deploys, improve database performance)

• Unplanned work (e.g., site down, site impaired)

Questions

• What is your lead time for changes? (i.e., how long does it take to go from “code committed” to “code successfully running in production”)

• How much of that is queue time vs. run time?
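One way to start answering both questions is to timestamp each step a change passes through and add up the waiting and the working separately. The sketch below is only an illustration: the `Stage` record, the stage names and the timestamps are all made up, and it assumes the stages are back to back (each one is queued the moment the previous one finishes).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Stage:
    """One step a change passes through (hypothetical record, not from the talk)."""
    name: str
    queued_at: datetime    # work arrived and started waiting
    started_at: datetime   # someone or something began working on it
    finished_at: datetime  # the step completed


def lead_time_breakdown(stages):
    """Return (lead_time, queue_time, run_time) for an ordered list of stages."""
    queue = sum((s.started_at - s.queued_at for s in stages), timedelta())
    run = sum((s.finished_at - s.started_at for s in stages), timedelta())
    lead = stages[-1].finished_at - stages[0].queued_at
    return lead, queue, run


# Invented timestamps for a single change, starting at "code committed".
t0 = datetime(2013, 10, 1, 9, 0)
stages = [
    Stage("build", t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=25)),
    Stage("test", t0 + timedelta(minutes=25), t0 + timedelta(hours=2, minutes=25),
          t0 + timedelta(hours=3, minutes=25)),
    Stage("deploy", t0 + timedelta(hours=3, minutes=25),
          t0 + timedelta(hours=27, minutes=25), t0 + timedelta(hours=28)),
]

lead, queue, run = lead_time_breakdown(stages)
print(f"lead time {lead}, queue time {queue}, run time {run}")
```

In this invented example the change spends far more time waiting than being worked on, which is the kind of finding the questions above are meant to surface.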

Create One Step Environment Creation Process

• Make environments available early in the Development process

• Make sure Dev builds the code and environment at the same time

• Create a common Dev, QA and Production environment creation process

If I had a magic wand, I’d change the Agile sprints and definition of “done”:

“At the end of each sprint, we must have working and shippable code, demonstrated in an environment that resembles production.”

Deploy Smaller Changes, More Frequently

• Decouple feature releases from code deployments

• Deploy features in a disabled state, using feature flags

• Require that all developers check code into trunk daily (at least)

• Practice deploying smaller changes, which dramatically reduces risk and improves MTTR
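A minimal sketch of the feature-flag idea, assuming an in-memory flag store: the `FeatureFlags` class, the `new_discount` flag and the `checkout_total` function are hypothetical names for illustration, and a real system would more likely read flags from configuration or a flag service. The point is that the new code path is deployed switched off, and turning it on (or back off) needs no further deployment.

```python
class FeatureFlags:
    """In-memory flag store (hypothetical; stands in for config or a flag service)."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        # Unknown flags default to off, so newly deployed code stays dark.
        return self._flags.get(name, False)

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False


def checkout_total(prices, flags):
    """Hypothetical feature: a 10% discount path that ships disabled."""
    total = sum(prices)
    if flags.is_enabled("new_discount"):
        return round(total * 0.9, 2)
    return total


flags = FeatureFlags()
before_release = checkout_total([10.0, 20.0], flags)  # deployed, but old behavior
flags.enable("new_discount")                          # the "release": no deploy needed
after_release = checkout_total([10.0, 20.0], flags)
flags.disable("new_discount")                         # turning it off is just as cheap
after_rollback = checkout_total([10.0, 20.0], flags)
```

Because the flag check is the only coupling between deployment and release, each deployment can stay small even when a feature takes many commits to finish.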

Breaking The Bottlenecks In The Flow

• Environment creation

• Code deployment

• Test setup and run

• Overly tight architecture

• Development

• Product management

How organizations achieve high performance

• 89% are using infrastructure version control

• 82% are using automated code deployments

Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic

Why Dedicated Teams Vs. Shared Services

LeanKit Kanban

Blackboard Learn: 2005-Present

Source: David Ashman, Chief Architect, Blackboard, Inc.

Blackboard Learn Building Blocks

Source: David Ashman, Chief Architect, Blackboard, Inc.

The First Way: Outcomes

• Creating a single repository for code and environments

• Determinism in the release process

• Consistent Dev, Test and Production environments, all properly built before deployment begins

• A continuous delivery pipeline that can be relied upon and daily Dev code commits

• Free ourselves from the learned behavior of catastrophic deployments

• Decreased lead time

• Reduce deployment times from 6 hours to 45 minutes

• Refactor deployment process that had 1300+ steps spanning 4 weeks

• Faster cycle time and release cadence

The Second Way: Feedback

• Understand and respond to the needs of all customers, internal and external

• Shorten and amplify all feedback loops: stop the line when necessary

• Create quality at the source

• Create and embed knowledge where we need it

Source: John Shook

“We found that when we woke up developers at 2am, defects got fixed faster than ever”

– Patrick Lightbody, CEO, BrowserMob

Require That Devs Manage Their Own Code For 6+ Months

Source: Tom Limoncelli, Google

Test Whether Developers Qualify For IT Operations Resources

• Types/frequency of pager alerts

• Maturity of monitoring

• System architecture review

• Release process

• Defect counts and severity

• Production hygiene

Source: Tom Limoncelli, Google

Return Fragile Services Back To Dev

Source: Tom Limoncelli, Google

Feedback And Situational Awareness

“Having a developer add a monitoring metric shouldn’t feel like a schema change.”

– John Allspaw, SVP Tech Ops, Etsy
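To make the quote concrete, here is a rough sketch of what "cheap to add" can look like: a counter sent as a single fire-and-forget UDP datagram. The slide names no tool, so the StatsD-style `name:value|c` line format, the local agent address and port, and the metric name are all assumptions made for illustration.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD-style counter line such as 'checkout.completed:1|c'."""
    return f"{name}:{value}|c"


def emit_counter(name, value=1, host="127.0.0.1", port=8125):
    """Send the counter over UDP; host and port assume a local metrics agent."""
    payload = format_counter(name, value).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP is fire-and-forget: the application does not wait on the
        # metrics system and does not fail if nothing is listening.
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # The whole cost to a developer is one line at the call site.
    emit_counter("checkout.completed")
```

Nothing has to be declared ahead of time in this sketch, which is the contrast with a schema change that the quote is drawing.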

Integrating Into Continuous Delivery

• The days of reviewing RFCs in Word docs in change management meetings are over

• Failures must result in automated tests in the continuous deployment pipeline (Release, Config, Change)

• Invite or embed Ops into Dev standups and the scrum teams (“hey, we can sprint and scrum, too!”)

Embed Dev Into IT Ops

• Embed Dev into IT Ops incident escalation process

• Put production monitoring in pre-production environments

• Invite Dev to post-mortems/root cause analysis meetings

• Have Dev and Infosec cross-train IT Operations

• Ensure application monitoring/metrics to aid in Ops and Infosec work (e.g., incident/problem management)

What’s In It For Infosec And QA?

The Second Way: Outcomes

• Defects and security issues getting fixed faster than ever

• Standardized and reusable Ops and Infosec user stories now part of the Agile process

• All groups communicating and coordinating better

• Everybody is getting more work done

The Third Way: Continual Experimentation And Learning

• Foster a culture that rewards:

  • Experimentation (taking risks) and learning from failure

  • Repetition is the prerequisite to mastery

• Why?

  • You need a culture that keeps pushing into the danger zone

  • And have the habits that enable you to survive in the danger zone

Break Things Early And Often

“Do painful things more frequently, so you can make it less painful… We don’t get pushback from Dev, because they know it makes rollouts smoother.”

– Adrian Cockcroft, Architect, Netflix

Inject Failures Often

You Don’t Choose Chaos Monkey…Chaos Monkey Chooses You
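The slides here are titles only, so the following is a loose illustration of injecting failures on purpose, not a description of how Chaos Monkey itself works. It is an in-process toy: the `with_injected_failures` wrapper, the `InjectedFailure` exception and the `fetch_price` stand-in are all hypothetical names.

```python
import random


class InjectedFailure(Exception):
    """Raised on purpose by the wrapper below."""


def with_injected_failures(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of its calls fail deliberately."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def call_with_retry(func, attempts=3):
    """The habit being exercised: callers must tolerate a flaky dependency."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFailure as err:
            last_error = err
    raise last_error


def fetch_price():
    return 42  # stand-in for a real downstream call


# Roughly a third of calls fail; the retrying caller is expected to cope.
flaky_fetch = with_injected_failures(fetch_price, 0.3, random.Random())
```

Running callers against `flaky_fetch` in a test environment shows quickly which of them have no plan for a failing dependency.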

Break Things Before Production

• Enforce consistency in code, environments and configurations across the environments

• Add your ASSERTs to find misconfigurations, enforce https, etc.

• Add static code analysis to automated continuous integration and testing process
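As a sketch of what such ASSERTs might look like, the check below validates a configuration dictionary and fails fast when something is off. The setting names (`base_url`, `db_host`, `debug`, `environment`) and the specific rules are assumptions chosen for illustration; the slide only says to assert on misconfigurations and enforce https.

```python
from urllib.parse import urlparse


def check_config(config):
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for key in ("base_url", "db_host"):
        if not config.get(key):
            problems.append(f"missing required setting: {key}")
    url = config.get("base_url", "")
    if url and urlparse(url).scheme != "https":
        problems.append(f"base_url must use https, got: {url}")
    if config.get("debug") and config.get("environment") == "production":
        problems.append("debug must be off in production")
    return problems


def assert_config(config):
    """Fail fast in CI or at startup instead of failing in production."""
    problems = check_config(config)
    if problems:
        # Raised explicitly so the check still runs under `python -O`,
        # which strips plain `assert` statements.
        raise AssertionError("; ".join(problems))
```

Calling `assert_config` in the pipeline and again at application startup means the same check runs in every environment, which also serves the first bullet above.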

Reduce Technical Debt

• “The deal with engineering goes like this. Product management takes 20% of the capacity right off the top and gives this to engineering to spend as they see fit. Whatever is required to avoid, ‘we need to stop features to rewrite code.’

“If you’re in really bad shape today, you might need to make this 30% or even more of the resources. I get nervous when I find teams that think they can get away with much less than 20%.”

– Marty Cagan, Inspired

Allocate 20% Of Cycles To Technical Debt Reduction

Recognize Compounding Technical Debt…

That Gets Worse…

And Fixing It…

Source: Pingdom

An Innovation Culture

“By installing a rampant innovation culture, they now do 165 experiments in the three months of tax season.

Our business result? Conversion rate of the website is up 50 percent. Employee result? Everyone loves it, because now their ideas can make it to market.”

–Scott Cook, Intuit Founder

Convergence And Evolution Of Ideas

• Four Steps To The Epiphany, Steven Blank (2005)

• Principles Of Product Development Flow: Second Generation Lean Product Development, Donald Reinertsen (2009)

• Lean Startup, Eric Ries (2011)

• Lean UX, Jeff Gothelf (2013)

Performance by DevOps maturity

Organizations that implemented DevOps practices over 12 months ago were 5x more likely to be high performing than organizations that weren’t implementing DevOps at all.

Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic

Why Do I Think This Is Important?

The Downward Spiral…

If I Could Wave A Magic Wand, Everyone Will…

• See the suffering downstream, and have confidence that your intuitions and skills can make a profound and positive difference…

• Become conversant with DevOps and recognize the practices when you see them

• Be energized about how practitioners can contribute in this organizational journey

• Leave with some concrete steps to get some great outcomes

• Help create a team that starts putting DevOps practices into place

If I Could Wave A Magic Wand, Everyone Will…

• Become conversant with DevOps and recognize the practices when you see them

• Be energized about how practitioners can contribute in this organizational journey

• Leave with some concrete steps to get some great outcomes

• Become a part of a team that starts putting DevOps practices into place

“Some books you give to friends, for the joy of sharing a great novel.

“Some books you recommend to your colleagues and employees, to create common ground.

“Some books you share with your boss, to plant the seeds of a big idea.

“The Phoenix Project is all three.”

–Jeremiah Shirk, Integration & Infrastructure Manager at Kansas State University

Our Mission: Positively Impact The Lives Of One Million IT Workers By 2017

• Free 170 page excerpt: http://itrevolution.com/the-phoenix-project-excerpt/

• http://slideshare.net/realgenekim

• DevOps Defensive Audit Toolkit

• Enterprise DevOps Case Studies

• Early draft of upcoming “DevOps Cookbook” (Allspaw, Debois, Edwards, Humble, Kim, Orzen)

• Email me at genek@realgenekim.me
