Session ID: @RealGeneKim, [email protected] Gene Kim Why Everyone Needs DevOps Now: My Fourteen Year Journey Studying High Performing IT Organizations
Aug 31, 2014
Where Did The High Performers Come From?
Visible Ops: Playbook of High Performers
• The IT Process Institute has been studying high-performing organizations since 1999
• What is common to all the high performers?
• What is different between them and average and low performers?
• How did they become great?
www.ITPI.org
Act 1: IT Ops Fixing Fragile Artifacts
The Product Managers
Act 2: The Developers
IT Ops And Dev At War
Nothing Left For Infosec
The Downward Spiral…
So, CEOs Don’t Trust IT…
• “If IT fails I don't know why… and if IT succeeds I don't know why.”
• “By managing inputs and outputs, I can hold any area of the business accountable – except for IT…”
• “Large investments in IT projects that eventually fail, without warning. And the CIO is the first to say, ‘I told you so.’”
• “I can’t hold IT accountable – IT is way too ‘slippery.’”
Source: Gene Kim 2012
The IT Core Chronic Conflict
• Every IT organization is pressured to simultaneously:
  • Respond more quickly to urgent business needs
  • Provide stable, secure and predictable IT service
Source: The authors acknowledge that Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written extensively on the theory and practice of identifying and resolving core, chronic conflicts.
Every Company Is An IT Company…
• 95% of all capital projects have an IT component…
• 50% of all capital spending is technology-related
We are here…
Where we need to be…
IT is always in the way (again…)
The Urgency Of This Business Problem
“Of the Fortune 500 companies in 1955, 87% are gone...
“In 1958, the Fortune 500 tenure was 61 years; now it’s 18 years…”
–Richard Foster, “Creative Destruction”
Obama campaign’s tech team beat Romney by using opposite strategy—“insourcing.”
Even taken with the software and Web hosting expenses, the Obama campaign spent a seventh of what the Romney campaign spent on digital….
In the end, the deciding factor wasn’t what the Obama campaign spent money on, but what it did with all that money. Insourcing gave the campaign a strategic flexibility that the Romney campaign lacked….
“This is the difference...between a well run professional machine and a gaggle of amateurs....I would be shocked if such a chasm exists next cycle between the parties—these aren’t mistakes to be repeated if you want to do things like win elections.”
http://arstechnica.com/information-technology/2012/11/how-team-obamas-tech-efficiency-left-romney-it-in-dust/
How Team Obama’s tech efficiency left Romney IT in dust
Technologies accelerate business-practice changes
Hired campaign staff engineers from Facebook, Twitter, Google, Microsoft, and technology startups.
“We ran the election 66,000 times every night,” said a senior official, describing the computer simulations the campaign ran to figure out Obama’s odds of winning each swing state. “And every morning we got the spit-out — here are your chances of winning these states. And that is how we allocated resources.”
Surveys used live interviewers, very large sample sizes and very short questionnaires, which focused on vote preference and strength of support, with no more than a handful of additional substantive questions.
The massive scope of its polling effort helped guide the Obama campaign in ways that would be impossible with conventional polling…three-day rolling-average tracking in each state.
Build. Measure. Learn.
http://www.theatlantic.com/technology/archive/2012/11/when-the-nerds-go-marching-in/265325/
http://www.huffingtonpost.com/2012/11/21/obama-campaign-polls-2012_n_2171242.html
http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/
Act 3: There Must Be A Better Way…
Source: John Allspaw
Source: Theo Schlossnagle
Source: John Jenkins, Amazon.com
Who Is Doing DevOps?
• Google, Amazon, Netflix, Etsy, Akamai, Twitter, Facebook, Pinterest …
• BNY Mellon, Bank of America, World Bank, Paychex, Intuit…
• The Gap, Nordstrom, REI, Macy’s, GameStop, Target …
• Portland State University, Seton Hill University, Kansas State University…
• Who else?
High Performing DevOps Teams
• They’re more agile
  • 30x more frequent deployments
  • 8,000x faster lead time than their peers
• They’re more reliable
  • 2x the change success rate
  • 12x faster MTTR
Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
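The measures behind these comparisons (deployment frequency, change success rate, MTTR) can be computed from ordinary deployment and incident records. The following is a minimal sketch under stated assumptions: the record layout, the sample data and the function names are hypothetical and do not come from the survey.

```python
from datetime import datetime, timedelta

# Hypothetical records: each deployment notes whether it succeeded;
# each incident is a (service broke, service restored) pair.
deployments = [
    {"day": "2014-08-01", "succeeded": True},
    {"day": "2014-08-01", "succeeded": False},
    {"day": "2014-08-02", "succeeded": True},
    {"day": "2014-08-04", "succeeded": True},
]
incidents = [
    (datetime(2014, 8, 1, 10, 0), datetime(2014, 8, 1, 10, 30)),
    (datetime(2014, 8, 3, 2, 0), datetime(2014, 8, 3, 3, 30)),
]


def change_success_rate(deploys):
    """Fraction of deployments that succeeded."""
    return sum(d["succeeded"] for d in deploys) / len(deploys)


def mean_time_to_restore(incs):
    """Average time between an outage starting and service being restored."""
    total = sum((end - start for start, end in incs), timedelta())
    return total / len(incs)


def deploys_per_day(deploys, days_observed):
    """Deployment frequency over an observation window given in days."""
    return len(deploys) / days_observed


if __name__ == "__main__":
    print("change success rate:", change_success_rate(deployments))
    print("MTTR:", mean_time_to_restore(incidents))
    print("deploys per day:", deploys_per_day(deployments, 4))
```

Tracking these three numbers over time gives a team its own baseline to compare against, whatever the survey ratios are.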
How Can We Better Sell DevOps?
Eric Passmore, former SVP Global Engineering, AOL (2007)
The Downward Spiral
Operations Sees…
• Fragile applications are prone to failure
• Long time required to figure out “which bit got flipped”
• Detective control is a salesperson
• Too much time required to restore service
• Too much firefighting and unplanned work
• Planned project work cannot complete
• Frustrated customers leave
• Market share goes down
• Business misses Wall Street commitments
• Business makes even larger promises to Wall Street
Dev Sees…
• More urgent, date-driven projects put into the queue
• Even more fragile code put into production
• More releases have increasingly “turbulent installs”
• Release cycles lengthen to amortize “cost of deployments”
• Failing bigger deployments more difficult to diagnose
• Most senior and constrained IT ops resources have less time to fix underlying process problems
• Ever increasing backlog of infrastructure projects that could fix root cause and reduce costs
• Ever increasing amount of tension between IT Ops and Development
These aren’t IT Operations problems…
These are business problems!
Gene Kim, CTO, Tripwire, Inc. (2006)
Anonymous Product Manager / UX (2011)
Anonymous Infosec Officer (2012)
“This book will have a profound effect on IT, just as The Goal did for manufacturing.” –Jez Humble, co-author of Continuous Delivery
“This is the IT swamp draining manual for anyone who is neck deep in alligators.” –Adrian Cockcroft, Cloud Architect at Netflix
“This is The Goal for our decade, and is for any IT professional who wants their life back.” –Charles Betz, IT architect, author “Architecture and Patterns for IT”
The First Way: Flow
• Understand the flow of work
• Always seek to increase flow
• Never unconsciously pass defects downstream
• Never allow local optimization to cause global degradation
• Achieve profound understanding of the system
“Annual business planning sessions can be maddening. They think IT Operations is an ‘all you can eat buffet.’”
–Ben Rockwood, Director Systems Engineering, Joyent
Define The Work and Make It Visible
• Business projects (e.g., new order system)
• Internal IT projects (e.g., configuration management, automation, debt reduction)
• Changes (e.g., deploys, improve database performance)
• Unplanned work (e.g., site down, site impaired)
Questions
• What is your lead time for changes? (i.e., how long does it take to go from “code committed” to “code successfully running in production”)
• How much of that is queue time vs. run time?
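One way to answer both questions is to timestamp each stage a change passes through and split the elapsed time into waiting and working. The sketch below assumes a hypothetical three-stage pipeline with made-up timestamps; the stage names and the `lead_time_breakdown` helper are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for one change. Each stage records when the work
# arrived in its queue, when someone started on it, and when it finished.
stages = [
    {"name": "build",
     "arrived": datetime(2014, 8, 1, 9, 0),
     "started": datetime(2014, 8, 1, 9, 5),
     "finished": datetime(2014, 8, 1, 9, 15)},
    {"name": "test",
     "arrived": datetime(2014, 8, 1, 9, 15),
     "started": datetime(2014, 8, 1, 11, 15),
     "finished": datetime(2014, 8, 1, 11, 45)},
    {"name": "deploy",
     "arrived": datetime(2014, 8, 1, 11, 45),
     "started": datetime(2014, 8, 2, 9, 45),
     "finished": datetime(2014, 8, 2, 10, 0)},
]


def lead_time_breakdown(stages):
    """Return (lead_time, queue_time, run_time) for one change."""
    lead_time = stages[-1]["finished"] - stages[0]["arrived"]
    run_time = sum((s["finished"] - s["started"] for s in stages), timedelta())
    queue_time = lead_time - run_time
    return lead_time, queue_time, run_time


if __name__ == "__main__":
    lead, queue, run = lead_time_breakdown(stages)
    print("lead time:", lead)
    print("queue time:", queue)
    print("run time:", run)
```

In this made-up example the change spends 25 hours in the pipeline but only 55 minutes being worked on, which is the kind of gap the second question is meant to expose.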
Create One Step Environment Creation Process
• Make environments available early in the Development process
• Make sure Dev builds the code and environment at the same time
• Create a common Dev, QA and Production environment creation process
If I had a magic wand, I’d change the Agile sprints and definition of “done”:
“At the end of each sprint, we must have working and shippable code, demonstrated in an environment that resembles production.”
Deploy Smaller Changes, More Frequently *
• Decouple feature releases from code deployments
• Deploy features in a disabled state, using feature flags
• Require all developers check code into trunk daily (at least)
• Practice deploying smaller changes, which dramatically reduces risk and improves MTTR
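Deploying a feature in a disabled state can be as simple as guarding the new code path with a flag lookup. This is a minimal sketch, assuming a hypothetical in-memory flag store and made-up checkout functions; a real system would more likely read flags from configuration or a flag service so they can change without a deploy.

```python
# Hypothetical in-memory flag store. The new code ships with the flag off.
FLAGS = {"new_checkout": False}


def is_enabled(flag, flags=FLAGS):
    # Unknown flags default to off, so unfinished code stays dark.
    return flags.get(flag, False)


def legacy_checkout(cart):
    return {"flow": "legacy", "total": sum(cart)}


def new_checkout(cart):
    return {"flow": "new", "total": sum(cart)}


def checkout(cart, flags=FLAGS):
    # Both code paths are deployed; the flag decides which one is released.
    if is_enabled("new_checkout", flags):
        return new_checkout(cart)
    return legacy_checkout(cart)


if __name__ == "__main__":
    print(checkout([5, 10]))                          # flag off: legacy path
    print(checkout([5, 10], {"new_checkout": True}))  # flag on: new path
```

Because the release is now a flag change rather than a deployment, turning the feature off again is equally cheap, which is one reason small flagged changes tend to help MTTR.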
Breaking The Bottlenecks In The Flow
• Environment creation
• Code deployment
• Test setup and run
• Overly tight architecture
• Development
• Product management
How organizations achieve high performance
• 89% are using infrastructure version control
• 82% are using automated code deployments
Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
Why Dedicated Teams Vs. Shared Services
Leankit Kanban
Blackboard Learn: 2005-Present
Source: David Ashman, Chief Architect, Blackboard, Inc.
Blackboard Learn Building Blocks
Source: David Ashman, Chief Architect, Blackboard, Inc.
The First Way: Outcomes
• Creating single repository for code and environments
• Determinism in the release process
• Consistent Dev, Test and Production environments, all properly built before deployment begins
• A continuous delivery pipeline that can be relied upon and daily Dev code commits
• Free ourselves from the learned behavior of catastrophic deployments
• Decreased lead time
• Reduce deployment times from 6 hours to 45 minutes
• Refactor deployment process that had 1300+ steps spanning 4 weeks
• Faster cycle time and release cadence
The Second Way: Feedback
• Understand and respond to the needs of all customers, internal and external
• Shorten and amplify all feedback loops: stop the line when necessary
• Create quality at the source
• Create and embed knowledge where we need it
Source: John Shook
“We found that when we woke up developers at 2am, defects got fixed faster than ever”
– Patrick Lightbody, CEO, BrowserMob
Require That Devs Manage Their Own Code For 6+ Months
Source: Tom Limoncelli, Google
Test Whether Developers Qualify For IT Operations Resources
• Types/frequency of pager alerts
• Maturity of monitoring
• System architecture review
• Release process
• Defect counts and severity
• Production hygiene
Source: Tom Limoncelli, Google
Return Fragile Services Back To Dev
Source: Tom Limoncelli, Google
Feedback And Situational Awareness
“Having a developer add a monitoring metric shouldn’t feel like a schema change.”
– John Allspaw, SVP Tech Ops, Etsy
Integrating Into Continuous Delivery
• The days of reviewing RFCs in Word docs in change management meetings are over
• Failures must result in automated tests in the continuous deployment pipeline (Release, Config, Change)
• Invite or embed Ops into Dev standups and the scrum teams (“hey, we can sprint and scrums, too!”)
Embed Dev Into IT Ops
• Embed Dev into IT Ops incident escalation process
• Put production monitoring in pre-production environments
• Invite Dev to post-mortems/root cause analysis meetings
• Have Dev and Infosec cross-train IT Operations
• Ensure application monitoring/metrics to aid in Ops and Infosec work (e.g., incident/problem management)
What’s In It For Infosec And QA?
The Second Way: Outcomes
• Defects and security issues getting fixed faster than ever
• Standardized and reusable Ops and Infosec user stories now part of the Agile process
• All groups communicating and coordinating better
• Everybody is getting more work done
The Third Way: Continual Experimentation And Learning
• Foster a culture that rewards:
  • Experimentation (taking risks) and learning from failure
  • Repetition is the prerequisite to mastery
• Why?
  • You need a culture that keeps pushing into the danger zone
  • And have the habits that enable you to survive in the danger zone
Break Things Early And Often
“Do painful things more frequently, so you can make it less painful… We don’t get pushback from Dev, because they know it makes rollouts smoother.”
– Adrian Cockcroft, Architect, Netflix
Inject Failures Often
You Don’t Choose Chaos Monkey…Chaos Monkey Chooses You
Break Things Before Production
• Enforce consistency in code, environments and configurations across the environments
• Add your ASSERTs to find misconfigurations, enforce https, etc.
• Add static code analysis to automated continuous integration and testing process
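The ASSERT idea above can be sketched as a check that runs in the pipeline before anything reaches production. The example assumes hypothetical settings dictionaries and a naming convention in which URL settings end in `_url`; neither comes from the talk.

```python
from urllib.parse import urlparse


def find_misconfigurations(config, reference):
    """Return a list of problems; an empty list means the config passes."""
    problems = []
    # Consistency: every setting in the reference environment must exist here.
    for key in sorted(set(reference) - set(config)):
        problems.append(f"missing setting: {key}")
    # Enforce https on every URL setting (assumed to be keys ending in "_url").
    for key, value in sorted(config.items()):
        if key.endswith("_url") and isinstance(value, str):
            if urlparse(value).scheme != "https":
                problems.append(f"{key} must use https")
    return problems


if __name__ == "__main__":
    # Hypothetical environments, for illustration only.
    production = {
        "api_url": "https://api.example.com",
        "db_host": "db.internal",
        "cache_host": "cache.internal",
    }
    staging = {"api_url": "http://api.staging.example.com", "db_host": "db1"}
    for problem in find_misconfigurations(staging, production):
        print(problem)
```

A continuous integration job could run `assert not find_misconfigurations(staging, production)` so that a drifted or insecure environment fails the build rather than the deployment.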
Reduce Technical Debt
• “The deal with engineering goes like this. Product management takes 20% of the capacity right off the top and gives this to engineering to spend as they see fit. Whatever is required to avoid, ‘we need to stop features to rewrite code.’
“If you’re in really bad shape today, you might need to make this 30% or even more of the resources. I get nervous when I find teams that think they can get away with much less than 20%.”
– Marty Cagan, Inspired
Allocate 20% Of Cycles To Technical Debt Reduction
Recognize Compounding Technical Debt…
That Gets Worse…
And Fixing It…
Source: Pingdom
An Innovation Culture
“By installing a rampant innovation culture, they now do 165 experiments in the three months of tax season.
Our business result? Conversion rate of the website is up 50 percent. Employee result? Everyone loves it, because now their ideas can make it to market.”
–Scott Cook, Intuit Founder
Convergence And Evolution Of Ideas
• Four Steps To The Epiphany, Steven Blank (2005)
• Principles Of Product Development Flow: Second Generation Lean Product Development, Donald Reinertsen (2009)
• Lean Startup, Eric Ries (2011)
• Lean UX, Jeff Gothelf (2013)
Performance by DevOps maturity
Organizations that implemented DevOps practices over 12 months ago were 5x more likely to be high performing than organizations that weren’t implementing DevOps at all.
Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
Why Do I Think This Is Important?
The Downward Spiral…
If I Could Wave A Magic Wand, Everyone Will…
• See the suffering downstream, and have confidence that your intuitions and skills can make a profound and positive difference…
• Become conversant with DevOps and recognize the practices when you see them
• Be energized about how practitioners can contribute in this organizational journey
• Leave with some concrete steps to get some great outcomes
• Help create a team that starts putting DevOps practices into place
If I Could Wave A Magic Wand, Everyone Will…
• Become conversant with DevOps and recognize the practices when you see them
• Be energized about how practitioners can contribute in this organizational journey
• Leave with some concrete steps to get some great outcomes
• Become a part of a team that starts putting DevOps practices into place
“Some books you give to friends, for the joy of sharing a great novel.
“Some books you recommend to your colleagues and employees, to create common ground.
“Some books you share with your boss, to plant the seeds of a big idea.
“The Phoenix Project is all three.”
–Jeremiah Shirk, Integration & Infrastructure Manager at Kansas State University
Our Mission: Positively Impact The Lives Of One Million IT Workers By 2017
• Free 170-page excerpt: http://itrevolution.com/the-phoenix-project-excerpt/
• http://slideshare.net/realgenekim
• DevOps Defensive Audit Toolkit
• Enterprise DevOps Case Studies
• Early draft of upcoming “DevOps Cookbook” (Allspaw, Debois, Edwards, Humble, Kim, Orzen)
• Email me at [email protected]