FAIL FAST, FAIL OFTEN
Gordon Haff @ghaff, Technology Evangelist
William Henry @ipbabble, DevOps Strategy Lead
13 July 2016
FAILURE
FAILURE
FAILURE
ALSO FAILURE
FAILURES HAVE CONSEQUENCES
THE INESCAPABLE CONCLUSION?
DON’T FAIL
DON’T FAIL
FAIL WELL
Experiment by Peter Skillman, former VP of design at Palm
WHAT HE LEARNED
• Kindergarteners do not spend 15 minutes in a bunch of status transactions trying to figure out who is going to be CEO of Spaghetti Corporation.
• They don’t sit around talking about the problem. They just start building to determine what works and what doesn’t.
SOFTWARE = GREAT MATCH FOR FAILING WELL
FIVE PRINCIPLES: THE RIGHT
• scope
• approach
• workflow
• incentives
• culture
THE RIGHT SCOPE
Constrain the impact of failure
• Enable experimentation
• Stop cascading of failures
• Make deployments incremental, frequent, and routine events
• Generally decouple activities and decisions from each other
• Small, autonomous, bounded context services
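One common way to stop failures from cascading between small services is a circuit breaker. The slides do not name a specific technique, so the following is only an illustrative sketch in Python; `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are hypothetical names chosen for this example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    """Hypothetical helper: stop calling a failing dependency for a while."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down period in seconds
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail immediately instead of piling load on a sick service.
                raise CircuitOpenError("downstream call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

A caller wraps each remote call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever function talks to the other service. Once the breaker opens, callers get a fast, local error instead of waiting on a dependency that is already failing.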
SMALL
• “Two pizza teams”
• Well-defined functional units
• Organized around business capabilities (Conway's Law)
AUTONOMOUS
• Implementation changes can happen independently of other services
• Data and functionality exposed only through service calls over the network
• Designed to be externalizable
• No back-doors
THE RIGHT APPROACH
Continuously experiment, iterate, and improve
• It’s about the process
• Identify mistakes early
• Establish safety nets
• Fail and move on
THE PROCESS
Involves people and communication
• The most effective processes have continuous communication - think Scrum and Kanban
• Allows for collaboration that can identify failures before they happen
• Allows for feedback to continuously improve and cultivate growth
• Provides transparency
DEV LESSONS: BREAKING CODE VIOLENTLY
Build in violent failures to highlight issues
• C/C++ lessons:
• Sanity check using assertions
• Invariant checks
• If ever I’m here in the code and these conditions aren’t met, then I have no business being here. Something is wrong and I should fail violently.
• Involves tracing through the failure
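The slide draws these lessons from C/C++, where `assert()` from `<assert.h>` is the usual tool. The same idea can be sketched in Python with the `assert` statement. The `Account` class below is a hypothetical example invented for illustration, not something from the talk.

```python
class Account:
    """Hypothetical example type used only to illustrate invariant checks."""

    def __init__(self, balance=0):
        self.balance = balance
        self._check_invariant()

    def _check_invariant(self):
        # If this is ever false, something upstream is wrong: fail loudly.
        assert self.balance >= 0, f"invariant violated: balance={self.balance}"

    def withdraw(self, amount):
        # Sanity check on entry: with a non-positive amount we have
        # no business being here, so stop immediately.
        assert amount > 0, f"withdraw called with non-positive amount {amount}"
        if amount > self.balance:
            # An expected, recoverable condition gets a normal exception.
            raise ValueError("insufficient funds")
        self.balance -= amount
        self._check_invariant()
        return self.balance
```

The assertion message and traceback point straight at the broken assumption, which is what makes tracing through the failure practical. Note that Python drops `assert` statements when run with `-O`, much as C drops `assert()` when `NDEBUG` is defined, so these checks suit development and test builds rather than input validation.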
AUTOMATED REGRESSION TESTING
• As products and services evolved, we discovered that maintaining and incrementally adding new tests was valuable
• These tests were/are most often based on experienced failures and bugs
• Scripts were developed to run nightly builds against various developer changes to test for regression
• Testing tools evolved - proprietary and open source
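As a rough illustration of a regression test built from a past bug, here is a minimal sketch using Python's standard `unittest` module. The function `parse_version`, the bug it supposedly once had, and `run_regression_suite` are all assumptions made up for this example.

```python
import unittest


def parse_version(text):
    """Hypothetical function under test: '1.10.2' -> (1, 10, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


class VersionRegressionTests(unittest.TestCase):
    def test_two_digit_component_sorts_numerically(self):
        # Pins an imagined past bug where versions were compared as strings,
        # which made "1.10.0" sort before "1.9.0".
        self.assertLess(parse_version("1.9.0"), parse_version("1.10.0"))

    def test_surrounding_whitespace_is_ignored(self):
        # Pins a second imagined bug report: trailing newline in input.
        self.assertEqual(parse_version(" 2.0.1\n"), (2, 0, 1))


def run_regression_suite():
    """Run the suite the way a nightly job might; True means no regressions."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(VersionRegressionTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Each fixed bug adds one small test, so the suite grows incrementally, and a scheduled job can call `run_regression_suite()` against the latest changes and flag any failure.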
OPS LESSONS: CHAOS MONKEY
Test robustness of recovery using failure
• Platform should provide uninterrupted services to the customer
• Therefore:
• Should always recover in an acceptable amount of time
• We should inject random failures to ensure that changes have not regressed recovery or caused new recovery problems
http://understeer.hatenablog.com/entry/2012/02/29/224629
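The idea of injecting random failures and then verifying recovery can be sketched with a toy, in-memory model. This is not the API of the actual Chaos Monkey tool; `Worker`, `kill_random_worker`, `recover`, and `chaos_round` are hypothetical names, and `recover` merely stands in for whatever recovery mechanism a real platform provides.

```python
import random


class Worker:
    """Hypothetical service instance that can be killed and restarted."""

    def __init__(self, name):
        self.name = name
        self.alive = True


def kill_random_worker(workers, rng):
    """Inject a failure by stopping one randomly chosen live worker."""
    victim = rng.choice([w for w in workers if w.alive])
    victim.alive = False
    return victim


def recover(workers):
    """Stand-in for the platform's recovery logic: restart dead workers."""
    restarted = [w for w in workers if not w.alive]
    for w in restarted:
        w.alive = True
    return restarted


def chaos_round(workers, rng):
    """One experiment: break something, recover, then verify full health."""
    victim = kill_random_worker(workers, rng)
    recover(workers)
    return victim, all(w.alive for w in workers)
```

Running `chaos_round(pool, random.Random())` repeatedly after each change gives a simple signal: if the health check ever comes back false, a change has broken recovery. A real setup would also time the recovery step against the acceptable limit.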
THE RIGHT WORKFLOW
Repeatably automate for consistency
• Goal is repeatable automation
• Toyota’s yellow cord
• Initially pipelines may be very different
• Different tools
• Traditional vs. “cloud native”
• It’s a journey
• Consolidation evolves naturally
DESIRABLE ENTERPRISE CI/CD WORKFLOW
[Diagram. Labels: myRepo, ProjectRepo, BuildRepo, ReleaseRepo, 3rd Party; CI: Commit, Push, Pass/Fail, Local Test; CD: Build, Test, Review/Appr, Deliver, Deploy; Monitor]
CI/CD PIPELINE TOOLSET
[Diagram. Labels: CI/CD Workflow UI, gerrit]
OPS LESSONS: RED/GREEN
Configuration as code has built-in failure
[Diagram. Labels: Continuous Integration / Continuous Deployment; Image & Package & Metadata Repository; src repo; Dev./Build, QA, Production in OHC; Events]
THE RIGHT INCENTIVES
Align rewards and behavior with desirable outcomes
• Incentives (advancement, money, recognition) need to reward trust, cooperation, and innovation
• Peer reward systems also valuable
• Individuals have control over their own success
• But people still have responsibility for their actions
THE RIGHT CULTURE
Build systems and organizations that allow for failing well
• Transparency
• Even good decisions can have bad outcomes
• Innovation is inherently risky
• Cut losses (avoid the sunk cost fallacy)
This is why open source is so successful!
BUT CULTURE ISN’T SOMETHING YOU JUST CHANGE
• Lack of an agreed-to model of what the “right” culture looks like
• Different organizations require different behaviors
• Culture change is difficult to measure and quantify
• Culture is very hard to impose
• Culture is an output, not an input
CULTURE IS:
• emergent
• pervasive
• the keystone
THANK YOU
CREDITS
Tacoma Narrows Bridge: Barney Elliott, The Camera Shop. Screenshot taken from 16MM Kodachrome motion picture film by Barney Elliott.
Time cover: Time, Inc.
Wipeout: Flickr/CC, https://www.flickr.com/photos/andymorffew/15843725192
Marshmallow challenge: http://marshmallowchallenge.com/Welcome.html
Linux Collaboration Summit: Linux Foundation
Two pizzas: Flickr/CC, https://www.flickr.com/photos/dongkwan/283076601
Frog: Kathy, Flickr/CC, https://flic.kr/p/b9fFV
Square peg: Flickr/CC, https://www.flickr.com/photos/epublicist/3546059144/