"GameDay": Achieving resilience through Chaos Engineering
Matt Fellows @matthewfellows · Pete Cohen @petecohen
#AAGameDay #ChaosTesting

GameDay - Agile Australia · 2019-05-20

Jun 07, 2020


Page 2

MATT

Page 3

PETE

Page 4

MATT

Page 5

What is the common thread for these catastrophes?

Page 6

#1 They all combined Technology with People + Process

Page 7

#2 They all had multiple causes

Page 8

Overview

■ Why GameDay exercises?
■ Case Studies
■ How you can run one for yourself

Page 9

Classes of issues

■ Bugs
■ Integration issues
■ Distributed failure
■ The squishy stuff: People + Process

Page 10

Pace layered architecture

[Diagram: layers labelled User Interface, Mobile, API Gateway, Middleware / APIs and Mainframe / DB, with a "Vertical Slice" label]

Page 11

Bug

[Diagram: layers labelled User Interface, Mobile, API Gateway, Middleware / APIs and Mainframe / DB, with a "Vertical Slice" label and a marker labelled "bug"]

Page 12

Integration Issues

[Diagram: layers labelled User Interface, Mobile, API Gateway, Middleware / APIs and Mainframe / DB, with a "Vertical Slice" label and a marker labelled "integration"]

Page 13

Distributed Failures

[Diagram: layers labelled User Interface, Mobile, API Gateway, Middleware / APIs and Mainframe / DB, with a "Vertical Slice" label and a marker labelled "distributed"]

Page 14

Catastrophes

[Diagram: layers labelled User Interface, Mobile, API Gateway, Middleware / APIs and Mainframe / DB, with a "Vertical Slice" label; markers labelled "bug", "integration", "distributed" and "..."; and the labels Customers, Engineers, Call Centre and Public Relations]

Page 15

Classes of issues

■ Bugs
■ Integration issues
■ Distributed failure
■ The squishy stuff: People + Process

Page 16

So how do we avoid becoming front page news?

(the bad kind)

Page 17

Fragility vs Resilience

Page 18

Resilience vs Antifragility

Page 19

Embracing Failure

■ We need to practice failure
■ Software Engineering needs its Fire Drill

Page 20

GameDay

An exercise where we place our systems - technology, people + processes - under stress in order to learn and improve resilience.

Page 21

A GameDay manifesto?

             DR                                  GameDays
Driver       Process                             Continuous Improvement
Approach     Run sheet + requirements            Loose plan + a little chaos
Focus        Infrastructure                      Customer
Who          Operations                          Cross functional, multi-disciplinary team
Assumption   System is built to a robust design  System is hazardous

Page 22

Once you finally start succeeding at agile…

Iterative software development
Independent feature teams
Nimble architectures
Distributed, scalable infrastructure

Page 23

We want to inspire you to give GameDays a go

Page 24

Case Studies

Page 25

Case Study: SEEK & nib

MATT

Page 27

Logistics - how to plan a GameDay

dius.com.au/resources/game-day

■ People and roles to get involved
■ Preparation workshops and planning
■ Templates and checklist
■ Physical space set up

Page 28

Get buy in
Find the right people
Run workshops
Logistical preparation
Run the GameDay
Communicate and act on outcomes

Page 30

Scenario and hypothesis generation workshop

Get a common view of the stack
Decide which broad areas to test
Identify scenarios
Capture hypotheses
Formulate an action plan to set up scenarios
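
One way to record what comes out of this workshop is a small structure per scenario. The sketch below is only an illustration: the `Scenario` class, its field names, the `ready_by_area` helper and the two example scenarios are all assumptions made for this example, not part of the talk or its toolkit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Scenario:
    """One GameDay scenario from the workshop (field names are illustrative)."""
    area: str                        # broad area of the stack being tested
    failure: str                     # what will be broken or put under stress
    hypothesis: str                  # what the team expects to happen
    setup_actions: List[str] = field(default_factory=list)  # action plan to set it up
    observed: Optional[str] = None   # filled in on the day

    def is_ready(self) -> bool:
        """A scenario is runnable once it has a hypothesis and a setup plan."""
        return bool(self.hypothesis.strip()) and bool(self.setup_actions)


def ready_by_area(scenarios: List[Scenario]) -> Dict[str, List[Scenario]]:
    """Group the runnable scenarios by the area they test."""
    grouped: Dict[str, List[Scenario]] = {}
    for scenario in scenarios:
        if scenario.is_ready():
            grouped.setdefault(scenario.area, []).append(scenario)
    return grouped


# Hypothetical examples, not taken from the case studies.
scenarios = [
    Scenario(
        area="API Gateway",
        failure="One API instance starts returning errors",
        hypothesis="The load balancer removes it and customers see no errors",
        setup_actions=["Pick an instance", "Agree who injects the fault"],
    ),
    Scenario(
        area="Mainframe / DB",
        failure="Database responds slowly",
        hypothesis="",  # still to be captured, so not ready to run
    ),
]

for area, items in ready_by_area(scenarios).items():
    print(area, "->", [s.failure for s in items])
```

Writing the hypothesis down before the day matters because it gives the team something concrete to compare the observed behaviour against afterwards.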

Page 33

Post Mortem

MATT

[Diagram: a Load Balancer, four API instances, and four further Load Balancers]

Page 34

Post Mortem

MATT

[Diagram: a Load Balancer, four API instances and four further Load Balancers, with four X marks labelled "No visibility!", four ✅ marks, one more X, and a Release Dashboard]

Page 35

Ingredients for catastrophe

MATT

✓ Introduction of a change to the system
✓ Human error
✓ Missing local controls (tests) to prevent syntax issue
✓ Lack of salient information for operator (monitoring and alerting)
✓ Opportunity to misinterpret data
✓ Distance between expert and operator (process)
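
As an illustration of the "local controls" ingredient, a syntax check can run before a change leaves the author's machine. The slides do not say what kind of file or format was involved, so the JSON format and the `check_config_syntax` helper below are assumptions made purely for this sketch.

```python
import json
from typing import Tuple


def check_config_syntax(text: str) -> Tuple[bool, str]:
    """Return (ok, message) for a JSON config document (JSON is an assumed format)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # Report where parsing failed so the author can fix it locally.
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return True, "ok"


good = '{"upstream": "api", "timeout_seconds": 5}'
bad = '{"upstream": "api", "timeout_seconds": 5,}'  # trailing comma is invalid JSON

print(check_config_syntax(good))
print(check_config_syntax(bad)[0])
```

A check like this only covers one ingredient on the list; it does nothing for monitoring, alerting or the distance between expert and operator.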

Page 36

What did we learn?

■ Just getting teams together to discuss resilience was worthwhile
■ We always found something
■ Our experiments reduced the impact of hindsight bias

PETE

Page 37

What matters:
■ Cross-functional team
■ Planning
■ Open to exposing failure
■ Customer focus
■ Bake it in - do GameDays frequently

What doesn’t matter:
■ Size of team/company
■ Waterfall/Agile
■ Language, technology...

PETE

Page 38

Are GameDays the new hack days?

■ Collaboration
■ Problem solving
■ Creates business value

Page 41

The journey towards automated resilience testing

MATT

Pre-Production:

■ Create local experiments in Docker
■ Manual chaos in integrated environments

Production:

■ Start small!
■ Metrics-driven approach

Chaos Kong

pumba
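
A metrics-driven experiment can be tried out in miniature before any real tooling is involved. The sketch below is an assumption-laden illustration: it uses no Docker, pumba or Chaos Kong, and `FlakyDependency`, `call_with_retries`, `success_rate` and `run_experiment` are hypothetical names. It states a hypothesis as a threshold on one metric (success rate), injects failures, and reports whether the hypothesis held.

```python
import random
from typing import Dict, Optional


class FlakyDependency:
    """Stand-in for a downstream service that fails a set fraction of calls."""

    def __init__(self, failure_rate: float, rng: random.Random) -> None:
        self.failure_rate = failure_rate
        self.rng = rng

    def call(self) -> str:
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return "ok"


def call_with_retries(dep: FlakyDependency, attempts: int) -> Optional[str]:
    """Try the dependency up to `attempts` times; None means every attempt failed."""
    for _ in range(attempts):
        try:
            return dep.call()
        except ConnectionError:
            continue
    return None


def success_rate(dep: FlakyDependency, attempts: int, requests: int = 1000) -> float:
    """Fraction of simulated requests that eventually succeed."""
    ok = sum(1 for _ in range(requests) if call_with_retries(dep, attempts) is not None)
    return ok / requests


def run_experiment(failure_rate: float, attempts: int, threshold: float,
                   seed: int = 42) -> Dict[str, object]:
    """Hypothesis: success rate stays at or above `threshold` under injected failure."""
    rng = random.Random(seed)
    baseline = success_rate(FlakyDependency(0.0, rng), attempts)
    under_chaos = success_rate(FlakyDependency(failure_rate, rng), attempts)
    return {
        "baseline": baseline,
        "under_chaos": under_chaos,
        "hypothesis_holds": under_chaos >= threshold,
    }


# Start small: a modest failure rate, a few retries, one metric.
print(run_experiment(failure_rate=0.2, attempts=3, threshold=0.95))
```

The same shape (measure a baseline, inject a fault, compare against a threshold agreed beforehand) carries over when the fault is injected into containers or an integrated environment instead of a Python object.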

Page 42

Matt Fellows @matthewfellows [email protected]
Pete Cohen @petecohen [email protected]

For links, references, templates and your GameDay toolkit, head to:

dius.com.au/resources/game-day

Thank you!