October 29–November 3, 2017 | San Francisco, CA | www.usenix.org/lisa17 #lisa17

Resiliency Testing with Toxiproxy

Jake Pittis

Reasoning about failure is hard.

Too many kinds of failures.

Large complex systems.

Constantly changing.

Our intuition is often wrong.

Incident: A natural failure in production.

A database writer goes down.

Incident

Root cause?

Ship fix!

More Resilient

Gameday: Artificially exercising a known failure scenario in production.

Flash sales.

Gameday

What broke?

Ship fix!

Incident

Root cause?

Ship fix!

Resiliency is a product concern.

Automatically Prevent Regression

Accessible to All Developers

Lower Customer Impact

Maintain Authenticity

Chaos Engineering: Running experiments in production to cause and fix unknown failure scenarios.

“Automate Experiments to Run Continuously”

Automatically Prevent Regression

“Minimize Blast Radius”

Lower Customer Impact

“Run Experiments in Production”

Maintain Authenticity

Toxiproxy

Development and Test Environment

Inject failures via HTTP API.

Latency of 200 ms.

Blackhole data.

Reject connections.
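The failure modes on these slides correspond to Toxiproxy's HTTP API roughly as sketched below. Each helper only describes a request as a `(method, path, JSON body)` tuple, so any HTTP client can send it to the Toxiproxy server (port 8474 by default). The proxy name `redis` and the toxic names are illustrative assumptions, and mapping "blackhole data" to a `timeout` toxic with `timeout: 0` and "reject connections" to disabling the proxy is an interpretation based on the Toxiproxy README, not something the slides state.

```python
# Sketch: each failure mode from the slides as one Toxiproxy API request.
# Assumptions: toxic names such as "latency_downstream" are arbitrary labels.


def latency(proxy, ms=200):
    """Add `ms` milliseconds of latency to data flowing back from upstream."""
    return ("POST", f"/proxies/{proxy}/toxics", {
        "name": "latency_downstream",
        "type": "latency",
        "stream": "downstream",
        "attributes": {"latency": ms},
    })


def blackhole(proxy):
    """Drop data without closing the connection (timeout toxic, timeout=0)."""
    return ("POST", f"/proxies/{proxy}/toxics", {
        "name": "blackhole_downstream",
        "type": "timeout",
        "stream": "downstream",
        "attributes": {"timeout": 0},
    })


def reject_connections(proxy):
    """Disable the proxy: open connections are closed, new ones are refused."""
    return ("POST", f"/proxies/{proxy}", {"enabled": False})


def undo(proxy, toxic_name=None):
    """Remove one named toxic, or re-enable the proxy if no name is given."""
    if toxic_name:
        return ("DELETE", f"/proxies/{proxy}/toxics/{toxic_name}", None)
    return ("POST", f"/proxies/{proxy}", {"enabled": True})
```

Sending the matching `undo(...)` request (or `POST /reset`, which removes all toxics and re-enables all proxies) after each test keeps one injected failure from leaking into the next test.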

Reactive Testing

Incident

Root cause?

Ship fix!

Does it work?

Regression?

A flash sale takes down redis while a deploy is going out.

Application boot relies on redis!?

Remove the dependency!

Write a Toxiproxy test!

Accessible to All Developers ✅

No Customer Impact ✅

Maintain Authenticity ✅

Automatically Prevent Regression ✅

A few hundred Toxiproxy Tests

All you need is a thin client library.

Java, Node, Python, PHP or write your own!
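Because the server does the work, a client is little more than a wrapper around the HTTP API. As an illustration of "write your own", here is a hypothetical Python client built on the standard library's `urllib.request`. The class and method names are not those of any published client; the endpoints (`/proxies`, `/proxies/{name}`, `/proxies/{name}/toxics`, `/reset`) follow the Toxiproxy README, and the API address `localhost:8474` is assumed.

```python
import json
import urllib.request


class ToxiproxyClient:
    """Minimal, hypothetical Toxiproxy client: one method per API call."""

    def __init__(self, api="http://localhost:8474", opener=urllib.request.urlopen):
        self.api = api.rstrip("/")
        self._opener = opener  # injectable so the client can be tested offline

    def _request(self, method, path, body=None):
        data = None if body is None else json.dumps(body).encode("utf-8")
        req = urllib.request.Request(
            self.api + path,
            data=data,
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with self._opener(req) as resp:
            raw = resp.read()
        # Some endpoints (DELETE, /reset) reply with an empty body.
        return json.loads(raw) if raw else None

    def create_proxy(self, name, listen, upstream):
        return self._request("POST", "/proxies",
                             {"name": name, "listen": listen, "upstream": upstream})

    def add_toxic(self, proxy, toxic_type, attributes, stream="downstream", name=None):
        body = {"type": toxic_type, "stream": stream, "attributes": attributes}
        if name:
            body["name"] = name
        return self._request("POST", f"/proxies/{proxy}/toxics", body)

    def remove_toxic(self, proxy, name):
        return self._request("DELETE", f"/proxies/{proxy}/toxics/{name}")

    def set_enabled(self, proxy, enabled):
        return self._request("POST", f"/proxies/{proxy}", {"enabled": enabled})

    def reset(self):
        # Re-enables every proxy and removes all toxics.
        return self._request("POST", "/reset")
```

For example, `ToxiproxyClient().add_toxic("redis", "latency", {"latency": 200})` would request the 200 ms latency failure described earlier, assuming a proxy named `redis` exists.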

I used it reactively just last week.

Proactive Testing

Resiliency Matrix (axes: Sections, Services)

Gameday

What broke?

Ship fix and Toxiproxy test!

Incident

Root cause?

Ship fix and Toxiproxy test!

Create Resiliency Matrix

Test every intersection.
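A resiliency matrix can be turned directly into tests, one per (section, service) intersection. The section names, service names and expected outcomes below are invented for illustration (the talk's actual matrix is not in this transcript), and `take_down` and `render` are hypothetical hooks you would wire to a Toxiproxy client and to your application's test harness.

```python
from itertools import product

# Hypothetical matrix: rows are sections of the app, columns are services.
SECTIONS = ["storefront", "checkout", "admin"]
SERVICES = ["redis", "mysql", "memcached"]

# Expected outcome for a section while a service is down.
# Cells not listed are assumed to be "pass" (section fully works).
MATRIX = {
    ("storefront", "redis"): "degraded",
    ("checkout", "redis"): "degraded",
    ("checkout", "mysql"): "fail",
    ("admin", "mysql"): "fail",
}


def expected(section, service):
    return MATRIX.get((section, service), "pass")


def intersections():
    """Yield every (section, service, expected outcome) cell of the matrix."""
    for section, service in product(SECTIONS, SERVICES):
        yield section, service, expected(section, service)


def check_cell(section, service, take_down, render):
    """Render `section` with `service` down and compare against the matrix.

    `take_down(service)` must return a context manager (e.g. a helper that
    disables the service's Toxiproxy proxy); `render(section)` must return
    "pass", "degraded" or "fail". Both are hypothetical hooks.
    """
    with take_down(service):
        actual = render(section)
    return actual == expected(section, service), actual
```

In a test suite, `intersections()` would typically feed a parameterized test so that each cell of the matrix reports its own pass or fail result.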

What’s next?

All our applications should have a resiliency matrix.

Integrate Toxiproxy into all our applications by default.

Gameday everything we can’t write Toxiproxy tests for.

Automate the gamedays.

Gamedays

Toxiproxy is open source (github.com/Shopify/toxiproxy).

Go read the README for more information!

Thanks!