Embracing Failure (not my life story)

Embracing Failure - chriswu.mechriswu.me/talks/embracing-failure.pdf · (fool me once. shame on you. fool me twice. shame on me.) Postmortems 1. Reconstruct the factual timeline 2.

Jun 01, 2020


dariahiddleston

Setting the Mood
• Understand that failures WILL happen
• Failures are not binary
• Impact determines importance
• Deadlines for fixes are variable


Terminology

• Website
• Production
• Downtime


Monitor Failures


What is Monitoring?
• Graphs. Everywhere.
• Alerts on failures
  • phone calls
  • texts
• Answers: Are we failing?
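The "alerts on failures" idea can be sketched as a threshold on the recent error rate. This is a minimal Python sketch, not the talk's tooling: the window size, threshold, and `alert` callback (a stand-in for a phone call or text) are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque


class ErrorRateAlert:
    """Pages once when the failure rate over the last `window` requests
    exceeds `threshold`, then re-arms after the rate recovers.
    Window, threshold, and the alert callback are hypothetical choices."""

    def __init__(self, window: int, threshold: float,
                 alert: Callable[[float], None]) -> None:
        self.window = window
        self.results: Deque[bool] = deque(maxlen=window)  # True means the request failed
        self.threshold = threshold
        self.alert = alert      # stand-in for a phone call or text
        self.firing = False     # suppresses repeat pages during one incident

    def record(self, failed: bool) -> None:
        self.results.append(failed)
        if len(self.results) < self.window:
            return  # window not full yet; too little data to judge
        rate = sum(self.results) / len(self.results)
        if rate > self.threshold and not self.firing:
            self.firing = True
            self.alert(rate)
        elif rate <= self.threshold:
            self.firing = False  # recovered, so the next spike pages again


# Usage: 4 failures in a 10-request window is a 40% error rate.
pages = []
monitor = ErrorRateAlert(window=10, threshold=0.2, alert=pages.append)
for i in range(10):
    monitor.record(failed=(i % 3 == 0))
print(pages)  # [0.4]
```

The `firing` flag keeps one sustained incident from paging on every request, while still paging again if the error rate drops and then spikes a second time.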


healthcare.gov

• Know when you’re down before CNN does


Postmortems
(Fool me once, shame on you. Fool me twice, shame on me.)


Postmortems

1. Reconstruct the factual timeline

2. Root cause analysis

3. Remediation items
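Step 1, reconstructing the factual timeline, usually means interleaving timestamps from several places (deploy logs, alerts, chat). A minimal Python sketch, assuming each source is already sorted by time; the `Event` fields, source names, and helper functions are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from heapq import merge
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    at: datetime   # when it happened
    source: str    # where we learned it, e.g. "deploys", "pager", "chat"
    what: str      # a plain factual description, no blame or interpretation


def build_timeline(*sources: Iterable[Event]) -> List[Event]:
    """Merge per-source event streams (each already in time order)
    into one chronological timeline without re-sorting everything."""
    return list(merge(*sources, key=lambda e: e.at))


def render(timeline: List[Event]) -> List[str]:
    """Format the timeline as one line per event for the postmortem doc."""
    return [f"{e.at:%H:%M} [{e.source}] {e.what}" for e in timeline]
```

Keeping the `what` field strictly factual fits the blameless format: the timeline records what happened and when, and the "why" is left for root cause analysis.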


Postmortems

• Why did we fail?
• Blameless
• Moderated


Gamedays
(You wouldn’t wing a talk. Don’t wing a hot fix.)


Gameday

• Best defense is a good offense
• Simulate possible failures
• Do it in production
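One way to simulate possible failures without redeploying is a switch that makes calls to a chosen dependency fail on purpose for the duration of the gameday. This Python sketch is an assumption about how such a switch might look, not the author's tooling; `FaultInjector` and its methods are hypothetical.

```python
import random
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class FaultInjector:
    """Hypothetical gameday switch: while enabled for a dependency,
    calls routed through it fail with the configured probability."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.failure_rates: Dict[str, float] = {}
        self.rng = rng or random.Random()

    def enable(self, dependency: str, rate: float) -> None:
        """Start failing `rate` (0.0 to 1.0) of calls to `dependency`."""
        self.failure_rates[dependency] = rate

    def disable(self, dependency: str) -> None:
        """End the exercise for `dependency`; calls go through normally."""
        self.failure_rates.pop(dependency, None)

    def call(self, dependency: str, fn: Callable[[], T]) -> T:
        """Run `fn`, unless the dice say this call should fail."""
        rate = self.failure_rates.get(dependency, 0.0)
        if self.rng.random() < rate:
            raise ConnectionError(f"gameday: injected failure for {dependency!r}")
        return fn()
```

Scoping the switch to one named dependency, and making it a single call to turn off, keeps the blast radius small when the exercise runs in production.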


kill -9

1. Draw a block diagram

2. Cut every connection

3. Watch the fireworks
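Steps 1 and 2 can be partly rehearsed on paper before anything is killed: model the block diagram as a directed graph, sever each connection in turn, and list which components a request can no longer reach. This Python sketch is a static complement to the exercise, not a replacement for step 3; the component names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def reachable(graph: Dict[str, List[str]], start: str,
              cut: Optional[Edge] = None) -> Set[str]:
    """Breadth-first search from `start`, ignoring the severed edge `cut`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if (node, nxt) == cut or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen


def cut_every_connection(graph: Dict[str, List[str]],
                         start: str) -> Dict[Edge, Set[str]]:
    """For each connection in the diagram, report the components lost when it is cut."""
    everything = reachable(graph, start)
    return {
        (src, dst): everything - reachable(graph, start, cut=(src, dst))
        for src, dsts in graph.items()
        for dst in dsts
    }


# Hypothetical block diagram: web -> app -> {db, cache}
diagram = {"web": ["app"], "app": ["db", "cache"]}
for edge, lost in cut_every_connection(diagram, "web").items():
    print(edge, "->", sorted(lost))
```

Connections whose cut strands many components are the ones to watch most closely when the fireworks start for real.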


SafeMachine
(like a state machine… but safer)


Try, Try, Try Again
• What if we could just retry failures?
• Side effects are the root of all evil
• Safe failures vs. unsafe failures
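The safe/unsafe distinction decides whether a failure can simply be retried. A minimal Python sketch, assuming the caller labels each action: safe actions (no side effects, or side effects that are harmless to repeat) are retried with backoff, while unsafe ones get a single attempt and their failure is surfaced for a human. `run_with_retries` and `UnsafeFailure` are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class UnsafeFailure(Exception):
    """An action with side effects failed; retrying blindly could repeat them."""


def run_with_retries(action: Callable[[], T], safe: bool,
                     attempts: int = 3, base_delay: float = 0.0) -> T:
    """Retry `action` only when it is labelled safe; unsafe actions run once."""
    if not safe:
        try:
            return action()
        except Exception as exc:
            raise UnsafeFailure(f"unsafe action failed, not retrying: {exc}") from exc
    last_error: Optional[Exception] = None
    for attempt in range(max(1, attempts)):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    assert last_error is not None
    raise last_error  # safe action still failing after every attempt
```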


What’s in a SafeMachine

• Actions
• States

START → (compute) → Computed File → (upload) → Uploaded File → (record successful) → END
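The states and actions in the diagram can be written down as an explicit transition table, so an action is only accepted from the state it belongs to. A minimal Python sketch under that reading of the slide; the enum, `TRANSITIONS`, and `apply` are hypothetical, and `record_successful` is an assumed identifier for the "record successful" action.

```python
from enum import Enum
from typing import Dict, Tuple


class State(Enum):
    START = "start"
    COMPUTED_FILE = "computed_file"
    UPLOADED_FILE = "uploaded_file"
    END = "end"


# Each action is legal from exactly one state and leads to exactly one next state.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.START, "compute"): State.COMPUTED_FILE,
    (State.COMPUTED_FILE, "upload"): State.UPLOADED_FILE,
    (State.UPLOADED_FILE, "record_successful"): State.END,
}


def apply(state: State, action: str) -> State:
    """Return the next state, refusing actions that do not belong to `state`."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {state.name}") from None


state = State.START
for action in ("compute", "upload", "record_successful"):
    state = apply(state, action)
print(state.name)  # END
```

Because an illegal action raises instead of silently proceeding, a restarted worker cannot, for example, upload before the file has been computed.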


initialize_succeeded

initialize_failed

initialize_inprogress

computed_succeeded
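The state names above appear to combine a step with a status (`<step>_<status>`). A minimal Python sketch of that reading; the allowed status changes, including retrying from `failed`, are an assumption rather than something the slides spell out, and all names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Status(Enum):
    INPROGRESS = "inprogress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Assumed lifecycle of one step: start it, finish either way, retry on failure.
ALLOWED: Dict[Optional[Status], Set[Status]] = {
    None: {Status.INPROGRESS},                    # step has never run
    Status.INPROGRESS: {Status.SUCCEEDED, Status.FAILED},
    Status.FAILED: {Status.INPROGRESS},           # a retry starts a new attempt
    Status.SUCCEEDED: set(),                      # done; never rerun
}


def state_name(step: str, status: Status) -> str:
    """Build a name such as 'initialize_succeeded'."""
    return f"{step}_{status.value}"


def parse_state(name: str) -> Tuple[str, Status]:
    """Split on the last underscore, so step names may contain underscores."""
    step, _, status = name.rpartition("_")
    return step, Status(status)


def advance(current: Optional[Status], new: Status) -> Status:
    """Refuse status changes outside the assumed lifecycle."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current} to {new}")
    return new
```

One benefit of this reading: recording `inprogress` before doing the work means a crash leaves evidence that a step may have partially run, which is exactly when it matters whether that step is safe to retry.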


[Figure: The Pipeline, START → a1 → a2 → a3 → END]


The Pipeline

START → (Safe) → Computed File → (Unsafe) → Uploaded File → (Safe) → END
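Putting the labels to work: a runner can retry the two safe steps automatically, stop at the unsafe upload so a person can check its side effects, and checkpoint each finished step so a rerun resumes instead of starting over. A minimal Python sketch; the in-memory `done` dict stands in for durable storage, and `Step`, `run_pipeline`, and `NeedsOperator` are hypothetical names.

```python
from typing import Callable, Dict, List, NamedTuple


class Step(NamedTuple):
    name: str
    action: Callable[[], None]
    safe: bool  # True if the step may be retried automatically


class NeedsOperator(Exception):
    """An unsafe step failed; a human must check its side effects before resuming."""


def run_pipeline(steps: List[Step], done: Dict[str, bool], attempts: int = 3) -> None:
    """Run steps in order, skipping any already recorded in `done`,
    so a rerun resumes where the last run stopped."""
    for step in steps:
        if done.get(step.name):
            continue  # finished in an earlier run; never redo it
        tries = max(1, attempts) if step.safe else 1
        for attempt in range(1, tries + 1):
            try:
                step.action()
                break
            except Exception as exc:
                if not step.safe:
                    raise NeedsOperator(f"{step.name} failed: {exc}") from exc
                if attempt == tries:
                    raise  # safe step kept failing; give up
        done[step.name] = True  # checkpoint only after success
```

In a real deployment `done` would need to outlive the process (a database row, for instance); an in-memory dict only resumes within one run of the program.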


Embracing Failure
• Monitor
• Postmortems
• Gamedays (you wouldn’t wing a talk)
• SafeMachine


@chriswu_


Additional resources

• Postmortems - https://codeascraft.com/2012/05/22/blameless-postmortems/

• Gamedays - https://stripe.com/blog/game-day-exercises-at-stripe

• links at the bottom of this post are also great

• Error Tracking - https://getsentry.com/welcome/