4Developers 2015: Designing for failure - architecting fault-tolerant system - Jakub Derda

Jul 16, 2015

PROIDEA
Transcript
Page 1

Architecting for failure: building fault-tolerant systems

Jakub Derda

Warsaw, 2015

Page 2

‘Tree’ component – overview

Page 3

‘Tree’ component – detailed view

Page 4

‘Tree’ component – detailed view

client

network connection

server

Page 5

‘Tree’ component – detailed view

human factor, software, client library

ISP, protocol stack, network

load balancers, OS, power source

client

network connection

server

Page 6

Your component – detailed view

What is a fault?

Page 7

What is not a fault?

The service is not working on our side*

* Caused by e.g. technical failures, outages, corrupted data, attacks

Page 8

What is a fault?

The real fault is when we don’t deliver value to customers.

Page 9

Delivering value without a working system

Bring your own wine, we’re waiting for a license.

Last election in Poland

Page 10

What is fault tolerance not?

It’s NOT making sure your system never goes down.

It (eventually) will.

Page 11

What is fault tolerance?

It’s making sure that the system can recover quickly and/or that the client is not impacted.

Page 12

How to solve it?

Page 13

Solving – redundancy

Hot/warm replicas

Caches

Geographical distribution, CDNs

Hardware redundancy

Alternative systems and procedures
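
One way to picture replicas and caches working together is a read path that tries each replica in turn and falls back to the last cached value. The following is a minimal sketch, not taken from the talk; `read_with_fallback`, `broken_primary` and `healthy_replica` are hypothetical names invented for illustration.

```python
from typing import Callable, Dict, Sequence


class AllSourcesFailed(Exception):
    """Raised when no replica answered and nothing is cached."""


def read_with_fallback(
    key: str,
    replicas: Sequence[Callable[[str], str]],
    cache: Dict[str, str],
) -> str:
    """Try each replica in order; fall back to the last cached value."""
    for fetch in replicas:
        try:
            value = fetch(key)
        except Exception:
            # This replica is unavailable; move on to the next one.
            continue
        cache[key] = value  # remember the last good answer
        return value
    if key in cache:
        # Possibly stale, but the client still receives a value.
        return cache[key]
    raise AllSourcesFailed(key)


# Hypothetical data sources used only to exercise the sketch.
def broken_primary(key: str) -> str:
    raise ConnectionError("primary is down")


def healthy_replica(key: str) -> str:
    return f"value-of-{key}"


cache: Dict[str, str] = {}
# The primary fails, the replica answers, and the answer is cached.
first = read_with_fallback("user:1", [broken_primary, healthy_replica], cache)
# Every source fails, so the cached value is served instead.
second = read_with_fallback("user:1", [broken_primary, broken_primary], cache)
```

Serving a stale cached value is itself a tradeoff: the client is not impacted by the outage, at the price of consistency.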

Page 14

Solving – design

Stateless

Auditing

Idempotent requests

Uniqueness / randomness

Asynchronous and decoupling

EIPs

Commands, not data

Break the rules
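
Idempotent requests and uniqueness can be combined: if the client attaches a unique, random ID to each command, the server can recognise a retry and avoid applying it twice. The sketch below assumes a hypothetical in-memory `PaymentService`; a real system would need to store processed IDs durably.

```python
import uuid
from typing import Dict


class PaymentService:
    """Hypothetical service that deduplicates commands by request ID."""

    def __init__(self) -> None:
        self.balance = 0
        # request_id -> balance returned for that request
        self._processed: Dict[str, int] = {}

    def deposit(self, request_id: str, amount: int) -> int:
        # A repeated request ID means a retry: return the stored result
        # instead of applying the deposit a second time.
        if request_id in self._processed:
            return self._processed[request_id]
        self.balance += amount
        self._processed[request_id] = self.balance
        return self.balance


service = PaymentService()
request_id = str(uuid.uuid4())  # unique, random ID generated by the client

first = service.deposit(request_id, 100)
retry = service.deposit(request_id, 100)  # e.g. the first response was lost
other = service.deposit(str(uuid.uuid4()), 50)  # a genuinely new command
```

Because the retry is harmless, the client can resend freely after a timeout, which makes recovery from network faults much simpler.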

Page 15

Solving – procedures

Backup creation, cleanup and restore

QA & potential problems

Continuous integration

Deployment

Page 16

Solving – observe

Dive deep, post-mortems

Identify bottlenecks

Observe key metrics

Verify assumptions

Predict traffic
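
Observing a key metric can be as simple as tracking the error rate over the most recent requests and raising an alert when it crosses a threshold. This is an illustrative sketch only; `ErrorRateMonitor`, the window size and the threshold are assumptions, not something the slides specify.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical helper: error rate over the last `window` requests."""

    def __init__(self, window: int, threshold: float) -> None:
        # True means the request failed; old entries drop off automatically.
        self._results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        self._results.append(failed)

    def error_rate(self) -> float:
        if not self._results:
            return 0.0
        return sum(self._results) / len(self._results)

    def should_alert(self) -> bool:
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
for failed in [False] * 8 + [True] * 2:
    monitor.record(failed)
alert_before = monitor.should_alert()  # rate equals the threshold, no alert yet

monitor.record(True)  # the oldest success leaves the window
alert_after = monitor.should_alert()
```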

Page 17

Tradeoffs - simple

1/scope

QUALITY

Page 18

Tradeoffs - real

cost

durability

time

consistency

trust

audit (traceability)

complexity

security

scalability

functionality

stability

reliability

extensibility

performance

maintainability

manageability

Page 19

Summary

Learn to live with crashes

Page 20

Summary

Automate procedures

Page 21

Summary

Don’t be afraid to cross the line

Page 22

Fault tolerance is not a property of a design; it’s a process.

Page 23