
The Road to Akka Cluster and Beyond

Sep 08, 2014

Jonas Bonér

Today, the skill of writing distributed applications is both more important and more challenging than ever. With the advent of mobile devices, NoSQL databases, cloud services, etc., you most likely already have a distributed system on your hands, whether you like it or not. Distributed computing is the new norm.

In this talk we will take you on a journey across the distributed computing landscape. We will start by walking through some of the early work in computer architecture, setting the stage for what we are doing today. We then continue through distributed computing, discussing important Impossibility Theorems (FLP, CAP), Consensus Protocols (Raft, HAT, Epidemic Gossip, etc.), and Failure Detection (Accrual, Byzantine, etc.), up to today’s very exciting research in the field, like ACID 2.0 and Disorderly Programming (CRDTs, CALM, etc.).

Along the way we will discuss the decisions and trade-offs that were made when creating Akka Cluster, its theoretical foundation, why it is designed the way it is and what the future holds.
Transcript
Page 1: The Road to Akka Cluster and Beyond

The Road to Akka Cluster and Beyond

Jonas Bonér, CTO Typesafe

@jboner

Page 2: The Road to Akka Cluster and Beyond

The Journey

1. Distributed Systems Are the New Normal
2. Theoretical Models for Computation
3. State of the Art in Distributed Systems
4. The Akka Way

Page 4: The Road to Akka Cluster and Beyond

What is a Distributed System?

And why would you need one?

Page 7: The Road to Akka Cluster and Beyond

Distributed Computing is the New Normal

You already have a distributed system, WHETHER you want it or not

• Mobile
• NoSQL Databases
• Cloud Services
• SQL Replication

Page 9: The Road to Akka Cluster and Beyond

What is the essence of distributed computing?

It’s to try to overcome:
• Information travels at the speed of light
• Independent things fail independently

Page 13: The Road to Akka Cluster and Beyond

Why do we need it?

• Scalability: when you outgrow the resources of a single node
• Availability: providing resilience if one node fails
• Rich stateful clients

Page 15: The Road to Akka Cluster and Beyond

The Problem?

It is still Very Hard

Page 16: The Road to Akka Cluster and Beyond

No Difference Between a Slow Node and a Dead Node

Page 17: The Road to Akka Cluster and Beyond

The network is Inherently Unreliable

Page 26: The Road to Akka Cluster and Beyond

Peter Deutsch’s 8 Fallacies of Distributed Computing

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

Page 28: The Road to Akka Cluster and Beyond

So, yes…

It is still Very Hard

Page 29: The Road to Akka Cluster and Beyond

Graveyard of distributed systems

1. Guaranteed Delivery
2. Synchronous RPC
3. Distributed Objects
4. Distributed Shared Mutable State
5. Serializable Distributed Transactions

Page 30: The Road to Akka Cluster and Beyond

General strategies

Divide & Conquer
• Partition for scale
• Replicate for resilience
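The two strategies can be sketched together in a few lines of Python. This is only an illustration, not how Akka Cluster places data: the helper `owners`, the node names and the "next nodes in list order" replica rule are all assumptions made for the example. Hashing the key picks one partition (scale), and the following nodes hold copies (resilience).

```python
import hashlib
from typing import List


def owners(key: str, nodes: List[str], replicas: int) -> List[str]:
    """Pick the nodes that store `key` (hypothetical helper).

    Partitioning: a hash of the key selects one primary node.
    Replication: the next `replicas - 1` nodes in list order hold copies.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and len(nodes)")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Hypothetical four-node cluster, each key stored on three distinct nodes.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = owners("user-42", nodes, replicas=3)
```

Because the placement depends only on the key and the node list, any node can compute it locally without asking a coordinator.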

Page 31: The Road to Akka Cluster and Beyond

General strategies

Requires Loose Coupling
• Location Transparency
• Asynchronous Communication
• Local State

Page 32: The Road to Akka Cluster and Beyond

Theoretical Models

Page 38: The Road to Akka Cluster and Beyond

Lambda Calculus
Alonzo Church 1930

State
• Immutable state
• Managed through functional application
• Referentially transparent

Order
• β-reduction can be performed in any order
• Normal order
• Applicative order
• Call-by-name order
• Call-by-value order
• Call-by-need order
• Even in parallel

Great foundation for concurrent systems

Page 40: The Road to Akka Cluster and Beyond

Von Neumann machine
John von Neumann 1945

[Diagram: Memory, Control Unit, Arithmetic Logic Unit, Accumulator, Input, Output]

Page 45: The Road to Akka Cluster and Beyond

Von Neumann machine
John von Neumann 1945

State
• Mutable state
• In-place updates

Order
• Total order
• List of instructions
• Array of memory

Concurrency Does Not Work Well

Distribution Does Not Work Well

Page 51: The Road to Akka Cluster and Beyond

Transactions
Jim Gray 1981

State
• Isolation of updates
• Atomicity

Order
• Serializability
• Disorder across transactions
• Illusion of order within transactions

Concurrency Works Well

Distribution Does Not Work Well

Page 52: The Road to Akka Cluster and Beyond

A model for distributed computation should allow explicit reasoning about

1. Concurrency
2. Distribution
3. Mobility

Carlos Varela 2013

Page 58: The Road to Akka Cluster and Beyond

Actors
Carl Hewitt 1973

State
• Share nothing
• Atomicity within the actor using Lambda Calculus

Order
• Async message passing
• Non-determinism in message delivery

Concurrency Works Very Well

Distribution Works Very Well
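To make the actor properties concrete, here is a minimal sketch in plain Python. It is not Akka's API: `CounterActor`, `tell` and the message names are hypothetical, and a thread plus a queue stand in for a real actor runtime. The state is private to the actor, senders only enqueue messages, and the actor handles one message at a time, so no locks are needed around the counter.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: private state, one mailbox, one message at a time."""

    _STOP = object()

    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()
        self._count = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message) -> None:
        """Asynchronous, fire-and-forget send: just enqueue the message."""
        self._mailbox.put(message)

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            if message == "increment":
                self._count += 1
            elif isinstance(message, tuple) and message[0] == "get":
                message[1].put(self._count)  # reply through the sender's queue

    def stop(self) -> None:
        self.tell(self._STOP)
        self._thread.join()


def demo_actor(senders: int = 4, per_sender: int = 250) -> int:
    """Several threads send increments concurrently, then one asks for the total."""
    actor = CounterActor()

    def send_many() -> None:
        for _ in range(per_sender):
            actor.tell("increment")

    threads = [threading.Thread(target=send_many) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    reply: queue.Queue = queue.Queue()
    actor.tell(("get", reply))
    total = reply.get(timeout=5)
    actor.stop()
    return total
```

The interleaving of increments from different senders is non-deterministic, but each message is processed atomically, so the final count is still the number of messages sent.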

Page 59: The Road to Akka Cluster and Beyond

Other interesting models that are suitable for distributed systems

1. Pi Calculus
2. Ambient Calculus
3. Join Calculus

Page 60: The Road to Akka Cluster and Beyond

The State of the Art

Page 61: The Road to Akka Cluster and Beyond

Impossibility Theorems

Page 64: The Road to Akka Cluster and Beyond

FLP: Impossibility of Distributed Consensus with One Faulty Process
Fischer, Lynch & Paterson 1985

“The FLP result shows that in an asynchronous setting, where only one processor might crash, there is no distributed algorithm that solves the consensus problem” - The Paper Trail

Page 67: The Road to Akka Cluster and Beyond

FLP: Impossibility of Distributed Consensus with One Faulty Process
Fischer, Lynch & Paterson 1985

“These results do not show that such problems cannot be ‘solved’ in practice; rather, they point up the need for more refined models of distributed computing” - FLP paper

Page 71: The Road to Akka Cluster and Beyond

CAP Theorem

Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

Conjecture by Eric Brewer 2000
Proof by Lynch & Gilbert 2002

Linearizability is impossible (for a system that must stay available during a network partition)

Page 74: The Road to Akka Cluster and Beyond

Linearizability

“Under linearizable consistency, all operations appear to have executed atomically in an order that is consistent with the global real-time ordering of operations.”
Herlihy & Wing 1991

Less formally: a read will return the last completed write (made on any replica)

Page 82: The Road to Akka Cluster and Beyond

Dissecting CAP

1. Very influential, but very narrow scope
2. “[CAP] has led to confusion and misunderstandings regarding replica consistency, transactional isolation and high availability” - Bailis et al. in HAT paper
3. Linearizability is very often not required
4. Ignores latency, but in practice latency & partitions are deeply related
5. Partitions are rare, so why sacrifice C or A all the time?
6. Not black and white: can be fine-grained and dynamic
7. Read ‘CAP Twelve Years Later’ - Eric Brewer

Page 84: The Road to Akka Cluster and Beyond

Consensus

“The problem of reaching agreement among remote processes is one of the most fundamental problems in distributed computing and is at the core of many algorithms for distributed data processing, distributed file management, and fault-tolerant distributed applications.”
Fischer, Lynch & Paterson 1985

Page 88: The Road to Akka Cluster and Beyond

Consistency models

• Strong
• Weak
• Eventual

Page 89: The Road to Akka Cluster and Beyond

Time & Order

Page 91: The Road to Akka Cluster and Beyond

Last write wins
global clock timestamp

Page 95: The Road to Akka Cluster and Beyond

Lamport clocks
logical clock, causal consistency
Leslie Lamport 1978

1. When a process does work, increment the counter
2. When a process sends a message, include the counter
3. When a message is received, merge the counter (set the counter to max(local, received) + 1)
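The three rules above can be sketched as a small Python class. The class and method names are hypothetical, and treating a send as a unit of work (so it increments before stamping the message) is an assumption following the usual formulation; the slide only says the counter is included.

```python
class LamportClock:
    """Minimal logical clock following the three rules on the slide."""

    def __init__(self) -> None:
        self.counter = 0

    def tick(self) -> int:
        # Rule 1: doing local work increments the counter.
        self.counter += 1
        return self.counter

    def send(self) -> int:
        # Rule 2: the message carries the sender's counter.
        # Assumption: sending itself counts as work, so tick first.
        return self.tick()

    def receive(self, received: int) -> int:
        # Rule 3: merge by taking max(local, received) + 1.
        self.counter = max(self.counter, received) + 1
        return self.counter


a, b = LamportClock(), LamportClock()
a.tick()
a.tick()
stamp = a.send()          # a's counter travels with the message
b_after = b.receive(stamp)
```

After the exchange, `b`'s counter is strictly larger than the stamp it received, so the receive is ordered after the send.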

Page 99: The Road to Akka Cluster and Beyond

Vector clocks
Extends Lamport clocks
Colin Fidge 1988

1. Each node owns and increments its own Lamport Clock
2. Always keep the full history of all increments
3. Merges by calculating the max (monotonic merge)
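A minimal sketch of these rules in Python, representing a vector clock as a plain dictionary from node name to counter. The function names are hypothetical and missing entries are treated as zero, which is an assumption of this sketch. Unlike a single Lamport counter, the vector lets us tell "happened before" apart from "concurrent".

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Rule 1: a node only ever increments its own entry."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Rule 3: entry-wise max, which never moves any counter backwards."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def _leq(a: VectorClock, b: VectorClock) -> bool:
    return all(a.get(n, 0) <= b.get(n, 0) for n in a.keys() | b.keys())


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    return _leq(a, b) and not _leq(b, a)


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    return not _leq(a, b) and not _leq(b, a)


v1 = increment({}, "n1")      # update made on node n1
v2 = increment({}, "n2")      # independent update made on node n2
merged = merge(v1, v2)        # a node that has seen both updates
later = increment(merged, "n1")
```

Here `v1` and `v2` are concurrent (neither saw the other), while both happened before `merged`.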

Page 105: The Road to Akka Cluster and Beyond

Quorum

Strict majority vote

Sloppy partial vote
• Most use R + W > N ⇒ R & W overlap
• If N / 2 + 1 is still alive ⇒ all good
• Most use N = 3
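The overlap rule can be checked mechanically. In this sketch (function names are hypothetical) `quorums_overlap` applies the R + W > N test, and `always_intersect` verifies it by brute force over every possible read and write set, which is only feasible for small N.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest strict majority of n replicas: N / 2 + 1 with integer division."""
    return n // 2 + 1


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees every read set shares a replica with every write set."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute-force check of the same property (small n only)."""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )
```

With the common N = 3, R = 2, W = 2 configuration, any read contacts at least one replica that took part in the latest write.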

Page 106: The Road to Akka Cluster and Beyond

Failure Detection

Page 117: The Road to Akka Cluster and Beyond

Failure detection: Formal model

• Strong completeness: Every crashed process is eventually suspected by every correct process
• Weak completeness: Every crashed process is eventually suspected by some correct process
• Strong accuracy: No correct process is suspected ever (no false positives)
• Weak accuracy: Some correct process is never suspected (some false positives)

Page 124: The Road to Akka Cluster and Beyond

Accrual Failure detector
Hayashibara et al. 2004

• Keeps history of heartbeat statistics
• Decouples monitoring from interpretation
• Calculates a likelihood (phi value) that the process is down
• Not YES or NO
• Takes network hiccups into account

phi = -log10(1 - F(timeSinceLastHeartbeat))
F is the cumulative distribution function of a normal distribution with mean and standard deviation estimated from historical heartbeat inter-arrival times
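The phi formula can be computed directly with the Python standard library. This is a sketch of the formula as stated on the slide, not Akka's implementation: the function name, the `min_std` floor (to avoid dividing by zero when all intervals are identical) and the sample intervals are assumptions. It uses `math.erfc` because 1 - F(t) for a normal distribution equals 0.5 * erfc((t - mean) / (std * sqrt(2))), which stays accurate far into the tail.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def phi(time_since_last_heartbeat: float,
        intervals: Sequence[float],
        min_std: float = 1e-3) -> float:
    """phi = -log10(1 - F(t)), with F the normal CDF fitted to past intervals."""
    mu = mean(intervals)
    sigma = max(pstdev(intervals), min_std)  # assumed floor on the deviation
    z = (time_since_last_heartbeat - mu) / (sigma * math.sqrt(2.0))
    survival = 0.5 * math.erfc(z)            # equals 1 - F(t)
    if survival <= 0.0:
        return math.inf                      # beyond floating-point range
    return -math.log10(survival)


# Hypothetical history: heartbeats arriving roughly once per second.
history = [1.0, 1.1, 0.9, 1.0]
on_time = phi(1.0, history)
late = phi(1.2, history)
very_late = phi(2.0, history)
```

The value grows continuously as the silence gets longer, so the caller decides which threshold counts as "down" instead of the detector answering yes or no.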

Page 135: The Road to Akka Cluster and Beyond

SWIM Failure detector
Das et al. 2002

• Separates cluster dissemination from heartbeats
• Quarantine: suspected ⇒ time window ⇒ faulty
• Delegated heartbeat:

1. Node N picks random member M ⇒ sends PING(M)
2. Then either:
   1. M replies with ACK(M), or
   2. We get a timeout, then
      1. N sends PING(M) to a set of random members Rn
      2. Each Rn sends PING(M) to M
      3. On receipt of ACK(M) Rn forwards it to N
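The delegated heartbeat can be simulated without any networking. In this sketch the function `probe`, the parameter `k` (how many delegates to ask) and the `can_reach(src, dst)` callback that models the network are all hypothetical; a timeout is modelled simply as a message that cannot be delivered.

```python
import random
from typing import Callable, Optional, Sequence


def probe(n: str,
          target: str,
          members: Sequence[str],
          can_reach: Callable[[str, str], bool],
          k: int = 3,
          rng: Optional[random.Random] = None) -> bool:
    """Return True if N obtains an ACK from `target`, directly or via delegates."""
    # Step 1: direct PING; the ACK must also make it back to N.
    if can_reach(n, target) and can_reach(target, n):
        return True
    # Step 2: timeout, so ask up to k other random members to ping on N's behalf.
    candidates = [m for m in members if m not in (n, target)]
    rng = rng or random.Random()
    delegates = rng.sample(candidates, min(k, len(candidates)))
    for r in delegates:
        # N -> Rn (request), Rn -> M (ping), M -> Rn (ack), Rn -> N (forward).
        if (can_reach(n, r) and can_reach(r, target)
                and can_reach(target, r) and can_reach(r, n)):
            return True
    return False


members = ["n", "m", "a", "b"]


def link_broken(src: str, dst: str) -> bool:
    """Only the direct n <-> m link is down; m itself is healthy."""
    return {src, dst} != {"n", "m"}


def m_is_dead(src: str, dst: str) -> bool:
    """Nothing reaches m and nothing comes back from it."""
    return "m" not in (src, dst)
```

With `link_broken`, the indirect path still produces an ACK, so M is not suspected just because one link failed; with `m_is_dead`, every path times out and M becomes a suspect.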

Page 143: The Road to Akka Cluster and Beyond

Byzantine Failure detector
Liskov et al. 1999

Supports misbehaving processes

• Omission failures: Crash failures, failing to receive a request, or failing to send a response
• Commission failures: Processing a request incorrectly, corrupting local state, and/or sending an incorrect or inconsistent response to a request

Very expensive, not practical

Page 144: The Road to Akka Cluster and Beyond

replication

Page 145: The Road to Akka Cluster and Beyond

Types of replication

Active (Push) vs Passive (Pull)

Asynchronous vs Synchronous

Page 146: The Road to Akka Cluster and Beyond

master/slave Replication

Page 147: The Road to Akka Cluster and Beyond

Tree replication

Page 148: The Road to Akka Cluster and Beyond

master/master Replication

Page 149: The Road to Akka Cluster and Beyond

buddy Replication

Page 151: The Road to Akka Cluster and Beyond

Analysis of replication consensus strategies
Ryan Barrett 2009

Page 152: The Road to Akka Cluster and Beyond

Strong Consistency

Page 154: The Road to Akka Cluster and Beyond

Immutability

“Immutability Changes Everything” - Pat Helland

• Immutable Data
• Share Nothing Architecture

Is the path towards TRUE Scalability

Page 156: The Road to Akka Cluster and Beyond

Think In Facts

“The database is a cache of a subset of the log” - Pat Helland

• Never delete data
• Knowledge only grows
• Append-Only Event Log
• Use Event Sourcing and/or CQRS
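A minimal event-sourcing sketch in Python illustrates the idea of thinking in facts. The `Account` class and the `Deposited` and `Withdrawn` events are hypothetical examples, not taken from the talk: state is never updated in place, facts are only appended to a log, and the current balance is derived by replaying that log.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


Event = Union[Deposited, Withdrawn]


class Account:
    """Hypothetical aggregate whose only stored state is an append-only log."""

    def __init__(self) -> None:
        self._log: List[Event] = []

    def deposit(self, amount: int) -> None:
        self._log.append(Deposited(amount))

    def withdraw(self, amount: int) -> None:
        if amount > self.balance():
            raise ValueError("insufficient funds")
        self._log.append(Withdrawn(amount))

    def balance(self) -> int:
        # The balance is a cached view that can always be rebuilt from the log.
        total = 0
        for event in self._log:
            if isinstance(event, Deposited):
                total += event.amount
            else:
                total -= event.amount
        return total

    def events(self) -> Tuple[Event, ...]:
        return tuple(self._log)


account = Account()
account.deposit(100)
account.withdraw(30)
```

Because events are immutable and only appended, the full history stays available, and any read model can be rebuilt from it at a later time.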

Page 159: The Road to Akka Cluster and Beyond

Aggregate Root is the Transactional Boundary

Aggregate Roots can wrap multiple Entities

• Strong Consistency Within Aggregate
• Eventual Consistency Between Aggregates

No limit to scalability

Page 160: The Road to Akka Cluster and Beyond

Distributed Transactions Strike Back

Page 165: The Road to Akka Cluster and Beyond

Highly Available Transactions
Peter Bailis et al. 2013

HAT NOT CAP

Executive Summary
• Most SQL DBs do not provide Serializability, but weaker guarantees, for performance reasons (including Oracle 11g)
• Some weaker transaction guarantees are possible to implement in a HA manner
• What transaction semantics can be provided with HA?

Page 167: The Road to Akka Cluster and Beyond

HAT

Unavailable
• Serializable
• Snapshot Isolation
• Repeatable Read
• Cursor Stability
• etc.

Highly Available
• Read Committed
• Read Uncommitted
• Read Your Writes
• Monotonic Atomic View
• Monotonic Read/Write
• etc.

Page 171: The Road to Akka Cluster and Beyond

Other Scalable or Highly Available Transactional Research

• Bolt-On Consistency: Bailis et al. 2013
• Calvin: Thomson et al. 2012
• Spanner (Google): Corbett et al. 2012

Page 172: The Road to Akka Cluster and Beyond

Consensus Protocols

Page 179: The Road to Akka Cluster and Beyond

Specification

Events
1. Request(v)
2. Decide(v)

Properties
1. Termination: every process eventually decides on a value v
2. Validity: if a process decides v, then v was proposed by some process
3. Integrity: no process decides twice
4. Agreement: no two correct processes decide differently
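
The four properties can be read as predicates over a recorded run. The sketch below assumes a hypothetical `Run` trace type (not part of any library), treats every process as correct, and checks Termination only against a finite trace:

```scala
// Hypothetical trace of one consensus run.
final case class Run(
  proposed: Map[String, Int],       // process id -> value passed to Request(v)
  decided: Map[String, List[Int]])  // process id -> values passed to Decide(v)

// Termination (on a finite trace): every process has decided something.
def termination(r: Run): Boolean =
  r.proposed.keySet.forall(p => r.decided.getOrElse(p, Nil).nonEmpty)

// Validity: every decided value was proposed by some process.
def validity(r: Run): Boolean =
  r.decided.values.flatten.forall(v => r.proposed.values.exists(_ == v))

// Integrity: no process decides more than once.
def integrity(r: Run): Boolean = r.decided.values.forall(_.size <= 1)

// Agreement: all decided values are equal (all processes assumed correct).
def agreement(r: Run): Boolean = r.decided.values.flatten.toSet.size <= 1

val good = Run(
  proposed = Map("a" -> 1, "b" -> 2, "c" -> 2),
  decided  = Map("a" -> List(2), "b" -> List(2), "c" -> List(2)))
val bad = Run(
  proposed = Map("a" -> 1, "b" -> 2),
  decided  = Map("a" -> List(1), "b" -> List(2)))
```

The `bad` run is valid (each decided value was proposed) but violates Agreement, which is the kind of outcome a consensus protocol must rule out.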

Page 188: The Road to Akka Cluster and Beyond

Consensus Algorithms

• Strong consistency
• Partition tolerant if replicas > N/2-1
• Dynamic master
• High latency
• Medium throughput

Paxos: Lamport 1989
ZAB: Reed & Junqueira 2008
Raft: Ongaro & Ousterhout 2013

CAP
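
As a rough illustration of the partition-tolerance bullet: Paxos, ZAB and Raft make progress only while a strict majority of the N replicas can talk to each other. The helper names below are assumptions made for this sketch:

```scala
// Smallest group that is a strict majority of n replicas.
def quorumSize(n: Int): Int = n / 2 + 1

// Number of replicas that may be lost while a majority can still be formed.
def toleratedFailures(n: Int): Int = (n - 1) / 2

// True if the reachable side of a partition can keep making decisions.
def canMakeProgress(n: Int, reachable: Int): Boolean =
  reachable >= quorumSize(n)
```

With five replicas, for example, the side of a partition holding three nodes keeps working and the side holding two has to stop.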

Page 189: The Road to Akka Cluster and Beyond

Eventual Consistency

Page 191: The Road to Akka Cluster and Beyond

Dynamo
Vogels et al. 2007

Very influential

Popularized
• Eventual consistency
• Epidemic gossip
• Consistent hashing
• Sloppy quorum/Hinted handoff
• Read repair
• Anti-Entropy with Merkle trees

CAP

Page 194: The Road to Akka Cluster and Beyond

Consistent Hashing
Karger et al. 1997

• Supports elasticity: easier to scale up and down
• Avoids hotspots
• Enables partitioning and replication

On average only K/N keys need to be remapped when adding or removing a node (K=#keys, N=#nodes)
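
A small hash-ring sketch makes the remapping property concrete. It assumes virtual nodes and MurmurHash3 as the hash function; the `Ring` type and its method names are illustrative, not Akka's consistent-hashing API:

```scala
import scala.collection.immutable.TreeMap
import scala.util.hashing.MurmurHash3

// Hypothetical ring: each node is placed at several points (virtual nodes).
final case class Ring(
    points: TreeMap[Int, String] = TreeMap.empty[Int, String],
    vnodes: Int = 64) {

  private def hash(s: String): Int = MurmurHash3.stringHash(s)

  def add(node: String): Ring =
    copy(points = points ++ (0 until vnodes).map(i => hash(s"$node#$i") -> node))

  def remove(node: String): Ring =
    copy(points = points.filter { case (_, n) => n != node })

  // A key belongs to the first point at or after its hash, wrapping around.
  def nodeFor(key: String): Option[String] =
    if (points.isEmpty) None
    else {
      val it = points.iteratorFrom(hash(key))
      Some(if (it.hasNext) it.next()._2 else points.head._2)
    }
}
```

When a node is added, the only keys that change owner are the ones that now fall on the new node's points; every other key stays where it was.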

Page 195: The Road to Akka Cluster and Beyond

Disorderly Programming

Page 200: The Road to Akka Cluster and Beyond

ACID 2.0

• Associative: Batch-insensitive (grouping doesn't matter), a+(b+c)=(a+b)+c
• Commutative: Order-insensitive (order doesn't matter), a+b=b+a
• Idempotent: Retransmission-insensitive (duplication does not matter), a+a=a
• Eventually Consistent
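
Set union is one operation that satisfies all three algebraic laws, so it works as a stand-in for "+" in the formulas above. The sketch below (with made-up update values) shows that order, grouping and redelivery do not change the result:

```scala
// Merging by set union is associative, commutative and idempotent.
def merge(a: Set[String], b: Set[String]): Set[String] = a union b

val updates = List(Set("x"), Set("y"), Set("x", "z"), Set("y"))

// Applied in order, in reverse, with every update delivered twice,
// and with a different grouping.
val inOrder    = updates.foldLeft(Set.empty[String])(merge)
val reversed   = updates.reverse.foldLeft(Set.empty[String])(merge)
val duplicated = (updates ++ updates).foldLeft(Set.empty[String])(merge)
val regrouped  = merge(merge(updates(0), updates(1)), merge(updates(2), updates(3)))
```

All four values end up as the same set, which is why replicas applying such updates can converge without coordinating.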

Page 205: The Road to Akka Cluster and Beyond

Convergent & Commutative Replicated Data Types (CRDT)
Shapiro et al. 2011

CAP

• Join Semilattice
• Monotonic merge function

Data types
• Counters
• Registers
• Sets
• Maps
• Graphs
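
A grow-only counter (G-Counter) is probably the smallest example of a join semilattice with a monotonic merge. This is a sketch with string replica ids chosen for illustration, not the akka-crdt implementation:

```scala
// State-based grow-only counter: one slot per replica.
final case class GCounter(slots: Map[String, Long] = Map.empty) {
  // A replica only ever increments its own slot.
  def increment(replica: String, by: Long = 1L): GCounter =
    copy(slots = slots.updated(replica, slots.getOrElse(replica, 0L) + by))

  def value: Long = slots.values.sum

  // Join of the semilattice: per-replica maximum, so the state never shrinks.
  def merge(that: GCounter): GCounter =
    GCounter((slots.keySet ++ that.slots.keySet).map { k =>
      k -> math.max(slots.getOrElse(k, 0L), that.slots.getOrElse(k, 0L))
    }.toMap)
}

val a = GCounter().increment("n1").increment("n1")
val b = GCounter().increment("n2", 3)
```

Because `merge` takes a per-slot maximum, it is associative, commutative and idempotent, so replicas can exchange state in any order and as often as they like.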

Page 206: The Road to Akka Cluster and Beyond

CRDT types

• CvRDT: Convergent, State-based
• CmRDT: Commutative, Ops-based

Page 209: The Road to Akka Cluster and Beyond

CALM theorem: Consistency As Logical Monotonicity
Hellerstein et al. 2011

Distributed Logic
• Datalog/Dedalus
• Monotonic functions
• Just add facts to the system
• Model state as Lattices
• Similar to CRDTs (without the scope problem)

Bloom Language
• Compiler helps to detect & encapsulate non-monotonicity

Page 214: The Road to Akka Cluster and Beyond

How eventual is Eventual consistency?
How consistent is Eventual consistency?

PBS: Probabilistically Bounded Staleness
Peter Bailis et al. 2012

Page 215: The Road to Akka Cluster and Beyond

Epidemic Gossip

Page 220: The Road to Akka Cluster and Beyond

Node ring & Epidemic Gossip

[Diagram: ten Member Nodes arranged in a ring]

Popularized by CHORD, Stoica et al. 2001

CAP

Page 221: The Road to Akka Cluster and Beyond

Benefits of Epidemic Gossip

• Decentralized P2P
• No SPOF or SPOB
• Very Scalable
• Fully Elastic
• Requires minimal administration
• Often used with VECTOR CLOCKS

Page 222: The Road to Akka Cluster and Beyond

Some Standard Optimizations to Epidemic Gossip

1. Separation of failure detection heartbeat and dissemination of data - Das et al. 2002 (SWIM)
2. Push/Pull gossip - Khambatti et al. 2003
   1. Hash and compare data
   2. Use single hash or Merkle Trees

Page 223: The Road to Akka Cluster and Beyond

The Akka Way

Page 230: The Road to Akka Cluster and Beyond

Cluster Membership in Akka

• Dynamo-style master-less decentralized P2P
• Epidemic Gossip, Node Ring
• Vector Clocks for causal consistency
• Fully elastic with no SPOF or SPOB
• Very scalable: 2400 nodes (on GCE)
• High throughput: 1000 nodes in 4 min (on GCE)

Page 231: The Road to Akka Cluster and Beyond

Node Lifecycle in Akka

Page 240: The Road to Akka Cluster and Beyond

Gossip State

GOSSIPING

case class Gossip(
  members: SortedSet[Member],
  seen: Set[UniqueAddress],
  reachability: Reachability,
  version: VectorClock)

• Is a CRDT
• Ordered node ring
• Seen set for convergence
• Unreachable set
• Version

1. Picks random node with older/newer version
2. Gossips in a request/reply conversational fashion
3. Updates internal state and adds itself to the 'seen' set

Page 242: The Road to Akka Cluster and Beyond

Cluster Convergence

Reached when:
1. All nodes are represented in the seen set
2. No members are unreachable, or
3. All unreachable members have status down or exiting
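
The convergence rule can be written as a predicate. This is a simplified sketch with hypothetical types (`MemberStatus`, `Member`, `GossipState`), not Akka's classes. It also assumes that unreachable members which are Down or Exiting are excused from the seen set, since they cannot acknowledge the gossip:

```scala
// Hypothetical, simplified member model.
sealed trait MemberStatus
case object Joining extends MemberStatus
case object Up extends MemberStatus
case object Exiting extends MemberStatus
case object Down extends MemberStatus

final case class Member(address: String, status: MemberStatus)

final case class GossipState(
  members: Set[Member],
  seen: Set[String],        // addresses that have seen this gossip version
  unreachable: Set[String]) // addresses currently marked unreachable

def converged(g: GossipState): Boolean = {
  val (unreach, reach) = g.members.partition(m => g.unreachable(m.address))
  // Rules 2 and 3: unreachable members are tolerated only if Down or Exiting.
  val unreachableOk = unreach.forall(m => m.status == Down || m.status == Exiting)
  // Rule 1 (as assumed here): every reachable member has seen the gossip.
  val allSeen = reach.forall(m => g.seen(m.address))
  unreachableOk && allSeen
}
```

One unreachable member with status `Up` is therefore enough to block convergence until it returns or is marked down.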

Page 244: The Road to Akka Cluster and Beyond

BIASED GOSSIP

• 80% bias to nodes not in seen table
• Up to 400 nodes, then reduced
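
Biased peer selection might look roughly like this. `selectPeer` and its parameters are hypothetical, and the reduction of the bias above 400 nodes is not modelled:

```scala
import scala.util.Random

// Pick a gossip target. With probability `bias`, prefer a node that has not
// yet seen the current state; otherwise pick uniformly among all peers.
def selectPeer(
    peers: Vector[String],
    seen: Set[String],
    rnd: Random,
    bias: Double = 0.8): Option[String] = {
  val unseen = peers.filterNot(seen)
  val pool = if (unseen.nonEmpty && rnd.nextDouble() < bias) unseen else peers
  if (pool.isEmpty) None else Some(pool(rnd.nextInt(pool.size)))
}
```

Favouring nodes outside the seen set spreads a new state faster than uniform selection, which shortens the time to convergence.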

Page 247: The Road to Akka Cluster and Beyond

PUSH/PULL GOSSIP Variation

case class Status(version: VectorClock)
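
To show why sending only a version is enough to decide who should send state to whom, here is a minimal vector clock. `VClock`, `Relation` and `onStatus` are names invented for this sketch and are not Akka's `VectorClock` API; the reactions returned by `onStatus` are an assumption about how a receiver could respond:

```scala
sealed trait Relation
case object Same extends Relation
case object Before extends Relation
case object After extends Relation
case object Concurrent extends Relation

// Minimal vector clock: node name -> logical counter.
final case class VClock(entries: Map[String, Long] = Map.empty) {
  def tick(node: String): VClock =
    copy(entries = entries.updated(node, entries.getOrElse(node, 0L) + 1))

  // Per-node maximum of both clocks.
  def merge(that: VClock): VClock =
    VClock((entries.keySet ++ that.entries.keySet).map { k =>
      k -> math.max(entries.getOrElse(k, 0L), that.entries.getOrElse(k, 0L))
    }.toMap)

  def compare(that: VClock): Relation = {
    val keys = entries.keySet ++ that.entries.keySet
    val less = keys.exists(k => entries.getOrElse(k, 0L) < that.entries.getOrElse(k, 0L))
    val more = keys.exists(k => entries.getOrElse(k, 0L) > that.entries.getOrElse(k, 0L))
    if (less && more) Concurrent else if (less) Before else if (more) After else Same
  }
}

// What a receiver of a status message might do after comparing versions.
def onStatus(local: VClock, remote: VClock): String = local.compare(remote) match {
  case Same       => "nothing to do"
  case Before     => "pull: ask the sender for its newer state"
  case After      => "push: send our newer state"
  case Concurrent => "exchange and merge both states"
}
```

Exchanging the small version first avoids shipping the full gossip state when both sides are already in sync.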

Page 255: The Road to Akka Cluster and Beyond

LEADER ROLE

Any node can be the leader

1. No election, but deterministic: 1st member in node ring
2. Can change after cluster convergence
3. Leader duties
   1. Moving joining members to up
   2. Moving exiting members to removed
   3. Can auto-down a member
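
Deterministic leader selection needs no messages at all: once nodes agree on the membership ring, each one computes the same answer locally. The sketch assumes a hypothetical `Address` type ordered by host and then port:

```scala
import scala.collection.immutable.SortedSet

// Hypothetical member address; ordering by host, then port, is an assumption.
final case class Address(host: String, port: Int)

implicit val addressOrdering: Ordering[Address] =
  Ordering.by((a: Address) => (a.host, a.port))

// No election: the leader is simply the first member of the sorted ring.
def leader(members: SortedSet[Address]): Option[Address] = members.headOption

val ring = SortedSet(Address("b", 2552), Address("a", 2552), Address("a", 2551))
```

If the first member leaves the ring, the next one in sort order becomes leader without any extra protocol round.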

Page 259: The Road to Akka Cluster and Beyond

Failure Detection

• Hashes the node ring
• Picks 5 nodes
• Request/Reply heartbeat

To increase likelihood of bridging racks and data centers

Used by
• Cluster Membership
• Remote Death Watch
• Remote Supervision

Page 264: The Road to Akka Cluster and Beyond

Failure Detection

• Is an Accrual Failure Detector
• Does not help much in practice
• Need to add delay to deal with Garbage Collection

Instead of this [figure], it often looks like this [figure]
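
An accrual detector outputs a suspicion level (phi) rather than a yes/no answer. The sketch below makes a simplifying assumption that heartbeat inter-arrival times are exponentially distributed, which is not the distribution used in the ϕ accrual paper. The function names, the default threshold and the pause parameter are all illustrative:

```scala
// P(next heartbeat arrives later than t) = exp(-t / mean) under the
// exponential assumption, and phi = -log10(P) = (t / mean) / ln(10).
def phi(elapsedMs: Double, meanMs: Double): Double =
  (elapsedMs / meanMs) / math.log(10.0)

// The detector itself only reports phi; the caller picks a threshold.
def isAvailable(elapsedMs: Double, meanMs: Double, threshold: Double = 8.0): Boolean =
  phi(elapsedMs, meanMs) < threshold

// One way to add slack for GC pauses: ignore an assumed acceptable pause.
def phiWithPause(elapsedMs: Double, meanMs: Double, acceptablePauseMs: Double): Double =
  phi(math.max(0.0, elapsedMs - acceptablePauseMs), meanMs)
```

The pause allowance is where the "add delay" bullet shows up: a long stop-the-world collection looks exactly like a dead node unless some slack is built in.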

Page 273: The Road to Akka Cluster and Beyond

Network Partitions

1. Failure Detector can mark an unavailable member Unreachable
2. If one node is Unreachable then no cluster Convergence
3. This means that the Leader can no longer perform its duties
4. Member can come back from Unreachable, or:
5. The node needs to be marked as Down, either through:
   1. auto-down
   2. Manual down

Split Brain

Page 278: The Road to Akka Cluster and Beyond

Potential FUTURE Optimizations

• Vector Clock pruning

• Delegated heartbeat—like in the SWIM paper—to find ways around a network split

• “Real” push/pull gossip—with fine grained hashing of state and only shipping deltas (perhaps using Merkle trees)

• More advanced out-of-the-box auto-down patterns

Page 279: The Road to Akka Cluster and Beyond

Akka Modules For Distribution

Akka Cluster

Akka Remote

Akka IO

Clustered Singleton

Clustered Routers

Clustered Pub/Sub

Cluster Client

Consistent Hashing

Page 280: The Road to Akka Cluster and Beyond

…and Beyond

Page 286: The Road to Akka Cluster and Beyond

Akka & The Road Ahead

• Akka Persistence: Akka 2.3
• Akka HTTP: Akka 2.4
• Akka Reactive Streams: Akka 2.4?
• Akka Services: Akka 2.4?
• Akka Microkernel Redux: ?

Page 287: The Road to Akka Cluster and Beyond

Any Questions?

Page 288: The Road to Akka Cluster and Beyond

The Road to

Akka Cluster and Beyond

Jonas Bonér CTO Typesafe

@jboner

Page 289: The Road to Akka Cluster and Beyond

References

• General Distributed Systems
  • Summary of network reliability post-mortems, more terrifying than the most horrifying Stephen King novel: http://aphyr.com/posts/288-the-network-is-reliable
  • A Note on Distributed Computing: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.7628
  • On the problems with RPC: http://steve.vinoski.net/pdf/IEEE-Convenience_Over_Correctness.pdf
  • 8 Fallacies of Distributed Computing: https://blogs.oracle.com/jag/resource/Fallacies.html
  • 6 Misconceptions of Distributed Computing: www.dsg.cs.tcd.ie/~vjcahill/sigops98/papers/vogels.ps
  • Distributed Computing Systems, A Foundational Approach: http://www.amazon.com/Programming-Distributed-Computing-Systems-Foundational/dp/0262018985
  • Introduction to Reliable and Secure Distributed Programming: http://www.distributedprogramming.net/
  • Nice short overview on Distributed Systems: http://book.mixu.net/distsys/
  • Meta list of distributed systems readings: https://gist.github.com/macintux/6227368

Page 290: The Road to Akka Cluster and Beyond

References

• Actor Model
  • Great discussion between Erik Meijer & Carl Hewitt on the essence of the Actor Model: http://channel9.msdn.com/Shows/Going+Deep/Hewitt-Meijer-and-Szyperski-The-Actor-Model-everything-you-wanted-to-know-but-were-afraid-to-ask
  • Carl Hewitt’s 1973 paper defining the Actor Model: http://worrydream.com/refs/Hewitt-ActorModel.pdf
  • Gul Agha’s Doctoral Dissertation: https://dspace.mit.edu/handle/1721.1/6952

Page 291: The Road to Akka Cluster and Beyond

References

• FLP
  • Impossibility of Distributed Consensus with One Faulty Process: http://cs-www.cs.yale.edu/homes/arvind/cs425/doc/fischer.pdf
  • A Brief Tour of FLP: http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/
• CAP
  • Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services: http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
  • You Can’t Sacrifice Partition Tolerance: http://codahale.com/you-cant-sacrifice-partition-tolerance/
  • Linearizability: A Correctness Condition for Concurrent Objects: http://courses.cs.vt.edu/~cs5204/fall07-kafura/Papers/TransactionalMemory/Linearizability.pdf
  • CAP Twelve Years Later: How the "Rules" Have Changed: http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
  • Consistency vs. Availability: http://www.infoq.com/news/2008/01/consistency-vs-availability

Page 292: The Road to Akka Cluster and Beyond

References

• Time & Order
  • Post on the problems with Last Write Wins in Riak: http://aphyr.com/posts/285-call-me-maybe-riak
  • Time, Clocks, and the Ordering of Events in a Distributed System: http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf
  • Vector Clocks: http://zoo.cs.yale.edu/classes/cs426/2012/lab/bib/fidge88timestamps.pdf
• Failure Detection
  • Unreliable Failure Detectors for Reliable Distributed Systems: http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p225-chandra.pdf
  • The ϕ Accrual Failure Detector: http://ddg.jaist.ac.jp/pub/HDY+04.pdf
  • SWIM Failure Detector: http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf
  • Practical Byzantine Fault Tolerance: http://www.pmg.lcs.mit.edu/papers/osdi99.pdf

Page 293: The Road to Akka Cluster and Beyond

References

• Transactions
  • Jim Gray’s classic book: http://www.amazon.com/Transaction-Processing-Concepts-Techniques-Management/dp/1558601902
  • Highly Available Transactions: Virtues and Limitations: http://www.bailis.org/papers/hat-vldb2014.pdf
  • Bolt on Consistency: http://db.cs.berkeley.edu/papers/sigmod13-bolton.pdf
  • Calvin: Fast Distributed Transactions for Partitioned Database Systems: http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf
  • Spanner: Google's Globally-Distributed Database: http://research.google.com/archive/spanner.html
  • Life beyond Distributed Transactions: an Apostate’s Opinion: https://cs.brown.edu/courses/cs227/archives/2012/papers/weaker/cidr07p15.pdf
  • Immutability Changes Everything, Pat Helland’s talk at Ricon: http://vimeo.com/52831373
  • Unshackle Your Domain (Event Sourcing): http://www.infoq.com/presentations/greg-young-unshackle-qcon08
  • CQRS: http://martinfowler.com/bliki/CQRS.html

Page 294: The Road to Akka Cluster and Beyond

References

• Consensus
  • Paxos Made Simple: http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
  • Paxos Made Moderately Complex: http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf
  • A simple totally ordered broadcast protocol (ZAB): labs.yahoo.com/files/ladis08.pdf
  • In Search of an Understandable Consensus Algorithm (Raft): https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
  • Replication strategy comparison diagram: http://snarfed.org/transactions_across_datacenters_io.html
  • Distributed Snapshots: Determining Global States of Distributed Systems: http://www.cs.swarthmore.edu/~newhall/readings/snapshots.pdf

Page 295: The Road to Akka Cluster and Beyond

References

• Eventual Consistency
  • Dynamo: Amazon’s Highly Available Key-value Store: http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
  • Consistency vs. Availability: http://www.infoq.com/news/2008/01/consistency-vs-availability
  • Consistent Hashing and Random Trees: http://thor.cs.ucsb.edu/~ravenben/papers/coreos/kll+97.pdf
  • PBS: Probabilistically Bounded Staleness: http://pbs.cs.berkeley.edu/

Page 296: The Road to Akka Cluster and Beyond

References

• Epidemic Gossip
  • Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications: http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
  • Gossip-style Failure Detector: http://www.cs.cornell.edu/home/rvr/papers/GossipFD.pdf
  • GEMS: http://www.hcs.ufl.edu/pubs/GEMS2005.pdf
  • Efficient Reconciliation and Flow Control for Anti-Entropy Protocols: http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf
  • 2400 Akka nodes on GCE: http://typesafe.com/blog/running-a-2400-akka-nodes-cluster-on-google-compute-engine
  • Starting 1000 Akka nodes in 4 min: http://typesafe.com/blog/starting-up-a-1000-node-akka-cluster-in-4-minutes-on-google-compute-engine
  • Push Pull Gossiping: http://khambatti.com/mujtaba/ArticlesAndPapers/pdpta03.pdf
  • SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol: http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf

Page 297: The Road to Akka Cluster and Beyond

References

• Conflict-Free Replicated Data Types (CRDTs)
  • A comprehensive study of Convergent and Commutative Replicated Data Types: http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf
  • Marc Shapiro talks about CRDTs at Microsoft: http://research.microsoft.com/apps/video/dl.aspx?id=153540
  • Akka CRDT project: https://github.com/jboner/akka-crdt
• CALM
  • Dedalus: Datalog in Time and Space: http://db.cs.berkeley.edu/papers/datalog2011-dedalus.pdf
  • CALM: http://www.cs.berkeley.edu/~palvaro/cidr11.pdf
  • Logic and Lattices for Distributed Programming: http://db.cs.berkeley.edu/papers/UCB-lattice-tr.pdf
  • Bloom Language website: http://bloom-lang.net
  • Joe Hellerstein talks about CALM: http://vimeo.com/53904989

Page 298: The Road to Akka Cluster and Beyond

References

• Akka Cluster
  • My Akka Cluster Implementation Notes: https://gist.github.com/jboner/7692270
  • Akka Cluster Specification: http://doc.akka.io/docs/akka/snapshot/common/cluster.html
  • Akka Cluster Docs: http://doc.akka.io/docs/akka/snapshot/scala/cluster-usage.html
  • Akka Failure Detector Docs: http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#Failure_Detector
  • Akka Roadmap: https://docs.google.com/a/typesafe.com/document/d/18W9-fKs55wiFNjXL9q50PYOnR7-nnsImzJqHOPPbM4E/mobilebasic?pli=1&hl=en_US
  • Where Akka Came From: http://letitcrash.com/post/40599293211/where-akka-came-from