Consistency without consensus: Linearizable Resilient Data Types (LRDT)
Kaushik Rajan, Sagar Chordia, Kapil Vaswani, Ganesan Ramalingam, Sriram Rajamani
Feb 24, 2016
Consistency & consensus
[Figure: clients issue Add(The Hobbit), Add(Kindle) and GetCart() against replicated processes]
• Processes agree on ordering of operations
• No deterministic algorithm solves consensus in an asynchronous system in the presence of failures [FLP]
Commuting updates
• What if all update operations commute?
  – Ordering of updates doesn’t matter!
  – Eventual consistency reduces to eventual message delivery
  – Single round trip latency
• What if we desire linearizability?
  – Updates don’t commute with arbitrary reads
  – Reads must be consistently ordered with updates
  – Semantics of queries like the current top(k) elements are well understood
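A minimal sketch of the first point, assuming a shopping cart modelled as a Python set whose only update is Add (the helper name `apply_adds` is hypothetical). Because adds commute, every delivery order of the same updates yields the same cart:

```python
from itertools import permutations

def apply_adds(items):
    """Apply Add(item) operations in the given order, starting from an empty cart."""
    cart = set()
    for item in items:
        cart.add(item)
    return frozenset(cart)

updates = ["The Hobbit", "Kindle", "Lumia"]

# Collect the final state reached under every possible delivery order.
final_states = {apply_adds(order) for order in permutations(updates)}
```

All six orderings collapse to a single final state, which is why eventual delivery alone is enough for eventual consistency in this case. It says nothing yet about what a read may return while updates are still in flight.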
Commuting updates
[Figure: Add(The Hobbit) and Add(Kindle) are issued concurrently; one GetCart() returns {} and another returns {The Hobbit, Kindle}]
• Reads must observe comparable sets of operations
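Linearizable resilient data types
Possible | Impossible | Don’t know
[Figure: state diagrams for the two properties below]
P1 : commutes(s, op1, op2): from state S, applying op1 then op2 reaches the same state S’ as applying op2 then op1
P2 : nullify(s, op1, op2): [diagram over states S, S1, S2 with transitions op1 and op2]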
Examples
• Read write register: every pair of writes nullify
• Read write memory: writes to the same location nullify, writes to different locations commute

Examples
• Set: add, remove and read the whole set
  – Add(u), Remove(v) commute (u ≠ v)
  – Add(u), Remove(u) nullify
  – Add(*), Add(*) commute
  – Remove(*), Remove(*) commute
• Counter: IncrBy(x), DecrBy(x), SetTo(v), Read()
  – SetTo(v) nullifies all other operations
  – Other pairs of updates commute
• Other examples: heaps, union-find, atomic snapshot objects…
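The set and counter examples can be checked mechanically. The sketch below is illustrative only: it assumes that "op_a nullifies op_b at s" means that applying op_b before op_a leaves no trace, i.e. op_a(op_b(s)) = op_a(s). That reading is consistent with the examples on these slides but is not spelled out on them, and all function names are hypothetical.

```python
def commutes(s, op1, op2):
    """True if op1 and op2 reach the same state from s in either order."""
    return op2(op1(s)) == op1(op2(s))

def nullifies(s, winner, loser):
    """Assumed definition: running `loser` before `winner` changes nothing."""
    return winner(loser(s)) == winner(s)

# Set operations: states are frozensets.
def add(u):
    return lambda s: s | {u}

def remove(u):
    return lambda s: s - {u}

# Counter operations: states are integers.
def incr_by(x):
    return lambda c: c + x

def set_to(v):
    return lambda c: v

cart = frozenset({"Kindle"})
checks = {
    "add/remove different": commutes(cart, add("a"), remove("b")),
    "add/remove same commute": commutes(cart, add("a"), remove("a")),
    "add nullifies remove": nullifies(cart, add("a"), remove("a")),
    "remove nullifies add": nullifies(cart, remove("a"), add("a")),
    "incr/decr commute": commutes(5, incr_by(3), incr_by(-2)),
    "set_to/incr commute": commutes(5, set_to(0), incr_by(3)),
    "set_to nullifies incr": nullifies(5, set_to(0), incr_by(3)),
}
```

Under this assumed definition, Add(u) and Remove(u) fail to commute but each nullifies the other, and SetTo(v) nullifies IncrBy(x) without commuting with it.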
Lattice agreement
• Consistency reduces to lattice agreement
  – Weaker problem than consensus
  – Solvable in an asynchronous distributed system
• Assumptions
  – t < n/2 failures
  – Eventual message delivery
Lattice agreement
• n processes, each process starts with a value belonging to a join semilattice
• Each non-faulty process outputs a value
  – (Validity) Each process’ output is a join of one or more input values including its own
  – (Consistency) Any two output values are comparable
  – (Liveness) Every correct process eventually outputs a value
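To make Validity and Consistency concrete, here is a small checker over the power set lattice, where join is set union and the order is set inclusion. It is a sketch under that assumption; the function names are hypothetical, and Liveness is not checked because it is a property of executions rather than of a finished set of outputs.

```python
def comparable(x, y):
    """Two lattice values are comparable if one is below the other."""
    return x <= y or y <= x

def is_valid(process, inputs, outputs):
    """Validity: the output is a join of input values, including the process' own."""
    out = outputs[process]
    below = [v for v in inputs.values() if v <= out]
    # `out` is a join of some inputs exactly when it equals the join of all inputs below it.
    return inputs[process] <= out and frozenset().union(*below) == out

def check(inputs, outputs):
    valid = all(is_valid(p, inputs, outputs) for p in outputs)
    consistent = all(comparable(x, y)
                     for x in outputs.values() for y in outputs.values())
    return valid, consistent

inputs = {"p1": frozenset("a"), "p2": frozenset("b"), "p3": frozenset("c")}

good = {"p1": frozenset("a"), "p2": frozenset("ab"), "p3": frozenset("abc")}
bad = {"p1": frozenset("a"), "p2": frozenset("b"), "p3": frozenset("abc")}

good_result = check(inputs, good)
bad_result = check(inputs, bad)
```

The `bad` outputs are individually valid, but {a} and {b} are incomparable, so they violate Consistency.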
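Lattice agreement
[Figure: Hasse diagram of the power set lattice over {a, b, c}, with levels {}; {a}, {b}, {c}; {a, b}, {b, c}, {a, c}; {a, b, c}. Markers p1, p2, p3 show the values held by the three processes.]
a = Add(The Hobbit), b = Add(Kindle), c = Add(Lumia)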
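PROPOSERS
Initially: [not recoverable]
1. Send v_i to all acceptors
2. Wait for majority of acceptors to respond
3. All Acks?
   Y: Output v_i
   N: v_i ← ⋁ a_j over all received Nack(a_j), then send again

ACCEPTORS
Initially: [not recoverable]
On receiving v_j, test a_i ≤ v_j:
   Y: a_i = a_i ∨ v_j, send Ack
   N: a_i = a_i ∨ v_j, send Nack(a_i)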
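The proposer/acceptor exchange can be simulated in a few lines. This is a sketch, not the authors' implementation: it assumes the power set lattice (join is union), assumes acceptors start at the empty set, and delivers messages sequentially with one proposer running at a time, so it does not model asynchrony or failures. The `responders` argument stands for the majority of acceptors whose replies the proposer waits for; `Acceptor` and `propose` are hypothetical names.

```python
class Acceptor:
    def __init__(self):
        self.accepted = frozenset()  # a_i; starting at the empty set is an assumption

    def on_proposal(self, proposed):
        """Ack if a_i <= v_j, otherwise Nack; in both cases a_i := a_i join v_j."""
        ack = self.accepted <= proposed
        self.accepted = self.accepted | proposed
        return ("ack", None) if ack else ("nack", self.accepted)


def propose(value, responders):
    """Repeat rounds until a majority replies with Acks only; return (output, rounds)."""
    v = frozenset(value)
    rounds = 0
    while True:
        rounds += 1
        replies = [acc.on_proposal(v) for acc in responders]
        nacked = [a for kind, a in replies if kind == "nack"]
        if not nacked:
            return v, rounds
        # Refine the proposal with the join of all values returned in Nacks.
        v = v.union(*nacked)


acceptors = [Acceptor() for _ in range(3)]
# Each proposer hears from a different majority (2 of 3) of the acceptors.
out1 = propose({"a"}, [acceptors[0], acceptors[1]])
out2 = propose({"b"}, [acceptors[1], acceptors[2]])
out3 = propose({"c"}, [acceptors[0], acceptors[2]])
```

In this run the three outputs are {a}, {a, b} and {a, b, c}: any two majorities intersect, so a later proposer is Nacked by at least one acceptor that has already seen the earlier value and is pushed up the lattice until its output is comparable with earlier ones.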
Safety and liveness
• Safety always guaranteed
• Lattice agreement is t-resilient
  – Liveness guaranteed if a quorum of processes is non-faulty and communication is reliable
  – Processes output a value in at most n round trips, where n is the number of processes
Generalized lattice agreement
• Generalization of lattice agreement
  – Processes receive a sequence of values
  – Values belong to an infinite lattice
• Processes output a sequence of values
  – (Validity) Every output value is a join of some received values
  – (Consistency) Any two output values are comparable (i.e. output values form a chain)
  – (Liveness) Every value received by a correct process is eventually included in an output value
GLA algorithm
• Liveness (t-resilient)
  – Every received value is eventually included in some output in n round trips
  – Adaptive: complexity depends on contention
• Fast path
  – Received values output in one round trip
• Reconfigurable
  – Replicas can be added/removed dynamically
From GLA to linearizability
• Update commands form a power set lattice
• Updates return once a majority of processes have learnt a command set that includes the update command
• Reads are performed by an ABD-style algorithm:
  1. Read the learnt command set from a quorum of processes
  2. Write back the largest among these to a quorum
  3. Construct the state corresponding to the largest command set by exploiting commutativity and nullification
• Multi-master replication
  – Does not require a single primary/leader
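The three read steps can be sketched as follows. This is an illustration under simplifying assumptions, not the algorithm from the talk: replicas are in-process objects, quorums are passed in explicitly, and the data type is a cart with Add commands only, so the state is rebuilt using commutativity alone. Handling nullifying pairs needs an ordering rule that the slides do not spell out. `Replica`, `build_cart` and `read` are hypothetical names.

```python
class Replica:
    def __init__(self, learnt=()):
        self.learnt = frozenset(learnt)  # learnt command set


def build_cart(commands):
    """Adds commute, so the commands can be applied in any order."""
    return frozenset(item for kind, item in commands if kind == "add")


def read(read_quorum, write_quorum):
    # 1. Read the learnt command set from a quorum. Learnt sets form a chain,
    #    so the set with the most commands is also the largest under inclusion.
    largest = max((r.learnt for r in read_quorum), key=len)
    # 2. Write the largest set back to a quorum.
    for r in write_quorum:
        r.learnt = r.learnt | largest
    # 3. Construct the state for that command set.
    return build_cart(largest)


H, K = ("add", "The Hobbit"), ("add", "Kindle")
replicas = [Replica({H}), Replica({H, K}), Replica()]
cart = read(replicas[:2], replicas[1:])
```

The write-back in step 2 is what lets a later read, possibly served by a different quorum, observe a command set at least as large as this one.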
Impossibility
• Consensus reduction
Pair of idempotent update operations that neither commute nor nullify at some state S0

Consensus(b)
  if (b) then op1 else op2
  s = read()
  if (s ∈ {S1, S12}) return true
  else return false

[Figure: state diagram. Si reaches S0 via Op*. From S0, op1 leads to S1 and then op2 leads to S12; from S0, op2 leads to S2 and then op1 leads to S21.]
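The reduction can be tried out on a concrete type. The sketch below uses a hypothetical "append if absent" sequence as a stand-in for a pair of idempotent updates that neither commute nor nullify at the empty state; the slides do not name a specific type here. The shared object is a plain Python object accessed sequentially, which stands in for a linearizable implementation.

```python
class AppendOnce:
    """Sequence whose update appends an item only if it is not already present."""
    def __init__(self):
        self.state = ()          # S0: the empty sequence

    def update(self, item):
        if item not in self.state:
            self.state += (item,)

    def read(self):
        return self.state


OP1, OP2 = "op1", "op2"
S1, S12 = (OP1,), (OP1, OP2)     # states in which op1 was applied first


def consensus(obj, b):
    """Propose boolean b; decide true exactly when op1 was linearized first."""
    obj.update(OP1 if b else OP2)
    s = obj.read()
    return s in (S1, S12)


def run(proposals):
    """Run the proposers one after another against a fresh shared object."""
    obj = AppendOnce()
    return [consensus(obj, b) for b in proposals]


true_first = run([True, False])
false_first = run([False, True])
```

In both schedules every process decides the value proposed by whichever update was applied first, which is agreement on a proposed value. Since the state records which update came first, a resilient linearizable implementation of such a type would solve consensus, contradicting [FLP].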
Implications for designing ADTs
• Most commands commute
• [Fragment; formula lost in extraction] … neither commute nor nullify at …
The Gap : Open problems
Doubly saturating counter
[Figure: state machine with states 0, 1, 2, …, n. Incr() moves one state to the right and Decr() one state to the left; Decr() at 0 and Incr() at n are self-loops.]
• Incr() and Decr() commute at 1 … n-1
• Incr() and Decr() nullify at 0 and n
• Don’t know if this is possible or impossible
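The state-dependent behaviour of the doubly saturating counter can be checked directly. The sketch fixes n = 3 for illustration and assumes that "op_a nullifies op_b at s" means op_a(op_b(s)) = op_a(s); that definition and all names below are assumptions, not taken from the slides.

```python
N = 3  # upper bound of the counter; an arbitrary choice for this example

def incr(c):
    return min(c + 1, N)   # saturates at N

def decr(c):
    return max(c - 1, 0)   # saturates at 0

def commutes(s, op1, op2):
    return op2(op1(s)) == op1(op2(s))

def nullifies(s, winner, loser):
    """Assumed definition: running `loser` before `winner` changes nothing."""
    return winner(loser(s)) == winner(s)

classification = {}
for s in range(N + 1):
    if commutes(s, incr, decr):
        classification[s] = "commute"
    elif nullifies(s, incr, decr) or nullifies(s, decr, incr):
        classification[s] = "nullify"
    else:
        classification[s] = "neither"
```

No state is classified as "neither", yet whether the pair commutes or nullifies depends on the current state, so the counter is covered neither by the construction for types whose updates always commute or nullify nor by the consensus reduction.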
Summary
Possible: graph, RW mem…
Impossible: queues, sequences
??: Saturating counter