Brahms: Byzantine-Resilient Random Membership Sampling. Bortnikov, Gurevich, Keidar, Kliot, and Shraer. April 2008.
Edward (Eddie) Bortnikov Maxim (Max) Gurevich Idit Keidar
Gabriel (Gabi) Kliot Alexander (Alex) Shraer
Why Random Node Sampling
- Gossip partners: random choices are what make gossip protocols work
- Unstructured overlay networks (e.g., among super-peers): random links provide robustness and expansion
- Gathering statistics: probe random nodes
- Choosing cache locations
The Setting
- Many nodes: n in the 10,000s, 100,000s, 1,000,000s, …
- Nodes come and go (churn)
- Full network, like the Internet
- Every joining node knows some others ((initial) connectivity)
Adversary Attacks
- Faulty nodes (a portion f of the ids) attack other nodes (Byzantine failures)
- They may want to bias samples: isolate nodes, DoS nodes, promote themselves, bias statistics
Previous Work
- Benign gossip membership: small (logarithmic) views; robust to churn and benign failures; empirical studies [Lpbcast, Scamp, Cyclon, PSS]; analytical study [Allavena et al.]; never proven to give uniform samples; spatial correlation among neighbors' views [PSS]
- Byzantine-resilient gossip: full views [MMR, MS, Fireflies, Drum, BAR]; small views with some resilience [SPSS]; we are not aware of any analytical work
Our Contributions
1. Gossip-based attack-tolerant membership: tolerates a linear portion f of failures with O(n^(1/3))-size partial views; correct nodes remain connected; mathematically analyzed and validated in simulations
2. Random sampling: a novel memory-efficient approach; converges to proven independent uniform samples
The view is not all bad – better than benign gossip.
Brahms
1. Sampling – the local component (Sampler)
2. Gossip – the distributed component
[Figure: the Gossip component maintains the view; its stream of ids feeds the Sampler, which outputs the sample.]
Sampler Building Block
- Input: a data stream, one element at a time; the stream may be biased (some values appear more often than others); used with the stream of gossiped ids
- Output: a uniform random sample of the unique elements seen thus far; independent of other Samplers; one element at a time (converging)
Sampler Implementation
- Memory: stores one element at a time
- Uses a random hash function h drawn from a min-wise independent family [Broder et al.]: for each set X and every x ∈ X, Pr[h(x) = min h(X)] = 1/|X|
- init: choose a random hash function
- next: keep the id with the smallest hash seen so far
- sample: return that id
Component S: Sampling and Validation
[Figure: component S – the id stream from gossip feeds an array of Samplers (next/init); a Validator checks each Sampler's current sample using pings.]
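Component S can be sketched as an array of independent samplers, each paired with a validator. The `ping` callback and the class shape are illustrative assumptions standing in for the real failure-detection mechanism:

```python
import random

class Sampler:
    """Minimal min-wise sampler: keeps the id with the smallest hash
    (a seeded tuple hash approximates a min-wise independent family)."""
    def __init__(self):
        self.init()
    def init(self):
        self.seed, self.best, self.best_h = random.getrandbits(64), None, None
    def next(self, x):
        h = hash((self.seed, x))
        if self.best_h is None or h < self.best_h:
            self.best, self.best_h = x, h
    def sample(self):
        return self.best

class ComponentS:
    """Sketch of component S: independent Samplers, each validated by
    pinging its current sample and re-initializing the Sampler when the
    sampled node is unresponsive."""
    def __init__(self, num_samplers, ping):
        self.samplers = [Sampler() for _ in range(num_samplers)]
        self.ping = ping                       # ping(id) -> bool (assumed)

    def update(self, ids):
        for s in self.samplers:                # feed every gossiped id
            for x in ids:                      # to every sampler
                s.next(x)

    def validate(self):
        for s in self.samplers:                # drop samples that fail a ping
            if s.sample() is not None and not self.ping(s.sample()):
                s.init()

    def samples(self):
        return [s.sample() for s in self.samplers]
```

Re-initializing a failed sampler lets it converge again on a live id from the ongoing stream, which is how S copes with churn.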
Gossip Process
Provides the stream of ids for S Needs to ensure connectivity Use a bag of tricks to overcome attacks
Gossip-Based Membership Primer
- Small (sub-linear) local view V; V constantly changes – essential due to churn
- Typically evolves in (unsynchronized) rounds
- Push: send my id to some node in V – reinforces underrepresented nodes
- Pull: retrieve the view of some node in V – spreads knowledge within the network
- [Allavena et al. '05]: both are essential for a low probability of partitions and star topologies
Brahms Gossip Rounds
Each round:
- Send pushes and pull requests to random nodes from V
- Wait to receive pushes and pulls
- Update S with all received ids
- (Sometimes) re-compute V
Tricky! Beware of adversary attacks.
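A round can be sketched as two small helpers (illustrative Python; the function names are assumptions, and α, β anticipate the push/pull balance introduced as Trick 3 later in the deck):

```python
import random

def plan_round(view, alpha=0.5, beta=0.5):
    """Choose this round's push and pull targets uniformly at random
    from the view V; alpha and beta control the push/pull mix."""
    n_push = max(1, round(alpha * len(view)))
    n_pull = max(1, round(beta * len(view)))
    return random.choices(view, k=n_push), random.choices(view, k=n_pull)

def collect_round(received_push_ids, pulled_views):
    """All ids received this round -- pushed ids plus the contents of
    pulled views -- are fed into component S; V itself is only
    sometimes re-computed, as the tricks below explain."""
    return list(received_push_ids) + [i for v in pulled_views for i in v]
```

Note that S sees every received id, trusted or not; the Sampler's min-hash rule is what eventually filters the stream down to uniform samples.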
Problem 1: Push Drowning
[Figure: the adversary's pushes (Mallory, M&M, Malfoy) drown out the correct pushes (Alice, Bob, Carol, Dana, Ed), filling the victim's view with faulty ids.]
Trick 1: Rate-Limit Pushes
Use limited messages (e.g., computational puzzles or virtual currency) to bound faulty pushes system-wide: assume faulty nodes can send at most a portion p of all pushes. Then views won't be all bad.
Problem 2: Quick Isolation
[Figure: the adversary concentrates its pushes (Mallory, M&M, Malfoy) on one victim until her view is all faulty: "Ha! She's out! Now let's move on to the next guy!"]
Trick 2: Detection & Recovery
Do not re-compute V in rounds when too many pushes are received.
[Figure: a swamped node skips the view update: "Hey! I'm swamped! I better ignore all of 'em pushes…"]
This slows down isolation but does not prevent it: isolation still takes time logarithmic in the view size.
Problem 3: Pull Deterioration
[Figure: with pull-only gossip, faulty ids accumulate in correct views – from 50% faulty ids in views to 75% after one round of pulls.]
Trick 3: Balance Pulls & Pushes
Control the contribution of pushes (α|V| ids) versus the contribution of pulls (β|V| ids), with parameters α, β:
- Pull-only → eventually all ids in views are faulty
- Push-only → quick isolation of an attacked node
- Push ensures that, system-wide, not all ids are bad; pull slows down (but does not prevent) isolation
Trick 4: History Samples
Attacker influences both push and pull
Feedback γ|V| random ids from S Parameters α + β + γ = 1
Attacker loses control - samples are eventually perfectly uniform
Yoo-hoo, is there any good process out there?
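Tricks 2-4 combine in the view re-computation. A minimal sketch, assuming a simple flood threshold and default parameters α = β = 0.45, γ = 0.1 (the exact threshold rule here is illustrative, not the paper's):

```python
import random

def recompute_view(view, pushed_ids, pulled_ids, history_samples,
                   alpha=0.45, beta=0.45):
    """Rebuild V from alpha*|V| pushed ids, beta*|V| pulled ids, and
    gamma*|V| = (1 - alpha - beta)*|V| history samples from S.
    Trick 2: if suspiciously many pushes arrived, keep the old view."""
    n_push = round(alpha * len(view))
    n_pull = round(beta * len(view))
    n_hist = len(view) - n_push - n_pull          # gamma*|V|, since a+b+g = 1
    if len(pushed_ids) > n_push:                  # flooded -> likely an attack
        return view
    return (random.sample(pushed_ids, min(n_push, len(pushed_ids)))
            + random.sample(pulled_ids, min(n_pull, len(pulled_ids)))
            + random.sample(history_samples, min(n_hist, len(history_samples))))
```

The γ|V| history slots are what eventually wrest control from the attacker: they are drawn from S, which converges to uniform samples regardless of what the gossip channels carry.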
Samples Take Time To Help
- Judicious use is essential (e.g., 10%), to deal with churn and bootstrap
- We need connectivity for uniform sampling – can we rely on sampling for connectivity?
- Breaking the cycle: assume the attack starts when the samples are empty; analyze the time to partition without samples; samples become useful at least as fast
Key Property
With appropriate parameters, e.g., |V| = |S| = Θ(n^(1/3)):
time to partition > time to convergence of the 1st good sample.
- The lower bound on the time to partition is proved using Tricks 1, 2, 3 (not using samples yet)
- The upper bound is on the time until some good sample persists forever
- A persistent good sample gives self-healing from partitions
Time to Partition Analysis
- The easiest partition to cause is isolating a single node: the targeted attack
- Analysis of the targeted attack: assume an unrealistically strong adversary; analyze the evolution of faulty ids in views as a random process; show a lower bound on the time to isolation that depends on p, |V|, α, β and is independent of n
Scalability of PSP(t) – Probability of Perfect Sample at Time t
Analysis says: PSP(t) = 1 − e^(−|V|²·|S|/n).
For scalability, we want a small, constant convergence time independent of system size – e.g., when |V| = |S| = Θ(n^(1/3)), or |V| = Θ(log n), |S| = Θ(n/log² n).
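The scaling claim, PSP(t) ≈ 1 − e^(−|V|²·|S|/n), can be sanity-checked numerically: with |V| = |S| = n^(1/3) the exponent is a constant, so the probability of a perfect sample does not degrade as n grows. A quick sketch:

```python
import math

def psp(n, v, s):
    """PSP ~ 1 - e^(-|V|^2 * |S| / n), per the slide's asymptotic formula."""
    return 1 - math.exp(-(v * v * s) / n)

# |V| = |S| = n^(1/3): the exponent |V|^2|S|/n is ~1 for every n,
# so PSP stays near 1 - 1/e regardless of system size.
values = [psp(n, round(n ** (1 / 3)), round(n ** (1 / 3)))
          for n in (10**4, 10**6, 10**8)]
```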
Analysis
1. Sampling - mathematical analysis
2. Connectivity - analysis and simulation
3. Full system simulation
Connectivity ⇒ Sampling
Theorem: if the overlay remains connected indefinitely, then the samples are eventually uniform.
Sampling ⇒ Connectivity Ever After
The perfect sample of a sampler with hash h is the id with the lowest h(id) system-wide. If that id belongs to a correct node, it sticks once the sampler sees it. A correct perfect sample thus gives self-healing from partitions ever after.
Connectivity Analysis 1: Balanced Attacks
Attacking all nodes equally maximizes the number of faulty ids in views system-wide in any single round. If the attack is repeated, the system converges to a fixed-point ratio of faulty ids in views, which is < 1 if either γ = 0 (no history) and p < 1/3, or history samples are used (any p).
There are always good ids in views!
Fixed Point Analysis: Push
[Figure: local views of nodes 1…i at times t and t+1; some received pushes come from faulty nodes, and some correct pushes are lost.]
x(t) – the portion of faulty ids in views at round t. The portion of faulty pushes received by correct nodes is then p / (p + (1 − p)(1 − x(t))).
Fixed Point Analysis: Pull
[Figure: at time t, node 1 pulls from view member i, which is faulty with probability x(t).]
E[x(t+1)] = α · p / (p + (1 − p)(1 − x(t))) + β · (x(t) + (1 − x(t)) · x(t)) + γ · f
The first term is the push contribution; the second is the pull contribution (the pull partner is faulty with probability x(t), and a correct partner's view contains a portion x(t) of faulty ids); the third is the history-sample contribution.
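Iterating this expected evolution numerically illustrates both regimes claimed on the balanced-attack slide. The α/β/γ weighting below is reconstructed from the slides' push/pull/history decomposition, so treat it as a sketch:

```python
def next_x(x, p, alpha, beta, gamma, f):
    """One step of the expected evolution of x(t), the portion of
    faulty ids in correct nodes' views."""
    push = p / (p + (1 - p) * (1 - x))   # faulty share among accepted pushes
    pull = x + (1 - x) * x               # pull partner faulty, or faulty ids pulled
    return alpha * push + beta * pull + gamma * f

def fixed_point(p, alpha, beta, gamma, f, rounds=2000):
    x = 0.0
    for _ in range(rounds):
        x = next_x(x, p, alpha, beta, gamma, f)
    return x

# With history samples (gamma > 0) the fixed point stays below 1 for p = 0.2;
# without history (gamma = 0) and p > 1/3, views converge to all-faulty.
x_hist = fixed_point(p=0.2, alpha=0.45, beta=0.45, gamma=0.1, f=0.2)
x_nohist = fixed_point(p=0.4, alpha=0.5, beta=0.5, gamma=0.0, f=0.4)
```

This matches the slide's dichotomy: without history samples the all-faulty state x = 1 is attracting once p exceeds 1/3, while the γ·f feedback term pins the fixed point strictly below 1 for any p.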
Faulty Ids in Fixed Point
- With a few history samples, any portion of bad nodes can be tolerated
- Fixed points and convergence are perfectly validated: history samples are assumed perfect in the analysis, while the real history is used in simulations
Connectivity Analysis 2: Targeted Attack – Roadmap
- Step 1: analysis without history samples – isolation in logarithmic time, but not too fast, thanks to Tricks 1, 2, 3
- Step 2: analysis of history-sample convergence – time-to-perfect-sample < time-to-isolation
- Step 3: putting it all together – empirical evaluation; no isolation happens
Targeted Attack – Step 1
Q: How fast can an attacker isolate one node from the rest? (We prove a lower bound on this time.)
Worst-case assumptions:
- No use of history samples (γ = 0)
- An unrealistically strong adversary that observes the exact number of correct pushes and complements it to α|V|
- The attacked node is not represented in any view initially
- A balanced attack on the rest of the system
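The adversary of Step 1 is simple to state concretely; a one-function sketch (the function name and interface are illustrative):

```python
def adversary_pushes(view_size, correct_pushes_seen, alpha=0.5):
    """Worst-case adversary of Step 1: it observes exactly how many
    correct pushes the victim received this round and tops them up
    to the full push budget alpha*|V| with faulty pushes."""
    return max(0, round(alpha * view_size) - correct_pushes_seen)
```

This keeps the victim's per-round push intake saturated, maximizing the faulty fraction that survives the view re-computation.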
Isolation w/out History Samples
Simulation parameters: n = 1000, p = 0.2, α = β = 0.5, γ = 0; isolation time shown for |V| = 60.
The attacked node's expected degree evolves as a linear recursion with a 2×2 matrix A whose entries A_{i,j} depend on α, β, p:
(E[indegree(t+1)], E[outdegree(t+1)])ᵀ = A · (E[indegree(t)], E[outdegree(t)])ᵀ
Step 2: Sample Convergence
A perfect sample is obtained in 2-3 rounds (n = 1000, p = 0.2, α = β = 0.5, γ = 0, 40% unique ids in the stream). Empirically verified.
Step 3: Putting It All Together – No Isolation with History Samples
Works well despite a small PSP (n = 1000, p = 0.2, α = β = 0.45, γ = 0.1).
Sample Convergence (Balanced)
p = 0.2, α = β = 0.45, γ = 0.1, with |V| = |S| = 2·n^(1/3).
Convergence is twice as fast with |V| = |S| = 3·n^(1/3).