April 2008 1 Brahms Byzantine-Resilient Random Membership Sampling Bortnikov, Gurevich, Keidar, Kliot, and Shraer

Dec 20, 2015

Brahms

Byzantine-Resilient Random Membership Sampling
Bortnikov, Gurevich, Keidar, Kliot, and Shraer

Edward (Eddie) Bortnikov Maxim (Max) Gurevich Idit Keidar

Gabriel (Gabi) Kliot Alexander (Alex) Shraer

Why Random Node Sampling

Gossip partners
  Random choices make gossip protocols work

Unstructured overlay networks
  E.g., among super-peers
  Random links provide robustness, expansion

Gathering statistics
  Probe random nodes

Choosing cache locations

The Setting

Many nodes – n
  10,000s, 100,000s, 1,000,000s, …

Come and go
  Churn

Full network
  Like the Internet

Every joining node knows some others
  (Initial) connectivity

Adversary Attacks

Faulty nodes (portion f of ids)
  Attack other nodes
  Byzantine failures

May want to bias samples
  Isolate nodes, DoS nodes
  Promote themselves, bias statistics

Previous Work

Benign gossip membership
  Small (logarithmic) views
  Robust to churn and benign failures
  Empirical study [Lpbcast, Scamp, Cyclon, PSS]
  Analytical study [Allavena et al.]
  Never proven uniform samples
  Spatial correlation among neighbors' views [PSS]

Byzantine-resilient gossip
  Full views [MMR, MS, Fireflies, Drum, BAR]
  Small views, some resilience [SPSS]
  We are not aware of any analytical work

Our Contributions

1. Gossip-based attack-tolerant membership
  Linear portion f of failures
  O(n^(1/3))-size partial views
  Correct nodes remain connected
  Mathematically analyzed, validated in simulations

2. Random sampling
  Novel memory-efficient approach
  Converges to proven independent uniform samples

The view is not all bad

Better than benign gossip

Brahms

1. Sampling - local component

2. Gossip - distributed component

[Diagram labels: Sampler, sample, Gossip, view]

Sampler Building Block

Input: data stream, one element at a time
  Bias: some values appear more than others
  Used with stream of gossiped ids

Output: uniform random sample of unique elements seen thus far
  Independent of other Samplers
  One element at a time (converging)

[Diagram: Sampler with input next and output sample]

Sampler Implementation

Memory: stores one element at a time
Use random hash function h
  From min-wise independent family [Broder et al.]
  For each set X, and all x ∈ X: Pr(min{h(X)} = h(x)) = 1/|X|

[Diagram: Sampler with operations init, next, sample]
  init: choose random hash function
  next: keep id with smallest hash so far
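The sampler described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a keyed BLAKE2b hash with a fresh random key stands in for a function drawn from a min-wise independent family (it is not a proven member of one), and node ids are assumed to be strings. The method names init, next, and sample follow the slide.

```python
import hashlib
import os


class Sampler:
    """Keeps only the id with the smallest hash value seen so far."""

    def __init__(self):
        self.init()

    def init(self):
        # Assumption: a fresh random key models "choose random hash function".
        self._key = os.urandom(16)
        self._min_hash = None
        self._sample = None

    def _h(self, node_id: str) -> bytes:
        return hashlib.blake2b(node_id.encode(), key=self._key,
                               digest_size=16).digest()

    def next(self, node_id: str) -> None:
        # Repeated ids hash to the same value, so stream bias has no effect.
        value = self._h(node_id)
        if self._min_hash is None or value < self._min_hash:
            self._min_hash, self._sample = value, node_id

    def sample(self):
        return self._sample


# A heavily biased stream: "mallory" appears 1000 times, the others once.
stream = ["mallory"] * 1000 + ["alice", "bob", "carol"]
s = Sampler()
for node_id in stream:
    s.next(node_id)
print(s.sample())  # one of the four distinct ids
```

Because the kept id depends only on the set of distinct ids seen, flooding the stream with one id does not raise that id's chance of being the sample.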

Component S: Sampling and Validation

[Diagram: component S is an array of Samplers, each paired with a Validator; init and next are driven by the id stream from gossip, and each sample is validated using pings]
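A minimal sketch of how component S might be organized, assuming string ids and a hypothetical `ping(id) -> bool` liveness probe supplied by the caller. The class name `SampleComponent` and the policy of re-initializing a sampler whose sample fails its ping are assumptions made for illustration; the slide only states that samples are validated using pings.

```python
import hashlib
import os


class Sampler:
    def __init__(self):
        self.init()

    def init(self):
        self._key = os.urandom(16)  # fresh key stands in for a fresh random hash
        self._best = None           # (hash value, id) with the smallest hash so far

    def next(self, node_id):
        h = hashlib.blake2b(node_id.encode(), key=self._key,
                            digest_size=16).digest()
        if self._best is None or h < self._best[0]:
            self._best = (h, node_id)

    def sample(self):
        return None if self._best is None else self._best[1]


class SampleComponent:
    """Several independent Samplers plus ping-based validation (hypothetical API)."""

    def __init__(self, size):
        self.samplers = [Sampler() for _ in range(size)]

    def update(self, gossiped_ids):
        # Every sampler sees every id from the gossip stream.
        for node_id in gossiped_ids:
            for s in self.samplers:
                s.next(node_id)

    def validate(self, ping):
        # Assumed policy: restart any sampler whose current sample does not answer.
        for s in self.samplers:
            current = s.sample()
            if current is not None and not ping(current):
                s.init()

    def samples(self):
        return [s.sample() for s in self.samplers if s.sample() is not None]


comp = SampleComponent(size=8)
comp.update(["a", "b", "c"])
comp.validate(lambda node_id: node_id != "a")  # pretend "a" has departed
print(comp.samples())
```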

Gossip Process

Provides the stream of ids for S
Needs to ensure connectivity
Use a bag of tricks to overcome attacks

Gossip-Based Membership Primer

Small (sub-linear) local view V
V constantly changes - essential due to churn
Typically, evolves in (unsynchronized) rounds

Push: send my id to some node in V
  Reinforce underrepresented nodes
Pull: retrieve view from some node in V
  Spread knowledge within the network

[Allavena et al. '05]: both are essential
  Low probability for partitions and star topologies

Brahms Gossip Rounds

Each round:
  Send pushes, pulls to random nodes from V
  Wait to receive pulls, pushes
  Update S with all received ids
  (Sometimes) re-compute V
    Tricky! Beware of adversary attacks

Problem 1: Push Drowning

[Diagram: pushes from correct nodes (Alice, Bob, Carol, Dana, Ed) compete with pushes from faulty nodes (Mallory, M&M, Malfoy); faulty ids M crowd the views of correct nodes A, B, D, E]

Trick 1: Rate-Limit Pushes

Use limited messages to bound faulty pushes system-wide

Assume at most p of pushes are faulty
  E.g., computational puzzles/virtual currency
  Faulty nodes can send portion p of them
  Views won't be all bad
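One way to picture the "computational puzzles" option is a hashcash-style proof of work attached to every push. The sketch below is an assumption made for illustration, not the mechanism used by Brahms: the function names, the message layout (sender, round, nonce), and the 12-bit difficulty are all hypothetical.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 12  # assumed difficulty: about 4096 hash attempts per push


def _digest_value(sender_id: str, round_no: int, nonce: int) -> int:
    data = f"{sender_id}|{round_no}|{nonce}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def solve_puzzle(sender_id: str, round_no: int, bits: int = DIFFICULTY_BITS) -> int:
    """Search for the smallest nonce whose digest has `bits` leading zero bits."""
    target = 1 << (256 - bits)
    for nonce in count():
        if _digest_value(sender_id, round_no, nonce) < target:
            return nonce


def verify_push(sender_id: str, round_no: int, nonce: int,
                bits: int = DIFFICULTY_BITS) -> bool:
    """Cheap check performed by the receiver before accepting a push."""
    return _digest_value(sender_id, round_no, nonce) < (1 << (256 - bits))


nonce = solve_puzzle("alice", round_no=7)
print(verify_push("alice", 7, nonce))  # True
```

Since every push costs the sender comparable work, the share of pushes a faulty coalition can produce is tied to its share of computing resources, which is the role the parameter p plays on this slide.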

Problem 2: Quick Isolation

[Diagram: faulty nodes (Mallory, M&M, Malfoy) flood a single correct node with pushes, outnumbering the pushes from Alice, Bob, Carol, Dana, and Ed]

Attacker: "Ha! She's out! Now let's move on to the next guy!"

Trick 2: Detection & Recovery

Do not re-compute V in rounds when too many pushes are received

[Diagram: a node receives a flood of pushes from Mallory, M&M, and Malfoy next to a single push from Bob: "Hey! I'm swamped! I better ignore all of 'em pushes…"]

Slows down isolation; does not prevent it
Isolation takes log (view size) time
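The detection rule amounts to a guard around the view update. A minimal sketch, assuming a caller-chosen threshold `max_pushes` (the slide does not say what counts as "too many") and a hypothetical helper name `next_view`:

```python
def next_view(current_view, pushed_ids, candidate_view, max_pushes):
    """Return the view to use in the next round.

    If more pushes arrived than expected, the node assumes it is under a
    flooding attack and keeps its current view unchanged for this round.
    """
    if len(pushed_ids) > max_pushes:
        return list(current_view)
    return list(candidate_view)


old = ["alice", "bob", "carol"]
flood = ["mallory"] * 50
print(next_view(old, flood, candidate_view=["mallory"] * 3, max_pushes=5))
# ['alice', 'bob', 'carol']
```

The guard only freezes the view during a flood; as the slide notes, it slows isolation down rather than preventing it.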

Problem 3: Pull Deterioration

[Diagram: views {A, B, M1, M2}, {C, D, M3, M4}, and {E, F, M5, M6} hold 50% faulty ids; after pulls that return M3, E, M7, and M8, the resulting view {M3, E, M7, M8} holds 75% faulty ids]

Trick 3: Balance Pulls & Pushes

Control contribution of push - α|V| ids versus contribution of pull - β|V| ids
  Parameters α, β

Pull-only: eventually all faulty ids
Push-only: quick isolation of attacked node
Push ensures: system-wide not all bad ids
Pull slows down (does not prevent) isolation

Trick 4: History Samples

Attacker influences both push and pull

Feedback γ|V| random ids from S
  Parameters α + β + γ = 1

Attacker loses control - samples are eventually perfectly uniform

Yoo-hoo, is there any good process out there?

View and Sample Maintenance

[Diagram: the view V is re-computed from α|V| pushed ids, β|V| pulled ids, and γ|V| ids from the sample S]
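The view re-computation with parameters α, β, γ can be sketched as follows. The function names, the rounding of the three shares, and the use of a plain list for the view are assumptions for illustration only.

```python
import random


def rand_subset(ids, k, rng):
    """Pick up to k distinct positions from ids uniformly at random."""
    ids = list(ids)
    return rng.sample(ids, min(k, len(ids)))


def recompute_view(pushed, pulled, history, view_size,
                   alpha, beta, gamma, rng=random):
    """Build a new view from pushed ids, pulled ids, and history samples."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    n_push = round(alpha * view_size)
    n_pull = round(beta * view_size)
    n_hist = view_size - n_push - n_pull  # remainder goes to history samples
    return (rand_subset(pushed, n_push, rng)
            + rand_subset(pulled, n_pull, rng)
            + rand_subset(history, n_hist, rng))


rng = random.Random(1)
pushed = [f"push{i}" for i in range(30)]
pulled = [f"pull{i}" for i in range(30)]
history = [f"hist{i}" for i in range(30)]
view = recompute_view(pushed, pulled, history, view_size=20,
                      alpha=0.45, beta=0.45, gamma=0.1, rng=rng)
print(len(view))  # 20: 9 pushed ids, 9 pulled ids, 2 history samples
```

With α = β = 0.45 and γ = 0.1, the values used in the later simulation slides, only a small part of the view comes from history, yet that part is outside the attacker's direct control once the samples have converged.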

Samples Take Time To Help

Judicious use essential (e.g., 10%)
  Deal with churn, bootstrap

Need connectivity for uniform sampling
  Rely on sampling for connectivity?

Break cycle:
  Assume attack starts when samples are empty
  Analyze time to partition without samples
  Samples become useful at least as fast

Key Property

With appropriate parameters, e.g., |V| = |S| = Ω(∛n):

Time to partition > time to convergence of 1st good sample
  Prove lower bound using tricks 1, 2, 3 (not using samples yet)
  Prove upper bound until some good sample persists forever

Self-healing from partitions

Time to Partition Analysis

Easiest partition to cause – isolate one node
  Targeted attack

Analysis of targeted attack:
  Assume unrealistically strong adversary
  Analyze evolution of faulty ids in views as a random process
  Show lower bound on time to isolation
    Depends on p, |V|, α, β
    Independent of n

Scalability of PSP(t) – Probability of Perfect Sample at Time t

Analysis says: PSP(t) ≥ 1 − e^(−|V|²|S| / n)

For scalability, want small and constant convergence time independent of system size, e.g., when
  |V| = |S| = Ω(∛n), or
  |V| = Θ(log n), |S| = Θ(n / log² n)
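A quick arithmetic check of the two example parameter choices, assuming (as a reading of this slide, not a quoted result) that the bound depends on the parameters only through the ratio |V|²·|S| / n. Both choices keep that ratio constant as n grows, which is what makes the convergence time independent of system size. Constants hidden by the asymptotic notation are set to 1 here.

```python
import math


def ratio(view_size: float, sample_size: float, n: int) -> float:
    """The quantity |V|^2 * |S| / n."""
    return view_size ** 2 * sample_size / n


for n in (10**3, 10**6, 10**9):
    cube_root = n ** (1 / 3)
    log_n = math.log(n)
    r_cube = ratio(cube_root, cube_root, n)        # |V| = |S| = cube root of n
    r_log = ratio(log_n, n / log_n ** 2, n)        # |V| = log n, |S| = n / log^2 n
    print(n, round(r_cube, 6), round(r_log, 6))
```

The first choice balances view and sample sizes; the second trades a much smaller view for a larger sample while leaving the ratio unchanged.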

Analysis

1. Sampling - mathematical analysis

2. Connectivity - analysis and simulation

3. Full system simulation

Connectivity ⇒ Sampling

Theorem: If overlay remains connected indefinitely, samples are eventually uniform

Sampling ⇒ Connectivity Ever After

Perfect sample of a sampler with hash h: the id with the lowest h(id) system-wide

If correct, sticks once the sampler sees it
Correct perfect sample ⇒ self-healing from partitions ever after

Convergence to 1st Perfect Sample

n = 1000

f = 0.2

40% unique ids in stream

Connectivity Analysis 1: Balanced Attacks

Attack all nodes the same
  Maximizes faulty ids in views system-wide in any single round

If repeated, system converges to fixed point ratio of faulty ids in views, which is < 1 if
  γ = 0 (no history) and p < 1/3, or
  History samples are used, any p

There are always good ids in views!

Fixed Point Analysis: Push

[Diagram: local views of node 1 through node i at time t and time t+1, showing a push, a push from a faulty node, and a lost push]

x(t) – portion of faulty nodes in views at round t
Portion of faulty pushes to correct nodes: p / (p + (1 − p)(1 − x(t)))

Fixed Point Analysis: Pull

[Diagram: local views of node 1 through node i at time t and time t+1; a pull from node i is faulty with probability x(t)]

E[x(t+1)] = α · p / (p + (1 − p)(1 − x(t))) + β · (x(t) + (1 − x(t)) x(t)) + γf

In the pull term, x(t) is the probability of a pull from a faulty node.
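The fixed point of this recurrence can be found numerically by simple iteration. The sketch below assumes the push and pull terms are weighted by α and β and that history contributes γ·f, consistent with the parameters introduced in tricks 3 and 4; the function names and the starting point x(0) = 0 are illustrative choices.

```python
def expected_next_x(x, p, alpha, beta, gamma, f):
    """One step of the recurrence for the expected portion of faulty ids."""
    push_term = p / (p + (1 - p) * (1 - x))  # faulty share of received pushes
    pull_term = x + (1 - x) * x              # pull from faulty, or faulty id from correct
    return alpha * push_term + beta * pull_term + gamma * f


def fixed_point(p, alpha, beta, gamma, f, x0=0.0, rounds=500):
    x = x0
    for _ in range(rounds):
        x = expected_next_x(x, p, alpha, beta, gamma, f)
    return x


# No history, p below 1/3: the portion of faulty ids settles below 1.
print(round(fixed_point(p=0.2, alpha=0.5, beta=0.5, gamma=0.0, f=0.2), 3))    # 0.64
# No history, p above 1/3: the views drift toward all-faulty.
print(fixed_point(p=0.4, alpha=0.5, beta=0.5, gamma=0.0, f=0.2) > 0.999)      # True
# With history samples, the fixed point stays below 1 even for p above 1/3.
print(fixed_point(p=0.4, alpha=0.45, beta=0.45, gamma=0.1, f=0.2) < 0.93)     # True
```

For α = β = 1/2 and γ = 0, substituting y = 1 − x into the fixed-point equation gives (1 − p)y² − (2 − 3p)y + (1 − 3p) = 0, which has a root with 0 < y < 1 exactly when p < 1/3, matching the condition stated for balanced attacks.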

Faulty Ids in Fixed Point

With a few history samples, any portion of bad nodes can be tolerated

Perfectly validated fixed points and convergence

Assumed perfect in analysis, real history in simulations

Convergence to Fixed Point

n = 1000

p = 0.2

α=β=0.5

γ=0

Connectivity Analysis 2: Targeted Attack – Roadmap

Step 1: analysis without history samples
  Isolation in logarithmic time
  … but not too fast, thanks to tricks 1, 2, 3

Step 2: analysis of history sample convergence
  Time-to-perfect-sample < Time-to-Isolation

Step 3: putting it all together
  Empirical evaluation
  No isolation happens

Targeted Attack – Step 1

Q: How fast (lower bound) can an attacker isolate one node from the rest?

Worst-case assumptions
  No use of history samples (γ = 0)
  Unrealistically strong adversary
    Observes the exact number of correct pushes and complements it to α|V|
  Attacked node not represented initially
  Balanced attack on the rest of the system

Isolation w/out History Samples

n = 1000

p = 0.2

α=β=0.5

γ=0

Isolation time for |V|=60

[Equation: the vector (E(indegree(t+1)), E(outdegree(t+1))) is obtained by applying a 2×2 matrix A to (E(indegree(t)), E(outdegree(t))); the entries A_{i,j} depend on α, β, p]

Step 2: Sample Convergence

Perfect sample in 2-3 rounds

n = 1000

p = 0.2

α=β=0.5, γ=0

40% unique ids

Empirically verified

Step 3: Putting It All Together
No Isolation with History Samples

Works well despite small PSP

n = 1000

p = 0.2

α=β=0.45

γ=0.1

Sample Convergence (Balanced)

p = 0.2

α=β=0.45

γ=0.1

[Plot: sample convergence for |V| = |S| = 2∛n]

Convergence twice as fast with |V| = |S| = 3∛n

Summary

O(n^(1/3))-size views
Resist attacks / failures of linear portion
Converge to proven uniform samples
Precise analysis of impact of attacks