SHELL: A Distributed and Oblivious Heap with Applications for Robust Information Systems and Heterogeneous Peer-to-Peer Networks Christian Scheideler Stefan.

SHELL: A Distributed and Oblivious Heap

with Applications for Robust Information Systems and Heterogeneous Peer-to-Peer Networks

Christian Scheideler

Stefan Schmid

Network Algorithms

Summer 2008

Stefan Schmid @ TU München, 2008

DISTRIBUTED COMPUTING

• Prof. Scheideler is at a conference

• Therefore: special program

• Shell

- Builds on what we have learned!

- Ongoing work...

No lecture notes

Still has gaps, possibly also errors

Slides in English so that they can also be used elsewhere!

Open to input / ideas!

Before we look at SHELL...



• Today, still many challenges in distributed systems (e.g., the Internet)

• E.g., viruses, spam, DoS attacks, selfish users, etc.

• Very active research

• For example, peer-to-peer computing

- Dynamics / churn: Peers join and leave frequently

- In a network of 1,000,000 peers where sessions last around 60 minutes, there are hundreds of membership changes every second!

- Peer-to-peer based on contributions of participants: problematic if users are selfish!

- E.g., BitThief free-rides in BitTorrent

- Heterogeneity: peers have different Internet connections, different CPUs, run different operating systems, etc.
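The churn figure above can be checked with a back-of-the-envelope calculation. The sketch below assumes a steady state in which every peer leaves once per session and each departure is matched by one join; both assumptions are illustrative and not taken from the slides.

```python
# Assumed steady state: network size stays constant, so joins match leaves.
peers = 1_000_000
session_seconds = 60 * 60          # sessions of around 60 minutes

# Each peer leaves once per session on average.
leaves_per_second = peers / session_seconds

# One join per leave keeps the network size constant.
changes_per_second = 2 * leaves_per_second
```

Under these assumptions, both the leave rate alone and the combined join/leave rate are in the hundreds per second.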

Motivation



• SHELL = our overlay architecture

• Basically, a distributed heap

• Refresher: min heap

- children have larger key than parent

- e.g., useful for priority queues (fast removeMin())
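As a reminder of the centralized data structure, here is a minimal sketch using Python's standard `heapq` module, where `heappop` plays the role of removeMin(). The keys are arbitrary example values.

```python
import heapq

queue = []
for key in [17, 3, 28, 9]:
    heapq.heappush(queue, key)

# Min-heap property in the array layout: the parent of index i is (i - 1) // 2,
# and no child has a smaller key than its parent.
heap_ok = all(queue[(i - 1) // 2] <= queue[i] for i in range(1, len(queue)))

# removeMin(): the smallest key sits at the root.
smallest = heapq.heappop(queue)
```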

SHELL Overview

slide from GAD lecture 2008...


Heap Refresher

• Heap in GAD...



• What is a distributed heap?

• We assume that peers have a key / order / rank / id

- for example: time when peer joined

• (Min-) heap property: Peers only connect to peers of lower order

- for example: peers only connect to older peers

- Shell constructs a directed overlay (however, backward edges, see later)

A Distributed Heap?

[Figure: example distributed heap with peers of order 3 to 28]



• What is an oblivious distributed heap?

• Oblivious = overlay topology only depends on set of currently active peers (and their IDs / orders) in the network

- but not on history, e.g., on time when these peers joined!

- example: if at join time, a new peer is inserted at the end of a list of peers, the resulting topology is not oblivious

- example: if a new peer is inserted in a list of peers with respect to the peer's order, the topology is oblivious
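The two list examples can be made concrete with a small sketch. The function names `join_append`, `join_sorted`, and `build` are hypothetical helpers for illustration only; peers are represented just by their order.

```python
def join_append(topology, peer):
    """Non-oblivious rule: the new peer goes to the end of the list."""
    return topology + [peer]

def join_sorted(topology, peer):
    """Oblivious rule: the new peer is placed according to its order."""
    return sorted(topology + [peer])

def build(join, arrivals):
    """Apply a join rule to a sequence of arriving peers."""
    topology = []
    for peer in arrivals:
        topology = join(topology, peer)
    return topology

# The same set of peers, arriving in two different histories.
history_a = [5, 2, 9]
history_b = [9, 5, 2]

append_a, append_b = build(join_append, history_a), build(join_append, history_b)
sorted_a, sorted_b = build(join_sorted, history_a), build(join_sorted, history_b)
```

With the append rule the two histories produce different lists, whereas the order-based rule yields the same list for both, which is exactly what "oblivious" asks for.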

An Oblivious Distributed Heap? (1)



An Oblivious Distributed Heap? (2)

• Why is oblivious good?

- the oblivious property is useful when it comes to fault-tolerance

- e.g., desktops may crash temporarily, and will then rejoin

- if topology is oblivious, peers can "remember" their old contacts, and when an old contact reappears, it can be integrated immediately (instantaneous rejoin)

• Many systems today are oblivious

- e.g., Pastry, Chord, etc.

- but not: e.g., Pagoda

- many systems in practice are not: Gnutella, BitTorrent, etc.



• Primary goal: dynamic and robust overlay

• In particular:

- maintaining heap property

- low peer degree, low network diameter, low congestion

- fast join / rejoin / leave

- peers can simply crash

Objectives of Shell

• Applications

- i-SHELL: A distributed information system robust to Sybil attacks

- h-SHELL: A peer-to-peer system for heterogeneous environments


• How to achieve these goals?

• Overlay based on continuous-discrete approach

- basically a de Bruijn graph

Overlay Graph (1)

• Refresher: continuous-discrete approach

- peers in cyclic [0,1)-interval

- a peer at position x is connected to the peers responsible for the continuous positions x/2 and (x+1)/2
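The refresher can be illustrated with a minimal sketch. The responsibility rule used here (the peer closest to a position on the cyclic interval) is an assumption for illustration; the slides do not specify which peer is responsible for a continuous position.

```python
def de_bruijn_positions(x):
    """Continuous de Bruijn targets x/2 and (x+1)/2 of a position x in [0,1)."""
    return (x / 2.0, (x + 1.0) / 2.0)

def cyclic_distance(a, b):
    """Distance between two positions on the cyclic [0,1)-interval."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def responsible_peer(position, peers):
    """Assumed rule: the peer whose position is closest to the given point."""
    return min(peers, key=lambda p: cyclic_distance(p, position))

# Example peers, identified by their positions in [0,1).
peers = [0.05, 0.30, 0.55, 0.80]
x = 0.30
targets = de_bruijn_positions(x)
neighbors = [responsible_peer(t, peers) for t in targets]
```

Here the peer at 0.30 targets the positions 0.15 and 0.65 and therefore links to the peers at 0.05 and 0.55.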


• Our distributed heap has larger peer degree

• Space is divided into different partitions

- partition i = 2^i intervals of size 1/2^i

- global partition renders analysis simpler ("same views")

Overlay Graph (2)


• Peer connects to all peers of lower order in

- Level-i home interval (interval which includes position x of peer)

- Adjacent level-i intervals to home

- de Bruijn intervals: intervals which include position x/2 and (x+1)/2

• What is level i?

- Level i chosen such that there are c log n_p peers in interval

- n_p = total number of peers in system with lower order

- n_p can be estimated, in the following we assume it is given
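The construction above can be sketched as follows. This is only an illustration under stated assumptions: peers are given as hypothetical (order, position) pairs, n_p is known exactly, the logarithm is base 2, and the level is taken to be the largest one whose home interval still holds at least c log n_p lower-order peers. The slides do not fix these details.

```python
import math

def interval_index(position, level):
    """Index of the level-i interval (size 1/2**i) containing a position."""
    return int(position * 2 ** level) % (2 ** level)

def relevant_intervals(x, level):
    """Home interval, its two adjacent intervals, and the de Bruijn intervals."""
    m = 2 ** level
    home = interval_index(x, level)
    return {home, (home - 1) % m, (home + 1) % m,
            interval_index(x / 2.0, level),
            interval_index((x + 1.0) / 2.0, level)}

def choose_level(x, lower_positions, c=1.0, max_level=32):
    """Assumed rule: largest level with >= c*log2(n_p) peers in the home interval."""
    n_p = len(lower_positions)
    if n_p < 2:
        return 0
    target = c * math.log2(n_p)
    level = 0
    for candidate in range(1, max_level + 1):
        home = interval_index(x, candidate)
        count = sum(1 for q in lower_positions
                    if interval_index(q, candidate) == home)
        if count < target:
            break
        level = candidate
    return level

def forward_neighbors(order, x, peers, c=1.0):
    """Orders of all lower-order peers located in the relevant intervals."""
    lower = [(o, q) for (o, q) in peers if o < order]
    level = choose_level(x, [q for _, q in lower], c)
    wanted = relevant_intervals(x, level)
    return level, sorted(o for (o, q) in lower
                         if interval_index(q, level) in wanted)

# 256 evenly spread lower-order peers; a new peer of order 256 at x = 0.3125.
peers = [(k, (k + 0.5) / 256) for k in range(256)]
level, neighbors = forward_neighbors(256, 0.3125, peers)
```

In this evenly spread example the peer settles on level 5, where each interval holds 8 = log2(256) peers, and it connects to the peers of five intervals (home, two adjacent, two de Bruijn).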

Overlay Graph (3)


• In order to ensure connectivity when many peers leave, interval size must be increased over time (peer upgrades to larger partition)

• Similarly, if many peers of lower order join in an interval, peers need to downgrade

• In addition to these forward edges, peers store incoming edges

- called backward edges

Overlay Graph (4)


• These edges are already sufficient for Shell

• However, in order to speed up changes between levels, a peer additionally stores pointers to the peers it would connect to if it upgraded

- the "funnel" to which the peer would connect

- of course, peer only connects to these lower order peers once they are on the corresponding level

- requires notification mechanism

Overlay Graph (5)

[Figure: funnel edges across Level i, Level i-1, Level i-2, ..., Level 1]

• In the following, we will not consider funnel edges in further detail!


Implication: Monotonicity

• From this construction, we can already derive some properties

• For instance, Shell features a monotonicity property:

If two peers p and p' are connected to the same interval I and if p is of larger order than p', then p knows strictly more peers in I

- because peers only connect to lower order peers in an interval
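A small sketch of the monotonicity property, using made-up orders for the peers located in an interval I. The helper `known_in_interval` is hypothetical and simply applies the rule that a peer connects to all lower-order peers in the interval.

```python
def known_in_interval(order, interval_orders):
    """Peers in the interval that a peer of the given order connects to."""
    return {o for o in interval_orders if o < order}

# Example orders of the peers located in interval I.
interval_orders = {3, 9, 10, 16, 18}

p, p_prime = 20, 12            # p is of larger order than p'
known_p = known_in_interval(p, interval_orders)
known_p_prime = known_in_interval(p_prime, interval_orders)
```

In this example the set known to p' is a proper subset of the set known to p. The inclusion is strict here because peers 16 and 18 lie in I with orders between those of p' and p.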


Distributed Order...: A Simplification

• In the following, we will assume that peers have distinct IDs

• E.g., assigned at join time by network entry point

• Otherwise: in case of multiple joins close in time, peers may not be able to decide which is older => need to introduce blackout zones, etc.

• In the following, we will not consider this issue in more detail


Analysis of Degree (1)

• The topological description allows us to analyze the peer degree

• Peers employ the following strategy: if number of neighbors falls below c log n_p in at least one interval, all intervals are doubled

• According to Chernoff bounds, it holds that if one interval contains c log n peers, there is no interval containing more than (1+d) c log n peers for any constant d > 0 (and sufficiently large c), with high probability.

• Therefore, degree is in O(log n) w.h.p.

- with funnel edges, the degree is O(log^2 n)
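One way to spell out the Chernoff argument is the following sketch. It assumes that peer positions are independent and uniform in [0,1), that X denotes the number of peers in one fixed interval with expectation \mu = c \ln n, and that there are at most n intervals. These assumptions are not stated on the slides.

```latex
% Multiplicative Chernoff bound for a sum X of independent 0/1 variables, 0 < d \le 1:
\Pr\bigl[X \ge (1+d)\,\mu\bigr] \;\le\; e^{-d^{2}\mu/3}
  \;=\; n^{-c\,d^{2}/3}
  \qquad\text{for } \mu = c \ln n .

% Union bound over at most n intervals:
\Pr\bigl[\exists\ \text{interval with } X \ge (1+d)\,c \ln n\bigr]
  \;\le\; n \cdot n^{-c\,d^{2}/3}
  \;=\; n^{\,1 - c\,d^{2}/3} .
```

For a constant d, choosing c large enough (for example c \ge 6/d^2) makes this failure probability at most 1/n.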


Analysis of Degree (2)

• What about incoming / backward edges?


Routing (1)

• The Shell overlay allows peers to route messages

• Similarly to continuous-discrete routing (adjusting one bit after another)

• Routing operation route(x) consists of two phases

Phase 1: Route along forward edges to the peer of lower order which is closest to x (or: to a lower order peer whose home region contains position x)

Phase 2: Descend along backward edges to the peer which is closest to x

Implication: If a peer wants to send a message to a peer of lower order, only Phase 1 is necessary, and the message will not traverse any higher order peers!


Routing (2)

• Observe that in our overlay, peers have multiple neighbors which could be used for the next de Bruijn routing hop (log n neighbors per interval)

• This can be exploited in order to minimize congestion

• Routing policy: peer p always forwards packets to its neighbor which is of largest order among the eligible peers (lower order than p)

• This alleviates load on very low order peers
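The routing policy can be sketched in a few lines. The function `next_hop` is a hypothetical helper; `candidates` stands for the orders of the neighbors in the interval needed for the next de Bruijn hop.

```python
def next_hop(own_order, candidates):
    """Forward to the eligible neighbor (lower order than us) of largest order."""
    eligible = [o for o in candidates if o < own_order]
    return max(eligible) if eligible else None

# A peer of order 20 with four candidate neighbors for the next hop.
hop = next_hop(20, [3, 9, 17, 25])
```

Peer 25 is not eligible because it is of higher order, and among 3, 9, and 17 the policy picks 17, which keeps traffic away from the very low order peers.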


Routing (3)

• Visualization of routing

[Figure: routing path; axis label "towards higher order peers"]

• Messages travel towards lower order peers

• But on each hop, a peer of as high an order as possible is chosen


Routing (4)

• Analysis of Phase 1

- according to continuous-discrete routing, at most log n hops are needed to reach the destination

- we make the following observation:

[Figure: routing towards higher order peers, with a formula whose terms are annotated "prob. that all peers of order lower than p but higher than n_p - l_1 are in other intervals" and "prob. that this peer is located in the corresponding interval"]


Routing (5)

• Generally for i-th hop:

[Figure: routing towards higher order peers]

• Summing up, after some lines of calculation, the probability that the final peer reached is of order n_p/2 or smaller is at most O(n_p^{-c}) for some constant c

With high probability, in the first phase of routing, the request travels to a peer of order at least n_p/2.


Routing (6)

• Definition of congestion:

[Figure: routing towards higher order peers]

• So what is the congestion in the first routing phase?


Routing (7)

[Figure: routing towards higher order peers]

• So what is the congestion in the first routing phase?

At most k peers can send via p, the routing path is of length log 2k, and the probability that it enters the interval on one of these hops is c log k / k

See our argument before...


Routing (8)

Theorem: First phase of routing terminates in logarithmic time and yields congestion of asymptotically log^2 n_p.


Routing (9)

• Routing phase 2: descent along backward edges to higher order peers

- idea: binary search which exploits monotonicity property

- higher order peers know more about interval

- on each level i, go to highest order peer which is located in interval which includes final position x

- terminates in logarithmic time

- logarithmic congestion: in each hop, a peer forwards at most one request


Join and Leave

• Join: similar to lookup, find highest order peer in final interval, get integrated

• Leave: peers can even crash, no particular operation is needed

• Change of level in time O(1), update cost induced at other peers in O(log^2 n)


Application 1: i-Shell

• i-Shell is a distributed information system

• Idea: data management through consistent hashing approach

• Generalized to multiple levels: on each level, data is stored on peer closest to x

- on each hop during insertion, a replica is placed

• Order of peers: time-stamps (assigned by network entry point)

• Thus: peers only connect to older peers
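The data placement idea can be sketched as follows. Everything concrete here is an assumption for illustration: SHA-256 as the hash function, the `peers_by_level` layout (level to peer name to position), and the peer names. Only the rule "on each level, store on the peer closest to x" comes from the slides.

```python
import hashlib

def key_position(key):
    """Hash a key to a position x in [0,1) (consistent hashing)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # 48 bits fit exactly into a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2 ** 48

def cyclic_distance(a, b):
    """Distance between two positions on the cyclic [0,1)-interval."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def closest_peer(x, positions):
    """Name of the peer whose position is closest to x."""
    return min(positions, key=lambda name: cyclic_distance(positions[name], x))

def replica_holders(key, peers_by_level):
    """On each level, the data item is stored on the peer closest to x."""
    x = key_position(key)
    return {level: closest_peer(x, positions)
            for level, positions in peers_by_level.items()}

# Hypothetical peers, grouped by level.
peers_by_level = {
    1: {"a": 0.10, "b": 0.50, "c": 0.80},
    2: {"d": 0.25, "e": 0.75},
}
holders = replica_holders("some-file", peers_by_level)
```

Because distances are measured on the cyclic interval, a position such as 0.97 is served by the peer at 0.10 rather than the one at 0.80.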


i-Shell

• Therefore:

- we immediately get that two peers p and p' can communicate on paths which include only peers of at least their age

- this renders the communication independent of younger peers

• Side benefit: measurement studies have shown that older peers typically have a longer remaining session time

- renders topology more stable

• Shell implies robustness to various attacks

• E.g., Sybil attack


Sybil Attack (1)

• Sybil attack

- big problem in Internet

- e.g., spam

- Sybil: book by Flora Rheta Schreiber about a person with 16 identities

• Attacker seeks to acquire many identities

- e.g., to control large fraction of network

• Countermeasures

- virtual identities: captchas etc.

- real identities? botnet?

- Douceur has shown that issue is difficult to deal with in distributed environments...


Sybil Attack (2)

• Shell is resilient to Sybil attacks of any scale!

• Model: Sybil attack starts at some time t0

• Theorem: traffic of old peers independent of Sybil attack

• Techniques

- Admission control

- Rate control

[Figure: example heap with peers of order 3 to 21. Annotations: "attack originates from lower peers"; "higher peers can perform a rate control algorithm"; "traffic between older peers unaffected"]


Application 2: h-Shell

• Alternatively, IDs could represent inverse of the peers' capabilities

• Therefore: peers only connect to peers with stronger capabilities

• Interesting architecture for heterogeneous systems

• Corollary: paths between strong peers only include strong peers

• Interesting, e.g., for multi-quality live-streaming


Conclusion

• Distributed heap based on continuous-discrete approach

• Oblivious for highly transient environments

• Robustness to Sybil attacks of arbitrary scale

• Alternatively, useful for heterogeneous environments

• Work in progress...

