Jan 04, 2016
SHELL: A Distributed and Oblivious Heap
with Applications for Robust Information Systems and Heterogeneous Peer-to-Peer Networks
Christian Scheideler
Stefan Schmid
Network Algorithms
Summer 2008
Stefan Schmid @ TU München, 2008 2
DISTRIBUTED COMPUTING
• Prof. Scheideler is at a conference
• Therefore: special program
• Shell
- Builds on what you have already learned!
- Ongoing work...
No handouts
Still has gaps, possibly also errors
Slides are in English so that they can be reused elsewhere!
Open to input / ideas!
Before we look at SHELL...
• Today, still many challenges in distributed systems (e.g., the Internet)
• E.g., viruses, spam, DoS attacks, selfish users, etc.
• Very active research
• For example, peer-to-peer computing
- Dynamics / churn: Peers join and leave frequently
- In a network of 1,000,000 peers where peer sessions last around 60 minutes, there are hundreds of membership changes every second!
- Peer-to-peer based on contributions of participants: problematic if users are selfish!
- E.g., BitThief free-rides in BitTorrent
- Heterogeneity: peers have different Internet connections, different CPUs, run different operating systems, etc.
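The churn figure above can be checked with a back-of-the-envelope calculation. This is a minimal sketch under two assumptions that are not on the slide: the network is in steady state (every departure is matched by one arrival), and every session lasts exactly the stated time. The function name is illustrative.

```python
def membership_changes_per_second(num_peers: int, session_seconds: float) -> float:
    """Rough churn estimate: joins plus leaves per second in steady state."""
    # If every peer stays for session_seconds, then num_peers / session_seconds
    # peers leave each second on average.
    leaves_per_second = num_peers / session_seconds
    # Steady-state assumption: each leave is matched by one join.
    joins_per_second = leaves_per_second
    return leaves_per_second + joins_per_second


rate = membership_changes_per_second(1_000_000, 60 * 60)
print(round(rate))  # about 556 membership changes per second
```

Under these assumptions the result is a few hundred changes per second, in line with the "hundreds" stated above.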
Motivation
• SHELL = our overlay architecture
• Basically, a distributed heap
• Refresher: min heap
- children have larger key than parent
- e.g., useful for priority queues (fast removeMin())
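As a reminder of the (centralized) min-heap, here is a small sketch using Python's standard heapq module. The key values and the helper is_min_heap are chosen for illustration only; removeMin() from the slide corresponds to heapq.heappop here.

```python
import heapq

keys = [18, 3, 28, 10, 17]  # illustrative keys

heap = []
for key in keys:
    heapq.heappush(heap, key)  # insert while keeping the heap property


def is_min_heap(h):
    """In the array layout, the parent of index i sits at (i - 1) // 2."""
    return all(h[(i - 1) // 2] <= h[i] for i in range(1, len(h)))


smallest = heapq.heappop(heap)  # removeMin(): returns the smallest key
print(smallest, is_min_heap(heap))
```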
SHELL Overview
slide from GAD lecture 2008...
Heap Refresher
• Heap in GAD...
• What is a distributed heap?
• We assume that peers have a key / order / rank / id
- for example: time when peer joined
• (Min-) heap property: Peers only connect to peers of lower order
- for example: peers only connect to older peers
- Shell constructs a directed overlay (however, there are also backward edges, see later)
A Distributed Heap?
[Figure: example of a distributed heap, peers labeled with their order]
• What is an oblivious distributed heap?
• Oblivious = overlay topology only depends on set of currently active peers (and their IDs / orders) in the network
- but not on history, e.g., on time when these peers joined!
- example: if at join time, a new peer is inserted at the end of a list of peers, the resulting topology is not oblivious
- example: if a new peer is inserted in a list of peers with respect to the peer's order, the topology is oblivious
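The two list examples above can be made concrete. The following sketch treats the "topology" as a plain Python list of peer orders; the function names and the two join histories are hypothetical and serve only to show that the sorted list depends on the set of peers alone, while the append-only list depends on the join history.

```python
import bisect


def join_append(topology, peer_order):
    """Non-oblivious: the new peer goes to the end of the list."""
    return topology + [peer_order]


def join_sorted(topology, peer_order):
    """Oblivious: the new peer is inserted according to its order."""
    new_topology = list(topology)
    bisect.insort(new_topology, peer_order)
    return new_topology


def build(join, arrival_sequence):
    topology = []
    for peer_order in arrival_sequence:
        topology = join(topology, peer_order)
    return topology


# Same set of peers, two different join histories.
history_a = [5, 2, 9]
history_b = [9, 5, 2]

print(build(join_sorted, history_a) == build(join_sorted, history_b))  # True
print(build(join_append, history_a) == build(join_append, history_b))  # False
```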
An Oblivious Distributed Heap? (1)
An Oblivious Distributed Heap? (2)
• Why is oblivious good?
- the oblivious property is useful when it comes to fault-tolerance
- e.g., desktops may crash temporarily, and will then rejoin
- if topology is oblivious, peers can "remember" their old contacts, and when an old contact reappears, it can be integrated immediately (instantaneous rejoin)
• Many systems today are oblivious
- e.g., Pastry, Chord, etc.
- but not: e.g., Pagoda
- many systems in practice are not: Gnutella, BitTorrent, etc.
• Primary goal: dynamic and robust overlay
• In particular:
- maintaining heap property
- low peer degree, low network diameter, low congestion
- fast join / rejoin / leave
- peers can simply crash
Objectives of Shell
• Applications
- i-SHELL: A distributed information system robust to Sybil attacks
- h-SHELL: A peer-to-peer system for heterogeneous environments
• How to achieve these goals?
• Overlay based on continuous-discrete approach
- basically a de Bruijn graph
Overlay Graph (1)
• Refresher: continuous-discrete approach
- peers in cyclic [0,1)-interval
- a peer at position x is connected to the peers responsible for the continuous positions x/2 and (x+1)/2
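The two continuous de Bruijn positions from the refresher can be computed directly. This is a minimal sketch; the function name is illustrative, and the step of mapping each position to the peer responsible for it is left out.

```python
def de_bruijn_positions(x: float) -> tuple[float, float]:
    """Continuous positions that a peer at position x in [0,1) points to."""
    if not 0.0 <= x < 1.0:
        raise ValueError("position must lie in [0, 1)")
    # x/2 prepends a 0 bit to the binary expansion of x,
    # (x+1)/2 prepends a 1 bit.
    return x / 2.0, (x + 1.0) / 2.0


print(de_bruijn_positions(0.5))  # (0.25, 0.75)
```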
• Our distributed heap has larger peer degree
• Space is divided into different partitions
- partition i = 2^i intervals of size 1/2^i
- global partition renders analysis simpler ("same views")
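Because the partition is global, every peer can compute locally which level-i interval a position falls into. A small sketch, with illustrative function names and intervals numbered from 0:

```python
import math


def interval_index(x: float, level: int) -> int:
    """Index (0 .. 2**level - 1) of the level interval containing x in [0,1)."""
    return math.floor(x * 2**level)


def interval_bounds(index: int, level: int) -> tuple[float, float]:
    """Half-open range [low, high) covered by the given interval."""
    size = 1 / 2**level
    return index * size, (index + 1) * size


print(interval_index(0.3, 2))   # 1
print(interval_bounds(1, 2))    # (0.25, 0.5)
```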
Overlay Graph (2)
• Peer connects to all peers of lower order in
- Level-i home interval (interval which includes position x of peer)
- Adjacent level-i intervals to home
- de Bruijn intervals: intervals which include position x/2 and (x+1)/2
• What is level i?
- Level i chosen such that there are c log n_p peers in interval
- n_p = total number of peers in system with lower order
- n_p can be estimated, in the following we assume it is given
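The neighbor rule above can be sketched as follows. Several details are assumptions made for illustration and are not fixed by the slides: the logarithm is taken to base 2, the level is the finest one at which every relevant interval (home, adjacent, de Bruijn) still holds at least c log n_p lower-order peers, and max_level caps the search. All function names are hypothetical.

```python
import math


def interval_index(x: float, level: int) -> int:
    return math.floor(x * 2**level)


def relevant_intervals(x: float, level: int) -> set[int]:
    """Home interval, its two cyclic neighbors, and the two de Bruijn intervals."""
    m = 2**level
    home = interval_index(x, level)
    return {
        home,
        (home - 1) % m,
        (home + 1) % m,
        interval_index(x / 2, level),
        interval_index((x + 1) / 2, level),
    }


def choose_level(x: float, lower_positions: list[float],
                 c: float = 1.0, max_level: int = 20) -> int:
    """Finest level at which each relevant interval has >= c*log2(n_p) peers."""
    n_p = len(lower_positions)
    threshold = c * math.log2(max(n_p, 2))
    for level in range(max_level, 0, -1):
        counts = {idx: 0 for idx in relevant_intervals(x, level)}
        for pos in lower_positions:
            idx = interval_index(pos, level)
            if idx in counts:
                counts[idx] += 1
        if min(counts.values()) >= threshold:
            return level
    return 0  # fall back to the whole [0,1) interval


def forward_neighbors(x: float, lower_positions: list[float], level: int) -> list[float]:
    """Positions of all lower-order peers in the relevant intervals."""
    wanted = relevant_intervals(x, level)
    return [pos for pos in lower_positions if interval_index(pos, level) in wanted]


# 16 lower-order peers, evenly spaced for a deterministic example.
positions = [i / 16 for i in range(16)]
level = choose_level(0.3, positions, c=0.5)
print(level, sorted(relevant_intervals(0.3, level)))
print(len(forward_neighbors(0.3, positions, level)))
```

With evenly spaced peers the example settles on level 3, where each of the four relevant intervals holds two lower-order peers.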
Overlay Graph (3)
• In order to ensure connectivity when many peers leave, interval size must be increased over time (peer upgrades to larger partition)
• Similarly, if many peers of lower order join in interval, peers need to downgrade
• In addition to these forward edges, peers store incoming edges
- called backward edges
Overlay Graph (4)
• These edges are already sufficient for Shell
• However, in order to speed up changes between levels, peers additionally store pointers to the peers they would connect to if they upgraded
- these pointers form a "funnel" of peers to which the peer would connect
- of course, peer only connects to these lower order peers once they are on the corresponding level
- requires notification mechanism
Overlay Graph (5)
[Figure: funnel across Level i, Level i-1, Level i-2, ..., Level 1]
• In the following, we will not consider funnel edges in further detail!
Implication: Monotonicity
• From this construction, we can already derive some properties
• For instance, Shell features a monotonicity property:
If two peers p and p' are connected to the same interval I and if p is of larger order than p', then p knows strictly more peers in I
- because peers only connect to lower order peers in an interval
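A tiny sketch of the monotonicity property, using Python sets. The orders in the interval and the two observing peers (orders 20 and 9) are made up for illustration. Note that the inclusion is strict only if the interval contains at least one peer whose order lies between the two observers, as it does in this example.

```python
def known_in_interval(own_order: int, orders_in_interval: set[int]) -> set[int]:
    """A peer knows exactly the peers of lower order in the interval."""
    return {order for order in orders_in_interval if order < own_order}


interval_orders = {3, 8, 10, 17}   # hypothetical peers located in interval I

known_by_p = known_in_interval(20, interval_orders)        # p has larger order
known_by_p_prime = known_in_interval(9, interval_orders)   # p' has smaller order

print(known_by_p_prime < known_by_p)  # proper subset: True
```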
Distributed Order...: A Simplification
• In the following, we will assume that peers have distinct IDs
• E.g., assigned at join time by network entry point
• Otherwise: in case of multiple joins close in time, peers may not be able to decide which is older => need to introduce blackout zones, etc.
• In the following, we will not consider this issue in more detail
Analysis of Degree (1)
• Topological description allows us to analyze the peer degree
• Peers employ the following strategy: if number of neighbors falls below c log n_p in at least one interval, all intervals are doubled
• According to Chernoff bounds, it holds that if one interval contains c log n peers, there is no interval containing more than (1+d) c log n peers for any d > 0, with high probability.
• Therefore, degree is in O(log n) w.h.p.
- with funnel edges, the degree is O(log^2 n)
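The Chernoff argument can be written out as follows. This is a sketch under assumptions not spelled out on the slide: peer positions are independent and uniform on [0,1), the expected number of peers per interval is c ln n, and the standard multiplicative Chernoff bound is used for 0 < δ ≤ 1 (δ plays the role of d above).

```latex
% X = number of peers in one fixed interval (sum of independent indicators)
\[
  \mu = \mathbb{E}[X] = c \ln n, \qquad
  \Pr\bigl[X \ge (1+\delta)\mu\bigr] \le e^{-\delta^{2}\mu/3} = n^{-\delta^{2}c/3}
  \quad (0 < \delta \le 1).
\]
% Union bound over at most n intervals
\[
  \Pr\bigl[\exists \text{ interval with } X \ge (1+\delta)\, c \ln n\bigr]
  \le n \cdot n^{-\delta^{2}c/3} = n^{\,1-\delta^{2}c/3}.
\]
```

The failure probability is polynomially small once the constant c is chosen larger than 3/δ^2.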
Analysis of Degree (2)
• What about incoming / backward edges?
Routing (1)
• The Shell overlay allows peers to route messages
• Similarly to continuous-discrete routing (adjusting one bit after another)
• Routing operation route(x) consists of two phases
Phase 1: Route along forward edges to peer of lower order which is closest to x (or: to a lower order peer whose home region contains position x)
Phase 2: Descend along backward edges to peer which is closest to x
Implication: If a peer wants to send a message to a peer of lower order, only Phase 1 is necessary, and the message will not traverse any higher order peers!
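The "one bit after another" idea behind continuous-discrete routing can be sketched in the continuous space alone. This is not the SHELL routing algorithm itself: it ignores peer orders and the choice of a concrete peer per hop, and only shows how k de Bruijn hops move from position x to within 2^-k of the target y. Function names are illustrative.

```python
def target_bits(y: float, k: int) -> list[int]:
    """First k bits of the binary expansion of y in [0,1)."""
    bits = []
    for _ in range(k):
        y *= 2
        bit = int(y)
        bits.append(bit)
        y -= bit
    return bits


def de_bruijn_path(x: float, y: float, k: int) -> list[float]:
    """Continuous positions visited when routing from x towards y in k hops."""
    path = [x]
    # Each hop x -> (x + bit) / 2 prepends one bit. The last bit of y is
    # prepended first, so that the first bit of y ends up in front.
    for bit in reversed(target_bits(y, k)):
        x = (x + bit) / 2
        path.append(x)
    return path


path = de_bruijn_path(0.9, 0.625, 3)   # 0.625 = 0.101 in binary
print(path)
print(abs(path[-1] - 0.625) < 2**-3)   # True: within 1/8 of the target
```

After k hops the position equals the k-bit prefix of y plus x/2^k, so about log n hops suffice to reach the interval of the target.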
Routing (2)
• Observe that in our overlay, peers have multiple neighbors which could be used for the next de Bruijn routing hop (log n neighbors per interval)
• This can be exploited in order to minimize congestion
• Routing policy: peer p always forwards packets to its neighbor which is of largest order among the eligible peers (lower order than p)
• This alleviates load on very low order peers
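The routing policy above amounts to a one-line selection rule. In this sketch, candidate_orders stands for the orders of the peers in the interval of the next de Bruijn hop; the function name and the example values are hypothetical.

```python
from typing import Optional


def next_hop(own_order: int, candidate_orders: list[int]) -> Optional[int]:
    """Pick the eligible neighbor (lower order than ours) of largest order."""
    eligible = [order for order in candidate_orders if order < own_order]
    return max(eligible) if eligible else None


print(next_hop(20, [3, 25, 17, 9]))  # 17
print(next_hop(2, [3, 5]))           # None: no lower-order candidate
```

Choosing the largest eligible order keeps requests away from the lowest-order peers for as long as possible.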
Routing (3)
• Visualization of routing
[Figure: routing path; label: towards higher order peers]
• Messages travel towards lower order peers
• But on each hop, the highest order peer possible is taken
Routing (4)
• Analysis of Phase 1
- according to continuous-discrete routing, at most log n hops are needed to destination
- we make the following observation:
[Formula missing in source. Its two factors are annotated as "prob that all peers of order lower than p but higher than n_p - l_1 are in other interval" and "prob that this peer is located in the corresponding interval". Figure label: towards higher order peers]
Routing (5)
• Generally for i-th hop:
[Formula missing in source; figure label: towards higher order peers]
• Summing up, after some lines of calculation, the probability that the final peer reached is of order n_p/2 or smaller is at most O(n_p^-c) for some constant c
With high probability, in first phase of routing, request travels to peer of order at least n_p/2.
Routing (6)
• Definition of congestion:
[Definition missing in source; figure label: towards higher order peers]
• So what is the congestion in the first routing phase?
Routing (7)
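At most k peers can send via p, routing path is of length log 2k, and probability that it enters interval on one of these hops is c log k / k. The expected number of requests passing through p is therefore at most about k · log 2k · (c log k / k) = O(log^2 k).
See our argument before...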
Routing (8)
Theorem: First phase of routing terminates in logarithmic time and yields congestion of asymptotically log^2 n_p.
Routing (9)
• Routing phase 2: descent along backward edges to higher order peers
- idea: binary search which exploits monotonicity property
- higher order peers know more about interval
- on each level i, go to highest order peer which is located in interval which includes final position x
- terminates in logarithmic time
- logarithmic congestion: in each hop, a peer forwards at most one request
Join and Leave
• Join: similar to lookup, find highest order peer in final interval, get integrated
• Leave: peers can even crash, no particular operation is needed
• Change of level in time O(1), update cost induced at other peers in O(log^2 n)
Application 1: i-Shell
• i-Shell is a distributed information system
• Idea: data management through consistent hashing approach
• Generalized to multiple levels: on each level, data is stored on peer closest to x
- on each hop during insertion, a replica is placed
• Order of peers: time-stamps (assigned by network entry point)
• Thus: peers only connect to older peers
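The consistent hashing idea can be sketched for a single level as follows. The choice of SHA-256, the use of 53 bits of the digest, the cyclic distance, and all function names are assumptions for illustration; the slides only state that data is stored on the peer closest to x. Repeating the lookup per level, and placing replicas on each hop, is not shown.

```python
import hashlib


def hash_to_unit_interval(key: str) -> float:
    """Map a data key to a position x in [0,1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep 53 bits so that the result is an exact float strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def ring_distance(a: float, b: float) -> float:
    """Distance on the cyclic [0,1) interval."""
    d = abs(a - b)
    return min(d, 1 - d)


def closest_peer(x: float, peer_positions: list[float]) -> float:
    """Position of the peer responsible for x (the closest one on the ring)."""
    return min(peer_positions, key=lambda pos: ring_distance(pos, x))


peers = [0.1, 0.5, 0.8]                # hypothetical peer positions
x = hash_to_unit_interval("song.mp3")  # hypothetical data key
print(x, closest_peer(x, peers))
```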
i-Shell
• Therefore:
- we immediately get that two peers p and p' can communicate on paths which include only peers of at least their age
- this renders the communication independent of younger peers
• Side benefit: measurement studies have shown that older peers typically have a longer remaining session time
- renders topology more stable
• Shell implies robustness to various attacks
• E.g., Sybil attack
Sybil Attack (1)
• Sybil attack
- big problem in Internet
- e.g., spam
- Sybil: book by Flora Rheta Schreiber about a person with 16 identities
• Attacker seeks to acquire many identities
- e.g., to control large fraction of network
• Countermeasures
- virtual identities: captchas etc.
- real identities? botnet?
- Douceur has shown that issue is difficult to deal with in distributed environments...
Sybil Attack (2)
• Shell is resilient to Sybil attacks of any scale!
• Model: Sybil attack starts at some time t0
• Theorem: traffic of old peers independent of Sybil attack
• Techniques
- Admission control
- Rate control
[Figure: heap-ordered overlay with annotations "attack originates from lower peers", "higher peers can perform a rate control algorithm", "traffic between older peers unaffected"]
Application 2: h-Shell
• Alternatively, IDs could represent inverse of the peers' capabilities
• Therefore: peers only connect to peers with stronger capabilities
• Interesting architecture for heterogeneous systems
• Corollary: paths between strong peers only include strong peers
• Interesting, e.g., for multi-quality live-streaming
Conclusion
• Distributed heap based on continuous-discrete approach
• Oblivious for highly transient environments
• Robustness to Sybil attacks of arbitrary scale
• Alternatively, useful for heterogeneous environments
• Work in progress...