PASTRY
Sources
Pastry paper: “Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems” by Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University), IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001
Pastry homepage: http://research.microsoft.com/en-us/um/people/antr/Pastry/default.htm
Related work
Chord [Sigcomm ’01]
CAN [Sigcomm ’01]
Tapestry [TR UCB/CSD-01-1141]
PNRP [unpub.]
Viceroy [PODC ’02]
Kademlia [IPTPS ’02]
Small World [Kleinberg ’99, ’00]
Plaxton Trees [Plaxton et al. ’97]
Generalized Hypercube [Bhuyan et al. ’84]
Pastry
Generic p2p location and routing substrate (DHT)
Self-organizing overlay network (joins, departures, locality repair)
Consistent hashing
Lookup/insert object in < $\log_{2^b} N$ routing steps (expected)
O(log N) per-node state
Network locality heuristics
Scalable, fault resilient, self-organizing, locality aware, secure
Pastry: Object distribution
Consistent hashing
128-bit circular id space
nodeIds (uniform random)
objIds/keys (uniform random)
Invariant: node with numerically closest nodeId maintains object
[Figure: circular id space from 0 to $2^{128} - 1$, with nodeIds and an objId/key marked on the ring]
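The invariant above can be sketched in a few lines of Python. This is a minimal illustration, not Pastry's actual implementation: the helper names (`make_id`, `ring_distance`, `responsible_node`) are hypothetical, ids are assumed to be the first 128 bits of a SHA-1 digest, and numerical closeness is assumed to wrap around the ring.

```python
import hashlib
from typing import Sequence

ID_BITS = 128
RING = 1 << ID_BITS  # size of the circular id space: 2^128


def make_id(text: str) -> int:
    """Map a string to a 128-bit id (assumption: truncated SHA-1 digest)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


def ring_distance(a: int, b: int) -> int:
    """Numerical distance between two ids, allowing wrap-around at 2^128."""
    d = abs(a - b)
    return min(d, RING - d)


def responsible_node(key: int, node_ids: Sequence[int]) -> int:
    """Return the nodeId numerically closest to the key (ties: smaller id)."""
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))


# Usage: hash some hypothetical node addresses and one object name.
nodes = [make_id(f"10.0.0.{i}") for i in range(1, 9)]
owner = responsible_node(make_id("report.pdf"), nodes)
```

Scanning all nodeIds is only possible here because the sketch has global knowledge; avoiding that need is exactly what the routing slides that follow address.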
Pastry: Object insertion/lookup
Msg with key X is routed to live node with nodeId closest to X
Problem: complete routing table not feasible
[Figure: Route(X) on the circular id space from 0 to $2^{128} - 1$]
CMPT 880: P2P Systems - SFU
Pastry Node
Represented by 128-bit randomly chosen nodeId (hash of IP address or public key)
NodeId is interpreted as a sequence of digits in base $2^b$ (b is a configuration parameter; typical values are 2 or 4)
NodeIds are evenly distributed along the circular namespace (0 to $2^{128} - 1$)
Routes a message to its destination in O(log N) steps (N: size of network)
Node state contains:
  Leaf set (L)
  Routing table (R)
  Neighborhood set (M)
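To make "nodeId in base $2^b$" concrete, the sketch below splits an id into its digits. The function name `to_digits` is hypothetical, and the code assumes the id length in bits is a multiple of b.

```python
def to_digits(node_id: int, b: int = 4, id_bits: int = 128) -> list:
    """Split an id into id_bits / b digits of b bits each, most significant first.

    Assumption: id_bits is a multiple of b.
    """
    num_digits = id_bits // b
    mask = (1 << b) - 1
    return [
        (node_id >> (b * (num_digits - 1 - i))) & mask
        for i in range(num_digits)
    ]


# Usage: the 8-digit base-4 id 10233102 (b = 2, so 16 bits in total).
digits = to_digits(int("10233102", 4), b=2, id_bits=16)
```

With b = 4 a 128-bit nodeId has 32 hexadecimal digits; with b = 2 it has 64 base-4 digits.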
Pastry node state
Leaf set: the L numerically closest nodes, L/2 on each side (L is a configuration parameter, typically 16 or 32)
Routing Table (Prefix-based)
Neighborhood Set: M physically closest nodes
Pastry node state (Leaf Set)
Serves as a fallback for the routing table and contains:
  L/2 numerically closest larger nodeIds
  L/2 numerically closest smaller nodeIds
Size of L is typically $2^b$ or $2 \times 2^b$
Nodes in L are numerically close (could be geographically diverse)
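A small sketch of how such a leaf set could be selected from a list of known nodeIds. The function name and the `ring` parameter are assumptions made for illustration; a real node learns its leaf set through the join and repair protocols rather than from a global list.

```python
def leaf_set(own_id: int, known_ids, size: int, ring: int = 1 << 128):
    """Return (smaller, larger): the size/2 closest nodeIds on each side.

    Closeness is measured around the ring. With very few known nodes the
    two halves may overlap.
    """
    half = size // 2
    # Sort the other nodes by clockwise offset from own_id.
    others = sorted(set(known_ids) - {own_id}, key=lambda n: (n - own_id) % ring)
    larger = others[:half]         # nearest successors (larger ids, wrapping)
    smaller = others[::-1][:half]  # nearest predecessors (smaller ids, wrapping)
    return smaller, larger


# Usage on a toy ring of size 100 instead of 2^128.
smaller, larger = leaf_set(50, [10, 40, 45, 55, 60, 90], size=4, ring=100)
```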
Pastry node state: Neighborhood set (M)
Contains the IP addresses and nodeIds of the |M| closest nodes according to the proximity metric
|M| is typically $2^b$ or $2 \times 2^b$
Not used in routing, but instead for maintaining locality properties
Node state: Routing Table
Matrix of $\log_{2^b} N$ rows and $2^b - 1$ columns (N is the number of nodes in the network)
Entries in row n share the first n digits with the current nodeId, and the column number is the digit that follows the shared prefix
Format: matched digits-column number-rest of ID
$\log_{2^b} N$ rows populated on average
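The row and column rule can be checked with a short sketch. It assumes ids are written as equal-length digit strings (digits 0-9 and a-f, so bases up to 16); the names `shared_prefix_len` and `table_slot` are hypothetical.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two equal-length ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def table_slot(own_id: str, other_id: str):
    """Return (row, column) where other_id belongs in own_id's routing table.

    Row = length of the shared prefix; column = first digit of other_id
    that differs from own_id. Returns None if the ids are identical.
    """
    row = shared_prefix_len(own_id, other_id)
    if row == len(own_id):
        return None
    return row, int(other_id[row], 16)


# Usage with ids from the b = 2 example node 10233102.
slot = table_slot("10233102", "10211302")  # shares "102", next digit is 1
```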
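Node 10233102 (2), (b = 2, l = 8)
Row  Col 0     Col 1     Col 2     Col 3
0    02212102  -         22301203  31203203
1    -         11301233  12230203  13021022
2    10031203  10132102  -         10323302
3    10200230  10211302  10222302  -
4    10230322  10231000  10232121  -
5    10233001  -         10233232  -
6    -         -         10233120  -
7    -         -         -         -
A dash marks a slot with no entry shown: either the column of the node's own digit in that row, or an empty slot.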
Pastry: Routing
Tradeoff
  O(log N) routing table size: $2^b \cdot \log_{2^b} N + 2l$
  O(log N) message forwarding steps
Prefix routing
  Node IDs and keys from randomized namespace (SHA-1)
  Incremental routing towards destination ID
  Each node has a small set of outgoing routes
  log(N) neighbors per node, log(N) hops between any node pair
[Figure: a message addressed "To: ABCE" is forwarded through nodes A930, AB5F, and ABC0 to the node with ID ABCE, matching a longer prefix of the destination at each hop]
Pastry: Routing table (# 10233102)
L nodes in leaf set
$\log_{2^b} N$ rows (actually $\log_{2^b} 2^{128} = 128/b$)
$2^b$ columns
L neighbors
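A quick way to get a feel for these sizes is to plug in numbers. The sketch below is illustrative only: the function names are hypothetical, the default leaf set size of 16 is an assumption, and the expected row count is taken as the smallest r with $(2^b)^r \ge N$, computed with integers to avoid floating-point rounding.

```python
def expected_rows(n_nodes: int, b: int) -> int:
    """Smallest r such that (2^b)^r >= n_nodes, i.e. ceil(log_{2^b} N)."""
    rows = 0
    while (1 << (b * rows)) < n_nodes:
        rows += 1
    return rows


def routing_state(n_nodes: int, b: int = 4, leaf_size: int = 16,
                  id_bits: int = 128) -> dict:
    """Rough per-node state for a network of n_nodes nodes."""
    rows = expected_rows(n_nodes, b)
    return {
        "max_rows": id_bits // b,                 # 128 / b rows at most
        "expected_rows": rows,                    # rows populated on average
        "columns": 2 ** b,
        "expected_entries": rows * (2 ** b - 1),  # own-digit column excluded
        "leaf_set": leaf_size,
    }


# Usage: one million nodes with b = 4.
state = routing_state(10 ** 6)
```

For a million nodes and b = 4 this gives 5 populated rows out of a possible 32, so most of the table stays empty in practice.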
Pastry: Routing procedure
D: message key
$L_i$: i-th closest nodeId in leaf set
shl(A, B): length of prefix shared by nodes A and B
$R^i_j$: (j, i)-th entry of routing table
(1) D is within range of the leaf set
(2) Forward message to a closer node (better match)
(3) Forward towards a numerically closer node (not a better match)
Pastry: Routing procedure
if (destination is within range of our leaf set)
    forward to numerically closest member
else
    let l = length of shared prefix
    let d = value of l-th digit in D’s address
    if ($R_l^d$ exists)
        forward to $R_l^d$
    else
        forward to a known node (from L ∪ R ∪ M) that
        (a) shares at least as long a prefix
        (b) is numerically closer than this node
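The procedure above can be turned into a runnable sketch. Everything here is an illustrative assumption rather than the reference implementation: ids are equal-length lowercase hex strings, the routing table is a dict keyed by (row, digit), wrap-around on the ring is ignored, and the names `NodeState`, `shl`, `dist`, and `next_hop` are hypothetical. Returning the node's own id means "deliver locally".

```python
from dataclasses import dataclass, field


def shl(a: str, b: str) -> int:
    """Length of the digit prefix shared by two equal-length ids."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def dist(a: str, b: str) -> int:
    """Numeric distance between two hex ids (ring wrap-around ignored)."""
    return abs(int(a, 16) - int(b, 16))


@dataclass
class NodeState:
    node_id: str
    leaf_set: list = field(default_factory=list)       # L: nodeIds
    routing_table: dict = field(default_factory=dict)  # R: (row, digit) -> nodeId
    neighborhood: list = field(default_factory=list)   # M: nodeIds


def next_hop(state: NodeState, key: str) -> str:
    me = state.node_id
    # (1) Key within the range covered by the leaf set: closest member wins.
    members = state.leaf_set + [me]
    values = [int(n, 16) for n in members]
    if state.leaf_set and min(values) <= int(key, 16) <= max(values):
        return min(members, key=lambda n: (dist(n, key), n))
    # (2) Routing table entry that matches one more digit of the key.
    l = shl(me, key)
    if l < len(key):
        entry = state.routing_table.get((l, key[l]))
        if entry is not None:
            return entry
    # (3) Rare case: any known node with an equally long prefix that is
    #     numerically closer to the key than this node.
    known = (set(state.leaf_set) | set(state.routing_table.values())
             | set(state.neighborhood))
    closer = [n for n in known
              if shl(n, key) >= l and dist(n, key) < dist(me, key)]
    return min(closer, key=lambda n: (dist(n, key), n)) if closer else me


# Usage: node 65a1fc forwards key d46a1c via its row-0 entry for digit "d".
start = NodeState("65a1fc", leaf_set=["65a0aa", "65a2bb"],
                  routing_table={(0, "d"): "d13da3"})
hop = next_hop(start, "d46a1c")
```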
Pastry: Routing procedure
If the key D is within range of the leaf set, forward the message to the numerically closest leaf
Else forward to a node whose nodeId shares at least one more digit of prefix with D than the current nodeId
If no such node exists, forward to a node whose nodeId shares at least as many digits with D as the current nodeId but is numerically nearer to D
Pastry: Routing
Properties
  $\log_{2^b} N$ steps
  O(log N) state
[Figure: lookup of key d46a1c starting at node 65a1fc and passing d13da3, d4213f, and d462ba on the circular nodeId space; nodes d467c4 and d471f1 are also shown]
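The nodeIds in this figure can be used to check the two routing properties by hand. In the sketch below the hop order is an assumption read off the prefixes (it is not stated in the slide text), and the final hop to d467c4 is assumed to come from the leaf set.

```python
def shl(a: str, b: str) -> int:
    """Length of the digit prefix shared by two equal-length ids."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


KEY = "d46a1c"
# Assumed hop order: source first, node responsible for the key last.
PATH = ["65a1fc", "d13da3", "d4213f", "d462ba", "d467c4"]

# Each routing-table hop matches one more digit of the key: 0, 1, 2, 3, 3.
prefix_lengths = [shl(node, KEY) for node in PATH]

# Numeric distance to the key shrinks at every hop.
distances = [abs(int(node, 16) - int(KEY, 16)) for node in PATH]

# Among the three nodes near the key, d467c4 is numerically closest.
closest = min(["d462ba", "d467c4", "d471f1"],
              key=lambda n: abs(int(n, 16) - int(KEY, 16)))
```

The last hop does not lengthen the shared prefix; it only reduces the numeric distance, which is the job of the leaf set.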
Pastry: Locality properties
Assumption: scalar proximity metric
  e.g. ping/RTT delay, # IP hops (traceroute), subnet masks
  a node can probe distance to any other node
Proximity invariant: each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix.
Pastry: Geometric Routing in proximity space
[Figure: Route(d46a1c) from node 65a1fc, shown both in the nodeId space and in the proximity space, passing d13da3, d4213f, and d462ba; d467c4 and d471f1 also appear]
The proximity distance traveled by the message in each routing step is exponentially increasing (the entry in row l is chosen from a set of nodes of size $N/2^{bl}$).
The distance traveled by the message from its source increases monotonically at each step (the message takes larger and larger strides).
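The shrinking candidate sets behind this argument are easy to tabulate. The sketch below assumes uniformly distributed nodeIds, so that about $N/2^{bl}$ nodes share any given l-digit prefix; the function name is hypothetical.

```python
def candidate_set_sizes(n_nodes: int, b: int, rows: int) -> list:
    """Expected number of nodes eligible for an entry in rows 0 .. rows-1."""
    return [n_nodes // (2 ** (b * l)) for l in range(rows)]


# Usage: 65536 nodes with b = 4.
sizes = candidate_set_sizes(65536, b=4, rows=4)
```

With fewer candidates to choose from in each deeper row, the nearest eligible node tends to be further away, which is why later hops cover more proximity distance.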
Pastry: Locality properties
Each routing step is local, but there is no guarantee of a globally shortest path
Nevertheless, simulations show:
  Expected distance traveled by a message in the proximity space is within a small constant factor of the minimum
  Among the k nodes with nodeIds closest to the key, the message is likely to first reach the node closest to the source node
Pastry: Self-organization
Initializing and maintaining routing tables and leaf sets
  Node addition
  Node departure (failure)
The goal is to keep every routing table entry referring to a near node, among all live nodes with the appropriate prefix
Pastry: Node addition
New node X contacts nearby node A
A routes a “join” message with key X, which arrives at Z, the node numerically closest to X
X obtains its leaf set from Z, and the i-th row of its routing table from the i-th node on the path from A to Z
X informs any nodes that need to be aware of its arrival
X also improves its table locality by requesting neighborhood sets from all nodes X knows
In practice: optimistic approach
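How X assembles its initial state from the nodes on the join path can be sketched as follows. The data layout (plain dicts with a list of rows) and the function name are assumptions for illustration. Taking the neighborhood set from A follows the Pastry paper, since A is assumed to be close to X in the proximity space.

```python
def initial_state_for_new_node(path_states: list) -> dict:
    """Build X's first state from the states of the nodes on the join path.

    path_states[0] is A (the nearby contact), path_states[-1] is Z (the node
    numerically closest to X). Rows are numbered from 0.
    """
    a, z = path_states[0], path_states[-1]
    routing_table = []
    for i, node in enumerate(path_states):
        rows = node["routing_table"]
        # Row i comes from the i-th node on the path (empty if it has none).
        routing_table.append(list(rows[i]) if i < len(rows) else [])
    return {
        "leaf_set": list(z["leaf_set"]),          # Z is numerically closest
        "neighborhood": list(a["neighborhood"]),  # A is close in proximity
        "routing_table": routing_table,
    }


# Usage with a two-node path A -> Z and placeholder entries.
path = [
    {"routing_table": [["a0"], ["a1"]], "leaf_set": ["la"], "neighborhood": ["ma"]},
    {"routing_table": [["z0"], ["z1"]], "leaf_set": ["lz"], "neighborhood": ["mz"]},
]
x_state = initial_state_for_new_node(path)
```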
Pastry: Node addition
New node: X = d46a1c
A is X’s neighbor
[Figure: Route(d46a1c) in the nodeId space from A = 65a1fc via d13da3, d4213f, and d462ba to Z = d467c4; d471f1 is also shown]
Pastry: Node addition
New node: d46a1c
[Figure: Route(d46a1c) from 65a1fc via d13da3, d4213f, and d462ba towards d467c4, shown in the nodeId space and in the proximity space, with the new node X marked; d471f1 also appears]
X is close to A, and B is close to B1 (B1 is in the first row of B). Why is X close to B1?
The expected distance from B to its row-one entries (B1) is much larger than the expected distance from A to B (entries are chosen from sets of exponentially decreasing size).
Node departure (failure)
Leaf set repair (eager, all the time):
  Leaf set members exchange keep-alive messages
  On failure, request the leaf set from the furthest live node in the set
Routing table repair (lazy, upon failure):
  Ask peers in the same row for a replacement entry; if none is found, ask peers in higher rows
Neighborhood set repair (eager)
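Leaf set repair on one side of the set might look like the sketch below. It is a simplification under stated assumptions: ids are integers, ring wrap-around is ignored, `request_leaf_set` is a hypothetical callback standing in for a network request, and the liveness check of the replacement node is omitted.

```python
def repair_leaf_side(own_id: int, side: list, failed_id: int,
                     request_leaf_set, larger: bool = True) -> list:
    """Replace a failed leaf on one side (larger or smaller ids) of the leaf set.

    side: current nodeIds on that side. Returns the repaired side, closest
    first, with at most the original number of members.
    """
    live = [n for n in side if n != failed_id]
    if not live:
        return live
    # Ask the furthest live member on this side for its own leaf set.
    furthest = max(live, key=lambda n: abs(n - own_id))
    candidates = set(live) | set(request_leaf_set(furthest))
    candidates.discard(failed_id)
    candidates.discard(own_id)
    same_side = [n for n in candidates if (n > own_id) == larger]
    return sorted(same_side, key=lambda n: abs(n - own_id))[:len(side)]


# Usage: node 50 loses leaf 55; node 60 reports its own leaf set.
repaired = repair_leaf_side(50, [55, 60], failed_id=55,
                            request_leaf_set=lambda node: [50, 55, 65, 70])
```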
Pastry: Summary
Generic p2p overlay network
Scalable, fault resilient, self-organizing, secure
O(log N) routing steps (expected)
O(log N) routing table size
Network locality properties