Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.

Post on 14-Dec-2015



Outline

• Background

• Pastry

• Pastry proximity routing

• PAST

Background

Peer-to-peer systems

• distribution

• decentralized control

• self-organization

• symmetry (communication, node roles)

Common issues

• Organize, maintain overlay network

• Resource allocation/load balancing

• Resource location

• Network proximity routing

Pastry provides a generic p2p substrate

Architecture

• P2p application layer: network storage, event notification, ?

• P2p substrate (self-organizing overlay network): Pastry

• Internet: TCP/IP

Structured p2p overlays

The primitive route(M, X) routes message M to the live node with nodeId closest to key X

NodeIds and keys are drawn from a large, sparse id space

Distributed Hash Tables (DHT)

[Figure: key/value pairs k1,v1 through k6,v6 stored on the nodes of a P2P overlay network]

Operations: insert(k,v), lookup(k)

• p2p overlay maps keys to nodes

• completely decentralized and self-organizing

• robust, scalable
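The insert/lookup interface above can be illustrated with a single-process toy. This is only a sketch under stated assumptions: `ToyDHT`, `key_to_id` and the use of MD5 are hypothetical choices made here for illustration, and every "node" is just a local dictionary rather than a networked peer.

```python
import hashlib


def key_to_id(key: str) -> int:
    """Hash a key into a 128-bit id (MD5 is an illustrative assumption)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class ToyDHT:
    """Stand-in for an overlay: each node id owns one local dict."""

    def __init__(self, node_ids):
        self.stores = {nid: {} for nid in node_ids}

    def responsible_node(self, key: str) -> int:
        kid = key_to_id(key)
        # Plain numeric distance keeps the sketch short; wrap-around is ignored.
        return min(self.stores, key=lambda nid: abs(nid - kid))

    def insert(self, key, value):
        self.stores[self.responsible_node(key)][key] = value

    def lookup(self, key):
        return self.stores[self.responsible_node(key)].get(key)


dht = ToyDHT([key_to_id(f"node-{i}") for i in range(8)])
dht.insert("song.mp3", b"data")
print(dht.lookup("song.mp3"))
```

Because both operations hash the key the same way, a lookup reaches the same node that the insert used, without any central directory.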

Outline

• Background

• Pastry

• Pastry proximity routing

• PAST

• SCRIBE

• Conclusions

Pastry: Object distribution

Consistent hashing [Karger et al. '97]

128-bit circular id space, 0 to 2^128 - 1

nodeIds (uniform random)

objIds (uniform random)

Invariant: node with numerically closest nodeId maintains object
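A small worked example may help with the invariant on a circular id space. The sketch below assumes that "closest" is measured around the ring, so an id near the top of the space can be closest to a node near 0; the function names are hypothetical.

```python
ID_SPACE = 2 ** 128  # size of the circular id space


def circular_distance(a: int, b: int) -> int:
    """Shortest distance between two ids when the space wraps around."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def closest_node(obj_id: int, node_ids: list[int]) -> int:
    """Node that would maintain obj_id under the invariant (ties: first wins)."""
    return min(node_ids, key=lambda n: circular_distance(n, obj_id))


nodes = [5, 2 ** 127, 2 ** 128 - 10]
obj = 2 ** 128 - 2
# Distance to node 5 is 7 (across the wrap), distance to 2**128 - 10 is 8.
print(closest_node(obj, nodes))
```

Here the object is assigned to node 5 even though 2^128 - 10 looks nearer when the ids are read as plain integers.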

Pastry: Object insertion/lookup

Route(X)

Msg with key X is routed to live node with nodeId closest to X

Problem: complete routing table not feasible

[Figure: circular id space from 0 to 2^128 - 1, with key X]

Pastry: Routing

Tradeoff

• O(log N) routing table size

• O(log N) message forwarding steps

Routing table of node 65a1fc (x = arbitrary suffix), log16 N rows

Row 0: 0x 1x 2x 3x 4x 5x 7x 8x 9x ax bx cx dx ex fx

Row 1: 60x 61x 62x 63x 64x 66x 67x 68x 69x 6ax 6bx 6cx 6dx 6ex 6fx

Row 2: 650x 651x 652x 653x 654x 655x 656x 657x 658x 659x 65bx 65cx 65dx 65ex 65fx

Row 3: 65a0x 65a2x 65a3x 65a4x 65a5x 65a6x 65a7x 65a8x 65a9x 65aax 65abx 65acx 65adx 65aex 65afx
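The layout of the table above follows a simple rule: row r holds entries that share the first r digits with the local nodeId and differ in digit r. A minimal sketch that regenerates the prefixes (the function name is hypothetical, and only prefixes are produced, not actual node addresses):

```python
HEX_DIGITS = "0123456789abcdef"


def routing_table_prefixes(node_id: str, rows: int) -> list[list[str]]:
    """Prefix patterns per row; the local node's own digit is skipped."""
    table = []
    for row in range(rows):
        shared = node_id[:row]      # digits shared with the local node
        own_digit = node_id[row]    # column covered by the local node itself
        table.append([shared + d for d in HEX_DIGITS if d != own_digit])
    return table


for row, prefixes in enumerate(routing_table_prefixes("65a1fc", 4)):
    print(f"Row {row}:", " ".join(p + "x" for p in prefixes))
```

Each row has 15 entries, so a table with log16 N populated rows holds O(log N) entries in total.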

Pastry: Routing

Properties

• log16 N steps

• O(log N) state

[Figure: Route(d46a1c) in the nodeId space; nodes shown: 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1]

Pastry: Leaf sets

Each node maintains IP addresses of the nodes with the L/2 numerically closest larger and smaller nodeIds, respectively.

• routing efficiency/robustness

• fault detection (keep-alive)

• application-specific local coordination
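As an illustration of the leaf set definition, the sketch below computes the L/2 neighbours on each side from a global list of nodeIds. That global view is an assumption made only for the example; a real node learns its leaf set through the join and repair protocols. The function name is hypothetical.

```python
def leaf_set(node_id: int, all_ids: list[int], L: int) -> tuple[list[int], list[int]]:
    """Return (smaller, larger): up to L/2 nearest nodeIds on each side,
    following the ring across the wrap-around point."""
    ring = sorted(all_ids)
    i = ring.index(node_id)
    # Never take more than half of the other nodes per side, so the two
    # sides stay disjoint in very small overlays.
    half = min(L // 2, (len(ring) - 1) // 2)
    smaller = [ring[(i - k) % len(ring)] for k in range(1, half + 1)]
    larger = [ring[(i + k) % len(ring)] for k in range(1, half + 1)]
    return smaller, larger


print(leaf_set(10, [10, 20, 30, 40, 50, 60, 70], L=4))
```

For node 10, the smaller side wraps around to the highest ids in the ring (70, then 60), while the larger side is 20 and 30.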

Pastry: Routing procedure

if (destination D is within range of our leaf set)

    forward to numerically closest member

else

    let l = length of shared prefix

    let d = value of l-th digit in D's address

    if (R[l][d] exists)    (R[l][d] = entry at column d, row l)

        forward to R[l][d]

    else

        forward to a known node that

        (a) shares at least as long a prefix

        (b) is numerically closer than this node
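The routing procedure can be turned into a runnable single-step function. This is a sketch, not the authors' implementation: the in-memory shapes (`leaf_set` as a list, `routing_table` as a dict keyed by `(row, digit)`) are assumptions, ids are equal-length lowercase hex strings, and wrap-around of the id space is ignored for brevity.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(self_id, key, leaf_set, routing_table):
    """Return the nodeId to forward to, or self_id if this node is closest."""
    def num(s):
        return int(s, 16)

    def dist(n):
        return abs(num(n) - num(key))

    # Case 1: the key lies within the range spanned by the leaf set.
    members = leaf_set + [self_id]
    if min(map(num, members)) <= num(key) <= max(map(num, members)):
        return min(members, key=dist)

    # Case 2: routing table entry that extends the shared prefix by one digit.
    l = shared_prefix_len(self_id, key)
    entry = routing_table.get((l, key[l]))
    if entry is not None:
        return entry

    # Case 3: any known node with an equally long prefix that is closer.
    known = leaf_set + list(routing_table.values())
    better = [n for n in known
              if shared_prefix_len(n, key) >= l and dist(n) < dist(self_id)]
    return min(better, key=dist) if better else self_id


print(next_hop("65a1fc", "d46a1c", ["65a1f0", "65a210"], {(0, "d"): "d13da3"}))
print(next_hop("d462ba", "d46a1c", ["d4213f", "d467c4", "d471f1"], {}))
```

With the nodeIds used in the slides, the first call takes the routing table branch and forwards to d13da3; the second falls inside the leaf set range and picks d467c4, the numerically closest member.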

Pastry: Performance

Integrity of overlay/message delivery:

• guaranteed unless L/2 simultaneous failures of nodes with adjacent nodeIds

Number of routing hops:

• No failures: < log16 N expected, 128/b + 1 max

• During failure recovery: O(N) worst case, average case much better
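To make the hop bounds concrete, the arithmetic can be evaluated for a few overlay sizes. The sketch assumes b = 4 bits per digit, which matches the hexadecimal nodeIds and the log16 in the slides; the function names are hypothetical.

```python
import math

B = 4  # bits per digit (assumption: hexadecimal digits, base 2**4 = 16)


def expected_hops_bound(n_nodes: int, b: int = B) -> float:
    """log base 2^b of N, the expected upper bound without failures."""
    return math.log(n_nodes, 2 ** b)


def max_hops(id_bits: int = 128, b: int = B) -> int:
    """128/b + 1: one hop per digit of the id, plus one."""
    return id_bits // b + 1


for n in (1_000, 10_000, 100_000):
    print(n, round(expected_hops_bound(n), 2))
print("max:", max_hops())
```

For 1,000, 10,000 and 100,000 nodes the bound is roughly 2.5, 3.3 and 4.2 hops, and the maximum with 128-bit ids and b = 4 is 33.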

Self-organization

How are the routing tables and leaf sets initialized and maintained?

• Node addition

• Node departure (failure)

Pastry: Node addition

New node: d46a1c

[Figure: Route(d46a1c) in the nodeId space; nodes shown: 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1]

Node departure (failure)

Leaf set members exchange heartbeat messages

• Leaf set repair (eager): request set from farthest live node in set

• Routing table repair (lazy): get table from peers in the same row, then higher rows
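The eager leaf set repair step can be sketched for one side of the leaf set. Everything here is an illustrative assumption: `is_alive` and `fetch_leaf_set` stand in for heartbeat detection and a remote request, ids are plain integers, and wrap-around is ignored.

```python
def repair_side(self_id, side, is_alive, fetch_leaf_set, half):
    """Repair one side of a leaf set.

    side: nodeIds on one side of self_id, ordered nearest to farthest.
    half: target size of the side (L/2).
    """
    live = [n for n in side if is_alive(n)]
    if len(live) == len(side) or not live:
        return live  # nothing failed, or nobody left to ask on this side
    farthest = live[-1]
    # The farthest live node's leaf set overlaps ours and extends beyond it.
    learned = [n for n in fetch_leaf_set(farthest)
               if n != self_id and is_alive(n)]
    same_side = [n for n in learned if (n > self_id) == (farthest > self_id)]
    merged = sorted(set(live) | set(same_side), key=lambda n: abs(n - self_id))
    return merged[:half]


remote_sets = {70: [50, 60, 80, 90]}  # hypothetical leaf set of node 70
print(repair_side(50, [60, 70], lambda n: n != 60, remote_sets.get, half=2))
```

In this example node 60 has failed, so node 50 asks node 70 and refills its larger side with 70 and 80.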

Pastry: Experimental results

Prototype

• implemented in Java

• emulated network

• deployed currently at ~25 sites worldwide

Pastry: Average # of hops

[Plot: average number of hops vs. number of nodes (1000 to 100000); L=16, 100k random queries; curves: Pastry and Log(N)]


Pastry: Proximity routing

Proximity metric = time delay estimated by ping

A node can probe distance to any other node

Each routing table entry uses a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix.
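The selection rule for a routing table entry can be shown in a few lines. This sketch assumes the node already knows a set of candidates and a measured delay to each (for instance from ping); `pick_entry` and the sample delays are hypothetical.

```python
from typing import Optional


def pick_entry(prefix: str, candidates: dict[str, float]) -> Optional[str]:
    """Among candidates whose nodeId starts with prefix, return the one with
    the smallest measured delay, or None if no candidate matches."""
    matching = {n: delay for n, delay in candidates.items()
                if n.startswith(prefix)}
    return min(matching, key=matching.get) if matching else None


# Hypothetical delays in milliseconds from the local node.
delays = {"d13da3": 80.0, "d4213f": 25.0, "d462ba": 140.0, "65a1fc": 5.0}
print(pick_entry("d", delays))   # nearest node with prefix d
print(pick_entry("d46", delays))
```

Any node with the right prefix would keep routing correct; choosing the nearest one only affects how far each hop travels in the underlying network.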

Pastry: Routes in proximity space

[Figure: Route(d46a1c) shown in the nodeId space (nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1) and the same route shown in the proximity space (nodes 65a1fc, d13da3, d4213f, d462ba, d467c4)]

Pastry: Distance traveled

[Plot: relative distance vs. number of nodes (1000 to 100000); L=16, 100k random queries, Euclidean proximity space; curves: Pastry and complete routing table]

Pastry: Locality properties

Expected distance traveled by a message in the proximity space is within a small constant of the minimum

Among k nodes with nodeIds closest to the key, message likely to reach the node closest to the source node first

[Figure: proximity space with nodes 65a1fc, d13da3, d4213f, d462ba, d467c4]

Pastry: Node addition

New node: d46a1c

[Figure: Route(d46a1c) in the nodeId space; nodes shown: 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1]

Pastry delay vs IP delay

[Plot: distance traveled by Pastry message (0 to 2500) vs. distance between source and destination (0 to 1400); Mean = 1.59; GATech topology, .5M hosts, 60K nodes, 20K random messages]

Pastry: API

• route(M, X): route message M to node with nodeId numerically closest to X

• deliver(M): deliver message M to application

• forwarding(M, X): message M is being forwarded towards key X

• newLeaf(L): report change in leaf set L to application
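One way to picture the split between the call an application makes (route) and the upcalls it receives (deliver, forwarding, newLeaf) is an abstract base class. The class names below are hypothetical and the Python method signatures are an assumption; the slide gives only the operation names. The one-node "overlay" exists solely to make the sketch runnable.

```python
from abc import ABC, abstractmethod


class PastryApplication(ABC):
    """Upcalls from the substrate into the application."""

    @abstractmethod
    def deliver(self, msg):
        """Called on the node whose nodeId is closest to the key."""

    def forwarding(self, msg, key):
        """Called on intermediate nodes before msg moves on towards key."""

    def new_leaf(self, leaf_set):
        """Called when the local leaf set changes."""


class CollectingApp(PastryApplication):
    def __init__(self):
        self.delivered = []

    def deliver(self, msg):
        self.delivered.append(msg)


class SingleNodeOverlay:
    """Degenerate overlay with one node: every key is closest to it."""

    def __init__(self, app: PastryApplication):
        self.app = app

    def route(self, msg, key):
        self.app.deliver(msg)


app = CollectingApp()
SingleNodeOverlay(app).route("hello", "d46a1c")
print(app.delivered)
```

An application such as a storage service would put its logic in deliver, and could use forwarding to inspect or cache messages that pass through intermediate nodes.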

Pastry: Security

• Secure nodeId assignment

• Secure node join protocols

• Randomized routing

• Byzantine fault-tolerant leaf set membership protocol

Pastry: Summary

• Generic p2p overlay network

• Scalable, fault resilient, self-organizing, secure

• O(log N) routing steps (expected)

• O(log N) routing table size

• Network proximity routing

PAST: File Retrieval

• file located in log16 N steps (expected)

• usually locates replica nearest client C

[Figure: Lookup of fileId issued by client C; k replicas of the file are stored in the overlay]
