CMPT 401 Summer 2008 Dr. Alexandra Fedorova Lecture XIV: P2P
CMPT 401 2008 © A. Fedorova
Outline
• Definition of peer-to-peer systems
• Motivation and challenges of peer-to-peer systems
• Early P2P systems (Napster, Gnutella)
• Structured overlays (Pastry)
• P2P applications: Squirrel, OceanStore
Definition of P2P
[Diagram: a client–server topology contrasted with a peer-to-peer topology]
P2P systems are motivated by the massive computing resources available all over the world and connected over the network
Definition of Peer-2-Peer
• Definition #1
  – A network architecture
  – Without centralized coordination
• Definition #2
  – Each node/peer is a client and a server at the same time
  – Each peer provides content and/or resources
  – Direct exchange between peers
  – Autonomy of peers (they can join and leave at will)
Why P2P?
• Enable the sharing of data and resources
• Computer and Internet usage has exploded in recent years
• Massive computing resources are available at the edges of the Internet – storage, cycles, content, human presence
WORLD INTERNET USAGE AND POPULATION STATISTICS

World Region            | Population (2007 est.) | % of World Population | Internet Usage (latest data) | % Population (Penetration) | % of World Usage | Usage Growth 2000–2007
------------------------|------------------------|-----------------------|------------------------------|----------------------------|------------------|-----------------------
Africa                  | 933,448,292            | 14.2 %                | 33,545,600                   | 3.6 %                      | 2.9 %            | 643.1 %
Asia                    | 3,712,527,624          | 56.5 %                | 418,007,015                  | 11.3 %                     | 36.2 %           | 265.7 %
Europe                  | 809,624,686            | 12.3 %                | 321,853,477                  | 39.8 %                     | 27.9 %           | 206.2 %
Middle East             | 193,452,727            | 2.9 %                 | 19,539,300                   | 10.1 %                     | 1.7 %            | 494.8 %
North America           | 334,538,018            | 5.1 %                 | 232,655,287                  | 69.5 %                     | 20.2 %           | 115.2 %
Latin America/Caribbean | 556,606,627            | 8.5 %                 | 109,961,609                  | 19.8 %                     | 9.5 %            | 508.6 %
Oceania / Australia     | 34,468,443             | 0.5 %                 | 18,796,490                   | 54.5 %                     | 1.6 %            | 146.7 %
WORLD TOTAL             | 6,574,666,417          | 100.0 %               | 1,154,358,778                | 17.6 %                     | 100.0 %          | 219.8 %
Benefits and Challenges
• Benefits
  – Massive resources
  – Load balancing
  – Anonymity
  – Fault tolerance
  – Locality
• Challenges
  – Security
  – Failure handling (nodes joining and leaving – churn)
  – Efficiency: how to search a massive system efficiently
  – Support for data mutation
Evolution of P2P Systems
• Three generations:
  – Generation 1: early music exchange services (Napster, Gnutella)
  – Generation 2: greater scalability, anonymity and fault tolerance (Kazaa)
  – Generation 3: emergence of middleware layers for application-independent management (Pastry, Tapestry)
Architecture
[Diagram: taxonomy of P2P architectures]
• Hybrid (Napster, SETI@Home)
• Pure
  – Unstructured (Gnutella)
  – Structured (Pastry)
• Super-peer (Kazaa)
Overlay Routing versus IP Routing
• Routing overlays route from one node in the P2P system to another
• At each hop, deliver to the next P2P node
• Another layer of routing on top of existing IP routing
Search in Hybrid P2P: Napster
[Diagram: peers A–D and a central lookup server holding the index table; peer B holds song A]
0. Peers upload their song names to the lookup server
1. Peer A queries the server for song A
2. The server returns IP(B)
3. Peer A downloads the song directly from peer B
• Lookup is centralized
• Peers provide meta-information to the lookup server
• Data exchange happens directly between peers
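The centralized index above can be sketched in a few lines. This is a toy illustration, not Napster's actual protocol; the peer address and song names are made up.

```python
class LookupServer:
    """Centralized index: peers upload (song, peer address) pairs;
    queries return addresses, and the download itself is peer-to-peer."""
    def __init__(self):
        self.index = {}                      # song name -> set of peer addresses

    def upload(self, peer, songs):
        for s in songs:
            self.index.setdefault(s, set()).add(peer)

    def query(self, song):
        return self.index.get(song, set())

server = LookupServer()
server.upload("10.0.0.2", ["song A", "song B"])   # peer B registers its songs
print(server.query("song A"))   # -> {'10.0.0.2'}; the client then downloads from B
```

Note that the server never stores the songs themselves, only metadata, which is exactly why the lookup is the single point of failure while the data transfer is not.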
Search in Unstructured P2P
[Diagram: peers A–I connected in an unstructured overlay; peer I holds song A]
1. Peer A issues a query for song A, with an initial TTL of N
2. The query is flooded from neighbour to neighbour; each hop decrements the TTL (N, N-1, ...)
3. When the file is found, the requester downloads it directly from the peer that holds it
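The flooding search with a decrementing TTL can be sketched as follows. The topology and peer names are invented for illustration; real Gnutella also deduplicates queries by message ID, which the `visited` set approximates here.

```python
from collections import deque

def flood_search(graph, start, has_file, ttl):
    """Flood a query through an unstructured overlay, Gnutella-style.
    Each hop decrements the TTL; forwarding stops when it reaches 0.
    graph: dict peer -> neighbour list (a made-up topology below)."""
    hits = []
    visited = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        peer, t = frontier.popleft()
        if peer in has_file:
            hits.append(peer)
        if t == 0:
            continue                     # TTL exhausted: do not forward
        for nbr in graph.get(peer, []):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, t - 1))
    return hits

# Toy topology: peer I (3 hops from A) holds song A
graph = {"A": ["B", "D"], "B": ["C"], "C": ["I"], "D": [], "I": []}
print(flood_search(graph, "A", {"I"}, ttl=3))   # -> ['I']
print(flood_search(graph, "A", {"I"}, ttl=2))   # -> [] (TTL too small)
```

The second call shows the efficiency problem: a TTL bounds the flood's cost but also bounds its reach, so content beyond the horizon is simply not found.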
Common Issues
• Organize and maintain the overlay network
  – node arrivals
  – node failures
• Resource allocation/load balancing
• Resource location
• Network proximity routing
Idea: provide a generic P2P substrate (Pastry, Chord, Tapestry)
Architecture
[Diagram: layered architecture]
• P2P application layer: network storage, event notification, ...
• P2P substrate: Pastry (a self-organizing overlay network)
• Internet: TCP/IP
Pastry: Object distribution
[Diagram: the circular GUID space, 0 to 2^128 − 1, with nodeIds and an objId placed on the ring]
• Globally Unique IDs (GUIDs): a 128-bit circular GUID space
• nodeIds are assigned uniformly at random
• objIds are chosen uniformly at random
• Invariant: the node with the numerically closest nodeId maintains the object
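The "numerically closest nodeId" invariant on a circular space can be sketched as below. The small node GUIDs are illustrative only; real Pastry GUIDs are random 128-bit values.

```python
SPACE = 2 ** 128   # Pastry's 128-bit circular GUID space

def circular_distance(a, b):
    """Distance between two GUIDs going the shorter way around the ring."""
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)

def responsible_node(node_ids, obj_id):
    """The invariant: the node whose nodeId is numerically closest
    to the object's GUID maintains that object."""
    return min(node_ids, key=lambda n: circular_distance(n, obj_id))

# Illustrative small GUIDs (real ones are uniform random 128-bit numbers)
nodes = [0x10, 0x42, 0x9F, 0xE0]
print(hex(responsible_node(nodes, 0x45)))   # -> 0x42
```

Because both nodeIds and objIds are uniform random, each node ends up responsible for a roughly equal slice of the ring, which is what gives Pastry its load balancing.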
Pastry: Object insertion/lookup
[Diagram: Route(X) on the circular GUID space, 0 to 2^128 − 1]
• A message with key X is routed to the live node with the nodeId closest to X
• Problem: a complete routing table is not feasible
Pastry Routing
• Leaf set – the closest nodes
• Routing table – a subset of nodes that are far away
• If you are far from the target node/object, route using the routing table
• Once you get closer, use the leaf set
• The routing table has to be well populated, so you can reach many far-away destinations
• A complete routing table can be very large
• How to make the routing table size feasible?
Pastry: Routing
Properties
• log16 N steps
• O(log N) state
[Diagram: Route(d46a1c) from node 65a1fc, hopping through d13da3, d4213f and d462ba toward nodes d467c4 and d471f1 near the destination]
Pastry: Routing table (# 65a1fc)
[Table: log16 N rows; one entry per cell, shown here by GUID prefix]
• Row 0: 0x, 1x, 2x, 3x, 4x, 5x, 7x, 8x, 9x, ax, bx, cx, dx, ex, fx (every first digit except the node's own 6)
• Row 1: 60x, 61x, 62x, 63x, 64x, 66x, 67x, 68x, 69x, 6ax, 6bx, 6cx, 6dx, 6ex, 6fx (prefix 6, every second digit except 5)
• Row 2: 650x, 651x, 652x, 653x, 654x, 655x, 656x, 657x, 658x, 659x, 65bx, 65cx, 65dx, 65ex, 65fx (prefix 65, every third digit except a)
• Row 3: 65a0x, 65a2x, 65a3x, 65a4x, 65a5x, 65a6x, 65a7x, 65a8x, 65a9x, 65aax, 65abx, 65acx, 65adx, 65aex, 65afx (prefix 65a, every fourth digit except 1)
Pastry Routing Table
• Each row i corresponds to the length of the common prefix
  – row 0 – 0 hex digits in common
  – row 1 – 1 hex digit in common
• Each column corresponds to the (i+1)-st digit that is not in common
  – column 0 – the first uncommon digit is 0
  – column A – the first uncommon digit is A
• The corresponding entries are [GUID, IP] pairs
• You go as far down the rows of the routing table as possible
• When you cannot go any further (no more matching digits), forward the request to the [GUID, IP] entry in the column containing the first uncommon digit
Pastry Routing: What’s the Next Hop?
[Same routing table for node 65a1fc as on the previous slide]
Pastry: Routing Algorithm
if (destination D is within range of our leaf set)
    forward to the numerically closest member
else
    let l = length of the shared prefix
    let d = value of the (l+1)-th digit in D’s address
    let R[l][d] = table entry at row = l, column = d
    if (R[l][d] exists)
        forward to the IP address at R[l][d]
    else
        forward to a known node that
        (a) shares at least as long a prefix
        (b) is numerically closer than this node
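The algorithm above translates fairly directly into code. This is a simplified sketch: GUIDs are equal-length hex strings, the routing table is a plain dict keyed by (row, digit), and ring wraparound is ignored.

```python
def shared_prefix_len(a, b):
    """Number of leading hex digits two GUID strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(node, dest, leaf_set, table):
    """One Pastry routing step. leaf_set: GUID strings near this node;
    table maps (row, digit) -> GUID. Wraparound is ignored here."""
    d = int(dest, 16)
    members = leaf_set + [node]
    lo = min(int(g, 16) for g in members)
    hi = max(int(g, 16) for g in members)
    if lo <= d <= hi:
        # Destination is within leaf-set range: numerically closest member
        return min(members, key=lambda g: abs(int(g, 16) - d))
    l = shared_prefix_len(node, dest)
    entry = table.get((l, dest[l]))   # row = prefix length, column = next digit
    if entry is not None:
        return entry
    # Rare case: any known node sharing at least as long a prefix
    # that is numerically closer than this node
    known = list(table.values()) + leaf_set
    closer = [g for g in known
              if shared_prefix_len(g, dest) >= l
              and abs(int(g, 16) - d) < abs(int(node, 16) - d)]
    return min(closer, key=lambda g: abs(int(g, 16) - d)) if closer else node

# First hop of the walkthrough on the next slides: 65a1fc toward d46a1c
print(next_hop("65a1fc", "d46a1c", ["65a123", "65abba"],
               {(0, "d"): "d13da3"}))   # -> d13da3
```

Each routing-table hop fixes at least one more hex digit of the destination, which is where the log16 N step bound comes from.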
Let’s Play Pastry!
• User at node 65a1fc
• Wants to get to the object with GUID d46a1c
• We will see how each next hop is found using a routing table or leaf set
• So, let’s start with the routing table and leaf set at node 65a1fc
Node: 65a1fc   Destination: d46a1c
Leaf set: 65a123, 65abba, 65badd, 65cafe
[Routing table of 65a1fc as shown earlier; the destination is outside the leaf-set range and shares no prefix digits, so the entry at row 0, column d is used]
Next hop: GUID = d13da3
Node: d13da3   Destination: d46a1c
Leaf set: d13555, d14abc, da1367, dbcdd5
[Routing table of d13da3: one shared digit (d), so the entry at row 1, column 4 is used]
Next hop: GUID = d4213f
Node: d4213f   Destination: d46a1c
Leaf set: d42cab, d42fab, dacabb, ddaddd
[Routing table of d4213f: two shared digits (d4), so the entry at row 2, column 6 is used]
Next hop: GUID = d462ba
Node: d462ba   Destination: d46a1c
Leaf set: d46cab, d46fab, dacada, deaddd
[Routing table of d462ba: three shared digits (d46), but the entry at row 3, column a (d46ax) is empty]
GUID = empty? Forward to any GUID with the longest common prefix that is numerically closer than the current node
Next hop: GUID = d469ab
Node: d469ab   Destination: d46a1c
Leaf set: d469ac, d46a00, d46a1c, dcadda
The destination d46a1c is in the leaf set – we are done!
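The walkthrough can be checked numerically: at each hop the shared prefix with the destination never shrinks. The last hop (d462ba to d469ab) used the fallback rule, so the prefix length stays at 3 while the numeric distance decreases.

```python
def shared_prefix_len(a, b):
    """Leading hex digits that two GUID strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

dest = "d46a1c"
hops = ["65a1fc", "d13da3", "d4213f", "d462ba", "d469ab"]
lengths = [shared_prefix_len(h, dest) for h in hops]
print(lengths)   # -> [0, 1, 2, 3, 3]: the shared prefix never shrinks
```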
A New Node Joining Pastry
• Compute its own GUID X – apply the SHA-1 hash function to its public key
• Get the IP address of at least one Pastry node (publicly available)
• Find a nearby Pastry node A (by repeatedly querying nodes in the leaf set of a known Pastry node)
• Send a join message to A, with destination X
• A will route the message to the node Z numerically closest to X
• The nodes along the route are: B, C, …
• Each node on the route sends X a part of its routing table and leaf set
• X constructs its own routing table and leaf set, and requests additional info if needed
Node Failure or Departure
• Repairs to the leaf set
  – Members of the leaf set are monitored with heartbeat messages
  – If a member has failed, the node searches for another node numerically closest to the failed member
  – The node asks that other node for its leaf set and adds members from it to its own leaf set
  – The node also informs its other neighbours of the failure
• Repairs to the routing table
  – Done on a “when discovered” basis
Pastry Evaluation: Experimental Setup
• Evaluated on a simulator
• A single machine simulates a large network of nodes
• Message passing is replaced by simulated transmission delay
• Models the join/leave behaviour of hosts
• IP delays and join/leave behaviour parameters are based on real measurements
• The simulator was validated using a real installation of 52 nodes
Pastry Evaluation: Dependability
• With an IP message loss rate of 0%
  – Pastry failed to deliver 1.5 in 100,000 requests (due to unavailability of the destination host)
  – All requests that were delivered arrived at the correct node
• With an IP message loss rate of 5%
  – Pastry lost 3.3 in 100,000 requests
  – 1.6 in 100,000 requests were delivered to the wrong node
Pastry Evaluation: Performance
• Performance metric: relative delay penalty (RDP)
• RDP: the ratio between the delay in delivering a request via the routing overlay and the delay in delivering that request via UDP/IP
• A direct measure of the extra cost incurred in employing overlay routing
• RDP in Pastry:
  – 1.8 with zero network message loss
  – 2.2 with 5% network message loss
Squirrel
• A web cache. Idea: P2P caching of web objects
• Caches web objects on nodes in a local network, organized in a P2P network over Pastry
• Motivation: no need for a centralized proxy cache
• Each Squirrel node has a Pastry GUID
• Each URL has a Pastry GUID (computed by applying the SHA-1 hash to the URL)
• The Squirrel node whose GUID is numerically closest to the URL’s GUID becomes the home node for that URL, i.e., caches that URL
• A simulation-based evaluation concluded that performance is comparable to that of a centralized cache
• Squirrel was subsequently deployed for real on a local network of 52 nodes
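The home-node rule above can be sketched as follows. How the 160-bit SHA-1 digest is reduced to a 128-bit GUID is an assumption here (the slides only say SHA-1 is applied), and the node names and URL are made up.

```python
import hashlib

def guid(text, bits=128):
    """GUID from SHA-1, truncated to the ID-space width.
    (Truncation is an assumption; the slides only mention SHA-1.)"""
    return int(hashlib.sha1(text.encode()).hexdigest(), 16) >> (160 - bits)

def home_node(node_guids, url):
    """Squirrel's rule: the node numerically closest to the URL's GUID
    becomes the home node that caches the URL (wraparound ignored)."""
    target = guid(url)
    return min(node_guids, key=lambda n: abs(n - target))

nodes = [guid(f"node-{i}") for i in range(8)]   # hypothetical node GUIDs
url = "http://example.com/index.html"
print(hex(home_node(nodes, url)))               # this URL's home node
```

Because the mapping is deterministic, every Squirrel node computes the same home node for a given URL without any central coordination.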
OceanStore
• A massive storage system
• An incrementally scalable persistent storage facility
• Replicated storage of both mutable and immutable objects
• Built on top of the P2P middleware Tapestry (based on GUIDs, similar to Pastry)
• OceanStore objects are like files – data stored in a set of blocks
• Each object is an ordered sequence of immutable versions that are (in principle) kept forever
• Any update to an object results in the generation of a new version
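The version model can be illustrated with a minimal sketch: updates append immutable versions and never modify old data. This is a toy abstraction, not OceanStore's actual block-level data structures.

```python
class VersionedObject:
    """OceanStore-style object: an ordered sequence of immutable versions.
    An update never overwrites old data; it appends a new version."""
    def __init__(self):
        self._versions = []

    def update(self, data):
        self._versions.append(bytes(data))   # new immutable version
        return len(self._versions) - 1       # its version number

    def read(self, version=-1):
        return self._versions[version]       # default: the latest version

obj = VersionedObject()
v0 = obj.update(b"hello")
obj.update(b"hello, world")
print(obj.read(v0), obj.read())   # the old version is still readable
```

Keeping every version forever is what lets OceanStore treat all stored data as immutable at the block level while still presenting mutable objects to clients.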
[Figure: OceanStore architecture]
OceanStore: Update
• Clients contact primary replicas to make update requests
• Primary replicas are powerful, stable machines; they reach an agreement on whether to accept the update
• The update data is sent to archive servers for permanent storage
• Meanwhile, the update data is propagated to secondary replicas for queries issued by other clients
• Clients must periodically check for new copies
Summary
• P2P systems harness massive computing resources available at the edges of the Internet
• Early systems partly depended on a central server (Napster) or used unstructured routing, e.g., flooding (Gnutella)
• Later it was recognized that common requirements of P2P systems could be met by P2P middleware (Pastry, Tapestry, Chord)
• P2P middleware provides routing, self-organization, node arrival and departure, and failure recovery
• Most P2P applications support sharing of immutable objects (Kazaa, BitTorrent)
• Some support mutable objects (OceanStore, Ivy)
• Other uses of P2P technology include Internet telephony (Skype)