Richard T. B. Ma School of Computing
National University of Singapore
Peer-to-Peer Networks
CS 4226: Internet Architecture
Outline
P2P vs. traditional paradigm: properties, advantages, and challenges
Practical P2P systems: Napster, Gnutella, KaZaA, Skype, BitTorrent
Key technologies for P2P lookup services: Distributed Hash Tables (DHTs); two example architectures: Chord and CAN
The client/server model and extension
Client/server model: the traditional asymmetric communication model
roles: ad-hoc clients vs. dedicated servers
Extended model: delegation, a new role for the server (the client remains the same)
delegation can be recursive or iterative
[Diagram: the client sends a request to the server, which delegates it to a secondary server; the response flows back to the client.]
[Diagram: DNS hierarchy: root DNS servers on top; com, org, and edu TLD DNS servers below; authoritative servers (yahoo.com, amazon.com, poly.edu, umass.edu, pbs.org DNS servers) at the bottom.]
An example: Domain Name System (DNS)
client wants the IP for www.amazon.com (1st approximation):
client queries a root server to find a com DNS server
client queries the com DNS server to get the amazon.com DNS server
client queries the amazon.com DNS server to get the IP address for www.amazon.com
[Diagram: iterated resolution: requesting host cis.poly.edu asks local DNS server dns.poly.edu, which contacts the root DNS server, a TLD DNS server, and the authoritative DNS server dns.cs.umass.edu in turn (steps 1–8) to resolve gaia.cs.umass.edu.]
Domain name resolution: iterative vs. recursive
[Diagram: recursive resolution: the same lookup, but each server forwards the query onward on the requester's behalf: cis.poly.edu to dns.poly.edu, then root, TLD, and dns.cs.umass.edu (steps 1–8).]
Server-based vs. peer-to-peer
Peer-to-peer: properties and problems
Properties of (pure) P2P:
no always-on server or central entity
arbitrary end systems communicate directly
no a-priori knowledge/structure
flat architecture/namespace
Problems:
peers are intermittently connected and change IP addresses: unreliable service providers
how to stay connected?
how to do resource lookup?
File Distribution: Server-Client vs P2P
Question: how much time does it take to distribute a file from one server to N peers?
[Diagram: a server with upload bandwidth us, N peers with upload bandwidths u1 … uN and download bandwidths d1 … dN, connected by a network with abundant bandwidth; the server holds a file of size F.]
us: server upload bandwidth
ui: peer i upload bandwidth
di: peer i download bandwidth
File distribution time: server-client
[Same setup as above.]
server sequentially sends N copies: NF/us time
client i takes F/di time to download
Time to distribute F to N clients using the client/server approach:
d_cs = max{ NF/us, F/min_i di }
increases linearly in N (for large N)
File distribution time: P2P
[Same setup as above.]
server must send one copy: F/us time
client i takes F/di time to download
NF bits must be downloaded in aggregate; fastest possible upload rate: us + Σi ui
d_P2P = max{ F/us, F/min_i di, NF/(us + Σi ui) }
[Plot: minimum distribution time vs. N (up to 35) for client-server and P2P; the client-server time grows linearly with N, while the P2P time stays bounded.]
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us
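These two formulas can be checked numerically for the example above. A minimal sketch in Python, using the slide's units (u = 1 and F = 1, so F/u = 1 hour; us = 10u; dmin = us, so downloads never bottleneck):

```python
def d_cs(N, F, u_s, d_min):
    # client/server: server pushes N copies; the slowest client also limits
    return max(N * F / u_s, F / d_min)

def d_p2p(N, F, u_s, d_min, u):
    # P2P: server sends one copy; aggregate upload u_s + N*u serves N*F bits
    return max(F / u_s, F / d_min, N * F / (u_s + N * u))

u, F = 1.0, 1.0              # F/u = 1 hour
u_s, d_min = 10 * u, 10 * u
for N in (5, 10, 30):
    print(N, d_cs(N, F, u_s, d_min), d_p2p(N, F, u_s, d_min, u))
```

For N = 30 this gives 3.0 hours for client/server but only 0.75 hours for P2P, matching the linear vs. bounded curves on the plot.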
When and when not P2P?
When is P2P the right/wrong solution?
Claim: the P2P vision is technically feasible; in other words, it is possible to build everything on the Internet without any dedicated servers
But just because it is technically feasible doesn't necessarily make it sensible; just because we can do it P2P doesn't mean that we should
So, when is P2P the right solution?
Some Criteria
Budget How much money do we have?
Resource relevance How widely are resources interesting to users?
Trust How much trust is there between users?
Rate of system change How fast do things change in the system?
Criticality How critical is the service to the users?
P2P Applications and Systems
File sharing: Napster (99–01), KaZaA (01–12), Gnutella
Content distribution: BitTorrent
VoIP and messaging: Skype
Video streaming: PPLive, PPStream
Other applications: P2P computation, P2P storage, …
Napster: How does it work?
Based on a central index server
a user registers with the central server
the server sends the list of files to be shared
the server knows all the peers and files in the network
Searching based on keywords
search results: a list of files with information about each file and the peer sharing it
e.g., encoding rate, file size, peer's bandwidth
some information is entered by users and is therefore unreliable
Napster: How does it work?
Pretty much like the use of delegation
However, it swaps the client/server roles per transfer, making it peer-to-peer
Napster: Pros and Cons
Weaknesses:
downloading from a single peer only
single point of failure at the server
large computation needed to handle queries
unreliable content
vulnerable to attacks
lawsuits
Strengths:
a consistent view of the network
fast and efficient searching
guarantees correct search answers
Gnutella: How does it work?
Has only peers, all of which are fully equal; conceptually an overlay network
To join the network, a peer needs the address of another active peer
obtained via an out-of-band channel, e.g., from a website
Once joined, a peer learns about others and about the topology of the network
Queries are flooded into the network
Downloads directly between peers
Gnutella: How does it work?
[Diagram: a query floods through the overlay; query hits travel back along the reverse path; the file transfer itself happens directly between peers over HTTP.]
Gnutella: Pros and Cons
Weaknesses:
inefficient query flooding
• wastes a lot of network and peer resources
• how to deal with it?
inefficient network management
• constant probing is needed
Strengths:
fully distributed
open protocol
• easy to write clients, e.g., no KaZaA for Linux
robust against node failures
• only true for random failures, as it forms a power-law network
less susceptible to denial-of-service attacks
KaZaA: How does it work?
Two kinds of nodes
Ordinary Nodes (ON): a normal user peer
Supernodes (SN): a user peer with more resources/responsibilities than an ON
Forms a two-tier hierarchy
top level has only SNs, lower level only ONs
an ON belongs to one SN: it can change SN at will, but has only one SN at a time
An SN acts as a "hub" for all its ON children
keeps track of the files in those ON children's peers
KaZaA: Supernodes
exchange information among themselves
do not form a complete mesh
Ordinary nodes
obtain the address of an SN, send a request, and give a list of files to share
the SN starts keeping track of this ON
the ON is not visible to other SNs
KaZaA: Ordinary vs. Super Nodes
An ON can be promoted to SN if it has sufficient resources (bandwidth, uptime)
a user can typically refuse to become an SN
typical bandwidth requirement: 160–200 kbps
13% of ONs are responsible for 80% of uploads
SNs change their connections to other SNs on a time scale of tens of minutes
allows a larger range of the network to be explored
average SN lifetime is 2.5 hours, but with high variance
SNs don't cache info from disconnected ONs
an estimated 30,000 SNs exist at any given time; one SN has connections to 30–50 other SNs
Skype
Allows the user to make calls to other computers on the Internet
calls to/from the real phone network and real phone numbers are forwarded to Skype (costs money)
very popular: ~300 million downloads, ~15 million concurrent users online
Similar architecture to that of KaZaA
supernodes and ordinary nodes
but Skype is perfectly legal (the affected industry is "only" the telcos, and they sell the DSL lines…)
Skype: How does it work?
inherently P2P: pairs of users communicate.
proprietary, encrypted application-layer protocol (inferred via reverse engineering)
hierarchical overlay with SNs
index maps usernames to IP addresses; distributed over SNs
[Diagram: Skype overlay: Skype clients (SC) attach to supernodes (SN); a central Skype login server handles sign-in.]
Peers (supernodes) as relays
problem: both Alice and Bob are behind NATs; a NAT prevents an outside peer from initiating a call to a peer inside
solution: using Alice's and Bob's SNs, a relay is chosen
each peer initiates a session with the relay
the peers can now communicate through their NATs via the relay
BitTorrent: P2P Content Distribution
BitTorrent builds a network (swarm) for every file that is being distributed
Big advantage: you can send a "link" (.torrent) to a friend
the "link" always refers to the same file
not feasible in search-based Napster, Gnutella, or KaZaA (hard to identify particular files)
Downside: no searching
websites with "link collections" and search capabilities exist, but there is no name service
BitTorrent: How does it work?
For each shared file, there is (initially) one server (seed) which hosts the original copy; the file is broken into chunks
"torrent" file: metadata about the content, typically hosted on a web server
The client downloads the torrent file; the metadata indicates the sizes and checksums of the chunks and identifies a tracker
BitTorrent: To start with …
[Diagram: a web server, a tracker, a seed, and a new client; the .torrent file lists the tracker (137.89.211.1) and the 42 chunks with their descriptions.]
1. the seed starts a tracker
2. the seed creates the torrent file and hosts it somewhere (e.g., a web server)
3. a new client obtains the torrent file
4. the new client contacts the tracker and obtains a list of peers
5. the new client downloads/exchanges chunks with those peers
BitTorrent: file distribution
tracker: a server that keeps track of which seeds and peers are in the swarm; it doesn't participate in the actual file distribution
torrent: a group of peers exchanging chunks of a file
swarm: seeds + peers
[Diagram: a joining peer obtains a list of peers from the tracker, then trades chunks with them.]
file divided into 256KB chunks.
peer joining a torrent: has no chunks, but will accumulate them over time
registers with the tracker to get a list of peers, connects to a subset of peers ("neighbors")
when downloading, peer uploads chunks to others
peers may come and go
once peer has entire file, it may (selfishly) leave or (altruistically) remain
BitTorrent: a bit more detail
Pulling chunks:
at any given time, different peers have different subsets of the file's chunks
periodically, a peer (Alice) asks each neighbor for the list of chunks that they have
Alice sends requests for her missing chunks, rarest first
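Rarest-first selection can be sketched as follows (function and variable names are illustrative, not from any real client): count how many neighbors hold each chunk, then request the missing chunk held by the fewest of them.

```python
from collections import Counter

def rarest_first(my_chunks, neighbor_chunk_sets):
    # count how many neighbors hold each chunk
    counts = Counter(c for s in neighbor_chunk_sets for c in s)
    missing = [c for c in counts if c not in my_chunks]
    # request the missing chunk held by the fewest neighbors
    return min(missing, key=lambda c: counts[c]) if missing else None

neighbors = [{0, 1, 2}, {0, 1}, {0, 3}]
print(rarest_first({0}, neighbors))   # 2 (held by only one neighbor)
```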
Pushing chunks: tit-for-tat
Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
re-evaluates the top 4 every 10 secs
every 30 secs: randomly selects another peer and starts sending it chunks ("optimistic unchoke")
the newly chosen peer may join the top 4
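A toy model of this choking logic (names hypothetical; a real client tracks rolling rates): keep the four fastest uploaders unchoked, plus one random peer as the optimistic unchoke.

```python
import random

def unchoked_peers(upload_rates, all_peers, seed=None):
    # top 4 neighbors by the rate at which they send us chunks (re-evaluated every 10s)
    top4 = sorted(upload_rates, key=upload_rates.get, reverse=True)[:4]
    # every 30s: optimistically unchoke one randomly chosen other peer
    rng = random.Random(seed)
    others = [p for p in all_peers if p not in top4]
    optimistic = [rng.choice(others)] if others else []
    return top4 + optimistic

rates = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10}
print(unchoked_peers(rates, list(rates), seed=1))   # ['a', 'b', 'c', 'd', 'e']
```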
BitTorrent: more details
BitTorrent: Tit-for-tat
(1) Alice "optimistically unchokes" Bob
(2) Alice becomes one of Bob's top-four providers; Bob reciprocates
(3) Bob becomes one of Alice's top-four providers
With a higher upload rate, a peer can find better trading partners and get the file faster!
BitTorrent: Open Issues
Everyone must contribute
what about clients behind a firewall?
do low-bandwidth clients have a disadvantage?
BT's impact on the network
fast download != nearby in the network
Optimal chunk selection algorithm
rarest-first seems to work well in practice
is it optimal? fastest for a single peer or overall?
Is tit-for-tat really necessary?
are there situations where free-riding should be allowed or even encouraged?
Related issues
Dealing with today's users
usenet/email worked when users behaved well; now, spam is everywhere!
need accountability: identify individuals, even if "pseudonymously"
Preserve privacy (a somewhat conflicting goal)
Prevent "freeriding"
reputation tracking mechanisms help
voting mechanisms and payment schemes
much effort went into accountability in P2P systems, e.g., the tit-for-tat scheme in BitTorrent
Outline
P2P vs. traditional paradigm: properties, advantages, and challenges
Practical P2P systems: Napster, Gnutella, KaZaA, Skype, BitTorrent
Key technologies for P2P lookup services: Distributed Hash Tables (DHTs); two example architectures: Chord and CAN
Searching and Addressing
Two ways to find objects, which determine:
how the network is constructed
how objects are placed
how efficiently objects can be found
Examples (search or addressing?)
Google
DNS, IP routing
Napster, Gnutella, KaZaA, BitTorrent
Searching vs. Addressing
Searching:
no need to know unique names (more user friendly)
hard to make efficient (can be solved with $$, see Google)
need to compare actual objects to know if they are the same
Addressing:
object location can be made efficient
each object is uniquely identifiable
need to know unique names
need to maintain the structure required for addressing
Two types of P2P
Unstructured networks/systems
cause the need for searching
does not mean a complete lack of structure
• they have graph structure, e.g., power-law, hierarchy
… but peers are free to join anywhere, choose neighbors freely, and objects are stored anywhere
Structured networks/systems
allow for addressing and deterministic routing
the network structure determines where peers belong in the net and where objects are stored
how can we build such structured networks?
Key Value Store
Database contains entries in the form of (key, value) pairs
key: SS number; value: human name
key: content type; value: IP address
Operations/interface:
Put(key, value)
Get(key) → value
Looks like a table
finding an object takes O(N); how can we locate an object efficiently?
key     value
John    8732-7436
Adam    2349-5763
Mary    8734-7263
Linda   3682-8923
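A naive list-backed implementation of this interface makes the O(N) lookup cost concrete (a sketch, not any particular system):

```python
class NaiveKVStore:
    def __init__(self):
        self.pairs = []            # list of (key, value) tuples

    def put(self, key, value):
        self.pairs.append((key, value))

    def get(self, key):
        # linear scan: O(N) in the number of entries
        for k, v in self.pairs:
            if k == key:
                return v
        return None

store = NaiveKVStore()
store.put("John", "8732-7436")
store.put("Mary", "8734-7263")
print(store.get("Mary"))   # 8734-7263
```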
Recall: Hash Tables
Data structure
fixed-size array of hash buckets
allows insertions, deletions, and lookups in O(1)
Hash function maps keys to hash buckets, with desirable properties:
fast to compute
even distribution of keys
[Figure: eight hash buckets, indexed 0–7; hash(x) = x mod 8 maps the keys 16, 26, 45, 84, 31, and 42 to their buckets.]
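With hash(x) = x mod 8, the keys from the figure land in buckets like this:

```python
def hash_bucket(x, num_buckets=8):
    return x % num_buckets        # fast, spreads random keys evenly

buckets = [[] for _ in range(8)]
for key in (16, 26, 45, 84, 31, 42):
    buckets[hash_bucket(key)].append(key)

print(buckets)
# 16 -> bucket 0; 26 and 42 -> bucket 2; 84 -> 4; 45 -> 5; 31 -> 7
```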
Distributed Hash Table (DHT)
Idea: distribute the hash buckets to peers
Core question: how to design and implement an efficient mechanism to find which peer is responsible for which hash bucket, and how to route between them?
[Figure: the same eight buckets, now distributed across several peers.]
DHT: Principles
Each node is responsible for one or more buckets
as nodes join and leave, the responsibilities change
Nodes communicate among themselves to find the responsible node
scalable communication makes the DHT efficient
DHTs support all the hash table operations
DHT: Examples
We'll study Chord (2001) and CAN (2001)
Other examples
Pastry/Tapestry (2001): based on Plaxton routing
Kademlia (2002): based on the XOR metric
All provide the same abstraction
store key-value pairs; given a key, retrieve/store the value
no semantics associated with the key or the value
Major differences
design of the namespace and of the routing in the overlay
References
I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications,” in Proc. SIGCOMM, San Diego, CA, Aug. 2001, pp. 149–160.
S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker, “A scalable content-addressable network,” in Proc. SIGCOMM, San Diego, CA, Aug. 2001, pp. 161–172.
Chord: Basics
From MIT; used in P2P storage systems
Uses the SHA-1 hash function in practice
results in 160-bit object/node identifiers
same hash function for both objects and nodes
• node IDs are hashed from IP addresses
• object IDs are hashed from object names
Organized in a ring which wraps around
nodes keep track of their predecessor and successor
Chord
example: namespace [0, 2^3 − 1]
an overlay network
who are the successor and predecessor of node 3?
Chord: how to assign indices?
In general, assign an identifier to each node/object in the range [0, 2^m − 1]
each identifier can be represented by m bits
Central issue: assigning (key, value) pairs to nodes/peers
Rule: assign indices to the node that has the closest ID
convention: closest is the immediate successor
successor(1) = 1, successor(2) = 3, successor(6) = 0
who is taking care of indices 1, 2, and 6?
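The successor rule can be sketched directly. Assuming the nodes present are {0, 1, 3} in the 2^3 namespace, as in this example:

```python
def successor(idx, nodes, m=3):
    space = 2 ** m
    # walk clockwise around the identifier circle until we hit a live node
    for step in range(space):
        candidate = (idx + step) % space
        if candidate in nodes:
            return candidate

nodes = {0, 1, 3}
print(successor(1, nodes), successor(2, nodes), successor(6, nodes))  # 1 3 0
```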
Chord: find a particular node
If we look for index 7 starting at node 2, how many steps does it take?
successor(7) = 0: 2 ⇒ 3 ⇒ 4 ⇒ 6 ⇒ 0
In general, it takes O(N) steps
N is the number of nodes; too slow for large N
Chord: adding shortcuts
Notation and definitions:
finger[k].start = (n + 2^(k−1)) mod 2^m, for 1 ≤ k ≤ m
finger[k].interval = [finger[k].start, finger[k+1].start)
finger[k].node = first node ≥ n.finger[k].start
successor = the next node on the identifier circle, i.e., finger[1].node
predecessor = the previous node on the identifier circle
Each node n maintains a finger table that includes at most m shortcuts
the kth finger/shortcut is at least 2^(k−1) away
Finger table of node n
Fingers for nodes 3 and 6:
Node 3:
start  int.    succ.
4      [4,5)   4
5      [5,7)   6
7      [7,3)   0
Node 6:
start  int.    succ.
7      [7,0)   0
0      [0,2)   0
2      [2,6)   2
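The finger tables follow mechanically from the definitions above. The sketch below assumes the nodes present are {0, 2, 3, 4, 6}, which is consistent with the successor entries shown:

```python
def successor(idx, nodes, space):
    # first live node at or clockwise after idx
    return next((idx + s) % space for s in range(space) if (idx + s) % space in nodes)

def finger_table(n, nodes, m=3):
    space = 2 ** m
    table = []
    for k in range(1, m + 1):
        start = (n + 2 ** (k - 1)) % space               # finger[k].start
        table.append((start, successor(start, nodes, space)))  # finger[k].node
    return table

print(finger_table(3, {0, 2, 3, 4, 6}))   # [(4, 4), (5, 6), (7, 0)]
print(finger_table(6, {0, 2, 3, 4, 6}))   # [(7, 0), (0, 0), (2, 2)]
```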
Node Join (node 1 starts alone; then nodes 2, 6, and 0 join)
Node 1 alone:
start  int.    succ.
2      [2,3)   1
3      [3,5)   1
5      [5,1)   1
After node 2 joins, node 1:
2      [2,3)   2
3      [3,5)   1
5      [5,1)   1
node 2:
3      [3,4)   1
4      [4,6)   1
6      [6,2)   1
After nodes 6 and 0 join, node 1:
2      [2,3)   2
3      [3,5)   6
5      [5,1)   6
node 2:
3      [3,4)   6
4      [4,6)   6
6      [6,2)   6
node 6:
7      [7,0)   0
0      [0,2)   0
2      [2,6)   2
node 0:
1      [1,2)   1
2      [2,4)   2
4      [4,0)   6
Routing (finger tables as on the previous slide)
query at node 1: hash(key) = 7
where is it located?
Node Leave
peer 1 abruptly leaves
peer 0 detects this; makes 2 its immediate successor
asks 2 who its immediate successor is; makes 2's immediate successor its own second successor
To handle node departures, each node must know the IP addresses of its two successors
Each node periodically pings its two successors to see if they are still alive
Chord: Performance
Finding an object takes O(log N) steps
For N nodes and K objects
each node is responsible for O(K/N) objects
when an (N+1)st node joins or leaves, responsibility for O(K/N) indices changes hands
Any node joining or leaving an N-node network uses O(log² N) messages to re-establish the routing and finger tables
initialize the finger table and predecessor (for a join)
From a ring to …
Two-dimensional torus
CAN: Basics
Scalable Content-Addressable Network (CAN)
From Berkeley; published in 2001 in the same conference as Chord
The namespace is a d-dimensional torus
Nodes keep track of their neighbors only
no need to store shortcuts
routing in a d-dimensional Euclidean space
CAN: node join
a new node A joins via an existing node I
A randomly chooses a coordinate (x, y)
the join request is routed from node I toward (x, y), discovering that node J owns it
node J's zone is split in half; node A now owns one half
Splitting/merging the namespace
Splitting a zone when a new node joins
in a sequential order of the coordinates: split along the X dimension first, and then Y
for the 2-dimensional space, each zone is a square or a 1:2 rectangle
When an existing node departs
merge back into a neighbor, if that can be done
otherwise, a neighbor node may temporarily handle multiple zones
CAN: routing
routing is easy: the routing table contains only the 4 neighbors (in 2D)
[Diagram: a message is forwarded zone by zone from node A toward node J.]
CAN: inserting a key
node B inserts (K, V):
1. a = h_x(K), b = h_y(K)
2. route (K, V) to coordinate (a, b)
3. the node that owns the zone containing (a, b) stores (K, V)
CAN: retrieving a key
node C retrieves (K, V):
1. a = h_x(K), b = h_y(K)
2. route "retrieve(K)" to the node that owns (a, b)
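The pair of hashes h_x, h_y can be sketched with one salted hash function (an illustrative choice; CAN does not mandate a particular hash):

```python
import hashlib

def can_point(key, side=1.0):
    # two independent uniform hashes give the (a, b) coordinate
    def h(salt):
        digest = hashlib.sha1((salt + key).encode()).hexdigest()
        return (int(digest, 16) % 10**6) / 10**6 * side
    return h("x:"), h("y:")

a, b = can_point("some-file.mp3")
print(a, b)   # a deterministic point in the unit square; route (K, V) toward it
```

Because the coordinate depends only on the key, an inserting node and a retrieving node route toward the same point.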
CAN: Extension and Performance
Increasing the dimension d > 2
increases routing table size and the number of hash functions
but gives shorter paths
State information is O(d)
maintain information about 2d neighbors
Routing takes O(d · n^(1/d)) hops with n nodes
the average path length is (d/4) · n^(1/d)
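The trade-off between per-node state (2d neighbors) and average path length (d/4)·n^(1/d) is easy to tabulate; the numbers below are just this formula evaluated for a few dimensions:

```python
def can_state(d):
    return 2 * d                      # neighbors maintained per node

def can_avg_path(d, n):
    return (d / 4) * n ** (1 / d)     # average hops for n nodes

n = 4096
for d in (2, 3, 6):
    print(d, can_state(d), round(can_avg_path(d, n), 1))
```

For n = 4096, going from d = 2 to d = 6 triples the per-node state (4 to 12 neighbors) but cuts the average path from 32 hops to 6.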
From 2D to 3D
CAN: Extension and Performance
Multiple realities
multiple independent coordinate spaces
each node gets a different zone in each space
multiple copies of the data are stored at multiple nodes
also gives shorter paths
Routing weighted by round-trip times
takes network topology into consideration
forward to the "best" neighbor
Dimensions vs. realities
increasing the dimension reduces the hop count more, but multiple realities have other benefits
More References
A. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems," in Proc. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware '01), pp. 329–350.
B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. Kubiatowicz, "Tapestry: A Resilient Global-scale Overlay for Service Deployment," IEEE Journal on Selected Areas in Communications, 22(1), 2004.