Distributed Systems: Peer-to-Peer
Prof. Dr.-Ing. Torben Weis, Universität Duisburg-Essen
Transcript
Page 1: Peer2Peer

Distributed Systems: Peer-to-Peer

Prof. Dr.-Ing. Torben Weis, Universität Duisburg-Essen

Page 2: Peer2Peer

The History of P2P

∙ 1979: USENET
∙ 1996: ICQ
∙ 1998: Jabber
∙ 2000: Gnutella, eDonkey, Freenet
∙ 2002: Kademlia
∙ 2005: WASTE, Avalanche
∙ 2006: PNRP

Page 3: Peer2Peer

Domains of P2P

Page 4: Peer2Peer


Why Peer-to-Peer?

Earlier days:
∙ Client-server architecture in the area of distributed systems
∙ Problems:
  ◦ High load on servers
  ◦ Client-server systems do not scale indefinitely
  ◦ Server acts as a Single Point of Failure (SPOF)

Heavy use of the internet required alternatives

Major goal:
∙ Distribute load fairly across all nodes participating in the network

Page 5: Peer2Peer

Paradigm Shift

∙ At the end of the 1990s, Peer-to-Peer (P2P) systems begin to displace client-server systems
∙ Today, P2P traffic amounts to 60% of all internet traffic

Benefits:
∙ Self-organizing
∙ Decentralized
∙ Fair (costs, bandwidth, …)
∙ Scalable
∙ Dynamic
∙ Autonomous
∙ Anonymous

Page 6: Peer2Peer

Institut für Informationstechnik

Torben Weis 6

Where does a peer live?

∙ Peers live in an overlay above the regular IP network
∙ Connections between peers form virtual links
  ◦ A virtual link may correspond to a path consisting of several physical links
∙ The overlay allows routing messages to destinations not specified by IP address

Page 7: Peer2Peer

What is a peer?

Page 8: Peer2Peer

P2P Attributes

Page 9: Peer2Peer

Generations

Page 10: Peer2Peer

1st Generation: Centralized P2P

[Figure: peers and a server; interactions labeled Connect, Query, Reply, Transfer]

e.g. Napster

• Server stores index
• File transfer using P2P
• Scalability problems
• Single point of failure

Page 11: Peer2Peer

2nd Generation: Pure P2P

[Figure: peers; interactions labeled Connect, Query, Reply, Transfer]

e.g. Gnutella

• All peers are the same
• Queries/Pings are forwarded
• No global knowledge
• Very robust
• Performance problems
• Message flooding
• Collisions
• No hit guarantee

Page 12: Peer2Peer

2nd Generation: Hybrid P2P

[Figure: peers and super-peers; interactions labeled Connect, Query, Reply, Transfer]

e.g. FastTrack

• Different roles: regular peers and super-peers
• Structure uses hierarchy
• Super-peers use local knowledge for queries
• On miss, forward query

Page 13: Peer2Peer

3rd Generation: Distributed Hash Tables

[Figure: peers, key space, and data; interactions labeled Connect, Query, Reply, Transfer]

e.g. Chord, Pastry…

• All peers are the same
• No hierarchy
• Fair load balancing
• Nodes and objects are mapped into the same key space

Page 14: Peer2Peer


Napster - Development

∙ Developed by Shawn Fanning
  ◦ Born 1980 in Brockton, Massachusetts
  ◦ Started studying Computer Science in Boston in 1999
  ◦ Napster Inc. founded in 05/1999
  ◦ Roxio buys the remains of Napster in 2002 after bankruptcy
∙ Friends came up with the idea of Napster
∙ No equivalent software for direct transfer was available

Goal: Exchange music within a circle of friends

Page 15: Peer2Peer


Napster – Network Structure

∙ Star-like structure
∙ Central server
  ◦ Farm consists of ~200 servers
∙ Servers store indices
∙ Users connect to the server:
  ◦ Server-client communication while searching
  ◦ Client-client communication while transferring

Page 16: Peer2Peer


Napster - Tasks

∙ Portal for exchanging MP3 files
∙ Distinct roles for clients and server
∙ Server
  ◦ Indexes all .mp3 files in the network
  ◦ Relays communication between peers
∙ Clients
  ◦ Goal: Download music
  ◦ Upload list of shared files

Page 17: Peer2Peer

Napster – File Transfer

∙ MP3 search
  ◦ Client sends query to server
  ◦ Server searches database
  ◦ Server sends result set to client
∙ MP3 download
  1. Peer A sends query for song XY
  2. Server sends the address of peer B to A
  3. Peer A sends request to peer B
  4. Download commences
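The search and download steps can be sketched as a toy central index. This is only an illustration of the idea, not Napster's actual protocol: the class `IndexServer`, its method names, and the peer address strings are hypothetical and chosen for this example.

```python
from collections import defaultdict


class IndexServer:
    """Toy central index: maps a file name to the peers that share it."""

    def __init__(self):
        self._index = defaultdict(set)  # file name -> set of peer addresses

    def register(self, peer, shared_files):
        """A client uploads its list of shared files after connecting."""
        for name in shared_files:
            self._index[name].add(peer)

    def unregister(self, peer):
        """Remove a peer from every entry when it disconnects."""
        for peers in self._index.values():
            peers.discard(peer)

    def search(self, name):
        """Return addresses of peers offering the file (empty list on a miss)."""
        return sorted(self._index.get(name, ()))


server = IndexServer()
server.register("peer-b.example:6699", ["song_xy.mp3", "other.mp3"])
server.register("peer-c.example:6699", ["song_xy.mp3"])

# Peer A asks the server; the transfer itself would then go peer to peer.
sources = server.search("song_xy.mp3")
print(sources)
```

The server only answers the lookup. Every query depends on it, which is why it becomes both the bottleneck and the single point of failure.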

Page 18: Peer2Peer

Napster – File Transfer

∙ MP3 download behind a firewall
  1. Peer A sends query for song
  2. Server passes the connection data of peer A to peer B
  3. Peer B opens connection to peer A
  4. Peer A starts download from peer B

Page 19: Peer2Peer


Napster - Conclusion

∙ Pros:
  ◦ Up-to-date view of the network due to central database
  ◦ Support for MP3s only → low risk of downloading viruses
∙ Cons:
  ◦ Scalability: server farm is a bottleneck
  ◦ Server is a Single Point of Failure
  ◦ No security, file transfers not encrypted
  ◦ Censorship of database possible (using filters)
  ◦ Freerider problem
  ◦ No chunking possible, each file is downloaded from a single peer → high dependency on a single peer

Page 20: Peer2Peer


Napster - Summary

∙ Napster pioneered peer-to-peer systems
∙ Only limited use of P2P technology
  ◦ File download is P2P-based
  ◦ File search is client-server-based
∙ Protocol is closed-source; reverse engineering enabled the development of OpenNap
∙ Napster got sued (4/2000) and was finally shut down (7/2001)
∙ Napster now acts as a legal, commercial music provider

Page 21: Peer2Peer


Gnutella - Development

• Started after the prohibition of Napster (1999-2001)
• Justin Frankel (Nullsoft) publishes V0.4 in March 2000
• Parent company AOL stops distribution
• By then already downloaded thousands of times
• Reverse engineering revealed the protocol

Goal:
• Simple exchange of music in a company network
• No use of central components

Page 22: Peer2Peer


Gnutella – Properties

∙ Fully decentralized P2P network
∙ Allows for download of all file types
∙ No role allocation → pure P2P
∙ Each node is server and client → Servent
∙ Members are autonomous
∙ Robust network, mainly 3-4 open connections

Problem:
∙ Finding an entry point (bootstrapping)
  ◦ Host-cache server
  ◦ List of known hosts from former sessions

Page 23: Peer2Peer


Gnutella - Messages

∙ Message ID
  ◦ Distinct identifier for messages in the network
∙ Payload Descriptor
  ◦ Ping, Pong, Query, Query Hit, Push
∙ Time to Live (TTL)
  ◦ Hops to go until packet is dropped, common value: TTL=7
∙ Hops
  ◦ Hops the packet has already taken
∙ Payload Length
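How TTL and Hops interact when a message is passed on can be sketched as follows. The field names mirror the header fields listed above; the `Message` class and the `forward` function are hypothetical illustrations and do not reproduce the wire format. The sketch assumes a servent decrements TTL and increments Hops before forwarding, and drops the message once TTL would reach 0.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Message:
    message_id: bytes      # distinct identifier for the message
    descriptor: str        # "Ping", "Pong", "Query", "QueryHit" or "Push"
    ttl: int               # hops to go until the message is dropped
    hops: int              # hops the message has already taken
    payload: bytes = b""

    @property
    def payload_length(self) -> int:
        return len(self.payload)


def forward(msg: Message) -> Optional[Message]:
    """Return the copy to pass on to neighbors, or None if the TTL is used up."""
    if msg.ttl <= 1:
        return None  # decrementing would reach 0: drop instead of forwarding
    return replace(msg, ttl=msg.ttl - 1, hops=msg.hops + 1)


ping = Message(uuid.uuid4().bytes, "Ping", ttl=7, hops=0)
step = forward(ping)
print(step.ttl, step.hops)  # 6 1
```

Note that `ttl + hops` stays constant along the path, so a receiver can tell the initial TTL a sender chose.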

Page 24: Peer2Peer


Gnutella – Network Structure

∙ No central index → peers have to probe neighbors
∙ Peers regularly broadcast ping messages
∙ Peer receives pong along the same path the ping was sent; it contains
  ◦ Address information: IP, port, servent ID…
  ◦ Number and size of shared files
∙ Loss of a node may lead to network partitioning
∙ Ping frequency vs. up-to-dateness
  ◦ Ping size = 22 bytes
  ◦ 1000 peers / 3 connections each → ~64 MB/sec
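One way to arrive at a figure of this order of magnitude is the back-of-the-envelope calculation below. It rests on assumptions not stated on the slide: every peer originates one ping per second, and every ping is forwarded once by each of the 1000 peers over each of its 3 connections.

```python
PING_SIZE_BYTES = 22
PEERS = 1000
CONNECTIONS_PER_PEER = 3

# Assumption: each ping is flooded through the whole network, so it
# crosses PEERS * CONNECTIONS_PER_PEER links.
transmissions_per_ping = PEERS * CONNECTIONS_PER_PEER

# Assumption: every peer originates one ping per second.
bytes_per_second = PEERS * transmissions_per_ping * PING_SIZE_BYTES

print(bytes_per_second)         # 66000000
print(bytes_per_second / 1e6)   # 66.0 (decimal megabytes)
```

The result depends heavily on the assumed ping rate. The point is that maintenance traffic grows with the square of the number of peers under flooding.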

Page 25: Peer2Peer


Gnutella - Search

∙ Broadcast query to neighbors
∙ QueryHit contains servent ID, address, speed…
∙ Search runtime equals breadth-first search: O(|V|+|E|)
∙ Search is limited only by TTL
∙ Client sends request

GET /get/4356/foo.mp3 HTTP/1.0
User-Agent: Gnutella
Connection: Keep-Alive
Range: bytes=0-
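The TTL-limited broadcast search can be sketched as a breadth-first traversal of the overlay graph. This is a simplified model rather than the real protocol: the function `flood_query`, the adjacency-dictionary representation, and the peer names are assumptions made for the example.

```python
from collections import deque


def flood_query(graph, start, holders, ttl=7):
    """Breadth-first flooding: return peers within `ttl` hops that hold the file."""
    seen = {start}                       # peers that already saw this message ID
    hits = []
    queue = deque([(start, None, ttl)])  # (peer, sender, remaining TTL)
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue                     # TTL exhausted: not forwarded further
        for neighbor in graph[peer]:
            if neighbor == sender or neighbor in seen:
                continue                 # skip the sender and drop duplicates
            seen.add(neighbor)
            if neighbor in holders:
                hits.append(neighbor)
            queue.append((neighbor, peer, remaining - 1))
    return hits


# Hypothetical overlay: a chain A - B - C - D, only D shares the file.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(flood_query(overlay, "A", {"D"}, ttl=3))  # ['D']
print(flood_query(overlay, "A", {"D"}, ttl=2))  # []
```

The second call shows why there is no hit guarantee: the file exists in the network, but it lies beyond the TTL horizon of the querying peer.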

Page 26: Peer2Peer


Gnutella – File Transfer

∙ Download using HTTP
∙ Use of push messages to bypass firewalls
  ◦ Common query ends with timeout
  ◦ Client sends push message (ID + address) to server
  ◦ Server opens connection to client
∙ Download complicated if
  ◦ Both peers are behind a firewall
  ◦ IP masquerading is used

HTTP 200 OK
Server: Gnutella
Content-type: application/binary
Content-length: 3457827

Page 27: Peer2Peer


Gnutella – Problems

∙ Scalability
  ◦ Massive traffic for keeping the network up to date
∙ Reliability
  ◦ In dense networks packets drop after 3 hops
  ◦ Long paths reduce success rate
∙ Security
  ◦ No use of hash values
  ◦ Similar to DDoS attacks
∙ Privacy
  ◦ Packets not encrypted

Page 28: Peer2Peer


Gnutella - Conclusion

Pros:
∙ Very robust, connection init using TCP
∙ Autonomous peers
∙ Communication using UDP

Cons:
∙ No guaranteed hits
∙ Massive traffic and high latency
∙ Massive scalability problems

Page 29: Peer2Peer


Chord – DHT-based P2P

∙ First 3rd-generation P2P system
∙ Developed by Ion Stoica @ MIT in 2001
∙ Scientific approach motivated by the drawbacks of the 2nd generation
∙ Complete decentralization while
  ◦ Offering efficient and correct searches
  ◦ Providing good scalability
  ◦ Relying on a flat network structure (no hierarchy)
  ◦ Balancing load fairly across all nodes
∙ First use of distributed hash tables in P2P

Page 30: Peer2Peer


Chord – Use of DHTs

∙ Cryptographic hash function SHA-1
  ◦ 160 bits allow for addressing 2^160 peers and objects
  ◦ Collisions highly unlikely
  ◦ SHA-1 guarantees major variation even on minor changes
    • SHA-1(Franz) = b259d15d278969d8c6cc682bc5fb8c032a5a43de
    • SHA-1(Frank) = 0df02da8548eeef2174c97c2ade67b4c5adc3160
  ◦ Keys in key space are
    • Equally distributed
    • Unlikely to collide
    • Distinct
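Mapping a name into the 160-bit key space can be sketched with Python's standard `hashlib`. The helper name `chord_id` and the UTF-8 encoding of the input are assumptions made for this example. Digests depend on the exact input bytes, so the output may not match the values quoted above if those were computed from a different encoding.

```python
import hashlib

KEY_BITS = 160  # size of a SHA-1 digest in bits


def chord_id(name: str) -> int:
    """Hash a node address or object name to an integer in [0, 2**160)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


for name in ("Franz", "Frank"):
    # Two names that differ in a single letter land far apart in the key space.
    print(name, format(chord_id(name), "040x"))
```

Nodes and objects go through the same function, which is what places them in one shared key space.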

Page 31: Peer2Peer


Chord – Data Mapping

[Figure: nodes and data in the key space]

Example for a 4-bit space:
• f(x) = 3 * x mod 16
• f(47) = 3 * 47 mod 16 = 141 mod 16 = 13

Page 32: Peer2Peer


Chord – Search

Example
∙ 4 nodes, 5 objects
∙ Each node is responsible for all keys between itself and its predecessor

Search:
∙ Nodes are aware of both neighbors
∙ Query the direct neighbor
∙ Runtime: O(n)
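The neighbor-by-neighbor search can be sketched as a walk around the ring. The node IDs below are hypothetical, and the helper names `owns` and `linear_lookup` are invented for this example. The key space is the 4-bit space with f(x) = 3 * x mod 16 from the data-mapping example.

```python
KEY_SPACE = 16  # 4-bit key space


def owns(node, predecessor, key, size=KEY_SPACE):
    """True if key lies in the ring interval (predecessor, node]."""
    distance = (key - predecessor) % size or size
    span = (node - predecessor) % size or size
    return distance <= span


def linear_lookup(start, key, nodes, size=KEY_SPACE):
    """Walk the ring successor by successor until the responsible node is found."""
    ring = sorted(nodes)
    i = ring.index(start)
    hops = 0
    # ring[i - 1] is the predecessor; index -1 wraps around to the last node.
    while not owns(ring[i], ring[i - 1], key, size):
        i = (i + 1) % len(ring)
        hops += 1
    return ring[i], hops


nodes = [1, 5, 9, 13]        # hypothetical node IDs
key = 3 * 47 % KEY_SPACE     # f(47) = 13
print(linear_lookup(1, key, nodes))  # (13, 3)
```

In the worst case the query visits every node once, which is the O(n) runtime noted above.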

Page 33: Peer2Peer


Chord – Finger Tables

∙ Use of finger tables as shortcuts
  ◦ N nodes, m entries
  ◦ n = 2^m ⇔ m = log2 n
  ◦ finger[k] = first node on the circle that succeeds (n + 2^(k-1)) mod 2^m, 1 ≤ k ≤ m
  ◦ Successor = finger[1]
  ◦ Predecessor = previous node

// search the local table for the highest predecessor of id
n.closest_preceding_node(id)
    for i = m downto 1
        if (finger[i] ∈ (n, id))
            return finger[i];
    return n;
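A runnable version of this pseudocode might look like the sketch below. The explicit ring-interval helper `in_open_interval` is an addition needed to handle wrap-around at zero. The 6-bit key space and the finger list for node 22 are taken from the improved-search example of this lecture.

```python
def in_open_interval(x, a, b, size):
    """True if x lies strictly between a and b when walking clockwise from a."""
    span = (b - a) % size or size   # a == b means the whole ring except a
    return 0 < (x - a) % size < span


def closest_preceding_node(n, fingers, key, size):
    """Scan the finger table from the farthest entry down to the nearest."""
    for finger in reversed(fingers):
        if in_open_interval(finger, n, key, size):
            return finger
    return n  # no finger lies between n and key


SIZE = 2 ** 6                        # m = 6
fingers_of_22 = [26, 26, 26, 30, 39, 55]

print(closest_preceding_node(22, fingers_of_22, 38, SIZE))  # 30
print(closest_preceding_node(22, fingers_of_22, 24, SIZE))  # 22
```

Scanning from the farthest finger first is what lets each hop skip a large part of the remaining distance around the ring.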

Page 34: Peer2Peer


Chord – Improved Search

∙ Example: finger table for node 22

  i   Address        Node
  1   22+2^0 = 23    26
  2   22+2^1 = 24    26
  3   22+2^2 = 26    26
  4   22+2^3 = 30    30
  5   22+2^4 = 38    39
  6   22+2^5 = 54    55

∙ 22 searches 38
  ◦ Query is sent to the known node closest below 38 → 30
  ◦ 30 sends query to its successor asking for responsibility → yes, found data
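The finger table for node 22 can be reproduced with a few lines of code. The node set used here is an assumption: it contains only the nodes that appear in the table, and the real example ring may hold more. The function names are hypothetical.

```python
def successor(key, nodes, size):
    """First node at or after key on the ring, wrapping around at size."""
    key %= size
    for node in sorted(nodes):
        if node >= key:
            return node
    return min(nodes)


def finger_table(n, nodes, m):
    """Entries (start, node) with start = (n + 2**(k-1)) mod 2**m for k = 1..m."""
    size = 2 ** m
    starts = [(n + 2 ** (k - 1)) % size for k in range(1, m + 1)]
    return [(start, successor(start, nodes, size)) for start in starts]


nodes = [22, 26, 30, 39, 55]   # assumed ring, taken from the table entries
for start, node in finger_table(22, nodes, m=6):
    print(start, node)
# 23 26
# 24 26
# 26 26
# 30 30
# 38 39
# 54 55
```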

Page 35: Peer2Peer


Chord – Adding new nodes

∙ New node q uses the hash function to generate its ID := 55
∙ Search for this ID delivers successor(55) := 56
∙ Correction steps:
  ◦ Predecessor of 56 (46) becomes predecessor of 55
  ◦ 55 becomes predecessor of 56
  ◦ 55 becomes successor of 46
  ◦ Copy finger table from 46 and update all entries
  ◦ All fingers of 46 have to check their finger tables too
  ◦ Move data to the new node if necessary
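The pointer corrections during a join can be sketched as follows. This covers only the predecessor and successor updates. Finger-table repair and data movement are left out, and the ring contents other than 46, 55 and 56 are hypothetical.

```python
def build_ring(node_ids):
    """Create predecessor/successor pointers for an existing set of nodes."""
    ids = sorted(node_ids)
    return {
        n: {"pred": ids[i - 1], "succ": ids[(i + 1) % len(ids)]}
        for i, n in enumerate(ids)
    }


def find_successor(ring, key):
    """First existing node at or after key, wrapping around the ring."""
    candidates = [n for n in ring if n >= key]
    return min(candidates) if candidates else min(ring)


def join(ring, new_id):
    """Insert new_id between its successor and that node's old predecessor."""
    succ = find_successor(ring, new_id)
    pred = ring[succ]["pred"]            # e.g. 46 for succ = 56
    ring[new_id] = {"pred": pred, "succ": succ}
    ring[succ]["pred"] = new_id          # 55 becomes predecessor of 56
    ring[pred]["succ"] = new_id          # 55 becomes successor of 46


ring = build_ring([22, 39, 46, 56])      # 22 and 39 are assumed members
join(ring, 55)
print(ring[55], ring[46]["succ"], ring[56]["pred"])
```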

Page 36: Peer2Peer


Chord – Node leaves network

∙ Successor of 22 does not respond any more
∙ Search next living finger (39)
∙ Go backwards to the last functioning node
∙ Last node becomes new successor of 22

Page 37: Peer2Peer


Chord – Node leaves network

Problem: several nodes leave concurrently
  ◦ Successor(22) = ?
  ◦ Going backwards ends in 39 as 37 is not reachable
  ◦ Successor(22) = 39 though L is alive
  ◦ Data from L not accessible

Solution: successor list
  ◦ Nodes store r = O(log n) successors
  ◦ 22 knows of L and integrates it
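The successor-list idea can be sketched in a few lines. The node IDs below are hypothetical: the ID of node L is not given in the text, so 30 stands in for it here, and the liveness check is modeled as a simple set lookup.

```python
def next_alive_successor(successor_list, is_alive):
    """Return the first reachable entry of the successor list, or None."""
    for candidate in successor_list:
        if is_alive(candidate):
            return candidate
    return None


# Hypothetical state of node 22: its direct successor 26 and node 37 have
# left, while 30 (standing in for L) and 39 are still alive.
successor_list_of_22 = [26, 30, 37]
alive = {22, 30, 39}

print(next_alive_successor(successor_list_of_22, alive.__contains__))  # 30
```

Without the list, node 22 would fall back to its next living finger (39) and skip the live node in between.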

Page 38: Peer2Peer


Chord - Conclusion

Pros:
∙ Fully decentralized architecture
∙ Equality among nodes, no role allocation
∙ Improved scalability
∙ Efficient and correct search methods: O(log n)

Cons:
∙ High effort to keep fingers and neighbors up to date
∙ Join and leave operations are costly
∙ No support for security, anonymity or firewalled users

Page 39: Peer2Peer


Summary

∙ P2P has shown new ways of exchanging data
  ◦ Fairness regarding disk space, bandwidth…
  ◦ Scalability allows for huge numbers of users
  ◦ Improved robustness due to decentralization
∙ Still, P2P is mainly found in prototypes
  ◦ 3rd-generation applications in particular exist only in scientific settings
  ◦ Popular applications (eDonkey…) use 2nd-generation protocols
∙ Future work
  ◦ Use of P2P technology in Vista (PNRP for distributed DNS service)
  ◦ OceanStore works on distributed data archives
  ◦ Applications you build on your own…