The BitTorrent content distribution system CS217 Advanced Topics in Internet Research Guest Lecture Nikitas Liogkas, 5/11/2006.
Motivation
Flash crowd (aka Slashdot) effect: many clients, few servers
Problem: servers cannot handle the load
Solution: swarming, where clients download pieces of the file from each other; swarming has been proven to have good scaling and performance properties
Presentation outline
• Joining the system
• Encoding / metadata file
• Tracker protocol
• Peer wire protocol
• Piece selection
• Peer selection
• Client implementations
• Resources
Joining a torrent
Peers are divided into:
• seeds: have the entire file
• leechers: still downloading
[Figure: a new leecher, a website, the tracker, and a seed/leecher, with the steps (1) metadata file, (2) join, (3) peer list, (4) data request]
1. Obtain the metadata file (out of band)
2. Contact the tracker
3. Obtain a peer list (contains seeds and leechers)
4. Contact peers from that list for data
Exchanging data
[Figure: leecher A exchanging data and “I have” messages with a seed, leecher B, and leecher C]
● verify pieces using hashes
● download sub-pieces (blocks) in parallel
● advertise received pieces to the entire peer list
● interested: need pieces that a given peer has
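A minimal Python sketch of two checks from this slide, assuming pieces are identified by integer index and a peer's advertised pieces (from its bitfield and have messages) are kept as a set. The function names are hypothetical; hashlib.sha1 is the standard-library SHA-1.

```python
import hashlib


def verify_piece(data: bytes, expected_sha1: bytes) -> bool:
    """Accept a downloaded piece only if its SHA-1 digest matches the metadata file."""
    return hashlib.sha1(data).digest() == expected_sha1


def is_interested(my_pieces: set[int], peer_pieces: set[int]) -> bool:
    """We are interested in a peer if it advertises at least one piece we lack."""
    return bool(peer_pieces - my_pieces)


# Usage: a corrupted piece fails verification and would have to be re-requested.
good = b"piece data"
digest = hashlib.sha1(good).digest()
print(verify_piece(good, digest), verify_piece(b"piece dat4", digest))  # True False
print(is_interested({0, 1}, {1, 2}))  # True: the peer has piece 2
```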
Bencoding
Encoding format of the metadata file and of tracker responses (peer wire messages use a binary format); four types:
• byte strings
• integers
• lists
• dictionaries (mapping keys to values)
Examples: 4:spam represents the string “spam”; i10e represents the integer 10
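A minimal encoder/decoder for the four types, written for illustration (bencode/bdecode are hypothetical names, not a particular library's API). Decoded strings stay raw bytes because fields such as the piece hashes are binary, and dictionary keys are emitted in sorted order, as the format requires.

```python
def bencode(value) -> bytes:
    """Encode ints, byte/text strings, lists and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted (raw byte) order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def _decode(data: bytes, i: int):
    """Decode one value starting at offset i; return (value, next offset)."""
    tag = data[i:i + 1]
    if tag == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if tag in (b"l", b"d"):
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            if i >= len(data):
                raise ValueError("unterminated list or dictionary")
            item, i = _decode(data, i)
            items.append(item)
        if tag == b"l":
            return items, i + 1
        if len(items) % 2:
            raise ValueError("dictionary key without a value")
        return dict(zip(items[0::2], items[1::2])), i + 1
    if tag.isdigit():
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("truncated byte string")
        return data[start:end], end
    raise ValueError(f"invalid bencoding at offset {i}")


def bdecode(data: bytes):
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


print(bencode("spam"), bencode(10))  # b'4:spam' b'i10e'
print(bdecode(b"d3:cow3:moo4:spaml1:a1:bee"))  # {b'cow': b'moo', b'spam': [b'a', b'b']}
```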
Metadata file structure
Contains the information necessary to contact the tracker and describes the files in the torrent:
• announce URL of the tracker
• file name
• file length
• piece length (typically 256KB)
• SHA-1 hashes of pieces for verification
• also creation date, comment, creator, …
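The piece hashes are stored as one binary string of concatenated 20-byte SHA-1 digests. The sketch below builds a single-file info dictionary with the key names used by the BitTorrent specification ("name", "length", "piece length", "pieces"); the helper names are hypothetical, and the whole file is assumed to fit in memory.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # "typically 256KB", per the slide


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> bytes:
    """Concatenate the 20-byte SHA-1 digest of each piece; the last piece may be shorter."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def make_info(name: str, data: bytes, piece_length: int = PIECE_LENGTH) -> dict:
    """Single-file 'info' dictionary using the specification's key names."""
    return {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": piece_hashes(data, piece_length),
    }


def expected_hash(info: dict, index: int) -> bytes:
    """The digest that downloaded piece `index` must match."""
    return info["pieces"][20 * index:20 * (index + 1)]


info = make_info("example.bin", bytes(600 * 1024))
print(len(info["pieces"]) // 20)  # 3 pieces for a 600 KB file
```

In a complete metadata file this dictionary sits under the "info" key next to "announce", and the info_hash sent to the tracker is the SHA-1 of its bencoded form.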
Tracker protocol
Communicates with clients via HTTP/HTTPS
Client GET request:
• info_hash: uniquely identifies the file
• peer_id: chosen by and uniquely identifies the client
• client IP and port
• numwant: how many peers to return (defaults to 50)
• stats: bytes uploaded, downloaded, left
Tracker GET response:
• interval: how often to contact the tracker
• list of peers, containing peer id, IP and port
• stats: complete, incomplete
Tracker-less mode, based on the Kademlia DHT
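A sketch of building the client's GET request with Python's urllib.parse. The parameter names follow the specification (info_hash, peer_id, port, uploaded, downloaded, left, numwant); the function name, the peer_id value and the tracker URL are hypothetical. urlencode is given quote_via=quote so the raw 20-byte info_hash is percent-encoded rather than treated as text.

```python
from urllib.parse import quote, urlencode


def announce_url(announce: str, info_hash: bytes, peer_id: bytes, port: int,
                 uploaded: int, downloaded: int, left: int, numwant: int = 50) -> str:
    """Append the tracker query string to the announce URL from the metadata file."""
    params = {
        "info_hash": info_hash,   # raw 20-byte SHA-1, percent-encoded
        "peer_id": peer_id,       # 20 bytes chosen by the client
        "port": port,             # port on which this client accepts peer connections
        "uploaded": uploaded,     # stats, in bytes
        "downloaded": downloaded,
        "left": left,
        "numwant": numwant,       # number of peers requested
    }
    sep = "&" if "?" in announce else "?"
    return announce + sep + urlencode(params, quote_via=quote)


print(announce_url("http://tracker.example/announce", bytes(20),
                   b"-XX0001-123456789012", 6881, 0, 0, 1_000_000))
```

The tracker answers with a bencoded dictionary carrying interval and the peer list.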
Peer wire protocol
Implemented directly on top of TCP
Messages:
• handshake (possibly followed by a bitfield)
• keep-alive
• choke / unchoke
• interested / not interested
• have (advertisement of a newly acquired piece)
• request / piece
• cancel (only used in “endgame mode”)
• port (used in tracker-less mode)
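Apart from the handshake, every message is a 4-byte big-endian length prefix, a 1-byte message ID, and a payload. The sketch below encodes several of the messages listed above with the standard struct module; the message IDs follow the specification, while the helper names are hypothetical.

```python
import struct

PSTR = b"BitTorrent protocol"
# Message IDs as assigned in the BitTorrent specification.
(CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED, HAVE,
 BITFIELD, REQUEST, PIECE, CANCEL, PORT) = range(10)


def handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """<pstrlen><pstr><8 reserved bytes><info_hash><peer_id>: 68 bytes in total."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def keep_alive() -> bytes:
    """A keep-alive is a bare length prefix of zero."""
    return struct.pack(">I", 0)


def message(msg_id: int, payload: bytes = b"") -> bytes:
    """<4-byte big-endian length><1-byte message id><payload>."""
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def have(index: int) -> bytes:
    return message(HAVE, struct.pack(">I", index))


def request(index: int, begin: int, length: int) -> bytes:
    """Ask for one sub-piece (block): piece index, byte offset, block length."""
    return message(REQUEST, struct.pack(">III", index, begin, length))


def cancel(index: int, begin: int, length: int) -> bytes:
    """Same layout as request; used in endgame mode."""
    return message(CANCEL, struct.pack(">III", index, begin, length))
```

For instance, request(0, 0, 16384), a request for the first 16 KiB block of piece 0, is 17 bytes on the wire: a length of 13, ID 6, and three 4-byte integers.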
Piece selection
When downloading starts: choose at random
• get complete pieces as quickly as possible
• obtain something to offer to others
After we have 4 pieces: pick the (local) rarest first
• achieves the fastest replication of rare pieces
• obtain something of value
• only get unique pieces from the seed
Endgame mode
• defense against the “last-block problem”
• send requests for missing sub-pieces to all peers in our peer list
• send cancel messages upon receipt of a sub-piece
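A sketch of the three policies on this slide, assuming each peer's pieces are known as a set of indices and a sub-piece (block) is identified by a (piece index, byte offset) pair. All names are hypothetical, and breaking ties among equally rare pieces at random is an assumption. “Local” rarest first means availability is counted only over the peers in our own peer list.

```python
import random
from collections import Counter
from typing import Optional

RANDOM_FIRST_PIECES = 4  # switch to rarest first "after we have 4 pieces"


def select_piece(have: set[int], peer_has: set[int],
                 neighbour_bitfields: list[set[int]],
                 rng: random.Random) -> Optional[int]:
    """Pick the next piece to request from one peer."""
    candidates = peer_has - have
    if not candidates:
        return None
    if len(have) < RANDOM_FIRST_PIECES:
        # Random first: complete a few pieces quickly to have something to offer.
        return rng.choice(sorted(candidates))
    # Local rarest first: count availability over our own peer list only.
    availability = Counter()
    for bitfield in neighbour_bitfields:
        availability.update(bitfield)
    rarest = min(availability[p] for p in candidates)
    return rng.choice(sorted(p for p in candidates if availability[p] == rarest))


def endgame_requests(missing_blocks: set[tuple[int, int]],
                     peers: dict[str, set[int]]) -> dict[str, list[tuple[int, int]]]:
    """Endgame: request every missing (piece, offset) block from every peer holding that piece."""
    return {name: sorted(b for b in missing_blocks if b[0] in pieces)
            for name, pieces in peers.items()}


def on_block_received(block: tuple[int, int],
                      outstanding: dict[tuple[int, int], set[str]],
                      sender: str) -> list[str]:
    """Forget the block and return the peers that should now get a cancel message."""
    requested_from = outstanding.pop(block, set())
    return sorted(requested_from - {sender})
```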
Last-block problem
At the end of the download, a peer may have trouble finding the few missing pieces
• based on anecdotal evidence
Other proposals:
• network coding [Gkantsidis et al., Infocom’05]
• prefer to upload to peers with similar file completeness; unfair to the peers that have most of the pieces [Tian et al., Infocom’06]
Last-block problem – a myth?
Is it a problem after all?
[Figure from [Legout et al., INRIA-TR-2006], reproduced with permission]
Peer selection - unchoking
[Figure: leecher A with a seed, leecher B, and leecher C]
• periodically (typically every 10 seconds) calculate data-receiving rates
• upload to (unchoke) the fastest
• constant number of unchoking slots
• based on the “tit-for-tat” strategy
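A sketch of one regular unchoking round under tit-for-tat: measure how fast each interested peer has recently sent us data, then unchoke the top few. The 10-second period comes from the slide; the default of 4 slots and the function names are assumptions for illustration.

```python
def rates_from_bytes(bytes_received: dict[str, int], period_s: float = 10.0) -> dict[str, float]:
    """Data-receiving rate per peer (bytes/s) over the last unchoking period."""
    return {peer: n / period_s for peer, n in bytes_received.items()}


def choose_unchoked(download_rates: dict[str, float], interested: set[str],
                    slots: int = 4) -> set[str]:
    """Unchoke the interested peers that upload to us fastest (slots is an assumed constant)."""
    ranked = sorted(interested, key=lambda p: download_rates.get(p, 0.0), reverse=True)
    return set(ranked[:slots])


received = {"A": 500_000, "B": 100_000, "C": 300_000, "D": 50_000, "E": 400_000}
print(sorted(choose_unchoked(rates_from_bytes(received), set(received), slots=3)))  # ['A', 'C', 'E']
```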
Optimistic unchoking
Periodically select a peer at random and upload to it
• typically every 3 unchoking rounds (30 seconds)
Multi-purpose mechanism:
• allows bootstrapping of new clients
• continuously looks for the fastest partners
• robustness: every peer has a non-zero chance of interacting with any other peer
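The optimistic slot can be sketched as a small state machine that draws a new random choked, interested peer every third round. The class name is hypothetical, and re-drawing early when the current pick leaves the choked set (for example because it earned a regular slot) is an assumption, not something the slide states.

```python
import random
from typing import Optional


class OptimisticUnchoker:
    """Keeps one optimistic unchoke slot, re-drawn every `period` unchoking rounds."""

    def __init__(self, period: int = 3, rng: Optional[random.Random] = None):
        self.period = period  # 3 rounds x ~10 s = ~30 s, per the slide
        self.rng = rng if rng is not None else random.Random()
        self.round = 0
        self.current: Optional[str] = None

    def next_round(self, choked_interested: set[str]) -> Optional[str]:
        """Call once per unchoking round with the currently choked, interested peers."""
        if self.round % self.period == 0 or self.current not in choked_interested:
            candidates = sorted(choked_interested)
            self.current = self.rng.choice(candidates) if candidates else None
        self.round += 1
        return self.current
```

Because the pick is uniform over all choked peers, any peer can eventually be chosen, which is the robustness property noted above.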
Seed unchoking
Old algorithm: unchoke the fastest leechers
• problem: the fastest peers may monopolize seeds
New algorithm: periodically sort all leechers according to their last unchoke time
• prefer the most recently unchoked leechers; on a tie, prefer the fastest
• (presumably) achieves an equal spread of seed bandwidth
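A literal reading of the new seed algorithm as described on this slide: sort leechers by last unchoke time, most recent first, and break ties by upload rate. This is a simplified sketch with hypothetical names and an assumed 4 slots, not the mainline client's actual code.

```python
from dataclasses import dataclass


@dataclass
class Leecher:
    name: str
    last_unchoked: float  # time of this leecher's last unchoke (0.0 = never)
    upload_rate: float    # bytes/s the seed currently uploads to it


def seed_unchoke(leechers: list[Leecher], slots: int = 4) -> list[str]:
    """Most recently unchoked first; on a tie, the fastest first."""
    ranked = sorted(leechers, key=lambda peer: (peer.last_unchoked, peer.upload_rate),
                    reverse=True)
    return [peer.name for peer in ranked[:slots]]
```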
Downloading only from seeds
[Figure: leecher A sending new peer-list requests to the tracker; a seed, leecher B, and leecher C]
● repeatedly query the tracker for peer lists
● distinguish the seeds, and receive data from them
● violates fairness model; may be harmful to honest peers
Rate- vs. volume-based selection
Proponents of rate-based decisions: [Cohen, P2PECON’03] and [INRIA TR’2006]
Proponents of volume-based decisions: [Bharambe et al., MSR-TR-2005], [Gkantsidis et al., Infocom’05], [Jun et al., P2PECON’05], and the eDonkey file-sharing system
No clear winner yet!
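The two camps differ in what peers are ranked by. A toy comparison with hypothetical peer records shows how the choice changes who gets unchoked: a long-lived peer that has sent a lot but has slowed down wins under volume, while a newcomer sending fast right now wins under rate.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    name: str
    recent_rate: float  # bytes/s received from this peer in the last round
    total_bytes: int    # bytes received from this peer since the connection opened


def rank_by_rate(peers: list[PeerStats]) -> list[str]:
    """Rate-based: who is fastest right now."""
    return [p.name for p in sorted(peers, key=lambda p: p.recent_rate, reverse=True)]


def rank_by_volume(peers: list[PeerStats]) -> list[str]:
    """Volume-based: who has given us the most data overall."""
    return [p.name for p in sorted(peers, key=lambda p: p.total_bytes, reverse=True)]


peers = [PeerStats("old", 10_000.0, 5_000_000), PeerStats("new", 80_000.0, 400_000)]
print(rank_by_rate(peers), rank_by_volume(peers))  # ['new', 'old'] ['old', 'new']
```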
Client implementations
mainline: written in Python; right now, the only one employing the new seed unchoking algorithm
Azureus: the most popular, written in Java; implements a special protocol between clients (e.g., peers can exchange peer lists)
Other popular clients: ABC, BitComet, BitLord, BitTornado, μTorrent, Opera browser
Various non-standard extensions:
• retaliation mode: detect compromised/malicious peers
• anti-snubbing: ignore a peer who ignores us
• super seeding: seed masquerading as a leecher
Resources #1
Basic BitTorrent mechanisms [Cohen, P2PECON’03]
BitTorrent specification wiki: http://wiki.theory.org/BitTorrentSpecification
Measurement studies: [Izal et al., PAM’04], [Pouwelse et al., Delft TR 2004 and IPTPS’05], [Guo et al., IMC’05], and [Legout et al., INRIA-TR-2006]
Resources #2
Theoretical analysis and modeling: [Qiu et al., SIGCOMM’04] and [Tian et al., Infocom’06]
Simulations: [Bharambe et al., MSR-TR-2005]
Sharing incentives and exploiting them: [Shneidman et al., PINS’04], [Jun et al., P2PECON’05], and [Liogkas et al., IPTPS’06]
Conclusion and food for thought
BitTorrent is fast and robust
Yet, many parameters are set arbitrarily:
• number of unchoking slots
• unchoking round duration
• size of pieces / sub-pieces
What can we learn from BitTorrent for the design of future P2P content distribution protocols?