Peer to Peer File Sharing: A Survey Ismail Guvenc and Juan Jose Urdaneta.

Dec 14, 2015

Page 1:

Peer to Peer File Sharing: A Survey

Ismail Guvenc and Juan Jose Urdaneta

Page 2:

Outline

Peer-to-Peer Concept
Overview of P2P systems:
– Napster, Gnutella, Freenet, Free Haven, OceanStore, PAST, Farsite, Publius, CFS, Tapestry, Pastry, Chord, CAN and others
Comparison of the systems
Conclusion

Page 3:

What is Peer-to-Peer?

Every node is designed to provide some service that helps other nodes in the network obtain service (although a user may choose not to provide it).

Each node potentially has the same responsibility.

Sharing can take different forms:
– CPU cycles: SETI@Home
– Storage space: Napster, Gnutella, Freenet…

Page 4:

P2P: Why so attractive?

Peer-to-peer applications have grown explosively in recent years, driven by:
– Low cost and high availability of large numbers of computing and storage resources
– Increased network connectivity
» As long as these factors keep their importance, peer-to-peer applications will continue to gain importance.

Page 5:

Main Design Goals of P2P systems

Ability to operate in a dynamic environment
Performance and scalability
Reliability
Anonymity: Freenet, Free Haven, Publius
Accountability: Free Haven, Farsite

Page 6:

First generation P2P routing and location schemes

Napster, Gnutella, Freenet…
Intended for large-scale sharing of data files
Reliable content location was not guaranteed
Self-organization and scalability: two issues to be addressed

Page 7:

Second generation P2P systems

Pastry, Tapestry, Chord, CAN…
They guarantee a definite answer to a query in a bounded number of network hops.
They form a self-organizing overlay network.
They provide a load-balanced, fault-tolerant distributed hash table, in which items can be inserted and looked up in a bounded number of forwarding hops.

Page 8:

Napster

Application-level, client-server protocol over point-to-point TCP; centralized system

Retrieval: four steps
– Connect to the Napster server.
– Upload your list of files (push) to the server.
– Give the server keywords to search the full list with.
– Select the “best” of the correct answers (pings).

Centralized server: single logical point of failure; can load balance among servers using DNS rotation; potential for congestion; Napster “in control” (freedom is an illusion)

No security: passwords in plain text, no authentication, no anonymity
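
The retrieval steps above can be sketched as a toy in-memory index. This is only an illustration of the centralized-lookup idea, not Napster's actual protocol: the class name `CentralIndex`, the function `pick_best`, and the `ping_ms` latency table are all hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for the central server: it stores file lists, not files."""

    def __init__(self):
        self._files = defaultdict(set)  # peer address -> set of file names

    def upload_file_list(self, peer, filenames):
        # A peer pushes its list of shared files to the server.
        self._files[peer] = set(filenames)

    def search(self, keyword):
        # The server matches the keyword against the full list.
        kw = keyword.lower()
        return sorted(
            (peer, name)
            for peer, names in self._files.items()
            for name in names
            if kw in name.lower()
        )


def pick_best(hits, ping_ms):
    # The client pings the candidate peers and keeps the fastest one.
    # ping_ms is an assumed table of measured round-trip times.
    return min(hits, key=lambda hit: ping_ms[hit[0]])


index = CentralIndex()
index.upload_file_list("peer-a", ["blue_moon.mp3", "notes.txt"])
index.upload_file_list("peer-b", ["Blue_Moon.mp3"])
hits = index.search("blue")
best = pick_best(hits, {"peer-a": 120, "peer-b": 40})
```

The file itself would then be fetched directly from the chosen peer; only the lookup goes through the server, which is why the server is the single point of failure.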

Page 9:

Napster: How it works? (1)

[Figure: users and napster.com]

1. File list is uploaded.

Page 10:

Napster: How it works? (2)

[Figure: user and napster.com; request and results]

2. User requests search at server.

Page 11:

Napster: How it works? (3)

[Figure: user and napster.com; pings]

3. User pings hosts that apparently have data. Looks for best transfer rate.

Page 12:

Napster: How it works? (4)

[Figure: user and napster.com; retrieves file]

4. User retrieves file.

Page 13:

Gnutella

Peer-to-peer networking: applications connect to peer applications
Focus: decentralized method of searching for files
Each application instance serves to:
– store selected files
– route queries (file searches) from and to its neighboring peers
– respond to queries (serve file) if file stored locally

How it works: searching by flooding
– If you don’t have the file you want, query 7 of your partners.
– If they don’t have it, they contact 7 of their partners, for a maximum hop count of 10.
– Requests are flooded, but there is no tree structure.
– No looping, but packets may be received twice.
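
A minimal sketch of flooding search under the numbers quoted above (fan-out of 7, hop limit of 10). The adjacency dictionary `graph`, the `files` table and the function name `flood_search` are assumptions made for illustration; real Gnutella messages carry identifiers and TTL fields that are not modeled here.

```python
def flood_search(graph, files, start, wanted, fanout=7, max_hops=10):
    """Flood a query outward from `start`; return (peers holding the file, messages sent)."""
    seen = {start}        # peers that already handled this query
    frontier = [start]    # peers that forward the query in the current hop
    hits = []
    messages = 0
    for _ in range(max_hops):
        next_frontier = []
        for peer in frontier:
            for neighbor in graph.get(peer, [])[:fanout]:
                messages += 1
                if neighbor in seen:
                    continue  # received twice: the duplicate is dropped, so no looping
                seen.add(neighbor)
                if wanted in files.get(neighbor, ()):
                    hits.append(neighbor)            # this peer can serve the file
                else:
                    next_frontier.append(neighbor)   # it passes the query on
        frontier = next_frontier
        if not frontier:
            break
    return hits, messages


graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
files = {"e": {"song.mp3"}}
result = flood_search(graph, files, "a", "song.mp3")
```

The message counter makes the cost visible: several of the messages in this small graph reach peers that have already seen the query, which is the inefficiency the comparison slide attributes to flooding.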

Note: Play gnutella animation at:

http://www.limewire.com/index.jsp/p2p

Page 14:

Freenet (discussed in class before)

Completely anonymous, for producers or consumers of information
Resistance to attempts by third parties to deny access to information
Goals:
– Anonymity for producers and consumers
– Deniability for information storers
– Resistance to denial attacks
– Efficient storing and routing
– Does NOT provide
» Permanent file storage
» Load balancing
» Anonymity for general network usage

Page 15:

Free Haven

Anonymous
Resists attempts by powerful adversaries to find or destroy data
Goals:
– Anonymity: publishers, readers, servers
– Persistence: lifetime determined by publisher
– Flexibility: add/remove nodes
– Accountability: reputation
A server gives up space => gets space on other servers

Page 16:

Free Haven: Publication

Split doc into n shares, k of which can rebuild the file (k < n)
– Large k => brittle file
– Small k => larger share, more duplication
Generate (SK_doc, PK_doc) and encrypt each share with SK_doc
Store on a server:
– Encrypted share, timestamp, expiration date, hash(PK_doc)
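
The trade-off between large and small k can be made concrete with a little arithmetic. The sketch below assumes a dispersal scheme in which each share is roughly 1/k of the file, so total storage is about n/k times the file size; that assumption and the function name `dispersal_cost` are illustrative, not taken from the slides.

```python
def dispersal_cost(file_size, n, k):
    """Rough cost of splitting a file into n shares, any k of which rebuild it."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    share_size = -(-file_size // k)  # ceiling division: each share holds about 1/k of the file
    return {
        "share_size": share_size,
        "total_stored": share_size * n,  # duplication grows as k shrinks
        "tolerated_losses": n - k,       # shares that can vanish before the file is lost
    }


small_k = dispersal_cost(1200, n=10, k=3)
large_k = dispersal_cost(1200, n=10, k=8)
```

With these assumed numbers, k = 3 stores 4000 bytes in total and survives the loss of 7 shares, while k = 8 stores only 1500 bytes but survives the loss of just 2, which is the sense in which a large k makes the file brittle.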

Page 17:

Free Haven: Retrieval

Documents are indexed by hash(PK_doc)
Reader generates (PK_client, SK_client) and a one-time remailer reply block
Reader broadcasts hash(PK_doc), PK_client, and the remailer block
– To all servers it knows about
– Broadcasts may be queued and bulk sent
Servers holding shares with hash(PK_doc)
– Encode the share with PK_client
– Send it using the remailer block

Page 18:

Free Haven: Share expiration

Absolute date
“Price” of a file: size x lifetime
Freenet and MojoNation favor popular documents
Unsolved problems:
– Large corrupt servers, list of “discouraged” documents, DoS
Not ready for wide deployment:
– Inefficient communication => few users => weak anonymity

Page 19:

PAST & Pastry

Pastry:
– Completely decentralized, scalable, and self-organizing; it automatically adapts to the arrival, departure and failure of nodes.
– Seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops.
– In a Pastry network,
» Each node has a unique id, nodeId.
» Presented with a message and a key, a Pastry node efficiently routes the message to the node whose nodeId is numerically closest to the key.

Page 20:

Pastry: NodeId

Leaf set: stores numerically closest nodeIds.
Routing table: common prefix with 10233102 - next digit - rest of nodeId
Neighborhood set: stores closest nodes according to proximity metric

Page 21:

Pastry: Routing

Given a message, check:
– If the key falls within the range of nodeIds covered by the leaf set, forward the message directly to the leaf-set node numerically closest to the key.
– If not, use the routing table to forward the message to a node that shares a longer common prefix with the key.
– If the routing table entry is empty or the node cannot be reached, forward to a node that is numerically closer to the key and also shares a prefix with the key.

Performance:
– If key is within the leaf set: O(1)
– If key goes to the routing table: O(log N)
– Worst case: O(N) (under failures)
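
The three checks above can be sketched as a single next-hop decision. This is a simplified illustration, not the Pastry implementation: nodeIds are assumed to be base-4 digit strings like the 10233102 shown earlier, the routing table is modeled as a hypothetical dictionary keyed by (shared prefix length, next digit), and wrap-around of the id space is ignored.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(node_id, key, leaf_set, routing_table, base=4):
    """Return the nodeId to forward to (node_id itself if this node is the destination)."""
    def dist(n):
        return abs(int(n, base) - int(key, base))

    # 1. Key inside the range covered by the leaf set: deliver to the closest node.
    candidates = leaf_set + [node_id]
    values = [int(n, base) for n in candidates]
    if min(values) <= int(key, base) <= max(values):
        return min(candidates, key=dist)

    # 2. Routing table: look for a node sharing one more digit with the key.
    p = shared_prefix_len(node_id, key)
    entry = routing_table.get((p, key[p]))
    if entry is not None:
        return entry

    # 3. Fallback: any known node that is numerically closer to the key
    #    and shares at least as long a prefix with it.
    known = leaf_set + list(routing_table.values())
    closer = [n for n in known
              if shared_prefix_len(n, key) >= p and dist(n) < dist(node_id)]
    return min(closer, key=dist) if closer else node_id


me = "10233102"
leaves = ["10233021", "10233033", "10233120", "10233122"]
table = {(0, "2"): "22301203", (2, "3"): "10323302"}
```

For example, `next_hop(me, "10233100", leaves, table)` resolves inside the leaf set, `next_hop(me, "22000000", leaves, table)` uses row 0 of the table, and removing the `(2, "3")` entry forces the fallback case for key `"10300000"`.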

Page 22:

PAST

PAST: an archival, cooperative file storage and distribution facility.
Uses Pastry as its routing scheme
Offers persistent storage services for replicated read-only files
Owners can insert or reclaim files, but clients can only look them up
A collection of PAST nodes forms an overlay network. A PAST node is at least an access point, but it can optionally also contribute storage and participate in routing.
Security: each node and each user in the system holds a smartcard with an associated private/public key pair.
Three operations: insert, lookup and reclaim

Page 23:

Farsite

Farsite is a symbiotic, serverless, distributed file system.
Symbiotic: it works among cooperating but not completely trusting clients.
Main design goals:
– To provide high availability and reliability for file storage.
– To provide security and resistance to Byzantine threats.
– To have the system automatically configure and tune itself adaptively.
Farsite first encrypts the contents of the files. This prevents an unauthorized user from reading the file.
Because of the encryption, such a user cannot read a file even if it is stored on his own desktop computer.
Digital signatures are used to prevent an unauthorized user from writing a file.
After encryption, multiple replicas of the file are made and distributed to several other client machines.

Page 24:

Publius

The Publius system mainly focuses on availability and anonymity. It maintains availability by distributing files as shares over n web servers. Any j of these shares are enough to reconstruct a file.
To publish a file, we first encrypt the document with key K. Then K is split into n shares, any j of which can rebuild K. K(doc) and a share are sent to each of the n servers. The “name” of the document is the address of the n servers.
A query is basically run through a local web proxy, which contacts j servers and rebuilds K.
Because the identities of the servers are not anonymized, an attacker can remove information by forcing the closure of n-j+1 servers.
Publius lacks accountability (DoS with garbage) and smooth join/leave for servers.
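
Splitting K into n shares so that any j rebuild it can be illustrated with Shamir's secret sharing over a prime field. The sketch below is a teaching example under assumptions: the field prime, the function names `split_key` and `rebuild_key`, and the integer encoding of the key are chosen for illustration and are not taken from the slides. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an assumption


def split_key(key, n, j):
    """Split integer `key` into n shares; any j of them rebuild it."""
    if not 0 < j <= n or not 0 <= key < PRIME:
        raise ValueError("need 0 < j <= n and 0 <= key < PRIME")
    # Random polynomial of degree j-1 whose constant term is the key.
    coeffs = [key] + [secrets.randbelow(PRIME) for _ in range(j - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def rebuild_key(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != i:
                num = (num * -xm) % PRIME
                den = (den * (xi - xm)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_key(123456789, n=5, j=3)
recovered = rebuild_key([shares[0], shares[2], shares[4]])
```

With n = 5 and j = 3, shutting down n-j+1 = 3 servers leaves only 2 shares, which is one fewer than the threshold needed to rebuild K.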

Page 25:

Chord

Chord provides a peer-to-peer hash lookup service:

Lookup(key) => IP address

Chord does not store the data

Efficient: O(Log N) messages per lookup

N is the total number of servers

Scalable: O(Log N) state per node

Robust: survives massive changes in membership

Page 26:

Chord: Lookup Mechanism

Lookups take O(Log N) hops

[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110, showing Lookup(K19) and key K19]
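
A small simulation of finger-table lookup on the ring from the figure. It is a sketch under assumptions: a 7-bit identifier space (ids 0 to 127) is chosen so the node numbers fit, the global `NODES` list stands in for what each node would learn through the real join protocol, and the helper names are hypothetical.

```python
from bisect import bisect_left

M = 7  # identifier bits (assumed): ids live in 0 .. 2**M - 1
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])


def successor(ident):
    """First node at or after `ident`, wrapping around the ring."""
    i = bisect_left(NODES, ident % 2**M)
    return NODES[i % len(NODES)]


def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)


def fingers(n):
    """Finger i of node n points at successor(n + 2**i)."""
    return [successor(n + 2**i) for i in range(M)]


def lookup(start, key):
    """Return (node responsible for key, number of forwarding hops)."""
    node, hops = start, 0
    while not in_half_open(key, node, successor(node + 1)):
        # Jump to the closest finger that still precedes the key.
        nxt = next((f for f in reversed(fingers(node)) if in_open(f, node, key)), node)
        if nxt == node:
            break  # no finger makes progress; fall back to the successor
        node, hops = nxt, hops + 1
    return successor(node + 1), hops


owner, hops = lookup(32, 19)
```

Starting at N32, the lookup for K19 ends at N20, the first node at or after 19 on the ring. Each hop roughly halves the remaining distance to the key, which is where the O(Log N) bound comes from.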

Page 27:

Tapestry

Self-administered, self-organized, location independent, scalable, fault-tolerant
Each node has a neighbor map table with neighbor information.

Page 28:

Tapestry Cont.

The system is able to adapt to network changes because its algorithms are dynamic.
This also provides for fault handling.

Page 29:

CAN

The network is created in a tree-like form.
Each node is associated with one node in the upper level and with a group in the lower level.
A query travels from the uppermost level down through the network until a match is found or until it reaches the lowermost level.
For its query model, scalability is an issue.

Page 30:

CAN Cont.

The tree-like network: [figure]

Page 31:

OceanStore

Decentralized but monitored system.
Built with untrusted peers (for data storage) and nomadic data in mind.
Monitoring allows proactive movement of data.
Uses replication and caching.
Two lookup methods used: fast probabilistic and slow deterministic.

Page 32:

MojoNation

Centralized: Central Service Broker + many peers.
When a file is inserted, it is hashed. The hash is the file's unique identifier.
Uses fragmentation and replication (50%)
Load balancing is an issue since it is market-based.
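
Deriving an identifier from a file's contents, and fragmenting the file, can be sketched as follows. The choice of SHA-1 and of fixed-size fragments are assumptions made for illustration; the slides do not say which hash or fragmentation scheme MojoNation uses.

```python
import hashlib


def file_id(data):
    """Content-derived identifier: the same bytes always give the same id."""
    return hashlib.sha1(data).hexdigest()


def fragment(data, fragment_size):
    """Cut the data into fixed-size pieces, each tagged with its own id."""
    return [
        (file_id(data[i:i + fragment_size]), data[i:i + fragment_size])
        for i in range(0, len(data), fragment_size)
    ]


pieces = fragment(b"abcdefgh", 3)
whole = b"".join(chunk for _, chunk in pieces)
```

Because the identifier depends only on the content, two peers inserting identical files arrive at the same identifier without coordinating.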

Page 33:

Panther

Based on Chord lookup algorithms
Chord capabilities are used for load balancing and replication
Files and file chunks are identified by keys generated through Chord's hash system.
Replication, fragmentation and caching are used.
Authentication is provided through the use of public and private keys.

Page 34:

Eternity

Provide “eternal” storage capabilities.
Protect data even from its publisher and system administrators
Fragmentation and replication are proposed
Anonymity is used to protect data

Page 35:

Others

– The Ohaha system uses a consistent-hashing-like algorithm to map documents to nodes. Query routing is like that of Freenet, so it inherits some of Freenet's weaknesses.

– The Rewebber maintains a measure of anonymity for producers of web information by means of an encrypted URL service. TAZ extends Rewebber by using chains of nested encrypted URLs that successively point to different Rewebber services to be contacted.

– Intermemory and INDIA are two cooperative systems where files are divided into redundant shares and distributed among many servers. They are intended for long term archival storage along the lines of Eternity.

– The xFS file system focuses on providing support to distributed applications on workstations interconnected by a very high-performance network, providing high availability and reliability.

– Frangipani is a file system built on the Petal distributed virtual disk, providing high availability and reliability like xFS through distributed RAID semantics. Unlike xFS, Petal provides support for transparently adding, deleting, or reconfiguring servers.

– GNUnet is free software, available to the general public under the GNU General Public License (GPL). As opposed to Napster and Gnutella, GNUnet was designed with security as the highest priority.

Page 36:

Comparison

For each system: things to note, then routing and file retrieval.

Napster
– Things to note: Centralized server (single point of failure), not scalable
– Routing and file retrieval: Centralized lookup, keyword is used for look-up

Gnutella
– Things to note: Inefficient routing (flooding), not scalable
– Routing and file retrieval: Flooded routing (multicast), filename is used for look-up

Freenet
– Things to note: Scales well, intelligent routing, anonymity maintained
– Routing and file retrieval: No structured lookup algorithm, degrades efficiency

Free Haven
– Things to note: Anonymous, accountability maintained
– Routing and file retrieval: File is split into n shares, k of which can rebuild the file
– Shared-Private keys used for encryption

Farsite
– Things to note: Scales well, accountability is maintained, feasible
– Routing and file retrieval: Files stored in encrypted format

Publius
– Things to note: Anonymity is maintained
– Routing and file retrieval: File is split into n shares, k of which can rebuild the file

Pastry
– Things to note: Leaf set, routing table, neighborhood set; used by Squirrel, PAST, Scribe
– Routing and file retrieval: O(log N) without neighbor failure, O(1) if node is in leaf set; routing based on address prefixes

Chord
– Things to note: lookup(key) => IP address; used by CFS, Panther; routing based on finger table (O(log N))
– Routing and file retrieval: O(log N) with neighbor failure, using replicas on successors; routing based on numerical difference with destination node

Tapestry
– Things to note: Neighbor map table on nodes
– Routing and file retrieval: Routing based on address prefixes

CAN
– Things to note: Tree-like network topology; scalability is an issue
– Routing and file retrieval: Routing table doesn't grow with network size; messages routed in d-dimensional space, each node has a routing table of O(d) entries, any node reached in O(dN^(1/d)) routing hops

Page 37:

File Storage Systems Comparison

[Table comparing CFS, Mojonation, OceanStore and Panther on: Replication, De-centralized, Fragmentation, Caching]

Page 38:

Conclusion

Issues that need to be addressed: caching, versioning, fragmentation, replication.
Copyright laws!
The technology is very promising. It will probably be commonplace in the near future.