P2P Computing MIRA YUN September 16, 2005. Outline What is P2P P2P taxonomies Characteristics Different P2P systems Conclusion.
Post on 05-Jan-2016
P2P Computing
MIRA YUN
September 16, 2005
Outline
What is P2P
P2P taxonomies
Characteristics
Different P2P systems
Conclusion
P2P
“Peer-to-peer” (P2P) refers to a class of systems and applications that employ distributed resources to perform a function in a decentralized manner
Generally opposed to the client/server architecture
Peers
A peer gives some resources and obtains other resources in return.
Peer = an equal; all participants are peers (in the pure form of a P2P network).
Each peer depends on other peers; a peer on its own is meaningless.
Peers are autonomous (self-governing) if they are not wholly controlled by each other or by the same authority as everyone else.
What is P2P?
“The sharing of computer resources and services by direct exchange between systems” [p2pwg, 2001].
“Systems and applications that employ distributed resources to perform critical functions in a decentralized manner”
P2P enables peers to share their resources (information, processing, presence, etc.) with at most limited interaction with a centralized server.
Taxonomy of computer systems
P2P Models : pure, hybrid, super-peers
Pure: all peers have the same capabilities and responsibilities. Communication is symmetric. No host is superior; every host can act as client or server. Examples: Gnutella, Freenet
Hybrid: servers facilitate the interaction between peers. Addressing bypasses the DNS, but a central server acts as a directory. Examples: Napster, ICQ, Jabber
P2P Models : pure, hybrid, super-peers
Super-peers
A super-peer is a node in a peer-to-peer network that operates both as a server to a set of clients and as an equal in a network of super-peers.
Super-peer networks try to balance the efficiency of centralized search against the autonomy, load balancing and robustness to attacks provided by distributed search.
Example: Kazaa
P2P search models
Centralized directory model
There is a central index.
Once the requested file is located, the exchange takes place directly between peers.
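The centralized directory model can be sketched in a few lines. This is a minimal illustration, not Napster's actual protocol: the class name `CentralIndex`, its methods, and the peer addresses are all hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical directory server: maps file names to the peers that hold them."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer, filename):
        # A peer announces that it can serve this file.
        self._holders[filename].add(peer)

    def unregister_peer(self, peer):
        # Drop every entry for a peer that has left the network.
        for holders in self._holders.values():
            holders.discard(peer)

    def lookup(self, filename):
        # Only the lookup touches the server; the transfer itself is peer to peer.
        return sorted(self._holders.get(filename, ()))


# Made-up peers and files, for illustration only.
index = CentralIndex()
index.register("peer-a:6699", "song.mp3")
index.register("peer-b:6699", "song.mp3")
index.register("peer-b:6699", "talk.mp3")
holders = index.lookup("song.mp3")
```

The requester would then pick one of the returned peers and fetch the file from it directly, so the server carries only index traffic.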
P2P search models Napster
Created in 1999 by Shawn Fanning, a freshman at Northeastern University.
Purpose: to freely get MP3 music files.
Central index server, P2P exchange.
Sued several times, then suspended.
The music industry is against Napster because people can get music for free instead of paying for a CD.
Napster's defense is that the files are personal files that people maintain on their own machines, and therefore Napster is not responsible.
P2P search models
Flooded requests model
Each request from a peer is flooded (broadcast) to directly connected peers (1), which in turn flood their peers (2).
The request is propagated until a maximum number of floods occurs (typically 5 to 9) or the request is answered.
Used by Gnutella.
Requires a lot of bandwidth; does not scale.
Good for company networks.
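The flooding behaviour, and why it costs so much bandwidth, can be sketched as a breadth-first search with a hop limit. This is a simplified model under stated assumptions: the function name, the adjacency-dict representation, and the toy network are hypothetical, and the sketch keeps flooding until the hop limit is reached rather than stopping at the first answer.

```python
from collections import deque


def flood_search(neighbors, files, start, wanted, ttl=7):
    """Flood a request from `start`; return (peers holding `wanted`, messages sent).

    `neighbors` maps a peer to its directly connected peers and `files` maps
    a peer to the set of files it shares (both are illustrative structures).
    """
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        peer, remaining = queue.popleft()
        if wanted in files.get(peer, ()):
            hits.append(peer)
        if remaining == 0:
            continue  # hop limit exhausted: stop forwarding
        for nxt in neighbors.get(peer, ()):
            messages += 1  # every forwarded copy costs bandwidth
            if nxt not in seen:  # duplicates are dropped on arrival
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits, messages


# Toy network: p is three hops away from d, the only peer with the file.
neighbors = {
    "p": ["a", "b"],
    "a": ["p", "c"],
    "b": ["p", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
files = {"d": {"x.mp3"}}
too_short = flood_search(neighbors, files, "p", "x.mp3", ttl=2)
long_enough = flood_search(neighbors, files, "p", "x.mp3", ttl=3)
```

Even in this five-peer network a single request generates several messages, most of them duplicates, which is the scaling problem noted above.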
P2P search models
Document routing model
Each peer is assigned a random ID; each peer knows a number of other peers.
When a document is published, an ID is computed by hashing the document contents and name.
Each peer routes the document to the peer with the ID most similar to the document ID, until the nearest peer ID is the current peer's own ID.
P2P search models
Document routing model
When a peer requests the document, the request goes to the peer with the ID most similar to the document ID.
This process is repeated until a copy of the document is found.
Then the document is transferred back to the request originator, while each peer participating in the routing keeps a local copy.
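The publish and request steps can be sketched as greedy routing toward the closest ID. This is a toy model, not Freenet's implementation: IDs are small integers, "most similar" is taken to mean smallest absolute difference, and the names `Peer`, `publish`, `request` and `doc_id` are hypothetical.

```python
import hashlib


def doc_id(name, contents, bits=16):
    # Hash of name and contents, truncated to a small ID space (an assumption).
    digest = hashlib.sha1(name.encode() + b"\0" + contents).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Peer:
    def __init__(self, peer_id):
        self.id = peer_id
        self.known = []   # other peers this peer can forward to
        self.store = {}   # document ID -> contents

    def next_hop(self, key):
        # Closest known peer to the key, or None if this peer is already closest.
        best = min(self.known, key=lambda p: abs(p.id - key), default=None)
        if best is None or abs(best.id - key) >= abs(self.id - key):
            return None
        return best


def publish(entry, key, contents):
    peer = entry
    while (nxt := peer.next_hop(key)) is not None:
        peer = nxt
    peer.store[key] = contents  # stored where no known peer is closer
    return peer


def request(entry, key):
    path, peer = [entry], entry
    while key not in peer.store:
        peer = peer.next_hop(key)
        if peer is None:
            return None, path
        path.append(peer)
    contents = peer.store[key]
    for hop in path:  # every peer on the route keeps a local copy
        hop.store[key] = contents
    return contents, path


# Four peers in a chain; the document key 720 is closest to peer 700.
a, b, c, d = Peer(100), Peer(400), Peer(700), Peer(900)
a.known, b.known, c.known, d.known = [b], [a, c], [b, d], [c]
home = publish(a, 720, b"hello")
first, first_path = request(a, 720)
second, second_path = request(a, 720)
```

The second request is served from the local copy left behind by the first, which is the caching effect described above.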
P2P search models
Document routing model
Efficient for large communities.
But the document ID must be known before posting a request.
Used in Freenet.
Four improved algorithms: Chord, CAN, Tapestry and Pastry.
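As one illustration of how these algorithms assign documents to peers, Chord places node IDs and keys on a ring and makes the first node at or after a key responsible for it. The sketch below shows only that placement rule; it leaves out the finger tables Chord uses to find the responsible node in a logarithmic number of hops, and the function name and node IDs are made up for the example.

```python
from bisect import bisect_left


def successor(node_ids, key):
    """Return the first node ID at or after `key` on the ring, wrapping around."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key)
    return ring[i % len(ring)]


# Illustrative node IDs on a small identifier ring.
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
owner_of_54 = successor(nodes, 54)
owner_of_57 = successor(nodes, 57)   # past the last node: wraps to the first
owner_of_14 = successor(nodes, 14)   # a key equal to a node ID stays on that node
```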
Characteristics
Decentralization
Centralized systems: ideal for some applications, but they create bottlenecks, use resources inefficiently, are expensive to set up and are hard to maintain.
Decentralized systems: P2P emphasizes the users' ownership and control of data and resources.
Full decentralization is difficult in practice, hence the hybrid approach.
Characteristics
Scalability
Limited by several factors:
The amount of centralized operations
The amount of state
The inherent parallelism an application exhibits
Scalability also depends on the ratio of communication to computation between the nodes.
Napster: can scale up to over 6 million users
SETI@home: close to 3.5 million users so far
Characteristics
Anonymity
One goal of P2P is to allow people to use systems without concern for legal issues.
Three different kinds of anonymity: sender anonymity, receiver anonymity, mutual anonymity
Gnutella: a request is broadcast and rebroadcast until it reaches a peer with the content.
Freenet: a request is sent and forwarded to the peer that is most likely to have the content.
Characteristics
Self-Organization
Needed because of scalability, fault resilience, and the cost of ownership.
Adaptation is required to handle the changes caused by peers connecting to and disconnecting from the P2P system.
Cost of Ownership
Reduces the cost of owning the systems and the content, and the cost of maintaining them.
SETI@home: faster than the fastest supercomputer in the world, at 1% of the cost.
Ad-Hoc Connectivity
Has a strong effect on all classes of P2P systems.
Characteristics
Performance
Influenced by three types of resources: processing, storage, and networking.
Three key approaches to optimize performance:
Replication: puts copies of objects/files closer to the requesting peers.
Caching: reduces the path length required to fetch a file/object, and therefore the number of messages exchanged between the peers.
Intelligent routing and network organization
Taxonomy of P2P systems
Distributed Computing
- Processing scalability in massive multi-parameter systems
- Run by a central controller
- Fork and join mechanism
- Limitations
• Work must split into independent small parts
• Internet latencies
- Intel claims a speed-up from 15 hours to 30 minutes for interest rate swap modeling by using P2P
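The fork and join mechanism can be sketched with a local worker pool standing in for remote peers. Everything here is an illustrative assumption: `price_scenario` is a toy stand-in for one independent unit of work, not Intel's swap model, and threads replace the networked peers a real central controller would use. The last line works out the speed-up quoted above.

```python
from concurrent.futures import ThreadPoolExecutor


def price_scenario(rate):
    # Stand-in for one independent unit of work (hypothetical toy formula).
    return 1000.0 * (1.0 + rate) ** 5


def fork_join(rates, workers=4):
    # Fork: the controller hands each independent part to a worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(price_scenario, rates))
    # Join: the controller combines the partial results.
    return sum(results) / len(results)


average = fork_join([0.01, 0.02, 0.03, 0.04])

# Speed-up implied by going from 15 hours to 30 minutes.
speedup = (15 * 60) / 30
```

The pattern only pays off when the parts are independent and each part is large enough to outweigh the network latency of shipping it to a peer, which are the two limitations listed above.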
Distributed Computing
SETI@home (Search for Extraterrestrial Intelligence)
A collection of research projects aimed at discovering alien civilizations.
Goal: to search for extraterrestrial radio emissions.
Design: two major components, a data server and a client.
Decentralization and scalability: distributes files (350 KB each) to its users.
Jay Sheth
Collaboration
- Application-level collaboration between users
- Event-based applications such as instant messaging, chat, online games
- Challenges
• Location of other peers (e.g. NetMeeting needs to know the other peers' IP addresses)
• Real-time constraints (e.g. the game DOOM)
Platforms
- Platforms support the primary P2P components: naming, discovery, communication, security and resource aggregation
- Candidates for a future P2P platform: .NET, JXTA
Platforms (JXTA)
JXTA = Juxtapose = side by side
Open-source initiative from Sun (Java)
“JXTA™ technology is a set of open protocols that allow any connected device on the network ranging from cell phones and wireless PDAs to PCs and servers to communicate and collaborate in a P2P manner.”
“JXTA peers create a virtual network where any peer can interact with other peers and resources directly even when some of the peers and resources are behind firewalls and NATs or are on different network transports.”
Objectives:
Interoperability: across systems and communities
Platform independence: multiple/diverse languages, systems, and networks
Ubiquity: every device with a digital heartbeat
Platforms (JXTA)
Architecture
JXTA application layer
JXTA service layer
JXTA core layer
Set of 6 protocols:
Peer Endpoint Protocol: available routes to a destination
Peer Rendezvous Protocol: sign in/out, authentication
Peer Resolver Protocol: send/receive search queries for peers
Pipe Binding Protocol: binds a pipe advertisement to a pipe endpoint
Peer Information Protocol: learn a peer's status/properties
Peer Discovery Protocol: find peers, groups, advertisements
File Sharing
- Content storage and exchange is where P2P is most successful
• Napster, Gnutella, Kazaa
Gnutella Protocol v0.4 (1/5)
One of the most popular file-sharing protocols.
Operates without a central index server (unlike Napster).
Clients (downloaders) are also servers => servents.
Clients may join or leave the network at any time => highly fault-tolerant, but at a cost!
Searches are done within the virtual network, while the actual downloads are done outside it (directly between peers, with HTTP).
The core of the protocol consists of 5 descriptors (PING, PONG, QUERY, QUERYHIT and PUSH).
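Each of the five descriptors travels behind a common fixed-size header. The sketch below packs and unpacks that header as it is usually described for v0.4 (16-byte descriptor ID, one-byte payload descriptor, TTL, hops, four-byte little-endian payload length); the helper names are hypothetical, and the field layout and type codes should be checked against the protocol specification before relying on them.

```python
import os
import struct

# Payload descriptor codes as usually listed for v0.4 (verify against the spec).
PING, PONG, PUSH, QUERY, QUERYHIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte descriptor ID, payload descriptor, TTL, hops, payload length.
HEADER = struct.Struct("<16sBBBI")


def pack_header(kind, ttl, hops, payload_len, descriptor_id=None):
    # A random ID lets servents recognise and drop duplicate descriptors.
    descriptor_id = descriptor_id or os.urandom(16)
    return HEADER.pack(descriptor_id, kind, ttl, hops, payload_len)


def unpack_header(data):
    descriptor_id, kind, ttl, hops, payload_len = HEADER.unpack(data[:HEADER.size])
    return {"id": descriptor_id, "kind": kind, "ttl": ttl,
            "hops": hops, "len": payload_len}


raw = pack_header(QUERY, ttl=7, hops=0, payload_len=12, descriptor_id=b"\x01" * 16)
fields = unpack_header(raw)
```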
Gnutella Protocol (2/5)
A peer (p) needs to connect to 1 or more other Gnutella peers in order to participate in the virtual network.
p initially doesn't know the IPs of its fellow file-sharers.
[Figure: servent p and Gnutella network N, with no known connection between them]
Gnutella Protocol (3/5)
a. HostCaches: the initial connection
p connects to a HostCache H to obtain a set of IP addresses of active peers.
p might alternatively probe its own cache to find peers it was connected to in the past.
[Figure: (1) servent p requests and receives a set of active peers from HostCache server H, e.g. connect1.gnutellahosts.com:6346; (2) p connects to Gnutella network N]
Gnutella Protocol (4/5)
b. Ping/Pong: the communication overhead
Although p is already connected, it must discover new peers, since its current connections may break.
Thus it periodically sends PING messages, which are broadcast (message flooding).
If a host, e.g. p2, is available, it responds with a PONG (routed back only along the same path the PING came from).
p might use this response and attempt a connection to p2 in order to increase its degree.
[Figure: servent p sends PING (1) through Gnutella network N to servent p2, which answers with PONG (2)]
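The reverse-path rule for PONGs can be sketched by having each servent remember which neighbour a PING arrived from. This is a toy, in-process model with direct method calls instead of sockets; the class `Servent` and its methods are hypothetical names, not part of any Gnutella library.

```python
class Servent:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.came_from = {}   # descriptor ID -> neighbour the PING arrived from
        self.pongs = []       # (descriptor ID, responder) seen by the originator

    def ping(self, ping_id, ttl=7):
        self.came_from[ping_id] = None          # None marks the originator
        for n in self.neighbors:
            n.on_ping(ping_id, ttl, self)

    def on_ping(self, ping_id, ttl, sender):
        if ping_id in self.came_from:
            return                              # already seen: drop the duplicate
        self.came_from[ping_id] = sender
        sender.on_pong(ping_id, self.name)      # answer along the reverse path
        if ttl > 1:                             # decrement TTL before forwarding
            for n in self.neighbors:
                if n is not sender:
                    n.on_ping(ping_id, ttl - 1, self)

    def on_pong(self, ping_id, responder):
        if ping_id not in self.came_from:
            return                              # no matching PING: discard
        back = self.came_from[ping_id]
        if back is None:
            self.pongs.append((ping_id, responder))
        else:
            back.on_pong(ping_id, responder)    # relay one hop closer to the origin


# Line topology p - a - p2: p2 is only reachable through a.
p, a, p2 = Servent("p"), Servent("a"), Servent("p2")
p.neighbors, a.neighbors, p2.neighbors = [a], [p, p2], [a]
p.ping("id1", ttl=2)
```

After the call, p has learned about both a and p2, and p2's PONG reached it through a rather than over a new connection.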
Gnutella Protocol (5/5)
c. Query/QueryHit: the utilization
Query descriptors contain unstructured queries, e.g. “celine dion mp3”.
Like PINGs, they are broadcast, with a typical TTL=7.
If a host, e.g. p2, matches the query, it responds with a QueryHit descriptor.
d. Push: enables downloads from peers that are firewalled.
If a peer is firewalled => we cannot connect to it. Hence we ask it to establish a connection to us and send us the file.
Conclusions
Nothing new ... but the right time to:
Take advantage of available resources
Find an alternative to centralized client/server solutions
There is something attractive about the defiance or avoidance of authority.
Raised legal copyright issues
Currently, 60% to 89% of all Internet traffic is P2P traffic => source of revenue => marketing argument.
Potentially a good match between ad hoc networks and P2P
Interesting architectural and technical issues behind it ... and challenging requirements
Summary of P2P computing