Comparing Hybrid Peer-to-Peer Systems. Beverly Yang and Hector Garcia-Molina. Presented by Marco Barreno, November 3, 2003. CS 294-4: Peer-to-peer systems.

Comparing Hybrid Peer-to-Peer Systems

Beverly Yang and Hector Garcia-Molina

Presented by Marco Barreno, November 3, 2003

CS 294-4: Peer-to-peer systems

Hybrid peer-to-peer systems

Pure peer-to-peer systems are hard to scale

Gnutella

Look at hybrids between p2p and client-server

Servers will index files, clients download from each other directly

Searching can be done more efficiently on a server

Napster (but Napster had its own problems...)

Several other architectures

Questions for hybrid systems

Best way to organize servers?

Index replication policy?

What queries are submitted often?

How do we deal with churn?

How do query patterns affect performance?

Contributions of this paper

Presents several architectures for hybrid systems

Presents and evaluates a probabilistic model for queries

Compares architectures quantitatively, based on their models and the music sharing domain

Compares strategies in non-music-sharing domains (a bit)

General concepts: basic actions

Login

A client connects to a server and uploads metadata about the files it offers

It is a local user to that server, a remote user to others

Query

A list of words to search on

Satisfied once a preset maximum number of results (MaxResults) is found

Download

Contact peer directly after getting info from server

Goal

The goal of this study is to maximize UsersPerServer

What do you think of this goal?

Batch vs. incremental logins

Batch: on login/logout, user's entire metadata set is added/removed

Allows index to remain small, but login/logout is expensive

Incremental: metadata kept in index at all times, and only deltas are sent at login

Saves much effort on login/logout

Queries become more expensive, as server must filter for online users
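
The trade-off between the two login policies can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names `BatchIndex` and `IncrementalIndex`, the word-splitting rule, and the in-memory inverted index are all hypothetical and are not taken from the paper or from any real server.

```python
from collections import defaultdict


def _words(title):
    # Assumed tokenisation: lower-case, split on whitespace.
    return title.lower().split()


class BatchIndex:
    """Batch policy: a user's metadata is indexed only while they are online."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> {(user, title)}
        self.files = {}                   # user -> set of titles

    def login(self, user, titles):
        # The entire metadata set is uploaded and indexed at every login.
        self.files[user] = set(titles)
        for title in titles:
            for word in _words(title):
                self.postings[word].add((user, title))

    def logout(self, user):
        # The entire metadata set is removed again at logout.
        for title in self.files.pop(user, set()):
            for word in _words(title):
                self.postings[word].discard((user, title))

    def _match(self, words):
        lists = [self.postings.get(w.lower(), set()) for w in words]
        return set.intersection(*lists) if lists else set()

    def query(self, words, max_results):
        # Every indexed file belongs to an online user, so no filtering.
        return sorted(self._match(words))[:max_results]


class IncrementalIndex(BatchIndex):
    """Incremental policy: metadata stays indexed; logins send only deltas."""

    def __init__(self):
        super().__init__()
        self.online = set()

    def login(self, user, added=(), removed=()):
        mine = self.files.setdefault(user, set())
        for title in removed:
            mine.discard(title)
            for word in _words(title):
                self.postings[word].discard((user, title))
        for title in added:
            mine.add(title)
            for word in _words(title):
                self.postings[word].add((user, title))
        self.online.add(user)

    def logout(self, user):
        # Cheap: the index is untouched, only the online flag changes.
        self.online.discard(user)

    def query(self, words, max_results):
        # Extra work per query: drop matches owned by offline users.
        hits = (h for h in self._match(words) if h[0] in self.online)
        return sorted(hits)[:max_results]
```

In this sketch the batch index shrinks at every logout, while the incremental index keeps growing and pays for it with the online-user filter inside `query`.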

Architectures (1)

Chained architecture

Servers are arranged in a linear chain (ring?)

Each server keeps metadata for local users

Unsatisfied queries sent along chain

Logins and downloads scalable; queries potentially expensive
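
A minimal sketch of how a query might travel along the chain, in Python. It assumes the chain wraps around like a ring (the slide leaves this open), that each server's local index is just a list of titles, and that a query matches a title when every query word appears in it; `chained_query` and `matches` are hypothetical names.

```python
def matches(title, words):
    # Assumed matching rule: every query word occurs in the title.
    title_words = title.lower().split()
    return all(w.lower() in title_words for w in words)


def chained_query(servers, start, words, max_results):
    """servers[i] is the list of titles server i indexes for its local users.

    Returns (results, number_of_servers_visited).
    """
    results = []
    visited = 0
    for hop in range(len(servers)):
        # Assumption: wrap around, so every server is reachable from any start.
        local_index = servers[(start + hop) % len(servers)]
        visited += 1
        results.extend(t for t in local_index if matches(t, words))
        if len(results) >= max_results:
            break  # satisfied: stop forwarding along the chain
    return results[:max_results], visited
```

Popular queries are usually satisfied at the first server, while rare ones visit every server in the chain, which is why query cost is the concern for this architecture.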

Architectures (2)

Full replication architecture

Each server keeps metadata about all users

Logins expensive

Queries cheap

Architectures (3)

Hash architecture

Metadata words hashed so a particular server is responsible for a particular subset of them

Queries sent to relevant servers

On login, metadata sent to all relevant servers

Limited number of servers need to see each query, but sending the lists may be expensive
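
The word-to-server mapping can be illustrated with a short Python sketch. The hash function (SHA-256 reduced modulo the number of servers), the `HashCluster` class, and the tokenisation are assumptions chosen for the example; the paper does not prescribe them.

```python
import hashlib
from collections import defaultdict


def server_for(word, num_servers):
    # Assumed hash: a stable digest of the word, reduced modulo the server count.
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


class HashCluster:
    def __init__(self, num_servers):
        self.num_servers = num_servers
        # One inverted index per server: word -> {(user, title)}
        self.indexes = [defaultdict(set) for _ in range(num_servers)]

    def login(self, user, titles):
        """Send each word's entry to the server responsible for that word."""
        touched = set()
        for title in titles:
            for word in title.lower().split():
                s = server_for(word, self.num_servers)
                self.indexes[s][word].add((user, title))
                touched.add(s)
        return touched  # servers that had to receive metadata

    def query(self, words, max_results):
        """Ask only the servers responsible for the query's words."""
        lists, touched = [], set()
        for word in words:
            w = word.lower()
            s = server_for(w, self.num_servers)
            touched.add(s)
            # In a real deployment these lists cross the network,
            # which is the cost the slide warns about.
            lists.append(self.indexes[s].get(w, set()))
        hits = set.intersection(*lists) if lists else set()
        return sorted(hits)[:max_results], touched
```

A query with w words contacts at most w servers regardless of cluster size, but the per-word result lists must be shipped to one place before they can be intersected.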

Architectures (4)

Unchained architecture

Servers are independent and don't communicate

A user can only search files on the server he/she connects to

Napster

Disadvantage: users' views are limited to a single server

Advantage: scales very well (as long as servers and users increase together)

Query model

Universe of queries: q_1, q_2, q_3, ...; densities f, g

g(i) is the probability that a submitted query is query q_i (query popularity)

f(i) is the probability that any given file will match query q_i (selection power)

g tells us what queries users like to submit, while f tells us which files users like to store

Expected results for chained

ExServ = Expected number of servers needed to obtain R results (MaxResults)

If P(s) is the probability that exactly s servers are needed to return R or more results, then ExServ is the P(s)-weighted average of the number of servers visited

ExLocalResults based on (UsersPerServer * FilesPerUser) files

ExTotalResults based on (ExLocalResults * k) files
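
One way to write this expectation out, under two assumptions that are not stated on the slide: P(s) is defined as if the pool of servers were unbounded, and the chain contains k servers, so a query that would need more than k servers still visits exactly k of them.

```latex
\mathrm{ExServ} \;=\; \sum_{s=1}^{k} s \, P(s) \;+\; k \sum_{s=k+1}^{\infty} P(s)
```

The first sum covers queries that are satisfied somewhere along the chain; the second covers queries that visit all k servers and still come back with fewer than R results.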

Expected values for others

ExServ trivially 1 for full replication and unchained

ExServ is equivalent to balls-in-bins for hash
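
The balls-in-bins view can be made concrete: treat each distinct query word as a ball and each server as a bin, so ExServ is the expected number of non-empty bins. The Python sketch below assumes the words hash uniformly and independently across servers, which is a simplification; `expected_servers` and `brute_force` are hypothetical helper names.

```python
from itertools import product


def expected_servers(num_servers, num_words):
    """Expected number of distinct servers hit by num_words uniformly hashed words."""
    k, w = num_servers, num_words
    # A given server stays empty with probability (1 - 1/k)^w.
    return k * (1 - (1 - 1 / k) ** w)


def brute_force(num_servers, num_words):
    """Same quantity by enumerating every assignment (small inputs only)."""
    outcomes = list(product(range(num_servers), repeat=num_words))
    return sum(len(set(o)) for o in outcomes) / len(outcomes)
```

For short queries on a large cluster the result is close to the number of words, since collisions between words are rare.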

Distributions for f() and g()

Exponential distributions work well for the music domain:

Monotonically decreasing

Popularity and selection power are correlated

Most popular has highest selection power, and so on

Validation of query model

M(n) = expected # results from n files

Q(n) = probability we don't get R results

These data gathered from OpenNap
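
To make M(n) and Q(n) concrete, here is a small Python sketch built directly on the definitions of f and g. Several things are assumptions rather than the paper's formulation: the query universe is truncated to a finite size, both densities are discretised exponentials normalised to sum to 1, files match independently, and M(n) is the uncapped expectation (the paper may cap it at R). The function names and parameters are hypothetical.

```python
import math


def exponential_weights(rate, universe):
    """Discretised, normalised exponential over queries 1..universe (assumed form)."""
    raw = [math.exp(-i / rate) for i in range(1, universe + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def expected_results(n, g, f):
    """M(n): expected number of matches among n files, averaged over queries."""
    return n * sum(gi * fi for gi, fi in zip(g, f))


def prob_fewer_than(n, p, r):
    """P(Binomial(n, p) < r): fewer than r of n independent files match."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(min(r, n + 1)))


def prob_unsatisfied(n, g, f, r):
    """Q(n): probability that a random query finds fewer than r results in n files."""
    return sum(gi * prob_fewer_than(n, fi, r) for gi, fi in zip(g, f))
```

Because the same index i ranks both lists, the most popular query also has the highest selection power, which mirrors the positive correlation assumed for the music domain.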

Performance model

CPU cycles

Cost estimates based on examination and guesswork, plus some experiments

Matched OpenNap relatively well for batch logins

Inter-server bandwidth

Varies among architectures

Server-client bandwidth

Napster protocol: Login, AddFile, RemoveFile

Take min over resources (iterative estimation)
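
The "min over resources" step can be sketched as a search for the largest user count that no resource rejects. The slide does not spell out the iteration, so the binary search below is a swapped-in technique, and it assumes each resource's load never decreases as users are added; `max_users_per_server` and its parameters are hypothetical.

```python
def max_users_per_server(loads, capacities, upper=1_000_000):
    """Largest user count at which every resource stays within capacity.

    loads[name](users) -> demand placed on that resource (assumed non-decreasing)
    capacities[name]   -> limit for that resource
    """
    def fits(users):
        return all(loads[name](users) <= capacities[name] for name in capacities)

    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fits(mid):
            lo = mid       # mid users are sustainable; try more
        else:
            hi = mid - 1   # some resource is saturated; try fewer
    return lo
```

With made-up linear costs such as `{"cpu": lambda u: 2.0 * u, "client_bw": lambda u: 5.0 * u}` and capacities `{"cpu": 100, "client_bw": 300}`, the CPU limit (50 users) binds before the bandwidth limit (60 users), so the result is the minimum over the two.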

Evaluation

Metric: max users per server (throughput, not latency)

Memory requirements

Beyond music

f() and g() could be different

May be no or negative correlation

e.g. Adding “price > 0” to a query makes it less popular but doesn't change size of result set

e.g. Archive system will return more results from farther in the past (queries presumably rarer)

No or negative correlation can be modeled by adjusting the ratio of the parameters to f and g

No correlation: r = 1

Negative correlation: r >> 1

CPU performance vs. r

Conclusion

Chained is the best architecture for the music domain

Full replication might be good with lots of cheap memory and stable network connections

Incremental logins do best when f and g are negatively correlated, and when sessions are short and bandwidth-limited