A Measurement Study of Peer-to-Peer File Sharing Systems Presented by Cristina Abad.

Mar 27, 2015

Page 1: A Measurement Study of Peer-to-Peer File Sharing Systems Presented by Cristina Abad.

A Measurement Study of Peer-to-Peer File Sharing Systems

Presented by

Cristina Abad

Page 2:

Motivation

In a P2P file sharing system, peers are usually at the “edge” of the network

Does this affect/limit the quality of the infrastructure?

What are the characteristics of hosts that choose to participate?

Solution: Measure Gnutella and Napster traffic to help understand these issues

Page 3:

Napster

Page 4:

Gnutella

Page 5:

Methodology

Crawler periodically takes “snapshot” of Napster/Gnutella
– capture basic info (peers, files shared, …)

For peers discovered
– measure bottleneck bandwidth
– measure latency
– track content and degree of sharing

Measure lifetime
– track availability of peers (at P2P and IP level)

Page 6:

Crawling Napster

Peers can only be discovered by querying index
Crawler issues queries with names of popular song artists
Query responses contain
– IP, reported bandwidth, files shared (number, names and sizes)

Results:
– Captured 40-60% of Napster hosts (contributing to 80-95% of total files)
– Could not capture peers that do not share files

Page 7:

Crawling Gnutella

Crawler uses ping/pong to discover peers
Each crawl captured approx. 10,000 peers
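Ping/pong discovery can be pictured as a breadth-first walk over the overlay: each pong reveals more peers to contact. The sketch below is only an illustration under that assumption. The `overlay` dictionary, the peer names, and the `crawl` function are hypothetical stand-ins for real network replies; no Gnutella messages are sent and TTL handling is left out.

```python
from collections import deque


def crawl(overlay: dict[str, list[str]], seed: str, max_peers: int = 10_000) -> set[str]:
    """Breadth-first discovery over an in-memory overlay (hypothetical stand-in for pongs)."""
    discovered = {seed}
    frontier = deque([seed])
    while frontier and len(discovered) < max_peers:
        peer = frontier.popleft()
        # overlay[peer] plays the role of the peers reported back after a ping.
        for neighbour in overlay.get(peer, []):
            if neighbour not in discovered:
                discovered.add(neighbour)
                frontier.append(neighbour)
                if len(discovered) >= max_peers:
                    break
    return discovered


# Toy overlay: peers "e" and "f" are not reachable from the seed "a".
toy_overlay = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b"],
    "e": ["f"],
    "f": ["e"],
}
snapshot = crawl(toy_overlay, "a")
print(sorted(snapshot))
```

A crawl of this kind only sees peers reachable from its starting point within the peer budget, which is one reason a snapshot may undercount the population.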

Page 8:

Measuring bandwidth

Reported bandwidth may not be accurate (ignorance or lies)
Use bottleneck bandwidth as approximation to available bandwidth
– capacity of slowest hop along path between two hosts

Used SProbe to actively measure both upstream and downstream bottleneck bandwidth
– Similar to “packet pair” technique

Page 9:

Packet Pair Technique

Two packets queued next to each other at bottleneck link exit the link t seconds apart:

$$t = \frac{s_2}{b_{bnl}}$$

Then,

$$b_{bnl} = \frac{s_2}{t}$$

$s_2$: size of second packet

$b_{bnl}$: bottleneck bandwidth

Kevin Lai and Mary Baker. “Measuring bandwidth”. In Proceedings of IEEE INFOCOM '99. 1999.
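As a worked example of the packet pair relation, the sketch below divides the size of the second packet by the observed dispersion. The function name and the sample numbers (a 1500-byte packet, 1.2 ms dispersion) are assumptions chosen for illustration, not measurements from the study.

```python
def bottleneck_bandwidth(second_packet_bytes: int, dispersion_seconds: float) -> float:
    """Estimate bottleneck bandwidth in bits per second from one packet pair."""
    if dispersion_seconds <= 0:
        raise ValueError("dispersion must be positive")
    # Packet size in bits divided by the time gap t between the two packets.
    return second_packet_bytes * 8 / dispersion_seconds


# Hypothetical measurement: the second packet (1500 bytes) arrives 1.2 ms after the first.
estimate = bottleneck_bandwidth(1500, 0.0012)
print(f"{estimate / 1e6:.1f} Mbps")
```

A single pair is sensitive to cross traffic, so a practical tool would likely combine many pairs before reporting an estimate.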

Page 10:

How many peers are server-like?

High-bandwidth, low latency, high availability
8% have upstream bottleneck bandwidth ≥ 10 Mbps

Page 11:

Availability – Host uptimes

Page 12:

Availability – Session duration

Page 13:

Free-riders

Page 14:

Is Gnutella robust?

Page 15:

Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Presented by

Cristina Abad

Page 16:

Three-tiered approach

1. Analyze 200-day trace of Kazaa traffic
– Considered only traffic going from U. Washington to the outside

2. Develop a model of multimedia workloads
– Analyze and confirm hypothesis

3. Explore potential impact of locality-awareness in Kazaa

Page 17:

Contributions

Obtained some useful characterizations of Kazaa’s traffic
Showed that Kazaa’s workload is not Zipf
– Showed that other workloads (multimedia) may not be Zipf either
Presented a model of P2P file-sharing workloads based on their trace results
– Validated the model through simulations that yielded results very similar to those from traces
Proved the usefulness of exploiting locality-aware request routing

Page 18:

Measurement results

Users are patient
Users slow down as they age
Kazaa is not one workload
Kazaa clients fetch objects at-most-once
Popularity of objects is often short-lived
Kazaa is not Zipf

Page 19:

User characteristics (1)

Users are patient

Page 20:

User characteristics (2)

Users slow down as they age
– clients “die”
– older clients ask for less each time they use system

Page 21:

User characteristics (3)

Client activity

– Tracing used could only detect users when their clients transfer data

– Thus, they only report statistics on client activity, which is a lower bound on availability

– Avg session lengths are typically small (median: 2.4 mins)

• Many transactions fail

• Periods of inactivity may occur during a request if client cannot find an available server with the object

Page 22:

Object characteristics (1)

Kazaa is not one workload

Page 23:

Object characteristics (2)

Kazaa object dynamics
– Kazaa clients fetch objects at most once
– Popularity of objects is often short-lived
– Most popular objects tend to be recently born objects
– Most requests are for old objects

Page 24:

Object characteristics (3)

Kazaa is not Zipf
Web access patterns are Zipf: small number of objects are extremely popular, but there is a long tail of unpopular requests.

Zipf’s law: popularity of ith-most popular object is proportional to $i^{-\alpha}$ (α: Zipf coefficient)

(Zipf) looks linear on log-log scale

Page 25:

Model of P2P file-sharing workloads

On average, a client requests 2 objects/day
P(x): probability that a user requests an object of popularity rank x; follows Zipf(1)
– Adjusted so that objects are requested at most once
A(x): probability that a newly arrived object is inserted at popularity rank x; follows Zipf(1)
All objects are assumed to have same size
Use caching to observe performance changes (effectiveness measured as hit rate)
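To make the request side of the model concrete, here is a minimal simulation sketch. It assumes Zipf(1) request popularity, two requests per client per day, fetch-at-most-once enforced by rejection sampling, equal-sized objects, and a shared cache of unlimited size. New object arrivals, A(x), are deliberately left out, and every function name and parameter value is hypothetical rather than taken from the paper.

```python
import random
from itertools import accumulate


def simulate_hit_rate(n_clients: int = 20, n_objects: int = 1000, days: int = 50,
                      requests_per_day: int = 2, seed: int = 0) -> float:
    """Return the hit rate of a shared, unbounded cache under the simplified model."""
    if days * requests_per_day >= n_objects:
        raise ValueError("each client must request fewer objects than exist")
    rng = random.Random(seed)
    # Zipf(1): weight of the object at popularity rank r is 1 / r.
    cum_weights = list(accumulate(1.0 / rank for rank in range(1, n_objects + 1)))
    objects = range(n_objects)
    fetched = [set() for _ in range(n_clients)]  # per-client history
    cache: set[int] = set()
    hits = requests = 0
    for _ in range(days):
        for client in range(n_clients):
            for _ in range(requests_per_day):
                # Fetch-at-most-once: redraw until the client has not seen the object.
                while True:
                    obj = rng.choices(objects, cum_weights=cum_weights)[0]
                    if obj not in fetched[client]:
                        break
                fetched[client].add(obj)
                requests += 1
                if obj in cache:
                    hits += 1
                else:
                    cache.add(obj)
    return hits / requests


print(f"hit rate: {simulate_hit_rate():.2f}")
```

Running it for different values of `days` gives a rough feel for how the at-most-once adjustment changes cache effectiveness as clients age, although the exact numbers depend entirely on the assumed parameters.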

Page 26:

Model – Simulation results

File-sharing effectiveness diminishes with client age
– System evolves towards one with no locality and objects chosen at random from large space
New object arrivals improve performance
– Arrivals replenish supply of popular objects
New clients cannot stabilize performance
– Can’t compensate for increasing number of old clients
– Overall bandwidth increases in proportion to population size

Page 27:

Model validation

By tweaking the arrival rate of new objects, the authors were able to match trace results (with 5475 new arrivals per year)

Page 28:

Exploring locality-awareness

Currently organizations shape or filter P2P traffic

Alternative strategy: exploit locality in file-sharing workload
– Caching; or,
– Use content available within organization to substantially decrease external bandwidth usage
– Result: 86% of externally downloaded bytes could be avoided by using an organizational proxy
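The idea behind an organizational proxy can be sketched with a toy trace: the first request for an object is fetched from outside, and later requests for the same object are served from inside the organization. The trace, object names, and sizes below are invented for illustration, and the function is not the paper's methodology; the 86% figure above comes from the real trace, not from this sketch.

```python
def fraction_of_bytes_avoided(trace: list[tuple[str, str, int]]) -> float:
    """Fraction of requested bytes an ideal, unbounded proxy cache would keep internal."""
    cached: set[str] = set()
    total_bytes = external_bytes = 0
    for _client, obj, size in trace:
        total_bytes += size
        if obj not in cached:
            # First request for this object: it must come from outside.
            external_bytes += size
            cached.add(obj)
    return 1 - external_bytes / total_bytes if total_bytes else 0.0


# Hypothetical trace of (client, object, size in MB).
toy_trace = [
    ("c1", "song.mp3", 4),
    ("c2", "song.mp3", 4),
    ("c3", "movie.avi", 700),
    ("c1", "movie.avi", 700),
    ("c2", "movie.avi", 700),
]
saved = fraction_of_bytes_avoided(toy_trace)
print(f"{saved:.0%} of bytes served internally")
```

Because large objects dominate the byte count, repeated requests for a few big files account for most of the savings in this toy case.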

Page 29:

Questions?

Page 30:

Analysis

How can results obtained be used when evaluating P2P schemes?

Are any of the measurements obtained biased?

Peers are heterogeneous
– Incentives
– Enforcement (e.g. super-peers in Kazaa)

Page 31:

SProbe

Works in uncooperative environments
Works on asymmetric network paths
Exploits properties of the TCP protocol
– Send a pair of SYN packets with large payloads; then measure the time dispersion of the returned RST packets

Page 32:

Zipf

Linguist George Kingsley Zipf observed that for many frequency distributions, the n-th largest frequency is proportional to a negative power of the rank order n
"Zipf's law" is also sometimes used to refer to the corresponding probability distribution
Is an instance of a power law
Zipf's law is often demonstrated by plotting the data, with the axes being log(rank order) and log(frequency). If the points are close to a single straight line, the distribution follows Zipf's law.
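The log-log check described above can also be done numerically: fit a straight line to log(frequency) against log(rank) and read off the slope. The sketch below uses an ordinary least-squares fit on a synthetic series; the function name and the sample data are assumptions, and all frequencies must be positive for the logarithm to be defined.

```python
import math


def loglog_slope(frequencies: list[float]) -> float:
    """Least-squares slope of log(frequency) against log(rank)."""
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic series where frequency is exactly proportional to 1 / rank.
ideal = [1000 / rank for rank in range(1, 101)]
slope = loglog_slope(ideal)
print(f"slope = {slope:.3f}")
```

For data that follows Zipf's law with coefficient α, the fitted slope should be close to -α. A workload such as the Kazaa one, which the talk describes as not Zipf, would be expected to deviate from a single straight line.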