Characterizing Files in the Modern Gnutella Network: A Measurement Study. Shanyu Zhao, Daniel Stutzbach, Reza Rejaie, University of Oregon. Multimedia Computing and Networking 2006 (MMCN’06), 18-19 January 2006, San Jose, California, USA

Dec 21, 2015


1

Characterizing Files in the Modern Gnutella Network:

A Measurement Study

Shanyu Zhao, Daniel Stutzbach, Reza Rejaie

University of Oregon

SPIE Multimedia Computing and Networking 2006 (MMCN’06), 18-19 January 2006, San Jose, California, USA


2

Outline

Measurement study of the modern Gnutella system

Conduct static, topological, and dynamic analyses

Help improve the design and evaluation of P2P file-sharing applications


3

Previous studies

Focus on a small population

Are more than three years old

Do not examine the dynamics of file characteristics over time, or the correlation between the overlay topology and file distribution


4

Why Gnutella

One of the top three P2P file-sharing systems (eDonkey2K, FastTrack, Gnutella)

Gnutella's Browse-Host extension makes it possible to extract the list of shared files from peers

One of the most studied P2P systems, so results can be compared and contrasted with previous studies


5

Original Gnutella

A new node (Node A) joins the system

Node A connects to some existing node (Node B) found through a pre-existing list, a particular website, IRC, etc.

Node B sends Node A a list of its working nodes

Node A connects to the provided nodes until it reaches a certain threshold

During a search, Node A sends requests to its connected nodes, which in turn forward the requests
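The join-and-search steps above can be sketched as a breadth-first flood over an in-memory overlay. This is a minimal sketch, not client code: the adjacency-list format and the names `flood_search`, `overlay`, `has_file`, and `ttl` are hypothetical. The hop limit mirrors the TTL that Gnutella query messages carry, and the `seen` set stands in for the duplicate-query suppression that real peers perform.

```python
from collections import deque

def flood_search(overlay, source, has_file, ttl):
    """Flood a query from `source` and return the nodes that would answer it.

    overlay:  dict mapping each node to a list of its neighbors (hypothetical format)
    has_file: set of nodes that share the requested file
    ttl:      maximum number of hops the query may travel
    """
    seen = {source}                   # nodes that already received this query
    frontier = deque([(source, ttl)])
    hits = set()
    while frontier:
        node, remaining = frontier.popleft()
        if node != source and node in has_file:
            hits.add(node)            # this node would send back a reply
        if remaining == 0:
            continue                  # TTL exhausted: stop forwarding here
        for neighbor in overlay.get(node, []):
            if neighbor not in seen:  # duplicates of the query are dropped
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits
```

On a chain A-B-C-D with copies at B and D, a TTL of 2 finds only B, while a TTL of 3 also reaches D.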


6

Original Gnutella

Nodes reply to the request directly or indirectly, depending on whether a firewall is present

Node A downloads file pieces from one or more nodes that responded positively

Unlike Napster, Gnutella is decentralized and uses flood-based searches


7

Modern Gnutella

In contrast to the original flat, unstructured overlay topology, most modern Gnutella clients adopt a two-tier overlay structure

Ultrapeers and leaf peers (the majority)

Legacy peers (which do not implement the ultrapeer feature)
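A simplified sketch of how a query moves through the two-tier structure. It assumes that only ultrapeers forward queries and that every leaf attached to a reached ultrapeer sees the query; real clients filter that last step, which the sketch ignores. Legacy peers are not modeled, and the dictionary formats and the name `two_tier_query` are hypothetical.

```python
from collections import deque

def two_tier_query(up_links, leaves_of, start_up, max_hops):
    """Return the leaf peers a query can reach in a simplified two-tier overlay.

    up_links:  dict mapping each ultrapeer to its neighboring ultrapeers
    leaves_of: dict mapping each ultrapeer to the leaf peers attached to it
    start_up:  the ultrapeer the querying leaf is attached to
    max_hops:  number of ultrapeer-to-ultrapeer hops the query may travel
    """
    seen = {start_up}
    frontier = deque([(start_up, 0)])
    reached = set()
    while frontier:
        up, hops = frontier.popleft()
        # Simplification: every leaf attached to a reached ultrapeer sees the query.
        reached.update(leaves_of.get(up, []))
        if hops == max_hops:
            continue
        # Only ultrapeers forward; leaves never relay queries to other peers.
        for nxt in up_links.get(up, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return reached
```

Starting from `U1` with three ultrapeers in a chain, one hop reaches leaves `a`, `b`, and `c`; a second hop adds `d` and `e`, while only the three ultrapeers ever forward the query.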


8

Measurement methodology

Problems of general crawlers: slow, produce distorted snapshots, and inflate the population

Previous studies: partial snapshots, or periodic probes of a fixed group of peers; the significance of their results is doubtful

Goal of this work: capture the entire population (?) within a short period


9

Measurement methodology

Topology crawl: list of neighboring nodes

Content crawl: list of available files at each node; need more


10

Cruiser

Parallel P2P crawler

Orders of magnitude faster than previous crawlers (?)

Master-slave architecture: each slave crawls hundreds of peers, and the master coordinates multiple slaves

Increases the degree of concurrency
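The master-slave division of labor can be sketched with a thread pool. This is not Cruiser's actual implementation: the dictionary lookup in `crawl_batch` stands in for connecting to each peer and asking for its neighbors, and `master_crawl`, `num_slaves`, and `batch_size` are hypothetical names and parameters.

```python
from concurrent.futures import ThreadPoolExecutor

def crawl_batch(topology, batch):
    """Slave: ask each peer in the batch for its neighbor list (simulated lookup)."""
    return {peer: topology.get(peer, []) for peer in batch}

def master_crawl(topology, seeds, num_slaves=4, batch_size=100):
    """Master: hand batches of newly discovered peers to slaves until none remain."""
    discovered = set(seeds)
    frontier = list(seeds)
    snapshot = {}
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        while frontier:
            batches = [frontier[i:i + batch_size]
                       for i in range(0, len(frontier), batch_size)]
            frontier = []
            # Slaves crawl their batches concurrently; the master merges results.
            for result in pool.map(lambda b: crawl_batch(topology, b), batches):
                for peer, neighbors in result.items():
                    snapshot[peer] = neighbors
                    for n in neighbors:
                        if n not in discovered:
                            discovered.add(n)
                            frontier.append(n)
    return snapshot
```

The master only tracks which peers are new; all per-peer work happens in the slaves, which is what lets concurrency grow with the number of slaves.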


11

Cruiser

Using six off-the-shelf 1 GHz GNU/Linux boxes, a crawl takes 15 min + 5.5 hr + 15 min = 6 hours

Each content crawl produces a 10 GB log file containing file names and content hashes


12

Dataset

Three measurement periods; within each period, a snapshot is taken every day

6/8/2005-6/18/2005, 8/23/2005-9/9/2005 and 10/11/2005-10/21/2005

Examine both short and long timescales


13

Dataset


14

Sources of unreachable nodes

Firewalls, severe network congestion, departed peers, and peers that do not support the Browse-Host protocol

Ultrapeers: departure; leaf peers: departure and firewalls

Contact 20% of peers (~half a million)


15

Problems

Low-bandwidth TCP connections: some crawls do not complete before the timeout threshold because data is sent at an extremely low rate

File identity: a file name is not a reliable file identifier, so this work uses the content hash

Post-processing: more than 100 million distinct files; divide randomly into 7 segments, trim files with fewer than 10 copies in a segment, and combine the trimmed segments back into one
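The file-identity and post-processing steps can be sketched together. The slides do not say which hash was logged or how the segments were formed, so both are assumptions here: files are keyed by a Base32 SHA-1 identifier in the `urn:sha1:` form that Gnutella clients commonly advertise, and the distinct files (not the peers) are split into segments with a CRC-32 mapping. Under that assumption every copy of a file lands in the same segment, so per-segment trimming matches a global count. All function names and the record format are hypothetical.

```python
import base64
import hashlib
import zlib
from collections import Counter

def sha1_urn(content: bytes) -> str:
    """Gnutella-style content identifier: Base32-encoded SHA-1 digest."""
    digest = hashlib.sha1(content).digest()  # 20 bytes -> 32 Base32 characters
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")

def segment_of(file_id: str, num_segments: int) -> int:
    """Pseudo-random but repeatable mapping of a file to one segment."""
    return zlib.crc32(file_id.encode("utf-8")) % num_segments

def trim_rare_files(records, num_segments=7, min_copies=10):
    """Count copies per file, keeping only files with at least min_copies copies.

    records: list of (peer_id, file_id) pairs, one per shared copy.
    Each segment is counted in its own pass, so only one segment's raw
    counts need to be held in memory at a time.
    """
    combined = Counter()
    for seg in range(num_segments):
        counts = Counter(fid for _peer, fid in records
                         if segment_of(fid, num_segments) == seg)
        # Trim rare files in this segment, then merge the survivors.
        combined.update({fid: c for fid, c in counts.items() if c >= min_copies})
    return combined
```

Two files with different names but identical bytes map to the same identifier, so they are counted as copies of one file.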


16

Static analysis

Ratio of free riders

Degree of resource sharing among cooperative peers

File popularity distribution

File type analysis


17

Ratio of free riders

The ratio of free riders has dropped; it is lower among ultrapeers and slightly higher among long-lived peers; the number of shared files is not strongly correlated


18

Degree of resources sharing among cooperative peers

Distribution of # peers sharing x files – power-law distribution
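One common way to quantify such a power law is the continuous maximum-likelihood estimate of its exponent, α̂ = 1 + n / Σ ln(x_i / x_min), computed over the n samples at or above a cutoff x_min. The slides do not say how the distribution was fitted; this sketch treats per-peer counts as continuous, assumes x_min is chosen by hand, and uses hypothetical function names. The sampler exists only to check the estimator on synthetic data.

```python
import math
import random

def powerlaw_alpha_mle(samples, x_min):
    """Continuous MLE of alpha for p(x) proportional to x**(-alpha), x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def sample_powerlaw(n, alpha, x_min, rng):
    """Synthetic test data: inverse-transform samples from a continuous power law."""
    # 1 - rng.random() lies in (0, 1], so every sample is >= x_min.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

On a few hundred thousand synthetic samples drawn with α = 2.5, the estimate should land close to 2.5; the same estimator applies to the contributed-disk-space distribution on the next slide.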


19

Degree of resources sharing among cooperative peers

Distribution of contributed disk space – power-law distribution


20

Degree of resources sharing among cooperative peers

The correlation between the number of shared files and contributed disk space is not as strong as in previous studies; there is a discernible line with a slope of 3.7 MB/file, which is the typical size of an MP3 audio file
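A slope like 3.7 MB/file can be estimated from per-peer data with a least-squares line through the origin, b = Σ x_i y_i / Σ x_i², where x_i is the number of files peer i shares and y_i its contributed space in MB. The slides do not say how the line was identified, so this is an assumed method; forcing the fit through the origin reflects that a peer sharing no files contributes no space. `slope_through_origin` is a hypothetical name.

```python
def slope_through_origin(files, megabytes):
    """Least-squares slope b of megabytes ~ b * files, with the line forced through 0."""
    num = sum(x * y for x, y in zip(files, megabytes))
    den = sum(x * x for x in files)
    return num / den
```

At 3.7 MB per file, a peer sharing 1,000 such files contributes about 3.7 GB.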


21

File popularity distribution


22

File type analysis


23

File type analysis

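Previous studies vs. current study

Music: 67.2% of files and 79.2% of bytes in previous studies; 67% of files and 40% of bytes in the current study

Video: 2.1% of files and 19.1% of bytes in previous studies; 6% of files and 52.5% of bytes in the current study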


24

Topological analysis

Per-file perspective (figures a and b)

Per-peer perspective (figure c)
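For the per-peer perspective, a sketch that compares the files of peers one, two, and three hops apart. The slides do not name the similarity measure, so Jaccard similarity of shared-file sets is an assumption, as are the function names and the requirement that peer IDs be mutually comparable (for example, all strings). If files are placed at random, the mean similarity should be roughly the same at every hop distance rather than highest for direct neighbors.

```python
from collections import deque

def hop_distances(overlay, source, max_hops):
    """Breadth-first hop distance from source to every peer within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for nbr in overlay.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def jaccard(a, b):
    """Shared fraction of two peers' file sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_by_hops(overlay, files, max_hops=3):
    """Mean Jaccard similarity of file sets for peer pairs 1..max_hops apart."""
    totals = {h: [0.0, 0] for h in range(1, max_hops + 1)}
    for peer in overlay:
        for other, h in hop_distances(overlay, peer, max_hops).items():
            if h >= 1 and peer < other:  # count each unordered pair once
                totals[h][0] += jaccard(files[peer], files[other])
                totals[h][1] += 1
    return {h: s / n for h, (s, n) in totals.items() if n > 0}
```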


25

Topological analysis

Churn (the dynamics of peer participation) is the dominant factor: peers depart, new peers join, and leaf peers become ultrapeers

The rapid change in the overlay topology prevents the formation of topological clustering


26

Dynamics analysis

Variations in shared files by individual peers

Variations in popularity of individual files

Trends in popularity variations


27

Variations in shared files by individual peers


28

Variations in popularity of individual files

Focus on top 100 and top 1000 files
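A sketch of one way to measure how the popularity of top files varies across daily snapshots. The slides do not define the variation metric, so this assumes popularity is the number of copies of a file in a snapshot (0 if the file is absent) and variation is the mean absolute day-over-day change in that count; `top_files`, `mean_daily_change`, and the one-dictionary-per-day format are hypothetical.

```python
def top_files(snapshot, k):
    """The k files with the most copies in one snapshot (ties broken by file id)."""
    return sorted(snapshot, key=lambda f: (-snapshot[f], f))[:k]

def mean_daily_change(snapshots, file_id):
    """Mean absolute day-over-day change in a file's copy count (needs >= 2 snapshots)."""
    counts = [snap.get(file_id, 0) for snap in snapshots]
    changes = [abs(later - earlier) for earlier, later in zip(counts, counts[1:])]
    return sum(changes) / len(changes)
```

Selecting the top 100 or top 1000 files from the first snapshot with `top_files` and applying `mean_daily_change` to each gives one variation value per file.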


29

Trends in popularity variations

Track the top 10 files across several days (figures a and b) and over several months (figure c)


30

Conclusion

Used a parallel crawler to obtain snapshots of peer connectivity and available files

Conducted three types of analysis to understand the distribution, correlation, and dynamics of available files


31

Summary of findings

Free riding has dropped significantly

The number of shared files and the contributed storage space of individual peers follow power-law distributions: most peers contribute little disk space (<100 MB), while a small number of peers contribute very large space (50-100 GB)

The popularity of individual files follows a Zipf distribution: a small number of files are extremely popular, but the majority of files are very unpopular
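A Zipf distribution means the file at popularity rank r has roughly C / r^s copies, so log(copies) is linear in log(rank) with slope -s. This sketch estimates s by ordinary least squares on the log-log rank plot; the slides do not say how the fit was done, and `zipf_exponent` is a hypothetical name. Log-log regression is a rough estimator, adequate for reading off a slope but not for formally testing the fit.

```python
import math

def zipf_exponent(copy_counts):
    """Estimate s in copies(rank) ~ C / rank**s by least squares on the log-log rank plot."""
    counts = sorted((c for c in copy_counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var  # the fitted slope is -s
```

The input is one copy count per file in any order; the function ranks the files itself.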


32

Summary of findings

The most popular file type is MP3 (2/3 of all files, 1/3 of all bytes)

The popularity of video files and the space they occupy have tripled over the past few years

Video files number less than 1/10 of audio files but occupy 25% more bytes

Multimedia files account for 93% of bytes and 73% of files
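The multimedia share and the video growth figures can be checked against the file-type percentages on slide 23. This assumes "multimedia" means music plus video; the variable names are illustrative.

```python
# Percentages from the file-type slide: (share of files, share of bytes).
previous = {"music": (67.2, 79.2), "video": (2.1, 19.1)}
current = {"music": (67.0, 40.0), "video": (6.0, 52.5)}

# Multimedia (music + video) share in the current data.
multimedia_files = current["music"][0] + current["video"][0]  # 73% of files
multimedia_bytes = current["music"][1] + current["video"][1]  # 92.5%, about 93% of bytes

# Growth of video relative to the previous studies: roughly a factor of three.
video_file_growth = current["video"][0] / previous["video"][0]  # about 2.9x
video_byte_growth = current["video"][1] / previous["video"][1]  # about 2.7x
```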


33

Summary of findings

Files are randomly distributed; no strong correlation between the available files at peers that are one, two or three hops apart in overlay topology

The files shared by individual peers change slowly over a timescale of days; more popular files experience larger variations in popularity