Indexing data-oriented overlay networks • September 1st, 2005

Indexing data-oriented overlay networks

Presented by: Anwitaman Datta

Joint work with Karl Aberer, Manfred Hauswirth, Roman Schmidt

Ecole Polytechnique Fédérale de Lausanne (EPFL)

Patrons:

NCCR-MICS (www.mics.ch/): Swiss National Centres of Competence in Research, Mobile Information & Communication Systems

Evergrow (www.evergrow.org/): EC FP6, IST priority “Complex System Research”, Contract no. 001935 (FET-IP). Ever-growing global scale-free networks, their provisioning, repair and unique functions.

Structured overlays

♫ Associate each peer with some part of the load, i.e., a partition of the key-space

♪ e.g. as in Distributed Hash Tables (DHT)

♫ Provide an efficient routing mechanism to locate the peer responsible for a particular part of the key-space

♪ Various choices of topology are possible
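
As a minimal illustration of the two ingredients above (and not the routing scheme of any particular overlay), the sketch below gives each peer a contiguous partition of a key-space [0, 1) and locates the responsible peer with a binary search. The peer names and boundaries are made up for the example; a real overlay would resolve the lookup over several network hops rather than in one local table.

```python
from bisect import bisect_right

# Hypothetical partition boundaries: peer i is responsible for
# [bounds[i], bounds[i + 1]), and the last peer for [bounds[-1], 1).
bounds = [0.0, 0.25, 0.5, 0.75]
peers = ["peer-A", "peer-B", "peer-C", "peer-D"]


def responsible_peer(key: float) -> str:
    """Return the peer whose partition of [0, 1) contains `key`."""
    if not 0.0 <= key < 1.0:
        raise ValueError("key must lie in [0, 1)")
    # bisect_right counts the boundaries that are <= key.
    return peers[bisect_right(bounds, key) - 1]
```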

Structured overlay maintenance

♫ Dynamics

♪ Churn: Peers Join/Leave

♪ New data inserted

♫ Standard maintenance mechanisms

♪ Correspond to updating a database index

♪ Traditionally: Overlay evolution has been studied for incremental peer population changes

Challenge #1: Fast construction of structured overlay from scratch

Overlays for data-oriented applications

♫ Hash Tables give constant time look-ups

♪ At the cost of losing ordering information

♪ DHTs need log(n) network hops

♫ Can we preserve (semantic) ordering information?

♪ Skewed load-distribution

Challenge #2: The structured overlay should deal with arbitrary skew of load
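
The trade-off on this slide can be made concrete with a small sketch. It assumes two illustrative key mappings that are not taken from the talk: `hashed_key` spreads words uniformly over [0, 1) with SHA-1, while `ordered_key` reads the leading bytes of a word as a fraction, which keeps lexicographic order but concentrates similar words in a narrow part of the key-space.

```python
import hashlib


def hashed_key(word: str) -> float:
    """Uniform hashing: map a word into [0, 1); lexicographic order is lost."""
    digest = hashlib.sha1(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def ordered_key(word: str, width: int = 8) -> float:
    """Order-preserving mapping: read the first `width` ASCII bytes as a fraction."""
    padded = word.encode("ascii")[:width].ljust(width, b"\0")
    return int.from_bytes(padded, "big") / 256**width


words = ["peer", "peek", "peel", "overlay", "index"]
by_order = sorted(words, key=ordered_key)  # same as lexicographic order
by_hash = sorted(words, key=hashed_key)    # order unrelated to the words
```

With the order-preserving mapping, neighbouring words stay neighbours (useful for range queries), but all five example words fall between 0.40 and 0.45, which is exactly the kind of skew the overlay then has to cope with.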

Toy example: Distributing skewed load

[Figure: skewed load-distribution over the key-space from 0 to 1, with labels 1 to 8]

A globally coordinated recursive bisection approach

♫ Key-space can be divided in two partitions

♪ Assign peers proportional to the load in the two sub-partitions

[Figure: load-distribution over the key-space from 0 to 1, with labels 1 to 8]

A globally coordinated recursive bisection approach

♫ Recursively repeat the process to repartition the sub-partitions

[Figure: load-distribution over the key-space from 0 to 1, with labels 1 to 8]

A globally coordinated recursive bisection approach

♫ Partitioning of the key-space s.t. there is equal load in each partition

♪ Uniform replication of the partitions

♪ Important for fault-tolerance

♫ Note: A novel and general load-balancing problem.

[Figure: load-distribution over the key-space from 0 to 1, with labels 1 to 8]
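
The recursive bisection described on the last three slides can be sketched as follows. This is a simplified, centralized illustration under stated assumptions: keys are numbers in [0, 1), every partition keeps at least `min_replicas` peers (a hypothetical parameter), and recursion stops when there are too few peers to split or when one half holds no keys. The stopping rule used in the actual work is not given in the slides.

```python
import random


def recursive_bisection(keys, n_peers, lo=0.0, hi=1.0, path="", min_replicas=2):
    """Return {binary path: (number of keys, number of peers)} for each partition."""
    mid = (lo + hi) / 2
    left = [k for k in keys if k < mid]
    right = [k for k in keys if k >= mid]
    # Assumed stopping rule: too few peers to split, or one half is empty.
    if n_peers < 2 * min_replicas or not left or not right:
        return {path: (len(keys), n_peers)}
    # Assign peers proportionally to the load of the two halves,
    # keeping at least `min_replicas` peers on each side.
    n_left = round(n_peers * len(left) / len(keys))
    n_left = min(max(n_left, min_replicas), n_peers - min_replicas)
    result = recursive_bisection(left, n_left, lo, mid, path + "0", min_replicas)
    result.update(
        recursive_bisection(right, n_peers - n_left, mid, hi, path + "1", min_replicas)
    )
    return result


rng = random.Random(1)
skewed_keys = [rng.random() ** 3 for _ in range(1000)]  # load concentrated near 0
partitions = recursive_bisection(skewed_keys, n_peers=32)
```

Because the example load is concentrated near 0, the partitions whose path starts with 0 end up deeper and receive most of the peers.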

Lessons from the globally coordinated algorithm

♫ The intermediate partitions may be such that they cannot be perfectly repartitioned.

♪ This is a fundamental limitation of any bisection-based approach, as well as of any overlay network with a fixed key-space partitioning.

♫ Limit of dealing with load skews

♫ Nonetheless practical

♪ For realistic load-skews and peer populations

Achieves an approximate load-balance.

One step: Distributed proportional partitioning (for overlay construction)

♫ Given:

♪ A mechanism to meet other random peers

♪ A parameter p for partitioning the space

♫ Proportional partitioning: Peers partition proportional to the load distribution

♪ In a ratio p : 1-p

♪ Let us call the sub-partitions 0 and 1

♫ Referential integrity: Obtain reference to the other partition

♪ Needed to enable overlay routing

♫ Sorting the load/keys: Peers exchange their locally stored keys so that each peer stores only the keys of its own partition.

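[Figure: example of one partitioning step (partitioning into 0 and 1)

Before the random interaction:
Peer 1 (*): keys 000, 010, 100
Peer 3 (*): keys 101, 001

After the random interaction:
Peer 1: routing table 1: 3; keys 000, 010, 001
Peer 3: routing table 0: 1; keys 101, 100

Legend: routing table, pid, keys (only part of the prefix is shown)]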
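
The example in the figure can be replayed with a short sketch. The dictionary layout (`pid`, `path`, `keys`, `routing`) is an assumption made for illustration and is not the P-Grid data structure; the sketch also assumes that both peers start from the same path and that the first peer takes sub-partition 0.

```python
def partition_step(peer_a, peer_b):
    """One interaction: peer_a takes sub-partition 0, peer_b takes sub-partition 1."""
    level = len(peer_a["path"])  # bit position decided in this step
    pooled = peer_a["keys"] + peer_b["keys"]
    # Sorting the load: each peer keeps only the keys of its own sub-partition.
    peer_a["keys"] = [k for k in pooled if k[level] == "0"]
    peer_b["keys"] = [k for k in pooled if k[level] == "1"]
    # Referential integrity: each peer stores a reference to the other side.
    peer_a["routing"][peer_a["path"] + "1"] = peer_b["pid"]
    peer_b["routing"][peer_b["path"] + "0"] = peer_a["pid"]
    peer_a["path"] += "0"
    peer_b["path"] += "1"


peer1 = {"pid": 1, "path": "", "keys": ["000", "010", "100"], "routing": {}}
peer3 = {"pid": 3, "path": "", "keys": ["101", "001"], "routing": {}}
partition_step(peer1, peer3)
```

After the call, peer 1 holds the keys starting with 0 and a routing entry `1: 3`, and peer 3 holds the keys starting with 1 and a routing entry `0: 1`, as in the figure.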

Heuristic 1: Autonomous partitioning (AUT)

♫ Make an a priori probabilistic decision (parameterized by p) for a sub-partition

♪ Proportionality constraint is automatically met

♫ Find a peer from the other partition

♪ In order to meet the referential integrity constraint

♫ Markovian asymptotic analysis of the process (for p = 0.5)

♪ 2 log(2) interactions (on average) per peer
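
A rough way to check the figure above is a small Monte Carlo sketch. The interaction model is an assumption, since the slides do not spell it out: every peer without a reference contacts uniformly random peers until it meets one from the other sub-partition, and on such a meeting both peers record a reference. Reading log as the natural logarithm, the average under this model should approach 2 log(2) ≈ 1.39 for p = 0.5 and a large population.

```python
import random


def simulate_aut(n_peers, p=0.5, seed=0):
    """Average interactions per peer under the assumed AUT interaction model."""
    if n_peers < 2:
        raise ValueError("need at least two peers")
    rng = random.Random(seed)
    # A priori decision: sub-partition 0 with probability p, otherwise 1.
    side = [0 if rng.random() < p else 1 for _ in range(n_peers)]
    if len(set(side)) < 2:
        raise ValueError("all peers chose the same sub-partition")
    has_ref = [False] * n_peers
    interactions = 0
    for peer in range(n_peers):
        while not has_ref[peer]:
            other = rng.randrange(n_peers - 1)
            if other >= peer:
                other += 1  # uniform over all peers except `peer`
            interactions += 1
            if side[other] != side[peer]:
                # Assumption: both peers obtain a reference from the meeting.
                has_ref[peer] = has_ref[other] = True
    return interactions / n_peers


avg_aut = simulate_aut(20000)
```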

Heuristic 2: Eager partitioning (for p = 0.5)

♫ Undecided peers initiate contact with another random peer

♪ If the contacted peer is also undecided, the contacting and contacted peers decide for different partitions (Balanced split)

♪ If the contacted peer has already decided, the contacting peer decides for the other partition (Unbalanced split)

♫ Markovian asymptotic analysis of the process (for p = 0.5)

♪ log(2) interactions (on average) per peer

♫ AUT is relatively inefficient

♪ AUT wastes interactions in order to find a suitable peer

Challenge: Can we have a strategy which works for all values of p, and is as efficient as eager partitioning when p = 0.5?
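
The eager rules can be simulated in a few lines. This is a sketch under assumptions: contacts are uniformly random over all other peers, only undecided peers initiate, and log is read as the natural logarithm, so the expected average is about 0.69 interactions per peer. The function name and return values are made up for the example.

```python
import random


def simulate_eager(n_peers, seed=0):
    """Return (interactions per peer, fraction of peers in sub-partition 0)."""
    if n_peers < 2:
        raise ValueError("need at least two peers")
    rng = random.Random(seed)
    side = [None] * n_peers  # None means "undecided"
    interactions = 0
    for peer in range(n_peers):
        if side[peer] is not None:
            continue  # already decided when contacted by an earlier peer
        other = rng.randrange(n_peers - 1)
        if other >= peer:
            other += 1  # uniform over all peers except `peer`
        interactions += 1
        if side[other] is None:
            # Balanced split: the two peers take different sub-partitions.
            side[peer], side[other] = rng.sample([0, 1], 2)
        else:
            # Unbalanced split: take the sub-partition the other peer did not.
            side[peer] = 1 - side[other]
    return interactions / n_peers, side.count(0) / n_peers


avg_eager, frac0_eager = simulate_eager(20000)
```

Every initiating peer decides in a single interaction, which is why this strategy needs about half as many interactions as the AUT heuristic in the same setting.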

AEP: Adaptive eager partitioning (w.l.o.g., p ≤ 0.5)

♫ Undecided peers initiate contact with other random peers

♪ If contacted peer is also undecided, perform Balanced split with probability:

♪ Since we need more peers (a fraction 1-p) in sub-partition 1

♪ If the contacted peer has already decided for 0, contacting peer decides for 1

♪ If the contacted peer has already decided for 1, contacting peer decides for 0 with a probability:

♪ 1 otherwise, since we need more peers in sub-partition 1
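
The decision rules above can be sketched with two explicit parameters. The names `alpha` (probability of a balanced split) and `beta` (probability of deciding for 0 after meeting a peer that decided for 1) are hypothetical, and their values as functions of p are not reproduced here. One further assumption is flagged in the code: when two undecided peers meet and no balanced split is performed, the contacting peer decides for 1.

```python
import random


def simulate_aep(n_peers, alpha, beta, seed=0):
    """Return (interactions per peer, fraction of peers in sub-partition 0)."""
    if n_peers < 2:
        raise ValueError("need at least two peers")
    rng = random.Random(seed)
    side = [None] * n_peers  # None means "undecided"
    interactions = 0
    for peer in range(n_peers):
        if side[peer] is not None:
            continue
        other = rng.randrange(n_peers - 1)
        if other >= peer:
            other += 1  # uniform over all peers except `peer`
        interactions += 1
        if side[other] is None:
            if rng.random() < alpha:
                side[peer], side[other] = rng.sample([0, 1], 2)  # balanced split
            else:
                side[peer] = 1  # ASSUMPTION: contacting peer joins the larger side
        elif side[other] == 0:
            side[peer] = 1
        else:
            side[peer] = 0 if rng.random() < beta else 1
    return interactions / n_peers, side.count(0) / n_peers


avg_same, frac0_same = simulate_aep(20000, alpha=1.0, beta=1.0)  # eager partitioning
avg_low, frac0_low = simulate_aep(20000, alpha=1.0, beta=0.0)
```

With `alpha = 1` and `beta = 1` the sketch reduces to eager partitioning. With `alpha = 1` and `beta = 0`, the fraction of peers in sub-partition 0 should come out close to 1 - log(2) ≈ 0.31 (natural logarithm) under this model.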

Adaptive eager partitioning: choice of parameters

♫ Markovian analysis of the interactions

♪ Parameterized equations for the two probabilities used by AEP

♫ 0 ≤ p ≤ 1-log(2)

♫ 1-log(2) ≤ p ≤ 0.5

AEP: Without global knowledge of p

♫ If we only have local estimates of p

♪ Error analysis: What’s the distribution of the estimates, and how does it affect the partitioning process?

♪ Introduces systematic skew

♪ Favors larger partition

♪ Compensating the skew
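
To give a feeling for how noisy local estimates can be, the sketch below assumes that each peer estimates p as the fraction of its own locally stored keys that fall into sub-partition 0, with keys drawn independently. This sampling model is an assumption; the sketch only shows the spread of the estimates and does not model the systematic skew or its compensation.

```python
import random
import statistics


def local_estimates(true_p, keys_per_peer, n_peers, seed=0):
    """Each peer estimates p from its own sample of `keys_per_peer` keys."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_p for _ in range(keys_per_peer)) / keys_per_peer
        for _ in range(n_peers)
    ]


def spread(estimates):
    """Return (mean, standard deviation) of the local estimates."""
    return statistics.mean(estimates), statistics.stdev(estimates)


mean_est, sd_est = spread(local_estimates(true_p=0.3, keys_per_peer=10, n_peers=5000))
```

With only 10 keys per peer, individual estimates scatter widely around the true value (standard deviation near 0.14 for p = 0.3), so peers act on noticeably different values of p.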

COR: Skew compensated for AEP

Algorithmic Issues: Overlay Construction

♫ Initiating the indexing process

♫ Synchronizing and terminating the process

♪ Synchronizing replicas

♫ Complexity

♪ Latency: O(log²(n)) - linear for sequential processes

♪ Communication: O(n·log²(n)) - same as in sequential processes

Simulation results

♫ Discrete time simulation

♪ Mathematica based proprietary simulator

♫ Workloads

♪ Uniform, Pareto, Normal, real text collection from IR apps. (EU project: Alvis)

♫ Evaluation

♪ Deviation w.r.t. what is obtained by the globally coordinated algorithm

♪ Measured in terms of the Euclidean distance
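
One possible reading of this metric, given here only as an assumption, compares the fraction of peers assigned to each partition by the decentralized process with the fraction assigned by the globally coordinated algorithm. The slides do not state which vectors are compared or how they are normalized.

```python
import math


def deviation(reference, observed):
    """Euclidean distance between two {partition path: peer fraction} mappings."""
    paths = set(reference) | set(observed)
    return math.sqrt(
        sum((reference.get(p, 0.0) - observed.get(p, 0.0)) ** 2 for p in paths)
    )


# Hypothetical peer fractions for three partitions.
coordinated = {"0": 0.5, "10": 0.25, "11": 0.25}
decentralized = {"0": 0.8, "10": 0.1, "11": 0.1}
example_deviation = deviation(coordinated, decentralized)
```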

Simulation results: How useful is the theory?

[Figure: Theory vs. Heuristic (256 peers). Axes: deviation, load distribution. Load distributions: U: Uniform, P: Pareto, N: Normal, A: Alvis IR project text]

Quality of load-balancing w.r.t. peer population

[Figure: deviation for peer populations 256, 512 and 1024, per experiment (population & load distribution). Load distributions: U: Uniform, P: Pareto, N: Normal]

Scalability

[Figure: interactions required per peer for overlay construction, for peer populations 256, 512 and 1024, per experiment (population & load distribution). Load distributions: U: Uniform, P: Pareto, N: Normal, A: Alvis IR project text]

From theory to practice: PlanetLab experiments

♫ PlanetLab Testbed

♪ 400+ computers spread over various organizations and continents (www.planet-lab.org)

♫ Java implementation integrated with P-Grid

♪ P-Grid is a full-fledged P2P software (www.p-grid.org)

♫ Workload

♪ Text from IR applications studied under EU project Alvis (www.alvis.info)

[Figure: number of peers over the experiment period. Phases: bootstrap the peers and form an unstructured network; structured overlay construction; experiments evaluating search performance; churn]

Simulation vs. Expts (deviation):

Sim    Expt
0.38   0.39

"All models are wrong, but some are useful." - George E.P. Box

Bandwidth consumption

♪ Construction process involves sorting keys.

♪ Initially it has a higher bandwidth requirement.

♪ (Later) In the operational phase, the queries dominate the bandwidth consumption.

[Figure: bandwidth consumption over the experiment period, covering the overlay construction phase and the overlay operational phase]

Overlay performance

♪ Overlay construction was complete and peers discovered all their replicas

♪ Plots show absolute query latency

♪ In terms of overlay hops, experiments match theory

♪ Churn leads to larger deviation, but 95% to 100% success rate

[Figure: query latency over the experiment period. Labels: Churn, No churn]

Related work

♫ Mostly sequential construction

♪ Recent work on fast overlay construction [SPAA 2005]

♪ Does not deal with load-balancing

♫ Load-balancing

♪ Mostly addresses uniform load-distribution case

♪ Some work on skewed loads [e.g., VLDB 2004]

♪ Incremental load/peer population changes

♪ No dynamic adaptation of replication

www.p-grid.org

Java implementation source-code available for download

Also: Range query paper @ IEEE P2P 2005