1 End-to-End Detection of Shared Bottlenecks Sridhar Machiraju and Weidong Cui Sahara Winter Retreat 2003.

1

End-to-End Detection of Shared Bottlenecks

Sridhar Machiraju and Weidong Cui

Sahara Winter Retreat 2003

2

Problem Statement

• Given 2 end-to-end flows f1 and f2, do they share a bottleneck (a congested link, i.e., a link with packet drops)?

(OR)

• Given 2 routes R1 and R2 on the Internet, do they share a bottleneck link?

3

Why is this hard?

• No information from the network
• Only information available – delay and drops
• Lots of noise – delay from intermediate links and drops on other links
• Bottlenecks may change over time

4

Why solve this problem?

• Overlays
– RON – Decide if rerouting flows bypasses congestion points or not
– RON – Does such rerouting affect existing flows? Which ones?
– Cooperative overlays – overlay does not want to share bottleneck with a “friendly overlay”
– OverQoS – Useful to cluster together overlay links based on shared bottlenecks

5

Why solve this problem (cont.)?

• Other applications
– Massive backups of data from different servers – do them in parallel?
– Content distribution – is the use of multipath going to improve performance?
– Kazaa – parallel downloads from peers
– Multihomed ASs can evaluate the “orthogonality” in terms other than fault-tolerance

6

Related Work

• Past work done only with Y or Inverted-Y topologies using Poisson probes, packet pairs and inter-arrival times.

[Figure: topology diagram; labels: Receivers, Senders]

7

Goals

• Provide a general solution for double-Y topology

• Work with multiple bottlenecks and provide an indicator of shared congestion

• Be able to use active probe flows and also passively observed (TCP) flows

• Complexity issues for clustering flows

8

Motivation of Our Techniques

• Droptail queues + TCP – queues alternate between bursty loss periods and periods with no losses

• Queues build up until a burst of losses, then shrink before growing again

• Provides motivation for correlating periods of drops and delays (proportional to queue sizes)

• But…

9

Synchronization Lag

[Figure: timelines of numbered probe packets (0 to 8) from Sender 1 and Sender 2, seen as Flow 1 and Flow 2; labels: 0, T, d1, d2+, Time]

Synchronization Lag = 3T

Note: [symbol lost] is bounded by RTTmax/2

10

Overview of Our Techniques

• We propose 2 techniques
– Probability Distribution (PD) technique
– Cross-Correlation (CC) technique

• PD is based on finding the peak of the discrete probability distribution of the minimum time between a drop of one flow and a drop of the other

• CC is based on finding the maximum cross-correlation over a range of assumed synch. lags

11

PD Technique

• For each dropped packet of a flow, take the minimum of the time differences between its sending time and the sending times of the dropped packets of the other flow; plot the PD of these minima

• If the bottleneck is shared, we expect (ideally) a peak of 1 at d2 - d1 + [symbol lost]; not all flows see drops during the same burst, so use a threshold < 1 for the peak

• We may see more than 1 drop in a burst; cluster drops into bursts and use time differences between starts of bursts
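The PD steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `bin_width` used to discretize time differences, and the `burst_gap` used to cluster drops into bursts are all assumptions introduced here, and times are taken to be sending times in seconds.

```python
from collections import Counter

def cluster_bursts(drop_times, burst_gap):
    """Return the start time of each burst.

    Assumption: a drop joins the current burst if it follows the previous
    drop by at most burst_gap seconds.
    """
    starts = []
    last = None
    for t in sorted(drop_times):
        if last is None or t - last > burst_gap:
            starts.append(t)
        last = t
    return starts

def pd_peak(drops1, drops2, bin_width, burst_gap):
    """Peak of the discrete distribution of nearest-burst time differences.

    For each burst start of flow 1, take the signed difference to the
    closest burst start of flow 2, bin it, and return
    (peak probability, time difference at the peak).
    """
    b1 = cluster_bursts(drops1, burst_gap)
    b2 = cluster_bursts(drops2, burst_gap)
    if not b1 or not b2:
        return 0.0, None
    bins = Counter()
    for t in b1:
        diff = min((u - t for u in b2), key=abs)  # smallest |difference|
        bins[round(diff / bin_width)] += 1
    best_bin, count = max(bins.items(), key=lambda kv: kv[1])
    return count / len(b1), best_bin * bin_width

if __name__ == "__main__":
    peak, lag = pd_peak([1.00, 1.01, 5.00, 9.00],
                        [1.06, 5.06, 5.07, 9.06, 12.0],
                        bin_width=0.01, burst_gap=0.05)
    print(peak, lag)
```

The returned peak would then be compared against a threshold below 1, as the slide suggests.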

12

PD technique (contd.)

• Robustness issues: synch. lag must be smaller than the time difference between consecutive drops of a flow

[Figure: Delay1 and Delay2 series with Packet Loss marks]

13

Cross-Correlation (CC) Technique

• Key ideas

– Two “back-to-back” packets from two different flows will experience similar packet drop/delay at the bottleneck

– If we can generate two sequences of “back-to-back” packets from two different flows, then we can calculate their cross-correlation coefficient of losses or delays to measure their “similarity”.

– If the cross-correlation coefficient is greater than some threshold, then the two flows share a bottleneck.
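As a rough illustration of the key ideas above, the coefficient can be computed as a Pearson correlation over two paired sequences (loss indicators 0/1, or delays). The slides do not say which estimator was used, so the Pearson form, the function names, and the handling of constant sequences are assumptions; the 0.1 default mirrors the threshold used in the experiments.

```python
import math

def cc_coefficient(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)
    if n == 0:
        return 0.0
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Assumption: a constant sequence (e.g. no losses) gives no signal.
        return 0.0
    return cov / math.sqrt(vx * vy)

def share_bottleneck(x, y, threshold=0.1):
    """Decide 'shared' when the coefficient exceeds the threshold."""
    return cc_coefficient(x, y) > threshold

if __name__ == "__main__":
    losses1 = [0, 0, 1, 1, 0, 0, 1, 0]
    losses2 = [0, 0, 1, 1, 0, 0, 1, 0]
    print(cc_coefficient(losses1, losses2), share_bottleneck(losses1, losses2))
```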


14

Questions about the CC Technique

• How to generate two sequences of “back-to-back” packets?
– UDP probes with a constant interval T
  • average interval <= T/2
– Shift the sequence to overcome the synch. lag

• How long should the two sequences be to get a significant result?
– When the CC coefficient becomes relatively stable
– But no less than a minimum period of time

• What should the threshold be?
– Use 0.1 in the experiments
– Why 0.1?
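One way to read the T/2 bound above: if both flows probe every T, each packet of one flow has a packet of the other flow within T/2 of it, except possibly at the edges of the trace. The sketch below pairs each flow-1 send time with the nearest flow-2 send time. The function name and the use of integer milliseconds are assumptions made for the example.

```python
from bisect import bisect_left

def pair_nearest(times1, times2):
    """Pair each flow-1 time with the closest flow-2 time.

    times2 must be sorted and non-empty. Returns a list of
    (index in times1, index in times2, absolute gap).
    """
    pairs = []
    for i, t in enumerate(times1):
        k = bisect_left(times2, t)
        # Only the neighbours on either side of t can be the closest.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(times2)]
        j = min(candidates, key=lambda c: abs(times2[c] - t))
        pairs.append((i, j, abs(times2[j] - t)))
    return pairs

if __name__ == "__main__":
    T = 10  # probe interval in ms
    flow1 = [i * T for i in range(10)]
    flow2 = [3 + i * T for i in range(10)]  # hypothetical 3 ms offset
    gaps = [gap for _, _, gap in pair_nearest(flow1, flow2)]
    print(max(gaps), "<=", T / 2)
```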

15

Overcome the Synchronization Problem

[Figure: Delay1 and Delay2 series with Packet Loss marks; annotation: Shift 2 packets]

• Find the max cross-correlation by shifting one of the two sequences within some range

• The value of the optimal shift is an estimate of the synchronization lag.
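The shift search can be sketched as below, under stated assumptions: both sequences are sampled at the same probe interval, the coefficient is a Pearson correlation, and a positive shift s pairs sample i of the first sequence with sample i + s of the second. The names `best_shift` and `max_shift` and the sign convention are illustrative, not taken from the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def best_shift(x, y, max_shift, interval):
    """Search shifts in [-max_shift, max_shift] samples.

    Returns (max coefficient, estimated lag = best shift * interval),
    or (0.0, 0.0) if no shift leaves at least two overlapping samples.
    """
    best = None
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            xs, ys = x[:len(x) - s], y[s:]
        else:
            xs, ys = x[-s:], y[:len(y) + s]
        m = min(len(xs), len(ys))
        if m < 2:
            continue
        c = pearson(xs[:m], ys[:m])
        if best is None or c > best[0]:
            best = (c, s)
    if best is None:
        return 0.0, 0.0
    return best[0], best[1] * interval

if __name__ == "__main__":
    x = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
    y = [0, 0] + x[:-2]  # same pattern, delayed by 2 samples
    print(best_shift(x, y, max_shift=5, interval=0.01))
```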

16

Wide-Area Experiments

• Challenges
– Access to hosts distributed globally?
– How to verify our experimental results?

• Solutions
– PlanetLab (http://www.planet-lab.org)
– Set up an overlay network with double-Y topology
– Application-level routers monitor losses and delays

17

Topology with Shared Bottleneck (I)

[Figure: overlay topology; nodes: Vancouver, Seattle, Wisc, Atlanta, Bologna, Sydney]

18

Topology without Shared Bottleneck (II)

[Figure: overlay topology; nodes: Vancouver, Seattle, Wisc, Atlanta, Bologna, Sydney]

19

Experimental Setup

• Active Probing
– 40 bytes per packet
– Every 10ms

• Log packet arrival times on every node
– Can also get loss information from these logs

• Traces from 10mins to 60mins

• Threshold = 0.1 for the PD and CC techniques
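For a sense of scale, the probing parameters above translate into a per-flow probe count and bit rate as follows. This is only worked arithmetic; it assumes the 40 bytes is the whole probe size (any additional header overhead would raise the rate), and the function name is introduced here for illustration.

```python
def probe_load(packet_bytes, interval_ms, duration_s):
    """Return (number of probes sent, probe rate in bit/s) for one flow."""
    packets = duration_s * 1000 // interval_ms
    rate_bps = packet_bytes * 8 * 1000 / interval_ms
    return packets, rate_bps

if __name__ == "__main__":
    # 40-byte probes every 10 ms over the shortest (10 min) trace.
    print(probe_load(40, 10, 10 * 60))
```

Under these assumptions a 10-minute trace holds 60000 probes per flow at 32 kbit/s, and a 60-minute trace holds 360000.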

20

Overall Results

              Packet Drops          PD Technique          Loss CC Technique     Delay CC Technique
Exp #         Shared   Non-shared   Peak Value  Est. Lag  CC Coeff.  Est. Lag   CC Coeff.  Est. Lag
1 (20mins)    3        2096         < 0.1       -         < 0.1      -          < 0.1      -
2 (10mins)    6772     165          0.21        60ms      0.22       50ms       0.12       50ms
3 (10mins)    2070     32           0.45        100ms     0.81       80ms       < 0.1      -
4 (10mins)    81       2252         < 0.1       -         0.38       -1.17s     0.99       -1.17s
5 (30mins)    0        5565         < 0.1       -         < 0.1      -          < 0.1      -
6 (60mins)    10272    1127         < 0.1       -         0.23       6s         < 0.1      -
7 (10mins)    1592     57           < 0.1       -         0.75       -1.15      < 0.1      -
8 (10mins)    1895     112          0.11        180ms     0.55       300ms      < 0.1      -

Failed Cases

21

Why does the Delay CC Technique fail?

• Delay spikes at the non-shared part.

22

Why does the PD Technique fail?

• Large synchronization lag
• Few drops at the bottleneck

23

Open Issues

• Parameter Selection
– What should the thresholds be?

• Active vs. Passive Probing
– Active probing: wastes network resources
– Passive probing: cannot control the size/rate of the probing sequences

• Multiple Bottlenecks
– Our techniques are not limited to the case of a single bottleneck
– But they need more quantitative evaluation

• Probability of sharing a bottleneck
– How often should we generate a probing sequence to detect if two flows share a bottleneck?
– Can we give a probability rather than a 0-1 decision?

24

Conclusions

• Problem
– Detect if 2 end-to-end flows share a bottleneck

• Challenge
– Synchronization lag in double-Y topology

• Techniques
– The Probability Distribution Technique
– The Loss/Delay Cross-Correlation Technique

• Experimental Results
– The Loss CC technique succeeds in all experiments
– The Delay CC technique fails in some experiments due to delay spikes at the non-shared part
– The PD technique fails in some experiments due to large synch. lag and few losses at the bottleneck
