User-level Internet Path Diagnosis R. Mahajan, N. Spring, D. Wetherall and T. Anderson.

Dec 21, 2015

Page 1

User-level Internet Path Diagnosis

R. Mahajan, N. Spring, D. Wetherall and T. Anderson

Page 2

The network is a black box…

...so what can I do?

1. We want users to be able to diagnose their paths

2. Communicate information to the ISP or NOC to improve the network

Page 3

TULIP: User-level path diagnosis

Objectives:

Detect performance faults that affect a user’s flows. This involves measuring the magnitude of the fault (queuing delay, loss) and localizing the faulty link.

Page 4

How TULIP does it

Ideal Architecture – Packet-based solutions

Each router the packet traverses adds information to the packet: a timestamp and the global address of the router’s input interface.

Issue: Packet size increases at each hop. Losing a packet loses all the information it carried. Corruption of a packet might lead to incorrect diagnosis data (although most corruptions are treated as losses).
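
The ideal scheme above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the names `HopRecord`, `Packet`, and `traverse`, the 12-byte record size, and the idea that router clocks are synchronized are all hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class HopRecord:
    interface: str        # global address of the router's input interface
    timestamp_ms: float   # time at which the router saw the packet


@dataclass
class Packet:
    payload: bytes
    records: list = field(default_factory=list)

    def size(self, record_bytes: int = 12) -> int:
        # Hypothetical encoding: each appended record costs 12 bytes.
        return len(self.payload) + record_bytes * len(self.records)


def traverse(packet, hops):
    """Simulate routers appending (interface, timestamp) as the packet passes."""
    for interface, timestamp_ms in hops:
        packet.records.append(HopRecord(interface, timestamp_ms))
    return packet


def per_hop_delays(packet):
    """Delay between consecutive routers, assuming synchronized clocks."""
    stamps = [r.timestamp_ms for r in packet.records]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

The sketch makes the stated issue visible: `size()` grows with every hop, and if the packet is dropped, every record it carried is lost with it.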

Page 5

Because things are never ideal

Basic architecture sufficient for data collection

Assets: Fixed packet size and sufficient information…

Assuming: stationarity of paths (paths between source and destination don’t change too often)

Page 6

Diagnosis tools in use in TULIP

Out-of-band measurement probes (or TTL-based search): obtain the sample TTL and interface ID

ICMP: router timestamp

IP identifiers: approximation of the per-flow counter
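
Using IP identifiers as a stand-in for a counter only works if the router increments the 16-bit IP-ID field for each packet it generates. A minimal sketch of such a check follows; the function name `looks_like_counter` and the `max_gap` threshold are assumptions chosen for illustration, not values from the slides.

```python
def looks_like_counter(ip_ids, max_gap=1000):
    """Heuristic: consecutive replies carry IP-IDs that rise by small steps.

    ip_ids: IP-ID values from successive replies of one router.
    max_gap: hypothetical bound on how far the counter may advance between
             two of our probes (other traffic also advances it).
    """
    for earlier, later in zip(ip_ids, ip_ids[1:]):
        gap = (later - earlier) % 65536   # the IP-ID field wraps at 2**16
        if gap == 0 or gap > max_gap:
            return False
    return True
```

A router that fills the field with random or constant values fails this check and cannot be measured this way.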

Page 7

How to detect path loss/reordering

Sending two probes to determine the behavior of the remote router
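
One way to read the two-probe idea for reordering is sketched below. It assumes the router stamps each reply with an incrementing IP-ID, so the IDs reveal the order in which the router handled the probes. The function names and the returned dictionary are hypothetical, and the slides do not spell out the exact procedure.

```python
def ipid_before(a, b):
    """True if 16-bit IP-ID a was generated before b (wraparound-aware)."""
    return 0 < (b - a) % 65536 < 32768


def classify_probe_pair(id_first, id_second, first_reply_arrived_first):
    """Classify one two-probe measurement.

    id_first, id_second: IP-IDs in the replies to the probes sent first and second.
    first_reply_arrived_first: whether the reply to the first probe returned first.
    """
    # Forward reordering: the router answered the second probe before the first.
    forward = not ipid_before(id_first, id_second)
    # Order in which the router generated the two replies.
    first_generated_first = not forward
    # Reverse reordering: replies came back in a different order than generated.
    reverse = first_generated_first != first_reply_arrived_first
    return {"forward_reordered": forward, "reverse_reordered": reverse}
```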

Page 8

Packet queuing

ICMP timestamps are used to determine the queuing delay at a router (the median of the samples is used)
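
A minimal sketch of the median-based estimate, under two assumptions that are not stated on the slide: the clock offset between sender and router is constant during the measurement, and at least one sample saw empty queues, so the smallest sample can serve as the baseline. The function name is hypothetical.

```python
from statistics import median


def median_queuing_delay(samples):
    """Estimate queuing delay from ICMP timestamp samples.

    samples: list of (send_time_ms, router_timestamp_ms) pairs.
    Each difference contains clock offset + propagation + queuing; subtracting
    the minimum removes the constant part and leaves the queuing component.
    """
    one_way = [router - sent for sent, router in samples]
    baseline = min(one_way)
    return median(d - baseline for d in one_way)
```

For example, samples whose differences are 1005, 1005, 1012, 1009 and 1006 ms give residuals 0, 0, 7, 4 and 1 ms, so the median queuing delay is 1 ms.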

Page 9

The TULIP methods

To perform the measurement, TULIP uses two “scanning” methods:

Binary search (reduces diagnostic traffic, but at the cost of diagnosis time)

Parallel search (interleaves measurements to different routers by cycling through them in turn)
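
The trade-off between the two methods can be sketched as follows. The callback `path_is_faulty(h)` is hypothetical: it stands for one measurement run of the path from the source to hop `h`, and the sketch assumes that if the path to some hop is faulty, the path to every later hop is faulty too.

```python
def binary_search_fault(num_hops, path_is_faulty):
    """Locate the first faulty hop with sequential runs, one router at a time.

    Returns (hop, runs). Assumes the path to the last hop is faulty.
    """
    lo, hi = 1, num_hops
    runs = 0
    while lo < hi:
        mid = (lo + hi) // 2
        runs += 1
        if path_is_faulty(mid):
            hi = mid          # fault lies at or before mid
        else:
            lo = mid + 1      # fault lies after mid
    return lo, runs


def parallel_search_fault(num_hops, path_is_faulty):
    """Probe every hop in one interleaved round.

    Returns (hop, rounds). One round, but it probes all num_hops routers,
    so it generates more traffic than a binary-search run.
    """
    results = [path_is_faulty(h) for h in range(1, num_hops + 1)]
    return results.index(True) + 1, 1
```

On a 16-hop path with the fault at hop 11, the binary search needs 4 sequential runs, while the parallel search needs a single round that measures all 16 routers.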

Page 10

Network Load and Diagnosis Time

Because a router’s behavior is relatively stationary, TULIP can provide accurate results with an approximate diagnosis time of 10/30 min.

The load is B/W for binary search and LB/W for parallel search (lower bound), where:

L: number of measurable routers

B: bandwidth cost of the probes

W: wait time (usually 1 s)
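
The two load expressions can be worked through with a small helper. The numbers in the usage note are hypothetical and chosen only to make the arithmetic easy to follow; the function name is also an assumption.

```python
def probe_load(bandwidth_cost_bits, wait_s, routers=1):
    """Probing load in bits per second.

    Binary search probes one router at a time: B / W.
    Parallel search probes all L measurable routers: L * B / W.
    """
    return routers * bandwidth_cost_bits / wait_s
```

With B = 8000 bits per measurement and W = 1 s, `probe_load(8000, 1.0)` gives 8000 bit/s for binary search, and with L = 15 routers `probe_load(8000, 1.0, routers=15)` gives 120000 bit/s for parallel search.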

Page 11

Diagnosing granularity

The granularity is the weighted average of the lengths of its diagnosable segments.

[Figure: example topology with matrix G, Rank(G)=2]
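
A short worked sketch of the granularity measure. The slide does not say what the weights are; the code assumes each segment is weighted by its own length (so longer segments count more), which should be treated as an assumption. The function name is hypothetical.

```python
def granularity(segment_lengths):
    """Weighted average of diagnosable segment lengths, in hops.

    Assumption: the weight of a segment is its length, so the result is
    sum(l * l) / sum(l).
    """
    total = sum(segment_lengths)
    return sum(length * length for length in segment_lengths) / total
```

Under this assumption a 4-hop path split into segments of 1 and 3 hops has granularity (1·1 + 3·3) / 4 = 2.5 hops, while a path where every hop is individually diagnosable has granularity 1.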

Page 12

Varying granularity for different measurements

• 50% of the paths have a granularity of less than 3 hops (75% less than 4)

• TULIP matches an ideal tomography implementation

Page 13

Validation

Compared results with PlanetLab coupled with a tomography system

Use a measure, “rate delta”, computed as the rate at the far end of a segment minus the rate at the near end.

Negative values imply a lack of consistency (values span too large a range)
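
The rate-delta check can be sketched in a few lines. The input format, a list of fault rates measured to successive points along the path, and both function names are assumptions made for illustration.

```python
def rate_deltas(rates_along_path):
    """Rate at the far end minus rate at the near end, for each segment."""
    return [far - near for near, far in zip(rates_along_path, rates_along_path[1:])]


def fraction_consistent(deltas):
    """Share of segments whose delta is non-negative.

    A negative delta means the fault rate measured to a farther point is lower
    than the rate to a nearer point, which is inconsistent.
    """
    return sum(d >= 0 for d in deltas) / len(deltas)
```

For rates of 0.0, 0.25, 0.5 and 0.25 measured to four successive routers, the deltas are 0.25, 0.25 and -0.25, so two of the three segments are consistent.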

Page 14

Reordering Results

85% of the results are consistent for the forward path

75% for the round trip (due to the asymmetric nature of some paths)

Page 15

Loss results

• Again, 85% of the deltas are non-negative

• The round-trip counterpart is less affected by asymmetry than in the reordering diagnosis (because loss usually occurs close to the destination)

Page 16

Queuing Results

• ICMP message generation has poor timestamp resolution (the two medians, one from tcpdump on PlanetLab and one from TULIP, are within 2 ms of each other).

• Forward path shows that queuing delay is consistent (very few negative values)

• Round trip reflects the variability in the return path

Page 17

The last mile…

The first hops from the user are the bottleneck

Page 18

Persistence of a fault

• We check for how many iterations TULIP yields similar results

• 80% of the paths show faults persisting long enough for TULIP to diagnose them (typical time a binary search takes to locate a fault: 6 runs)

Page 19

Conclusions

Network operators would be able to diagnose links efficiently

And a user too … if the world were populated entirely by computer nerds.

Page 20

Issues…

Multiple TULIP users could reduce the accuracy of the probing method (the per-flow counter)

An application doesn’t experience the network the same way an active measurement does (TCP, application-dependent behavior, as well as flags)

Page 21

…and possible improvements

Per-flow counter at the router level (unrealistic)

Hash source address and IPID (for flow)

ICMP timestamps that carry reception time as well as transmission time (allows calculating how long the packet is processed at the router)
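
The last improvement can be illustrated with a small calculation. ICMP timestamp fields count milliseconds since midnight UT, so the difference has to tolerate a wrap at midnight. The function name is hypothetical, and the sketch assumes the router fills the receive and transmit fields with distinct, accurate values.

```python
MS_PER_DAY = 86_400_000  # ICMP timestamps count milliseconds since midnight UT


def router_processing_ms(receive_ts, transmit_ts):
    """Time the packet spent at the router: transmit time minus receive time.

    The modulo handles a reply that straddles midnight.
    """
    return (transmit_ts - receive_ts) % MS_PER_DAY
```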