A Non-intrusive, Wavelet-based Approach To Detecting Network Performance Problems
Polly Huang (ETH Zurich), Anja Feldmann (U. Saarbruecken), Walter Willinger (AT&T Labs-Research)
Mar 27, 2015
Road Map
Motivation and rationale
Mechanism details
Conclusion and outlook
Performance Problem
[Figure: three protocol stacks (Web, TCP, Network, Link/Physical) connected through the Internet, one of them labeled Google.com. Problem sources marked on the figure: congestion, routing, server, proxy, else.]
Current State
Active probing
  Examples: traceroute, ping
  Disturbing: injects unnecessary traffic
  Biasing: distorts the metrics of interest
  'Heisenberg' effects
Passive measurements
  Examples: Cisco NetFlow, IP Accounting, other packet-level measurements
  Give much information
  Do not infer problems inside the network
What Would Be Cool
Passive
Trigger alerts in real time
For problems due to
  Server load
  Congestion
  Routing error
Common symptoms
  Delay and drop
TCP’s Closed-loop Control
Delays/drops are reflected in RTT/RTO estimations
  RTT: round-trip time
  RTO: retransmission timeout
Quality of network path
  Values of RTT/RTO estimations
  Amounts of RTT/RTO samples
Can be measured passively
Detailed Estimation
Methodology
  A hash table of all data packets observed
  One RTT sample per data-ack pair
  One RTO sample per data-data pair
Slow
  Cost ~ #packets per observation period
  Especially with high data rate connections (the likely troublemakers)
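The methodology above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Packet` record, its field names, and the rule of skipping ACKs for retransmitted data are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed TCP packet; field names are illustrative."""
    ts: float             # capture timestamp, seconds
    flow: str             # identifies the connection's data direction
    seq: int = 0          # first sequence number of the payload
    length: int = 0       # payload bytes
    ack: int = 0          # acknowledgment number
    is_ack: bool = False  # True for a pure ACK travelling the reverse way


def estimate_samples(packets):
    """Return (rtt_samples, rto_samples) from a time-ordered packet list."""
    # Hash table of observed data packets:
    # (flow, expected ack number) -> (last send time, was it retransmitted?)
    pending = {}
    rtt, rto = [], []
    for p in packets:
        if p.is_ack:
            entry = pending.pop((p.flow, p.ack), None)
            # Assumption: ignore ACKs of retransmitted data, because it is
            # ambiguous which copy of the segment they acknowledge.
            if entry is not None and not entry[1]:
                rtt.append(p.ts - entry[0])      # data-ack pair
        else:
            key = (p.flow, p.seq + p.length)
            if key in pending:
                rto.append(p.ts - pending[key][0])  # data-data pair
                pending[key] = (p.ts, True)
            else:
                pending[key] = (p.ts, False)
    return rtt, rto


trace = [
    Packet(0.0, "a", seq=1, length=100),
    Packet(0.25, "a", ack=101, is_ack=True),
    Packet(1.0, "a", seq=101, length=100),
    Packet(3.0, "a", seq=101, length=100),   # retransmission
    Packet(3.25, "a", ack=201, is_ack=True),
]
print(estimate_samples(trace))  # ([0.25], [2.0])
```

The table holds one entry per unacknowledged data packet and is touched on every packet, which is why the cost grows with the number of packets per observation period.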
Objectives
Passive measurement
  Non-intrusive
Infer quality of network paths
  Detecting network performance problems
Efficiently (so it can be done in real time)
  Wavelet-based technique
Road Map
Motivation and rationale
Mechanism details
Conclusion and outlook
Wavelet-based Technique
Theoretical ground
  Wavelet transform
  Energy plots (or scaling plots)
  Interpreting energy plots
WIND, the problem detection tool
  Features & examples
  Detection methodology
  Validation effort
Theoretical Ground
FFT
  Frequency decomposition
  $f_j$, Fourier coefficient
  Amount of the signal in frequency $j$
WT: wavelet transform
  Frequency (scale) and time decomposition
  $d_{j,k}$, wavelet coefficient
  Amount of the signal in frequency $j$, time $k$
Wavelet Example
[Figure: wavelet shape, axis ticks 0, -1, 1]
Signal: 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

Scale   s (s1..s4)          d (d1..d4)
1       0 0 0 0 2 2 2 2     0 0 0 0 0 0 0 0
2       0 0 4 4             0 0 0 0
3       0 8                 0 0
4       8                   8
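The numbers in this example are consistent with repeatedly taking pairwise sums (s) and pairwise differences (d) of the previous level, without normalization. A minimal sketch of that pyramid follows; the function name `haar_pyramid` and the sign convention (first element minus second) are assumptions for illustration, so the coarsest coefficient comes out as -8 where the slide shows the magnitude 8.

```python
def haar_pyramid(signal):
    """Return (s_levels, d_levels), finest scale first.

    Each level halves the previous one: s holds pairwise sums,
    d holds pairwise differences (first minus second, unnormalized).
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    s_levels, d_levels = [], []
    current = list(signal)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        d_levels.append([a - b for a, b in pairs])
        current = [a + b for a, b in pairs]
        s_levels.append(current)
    return s_levels, d_levels


step = [0] * 8 + [1] * 8
s, d = haar_pyramid(step)
print(s)  # [[0, 0, 0, 0, 2, 2, 2, 2], [0, 0, 4, 4], [0, 8], [8]]
print(d[-1])  # [-8]
```

Only the coarsest scale sees the step, because the jump falls on a pair boundary at every finer scale.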
Self-similarity
Energy function
  $E_j = \sum_k (d_{j,k})^2 / N_j$
Self-similar process
  $E_j = 2^{j(2H-1)} C$  <- the magic!!
  $\log_2 E_j = (2H-1)\, j + \log_2 C$
  Linear relationship between $\log_2 E_j$ and $j$
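The linear relationship suggests a simple way to read $H$ off an energy plot: fit a line to $\log_2 E_j$ against $j$ and solve slope $= 2H-1$. The sketch below does this with an ordinary least-squares fit. The function names are hypothetical, and it assumes the coefficients $d_{j,k}$ are normalized the way the formula above expects.

```python
import math


def energy(d_levels):
    """E_j = sum_k d_{j,k}^2 / N_j for each scale j (finest first)."""
    return [sum(c * c for c in level) / len(level) for level in d_levels]


def hurst_from_energy(energies, first_scale=1):
    """Fit log2(E_j) = (2H-1) j + log2(C); return (H, C)."""
    pts = [(j, math.log2(e))
           for j, e in enumerate(energies, start=first_scale) if e > 0]
    if len(pts) < 2:
        raise ValueError("need at least two scales with non-zero energy")
    n = len(pts)
    mean_j = sum(j for j, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    slope = (sum((j - mean_j) * (y - mean_y) for j, y in pts)
             / sum((j - mean_j) ** 2 for j, _ in pts))
    intercept = mean_y - slope * mean_j
    return (slope + 1) / 2, 2 ** intercept


# Synthetic check: energies generated from H = 0.8, C = 3.
synthetic = [3 * 2 ** (j * (2 * 0.8 - 1)) for j in range(1, 9)]
h, c = hurst_from_energy(synthetic)
```

On measured traffic the fit would normally be restricted to the range of scales where the plot actually looks linear.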
Self-similar Traffic
Effect of Periodicity
[Figure panels: self-similar, Internet traffic]
Adding Periodicity
Packets arrive periodically, 1 pkt/23 msec
Coefficients cancel out at scale 4
Signal: 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

Scale   s (s1..s4)          d (d1..d4)
1       1 0 0 0 1 0 0 0     1 0 0 0 1 0 0 0
2       1 0 1 0             1 0 1 0
3       1 1                 1 1
4       2                   0
Simulation Traffic: Single RTT
Simulation Traffic: Congestion
Interpreting Energy Functions
Abrupt knees at
  RTT time scale
  RTO time scale
Knee shifts
  RTT/RTO time changes
Low energy level (after normalization)
  Congestion
  Low traffic volume
WIND - The Detection Tool
Wavelet-based Inference for Network Detection
Based on libpcap and tcpdump
On-line mode (efficient)
  Per packet: compute $d_{j,k}$
  Per observation period: output $E_j$
  On a subnet basis
Off-line mode
  Detailed RTT/RTO estimation
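One way the on-line mode could keep per-packet work small is to count packets in fixed-width time bins and push each closed bin up a pyramid of pairwise sums and differences, accumulating squared differences per scale. The sketch below illustrates that idea only; the class `OnlineEnergy`, its parameters, and the unnormalized sum/difference convention are assumptions, not a description of WIND's internals.

```python
class OnlineEnergy:
    """Streaming per-scale energy from packet timestamps (illustrative)."""

    def __init__(self, levels, bin_width, start=0.0):
        self.levels = levels
        self.bin_width = bin_width
        self._reset(start)

    def _reset(self, start):
        self.pending = [None] * self.levels   # unpaired value per scale
        self.sumsq = [0.0] * self.levels      # running sum of d^2 per scale
        self.count = [0] * self.levels        # number of d values per scale
        self.bin_index = int(start // self.bin_width)
        self.bin_count = 0                    # packets in the open bin

    def _push(self, value, level=0):
        if level == self.levels:
            return
        first = self.pending[level]
        if first is None:                     # wait for the partner value
            self.pending[level] = value
            return
        self.pending[level] = None
        d = first - value                     # detail coefficient
        self.sumsq[level] += d * d
        self.count[level] += 1
        self._push(first + value, level + 1)  # sum feeds the coarser scale

    def _advance(self, index):
        while self.bin_index < index:         # close bins, including empty ones
            self._push(self.bin_count)
            self.bin_count = 0
            self.bin_index += 1

    def add_packet(self, ts):
        self._advance(int(ts // self.bin_width))
        self.bin_count += 1

    def end_period(self, end_ts):
        """Close all bins before end_ts and return [E_1, ..., E_levels]."""
        self._advance(int(end_ts // self.bin_width))
        energies = [s / n if n else None
                    for s, n in zip(self.sumsq, self.count)]
        self._reset(end_ts)
        return energies


meter = OnlineEnergy(levels=4, bin_width=1.0)
for ts in (0.0, 8.0):          # one packet every 8 bins, as in the periodic example
    meter.add_packet(ts)
print(meter.end_period(16.0))  # [0.25, 0.5, 1.0, 0.0]
```

The state is a handful of numbers per scale, independent of the number of packets. The zero at scale 4 matches the periodic example, where the coefficients cancel out at that scale. Running one such meter per subnet would give the per-subnet output.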
Real Traffic: By Subnets
Real Traffic: By Periods
Real Traffic: By Periods
Detection Methodology
Reference function
  Smoothed average
Difference
  Area below the reference function
  Weighted sum by scale
Flagged interesting
  Top 10% deviations
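The three steps can be sketched as follows. The slides do not specify how the average is smoothed or how the scales are weighted, so this example assumes a plain per-scale mean over all periods as the reference and uniform weights by default. All function names are hypothetical.

```python
import math


def reference(curves):
    """Per-scale mean of the energy curves (assumed reference function)."""
    n = len(curves)
    return [sum(c[j] for c in curves) / n for j in range(len(curves[0]))]


def deviation(curve, ref, weights):
    """Weighted area by which the curve falls below the reference."""
    return sum(w * max(0.0, r - x) for x, r, w in zip(curve, ref, weights))


def flag_periods(curves, weights=None, fraction=0.10):
    """Return (indices of the top `fraction` deviations, all scores)."""
    if weights is None:
        weights = [1.0] * len(curves[0])   # assumption: uniform weights
    ref = reference(curves)
    scores = [deviation(c, ref, weights) for c in curves]
    k = max(1, math.ceil(fraction * len(curves)))
    ranked = sorted(range(len(curves)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores


# Ten periods of log2 energy values; period 6 has unusually low energy.
curves = [[5.0, 5.0, 5.0] for _ in range(10)]
curves[6] = [5.0, 1.0, 1.0]
flagged, scores = flag_periods(curves)
print(flagged)  # [6]
```

Only the area below the reference is counted, so periods with unusually high energy are not flagged by this score.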
Pick Out Interesting Ones
Periods 26, 30, 31
Validation By
WIND off-line mode
  Detailed RTT/RTO estimations
  Volume
Similar heuristics (area difference)
  CCDF of RTT/RTO
  Ratio of RTO/RTT
  Volume
Validate Periods 26, 30, 31
CCDF of RTO: picks out periods 23, 26, 31
CCDF of RTT: picks out periods 29, 30, 31
80-90% are validated as interesting
Road Map
Motivation and rationale
Mechanism details
Conclusion and outlook
Summary
Detect problems using energy plots
  If self-similar, clean linear relationship
  If periodic, getting knees
  If problems, knee shifts or low energy level
WIND: the online/offline analysis tool
  Passive
  Efficient
Outlook
Full-fledged diagnosing tool
  More sophisticated heuristics
  Use of traceroute data
Illustrative examples
  Using the tool (beta release)
  Using the methodology
Questions?
http://www.tik.ee.ethz.ch/~huang