Network Anomography


Yin Zhang, Zihui Ge, Albert Greenberg, Matthew Roughan

Internet Measurement Conference 2005, Berkeley, CA, USA

Presented by Huizhong Sun
Some slides borrowed from Yin Zhang

Network Anomaly Detection

• Is the network experiencing unusual conditions?
  – Call these conditions anomalies
  – Anomalies can often indicate network problems
    • DDoS attacks, network worms, flash crowds, misconfigurations, vendor implementation bugs, …
  – Need rapid detection and diagnosis
    • Want to fix the problem quickly
• Questions of interest
  – Detection
    • Is there an unusual event?
  – Identification
    • What’s the best explanation?
  – Quantification
    • How serious is the problem?

Network Anomography

• What we want
  – Volume anomalies [Lakhina04]: significant changes in an Origin-Destination (O-D) flow, i.e., a traffic matrix element
  – Detect volume anomalies
  – Identify which O-D pair is affected

[Figure: example network with nodes A, B, C]

Network Anomography

• Challenge
  – It is difficult to measure the traffic matrix directly
  – The anomaly detection problem is somewhat more complex and difficult
    • First, anomaly detection is performed on a series of measurements over a period of time, rather than from a single snapshot.
    • In addition to changes in the traffic, the solution must build in the ability to deal with changes in routing.
• What we have
  – Link traffic measurements: Simple Network Management Protocol (SNMP) data on individual link loads is available almost ubiquitously.
• Network Anomography
  – Infer volume anomalies from link traffic measurements

An Illustration

Courtesy: Anukool Lakhina [Lakhina04]

Anomography = Anomalies + Tomography

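Mathematical Formulation

Problem: Infer changes in TM elements ($x_t$) given link measurements ($b_t$)

Only measure at links

[Figure: routers 1, 2, 3 connected by link 1, link 2, link 3, carrying route 1, route 2, route 3; for example $b_{1,t} = x_{2,t} + x_{3,t}$]

$$
\begin{bmatrix} b_{1,t} \\ b_{2,t} \\ b_{3,t} \end{bmatrix}
=
\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \end{bmatrix}
$$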
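The three-link example on this slide can be sketched in a few lines. This is only an illustration: it assumes the routing matrix reads row by row as 0 1 1 / 1 0 1 / 1 1 0, and the traffic volumes are made-up numbers, not measurements from the paper.

```python
# Routing matrix for the three-link example (assumed reading of the slide):
# A[i][j] = 1 if route j+1 traverses link i+1.
A = [
    [0, 1, 1],  # link 1 carries routes 2 and 3
    [1, 0, 1],  # link 2 carries routes 1 and 3
    [1, 1, 0],  # link 3 carries routes 1 and 2
]

def link_loads(A, x):
    """Compute b = A x: the load on each link given the OD-flow volumes x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Hypothetical OD-flow volumes at one time step (illustrative values only).
x_t = [10, 20, 30]
b_t = link_loads(A, x_t)
print(b_t)  # [50, 40, 30]
```

Only `b_t` is observable in practice; the anomography problem runs in the opposite direction, from link loads back to changes in `x_t`.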

Mathematical Formulation

$b_t = A_t x_t \quad (t = 1, \dots, T)$

Typically massively under-constrained!

[Figure: the same three-router example network]

Static Network Anomography

Time-invariant $A_t$ (= $A$), $B = [b_1 \dots b_T]$, $X = [x_1 \dots x_T]$

[Figure: the same three-router example network]

$B = AX$

Anomography Strategies

• Early Inverse
  1. Inversion
     – Infer OD flows $X$ by solving $b_t = A x_t$
  2. Anomaly extraction
     – Extract volume anomalies $\tilde{X}$ from the inferred $X$
  Drawback: errors in step 1 may contaminate step 2
• Late Inverse
  1. Anomaly extraction
     – Extract link traffic anomalies $\tilde{B}$ from $B$
  2. Inversion
     – Infer volume anomalies $\tilde{X}$ by solving $\tilde{b}_t = A \tilde{x}_t$
  Idea: defer “lossy” inference to the last step

Extracting Link Anomalies $\tilde{B}$

• Temporal Anomography:
  – Fourier / wavelet analysis
    • Link anomalies = the high frequency components
  – ARIMA modeling
    • Diff
    • EWMA (Exponentially Weighted Moving Average) is ARIMA(0, 1, 1)
    • Holt-Winters is ARIMA(0, 2, 2)
  – Temporal PCA
    • PCA = Principal Component Analysis
    • Project columns onto principal link column vectors
• Spatial Anomography:
  – Spatial PCA [Lakhina04]
    • Project rows onto principal link row vectors

Extracting Link Anomalies $\tilde{B}$

• Fourier analysis
  – Fourier analysis decomposes a complex periodic waveform into a set of sinusoids with different amplitudes, frequencies and phases.
  – The sum of these sinusoids can exactly match the original waveform.
  – The idea behind using Fourier analysis to extract anomalous link traffic is to filter out the low frequency components.
  – In general, low frequency components capture the daily and weekly traffic patterns, while high frequency components represent sudden changes in traffic behavior.

  – For a discrete-time signal $x_0, x_1, \dots, x_{N-1}$, the Discrete Fourier Transform (DFT) is defined (in one common normalization) by

    $$f_n = \frac{1}{N} \sum_{k=0}^{N-1} x_k \, e^{-j 2\pi k n / N}, \quad 0 \le n \le N-1$$

  – where $f_n$ is a complex number that captures the amplitude and phase of the signal at the n-th frequency
  – Lower n corresponds to a lower frequency component, with $f_0$ being the DC component,
  – and $f_n$ with n close to N/2 corresponding to high frequencies
  – The Inverse Discrete Fourier Transform (IDFT) is used to reconstruct the signal in the time domain by

    $$x_n = \sum_{k=0}^{N-1} f_k \, e^{j 2\pi k n / N}, \quad 0 \le n \le N-1$$

  – An efficient way to implement the DFT and IDFT is the Fast Fourier Transform (FFT).
  – The computational complexity of the FFT is O(N log N).

Extracting Link Anomalies $\tilde{B}$

• FFT-based anomography
  – 1. Transform link traffic $B$ into the frequency domain: $F = \mathrm{FFT}(B)$, i.e. apply the FFT to each row of $B$. (A row corresponds to the time series of traffic data on one link.)
  – 2. Remove low frequency components: set $F_i = 0$ for $i \in [1, c] \cup [N-c, N]$, where c is a cut-off frequency.
    • (For example, using 10-minute aggregated link traffic data of one week duration, and c = 10N/60, corresponding to a frequency of one cycle per hour.)
  – 3. Transform back into the time domain: $\tilde{B} = \mathrm{IFFT}(F)$. The result is the high frequency components in the traffic data, which we use as the anomalous link traffic.
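The three steps can be sketched for a single link as follows. This is a minimal illustration, not the authors' implementation: it uses a naive O(N²) DFT from the standard library instead of an FFT, a 1/N normalization on the forward transform, 0-based frequency indices with a symmetric cut (indices 0..c and N-c..N-1) so that the output stays real, and a synthetic series with an injected spike. All of these choices are assumptions made for the example.

```python
import cmath
import math

def dft(x):
    """Naive DFT with 1/N normalization: f_n = (1/N) * sum_k x_k e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def idft(f):
    """Inverse of dft above: x_n = sum_k f_k e^{+j 2 pi k n / N}."""
    N = len(f)
    return [sum(f[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def fourier_anomalies(series, c):
    """High-pass filter one link's time series.

    Zeroes the DC term and the low-frequency indices 1..c together with their
    mirror images N-c..N-1 (0-based indexing, an assumption of this sketch).
    """
    N = len(series)
    f = dft(series)
    for i in range(N):
        if i <= c or i >= N - c:
            f[i] = 0
    return [v.real for v in idft(f)]

# Synthetic link load: one slow cycle over the window plus a spike at t = 20.
N = 32
series = [10 * math.cos(2 * math.pi * t / N) for t in range(N)]
series[20] += 8  # injected anomaly (illustrative value)

anomalies = fourier_anomalies(series, c=2)
peak = max(range(N), key=lambda t: abs(anomalies[t]))
print(peak)  # 20
```

The slow cycle lives entirely in the removed low-frequency bins, so what remains is dominated by the spike.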

Extracting Link Anomalies $\tilde{B}$

• Wavelet analysis
  – 1. Use wavelets to decompose $B$ into different frequency levels: $W = \mathrm{WAVEDEC}(B)$, by applying a multi-level 1-D wavelet decomposition to each row of $B$.
  – 2. Remove low- and mid-frequency components in $W$ by setting all coefficients at frequency levels higher than $w_c$ to 0, yielding $W'$. Here $w_c$ is a cut-off frequency level.
  – 3. Reconstruct the signal: $\tilde{B} = \mathrm{WAVEREC}(W')$. The result is the high-frequency components in the traffic data.
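A rough sketch of the decompose / zero / reconstruct steps for one link is shown below. It is not the slides' WAVEDEC/WAVEREC: it hard-codes the Haar wavelet, assumes level 1 is the finest (highest-frequency) level, and requires the series length to be divisible by 2^levels. The function names and the sample data are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_wavedec(x, levels):
    """Multi-level 1-D Haar decomposition.

    Returns (approx, details), where details[0] holds the level-1 (finest) coefficients.
    """
    approx = list(x)
    details = []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details

def haar_waverec(approx, details):
    """Inverse of haar_wavedec: rebuild the signal from coarsest to finest level."""
    x = list(approx)
    for d in reversed(details):
        finer = []
        for s, diff in zip(x, d):
            finer.append((s + diff) / SQRT2)
            finer.append((s - diff) / SQRT2)
        x = finer
    return x

def wavelet_anomalies(series, levels, wc):
    """Keep only detail levels 1..wc; zero the coarser details and the approximation."""
    approx, details = haar_wavedec(series, levels)
    approx = [0.0] * len(approx)
    details = [d if lvl <= wc else [0.0] * len(d)
               for lvl, d in enumerate(details, start=1)]
    return haar_waverec(approx, details)

# Slowly rising load with one injected spike at t = 9 (illustrative values).
series = [100.0 + t for t in range(16)]
series[9] += 40.0

anomalies = wavelet_anomalies(series, levels=3, wc=2)
peak = max(range(len(series)), key=lambda t: abs(anomalies[t]))
print(peak)  # 9
```

With the Haar wavelet, keeping levels 1..wc amounts to subtracting the mean of each block of 2^wc samples, which is why the slow ramp mostly disappears and the spike remains.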

Extracting Link Anomalies $\tilde{B}$

• ARIMA Modeling: Box-Jenkins methodology, or AutoRegressive Integrated Moving Average (ARIMA)
  • A class of linear time-series forecasting techniques that capture the linear dependency of future values on the past.
  • It has been extensively used for anomaly detection in univariate time series.
  • To get back to anomaly detection, we simply identify the forecast errors as anomalous link traffic.
  • Traffic behavior that cannot be well captured by the model is considered anomalous.

Extracting Link Anomalies $\tilde{B}$

• ARIMA(p, d, q) model includes three parameters:
  – The autoregressive parameter (p),
  – The number of differencing passes (d),
  – The moving average parameter (q).
  – Some models used for detecting anomalies in time series are special cases:
    • for example, the Exponentially Weighted Moving Average (EWMA) is ARIMA(0, 1, 1); Holt-Winters is ARIMA(0, 2, 2).

Extracting Link Anomalies $\tilde{B}$

• ARIMA(p, d, q) model includes three parameters:
  – the autoregressive parameter (p),
  – the number of differencing passes (d),
  – the moving average parameter (q).

  $$z_k - \sum_{i=1}^{p} \phi_i \, z_{k-i} = e_k - \sum_{j=1}^{q} \theta_j \, e_{k-j}$$

  where $z_k$ is obtained by differencing the original time series d times (when d ≥ 1) or by subtracting the mean from the original time series (when d = 0), $e_k$ is the forecast error at time k, and $\phi_i$ (i = 1, ..., p) and $\theta_j$ (j = 1, ..., q) are the autoregression and moving average coefficients, respectively.
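As a worked check of the claim that EWMA is ARIMA(0, 1, 1), the short derivation below may help. It assumes the usual EWMA forecast with a smoothing weight written here as α (the symbol is an assumption), and the sign convention in which the moving-average terms enter as $e_k - \sum_j \theta_j e_{k-j}$.

```latex
% EWMA forecast f_k of link load b_k, with forecast error e_k:
f_k = (1-\alpha)\, f_{k-1} + \alpha\, b_{k-1}, \qquad e_k = b_k - f_k

% One differencing pass (d = 1), substituting b_k = e_k + f_k:
z_k = b_k - b_{k-1} = e_k - e_{k-1} + (f_k - f_{k-1})

% The forecast update gives f_k - f_{k-1} = \alpha\,(b_{k-1} - f_{k-1}) = \alpha\, e_{k-1}, hence
z_k = e_k - (1-\alpha)\, e_{k-1}

% This is the ARIMA form with p = 0, d = 1, q = 1 and \theta_1 = 1 - \alpha.
```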

Diagnosing Network-Wide Traffic Anomalies

Anukool Lakhina, Mark Crovella, Christophe Diot

“Diagnosing Network-Wide Traffic Anomalies”, SIGCOMM ’04

Extracting Link Anomalies $\tilde{B}$

• Spatial Anomography: Spatial PCA [Lakhina04]
  – 1. Identify the first principal axis: the direction along which the link traffic data have the greatest variance.
  – 2. Identify the second principal axis: the direction, orthogonal to the first, along which the link traffic data have the second greatest variance; and so on.

Extracting Link Anomalies $\tilde{B}$

• Spatial Anomography: Spatial PCA [Lakhina04]
  – 3. Divide the link traffic space into the normal subspace and the anomalous subspace
    • by examining the projection of the time series of link traffic data on each principal axis in order.
    • As soon as a projection is found that contains a 3σ deviation from the mean, that principal axis and all subsequent axes are assigned to the anomalous subspace.
    • All previous principal axes are assigned to the normal subspace.
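The 3σ rule in step 3 might be sketched as below. The sketch assumes the projections onto each principal axis have already been computed and are ordered by decreasing variance; the function name and the sample projections are hypothetical, and the population standard deviation is used for σ.

```python
import statistics

def split_subspaces(projections, n_sigma=3.0):
    """Return r, the number of leading principal axes kept in the normal subspace.

    projections[i] is the time series of link traffic projected onto principal
    axis i. The first axis whose projection deviates from its mean by more than
    n_sigma standard deviations, and every axis after it, is treated as anomalous.
    """
    for i, proj in enumerate(projections):
        mean = statistics.fmean(proj)
        sigma = statistics.pstdev(proj)
        if any(abs(v - mean) > n_sigma * sigma for v in proj):
            return i
    return len(projections)

# Hypothetical projections: axis 0 is a regular oscillation, axis 1 has a spike.
axis0 = [1.0, -1.0] * 10
axis1 = [0.0] * 19 + [10.0]

r = split_subspaces([axis0, axis1])
print(r)  # 1
```

Here axis 0 forms the normal subspace and axis 1 onward is assigned to the anomalous subspace.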

Data Collected

Networks: Abilene and Sprint-Europe

Low Intrinsic Dimensionality of Link Traffic

Studied via Principal Component Analysis

Key result: Normal traffic is well approximated as occupying a low dimensional subspace

Reasons:
1. Links share OD flows
2. Set of OD flows also low dimensional

The Subspace Method

• An approach to separate normal from anomalous traffic
• Normal Subspace, $S$: space spanned by the first k principal components
• Anomalous Subspace, $\tilde{S}$: space spanned by the remaining principal components
• Then, decompose traffic on all links by projecting onto $S$ and $\tilde{S}$ to obtain:

  $$y = \hat{y} + \tilde{y}$$

  – $y$: traffic vector of all links at a particular point in time
  – $\hat{y}$: normal traffic vector
  – $\tilde{y}$: residual traffic vector

A Geometric Illustration

[Figure: axes “Traffic on Link 1” and “Traffic on Link 2”]

In general, anomalous traffic results in a large value of $\tilde{y}$

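Detection

[Figure: axes “Traffic on Link 1” and “Traffic on Link 2”]

• Capture size of vector $\tilde{y}$ using squared prediction error (SPE):

  $$\mathrm{SPE} \equiv \|\tilde{y}\|^2$$

  Traffic is considered normal if $\mathrm{SPE} \le \delta_\alpha^2$, where $\delta_\alpha^2$ is the threshold at the $1-\alpha$ confidence level.

Result due to [Jackson and Mudholkar, 1979]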
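A minimal sketch of the SPE computation is given below. It assumes an orthonormal basis for the normal subspace is already available and that the traffic vector has been mean-centered; the two-link numbers are invented for illustration, and the threshold test itself is not shown.

```python
import math

def spe(y, normal_basis):
    """Squared prediction error of a traffic vector y.

    normal_basis: orthonormal vectors spanning the normal subspace (assumed given).
    The normal part y_hat is the projection of y onto that subspace; the residual
    y_tilde = y - y_hat lies in the anomalous subspace, and SPE = ||y_tilde||^2.
    """
    y_hat = [0.0] * len(y)
    for v in normal_basis:
        coeff = sum(vi * yi for vi, yi in zip(v, y))
        y_hat = [h + coeff * vi for h, vi in zip(y_hat, v)]
    y_tilde = [yi - hi for yi, hi in zip(y, y_hat)]
    return sum(r * r for r in y_tilde)

# Two links whose normal traffic moves together along the direction (1, 1).
v = [1 / math.sqrt(2), 1 / math.sqrt(2)]

print(spe([10.0, 10.0], [v]))  # close to 0: lies in the normal subspace
print(spe([10.0, 16.0], [v]))  # close to 18: residual is (-3, 3)
```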

Detection Illustration

[Figure: two panels, the value of the traffic vector over time (all traffic) and the value of the residual vector over time (SPE)]

SPE at anomaly time points clearly stands out

Temporal PCA

• PCA = Principal Component Analysis
• Similar to Spatial PCA
• Project columns onto principal link column vectors

Solving $\tilde{b}_t = A \tilde{x}_t$

• Temporal Anomography: $\tilde{B} = A\tilde{X}$
• Now, given $\tilde{B}$, how do we solve for the anomalous O-D traffic $\tilde{X}$?
  • (1) Pseudoinverse solution
  • (2) Sparsity maximization

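Solving $\tilde{b}_t = A \tilde{x}_t$

• Pseudoinverse: $\tilde{x}_t = \mathrm{pinv}(A)\, \tilde{b}_t$
  – Minimal L2-norm solution
    • Among all $\tilde{x}_t$ for which $\|\tilde{b}_t - A \tilde{x}_t\|_2$ is minimal, choose the one with the smallest $\|\tilde{x}_t\|_2$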
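The behaviour of the pseudoinverse can be illustrated on a tiny under-constrained system. The sketch below is a simplification under stated assumptions: it only covers the case where A has full row rank, in which the minimal-norm solution is x = Aᵀ(AAᵀ)⁻¹b, and it solves the small linear system with a hand-written Gauss-Jordan routine. The routing matrix and link anomalies are made up.

```python
def solve(M, rhs):
    """Solve the square system M w = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def min_norm_solution(A, b):
    """Minimal L2-norm x with A x = b, assuming A has full row rank."""
    AAt = [[sum(p * q for p, q in zip(ri, rj)) for rj in A] for ri in A]
    w = solve(AAt, b)
    return [sum(A[i][j] * w[i] for i in range(len(A))) for j in range(len(A[0]))]

# Two links, three OD flows (hypothetical): flow 2 crosses both links.
A = [[1, 1, 0],
     [0, 1, 1]]
b = [3, 3]  # anomalous link traffic

x = min_norm_solution(A, b)
print([round(v, 6) for v in x])  # [1.0, 2.0, 1.0]
```

Note how the answer spreads the anomaly over all three flows, even though a single anomaly of size 3 on flow 2 would explain the same link data. This is the motivation for preferring sparse solutions.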

Solving $\tilde{b}_t = A \tilde{x}_t$

• Maximize sparsity
  – In practice, we expect only a few anomalies at any one time, so $\tilde{x}$ typically has only a small number of large values.
  – Hence it is natural to proceed by maximizing the sparsity of $\tilde{x}$, i.e., solving the following $\ell_0$ norm minimization problem:

    $$\text{minimize } \|\tilde{x}\|_0 \quad \text{subject to } \tilde{b} = A \tilde{x}$$
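Exact ℓ0 minimization is combinatorial, so as a deliberately simplified illustration the sketch below handles only the extreme case of a single anomalous OD flow: it tries each flow in turn and keeps the one that best explains the link anomalies. This is a stand-in heuristic for illustration, not the algorithm evaluated in the paper, and the matrix and data are made up.

```python
def single_anomaly_fit(A, b):
    """Find the one OD flow j and size s that minimize ||b - s * A[:, j]||_2^2.

    Returns (j, s, residual). Assumes exactly one flow is anomalous.
    """
    best = None
    for j in range(len(A[0])):
        col = [row[j] for row in A]
        norm2 = sum(c * c for c in col)
        if norm2 == 0:
            continue  # flow crosses no measured link; cannot be observed
        s = sum(c * bi for c, bi in zip(col, b)) / norm2  # least-squares size
        resid = sum((bi - s * c) ** 2 for c, bi in zip(col, b))
        if best is None or resid < best[2]:
            best = (j, s, resid)
    return best

# Two links, three OD flows (hypothetical): flow index 1 crosses both links.
A = [[1, 1, 0],
     [0, 1, 1]]
b = [3, 3]  # anomalous link traffic

j, s, resid = single_anomaly_fit(A, b)
print(j, s, resid)  # 1 3.0 0.0
```

For this input the sparse explanation is a single anomaly of size 3 on the flow that crosses both links, whereas a minimal L2-norm solution would spread it across all three flows.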

Performance Evaluation

• Fix one anomaly extraction method
• Compare “real” and “inferred” anomalies
  – “real” anomalies: directly from OD flow data
  – “inferred” anomalies: from link data
• Order them by size
  – Compare the size
• How many of the top N do we find?
  – Gives detection rate: $|\text{top } N_{\text{real}} \cap \text{top } N_{\text{inferred}}| \,/\, N$
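The detection rate is a simple set overlap and can be sketched as follows. The anomaly sizes are invented, each list position stands for one candidate anomaly (for example one OD flow at one time), and ranking by absolute size is an assumption of this sketch.

```python
def top_n(sizes, n):
    """Indices of the n largest anomalies, ranked by absolute size."""
    order = sorted(range(len(sizes)), key=lambda i: abs(sizes[i]), reverse=True)
    return set(order[:n])

def detection_rate(real, inferred, n):
    """|top N real ∩ top N inferred| / N."""
    return len(top_n(real, n) & top_n(inferred, n)) / n

# Hypothetical anomaly sizes for six candidates.
real = [9, 1, 7, 0, 5, 2]      # from OD flow data
inferred = [8, 0, 1, 6, 4, 2]  # from link data

rate = detection_rate(real, inferred, n=3)
print(rate)  # 2 of the top 3 real anomalies are also in the inferred top 3
```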

Performance Evaluation


Performance Evaluation: Anomography

• Hard to compare performance
  – Lack ground truth: what is an anomaly?
• So compare events from different methods
  – Compute top M “benchmark” anomalies
    • Apply an anomaly extraction method directly on OD flow data
  – Compute top N “inferred” anomalies
    • Apply another anomography method on link data
  – Report $\min(M, N) - |\text{top } M_{\text{benchmark}} \cap \text{top } N_{\text{inferred}}|$
    • M < N: “false negatives”, the number of big “benchmark” anomalies not considered big by anomography
    • M > N: “false positives”, the number of big “inferred” anomalies not considered big by the benchmark method
  – Choose M, N similar to numbers of anomalies a provider is willing to investigate, e.g. 30-50 per week
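The reported quantity can be sketched in the same style. The reading that M < N counts false negatives and M > N counts false positives is inferred from the table captions (Top 30 benchmark with Top 50 inferred, and the reverse); the anomaly sizes are invented.

```python
def top_n(sizes, n):
    """Indices of the n largest anomalies, ranked by absolute size."""
    order = sorted(range(len(sizes)), key=lambda i: abs(sizes[i]), reverse=True)
    return set(order[:n])

def mismatches(benchmark, inferred, m, n):
    """min(M, N) - |top M benchmark ∩ top N inferred|."""
    return min(m, n) - len(top_n(benchmark, m) & top_n(inferred, n))

# Hypothetical anomaly sizes for six candidates.
benchmark = [9, 1, 7, 0, 5, 2]
inferred = [8, 0, 1, 6, 4, 2]

# M < N: big benchmark anomalies missing from a longer inferred list.
false_negatives = mismatches(benchmark, inferred, m=2, n=4)
# M > N: big inferred anomalies missing from a longer benchmark list.
false_positives = mismatches(benchmark, inferred, m=4, n=2)
print(false_negatives, false_positives)  # 1 1
```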

Anomography: “False Negatives”

“False Negatives” with Top 30 Benchmark; columns: Top 50 Inferred

              Diff  EWMA  H-W  ARIMA  Fourier  Wavelet  T-PCA  S-PCA
Diff             0     0    1      1        5        5     17     12
EWMA             0     0    1      1        5        5     17     12
Holt-Winters     1     1    0      0        6        4     18     12
ARIMA            1     1    0      0        6        4     18     12
Fourier          3     4    8      8        1        7     19     18
Wavelet          0     1    2      2        5        0     13     11
T-PCA           14    14   14     14       19       15      3     15
S-PCA           10    10   13     13       15       11      1     13

1. Diff/EWMA/H-W/ARIMA/Fourier/Wavelet all largely consistent
2. PCA methods not consistent (even with each other)
   - PCA cannot detect anomalies in the “normal” subspace
   - PCA is insensitive to reordering of [b1…bT], so it cannot utilize all temporal info
3. Spatial methods (e.g. spatial PCA) are not self-consistent

Anomography: “False Positives”

“False Positives” with Top 50 Benchmark; columns: Top 30 Inferred

              Diff  EWMA  H-W  ARIMA  Fourier  Wavelet  T-PCA  S-PCA
Diff             3     3    6      6        6        4     14     14
EWMA             3     3    6      6        7        5     13     15
Holt-Winters     4     4    1      1        8        3     13     10
ARIMA            4     4    1      1        8        3     13     10
Fourier          6     6    7      6        2        6     19     18
Wavelet          6     6    6      6        8        1     13     12
T-PCA           17    17   17     17       20       13      0     14
S-PCA           18    18   18     18       20       14      1     14

1. Diff/EWMA/H-W/ARIMA/Fourier/Wavelet all largely consistent
2. PCA methods not consistent (even with each other)
   - PCA cannot detect anomalies in the “normal” subspace
   - PCA is insensitive to reordering of [b1…bT], so it cannot utilize all temporal info
3. Spatial methods (e.g. spatial PCA) are not self-consistent

Conclusions

• Anomography = Anomalies + Tomography
  – Find anomalies in $\{x_t\}$ given $b_t = A_t x_t$ (t = 1, …, T)
• Contributions
  1. A general framework for anomography methods
     – Decouple anomaly extraction and inference components
  2. A number of novel algorithms
     – Taking advantage of the range of choices for anomaly extraction and inference components
     – Choosing between spatial vs. temporal approaches
  3. Extensive evaluation on real traffic data
     – 6-month Abilene and 1-month Tier-1 ISP
• The method of choice: ARIMA + Sparsity-L1

Thank you! Questions?

Extracting Link Anomalies $\tilde{B}$

• Temporal Anomography: $\tilde{B} = BT$
  – Fourier / wavelet analysis
    • Link anomalies = the high frequency components
  – ARIMA modeling
    • Diff: $f_t = b_{t-1}$, $\tilde{b}_t = b_t - f_t$
    • EWMA: $f_t = (1-\alpha) f_{t-1} + \alpha\, b_{t-1}$, $\tilde{b}_t = b_t - f_t$
  – Temporal PCA
    • PCA = Principal Component Analysis
    • Project columns onto principal link column vectors
• Spatial Anomography: $\tilde{B} = TB$
  – Spatial PCA [Lakhina04]
    • Project rows onto principal link row vectors
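The Diff and EWMA extractors on this slide are short enough to sketch directly. The smoothing weight is written `alpha` here, the forecast is initialised with the first sample, and the first anomaly value is set to zero because no forecast exists for it; these are assumptions of the sketch, and the sample series is invented.

```python
def diff_anomalies(b):
    """Diff: forecast f_t = b_{t-1}; anomaly is the forecast error b_t - f_t."""
    return [0.0] + [b[t] - b[t - 1] for t in range(1, len(b))]

def ewma_anomalies(b, alpha):
    """EWMA: f_t = (1 - alpha) * f_{t-1} + alpha * b_{t-1}; anomaly is b_t - f_t."""
    forecast = b[0]  # assumption: start the forecast at the first sample
    out = [0.0]      # no forecast exists for the first sample
    for t in range(1, len(b)):
        forecast = (1 - alpha) * forecast + alpha * b[t - 1]
        out.append(b[t] - forecast)
    return out

# One link's load with a single spike at t = 3 (illustrative values).
b = [10.0, 10.0, 10.0, 30.0, 10.0, 10.0]

print(diff_anomalies(b))       # [0.0, 0.0, 0.0, 20.0, -20.0, 0.0]
print(ewma_anomalies(b, 0.5))  # [0.0, 0.0, 0.0, 20.0, -10.0, -5.0]
```

With `alpha = 1` the EWMA forecast reduces to the previous sample, so the two extractors coincide.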
