Anomaly Detection with Extreme Value Theory

A. Siffer, P-A Fouque, A. Termier and C. Largouet

April 26, 2017

Contents

Context

Providing better thresholds

Finding anomalies in streams

Application to intrusion detection

A more general framework

Context

General motivations

⊸ Massive usage of the Internet
  • More and more vulnerabilities
  • More and more threats

⊸ Awareness of the sensitive data and infrastructures

⇒ Network security: a major concern

A Solution

⊸ IDS (Intrusion Detection System)
  • Monitor traffic
  • Detect attacks

⊸ Current methods: rule-based
  • Work fine on common and well-known attacks
  • Cannot detect new attacks

⊸ Emerging methods: anomaly-based
  • Use the network data to estimate a normal behavior
  • Apply algorithms to detect abnormal events (→ attacks)

Overview

⊸ Basic scheme

data → Algorithm → alerts

⊸ Many "standard" algorithms have been tested

⊸ Complex pipelines are emerging (ensemble/hybrid techniques)

[Diagram: data → pipeline → alerts]

Inherent problem

⊸ Algorithms are not magic
  • They give some information about data (scores)
  • But the decision often relies on a human choice

if score > threshold then trigger alert

⊸ The thresholds are often hard-set
  • Expertise
  • Fine-tuning
  • Distribution assumption

⊸ Our idea: provide a dynamic threshold with a probabilistic meaning

Providing better thresholds

My problem

[Figure: histogram of daily payment by credit card (€), x-axis 0 to 200, y-axis frequency]

⊸ How to set z_q such that P(X > z_q) < q?

Solution 1: empirical approach

[Figure: histogram of daily payment by credit card (€) against frequency, with z_q marked and the areas 1 − q and q on either side]

⊸ Drawbacks: stuck in the interval, poor resolution
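
Both drawbacks can be seen on a small sample. The sketch below is an illustration on synthetic data, not the authors' code: `empirical_threshold` is a hypothetical helper that uses a plain order statistic as the quantile, and the Gaussian sample only stands in for the payment data. With n points, the estimate can never exceed the largest observation, and every q below 1/n yields the same value.

```python
import random

def empirical_threshold(data, q):
    """Return an order statistic z such that at most a fraction q of `data` exceeds z."""
    ordered = sorted(data)
    n = len(ordered)
    # Number of observations allowed to lie strictly above the threshold.
    allowed = int(q * n)
    # Largest value that keeps at most `allowed` points above it.
    return ordered[n - 1 - allowed]

random.seed(0)
# Synthetic stand-in for the payment data (assumption: any sample would do).
sample = [random.gauss(100.0, 20.0) for _ in range(1000)]

z_coarse = empirical_threshold(sample, 1e-2)
z_fine = empirical_threshold(sample, 1e-4)
z_finer = empirical_threshold(sample, 1e-6)

# With n = 1000 points, any q below 1/n gives the sample maximum.
print(z_coarse, z_fine, z_finer, max(sample))
```

Here `z_fine` and `z_finer` are both equal to the sample maximum: the estimate is stuck inside the observed interval, and its resolution is limited to steps of 1/n.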

Solution 2: Standard Model

[Figure: histogram of daily payment by credit card (€) against frequency, with the threshold z_q marked]

⊸ Drawbacks: manual step, distribution assumption
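
As an illustration of the distribution assumption, the sketch below fits a normal law and reads z_q from it. The choice of a Gaussian model, the helper name `gaussian_threshold` and the synthetic samples are assumptions made for the example. Any q can be requested, but the guarantee only holds if the data really follow the chosen model.

```python
import random
from statistics import NormalDist

def gaussian_threshold(data, q):
    """Fit a normal distribution to `data` and return z with P(X > z) = q under that fit."""
    fitted = NormalDist.from_samples(data)
    return fitted.inv_cdf(1.0 - q)

random.seed(1)
gaussian_data = [random.gauss(100.0, 20.0) for _ in range(5000)]
# Skewed data (exponential, mean 20) for which the Gaussian assumption is wrong.
skewed_data = [random.expovariate(1.0 / 20.0) for _ in range(5000)]

q = 1e-3
z_gauss = gaussian_threshold(gaussian_data, q)
z_skewed = gaussian_threshold(skewed_data, q)

# Fraction of the skewed sample above its Gaussian threshold; the target was q.
exceed_rate = sum(x > z_skewed for x in skewed_data) / len(skewed_data)
print(z_gauss, z_skewed, exceed_rate)
```

On the skewed sample the observed exceedance rate is well above the requested q, which is the cost of a wrong model.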

Realities

[Figure: four distributions over different ranges (0 to 300, 16 to 24, 20 to 100, 0 to 75)]

⊸ Different clients and/or temporal drift

Results

Properties               Empirical quantile   Standard model
statistical guarantees   Yes                  Yes
easy to adapt            Yes                  No
high resolution          No                   Yes

Inspection of extreme events

[Figure: histogram of daily payment by credit card (€), x-axis 0 to 200, y-axis frequency]

Probability estimation?

Extreme Value Theory

⊸ Main result (Fisher-Tippett-Gnedenko, 1928)

The extreme values of any distribution have nearly the same distribution (called Extreme Value Distribution)

[Figure: P(X > x) for x from 0 to 6, for a heavy tail, an exponential tail and a bounded tail]

An impressive analogy

⊸ Let X_1, X_2, ..., X_n be a sequence of i.i.d. random variables with

$$S_n = \sum_{i=1}^{n} X_i \qquad M_n = \max_{1 \le i \le n} (X_i)$$

⊸ Central Limit Theorem

$$\frac{S_n - n\mu}{\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, \sigma^2)$$

⊸ FTG Theorem

$$\frac{M_n - a_n}{b_n} \xrightarrow{d} \mathrm{EVD}(\gamma)$$

A more practical result

⊸ Second theorem of EVT (Pickands-Balkema-de Haan, 1974)

The excesses over a high threshold follow a Generalized Pareto Distribution (with parameters γ, σ)

⊸ What does it imply?
  • we have a model for extreme events
  • we can compute z_q for q as small as desired
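
The slides do not spell out how z_q follows from the fitted model. A standard peaks-over-threshold derivation, given here as a sketch, goes as follows. The symbols t (a high threshold), n (number of observations) and N_t (number of observations above t) are assumptions introduced for this example; N_t / n is used as the empirical estimate of P(X > t).

```latex
\begin{aligned}
P(X - t > y \mid X > t) &\approx \left(1 + \frac{\gamma y}{\sigma}\right)^{-1/\gamma}, \\
P(X > z_q) &\approx \frac{N_t}{n}\left(1 + \frac{\gamma (z_q - t)}{\sigma}\right)^{-1/\gamma} = q, \\
z_q &\approx t + \frac{\sigma}{\gamma}\left(\left(\frac{q\,n}{N_t}\right)^{-\gamma} - 1\right) \quad (\gamma \neq 0), \\
z_q &\approx t - \sigma \log\left(\frac{q\,n}{N_t}\right) \quad (\gamma = 0).
\end{aligned}
```

Since q only enters through the closed-form expression, it can be chosen far below 1/n, which is what the empirical quantile cannot offer.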

How to use EVT

⊸ Get some data X_1, X_2, ..., X_n
⊸ Set a high threshold t and retrieve the excesses Y_j = X_{k_j} − t when X_{k_j} > t
⊸ Fit a GPD to the Y_j (→ find parameters γ, σ)
⊸ Compute z_q such that P(X > z_q) < q

[Diagram: X_1, X_2, ..., X_n and q → EVT → z_q]
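
The four steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The GPD is fitted by the method of moments, an assumption made to stay within the standard library (maximum likelihood is a common alternative). The initial threshold t is taken as the 98% empirical quantile through `init_level`, a hypothetical parameter, and the function names are invented for the example.

```python
import math
import random

def fit_gpd_moments(excesses):
    """Estimate GPD parameters (gamma, sigma) by the method of moments.

    This estimator is an assumption made for brevity; it requires gamma < 1/2.
    """
    m = sum(excesses) / len(excesses)
    v = sum((y - m) ** 2 for y in excesses) / (len(excesses) - 1)
    ratio = m * m / v
    gamma = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return gamma, sigma

def evt_threshold(data, q, init_level=0.98):
    """Return (t, z_q): the initial threshold t and the extreme quantile z_q."""
    ordered = sorted(data)
    n = len(ordered)
    # Step 2: high threshold t taken as an empirical quantile (hypothetical choice).
    t = ordered[int(init_level * n)]
    excesses = [x - t for x in data if x > t]
    # Step 3: fit the GPD to the excesses.
    gamma, sigma = fit_gpd_moments(excesses)
    # Step 4: invert the fitted tail so that P(X > z_q) is about q.
    r = q * n / len(excesses)
    if abs(gamma) < 1e-9:
        z_q = t - sigma * math.log(r)
    else:
        z_q = t + (sigma / gamma) * (r ** (-gamma) - 1.0)
    return t, z_q

random.seed(2)
# Step 1: some data; exponential with mean 20, so the true quantile is known.
data = [random.expovariate(1.0 / 20.0) for _ in range(20000)]
q = 1e-4
t, z_q = evt_threshold(data, q)
z_true = -20.0 * math.log(q)  # exact quantile of this exponential law
print(t, z_q, z_true)
```

Only the values above t are used for the fit, so the body of the distribution needs no model.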

Finding anomalies in streams

Streaming Peaks-Over-Threshold (SPOT) algorithm

⊸ Calibration (initial batch)
  • input: X_1, X_2, ..., X_n and q
  • output: t and z_q

⊸ Stream (X_i with i > n)
  • X_i > z_q? yes → trigger alarm
  • no → X_i > t? yes → update model
  • no → drop

[Figure: distribution of the values, x-axis 20 to 120, y-axis 0 to 0.30]
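
The flow above can be sketched as a small class. This is an illustrative reading of the diagram, not the reference implementation. The GPD fit uses the method of moments (an assumption, chosen to avoid external libraries), `init_level` is a hypothetical parameter that sets t as an empirical quantile of the initial batch, and the class and method names are invented for the example.

```python
import math
import random

def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (an assumption; valid for gamma < 1/2)."""
    m = sum(excesses) / len(excesses)
    v = sum((y - m) ** 2 for y in excesses) / (len(excesses) - 1)
    ratio = m * m / v
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)

class Spot:
    """Minimal streaming peaks-over-threshold detector (illustrative)."""

    def __init__(self, q, init_level=0.98):
        self.q = q
        self.init_level = init_level  # hypothetical quantile level used to set t

    def _update_zq(self):
        # Refit the tail model and recompute z_q from the current excesses.
        gamma, sigma = fit_gpd_moments(self.excesses)
        r = self.q * self.n / len(self.excesses)
        if abs(gamma) < 1e-9:
            self.zq = self.t - sigma * math.log(r)
        else:
            self.zq = self.t + (sigma / gamma) * (r ** (-gamma) - 1.0)

    def calibrate(self, batch):
        """Calibration on the initial batch: sets t and z_q."""
        ordered = sorted(batch)
        self.n = len(ordered)
        self.t = ordered[int(self.init_level * self.n)]
        self.excesses = [x - self.t for x in batch if x > self.t]
        self._update_zq()

    def step(self, x):
        """Process one stream value and return 'alarm', 'update' or 'drop'."""
        if x > self.zq:
            return "alarm"  # flagged values do not update the model
        self.n += 1
        if x > self.t:
            self.excesses.append(x - self.t)
            self._update_zq()
            return "update"
        return "drop"

random.seed(3)
detector = Spot(q=1e-4)
detector.calibrate([random.gauss(0.0, 1.0) for _ in range(5000)])

stream = [random.gauss(0.0, 1.0) for _ in range(2000)]
stream[1000] = 10.0  # injected anomaly, far in the tail
decisions = [detector.step(x) for x in stream]
print(detector.t, detector.zq, decisions.count("alarm"))
```

Only the excesses are stored, and the model is refitted only when a value falls between t and z_q.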

Can we trust that threshold z_q?

⊸ An example with ground truth: a Gaussian white noise
  • 40 streams with 200 000 i.i.d. variables drawn from N(0, 1)
  • q = 10^-3 ⇒ theoretical threshold z_th ≃ 3.09

⊸ Averaged relative error

[Figure: relative error (y-axis, 0.01 to 0.04) against number of observations (x-axis, 0 to 200 000)]
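
The theoretical threshold quoted above can be checked with the standard library. The `relative_error` helper assumes that the error is measured as |z_q − z_th| / z_th, which the slide does not spell out, and the estimate 3.15 is a made-up value used only to exercise the function.

```python
from statistics import NormalDist

q = 1e-3
# Exact (1 - q) quantile of the standard normal distribution.
z_th = NormalDist(0.0, 1.0).inv_cdf(1.0 - q)

def relative_error(z_q, z_ref):
    """Relative error of an estimated threshold against a reference value."""
    return abs(z_q - z_ref) / z_ref

print(round(z_th, 2))  # 3.09
print(relative_error(3.15, z_th))  # hypothetical estimate 3.15
```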

Application to intrusion detection

About the data

⊸ Lack of relevant public datasets to test the algorithms ...
⊸ KDD99? See [McHugh 2000] and [Mahoney & Chan 2003]
⊸ We use MAWI instead
  • 15 min a day of real traffic (.pcap file)
  • Anomaly patterns given by the MAWILab [Fontugne et al. 2010] with taxonomy [Mazel et al. 2014]

⊸ Preprocessing step: raw .pcap → NetFlow format (only metadata)

An example to detect network SYN scan

⊸ The ratio of SYN packets: relevant feature to detect network scan [Fernandes & Owezarski 2009]

[Figure: ratio of SYN packets in 50 ms time windows (y-axis, 0.0 to 0.8) against time in seconds (x-axis, 0 to 800)]

⊸ Goal: find peaks
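
A feature of this kind could be computed as in the sketch below. The input format (pairs of timestamp and SYN flag), the helper name and the definition of the ratio as SYN packets over all packets in the window are assumptions made for the example. The 50 ms window comes from the axis label of the figure.

```python
from collections import defaultdict

def syn_ratio_per_window(packets, window=0.05):
    """Return {window_index: ratio of SYN packets} for (timestamp, is_syn) pairs."""
    totals = defaultdict(int)
    syns = defaultdict(int)
    for timestamp, is_syn in packets:
        index = int(timestamp // window)
        totals[index] += 1
        if is_syn:
            syns[index] += 1
    return {index: syns[index] / totals[index] for index in sorted(totals)}

# Hypothetical packet records: (time in seconds, SYN flag set?).
packets = [
    (0.001, False), (0.010, True), (0.020, False), (0.040, False),  # window 0
    (0.060, True), (0.070, True), (0.080, True), (0.090, False),    # window 1
    (0.120, False), (0.130, False),                                  # window 2
]
ratios = syn_ratio_per_window(packets)
print(ratios)  # {0: 0.25, 1: 0.75, 2: 0.0}
```

The resulting series of ratios is the one-dimensional stream in which peaks are searched.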

SPOT results

⊸ Parameters: q = 10^-4, n = 2000 (from the previous day record)

[Figure: ratio of SYN packets in 50 ms time windows (y-axis, 0.0 to 0.8) against time in seconds (x-axis, 0 to 800)]

Do we really flag scan attacks?

⊸ The main parameter q: a False Positive regulator

[Figure: TPr (%) (y-axis, 0 to 100) against FPr (%) (x-axis, 0.0 to 4.5), with points annotated with values from 8·10^-6 to 6·10^-3]

⊸ 86% of scan flows detected with less than 4% of FP
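
For reference, the two rates on the axes can be computed from labelled flows as below. The toy labels and flags are invented for the example and do not reproduce the MAWI results; `rates` is a hypothetical helper.

```python
def rates(labels, flags):
    """Return (TPr, FPr) in percent from ground-truth labels and detector flags."""
    pairs = list(zip(labels, flags))
    true_pos = sum(1 for truth, flag in pairs if truth and flag)
    false_pos = sum(1 for truth, flag in pairs if not truth and flag)
    positives = sum(1 for truth in labels if truth)
    negatives = len(labels) - positives
    return 100.0 * true_pos / positives, 100.0 * false_pos / negatives

# Hypothetical toy data: 4 scan flows followed by 6 benign flows.
labels = [True, True, True, True, False, False, False, False, False, False]
flags = [True, True, True, False, True, False, False, False, False, False]
tpr, fpr = rates(labels, flags)
print(tpr, fpr)
```

Sweeping q and recomputing the pair (FPr, TPr) for each value gives one point of the curve per q.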

A more general framework

SPOT Specifications

⊸ A single main parameter q
  • With a probabilistic meaning → P(X > z_q) < q
  • False Positive regulator

⊸ Stream capable
  • Incremental learning
  • Fast (∼ 1000 values/s)
  • Low memory usage (only the excesses)

Other things?

⊸ SPOT
  • performs dynamic thresholding without distribution assumption
  • uses it to detect network anomalies

⊸ But it could be adapted to
  • compute upper and lower thresholds
  • other fields
  • drifting contexts (with an additional parameter) → DSPOT
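
One way to handle a drifting context is to remove a local average before thresholding. The sketch below assumes that the additional parameter is a window depth (called `depth` here) and that the residuals, not the raw values, are then fed to the detector. The slides do not give these details, and the helper `detrend` is hypothetical. As a simplification, every value enters the window, including the abnormal ones.

```python
from collections import deque

def detrend(stream, depth):
    """Yield each value minus the mean of the previous `depth` values."""
    window = deque(maxlen=depth)
    for x in stream:
        if len(window) == depth:
            yield x - sum(window) / depth
        window.append(x)

# Linear upward drift with a bump at index 8.
stream = [float(i) for i in range(12)]
stream[8] += 5.0
residuals = list(detrend(stream, depth=3))
print(residuals)
```

The drift turns into a constant residual of 2.0, while the bump stands out as a residual of 7.0. A lower threshold could be obtained in a similar spirit by running the same procedure on the negated values.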

A recent example

⊸ Thursday the 9th of February 2017
  • 9h: explosion at Flamanville nuclear plant
  • 11h: official declaration of the incident by EDF

⊸ What about the EDF stock prices?

EDF stock prices

[Figure: EDF stock price (€) (y-axis, 9.05 to 9.40) against time of day (x-axis, 09:02 to 17:14)]

Conclusion

⊸ Context: A great deal of work has been done to develop anomaly detection algorithms

⊸ Problem: Decision thresholds rely on either a distribution assumption or expertise

⊸ Our solution: Building dynamic thresholds with a probabilistic meaning
  • Application to detect network anomalies
  • But a general tool to monitor online time series in a blind way

⊸ Future: Adapt the method to higher dimensions