Machine Learning for Network Anomaly Detection Matt Mahoney

Jan 14, 2016

Transcript
Page 1: Machine Learning for Network Anomaly Detection

Machine Learning for Network Anomaly Detection

Matt Mahoney

Page 2: Machine Learning for Network Anomaly Detection

Network Anomaly Detection

• Network – Monitors traffic to protect connected hosts

• Anomaly – Models normal behavior to detect novel attacks (some false alarms)

• Detection – Was there an attack?

Page 3: Machine Learning for Network Anomaly Detection

Host Based Methods

• Virus Scanners

• File System Integrity Checkers (Tripwire, DERBI)

• Audit Logs

• System Call Monitoring – Self/Nonself (Forrest)

Page 4: Machine Learning for Network Anomaly Detection

Network Based Methods

• Firewalls

• Signature Detection (SNORT, Bro)

• Anomaly Detection (eBayes, NIDES, ADAM, SPADE)

Page 5: Machine Learning for Network Anomaly Detection

User Modeling

• Source address – unauthorized users of authenticated services (telnet, ssh, pop3, imap)

• Destination address – IP scans

• Destination port – port scans

Page 6: Machine Learning for Network Anomaly Detection

Frequency Based Models

• Used by SPADE, ADAM, NIDES, eBayes, etc.

• Anomaly score = 1/P(event)

• Event probabilities estimated by counting
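A minimal sketch of the counting approach, assuming events are simple hashable values such as destination ports. The function names and sample ports are hypothetical and are not taken from SPADE, ADAM, NIDES, or eBayes. An event never seen before has an estimated probability of zero, so this sketch reports an infinite score for it.

```python
from collections import Counter
import math

def fit_event_counts(events):
    """Count how often each event (e.g. a destination port) was observed."""
    return Counter(events)

def anomaly_score(counts, event):
    """Return 1/P(event), with P(event) estimated as count/total."""
    total = sum(counts.values())
    count = counts[event]
    if count == 0:
        return math.inf  # never observed: estimated probability is zero
    return total / count

# Hypothetical observations: destination ports of ten connections
counts = fit_event_counts([80, 80, 80, 80, 80, 80, 25, 25, 25, 22])
common = anomaly_score(counts, 80)     # frequent event, low score
rare = anomaly_score(counts, 22)       # seen once, higher score
unseen = anomaly_score(counts, 31337)  # never seen
```

Rare events score higher than common ones, which is the property these frequency based systems rely on.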

Page 7: Machine Learning for Network Anomaly Detection

Attacks on Public Services

PHF – exploits a CGI script bug on older Apache web servers

GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd
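Decoding the request shows why it is dangerous: `%0a` is a newline and `%20` is a space, so the alias value carries a second line that reads as a command. The sketch below uses only Python's standard `urllib.parse`; the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

request_target = "/cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd"

parts = urlsplit(request_target)
params = parse_qs(parts.query)   # percent-decodes each value
alias = params["Qalias"][0]
lines = alias.split("\n")        # the decoded value spans two lines

for line in lines:
    print(repr(line))
```

The second decoded line, `/usr/bin/ypcat passwd`, is what reaches the shell if the script passes the value along without filtering newlines.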

Page 8: Machine Learning for Network Anomaly Detection

Buffer Overflows

• 1988 Morris Worm – fingerd

• 2003 SQL Sapphire Worm

char buf[100];
gets(buf);

[Diagram: stack layout showing buf at offsets 0 to 100, the return address, and exploit code]
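The overwrite can be imitated in a few lines. This is a simplified simulation, not real exploit code: it assumes a 4-byte return address stored immediately after `buf`, which ignores saved registers and other details of a real stack frame. All names and addresses are hypothetical.

```python
import struct

BUF_SIZE = 100          # matches char buf[100]
RET_OFFSET = BUF_SIZE   # assumption: return address sits right after buf

def make_frame(return_address):
    """Model a stack frame as bytes: buf followed by a 4-byte return address."""
    frame = bytearray(BUF_SIZE + 4)
    struct.pack_into("<I", frame, RET_OFFSET, return_address)
    return frame

def unchecked_copy(frame, data):
    """Imitate gets(): copy the input into buf with no length check."""
    frame[0:len(data)] = data

def read_return_address(frame):
    return struct.unpack_from("<I", frame, RET_OFFSET)[0]

# Input that fits in buf leaves the return address alone
safe = make_frame(0x08048000)
unchecked_copy(safe, b"hello")

# 100 filler bytes plus 4 more bytes replace the return address
attacked = make_frame(0x08048000)
unchecked_copy(attacked, b"A" * BUF_SIZE + struct.pack("<I", 0xDEADBEEF))
```

After the second copy, `read_return_address(attacked)` yields the attacker's value instead of the original address.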

Page 9: Machine Learning for Network Anomaly Detection

TCP/IP Denial of Service Attacks

• Teardrop – overlapping IP fragments

• Ping of Death – IP fragments reassemble to > 64K

• Dosnuke – urgent data in NetBIOS packet

• Land – identical source and destination addresses
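Three of these attacks can be recognized from header values alone. The checks below are a rough sketch under simplifying assumptions: each fragment is represented as a hypothetical `(offset, length)` pair in bytes, header lengths are ignored, and Dosnuke is left out because it requires inspecting TCP urgent data.

```python
MAX_IP_DATAGRAM = 65535  # largest total length an IPv4 datagram can declare

def is_land(src_ip, dst_ip):
    """Land: source and destination addresses are identical."""
    return src_ip == dst_ip

def exceeds_max_size(fragments):
    """Ping of Death: some fragment ends beyond the 65535-byte limit."""
    return any(offset + length > MAX_IP_DATAGRAM
               for offset, length in fragments)

def has_overlap(fragments):
    """Teardrop: two fragments cover overlapping byte ranges."""
    ordered = sorted(fragments)
    # After sorting by offset, any overlap shows up between neighbours
    for (off_a, len_a), (off_b, _) in zip(ordered, ordered[1:]):
        if off_b < off_a + len_a:
            return True
    return False
```

For example, `has_overlap([(0, 36), (24, 4)])` is true because the second fragment starts inside the first, while back-to-back fragments such as `[(0, 1480), (1480, 1480)]` pass.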

Page 10: Machine Learning for Network Anomaly Detection

Protocol Modeling

• Attacks exploit bugs

• Bugs are most common in the least tested code

• Most testing occurs after delivery

• Therefore unusual data is more likely to be hostile

Page 11: Machine Learning for Network Anomaly Detection

Protocol Models

• PHAD, NETAD – Packet Headers (Ethernet, IP, TCP, UDP, ICMP)

• ALAD, LERAD – Client TCP application payloads (HTTP, SMTP, FTP, …)

Page 12: Machine Learning for Network Anomaly Detection

Time Based Models

• Training and test phases

• Values never seen in training are suspicious

• Score = t/p = tn/r, where
  – t = time since last anomaly
  – n = number of training examples
  – r = number of allowed values
  – p = r/n = fraction of values that are novel

Page 13: Machine Learning for Network Anomaly Detection

Example tn/r

• Training: 0000111000 (n/r = 10/2)

• Testing: 01223
  – 0: no score
  – 1: no score
  – 2: tn/r = 6 x 10/2 = 30
  – 2: tn/r = 1 x 10/2 = 5
  – 3: tn/r = 1 x 10/2 = 5
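The arithmetic in this example can be checked with a short sketch. The function names are hypothetical, and the values of t (6, 1, 1) are copied from the example rather than derived, since the slide does not state how time is indexed.

```python
def train_counts(training):
    """Return (n, r): number of training examples and of distinct values."""
    return len(training), len(set(training))

def score(t, n, r):
    """Anomaly score t/p = t*n/r for a value never seen in training."""
    return t * n / r

n, r = train_counts("0000111000")   # ten examples, two distinct values

# One score per novel test value (2, 2, 3), using t from the example
scores = [score(t, n, r) for t in (6, 1, 1)]
print(n, r, scores)
```

The second 2 still receives a score because testing does not add values to the allowed set; its score is low only because an anomaly was reported one step earlier.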

Page 14: Machine Learning for Network Anomaly Detection

PHAD – Fixed Rules

• 34 packet header fields
  – Ethernet (address, protocol)
  – IP (TOS, TTL, fragmentation, addresses)
  – TCP (options, flags, port numbers)
  – UDP (port numbers, checksum)
  – ICMP (type, code, checksum)

• Global model
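A global per-field model of this kind could look roughly like the sketch below. It is not PHAD itself: the class names, the two sample fields, the use of a packet counter as the clock, and the choice to sum tn/r over the anomalous fields of a packet are all assumptions made for illustration. The model must be trained before scoring, otherwise r would be zero.

```python
class FieldModel:
    """Allowed values for one header field, with the counts needed for tn/r."""

    def __init__(self):
        self.allowed = set()
        self.n = 0              # training packets seen
        self.last_anomaly = 0   # time of the most recent novel value

    def train(self, value, now):
        self.n += 1
        if value not in self.allowed:
            self.allowed.add(value)
            self.last_anomaly = now

    def score(self, value, now):
        if value in self.allowed:
            return 0.0
        t = now - self.last_anomaly
        self.last_anomaly = now
        return t * self.n / len(self.allowed)


class HeaderModel:
    """One FieldModel per header field, shared by all traffic (global)."""

    def __init__(self, field_names):
        self.fields = {name: FieldModel() for name in field_names}

    def train(self, packet, now):
        for name, model in self.fields.items():
            model.train(packet[name], now)

    def score(self, packet, now):
        return sum(model.score(packet[name], now)
                   for name, model in self.fields.items())


model = HeaderModel(["ttl", "tcp_flags"])
training = [(64, "S"), (64, "A"), (128, "A"), (64, "S")]
for now, (ttl, flags) in enumerate(training, start=1):
    model.train({"ttl": ttl, "tcp_flags": flags}, now)

first = model.score({"ttl": 64, "tcp_flags": "SF"}, now=10)
second = model.score({"ttl": 255, "tcp_flags": "SF"}, now=11)
```

The repeated `SF` flag value contributes much less to the second packet's score than to the first, because little time has passed since the previous anomaly in that field.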

Page 15: Machine Learning for Network Anomaly Detection

LERAD – Learns Conditional Rules

• Models inbound client TCP (addresses, ports, flags, 8 words in payload)

• Learns conditional rules

If port = 80 then word1 = GET, POST (n/r = 10000/2)

Page 16: Machine Learning for Network Anomaly Detection

LERAD Rule Learning

• If word1 = GET then port = 80 (n/r = 2/1)

• word1 = GET, HELO (n/r = 3/2)

• If address = Marx then port = 80, 25 (n/r = 2/2)

Address  Port  Word1  Word2
Hume     80    GET    /
Marx     80    GET    /index.html
Marx     25    HELO   Pascal
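The n/r values of the three rules follow directly from the table. In the sketch below, the helper `rule_stats` and the dictionary keys are hypothetical; n is the number of rows matching the rule's condition and r is the number of distinct values those rows show for the target attribute.

```python
rows = [
    {"address": "Hume", "port": 80, "word1": "GET",  "word2": "/"},
    {"address": "Marx", "port": 80, "word1": "GET",  "word2": "/index.html"},
    {"address": "Marx", "port": 25, "word1": "HELO", "word2": "Pascal"},
]

def rule_stats(rows, condition, target):
    """Return (n, allowed) for a rule "if condition then target in allowed".

    condition maps attribute names to required values; an empty dict
    matches every row. r is len(allowed).
    """
    matching = [row for row in rows
                if all(row[k] == v for k, v in condition.items())]
    allowed = {row[target] for row in matching}
    return len(matching), allowed

rule1 = rule_stats(rows, {"word1": "GET"}, "port")      # n/r = 2/1
rule2 = rule_stats(rows, {}, "word1")                   # n/r = 3/2
rule3 = rule_stats(rows, {"address": "Marx"}, "port")   # n/r = 2/2
```

The first rule has the highest n/r, so a violation of it (a GET on some other port) would be the most surprising of the three.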

Page 17: Machine Learning for Network Anomaly Detection

LERAD Rule Learning

• Randomly pick rules based on matching attributes

• Select nonoverlapping rules with high n/r on a sample

• Train on full training set (new n/r)

• Discard rules that discover novel values in the last 10% of the training data (known false alarms)
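Two of these steps, proposing a candidate rule and the final validation pass, can be sketched as follows. This is an illustration under assumptions, not the LERAD implementation: the function names are hypothetical, candidates are drawn from the attributes on which two training rows agree, the holdout is taken as the final tenth of the rows, and the selection of nonoverlapping rules by n/r is omitted.

```python
import random

def candidate_rule(row_a, row_b, rng):
    """Propose a rule from the attributes on which two training rows agree.

    Returns (condition, target), or None if the rows share no values.
    """
    matching = [k for k in row_a if row_a[k] == row_b[k]]
    if not matching:
        return None
    rng.shuffle(matching)
    target = matching[0]
    # A random number of the remaining matching attributes form the condition
    extra = matching[1:1 + rng.randint(0, len(matching) - 1)]
    return {k: row_a[k] for k in extra}, target

def allowed_values(rows, condition, target):
    """Target values seen in the rows that satisfy the condition."""
    return {row[target] for row in rows
            if all(row[k] == v for k, v in condition.items())}

def survives_validation(rows, condition, target):
    """Fit on the first 90% of rows; reject the rule if the last 10%
    contains a target value that the fitted rule does not allow."""
    split = len(rows) - max(1, len(rows) // 10)
    allowed = allowed_values(rows[:split], condition, target)
    late = allowed_values(rows[split:], condition, target)
    return late <= allowed

# Hypothetical training set: nine GET requests, then one POST
rows = [{"port": 80, "word1": "GET"}] * 9 + [{"port": 80, "word1": "POST"}]
rng = random.Random(0)
rule = candidate_rule(rows[0], rows[1], rng)
keeps_port_rule = survives_validation(rows, {}, "port")
keeps_word_rule = survives_validation(rows, {"port": 80}, "word1")
```

The rule on `word1` is discarded because POST first appears in the final tenth of the data, where it would have raised a false alarm; the rule on `port` survives.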

Page 18: Machine Learning for Network Anomaly Detection

DARPA/Lincoln Labs Evaluation

• 1 week of attack-free training data

• 2 weeks with 201 attacks

[Diagram: test network with victim hosts SunOS, Solaris, Linux, and NT, plus Router, Internet, Sniffer, and Attacks]

Page 19: Machine Learning for Network Anomaly Detection

Attacks out of 201 Detected at 10 False Alarms per Day

[Bar chart: number of attacks detected (axis from 0 to 140) for PHAD, ALAD, LERAD, and NETAD]

Page 20: Machine Learning for Network Anomaly Detection

Problems with Synthetic Traffic

• Attributes are too predictable: TTL, TOS, TCP options, TCP window size, HTTP, SMTP command formatting

• Too few sources: Client addresses, HTTP user agents, ssh versions

• Too “clean”: no checksum errors, fragmentation, garbage data in reserved fields, malformed commands

Page 21: Machine Learning for Network Anomaly Detection

Real Traffic is Less Predictable

[Plot: r (number of values) versus time, with curves for synthetic and real traffic]

Page 22: Machine Learning for Network Anomaly Detection

Mixed Traffic: Fewer Detections, but More are Legitimate

[Bar chart: total and legitimate detections (axis from 0 to 140) for PHAD, ALAD, LERAD, and NETAD]

Page 23: Machine Learning for Network Anomaly Detection

Project Status

• Philip K. Chan – Project Leader

• Gaurav Tandon – Applying LERAD to system call arguments

• Rachna Vargiya – Application payload tokenization

• Mohammad Arshad – Network traffic outlier analysis by clustering

Page 24: Machine Learning for Network Anomaly Detection

Further Reading

• Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks by Matthew V. Mahoney and Philip K. Chan, Proc. KDD.

• Network Traffic Anomaly Detection Based on Packet Bytes by Matthew V. Mahoney, Proc. ACM-SAC.

• http://cs.fit.edu/~mmahoney/dist/