DRIEI PhD Program in Electronic and Computer Engineering PhD School in Information Engineering Host and Network based Anomaly Detectors for HTTP Attacks Pattern Recognition and Applications Group Department of Electrical and Electronic Engineering University of Cagliari, Italy Advisor Prof. Giorgio Giacinto By Davide Ariu
DRIEI PhD Program in Electronic and Computer Engineering PhD School in Information Engineering
Host and Network based Anomaly Detectors for HTTP Attacks
Pattern Recognition and Applications Group Department of Electrical and Electronic Engineering University of Cagliari, Italy
Advisor
Prof. Giorgio Giacinto
By
Davide Ariu
Outline
• Web Applications
– Motivations
– Overview
• Intrusion Detection Systems
– Network vs. Host-based IDS
– Signature-based IDS
– Anomaly-based IDS
• Network-based IDS: Payload Analysis
– State of the Art
– Contribution #1: McPAD
– Contribution #2: HMMPayl
• Host-based IDS: Request URI Analysis
– Contribution #3: HMM-Web
• Conclusions
March 5, 2010 Host and Network based Anomaly Detectors for HTTP Attacks - Davide Ariu
Web Applications Security: Motivations
• More than 200,000,000 sites (January 2010)¹
– A lot of sensitive data is sent every day over the network
• Cybercriminals are interested in sensitive data:
– E.g. credit card numbers
– E.g. bank account credentials
– E.g. identity theft. The full identity of a European citizen might be quite valuable to a terrorist, given the free circulation within European Union countries.
• Vulnerabilities in Web Applications
– More than 50% of the vulnerabilities discovered during the first half of 2009 affected Web Applications²
Long Request Buffer Overflow attack
HEAD / aaaaaaa…aaaaaaaaaaaa
URL Decoding Error attack
GET /d/winnt/system32/cmd.exe?/c+dir HTTP/1.0
Host: www
Connection: close
Payload Analysis: Motivations
State of the Art: PAYL¹
• PAYL is based on n-gram analysis, a technique that was proposed to solve text classification problems²:
– A sliding window of width n runs over the payload
– The occurrences of n-grams are counted and their relative frequencies are calculated
– Example n=1
– Example n=2
[Figure: 1-gram and 2-gram sliding windows running over the example sequence 4 3 3 1 3 4 2 3 3 4]
¹ Wang et al., "Anomalous Payload-based Network Intrusion Detection", RAID Int. Symposium, 2004.
² Damashek, "Gauging similarity with n-Grams: Language-independent Categorization of Text", Science, 1995.
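The sliding-window counting described for PAYL can be sketched in a few lines of Python. This is only an illustration, not PAYL's actual implementation: the function name `ngram_frequencies` is hypothetical, and the digit sequence from the slide is assumed to be the example payload.

```python
from collections import Counter

def ngram_frequencies(payload, n):
    """Relative frequency of every n-gram seen by a width-n sliding window."""
    # One window per start position: a payload of length L yields L - n + 1 windows.
    windows = [tuple(payload[i:i + n]) for i in range(len(payload) - n + 1)]
    counts = Counter(windows)
    total = len(windows)
    return {gram: count / total for gram, count in counts.items()}

# Digit sequence shown on the slide, assumed here to be the example payload.
payload = [4, 3, 3, 1, 3, 4, 2, 3, 3, 4]
one_grams = ngram_frequencies(payload, 1)
two_grams = ngram_frequencies(payload, 2)
print(one_grams[(3,)])    # 0.5: the value 3 fills 5 of the 10 positions
print(two_grams[(3, 3)])  # the pair (3, 3) fills 2 of the 9 two-byte windows
```

With real traffic the payload would be a `bytes` object, so each 1-gram is one of 256 possible byte values.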
State of the Art: PAYL
• PAYL is quite effective, but:
– A value of n=1 doesn't take into account the structure of the payload
• It might be quite simple for an attacker to mimic the distribution of 1-grams¹
• It is difficult to detect attacks that only slightly modify the statistics of the payload
– To model the structure of the payload, a value of n>=2 must be considered
• Since the payload is represented in a feature space of size 256^n, a value of n bigger than 2 can't be used in practice
¹ Fogla et al., "Polymorphic Blending Attack", USENIX Security Symposium, 2006.
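To make the growth of the feature space concrete, the short sketch below simply evaluates 256^n for the first few values of n. The helper name `feature_space_size` is hypothetical.

```python
def feature_space_size(n, alphabet_size=256):
    """Number of distinct n-grams over an alphabet of `alphabet_size` byte values."""
    return alphabet_size ** n

for n in (1, 2, 3):
    print(n, feature_space_size(n))
# 1 256
# 2 65536
# 3 16777216
```

Already at n=3 each payload would be described by more than 16 million features, most of them zero for any single packet.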
Original Contribution n°1: McPAD¹
Multiple Classifiers Payload Anomaly Detector
¹ R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, W. Lee. McPAD: A Multiple classifier system for accurate payload-based anomaly detection. Computer Networks, 2009. Special Issue on Traffic Classification and Its Applications to Modern Networks.
• IDEA: The n-gram analysis can be approximated using n-1 classifiers, each of which works in a feature space of size 256^2
• We calculate the relative frequencies of pairs of bytes from 0 to ν positions away from each other (2-ν-gram analysis)
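A minimal sketch of the 2-ν-gram features follows. It assumes that ν counts the bytes skipped between the two bytes of a pair, so ν=0 gives ordinary 2-grams. That convention and the function name `two_nu_gram_frequencies` are assumptions made for illustration, not details taken from the slides.

```python
from collections import Counter

def two_nu_gram_frequencies(payload: bytes, nu: int) -> dict:
    """Relative frequencies of byte pairs with `nu` bytes skipped between them."""
    step = nu + 1  # assumed convention: nu = 0 pairs each byte with the next one
    pairs = Counter(zip(payload, payload[step:]))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}

payload = b"GET /index.html"
# One feature vector per value of nu; each lives in a space of at most 256^2 pairs.
features = {nu: two_nu_gram_frequencies(payload, nu) for nu in range(3)}
print(len(features[0]), len(features[1]), len(features[2]))
```

Each value of ν produces its own feature vector, which is what lets a separate classifier be trained per ν instead of one classifier on a 256^n space.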
• The 2-ν-gram analysis only allows for an approximate representation of n-grams.
Question
– Is there any algorithm that has the same expressive power as n-gram analysis but doesn't suffer from the same limitations in terms of computational cost?
Answer
– Yes, we can use Hidden Markov Models
Original Contribution n°2: HMMPayl¹
Hidden Markov Models for the Analysis of the HTTP Payload
¹ D. Ariu, G. Giacinto, R. Tronci. HMMPayl: an Intrusion Detection System based on Hidden Markov Models. Submitted to Computers and Security, Elsevier, 2010.
HMMPayl: Hidden Markov Models for Payload Analysis
• IDEA: We can consider an n-gram as a sequence and model it using an HMM.
• Using the HMM we can associate a probability with each sequence extracted from the payload.
• Starting from the probabilities associated with all the sequences extracted from the payload, we can obtain an overall probability for it.
HMMPayl: A simple example
• E.g. given a toy payload (with a window width = 5)
Payload: 2 1 2 0 0 1 2 1 0 2
Sequence 1: 2 1 2 0 0 → HMM → 0.62
Sequence 2: 1 2 0 0 1 → HMM → 0.65
Sequence 3: 2 0 0 1 2 → HMM → 0.67
Sequence 4: 0 0 1 2 1 → HMM → 0.70
Sequence 5: 0 1 2 1 0 → HMM → 0.68
Sequence 6: 1 2 1 0 2 → HMM → 0.64
Probability of the payload = 0.66
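The toy example can be reproduced with a short sketch. Two things here are assumptions. First, the per-sequence probabilities are combined by an arithmetic mean, which is inferred from the fact that the six values on the slide average to 0.66. Second, the 2-state HMM parameters (`START`, `TRANS`, `EMIT`) are invented for illustration and are not the models trained by HMMPayl.

```python
def sliding_windows(payload, width):
    """All contiguous sub-sequences of length `width`."""
    return [payload[i:i + width] for i in range(len(payload) - width + 1)]

def forward_probability(seq, start, trans, emit):
    """P(seq | HMM) computed with the forward algorithm for a discrete HMM."""
    states = range(len(start))
    alpha = [start[s] * emit[s][seq[0]] for s in states]
    for symbol in seq[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][symbol]
                 for s in states]
    return sum(alpha)

# Hypothetical 2-state HMM over the symbols {0, 1, 2}; every row sums to 1.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.3, 0.2], [0.1, 0.4, 0.5]]

def payload_score(payload, width):
    """Mean of the per-window probabilities (assumed combination rule)."""
    windows = sliding_windows(payload, width)
    if not windows:
        raise ValueError("payload is shorter than the window width")
    probs = [forward_probability(w, START, TRANS, EMIT) for w in windows]
    return sum(probs) / len(probs)

payload = [2, 1, 2, 0, 0, 1, 2, 1, 0, 2]
print(sliding_windows(payload, 5))  # six windows of five symbols each

# Averaging the six probabilities shown on the slide gives the reported value.
slide_probs = [0.62, 0.65, 0.67, 0.70, 0.68, 0.64]
print(round(sum(slide_probs) / len(slide_probs), 2))  # 0.66
```

The scores returned by `payload_score` with these made-up parameters will not match the slide's numbers. Only the windowing and the averaging step are meant to mirror the example.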
HMMPayl Scheme
HMMPayl Experimental Setup
• Legitimate traffic
– 7 days of HTTP requests toward the web server of the College of Computing at Georgia Tech (GT)
– 6 days of HTTP requests toward the web server of our department (DIEE)
– 5 days of HTTP requests from the DARPA dataset
¹ R. Tronci, G. Giacinto, F. Roli, "Dynamic score selection for fusion on multiple biometric matchers", ICIAP 2007.
HMMPayl Experimental Results: Sequences Sampling
Host Based IDS Analysis of the Request‐URI
Original Contribution n°3: HMM-Web¹
Hidden Markov Models for Web Applications Protection
¹ I. Corona, D. Ariu, G. Giacinto. HMM-Web: A framework for the detection of attacks against web applications. IEEE International Conference on Communications, Dresden, 2009.
Analysis of the Request URI: Motivations
• With the Request URI, input arguments can be provided to the Web Application
– Input arguments are provided as attribute-value pairs
• Normal requests should be generated by clicking somewhere in a web page
– The position of attributes in the request depends on the hyperlink
• An attribute can't receive arbitrary values
– A model of the values that an attribute can receive is necessary
– It is important to distinguish between alphabetic characters, digits and meta-characters.
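One possible way to make the distinction between character classes explicit is sketched below. The slides do not specify the codification actually used by HMM-Web, so the symbols "A" (alphabetic) and "N" (digit) and the function name `codify` are purely hypothetical.

```python
def codify(value):
    """Map an attribute value to a symbol sequence that separates character classes."""
    symbols = []
    for ch in value:
        if ch.isalpha():
            symbols.append("A")   # any alphabetic character (hypothetical symbol)
        elif ch.isdigit():
            symbols.append("N")   # any digit (hypothetical symbol)
        else:
            symbols.append(ch)    # meta-characters are kept as they are
    return symbols

print(codify("hmm"))       # ['A', 'A', 'A']
print(codify("32"))        # ['N', 'N']
print(codify("1' OR '1"))  # quotes and spaces stay visible to the model
```

Under such a scheme, a value carrying unexpected quotes or other meta-characters yields a symbol sequence quite different from the ones seen for legitimate values of the same attribute.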
HMM-Web Scheme
GET /search.php?cat=32&key=hmm HTTP/1.1
Modules: index.php, list.php, search.php
HMM-Web Scheme (module search.php)
– HMM Ensemble, Sequence of Attributes: cat-key
– HMM Ensemble, cat Attribute Value: 3-2
– HMM Ensemble, key Attribute Value: h-m-m
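The decomposition shown in the scheme can be sketched with Python's standard `urllib.parse` module. The function name `decompose_request` is hypothetical. The sketch only shows how a request line could be split into the three inputs named above (module, sequence of attributes, per-attribute value sequences), not how HMM-Web itself parses requests.

```python
from urllib.parse import urlsplit, parse_qsl

def decompose_request(request_line):
    """Split a request line into module, attribute sequence and value sequences."""
    method, target, version = request_line.split(" ", 2)
    parts = urlsplit(target)
    module = parts.path.rsplit("/", 1)[-1]               # e.g. "search.php"
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    attribute_sequence = [name for name, _ in pairs]     # order as sent by the client
    # One character sequence per attribute; a repeated attribute keeps its last value.
    attribute_values = {name: list(value) for name, value in pairs}
    return module, attribute_sequence, attribute_values

module, attrs, values = decompose_request("GET /search.php?cat=32&key=hmm HTTP/1.1")
print(module)  # search.php
print(attrs)   # ['cat', 'key']
print(values)  # {'cat': ['3', '2'], 'key': ['h', 'm', 'm']}
```

Each of the three outputs would then be scored by the corresponding HMM ensemble of the selected module.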
Experimental Results: Effectiveness of attributes' codification
The curve on the right has been obtained using the codification proposed by Kruegel et al. in "A multi-model approach to the detection of web-based attacks", Computer Networks, 2005.
Conclusions - 1
• With this research we addressed the problem of protecting web applications
• We proposed Network-based IDS that offer protection against a wide range of attacks
• We proposed an IDS (McPAD) that achieved both high classification accuracy and robustness against evasion attempts
• We proposed an IDS (HMMPayl) that builds a very accurate model of the payload, outperforming previously proposed approaches
Conclusions - 2
• We showed that Multiple Classifiers are useful to increase both the classification accuracy and the robustness against evasion attempts
• We also proposed a Host-based solution (HMM-Web) to model the input provided to web applications.