Politecnico di Milano
Dipartimento di Elettronica e Informazione
DOTTORATO DI RICERCA IN INGEGNERIA DELL'INFORMAZIONE
Integrated Detection of Anomalous Behavior of Computer Infrastructures
Doctoral Dissertation of: Federico Maggi
Advisor: Prof. Stefano Zanero
Tutor: Prof. Letizia Tanca
Supervisor of the Doctoral Program: Prof. Patrizio Colaneri
XXII
Politecnico di Milano, Dipartimento di Elettronica e Informazione
Piazza Leonardo da Vinci, I- Milano
Preface
This thesis embraces all the efforts that I put in during the last three years as a PhD student at Politecnico di Milano. I have been working under the supervision of Prof. S. Zanero and Prof. G. Serazzi, who is also the leader of the research group I am part of. In this time frame I had the wonderful opportunity of being initiated to research, which radically changed the way I look at things: I found my natural "thinking outside the box" attitude, which was probably well hidden under a thick layer of lack of opportunities; I took part in very interesting joint works, among which the year I spent at the Computer Security Laboratory at UC Santa Barbara takes first place; and I discovered the Zen of my life.
My research is all about computers and every other technology possibly related to them. Clearly, the way I look at computers has changed a bit since I was seven. Still, I can remember myself typing on that Commodore in front of a tube TV screen, trying to get that routine written in Basic to work. I was just playing, obviously, but when I recently found a picture of me in front of that screen... it all became clear.
COMMODORE 64 BASIC X2 64K RAM SYSTEM 38128 BASIC BYTES FREE
READY.
BENVENUTO NEL COMPUTER DI FEDERICO. SCRIVI LA PAROLA D'ORDINE:
That says "WELCOME TO FEDERICO'S COMPUTER. TYPE IN THE PASSWORD." So, naivety apart (my attempt at writing a program to authenticate myself was, of course, limited to a print instruction up to that point), I thought "maybe I am not in the wrong place", and the fact that my research is still about security is a good sign!
Many years later, this work comes to life. There is a humongous amount of people who, directly or indirectly, have contributed to my research and, in particular, to this work. Since my first step into the lab, I will not, ever, be thankful enough to Stefano, who, despite my skepticism, convinced me to submit that application for the PhD program. For trusting me since the very first moment I am thankful to Prof. G. Serazzi as well, who has always been supportive. For hosting and supporting my research abroad I thank Prof. G. Vigna, Prof. C. Kruegel, and Prof. R. Kemmerer. I also wish to thank Prof. M. Matteucci for the great collaboration, Prof. I. Epifani for her insightful suggestions, and Prof. H. Bos for the detailed review and the constructive comments.
On the colleagues' side of these acknowledgments I put all the fellows of Room , Guido, the crew of the seclab and, in particular, Wil, with whom I shared all the pain of paper writing between Sept and Jun .
On the friends' side of this list, Lorenzo and Simona go first, for being our family.
I have tried to translate into simple words the infinite gratitude I have, and will always have, to Valentina and my parents for being the fixed point in my life. Obviously, I failed.
Federico Maggi
Milano
September
Abstract
This dissertation details our research on anomaly detection techniques, which are central to several classic security-related tasks such as network monitoring, but also have broader applications such as program behavior characterization or malware classification. In particular, we worked on anomaly detection from three different perspectives, with the common goal of recognizing anomalous activity on computer infrastructures. In fact, a computer system has several weak spots that must be protected to prevent attackers from taking advantage of them. First, we focused on protecting the operating system, central to any computer, to prevent malicious code from subverting its normal activity. Secondly, we concentrated on protecting web applications, which can be considered the modern, shared operating systems; because of their immense popularity, they have indeed become the most targeted entry point to violate a system. Last, we experimented with novel techniques with the aim of identifying related events (e.g., alerts reported by intrusion detection systems) to build new and more compact knowledge to detect malicious activity on large-scale systems.
Our contributions regarding host-based protection systems focus on characterizing the behavior of a process through the system calls it invokes into the kernel. In particular, we engineered and carefully tested different versions of a multi-model detection system using both stochastic and deterministic models to capture the features of the system calls during normal operation of the operating system. Besides demonstrating the effectiveness of our approaches, we confirmed that finite-state, deterministic models detect deviations from the process control flow with the highest accuracy; however, our contribution combines this effectiveness with advanced models for the system call arguments, resulting in a significantly decreased number of false alarms.
Our contributions regarding web-based protection systems focus on advanced training procedures that enable learning systems to perform well even in the presence of changes in the web application source code, which are particularly frequent in the Web 2.0 era. We also addressed data scarcity, which is a real problem when deploying an anomaly detector to protect a new, never-used-before application. Both these issues dramatically decrease the detection capabilities of an intrusion detection system, but they can be effectively mitigated by adopting the techniques we propose.
Last, we investigated the use of different stochastic and fuzzy models to perform automatic alert correlation, which is a post-processing step to intrusion detection. We proposed a fuzzy model that formally defines the errors that inevitably occur if time-based alert aggregation (i.e., two alerts are considered correlated if they are close in time) is used. This model allows us to account for measurement errors and to avoid false correlations due to, for instance, delays or incorrect parameter settings. In addition, we defined a model that describes alert generation as a stochastic process and experimented with non-parametric statistical tests to define robust, zero-configuration correlation systems.
The aforementioned tools have been tested over different datasets, which are thoroughly documented in this dissertation, and led to interesting results.
Summary
This thesis describes in detail our research on anomaly detection techniques. Such techniques are fundamental for solving classic security problems, such as network monitoring, but they also have broader applications, such as the analysis of the behavior of a process in a system or the classification of malware. In particular, our work concentrates on three different perspectives, with the common goal of detecting suspicious activity in a computer system. Indeed, a computer system has several weak points that must be protected to prevent an attacker from taking advantage of them. We focused on protecting the operating system, present in any computer, to prevent a program from altering its operation. Secondly, we focused on protecting web applications, which can be considered the modern global operating system; indeed, their immense popularity has made them the preferred target for violating a system. Finally, we experimented with new techniques to identify relations among events (e.g., alerts reported by intrusion detection systems) with the aim of building new knowledge to detect suspicious activity on large-scale systems.
Regarding host-based anomaly detection systems, we focused on characterizing the behavior of processes based on the flow of system calls invoked in the kernel. In particular, we engineered and carefully evaluated several versions of a multi-model anomaly detection system that uses both stochastic and deterministic models to capture the characteristics of system calls during the normal operation of the operating system. Besides demonstrating the effectiveness of our approaches, we confirmed that deterministic finite-state models detect with extreme accuracy when a process deviates significantly from its normal control flow; however, the approach we propose combines this effectiveness with advanced stochastic models of the system call arguments to significantly reduce the number of false alarms.
Regarding the protection of web applications, we focused on advanced training procedures. The goal is to let systems based on unsupervised learning work correctly even when the code of web applications changes, a phenomenon particularly frequent in the Web 2.0 era. We also addressed the problems caused by the scarcity of training data, a more than realistic obstacle, especially if the application to be protected has never been used before. Both problems cause a dramatic drop in the detection capabilities of the tools, but they can be effectively mitigated by adopting the techniques we propose.
Finally, we investigated the use of different models, both stochastic and fuzzy, for automatic alert correlation, the phase that follows intrusion detection. We proposed a fuzzy model that formally defines the errors that inevitably occur when correlation algorithms based on distance in time are adopted (i.e., two alerts are considered correlated if they were reported at approximately the same instant). This model also accounts for measurement errors and avoids incorrect decisions in the case of propagation delays. In addition, we defined a model that describes the generation of alerts as a stochastic process, and we experimented with non-parametric tests to define correlation criteria that are robust and require no configuration.
Contents
Introduction
  Today's Security Threats
    The Role of Intrusion Detection
  Original Contributions
    Host-based Anomaly Detection
    Web-based Anomaly Detection
    Alert Correlation
  Document Structure
Detecting Malicious Activity
  Intrusion Detection
    Evaluation
    Alert Correlation
    Taxonomic Dimensions
  Relevant Anomaly Detection Techniques
    Network-based techniques
    Host-based techniques
    Web-based techniques
  Relevant Alert Correlation Techniques
  Evaluation Issues and Challenges
    Regularities in audit data of IDEVAL
    The base-rate fallacy
  Concluding Remarks
Host-based Anomaly Detection
  Preliminaries
  Malicious System Calls Detection
    Analysis of SyscallAnomaly
    Improving SyscallAnomaly
    Capturing process behavior
    Prototype implementation
    Experimental Results
  Mixing Deterministic and Stochastic Models
    Enhanced Detection Models
    Experimental Results
  Forensics Use of Anomaly Detection Techniques
    Problem statement
    Experimental Results
  Concluding Remarks
Anomaly Detection of Web-based Attacks
  Preliminaries
    Anomaly Detectors of Web-based Attacks
    A Comprehensive Detection System to Mitigate Web-based Attacks
    Evaluation Data
  Training With Scarce Data
    Non-uniformly distributed training data
    Exploiting global knowledge
    Experimental Results
  Addressing Changes in Web Applications
    Web Application Concept drift
    Addressing concept drift
    Experimental Results
  Concluding Remarks
Network and Host Alert Correlation
  Fuzzy Models and Measures for Alert Fusion
    Time-based alert correlation
  Mitigating Evaluation Issues
    A common alert representation
    Proposed Evaluation Metrics
    Experimental Results
  Using Non-parametric Statistical Tests
    The Granger Causality Test
    Modeling alerts as stochastic processes
  Concluding Remarks
Conclusions
Bibliography
Index
List of Acronyms
List of Figures
List of Tables
Introduction
Network-connected devices such as personal computers, mobile phones, or gaming consoles are nowadays enjoying immense popularity. In parallel, the Web and the humongous amount of services it offers have certainly become the most ubiquitous tools of all time. Facebook counts more than millions active users, of which millions are using it on mobile devices; not to mention that more than billion photos are uploaded to the site each month [Facebook, ]. And this is just one popular website. One year ago, Google estimated that the approximate number of unique Uniform Resource Locators (URLs) is trillion [Alpert and Hajaj, ], while YouTube has stocked more than million videos as of March , with ,, views just on the most popular video as of January [Singer, ]. And people from all over the world inundate the Web with more than million tweets per day. Not only has the Web 2.0 become predominant; in fact, thinking that in December the Internet was made of one site and today it counts more than million sites is just astonishing [Zakon, ].
The Internet and the Web are huge [Miniwatts Marketing Grp., ]. The relevant fact, however, is that they have both become the most advanced workplace. Almost every industry has connected its own network to the Internet and relies on these infrastructures for a vast majority of transactions, most of the time monetary ones. As an example, every year Google loses approximately millions of US Dollars in ignored ads because of the "I'm Feeling Lucky" button. The scary part is that, during their daily work activities, people typically pay poor or no attention at all to the risks that derive from exchanging any kind of information over such a complex, interconnected infrastructure. This is demonstrated by the effectiveness of social engineering [Mitnick, ] scams carried out over the Internet or the phone [Granger, ]. Recall that of the phishing is related to finance. Now, compare this landscape to what the most famous security quote states.
The only truly secure computer is one buried in concrete, with the power turned off and the network cable cut.
Anonymous
In fact, the Internet is anything but a safe place [Ofer Shezaf and Jeremiah Grossman and Robert Auger, ], with more than , known data breaches between and [Clearinghouse, ] and an estimate of ,, records stolen by intruders. One may wonder why the advance of research in computer security and the increased awareness of governments and public institutions are still not capable of avoiding such incidents. Besides the fact that the aforementioned numbers would be orders of magnitude higher in the absence of countermeasures, today's security issues are basically caused by the combination of two phenomena: the high amount of software vulnerabilities and the effectiveness of today's exploitation strategies.
software flaws: (un)surprisingly, software is affected by vulnerabilities. Incidentally, the tools that have to do with the Web, namely browsers, third-party extensions, and web applications, are the most vulnerable ones. For instance, in , Secunia reported around security vulnerabilities for Mozilla Firefox, for Internet Explorer's ActiveX [Secunia, ]. Office suites and e-mail clients, which are certainly the must-have tools installed on every workstation, hold the second position [The SANS Institute, ].
massification of attacks: in parallel to the explosion of the Web 2.0, attackers and the underground economy have quickly learned that a sweep of exploits run against every reachable host has more chances to find a vulnerable target and, thus, is much more profitable than a single effort to break into a high-value, well-protected machine.
These circumstances have initiated a vicious circle that provides the attackers with a very large pool of vulnerable targets. Vulnerable client hosts are compromised to ensure virtually unlimited bandwidth and computational resources to attackers, while server-side applications are violated to host malicious code used to infect client visitors. And so forth. An old-fashioned attacker would have violated a single site using all the resources available, stolen data, and sold it to the underground market. Instead, a modern attacker adopts a "vampire" approach and exploits client-side software vulnerabilities to take (remote) control of million hosts. In the past, the diffusion of malicious code such as viruses was sustained by the sharing of infected, cracked software through floppy or compact disks; nowadays, the Web offers unlimited, public storage to attackers, who deploy their exploits on compromised websites.
Thus, not only has the type of vulnerabilities changed, putting virtually every interconnected device at risk; the exploitation strategy has also created new types of threats that take advantage of classic malicious code patterns, but in a new, extensive, and tremendously effective way.
. Today's Security Threats
Every year, new threats are discovered and attackers take advantage of them until effective countermeasures are found. Then, new threats are discovered, and so forth. Symantec quantifies the amount of new malicious code threats to be ,, as of [Turner et al., ], , one year earlier and only , in . Thus, countermeasures must advance at least at the same growth rate. In addition:
[...] the current threat landscape, such as the increasing complexity and sophistication of attacks, the evolution of attackers and attack patterns, and malicious activities being pushed to emerging countries, show not just the benefits of, but also the need for, increased cooperation among security companies, governments, academics, and other organizations and individuals to combat these changes [Turner et al., ].
Figure .: Illustration taken from [Holz, ], © IEEE. Authorized license limited to Politecnico di Milano.
Today's underground economy runs a very proficient market: everyone can buy credit card information for as low as $.$, full identities for just $.$, or rent a scam hosting solution for $$ per week plus $-$ for the design [Turner et al., ].
The main underlying technology actually employs a classic type of software called bot (jargon for robot), which is not malicious per se, but is used to remotely control a network of compromised hosts, called a botnet [Holz, ]. Remote commands can be of any type and typically include launching an attack, starting a phishing or spam campaign, or even updating to the latest version of the bot software by downloading the binary code from a host controlled by the attackers (usually called the bot master) [Stone-Gross et al., ]. The exchanged good has now become the botnet infrastructure itself, rather than the data that can be stolen or the spam that can be sent. These are mere outputs of today's most popular service offered for rent by the underground economy.
.. The Role of Intrusion Detection
The aforementioned, dramatic big picture may lead one to think that malicious software will eventually proliferate to every host of the Internet and that no effective remediation exists. However, a more careful analysis reveals that, despite the complexity of this scenario, the problems that must be solved by a security infrastructure can be decomposed into relatively simple tasks that, surprisingly, may already have a solution. Let us look at an example.
Example .. This is how a sample exploitation can be structured:
injection: a malicious request is sent to the vulnerable web application with the goal of corrupting all the responses sent to legitimate clients from that moment on. For instance, more than one release of the popular WordPress blog application is vulnerable to injection attacks that allow an attacker to permanently include arbitrary content in the pages. Typically, such arbitrary content is malicious code (e.g., JavaScript, VBScript, ActionScript, ActiveX) that executes on the client host every time a legitimate user requests the infected page.
infection: assuming that the compromised site is frequently accessed (this might be the realistic case of the WordPress-powered ZDNet news blog), a significant amount of clients visit it. Due to the high popularity of vulnerable browsers and plug-ins, the client may run Internet Explorer, which is the most popular, or an outdated release of Firefox on Windows. This creates the perfect circumstances for the malicious page to execute successfully. In the best case, it may download a virus or generic malware from a website under the control of the attacker, thus infecting the machine. In the worst case, this code may also exploit specific browser vulnerabilities and execute in privileged mode.
http://secunia.com/advisories/
http://wordpress.org/showcase/zdnet/
control & use: the malicious code just downloaded installs and hides itself on the victim's computer, which has just joined a botnet. As part of it, the client host can be remotely controlled by the attackers, who can, for instance, rent it, or use its bandwidth and computational power along with other computers to run a distributed Denial of Service (DoS) attack. Also, the host can be used to automatically perform the same attacks described above against other vulnerable web applications. And so forth.
This simple yet quite realistic example shows the various kinds of malicious activity generated during a typical drive-by exploitation. It also shows the requirements and assumptions that must hold to guarantee success. More precisely, we can recognize:
network activity: clearly, the whole interaction relies on network connections over the Internet, namely the HyperText Transfer Protocol (HTTP) connections used, for instance, to download the malicious code as well as to launch the injection attack used to compromise the web server.
host activity: similarly to every other type of attack against an application, when the client-side code executes, the browser (or one of its extension plug-ins) is forced to behave improperly. If the malicious code runs to completion, the attack succeeds and the host is infected. This happens only if the platform, operating system, and browser all match the requirements assumed by the exploit designer. For instance, the attack may succeed on Windows and not on Mac OS X, although the vulnerable version of, say, Firefox is the same on both hosts.
HTTP traffic: in order to exploit the vulnerability of the web application, the attacking client must generate malicious HTTP requests. For instance, in the case of a Structured Query Language (SQL) injection, which is the second most common vulnerability in web applications, instead of a regular
GET /index.php?username=myuser
the web server might be forced to process a
GET /index.php?username=' OR 'x'='x'--&content=
that causes the index.php page to behave improperly.
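To make the request above concrete, the following minimal Python sketch shows how such a parameter alters a query when the page builds its SQL by string concatenation. The table, its rows, and the function names are hypothetical stand-ins for whatever index.php actually does; the parameterized variant is the standard remedy, shown only for contrast.

```python
import sqlite3

# Hypothetical back-end table standing in for the data behind index.php.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("myuser", "my notes"), ("alice", "alice's notes"), ("bob", "bob's notes")],
)


def lookup_vulnerable(username):
    # The parameter is pasted into the SQL text, so quotes in the input
    # change the structure of the query itself.
    query = "SELECT content FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()


def lookup_parameterized(username):
    # The parameter is bound as data; the query structure cannot change.
    return conn.execute(
        "SELECT content FROM users WHERE username = ?", (username,)
    ).fetchall()


benign = "myuser"
injected = "' OR 'x'='x'--"

print(lookup_vulnerable(benign))       # only the row of myuser
print(lookup_vulnerable(injected))     # every row: the WHERE clause is always true
print(lookup_parameterized(injected))  # no rows: treated as a literal username
```

With the injected value the query becomes `... WHERE username = '' OR 'x'='x'--'`: the condition is always true and the trailing quote is commented out, which is exactly the kind of improper behavior the page is forced into.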
It is now clear that protection mechanisms that analyze the network traffic, the activity of the client's operating system, the web server's HTTP logs, or any combination of the three, have a chance of recognizing that something malicious is happening in the network. For instance, if the Internet Service Provider (ISP) network adopts Snort, a lightweight Intrusion Detection System (IDS) that analyzes the network traffic for known attack patterns, it could block all the packets marked as suspicious. This would prevent, for instance, the SQL injection from reaching the web application. A similar protection level can be achieved by using other tools such as ModSecurity [Ristic, ]. One of the problems that may arise with these classic, widely adopted solutions is the use of a zero-day attack. A zero-day attack or threat exploits a vulnerability that is unknown to the public, undisclosed to the software vendor, or for which no fix is available; thus, protection mechanisms that merely blacklist known malicious activity immediately become ineffective. In a similar vein, if the client is protected by an anti-virus, the infection phase can be blocked. However, this countermeasure is once again successful only if the anti-virus is capable of recognizing the malicious code, which assumes that the code is known to be malicious.
Ideally, an effective and comprehensive countermeasure can be achieved if all the protection tools involved (e.g., client-side, server-side, network-side) collaborate. For instance, if a website is publicly reported to be malicious, a client-side protection tool should block all the content downloaded from that particular website. This is only a simple example.
Thus, countermeasures against today's threats already exist, but they are subject to at least two drawbacks:
they offer protection only against known threats. To be effective, we must assume that all the hostile traffic can be enumerated, which is clearly an impossible task.
Why is "Enumerating Badness" a dumb idea? It's a dumb idea because sometime around the amount of Badness in the Internet began to vastly outweigh the amount of Goodness. For every harmless, legitimate application, there are dozens or hundreds of pieces of malware, worm tests, exploits, or viral code. Examine a typical antivirus package and you'll see it knows about ,+ viruses that might infect your machine. Compare that to the legitimate or so apps that I've installed on my machine, and you can see it's rather dumb to try to track , pieces of Badness when even a simpleton could track pieces of Goodness [Ranum, ].
they lack cooperation, which is crucial to detect global and slow attacks.
This said, we conclude that classic approaches such as dynamic and static code analysis and IDSs already offer good protection, but industry and research should move toward methods that require little or no knowledge of the attacks. In this work, we indeed focus on the so-called anomaly-based approaches, i.e., those that attempt to recognize threats by detecting any deviation from a system's normal operation, rather than looking for signs of known-to-be-malicious activity.
. Original Contributions
Our main research area is Intrusion Detection (ID). In particular, we focus on anomaly-based approaches to detect malicious activities. Since today's threats are complex, a single point of inspection is not effective. A more comprehensive monitoring system is desirable to protect the network, the applications running on a certain host, and the web applications (which are particularly exposed due to the immense popularity of the Web). Our contributions focus on the mitigation of both host-based and web-based attacks, along with two techniques to correlate alerts from hybrid sensors.
.. Host-based Anomaly Detection
Typical malicious processes can be detected by modeling the characteristics (e.g., type of arguments, sequences) of the system calls executed by the kernel, and by flagging unexpected deviations as attacks. Regarding this type of approach, our contributions focus on hybrid models that accurately characterize the behavior of a binary application. In particular:
We enhanced, re-engineered, and evaluated a novel tool for modeling the normal activity of the Linux kernel. Compared to other existing solutions, our system shows better detection capabilities and good contextualization of the alerts reported. These results are detailed in Section ..
We engineered and evaluated an IDS to demonstrate that the combined use of (1) deterministic models to characterize a process' control flow and (2) stochastic models to capture normal features of the data flow leads to better detection accuracy. Compared to the existing deterministic and stochastic approaches taken separately, our system shows better accuracy, with almost zero false positives. These results are detailed in Section ..
We adapted our techniques for forensics investigation. By running experiments on real-world data and attacks, we show that our system is able to detect hidden tamper evidence even when sophisticated anti-forensics tools (e.g., userland process execution) have been used. These results are detailed in Section ..
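As an illustration of the general idea of modeling system call sequences, the following Python sketch implements the classic sliding-window approach: it records every window of n consecutive system calls observed during normal executions and counts the windows of a new trace that were never seen. It is a deliberately simple baseline, not the hybrid models described above (which also model system call arguments); the traces and the window size n = 3 are made up for the example.

```python
def windows(trace, n):
    """Return the sliding windows of n consecutive system calls, in order."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


class SyscallWindowModel:
    """Toy sequence model: the set of length-n windows seen during training."""

    def __init__(self, n=3):
        self.n = n
        self.normal = set()

    def train(self, traces):
        for trace in traces:
            self.normal.update(windows(trace, self.n))

    def mismatches(self, trace):
        # Windows never observed during normal operation.
        return [w for w in windows(trace, self.n) if w not in self.normal]

    def score(self, trace):
        # Fraction of anomalous windows (0.0 means fully consistent).
        ws = windows(trace, self.n)
        return len(self.mismatches(trace)) / len(ws) if ws else 0.0


# Hypothetical traces of a program that opens, reads/writes, and closes a file.
model = SyscallWindowModel(n=3)
model.train([
    ["open", "read", "read", "close"],
    ["open", "read", "write", "close"],
])

print(model.score(["open", "read", "write", "close"]))   # 0.0
print(model.score(["open", "read", "execve", "close"]))  # 1.0
```

A window-only model like this one says nothing about the arguments of each call (e.g., which file is opened), which is precisely the gap that argument models are meant to fill.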
.. Web-based Anomaly Detection
Attempts at compromising a web application can be detected by modeling the characteristics (e.g., parameter values, character distributions, session content) of the HTTP messages exchanged between servers and clients during normal operation. This approach can detect virtually any attempt at tampering with HTTP messages, which is assumed to be evidence of an attack. In this research field, our contributions focus on training data scarcity along with the problems that arise when an application changes its legitimate behavior. In particular:
We contributed to the development of a system that learns the legitimate behavior of a web application. Such a behavior is defined by means of features extracted from (1) HTTP requests, (2) HTTP responses, and (3) SQL queries to the underlying database, if any. Each feature is extracted and learned by using different models, some of which are improvements over well-known approaches while others are original. The main contribution of this work is the combination of database query models with HTTP-based models. The resulting system has been validated through preliminary experiments that showed very high accuracy. These results are detailed in Section ..
We developed a technique to automatically detect legitimate changes in web applications, with the goal of suppressing the large amount of false detections due to code upgrades, which are frequent in today's web applications. We ran experiments on real-world data to show that our simple but very effective approach accurately predicts changes in web applications and can distinguish good from malicious changes (i.e., attacks). These results are detailed in Section ..
We designed and evaluated a machine learning technique to aggregate IDS models with the goal of ensuring good detection accuracy even when only scarce training data is available. Our approach relies on clustering techniques and nearest-neighbor search to look up well-trained models that replace under-trained ones, which are prone to overfitting and thus to false detections. Experiments on real-world data have shown that almost every false alert due to overfitting is avoided with as few as - training samples per model. These results are described in Section ..
Although these techniques have been developed on top of a web-based anomaly detector, they are sufficiently generic to be easily adapted to other systems that use learning approaches.
.. Alert Correlation
IDS alerts are usually post-processed to generate compact reports and to eliminate redundant, meaningless, or false detections. In this research field, our contributions focus on unsupervised techniques applied to aggregate and correlate alert events with the goal of reducing the effort of the security officer. In particular:
We developed and tested an approach that accounts for the common measurement errors (e.g., delays and uncertainties) that occur in the alert generation process. Our approach exploits fuzzy metrics both to model errors and to construct an alert aggregation criterion based on distance in time. This technique has been shown to be more robust than classic time-distance based aggregation metrics. These results are detailed in Section ..
We designed and tested a prototype that models the alert
generation process as a stochastic process. This setting
allowed us to construct a simple, non-parametric hypothesis test
that can detect whether two alert streams are correlated or not.
Besides its simplicity, the advantage of our approach is that it does
not require any parameter. These results are described in Section ..
The aforementioned results have been published in the proceedings
of international conferences and in international journals.
. Document Structure
This document is structured as follows. Chapter introduces ID,
which is the topic of our research. In particular, Chapter
rigorously describes all the basic components that are necessary
to define the ID task and an IDS. The reader with knowledge on this
subject may skip the first part of the chapter and focus on Sections .
and ., which include a comprehensive review of the most relevant
and influential modern approaches to network-, host-, and web-based
ID techniques, along with a separate overview of the alert
correlation approaches.
As described in Section ., the description of our contributions
is structured into three chapters. Chapter focuses on host-based
techniques, Chapter regards web-based anomaly detection, while
Chapter describes two techniques that allow us to recognize
relations between alerts reported by network- and host-based
systems. Reading Section .. is recommended before reading Chapter
.
The reader interested in protection techniques for the operating
system can skim through Section .. and then read Chapter .
The reader with interests in web-based protection techniques can read
Section .. and then Chapter . Similarly, the reader interested in
alert correlation systems can skim through Sections .. and .. and
then read Chapter .
Detecting Malicious Activity
Malicious activity is a generic term that refers to automated
or manual attempts to compromise the security of a computer
infrastructure. Examples of malicious activity include the output
generated (or its effect on a system) by the execution of simple
script-kiddie tools, viruses, DoS attacks, exploits of Cross-Site
Scripting (XSS) or SQL injection vulnerabilities, and so forth. This
chapter describes the research tools and methodologies available to
detect and mitigate malicious activity on a network, a single host,
a web server, and the combination of the three.
First, the background concepts and the ID problem are described
in this chapter along with a taxonomic description of the most
relevant aspects of an IDS. Secondly, a detailed survey of the
selected state-of-the-art anomaly detection approaches is
presented with the help of further classification dimensions. In
addition, the problem of alert correlation, which is an orthogonal
topic, is described and the most relevant, recent research
approaches are overviewed. Last but not least, the problem of
evaluating an IDS is presented to provide the essential terminology
and criteria to understand the effectiveness of both the reviewed
literature and our original contributions.
Figure .: Generic and comprehensive deployment scenario for IDSs. (Diagram elements: the Internet; an ISP network; a broadband network with clients and customers' clients; virtual hosting with customers' servers and DBs. Deployable protection mechanisms: anti-malware, host-based IDS, network-based IDS, web-based IDS.)
. Intrusion Detection
ID is the practice of analyzing a system to find malicious
activity. This section defines this concept more precisely by means of
simple but rigorous definitions. The context of such definitions is
the generic scenario of an Internet site, for instance, the network
of an ISP. An example is depicted in Figure .. An Internet site
and, in general, the Internet itself is the state-of-the-art
computer infrastructure. In fact, it is a network that adopts
almost any kind of known computer technology (e.g., protocols,
standards, containment measures), and it runs a rich variety of
servers such as HTTP, File Transfer Protocol (FTP), Secure SHell
(SSH), and Virtual Private Network (VPN) servers to support a broad
spectrum of sophisticated applications and services (e.g., web
applications, e-commerce applications, the Facebook Platform,
Google Applications). In addition, a generic Internet site receives
and processes traffic from virtually any user connected to the Net and
thus represents the perfect research testbed for IDSs.
Definition .. (System) A system is the domain of interest
for security.
Note that a system can be a physical or a logical one. For
instance, a physical system could be a host (e.g., the DataBase (DB)
server or one of the client machines shown in the figure) or a whole
network (e.g., the ISP network shown in the figure); a logical system
can be an application or a service, such as one of the web services
run in a virtual machine deployed in the ISP network. While running,
each system produces activity, which we define as follows.
Definition .. (System activity) A system activity is any data
generated while the system is working. Such activity is formalized
as a sequence of events $I = [I_1, I_2, \ldots, I_i, \ldots, I_N]$.
For instance, each of the clients of Figure . produces system
logs: in this case I would contain an $I_i$ for each log entry. A
human-readable representation follows.
chatWithContact:(null)] got a nil targetContact.
Aug 18 00:29:40 [0x0-0x1b01b].org.mozilla.firefox[0]: NPP_Initialize called
Aug 18 00:29:40 [0x0-0x1b01b].org.mozilla.firefox[0]: 2009-08-18 00:29:40.039 firefox-bin[254:10b] NPP_New(instance=0x169e0178,mode=2,argc=5)
Aug 18 00:29:40 [0x0-0x1b01b].org.mozilla.firefox[0]: 2009-08-18 00:29:40.052 firefox-bin[254:10b] NPP_NewStream end=396239
Similarly, the web servers generate HTTP requests and
responses: in this case I would contain an $I_i$ for
each HTTP message. Its human-readable representation follows.
/media//images/favicon.ico HTTP/1.0 200 1150 - Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.10) Gecko/2009042315 Firefox/3.0.10 Ubiquity/0.1.4
128.111.48.4 [20/May/2009:15:26:44 -0700] POST /report/ HTTP/1.0 200 19171 http://www.phonephishing.info/report/ Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.10) Gecko/2009042315 Firefox/3.0.10 Ubiquity/0.1.4
128.111.48.4 [20/May/2009:15:26:44 -0700] GET /media//css/main.css HTTP/1.0 200 5525 http://www.phonephishing.info/report/ Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.10) Gecko/2009042315 Firefox/3.0.10 Ubiquity/0.1.4
128.111.48.4 [20/May/2009:15:26:44 -0700] GET /media//css/roundbox.css HTTP/1.0 200 731 http://www.phonephishing.info/media//css/main.css Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.10) Gecko/2009042315 Firefox/3.0.10 Ubiquity/0.1.4
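The HTTP entries above look like Apache "combined" log lines whose quotes (and, apparently, the ident/user fields) were lost. As a minimal sketch, assuming the standard quoted combined format, each line can be turned into one event $I_i$; the regular expression, the `parse_event` name, and the restored sample line are our own illustration, not part of the system described in this thesis.

```python
import re
from typing import Optional

# Apache "combined" log format (assumed here; the excerpt above lost its quotes):
# host ident user [time] "method path protocol" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_event(line: str) -> Optional[dict]:
    """Turn one log line into an event I_i (a dict of fields), or None if malformed."""
    m = COMBINED.match(line)
    if m is None:
        return None
    event = m.groupdict()
    event["status"] = int(event["status"])
    event["size"] = 0 if event["size"] == "-" else int(event["size"])
    return event


log = [
    '128.111.48.4 - - [20/May/2009:15:26:44 -0700] '
    '"GET /media//css/main.css HTTP/1.0" 200 5525 '
    '"http://www.phonephishing.info/report/" '
    '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.10) '
    'Gecko/2009042315 Firefox/3.0.10 Ubiquity/0.1.4"',
]
# The activity I is the sequence of successfully parsed events.
I = [e for e in map(parse_event, log) if e is not None]
print(I[0]["method"], I[0]["path"], I[0]["status"])  # GET /media//css/main.css 200
```

Malformed lines are dropped rather than raising, so a single corrupted entry does not stop the construction of I.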
Other examples of system activity are described in Section ... In
this document, we often use the term normal behavior to refer
to a set of characteristics (e.g., the distribution of the
characters of string parameters, the mean and standard deviation of
the values of integer parameters) extracted from the system activity
gathered during normal operation (i.e., without being compromised).
Moreover, in the remainder of this document, we need other
definitions.
Definition .. (Activity Profile) The activity profile (or activity model)
$c_I$ is a set of models
$$c_I = \langle m^{(1)}, \ldots, m^{(u)}, \ldots, m^{(U)} \rangle$$
generated by extracting features from the system activity I.
This definition will be used in Section ..., and Examples .. and ..
describe an instance of $m^{(u)}$. An example of a real-world profile is
described in Example ... We can now define the following.
Definition .. (System Behavior) The system behavior is the set of
features (or models), along with their numeric values, extracted by
(or contained in) the activity profile.
In particular, we will use this term as a synonym of normal
system behavior, referring to the system behavior during normal
operation.
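The definitions of activity profile and system behavior can be made concrete with a minimal sketch. We assume, purely for illustration, that events are dictionaries of named parameters and that each model $m^{(u)}$ exposes hypothetical `learn` and `is_anomalous` methods; the two models mirror the examples of normal behavior given above (statistics of integer parameters, plus a simple string-length bound), and the 3-sigma rule is our own assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class IntegerModel:
    """m^(u) for an integer parameter: mean and standard deviation.
    Flagging values beyond k standard deviations is a hypothetical rule."""
    k: float = 3.0
    mu: float = 0.0
    sigma: float = 0.0

    def learn(self, values):
        self.mu, self.sigma = mean(values), pstdev(values)

    def is_anomalous(self, value):
        return abs(value - self.mu) > self.k * self.sigma


@dataclass
class LengthModel:
    """m^(u) for a string parameter: longest value seen in normal operation."""
    max_len: int = 0

    def learn(self, values):
        self.max_len = max(len(v) for v in values)

    def is_anomalous(self, value):
        return len(value) > self.max_len


def build_profile(activity, models):
    """c_I = <m^(1), ..., m^(U)>: train each model on its feature of I."""
    for feature, model in models.items():
        model.learn([event[feature] for event in activity])
    return models


def deviating_features(profile, event):
    """Features of `event` that deviate from the learned system behavior."""
    return [f for f, m in profile.items() if m.is_anomalous(event[f])]


I = [{"id": 10, "name": "alice"}, {"id": 12, "name": "bob"}, {"id": 11, "name": "carol"}]
profile = build_profile(I, {"id": IntegerModel(), "name": LengthModel()})
print(deviating_features(profile, {"id": 11, "name": "dave"}))     # []
print(deviating_features(profile, {"id": 500, "name": "x" * 40}))  # ['id', 'name']
```

In this sketch the learned parameters (`mu`, `sigma`, `max_len`) play the role of the system behavior: they are the numeric values contained in the profile.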
Given the high accessibility of the Internet, publicly
available systems such as web servers, web applications, and DB servers
are constantly at risk. In particular, they can be compromised
with the goal of stealing valuable information, deploying infection
kits, or running phishing and spam campaigns. These are all examples
of intrusions. More formally:
Definition .. (Intrusion) An intrusion is the automated or manual
act of violating one or more security paradigms (i.e.,
confidentiality, integrity, availability) of a system. Intrusions
are formalized as a sequence of events:
$$O = [O_1, O_2, \ldots, O_i, \ldots, O_M] \subseteq I$$
Typically, when an intrusion takes place, a system behaves
unexpectedly and, as a consequence, its activity differs from that
observed during normal operation. This is because an attack or the
execution of malware code often exploits vulnerabilities to bring the
system into states it was not designed for. For this reason, the
activity that the system generates is called malicious activity;
often, this term is also used to indicate the attack or the malware
code execution itself. An example of $O_i$ is the following log entry:
it shows evidence of an XSS attack that will make a vulnerable page
display the arbitrary content supplied as part of the GET request
(while the page was not intentionally designed for this purpose).
/report/add/comment// HTTP/1.0 200 731 http://www.phonephishing.info/report/add/ Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.10) Gecko/2009042315 Firefox/3.0.10 Ubiquity/0.1.4
Note .. We adopt a simplified representation of intrusions
with respect to a dataset D (i.e., both normal and malicious): with
$O \subseteq D$ we indicate that the activity I contains malicious
events O; however, strictly speaking, intrusion events and activity
events can be of completely different types, thus the relation may
not be defined in a strict mathematical sense.
If a system is well designed, any intrusion attempt always
leaves some traces in the system's activity. These traces are called
tamper evidence and are essential to perform intrusion
detection.
Definition .. (Intrusion Detection) Intrusion detection is the
separation of intrusions from normal activity through the analysis
of the activity of a system, while the system is running.
Intrusions are marked as alerts, which are formalized as a sequence
of events $A = [A_1, A_2, \ldots, A_i, \ldots, A_L] \subseteq O$.
Similarly, $A \subseteq O$ must not be interpreted in a strict sense: it is
just a notation to indicate that for each intrusion, an alert may or
may not exist. Note that ID can also be performed by manually
inspecting the activity of a system. This is clearly a tedious and
inefficient task, thus research and industrial interests are focused
on automatic approaches.
Definition .. (Intrusion Detection System) An intrusion detection
system is an automatic tool that performs the ID task.
Figure .: Abstract I/O model of an IDS (input: I, O; output: A).
Given the above definitions, an abstract model of an IDS is
shown in Figure ..
Although each IDS relies on its own data model and format to
represent A, the Internet Engineering Task Force (IETF) proposed
the Intrusion Detection Message Exchange Format (IDMEF) [Debar et al., ]
as a common format for reporting alert streams generated by different
IDSs.
.. Evaluation
Evaluating an IDS means running it on collected data $D = I \cup O$
that resembles real-world scenarios. This means that such data
includes both intrusions O and normal activity I, i.e., $|I|, |O| > 0$
and $I \cap O = \emptyset$, i.e., all the malicious activity is in O
and I contains none. The system is run in a controlled environment to
collect A along with performance indicators for comparison with other
systems. This section presents the basic criteria used to evaluate
modern IDSs.
More precisely, to perform an evaluation experiment correctly, a
fundamental hypothesis must hold: the set O must be known and
perfectly distinguishable from I. In other words, D must be labeled
with a truth file, i.e., a list of all the events known to be
malicious. This allows us to treat the ID problem as a classic
classification problem, for which a set of well-established
evaluation metrics is available. In particular, we are interested in
calculating the following sets.
Definition .. (True Positives (TPs)) The set of true positives is
$$TP := \{A_i \in A \mid \exists O_j : f(A_i) = O_j\},$$
where $f: A \to O$ is a generic function that, given an alert $A_i$, finds
the corresponding intrusion $O_j$ by parsing the truth file. The TP
set is basically the set of alerts that are fired because a real intrusion
has taken place. The perfect IDS is such that $TP \equiv O$.
Definition .. (False Positives (FPs)) The set of false positives is
$$FP := \{A_i \in A \mid \nexists O_j : f(A_i) = O_j\}.$$
On the other hand, the alerts in FP are incorrect because no real
intrusion can be found in the observed activity. The perfect IDS is
such that $FP = \emptyset$.
Definition .. (True Negatives (TNs)) The set of true negatives is
$$TN := \{I_j \in I \mid \nexists A_i : f(A_i) = I_j\}.$$
Note that the set TN does not contain alerts. Basically, it is
the set of events correctly left unreported. The perfect IDS is such that
$TN \equiv I$.
Definition .. (False Negatives (FNs)) The set of false negatives is
$$FN := \{O_j \in O \mid \nexists A_i : f(A_i) = O_j\}.$$
Similarly to TN, FN does not contain alerts. Basically, it is the
set of intrusions incorrectly left unreported. The perfect IDS is such that
$FN = \emptyset$. Note that, when the four quantities are expressed as
fractions of the total number of classified events,
$TP + TN + FP + FN = 1$ must hold.
In this and other documents, the term false alert refers to
$FN \cup FP$. Given the aforementioned sets, aggregated measures can be
calculated.
Definition .. (Detection Rate (DR)) The detection rate, or
true positive rate, is defined as:
$$DR := \frac{TP}{TP + FN}.$$
The perfect IDS is such that DR = 1. Thus, the DR measures the
detection capabilities of the system, that is, the fraction of malicious
events correctly classified and reported as alerts. On the other side,
the False Positive Rate (FPR) is defined as follows.
Definition .. (FPR) The false positive rate is defined as:
$$FPR := \frac{FP}{FP + TN}.$$
Figure .: The ROC space. (Axes: False Positive Rate (FPR) on the x-axis and Detection Rate (DR) on the y-axis, both ranging from 0 to 1; the random classifier is marked.)
The perfect IDS is such that FPR = 0. Thus, the FPR measures the
inaccuracies of the system, that is, the fraction of legitimate events
incorrectly classified and reported as alerts. There are also
other metrics, such as accuracy and precision, which are often
used to evaluate information retrieval systems; however, these
metrics are not popular for IDS evaluation.
The ROC analysis, originally adopted to measure the transmission
quality of radar systems, is often used to produce a compact and
easy-to-understand evaluation of classification systems. Even though
there is no standardized procedure to evaluate an IDS, the research
community agrees on the use of ROC curves to compare the detection
capabilities and quality of IDSs. A ROC curve is the plot of
$DR = DR(FPR)$ and is obtained by tuning the IDS to trade off
False Positives (FPs) against true positives. Without going into the
details, each point of a ROC curve corresponds to a fixed amount of
DR and FPR calculated under certain conditions (e.g., sensitivity
parameters). By modifying the IDS configuration, the quality of the
classification changes and other points of the ROC curve are
determined. The ROC space is plotted in Figure . along with the
performance of a random classifier, characterized by $DR = FPR$. The
perfect, ideal IDS can increase its DR from 0 to 1 with $FPR = 0$;
however, it must hold that $FPR \to 1 \Rightarrow DR \to 1$.
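To illustrate how the points of a ROC curve arise, the following sketch assumes that the sensitivity parameter of the IDS is a threshold on a per-event anomaly score (an assumption: not every IDS exposes such a score). Sweeping the threshold from the highest score to the lowest yields one (FPR, DR) point per setting; the scores and labels are made up.

```python
def roc_points(scores, labels):
    """Return the (FPR, DR) pairs obtained by sweeping the alert threshold.

    scores: anomaly score of each event (higher means more anomalous)
    labels: True if the event is malicious according to the truth file
    An alert is raised for every event whose score is >= the threshold.
    Both classes must be present, as required by |I|, |O| > 0.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: no alerts at all
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / negatives, tp / positives))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
labels = [True, True, False, True, False, False]
for fpr, dr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f} DR={dr:.2f}")
```

Lowering the threshold can only add alerts, so both coordinates are non-decreasing along the curve, which ends at (1, 1) as required above.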
.. Alert Correlation
The problem of intrusion detection is challenging in today's
complex networks. In fact, it is common to have more than one
IDS deployed, monitoring different segments and different aspects of the
whole infrastructure (e.g., hosts, applications, network, etc.). The
amount of alerts reported by a network of IDSs running in a complex
computer infrastructure is larger, by several orders of magnitude,
than what was common in the smaller networks monitored years ago. In
such a context, network administrators are overloaded with alerts
and long security reports, often containing a non-negligible amount
of FPs. Thus, the creation of a clean, compact, and unified view of the
security status of the network is needed. This process is commonly
known as alert correlation [Valeur et al., ] and it is currently one
of the most difficult challenges of this research field. More
precisely:
Definition .. (Alert Correlation) Alert correlation is the
identification of relations among alerts
$$A_1, A_2, \ldots, A_i, \ldots, A_L \in A^{(1)} \cup A^{(2)} \cup \cdots \cup A^{(K)}$$
to generate a unique, more compact and comprehensive sequence
of alerts $A' = [A'_1, A'_2, \ldots, A'_P]$.
A desirable property is that $A'$ should be as complete as
$A^{(1)} \cup A^{(2)} \cup \cdots \cup A^{(K)}$ without introducing errors
such as alerts that do not correspond to a real intrusion. As for ID,
alert correlation can be a manual, and tedious, task. Clearly,
automatic alert correlation systems are more attractive and can be
considered a complement of a modern IDS.
Definition .. (Alert Correlation System) An alert correlation
system is an automatic tool that performs the alert correlation task.
The overall IDS abstract model is complemented with an alert
correlation engine as shown in Figure ..
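A very simple correlation engine merges the K alert streams and aggregates alerts that are close in time. The sketch below implements a classic crisp time-distance criterion (not the fuzzy approach contributed by this thesis): alerts sharing the same source and target are grouped into one meta-alert $A'_p$ if they occur at most `window` seconds apart. The `Alert` fields and the 60-second window are hypothetical choices, not IDMEF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    time: float   # seconds
    source: str   # hypothetical fields: attacker and victim addresses
    target: str
    ids: str      # which IDS raised the alert


def aggregate(streams, window=60.0):
    """Merge K alert streams into A': group alerts that share (source, target)
    and are at most `window` seconds after the previous alert of the group."""
    merged = sorted((a for s in streams for a in s), key=lambda a: a.time)
    groups = []  # each group is one meta-alert A'_p
    open_group = {}  # (source, target) -> index of the latest group for that pair
    for a in merged:
        key = (a.source, a.target)
        i = open_group.get(key)
        if i is not None and a.time - groups[i][-1].time <= window:
            groups[i].append(a)
        else:
            open_group[key] = len(groups)
            groups.append([a])
    return groups


nids = [Alert(0, "10.0.0.1", "10.0.0.9", "nids"), Alert(500, "10.0.0.1", "10.0.0.9", "nids")]
hids = [Alert(20, "10.0.0.1", "10.0.0.9", "hids")]
print([[a.ids for a in g] for g in aggregate([nids, hids])])
# [['nids', 'hids'], ['nids']]
```

The hard threshold is exactly what makes such criteria fragile: an alert delayed by a few seconds beyond `window` starts a new meta-alert, which is the kind of measurement error discussed in the contributions above.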
.. Taxonomic Dimensions
The first comprehensive taxonomy of IDSs was proposed in [Debar
et al., ] and revised in [Debar et al., ]. Another good survey
appeared in [Axelsson, a].
Figure .: Abstract I/O model of an IDS with an alert correlation system (the IDSs output the alert streams A1, ..., An, which the alert correlation system (ACS) turns into A').
Compared to the classic surveys found in the literature, this section
complements the basic taxonomic dimensions by focusing on
modern techniques. In particular, today's intrusion detection
approaches can be categorized by means of the specific modeling
techniques that appeared in recent research.
It is important to note that the taxonomic dimensions hereby
suggested are not exhaustive, thus certain IDSs may not fit
into them (e.g., [Costa et al., ; Portokalidis et al., ; Newsome and
Song, ]). On the other hand, an exhaustive and detailed taxonomy
would be difficult to read. To overcome this difficulty, in this section
we describe a high-level, technique-agnostic taxonomy based on the
dimensions summarized in Table .; in each subsection of Section .,
which focus on anomaly-based models, we expand the taxonomic
dimensions by listing and accurately detailing further classification
criteria.
... Type of model
IDSs must be divided into two opposite groups: misuse-based
vs. anomaly-based. The former create models of the malicious activity
while the latter create models of normal activity. Misuse-based
models look for patterns of malicious activity; anomaly-based
models look for unexpected activity. In some sense, IDSs can either
blacklist or whitelist the observed activity.
Typically, the first type of models consists of a database of all the
known attacks. Besides requiring frequent updates (which is just a
technical difficulty and can be easily automated), misuse-based
systems assume the feasibility of enumerating all the malicious
activity.
Despite the limitation of being inherently incomplete,
misuse-based systems are widely adopted in the real world [Roesch, ,
]. This is mainly due to their simplicity (i.e., attack models are
triggered by means of pattern matching algorithms) and accuracy
(i.e., they generate virtually no false alerts because an attack
signature can either match or not).
Anomaly-based approaches are more complex because creating a
specification of normal activity is obviously a difficult task. For this
reason, there are no well-established and widely adopted techniques,
whereas misuse models are no more sophisticated than a pattern
matching problem. In fact, research on anomaly-based systems
is very active.
These systems are effective only under the assumption that
malicious activity, such as an attack or malware being executed,
always produces sufficient deviations from normal activity such
that models are triggered, i.e., anomalies. This clearly has the
positive side effect of requiring zero knowledge of the malicious
activity, which makes these systems particularly interesting. The
negative side effect is their tendency to produce a significant amount
of false alerts (the notion of a significant amount of false alerts will
be discussed in detail in Section ..).
Obviously, an IDS can benefit from both techniques by, for
instance, enumerating all the known attacks and using
anomaly-based heuristics to prompt for suspicious activity. This
practice is often adopted by modern anti-viruses.
It is important to remark that, differently from the other
taxonomic dimensions, the distinction between misuse- and
anomaly-based approaches is fundamental: they are based on opposite
hypotheses and yield completely different system designs and
results. Their duality is highlighted in Table ..
Example .. (Misuse vs. Anomaly) A misuse-based system M and an
anomaly-based system A process the same log containing a
full dump of the system calls invoked by the kernel of an audited
machine. Log entries are in the form:
(, , ...)
The system M has the following simple attack model:
Generated from the real-world exploit
http://milw0rm.com/exploits/9303
Feature              Misuse-based   Anomaly-based
Modeled activity     Malicious      Normal
Detection method     Matching       Deviation
Threats detected     Known          Any
False negatives      High           Low
False positives      Low            High
Maintenance cost     High           Low
Attack desc.         Accurate       Absent
System design        Easy           Difficult
Table .: Duality between misuse- and anomaly-based intrusion
detection techniques. Note that an anomaly-based IDS can detect
any threat, under the assumption that an attack always generates
a deviation in the modeled activity.
/* match(), decode() and fire_alert() are helpers of system M (not shown) */
if (strcmp(function_name, "read") == 0) {
    /* ... */
    if (match(decode(arg3_value), "a{4}b{4}c{4}d{4}e{4}\
f{4}...x{4}3RH~TY7{33}QZjAXP0A0AkAAQ2AB2BB0BBAB\
XP8ABuJIXkweaHrJwpf02pQzePMhyzWwSuQnioXPOHuBxKn\
aQlkOjpJHIvKOYokObPPwRN1uqt5PA..."))
        fire_alert("VLC bug 35500 is being exploited!");
    /* ... */
}
The simple attack signature looks for a pattern generated from the
exploit. If the content of the buffer (i.e., arg3_value) that stores
the malicious file matches the given pattern, then an alert is
fired. On the other hand, the system A has the following model, based
on the sample character distribution of each file's content. Such
frequencies are calculated during normal operation of the
application.
/* ... */ cd[
Obviously, more sophisticated models can be designed. The purpose
of this example is to highlight the main differences between
the two approaches.
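To make the duality of the table above tangible, the following toy sketch (our own; the signature, the buffers, and the crude anomaly model are made up, and the anomaly model is much simpler than the character distribution of system A) applies a blacklist and a learned whitelist to the same buffers.

```python
import re

# Misuse-based: a blacklist of known attack patterns (made-up signature).
SIGNATURES = {"A-run": re.compile(rb"A{64,}")}


def misuse_alerts(buf: bytes) -> list:
    """Names of the signatures matching the buffer."""
    return [name for name, rx in SIGNATURES.items() if rx.search(buf)]


# Anomaly-based: a whitelist learned from normal buffers; here simply the set
# of byte values observed during training (a deliberately crude model).
def learn_alphabet(training) -> set:
    seen = set()
    for buf in training:
        seen.update(buf)  # iterating over bytes yields ints in 0..255
    return seen


def anomaly_alert(buf: bytes, alphabet: set) -> bool:
    return any(b not in alphabet for b in buf)


alphabet = learn_alphabet([b"title: song one\n", b"title: song two\n"])
known = b"A" * 100              # known attack: matches the signature
novel = b"\x90\x90\xcc\xcc"     # unknown attack: no signature exists
legit = b"title: Song three\n"  # benign, but contains unseen bytes
for buf in (known, novel, legit):
    print(misuse_alerts(buf), anomaly_alert(buf, alphabet))
# ['A-run'] True
# [] True
# [] True   <- a false positive of the anomaly-based model
```

The misuse model misses the unknown attack but stays silent on the benign buffer; the anomaly model catches both attacks at the price of a false alert, as summarized in the table.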
A generalization of the aforementioned examples allows us to
better define an anomaly-based IDS.
Definition .. (Anomaly-based IDS) An anomaly-based IDS is a type of
IDS that generates alerts A by relying on normal activity profiles
(Definition ..).
... System activity
IDSs can be classified based on the type of activity they
monitor. The classic literature distinguishes between network-based
and host-based systems: Network-based Intrusion Detection Systems
(NIDSs) and Host-based Intrusion Detection Systems (HIDSs),
respectively. The former inspect network traffic (i.e., raw bytes
sniffed from the network adapters), and the latter inspect the
activity of the operating system. The scope of a network-based system
is as large as the broadcast domain of the monitored network. On
the other hand, the scope of host-based systems is limited to the
single host.
Network-based IDSs have the advantage of having a large scope,
while host-based ones have the advantage of being fed with
detailed information about the host they run on (e.g., process
information, Central Processing Unit (CPU) load, number of active
processes, number of users). This information is often unavailable
to network-based systems and can be useful to refine a decision
regarding suspicious activity. For example, by inspecting both
the network traffic and the kernel activity, an IDS can filter the
alerts regarding the Apache web server version .. on all the hosts
running version ... On the other hand, network-based systems
are centralized and are much easier to manage and deploy.
However, NIDSs are limited to the inspection of unencrypted
payloads, while HIDSs may have them decrypted by the application
layer. For example, a NIDS cannot detect malicious HTTPS traffic.
The network stack is standardized (see Note ..), thus the
definition of network-based IDS is precise. On the other hand,
because of the immense variety of operating system implementations,
a clear definition of host data is lacking. Existing host-based IDSs
analyze audit log files in several formats; other systems keep track
of the commands issued by the users through the console. Some
efforts have been made to propose standard formats for host
data: Basic Security Module (BSM) and its modern re-implementation
called OpenBSM [Watson and Salamon, ; Watson, ] are
probably the most used by the research community as they allow
developers to gather the full dump of the system calls before
execution in the kernel.
Note .. Although the network stack implementation may vary from
system to system (e.g., Windows and Cisco platforms have different
implementations of the Transmission Control Protocol (TCP)), it is
important to underline that the notion of an IP, TCP, or HTTP packet is
well defined in a system-agnostic way, while the notion of operating
system activity is rather vague and by no means standardized.
Example .. describes a sample host-based misuse detection system
that inspects the arguments of the system calls. A similar, but more
sophisticated, example based on network traffic is Snort
[Roesch, , ].
Note .. (Inspection layer) Network-based systems can be further
categorized by their protocol inspection capabilities with
respect to the network stack. webanomaly [Kruegel et al., ;
Robertson, ] is network-based, in the sense that it runs in
promiscuous mode and inspects network data. On the other hand, it
is also HTTP-based, in the sense that it decodes the payload and
reconstructs the HTTP messages (i.e., request and response) to detect
attacks against web applications.
... Model construction method
Regardless of their type, misuse- or anomaly-based, models can be
specified either manually or automatically. However, by their
nature, misuse models are often manually written because they are
based on the exhaustive enumeration of known malicious activity.
Typically, these models are called attack signatures; the largest
repository of manually generated misuse signatures is released by
the Sourcefire Vulnerability Research Team™ [Sourcefire, ].
Misuse signatures can also be generated automatically: for instance,
in [Singh et al., ] a method to build misuse models of worms is
described. Similarly, low-interaction honeypots often use malware
emulation to automatically generate signatures. Two recently proposed
techniques are [Portokalidis et al., ; Portokalidis and Bos, ].
On the other hand, anomaly-based models are more suitable
for automatic generation. Most of the anomaly-based approaches in the
literature focus on unsupervised learning mechanisms to construct
models that precisely capture the normal activity observed. Manually
specified models are typically more accurate and are less prone to
FPs, although automatic techniques are clearly more desirable.
Example .. (Learning character distributions) In Example ..,
the described system A adopts a learning-based character
distribution model for strings. Without going into the details, the
idea described in [Mutz et al., ] observes string arguments and
estimates the character distribution over the American Standard
Code for Information Interchange (ASCII) set. More practically, the
model is a histogram $H(c), \forall c \in [0, 255]$, where $H(c)$ is the
normalized frequency of character c. During detection, a $\chi^2$ test
is used to decide whether or not a certain string is deemed anomalous.
Besides the obvious advantage of being resilient against evasion,
this model requires no human intervention.
. Relevant Anomaly Detection Techniques
Our research focuses on anomaly detection. In this section, the
selected state-of-the-art approaches are reviewed with particular
attention to network-, host-, and web-based techniques, along with
the most influential approaches in the recent alert correlation
literature. This section provides the reader with the basic concepts
needed to understand our contributions.
.. Network-based techniques
Our research does not include network-based IDSs. Our
contributions in alert correlation, however, leverage both network-
and host-based techniques, thus a brief overview of the latest (i.e.,
proposed between and ) network-based detection approaches
is provided in this section. We remark that all the techniques
included in the following are based on TCP/IP, meaning that models
of normal activity are constructed by inspecting the decoded
network frames up to the TCP layer.
In Table . the selected approaches are marked with bullets to
highlight their specific characteristics. Such characteristics are
based on our analysis and experience, thus other classifications
may be possible. They are defined as follows:
Time refers to the use of timestamp information, extracted from
network packets, to model normal packets. For example, normal
packets may be modeled by their minimum and maximum inter-arrival
time.
Header means that the TCP header is decoded and the fields are
modeled. For example, normal packets may be modeled by the
observed port range.
Payload refers to the use of the payload, either at the Internet
Protocol (IP) or TCP layer. For example, normal packets may be
modeled by the most frequent byte in the observed payloads.
Stochastic means that stochastic techniques are exploited to create
models. For example, the model of normal packets may be
constructed by estimating the sample mean and variance of certain
features (e.g., port number, content length).
Deterministic means that certain features are modeled following a
deterministic approach. For example, normal packets may be only
those containing a specified set of values for the Time To Live (TTL)
field.
Clustering refers to the use of clustering (and subsequent
classification) techniques. For instance, payload byte vectors may be
compressed using a Self Organizing Map (SOM) where classes of
different packets will stimulate neighboring nodes.
Note that, since recent research has experimented with several
techniques and algorithms, mixed approaches exist and often lead
to better results.
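Two of the characteristics listed above, Time and Deterministic, are simple enough to sketch directly from their examples: a min/max inter-arrival model and an exact set of allowed TTL values. The class names are hypothetical and the packet fields are assumed to be already decoded; neither model is taken from a specific surveyed system.

```python
from dataclasses import dataclass, field


@dataclass
class InterArrivalModel:
    """'Time': minimum and maximum inter-arrival time of normal packets (s)."""
    lo: float = float("inf")
    hi: float = float("-inf")

    def learn(self, timestamps):
        gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
        self.lo, self.hi = min(gaps), max(gaps)

    def is_anomalous(self, gap):
        return not (self.lo <= gap <= self.hi)


@dataclass
class TTLModel:
    """'Deterministic': the exact set of TTL values seen in normal packets."""
    allowed: set = field(default_factory=set)

    def learn(self, ttls):
        self.allowed = set(ttls)

    def is_anomalous(self, ttl):
        return ttl not in self.allowed


timing = InterArrivalModel()
timing.learn([0.00, 0.10, 0.25, 0.30])
ttl = TTLModel()
ttl.learn([64, 64, 128, 64])
print(timing.is_anomalous(0.12), timing.is_anomalous(2.0))  # False True
print(ttl.is_anomalous(64), ttl.is_anomalous(1))            # False True
```

Both models are deterministic at detection time; a Stochastic variant would instead estimate, e.g., the mean and variance of the gaps and tolerate small deviations.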
In [Mahoney and Chan, ] a mostly deterministic, simple detection
technique is presented. During training, each field of the header of
the TCP packets is extracted and tokenized into byte bins (for
memory efficiency reasons). The tokenized values are clustered by
means of their values, and every time a new value is observed the
clustering is updated. The detection approach is deterministic,
since a packet is classified as anomalous if the values of its header
do not match any of the clusters. Besides being completely
unsupervised, this technique detects between and of the probe and
DoS attacks, respectively, in Intrusion Detection eVALuation
(IDEVAL) [Lippmann et al., ]. Slight performance issues and a rate
of FPs per day (i.e., roughly, false alert every hours) are the only
disadvantages of the approach.
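The per-field clustering just described can be sketched as follows. This is a simplified illustration, not the authors' code: each header field keeps at most `max_clusters` sorted value intervals (a hypothetical parameter), a new value opens a new interval, the two closest intervals are merged when the limit is exceeded, and a packet is anomalous if some field value falls outside every interval. Byte-bin tokenization is omitted.

```python
import bisect


class FieldClusters:
    """Values of one header field, kept as at most `max_clusters` sorted,
    disjoint intervals [lo, hi]. Merging the closest pair is our choice."""

    def __init__(self, max_clusters: int = 4):
        self.max_clusters = max_clusters
        self.intervals: list[list[int]] = []

    def matches(self, v: int) -> bool:
        return any(lo <= v <= hi for lo, hi in self.intervals)

    def learn(self, v: int) -> None:
        if self.matches(v):
            return
        bisect.insort(self.intervals, [v, v])
        if len(self.intervals) > self.max_clusters:
            # merge the two adjacent intervals separated by the smallest gap
            gaps = [self.intervals[i + 1][0] - self.intervals[i][1]
                    for i in range(len(self.intervals) - 1)]
            i = gaps.index(min(gaps))
            self.intervals[i][1] = self.intervals[i + 1][1]
            del self.intervals[i + 1]


def train(packets, fields, max_clusters=4):
    model = {f: FieldClusters(max_clusters) for f in fields}
    for p in packets:
        for f in fields:
            model[f].learn(p[f])
    return model


def anomalous_fields(model, packet):
    """Deterministic detection: fields whose value matches no cluster."""
    return [f for f, c in model.items() if not c.matches(packet[f])]


normal = [{"dport": 80, "flags": 0x18}, {"dport": 443, "flags": 0x10},
          {"dport": 22, "flags": 0x18}, {"dport": 8080, "flags": 0x02}]
model = train(normal, ["dport", "flags"], max_clusters=3)
print(model["dport"].intervals)  # [[22, 80], [443, 443], [8080, 8080]]
print(anomalous_fields(model, {"dport": 31337, "flags": 0x29}))  # ['dport', 'flags']
```

Merging intervals generalizes the model (here port 50 becomes "normal" because it lies between 22 and 80), which trades some detection power for a bounded memory footprint.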
Table .: Taxonomy of the selected state of the art approaches for network-based anomaly detection. Approaches compared: [Mahoney and Chan, ], [Kruegel et al., ], [Sekar et al., ], [Ramadas, ], [Mahoney and Chan, b], [Zanero and Savaresi, ], [Wang and Stolfo, ], [Zanero, b], [Bolzoni et al., ], [Wang et al., ].

The approach described in [Kruegel et al., ] reconstructs the payload stream for each service (i.e., port); this prevents attackers from evading the detection mechanism through packet fragmentation. In addition to this, a basic application inspection is performed to distinguish among the different types of request of the specific service, e.g., for HTTP the service type could be GET, POST, HEAD. The sample mean and variance of the content length are also calculated, and the distribution of the bytes (interpreted as ASCII characters) found in the payload is estimated using simple histograms. The anomaly score used for detection aggregates information regarding the type of service, expected content length and payload distribution. With a low performance overhead and low FPR, this system is capable of detecting anomalous interactions with the application layer. One critique is that the system has not been tested extensively on applications other than Domain Name System (DNS).
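The following sketch shows the kind of service-specific profile described in [Kruegel et al., ]. Each (port, request type) pair gets its own profile, which keeps a running mean and variance of the content length (Welford's method) and a byte-value histogram. Stream reassembly is omitted. The way the score combines the two features is a hypothetical choice, as are all function names.

```python
import math
from collections import defaultdict


class ServiceProfile:
    """Running statistics for one (port, request type) pair."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                 # sum of squared deviations (Welford)
        self.byte_counts = [0] * 256  # byte-value histogram

    def update(self, payload: bytes):
        self.n += 1
        delta = len(payload) - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (len(payload) - self.mean)
        for b in payload:
            self.byte_counts[b] += 1

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def request_type(payload: bytes) -> str:
    """Minimal application inspection: the first token, e.g. GET or POST."""
    return payload.split(b" ", 1)[0].decode("ascii", "replace")


profiles = defaultdict(ServiceProfile)


def train(port: int, payload: bytes):
    profiles[(port, request_type(payload))].update(payload)


def score(port: int, payload: bytes) -> float:
    """Hypothetical aggregate: standardized length deviation plus the L1
    distance between the observed and learned byte distributions."""
    prof = profiles.get((port, request_type(payload)))
    if prof is None:
        return math.inf  # request type never seen for this service
    length_dev = abs(len(payload) - prof.mean) / (prof.std() + 1.0)
    total_learned = sum(prof.byte_counts) or 1
    observed = [0] * 256
    for b in payload:
        observed[b] += 1
    n = len(payload) or 1
    dist = sum(abs(o / n - c / total_learned)
               for o, c in zip(observed, prof.byte_counts))
    return length_dev + dist


train(80, b"GET /index.html HTTP/1.0")
train(80, b"GET /about.html HTTP/1.0")
```

A request type never observed for a port gets an infinite score. A GET carrying a long run of non-ASCII bytes scores far above the training requests, because of both its length and its byte distribution.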
Probably inspired by the misuse-based, finite-state technique described in [Vigna and Kemmerer, ], in [Sekar et al., ] the authors describe a system to learn the TCP specification. The basic finite state model is extended with a network of stochastic properties (e.g., the frequency of certain transitions, the most common value of a state attribute, the distribution of the values found in the fields of the IP packets) among states and transitions. Such properties are estimated during training and exploited at detection to implement smoother thresholds that ensure as few as . false alerts per day. On the other hand, the deterministic nature of the finite state machine detects attacks with a DR.
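The combination of a deterministic specification and learned frequencies can be sketched as below. This assumes a toy, heavily simplified TCP-like state machine (not the real TCP specification) and a hypothetical probability threshold `min_prob`. Transitions outside the specification are always flagged; valid but rarely observed transitions are flagged stochastically. The property network over field values used in [Sekar et al., ] is not modeled.

```python
from collections import Counter


class StochasticFSA:
    """Specification FSA whose transitions carry frequencies learned
    from training traces."""

    def __init__(self, transitions):
        self.transitions = transitions  # (state, event) -> next state
        self.counts = Counter()
        self.state_totals = Counter()

    def train(self, start, events):
        # Training traces are assumed to conform to the specification.
        state = start
        for ev in events:
            self.counts[(state, ev)] += 1
            self.state_totals[state] += 1
            state = self.transitions[(state, ev)]

    def check(self, start, events, min_prob=0.01):
        """Return a list of (position, reason) alerts."""
        alerts, state = [], start
        for i, ev in enumerate(events):
            if (state, ev) not in self.transitions:
                alerts.append((i, "transition not in specification"))
                break
            total = self.state_totals[state]
            prob = self.counts[(state, ev)] / total if total else 0.0
            if prob < min_prob:
                alerts.append((i, f"rare transition (p={prob:.3f})"))
            state = self.transitions[(state, ev)]
        return alerts


toy_tcp = {  # simplified, illustrative state machine
    ("CLOSED", "SYN"): "SYN_RCVD",
    ("SYN_RCVD", "ACK"): "ESTABLISHED",
    ("SYN_RCVD", "RST"): "CLOSED",
    ("ESTABLISHED", "FIN"): "CLOSING",
    ("ESTABLISHED", "RST"): "CLOSED",
    ("CLOSING", "ACK"): "CLOSED",
}
fsa = StochasticFSA(toy_tcp)
for _ in range(100):
    fsa.train("CLOSED", ["SYN", "ACK", "FIN", "ACK"])
```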
Learning Rules for Anomaly Detection (LERAD), the system described in [Mahoney and Chan, b], is an optimized rule-mining algorithm that works well on data with tokenized domains, such as the fields of TCP or HTTP packets. Although the idea implemented in LERAD can be applied at any protocol layer, it has been tested on TCP and HTTP, and no more than the of the attacks in the testing dataset were detected. Even if the FPR is acceptable (i.e., alerts per day), its limited detection capabilities worsen if real-world data is used instead of synthetic datasets such as IDEVAL. In [Tandon and Chan, ] the LERAD algorithm is used to mine rules expressing normal values of arguments, normal sequences of system calls, or both. No relationship is learned among the values of different arguments, and sequences and argument values are handled separately. The evaluation, however, is quite poor and uses non-standard metrics.
Unsupervised learning techniques that mine patterns from the payload of packets have been shown to be an effective approach. Both the network-based approaches described so far and other proposals [Labib and Vemuri, ; Mahoney, ] had to cope with data represented using a high number of dimensions (e.g., a vector with dimensions, that is, the maximum number of bytes in the TCP payload). While the majority of the proposals circumvent the issue by ignoring the payload, the aforementioned issue is brilliantly solved in [Ramadas, ] and extended in Unsupervised Learning IDS with -Stages Engine (ULISSE) [Zanero and Savaresi, ; Zanero, b] by exploiting the clustering capabilities of a SOM [Kohonen, ] with a faster algorithm [Zanero, a], specifically designed for high-dimensional data. The payload of TCP packets is indeed compressed into the bi-dimensional grid representing the SOM, organized in such a way that classes of packets can be quickly extracted. The approach relies on, and confirms, the assumption that the traffic belongs to a relatively small number of services and protocols that can be mapped onto a small number of clusters. Network packets are modeled as a multivariate time series, where the variables include the packet class in the SOM plus some other features extracted from the header. At detection, a fast discounting learning algorithm for outlier detection [Yamanishi et al., ] is used to detect anomalous packets. Although the system has not been released yet, the prototype is able to reach a . DR with as few as . FPs. In comparison, for one of the prototypes dealing with payloads available in the literature [Wang et al., ], the best overall result leads to the detection of . of the attacks, with a FPR that is between . and . The main weakness of this approach is that it works at the granularity of the packets and thus might be prone to simple evasion attempts (e.g., by splitting an attack onto several malicious packets, interleaved with long sequences of legitimate packets). Inspired by [Wang et al., ] and [Zanero and Savaresi, ], [Bolzoni et al., ] has been proposed.
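The SOM-based packet classification described above can be illustrated with a classic online Kohonen SOM. This is not the faster algorithm of [Zanero, a], and the outlier detection stage is not shown. The sketch makes several assumptions: packets are represented as 256-dimensional byte histograms, node weights are initialized from training samples, and the grid size and learning-rate schedule are arbitrary. The packet "class" is the grid coordinate of the best-matching unit.

```python
import math


def byte_histogram(payload: bytes):
    """256-dimensional relative frequency of byte values."""
    counts = [0.0] * 256
    for b in payload:
        counts[b] += 1
    n = len(payload) or 1
    return [c / n for c in counts]


class SOM:
    """Minimal Kohonen SOM; nodes are initialized from training samples."""

    def __init__(self, rows, cols, init_samples):
        self.rows, self.cols = rows, cols
        nodes = [(r, c) for r in range(rows) for c in range(cols)]
        self.weights = {node: list(init_samples[i % len(init_samples)])
                        for i, node in enumerate(nodes)}

    def bmu(self, x):
        """Best-matching unit: grid node whose weight vector is closest to x."""
        return min(self.weights, key=lambda node: sum(
            (w - v) ** 2 for w, v in zip(self.weights[node], x)))

    def train(self, data, epochs=30, lr0=0.5):
        radius0 = max(self.rows, self.cols) / 2
        steps, t = epochs * len(data), 0
        for _ in range(epochs):
            for x in data:
                frac = t / steps
                lr = lr0 * (1 - frac)                    # decaying learning rate
                radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood
                br, bc = self.bmu(x)
                for (r, c), w in self.weights.items():
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])
                t += 1


http = byte_histogram(b"GET /index.html HTTP/1.0")
binary = byte_histogram(bytes(range(200, 256)))
som = SOM(1, 2, [http, binary])
som.train([http, binary])
```

The textual request and the binary payload end up on different nodes, so the BMU coordinate can serve as the class variable that a later time-series stage would consume.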
The approach presented in [Wang et al., ] differs from the one described in [Zanero and Savaresi, ; Zanero, b], even though the underlying key idea is rather similar: byte frequency distribution. Both approaches, and also [Kruegel et al., ], exploit the distribution of byte values found in the network packets to produce some sort of normality signature. [Wang et al., ] utilizes a simple clustering algorithm to aggregate similar packets and produce a smoother and more abstract signature, whereas [Zanero and Savaresi, ; Zanero, b] introduces the use of a SOM to accomplish the task of finding classes of normal packets.
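A byte-frequency signature of this kind can be sketched in the 1-gram form used by [Wang and Stolfo, ]. Every byte value gets a mean and a standard deviation of its relative frequency over normal payloads, and new payloads are compared with a simplified Mahalanobis distance. A real deployment would keep one such model per payload length (or length bin); this sketch keeps a single one. The smoothing value `alpha` is a hypothetical choice.

```python
import math


def byte_frequency(payload: bytes):
    """Relative frequency of each of the 256 byte values (a 1-gram model)."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    n = len(payload) or 1
    return [c / n for c in counts]


class ByteFrequencyModel:
    """Mean and standard deviation of every byte frequency over normal
    payloads, compared with a simplified Mahalanobis distance."""

    def __init__(self, alpha=0.001):
        self.alpha = alpha  # smoothing term: avoids dividing by a zero std

    def fit(self, payloads):
        columns = list(zip(*(byte_frequency(p) for p in payloads)))
        n = len(payloads)
        self.mean = [sum(col) / n for col in columns]
        self.std = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
                    for col, m in zip(columns, self.mean)]
        return self

    def distance(self, payload: bytes):
        x = byte_frequency(payload)
        return sum(abs(xi - mi) / (si + self.alpha)
                   for xi, mi, si in zip(x, self.mean, self.std))


normal = [b"GET /index.html HTTP/1.0", b"GET /about.html HTTP/1.0",
          b"GET /news.html HTTP/1.0"]
model = ByteFrequencyModel().fit(normal)
```

A byte that never appears in training (zero mean and zero deviation) contributes roughly 1/alpha per unit of frequency. This is why a payload full of unusual bytes is pushed far from the normal signature.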
An extension to [Wang and Stolfo, ], which uses 1-grams, is described in [Wang et al., ]; it uses higher-order, randomized n-grams to mitigate mimicry attacks. In addition, the newer approach does not estimate the frequency distribution of the n-grams, which causes many FPs as n increases. Instead, it adopts a filtering technique to compress the n-grams into memory-efficient arrays of bits. This decreased the FPR by about two orders of magnitude.
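The bit-array filtering of n-grams can be sketched with a Bloom filter. Instead of estimating n-gram frequencies, the filter only records which n-grams occurred in normal traffic, and a payload is scored by the fraction of its n-grams never seen during training. The n-gram size, the filter size, the salted SHA-256 hashing, and all names here are assumptions. The randomized n-grams used against mimicry attacks are not shown.

```python
import hashlib


class BloomFilter:
    """Bit array with k salted SHA-256 hashes (hash choice is an assumption)."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size, self.k = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        # May return false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(payload: bytes, n: int):
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def train_ngrams(bloom, payloads, n=3):
    for p in payloads:
        for g in ngrams(p, n):
            bloom.add(g)


def anomaly_score(bloom, payload, n=3):
    """Fraction of the payload's n-grams never seen during training."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(g not in bloom for g in grams) / len(grams)


bloom = BloomFilter()
train_ngrams(bloom, [b"GET /index.html HTTP/1.0"])
```

Membership in the filter is binary, so no frequency distribution has to be estimated. This is the property the text credits with reducing FPs as n grows.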
.. Host-based techniques
A survey of the latest (i.e., proposed between and ) host-based detection approaches is provided in this section. Most of the techniques leverage the system calls that processes invoke on the kernel to create models of the normal behavior of those processes.
In Table . the selected approaches are marked with bullets to highlight their specific characteristics. Such characteristics are based on our analysis and experience; thus, other classifications may be possible. They are defined as follows:
Syscall refers, in general, to the use of system calls to characterize normal host activity. For example, a process may be modeled by the stream of system calls invoked during normal operation.
Stochastic means that stochastic techniques are exploited to create models. For example, the model of normal processes may be constructed by estimating the sample mean and variance of certain features (e.g., the length of the path argument of open).
Deterministic means that certain features are modeled following a deterministic approach. For example, normal processes may be only those that use a fixed number, say , of file descriptors.
Comprehensive approaches are those that have been extensively developed and, in general, incorporate a rich set of features, beyond the proof-of-concept.
Context refers to the use of context information in general. For example, the normal behavior of processes can also be modeled by means of the number of environment variables utilized or by the sequence of system calls invoked.
Data means that the data flow is taken into account. For example, normal processes may be modeled by the set of values of the system call arguments.
Forensics means that the approach has also been evaluated for off-line, forensics analysis.
Our contributions are included in Table . and are detailed in Chapter . Note that, since recent research has experimented with several techniques and algorithms, mixed approaches exist and often lead to better results.
Host-based anomaly detection has been part of intrusion detection since its very inception: it already appears in the seminal work [Anderson, ]. However, the definition of a set of statistical characterization techniques for events, variables and counters, such as the CPU load and the usage of certain commands, is due to [Denning, ]. The first mention of intrusion detection through the analysis of the sequence of syscalls from system processes is in [Forrest et al., ], where normal sequences of system calls are considered. A similar idea was presented earlier in [Ko et al., ], which proposes a misuse-based approach in which the canonical sequence of calls of each and every program must be described manually, something evidently impossible in practice.
In [Lee and Stolfo, ] a set of models based on data mining techniques is proposed. In principle, the models are agnostic with respect to the type of raw event collected, which can be user activity (e.g., login time, CPU load), network packets (e.g., data collected with tcpdump), or operating system activity (e.g., system call traces collected with OpenBSM). Events are processed using automatic classification algorithms to assign labels drawn from a finite set. In addition, frequent episodes are extracted and, finally, association rules among events are mined. These three algorithms are combined at detection time. Events marked with wrong labels, unexpectedly frequent episodes, or rule violations all trigger alerts.
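The association-rule component can be illustrated with a brute-force sketch restricted to single-item rules of the form (attribute = value) ⇒ (attribute' = value'). Rules are kept when their support and confidence exceed thresholds, and a record that satisfies a rule's antecedent but not its consequent is reported as a violation. The classification and frequent-episode components are not shown. The thresholds, the record fields, and the exhaustive enumeration (rather than an Apriori-style search) are simplifying assumptions.

```python
from itertools import permutations


def mine_rules(records, min_support=0.3, min_confidence=0.9):
    """Single-item rules (attr=value) => (attr2=value2) over event records.
    Returns {(antecedent, consequent): confidence}."""
    n = len(records)
    itemsets = [set(r.items()) for r in records]
    items = set().union(*itemsets)
    rules = {}
    for a, c in permutations(items, 2):
        if a[0] == c[0]:
            continue  # both sides on the same attribute
        with_a = sum(1 for s in itemsets if a in s)
        with_both = sum(1 for s in itemsets if a in s and c in s)
        if with_both / n >= min_support and with_both / with_a >= min_confidence:
            rules[(a, c)] = with_both / with_a
    return rules


def violations(rules, record):
    """Rules whose antecedent holds in the record but whose consequent does not."""
    present = set(record.items())
    return [(a, c) for (a, c) in rules if a in present and c not in present]


records = ([{"service": "http", "dst_port": 80, "flag": "SF"}] * 9
           + [{"service": "smtp", "dst_port": 25, "flag": "SF"}])
rules = mine_rules(records)
```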
Table .: Taxonomy of the selected state of the art approaches for host-based anomaly detection (characteristics: Syscall, Stochastic, Deterministic, Comprehensive, Context, Data, Forensics). Our contributions are highlighted. Approaches compared: [Lee and Stolfo, ], [Sekar et al., ], [Wagner and Dean, ], [Tandon and Chan, ], [Kruegel et al., a], [Zanero, ], [Giffin et al., ], [Mutz et al., ], [Bhatkar et al., ], [Mutz et al., ], [Fetzer and Suesskraut, ], [Maggi et al., ], [Maggi et al., a], [Frossi et al., ].

Alternatively, other authors proposed to use static analysis, as opposed to dynamic learning, to profile a program's normal behavior. The technique described in [Wagner and Dean, ] combines the benefits of dynamic and static analysis. In particular, three models are proposed to automatically derive a specification of the application behavior: call-graph, context-free grammars (or non-deterministic pushdown automata), and digraphs. The building blocks of all the models are system calls. The call-graph is statically constructed and then simulated, while the program is running, to resolve non-deterministic paths. In some sense, the context-free grammar model, called abstract stack, is an evolution of the call-graph model, as it allows keeping track of the state (i.e., the call stack). The digraph (actually, k-graph) model keeps track of k-long sequences of system calls from an arbitrary point of the execution. Despite its simplicity, which ensures a negligible performance overhead with respect to the others, this model achieves the best detection precision.
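The detection-time check of a k-sequence model can be sketched as follows. Here the set of allowed k-long sequences is learned from example traces, in the style of [Forrest et al., ], whereas [Wagner and Dean, ] derive it statically from the program. Once the set exists, the check is the same. The window length and the trace contents are illustrative.

```python
def windows(trace, k):
    """All k-long contiguous subsequences of a system call trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


class KSequenceModel:
    """Set of allowed k-long system call sequences. Here the set is learned
    from example traces; a static analyzer could populate it instead."""

    def __init__(self, k=3):
        self.k = k
        self.allowed = set()

    def add_trace(self, trace):
        self.allowed.update(windows(trace, self.k))

    def mismatches(self, trace):
        """Start positions of windows that were never allowed."""
        return [i for i, w in enumerate(windows(trace, self.k))
                if w not in self.allowed]


m = KSequenceModel(k=3)
m.add_trace(["open", "read", "write", "close"])
```

Replacing write with execve produces two unknown windows, (open, read, execve) and (read, execve, close), starting at positions 0 and 1.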
In [Tandon and Chan, ] the LERAD algorithm (Learning Rules for Anomaly Detection) is described. Basically, it is a learning system to mine rules expressing normal values of arguments, normal sequences of system calls, or both. In particular, the basic algorithm learns rules in the form $A = a, B = b, \ldots \Rightarrow X \in \{x_1, x_2, \ldots\}$, where uppercase letters indicate parameter names (e.g., path, flags) while lowercase symbols indicate their corresponding values. For some reason, the rule-learning algorithm first extracts random pairs from the training set to generate a first set of rules. After this, two optimization steps are run to remove rules with low coverage and those prone to generate FPs (according to a validation dataset). A system call is deemed anomalous if no matching rule is found. A similar learning and detection algorithm is run among sequences of system calls. The main limitation of the described approach is that no relationship is learned among the values of different arguments of the same system call. Besides the unrealistic assumption regarding the availability of a labeled validation dataset, another side issue is that the evaluation is quite poor and uses non-standard metrics.
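The rule form $A = a, B = b, \ldots \Rightarrow X \in \{x_1, x_2, \ldots\}$ can be sketched as below. Each rule's consequent set is filled from the training calls that satisfy its antecedent. A call is flagged when it satisfies the antecedent of a learned rule but carries a consequent value outside the set; this violation-based reading of "no matching rule" is our assumption. Rule generation from random pairs, coverage pruning, and LERAD's scoring are not shown, and the field names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A = a, B = b, ... => X in {x1, x2, ...}"""
    antecedent: dict                       # e.g. {"syscall": "open", "flags": "O_RDONLY"}
    target: str                            # consequent attribute X
    allowed: set = field(default_factory=set)

    def applies(self, record):
        return all(record.get(k) == v for k, v in self.antecedent.items())

    def fit(self, training):
        """Collect the consequent values seen when the antecedent holds."""
        for rec in training:
            if self.applies(rec):
                self.allowed.add(rec.get(self.target))
        return self

    def violated_by(self, record):
        return self.applies(record) and record.get(self.target) not in self.allowed


def is_anomalous(rules, record):
    # Flag the call if any applicable rule has its consequent violated.
    return any(r.violated_by(record) for r in rules)


training = [{"syscall": "open", "flags": "O_RDONLY", "path": "/etc/passwd"},
            {"syscall": "open", "flags": "O_RDONLY", "path": "/etc/hosts"}]
rule = Rule({"syscall": "open", "flags": "O_RDONLY"}, "path").fit(training)
```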
LibAnomaly [Kruegel et al., a] is a library to implement stochastic, self-learning, host-based anomaly detection systems. A generic anomaly detection model is trained using a number of system calls from a training set. At detection time, a likelihood rating is returned by the model for each new, unseen system call (i.e., the probability of it being generated by the model). A confidence rating can be computed at training for any model by determining how well it fits its training set; this value can be used at runtime to provide additional information on the reliability of the model. When data is available, an overfitting rating can also be optionally computed by using cross-validation.
LibAnomaly includes four basic models. The string length model computes, from the strings seen in the training phase, the sample mean $\mu$ and variance $\sigma^2$ of their lengths. In the detection phase, given $l$, the length of the observed string, the likelihood $p$ of the input string length with respect to the values observed in training is

$$p(l) = \begin{cases} 1 & \text{if } l < \mu + \sigma, \\ \dfrac{\sigma^2}{(l - \mu)^2} & \text{otherwise.} \end{cases}$$

As mentioned in Example .., the character distribution model analyzes the discrete probability distribution of characters in a string. At training time, the so-called ideal character distribution is estimated: each string is considered as a set of characters, which are inserted into a histogram, in decreasing order of occurrence, with a classical rank order/frequency representation. During the training phase, a compact representation of the mean and variance of the frequency for each rank is computed. For detection, a Pearson $\chi^2$ test returns the likelihood that the observed string histogram comes from the learned model. The structural inference model encodes the syntax of strings. These are simplified before the analysis, using a set of reduction rules, and then used to generate a probabilistic grammar by means of a Markov model induced by exploiting a Bayesian merging procedure, as described in [Stolcke and Omohundro, c, b, b]. The token search model is applied to arguments that contain flags or modes. During detection, if the field has been flagged as a token, the input is compared against the stored values list. If it matches a former input, the model returns 1 (i.e., not anomalous); otherwise it returns 0 (i.e., anomalous).
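Three of the four models can be sketched as follows; the structural inference model is omitted. The string length model applies the bound in the equation above. The character distribution model keeps only the mean frequency of each rank (the ideal character distribution) and computes a Pearson $\chi^2$ statistic over rank bins. The six-bin segmentation used here is an assumption. The statistic is converted into a likelihood with the closed-form $\chi^2$ survival function for 5 degrees of freedom. The token search model assumes the field has already been classified as a token.

```python
import math
from collections import Counter

# Segmentation of the 256 ranks into six bins (an assumption of this sketch).
BINS = [(0, 0), (1, 3), (4, 6), (7, 11), (12, 15), (16, 255)]


class StringLengthModel:
    def fit(self, strings):
        lengths = [len(s) for s in strings]
        self.mu = sum(lengths) / len(lengths)
        self.var = sum((x - self.mu) ** 2 for x in lengths) / len(lengths)
        return self

    def likelihood(self, s):
        """1 if l < mu + sigma, sigma^2 / (l - mu)^2 otherwise. Using <= is
        equivalent at the boundary (the bound equals 1 there) and avoids a
        division by zero when sigma = 0."""
        length = len(s)
        if length <= self.mu + math.sqrt(self.var):
            return 1.0
        return self.var / (length - self.mu) ** 2


def rank_frequencies(s):
    """Relative character frequencies sorted in decreasing order (256 ranks)."""
    data = s.encode("utf-8")
    freqs = sorted(Counter(data).values(), reverse=True)
    n = len(data) or 1
    return [f / n for f in freqs] + [0.0] * (256 - len(freqs))


class CharDistributionModel:
    def fit(self, strings):
        dists = [rank_frequencies(s) for s in strings]
        # Ideal character distribution: mean relative frequency of each rank.
        self.icd = [sum(col) / len(dists) for col in zip(*dists)]
        return self

    def likelihood(self, s):
        n = len(s.encode("utf-8"))
        if n == 0:
            return 1.0
        observed = rank_frequencies(s)
        chi2 = 0.0
        for lo, hi in BINS:
            o = sum(observed[lo:hi + 1]) * n
            e = sum(self.icd[lo:hi + 1]) * n
            if e > 0:
                chi2 += (o - e) ** 2 / e
        # Survival function of a chi-square variable with 5 degrees of freedom.
        return (math.erfc(math.sqrt(chi2 / 2))
                + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
                * (1 + chi2 / 3))


class TokenSearchModel:
    def fit(self, values):
        self.known = set(values)
        return self

    def likelihood(self, value):
        return 1.0 if value in self.known else 0.0


paths = ["/etc/passwd", "/etc/hosts", "/etc/group"]
lengths = StringLengthModel().fit(paths)
chars = CharDistributionModel().fit(paths)
tokens = TokenSearchModel().fit(["O_RDONLY", "O_WRONLY"])
```

A path of a typical shape keeps a high likelihood under every model. A 300-character argument or a string made of a single repeated character is driven towards zero by the length and the character distribution models, respectively.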
In [Zanero, ] a general Bayesian framework for encoding the behavior of users is proposed. The approach is based on hints drawn from the quantitative methods of ethology and behavioral sciences. The behavior of a user interacting with a text-based console is encoded as a Hidden Markov Model (HMM). The observation set includes the commands (e.g., ls, vim, cd, du) encountered during training. Thus, the system learns the user behavior in terms of the model structure (e.g., number of states) and parameters (e.g., transition matrix, emission probabilities). At detection, unexpected or out-of-context commands are detected as violations (i.e., lower values) of the learned probabilities. One of the major drawbacks of this system is its limited applicability to real-world scenarios: in fact, today's host-based threats perform more sophisticated and stealthy operations than invoking commands.
In [Giffin et al., ] an improved version of [Wagner and Dean, ] is presented. It is based on the analysis of the binaries and incorporates the execution environment as a model constraint. More precisely, the environment is defined as the portion of input known at process load time and fixed until the process exits. In addition, the technique deals with dynamically-linked libraries and is capable of constructing the data-flow analysis even across different shared objects.
An extended version of LibAnomaly is described in [Mutz et al., ]. The basic detection models are essentially the same. In addition, the authors exploit Bayesian networks instead of naive thresholds to classify system calls according to the output of each model. This results in an improvement in the DRs. Some of our work described in Section is based upon this and the original version of LibAnomaly.
Data-flow analysis has also been recently exploited in [Bhatkar et al., ], where an anomaly detection framework is developed. Basically, it builds a Finite State Automaton (FSA) model of each monitored program, on top of which it creates a network of relations (called properties) among the system call arguments encountered during training. Such a network of properties is the main difference with respect to other FSA-based IDSs. Instead of a pure control-flow check, which focuses on the behavior of the software in terms of sequences of system calls, it also performs a so-called data-flow check on the internal variables of the program along their existing cycles. This approach has really interesting properties; notably, since it is not stochastic, useful properties can be demonstrated in terms of detection assurance. On the other hand, though, the set of relationships that can be learned is limited (whereas the relations encoded by means of the stochastic models we describe in Section .. are not decided a priori and are thus virtually infinite). The relations are all deterministic, which leads to a brittle detection model potentially prone to FPs. Finally, it does not discover any type of relationship between different arguments of the same call.
This knowledge is exploited in terms of unary and binary relationships. For instance, if an open system call always uses the same filename at the same point, a unary property can be derived. Similarly, relationships between two arguments are supported by inference over the observed sequences of system calls, creating constraints for the detection phase. Unary relationships include equal (the value of a given argument is always constant), elementOf (an argument can take a limited set of values), subsetOf (a generalization of elementOf, indicating that an argument can take multiple values, all of which are drawn from a set), range (specifies boundaries for numeric arguments), isWithinDir (a file argument is always contained within a specified directory), and hasExtension (file extensions). Binary relationships include equal (equality between system call operands), isWithinDir (file located in a specified directory; contains is the opposite), hasSameDirAs, hasSameBaseAs, and hasSameExtensionAs (two arguments have a common directory, base directory or extension, respectively).

    source_dir = dir; target_file = file;
    out = open(target_file, WR);
    push(source_dir);
    while ((dir_name = pop()) != NULL) {
        d = opendir(dir_name);
        foreach (dir_entry ∈ d) {
            if (isdirectory(dir_entry)) push(dir_entry);
            else {
                in = open(dir_entry, RD);
                read(in, buf);
                write(out, buf);
                close(in);
            }
        }
    }
    close(out);
    return 0;

    Learned model (states labeled by program point, with the relations on each transition):
    start(I, O)
    3:  FD3 = open(F3, M3)      M3 elementOf {WR}; F3 equal O
    6:  opendir(F6)             F6 isWithinDir I
    8:                          F8 isWithinDir F6; isDirectory F8
    11: FD11 = open(F11, M11)   F11 equal F8
    12: read(FD12)              FD12 equal FD11
    13: write(FD13)             FD13 equal FD3
    14: close(FD14)             FD14 equal FD11
    18: close(FD18)             FD18 equal FD3
    return(0)

Figure .: A data flow example with both unary and binary relations.
The behavior of each application is logged by storing the Process IDentifier (PID) and the Program Counter (PC), along with the system calls invoked, their arguments and the returned value. The use of the PC to identify the states in the FSA stands out as an important difference from other approaches. The PC of each system call is determined through stack unwinding (i.e., going back through the activation records of the process stack until a valid PC is found). The technique obviously handles process cloning and forking.
The learning algorithm is rather simple: each time a new value is found, it is checked against all the known values of the same type. Relations are inferred for each execution of the monitored program and then pruned on a set intersection basis. For instance, if relations $R_1$ and $R_2$ are learned from an execution trace $T_1$, but only $R_1$ is satisfied in trace $T_2$, the resulting model will not contain $R_2$. Such a process is obviously prone to FPs if the training phase is not exhaustive, because invalid relations would be kept instead of being discarded. Figure . shows an example (due to [Bhatkar et al., ]) of the final result of this process. During detection, missing transitions or violations of properties are flagged as alerts. The detection engine keeps track of the execution over the learned FSA, comparing transitions and relations with what happens, and raises an alert if an edge is missing or a constraint is violated.
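Intersection-based pruning can be sketched on a reduced relation vocabulary. The sketch supports only unary equality to a constant and binary equality with the latest value recorded at another program point. Each event is simplified to a (PC, arguments) pair, with the return value stored under a pseudo-argument `ret`. The FSA edges themselves are not modeled; a PC absent from the model stands in for a missing transition. The trace helper `copy_run` is hypothetical and loosely mirrors the subscripts of the figure's copy loop.

```python
def event_relations(pc, args, last_seen):
    """Relations holding at one system call: unary equality to a constant and
    binary equality with the latest value of an argument at another PC."""
    rels = set()
    for name, value in args.items():
        rels.add(("equal_const", name, value))
        for (pc0, name0), value0 in last_seen.items():
            if pc0 != pc and value0 == value:
                rels.add(("equal", name, pc0, name0))
    return rels


def trace_relations(trace):
    """Map each PC to the relations that held at every execution of it.
    trace: list of (pc, {argument_name: value}) events."""
    per_pc, last_seen = {}, {}
    for pc, args in trace:
        rels = event_relations(pc, args, last_seen)
        per_pc[pc] = per_pc[pc] & rels if pc in per_pc else rels
        for name, value in args.items():
            last_seen[(pc, name)] = value
    return per_pc


def learn(traces):
    """Set-intersection pruning: keep only relations holding in every trace."""
    model = {}
    for trace in traces:
        for pc, rels in trace_relations(trace).items():
            model[pc] = model[pc] & rels if pc in model else rels
    return model


def check(model, trace):
    """Return (pc, violated relations), or (pc, reason) for unknown PCs."""
    alerts = []
    for pc, rels in trace_relations(trace).items():
        if pc not in model:
            alerts.append((pc, "unknown program point"))
        elif model[pc] - rels:
            alerts.append((pc, model[pc] - rels))
    return alerts


def copy_run(out_path, out_fd, src_path, in_fd, write_fd):
    """Hypothetical trace of the copy loop (PCs 3, 11-14, 18)."""
    return [(3, {"path": out_path, "mode": "WR", "ret": out_fd}),
            (11, {"path": src_path, "mode": "RD", "ret": in_fd}),
            (12, {"fd": in_fd}), (13, {"fd": write_fd}),
            (14, {"fd": in_fd}), (18, {"fd": out_fd})]


model = learn([copy_run("/out", 3, "/src/a", 4, 3),
               copy_run("/out2", 5, "/src/b", 6, 5)])
```

The file names and descriptor numbers differ between the two training runs, so their constant relations are pruned. Relations such as "the descriptor written at 13 equals the one returned at 3" survive. A run that writes into the input descriptor instead violates that relation at PC 13, and consequently also the derived relation at PC 18.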
This FSA approach is promising and has interesting features, especially in terms of detection capabilities. On the other hand, it only takes into account relationships between different types of arguments. Also, the set of properties is limited to predefined ones and is totally deterministic. This leads to a possibly incomplete detection model potentially prone to false alerts. In Section . we detail how our approach improves the original implementation.
Another approach based on the analysis of system calls is [Fetzer and Suesskraut, ]. In principle, the system is similar to the behavior-based techniques we mentioned before. However, the authors have tried to overcome two limitations of learning-based approaches, which typically have high FPRs and require a rather large training set. The latter issue is mitigated by adopting a completely different approach: instead of requiring training, the system administrator is required to specify a set of small whitelist-like models of the desired behavior of a certain application. At runtime, these models are evolved and adapted to the particular context in which the protected application runs; in particular, the system exploits taint analysis to update a system call model on demand. This system can offer very high levels of protection, but the effort required to specify the initial model may not be trivial. However, the effort may be worthwhile for mission-critical applications, for which customized hardening would be needed anyway.
.. Web-based techniques
A survey of the latest (i.e., proposed between and ) web-based detection approaches is provided in this section. All the techniques included in the following are based on HTTP, meaning that models of normal activity are constructed either by inspecting the decoded network frames up to the HTTP layer, or by acting as reverse HTTP proxies.
In Table . the selected approaches are marked with bullets to highlight their specific characteristics. Such characteristics are based on our analysis and experience; thus, other classifications may be possible. They are defined as follows:
Adaptive refers to the capability of self-adapting to variations in the normal behavior.
Stochastic means that stochastic techniques are exploited to create models. For example, the model of normal HTTP requests may be constructed by estimating the sample mean and variance of certain features (e.g., the length of the string parameters contained in a POST request).
Deterministic means that certain features are modeled following a deterministic approach. For example, normal HTTP sessions may be only those that are generated by a certain finite state ma