IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 42, NO. 6, NOVEMBER 2012 1579

Dynamical System Theory for the Detection of Anomalous Behavior in Computer Programs

Nitin Kanaskar, Remzi Seker, Member, IEEE, Jiang Bian, Member, IEEE, and Vir V. Phoha, Senior Member, IEEE

Abstract—Code injection is a common approach used to exploit applications. We introduce some of the well-established techniques and formalisms of dynamical system theory into the analysis of program behavior via system calls to detect code injections into an application's execution space. We treat a program as a black-box dynamical system whose internals are not known but whose output we can observe. The observable of this black-box system in our model is the sequence of system calls the program makes. The collected system calls are treated as signals from which we reconstruct the system's phase space. Then, using well-established techniques from dynamical system theory, we quantify the complexity of the system's (program's) behavior. The change in the behavior of a compromised system is detected as anomalous behavior compared with the baseline established from a clean program. We test the proposed approach against the DARPA-98 dataset and a real-world exploit, and we present code injection experiments to show the applicability of our approach.

Index Terms—Anomalous behavior, approximate entropy, central tendency measure (CTM), dynamical system, intrusion detection, percent determinism, percent ratio, percent recurrence, recurrence plots, system call sequence.

I. INTRODUCTION

ACCORDING to McAfee Avert Labs [1], the number of malware has been growing exponentially. While there were more than 135 000 pieces of malware in 2007, in 2008 this number went up to almost 1.5 million, averaging about 3500 pieces of malware a day. This number skyrocketed to 20 million in 2010 [2], while in the first quarter of 2011 more than 6 million unique malware samples were identified by McAfee Labs. At the current rate of growth, the cumulative malware collection is expected to reach 75 million samples by the end of 2011 [3].

Given the exponential growth in the number of malware, approaches differing from those based on developing signatures for every malware are needed. System call analysis changes the focus from developing signatures that identify malware to developing signatures that can identify an application's normal and abnormal behavior [4]–[6]. In this application-oriented perspective, the focus is on how the application behaves, and deviation from preset values is considered anomalous.

Manuscript received July 12, 2011; revised December 13, 2011 and March 6, 2012; accepted June 28, 2012. Date of current version December 17, 2012. This work was supported in part by the Louisiana Board of Regents under P-KSFI Grant LEQSF (2007-12)-ENH-PKSFI-PRS-03. The work of V. Phoha was supported by AFOSR Grant FA9550-09-1-0715. This paper was recommended by Associate Editor J. Wang.

N. Kanaskar is with the University of Arkansas at Little Rock, Little Rock, AR 72204 USA, and also with the University of Arkansas for Medical Sciences, Little Rock, AR 72205 USA (e-mail: [email protected]).

R. Seker was with the Department of Computer Science, University of Arkansas at Little Rock, Little Rock, AR 72204 USA. He is now with the ECSSE Department, Embry-Riddle Aeronautical University (e-mail: [email protected]).

J. Bian is with the Division of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205 USA (e-mail: [email protected]).

V. V. Phoha is with the Center for Secure Cyberspace, Louisiana Tech University, Ruston, LA 71272 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCC.2012.2208187

Detecting derailed applications and intrusions constitutes an important part of the information security framework of a computing network. Intrusive attacks often take place by derailing an application and making it behave in ways that were not anticipated. The primary objective of our work is the timely detection of system call pattern changes in order to recognize abnormal application behavior. In this paper, we present a dynamical system theory approach to system call analysis with the distinct goal of detecting successful attacks. A derailed application is an application that is being used in a new context. This new context should reflect itself in the dynamics associated with the system calls made by the derailed application. To accomplish the goal of detecting anomalous behavior, we characterize the normal behavior of an application on the basis of certain dynamical system characteristics. Values of selected dynamical system behavior characterization tools, such as approximate entropy, central tendency measure (CTM), and recurrence plot-derived measures, are observed for the abnormal behavior of the application and compared with the values obtained during its normal behavior. Collectively, the analysis using these tools enables us to differentiate between the normal and abnormal behaviors of an application. The applications selected for the dynamical system analysis are httpd, vsftpd, named, cupsd, and proftpd. All of these daemon programs exhibit a very well defined behavioral pattern. The specific range of dynamical system behavior characterization values defines a daemon's normal execution. Any abnormal activity on the part of a daemon is believed to result in considerable change in the monitored values. It is our belief that the dynamical system approach can identify such sudden behavioral deviations for these daemons with a high probability compared with normal user applications.

Another significant motive behind using these daemons in our study is that these programs can be compromised by an attacker to gain root-level access, albeit for a short time period. Thus, quicker methods to detect abnormal activities of daemons are always needed.

The rest of this paper is organized as follows. Section II explains some basic theoretical background on dynamical system analysis necessary for system call analysis. Section III describes the simulation experiment environment setup for different applications. Section IV explains various analysis approaches. Section V discusses and analyzes the results of our prototype implementation for the daemon applications httpd, vsftpd, named, cupsd, and proftpd. The first four daemons are subjected to the simulation experiment, whereas the analysis of proftpd behavior is performed by running a real-world exploit on it. A behavior analysis of the DARPA-98 [7] system call argument data is also presented in this section. A few research projects on system call argument values exist in the literature, notable ones among which are [8]–[10]. Section VI compares the dynamical system approach presented in this paper with a few of the relevant research works. Section VII presents a discussion of our method. Section VIII summarizes the work. Section IX presents a discussion of future work, such as testing our theory on other operating systems and building a classification model for intrusion detection based on dynamical system measurements.

1094-6977/$31.00 © 2012 IEEE

II. DYNAMICAL SYSTEM APPROACH

Dynamical system theory has been explored by researchers to understand and explain nonlinear behavior in nature, such as turbulence in the sea and atmosphere, fluctuation in wildlife populations, accumulation of vehicles on highways, oil flow in underground pipes, electronic devices, and many other universally diverse events [11]. An excellent introduction to nonlinear dynamics and chaos can be found in [12]. Researchers have also performed extensive analysis of how stock prices and foreign exchange rates vary from a nonlinear dynamics theory viewpoint [13]. A dynamical system, put simplistically, traverses a set of states called the state space. System states can be defined in terms of the values of variables in the system. A dynamical system's behavior appears random to standard statistical tests, although it traverses the state space on the basis of some deterministic rules.

As an example, traffic flow on a highway appears random in movement when observed, yet it is governed by deterministic rules laid down by the traffic system in the given place. Three of the variables that can be observed in such a system are as follows: 1) the number of vehicles stopped at a traffic light; 2) the number of vehicles moving between this traffic light and neighboring traffic lights; and 3) the average speed of traffic moving between neighboring traffic lights. These variables are affected by unpredictable events such as accidents, the presence of an ambulance in traffic, or just individual driving styles. Hence, this dynamical system's behavior has aspects of randomness or unpredictability along with its determinism. In a similar vein, a software application can be considered to have a set of states. A number of observable system variables define its state, such as memory space occupied, processor cycles used, number of client requests being handled in the case of a server application, amount of file system resources being accessed by the application, and the application's system call trace. At any given instant during its execution, these variables can be influenced by unpredictable factors such as available memory space, the number of other applications vying for processor cycles, the number of clients connected to the server, the amount of network bandwidth available, etc.

Fig. 1. Dynamical system perspective of a software application.

Determinism of the system comes from the fact that the instructions of a software program are executed in blocks according to the functionalities implemented in the application. Thus, instructions are executed in a predefined sequence, but depending upon the input values to the application and the unpredictable factors mentioned previously, this sequence may change. The system call sequences invoked by the software exhibit this determinism characteristic of the application at a fine-grained layer. They represent individual actions performed by applications, such as opening a file, reading data from a file, closing a file, and so on. A sequence of system calls invoked by the application can be considered as a sequence of states it goes through during its execution. In other words, the system call trace becomes a system observable representing the application's behavior. The system call sequence for the application can be observed and studied over a period of time to characterize application behavior in terms of its degree of determinism.

Fig. 1 presents our overall approach of using an application's system call sequence as a dynamical system observable. The system call trace generated by the application program may be considered an observable through a 1-D time series, from which we can reconstruct the state space of the application by the process of embedding [14], [15]. We believe an application's long-term behavior dynamics can be better understood by reconstructing the application's state space through embedding and then applying certain dynamical system analysis tools. We choose approximate entropy, CTM, and recurrence plot analysis techniques to study and characterize an application's long-term behavior. These measures characterize a system in terms of degree of determinism, similarity, and rate of variability, which are the characteristics we find useful for creating an application's normal behavior profile. Embedding dimension and time delay are determined empirically, which is the most common method of ascertaining these parameters [15].
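The time-delay embedding step can be sketched as follows (a minimal sketch; the paper does not prescribe an implementation, and the function name is ours):

```python
import numpy as np

def delay_embed(series, m=2, tau=1):
    """Reconstruct an m-dimensional state space from a 1-D observable:
    point i is (x[i], x[i+tau], ..., x[i+(m-1)*tau])."""
    x = np.asarray(series, dtype=float)
    n_points = len(x) - (m - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # one column per delayed copy of the series
    return np.column_stack([x[k * tau : k * tau + n_points] for k in range(m)])
```

For example, `delay_embed([1, 2, 3, 4, 5], m=2, tau=1)` yields the four state-space points (1, 2), (2, 3), (3, 4), and (4, 5).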

We now give a theoretical description of the dynamical system behavior characterization measures.

A. Approximate Entropy

Using this measure, we study an application's behavior from the perspective of the system's information complexity, utilizing the state space reconstructed for the application from its system call time-series data. The approximate entropy measure was proposed by Pincus [16] to assess a system's information complexity. It is a statistical measure capable of classifying complex systems with relatively few data points. Approximate entropy has been successfully utilized to quantify complexity in physical as well as physiological systems. It works satisfactorily on small lengths of time-series data to give their complexity measure [17].

Let us consider a 1-D time series:

$$x = x(1), x(2), x(3), \ldots, x(N). \quad (1)$$

All the scalar components of this time series are equispaced in time. A series of m-dimensional points $u(1), u(2), u(3), \ldots, u(N - m + 1)$ is formed from (1) such that

$$u(i) = [x(i), x(i + 1), x(i + 2), \ldots, x(i + m - 1)]. \quad (2)$$

Each of these points specifies a point in the reconstructed m-dimensional state space. A measure $C_i^m(r)$ is defined as

$$C_i^m(r) = N_{ij}/(N - m + 1) \quad (3)$$

where $N_{ij}$ is the number of points $u(j)$ such that the Euclidean distance $|u(i) - u(j)| < r$, and $r$ is the radius of the sphere in m-dimensional space centered at $u(i)$. The Euclidean distance is given as

$$|u(i) - u(j)| = \sqrt{\sum_{k=0}^{m-1} [x(i + k) - x(j + k)]^2}. \quad (4)$$

We define

$$C^m(r) = (N - m + 1)^{-1} \sum_{i=1}^{N-m+1} C_i^m(r) \quad (5)$$

$$\phi^m(r) = (N - m + 1)^{-1} \sum_{i=1}^{N-m+1} \log C_i^m(r). \quad (6)$$

The approximate entropy for some fixed values of $m$ and $r$ is defined as

$$\mathrm{ApEn}(m, r) = \lim_{N \to \infty} \left[ \phi^m(r) - \phi^{m+1}(r) \right]. \quad (7)$$
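On a finite series, the computation of (1)-(7) can be sketched directly (a hedged sketch: the function names are ours, the limit in (7) is simply evaluated at the available data length, and self-matches are counted so that the logarithm in (6) is always defined):

```python
import numpy as np

def _phi(x, m, r):
    """phi^m(r) of (6): mean log fraction of embedded points within r."""
    n = len(x) - m + 1
    u = np.array([x[i:i + m] for i in range(n)])            # points of (2)
    # pairwise Euclidean distances of (4)
    d = np.sqrt(((u[:, None, :] - u[None, :, :]) ** 2).sum(axis=2))
    c = (d < r).sum(axis=1) / n                             # C_i^m(r) of (3)
    return np.log(c).mean()

def approximate_entropy(x, m=2, r=0.5):
    """ApEn(m, r) = phi^m(r) - phi^{m+1}(r), as in (7), on a finite series."""
    x = np.asarray(x, dtype=float)
    return _phi(x, m, r) - _phi(x, m + 1, r)
```

A perfectly regular series yields an approximate entropy of zero; irregular sequences yield larger values, which is the property exploited later to compare normal and injected traces.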

B. Central Tendency Measure

Here, we attempt to establish a measure of the chaotic behavior of the application's system call time series by calculating its CTM [18]. CTM is a metric that evaluates the degree of variability in given data. CTM has been employed in the analysis of various physiological processes, such as heart rate variability and the behavior of schizophrenic patients [19]. The second-order difference plot for a time series $a(1), a(2), \ldots, a(n)$, obtained by plotting $a(n + 2) - a(n + 1)$ versus $a(n + 1) - a(n)$, acts as a tool to measure this variability factor for a dynamical system. For a time series $a(1), a(2), \ldots, a(N)$ of length $N$, if $r$ denotes the radius of the sphere around the origin, then

$$\mathrm{CTM} = \left[ \sum_{i=1}^{N-2} \delta(d_i) \right] / (N - 2) \quad (8)$$

where

$$\delta(d_i) = \begin{cases} 1, & \text{if } \left\{ [a(i+2) - a(i+1)]^2 + [a(i+1) - a(i)]^2 \right\}^{0.5} < r \\ 0, & \text{otherwise.} \end{cases} \quad (9)$$

The value of the radius $r$ is selected depending upon the nature of the data [19].
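The CTM computation translates into a few lines of code (a minimal sketch; the function name is ours, and the distance is taken from the origin of the second-order difference plot, consistent with r being the radius of a sphere around the origin):

```python
import numpy as np

def central_tendency_measure(a, r):
    """Fraction of second-order-difference points,
    (a(i+2)-a(i+1), a(i+1)-a(i)), falling within radius r of the origin."""
    a = np.asarray(a, dtype=float)
    d1 = a[2:] - a[1:-1]             # a(i+2) - a(i+1)
    d2 = a[1:-1] - a[:-2]            # a(i+1) - a(i)
    return float((np.sqrt(d1**2 + d2**2) < r).mean())
```

A nearly constant series gives a CTM close to 1 (low variability), while a series with large swings gives a CTM close to 0.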

C. Recurrence Plots

Recurrence plots constitute a dynamical time-series analysis technique developed by Eckmann et al. [20]. A recurrence plot graphically demonstrates the time correlation between different points in the state space of a dynamical system. Point $(i, j)$ in a recurrence plot is marked black (or 1) if the two points representing the system states at instants $i$ and $j$ are close enough, as defined by a criterion of Euclidean distance $r$. Thus,

$$\mathrm{RP}(i, j) = \begin{cases} 1, & \text{if } d[x(i), x(j)] \le r \\ 0, & \text{otherwise.} \end{cases} \quad (10)$$

Points $x(i)$ and $x(j)$ are part of the embedded time-series data. The ith row of this multidimensional vector represents the system state at the ith instant. Recurrence plots have been exploited to discern hidden patterns and nonstationarities in time-series data for physiological systems. Different structural elements in recurrence plots denote certain qualitative aspects of the time-series data in terms of determinism and recurrent patterns. We define percent recurrence, percent determinism, and percent ratio [21] to characterize an application's normal behavior.

1) Percent Recurrence: Percent recurrence gives the fraction of points in the multidimensional state space that repeat previous system dynamics. It helps us distinguish a process with periodic dynamic behavior from one with aperiodic behavior. The more points that are observed at the same states of a dynamical system, the more periodicity is exhibited by its reconstructed state space.

2) Percent Determinism: Percent determinism is associated with the line structures present in a recurrence plot. A line of identity (LOI) is formed on the plot by all points where i = j. This is the line with slope 1 that passes through the origin and divides the plot area into two congruent triangles. Other line structures may appear parallel to the LOI. Such a line is formed by points (1s) that are diagonally adjacent with no white spaces (0s) in between. For example, if pairs of consecutive points [x(i), x(j)], [x(i + 1), x(j + 1)], [x(i + 2), x(j + 2)], ..., [x(i + N), x(j + N)] in the multidimensional state space of a dynamical system exhibit the same dynamics, then the corresponding points in the recurrence plot form a line parallel to the LOI. The percentage of points lying on such lines expresses how much of the state-space structure repeats over consecutive states.

3) Percent Ratio: Percent ratio is the ratio of percent determinism to percent recurrence in the plot. This quantity captures the extent to which the system state space experiences sudden variations. Therefore, it is an effective indicator of sudden transitions in the state dynamics of a process. All of the aforementioned recurrence plot parameters strongly highlight the presence of hidden rhythms and determinism characteristics in the data.
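The three measures can be sketched for an embedded trajectory as follows (a hedged sketch: the minimum diagonal-line length, here 2, and the exclusion of the LOI from the counts are conventional choices that the text does not spell out):

```python
import numpy as np

def recurrence_measures(points, r, min_line=2):
    """Percent recurrence, percent determinism, and percent ratio from the
    recurrence matrix RP(i, j) of (10) for an (n, m) embedded trajectory."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
    recur = (dist <= r) & ~np.eye(n, dtype=bool)   # recurrent points off the LOI
    pct_rec = 100.0 * recur.sum() / (n * n - n)
    # count recurrent points lying on diagonal lines parallel to the LOI
    diag_pts = 0
    for k in range(1, n):                          # diagonals above the LOI
        run = 0
        for v in list(np.diagonal(recur, offset=k)) + [False]:
            if v:
                run += 1
            else:
                if run >= min_line:
                    diag_pts += run
                run = 0
    diag_pts *= 2                                  # the plot is symmetric
    pct_det = 100.0 * diag_pts / recur.sum() if recur.sum() else 0.0
    pct_ratio = pct_det / pct_rec if pct_rec else 0.0
    return pct_rec, pct_det, pct_ratio
```

For a strictly periodic trajectory, every recurrent point falls on a diagonal line, so percent determinism reaches 100 while percent recurrence stays bounded by the period.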

III. SIMULATION EXPERIMENT ENVIRONMENT

We first collect the system call trace for the normal execution of each daemon in the form of time-series data. Then, we simulate the code injection event for the daemon by changing its source code and recompiling it into a new executable binary. To mimic code injection, the added code is a simple for-loop inserted into one of the main source files of the daemon to invoke certain system calls repeatedly. The injected code causes the runtime image of the program to change, which aptly simulates the code injection phenomenon in such intrusions. The strace utility available on Linux is used to capture the system calls invoked in response to certain input requests by each daemon (httpd, vsftpd, named, and cupsd) into a normal-data text file. One of the source files is then changed to add the injected code as described earlier. System call trace files for the new daemon executable, which is now abnormal, in response to the same set of input requests are then collected in another file. Later, using the system call mappings presented in /usr/include/asm/unistd.h on Linux platforms, both normal and abnormal data files are converted into a 1-D numeric time-series format.
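The preprocessing step above can be sketched as follows (a hedged sketch: the regular expression and the tiny name-to-number table are illustrative only; the actual mapping is the full table from /usr/include/asm/unistd.h on the target system):

```python
import re

# Illustrative subset of the syscall name -> number mapping; the full table
# comes from /usr/include/asm/unistd.h on the target Linux system.
SYSCALL_NUMBERS = {"read": 3, "write": 4, "open": 5, "close": 6}

def trace_to_series(strace_lines):
    """Convert raw strace output lines into a 1-D numeric time series."""
    series = []
    for line in strace_lines:
        m = re.match(r"\s*(\w+)\(", line)       # syscall name before '('
        if m and m.group(1) in SYSCALL_NUMBERS:
            series.append(SYSCALL_NUMBERS[m.group(1)])
    return series
```

For instance, the three strace lines `open("/etc/motd", O_RDONLY) = 3`, `read(3, "...", 4096) = 512`, and `close(3) = 0` become the series [5, 3, 6].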

IV. ANALYSIS APPROACH

Four analysis approaches, explained next, are adopted to study the dynamical system behavior of each daemon [4].

A. Clustered Subsystem Approach

System calls from the trace are grouped, or clustered, according to their functionality, such as network, disk operation, memory, and access control checks. Each cluster is represented by a unique number in the trace to form a new number time series. This trace is generated to capture daemon behavior at a more abstract level. Each such time series is subjected to the process of embedding and then to dynamical system analysis, where the five behavior-characterizing measures are computed at cumulatively increasing data lengths and plotted against the corresponding data lengths.
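The clustering step can be sketched as follows (a hedged sketch: the cluster table below is hypothetical, since the paper names the functional categories but does not publish the exact call-to-cluster assignment):

```python
# Hypothetical functional clusters (disk=1, network=2, memory=3, access=4);
# the real table would cover every system call on the target platform.
CLUSTERS = {
    "open": 1, "read": 1, "write": 1, "close": 1,    # disk / file operations
    "socket": 2, "connect": 2, "sendto": 2,          # network
    "brk": 3, "mmap": 3, "munmap": 3,                # memory
    "access": 4, "stat": 4,                          # access-control checks
}

def cluster_series(syscall_names, unknown=0):
    """Replace each system call name by its functional-cluster number,
    producing the more abstract time series described above."""
    return [CLUSTERS.get(name, unknown) for name in syscall_names]
```

Calls outside the table map to a catch-all value so the series length always matches the raw trace.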

B. Nonclustered Subsystem Approach

This approach represents each system call with a unique number. Thus, the system call trace captured from the experiment is preprocessed to get a 1-D number time series. From this 1-D time series, the state space of the daemon's behavior is reconstructed by the process of embedding and subjected to dynamical system analysis for cumulatively increasing data lengths.

C. Clustered Children Processes as a Subsystem Approach

Each child process of a daemon is treated as a dimension, or system variable, of the main system. The system call trace for each child is preprocessed to get a 1-D time series and clustered according to functionality (i.e., each system call number is replaced by a unique number representing its system call function category). Each such series of equal length is combined to form a 2-D matrix of dimensions m × N, where m is the number of child processes of the daemon, and N is the data length of each child process trace. This matrix is then subjected to dynamical system analysis without the process of embedding, as we already have an m-dimensional time series representing the reconstructed state space of the system.

Fig. 2. Dynamical system measures for httpd (m = 2, τ = 1).

D. Nonclustered Children Processes as a Subsystem Approach

In this approach, an m-dimensional matrix is created exactly as previously, but without clustering the system calls.

Among the four approaches, it is observed that the nonclustered subsystem approach captures system dynamics best [4]. The plots of the dynamical system measures obtained with this approach clearly distinguish daemon abnormal behavior from normal behavior. An important aspect to note is that different daemons require different lengths of data for the detection of abnormal behavior. For brevity, we present only the graphical results obtained for the nonclustered subsystem approach here.

V. RESULTS AND OBSERVATIONS

A. Simulation Experiments

We investigate the dynamical system characteristics of httpd, vsftpd, cupsd, and named using the nonclustered subsystem approach and present our simulation results in the form of graphs. A common convention used in all the graphs for the daemons is that the square-point curve corresponds to normal-behavior system call data, and the asterisk curve represents abnormal-behavior system call data. Each figure shows the five dynamical system measures, approximate entropy, CTM, percent recurrence, percent determinism, and percent ratio, with the best embedding parameters m = 2 and τ = 1. For each of the daemons, the embedding process parameters are varied: m between 2 and 15, and τ between 1 and 3.

The values of embedding dimension m and time delay τ that give the best possible results for the shortest data lengths are selected. We determine the best result as the one leading to maximum discrimination between pre- and postcode-injection system call data. This discrimination needs to be achieved for as few data points as possible (i.e., the fewer the system calls needed for detection, the quicker the detection). The graphs for the httpd system call trace for embedding dimension m = 2 and time delay τ = 1 with the nonclustered subsystem approach are illustrated in Fig. 2. Figs. 3–5 show the graphs obtained for the vsftpd, named, and cupsd daemons using the same approach, respectively.

Fig. 3. Dynamical system measures for vsftpd (m = 2, τ = 1).

Fig. 4. Dynamical system measures for named (m = 2, τ = 1).

Preliminary results substantiate our claim that a thorough dynamical system approach allows us to discriminate between normal daemon execution and execution with code injection. For repeatability, we confirmed the validity of these measures by repeating the whole procedure at two different times. The behavioral changes due to code injection are reflected prominently in the graphs presented. We obtain the best discrimination between normal and abnormal data for embedding dimension m = 2 and time delay τ = 1.

From Fig. 3, we observe that the approximate entropy for vsftpd is reduced after the code injection. The normal system call trace of vsftpd consists of system call sequences of very small lengths (i.e., 2 to 3) invoked in an irregular manner. The injected code invokes three system calls 20 times repeatedly. This explains why the approximate entropy (i.e., information complexity) of the normal trace is higher than that of the injected code.

Fig. 5. Dynamical system measures for cupsd (m = 2, τ = 1).

Percent determinism is in agreement with approximate entropy: the complexity decreases after the code injection, and the degree of determinism increases.

The percent ratio graph shows that the rate of change of percent determinism with respect to percent recurrence after the code injection is higher than that for the normal trace. This reflects the fact that a bigger fraction of consecutive state-space points repeat system dynamics relative to a given fraction of overall repetitions. It is observed at higher values of m that this parameter reverses its behavior; hence, the value for postcode injection becomes lower than the value for precode injection.

Consider the behavior profile observed for the named daemon, as exhibited in Fig. 4. Approximate entropy decreases for both the normal and abnormal scenarios as m increases. This behavior reverses for higher embedding dimensions.

The absolute difference between the values for the two scenarios decreases as the embedding dimension and delay values increase. This means that the regularity of the system call sequences is reduced when they are embedded in higher dimensions and delays. The best discrimination between the two scenarios is achieved for m = 2 and τ = 1 (see Fig. 4). Percent recurrence is also reduced after the code injection event because the highly regular behavior of named is changed by the new sequence of system calls added to the source code; that is, fewer state-space points repeat previously exhibited system dynamics. The named daemon, when configured as a caching-only name server, has a very well defined action protocol: it forwards all the requests it receives to the primary or slave domain name servers specified in the named.conf file. That is why its system call trace has high determinism in its normal behavior. The source code is changed to add a loop of system calls (symlink, access, unlink), which is a completely new set of actions for the daemon. Hence, with the addition of this code, the percent determinism value of the daemon's system call trace decreases.

The graphs for cupsd (see Fig. 5) indicate that approximate entropy almost fails to distinguish the two behaviors of cupsd before and after code injection. Observation of the normal trace


of cupsd reveals that it contains highly regular patterns of system calls, whose regularity even exceeds that of the system calls inserted by the injected code.

This explains why the normal-behavior approximate entropy is smaller than that for the injected code. Percent determinism shows a noticeable difference between the two scenarios. This difference can be explained as follows: the trace has many system calls placed consecutively, making the state-space points repeat on consecutive instances, which gives rise to the normal-behavior percent determinism value. With the addition of the new code and its new set of repetitive system calls, the number of instances where dynamics repeat on consecutive points decreases, resulting in decreased values of percent determinism. The original code of cupsd has a set of system calls that is invoked with high regularity, giving a high percentage of state-space points that repeat previous dynamics. When the new code is injected, the percentage of such points is reduced because the new system calls differ from the normal-behavior system calls and occur at a lower frequency.

B. Experiments and Results With the DARPA-98 Dataset

In this section, we evaluate our dynamical system analysis approach on the DARPA-98 dataset. In this experiment, we extend our intrusion detection approach to consider system call argument values. Although quite a few critiques of the DARPA-98 data generation methods have appeared in the literature [22], we provide our analysis of the dataset here to further substantiate the dynamical system theory approach. Host-level system call audit data take the form of a collection of audit records, each corresponding to a system call and its associated parameters. We select the path argument accessed by each system call for processing, as it is a significant indicator of the system call's intent and, in turn, of the behavior dynamics of the application that invokes it. Each path argument string is converted into a unique number using the following procedure:

1) extract each character c from the string;
2) get the ASCII code n for the character;
3) multiply n by 2^i, where i is the index of the character in the string;
4) add the resulting product terms over all characters to give the final numeric representation of the string.

The number time series generated in this manner is then subjected to the dynamical system analysis method. Thus, this time series precisely represents the pattern of system paths (i.e., system resources) accessed over a given time period on a host machine by different applications. We present a detailed analysis of the results obtained here. In Fig. 6, each graph contains two curves: The solid blue line represents the path access pattern for the normal execution of a telnet session captured in a bsm trace file, and the dashed red curve corresponds to an instance of the eject attack, which is caused by a buffer overflow vulnerability in the eject utility on Solaris 2.5 platforms. The X-axis represents the cumulative data length processed, and the Y-axis represents the values of the different dynamical system measures. The sudden sharp overshoot exhibited by the dashed red curve is the data region that corresponds to the attack instance. In the absence of the attack, the curve would be

Fig. 6. Dynamical system measures for path argument (m = 2, τ = 1).

a continuous solid blue curve without the overshoot. The normal telnet session behavior curve is obtained by replacing the attack session data with data from a normal telnet session. The overshoot in all the graphs indicates with reinforced confidence that the system is going through something abnormal and hence deserves attention. A few points after the overshoot, the curve becomes almost a straight line, which corresponds to a noticeable increase in determinism in the system. This supports our basic hypothesis that system determinism is the deciding factor in identifying abnormal changes in system dynamics caused by intrusions. Each graph is plotted at increments of 100 actual data points; analysis starts from a data length of 500, and thus a total of 96 points are plotted on each graph. The overshoot is observed at the 69th point, which corresponds to actual data points from 330 200 to 330 300. Studying the corresponding data from the bsm audit files reveals repeated invocations of the system calls open and close with path arguments that do not exist. This kind of activity is typically perceived to be a hallmark preceding a code injection incident: each of these system calls performs no activity and acts as a NOP instruction. ffbconfig and fdformat are two other buffer overflow vulnerabilities that are detected by our analysis methods.
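The four-step path-to-number conversion described earlier in this section can be sketched as follows (the sample paths are illustrative):

```python
def encode_path(path):
    """Map a path argument string to a number: for each character c at
    index i, add ASCII(c) * 2^i (steps 1-4 of the conversion procedure)."""
    return sum(ord(c) * (2 ** i) for i, c in enumerate(path))

# Build the number time series from a sequence of path arguments.
series = [encode_path(p) for p in ["/etc/passwd", "/tmp/x", "/etc/passwd"]]
```

Identical paths map to identical values, so repeated access to the same resource shows up as recurrence in the resulting time series.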

C. Experiment With a Real-World Exploit

Given the growing number of critiques of the DARPA-98 experiments, it becomes imperative to evaluate the dynamical system approach on real-world exploits. For this purpose, we selected ProFTPD 1.3.0 as the target application to be exploited. Details of a working exploit for this daemon can be found in [23]. ProFTPD 1.3.0, installed by default in Ubuntu 6.10 with kernel 2.6.17, is known to have this vulnerability. We collect a normal-behavior system call trace of proftpd in response to some routine ftp operations on the daemon. Then the buffer overflow exploit is run against the daemon to obtain a command shell.


Fig. 7. Dynamical system measures for proftpd daemon, with attack data after first 100 healthy data points (m = 2, τ = 1).

Fig. 8. Dynamical system measures for proftpd daemon, with attack data after first 300 healthy data points (m = 2, τ = 1).

The corresponding system call trace is captured in a separate file. The system call trace is transformed into a 1-D numeric time series using the system call mapping file /usr/src/linux/include/asm-i386/unistd.h. Through experiments, we find that the nonclustered subsystem approach gives the best discrimination between the normal and attacked processes of the daemon. The normal-behavior system call trace for the daemon is analyzed for a data length of 1000 points. The process created as a result of the attack is captured and analyzed for 181 data points. For the purpose of our analysis, the attack-process system call trace is mixed with the system call trace of a normal process at known locations (i.e., time points). The mixed and normal traces are then analyzed using the dynamical system behavior measures, and the corresponding graphs are plotted. Fig. 7 shows all the dynamical system measures for the normal trace and the mixed trace when the attack trace is inserted after the first 100 normal trace points. Fig. 8 shows the dynamical system measures when attack data are inserted after

Fig. 9. Dynamical system measures for proftpd daemon, with attack data after first 500 healthy data points (m = 2, τ = 1).

Fig. 10. Comparing dynamical system measures between healthy and attacked proftpd daemons (m = 2, τ = 1).

the first 300 normal data points. Fig. 9 shows the dynamical system measures when attack data are inserted after the first 500 normal points.
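The transformation of a system call trace into a 1-D numeric time series via unistd.h can be sketched as below; the excerpted #define lines follow the i386 syscall table format, and the helper names are illustrative:

```python
import re

# Hypothetical excerpt of an i386 unistd.h system call table
UNISTD_H = """
#define __NR_open 5
#define __NR_close 6
#define __NR_symlink 83
"""

def syscall_map(header_text):
    """Parse '#define __NR_<name> <number>' lines into a name->number map."""
    pat = re.compile(r"#define\s+__NR_(\w+)\s+(\d+)")
    return {m.group(1): int(m.group(2)) for m in pat.finditer(header_text)}

def trace_to_series(trace, mapping):
    """Turn a captured system call trace into a numeric time series."""
    return [mapping[name] for name in trace]
```

In practice the full header from the running distribution would be read from disk rather than embedded as a string.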

After the successful execution of the attack, each command run on the shell forks a new process on the server. In our case, five such processes are created. We capture the system call traces of all five child processes and combine them (i.e., ordered by the timestamp of each system call) into one file to compare their dynamical system behavior with that of the normal process. The corresponding graphs are shown in Fig. 10.

For the aforementioned analysis, the normal-behavior system call trace obtained at two different times is plotted along with the combined trace of the five processes measured after a successful attack execution. Our motive is to substantiate the consistent difference between the dynamical system measures of attacked processes and those of normal processes. All processes' system call traces are generated in response to the same sequence of commands


run from the ftp command shell. We mix the attack process data with the normal data at known locations to check whether differences between the measures appear at the corresponding points in the graphs. From the graphs in Figs. 7–9, we can distinctly see that the attack process trace starts deviating from the normal trace from 100, 300, and 500 points onward, respectively.
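Locating where a mixed trace starts deviating can be reduced to a simple threshold rule against a normal-behavior baseline. This is an illustrative detector sketch, not the paper's exact procedure; the toy curves and the k-sigma threshold are assumptions:

```python
import statistics

def deviation_points(measure_curve, baseline, k=3.0):
    """Flag indices where a dynamical system measure deviates
    more than k standard deviations from the normal baseline."""
    mu = statistics.mean(baseline)
    sd = statistics.pstdev(baseline)
    return [i for i, v in enumerate(measure_curve) if abs(v - mu) > k * sd]

# A measure curve that tracks the baseline until an injected deviation.
baseline = [1.0, 1.1, 0.9, 1.0] * 5
curve = baseline + [5.0]
flagged = deviation_points(curve, baseline)
```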

VI. COMPARATIVE ANALYSIS

To distinguish our proposed approach, in this section we relate it to some of the relevant approaches proposed in the literature. Hofmeyr et al. proposed defining normal application behavior using small unique sequences of system calls [5]. Using a simplistic measure, the Hamming distance, they identified abnormal sequences. They unknowingly utilized the process of embedding by defining a database of normal system call sequences of fixed length. However, they warn that the Hamming distance is not a formally defined and proven metric for determining the abnormality of sequences. Thus, the advantage of simplicity is overshadowed by the lack of proper formal analysis of the system call layer. Our approach is based on widely accepted techniques that have been employed successfully in diverse fields [11], [12]. Kosoresow and Hofmeyr defined variable-length sequence macros and deterministic finite automata (DFA) to define the normal behavior of an application [6]. The DFA tended to be quite large for large applications like sendmail. There was no ideal method defined for creating the DFA, and it was created manually for each application to be tested. Our analysis approach can be fully automated; indeed, this is one planned future improvement of our work. Jones and Li incorporated temporal information between system call pairs in the database [24]. Under the assumption of a normal distribution for system call timing, they ignored some of the normal-behavior (e.g., I/O) system call timing information because of its large variance. We accommodate all system call sequences in our analysis except library calls and some older system calls. Cabrera et al. introduced the concept of an anomaly dictionary consisting of anomalous sequences for the classification of anomalies [25]. This feature was built upon the dictionary of normal sequences proposed by Hofmeyr [5]. This approach required the creation of the anomaly dictionary from known anomaly sequences.
Our approach, which is based on dynamical system analysis techniques, is independent of any such database of anomalous sequences. Nguyen et al. developed a buffer overflow attack detection system based on Linux kernel modification [26]. Their system relies on a database of all child processes forked by a given process. The main drawback of their method is that for many processes the child processes cannot be ascertained a priori. Moreover, their prototype required a training period of three months, which is quite long. Qiao et al. put forth the hidden Markov model (HMM) for application behavior profiling [27]. The number of states in the HMM was determined experimentally, as there was no formal method for doing so, and the HMM training process took a long time.

In comparison, we have shown that a dynamical system analysis approach can fare well in many respects. Experimentally,

we have determined that the minimum number of data points required for the method to be successful is around 100. Second, this method does not require a full-fledged database to store normal application sequences. Third, and very importantly, the method is independent of the timing characteristics of system calls. The foundation of the dynamical system approach to intrusion detection is the coexistence of determinism and some randomness; any intrusion activity that causes sudden deviations in the balance between the two can be detected by our analysis approach.

Mutz et al. adopted a methodology entirely different from all of the aforementioned approaches in that they formulated system call argument models in terms of argument lengths and character distribution [9]. They did not take system call sequences into consideration. They focused especially on the detection of mimicry attacks, which cannot be detected by most of the aforementioned approaches.

We have incorporated system call argument analysis into our approach as an extension of our previous work [4]. The primary advantage of our approach is that dynamical system analysis does not require an extensive training phase. In our implementation, preprocessing is employed to obtain 1-D time series of desired lengths from system call traces in all our experiments: the simulation experiments with various daemons, the DARPA-98 bsm audit files, and the real-world proftpd exploit. Then, the dynamical system measures and characteristic graphs are generated in MATLAB. There is a known issue of false-positive alarms for our approach when applied to the DARPA-98 dataset. These are attributed to the rapid changes occurring in a system during the startup phase of an application. Fine tuning of the implementation is required to overcome this drawback, and this remains an important future enhancement. Another distinct feature of our research is the analysis of more recently published real-world exploit data. We run a successful buffer overflow exploit against an FTP daemon to collect system call traces (the dynamical system states of the daemon), compute the dynamical system measures during the attack execution phase, and compare their values with the measures for normal behavior. Thus, our analysis and results are based not only on simulation data, but also on real-world attack data. Future expansion of our research work is invariably directed toward evaluating more real-world attack data on different applications.

VII. DISCUSSION

It is imperative to evaluate the performance of any new approach to a research problem from a scientific perspective along two dimensions: reliability and validity. Industry acceptance of a new intrusion detection system (IDS) approach crucially depends on the scientific rigor established on the basis of these two metrics. Tavallaee et al. further convert these two metrics into three factors suitable for IDS approach evaluation: the employed data, the performed experiments, and the performance evaluation [28]. Hence, an evaluation of our dynamical system approach is presented here on the basis of these three factors.


A. Employed Data

We use three types of data as inputs to our dynamical system approach. First, we perform simulation experiments using daemon applications (i.e., httpd, vsftpd, named, and cupsd). These applications exhibit well-defined behavior patterns compared with user applications. If not configured correctly, these daemons may be compromised by an attacker to attain system-level access. An attack almost invariably changes a daemon's behavior in terms of the system call pattern executed. This changed system call pattern is simulated by adding certain system calls to the source code of the daemon and passing the same input requests to the newly compiled daemon executable. We use the standard Linux configuration file unistd.h to convert the system call trace into a number series. Second, as system call arguments capture system call semantics at a very fine level, we used system call argument data from the DARPA-98 host-level bsm audit dataset. The bsm audit file is preprocessed to capture only the path argument of each system call, and these arguments are then converted into a number time series by employing a string-to-unique-number conversion algorithm. Third, we executed an actual exploit against the ProFTPD server on a vulnerable Ubuntu system and collected the system call trace from the compromised process. Our data thus consist of system call sequences and system call arguments, which represent application behavior fairly accurately.

B. Performed Experiments

All the experimental procedures have been clearly explained in the previous sections. In the first and third data preparation methods, the standard Linux system call mapping file unistd.h (i.e., from the respective Linux distributions) has been utilized to convert the system call traces into number time series.

C. Performance Evaluation

The efficiency of our dynamical system approach is described in terms of the time used to convert system call traces to number series and the time used to produce the dynamical system measures. All of these procedures complete within an average of 2–10 s in MATLAB. Although MATLAB is a great tool for numeric analysis, it certainly has limitations in terms of execution speed and performance. Therefore, we can reasonably expect a significant performance boost when using other programming languages such as C or C++. In Table I, we report the execution time to calculate each dynamical system measure in both MATLAB and C++. The C++ implementation is significantly faster than its MATLAB counterpart (on average, six to eight times faster). The system call trace lengths considered vary from 500 to 2000; the dynamical system measures considered are the approximate entropy, the percent recurrence and percent determinism from the recurrence plot (the percent ratio is merely the ratio of percent determinism to percent recurrence), and the CTM; and the parameters considered for all measures are m = 2 and τ = 1. As expected, the execution time increases with the length of the system call trace considered.
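The CTM can be computed from the second-difference scatter plot (after Cohen et al. [18]); the radius rho and the toy series below are illustrative choices:

```python
def central_tendency_measure(series, rho=1.0):
    """CTM: fraction of points in the second-difference scatter plot
    (dx[i+1] vs dx[i]) that fall within a circle of radius rho.
    Values near 1 indicate low variability (high central tendency)."""
    dx = [series[i + 1] - series[i] for i in range(len(series) - 1)]
    pts = [(dx[i], dx[i + 1]) for i in range(len(dx) - 1)]
    inside = sum(1 for x, y in pts if (x * x + y * y) ** 0.5 < rho)
    return inside / len(pts)
```

A constant trace concentrates all scatter points at the origin (CTM of 1), while a wildly alternating trace scatters them far outside the radius (CTM of 0).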

TABLE I
PERFORMANCE EVALUATION: MATLAB VERSUS C++

However, based on the results of the experiments, 500 system calls are sufficient to differentiate an intrusion from a normal process in most cases, and the total execution time of computing all dynamical system measures for 500 time points is merely 0.02 s in C++.

Through the experiments, we have found that the dynamical system approach can be utilized to model the normal behaviors of applications. Deviations from these normal behaviors, which are caused by intrusions, can be effectively identified from the dynamical system measures. This has been validated on simulated intrusions and real-world exploit data.

In our experiments, an anomaly is considered to be any behavior of an application that causes considerable deviation in the dynamical system measures from the established normal values. We do not use any data reduction technique, such as sampling, that could affect IDS performance. Regarding the ratio of abnormal to normal data in the dataset, we have not assumed any specific ratio; our goal is to test and validate our approach on as many real-world exploits as possible, which may exhibit different ratios of abnormal to normal data. The dynamical system approach depends upon two parameters employed in the embedding phase: the embedding dimension and the delay. These parameters are determined empirically for the best approximation of the system dynamics, as no universal method exists for doing so. The parameters that render the best differentiation between normal and abnormal behaviors are selected in our study. We performed the first and third experiments at different times to collect different data, which we analyzed using our dynamical system approach. Both times, the results obtained, in terms of the deviations in the dynamical system measures in the graphical plots, were found to be essentially identical.
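The empirical selection of the embedding parameters can be sketched as a small grid search over candidate (m, τ) pairs; the measure callables and the candidate grids here are assumptions for illustration, standing in for any of the dynamical system measures computed on normal and abnormal traces:

```python
def select_embedding(normal_measure, abnormal_measure,
                     ms=(2, 3, 4), taus=(1, 2, 3)):
    """Pick the (m, tau) pair that maximizes the gap between a measure
    computed on normal and on abnormal traces."""
    best, best_gap = None, -1.0
    for m in ms:
        for tau in taus:
            gap = abs(normal_measure(m, tau) - abnormal_measure(m, tau))
            if gap > best_gap:
                best, best_gap = (m, tau), gap
    return best
```

With measures shaped like those in our experiments, where the separation peaks at m = 2 and τ = 1, the search recovers that pair.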

VIII. CONCLUSION

Treating programs as blackbox dynamical systems whose behavioral characteristics change with code injection, and applying a set of dynamical system measures, gives promising results. Although system call sequence analysis is pivotal to identifying


anomalous behaviors of an application, it is the minimum length of the trace to be analyzed that is key to deciding whether a sequence is anomalous. The minimum data length (i.e., the number of system calls) needed for the detection of anomalous behavior varies for different daemons. Through our experiments, we have seen that a data length of 300 system calls was enough to detect anomalous behavior in all of the daemons we utilized.

The proposed approach requires profiling of applications in the sense that threshold values for the utilized measures must be defined for the programs/daemons that will be monitored. This requirement is not a serious disadvantage, as the threshold values can be calculated shortly after a program's installation. Focusing on the asset to be protected (i.e., in this case, the program) is easier than focusing on the large number of malware that pose a threat to the application. Therefore, the effort necessary for computing the threshold values is much less than that of identifying specific malware and developing signatures to protect against them.

We initially tested the dynamical system approach on simulation experiments using httpd 2.2.2, cups 1.2.7, vsftp 2.0.5, and bind 9.3.3 on Fedora Core 4 Linux. After obtaining promising results, we validated our approach using the DARPA-98 dataset. Finally, we confirmed the usefulness of the proposed approach using a real-world exploit against proftpd 1.3.0 on Ubuntu 6.10.

IX. FUTURE WORK

We plan to conduct the same experiments on platforms other than Linux, such as Windows and other Unix-based operating systems. We are currently working on experiments for the Windows platform, and the initial results confirm our findings.

Furthermore, a natural extension of our current work is to build a classification model using the dynamical system measures as feature vectors. Further research efforts are needed to find the best classification algorithms and feature selection/reduction methods, and to conduct a set of comprehensive experiments to evaluate the resulting predictor in terms of its prediction accuracy, precision and recall, and sensitivity and specificity.

REFERENCES

[1] McAfee Labs. (2009). 2009 threat predictions. [Online]. Available: http://www.mcafee.com/us/local_content/reports/2009_threat_predictions_report.pdf

[2] McAfee Labs. (2010). McAfee threats report: Fourth quarter 2010. [Online]. Available: https://secure.mcafee.com/us/resources/reports/rp-quarterly-threat-q4-2010.pdf

[3] McAfee Labs. (2011). McAfee threats report: First quarter 2011. [Online]. Available: https://secure.mcafee.com/us/resources/reports/rp-quarterly-threat-q1-2011.pdf

[4] N. Kanaskar, R. Seker, and S. Ramaswamy, "A dynamical system approach to intrusion detection using system call analysis," in Proc. SEKE, Knowledge Systems Institute Graduate School, 2007, pp. 710–717. [Online]. Available: http://dblp.uni-trier.de/db/conf/seke/seke2007.html#KanaskarSR07

[5] S. A. Hofmeyr, S. Forrest, and A. Somayaji, "Intrusion detection using sequences of system calls," J. Comput. Secur., vol. 6, pp. 151–180, Aug. 1998. [Online]. Available: http://portal.acm.org/citation.cfm?id=1298081.1298084

[6] A. P. Kosoresow and S. A. Hofmeyr, "Intrusion detection via system call traces," IEEE Softw., vol. 14, no. 5, pp. 35–42, Sep. 1997. [Online]. Available: http://portal.acm.org/citation.cfm?id=624621.625758

[7] D. J. Fried, I. Graf, J. W. Haines, K. R. Kendall, D. McClung, D. Weber, S. E. Webster, D. Wyschogrod, R. K. Cunningham, and M. A. Zissman, "Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation," in Proc. DARPA Inf. Survivability Conf. Expo., 2000, pp. 12–26.

[8] S. Bhatkar, A. Chaturvedi, and R. Sekar, "Dataflow anomaly detection," in Proc. 2006 IEEE Symp. Security Privacy, 2006, pp. 48–62. [Online]. Available: http://portal.acm.org/citation.cfm?id=1130235.1130362

[9] D. Mutz, F. Valeur, G. Vigna, and C. Kruegel, "Anomalous system call detection," ACM Trans. Inf. Syst. Secur., vol. 9, no. 1, pp. 61–93, Feb. 2006. [Online]. Available: http://doi.acm.org/10.1145/1127345.1127348

[10] G. Tandon and P. Chan, "Learning rules from system call arguments and sequences for anomaly detection," in Proc. Workshop Data Mining Comput. Security, 2003.

[11] S. Sharma, "An exploratory study of chaos in human–machine system dynamics," IEEE Trans. Syst., Man, Cybern. A, vol. 36, no. 2, pp. 319–326, Mar. 2006.

[12] S. H. Strogatz, Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering (Studies in Nonlinearity), 1st ed. Westview Press, Jan. 2001. [Online]. Available: http://www.worldcat.org/isbn/0738204536

[13] W. A. Brock, D. A. Hsieh, and B. LeBaron, Nonlinear Dynamics, Chaos, and Instability: Statistical Theory and Economic Evidence. Cambridge, MA: MIT Press, Oct. 1991.

[14] F. Takens, "On the numerical determination of the dimension of an attractor," in Dynamical Systems and Bifurcations (Lecture Notes in Mathematics), vol. 1125, 1985, pp. 99–106. [Online]. Available: http://dx.doi.org/10.1007/BFb0075637

[15] M. Small, Applied Nonlinear Time Series Analysis: Applications in Physics, Physiology and Finance (World Scientific Series on Nonlinear Science). Singapore: World Scientific, Mar. 2005.

[16] S. M. Pincus, "Approximate entropy as a measure of system complexity," Proc. Nat. Acad. Sci., vol. 88, pp. 2297–2301, 1991.

[17] M. Akay, "Influence of the vagus nerve on respiratory patterns during early maturation," IEEE Trans. Biomed. Eng., vol. 52, no. 11, pp. 1863–1868, Nov. 2005.

[18] M. Cohen, D. Hudson, and P. Deedwania, "Applying continuous chaotic modeling to cardiac signal analysis," IEEE Eng. Med. Biol. Mag., vol. 15, no. 5, pp. 97–102, Sep./Oct. 1996.

[19] D. Hudson, M. Cohen, and P. Deedwania, "Classification of heart failure patients using continuous chaotic modeling," in Proc. 18th World Congr. Med. Phys. Biomed. Eng., 1997.

[20] J.-P. Eckmann, S. O. Kamphorst, and D. Ruelle, "Recurrence plots of dynamical systems," Europhys. Lett., vol. 4, pp. 973–977, 1987.

[21] C. Webber and J. Zbilut, "Dynamical assessment of physiological systems and states using recurrence plot strategies," J. Appl. Physiol., vol. 76, no. 2, pp. 965–973, Feb. 1994.

[22] J. McHugh, "Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory," ACM Trans. Inf. Syst. Secur., vol. 3, no. 4, pp. 262–294, Nov. 2000. [Online]. Available: http://doi.acm.org/10.1145/382912.382923

[23] CoreLabs Research. (2006). ProFTPD controls buffer overflow. [Online]. Available: http://www.coresecurity.com/content/proftpd-controls-buffer-overflow

[24] A. Jones and S. Li, "Temporal signatures for intrusion detection," in Proc. 17th Annu. Comput. Security Appl. Conf., 2001, p. 252. [Online]. Available: http://portal.acm.org/citation.cfm?id=872016.872185

[25] J. B. D. Cabrera, L. Lewis, and R. K. Mehra, "Detection and classification of intrusions and faults using sequences of system calls," ACM SIGMOD Rec., vol. 30, no. 4, pp. 25–34, Dec. 2001. [Online]. Available: http://doi.acm.org/10.1145/604264.604269

[26] N. Nguyen, P. Reiher, and G. Kuenning, "Detecting insider threats by monitoring system call activity," in Proc. IEEE Syst., Man Cybern. Soc. Inf. Assurance Workshop, Jun. 2003, pp. 45–52.

[27] Y. Qiao, X. Xin, Y. Bin, and S. Ge, "Anomaly intrusion detection method based on HMM," Electron. Lett., vol. 38, no. 13, pp. 663–664, Jun. 2002.

[28] M. Tavallaee, N. Stakhanova, and A. Ghorbani, "Toward credible evaluation of anomaly-based intrusion-detection methods," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 5, pp. 516–524, Sep. 2010.


Nitin Kanaskar received the M.S. degree in applied science in 2006 from the University of Arkansas at Little Rock, Little Rock, where he is currently working toward the Ph.D. degree.

He is currently an Application System Analyst with the IT Research Department, University of Arkansas for Medical Sciences, Little Rock. His research interests include information security, machine learning, anomaly detection, knowledge discovery, and analysis.

Remzi Seker (M'96) received the B.Sc. and M.S. degrees from the Department of Electrical and Electronics Engineering, University of Cukurova, Adana, Turkey, and the Ph.D. degree in computer engineering from the University of Alabama at Birmingham, Birmingham.

He is a Professor in the Electrical, Computer, Software, and Systems Engineering (ECSSE) Department at Embry-Riddle Aeronautical University. His research interests include safety- and security-critical systems and computer forensics. He coauthored one of the first papers published on mobile phishing.

Jiang Bian (M'10) received the M.S. degree in computer science and the Ph.D. degree in integrated computing from the University of Arkansas at Little Rock, Little Rock, in 2007 and 2010, respectively.

He is currently an Assistant Professor of Biomedical Informatics with the University of Arkansas for Medical Sciences, Little Rock. His research interests include big-data analytics, brain connectivity networks, computational neuroscience, machine learning, anomaly detection, secure distributed file systems, and knowledge discovery and analysis.

Vir V. Phoha (M'96–SM'02) received the Ph.D. degree in computer science from Texas Tech University, Lubbock, in 1992. He is currently the W. W. Chew Endowed Professor of computer science and Director of the Center for Secure Cyberspace with Louisiana Tech University, Ruston. His research interests include Web and Internet security, machine learning, anomaly detection, fault mitigation in software systems, and knowledge discovery and analysis.