
A Graph Based Approach Toward Network Forensics Analysis

WEI WANG and THOMAS E. DANIELS
Iowa State University

In this article we develop a novel graph-based approach toward network forensics analysis. Central to our approach is the evidence graph model that facilitates evidence presentation and automated reasoning. Based on the evidence graph, we propose a hierarchical reasoning framework that consists of two levels. Local reasoning aims to infer the functional states of network entities from local observations. Global reasoning aims to identify important entities from the graph structure and extract groups of densely correlated participants in the attack scenario. This article also presents a framework for interactive hypothesis testing, which helps to identify the attacker’s nonexplicit attack activities from secondary evidence. We developed a prototype system that implements the techniques discussed. Experimental results on various attack datasets demonstrate that our analysis mechanism achieves good coverage and accuracy in attack group and scenario extraction with less dependence on hard-coded expert knowledge.

Categories and Subject Descriptors: K.6.5 [Management of Computing and Information Systems]: Security and Protection

General Terms: Security

Additional Key Words and Phrases: network forensics, evidence graph, hierarchical reasoning

ACM Reference Format:

Wang, W. and Daniels, T. E. 2008. A graph based approach towards network forensics analysis. ACM Trans. Inf. Syst. Secur. 12, 1, Article 4 (October 2008), 33 pages. DOI = 10.1145/1410234.1410238. http://doi.acm.org/10.1145/1410234.1410238.

Authors’ addresses: W. Wang and T. E. Daniels, Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011; emails: {weiwang, daniels}@iastate.edu.

This work is supported by the National Science Foundation under Grant No. 0627409, the DOI under contract No. NBCHC030107, and by the Iowa State University Information Assurance Center.

A preliminary version of this article appeared as “Building Evidence Graphs for Network Forensics Analysis” in proceedings of the 21st Annual Computer Security Applications Conference (ACSAC’05), Tucson, AZ, December 2005. A demonstration of our prototype system was presented at the 5th Digital Forensics Research Workshop (DFRWS’05), New Orleans, LA, August 2005.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credits is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from the Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, [email protected]

© 2008 ACM 1094-9224/2008/10-ART4 $5.00 DOI: 10.1145/1410234.1410238. http://doi.acm.org/10.1145/1410234.1410238.

ACM Transactions on Information and Systems Security, Vol. 12, No. 1, Article 4, Pub. date: October 2008.


1. INTRODUCTION

Networks today are plagued by the increasing scale and impact of cyber attacks. Although various network security techniques such as firewalls and intrusion detection systems have been developed for detection and prevention of attacks, these defensive mechanisms are not sufficient to eliminate cyber attack threats. It is important to develop effective post-hoc investigation mechanisms that hold attackers responsible for their malicious actions; this is the realm of network forensics. More formally, network forensics is a subfield of digital forensics which aims to identify suspicious entities in the attack scene and reconstruct stepwise actions of the attacker by reasoning with intrusion evidence captured from networked environments. The area of network forensics presents a rich problem space, including evidence collection, preservation, analysis and presentation. This article focuses on the analysis phase of network forensics.

Our work is motivated by the challenges in network forensics analysis. Below we summarize two major technical challenges in the field:

(1) Forensic analysts are overwhelmed by huge volumes of low-quality evidence. Evidence from standard security sensors, such as IDS alerts, contains large amounts of “background noise” that comes from false positives triggered by benign activities and from irrelevant attacks that are not part of the foreground attack scene of interest. Both tend to obscure the important information for identifying attackers and reconstructing the attack scenario.

(2) Cyber attacks are becoming increasingly sophisticated. There exist many variations of multi-stage attacks that consist of several evolving phases, span a large number of hosts, and utilize different strategies. Moreover, evidence of attacks is distributed across various evidence sources, each of which can only report isolated steps of a complex attack scenario. To reveal a comprehensive view of what occurred, analysts have to go through evidence from heterogeneous sensors and make the correlation based on expert knowledge.

Despite these challenges, network forensic analysis is still mainly performed with manual ad hoc methods, a time-consuming and error-prone process [Institute for Security Technology Studies 2004]. There exists an increasing need for effective, automated, and extensible analysis methods. In essence our research studies the problem of: How to effectively analyze large volumes of noisy forensic evidence to identify entities and events relevant to the attack scene in a systematic approach?

The main contribution of this article is a novel graph-based approach to help investigators discover coordinated attack scenarios from large amounts of noisy intrusion evidence. A flexible preprocessing mechanism is developed to reduce the volume and redundancy in collected intrusion evidence. We propose the evidence graph model to transform evidence from heterogeneous sources into an intuitive graph presentation. The evidence graph also provides a friendly interface for incorporating expert knowledge and out-of-band information into the analysis process. Based on the evidence graph, we design a hierarchical reasoning framework that integrates forensics analysis from two different views. Local reasoning is based on fuzzy inference with domain knowledge while global reasoning is based on graph structure analysis and clustering methods. We implement a prototype system of the proposed techniques, and preliminary experimental results with several attack datasets indicate the potential of our approach.

The rest of the article is organized as follows. The next section discusses related work. Section 3 presents the basic architecture of our forensics analysis mechanism. Section 4 describes the evidence preprocessing procedure. Section 5 proposes our evidence graph model. Section 6 describes our hierarchical reasoning framework. In Section 7, we provide experimental results in support of our approach. Section 8 concludes this article and discusses future work.

2. RELATED WORK

In the following, we review existing works in network forensics in Section 2.1. Section 2.2 discusses related IDS and alert correlation methods. Section 2.3 surveys related attack graph methods in network security analysis.

2.1 Existing Network Forensics Works

To our knowledge, little work has been done in automated network forensics analysis. Most mature forensics investigation tools like EnCase [EnCase] and Safeback [Safeback] focus on capture and analysis of evidence from storage media on a single host. Commercial tools like the eTrust network forensics tool [eTrust] and NetDetector [NetDetector] capture raw network data and investigate breaches inside corporate networks; however, the analysis procedure is still mostly manual and ad hoc.

ForNet [Shanmugasundaram et al. 2003] is a distributed logging mechanism that could be deployed in wide area networks to aid network forensics. ForNet transforms raw network data into succinct forms named Synopsis that can be stored for a prolonged period of time for forensic analysis. However, the ForNet work is limited to the evidence capture stage and does not address the critical problem of how to automate the process of reasoning about attack scenarios from saved synopses.

Brian Carrier described a model of the event reconstruction process for digital crime investigation [Carrier and Spafford 2004]. Although the work is mostly theoretical, it sheds light on the requirements for developing tools that transform collected evidence into digital crime scenes. Our work is a starting point to explore the technical considerations of automated network forensics analysis.

2.2 IDS and Alert Correlation Methods

Though intrusion detection systems (IDSs) are not developed for forensic analysis, investigators have recognized that IDSs are an important evidence source for network forensics because they are probably the most likely candidates to provide first-hand information about security violations in the network. IDSs are generally classified into two categories: anomaly detection and misuse detection [Debar et al. 1999]. Both have intrinsic limitations: anomaly detection techniques suffer from high false positive rates as anomalous behavior is benign in many cases. Misuse detection techniques rely on predefined patterns to detect attacks; therefore they are not able to catch unknown attacks. Network forensics analysis cannot solely rely on IDSs as they only catch known attacks or unusual behavior while many attacks are neither known nor anomalous.

The volume of IDS alerts and the base-rate fallacy make it difficult for forensics investigators to form a clear picture of what happened. Therefore alert correlation methods have been advocated as a means to help identify ongoing attacks on a higher abstraction level. By linking alerts that satisfy certain relationships together, alert correlation aims to reduce the volume of alerts and suppress the effect of false positives. Past work in alert correlation can be classified into the following categories.

Correlation methods based on similarity of alert attributes [Valdes and Skinner 2001; Julisch 2001, 2003; Cuppens 2001; Dain and Cunningham 2001a] are based on the notion that alerts that belong to the same attack often have similar attributes. Valdes and Skinner [Valdes and Skinner 2001] defined the attributes to be used for comparison, how to evaluate the extent of similarity, and the approach to assign weights for different attributes. We note that correlation of this category can be better depicted as clustering and aggregation. However, the correlation is not able to capture the causal relationships between events. Thus the system in itself is not sufficient to reconstruct an attack scenario.

The second category of correlation methods is based on attack scenarios pre-defined by experts [Debar and Wespi 2001; Morin and Debar 2003] or learned from training datasets [Dain and Cunningham 2001b]. The run-time correlation is like misuse detection: IDS alerts are matched with the predefined pattern to formulate an attack scenario. High-level languages like STATL [Eckmann et al. 2000] and the chronicles formalism [Morin and Debar 2003] have been developed to describe the scenario. However, an inherent limitation of these techniques is that they are not able to identify unknown novel attack scenarios. In addition, the large number of scenario variations makes defining a comprehensive knowledge base of scenarios difficult.

In recent works [Cuppens and Miege 2002; Ning et al. 2002; Ning and Xu 2003], a new correlation approach based on defining preconditions (prerequisites) and post-conditions (consequences) is introduced. Alerts are correlated together if an earlier alert’s post-condition matches or contributes to the precondition of a later alert. This method has the potential to identify new variations of attack scenarios as we do not need to specify the complete attack scenarios. However, this method requires that appropriate pre- and post-conditions be manually predefined for all individual attacks in the event stream, which is rarely feasible in real-world scenarios.

Correlation methods based on statistical analysis greatly differ from the previous three in that they do not require any prior knowledge of attacks. Qin and Lee use the Granger Causality Test time series algorithm to correlate alerts [Qin and Lee 2003, 2004]. The underlying notion is that in multi-step attack scenarios, alerts should have statistical similarities in their attributes. Theoretically this method has the ability to recognize novel attack scenarios; however, it is not sufficient by itself for the complete correlation process. Further verification and analysis is needed to reduce the effect of false positives and reconstruct the attack scenario.

These correlation methods are complementary to our approach. We extend a simple and flexible alert aggregation mechanism derived from the attribute-based method [Valdes and Skinner 2001] in our preprocessing module. Pre-post condition-based and predefined scenario-based methods can be leveraged in our global reasoning process to evaluate correlation scores.

We believe that current alert correlation methods are not satisfactory for network forensics analysis needs due to the following limitations:

—Performance of all alert correlation methods is strictly limited by the performance of the IDS. The correlation methods are not robust enough to deal with fragmentary and incomplete evidence.

—Lack of situational awareness in the correlation process makes these methods incapable of discovering events that seem fairly innocent individually, but are malicious when examined in correlation with certain context.

2.3 Attack Graphs and Vulnerability Analysis

Our work is also related to a variety of graph-based approaches in network security. Attack graphs have been widely used in modelling network vulnerabilities [Ritchey and Ammann 2000; Sheyner et al. 2002; Sheyner and Wing 2005; Phillips and Swiler 1998; Ramakrishnan and Sekar 1998; Jajodia et al. 2005]. In an attack graph, each node represents a state of the network under attack and each edge represents a malicious activity that leads to a network state transition. Therefore an attack graph enumerates all possible sequences of exploits that could be used to achieve the intrusion goal. Researchers have proposed various methods to generate and analyze attack graphs. Phillips and Swiler [Phillips and Swiler 1998] proposed to construct attack graphs based on predefined attack scenarios, which is time consuming and error prone. In recent works, model-checking methods have been used to automatically generate attack graphs [Sheyner et al. 2002; Sheyner and Wing 2005]. However, the primary limitation of these methods is scalability. Due to the huge volume of low-level system states and exploits involved, the analytical complexity of the attack graph grows exponentially and becomes computationally infeasible for realistic large networks. To improve scalability, Jajodia et al. [2005] developed the TVA tool for network vulnerability analysis, which generates a graph of dependencies among exploits that represents all possible attack paths without having to enumerate them.

We note that applying attack graphs to post-hoc intrusion investigation is difficult because accurate information on network states and vulnerabilities prior to the attack is often unavailable. We develop a graph model that is organized around hosts and forensic evidence from heterogeneous network sources. Though the modelling is less fine-grained, it provides a practical view of the attack that is more applicable in post-incident forensic analysis. In addition, the host-centric approach leads to a higher level of abstraction and better scalability. Our evidence graph also captures functional states, which are defined corresponding to potential roles in an attack scenario.

Fig. 1. Forensics analysis architecture overview.

3. THE NETWORK FORENSICS ANALYSIS ARCHITECTURE

Figure 1 shows the basic architecture of our proposed network forensics analysis system. The functionality of each component in the architecture is briefly described as follows:

—Evidence collection module collects digital evidence from heterogeneous sensors deployed on the networks and hosts under investigation.

—Evidence preprocessing module transforms collected evidence into a standardized format and performs aggregation to reduce the redundancy in raw evidence.

—Attack knowledge base provides knowledge of known exploits, including their phase classification and target vulnerabilities.

—Assets knowledge base provides knowledge of the networks and hosts under investigation, including network topology, system configuration and value of entities.

—Evidence graph manipulation module generates the evidence graph by retrieving preprocessed evidence from the depository. Hypotheses and out-of-band information are also instantiated into the evidence graph through graph edit operations.

—Attack reasoning module performs semi-automated reasoning based on the evidence graph. In the hierarchical reasoning process, results of local reasoning provide instant updates to the evidence graph.



The core of our analysis mechanism is the evidence graph manipulation module and the attack reasoning module. The graph manipulation module constructs the evidence graph with evidence retrieved from the depository and updates graph attributes with information from the knowledge bases. Following that, the attack reasoning module performs automated inference based on the evidence graph. Through the interface module, the analyst can provide expert opinions and out-of-band information by directly editing the evidence graph. The analyst can also incorporate hypotheses into the evidence graph by formulating specific searches for extended secondary evidence. The reasoning process is then performed on the updated graph to evaluate the credibility of the hypotheses.

4. EVIDENCE PREPROCESSING

In network forensics analysis, an event is an occurrence that changes the state of entities in the network. A malicious event is an event that serves certain malicious intentions of the attacker. Therefore we define evidence for network forensics investigation as digital data that provides information about malicious events that have occurred in a cyber attack scenario. Evidence can be classified into two categories: primary (foreground) evidence and secondary (background) evidence.

—Primary evidence refers to information that directly indicates attacks or security policy violations. Primary evidence generally comes from sensors designed for specific security purposes such as IDSs.

—Secondary evidence refers to information that is not an explicit indication of exploits or security violations, but could represent malicious events in certain contexts. Secondary evidence may come from various general-purpose sensors and in a much higher volume.

Intuitively, primary evidence is the triggering point of forensic investigation. Searching for secondary evidence usually has two objectives: to discover hidden suspicious events that are not detected by specific security sensors and to evaluate the trustworthiness of primary evidence. However, the huge volume and wide range of secondary evidence present a challenging problem for forensics analysis, as most event logs are irrelevant to security incidents. In this work, we propose a flexible approach that uses preliminary analysis results from primary evidence as the guideline to formulate hypotheses towards secondary evidence, which is presented in Section 6.3. In our current proof-of-concept prototype, we focus on one network-based evidence source in each category: network IDS alerts are used as primary evidence and network flow records are used as secondary evidence.

Tasks in the preprocessing stage include normalization and aggregation. All types of evidence from sensors are normalized with predefined templates. Following that, aggregation is performed to reduce redundancy in applicable types of evidence. The preprocessed evidence is stored in the evidence depository for retrieval in the reasoning process.



4.1 Evidence Normalization

The objective of the normalization process is to transform evidence from heterogeneous sensors into a uniform format for analysis. Although attributes of evidence vary with different sources, the following set of essential fields applies to all categories of evidence and has to be instantiated in the normalization process:

—ID: Unique index for the evidence record;

—Subject: The set of attributes that define the initiator of the underlying event;

—Object: The set of attributes that define the target of the underlying event;

—Action: Description of the nature of the event, such as its classification and possible consequence;

—Time: Time stamp(s) of the underlying event.

For intrusion alerts, we define a simplified template derived from IDMEF [IDMEF] to capture essential alert attributes. The result is denoted as a raw alert. The format of the raw alert template is {AlertID, Classification, SrcIP, DesIP, DetectTime, HyperID}. The HyperID field is reserved for the aggregation procedure presented below.

Similarly, we define a simplified template based on the Cisco NetFlow [NetFlow] format to uniquely define a flow record. The format of the network flow template is {FlowID, SrcIP, DesIP, SrcPort, DesPort, StartTime, EndTime, Protocol, ByteCount, ServiceType}.
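As an illustration only, the two templates can be written down as typed records. The sketch below is not the prototype's implementation: the Python field names, the use of epoch seconds for time stamps, and the sensor record keys ("sig", "src", "dst", "ts") are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawAlert:
    """Normalized IDS alert, mirroring the raw alert template."""
    alert_id: int
    classification: str
    src_ip: str
    des_ip: str
    detect_time: float               # assumed unit: seconds since epoch
    hyper_id: Optional[int] = None   # reserved; set later by aggregation


@dataclass
class FlowRecord:
    """Normalized flow record, mirroring the network flow template."""
    flow_id: int
    src_ip: str
    des_ip: str
    src_port: int
    des_port: int
    start_time: float
    end_time: float
    protocol: str
    byte_count: int
    service_type: str


def normalize_alert(alert_id: int, sensor_record: dict) -> RawAlert:
    """Map one sensor-specific record onto the raw alert template.

    The input keys are hypothetical stand-ins for whatever a
    particular sensor actually emits.
    """
    return RawAlert(
        alert_id=alert_id,
        classification=sensor_record["sig"],
        src_ip=sensor_record["src"],
        des_ip=sensor_record["dst"],
        detect_time=float(sensor_record["ts"]),
    )
```

A per-sensor function such as normalize_alert is the only place that needs to know a sensor's native format; everything downstream works on the uniform records.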

4.2 Evidence Aggregation

The large amount of redundancy in current IDS alerts makes it difficult to analyze the underlying attacks in an efficient manner. For example, a single event often generates many duplicate alerts in a short period, and only one instance needs to be kept for analysis purposes. The alert aggregation process aims to remove the duplicates and generate hyper alerts that are easy to analyze without losing relevant information.

We use alert aggregation based on similarity of attributes and context requirements to merge raw alerts into hyper alerts. The format of hyper alerts is {HyperID, Classification, SrcIP, DesIP, StartTime, EndTime, Count}. Each hyper alert has a one-to-many relationship with raw alerts. In the hyper alert template, the Count field records the number of raw alerts that are merged into the hyper alert for statistical evaluation. The HyperID field in the raw alert template records the unique identification number of the hyper alert it was merged into. The index mapping between raw alerts and hyper alerts enables analysts to backtrack and examine alerts at a finer scale.

We apply a flexible alert aggregation algorithm based on the Leader-Follower model. In essence it adapts the similarity-based alert correlation method proposed by Valdes and Skinner [Valdes and Skinner 2001]. The aggregation criterion is to combine alerts that have the same source-destination pair, belong to the same attack class, and whose time stamps fall in a self-extending time window. The time window of a hyper alert is self-extending in that if the time stamp of a raw alert falls outside the [start time, end time] window of the hyper alert but the difference is within a predefined limit T, the corresponding bound of the hyper alert time window is updated with the time stamp of the raw alert. This implies that, with a proper T value, we are able to merge continuous duplicate raw alerts that span a long period into a single hyper alert. The Leader-Follower alert aggregation procedure is shown in Algorithm 1.

Algorithm 1: Leader-Follower alert aggregation

input: A set of raw alerts r1 . . . rn, time limit T
output: A set of hyper alerts h1 . . . hm

begin
    h1 ← r1; m ← 1;
    for i ← 2 to n do
        merged ← 0;
        for j ← 1 to m do
            if ri.sourceaddr = hj.sourceaddr and hj.destaddr = ri.destaddr
               and hj.class = ri.class
               and hj.starttime − T ≤ ri.detecttime ≤ hj.endtime + T then
                hj.starttime ← min(hj.starttime, ri.detecttime);
                hj.endtime ← max(hj.endtime, ri.detecttime);
                ri.HyperID ← hj.HyperID;
                hj.count ← hj.count + 1;
                merged ← 1;
                break;
            end
        end
        if merged = 0 then
            m ← m + 1;
            hm ← ri;
            hm.count ← 1;
            hm.HyperID ← m;
        end
    end
end
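A minimal runnable sketch of the Leader-Follower procedure is given below for illustration; it is not the prototype's code. The record types and field names are assumptions, and the sketch departs from Algorithm 1 in two small ways that are worth stating: it accepts an empty input, and it also records the hyper alert identifier on the raw alert that becomes a new leader.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RawAlert:
    alert_id: int
    classification: str
    src_ip: str
    des_ip: str
    detect_time: float
    hyper_id: Optional[int] = None


@dataclass
class HyperAlert:
    hyper_id: int
    classification: str
    src_ip: str
    des_ip: str
    start_time: float
    end_time: float
    count: int = 1


def aggregate(raw_alerts: List[RawAlert], t_limit: float) -> List[HyperAlert]:
    """Merge raw alerts into hyper alerts with a self-extending window."""
    hyper_alerts: List[HyperAlert] = []
    for r in raw_alerts:
        for h in hyper_alerts:
            same_key = (
                r.src_ip == h.src_ip
                and r.des_ip == h.des_ip
                and r.classification == h.classification
            )
            in_window = (
                h.start_time - t_limit <= r.detect_time <= h.end_time + t_limit
            )
            if same_key and in_window:
                # Follower: extend the window and count the duplicate.
                h.start_time = min(h.start_time, r.detect_time)
                h.end_time = max(h.end_time, r.detect_time)
                h.count += 1
                r.hyper_id = h.hyper_id
                break
        else:
            # No matching hyper alert: this raw alert becomes a new leader.
            new_id = len(hyper_alerts) + 1
            hyper_alerts.append(
                HyperAlert(new_id, r.classification, r.src_ip, r.des_ip,
                           r.detect_time, r.detect_time)
            )
            r.hyper_id = new_id
    return hyper_alerts
```

With T = 5, three identical alerts at times 0, 4 and 8 collapse into one hyper alert whose window grows to [0, 8], whereas a fourth at time 30 falls outside the extended window and starts a new hyper alert.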

The effectiveness of the Leader-Follower aggregation process depends on the specific scenario being analyzed. Therefore we also consider the following variations in aggregation criteria for improved results:

4.2.1 Level of abstraction. One important variant that affects the results of aggregation is the evaluation of attack classification. A common practice is to take the classifications from well-known exploit collections such as CVE (Common Vulnerabilities and Exposures). However, it is common to observe that multiple different attack classes are defined for attacks exploiting the same vulnerability or having similar results. It is often desirable to evaluate attack classifications on a higher abstraction level such that trivial differences are ignored. For example, a “SCAN Nmap TCP” alert and a “SCAN Nmap XMAS” alert generated by Snort can be merged into one hyper alert with the same abstracted class “SCAN Attack.” Although a higher level of abstraction reduces the hyper alert space, we are aware that it results in loss of granularity in classification information and may lead to inappropriate aggregation. Further research is needed to develop an appropriate abstraction scheme.
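The abstraction step can be sketched as a lookup applied to the classification before alerts are compared. The prefix rule below is a hypothetical scheme chosen only to reproduce the Snort example in the text; it is not an abstraction scheme proposed in this article.

```python
# Hypothetical mapping from a classification prefix to an abstract class.
ABSTRACT_CLASS_PREFIXES = [
    ("SCAN ", "SCAN Attack"),
]


def abstract_class(classification: str) -> str:
    """Return the abstract class for an alert classification.

    Classifications without a matching rule are returned unchanged, so
    alerts are never merged by accident just because no rule exists.
    """
    for prefix, abstract in ABSTRACT_CLASS_PREFIXES:
        if classification.startswith(prefix):
            return abstract
    return classification
```

Comparing abstract_class(r.classification) instead of the raw classification in the aggregation criterion lets “SCAN Nmap TCP” and “SCAN Nmap XMAS” fall into the same hyper alert.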

4.2.2 Context adaptive aggregation. The general aggregation criteria in Algorithm 1 need to be adapted for specific contexts. For example, by evaluating traces of attacks we discover that a large portion of raw alerts is triggered by port scan activity, which is usually of little significance. Therefore we can improve the efficiency of aggregation by using a strategy that only considers hosts that are either the source or the target of a large number of scan-related alerts. The former one-to-many characteristic often represents a scanner, while the latter many-to-one characteristic often indicates a potential victim of attack. Consequently the flexible aggregation criteria use an abstract type that represents all scan-related alerts and only require a match on either sourceaddr or destaddr. Similarly, alerts triggered by DDOS attacks often exhibit a many-to-one pattern and should use an abstract “DDOS” class and require a match only on destaddr.
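One way to express the context-adaptive criteria is an address-matching predicate selected by abstract class. The following is a sketch under the assumption that abstract classes are represented by the strings "SCAN" and "DDOS"; these labels and the function name are illustrative, not part of the prototype.

```python
def addresses_match(abstract_class: str,
                    alert_src: str, alert_dst: str,
                    hyper_src: str, hyper_dst: str) -> bool:
    """Decide whether a raw alert's addresses match a hyper alert.

    SCAN alerts match on either endpoint (one-to-many scanner or
    many-to-one victim), DDOS alerts match on the destination only,
    and every other class requires the full source-destination pair.
    """
    if abstract_class == "SCAN":
        return alert_src == hyper_src or alert_dst == hyper_dst
    if abstract_class == "DDOS":
        return alert_dst == hyper_dst
    return alert_src == hyper_src and alert_dst == hyper_dst
```

This predicate would replace the fixed source-destination comparison in the general criteria, while the class and time-window tests stay unchanged.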

5. EVIDENCE GRAPH MODEL

The evidence graph model is the foundation of our forensic analysis process.Functionalities of the evidence graph model include:

(1) The evidence graph provides an intuitive representation of collected forensic evidence.

(2) The evidence graph provides the basis for both local functional and global structural analysis in attack reasoning.

(3) The evidence graph provides a friendly interface that allows the analyst to incorporate hypotheses and out-of-band information into the reasoning process.

5.1 Model Description

Definition 5.1. An evidence graph is a quadruple G = (N, E, LN, LE), where N is the set of nodes, E is the set of directed edges, LN is the set of labels that indicate the attributes of nodes, and LE is the set of labels that indicate the attributes of edges. In the evidence graph, each node ni represents a host-level entity and each edge ei represents a piece of preprocessed forensic evidence.

In forensics analysis, entities are subjects or objects in the attack scenario that can be modelled at different granularity. Here we choose to represent entities on the host level. Therefore, each node in the evidence graph is characterized by the following labels:

(1) ID: Unique identification of the host-level entity.

(2) States: Each node is characterized by a set of fuzzy functional states to provide context for evidence evaluation and attack scenario analysis. In our prototype, a simple set of fuzzy states is defined as S = {Attacker, Victim, Stepping Stone, Affiliated}. The fuzzy variable Attacker (AT) indicates the belief that the current node is a source of attack. The fuzzy variable Victim (VI) indicates the belief that the current node is a target of attacks. The fuzzy variable Stepping Stone (SS) indicates the belief that the current node is controlled by an attacker and used as a stepping stone in the scenario. By the notion of “suspicion by association,” the fuzzy variable Affiliated (AF) indicates the belief that the current node is suspicious because it has certain types of interactions with an attacker, victim or stepping stone. The set of functional states could be refined, but it is kept simple in this article for illustration purposes. Note that these states are not mutually exclusive. For example, in a data theft scenario, a victim host that was compromised in an attack could be used as a storage relay to transfer stolen files. Consequently the states of the host will evolve from Victim to both Victim and Affiliated.

(3) Time stamps: Each functional state is associated with two time stamps: Tactivate records the initial time the state is activated above a certain threshold and Tlatest records the latest time of update to the state.

(4) Value: Each node is associated with a value V ∈ [0, 1], indicating the asset value of the host. For example, an important file server in the protected private network has a higher value than a public web server in the DMZ. We note that host value is a highly site-specific metric, and extensive knowledge of the network under investigation is needed for proper assignment.
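To make the node labels concrete, the sketch below models a node with its four fuzzy states, the two time stamps per state, and the asset value. It is an illustrative data structure only: the activation threshold of 0.5, the clamping of beliefs to [0, 1], and all identifier names are assumptions rather than details of the prototype.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

STATES = ("Attacker", "Victim", "Stepping Stone", "Affiliated")


@dataclass
class FuzzyState:
    belief: float = 0.0
    t_activate: Optional[float] = None  # first time belief exceeded threshold
    t_latest: Optional[float] = None    # time of the most recent update


@dataclass
class Node:
    node_id: str
    value: float  # asset value V in [0, 1]
    states: Dict[str, FuzzyState] = field(
        default_factory=lambda: {s: FuzzyState() for s in STATES}
    )

    def update_state(self, name: str, belief: float, time: float,
                     threshold: float = 0.5) -> None:
        """Set a state's belief and maintain its two time stamps."""
        st = self.states[name]
        st.belief = min(1.0, max(0.0, belief))
        st.t_latest = time
        if st.t_activate is None and st.belief > threshold:
            st.t_activate = time
```

Because each state is stored independently, a host can hold high belief in Victim and Affiliated at the same time, which matches the data theft example above.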

In addition to the general attribute fields assigned in the normalization and aggregation process, each edge in the evidence graph is characterized by the following set of attributes:

(1) Weight: Weight is a fuzzy value w ∈ [0, 1] that represents the seriousness of evidence. For example, a port scan that has little impact on the target should be assigned a lower weight than a buffer overflow attack that could gain root privilege on the target system. In our prototype, we model the known attacks and assign the weight of IDS alerts based on expert knowledge.

(2) Relevancy: The relevancy value of an edge represents the belief that the underlying action indicated by the evidence would successfully achieve its expected impact. For intrusion alerts, there could be three specific cases: relevant true positive, false positive, and nonrelevant true positive [Kruegel and Robertson 2004]. Relevant true positive refers to alerts that truly represent an attack where the attack achieves its expected impact. False positive refers to alerts that identify a legitimate event as an attack by mistake. Nonrelevant true positive refers to alerts that truly represent an attack but the attack does not achieve its expected impact. We define the relevancy value for these cases as follows:

\[
r =
\begin{cases}
0, & \text{false positive or nonrelevant true positive;} \\
0.5, & \text{unable to verify;} \\
1, & \text{relevant true positive.}
\end{cases}
\tag{1}
\]

The process to check the relevancy of an alert is denoted as alert verification. In our current prototype, we compare the prerequisites of an attack with the configuration of the target host. If all prerequisites are completely satisfied, the alert is labelled as relevant and its relevancy value is assigned as 1. If one or more contradicting configurations are found, the alert is labelled as nonrelevant and its relevancy value is assigned as 0; otherwise we are unable to determine the relevancy and the value is assigned as 0.5. This static approach could rule out attacks that are bound to fail because the target host is not vulnerable. However, it cannot guarantee that attacks tagged as relevant are truly successful, as various other factors could lead to failure of the attack. For example, the attack could still fail because of an incorrect parameter. Recently Kruegel and Robertson [Kruegel and Robertson 2004] proposed an active verification mechanism that checks for traces that match the attack’s expected outcome on the victim host, but the effectiveness of the method needs further study.

(3) Host Importance: Host importance h ∈ [0, 1] is a fuzzy parameter to relate importance of evidence to value of associated hosts. Our rationale for defining host importance lies in the idea that events associated with a highly valued host should receive more attention in intrusion analysis. The host importance for an edge e can be determined as follows:

h(e) = max(Vsrc, Vdst), (2)

where Vsrc is the value of the source node and Vdst is the value of the destination node.

Finally, we calculate a priority score for each edge to indicate the overall importance of the intrusion evidence. The priority score p(e) of an edge e is calculated as the product of its weight, relevancy and host importance:

p(e) = w(e)× r(e)× h(e). (3)

As an example, a Windows DCOM buffer overflow attack is observed to initiate from attacker s against target t. By prior knowledge we know that t is a Linux server and consequently the Windows DCOM buffer overflow will have no impact on it. Therefore, though the weight of the edge w(e) is high, the priority score of the edge is zero because the relevancy of the edge is r(e) = 0.

Weight and priority score are defined to provide different views in our reasoning process. The priority score p(e) is used for attack group and scenario identification. In the above example, the failed attack has p(e) = 0, which indicates that it has little significance in the attack scenario because the attacker did not achieve his goal. However, edges with a zero priority score are not removed from the evidence graph because the weight of the edge w(e) is used in reasoning for the functional state of associated hosts. Though the attack is irrelevant, the unsuccessful attack attempt still indicates malicious intent of the source host s and consequently the “Attacker” state of s should be incremented.
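As an illustration of Equations (2) and (3), the following minimal Python sketch computes the host importance and priority score of a single edge. The `Edge` class, its field names, and the numeric values are hypothetical and chosen only for this example; they are not taken from the prototype.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """One piece of intrusion evidence (hypothetical structure for illustration)."""
    weight: float     # w(e) in [0, 1], seriousness of the evidence
    relevancy: float  # r(e) in {0, 0.5, 1}, from alert verification
    v_src: float      # value of the source host, in [0, 1]
    v_dst: float      # value of the destination host, in [0, 1]


def host_importance(edge: Edge) -> float:
    # Equation (2): h(e) = max(Vsrc, Vdst)
    return max(edge.v_src, edge.v_dst)


def priority(edge: Edge) -> float:
    # Equation (3): p(e) = w(e) * r(e) * h(e)
    return edge.weight * edge.relevancy * host_importance(edge)


# Windows DCOM overflow against a Linux server: serious, but not relevant.
dcom_on_linux = Edge(weight=0.9, relevancy=0.0, v_src=0.2, v_dst=0.8)
print(priority(dcom_on_linux))  # 0.0
```

The edge keeps its weight of 0.9 even though its priority is zero, which is what allows local reasoning to still raise the Attacker state of the source host.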



Algorithm 2: Constructing an evidence graph

input: Stream of evidence in time order
output: Evidence graph G

begin
    foreach evidence E in stream do
        foreach host V affected by E do
            if V does not exist in G then
                CreateNode (G, V);
            end
        end
        CreateEdge (G, E);
        foreach host V as subject or object in E do
            UpdateNode (E, V);
        end
    end
end

5.2 Building Evidence Graph

To construct the evidence graph, the sequence of intrusion evidence is processed in time order, starting from the first evidence in the record and moving towards the latest evidence. Evidence with time intervals is added to the graph in order of the start time of the interval. For each piece of evidence, we evaluate which nodes in the current evidence graph it will affect and create nodes that do not exist, then create the edge accordingly. The algorithm for building the evidence graph is listed in Algorithm 2. The UpdateNode function performs inference for the states of a node by causal reasoning via Rule-Based Fuzzy Cognitive Maps (RBFCM), which we will describe in our hierarchical reasoning framework.
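The construction loop of Algorithm 2 can be sketched in Python as follows. This is only a sketch under stated assumptions: the `Evidence` and `EvidenceGraph` classes are hypothetical, each piece of evidence is assumed to involve exactly one source and one destination host, and `update_node` is a stand-in that merely counts evidence per host instead of performing the RBFCM inference.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    start_time: float  # start of the evidence's time interval
    src: str           # subject host
    dst: str           # object host
    weight: float      # w(e)


@dataclass
class EvidenceGraph:
    nodes: dict = field(default_factory=dict)  # host -> per-host state
    edges: list = field(default_factory=list)  # evidence in time order


def update_node(graph: EvidenceGraph, evidence: Evidence, host: str) -> None:
    # Placeholder for UpdateNode: the paper runs RBFCM inference here.
    graph.nodes[host]["evidence_count"] += 1


def build_evidence_graph(stream) -> EvidenceGraph:
    graph = EvidenceGraph()
    # Process evidence in order of start time.
    for ev in sorted(stream, key=lambda e: e.start_time):
        hosts = list(dict.fromkeys((ev.src, ev.dst)))  # unique, order kept
        for host in hosts:
            if host not in graph.nodes:  # CreateNode
                graph.nodes[host] = {"evidence_count": 0}
        graph.edges.append(ev)           # CreateEdge
        for host in hosts:               # UpdateNode
            update_node(graph, ev, host)
    return graph
```

Sorting inside the function is a convenience for the example; a streaming implementation would rely on the input already arriving in time order, as Algorithm 2 assumes.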

Finally, expert opinions and out-of-band information can also be directly incorporated into the evidence graph for automated reasoning through the following graph edit operations:

(1) Insert a new node n: This represents adding a new suspicious host to the evidence graph.

(2) Remove a node n: This represents removing an irrelevant host from the evidence graph. Note that removing a node implies removing edges associated with the node.

(3) Update a node n: This represents changing one or more functional states of the node.

(4) Insert a new edge e: This represents adding new intrusion evidence between existing nodes in the evidence graph.

(5) Remove an edge e: This represents removing irrelevant evidence from the evidence graph.

(6) Update an edge e: This represents adjusting the weight or relevancy value of the corresponding evidence.



6. HIERARCHICAL REASONING FRAMEWORK

Based on the evidence graph, we develop a hierarchical reasoning framework for automated evidence analysis. In this section we describe two levels of the framework: local reasoning for functional analysis and global reasoning based on structure analysis.

6.1 Local Reasoning: Functional Analysis

The objective of local reasoning is to infer the functional states of an entity from its local observations. In the evidence graph space, “local” means that the inference is only based on information of the node itself and its direct neighbors. We argue that keeping track of host states has significant importance in forensics analysis.

(1) Host states provide context for evaluating evidence. There is no absolute “suspicious value” of events. Actions of attackers are often represented by events that do not seem suspicious when examined individually without context. For example, legitimate file transfer connections associated with a healthy host generally do not indicate suspicious activity. However, an outbound ftp connection initiated from a victim host is likely to be a data exfiltration attempt that leads back to a host controlled by the attacker. Therefore, host states provide the context to discover hidden events for further investigation.

(2) Host states derived with local observations provide an initial view of the roles the host may play in an attack. The evolution of host states helps to display the advancing stages of an attack to the forensics analyst.

The complexity of host systems and cyber attacks makes it difficult to reach a precise statement about host states. Therefore we use a fuzzy approach towards the problem because it is powerful in approximating the human decision-making process, which involves qualitativeness and inexactness. In the current prototype, we develop causal inference via Rule-Based Fuzzy Cognitive Maps (RBFCM) to model the states of nodes.

Fuzzy Cognitive Maps (FCM) are fuzzy directed graphs that combine neural networks and fuzzy logic to predict changes of the system [Carvalho and Tome 1999a]. Nodes in an FCM are concepts that change over time and edges represent the causality link between nodes. The weight of an edge measures how much one concept influences the other. FCM has been used for decision support in many different domains, including network security and intrusion detection systems [Siraj et al. 2001]. However, FCMs are limited in their capacity to model real-world scenarios for two reasons. First, generic FCMs can only represent simple monotonic cause-effect relations. Secondly, FCMs cannot cooperate with traditional fuzzy rules [Carvalho and Tome 1999a]. To express the strength of causality, domain expert knowledge is required to assign numerical edge weights. However, due to the highly ad hoc nature of attack traces and lack of training data, it is impractical to obtain appropriate weights in the network forensics domain.



Fig. 2. RBFCM model for local reasoning.

Rule Based Fuzzy Cognitive Maps (RBFCM) are an evolution of FCM. An RBFCM is essentially a standard rule-based fuzzy system plus feedback and mechanisms to deal with causal relations [Carvalho and Tome 1999b]. Compared with the generic FCM, RBFCM is better adapted for modelling complex dynamic systems because changes to the concept are not simply determined by the weight of the edges, but are defined by the fuzzy rules relating the concepts and inputs. This is an important advantage for incorporating nonsymmetric and nonmonotonic causal relationships. Fuzzy rules provide the analyst an effective approach to present domain knowledge. As shown in Figure 2, an RBFCM consists of fuzzy concepts and fuzzy rule bases. In our context, concepts are the defined set of functional states {Attacker, Victim, Stepping Stone, Affiliated}. For each concept, the respective fuzzy rule bases {RBA, RBV, RBS, RBA} consist of “IF...Then...” fuzzy rules that define how each concept is affected by values of other concepts and incoming new evidence.

In the RBFCM shown in Figure 2, fuzzy rules are used to map multiple inputs (current value of states and new evidence) to the output (updated value of states). Note that the edges among concepts indicate that it is possible to activate any functional state from any other functional state. The states are updated in an incremental manner. State values at time t + 1 are determined by the states at time t and new evidence observed during the time interval [t, t+1). The fuzzy rules are designed from expert knowledge. Below we present several examples of rules in our RBFCM model. The mapping of numerical ranges for states in the fuzzy rules is shown in Figure 3.

If Back Orifice is detected on host n

Then Victim state of n is highly activated.

If attack of weight w initiates from host n

Then Attacker state value of n is increased by w.



Fig. 3. Mapping the fuzzy states.

If Victim state is high and
Attacker state is high and
Tactivate(AT) > Tactivate(VI)

Then Stepping Stone state is highly activated.

If ftp connection to host n is detected and
host n’s Victim state is high and
0 < Tftp − Tactivate(VI) < Tlimit

Then Affiliated state is medium.

In local reasoning, we apply the assumption that states of hosts are monotonic. For example, once a host’s Victim state is activated, it will never return to normal unless specifically instructed by out-of-band knowledge.

The total causal influence on each functional state is defined within the interval [0,1]. To monotonically map the concept value into the normalized range [0,1], we apply the sigmoid function in the following form, where c is a positive constant.

$$
f(x) = \frac{1}{1 + e^{-cx}} \tag{4}
$$
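The sigmoid of Equation (4) and the monotonicity assumption on host states can be combined in a short Python sketch. The function names and the way the two are combined in `update_state` (taking the maximum of the old and newly squashed value) are assumptions made for illustration; the fuzzy rules that produce the raw influence value are not modeled here.

```python
import math


def squash(x: float, c: float = 1.0) -> float:
    """Sigmoid of Equation (4): f(x) = 1 / (1 + e^(-c*x)), with c > 0."""
    return 1.0 / (1.0 + math.exp(-c * x))


def update_state(current: float, influence: float, c: float = 1.0) -> float:
    """Monotonic update (illustrative assumption): a state value never
    drops below the value it already holds."""
    return max(current, squash(influence, c))


print(squash(0.0))  # 0.5
```

A larger constant c makes the transition between low and high activation steeper, while the function stays strictly increasing for any c > 0.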

6.2 Global Reasoning: Structure Analysis

The global reasoning process aims to identify the set of highly correlated hosts that belong to the coordinated attack scene from the structure of the evidence graph. Global reasoning is based on the assumption that during the procedure of attack, there must be a strong correlation between members of the attack group, and this correlation is exhibited through certain structural characteristics in the evidence graph. For example, a DDoS attack or an active scanner would be presented by a star-like topology in the evidence graph, while an unusually long path could be an indication of a stepping stone chain. These distinctive graph components provide a first approximation of the coordinated attack scenario together with the functional state estimates from local reasoning.

We approach the global reasoning task as a group detection problem, which is to discover potential members of an attack group given the intrusion evidence observed. The attack group detection procedure works in two different phases: (1) create new attack groups by generating a seed for the group and (2) expand existing groups by membership testing.

6.2.1 Seed Generation. In the seed generation phase, we aim to discover important nodes in the evidence graph as initial seeds of the attack group. In essence, we would like to select entities that are both functionally significant and structurally important. From the functional perspective, the investigator generally has two options in seed selection. In a forward search manner, it is straightforward to select external hosts with Attacker state highly activated in the local reasoning process as initial seeds. In a backward search manner, hosts in the trusted domain with Victim or Stepping Stone state highly activated are good candidates for initial seeds.

From the structural perspective, we use the eigenvector centrality metric to evaluate the importance of nodes in the evidence graph. Eigenvector centrality is a refined version of the simple degree metric. Instead of just counting the number of edges incident to a node, the eigenvector centrality score is based on both the number and the quality of edges. The intuition is that an edge to a node having a high eigenvector centrality score would contribute more than an edge to a node having a low score. Let the centrality score of node i be denoted by xi, which is proportional to the sum of the centrality scores of i’s neighbors:

$$
x_i = \frac{1}{\lambda} \sum_{j=1}^{n} A_{i,j} \, x_j \tag{5}
$$

where A is the adjacency matrix of evidence graph G and λ is a constant. The equation can be rewritten in matrix form as Ax = λx, where x becomes an eigenvector of the adjacency matrix with eigenvalue λ. In practice, we apply the power iteration method to obtain the dominant eigenvector of the evidence graph’s adjacency matrix, which represents the eigenvector centrality scores of the corresponding graph nodes. By integrating the eigenvector centrality rank with functional state analysis, we can easily formulate queries for efficient seed selection. For example, “Among all nodes that have Attacker state activated higher than T, which one has the highest structural significance?”
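A minimal power iteration for Equation (5) might look like the Python sketch below. Two points are assumptions of this sketch rather than details from the prototype: the adjacency matrix is taken to be symmetric (edge direction ignored), and each step multiplies by A + I instead of A. The shift leaves the eigenvectors unchanged but avoids the oscillation that plain power iteration shows on bipartite structures such as a star.

```python
import math


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Dominant eigenvector of a symmetric adjacency matrix (list of lists),
    normalized to unit Euclidean length."""
    n = len(adj)
    if n == 0:
        return []
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # One step with (A + I): keep x[i] and add the neighbors' scores.
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        nxt = [v / norm for v in nxt]
        converged = max(abs(a - b) for a, b in zip(nxt, x)) < tol
        x = nxt
        if converged:
            break
    return x


# Star topology: node 0 is the hub, nodes 1..3 are leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(star)
```

For the star above the hub receives the highest score, which matches the intuition that a scanner or DDoS target stands out structurally.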

6.2.2 Group Expansion. In the group expansion process, we expect to discover nodes that have strong correlation with the initial seeds and add them to the attack group. For a pair of neighbor nodes (s, t) in the evidence graph G, let E(s, t) represent the set of incident edges between s and t. Then we evaluate the correlation score C(s, t) as a numeric abstraction of the information encoded in the set E(s, t), where a higher score represents stronger correlation between s and t. A straightforward correlation function is to compute C(s, t) as the sum of priority scores of edges in E(s, t).

$$
C(s, t)_{SUM} = C(t, s)_{SUM} = \sum_{e \in E(s, t)} p(e) \tag{6}
$$

In essence the correlation score indicates the amount of malicious activity between two nodes. However, we note that the simple aggregation approach may lead to biased correlation scores. For example, a large number of port scans would result in a bigger correlation score than that of a buffer overflow exploit, though the latter represents more serious intentions, that is, stronger correlation between the source and target node. To provide a good approximation of the aggregated effects of edges in E(s, t), we incorporate the following two factors into correlation evaluation:

—Attack Phase: Intuitively, a multistage attack scenario is characterized by a sequence of intrusion events that belong to different attack phases. The advancement of attack phases indicates the increase of the attacker’s capability. Here we map attacks into a simple set of generalized attack phases: Reconnaissance, Intrusion, Privilege Escalation, Denial of Service, and Post Compromise. In the above example, repeated port scans should not be simply aggregated because they belong to the same Reconnaissance phase.

—Vulnerability: In addition to attack phase, we also consider the targeted vulnerability of exploits. For example, both “NETBIOS SMB trans2open buffer overflow attempt” and “FTP exploit wu-ftpd exec format string overflow” alerts belong to Intrusion that could gain root privilege on the target. Though the two attacks have the same attack phase, the former attack exploits the NETBIOS SMB vulnerability while the latter exploits a vulnerability in the wu-ftpd FTP server. The variation of vulnerabilities indicates that the attacker is making distinctively different attempts for the same objective.

Therefore, for refined correlation evaluation, we examine the tuple {Attack Phase, Vulnerability} of each attack in the set E(s, t) in increasing time order. Only priority scores of attacks that differ in either attack phase or vulnerability are aggregated; attacks that have a duplicate {Attack Phase, Vulnerability} with existing edges are discarded. The rationale behind our scheme is that only attacks that potentially increase the attacker’s knowledge or capability should increase the aggregated correlation score. The integrated view of attack phase and vulnerability provides a unique measure of the attacker’s rising knowledge and capability in the intrusion process. In each effective step, the attacker either proceeds to the next attack phase or attempts to achieve the objective of the current attack phase by exploiting different vulnerabilities.
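The difference between the plain sum of Equation (6) and the refined aggregation can be sketched in Python as follows. The `Attack` class, the phase and vulnerability labels, and the priority values are hypothetical; the `distance` helper implements the reciprocal used for the correlation graph, and mapping a zero score to infinity is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attack:
    time: float
    phase: str          # e.g. "Reconnaissance", "Intrusion"
    vulnerability: str  # targeted vulnerability label
    priority: float     # p(e)


def correlation_sum(attacks) -> float:
    # Equation (6): plain sum of priority scores.
    return sum(a.priority for a in attacks)


def correlation_refined(attacks) -> float:
    # Walk the edges in increasing time order and count only the first
    # attack seen for each (Attack Phase, Vulnerability) tuple.
    seen = set()
    total = 0.0
    for a in sorted(attacks, key=lambda a: a.time):
        key = (a.phase, a.vulnerability)
        if key in seen:
            continue  # duplicate tuple: no new knowledge or capability
        seen.add(key)
        total += a.priority
    return total


def distance(score: float) -> float:
    # d(s, t) = 1 / C(s, t); infinite distance for zero correlation (assumption).
    return float("inf") if score == 0 else 1.0 / score


edges = [Attack(float(i), "Reconnaissance", "portscan", 0.1) for i in range(5)]
edges.append(Attack(9.0, "Intrusion", "smb-trans2open", 0.8))
```

With these illustrative values the five port scans contribute only once to the refined score, so the buffer overflow dominates the correlation instead of being outweighed by repeated scanning.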

With the correlation evaluation procedure we transform the evidence graph G into a correlation graph. The correlation graph is an undirected graph whose node set V is identical to that of the corresponding evidence graph G. For each pair s, t ∈ V, there is an undirected edge between s and t in the correlation graph if and only if s, t are neighbors in the corresponding evidence graph G. The weight of this edge denotes the distance between the two neighbor nodes s and t, which is defined as the reciprocal of the aggregated correlation score, that is, d(s, t) = d(t, s) = 1/C(s, t). Smaller distance values represent stronger correlation between two nodes.

Group expansion is an iterative process which consists of three basic steps. First, we identify all external neighbors of current seed members as the set of candidate nodes. Second, a ranked list is formed based on the distance between each candidate node and the current group members. Finally, the ranked list is cut at a predefined threshold and candidate nodes within the distance threshold are added as new seed members of the group. If no candidate node is within the distance threshold, the group expansion procedure terminates. The procedure is listed in Algorithm 3.



Algorithm 3: Basic attack group expansion process

input: Evidence graph G, initial seed node vs, distance threshold D, step size n
output: The derived attack group group

begin
    group ← vs;
    neighbors ← ∅;
    candidates ← ∅;
    repeat
        foreach node v in the set group do
            neighbors ← FindNeighbors (G, v);
            candidates ← candidates ∪ neighbors;
        end
        foreach node v in the set candidates do
            v.distance ← GetDistance (v, group);
        end
        new ← RankCandidates (candidates, D, n);
        group ← group ∪ new;
    until no new member is found;
end

The FindNeighbors function returns the set of external neighbor nodes of current seed members. In the GetDistance function, we evaluate the distance between the candidate node and its nearest seed member. In the RankCandidates function, candidate nodes whose distance exceeds the threshold are discarded. A ranked list is formed for the remaining candidate nodes in the order of increasing distance. Given the step size n, we take a greedy approach and the top n candidate nodes in the ranked list are added to the attack group as new seed members.
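The expansion loop of Algorithm 3 can be sketched in Python as below. The input format (a dictionary from host pairs to correlation-graph distances) and the function name are assumptions made for this example. The sketch uses the distance to the nearest current member for each candidate, discards candidates beyond the threshold, and greedily admits at most `step` candidates per round.

```python
import math


def expand_group(edges, seed, threshold, step):
    """edges: dict mapping (s, t) host pairs to distances d(s, t).
    Returns the set of hosts in the derived attack group."""
    # Build a symmetric neighbor map from the undirected correlation graph.
    adj = {}
    for (s, t), d in edges.items():
        adj.setdefault(s, {})[t] = d
        adj.setdefault(t, {})[s] = d

    group = {seed}
    while True:
        # FindNeighbors + GetDistance: nearest-member distance per candidate.
        nearest = {}
        for member in group:
            for cand, d in adj.get(member, {}).items():
                if cand not in group:
                    nearest[cand] = min(d, nearest.get(cand, math.inf))
        # RankCandidates: drop candidates beyond the threshold, sort by distance.
        ranked = sorted((d, c) for c, d in nearest.items() if d <= threshold)
        new = [c for _, c in ranked[:step]]
        if not new:
            return group
        group.update(new)


example = {("a", "b"): 0.5, ("b", "c"): 0.8, ("c", "d"): 2.0, ("a", "e"): 1.5}
members = expand_group(example, seed="a", threshold=1.0, step=1)
```

In this toy graph the group grows from a to b and then to c; hosts d and e stay outside because their distances exceed the threshold of 1.0.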

In essence the group expansion process belongs to the class of hierarchical and agglomerative clustering algorithms, where each node starts as its own cluster and clusters are merged iteratively. The conceptual difference is that we are only interested in clusters formed around “seed nodes” that possess both functional and structural significance. We apply the single linkage distance metric in group expansion, that is, we compare the minimum distance between a candidate node and the current seeds to the distance threshold for membership testing. Other distance metric options such as complete and average linkage are less appropriate because in scenarios like a stepping stone chain, the suspicious entity may have strong correlation with only a single member of the current seed group.

Due to the vast difference in attack traces, selecting appropriate thresholds is largely an empirical process: the analyst could compare the results of a set of thresholds and pick the most suitable one based on expertise. Because candidates are admitted when their distance falls within the threshold, raising the distance threshold generally leads to a higher rate of false positives while lowering it may result in a higher rate of false negatives. Moreover, the ranked list itself is often more informative than any single cut-off threshold.



Fig. 4. Overview of hypothesis testing procedure.

6.3 Interactive Hypothesis Testing

Network forensics analysis is not complete without hypothesis testing. Intuitively, hypotheses are “what if” propositions made for possible explanations of inconsistencies in preliminary analysis results. For example, “What if the backdoor was uploaded to the victim by FTP?”, “What if the buffer overflow alert from H1 to H2 is an irrelevant background artifact?” Hypothesis testing is inherently an interactive process. As shown in Figure 4, the process of making and evaluating hypotheses based on the evidence graph model consists of the following steps:

(1) Formulate hypotheses based on initial analysis results and out-of-band information.

(2) Instantiate formulated hypotheses into the evidence graph with graph edit operations and extended queries.

(3) Perform local and global reasoning on the updated evidence graph. Examine the new analysis results and reiterate from step 1 for new hypotheses.

Hypotheses can be classified into two major categories: removing irrelevant evidence and adding missed evidence. We note that the former essentially boils down to the relevancy evaluation problem discussed in evidence graph model construction. The latter can be further grouped into the following two types:

—Missed Primary Evidence: Serious attacks could be missed by security sensors due to their limited coverage and detection rate. In this case, entities in the attack scenario that should be in the same attack group are split into several isolated components in the correlation graph, each of which only represents a part of the complete attack scenario.

—Missed Secondary Evidence: Steps in the attack scenario could be legitimate operations that do not trigger security alerts on their own. For example, the attacker could use a separate host other than the ones employed in direct attacks as the warez server. After successfully compromising a target, the attacker uploads backdoor and exploit tools from the warez server to the victim by FTP. We are able to discover the group of attacker and victims as explicit attacks are captured in primary evidence. However, the warez server will evade detection because the file transfer flows appear benign and are absent from IDS alerts.

Automated hypothesis generation for missed primary evidence is certainly a direction for future research, but it is outside the scope of this article. Ning et al. [Ning and Xu 2004] proposed schemes to hypothesize for missed attacks by (1) similarity of alert attributes and (2) potential prepare-for relations encoded in equality constraints, which can be integrated with our approach. Through graph edit operations defined in Section 5.2, hypotheses can be instantiated as new nodes and edges into the evidence graph for further evaluation.

To get a comprehensive view of the attack scenario, secondary evidence needs to be effectively incorporated into forensic analysis. However, it is impractical to apply all secondary evidence in building the initial evidence graph as most of it consists of innocuous events. Here we propose a flexible approach that uses analysis results of primary evidence to formulate hypotheses for missed secondary evidence. Specifically, the following factors are considered:

—Association: Benign activities associated with malicious hosts could be suspicious. For example, an ftp attempt from the Victim to an outside host could represent the exfiltration of sensitive data. Moreover, the belief of suspicion could be strengthened with noncoincidence in terms of association. For example, a host that has connections with several identified Stepping Stones in the attack group is more likely to be relevant to the attack scenario than incidental.

—Classification: It is more prudent to give greater importance to certain classes of events than others. The classifications are often derived from specific host-based information and consequences of attack. For example, http connections associated with a Web server are arguably not as suspicious as an ssh login. Also, backdoor tools discovered on a trusted internal host implicitly indicate that file transfer activities should be examined with more attention.

—Time: In addition to Classification and Association, the range of potentially suspicious secondary evidence is further narrowed down with temporal constraints, which come from activation and update time stamps of functional states in local reasoning results.

Therefore, after the initial evidence graph is constructed from primary evidence, we can formulate hypotheses for missed secondary evidence based on the triple (Association, Classification, Time). By querying the secondary evidence repository, the hypotheses are then instantiated to build the enriched evidence graph. Local and global reasoning results with the updated evidence graph are then compared to previous ones to evaluate the credibility of hypotheses, which heavily relies on expert knowledge. An automatic evaluation of hypotheses is a clear candidate for continued research.
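A query over secondary evidence driven by the triple (Association, Classification, Time) could be sketched in Python as follows. The `Flow` record, the function name, and the specific time test (a flow must start after the associated host's Victim state was activated and within a limit, mirroring the ftp rule in Section 6.1) are assumptions of this sketch, not the query language of the prototype.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    time: float   # flow start time
    src: str
    dst: str
    service: str  # e.g. "ftp", "tftp", "http"


def select_secondary(flows, victim_activation, services, t_limit):
    """victim_activation: host -> time at which its Victim state was activated.
    services: set of service names that make up the Classification."""
    selected = []
    for f in flows:
        if f.service not in services:          # Classification
            continue
        for host in (f.src, f.dst):
            t0 = victim_activation.get(host)   # Association
            if t0 is not None and 0 < f.time - t0 < t_limit:  # Time
                selected.append(f)
                break
    return selected


flows = [
    Flow(105.0, "victim", "relay", "ftp"),   # matches all three factors
    Flow(50.0, "victim", "relay", "ftp"),    # before Victim activation
    Flow(110.0, "victim", "web", "http"),    # wrong classification
    Flow(120.0, "other", "relay", "ftp"),    # no associated host
]
hits = select_secondary(flows, {"victim": 100.0}, {"ftp", "tftp"}, t_limit=60.0)
```

Only the first flow is selected, so only one candidate edge would be instantiated in the enriched evidence graph for the analyst to evaluate.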



7. EXPERIMENTAL RESULTS

We have implemented a prototype system of all the techniques presented. Snort is used as the network IDS sensor to generate intrusion alerts. We use softflowd [Softflowd] to process the TCPDUMP traces and generate datagrams in NetFlow format. The OSU flow-tools [Flowtools] is used as the collector to capture NetFlow datagrams. Both raw and preprocessed evidence are stored in a MySQL database. The local and global reasoning modules are developed with Perl. We also implement a GUI application based on LEDA [LEDA] for visualization and manipulation of evidence graphs.

Traces of three multi-stage attack scenarios are used to validate our proposed techniques. The first dataset contains a small-scale attack implemented in our own testbed. To further demonstrate the ability of our techniques, we perform experiments on two public intrusion detection datasets, the MIT Lincoln Lab DARPA dataset [DARPA] and the dataset provided by ARDA contract No. NBCHC030107.

7.1 Evaluation Metrics

In the preprocessing phase, we use reduction rate to evaluate the efficiency of reducing redundancy in raw evidence. For intrusion alerts, the reduction rate is defined as one minus the ratio of the number of hyper alerts to the number of raw alerts. Similarly for network flows, the reduction rate is one minus the ratio of the storage size of NetFlow records to that of raw traffic.

To examine the output of our hierarchical reasoning framework, we consider the following possibilities. If a host actually involved in the attack scenario is included in the attack group and its role is correctly recognized, we define it as a true correlation. A false correlation denotes that a host not related to the attack scenario is included in the attack group or its role is misclassified. We call it a missing correlation when a host actually plays a role in the attack scenario but is not correlated into our attack group. With sufficient ground truth for the attack scenario, we are able to identify true, false, and missing correlations. Two quantitative metrics are used to evaluate the overall effectiveness of our attack reasoning methods. The reasoned attack group represents the set of hosts obtained in the group expansion result, while the actual attack group represents the set of hosts truly involved in the attack scenario.

$$
\text{Accuracy} = 1 - \frac{\text{number of false correlations}}{\text{number of hosts in the reasoned attack group}} \tag{7}
$$

$$
\text{Coverage} = 1 - \frac{\text{number of missing correlations}}{\text{number of hosts in the actual attack group}} \tag{8}
$$
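Equations (7) and (8) translate directly into a short Python sketch. The representation of each group as a dictionary from host to role, and the host and role labels in the example, are assumptions made for illustration; both groups are assumed to be non-empty.

```python
def accuracy_and_coverage(reasoned, actual):
    """reasoned, actual: dict mapping host -> role (both non-empty).

    A false correlation is a reasoned host that is either absent from the
    actual attack group or assigned the wrong role. A missing correlation
    is an actual host that does not appear in the reasoned group."""
    false_corr = [h for h, role in reasoned.items() if actual.get(h) != role]
    missing = [h for h in actual if h not in reasoned]
    accuracy = 1 - len(false_corr) / len(reasoned)  # Equation (7)
    coverage = 1 - len(missing) / len(actual)       # Equation (8)
    return accuracy, coverage


reasoned = {"a": "Attacker", "b": "Stepping Stone", "x": "Victim"}
actual = {"a": "Attacker", "b": "Stepping Stone", "c": "Victim", "d": "Victim"}
acc, cov = accuracy_and_coverage(reasoned, actual)
```

In this made-up case one of three reasoned hosts is a false correlation and two of four actual hosts are missing, giving an accuracy of about 0.67 and a coverage of 0.5.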

7.2 Scenario 1

The first multi-stage attack scenario is implemented in our own testbed of around 30 physical hosts and five subnets. Multiple exploits, backdoor programs, and scanning tools are used to bring more variety to the scenario. In addition, background traffic including http, FTP, sftp, ssh and telnet is generated throughout the whole attack process to simulate a more realistic network environment. Irrelevant random attacks are also included to obscure the coordinated attack scenario.

For notational convenience we adopt the following labels for hosts involved in the attack scenario: Attacker: 192.168.21.3, Stepping Stone 1: 192.168.25.3, Stepping Stone 2: 192.168.22.4, Victim: 192.168.23.4 and Ftp Relay: 192.168.24.4. The scenario includes the following steps:

(1) Attacker initiates Samba buffer overflow attack against Stepping Stone 1.

(2) Attack tools are uploaded from Ftp Relay to Stepping Stone 1; also a Netcat backdoor is started on Stepping Stone 1.

(3) Stepping Stone 1 initiates Windows DCOM buffer overflow attack against Stepping Stone 2. Attack tools are uploaded from Ftp Relay to Stepping Stone 2.

(4) Stepping Stone 2 initiates Frontpage Server 2000 buffer overflow attack against the Victim. After the break-in, a backdoor tool is uploaded from Ftp Relay to Victim.

(5) Sensitive data is transferred from Victim to the Ftp Relay and backdoor connections are closed.

7.2.1 Evidence Graph Construction. In the preprocessing phase, we use a self-extending time window T of 60 seconds for alert aggregation. The overall preprocessing results are shown in Table VIII. The initial evidence graph shown in Figure 5(a) is then generated from primary evidence, that is, the aggregated hyper alerts. The number attached to each edge denotes the sequence of corresponding events in time order. Figure 5(b) illustrates the corresponding attribute set {Attack Phase, Vulnerability} for each alert, which is used for correlation evaluation in the global reasoning process. Note that the Ftp Relay does not show up in the initial evidence graph as file transfer activities do not trigger Snort alerts.

In the next step, hypotheses can be formulated for potentially missed secondary evidence and instantiated to build the enriched evidence graph, which offers a more comprehensive view of what happened in the network. Hypotheses for missed secondary evidence are formed with the triple {Association, Classification, Time}. By observing the existence of backdoor tools on exploited hosts, it is reasonable to define Classification as all file transfer (FTP, tftp) flows. Association is defined as any host with Victim state activated and Time is extracted from the activation time of the Victim state. The enriched evidence graph is shown in Figure 6(a), where the solid edges represent primary evidence while the dotted edges represent events from secondary evidence. We notice that several potentially suspicious hosts, including the Ftp Relay, appear in the enriched evidence graph.

7.2.2 Local Reasoning. In the first step, the analyst would examine states of hosts from the local reasoning process. Based on the RBFCM local reasoning procedure, states of nodes in the evidence graph are inferred and shown in Table I. The hosts that have “Attacker” state activated are highlighted in Figure 5(a), 5(b) and Figure 6(a).

Fig. 5. Scenario 1 initial evidence graph from primary evidence.

Table I. Scenario 1: Local Functional States

Host              AT     VI     SS     AF
192.168.22.4      0.85   0.85   0.87   0.84
192.168.25.3      0.85   0.80   0.94   0.84
192.168.21.3      0.80   0      0      0.84
192.168.23.4      0.69   0.85   0      0.82
192.168.24.4      0      0      0      0.84
192.168.22.6      0.85   0.50   0      0
129.186.215.40    0      0      0      0.81
129.186.215.41    0      0      0      0.69
192.168.21.5      0      0      0      0.70
192.168.21.6      0      0      0      0.70
207.171.166.48    0      0.67   0      0
207.171.175.22    0      0.67   0      0
66.150.153.111    0      0.67   0      0
216.52.167.132    0      0.69   0      0
63.240.204.202    0      0.71   0      0

7.2.3 Global Reasoning. First, we compute eigenvector centrality scores of nodes in the initial primary evidence graph for seed generation. As shown in Table II, 192.168.25.3 has the highest centrality score. Also note that in local reasoning it has all states activated. Therefore host 192.168.25.3 is chosen as the initial seed based on both functional and structural importance.

Fig. 6. Scenario 1 enriched evidence graph and correlation graph.

In the next step, the attack group is constructed incrementally from the initial seed. Distances between neighbor nodes are shown in the correlation graph (Figure 6(b)). The highlighted oval node is the initial seed and the highlighted square nodes are members added to the attack group during the group expansion process with distance threshold 1.

By observing the states of members in the attack group we can see that host192.168.21.3 has Attacker state activated. Hosts 192.168.25.3 and 192.168.22.4both have Stepping Stone state activated. Host 192.168.23.4 has both Attacker

and Victim state activated but not Stepping Stone. Further examination of theactivation time of states clearly suggests that host 192.168.21.3 is the initialstart point of attack and hosts 192.168.25.3,192.168.22.4 are used as step-ping stones. As the result of hypothesizing for missed secondary evidence,the set of nodes 192.168.24.4, 192.168.21.5, 192.168.21.6, 129.186.215.40 and129.186.215.41 all have A f f iliated state activated, which suggest that theyare possibly involved in malicious non-explicit-attack activities. However,



Table II. Scenario 1: Eigenvector Centrality Scores

Host Eigenvector centrality

192.168.25.3 0.68

192.168.21.3 0.53

192.168.22.4 0.46

192.168.23.4 0.21

192.168.22.6 0.07

216.52.167.132 0.03

207.171.166.48 0.01

207.171.175.22 0.01

66.150.153.111 0.008

63.240.204.202 0.008

192.168.24.4 is affiliated with all attackers and stepping stones in the attack group, while every other node is affiliated with only one member. Though more investigation is needed, it is reasonable to conclude that 192.168.24.4 is more suspicious than 129.186.215.40 due to its higher degree of non-coincidence. We also note that although 192.168.22.6 is labelled as an attacker in local reasoning, it is isolated from the major attack group and regarded as an irrelevant background attacker. The coverage and accuracy of reasoning are shown in Table VII.
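The comparison of affiliated nodes above amounts to counting how many distinct core members (attackers and stepping stones) each affiliated node is linked to. The sketch below uses hypothetical labels: x plays the role of a node such as 192.168.24.4 that is linked to every core member, and y a node linked to only one. The actual affiliation edges are those of the enriched evidence graph and are not reproduced here.

```python
def affiliation_counts(affiliation_edges, core_members):
    """Count the distinct core members each affiliated host is linked to."""
    linked = {}
    for host, member in affiliation_edges:
        if member in core_members:
            linked.setdefault(host, set()).add(member)
    return {host: len(members) for host, members in linked.items()}

# Hypothetical core of the attack group and hypothetical affiliation edges.
core = {"attacker", "stone1", "stone2"}
edges = [
    ("x", "attacker"), ("x", "stone1"), ("x", "stone2"),
    ("y", "stone1"),
    ("y", "bystander"),  # not a core member, so it is not counted
]
counts = affiliation_counts(edges, core)
ranked = sorted(counts, key=counts.get, reverse=True)  # most suspicious first
```

A higher count means the affiliation is less likely to be a coincidence, which is the reasoning used to rank 192.168.24.4 above the other affiliated nodes.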

7.3 Scenario 2

The second dataset we examine is the MIT Lincoln Lab 2000 DARPA intrusion detection scenario 2.0.2. The LLDOS 2.0.2 dataset contains a series of attack sessions that can be grouped into the following stages:

(1) The attacker probes the public DNS server of the target network via a HINFO query.

(2) The attacker compromises the DNS server by exploiting the Solaris Sadmind vulnerability.

(3) The attacker uploads attack scripts and mstream DDoS software to the compromised DNS server via FTP.

(4) The attacker telnets to the compromised DNS server. Then the probing and Sadmind exploit process is repeated towards hosts in the network. After successfully breaking into one Solaris host, the attacker uploads mstream software via FTP.

(5) The attacker accesses the compromised hosts via telnet and initiates a DDoS attack toward an external Web server.

7.3.1 Evidence Graph Construction. The attack scenario lasts about one hour and 45 minutes. In preprocessing, a self-extending time window T of 60 seconds is used in alert aggregation. We note that the general aggregation criteria do not perform well because most alerts focus on a limited number of targets. Thus we relax the constraint on the Desaddr field to better model the many-to-one pattern in raw alerts. The overall preprocessing results are shown in Table VIII.
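Aggregation with a self-extending time window can be sketched as follows. The matching key is an assumption: here the general criteria are taken to be (source, destination, signature), and relaxing the Desaddr constraint simply drops the destination from the key. The class name, function name, field layout, and the sample alerts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HyperAlert:
    key: tuple
    start: float
    end: float
    count: int = 1

def aggregate(alerts, window=60.0, match_destination=True):
    """alerts: iterable of (timestamp, src, dst, signature) tuples."""
    latest = {}   # key -> most recent hyper alert for that key
    result = []
    for ts, src, dst, sig in sorted(alerts):
        key = (src, dst, sig) if match_destination else (src, sig)
        current = latest.get(key)
        if current is not None and ts - current.end <= window:
            current.end = ts      # each merged alert pushes the window forward
            current.count += 1
        else:
            current = HyperAlert(key, ts, ts)
            latest[key] = current
            result.append(current)
    return result

# Hypothetical raw alerts (timestamps in seconds).
raw = [
    (0, "10.0.0.1", "10.0.0.9", "scan"),
    (50, "10.0.0.1", "10.0.0.9", "scan"),
    (60, "10.0.0.1", "10.0.0.8", "scan"),
    (100, "10.0.0.1", "10.0.0.9", "scan"),
    (200, "10.0.0.1", "10.0.0.9", "scan"),
]
strict = aggregate(raw)                              # destination must match
relaxed = aggregate(raw, match_destination=False)    # destination ignored
```

Because the window is measured from the last merged alert rather than the first, a steady stream of alerts less than T apart collapses into a single hyper alert regardless of its total duration.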



Fig. 7. Scenario 2: Enriched evidence graph with extracted attack group.

Table III. Scenario 2: Functional States from Local Reasoning

Host AT VI SS AF

202.77.162.213 0.84 0 0 0.84

172.16.115.20 0.84 0.84 0.99 0.84

172.16.112.50 0.82 0.84 0.99 0.84

172.16.112.207 0 0.65 0 0

131.84.1.31 0 0.99 0 0

172.16.116.194 0.82 0 0 0

172.16.112.100 0.80 0.80 0 0.84

Given the initial evidence graph, the analyst can filter out irrelevant events for a clearer view. In this scenario we arguably ignore all Reconnaissance alerts that are not followed by serious Intrusion or Post Compromise alerts. In formulating hypotheses for missed secondary evidence, we define Classification as all file transfer (FTP, tftp) and remote access (ssh, rlogin, telnet) flows. Association is defined as any host with Attacker or Victim state activated, and Time is extracted from the activation time of the corresponding functional states. Due to space limitations, we only show part of the enriched evidence graph corresponding to the reasoned attack group in Figure 7.
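The three hypothesis constraints can be read as a filter over flow records. The sketch below is one possible reading, not the prototype's implementation: the flow record fields, the function name, and the rule that a flow must start at or after the earliest activation time of an associated host are assumptions made for illustration.

```python
# Classification: file transfer and remote access services named in the text.
SERVICES = {"ftp", "tftp", "ssh", "rlogin", "telnet"}

def candidate_secondary_evidence(flows, activation_time):
    """flows: dicts with 'src', 'dst', 'service', 'start'.
    activation_time: host -> activation time of its Attacker or Victim state."""
    selected = []
    for flow in flows:
        if flow["service"] not in SERVICES:            # Classification
            continue
        times = [activation_time[h]
                 for h in (flow["src"], flow["dst"])
                 if h in activation_time]              # Association
        if not times:
            continue
        if flow["start"] >= min(times):                # Time (assumed rule)
            selected.append(flow)
    return selected

# Hypothetical hosts and flows; only host "a" has an activated state.
activation_time = {"a": 100}
flows = [
    {"src": "a", "dst": "b", "service": "telnet", "start": 120},  # kept
    {"src": "a", "dst": "b", "service": "http", "start": 130},    # wrong class
    {"src": "c", "dst": "d", "service": "ftp", "start": 140},     # no association
    {"src": "e", "dst": "a", "service": "ssh", "start": 50},      # too early
]
selected = candidate_secondary_evidence(flows, activation_time)
```

Flows that pass all three checks are the candidates added to the evidence graph as secondary evidence for the analyst to review.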

7.3.2 Local and Global Reasoning Results. The functional states of the nodes in Figure 7, as inferred in the local reasoning process, are shown in Table III.

In the structure analysis phase, we first compute eigenvector centrality to evaluate the importance of each node in the evidence graph. Top nodes in terms of eigenvector centrality score are shown in Table IV.

In the next step, we examine the effectiveness of seed-based single-link clustering. As shown in Table IV, node 131.84.1.31 has the highest eigenvector centrality score. Local reasoning shows that 131.84.1.31 has a highly



Table IV. Scenario 2: Eigenvector Centrality Scores

Host Eigenvector centrality


131.84.1.31 0.78

172.16.112.50 0.25

172.16.112.100 0.18

172.16.114.50 0.17

172.16.113.148 0.14

172.16.115.20 0.14

172.16.112.207 0.13

activated Victim state. Thus we start by choosing 131.84.1.31 as the initial seed. As shown on the left side of Figure 7, the resulting cluster has a star topology in which edges correspond to mstream DOS alerts, which intuitively indicates the many-to-one DDoS attack pattern. However, it is an isolated component in the evidence graph due to spoofed IP addresses. To further explore the attack scenario, we choose 172.16.112.50 as the seed, which has the second highest eigenvector centrality score and an active Stepping Stone state. With the threshold set to 80 percent of all correlation scores, the resulting attack group is shown in Figure 7, where the highlighted oval node represents the seed and the highlighted square nodes represent members added in group expansion. From Table III we can see that host 202.77.162.213 has the Attacker state activated. Hosts 172.16.115.20 and 172.16.112.50 both have the Stepping Stone state activated, while host 172.16.112.207 has only the Victim state activated. By comparing the activation times of these states, it is straightforward to see that 202.77.162.213 started the attack and that the attacker initially broke into the protected network through 172.16.115.20. Following that, 172.16.115.20 fans out attacks against 172.16.112.50 and 172.16.112.207. Though host-based information is needed for validation, 172.16.112.50 appears more likely to be compromised, as its Stepping Stone state has been activated.
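The star-shaped cluster around 131.84.1.31 noted above is a pattern that can be checked mechanically. The following sketch tests whether a set of undirected edges forms a star and returns its hub. The function name and the spoofed source labels are hypothetical; only the hub address comes from the scenario.

```python
from collections import Counter

def star_hub(edges):
    """Return the hub if the undirected edges form a star, otherwise None."""
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter(node for edge in unique for node in edge)
    # A star with k >= 2 edges has k + 1 nodes and one node of degree k.
    if len(unique) < 2 or len(degree) != len(unique) + 1:
        return None
    hubs = [n for n, d in degree.items() if d == len(unique)]
    return hubs[0] if len(hubs) == 1 else None

# Many hypothetical (spoofed) sources pointing at one target.
ddos_edges = [(f"spoofed-{i}", "131.84.1.31") for i in range(5)]
hub = star_hub(ddos_edges)

# A simple chain is not a star.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
no_hub = star_hub(chain)
```

A star whose leaves have no other edges, as is the case here because the sources are spoofed, explains why the cluster is an isolated component and why a second seed is needed.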

We can see that the attack group and major attack steps in the ground truth are successfully extracted. The false correlations 172.16.116.194 and 172.16.112.100 are caused by irrelevant background attacks. Further host-based investigation is needed to filter out these artifacts. Note that for the target of the DDoS attack, 131.84.1.31, the Victim state is activated immediately after the activation time of the Stepping Stone state of 172.16.115.20 and 172.16.112.50, which suggests a strong logical relation between the two separated attack components. Overall coverage and accuracy for the scenario are shown in Table VII.
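The Scenario 2 figures in Table VII can be reproduced from the host lists above under the assumption that accuracy is the fraction of extracted hosts that belong to the ground truth and coverage is the fraction of ground-truth hosts that were extracted. These definitions and the function name are assumptions of this sketch. The ground-truth set is inferred from Table III minus the two false correlations named in the text.

```python
def accuracy_and_coverage(extracted, ground_truth):
    """Return (accuracy, coverage) as fractions, under the assumed definitions."""
    hits = extracted & ground_truth
    return len(hits) / len(extracted), len(hits) / len(ground_truth)

# Hosts listed in Table III (the extracted group for Scenario 2).
extracted = {
    "202.77.162.213", "172.16.115.20", "172.16.112.50", "172.16.112.207",
    "131.84.1.31", "172.16.116.194", "172.16.112.100",
}
# The two hosts identified in the text as false correlations.
false_correlations = {"172.16.116.194", "172.16.112.100"}
ground_truth = extracted - false_correlations  # inferred, see lead-in

accuracy, coverage = accuracy_and_coverage(extracted, ground_truth)
accuracy_pct = round(100 * accuracy, 1)
coverage_pct = round(100 * coverage, 1)
```

Five correct hosts out of seven extracted gives 71.4% accuracy, and all five ground-truth hosts being present gives 100% coverage, matching the Scenario 2 column of Table VII.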

7.4 Scenario 3

Finally, we perform experiments on an attack dataset prepared under ARDA contract No. NBCHC030117, which initially funded this work. In this dataset, a multistage attack scenario is implemented in a network with multiple subnets and several hundred hosts. Background traffic and exploits calibrated in proportion to traffic traces collected from a real military network are injected to mask the coordinated attacks. Compared with previous scenarios, this is a more realistic example of network forensics investigation in a large-scale network where the foreground attacks are buried in a huge volume of



Table V. Scenario 3: Functional States from Local Reasoning

Host AT VI SS AF

74.205.114.158 0.69 0 0 0

42.152.69.166 0.80 0 0 0

100.10.20.4 0 0.69 0 0

100.20.200.15 0.69 0 0 0

100.10.20.8 0.79 0.80 0 0

100.20.1.3 0.80 0 0 0

100.10.20.20 0 0.69 0 0.84

Table VI. Scenario 3: Eigenvector Centrality Scores

Host Eigenvector centrality


74.205.114.158 0.93

100.10.20.8 0.21

100.10.20.4 0.08

100.20.200.15 0.07

100.10.20.20 0.06

42.152.69.166 0.02

100.20.1.3 0.01

background noise. Moreover, the attacker changes identity at different stages. From the ground truth, the whole attack can be split into the following phases:

(1) The attacker probes the protected network, then launches a failed exploit attempt against a Windows host from a different source.

(2) The attacker launches an attack against a Linux host from a different source and successfully compromises it through a web server exploit.

(3) The attacker uses the compromised Linux host to probe other hosts inside the protected network, then launches attacks and breaks into one of them.

(4) The attacker exfiltrates sensitive data from compromised hosts and exits.

7.4.1 Evidence Graph Construction. The attack scenario lasts about 60 minutes. In the preprocessing stage, we use the general aggregation criteria with a self-extending time window T of 60 seconds. The overall preprocessing results are shown in Table VIII.

Similar to scenario 2, we also ignore Reconnaissance alerts that are not followed by serious Intrusion or Post Compromise alerts. Only significant scanners that initiate intensive probing activities are kept in the evidence graph. For missed secondary evidence, we define Classification as all file transfer (FTP, tftp) and remote access (ssh, rlogin, telnet) flows. Association is defined as any host with Attacker or Victim state activated, and Time is extracted from the activation time of the functional states. The part of the enriched evidence graph corresponding to the reasoned attack group is shown in Figure 8.

7.4.2 Local and Global Reasoning Results. States of nodes inferred in the local reasoning process are shown in Table V. Top nodes in terms of eigenvector centrality score are shown in Table VI. In global reasoning, 74.205.114.158 is selected as the initial seed, as it has the highest centrality score and the Attacker state activated. However, the group expansion process fails to find other



Fig. 8. Scenario 3: Enriched evidence graph with extracted attack groups.

members of the attack group because edges incident to 74.205.114.158 all represent scanning activities that have a low correlation score. To further explore the dataset, we select node 100.10.20.8 as the second seed, which has the next highest centrality score and both Attacker and Victim states activated. With the threshold set at 90 percent of all correlation scores, the resulting attack group is shown in Figure 8, where the highlighted square nodes represent members expanded from 100.10.20.8.
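The fallback from the first seed to the second can be sketched as a loop over candidate seeds in descending order of centrality that stops at the first seed whose expansion yields more than a singleton group. The centrality values are those of Table VI. The `expand` callable, the function name, and the group membership used here are hypothetical stand-ins for the real group expansion process.

```python
def first_productive_seed(centrality, expand, min_size=2):
    """Try seeds by descending centrality until one expands into a group."""
    for seed in sorted(centrality, key=centrality.get, reverse=True):
        group = expand(seed)
        if len(group) >= min_size:
            return seed, group
    return None, set()

# Eigenvector centrality scores from Table VI.
centrality = {
    "74.205.114.158": 0.93, "100.10.20.8": 0.21, "100.10.20.4": 0.08,
    "100.20.200.15": 0.07, "100.10.20.20": 0.06, "42.152.69.166": 0.02,
    "100.20.1.3": 0.01,
}

# Placeholder expansion results (illustrative only): the top-ranked seed
# stays a singleton, the second seed expands.
placeholder_groups = {
    "100.10.20.8": {"100.10.20.8", "100.20.200.15"},
}

def expand(seed):
    return placeholder_groups.get(seed, {seed})

seed, group = first_productive_seed(centrality, expand)
```

In practice the analyst makes this choice interactively and also weighs the functional states of each candidate, so the loop is only a simplification of the procedure described above.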

In the next step, local and global reasoning results are combined to extract the attack group and scenario. It is intuitive to observe that 74.205.114.158 initiates intensive probing activities and has the Attacker state activated, which correctly identifies one external attacker in step 1. The second seed, 100.10.20.8, is also correctly identified as the Victim in step 3, but its Attacker state is incorrectly activated due to irrelevant background attacks against 100.10.20.20. Host 100.20.1.3 is also a false correlation caused by background noise; the similar pattern of attack against the Windows IIS vulnerability makes it difficult to identify as such without host-based information. Two external attackers in step 2, 168.225.9.78 and 91.13.103.83, are absent from the evidence graph because of missed primary evidence. As no further connections were established between the two absent attackers and other members of the attack group, we are not able to identify them even with the enriched evidence graph. According to the ground truth, the attacker breaks into host 100.20.200.15 and uses it as a stepping stone. In our analysis, though 100.20.200.15 is correctly included in the attack group, it has only the Attacker state activated because the earlier exploits against it evaded detection by Snort. The analyst has to rely on the out-of-band information that 100.20.200.15 is an internal trusted server to reason that it was potentially compromised. Finally, we note that host 100.10.20.4



Table VII. Accuracy and Coverage for Three Scenarios

Scenario 1 Scenario 2 Scenario 3

Accuracy 83.3% 71.4% 57.1%

Coverage 100% 100% 71.4%

Table VIII. Preprocessing Results for the Three Scenarios

Scenario 1 Scenario 2 Scenario 3

Traffic: TCPDUMP 300 MB 118 MB 3.6 GB

Traffic: NetFlow 6.4 MB 10.6 MB 11.1 MB

Traffic: Reduction 97.87% 91.0% 99.6%

Alerts: Raw Alerts 7502 52815 12023

Alerts: Hyper Alerts 21 150 406

Alerts: Reduction 99.72% 99.7% 97.6%

has the third highest eigenvector centrality score and the Victim state activated; however, it is not related to the attack group expanded from 100.10.20.8. By selecting 100.10.20.4 as the initial seed, the external attacker 42.152.69.166 is extracted in group expansion, which reveals the failed exploit attempt in step 1.

From Table VII we can see that false negatives in primary evidence significantly degrade the coverage and accuracy of our analysis results. However, our functional and structural metrics still effectively extract the major identities and steps in the attack scenario. Moreover, a strong association between the three attack groups can be observed from the fact that the activation times of the Attacker state in 74.205.114.158, 42.152.69.166, and 100.20.200.15 closely follow each other.

8. CONCLUSION

In this article, we developed a novel graph-based approach for network forensics analysis. The evidence graph model provides integrated support for evidence presentation, manipulation, and semi-automated analysis. The hierarchical reasoning framework consists of two interrelated phases. Local reasoning applies fuzzy techniques to infer the functional states of hosts from local observations. Global reasoning identifies important entities from the structure of the evidence graph and extracts the corresponding highly correlated attack group. The attack scenario is further analyzed by combining results from both phases. Experimental evaluations demonstrate that our analysis mechanism achieves good coverage (71.4% to 100%) and accuracy (57.1% to 83.3%) in attack group and scenario extraction across the three scenarios (Table VII), with less dependence on comprehensive knowledge of exploits and vulnerabilities. Moreover, our hypothesis testing procedure helps to recognize attackers' “legitimate” activities from secondary evidence.

This work is only the starting point of our efforts towards network forensics analysis. In future research, we plan to address the limitations of the current methods and pursue several important extensions. We will extend our prototype to incorporate more evidence sources such as firewall and application logs. In global reasoning, more accurate and fine-grained correlation evaluation approaches will be explored to better suppress the effect of background noise. For large-scale analysis, high-performance clustering approaches such as spectral clustering methods will be evaluated on our evidence graph model. Also, to reduce the ad hoc nature of the current hypothesis testing procedure, we will investigate methods to automate the process of hypothesis formulation and quantitatively evaluate the credibility of candidate hypotheses. Finally, we will work with industrial and government agencies to evaluate our techniques in larger-scale experiments and real-world cybercrime investigation cases.

ACKNOWLEDGMENTS

We would like to thank the anonymous reviewers for their valuable comments.


Received June 2006; revised August 2007; accepted May 2008

