Managing Cyber Threat Activities through Formal
Modeling of CTI Data
By
Zafar Iqbal
(Registration No: 2012-NUST-PhD-IT-35)
Thesis Supervisor: Dr. Zahid Anwar
Department of Computing
School of Electrical Engineering and Computer Science,
National University of Sciences & Technology (NUST),
Islamabad, Pakistan.
(2020)
A thesis submitted to the National University of Sciences and Technology, Islamabad,
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in
Information Technology
Abstract
Cyber-attacks launched by nation-states, organizations, and individuals within and
across borders are on the rise. Modern-day adversaries change signatures and use
multiple pieces of malware to launch attacks. Such attacks are termed Advanced Persistent
Threats (APTs). Although a large amount of cyber threat data regarding these APTs is
available online, its high veracity and large volume make the timely analysis of APTs
a challenging task for security analysts. Moreover, it has been observed that
APTs launched against one organization subsequently succeed with high probability
against other similar organizations. It has therefore become necessary for
organizations to accumulate and share cyber threat data with peers. Furthermore, this data
should incorporate information regarding the various phases of cyber threat management
(CTM), namely cyber threat prevention, detection, and response. In this regard, a
few efforts have been made towards the structuring and sharing of cyber threat data.
Noteworthy among these is the Structured Threat Information Expression (STIX).
Unfortunately, the current state of the structured data is poor. Structured reports are not
appropriately formatted, use incorrect vocabulary, wrongly label threat data, or leave
out key components, which curtails their usefulness for CTM. The solution presented
in this thesis to address the aforesaid problems is organized into three formal
sub-frameworks, namely STIXGEN, SCERM, and A2CS. Each of these sub-frameworks
is designed to achieve one of three distinct thesis goals.
The STIX Generation (STIXGEN) framework is proposed and its prototype is developed
to automatically generate distinct, threat-relevant, and error-free structured data.
A comprehensive STIX dataset of well-known APTs has been generated and shared
with the community for the benefit of researchers.
The Structured threat data Cleansing, Evaluation, and Refinement (SCERM) framework
has been developed to acquire STIX reports from STIXGEN and other sources,
uplift the Cyber Threat Intelligence (CTI) data by refining incomplete or missing
components, and valuate it for the different phases of CTM. During SCERM’s evaluation,
it is observed that current STIX reports have limited information on prevention
and almost none for the response phase of CTM. The results further demonstrate that
SCERM significantly enriches STIX reports. The improvement in prevention is 73%
and in response is 100%.
Subsequently, the APTs Analysis and Classification System (A2CS) has been developed
for the automatic analysis of APTs. It employs ontology modeling and semantic rules
to analyze APTs, identify their missing artifacts, and infer the tactics,
techniques, and procedures (TTPs) being employed. A2CS takes refined structured
data as input from SCERM and extracts both high- and low-level artifacts according to
various attacker and defender models. It then maps this data onto the ontology, which
helps to identify the missing artifacts of APTs and to infer high-level
TTPs with the help of low-level artifacts.
Overall, the proposed solution generates refined, distinct, error-free, and properly
labeled structured threat data, valuates it for the different phases of CTM, and employs
different attacker and defender models for the automated analysis of APTs, the
identification of missing artifacts, and the inference of high-level artifacts.
Acknowledgment
All praise and thanks be to Allah Almighty, Who showered His countless blessings
and bestowed upon me the intellect, strength, and resources to complete this thesis.
I owe my deepest gratitude to my supervisor, Dr. Zahid Anwar, whose ever-present
support and guidance enabled me to complete my thesis well within the stipulated
time. Despite his prolonged commitments with a series of foreign assignments, he
always remained available to nourish my stray ideas with his valuable experience and
strong technical background, for which I am highly indebted to him. This dissertation
would not have been completed without his guidelines and encouragement.
I am also heartily thankful to my co-supervisor, Dr. Yousra Javed, and to my guidance
committee members, Dr. Rafia Mumtaz, Dr. Asad Waqar Malik, Dr. Hassan Islam,
and Dr. Shahzad Saleem, for their effective supervision, encouragement, and guidance.
This thesis would not have been possible without the love, prayers, and support of
my parents and my wife, who effectively shared my responsibilities and independently
managed all domestic affairs, thus enabling me to stay focused on my research.
I am also grateful to all members of the NUST administration, particularly Dr. Osman
Hasan (Principal SEECS), Dr. Sharifullah Khan (Senior HoD, Department of Computing
(DoC), SEECS), Dr. Rafia Mumtaz (HoD IT, SEECS), Dr. Rabia Irfan (PhD Coordinator
DoC), Mr. Zahid Aslam Raja (OIC Exams (PG), SEECS), Mr. Muhammad Banaras (DD
Monitoring at HQ NUST), Mr. Ejaz Ahmed (DoC Secretary), and Mr. Muhammad
Adnan Bhatti (Personal Assistant of SHOD DoC) for their kind support and guidance
in administrative affairs. I am also thankful to all those who remembered me in their
prayers during all phases of my PhD.
In the name of Allah, the most Gracious, the most Merciful.
I dedicate my work to my parents, my wife, and all my family members, whose sacrifices, love,
and prayers enabled me to reach this stage.
List of Publications
Journal Publications
1. Zafar Iqbal and Zahid Anwar, “SCERM - A Novel Framework for Automated
Management of Cyber Threat Response Activities”, Future Generation Computer
Systems, Volume 108, July 2020, Pages 687-708, Elsevier.
2. Zafar Iqbal and Zahid Anwar, “Ontology Generation of Advanced Persistent
Threats and their Automated Analysis”, NUST Journal of Engineering Sciences
registry entries, strings, API sequences, and service names used by malicious code.
The authors in [115] present a threat intelligence system to learn attack patterns and
TA behaviors. The proposed system is evaluated using several tools, namely
a cloud-based honeypot called Kippo, the Elasticsearch stack, and Kibana. Kippo is used
to collect various event logs. The Elastic stack is employed for cyber threat
event search, whereas Kibana is an open-source visualization dashboard for
Elasticsearch. In the proposed system, several cyber attack events are identified,
such as Root trying auth none, Root trying auth password, Root failed with a password, Login
attempt failed, Channel open failed, Root authenticated with a password, Connection Lost, and
Unauthorized login.
All of the aforesaid works are worthy contributions from the point of view of data
retrieval, and these efforts are complementary to our work.
Chapter 3. Related Work
3.11 Conclusion
In the literature review, a case is made that APTs are complex cyber attacks. It is
observed that although a massive volume of CTI data is publicly available, most
of the data has quality issues. Hence, APT analysis is a challenging task. Although
tools are publicly available to generate structured CTI data, the data they produce
is redundant, erroneous, and threat-irrelevant, and does not properly follow threat
analysis models, especially those related to CKC and POP. All of these issues became
the motivation for our research. During the literature review, the STIX format was
selected for the analysis of structured CTI data. Then, a tool named STIXGEN was
developed to generate error-free and threat-relevant structured CTI data in STIX format.
Subsequently, it was found that most of the CTI data is not suitable for the different
phases of CTM. Therefore, a sub-framework called SCERM was developed, which boosts,
refines, and valuates structured CTI data for the detection, prevention, and response
phases of CTM. Afterwards, ontological modeling was identified as an appropriate way to
analyze domain knowledge. Therefore, a combined ontology of CKC and POP
was developed for APT analysis and effective CTM.
Chapter 4
Automatic Generation of Structured
Threat Data
4.1 Introduction
Presently, a large amount of CTI data regarding APTs is publicly available. However,
due to the large volume and distributed nature of the data, identifying and
collecting it for CTM is challenging. It is observed during the research that
APTs launched against one organization subsequently succeed with high probability
against other similar organizations. Therefore, organizations need to
compile and share CTI data with peers in a structured form for the timely prevention
and detection of, and response to, a cyber attack. Unfortunately, the publicly available
solutions for structured data generation are manual and, most of the time, produce
erroneous and redundant CTI data. To overcome these problems, this chapter presents a
sub-framework named STIXGEN, which takes CTI data as input and produces properly
labeled, error-free, and threat-relevant structured threat data for CTM. For this purpose,
the “Structured Threat Information eXpression (STIX)” format is used, as it is a
comprehensive effort.
4.2 Research Approach and Contributions
We regard all the aforesaid deficits as barriers to the utilization of structured data,
and these shortcomings became the motivation for our research work. We designed and
developed a prototype of STIXGEN to overcome the issues of CTI collection,
structuring, and sharing. The prototype is a lightweight
application built with Microsoft Visual Basic .NET and a Microsoft Access 2010 database.
It takes CTI data as input and generates a STIX report as output. In the following
sections, the methodology of the STIXGEN sub-framework is presented in detail.
We not only proposed the STIXGEN sub-framework for structured threat data generation
but also developed its prototype as a proof of concept.
4.3 STIXGEN System Model
Our methodology aims to develop a sub-framework for the generation of error-free and
threat-relevant STIX reports. During our literature review, we found that a large
volume of CTI is available, but it is mostly unstructured. A few efforts, such as
OpenIOC [53] and STIX, have been made by governments towards the standardization of
cyber threat data, but their adoption is slow. Among these, we found STIX to be a
comprehensive one. We surveyed different security blogs, gathered STIX reports, and
checked their quality. We found that publicly available STIX reports are few and contain
erroneous and incomplete information. Threat analysts therefore hesitate to use the
threat data. Our proposed sub-framework, STIXGEN, generates threat-relevant, properly
placed, and error-free structured data. We therefore expect it to increase user
confidence in structured CTI data and, in turn, the quality and usage of structured CTI
data for CTM.
To describe our proposed sub-framework, we have selected a well-known family of
APTs, namely the retail industry APTs [116]. According to Illusive Networks [117], the
global retail industry makes about $20 trillion in sales per year, with millions of
dollars passing through online and credit-card-based payment methods. This large annual
revenue makes the retail industry attractive to attackers. A detailed description of our
proposed sub-framework and its prototype is presented in the following section.
4.4 STIXGEN Design and Architecture
The design and architecture of STIXGEN revolve around the STIX standard, as shown
in Figure 4.1.
Figure 4.1: STIXGEN Flow Diagram
The threat analyst collects APT data related to the different STIX components, namely
campaigns, TTPs, indicators, observables, incidents, COAs, exploits, and TAs, and feeds
it into a database. The important entities of the STIX schema are highlighted
in Figure 4.2. Following the STIX requirements, we have created a separate table for each
STIX component. The STIX encoder retrieves CTI data from the database, encodes it
according to the STIX standard, and generates a STIX report, which can
then be shared with peer organizations for cyber threat prevention, detection, and
response.
Figure 4.2: STIXGEN’s Database Schema
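The one-table-per-component layout can be illustrated with a minimal sketch. The prototype itself uses a Microsoft Access database; the sketch below swaps in Python's built-in sqlite3 module purely for illustration, and every table and column name is hypothetical, since the actual schema is given only in Figure 4.2.

```python
import sqlite3

# Hypothetical schema: one table per STIX component, each child table
# linked to its campaign through a campaign_id column.
SCHEMA = """
CREATE TABLE campaign  (campaign_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ttp       (ttp_id INTEGER PRIMARY KEY,
                        campaign_id INTEGER REFERENCES campaign(campaign_id),
                        name TEXT);
CREATE TABLE indicator (indicator_id INTEGER PRIMARY KEY,
                        campaign_id INTEGER REFERENCES campaign(campaign_id),
                        name TEXT, value TEXT);
CREATE TABLE coa       (coa_id INTEGER PRIMARY KEY,
                        campaign_id INTEGER REFERENCES campaign(campaign_id),
                        name TEXT);
"""

def create_database():
    """Create an in-memory database holding the illustrative tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

db = create_database()
# Data entry: the analyst stores one campaign and its related components.
db.execute("INSERT INTO campaign VALUES (1, 'BackOff POS')")
db.execute("INSERT INTO ttp (campaign_id, name) VALUES (1, 'Memory Scraping')")
db.execute("INSERT INTO indicator (campaign_id, name, value) "
           "VALUES (1, 'Protocol', 'HTTP POST')")

tables = sorted(row[0] for row in
                db.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

Keeping each component in its own table, keyed by campaign, lets an encoder collect everything that belongs to one campaign with simple per-table lookups.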
4.5 Case Study
A case study is provided for a better understanding of STIXGEN through a real-world
example. For this purpose, we have selected well-known APTs of the retail industry.
First, we briefly describe the retail industry’s APTs; then we describe how the
user feeds CTI data into STIXGEN and generates STIX reports. Subsequently, an analysis of
the generated STIX is presented.
4.5.1 Retail Industry - APTs Selection
The retail industry comprises individuals and companies involved in the selling
of goods and services to end-users. Earlier, a cash register was used for
record-keeping; it has been replaced by an electronic device, the “Point of Sale (POS)
terminal”. These terminals are used to pay for goods with credit
and debit cards. The POS system reads the user’s financial data from credit and debit
cards and saves it on a central server. POS APTs are launched to steal the user’s
financial data from the POS terminals and the central servers. POS APTs have more than
a dozen variants [116]. We selected some of these variants to describe the working of
STIXGEN. A detailed description of the STIXGEN sub-framework and its prototype
is presented in the ensuing sections.
4.5.2 Data Entry
First, a threat analyst scans different security blogs to gather CTI data related to
renowned POS APTs such as Alina [118], JackPOS [119], BackOff POS [18], CenterPOS
[120], and ProPOS [121]. After data collection, the threat analyst extracts the CTI data
related to the STIX components from the security blogs and feeds it into the database
through an entry form. Part of the CTI data related to the BackOff APT, collected from
three different security blogs, namely SecureBox, Symantec, and RSA, can be seen in
Figure 4.3. It can be seen that SecureBox provides information regarding the campaign
and TTPs only, whereas Symantec and RSA share indicator information.
Figure 4.3: Backoff APT and Security Blogs
4.5.3 STIX Encoder
The STIX Encoder is the heart of STIXGEN. It retrieves CTI data from the database,
processes the information, and generates a STIX report. Part of the STIX encoding
algorithm can be seen in Algorithm 4.1.
The STIX Encoder performs the following operations:
1. First, it adds the header information (including the namespaces).
2. Then, the encoder connects to the database, as can be seen in line 4.
3. Next, it reads the Campaign table and retrieves the campaign’s ID and title,
as can be seen in lines 6 to 9.
4. Accordingly, it fetches the CTI data, namely TTPs, indicators, incidents, TAs,
observables, exploits, and COAs, from their corresponding tables and adds it to the STIX
report, as can be seen in lines 10 to 42.
5. Similarly, for the next campaign, the encoder repeats steps 3 and 4.
6. This process continues until all campaigns are processed and the final STIX report
is generated, which can be seen in line 46.
7. In the end, the database is closed, as can be seen in line 48.
In this way, a combined STIX of the POS family, comprising five different APTs, is
generated through STIXGEN. The next section provides an analysis of the generated STIX.
Algorithm 4.1: STIX Generation
1: Input: CTIData
2: Output: STIXReport
3: ▷ Connect to Database
4: Connect(DB)
5: if DatabaseConnection ≡ successful then
6:     Read(Campaign_Table)
7:     for all Record of Campaign_Table do
8:         CampaignID ≡ Campaign_Table.CampaignID
9:         CampaignTitle ≡ Campaign_Table.CampaignName
10:        ▷ Adding TTP details
11:        for all Record of TTP_Table do
12:            if CampaignID ≡ TTP_Table.CampaignID then
13:                WriteInStix(TTP_Table.TTP_Name)
14:                ▷ Add related Exploits and COAs
15:                WriteInStix(Exploit_Table.Exploit_Value)
16:                WriteInStix(COA_Table.COA_Name)
17:            end if
18:        end for
19:        ▷ Adding Indicator details
20:        for all Record of Indicator_Table do
21:            if (CampaignID ≡ Indicator_Table.CampaignID) then
22:                WriteInStix(Indicator_Table.IndicatorName)
23:                WriteInStix(Indicator_Table.IndicatorValue)
24:                ▷ Add related TTPs, COAs and Observables
25:                WriteInStix(TTP_Table.TTP_Value)
26:                WriteInStix(COA_Table.COA_Name)
27:                WriteInStix(Observable_Table.Observable_Name)
28:            end if
29:        end for
30:        ▷ Adding Incident details
31:        for all Record of Incident_Table do
32:            if (CampaignID ≡ Incident_Table.CampaignID) then
33:                WriteInStix(Incident_Table.Incident_Name)
34:                WriteInStix(Incident_Table.Incident_Value)
35:            end if
36:        end for
37:        ▷ Adding ThreatActor details
38:        for all Record of ThreatActor_Table do
39:            if CampaignID ≡ ThreatActor_Table.CampaignID then
40:                WriteInStix(ThreatActor_Table.ThreatActor_Name)
41:            end if
42:        end for
43:    end for
44: end if
45: ▷ Generating STIX Report
46: Generate(STIXReport)
47: ▷ Closing Database
48: Close(DB)
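The control flow of Algorithm 4.1 can be sketched in Python as follows. This is a simplified illustration rather than the prototype's Visual Basic .NET code: it covers only campaigns, TTPs, and indicators, it uses an in-memory sqlite3 database in place of Microsoft Access, and the table names, column names, and XML element names are all hypothetical placeholders that do not follow the real STIX schema or its namespaces.

```python
import sqlite3
import xml.etree.ElementTree as ET

def encode_report(db):
    """Build one report from all campaigns and their related components."""
    root = ET.Element("Report")  # placeholder root; STIX header and namespaces omitted
    campaigns = db.execute(
        "SELECT campaign_id, name FROM campaign ORDER BY campaign_id").fetchall()
    for campaign_id, title in campaigns:
        node = ET.SubElement(root, "Campaign", id=str(campaign_id), title=title)
        # Attach only the TTPs whose campaign_id matches the current campaign.
        for (ttp_name,) in db.execute(
                "SELECT name FROM ttp WHERE campaign_id = ? ORDER BY ttp_id",
                (campaign_id,)):
            ET.SubElement(node, "TTP").text = ttp_name
        # Attach the indicators of the same campaign.
        for name, value in db.execute(
                "SELECT name, value FROM indicator WHERE campaign_id = ? "
                "ORDER BY indicator_id", (campaign_id,)):
            ET.SubElement(node, "Indicator", name=name).text = value
    return ET.tostring(root, encoding="unicode")

# Illustrative data only; a real database would be filled through the entry form.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE campaign  (campaign_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ttp       (ttp_id INTEGER PRIMARY KEY, campaign_id INTEGER, name TEXT);
CREATE TABLE indicator (indicator_id INTEGER PRIMARY KEY, campaign_id INTEGER,
                        name TEXT, value TEXT);
INSERT INTO campaign VALUES (1, 'Alina'), (2, 'BackOff POS');
INSERT INTO ttp (campaign_id, name) VALUES
    (1, 'Memory Scraping'), (2, 'Memory Scraping'), (2, 'Key Logging');
INSERT INTO indicator (campaign_id, name, value) VALUES (2, 'Protocol', 'HTTP POST');
""")
stix_report = encode_report(db)
db.close()
print(stix_report)
```

The sketch filters each component table with a query per campaign instead of scanning every record and comparing IDs, which is equivalent to the nested loops of the algorithm but leaves the matching to the database.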
4.5.4 Analysis of the Generated STIX
A STIXViz snapshot of the generated STIX can be seen in Figure 4.4. In the figure, five
different POS APTs, namely Alina POS, JackPOS, BackOff POS, CenterPOS, and ProPOS,
can be seen from left to right. The analysis of each of these APTs is provided in the
following subsections, while their comparison is provided in the next section.
4.5.4.1 Alina POS APT
Figure 4.5 provides a close-up snapshot of the Alina APT. This APT was publicly disclosed
in May 2013. In this APT, attackers generally access the target system through Remote
Desktop Login and install the malware. After installation, the malware identifies the
desired processes, scans their memory, and extracts payment card data. It then encrypts
the extracted data using an XOR function and transmits it to the Command and Control
server via HTTP Post. It is believed that this APT was launched by the actors of the
Black Atlas Operation against several bars and restaurants in the US.
Figure 4.5: Alina POS APT
4.5.4.2 JackPOS APT
A zoomed-in snapshot of the JackPOS APT STIX can be seen in Figure 4.6. JackPOS is
generally installed through a Fake Java Update. Like Alina, this APT employs the Memory
Scraping technique for data stealing. It performs Base64 encoding on the stolen data
and then transmits it to the Command and Control server using HTTP Post. It has been
launched against several countries, such as the US, India, and Spain.
Figure 4.6: JackPOS APT
4.5.4.3 BackOff POS APT
A close-up snapshot of the BackOff POS APT is shown in Figure 4.7. This APT was first
identified in July 2014. The actor behind this APT uses Remote Desktop
Applications and Brute-force login techniques for its delivery. Moreover, it employs
the Memory Scraping and Key-Logging techniques for data extraction. This APT
compromised more than 1000 businesses in the US, including Target stores [122], and stole
millions of users’ personal data. Furthermore, this APT employs RC4 and
Base64 encoding to obscure the stolen data.
4.5.4.4 CenterPOS APT
Figure 4.8 shows a close-up STIXViz snapshot of the CenterPOS APT. This APT was
discovered in Sep 2015. Like its predecessors, it employs the Memory Scraping technique
for data stealing. It uses the HTTP protocol for data exfiltration and
the Triple-DES standard to encrypt the stolen data. It is believed that CenterPOS was
launched by the actors of the Black Atlas Operation against several countries.
Figure 4.7: BackOff POS APT
Figure 4.8: CenterPOS APT
4.5.4.5 ProPOS APT
A zoomed-in snapshot of the ProPOS APT’s STIX is shown in Figure 4.9. It was
discovered in Dec 2015. This APT employs the Memory Scraping technique for data stealing.
ProPOS uses Base64 encoding and the XORing technique to obscure the stolen
data.
Figure 4.9: ProPOS APT
4.5.5 Comparison of the POS APTs
In this section, a detailed comparison of the Alina, JackPOS, BackOff POS, CenterPOS,
and ProPOS APTs is presented. These APTs are correlated in multiple ways, and STIXViz
screenshots of each scenario are provided to show the reader why all of these APTs
are grouped under the common umbrella of a single family. Details are provided in the
following subsections.
4.5.5.1 Tactics Techniques and Procedures
Generally, POS APTs employ several techniques to steal user data, such as Memory
Scraping, Key Logging, Network Sniffing, and Cameras. It can be seen from
Figure 4.10 that all selected POS APTs, namely Alina POS, JackPOS, BackOff POS,
CenterPOS, and ProPOS, employ the Memory Scraping technique for data stealing. The
BackOff POS APT is the only one that additionally employs the Key Logging technique.
The shared technique suggests that the aforesaid APTs belong to the same family.
Figure 4.10: TTP employed
4.5.5.2 Protocol Analysis
It is learned from various security blogs that POS APTs normally use HTTP
POST and GET, FTP, and DNS for the exfiltration of stolen data to Command and Control
servers. Figure 4.11 highlights that HTTP POST is employed by all
five of the aforesaid APTs.
Figure 4.11: Protocol Employed
4.5.5.3 Operating System Analysis
It is identified from the literature that POS terminals run on different variants of the
Unix and Windows operating systems. It is also reported that the development and
maintenance of POS applications for the various variants of the Windows OS is easier
than for Unix. Consequently, there are more Windows-based POS terminals than
Unix-based ones, which also means that Windows-based POS devices inevitably attract
cyber criminals. This assumption can be verified from the generated STIX, as can be seen
in Figure 4.12. The figure highlights that all the aforementioned APTs are designed to
target Windows-based terminals.
Figure 4.12: Operating System Employed
4.5.5.4 Folder Analysis
APTs create folders on the victim machine for their installation and for the temporary
storage of stolen information. Figure 4.13 indicates that the selected POS APTs use the
same folder for installation and data storage.
Figure 4.13: Folder Path Employed
4.5.5.5 Encryption Analysis
Generally, retail industry attackers employ multiple techniques to obscure the
stolen data. Earlier versions of POS APTs employ simple obfuscation techniques
such as XORing and Base64 encoding, whereas recent APTs use encryption
techniques such as RC4 and DES. It can be seen in Figure 4.14 that
Alina and JackPOS are earlier POS variants that employ XORing and Base64 encoding,
whereas the BackOff APT is a middle-aged APT that employs the RC4 encryption
technique. Similarly, CenterPOS is a recent APT that employs Triple-DES to
obscure the stolen data.
Figure 4.14: Encryption Evolution
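To see why XORing and Base64 encoding count as simple obfuscation rather than encryption, consider the following minimal sketch. The sample string and the single-byte key are made up for illustration and are not taken from any of the analyzed APTs.

```python
import base64

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

sample = b"4111111111111111=2512"  # made-up test value, not real card data
key = b"\x5a"                      # hypothetical single-byte XOR key

xored = xor_bytes(sample, key)       # XOR obfuscation
encoded = base64.b64encode(sample)   # Base64 encoding (uses no key at all)

# Both transformations are undone with the same trivial operations.
recovered_from_xor = xor_bytes(xored, key)
recovered_from_b64 = base64.b64decode(encoded)
print(recovered_from_xor == sample, recovered_from_b64 == sample)
```

Base64 needs no secret to reverse, and a short XOR key can be recovered by trying all candidate values, which is why the later variants' move to ciphers such as RC4 and Triple-DES marks a real step up in protecting the exfiltrated data.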
4.5.5.6 Comparison Outcomes
The correlation results for the aforesaid APTs are shown in Table 4.1. It can be noted
from the correlation results for TTPs, protocols, OS used, and folder created that the
POS APTs JackPOS, BackOff POS, CenterPOS, and ProPOS are variants of the Alina POS
APT.
Table 4.1: Comparison of APTs
APT       | TTP                          | Protocol       | OS      | Folder    | Encryption
Alina     | Memory Scraping              | HTTP POST      | Windows | %APPDATA% | XOR Key
JackPOS   | Memory Scraping              | HTTP POST      | Windows | %APPDATA% | Base64
BackOff   | Memory Scraping, Key Logging | HTTP POST, FTP | Windows | %APPDATA% | RC4
CenterPOS | Memory Scraping              | HTTP POST      | Windows | %APPDATA% | Triple DES
ProPOS    | Memory Scraping              | HTTP POST      | Windows | %APPDATA% | RC4 and XOR Key
The encryption analysis results further affirm that the selected APTs
belong to the same family and that Alina is the predecessor of the remaining four APTs.
Therefore, it can be concluded that structuring CTI data is an effective way to
classify APTs.
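The correlation behind Table 4.1 can be reproduced with a few lines of Python. The sketch below is only an illustration of the comparison, not part of the STIXGEN prototype: it transcribes the table into a dictionary and intersects the values of each attribute across the five APTs. The function name and the data layout are assumptions.

```python
# Attribute values transcribed from Table 4.1.
APTS = {
    "Alina":     {"TTP": {"Memory Scraping"}, "Protocol": {"HTTP POST"},
                  "OS": {"Windows"}, "Folder": {"%APPDATA%"},
                  "Encryption": {"XOR Key"}},
    "JackPOS":   {"TTP": {"Memory Scraping"}, "Protocol": {"HTTP POST"},
                  "OS": {"Windows"}, "Folder": {"%APPDATA%"},
                  "Encryption": {"Base64"}},
    "BackOff":   {"TTP": {"Memory Scraping", "Key Logging"},
                  "Protocol": {"HTTP POST", "FTP"},
                  "OS": {"Windows"}, "Folder": {"%APPDATA%"},
                  "Encryption": {"RC4"}},
    "CenterPOS": {"TTP": {"Memory Scraping"}, "Protocol": {"HTTP POST"},
                  "OS": {"Windows"}, "Folder": {"%APPDATA%"},
                  "Encryption": {"Triple DES"}},
    "ProPOS":    {"TTP": {"Memory Scraping"}, "Protocol": {"HTTP POST"},
                  "OS": {"Windows"}, "Folder": {"%APPDATA%"},
                  "Encryption": {"RC4", "XOR Key"}},
}

def shared_features(apts):
    """Return, for each attribute, the values common to every APT."""
    attributes = next(iter(apts.values())).keys()
    return {attr: set.intersection(*(features[attr] for features in apts.values()))
            for attr in attributes}

common = shared_features(APTS)
for attribute, values in common.items():
    print(attribute, sorted(values))
```

The intersection is non-empty for the TTP, protocol, OS, and folder attributes and empty for encryption, which matches the reading of the table: the first four attributes tie the family together, while the encryption column differs from variant to variant.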
4.5.5.7 Cyber Threat Management through Observables
Multiple observables can be identified in Figure 4.15, such as Protocols: HTTP and FTP;
Domain Names: Jackkk[.]com and Sobra[.]ws; Files: Epson.exe, Wnhelp.exe, javaw.exe,
NTProvider.exe, windefender.exe, and driver.sys; and APIs: Process32First() and
Process32Next().
Figure 4.15: Observables for CTM
These observables can be employed in the prevention, detection, and
response phases of CTM. For example, the detection team can monitor outbound
HTTP traffic to check whether data is being stolen. The prevention team can place
the Command and Control domain names under observation through the firewall to
check whether a machine tries to connect to the Command and Control server. Similarly,
file and folder names can be added to the antivirus software to block POS attacks.
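A minimal sketch of how such observables might be matched against monitoring data is given below. The observable values come from Figure 4.15 as listed above; the event format, the host names, and the function names are hypothetical and stand in for whatever the firewall or endpoint logs actually provide.

```python
# Observables listed for Figure 4.15, normalized to lower case.
OBSERVABLES = {
    "domains": {"jackkk.com", "sobra.ws"},
    "files": {"epson.exe", "wnhelp.exe", "javaw.exe",
              "ntprovider.exe", "windefender.exe", "driver.sys"},
}

def refang(domain: str) -> str:
    """Turn a defanged domain such as 'Jackkk[.]com' back into 'jackkk.com'."""
    return domain.replace("[.]", ".").lower()

def match_event(event: dict) -> list:
    """Return the observable types that a single (hypothetical) log event matches."""
    hits = []
    if refang(event.get("domain", "")) in OBSERVABLES["domains"]:
        hits.append("domain")
    if event.get("file", "").lower() in OBSERVABLES["files"]:
        hits.append("file")
    return hits

# Made-up events for illustration.
events = [
    {"host": "pos-01", "domain": "Jackkk[.]com", "file": "javaw.exe"},
    {"host": "pos-02", "domain": "example.org", "file": "notepad.exe"},
]
alerts = {event["host"]: match_event(event) for event in events}
print(alerts)
```

File names alone are weak evidence, since a name such as javaw.exe also belongs to legitimate software, so a match on several observable types for the same host is a stronger signal than any single hit.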
4.6 STIXGEN Evaluation
The evaluation of the STIXGEN sub-framework is based on accuracy and effectiveness.
First, we collected a variety of text-based threat reports and generated their
STIX reports using both the state-of-the-art IBM X-Force Exchange tool and the STIXGEN
prototype. Then, we compared these STIX reports component by component. Next, we present
a comparative analysis of the features offered by different state-of-the-art STIX
generator tools. Finally, we provide a comprehensive STIX dataset [123] on GitHub so that
researchers and analysts can use it in their research.
4.6.1 Accuracy
We randomly collected 10 different text reports from the IBM X-Force Exchange threat
repository and generated their STIX reports using both the IBM X-Force Exchange (export
option) and our proposed STIXGEN prototype. We then compared the
resulting STIX datasets based on the correctness and accuracy of the generated
components. A bar chart of the 10 APTs versus the number of indicators generated by
IBM X-Force Exchange and STIXGEN can be seen in Figure 4.16. We chose to show
the “indicator” component here, as we considered it the most relevant. There are
three bars per APT in the graph: the first bar represents the indicator components
present in the input text report, the second bar shows the indicators generated by our
proposed framework STIXGEN, and the third bar represents the indicators produced by IBM
X-Force Exchange.
The BackOff APT shown in the graph is a well-known POS APT. According to
IBM X-Force Exchange’s text report, part of which is given in Figure 4.17, it has five
different indicators, including HTTP Post, FTP, Beacons after every 45 sec, and the MD5
hash 927AE15DBF549BD60EDCDEAFB49B829E. It can be observed from the graph that the
number of indicators in the input text report and in STIXGEN’s output (the first
and second bars of the graph) are exactly the same, which indicates 100% accuracy
for STIXGEN. In contrast, the STIX exported by IBM X-Force Exchange shows 49
indicators, which contradicts the IBM input text report. Details are provided in the
ensuing paragraphs.
Figure 4.16: IBM X-Force Exchange vs STIXGEN
Upon close examination of the STIXViz snapshot of IBM X-Force Exchange’s STIX
in Figure 4.18(a), it can be observed that the 49 indicator components carry only
two distinct titles, “Contained in XFE Collection” and “Malware risk high”, which are
repeated constantly. Moreover, none of these indicators match the actual indicators
present in IBM X-Force Exchange’s text report (Figure 4.17).
Figure 4.17: IBM X-Force Exchange Textual Report
On the basis of these outcomes, it can be seen that the STIX generated by IBM
X-Force Exchange has more components than the input text report and that
many of the generated components contain dummy, irrelevant, and erroneous information.
In contrast, the STIX reports generated by STIXGEN have exactly the same number of
indicator components as the input IBM X-Force Exchange text reports, and these
components are distinct, relevant to the text report, and error-free (see Figure
4.18(b)).
(a) IBM X-Force Exchange (b) STIXGEN
Figure 4.18: IBM X-Force Exchange vs STIXGEN
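The comparison above can be expressed as a small check on indicator lists. The sketch below is illustrative only. The source list contains just the four indicator values named in the text (the report has five), and the 49-entry list is a stand-in built from the two repeated titles, with an arbitrary split between them, because the exact distribution is not given. The function name is an assumption.

```python
def summarize(source, generated):
    """Count total, distinct, and source-matching indicators in a generated STIX."""
    distinct = set(generated)
    return {
        "total": len(generated),
        "distinct": len(distinct),
        "matching": len(distinct & set(source)),
    }

# Indicator values named in the text for the BackOff report.
source = ["HTTP Post", "FTP", "Beacons after every 45 sec",
          "MD5 Hash 927AE15DBF549BD60EDCDEAFB49B829E"]

# Stand-in for the exported STIX: 49 indicators with only two distinct titles.
export_like = (["Contained in XFE Collection", "Malware risk high"] * 24
               + ["Contained in XFE Collection"])

# Stand-in for an output that reproduces the source indicators exactly.
faithful = list(source)

print(summarize(source, export_like))
print(summarize(source, faithful))
```

A faithful output has equal total, distinct, and matching counts, whereas the repeated-title output has a large total, very few distinct values, and no match with the source report.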
4.6.2 Effectiveness
STIX is a new and evolving standard, devised for the structuring and sharing of CTI. A
few positive efforts, namely Cosive STIX Data Generator (CSDG), IBM X-Force Exchange,
and the Python-STIX Library, have been made towards the structuring of cyber threat data.
However, many of these do not take CTI as input from the user and, most of the time,
produce erroneous STIX reports containing dummy, unrelated, and repeated information.
A comparison of different STIX generation tools is shown in Table 4.2, which
indicates that STIXGEN is easier to use than its competitors.
Table 4.2: Comparison of STIX Generators
Feature/Tools   | STIXGEN  | CSDG     | Python-Lib                 | IBM X-Force Exchange
GUI/Console     | GUI      | GUI      | Console                    | GUI
User Input Data | Yes      | No       | Yes (programming required) | Yes
Skills Required | Operator | Operator | Programming required       | Operator
STIXGEN provides a Graphical User Interface (GUI), which takes CTI from the user
and produces error-free and threat-relevant STIX reports. CSDG, in contrast, does not
take CTI from the user but uses dummy data for STIX generation, so it is hard to say
whether it produces correct STIX reports. IBM X-Force Exchange has a CTI repository from
which one can select data for STIX generation. Like STIXGEN, the Python STIX Library
takes CTI directly from the user. However, it is a console-based solution that relies on
other tools to feed it the data components and their connections.
Chapter 5
Cyber Threat Response Activities
5.1 Introduction
In the previous chapter, a novel sub-framework named STIXGEN was presented, which
is designed to automatically generate distinct, error-free, and threat-relevant structured
CTI data. It is learned during the research that most of the publicly available CTI data
is wrongly labeled, has incomplete artifacts, and is missing important indicators
regarding cyber threat prevention, detection, and response. Therefore, effective CTM
requires a sub-framework that boosts, refines, and evaluates the structured
CTI data. Accordingly, a formal sub-framework called SCERM is developed that
ranks, boosts, and refines structured CTI data for CTM. This chapter provides
the details of SCERM.
5.2 Research Approach and Contributions
For this research, our observation is that the identification and prioritization of CTI
data for different phases of cyber threat management cannot be meaningfully accom-
plished without having a formal model of threat intelligence components, their con-
nectivity, and dependency. Therefore, SCERM is proposed for the valuation of struc-
tured data, which formally models the STIX architecture [66] on the basis of the STIX
use case Managing cyber threat response activities [67]. The use case characterizes the
significance of different STIX components according to the cyber threat management
life-cycle. The use case asserts that all STIX components are not equally important
for every phase of cyber threat management rather certain components are more rel-
evant to a particular phase. For example, exploits and their COAs are necessary for
cyber threat prevention, indicators and observables are essential for cyber threat detec-
tion, while indicators, observables and their respective COAs are important for the cyber
threat response phase. As part of our solution we developed a prototype of SCERM,
which boosts, refines, and valuates STIX reports for cyber threat management. The
Boosting module remaps wrongly placed contents to a data model of STIX components
if required. Then, the Refinement module identifies and augments incomplete or miss-
ing artifacts. Subsequently, the Valuation component evaluates the refined CTI data and
provides valuation reports. These reports comprise a valuation score (vScore) and a list
of extracted components for every phase of cyber threat management. The valuation
and refinement processes are repeated until the STIX report improves to a threshold
suitable for use in cyber threat management. In fact, SCERM provides a starting point
for cyber threat management teams and categorizes STIX reports based on their benefit
for the prevention, detection, and the response phases of cyber threat management or
a combination thereof.
5.3 Design
This section provides a detailed description of the formal model used in the SCERM
system. The STIX Architecture based formal Model (SAM) is presented first, followed
by the formalization of the use case managing cyber threat response activities. The STIX
formal model is used to derive individual tests for different phases of cyber threat man-
agement namely cyber threat prevention, detection and response. Details are provided
in the following subsections.
5.3.1 Formal Model of STIX Architecture - SAM
The STIX architecture [66] describes cyber threat concepts as autonomous and reusable
constructs. The reason for the popularity of the STIX is that it objectively defines differ-
ent aspects of the cyber-threat that answer questions such as “what happened”, “how
the incident occurred”, “what vulnerabilities were exploited” and “who did it”. At the
same time, it establishes connections between these aspects. Based on our literature
review and study we have concluded that any valuation criterion must measure the
presence of these aspects as well as the associated connections. This valuation will
have to consider which aspects are more important to particular phases of the cyber
threat management and the confidence level of the reporting source regarding the CTI.
STIX is primarily designed to qualitatively model cyber threat data. The subjective
nature of descriptions of components’ properties and their contexts makes it difficult
to perform quantitative measurement of the different aspects of the threat. Particularly
the current STIX architecture cannot valuate the efficacy of STIX reports for different
phases of cyber threat management. Therefore, an alternative model called SAM is
developed, which considers characteristics of the STIX domain and relationship objects
in a quantitative fashion. This model is employed by SCERM to valuate STIX reports
for cyber threat management.
5.3.1.1 Modelling of Campaign Component
SAM defines the domain and relationship objects present in the STIX architecture [66]
as variables in a mathematical relation. The variables campaign, TTP, incident, TA, COA,
ExploitTarget, indicator, and observables are used to represent STIX domain objects. The
variables CC (Campaign Component), CrCr (Campaign related Component), TTPC
(TTP Component), TTPrCs (TTP related Component), EC (Exploit Component), ErCt
(Exploit related Component), IndC (Indicator Component), IndrCu (Indicator related
Component), IncC (Incident Component), IncrCv (Incident related Component), COAC
(Course of Action Component), COArCw (Course of Action related Component), ObsC
(Observable Component), ObsrCx (Observable related Component), TAC (ThreatActor
Component) and TArCy (ThreatActor related Component) are employed for the selec-
tion of the aforesaid components. vScore is a variable, which is used to store the ranking
score for a STIX report.
Multiple functions such as COA ranking (CRF(coa,p)), indicator ranking
(IRF (indicator, |observable|)), producer strength (PS(p)), COA mass (CM(coa)), indi-
cator efficacy (IE(indicator)), and indicator mass (IM(indicator)) are introduced to measure
different characteristics of the aforesaid components. Similarly, j and k are iterators,
which are employed to iterate the components during the calculation of vScore. Since
the STIX architecture [66] is relatively large, with several domain objects and complex
interrelations, we explain the modeling with the help of the campaign
component and its related components. The rest follow similarly. A campaign may be
associated with one or more other campaigns, it may use related TTPs or have related
incidents and may be attributed to a TA as shown in Figure 5.1.
Figure 5.1: Campaign and its Related Components
Consider the following. campaign_j belongs to Campaign run in the attack j
(campaign_j ∈ Campaign), where the cardinality of Campaign is |Campaign| = m_camp.
The symbol ∈ depicts the belongs to relationship, whereas the symbol ∋ depicts the own or
has a member relationship between components.
Similarly, ttp_k belongs to TTP employed in this attack (ttp_k ∈ TTP), where
|TTP| = n_ttp. Then the cardinality of the campaign-TTP relationship can be formally
expressed as in Equation 5.1.
$$\sum_{j=0}^{m_{camp}} \sum_{k=0}^{n_{ttp}} \mathit{campaign}_j \ni \mathit{ttp}_k \tag{5.1}$$
The first summation describes the range of Campaign, whereas the second sum-
mation is used to represent the number of related TTPs. Other relations are modeled
similarly. A portion of the model that illustrates the four relationships of the campaign
can be seen in Figure 5.2.
Figure 5.2: Formal Depiction of the Campaign Components
On the left-hand side, we have the Campaign component and the arrows show the
relations to the several related components on the right-hand side of this figure. Each
relation is labeled by the formalism depicting the cardinality. The Valuation process
considers one or more of these components or their relationships by using selection
variables. One of these selection variables namely CrCr can be seen in this figure. The
next subsection describes the selection process in greater detail.
5.3.1.2 Component Selection
In SAM, the inclusion or exclusion of a STIX component is controlled by a single
Boolean variable. A TRUE value indicates that the component is included and a FALSE
indicates that it is excluded. In Figure 5.2, the Campaign component can be seen because
its control variable, CC, is set to TRUE. Similarly, the relationships with other components
are controlled via a vector of Boolean variables. For instance, the CrCr is used to
control the campaign component’s relationship to the associated campaign, TTP, incident,
and threatactor. The subscript r indicates the index within the vector. CrC0 controls the
associated campaign relation. CrC1, CrC2 and CrC3 are used to control the TTP, incident,
and threatactor relations respectively. The complete Karnaugh map of all the variable
values of Campaign and its related components is shown in Table 5.1.
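The selection mechanism described above can be illustrated with a minimal Python sketch. The class name, field names, and relation labels are hypothetical and are not part of SAM. Only the ordering of the vector (index 0 for the associated campaign, 1 for TTP, 2 for incident, 3 for threat actor) is taken from the text.

```python
from dataclasses import dataclass, field

# Order of the related-component vector CrC as described in the text:
# index 0 = associated campaign, 1 = TTP, 2 = incident, 3 = threat actor.
CAMPAIGN_RELATIONS = ("campaign", "ttp", "incident", "threat_actor")


@dataclass
class CampaignSelection:
    """Hypothetical container for the selection variables CC and CrC."""

    cc: bool = False  # CC: include the Campaign component itself
    crc: list = field(default_factory=lambda: [False] * 4)  # CrC_0 .. CrC_3

    def selected_relations(self):
        """Return the names of the campaign relations whose flag is TRUE."""
        return [name for name, on in zip(CAMPAIGN_RELATIONS, self.crc) if on]


# Include the campaign plus its TTP and threat-actor relations.
sel = CampaignSelection(cc=True, crc=[False, True, False, True])
print(sel.cc, sel.selected_relations())
```

Keeping the component flag and the relation vector separate mirrors the text, where a single Boolean controls the component and a vector of Booleans controls its relationships.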
Accordingly, the details of all the variables employed in the SAM valuation model
sponse activities and (4) CTI sharing. In these, the “managing cyber threat response activi-
ties” is the most important use case, which expresses the significance of different STIX
components with the cyber threat management life-cycle. We have utilized the formal
model of the STIX architecture [66] (Equation 5.2) to derive individual tests for the val-
uation of the cyber threat management phases. Details are provided in the ensuing
subsections.
5.3.3 Cyber threat Prevention and Response Model
According to the STIX use case “managing cyber threat response activities” [67], the cyber
threat prevention team studies different preventive COAs for the identified threat and
selects suitable measures. Then, it applies these COAs e.g. software update, patch in-
stallation or firewall rules implementation for cyber threat prevention. Once the cyber-threat
has been detected, the response team takes corrective measures such as blocking the
data ex-filtration channel and restoring the systems. It is important to note that both the
prevention and response phases of the cyber threat management use the COAs. The
STIX standard defines various key properties or fields of the COA such as title, stage,
type, description, impact, cost, efficacy, and confidence. To valuate the COAs for the preven-
tion and response phases, we thoroughly studied the aforesaid properties of the COA
component and its relational bonds. Details of these are provided in the following
subsections.
5.3.3.1 Course of Action - Stage and Type
The stage property distinguishes whether the COA belongs to cyber threat prevention
or response. The default enumeration for the stage property is “COAStageVocab”. If
stage is set to Remedy then the COA is designed for cyber threat prevention and if its
value is Response then the COA is defined for cyber threat response, as can be seen in
Figure 5.3.
Figure 5.3: COA Stage
This property is applied through the type property, which states a class of the COA.
The type property is implemented through the vocabulary “CourseOfActionTypeVocab-1.0”.
This vocabulary defines multiple classes of COA such as patching, hardening,
redirection, public or logical address restriction, eradication, perimeter or host blocking.
5.3.3.2 Course of Action - Impact, Efficacy, and Confidence
The STIX standard provides several properties such as impact, efficacy, and confidence
to describe the COA. (1) The impact property describes the repercussion of implement-
ing the COA. (2) The efficacy states the effectiveness of the COA in getting its intended
goals. (3) The confidence property gives the level of trust of the analyst on the assigned
scores of the impact and efficacy. The STIX standard uses an enumeration “HighMedi-
umLowVocab”, which defines vocabulary to express the various level of these proper-
ties such as unknown, none, low, medium, and high.
To measure the strength of a COA, the following procedure is adopted. (1) At first, the
aforesaid qualitative vocabulary levels (unknown or none, low, medium, and high) are
converted into the quantitative values 0, 1, 2, and 3, respectively, for the valuation of
the COA, as can be seen in Table 5.3. (2) Then
four functions namely CM(coa), I(coa), E(coa) and C(coa, string) are introduced. The
I(coa) (Equation 5.3) and E(coa) (Equation 5.4) functions take coa as input and extract
the impact and efficacy levels, which are from 0 to 3 according to Table 5.3.
Table 5.3: Levels of Impact, Efficacy, and Confidence for Course of Action
| Enumeration Vocabulary Values | Assigned Numerical Values |
|---|---|
| High | 3 |
| Medium | 2 |
| Low | 1 |
| None or Unknown | 0 |
$$I(coa) \mapsto \{0, 1, 2, 3\} \tag{5.3}$$
where 0, 1, 2, and 3 are impact levels.
$$E(coa) \mapsto \{0, 1, 2, 3\} \tag{5.4}$$
where 0, 1, 2, and 3 are efficacy levels.
The C(coa, “impact or efficacy”) function takes the coa as well as a string argument as
input. When the caller passes “impact” as the string, the C(coa, “impact”)
function gives the confidence score for the impact of the subject COA, as can be seen in
Equation 5.5. This function returns a confidence score from 0 to 3.
$$C(coa, \text{``impact''}) \mapsto \{0, 1, 2, 3\} \tag{5.5}$$
where 0, 1, 2, and 3 are confidence levels.
On the other hand, when “efficacy” is passed then C(coa, “efficacy”) function pro-
duces a confidence score for the effectiveness of the COA, which can be seen in Equa-
tion 5.6.
$$C(coa, \text{``efficacy''}) \mapsto \{0, 1, 2, 3\} \tag{5.6}$$
where 0, 1, 2, and 3 are confidence levels.
(3) The CM(coa) is the main function, which calls the aforesaid I(coa), E(coa), and
C(coa, string) functions and adds the scores they produce, namely impact, efficacy, and
confidence, as shown in Equation 5.7.
$$CM(coa) = I(coa) + C(coa, \text{``impact''}) + E(coa) + C(coa, \text{``efficacy''}) \tag{5.7}$$
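As an illustration of Equation 5.7, the following Python sketch computes the COA mass from the levels of Table 5.3. The dictionary keys (impact, impact_confidence, efficacy, efficacy_confidence) and the example COA values are assumptions made for this sketch. They are not field names from the STIX schema or values from the thesis dataset.

```python
# Table 5.3: HighMediumLowVocab levels mapped to numerical values.
LEVELS = {"High": 3, "Medium": 2, "Low": 1, "None": 0, "Unknown": 0}


def level(value):
    """Map a vocabulary level to its score; a missing value counts as 0 (assumption)."""
    return LEVELS.get(value or "Unknown", 0)


def coa_mass(coa):
    """CM(coa) = I(coa) + C(coa, "impact") + E(coa) + C(coa, "efficacy")."""
    return (
        level(coa.get("impact"))
        + level(coa.get("impact_confidence"))
        + level(coa.get("efficacy"))
        + level(coa.get("efficacy_confidence"))
    )


# Hypothetical COA with illustrative levels.
coa = {
    "title": "Patch installation",
    "impact": "Low",
    "impact_confidence": "High",
    "efficacy": "High",
    "efficacy_confidence": "Medium",
}
print(coa_mass(coa))  # 1 + 3 + 3 + 2 = 9
```

Since each of the four terms lies between 0 and 3, the mass score of a single COA ranges from 0 to 12.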
5.3.3.3 Course of Action and its Associations
According to the STIX architecture [66], there are three producers of the COA namely
the victim, indicator and the exploit target components, which can be seen in Figure 5.4.
Figure 5.4: COA and its Relations
Producers are components that convey COA details for the threat under considera-
tion. Upon close examination of the figure, different types of relational bonds between
the COA and its producer components can be observed. These bonds are labeled as
COA taken, COA requested, suggested COA, potential COA, and related COA. The strength
of these bonds can be judged on the basis of the experience and the knowledge of the
analyst who authored the producer component.
The most reliable and trustworthy producer is the victim himself, because he faced,
analyzed, and responded to the cyber attack. Therefore, the bond “COA taken” is con-
sidered as the highest level and is given a value 5, as can be seen in Table 5.4.
Table 5.4: COAs Producers and their Strength
| Producer | Bonding | Producer Strength |
|---|---|---|
| Incident | COA Taken | 5 or Highest |
| Incident | COA Requested | 4 or Medium-high |
| Indicator | Suggested COA | 3 or Medium |
| Exploit | Potential COA | 2 or Medium-low |
| COA | Related COA | 1 or Low |
| COA | Nil | 0 or Nil |
Whereas, the requested COA is the second highest or of medium-high bond level
because it is identified by the victim after the analysis and observation of the actual cyber
attack but somehow he could not apply it. Hence, it is considered the second-highest
remedial action for the cyber attack. Therefore, it is assigned a value 4. The “suggested
COA” is considered as a medium bond, because it is suggested by an expert after study
and analysis of the cyber attack. Hence, it is assigned a value 3. The COA produced on
the basis of common sense knowledge namely “potential COA” is more of an estimate
and is of a medium-low level or value 2. The related COA is considered as a weak bond
because some of the producers generically associate certain defense mechanisms with
each other without considering the cyber attack scenario. For example, firewalls and
IDSes are commonly associated with network defence even though in actuality each of
these has its own specific utilization when considering the exact network attack in
question. Hence, if a related COA has been mentioned in the STIX then the proposed
model assigns it a low-level value of 1. Similarly, the value of “Nil or 0” is assigned to a
COA that does not have any association with a producer.
Afterwards, a function namely PS(p) is introduced to measure the strength of a
COA’s producer, which can be seen in Equation 5.8. It takes the producer as input and
returns the producer strength score according to Table 5.4.
$$PS(p) \mapsto \{0, 1, 2, 3, 4, 5\} \tag{5.8}$$
where p is the producer of the coa, and 0, 1, 2, 3, 4, and 5 are producer strength scores.
5.3.3.4 Ranking of a Course of Action
To rank the COA component, a CRF function is introduced, which can be seen in Equa-
tion 5.9. It accepts coa and its producer (p) as input arguments and passes these to
the CM(coa) (sec. 5.3.3.2) and PS(p) (sec. 5.3.3.3) functions, respectively. The CM(coa)
function produces the mass score of a COA, while PS(p) function returns the producer
strength score. Finally, these scores are added (CM(coa) + PS(p)) to produce the rank-
ing score of the COA.
$$CRF(coa, p) = CM(coa) + PS(p) \tag{5.9}$$
where coa ∈ COA and p is the producer of the COA.
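The ranking in Equation 5.9 can be sketched in Python as follows. The producer strengths are taken from Table 5.4 and keyed by bond label. The function names are hypothetical, and the COA mass is passed in as a precomputed number to keep the sketch self-contained.

```python
# Table 5.4: bond between a COA and its producer, mapped to producer strength.
PRODUCER_STRENGTH = {
    "COA Taken": 5,
    "COA Requested": 4,
    "Suggested COA": 3,
    "Potential COA": 2,
    "Related COA": 1,
    "Nil": 0,
}


def producer_strength(bond):
    """PS(p): look up the strength of the producer bond."""
    return PRODUCER_STRENGTH[bond]


def coa_rank(mass, bond):
    """CRF(coa, p) = CM(coa) + PS(p), with CM(coa) supplied as `mass`."""
    return mass + producer_strength(bond)


# A COA with mass 9 that the victim actually applied outranks the same
# COA when it is only generically related to another COA.
print(coa_rank(9, "COA Taken"))    # 9 + 5 = 14
print(coa_rank(9, "Related COA"))  # 9 + 1 = 10
```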
The STIX use case “managing cyber threat response activities” [67], the properties of the
COA component, and its relational bonds form the basis for the valuation of
STIX reports for the prevention and response phases of cyber threat management.
The valuation metric is formalized for all the relations enumerated in the SAM model
that have COA components, and can be seen in Equation 5.10 for the cyber threat
prevention and response phases.
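$$
\begin{aligned}
vScore ={}& COAC \cdot \sum_{j=0}^{m_{coa}} CRF(COA_j, Nil) \\
&+ COArC_w \cdot \sum_{j=0}^{m_{coa}} \sum_{k=0}^{n_{rCOA}} CRF(COA_j, RelatedCOA_k) \\
&+ IndrC_u \cdot \sum_{j=0}^{m_{Ind}} \sum_{k=0}^{n_{coa}} CRF(Indicator_j, COA_k) \\
&+ IncrC_v \cdot \sum_{j=0}^{m_{Inc}} \sum_{k=0}^{n_{coareq}} CRF(Incident_j, COARequested_k) \\
&+ IncrC_v \cdot \sum_{j=0}^{m_{Inc}} \sum_{k=0}^{n_{coaTaken}} CRF(Incident_j, COATaken_k) \\
&+ TTPrC_t \cdot \sum_{j=0}^{m_{Exp}} \sum_{k=0}^{n_{coa}} CRF(ExploitTarget_j, COA_k)
\end{aligned}
\tag{5.10}
$$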
To automate the relations selection procedure for the cyber threat prevention and
response phases, it is required to set the component selection variables in the SAM
equation (Equation 5.2) according to Table 5.5.
Table 5.5: Variables for Prevention and Response phases
| Variables | Values |
|---|---|
| CC, CrCr | 0, r = 0 |
| TTPC, TTPrCs | 0, s = 0 |
| EC, ErCt | 0, t = 2 |
| IndC, IndrCu | 0, u = 5 |
| IncC, IncrCv | 0, v = 3, 4 |
| COAC, COArCw | 1, w = 1 |
| ObsC, ObsrCx | 0, x = 0 |
| TAC, TArCy | 0, y = 0 |
The first column in the table shows the component selection variables, while the
second column gives the values of these variables for the automatic inclusion or
exclusion of the STIX components, which reduces the SAM equation for the prevention
and response phases. The detailed procedure for the inclusion or exclusion of a STIX
component is already provided in Section 5.3.1.2.
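The structure of Equation 5.10 can be sketched in Python as a gated sum: each relation contributes the ranking scores of its COAs only when its selection variable is set. The relation names, the input layout (a list of COA mass and bond pairs per relation), and the example numbers below are assumptions made for illustration. The producer strengths follow Table 5.4.

```python
# Producer strengths from Table 5.4, keyed by bond label.
PS = {"COA Taken": 5, "COA Requested": 4, "Suggested COA": 3,
      "Potential COA": 2, "Related COA": 1, "Nil": 0}


def v_score(relations, selected):
    """Sum CRF = CM + PS over every relation whose selection variable is TRUE.

    relations: hypothetical mapping of relation name -> list of (CM, bond) pairs.
    selected:  mapping of relation name -> Boolean selection variable.
    """
    total = 0
    for name, pairs in relations.items():
        if not selected.get(name, False):
            continue  # relation excluded by its selection variable
        total += sum(mass + PS[bond] for mass, bond in pairs)
    return total


# Illustrative report: three COA-bearing relations and one unrelated relation.
relations = {
    "coa": [(9, "Nil")],
    "coa_related": [(4, "Related COA")],
    "indicator_coa": [(6, "Suggested COA")],
    "campaign_ttp": [(7, "Nil")],
}
selected = {"coa": True, "coa_related": True, "indicator_coa": True,
            "campaign_ttp": False}
print(v_score(relations, selected))  # 9 + 5 + 9 = 23
```

Switching a selection variable off removes the whole term from the sum, which is how the general SAM equation is reduced to the phase-specific metric.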
5.3.4 Cyber threat Detection
According to the STIX use case - “managing cyber threat response activities” [67], in or-
der to detect the cyber attack, after having defined threat indicators, the cyber threat
detection team collects and monitors the indicators and their observables in their cyber
environment.
The use-case suggests that for cyber threat detection the indicators and their observ-
ables such as IPs, port numbers, protocols, hashes, files or folders names, APIs and registry
entries used by the attacker are key components. These are forensic artifacts of the cy-
ber attack and are important for identifying the occurrence of the attack on the host
or within the network. The response team studies these and takes remedial actions to
block or respond to the cyber attack. In fact, cyber threat detection and response is
not possible without these components. The STIX standard defines several properties
or fields of the indicator components such as title, type, description, valid time position,
observables, indicated TTP, likely impact, confidence, and sighting. To valuate the indicator
component for the cyber threat detection phases, we thoroughly studied the afore-
said properties and the indicator’s classification model - the POP [26]. In the ensuing
subsections, we will describe how we formalized the key properties of the indicator
component to measure its strength and how POP’s levels are formalized into efficacy
score.
5.3.4.1 Indicator - likely impact and confidence
The STIX standard provides likely impact and confidence properties to describe the in-
dicator component. (1) The likely impact property describes the probable impact of
the indicator if it occurred. (2) The confidence property provides the level of trust of
the correctness of the provided indicator. The STIX standard defines the “HighMedi-
umLowVocab” enumeration. It states various levels of the likely impact and confidence
properties such as unknown, none, low, medium and high.
To measure the strength of the indicator the following steps are applied. (1) The
aforesaid vocabulary levels for likely impact and confidence are quantified in the range 0
to 3, with 0 being the lowest and 3 being the highest, similar to how the
levels of the COA component properties were mapped in Section 5.3.3.2. (2)
Next, the IM(indicator) function is employed, which takes indicator component information
as an input and forwards this information to the LI(indicator) and C(indicator) functions.
The LI(indicator) function produces the likely impact, and the C(indicator) function
gives the confidence level for the impact of the subject indicator. Subsequently, these
scores are added to produce the indicator mass score, which can be seen in Equation
5.11.
$$IM(indicator) = LI(indicator) + C(indicator) \tag{5.11}$$
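A minimal Python sketch of the indicator mass follows, assuming the same 0 to 3 mapping used for the COA properties. The dictionary keys likely_impact and confidence and the example levels are assumptions for illustration. Only the indicator title is taken from the case study report.

```python
# HighMediumLowVocab levels quantified in the range 0 to 3.
LEVELS = {"High": 3, "Medium": 2, "Low": 1, "None": 0, "Unknown": 0}


def indicator_mass(indicator):
    """IM(indicator): likely impact score plus confidence score."""
    likely_impact = LEVELS.get(indicator.get("likely_impact", "Unknown"), 0)
    confidence = LEVELS.get(indicator.get("confidence", "Unknown"), 0)
    return likely_impact + confidence


# Illustrative levels; not taken from the IBM X-Force report.
example = {"title": "Malware risk high", "likely_impact": "High",
           "confidence": "Medium"}
print(indicator_mass(example))  # 3 + 2 = 5
```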
5.3.4.2 Formalization of POP indicator levels as efficacy scores
The POP model emphasises that all indicators are not equally important for cyber at-
tack detection. It classifies the indicators on the basis of their efficacy and places them
at different levels of the pyramid. Moreover, this model suggests that the higher an
indicator is in the pyramid, the more useful it is for cyber threat management because
it causes more damage to the adversary and it is difficult to change, as the adversary
invests more resources and time on indicators that are higher in the pyramid. For ex-
ample, responding to low-level indicators namely hashes, IPs and DNs will cause minor
damage, while preventing high-level indicators such as host and network artifacts, tools
and TTPs will cause more pain to the adversary. According to the various levels of the
POP, efficacy scores are assigned to the indicators, as shown in Table 5.6.
A lower score value is assigned to the low-level indicator than that assigned to the
higher level indicator. For example, a score of 5 is assigned to exploit watchlist which
is at a higher level than hash watchlist that is assigned a 1. All indicators are assigned
scores in this fashion. Next, IE(indicator) function is introduced, which takes indicator
as an input argument and returns indicator efficacy score according to the Table 5.6,
as can be seen in Equation 5.12.
$$IE(indicator) \mapsto \{1, 2, 3, 4, 5\} \tag{5.12}$$
where 1, 2, 3, 4, and 5 are indicator efficacy scores.
into a list called CL[]. Next, this list is traversed and distinct components are saved
into a list named as distinct component list (DCL), as can be seen in line 10 to 18.
(5) Subsequently, in line 20 to 34, the Booster function respectively retrieves the com-
ponent information from the DCL[], saves into a variable called misplaced component.
Booster assumes that the selected component is a misplaced component and compares
it with already stored components in the component dictionary (CD[][]). The dictio-
nary is comprised of SDOs and their related components. If components are found
equal, then the Booster function further verifies whether types of both the components
are similar or not. If found dissimilar, this indicates that the selected component was
wrongly placed in the STIX report. Afterwards, the Booster function places the mis-
placed component under a native component list (NCL[]), which can be seen in line
28. (6) This process keeps going until all components are processed. (7) In the end,
remapped components information from the NCL[] is saved into the DB for the valua-
tion and refinement processes.
Algorithm 5.2: Preprocessing.
 1: Input := STIX Report
 2: Output := A Boosted STIX Report
 3: ▷ Variables:
 4: CD[][] := A dictionary to boost the components.
 5: SDO = STIX Domain Objects.
 6: dcl_index := 0
 7: ▷ Connecting with Database
 8: Connect(DB)
 9: ▷ STIX report parsing and saving extracted components into a Graph Database
10: DB := Parser(STIX Report)
11: ▷ STIX Booster: Gets distinct components from the Graph Database
12: CL[] := reading(DB)
13: for each (component in CL[]) do
14:     for each (component in DCL[]) do
15:         if (DCL[iterator_i] = CL[iterator_j]) then
16:             continue
17:         end if
18:         DCL[dcl_index++] = CL[iterator_j]
19:     end for
20: end for
21: ▷ STIX Remapper: Remaps wrongly placed components under their native SDOs
22: for each (component in DCL[]) do
23:     ▷ Assuming component is misplaced
24:     misplaced_component := DCL[iterator]
25:     ▷ Validating above assumption
26:     for each (SDO (s) in CD[][]) do
27:         for each (related component (rc) of selected SDO) do
28:             if (misplaced_component = CD[s,rc]) then
29:                 if (misplaced_component.Type != CD[s,rc].Type) then
30:                     NCL[] := selected_component()
31:                     Delete(misplaced_component)
32:                 end if
33:             end if
34:         end for
35:     end for
36: end for
37: save(DB, NCL[])
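The two core steps of the Booster (collecting distinct components and remapping misplaced ones) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SCERM implementation: components are plain dictionaries with title and type keys, the component dictionary CD[][] is flattened to a mapping from title to native STIX type, and the database and parser steps are omitted. The distinct-component step uses a set of already seen keys.

```python
def distinct(components):
    """Keep the first occurrence of each (title, type) pair, preserving order."""
    seen, result = set(), []
    for comp in components:
        key = (comp["title"], comp["type"])
        if key not in seen:
            seen.add(key)
            result.append(comp)
    return result


def remap(components, native_types):
    """Return components whose recorded type differs from their native type.

    native_types: hypothetical flattened dictionary, title -> native STIX type.
    """
    remapped = []
    for comp in distinct(components):
        native = native_types.get(comp["title"])
        if native is not None and native != comp["type"]:
            # Component was found in the dictionary under a different type:
            # place it under its native type.
            remapped.append({**comp, "type": native})
    return remapped


# Illustrative input: a TTP stored under the description tag (twice) and a
# correctly placed observable.
components = [
    {"title": "Spearphishing emails", "type": "Description"},
    {"title": "Spearphishing emails", "type": "Description"},
    {"title": "XFE Observable for <hash>", "type": "Observable"},
]
native_types = {
    "Spearphishing emails": "TTP",
    "XFE Observable for <hash>": "Observable",
}
print(remap(components, native_types))
```

Only the misplaced component is returned, now labeled with its native type. Correctly placed components are left untouched.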
5.4.2 Valuation
The Valuation module takes the database as an input and retrieves the STIX compo-
nents. It formally evaluates the STIX model to generate valuation scores for cyber
threat prevention, detection, and response. These scores are communicated to the an-
alyst to aid him in prioritizing the intelligence. Subsequently, the Valuation module
triggers the Refinement module for possible STIX refinement and to increase the valua-
tion score.
5.4.3 Refinement
The Refinement module consists of three submodules namely (1) the Component Analyzer,
(2) the Crawler, and (3) the Adder. The refinement algorithm is shown in Algorithm
5.3. It performs the following operations. (1) The Component Analyzer retrieves, analyzes,
and processes the component information. (2) At first, it extracts components
from the graph database that are necessary for different phases of cyber threat management.
Then it saves these identified components into a list called “component list”
(CL[]). Based on expert feedback from the security community, we have determined
that these components are TTPs, exploits, indicators, observables, and COAs.
Algorithm 5.3: Refinement.
 1: Input := Boosted STIX Graph
 2: Output := Refined STIX Graph and Report
 3: CL[] := read(DB)
 4: ▷ Component Analyzer: Identifies incomplete components
 5: for each component (C) in CL do
 6:     if C_i of SDO(i).Table ∉ corresponding SDO(j).Table then
 7:         ICL(iterator)(0) := C_i.Title
 8:         ICL(iterator)(1) := SDO(j).Name
 9:     end if
10: end for
11: ▷ Crawler: Crawls prepared dataset (PD[][]) and extracts required components
12: for each component (C) in ICL do
13:     for each component (C) in PD do
14:         if ICL(iterator_i)(0).PD(iterator_j)(0) then
15:             if ICL(iterator_i)(1) = PD(iterator_j)(1) then
16:                 ▷ adding and saving ICL and PD components in a single list
17:                 CCL(iterator_k)(0) := ICL(iterator_j)(0)
18:                 CCL(iterator_k)(1) := ICL(iterator_j)(1)
19:                 CCL(iterator_k)(2) := PD(iterator_j)(2)
20:                 iterator_k++
21:             end if
22:         end if
23:         iterator_j++
24:     end for
25:     iterator_i++
26: end for
27: ▷ Saves the retrieved components into the STIX Graph DB
28: save(DB, CCL[])
(3) Then, the Component Analyzer identifies incomplete components and saves their title
and required component (SDO) name into an incomplete component list (ICL[]). This list
has incomplete artifact information such as a TTP without an exploit, or an exploit or an
indicator without a COA. (4) Afterwards, the Crawler function proceeds to process the
required list of components (ICL[]), as can be seen in line 11 to 26. It crawls a prepared
dataset called PD[][] of curated blog reports, retrieves the missing artifacts and components,
and saves them into a list named the comprehensive component list (CCL[]),
which can be seen in line 17 to 19. (5) Finally, the retrieved information is fused into the
STIX graph database, which can be seen in line 28. Accordingly, refined component
information is made available to the analyst as well as to the Valuation module. The
Valuation module then evaluates the refined STIXs and the cyclic feedback process re-
peats until the STIX converges to an optimum or desired valuation score determined
by the analyst or stops improving.
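The Component Analyzer and Crawler steps described above can be sketched in Python as follows. The data layout is an assumption made for this sketch: components are dictionaries with title, type, and related keys, and the prepared dataset PD[][] is modeled as a mapping keyed by (title, required type). The required-component rules (a TTP needs an exploit target, an exploit target or indicator needs a COA) follow the examples given in the text, and the dataset entry is a placeholder.

```python
# Which related component each type needs, following the examples in the text.
REQUIRED = {"TTP": "ExploitTarget", "ExploitTarget": "COA", "Indicator": "COA"}


def find_incomplete(components):
    """Component Analyzer: list (title, missing type) pairs, like ICL[]."""
    incomplete = []
    for comp in components:
        needed = REQUIRED.get(comp["type"])
        if needed and needed not in comp.get("related", {}):
            incomplete.append((comp["title"], needed))
    return incomplete


def crawl(incomplete, dataset):
    """Crawler: fetch each missing artifact from the prepared dataset, like CCL[]."""
    return [
        (title, needed, dataset[(title, needed)])
        for title, needed in incomplete
        if (title, needed) in dataset
    ]


# Illustrative input: one TTP without an exploit target, one complete indicator.
components = [
    {"title": "Spearphishing emails", "type": "TTP", "related": {}},
    {"title": "Malware risk high", "type": "Indicator",
     "related": {"COA": "Block hashes"}},
]
# Hypothetical prepared dataset entry.
dataset = {("Spearphishing emails", "ExploitTarget"): "placeholder exploit entry"}

missing = find_incomplete(components)
print(missing)
print(crawl(missing, dataset))
```

In the full cycle, the crawled entries would be merged into the graph database and the refined report valuated again until the score reaches the desired threshold or stops improving.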
5.5 Case Study
We consider a recent APT to demonstrate the working of the proposed system SCERM
in performing valuation and refinement. On the basis of the outcomes, we will guide
the user which STIX report is better for the prevention, detection, and response phases
of the cyber threat management. In subsequent subsections, first of all, we will briefly
introduce the selected STIX report and summarize its components details. Then, we
will explain how signal boosting is performed by the remapping of the CTI data. Next,
boosted STIX report will be valuated for different phases of the cyber threat manage-
ment. After that, we will precisely describe how CTI data from security blogs are used
for the refinement of the STIX report. Subsequently, we will perform the valuation of
refined STIX report. In the end, a comparison of the refined and raw STIX reports will
be provided on the basis of their valuation scores.
5.5.1 APT Selection
We selected a high impact APT meant for cyber-espionage attributed to the threat
group “TG-3390” [124], which also goes under the aliases Goblin Panda, APT27, Emis-
sary Panda, Hellsing, Cycledek as well as Bronze Union. Since 2013, the APT has been
launched against various sectors such as aerospace, pharmacy, intelligence, energy, nuclear
as well as the defense to steal high-value information. In order to precisely demonstrate
the valuation and refinement functionality of SCERM, a holistic STIX report comprising
CTI components beneficial for all three phases of cyber threat management was
desired. For this purpose a number of cybersecurity blogs were scanned and a rea-
sonably good sample from the IBM X-Force threat exchange was retrieved that reports
incidents attributed to “TG-3390”.
5.5.2 A Brief Description of the Report
Threat Incident reports that are provided by IBM are made available in both textual
form as well as in STIX XML and JSON formats. A small portion of the TG-3390’s
XML based STIX report retrieved from the IBM X-Force threat exchange can be seen in
Figure 5.6.
Figure 5.6: IBM X-Force STIX XML
For ease of the reader in correlating the XML tags with STIX components, the com-
ponent labels have been highlighted as well as annotated in the figure. Figure 5.7
shows the same portion of the STIX report in a visual format displayed using the
STIXViz tool. The reader will notice the STIX components TTPs, cybox, and indicators
defined as XML tags, as well as the icons representing the same components in the
figure.
During visual analysis of the STIX report multiple STIX components such as TTPs,
indicators, and observables are identified. Details of these components are as follows. (1)
There are 120 TTPs in the IBM STIX file, which can be divided into five groups on the
basis of their titles such as heuristic, trojan, virus, worm, and spyware. 12 out of 120 TTPs
can be identified in the figure (Figure 5.7).
Figure 5.7: STIX-1: IBM X-Force STIX
(2) 98 indicators are observed in the STIX report that can be equally divided into
two types: (a) indicators with the title “Contained in XFE Collection”, and (b) indicators with the
title “Malware risk high”. 5 out of 98 indicators are shown in the figure (Figure 5.7). (3)
Similarly, there are 49 observables in the STIX report, which have a title “XFE Observable
for” concatenated with the different hash values. 4 out of 49 observables can be seen in
the figure (Figure 5.7). This STIX report, like numerous other structured threat data
provided in threat feeds, depicts a high level of noise. For instance, a small portion of
the input IBM TG-3390 text report can be seen in Figure 5.8. There are two important
concepts namely Remote Access Trojans and Spearphishing emails as TTPs shown in the
report, however, both of these TTPs are not present in the STIX file (Figure 5.7) that is
produced.
It is important to note that the IBM text report also highlights some COAs, such as
Keep applications, OS, antivirus and associated files up-to-date and block all URL, hash, and
IP based IoCs at the firewall, IDS, routers, but these are also missing from the STIX report.
These are just a couple of examples. In total, there are 7 concepts with a discrepancy:
they are either not reflected in both the text report and the structured STIX output,
or they do not have the proper labels. In the next subsection, we show how SCERM
resolves these discrepancies and boosts the intelligence signal of the structured report.
Figure 5.8: IBM Text Report
5.5.3 Signal Boosting
During our research, it was observed that STIX reports are often not appropriately
formatted, use incorrect vocabulary, and are either missing key components or have
erroneously labeled elements, which reduces their usefulness for effective cyber threat
management. For example, the TG-3390 STIX report selected for the case study holds
important CTI data under the description tag. By zooming into the description tag in
Figure 5.6, the CTI data related to important STIX components such as TTPs,
indicators, observables, and COAs can be identified, as shown in Figure 5.9.
This CTI data is exactly the same as that in the IBM text report (Figure 5.8). The Signal
Booster retrieves CTI data from the description tag of the STIX report and remaps it under
the appropriate STIX component’s tag. For example, the CTI items Remote Access Trojans and
Spearphishing emails are placed under the TTP tag, Keep applications, OS, antivirus
and associated files up-to-date is placed under the exploit component, and Block hashes
at the firewall, IDS, routers is placed under the observable component. The updated
information is then stored in the shared graph database. The updated STIX report,
generated from the boosted component information, has meaningful, threat-relevant, and
distinct CTI data, as shown in Figure 5.10.
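The remapping step can be illustrated with a minimal Python sketch. It assumes that boosting amounts to looking up known phrases from the description text in a component dictionary. The dictionary entries, the function name boost, and the sample description string are hypothetical and are chosen only to mirror the TG-3390 example; they are not SCERM's actual dictionary or code.

```python
# Hypothetical component dictionary: lower-cased phrase -> STIX component tag.
# The entries mirror the TG-3390 example; they are not SCERM's real dictionary.
COMPONENT_DICTIONARY = {
    "remote access trojan": "TTP",
    "spearphishing email": "TTP",
    "keep applications, os, antivirus": "ExploitTarget",
    "block hashes": "Observable",
}


def boost(description: str) -> dict[str, list[str]]:
    """Remap phrases found in a free-text description to STIX components."""
    text = description.lower()
    remapped: dict[str, list[str]] = {}
    for phrase, component in COMPONENT_DICTIONARY.items():
        if phrase in text:
            remapped.setdefault(component, []).append(phrase)
    return remapped


# Invented sample text standing in for the content of a description tag.
description = (
    "The report mentions Remote Access Trojans and spearphishing emails. "
    "Block hashes at the firewall, IDS, routers."
)
boosted = boost(description)
```

Here boosted groups the two TTP phrases under "TTP" and the hash-blocking phrase under "Observable". A real implementation would also need to write the result back into the STIX document and the graph database, which this sketch leaves out.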
Upon close examination of the updated STIX report, the following component
information can be observed. (1) There are 2 TTPs, namely Remote Access Trojan and Spear
phishing. (2) The indicator labeled “Hash watchlist” can be identified. It has several
“Hashes” as observables, which can be used for cyber threat detection. (3) There are
multiple COAs, such as Keep application, software and antivirus update and Block hashes at
firewalls and gateways, which can be used for cyber threat prevention and response.
Figure 5.9: IBM STIX Description Portion
Figure 5.10: STIX-2: Boosted STIX Report
5.5.4 Valuation of the TG-3390 Boosted STIX Report
The valuation module retrieves the boosted STIX report components’ information from
the graph database for valuation and prioritization. It then evaluates the STIX
model and automatically generates valuation reports, scores, and graphs for the different
phases of cyber threat management. These reports provide key STIX components
such as TTPs, exploits, indicators, observables, and their corresponding COAs to users in
filtered form for every cyber threat management phase. The valuation details of the
IBM STIX report for the different phases of cyber threat management are provided in the
ensuing subsection.
5.5.4.1 Valuation for Cyber Threat Prevention
As discussed earlier, the STIX components TTPs, exploit targets, and COAs are
important for cyber threat prevention. The valuation module retrieves these boosted
STIX components from the graph database. It then generates a valuation report
as well as a valuation score for the prevention phase of cyber threat management.
For TG-3390, the valuation report informs the analyst that this APT employs a
Spearphishing TTP. The TTP uses an email attachment as an exploit, as can be seen in
Figure 5.11. The report further indicates that the analyst can safeguard the organization
from this exploit by employing the COAs use up-to-date antivirus and use up-to-date
OS and applications.
Figure 5.11: Valuation Report for Cyber Threat Prevention
With reference to TG-3390, the valuation score (vScore) for the cyber threat prevention
phase is shown in Table 5.8; the calculation details of the vScore are as follows:
• The aforementioned COAs are potential remedies for the spearphishing email
exploits; hence each of them gets a producer strength score PS(p) of 2 (Table 5.4).
• The impact score I(coa) of the first COA, “use up-to-date antivirus”, with a high level
of confidence is 4, and its efficacy score E(coa) with a medium level of
confidence is 3. Substituting the impact and efficacy scores into Equation
5.7 yields a COA Mass score CM(coa) of 7. According to Equation 5.9, the COA’s
ranking score is computed as CM(coa) + PS(p), which is 9.
• The impact score I(coa) of the second COA, “use up-to-date OS and applications”,
with a medium level of confidence is 5, while its efficacy score E(coa) with a high
level of confidence is 6. Substituting these scores into Equation 5.7 results
in a COA Mass score of 11. According to Equation 5.9, the COA’s ranking
score is calculated as CM(coa) + PS(p), which is 13 in this case. The procedure of
this calculation can be seen in Table 5.8.
• The overall valuation score (vScore) of the IBM STIX report for the prevention
phase of cyber threat management is the sum of the individual ranking scores of
all COAs. In this case, for the two COAs, this computes to 22 (Equation 5.10).
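The arithmetic above can be reproduced with a short Python sketch. Equation 5.7 is not restated in this section, so the additive form CM(coa) = I(coa) + E(coa) used below is an assumption inferred from the worked numbers (4 + 3 = 7 and 5 + 6 = 11). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CourseOfAction:
    name: str
    impact: int    # I(coa)
    efficacy: int  # E(coa)


def coa_mass(coa: CourseOfAction) -> int:
    """COA Mass score; additive form assumed for Equation 5.7."""
    return coa.impact + coa.efficacy


def ranking_score(coa: CourseOfAction, producer_strength: int) -> int:
    """Ranking score CM(coa) + PS(p), as described for Equation 5.9."""
    return coa_mass(coa) + producer_strength


def v_score(coas: list[CourseOfAction], producer_strength: int) -> int:
    """Overall valuation score: sum of the ranking scores (Equation 5.10)."""
    return sum(ranking_score(coa, producer_strength) for coa in coas)


PRODUCER_STRENGTH = 2  # PS(p) taken from the worked example
coas = [
    CourseOfAction("use up-to-date antivirus", impact=4, efficacy=3),
    CourseOfAction("use up-to-date OS and applications", impact=5, efficacy=6),
]
ranking_scores = [ranking_score(coa, PRODUCER_STRENGTH) for coa in coas]
total = v_score(coas, PRODUCER_STRENGTH)
```

With the values from the text, ranking_scores evaluates to [9, 13] and total to 22, matching the figures reported for the prevention phase.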
Low-Level High-Level Low-Level High-Level
Virus Total  URL and Log Files                         Yes  No   No   No   No   No
Bro          Log Files                                 Yes  No   Yes  No   No   No
Splunk       Log Files                                 Yes  No   Yes  Yes  No   No
MLSec        CTI Feeds                                 Yes  No   No   Yes  No   No
FeedRank     CTI Feeds                                 Yes  No   No   Yes  No   No
TISA         Structured / Unstructured Textual Data    Yes  No   Yes  Yes  No   No
SML          Unstructured Textual Data                 Yes  Yes  No   No   No   No
SCERM        Structured Textual Data (STIX Reports)    Yes  Yes  Yes  Yes  Yes  Yes
5.6.3.1.2 Statistical Comparison To conduct a fair comparison of the level of
efficacy achieved by SCERM’s boosting and refinement against competing machine
learning tools, we selected the MLSec Project. MLSec provides Uniqueness and
Enrichment tests, which may be directly compared to the Boosting and Refinement functions
of SCERM. In the experiment, the MLSec software is downloaded from the provider’s
website and installed. Then, CTI data from Attack.Mitre is extracted and labeled
to test the efficacy of both systems. MLSec accepts data in .csv format, while SCERM
receives .xml or .json files as input. Therefore, the extracted data is encoded in both .csv
and .xml files without any loss of information. Afterwards, MLSec’s
Uniqueness and Enrichment tests are performed on the .csv file. Similarly, the .xml file is
processed through SCERM. The results are shown in Figure 5.21, where test names are
shown on the x-axis and the corresponding scores on the y-axis.
It can be observed from Figure 5.21 that the results produced by SCERM
are more accurate than those of MLSec. Using manual analysis, it was identified that there
were 50 unique malicious IPs in the extracted CTI data. SCERM extracted all of these IPs,
while MLSec extracted only 40. Upon investigation of this behaviour, it was observed that
MLSec was unable to identify the CTI data for boosting that was ambiguously labeled in
the input file by the provider. Similarly, during the manual investigation of the refinement
results, it was observed that there were 20 Domain Names in the input data, and SCERM
used them to identify and extract 20 additional IPs during refinement.
Figure 5.21: Statistical Comparison
To further confirm the effectiveness of the SCERM system, it was shared with domain
experts. They performed multiple tests and confirmed the efficacy of the generated
results. Moreover, they affirmed that the valuation, prioritization, and extraction of STIX
components such as TTPs, indicators, exploits, observables, and their COAs are not
feasible to perform manually. The experiment’s details and outcomes are provided in the
next section.
5.6.3.2 User Study
A study is carried out to verify the effectiveness of the SCERM system from the user’s
viewpoint in terms of cyber threat management. A prototype of the proposed framework
is provided to the participants, along with all the prerequisite configuration details and
sample STIX reports. They are asked to use the SCERM system and share their results. A
summary of the participants’ demographics is provided in Table 5.13. All of them belong to
the information security domain and have between 1 and 5 years of experience.
Table 5.13: Participants Details
User details                 Count
Total Participants           20
Education                    Graduate, Postgraduate
Expertise:
  Cyber Threat Analysis      12 (60%)
  Software Development       8 (40%)
Experience                   1 to 5 years
Age                          22 to 35 years
Knowledge of:
Table 5.14 provides the users’ feedback regarding the SCERM system. It reveals that
100% of the participants feel that the current state of structured threat data is poor
and that there is a need for a tool which performs data boosting, refinement, and evaluation.
80% of the participants acknowledged that SCERM is easy to use. 90% of the users
agree that SCERM’s results are accurate, efficient, and easy to understand as compared
to the manual method. A few of the suggestions concern automating the generation of the
boosting dictionary “CD[][]” (see Algorithm 5.2) and the curated list “PD[][]” (see
Algorithm 5.3). It is worth mentioning that 100% of the users confirm
that the automatic analysis of STIX reports and the extraction of key components by SCERM
for the cyber threat prevention, detection, and response phases allow them to perform cyber
threat management efficiently.
Table 5.14: SCERM Evaluation Survey
Survey Questions                                                      Results
Is there a current need of automatic boosting, enhancement
and quality testing of the structured threat intelligence.           100%
How you compare SCERM and other tools which you used for CTM:
  The SCERM is easier to use.                                         80%
  Its results are more accurate than others.                          90%
  Its outcomes are easy to understand.                                100%
  It provides additional outcomes.                                    100%
During SCERM’s experiment, how you perceive:
  The directory generation process of components
  remapping module is a simple procedure.                             80%
  The curated list preparation for refinement is an easy task.        70%
The automatic analysis of STIX reports and key components
extraction for different phases of CTM allow me
to perform CTM more efficiently.                                      100%
5.6.4 Efficiency
To study the algorithmic efficiency of SCERM, its CPU utilization during processing and
its memory usage are analyzed and found to be quite low. An intuitive discussion
follows, and the details of the experiment are provided below. The SCERM
design is based on simple and concise scripts that are extensible and do not rely on a
particular platform or technology.
To enhance efficiency, SCERM is designed to perform some functionalities in an offline
mode, such as (1) the parsing of input reports and (2) the preparation of the Booster’s
component dictionary and the Refinement’s dataset. As CTI for new threats emerges, it can
be added to the database incrementally. Moreover, the algorithms presented in
section 5 have a polynomial running time in the size of the input. In our implementation,
constant parts are pre-computed, array elements are carefully referenced,
conditional statements are properly terminated, the database is normalized, and
the code avoids redundant computations.
The efficiency measurement is performed by processing different sets of STIX
reports (Table 5.11). These reports are provided offline and are processed in a
batch-processing fashion. To test the efficiency of the SCERM system, it is deployed on an
Intel(R) Pentium(R) machine with CPU B950 @ 2.10GHz and 6 GB of RAM. The OS of
the machine is Windows 7 Ultimate, 64-bit. Only minor increases in processor and memory
utilization are observed as the number of STIX reports and their sizes vary.
5.6.4.1 Processor utilization
The efficiency testing of the SCERM system in terms of CPU utilization is performed
as follows. (1) First of all, 2500 STIX reports are imported and 5 different sets are
composed. These sets comprise 500, 1000, 1500, 2000, and 2500 STIX reports. (2)
Next, each set of STIX reports is processed through the SCERM system and the processor
usage and execution time are measured, as can be seen in Figure 5.22 (a). The
sets of STIX reports are shown on the x-axis, while the CPU
utilization, in terms of CPU percentage and execution time, is provided on the y-axis. The
solid line shows the CPU utilization, while the dotted line depicts the execution time of the
SCERM software.
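A batch measurement of this kind can be sketched in Python using only the standard library. The function process_report is a hypothetical stand-in for SCERM's per-report processing, and the placeholder report strings are invented. The sketch records wall-clock time, CPU time, and peak Python allocation per batch, which approximates but is not identical to the CPU percentage and memory figures plotted in Figure 5.22.

```python
import time
import tracemalloc


def process_report(report: str) -> int:
    """Hypothetical stand-in for SCERM's processing of one STIX report."""
    return len(report.split())


def measure_batch(reports: list[str]) -> dict[str, float]:
    """Process one batch and record time and peak memory for it."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for report in reports:
        process_report(report)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "reports": len(reports),
        "wall_seconds": wall_seconds,
        "cpu_seconds": cpu_seconds,
        "peak_bytes": peak_bytes,
    }


# Placeholder reports; the five batch sizes follow the experiment in the text.
sample_reports = ["<stix>placeholder report</stix>"] * 2500
results = [measure_batch(sample_reports[:n]) for n in (500, 1000, 1500, 2000, 2500)]
```

Each entry of results holds the measurements for one batch size, so the series can be plotted against the number of reports in the same way as the figure.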
It can be noticed that the execution time increases in proportion to the
number of input STIX reports, whereas the CPU utilization percentage
increases only slightly. It is important to highlight that, at the time of writing, the
Attack.Mitre [20] knowledge base contains 100 intrusion activities (groups), whereas
during testing we ran SCERM on 2500 reports and observed no degradation in CPU or memory
usage (Figure 5.22). It is therefore reasonable to say that SCERM is an efficient
tool for the workload of an IT enterprise.
(a) CPU Utilization (b) Memory Utilization
Figure 5.22: SCERM Efficiency
5.6.4.2 Memory Utilization
The proposed framework SCERM performs three main operations, namely boosting,
valuation, and refinement, and all of these operations consume memory. Figure 5.22 (b)
presents the memory usage of the SCERM framework. The sets of
STIX reports are shown on the x-axis, while the memory usage is provided on the y-axis.
The figure shows that memory usage increases slightly with the number of
input files.
Chapter 6
APTs Analysis and Classification
System
6.1 Introduction
This chapter presents the procedure and techniques adopted by the APTs Analysis
and Classification System (A2CS) for the automatic analysis of APTs, the identification of
their missing artifacts, and the inference of the Tactics, Techniques and Procedures being
employed. In the A2CS sub-framework, a combined ontology of the CKC and POP models
is developed. SWRL rules are written for the analysis of APTs and the identification of their
missing artifacts. Furthermore, a case study of the Point of Sale (POS) system is
presented to demonstrate the working of the A2CS.
6.2 Research Approach and Contributions
In the recent past, several models related to cyber attack analysis have been proposed,
of which two particular models are of interest and are more popular. These models
are the CKC [25] and the POP [26]. The CKC guides an analyst regarding how a
perpetrator uses different phases such as Reconnaissance, Weaponization, Delivery,
Exploitation, Installation, and Exfiltration to launch a cyber attack. It further guides the
security analyst regarding how the various signatures and artifacts available at different
attack levels can be used to defend a network from advanced cyber attacks. The POP
model, in contrast, describes the efficacy of indicators, namely Hash values, IP addresses,
DNs, Network artifacts, Host artifacts, Tools, and TTPs. It places these indicators at
different levels of the pyramid. Moreover, it states that the treatment of low-level artifacts
such as hash values, IPs, and DNs causes less damage to the attacker, while the treatment
of high-level artifacts like host and network artifacts, tools, and TTPs causes more damage.
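The pyramid levels can be captured in a small ordered enumeration, sketched below in Python. The names PopLevel, is_high_level, and missing_high_level are hypothetical. The relative order of network and host artifacts simply follows the order in which the text lists them, and the low-level versus high-level split follows the description above.

```python
from enum import IntEnum


class PopLevel(IntEnum):
    """POP indicator types, ordered from least to most damaging to the attacker."""
    HASH_VALUE = 1
    IP_ADDRESS = 2
    DOMAIN_NAME = 3
    NETWORK_ARTIFACT = 4  # order relative to HOST_ARTIFACT follows the text's listing
    HOST_ARTIFACT = 5
    TOOL = 6
    TTP = 7


def is_high_level(level: PopLevel) -> bool:
    """Host and network artifacts, tools, and TTPs count as high-level."""
    return level >= PopLevel.NETWORK_ARTIFACT


def missing_high_level(observed: set[PopLevel]) -> list[PopLevel]:
    """List the high-level indicator types that a CTI record does not cover."""
    return [level for level in PopLevel if is_high_level(level) and level not in observed]


# Illustrative record that mostly contains low-level artifacts.
observed = {PopLevel.HASH_VALUE, PopLevel.IP_ADDRESS, PopLevel.HOST_ARTIFACT}
gaps = missing_high_level(observed)
```

For this record, gaps contains the network artifact, tool, and TTP levels, which is the kind of gap the later analysis steps try to fill.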
So far, the CKC and POP have remained theoretical models and are not used in real
security solutions. These models are complementary to each other, and the cyber attack
picture cannot be seen holistically when either model is used alone. For these
reasons, a combined ontology of both models is developed, which can be seen in Figure 6.1.
In the proposed ontology, 45 classes, 44 object properties, and 10 data properties are
defined. The blue circles in the figure depict entities of the CKC, orange circles are
associated with the POP, and green circles depict entities common to both.
Figure 6.1: Combined Ontology of CKC and POP
First, real examples of well-known Point of Sale (POS) APTs are selected
for the demonstration of the A2CS. Afterwards, various security blogs are scanned to
gather CTI data related to these APTs. Although a significant amount of CTI data is
found, the following challenges are faced. (1) Converting the extracted
CTI data into a structured form and establishing its connections and relationships is a
challenging task. It is learned that an ontology is the best way to develop and
analyse such relationships. Accordingly, the extracted CTI data is mapped onto the CKC
and the POP models. (2) The second problem with CTI data is that it generally
contains low-level artifacts, while the high-level artifacts related to most of the APTs are
missing. Therefore, incomplete artifacts are first identified in the CTI data. Then,
high-level artifacts are deduced through a combination of low-level artifacts.
6.3 A2CS Architecture
The A2CS architecture can be seen in Figure 6.2. Details of its various modules are
presented by using the POS APTs JackPOS and BackOff. These APTs are selected from
a large family of POS APTs [116], which comprises Reedum, Fsyna, Dexter, Treasure
hunt, Posfind, Alina, Poseidon, JackPOS, and BackOff. The A2CS system fetches web reports
from the internet and forwards them to the Parser module.
Figure 6.2: A2CS Flow Diagram
The Parser parses the data and extracts the entities and concepts. Next, the Mapper
module correlates these extracted concepts with the different phases of the CKC and POP.
An example is shown in Figure 6.3. The outputs of the Mapper module are as follows:
• Installation/ Host Artifacts: These artifacts are registry entries, file names, or folder
names. For example, during the installation phase, JackPOS creates the files
%Temp%\svchost.exe, java.exe, javaw.exe, and javcpl.exe, and BackOff creates the files
javaw.txt, Log.txt, Local.dat, and winserv.exe.
• Network Artifacts: These artifacts are related to the Command and Control (C2) or
Domain Name. In this phase, both malware samples use the HTTP protocol and
hard-coded domain names to communicate with the C2.
• TTPs: The BackOff malware uses both Memory Scraping and Keystroke logging
techniques for data stealing, while JackPOS uses the Memory Scraping technique only.
The Mapper module then feeds this extracted information into the knowledge base.
Next, the Reasoner module executes the rules over the knowledge base. The next section
gives the details of the reasoning module.
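The Mapper output listed above can be pictured as a small in-memory knowledge base, sketched here in Python. The MappedApt class and its field names are hypothetical, and plain sets stand in for the ontology individuals that A2CS actually populates. The artifact values are the ones named in the text, with file paths reduced to file names.

```python
from dataclasses import dataclass, field


@dataclass
class MappedApt:
    """Hypothetical record of one APT after mapping onto the CKC and POP."""
    name: str
    host_artifacts: set[str] = field(default_factory=set)
    network_artifacts: set[str] = field(default_factory=set)
    ttps: set[str] = field(default_factory=set)


jackpos = MappedApt(
    name="JackPOS",
    host_artifacts={"svchost.exe", "java.exe", "javaw.exe", "javcpl.exe"},
    network_artifacts={"HTTP", "hard-coded domain names"},
    ttps={"Memory Scraping"},
)
backoff = MappedApt(
    name="BackOff",
    host_artifacts={"javaw.txt", "Log.txt", "Local.dat", "winserv.exe"},
    network_artifacts={"HTTP", "hard-coded domain names"},
    ttps={"Memory Scraping", "Keystroke logging"},
)

# Knowledge base keyed by APT name, ready for rule-based reasoning.
knowledge_base = {apt.name: apt for apt in (jackpos, backoff)}

# Simple set operations already expose overlaps and differences.
shared_ttps = jackpos.ttps & backoff.ttps
backoff_only_ttps = backoff.ttps - jackpos.ttps
```

In this representation, shared_ttps holds Memory Scraping and backoff_only_ttps holds Keystroke logging, which is the kind of correlation the Reasoner later expresses as rules.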
Figure 6.3: Concepts Extraction and Mapping
6.4 Analysis via Reasoning
During this research, various methods for the analysis of APTs, such as Time
analysis, Common Artifacts analysis, and TTPs analysis, are employed for the evaluation of
the proposed sub-framework. Risk analysis, Dependency analysis, and Complexity analysis
are planned as future work.
6.4.1 Identification of Missing Artifacts
As a result of our studies, it is observed that the high-level artifacts of APTs are generally
missing. In this research, two techniques are developed for the identification
of these missing artifacts. Using the first technique, called Time analysis, A2CS fetches
information regarding various aspects of the APT from multiple reports with different
dates and times and combines them in the ontology knowledge base. For example, in
our case of information retrieval regarding the BackOff APT, the relevant Host artifacts
are retrieved from the Symantec portal, whereas the Network artifacts are extracted from
IBM X-Force, as shown in Figure 6.4. This is important because threat sources usually
specialize in particular aspects of threat reporting.
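A minimal Python sketch of this merging step is given below. It assumes each report is reduced to a dictionary of artifact categories; the structure, the function names, and the publication dates are illustrative placeholders and do not come from the actual reports. Plain dictionaries stand in for the ontology knowledge base.

```python
from datetime import date

REQUIRED_CATEGORIES = ("host_artifacts", "network_artifacts")

# Two partial reports about the same APT; the dates are placeholders only.
reports = [
    {
        "apt": "BackOff",
        "source": "Symantec",
        "published": date(2000, 1, 1),
        "host_artifacts": ["winserv.exe", "Log.txt"],
    },
    {
        "apt": "BackOff",
        "source": "IBM X-Force",
        "published": date(2000, 2, 1),
        "network_artifacts": ["HTTP"],
    },
]


def merge_reports(items: list[dict]) -> dict[str, dict[str, set[str]]]:
    """Combine reports in chronological order into one entry per APT."""
    merged: dict[str, dict[str, set[str]]] = {}
    for report in sorted(items, key=lambda r: r["published"]):
        entry = merged.setdefault(report["apt"], {c: set() for c in REQUIRED_CATEGORIES})
        for category in REQUIRED_CATEGORIES:
            entry[category].update(report.get(category, []))
    return merged


def missing_categories(entry: dict[str, set[str]]) -> list[str]:
    """Return the artifact categories that are still empty."""
    return [c for c in REQUIRED_CATEGORIES if not entry[c]]


before = missing_categories(merge_reports(reports[:1])["BackOff"])
after = missing_categories(merge_reports(reports)["BackOff"])
```

With only the first report, before lists network_artifacts as missing; once the second source is merged, after is empty, which illustrates why combining specialized sources matters.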
Figure 6.4: Identification of Missing Artifacts
The second technique is called Common Artifacts analysis. It concerns the
augmentation and enrichment of information about an incomplete APT using information
about known or previously studied APTs of the same family. For example, JackPOS is a
recent successor of BackOff and is therefore not as well studied as the latter. Our
knowledge base already contained information regarding the BackOff APT’s stealing methods
and affected device. When the reasoning module correlated the artifacts of both, it
concluded that, since both attack the same domain (the retail industry) and
directly affect the terminal, JackPOS may be employing a stealing
method similar to that used by BackOff.
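The inference described above can be approximated by a simple rule in Python. This is a plain-Python stand-in and not the SWRL rule used by A2CS; the dictionary keys, the function name, and the attribute values "retail" and "POS terminal" are hypothetical labels for the domain and affected device mentioned in the text.

```python
def infer_stealing_methods(incomplete: dict, known_apts: list[dict]) -> set[str]:
    """Suggest candidate stealing methods from APTs that share domain and device."""
    candidates: set[str] = set()
    for known in known_apts:
        same_domain = known["domain"] == incomplete["domain"]
        same_device = known["affected_device"] == incomplete["affected_device"]
        if same_domain and same_device:
            candidates.update(known["stealing_methods"])
    return candidates


backoff = {
    "name": "BackOff",
    "domain": "retail",
    "affected_device": "POS terminal",
    "stealing_methods": {"Memory Scraping", "Keystroke logging"},
}
jackpos = {
    "name": "JackPOS",
    "domain": "retail",
    "affected_device": "POS terminal",
    "stealing_methods": set(),  # not yet known for the less-studied APT
}

inferred = infer_stealing_methods(jackpos, [backoff])
```

The result is a set of candidate methods only; as the text notes with "may be employing", each candidate still has to be confirmed against further evidence before it is asserted for JackPOS.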
A number of queries are developed for the identification of missing artifacts in the
Semantic Query-Enhanced Web Rule Language (SQWRL); a sample of these queries is as
follows.
The Query-1 Equation 6.1 correlates files and folders names, and identifies the com-