CYBER FORENSICS
Module 2
Introduction to Cyber forensics
Interrelation among Cybercrime, Cyber Forensics and Cyber Security - Historical background of
Cyberforensics - Cyber Forensics – Definition, Need, Objectives, Computer Forensics
Investigations, Steps in Forensic Investigation, Forensic Examination Process, Classification of
Cyber Forensics, Benefits of Cyber Forensics, Incident and Incident Handling - Computer
Security Incident Response Team
______________________________________________________________________________
Interrelation among Cybercrime, Cyber Forensics and Cyber Security
Cyber crime - any criminal offence that involves a computer/network or an ECD
Cyber forensics / Computer Forensics – focuses on the investigation of cybercrimes
Deals with the acquisition, analysis and admissibility of digital evidence from a
computer or any ECD after a cybercrime has occurred
The evidence gathered will be used in criminal proceedings
An attempt is made to determine what has happened to the digital media as a result
of the incident
Cyber security – refers to the technologies, processes and practices designed to protect
networks, computers, programs, and data from attack, damage or unauthorized access
It identifies vulnerabilities that exist in networks, computers, programs, and
data and patches the loopholes
So cyber forensics is a response to cybercrime: it investigates whether anything adverse has
happened and determines the source of the incident
Cyber security, by contrast, prevents cybercrime incidents by implementing
security measures
All three are interrelated: every cybercrime drives the cyber forensics team
and the cyber security team to work in tandem to respond to and to prevent cyber
crimes
In the computing context, security can be viewed from different perspectives
The security of any organization has four dimensions:
IT Security:
Application security – applications should be developed to overcome vulnerabilities and
threats
Computing security – an efficient security policy should be in force so as to avoid threats
Data security – the organization's data should be secured from unauthorized
manipulation, theft, loss and breach of secrecy
Information security – ensures confidentiality, integrity and availability of the information
Network security – organization network should be secure enough to facilitate safe data
transfer
Physical Security:
Facilities security – all equipment within the organization must be secured from physical
damage, system crash and power failure
Human security – employees within the organization must be provided with security
training so that they are aware of the security process
Financial Security
The organization must adopt appropriate measures to be financially immune to threats from
insiders and outsiders
Legal Security
National security – It involves checking for any lapse in security from threats that arise out
of nationwide issues
Public security - security from threats associated with societal issues such as riots, strikes
or clashes
CYBER FORENSICS
It is an electronic discovery technique that is used to determine and reveal technical
criminal evidence
It involves extraction of electronic data for legal purposes
Also known as Computer forensics
Definition :
“Computer forensics is the study of evidence from attacks on computer systems in order to
learn what has occurred, how to prevent it from recurring and the extent of damage”
“Forensic computing is the process of identifying, preserving, analyzing and presenting
digital evidence in a manner that is legally acceptable”
Other definitions along similar lines also exist
Need for Cyber forensics
1. Traditional approaches such as fingerprinting and DNA extraction are insufficient to
prove an incident, or they end in deadlock during an investigation
2. With the advent of the Internet, offences and crimes now span a diverse range, from
hacking to cyber terrorism, and it is essential to curb and control them
3. Cybercrime has changed the mode of operation of crime, hence the investigation
must be performed by a cyber forensics body rather than the regular crime branch
4. Cybercrime spreads across boundaries in no time, so it is necessary to address
incidents that conflict with legal provisions
5. Cyberforensics is essential to prosecute a criminal if a compromise of some sort is
observed
Objectives of Cyber forensics
1. To identify the evidence associated with a malicious activity in a short span of time
2. To recover and analyze the evidence and related materials from computers and
ECDs
3. To present the collected evidence in a court of law
4. To estimate the potential impact of malicious activity
5. To assess the intention and identity of the offender
Computer/Cyber Forensics Investigation
Cyber crime investigation works in phases:
First phase: Preliminary analysis – the forensic investigator gathers information on the crime
scene
Second phase: forensic copy acquisition and recovery
Third phase: detailed analysis and preparation of a comprehensive report
To do this, the forensic investigator should have extensive knowledge of this area and highly
specialized skills. The evidence has to be gathered from ECDs
Steps in Forensic Investigation
Cyber forensics should ensure the integrity of the evidence while handling and analyzing it, so
that the evidence is admissible in court
Steps are:
1) Investigation starts when a crime is reported or a complaint is received
2) In response to the complaint the following are done:
a. If evidence has to be gathered from a third party, a notice is served
b. If it is a criminal offence, a First Information Report (FIR) is filed
c. A search warrant (if required) is obtained from the court
3) First responder / Computer Emergency Response Team (CERT) procedures are
performed
4) Evidence is seized from the crime scene by photographing the scene and marking
the evidence
Necessary documentation is done
Witnesses present during the seizure of evidence and the suspect can be interviewed
If there are any complications in evidence collection, or the investigating officer
does not possess evidence collection expertise, third-party experts may be called
in
Chain of custody has to be documented
5) The collected evidence is numbered and securely transported to the forensic
laboratory for analysis
6) The following are done at forensic lab:
a. Two bit-stream copies of the evidence are made. The hash values of the
original and forensic copies are verified
b. Chain of custody is maintained
c. Original evidence is stored in a secured location
d. Forensic copy is analysed for evidence
e. A forensic report is prepared stating the methods and recovery tools used,
the potential evidence and the findings
f. The report is presented to the client
7) In some cases forensic investigator may be called to testify in court as an expert
witness
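Step 6a relies on hash values to show that each forensic copy is identical to the original. A minimal Python sketch of that check follows. It assumes the evidence is already available as an image file; a real acquisition reads the raw device through a write blocker using a forensic tool. SHA-256 is used here only as an example of a common choice, and the function names `sha256_of` and `make_verified_copies` are hypothetical.

```python
import hashlib
import shutil

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large images do not exhaust memory


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_verified_copies(original, copy_paths):
    """Copy `original` byte for byte to each path in `copy_paths` and verify.

    Returns the original's digest (to be recorded in the case documentation);
    raises ValueError if any copy's digest differs from the original's.
    """
    original_hash = sha256_of(original)
    for dest in copy_paths:
        shutil.copyfile(original, dest)  # copies the file contents byte for byte
        if sha256_of(dest) != original_hash:
            raise ValueError(f"hash mismatch: {dest} does not match {original}")
    # Re-hash the original to confirm it was not altered while copying.
    if sha256_of(original) != original_hash:
        raise ValueError(f"original {original} changed during copying")
    return original_hash
```

The returned digest would be written into the case documentation, so the copies can be re-verified at any later stage of the investigation.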
The tasks performed by the forensic investigator are as follows:
a. Determine the extent of crime and damage caused due to it
b. Recover the data to be investigated from ECDs
c. Collect the evidence from ECDs in a forensically sound manner
d. Ensure the integrity of the evidence
e. Analyse the evidence
f. Consider all possible conclusions of investigations
g. Prepare a forensic report
h. Testify in court if required
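Chain of custody (steps 4 and 6b) records who handled each item of evidence, when, and in what state. One way to make such a record tamper-evident is to chain each entry to the hash of the previous one. This is shown here purely as an illustrative assumption, not as a procedure prescribed above, and `CustodyLog` and `CustodyEntry` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class CustodyEntry:
    evidence_id: str     # e.g. the number assigned at seizure
    action: str          # e.g. "seized", "transported", "received at lab"
    handler: str         # person taking responsibility for the evidence
    evidence_hash: str   # hash of the evidence recorded at this hand-over
    timestamp: str
    prev_entry_hash: str
    entry_hash: str = ""

    def compute_hash(self):
        fields = [self.evidence_id, self.action, self.handler,
                  self.evidence_hash, self.timestamp, self.prev_entry_hash]
        return hashlib.sha256(json.dumps(fields).encode()).hexdigest()


class CustodyLog:
    def __init__(self):
        self.entries = []

    def record(self, evidence_id, action, handler, evidence_hash):
        prev = self.entries[-1].entry_hash if self.entries else GENESIS
        entry = CustodyEntry(evidence_id, action, handler, evidence_hash,
                             datetime.now(timezone.utc).isoformat(), prev)
        entry.entry_hash = entry.compute_hash()
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return False if an entry was edited, reordered or removed from the
        middle of the log, or if any item's evidence hash changed after it
        was first recorded."""
        prev = GENESIS
        first_hash = {}
        for e in self.entries:
            if e.prev_entry_hash != prev or e.compute_hash() != e.entry_hash:
                return False
            if first_hash.setdefault(e.evidence_id, e.evidence_hash) != e.evidence_hash:
                return False
            prev = e.entry_hash
        return True
```

This sketch only detects changes to existing entries. Entries cut from the end of the log would go unnoticed unless the latest entry hash is also kept elsewhere, for example in the signed case file.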
Forensic Examination Process
The following steps are required to reconstruct the technical aspects of the data, to
analyse computer usage to prove a crime, to examine residual data and to authenticate data
by technical analysis
Identification – attempts to determine what evidence is present, where and how it is
stored, and its context, either physical (the disk drive as hardware and software
components) or logical (the location of the evidence in the drive)
The procedure used to locate the evidence should be documented
Acquisition – required for an incident that has already occurred
Based on this, and on the type of information on an ECD and its format, the tools and
strategy used for acquisition will vary
Extraction – the forensic investigator extracts data from the acquired media. Volatile data
is easily lost, so a copy of it is made and compared with the original
Preservation – the integrity of the original evidence has to be preserved; this is ensured by
creating a forensic copy for analysis
Evaluation – attempts to ascertain and analyse whether the evidence is relevant to the case.
Irrelevant information may be filtered out to avoid confusion
Interpretation – what is found during analysis should be interpreted in an easily
understandable way
Presentation – the suitability of the evidence with respect to the case has to be presented
before the court. Documentation has to be prepared: chain of custody and evidence analysis
Methods employed in Forensic Analysis
Data Recovery
o Recovering and analyzing deleted files that have not been
overwritten
o Carving out portions of text from the unallocated and slack space
String and keyword searching
o Attempts to identify the readable text within a binary file or
specific string within a file
o The search covers known and unknown files as well as
unallocated and slack space
Volatile evidence analysis
o It gives details about the state of the system by looking into
connections, processes and cache tables gathered from the RAM
Timeline analysis
o Attempts to create a timeline of events and makes analysis on the
basis of modified, accessed and changed times associated with files
that are imaged
System file analysis
o Reveals any unauthorized changes that are made to system
binaries
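String and keyword searching can be illustrated with a short Python sketch. It is a simplified assumption: it finds only runs of printable ASCII bytes, in the spirit of the Unix `strings` utility, and ignores other encodings such as UTF-16, which real forensic tools also decode. Because it scans raw bytes without regard to file boundaries, running it over a whole disk image would also cover unallocated and slack space. The function names are hypothetical.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for every run of >= min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def keyword_hits(data: bytes, keywords):
    """Case-insensitive keyword search over the extracted strings.

    Returns (absolute_offset, keyword, containing_string) for every occurrence.
    """
    hits = []
    for offset, text in extract_strings(data):
        lowered = text.lower()
        for kw in keywords:
            needle = kw.lower()
            pos = lowered.find(needle)
            while pos != -1:  # report every occurrence, not just the first
                hits.append((offset + pos, kw, text))
                pos = lowered.find(needle, pos + 1)
    return hits
```

Reporting the absolute byte offset lets an examiner go back to the exact location in the image and document where each hit was found.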
Benefits of Cyber forensics
ECD subjected to forensic examination is protected from alteration, damage, data
corruption and viruses
Files, hidden files and password-protected files are discovered and deleted data is
recovered from the ECD
Contents of the hidden files and swap files used by the application programs as well
as operating system are revealed
The contents of password protected and encrypted files are recovered using tools
All possible and relevant data present in special areas of the disk are analysed
All the possible relevant files are discovered
It offers expert consultation and testimony when required
Classification of Cyber forensics
1. Disk forensics
2. Network forensics
3. Wireless forensics
4. Database forensics
5. Malware forensics
6. Mobile device forensics
7. GPS forensics
8. Email forensics
9. Memory forensics
Disk forensics
It is the process of extracting forensic information from storage media such as hard disks,
USB drives, CDs, DVDs, flash drives and floppy disks
Steps in Disk forensics are:
Identification of evidence – locates the source of evidence at the crime scene
Seizure and acquisition of evidence – at the crime scene, after seizure, the hash value of the
original evidence in the storage media is computed using a forensic tool.
The hash value is stored, and the evidence is packed and sealed
Acquisition is the process of taking a bit-by-bit copy of the original evidence, which
is itself write-protected. This is done in the forensic lab
Authentication and analysis of evidence – this is done at the forensic lab, where the hash
values of both the original media and the forensic copy are compared to make sure that
they are the same
Preservation of evidence – after acquisition and authentication, the evidence is kept in a
place that is secured from magnetic and other radiation sources
Analysis of evidence – process of collecting the evidence from the storage media
Report on findings – prepare a case analysis report that includes all the details:
examination, analysis and authentication. It should also include the observations of the
examiner
Documentation – every activity at each step is documented to make the case
admissible in court
Disk Forensics challenges:
1. A text search utility is usually used by examiners to find keywords which would
serve as evidence
It becomes impossible to gather evidence this way if a) the keyword is misspelt, b) files in the
media are encrypted or c) the text is stored as a graphic
Certain graphic files will open only if they are extracted from the image file and opened
with the respective software
In such cases it is the responsibility of the examiner to look for other ways to gather
evidence rather than concluding that the evidence does not exist
2. Hidden files, encrypted files, files with disguised names, files whose extensions
are altered and hidden areas in the storage media provide room for hiding
evidential data unless specialized tools are used to analyse them
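Files whose extensions have been altered can often be exposed by comparing the extension with the file's signature, that is, its leading "magic" bytes. The Python sketch below assumes a small, illustrative signature table. Real tools, such as the Unix `file` command, use much larger signature databases. The names `SIGNATURES`, `detect_type` and `find_mismatches` are hypothetical.

```python
from pathlib import Path

# A few well-known signatures found at offset 0 (illustrative, not exhaustive).
SIGNATURES = {
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"\x89PNG\r\n\x1a\n": {".png"},
    b"%PDF-": {".pdf"},
    b"GIF87a": {".gif"},
    b"GIF89a": {".gif"},
    b"PK\x03\x04": {".zip", ".docx", ".xlsx", ".pptx", ".jar"},  # ZIP-based formats
}


def detect_type(header: bytes):
    """Return the extensions whose signature matches the header, or None if unknown."""
    for magic, extensions in SIGNATURES.items():
        if header.startswith(magic):
            return extensions
    return None


def find_mismatches(paths):
    """Yield (path, expected_extensions) for files whose content contradicts their extension."""
    for path in map(Path, paths):
        with open(path, "rb") as f:
            header = f.read(16)
        expected = detect_type(header)
        if expected is not None and path.suffix.lower() not in expected:
            yield path, expected
```

A picture renamed from `holiday.png` to `holiday.txt` would be reported here. Files with unknown signatures are simply not flagged, which is one reason specialized tools with larger databases are still needed.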
Network forensics
Refers to the capture, recording and analysis of network events so as to discover any malicious
activity, security attack or violation
It finds applications in cases relating to hacking, fraud, data espionage, data theft,
defamation, narcotics trafficking, credit card cloning, software piracy, sexual harassment
etc.
Tools for analysis:
1) Intrusion detection system – monitors networks and systems under it for malicious
activity or policy violations and maintains a record of the activity
Any such activity will be reported to the network administrator
2) Logging gathers and records the activity on a network with the help of the IDS, which
can help in tracking an offender or hacker
3) Packet capturing tools can gather and record every bit exchanged between any two
designated hosts. Since these tools generate a large amount of data in a short
span of time, they cannot be used to capture data for long periods
4) A NetFlow data collector gathers and records data about every network connection,
e.g. source, destination and volume of data. Since it captures only a summary, it
can be used to gather data for longer periods
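The difference between full packet capture and a NetFlow-style summary can be sketched in Python by collapsing packet records into one record per connection. This is a simplified assumption. Flows are keyed only by the 5-tuple (source and destination address and port, plus protocol), and each direction is a separate flow, as in classic NetFlow. Real collectors also apply timeouts and record more fields. `Packet` and `summarize_flows` are hypothetical names.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    timestamp: float   # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    length: int        # bytes in the packet


def summarize_flows(packets):
    """Collapse packet records into one summary per unidirectional flow."""
    flows = {}
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        f = flows.setdefault(key, {"packets": 0, "bytes": 0,
                                   "first": p.timestamp, "last": p.timestamp})
        f["packets"] += 1
        f["bytes"] += p.length
        f["first"] = min(f["first"], p.timestamp)
        f["last"] = max(f["last"], p.timestamp)
    return flows
```

Only counts and times survive; packet contents are discarded. That is why summaries like this can be kept for much longer than raw captures, but they cannot show what was actually said.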
Network forensic challenges:
1. Networks generate a large volume of data every day, so searching for evidence
is tedious
2. The inherent anonymity of Internet protocols (MAC addresses at the data link layer,
IP addresses at the network layer and email addresses at the application layer), together
with the possibility that all of these can be spoofed, poses the biggest challenge in
identifying the source of the incident
3. Single-purpose tools for collecting, filtering and stream reassembly from
applications, routers and firewalls are insufficient to figure out the network activity.
Raw network packets should be captured to gather traffic at the highest level of detail; this is
possible with sniffing
4. Sessioning is the act of assembling the raw packets exchanged between specified points into a
complete stream, which helps in gathering information about a specific
communication. Protocol analysis tools can produce a tree-oriented view
of sessions, and such a visual presentation gives a clear picture of what happened on the
network.
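Sessioning (challenge 4) can be sketched as ordering a connection's TCP segments by sequence number and joining their payloads. The Python below assumes the packets have already been decoded into simple records. The `Segment` type and `reassemble` function are hypothetical. It also ignores overlaps, gaps and sequence-number wrap-around, which tools such as Wireshark's Follow TCP Stream feature handle.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    src: str        # "ip:port" of the sender
    dst: str        # "ip:port" of the receiver
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes


def reassemble(segments, src, dst):
    """Rebuild the byte stream sent from src to dst, ordered by sequence number.

    Simplified: retransmitted segments are de-duplicated by sequence number;
    overlapping or missing data and sequence wrap-around are not handled.
    """
    by_seq = {}
    for s in segments:
        if s.src == src and s.dst == dst and s.payload:
            by_seq.setdefault(s.seq, s.payload)  # keep the first copy of each segment
    return b"".join(by_seq[seq] for seq in sorted(by_seq))
```

Calling it once for each direction of a connection yields the two halves of the conversation. For example, these would be an HTTP request and its response.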
Wireless forensics
It is associated with network forensics
It involves capturing the data moving over the network and analyzing network events so as
to uncover network anomalies, discover the source of security attacks and investigate breaches
on computers and wireless networks
Evidence collected can correspond to plain data or, with the broad usage of Voice over IP (VoIP)
technologies, especially over wireless networks, it can include voice conversations
Traffic analysis in wireless networks involves the following stages:
1. Data normalization and mining to search through the data
2. Traffic pattern recognition for identifying suspect patterns
3. Protocol dissection for analyzing the header fields
4. Reconstruction of application sessions for visualization
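Protocol dissection (stage 3) means decoding header fields from raw bytes. As an illustration, the Python sketch below dissects the fixed 20-byte IPv4 header defined in RFC 791 using the standard `struct` module. It assumes the 802.11 and other link-layer headers have already been stripped from the captured frame, and `dissect_ipv4` is a hypothetical name. Dissectors such as Wireshark's do this for every layer and protocol.

```python
import socket
import struct


def dissect_ipv4(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (RFC 791)."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,      # IHL counts 32-bit words
        "tos": tos,
        "total_length": total_len,
        "identification": ident,
        "flags": flags_frag >> 13,               # 2 means "don't fragment"
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,                       # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```

The `!` in the format string selects network (big-endian) byte order, which is how multi-byte fields appear on the wire.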
Forensic tool: A Network Forensic Analysis Tool (NFAT) is available for network forensics,
but there is no dedicated equivalent for wireless as such; the following are used instead:
1. The graphical Wireshark protocol dissector is used to inspect every field of each captured
frame
2. ngrep to search for specific strings in the contents of frames
3. Text-based tcpdump or tshark sniffers to automate and script certain analysis
tasks, e.g. filtering traffic based on specific conditions
Database forensics
Malware forensics
Mobile device forensics
GPS forensics
Email forensics
Memory forensics
Chapter 5
Introduction to Cyber Forensics
© Oxford University Press 2018. All rights reserved.
Outline
• Interrelation among Cybercrime, Cyber Forensics and Cyber Security
• Cyber Forensics
• Disk Forensics
• Network Forensics
• Wireless Forensics
• Database Forensics
• Malware Forensics
Outline (Cont…)
• Mobile Forensics
• GPS Forensics
• Email Forensics
• Memory Forensics
• Building Forensic Computing Lab
• Incident and Incident Handling
• Computer Security Incident Response Team
Interrelation among Cybercrime, Cyber Forensics, and Cyber Security
• Cybercrime:
• Any criminal offence.
• Involves a computer/network.
• Computer forensics: Focuses on the investigation of cybercrimes.
• Cyber security: Prevents cybercrime with the implementation of security measures.
Cyber Forensics
• Cyber Forensics:
• Electronic discovery technique.
• Determine and reveal technical criminal evidence.
Definition
• Computer forensics is the study of evidence from attacks on computer systems in order to learn what has occurred, how to prevent it from recurring and the extent of damage.
- McGraw-Hill Dictionary of Scientific and Technical Terms
Cyber Forensics (Cont…)
Need
• Traditional approaches are either insufficient or end in deadlock.
• Wide range of cyber offences and crimes.
• Perpetrators of cybercrimes have changed the modus operandi.
• Cybercrime can spread across boundaries in no time.
• Integrity and existence have to be ensured.
Cyber Forensics (Cont…)
Objectives
• To identify the evidence.
• To recover and analyze the evidence and related materials.
• To present the evidence in a court of law.
• To estimate the impact of the malicious activity.
• To assess the intention and identity of the offender.
Cyber Forensics (Cont…)
Computer Forensics Investigations
Cyber Forensics (Cont…)
Steps in Forensics Investigations
Cyber Forensics (Cont…)
Forensic Examination Process
• Identification – attempts to determine the evidence present.
• Acquisition
• Extraction
• Preservation
• Evaluation
• Interpretation
• Presentation
Cyber Forensics (Cont…)
Methods Employed in Forensic Analysis
• Data recovery: Recovering and analyzing deleted files.
• String and keyword searching: Identifying readable text or a specific string.
• Volatile evidence analysis: Details about the state of the system.
• Timeline analysis: Analysis of modified, accessed and changed times.
• System file analysis: Analysis of any unauthorized changes.
Cyber Forensics (Cont…)
Classification of Cyber Forensics
• Disk Forensics
• Network Forensics
• Wireless Forensics
• Database Forensics
• Malware Forensics
• Mobile device Forensics
• GPS Forensics
• Email Forensics
• Memory Forensics
Cyber Forensics (Cont…)
Benefits of Cyber Forensics
• Protected from alteration, damage, data corruption, and viruses.
• Files, hidden files, and password-protected files are discovered.
• Deleted data is recovered.
• Content of password-protected and encrypted files is accessed.
• Data present in special areas of the disk is analyzed.
• Offers expert consultation and testimony.
Disk Forensics
• Extracting information from storage media:
• Hard disk.
• USB drive.
• CD.
• DVD.
• Flash drive.
• Floppy disk.
Disk Forensics (Cont…)
• Steps in disk forensics:
• Identification of evidence.
• Seizure and acquisition of evidence.
• Authentication and analysis of evidence.
• Preservation of evidence.
• Analysis of evidence.
• Reports on findings.
• Documentation.
Disk Forensics (Cont…)
Challenges
• Text search utility:
• Misspelt keywords.
• Encrypted files.
• Files stored as graphics.
• Difficult to gather evidence:
• Hidden files, files with disguised names, and files whose extensions are altered.
• Hidden areas in the storage media.
Network Forensics
• Capture, recording, and analysis of network events.
• Cases:
• Hacking.
• Fraud.
• Data espionage.
• Data theft.
• Defamation.
• Narcotics trafficking.
• Credit card cloning.
• Software piracy.
• Sexual harassment.
Network Forensics (Cont…)
Tools for Analysis
• Intrusion Detection System (IDS): Monitors networks and systems.
• Logging: Gathers and records the network activity.
• Packet capturing tools: Gather and record every bit exchanged.
• NetFlow data collector: Gathers and records data about every network connection.
Network Forensics (Cont…)
Challenges
• Large volume of data in the order of gigabytes.
• Inherent anonymity of Internet protocols and the possibility of spoofing.
• Single-purpose tools are insufficient to figure out network activity.
• Protocol analysis tools: Produce a tree-oriented view of sessions.
Wireless Forensics
• Capturing the network data and analyzing the network events.
• Goal: To collect and analyze network traffic.
• Stages of traffic analysis:
• Data normalization and mining.
• Traffic pattern recognition.
• Protocol dissection.
• Application session reconstruction.
Wireless Forensics (Cont…)
Forensic Tools
• Graphical Wireshark protocol dissector: Inspects every field of the frame.
• ngrep (network grep): Searches for specific strings.
• Text-based tcpdump / tshark sniffers: Automate and script the analysis of certain tasks.
Wireless Forensics (Cont…)
Challenges
• Radio frequency communication and the complexity of the medium.
• Tracking data during roaming.
• Handling processing overheads and storage.
Database Forensics
• Determine the security breach to a database.
• Sources for database breach:
• Files where the metadata resides.
• Cached data (Internal structures).
• Index files (Logical structures).
Forensic Approaches
• Reactive Approach.
• Proactive Approach.
Database Forensics (Cont…)
Forensic Methodology
• Investigation Preparedness.
• Incident Verification.
• Artifact Collection.
• Artifact Analysis.
Malware Forensics
• Finding the malicious code.
• Determining how it got there and the changes it caused.
• Malware forensic process begins with the examination of the following:
• Master boot record.
• Volatile data.
• System files.
• Hash of the files.
• System programs.
• Auto-start locations.
• Host-based logs.
• File system artifacts.
• Web browsing history.
• Suspected malicious files.
Malware Forensics (Cont…)
Malware Analysis
• Removal of malware.
• Scanning of machine for malware.
• Gathering of data / evidence
Mobile Forensics
• Recovery of digital evidence from mobile devices.
Stages
• Seizure.
• Preparation:
• Legal authority.
• Goals of examination.
• Make, model, and identifying information of device.
• Removable and external data storage.
Mobile Forensics (Cont…)
• Acquisition.
• Evidence examination.
• Presentation and reporting.
Analysis Tools
• Manual extraction.
• Logical extraction.
• Physical extraction.
• Chip-off.
• Micro read.
GPS Forensics
• Recovery of live and deleted data from different navigation devices.
Email Forensics
• Email tracing and email tracking can be achieved with email forensics.
• Tracing: Done when an email header is available.
• Tracking: Done even when no information is available.
Client and Server in Email
• Email clients run programs such as Outlook Express, Eudora, or Pine.
• Servers run specialized operating systems such as Windows Server 2003 or Novell NetWare.
• Servers run programs such as Exchange, GroupWise, or Sendmail.
Email Forensics (Cont…)
Structure of Email
• Header: Email information source.
• Message body: compiled by the user and stored as binary data.
• Attachments: 80% of email data.
Working of Email
• Composed using a mail client (Gmail, Yahoo mail, etc.).
• Client sends the message to a mail transfer agent (MTA).
• MTA is a server that runs the simple mail transfer protocol (SMTP).
• Header information is placed on the top.
• Timestamp is added.
• Recipient accesses the mail server using POP3 or IMAP.
Email Forensics (Cont…)
Email Protocols
• Post office protocol (POP).
• Internet message access protocol (IMAP).
• Microsoft’s mail API (MSMAPI).
Examining Email Messages
• Accessing the victim’s computer.
• Retrieving the evidence.
• Investigation:
• Look for, open, and copy the evidence in the email along with the header.
• Look for protected and encrypted material.
Email Forensics (Cont…)
Viewing Email Headers
• Information in the email header:
• Unique identifying numbers.
• IP address of the sending server.
• Time the mail was sent.
• Headers can be viewed using:
• GUI clients.
• Command-line clients.
• Web-based clients.
Email Forensics (Cont…)
Examining Email Headers
• Return path.
• Recipient’s email address.
• Type of sending email service.
• IP address of the server from where the mail has been sent.
• Name of the email server.
• Unique message number.
• Date and time at which the mail has been sent.
• Information related to attached files.
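Several of the header fields listed above can be pulled out with Python's standard `email` package. The sketch below is a minimal illustration; `examine_headers` is a hypothetical name and the IPv4-only regular expression is a simplifying assumption. Each mail server normally adds its own Received header above the existing ones, so reading them bottom-up follows the message's path from the sender. The lowest headers can, however, be forged by the sender and should be treated with caution.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Simplifying assumption: only dotted-quad IPv4 addresses are extracted.
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def examine_headers(raw_message: str) -> dict:
    """Collect forensically useful header fields from a raw email message."""
    msg = message_from_string(raw_message)
    relay_ips = []
    # Servers prepend Received headers, so reversing the list walks the path
    # from the sender's side towards the recipient.
    for header in reversed(msg.get_all("Received", [])):
        relay_ips.extend(IPV4.findall(str(header)))
    return {
        "return_path": msg.get("Return-Path"),
        "from": msg.get("From"),
        "to": msg.get("To"),
        "message_id": msg.get("Message-ID"),
        "sent": parsedate_to_datetime(msg["Date"]) if msg["Date"] else None,
        "relay_ips": relay_ips,
        "attachments": [p.get_filename() for p in msg.walk() if p.get_filename()],
    }
```

The collected relay IP addresses are what an examiner would then look up in a regional registry such as ARIN or APNIC to trace the message.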
Email Forensics (Cont…)
Tracing Email Messages
[Figure: tracing an email message using ARIN (American Registry for Internet Numbers)]
[Figure: tracing an email message using APNIC (Asia-Pacific Network Information Centre)]
Email Forensics (Cont…)
Email Servers and their Examination
• FINALeMAIL:
• Scans email database files.
• Recovers deleted files.
• FTK:
• Filters and finds files specific to email clients and servers.
Email Forensics (Cont…)
Tracking Emails
• Services:
• Readnotify.
• DidTheyReadIt.
• getnotify.
Memory Forensics
• Examination of volatile data in a computer’s memory dump.
RAM Artifacts
• Network connections.
• Running processes.
• Usernames and passwords.
• Dynamic link libraries.
• Contents of open windows.
• Open registry keys of processes.
• Open files of processes.
• Memory-resident malware.
Memory Forensics (Cont…)
RAM Analysis
• Tools:
• Volatility: Free and open-source.
• HBGary: Proprietary.
Forensic Tools
• Magnet RAM Capture.
• Belkasoft Live RAM Capturer
• MoonSols Dump.
• FTK Imager.
Building Forensic Computing Lab
Requirements:
• A log register should be maintained at the entrance of the lab as a layer of monitoring for protection.
• The lab area should be secured by cipher combination locks to ensure that the chain of custody is maintained.
• The lab should be equipped with fire safety measures.
• The work area should be equipped with necessary infrastructure such as work tables, chairs, and storage capability.
• The evidence storage area should have a strongly constructed metal shelf, be non-destructive, and fire-proof.
• A forensic toolkit should contain disassembly and removal tools, packaging and transport supplies, etc., enabling the examiner to collect evidence from the crime scene.
Building Forensic Computing Lab (Cont…)
• A forensic lab should have the following: workstations, UPS, book racks with necessary reference materials, necessary software and tools, safe locker, LAN, and Internet connectivity.
• Necessary hardware equipment.
• Necessary software.
• Internet connectivity with sufficient bandwidth for the workstations is necessary.
• Multiple forensic tools as required.
Incident and Incident Handling
Incident
• An event or set of events that threaten the security of computing systems and the network.
Incident Handling
Incident and Incident Handling (Cont…)
Incident Reporting
• Report the incident to the CERT Coordination Center, a law enforcement agency or CSIRT.
Incident Response
• Ascertain the affected resources.
• Assess the incident.
• Assign a unique identity to the event.
• The incident investigator and coordinator (IIC) coordinates with the task force.
• Collect the information related to the evidence.
• Perform forensic analysis.
Computer Security Incident Response Team
• Service organization.
• Receives reports, and reviews.
• Responds to computer security incidents.
• Members:
• Incident investigator and coordinator (IIC).
• Incident liaison (IL).
• Senior system manager.
• Information system security officer.
Forensic Readiness
• Incident response procedures in place along with trained personnel to handle any investigation.
Using Data Mining Techniques in Cyber Security Solutions
Data mining is the process of identifying patterns in large datasets. Data mining techniques are heavily used in scientific research (in order to process large amounts of raw scientific data) as well as in business, mostly to gather statistics and valuable information to enhance customer relations and marketing strategies.
Data mining has also proven a useful tool in cyber security solutions for discovering vulnerabilities and gathering indicators for baselining.
The process of data mining
What is data mining? In general, it is a process that involves analyzing information, predicting future trends, and making proactive, knowledge-based decisions based on large datasets.
While the term data mining is usually treated as a synonym for Knowledge Discovery in Databases (KDD), it’s actually just one of the steps in this process. The main goal of KDD is to obtain useful and often previously unknown information from large sets of data.
The entire KDD process includes four steps:
Pre-processing – selecting, cleaning, and integrating data
Transformation – transforming information and consolidating it into forms appropriate for mining
Mining – collecting, extracting, analyzing, and statistically processing data
Pattern evaluation – identifying new and unusual patterns and presenting the knowledge gained from data mining
Data mining helps you find new interesting patterns, extract hidden (yet useful and valuable) information, and identify unusual records and dependencies from large databases. To obtain valuable knowledge, data mining uses methods from statistics, machine learning, artificial intelligence (AI), and database systems.
In recent years, many IT industry giants such as Comodo, Symantec, and Microsoft have started using data mining techniques for malware detection.
Data mining methods
Many methods are used for mining big data, but the following eight are the most common:
Association rules help find possible relations between variables in databases, discover hidden patterns, and identify variables and the frequencies of their occurrence.
Classification assigns the items of a large dataset to predefined classes or groups.
Clustering helps identify data items that have similar characteristics and understand similarities and differences among data.
The decision tree technique creates classification and regression models in the form of a tree structure.
The neural network technique is used to model complex relationships between inputs and outputs and to discover new patterns.
Regression analysis is used for predicting the value of one item based on the known values of other items in a dataset, by building a model of the relationship between dependent and independent variables.
Statistical techniques help find patterns and build predictive models.
Visualization helps discover new patterns and presents the results in a way that users can easily understand.
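Association rules are usually scored by support (how often items occur together) and confidence (how often the right-hand item appears when the left-hand item does). The sketch below computes both for rules between pairs of items only, a simplified form of association rule mining rather than a full algorithm such as Apriori. The alert names, sessions, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: the set of alert types raised in each user session.
SESSIONS = [
    {"port_scan", "ssh_bruteforce"},
    {"port_scan", "ssh_bruteforce", "new_admin_user"},
    {"port_scan"},
    {"ssh_bruteforce", "new_admin_user"},
    {"port_scan", "ssh_bruteforce"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return rules X -> Y over item pairs as (X, Y, support, confidence)."""
    n = len(transactions)
    item_count = Counter(item for t in transactions for item in t)
    pair_count = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # share of transactions containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # share of X-transactions that also contain Y
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


for x, y, s, c in pair_rules(SESSIONS):
    print(f"{x} -> {y}  support={s:.2f} confidence={c:.2f}")
```

With this sample data, every session that creates a new admin user also shows SSH brute forcing (confidence 1.0), while the reverse rule falls below the confidence threshold, which shows why rules are directional.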
You can apply one or several data mining methods to create an efficient model that will ensure successful detection of attacks.
Data mining for malware detection
Data mining is one of four methods commonly used today to detect malware. The other three are scanning, activity monitoring, and integrity checking.
When building a security app, developers use data mining methods to improve the speed and quality of malware detection as well as to increase the number of detected zero-day attacks.
Malware detection strategies
There are three strategies for detecting malware:
Anomaly detection
Misuse detection
Hybrid detection
Anomaly detection involves modeling the normal behavior of a system or network in order to identify deviations from normal usage patterns. Anomaly-based techniques can detect even previously unknown attacks and can be used for defining signatures for misuse detectors.
The main problem with anomaly detection is that any deviation from the norm, even if it is a legitimate behavior, will be reported as an anomaly, thus producing a high rate of false positives.
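A very simple form of anomaly detection models "normal" as the mean and standard deviation of a baseline and flags values far outside it. This is a sketch assuming a single numeric feature (hypothetical logins per hour) and a 3-standard-deviation threshold; real systems model many features at once.

```python
import statistics

# Hypothetical baseline: logins per hour observed during normal operation.
NORMAL_LOGINS_PER_HOUR = [10, 12, 11, 9, 13, 10, 12, 11]


def build_profile(normal_values):
    """Model 'normal' as the mean and sample standard deviation of the baseline."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_anomalous(value, profile, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, std = profile
    return abs(value - mean) > threshold * std


profile = build_profile(NORMAL_LOGINS_PER_HOUR)
for observed in (14, 16, 40):
    print(observed, is_anomalous(observed, profile))
```

Both 16 and 40 are flagged. If 16 was simply a busy but legitimate hour, it is a false positive, which is exactly the weakness described above.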
Misuse detection, also known as signature-based detection, identifies only known attacks based on examples of their signatures. This technique has a lower rate of false positives but can’t detect zero-day attacks.
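At its core, misuse detection checks whether a sample contains a known pattern. The sketch below uses plain byte-substring matching against a hypothetical signature dictionary; the signature names and byte patterns are invented for illustration and are not real malware signatures.

```python
# Hypothetical signature database: name -> byte pattern.
# These are made-up patterns for illustration, not real malware signatures.
SIGNATURES = {
    "Demo.Dropper.A": bytes.fromhex("deadbeef0102"),
    "Demo.Keylogger.B": b"GetAsyncKeyState\x00hook",
}


def scan(data: bytes, signatures=SIGNATURES):
    """Return the names of all signatures whose byte pattern occurs in `data`."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


known = b"MZ\x90\x00" + bytes.fromhex("deadbeef0102") + b"payload"
variant = b"MZ\x90\x00" + bytes.fromhex("deadbeef0103") + b"payload"  # one byte changed
print(scan(known))    # ['Demo.Dropper.A']
print(scan(variant))  # []
```

Changing a single byte is enough to evade the exact-match signature, which illustrates why this technique cannot detect zero-day or modified malware.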
A hybrid approach combines anomaly and misuse detection techniques in order to increase the number of detected intrusions while decreasing the number of false positives. Rather than modeling only normal behavior or only known signatures, it uses information from both harmful and clean programs to create a classifier – a set of rules or a detection model generated by the data mining algorithm. The anomaly detection component then searches for deviations from the normal profile, while the misuse detection component looks for malware signatures in the code.
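The final sentence above, with one component checking signatures and the other checking deviations from a normal profile, can be sketched as a single verdict function. This is a simplification, assuming one byte-pattern signature check and one numeric behavioral feature (hypothetical, e.g. outbound connections per minute); it does not show how a real hybrid system learns its classifier from harmful and clean programs.

```python
import statistics


def hybrid_verdict(sample, feature_value, signatures, baseline, threshold=3.0):
    """Hypothetical hybrid detector combining a misuse check and an anomaly check.

    sample        -- raw bytes of the file under inspection
    feature_value -- one behavioral measurement (e.g. outbound connections/min)
    signatures    -- dict of name -> byte pattern (misuse component)
    baseline      -- values of the same feature observed on clean programs
    """
    # Misuse component: a known signature gives a confident verdict.
    hits = sorted(name for name, pattern in signatures.items() if pattern in sample)
    if hits:
        return ("known-malware", hits)
    # Anomaly component: deviation from the normal profile of clean programs.
    mean, std = statistics.mean(baseline), statistics.stdev(baseline)
    if abs(feature_value - mean) > threshold * std:
        return ("anomalous", [])
    return ("clean", [])


sigs = {"Demo.A": b"\xde\xad"}
baseline = [10, 12, 11, 9, 13, 10, 12, 11]
print(hybrid_verdict(b"\x00\xde\xad", 11, sigs, baseline))
print(hybrid_verdict(b"\x00", 40, sigs, baseline))
```

Running the signature check first keeps false positives low for known malware, while the anomaly check still gives a chance to catch samples that no signature covers.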
Detection process
When using data mining, malware detection consists of two steps:
Extracting features
Classifying/clustering
In the first step, various features such as API calls, n-grams, binary strings, and program behaviors are extracted to capture the characteristics of the file samples. Feature extraction can be performed through static analysis (without running the potentially harmful software), dynamic analysis (actually running it), or a hybrid approach that combines the two.
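Two of the static features mentioned above, byte n-grams and printable strings, can be extracted without executing the file. This is a minimal sketch; the function names, the default n-gram length of 2, and the minimum string length of 4 are illustrative choices, not values from the source.

```python
import re
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a file's raw content."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Extract runs of printable ASCII characters, similar to the `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [s.decode("ascii") for s in re.findall(pattern, data)]


sample = b"MZ\x90\x00\x00kernel32.dll\x00GetProcAddress\x00"
print(byte_ngrams(sample).most_common(3))
print(printable_strings(sample))  # ['kernel32.dll', 'GetProcAddress']
```

Strings such as imported DLL and API names often hint at a program's capabilities, while n-gram counts give a fixed-form numeric representation that classifiers can work with.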
In the second step, file samples are sorted into groups based on the extracted features, using either classification or clustering techniques.
To classify file samples, you need to build a classification model (a classifier) using classification algorithms such as RIPPER, Decision Tree (DT), Artificial Neural Network (ANN), Naive Bayes (NB), or Support Vector Machines (SVM). Clustering is used for grouping malware samples that have similar characteristics.
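Clustering malware samples with similar characteristics can be illustrated by comparing sets of extracted features. The sketch below assumes each sample is represented by a hypothetical set of API calls, measures similarity with the Jaccard index, and merges any two samples above a threshold (single-linkage grouping). Sample names, API sets, and the 0.5 threshold are illustrative.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(samples: dict, threshold: float = 0.5) -> list:
    """Group samples so that any pair with similarity >= threshold shares a group."""
    names = list(samples)
    parent = {name: name for name in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(samples[a], samples[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical API-call sets extracted from four samples.
SAMPLES = {
    "s1": {"CreateFileA", "WriteFile", "RegSetValueExA", "InternetOpenA"},
    "s2": {"CreateFileA", "WriteFile", "RegSetValueExA", "InternetOpenUrlA"},
    "s3": {"GetAsyncKeyState", "SetWindowsHookExA", "WriteFile"},
    "s4": {"GetAsyncKeyState", "SetWindowsHookExA", "InternetOpenA"},
}
print(cluster(SAMPLES))  # [['s1', 's2'], ['s3', 's4']]
```

No labels are needed: the two file-writing samples and the two keyboard-hooking samples end up in separate groups purely because of shared features.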
Using machine learning techniques, each classification algorithm constructs a model that represents both the benign and the malicious class. Training a classifier on a collection of labeled benign and malicious file samples makes it possible to detect even newly released malware.
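Of the algorithms listed, Naive Bayes is short enough to write out in full. This is a minimal multinomial Naive Bayes with add-one smoothing, assuming each sample is represented as a list of hypothetical API-call tokens; the training data is invented, and a real classifier would need far more samples and careful feature selection.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over feature tokens."""

    def fit(self, samples, labels):
        self.classes = sorted(set(labels))
        self.log_priors, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            docs = [s for s, label in zip(samples, labels) if label == c]
            self.log_priors[c] = math.log(len(docs) / len(samples))
            self.counts[c] = Counter(token for doc in docs for token in doc)
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def predict(self, sample):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_priors[c]
            for token in sample:
                if token in self.vocab:  # ignore features never seen in training
                    score += math.log((self.counts[c][token] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical labeled training samples (API calls observed per file).
TRAIN = [
    ["CreateFileA", "ReadFile", "CloseHandle"],
    ["CreateFileA", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExA", "RegQueryValueExA", "CloseHandle"],
    ["GetAsyncKeyState", "SetWindowsHookExA", "InternetOpenA"],
    ["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
    ["SetWindowsHookExA", "InternetOpenA", "HttpSendRequestA"],
]
LABELS = ["benign", "benign", "benign", "malicious", "malicious", "malicious"]

model = NaiveBayes().fit(TRAIN, LABELS)
print(model.predict(["SetWindowsHookExA", "InternetOpenA", "WriteFile"]))  # malicious
```

The unseen sample is classified as malicious even though it matches no training sample exactly, which is the property that lets a trained classifier flag new malware.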
Note that the effectiveness of data mining techniques for malware detection critically depends on the features you extract and the categorization techniques you use.
Data mining for intrusion detection
Aside from detecting malicious code, data mining can be used effectively to detect intrusions by analyzing audit data for anomalous patterns. Malicious intrusions may target networks, databases, servers, web clients, and operating systems.
There are two types of intrusion attacks you can detect using data mining methods:
Host-based attacks, when the intruder focuses on a particular machine or a group of machines
Network-based attacks, when the intruder attacks the entire network (for instance, causing a buffer overflow)
To detect host-based attacks, you need to analyze features extracted from programs, while to detect network-based attacks, you need to analyze network traffic. And just like with malware detection, you can look for either anomalous behavior or cases of misuse.
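For host-based anomaly detection, one well-known approach records the short sequences of system calls a program makes during normal runs and then measures how many sequences in a new trace were never seen before. This is a sketch assuming traces are lists of system-call names and a window length of 3; the traces themselves are hypothetical.

```python
def windows(trace, k=3):
    """All contiguous length-k windows of a system-call trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def train(normal_traces, k=3):
    """Build the 'normal' database of windows seen during normal runs."""
    db = set()
    for trace in normal_traces:
        db.update(windows(trace, k))
    return db


def mismatch_rate(trace, db, k=3):
    """Fraction of windows in `trace` that never occurred in normal behavior."""
    w = windows(trace, k)
    if not w:
        return 0.0
    return sum(1 for x in w if x not in db) / len(w)


NORMAL = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "close"],
]
db = train(NORMAL)
print(mismatch_rate(["open", "read", "write", "close"], db))              # 0.0
print(mismatch_rate(["open", "read", "execve", "socket", "close"], db))   # 1.0
```

A high mismatch rate suggests the program is behaving unlike any normal run, for example after a successful exploit spawns a shell; a threshold on this rate would decide when to raise an alert.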
Data mining for fraud detection
Fraudulent activities can be detected with the help of supervised and unsupervised learning.
With supervised learning, all available records are labeled as either fraudulent or non-fraudulent, and these labeled records are then used to train a model to detect possible fraud. The main drawback of this method is its inability to detect new types of fraud. Unsupervised learning methods do not require labeled records; they help identify privacy and security issues by finding records that do not fit the patterns in the rest of the data.
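An unsupervised check needs no fraud labels at all. The sketch below flags transaction amounts whose modified z-score, based on the median absolute deviation (MAD), exceeds 3.5, a common rule of thumb for outlier detection. It assumes a single account's history of amounts (hypothetical data) and is only one of many possible unsupervised techniques.

```python
import statistics


def flag_outliers(amounts, cutoff=3.5):
    """Flag amounts whose MAD-based modified z-score exceeds `cutoff`."""
    med = statistics.median(amounts)
    mad = statistics.median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # no spread in the data; this simple rule cannot score anything
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > cutoff]


history = [20, 25, 22, 30, 18, 24, 950]  # hypothetical card transactions
print(flag_outliers(history))  # [950]
```

The median and MAD are used instead of the mean and standard deviation because a single large fraudulent amount would otherwise inflate the baseline and partly hide itself.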
Data mining pros and cons
Using data mining in cyber security lets you:
process large datasets faster
create a unique and effective model for each particular use case
apply certain data mining techniques to detect zero-day attacks
While this list of the benefits is impressive, there are also certain drawbacks you need to know about:
Data mining is complex, resource-intensive, and expensive
Building an appropriate classifier may be a challenge
Potentially malicious files need to be inspected manually
Classifiers need to be constantly updated to include samples of new malware
There are certain data mining security issues, including the risk of unauthorized disclosure of sensitive information
Data mining helps you quickly analyze huge datasets and automatically discover hidden patterns, which is crucial when it comes to creating an effective anti-malware solution that’s able to detect previously unknown threats. However, the final result of using data mining methods always depends on the quality of data you use.
When using data mining in cyber security, it’s crucial to use only high-quality data. However, preparing databases for analysis requires a lot of time, effort, and resources. You need to clear your records of duplicate, false, and incomplete information before working with them, because missing information, duplicate records, or errors can significantly decrease the effectiveness of even complex data mining techniques. Only accurate and complete data can ensure a high-quality analysis.
Conclusion
Data mining has great potential as a malware detection tool. It allows you to analyze huge sets of information and extract new knowledge from it.
The main benefit of using data mining techniques for detecting malicious software is the ability to identify both known and zero-day attacks. However, since previously unknown but legitimate activity may also be flagged as potentially malicious, the rate of false positives can be high.