Outline: Intro | The issues in general | Motivation | Solution | Experiments | Tools | eof()
Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools
Muhammad Najmi Ahmad Zabidi
International Islamic University Malaysia
MOSC 2012
Berjaya Times Square, Kuala Lumpur
9th July 2012
Muhammad Najmi Ahmad Zabidi MOSC 2012 1/34
About
• I am a graduate research student at Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
• My current employer is International Islamic University Malaysia, Kuala Lumpur
• Research area: malware detection, focusing on Windows executables
• Since 2003, I have been a Subversion (SVN) committer for the KDE localization project into Malay (though I rarely commit now... need a new intern to replace me :) )
The computing world as we know it
• Interconnected machines
• Previously less connected, now "socialized" machines
• This has brought real problems to the cyber world
Risks
• Financial loss
• Corporate/government-level espionage
• Privacy breach
Types of adversaries
• Spam
• Scam
• Phishing
• Malware, botnets, rootkits, etc.
• Anything else?
Spam
• Annoying
• Productivity wasted on deleting unwanted mail
• In extreme cases, important email becomes difficult to find
Scam
• Preying on naive victims
• Sounds too good to be true, yet some people still believe it
• Organized crime/syndicates, with (money) mules cooperating
Phishing
• Similar to scams, but uses different tactics
• More sophisticated, but needs no mules or physical meetups
• Main purpose is to obtain sensitive details, such as online banking login names and passwords, and hence access to the victim's account
• Safer for the criminal
Malware
• Safe to say, it covers trojans, viruses, dialers, rabbits, worms and rootkits (often bundled together nowadays)
• Has been infecting computers since the 1980s; the threat became more obvious with the arrival of the Internet
• Attacks any operating system: Linux, Windows, Mac... even Android phones
Problems with adversary detection
• Some are manually crafted, some automated
• They react relatively fast and are difficult to trace
• Too numerous (spam, for example), hence too time-consuming for manual work
In-house analysis
• Given enough expertise, in-house analysis can be useful
• Helps maintain reputation, with the organization's own group of analysts to handle incidents
• Minimize costs by using open source tools whenever possible
Categories
Machine Learning
• Associated with Artificial Intelligence
• Mimics human (brain) learning
• Learns through experience
• Deals with known and unknown patterns
• Overlaps with (or partly originated from) Data Mining and Pattern Recognition
Table 1: Differences between clustering and classification
Classification | Clustering
Deals with known (labelled) data | Deals with unknown (unlabelled) data
Supervised learning | Unsupervised learning
Popular algorithms include: Random Forest, Neural Networks, k-Nearest Neighbor, Decision Trees | Popular algorithms include: K-means, Fuzzy C-means, Gaussian
Predictive [Tan et al., 2005] | Descriptive [Tan et al., 2005]
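A minimal sketch in Python of the two columns of Table 1, assuming toy two-dimensional feature vectors (the feature names and the six data points are hypothetical, not from the talk). k-Nearest Neighbor stands in for the classification column because it needs the labels; k-means stands in for the clustering column because it never sees them.

```python
import math
from collections import Counter

# Hypothetical toy data: each mail reduced to two made-up features,
# (number of links, number of "money" words), with a known label.
LABELLED = [
    ((0.0, 0.0), "ham"),
    ((1.0, 0.0), "ham"),
    ((0.0, 1.0), "ham"),
    ((8.0, 9.0), "spam"),
    ((9.0, 8.0), "spam"),
    ((9.0, 9.0), "spam"),
]


def knn_predict(train, x, k=3):
    """Classification (supervised): majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans(points, initial_centroids, iterations=10):
    """Clustering (unsupervised): Lloyd's k-means on unlabelled points."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    print(knn_predict(LABELLED, (7.5, 8.0)))  # uses the labels
    points = [p for p, _ in LABELLED]  # labels thrown away
    print(kmeans(points, initial_centroids=[(0.0, 0.0), (1.0, 0.0)]))
```

The k-means run only discovers two groups; naming one of them "spam" still requires a human (or a labelled sample), which is the practical difference between the descriptive and predictive columns.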
What to look for?
• We look for patterns
• In some cases, a corpus of spam and phishing mails is already available
• We call these patterns "features"
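One common way to turn a mail corpus into "features" is a bag-of-words vector: each column counts one word from a fixed vocabulary. The sketch below assumes simple lower-case word tokenization; the two-mail corpus is hypothetical, and this is an illustration rather than the feature set used in the talk.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(corpus):
    """Assign each distinct token in the corpus a fixed column index (sorted order)."""
    tokens = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def to_feature_vector(text, vocabulary):
    """Bag-of-words: one count per vocabulary token; unseen tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, idx in vocabulary.items():
        vector[idx] = counts[tok]
    return vector


if __name__ == "__main__":
    corpus = ["You have won a prize", "Meeting moved to Monday"]  # hypothetical mails
    vocab = build_vocabulary(corpus)
    print(list(vocab))
    print(to_feature_vector("You WON, won!", vocab))
```

The vocabulary is built once from the training corpus and then frozen, so every later mail maps to a vector of the same length, which is what the algorithms in Table 1 expect.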
Spam/scam
• The language being used
• For example, phrases like "You have won GBP 100,000,000" in email notifications
• Spam bombards mailboxes; some of it may come from genuine businesses, but the volume is overwhelming to handle
• Scams ask people to bank in (deposit) money for false reasons
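The slides point at the language of the mail as the signal but do not name a classifier. As one illustration, the sketch below swaps in multinomial naive Bayes, a common textbook choice for spam filtering; it is not presented as the author's method, and the four training mails are hypothetical, so this only demonstrates the mechanics.

```python
import math
import re
from collections import Counter

# Hypothetical training mails, in the spirit of the examples on this slide.
TRAIN_DOCS = [
    "You have won GBP 100,000,000, claim your prize now",
    "Urgent: bank in the processing fee to claim your winnings",
    "Minutes of the faculty meeting attached",
    "Please review the lab report before Monday",
]
TRAIN_LABELS = ["spam", "spam", "ham", "ham"]


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over word counts, with Laplace (add-one) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        """log P(c) + sum of log P(token | c) over the tokens of text."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denominator = self.total_words[c] + len(self.vocabulary)
        for tok in tokenize(text):
            if tok in self.vocabulary:  # tokens never seen in training are skipped
                score += math.log((self.word_counts[c][tok] + 1) / denominator)
        return score

    def predict(self, text):
        """Return the class with the highest log score."""
        return max(self.classes, key=lambda c: self.log_score(text, c))


if __name__ == "__main__":
    clf = NaiveBayesSpamFilter().fit(TRAIN_DOCS, TRAIN_LABELS)
    print(clf.predict("Claim your prize now"))
    print(clf.predict("Please review the meeting minutes"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a single unseen-in-class word from forcing a probability of zero.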
Phishing mails
• Look for URLs
• Current efforts, for example by PhishTank, rely on public submissions and (I believe) manual verification
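As an automated complement to manual verification, simple lexical features can be computed from the URL string alone, without visiting the site. The sketch below uses only the Python standard library; the particular features and the SUSPICIOUS_WORDS list are hypothetical illustrative choices, not a vetted phishing feature set.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list; a real deployment would derive its own from a corpus.
SUSPICIOUS_WORDS = ("login", "verify", "account", "update", "bank")


def host_is_ip(host):
    """True if the host part is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Extract simple lexical features from a URL string (no network access)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "length": len(url),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip(host),
        "has_at_sign": "@" in url,  # "user@host" can disguise the real host
        "dots_in_host": host.count("."),
        "hyphens_in_host": host.count("-"),
        "suspicious_words": sum(word in lowered for word in SUSPICIOUS_WORDS),
    }


if __name__ == "__main__":
    print(url_features("http://203.0.113.7/secure-login/verify.php"))
    print(url_features("http://www.bank.com@evil.example/"))
```

In the second example the parser reports the host as evil.example, even though the URL visually starts with a bank's name; features like has_at_sign are meant to capture exactly that kind of disguise. The resulting dictionaries can be turned into vectors and fed to any classifier from Table 1.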
Malware
• Researchers tend to look at Application Programming Interface (API) calls; some look at opcodes
• Analysis is done using either static or dynamic analysis
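Whether the API calls come from static analysis or from a dynamic (sandbox) trace, one common representation is a count of contiguous n-grams of call names. The sketch assumes the trace has already been captured as a list of strings; the two traces are hypothetical (the Windows API names are real), and cosine similarity is shown as one possible way to compare two samples' profiles.

```python
import math
from collections import Counter


def api_ngrams(calls, n=2):
    """Count contiguous n-grams in a sequence of API call names."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical traces: a classic process-injection sequence and a plain file copy.
INJECTION_TRACE = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]
FILE_COPY_TRACE = ["CreateFileW", "ReadFile", "CreateFileW", "WriteFile", "CloseHandle"]

if __name__ == "__main__":
    print(api_ngrams(INJECTION_TRACE))
    print(cosine_similarity(api_ngrams(INJECTION_TRACE), api_ngrams(FILE_COPY_TRACE)))
```

Bigrams keep a little ordering information (VirtualAllocEx followed by WriteProcessMemory is more telling than either call alone) while still producing a fixed-format, sparse feature vector.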
An example
Figure 1: Automated classification proposed by [Rieck et al., 2009]
The datasets
• Spam email research has been going on for quite some time compared to phishing research