BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection
2009/6/22
Reporter: Fong-Ruei, Li
Machine Learning and Bioinformatics Lab
Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee. In Proceedings of the 17th USENIX Security Symposium (Security'08), San Jose, CA, 2008.
Outline
Introduction
BotMiner: Detection Framework
    Problem statement
    Architecture overview
Experiments
Conclusion
Introduction
Botnets are becoming one of the most serious threats to Internet security, enabling attacks such as spam, DDoS …
A botnet is a network of compromised machines (bots) under the influence of malware, controlled by a botmaster.
Introduction
Most current botnet detection approaches work only on a specific botnet command and control (C&C) protocol (e.g., IRC) or a specific structure (e.g., centralized).
Introduction
Almost all of these approaches are designed to detect botnets that use IRC- or HTTP-based C&C.
Rishi is designed to detect IRC botnets using known bot nickname patterns as signatures.
BotSniffer, another recent system, is designed to detect C&C activities with centralized servers.
Introduction
We need to develop a next-generation botnet detection system that is independent of both the C&C protocol and the botnet structure.
Problem Statement
A botnet is characterized by:
    C&C communication channel
    Malicious activities
Botnet structure:
    Centralized
    P2P
Assumptions
We assume that bots within the same botnet will be characterized by similar malicious activities and similar C&C communications
Architecture overview
[Figure: BotMiner architecture. Labeled stages: clustering similar malicious activities, clustering similar communication, cross-checking.]
C-plane Monitor
The C-plane monitor captures network flows and records information on who is talking to whom.
We limit our interest to TCP and UDP flows.
Each flow record contains the following information:
    Time and duration
    Source and destination IP address and port
    Number of packets
    Bytes transferred
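The flow record listed above can be pictured as a small data structure. The following is only a sketch: the class name `FlowRecord`, its field names, the units, and the helper `is_monitored_protocol` are illustrative assumptions, not the format used by the BotMiner authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One C-plane flow record (all field names are hypothetical)."""
    start_time: float        # flow start, in seconds (assumed unit)
    duration: float          # flow duration, in seconds (assumed unit)
    protocol: str            # "TCP" or "UDP"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packets: int             # number of packets in the flow
    bytes_transferred: int   # total bytes in the flow


def is_monitored_protocol(record: FlowRecord) -> bool:
    """Keep only TCP and UDP flows, as the slide restricts interest to them."""
    return record.protocol in ("TCP", "UDP")


# Example record with made-up values from documentation address ranges.
example = FlowRecord(
    start_time=0.0, duration=12.5, protocol="TCP",
    src_ip="192.168.1.5", src_port=40000,
    dst_ip="198.51.100.7", dst_port=6667,
    packets=20, bytes_transferred=1800,
)
```

A frozen dataclass is used here so that records can serve as dictionary keys or set members if a later step needs that.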
A-plane Monitor
The A-plane monitor logs information on who is doing what.
It analyzes outbound traffic through the monitored network and detects several malicious activities that internal hosts may perform.
C-plane Clustering
Responsible for:
    Reading the logs generated by the C-plane monitor
    Finding clusters of machines that share similar communication patterns
C-plane Clustering-Flow Chart
[Figure: C-plane clustering flow chart]
Filter out irrelevant traffic flows
C-plane Clustering-Basic Filtering
Filter Rule 1 (F1): Ignore flows that do not go directly from internal hosts to external hosts.
Filter Rule 2 (F2): Ignore flows that contain only one-way traffic.
C-plane Clustering-White Listing
Filter Rule 3 (F3): Ignore flows whose destinations are well-known legitimate servers (e.g., Google, Yahoo!).
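The three filter rules can be sketched as simple predicates over flow records. This is a minimal illustration under several assumptions: the `Flow` type, the per-direction packet counters, the internal network range `INTERNAL_NET`, and the `WHITELIST` entries are all hypothetical and do not come from the slides.

```python
import ipaddress
from typing import List, NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record with per-direction packet counts."""
    src_ip: str
    dst_ip: str
    packets_out: int  # packets sent from source to destination
    packets_in: int   # packets sent back from destination to source


# Assumed address range of the monitored (internal) network.
INTERNAL_NET = ipaddress.ip_network("192.168.0.0/16")
# Assumed whitelist of well-known legitimate server addresses.
WHITELIST = {"203.0.113.10"}


def is_internal(ip: str) -> bool:
    return ipaddress.ip_address(ip) in INTERNAL_NET


def passes_f1(flow: Flow) -> bool:
    """F1: keep only flows from an internal host to an external host."""
    return is_internal(flow.src_ip) and not is_internal(flow.dst_ip)


def passes_f2(flow: Flow) -> bool:
    """F2: drop flows that contain only one-way traffic."""
    return flow.packets_out > 0 and flow.packets_in > 0


def passes_f3(flow: Flow) -> bool:
    """F3: drop flows whose destination is on the whitelist."""
    return flow.dst_ip not in WHITELIST


def filter_flows(flows: List[Flow]) -> List[Flow]:
    """Apply F1, F2, and F3 in order and return the surviving flows."""
    return [f for f in flows if passes_f1(f) and passes_f2(f) and passes_f3(f)]
```

For example, given one internal-to-external two-way flow, one internal-to-internal flow, one flow with no return packets, and one flow to a whitelisted address, only the first flow would survive `filter_flows`.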
C-plane Clustering-Aggregation (C-Flow)
Aggregate related flows into communication flows (C-flows).
Within a given period (the epoch E), all m TCP/UDP flows that share the same protocol, source IP, destination IP, and port are aggregated into the same C-flow:
$C_i = \{f_j\}_{j=1..m}$
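The aggregation step amounts to grouping flows by a shared key. The sketch below assumes that "port" means the destination port and that the period is one day long (86400 seconds); the `Flow` type and the function name `aggregate_c_flows` are likewise illustrative, not taken from the slides.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """Hypothetical flow record holding only the fields needed for grouping."""
    protocol: str      # "TCP" or "UDP"
    src_ip: str
    dst_ip: str
    dst_port: int      # assumption: "port" refers to the destination port
    start_time: float  # seconds


CFlowKey = Tuple[str, str, str, int]


def aggregate_c_flows(
    flows: List[Flow],
    period_start: float,
    period_length: float = 86400.0,  # assumed period length: one day
) -> Dict[CFlowKey, List[Flow]]:
    """Group flows that start within the period by
    (protocol, source IP, destination IP, destination port)."""
    c_flows: Dict[CFlowKey, List[Flow]] = defaultdict(list)
    period_end = period_start + period_length
    for f in flows:
        if period_start <= f.start_time < period_end:
            key = (f.protocol, f.src_ip, f.dst_ip, f.dst_port)
            c_flows[key].append(f)
    return dict(c_flows)
```

Each value in the returned dictionary corresponds to one C-flow, that is, the list of its m member flows.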
C-plane Clustering-Vector representation
Extract a number of statistical features from each C-flow $C_i$.
Translate them into d-dimensional pattern vectors:
$P_i \in \mathbb{R}^d$
C-plane Clustering-Vector representation
Discrete sample distributions of four random variables:
1. The number of flows per hour (fph). fph is computed by counting the number of TCP/UDP flows in $C_i$ that are present in each hour of the epoch E.
2. The number of packets per flow (ppf). ppf is computed by summing the total number of packets sent within each TCP/UDP flow in $C_i$.
3. The average number of bytes per packet (bpp). For each TCP/UDP flow $f_j \in C_i$, we divide the overall number of bytes transferred within $f_j$ by the number of packets sent within $f_j$.
4. The average number of bytes per second (bps). bps is computed as the total number of bytes transferred within each $f_j \in C_i$ divided by the duration of $f_j$.
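The four variables can be computed from the flows of a single C-flow roughly as follows. This is a sketch under stated assumptions: the `Flow` type and the function name `c_flow_samples` are hypothetical, a flow is counted in the hour in which it starts (a simplification of "present in each hour"), and flows with zero packets or zero duration are skipped for bpp and bps to avoid division by zero. It returns raw samples only; turning them into a fixed-length d-dimensional pattern vector is not shown.

```python
from typing import Dict, List, NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record with the fields the four features need."""
    start_time: float        # seconds
    duration: float          # seconds
    packets: int
    bytes_transferred: int


def c_flow_samples(
    flows: List[Flow], epoch_start: float, epoch_hours: int = 24
) -> Dict[str, List[float]]:
    """Compute the fph, ppf, bpp, and bps samples for one C-flow."""
    # fph: number of flows per hour of the epoch (counted by start hour).
    fph = [0] * epoch_hours
    for f in flows:
        hour = int((f.start_time - epoch_start) // 3600)
        if 0 <= hour < epoch_hours:
            fph[hour] += 1

    # ppf: total number of packets in each flow.
    ppf = [f.packets for f in flows]

    # bpp: bytes transferred divided by packets, per flow.
    bpp = [f.bytes_transferred / f.packets for f in flows if f.packets > 0]

    # bps: bytes transferred divided by duration, per flow.
    bps = [f.bytes_transferred / f.duration for f in flows if f.duration > 0]

    return {"fph": fph, "ppf": ppf, "bpp": bpp, "bps": bps}
```

Each list is one set of samples for the corresponding random variable; their discrete distributions are what the slides describe as the basis of the pattern vector.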