BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. Guofei Gu (1,2), Roberto Perdisci (3), Junjie Zhang (1), and Wenke Lee (1). (1) Georgia Tech, (2) Texas A&M University, (3) Damballa, Inc.
2008-7-31 Guofei Gu BotMiner
BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection
1 Georgia Tech, 2 Texas A&M University, 3 Damballa, Inc.
Roadmap
• Introduction
  – Botnet problem
  – Challenges for botnet detection
  – Related work
• BotMiner
  – Motivation
  – Design
  – Evaluation
• Conclusion
What Is a Bot/Botnet?
• Bot
  – A malware instance that runs autonomously and automatically on a compromised computer (zombie) without owner’s consent
  – Profit-driven, professionally written, widely propagated
• Botnet (Bot Army): network of bots controlled by criminals
  – Definition: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
  – Architecture: centralized (e.g., IRC, HTTP), distributed (e.g., P2P)
  – “25% of Internet PCs are part of a botnet!” (Vint Cerf)
[Figure: diagram with labels: Botmaster, C&C, bot]
Botnets are used for …
• All DDoS attacks
• Spam
• Click fraud
• Information theft
• Phishing attacks
• Distributing other malware, e.g., spyware
Challenges for Botnet Detection
• Bots are stealthy on the infected machines
  – We focus on a network-based solution
• Bot infection is usually a multi-faceted and multi-phased process
  – Looking at only one specific aspect is likely to fail
• Bots are dynamically evolving
  – Static and signature-based approaches may not be effective
• Botnets can have very flexible design of C&C channels
  – A solution very specific to a botnet instance is not desirable
Why Are Existing Techniques Not Enough?
• Traditional AV tools
  – Bots use packers, rootkits, and frequent updates to easily defeat AV tools
• Traditional IDS/IPS
  – Look at only one specific aspect
  – Do not have a big picture
• Honeypot
  – Not a good botnet detection tool
Existing Botnet Detection Work
• [Binkley, Singh 2006]: IRC-based bot detection combining IRC statistics and TCP work weight
• Traffic arrives at high rates
  – High volume
  – Some analysis scales with the size of the input
• Possible approaches
  – Random packet sampling
  – Targeted packet sampling
Approach
• Idea: Bias sampling of traffic towards subpopulations based on conditions of traffic
• Two modules
  – Counting: Count statistics of each traffic flow
  – Sampling: Sample packets based on (1) overall target sampling rate (2) input conditions
[Figure: block diagram with labels: Traffic stream, Counting, Sampling, Input conditions, Overall sampling rate, Instantaneous sampling probability, Traffic subpopulations]
Challenges
• How to specify subpopulations?
  – Solution: multi-dimensional array specification
• How to maintain counts for each subpopulation?
  – Solution: rotating array of counting Bloom filters
• How to derive instantaneous sampling probabilities from overall constraints?
  – Solution: multi-dimensional counter array, and scaling based on target rates
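As a rough illustration of the second solution, the sketch below keeps approximate per-key counts in a small ring of counting Bloom filters, so that old counts age out when the ring rotates. The class names, the filter size, the number of hash functions, the window count, and the use of salted SHA-256 are all illustrative assumptions; the slides do not describe the actual data-structure parameters.

```python
import hashlib


class CountingBloomFilter:
    """Approximate counter whose estimates never undercount a key."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, key):
        # Derive num_hashes positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def estimate(self, key):
        # The smallest touched counter is an upper bound on the true count.
        return min(self.counters[idx] for idx in self._indexes(key))


class RotatingCounters:
    """Ring of filters: rotating discards the oldest window's counts."""

    def __init__(self, num_windows=4, **filter_args):
        self.filter_args = filter_args
        self.filters = [CountingBloomFilter(**filter_args) for _ in range(num_windows)]

    def add(self, key):
        self.filters[-1].add(key)  # only the newest window is updated

    def rotate(self):
        self.filters.pop(0)
        self.filters.append(CountingBloomFilter(**self.filter_args))

    def estimate(self, key):
        return sum(f.estimate(key) for f in self.filters)
```

Calling `rotate()` on a timer (for example once per measurement interval) would bound how far back the counts reach; the interval length is a tuning choice not covered here.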
Specifying Subpopulations
• Idea: Use concatenation of header fields (“tuples”) as a “key” for a subpopulation
  – These keys specify a group of packets that will be counted together
# base sampling rate
sampling_rate = 0.01
# number of tuples
tuples = 2
# number of conditions
conditions = 1
# tuple definitions
tuple_1 := srcip.dstip
tuple_2 := srcip.srcport.dstport
# condition : sampling budget
tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5
Count groups of packets with the same source and destination IP address
Count groups of packets with the same source IP, source port, and destination port
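The key construction can be sketched in a few lines. This is a minimal illustration that uses exact counters and dictionary-shaped packets; the `make_key` and `count_packets` helpers, the `|` separator, and the sample packets are assumptions made for the example, not part of the tool.

```python
from collections import Counter

# Tuple definitions taken from the example configuration.
TUPLES = {
    "tuple_1": ("srcip", "dstip"),
    "tuple_2": ("srcip", "srcport", "dstport"),
}


def make_key(packet, fields):
    """Concatenate the chosen header fields into one subpopulation key."""
    return "|".join(str(packet[f]) for f in fields)


def count_packets(packets):
    """Return one exact counter per tuple definition."""
    counts = {name: Counter() for name in TUPLES}
    for pkt in packets:
        for name, fields in TUPLES.items():
            counts[name][make_key(pkt, fields)] += 1
    return counts


# Hypothetical packets, reduced to the header fields the tuples use.
packets = [
    {"srcip": "10.0.0.1", "dstip": "10.0.0.2", "srcport": 4000, "dstport": 22},
    {"srcip": "10.0.0.1", "dstip": "10.0.0.2", "srcport": 4000, "dstport": 23},
    {"srcip": "10.0.0.1", "dstip": "10.0.0.3", "srcport": 4000, "dstport": 22},
]
counts = count_packets(packets)
```

Here the first two packets share a `tuple_1` key (same source and destination IP), while the first and third share a `tuple_2` key (same source IP and ports).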
# base sampling rate
sampling_rate = 0.01
# number of tuples
tuples = 2
# number of conditions
conditions = 1
# tuple definitions
tuple_1 := srcip.dstip
tuple_2 := srcip.srcport.dstport
# condition : sampling budget
tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5
Sampling Rates for Subpopulations
• Operator specifies
  – Overall sampling rate
  – Conditional rate within each class
• Flexsample computes instantaneous sampling probabilities based on this
Sample one in 100 packets on average
Within the 1/100 “budget”, half of sampled packets should come from groups satisfying this condition
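One way to read these two constraints is as a pair of equations. If a fraction f of the traffic satisfies the condition, r is the overall rate, and b is the budget, then f·p_in = b·r and (1 − f)·p_out = (1 − b)·r. The sketch below solves for the two probabilities under that reading. The function name, the assumption that f is already known or estimated, and the capping at 1 are illustrative choices; the slides do not state the exact scaling rule.

```python
def sampling_probabilities(overall_rate, budget, class_fraction):
    """Return (p_in, p_out): per-packet sampling probabilities for packets
    inside and outside the class, given the class's share of the traffic."""
    if not 0.0 < class_fraction < 1.0:
        raise ValueError("class_fraction must be strictly between 0 and 1")
    # Sampled in-class packets should make up `budget` of all samples.
    p_in = min(1.0, budget * overall_rate / class_fraction)
    # The remaining samples are spread over the rest of the traffic.
    p_out = min(1.0, (1.0 - budget) * overall_rate / (1.0 - class_fraction))
    return p_in, p_out


# Example: 1-in-100 overall, half the budget for a class that is 2% of traffic.
p_in, p_out = sampling_probabilities(overall_rate=0.01, budget=0.5, class_fraction=0.02)
```

With these inputs, in-class packets are sampled far more aggressively than the rest, yet the expected overall rate stays at the configured 1 in 100. When a probability is capped at 1, the budget for that class cannot be fully met.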
Examining the Condition
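• Biases sampling towards packets from (source IP, destination IP) pairs which
  – Have sent more than 30 packets (tuple_1 in (30, inf])
  – Have sent at most 5 packets per (source IP, source port, destination port) group (tuple_2 in (0, 5]), so their traffic is spread across many distinct ports
• Application: Portscan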
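The condition itself is a conjunction of half-open interval tests on the per-tuple counts. A minimal sketch of that check follows; the `CONDITION` table, the helper names, and the shape of the `counts_for_packet` argument are assumptions made for illustration.

```python
import math


def in_half_open(value, low, high):
    """True when value lies in the interval (low, high]."""
    return low < value <= high


# Condition from the example: tuple_1 in (30, inf] AND tuple_2 in (0, 5].
CONDITION = {"tuple_1": (30, math.inf), "tuple_2": (0, 5)}


def matches(counts_for_packet, condition=CONDITION):
    """counts_for_packet maps each tuple name to the current count of the
    group that the packet's key falls into."""
    return all(
        in_half_open(counts_for_packet[name], low, high)
        for name, (low, high) in condition.items()
    )
```

A packet whose host pair has been seen 31 times and whose port group has been seen once would match, while a count of exactly 30 for `tuple_1` would not, because the interval is open at its lower end.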
Sampling Lookup Table
• Problem: Conditions may not be completely specified