2009/9/15
Rishi: Identify Bot Contaminated Hosts by IRC Nickname Evaluation
Reporter: Fong-Ruei Li
Machine Learning and Bioinformatics Lab
In Proceedings of the USENIX Workshop on Hot Topics in Understanding Botnets (HotBots), 2007
Outline
- Introduction
- Background
- Communication Channel Detection
- Results and Evaluation
- Conclusion
2009/9/15 Machine Learning and Bioinformatics Lab
Introduction
Currently, the usual way to stop a botnet is to disable the communication channel its bots use. However, the hosts stay infected and in most cases remain backdoored, allowing an attacker to reclaim the machines at any time.
Background
- Internet Relay Chat (IRC): each IRC server hosts a number of chat rooms called channels.
- Every user connected to an IRC server has a unique username called a nickname.
Background
A common way for the botmaster to communicate with the botnet is IRC:
- the bots join a specific channel on a public or private IRC server
- there they receive further instructions
Communication Channel Detection
- All bots have one characteristic in common: they need a communication channel.
- Our approach focuses on detecting the communication channel between the bot and the botnet controller. This makes it possible to detect a bot even before it performs any malicious actions.
Project Rishi
From every captured packet, Rishi extracts:
- the time of the suspicious connection
- the IP address and port of the suspected source host
- the IP address and port of the destination IRC server
- the channels joined
- the nickname used
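These fields map naturally onto a small record type. The following is a minimal sketch, assuming the captured payload carries plain-text IRC commands (NICK, JOIN) in the RFC 1459 line format; the class IrcRecord and the function update_from_payload are hypothetical names, not Rishi's actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class IrcRecord:
    """Fields extracted per suspicious IRC connection (hypothetical layout)."""
    time: datetime
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    nickname: Optional[str] = None
    channels: List[str] = field(default_factory=list)


def update_from_payload(record: IrcRecord, payload: bytes) -> IrcRecord:
    """Fill in nickname and channels from NICK/JOIN commands in a TCP payload."""
    text = payload.decode("utf-8", errors="replace")
    for line in text.splitlines():
        tokens = line.split()
        # An IRC message may start with an optional ":prefix"; skip it.
        if tokens and tokens[0].startswith(":"):
            tokens = tokens[1:]
        if len(tokens) < 2:
            continue
        command, first_param = tokens[0].upper(), tokens[1].lstrip(":")
        if command == "NICK":
            record.nickname = first_param
        elif command == "JOIN":
            # JOIN takes a comma-separated channel list; channel keys follow after a space.
            record.channels.extend(first_param.split(","))
    return record
```

For example, a bot that sends `NICK RBOT|DEU|XP-1234` followed by `JOIN #bots key` yields a record with that nickname and the single channel `#bots` (the channel name here is made up).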
Network setup of Rishi
Basic Concept - Rishi
Scoring Function
The scoring function checks each nickname for several criteria, each worth 1 point:
- suspicious substrings, such as the name of a bot (e.g., RBOT or l33t-)
- special characters like [, ], and |
- long numbers: if the nickname contains many digits, 1 point for each two consecutive digits
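The 1-point criteria can be sketched as a single function. This is an illustration under stated assumptions: the substring list is hypothetical (RBOT and l33t- come from the slide, the country and OS tags mirror the worked example later on), and "each two consecutive digits" is read as non-overlapping digit pairs within a run, so 5678 counts twice.

```python
import re

# Hypothetical substring list; Rishi's real list is part of its configuration.
SUSPICIOUS_SUBSTRINGS = ["rbot", "l33t-", "deu", "chn", "usa", "xp"]
SPECIAL_CHARACTERS = set("[]|")


def basic_score(nickname: str) -> int:
    """Sum the 1-point criteria: substrings, special characters, digit pairs."""
    lowered = nickname.lower()
    # 1 point for each suspicious substring that occurs in the nickname.
    score = sum(1 for s in SUSPICIOUS_SUBSTRINGS if s in lowered)
    # 1 point for each occurrence of [, ] or |.
    score += sum(1 for ch in nickname if ch in SPECIAL_CHARACTERS)
    # 1 point for each non-overlapping pair of consecutive digits in a digit run.
    score += sum(len(run) // 2 for run in re.findall(r"[0-9]+", nickname))
    return score
```

With this list, `basic_score("RBOT|CHN|XP-5678")` gives 3 + 2 + 2 = 7, while an ordinary nickname such as `alice` scores 0.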
Scoring Function
True signs of an infected host raise the final score by more than one point:
- a match with one of the regular expressions
- a connection to a blacklisted server
- the use of a blacklisted nickname
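A minimal sketch of how these strong signals could be added on top of the basic score. The slides only say each signal is worth more than one point; the value 10 is taken from the regular-expression and blacklist examples in this deck, and applying it to blacklisted servers is an assumption, as are the names StrongSignals and strong_signal_points.

```python
from dataclasses import dataclass

# Hypothetical weights: 10 matches the regex and blacklist values in the deck;
# the server weight is an illustrative assumption.
REGEX_POINTS = 10
BLACKLISTED_SERVER_POINTS = 10
BLACKLISTED_NICK_POINTS = 10


@dataclass
class StrongSignals:
    regex_match: bool = False
    blacklisted_server: bool = False
    blacklisted_nickname: bool = False


def strong_signal_points(signals: StrongSignals) -> int:
    """Add the large bonuses that indicate a true sign of infection."""
    points = 0
    if signals.regex_match:
        points += REGEX_POINTS
    if signals.blacklisted_server:
        points += BLACKLISTED_SERVER_POINTS
    if signals.blacklisted_nickname:
        points += BLACKLISTED_NICK_POINTS
    return points
```

Keeping the weights as named constants makes it easy to tune them separately from the 1-point criteria.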
Regular Expression
Each nickname is tested against several regular expressions that match known bot names; a match adds 10 points.
For example, the expression \[[0-9]\|[0-9]{4,} matches a nickname like [0|1234]: an opening bracket and one digit, followed by the | character and a run of at least four digits (here |1234).
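The regular-expression check can be sketched with Python's re module. Only the pattern comes from the slide; the pattern list name and the 10-point constant are illustrative, and a real deployment would load many more patterns for known bot families.

```python
import re

# Only this pattern is taken from the slide; further patterns are hypothetical.
BOT_NICK_PATTERNS = [re.compile(r"\[[0-9]\|[0-9]{4,}")]
REGEX_MATCH_POINTS = 10


def regex_points(nickname: str) -> int:
    """Return 10 points if any known-bot pattern occurs anywhere in the nickname."""
    if any(p.search(nickname) for p in BOT_NICK_PATTERNS):
        return REGEX_MATCH_POINTS
    return 0
```

Because `search` looks anywhere in the string, `[0|1234]` scores 10, while `|1234` alone scores 0: the pattern requires the leading bracket and digit.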
Whitelisting
The software uses two whitelists: a hard-coded whitelist and a dynamic whitelist.
Each nickname that receives zero points is added to the dynamic whitelist.
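A minimal sketch of the two whitelists, assuming nicknames are compared by exact string equality; the class and method names are hypothetical.

```python
from typing import Iterable, Set


class Whitelist:
    """Hard-coded plus dynamic whitelist of nicknames (illustrative sketch)."""

    def __init__(self, hard_coded: Iterable[str]) -> None:
        self.hard_coded: Set[str] = set(hard_coded)
        self.dynamic: Set[str] = set()

    def contains(self, nickname: str) -> bool:
        return nickname in self.hard_coded or nickname in self.dynamic

    def record_score(self, nickname: str, score: int) -> None:
        # A nickname that scored zero points is remembered as benign.
        if score == 0:
            self.dynamic.add(nickname)
```

Checking `contains` before scoring lets repeated benign nicknames skip the scoring step entirely.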
Blacklisting
There are two blacklists:
- the first blacklist is hard-coded in the configuration file
- the second is a dynamic list to which nicknames are added automatically according to their final score
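The blacklists can be sketched the same way. The slides do not give the score at which a nickname enters the dynamic blacklist, so the threshold of 10 below is a hypothetical value, as are the class and method names.

```python
from typing import Iterable, Set

# Hypothetical cut-off; the slides do not state Rishi's actual value.
BLACKLIST_THRESHOLD = 10


class Blacklist:
    """Hard-coded plus dynamic blacklist of nicknames (illustrative sketch)."""

    def __init__(self, hard_coded: Iterable[str], threshold: int = BLACKLIST_THRESHOLD) -> None:
        self.hard_coded: Set[str] = set(hard_coded)
        self.dynamic: Set[str] = set()
        self.threshold = threshold

    def contains(self, nickname: str) -> bool:
        return nickname in self.hard_coded or nickname in self.dynamic

    def record_score(self, nickname: str, score: int) -> None:
        # Nicknames whose final score reaches the threshold are blacklisted.
        if score >= self.threshold:
            self.dynamic.add(nickname)
```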
Example
Imagine that the nickname RBOT|DEU|XP-1234 was added to the dynamic blacklist. The next captured nickname is RBOT|CHN|XP-5678, which scores:
- 1 point each for the suspicious substrings RBOT, CHN, and XP (3 points)
- 1 point each for the two occurrences of the special character | (2 points)
- 1 point each for the two pairs of consecutive digits (2 points)
- 10 points for more than 50% congruence with a name stored on the dynamic blacklist
Basic criteria: 7 points; with the blacklist bonus: 17 points.
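The arithmetic above can be checked with a short sketch. The slides do not define how "congruence" is measured, so this sketch swaps in the ratio from Python's difflib.SequenceMatcher as a stand-in similarity; the function name and the 10-point constant are illustrative.

```python
from difflib import SequenceMatcher
from typing import Iterable

CONGRUENCE_POINTS = 10


def blacklist_congruence_points(nickname: str, blacklist: Iterable[str]) -> int:
    """10 points if the nickname is more than 50% similar to any blacklisted name.

    SequenceMatcher.ratio() is a stand-in for Rishi's own similarity measure.
    """
    for listed in blacklist:
        if SequenceMatcher(None, nickname, listed).ratio() > 0.5:
            return CONGRUENCE_POINTS
    return 0


basic_points = 3 + 2 + 2  # substrings + '|' characters + digit pairs, as on the slide
bonus = blacklist_congruence_points("RBOT|CHN|XP-5678", ["RBOT|DEU|XP-1234"])
total = basic_points + bonus
```

For this pair, SequenceMatcher finds the common blocks `RBOT|` and `|XP-` (9 characters), so the ratio is 2 × 9 / 32 = 0.5625, above the 50% threshold, and `total` comes out at 17.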
Results and Evaluation
- RWTH Aachen University, with 30,000 computer users to support
- Rishi runs on a quad-CPU Intel Xeon 3.2 GHz system with 3 GB of memory installed
- We are monitoring a 10 Gbit network
Conclusion
Rishi detects bots based on characteristics of the communication channel:
- it observes IRC protocol messages
- it uses n-gram analysis together with a scoring function and black-/whitelists
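The slides do not specify the n-gram measure Rishi uses. The sketch below assumes character bigrams compared with the Dice coefficient, one common way to compare short strings, so its values need not reproduce the 50% threshold from the worked example; the function names are hypothetical.

```python
from collections import Counter


def ngrams(text: str, n: int = 2) -> Counter:
    """Multiset of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-grams: 2 * shared / (|A| + |B|)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * shared / total
```

Identical nicknames score 1.0 and nicknames sharing no n-grams score 0.0; the choice of `n` trades sensitivity to short shared fragments against robustness to reordering.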
Bot Nicknames
Thank you for listening
The end