A MULTIFACETED APPROACH TO UNDERSTANDING THE BOTNET PHENOMENON (2006) Jonathan Brant CAP 6135 – Spring 2010 Moheeb Abu Rajab, Jay Zarfoss, Fabian Monrose,

A MULTIFACETED APPROACH TO UNDERSTANDING THE BOTNET PHENOMENON (2006)

Jonathan Brant
CAP 6135 – Spring 2010

Moheeb Abu Rajab, Jay Zarfoss, Fabian Monrose, Andreas Terzis

Computer Science Department
Johns Hopkins University

Overview

Introduction
Background
Measurement Methodology
  Malware Collection
  Graybox testing
  Longitudinal Tracking of Botnets
Results and Analysis
  Botnet Prevalence
  Spreading Methods
  Growth Patterns
  Botnet Structures
  Effective Botnet Size
  Lifetime
  “Insider’s view”
Conclusion

Introduction

Botnets – “networks of infected end-hosts that are under the control of a human operator”
  Bots – end-hosts
  Botmaster – human operator

Command and Control (C2) channels carry botmaster commands to the bots in the botnet
  Channels can use different communication mechanisms (e.g. P2P)
  Most modern botnets use Internet Relay Chat (IRC)
    Originally used to form large chat rooms


Botnets are almost always used for illegal activities
  Extortion
  E-mail spamming
  Identity theft
  Software piracy


The paper attempts to address questions such as:
  Number of botnet “species”
  Behavioral categorization of different species
  Evolution of a botnet

Background

Step 1 – Botnets commandeer victims by remotely exploiting a vulnerability in software running on the victim
  Infection strategies include:
    Self-replicating worms
    E-mail viruses
    Social engineering
      Convincing victims to run malicious code on their machine


Step 2 – Victim executes shellcode, and an image of the bot binary is fetched from a location within the botnet
  When the fetch is complete, the binary installs itself on the target machine and automatically starts on each reboot


Step 3 – Bot attempts to contact the IRC server (address stored in the executable)
  Using a DNS name instead of an IP address allows the botmaster to retain control if the IP is blacklisted by an ISP


Step 4 – Bot attempts to establish an IRC session and join the C2 channel
  Three authentication steps:
    Bot authenticates itself using a PASS message
      This is the IRC session password
    Bot issues the C2 channel password
      This password and the session password are stored in the bot binary
    Botmaster authenticates to the bot population
      This prevents other botmasters from seizing control of the botnet


Step 5 – Channel topic is parsed and executed
  Contains the default command that every bot executes

Future commands from the botmaster can vary widely
  The wide variety of available commands/responses increases the difficulty of classifying botnet behaviors
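
For readers who have not looked at IRC traffic, the message order in Steps 4 and 5 can be sketched as below. This is an illustration of what an analyst would expect to see in a trace, not code from the paper: the function names, the example nickname, channel, passwords, and the `.scan 445` topic text are all hypothetical. The `USER` parameter layout and the numeric 332 topic reply follow the standard IRC client protocol.

```python
def registration_sequence(session_pass, nick, user, channel, channel_key):
    """Return, in order, the IRC lines sent to register and join a keyed channel."""
    return [
        f"PASS {session_pass}",           # Step 4a: IRC session password
        f"NICK {nick}",
        f"USER {user} 0 * :{user}",       # USER <user> <mode> <unused> :<realname>
        f"JOIN {channel} {channel_key}",  # Step 4b: C2 channel password (channel key)
    ]


def topic_from_reply(reply):
    """Extract the topic text from a numeric 332 (RPL_TOPIC) reply, else None."""
    if reply.startswith(":"):
        # Drop the optional ":server.name" prefix.
        _, _, reply = reply.partition(" ")
    head, _, trailing = reply.partition(" :")
    parts = head.split()
    if len(parts) >= 3 and parts[0] == "332":
        return trailing
    return None


# Hypothetical values, for illustration only.
lines = registration_sequence("sesspass", "bot01", "bot01", "#c2", "chanpass")
topic = topic_from_reply(":irc.example.net 332 bot01 #c2 :.scan 445")
```

The topic string is what Step 5 refers to: every client joining the channel receives it, which is why it serves as a default command.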

Measurement Methodology

Data collection includes three phases:
  Malware collection
  Binary analysis via gray-box testing
  Tracking of IRC botnets through IRC and DNS trackers

Measurement | Malware Collection

Distributed darknet
Locally deployed darknet
  Allocated but unused portion of IP address space
14 distributed nodes using PlanetLab testbed

Goal is to collect as many bot binaries as possible
  Must support a wide array of data collection endpoints and be highly scalable

Measurement | Malware Collection

Modified nepenthes platform
  Mimics replies generated by vulnerable services
  Collects first-stage exploit (shellcode)
Raw packets from PlanetLab nodes translated
  Using translation module written in Click
  Packets were injected into local tunneling interface

Measurement | Malware Collection

On-line download modules in nepenthes disabled to prevent excessive downloads
  Binaries retrieved by generating a list of URL targets and sending it to a download station
  Download station filtered entries in the list and extracted unique sources/URLs

Measurement | Malware Collection

Honeynet catches exploits missed by nepenthes
  Composed of honeypots running unpatched, virtual instances of Windows XP
  Each honeypot assigned a private static IP on a separate VLAN
  Infected honeypots sustain IRC connections until the VMs are reimaged
  Suspect binaries retrieved by comparing VM contents to a clean Windows image

Measurement | Malware Collection

Gateway routes darknet traffic to various parts of the internal network
  Half of the darknet prefixes directed to the local responder and the other half to the honeynet
  NAT used to map each honeypot to 128 darknet IP addresses

Measurement | Malware Collection

Gateway serves as a firewall preventing honeypots from conducting outbound attacks or infecting each other
  Cross-infection prevented by:
    Placing each honeypot on a separate VLAN and terminating cross-VLAN traffic
  Outbound traffic blocked on popular vulnerable ports
    135, 139, 445, etc.


Gateway runs an IRC detection module
  Application-level traffic searched for common IRC protocol strings
    NICK, JOIN, USER
  Once an IRC connection is witnessed, the detection module establishes a record for the IRC session
  When the honeypot attempts to reconnect, the connection is allowed to proceed to the IRC server
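
The keyword search described above can be sketched as a payload check. This is a simplified sketch, not the gateway's actual module: the function name and the choice to match the keyword only as the command word of a line (rather than anywhere in the payload) are assumptions made here to cut down on false matches.

```python
IRC_KEYWORDS = (b"NICK", b"JOIN", b"USER")


def looks_like_irc(payload: bytes) -> bool:
    """Return True if any line of the payload starts with an IRC command keyword."""
    for line in payload.splitlines():
        line = line.strip()
        if line.startswith(b":"):
            # Drop the optional ":prefix" that may precede an IRC command.
            _, _, line = line.partition(b" ")
        command = line.split(b" ", 1)[0].upper()
        if command in IRC_KEYWORDS:
            return True
    return False
```

Because the check looks at application-level bytes rather than the port number, it still fires when the IRC server listens on a non-standard port.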


Detection module allows only one honeypot to connect to a given IRC server at any point in time
  Gateway detects when a honeypot is infected
    Rules inserted to block inbound attacks to that honeypot


Gateway also performs miscellaneous tasks
  Triggering honeypot re-imaging
  Loading clean Windows images
  Pre-filtering for download station
  Running local DNS server to resolve DNS queries from honeypots

Measurement | Graybox Testing

Graybox testing used to extract features of suspicious binaries
Analysis spans two distinct phases (performed on an isolated network segment)
  First phase derives the network fingerprint of the binary
  Second phase extracts the binary's IRC-specific features

Measurement | Graybox Testing

Phase 1: Creation of a network fingerprint
  Server acts as network sink
    All network activity initiated by the malware will be detected
  Traffic logs automatically processed to extract the network fingerprint

  $f_{net} = \langle \text{DNS}, \text{IPs}, \text{Ports}, \text{scan} \rangle$

    DNS – targets of DNS requests
    IPs – destination IP addresses
    Ports – contacted ports and protocols
    scan – whether or not default scanning behavior was detected
      Default scanning behavior – any attempt to contact more than 20 distinct destinations on the same port during the monitored period
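
A minimal sketch of how the four fingerprint fields could be derived from a traffic log, under stated assumptions: the `Flow` record, the function name, and the dictionary layout are invented for illustration, and "same port" is interpreted here as the same (port, protocol) pair. Only the 20-destination threshold comes from the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """One outbound connection attempt seen at the network sink (hypothetical record)."""
    dst_ip: str
    dst_port: int
    proto: str


SCAN_THRESHOLD = 20  # more than 20 distinct destinations on one port counts as scanning


def network_fingerprint(dns_queries, flows):
    """Build the <DNS, IPs, Ports, scan> tuple as a dictionary."""
    dests_per_port = defaultdict(set)
    for flow in flows:
        dests_per_port[(flow.dst_port, flow.proto)].add(flow.dst_ip)
    scan = any(len(dests) > SCAN_THRESHOLD for dests in dests_per_port.values())
    return {
        "DNS": sorted(set(dns_queries)),
        "IPs": sorted({flow.dst_ip for flow in flows}),
        "Ports": sorted(dests_per_port),
        "scan": scan,
    }


# 21 distinct destinations on TCP/445 crosses the threshold; 20 would not.
scanning = [Flow(f"10.0.0.{i}", 445, "tcp") for i in range(21)]
fingerprint = network_fingerprint(["irc.example.net"], scanning)
```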

Measurement | Graybox Testing

Phase 2: Extraction of IRC-related features
  Modified version of the UnrealIRC daemon instantiated on the network sink
    IRC daemon listens on all ports ever observed in the network fingerprint
  Upon detecting an IRC connection, an IRC fingerprint is created

  $f_{irc} = \langle \text{PASS}, \text{NICK}, \text{USER}, \text{MODE}, \text{JOIN} \rangle$

    PASS – initial password to establish IRC session
    NICK – nickname
    USER – username
    MODE – modes set
    JOIN – IRC channels to be automatically joined (and their associated passwords)
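
As a rough sketch, the IRC fingerprint can be filled in by reading the client-side lines captured by the sink's IRC daemon. The function name, the dictionary layout, and the sample values are assumptions for illustration; the paper's modified daemon may record these fields differently.

```python
def irc_fingerprint(client_lines):
    """Collect the <PASS, NICK, USER, MODE, JOIN> fields from captured client lines."""
    fp = {"PASS": None, "NICK": None, "USER": None, "MODE": [], "JOIN": []}
    for line in client_lines:
        command, _, params = line.strip().partition(" ")
        command = command.upper()
        if command in ("PASS", "NICK"):
            fp[command] = params
        elif command == "USER":
            # Keep only the username, the first parameter of USER.
            fp["USER"] = params.split(" ", 1)[0]
        elif command == "MODE":
            fp["MODE"].append(params)
        elif command == "JOIN":
            channel, _, key = params.partition(" ")
            fp["JOIN"].append((channel, key or None))
    return fp


# Hypothetical capture, for illustration only.
captured = [
    "PASS sesspass",
    "NICK bot01",
    "USER bot01 0 * :bot01",
    "MODE bot01 +i",
    "JOIN #c2 chanpass",
]
fp = irc_fingerprint(captured)
```

Keeping the channel key alongside the channel name matters because the tracker later needs both to rejoin the real C2 channel.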

Measurement | Graybox Testing

(Phase 2 continued…)
  To learn the botnet “dialect”, the bot connects to the local IRC server and enters the default channel
  IRC query engine plays the role of botmaster
  Bot behavior is learned by subjecting it to a series of commands
  Command set includes:
    IRC commands observed in honeynet traces
    Commands extracted from publicly available bot source code

Measurement | Longitudinal Tracking

Botnet tracking is performed by two means:
  The use of a custom, lightweight IRC tracker
  Probing DNS caches across the globe

Measurement | Longitudinal Tracking

IRC Tracker
  “A modified IRC client that can join a specified IRC channel and automatically answer directed queries based on the template created by the graybox testing technique”
  IRC tracker instantiates a new IRC session to the IRC server using the fingerprint and template
  IRC trackers need to appear responsive

Measurement | Longitudinal Tracking

In order to appear “real”, the following must be performed:
  Traffic filtered so inappropriate information is not included in the template
    Filtering performed automatically while the bot is executing
    Computer specifications (e.g. memory, disk space) are changed to resemble the specifications of a real machine
  IRC query engine issues a set of commands that require stateful responses
    Emulates a bot’s stateful software

Measurement | Longitudinal Tracking

DNS Tracking
  Most bots issue DNS queries to resolve the IP addresses of IRC servers
  Caches of DNS servers are probed to determine the number of DNS servers giving cache hits
    “Cache hit” implies at least one client queried the DNS server during the lifetime of its DNS entry

Measurement | Longitudinal Tracking

Original list contained 1.6 million DNS servers
  First filter removed servers in top-level domains such as .gov and .mil
  Second filter checked consistency of replies
    Two consecutive DNS queries
      First query was recursive and forced the DNS server to completely resolve the query
      Second query was not recursive and obtained the local answer from the server cache
      TTL field in the second response should be smaller than in the first
After filtering, master list consisted of 800,000 name servers
For a given IRC server, the caches of all DNS servers were probed and any associated cache hits recorded
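
The difference between the two probe queries is a single header bit: the Recursion Desired (RD) flag. The sketch below builds both query packets by hand in standard DNS wire format, so the bit is visible. Function names and the example hostname are hypothetical, nothing is sent on the network, and the consistency check accepts an equal TTL as well as a smaller one, which is an assumption made here for back-to-back probes (the slide says "smaller").

```python
import struct


def build_query(name, query_id, recursive):
    """Build a DNS A-record query; RD is set only when recursive is True."""
    flags = 0x0100 if recursive else 0x0000  # RD is bit 8 of the flags field
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)  # one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def replies_consistent(first_ttl, second_ttl):
    """Second (cached) answer should not report a larger TTL than the first."""
    return second_ttl <= first_ttl


recursive_probe = build_query("irc.example.com", 0x1234, recursive=True)
cache_probe = build_query("irc.example.com", 0x1234, recursive=False)
```

A server that answers the non-recursive probe with a record is reporting a cache hit, since it was not allowed to go and resolve the name on the prober's behalf.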

Results and Analysis

Results include:
  Traffic traces captured on local darknet
    3 month period
  IRC logs gathered
    3 month period
  DNS cache hit results from tracking 65 IRC servers
    45 day period

Results | Botnet Prevalence

Botnet traffic share
  Two-week snapshot of total incoming SYN packets to the local darknet vs. packets originating from botnet spreaders
    A botnet spreader is any source that delivered a bot executable
  27% of incoming SYNs attributed to botnet spreaders
    76% come from botnet spreaders if target ports are considered


More than 90% of all traffic during peaks targeted ports used by botnet spreaders
More than 70% of sources during peak periods sent shell exploits
  This suggests the total amount of botnet-related traffic is far greater than 27%


11% (85,000) of probed servers were involved in at least one botnet activity
  55% of servers in the dataset are for .com domains
    82% of DNS cache hits came from name servers in that domain
    29% of .com servers had at least 1 cache hit
  .cn servers made up only 0.2% of total servers
    95% of them exhibited botnet activity

Results | Spreading Methods

Botnets use a variety of means to spread and recruit new victims
  Email
  Web
  Active scanning (most prevalent)
Botnets can be grouped into two types:
  Type I – Worm-like
    Continuously scan ports following a target selection algorithm
  Type II – Variable scanning behavior
    Uses a number of scanning algorithms
      Uniform, non-uniform, localized


192 botnets captured
34 botnets were Type-I
  Upon infection, bot starts scanning IP space for new victims
  Initiates connection to IRC servers (identified by a hard-coded list of DNS names)
  All IRC servers/channels the bots tried to join were unreachable
    Channel was banned by public IRC server
    DNS name did not resolve to a valid IP address
  Still, the botnet grew over time due to persistence of scanning


Type-II botnets were the most prevalent class
  Scanning triggered by a command
  More difficult to track due to continuously changing behavior
  Localized and targeted scanning were the most prevalent techniques
    Localized scanning focused on Class B address space
    Targeted scanning focused on Class A address space

Results | Growth Patterns

In order to examine botnet growth patterns, two approaches were taken:
  Cumulative number of unique DNS cache hits for distinct botnets over time was plotted
  Growth pattern was compared to behavior learned from IRC tracker
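
The first approach, a cumulative count of unique cache hits over time, amounts to the following. This is only a sketch of the bookkeeping: the `(day, server_id)` input shape and the function name are assumptions, and the sample data is made up.

```python
def cumulative_unique_hits(observations):
    """Given (day, server_id) cache hits, return [(day, unique servers seen so far)]."""
    hits_by_day = {}
    for day, server_id in observations:
        hits_by_day.setdefault(day, set()).add(server_id)

    seen = set()
    curve = []
    for day in sorted(hits_by_day):
        seen |= hits_by_day[day]  # a server counts once, however often it hits
        curve.append((day, len(seen)))
    return curve


# Hypothetical observations: server "a" hits on days 1 and 2 but is counted once.
curve = cumulative_unique_hits([(1, "a"), (1, "b"), (2, "a"), (3, "c")])
```

Because repeat hits from the same DNS server are not recounted, the curve can only stay flat or rise, which is what makes its shape comparable across botnets.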


Botnets with semi-exponential growth patterns exhibit persistent random scanning activity (unchanging over time)
  Example: for one botnet, the topic of the corresponding channel was set to randomly scan port 445 indefinitely for one month
  Related to worm infections


Also representative of botnets with intermittent activity profiles
  Example: Botnet III corresponds to a botnet that infected honeypots on 3/13/2006
    IRC server went down between 4/12/2006 and 4/30/2006
    When the IRC server became available, the growth slope increased and honeypots were re-infected by the same botnet


Predominantly used time-scoped scanning commands
  As opposed to continuous scanning like the previous two


Botnet evolution estimated by counting unique sources for messages broadcast to the channel
  Only botnets of comparable size plotted on a given plot
  Trends confirm heterogeneity in botnets

Results | Botnet Structures

60% of 318 collected malicious binaries were IRC bots
Four predominant IRC structures were revealed

All bots connected to a single IRC server
  Prevalent among smaller classes of botnets (few hundred users)
  70% of observed botnets fell into this category

IRC servers can be connected to form an IRC network supporting large numbers of users
  30% of botnets bridged on multiple servers
  50% bridged between two servers only

Seemingly unrelated botnets appear related when comparing their naming conventions, channel names, and operators’ user IDs
  These botnets may belong to the same botmaster

Selected group of bots commanded to download an updated binary
  Results in bots being moved to a different IRC server

Results | Effective Botnet Size

Botnet footprint can become fairly large (> 15,000 bots)
Predominant structures were botnets managed by a single or few servers
Distinction drawn between
  Botnet’s footprint
  Effective size
    Number of bots connected to the IRC channel at a given time

Results | Effective Botnet Size

Some “chatty” IRC servers broadcast join/leave information for members on channel
  Number of online bots versus time for these IRC servers is plotted in Figure 9
Maximum size of online population is significantly smaller than botnet’s footprint
  Footprint greater than 10,000
  No more than 3,000 bots online at the same time
Effective size has little impact on long-term activity; however, it affects the number of bots available to execute commands in a timely manner
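
The gap between footprint and online population can be made concrete with a small sketch over a join/leave log such as the one a "chatty" server broadcasts. The event format and function name are assumptions, and identifying a bot by its nickname is a simplification (a bot that changes nicknames would be counted twice).

```python
def footprint_and_peak_online(events):
    """From chronological (nick, action) events, return (footprint, peak online count)."""
    ever_seen = set()  # footprint: every bot observed at any time
    online = set()     # effective size at the current moment
    peak_online = 0
    for nick, action in events:
        if action == "join":
            ever_seen.add(nick)
            online.add(nick)
        elif action == "leave":
            online.discard(nick)
        peak_online = max(peak_online, len(online))
    return len(ever_seen), peak_online


# Hypothetical log: four bots pass through, but never more than two at once.
log = [
    ("a", "join"), ("b", "join"), ("a", "leave"),
    ("c", "join"), ("b", "leave"), ("d", "join"),
]
footprint, peak = footprint_and_peak_online(log)
```

With short stays and staggered arrivals, the footprint keeps growing while the online count stays bounded, which matches the 10,000 versus 3,000 observation above.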

Results | Lifetime

Discrepancy between footprint and effective size likely due to the long lifetime of a typical botnet
  Bot death rates and high churn rates can affect botnet’s effective size

Results | Lifetime

High churn rates
  Bots do not stay long on IRC channel
    Average stay time: 25 minutes
    90% stay less than 50 minutes
  Likely causes include
    Client instability (as a result of infection)
    Machine hibernation
    Botmasters commanding bots to leave the channel
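
Churn statistics like the average stay time and the share of short stays can be computed from timestamped join/leave events. The sketch below assumes a hypothetical `(minute, nick, action)` event format and made-up sample data; it does not reproduce the paper's numbers.

```python
def stay_durations(events):
    """From chronological (minute, nick, action) events, return completed stay lengths."""
    joined_at = {}
    durations = []
    for minute, nick, action in events:
        if action == "join":
            joined_at[nick] = minute
        elif action == "leave" and nick in joined_at:
            durations.append(minute - joined_at.pop(nick))
    return durations


def summarize(durations, threshold):
    """Return (mean stay, fraction of stays shorter than threshold)."""
    if not durations:
        raise ValueError("no completed stays to summarize")
    mean = sum(durations) / len(durations)
    fraction_short = sum(d < threshold for d in durations) / len(durations)
    return mean, fraction_short


# Hypothetical log with stays of 20, 30, and 60 minutes.
log = [
    (0, "a", "join"), (10, "b", "join"), (20, "a", "leave"),
    (40, "b", "leave"), (50, "c", "join"), (110, "c", "leave"),
]
durations = stay_durations(log)
mean_stay, share_under_50 = summarize(durations, threshold=50)
```

Bots still online at the end of the log are ignored here, so the mean slightly underweights long stays; a fuller analysis would need to handle those open sessions.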

Results | Botnet Software Taxonomy

183 of 192 confirmed IRC-based bot executables responded to probes of the IRC query engine
  49% of bots run an AV/FW killer – a utility that disables anti-virus and firewall processes
  43% run an identd server, which performs user identification
    Ensures only intended bots join a given IRC channel
  40% run a system security monitor, which tightens bot security
    E.g. disables DCOM service and file sharing
  38% run a registry monitor, which alerts the bot of any attempts to disable it

Results | Botnet Software Taxonomy

Number of exploits within bot binaries varied from 3 to 29
  Average of 15 exploits per binary
Most popular exploits (appeared in over 75% of binaries)
  DCOM135
  LSASS445
  NTPASS

Results | Botnet Software Taxonomy

Authors evaluated effectiveness of ClamAV and Norton anti-virus on 192 malicious binaries
  ClamAV classified 137 binaries as malicious
  Norton anti-virus classified 179 binaries as malicious
Windows XP Service Pack 2 still not immune

Results | “Insider’s view”

Traces show that:
  Botmasters share information concerning what prefixes should not be scanned
  Bots are tweaked to minimize chatter on the C2 channel
  Bots are probed to detect and isolate “misbehavers”
    Botmasters also look for “super-bots” with high-bandwidth network links and large storage capacities


Bots migrate from one IRC channel to another, instructed by:
  Command from botmaster
  Download of replacement software that points to a different C2 server


Control commands include channel joins and leaves

Mining category includes commands that collect machine specifications

Attack category includes commands from botmasters to attack other network computers


Small botnets receive a larger portion of control and mining commands
  Hands-on botmasters that devote large amounts of time to manually controlling their botnet
Medium and large botnets have a larger percentage of cloning and download commands
  Cloning could include the use of one botnet to attack another botnet by overloading its IRC server with join requests

Conclusion

Botnets are a major contributor to overall unwanted Internet traffic
  Most botnet traffic can be attributed to scans used to recruit new bots
IRC is still the dominant protocol used for C2 communications
Effective sizes of botnets can range from a few hundred to a few thousand
  Botnet footprints are usually much larger than effective size
    This is due to the high churn rate within a botnet
    A bot’s average channel occupancy is less than half an hour
Graybox testing revealed sophistication of modern bot software
  E.g. self-protection measures

Contributions

Established empirical measurements for botnet prevalence
  Particularly in considering DNS cache hits by the IRC botnets that were tracked
Classified typical characteristics of bot binaries
  Registry monitoring tactics
  Locking down host vulnerabilities
Classified most prevalent botnet activities as a function of botnet size
Distinguished between botnet footprint and “effective size”
Large experiment samples further solidified results

Critique

Focused mainly on Windows-based systems
  It would be interesting to see the effectiveness of the noted infection strategies on Unix systems
Only evaluated two anti-virus applications
  Perhaps include other popular anti-virus applications
    McAfee, Symantec Corporate, AVG, etc.
Authors noted 60% of binaries collected were IRC bots
  Did the other 40% use a different communication mechanism?
    If so, it would be interesting to know how they were structured and whether the authors evaluated them in any way

References

[1] Rajab, M. A., Zarfoss, J., Monrose, F., & Terzis, A. (2006). A multifaceted approach to understanding the botnet phenomenon. Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, Rio de Janeiro, Brazil.
