
From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware

Manos Antonakakis‡,∗, Roberto Perdisci†,∗, Yacin Nadji∗, Nikolaos Vasiloglou‡, Saeed Abu-Nimeh‡, Wenke Lee∗ and David Dagon∗

‡Damballa Inc., †University of Georgia
{manos,nvasil,sabunimeh}@damballa.com, [email protected]

∗Georgia Institute of Technology
{yacin.nadji, wenke}@cc.gatech.edu, [email protected]

Abstract

Many botnet detection systems employ a blacklist of known command and control (C&C) domains to detect bots and block their traffic. Similar to signature-based virus detection, such a botnet detection approach is static because the blacklist is updated only after running an external (and often manual) process of domain discovery. As a response, botmasters have begun employing domain generation algorithms (DGAs) to dynamically produce a large number of random domain names and select a small subset for actual C&C use. That is, a C&C domain is randomly generated and used for a very short period of time, thus rendering detection approaches that rely on static domain lists ineffective. Naturally, if we know how a domain generation algorithm works, we can generate the domains ahead of time and still identify and block botnet C&C traffic. The existing solutions are largely based on reverse engineering of the bot malware executables, which is not always feasible.

In this paper we present a new technique to detect randomly generated domains without reversing. Our insight is that most of the DGA-generated (random) domains that a bot queries would result in Non-Existent Domain (NXDomain) responses, and that bots from the same botnet (with the same DGA algorithm) would generate similar NXDomain traffic. Our approach uses a combination of clustering and classification algorithms. The clustering algorithm clusters domains based on the similarity in the make-ups of domain names as well as the groups of machines that queried these domains. The classification algorithm is used to assign the generated clusters to models of known DGAs. If a cluster cannot be assigned to a known model, then a new model is produced, indicating a new DGA variant or family. We implemented a prototype system and evaluated it on real-world DNS traffic obtained from large ISPs in North America. We report the discovery of twelve DGAs. Half of them are variants of known (botnet) DGAs, and the other half are brand new DGAs that have never been reported before.

1 Introduction

Botnets are groups of malware-compromised machines, or bots, that can be remotely controlled by an attacker (the botmaster) through a command and control (C&C) communication channel. Botnets have become the main platform for cyber-criminals to send spam, steal private information, host phishing web-pages, etc. Over time, attackers have developed C&C channels with different network structures. Most botnets today rely on a centralized C&C server, whereby bots query a predefined C&C domain name that resolves to the IP address of the C&C server from which commands will be received. Such centralized C&C structures suffer from the single point of failure problem because if the C&C domain is identified and taken down, the botmaster loses control over the entire botnet.

To overcome this limitation, attackers have used P2P-based C&C structures in botnets such as Nugache [35], Storm [38], and more recently Waledac [39], Zeus [2], and Alureon (a.k.a. TDL4) [12]. While P2P botnets provide a more robust C&C structure that is difficult to detect and take down, they are typically harder to implement and maintain. In an effort to combine the simplicity of centralized C&Cs with the robustness of P2P-based structures, attackers have recently developed a number of botnets that locate their C&C server through automatically generated pseudo-random domain names. In order to contact the botmaster, each bot periodically executes a domain generation algorithm (DGA) that, given a random seed (e.g., the current date), produces a list of candidate C&C domains. The bot then attempts to resolve these domain names by sending DNS queries until one of the domains resolves to the IP address of a C&C server. This strategy provides a remarkable level of agility because even if one or more C&C domain names or IP addresses are identified and taken down, the bots will eventually get the IP address of the relocated C&C server via DNS queries to the next set of automatically generated domains. Notable examples of DGA-based botnets (or DGA-bots, for short) are Bobax [33], Kraken [29], Sinowal (a.k.a. Torpig) [34], Srizbi [30], Conficker-A/B [26], Conficker-C [23] and Murofet [31]. A defender can attempt to reverse engineer the bot malware, particularly its DGA algorithm, to pre-compute current and future candidate C&C domains in order to detect, block, and even take down the botnet. However, reverse engineering is not always feasible because the bot malware can be updated very quickly (e.g., hourly) and obfuscated (e.g., encrypted, and only decrypted and executed by external triggers such as time).

In this paper, we propose a novel detection system, called Pleiades, to identify DGA-based bots within a monitored network without reverse engineering the bot malware. Pleiades is placed "below" the local recursive DNS (RDNS) server or at the edge of a network to monitor DNS query/response messages from/to the machines within the network. Specifically, Pleiades analyzes DNS queries for domain names that result in Name Error responses [19], also called NXDOMAIN responses, i.e., domain names for which no IP addresses (or other resource records) exist. In the remainder of this paper, we refer to these domain names as NXDomains. The focus on NXDomains is motivated by the fact that modern DGA-bots tend to query large sets of domain names among which relatively few successfully resolve to the IP address of the C&C server. Therefore, to automatically identify DGA domain names, Pleiades searches for relatively large clusters of NXDomains that (i) have similar syntactic features, and (ii) are queried by multiple potentially compromised machines during a given epoch. The intuition is that in a large network, like the ISP network where we ran our experiments, multiple hosts may be compromised with the same DGA-bots. Therefore, each of these compromised assets will generate several DNS queries resulting in NXDomains, and a subset of these NXDomains will likely be queried by more than one compromised machine. Pleiades is able to automatically identify and filter out "accidental", user-generated NXDomains due to typos or mis-configurations. When Pleiades finds a cluster of NXDomains, it applies statistical learning techniques to build a model of the DGA. This is used later to detect future compromised machines running the same DGA and to detect active domain names that "look similar" to NXDomains resulting from the DGA and therefore probably point to the botnet C&C server's address.

Pleiades has the advantage of being able to discover and model new DGAs without labor-intensive malware reverse-engineering. This allows our system to detect new DGA-bots before any sample of the related malware family is captured and analyzed. Unlike previous work on DNS traffic analysis for detecting malware-related [4] or malicious domains in general [3, 6], Pleiades leverages throw-away traffic (i.e., unsuccessful DNS resolutions) to (1) discover the rise of new DGA-based botnets, (2) accurately detect bot-compromised machines, and (3) identify and block the active C&C domains queried by the discovered DGA-bots. Pleiades achieves these goals by monitoring the DNS traffic in local networks, without the need for a large-scale deployment of DNS analysis tools required by prior work.

Furthermore, while botnet detection systems that focus on network flow analysis [13, 36, 44, 46] or require deep packet inspection [10, 14] may be capable of detecting compromised machines within a local network, they do not scale well to the overwhelming volume of traffic typical of large ISP environments. On the other hand, Pleiades employs a lightweight DNS-based monitoring approach, and can detect DGA-based malware by focusing on a small fraction of all DNS traffic in an ISP network. This allows Pleiades to scale well to very large ISP networks, where we evaluated our prototype system.

This paper makes the following contributions:

• We propose Pleiades, the first DGA-based botnet identification system that efficiently analyzes streams of unsuccessful domain name resolutions, or NXDomains, in large ISP networks to automatically identify DGA-bots.

• We built a prototype implementation of Pleiades, and evaluated its DGA identification accuracy over a large labeled dataset consisting of a mix of NXDomains generated by four different known DGA-based botnets and NXDomains "accidentally" generated by typos or mis-configurations. Our experiments demonstrate that Pleiades can accurately detect DGA-bots.

• We deployed and evaluated our Pleiades prototype in a large production ISP network for a period of 15 months. Our experiments discovered twelve new DGA-based botnets and enumerated the compromised machines. Half of these new DGAs have never been reported before.

The remainder of the paper is organized as follows. In Section 2 we discuss related work. We provide an overview of Pleiades in Section 3. The DGA discovery process is described in Section 4. Section 5 describes the DGA classification and C&C detection processes. We elaborate on the properties of the datasets used and the way we obtained the ground truth in Section 6. The experimental results are presented in Section 7, while we discuss the limitations of our system in Section 8. We conclude the paper in Section 9.


2 Related Work

Dynamic domain generation has been used by malware to evade detection and complicate mitigation, e.g., Bobax, Kraken, Torpig, Srizbi, and Conficker [26]. To uncover the underlying domain generation algorithm (DGA), researchers often need to reverse engineer the bot binary. Such a task can be time consuming and requires advanced reverse engineering skills [18].

The infamous Conficker worm is one of the most aggressive pieces of malware with respect to domain name generation. The "C" variant of the worm generated 50,000 domains per day. However, Conficker-C only queried 500 of these domains every 24 hours. In older variants of the worm, A and B, the worm cycled through the list of domains every three and two hours, respectively. In Conficker-C, the length of the generated domains was between four and ten characters, and the domains were distributed across 110 TLDs [27].

Stone-Gross et al. [34] were the first to report on domain fluxing. In the past, malware used IP fast-fluxing, where a single domain name pointed to several IP addresses to avoid being taken down easily. In domain fluxing, however, malware uses a domain generation algorithm to generate several domain names, and then attempts to communicate with a subset of them. The authors also analyzed Torpig's DGA and found that the bot utilizes Twitter's API. Specifically, it used the second character of the most popular Twitter search and generated a new domain every day. It was updated to use the second character of the 5th most popular Twitter search. Srizbi [40] is another example of a bot that utilizes a DGA, in this case based on a unique magic number. Researchers identified several unique magic numbers from multiple copies of the bot. The magic number is XOR'ed with the current date and a different set of domains is generated. Only the characters "q w e r t y u i o p a s d f" are used in the generated domain names.

Yadav et al. proposed a technique to identify botnets by finding randomly generated domain names [42], and improvements that also include NXDomains and temporal correlation [43]. They evaluated their approaches by automatically detecting Conficker botnets in an offline dataset from a Tier-1 ISP in South Asia in the first paper, and on both the ISP dataset and a university's DNS logs in the second.

Villamarin-Salomon and Brustoloni [37] compared two approaches to identify botnet C&Cs. In their first approach, they identified domains with high query rates or domains that were temporally correlated. They used Chebyshev's inequality and Mahalanobis distance to identify anomalous domains. In their second approach, they analyzed recurring "dynamic" DNS replies with NXDomain responses. Their experiments showed that the first approach was ineffective, as several legitimate services use DNS with short time-to-live (TTL) values. However, their second approach yielded better detection and identified suspicious C&C domains.

Pleiades differs from the approaches described above in the following ways. (A) Our work models five different types of bot families including Conficker, Murofet, Sinowal, and Bobax. (B) We model these bot families using two clustering techniques. The first utilizes the distribution of the characters and 2-grams in the domain name. The second relies on historical data that shows the relationship between hosts and domain names. (C) We build a classification model to predict the maliciousness of domains that deviate from the two clustering techniques.

Unlike previous work, our approach does not require active probing to maintain a fresh list of legitimate domains. Our approach does not rely on external reputation databases (e.g., DNSBLs); instead, it only requires access to local DNS query streams to identify new clusters of DGA NXDomains. Not only does our approach identify new DGAs, but it also builds models for these DGAs to classify hosts that will generate similar NXDomains in the future. Furthermore, among the list of identified domains in the DGAs, our approach pinpoints the C&C domains. Lastly, we note that our work is complementary to the larger collection of previous research that attempts to detect and identify malicious domain names, e.g., [3, 4].

3 System Overview

In this section, we provide a high-level overview of our DGA-bot detection system Pleiades. As shown in Figure 1, Pleiades consists of two main modules: a DGA Discovery module, and a DGA Classification and C&C Detection module. We discuss the roles of these two main modules and their components, and how they are used in coordination to actively learn and update DGA-bot detection models. We describe these components in more detail in Sections 4 and 5.

3.1 DGA Discovery

The DGA Discovery module analyzes streams of unsuccessful DNS resolutions, as seen from "below" a local DNS server (see Figure 1). All NXDomains generated by network users are collected during a given epoch (e.g., one day). Then, the collected NXDomains are clustered according to the following two similarity criteria: (1) the domain name strings have similar statistical characteristics (e.g., similar length, similar level of "randomness", similar character frequency distribution, etc.) and (2) the domains have been queried by overlapping sets of hosts. The main objective of this NXDomain clustering process is to group together domain names that likely are automatically generated by the same algorithm running on multiple machines within the monitored network.

Figure 1: A high level overview of Pleiades. (The figure shows the DGA Discovery module, consisting of NXDomain Clustering, DGA Filtering, and DGA Modeling, and the DGA Classification and C&C Detection module, consisting of the DGA Classifier and C&C Detection components; NXDomains and active domains observed below the local DNS server flow through these components, known DGA-botnet domains and legitimate domains are used for training, and the output is a compromised hosts report.)

Naturally, because this clustering step is unsupervised, some of the output NXDomain clusters may contain groups of domains that happen to be similar by chance (e.g., NXDomains due to common typos or to mis-configured applications). Therefore, we apply a subsequent filtering step. We use a supervised DGA Classifier to prune NXDomain clusters that appear to be generated by DGAs that we have previously discovered and modeled, or that contain domain names that are similar to popular legitimate domains. The final output of the DGA Discovery module is a set of NXDomain clusters, each of which likely represents the NXDomains generated by previously unknown or not yet modeled DGA-bots.

3.2 DGA Classification and C&C Detection

Every time a new DGA is discovered, we use a supervised learning approach to build models of what the domains generated by this new DGA "look like". In particular, we build two different statistical models: (1) a statistical multi-class classifier that focuses on assigning a specific DGA label (e.g., DGA-Conficker.C) to the set of NXDomains generated by a host $h_i$ and (2) a Hidden Markov Model (HMM) that focuses on finding single active domain names queried by $h_i$ that are likely generated by a DGA (e.g., DGA-Conficker.C) running on the host, and are therefore good candidate C&C domains.

The DGA Modeling component receives different sets of domains labeled as Legitimate (i.e., "non-DGA"), DGA-Bobax, DGA-Torpig/Sinowal, DGA-Conficker.C, New-DGA-v1, New-DGA-v2, etc., and performs the training of the multi-class DGA Classifier and the HMM-based C&C Detection module.

The DGA Classification module works as follows. Similar to the DGA Discovery module, we monitor the stream of NXDomains generated by each client machine "below" the local recursive DNS server.

Given a subset of NXDomains generated by a machine, we extract a number of statistical features related to the NXDomain strings. Then, we ask the DGA Classifier to identify whether this subset of NXDomains resembles the NXDomains generated by previously discovered DGAs. That is, the classifier will either label the subset of NXDomains as generated by a known DGA, or tell us that it does not fit any model. If the subset of NXDomains is assigned a specific DGA label (e.g., DGA-Conficker.C), the host that generated the NXDomains is deemed to be compromised by the related DGA-bot.

Once we obtain the list of machines that appear to be compromised with DGA-based bots, we take detection one step further. While all previous steps focused on NXDomains, we now turn our attention to domain names for which we observe valid resolutions. Our goal is to identify which domain names, among the ones generated by the discovered DGA-based bots, actually resolve into a valid IP address. In other words, we aim to identify the botnet's active C&C server.

To achieve this goal, we consider all domain names that are successfully resolved by hosts which have been classified as running a given DGA, say New-DGA-vX, by the DGA Classifier. Then, we test these successfully resolved domains against an HMM specifically trained to recognize domains generated by New-DGA-vX. The HMM analyzes the sequence of characters that compose a domain name $d$, and computes the likelihood that $d$ is generated by New-DGA-vX.

We use an HMM, rather than the DGA Classifier, because for the C&C detection phase we need to classify single domain names. The DGA Classifier is not suitable for this task because it expects as input sets of NXDomains generated by a given host to assign a label to the DGA-bot running on that host. Some of the features used by the DGA Classifier cannot be reliably extracted from a single domain name (see Sections 4.1.1 and 5.2).

4 DGA Discovery

The DGA Discovery module analyzes sequences of NXDomains generated by hosts in a monitored network, and in a completely unsupervised way, clusters NXDomains that are being automatically generated by a DGA. We achieve this goal in multiple steps (see Figure 1). First (Step 1), we collect sequences of NXDomains generated by each host during an epoch $E$. Afterwards (Step 2), we split the overall set of NXDomains generated by all monitored hosts into small subsets, and translate each set into a statistical feature vector (see Section 4.1.1). We then apply the X-means clustering algorithm [24] to group these domain subsets into larger clusters of domain names that have similar string-based characteristics.

Separately (Step 3), we cluster the NXDomains based on a completely different approach that takes into account whether two NXDomains are being queried by overlapping sets of hosts. First, we build a bipartite host association graph in which the two sets of vertices represent distinct hosts and distinct NXDomains, respectively. A host vertex $V_{h_i}$ is connected to an NXDomain vertex $V_{n_j}$ if host $h_i$ queried NXDomain $n_j$. This allows us to identify different NXDomains that have been queried by overlapping sets of hosts. Intuitively, if two NXDomains are queried by multiple common hosts, this indicates that the querying hosts may be running the same DGA. We can then leverage this definition of similarity between NXDomains to cluster them (see Section 4.1.3).

These two distinct views of similarities among NXDomains are then reconciled in a cluster correlation phase (Step 4). This step improves the quality of the final NXDomain clusters by combining the clustering results obtained in Step 2 and Step 3, and reduces possible noise introduced by clusters of domains that may appear similar purely by chance, for example due to similar typos originating from different network users.

The final clusters represent different groups of NXDomains, each containing domain names that are highly likely to be generated by the same DGA. For each of the obtained NXDomain clusters, the question remains if they belong to a known DGA, or a newly discovered one. To answer this question (Step 5), we use the DGA Classifier described in Section 5.2, which is specifically trained to distinguish between sets of NXDomains generated by currently known DGAs. Clusters that match previously modeled DGAs are discarded. On the other hand, if a cluster of NXDomains does not resemble any previously seen DGAs, we identify the cluster of NXDomains as having been generated by a new, previously unknown DGA. These NXDomains will then be sent (Step 6) to the DGA Modeling module, which will update (i.e., re-train) the DGA Classifier component.

4.1 NXDomain Clustering

We now describe the NXDomain Clustering module in detail. First, we introduce the statistical features Pleiades uses to translate small sets of NXDomains into feature vectors, and then discuss how these feature vectors are clustered to find similar NXDomains.

4.1.1 Statistical Features

To ease the presentation of how the statistical features are computed, we first introduce some notation that we will be using throughout this section.

Definitions and Notation A domain name $d$ consists of a set of labels separated by dots, e.g., www.example.com. The rightmost label is called the top-level domain (TLD or TLD(d)), e.g., com. The second-level domain (2LD or 2LD(d)) represents the two rightmost labels separated by a period, e.g., example.com. The third-level domain (3LD or 3LD(d)) contains the three rightmost labels, e.g., www.example.com, and so on.

We will often refer to splitting a sequence $NX = \{d_1, d_2, \ldots, d_m\}$ of NXDomains into a number of subsequences (or subsets) of length $\alpha$, $NX_k = \{d_r, d_{r+1}, \ldots, d_{r+\alpha-1}\}$, where $r = \alpha(k-1) + 1$ and $k = 1, 2, \ldots, \lfloor m/\alpha \rfloor$. The subscript $k$ indicates the $k$-th subsequence of length $\alpha$ in the sequence of $m$ NXDomains $NX$. Each of the $NX_k$ domain sequences can be translated into a feature vector, as described below.
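To make the notation above concrete, the following is a minimal sketch (not the authors' implementation) of extracting domain levels and splitting a stream of NXDomains into subsequences of length α; the helper names are hypothetical.

```python
# Minimal sketch, assuming plain label splitting (not the authors' code):
# extract TLD(d), 2LD(d), 3LD(d) and split NX into subsequences NX_k of length alpha.

def domain_levels(d):
    """Return (TLD(d), 2LD(d), 3LD(d)) for a domain name d."""
    labels = d.lower().strip(".").split(".")
    tld = labels[-1]
    two_ld = ".".join(labels[-2:]) if len(labels) >= 2 else tld
    three_ld = ".".join(labels[-3:]) if len(labels) >= 3 else two_ld
    return tld, two_ld, three_ld

def split_sequence(nx, alpha):
    """Split NX = [d_1, ..., d_m] into floor(m/alpha) subsequences of length alpha."""
    m = len(nx)
    return [nx[r:r + alpha] for r in range(0, (m // alpha) * alpha, alpha)]

# Example: domain_levels("www.example.com") -> ("com", "example.com", "www.example.com")
```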

n-gram Features Given a subsequence $NX_k$ of $\alpha$ NXDomains, we measure the frequency distribution of n-grams across the domain name strings, with $n = 1, \ldots, 4$. For example, for $n = 2$, we compute the frequency of each 2-gram. At this point, we can compute the median, average and standard deviation of the obtained distribution of 2-gram frequency values, thus obtaining three features. We do this for each value of $n = 1, \ldots, 4$, producing 12 statistical features in total. By measuring the median, average and standard deviation, we are trying to capture the shape of the frequency distribution of the n-grams.

Entropy-based Features This group of features computes the entropy of the character distribution for separate domain levels. For example, we separately compute the character entropy for the 2LDs and 3LDs extracted from the domains in $NX_k$. To better understand how these features are measured, consider a set $NX_k$ of $\alpha$ domains. We first extract the 2LD of each domain $d_i \in NX_k$, and for each domain we compute the entropy $H(2LD(d_i))$ of the characters of its 2LD. Then, we compute the average and standard deviation of the set of values $\{H(2LD(d_i))\}_{i=1,\ldots,\alpha}$. We repeat this for 3LDs and for the overall domain name strings. We measure a total of six features, which capture the "level of randomness" in the domains. The intuition is that most DGAs produce random-looking domain name strings, and we want to account for this characteristic of the DGAs.

Structural Domain Features This group of features is used to summarize information about the structure of the NXDomains in $NX_k$, such as their length, the number of unique TLDs, and the number of domain levels. In total, we compute 14 features. Specifically, given $NX_k$, we compute the average, median, standard deviation, and variance of the length of the domain names (four features), and of the number of domain levels (four features). Also, we compute the number of distinct characters that appear in these NXDomains (one feature), the number of distinct TLDs, and the ratio between the number of domains under the .com TLD and the number of domains that use other TLDs (two features). The remaining features measure the average, median, and standard deviation of the occurrence frequency distribution for the different TLDs (three features).
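As a concrete illustration, here is a hedged sketch of how a subset $NX_k$ could be turned into a single feature vector covering the three feature groups above (n-gram, entropy-based, and structural). The exact feature definitions in Pleiades may differ; this is only an approximation written for clarity.

```python
# Hedged sketch (not the authors' code) of turning a subset NX_k of alpha
# NXDomains into one feature vector combining the three groups above.
import math
from collections import Counter
from statistics import mean, median, pstdev, pvariance

def ngram_stats(domains, n):
    """Median, average and standard deviation of the n-gram frequency distribution."""
    counts = Counter()
    for d in domains:
        s = d.replace(".", "")
        counts.update(s[i:i + n] for i in range(len(s) - n + 1))
    freqs = list(counts.values()) or [0]
    return [median(freqs), mean(freqs), pstdev(freqs)]

def entropy(s):
    """Character entropy of the string s."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def feature_vector(nx_k):
    feats = []
    # n-gram features (12): median/avg/std of n-gram frequencies for n = 1..4
    for n in range(1, 5):
        feats += ngram_stats(nx_k, n)
    # entropy-based features (6): avg/std of character entropy of 2LDs, 3LDs, full names
    for level in (lambda d: ".".join(d.split(".")[-2:]),
                  lambda d: ".".join(d.split(".")[-3:]),
                  lambda d: d):
        vals = [entropy(level(d)) for d in nx_k]
        feats += [mean(vals), pstdev(vals)]
    # structural features: lengths, domain levels, characters, TLD usage
    lengths = [len(d) for d in nx_k]
    levels = [len(d.split(".")) for d in nx_k]
    feats += [mean(lengths), median(lengths), pstdev(lengths), pvariance(lengths)]
    feats += [mean(levels), median(levels), pstdev(levels), pvariance(levels)]
    feats.append(len(set("".join(nx_k))))              # distinct characters
    tlds = [d.split(".")[-1] for d in nx_k]
    feats.append(len(set(tlds)))                       # distinct TLDs
    com = sum(1 for t in tlds if t == "com")
    feats.append(com / max(len(tlds) - com, 1))        # .com vs. other-TLD ratio
    tld_freqs = list(Counter(tlds).values())
    feats += [mean(tld_freqs), median(tld_freqs), pstdev(tld_freqs)]
    return feats
```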

4.1.2 Clustering using Statistical Features

To find clusters of similar NXDomains, we proceed as follows. Given the set $NX$ of all NXDomains that we observed from all hosts in the monitored network, we split $NX$ into subsets of size $\alpha$, as mentioned in Section 4.1.1. Assuming $m$ is the number of distinct NXDomains in $NX$, we split the set $NX$ into $\lfloor m/\alpha \rfloor$ different subsets, where $\alpha = 10$.

For each of the obtained subsets $NX_k$ of $NX$, we compute the aforementioned 33 statistical features. After we have translated each $NX_k$ into its corresponding feature vector, we apply the X-means clustering algorithm [24]. X-means will group the $NX_k$ into $X$ clusters, where $X$ is automatically computed by an optimization process internal to X-means itself. At this point, given a cluster $C = \{NX_k\}_{k=1,\ldots,l}$ of $l$ NXDomain subsets, we simply take the union of the $NX_k$ in $C$ as an NXDomain cluster.
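A hedged sketch of this clustering step is shown below. Since a standard X-means implementation is not assumed to be available here, scikit-learn's KMeans with a simple silhouette-based search over the number of clusters stands in for X-means; this only approximates the model selection that X-means performs internally.

```python
# Hedged sketch of Step 2: split the observed NXDomains into subsets of size
# alpha = 10, map each subset to its feature vector, and cluster the vectors.
# scikit-learn's KMeans with a silhouette search stands in for X-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_nxdomains(nx, alpha=10, max_k=30):
    subsets = split_sequence(nx, alpha)               # hypothetical helper from above
    X = np.array([feature_vector(s) for s in subsets])
    best_labels, best_score = None, -1.0
    for k in range(2, min(max_k, len(subsets) - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    # Take the union of the NX_k subsets that fall in the same cluster
    clusters = {}
    for subset, label in zip(subsets, best_labels):
        clusters.setdefault(label, set()).update(subset)
    return list(clusters.values())
```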

4.1.3 Clustering using Bipartite Graphs

Hosts that are compromised with the same DGA-based malware naturally tend to generate (with high probability) partially overlapping sets of NXDomains. On the other hand, other "non-DGA" NXDomains are unlikely to be queried by multiple hosts. For example, it is unlikely that multiple distinct users make identical typos in a given epoch. This motivates us to consider NXDomains that are queried by several common hosts as similar, and in turn use this similarity measure to cluster NXDomains that are likely generated by the same DGA.

To this end, we build a sparse association matrix $M$, where columns represent NXDomains and rows represent hosts that query more than two of the column NXDomains over the course of an epoch. We discard hosts that query only one NXDomain to reduce the dimensionality of the matrix, since they are extremely unlikely to be running a DGA given the low volume of NXDomains they produce. Let a matrix element $M_{i,j} = 0$ if host $h_i$ did not query NXDomain $n_j$. Conversely, let $M_{i,j} = w_i$ if $h_i$ did query $n_j$, where $w_i$ is a weight.

All non-zero entries related to a host $h_i$ are assigned the same weight $w_i \sim \frac{1}{k_i}$, where $k_i$ is the number of NXDomains queried by host $h_i$. Clearly, $M$ can be seen as a representation of a bipartite graph, in which a host vertex $V_{h_i}$ is connected to an NXDomain vertex $V_{n_j}$ with an edge of weight $w_i$ if host $h_i$ queried NXDomain $n_j$ during the epoch under consideration. The intuition behind the particular method we use to compute the weights $w_i$ is that we expect that the higher the number of unique NXDomains queried by a host $h_i$ (i.e., the higher $k_i$), the less likely the host is "representative" of the NXDomains it queries. This is in a way analogous to the inverse document frequency used in the text mining domain [1, 7].

Algorithm 1: Spectral clustering of NXDomains.
INPUT: Sparse matrix $M \in \Re^{l \times k}$, in which the rows represent $l$ hosts and the columns represent $k$ NXDomains.
[1]: Normalize $M$: $\forall j = 1, \ldots, k$, $\sum_{i=1}^{l} M_{i,j} = 1$.
[2]: Compute the similarity matrix $S$ from $M$: $S = M^T \cdot M$.
[3]: Compute the first $\rho$ eigenvectors from $S$ by eigen-decomposition. Let $U \in \Re^{\rho \times k}$ be the matrix containing $k$ vectors $u_1, \ldots, u_k$ of size $\rho$ resulting from the eigen-decomposition of $S$ (a vector $u_i$ is a reduced $\rho$-dimensional representation of the $i$-th NXDomain).
[4]: Cluster the vectors (i.e., the NXDomains) $\{u_i\}_{i=1,\ldots,k}$ using the X-means algorithm.
OUTPUT: Clusters of NXDomains.

Once $M$ is computed, we apply a graph partitioning strategy based on spectral clustering [21, 22], as summarized in Algorithm 1. As a first step, we compute the first $\rho$ eigenvectors of $M$ (we use $\rho = 15$ in our experiments), and then we map each NXDomain (each column of $M$) into a $\rho$-dimensional vector. In effect, this mapping greatly reduces the dimensionality of the NXDomain vectors from the total number of hosts (the number of rows in $M$) to $\rho$. We then use the obtained $\rho$-dimensional NXDomain representations and apply X-means to cluster the NXDomains based on their "host associations". Namely, NXDomains are grouped together if they have been queried by a similar set of hosts.
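A rough sketch of Algorithm 1 using NumPy/SciPy is given below; the column normalization, the eigen-decomposition of $S = M^T M$, and the use of KMeans as a stand-in for X-means are illustrative assumptions, not the authors' implementation.

```python
# Rough sketch of Algorithm 1 (spectral clustering of NXDomains), with
# KMeans standing in for X-means and rho = 15 as in the text.
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def spectral_cluster_nxdomains(M, rho=15, n_clusters=10):
    """M: l x k host/NXDomain association matrix (rows: hosts, columns: NXDomains)."""
    M = csr_matrix(M, dtype=float)
    # [1] Normalize M so that every column sums to 1
    col_sums = np.asarray(M.sum(axis=0)).ravel()
    col_sums[col_sums == 0] = 1.0
    M = M @ diags(1.0 / col_sums)
    # [2] Similarity matrix S = M^T * M (k x k, stays sparse)
    S = (M.T @ M).tocsc()
    # [3] First rho eigenvectors of S; row i is the rho-dimensional
    #     representation u_i of the i-th NXDomain
    _, U = eigsh(S, k=rho, which="LM")
    # [4] Cluster the reduced NXDomain representations
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(U)
```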

4.1.4 Cluster Correlation

We now have two complementary views of how the NXDomains should be grouped based on two different definitions of similarity between domain names. Neither view is perfect, and the produced clusters may still contain noise. Correlating the two results helps filter the noise and output clusters of NXDomains that are more likely to be generated by a DGA. Cluster correlation is performed in the following way.

Let $A = \{A_1, \ldots, A_n\}$ be the set of NXDomain clusters obtained by using statistical features, as described in Section 4.1.2, and $B = \{B_1, \ldots, B_m\}$ be the set of NXDomain clusters derived from the bipartite graph partitioning approach discussed in Section 4.1.3. We compute the intersection between all possible pairs of clusters $I_{i,j} = A_i \cap B_j$, for $i = 1, \ldots, n$ and $j = 1, \ldots, m$. All correlated clusters $I_{i,j}$ that contain less than a predefined number $\lambda$ of NXDomains (i.e., $|I_{i,j}| < \lambda$) are discarded, while the remaining correlated clusters are passed to the DGA filtering module described in Section 4.2. Clusters that are not sufficiently agreed upon by the two clustering approaches are not considered for further processing. We empirically set $\lambda = 40$ in preliminary experiments.
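This correlation step amounts to a pairwise set intersection with a minimum-size filter, as in the following sketch (λ = 40 as in the text).

```python
# Sketch of the cluster-correlation step: intersect every pair of clusters from
# the two views and keep intersections with at least lambda = 40 NXDomains.
def correlate_clusters(A, B, lam=40):
    """A, B: lists of NXDomain clusters (collections of domain strings)."""
    correlated = []
    for Ai in A:
        for Bj in B:
            Iij = set(Ai) & set(Bj)
            if len(Iij) >= lam:
                correlated.append(Iij)
    return correlated
```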

4.2 DGA Filtering

The DGA filtering module receives the NXDomain clusters from the clustering module. This filtering step compares the newly discovered NXDomain clusters to domains generated by known DGAs that we have already discovered and modeled. If the NXDomains in a correlated cluster $I_{i,j}$ are classified as being generated by a known DGA, we discard the cluster $I_{i,j}$. The reason is that the purpose of the DGA Discovery module is to find clusters of NXDomains that are generated (with high probability) by a new, never before seen DGA. At the same time, this filtering step is responsible for determining if a cluster of NXDomains is too noisy, i.e., if it likely contains a mix of DGA and "non-DGA" domains.

To this end, we leverage the DGA Classifier described in detail in Section 5. At a high level, we can treat the DGA Classifier as a function that takes as input a set $NX_k$ of NXDomains, and outputs a set of tuples $\{(l_t, s_t)\}_{t=1,\ldots,c}$, where $l_t$ is a label (e.g., DGA-Conficker.C), $s_t$ is a score that indicates how confident the classifier is in attributing label $l_t$ to $NX_k$, and $c$ is the number of different classes (and labels) that the DGA Classifier can recognize.

When the DGA filtering module receives a new correlated cluster of NXDomains $I_{i,j}$, it splits the cluster into subsets of $\alpha$ NXDomains, and then passes each of these subsets to the DGA Classifier. Assume $I_{i,j}$ is divided into $n$ different subsets. From the DGA Classifier, we obtain as a result $n$ sets of tuples $\{\{(l_t, s_t)\}^{(1)}_{t=1,\ldots,c}, \{(l_t, s_t)\}^{(2)}_{t=1,\ldots,c}, \ldots, \{(l_t, s_t)\}^{(n)}_{t=1,\ldots,c}\}$.

First, we consider for each set of tuples $\{(l_t, s_t)\}^{(k)}_{t=1,\ldots,c}$, with $k = 1, \ldots, n$, the label $l^{(k)}$ that was assigned the maximum score. We consider a cluster $I_{i,j}$ as too noisy if the related labels $l^{(k)}$ are too diverse. Specifically, a cluster is too noisy when the majority label among the $l^{(k)}$, $k = 1, \ldots, n$, was assigned to less than $\theta_{maj} = 75\%$ of the $n$ domain subsets. The clusters that do not pass the $\theta_{maj}$ "purity" threshold will be discarded. Furthermore, NXDomain clusters whose majority label is the Legitimate label will also be discarded.

For each remaining cluster, we perform an additional "purity" check. Let the majority label for a given cluster $I_{i,j}$ be $l^*$. Among the set $\{\{(l_t, s_t)\}^{(k)}_{t=1,\ldots,c}\}_{k=1,\ldots,n}$ we take all the scores $s_t$ whose related $l_t = l^*$. That is, we take the confidence score assigned by the classifier to the domain subsets that have been labeled as $l^*$, and then we compute the average $\mu(s_t)$ and the variance $\sigma^2(s_t)$ of these scores (notice that the scores $s_t$ are in $[0,1]$). We discard clusters whose $\sigma^2(s_t)$ is greater than a predefined threshold $\theta_\sigma = 0.001$, because we consider the domains in the cluster as not being sufficiently similar to the majority label class.

At this point, if $\mu(s_t) < \theta_\mu$, with $\theta_\mu = 0.98$, we deem the NXDomain cluster to be not similar enough to the majority label class, and instead we label it as "new DGA" and pass it to the DGA Modeling module. On the other hand, if $\mu(s_t) \geq \theta_\mu$, we confirm the majority label class (e.g., DGA-Conficker.C) and do not consider it further.
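The filtering decisions above can be summarized by the following hedged sketch; `subset_results` stands for the hypothetical output of the DGA Classifier on the α-sized subsets of a correlated cluster (one winning label and confidence score per subset).

```python
# Hedged sketch of the filtering decisions above. subset_results is the
# hypothetical per-subset output of the DGA Classifier: (winning label, score).
from statistics import mean, pvariance

def filter_cluster(subset_results, theta_maj=0.75, theta_sigma=0.001, theta_mu=0.98):
    n = len(subset_results)
    labels = [lbl for lbl, _ in subset_results]
    majority = max(set(labels), key=labels.count)
    # Too noisy, or predominantly legitimate: discard
    if labels.count(majority) / n < theta_maj or majority == "Legitimate":
        return "discard"
    scores = [s for lbl, s in subset_results if lbl == majority]
    if pvariance(scores) > theta_sigma:
        return "discard"      # not sufficiently similar to the majority class
    if mean(scores) < theta_mu:
        return "new-DGA"      # hand the cluster to the DGA Modeling module
    return majority           # confirmed as an already modeled DGA
```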

The particular choices for the values of the above mentioned thresholds are motivated in Section 7.2.

5 DGA Classification and C&C Detection

Once a new DGA is reported by the DGA Discovery module, we use a supervised learning approach to learn how to identify hosts that are infected with the related DGA-based malware by analyzing the set of NXDomains they generate. To identify compromised hosts, we collect the set of NXDomains $NX_{h_i}$ generated by a host $h_i$, and we ask the DGA Classifier whether $NX_{h_i}$ likely "belongs" to a previously seen DGA or not. If the answer is yes, $h_i$ is considered to be compromised and will be labeled with the name of the (suspected) DGA-bot that it is running.

In addition, we aim to build a classifier that can analyze the set of active domain names, say $AD_{h_i}$, resolved by a compromised host $h_i$ and reduce it to a smaller subset $CC_{h_i} \subset AD_{h_i}$ of likely C&C domains generated by the DGA running on $h_i$. Finally, the set $CC_{h_i}$ may be manually inspected to confirm the identification of C&C domain(s) and related IPs. In turn, the list of C&C IPs may be used to maintain an IP blacklist, which can be employed to block C&C communications and mitigate the effects of the malware infection. We now describe the components of the DGA classification and C&C detection module in more detail.


5.1 DGA Modeling

As mentioned in Section 4.2, the NXDomain clusters that pass the DGA Filtering and do not fit any known DGA model are (automatically) assigned a New-DGA-vX label, where X is a unique identifier. At this point, we build two different statistical models representative of New-DGA-vX: (1) a statistical multi-class classifier that can assign a specific DGA label to the set of NXDomains generated by a host $h_i$ and (2) a Hidden Markov Model (HMM) that can compute the probability that a single active domain queried by $h_i$ was generated by the DGA running on the host, thus producing a list of candidate C&C domains.

The DGA Modeling module takes as input the following information: (1) a list of popular legitimate domain names extracted from the top 10,000 domains according to alexa.com; (2) the list of NXDomains generated by running known DGA-bots in a controlled environment (see Section 6); (3) the clusters of NXDomains received from the DGA Discovery module. Let $NX$ be one such newly discovered cluster of NXDomains. Because in some cases $NX$ may contain relatively few domains, we attempt to extend the set $NX$ to a larger set $NX'$ that can help build better statistical models for the new DGA. To this end, we identify all hosts that "contributed" to the NXDomains clustered in $NX$ from our sparse association matrix $M$, and we gather all the NXDomains they generated during an epoch. For example, for a given host $h_i$ that generated some of the domains clustered in $NX$, we gather all the other NXDomains $NX'_{h_i}$ generated by $h_i$. We then add the set $NX' = \bigcup_i NX'_{h_i}$ to the training dataset (marked with the appropriate new DGA label). The reader may at this point notice that the set $NX'_{h_i}$ may contain not only NXDomains generated by a host $h_i$ due to running a DGA, but it may also include NXDomains "accidentally" generated by $h_i$. Therefore, this may introduce some noisy instances into the training dataset. However, the number of "accidental" NXDomains is typically very small, compared to the number of NXDomains generated by a DGA. Therefore, we rely on the generalization ability of the statistical learning algorithms we use to smooth away the effects of this potential source of noise. This approach works well in practice, as we will show in Section 7.
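A minimal sketch of this training-set extension, assuming a hypothetical host_to_nxdomains mapping derived from the association matrix M:

```python
# Minimal sketch of the training-set extension: gather all NXDomains queried
# during the epoch by the hosts that contributed to the new cluster NX.
# host_to_nxdomains is a hypothetical mapping (host -> set of NXDomains)
# derived from the association matrix M.
def extend_training_set(NX, host_to_nxdomains):
    NX_prime = set()
    for host, domains in host_to_nxdomains.items():
        if domains & NX:                  # the host contributed to NX
            NX_prime |= domains
    return NX_prime
```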

5.2 DGA Classifier

The DGA Classifier is based on a multi-class version of the Alternating Decision Trees (ADT) learning algorithm [9]. ADT leverages the high classification accuracy obtained by Boosting [17], while producing compact classification rules that can be more easily interpreted.

To detect hosts that are compromised with DGA-based malware, we monitor all NXDomains generated by each host in the monitored network and periodically send this information to the DGA Classifier. Given a set $NX_{h_i}$ of NXDomains generated by host $h_i$, we split $NX_{h_i}$ into subsets of length $\alpha$, and from each of these subsets we extract a number of statistical features, as described in Section 4.1.1. If one of these subsets of NXDomains is labeled by the DGA Classifier as being generated by a given DGA, we mark host $h_i$ as compromised and we add its IP address and the assigned DGA label to a malware detection report.
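The per-host classification step can be sketched as follows; `classifier` is a trained multi-class model (the paper uses multi-class Alternating Decision Trees, while this illustration only assumes a generic scikit-learn-style predict() interface), and the feature helpers are the hypothetical ones from the earlier sketches.

```python
# Sketch of per-host classification. classifier is a trained multi-class model
# (the paper uses multi-class Alternating Decision Trees; any model exposing a
# scikit-learn-style predict() works for this illustration), and split_sequence
# and feature_vector are the hypothetical helpers from the earlier sketches.
def classify_host(host_ip, nx_host, classifier, alpha=10):
    detections = []
    for subset in split_sequence(sorted(nx_host), alpha):
        label = classifier.predict([feature_vector(subset)])[0]
        if label != "Legitimate":
            detections.append((host_ip, label))   # host deemed compromised
    return detections
```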

5.3 C&C Detection

The C&C Detection module is based on Hidden Markov Models (HMM) [28]. We use one distinct HMM per DGA. Given the set $NX_D$ of domains generated by a DGA $D$, we consider each domain $d \in NX_D$ separately, and feed these domains to an HMM for training. The HMM sees the domain names simply as a sequence of characters, and the result of the training is a model $HMM_D$ that, given a new domain name $s$ in input, will output the likelihood that $s$ was generated by $D$.

We use left-to-right HMMs, as they are used in practice to decrease the complexity of the model, effectively mitigating problems related to under-fitting. The HMM's emission symbols are represented by the set of characters allowed in valid domain names (i.e., alphabetic characters, digits, '_', '-', and '.'). We set the number of hidden states to be equal to the average length of the domain names in the training dataset.

During operation, the C&C Detection module receives active domain names queried by hosts that have been previously classified by the DGA Classifier as being compromised with a DGA-based malware. Let $h_i$ be one such host, and $D$ be the DGA running on $h_i$. The C&C Detection module will send every domain $s$ resolved by $h_i$ to $HMM_D$, which will compute a likelihood score $f(s)$. If $f(s) > \theta_D$, $s$ is flagged as a good candidate C&C domain for DGA $D$.

The threshold $\theta_D$ can be learned during the training phase. First, we train the HMM with the set $NX_D$. Then, we use a set $L$ of legitimate "non-DGA" domains from Alexa. For each domain $l \in L$, we compute the likelihood $f(l)$ and set the threshold $\theta_D$ so as to obtain a maximum target false positive rate (e.g., max FPs = 1%).
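A hedged sketch of the training and threshold-calibration steps is shown below. It assumes hmmlearn's CategoricalHMM (named MultinomialHMM in older hmmlearn releases) and, for simplicity, trains an unconstrained HMM; the left-to-right topology described above would additionally require constraining the transition matrix. It also assumes the training domains cover the whole emission alphabet.

```python
# Hedged sketch of HMM-based C&C detection: train one HMM per DGA on its
# NXDomains, then calibrate the threshold theta_D on legitimate (Alexa)
# domains so that at most target_fp of them would be flagged.
import numpy as np
from hmmlearn.hmm import CategoricalHMM

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-_."
CHAR2IDX = {c: i for i, c in enumerate(ALPHABET)}

def encode(domain):
    return np.array([[CHAR2IDX[c]] for c in domain.lower() if c in CHAR2IDX])

def train_dga_hmm(nx_domains, legit_domains, target_fp=0.01):
    n_states = int(np.mean([len(d) for d in nx_domains]))   # states ~ average length
    seqs = [encode(d) for d in nx_domains]
    X = np.concatenate(seqs)
    lengths = [len(s) for s in seqs]
    hmm = CategoricalHMM(n_components=n_states, n_iter=50, random_state=0)
    hmm.fit(X, lengths)
    # Pick theta_D as the (1 - target_fp) quantile of the likelihoods that the
    # model assigns to legitimate domains, so at most target_fp exceed it.
    legit_scores = [hmm.score(encode(d)) for d in legit_domains]
    theta = np.quantile(legit_scores, 1.0 - target_fp)
    return hmm, theta

def is_candidate_cc(domain, hmm, theta):
    return hmm.score(encode(domain)) > theta
```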

6 Data Collection

In this section we provide an overview of the amount of NXDomain traffic we observed during a period of fifteen consecutive months (our evaluation period), starting on November 1st, 2010 and ending on January 15th, 2012. Afterwards, we discuss how we collected the domain names used to train and test our DGA Classifier (see Section 5).


Figure 2: Observations from NXDomain traffic collected below a set of ISP recursive DNS servers over a 439 day window.

6.1 NXDomain Traffic

We evaluated Pleiades over a 15-month period against DNS traffic obtained by monitoring DNS messages to/from a set of recursive DNS resolvers operated by a large North American ISP. These servers were physically located in the US, and served (on average) over 2 million client hosts per day (we estimated the number of hosts by computing the average number of distinct client IPs seen per day). Our monitoring point was "below" the DNS servers, thus providing visibility on the NXDomains generated by the individual client hosts.

Figure 2(a) reports, for each day, (1) the number of NXDomains as seen in the raw DNS traffic, (2) the number of distinct hosts that in the considered day queried at least one NXDomain, and (3) the number of distinct (de-duplicated) NXDomains (we also filter out domain names that do not have a valid effective TLD [15, 19, 20]). The abrupt drop in the number of NXDomains and hosts (roughly a 30% reduction) experienced between 2011-03-24 and 2011-06-17 was due to a configuration change at the ISP network.

On average, we observed about 5 million (raw) NXDomains, 187,600 distinct hosts that queried at least one NXDomain, and 360,700 distinct NXDomains overall, each day. Therefore, the average size of the association matrix $M$ used to perform spectral clustering (see Section 4.1.3) was 187,600 × 360,700. However, it is worth noting that $M$ is sparse and can be efficiently stored in memory. In fact, the vast majority (about 90%) of hosts query less than 10 NXDomains per day, and therefore most rows in $M$ will contain only a few non-zero elements. This is shown in Figure 2(b), which reports the cumulative distribution function (CDF) for the volume of NXDomains queried by a host in the monitored network. On the other hand, Figure 2(c) shows the CDF for the number of hosts that query an NXDomain (this relates directly to the sparseness of $M$ according to its columns).

6.2 Ground Truth

In order to generate the ground truth to train and evaluate the DGA Classifier (Section 5), we used a simple approach. To collect the NXDomains generated by known DGA-based malware we used two different methods. First, because the DGAs used by different variants of Conficker and by Murofet are known (derived through reverse-engineering), we simply used the respective algorithms to generate a set of domain names from each of these botnets. To obtain a sample set of domains generated by Bobax and Sinowal, whose exact DGA algorithm is not known (at least not to us), we simply executed two malware samples (one per botnet) in a VM-based malware analysis framework that only allows DNS traffic (we only allowed UDP port 53), while denying any other type of traffic. Overall we collected 30,000 domains generated by Conficker, 26,078 from Murofet, 1,283 from Bobax and 1,783 from Sinowal.

Finally, we used the top 10,000 most popular domains according to alexa.com, with and without the www. prefix. Therefore, overall we used 20,000 domain names to represent the "negative" (i.e., "non-DGA") class during the training and testing of the DGA Classifier.

7 Analysis

In this section, we present the experimental results of our system. We begin by demonstrating Pleiades' modeling accuracy with respect to known DGAs like Conficker, Sinowal, Bobax and Murofet. Then, we elaborate on the DGAs we discovered throughout the fifteen month NXDomain monitoring period. We conclude the section by summarizing the most interesting findings from the twelve DGAs we detected. Half of them use a DGA algorithm from a known malware family. The other half, to the best of our knowledge, have no known malware association.

Table 1: Detection results (in %) using 10-fold cross validation for different values of α.

                α = 5 NXDomains           α = 10 NXDomains
Class         TPrate   FPrate   AUC     TPrate   FPrate   AUC
Bobax           95      0.4      97       99       0       99
Conficker       98      1.4      98       99       0.1     99
Sinowal         99      0.1      98      100       0      100
Murofet         98      0.7      98       99       0.2     99
Benign          96      0.7      97       99       0.1     99

7.1 DGA Classifier’s Detection Results

In this section, we present the accuracy of the DGA classifier. We bootstrap the classifier with NXDomains from Bobax, Sinowal, Conficker-A, Conficker-B, Conficker-C and Murofet. We test the classifier in two modes. The first mode is bootstrapped with a "super" Conficker class composed of an equal number of samples from the Conficker-A, Conficker-B and Conficker-C classes, and another with each Conficker variant as its own class. As we mentioned in Section 5.2, the DGA classifier is based on a multi-class version of the Alternating Decision Trees (ADT) learning algorithm [9]. We build the vectors for each class by collecting NXDomains from one day of honeypot traffic (in the case of Sinowal and Bobax) and one day of NXDomains produced by the DGAs for Conficker-A, Conficker-B, Conficker-C and Murofet. Finally, the domain names that were used to represent the benign class were the first 10,000 Alexa domain names, with and without the www. child labels.

From the raw domain names in each of the classes, we randomly selected 3,000 sets of cardinality α. As a reminder, the values of α that we used were two, five, ten and 30. This was to build different training datasets in order to empirically decide which value of α would provide the best separation between the DGA models.

We generated additional testing datasets. The domain names we used in this case were from each class as in the case of the training dataset, but we used different days. We do that so we get the minimum possible domain name overlap between the training and testing datasets. We evaluate the training datasets using two methods: 10-fold cross validation on the training dataset, and by using the testing datasets computed from domains collected on different days. Both methods gave us very similar results. Our system performed the worst in the case of the 10-fold cross validation, therefore we chose to present this worst-case scenario.

In Table 1, we can see the detection results using two values for α, five and ten. We omit the results for the other values due to space limitations. The main confusion between the classes was observed in the datasets that contained separate Conficker classes, specifically between the classes of Conficker-A and Conficker-B. To address this problem, we created a generic Conficker class that had an equal number of vectors from each Conficker variant. This merging of the Conficker variants into a single "super" class allowed the DGA classifier to correctly classify 99.72% (Table 1) of the instances (7,986 correctly classified vs 22 incorrectly classified). Using the datasets with the five classes of DGAs, the weighted average of the TPrates and FPrates were 99.7% and 0.1%, respectively. As we see in Table 1, α = 5 performs reasonably well, but with a higher rate of FPs.

7.2 NXDomain Clustering Results

In this section, we will discuss results from the DGA discovery module. In particular, we elaborate on the selection of the thresholds used, the unique clusters identified and the false alerts the DGA discovery module produced over the duration of our study.

7.2.1 Correlation Thresholds

In order to set the thresholds $\theta_{maj}$ and $\theta_\sigma$ defined in Section 4.2, we spent the first five days of November 2010 labeling the 213 produced clusters as DGA related (Positive) or noisy (Negative). For this experiment, we included all produced clusters without filtering out those with $\theta_\mu = 98\%$ (or higher) "similarity" to an already known one (see Section 4.2). In Figure 3, we can see on the Y-axis the percentage values for the dominant (non-benign) class in every cluster produced during these five days. On the X-axis we can see the variance that each dominant class had within each cluster. The results show that the Positive and Negative assignments had a clear cut, which we can achieve by setting the thresholds as $\theta_{maj} = 75\%$ and $\theta_\sigma = 0.001$. These thresholds gave us very good results throughout the duration of the experiments. As we will discuss in Section 7.2.3, the DGA discovery module falsely reported only five benign clusters over a period of 15 months. All falsely reported clusters had variance very close to 0.001.

Figure 3: Thresholds $\theta_{maj}$ and $\theta_\sigma$ from the first five days of November 2010.

7.2.2 New DGAs

Pleiades began clustering NXDomain traffic on the first day of November 2010. We bootstrapped the DGA modeler with domain names from already known DGAs and also a set of Alexa domain names as the benign class. In Table 2, we present all unique clusters we discovered throughout the evaluation period. The "Malware Family" column simply maps the variant to a known malware family if possible. We discover the malware family by checking the NXDomains that overlap with NXDomains we extracted from traffic obtained from a malware repository. Also, we manually inspected the clusters with the help of a security company's threat team. The "First Seen" column denotes the first time we saw traffic from each DGA variant. Finally, the "Population on Discovery" column shows the variant population on the discovery day. We can see that we can detect each DGA variant with an average number of 32 "infected hosts" across the entire statewide ISP network coverage.

Table 2: DGAs Detected by Pleiades.

Malware Family       First Seen   Population on Discovery
Shiz/Simda-C [32]    03/20/11     37
Bamital [11]         04/01/11     175
BankPatch [5]        04/01/11     28
Expiro.Z [8]         04/30/11     7
Boonana [41]         08/03/11     24
Zeus.v3 [25]         09/15/11     39
New-DGA-v1           01/11/10     12
New-DGA-v2           01/18/11     10
New-DGA-v3           02/01/11     18
New-DGA-v4           03/05/11     22
New-DGA-v5           04/21/11     5
New-DGA-v6           11/20/11     10

Figure 4: A sample of ten NXDomains for each DGA cluster that we could not associate with a known malware family. (Examples: New-DGA-v1: 71f9d3d1.net, a8459681.com, 1738a9aa.info; New-DGA-v2: clfnoooqfpdc.com, slsleujrrzwx.com; New-DGA-v3: uwhornfrqsdbrbnbuhjt.com, epmsgxuotsciklvywmck.com; New-DGA-v4: semk1cquvjufayg02orednzdfg.com, invfgg4szr22sbjbmdqm51pdtf.com; New-DGA-v5: zpdyaislnu.net, vvbmjfxpyi.net; New-DGA-v6: lymylorozig.eu, lyvejujolec.eu.)

As we see in Table 2, Pleiades reported six variants that belong to known DGA-enabled malware families [5, 8, 11, 25, 32, 41]. Six more variants of NXDomains were reported and modeled by Pleiades, but for these, to the best of our knowledge, no known malware can be associated with them. A sample set of 10 domain names for each one of these variants can be seen in Figure 4.

Over the 15 months of our observations, we saw an average population of 742 Conficker-infected hosts in the ISP network. Murofet had the second largest population of infected hosts at 92 per day, while the Boonana DGA comes third with an average population of 84 infected hosts per day. The fastest growing DGA is Zeus.v3, with an average population of 50 hosts per day; however, during the last four days of the experiments the Zeus.v3 DGA had an average of 134 infected hosts. It is worth noting that New-DGA-v1 had an average of 19 hosts per day, making it the most populous of the newly identified DGAs.

7.2.3 False Reports on New DGAs

During our evaluation period we came across five categories of clusters falsely reported as new DGAs. In all of the cases, we modeled these classes in the DGA modeler as variants of the benign class. We now discuss each case in detail.

The first cluster of NXDomains falsely reported by Pleiades consisted of random domain names generated by Chrome [16, 45]. Each time the Google Chrome browser starts, it queries three “random looking” domain names. These domain names are issued as a DNS check, so the browser can determine whether NXDomain rewriting is enabled. The “Chrome DGA” was reported by Pleiades as a variant of Bobax. We trained a class for this DGA and flagged it as benign. One more case of testing for NXDomain rewriting was identified in a brand of wireless access points. Connectify³ offers wireless hot-spot functionality, and one of its configuration options enables the user to hijack the ISP's default NXDomain rewriting service. The device generates a fixed number of NXDomains to test for rewriting.
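For illustration only, the sketch below mimics this kind of benign NXDomain-rewriting probe. It is not Chrome's actual generator (we do not reproduce its name lengths or alphabet), but it shows why such probes emit random-looking NXDomains that can resemble DGA output.

    import random
    import socket
    import string

    def random_probe_name(length=10):
        """Hypothetical stand-in for a browser's random probe hostname."""
        return "".join(random.choices(string.ascii_lowercase, k=length))

    def nxdomain_rewriting_suspected(probes=3):
        """Issue a few random single-label lookups; if any of them resolves,
        the local resolver is probably rewriting NXDomain responses."""
        for _ in range(probes):
            try:
                socket.gethostbyname(random_probe_name())
                return True        # a random name resolved: likely rewriting
            except socket.gaierror:
                continue           # expected failure (NXDomain)
        return False

    if __name__ == "__main__":
        print("NXDomain rewriting suspected:", nxdomain_rewriting_suspected())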

Two additional cases of false reports were triggered by domain names from the .it and .edu TLDs. These domain names contained minor variations on common words (e.g., repubblica, gazzetta, computer, etc.). Domain names that matched these clusters appeared only for two days in our traces and never again. The very short-lived presence of these two clusters could be explained if the domain names were part of a spam campaign that was remediated by authorities before it became live.

The fifth case of false reports originated from domain names under a US government zone that contained the string wpdhsmp. Our best guess is that these are internal domain names that were accidentally leaked to the recursive DNS server of our ISP. Domain names from this cluster appeared only for one day. This class of NXDomains was also modeled as a benign variant. It is worth noting that all falsely reported DGA clusters, excluding the Chrome cluster, were short-lived. If operators are willing to wait a few days after a new DGA cluster is reported by Pleiades, these false alarms would not be raised.

³ www.connectify.me


Table 3: TPs (%) for C&C detection (1,000 training sequences).

                 FPs (%)
Botnet       0.1      0.5      1        3        5        10
Zeus.v3      99.9     99.9     99.9     99.9     99.9     99.9
Expiro.Z     33.03    64.56    78.23    91.77    95.23    98.67
Bamital      100      100      100      100      100      100
Shiz         0        1.64     21.02    96.58    100      100
Boonana      3.8      10.69    15.59    27.67    35.05    48.43
BankPatch    56.21    70.77    93.18    99.9     99.91    99.94


7.3 C&C Detection

To evaluate the effectiveness of the C&C Detection, we proceeded as follows. We considered the six new DGAs which we were able to attribute to specific malware, as shown in Table 3. Let NX_i be the set of NXDomains collected by the DGA Discovery (Section 4) and DGA Modeling (Section 5.1) modules for the i-th DGA. For each DGA, we set aside a subset NX_i^train ⊂ NX_i of NXDomains to train an HMM_i model. Then we use the remaining NX_i^test = NX_i − NX_i^train to compute the true positive (TP) rate of HMM_i, and a set A that consists of 602,969 unique domain names related to the consistently popular domain names according to alexa.com to compute the false positive (FP) rate. To obtain A we first consider all domain names that have been consistently ranked in the top 100,000 popular domains by alexa.com for approximately one year. This gave us a set T of about 60,000 “stable” popular domain names, which we consider as legitimate domains. Then, we monitored the stream of successful DNS queries in a large live network for a few hours, and we added to A all the domain names whose effective 2LD is in T.
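A sketch of this filtering step is shown below. It is our own illustration, using the tldextract package as one convenient (assumed) way to compute an effective 2LD, and hypothetical inputs in place of the real Alexa-derived set T and live query stream.

    import tldextract  # public-suffix-aware effective-2LD extraction (assumed dependency)

    def build_fp_set(stable_popular_2lds, resolved_fqdns):
        """Keep only successfully resolved names whose effective 2LD is in the
        set T of consistently popular (assumed legitimate) domains."""
        selected = set()
        for fqdn in resolved_fqdns:
            e2ld = tldextract.extract(fqdn).registered_domain
            if e2ld in stable_popular_2lds:
                selected.add(fqdn.lower())
        return selected

    # Hypothetical usage:
    T = {"google.com", "wikipedia.org"}          # stand-in for the ~60,000 stable 2LDs
    stream = ["mail.google.com", "en.wikipedia.org", "azkjqoweipz.biz"]
    print(build_fp_set(T, stream))               # only the first two names survive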

We performed experiments with a varying number c = |NX_i^train| of training samples. Specifically, we set c equal to 100, 200, 500, 1,000, 2,000, 5,000, and 10,000. We then computed the trade-off between TPs and FPs for different detection thresholds. In the interest of space, we report only the results for c = 1,000 in Table 3. In general, the results improve for increasing numbers of training instances. We set the detection threshold so as to obtain an FP rate equal to 0.1%, 0.5%, 1%, 3%, 5%, and 10%. As we can see, at FP = 1% we obtained a high (> 93%) TP rate for three out of six DGAs, and relatively good results (> 78%) in five out of six cases. At FP = 3% we have a high TP rate (> 91%) in five out of six cases.

As mentioned in Section 3, the C&C Detection module reduces the set of domain names successfully resolved by a host h that has been labeled as compromised with DGA-malware to a smaller set of good candidate C&C domains generated by the DGA. The results in Table 3 show that if we rank the domains resolved by h according to the likelihood assigned by the HMM, in most cases we will only need to inspect between 1/100 and 3/100 of the active domains queried by h to discover the C&C.
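As a concrete (simplified) illustration of this ranking step, the sketch below substitutes a first-order character-transition model for the per-DGA HMM used by Pleiades: it is trained on a DGA's NXDomains and then used to sort a host's resolved domains so that the most DGA-like names come first. All function and variable names are ours.

    import math
    from collections import defaultdict

    ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-."

    def train_char_model(dga_nxdomains, smoothing=1.0):
        """Train a first-order character-transition model on a DGA's NXDomains
        (a simplified stand-in for the per-DGA HMM)."""
        counts = defaultdict(lambda: defaultdict(float))
        for name in dga_nxdomains:
            for a, b in zip(name, name[1:]):
                counts[a][b] += 1.0
        model = {}
        for a in ALPHABET:
            total = sum(counts[a].values()) + smoothing * len(ALPHABET)
            model[a] = {b: (counts[a][b] + smoothing) / total for b in ALPHABET}
        return model

    def avg_log_likelihood(model, name):
        """Average per-transition log-likelihood of a domain name under the model."""
        name = name.lower()
        pairs = [(a, b) for a, b in zip(name, name[1:])
                 if a in model and b in ALPHABET]
        if not pairs:
            return float("-inf")
        return sum(math.log(model[a][b]) for a, b in pairs) / len(pairs)

    def rank_candidate_cc_domains(model, resolved_domains):
        """Sort a compromised host's successfully resolved domains so that the
        most DGA-like names (candidate C&C domains) come first."""
        return sorted(resolved_domains,
                      key=lambda d: avg_log_likelihood(model, d), reverse=True)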

7.4 Case Studies

7.4.1 Zeus.v3

In September 2011, Pleiades detected a new DGA that we linked to the Zeus.v3 variant a few weeks later. The domain names collected from the machines compromised by this DGA-malware are hosted in six different TLDs: .biz, .com, .info, .net, .org and .ru. Excluding the top-level domains, the length of the domain names generated by this DGA is between 33 and 45 alphanumeric characters. By analyzing one sample of the malware⁴ we observed that its primary C&C infrastructure is P2P-based. If the malware fails to reach its P2P C&C network, it follows a contingency plan, where a DGA-based component is used to try to recover from the loss of C&C communication. The malware will then resolve pseudo-random domain names until an active C&C domain name is found.

To date, we have discovered 12 such C&C domains. Over time, these 12 domains resolved to five different C&C IPs hosted in four different networks, three in the US (AS6245, AS16626 and AS3595) and one in the United Kingdom (AS24931). Interestingly, we observed that the UK-based C&C IP address remained active for a very short period of time of only a few minutes, from Jan 25, 2012 12:14:04 EST to Jan 25, 2012 12:22:37 EST. The C&C moved from a US IP (AS16626) to the UK (AS24931), and then almost immediately back to the US (AS3595).

7.4.2 BankPatch

We picked the BankPatch DGA cluster as a sample case for analysis since this botnet had been active for several months during our experiments and the infected population continues to be significant. The C&C infrastructure that supports this botnet is impressive. Twenty-six different clusters of servers acted as the C&Cs for this botnet. The botnet operators not only made use of a DGA but also moved the active C&Cs to different networks every few weeks (on average).

⁴ Sample MD5s: 8f60afa9ea1e761edd49dfe012c22cbf and ccec69613c71d66f98abe9cc7e2e20ef.


During our C&C discovery process, we observed IP addresses controlled by a European CERT. This CERT has been taking over domain names from this botnet for several months. We managed to cross-validate with them the completeness and correctness of the C&C infrastructure. Complete information about the C&C infrastructure can be found in Table 4.

The actual structure of the domain name used by this DGA can be separated into a four-byte prefix and a suffix string argument. The suffix string arguments we observed were: seapollo.com, tomvader.com, aulmala.com, apontis.com, fnomosk.com, erhogeld.com, erobots.com, ndsontex.com, rtehedel.com, nconnect.com, edsafe.com, berhogeld.com, musallied.com, newnacion.com, susaname.com, tvolveras.com and dminmont.com.
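For illustration, the sketch below matches an NXDomain against this prefix-plus-suffix structure using the suffix list above. The exact encoding of the four-byte prefix, and whether it is separated from the suffix by a dot, is not specified here, so the pattern is deliberately permissive and should be treated as an assumption.

    import re

    # Suffix "arguments" observed for the BankPatch DGA (listed above).
    BANKPATCH_SUFFIXES = [
        "seapollo.com", "tomvader.com", "aulmala.com", "apontis.com",
        "fnomosk.com", "erhogeld.com", "erobots.com", "ndsontex.com",
        "rtehedel.com", "nconnect.com", "edsafe.com", "berhogeld.com",
        "musallied.com", "newnacion.com", "susaname.com", "tvolveras.com",
        "dminmont.com",
    ]

    # Permissive pattern: an alphanumeric prefix, optionally dot-separated,
    # followed by one of the known suffixes (the prefix encoding is an assumption).
    _BANKPATCH_RE = re.compile(
        r"^(?P<prefix>[0-9a-z]+)\.?(?P<suffix>"
        + "|".join(re.escape(s) for s in BANKPATCH_SUFFIXES) + r")$")

    def split_bankpatch_domain(fqdn):
        """Return (prefix, suffix) if the name matches the structure, else None."""
        m = _BANKPATCH_RE.match(fqdn.lower())
        return (m.group("prefix"), m.group("suffix")) if m else None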

The four bytes of entropy for the DGA were provided by the prefix. We observed collisions between NXDomains from different days, especially when only one suffix argument was active. Therefore, we registered a small sample of ten domain names at the beginning of 2012 in an effort to obtain a glimpse of the overall distribution of this botnet. Over a period of one month of monitoring the sink-holed data from the domain names of this DGA, this botnet has infected hosts in 270 different networks distributed across 25 different countries. By observing the recursive DNS servers that queried the domain names we sinkholed, we determined that 4,295 were located in the US. The recursives we monitored were part of this list, and we were able to measure 86 infected hosts (on average) in the network we were monitoring. The countries that had the most DNS resolution requests for the sinkholed domain names (besides the US) were Japan, Canada, the United Kingdom and Singapore. The average number of recursive DNS servers from these countries that contacted our authoritative name servers was 22, significantly smaller than the volume of recursive DNS servers within the US.

8 Discussion and Limitations

Pleiades has some limitations. For example, once a new DGA is discovered, Pleiades can build fairly accurate statistical models of what the domains generated by the DGA "look like", but it is unable to learn or reconstruct the exact domain generation algorithm. Therefore, Pleiades will generate a certain number of false positives and false negatives. However, the results we presented in Table 1 show that Pleiades is able to construct a very accurate DGA Classifier module, which produces very few false positives and false negatives for α = 10. At the same time, Table 3 shows that the C&C Detection module, which attributes a single active domain name to a given DGA, also works fairly well in the majority of cases.

Table 4: C&C Infrastructure for BankPatch.

IP addresses             CC    Owner
146.185.250.{89-92}      RU    Petersburg Int.
31.11.43.{25-26}         RO    SC EQUILIBRIUM
31.11.43.{191-194}       RO    SC EQUILIBRIUM
46.16.240.{11-15}        UA    iNet Colocation
62.122.73.{11-14,18}     UA    "Leksim" Ltd.
87.229.126.{11-16}       HU    Webenlet Kft.
94.63.240.{11-14}        RO    Com Frecatei
94.199.51.{25-18}        HU    NET23-AS 23VNET
94.61.247.{188-193}      RO    Vatra Luminoasa
88.80.13.{111-116}       SE    PRQ-AS PeRiQuito
109.163.226.{3-5}        RO    VOXILITY-AS
94.63.149.{105-106}      RO    SC CORAL IT
94.63.149.{171-175}      RO    SC CORAL IT
176.53.17.{211-212}      TR    Radore Hosting
176.53.17.{51-56}        TR    Radore Hosting
31.210.125.{5-8}         TR    Radore Hosting
31.131.4.{117-123}       UA    LEVEL7-AS IM
91.228.111.{26-29}       UA    LEVEL7-AS IM
94.177.51.{24-25}        UA    LEVEL7-AS IM
95.64.55.{15-16}         RO    NETSERV-AS
95.64.61.{51-54}         RO    NETSERV-AS
194.11.16.133            RU    PIN-AS Petersburg
46.161.10.{34-37}        RU    PIN-AS Petersburg
46.161.29.102            RU    PIN-AS Petersburg
95.215.{0-1}.29          RU    PIN-AS Petersburg
95.215.0.{91-94}         RU    PIN-AS Petersburg
124.109.3.{3-6}          TH    SERVENET-AS-TH-AP
213.163.91.{43-46}       NL    INTERACTIVE3D-AS
200.63.41.{25-28}        PA    Panamaserver.com

Unfortunately, there are some scenarios in which the HMM-based classification has difficulties. We believe this is because our HMM considers domain names simply to be sequences of individual characters. In our future work, we plan to experiment with 2-grams, whereby a domain name will be seen as a sequence of pairs of characters, which may achieve better classification accuracy for the harder-to-model DGAs.

For example, our HMM-based detector was unable to obtain high true positive rates on the Boonana DGA. The reason is that the Boonana DGA leverages third-level pseudo-random domain names under several second-level domains owned by dynamic DNS providers. During our evaluation, the hosts infected with Boonana contacted DGA-generated domain names under 59 different effective second-level domains. We believe that the high variability in the third-level domains and the high number of effective 2LDs used by the DGA make it harder to build a good HMM, thus causing a relatively low number of true positives. However, in a real-world deployment scenario, the true positive rate may be significantly increased by focusing on the dynamic DNS domains queried by the compromised hosts. For example, since we know that Boonana only uses dynamic DNS domains, we can filter out any other NXDomains and avoid passing them to the HMM.


In this scenario, the HMM would receive as input only dynamic DNS domains, which typically represent a fraction of all active domains queried by each host, and consequently the absolute number of false positives can be significantly reduced.
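A minimal sketch of such a pre-filter, under the assumption that a list of dynamic DNS provider 2LDs is available (the provider names below are placeholders, not the 59 effective 2LDs actually observed), could look as follows.

    # Placeholder list standing in for the dynamic DNS provider 2LDs abused by Boonana.
    DYNDNS_2LDS = {"dyndns.org", "no-ip.biz", "dyndns.info"}

    def boonana_prefilter(nxdomains, dyndns_2lds=DYNDNS_2LDS):
        """Keep only NXDomains whose (naively computed) effective 2LD belongs to a
        known dynamic DNS provider before handing them to the Boonana HMM."""
        def effective_2ld(fqdn):
            # Naive last-two-labels extraction; a public-suffix-aware library
            # would be needed to handle TLDs such as .co.uk correctly.
            labels = fqdn.lower().rstrip(".").split(".")
            return ".".join(labels[-2:]) if len(labels) >= 2 else fqdn.lower()
        return [d for d in nxdomains if effective_2ld(d) in dyndns_2lds]

    print(boonana_prefilter(["qx3k9vzt.dyndns.org", "randomname.example.com"]))
    # ['qx3k9vzt.dyndns.org']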

As we mentioned in Section 3, detecting active DGA-generated C&C domains is valuable because their resolved IP addresses can be used to update a C&C IP blacklist. In turn, this IP blacklist can be used to block C&C communications at the network edge, thus providing a way to mitigate the botnet's malicious activities. Clearly, for this strategy to be successful, the frequency with which the C&C IP addresses change should be lower than the rate at which new pseudo-random C&C domain names are generated by the DGA. This assumption holds for all practical cases of DGA-based malware we encountered. After all, the generation of pseudo-random domains mainly serves the purpose of making the take-down of loosely centralized botnets harder. However, one could imagine "hybrid" botnets that use DGA-generated domains to identify a set of peer IPs to bootstrap into a P2P-based C&C infrastructure. Alternatively, the DGA-generated C&C domains may be flux domains, namely domain names that point to an IP-fluxing network. It is worth noting that such sophisticated "hybrid" botnets may be quite complex to develop, difficult to deploy, and hard to manage successfully.

Another potential limitation is that Pleiades is not able to distinguish between different botnets whose bot malware uses the same DGA algorithm. In this case, while the two botnets may be controlled by different entities, Pleiades will attribute the compromised hosts within the monitored network to a single DGA-based botnet.

One limitation of our evaluation method concerns the exact enumeration of the number of infected hosts in the ISP network. Due to the location of our traffic monitoring sensors (below the recursive DNS server), we can only obtain a lower-bound estimate of the number of infected hosts. This is because we have visibility of the IP addresses within the ISP that generate the DNS traffic, but lack additional information about the true number of hosts "behind" each IP. For example, an IP address that generates DNS traffic may very well be a NAT, firewall, DNS server or other type of complex device that behaves as a proxy (or relay point) for other devices. Also, according to the ISP, the DHCP churn rate is relatively low, and it is therefore unlikely that we counted the same internal host multiple times.

In the case of Zeus.v3, the DGA is used as a backup C&C discovery mechanism, in the event that the P2P component fails to establish a communication channel with the C&C. The notion of having a DGA component as a redundant C&C discovery strategy could be used in the future by other malware. A large number of new DGAs may potentially have a negative impact on the supervised modules of Pleiades, and especially on the HMM-based C&C detection. In fact, a misclassification by the DGA Classifier, due to the large number of classes among which we need to distinguish, may misguide the selection of the right HMM to be used for C&C detection, thus causing an increase in false positives. In our future work we plan to estimate the impact of such misclassifications on the C&C detection accuracy, and investigate whether using auxiliary IP-based information (e.g., IP reputation) can significantly improve the accuracy in this scenario.

As the internals of our system become public, some botnets may attempt to evade both the DGA discovery and C&C detection processes. As we have already discussed, it is in the malware authors' best interest to create a high number of DGA-related NXDomains in order to make botnet take-over efforts harder. However, the malware could at the same time generate NXDomains not related to the C&C discovery mechanism in an effort to mislead our current implementation of Pleiades. These noisy NXDomains may be generated in two ways: (1) randomly, for example by employing a different DGA, or (2) by using one DGA with two different seeds, one of which is selected to generate noise. In case (1), the probability that the noisy NXDomains will be clustered together is small. This means that they will likely not be part of the final cluster correlation process and will not be reported as new DGA clusters. On the other hand, case (2) might cause problems during learning, especially for the HMM, because the noisy and "true" NXDomains may be intermixed in the same cluster, thus making it harder to learn an accurate model for the domain names.

9 Conclusion

In this paper, we presented a novel detection system, called Pleiades, that is able to accurately detect machines within a monitored network that are compromised with DGA-based bots. Pleiades monitors traffic below the local recursive DNS server and analyzes streams of unsuccessful DNS resolutions, instead of relying on manual reverse engineering of bot malware and their DGA algorithms. Using a multi-month evaluation phase, we showed that Pleiades can achieve very high detection accuracy. Moreover, over the fifteen months of the operational deployment in a major ISP, Pleiades was able to identify six DGAs that belong to known malware families and six new DGAs never reported before.


References

[1] K. Aas and L. Eikvil. Text categorisation: A survey, 1999.

[2] abuse.ch. ZeuS Gets More Sophisticated Using P2P Techniques. http://www.abuse.ch/?p=3499, 2011.

[3] M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster. Building a dynamic reputation system for DNS. In Proceedings of the 19th USENIX Security Symposium (USENIX Security '10), 2010.

[4] M. Antonakakis, R. Perdisci, W. Lee, N. Vasiloglou, and D. Dagon. Detecting malware domains at the upper DNS hierarchy. In Proceedings of the 20th USENIX Security Symposium (USENIX Security '11), 2011.

[5] BankPatch. Trojan.Bankpatch.C. http://www.symantec.com/security_response/writeup.jsp?docid=2008-081817-1808-99&tabid=2, 2009.

[6] L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi. EXPOSURE: Finding malicious domains using passive DNS analysis. In Proceedings of NDSS, 2011.

[7] R. Feldman and J. Sanger. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, 2007.

[8] R. Finones. Virus:Win32/Expiro.Z. http://www.microsoft.com/security/portal/Threat/Encyclopedia/Entry.aspx?Name=Virus%3AWin32%2FExpiro.Z, 2011.

[9] Y. Freund and L. Mason. The alternating decision tree learning algorithm. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML '99), 1999.

[10] G. Gu, P. Porras, V. Yegneswaran, M. Fong, and W. Lee. BotHunter: Detecting malware infection through IDS-driven dialog correlation. In Proceedings of the USENIX Security Symposium, 2007.

[11] M. Geide. Another Trojan Bamital pattern. http://research.zscaler.com/2011/05/another-trojan-bamital-pattern.html, 2011.

[12] S. Golovanov and I. Soumenkov. TDL4 Top Bot. http://www.securelist.com/en/analysis/204792180/TDL4_Top_Bot, 2011.

[13] G. Gu, R. Perdisci, J. Zhang, and W. Lee. BotMiner: Clustering analysis of network traffic for protocol- and structure-independent botnet detection. In Proceedings of the USENIX Security Symposium, 2008.

[14] G. Gu, J. Zhang, and W. Lee. BotSniffer: Detecting botnet command and control channels in network traffic. In Network and Distributed System Security Symposium (NDSS), 2008.

[15] J. Hermans. MozillaWiki TLD List. https://wiki.mozilla.org/TLD_List, 2006.

[16] S. Krishnan and F. Monrose. DNS prefetching and its privacy implications: When good things go bad. In Proceedings of the 3rd USENIX Conference on Large-Scale Exploits and Emergent Threats (LEET '10), Berkeley, CA, USA, 2010. USENIX Association.

[17] L. I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.

[18] M. H. Ligh, S. Adair, B. Hartstein, and M. Richard. Malware Analyst's Cookbook and DVD, chapter 12. Wiley, 2010.

[19] P. Mockapetris. Domain names - concepts and facilities. http://www.ietf.org/rfc/rfc1034.txt, 1987.

[20] P. Mockapetris. Domain names - implementation and specification. http://www.ietf.org/rfc/rfc1035.txt, 1987.

[21] M. Newman. Networks: An Introduction. Oxford University Press, 2010.

[22] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856. MIT Press, 2001.

[23] P. Porras, H. Saidi, and V. Yegneswaran. An analysis of Conficker's logic and rendezvous points. http://mtc.sri.com/Conficker/, 2009.

[24] D. Pelleg and A. W. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML '00), pages 727–734, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.

[25] CERT Polska. ZeuS P2P+DGA variant: mapping out and understanding the threat. http://www.cert.pl/news/4711/langswitch_lang/en, 2012.

[26] P. Porras. Inside risks: Reflections on Conficker. Communications of the ACM, 52:23–24, October 2009.

[27] P. Porras, H. Saidi, and V. Yegneswaran. Conficker C analysis. Technical report, SRI International, Menlo Park, CA, April 2009.

[28] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Readings in Speech Recognition, 1990.

[29] P. Royal. Analysis of the Kraken botnet. http://www.damballa.com/downloads/r_pubs/KrakenWhitepaper.pdf, 2008.

[30] S. Shevchenko. Srizbi domain generator calculator. http://blog.threatexpert.com/2008/11/srizbis-domain-calculator.html, 2008.

[31] S. Shevchenko. Domain name generator for Murofet. http://blog.threatexpert.com/2010/10/domain-name-generator-for-murofet.html, 2010.

[32] SOPHOS. Mal/Simda-C. http://www.sophos.com/en-us/threat-center/threat-analyses/viruses-and-spyware/Mal~Simda-C/detailed-analysis.aspx, 2012.

[33] J. Stewart. Bobax trojan analysis. http://www.secureworks.com/research/threats/bobax/, 2004.

[34] B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna. Your botnet is my botnet: Analysis of a botnet takeover. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09), pages 635–647, New York, NY, USA, 2009. ACM.

[35] S. Stover, D. Dittrich, J. Hernandez, and S. Dietrich. Analysis of the Storm and Nugache trojans: P2P is here. USENIX ;login:, 32(6), December 2007.

[36] T.-F. Yen and M. K. Reiter. Are your hosts trading or plotting? Telling P2P file-sharing and bots apart. In ICDCS, 2010.

[37] R. Villamarin-Salomon and J. Brustoloni. Identifying botnets using anomaly detection techniques applied to DNS traffic. In 5th Consumer Communications and Networking Conference, 2008.

[38] Wikipedia. The Storm botnet. http://en.wikipedia.org/wiki/Storm_botnet, 2010.

[39] J. Williams. What we know (and learned) from the Waledac takedown. http://tinyurl.com/7apnn9b, 2010.

[40] J. Wolf. Technical details of Srizbi's domain generation algorithm. http://blog.fireeye.com/research/2008/11/technical-details-of-srizbis-domain-generation-algorithm.html, 2008. Retrieved April 10, 2010.

[41] J. Wong. Trojan:Java/Boonana. http://www.microsoft.com/security/portal/Threat/Encyclopedia/Entry.aspx?Name=Trojan%3AJava%2FBoonana, 2011.

[42] S. Yadav, A. K. K. Reddy, A. N. Reddy, and S. Ranjan. Detecting algorithmically generated malicious domain names. In Proceedings of the 10th Annual Conference on Internet Measurement (IMC '10), pages 48–61, New York, NY, USA, 2010. ACM.

[43] S. Yadav and A. N. Reddy. Winning with DNS failures: Strategies for faster botnet detection. In 7th International ICST Conference on Security and Privacy in Communication Networks, 2011.

[44] T.-F. Yen and M. K. Reiter. Traffic aggregation for malware detection. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), 2008.

[45] B. Zdrnja. Google Chrome and (weird) DNS requests. http://isc.sans.edu/diary/Google+Chrome+and+weird+DNS+requests/10312, 2011.

[46] J. Zhang, R. Perdisci, W. Lee, U. Sarfraz, and X. Luo. Detecting stealthy P2P botnets using statistical traffic fingerprints. In Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Dependable Computing and Communication Symposium, 2011.