
Game Theoretic Model of Strategic Honeypot Allocation in Computer Networks

Radek Píbil1, Viliam Lisý1, Christopher Kiekintveld2, Branislav Bošanský1, and Michal Pěchouček1

1 Agent Technology Center, Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic

2 University of Texas, El Paso, Texas, United States of America

Abstract. A honeypot is a decoy computer system used in network security to waste the time and resources of attackers and to analyze their behaviors. While there has been significant research on how to design honeypot systems, less is known about how to use honeypots strategically in network defense. We develop game-theoretic models that provide insight into how honeypots can be used to maximal effect to deceive and delay potential attackers. Our model generalizes previous work on deception games for honeypots by introducing differential values for network services and honeypot systems. We also introduce an extension that allows attackers to systematically probe multiple systems on a network to determine which ones are likely to be real systems (and not honeypots) before launching an attack. We provide linear programs for solving instances of these games, and analyze the properties of optimal solutions, leading to faster calculations. We present an empirical study of the models to better understand strategic issues related to honeypots.

1 Introduction

We increasingly depend on information technology and computer networks to deliver vital information and services. Protecting these systems and the information they contain is a growing priority, even as they become more attractive targets for criminal activity. Cybercriminals are highly motivated and devote large efforts to launching sophisticated attacks, requiring network administrators to adopt increasingly sophisticated countermeasures to protect their networks. Honeypots are one such countermeasure, providing a unique set of benefits for network defense. Falling costs for deploying honeypots and improved virtualization technologies are likely to lead to increased use of honeypots, including systems with many honeypots on a single network.

A honeypot is a computer system placed on a network explicitly in order to attract the attention of an attacker. It does not store any valuable data and it thoroughly logs everything that happens in the system. Honeypots help to increase the security of computer systems in two ways [1]: (1) The presence of honeypots wastes the attacker's time and resources. The effort an attacker spends to compromise the honeypot and learn that it does not contain any useful information directly takes away time and resources that could be used to compromise valuable machines. (2) Moreover, once the attacker compromises a honeypot, the network administrator can analyze all of the attacker's actions in great detail, and use the information obtained to better protect the network. For example, specific security holes used in an attack can be patched, and new attack signatures added to antivirus and intrusion detection systems. Attacks on honeypots can also serve as an "early warning" system for administrators, providing more time to react to attacks in progress.

For these reasons, network administrators using honeypots try to maximize the probability that the attacker attacks a honeypot and not a real service. However, with the increasing use of this technology, attackers have started to consider the existence of honeypots during their attacks and take steps to avoid attacking them. For example, once they gain access to a system, they can use a multitude of methods to probe the system and rule out the possibility that they are in a honeypot before they continue with their attack (e.g., [2]). To be effective against more sophisticated attackers, honeypots must be sufficiently disguised that they are not obvious (i.e., they cannot simply present the most vulnerable possible target). These considerations lead us to consider honeypots from an adversarial perspective, where network administrators reason about the strategies of the attackers and vice versa.

Game theory is a formal framework developed to analyze interactions between multiple decision makers. In this paper, we present two novel game-theoretic models of adding honeypots to a network and of the attacker's subsequent target selection. The first model combines a resource allocation game and a deception game, and is designed to answer basic questions about how many honeypots a defender should use and how they should be configured. In particular, we consider the possibility that honeypots can be configured to look like real targets of varying importance, offering new ways to deceive an attacker. The second model extends the first one by adding the capability for an attacker to strategically probe targets before launching an attack, to determine whether they are likely to be honeypots or real machines. Both models are formulated as zero-sum extensive-form imperfect-information games, and we provide linear programs for computing the optimal strategies of the players (i.e., the network administrator and the attacker) in both cases.

We solve the linear programs using a state-of-the-art optimization toolkit (CPLEX). This provides greater scalability than previous models [3,4] that were solved using Gambit, allowing us to analyze the models in greater detail. These previous models found simple uniform randomization strategies to be optimal for honeypot placement. However, our models show that richer and more complex strategies are necessary when we generalize the assumptions to include non-uniform machine values and sophisticated attackers with probing capabilities. Our experimental results show that the game-theoretic strategies are significantly better at reducing the expected harm of the attacks, and that they allow larger numbers of honeypots to be used more efficiently than two heuristic approaches. We also test our strategies against simple heuristic attackers, in addition to optimal ones. Based on the analysis of the optimal game-theoretic strategies, we provide recommendations to network administrators applying honeypots in their networks.

The next section relates the presented research to previous work. In Section 3, we introduce the basic model without probing, analyze its properties, and present the LP that solves it. In Section 4, we introduce the possibility of probing. The experimental evaluation of both models is presented in Section 5, and we conclude the paper in Section 6.

2 Related Work

Many software packages for creating honeypots and analyzing attackers' behavior are available through the Honeynet Project website³. This paper does not focus on the technical aspects of creating honeypots, so we do not review this line of research here. An extensive introduction to the practices and technological challenges of applying honeypots is available in [1]. We focus our review on more closely related work that applies game theory to honeypots.

2.1 Honeypots and Game Theory

There are relatively few papers that explore the use of game theory for honeypots. The existing work can be divided into two categories. One models the interaction within a honeypot during an ongoing attack. The other models the situation before the actual attack, when the attacker selects a target.

In the first category, game theory is used to optimize the information learned about the attacker's strategies by modeling the progress of the attack. In [5], the authors give the defender the possibility to block an action or let it be executed, while the attacker can either retry, continue, or stop the attack. In [6], the defender models the attack as a movement on a graph and tries to make some of the nodes more desirable for the attacker by using multi-agent learning.

The approach presented in this paper belongs to the second category, in which game theory is used to optimize the probability that the attacker will attack a honeypot and not a real system. In [3], the authors model situations similar to the ones we model in this paper. However, their model is simpler and results in simple, uniform strategies. They analyze the problem of allocating the real servers and honeypots to the space of IP addresses. However, the attacker cannot distinguish between individual servers and honeypots, so the only meaningful strategy the attacker can use is to attack a random server. Only if the defender gives the attacker some hint based on the address of the servers, e.g., by assigning the honeypots to the lowest IP addresses, can a rational attacker deviate from a random strategy. Therefore, a rational defender also allocates addresses randomly. In reality, however, not all computers in the network are identical to the attacker. In our model, we consider the importance of the computers, which makes the optimal strategies non-trivial and much harder to compute.

³ www.honeynet.org

In the second part of [3] as well as in [4], the authors give the attacker the option of probing the servers before the attack. The probes indicate whether the server is real or a honeypot, but the authors assume that the result is fully determined by the defender. This implies that the probe results are only useful if the defender voluntarily discloses some information to the attacker. A rational defender uses uniform random probe results and the attacker ignores them. A more realistic assumption is that the defender can successfully deceive the attacker only with a certain probability. Otherwise, the attacker's probe identifies the real nature of the server. In this paper, we consider this generalization, and it results in non-trivial strategies for both players.

2.2 Related Game Theoretic Models

The game-theoretic models presented in this paper are a special case of imperfect-information extensive-form games (EFG) with chance nodes. The state-of-the-art algorithm for solving these games optimally is the mathematical program for the sequence-form representation of the games [7]. More efficient algorithms can be found for subclasses of EFGs with special structure. Two such subclasses are Bayesian Stackelberg games [8] and signaling games [4]. As in our game models, these games include hidden information available only to one of the players; however, this information modifies only the payoffs of the players and not the applicable actions. In our games, the hidden information defines the applicable actions as well, which makes the techniques developed for Bayesian Stackelberg games inapplicable.

A less studied class of games that is most closely related to our models is deception games. A formal deception game was first formulated as an open problem in [9]. One player is given a vector of three random numbers drawn from a uniform distribution on the unit interval. This player changes one of the numbers to an arbitrary number from the interval and presents the modified vector to the second player. The second player chooses one position in the vector and receives as its reward the number that was originally in that position. The open question stated in the paper is whether there is a better strategy than randomly choosing one of the positions. This question was answered in [10], and a few similar questions about various modifications of the model were published in the following years, but the results generally apply only to the specific game formulations and do not present complete strategies for playing the game.

3 Honeypot Selection Game

The Honeypot Selection Game models a situation where an attacker is deciding which machine in a computer network to attack.⁴ However, the network administrator has added a set of honeypots to the network, and wants to configure them to maximize the probability that the attacker will choose to attack a honeypot rather than a real computer. There are two basic kinds of honeypots. A low interaction honeypot is relatively simple, and therefore it can be added to the network at low cost [11], but even simple probing by the attacker will reveal it is not a real system. A high interaction honeypot is much more expensive to create and maintain. In order to make it believable, realistic user activity and network traffic have to be simulated. Therefore, high interaction honeypots are a limited resource and it is important to optimize how they are deployed.

⁴ We use the terms machine and service interchangeably here.

One of the important features of real-world networks is that they have many different types of machines with different configurations (available services, hardware, etc.). Some categories of machines are more important than others, both to the owner of the network and as targets for the attacker. For example, a database server containing valuable customer information would have a very high value, while a standard desktop machine may have relatively low value. To model this, we assume that each machine in the network can be classified into one of a few categories of importance, which can be assigned a numeric value that represents the gain/loss associated with a successful attack. One of the decisions that the defender makes when deploying honeypots on a diverse network is how to disguise the honeypots; in other words, which category of machine should each honeypot be designed to look like?

We represent a configuration of the network using a vector of values representing the apparent importance of each machine. The defender knows the values of each of the real machines in the network, and is able to extend the vector of values by adding honeypots. For each honeypot, the defender is able to select the value of the machine that will be observed by the attacker (by configuring the honeypot to emulate machines of that category). We assume that both players have knowledge of the typical configurations of the network, so both players know the distribution of values in the network. For any configuration, the players can calculate the probability that the configuration is the actual configuration of the network. We also assume that the defender uses a fixed number of honeypots to add to the network, and that the attacker knows the number of honeypots (but not their assigned values). This is a worst case assumption about the attacker, and the model could be generalized to allow for imperfect information about the number of honeypots, though it makes the problem more difficult to solve.

Consider the following example. The network has two machines, which have importance values 4 and 3. The administrator has one honeypot to deploy, and needs to decide how to configure it, which corresponds to assigning a value in our model. He could assign it a value of 5 to make it appear very attractive (e.g., by making it appear to contain valuable data and exposing obvious vulnerabilities). The attacker observes the unordered vector of values by doing a scan of the network, including the value of the honeypot: (5, 4, 3). A naive attacker might attack the machine with the highest value (5), therefore attacking the honeypot. However, a sophisticated attacker might reason that this is "too good to be true" and choose instead to attack the next best machine, with a value of 4.

If the attacker chooses a real machine to attack, he obtains a reward and the network administrator is penalized. If the attacker chooses to attack a honeypot, he does not obtain any reward and is possibly penalized for disclosing his attack strategy. We model the game as a zero-sum game, so a gain for one player is a loss for the other. While this may not always be the case, it allows for faster solution methods and can provide a solution with guaranteed quality against any (not necessarily rational) opponent. From this example, we can see that the defender's goal is to convince the attacker to select a honeypot, and that assigning all honeypots a maximal value may not be the optimal strategy.

3.1 Formal Definition of the Honeypot Selection Game

The Honeypot Selection Game (HSG) is a two-player zero-sum extensive-form game with imperfect and incomplete information.

Definition 1. The Honeypot Selection Game (HSG) is defined by the tuple $G = (d, a, n, k, D, p, \mathcal{I}, \chi, A, u)$:

– $d, a$ are the players in the game, called the defender and the attacker;
– $n$ is the number of real services;
– $k$ is the number of honeypots;
– $D$ is a set of importance values;
– $p : D^n \to [0, 1]$ is the probability of each configuration of real services;
– $\mathcal{I}$ is a set of all attacker information sets ($I \in \mathcal{I}$, $I \subseteq D^{n+k} = D^s$);
– $\chi : D^n \to \mathcal{P}(\mathcal{I})$ is a function that provides a set of possible actions for the defender, which append a set of honeypot values to the observed $x \in D^n$;
– $A$ is a union of all possible attacker actions for all $y \in \mathcal{I}$;
– $u : D^n \times \mathcal{I} \times A \to \mathbb{R}^+$, defined if the second parameter is in $\chi(x)$ with $x$ being the first parameter, is the expected utility function for the attacker ($-u$ is the utility function for the defender).

The game starts with a random choice by nature of the network configuration $x \in D^n$ according to a known probability distribution $p$. The defender learns the value $x$ and chooses the vector $h \in D^k$ of values for the $k$ honeypots it applies. The defender can insert honeypots anywhere in vector $x$, creating a vector $y$ of length $s = n + k$, which is presented to the attacker. The attacker then chooses to attack one of the services in vector $y$. If he attacks a real service $i$, he obtains the reward $y_i$ from $y = (y_1, \dots, y_i, \dots, y_s)$. If he attacks a honeypot, the attacker obtains a reward of 0.
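The course of one play can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `build_observed_vector` and `attack_reward` are hypothetical, and the numbers are those of the (5, 4, 3) example above.

```python
import random

def build_observed_vector(x, h, rng):
    """Merge real values x and honeypot values h into a shuffled vector y.

    Each entry is (value, is_real); the attacker only sees the values.
    """
    y = [(v, True) for v in x] + [(v, False) for v in h]
    rng.shuffle(y)  # the ordering carries no information
    return y

def attack_reward(y, i):
    """Reward for attacking position i: y_i if real, 0 for a honeypot."""
    value, is_real = y[i]
    return value if is_real else 0

rng = random.Random(0)
y = build_observed_vector(x=[4, 3], h=[5], rng=rng)
values = [v for v, _ in y]

naive_target = values.index(max(values))  # attack the most valuable-looking machine
cautious_target = values.index(4)         # skip the "too good to be true" machine

print(attack_reward(y, naive_target))     # 0: the honeypot was attacked
print(attack_reward(y, cautious_target))  # 4: a real machine was attacked
```

The shuffle only mirrors the assumption that positions are uninformative; the two rewards do not depend on the seed.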

Extensive-form games are usually represented as game trees. An example of a small HSG with one real service ($n = 1$), one honeypot ($k = 1$) and two importance values ($D = \{1, 2\}$) is shown in Figure 1(a). The root node of the game is a chance node representing the probability distribution $p$. In the example, two configurations are possible and the distribution $p$, depicted below the branches, is uniform. In each possible real configuration (i.e., child of the root), the defender chooses to add a set of honeypots to the network and thereby defines the information set the attacker will be in ($\chi$). In the example, the defender can add one honeypot with a value of 1 or 2. The possible attacker information sets are $\mathcal{I} = \{11, 12, 22\}$. We assume that the ordering of the vector is arbitrary and contains no information, so 12 and 21 are equivalent. The tree nodes in the same information set are connected by a dotted line. In each information set, the attacker can attack one of the servers, which is the set of actions $A$. In the example, in the top information set (11), the attacker can choose the first server with importance 1 or the second server with importance 1. One of them is real, but they are indistinguishable, so the expected payoff for each choice is $u(1, 11, *) = \frac{1}{2}$. In the middle information set, the attacker can choose to attack the server with a value of 1 or 2. In the top node of the information set, he can gain 0 or 1; in the bottom node, he can gain 0 or 2. However, he cannot distinguish between these two nodes and must use the same strategy in both.

Figure 1. (a) The game tree of a Honeypot Selection Game rendered by Gambit [12] with one real service, one honeypot and a domain {1, 2}. Light gray edges are random choices, white edges are the defender's actions and black edges the attacker's actions. Services corresponding to actions are above the branches, while probabilities are under them. (b) The same game with grouped attacker's actions.
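The information sets of the small example can be enumerated mechanically. The sketch below assumes an information set is represented as the sorted tuple of observed values (so 12 and 21 coincide); the function names `defender_actions` and `information_sets` are hypothetical.

```python
from itertools import product

def defender_actions(x, D, k):
    """chi(x): information sets reachable from the real configuration x.

    An information set is the sorted tuple of observed values, since the
    ordering of the vector is assumed to carry no information.
    """
    return {tuple(sorted(x + h)) for h in product(sorted(D), repeat=k)}

def information_sets(D, n, k):
    """Union of chi(x) over all real configurations x in D^n."""
    sets = set()
    for x in product(sorted(D), repeat=n):
        sets |= defender_actions(x, D, k)
    return sets

D = {1, 2}
print(sorted(defender_actions((1,), D, k=1)))  # [(1, 1), (1, 2)]
print(sorted(defender_actions((2,), D, k=1)))  # [(1, 2), (2, 2)]
print(sorted(information_sets(D, n=1, k=1)))   # [(1, 1), (1, 2), (2, 2)]
```

The information set (1, 2) is reachable from both real configurations, which is why the attacker must use the same strategy in both of its nodes.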

3.2 Solution of the Game

A strategy of a player in a game defines what action the player performs in any situation that can occur in the game. A solution of a game is a set of strategies, one for each player, that satisfies some notion of optimality. We will search for the solution of the game in the form of a behavioral strategy [7], i.e., a strategy that prescribes a probability distribution over all possible actions in each possible situation. In our game, this means determining the probability of using each combination of available honeypot values for each possible configuration of the real part of the network. We allow mixed strategies (i.e., randomized strategies), since they generalize pure strategies and allow for strategic deception in adversarial games. The goal of the defender is to maximize his expected payoff, which in a zero-sum game corresponds to minimizing the attacker's expected payoff. A pair of strategies that achieves these maximum/minimum expected payoffs is a Nash equilibrium of the game.

3.3 Properties of the Honeypot Selection Game

We present some useful properties of the HSG. Our analysis provides intuition about meaningful strategies by identifying sets of dominated actions. Removing these dominated strategies also allows us to reduce the size of the game and improve computation time.

Lemma 1. If the attacker sees a vector of values $y \in D^s$, then it has a strategy that guarantees payoff

$$m(y) = \max_{S \subseteq \{1,\dots,s\}} U(S, y) \tag{1a}$$

$$U(S, y) = \frac{\sum_{i \in S} y_i - \sum \max(k, y)}{|S|} \tag{1b}$$

The function $\max(k, y)$ takes the $k$ maximum values from $y$. The value $m(y)$ is the result of an attacker strategy that uniformly randomizes over the set $S$, which is a set of component indexes in the vector observed by the attacker.

Proof. The optimal set $S$ always includes the indexes of the $k$ maximum values from $y$. If the defender knows $S$ and also knows that the attacker picks from the values according to a uniform distribution, then he knows that the expected payoff for the attacker is the mean of the values with indexes from $S$, with zeros for honeypots. The smallest possible mean is the one with the largest values belonging to honeypots. □

Lemma 2. The maximizing $S$ from Lemma 1 does not contain any index of a server with a value lower than $m(y)$.

Proof. By the same reasoning as in the proof of Lemma 1, but this time from the point of view of the attacker. Assume that the maximizing $S$ does contain such an index $s_i$. Since $y_{s_i} < m(y)$, including $y_{s_i}$ makes $m(y)$ smaller, and removing $s_i$ would increase the value of $m(y)$. Hence $S$ cannot maximize $U$. □

Corollary 1. Attacking a target with a value lower than $m(y)$ can never appear with non-zero probability in any optimal strategy of the attacker.

Proof. If any attacker's strategy attacks a server $y_i$ with $y_i < m(y)$, then the strategy can be modified to attack the set $S$ from Lemma 1 with uniform probability. This would increase the expected payoff of the strategy, which contradicts its optimality. □

Corollary 2. If the defender receives a vector $x \in D^n$ of real targets, it does not have to consider honeypots with a value lower than $m(x)$.

Proof. If the defender uses honeypots $h$ with some values lower than $m(x)$, then $m(x \cup h) \ge m(x)$. A rational attacker will not attack this honeypot and the defender can choose better honeypots $h'$ for which $m(x) \ge m(x \cup h')$. □
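The guarantee of Lemma 1 can be checked by brute force on small vectors. The sketch below follows our reading of (1a) and (1b), in which the sum of the $k$ largest values of the whole vector $y$ is subtracted for every candidate set $S$; the function name `guaranteed_payoff` is hypothetical, and the enumeration over all subsets is only practical for small $s$.

```python
from fractions import Fraction
from itertools import combinations

def guaranteed_payoff(y, k):
    """Return (m(y), S) by enumerating all non-empty index sets S.

    U(S, y) = (sum of y_i over S - sum of the k largest values of y) / |S|,
    i.e., the worst case in which the k largest values are honeypots.
    """
    top_k = sum(sorted(y, reverse=True)[:k])
    best_value, best_set = None, None
    for size in range(1, len(y) + 1):
        for S in combinations(range(len(y)), size):
            value = Fraction(sum(y[i] for i in S) - top_k, size)
            if best_value is None or value > best_value:
                best_value, best_set = value, S
    return best_value, best_set

m, S = guaranteed_payoff((5, 4, 3), k=1)
print(m, S)  # 7/3 (0, 1, 2)
```

For the vector (5, 4, 3) with one honeypot, randomizing uniformly over all three machines guarantees 7/3, and no value in the maximizing set is below 7/3, in line with Lemma 2.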

Grouping of Server Values In addition to removing dominated strategies using our lemmas, we also introduce a more compact representation of the game. Since we assume that the attacker cannot distinguish between servers of the same value, we can reduce the number of actions available to the attacker in each information set $I \in \mathcal{I}$ to the number of different values in the observed configuration $y$. To do this we create groups of servers that have identical importance values (and are therefore indistinguishable). The expected value for choosing any server from that group is computed by assuming that the attacker actually chooses uniformly among members of the group, some of which may be real and some honeypots. Recall the example from Figure 1(b), where the attacker could not distinguish between the real server and the honeypot, both valued 1. We limit the attacker to one action for this information set, with the expected value of $\frac{1}{2}$.
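The expected value of a grouped action follows directly from the uniform choice within the group. A minimal sketch, assuming configurations are given as tuples of values and using the hypothetical name `grouped_utility`:

```python
from collections import Counter
from fractions import Fraction

def grouped_utility(x, I, value):
    """Expected payoff for attacking a uniformly chosen server of `value`.

    x: tuple of real server values; I: tuple of all observed values
    (real servers plus honeypots). The payoff is the value times the
    fraction of servers in the group that are real.
    """
    n_real = Counter(x)[value]
    n_all = Counter(I)[value]
    if n_all == 0:
        raise ValueError("no server with this value is observed")
    return Fraction(value * n_real, n_all)

print(grouped_utility((1,), (1, 1), 1))  # 1/2: one real server, one honeypot
print(grouped_utility((2,), (1, 2), 2))  # 2: the only server valued 2 is real
print(grouped_utility((1,), (1, 2), 2))  # 0: the only server valued 2 is a honeypot
```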

3.4 Solution using Linear Programming

We compute a Nash equilibrium of the game in behavioral strategies using a linear program (LP) based on the state-of-the-art method for imperfect-information extensive-form games, the sequence-form LP (e.g., see [7]). The sequence form is a compact representation of imperfect-information extensive-form games with perfect recall that is built on sequences [13,14], where one sequence for a player represents an ordered list of actions for the player from the root to some node in the game tree. In the following we use the term compatibility of sequences: we say that sequences are compatible if a step-by-step execution of the actions in the sequences is a valid course of play. The behavioral strategies can be represented as a probability of executing some sequence conditioned on the opponent playing a compatible sequence. We present two different LP formulations for finding the optimal strategies for the attacker and the defender, assuming in each case that the opponent plays a best response.

Defender's Linear Program The linear program for computing the defender's strategy is as follows. There are two types of variables: (1) $v_I \in \mathbb{R}^+$ represents the expected value of the subgame assigned to each information set of the attacker $I \in \mathcal{I}$, and (2) variables $p^d_{xI} \in [0, 1]$ represent the probability of the defender choosing each action (adding specific honeypots) that leads to the information set $I$ for each possible real configuration of the network $x \in D^n$. Furthermore, $u$ denotes the utility function of the attacker, which the defender minimizes, and $\chi^{-1} : \mathcal{I} \to \mathcal{P}(D^n)$ denotes an inverse function that maps an information set to a set of possible network configurations. Finally, $p_x$ denotes the probability of the existence of network configuration $x$.

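$$\min_{v,d} \sum_{I \in \mathcal{I}} v_I \tag{2a}$$

$$v_I \ge \sum_{x \in \chi^{-1}(I)} u(x, I, a_i^I)\, p^d_{xI} \quad \forall I \in \mathcal{I},\ \forall a_i^I \text{ action applicable in } I \tag{2b}$$

$$\sum_{I \in \chi(x)} p^d_{xI} = p_x \quad \forall x \in D^n \tag{2c}$$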

The program minimizes the utility of the attacker by searching for the optimal strategy of the defender $p^d_{xI}$. These variables are constrained by (2c) in order to represent valid probabilities of sequences played by the defender, conditioned on the other players (both nature and the attacker) playing compatible sequences. Finally, the attacker chooses the optimal action in each information set $I$. Hence, constraints (2b) force the expected value $v_I$ to be at least the value of every action of the attacker over all possible configurations, i.e., the attacker's maximum.
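To make the roles of (2a) to (2c) concrete, the sketch below evaluates the objective for the toy game of Figure 1 ($n = 1$, $k = 1$, $D = \{1, 2\}$, uniform $p$) with grouped attacker actions. It is not an LP solver: a coarse grid search over the two free sequence probabilities stands in for one, and all names (`u`, `chi`, `defender_objective`, `strategy`) are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def u(x, I, value):
    """Grouped attacker payoff: value times the share of real servers in the group."""
    return Fraction(value * Counter(x)[value], Counter(I)[value])

# Toy game of Figure 1: n = 1, k = 1, D = {1, 2}, uniform p.
p = {(1,): Fraction(1, 2), (2,): Fraction(1, 2)}
chi = {(1,): [(1, 1), (1, 2)], (2,): [(1, 2), (2, 2)]}

def defender_objective(pd):
    """Objective (2a) with every v_I set to the tightest value allowed by (2b)."""
    for x, sets in chi.items():
        assert sum(pd[(x, I)] for I in sets) == p[x]  # constraint (2c)
    total = Fraction(0)
    for I in {I for sets in chi.values() for I in sets}:
        total += max(  # attacker's best action in information set I
            sum(u(x, I, a) * pd[(x, I)] for x in chi if I in chi[x])
            for a in set(I)
        )
    return total

def strategy(a, b):
    """Sequence probabilities: a on (x=1 -> 11) and b on (x=2 -> 12)."""
    half = Fraction(1, 2)
    return {((1,), (1, 1)): a, ((1,), (1, 2)): half - a,
            ((2,), (1, 2)): b, ((2,), (2, 2)): half - b}

grid = [Fraction(i, 20) for i in range(11)]  # 0, 1/20, ..., 1/2
best = min(defender_objective(strategy(a, b)) for a in grid for b in grid)
print(best)  # 3/4
```

On this toy instance the minimum is attained by several strategies, for example by always giving the honeypot the same value as the real server.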

Attacker's Linear Program The linear program for computing the optimal strategy for the attacker is similar: the attacker maximizes its utility value by choosing appropriate probabilities $p_{a_i^I} \in [0, 1]$ for each action in each information set $I$, while the defender selects an optimal action minimizing the expected utility value at each information set corresponding to each network configuration $x$ in constraints (3b).

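$$\max_{v,a} \sum_{x \in D^n} p_x v_x \tag{3a}$$

For every $I \in \mathcal{I}$, assume the attacker can perform actions $\{a_1^I, \dots, a_{m_I}^I\}$:

$$\sum_{i \in \{1,\dots,m_I\}} u(x, I, a_i^I)\, p_{a_i^I} \ge v_x \quad \forall I \in \mathcal{I},\ \forall x \in \chi^{-1}(I) \tag{3b}$$

$$\sum_{i \in \{1,\dots,m_I\}} p_{a_i^I} = 1 \tag{3c}$$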
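The attacker's program can be illustrated on the same toy game of Figure 1 ($n = 1$, $k = 1$, $D = \{1, 2\}$, uniform $p$). In the sketch below the only free choice is the probability $q$ of attacking the server valued 1 in information set 12; a grid search over $q$ stands in for an LP solver, and the names `u`, `chi`, `attacker_objective` and `strategy` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def u(x, I, value):
    """Grouped attacker payoff: value times the share of real servers in the group."""
    return Fraction(value * Counter(x)[value], Counter(I)[value])

# Toy game of Figure 1: n = 1, k = 1, D = {1, 2}, uniform p.
p = {(1,): Fraction(1, 2), (2,): Fraction(1, 2)}
chi = {(1,): [(1, 1), (1, 2)], (2,): [(1, 2), (2, 2)]}

def attacker_objective(pa):
    """Objective (3a) with every v_x set to the tightest value allowed by (3b)."""
    total = Fraction(0)
    for x, sets in chi.items():
        # The defender answers with the information set that is worst for the attacker.
        v_x = min(sum(u(x, I, a) * prob for a, prob in pa[I].items()) for I in sets)
        total += p[x] * v_x
    return total

def strategy(q):
    """Attack value 1 with probability q in information set (1, 2); satisfies (3c)."""
    return {(1, 1): {1: Fraction(1)},
            (1, 2): {1: q, 2: 1 - q},
            (2, 2): {2: Fraction(1)}}

grid = [Fraction(i, 20) for i in range(21)]  # 0, 1/20, ..., 1
best_q = max(grid, key=lambda q: attacker_objective(strategy(q)))
print(best_q, attacker_objective(strategy(best_q)))  # 1/2 3/4
```

In this instance the attacker mixes evenly between the two observed values in information set 12 and secures an expected payoff of 3/4.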

Size of the Linear Programs The size of the presented linear programs can be exponential in the size of the vector $|y| = s$, both in the number of constraints and in the number of variables. This follows from an estimate of the number of attacker information sets, $|\mathcal{I}|$, which can be at most $|D|^s$. The exponential size of the programs currently limits the applicability of this approach to large computer networks. Addressing this limitation is outside the scope of this paper, which focuses on validating the quality of the proposed model.
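The bound is easy to tabulate. The sketch below computes $|D|^s$ and, as an additional observation that rests only on the earlier assumption that the ordering of the observed vector carries no information, the number of unordered value vectors of length $s$. Both function names are hypothetical.

```python
from math import comb

def ordered_bound(num_values, s):
    """Upper bound |D|^s on the number of attacker information sets."""
    return num_values ** s

def unordered_count(num_values, s):
    """Number of multisets of size s over num_values importance values."""
    return comb(num_values + s - 1, s)

# Toy game of Figure 1: |D| = 2 and s = 2.
print(ordered_bound(2, 2), unordered_count(2, 2))    # 4 3
# A larger instance: |D| = 5 and s = 10.
print(ordered_bound(5, 10), unordered_count(5, 10))  # 9765625 1001
```

The count of 3 for the toy game matches the information sets 11, 12 and 22 listed for Figure 1.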

4 Honeypot Selection Game with Probes

In this section we extend the basic model from the previous section by allowing the attacker to analyze the observed servers to learn whether they are real servers or honeypots. The main idea of the extended model is that the attacker, prior to the actual attack, can use probes to discover the true nature of servers, i.e., whether each machine is real (denoted R) or a honeypot (denoted HP). We formalize this model as the Extended Honeypot Selection Game.

We assume that the attacker can use a limited number of probes, and that the results of the probes are stochastic. The first assumption reflects the limited time and resources the attacker typically has for the attack before being exposed. The second assumption models the fact that the attacker cannot be perfectly sure if the machine is a honeypot or not, even after gathering some information through probing.

4.1 Formal Definition of Extended Honeypot Selection Game

The formal definition of the extended Honeypot Selection Game (eHSG) follows:

Definition 2. The eHSG is defined by the tuple $G = (\Gamma, q, \mathcal{I}^E, A_p, A_a, \psi, u)$:

– $\Gamma$ is a basic HSG;
– $q$ is the number of probes to be performed by the attacker;
– $\mathcal{I}^E$ is a set of all attacker information sets, $\mathcal{I} \subseteq \mathcal{I}^E$;
– $A_p$ is a set of all possible attacker probing actions;
– $A_a$ is a set of all possible attacker attacking actions;
– $\psi : \{R, HP\}^i \times A_p^{i+1} \to [0, 1]$, $\forall i \in \{0, \dots, q - 1\}$, is a function that assigns the probability of a probe result (either R or HP), based on the history of probing decisions and observations;
– $u' : D^n \times \mathcal{I} \times (A_p^q, A_a) \times \{R, HP\}^q \to \mathbb{R}^+$ is the expected utility function for the attacker ($-u$ for the defender). The second parameter is the attacker's starting information set, the third and fourth are the sequences of probes and observations, and the fifth is the final attack action.

We assume that the results of probing a single server are independent and identically distributed to simplify the mathematical expression (though in principle the model is not restricted to this). The probability that a probe to a server that is either R or HP returns either result R or HP is fixed and does not change with repeated attempts. We denote these probabilities $\alpha(R|R)$, the probability of R when probing a real server, and $\alpha(HP|HP)$, the probability of HP when probing a honeypot. The complementary probabilities for false positives and false negatives (misidentification) follow from these.

Figure 2 shows part of a game tree for an extended honeypot selection game. The attacker chooses a server to probe in its information set (12), followed by chance nodes representing the uncertain results of the probes. The probability values for the chance nodes that determine the results of the probes are given according to the function $\psi$. Although we assume that the probe results are independent from each other, and the result of probing a fixed server is determined according to the $\alpha$ parameters, the probabilities in the game tree depend on the path in the tree that leads to them. In the following section we describe a methodology for computing the Bayesian posteriors.


Figure 2. One root information set of the attacker with observed values 1 and 2, including the subtrees for the nodes in the set. R and HP are outcomes of probes.

4.2 Probabilities of the Chance Nodes After Probing Outcomes

Since we model a set of servers of a single value as indistinguishable, each of these servers has a probability of being real, at first depending only on the number of honeypots among them. The probing modifies these probabilities and directly affects $\psi$. Let us describe the methodology for defining the $\psi$ function more formally. We focus on a single set of servers sharing the same importance value $\varphi$. We base our notation on previous definitions: $k_\varphi$ is the number of honeypots, $n_\varphi$ is the number of real servers, and $s_\varphi = k_\varphi + n_\varphi$ is the total number of servers with value $\varphi$.

The prior probability of the $i$-th server being real is $p(i)$. The ordering is drawn uniformly at random to make sure that it cannot be exploited. We denote by $p(i \mid o, b)$ the posterior probability of the $i$-th server being real after a sequence of observations $o$ and probing actions $b$. The $\psi$ function determining the result of the first probe of the attacker (examining server $i$) can be calculated as $\psi(\emptyset, i) = p(R) = p(R \mid i)\,p(i) + p(R \mid \neg i)\,p(\neg i)$.

Based on the outcome we can update the probabilities p for servers in φ. We can use Bayes’ rule to calculate p(i|R) for the probed server. For an unprobed server j ≠ i, the probability of being real after the first probe can be calculated as p(j|o, b) = p(j|i)p(i|o, b) + p(j|¬i)p(¬i|o, b), where p(j|i) represents the probability of server j being real if server i is real without any observations, calculated as p(j|i) = (nφ − 1)/(sφ − 1), and p(j|¬i) represents the complementary case. However, this update rule becomes difficult to express concisely as the number of probes increases, since calculating p(j|i, o, b, o, b) becomes very difficult.
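As a rough numerical illustration of the first-probe formulas above, the following Python sketch computes ψ(∅, i), the posterior p(i|R) of the probed server, and the posterior of an unprobed server j. The function name `first_probe_update` and its parameters are hypothetical. The sketch assumes α(R|HP) = 1 − α(HP|HP) and takes p(j|¬i) = nφ/(sφ − 1), which is one reading of the complementary case.

```python
from fractions import Fraction


def first_probe_update(n_real, k_hp, a_r_given_r, a_r_given_hp):
    """Return (psi, p_i_post, p_j_post) after server i is probed and looks real.

    n_real, k_hp:  numbers of real servers and honeypots sharing value phi.
    a_r_given_r:   alpha(R|R), chance that a real server is observed as real.
    a_r_given_hp:  alpha(R|HP), chance that a honeypot is observed as real
                   (assumed here to equal 1 - alpha(HP|HP)).
    """
    s = n_real + k_hp
    if s < 2:
        raise ValueError("need at least two servers to have an unprobed one")
    p_i = Fraction(n_real, s)                      # prior p(i)
    psi = a_r_given_r * p_i + a_r_given_hp * (1 - p_i)   # psi(empty, i) = p(R)
    p_i_post = a_r_given_r * p_i / psi             # Bayes' rule for p(i|R)
    p_j_given_i = Fraction(n_real - 1, s - 1)      # p(j|i) from the text
    p_j_given_not_i = Fraction(n_real, s - 1)      # assumed complementary case
    p_j_post = p_j_given_i * p_i_post + p_j_given_not_i * (1 - p_i_post)
    return psi, p_i_post, p_j_post


# One real server and one honeypot, alpha(R|R) = 0.9, alpha(R|HP) = 0.3.
print(first_probe_update(1, 1, Fraction(9, 10), Fraction(3, 10)))
```

With one real server and one honeypot, a probe that looks real raises the probed server's probability of being real from 1/2 to 3/4 and lowers the other server's to 1/4.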

To see why this is the case, let us denote each of the possible assignments of real servers and honeypots for φ by characteristic vectors c ∈ {R, HP}^(sφ). Let us put the vectors into groups that have honeypots and real servers in the same places for all probed locations. For example, probing the first server yields two groups of characteristic vectors, one with a honeypot as the first server, and one with a real server as the first server. Each newly probed server subdivides the groups further. These subdivisions require separate Bayesian updates, possibly reaching the full number of 2^(sφ) groups, with only a single characteristic vector per group.

To exactly calculate all the probabilities p(i|o, b, o, b), we consider all characteristic vectors that are compatible with the current information set in the game tree. Each game situation has a list of probabilities of being true assigned to each of the characteristic vectors for each of the importance values. The probability p(i|o, b) can be calculated by summing over the probabilities of characteristic vectors with a real server at the i-th position:

$$p(i \mid \mathbf{o}, \mathbf{b}) = \sum_{c \in S} p(c \mid \mathbf{o}, \mathbf{b}), \qquad S = \{\, c \in \{R, HP\}^{s_\phi} : c_i = R \,\} \tag{4}$$

With the i.i.d. assumption, the updates are based on Bayes’ rule. The notation used is o^l_i for the result of the l-th probe with the i-th server as a target, b^l_i for the probe, and c for the characteristic vector whose probability is being updated.

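$$
p(c \mid o_i^{n+1}, b_i^{n+1}, \mathbf{o}, \mathbf{b}) =
\begin{cases}
\dfrac{1}{p(o_i \mid b_i^{n}, \mathbf{o}, \mathbf{b})}\; p(c \mid \mathbf{o}, \mathbf{b})\; \alpha(o_i^{n+1} \mid R), & \text{iff } c_i = R \\[1.5ex]
\dfrac{1}{p(o_i \mid b_i^{n}, \mathbf{o}, \mathbf{b})}\; (1 - p(c \mid \mathbf{o}))\; \alpha(o_i^{n+1} \mid HP), & \text{iff } c_i = HP
\end{cases}
\tag{5}
$$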

The updated vector of probabilities is used in the subtree of the node.
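The bookkeeping over characteristic vectors can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the helper names (`uniform_prior`, `bayes_update`, `prob_real`) are hypothetical, the prior is assumed uniform over vectors with exactly nφ real servers, and the update uses the textbook Bayes form in which the prior p(c|o, b) is multiplied by the likelihood α(·|c_i) in both cases and then normalized.

```python
from fractions import Fraction
from itertools import combinations


def uniform_prior(s, n_real):
    """Uniform belief over characteristic vectors with exactly n_real real servers."""
    vectors = [
        tuple("R" if i in real_positions else "HP" for i in range(s))
        for real_positions in combinations(range(s), n_real)
    ]
    weight = Fraction(1, len(vectors))
    return {c: weight for c in vectors}


def bayes_update(belief, i, outcome, alpha):
    """Update the belief after probing server i and observing `outcome`.

    alpha[(outcome, truth)] is the probability of seeing `outcome` when the
    probed server is really `truth`. Returns (new_belief, outcome_probability).
    """
    unnormalized = {c: p * alpha[(outcome, c[i])] for c, p in belief.items()}
    z = sum(unnormalized.values())   # probability of this probe outcome
    return {c: p / z for c, p in unnormalized.items()}, z


def prob_real(belief, i):
    """Equation (4): sum the probabilities of vectors with a real server at i."""
    return sum(p for c, p in belief.items() if c[i] == "R")


alpha = {
    ("R", "R"): Fraction(9, 10), ("HP", "R"): Fraction(1, 10),
    ("R", "HP"): Fraction(3, 10), ("HP", "HP"): Fraction(7, 10),
}
belief = uniform_prior(s=2, n_real=1)
belief, p_outcome = bayes_update(belief, i=0, outcome="R", alpha=alpha)
print(p_outcome, prob_real(belief, 0), prob_real(belief, 1))
```

Storing one weight per characteristic vector keeps the update exact but grows combinatorially with sφ, which is the blow-up described above.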

Grouping with Probes. We can reduce the number of actions for the attacker by grouping all servers of the same importance value that have not yet been probed. These are treated identically as the “next server to be probed”. They have the same outcomes and the same probabilities of being real, so we do not break the interpretation of the game. Every time a new server is probed, it is differentiated from the rest of the servers in the group. This approach keeps a fixed ordering, which the defender still cannot influence.
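The grouping idea can be illustrated with a small Python sketch that lists the attacker's probe actions: every already probed server stays a distinct action, while all unprobed servers of one importance value collapse into a single “next” action. The function `probe_actions` and the data layout are hypothetical and serve only as an illustration.

```python
def probe_actions(servers_by_value, probed_by_value):
    """List the distinct probe actions after grouping unprobed servers.

    servers_by_value: dict mapping importance value -> number of servers.
    probed_by_value:  dict mapping importance value -> number already probed.
    Each action is (value, index) for a probed server, or (value, "next")
    for the single representative of all unprobed servers of that value.
    """
    actions = []
    for value, count in sorted(servers_by_value.items()):
        already = probed_by_value.get(value, 0)
        actions.extend((value, idx) for idx in range(already))
        if already < count:
            actions.append((value, "next"))
    return actions


# Three servers of value 1 (one probed so far) and two unprobed servers of value 2.
print(probe_actions({1: 3, 2: 2}, {1: 1}))
# [(1, 0), (1, 'next'), (2, 'next')]
```

In this example five servers yield only three probe actions instead of five.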

Properties of the Extended Model. There is an opportunity for further pruning in the eHSG besides creating groups. In the final decision node of the attacker, we can replace the set of attacks on servers of the same importance with a single attack on the server with the largest probability of being real. Among all the servers with the same importance value, the one with the highest probability of being real has the highest expected utility; hence, this strategy is dominant and will be selected by a rational attacker.
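A minimal Python sketch of this pruning step follows. It keeps, for each importance value, only the attack on the server most likely to be real, and it assumes the attacker's expected utility for attacking a server is its value times its probability of being real. The function name `pruned_attacks` and the input layout are hypothetical.

```python
from fractions import Fraction


def pruned_attacks(posteriors):
    """Keep one attack per importance value: the server most likely to be real.

    posteriors: dict mapping importance value -> list of probabilities of
    being real, one per server of that value.
    Returns dict value -> (server index, expected utility value * probability).
    """
    best = {}
    for value, probs in posteriors.items():
        idx = max(range(len(probs)), key=probs.__getitem__)
        best[value] = (idx, value * probs[idx])
    return best


posteriors = {
    1: [Fraction(1, 2), Fraction(9, 10)],
    4: [Fraction(1, 5), Fraction(1, 10)],
}
print(pruned_attacks(posteriors))
```

Here four candidate attacks are reduced to two, one per importance value.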

4.3 Solution Using Linear Programming

The linear program calculating the solution is an extension of the linear program presented in Section 3.4. The extension treats the chance nodes as defender’s choice nodes with a fixed strategy. However, it is still necessary to provide constraints for the weighted values for the attacker choice nodes for probes.

In order to improve readability we denote by fin(IE) the set of all information sets where the attacker chooses the server to attack. Σa,e,I refers to the compatible sequences of the attacker’s actions and chance node outcomes for I, one of the starting information sets for the attacker. The function orgn(I) returns the first information set of the attacker encountered on the path in the game tree to I. Exta(σa) returns the shortest extension of the sequence σa ∈ Σa, where Σa is the set of all possible sequences of the attacker’s actions. By the shortest extension we mean σa with a single valid attacker action appended to its end. In the program, we also use IE(σa) as a function that returns the set of information sets reached by the attacker after executing the sequence of actions σa.

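\begin{align}
\min_{v,d} \quad & \sum_{I \in \mathcal{I}} v_I \tag{6a} \\
& v_I \ge \sum_{x \in \chi^{-1}(\mathrm{orgn}(I))} -u'(x, I, \sigma_a, \sigma_e)\, p_{d^x_I} && \forall I \in \mathrm{fin}(I_E),\ \forall (\sigma_a, \sigma_e) \in \Sigma_{a,e,I} \tag{6b} \\
& v_{I(\sigma_a)} \ge \sum_{I' \in I(\mathrm{Ext}_a(\sigma_a))} v_{I'} && \forall \sigma_a \in \Sigma_a \tag{6c} \\
& \sum_{I \in \chi(x)} p_{d^x_I} = p_x && \forall x \in D^n \tag{6d}
\end{align}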

The defender aims to minimize the expected utility values for the attacker. We define u′ as u′(x, I, σa, σe) = φ pe(σe) pt(φi), where pt(φi) is the probability of the i-th server in the φ-valued set being real in the final decision node t, while pe(σe) is the probability of the outcomes of the observations that led to the final information set.

Inequality (6b) provides constraints that maximize the attacker’s expected utility in the level just above the one with terminal nodes. The second inequality (6c) provides constraints that maximize over the expected value of the subtrees of the attacker’s probing decisions by summing over weighted outcomes of the probing. The final constraint (6d) makes sure that the probabilities of the defender’s actions form a valid probability distribution.
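The recursion that constraints (6b) and (6c) encode can be illustrated outside the LP, for a fixed belief of the attacker, by a small expectimax routine. The Python sketch below is only an illustration under stated assumptions: the function `best_response_value` is hypothetical, the attacker is assumed to use exactly q probes before attacking, and the payoff is assumed to be the server's value if it is real and 0 if it is a honeypot. Weights are left unnormalized, so summing over probe outcomes corresponds to the weighted sums in (6c).

```python
from fractions import Fraction


def best_response_value(weights, values, alpha, q):
    """Attacker's best achievable weighted utility with q probes left.

    weights: dict mapping a configuration (tuple of "R"/"HP" per server) to
             its unnormalized probability given the history so far.
    values:  importance value of each server.
    alpha:   alpha[(outcome, truth)], probability of observing `outcome`.
    """
    servers = range(len(values))
    if q == 0:
        # Final decision: attack the server with the highest weighted utility.
        return max(
            values[i] * sum(w for c, w in weights.items() if c[i] == "R")
            for i in servers
        )
    best = 0
    for i in servers:                       # choice of the server to probe
        total = 0
        for outcome in ("R", "HP"):         # chance node: result of the probe
            updated = {c: w * alpha[(outcome, c[i])] for c, w in weights.items()}
            total += best_response_value(updated, values, alpha, q - 1)
        best = max(best, total)
    return best


alpha = {
    ("R", "R"): Fraction(9, 10), ("HP", "R"): Fraction(1, 10),
    ("R", "HP"): Fraction(3, 10), ("HP", "HP"): Fraction(7, 10),
}
# Two equally likely configurations: either the value-1 or the value-2 server is real.
weights = {("R", "HP"): Fraction(1, 2), ("HP", "R"): Fraction(1, 2)}
print(best_response_value(weights, [1, 2], alpha, q=0))
print(best_response_value(weights, [1, 2], alpha, q=1))
```

The recursion branches over every probe target and both outcomes at each level, which mirrors the exponential growth in q.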

Due to space limits, we omit the attacker’s LP. The difference between the HSG program and the eHSG program is the addition of new constraints that make sure that the probabilities of the attacker’s sequences are valid in each node, including the chance nodes. The probabilities for the chance nodes are fixed in each of the chance nodes for probing outcomes. Due to the sequence of q decisions of the attacker, the size of the linear program is exponential in q (and in s, as is the basic HSG).

5 Experiments

In this section we provide an experimental analysis of the behavior of the models with varying parameters. The goal is to identify key characteristics of the game, and compare the quality of the game-theoretic solution to baseline strategies. Finally, we want to derive general principles from the results in order to give network administrators some rules of thumb for placing honeypots in a computer network. All of the results are computed using the LP formulations described earlier with CPLEX 12.1.

5.1 Experimental Settings

In our experiments we fix only the number of real servers, n = 5. The importance values are D = {1, . . . , 4}. The number of honeypots is k ∈ {0, . . . , 5}. The size of these games is plausible for a small computer network, and is considerably larger than games studied in related work. We use two different probability distributions over the possible network configurations x: a uniform distribution and a power-law Yule-Simon distribution with parameter ρ = 1. The Yule-Simon distribution reflects a common situation in computer networks with relatively few high-valued targets and a larger number of less significant targets. For our domain with four values, the probabilities from this distribution are, in order of increasing importance: (0.6250, 0.2083, 0.1042, 0.0625).
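The four probabilities quoted above can be reproduced with a short Python sketch. It evaluates the Yule-Simon probability mass function f(k; ρ) = ρ B(k, ρ + 1) for k = 1, . . . , 4 and renormalizes over this truncated support. The truncation and renormalization are an assumption, chosen because they match the stated numbers, and the function name `truncated_yule_simon` is hypothetical.

```python
from math import gamma


def truncated_yule_simon(rho, max_value):
    """Yule-Simon probabilities for values 1..max_value, renormalized to sum to 1.

    Uses f(k; rho) = rho * B(k, rho + 1), with the beta function written
    through gamma functions: B(a, b) = gamma(a) * gamma(b) / gamma(a + b).
    """
    raw = [
        rho * gamma(k) * gamma(rho + 1) / gamma(k + rho + 1)
        for k in range(1, max_value + 1)
    ]
    total = sum(raw)
    return [r / total for r in raw]


probs = truncated_yule_simon(rho=1, max_value=4)
print([round(p, 4) for p in probs])  # [0.625, 0.2083, 0.1042, 0.0625]
```

For ρ = 1 the unnormalized masses are 1/(k(k + 1)), that is 1/2, 1/6, 1/12 and 1/20, which sum to 4/5 before renormalization.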

As baselines for comparison with the game-theoretic strategies we introduce two methods for each player: (1) the Random strategy, in which the player always selects a uniform random action in each information set, and (2) the Maximum strategy, which reflects a greedy heuristic where the defender always uses honeypots with the maximum value⁵, and the attacker always attacks a target with the maximum observed value.

Our first set of results compares the payoffs of our baseline strategies and the game-theoretic strategies from two different perspectives. First, we evaluate the exploitability of the different defender strategies, and then we compare the quality of the attacker strategies against the defender’s NE strategy. The exploitability of a strategy σ is the payoff for the strategy when the opponent plays a best response to σ. We calculate the exploitability using the linear program for the game-theoretic solution, but with fixed probabilities for the defender’s actions (and vice versa for the attacker).
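The notion of exploitability can be illustrated on a small matrix game. The Python sketch below is not the LP-based procedure used in the experiments: it assumes the game has been flattened into a payoff table for the attacker, and the function `exploitability` together with the toy payoffs is hypothetical.

```python
from fractions import Fraction


def exploitability(defender_mix, attacker_payoff):
    """Attacker's payoff when best-responding to a fixed defender mixed strategy.

    defender_mix:    probabilities of the defender's pure strategies.
    attacker_payoff: attacker_payoff[d][a] is the attacker's utility when the
                     defender plays d and the attacker plays a.
    """
    n_attacker_actions = len(attacker_payoff[0])
    return max(
        sum(p * row[a] for p, row in zip(defender_mix, attacker_payoff))
        for a in range(n_attacker_actions)
    )


# Toy game: the attacker gains 1 or 2 only if it picks the unprotected target.
payoff = [[1, 0], [0, 2]]
print(exploitability([Fraction(1, 2), Fraction(1, 2)], payoff))  # 1
print(exploitability([Fraction(2, 3), Fraction(1, 3)], payoff))  # 2/3
```

In this toy game the uniform mix is exploitable for a payoff of 1, whereas the mix (2/3, 1/3) limits the attacker to 2/3, which is lower and therefore better for the defender.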

Our second set of results shows the details of the optimal defender strategies, i.e., the probability that a honeypot with a specific value will actually be deployed in a computer network. We present the results about honeypot likelihood in two different ways: (1) the probability that at least one honeypot of the given value is used by the defender, and (2) the portion of honeypots assigned to each value. In the first case we marginalize over the probabilities of defender actions that add a honeypot of this value, weighted by the network configuration probabilities. For the second case we marginalize over all defender actions and weight each component by the number of honeypots of this value added by the action, as well as by the network configuration probabilities. We then renormalize the probabilities (divide by k) so the proportions sum to 1. For example, if an action dxI adds three honeypots of value 4, its component in the sum is 3p(dxI)/k.
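Both statistics can be computed with a few lines of Python. The sketch below is illustrative only: the function `honeypot_statistics` and the strategy layout are hypothetical, and each weight is assumed to be the joint probability of a network configuration and a defender action, so the weights over the whole strategy sum to 1.

```python
from fractions import Fraction


def honeypot_statistics(strategy, k, values):
    """Return (at_least_one, portion) dictionaries indexed by honeypot value.

    strategy: list of (weight, honeypot_values), where weight is the assumed
              joint probability of the configuration and the defender action,
              and honeypot_values is a tuple with the value of each honeypot.
    k:        number of honeypots added by every action.
    """
    at_least_one = {v: 0 for v in values}
    portion = {v: 0 for v in values}
    for weight, honeypot_values in strategy:
        for v in values:
            count = honeypot_values.count(v)
            if count > 0:
                at_least_one[v] += weight          # statistic (1)
            portion[v] += weight * count / k       # statistic (2), divided by k
    return at_least_one, portion


strategy = [
    (Fraction(1, 2), (4, 4, 4)),   # contributes 3 * p / k to value 4
    (Fraction(1, 2), (3, 4, 2)),
]
at_least_one, portion = honeypot_statistics(strategy, k=3, values=[1, 2, 3, 4])
print(at_least_one)
print(portion)
```

In this toy strategy value 4 is used with probability 1 and receives 2/3 of the honeypots, while values 2 and 3 each receive 1/6, so the portions sum to 1.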

⁵ Maximal expected value in the case of the extended honeypot selection game.

5.2 Basic HSG

Game Values. The results for the game values are presented in Figure 3, first row. In Subfigure 3(a) we show the exploitability of the defender’s NE, Random, and Maximum strategies. The results show that the Maximum strategy gains almost no benefit from more than one honeypot, while the Random strategy shows a small gain. The NE is clearly stronger than the two baselines and shows significant increases in the defender’s utility as the number of honeypots increases.

Subfigure 3(b) shows the quality of the attacker’s strategies against the defender’s NE strategy (higher values are better for the attacker). The Random strategy performs better as there are more honeypots, and almost matches the other two strategies when n = k. This suggests that the defender strategy for using honeypots is effectively making it impossible for the attacker to distinguish between them based on value. From k = 0, the Maximum strategy has essentially the same payoff as the NE strategy, implying that it is part of the support set that the NE strategy randomizes over.

The second pair of subfigures (3(c), (d)) shows the game values for the Yule-Simon distribution. For both the exploitability of defender strategies and the payoffs of the attacker strategies against the NE, the progression is nearly identical with an increasing number of honeypots. The values are lower overall, which reflects the lower frequency of high-valued machines. The only exception is the Random strategy, which gains slightly more than the other strategies, though the NE strategy is still better. The overall similarity of results shows that the choice of distribution does not have a strong effect on the results.

Defender Strategy Analysis. We next study the properties of the defender strategies to understand their structure. The plots in Figure 4, first row, show how the defender chooses to assign values d ∈ {1, . . . , 4} to the honeypots (the case with k = 0 is meaningless and not included). Each color represents one of the four possible values.

Figure 4(a) shows the probability that at least one honeypot of the given value is used in the defender strategy. We see that it is very rare to use a honeypot with value 1, because there is little to be gained for the defender from protecting such low-valued machines. When there are few honeypots this is also the case for value 2, but as the number of honeypots increases the prevalence of honeypots with value 2 becomes more significant. In Figure 4(b), we present the expected proportion of honeypots in the network that have each value. The proportions tend to converge somewhat as the number increases, with probabilities of lower valued honeypots increasing slightly, while the higher values decrease slightly. This indicates a strategy closer to uniform random. The overall stability is quite interesting, because it suggests that network administrators can use the same basic allocation ratio over a range of possible numbers of honeypots.

In Subfigures 4(c) and (d) we present the same results for the Yule-Simon distribution. There is a noticeable increase in the probability of using at least one lower-valued server, primarily at the expense of value 4. The real configurations are very likely to contain servers with a low value, and much less likely to contain high values, so the defender has to protect low values more. A slight tendency for convergence can be seen here as well, but overall, the portions show less difference than in Subfigure 4(b).

[Figure 3 (eight line plots, images not reproduced). Legend (Strategy): NE, Maximum, Random. x-axis: Honeypots (0 to 5); y-axis: Attacker’s Utility. Panel titles: (a) Exploitability, q = 0; (b) Attacker strat. against NE, q = 0; (c) Exploitability, Y-S ρ = 1, q = 0; (d) Attacker strat. against NE, Y-S ρ = 1, q = 0; (e) Exploitability, q = 3; (f) Attacker strat. against NE, q = 3; (g) Exploitability, Y-S ρ = 1, q = 3; (h) Attacker strat. against NE, Y-S ρ = 1, q = 3.]

Figure 3. For all figures n = 5, h = {1, . . . , 5}, and D = {1, . . . , 4}. For (a), (b), (c), (d) q = 0 probes. (a) Exploitability of defender strategies for the HSG. (b) Value of the attacker strategies against the Nash Equilibrium (NE) strategy of the defender for the HSG. (c) Game values for the HSG with the Yule-Simon distribution of the real server values with ρ = 1. (d) Value of the attacker strategies against the NE strategy of the defender for the HSG with the Y-S distribution. For (e), (f), (g), (h) q = 3 probes. (e) Exploitability of defender strategies. (f) Value of the attacker strategies against defender NE. (g) Game values under Y-S. (h) Value of the attacker strategies against defender NE under Y-S.

5.3 HSG with Probes

We next present results for our extended model with attacker probing capabilities. Our base set of parameters is α(R|R) = 0.9, α(HP|HP) = 0.7, with q = 3 probes. The probing probability α(R|R) models the assumption that it is very likely for an R server not to behave as an HP. The α(HP|HP) assumes that the attacker is only slightly less likely to uncover an HP.

[Figure 4 (eight plots, images not reproduced). Legend (Honeypot importance): 1, 2, 3, 4. x-axis: Honeypots (1 to 5); y-axis: Probability (0 to 1). Panel titles: (a) uniform; (b) portion, uniform; (c) Y-S ρ = 1; (d) portion, Y-S ρ = 1 dist. of values; (e) uniform; (f) portion, uniform; (g) Y-S ρ = 1; (h) portion, Y-S ρ = 1.]

Figure 4. For all figures n = 5, h = {1, . . . , 5} and D = {1, . . . , 4}. For (a), (b), (c), (d) q = 0. (a) Probability of use of honeypot values under uniform distribution of real values. (b) The expected portion of selected honeypots having the given value under uniform distribution of real values. (c) Probability of use of honeypot values under Yule-Simon distribution of the real server values with ρ = 1. (d) The expected portion of honeypots under Y-S. For (e), (f), (g), (h) q = 3. (e) Probability of use of honeypot values under uniform. (f) The expected portion of honeypots under uniform. (g) Probability of use of honeypot values under Y-S. (h) The expected portion of honeypots under Y-S.

The attacker’s Random strategy uniformly probes the servers and then randomly chooses the final server value to attack. The attacker’s Maximum strategy assumes a uniform prior probability of any server being real, regardless of the defender’s strategy. The strategy probes according to the current highest expected value, conditioned by the observations and probings. When all probes have been expended, the server with the maximum expected value is attacked. While this strategy is simple from a strategic point of view, it has high memory requirements for evaluation because it needs to keep a separate probability vector for each possible plan, which resulted in the missing data point for k = 5 honeypots in Figures 3(e), (f).
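A simplified version of the greedy probing loop behind the Maximum strategy can be sketched in Python. This is a deliberately reduced illustration rather than the evaluated strategy: it assumes an independent belief per server, so only the probed server is updated and the correlations handled by the characteristic vectors are ignored. The function `maximum_strategy` and the `probe` callback are hypothetical.

```python
from fractions import Fraction


def maximum_strategy(values, prior, alpha, q, probe):
    """Greedy attacker: probe, then attack, the server with the highest expected value.

    values: importance value of each server.
    prior:  uniform prior probability that any server is real.
    alpha:  alpha[(outcome, truth)], probability of observing `outcome`.
    probe:  callable taking a server index and returning "R" or "HP".
    Returns (index of the attacked server, final per-server beliefs).
    """
    belief = [prior] * len(values)

    def expected_value(j):
        return values[j] * belief[j]

    for _ in range(q):
        i = max(range(len(values)), key=expected_value)
        outcome = probe(i)
        real = alpha[(outcome, "R")] * belief[i]
        honeypot = alpha[(outcome, "HP")] * (1 - belief[i])
        belief[i] = real / (real + honeypot)    # Bayes' rule for the probed server
    return max(range(len(values)), key=expected_value), belief


alpha = {
    ("R", "R"): Fraction(9, 10), ("HP", "R"): Fraction(1, 10),
    ("R", "HP"): Fraction(3, 10), ("HP", "HP"): Fraction(7, 10),
}
# Deterministic stand-in for the probe: server 2 looks like a honeypot, others look real.
target, belief = maximum_strategy(
    [1, 2, 4], Fraction(1, 2), alpha, q=3,
    probe=lambda i: "HP" if i == 2 else "R",
)
print(target, belief)
```

In this run the attacker first probes the value-4 server, loses confidence in it, and spends the remaining two probes on the value-2 server before attacking it.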

Game Values. The results for the game values are presented in Figure 3, second row. There is an almost linear decrease in the attacker’s utility in Subfigure 3(e), which contrasts with the results for the setting with q = 0, especially for the NE strategy (see Section 5.2). The near-linearity is also present in the evaluation of the other attacker strategies in Figure 3(f). The Maximum strategy compares reasonably well with the NE strategy for the attacker, but this tradeoff is not rewarded by smaller demands on resources. The Random strategy performs significantly worse than the other two. These two observations support the use of the NE attacker strategy.

The results are very similar to the basic HSG, with only one difference between Subfigures 3(e), (f) and 3(g), (h) apart from the shift towards 0, which is caused by the higher probability of lower-valued servers and the lower probability of higher-valued ones. Comparing the results in Subfigures 3(e) and 3(a), we observe that there is a difference between the quality of the Random and Maximum strategies. With q = 0, the Random strategy is better than Maximum, but with q = 3, it is worse. An intuition for this is that higher values need to be protected more, because a positive probe result gives the attacker high confidence that the machine is real.

Defender Strategy Analysis. Most of the observations we make for the case with no probes (q = 0) hold for q = 3 as well. One exception is that the leveling out of value 3 in Figure 4(b) is not present in Figure 4(f). Comparing the subfigures from the first row of Figure 4 (q = 0) with the second row (q = 3), we can see that with the increased number of probes, the highest value 4 is more preferred. We speculate that the reason for this might be the attacker’s increased chance of discerning honeypots from real servers. The selected values for α(•|•) give a high probability of a server being real if observed as real (α(R|R)), and a slightly lower probability for a honeypot observed as a honeypot (α(HP|HP)). This could explain why probabilities for 3-valued servers do not level out in the second column (Figure 4(f)) when compared to the case with q = 0 (Figure 4(b)).

6 Conclusion

We introduce new game-theoretic models for analyzing honeypot allocation and configuration problems in network security. These models significantly extend previous work in this area, and provide new insights into non-trivial strategies for using honeypots effectively in network security. Our model shows that honeypots should not always be configured to look like the most or least valuable machines in a network; instead, the optimal strategy is randomized and distributes honeypots that look like different types of machines on the network. This becomes increasingly important as networks move towards using a larger number of honeypots as ways to deceive and attract the attention of attackers. This is shown in our empirical results, where the Nash equilibrium strategies perform more strongly relative to the baselines as the number of available honeypots increases.

The first model we present is a type of deception game, where the defender tries to disguise honeypots in a network so that the attacker will choose to attack honeypots instead of real machines. Our second model extends this by including probing actions for the attackers, who can try to distinguish honeypots from real machines before actually launching an attack. The probes are noisy, so the attacker still needs to act with imperfect information in these models. We present linear programming models for solving both of these classes of games.

We study the behavior of both of our models empirically, using heuristic baseline strategies for both players. We also vary the assumption about the distribution of importance values on the network. The Nash equilibrium strategies in our models significantly outperform the baseline strategies, regardless of the distribution of values. We also studied the structure of the equilibrium strategies in these games, which show that honeypot values in both cases should be distributed across the space of possible configurations. As the number of honeypots increases, there may be some change in the strategies, with the optimal strategies placing greater weight on lower values.

Our analysis shows that there are important strategic issues that must be investigated to maximize the efficiency of honeypots in network security, particularly as the purpose of honeypots evolves from learning about attackers to actively deceiving and delaying attackers. It is not sufficient to consider only the technical issues involved in honeypot design; the strategic issues about how honeypots should be used must be considered as well.

