
CITI Technical Report 03-1

A Virtual Honeypot Framework

Niels Provos

Abstract

A honeypot is a closely monitored network decoy serving several purposes: it can distract adversaries from more valuable machines on a network, can provide early warning about new attack and exploitation trends, or allow in-depth examination of adversaries during and after exploitation of a honeypot. Deploying a physical honeypot is often time intensive and expensive as different operating systems require specialized hardware and every honeypot requires its own physical system. This paper presents Honeyd, a framework for virtual honeypots that simulates virtual computer systems at the network level. The simulated computer systems appear to run on unallocated network addresses. To deceive network fingerprinting tools, Honeyd simulates the networking stack of different operating systems and can provide arbitrary routing topologies and services for an arbitrary number of virtual systems. This paper discusses Honeyd’s design and shows how the Honeyd framework helps in many areas of system security, e.g. detecting and disabling worms, distracting adversaries, or preventing the spread of spam email.

October 21, 2003

Center for Information Technology Integration, University of Michigan

535 West William Street, Ann Arbor, MI 48103-4943


Niels Provos∗

Google

1 Introduction

Internet security is increasing in importance as more and more business is conducted over the Internet. Yet, despite decades of research and experience, we are still unable to make secure computer systems or even to measure their security.

As a result, exploitation of newly discovered vulnerabilities often catches us by surprise. Exploit automation and massive global scanning for vulnerabilities enable adversaries to compromise computer systems shortly after vulnerabilities become known [23].

One way to get early warnings of new vulnerabilities is to install and monitor computer systems on a network that we expect to be broken into. Every attempt to contact these systems via the network is suspect. We call such a system a honeypot. If a honeypot is compromised, we study the vulnerability that was used to compromise it. A honeypot may run any operating system and any number of services. The configured services determine the vectors an adversary may choose to compromise the system.

A physical honeypot is a real machine with its own IP address. A virtual honeypot is simulated by another machine that responds to network traffic sent to the virtual honeypot.

Virtual honeypots are attractive because they require fewer computer systems, which reduces maintenance costs. Using virtual honeypots, it is possible to populate a network with hosts running numerous operating systems. To convince adversaries that a virtual honeypot is running a given operating system, we need to simulate the TCP/IP stack of the target operating system carefully, in order to fool TCP/IP stack fingerprinting tools like Xprobe [1] or Nmap [7].

This paper describes the design and implementation of Honeyd, a framework for virtual honeypots that simulates computer systems at the network level. Honeyd supports the IP protocol suite [24] and responds to network requests for its virtual honeypots according to the services that are configured for each virtual honeypot. When sending a response packet, Honeyd’s personality engine makes it match the network behavior of the configured operating system personality.

∗This research was conducted by the author while at the Center for Information Technology Integration of the University of Michigan.

To simulate real networks, Honeyd creates virtual networks that consist of arbitrary routing topologies with configurable link characteristics such as latency and packet loss. When network mapping tools like traceroute are used to probe the virtual network, they discover only the topologies simulated by Honeyd.

Our experimental evaluation of Honeyd verifies that fingerprinting tools are deceived by the simulated systems and that our virtual network topologies seem realistic to network mapping tools.

To demonstrate the power of the Honeyd framework, we show how it can be used in many areas of system security. For example, Honeyd can help with detecting and disabling worms, distracting adversaries, or preventing the spread of spam email.

The rest of this paper is organized as follows. Section 2 presents background information on honeypots. In Section 3, we discuss the design and implementation of Honeyd. Section 4 presents an evaluation of the Honeyd framework in which we verify that fingerprinting and network mapping tools are fooled into reporting the specified system configurations. We describe how Honeyd can help to improve system security in Section 5 and present related work in Section 6. We summarize and conclude in Section 7.

2 Honeypots

This section presents background information on honeypots and our terminology. We provide motivation for their use by comparing honeypots to network intrusion detection systems (NIDS) [17]. The amount of useful information provided by NIDS is decreasing in the face of ever more sophisticated evasion techniques [19, 26] and an increasing number of protocols that employ encryption to protect network traffic from eavesdroppers. NIDS also suffer from high false positive rates that decrease their usefulness even further.

Honeypots can help with some of these problems. A honeypot is a closely monitored computing resource that we intend to be probed, attacked, or compromised. The value of a honeypot is determined by the information that we can obtain from it. Monitoring the data that enters and leaves a honeypot lets us gather information that is not available to NIDS. For example, we can log the keystrokes of an interactive session even if encryption is used to protect the network traffic. To detect malicious behavior, NIDS require signatures of known attacks and often fail to detect compromises that were unknown at the time they were deployed. Honeypots, on the other hand, can detect vulnerabilities that are not yet understood. For example, we can detect compromise by observing network traffic leaving the honeypot even if the means of the exploit has never been seen before.

Because a honeypot has no production value, any attempt to contact it is suspicious. Consequently, forensic analysis of data collected from honeypots is less likely to lead to false positives than data collected by NIDS.

Honeypots can run any operating system and any number of services. The configured services determine the vectors available to an adversary for compromising or probing the system. A high-interaction honeypot simulates all aspects of an operating system. A low-interaction honeypot simulates only some parts, for example the network stack [22]. A high-interaction honeypot can be compromised completely, allowing an adversary to gain full access to the system and use it to launch further network attacks. In contrast, low-interaction honeypots simulate only services that cannot be exploited to get complete access to the honeypot. Low-interaction honeypots are more limited, but they are useful to gather information at a higher level, e.g., learn about network probes or worm activity. They can also be used to analyze spammers or for active countermeasures against worms; see Section 5.

We also differentiate between physical and virtual honeypots. A physical honeypot is a real machine on the network with its own IP address. A virtual honeypot is simulated by another machine that responds to network traffic sent to the virtual honeypot.

When gathering information about network attacks or probes, the number of deployed honeypots influences the amount and accuracy of the collected data. A good example is measuring the activity of HTTP-based worms [21]. We can identify these worms only after they complete a TCP handshake and send their payload. However, most of their connection requests will go unanswered because they contact randomly chosen IP addresses. A honeypot can capture the worm payload by configuring it to function as a web server. The more honeypots we deploy, the more likely it is that one of them is contacted by a worm.

Physical honeypots are often high-interaction, allowing the system to be compromised completely, and they are expensive to install and maintain. For large address spaces, it is impractical or impossible to deploy a physical honeypot for each IP address. In that case, we need to deploy virtual honeypots.

3 Design and Implementation

In this section, we present Honeyd, a lightweight framework for creating virtual honeypots. The framework allows us to instrument thousands of IP addresses with virtual machines and corresponding network services. We start by discussing design considerations, then describe Honeyd’s architecture and implementation.

We limit adversaries to interacting with our honeypots only at the network level. Instead of simulating every aspect of an operating system, we choose to simulate only its network stack. The main drawback of this approach is that an adversary never gains access to a complete system even if he compromises a simulated service. Nevertheless, we are still able to capture connection and compromise attempts, and we can mitigate this drawback by combining Honeyd with a virtual machine like VMware [25], as discussed in the related work section. Because it simulates only the network stack, Honeyd is a low-interaction virtual honeypot that simulates TCP and UDP services. It also understands and responds correctly to ICMP messages.

Honeyd must be able to handle virtual honeypots on multiple IP addresses simultaneously, in order to populate the network with numerous virtual honeypots simulating different operating systems and services. To increase the realism of our simulation, the framework must be able to simulate arbitrary network topologies. To simulate address spaces that are topologically dispersed and for load sharing, the framework also needs to support network tunneling.

Figure 1 shows a conceptual overview of the framework’s operation. A central machine intercepts network traffic sent to the IP addresses of configured honeypots and simulates their responses. Before we describe Honeyd’s architecture, we explain how network packets for virtual honeypots reach the Honeyd host.

3.1 Receiving Network Data

Honeyd is designed to reply to network packets whose destination IP address belongs to one of the simulated honeypots. For Honeyd to receive the correct packets, the network needs to be configured appropriately. There are several ways to do this: we can create special routes for the virtual IP addresses that point to the Honeyd host, we can use Proxy ARP [3], or we can use network tunnels.

Figure 1: Honeyd receives traffic for its virtual honeypots via a router or Proxy ARP. For each honeypot, Honeyd can simulate the network stack behavior of a different operating system.

Let A be the IP address of our router and B the IP address of the Honeyd host. In the simplest case, the IP addresses of virtual honeypots lie within our local network. We denote them V1, ..., Vn. When an adversary sends a packet from the Internet to honeypot Vi, router A receives and attempts to forward the packet. The router queries its routing table to find the forwarding address for Vi. There are three possible outcomes: the router drops the packet because there is no route to Vi, router A forwards the packet to another router, or Vi lies within the local network range of the router and thus is directly reachable by A.

We use the latter two cases to direct traffic for Vi to B. The easiest way is to configure routing entries for Vi with 1 ≤ i ≤ n that point to B. In that case, the router forwards packets for our virtual honeypots directly to the Honeyd host. If no special route has been configured, the router ARPs to determine the MAC address of the virtual honeypot. As there is no corresponding physical machine, the ARP requests go unanswered and the router drops the packet after a few retries. We configure the Honeyd host to reply to ARP requests for Vi with its own MAC address. This is called Proxy ARP and allows the router to send packets for Vi to B’s MAC address.
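As an illustration only, the three routing outcomes and the Proxy ARP reply can be sketched in Python. The function names, the route table layout, and the MAC address below are hypothetical and not part of Honeyd; a real router performs longest-prefix matching in its own forwarding table, which the sketch imitates with the standard `ipaddress` module.

```python
import ipaddress
from typing import Dict, Optional, Set, Tuple


def router_decision(dst: str, static_routes: Dict[str, str],
                    local_net: str) -> Tuple[str, Optional[str]]:
    """Mimic router A's lookup for a packet destined to dst (hypothetical helper)."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(prefix), hop)
               for prefix, hop in static_routes.items()
               if addr in ipaddress.ip_network(prefix)]
    if matches:
        # Explicit route, e.g. Vi -> B: longest prefix wins.
        _, hop = max(matches, key=lambda m: m[0].prefixlen)
        return ("forward", hop)
    if addr in ipaddress.ip_network(local_net):
        return ("arp", dst)            # directly reachable: the router ARPs for Vi
    return ("drop", None)              # no route at all


def proxy_arp_reply(target_ip: str, virtual_ips: Set[str],
                    honeyd_mac: str) -> Optional[str]:
    """The Honeyd host answers ARP requests for any virtual honeypot with its own MAC."""
    return honeyd_mac if target_ip in virtual_ips else None
```

For example, with a static route `{"10.0.0.20/32": "10.0.0.2"}` the router forwards traffic for 10.0.0.20 to B at 10.0.0.2, while traffic for another local address triggers an ARP request that `proxy_arp_reply` answers on B’s behalf.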

In more complex environments, it is possible to tunnel network address space to a Honeyd host. We use the generic routing encapsulation (GRE) [9, 10] tunneling protocol, described in detail in Section 3.4.

Figure 2: This diagram gives an overview of Honeyd’s architecture. Incoming packets are dispatched to the correct protocol handler. For TCP and UDP, the configured services receive new data and send responses if necessary. All outgoing packets are modified by the personality engine to mimic the behavior of the configured network stack. The routing component is optional and used only when Honeyd simulates network topologies.

3.2 Architecture

Honeyd’s architecture consists of several components: a configuration database, a central packet dispatcher, protocol handlers, a personality engine, and an optional routing component; see Figure 2.

Incoming packets are processed by the central packet dispatcher. It first checks the length of an IP packet and verifies the packet’s checksum. The framework is aware of the three major Internet protocols: ICMP, TCP and UDP. Packets for other protocols are logged and silently discarded.

Before it can process a packet, the dispatcher must query the configuration database to find a honeypot configuration that corresponds to the destination IP address. If no specific configuration exists, a default template is used. Given a configuration, the packet and the corresponding configuration are handed to the protocol-specific handler.
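The dispatcher checks above might look roughly like the following Python sketch. It assumes a plain IPv4 header, uses the RFC 1071 Internet checksum, and stands in a dictionary for the configuration database; `dispatch`, `build_header`, and the template names are hypothetical and do not come from Honeyd’s C source.

```python
import struct

# IP protocol numbers the dispatcher understands.
HANDLERS = {1: "icmp", 6: "tcp", 17: "udp"}


def ip_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data = data + b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def dispatch(packet: bytes, configs: dict, default: str):
    """Return (handler, template) for an IPv4 packet, or None if it is discarded."""
    if len(packet) < 20:
        return None                         # shorter than a minimal IPv4 header
    ihl = (packet[0] & 0x0F) * 4
    total_len = struct.unpack("!H", packet[2:4])[0]
    if ihl < 20 or total_len < ihl or total_len > len(packet):
        return None                         # inconsistent length fields
    if ip_checksum(packet[:ihl]) != 0:
        return None                         # a valid header checksums to zero
    handler = HANDLERS.get(packet[9])
    if handler is None:
        return None                         # other protocols: log and discard
    dst = ".".join(str(b) for b in packet[16:20])
    return handler, configs.get(dst, default)   # fall back to the default template


def build_header(src: str, dst: str, proto: int) -> bytes:
    """Build a 20-byte IPv4 header with a correct checksum, for trying out dispatch."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, proto, 0,
                      bytes(int(x) for x in src.split(".")),
                      bytes(int(x) for x in dst.split(".")))
    return hdr[:10] + struct.pack("!H", ip_checksum(hdr)) + hdr[12:]
```

With `configs = {"10.0.0.1": "routerone"}`, a TCP packet for 10.0.0.1 is routed to the `tcp` handler with the `routerone` template, while a UDP packet for an unconfigured address falls back to the default template.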

The ICMP protocol handler supports most ICMP requests. By default, all honeypot configurations respond to echo requests and process destination unreachable messages. The handling of other requests depends on the configured personalities as described in Section 3.3.

For TCP and UDP, the framework can establish connections to arbitrary services. Services are external applications that receive data on stdin and send their output to stdout. The behavior of a service depends entirely on the external application. When a packet is received, the framework checks whether it is part of an established connection. In that case, any new data is sent to the already started service application. If the packet contains a connection request, a new process is created to run the appropriate service. Instead of creating a new process for each connection, the framework also supports subsystems. A subsystem is an application that runs in the name space of the virtual honeypot. The subsystem-specific application is started when the corresponding virtual honeypot is instantiated. A subsystem can bind to ports, accept connections, and initiate network traffic.
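To make the stdin/stdout contract concrete, here is a hypothetical web service of the kind a template could run (Figure 6 uses a shell script, `scripts/web.sh`, for the same role). The `Server` banner, the page body, and the function names are invented for illustration; per Section 3.6, whatever the service writes to stderr is reported to Honeyd for logging.

```python
def handle_request(lines):
    """Build a fake HTTP/1.0 response for the request lines a client sent."""
    parts = lines[0].split() if lines else []
    if len(parts) < 2 or parts[0] not in ("GET", "HEAD"):
        status, body = "400 Bad Request", ""
    else:
        status, body = "200 OK", "<html><body>It works!</body></html>\n"
    head = ("HTTP/1.0 %s\r\n"
            "Server: Apache/1.3.27\r\n"          # hypothetical banner
            "Content-Length: %d\r\n"
            "Connection: close\r\n\r\n" % (status, len(body)))
    # HEAD responses carry the headers only.
    return head if parts[:1] == ["HEAD"] else head + body


def serve(stdin, stdout, stderr):
    """Run one connection: Honeyd writes the client's data to stdin, reads replies from stdout."""
    lines = []
    for line in stdin:
        if line in ("\r\n", "\n"):     # a blank line ends the request headers
            break
        lines.append(line)
    stderr.write("request: %r\n" % (lines[0].strip() if lines else ""))  # log via stderr
    stdout.write(handle_request(lines))
    stdout.flush()
```

A script wrapping this would end with `serve(sys.stdin, sys.stdout, sys.stderr)` and could be attached to a template with a line such as `add netbsd tcp port 80 "python3 scripts/fake_web.py"` (a hypothetical path).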

Honeyd contains a simplified TCP state machine. The three-way handshake for connection establishment and connection teardown via FIN or RST are fully supported, but receiver and congestion window management is not fully implemented.
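A simplified state machine in the same spirit, covering only the handshake and FIN/RST teardown and ignoring windows, retransmission, and simultaneous close, could be sketched as follows. The state names follow RFC 793; the transition function itself is an assumption, not Honeyd’s code.

```python
from enum import Enum, auto


class State(Enum):
    LISTEN = auto()
    SYN_RCVD = auto()
    ESTABLISHED = auto()
    LAST_ACK = auto()
    CLOSED = auto()


def step(state: State, flags):
    """Return (next_state, flags_to_send) for one incoming segment's flag set."""
    if "RST" in flags:
        # Simplification: any RST closes this connection's state immediately.
        return State.CLOSED, frozenset()
    if state is State.LISTEN and "SYN" in flags and "ACK" not in flags:
        return State.SYN_RCVD, frozenset({"SYN", "ACK"})   # second handshake step
    if state is State.SYN_RCVD and "ACK" in flags and "SYN" not in flags:
        return State.ESTABLISHED, frozenset()             # handshake complete
    if state is State.ESTABLISHED and "FIN" in flags:
        # Acknowledge the peer's FIN and close our side in the same segment.
        return State.LAST_ACK, frozenset({"FIN", "ACK"})
    if state is State.LAST_ACK and "ACK" in flags:
        return State.CLOSED, frozenset()
    return state, frozenset()   # anything else: no transition, nothing sent
```

Feeding the sequence `{"SYN"}`, `{"ACK"}`, `{"FIN", "ACK"}`, `{"ACK"}` walks a connection from LISTEN through ESTABLISHED to CLOSED.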

UDP datagrams are passed directly to the application. When the framework receives a UDP packet for a closed port, it sends an ICMP port unreachable message unless this is forbidden by the configured personality. In sending ICMP port unreachable messages, the framework allows network mapping tools like traceroute to discover the simulated network topology.

In addition to establishing a connection to a local service, the framework also supports redirection of connections. The redirection may be static, or it can depend on the connection quadruple (source address, source port, destination address, and destination port). Redirection lets us forward a connection request for a service on a virtual honeypot to a service running on a real server. For example, we can redirect DNS requests to a proper name server. Or we can reflect connections back to an adversary; e.g., just for fun, we might redirect an SSH connection back to the originating host and cause the adversary to attack her own SSH server. Evil laugh.
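A redirection policy keyed on the connection quadruple might be expressed like this. The name server address (taken from the 192.0.2.0/24 documentation range) and the port-based rules are hypothetical examples of the static and quadruple-dependent cases described above.

```python
from typing import NamedTuple, Optional, Tuple


class Quad(NamedTuple):
    """The connection quadruple used to decide on redirection."""
    src: str
    sport: int
    dst: str
    dport: int


# Hypothetical real name server (documentation address range).
NAME_SERVER = ("192.0.2.53", 53)


def redirect_target(quad: Quad) -> Optional[Tuple[str, int]]:
    """Return (host, port) to forward the connection to, or None to use a local service."""
    if quad.dport == 53:
        return NAME_SERVER            # static redirection to a real DNS server
    if quad.dport == 22:
        return (quad.src, 22)         # reflect SSH back to the originating host
    return None
```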

Before a packet is sent to the network, it is processed by the personality engine. The personality engine adjusts the packet’s content so that it appears to originate from the network stack of the configured operating system.

3.3 Personality Engine

Adversaries commonly run fingerprinting tools like Xprobe [1] or Nmap [7] to gather information about a target system. It is important that honeypots do not stand out when fingerprinted. To make them appear real to a probe, Honeyd simulates the network stack behavior of a given operating system. We call this the personality of a virtual honeypot. Different personalities can be assigned to different virtual honeypots. The personality engine makes a honeypot’s network stack behave as specified by the personality by introducing changes into the protocol headers of every outgoing packet so that they match the characteristics of the configured operating system.

Fingerprint IRIX 6.5.15m on SGI O2

TSeq(Class=TD%gcd=<104%SI=<1AE%IPID=I%TS=2HZ)

T1(DF=N%W=EF2A%ACK=S++%Flags=AS%Ops=MNWNNTNNM)

T2(Resp=Y%DF=N%W=0%ACK=S%Flags=AR%Ops=)

T3(Resp=Y%DF=N%W=EF2A%ACK=O%Flags=A%Ops=NNT)

T4(DF=N%W=0%ACK=O%Flags=R%Ops=)

T5(DF=N%W=0%ACK=S++%Flags=AR%Ops=)

T6(DF=N%W=0%ACK=O%Flags=R%Ops=)

T7(DF=N%W=0%ACK=S%Flags=AR%Ops=)

PU(Resp=N)

Figure 3: An example of an Nmap fingerprint that specifies the network stack behavior of a system running IRIX.

The framework uses Nmap’s fingerprint database as its reference for a personality’s TCP and UDP behavior; Xprobe’s fingerprint database is used as the reference for a personality’s ICMP behavior.

Next, we explain how we use the information provided by Nmap’s fingerprints to change the characteristics of a honeypot’s network stack.

Each Nmap fingerprint has a format similar to the example shown in Figure 3. We use the string after the Fingerprint token as the personality name. The lines after the name describe the results for nine different tests. The first test is the most comprehensive. It determines how the network stack of the remote operating system creates the initial sequence number (ISN) for TCP SYN segments. Nmap indicates the difficulty of predicting ISNs in the Class field. Predictable ISNs pose a security problem because they allow an adversary to spoof connections [2]. The gcd and SI fields provide more detailed information about the ISN distribution. The first test also determines how IP identification numbers and TCP timestamps are generated.

The next seven tests determine the stack’s behavior for packets that arrive on open and closed TCP ports. The last test analyzes the ICMP response packet to a closed UDP port.
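Because the fingerprint lines are simple `Test(key=value%key=value...)` records, a small parser is enough to turn Figure 3 into a lookup table for a personality engine. This Python sketch is an assumption about how such a table could be built; it handles only the format shown in Figure 3 and keeps every value as a string.

```python
import re

# One test record, e.g. "T1(DF=N%W=EF2A%ACK=S++%Flags=AS%Ops=MNWNNTNNM)".
RECORD = re.compile(r"^(\w+)\((.*)\)$")


def parse_fingerprint(text: str):
    """Return (personality_name, {test_name: {field: value}}) for one fingerprint."""
    name, tests = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Fingerprint "):
            name = line[len("Fingerprint "):]   # the personality name
            continue
        match = RECORD.match(line)
        if match is None:
            raise ValueError("unrecognised line: %r" % line)
        test, body = match.groups()
        fields = {}
        for item in body.split("%"):
            if item:
                key, _, value = item.partition("=")
                fields[key] = value            # e.g. "<104", "S++", or "" for empty Ops
        tests[test] = fields
    return name, tests


FIGURE_3 = """Fingerprint IRIX 6.5.15m on SGI O2
TSeq(Class=TD%gcd=<104%SI=<1AE%IPID=I%TS=2HZ)
T1(DF=N%W=EF2A%ACK=S++%Flags=AS%Ops=MNWNNTNNM)
T2(Resp=Y%DF=N%W=0%ACK=S%Flags=AR%Ops=)
T3(Resp=Y%DF=N%W=EF2A%ACK=O%Flags=A%Ops=NNT)
T4(DF=N%W=0%ACK=O%Flags=R%Ops=)
T5(DF=N%W=0%ACK=S++%Flags=AR%Ops=)
T6(DF=N%W=0%ACK=O%Flags=R%Ops=)
T7(DF=N%W=0%ACK=S%Flags=AR%Ops=)
PU(Resp=N)"""
```

For instance, `parse_fingerprint(FIGURE_3)[1]["T1"]["W"]` yields the window value `EF2A` from the first TCP test.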

The framework keeps state for each honeypot. The state includes information about ISN generation, the boot time of the honeypot and the current IP packet identification number. Keeping state is necessary so that we can generate subsequent ISNs that follow the distribution specified by the fingerprint.

Figure 4: The diagram shows the structure of the TCP header. Honeyd changes options and other parameters to match the behavior of network stacks.

Nmap’s fingerprinting is mostly concerned with an operating system’s TCP implementation. TCP is a stateful, connection-oriented protocol that provides error recovery and congestion control [18]. TCP also supports additional options, not all of which are implemented by all systems. The size of the advertised receiver window varies between implementations and is used by Nmap as part of the fingerprint.

When the framework sends a packet for a newly established TCP connection, it uses the Nmap fingerprint to determine the initial window size. After a connection has been established, the framework adjusts the window size according to the amount of buffered data.

If TCP options present in the fingerprint have been negotiated during connection establishment, then Honeyd inserts them into the response packet. The framework uses the fingerprint to determine the frequency with which TCP timestamps are updated. For most operating systems, the update frequency is 2 Hz.
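A minimal sketch of a per-honeypot timestamp clock follows, assuming that the timestamp value simply counts ticks at the personality’s frequency (for example `TS=2HZ` in Figure 3) since the honeypot’s simulated boot time, which the framework keeps as part of its state. The class name is hypothetical.

```python
import time
from typing import Optional


class TimestampClock:
    """TCP timestamp (TSval) source that ticks at a personality-specific rate."""

    def __init__(self, hz: int = 2, boot_time: Optional[float] = None):
        self.hz = hz
        # Simulated boot time of the honeypot, kept as per-honeypot state.
        self.boot_time = time.time() if boot_time is None else boot_time

    def value(self, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        ticks = int((now - self.boot_time) * self.hz)
        return ticks & 0xFFFFFFFF          # TSval is a 32-bit field and wraps
```

At 2 Hz, a honeypot that "booted" ten seconds ago reports a timestamp value of 20.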

Generating the correct distribution of initial sequence numbers is tricky. Nmap obtains six ISN samples and analyzes their consecutive differences. Nmap recognizes several ISN generation types: constant differences, differences that are multiples of a constant, completely random differences, time dependent, and random increments. To differentiate between the latter two cases, Nmap calculates the greatest common divisor (gcd) and standard deviation for the collected differences.
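The quantities named above can be computed from six samples as shown below. This reproduces only the consecutive differences, their gcd, and their standard deviation; Nmap’s own SI value and class thresholds are not reproduced, and `analyse_isns` is a hypothetical helper.

```python
from functools import reduce
from math import gcd
from statistics import pstdev


def analyse_isns(samples):
    """Consecutive ISN differences (mod 2**32), their gcd, and their standard deviation."""
    diffs = [(b - a) % 2**32 for a, b in zip(samples, samples[1:])]
    return diffs, reduce(gcd, diffs), pstdev(diffs)
```

For constant increments such as `[5, 64005, 128005, 192005, 256005, 320005]`, every difference is 64000, so the gcd is 64000 and the standard deviation is zero; differences that are varying multiples of a constant yield that constant as the gcd together with a nonzero deviation.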

The framework keeps track of the last ISN that was generated by each honeypot and its generation time. For new TCP connection requests, Honeyd uses a formula that approximates the distribution described by the fingerprint’s gcd and standard deviation. In this way, the generated ISNs match the generation class that Nmap expects for the particular operating system.

Figure 5: The diagram shows the structure of an ICMP port unreachable message. Honeyd introduces errors into the quoted IP header to match the behavior of network stacks.

For the IP header, Honeyd adjusts the generation of the identification number. It can either be zero, increment by one, or random.
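The three IP ID policies are easy to model per honeypot. The mode names and the class are hypothetical labels for this sketch; in Figure 3, `IPID=I` is Nmap’s notation for incremental IDs.

```python
import random
from typing import Optional


class IpIdGenerator:
    """Per-honeypot IP identification numbers: always zero, incremental, or random."""

    MODES = ("zero", "increment", "random")

    def __init__(self, mode: str, start: int = 0, rng: Optional[random.Random] = None):
        if mode not in self.MODES:
            raise ValueError("unknown IP ID mode: %r" % mode)
        self.mode = mode
        self.current = start & 0xFFFF         # last ID handed out, kept as honeypot state
        self.rng = rng or random.Random()

    def next_id(self) -> int:
        if self.mode == "zero":
            return 0
        if self.mode == "increment":
            self.current = (self.current + 1) & 0xFFFF   # the 16-bit field wraps
            return self.current
        return self.rng.randrange(0x10000)
```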

For ICMP packets, the personality engine uses the PU test entry to determine how the quoted IP header should be modified, and it consults the associated Xprobe fingerprint for further information. Some operating systems modify the incoming packet by changing fields from network to host byte order and, as a result, quote the IP and UDP headers incorrectly. Honeyd introduces these errors if necessary. Figure 5 shows an example of an ICMP destination unreachable message. The framework also supports the generation of other ICMP messages, which are not described here due to space constraints.
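The byte-order errors can be imitated by re-encoding selected fields of the quoted header. Which fields a given stack mangles comes from the fingerprints; the choice of the total-length and fragment-offset fields below is an assumption made for illustration, and `quote_ip_header` is a hypothetical helper.

```python
import struct


def quote_ip_header(header: bytes, swap_len_off: bool) -> bytes:
    """Return the IP header to quote inside an ICMP error message.

    If swap_len_off is set, the total-length and flags/fragment-offset fields
    are written back in little-endian (host) order, as a stack would if it
    converted them from network order and then quoted the converted values.
    """
    if not swap_len_off:
        return header
    # Bytes 2-3: total length; 4-5: identification (skipped); 6-7: flags/offset.
    total_len, frag = struct.unpack("!H2xH", header[2:8])
    return (header[:2] + struct.pack("<H", total_len) + header[4:6]
            + struct.pack("<H", frag) + header[8:])
```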

3.4 Routing Topology

Because Honeyd can simulate arbitrary virtual routing topologies, it is not always possible to use Proxy ARP to direct the packets to the Honeyd host. Instead, we need to configure routers to delegate network address space to our host.

Normally, the virtual routing topology is a tree rooted at the point where packets enter the virtual routing topology. Each interior node of the tree represents a router and each edge a link that contains latency and packet loss characteristics. Terminal nodes of the tree correspond to networks. The framework supports multiple entry points that can exist in parallel. An entry router is chosen according to the network space for which it is responsible.
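The tree just described, together with the traversal, latency and loss accounting, and TTL handling explained in the following paragraphs, can be sketched in Python. The classes, the independence assumption used to combine per-link loss, and the reading of Figure 6’s `loss 0.1` as 0.1 percent are all assumptions of this sketch rather than details taken from Honeyd.

```python
import random
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Callable, List


@dataclass
class Router:
    addr: str
    networks: List[str] = field(default_factory=list)   # directly attached networks
    links: List["Link"] = field(default_factory=list)   # child routers in the tree


@dataclass
class Link:
    router: Router
    prefix: str          # address space reached through this link
    latency_ms: float
    loss: float          # drop probability on this link, in [0, 1]


def route(entry: Router, dst: str, ttl: int,
          rng: Callable[[], float] = random.random):
    """Walk the tree from the entry router toward dst.

    Returns ("deliver", delay_ms), ("dropped", None), ("unreachable", None),
    or ("time-exceeded", router_addr) when a router decrements the TTL to zero.
    """
    addr = ip_address(dst)
    node, delay, survive = entry, 0.0, 1.0
    while True:
        ttl -= 1                                   # every traversed router decrements the TTL
        if ttl <= 0:
            return ("time-exceeded", node.addr)    # ICMP sourced from this router
        if any(addr in ip_network(net) for net in node.networks):
            break                                  # destination is on an attached network
        link = next((lk for lk in node.links if addr in ip_network(lk.prefix)), None)
        if link is None:
            return ("unreachable", None)
        delay += link.latency_ms                   # latencies add up along the path
        survive *= 1.0 - link.loss                 # assume independent losses per link
        node = link.router
    if rng() >= survive:
        return ("dropped", None)
    return ("deliver", delay)


# Topology of Figure 6 (loss 0.1 read as 0.1 percent: an assumption).
r_10_1 = Router("10.1.0.1", ["10.1.0.0/24"])
r_10_2 = Router("10.2.0.1", ["10.2.0.0/24"])
entry = Router("10.0.0.1", ["10.0.0.0/24"], [
    Link(r_10_1, "10.1.0.0/16", 55.0, 0.001),
    Link(r_10_2, "10.2.0.0/16", 20.0, 0.001),
])
```

With TTL 1 the entry router 10.0.0.1 answers, with TTL 2 the router 10.1.0.1 answers, and with TTL 3 the packet reaches 10.1.0.2 after 55 ms one way. Because a round trip crosses every link twice, a traceroute in this model would report about 110 ms, consistent with the doubling observed in Section 4.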

To simulate an asymmetric network topology, we consult the routing tables when a packet enters the framework and again when it leaves the framework; see Figure 2. In this case, the network topology resembles a directed acyclic graph¹.

When the framework receives a packet, it finds the correct entry routing tree and traverses it, starting at the root, until it finds a node that contains the destination IP address of the packet. Packet loss and latency of all edges on the path are accumulated to determine whether the packet is dropped and how long its delivery should be delayed.

The framework also decrements the time to live (TTL) field of the packet for each traversed router. If the TTL reaches zero, the framework sends an ICMP time exceeded message with the source IP address of the router that causes the TTL to reach zero.

For network simulations, it is possible to integrate real systems into the virtual routing topology. When the framework receives a packet for a real system, it traverses the topology until it finds a virtual router that is directly responsible for the network space that the real machine belongs to. The framework sends an ARP request if necessary to discover the hardware address of the system, then encapsulates the packet in an Ethernet frame. Similarly, the framework responds with ARP replies from the corresponding virtual router when the real system sends ARP requests.

We can split the routing topology using GRE to tunnel networks. This allows us to load balance across several Honeyd installations by delegating parts of the address space to different Honeyd hosts. Using GRE tunnels, it is also possible to delegate networks that belong to separate parts of the address space to a single Honeyd host. For the reverse route, an outgoing tunnel is selected based on both the source and the destination IP address. An example of such a configuration is described in Section 5.

¹Although it is possible to configure routing loops, this is normally undesirable and should be avoided.

3.5 Configuration

A virtual honeypot is configured with a template, a reference for a completely configured computer system. New templates are created with the create command.

The set and add commands change the configuration of a template. The set command assigns a personality from the Nmap fingerprint file to a template. The personality determines the behavior of the network stack, as discussed in Section 3.3. The set command also defines the default behavior for the supported network protocols. The default behavior is one of the following values: block, reset, or open. Block means that all packets for the specified protocol are dropped by default. Reset indicates that all ports are closed by default. Open means that they are all open by default. The latter two settings make a difference only for UDP and TCP.
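How the three default actions might translate into responses for a port without an explicit service is sketched below. The function and its return labels are hypothetical; the UDP case follows Section 3.2, which sends an ICMP port unreachable message for closed ports unless the personality forbids it (as `PU(Resp=N)` in Figure 3 indicates).

```python
from typing import Optional


def default_response(proto: str, action: str, port: int, open_ports: set,
                     pu_resp: bool = True) -> Optional[str]:
    """Response to a TCP or UDP packet for `port` under a template's default action.

    Returns "accept", "tcp-rst", "icmp-port-unreachable", or None (drop silently).
    """
    if action not in ("block", "reset", "open"):
        raise ValueError("unknown default action: %r" % action)
    if port in open_ports or action == "open":
        return "accept"            # an explicit service or an open-by-default port
    if action == "block":
        return None                # block: drop the packet without any reply
    if proto == "tcp":
        return "tcp-rst"           # reset: a closed TCP port answers with RST
    return "icmp-port-unreachable" if pu_resp else None   # closed UDP port
```

With Figure 6’s `set routerone default tcp action reset` and a telnet service on port 23, a probe to TCP port 80 on `routerone` would receive a reset in this model.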

We specify the services that are remotely accessible with the add command. In addition to the template name, we need to specify the protocol, port, and command to execute for each service. Instead of specifying a service, Honeyd also recognizes the keyword proxy, which allows us to forward network connections to a different host. The framework expands the following four variables for both the service and the proxy statement: $ipsrc, $ipdst, $sport, and $dport. Variable expansion allows a service to adapt its behavior depending on the particular network connection it is handling. It is also possible to redirect network probes back to the host that is doing the probing.
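Python’s `string.Template` uses the same `$name` syntax, so the expansion step can be illustrated in a few lines. This is only an analogy to Honeyd’s own expansion code, and `expand` is a hypothetical helper.

```python
from string import Template


def expand(command: str, ipsrc: str, ipdst: str, sport: int, dport: int) -> str:
    """Substitute the four connection variables into a service or proxy string."""
    return Template(command).safe_substitute(
        ipsrc=ipsrc, ipdst=ipdst, sport=sport, dport=dport)
```

Applied to the proxy target `$ipsrc:22` from Figure 6, a connection from 198.51.100.9 expands to `198.51.100.9:22`, which sends the SSH connection back to its originator.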

The bind command assigns a template to an IP address. If no template is assigned to an IP address, we use the default template. Figure 6 shows an example configuration that specifies a routing topology and two templates. The router template mimics the network stack of a Cisco 7206 router and is accessible only via telnet. The web server template runs two services: a simple web server and a forwarder for SSH connections. In this case, the forwarder redirects SSH connections back to the connection initiator. A real machine is integrated into the virtual routing topology at IP address 10.1.0.3.

3.6 Logging

The Honeyd framework supports several ways of logging network activity. It can create connection logs that report attempted and completed connections for all protocols. More usefully, information can be gathered from the services themselves. Service applications can report data to be logged to Honeyd via stderr. The framework uses syslog to store the information on the system. In most situations, we expect that Honeyd runs in conjunction with a NIDS.

4 Evaluation

This section presents an evaluation of Honeyd’s ability to create virtual network topologies and to mimic different network stacks.

We start Honeyd with a configuration similar to the one shown in Figure 6 and use traceroute to find the routing path to a virtual host. We notice that the measured latency is double the latency that we configured. This is correct because packets have to traverse each link twice.

Running Nmap against IP addresses 10.0.0.1 and 10.1.0.2 results in the correct identification of the configured personalities. Nmap reports that 10.0.0.1 seems to be a Cisco router and that 10.1.0.2 seems to run NetBSD. Xprobe identifies 10.0.0.1 as a Cisco router and lists a number of possible operating systems, including NetBSD, for 10.1.0.2.

route entry 10.0.0.1

route 10.0.0.1 link 10.0.0.0/24

route 10.0.0.1 add net 10.1.0.0/16 10.1.0.1 latency 55ms loss 0.1

route 10.0.0.1 add net 10.2.0.0/16 10.2.0.1 latency 20ms loss 0.1

route 10.1.0.1 link 10.1.0.0/24

route 10.2.0.1 link 10.2.0.0/24

create routerone

set routerone personality "Cisco 7206 running IOS 11.1(24)"

set routerone default tcp action reset

add routerone tcp port 23 "scripts/router-telnet.pl"

create netbsd

set netbsd personality "NetBSD 1.5.2 running on a Commodore Amiga (68040 processor)"

set netbsd default tcp action reset

add netbsd tcp port 22 proxy $ipsrc:22

add netbsd tcp port 80 "sh scripts/web.sh"

bind 10.0.0.1 routerone

bind 10.1.0.2 netbsd

bind 10.1.0.3 to fxp0

Figure 6: An example configuration for Honeyd. The configuration language is a context-free grammar. This example creates a virtual routing topology and defines two templates: a router that can be accessed via telnet and a host that is running a web server. A real system is integrated into the virtual routing topology at IP address 10.1.0.3.

To fully test whether the framework fools Nmap, we set up a class B network populated with virtual honeypots for every fingerprint in Nmap’s fingerprint file. After removing duplicates, we found 600 distinct fingerprints. The honeypots were configured so that all but one port were closed; the open port ran a web server. We then launched Nmap against all configured IP addresses and checked which operating systems Nmap identified. For 555 fingerprints, Nmap uniquely identified the operating system simulated by Honeyd. For 37 fingerprints, Nmap presented a list of possible choices that included the simulated personality. Nmap failed to identify the correct operating system for only 8 fingerprints. This might be due to a problem in Honeyd, or it could be due to a badly formed fingerprint database.
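For reference, the proportions behind these counts can be computed directly; the category labels are ours.

```python
results = {"identified uniquely": 555, "in candidate list": 37, "misidentified": 8}
total = sum(results.values())                     # all 600 distinct fingerprints
rates = {label: 100 * n / total for label, n in results.items()}
for label, rate in rates.items():
    print(f"{label}: {results[label]}/{total} = {rate:.1f}%")
```

This gives 92.5% unique identifications, about 6.2% where the simulated personality appears in Nmap’s candidate list, and about 1.3% failures, so roughly 98.7% of the personalities were either identified or listed as a candidate.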

5 Applications

In this section, we describe how the Honeyd framework can be used in different areas of system security.

5.1 Network Decoys

The traditional role of a honeypot is that of a network decoy. Our framework can be used to instrument the unallocated addresses of a production network with virtual honeypots. Adversaries that scan the production network can potentially be confused and deterred by the virtual honeypots. In conjunction with a NIDS, the resulting network traffic may help in getting early warning of attacks.

$ traceroute -n 10.3.0.10

traceroute to 10.3.0.10 (10.3.0.10), 64 hops max

1 10.0.0.1 0.456 ms 0.193 ms 0.93 ms

2 10.2.0.1 46.799 ms 45.541 ms 51.401 ms

3 10.3.0.1 68.293 ms 69.848 ms 69.878 ms

4 10.3.0.10 79.876 ms 79.798 ms 79.926 ms

Figure 7: Using traceroute, we measure a routing path in the virtual routing topology. The measured latencies match the configured ones.

5.2 Detecting and Countering Worms

Honeypots are ideally suited to intercept traffic from adversaries that randomly scan the network. This is especially true for Internet worms that use some form of random scanning for new targets [23], e.g. Blaster [5], Code Red [13], Nimda [4], Slammer [14], etc. In this section, we show how a virtual honeypot deployment can be used to detect new worms and how to launch active countermeasures against infected machines once a worm has been identified.

Figure 8: The graphs show the simulated worm propagation when immunizing infected hosts that connect to a virtual honeypot. The left graph shows the propagation if the virtual honeypots are activated one hour after the worm starts spreading. The right graph shows the propagation if the honeypots are activated after 20 minutes.

To intercept probes from worms, we instrument virtual honeypots on unallocated network addresses. The probability of receiving a probe depends on the number of infected machines i, the worm propagation chance, and the number of deployed honeypots h. The worm propagation chance depends on the worm propagation algorithm, the number of vulnerable hosts, and the size of the address space. In general, the larger our honeypot deployment, the earlier one of the honeypots receives a worm probe.
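To give the dependence on h a concrete shape, assume that every probe targets an address drawn uniformly from the 2^32 IPv4 addresses. A single probe then hits one of h honeypots with probability p = h/2^32, and n probes yield at least one hit with probability 1 − (1 − p)^n. This model is an illustration, not part of the paper’s analysis; it folds the number of infected machines and their scan rates into the single probe count n, and both helper names are hypothetical.

```python
import math


def p_any_hit(h: int, probes: float, space: int = 2**32) -> float:
    """P(at least one of `probes` uniform random scans lands on one of h honeypots)."""
    p = h / space
    # 1 - (1 - p)**probes, computed stably for tiny p with log1p/expm1.
    return -math.expm1(probes * math.log1p(-p))


def probes_for(h: int, target: float, space: int = 2**32) -> float:
    """Number of probes after which the hit probability reaches `target`."""
    p = h / space
    return math.log1p(-target) / math.log1p(-p)
```

With a /16 worth of honeypots (65,536 addresses, the size of the class B network used in Section 4), roughly 45,400 random probes give even odds that at least one honeypot is contacted; doubling h roughly halves that number.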

To detect new worms, we can use the Honeyd framework in two different ways. First, we may deploy a large number of virtual honeypots as gateways in front of a smaller number of high-interaction honeypots. Honeyd instruments the virtual honeypots. It forwards only TCP connections that have been established and only UDP packets that carry a payload that fails to match a known fingerprint. In such a setting, Honeyd shields the high-interaction honeypots from uninteresting scanning or backscatter activity. A high-interaction honeypot like ReVirt [6] is used to detect compromises or unusual network activity. Using the automated NIDS signature generation proposed by Kreibich and Crowcroft [12], we can then block the detected worm or exploit at the network border. The effectiveness of this approach has been analyzed by Moore et al. [15]. To improve it, we can configure Honeyd to replay packets to several high-interaction honeypots that run different operating systems and software versions.
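The forwarding rule of the gateway setup can be sketched as a simple predicate. This is a minimal illustration, not Honeyd’s code: the `Packet` record, the `established` flag (standing in for a completed three-way handshake) and the use of a SHA-256 digest of the payload as the known fingerprint are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Packet:
    proto: str                 # "tcp" or "udp"
    payload: bytes = b""
    established: bool = False  # TCP only: True once the handshake has completed

def payload_fingerprint(payload: bytes) -> str:
    """Hypothetical fingerprint: SHA-256 digest of the raw payload."""
    return hashlib.sha256(payload).hexdigest()

def should_forward(pkt: Packet, known: set) -> bool:
    """Return True if the packet should reach a high-interaction honeypot."""
    if pkt.proto == "tcp":
        # Port scans and backscatter normally never complete a handshake.
        return pkt.established
    if pkt.proto == "udp":
        # Empty datagrams and payloads we have already seen are not interesting.
        return bool(pkt.payload) and payload_fingerprint(pkt.payload) not in known
    return False  # other protocols are not forwarded in this sketch

if __name__ == "__main__":
    known = {payload_fingerprint(b"old exploit")}
    print(should_forward(Packet("udp", b"new exploit"), known))
```

An exact hash is the simplest possible fingerprint; a real deployment would likely need something fuzzier, since a single changed byte in a known payload would otherwise be forwarded as new.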

Alternatively, we can use Honeyd’s subsystem support to expose regular UNIX applications like OpenSSH to worms. This solution is limited, as we can detect only worms that target the operating system running the framework, and most worms target Microsoft Windows, not UNIX.

Moore et al. show that containing worms is not practical on an Internet scale unless a large fraction of the Internet cooperates in the containment effort [15]. However, with the Honeyd framework, it is possible to actively counter worm propagation by immunizing infected hosts that contact our virtual honeypots. Analogous to Moore et al. [15], we can model the effect of immunization on worm propagation by using the classic SIR epidemic model [11]. The model states that the number of newly infected hosts increases linearly with the product of the fraction of infected hosts, the fraction of susceptible hosts and the contact rate. The immunization is represented by a decrease in new infections that is linear in the number of infected hosts:

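$$
\begin{aligned}
\frac{ds}{dt} &= -\beta\, i(t)\, s(t) \\
\frac{di}{dt} &= \beta\, i(t)\, s(t) - \gamma\, i(t) \\
\frac{dr}{dt} &= \gamma\, i(t),
\end{aligned}
$$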

where, at time t, i(t) is the fraction of infected hosts, s(t) the fraction of susceptible hosts and r(t) the fraction of immunized hosts. The propagation speed of the worm is characterized by the contact rate β, and the immunization rate is represented by γ.
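The SIR equations above can be integrated numerically with a few lines of forward Euler. The sketch below is illustrative only (the step size and function name are our own choices). Each step moves mass between s, i and r without creating or destroying any, so s + i + r stays equal to 1.

```python
def simulate_sir(beta, gamma, i0, t_end, dt=0.01):
    """Forward-Euler integration of the SIR model; returns (s, i, r) at t_end.

    Starts from s = 1 - i0, i = i0, r = 0.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(round(t_end / dt)):
        new_infections = beta * i * s * dt   # beta * i(t) * s(t) * dt
        new_immunized = gamma * i * dt       # gamma * i(t) * dt
        s -= new_infections
        i += new_infections - new_immunized
        r += new_immunized
    return s, i, r

if __name__ == "__main__":
    print(simulate_sir(beta=0.5, gamma=0.1, i0=0.01, t_end=50))
```

When γ exceeds β, di/dt = i(t)(β s(t) − γ) is negative for every s(t) ≤ 1, so the infected fraction can only shrink; this is the regime that active immunization aims for.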

We simulate worm propagation based on the parameters for a Code-Red-like worm [13, 15]. We use 360,000 susceptible machines in a 2^32 address space and set the initial worm seed to 150 infected machines. Each worm instance launches 50 probes per second. The simulation measures the effectiveness of active immunization by virtual honeypots. The honeypots start working after a time delay, which represents the time required to detect the worm and install the immunization code. We expect that immunization code can be prepared before a vulnerability is actively exploited. Figure 8 shows the worm propagation resulting from a varying number of instrumented honeypots. The graph on the left shows the results if the honeypots are brought online an hour after the worm started spreading. The graph on the right shows the results if the honeypots can be activated within 20 minutes. If we wait for an hour, all vulnerable machines on the Internet will be infected. Our chances are better if we start the honeypots after only 20 minutes. In that case, a deployment of about 262,000 honeypots is capable of stopping the worm from spreading.
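A simplified version of this experiment can be sketched by mapping the stated parameters onto the SIR model. Assuming uniform random scanning over 2^32 addresses, an infected host’s probe reaches a susceptible machine with probability proportional to the susceptible fraction, giving β = 50 × 360,000 / 2^32, and reaches a honeypot with probability h / 2^32, giving γ = 50 × h / 2^32 once the honeypots are active. This mean-field sketch, with its one-second step and six-hour horizon, is our own approximation and not the simulator behind Figure 8, so its numbers need not match the figure.

```python
ADDRESS_SPACE = 2 ** 32   # uniformly scanned IPv4 space (assumption)
VULNERABLE = 360_000      # susceptible machines, from the text
SEED = 150                # initially infected machines, from the text
PROBES_PER_SEC = 50       # probes per infected machine, from the text

def fraction_ever_infected(honeypots, delay_s, duration_s=6 * 3600, dt=1.0):
    """Fraction of vulnerable hosts infected at some point during the run.

    Mean-field SIR with Euler steps; honeypot immunization (gamma > 0)
    only starts once delay_s seconds have elapsed.
    """
    beta = PROBES_PER_SEC * VULNERABLE / ADDRESS_SPACE
    gamma_active = PROBES_PER_SEC * honeypots / ADDRESS_SPACE
    i = SEED / VULNERABLE
    s, r = 1.0 - i, 0.0
    for step in range(int(duration_s / dt)):
        gamma = gamma_active if step * dt >= delay_s else 0.0
        new_inf = beta * i * s * dt
        new_imm = gamma * i * dt
        s -= new_inf
        i += new_inf - new_imm
        r += new_imm
    return 1.0 - s   # every host that left the susceptible pool was infected

if __name__ == "__main__":
    for h in (0, 100_000, 262_000):
        print(h, round(fraction_ever_infected(h, delay_s=20 * 60), 3))
```

In this sketch, once h exceeds the 360,000 vulnerable hosts, γ exceeds β and the infected fraction shrinks from the moment the honeypots activate; for smaller h, immunization only wins after the susceptible fraction has fallen below γ/β.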

Alternatively, it would be possible to scan the Internet for vulnerable systems and remotely patch them. For ethical reasons, this is probably not feasible. However, if we can reliably detect an infected machine with our virtual honeypot framework, then active immunization might be an appropriate response. For the Blaster worm, this idea has been realized by Oudot [16].

5.3 Spam Prevention

The Honeyd framework can be used to understand how spammers operate and to automate the identification of new spam, which can then be submitted to collaborative spam filters.

In general, spammers abuse two Internet services: proxy servers [8] and open mail relays. Open proxies are often used to connect to other proxies or to submit spam email to open mail relays. Spammers can use open proxies to hide their identity and prevent the spam from being traced back to its origin. An open mail relay accepts email from any sender address to any recipient address. By sending spam email to open mail relays, a spammer causes the mail relay to deliver the spam in his stead.

To understand how spammers operate, we use the Honeyd framework to instrument networks with open proxy servers and open mail relays. We make use of Honeyd’s GRE tunneling capabilities and tunnel several C-class networks to a central Honeyd host.

We populate our network space with randomly chosen IP addresses and a random selection of services. Some virtual hosts run only an open proxy, others only an open mail relay, and others a combination of both.
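The random population step might look like the sketch below. The two documentation prefixes stand in for the tunneled C-class networks, and the service labels `open_proxy` and `open_relay` are placeholders rather than Honeyd template names; a seeded generator keeps a layout reproducible.

```python
import ipaddress
import random

SERVICES = ("open_proxy", "open_relay")  # placeholder labels, not Honeyd syntax

def populate(networks, n_hosts, seed=None):
    """Map n_hosts random addresses from `networks` to random non-empty service sets."""
    rng = random.Random(seed)
    pool = [addr for net in networks for addr in ipaddress.ip_network(net).hosts()]
    layout = {}
    for addr in rng.sample(pool, n_hosts):       # distinct addresses, no repeats
        k = rng.randint(1, len(SERVICES))        # 1 = proxy or relay, 2 = both
        layout[str(addr)] = set(rng.sample(SERVICES, k))
    return layout

if __name__ == "__main__":
    hosts = populate(["192.0.2.0/24", "198.51.100.0/24"], n_hosts=5, seed=7)
    for addr, services in sorted(hosts.items()):
        print(addr, sorted(services))
```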

Figure 9: Using the Honeyd framework, it is possible to instrument networks to automatically capture spam and submit it to collaborative filtering systems.

When a spammer attempts to send spam email via an open proxy or an open mail relay, the email is automatically redirected to a spam trap. The spam trap then submits the collected spam to a collaborative spam filter.
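The spam trap’s job, collecting redirected messages and passing them on, can be sketched as follows. The paper does not specify how messages are de-duplicated or how the collaborative filter is contacted, so the whitespace and case normalization, the SHA-256 digest and the `submit` callback are all hypothetical.

```python
import hashlib
import re

class SpamTrap:
    """Collect redirected spam and submit each distinct message once."""

    def __init__(self, submit):
        # `submit(digest, body)` is a hypothetical hook to a collaborative filter.
        self._submit = submit
        self._seen = set()

    @staticmethod
    def digest(body: str) -> str:
        # Collapse whitespace and case so trivially varied copies hash alike.
        normalized = re.sub(r"\s+", " ", body).strip().lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def receive(self, body: str) -> bool:
        """Record a message; return True if it was new and was submitted."""
        d = self.digest(body)
        if d in self._seen:
            return False
        self._seen.add(d)
        self._submit(d, body)
        return True

if __name__ == "__main__":
    trap = SpamTrap(lambda d, body: print("submit", d[:12]))
    trap.receive("Buy NOW")
    trap.receive("buy   now ")   # duplicate after normalization, not resubmitted
```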

At the time of writing, Honeyd has received and processed more than 26,000 spam emails. A detailed evaluation is the subject of future work.

6 Related Work

There are several areas of research in TCP/IP stack fingerprinting, among them: effective methods to classify the remote operating system either by active probing or by passive analysis of network traffic, and defeating TCP/IP stack fingerprinting by normalizing network traffic.

Fyodor’s Nmap uses TCP and UDP probes to determine the operating system of a host [7]. Nmap collects the responses of a network stack to different queries and matches them to a signature database to determine the operating system of the queried host. Nmap’s fingerprint database is extensive, and we use it as the reference for operating system personalities in Honeyd.
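The matching step can be illustrated with a toy classifier that scores each database entry by how many probe results it agrees with. This is a deliberate simplification: the test names and values are made up, and Nmap’s real fingerprint format and scoring are considerably richer.

```python
def best_match(observed, database):
    """Return (os_name, score) for the entry agreeing with the most probe results."""
    def score(signature):
        shared = observed.keys() & signature.keys()
        return sum(observed[test] == signature[test] for test in shared)

    name = max(database, key=lambda n: score(database[n]))
    return name, score(database[name])

# Made-up signatures; real Nmap fingerprints use a different format.
DATABASE = {
    "os_a": {"ttl": 64, "window": 5840, "df": True},
    "os_b": {"ttl": 128, "window": 65535, "df": True},
}

if __name__ == "__main__":
    print(best_match({"ttl": 128, "window": 65535, "df": False}, DATABASE))
```

Roughly speaking, Honeyd works in the opposite direction: starting from the database entry of the configured personality, it shapes its responses so that a matcher like this one selects that entry.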

Instead of actively probing a remote host to determine its operating system, it is possible to identify the remote operating system by passively analyzing its network packets. P0f [27] is one such tool. The TCP/IP flags inspected by P0f are similar to the data collected in Nmap’s fingerprint database.

On the other hand, Smart et al. show how to defeat fingerprinting tools by scrubbing network packets so that artifacts identifying the remote operating system are removed [20]. This approach is similar to Honeyd’s personality engine, as both systems change network packets to influence fingerprinting tools. In contrast to the fingerprint scrubber, which removes identifiable information, Honeyd changes network packets to contain artifacts of the configured operating system.

High-interaction virtual honeypots can be constructed using User Mode Linux (UML) or VMware [25]. One example is ReVirt, which can reconstruct the state of the virtual machine for any point in time [6]. This is very helpful for forensic analysis after the virtual machine has been compromised. Although high-interaction virtual honeypots can be fully compromised, their overhead makes it difficult to instrument thousands of them. The Honeyd framework, however, allows us to instrument unallocated network space with thousands of virtual honeypots. Furthermore, we may use a combination of Honeyd and virtual machines to get the benefit of both approaches. In this case, Honeyd provides network facades and selectively proxies connections for services to backends provided by high-interaction virtual machines.

7 Conclusion

Honeyd is a framework for creating virtual honeypots. Honeyd mimics the network stack behavior of operating systems to deceive fingerprinting tools like Nmap and Xprobe.

We gave an overview of Honeyd’s design and architecture. Our evaluation shows that Honeyd is effective in creating virtual routing topologies and successfully fools fingerprinting tools.

We showed how the Honeyd framework can be deployed to help in different areas of system security, e.g., worm detection, worm countermeasures, or spam prevention.

Honeyd is freely available as source code and can be downloaded from http://www.citi.umich.edu/u/provos/honeyd/.

8 Acknowledgments

I thank Marius Eriksen and Peter Honeyman for careful reviews and suggestions. Jamie Van Randwyk, Dug Song and Eric Thomas also provided helpful suggestions and contributions.

References

[1] Ofir Arkin and Fyodor Yarochkin. Xprobe v2.0: A “Fuzzy” Approach to Remote Active Operating System Fingerprinting. www.xprobe2.org, August 2002.

[2] Steven M. Bellovin. Security problems in the TCP/IP protocol suite. Computer Communications Review, 19(2):32–48, 1989.

[3] Smoot Carl-Mitchell and John S. Quarterman. Using ARP to Implement Transparent Subnet Gateways. RFC 1027, October 1987.

[4] CERT. CERT Advisory CA-2001-26 Nimda Worm. www.cert.org/advisories/CA-2001-26.html, September 2001.

[5] CERT. CERT Advisory CA-2003-20 W32/Blaster Worm. www.cert.org/advisories/CA-2003-20.html, August 2003.

[6] George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza Basrai, and Peter M. Chen. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In Proceedings of the 2002 Symposium on Operating Systems Design and Implementation, December 2002.

[7] Fyodor. Remote OS Detection via TCP/IP Stack Fingerprinting. www.nmap.org/nmap/nmap-fingerprinting-article.html, October 1998.

[8] S. Glassman. A Caching Relay for the World Wide Web. In Proceedings of the First International World Wide Web Conference, pages 69–76, May 1994.

[9] S. Hanks, T. Li, D. Farinacci, and P. Traina. Generic Routing Encapsulation (GRE). RFC 1701, October 1994.

[10] S. Hanks, T. Li, D. Farinacci, and P. Traina. Generic Routing Encapsulation over IPv4 networks. RFC 1702, October 1994.

[11] Herbert W. Hethcote. The Mathematics of Infectious Diseases. SIAM Review, 42(4):599–653, 2000.

[12] C. Kreibich and J. Crowcroft. Automated NIDS Signature Generation using Honeypots. Poster paper, ACM SIGCOMM 2003, August 2003.

[13] D. Moore, C. Shannon, and J. Brown. Code-Red: A Case Study on The Spread and Victims of an Internet Worm. In Proceedings of the 2nd ACM Internet Measurement Workshop, pages 273–284. ACM Press, November 2002.

[14] David Moore, Vern Paxson, Stefan Savage, Colleen Shannon, Stuart Staniford, and Nicholas Weaver. Inside the Slammer Worm. IEEE Security and Privacy, 1(4):33–39, July 2003.

[15] David Moore, Colleen Shannon, Geoffrey Voelker, and Stefan Savage. Internet Quarantine: Requirements for Containing Self-Propagating Code. In Proceedings of the 2003 IEEE Infocom Conference, April 2003.

[16] Laurent Oudot. Fighting worms with honeypots: honeyd vs msblast.exe. lists.insecure.org/lists/honeypots/2003/Jul-Sep/0071.html, August 2003. Honeypots mailing list.


[17] Vern Paxson. Bro: A System for Detecting Network Intruders in Real-Time. In Proceedings of the 7th USENIX Security Symposium, January 1998.

[18] Jon Postel. Transmission Control Protocol. RFC 793, September 1981.

[19] Thomas Ptacek and Timothy Newsham. Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection. Secure Networks Whitepaper, August 1998.

[20] Matthew Smart, G. Robert Malan, and Farnam Jahanian. Defeating TCP/IP Stack Fingerprinting. In Proceedings of the 9th USENIX Security Symposium, August 2000.

[21] Dug Song, Robert Malan, and Robert Stone. A Snapshot of Global Worm Activity. Technical report, Arbor Networks, November 2001.

[22] Lance Spitzner. Honeypots: Tracking Hackers. Addison Wesley Professional, September 2002.

[23] Stuart Staniford, Vern Paxson, and Nicholas Weaver. How to 0wn the Internet in your Spare Time. In Proceedings of the 11th USENIX Security Symposium, August 2002.

[24] W. R. Stevens. TCP/IP Illustrated, volume 1. Addison-Wesley, 1994.

[25] Jeremy Sugerman, Ganesh Venkitachalam, and Beng-Hong Lim. Virtualizing I/O Devices on VMware Workstation’s Hosted Virtual Machine Monitor. In Proceedings of the Annual USENIX Technical Conference, pages 25–30, June 2001.

[26] David Wagner and Paolo Soto. Mimicry Attacks on Host-Based Intrusion Detection Systems. In Proceedings of the 9th ACM Conference on Computer and Communications Security, November 2002.

[27] Michal Zalewski and William Stearns. Passive OS Fingerprinting Tool. www.stearns.org/p0f/README. Viewed on 12th January 2003.
