OpenFlow Controllers over EstiNet Network
Simulator and Emulator: Functional Validation
and Performance Evaluation
Shie-Yuan Wang ∗ Chih-Liang Chou † and Chun-Ming Yang †
∗Department of Computer Science, National Chiao Tung University, Taiwan
Email: [email protected]
†EstiNet Technologies, Inc.
Email: [email protected]
Abstract
In this article, we use the EstiNet OpenFlow network simulator and emulator to perform functional
validation and performance evaluation of the widely-used NOX OpenFlow controller. EstiNet uses a
unique kernel reentering simulation methodology to enable real applications to run on nodes in its
simulated network. As a result, without any modification, the real NOX OpenFlow controller readily
runs on a host in an EstiNet simulated network to control thousands of simulated OpenFlow switches.
Using EstiNet as the testing and evaluation platform, we studied how NOX implements the learning
bridge protocol (LBP) and the spanning tree protocol (STP) based on the OpenFlow 1.0 protocol. Our
simulation results show that these protocols, which are implemented as loadable components in NOX, do
not synchronize their gathered information well and thus NOX may give wrong forwarding instructions
to an OpenFlow switch after a link failure. We also found that when NOX’s STP detects a link failure,
it does not send a message to an affected OpenFlow switch to delete obsolete flow entries. As a result,
because the obsolete flow entry expires only after an idle period of 5 seconds, continued matches
keep resetting its idle timer, so the OpenFlow switch keeps forwarding incoming matched packets onto the
broken link. Our results reveal that the LBP and STP components provided in NOX serve only as basic
implementations and lack information synchronization, and there is much room left to further enhance
them.
I. INTRODUCTION
Software-Defined Networking (SDN) [1] is a new type of network that can be programmed
by a software controller according to various needs and purposes. The goal of SDN is to
facilitate innovations in network architecture and protocol designs. To achieve this goal, the
OpenFlow protocol [2] has been proposed to define the internal architecture of an OpenFlow
switch and the messages exchanged between an OpenFlow controller and OpenFlow switches. In
an OpenFlow network, because the operation and intelligence of the network are fully controlled
by an OpenFlow controller, the correctness and efficiency of the functions implemented by the
controller must be fully tested before its use in a production network.
Testing the correctness and evaluating the performance of a network protocol can be performed
in several ways. One approach is performing these tests over an experimental testbed (e.g.,
Emulab [3] and PlanetLab [4]). Although this approach uses real devices running real operating
systems and applications and can generate more realistic testing results, the cost of building
a large experimental testbed is huge and generally the testbed is not easily accessible to many
users. Model checking and symbolic execution have also long been used to automate the
testing of a system [5], [6]. However, scalability is the main challenge for these techniques because
the test space is very large.
Another common approach is via simulation, in which the operations of real devices and
their interactions are modeled and executed in a software program. The simulation approach
has many advantages: it is low-cost, flexible, controllable, scalable, repeatable, accessible to
many users, and faster than real time in many cases. However, if the modeling of real devices
is not accurate enough, the simulation results may differ from the experimental results. To
overcome this problem, the emulation experiment approach may be used. Emulation differs from
simulation in that an emulation is like an experiment and thus must be executed in real time, while
simulation can run faster or slower than real time. Furthermore, in an emulation some
real devices running real operating systems and application programs will interact with some
simulated devices. In contrast, in a simulation generally no real operating systems or applications
are involved.
In this article, we introduce the EstiNet OpenFlow network simulator and emulator [7]. EstiNet
uses a unique approach to testing the functions and performance of OpenFlow controllers. By
using an innovative simulation methodology, which is called the “kernel reentering methodology,”
EstiNet combines the advantages of both the simulation and the emulation approaches. In a
network simulated by EstiNet, a simulated device can run the real Linux operating system
and any UNIX-based real application program can readily run on a simulated device without
any modification. With these unique capabilities, EstiNet’s simulation results are as accurate as
those obtained from an emulation while still preserving the many advantages of the simulation
approach.
In testing OpenFlow controllers, since the first and most widely-used NOX OpenFlow con-
troller [8] is a real application program runnable on Linux, NOX can readily run on a host in
an EstiNet simulated network to control thousands of simulated OpenFlow switches. In EstiNet,
because these real OpenFlow controllers are tested and evaluated in simulations rather than in
emulations, the tests and evaluations can be performed much faster than real time. In addition, in
EstiNet, the performance results of a simulated OpenFlow network managed by these OpenFlow
controllers are correct, accurate, and repeatable. These performance results can be correctly
explained based on the parameter settings (e.g., link bandwidth, delay, downtime, etc.) and
configurations (e.g., network size, mobility pattern or speed) of the simulated OpenFlow network.
We have used EstiNet to perform functional validation and performance evaluation of several
protocols that are implemented by NOX as components. In this article, we choose the learning
bridge protocol (LBP) and the spanning tree protocol (STP) as illustration examples. We studied
the complicated interactions among these OpenFlow-implemented protocols and the address
resolution protocol (ARP) when the network traffic is TCP traffic. Our simulation results and
detailed logs reveal their behavior, efficiency, and implementation flaws under the tested network
settings.
II. SIMULATION ARCHITECTURE OF ESTINET
To implement the kernel reentering methodology, EstiNet uses tunnel network interfaces to
automatically intercept the packets exchanged by two real applications and redirect them into
the EstiNet simulation engine. As shown in Figure 1 (a), inside the EstiNet simulation engine, a
protocol stack composed of the MAC/Phy layers along with other layers below the IP layer are
created for each simulated host. Packets to be sent out on host 1 are sent out to the output queue
of tunnel interface 1 where the simulation engine will fetch them later. After fetching a packet
from tunnel interface 1, the simulation engine processes the packet through the protocol stack
created for host 1 to simulate the MAC/Phy and many other mechanisms of the network interface
used by host 1. For example, the effects of the link delay, link bandwidth, link downtime, and
link bit-error-rate (BER) are all simulated in the Phy module. The Phy module of host 1 will
deliver the packet to the Phy module of host 2 after the link delay plus the transmission time of
the packet on this link based on the simulation clock. Then, the packet will be processed from
the Phy module up to the interface module, where it is written back into the kernel via tunnel
interface 2. The packet will then go through the IP/TCP/Socket layers and finally be received by
the application running on host 2, which ends its journey. By this methodology, all Linux-based
real applications can run on a simulated network in EstiNet without any modification and they
all use the real TCP/IP protocol stack in the Linux kernel to create their TCP connections.
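The per-link timing just described (link delay plus transmission time, scheduled on the simulation clock) can be sketched as follows. This is an illustrative model in Python, not EstiNet code; the function name and figures are our own.

```python
def delivery_time(send_clock_s, packet_bytes, link_delay_s, bandwidth_bps):
    """Simulation clock at which the peer Phy module receives the packet."""
    transmission_s = packet_bytes * 8 / bandwidth_bps
    return send_clock_s + link_delay_s + transmission_s

# A 1500-byte frame on a 10 Mbps, 10 ms link (the settings used later in
# this article) is delivered about 11.2 ms after it reaches the Phy module.
t = delivery_time(0.0, 1500, 0.010, 10_000_000)
```

Effects such as link downtime and bit errors would be applied at the same point in a fuller model.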
Figure 1 (b) shows how we extend this methodology to support running a real OpenFlow
controller on an EstiNet simulated network. Since real OpenFlow controllers such as NOX
are normal application programs, they readily run on a simulated host in EstiNet without any
modification. However, because a real OpenFlow switch needs to set up a TCP connection to
the OpenFlow controller to receive its messages, we simulate the operations of each OpenFlow
switch inside the simulation engine and let it create an OpenFlow TCP socket bound to a network
interface (in this example, the used network interface is tunnel interface 2). With this design,
in EstiNet a simulated OpenFlow switch can set up a real TCP connection to a real OpenFlow
controller to receive its messages. All messages exchanged between a real OpenFlow controller
and a simulated OpenFlow switch are accurately scheduled based on the simulation clock.
Therefore, the results of functional validation and performance evaluation of a real OpenFlow
controller are correct and repeatable over EstiNet.
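The key point of Figure 1 (b), that a simulated switch opens a real TCP connection to a real controller process, can be illustrated with a toy hello exchange over localhost. This is a sketch under our own naming, not the OpenFlow wire protocol.

```python
import socket
import threading

def run_controller(server_sock):
    """Accept one switch connection and answer its hello."""
    conn, _ = server_sock.accept()
    with conn:
        msg = conn.recv(64)
        conn.sendall(b"HELLO " + msg)

server = socket.socket()
server.bind(("127.0.0.1", 0))   # ephemeral port on localhost
server.listen(1)
threading.Thread(target=run_controller, args=(server,), daemon=True).start()

# The simulated-switch side: an ordinary TCP socket, just as an EstiNet
# simulated switch binds an OpenFlow TCP socket to a tunnel interface.
switch = socket.create_connection(server.getsockname())
switch.sendall(b"switch-6")
reply = switch.recv(64)
switch.close()
```

In EstiNet the switch end of such a connection lives inside the simulation engine, so the exchanged bytes are scheduled on the simulation clock rather than delivered at wall-clock speed.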
III. COMPARISON WITH RELATED TOOLS
Currently, very few network simulators support the OpenFlow protocol, and the most notable
one is ns-3 [9], the most widely used network simulator in the world. An ns-3 project supports
the OpenFlow protocol, but only version 0.89, which is outdated: the latest version of OpenFlow
as of this writing is already 1.3.1. Ns-3
simulates the operations of an OpenFlow switch by compiling and linking an OpenFlow switch
C++ module with its simulation engine code. To simulate a real OpenFlow controller, ns-3 also
(a) The host-to-host case
(b) The controller-to-OpenFlowSwitch case
Fig. 1: Simulation Architecture of EstiNet
implements it as a C++ module and compiles and links it with its simulation engine code. In
fact, all devices/objects simulated in ns-3 are implemented as C++ modules compiled and linked
together with its simulation engine code to form a user-level executable program (i.e., the ns-3 program).
Because the ns-3 program is a user-level program and a real OpenFlow controller such as
NOX is also a user-level program, a real OpenFlow controller program cannot be compiled and
linked together with the ns-3 program to form a single executable program. As a result, a real
OpenFlow controller cannot readily run without modification on a node in a network simulated
by ns-3. This is why ns-3 has to implement its own OpenFlow controller from scratch
as a C++ module, an approach that wastes much time and effort re-implementing widely-used
real OpenFlow controllers. In addition, the running behavior of a re-implemented OpenFlow
controller module in ns-3 may not be the same as the behavior of a real OpenFlow controller
because the former is a much-simplified abstraction of the latter. For example, as documented in
ns-3, the spanning tree protocol and the MPLS function are not supported. As another example,
in ns-3 there is no TCP connection between a simulated OpenFlow switch and its simulated
OpenFlow controller. Since this differs from real-world usage, the simulation results
will differ from the real results when the TCP connection in the real world experiences packet
losses or congestion.
Regarding network emulators that support the OpenFlow protocol, currently there are very few
such tools and the most notable one is Mininet [10]. Mininet uses the virtualization approach
to create emulated hosts and uses the Open vSwitch [11] to create software OpenFlow switches
on a physical server. The links connecting a software OpenFlow switch to emulated hosts or to
other software OpenFlow switches are implemented by using the virtual Ethernet pair mechanism
provided by the Linux kernel. Because an emulated host in Mininet is like a virtual machine, real
applications can readily run on it to exchange information. A real OpenFlow controller, which is
also a real application, can also run on an emulated host to set up TCP connections to software
OpenFlow switches to control them. With this approach, emulated hosts and software OpenFlow
switches can be connected together to form a desired network topology and be controlled by a
real OpenFlow controller.
Although Mininet can be used as a rapid prototyping tool for software-defined networks, it has
several limitations. As stated in [10], the most significant limitation of Mininet is its lack of
performance fidelity: it provides no guarantee that an emulated host that is ready to send a packet
will be scheduled promptly by the operating system to do so, nor that all software OpenFlow
switches will forward packets
at the same rate. The packet forwarding rate of a software OpenFlow switch in Mininet is
unpredictable and varies in every experimental run as it depends on the CPU speed, the main
memory bandwidth, the numbers of emulated hosts and software OpenFlow switches that must
be multiplexed over a CPU in Mininet, and the current system activities and load. As a result,
Mininet can only be used to study the behavior of an OpenFlow controller but cannot be used
to study any time-related network/application performance.
In contrast, EstiNet combines the advantages of both the simulation and the emulation ap-
proaches without their respective shortcomings. Like in an emulation, in EstiNet a real OpenFlow
controller can readily run without modification to control simulated OpenFlow switches and real
applications can readily run on hosts running a real operating system to generate realistic network
traffic. However, the operations and interactions among these real applications, the real OpenFlow
controller, the OpenFlow switches, hosts and links in a studied network are all scheduled by the
EstiNet simulation engine based on its simulation clock, rather than being multiplexed and executed
in an unpredictable way by the operating system. For this reason, unlike Mininet, EstiNet
generates time-related OpenFlow performance results correctly and the results are repeatable.
In Figure 2, we compare EstiNet, ns-3, and Mininet according to their latest developments.
Most comparison results are self-explanatory and thus we only explain the scalability and GUI
comparison results. EstiNet uses the kernel reentering methodology to let a single kernel
support multiple hosts, and its simulation engine process can support multiple OpenFlow switches.
As a result, it is highly scalable. Ns-3 is also highly scalable as its simulated hosts, OpenFlow
switches, and controller are all implemented as C++ modules and linked together as a single
process. In contrast, Mininet needs to spawn a shell process (e.g., /bin/bash) to emulate each
host and a user-space Open vSwitch process (or a kernel-space Open vSwitch) to emulate each
OpenFlow switch. As a result, it is less scalable than EstiNet and ns-3. Regarding
GUI support, which is very important for the user, EstiNet’s GUI can be used to easily set
up and configure a simulation case and be used to observe the packet playback of a simulation
run. The GUI of ns-3, on the other hand, can only be used for observation of the results and
the user needs to write C++ or scripts to set up and configure the simulation case. For Mininet,
its GUI can be used for observation purposes only and the user needs to write Python scripts to
set up and configure the simulation case.
IV. STUDY TARGETS AND SIMULATION SETTINGS
We used the network topology shown in Figure 3 to study how NOX implements its LBP
and STP. These protocols are implemented as the “switch” and the “spanning tree” components
Fig. 2: A comparison of EstiNet, ns-3, and Mininet
in NOX. We chose to study them because these protocols are important and
fundamental to the operations of a network and are among the most complicated ones
provided in NOX. During the simulations, these components are loaded into and
reside in the core of NOX simultaneously.
Nodes 3, 4, 5 and 11 are simulated hosts running the real Linux operating system where real
applications can run without modification. Nodes 6, 7, 8, 9, and 10 are simulated OpenFlow
switches supporting the OpenFlow 1.0 protocol. Node 1 is the host where NOX will be running
during simulation. (In the following, we call it the “controller node” for brevity.) Node 2 is a
simulated legacy (normal) switch that connects all simulated OpenFlow switches together with
the controller node. It forms a management network over which the TCP connection between
each simulated OpenFlow switch and the controller node will be set up. All OpenFlow messages
between NOX and simulated OpenFlow switches are exchanged over this management network.
In contrast, the network formed by simulated OpenFlow switches, simulated hosts, and the links
connecting them together is the data network over which real applications running on simulated
hosts will exchange their information. We set the bandwidth and delay of each link in both the
management and data networks to be 10 Mbps and 10 ms, respectively. To test the path-finding
and network convergence performance of NOX after a link failure, each simulation run starts at
0’th sec and ends at 100’th sec in the simulated network, and the link between nodes 6 and 7 is
deliberately brought down between 40’th sec and 65’th sec.
Fig. 3: The topology of the tested OpenFlow network and the original path of a TCP traffic
flow
We tested the behavior of a greedy TCP flow on the data network controlled by NOX.
Because in EstiNet real applications can directly run on simulated hosts, we chose node 11
as the destination host and ran the “rtcp” application on it. We chose node 3 as the source
host and ran the “stcp” application on it. Once stcp successfully sets up a real TCP
connection with rtcp, it generates greedy TCP traffic toward rtcp, subject to the TCP error,
flow, and congestion control algorithms. All of these real applications are set to start at 30’th sec
rather than 0’th sec because we want NOX’s STP to have formed a stable spanning
tree over the data network before any packet enters it.
We also tested the effects of the ARP protocol on the path-finding and network convergence
speed of the simulated OpenFlow network. Normally, on a real network the ARP protocol is
enabled on every host and triggered on demand to find out the (MAC address, IP address)
mapping relationship. However, under some circumstances, this mapping table can be pre-built
to avoid the ARP request/reply latency and in this case the ARP protocol is disabled on hosts.
Our simulation results show that enabling or disabling the ARP protocol can have a significant
impact on the path-finding capability/speed of an OpenFlow network.
V. FUNCTIONAL VALIDATION
Before presenting the functions of NOX’s LBP and STP over the tested network, we briefly
explain the main OpenFlow messages exchanged between NOX and OpenFlow switches to
implement LBP and STP.
In OpenFlow 1.0, a PacketIn message is issued by an OpenFlow switch (to save space, in the
following we will use “switch” to refer to an OpenFlow switch when there is no ambiguity)
to the controller to ask it how to process a received packet. It can contain the full content of
the packet or just the headers of the packet with a few bytes of the data payload. A PacketOut
message is issued by the controller to a switch to instruct it how to process and send out a packet.
The packet may be carried in the PacketOut message or may be a packet already buffered in the
switch waiting for a PacketOut message. A FlowModify message is issued by the controller to
a switch to add/modify/remove a flow entry in the flow table. When a packet enters a switch,
it is matched against all flow entries in the flow table. If it matches one or more entries, the
entry with the highest priority will be used to process the packet. Each entry can be associated
with some actions, which may be DROP, FORWARD, etc. and the actions will be applied to a
matched packet. If there is no match in the flow table (which is called a table miss), a switch
can decide to drop the packet or issue a PacketIn message to the controller asking it how to
process the packet. Initially, the flow table in every switch is empty. The PortModify message
is issued by the controller to a switch to change the status of one of its ports. For example, the
status of a port can be set to Flood or NonFlood, which determines whether this port is
included when the switch floods a packet out of all of its ports (excluding the ingress port).
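The flow-table lookup described in this section (the highest-priority matching entry wins; no match is a table miss that yields a PacketIn) can be sketched as a small model. The class, action strings, and packet fields below are illustrative assumptions, not switch source code.

```python
class FlowTable:
    """Toy flow table: entries are (priority, match predicate, actions)."""
    def __init__(self):
        self.entries = []

    def add(self, priority, match_fn, actions):
        self.entries.append((priority, match_fn, actions))

    def lookup(self, packet):
        hits = [e for e in self.entries if e[1](packet)]
        if not hits:
            return "PACKET_IN"                   # table miss: ask controller
        return max(hits, key=lambda e: e[0])[2]  # highest priority wins

table = FlowTable()
table.add(10, lambda p: p["dst"] == "node11", ["FORWARD:right"])
table.add(1,  lambda p: True,                 ["DROP"])

table.lookup({"dst": "node11"})   # matches both; priority 10 wins: FORWARD
table.lookup({"dst": "node99"})   # only the wildcard entry matches: DROP
```

An empty table, the initial state of every switch, yields a table miss for every packet.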
A. LBP in NOX
Here we use Figure 3 to explain how NOX’s LBP works on the tested network. Suppose
that the source host (node 3) sends its first TCP DATA packet to the destination host (node
11). When this packet enters node 6, a table miss event is generated because there is no flow
entry in the table that can be matched to it. This event causes node 6 to issue a PacketIn
message to NOX asking it how to process this packet. After receiving the message, NOX learns
the (source node, ingress port) = (node 3, left port) mapping information for node 6. It then
issues a PacketOut message to instruct node 6 to flood this packet because it does not have any
forwarding information for the destination host (node 11) so far. After receiving the PacketOut
message, node 6 floods the packet to nodes 9 and 7. To simplify the discussion, we now focus
on the flooding path along nodes 6, 9, and 10.
After node 9 receives the packet, as happened at node 6, it issues a PacketIn message
to NOX and NOX learns that on node 9 (source node, ingress port) = (node 3, top port). Not
knowing to which output port to forward this packet, NOX also sends out a PacketOut to node 9
instructing it to flood the packet. When node 10 receives the packet, the same scenario happens
and NOX learns that on node 10 (source node, ingress port) = (node 3, left port). Finally, when
node 11 (the destination host) receives the packet, it sends back a TCP ACK packet to the
source host (node 3). When this packet enters node 10, because there is no entry in its flow
table for this reverse flow, node 10 issues a PacketIn message to NOX asking for instructions.
Now with the mapping information learned previously, NOX issues a FlowModify message to
node 10 instructing it to add a flow entry of (destination node, output port) = (node 3, left port)
into its flow table and forward the packet out of its left port. When node 9 receives the packet,
the same scenario occurs. It issues a PacketIn to NOX and NOX instructs it to add an entry of
(destination node, output port) = (node 3, top port) and forward the packet out of its top port.
The same scenario applies to node 6 and finally the TCP ACK packet reaches node 3, finishing
a round trip.
As the TCP ACK packet travels back along the return path, because nodes 10, 9, and
6 each issue a PacketIn message to NOX, NOX also learns (source node, ingress port) =
(node 11, bottom port), (node 11, right port), and (node 11, bottom port) for nodes 10, 9, and 6,
respectively. However, we found that although NOX learns a mapping from a PacketIn message
issued by a switch, it does not immediately add this mapping as a flow entry to that switch
using the FlowModify message. Instead, it silently keeps the learned mapping information until
the mapping is used in the future. For this reason, when the second TCP DATA packet traverses
nodes 6, 9, and 10 to reach node 11, each of these nodes will generate a table miss event and
issue a PacketIn message to NOX again. However, at this time NOX already knows how to
forward the packet to node 11 on these nodes. Therefore, it issues FlowModify messages to add
the learned mapping information into the flow tables of nodes 6, 9, and 10 and instructs them
to forward the packet out of a port according to the newly added entries. So far, all required
flow entries for both the forward and reverse directions of this TCP flow have been installed in
the flow tables of nodes 6, 9, and 10. Starting from the third TCP DATA packet, when a TCP
DATA or ACK packet enters these nodes, no more PacketIn messages will need to be sent to
NOX from these nodes.
In the above case, however, if the source host sends out UDP packets to the destination host
and the destination host does not send any packet back to the source host, we found that the behavior
of the OpenFlow network is very different from the TCP traffic case. When the first UDP packet
traverses nodes 6, 9, 10 to reach the destination host, the interactions between these nodes and
NOX are the same as those in the TCP traffic case. However, because there is no returning packet
from the destination host to the source host, NOX has no chance to learn (source node, ingress
port) = (node 11, bottom port), (node 11, right port), and (node 11, bottom port) for nodes 10, 9,
and 6, respectively. As a result, when the second and all of following UDP packets continue to
traverse nodes 6, 9, and 10, each of them will trigger a table miss event on each of these nodes,
causing an excessive number of PacketIn messages to be sent to NOX continuously. Conceivably,
when the number of nodes on the path is large, when the sending rate of an unidirectional UDP
flow is high, or when there are many unidirectional UDP flows in the network, NOX will be
burdened by a high rate of PacketIn messages, reducing its capability to manage a large network.
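The learning behavior inferred above can be summarized in a small sketch. This is our own model of the observed behavior, not NOX’s “switch” component: the controller learns (source, ingress port) from each PacketIn, floods when the destination is unknown, and issues a FlowModify only once a learned mapping is actually used.

```python
class LearningBridge:
    """Controller-side learning sketch: maps hosts to ingress ports."""
    def __init__(self):
        self.learned = {}    # switch -> {host: ingress port}
        self.installed = []  # FlowModify messages issued so far

    def packet_in(self, switch, src, dst, in_port):
        # Always learn where the source is reachable from on this switch.
        self.learned.setdefault(switch, {})[src] = in_port
        out_port = self.learned[switch].get(dst)
        if out_port is None:
            return ("PacketOut", "FLOOD")        # destination still unknown
        self.installed.append((switch, dst, out_port))
        return ("FlowModify", dst, out_port)     # install entry and forward

ctrl = LearningBridge()
# One-way traffic (e.g. a unidirectional UDP flow): node 11 never answers,
# so its port is never learned and every packet is flooded again.
first = ctrl.packet_in("node6", "node3", "node11", "left")
again = ctrl.packet_in("node6", "node3", "node11", "left")
# A returning packet finally teaches the controller node 3's port:
back = ctrl.packet_in("node6", "node11", "node3", "bottom")
```

Running the one-way case repeatedly yields an unbroken stream of flood decisions, which mirrors the continuous PacketIn load described above.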
B. STP in NOX
NOX’s STP uses the LLDP (Link Layer Discovery Protocol) [12] packets to discover the
topology of an OpenFlow network. For each switch, after it is powered on and has established a
TCP connection to NOX, NOX immediately sends it a FlowModify message to add an entry into
its flow table. This flow entry will match future received LLDP packets and its associated action
is “Send the received LLDP packet to the controller.” For each port of a switch, every 5 seconds
(the LLDP transmission interval) NOX sends a PacketOut message to the switch asking it to
send the LLDP packet carried in the PacketOut message out of the specified port. Since every
switch already has a flow entry matching received LLDP packets, when a switch receives
an LLDP packet from one of its neighboring switches, it will send the received LLDP packet to
NOX. With these received LLDP packets from all switches, NOX builds the complete network
topology and computes a spanning tree over it. For a link that is included/not included on the
computed spanning tree, NOX sends a PortModify message to each of the two switches that
are at the two ends of the link. This message enables/disables the flooding status of the port
connected to the link. For each link detected by NOX, a 10-second timer (twice the
LLDP transmission interval) is set up in NOX to monitor its connectivity. If a link’s timer
expires, NOX considers the link down and recomputes a new spanning tree. Then,
it uses PortModify messages to change the flooding status of the affected ports.
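A minimal model of this link-monitoring and recomputation logic might look as follows. This is an illustrative sketch, not NOX’s spanning tree component; the switch-to-switch adjacency is our reading of Figure 3, and a plain BFS tree stands in for whatever tree algorithm NOX actually uses.

```python
LLDP_INTERVAL = 5.0               # seconds between PacketOut-carried LLDPs
LINK_TIMEOUT = 2 * LLDP_INTERVAL  # 10 s: a silent link is declared down

def alive_links(last_seen, now):
    """Links whose LLDP was relayed to the controller within the timeout."""
    return [l for l, t in last_seen.items() if now - t <= LINK_TIMEOUT]

def spanning_tree(links, root):
    """Links kept in Flood state: a BFS tree over the surviving links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append((b, (a, b)))
        adj.setdefault(b, []).append((a, (a, b)))
    seen, tree, queue = {root}, set(), [root]
    while queue:
        node = queue.pop(0)
        for nbr, link in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                tree.add(link)
                queue.append(nbr)
    return tree

# Switch-to-switch links as we read Figure 3; (6, 7) falls silent at 40 s.
last_seen = {(6, 7): 40.0, (6, 9): 51.0, (9, 10): 51.0,
             (7, 10): 51.0, (7, 8): 51.0, (8, 10): 51.0}
tree = spanning_tree(alive_links(last_seen, now=52.0), root=6)
# (6, 7) has timed out, so the new tree reaches node 7 over other links.
```

Consistent with the scenario studied in Section VI, this recomputation keeps the links 6-9, 9-10, 7-10, and 8-10 and leaves the link between nodes 7 and 8 out of the new tree.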
We found that the PacketIn and PacketOut messages triggered by the exchanges of LLDP
packets on a large network can cause a heavy processing burden for NOX. For example, if there
are 100 switches in the network and each has 24 ports connecting to neighboring switches,
because there are 2,400 ports in total in the network, NOX will have to process 2,400 PacketOut
messages plus 2,400 PacketIn messages every 5 seconds just for LLDP packets alone.
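The arithmetic above, as a tiny calculation (switch and port counts are the hypothetical figures from the example):

```python
switches, ports_per_switch = 100, 24
lldp_interval_s = 5.0

packet_out = switches * ports_per_switch  # one PacketOut per port per interval
packet_in = switches * ports_per_switch   # each received LLDP relayed back
msgs_per_second = (packet_out + packet_in) / lldp_interval_s
# 2,400 PacketOut plus 2,400 PacketIn every 5 seconds: 960 messages/s.
```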
VI. PERFORMANCE EVALUATION
We study how quickly a TCP flow can change its path to a new path after a link failure. In
Section VI-A, we first study the details of two cases when the LLDP transmission interval is
the original value of 5 seconds, with the first case carried out when ARP is enabled and the
second case carried out when ARP is disabled. Then, in Section VI-B, we study the effects of
the LLDP transmission interval on the new path finding time of the TCP flow when ARP is
enabled.
A. Detailed Case Studies
On the tested network depicted in Figure 3, before the link between nodes 6 and 7 breaks at
40’th sec, NOX’s STP disables the link between nodes 9 and 10 and the link between nodes 10
and 8. Therefore, the TCP flow from the source host to the destination host traverses nodes 3, 6,
Fig. 4: The new path of a TCP traffic flow after the link between nodes 6 and 7 goes
down under the ARP-enabled condition
7, 10, and 11. After the link breakage, NOX’s STP re-enables these two links but disables the
link between nodes 7 and 8 to maintain a loopless and connected topology. In the following, we
show that (1) when ARP is enabled, the new path taken by the TCP flow is changed to the path
traversing nodes 3, 6, 9, 10, and 11 at 62’th sec, as shown in Figure 4; however, the TCP flow
never changes back to its original path after the link downtime; and (2) when ARP is disabled,
the TCP flow never changes its path to the new path during the link downtime between 40’th
sec and 65’th sec and it becomes active over the original path after the link downtime at 84’th
sec. These results are caused by the flow idle timers used in OpenFlow switches, NOX’s LBP
and STP implementations, and their interactions with ARP.
Figure 5 (a) shows the timeline of the important events of the TCP flow when ARP is enabled
on hosts. Since the link goes down at 40’th sec, as discussed previously, because NOX’s STP
uses a separate 10-second timer to detect the failure of each link, one would expect that the new
spanning tree would be formed very quickly around 50+’th sec and the TCP flow would change
to the new path quickly after the new spanning tree is formed (i.e., also around 50+’th sec).
However, this is not the case and the TCP flow actually changes to the new path and becomes
active at around 62’th sec (at the F event).
Fig. 5: The timeline for a TCP flow to change/keep its path after the link between nodes 6 and
7 breaks at 40’th sec
The events A, B, C, D, E, and F represent the timestamps at which the TCP flow
retransmits a packet lost on the broken link. The timestamp of event A is 40.3’th sec and the
intervals between two successive retransmission events starting from event A are 0.7, 1.38, 2.72,
5.44, and 11.30 seconds, respectively. These exponentially-growing intervals are caused by the
TCP congestion control algorithm on the source host. The F event represents the successful
retransmission of the lost packet over the new path. After that, the TCP flow becomes active
in transmitting packets over the new path. The X event represents the source host sending an
ARP request while the Y event represents the destination host returning the ARP reply back
to the source host. The X event is triggered by the TCP flow retransmitting the lost packet at
the F timestamp. This is because on the source host the ARP entry for the destination host had
expired during the long TCP retransmission interval and an ARP request must be broadcast to
the network to rebuild the entry. (Note: An ARP entry may expire after an idle period of between
2 and 4 seconds in the simulations, depending on the relative timing between the installation of
the entry and the 2-second periodic flush operations.)
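The note above can be modeled as follows, under our assumption about the mechanism: a sweep runs every 2 seconds and evicts any entry that has been idle for at least 2 seconds, so an entry's idle lifetime depends on where its installation falls between sweeps.

```python
import math

FLUSH_PERIOD = 2.0  # seconds between periodic flush sweeps (assumed)

def idle_lifetime(install_time):
    """Seconds until eviction for an entry never used after installation."""
    # Evicted at the first sweep tick at which its idle age reaches 2 s.
    evict_tick = math.ceil((install_time + FLUSH_PERIOD) / FLUSH_PERIOD) * FLUSH_PERIOD
    return evict_tick - install_time

short = idle_lifetime(0.0)  # installed exactly on a sweep boundary: 2 s
long_ = idle_lifetime(0.1)  # installed just after a sweep: close to 4 s
```

Under this model every idle lifetime lands between 2 and 4 seconds, matching the range stated in the note.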
In the following, we explain why the TCP flow experiences unexpected delays before it
successfully changes to the new path when ARP is enabled. The reason why the retransmission
attempts at events A, B, C, D, and E fail is the same. From the EstiNet log, we found that NOX's
STP does not detect the link failure between the 40th and 50th sec, and the new spanning tree
is formed at the 52nd sec, which is after the timestamp of event E. Therefore, all of these resent
TCP packets are forwarded over the broken link and get lost. (Note that NOX uses a 5-second
interval to periodically update the Flood/NonFlood status of the ports of a switch, and our log
shows that the update occurs at the 52nd sec.) As for the retransmission at event F, it succeeds and
the reason is explained below. As discussed before, the broadcast ARP request at the timestamp
of X is triggered by the resent TCP packet at event F. Because the flow entries added in all
switches for the previous ARP request/reply transmitted by the source and destination hosts had
expired during the long TCP retransmission timeout, when an ARP request enters a switch,
the switch will issue a PacketIn message to NOX asking for forwarding instructions. In return,
NOX sends back a PacketOut message instructing the switch to flood the ARP request (which
is a broadcast packet) out of all of its ports. Although the right port of node 6 connects to the
broken link, the bottom port connecting to node 9 is functioning. As a result, the ARP request
can traverse nodes 6, 9, and 10 to reach the destination host, and the destination host can send
the unicast ARP reply back to the source host, as in the scenario described in Section V-A.
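The PacketIn/PacketOut exchange just described can be sketched as follows. The handler below is our own simplified illustration of a learning-bridge controller's decision; the function name and message dictionaries are hypothetical, not the actual NOX API.

```python
# Simplified controller-side decision for a PacketIn, as described in
# the text: broadcast frames (e.g., ARP requests) and frames to
# unknown destinations are flooded; known unicast destinations are
# forwarded out of the learned port. Hypothetical structures, not NOX.
BROADCAST = "ff:ff:ff:ff:ff:ff"

def handle_packet_in(dst_mac: str, mac_table: dict) -> dict:
    """Return a PacketOut-style instruction for the querying switch."""
    if dst_mac == BROADCAST or dst_mac not in mac_table:
        return {"type": "PacketOut", "action": "FLOOD"}
    return {"type": "PacketOut", "action": ("OUTPUT", mac_table[dst_mac])}

# The ARP request is broadcast, so the switch is told to flood it.
print(handle_packet_in(BROADCAST, {}))
# The unicast ARP reply follows the port learned from the request.
print(handle_packet_in("00:00:00:00:00:01", {"00:00:00:00:00:01": 3}))
```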
These ARP request and reply packets not only install ARP flow entries in
the switches but also let NOX learn the most up-to-date flow forwarding information for each switch.
Later, when the resent TCP packet enters node 6, because there is no entry for this TCP flow
(note that although an ARP flow entry was just installed, its type is not TCP, so it does not
match the incoming TCP packet), node 6 issues a PacketIn message to NOX
asking for instructions. For this TCP packet, NOX now returns a PacketOut message instructing
the switch to add an updated and correct flow entry for this TCP flow and forward the TCP
packet out of the bottom port of node 6, which is correct. After that, this TCP packet and its ACK
packet follow the scenario described in Section V-A to (1) update NOX of correct forwarding
information for this TCP flow on all switches and (2) install correct flow entries for this TCP
flow in all switches. With all correct entries installed in the switches on the new path, starting
from the second TCP packet after the timeout, the TCP flow becomes active on the new path at
the 62nd sec.
In contrast, when ARP is disabled the TCP flow never changes to the new path but instead
becomes active again on the original path at the 84th sec. In the following, we explain the reasons.
When ARP is disabled on hosts, as shown in Figure 5 (b), one sees that the retransmission
attempt at event P fails. This failure doubles the TCP retransmission interval from 11.30 seconds
to 22.60 seconds and causes the TCP flow to resend the lost packet at the timestamp of P plus
22.60 seconds, which is at the 84th sec (at the Q timestamp). After event Q, because the link
downtime has passed and NOX never found a new path for the TCP flow during the link
downtime, the TCP flow still uses its old path to successfully transmit its packets.
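The event timestamps in both subfigures follow directly from the measured retransmission gaps. The sketch below replays those gaps, which roughly double each time, to reconstruct the timeline; the last two labels correspond to events P and Q of Figure 5 (b) (in Figure 5 (a) the sixth event is labeled F).

```python
# Replay the measured retransmission gaps from Figure 5 to recover
# the event timestamps. The gaps roughly double each time; we use the
# measured values from the text rather than computing exact powers.
gaps = [0.70, 1.38, 2.72, 5.44, 11.30, 22.60]
labels = ["A", "B", "C", "D", "E", "P", "Q"]
t = 40.3  # event A: first retransmission after the failure at 40 s
times = {"A": t}
for label, gap in zip(labels[1:], gaps):
    t = round(t + gap, 2)
    times[label] = t
for label in labels:
    print(f"event {label}: {times[label]:.2f} s")
# P lands near the 62nd sec and Q near the 84th sec, as observed.
```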
As for why the retransmission attempt at event P still fails when ARP is disabled,
we found that it is caused by a bug in NOX's STP. For each entry added to the flow table of a
switch, the switch sets up an idle timer and removes the entry if it has not been used
for more than 5 seconds. As a result, the flow entry for the TCP flow expires and is removed
at the timestamp of E plus 5 seconds (i.e., at the 56th sec). From the EstiNet log, we observed
that node 6 does send a FlowRemoved message to NOX to inform it of the removal of this
flow entry at the 56th sec. Later, at the 62nd sec, when a resent TCP packet enters node 6, because the
flow entry for this TCP flow had expired and been removed, the switch issues a PacketIn
message to NOX asking for its forwarding instruction.
Surprisingly, we found that NOX issues back a PacketOut message instructing the switch
to add a flow entry for this flow and forward the resent packet out of the port connected to
the broken link. That is, even though NOX had received the FlowRemoved notification from
the switch at the 56th sec, it still keeps the old, obsolete forwarding information for this flow
and gives the wrong forwarding entry and instruction to the switch at the 62nd sec. Since this
resent packet is lost again on the broken link, the TCP flow has to resend the packet at the
next retransmission timeout, which occurs at the 84th sec. We note that the EstiNet log shows
that NOX's STP had detected the broken link and re-formed a new spanning tree at the 52nd sec.
However, the above results show that detecting a link failure and accordingly disabling that link
does not cause NOX to automatically remove all obsolete forwarding information related to that
link. These results indicate that in NOX the STP and LBP components do not synchronize their
gathered information well and thus, as shown in this study, they may result in wrong operations
of an OpenFlow network. Note that this bug does not happen when ARP is enabled and the
ARP request/reply transmissions happen before the transmission of the resent TCP packet, as
shown in Figure 5 (a). Since the ARP request/reply trigger PacketIn messages sent to NOX,
we conjecture that these PacketIn messages replace the wrong forwarding information stored in
NOX with correct information.
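A minimal sketch of the missing synchronization step: when STP disables a link, the learning-bridge state pointing at that link should be purged so that later PacketIn queries cannot return the stale port. This is our own illustration of the fix, not NOX code.

```python
# Purge learning-bridge entries that forward out of a failed port.
# In the scenario above, running this when STP detects the broken
# link would prevent NOX from returning the obsolete output port in
# later PacketOut instructions. Illustrative only, not the NOX API.
def purge_learned_macs(mac_table: dict, dead_port: int) -> dict:
    """Drop every MAC whose learned output port sits on the dead link."""
    return {mac: port for mac, port in mac_table.items() if port != dead_port}

table = {"host-a": 1, "host-b": 2, "host-c": 2}
table = purge_learned_macs(table, dead_port=2)
print(table)  # only the entry learned on a healthy port remains
```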
Another important finding from Figure 5 (a) is that the TCP flow does not change its path
from the new path back to its original path after the link downtime, even though the spanning
tree has been restored to the original one. This problem is caused by the flow idle timers used in
the switches on the new path. Because the TCP flow is active in sending packets after changing
to the new path, the flow entries for the TCP flow in these switches, which were created along
the new spanning tree during the link downtime, will never expire. Since every incoming TCP packet
will match these entries and will not generate a table-miss event, these switches will not send
any PacketIn messages to NOX. Therefore, NOX has no chance to install new entries into the
flow tables of these switches to change the path of the TCP flow back to its original one, which
may be better than the new path in terms of hop count or available bandwidth.
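The idle-timer effect described above can be made concrete with a toy model: an entry is removed only after no packet has matched it for the idle timeout, so a continuously active flow refreshes its entry indefinitely and the switch never consults the controller again.

```python
# Toy model of an OpenFlow idle timeout: an entry is removed only if
# the gap between successive matching packets (or until the horizon)
# exceeds IDLE_TIMEOUT. An active flow therefore pins its entry forever.
IDLE_TIMEOUT = 5.0

def entry_survives(packet_times, horizon):
    """True if the entry is never idle longer than IDLE_TIMEOUT."""
    last_use = packet_times[0]
    for t in list(packet_times[1:]) + [horizon]:
        if t - last_use > IDLE_TIMEOUT:
            return False  # idle gap exceeded: switch removes the entry
        last_use = t
    return True

# One packet per second keeps the entry alive for the whole run...
print(entry_survives([float(t) for t in range(100)], horizon=100.0))
# ...while a single long silence lets the entry expire.
print(entry_survives([0.0, 10.0], horizon=20.0))
```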
B. Effects of the LLDP Interval
To study the effects of the LLDP transmission interval and the link failure detection timeout
value (which is twice the LLDP transmission interval) on the new-path finding time of
the TCP flow when ARP is enabled, we varied the LLDP interval from 1 to 10 seconds in
1-second steps and observed at what time the TCP flow switches to the new path. (Note:
When ARP is disabled, we have shown that the TCP flow never changes to a new path due
to the implementation of NOX.) Figure 6 shows the results. After careful investigation into the
causes, we found that the observed behavior is caused by the complicated interactions among several
timers used in an OpenFlow switch and NOX. When the LLDP interval is reduced to 1 second,
a link failure can be detected quickly after 2*1 = 2 seconds. However, because the flow entry
for the TCP flow in node 6 is matched and used by each resent TCP packet, this obsolete flow
entry (whose output port still points to the broken link) continues to reside in the flow table and
to be used for the resent TCP packets at events A, B, C, and D in Figure 5 (a). As a result, all of
these retransmitted TCP packets are lost on the broken link. (Note: This finding shows a design
flaw in NOX’s STP as it should send a FlowModify message to node 6 to delete the obsolete
TCP flow entry as soon as it detects the link failure.)
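The remedy suggested in the note could be expressed as an OpenFlow 1.0 flow removal filtered by output port. The message builder below is a hypothetical sketch (plain dictionaries, not a real controller API); the OFPFC_DELETE command and the out_port filter do come from the OpenFlow 1.0 specification.

```python
# Sketch of the fix proposed in the note: on detecting a link failure,
# send the affected switch a flow_mod with command DELETE and out_port
# set to the dead port, removing every entry that forwards onto the
# broken link. The message layout is illustrative, not a real API.
OFPFC_DELETE = 3  # OpenFlow 1.0 flow_mod command code for DELETE

def build_delete_flow_mod(dead_port: int) -> dict:
    return {
        "type": "flow_mod",
        "command": OFPFC_DELETE,
        "match": {},            # wildcard match: consider all flows...
        "out_port": dead_port,  # ...but delete only those using this port
    }

print(build_delete_flow_mod(dead_port=2))
```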
Fig. 6: The new path finding time (sec) of a TCP flow under different LLDP transmission
intervals (sec), with ARP enabled
On the next retransmission attempt at event E, because the TCP retransmission timeout between
events D and E is 5.44 seconds (larger than the 5-second flow entry idle timeout value),
the flow entry for this TCP flow has expired and a table-miss event is generated. However,
because the ARP entry on the source host had also expired during this long TCP retransmission
interval, at the timestamp of event E an ARP request is generated instead and enters node
6. Starting from this moment, the scenario described in Section VI-A when explaining
events X, Y, and F occurs again. Therefore, the TCP flow can now switch to the new path at
the 52nd sec when the LLDP interval is reduced to 1 second.
All of the results shown in Figure 6 can be explained accurately based on the OpenFlow
protocol and the implementations of ARP, TCP, and NOX's STP and LBP. Due to the paper
length limitation, however, we can only pick one result to explain in detail. Nevertheless, this
study already shows the performance fidelity of EstiNet when used to evaluate a real OpenFlow
controller.
VII. CONCLUSION
In this article, we present the EstiNet OpenFlow network simulator and emulator and use it as
a platform to perform functional validation and performance evaluation of the NOX OpenFlow
controller. EstiNet uses a unique kernel-reentering simulation methodology to combine the
advantages of both the simulation approach and the emulation approach. By this methodology,
a real OpenFlow controller can run without modification to control thousands of simulated
OpenFlow switches. In addition, real applications can run without modification on simulated
hosts that run the real Linux operating system to generate realistic network traffic.
Our simulation study provides important insights into how NOX implements the functions of
the learning bridge protocol (LBP) and the spanning tree protocol (STP) based on the OpenFlow
1.0 protocol. NOX implements these protocols as separate components that can be loaded into
the core of NOX simultaneously. Our detailed logs reveal that the LBP and STP components
in NOX do not synchronize their gathered information well and thus NOX may give wrong
forwarding instructions to an OpenFlow switch after a link failure. Another finding is that
when NOX’s STP detects a link failure, it does not send a message to an affected switch to
delete obsolete flow entries. As a result, because the obsolete flow entry expires only after an
idle period of 5 seconds, it may be matched and used endlessly, causing the OpenFlow switch
to continue to forward incoming packets onto a broken link.
In summary, our results show that the LBP and STP components provided in NOX only
implement basic functions and lack information synchronization. As revealed in this paper, there
is much room left to further improve them.
REFERENCES
[1] “Software-Defined Networking: The New Norm for Networks,” a white paper of Open Networking Foundation, April 13,
2012.
[2] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and
Jonathan Turner, “OpenFlow: Enabling Innovation in Campus Networks,” ACM SIGCOMM Computer Communication
Review, Volume 38, Issue 2, April 2008.
[3] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, and A. Joglekar. “An Integrated
Experimental Environment for Distributed Systems and Networks,” In Proc. of the Fifth Symposium on Operating Systems
Design and Implementation, pages 255 - 270, Boston, MA, Dec. 2002.
[4] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman, “PlanetLab: an Overlay Testbed
for Broad-Coverage Services,” ACM SIGCOMM Computer Communication Review, Volume 33, Issue 3, July 2003.
[5] N. Foster, R. Harrison, M. J. Freedman, C. Monsanto, J. Rexford, A. Story, and D. Walker, “Frenetic: A Network
Programming Language,” in Proc. of ICFP 2011.
[6] Marco Canini, Daniele Venzano, Peter Peresini, Dejan Kostic, and Jennifer Rexford, “A NICE Way to Test OpenFlow
Applications,” in Proc. of Networked Systems Design and Implementation, April 2012.
[7] EstiNet 8.0 OpenFlow Network Simulator and Emulator, EstiNet Technologies Inc., available at http://www.estinet.com.
[8] Natasha Gude, Teemu Koponen, Justin Pettit, Ben Pfaff, Martín Casado, Nick McKeown, and Scott Shenker, “NOX: Towards
an Operating System for Networks,” ACM SIGCOMM Computer Communication Review, Volume 38, Issue 3, July 2008.
[9] T. R. Henderson, M. Lacage, and G. F. Riley, “Network Simulations with the ns-3 Simulator,” ACM SIGCOMM’08, August
17-22, 2008, Seattle, USA.
[10] Bob Lantz, Brandon Heller, and Nick McKeown, “A Network in a Laptop: Rapid Prototyping for Software-Defined
Networks,” ACM HotNets 2010, October 20-21, 2010, Monterey, CA, USA.
[11] B. Pfaff, J. Pettit, T. Koponen, K. Amidon, M. Casado, and S. Shenker, “Extending Networking into the Virtualization
Layer,” in Proc. of HotNets 2009.
[12] Link Layer Discovery Protocol, IEEE 802.1AB standard.