OpenFlow Controllers over EstiNet Network
Simulator and Emulator: Functional Validation
and Performance Evaluation
Shie-Yuan Wang ∗ Chih-Liang Chou † and Chun-Ming Yang †
∗Department of Computer Science, National Chiao Tung University, Taiwan
Email: [email protected]
†EstiNet Technologies, Inc.
Email: [email protected]
Abstract
In this article, we use the EstiNet OpenFlow network simulator and emulator to perform functional
validation and performance evaluation of the widely-used NOX OpenFlow controller. EstiNet uses a
unique kernel reentering simulation methodology to enable real applications to run on nodes in its
simulated network. As a result, without any modification, the real NOX OpenFlow controller readily
runs on a host in an EstiNet simulated network to control thousands of simulated OpenFlow switches.
Using EstiNet as the testing and evaluation platform, we studied how NOX implements the learning
bridge protocol (LBP) and the spanning tree protocol (STP) based on the OpenFlow 1.0 protocol. Our
simulation results show that these protocols, which are implemented as loadable components in NOX, do
not synchronize their gathered information well and thus NOX may give wrong forwarding instructions
to an OpenFlow switch after a link failure. We also found that when NOX’s STP detects a link failure,
it does not send a message to an affected OpenFlow switch to delete obsolete flow entries. As a result,
because the obsolete flow entry expires only after an idle period of 5 seconds, continued matches
keep resetting its idle timer, so the OpenFlow switch keeps forwarding incoming matched packets onto the
broken link. Our results reveal that the LBP and STP components provided in NOX serve only as basic
implementations and lack information synchronization, and there is much room left to further enhance
them.
I. INTRODUCTION
Software-Defined Networking (SDN) [1] is a new type of network that can be programmed
by a software controller according to various needs and purposes. The goal of SDN is to
facilitate innovations in network architecture and protocol designs. To achieve this goal, the
OpenFlow protocol [2] has been proposed to define the internal architecture of an OpenFlow
switch and the messages exchanged between an OpenFlow controller and OpenFlow switches. In
an OpenFlow network, because the operation and intelligence of the network are fully controlled
by an OpenFlow controller, the correctness and efficiency of the functions implemented by the
controller must be fully tested before its use in a production network.
Testing the correctness and evaluating the performance of a network protocol can be performed
in several ways. One approach is performing these tests over an experimental testbed (e.g.,
Emulab [3] and PlanetLab [4]). Although this approach uses real devices running real operating
systems and applications and can generate more realistic testing results, the cost of building
a large experimental testbed is huge and generally the testbed is not easily accessible to many
users. Model checking and symbolic execution have also long been used to automate the
testing of a system [5], [6]. However, scalability is the main challenge for these techniques because
the test space is very large.
Another common approach is via simulation, in which the operations of real devices and
their interactions are modeled and executed in a software program. The simulation approach
has many advantages: it is low-cost, flexible, controllable, scalable, repeatable, accessible to
many users, and faster than real time in many cases. However, if the modeling of real devices
is not accurate enough, the simulation results may differ from the experimental results. To
overcome this problem, the emulation experiment approach may be used. Emulation differs from
simulation in that an emulation is like an experiment and thus must be executed in real time, while
simulation can run faster or slower than real time. Furthermore, in an emulation some
real devices running real operating systems and application programs will interact with some
simulated devices. In contrast, in a simulation generally no real operating systems or applications
are involved.
In this article, we introduce the EstiNet OpenFlow network simulator and emulator [7]. EstiNet
uses a unique approach to testing the functions and performance of OpenFlow controllers. By
using an innovative simulation methodology, which is called the “kernel reentering methodology,”
EstiNet combines the advantages of both the simulation and the emulation approaches. In a
network simulated by EstiNet, a simulated device can run the real Linux operating system
and any UNIX-based real application program can readily run on a simulated device without
any modification. With these unique capabilities, EstiNet’s simulation results are as accurate as
those obtained from an emulation while still preserving the many advantages of the simulation
approach.
In testing OpenFlow controllers, since the first and most widely-used NOX OpenFlow con-
troller [8] is a real application program runnable on Linux, NOX can readily run on a host in
an EstiNet simulated network to control thousands of simulated OpenFlow switches. In EstiNet,
because these real OpenFlow controllers are tested and evaluated in simulations rather than in
emulations, the tests and evaluations can be performed much faster than real time. In addition, in
EstiNet, the performance results of a simulated OpenFlow network managed by these OpenFlow
controllers are correct, accurate, and repeatable. These performance results can be correctly
explained based on the parameter settings (e.g., link bandwidth, delay, downtime, etc.) and
configurations (e.g., network size, mobility pattern or speed) of the simulated OpenFlow network.
We have used EstiNet to perform functional validation and performance evaluation of several
protocols that are implemented by NOX as components. In this article, we choose the learning
bridge protocol (LBP) and the spanning tree protocol (STP) as illustration examples. We studied
the complicated interactions among these OpenFlow-implemented protocols and the address
resolution protocol (ARP) when the network traffic is TCP traffic. Our simulation results and
detailed logs reveal their behavior, efficiency, and implementation flaws under the tested network
settings.
II. SIMULATION ARCHITECTURE OF ESTINET
To implement the kernel reentering methodology, EstiNet uses tunnel network interfaces to
automatically intercept the packets exchanged by two real applications and redirect them into
the EstiNet simulation engine. As shown in Figure 1 (a), inside the EstiNet simulation engine, a
protocol stack composed of the MAC/Phy layers along with other layers below the IP layer are
created for each simulated host. Packets to be sent out on host 1 are sent out to the output queue
of tunnel interface 1 where the simulation engine will fetch them later. After fetching a packet
from tunnel interface 1, the simulation engine processes the packet through the protocol stack
created for host 1 to simulate the MAC/Phy and many other mechanisms of the network interface
used by host 1. For example, the effects of the link delay, link bandwidth, link downtime, and
link bit-error-rate (BER) are all simulated in the Phy module. The Phy module of host 1 will
deliver the packet to the Phy module of host 2 after the link delay plus the transmission time of
the packet on this link based on the simulation clock. Then, the packet will be processed from
the Phy module up to the interface module, where it is written back into the kernel via tunnel
interface 2. The packet will then go through the IP/TCP/Socket layers and finally be received by
the application running on host 2, which ends its journey. By this methodology, all Linux-based
real applications can run on a simulated network in EstiNet without any modification and they
all use the real TCP/IP protocol stack in the Linux kernel to create their TCP connections.
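The per-link timing just described (link delay plus transmission time, scheduled on the simulation clock) can be sketched as follows. This is an illustrative model in Python, not EstiNet code; the function name and figures are our own.

```python
def delivery_time(send_clock_s, packet_bytes, link_delay_s, bandwidth_bps):
    """Simulation clock at which the peer Phy module receives the packet."""
    transmission_s = packet_bytes * 8 / bandwidth_bps
    return send_clock_s + link_delay_s + transmission_s

# A 1500-byte frame on a 10 Mbps, 10 ms link (the settings used later in
# this article) is delivered about 11.2 ms after it reaches the Phy module.
t = delivery_time(0.0, 1500, 0.010, 10_000_000)
```

Effects such as link downtime and bit errors would be applied at the same point in a fuller model.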
Figure 1 (b) shows how we extend this methodology to support running a real OpenFlow
controller on an EstiNet simulated network. Since real OpenFlow controllers such as NOX
are normal application programs, they readily run on a simulated host in EstiNet without any
modification. However, because a real OpenFlow switch needs to set up a TCP connection to
the OpenFlow controller to receive its messages, we simulate the operations of each OpenFlow
switch inside the simulation engine and let it create an OpenFlow TCP socket bound to a network
interface (in this example, the used network interface is tunnel interface 2). With this design,
in EstiNet a simulated OpenFlow switch can set up a real TCP connection to a real OpenFlow
controller to receive its messages. All messages exchanged between a real OpenFlow controller
and a simulated OpenFlow switch are accurately scheduled based on the simulation clock.
Therefore, the results of functional validation and performance evaluation of a real OpenFlow
controller are correct and repeatable over EstiNet.
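The key point of Figure 1 (b), that a simulated switch opens a real TCP connection to a real controller process, can be illustrated with a toy hello exchange over localhost. This is a sketch under our own naming, not the OpenFlow wire protocol.

```python
import socket
import threading

def run_controller(server_sock):
    """Accept one switch connection and answer its hello."""
    conn, _ = server_sock.accept()
    with conn:
        msg = conn.recv(64)
        conn.sendall(b"HELLO " + msg)

server = socket.socket()
server.bind(("127.0.0.1", 0))   # ephemeral port on localhost
server.listen(1)
threading.Thread(target=run_controller, args=(server,), daemon=True).start()

# The simulated-switch side: an ordinary TCP socket, just as an EstiNet
# simulated switch binds an OpenFlow TCP socket to a tunnel interface.
switch = socket.create_connection(server.getsockname())
switch.sendall(b"switch-6")
reply = switch.recv(64)
switch.close()
```

In EstiNet the switch end of such a connection lives inside the simulation engine, so the exchanged bytes are scheduled on the simulation clock rather than delivered at wall-clock speed.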
III. COMPARISON WITH RELATED TOOLS
Currently, very few network simulators support the OpenFlow protocol, and the most notable
one is ns-3 [9], the most widely used network simulator in the world. An ns-3 project supports
the OpenFlow protocol, but only version 0.89, which is outdated: the latest version of OpenFlow
as of this writing is already 1.3.1. Ns-3
simulates the operations of an OpenFlow switch by compiling and linking an OpenFlow switch
C++ module with its simulation engine code. To simulate a real OpenFlow controller, ns-3 also
(a) The host-to-host case
(b) The controller-to-OpenFlowSwitch case
Fig. 1: Simulation Architecture of EstiNet
implements it as a C++ module and compiles and links it with its simulation engine code. In
fact, all devices/objects simulated in ns-3 are implemented as C++ modules compiled and linked
together with its simulation engine code to form a user-level executable program (i.e., the ns-3 program).
Because the ns-3 program is a user-level program and a real OpenFlow controller such as
NOX is also a user-level program, a real OpenFlow controller program cannot be compiled and
linked together with the ns-3 program to form a single executable program. As a result, a real
OpenFlow controller cannot readily run without modification on a node in a network simulated
by ns-3. This is why ns-3 has to implement its own OpenFlow controller from scratch
as a C++ module, an approach that wastes much time and effort re-implementing widely-used
real OpenFlow controllers. In addition, the running behavior of a re-implemented OpenFlow
controller module in ns-3 may not be the same as the behavior of a real OpenFlow controller
because the former is a much-simplified abstraction of the latter. For example, as documented in
ns-3, the spanning tree protocol and the MPLS function are not supported. As another example,
in ns-3 there is no TCP connection between a simulated OpenFlow switch and its simulated
OpenFlow controller. Since this differs from real-world usage, the simulation results
will differ from the real results when the TCP connection in the real world experiences packet
losses or congestion.
Regarding network emulators that support the OpenFlow protocol, currently there are very few
such tools and the most notable one is Mininet [10]. Mininet uses the virtualization approach
to create emulated hosts and uses the Open vSwitch [11] to create software OpenFlow switches
on a physical server. The links connecting a software OpenFlow switch to emulated hosts or to
other software OpenFlow switches are implemented by using the virtual Ethernet pair mechanism
provided by the Linux kernel. Because an emulated host in Mininet is like a virtual machine, real
applications can readily run on it to exchange information. A real OpenFlow controller, which is
also a real application, can also run on an emulated host to set up TCP connections to software
OpenFlow switches to control them. With this approach, emulated hosts and software OpenFlow
switches can be connected together to form a desired network topology and be controlled by a
real OpenFlow controller.
Although Mininet can be used as a rapid prototyping tool for software-defined networks, it has
several limitations. As stated in [10], the most significant limitation of Mininet is its lack of
performance fidelity: it provides no guarantee that an emulated host that is ready to send a packet
will be scheduled promptly by the operating system to do so, nor that all software OpenFlow
switches will forward packets
at the same rate. The packet forwarding rate of a software OpenFlow switch in Mininet is
unpredictable and varies in every experimental run as it depends on the CPU speed, the main
memory bandwidth, the numbers of emulated hosts and software OpenFlow switches that must
be multiplexed over a CPU in Mininet, and the current system activities and load. As a result,
Mininet can only be used to study the behavior of an OpenFlow controller but cannot be used
to study any time-related network/application performance.
In contrast, EstiNet combines the advantages of both the simulation and the emulation ap-
proaches without their respective shortcomings. Like in an emulation, in EstiNet a real OpenFlow
controller can readily run without modification to control simulated OpenFlow switches and real
applications can readily run on hosts running a real operating system to generate realistic network
traffic. However, the operations and interactions among these real applications, the real OpenFlow
controller, the OpenFlow switches, hosts and links in a studied network are all scheduled by the
EstiNet simulation engine based on its simulation clock, rather than being multiplexed and executed
in an unpredictable way by the operating system. For this reason, unlike Mininet, EstiNet
generates time-related OpenFlow performance results correctly and the results are repeatable.
In Figure 2, we compare EstiNet, ns-3, and Mininet according to their latest developments.
Most comparison results are self-explanatory and thus we only explain the scalability and GUI
comparison results. EstiNet uses the kernel reentering methodology to let a single kernel
support multiple hosts, and its simulation engine process can support multiple OpenFlow switches.
As a result, it is highly scalable. Ns-3 is also highly scalable as its simulated hosts, OpenFlow
switches, and controller are all implemented as C++ modules and linked together as a single
process. In contrast, Mininet needs to spawn a shell process (e.g., /bin/bash) to emulate each
host and a user-space Open vSwitch process (or a kernel-space Open vSwitch) to emulate each
OpenFlow switch. As a result, it is less scalable than EstiNet and ns-3. Regarding
GUI support, which is very important for the user, EstiNet’s GUI can be used to easily set
up and configure a simulation case and be used to observe the packet playback of a simulation
run. The GUI of ns-3, on the other hand, can only be used for observation of the results and
the user needs to write C++ or scripts to set up and configure the simulation case. For Mininet,
its GUI can be used for observation purposes only and the user needs to write Python scripts to
set up and configure the simulation case.
IV. STUDY TARGETS AND SIMULATION SETTINGS
We used the network topology shown in Figure 3 to study how NOX implements its LBP
and STP. These protocols are implemented as the “switch” and the “spanning tree” components
Fig. 2: A comparison of EstiNet, ns-3, and Mininet
in NOX. We chose to study them because these protocols are important and
fundamental to the operations of a network and are among the most complicated ones
provided in NOX. During the simulations, these components are loaded into and
reside in the core of NOX simultaneously.
Nodes 3, 4, 5 and 11 are simulated hosts running the real Linux operating system where real
applications can run without modification. Nodes 6, 7, 8, 9, and 10 are simulated OpenFlow
switches supporting the OpenFlow 1.0 protocol. Node 1 is the host where NOX will be running
during simulation. (In the following, we call it the “controller node” for brevity.) Node 2 is a
simulated legacy (normal) switch that connects all simulated OpenFlow switches together with
the controller node. It forms a management network over which the TCP connection between
each simulated OpenFlow switch and the controller node will be set up. All OpenFlow messages
between NOX and simulated OpenFlow switches are exchanged over this management network.
In contrast, the network formed by simulated OpenFlow switches, simulated hosts, and the links
connecting them together is the data network over which real applications running on simulated
hosts will exchange their information. We set the bandwidth and delay of each link in both the
management and data networks to be 10 Mbps and 10 ms, respectively. To test the path-finding
and network convergence performance of NOX after a link failure, each simulation run starts at
0’th sec and ends at 100’th sec in the simulated network, and the link between nodes 6 and 7 is
deliberately brought down between 40’th sec and 65’th sec.
Fig. 3: The topology of the tested OpenFlow network and the original path of a TCP traffic
flow
We tested the behavior of a greedy TCP flow on the data network controlled by NOX.
Because in EstiNet real applications can directly run on simulated hosts, we chose node 11
as the destination host and ran the “rtcp” application on it. We chose node 3 as the source
host and ran the “stcp” application on it. Once stcp successfully sets up a real TCP
connection with rtcp, it generates greedy TCP traffic toward rtcp, subject to the TCP error,
flow, and congestion control algorithms. All of these real applications are set to start at 30’th sec
rather than 0’th sec because we want NOX’s STP to have formed a stable spanning
tree over the data network before any packet enters it.
We also tested the effects of the ARP protocol on the path-finding and network convergence
speed of the simulated OpenFlow network. Normally, on a real network the ARP protocol is
enabled on every host and triggered on demand to find out the (MAC address, IP address)
mapping relationship. However, under some circumstances, this mapping table can be pre-built
to avoid the ARP request/reply latency and in this case the ARP protocol is disabled on hosts.
Our simulation results show that enabling or disabling the ARP protocol can have a significant
impact on the path-finding capability/speed of an OpenFlow network.
V. FUNCTIONAL VALIDATION
Before presenting the functions of NOX’s LBP and STP over the tested network, we briefly
explain the main OpenFlow messages exchanged between NOX and OpenFlow switches to
implement LBP and STP.
In OpenFlow 1.0, a PacketIn message is issued by an OpenFlow switch (to save space, in the
following we will use “switch” to refer to an OpenFlow switch when there is no ambiguity)
to the controller to ask it how to process a received packet. It can contain the full content of
the packet or just the headers of the packet with a few bytes of the data payload. A PacketOut
message is issued by the controller to a switch to instruct it how to process and send out a packet.
The packet may be carried in the PacketOut message or may be a packet already buffered in the
switch waiting for a PacketOut message. A FlowModify message is issued by the controller to
a switch to add/modify/remove a flow entry in the flow table. When a packet enters a switch,
it is matched against all flow entries in the flow table. If it matches one or more entries, the
entry with the highest priority will be used to process the packet. Each entry can be associated
with some actions, which may be DROP, FORWARD, etc. and the actions will be applied to a
matched packet. If there is no match in the flow table (which is called a table miss), a switch
can decide to drop the packet or issue a PacketIn message to the controller asking it how to
process the packet. Initially, the flow table in every switch is empty. The PortModify message
is issued by the controller to a switch to change the status of one of its ports. For example, the
status of a port can be set to Flood or NonFlood, which determines whether this port is
included when the switch floods a packet out of all of its ports (excluding the ingress port).
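The flow-table lookup described in this section (the highest-priority matching entry wins; no match is a table miss that yields a PacketIn) can be sketched as a small model. The class, action strings, and packet fields below are illustrative assumptions, not switch source code.

```python
class FlowTable:
    """Toy flow table: entries are (priority, match predicate, actions)."""
    def __init__(self):
        self.entries = []

    def add(self, priority, match_fn, actions):
        self.entries.append((priority, match_fn, actions))

    def lookup(self, packet):
        hits = [e for e in self.entries if e[1](packet)]
        if not hits:
            return "PACKET_IN"                   # table miss: ask controller
        return max(hits, key=lambda e: e[0])[2]  # highest priority wins

table = FlowTable()
table.add(10, lambda p: p["dst"] == "node11", ["FORWARD:right"])
table.add(1,  lambda p: True,                 ["DROP"])

table.lookup({"dst": "node11"})   # matches both; priority 10 wins: FORWARD
table.lookup({"dst": "node99"})   # only the wildcard entry matches: DROP
```

An empty table, the initial state of every switch, yields a table miss for every packet.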
A. LBP in NOX
Here we use Figure 3 to explain how NOX’s LBP works on the tested network. Suppose
that the source host (node 3) sends its first TCP DATA packet to the destination host (node
11). When this packet enters node 6, a table miss event is generated because there is no flow
entry in the table that can be matched to it. This event causes node 6 to issue a PacketIn
message to NOX asking it how to process this packet. After receiving the message, NOX learns
the (source node, ingress port) = (node 3, left port) mapping information for node 6. It then
issues a PacketOut message to instruct node 6 to flood this packet because it does not have any
forwarding information for the destination host (node 11) so far. After receiving the PacketOut
message, node 6 floods the packet to nodes 9 and 7. To simplify the discussion, we now focus
on the flooding path along nodes 6, 9, and 10.
After node 9 receives the packet, as happened at node 6, it issues a PacketIn message
to NOX and NOX learns that on node 9 (source node, ingress port) = (node 3, top port). Not
knowing to which output port to forward this packet, NOX also sends out a PacketOut to node 9
instructing it to flood the packet. When node 10 receives the packet, the same scenario happens
and NOX learns that on node 10 (source node, ingress port) = (node 3, left port). Finally, when
node 11 (the destination host) receives the packet, it sends back a TCP ACK packet to the
source host (node 3). When this packet enters node 10, because there is no entry in its flow
table for this reverse flow, node 10 issues a PacketIn message to NOX asking for instructions.
Now with the mapping information learned previously, NOX issues a FlowModify message to
node 10 instructing it to add a flow entry of (destination node, output port) = (node 3, left port)
into its flow table and forward the packet out of its left port. When node 9 receives the packet,
the same scenario occurs. It issues a PacketIn to NOX and NOX instructs it to add an entry of
(destination node, output port) = (node 3, top port) and forward the packet out of its top port.
The same scenario applies to node 6 and finally the TCP ACK packet reaches node 3, finishing
a round trip.
As the TCP ACK packet travels back along the return path, because nodes 10, 9, and
6 each issue a PacketIn message to NOX, NOX also learns (source node, ingress port) =
(node 11, bottom port), (node 11, right port), and (node 11, bottom port) for nodes 10, 9, and 6,
respectively. However, we found that although NOX learns a mapping from a PacketIn message
issued by a switch, it does not immediately add this mapping as a flow entry to that switch
using the FlowModify message. Instead, it silently keeps the learned mapping information until
the mapping is used in the future. For this reason, when the second TCP DATA packet traverses
nodes 6, 9, and 10 to reach node 11, each of these nodes will generate a table miss event and
issue a PacketIn message to NOX again. However, at this time NOX already knows how to
forward the packet to node 11 on these nodes. Therefore, it issues FlowModify messages to add
the learned mapping information into the flow tables of nodes 6, 9, and 10 and instructs them
to forward the packet out of a port according to the newly added entries. So far, all required
flow entries for both the forward and reverse directions of this TCP flow have been installed in
the flow tables of nodes 6, 9, and 10. Starting from the third TCP DATA packet, when a TCP
DATA or ACK packet enters these nodes, no more PacketIn messages will need to be sent to
NOX from these nodes.
In the above case, however, if the source host sends out UDP packets to the destination host
and the destination host does not send any packet back to the source host, we found that the behavior
of the OpenFlow network is very different from the TCP traffic case. When the first UDP packet
traverses nodes 6, 9, 10 to reach the destination host, the interactions between these nodes and
NOX are the same as those in the TCP traffic case. However, because there is no returning packet
from the destination host to the source host, NOX has no chance to learn (source node, ingress
port) = (node 11, bottom port), (node 11, right port), and (node 11, bottom port) for nodes 10, 9,
and 6, respectively. As a result, when the second and all of following UDP packets continue to
traverse nodes 6, 9, and 10, each of them will trigger a table miss event on each of these nodes,
causing an excessive number of PacketIn messages to be sent to NOX continuously. Conceivably,
when the number of nodes on the path is large, when the sending rate of an unidirectional UDP
flow is high, or when there are many unidirectional UDP flows in the network, NOX will be
burdened by a high rate of PacketIn messages, reducing its capability to manage a large network.
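The learning behavior inferred above can be summarized in a small sketch. This is our own model of the observed behavior, not NOX’s “switch” component: the controller learns (source, ingress port) from each PacketIn, floods when the destination is unknown, and issues a FlowModify only once a learned mapping is actually used.

```python
class LearningBridge:
    """Controller-side learning sketch: maps hosts to ingress ports."""
    def __init__(self):
        self.learned = {}    # switch -> {host: ingress port}
        self.installed = []  # FlowModify messages issued so far

    def packet_in(self, switch, src, dst, in_port):
        # Always learn where the source is reachable from on this switch.
        self.learned.setdefault(switch, {})[src] = in_port
        out_port = self.learned[switch].get(dst)
        if out_port is None:
            return ("PacketOut", "FLOOD")        # destination still unknown
        self.installed.append((switch, dst, out_port))
        return ("FlowModify", dst, out_port)     # install entry and forward

ctrl = LearningBridge()
# One-way traffic (e.g. a unidirectional UDP flow): node 11 never answers,
# so its port is never learned and every packet is flooded again.
first = ctrl.packet_in("node6", "node3", "node11", "left")
again = ctrl.packet_in("node6", "node3", "node11", "left")
# A returning packet finally teaches the controller node 3's port:
back = ctrl.packet_in("node6", "node11", "node3", "bottom")
```

Running the one-way case repeatedly yields an unbroken stream of flood decisions, which mirrors the continuous PacketIn load described above.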
B. STP in NOX
NOX’s STP uses the LLDP (Link Layer Discovery Protocol) [12] packets to discover the
topology of an OpenFlow network. For each switch, after it is powered on and has established a
TCP connection to NOX, NOX immediately sends it a FlowModify message to add an entry into
its flow table. This flow entry will match future received LLDP packets and its associated action
is “Send the received LLDP packet to the controller.” For each port of a switch, every 5 seconds
(the LLDP transmission interval) NOX sends a PacketOut message to the switch asking it to
send the LLDP packet carried in the PacketOut message out of the specified port. Since every
switch already has a flow entry matching received LLDP packets, when a switch receives
an LLDP packet from one of its neighboring switches, it will send the received LLDP packet to
NOX. With these received LLDP packets from all switches, NOX builds the complete network
topology and computes a spanning tree over it. For a link that is included/not included on the
computed spanning tree, NOX sends a PortModify message to each of the two switches that
are at the two ends of the link. This message enables/disables the flooding status of the port
connected to the link. For each link detected by NOX, a 10-second timer (twice the
LLDP transmission interval) is set up in NOX to monitor its connectivity. If a link’s timer
expires, NOX considers the link down and recomputes a new spanning tree. Then,
it uses PortModify messages to change the flooding status of the affected ports.
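A minimal model of this link-monitoring and recomputation logic might look as follows. This is an illustrative sketch, not NOX’s spanning tree component; the switch-to-switch adjacency is our reading of Figure 3, and a plain BFS tree stands in for whatever tree algorithm NOX actually uses.

```python
LLDP_INTERVAL = 5.0               # seconds between PacketOut-carried LLDPs
LINK_TIMEOUT = 2 * LLDP_INTERVAL  # 10 s: a silent link is declared down

def alive_links(last_seen, now):
    """Links whose LLDP was relayed to the controller within the timeout."""
    return [l for l, t in last_seen.items() if now - t <= LINK_TIMEOUT]

def spanning_tree(links, root):
    """Links kept in Flood state: a BFS tree over the surviving links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append((b, (a, b)))
        adj.setdefault(b, []).append((a, (a, b)))
    seen, tree, queue = {root}, set(), [root]
    while queue:
        node = queue.pop(0)
        for nbr, link in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                tree.add(link)
                queue.append(nbr)
    return tree

# Switch-to-switch links as we read Figure 3; (6, 7) falls silent at 40 s.
last_seen = {(6, 7): 40.0, (6, 9): 51.0, (9, 10): 51.0,
             (7, 10): 51.0, (7, 8): 51.0, (8, 10): 51.0}
tree = spanning_tree(alive_links(last_seen, now=52.0), root=6)
# (6, 7) has timed out, so the new tree reaches node 7 over other links.
```

Consistent with the scenario studied in Section VI, this recomputation keeps the links 6-9, 9-10, 7-10, and 8-10 and leaves the link between nodes 7 and 8 out of the new tree.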
We found that the PacketIn and PacketOut messages triggered by the exchanges of LLDP
packets on a large network can cause a heavy processing burden for NOX. For example, if there
are 100 switches in the network and each has 24 ports connecting to neighboring switches,
because there are 2,400 ports in total in the network, NOX will have to process 2,400 PacketOut
messages plus 2,400 PacketIn messages every 5 seconds just for LLDP packets alone.
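The arithmetic above, as a tiny calculation (switch and port counts are the hypothetical figures from the example):

```python
switches, ports_per_switch = 100, 24
lldp_interval_s = 5.0

packet_out = switches * ports_per_switch  # one PacketOut per port per interval
packet_in = switches * ports_per_switch   # each received LLDP relayed back
msgs_per_second = (packet_out + packet_in) / lldp_interval_s
# 2,400 PacketOut plus 2,400 PacketIn every 5 seconds: 960 messages/s.
```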
VI. PERFORMANCE EVALUATION
We study how quickly a TCP flow can change its path to a new path after a link failure. In
Section VI-A, we first study the details of two cases when the LLDP transmission interval is
the original value of 5 seconds, with the first case carried out when ARP is enabled and the
second case carried out when ARP is disabled. Then, in Section VI-B, we study the effects of
the LLDP transmission interval on the new path finding time of the TCP flow when ARP is
enabled.
A. Detailed Case Studies
On the tested network depicted in Figure 3, before the link between nodes 6 and 7 breaks at
40’th sec, NOX’s STP disables the link between nodes 9 and 10 and the link between nodes 10
and 8. Therefore, the TCP flow from the source host to the destination host traverses nodes 3, 6,
Fig. 4: The new path of a TCP traffic flow after the link between nodes 6 and 7 goes
down under the ARP-enabled condition
7, 10, and 11. After the link breakage, NOX’s STP re-enables these two links but disables the
link between nodes 7 and 8 to maintain a loopless and connected topology. In the following, we
show that (1) when ARP is enabled, the new path taken by the TCP flow is changed to the path
traversing nodes 3, 6, 9, 10, and 11 at 62’th sec, as shown in Figure 4; however, the TCP flow
never changes back to its original path after the link downtime; and (2) when ARP is disabled,
the TCP flow never changes its path to the new path during the link downtime between 40’th
sec and 65’th sec and it becomes active over the original path after the link downtime at 84’th
sec. These results are caused by the flow idle timers used in OpenFlow switches, NOX’s LBP
and STP implementations, and their interactions with ARP.
Figure 5 (a) shows the timeline of the important events of the TCP flow when ARP is enabled
on hosts. Since the link goes down at 40’th sec, as discussed previously, because NOX’s STP
uses a separate 10-second timer to detect the failure of each link, one would expect that the new
spanning tree would be formed very quickly around 50+’th sec and the TCP flow would change
to the new path quickly after the new spanning tree is formed (i.e., also around 50+’th sec).
However, this is not the case and the TCP flow actually changes to the new path and becomes
active at around 62’th sec (at the F event).
Fig. 5: The timeline for a TCP flow to change/keep its path after the link between nodes 6 and
7 breaks at 40’th sec
The events A, B, C, D, E, and F represent the timestamps at which the TCP flow
retransmits a packet lost on the broken link. The timestamp of event A is 40.3’th sec and the
intervals between two successive retransmission events starting from event A are 0.7, 1.38, 2.72,
5.44, and 11.30 seconds, respectively. These exponentially-growing intervals are caused by the
TCP congestion control algorithm on the source host. The F event represents the successful
retransmission of the lost packet over the new path. After that, the TCP flow becomes active
in transmitting packets over the new path. The X event represents the source host sending an
ARP request while the Y event represents the destination host returning the ARP reply back
to the source host. The X event is triggered by the TCP flow retransmitting the lost packet at
the F timestamp. This is because on the source host the ARP entry for the destination host had
expired during the long TCP retransmission interval and an ARP request must be broadcast to
the network to rebuild the entry. (Note: An ARP entry may expire after an idle period of between
2 and 4 seconds in the simulations, depending on the relative timing between the installation of
the entry and the 2-second periodic flush operations.)
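The note above can be modeled as follows, under our assumption about the mechanism: a sweep runs every 2 seconds and evicts any entry that has been idle for at least 2 seconds, so an entry's idle lifetime depends on where its installation falls between sweeps.

```python
import math

FLUSH_PERIOD = 2.0  # seconds between periodic flush sweeps (assumed)

def idle_lifetime(install_time):
    """Seconds until eviction for an entry never used after installation."""
    # Evicted at the first sweep tick at which its idle age reaches 2 s.
    evict_tick = math.ceil((install_time + FLUSH_PERIOD) / FLUSH_PERIOD) * FLUSH_PERIOD
    return evict_tick - install_time

short = idle_lifetime(0.0)  # installed exactly on a sweep boundary: 2 s
long_ = idle_lifetime(0.1)  # installed just after a sweep: close to 4 s
```

Under this model every idle lifetime lands between 2 and 4 seconds, matching the range stated in the note.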
In the following, we explain why the TCP flow experiences unexpected delays before it
successfully changes to the new path when ARP is enabled. The reason why the retransmission
attempts at events A, B, C, D, and E fail is the same. From the EstiNet log, we found that NOX's
STP does not detect the link failure between the 40th and 50th sec, and the new spanning tree
is formed at the 52nd sec, which is after the timestamp of event E. Therefore, all of these resent
TCP packets are forwarded over the broken link and get lost. (Note that NOX uses a 5-second
interval to periodically update the Flood/NonFlood status of the ports of a switch, and our log
shows that the update occurs at the 52nd sec.) As for the retransmission at event F, it succeeds and
the reason is explained below. As discussed before, the broadcast ARP request at the timestamp
of X is triggered by the resent TCP packet at event F. Because the flow entries added in all
switches for the previous ARP request/reply transmitted by the source and destination hosts had
expired during the long TCP retransmission timeout, when an ARP request enters a switch,
the switch will issue a PacketIn message to NOX asking for forwarding instructions. In return,
NOX sends back a PacketOut message instructing the switch to flood the ARP request (which
is a broadcast packet) out of all of its ports. Although the right port of node 6 connects to the
broken link, the bottom port connecting to node 9 is functioning. As a result, the ARP request
can traverse nodes 6, 9, and 10 to reach the destination host, and the destination host can send
the unicast ARP reply back to the source host, as in the scenario described in Section V-A.
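The PacketIn/PacketOut exchange just described can be sketched as follows. The handler below is our own simplified illustration of a learning-bridge controller's decision; the function name and message dictionaries are hypothetical, not the actual NOX API.

```python
# Simplified controller-side decision for a PacketIn, as described in
# the text: broadcast frames (e.g., ARP requests) and frames to
# unknown destinations are flooded; known unicast destinations are
# forwarded out of the learned port. Hypothetical structures, not NOX.
BROADCAST = "ff:ff:ff:ff:ff:ff"

def handle_packet_in(dst_mac: str, mac_table: dict) -> dict:
    """Return a PacketOut-style instruction for the querying switch."""
    if dst_mac == BROADCAST or dst_mac not in mac_table:
        return {"type": "PacketOut", "action": "FLOOD"}
    return {"type": "PacketOut", "action": ("OUTPUT", mac_table[dst_mac])}

# The ARP request is broadcast, so the switch is told to flood it.
print(handle_packet_in(BROADCAST, {}))
# The unicast ARP reply follows the port learned from the request.
print(handle_packet_in("00:00:00:00:00:01", {"00:00:00:00:00:01": 3}))
```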
These ARP request and reply packets not only install ARP flow entries in
the switches but also let NOX learn the most up-to-date flow forwarding information for each switch.
Later, when the resent TCP packet enters node 6, because there is no entry for this TCP flow
(note that although an ARP flow entry was just installed, its type is not TCP, so it does not
match the incoming TCP packet), node 6 issues a PacketIn message to NOX
asking for instructions. For this TCP packet, NOX now returns a PacketOut message instructing
the switch to add an updated and correct flow entry for this TCP flow and forward the TCP
packet out of the bottom port of node 6, which is correct. After that, this TCP packet and its ACK
packet follow the scenario described in Section V-A to (1) update NOX of correct forwarding
information for this TCP flow on all switches and (2) install correct flow entries for this TCP
flow in all switches. With all correct entries installed in the switches on the new path, starting
from the second TCP packet after the timeout, the TCP flow becomes active on the new path at
the 62nd sec.
In contrast, when ARP is disabled the TCP flow never changes to the new path but instead
becomes active again on the original path at the 84th sec. In the following, we explain the reasons.
When ARP is disabled on hosts, as shown in Figure 5 (b), one sees that the retransmission
attempt at event P fails. This failure doubles the TCP retransmission interval from 11.30 seconds
to 22.60 seconds and causes the TCP flow to resend the lost packet at the timestamp of P plus
22.60 seconds, which is at the 84th sec (at the Q timestamp). After event Q, because the link
downtime has passed and NOX never found a new path for the TCP flow during the link
downtime, the TCP flow still uses its old path to successfully transmit its packets.
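The event timestamps in both subfigures follow directly from the measured retransmission gaps. The sketch below replays those gaps, which roughly double each time, to reconstruct the timeline; the last two labels correspond to events P and Q of Figure 5 (b) (in Figure 5 (a) the sixth event is labeled F).

```python
# Replay the measured retransmission gaps from Figure 5 to recover
# the event timestamps. The gaps roughly double each time; we use the
# measured values from the text rather than computing exact powers.
gaps = [0.70, 1.38, 2.72, 5.44, 11.30, 22.60]
labels = ["A", "B", "C", "D", "E", "P", "Q"]
t = 40.3  # event A: first retransmission after the failure at 40 s
times = {"A": t}
for label, gap in zip(labels[1:], gaps):
    t = round(t + gap, 2)
    times[label] = t
for label in labels:
    print(f"event {label}: {times[label]:.2f} s")
# P lands near the 62nd sec and Q near the 84th sec, as observed.
```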
As for why the retransmission attempt at event P still fails when ARP is disabled,
we found that it is caused by a bug in NOX's STP. For each entry added to the flow table of a
switch, the switch sets up an idle timer and removes the entry if it has not been used
for more than 5 seconds. As a result, the flow entry for the TCP flow expires and is removed
at the timestamp of E plus 5 seconds (i.e., at the 56th sec). From the EstiNet log, we observed
that node 6 does send a FlowRemoved message to NOX to inform it of the removal of this
flow entry at the 56th sec. Later, at the 62nd sec, when a resent TCP packet enters node 6, because the
flow entry for this TCP flow had expired and been removed, the switch issues a PacketIn
message to NOX asking for its forwarding instruction.
Surprisingly, we found that NOX issues back a PacketOut message instructing the switch
to add a flow entry for this flow and forward the resent packet out of the port connected to
the broken link. That is, even though NOX had received the FlowRemoved notification from
the switch at the 56th sec, it still keeps the old, obsolete forwarding information for this flow
and gives the wrong forwarding entry and instruction to the switch at the 62nd sec. Since this
resent packet is lost again on the broken link, the TCP flow has to resend the packet at the
next retransmission timeout, which occurs at the 84th sec. We note that the EstiNet log shows
that NOX's STP had detected the broken link and re-formed a new spanning tree at the 52nd sec.
However, the above results show that detecting a link failure and accordingly disabling that link
does not cause NOX to automatically remove all obsolete forwarding information related to that
link. These results indicate that in NOX the STP and LBP components do not synchronize their
gathered information well and thus, as shown in this study, they may result in wrong operations
of an OpenFlow network. Note that this bug does not happen when ARP is enabled and the
ARP request/reply transmissions happen before the transmission of the resent TCP packet, as
shown in Figure 5 (a). Since the ARP request/reply trigger PacketIn messages sent to NOX,
we conjecture that these PacketIn messages replace the wrong forwarding information stored in
NOX with correct information.
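A minimal sketch of the missing synchronization step: when STP disables a link, the learning-bridge state pointing at that link should be purged so that later PacketIn queries cannot return the stale port. This is our own illustration of the fix, not NOX code.

```python
# Purge learning-bridge entries that forward out of a failed port.
# In the scenario above, running this when STP detects the broken
# link would prevent NOX from returning the obsolete output port in
# later PacketOut instructions. Illustrative only, not the NOX API.
def purge_learned_macs(mac_table: dict, dead_port: int) -> dict:
    """Drop every MAC whose learned output port sits on the dead link."""
    return {mac: port for mac, port in mac_table.items() if port != dead_port}

table = {"host-a": 1, "host-b": 2, "host-c": 2}
table = purge_learned_macs(table, dead_port=2)
print(table)  # only the entry learned on a healthy port remains
```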
Another important finding from Figure 5 (a) is that the TCP flow does not change its path
from the new path back to its original path after the link downtime, even though the spanning
tree has been restored to the original one. This problem is caused by the flow idle timers used in
the switches on the new path. Because the TCP flow is active in sending packets after changing
to the new path, the flow entries for the TCP flow in these switches, which were created along
the new spanning tree during the link downtime, will never expire. Since every incoming TCP packet
will match these entries and will not generate a table-miss event, these switches will not send
any PacketIn messages to NOX. Therefore, NOX has no chance to install new entries into the
flow tables of these switches to change the path of the TCP flow back to its original one, which
may be better than the new path in terms of hop count or available bandwidth.
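The idle-timer effect described above can be made concrete with a toy model: an entry is removed only after no packet has matched it for the idle timeout, so a continuously active flow refreshes its entry indefinitely and the switch never consults the controller again.

```python
# Toy model of an OpenFlow idle timeout: an entry is removed only if
# the gap between successive matching packets (or until the horizon)
# exceeds IDLE_TIMEOUT. An active flow therefore pins its entry forever.
IDLE_TIMEOUT = 5.0

def entry_survives(packet_times, horizon):
    """True if the entry is never idle longer than IDLE_TIMEOUT."""
    last_use = packet_times[0]
    for t in list(packet_times[1:]) + [horizon]:
        if t - last_use > IDLE_TIMEOUT:
            return False  # idle gap exceeded: switch removes the entry
        last_use = t
    return True

# One packet per second keeps the entry alive for the whole run...
print(entry_survives([float(t) for t in range(100)], horizon=100.0))
# ...while a single long silence lets the entry expire.
print(entry_survives([0.0, 10.0], horizon=20.0))
```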
B. Effects of the LLDP Interval
To study the effects of the LLDP transmission interval and the link failure detection timeout
value (which is twice the LLDP transmission interval) on the new-path finding time of
the TCP flow when ARP is enabled, we varied the LLDP interval from 1 to 10 seconds in
1-second steps and observed at what time the TCP flow switches to the new path. (Note:
When ARP is disabled, we have shown that the TCP flow never changes to a new path due
to the implementation of NOX.) Figure 6 shows the results. After careful investigation into the
causes, we found that the observed behavior is caused by the complicated interactions among several
timers used in an OpenFlow switch and NOX. When the LLDP interval is reduced to 1 second,
a link failure can be detected quickly after 2*1 = 2 seconds. However, because the flow entry
for the TCP flow in node 6 is matched and used by each resent TCP packet, this obsolete flow
entry (whose output port still points to the broken link) continues to reside in the flow table and
to be used for the resent TCP packets at events A, B, C, and D in Figure 5 (a). As a result, all of
these retransmitted TCP packets are lost on the broken link. (Note: This finding shows a design
flaw in NOX’s STP as it should send a FlowModify message to node 6 to delete the obsolete
TCP flow entry as soon as it detects the link failure.)
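The remedy suggested in the note could be expressed as an OpenFlow 1.0 flow removal filtered by output port. The message builder below is a hypothetical sketch (plain dictionaries, not a real controller API); the OFPFC_DELETE command and the out_port filter do come from the OpenFlow 1.0 specification.

```python
# Sketch of the fix proposed in the note: on detecting a link failure,
# send the affected switch a flow_mod with command DELETE and out_port
# set to the dead port, removing every entry that forwards onto the
# broken link. The message layout is illustrative, not a real API.
OFPFC_DELETE = 3  # OpenFlow 1.0 flow_mod command code for DELETE

def build_delete_flow_mod(dead_port: int) -> dict:
    return {
        "type": "flow_mod",
        "command": OFPFC_DELETE,
        "match": {},            # wildcard match: consider all flows...
        "out_port": dead_port,  # ...but delete only those using this port
    }

print(build_delete_flow_mod(dead_port=2))
```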
Fig. 6: The new path finding time (sec) of a TCP flow under different LLDP transmission
intervals (sec), with ARP enabled
On the next retransmission attempt at event E, because the TCP retransmission timeout between
events D and E is 5.44 seconds (larger than the 5-second flow entry idle timeout value),
the flow entry for this TCP flow has expired and a table-miss event is generated. However,
because the ARP entry on the source host had also expired during this long TCP retransmission
interval, at the timestamp of event E an ARP request is generated instead and enters node
6. Starting from this moment, the scenario described in Section VI-A when explaining
events X, Y, and F occurs again. Therefore, the TCP flow can now switch to the new path at
the 52nd sec when the LLDP interval is reduced to 1 second.
All of the results shown in Figure 6 can be explained accurately based on the OpenFlow
protocol and the implementations of ARP, TCP, and NOX's STP and LBP. Due to the paper
length limitation, however, we can only pick one result to explain in detail. Nevertheless, this
study already shows the performance fidelity of EstiNet when used to evaluate a real OpenFlow
controller.
VII. CONCLUSION
In this article, we present the EstiNet OpenFlow network simulator and emulator and use it as
a platform to perform functional validation and performance evaluation of the NOX OpenFlow
controller. EstiNet uses a unique kernel-reentering simulation methodology to combine the
advantages of both the simulation approach and the emulation approach. By this methodology,
a real OpenFlow controller can run without modification to control thousands of simulated
OpenFlow switches. In addition, real applications can run without modification on simulated
hosts that run the real Linux operating system to generate realistic network traffic.
Our simulation study provides important insights into how NOX implements the functions of
the learning bridge protocol (LBP) and the spanning tree protocol (STP) based on the OpenFlow
1.0 protocol. NOX implements these protocols as separate components that can be loaded into
the core of NOX simultaneously. Our detailed logs reveal that the LBP and STP components
in NOX do not synchronize their gathered information well and thus NOX may give wrong
forwarding instructions to an OpenFlow switch after a link failure. Another finding is that
when NOX’s STP detects a link failure, it does not send a message to an affected switch to
delete obsolete flow entries. As a result, because the obsolete flow entry expires only after an
idle period of 5 seconds, it may be matched and used endlessly, causing the OpenFlow switch
to continue to forward incoming packets onto a broken link.
In summary, our results show that the LBP and STP components provided in NOX only
implement basic functions and lack information synchronization. As revealed in this paper, there
is much room left to further improve them.
REFERENCES
[1] “Software-Defined Networking: The New Norm for Networks,” a white paper of Open Networking Foundation, April 13,
2012.
[2] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and
Jonathan Turner, “OpenFlow: Enabling Innovation in Campus Networks,” ACM SIGCOMM Computer Communication
Review, Volume 38, Issue 2, April 2008.
[3] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, and A. Joglekar. “An Integrated
Experimental Environment for Distributed Systems and Networks,” In Proc. of the Fifth Symposium on Operating Systems
Design and Implementation, pages 255 - 270, Boston, MA, Dec. 2002.
[4] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman, “PlanetLab: an Overlay Testbed
for Broad-Coverage Services,” ACM SIGCOMM Computer Communication Review, Volume 33, Issue 3, July 2003.
[5] N. Foster, R. Harrison, M. J. Freedman, C. Monsanto, J. Rexford, A. Story, and D. Walker, “Frenetic: A Network
Programming Language,” in Proc. of ICFP 2011.
[6] Marco Canini, Daniele Venzano, Peter Peresini, Dejan Kostic, and Jennifer Rexford, “A NICE Way to Test OpenFlow
Applications,” in Proc. of Networked Systems Design and Implementation, April 2012.
[7] EstiNet 8.0 OpenFlow Network Simulator and Emulator, EstiNet Technologies Inc., available at http://www.estinet.com.
[8] Natasha Gude, Teemu Koponen, Justin Pettit, Ben Pfaff, Martín Casado, Nick McKeown, and Scott Shenker, “NOX: Towards
an Operating System for Networks,” ACM SIGCOMM Computer Communication Review, Volume 38, Issue 3, July 2008.
[9] T. R. Henderson, M. Lacage, and G. F. Riley, “Network Simulations with the ns-3 Simulator,” ACM SIGCOMM’08, August
17-22, 2008, Seattle, USA.
[10] Bob Lantz, Brandon Heller, and Nick McKeown, “A Network in a Laptop: Rapid Prototyping for Software-Defined
Networks,” ACM HotNets 2010, October 20-21, 2010, Monterey, CA, USA.
[11] B. Pfaff, J. Pettit, T. Koponen, K. Amidon, M. Casado, and S. Shenker, “Extending Networking into the Virtualization
Layer,” in Proc. of HotNets 2009.
[12] Link Layer Discovery Protocol, IEEE 802.1AB standard.