BEBA D2.1 Basic BEBA Abstraction API.v1.0 final · BEBA Behavioural Based Forwarding Deliverable Report D2.1 Basic BEBA abstraction API Project co-funded by the European Commission

BEBA: BEhavioural BAsed forwarding

Grant Agreement: 644122

BEBA/WP2 – D2.1 Version: 1.0

Deliverable Report

D2.1 Basic BEBA abstraction API

Project co-funded by the European Commission within the Horizon 2020 (H2020) Programme

DISSEMINATION LEVEL

PU Public: X

PP Restricted to other programme participants (including the Commission Services)

RE Restricted to a group specified by the consortium (including the Commission Services)

CO Confidential, only for members of the consortium (including the Commission Services)

Deliverable title: Basic BEBA abstraction API

Version: 1.0

Due date of deliverable (month): August 2015

Actual submission date of the deliverable (dd/mm/yyyy): 01/09/2015

Start date of project (dd/mm/yyyy): 01/01/2015

Duration of the project: 27 months

Work Package: WP2

Task: T2.1

Leader for this deliverable: CNIT

Other contributing partners: KTH

Authors: Marco Bonola, Giuseppe Bianchi, Salvatore Pontarelli, Antonio Capone, Carmelo Cascone, Luca Pollini, Davide Sanvito (CNIT)

Deliverable reviewer(s): Georgios Katsikas (KTH)





REVISION HISTORY

Revision | Date | Author | Organisation | Description
0.1 | 13/05/2015 | Marco Bonola | CNIT | TOC draft
0.2 | 29/05/2015 | Carmelo Cascone | CNIT | Section 3 draft
0.3 | 03/06/2015 | Marco Bonola | CNIT | Section 1 draft
0.4 | 04/06/2015 | Davide Sanvito | CNIT | Section 4 draft
0.5 | 04/06/2015 | Luca Pollini | CNIT | Section 4 draft
0.6 | 08/06/2015 | Marco Bonola | CNIT | Section 2.1 draft
0.7 | 09/06/2015 | Davide Sanvito | CNIT | Section 4 draft
0.8 | 21/08/2015 | Marco Bonola | CNIT | First complete draft
1.0 | 01/09/2015 | Marco Bonola | CNIT | Final version

PROPRIETARY RIGHTS STATEMENT

This document contains information, which is proprietary to the BEBA consortium. Neither this document nor the information contained herein shall be used, duplicated or communicated by any means to any third party, in whole or in parts, except with the prior written consent of the BEBA consortium. This restriction legend shall not be altered or obliterated on or from this document.

STATEMENT OF ORIGINALITY

This deliverable contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both.





TABLE OF CONTENTS

1 OVERVIEW OF THE BEBA PROGRAMMABILITY GOALS AND APPROACH
1.1 PROBLEM STATEMENT
1.2 BEBA SOLUTION APPROACH
2 BEBA BASIC API REQUIREMENTS
2.1 SUMMARY OF THE PROPOSED USE CASES
2.2 WHAT IS SUPPORTED BY THE BEBA BASIC API?
3 HIGH LEVEL DESCRIPTION OF THE BEBA BASIC ABSTRACTION
3.1 STATEFUL PIPELINING
3.2 FLOW STATES
3.2.1 Flow identification
3.2.2 State table
3.2.3 Set-state action
3.2.4 State modification commands
3.3 GLOBAL STATES
3.3.1 Set-flag action
3.3.2 Flag modification commands
4 BEBA BASIC STATEFUL PROTOCOL
4.1 EXPERIMENTER MESSAGES
4.1.1 State modification messages
4.2 EXPERIMENTER ACTIONS
4.2.1 Set-state action
4.2.2 Set-flag action
4.3 MATCH FIELDS
4.3.1 State match field
4.3.2 Flags match field
4.4 EXPERIMENTER STATISTICS MESSAGE
4.4.1 State statistics message
4.4.2 Global state statistics message
5 USE CASE IMPLEMENTATION
5.1 MAC LEARNING
5.2 FORWARDING CONSISTENCY
5.3 DDOS DETECTION
REFERENCES





Executive summary

This deliverable describes the general data-plane approach (operation, data structures and interfaces) to support the basic management of state transitions directly at data plane level, namely the BEBA Basic forwarding abstraction. This deliverable is organized as follows. Section 1 provides an overview of the BEBA programmability goals and approach. Section 2 recalls the use cases and requirements described in deliverable D5.1 and underlines what the BEBA basic API supports. Sections 3 and 4 focus on the extensions required by OpenFlow to support the proposed solution, described from a high level point of view and as a detailed OpenFlow experimenter specification respectively. Finally, Section 5 describes the implementation of three use cases.





1 Overview of the BEBA programmability goals and approach

In this section we recall the goals that we envisioned in the BEBA DOW for a novel behavioural based data plane forwarding abstraction, and we introduce the basic approach that is described in depth in the remainder of this deliverable.

1.1 Problem statement

Just a few years ago it was normal to configure network devices using proprietary interfaces, differing across vendors, device types (switches, routers, firewalls, load balancers, etc.), and even different firmware releases for the same equipment. Managing heterogeneous multi-vendor networks of non-marginal scale is extremely difficult, and requires great expertise. OpenFlow [OF08] emerged in 2008 as an attempt to change this situation. OpenFlow's approach was the identification of a vendor-agnostic programming abstraction for configuring the forwarding behavior of switching fabrics. Via the OpenFlow Application Programming Interface (API), network administrators can remotely reconfigure forwarding tables at run-time, probe for flow statistics, and redirect packets not matching any local flow entry towards a central controller for further processing and for taking relevant decisions; in essence, they can “program” the network from a central control point, clearly separated from the forwarding plane.

This vision, which we today call Software Defined Networking (SDN), finds its roots in earlier works [Zeg14] and is not restricted to using OpenFlow as the device-level API: it goes well beyond data plane programmatic interfaces. Nevertheless, OpenFlow is recognized as the technology that brought SDN into the real world [Gre09]. OpenFlow was immediately deployable, thanks to its pragmatic balance between open network programmability and real world vendors' and deployers' needs. Starting from the recognition that several different network devices implement somewhat similar flow tables for a broad range of networking functionalities (L2/L3 forwarding, firewall, NAT, etc.), the authors of OpenFlow proposed an abstract model of a programmable flow table. This model is “amenable to high-performance and low-cost implementations, capable of supporting a broad range of research and consistent with vendors' need for closed platforms” (quote from [OF08]). Via the OpenFlow “match/action” abstraction, the device programmer can broadly specify a flow via a header matching rule, associate forwarding/processing actions (natively implemented in the device) to the matching packets, and access byte/packet statistics associated to the specified flow.

Almost six years have now passed since the OpenFlow inception, and the latest OpenFlow standard, now at version 1.5 [OF1.5], appears far more complex than the initial elegant and simple concept. To fit real world needs, a huge number of extensions (not only the initially foreseen functional ones, such as supplementary actions or more flexible header matching, but also structural ones such as action bundles, multiple pipelined tables, synchronized tables, and many more [Cra13]) were promoted in the course of the standardization process. New extensions are currently under discussion for the next OpenFlow version, among which are flow states, which we will discuss further later on.

All this hectic work was not accompanied by any substantial rethinking of the original programmatic abstraction (besides the abandoned Google OpenFlow 2.0 proposal, considered too ambitious and futuristic [Mey13]), so as to properly capture the emerging extensions, simplify their handling [Per13], and prevent the emergence of brittle, platform-specific implementations [Mey13] which may ultimately threaten the original vendor-independence goal of the OpenFlow inventors.

As a result, even if an OpenFlow device is now rich in functionalities and primitives, it remains completely “dumb”, with all the “smartness” placed at the controller side. Someone could argue that this is completely in line with the spirit of SDN's control and data plane separation. To avoid misunderstandings, we fully agree that network management and control can be (logically) centralized. However, we posit that several stateful tasks, involving only “local states” inside “single links/switches”, are centralized although easy management and programmability do not require it, for the sole reason that they cannot be deployed on the local OpenFlow devices without the explicit involvement of the controller for any state update (a notable example is the off-the-shelf Layer 2 MAC learning operation). Ironically, MAC learning is frequently invoked to motivate OpenFlow extensions [Cra13]. One example is flow table synchronisation (different views of the same data at different points of the OpenFlow pipeline), which permits learning and forwarding functions to access the same data. Another example is the flow monitor (i.e. tracking of flow table changes in a multi-controller deployment), which permits a device natively implementing a legacy MAC learning function to inform the remote controller of every new MAC address learned; in essence, it permits breaking (!) the original OpenFlow vision of a general purpose forwarding device configured only through the data plane programming interface.

As a result, the explicit involvement of the controller for any stateful processing and for any update of the match/action rules is problematic. In the best case, this leads to extra signalling load and processing delay, and calls for a capillary distributed implementation of the logically centralized controller. In the worst case, the very slow control plane operation a priori prevents the support of network control algorithms that require prompt, real time reconfiguration of the data plane forwarding behavior. In essence, dumbness in the data forwarding plane appears to be a by-product of the limited capability of the OpenFlow data plane API compromise, rather than an actual design choice or an SDN postulate.

In conclusion, given the programmability limitations described above, the BEBA project, and in particular Work Package 2, aims at answering the following question: can we come up with better SDN data plane APIs which make it possible to program some level of smartness directly inside the forwarding device?

The BEBA proposal has come to the conclusion that the major shortcoming of OpenFlow is its inability to permit the programmer to deploy states inside the device itself. Adding states to OpenFlow is however not sufficient: the programmer should be entitled to formally specify how states should be handled, and this specification should be executed inside the device with no further interaction with the controller. Indeed a viable solution must not violate the vendor-agnostic principle that has driven the OpenFlow invention, and which has fostered SDN; in essence, it must emerge as a concrete and pragmatic abstraction, rather than as a technical approach.

1.2 BEBA solution approach

Our abstraction relies on “eXtended Finite State Machines” (XFSM). More specifically, the basic BEBA API described in this deliverable is based on a simplified state machine model called the Mealy Machine which, as will be clarified in the next subsections, can be easily mapped (i.e. with minimal structural changes in the internal operations and data structures) to the basic flow table on which OpenFlow bases its forwarding abstraction.

In OpenFlow, a set of actions is associated to a flow match. Our proposed approach exploits the same match/action primitive to add the stateful extension: the match is performed on packet header fields plus a flow state label (to be retrieved from a suitable State Table), and one of the actions associated to the match allows the flow state label to be updated. Note that a match not triggering any state transition (arguably the most common case) is readily accounted for as the special case of a self-transition, i.e. a transition from a state to itself.

The proposed approach can be formally modelled, in an abstract form, by means of a simple Mealy Machine. We recall that a Mealy Machine is an abstract model comprising a 4-tuple (S, I, O, T), plus an initial starting (DEFAULT) state S0, where:

• S is a finite set of states;

• I is a finite set of input symbols (events);

• O is a finite set of output symbols (actions); and

• T : S × I → S × O is a transition function which maps (state, event) pairs into (next state, action) pairs.

Similarly to the OpenFlow API, the abstraction is made concrete (while retaining platform independence) by restricting the set O of actions to those available in current OpenFlow devices, and by restricting the set I of events to OpenFlow matches on header fields and metadata easily implementable in hardware platforms. The finite set of states S (concretely, state labels, i.e., bit strings), and the relevant state transitions (i.e. the “behaviour” of a stateful application) are left to the programmer's freedom. As previously discussed, a transition function T is readily accommodated into a single TCAM entry, hence it uses the same OpenFlow hardware employed for ordinary match/action pairs.
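The formal model above can be illustrated with a minimal Python sketch, in which the transition function T is stored as a lookup table from (state, event) pairs to (next state, action) pairs. The class name, the toy states and events, and the choice of "drop" as the output for unlisted pairs are assumptions made for illustration only; they are not part of the BEBA API.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

State = str
Event = Hashable
Action = str


class MealyMachine:
    """Mealy Machine whose transition function T : S x I -> S x O is a table."""

    def __init__(self,
                 transitions: Dict[Tuple[State, Event], Tuple[State, Action]],
                 initial: State = "DEFAULT",
                 miss_action: Action = "drop") -> None:
        self.transitions = transitions
        self.initial = initial
        # Assumption: output emitted for (state, event) pairs not in the table.
        self.miss_action = miss_action

    def step(self, state: State, event: Event) -> Tuple[State, Action]:
        # Pairs without an explicit entry are treated as self-transitions.
        return self.transitions.get((state, event), (state, self.miss_action))

    def run(self, events: Iterable[Event]) -> List[Action]:
        """Feed a sequence of events starting from the initial state."""
        state, outputs = self.initial, []
        for event in events:
            state, action = self.step(state, event)
            outputs.append(action)
        return outputs


# Hypothetical toy machine: the first "hello" is dropped but unlocks forwarding.
toy = MealyMachine({
    ("DEFAULT", "hello"): ("KNOWN", "drop"),
    ("KNOWN", "hello"): ("KNOWN", "forward"),
    ("KNOWN", "bye"): ("DEFAULT", "forward"),
})
```

In a switch, each row of such a table would correspond to one match/action entry, with the state label matched alongside the header fields.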

The OpenFlow data plane abstraction is based on a single table of match/action rules for version 1.0, and multiple tables from version 1.1 onwards. Unless explicitly changed by the remote controller through flow-mod messages, rules are static, i.e., all packets in a flow experience the same forwarding behavior. With the proposed approach, we introduce the notion of stateful block, as an extension of a single flow table. Stateful blocks can be pipelined with other stateful blocks as well as ordinary OpenFlow tables. A stateful block is an atomic block comprising two distinct, but interrelated, tables:

• a State Table, which stores the state labels associated to flow identities (no stored state means the DEFAULT state), and

• an FSM execution table, which performs a (wildcard) match on a state label and the packet header fields, and returns an associated forwarding action (action set) and a next state label.

The programmer can specify the operation of a stateful block as follows:

• provide the list of entries to be loaded in the FSM table. Each entry in the FSM table comprises four columns: i) a state provided as a user-defined label, ii) an event expressed as an OpenFlow match, iii) a list of OpenFlow actions, and iv) a next-state label; each row is a designed state transition. At least one entry in the FSM table must use, in the first column, the DEFAULT state label;

• provide the header field(s) of the packet which shall be used to access the state table during a lookup (read);

• provide a possibly different set of header field(s) of the packet which shall be used to update the state table (write). We note a straightforward possible extension: rather than using a common (unique) update scope for all the FSM entries, an extended implementation could permit the programmer to specify different keys to be used as parameters when calling for a state transition.
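The three programming steps listed above can be sketched in Python as a small in-memory model of a stateful block. This is only an illustrative sketch: the names `FsmEntry` and `StatefulBlock`, the packet-as-dictionary representation, the exact-match semantics of the event column, and the convention that a next-state of `None` means a self-transition are all assumptions, not the BEBA implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

DEFAULT = "DEFAULT"


@dataclass
class FsmEntry:
    state: Optional[str]        # i) state label (None acts as a wildcard)
    match: Dict[str, object]    # ii) event: header fields that must be equal
    actions: List[str]          # iii) list of actions
    next_state: Optional[str]   # iv) next-state label (None = self-transition)


@dataclass
class StatefulBlock:
    lookup_scope: Tuple[str, ...]   # header fields used to read the state table
    update_scope: Tuple[str, ...]   # header fields used to write the state table
    fsm_table: List[FsmEntry]
    state_table: Dict[Tuple[object, ...], str] = field(default_factory=dict)

    @staticmethod
    def _key(pkt: Dict[str, object], scope: Tuple[str, ...]) -> Tuple[object, ...]:
        return tuple(pkt[name] for name in scope)

    def process(self, pkt: Dict[str, object]) -> List[str]:
        # Read: a flow with no stored state is in the DEFAULT state.
        state = self.state_table.get(self._key(pkt, self.lookup_scope), DEFAULT)
        for entry in self.fsm_table:
            if entry.state not in (None, state):
                continue
            if all(pkt.get(k) == v for k, v in entry.match.items()):
                if entry.next_state == DEFAULT:
                    # Returning to DEFAULT is modelled by removing the row.
                    self.state_table.pop(self._key(pkt, self.update_scope), None)
                elif entry.next_state is not None:
                    # Write: the key may differ from the one used for the read.
                    self.state_table[self._key(pkt, self.update_scope)] = entry.next_state
                return entry.actions
        return []  # no FSM entry matched


# Hypothetical configuration: a host is forwarded only after it contacted port 80.
block = StatefulBlock(
    lookup_scope=("ip_src",),
    update_scope=("ip_src",),
    fsm_table=[
        FsmEntry(DEFAULT, {"tcp_dst": 80}, ["drop"], "SEEN"),
        FsmEntry("SEEN", {}, ["forward"], None),
    ],
)
```

Keeping the lookup and update scopes as separate parameters is what later allows the two keys to differ.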

The decoupling between the read and write keys makes it possible to support a functional extension that we call cross-flow state handling. There are many practically useful stateful control tasks in which states for a given flow are updated by events occurring on different flows. A prominent example is MAC learning: packets are forwarded using the destination MAC address, but the forwarding database is updated using the source MAC address. Similarly, the handling of bidirectional flows may encounter the same needs; for instance, the detection of a returning TCP SYNACK packet could trigger a state transition on the opposite direction.
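The MAC learning case can be sketched in a few lines of Python. In this hypothetical sketch the state label stored for a MAC address is simply the port on which that address was last seen, and a missing entry (the DEFAULT state) is interpreted as "flood"; both conventions, as well as the field names, are assumptions made for illustration.

```python
from typing import Dict, Union

FLOOD = "flood"  # assumed behaviour for the DEFAULT state (unknown destination)


def forward(pkt: Dict[str, object], state_table: Dict[str, int]) -> Union[int, str]:
    """Cross-flow state handling: read with one key, write with another."""
    # Lookup scope: destination MAC address decides where the packet goes.
    out_port = state_table.get(pkt["eth_dst"], FLOOD)
    # Update scope: source MAC address records where the sender is attached.
    state_table[pkt["eth_src"]] = pkt["in_port"]
    return out_port
```

For example, a first packet from host A to host B is flooded, but it teaches the table where A is, so the reply from B to A can be sent to a single port.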

The BEBA stateful forwarding approach offloads from the control plane all the actions that can be defined inside the switch by using the programming model described above; the control plane reacts only to the few exceptions not covered by the defined Mealy machine. The main task of the control plane of a BEBA switch thus consists of taking high level decisions that are realized by programming the BEBA switch with a suitable FSM.





To best convey our concept, let's reuse a perhaps niche, but indeed very descriptive example: port knocking, a well-known method for opening a port on a firewall. A host that wants to establish a connection (say an SSH session, i.e., port 22) delivers a sequence of packets addressed to an ordered list of pre-specified closed ports, say ports 5123, 6234, 7345 and 8456. Once the exact sequence of packets is received, the firewall opens port 22 for the considered host. Before this stage, all packets (including the knocking ones) are dropped. This example can be easily implemented with the Mealy Machine illustrated in Figure 1. Starting from a DEFAULT state, each correctly knocked port will cause a transition through a series of three intermediate states, until a final OPEN state is reached. Any knock on a different port will reset the state to DEFAULT. When in the OPEN state, only packets addressed to port 22 will be forwarded; all remaining packets will be dropped, but without resetting the state. Note that a controller-based implementation of Port Knocking would require the switch to deliver each and every packet received on a currently blocked port to the controller itself.

Figure 1 Port Knocking example
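The port knocking machine can be written down as a short Python sketch. The names of the intermediate states (`STAGE_1` to `STAGE_3`), the packet field names, and the use of the source IP address as both lookup and update key are assumptions for illustration; only the port numbers and the DEFAULT/OPEN behaviour come from the description above.

```python
from typing import Dict, Tuple

KNOCK_SEQUENCE = (5123, 6234, 7345, 8456)
# Hypothetical labels for the three intermediate states.
STATES = ("DEFAULT", "STAGE_1", "STAGE_2", "STAGE_3", "OPEN")


def transition(state: str, dst_port: int) -> Tuple[str, str]:
    """Transition function: (state, event) -> (next state, action)."""
    if state == "OPEN":
        # Only port 22 is forwarded; other packets do not reset the state.
        return "OPEN", ("forward" if dst_port == 22 else "drop")
    stage = STATES.index(state)
    if dst_port == KNOCK_SEQUENCE[stage]:
        return STATES[stage + 1], "drop"   # correct knock: advance, still drop
    return "DEFAULT", "drop"               # wrong knock: reset


def handle(pkt: Dict[str, object], state_table: Dict[str, str]) -> str:
    """Per-host processing: the flow key is the source IP address."""
    host = pkt["ip_src"]
    state = state_table.get(host, "DEFAULT")
    next_state, action = transition(state, pkt["dst_port"])
    if next_state == "DEFAULT":
        state_table.pop(host, None)        # no stored state means DEFAULT
    else:
        state_table[host] = next_state
    return action
```

All the per-packet decisions are taken by `transition` and the state table alone, which mirrors the point made above: no packet needs to be sent to the controller.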





2 BEBA basic API requirements

This section focuses on the main input to task 2.1 “Basic programming abstraction” provided by task 5.1 “Use cases and application scenarios”. In particular this section recalls the BEBA programmability requirements derived from the analysis of the reference use cases documented in D5.1.

2.1 Summary of the proposed use cases

BEBA task 5.1 “Use cases and application scenarios” has described the following 13 reference use cases. The use cases emerged after a broad research on both academic and industrial applications, considered important for modern networking architectures. Our objective is to introduce heterogeneity in the use cases to significantly challenge the BEBA abstraction model and API design.

UC01 - In-switch support of legacy control protocols for SDN networks

In this use case a set of BEBA switches are foreseen to support the management of legacy control protocols like ICMP, ARP or DHCP in order to offload the controller. This use case deals with a possible mechanism that allows a BEBA controller to “program” the BEBA switches to flexibly generate control protocol messages automatically.

UC02 - Programmable network flow measurement

This use case deals with advanced passive measurement of network flows, in-switch aggregation of information into flow records, and export of flow records using standard protocols. Upon receiving a packet, the BEBA switch finds the matching flow record and updates it. The monitoring process also defines the way to export the flow records to a designated location. The export uses the standard IPFIX protocol [IPFIX].

UC03 - Deep monitoring for (proactive) failure detection

This use case has the goal of enabling BEBA switches to catch specific low level events that are ‘suspicious’. Even after the last OpenFlow release, which supports TCP flags matching, SDN is still incapable of applying deep packet inspection to support flexible packet matching and cover a very broad range of protocols. Therefore this use case envisions BEBA switches that support the capture of at least the following events: dropped packets that are not caused by drop rules, RTTs per neighbour (either host or switch), TCP retransmissions and re-orderings, hardware measurements such as CPU/memory/TCAM utilization, unexpectedly high traffic volume (e.g. increased byte/packet counters), and absence of traffic (e.g. zero byte/packet counters).

UC04 - Adaptive QoS management and admission control

In this use case, BEBA switches are given the capability to automatically manage local QoS parameters and, as an extension, perform admission control to avoid network congestion. The BEBA switch itself is envisioned to support dynamic adaptation of the QoS parameters, according to traffic demands for each class of service and to the state of the queues, in order to provide the most efficient forwarding scheme. The SDN controller is offloaded since no instructions from it are required; instead, it is only notified by the switch each time a QoS parameter or the forwarding table is modified.

UC05 - Adaptive treatment of unexpected traffic

In this use case, SDN switches are given the capability to automatically detect suspicious/undesired traffic and react by filtering or diverting some flows in an adaptive way. In order to support such capability the BEBA API should be able to configure per flow timers, metrics and threshold verification.

UC06 - Forwarding Consistency

This use case deals with ensuring consistency in forwarding decisions for packets of the same transport layer flow. It imagines the BEBA switch as a load balancer able to ensure such consistency, without requiring interaction with the controller, by executing Mealy machines inside the switches. By associating one Mealy machine per flow, the switch forwarding behaviour is dictated by the STATE associated to each flow. For example, the STATE could indicate to which port the packet has to be forwarded.

UC07 - Advanced packet processing

This use case requires the BEBA switches to realize a flexible and high performance data plane that supports advanced data plane primitives like flow state transitions, tunnelling and programmable packet parsing.

UC08 - Distributed, usage-based data rate limiters

This use case is designed to give flexibility to network operators in order to abstract the way that (a) monitoring, (b) usage-based pattern generation, and (c) rate limitation are performed. To achieve this, the first requirement is that the BEBA switch is already infused with some basic monitoring capabilities. Specifically, we need to monitor per-user data rate. A broad set of monitoring activities is proposed by other use cases and sufficiently covers the needs of this use case.

UC09 - Network fault-tolerance

In this use case, SDN switches have the capability to reroute connections in case of failures. More specifically, in this use case the BEBA switches are able to distinguish among different failure scenarios and select the correct forwarding rule based on the STATE of the network without requiring interaction with the controller. This mechanism would minimize the load on the controller and would allow reacting to faults even when the controller is unreachable.





UC10 - Automatic IP/MPLS binding

This use case deals with the problem of announcing IP subnets behind MPLS edge routers and considers a datacentre network where a subset of edge routers, directly connected to the end hosts via edge ports, act as ingress/egress nodes and are connected to each other via a core network engineered via MPLS paths. For each incoming packet, the ingress router identifies which path brings the packet to the egress router, before adding the appropriate MPLS label and forwarding the packet on the proper path.

UC11 - Dynamic secured tunnelling

In this use case, BEBA switches are able to dynamically set up and release IPsec tunnels for specific traffic. By extending both the switches’ programming capabilities and the actions they are able to perform, the switches can deal with secure tunnelling by themselves: BEBA allows pre-configuring the switches to react to specific events very fast, and to adapt automatically to the situation. More specifically, in this use case switches are programmed to provide on-demand tunnelling services. Encrypted tunnels are set up when required, and automatically released when they are not used anymore.

UC12 - DDoS detection and mitigation

In this use case a distributed set of BEBA switches are able (without requiring subsequent interactions with the controller) to monitor specific possible DDoS attack targets, measure a given set of per flow metrics, understand when an attack is being performed and dynamically adapt their forwarding behaviour accordingly (e.g. re-route traffic to further DPI or filter the traffic).

UC13 - Flexible Evolved Packet Core

In this use case, BEBA switches are used to implement part of the S/PGW functions of an LTE mobile network architecture. In particular, BEBA switches can be used to decapsulate and encapsulate packets in GTP tunnels, while the controller performs the required control plane interactions with the other components of the LTE architecture. When introducing BEBA switches, part of the control tasks, such as tunnel establishment or users’ policies enforcement, can be implemented directly by the switches, offloading the controller from some signalling traffic and increasing the overall system scalability.

2.2 What is supported by the BEBA Basic API?

This deliverable describes the basic design for a stateful, platform-agnostic, data plane programming interface, which minimally departs from the current OpenFlow specification. We refer to such data plane abstraction as BEBA Basic API. The forwarding abstraction and the resulting interface described in this deliverable fulfil only a subset of the requirements derived from the use cases described in the previous section. The BEBA Basic API will provide the basic support for the execution of Mealy Machines at the data plane. The uncovered requirements will be addressed by future deliverables in WP2; however, we take them into account in order to design the API in the most flexible and extensible way.

More specifically, with respect to deliverable D5.1 in this deliverable the support for the following requirements is documented:

1. REQ-P2 Per state timer management

2. REQ-D11 Update flow state action

3. REQ-D16 Read different flow states for different flow keys

4. REQ-C1 Send state machine models to switches

5. REQ-C2 Send state modification

6. REQ-C6 Query the state table

As described in the following sections, all requirements above have been addressed by introducing the following extensions to the standard OpenFlow specification:

1. The new action OFPAT_EXP_SET_STATE and the message OFPT_STATE_MOD with command field set to OFPSC_SET_FLOW_STATE satisfy REQ-P2.

2. The new action OFPAT_EXP_SET_STATE satisfies REQ-D11.

3. The new OFPT_STATE_MOD message with command field set to OFPSC_SET_L_EXTRACTOR or OFPSC_SET_U_EXTRACTOR satisfies REQ-D16 and REQ-C1.

4. The new message OFPT_STATE_MOD with command field set to OFPSC_SET_FLOW_STATE satisfies REQ-C2.

5. The OFPMP_EXP_STATE_STATS multipart request with exp_type set to OFPMP_EXP_STATE_STATS satisfies REQ-C6.

This deliverable focuses on the core feature of enabling state transition at the data plane (i.e. the basic behavioural based forwarding abstraction) and thus features strictly related to the design of novel action/instructions like “In switch packet generation”, “Asynchronous notifications” or “Extended flow match” will be documented in WP3 and WP4 related deliverables. Moreover, advanced features like “State transition depending on threshold comparison” or “Per flow programmable metrics, writing/reading” are outside the scope of the BEBA Basic API and will be addressed by the BEBA Extended API.





3 High level description of the BEBA basic abstraction

In this section we detail the design of the Mealy Machine based forwarding abstraction by giving a high level description of the main operations and data structures of the BEBA basic abstraction, and in particular those related to stateful pipelining, flow states and global states.

3.1 Stateful pipelining

As defined by the OpenFlow specification, a packet entering an OpenFlow switch is processed through a pipeline comprised of a set of linked flow tables that provide matching, forwarding, and packet modification. We indicate with the term stateless stage the processing operated by a single stateless OpenFlow flow table. Conversely, to realize our abstraction, we define as stateful stage (Figure 2) a logical stage comprising a state table and a flow table.

When a packet enters a stateful stage, it is first processed by a key extractor which produces a string of bits representing the key to be used to match a row in the state table. The key is derived by concatenating the header fields defined in the lookup-scope. The matched state label is appended to the packet headers as an additional header field. On exiting the state table, the packet headers along with the returned state label are matched in the flow table.
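The key extraction step can be illustrated with a minimal Python sketch that concatenates the binary encodings of the fields listed in a lookup-scope and uses the result to read a state table. The field names, their encodings and the helper names are hypothetical; they only show why the order of the fields in the scope matters.

```python
import ipaddress
from typing import Callable, Dict, Sequence


def _mac(value: str) -> bytes:
    return bytes.fromhex(value.replace(":", ""))      # 6 bytes


def _ipv4(value: str) -> bytes:
    return ipaddress.IPv4Address(value).packed        # 4 bytes


def _port(value: int) -> bytes:
    return value.to_bytes(2, "big")                   # 2 bytes


# Hypothetical field names and encoders, for illustration only.
FIELD_ENCODERS: Dict[str, Callable[[object], bytes]] = {
    "eth_src": _mac, "eth_dst": _mac,
    "ipv4_src": _ipv4, "ipv4_dst": _ipv4,
    "tcp_src": _port, "tcp_dst": _port,
}


def extract_key(pkt: Dict[str, object], scope: Sequence[str]) -> bytes:
    """Concatenate the encoded header fields in the order given by the scope."""
    return b"".join(FIELD_ENCODERS[name](pkt[name]) for name in scope)


def lookup_state(pkt: Dict[str, object], scope: Sequence[str],
                 state_table: Dict[bytes, str]) -> str:
    """Return the state label for the packet, or DEFAULT on a table miss."""
    return state_table.get(extract_key(pkt, scope), "DEFAULT")
```

Because the key is a plain concatenation, the scopes `("ipv4_src", "tcp_dst")` and `("tcp_dst", "ipv4_src")` produce different keys for the same packet.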

The flow table is extended by adding support to a new “state” virtual header field to be used to match packets along with other header fields (MPLS, IP, TCP, etc.). We characterize this header as virtual because it is not really appended to the packet header and it is valid only for current processing through the flow table of the stateful stage. Since state values are valid only for a specific flow (defined by the flow key), we call them “flow states”. Finally, a new “set-state” action is introduced to allow the update of the state value for a given flow in a given stateful stage. The set-state action can be appended to the packet action set as any other OpenFlow action.

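Figure 2 - Architecture of the BEBA stateful stage

[Figure 2: block diagram. Packet headers enter the lookup key extractor and the state table (columns: match key, state; last row: * → DEFAULT). Packet headers + state are then matched in the flow table (columns: headers and state match fields, actions), which outputs packet headers + actions. A set-state action feeds packet headers + next_state back to the state table through the update key extractor. The controller configures the flow table via flow-mod and the state table via state-mod.]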





By default all the flow tables in the switch are treated as stateless stages, and this default behaviour makes a BEBA-enabled switch act as a typical OpenFlow switch. The controller can then enable stateful processing for one or more stages by sending a special control message to the switch and by configuring the key extractors (lookup-scope and update-scope) associated with the state table. Similarly to flow tables, a new modification message called “state-mod” has been defined to allow the controller to configure the state entries and key extractors.

Similarly to flow states, BEBA introduces the concept of “global states”, also called “flags”. As the name suggests, flags are valid globally for every packet processed by the switch. A controller can specify flags as an additional match field, which can be seen as a filter on the flow table. A “set-flags” action is defined to allow flags to be updated directly from pipeline processing.

3.2 Flow states

3.2.1 Flow identification

Flow states are associated with packets and are valid only inside the stateful stage that produced them. Inside a stateful stage, flows can be arbitrarily defined by using “flow scopes”, which can be seen as vectors of header fields that distinguish one flow from another. For example, a Layer 2 flow can be defined by using just the MAC source address and MAC destination address (2 fields), while a flow in the socket sense can be defined by using the whole L2-L4 header (6 fields).

In BEBA, states for a given flow can be updated by events occurring on different flows. A prominent example is MAC learning: packets are forwarded using the destination MAC address, but the forwarding database is updated using the source MAC address. Similarly, the handling of bidirectional flows may encounter the same needs; for instance, the detection of a returning TCP SYNACK packet could trigger a state transition on the opposite direction. In protocols such as FTP, a control exchange on port 21 could be used to set a state on the data transfer session on port 20. For this reason, two types of flow scopes are defined, the “lookup-scope” and the “update-scope”, as the ordered sequence of header fields that shall be used to produce the key used to access the state table and perform, respectively, a lookup or an update operation.

The lookup-scope and the update-scope are intrinsic to the state table and are used to configure the key extraction process.
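The key extraction process can be illustrated with a minimal sketch. It assumes a toy packet with only two Ethernet address fields; the names `field_id`, `pkt` and `extract_key` are hypothetical and not part of the BEBA API, which identifies fields through OXM TLV types instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field identifiers (the real API uses OXM TLV types). */
enum field_id { F_ETH_SRC, F_ETH_DST };

/* Toy parsed packet carrying only the two Ethernet addresses. */
struct pkt {
    uint8_t eth_src[6];
    uint8_t eth_dst[6];
};

/* Concatenate the fields named by `scope`, in order, into `key`.
 * Returns the key length in bytes. */
size_t extract_key(const struct pkt *p, const enum field_id *scope,
                   size_t n_fields, uint8_t *key, size_t key_cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n_fields; i++) {
        const uint8_t *src = (scope[i] == F_ETH_SRC) ? p->eth_src : p->eth_dst;
        assert(len + 6 <= key_cap);   /* caller must size the buffer */
        memcpy(key + len, src, 6);
        len += 6;
    }
    return len;
}
```

With a lookup-scope of `{ F_ETH_DST }` and an update-scope of `{ F_ETH_SRC }`, the same packet yields two different keys, which is what the MAC learning example relies on.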

3.2.2 State table

A state table consists of state entries. Each state table entry (see Table 1) contains:

• Key: string of bits used to match the packet flow key obtained from the key extractor;

• State: value associated with a specific flow key;

• Timeouts: maximum amount of time or idle time before the entry is updated with a predefined “rollback state”.

Table 1 - Main components of a state entry in the state table

Key | State | Timeouts

The match on the state table is performed using the key extracted using the lookup-scope, and it is performed exactly, in other words wildcards are not allowed. In case of a table-miss (the key is not found) then a DEFAULT state is appended to the packet headers. If the header fields specified by the lookup-scope are not found (e.g. extracting the IP source address when the Ethernet type is not IP), a special state value NULL is returned.

If the header fields specified by the update-scope are not found in the packet, the set-state action is not executed.
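The lookup rules above (exact match, DEFAULT on a table miss, NULL when the scope fields are absent) can be sketched as follows. This is an illustration under assumptions: the table is a plain array scanned linearly, the NULL state is modelled as a separate return code because its numeric value is not given here, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEY_LEN   48   /* mirrors OFPSC_MAX_KEY_LEN */
#define STATE_DEFAULT 0u   /* the DEFAULT state is state 0 */

struct state_entry {
    uint8_t  key[MAX_KEY_LEN];
    size_t   key_len;
    uint32_t state;
};

enum lookup_status { LOOKUP_HIT, LOOKUP_MISS_DEFAULT, LOOKUP_NULL };

/* Exact-match lookup without wildcards. A NULL `key` stands for a packet
 * in which the lookup-scope fields could not be found. */
enum lookup_status state_lookup(const struct state_entry *tbl, size_t n,
                                const uint8_t *key, size_t key_len,
                                uint32_t *state_out)
{
    if (key == NULL)
        return LOOKUP_NULL;                /* special NULL state */
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].key_len == key_len &&
            memcmp(tbl[i].key, key, key_len) == 0) {
            *state_out = tbl[i].state;
            return LOOKUP_HIT;
        }
    }
    *state_out = STATE_DEFAULT;            /* table miss */
    return LOOKUP_MISS_DEFAULT;
}
```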

3.2.2.1 State timeouts

As with OpenFlow flow tables, timeouts can be defined for state entries. In contrast to OpenFlow, state entries do not expire; instead, the state value is updated to a predefined rollback value. Each entry has an idle_timeout and a hard_timeout associated with it (Table 2).

Table 2 - Main components of a state entry timeouts in the state table

Idle timeout                  | Hard timeout
Milliseconds | Rollback state | Milliseconds | Rollback state

Timeouts are set the same way state entries are added to the state table, by means of a set-state action or a state-mod command received from the controller. The switch must note the state entry’s arrival (update) time, as it may need to update the entry later. A non-zero hard_timeout field causes the state entry to be updated to the rollback state after the given number of milliseconds, regardless of how many packets it has matched. A non-zero idle_timeout field causes the state entry to be updated to the rollback state when it has matched no packets in the given number of milliseconds. When a state entry expires and its value is updated with the rollback value, it remains in that state until a new set-state action or state-mod command is performed. In other words, in the current version of BEBA it is not possible to set timeouts on the rollback state.

When a timeout is set with DEFAULT rollback state, the expiration of the timeout is equivalent to the entry removal.
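A minimal sketch of the rollback behaviour follows. It assumes that timestamps and timeouts share one integer time unit, that a timeout fires as soon as the elapsed time reaches its value, and that the hard timeout wins if both have expired; none of these details is fixed by the text above, and the names are hypothetical.

```c
#include <stdint.h>

struct timed_state {
    uint32_t state;
    uint64_t installed_at;    /* time of the last set-state / state-mod */
    uint64_t last_match_at;   /* time of the last matching packet */
    uint32_t idle_timeout;    /* 0 disables the idle timeout */
    uint32_t idle_rollback;
    uint32_t hard_timeout;    /* 0 disables the hard timeout */
    uint32_t hard_rollback;
};

/* Apply any expired timeout at time `now` and return the resulting state.
 * After a rollback both timeouts are cleared, because no timeout can be
 * attached to the rollback state itself. */
uint32_t state_at(struct timed_state *e, uint64_t now)
{
    if (e->hard_timeout && now - e->installed_at >= e->hard_timeout) {
        e->state = e->hard_rollback;
        e->hard_timeout = e->idle_timeout = 0;
    } else if (e->idle_timeout && now - e->last_match_at >= e->idle_timeout) {
        e->state = e->idle_rollback;
        e->hard_timeout = e->idle_timeout = 0;
    }
    return e->state;
}
```

Setting a rollback value of 0 (DEFAULT) in this sketch reproduces the "equivalent to entry removal" case, since a lookup cannot tell a DEFAULT entry from a missing one.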





3.2.3 Set-state action

One of the main features introduced in BEBA is the possibility to trigger state transitions as a consequence of a packet matching a flow entry. By adding a set-state action to the action set, it is possible to execute state transitions in the same stage or in any other (stateful) stage of the pipeline. Multiple state transitions are allowed by defining more than one set-state action in the action set. It is also possible to perform state transitions from the group table by inserting a set-state action in the action bucket.

When adding a set-state action to an action set/bucket the following parameters can be specified:

• state: value to write in the state table (required)

• state_mask: 32-bit integer (optional)

• table_id: target stage to update (required)

• idle_timeout: interval in milliseconds (optional)

• idle_rollback: rollback state value for the idle timeout (optional)

• hard_timeout: interval in milliseconds (optional)

• hard_rollback: rollback state value for the hard timeout (optional)

When the switch executes a set-state action, the packet header is processed by the update-scope key extractor of the target state table (table_id), and the corresponding entry is then updated.
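The effect of the optional state_mask can be sketched with a single bitwise expression. The helper name `apply_state_mask` is hypothetical, and treating an omitted mask as 0xFFFFFFFF (full overwrite) is an assumption of this sketch.

```c
#include <stdint.h>

/* Bits set in `state_mask` are taken from `state`; all other bits keep
 * the value currently stored in the state table. */
uint32_t apply_state_mask(uint32_t old_state, uint32_t state,
                          uint32_t state_mask)
{
    return (old_state & ~state_mask) | (state & state_mask);
}
```

For example, `apply_state_mask(0xF0, 0x0F, 0x03)` changes only the two least significant bits and gives 0xF3, while a mask of 0xFFFFFFFF replaces the stored state entirely.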

It is important to underline that OpenFlow actions in an action set are applied in the order specified in the OpenFlow specification, regardless of the order in which they were added to the set. To perform a set-state action before some other action of the same stage alters the packet header fields (potentially the fields of the update-scope), the set-state action would need the highest possible priority. The problem is that an experimenter action (OFPAT_EXPERIMENTER) has a single fixed priority (as specified in ofsoftswitch13): two experimenter actions cannot be given different priorities, and the priority of an experimenter action cannot be changed. As explained in the OpenFlow specification, a switch may support arbitrary action execution order through the Apply-Actions instruction. Thus, the programmer must use Apply-Actions (instead of Write-Actions) and take care to list the actions in the intended execution order.

3.2.3.1 Atomicity

As defined in OpenFlow, actions are usually executed at the end of the pipeline. The same applies to the set-state action, which makes the state update operation non-atomic by default. Not enforcing atomicity can lead to consistency issues when more than one packet is processed by the pipeline at the same time. The only way to guarantee state consistency between packets is to invoke the set-state action from the OpenFlow Apply-Actions instruction (instead of the Write-Actions instruction), so that the value contained in the state table is updated when the packet exits the specific stage of the pipeline.

3.2.4 State modification commands

Five different state modification commands (state-mod) are defined in BEBA:

• Set-lookup-extractor: allows the controller to set the header fields’ vector for the lookup-scope of the state table.

• Set-update-extractor: allows the controller to set the header fields’ vector for the update-scope of the state table.

• Set-flow-state: allows the controller to add or update a state entry in the state table; in this case the same parameters as for the set-state action can be used.

• Delete-flow-state: allows the controller to delete a state entry in the state table. This command is equivalent to invoking a set-flow-state command or a set-state action with DEFAULT state.

• Stateful-table-config: allows the controller to explicitly tell the switch to allow or disallow stateful processing for a given table of the pipeline. Upon receipt of this command the switch has to instantiate the necessary resources for a state table and its connected structures (key extractors, timeout timers, etc.). Otherwise, the switch uses this table as an ordinary SDN table.

3.3 Global states

Extending the flow state concept, some states can be shared among multiple flows. For this reason global states have been introduced. These states (a.k.a. flags) are defined at the datapath level and are not related to a single flow of a particular stage. Each incoming packet can thus also be matched against the current value of the global states. Global states can be updated by means of a new “set-flags” action that can be triggered by a match in the flow table (or group table). Furthermore, the controller is able to modify and reset the global state values of a specific switch by using the new flag modification messages.

3.3.1 Set-flag action

Global states can be modified as a consequence of a packet matching a flow entry. By adding a set-flag action to the action set (or action bucket in the group table), it is possible to modify the global state values in any stage of the pipeline. Using the set-flag action, values can be totally overwritten or, by using a mask, selectively modified.

When adding a set-flag action to an action set/bucket the following parameters can be specified:

• flag: bit string representing the flag values (required)

• flag_mask: bit mask to selectively modify only some of the flags (optional)

3.3.2 Flag modification commands

The following types of flag modification commands are defined:

• Set-flags: allows the controller to update the value of global states. Values can be totally overwritten or, by using a mask, selectively modified.

• Reset-flags: allows the controller to reset the flags to a default value.





4 BEBA basic stateful protocol

This section describes an extension to the OpenFlow protocol specification v1.3 that enables support for stateful packet forwarding inside OpenFlow-enabled switches. Backward compatibility with OpenFlow is always guaranteed since existing elements and primitives are not modified in a way that breaks compatibility.

The “official way” to extend OpenFlow is to use the pre-defined experimenter structures already provided by OpenFlow. From the OpenFlow specification: “Experimenter messages provide a standard way for OpenFlow switches to offer additional functionality within the OpenFlow message type space. This is a staging area for features meant for future OpenFlow revisions”. Considering future BEBA standardization efforts aiming at submitting OpenFlow extensions to the ONF, OpenFlow experimenter messages represent the most suitable way to extend OpenFlow for the purposes of the BEBA project.

4.1 Experimenter messages

OpenFlow experimenter messages have the following structure:

    struct ofp_experimenter_header {
        struct ofp_header header; /* Type OFPT_EXPERIMENTER. */
        uint32_t experimenter;    /* Experimenter ID */
        uint32_t exp_type;        /* Experimenter defined. */
        /* Experimenter-defined arbitrary additional data. */
    };
    OFP_ASSERT(sizeof(struct ofp_experimenter_header) == 16);

BEBA experimenter messages have the type field in ofp_header set to OFPT_EXPERIMENTER. The experimenter field is set to 0xBEBABEBA and the exp_type field is set to one of the following types:

    enum ofp_exp_messages {
        OFPT_EXP_STATE_MOD = 0,
    };

4.1.1 State modification messages

Configurations and modifications to the state table from the controller are performed with the State Modification message. This message is an experimenter message with the exp_type field set to OFPT_EXP_STATE_MOD.

    /*
     * Structure of the state modification message.
     */
    struct ofp_exp_msg_state_mod {
        struct ofp_experimenter_header header;
        uint8_t command;
        uint8_t pad;
        uint8_t payload[];
    };

The payload structure depends on the value of the command field. The seven commands are explained in Sections 3.2.4 and 3.3.2.

    /*
     * Possible values for 'command' field in ofp_exp_msg_state_mod
     */
    enum ofp_exp_msg_state_mod_commands {
        OFPSC_STATEFUL_TABLE_CONFIG = 0,
        OFPSC_SET_L_EXTRACTOR,
        OFPSC_SET_U_EXTRACTOR,
        OFPSC_SET_FLOW_STATE,
        OFPSC_DEL_FLOW_STATE,
        OFPSC_SET_GLOBAL_STATE,
        OFPSC_RESET_GLOBAL_STATE
    };

Stateful table configuration

To configure a stage as stateful the controller sends an OFPT_EXP_STATE_MOD message with the command field set to OFPSC_STATEFUL_TABLE_CONFIG and a payload structure as defined by ofp_exp_stateful_table_config. During packet processing a stateful stage retrieves the flow state and adds it to the packet header. If the state is NULL, no state field is appended.

    struct ofp_exp_stateful_table_config {
        uint8_t table_id;
        uint8_t stateful;
    };

The table_id field specifies the table to be modified.

The stateful field configures the stage: a value of 0 makes the stage stateless, any value different from 0 makes it stateful.
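As an illustration, the following sketch serializes a stateful table configuration message byte by byte in network byte order. It follows the struct layouts as printed in this section and adds no trailing padding, which is an assumption; a real implementation may lay out or pad the message differently. The function and helper names are hypothetical, while the OpenFlow 1.3 wire version (0x04) and the OFPT_EXPERIMENTER message type (4) come from the OpenFlow specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OFP_VERSION_1_3             0x04
#define OFPT_EXPERIMENTER           4
#define BEBA_EXPERIMENTER_ID        0xBEBABEBAu
#define OFPT_EXP_STATE_MOD          0
#define OFPSC_STATEFUL_TABLE_CONFIG 0

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)(v & 0xFFFF));
}

/* Write the message into `buf` and return its length in bytes:
 * ofp_header (8) + experimenter (4) + exp_type (4) + command (1)
 * + pad (1) + table_id (1) + stateful (1) = 20. */
size_t pack_stateful_table_config(uint8_t *buf, size_t cap, uint32_t xid,
                                  uint8_t table_id, uint8_t stateful)
{
    const size_t len = 20;
    assert(cap >= len);
    buf[0] = OFP_VERSION_1_3;
    buf[1] = OFPT_EXPERIMENTER;
    put_be16(buf + 2, (uint16_t)len);
    put_be32(buf + 4, xid);
    put_be32(buf + 8, BEBA_EXPERIMENTER_ID);
    put_be32(buf + 12, OFPT_EXP_STATE_MOD);
    buf[16] = OFPSC_STATEFUL_TABLE_CONFIG;
    buf[17] = 0;           /* pad */
    buf[18] = table_id;
    buf[19] = stateful;    /* 0 = stateless, non-zero = stateful */
    return len;
}
```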

Lookup/Update scope configuration

An OFPT_EXP_STATE_MOD message with the command field set to OFPSC_SET_L_EXTRACTOR or OFPSC_SET_U_EXTRACTOR must have a payload structure as defined by ofp_exp_set_extractor.

    /*
     * Max number of fields for the key extractor vector
     */
    #define OFPSC_MAX_FIELD_COUNT 6

    struct ofp_exp_set_extractor {
        uint8_t table_id;
        uint8_t pad[3];
        uint32_t field_count;
        uint32_t fields[OFPSC_MAX_FIELD_COUNT];
    };

The field_count field specifies the number of fields provided in fields[], which contains the vector of Type Length Value (TLV) fields composing the key extractor.

Set flow state message

An OFPT_EXP_STATE_MOD message with the command field set to OFPSC_SET_FLOW_STATE has a payload structure as defined by ofp_exp_set_flow_state.

    /*
     * Number of bytes composing the state key.
     */
    #define OFPSC_MAX_KEY_LEN 48

    struct ofp_exp_set_flow_state {
        uint8_t table_id;
        uint8_t pad[3];
        uint32_t key_len;
        uint32_t state;
        uint32_t state_mask;
        uint32_t hard_rollback;
        uint32_t idle_rollback;
        uint32_t hard_timeout;
        uint32_t idle_timeout;
        uint8_t key[OFPSC_MAX_KEY_LEN];
    };

The key_len field specifies the size (number of bytes) of the key provided in key[].

The state field contains the state to be inserted (or updated) in the state table.

The state_mask field specifies which bits of the state should be modified. A state_mask with value 0xFFFFFFFF indicates that the state field should be entirely overwritten.

The hard_timeout and idle_timeout fields specify the number of microseconds before the state is rolled back to hard_rollback and idle_rollback, respectively.

The key field contains the key used to access the state table, split into bytes (e.g. IP 10.0.0.1 is stored as [10,0,0,1]).

Delete flow state message

An OFPT_EXP_STATE_MOD message with the command field set to OFPSC_DEL_FLOW_STATE has a payload structure as defined by ofp_exp_del_flow_state.

    /*
     * Number of bytes composing the state key.
     */
    #define OFPSC_MAX_KEY_LEN 48

    struct ofp_exp_del_flow_state {
        uint8_t table_id;
        uint8_t pad[3];
        uint32_t key_len;
        uint8_t key[OFPSC_MAX_KEY_LEN];
    };

The key_len field specifies the size (number of bytes) of the key provided in key[].

The key field contains the key used to access the state table, split into bytes (e.g. IP 10.0.0.1 is stored as [10,0,0,1]).

Global state modification message

An OFPT_EXP_STATE_MOD message with the command field set to OFPSC_SET_GLOBAL_STATE has a payload structure as defined by ofp_exp_set_global_state.

    struct ofp_exp_set_global_state {
        uint32_t flag;
        uint32_t flag_mask;
    };

The flag field specifies the new value of the global states.

The flag_mask field specifies which bits of the global state should be modified. A flag_mask with value 0xFFFFFFFF indicates that the global state field should be entirely overwritten.





Global state reset message

An OFPT_EXP_STATE_MOD message with the command field set to OFPSC_RESET_GLOBAL_STATE has an empty payload. This message resets the global state value to OFP_GLOBAL_STATES_DEFAULT.

    #define OFP_GLOBAL_STATES_DEFAULT 0

4.2 Experimenter actions

OpenFlow experimenter actions have the following structure:

    struct ofp_action_experimenter_header {
        uint16_t type;         /* OFPAT_EXPERIMENTER. */
        uint16_t len;          /* Length is a multiple of 8. */
        uint32_t experimenter; /* Experimenter ID */
        /* Experimenter-defined arbitrary additional data. */
    };
    OFP_ASSERT(sizeof(struct ofp_action_experimenter_header) == 8);

BEBA experimenter actions have the following structure:

    struct ofp_beba_action_experimenter_header {
        struct ofp_action_experimenter_header header;
        uint32_t act_type;
        uint8_t pad[4];
    };
    OFP_ASSERT(sizeof(struct ofp_beba_action_experimenter_header) == 16);

The type field and the experimenter field in ofp_action_experimenter_header are set to OFPAT_EXPERIMENTER and 0xBEBABEBA respectively.

The act_type field is set to one of the following values:

    enum ofp_exp_actions {
        OFPAT_EXP_SET_STATE = 0,
        OFPAT_EXP_SET_FLAG
    };

4.2.1 Set-state action

This action is a BEBA experimenter action with the act_type field set to OFPAT_EXP_SET_STATE. It sets flow states in a particular stage of the pipeline. The following structure describes the body of the set-state action:

    /*
     * Action structure for OFPAT_EXP_SET_STATE.
     */
    struct ofp_exp_action_set_state {
        struct ofp_beba_action_experimenter_header header;
        uint32_t state;      /* State value. */
        uint32_t state_mask; /* State mask */
        uint8_t table_id;    /* Stage destination */
        uint8_t pad[3];
        uint32_t hard_rollback;
        uint32_t idle_rollback;
        uint32_t hard_timeout;
        uint32_t idle_timeout;
        uint8_t pad2[4];     /* Align to 64-bits. */
    };
    OFP_ASSERT(sizeof(struct ofp_exp_action_set_state) == 48);

The state field is used to set the value to be inserted (or updated) in the state table.

The state_mask field specifies which bits of the state field should be modified. A state_mask with value 0xFFFFFFFF indicates that the state field must be entirely overwritten.

The table_id field specifies the target stage of the state update action.

The hard_timeout and idle_timeout fields specify the number of microseconds before the state is rolled back to hard_rollback and idle_rollback, respectively.
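The size of the set-state action can be checked at compile time. The sketch below repeats the three structures as listed in this section and uses the C11 `_Static_assert` keyword in place of the OFP_ASSERT macro. On common ABIs the compiler inserts no padding and the listed fields sum to 48 bytes (16 for the BEBA action header plus 32 for the body), which is a multiple of 8 as OpenFlow requires for actions. The helper `set_state_action_len` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct ofp_action_experimenter_header {
    uint16_t type;
    uint16_t len;
    uint32_t experimenter;
};

struct ofp_beba_action_experimenter_header {
    struct ofp_action_experimenter_header header;
    uint32_t act_type;
    uint8_t pad[4];
};

struct ofp_exp_action_set_state {
    struct ofp_beba_action_experimenter_header header;
    uint32_t state;
    uint32_t state_mask;
    uint8_t table_id;
    uint8_t pad[3];
    uint32_t hard_rollback;
    uint32_t idle_rollback;
    uint32_t hard_timeout;
    uint32_t idle_timeout;
    uint8_t pad2[4];
};

_Static_assert(sizeof(struct ofp_beba_action_experimenter_header) == 16,
               "BEBA action header is 16 bytes");
_Static_assert(offsetof(struct ofp_exp_action_set_state, table_id) == 24,
               "table_id follows header, state and state_mask");
_Static_assert(sizeof(struct ofp_exp_action_set_state) == 48,
               "set-state action is 48 bytes");

/* Value a sender would place in the action's `len` field. */
size_t set_state_action_len(void)
{
    return sizeof(struct ofp_exp_action_set_state);
}
```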

4.2.2 Set-flag action

This action is a BEBA experimenter action with the act_type field set to OFPAT_EXP_SET_FLAG. It sets global states. The following structure describes the body of the set-flag action:

    /*
     * Action structure for OFPAT_EXP_SET_FLAG
     */
    struct ofp_exp_action_set_flag {
        struct ofp_beba_action_experimenter_header header;
        uint32_t flag;      /* Flag value */
        uint32_t flag_mask; /* Flag mask */
    };
    OFP_ASSERT(sizeof(struct ofp_exp_action_set_flag) == 24);

The flag field specifies the new value of the global states.

The flag_mask field specifies which bits of the global state should be modified. A flag_mask with value 0xFFFFFFFF indicates that the global state field should be entirely overwritten.





4.3 Match fields

The BEBA experimenter flow match fields are standard OpenFlow Extensible Match (OXM) TLVs. The standard flow match field structure consists of a header (oxm_class, oxm_field, oxm_hasmask and oxm_length) and a body.

The oxm_class must be set to OFPXMC_EXPERIMENTER.

The oxm_field can have the following values:

    enum oxm_exp_match_fields {
        OFPXMT_EXP_FLAGS, /* Global States */
        OFPXMT_EXP_STATE  /* Flow State */
    };

The first four bytes of the OXM TLV’s body contain the BEBA experimenter identifier 0xBEBABEBA. The usual oxm_value and oxm_mask start from the fifth byte.

4.3.1 State match field

The experimenter OFPXMT_EXP_STATE field is used in the flow table to match on the state value defined in the virtual packet header field, which is returned by a state table in a stateful stage. It is a 32-bit field.

OFPXMT_EXP_STATE is maskable, so it is possible to match it either exactly or with wildcards. A 0 bit in the mask means that the corresponding state bit is “do not care”, while a 1 bit means “exact match”.

    /*
     * Flow state field definition
     */
    #define OXM_EXP_STATE   OXM_HEADER   (0xFFFF, 1, 8)
    #define OXM_EXP_STATE_W OXM_HEADER_W (0xFFFF, 1, 6)

4.3.2 Flags match field

Right after the packet headers are parsed, the global states are retrieved and written in the flags field. OXM_EXP_FLAGS is a maskable field, so it is possible to match it either exactly or with wildcards. A 0 bit in the mask means that the corresponding flag is “do not care”, while a 1 bit means “exact match”.

    /* Global States */
    #define OXM_EXP_FLAGS   OXM_HEADER   (0xFFFF, 0, 8)
    #define OXM_EXP_FLAGS_W OXM_HEADER_W (0xFFFF, 0, 6)

Example match:

flags=(4,5)

This command matches the flags configuration *****************************1*0: 4 in binary is 100 and the mask 5 is 101, which gives an exact match on the first least significant bit (value 0) and on the third least significant bit (value 1), and “don’t care” on all the other flags. To perform an exact match on the flags value, no mask is required.

Example match:

flags=4

NB: this match is very different from the previous one. With this command we match the flags configuration 00000000000000000000000000000100, so it is an exact match.
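The difference between the two example matches can be checked with a small sketch of the masked comparison. The function name `flags_match` is hypothetical, and an unmasked match is modelled here as a mask of 0xFFFFFFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masked comparison: only the bits set in `mask` are compared. */
bool flags_match(uint32_t flags, uint32_t value, uint32_t mask)
{
    return (flags & mask) == (value & mask);
}
```

With flags=(4,5), a switch whose global state is 6 (binary 110) still matches, because bit 2 is ignored by the mask; with flags=4 it does not, since every bit must be equal.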

4.4 Experimenter statistics message

While the system is running, the controller may request information from the datapath using the OFPT_MULTIPART_REQUEST message:

    struct ofp_multipart_request {
        struct ofp_header header;
        uint16_t type;   /* One of the OFPMP_* constants. */
        uint16_t flags;  /* OFPMPF_REQ_* flags. */
        uint8_t pad[4];
        uint8_t body[0]; /* Body of the request. */
    };
    OFP_ASSERT(sizeof(struct ofp_multipart_request) == 16);

    enum ofp_multipart_request_flags {
        OFPMPF_REQ_MORE = 1
    };

An OpenFlow experimenter multipart message has the type field set to OFPMP_EXPERIMENTER (in both ofp_multipart_request and ofp_multipart_reply), and the first bytes of the request and reply bodies have the following structure:

    /* Body for ofp_multipart_request/reply of type OFPMP_EXPERIMENTER. */
    struct ofp_experimenter_stats_header {
        uint32_t experimenter; /* Experimenter ID */
        uint32_t exp_type;     /* Experimenter defined. */
        /* Experimenter-defined arbitrary additional data. */
    };
    OFP_ASSERT(sizeof(struct ofp_experimenter_stats_header) == 8);

All the BEBA experimenter multipart messages have the experimenter field set to 0xBEBABEBA and the exp_type field set to one of the following types:

    enum ofp_stats_extension_commands {
        OFPMP_EXP_STATE_STATS,
        OFPMP_EXP_FLAGS_STATS
    };

4.4.1 State statistics message

The State statistics message is a BEBA multipart experimenter message with the exp_type field set to OFPMP_EXP_STATE_STATS. This message is used by the controller to get statistics from either all the state tables or a single one. Furthermore, it is possible to retrieve information about a set of state entries that satisfy a specific key.

    /* Body for ofp_multipart_request of type OFPMP_EXP_STATE_STATS. */
    struct ofp_exp_state_stats_request {
        struct ofp_experimenter_stats_header header;
        uint8_t table_id;       /* ID of table to read (from ofp_table_stats),
                                   OFPTT_ALL for all tables. */
        uint8_t get_from_state;
        uint8_t pad[2];         /* Align to 64 bits. */
        uint32_t state;
        struct ofp_match match; /* Fields to match. Variable size. */
    };
    OFP_ASSERT(sizeof(struct ofp_exp_state_stats_request) == 24);

    /* Body of reply to OFPMP_EXP_STATE_STATS request. */
    struct ofp_exp_state_stats_reply {
        struct ofp_experimenter_stats_header header;
        struct ofp_exp_state_stats *stats;
    };

In both messages, the experimenter ID and the multipart experimenter type must be specified as in the structure ofp_experimenter_stats_header. In the request message, the table_id field specifies the table to be queried and the match fields are used to selectively extract specific state entries.

If the get_from_state field is different from 0, this message queries the state table for the entries that are in the state given by the state field. If get_from_state is 0, the state field is ignored.

The reply to an OFPMP_EXP_STATE_STATS multipart request consists of an array of ofp_exp_state_stats structures, each of which embeds an ofp_exp_state_entry:

    /* Structure of a single state entry */
    struct ofp_exp_state_entry {
        uint32_t key_len;
        uint8_t key[OFPSC_MAX_KEY_LEN];
        uint32_t state;
    };
    OFP_ASSERT(sizeof(struct ofp_exp_state_entry) == 56);

    /* Structure of a single state statistic */
    struct ofp_exp_state_stats {
        uint16_t length;      /* Length of this entry. */
        uint8_t table_id;     /* ID of table flow came from. */
        uint8_t pad;
        uint32_t field_count; /* Number of extractor fields */
        uint32_t fields[OFPSC_MAX_FIELD_COUNT]; /* Extractor fields */
        struct ofp_exp_state_entry entry;       /* State entry */
    };
    OFP_ASSERT(sizeof(struct ofp_exp_state_stats) == 88);

If more than one entry matches the request, multiple ofp_exp_state_stats structures are appended to the message. The length of a single ofp_exp_state_stats is stored in the length field, the table_id field specifies the source table, the field_count field specifies the number of extractor fields stored in fields[], and the entry field specifies the state entry: key, key_len and state.

4.4.2 Global state statistics message

The Global state statistics message is a BEBA multipart experimenter message with the exp_type field set to OFPMP_EXP_FLAGS_STATS. It is used by the controller to retrieve the values of the global states of a specific datapath.

    /* Body for ofp_multipart_request of type OFPMP_EXP_FLAGS_STATS. */
    struct ofp_exp_global_state_stats_request {
        struct ofp_experimenter_stats_header header;
    };
    OFP_ASSERT(sizeof(struct ofp_exp_global_state_stats_request) == 8);

    /* Body of reply to OFPMP_EXP_FLAGS_STATS request. */
    struct ofp_exp_global_state_stats {
        struct ofp_experimenter_stats_header header;
        uint8_t pad[4];
        uint32_t global_states;
    };
    OFP_ASSERT(sizeof(struct ofp_exp_global_state_stats) == 16);

In both messages, the experimenter ID and the multipart experimenter type must be specified as in the structure ofp_experimenter_stats_header. The request message has an empty body, whereas the reply message carries in global_states a 32-bit field that holds the current value of the global states.





5 Use case implementation

To give a better understanding of the novel BEBA basic forwarding abstraction, in this section we describe how three simple, yet explanatory, applications are implemented by a BEBA switch supporting the basic API. Please note that this is a high-level description of the implementation; a deeper view of the controller’s source code and any further practical considerations will be provided in WP5 specific deliverables.

5.1 MAC learning

This use case is very simple and yet really useful to fully understand several aspects of the proposed data plane forwarding approach. With a BEBA switch a MAC learning operation becomes trivial, whereas with “standard” OpenFlow it would require an explicit interaction with the controller for each new flow.

Figure 3 Mealy Machine for the MAC learning use case

In the behavioural model depicted in Figure 3, we identify the state associated with a flow identity (namely, a MAC address) as the current switch port to which packets should be forwarded (or DEFAULT if no port has been learned yet). During state lookup, the lookup-scope is set to be the MAC destination address. During state update, we define the MAC source address as the update-scope. Finally, we fill the FSM table with the transitions given in Figure 4. Thanks to the update-scope, the pair used in the State Table update is thus (MAC source address, input port).

Let us consider a simple bidirectional packet exchange between two hosts, H1 behind port 1 and H2 behind port 2. At time 0 the switch FSM table and state table are empty and H1 sends an Ethernet frame (the actual Ethernet payload type is irrelevant for this use case) to H2 (and thus eth_src=H1 and eth_dst=H2). The lookup-scope defined by the user makes the switch retrieve the state associated with eth_dst=H2. Since the state table is empty (i.e. the switch does not yet know the location of any host attached to itself), the flow associated with H2 is in DEFAULT state (i.e. state 0). Since the input port is 1, the first line of the table in Figure 4 is matched and the packet is flooded through all possible output ports (except port 1). At the same time the action set for the matched flow entry makes use of the novel set_state() primitive to learn where the sender is located. More specifically, the state for the flow identified by the update-scope eth_src=H1 is set to 1 (i.e. the port from which the packet was received).

Figure 4 FSM table for the MAC learning use case

Priority | Match              | Actions
0        | in_port=1, state=0 | set_state(1, 0), flood()
0        | in_port=1, state=1 | set_state(1, 0), output(1)
0        | in_port=1, state=2 | set_state(1, 0), output(2)
...
0        | in_port=2, state=0 | set_state(2, 0), flood()
0        | in_port=2, state=1 | set_state(2, 0), output(1)
...
0        | in_port=N, state=N | set_state(N, 0), output(N)

lookup_scope: [eth_dst] update_scope: [eth_src]

From this point on, packets addressed to H1 will no longer be flooded, as any packet with eth_dst=H1 will be associated with a flow in state=1, and thus forwarded through port 1.

Please note that in this use case we do not specify a value for the idle timeout. Nevertheless, by setting an idle timeout value different from 0 we can trivially implement an aging mechanism and free the state table memory occupied by expired entries.
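The H1/H2 exchange can be replayed with a small simulation of the stateful stage. This is only a sketch: hosts are represented by small integer identifiers instead of MAC addresses, the state table is a fixed array, set_state(p, 0) is read as "write state p in table 0", and the names `mac_learner`, `mac_learning_step` and `FLOOD` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HOSTS 8
#define FLOOD     0u   /* hypothetical return code meaning "flood" */

/* State table indexed by host id. State 0 is DEFAULT (port unknown),
 * state p > 0 means "reachable through port p". */
struct mac_learner {
    uint32_t state[MAX_HOSTS];
};

/* Process one frame: look up the state keyed by eth_dst (lookup-scope),
 * update the state keyed by eth_src (update-scope), and return the
 * output port, or FLOOD when the destination is still unknown. */
uint32_t mac_learning_step(struct mac_learner *sw, uint32_t in_port,
                           unsigned eth_src, unsigned eth_dst)
{
    assert(eth_src < MAX_HOSTS && eth_dst < MAX_HOSTS);
    uint32_t dst_state = sw->state[eth_dst];    /* state lookup */
    sw->state[eth_src] = in_port;               /* set_state(in_port, 0) */
    return dst_state == 0 ? FLOOD : dst_state;  /* flood() or output() */
}
```

Calling the function for H1 to H2 on port 1 returns FLOOD and learns H1; the reply from H2 on port 2 is then sent straight to port 1, and all later frames in both directions are forwarded without flooding and without involving the controller.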





5.2 Forwarding Consistency

Figure 5 Mealy machine for the forwarding consistency scheme

Load balancing traffic over multiple paths (also known as load sharing) is an important feature that allows flexible and efficient allocation of network resources. The trick here is to have network switches use i) a link selection scheme that guarantees the desired (optionally weighted) splitting and, most importantly, ii) consistency in the forwarding of packets of the same transport layer flow (e.g. TCP) in order to avoid packet reordering at the receiver, which can cause unnecessary throughput degradation.

Starting from OpenFlow 1.1, the select group type has been introduced to support load sharing over multiple ports. Citing the latest OpenFlow 1.5 specification: “Packets are processed by a single bucket in the group, based on a switch-computed selection algorithm (e.g. hash on some user-configured tuple or simple round robin). All configuration and state for the selection algorithm is external to OpenFlow”. Thus in OpenFlow selection and consistency are tied together and left to vendors’ implementations. For example, HP OpenFlow switches use a per-packet round-robin scheduler with no consistency features [HPOF14], while older versions of Open vSwitch used only a hash on the Ethernet destination address (without any proper rationale behind this decision) [PFA14].

Different hashing schemes exist, each one with its associated trade-offs; we therefore argue that the choice of a selection scheme should be separated from the granularity of the states required to provide consistency. For example, it has been shown in [KAN07] how providing consistency at the level of TCP bursts (instead of pinning the whole flow to a specific path) guarantees more accurate load shares with hardly any out-of-order packets.

Figure 6 Implementation of the destination based load balancer using the forwarding consistency mechanism described in this section

By using flow states and associated idle timeouts, the BEBA forwarding abstraction allows a programmer to choose the granularity and the lifetime of a forwarding decision. Figure 5 shows the behavioural model (in the form of a Mealy machine) used to implement such a scheme, while Figure 6 presents a detailed description of the tables needed to implement a destination-based load balancer using the BEBA basic API. The granularity of the splitting is defined by the lookup scope; in this example a 4-tuple is used to identify a unique TCP flow. For each incoming packet of a new TCP connection, the state table returns state 0 (DEFAULT) and the flow table invokes the group entry corresponding to the matched destination IP address. A random bucket is then selected from the group entry, the state is updated in the state table and the packet is forwarded accordingly. Subsequent packets are forwarded using the value returned by the state table. The idle timeout δ defines the lifetime of the forwarding decision. For example, with δ = 10s the state is maintained only as long as a packet of the given TCP flow (or of whatever flow a different lookup scope describes, e.g. a UDP flow or an L2 source-destination pair) is seen at least once every 10s. In this case it is reasonable to assume that an idle interval of 10s marks the end of an instance of a TCP flow. Alternatively, smaller values of δ can be used to distinguish bursts of the same flow. A new forwarding decision is then taken for each burst, maximizing load-share accuracy while minimizing the risk of packet reordering at the receiver.
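The interplay between the state table, the random bucket selection and the idle timeout δ can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the BEBA implementation: the class name, the `forward` method and the explicit `now` timestamp are hypothetical, and the flow key stands in for whatever the lookup scope extracts (a 4-tuple in the example above).

```python
import random


class ConsistentLoadBalancer:
    """Toy model: per-flow forwarding state with an idle timeout (hypothetical API)."""

    def __init__(self, ports, idle_timeout, rng=None):
        self.ports = list(ports)            # buckets of the select group
        self.idle_timeout = idle_timeout    # delta, in seconds
        self.rng = rng if rng is not None else random.Random()
        self.state_table = {}               # flow key -> (port, last_seen)

    def forward(self, flow_key, now):
        entry = self.state_table.get(flow_key)
        if entry is not None and now - entry[1] <= self.idle_timeout:
            # Known flow whose state has not expired: reuse the decision.
            port = entry[0]
        else:
            # DEFAULT state (new flow or expired entry): pick a random bucket.
            port = self.rng.choice(self.ports)
        # Store the decision and refresh the idle timer.
        self.state_table[flow_key] = (port, now)
        return port
```

With a large δ the decision is pinned for the whole connection; with a small δ each burst of the same flow triggers a fresh random selection.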

The benefits of this approach are best seen by comparison with an implementation based on OpenFlow switches that provide no means of forwarding consistency, as in the HP case presented above. There, each time the switch receives the first packet of a new instance of a transport-layer flow and selects an output port through the group table, it must inform the controller of the decision; the controller in turn installs a higher-priority flow-mod that guarantees consistency by explicitly forwarding all packets of that flow to the previously selected output port. The switch-controller RTT and the processing delay at the (logically centralized, possibly distributed) controller clearly make this approach hard to scale in large networks with an increasing arrival rate of new flows. The same reactive mechanism is needed when a hashing scheme different from the one implemented by the switches is required. Similarly, consistently splitting packet bursts by maintaining state at the controller would not be viable at all, given the high frequency of control messages needed. A more detailed performance evaluation will be documented in WP5-specific deliverables.

5.3 DDoS detection

In this section we describe a simple DDoS detection and mitigation mechanism. The proposed use case is not meant to introduce a novel security algorithm; rather, it demonstrates a basic BEBA capability that the standard stateless OpenFlow data plane cannot offer without forwarding each new connection to a controller, namely the ability to distinguish flows generated before and after a given event (an attack in this case).

This application performs two monitoring stages. In Stage1, the switch monitors the bit-rate of incoming TCP SYN packets addressed to a finite list of possible destinations and keeps a state for each source/destination IP flow. For each new flow, identified by the pair (IP.src, IP.dst), the switch acts according to the following strategy. If the new flow is addressed to a destination for which the SYN bit-rate is under a given threshold, the flow is marked as GREEN and forwarded through the proper switch output port. If instead the new flow is addressed to a destination for which the meter threshold is exceeded, the flow is marked as YELLOW and forwarded to a second monitoring stage. From this point on, all new connections to the same IP address are sent to the second monitoring stage, while all GREEN flows (i.e. those generated before the actual attack) continue to be forwarded. The second monitoring stage (Stage2) is analogous to the first one: if a second SYN-rate threshold is exceeded for a given IP address, all new flows addressed to this IP address are dropped.





Figure 7 DDoS implementation

Figure 7 describes the implementation of the simple mechanism described above, which is realized with a four-table pipeline: Table0 and Table1 implement Stage1; Table2 and Table3 implement Stage2.

Table0
  match                      actions
  ip.dst=10.0.0.2, SYN       Meter1a, GOTO Table1
  *                          FORWARD

Table1
  Lookup scope: ip.dst   Update scope: ip.dst
  STATE TABLE
  key                                state
  IP.src|IP.dst=10.0.0.1|10.0.0.2    GREEN
  IP.src|IP.dst=10.0.0.2|10.1.0.2    YELLOW
  XFSM TABLE
  match                      actions
  state=DEF, dscp=0          set_state(GREEN), FWD
  state=DEF, dscp=1          set_state(YELLOW), GOTO T2
  state=GREEN                set_state(GREEN), FWD
  state=YELLOW, dscp=0       set_state(GREEN), FWD
  state=YELLOW, dscp=1       set_state(YELLOW), GOTO T2

Table2
  match                      actions
  ip.dst=10.0.0.2, SYN       Meter1b, GOTO Table3
  *                          FORWARD

Table3
  Lookup scope: ip.dst   Update scope: ip.dst
  STATE TABLE
  key                                state
  IP.src|IP.dst=10.0.0.2|10.1.0.2    YELLOW
  IP.src|IP.dst=10.3.0.10|10.0.0.2   RED
  XFSM TABLE
  match                      actions
  state=DEF                  set_state(YELLOW), FWD
  state=YELLOW, dscp=1       set_state(YELLOW), FWD
  state=YELLOW, dscp=2       set_state(RED), DROP
  state=RED, dscp=2          set_state(RED), DROP
  state=RED, dscp=1          set_state(YELLOW), FWD

Table0 and Table2 are configured to measure the SYN rate towards a predefined set of IP destinations and are instantiated at startup by the controller using OpenFlow DSCP meters. It is worth noting that the DSCP field is used to propagate information between tables. A much cleaner solution might be based on a new meter type able to write metadata. As a result of these meter instructions, all flows exceeding the given threshold and burst size are marked with the DSCP field set to 1 by Table0 and to 2 by Table2.
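The marking performed by Table0 and Table2 can be approximated by a token bucket that rewrites the DSCP value of non-conforming packets. The Python sketch below is a simplification under stated assumptions: the class and its parameters are hypothetical, the rate is counted in packets per second rather than bits per second, and the out-of-profile packet is assigned a fixed DSCP value, whereas an actual OpenFlow DSCP remark meter band raises the drop precedence of the existing value.

```python
class DscpMarkingMeter:
    """Toy token-bucket meter that marks out-of-profile packets (hypothetical)."""

    def __init__(self, rate, burst, mark):
        self.rate = rate             # tokens (packets) added per second
        self.burst = burst           # bucket depth, i.e. tolerated burst size
        self.mark = mark             # DSCP value written when over threshold
        self.tokens = float(burst)   # the bucket starts full
        self.last = 0.0              # timestamp of the previous packet

    def apply(self, dscp, now):
        # Refill the bucket according to the time elapsed since the last packet.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            # In profile: consume one token and leave the DSCP untouched.
            self.tokens -= 1.0
            return dscp
        # Out of profile: propagate the information through the DSCP field.
        return self.mark
```

A meter created with `mark=1` would play the role of the Stage1 meter and one with `mark=2` that of the Stage2 meter.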

Table1 implements the actual FSM for Stage1. As shown in Figure 7, the state table has lookup and update scopes set to IP.src|IP.dst, and the possible states are DEFAULT, GREEN and YELLOW. The FSM table implements the following behavior.

For each new flow (state DEFAULT) the switch checks the DSCP field. If DSCP = 0 (meter band under threshold), the flow is marked with state GREEN and the packet is forwarded (entry 1). If DSCP = 1 (meter band threshold exceeded), the flow is marked with state YELLOW and the packet is pipelined to Table2 (entry 2). All packets belonging to flows marked as GREEN (entry 3) are forwarded and the flow is kept GREEN (these packets do not pass through Stage2). For each packet belonging to a flow marked as YELLOW, the switch checks the DSCP field. If DSCP = 0 (meter band back under threshold), the flow state is set to GREEN and the packet is forwarded (entry 4). If DSCP = 1 (meter band still over threshold), the flow is kept in state YELLOW and the packet is pipelined to Table2 (entry 5).
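The five entries just described form a small transition table. A minimal Python sketch of the Stage1 FSM follows, assuming the state is keyed on the (IP.src, IP.dst) pair as stated in the text; the function name, the `ANY` wildcard and the action labels are hypothetical and chosen only for illustration.

```python
DEFAULT, GREEN, YELLOW = "DEFAULT", "GREEN", "YELLOW"
FWD, GOTO_T2 = "FWD", "GOTO_T2"
ANY = None  # wildcard on the DSCP match field

# (state, dscp) -> (next_state, action), in the order of entries 1 to 5
XFSM_TABLE1 = [
    ((DEFAULT, 0), (GREEN, FWD)),
    ((DEFAULT, 1), (YELLOW, GOTO_T2)),
    ((GREEN, ANY), (GREEN, FWD)),
    ((YELLOW, 0), (GREEN, FWD)),
    ((YELLOW, 1), (YELLOW, GOTO_T2)),
]


def table1(state_table, ip_src, ip_dst, dscp):
    """Look up the flow state, apply the first matching entry, update the state."""
    key = (ip_src, ip_dst)
    state = state_table.get(key, DEFAULT)
    for (m_state, m_dscp), (next_state, action) in XFSM_TABLE1:
        if m_state == state and m_dscp in (ANY, dscp):
            state_table[key] = next_state
            return action
    raise LookupError(f"no entry for state={state}, dscp={dscp}")
```

A flow that becomes GREEN before the meter threshold is exceeded keeps being forwarded afterwards, whereas a flow that first appears while DSCP = 1 is sent to Table2.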

Table3 implements the actual FSM for Stage2. The state table has the same lookup and update scopes as Table1, and the possible states are DEFAULT, YELLOW and RED. The FSM table implements the following behavior.

If the table receives a packet with DEFAULT state (i.e. the first packet of a flow pipelined to Table3), the flow is marked YELLOW and the packet is forwarded, regardless of the DSCP field value (entry 1). For each packet belonging to a flow marked as YELLOW, the switch checks the DSCP field. If DSCP = 1 (second meter band under threshold), the flow state is kept as YELLOW and the packet is forwarded (entry 2). If DSCP = 2 (second meter band threshold exceeded), the flow state is set to RED and the packet is dropped (entry 3). Finally, for all packets of flows marked as RED the switch checks the DSCP field. If DSCP = 2 (meter band still over threshold) the packet is dropped and the flow state is kept unchanged (entry 4). If DSCP = 1 (meter band back under threshold) the packet is forwarded and the flow state is rolled back to YELLOW (entry 5).
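The Stage2 behaviour can be sketched in the same spirit. The Python snippet below is an illustration only: the function name and the action labels are hypothetical, and it assumes that every packet reaching Table3 carries DSCP 1 (marked by the first meter) or DSCP 2 (marked by the second one), so other values are not handled.

```python
DEFAULT, YELLOW, RED = "DEFAULT", "YELLOW", "RED"

# (state, dscp) -> (next_state, action) for entries 2 to 5
TRANSITIONS = {
    (YELLOW, 1): (YELLOW, "FWD"),
    (YELLOW, 2): (RED, "DROP"),
    (RED, 2): (RED, "DROP"),
    (RED, 1): (YELLOW, "FWD"),
}


def table3(state_table, ip_src, ip_dst, dscp):
    """Apply the Stage2 FSM to one packet and return the resulting action."""
    key = (ip_src, ip_dst)
    state = state_table.get(key, DEFAULT)
    if state == DEFAULT:
        # Entry 1: first packet of a flow in Stage2, forwarded for any DSCP.
        next_state, action = YELLOW, "FWD"
    else:
        # Entries 2 to 5: assumes dscp is 1 or 2 (KeyError otherwise).
        next_state, action = TRANSITIONS[(state, dscp)]
    state_table[key] = next_state
    return action
```

Note that the first packet of a flow is always forwarded by this stage; dropping starts from the second packet seen while the second threshold is exceeded.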





References

[Gre09] K. Greene. “TR10: Software-defined networking”, MIT Technology Review, 2009.

[HPOF14] “HP OpenFlow 1.3 Administrator Guide”, Oct. 2014. [Online]. Available: http://h10032.www1.hp.com/ctg/Manual/c04495114

[Cra13] B. Mack-Crane. “OpenFlow Extensions”. In: US Ignite ONF GENI workshop, October 8, 2013.

[IPFIX] B. Claise et al. “Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information”, IETF RFC 7011.

[Mey13] D. Meyer. “OpenFlow: Today’s Reality, Tomorrow’s Promise? An Architectural Perspective”, available online at http://www.1-4-5.net/~dmm/talks/upperside, March 2013.

[OF08] N. McKeown et al. “OpenFlow: enabling innovation in campus networks”, ACM SIGCOMM Comput. Commun. Rev. 38, 2, pp. 69-74, March 2008.

[OF1.5] Open Networking Foundation. “OpenFlow Switch Specification ver. 1.5.0”, Oct 14, 2013.

[Per13] P. Peresini, M. Kuzniar, and D. Kostic. “OpenFlow Needs You! A Call for a Discussion About a Cleaner OpenFlow API”. In Proc. of the EU Workshop on Software Defined Network (EWSDN), 2013.

[PFA14] B. Pfaff. “OpenFlow 1.3 groups with type select”, ovs-discuss (mailing list), May 2014. [Online]. Available: http://openvswitch.org/pipermail/discuss/2014-May/014118.html

[KAN07] S. Kandula, D. Katabi, S. Sinha, and A. Berger. “Dynamic load balancing without packet reordering”, SIGCOMM Comput. Commun. Rev., vol. 37, no. 2, pp. 51–62, Mar. 2007.

[Zeg14] N. Feamster, J. Rexford, and E. Zegura. “The Road to SDN: An Intellectual History of Programmable Networks”. In ACM Queue, to appear (2014).
