White Paper

Using Cisco ACI in Telecom Data Centers to Enhance Automation, Service Chaining, Scalability, Operational Simplification, and Troubleshooting, and to Provide Consistent Policy Across Any Location


Contents

Telco data center trends
Cisco Application Centric Infrastructure (Cisco ACI) in telco data centers
    Intent-based fabric
    Bare-metal appliance, virtual machine, and container support
    Integrated security
    Control plane and user-plane separation (CUSP) with Cisco ACI Anywhere
    High performance, multispeed, and scale
    Hardware telemetry and operational tools
    Intelligent service chaining
Cisco ACI use case for Gi-LAN
    Gi-LAN using intelligent service chaining
        Automatic traffic symmetry, simplified expansion, and load-balancing
        Service node health check
        Multinode service chaining
        Cisco ACI OpenStack plug-in
Cisco ACI use case for Evolved Packet Core (EPC) and virtual Evolved Packet Core (vEPC)
    Control/data plane connectivity
    Cisco ACI integration with Cisco Network Functions Virtualization Infrastructure (NFVI)
    Faster convergence
Simplified operations
    Topology dashboard
    Health score card
    Faults across fabric
    Upgrade and downgrade of fabric
    Capacity dashboard
    Endpoint tracker
    Troubleshooting wizard
    Traffic map and statistics
    Configuration handholding – “Show me how?”
Conclusion
References


Telco data center trends

Telecom operators build data centers to provide voice, Internet, voice over Wi-Fi (VoWiFi), media content, and online applications to mobile subscribers. With unprecedented growth in mobile subscribers and Internet traffic driven by social media, video demand, and online applications, these data centers demand consistent low latency, faster convergence, dual-stack connectivity, multispeed interfaces, high bandwidth, and a very high degree of redundancy.

Telco data centers typically host different types of servers and services, such as:

● IP Multimedia Subsystem (IMS) servers for voice and VoWiFi

● Serving Gateways (SGW) and Packet Data Network Gateways (PGW)

● Policy and Charging Rules Function (PCRF) servers to apply subscriber-specific policies

● Gi-LAN services such as TCP optimizers, Carrier-Grade Network Address Translation (CG-NAT), Deep Packet Inspection (DPI), firewalls, and load balancers that provide security, improve bandwidth utilization, and offer a better experience to end subscribers

● Multimedia services such as Evolved Multimedia Broadcast Multicast Services (eMBMS), Content Delivery Network (CDN) servers, and Over-The-Top (OTT) servers

● IP support systems such as DNS, Authentication, Authorization, and Accounting (AAA), Dynamic Host Configuration Protocol (DHCP), TACACS, and OSS and BSS systems

Applications hosted in these data centers have a mixed environment. Some applications are hosted on custom physical appliances, while others are hosted on virtual servers. Applications on virtual servers have different hypervisor requirements. Newer applications are delivered through a micro-services architecture and need container support. To support all of these requirements, the data center fabric is expected to support both physical and virtual environments, and should be able to integrate with multiple hypervisors and containers.

Figure 1, below, illustrates the landscape of telco data centers. Typically, telecom operators build data centers at central and edge locations, but due to increasing demands, and to provide better experiences to subscribers, some services are moving to aggregation layers.

Figure 1. Landscape of telco data centers


Key demands in these data centers are:

● Consistent policies and centralized management across each data center: central, edge, or aggregation

● Massive scalability

● Automation

● Policy-driven configurations

● Network Functions Virtualization (NFV)

● Service chaining

● Telemetry

● Security

● Ease of operation

Cisco Application Centric Infrastructure (Cisco ACI) in telco data centers

Cisco ACI™ provides the solution for high-performance, massively scalable data centers across any geographical location, with centralized management and consistent security and telemetry policies. The Cisco ACI Fabric has three key components:

● Cisco Application Policy Infrastructure Controller (APIC)

The APIC provides a single point of management for the fabric switches, including automation, troubleshooting, and operations management for the whole fabric.

● Fabric switches

The Cisco ACI Fabric is built with Cisco Nexus® 9000 Series Switches that connect leaf and spine switches in a full-mesh topology. It provides consistent low latency and high performance. Fabric switches support interfaces from 100M to 100G (100M, 1G, 10G, 25G, 40G, and 100G), with 400G planned for the future. This gives users the choice to connect different applications with different bandwidth requirements in the same fabric.

● Policies

Cisco ACI policies define multitenancy, security, telemetry, and service chaining in the fabric. A policy is configured on the APIC and is automatically applied to switches as needed. For example, the APIC can be configured with a service-chaining policy that states that communication from a source to a destination should first go through a firewall and then a TCP optimizer before reaching the destination. The APIC translates this policy and configures the switches based on the locations of the source, destination, and service devices.

Following are some key reasons why using Cisco ACI makes sense in telco data centers. (See Figure 2.)

Figure 2. Cisco ACI for telco data centers


Intent-based fabric

The APIC provides end-to-end automation and management, including day-0 fabric bring-up, day-1 fabric-wide provisioning, and day-2 operations.

To bring up a fabric on day 0, a fabric admin connects to the APIC management console (CIMC) and provides a few simple parameters, such as the fabric subnet, the out-of-band management IP, and the APIC login credentials. Once the fabric admin registers the automatically discovered Cisco ACI leaf and spine switches, the APIC brings up the VXLAN fabric automatically. Figure 3 lists the steps for bringing up the Cisco ACI Fabric.

Figure 3. Fully automated provisioning of Cisco ACI Fabric

The network admin uses the APIC as a single pane of glass for complete day-1 network provisioning. The APIC deploys the configuration policies on switches based on the application server location.

Network operators use the APIC as a single pane of glass to operationalize the network. The APIC becomes a single tool to troubleshoot the network and to provide telemetry information, software image upgrades and downgrades, and integration with northbound OSS/BSS tools.
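Because every APIC workflow is exposed through a RESTful API, northbound OSS/BSS tools can drive the same day-1 and day-2 operations programmatically. The following is a minimal sketch in Python of the documented aaaLogin call, assuming a reachable APIC at a hypothetical address and placeholder credentials; it illustrates the integration pattern rather than a complete OSS integration.

# Minimal sketch: authenticate to the APIC REST API (hypothetical host and credentials).
import requests

APIC = "https://apic.example.com"      # assumed APIC address
session = requests.Session()
session.verify = False                 # lab convenience only; use proper certificates in production

login = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
resp = session.post(f"{APIC}/api/aaaLogin.json", json=login)
resp.raise_for_status()                # the session now carries the APIC authentication cookie

# Any subsequent query or configuration call reuses the same session, for example:
fabric_health = session.get(f"{APIC}/api/class/fabricHealthTotal.json").json()
print(fabric_health)

Later sketches in this paper reuse this authenticated session and the APIC variable.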

Bare-metal appliance, virtual machine, and container support

Most telco data centers have a mixed environment of virtual and physical appliances. Applications often have their own hypervisor requirements. Cisco ACI provides a single platform that enables automation for bare-metal appliances, virtual machines, and containers.

Cisco ACI supports integration with OpenStack, vCenter, Hyper-V, and Kubernetes; this integration is called Virtual Machine Manager (VMM) domain integration. The APIC pushes Cisco ACI policies to the network switches based on the location of the virtual machine or container.

With VMM domain integration, the APIC automates the virtual network configuration. The APIC automatically creates port groups on the VMM when an application's network profile is created. Server admins can use these port groups to instantiate Virtual Machines (VMs). When a VM with an APIC-configured port group is identified on a switch, the APIC configures that switch with the correct VLAN/VXLAN configuration.
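To illustrate this workflow, the hedged sketch below associates an existing EPG with a vCenter VMM domain through the REST API, which is what triggers the APIC to create the corresponding port group. The tenant, application profile, EPG, and domain names are placeholders, and the authenticated session from the earlier login sketch is assumed.

# Sketch: attach an existing EPG to a VMM domain so the APIC creates the port group.
# Reuses APIC and the authenticated "session" from the login sketch earlier in this paper.
epg_dn = "uni/tn-Telco/ap-GiLAN/epg-TCP-Optimizer"      # hypothetical EPG
payload = {
    "fvRsDomAtt": {
        "attributes": {
            "tDn": "uni/vmmp-VMware/dom-vC-Domain",      # assumed vCenter VMM domain name
            "resImedcy": "immediate"                     # deploy policy as soon as the VM attaches
        }
    }
}
resp = session.post(f"{APIC}/api/mo/{epg_dn}.json", json=payload)
resp.raise_for_status()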


Figure 4. Cisco ACI Fabric in a telco data center

With VMM domain integration, Cisco ACI provides visibility into multiple hypervisors and containers in a single view. The following screen shot of the VMM domain from the APIC shows the different VMM and container domains supported by Cisco ACI:


The following screen shot from the APIC shows information about VMs and virtual networks for vCenter integration. Similar information is provided for other VMM domains as well.

The following screen shot shows the container domain and container endpoints from the APIC:

Integrated security

Cisco ACI uses white-list policies. By default, it does not allow communication between different Endpoint Groups (EPGs). An endpoint group is a collection of endpoints, such as servers, that share the same policy. An explicit contract is needed to allow communication between endpoint groups.
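As a concrete illustration of the white-list model, the hedged sketch below creates a filter, a contract, and the provider/consumer relationships through the REST API; without the contract, the two EPGs cannot exchange traffic. The tenant, EPG, filter, and contract names are placeholders, and the authenticated session from the earlier login sketch is assumed.

# Sketch: white-list a single TCP port between two hypothetical EPGs.
tenant = "Telco"
contract_cfg = {
    "fvTenant": {
        "attributes": {"name": tenant},
        "children": [
            {"vzFilter": {"attributes": {"name": "allow-sip"},
                          "children": [{"vzEntry": {"attributes": {
                              "name": "tcp-5060", "etherT": "ip", "prot": "tcp",
                              "dFromPort": "5060", "dToPort": "5060"}}}]}},
            {"vzBrCP": {"attributes": {"name": "ims-signaling"},
                        "children": [{"vzSubj": {"attributes": {"name": "subj1"},
                                                 "children": [{"vzRsSubjFiltAtt": {"attributes": {
                                                     "tnVzFilterName": "allow-sip"}}}]}}]}}
        ]
    }
}
session.post(f"{APIC}/api/mo/uni.json", json=contract_cfg).raise_for_status()

# Provider and consumer relationships on the two EPGs (hypothetical DNs):
session.post(f"{APIC}/api/mo/uni/tn-{tenant}/ap-IMS/epg-SIP-Servers.json",
             json={"fvRsProv": {"attributes": {"tnVzBrCPName": "ims-signaling"}}}).raise_for_status()
session.post(f"{APIC}/api/mo/uni/tn-{tenant}/ap-IMS/epg-Clients.json",
             json={"fvRsCons": {"attributes": {"tnVzBrCPName": "ims-signaling"}}}).raise_for_status()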

A user can choose to use micro-segmentation based on security and application requirements. Cisco ACI provides one of the most comprehensive micro-segmentation solutions in the networking industry. Figure 5 provides an overview of micro-segmentation support in Cisco ACI.


Figure 5. Micro-segmentation with Cisco ACI

Control plane and user-plane separation (CUSP) with Cisco ACI Anywhere

The current trend in telco data centers is to separate the user plane and the control plane. The control plane remains in central data centers, while the user plane is moving to edge data centers that are closer to subscribers. Other services, such as OTT caching and Gi-LAN, are also moving to edge data centers as part of this trend toward data center distribution.

Since telecom data centers are becoming distributed, there is a demand for centralized management and consistent policy across any location. The Cisco ACI Anywhere vision aligns with distributed telco data centers: customers can use the same policy in central, edge, and aggregation data centers.

Cisco ACI offers the following three primary solutions for connecting distributed telco data centers, where services can be deployed in any location but management is central and policy is consistent:

● Cisco ACI multipod

● Cisco ACI multisite

● Cisco ACI remote leaf

In a future software release, Cisco ACI policies will also be extensible to private and public clouds.

Cisco ACI multisite and remote leaf solutions allow a network operator to establish secure communication between SGW-U and SGW-C, and between PGW-U and PGW-C. These solutions allow centralized management of all distributed data centers by providing full day-0 and day-1 automation, consistent day-1 policies, and end-to-end troubleshooting across any location.


Figure 6. Cisco ACI Anywhere

Cisco ACI multipod

The Cisco ACI multipod solution provides connectivity between Cisco ACI pods over an IP network. Consider each pod as a single leaf-and-spine fabric that runs an independent control plane with IS-IS, BGP, and COOP. The Cisco ACI multipod solution uses BGP EVPN to exchange control-plane prefixes and the VXLAN header to carry Cisco ACI policy information across pods.

APIC instances across different pods form a single APIC cluster that manages all of the pods. Customers use a single APIC cluster interface to manage all of the pods. The APIC provides automation for day-0 fabric build-up, day-1 rendering of Cisco ACI policies on switches, and day-2 troubleshooting for all pods.

The Cisco ACI multipod solution requires latency of 50 ms or less between pods to form an APIC cluster. Figure 7 provides an overview of the Cisco ACI multipod solution:


Figure 7. Cisco ACI multipod

Any network device that supports the following features can be used in the IP network that connects the Cisco ACI pods:

● PIM Bidir to handle broadcast, unknown unicast, and multicast (BUM) traffic

● DHCP relay for automatic bring-up of pods

● Open Shortest Path First (OSPF) to provide reachability across pods

● Increased Maximum Transmission Unit (MTU) to handle VXLAN encapsulation

The following screen shot is from an APIC that is managing multiple pods:


Cisco ACI multisite

Cisco ACI multisite offers connectivity between two or more completely separate Cisco ACI Fabrics (sites) that are managed by a Multi-Site Orchestrator (MSO). Each Cisco ACI Fabric has an independent APIC cluster and control plane to provide complete fault isolation. As in the multipod solution, BGP EVPN is used to exchange control-plane information, and VXLAN is used to carry the Cisco ACI policy information. Figure 8 shows an overview of Cisco ACI multisite.

Figure 8. Cisco ACI multisite

Cisco ACI multisite offers the following key benefits:

● Monitoring the health-state of the different Cisco ACI sites

● Provisioning of day-0 configuration to establish an inter-site EVPN control plane

● Defining and provisioning policies across sites (scope of changes)

● Inter-site troubleshooting

Cisco ACI multisite uses a micro-services architecture in which a cluster of three Multi-Site Orchestrator nodes is deployed in active/active fashion to provide redundancy and load balancing. Since the MSO nodes might be in different locations, they can be up to 150 ms RTT (round-trip time) away from each other. APIC-to-MSO latency can be up to 1 second, which allows Cisco ACI sites to be deployed across different continents.


The following screen shot from a Cisco ACI Multi-Site Orchestrator shows how the MSO provides management of the different sites:

Cisco ACI multisite uses head-end replication for broadcast, unknown unicast, and multicast (BUM) traffic. Any network device that supports the following features can be used in the IP network connecting the Cisco ACI sites:

● OSPF to provide reachability across sites

● Increased MTU to handle VXLAN encapsulation

Cisco ACI remote leaf

The remote leaf solution is applicable to aggregation data centers, because these data centers are smaller. An APIC cluster at a central or edge data center can manage all of these smaller, distributed aggregation data centers and provide end-to-end consistent policies. Figure 9 provides an overview of the Cisco ACI remote leaf solution:


Figure 9. Remote leaf

The Cisco ACI remote leaf solution extends Cisco ACI policy without the need for an APIC or spine switches at the remote location. Remote locations need only a pair of remote leaf switches, which forward traffic locally to connected endpoints but are managed centrally by the APIC.

APICs at central sites are responsible for day-0 bring-up of leaf switches at remote locations, pushing day-1 policy, and day-2 troubleshooting. End-to-end management of remote data centers is done from an APIC at a centralized location. From a feature, functionality, and scale perspective, a remote leaf is similar to a local ACI leaf.

The following are the requirements on the IP network for a remote leaf solution:

● Reachability from the remote location to the infrastructure addresses of the APICs and to the TEP pool of the main data center pod

● Up to 300 ms (RTT) latency

● At least 100 Mbps of bandwidth

● OSPF on the upstream router that connects to the remote leaf, to provide reachability to the primary site

● DHCP relay on the upstream router, for automatic bring-up of the remote leaf

● Increased MTU to handle VXLAN encapsulation


High performance, multispeed, and scale

Different applications in telco data centers have different interface types and bandwidth requirements. The Cisco ACI Fabric supports this mixed environment: it supports line-rate 100G interfaces and mixed-bandwidth interfaces in the same fabric.

● Data and Internet services need 100G downstream interfaces with very high bandwidth requirements. Switches connected to these applications need 100G upstream interfaces in the fabric.

● Voice and signaling need 1/10G downstream interfaces with moderate bandwidth requirements. Switches connected to these applications need 40G upstream interfaces in the fabric.

● OTT applications demand 25G connectivity from servers. Switches connected to these applications need 100G upstream interfaces in the fabric.

Since demand is growing exponentially, scalability is extremely important in telco data centers. Scalability in Cisco ACI is handled in multiple ways.

● Grow horizontally with the fabric approach. As the demand for bandwidth grows in the fabric, users can add more capacity at the spine layer by adding more spine switches. Similarly, when additional services are needed, capacity can be increased at the leaf layer by adding more leaf switches. Adding a node to the fabric is seamless; the APIC automatically adds the new node to the VXLAN fabric (a registration sketch follows this list).

● Cisco ACI Fabric uses a conversation-learning approach. Leaf switches learn endpoint information only when communication happens. This increases scalability drastically: not every MAC and IP address is learned everywhere.

Figure 10. Cisco ACI Conversation learning


● Scalable architecture with multipod and multisite solutions

The Cisco ACI Fabric provides scalability with multipod and multisite architectures.

Cisco ACI multipod uses a separate control plane in each pod and connects pods using BGP EVPN to provide control-plane isolation and scalability. In the Cisco ACI 3.1 multipod architecture, up to 12 pods and 400 leaf switches can be connected in a single fabric.

The multisite solution provides complete fault isolation by using a separate APIC cluster in each ACI site. This allows control-plane, data-plane, and management-plane isolation for better scalability. The multisite solution in Cisco ACI 3.1 scales up to 800 leaf switches and 8 sites. In Cisco ACI 4.0, Cisco plans to support up to 1200 leaf switches and 10 sites. Architecturally, the solution is even more scalable, and scale can be increased further in future releases.
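The node registration mentioned in the first bullet above can also be scripted. The hedged sketch below posts a fabricNodeIdentP object, which is how a discovered switch is assigned a node ID and name; the serial number, node ID, and switch name are placeholders, and the authenticated session from the earlier login sketch is assumed.

# Sketch: register a newly discovered leaf so the APIC adds it to the VXLAN fabric.
# Reuses APIC and the authenticated "session" from the login sketch earlier in this paper.
serial = "FDO12345678"                  # hypothetical switch serial number
node_reg = {
    "fabricNodeIdentP": {
        "attributes": {
            "serial": serial,
            "nodeId": "105",            # fabric node ID to assign
            "name": "leaf-105"          # switch name
        }
    }
}
resp = session.post(f"{APIC}/api/mo/uni/controller/nodeidentpol.json", json=node_reg)
resp.raise_for_status()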

Hardware telemetry and operational tools

Traffic processing in telco data centers is complex in nature. The ASICs used in Cisco Nexus 9000 Series Switches have the capability to export hardware telemetry by capturing full data-plane packet information at line rate. Table 1 summarizes the hardware telemetry data collected by the ASICs:

Table 1. Cisco Nexus 9000 hardware telemetry

Flow Table (FT)

● Captures full data-plane packet flow information, plus metadata
  ◦ 5-tuple flow info
  ◦ Interface/queue info
  ◦ Flow start/stop time
  ◦ Flow latency
  ◦ Drop indicator and drop reason

● Direct hardware export with low flush times (100 milliseconds)

Flow Table Events (FTE)

● Triggers notifications based on thresholds/criteria met by data-plane packet flows
  ◦ 5-tuple flow info
  ◦ Interface/queue info
  ◦ Forwarding drop indication
  ◦ Buffer drop indication
  ◦ Latency/burst threshold indication

● Direct hardware export with flow-level and global throttling

Streaming Statistics Export (SSX)

● Streams statistics and other ASIC-level state data based on user config
  ◦ Interface statistics (packets/bytes/drops)
  ◦ Buffer depth
  ◦ Queue-level microburst stats

● Direct hardware export with very low collection intervals (tens of microseconds)

The fabric insight app hosted on the APIC collects the telemetry data from the ASICs and provides information such as control-plane and environment monitoring, anomaly detection, traffic-flow visibility, flow latency, and flow packet drops.


The following is a screen shot of the fabric insight app hosted on the APIC:

Cisco ACI allows the network operator to view the fabric as a single entity and provides tools such as health score cards, fabric-wide fault views, endpoint tracking, and a troubleshooting wizard to simplify network management. These operational tools are discussed in a separate section.

Intelligent service chaining

One of the most important use cases in telco data centers is service chaining: traffic needs to go through a chain of devices before it exits the data center. In a traditional network, service chaining is based on node-by-node Policy-Based Routing (PBR) and Access Control List (ACL) configuration.

Cisco ACI automates service chaining and makes it scalable through:

● Ease of configuration, because service nodes are handled as a group rather than as individual nodes

● Easy expansion, by simply adding devices to a group without changing the overall service policy configuration

● Automatic load balancing of traffic across service nodes

● Automatic traffic symmetry

● Health checks of service nodes and automatic rebalancing of traffic across the remaining nodes

● Bypassing and reinsertion of the service group in a chain, based on thresholds

Intelligent service chaining is discussed in detail in a separate section.


Cisco ACI use case for Gi-LAN

Gi-LAN using intelligent service chaining

Service chaining is a very common use case in telco data centers. Packets from telco servers such as PGWs need to go through multiple services, such as TCP optimizers, Deep Packet Inspection (DPI), and Carrier-Grade NAT (CG-NAT) devices, before they leave the data center.

The Cisco ACI intent-based architecture and intelligent service chaining function allow users to define the service chain through the RESTful API or the GUI. The APIC then translates the logical service graph into a concrete switch configuration and deploys that configuration to the ACI leaf switches that participate in the service chaining function.

This innovative approach brings multiple benefits:

● Ease of provisioning services. Unlike traditional service insertion, where the user has to set up many low-level switch configurations box by box, with this approach the user needs to care only about the high-level logical design. The APIC automates the process of creating and deploying the configuration.

● Deployment flexibility and ease of scale-out. With the Cisco ACI policy model and service chaining, the user has the flexibility to deploy service nodes anywhere in the fabric, and traffic that requires services can enter and leave the fabric from any leaf switch. This also eases scale-out in the future, when traffic volume increases.

● Better visibility and day-2 operations. This approach brings more visibility and provides a direct mapping between applications/use cases and the network infrastructure. The APIC allocates and reclaims network resources based on the needs of the service.

The TCP optimizer is one of the most common services in the Gi-LAN. Let’s take TCP optimizers as an example service and see how Cisco ACI can simplify service chaining. The same example can be used for other services as well.

Automatic traffic symmetry, simplified expansion, and load-balancing

Traffic symmetry is a common requirement for service devices: traffic leaving and entering the data center should always go through the same node. This can be achieved with complex routing or PBR rules, but that approach is static in nature and requires complex configuration.

Without the Cisco ACI solution

The PGW has a subscriber address pool, which must be divided manually into smaller buckets, and each bucket is allocated manually to a TCP optimizer. To address failures, each bucket is also allocated a backup TCP optimizer node. To maintain a symmetric traffic pattern, PBR has to be configured separately for the return traffic.

For better load balancing, the subscriber address pool must be divided into granular groups. This results in higher TCAM utilization on switches, a larger configuration, and higher complexity. This solution requires PBR configuration on all switches in the fabric.

If the number of TCP optimizers needs to change, the whole configuration must be changed, which is not easy to do in a production environment.


With the Cisco ACI solution

Cisco ACI takes a device-group approach to service chaining rather than a node-by-node approach. The subscriber pool traffic can be sent as a whole to a device group containing all service nodes.

The Cisco ACI Fabric switches calculate a hash based on the source IP, the destination IP, the source L4 port, and the destination L4 port. Based on this hash value, a PBR node is chosen to forward a packet. In the reverse direction, the hash resolves to the same PBR node when the IP and port values are reversed. Traffic symmetry is therefore provided by default and does not need extra configuration. The hash can also be configured based on source IP only or destination IP only, as needed.
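The hashing behavior described above is a property of the PBR redirect policy. The sketch below creates such a policy with two TCP optimizer nodes over the REST API; it is a hedged illustration under assumed class, attribute, and container names (vnsSvcRedirectPol, hashingAlgorithm, vnsRedirectDest, and the tenant svcCont DN), with placeholder IP and MAC values. The authenticated session from the earlier login sketch is assumed.

# Sketch: PBR redirect policy with symmetric 5-tuple hashing across two TCP optimizers.
# Reuses APIC and the authenticated "session" from the login sketch earlier in this paper.
pbr_policy = {
    "vnsSvcRedirectPol": {
        "attributes": {
            "name": "tcp-opt-group",
            "hashingAlgorithm": "sip-dip-prototype"   # 5-tuple hash; "sip" or "dip" also possible
        },
        "children": [
            {"vnsRedirectDest": {"attributes": {"ip": "10.1.1.11", "mac": "00:50:56:AA:00:01"}}},
            {"vnsRedirectDest": {"attributes": {"ip": "10.1.1.12", "mac": "00:50:56:AA:00:02"}}}
        ]
    }
}
resp = session.post(f"{APIC}/api/mo/uni/tn-Telco/svcCont.json", json=pbr_policy)  # assumed parent DN
resp.raise_for_status()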

The solution uses the forwarding-table hash; hence it does not consume any extra TCAM resources. There is no need to divide the subscriber pool to provide load balancing and backup; different flows automatically take different PBR nodes.

If the number of devices in the group needs to change, the entire configuration does not need to change; new devices simply need to be added or deleted.

Figure 11. Cisco ACI Symmetric PBR

Configuration simplicity

Cisco ACI configuration for service chaining is simple (see Figure 12). The network administrator configures the service nodes into a device group. The subscriber pool of the PGW can be classified into a consumer EPG, and outside prefixes can be classified into a provider EPG. In the contract, based on configurable rules, traffic can be forwarded to the device group containing all service nodes.

Return traffic comes to the same device as the forward traffic, based on the forwarding-table hash.


Figure 12. Simplified configuration of symmetric PBR

Cisco ACI uses a template-based approach for configuration that can be reused multiple times to save configuration effort. To configure service chaining, a service graph template needs to be defined for a group of service nodes. This template can be used multiple times with different filter (ACL) rules. The APIC automatically pushes the policy-based redirection rules to the switches involved in service chaining. The APIC also automatically extends the Layer 2 domain to multiple ACI leaf switches whenever this is required (for example, when service nodes are attached to multiple leaf switches).

For example, in TCP optimizer service chaining, one service may need only particular TCP port traffic to be optimized, while another service may need all TCP ports to be optimized. Network administrators can create two ACI contracts for this purpose but use the same service graph template.
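A hedged sketch of that reuse pattern follows: two contracts with different filters both attach the same abstract service graph by name. The vzRsSubjGraphAtt relation is the documented way to attach a graph to a contract subject, but the tenant, graph, contract, and filter names here are placeholders, and the filters themselves are assumed to already exist. The authenticated session from the earlier login sketch is assumed.

# Sketch: reuse one service graph template ("tcp-opt-graph") from two different contracts.
# Reuses APIC and the authenticated "session" from the login sketch earlier in this paper.
def contract_with_graph(name, filter_name):
    # Build a contract whose subject applies the shared service graph to the given filter.
    return {"vzBrCP": {"attributes": {"name": name},
                       "children": [{"vzSubj": {"attributes": {"name": "subj1"},
                                                "children": [
                    {"vzRsSubjFiltAtt": {"attributes": {"tnVzFilterName": filter_name}}},
                    {"vzRsSubjGraphAtt": {"attributes": {"tnVnsAbsGraphName": "tcp-opt-graph"}}}]}}]}}

tenant_cfg = {"fvTenant": {"attributes": {"name": "Telco"},
                           "children": [contract_with_graph("optimize-http", "tcp-80-only"),
                                        contract_with_graph("optimize-all", "any-tcp")]}}
session.post(f"{APIC}/api/mo/uni.json", json=tenant_cfg).raise_for_status()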

The following is a screen shot of a service graph template from the APIC:


The following is a screen shot of the contract that uses the service graph template and redirects traffic to a group of nodes based on filter values:

Service node health check

Cisco ACI tracks the inside and outside interfaces of each service node through IPv4 ICMP, IPv6 ICMP, or TCP probes. If a device is not reachable through its inside interface, Cisco ACI automatically removes the whole node from the PBR policy to avoid traffic black-holing, and automatically load-balances traffic across the remaining service nodes. The same technique is followed if the device is not reachable through its outside interface.

The APIC detects the location of the service nodes and starts the health checks automatically on the leaf switches where the service nodes are attached. (See Figure 13.)

Figure 13. PBR tracking for device liveliness


The following is a screen shot of Cisco ACI’s PBR policy from the APIC:

The health group ties together the internal and external legs of the same device. That way, when the internal leg fails, the whole device is removed from the PBR policy, and vice versa.
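Tracking and health groups are configured on the same redirect policy. The following hedged sketch adds an ICMP IP SLA monitoring policy and a health group that ties the inside and outside legs of one appliance together. The class names (fvIPSLAMonitoringPol, vnsRedirectHealthGroup, and the relation objects) reflect the PBR tracking feature as documented, but the exact attributes and DNs vary by release, and all names and addresses here are placeholders; the redirect policy and authenticated session from the earlier sketches are assumed.

# Sketch: ICMP tracking policy, a health group, and their attachment to the PBR policy.
# Reuses APIC and the authenticated "session" from the login sketch earlier in this paper.
session.post(f"{APIC}/api/mo/uni.json", json={
    "fvTenant": {"attributes": {"name": "Telco"},
                 "children": [
        {"fvIPSLAMonitoringPol": {"attributes": {"name": "icmp-probe", "slaType": "icmp",
                                                 "slaFrequency": "10"}}},   # probe every 10 seconds
        {"vnsRedirectHealthGroup": {"attributes": {"name": "optimizer1-hg"}}}]}
}).raise_for_status()

# Attach the tracking policy and health group to the redirect policy and one destination
# (DNs assume the "tcp-opt-group" policy from the earlier sketch).
session.post(f"{APIC}/api/mo/uni/tn-Telco/svcCont/svcRedirectPol-tcp-opt-group.json", json={
    "vnsSvcRedirectPol": {"attributes": {"name": "tcp-opt-group"},
                          "children": [
        {"vnsRsIPSLAMonitoringPol": {"attributes": {
            "tDn": "uni/tn-Telco/ipslaMonitoringPol-icmp-probe"}}},
        {"vnsRedirectDest": {"attributes": {"ip": "10.1.1.11", "mac": "00:50:56:AA:00:01"},
                             "children": [{"vnsRsRedirectHealthGroup": {"attributes": {
            "tDn": "uni/tn-Telco/svcCont/redirectHealthGroup-optimizer1-hg"}}}]}}]}
}).raise_for_status()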

Bypassing service based on threshold

Some services may need to be bypassed in the PBR policy when more than a defined number of nodes in a group fail. For example, a user may prefer to forward traffic without optimization if more than a few TCP optimizers in a data center fail. Customers may want to bypass the TCP optimizers to ensure that traffic does not drop, because the remaining TCP optimizers may not be able to process the traffic load. (See Figure 14.)


Figure 14. Bypassing TCP optimizers to avoid congestion

The following is a screen shot of a configuration to enable PBR bypass based on threshold:
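The same threshold settings are attributes of the redirect policy and can also be applied through the REST API. The hedged sketch below uses the attribute names associated with the threshold feature (thresholdEnable, minThresholdPercent, maxThresholdPercent, thresholdDownAction); the exact attribute set may vary by release, and the values and policy name are placeholders carried over from the earlier sketches.

# Sketch: bypass the TCP optimizer group when fewer than 50% of its nodes remain healthy.
# Reuses APIC and the authenticated "session" from the login sketch earlier in this paper.
threshold_cfg = {
    "vnsSvcRedirectPol": {
        "attributes": {
            "name": "tcp-opt-group",
            "thresholdEnable": "yes",
            "minThresholdPercent": "50",      # below this percentage of healthy nodes...
            "maxThresholdPercent": "100",
            "thresholdDownAction": "bypass"   # ...skip the group instead of redirecting
        }
    }
}
session.post(f"{APIC}/api/mo/uni/tn-Telco/svcCont/svcRedirectPol-tcp-opt-group.json",
             json=threshold_cfg).raise_for_status()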


The following table compares a non-Cisco ACI solution with the Cisco ACI solution for PBR-based service chaining.

Table 2. Comparison between traditional service chaining and ACI intelligent service chaining

Configuration complexity

● Non-Cisco ACI: Complex, due to manual IP pool division and PBR, ACL, and IP SLA configuration.

● Cisco ACI: Simple; no need for complex PBR, ACL, and IP SLA rules based on IP addresses.

Traffic load balancing

● Non-Cisco ACI: Manual, hence not very accurate; active/standby design.

● Cisco ACI: Better load balancing due to automatic hashing; active/active design.

Scalability

● Non-Cisco ACI: TCAM utilization is high, due to the scaled and complex PBR rules needed for load balancing.

● Cisco ACI: No extra TCAM consumption for better load balancing.

Resiliency

● Non-Cisco ACI: A backup device is manually configured for each primary device; IP SLA probes declare a device dead after a single packet loss.

● Cisco ACI: All devices are active; when the number of healthy TCP optimizers falls below the configured minimum threshold, traffic bypasses the group; a device is declared dead only after multiple probe failures.

Resilient hashing after service node failure

In Cisco ACI 3.2 and later, PBR supports resilient hashing. With this feature, when a node failure is detected, only the flows going through the failed node are rehashed to one of the remaining nodes; the rest of the flows remain intact. If a service node maintains TCP session state, resilient hashing may be the desirable behavior. (See Figures 15, 16, and 17.)
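Resilient hashing is also enabled on the redirect policy. The one-line change below is a hedged sketch under the assumed attribute name resilientHashEnabled, applied to the same placeholder policy used in the earlier sketches; the authenticated session from the login sketch is assumed.

# Sketch: turn on resilient hashing so only flows of a failed node are rehashed.
# Reuses APIC and the authenticated "session" from the login sketch earlier in this paper.
session.post(f"{APIC}/api/mo/uni/tn-Telco/svcCont/svcRedirectPol-tcp-opt-group.json",
             json={"vnsSvcRedirectPol": {"attributes": {"name": "tcp-opt-group",
                                                        "resilientHashEnabled": "yes"}}}
             ).raise_for_status()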

Figure 15. Traffic forwarding without any failure


Figure 16. Node failure without resilient hashing

Figure 17. Node failure with resilient hashing


Multinode service chaining

The Cisco ACI Fabric can redirect packets to multiple services through a contract, but before Cisco ACI 3.2 only one service in the chain could use policy-based redirect. Starting with Cisco ACI 3.2, the fabric can redirect packets to multiple different types of services using policy-based redirect. (See Figure 18.)

Figure 18. Multinode service chaining

The following is a screen shot of multinode PBR from the APIC:

Cisco ACI OpenStack plug-in

Cisco ACI offers tight integration with OpenStack, which is beneficial as part of a network functions virtualization solution. The Cisco ACI OpenStack plug-in automatically provisions ACI policies, including EPGs, bridge domains, and contracts, in response to actions taken through the OpenStack Neutron APIs.

The plug-in automatically maps Neutron networks and routers to EPGs, BDs, and contracts in ACI, and responds to instance creation and deletion events by creating the appropriate VLAN bindings in the ACI Fabric. When a virtual instance is created through OpenStack, the ACI OpenStack plug-in and the APIC automatically configure the appropriate policies on each leaf switch and, optionally, on the virtual switch as well. As virtual machines migrate, policies are automatically updated as well.


The following diagram (Figure 19) explains how Cisco ACI policies are automatically pushed by the APIC when a virtual instance is created on OpenStack, through the steps listed below:

1. The OpenStack administrator creates the network, subnet, security groups, and policy.

2. Through the Cisco ACI plug-in, the policies are pushed to the APIC.

3. The APIC automatically maps the OpenStack network policies to EPGs, BDs, and contracts.

4. The OpenStack administrator instantiates a VM on OpenStack.

5. The APIC, based on the location of the VM, pushes the Cisco ACI policies to the correct leaf switch.

Figure 19. Cisco ACI unified plug-in for OpenStack
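From the OpenStack side, steps 1 and 4 are ordinary Neutron and Nova operations; the plug-in performs the ACI mapping behind the scenes. The sketch below uses the openstacksdk library to create a network, a subnet, and a server, which, with the ACI plug-in installed, would result in a corresponding EPG/BD and leaf policy on the APIC. The cloud name, network names, CIDR, and image/flavor IDs are placeholders.

# Sketch: standard OpenStack calls; with the Cisco ACI plug-in these map to an EPG/BD on the APIC.
import openstack

conn = openstack.connect(cloud="telco-cloud")        # assumed clouds.yaml entry

# Step 1: the OpenStack admin creates a network and subnet through Neutron.
net = conn.network.create_network(name="gilan-net")
subnet = conn.network.create_subnet(network_id=net.id, name="gilan-subnet",
                                    ip_version=4, cidr="192.168.10.0/24")

# Step 4: instantiating a server on this network is what triggers the APIC (step 5)
# to push the policy to the leaf switch where the VM lands.
server = conn.compute.create_server(name="tcp-optimizer-1",
                                    image_id="<image-uuid>", flavor_id="<flavor-uuid>",
                                    networks=[{"uuid": net.id}])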

With Cisco ACI 3.2, the OpenStack ACI plug-in is extended to include automated L3Out (external connectivity) configuration. Cisco ACI locates VM or Virtual Network Function (VNF) locations through LLDP and uses this information to configure L3Outs (static route or BGP) on the APIC dynamically.

Architecture of Cisco ACI OpenStack plug-in

The Cisco ACI OpenStack plug-in uses the OpFlex agent on the host and controller, which allows admins to get visibility into the virtual environments from the APIC.

The OpFlex agent creates an endpoint file containing the VM names, IP addresses, network policies, and routing settings of all VMs. The Cisco ACI OpenStack plug-in uses this information to provide VM and virtual network information to the APIC through the VMM domain.

The Cisco ACI OpenStack integration also uses an agent-OVS, which runs the OpFlex protocol with the Cisco ACI leaf and provisions Open vSwitch through OpenFlow. This provides the network automation of OVS through the Cisco ACI OpenStack plug-in. (See Figure 20 for the ACI OpenStack plug-in architecture.)


Figure 20. Cisco ACI OpenStack plug-in architecture

The following two screen shots show the VM information that can be seen from the APIC. The EPGs in the first screen shot are configured through the ACI plug-in; the client endpoints are VMs instantiated on OpenStack.

When the Cisco ACI plug-in is running in OpFlex mode, it configures a VMM domain for OpenStack and provides information about the hypervisors, VMs, IP/MAC of each VM, port groups, encapsulation, and so on. The following is a screen shot of an OpenStack VMM domain from the APIC:


Users may have scenarios where OpFlex cannot be supported; the Cisco ACI plug-in can run without OpFlex as well. In this mode, the EPG, BD, and contract are configured on the APIC as soon as the OpenStack networks are configured. Once a VM is instantiated on OpenStack, the mapping of VLAN and port is done on the APIC. Since OpFlex is not running on the host in this mode, a VMM domain for OpenStack is not created.

Cisco ACI supports faster data planes with the OpenStack integration. The following table captures the current support for faster data planes:

Table 3. Faster dataplane support with Cisco ACI plug-in

● OVS: supported with the OpFlex agent; supported without the OpFlex agent

● OVS-DPDK: not supported with the OpFlex agent; supported without the OpFlex agent starting in Cisco ACI 3.2

● VPP-DPDK: not supported, with or without the OpFlex agent

● SR-IOV: supported with the OpFlex agent (SR-IOV ports are not managed by the OpFlex agent; however, other ports on the same host can be managed by the agent); supported without the OpFlex agent

Cisco ACI use case for Evolved Packet Core (EPC) and virtual Evolved Packet Core (vEPC)

Mobile providers are virtualizing EPC functionality to reduce dependency on specialized hardware. Virtualization also increases the speed of service delivery, on-demand scalability, and the ability to respond to real-time network conditions and user needs.

Deploying vEPC on Cisco ACI Fabric brings the following benefits:

● Supports dynamic routing protocol peering between Cisco ACI leaf and VM

● Supports BFD for fast failure detection

● Supports vPC between compute and Cisco ACI leaf while supporting dynamic routing protocol and BFD


● Supports VM mobility to further increase the flexibility of deployment and scale-out in the future. VMs of a given vEPC instance can be attached to any one of eight leaf switches; future software can further increase the mobility zone.

● Evens traffic distribution among VMs. All incoming traffic can be evenly distributed to the VMs of the same vEPC instance.

● Provides wide ECMP to support large vEPC clusters. Current Cisco ACI software supports a vEPC cluster consisting of 64 Session Function (SF) VMs; ACI hardware has the capability to support larger vEPC clusters.

● Offers central provisioning and management from the APIC

Control/data plane connectivity

VNFs such as virtual EPC (vEPC) need to have routing protocol peering with the fabric switches. Telecom operators typically deploy multiple virtual instances of vEPC across a network fabric to provide high throughput and redundancy. These instances may be placed across different pairs of leaf switches, so there is a unique challenge in forming routing protocol peering across the fabric. Cisco ACI supports this, whereas other overlay fabrics do not do it today.

VNFs may also move across the fabric. To support this requirement, the fabric should ensure that a VNF can seamlessly form routing peering with the switch to which it moves. In a Cisco ACI Fabric, up to 8 switches can be part of an external routing domain with the same VLAN, which allows VNFs to move seamlessly across switches. (See Figure 21.)

Figure 21. Cisco ACI external routing domain (L3out)

Another challenge with vEPC is the load balancing of traffic. Since the Cisco ACI leaf switches have route peering with all VNFs across the fabric, the fabric can perform ECMP load balancing across all VNFs.

Virtual appliances are much more numerous than physical appliances when providing the same throughput and optimum performance, so the ECMP requirements are much higher. The latest Cisco ACI software release supports 64-way ECMP for OSPF and static routes. As a result, both the OSPF and BGP design options described below are supported, and the solution supports vEPC clusters consisting of up to 64 VMs. Cisco ACI leaf switch hardware has the capability to support wider ECMP paths, and Cisco ACI software can be enhanced to support clusters consisting of larger numbers of VMs.


The following figure shows some of the most common deployment options in vEPC:

Figure 22. Cisco ACI peering with VNF

In the first option, the Cisco ACI leaf switches have OSPF and BFD peering with VNFs across the fabric. The common IP of all VNFs is reachable through OSPF. The APIC is a single pane of glass for provisioning and validating the routing policy for all VNFs and the fabric. If a VNF is moved to a different leaf switch configured under the same L3Out, it forms a routing relationship with the Cisco ACI Fabric, because the same policy is configured on multiple nodes under the same L3Out. When the same VLAN encapsulation is used for the border nodes under the same L3Out, the APIC automatically extends the L2 domain to all border leaf switches configured for this L3Out; it effectively creates a “VLAN” whose members are the border leaf switches and all VMs that are part of the same vEPC instance. The user must choose the OSPF broadcast network type under the L3Out configuration, and it is recommended to set the OSPF interface priority so that the Cisco ACI border leaf switches are elected as the OSPF DR and BDR.

The following screen shot of an APIC provides an example of how the APIC becomes a single pane of glass to provision and verify the routing configuration, whether toward a VNF or within the fabric to any leaf over VXLAN:


The following OSPF policy on the APIC can be used to enable OSPF and BFD.
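The same OSPF interface policy can also be created programmatically. The hedged sketch below builds an ospfIfPol with the broadcast network type, a raised DR priority, and BFD enabled, matching the recommendations above; the policy name and values are placeholders, exact option strings may vary by release, and the authenticated session from the earlier login sketch is assumed.

# Sketch: OSPF interface policy for VNF peering (broadcast type, high DR priority, BFD).
# Reuses APIC and the authenticated "session" from the login sketch earlier in this paper.
ospf_pol = {
    "ospfIfPol": {
        "attributes": {
            "name": "vnf-ospf-bfd",
            "nwT": "bcast",      # broadcast network type, as recommended above
            "prio": "100",       # raise priority so the border leaf becomes DR/BDR
            "ctrl": "bfd"        # enable BFD on OSPF interfaces that use this policy
        }
    }
}
session.post(f"{APIC}/api/mo/uni/tn-Telco.json", json=ospf_pol).raise_for_status()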

The following screen shot from an APIC shows the verification of a BFD neighbor for OSPF:

Use of the BGP design option is recommended when there is a large number of VMs in the same vEPC and each VM would otherwise advertise a large number of OSPF routes.

In the second option, the Cisco ACI leaf switches have a BGP neighbor relationship with a control VM. The control VM advertises the common IP for all VNFs using BGP. On the leaf switches, there is a static route with a next hop of each VNF to forward traffic and perform load balancing. BFD for static routes is used for faster convergence.

The following screen shots from an APIC show how to provision and validate BGP with static routes over BFD. A single BGP policy configures the BGP relationships across all nodes in the L3Out.


BGP configuration


Static route configuration

Verification of BGP configuration
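For completeness, the second design option can also be provisioned over the REST API. The hedged sketch below adds a BGP peer toward the control VM and a BFD-tracked static route toward one VNF under an existing L3Out node profile. The DNs, addresses, AS number, and the rtCtrl usage are assumptions drawn from the documented object model and should be verified against the target release; the authenticated session from the earlier login sketch is assumed.

# Sketch: BGP peer to the control VM plus a BFD-tracked static route to a VNF data interface.
# Reuses APIC and the authenticated "session" from the login sketch earlier in this paper.
node_profile_dn = "uni/tn-Telco/out-vEPC-L3Out/lnodep-border-nodes"   # assumed existing L3Out

bgp_peer = {"bgpPeerP": {"attributes": {"addr": "10.2.0.10"},          # control VM address
                         "children": [{"bgpAsP": {"attributes": {"asn": "65001"}}}]}}

node_cfg = {"l3extRsNodeL3OutAtt": {
    "attributes": {"tDn": "topology/pod-1/node-101", "rtrId": "10.0.0.101"},
    "children": [{"ipRouteP": {"attributes": {"ip": "172.16.10.0/24",  # common VNF service prefix
                                              "rtCtrl": "bfd"},        # track the next hop with BFD
                               "children": [{"ipNexthopP": {"attributes": {"nhAddr": "10.2.0.21"}}}]}}]}}

for payload in (bgp_peer, node_cfg):
    session.post(f"{APIC}/api/mo/{node_profile_dn}.json", json=payload).raise_for_status()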

Fabric to external connectivity

There are two methods to connect the Cisco ACI Fabric to external routing domains (see Figure 23):

1. Routing protocol connectivity from a border leaf: a Cisco ACI leaf can be connected to external nodes using OSPF, BGP, EIGRP, or static routes. A dedicated border leaf switch is not required; any leaf switch can be used as a border leaf to establish external connectivity.

2. Routing protocol connectivity from a spine: a Cisco ACI spine supports an external routing protocol relationship using BGP EVPN. Using a single BGP EVPN session, all VRF prefixes can be advertised to the WAN router. This connectivity option is also called GOLF connectivity. Optionally, the APIC can push the VRF configuration to the WAN router as soon as a VRF is configured on the Cisco ACI Fabric. The WAN router can be an ASR 9000, CSR 1000v, ASR 1000, or Nexus 7000.


Figure 23. Cisco ACI fabric to external connectivity options

Cisco ACI integration with Cisco Network Functions Virtualization Infrastructure (NFVI)

Cisco NFVI provides the virtualization layer and hardware environment in which VNFs can operate. While these network functions required a tight integration between network software and hardware in the past, the introduction of VNFs has helped decouple (or loosely couple) the software from the underlying hardware. The following figure shows the high-level Cisco NFVI architecture:

Figure 24. Virtualized Network Functions (VNFs)

The Cisco ACI Fabric connects to the controller, compute, storage, and build nodes of the Cisco Virtualized Infrastructure Manager (VIM). To install and operate OpenStack, there must be connectivity across these nodes. The following figure shows the different L2 networks between the nodes that are needed for the VIM solution:


Figure 25. Cisco VIM networks

These L2 networks may be stretched across multiple leaf switches. To provide Layer 2 extension across leaf switches, a VXLAN overlay needs to be provided.

Cisco ACI, with its integrated overlay and underlay, provides the following advantages:

● Touchless day-0 VXLAN fabric bring-up

● Day-0 automation of the above networks that are needed to install and operate the OpenStack infrastructure

● Cisco VIM 2.2 support for the Cisco ACI OpenStack plug-in (Cisco ACI 3.0 release) in OpFlex mode. By utilizing OpFlex, the Cisco ACI policy model can be extended all the way down to the virtual switches running on the OpenStack Nova compute hosts.

● Troubleshooting of the overlay and underlay network through a single tool

As of this writing, Cisco VIM supports the Cisco ACI plug-in only in OpFlex mode.

Faster convergence

Services such as voice in telco data centers demand extremely fast convergence. If the APIC fails, there is no impact on data traffic, because the APIC does not participate in either the control plane or the data plane. Convergence on failures in the fabric, access connectivity, or external connectivity is within 200 msec.

The Cisco ACI Fabric is used in telco data centers with voice services running over it; node or link failures do not cause voice-call drops.


The following diagram summarizes convergence with the Cisco ACI Fabric:

Figure 26. Cisco ACI convergence

Simplified operations

Scale and complexity in data centers are growing. To simplify operations, Cisco ACI offers multiple tools that help users proactively find faults, troubleshoot, and perform regular operations on the network. The following figure shows screen shots of some of the key tools most commonly used for fabric management. APIC is a single application used to troubleshoot and operate the entire network. The reader can refer to the APIC screenshots in the sections below for more details about the operational benefits of using Cisco ACI.

Figure 27. Operational simplification


Topology dashboard

Network topology is the first thing that a network operator needs during troubleshooting. Most of the time it is captured in an offline diagram, but it is more useful when seen live. The Cisco ACI Topology dashboard provides a live topology diagram of the Cisco ACI Fabric, including multiple pods and remote leaf switches connected over the WAN. Operators first get a summary view of the entire fabric, and can then go into details on each pod, each switch, and the interfaces of each switch. From the same window, operators can view CPU, memory, processes, protocols, interfaces, faults, crashes, statistics, endpoints, etc., for complete visibility into the entire fabric.

The following is a screen shot from the Cisco APIC controller showing a Cisco ACI Fabric with two pods connected over an IP network.

Summary view of Cisco ACI Topology diagram
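The inventory behind this summary view is also available through the REST API. A minimal sketch follows (the APIC address and credentials are placeholders; fabricNode is the standard class representing leaf switches, spine switches, and controllers):

import requests

APIC = "https://apic.example.com"            # placeholder APIC address
session = requests.Session()
session.verify = False                       # lab only; verify certificates in production
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# List every node registered in the fabric with its role and model
resp = session.get(f"{APIC}/api/node/class/fabricNode.json")
for obj in resp.json()["imdata"]:
    attrs = obj["fabricNode"]["attributes"]
    print(attrs["id"], attrs["name"], attrs["role"], attrs.get("model"))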

The user can view an individual pod to see a pod-specific topology. The following is a screen shot of a Pod2 topology from an APIC. It shows that two ACI leaf switches and two spine switches are connected in a spine-leaf topology, while the remote leaf in this pod is connected over the WAN.


Cisco ACI Topology diagram specific to a pod

Double-clicking on any switch shows the physical connectivity of that switch with respect to the rest of the fabric. The following screen shot gives an example.


The user can then drill into each switch for more details: switch-specific topology, switch health, processes, interface statistics, protocol statistics, faults, event history, configuration, switch logs, inventory details, etc.

Switch-specific Cisco ACI Topology diagram

General information about the switch


Inventory details

Protocol-specific statistics and information

The history of the audit logs lists configuration-related changes on the switch, with timestamps.


The health history shows, for a given object on the switch, how much the health score has changed, when it changed, and why.

The history of events lists all of the events on the switch, with timestamps.

Health score card

The health score is a powerful tool that a network operator can use to quickly troubleshoot problems across the fabric. If there are problems within the fabric, the Cisco ACI Fabric health score drops from the ideal value of 100, so, by looking at the health score card, network operators immediately know that there are problems in the network.


Network operators can see a summary view of fabric health at the system level. If the system health is less than 100, the operator can trace down from the system level to the affected objects and discover which objects are responsible.

In the following screen shot, the health of the system is 97, because some of the switches in the fabric have a

health score of less than 100:

After finding that the overall system health is less than 100, network operators can go to the individual switches whose health score is below the ideal value of 100 and uncover the reasons. In the following example, leaf 101's health is less than 100 because there is an STP issue in the network. In a traditional network with box-by-box troubleshooting, uncovering an STP problem can be extremely difficult; with the fabric approach, using APIC, operators can easily uncover such issues across the whole fabric.

In the following screen shot, the APIC controller shows the fault that caused the health score to drop below 100.
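The same health information can be retrieved programmatically, for example to feed an external monitoring system. A hedged sketch follows, assuming the fabricHealthTotal class carries the fabric-wide score and exposes it in a cur attribute (the APIC address and credentials are placeholders):

import requests

APIC = "https://apic.example.com"            # placeholder APIC address
session = requests.Session()
session.verify = False                       # lab only; verify certificates in production
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# Fabric-wide health score (class name and attribute are assumptions)
resp = session.get(f"{APIC}/api/node/class/fabricHealthTotal.json")
for obj in resp.json()["imdata"]:
    attrs = obj["fabricHealthTotal"]["attributes"]
    print(attrs["dn"], "health:", attrs.get("cur"))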


Faults across fabric

The Cisco ACI Fabric provides a view of overall faults across the fabric. These faults are categorized by domain

and type. From this single view, a network operator can get into each individual fault and uncover the details of

what is causing it.

The following screen shot from an APIC controller shows all the faults across a full fabric:

The following is a screen shot from an APIC providing details about an individual fault: a port on Eth1/1 on node-1

in the fabric is down.
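Faults are themselves objects (the faultInfo class) and can be pulled through the REST API, for example to forward critical faults to an external alerting tool. A minimal sketch with a placeholder APIC address and credentials:

import requests

APIC = "https://apic.example.com"            # placeholder APIC address
session = requests.Session()
session.verify = False                       # lab only; verify certificates in production
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# Retrieve only critical faults across the whole fabric
resp = session.get(f"{APIC}/api/node/class/faultInfo.json",
                   params={"query-target-filter": 'eq(faultInfo.severity,"critical")'})
for obj in resp.json()["imdata"]:
    mo = next(iter(obj.values()))            # concrete class may be faultInst or faultDelegate
    attrs = mo["attributes"]
    print(attrs["code"], attrs["severity"], attrs["descr"])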

Upgrade and downgrade of fabric

Upgrade and downgrade of a network is typically done using a box-by-box approach, and software upgrades for large networks spanning multiple data centers typically take months. Cisco ACI takes a fabric approach to the upgrade, and the entire upgrade can be done in three simple steps, so processes that used to take months can be finished in a few days. Following are the steps for upgrading the Cisco ACI fabric:

1. Upgrade the controllers; this does not cause any traffic loss.

2. Instead of upgrading individual switches, a network operator can divide the fabric into even and odd groups of

leaf and spine switches. Using the APIC, the even group can be upgraded while the odd group is still

forwarding traffic.

3. Once the even group is upgraded and forwarding traffic, the odd group can be upgraded.
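After each group has been upgraded, operators typically confirm that every switch is running the expected image before moving on. The following is a hedged sketch of how that check could be scripted; the firmwareRunning class and its version attribute are assumptions, and the target version, APIC address, and credentials are placeholders.

import requests

EXPECTED = "n9000-13.2(1m)"                  # hypothetical target switch image
APIC = "https://apic.example.com"            # placeholder APIC address
session = requests.Session()
session.verify = False                       # lab only; verify certificates in production
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# Compare the running firmware of each switch against the expected version
resp = session.get(f"{APIC}/api/node/class/firmwareRunning.json")
for obj in resp.json()["imdata"]:
    attrs = obj["firmwareRunning"]["attributes"]
    status = "OK" if attrs.get("version") == EXPECTED else "MISMATCH"
    print(attrs["dn"], attrs.get("version"), status)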


The following is a screen shot of switch groups from an APIC:

Capacity dashboard

In a traditional network, finding the current scale requires gathering information from individual switches, and operators often run into trouble when they unknowingly exceed the limits of the network. Cisco ACI provides a simple way to see the scale of the whole fabric, which can be used for capacity planning. On a software upgrade, the maximum supported scale information is automatically updated on the APIC.

The following is a screen shot of a capacity dashboard from an APIC:
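The same scale data can also be collected programmatically for capacity-planning reports. The following hedged sketch counts a few common object classes fabric-wide; the rsp-subtree-include=count query option is assumed to be available, and the APIC address and credentials are placeholders.

import requests

APIC = "https://apic.example.com"            # placeholder APIC address
session = requests.Session()
session.verify = False                       # lab only; verify certificates in production
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# Count VRFs, bridge domains, EPGs, and learned endpoints across the fabric
for cls in ("fvCtx", "fvBD", "fvAEPg", "fvCEp"):
    resp = session.get(f"{APIC}/api/node/class/{cls}.json",
                       params={"rsp-subtree-include": "count"})
    count = resp.json()["imdata"][0]["moCount"]["attributes"]["count"]
    print(f"{cls}: {count}")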

Endpoint tracker

In a network with all physical appliances, network operators can keep an offline database to track the location of each appliance; however, with virtualization, it becomes difficult to know the current location of a VNF. Knowing the location of VNFs is very important in order to schedule a maintenance window, troubleshoot, perform upgrades, or carry out other operational activities. With Cisco ACI, using the endpoint tracker, the current location and history of any endpoint in the fabric can be found using a MAC address, VM name, IPv4 address, or IPv6 address.


The following screen shot shows the current location and history of an endpoint with IP “100.100.100.20,” from an

APIC:
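The same lookup can be done through the REST API, which is convenient when it needs to be part of an operations script. The following minimal sketch finds the endpoint shown in the screen shot above; fvCEp is the class for learned endpoints, its ip attribute is assumed to hold the learned address, and the APIC address and credentials are placeholders.

import requests

APIC = "https://apic.example.com"            # placeholder APIC address
session = requests.Session()
session.verify = False                       # lab only; verify certificates in production
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# Find the learned endpoint whose IP address is 100.100.100.20
resp = session.get(f"{APIC}/api/node/class/fvCEp.json",
                   params={"query-target-filter": 'eq(fvCEp.ip,"100.100.100.20")'})
for obj in resp.json()["imdata"]:
    attrs = obj["fvCEp"]["attributes"]
    # The dn encodes the tenant, application profile, and EPG where the endpoint lives
    print(attrs["dn"], attrs["mac"], attrs.get("encap"))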

Troubleshooting wizard

APIC has a built-in troubleshooting wizard. Using this wizard, network operators can create a troubleshooting session between a source and a destination endpoint in the Cisco ACI Fabric.

The first thing required to troubleshoot a problem in a network is a relevant network topology. A network may have hundreds of switches, but what matters is knowing which switches are in the traffic path. APIC dynamically builds a topology diagram based on the location of the endpoints in the network.

The following screen shot displays a troubleshooting session from APIC. The session is started between source endpoint 12.28.0.11 and destination endpoint 12.28.0.10 and shows the hosts, connections, and nodes between them.

APIC dynamically builds a topology and shows that the source endpoint is connected to host 172.29.164.4, and the

destination endpoint is connected to host 172.29.164.3. It also provides port-connectivity information, such as host-

to-leaf-port information and leaf-to-spine-port information, along the path.


Network operators can view faults, drops/statistics, contract-related drops, and events and audits, and they can perform traceroute, check atomic counters, obtain latency information, and run SPAN sessions to analyze packets between the source and destination endpoints.

The following screenshot from APIC displays the atomic counters between a source and a destination endpoint.

The network operator can run atomic counters for an extended period of time, and check if packets are getting

dropped or if there are duplicate (excess) packets between the source and destination endpoints.

Traffic map and statistics

For capacity planning, network operators need a tool that provides heat-map information across the fabric. A traffic map displays how much traffic is being forwarded between leaf switches; this information can then be used to plan network capacity.

The following is a screen shot of a traffic map from an APIC.


Traffic statistics

Cisco ACI provides an easy way to check traffic statistics at multiple levels, such as interface level, EPG level,

contract level, etc.

The following screen shot from the APIC displays the interface statistics of a leaf.

APIC provides visibility into multiple types of ingress or egress packet counters and byte counters for both multicast

and unicast traffic. The network operator can build a traffic graph based on time intervals, sampling intervals, and

statistics of interest.

The following screen shot from APIC shows how a network operator can change statistics parameters and sampling intervals to build a new graph:


The following screen shot displays the new graph from the APIC after changing the parameters:
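These counters can also be sampled programmatically, for example to export them to an external graphing tool. The following is a loosely hedged sketch: the interface DN, the eqptIngrTotal5min statistics class, and the rsp-subtree-include=stats query options are assumptions for illustration, as are the APIC address and credentials.

import requests

APIC = "https://apic.example.com"            # placeholder APIC address
session = requests.Session()
session.verify = False                       # lab only; verify certificates in production
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# Ask for the 5-minute ingress counters attached to one leaf interface
dn = "topology/pod-1/node-101/sys/phys-[eth1/1]"
resp = session.get(f"{APIC}/api/node/mo/{dn}.json",
                   params={"query-target": "self",
                           "rsp-subtree-include": "stats",
                           "rsp-subtree-class": "eqptIngrTotal5min"})
iface = resp.json()["imdata"][0]["l1PhysIf"]
for child in iface.get("children", []):
    for cls, mo in child.items():
        attrs = mo["attributes"]
        print(cls, attrs.get("bytesRate"), attrs.get("pktsRate"))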

Configuration handholding – “Show me how?”

Have you ever wished for a friend sitting beside you while you configure a new feature on the network, helping you avoid configuration mistakes? In APIC, there is a "Show me how" tab that guides you, with pointers, through configuring any Cisco ACI feature.

Let’s take an example of configuring a tenant in Cisco ACI. The following is a screen shot of this feature from the

APIC:


Click on the “Show me how” tab, and it will point to multiple options; click on “Create a Tenant”:

When a user clicks on "Create a Tenant," the APIC shows the next step and explains it, waits for the user to execute that step, and then proceeds to the following step, waiting and explaining at each stage like a friend. The following is a screen shot from the APIC of the series of steps for creating a new tenant:
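The same tenant that the wizard walks you through in the GUI can also be created with a single REST call, which is how tooling typically automates this step. A minimal sketch with a placeholder APIC address, credentials, and tenant name:

import requests

APIC = "https://apic.example.com"            # placeholder APIC address
session = requests.Session()
session.verify = False                       # lab only; verify certificates in production
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# Post a new tenant object under the policy universe (uni)
tenant = {"fvTenant": {"attributes": {"name": "example-tenant",
                                      "descr": "created via REST as an illustration"}}}
resp = session.post(f"{APIC}/api/mo/uni.json", json=tenant)
print(resp.status_code, resp.text)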

Conclusion

Cisco ACI addresses the current challenges of telecom data centers by providing automation, service chaining, massive scalability, operational simplification, ease of troubleshooting, and consistent policy across any location. The same Cisco ACI policy can be applied to central, edge, or aggregation data centers with centralized management. Cisco ACI-based data centers provide solutions for physical appliances, virtualized servers with mixed hypervisors, and microservices-based applications.

With Cisco ACI, telecom operators can realize these benefits and deploy NFVI-ready data centers for the future.


References

Cisco Application Centric Infrastructure (ACI):

https://www.cisco.com/c/en/us/solutions/data-center-virtualization/application-centric-infrastructure/index.html

Cisco Virtualized Infrastructure Manager (VIM) Installation Guide:

https://www.cisco.com/c/en/us/td/docs/net_mgmt/network_function_virtualization_Infrastructure/2_2_17/Cisco_VIM_Install_Guide_2_2_17/Cisco_VIM_Install_Guide_2_2_17_chapter_00.html

Cisco ACI OpenStack Installation Guide:

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/2-x/openstack/osp_director/b_ACI_Installation_Guide_for_Red_Hat_OpenStack_Using_OSP_Director_2_3_x/b_ACI_Installation_Guide_for_Red_Hat_OpenStack_Using_OSP_Director_2_3_x_chapter_010.html

Cisco ACI Multipod White Paper:

https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-737855.html

Cisco ACI Multi-Site Architecture White Paper:

https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739609.html

Cisco ACI Release 2.3 Design White Paper:

https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-737909.html

Printed in USA C11-740717-00 05/18