Introduction

This paper compares the network architecture, configuration, deployment, management capabilities, and performance of two systems:
• HP BladeSystem servers with Virtual Connect (VC)
• Cisco Unified Computing System (UCS)
Both systems connect physical and virtual servers to LAN and SAN networks and manage those connections up to the server-network edge.
This paper does not compare HP CloudSystem Matrix to Cisco UCS. CloudSystem Matrix lets you readily provision and adjust infrastructure services to meet changing business demands. Neither Cisco UCS servers nor HP BladeSystem servers with VC alone provide that higher level of capability and services.
Hardware infrastructure

To compare HP BladeSystem with VC to Cisco UCS, we need to establish a hardware baseline. This ensures that, as much as possible, we compare function and performance between equivalent systems. Table 1 specifies the hardware (sample servers, NICs, switches, and interconnects) and the primary management software required to compare BladeSystem and UCS.
Table 1: Comparable UCS and VC BladeSystem components
Component | Cisco UCS | HP BladeSystem with VC
Enclosure | UCS 5108 blade server chassis (capacity: 8 half-height blades) | c7000 enclosure (capacity: 16 half-height blades)
Interconnects | Redundant UCS 6248UP Fabric Interconnects with 48 port connections to the FEX | Redundant VC FlexFabric 10 Gb/24-port modules with 16 x 10 Gb downlinks through the midplane and multi-enclosure stacking links between modules
Enclosure interconnects | Cisco UCS 2208XP Series Fabric Extenders (FEX) with 8 x 10 Gb uplinks to the Fabric Interconnect and 32 midplane connections to the servers | (none required; the VC FlexFabric modules connect directly to the servers through the midplane)
NICs | Cisco UCS 1280 Virtual Interface Card (VIC) with 8 x 10 GbE uplink ports | Integrated NC553i Dual Port FlexFabric Adapter with 2 x 10 Gb uplink ports
Servers | Cisco UCS B200 M2 | HP ProLiant BL460c G7
Management software | UCS Manager | Virtual Connect Manager (VCM)
Both the HP and Cisco network hardware portfolios contain more options than those listed in Table 1. The technology and hardware compared in this paper represent the newest and highest-performing options available from HP and Cisco as of October 2011. This paper explores capabilities and issues in both systems when you maximize performance and scalability. Check vendor options to find the appropriate level of scalability and performance for your business environment.
Network architecture

Cloud computing and service-oriented applications are driving today’s increasing demand for virtualization. As a result, a major shift is under way in data center traffic patterns. Server-to-server (east-west) communication generated by these new applications will likely account for up to 80% of all data center traffic by 2014, according to Gartner.
HP BladeSystem and Cisco UCS network architectures are significantly different. Figure 1 illustrates the difference in the two network architectures.
Figure 1: VC has a flatter architecture than Cisco UCS in “End Host Mode” configuration.
The UCS design is hierarchical. Most data traffic goes upstream to Layer 2 aggregation switches before heading back down to its target.
UCS Fabric Interconnects support two operating modes: Switch Mode and End Host Mode. Cisco’s best practices recommend using End Host Mode for UCS configuration, and Cisco enables this mode by default. In End Host Mode, the Cisco UCS hierarchical model uses active-active A and B fabrics (that is, two parallel topologies, each connecting network nodes through one or more switches). A dual-port NIC, dual FEX modules per enclosure, and dual fabric interconnect switches per UCS domain provide redundancy.
In End Host Mode, the A and B fabrics are isolated until traffic reaches the Layer 2 aggregation switches. This means that server-to-server traffic must travel to the Layer 2 aggregation switches and back through the FEX modules to the server. The exception here is for server-to-server traffic within the same fabric. A fabric interconnect can connect NICs associated with different servers if they are all part of the same fabric.
Because the percentage of server-to-server traffic is increasing rapidly, the UCS architecture using fabric interconnect switches increases latency, adds another hop, and can create a network traffic bottleneck. This increases the probability of network congestion due to oversubscription, and that can result in unpredictable network performance and application behavior.
In contrast, the HP BladeSystem design is flatter. With HP VC and ProLiant BladeSystem servers, the c7000 enclosure supports 16 half-height blade servers. VC FlexFabric interconnect modules support redundancy management and a fault-tolerant stacking solution. FlexFabric modules connect all LAN and SAN traffic through converged fabrics with egress to external networks through the same VC module. Server-to-server traffic stays within the VC domain. The VC domain exists within a single enclosure, or within multiple enclosures when configured as a multi-enclosure domain with stacking links. Server-to-core (north-south) traffic is in its native LAN or SAN format when it exits or enters the VC domain at the uplink ports of the VC FlexFabric interconnect modules. All network traffic connects to Layer 2 aggregation switches using appropriate industry-standard network connection protocols (native Ethernet, Fibre Channel).
Network complexity

VC FlexFabric provides a simple way to connect 16 blade servers in a single c7000 enclosure at the server edge while reducing networking sprawl. Converged LAN and SAN traffic travels from the embedded FlexFabric adapter available in ProLiant G7 server blades to the VC FlexFabric 10 Gb/24-port interconnect module using dual 10 Gb uplinks. The VC configuration includes two FlexFabric modules for redundancy. It creates a complete, redundant server-edge network.
The VC configuration requires two components, versus the 56 components that the comparable UCS configuration requires. Cisco recommends using UCS End Host Mode to connect 16 servers in a redundant configuration across two 5108 chassis, which requires 2 Fabric Interconnects, 32 cables, 4 FEX modules, and 16 VIC cards. This approach adds network hops, latency, and complexity, even for blades in the same enclosure (Figure 2). It also increases the number of possible fault events, such as down-link events caused by cable faults, and the probability of failover.
Figure 2: Best practice redundant configurations for 16 servers using all 8 X 10 Gb 1280 VIC adapter ports
Network loop protection

We continue to implement and improve network loop protection for VC as we have since introducing it in 2007. We chose loop avoidance mechanisms based on the VC Ethernet port role. VC ports can be an uplink, downlink, or stacking link. VC Ethernet uplink ports connect to external LAN switches. VC Ethernet downlink ports connect to server NIC ports. VC Ethernet stacking link ports connect to other VC Ethernet modules.
VC uplink, downlink, and stacking link loop protection

VC Ethernet uplink ports avoid network loops with external switches. An administrator associates uplink ports with a VC network (vNet). An uplink port can belong to only one vNet, while administrators can map a vNet to zero, one, or more uplink ports. No matter how many uplinks a vNet has, no more than a single path to the external network switch is ever possible. Either VC aggregates the vNet’s multiple uplink ports into a single logical port (using the Link Aggregation Control Protocol), or it ensures that a single physical port forwards traffic while the remaining uplink ports are on standby. By ensuring there is only one network path from VC to the external switch, VC avoids a network loop between VC uplink ports and the external switch ports.
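The one-path-per-vNet rule can be illustrated with a small sketch (hypothetical Python, not actual VCM internals; the function name and port labels are invented for illustration):

```python
def forwarding_paths(uplinks, lacp_supported):
    """Return the logical forwarding paths a vNet presents to the external switch."""
    if not uplinks:
        return []                   # a vNet may have zero uplink ports
    if lacp_supported:
        return [tuple(uplinks)]     # LACP collapses all uplinks into one logical port
    return [(uplinks[0],)]          # otherwise one physical port forwards, the rest stand by

# Either way, at most one logical path exists, so no loop can form
# between the VC uplink ports and the external switch.
print(forwarding_paths(["X1", "X2", "X3"], lacp_supported=True))
print(forwarding_paths(["X1", "X2", "X3"], lacp_supported=False))
```

Whichever branch runs, the vNet never exposes more than one path, which is the loop-avoidance guarantee described above.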
VC Ethernet downlink ports are edge devices. They inherently avoid network loops, provided there is no bridging between server NIC ports. If a server administrator mistakenly bridges NIC ports, it creates the opportunity for a network loop. However, the VC Network Loop Prevention feature detects and disables the bridged NIC ports.
VC stacking links avoid network loops using an internal Layer 2 loop avoidance protocol. VC stacking links form only when VC modules automatically discover other VC modules that are part of the same VC domain. VC stacking links allow a network topology to span multiple VC Ethernet modules. VC stacking links between a pair of VC modules consist of one or more cables. Multiple stacking links between the same pair of VC modules automatically form a link aggregation group or LAG. Within the VC domain, LAGs are also known as Shared Uplink Sets.
When you stack multiple VC modules together, an internal Layer 2 protocol calculates the optimal path to the “root bridge,” which is always the module with the active uplink port(s). Each stacked module seeks one optimal network path to the root bridge. Any alternate paths block network traffic, preventing network loops within the VC domain. Each network topology has only one root bridge at a time. Network topologies without uplinks usually assign the VC module with the lowest MAC address as the root bridge. The VC stacking link’s loop avoidance protocol only transmits from stacking link ports, and never from downlink or uplink ports.
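The root-bridge rule above can be sketched as follows (illustrative Python only; VC’s internal Layer 2 protocol is not public, and the function and MAC values here are invented):

```python
def elect_root_bridge(modules):
    """modules: list of (mac_address, has_active_uplink) pairs.
    A module with active uplink port(s) becomes the root bridge;
    with no uplinks anywhere, the lowest MAC address wins."""
    with_uplinks = [mac for mac, active in modules if active]
    if with_uplinks:
        return with_uplinks[0]
    return min(mac for mac, _ in modules)

stack = [("00:17:a4:77:00:02", False),
         ("00:17:a4:77:00:03", True),
         ("00:17:a4:77:00:01", False)]
print(elect_root_bridge(stack))  # the module with the active uplink wins
```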
UCS End Host Mode loop protection

Cisco UCS is a relative newcomer to the production and management of server-optimized network edge devices. UCS uses more network tiers than VC, but Cisco adopted network loop protection tools and techniques very similar to those of VC. In Cisco’s recommended End Host Mode, fabric interconnects provide Layer 2 switching to uplinks, but this mode pins server NICs to a specific fabric interconnect uplink. No local switching occurs between uplinks in End Host Mode. While the absence of local switching between fabric interconnect uplinks prevents looping, the UCS architecture also prevents server-to-server traffic from flowing between fabric interconnects. Instead, server-to-server traffic must travel to upstream switches for routing back to the target server. So while the tools that VC and UCS use for loop protection are similar, their implementation and architecture are significantly different. The flatter VC architecture achieves loop protection while still allowing server-to-server traffic to stay within the VC domain.
Congestion control in converged networks

We have seen that server-to-server data traffic in VC domains stays within the enclosure. The following factors minimize network congestion for server-to-server traffic within the VC domain:
• Cross-sectional bandwidth performance of 60 Gb/s in half-height servers and 100 Gb/s in full-height servers, per server in each direction (full duplex)
• Up to 1 terabit per second (Tb/s) communication across the enclosure midplane
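These per-server figures follow from simple port arithmetic: each FlexFabric adapter provides two 10 Gb ports, half-height blades take up to three adapters (one LOM plus two mezzanine), and full-height blades take up to five. A quick check:

```python
PORT_GBPS = 10          # each FlexFabric adapter port runs at 10 Gb/s
PORTS_PER_ADAPTER = 2   # FlexFabric adapters are dual-port

def per_server_gbps(adapter_count):
    """Cross-sectional bandwidth per server, per direction (full duplex)."""
    return adapter_count * PORTS_PER_ADAPTER * PORT_GBPS

print(per_server_gbps(3))  # half-height blade (1 LOM + 2 mezzanine): 60 Gb/s
print(per_server_gbps(5))  # full-height blade (2 LOMs + 3 mezzanine): 100 Gb/s
```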
When native LAN and SAN traffic leaves the VC FlexFabric interconnect, external network congestion control and QoS protocols manage LAN traffic. As vendors extend FCoE converged networks beyond
the server-network edge and into multi-hop configurations, congestion control becomes a significant issue.
The IEEE ratified the 802.1Qau-2010 standard, also known as Quantized Congestion Notification (QCN), in March 2010. HP actively participated in and drove the ratification effort. QCN is one of the most significant standards for creating end-to-end converged data center networks that carry both LAN and SAN traffic. While the existing Priority-based Flow Control (PFC) standard protects against link-level congestion, QCN addresses the end-to-end, switched, converged network infrastructure. It is a multi-hop protocol designed to protect the network against congestion caused by persistent oversubscription. To enable QCN in a network, the entire data path, including converged network adapters and switches, must support QCN. QCN does not guarantee a lossless environment in the LAN; it must work in conjunction with PFC to avoid dropping packets.
The purpose of the QCN standard is to address current and future data center network densities and to create a solution that can keep pace with network growth. QCN-compliant, end-to-end network hardware is not yet broadly available, so HP has not yet implemented QCN.
Cisco does not support QCN. Cisco claims that QCN is not a requirement for deploying an end-to-end FCoE network. Instead, the Cisco multi-hop architecture includes full Fibre Channel switching services such as Distributed Name Server, Fabric Shortest Path First (FSPF) routing protocol, and zoning in each fabric interconnect or data center switch. This Fibre Channel switching relies entirely on DCB-standard PFC-based flow control. As a result, there are unanswered questions about congestion control for multi-hop converged networks:
• Will Cisco and other industry vendors adopt QCN? When will this happen?
• More important, how will Cisco interoperate with networks that support QCN?
• Does Cisco’s multi-hop architecture, with its inclusion of Fibre Channel switching-based congestion control, add latency?
• Is the legacy FSPF protocol, developed for native FC networks, adequate to ensure end-to-end congestion management in arbitrarily complex data center networks?
• Does the Cisco multi-hop architecture require a Cisco-only solution for customers?
If you choose to implement the QCN standard, the next generation of HP VC hardware will offer end-to-end QCN support and will integrate with other QCN-compliant network infrastructures.
Configuration and deployment capability

VC and UCS configure and deploy servers and establish network connections in similar ways. You can use Virtual Connect Manager (VCM) to set up VC server profiles. The VCM GUI includes wizards that automate configuration and deployment. If you prefer a command line environment, you can use the VCM CLI management capability to create VC server profiles.
The UCS Service Profile functions in much the same way as the VC server profile. It uses UCS Manager instead of VCM. Some of the major differences between the two occur in the approaches to automation.
Virtual Connect Server Profile

A VC server profile defines the linkage between the server and the networks and fabrics defined in VC. The server profile includes MAC addresses, WWN addresses, and boot parameters for the connection protocols that VC supports (Ethernet, iSCSI, FCoE, and Fibre Channel). VCM supports up to 256 profiles within the domain.
After defining a server profile, you can use VCM to assign the VC server profile to a specific enclosure device bay within the VC domain. VCM configures the server blade in that device bay with
the appropriate MAC, PXE, WWN, and SAN boot settings and connects the appropriate networks and fabrics.
VCM lets you use the iSCSI Boot Assistant to configure a server to boot from a remote iSCSI target as part of the VC server profile. This simplifies 90% of what can otherwise be a manual, error-prone process. Once you identify the iSCSI target, VCM automates most of the setup work and retrieves storage parameters directly from the target. The iSCSI Boot Assistant then attaches the boot configuration parameters to the server profile.
You can move VC server profiles between Virtual Connect domains as long as the servers remain physically connected to the same networks.
Cisco Service Profile

Cisco considers UCS servers to be “stateless.” Cisco defines stateless as the ability “to use a service profile to apply the personality of one server to a different server in the same Cisco UCS instance.” Keep in mind that Cisco’s definition of stateless computing differs from the industry’s broader definition.
The UCS Service Profile defines the server personality:
• Firmware versions
• UUID (used for server identification)
• MAC address (used for LAN connectivity)
• WWNs (used for SAN connectivity)
• Boot settings
The UCS Service Profile contains user-configurable settings related to the identifying information for that physical server. These settings include the network and storage configuration, the RAID configuration, and the firmware versions used for the BIOS, NIC adapters, and HBA adapters. Cisco claims to deliver at least eight times the number of managed settings in a UCS Service Profile as HP delivers in a VC server profile. Cisco claims that providing more settings makes UCS more robust, more effective, and ultimately simpler for adding and moving workloads.
In reality, many of the user-configurable parameters in the UCS Service Profile are BIOS-related. Administrators rarely modify them in most server deployments, so the Cisco claim of automating parameters that are not part of a typical server configuration is largely irrelevant. Exposing these seldom-used parameters also suggests a need to adjust BIOS settings and NIC behavior, edit storage controller settings, and use separate media, applications, and processes for firmware updates, all of which can add complexity to the configuration process.
Server adapter and PCIe bus scalability

When you consider configuration and deployment options for HP BladeSystem servers, it is important to understand the possible FlexFabric adapter and HBA configurations. In the half-height ProLiant BL460c G7 server, you can use the integrated FlexFabric adapter and add two more FlexFabric adapters using the available mezzanine connections. Supported full-height G7 servers have two integrated FlexFabric adapters (two LOMs) and three mezzanine options, allowing a total of five FlexFabric adapters in a single server. None of these adapter configurations causes oversubscription in VC domain server-to-server traffic. The enclosure midplane and the VC interconnect midplane connections have the throughput to handle all 10 Gb ports on the FlexFabric adapters without oversubscription. You can also configure Fibre Channel and iSCSI networks on the same server by using optional FlexFabric adapters. These different storage fabrics work simultaneously and without conflict on the same server.
Even though the UCS 1280 VIC has 80 Gb of uplink capability (8 x 10 Gb), the PCIe 2.0 bus in Cisco UCS servers has an effective throughput of about 64 Gb. You should consider this limitation in all performance calculations. The PCIe 2.0 constraint is an industry-wide hardware limitation. With UCS, you may be investing in network bandwidth that you cannot use. In contrast, HP VC adapters do not configure 8 x 10 Gb ports against x8 or x16 PCIe 2.0 connections. We balance HP VC adapters at 2 x 10 Gb with an x8 PCIe Gen 2 bus connection. Consult hardware manufacturers’ roadmaps for future generations of PCIe server bus implementations.
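The 64 Gb figure follows from standard PCIe 2.x arithmetic (not vendor data): 5 GT/s per lane with 8b/10b encoding leaves 4 Gb/s of payload per lane per direction.

```python
GT_PER_LANE = 5.0             # PCIe 2.x signaling rate: 5 gigatransfers/s per lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b encoding carries 8 payload bits per 10 line bits

def pcie2_gbps(lanes):
    """Effective one-way payload throughput of a PCIe 2.x link, in Gb/s."""
    return lanes * GT_PER_LANE * ENCODING_EFFICIENCY

print(pcie2_gbps(16))  # x16 slot: 64.0 Gb/s -- below the VIC 1280's 80 Gb of ports
print(pcie2_gbps(8))   # x8 slot: 32.0 Gb/s -- ample headroom for a 2 x 10 Gb adapter
```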
Performance and scalability

The server-network architecture for VC differs significantly from that of Cisco UCS. An HP enclosure with 16 half-height BladeSystem servers, each with an embedded dual-port 10 Gb (20 Gb total) FlexFabric adapter and two optional mezzanine FlexFabric adapters, has a total bandwidth of 960 Gb. The usable aggregate bandwidth between device bays and interconnect bays is up to 1 Tb/s across the midplane. As a result, VC can support server-to-server communication within the enclosure at midplane speed and with no oversubscription. Keeping all server-to-server traffic within the enclosure also means that the subscription rate for server-to-server traffic stays the same as VC BladeSystem installations scale.
In comparison, all UCS network traffic below the Layer 2 aggregation switches is server-to-core. With workloads using maximum bandwidth, UCS can avoid periodic congestion or oversubscription issues only by overprovisioning the network. Overprovisioning compensates for bandwidth issues caused by forcing all network traffic upstream to Layer 2 aggregation switches. Rates of UCS oversubscription in maximized configurations have doubled with the release of the 6248UP Fabric Interconnect, 2208XP FEX, and 1280 VIC, because Cisco doubled the bandwidth of both the 2208 FEX and the 1280 VIC. Figure 3 shows the problem with this approach in fully utilized configurations. With all eight 10 Gb ports of the UCS 1280 VIC in use on each of the enclosure’s eight servers, two UCS 2208 FEX modules can accommodate only 160 Gb of the 640 Gb aggregate bandwidth. In an active-active configuration, this results in a 4:1 oversubscription of server-to-server traffic.
Figure 3: This single, fully utilized UCS enclosure shows a 4:1 oversubscription in an active-active configuration and a 2:1 oversubscription in an active-standby configuration.
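The 4:1 figure can be reproduced from the port counts alone, as a back-of-the-envelope check:

```python
# Fully utilized UCS 5108 chassis with active-active fabrics
servers = 8          # half-height blades per 5108 chassis
vic_ports = 8        # 10 Gb ports per UCS 1280 VIC
fex_modules = 2      # 2208XP FEX modules per chassis
fex_uplinks = 8      # 10 Gb uplinks per FEX to the fabric interconnect
port_gbps = 10

demand = servers * vic_ports * port_gbps        # aggregate bandwidth from the blades
supply = fex_modules * fex_uplinks * port_gbps  # aggregate bandwidth out of the chassis

print(f"{demand} Gb demand / {supply} Gb uplink = {demand // supply}:1 oversubscription")
```

By the same arithmetic, a VC enclosure's 960 Gb of server bandwidth fits under the 1 Tb/s midplane, so server-to-server traffic in the VC case remains unoversubscribed.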
The lack of available port connections between the FEX and fabric interconnect modules also affects scalability in UCS systems. Figure 4 shows that with paired 2208XP FEX modules and 6248UP fabric interconnect modules, the maximum number of fully utilized chassis drops from the first to the second generation of UCS hardware.
Figure 4: Fabric interconnect port limitations on fully utilized UCS server and chassis configurations drop scalability by almost half with the 2nd generation UCS hardware.
You should keep in mind that in most cases a lack of scalability is the cost for increased bandwidth or decreased oversubscription. The comparisons presented in this paper show what happens to scalability when you maximize performance in VC and UCS systems.
Figure 5 compares the bandwidth and subscription rates for server-to-core traffic in VC and UCS network architectures. In this comparison, we have configured the HP and Cisco servers for 40 Gb/s of network I/O. The cable count for the HP configuration is slightly lower between the interconnect modules and Layer 2 aggregation switches. In this example, HP VC has a 2:1 oversubscription rate compared to a 10:1 oversubscription rate for UCS.
Figure 5: This is a comparison of north-south oversubscription between VC and UCS.
Management

We designed Virtual Connect Enterprise Manager (VCEM) as the primary management tool for multiple VC domains. VCEM is a highly scalable software solution. It centralizes network connection management and workload mobility for thousands of servers that use VC to connect to data and storage networks. VCEM uses the same profile format, content, and general operations as VCM.
VCEM provides these core capabilities:
• A single intuitive console that controls up to 250 VC domains (up to 1,000 BladeSystem enclosures and 16,000 servers) in VC multi-enclosure domain configurations
• The ability to define and manage server profiles for multiple VC domains from a central management interface
• A central repository that administers more than 256K MAC addresses and WWNs for server-to-network connectivity, simplifying address assignments and eliminating the risk of conflicts
• Group-based management of VC domains using common configuration profiles that increase infrastructure consistency, limit configuration errors, simplify enclosure deployment, and enable configuration changes pushed to multiple VC domains
• Scripted and manual movement of server connection profiles and associated workloads between BladeSystem enclosures so that you can add, change, and replace servers across the data center in minutes without affecting production or LAN and SAN availability
• Automated failover of server connection profiles to user-defined spare servers
• Discovery and aggregation of existing VC domain resources into the VCEM console and address repository
• Licenses per c-Class enclosure, simplifying deployment and support
In the Cisco hierarchical management approach, UCS Manager centralizes management of all software and hardware components across multiple chassis and VMs. You access Cisco UCS Manager through GUI, CLI, or XML API interfaces.
The Cisco UCS “fewer steps” approach to deployment can be an issue in configurations with multiple policy pools, blade types, firmware types, and VMware clusters, where all hardware requires a Service Profile documenting a complex data set. UCS Manager can span only a pair of fabric interconnects. Also, you cannot share UCS Service Profiles across different logical systems without using third-party software, with the associated costs of the software and training.
HP provides system management beyond the scope of UCS manager. As you go up the HP solution stack, you will find a complete management solution with our Matrix Operating Environment, data center service orchestration, and self-service catalogs. Cisco must turn to partner and third-party software to provide this level of service.
Conclusion

Cisco UCS technology mimics many established technologies that form the foundation of HP Virtual Connect. You can find Flex-10 and FlexFabric-like features and capabilities in the UCS Virtual Interface Card 1280. The UCS Service Profile borrows heavily from VC server profile capabilities. But significant differences become evident when you compare the UCS and VC approaches to architecture and interoperability in the data center.
We base HP VC on open industry-standard technologies. Cisco UCS promotes standards and architecture that force customers into proprietary, Cisco-only solutions.
HP VC creates a flat architecture for server-to-server traffic, moving that traffic through high-bandwidth midplane connections within the enclosure. UCS forces all server-to-server traffic upstream to Layer 2 aggregation switches before routing it back down to its target.
VC has no server-to-server oversubscription. It does not burden processors with the overhead of unneeded QoS processes. High oversubscription on UCS server-to-server traffic requires QoS mechanisms to deal with network bottlenecks and reduced performance.
Mature, tested VC technology offers clear advantages over the emerging UCS technology for managing rapidly growing data centers with a heterogeneous mix of server and network solutions.