
FEATURE GUIDE

Why Fibre Channel Is the NVMe Fabric of Choice

53-1004983-01
31 March 2017


© 2017, Brocade Communications Systems, Inc. All Rights Reserved.

Brocade, the B-wing symbol, and MyBrocade are registered trademarks of Brocade Communications Systems, Inc., in the United States and in other countries. Other brands, product names, or service names of Brocade Communications Systems, Inc. mentioned in this document are listed at www.brocade.com/en/legal/brocade-Legal-intellectual-property/brocade-legal-trademarks.html. Other marks may belong to third parties.

Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes to this document at any time, without notice, and assumes no responsibility for its use. This informational document describes features that may not be currently available. Contact a Brocade sales office for information on feature and product availability. Export of technical data contained in this document may require an export license from the United States government.

The authors and Brocade Communications Systems, Inc. assume no liability or responsibility to any person or entity with respect to the accuracy of this document or any loss, cost, liability, or damages arising from the information contained herein or the computer programs that accompany it.

The product described by this document may contain open source software covered by the GNU General Public License or other open source license agreements. To find out which open source software is included in Brocade products, view the licensing terms applicable to the open source software, and obtain a copy of the programming source code, please visit http://www.brocade.com/support/oscd.


Contents

Preface
    Audience
    Related Documents
    About Brocade
    Document History

Overview of NVMe over Fabrics

Questions Arising from NVMe over Fabrics Fanfare
    Q1: Is Fibre Channel a recognized fabric for NVMe? Yes.
    Q2: Is Ethernet just as good as Fibre Channel for NVMe? No.
    Q3: Is RDMA key to an NVMe fabric? No.
    Q4: Is SCSI the only native Fibre Channel protocol? No.
    Q5: Is a translation layer bad for NVMe? It depends.
    Q6: Can Fibre Channel do "zero copy"? Yes.

Some Background on Remote DMA
    Q7: Is RDMA needed for zero copy on IP? No.
    Networked RDMA Experiences a Rocky Patch

NVMe Begins
    NVMe's Advantage over SCSI and ATA
    NVMe Looks to Expand
    RDMA Requires an iWARP Patch

Ethernet and IP Add Risk to Enterprise Storage
    Complex Protocol Stacks Are Suboptimal for Storage
    Multilayer Flow Control Is Problematic
    Complex Stacks Mean Complex Configuration
    New Stacks Create New Security Targets
    Disrupting the Storage Ecosystem Imperils Support
    A Parallel Ethernet Infrastructure Means Risks

Takeaway: Use a Dual-Protocol Fibre Channel SAN


Preface

This feature guide compares the differences between NICs with and without RDMA, between iSCSI with and without RDMA, and between NVMe over Fabrics and NVMe over Fibre Channel.

Audience

This guide is for technical IT architects and administrators who are directly or indirectly responsible for SAN design or administration.

Related Documents

For further information on NVMe over Fibre Channel, refer to www.brocade.com/nvme.

About Brocade

Brocade® (NASDAQ: BRCD) networking solutions help the world's leading organizations transition smoothly to a world where applications and information reside anywhere. This vision is designed to deliver key business benefits such as unmatched simplicity, non-stop networking, application optimization, and investment protection.

Innovative storage networking solutions help reduce complexity and cost while enabling virtualization, solid-state storage, and private cloud computing to increase business agility.

To help ensure a complete solution, Brocade partners with world-class IT companies and provides comprehensive education, support, and professional services offerings (www.brocade.com).

Document History

Date          Version    Description
March 2017    1.0        Initial version.


Overview of NVMe over Fabrics

SCSI-based all-flash and hybrid arrays are going mainstream in the data center, driving enterprise storage to new levels of performance and prompting a reassessment of performance bottlenecks. Meanwhile, Non-Volatile Memory Express (NVMe), a PCI Express (PCIe) standard that is purpose-built for solid-state PCIe modules, has emerged as a new high-performance interface for server-attached flash. NVMe's low latency and enhanced queuing provide better random and sequential performance and increased parallelism to applications than traditional protocols like Serial Attached SCSI (SAS). To support data-center-scale networked storage, the NVMe standard is also being extended beyond the PCIe bus via a new NVMe over Fabrics specification that positions NVMe as a high-performance challenger to SCSI's dominance in the SAN. NVMe over Fabrics maps NVMe over various transport options, including Fibre Channel, InfiniBand, RoCEv2, and iWARP. This paper will compare Fibre Channel to the other NVMe over Fabrics options.


Questions Arising from NVMe over Fabrics Fanfare

NVMe was developed to provide a common, high-performance interface for PCIe non-volatile memory modules (for example, flash modules). As NVMe gained traction, the desire to extend the scale grew, and thus the NVMe over Fabrics specification was begun, positioning NVMe as an alternative to SCSI in the SAN space and opening a door for flash module vendors to address a new market. Naturally, the storage market newcomers sought to tout their technology as having advantages, and a bit of marketing spin was predictable. Similarly, disadvantages of the new technologies may not have received much attention. Not surprisingly, some misunderstandings have arisen. This paper will resolve several questions that have been raised.

Q1: Is Fibre Channel a recognized fabric for NVMe? Yes.

As called out in the overview to this paper, Fibre Channel is, indeed, one of the fabrics that supports NVMe. The NVM Express white paper web page features an overview on NVMe over Fabrics that explicitly lists two types of fabric transports for NVMe: those using RDMA and one using Fibre Channel. Though some competitive advocates will claim that Fibre Channel is not a legitimate NVMe fabric, the NVM Express white paper resolves that question.

Q2: Is Ethernet just as good as Fibre Channel for NVMe? No.

The same white paper that explicitly lists Fibre Channel as one of the NVMe over Fabrics options also describes the ideal underlying fabric as having a reliable, credit-based flow control and delivery mechanism. It then reminds us that credit-based flow control is native to Fibre Channel, InfiniBand, and PCI Express transports. Credit-based flow control is not part of Ethernet/IP networks, so the white paper is effectively telling us that Fibre Channel is actually a better fabric for NVMe than either of the Ethernet-based fabrics, iWARP or RoCE.
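To make the credit-based model concrete, here is a minimal Python sketch; it is purely illustrative, and the class, the credit value, and the simplified credit-return step are assumptions for the example rather than a Fibre Channel implementation. A sender transmits only while it holds buffer-to-buffer credits, and the receiver returns one credit per freed buffer, so frames are never dropped for lack of receive buffers.

from collections import deque

class CreditedLink:
    """Toy model of credit-based, lossless link-level flow control."""

    def __init__(self, bb_credit):
        self.credits = bb_credit      # credits granted when the link comes up (assumed value)
        self.rx_buffers = deque()     # receive buffers reserved on the far side

    def send(self, frame):
        if self.credits == 0:
            return False              # no credit: hold the frame, never drop it
        self.credits -= 1
        self.rx_buffers.append(frame) # the frame lands in a guaranteed buffer
        return True

    def receiver_frees_buffer(self):
        self.rx_buffers.popleft()     # receiver has processed a frame...
        self.credits += 1             # ...and returns the credit to the sender

link = CreditedLink(bb_credit=2)
print([link.send(f"frame{i}") for i in range(3)])   # [True, True, False]: third frame must wait
link.receiver_frees_buffer()
print(link.send("frame2"))                          # True: a returned credit lets it proceed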

Q3: Is RDMA key to an NVMe fabric? No.

RDMA advocates will claim that somehow RDMA is important to a good NVMe fabric. But notice that the NVMe over Fabrics white paper does not list RDMA as an important attribute of the "ideal" NVMe fabric transport. There's nothing extraordinary about RDMA; it's just another way to implement an NVMe fabric transport. The InfiniBand community was deeply invested in RDMA (more below on RDMA) and was closely tied to the PCI community where NVMe development began. But NVMe itself is not dependent on RDMA, and neither is NVMe over Fabrics.


Q4: Is SCSI the only native Fibre Channel protocol? No.

One approach of RDMA advocates is to compare the latency of NVMe over Ethernet/IP to "Fibre Channel" latency. That's like comparing IP to Ethernet, because NVMe is an upper layer protocol and Fibre Channel is a link layer protocol. The full comparison is NVMe over Ethernet versus SCSI over Fibre Channel, which is a valid comparison if it is described correctly. Now, as Fibre Channel experts are aware, SCSI on Fibre Channel has been given the (somewhat confusing) name of Fibre Channel Protocol (FCP), and more than one newcomer has mistakenly assumed that all Fibre Channel traffic must be FCP. But FCP is not the same as Fibre Channel; it is merely one FC-4 (upper layer) protocol, similar to FICON (a mainframe storage protocol), that can be carried by the Fibre Channel transport. The RDMA advocates are happy to suggest that Fibre Channel is a SCSI-only transport, with SCSI-based latency baggage.

One misunderstanding that has arisen is that NVMe runs on Fibre Channel only after first being translated into underlying SCSI (FCP). Quite possibly, this misunderstanding was prompted by the same white paper, which tells us that the ideal NVMe transport should allow clients "to send and receive native NVMe commands directly to and from the fabric without having to use a translation layer such as SCSI." That makes sense, since NVMe is latency-optimized and a translation layer would introduce latency. Well, it turns out that Fibre Channel does transport NVMe natively; no translation is required for Fibre Channel to transport NVMe. The implementation of NVMe over Fibre Channel defines a new upper layer traffic type, FC-NVMe, that identifies NVMe-specific frames. So once again, Fibre Channel qualifies as having this ideal fabric attribute.

But some clarification is in order! The developers of the FC-NVMe standard recognized that there would be huge value in offering concurrent support for both NVMe and SCSI traffic on the same infrastructure. They also recognized that this would be done most efficiently by leveraging existing frame types, such as I/O frames. The FC-NVMe standard specifies that the NVMe over Fibre Channel implementation will use the same I/O frame type that FCP uses. So a connection running NVMe over Fibre Channel that is captured and analyzed will show a mix of FC-NVMe and FCP frame types.

Fibre Channel's long usage as a multiprotocol fabric is a good indication that a Fibre Channel SAN will simultaneously support SCSI and NVMe very reliably.
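As a rough illustration of the frame-type mix described above, the sketch below counts frames in a decoded capture by their FC-4 TYPE code. The code values used (0x08 for FCP/SCSI, 0x28 for FC-NVMe) and the minimal byte-offset decoding are stated as assumptions for the example, not a full frame parser.

FC4_TYPE_NAMES = {0x08: "FCP (SCSI)", 0x28: "FC-NVMe"}   # assumed TYPE code values

def classify_by_fc4_type(frames):
    """frames: iterable of raw FC frames; byte 8 of the 24-byte header carries the TYPE field."""
    counts = {}
    for frame in frames:
        name = FC4_TYPE_NAMES.get(frame[8], "other (0x%02x)" % frame[8])
        counts[name] = counts.get(name, 0) + 1
    return counts

# Two fabricated 24-byte header stubs, just to exercise the classifier:
fcp_frame  = bytes(8) + bytes([0x08]) + bytes(15)
nvme_frame = bytes(8) + bytes([0x28]) + bytes(15)
print(classify_by_fc4_type([fcp_frame, nvme_frame, nvme_frame]))   # {'FCP (SCSI)': 1, 'FC-NVMe': 2}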

Q5: Is a translation layer bad for NVMe? It depends.

The NVMe over Fabrics white paper says that an ideal attribute of an NVMe fabric transport is that it does not require a translation layer. From the perspective of a from-scratch implementation that is laser-focused on the lowest latency, having a translation layer to convert, for example, SCSI to NVMe would be suboptimal. It would be more efficient to write applications to use NVMe directly and avoid the translation step, which would add undesired clock cycles of latency to every I/O. And so, clearly, the ideal fabric should not require translation, and, as mentioned, Fibre Channel does support NVMe natively, that is, without translation. At the same time, the NVM Express community recognized the importance of all the deployed SCSI-based applications when they developed the SCSI Translation Reference. That resource was meant to be used by application developers re-architecting their products to use NVMe. But many potential users of NVMe are not in a position to re-architect the applications that they run. They want the option of moving to an NVMe infrastructure on their own terms without being dependent on their application vendors. From that perspective, having a translation layer available, as an option, is actually good for NVMe adoption. NVMe over Fibre Channel offers the best of both worlds, with HBA vendors providing drivers that offer SCSI-to-NVMe translation when desired, while also providing native NVMe support for applications that are designed to use it.
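As a sketch of what such a translation layer does, the hypothetical helper below rewrites a SCSI READ(10) CDB as the fields of an NVMe Read command. It is loosely modeled on the kind of mapping the SCSI Translation Reference describes; the function name, the fixed namespace ID, and the omission of error handling are assumptions for the example.

import struct

def scsi_read10_to_nvme_read(cdb: bytes, nsid: int = 1) -> dict:
    """Map a SCSI READ(10) CDB onto NVMe Read command fields (illustrative only)."""
    assert cdb[0] == 0x28, "READ(10) opcode expected"
    lba = struct.unpack(">I", cdb[2:6])[0]       # 32-bit starting LBA, big-endian
    nlb = struct.unpack(">H", cdb[7:9])[0]       # 16-bit transfer length, in blocks
    return {
        "opcode": 0x02,                          # NVMe I/O command set: Read
        "nsid": nsid,                            # target namespace (assumed fixed here)
        "cdw10": lba & 0xFFFFFFFF,               # starting LBA, low 32 bits
        "cdw11": 0,                              # starting LBA, high 32 bits
        "cdw12": (nlb - 1) & 0xFFFF,             # number of logical blocks, zero-based
    }

# READ(10) of 8 blocks starting at LBA 0x1000:
cdb = bytes([0x28, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x08, 0x00])
print(scsi_read10_to_nvme_read(cdb))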


Q6: Can Fibre Channel do "zero copy"? Yes.

When the IP stack was being developed during the 1980s, it was designed to work with many upper layer protocols and many Layer 2 networks, from Token Ring to phone lines. A clean separation of networking layers made perfect sense for interoperability, and one way to achieve that was the use of intermediate buffering, making buffer copies commonplace. As speeds increased, however, most buffer copies were optimized away, except where it would break backward compatibility.

In the early 1990s, a good networking stack could offer single-copy efficiency. A Network Interface Card (NIC) received frames and wrote them (using DMA) into DRAM buffers associated with the networking stack; and then the stack would process the frame, determine which application should get the payload, and copy it to the application's DRAM buffer. (Since the NIC DMA step is not a DRAM-to-DRAM copy, it is not counted.) At that time, this single-copy architecture was optimal. See Figure 1.

FIGURE 1 Traditional TCP/IP Does One Copy

But in the mid-1990s, as Fibre Channel was being productized, the game was changing. Fibre Channel's main claim was speed, so the pressure to optimize was high. Chip technology allowed for more complexity, and the FC/SCSI stack had fewer layers and was not constrained by the same backward compatibility challenges that IP stacks faced. Fibre Channel was therefore well-positioned to implement an adaptor/driver/stack architecture that eliminated the single copy. And so it did. When the application requests a storage I/O, it specifies a buffer address in the form of a "logical address range," which is then translated into a physical address range for DMA purposes. At times, a logical range will map to multiple physical blocks, so the HBA is designed to support a Scatter-Gather List (SGL) to write the payloads. Fibre Channel has quietly been delivering "zero copy" for the past two decades. See Figure 2.


FIGURE 2 Fibre Channel Does Zero Copy
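The sketch below illustrates why the HBA needs a Scatter-Gather List for zero copy: a contiguous logical buffer can map to non-contiguous physical pages, and the adapter DMAs payload directly into each (physical address, length) segment with no intermediate copy. The page size, page table, and addresses are invented for the example.

PAGE = 4096   # assumed page size

def build_sgl(logical_addr, length, page_table):
    """page_table: hypothetical map of logical page number -> physical page base address."""
    sgl, vaddr, remaining = [], logical_addr, length
    while remaining > 0:
        page_offset = vaddr % PAGE
        seg_len = min(PAGE - page_offset, remaining)
        sgl.append((page_table[vaddr // PAGE] + page_offset, seg_len))  # one DMA segment
        vaddr += seg_len
        remaining -= seg_len
    return sgl

# A 10 KiB logical buffer starting mid-page, scattered across three physical pages:
page_table = {0: 0x80000, 1: 0x3F000, 2: 0x12000}
for phys, seg_len in build_sgl(logical_addr=2048, length=10240, page_table=page_table):
    print(hex(phys), seg_len)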


Some Background on Remote DMA

A few years after Fibre Channel went mainstream, two industry efforts, Future I/O and Next Generation I/O, merged to form InfiniBand. InfiniBand focused on Remote DMA (RDMA) as a higher level protocol for improving the performance of server cluster communications, and both protocols gained traction in high-performance computing (HPC) environments. Relative to earlier protocols, RDMA reduced latency for data transfer between servers, particularly when the data involved is highly dynamic (e.g., the result of computations). RDMA passes a Scatter-Gather List from the local server to the remote server, effectively sharing ownership of local memory with the remote server, enabling the remote server to directly read or write the local server's memory. InfiniBand, like Fibre Channel before it, was also in a situation (the availability of complex chip technology, no backward compatibility constraints) where zero-copy efficiency was practical.

Q7: Is RDMA needed for zero copy on IP? No.

As RDMA gained popularity in the server cluster context, efforts were made to extend it across networks, leading in 2007 to the Internet Wide Area RDMA Protocol (iWARP) standard, formally documented in five Internet Engineering Task Force (IETF) Requests For Comment (RFCs 5040–5044). iWARP was built on top of TCP, the streaming transport protocol that uses acknowledgments and retransmits when needed to provide reliable delivery. TCP also includes a "windowing" algorithm to throttle transmission in order to avoid exceeding the capacity of the network between the sender and the receiver. (The "streaming" aspect of TCP lets it gather sequential payload chunks and send them together, meaning that receivers must process all early chunks to understand where later chunks begin.) The first RFC, 5040, describes how RDMA uses the Direct Data Placement (DDP) protocol to achieve the zero-copy efficiency of Fibre Channel and InfiniBand. The last RFC, 5044, Marker PDU Aligned Framing for TCP Specification, effectively disables the "coalescing" behavior of TCP so that a receiving NIC can process chunks more easily, enabling practical hardware-based support of DDP.

The aforementioned RFCs provided the basis for zero-copy efficiency, but traditional NICs did not have TCP processing capabilities. Software implementations offer interoperability, but do not deliver the promised RDMA performance. For that, new NICs, called TCP Offload Engines (TOEs), were required; and even early TOEs were not adequate for iWARP, and so RDMA-enabled TOEs were developed that could implement DDP in hardware. Such TOEs are able to deliver the same zero-copy efficiency as Fibre Channel. See Figure 3.


FIGURE 3 iWARP and RoCE Achieve Zero Copy

Networked RDMA Experiences a Rocky Patch

Around 2009, NVMe was gaining mindshare just as the InfiniBand market was running low on steam, and IETF's TRansparent Interconnection of Lots of Links (TRILL) and IEEE's Data Center Bridging (DCB) were gaining momentum as a way to make Ethernet a lossless fabric. (TRILL was created to enable any-to-any Ethernet topologies not supported by IEEE's Spanning Tree Protocol. DCB incorporated Priority-based Flow Control, Enhanced Transmission Selection, and Data Center Bridging eXchange.)

The InfiniBand Trade Association (IBTA) saw an opportunity to shift gears and leverage its expertise in RDMA in a new technology space, and so they developed the RDMA over Converged Ethernet (RoCE, pronounced "rocky") specification ("Converged Ethernet" was an early term for DCB). Just as iWARP requires specialized TOEs to deliver zero-copy efficiency, RoCE depends on RDMA-enabled NICs (RNICs) to achieve this performance. The IBTA has touted RoCE as higher performance than iWARP, pointing out that TCP, the foundation of iWARP, was not the ideal protocol for low-latency communications, in part because of TCP's "slow-start" behavior, which kicks in when a connection is initiated or when a connection has been idle for an extended period. Because Ethernet does not offer TCP's reliable transport capability, the RoCE standard implements that capability at a higher layer in the stack. At the time RoCE was launched, IPv4 address constraints were top of mind and TRILL was promising to radically expand the scale of Layer 2 Ethernet networks and thus IP subnets. The IBTA apparently felt that RoCE had everything it needed to deliver large-scale, high-performance RDMA.

Instead, hyperscale players and Software-Defined Networking promoters carried the day with "Layer 3 to Top of Rack," resulting in rack-sized IP subnets and prompting the IBTA to create RoCEv2 (sometimes called "Routable RoCE"). Unlike TCP-based iWARP, RoCEv2 runs on top of UDP, which has no slow-start throttling behavior. Moving to UDP means that RoCEv2 frames are not compatible with RoCEv1 frames (although RDMA-enabled NICs that support RoCEv2 can usually be configured to use the RoCEv1 format). Because UDP lacks TCP's support for IETF's Explicit Congestion Notification (aka ECN, RFCs 3168, 4301, 6040), the IBTA specified that RoCEv2 support IETF's ECN as well, implementing the flow control in the IB transport layer above UDP.
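A small sketch of that incompatibility: RoCEv1 rides directly in an Ethernet frame identified by its own EtherType, while RoCEv2 is carried as a UDP/IP packet and can therefore be routed. The identifier values below (EtherType 0x8915, UDP destination port 4791) are the commonly published ones; treat the toy classifier itself as an assumption for illustration.

def classify_roce(ethertype, ip_proto=None, udp_dst_port=None):
    """Distinguish RoCE flavors from already-decoded header fields (illustrative only)."""
    if ethertype == 0x8915:
        return "RoCEv1: RDMA headers directly over Ethernet (Layer 2 only, not routable)"
    if ethertype == 0x0800 and ip_proto == 17 and udp_dst_port == 4791:
        return "RoCEv2: RDMA headers inside UDP/IP (routable across subnets)"
    return "not RoCE"

print(classify_roce(0x8915))
print(classify_roce(0x0800, ip_proto=17, udp_dst_port=4791))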

The current state of affairs is that both iWARP and RoCEv2 are vying for ownership of the Ethernet-based NVMe fabric market, each with valid criticisms of the other transport.


NVMe Begins

The Non-Volatile Memory Express community arose following the 2007 Intel Developer Forum among interested parties looking to standardize an interface for flash modules on PCI Express. Several aspects of the resulting NVMe specification are arguably more oriented toward storage semantics than memory semantics, primarily because of the block-oriented nature of flash technology. See Table 1.

TABLE 1 Memory, Flash, Storage, and NVMe Semantics

Feature              "Ideal Memory" Priority    "Ideal Storage" Priority    Flash Resembles ...    NVMe Semantics
Read Bandwidth       Very High                  Medium                      Memory                 Memory
Write Bandwidth      Very High                  Medium                      Storage                Memory
Read Latency         Very High                  Medium                      Memory                 50/50
Write Latency        Very High                  Medium                      Storage                50/50
Read Granularity     High                       Low                         Memory                 Storage
Write Granularity    High                       Low                         Storage                Storage
Scale                GB to TB                   TB to EB                    GB to PB               Storage
Random Access        Very High                  Medium                      Memory                 Memory
Persistence          Low                        Very High                   Storage                Storage
Rewritability        High                       Medium                      Storage                N/A
Reliability          High                       Very High                   Memory                 N/A
Metadata Linkage     Low                        Medium                      Memory                 Storage

NVMe's Advantage over SCSI and ATA

Before the development of NVMe, directly connected Solid-State Drives (SSDs), including flash-based drives, were usually connected via Serial Attached SCSI (SAS) or Serial ATA, both of which were serialized versions of parallel disk interfaces originally defined in the 1980s for the DOS-era PC industry. As operating systems and applications matured, little notice was paid to the complexity and latency overhead in those legacy protocols, since the rotational latency and seek times of disk drives dominated the overall latency of any disk I/O. The maturity of the flash ecosystem has changed that equation. From the beginning, flash brought huge advantages in latency for reads, especially for random access reads where disk architectures are particularly weak. Write caches offered the ability to hide the slow speed of writes to flash, but flash's write endurance challenges and limited density constrained its early suitability to specialized niches. As the density of flash grew and clever algorithms emerged to mitigate flash's write endurance issues, flash gradually shed its niche label and became a fully viable alternative to spinning disk. The industry dependence on disk-friendly SCSI softened, the openness to a higher performance SSD protocol grew, and NVMe was the answer.

NVMe Looks to Expand

In parallel with RDMA expansion efforts, advocates of NVMe were also looking to expand their footprint. As flash-based SSDs displaced hard disk drives in the server, the NVMe community could see that flash would soon displace disk drives in networked storage as well. As the media latency dropped, pressure would grow to also reduce the protocol latency, just as happened in the PCI context. If they could scale NVMe beyond its PCIe roots, it would have a shot at displacing networked SCSI. Anticipating the need, the NVMe community began to develop the NVMe over Fabrics specification.


Though, as mentioned earlier, the semantics of the NVMe specification are oriented more toward storage than memory, three of the fabric transports identified in the NVMe over Fabrics specification, InfiniBand, iWARP, and RoCEv2, all use an RDMA layer in their implementations. And because NVMe over Fabrics allows NVMe-based products to target the networked storage use case, those three RDMA fabric transports are now facing off against Fibre Channel, the undisputed incumbent protocol for high-performance networked storage. In this context, it is not so surprising that RDMA-oriented NVMe presenters often dismiss Fibre Channel as an NVMe fabric, insisting that RDMA is somehow vital and casting Fibre Channel instead as a SCSI-only transport. See Q4: Is SCSI the only native Fibre Channel protocol? No.

RDMA Requires an iWARP Patch

Although RDMA is a powerful protocol for dynamic shared-memory server clustering applications, it is not an inherently efficient protocol for storage, especially for small writes. Imagine that a server needs to write 1024 bytes to storage. The RDMA model is for the server to send a message to the storage device saying, "Hey, I want to write 1024 bytes to my volume. The bytes are located in my memory at the addresses in the attached Scatter-Gather List." The storage device receives the message and associated Scatter-Gather List, then turns around and issues an RDMA read request to the server's memory, and the server then sends the 1024 bytes. This transaction requires a chatty three messages, in contrast to the single SCSI-based small write message available on Fibre Channel. The NVMe community recognized this weakness and, closing its eyes to remote DMA principles, defined a "capsule" mechanism for sending payload data in the same message with commands.
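The sketch below simply counts the messages in the two models described above: the pull-style RDMA write exchange versus a single command-plus-data message (as in the small-write case the text describes, or an NVMe in-capsule write). The message names are invented labels, not any standard's wire format.

def rdma_style_small_write(payload):
    """Pull model: the target reads the initiator's memory before data moves."""
    return [
        ("WRITE_CMD + SGL", 0),                      # 1: host says where the data lives
        ("RDMA_READ_REQUEST", 0),                    # 2: target asks to read that memory
        ("RDMA_READ_RESPONSE", len(payload)),        # 3: data finally crosses the wire
    ]

def capsule_style_small_write(payload):
    """Push model: command and payload travel together in one message."""
    return [("WRITE_CMD + DATA", len(payload))]

data = b"\x00" * 1024
print(len(rdma_style_small_write(data)), "messages for the RDMA model")
print(len(capsule_style_small_write(data)), "message for the command-plus-data model")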

NVMe's effective "write-this-data" capsule scheme helped its performance, but the non-RDMA aspect made it incompatible with a mainstream RDMA implementation: iWARP. The iWARP community promptly worked on this, and in 2014 the IETF published RFC 7306, which describes the (arguably oxymoronic) "RDMA Write with Immediate Data."


Ethernet and IP Add Risk to Enterprise Storage

Complex Protocol Stacks Are Suboptimal for Storage

One reason the NVMe protocol is more efficient than the SCSI protocol is NVMe's markedly simpler protocol stack. Since stack simplicity seems relevant, it's worth taking a minute to look at the protocol stacks of the different NVMe fabrics. The stacks for Fibre Channel, RoCEv2, and iWARP are represented graphically in Figure 4.

FIGURE 4 Stack Comparison for Different NVMe Fabrics

The complexity of IP/Ethernet relative to Fibre Channel is neither random nor gratuitous. There are several key differences in the protocols that, over time, led to that complexity:

• Ethernet and IP (and TCP/UDP) implement transport in layers that were developed much more independently than Fibre Channel. The challenges of address assignment and routing across global scale, with literally billions of nodes, that Ethernet/IP networks must support require multiple complex layers and algorithms. Fibre Channel was designed for data center scale, a problem that has its own complexities, but is much simpler than Ethernet/IP's global scale.

• Ethernet was developed in the early days of networking as a best-effort shared medium. The protocol evolved various piecemeal mechanisms for loop avoidance, flooding, address learning, etc. Flow control was stitched on incrementally over years. By contrast, the developers of Fibre Channel were able to benefit from these early lessons and thus created a more holistically consistent protocol.

• Ethernet and IP have grown to work in a huge variety of environments, from LAN to MAN to campus to small office to home. Plug-and-play backward compatibility is essential to adoption in most of those environments; this reality imposes daunting requirements on the protocol stacks. Fibre Channel has remained focused on the premium data center use case, and thus has not been forced to evolve dramatically, which greatly simplifies compatibility issues.

It's appropriate to acknowledge here that the complexities of the iWARP and RoCEv2 stacks do not necessarily add significant latency; much of the stack complexity is handled in "hardware" (though often ASIC-based processor cores are involved) by specialized RDMA-enabled NICs or TCP Offload Engines. But complex stacks translate to challenges with configuration, management, interoperability, troubleshooting, and analysis.
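As a rough, simplified rendering of the comparison in Figure 4 (layer names only; real products differ in detail, and the layer lists below are assumptions for illustration), a few lines of code make the layer counts easy to see:

NVME_FABRIC_STACKS = {
    "NVMe over Fibre Channel": [
        "NVMe", "FC-NVMe (FC-4)", "Fibre Channel framing and flow control", "FC physical",
    ],
    "NVMe over RoCEv2": [
        "NVMe over Fabrics", "RDMA (IB transport)", "UDP", "IP", "DCB Ethernet", "Ethernet physical",
    ],
    "NVMe over iWARP": [
        "NVMe over Fabrics", "RDMAP", "DDP", "MPA", "TCP", "IP", "Ethernet", "Ethernet physical",
    ],
}

for fabric, layers in NVME_FABRIC_STACKS.items():
    print(f"{fabric}: {len(layers)} layers -> {' / '.join(layers)}")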

Multilayer Flow Control Is Problematic

Fibre Channel has always been a lossless network; on every link, the sender and receiver use buffer credits to communicate exactly how many packet buffers can be sent safely, ensuring that packets are not discarded due to lack of space. Fibre Channel fabrics are also implemented in a single layer with multiple channels operating in parallel, allowing the flow control to propagate through each fabric device. Both iWARP and RoCEv2 recommend using lossless Ethernet for Layer 2 connectivity. DCB Ethernet has made progress in reducing the packet loss of traditional Ethernet, but DCB still suffers from interoperability challenges, and the flow-control mechanisms do not propagate through routers. The IETF defined Explicit Congestion Notification for end-to-end congestion control. However, ECN depends on end nodes cooperating, so misbehaving end nodes can unfairly use more than their share of network bandwidth, highlighting the advantages of delegating flow control to the clean, single-layer fabric that Fibre Channel provides.

Complex Stacks Mean Complex Configuration

Ethernet and IP gained traction through low cost and simple one-size-fits-all behavior based on best-effort delivery and TCP recovery mechanisms. There has been benefit from the layering, such as the ability to introduce network virtualization, like VXLAN, to support workload mobility. But the encapsulation schemes needed for network virtualization have knock-on impacts. IP networks must consider the Maximum Protocol Data Unit (MAXPDU) on each link. When routers add extra headers to frames passing from one link to another, the MAXPDU between these two links can be different, which can result (for IPv4) in a need for routers to fragment the frame. (IPv6 handles fragmentation at the end nodes.) Alternatively, the MAXPDU can be managed across all links to mitigate fragmentation.
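A small worked example of the MAXPDU point, using VXLAN as the encapsulation and the commonly cited header sizes (the exact arithmetic and the 1500-byte default are assumptions for illustration): the outer IPv4 packet must fit the underlay link's MAXPDU, or an IPv4 router has to fragment it.

OUTER_IP, OUTER_UDP, VXLAN_HDR = 20, 8, 8    # bytes added inside the outer Ethernet frame

def outer_ip_packet_size(inner_eth_frame_len):
    """Size of the routed (outer) IPv4 packet that carries one encapsulated Ethernet frame."""
    return OUTER_IP + OUTER_UDP + VXLAN_HDR + inner_eth_frame_len

def needs_fragmentation(inner_eth_frame_len, underlay_maxpdu=1500):
    return outer_ip_packet_size(inner_eth_frame_len) > underlay_maxpdu

print(needs_fragmentation(1514))                        # True: a full-size inner frame no longer fits
print(needs_fragmentation(1514, underlay_maxpdu=1600))  # False: a larger underlay MAXPDU absorbs it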

Similarly, many Ethernet products support "jumbo frames," which allow payloads of up to 8K bytes to be sent in a single packet, reducing the overhead of packet headers. Because jumbo frames are not universally supported, the benefits are usually limited to specialized environments. When jumbo frame support is inconsistent, routers are forced to invoke their MAXPDU handling algorithms. Advocates of IP/Ethernet fabrics will sometimes highlight the option of jumbo frames as a benefit, but experts (such as Demartek) do not recommend turning on jumbo frames for RoCEv2.

This legacy of IP/Ethernet complexity represents a challenge in a premium, lossless environment: the default behaviors of the equipment and the experience and training of the support staff are largely oriented around the mainstream market. While it should be possible to configure a vendor's Ethernet and IP equipment for premium operation, such operation is not the normal default and usually not the desired configuration for the same vendor in a different role in the network. By contrast, Fibre Channel was always designed as a premium network, and this will be as true in an NVMe context as it has been for SCSI environments for decades.


New Stacks Create New Security Targets

One advantage of maintaining high-value storage assets in a Fibre Channel SAN has been that such fabrics are hard to reach over the Internet. There is simply no commonly defined infrastructure pathway to get from the Internet Protocol to the stable Fibre Channel protocol stack. Attackers cannot send Fibre Channel frames across the Internet to probe a SAN. Thus the recurring minor security bugs that regularly arise do not translate into zero-day exposures for storage volumes. The complex and relatively untried RoCEv2 and iWARP stacks open up new threat surfaces that are relatively accessible via the Internet, resulting in added complexity to the management of firewalls and other security mechanisms throughout an organization's IP network.

Disrupting the Storage Ecosystem Imperils Support

Many premium enterprise storage features, such as active-active multipath I/O and failover and non-disruptive upgrades, depend on significant testing by the storage vendor. This testing is also essential for the vendor to provide enterprise-level support when something unexpected happens. Vendor testing with Fibre Channel was enabled in part by the simple structure of the Fibre Channel stack, as well as the relationships between the storage vendors and the Fibre Channel equipment suppliers. As enterprise storage customers look to adopt NVMe, they should be working with their vendors, not against them. Building out an NVMe architecture based on Fibre Channel leverages the traditional enterprise storage testing and support model. Using Ethernet/IP changes the equation and rocks the boat.

A Parallel Ethernet Infrastructure Means Risks

Creating a parallel Ethernet SAN is a risky approach to NVMe storage adoption. Although this may seem an easy way to get started in a greenfield situation, questions will arise as production adoption approaches. What is the SLA of the new SAN? How can we migrate assets between our Ethernet SAN and our Fibre Channel SAN? What rollback options are feasible if we migrate between the SANs? As we go through our capacity planning efforts, how do we anticipate demand when we have two sets of infrastructure? Recognizing that the long-term transition from SCSI to NVMe will take years, it's clear that the transition is not an event but a multiyear process that will be strongly impacted by near-term infrastructure decisions.


Takeaway: Use a Dual-Protocol Fibre Channel SAN

NVMe over Fibre Channel offers the performance and robustness of the Fibre Channel transport, along with the ability to run FCP and FC-NVMe protocols concurrently on the same infrastructure. Such a dual-protocol approach enables IT organizations to transition their storage volumes smoothly from SCSI to NVMe, either on different arrays or, as dual-protocol arrays become available, on the same array.

With NVMe over Fibre Channel, there is no need to rip and replace the SAN and no need to create an expensive parallel infrastructure as organizations begin to adopt NVMe. Dual-protocol HBAs and driver stacks mean that each storage application can be migrated incrementally as needed. Assets can be migrated from SCSI to NVMe on a volume-by-volume basis. The low-risk, performance-sensitive volumes can be migrated first, and the risk-sensitive volumes can be kept until later. In addition, master copies of key assets can be created and maintained on top-tier enterprise arrays, while operational copies can be published to lower-cost arrays in the same SAN for use by other applications.

For more information about Brocade solutions, visit www.brocade.com/nvme.
