Mellanox Storage Solutions
Yaron Haviv, VP Datacenter Solutions
VMworld 2013 – San Francisco, CA
Maximizing Data Center Return on Investment

97% reduction in database recovery time – from 7 days to 4 hours
• Case: a Tier-1 Fortune 100 Web 2.0 company

Database performance improved up to 10X
• Cases: Oracle, Teradata, IBM, Microsoft

Big Data needs big pipes: 2X faster data analytics exposes your data's value
• High-throughput, low-latency server and storage interconnect
• With Mellanox 56Gb/s RDMA interconnect solutions

3X more virtual machines per physical server, at 33% lower application cost
• Cases: Microsoft, Oracle, Atlantic, ProfitBricks and more

Consolidation of network and storage I/O for lower OPEX

More than 10X higher storage performance – millions of IOPS
• 60% lower TCO, 50% lower CAPEX
SSD Adoption Grows Significantly, Driving the Need for Faster I/O

[Chart: SSD adoption forecast. Source: IT Brand Pulse]

SSDs are 100x faster, and require faster networks and RDMA
The Storage Delivery Bottleneck

A single server with 24 x 2.5" SATA 3 SSDs can deliver roughly 12GB/s of storage bandwidth. Carrying that traffic off the server requires one of:
• 15 x 8Gb/s Fibre Channel ports, OR
• 10 x 10Gb/s iSCSI ports (with offload), OR
• 2 x 40-56Gb/s InfiniBand/Ethernet ports (with RDMA)

SSD and flash mandate a high-speed interconnect.
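As a sanity check on those port counts, here is a minimal back-of-the-envelope calculation in Python. The ~500MB/s per-SSD rate is an assumption (typical for a SATA 3 SSD), and the blended 6GB/s figure for a 40-56Gb/s port is an assumed midpoint between the 40GbE and InfiniBand numbers from the comparison table later in this deck:

```python
import math

# Aggregate bandwidth of the SSD shelf (per-SSD rate is assumed).
num_ssds = 24
gbps_per_ssd = 0.5                 # GB/s per SATA 3 SSD, assumed
total = num_ssds * gbps_per_ssd    # = 12 GB/s

# Usable bandwidth per port, in GB/s; FC and 10GbE figures match the
# interconnect comparison table later in this deck.
port_rates = {"8Gb FC": 0.8, "10GbE iSCSI": 1.25, "40-56Gb IB/Eth": 6.0}

for name, rate in port_rates.items():
    print(f"{name}: {math.ceil(total / rate)} ports for {total:.0f} GB/s")
# -> 8Gb FC: 15 ports; 10GbE iSCSI: 10 ports; 40-56Gb IB/Eth: 2 ports
```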
Solving the Storage (Synchronous) IOPs Bottleneck With RDMA

Each generation removes the next dominant latency component. In the original chart every total is broken into software, network, and disk/media time (e.g. ~100µs software + ~200µs network + ~6,000µs disk in "the old days"):

Configuration                      Round-trip latency   Synchronous (back-to-back) IOPs
The Old Days (spinning disk)       ~6 msec              180
With SSDs                          ~0.5 msec            3,000
With Fast Network                  ~0.2 msec            4,300
With RDMA                          ~0.05 msec           20,000
With Full OS Bypass & Cache        ~0.007 msec          100,000
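Because these IOs are synchronous (back to back, one outstanding operation at a time), IOPs is simply the reciprocal of round-trip latency. A quick illustration, with the microsecond values back-computed from the slide's own IOPs figures:

```python
# Synchronous (back-to-back) IOPs: one IO in flight at a time, so
# IOPs ~= 1 second / round-trip latency. Latencies below are derived
# from the IOPs figures on this slide, not independent measurements.
latencies_usec = {
    "old days (disk)":    5556,
    "with SSDs":           333,
    "with fast network":   233,
    "with RDMA":            50,
    "OS bypass + cache":    10,
}
for name, lat in latencies_usec.items():
    print(f"{name}: ~{1_000_000 // lat:,} IOPs at {lat} usec round trip")
```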
Compare Interconnect Technologies

Storage application/protocol support (rated per interconnect in the original matrix): Block; NAS and Object; Big Data (Hadoop); Storage Backplane/Clustering; Messaging

Technology features:

                            FC        10GbE/TCP   10GbE/RoCE   40GbE/RoCE   InfiniBand
Bandwidth [GB/s]            0.8/1.6   1.25        1.25         5            7
$/GBps [NIC/Switch]*        500/500   200/150     200/150      120/90       80/50
Credit Based (Lossless)     yes       no          yes**        yes**        yes
Built in L2 Multi-path      [per-interconnect ratings shown graphically in the original]
Latency                     [per-interconnect ratings shown graphically in the original]

* Based on Google Product Search
** Mellanox end to end can be configured as true lossless
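The $/GBps row makes the earlier 12GB/s sizing example easy to cost out. A rough sketch using only the table's numbers (NIC side only; switch-port cost ignored):

```python
# Rough NIC-side cost to deliver the 12 GB/s from the SSD-shelf example,
# using the $/GBps figures from the table above.
target_gbps = 12  # GB/s of storage bandwidth to expose
nic_cost_per_gbps = {"FC": 500, "10GbE/TCP": 200, "10GbE/RoCE": 200,
                     "40GbE/RoCE": 120, "InfiniBand": 80}
for tech, dollars in nic_cost_per_gbps.items():
    print(f"{tech}: ~${target_gbps * dollars:,} in NICs for 12 GB/s")
# -> FC: ~$6,000 ... InfiniBand: ~$960
```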
Standard iSCSI over TCP/IP

iSCSI's main performance deficiencies stem from TCP/IP:
• TCP is a complex protocol that requires significant processing
• It is stream-based, making it hard to separate data from headers
• It requires data copies, which increase latency and CPU overhead
• Its checksums are weak, requiring additional CRCs (digests) in the ULP

iSCSI PDU layout, carried over TCP/IP protocol frames:
BHS (Basic Header Segment) | AHS (Additional Header Segment, optional) | HD (Header Digest, optional) | Data | DD (Data Digest, optional)
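To make the stream-boundary problem concrete, here is a minimal sketch (not Mellanox code) of what a software iSCSI endpoint must do for every PDU: parse the 48-byte BHS out of the raw TCP byte stream (field offsets per RFC 3720) just to learn where the data segment starts and ends. Digests and data padding are assumed off for brevity:

```python
import struct

BHS_LEN = 48  # the Basic Header Segment is always 48 bytes (RFC 3720)

def parse_pdu(stream: bytes):
    """Walk one iSCSI PDU out of a raw TCP byte stream.

    With TCP the receiver sees only bytes: it must parse headers in
    software to find PDU boundaries, then copy the data segment to its
    destination buffer. iSER avoids all of this, since RDMA delivers
    whole messages and places data directly via RDMA Read/Write.
    """
    bhs = stream[:BHS_LEN]
    opcode = bhs[0] & 0x3F                            # 6-bit opcode
    total_ahs_len = bhs[4]                            # AHS length, 4-byte words
    data_seg_len = int.from_bytes(bhs[5:8], "big")    # 24-bit data length
    itt = struct.unpack(">I", bhs[16:20])[0]          # Initiator Task Tag

    # Assuming no Header/Data Digests were negotiated:
    data_start = BHS_LEN + 4 * total_ahs_len
    data = stream[data_start:data_start + data_seg_len]  # a copy, in software
    return opcode, itt, data
```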
iSCSI Mapping to iSER / RDMA Transport

iSER eliminates these bottlenecks through:
• Zero copy using RDMA
• CRCs calculated by hardware
• Working with message boundaries instead of streams
• A transport protocol implemented in hardware (minimal CPU cycles per IO)

The result is unparalleled performance.

In the iSER mapping, the iSCSI PDU (BHS + AHS + Data) is carried over RDMA protocol frames: control PDUs travel as RC Sends, data moves via RC RDMA Read/Write, and the Header and Data Digests (HD/DD) are dropped because the CRC is done in hardware.
iSER Protocol Overview (Read)

SCSI Reads:
• The initiator sends a Command PDU (Protocol Data Unit) to the target, advertising its read buffer (Send_Control)
• The target returns the data using RDMA Write (Data_Put / Data-In PDU)
• The target sends a Response PDU back when the transaction completes (Send_Control)
• The initiator receives the Response and completes the SCSI operation (Control_Notify)

[Sequence diagram: iSCSI Initiator → iSER → HCA … HCA → iSER Target → Target Storage, showing Send_Control (SCSI Read Cmd) with buffer advertisement, RDMA Write for data, Send_Control (SCSI Response), and Control_Notify events on the initiator side]
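A schematic sketch of that exchange follows. The classes and method names (Buffer, post_send, rdma_write) are illustrative stand-ins, not a real RDMA verbs API:

```python
# Schematic iSER SCSI Read flow with toy in-process objects.

class Buffer:
    """Registered memory region advertised to the remote side."""
    def __init__(self, length):
        self.data = bytearray(length)

class Node:
    def __init__(self, name):
        self.name = name
    def post_send(self, what):           # Send_Control: control PDU as RC Send
        print(f"{self.name}: Send_Control -> {what}")
    def rdma_write(self, remote, data):  # Data_Put: place data directly
        remote.data[:len(data)] = data   # zero copy into the advertised buffer
        print(f"{self.name}: RDMA Write, {len(data)} bytes")

def iser_scsi_read(initiator, target, storage, lba, length):
    # 1. Initiator registers a buffer and advertises it with the Read command.
    buf = Buffer(length)
    initiator.post_send(f"SCSI Read Cmd (LBA={lba}) + buffer advertisement")

    # 2. Target fetches the blocks and RDMA-Writes them straight into the
    #    initiator's buffer: no remote CPU copy, no stream parsing.
    target.rdma_write(remote=buf, data=storage[lba:lba + length])

    # 3. Target sends the SCSI Response PDU once the transfer completes.
    target.post_send("SCSI Response (GOOD)")

    # 4. Initiator's completion (Control_Notify) fires; the data is
    #    already in place.
    return bytes(buf.data)

disk = bytes(range(256)) * 16  # toy backing store
print(iser_scsi_read(Node("initiator"), Node("target"), disk, lba=0, length=16))
```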
Mellanox Unbeatable Storage Performance

iSCSI/RDMA (iSER) shows 5-10% of the latency of iSCSI/TCP while sustaining 20x the workload: latency was measured at 2,300K IOPs for iSER versus only 131K IOPs for iSCSI/TCP.

[Chart: IO latency @ 4KB IO (microseconds, 0-10,000 scale), iSCSI/TCP vs iSCSI/RDMA]

K IOPs @ 4KB IO size:
iSCSI (TCP/IP)                               130
1 x FC 8Gb port                              200
4 x FC 8Gb ports                             800
iSER, 1 x 40GbE/IB port                      1,100
iSER, 2 x 40GbE/IB ports (+ acceleration)    2,300

We deliver significantly faster IO rates and lower access times!
Accelerating IO Performance (Accessing a Single LUN)

The benchmarks compare iSCSI/TCP, standard iSER, and iSER with block-device bypass (iSER BD) on a single LUN with 3 threads, across IO sizes from 1KB to 256KB.

[Charts: Bandwidth (MB/s) and IOPs (K/s) vs IO size, single LUN, 3 threads. Both iSER variants reach the PCIe limit on bandwidth; peak IOPs are only ~80K for iSCSI/TCP, ~235K for standard iSER, and ~624K for iSER BD (bypass).]

[Charts: IO latency (microseconds, up to ~3,500) and CPU time in I/O wait (%, up to ~18) for read, write, and mixed R/W at 4KB IO size and maximum IOPs, for the same three configurations.]
Mellanox & LSI Accelerate VDI, Enable 2.5x More VMs

Mellanox and LSI address the critical storage latency and IOPs bottlenecks in Virtual Desktop Infrastructure (VDI):
• LSI Nytro MegaRAID accelerates disk access through SSD-based caching
• Mellanox ConnectX®-3 10/40GbE adapters with RDMA accelerate access from hypervisors to fast shared storage over Ethernet, and enable zero-overhead replication

When tested with Login VSI's VDI load generator, the solution delivered unprecedented VM density of 150 VMs per ESX server:
• Using iSCSI/RDMA (iSER) enabled 2.5x more VMs than iSCSI with TCP/IP over the exact same setup

[Chart: number of virtual desktop VMs per server – Intel 10GbE with iSCSI/TCP vs ConnectX-3 10GbE with iSCSI/RDMA (iSER) vs ConnectX-3 40GbE with iSCSI/RDMA (iSER); 2.5x more VMs with iSER]

Benchmark configuration – a redundant storage cluster (primary and secondary nodes replicating over a Mellanox SX1012 10/40GbE switch), each node running an iSCSI/RDMA (iSER) target over software RAID (MD) and an LSI caching flash/RAID controller:
• 2 x Xeon E5-2650 processors
• Mellanox ConnectX®-3 Pro, 40GbE/RoCE
• LSI Nytro MegaRAID NMR 8110-4i
Native Integration Into OpenStack Cinder

Using OpenStack's built-in components and management (Open-iSCSI, the tgt target, Cinder), no additional software is required: RDMA is already inbox and is used by our OpenStack customers!

Mellanox enables faster performance with much lower CPU utilization. The next step is to bypass the hypervisor layers and add NAS and object storage.

[Diagram: compute servers run the KVM hypervisor with VMs and use Open-iSCSI with iSER through the adapter; a switching fabric connects them to storage servers exposing local disks and an RDMA cache through an iSCSI/iSER target (tgt). OpenStack (Cinder) uses RDMA to accelerate iSCSI storage.]
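To show how little changes on the compute side, here is a minimal sketch of attaching a volume over the iSER transport with stock Open-iSCSI, the same tool Cinder drives. The iscsiadm invocations are standard Open-iSCSI usage; the portal address and IQN are placeholders:

```python
import subprocess

# Placeholders: substitute the tgt storage server's portal and target IQN.
PORTAL = "192.168.1.100:3260"
IQN = "iqn.2013-08.example:storage.lun1"

def iscsiadm(*args):
    subprocess.run(["iscsiadm", *args], check=True)

# Discover targets exposed by the storage server.
iscsiadm("-m", "discovery", "-t", "sendtargets", "-p", PORTAL)

# Switch the node record from the default TCP transport to iSER;
# everything above this layer (iSCSI, Cinder) is unchanged.
iscsiadm("-m", "node", "-T", IQN, "-p", PORTAL,
         "-o", "update", "-n", "iface.transport_name", "-v", "iser")

# Log in: block IO now flows over RDMA.
iscsiadm("-m", "node", "-T", IQN, "-p", PORTAL, "--login")
```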
[Chart: write bandwidth (MB/s, up to ~7,000) vs I/O size (1-256KB) for iSER with 4, 8, and 16 VMs versus iSCSI/TCP with 8 and 16 VMs. iSER reaches the PCIe limit and delivers up to 6X the bandwidth of iSCSI/TCP.]