A State Machine Architecture for High-Efficiency, Low-Latency SS Media Performance Extraction

Bret S. Weber, DataDirect Networks
Flash Memory Summit 2012, Santa Clara, CA

The Real Issue

• Want SSD Performance & Latency
• Need Enterprise Redundancy
• Use Commodity Flash Products
  • Non-Proprietary
  • Customer Replaceable
  • No Technology “Lock In”
  • Lowest $/TB
• Allow Seamless Levels of Storage
  • Low $/TB
  • Low $/IOP

SSD Small Block IOPs

SSD Large Sequential

Flash Performance Takeaways

► Flash Performance Is All over the Map
► The Technology Changes Fast
► Tradeoffs between Cost, Performance and Life
► Investment Protection

Typical Controller IOPs Throttling

[Chart: Small Block IOPs (0 to 1,400,000) over x-axis positions 1 to 20; series: SSD Capability, Typical Storage Controller Capability]
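The chart contrasts what the SSDs can deliver with what a typical controller passes through. A minimal model of that gap, assuming per-SSD capability adds linearly and the controller imposes a fixed ceiling, is delivered = min(n × per-SSD rate, controller limit). The per-SSD and controller figures below are hypothetical placeholders, not values read from the chart.

```python
import math


def delivered_rate(num_ssds: int, per_ssd_rate: float, controller_limit: float) -> float:
    """Rate the host sees: SSD capability grows linearly with the number of
    SSDs until the controller's fixed ceiling clips it."""
    ssd_capability = num_ssds * per_ssd_rate
    return min(ssd_capability, controller_limit)


def saturation_point(per_ssd_rate: float, controller_limit: float) -> int:
    """Smallest SSD count whose combined capability reaches the controller ceiling."""
    return math.ceil(controller_limit / per_ssd_rate)


def stranded_fraction(num_ssds: int, per_ssd_rate: float, controller_limit: float) -> float:
    """Share of installed SSD capability that the controller leaves unused."""
    capability = num_ssds * per_ssd_rate
    return 1.0 - delivered_rate(num_ssds, per_ssd_rate, controller_limit) / capability


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    PER_SSD_IOPS = 60_000
    CONTROLLER_IOPS = 250_000
    for n in (1, 4, 8, 20):
        print(n, delivered_rate(n, PER_SSD_IOPS, CONTROLLER_IOPS))
```

With these made-up numbers the controller saturates at five SSDs, and at twenty SSDs most of the installed capability is stranded; the same model applies to bandwidth by swapping in MB/s figures.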

Typical Controller BW Throttling

[Chart: Bandwidth (MB/S) (0 to 12,000) over x-axis positions 1 to 20; series: SSD Capability, Typical Storage Controller Capability]

Storage Fusion Architecture History

► Previous Experience with Silicon-Based RAID-3 Arrays
  • High Bandwidth
  • Very Consistent Quality of Service (Bandwidth and Latency)
► Ground-Up “Blank Sheet of Paper” Architecture in 2007
  • Typical Five-Year Maturity Cycle
► Address Emerging Applications
  • HPC
  • Data-Intensive Cloud Applications
  • Big Data Analytics
► Address the Emerging Technologies
  • Multi-Core Processors
  • High-IOP Non-Volatile Memory Technologies
  • Server Virtualization
  • Scale Out

Ground-Up Design, Big Data Optimized

SFA – Details

► Ground-Up Design Exploiting Emerging Technologies
► Optimized for Ultra-High IOPs & Bandwidth
  • Application Space Code
  • Kernel Bypass for All IO Operations
  • Real-Time NUMA Scheduler Minimizes Latency
  • Built to Exploit All Available CPU Cores
  • BW, IOPs and Mixed Workload Configurations
► Fully Parallel I/O Execution Engine
  • No-Lock Architecture
► Portable Code
  • Rapid Time to Market
  • Optimizations around Linux Architectures
► Highest Levels of Performance & QoS
► DDN Device Drivers Accelerate I/O
► Optional Server Virtualization Brings Big Data Closest to Processing

[Diagram: 16 x FDR InfiniBand Host Ports; two SFA Interface Virtualization blocks; two 32-64GB High-Speed Caches connected by a 240Gb/s Cache Link; 960Gb/s Internal SAS Storage Management Network with Internal SAS Switching; example layouts: SFA RAID 5,6 (data blocks 1 to 8 plus P for RAID 5,6 and Q for RAID 6), a smaller RAID 5,6 stripe (data blocks 1 to 4 plus P and Q), and SFA RAID 1 (block 1 mirrored as 1m)]
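The RAID 5,6 layout in the diagram pairs a row of data blocks with a P block (RAID 5,6) and a Q block (RAID 6). The slides do not say how SFA computes them; the sketch below assumes the conventional scheme in which P is the byte-wise XOR of the data blocks and Q is a Reed-Solomon syndrome over GF(2^8) with generator 2 and reducing polynomial 0x11D, the field used by the Linux md RAID 6 driver. It also rebuilds a single lost data block from P.

```python
from __future__ import annotations


def gf_mul2(b: int) -> int:
    """Multiply one byte by the generator g = 2 in GF(2^8), reducing by 0x11D."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11D
    return b


def pq_parity(data_blocks: list[bytes]) -> tuple[bytes, bytes]:
    """Compute P = D0 ^ D1 ^ ... and Q = g^0*D0 ^ g^1*D1 ^ ... byte by byte."""
    size = len(data_blocks[0])
    if any(len(block) != size for block in data_blocks):
        raise ValueError("all data blocks in a stripe must be the same size")
    p = bytearray(size)
    q = bytearray(size)
    # Horner's rule from the highest-index block down: Q = Q*g ^ D_i.
    for block in reversed(data_blocks):
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] = gf_mul2(q[i]) ^ byte
    return bytes(p), bytes(q)


def rebuild_from_p(surviving_blocks: list[bytes], p: bytes) -> bytes:
    """Recover one missing data block as the XOR of P and every surviving block."""
    out = bytearray(p)
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


if __name__ == "__main__":
    # An 8 + P + Q stripe as in the diagram, with 4-byte blocks of made-up data.
    stripe = [bytes([d, d + 1, d + 2, d + 3]) for d in range(0, 32, 4)]
    p, q = pq_parity(stripe)
    lost = 3
    survivors = stripe[:lost] + stripe[lost + 1:]
    assert rebuild_from_p(survivors, p) == stripe[lost]
```

P alone covers one failure (RAID 5); Q is what lets RAID 6 survive a second concurrent failure, at the cost of the extra GF(2^8) multiply per byte.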

Typical Approaches

[Diagram: two software stacks, each divided into User Space and OS Kernel, with separate Data Path and System Management flows.
Kernel Level Implementation: User Application, GNU C Library, System Call Interface, Kernel-Level Engine, Architecture Dependent Kernel Code.
Typical User Space Implementation: Storage Application, GNU C Library, System Call Interface, Kernel, Architecture Dependent Kernel Code.]

Kernel Context Switches Add Significant Latency

In-Kernel Dependencies Prohibit Easy Portability

Low Latency – No Context Switching

[Diagram: three software stacks, each divided into User Space and OS Kernel, with Data Path and System Management flows.
Kernel Level Implementation: User Application, GNU C Library, System Call Interface, Kernel-Level Engine, Architecture Dependent Kernel Code.
DDN User Space Implementation: DDN SFA Application, GNU C Library, System Call Interface, Kernel, Architecture Dependent Kernel Code.
Typical User Space Implementation: Storage Application, GNU C Library, System Call Interface, Kernel, Architecture Dependent Kernel Code.]

► No Context Switching
► Low Latency IO
► Predictable Performance
► High-Speed Routing Architecture

A Scalable & Abstractable I/O Delivery Architecture

[Diagram, labeled "Complete Storage OS", with two "Scale" arrows. Components: Internal VM Clients and External Block-Based Applications; InfiniBand RDMA Driver, Fibre Channel Driver, Memory DMA Driver; Front-End Services Abstraction; SFA JIPC: Non-Preemptive I/O Routing System; User Space I/O Scheduler (State Machine), which Owns the Kernel & Prohibits Disruption; I/O Routing, RAID Services, Cache Management & Disk Virtualization; Back-End Services Abstraction]
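The slide describes a non-preemptive, state-machine I/O scheduler running in user space. One way to picture that model, under assumptions of our own, is a per-core run loop in which every request carries an explicit state and each state handler does a short, bounded piece of work and returns the next state. Because no handler blocks or sleeps, the loop never yields to the kernel mid-request, and interleaving between requests comes from the loop itself rather than from thread switches. The class names, the three-step state set, and the modulo routing rule are hypothetical, and the back-end device is simulated as completing immediately.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    NEW = auto()        # request accepted from a front-end driver
    ROUTED = auto()     # target pool chosen
    SUBMITTED = auto()  # handed to a back-end driver (simulated here)
    DONE = auto()       # completion ready to return to the client


@dataclass
class IORequest:
    req_id: int
    lba: int
    state: State = State.NEW
    pool: int = -1
    trace: list = field(default_factory=list)


class CoreScheduler:
    """Hypothetical per-core, run-to-completion scheduler: one loop, no threads,
    no locks; each handler finishes before the next step starts."""

    def __init__(self, num_pools: int):
        self.num_pools = num_pools
        self.run_queue = deque()
        self.completed = []
        # Transition table: current state -> handler that returns the next state.
        self.handlers = {
            State.NEW: self._route,
            State.ROUTED: self._issue,
            State.SUBMITTED: self._complete,
        }

    def submit(self, req: IORequest) -> None:
        self.run_queue.append(req)

    def _route(self, req: IORequest) -> State:
        req.pool = req.lba % self.num_pools  # hypothetical routing rule
        return State.ROUTED

    def _issue(self, req: IORequest) -> State:
        # A real engine would post to a polled driver queue and revisit the
        # request later; here the device is simulated as completing at once.
        return State.SUBMITTED

    def _complete(self, req: IORequest) -> State:
        return State.DONE

    def run(self) -> None:
        """Advance each queued request one state per step until all are done."""
        while self.run_queue:
            req = self.run_queue.popleft()
            req.state = self.handlers[req.state](req)
            req.trace.append(req.state)
            if req.state is State.DONE:
                self.completed.append(req)
            else:
                self.run_queue.append(req)  # requeue so other requests get a turn


if __name__ == "__main__":
    sched = CoreScheduler(num_pools=4)
    for i, lba in enumerate([10, 11, 12]):
        sched.submit(IORequest(i, lba))
    sched.run()
    print([(r.req_id, r.pool) for r in sched.completed])
```

Requeuing after every state is what lets many in-flight requests share one core without a context switch; a production engine would presumably also poll driver completion queues inside the same loop.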

SFA12K Architecture

► 40GB/s Throughput
► 1.4M IOPs to SSD
► DirectFlash Caching Acceleration

[Diagram: State Machine; Scale Up to Any # of CPU Cores (today: 8/socket); Scale Out to Any # of CPU Sockets (today: 4 sockets)]
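The slide pairs the state machine with scaling up across cores (8 per socket today) and out across sockets (4 today), and the SFA details slide lists a no-lock architecture. A common way to get both, sketched here as an assumption rather than as SFA's documented design, is to give every pool exactly one owning core, so a request is only ever touched by that core and needs no lock. The placement policy, which alternates sockets so consecutive pools land on different NUMA nodes, and the LBA-modulo routing rule are hypothetical.

```python
from collections import Counter


def build_owner_map(num_pools: int, sockets: int = 4, cores_per_socket: int = 8) -> dict:
    """Map each pool to exactly one (socket, core) owner.

    Cores are ordered so consecutive pools land on different sockets,
    spreading pools across NUMA nodes before doubling up on any core."""
    cores = [(s, c) for c in range(cores_per_socket) for s in range(sockets)]
    return {pool: cores[pool % len(cores)] for pool in range(num_pools)}


def owner_of(lba: int, owners: dict) -> tuple:
    """Route a block address to its owning core (hypothetical LBA-modulo rule)."""
    pool = lba % len(owners)
    return owners[pool]


if __name__ == "__main__":
    owners = build_owner_map(36)
    load = Counter(owners.values())
    print(len(load), max(load.values()))  # cores in use, most pools on one core
```

Since ownership is fixed, adding cores or sockets only changes the map; the per-core scheduling loop itself stays the same.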

SFA Controller IOPs – No Throttling

[Chart: Small Block IOPs (0 to 1,400,000) over x-axis positions 1 to 20; series: SSD Capability, Typical Storage Controller Capability, SFA with SSD]

SFA Linear Scaling (SFA10K)

[Chart: random aligned read, Rate (IO/s), Varying Pool Count; SFA10K - ssd_1+1_WM_Re, 4K, QD 16; y-axis IO/S (0 to 800,000), x-axis Pool Count (1 to 36)]

Previous Generation Product
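"Linear scaling" can be checked numerically: divide the rate measured at each pool count by the rate that perfectly linear growth from the smallest configuration would predict. A result near 1.0 across the range indicates linear scaling. The measurement points below are hypothetical, not read from the chart.

```python
from __future__ import annotations


def scaling_efficiency(points: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Efficiency of each (pool_count, rate) point relative to ideal linear
    scaling from the first point: rate / (pool_count * per-pool base rate)."""
    base_pools, base_rate = points[0]
    per_pool = base_rate / base_pools
    return [(n, rate / (n * per_pool)) for n, rate in points]


if __name__ == "__main__":
    # Hypothetical measurements: (pool count, IO/s)
    measured = [(1, 20_000), (4, 80_000), (8, 150_000)]
    for pools, eff in scaling_efficiency(measured):
        print(f"{pools:>3} pools: {eff:.3f}")
```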

SFA Controller – No BW Throttling

[Chart: Bandwidth (MB/S) (0 to 12,000) over x-axis positions 1 to 20; series: SSD Capability, Typical Storage Controller Capability, SFA with SSD]

SFA Linear Scaling (SFA10K)

[Chart: random aligned read, Rate (IO/s), Varying Pool Count; SFA 10K - ssd_1+1_WM_Re, 64K, QD 16; y-axis MB/S (0 to 14,000), x-axis Pool Count (1 to 56)]

Previous Generation Product

Our Surprise: 4KB I/O (SFA10K) Latency That Defies Convention

Random Write

QD     1 x 8+2: Pliant 800GB   1 x 4+1: Pliant 800GB
       Avg Latency (μs)        Avg Latency (μs)
1      75.87                   79.15
2      62.77                   69.78
4      72.58                   76.89
8      78.21                   84.97
16     86.80                   90.98
32     95.67                   96.79
64     102.17                  107.89
128    99.88                   105.70

Random Read

QD     1 x 8+2: Pliant 800GB   1 x 4+1: Pliant 800GB
       Avg Latency (μs)        Avg Latency (μs)
1      324.78                  323.02
2      156.92                  156.07
4      72.91                   73.11
8      37.86                   38.15
16     19.97                   20.34
32     10.38                   11.16
64     6.51                    7.58
128    6.89                    8.23

Actual Case Studies

► Rich Imagery Processing Application
► GPFS File System
► Mix of Small & Large File Requests (mostly small)
► SSD System Bake-Off
► DDN Selected Due To:
  • Multi-Dimensional Performance
  • Low Latency & Wall Clock
  • Lowest Data Center Footprint

Leading U.S. Cloud-Based Applications Provider
SFA10K-M Systems, QDR InfiniBand Connected
114 x 800GB SSDs

Conclusions

► Get Performance and Redundancy
► SSDs Can Compete with the Right Architecture
► Lots of Options: Don’t Lock Yourself In
► Don’t Artificially Limit Your Performance
► Big Data Requires All Varieties of Performance
  • We don’t know what we don’t know…

7/27/2012

Questions?
