Page 1
© 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others.
Any statements or representations around future events are subject to change. ddn.com
The HPC Storage Leader
Store • Process • Analyze • Collaborate • Archive • Cloud
Invent • Discover • Compete
Page 2
DDN | Who We Are
► Main Office: Sunnyvale, California, USA
► Go To Market: Partner & Reseller Assisted, Direct
► DDN: World’s Largest Private Storage Company
► Only Storage Company with a Long-Term Focus on Big Data
We Design, Deploy and Optimize Storage Systems That Solve HPC, Big Data and Cloud Business Challenges at Scale
World-Renowned & Award-Winning
Page 3
An Elite Collection of HPC's Finest... Some of Our 1000+ Customers
Page 4
DDN | 15 Years of HPC Innovation
DDN Leads On The List of Lists:
80% of the Top 10
67% of the Top 100
32% of the Top 500
Timeline, 1998–2014:
• 1998: DDN Founded; 1st Customer: NASA
• 1st Real-Time Appliance for High-Scale Big Data; 1st in Data Center Density
• SFA (Storage Fusion Architecture): 1st in Bandwidth + IOPS
• 1st In-Storage Processing™: SW-Only, Portable Architecture
• DDN's 1st Parallel File System Offering ft. Lustre: EXAScaler™
• 1st Hyperscale Object Storage: Web-Scale Computing and HPC Collaboration
• SFX Flash Tiering, Revolutionizing HPC; 1st Application-Aware Hybrid Caching
• Largest Private Storage Co. (IDC); 500+ Employees
• Bandwidth milestones: 10 GB/s (NCSA), 100 GB/s (CEA, LLNL), 1 TB/s (ORNL, 2013); 5 PB/Rack (2014)
Page 5
Our Unwavering Commitment to HPC
Investments in Exascale: Real Engineering Is Needed to Scale 1000x
Fast Forward
Page 6
Exascale I/O Challenges – Cost
LANL Trinity Hybrid Scratch Cost Analysis
Hybrid approach is necessary to meet bandwidth & capacity requirements
Page 7
Exascale I/O Challenges – Power Consumption
[Chart: NERSC-8 Cost Comparison. # of HDDs (0–70,000) vs. Burst Throughput (0.76 and 2.96 TB/sec), Hybrid vs. HDDs]
• 26 SFA Controllers: Power 470 KW
• 26 SFA Controllers + Burst Buffer: Power 768 KW
• 99 SFA Controllers: Power 1792 KW
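As a quick arithmetic check on these figures, here is a minimal sketch. It assumes the hybrid row (26 SFA Controllers + Burst Buffer) and the 99-controller HDD-only row target the same burst bandwidth, which is what the comparison implies:

```python
# Power comparison from the NERSC-8 figures above. The pairing of rows to
# equal burst bandwidth is an assumption based on the slide's comparison.
hybrid_kw = 768      # 26 SFA controllers + burst buffer
hdd_only_kw = 1792   # 99 SFA controllers

savings = 1 - hybrid_kw / hdd_only_kw
print(f"Hybrid configuration saves {savings:.0%} of the HDD-only power budget")
# -> Hybrid configuration saves 57% of the HDD-only power budget
```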
Page 8
Exascale I/O Challenges – Efficiency
The burst buffer absorbs the peak load; the filesystem handles the remaining load.
Analysis: Argonne's LCF production storage system (circa 2010)
• 99% of the time, storage BW utilization < 33% of max
• 70% of the time, storage BW utilization < 5% of max
IME SC'13 Demo Cluster tiers:
• Burst Buffer Tier: 50 GB/s
• Persistent Storage Tier: 4 GB/s
• Archival Storage Tier: 25 MB/s
1) Separation of bandwidth and capacity is required
2) Utilization efficiency must be improved
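The tier bandwidths above follow from a simple observation: if storage sits near-idle most of the time, only the burst tier needs peak bandwidth. A minimal sketch of that reasoning, with an assumed duty cycle (the Argonne figures suggest bursts occupy a small fraction of wall-clock time):

```python
# Minimal burst-buffer sizing sketch (illustrative, not DDN's model): the
# burst tier must match the peak rate, while the backing filesystem only
# needs to drain the long-run average.

peak_bw_gbs = 50      # burst rate the compute cluster generates
duty_cycle = 0.08     # assumed fraction of time spent bursting

drain_bw_gbs = peak_bw_gbs * duty_cycle
print(f"Burst tier: {peak_bw_gbs} GB/s peak")
print(f"Backing PFS only needs ~{drain_bw_gbs:.0f} GB/s to keep up")
# -> Backing PFS only needs ~4 GB/s to keep up
```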
Page 9
Why is today’s I/O efficiency so poor?
► Serialization at various points in the I/O path
• Stripe and block alignment (PFS and RAID)
  o Read-modify-writes to underlying storage (see the sketch after the table below)
• Lock contention
  o Exacerbated by poor I/O structuring in applications
[Diagram: Compute Nodes 1 and 2 writing through File Servers 1–4 to storage; lock contention worsens with 1000s of nodes]
Projected system scale:

Year   Performance (TF)   Concurrency
2015   20,000             5,000,000
2018   1,000,000          1,000,000,000

Source: http://storageconference.org/2011/Presentations/SNAPI/1.Grider.pdf
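The read-modify-write penalty called out above can be made concrete. A minimal sketch with illustrative stripe and chunk sizes (assumed values, not a DDN measurement):

```python
# Illustrative read-modify-write (RMW) sketch: a write smaller than the
# RAID stripe forces the array to read old data and parity, then write new
# data and parity back, so bytes moved far exceed bytes requested.

STRIPE = 1 << 20          # assumed 1 MB full stripe
CHUNK = 128 * 1024        # assumed 128 KB per-disk chunk

def bytes_moved(write_size: int) -> int:
    if write_size > 0 and write_size % STRIPE == 0:
        return write_size                 # aligned full-stripe write: no RMW
    # Sub-stripe write: read old chunk + old parity, write new chunk + parity.
    return 4 * CHUNK

for w in (4 * 1024, STRIPE):
    moved = bytes_moved(w)
    print(f"{w // 1024:>5} KB write -> {moved // 1024:>5} KB moved "
          f"({moved / w:.0f}x amplification)")
# ->     4 KB write ->   512 KB moved (128x amplification)
# ->  1024 KB write ->  1024 KB moved (1x amplification)
```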
Page 10
Why is today’s I/O efficiency so poor?
► Poor horizontal scaling characteristics in the PFS: the weakest link
• A PFS is only as fast as its slowest I/O component
• Oversubscribed or degraded I/O components drag down the performance of the entire system (see the sketch below)
• As I/O sections get larger and the number of components increases, the problem worsens (congestion)
• The weakest link can sit all the way down at the disks (RAID rebuilds)
[Diagram: File Servers 1–4 in front of storage; a single overloaded server can slow down the entire system]
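A minimal sketch of the weakest-link effect (illustrative numbers): with files striped across every server, a synchronous parallel job proceeds at the slowest server's pace.

```python
# Weakest-link sketch: aggregate bandwidth for a parallel job whose stripes
# touch every file server is gated by the slowest one.

server_bw_gbs = [10, 10, 10, 2]   # server 4 is degraded (e.g., RAID rebuild)

ideal = sum(server_bw_gbs)
# Each round of stripe-wide I/O completes at the slowest server's pace:
effective = len(server_bw_gbs) * min(server_bw_gbs)

print(f"Ideal: {ideal} GB/s, effective: {effective} GB/s "
      f"({effective / ideal:.0%} of peak)")
# -> Ideal: 32 GB/s, effective: 8 GB/s (25% of peak)
```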
Page 11
PFS Efficiency as a Function of I/O Size
[Chart 1: Performance & Efficiency of Non-Mergeable Writes as a Function of I/O Size. X-axis: I/O size in bytes (log scale, 4,096–409,600); Y-axes: percentage of stripe size and performance efficiency in percent (0–100)]
[Chart 2: Parallel Filesystem on IME Demo Cluster SSDs (50 GB/s available). X-axis: I/O request size (1–512 KB); Y-axis: average throughput (0–2,000 MB/s)]
Aligned, full-stripe-width I/O is required for maximum PFS performance.
Faster media (SSDs) may not address the underlying PFS performance limitations.
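The shape of Chart 1 can be approximated with a simple model: each non-mergeable write dirties a whole stripe-aligned region, so useful bytes divided by bytes touched falls with I/O size. A sketch, assuming a 1 MB stripe width (an illustrative value, not from the chart):

```python
# Efficiency model for non-mergeable writes against a striped PFS: each
# write touches whole stripes, so small writes waste most of what they touch.

STRIPE = 1 << 20   # assumed 1 MB full stripe width

def efficiency(io_size: int) -> float:
    touched = -(-io_size // STRIPE) * STRIPE   # round up to whole stripes
    return io_size / touched

for size in (4096, 65536, 409600, 1 << 20):
    print(f"{size:>8} B -> {efficiency(size):6.1%} efficient")
# ->     4096 B ->   0.4% efficient
# ->  1048576 B -> 100.0% efficient
```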
Page 12
What is Infinite Memory Engine (IME™)?
A high-performance I/O system based on parallel log structuring:
► Massive concurrency regardless of application I/O pattern
► Dynamic load balancing steers I/O clear of oversubscribed and degraded components
► An innovative lookup mechanism makes written data immediately available for reads
► Distributed fault tolerance
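A minimal sketch of the parallel log structuring idea (illustrative only; IME's actual on-device format and index are not described in this deck): writers append wherever is cheapest, and a lookup index maps file offsets to log locations, so no layout has to be declared up front.

```python
# Log-structured write buffer sketch: appends go to the least-loaded device
# (a crude stand-in for dynamic load balancing), and a lookup index maps
# (file, offset) to the log location so reads are immediately serviceable.

class LogStructuredStore:
    def __init__(self, n_devices: int):
        self.logs = [bytearray() for _ in range(n_devices)]
        self.index = {}  # (file_id, offset) -> (device, log_offset, length)

    def write(self, file_id: str, offset: int, data: bytes) -> None:
        dev = min(range(len(self.logs)), key=lambda d: len(self.logs[d]))
        self.index[(file_id, offset)] = (dev, len(self.logs[dev]), len(data))
        self.logs[dev] += data  # pure append: no alignment, no lock contention

    def read(self, file_id: str, offset: int) -> bytes:
        dev, pos, length = self.index[(file_id, offset)]
        return bytes(self.logs[dev][pos:pos + length])

store = LogStructuredStore(n_devices=4)
store.write("ckpt.0", 0, b"rank0-data")
store.write("ckpt.0", 4096, b"rank1-data")
assert store.read("ckpt.0", 4096) == b"rank1-data"
```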
Page 13
The Infinite Memory Advantage
► Designed for Scalability: Patented DDN Algorithms
► Scale-Out Data Protection: Distributed Erasure Coding (sketched below)
► Integrated With File Systems: Designed to Accelerate Lustre* and GPFS, No Code Modification Needed
► Fully POSIX & HPC Compatible: No Application Modifications
► Non-Deterministic System: Write Anywhere, No Layout Needed
Writes: Fast. Reads: They're Fast Too. No other system offers both at scale.
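To illustrate distributed erasure coding, here is a minimal XOR-parity sketch (RAID-5-style; the deck does not specify IME's actual coding parameters): data is spread over N fragments plus parity, and any single lost fragment is reconstructed from the survivors.

```python
# Simple XOR erasure coding: parity = XOR of all data fragments, so any one
# missing fragment equals the XOR of the survivors plus the parity.

def make_parity(fragments: list[bytes]) -> bytes:
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, b in enumerate(frag):
            parity[i] ^= b
    return bytes(parity)

def recover(survivors: list[bytes], parity: bytes) -> bytes:
    return make_parity(survivors + [parity])  # XOR is its own inverse

fragments = [b"AAAA", b"BBBB", b"CCCC"]   # spread across 3 servers
parity = make_parity(fragments)           # stored on a 4th server
lost = fragments.pop(1)                   # server 2 fails
assert recover(fragments, parity) == lost
```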
Page 14
SC'13 Demo Comparative Testing: Shared Writes
14 IME Clients, one per compute node; 98 node-local MLC SSDs

DISK LEVEL TESTING (per SSD)                DDN GRIDScaler™    IME
62.5 Concurrent Write Requests              438 MB/s           500 MB/s
125,000 Concurrent Write Requests           170 KB/s           500 MB/s

CLUSTER LEVEL TESTING (overall)             DDN GRIDScaler™    IME
6,225 Concurrent Write Requests (8 MB)      49 GB/s            49 GB/s
12,250,000 Concurrent, Interleaved
  Write Requests (4 KB)                     17 MB/s            49 GB/s

• SSDs behind a PFS don't help; IME runs at line rate and scales with SSD rates
• Linear cluster scaling (avg. 2018 Top500 cluster concurrency: est. 57,772,000 cores)
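Some quick arithmetic on the interleaved cluster-level row (assuming each of the 12,250,000 requests carries a distinct 4 KB payload):

```python
# Sanity arithmetic on the interleaved test above: total bytes issued and
# how long draining them takes at each observed rate.

requests = 12_250_000
req_size = 4 * 1024                      # 4 KB each
total_bytes = requests * req_size        # ~50 GB in flight

ime_rate = 49e9                          # 49 GB/s
pfs_rate = 17e6                          # 17 MB/s

print(f"Total data: {total_bytes / 1e9:.1f} GB")
print(f"IME drains it in ~{total_bytes / ime_rate:.1f} s")
print(f"The PFS drains it in ~{total_bytes / pfs_rate / 3600:.1f} h")
# -> Total data: 50.2 GB; ~1.0 s for IME vs. ~0.8 h for the PFS
```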
Page 15
IME Checkpoint / Migration Workload Demo
Achieves >90% of available storage bandwidth:
• Checkpoint I/O directed at IME (emulated with IOR)
• File #1 (49–50 GB/s)
• File #2 (49–50 GB/s)
• File #3 (49–50 GB/s)
• Migration of File #3 from IME to PFS (4–5 GB/s)
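The value of the burst-absorb-then-migrate pattern shows up in simple arithmetic. In this sketch, the checkpoint size is an assumed value (not stated in the demo); the rates are the ones above:

```python
# IME absorbs a checkpoint at burst rate while migration to the PFS
# proceeds at the much lower drain rate in the background.

ckpt_size_gb = 500          # assumed checkpoint size, not from the demo
ime_rate = 49               # GB/s into IME
pfs_rate = 4.5              # GB/s IME -> PFS migration

absorb_s = ckpt_size_gb / ime_rate
drain_s = ckpt_size_gb / pfs_rate
print(f"Compute is blocked ~{absorb_s:.0f} s instead of ~{drain_s:.0f} s")
# -> Compute is blocked ~10 s instead of ~111 s; migration continues
#    in the background while the application runs.
```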
Page 16
ISC'14 IME Demo Server
► Off-the-Shelf 2U Server Chassis
► Dual-Socket Ivy Bridge with 128 GB RAM
► Up to 24 SSDs per IME Server
► 2 FDR IB Ports
► Expected Burst Bandwidth per IME Server: ~10 GB/s
Page 17
ISC'14 Demo System in DDN Booth
► 16U of IME Servers
► Total Peak BW: ~80 GB/s
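The booth figures are consistent with the per-server numbers on the previous slide (assuming all 16U are filled with the 2U IME servers described there):

```python
# Consistency check on the booth demo figures.
rack_units = 16
server_units = 2             # each IME demo server is a 2U chassis
per_server_gbs = 10          # expected burst bandwidth per IME server

servers = rack_units // server_units
print(f"{servers} servers x {per_server_gbs} GB/s = "
      f"{servers * per_server_gbs} GB/s peak")
# -> 8 servers x 10 GB/s = 80 GB/s peak
```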