
Lustre Beyond HPC: Toward Novel Use Cases for Lustre?

Presented at LUG 2014

Robert Triendl, DataDirect Networks, Inc.
2014/03/31


This is a vendor presentation!


Lustre Today

• The undisputed file system of choice for HPC
  – large-scale parallel I/O for very large HPC clusters (several thousand nodes or larger)
  – applications that generate very large datasets but only a relatively limited amount of metadata traffic
• Alternative solutions
  – are typically more expensive (since proprietary)
  – and not nearly as scalable as Lustre


Lustre Market Overview: Data for the Japan Market

[Chart: Japan storage market share (0-100%) by segment for 2008, 2013, and 2018, covering Cloud & Content, Business Data Analysis, HPC "Work" Corporate, Research Data Analysis, HPC Archive, and HPC "Work".]


What Happened? Example: The Japan Storage Market

• Lustre market expansion
  – From a relatively small player to the dominant HPC file system
  – From tens of OSSs to hundreds of OSSs (and, when including the "K" system, literally thousands of OSSs)
  – But Lustre has also become synonymous with large-scale HPC
  – Experiments to use Lustre in other markets and for other applications have decreased rather than increased


Overall Storage Market Evolution

• "Software-defined storage" is replacing traditional storage architectures in large cloud deployments
• Many Lustre "alternatives" are now available
  – Ceph, OpenStack Swift, SwiftStack, Gluster, etc.
  – Various Hadoop distributions
  – Commercial object storage (DDN WOS, Scality, Cleversafe, etc.)
• Lustre remains "exotic" and is rarely even considered as an option


[Chart: Japan storage market by segment (Research Data Analysis 35%, HPC Work 24%, HPC Archive 23%, HPC "Work" Corporate 15%, Business Data Analysis 6%), mapped against Lustre feature areas: Project Quota, Small File I/O, SSD Acceleration, Fine-Grained Monitoring, NFS/CIFS Access, Management, Connectors, Object/Cloud Links, Data Management, Backup/Replication, HSM, Client Performance, Cluster Integration, Large I/O, I/O Caching, and RAS Features.]


[The market-segment and feature chart above is repeated here.]


Nagoya University Acceptance Benchmark

• Large cluster
  – FPP (file-per-process) I/O from each core in the cluster (sketched below)
  – Most efficient configuration with 3 TB/4 TB drives
  – 350-400 threads per Lustre OST
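As a rough illustration of the file-per-process pattern used in this acceptance test, the sketch below has each local process write its own file and reports aggregate throughput. The target directory, file size, and process count are illustrative assumptions; the real benchmark would run one writer per core across the whole cluster, typically with a tool such as IOR in file-per-process mode.

```python
# Minimal file-per-process (FPP) write sketch. TARGET_DIR, FILE_SIZE, and
# NPROC are illustrative assumptions, not the Nagoya configuration.
import os
import time
from multiprocessing import Pool

TARGET_DIR = "/lustre/scratch/fpp_test"   # assumed Lustre mount point
FILE_SIZE = 256 * 1024 * 1024             # 256 MiB per process
BLOCK = 1024 * 1024                       # 1 MiB sequential writes
NPROC = 64                                # sweep this to reproduce the curve above

def write_one_file(rank):
    """Each worker writes its own file, as in a file-per-process workload."""
    path = os.path.join(TARGET_DIR, f"fpp_{rank:05d}.dat")
    buf = b"\0" * BLOCK
    with open(path, "wb") as f:
        for _ in range(FILE_SIZE // BLOCK):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())              # make sure the data reaches the OSTs

if __name__ == "__main__":
    os.makedirs(TARGET_DIR, exist_ok=True)
    start = time.time()
    with Pool(NPROC) as pool:
        pool.map(write_one_file, range(NPROC))
    elapsed = time.time() - start
    print(f"{NPROC} writers: ~{NPROC * FILE_SIZE / elapsed / 1e9:.2f} GB/sec aggregate")
```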


Nagoya University Initial Data: FPP with a Large Number of Threads

[Chart: single-OST write performance (GB/sec, roughly 0 to 1.4) versus processes per OST (0 to 250).]


Large I/O Patches

[Charts: Lustre backend write and read performance degradation, shown per OST as a percentage of peak performance versus number of processes (0 to 700), for 7.2K SAS with 1 MB RPCs, 7.2K SAS with 4 MB RPCs, and SSD with 1 MB RPCs.]
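A minimal sketch of how the client-side bulk RPC size behind these curves might be inspected and raised, assuming a Lustre build that supports RPCs larger than 1 MB (at the time of this talk that meant the large-I/O patches named above). The osc.*.max_pages_per_rpc tunable counts pages, so 256 pages of 4 KiB give 1 MiB RPCs and 1024 pages give 4 MiB RPCs; the chosen value and the need for root privileges are assumptions.

```python
# Hedged sketch: read and (optionally) raise the per-RPC page count that
# controls Lustre bulk I/O size on a client. Requires root and a client
# that supports the requested RPC size.
import subprocess

def get_rpc_pages():
    """Return the current osc.*.max_pages_per_rpc settings, one line per OSC."""
    out = subprocess.run(
        ["lctl", "get_param", "osc.*.max_pages_per_rpc"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip().splitlines()

def set_rpc_pages(pages=1024):
    """Set max_pages_per_rpc; 1024 pages targets 4 MiB RPCs on 4 KiB-page clients."""
    subprocess.run(
        ["lctl", "set_param", f"osc.*.max_pages_per_rpc={pages}"],
        check=True,
    )

if __name__ == "__main__":
    print("\n".join(get_rpc_pages()))
```

The setting applies per client and, when set this way, does not persist across remounts, so a real deployment would apply it on every client node.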


Raw Device Performance: Write

[Charts: sgpdd-survey write throughput (MB/sec) versus total number of threads for Seagate ES3 7.2K NL-SAS and Hitachi 7.2K NL-SAS drives in RAID 6, at crg values from 1 to 256.]


Raw Device Performance: Read

[Charts: sgpdd-survey read throughput (MB/sec) versus total number of threads for Seagate ES3 7.2K NL-SAS and Hitachi 7.2K NL-SAS drives in RAID 6, at crg values from 1 to 256.]
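The raw-device numbers above come from sgpdd-survey (part of lustre-iokit). As a small illustration of how such sweeps might be post-processed to find the best crg setting, the sketch below assumes the results have already been exported to a CSV with the hypothetical columns crg, total_threads, and mb_s; that layout is an assumption, not the tool's native output format.

```python
# Find the peak throughput per crg value from an assumed CSV export of
# sgpdd-survey results (columns: crg, total_threads, mb_s).
import csv
from collections import defaultdict

def peak_by_crg(path):
    """Return {crg: best throughput in MB/sec} over all thread counts."""
    peaks = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            crg = int(row["crg"])
            peaks[crg] = max(peaks[crg], float(row["mb_s"]))
    return dict(sorted(peaks.items()))

if __name__ == "__main__":
    for crg, mb_s in peak_by_crg("sgpdd_write.csv").items():
        print(f"crg={crg:<4d} peak ~{mb_s:7.1f} MB/sec")
```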


[The market-segment and feature chart above is repeated here.]


Output Data from a Simulation

• Requirements
  – Random reads against multiple large files
  – 2 million 4K random-read IOPS (the access pattern is sketched below)
• Solution
  – Lustre file system with 16 OSS servers
  – Two SFA12K (or one SFA12KXi)
  – 40 SSDs as Object Storage Devices
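A rough, hedged sketch of the 4 KiB random-read pattern described above, using plain POSIX pread from several local processes. The file path, run duration, and process count are illustrative, a real acceptance test would normally use a benchmark such as IOR or fio across many clients, and client-side caching is not bypassed here (O_DIRECT needs aligned buffers), so this only approximates storage-side IOPS.

```python
# Minimal 4 KiB random-read sketch against one pre-written large file.
import os
import random
import time
from multiprocessing import Pool

TEST_FILE = "/lustre/scratch/random_read_target.dat"  # assumed large existing file
BLOCK = 4096
DURATION = 10.0                                       # seconds per worker
NPROC = 32                                            # scale this up across clients

def random_read_worker(seed):
    """Issue 4 KiB reads at random offsets for DURATION seconds; return op count."""
    size = os.path.getsize(TEST_FILE)
    blocks = size // BLOCK
    rng = random.Random(seed)
    ops = 0
    fd = os.open(TEST_FILE, os.O_RDONLY)
    end = time.time() + DURATION
    try:
        while time.time() < end:
            os.pread(fd, BLOCK, rng.randrange(blocks) * BLOCK)
            ops += 1
    finally:
        os.close(fd)
    return ops

if __name__ == "__main__":
    with Pool(NPROC) as pool:
        total_ops = sum(pool.map(random_read_worker, range(NPROC)))
    print(f"~{total_ops / DURATION:,.0f} read IOPS across {NPROC} local processes")
```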


Lustre 4K Random Read IOPS
Configuration: 4 OSSs, 10 SSDs, 16 clients

[Chart: Lustre 4K random-read performance (0 to 600,000 IOPS) for FPP and SSF access with POSIX and MPI-IO, scaling from 1 node/32 processes to 16 nodes/512 processes.]


Genomics Workflows

• Mixed workflows (a per-directory striping sketch follows below)
  – Ingest
  – Sequencing pipelines: large-file I/O
  – Analytics workflows: mixed I/O
• Various I/O issues
  – Random reads for reference data
  – Small-file random reads
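For mixed genomics trees like this, Lustre striping is often tuned per directory. Below is a hedged sketch using the standard lfs setstripe utility; the directory names, stripe counts, and stripe sizes are illustrative assumptions, not recommendations from the talk. Wide stripes suit large sequencer output, while a single stripe keeps small-file analytics I/O on one OST.

```python
# Hedged sketch: per-directory stripe layout for a mixed genomics workspace.
import subprocess

def set_stripe(directory, count, size="1M"):
    """Apply `lfs setstripe -c <count> -S <size>` to a directory."""
    subprocess.run(
        ["lfs", "setstripe", "-c", str(count), "-S", size, directory],
        check=True,
    )

if __name__ == "__main__":
    # Hypothetical directories on a Lustre mount.
    set_stripe("/lustre/project/genomics/raw_runs", count=8, size="4M")  # large sequencer output
    set_stripe("/lustre/project/genomics/analysis", count=1, size="1M")  # small-file analytics
```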


Random Reads with SSDs


Data Analysis “Workflows”

• Scientific data analysis
  – Genomics workflows
  – Seismic data analysis
  – Various types of accelerators
  – Large scientific instruments in astronomy
  – Remote sensing and environmental monitoring
  – Microscopy


Data Analysis “Workflows”

• Additional topics
  – Data ingest
  – Data management and data retention
  – Data distribution and data sharing


Hyperscale Storage: HPC, Cloud, Data Analysis

High Performance Computing
  – Mostly (very) large files (GBs)
  – Mostly write I/O performance
  – Mostly streaming performance
  – 10s of petabytes of data
  – Scratch data
  – 100,000s of cores
  – Mostly InfiniBand
  – Single location
  – Very limited replication factor
  – High efficiency

Cloud Computing
  – Small and medium-size files (MBs)
  – Mostly read I/O performance
  – Mostly transactional performance
  – 10s of billions of files
  – WORM & WORN
  – 10s of millions of cores
  – Almost exclusively Ethernet
  – Highly distributed data
  – High replication factor
  – Low efficiency

Data Analysis Workflows


[The market-segment and feature chart above is repeated here.]


A (DDN) Vision for Lustre

• Maximum sequential and transactional performance per storage sub-system CPU

• Caching at various layers within the data path

• Increased single node streaming and small file performance

• Millions of metadata operations in a single FS

• Millions of random (read) IOPS within a single FS


A (DDN) Vision for Lustre (cont.)

• Data management features, including (cloud) tiering, fast and efficient data back-up, and data lifecycle management

• Novel usability features such as cluster integration, QoS, directory-level quota, etc.

• Extremely high backend reliability for small and mid-sized systems
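Several of the data management items above build on Lustre's HSM interface, which shipped with Lustre 2.5. Below is a minimal sketch of the user-facing lfs hsm_* commands; the file path is illustrative, and a configured copytool and archive backend are assumed.

```python
# Hedged sketch of Lustre HSM usage for tiering / backup style workflows.
import subprocess
import time

def lfs_hsm(action, path):
    """Run `lfs hsm_<action> <path>` and return its stdout."""
    out = subprocess.run(["lfs", f"hsm_{action}", path],
                         check=True, capture_output=True, text=True)
    return out.stdout

if __name__ == "__main__":
    f = "/lustre/project/results/run042.h5"      # hypothetical result file
    lfs_hsm("archive", f)                        # queue a copy to the archive tier
    # Archiving is asynchronous: poll hsm_state until the file reports 'archived'.
    while "archived" not in lfs_hsm("state", f):
        time.sleep(5)
    lfs_hsm("release", f)                        # drop OST objects, keep metadata only
    print(lfs_hsm("state", f))                   # should now also show 'released'
    lfs_hsm("restore", f)                        # stage the data back before reuse
```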


Futures for Lustre?

• Work closely with users
  – User problems are the best source for future direction
  – Translate user problems into roadmap priorities
• Work closely with the Lustre community
  – Work very closely with OpenSFS and Intel HPDD on Lustre roadmap priorities and various other topics
