Lustre Beyond HPC: Toward Novel Use Cases for Lustre?
Presented at LUG 2014
Robert Triendl, DataDirect Networks, Inc., 2014/03/31
This is a vendor presentation!
Lustre Today
• The undisputed file-system-of-choice for HPC
  – large-scale parallel I/O for very large HPC clusters (several thousand nodes or larger)
  – applications that generate very large datasets but only a relatively limited amount of metadata traffic
• Alternative solutions
  – are typically more expensive (since proprietary)
  – and not nearly as scalable as Lustre
Lustre Market Overview: Data for the Japan Market
[Chart: Japan storage market share (%) in 2008, 2013, and 2018, by segment: Cloud & Content, Business Data Analysis, HPC "Work" Corporate, Research Data Analysis, HPC Archive, HPC "Work"]
What Happened? Example: Storage Market Japan
• Lustre market expansion
  – From a relatively small player to the dominant HPC file system
  – From tens of OSSs to hundreds of OSSs (and, when including the "K" system, literally thousands of OSSs)
  – But Lustre has also become synonymous with large-scale HPC
  – Experiments to use Lustre in other markets and for other applications have decreased, rather than increased
Overall Storage Market Evolution
• "Software-defined storage" is replacing traditional storage architectures in large cloud deployments
• Many Lustre "alternatives" are now available
  – Ceph, OpenStack Swift, SwiftStack, Gluster, etc.
  – Various Hadoop distributions
  – Commercial object storage (DDN WOS, Scality, Cleversafe, etc.)
• Lustre remains "exotic" and is rarely even considered as an option
[Pie chart: Japan storage market by segment: BA 6%, HPC "Work" Corporate 15%, Research Data Analysis 35%, HPC Archive 23%, HPC "Work" 24%]
[Feature map: Project Quota, Small File I/O, SSD Acceleration, Fine-Grained Monitoring, NFS/CIFS Access, Management, Connectors, Object/Cloud Links, Data Management, Backup/Replication, HSM, Client Performance, Cluster Integration, Large I/O, I/O Caching, RAS Features]
[Repeat of the market-segment chart and feature map from the previous slide]
Nagoya University Acceptance Benchmark
• Large cluster
  – FPP (file-per-process) I/O from each core in the cluster (see the sketch below)
  – Most efficient configuration with 3 TB/4 TB drives
  – 350-400 threads per Lustre OST
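As a rough illustration of the file-per-process write pattern used in such acceptance benchmarks, here is a minimal Python sketch. This is an assumption for illustration only, not the actual Nagoya benchmark (which would normally be run with a parallel I/O tool such as IOR); the directory path, worker count, and sizes are placeholders. Each worker writes its own file with large sequential writes and the aggregate throughput is reported.

```python
# Minimal file-per-process (FPP) write sketch. Hypothetical paths and sizes;
# not the actual acceptance benchmark.
import os
import time
from multiprocessing import Pool

TARGET_DIR = "/lustre/scratch/fpp_test"  # placeholder Lustre directory
NUM_PROCS = 64                           # workers sharing the file system
BLOCK = 1024 * 1024                      # 1 MiB per write call
BLOCKS_PER_PROC = 1024                   # 1 GiB written by each worker

def write_one(rank):
    # Each worker writes its own file: the "file per process" pattern.
    path = os.path.join(TARGET_DIR, f"fpp.{rank}")
    buf = b"\0" * BLOCK
    with open(path, "wb") as f:
        for _ in range(BLOCKS_PER_PROC):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())             # ensure data has reached the servers
    return BLOCK * BLOCKS_PER_PROC

if __name__ == "__main__":
    os.makedirs(TARGET_DIR, exist_ok=True)
    start = time.time()
    with Pool(NUM_PROCS) as pool:
        written = sum(pool.map(write_one, range(NUM_PROCS)))
    elapsed = time.time() - start
    print(f"{written / 2**30:.1f} GiB in {elapsed:.1f} s, "
          f"{written / 2**30 / elapsed:.2f} GiB/s aggregate")
```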
Nagoya University Initial Data: FPP with a Large Number of Threads
[Chart: write performance of a single OST (GB/sec) vs. processes per OST]
Large I/O Patches
[Chart: Write: Lustre backend performance degradation, per-OST performance as % of peak vs. number of processes, for 7.2K SAS (1 MB RPC), 7.2K SAS (4 MB RPC), and SSD (1 MB RPC)]
[Chart: Read: Lustre backend performance degradation, per-OST performance as % of peak vs. number of processes, for 7.2K SAS (1 MB RPC), 7.2K SAS (4 MB RPC), and SSD (1 MB RPC)]
Raw Device Performance: Write
[Chart: sgpdd-survey write throughput (MB/sec) vs. total number of threads, Seagate ES3 7.2K NL-SAS, RAID6, crg = 1, 2, 4, 8, 16, 32, 64, 128, 256]
[Chart: sgpdd-survey write throughput (MB/sec) vs. total number of threads, Hitachi 7.2K NL-SAS, RAID6, crg = 1, 2, 4, 8, 16, 32, 64, 128, 256]
Raw Device Performance: Read
[Chart: sgpdd-survey read throughput (MB/sec) vs. total number of threads, Seagate ES3 7.2K NL-SAS, RAID6, crg = 1, 2, 4, 8, 16, 32, 64, 128, 256]
[Chart: sgpdd-survey read throughput (MB/sec) vs. total number of threads, Hitachi 7.2K NL-SAS, RAID6, crg = 1, 2, 4, 8, 16, 32, 64, 128, 256]
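For orientation, the kind of measurement sgpdd-survey automates (many concurrent large sequential transfers directly against the block devices behind the OSTs) can be approximated with a small Python sketch like the one below. This is an assumption for illustration only: sgpdd-survey itself drives sgp_dd with the crg/thread parameters shown in the charts, whereas this sketch simply issues 1 MiB reads from several threads against a placeholder device, and the page cache would need to be bypassed or dropped for realistic numbers.

```python
# Rough multi-threaded sequential-read sketch against a raw block device.
# Placeholder device path; read-only, but requires permission on the device.
import os
import time
import threading

DEVICE = "/dev/sdX"        # placeholder block device (or a large file)
IO_SIZE = 1024 * 1024      # 1 MiB per read
READS_PER_THREAD = 1024    # 1 GiB read per thread
NUM_THREADS = 8            # loosely analogous to sgpdd-survey's thread count

def reader(tid, results):
    # Each thread streams through its own contiguous region of the device,
    # loosely analogous to one concurrent region (crg) in sgpdd-survey.
    fd = os.open(DEVICE, os.O_RDONLY)
    try:
        base = tid * READS_PER_THREAD * IO_SIZE
        total = 0
        for i in range(READS_PER_THREAD):
            total += len(os.pread(fd, IO_SIZE, base + i * IO_SIZE))
        results[tid] = total
    finally:
        os.close(fd)

if __name__ == "__main__":
    results = [0] * NUM_THREADS
    threads = [threading.Thread(target=reader, args=(i, results))
               for i in range(NUM_THREADS)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    mib = sum(results) / 2**20
    print(f"{mib:.0f} MiB in {elapsed:.1f} s, {mib / elapsed:.0f} MiB/s")
```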
[Repeat of the market-segment chart and feature map slide]
Output Data from a Simulation
• Requirements
  – Random reads against multiple large files
  – 2 million 4K random read IOPS
• Solution (a sketch of the read pattern follows below)
  – Lustre file system with 16 OSS servers
  – Two SFA12K (or one SFA12KXi)
  – 40 SSDs as Object Storage Devices
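To make the workload concrete, here is a minimal single-process Python sketch of the access pattern described above: 4 KiB random reads spread across several large files, reporting IOPS. The file paths are placeholders and this is an illustration only, not DDN's benchmark; a real measurement would run many such processes across many clients, and without direct I/O or a cold cache the client page cache will inflate the numbers.

```python
# 4 KiB random-read IOPS sketch against a set of large files.
# Placeholder paths; illustration of the access pattern only.
import os
import random
import time

FILES = ["/lustre/sim/output.0", "/lustre/sim/output.1"]  # placeholder files
IO_SIZE = 4096
NUM_READS = 100_000

def random_read_iops(paths, n_reads):
    fds = [os.open(p, os.O_RDONLY) for p in paths]
    sizes = [os.fstat(fd).st_size for fd in fds]
    start = time.time()
    for _ in range(n_reads):
        i = random.randrange(len(fds))
        # Pick a 4 KiB-aligned offset somewhere inside file i.
        offset = random.randrange(sizes[i] // IO_SIZE) * IO_SIZE
        os.pread(fds[i], IO_SIZE, offset)
    elapsed = time.time() - start
    for fd in fds:
        os.close(fd)
    return n_reads / elapsed

if __name__ == "__main__":
    print(f"{random_read_iops(FILES, NUM_READS):,.0f} IOPS (single process)")
```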
Lustre 4K Random Read IOPS
Configuration: 4 OSSs, 10 SSDs, 16 clients
[Chart: Lustre 4K random read IOPS (FPP and SSF) for 1n32p through 16n512p, with series FPP (POSIX), FPP (MPIIO), SSF (POSIX), SSF (MPIIO)]
Genomics Workflows
• Mixed workflows
  – Ingest
  – Sequencing pipelines: large-file I/O
  – Analytics workflows: mixed I/O
• Various I/O issues
  – Random reads for reference data
  – Small-file random reads
Random Reads with SSDs
Data Analysis “Workflows”
• Scientific data analysis
  – Genomics workflows
  – Seismic data analysis
  – Various types of accelerators
  – Large scientific instruments in astronomy
  – Remote sensing and environmental monitoring
  – Microscopy
Data Analysis “Workflows”
• Additional topics
  – Data ingest
  – Data management and data retention
  – Data distribution and data sharing
Hyperscale Storage: HPC, Cloud, Data Analysis

High Performance Computing
– Mostly (very) large files (GBs)
– Mostly write I/O performance
– Mostly streaming performance
– 10s of petabytes of data
– Scratch data
– 100,000s of cores
– Mostly InfiniBand
– Single location
– Very limited replication factor
– High efficiency

Cloud Computing
– Small and medium-size files (MBs)
– Mostly read I/O performance
– Mostly transactional performance
– 10s of billions of files
– WORM & WORN
– 10s of millions of cores
– Almost exclusively Ethernet
– Highly distributed data
– High replication factor
– Low efficiency

Data Analysis Workflows
[Repeat of the market-segment chart and feature map slide]
A (DDN) Vision for Lustre
• Maximum sequential and transactional performance per storage sub-system CPU
• Caching at various layers within the data path
• Increased single node streaming and small file performance
• Millions of metadata operations in a single FS
• Millions of random (read) IOPS within a single FS
A (DDN) Vision for Lustre cont.
• Data management features, including (cloud) tiering, fast and efficient data backup, and data lifecycle management
• Novel usability features such as cluster integration, QoS, directory-level quota, etc.
• Extremely high backend reliability for small and mid-sized systems
Futures for Lustre?
• Work Closely with Users
  – User problems are the best source for future direction
  – Translate user problems into roadmap priorities
• Work Closely with the Lustre Community
  – Work very closely with OpenSFS and Intel HPDD on Lustre roadmap priorities and various other topics
Futures for Lustre?