SOLID STATE DRIVES IN HIGH PERFORMANCE COMPUTING: REDUCING THE I/O BOTTLENECK

Lawrence McIntosh, Systems Engineering Solutions Group
Michael Burke, Ph.D., Strategic Applications Engineering

Sun BluePrints™ Online
Sun Microsystems, Inc.
Part No 821-0125-10, Revision 1.0, 06/25/09
Note: The Lustre file system command lfs setstripe was used on specific directories (st1hdd, st1ssd, st2ssd) to direct I/O to specific HDDs and SSDs for the data contained in this report. In addition, the lfs getstripe command was used to verify that the proper striping was in force and that the specific object storage targets (OSTs) needed to support each test had been assigned. The Java™ Performance statistics monitor (JPerfmeter7) was also used to observe which OSS/OST was active during each run.
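As a minimal sketch (using the option syntax of later Lustre releases; older releases also accepted positional arguments), striping for one of these directories could be set and verified as follows. The mount point and OST index are assumptions for illustration, not values taken from the test environment:

   # Stripe files created in st1ssd across a single OST (index 2 assumed SSD-backed)
   lfs setstripe -c 1 -i 2 /mnt/lustre/st1ssd
   # Confirm the stripe count and OST assignment now in force
   lfs getstripe /mnt/lustre/st1ssd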
IOZone file system testing

The IOZone file system benchmark tool8 is used to perform broad-based performance testing of file systems, using a synthetic workload with a wide variety of file system operations. IOZone is an independent, portable benchmark that is used throughout the industry.

8 http://www.iozone.org/
Runs were first made with the HDD-based OSS and the IOZone benchmark in order to
establish baseline performance. Similar runs were then made using the SSD-based
configuration, again recording performance using IOZone.
From the Lustre file system client, several IOZone commands were used to gather the data for these tests. IOZone commands of the following form were used to direct traffic to the HDD-based OSS on the first Sun Blade X6250 server module.
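A representative invocation takes the following form; the record size, file size, and target path are illustrative assumptions (the st1hdd directory follows the striping setup described above) rather than the exact parameters used in the tests:

   # -i 0 and -i 1 select the sequential write and read tests
   # -r 1m uses a 1 MB record size; -s 4g writes a 4 GB test file
   iozone -i 0 -i 1 -r 1m -s 4g -f /mnt/lustre/st1hdd/iozone.tmp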
Sun used the following configuration for the OSS server to run the IOZone test
comparisons described in this report:
Hardware configuration
• Sun Blade X6250 server module
• Two 3.00 GHz quad-core Intel Xeon Processor E5450 CPUs
• 16 GB memory
• One SSD
• One 10,000 RPM SAS HDD
Software configuration
• CentOS 5.1 x86_64 (a rebuild of Red Hat Enterprise Linux 5.1)
• Linux kernel 2.6.18-53.1.14.el5_lustre.1.6.5smp (Lustre 1.6.5 patched kernel)
• The IOZone file system benchmarking tool
Summary for SSD usage with the Lustre parallel file system

This report has shown that SSD-based OSSs can drive I/O faster than traditional HDD-based OSSs. Testing further showed that the Lustre file system can scale with the use of multiple SSD-based OSSs. Not only can I/O bandwidth be increased by combining the Lustre file system with SSDs, but it is anticipated that run times of other applications using a Lustre file system equipped with SSDs can also be reduced.
Future directions

New technology included in the Lustre file system version 1.8 allows pools of storage to be configured based on technology and performance, and then allocated according to the needs of specific jobs. For example, an elite pool of extremely fast SSD storage could be defined alongside pools of slower, but higher capacity, HDD storage. Other pools might be defined to use local devices, SAN devices, or networked file systems. The Lustre file system then allows these pools to be allocated as needed to specific jobs in order to optimize performance against service level objectives.
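As a minimal sketch of this capability, a pool of SSD-backed OSTs might be defined and used with the Lustre 1.8 tools as follows; the file system name, pool name, OST indices, and directory are hypothetical:

   # Define a pool and add the (assumed) SSD-backed OSTs to it
   lctl pool_new lustre.ssdpool
   lctl pool_add lustre.ssdpool lustre-OST[0000-0003]
   # Files created under this directory will be striped over the SSD pool
   lfs setstripe --pool ssdpool /mnt/lustre/fastjobs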
Performance studies of a production Lustre file system have been performed at the Texas Advanced Computing Center (TACC), using the scaling capabilities of the Lustre file system to obtain higher performance and thereby reduce the I/O bottleneck. (This work is described in the Sun BluePrint Solving the HPC I/O Bottleneck: Sun Lustre Storage System9.)
Future work will explore the use of SSDs integrated with new versions of the Lustre
file system, Quad Data Rate (QDR) InfiniBand, and Sun’s new servers and blades.
Conclusion

The use of SSDs with Sun servers and blades has demonstrated significant performance improvements both in single-system runs of FEA HPC application benchmarks and through the use of the Lustre parallel file system. There is significant promise that other applications with similar data throughput needs and workloads will also obtain increased bandwidth as well as reduced run times.
Appendix: Benchmark descriptions and parameters

The results reported in this article make use of a collection of benchmark numerical applications. Each benchmark suite specifies particular data that should be made available so that the benchmarks can be evaluated fairly. This Appendix notes the required details for each of the benchmarks used.
ABAQUS Standard benchmark test cases

The problems described below provide an estimate of the performance that can be expected when running ABAQUS/Standard on different computers. The jobs are representative of typical ABAQUS/Standard applications, including linear statics, nonlinear statics, and natural frequency extraction.
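For context, a benchmark job of this kind might be launched with scratch I/O directed to an SSD-backed file system as sketched below; the core count, memory value, and scratch path are assumptions for illustration, not the settings used in the reported runs:

   # Run the S1 benchmark on 8 cores with scratch files on the SSD file system
   abaqus job=s1 input=s1.inp cpus=8 memory="2 gb" scratch=/ssd/scratch interactive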
• S1: Plate with gravity load
This benchmark is a linear static analysis of a plate with gravity loading. The plate
is meshed with second-order shell elements of type S8R5 and uses a linear elastic
material model. Edges of the plate are fixed. There is no contact.
– Input file name: s1.inp
– Increments: 1
– Iterations: 1
– Degrees of freedom: 1,085,406
– Floating point operations: 1.89E+11
– Minimum memory requirement: 587 MB
– Memory to minimize I/O: 2 GB
– Disk space requirement: 2 GB
• S2: Flywheel with centrifugal load
This benchmark is a mildly nonlinear static analysis of a flywheel with centrifugal
loading. The flywheel is meshed using first-order hexahedral elements of type
C3D8R and uses an isotropic hardening Mises plasticity material model. There
is no contact. The nonlinearity in this problem arises from localized yielding in
the vicinity of the bolt holes. Two versions of this benchmark are provided; they are identical except that one uses the direct sparse solver and the other uses the iterative solver.
• S2a: Direct solver version
– Input file name: s2a.inp
– Increments: 6
– Iterations: 12
– Degrees of freedom: 474,744
– Floating point operations: 1.86E+12
– Minimum memory requirement: 733 MB
– Memory to minimize I/O: 849 MB
– Disk space requirement: 4.55 GB
• S2b: Iterative solver version
– Input file name: s2b.inp
– Increments: 6
– Iterations: 11
– Degrees of freedom: 474,744
– Floating point operations: 8.34E+10
– Minimum memory requirement: 2.8 GB
– Memory to minimize I/O: NA
– Disk space requirement: 387 MB
• S3: Impeller frequencies
This benchmark extracts the natural frequencies and mode shapes of a turbine
impeller. The impeller is meshed with second-order tetrahedral elements of type
C3D10 and uses a linear elastic material model. Frequencies in the range from 100 Hz to 20,000 Hz are requested.
Three versions of this benchmark are provided: a 360,000 DOF version that
uses the Lanczos eigensolver, a 1,100,000 DOF version that uses the Lanczos
eigensolver, and a 1,100,000 DOF version that uses the AMS eigensolver.
• S3a: 360,000 DOF Lanczos eigensolver version
– Input file name: s3a.inp
– Degrees of freedom: 362,178
– Floating point operations: 3.42E+11
– Minimum memory requirement: 384 MB
– Memory to minimize I/O: 953 MB
– Disk space requirement: 4.0 GB
• S3b: 1,100,000 DOF Lanczos eigensolver version
– Input file name: s3b.inp
– Degrees of freedom: 1,112,703
– Floating point operations: 3.03E+12
– Minimum memory requirement: 1.33 GB
– Memory to minimize I/O: 3.04 GB
– Disk space requirement: 23.36 GB
• S3c: 1,100,000 DOF AMS eigensolver version
– Input file name: s3c.inp
– Degrees of freedom: 1,112,703
– Floating point operations: 3.03E+12
– Minimum memory requirement: 1.33 GB
– Memory to minimize I/O: 3.04 GB
– Disk space requirement: 19.3 GB
• S4: Cylinder head bolt-up
This benchmark is a mildly nonlinear static analysis that simulates bolting a
cylinder head onto an engine block. The cylinder head and engine block are
meshed with tetrahedral elements of types C3D4 or C3D10M, the bolts are meshed
using hexahedral elements of type C3D8I, and the gasket is meshed with special-
purpose gasket elements of type GK3D8. Linear elastic material behavior is used
for the block, head, and bolts while a nonlinear pressure-overclosure relationship
with plasticity is used to model the gasket. Contact is defined between the bolts
and head, the gasket and head, and the gasket and block. The nonlinearity in this
problem arises both from changes in the contact conditions and yielding of the
gasket material as the bolts are tightened.
Three versions of this benchmark are provided: a 700,000 DOF version that is
suitable for use with the direct sparse solver on 32-bit systems, a 5,000,000 DOF
version that is suitable for use with the direct sparse solver on 64-bit systems, and
a 5,000,000 DOF version that is suitable for use with the iterative solver on 64-bit
systems.
• S4a: 700,000 DOF direct solver version
– Input file name: s4a.inp
– Increments: 1
– Iterations: 5
– Degrees of freedom: 720,059
– Floating point operations: 5.77E+11
– Minimum memory requirement: 895 MB
– Memory to minimize I/O: 3 GB
– Disk space requirement: 3 GB
• S4b: 5,000,000 DOF direct solver version
– Input file name: s4b.inp
– Increments: 1
– Iterations: 5
– Degrees of freedom: 5,236,958
– Floating point operations: 1.14E+13
– Minimum memory requirement: 4 GB
– Memory to minimize I/O: 20 GB
– Disk space requirement: 23 GB
• S4c: 5,000,000 DOF iterative solver version
– Input file name: s4c.inp
– Increments: 1
– Iterations: 3
– Degrees of freedom: 5,248,154
– Floating point operations: 3.74E+11
– Minimum memory requirement: 16 GB
– Memory to minimize I/O: NA
– Disk space requirement: 3.3 GB
• S5: Stent expansion
This benchmark is a strongly nonlinear static analysis that simulates the
expansion of a medical stent device. The stent is meshed with hexahedral
elements of type C3D8 and uses a linear elastic material model. The expansion
tool is modeled using surface elements of type SFM3DR. Contact is defined
between the stent and expansion tool. Radial displacements are applied to the expansion tool, which in turn causes the stent to expand. The nonlinearity in this problem arises from large displacements and sliding contact.
– Input file name: s5.inp
– Increments: 21
– Iterations: 91
– Degrees of freedom: 181,692
– Floating point operations: 1.80E+09
– Minimum memory requirement: NA
– Memory to minimize I/O: NA
– Disk space requirement: NA
Note: Abaqus, Inc. would like to acknowledge Nitinol Devices and Components for providing the original finite element model of the stent. The stent model used in this benchmark is not representative of current stent designs.
• S6: Tire footprint
This benchmark is a strongly nonlinear static analysis that determines the
footprint of an automobile tire. The tire is meshed with hexahedral elements of
type C3D8, C3D6H, and C3D8H. Linear elastic and hyperelastic material models
are used. Belts inside the tire are modeled using rebar layers and embedded
elements. The rim and road surface are modeled as rigid bodies. Contact is
defined between the tire and wheel and the tire and road surface. The analysis
sequence consists of three steps. During the first step the tire is mounted to the
wheel, during the second step the tire is inflated, and then during the third step a
vertical load is applied to the wheel. The nonlinearity in the problem arises from
large displacements, sliding contact, and hyperelastic material behavior.
– Input file name: s6.inp
– Increments: 41
– Iterations: 177
– Degrees of freedom: 729,264
– Floating point operations: NA
– Minimum memory requirement: 397 MB
– Memory to minimize I/O: 940 MB
– Disk space requirement: NA
Hardware configuration
• Sun Fire X4450 server
• Four 2.93 GHz quad-core Intel Xeon Processor X7350 CPUs
• Four 15,000 RPM 500 GB SAS drives
• Three 32 GB SSDs
The system was set up to boot from one of the hard disk drives. The baseline hard disk-based file system was set to stripe across three SAS HDDs. For comparative purposes, the SSD-based file system was configured across three SSDs.
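The report does not state which striping mechanism was used; as one plausible sketch, a Linux md RAID0 stripe across the three SSDs could be built as follows (device names and mount point are hypothetical):

   # Build a RAID0 stripe across the three SSDs and mount it for scratch I/O
   mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
   mkfs.ext3 /dev/md0
   mount /dev/md0 /ssd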
Software configuration
• 64-bit SUSE Linux Enterprise Server (SLES) 10 SP1
• ABAQUS V6.8-1 Standard Module
• ABAQUS 6.7 Standard Benchmark Test Suite
NASTRAN benchmark test cases

The problems described below are representative of typical MSC/Nastran applications, including both SMP and DMP runs involving linear statics, nonlinear statics, and natural frequency extraction.
• vl0sst1
– No. Degrees Of Freedom: 410,889
Run time is sensitive to the memory allocated to the job (see the example invocation after this list):
– 2:04:36 elapsed with mem=37171200
– 4:35:26 elapsed with mem=160mb sys1=32769
– 5:20:12 elapsed with mem=80mb sys1=32769
– 1:11:58 elapsed with mem=1600mb bpool=40000
(This job does extensive post-solution processing of GPSTRESS I/O.)
– Solver: SOL 101
– Memory Usage: 7.3 MB
– Maximum Disk Usage: 4.33 GB
• xx0cmd2
– No. Degrees Of Freedom: 1,315,562
– Solver: SOL 103
– Normal Modes With ACMS - DOMAINSOLVER ACMS (Automated Component
Modal Synthesis)
– Memory Usage: 1800 MB
– Maximum Disk Usage: 14.422 GB
• xl0tdf1
– No. Degrees Of Freedom: 529,257
– Solver: SOL 108 Fluid/Solid Interaction
– Car Cabin Noise - FULL VEHICLE SYSTEM MODEL
– Eigenvalue extraction - Direct Frequency Response
– Memory Usage: 520 MB
– Maximum Disk Usage: 5.836 GB
• xl0imf1
– No. Degrees Of Freedom: 468,233
– Fluid/Solid Interaction
– Frequency Response
– Memory Usage: 503 MB
– Maximum Disk Usage: 10.531 GB
• md0mdf1
– No. Degrees Of Freedom: 42,066
– This model is for Exterior Acoustics
– Modal Frequency Response Analysis With UMP Pack
– Fluid/Solid Interaction
– Memory Usage: 1 GB
– Maximum Disk Usage: 414.000 MB
• 400_1 & 400_S
– No. Degrees Of Freedom: 437,340
– Solver: 400 (MARC module)
– Nonlinear Static Analysis
– Memory Usage: 1.63 GB
– Maximum Disk Usage: 3.372 GB
(The S model sets aside 3 GB of physical memory for I/O buffering.)
• getrag (Contact Model)
– No. Degrees Of Freedom: 2,450,320
– PCGLSS 6.0: Linear Equations Solver
– Solver: 101
– Memory Usage: 8.0 GB
– Maximum Disk Usage: 17.847 GB
– Total I/O: 139 GB
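To illustrate the memory and scratch settings referenced above, a job such as vl0sst1 might be submitted as follows; the scratch directory is an assumption chosen to match the SSD theme of this report, not the exact path used in these runs:

   # Run vl0sst1 with 1600 MB of memory and scratch files on the SSD file system
   nastran vl0sst1 memory=1600mb sdirectory=/ssd/scratch scratch=yes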
Hardware configuration
• Sun Fire X2270 server
• Two 2.93 GHz quad-core Intel Xeon Processor X5570 CPUs
• 24 GB memory
• Three 7200 RPM SATA 500 GB HDDs
• Two 32 GB SSDs
The system was set up to boot from one of the hard disk drives. The baseline hard disk-based file system was set to stripe across two SATA HDDs. For comparative purposes, the SSD-based file system was configured across both SSDs.
Software configuration
• 64-bit SUSE Linux Enterprise Server (SLES) 10 SP1
• MSC/NASTRAN MD 2008
• MSC/NASTRAN Vendor_2008 Benchmark Test Suite
ANSYS 12.0 (Prerelease 7) with ANSYS 11.0 distributed benchmarks
• bmd-1
– dsparse (distributed sparse) solver, 400K DOF
– Static analysis
– Medium-sized job; should run in-core on all systems
• bmd-2
– 1M DOF iterative solver job
– Shows good scaling due to a simple preconditioner
• bmd-3
– 2M DOF static analysis
– Shows good parallel performance for the iterative solver
– Uses the PCG iterative solver
– Uses the msave,on feature; cache friendly
• bmd-4
– Larger dsparse solver job
– 3M DOF; a tricky job for dsparse when memory is limited
– Exercises I/O as well as CPU performance
– Good for showing the benefit of large memory
• bmd-5
– 5.8M DOF large PCG solver job
– Good parallel performance for the iterative solver on a larger job
– Cache-friendly msave,on elements
• bmd-6
– 1M DOF LANPCG run; uses the assembled matrix with a PCG preconditioner
– New iterative modal analysis solver, chosen to maximize speedups
• bmd-7
– 5M DOF static analysis; uses SOLID45 elements
– Best test of memory bandwidth performance; these are NOT msave,on elements
– A lower MFLOP rate is expected because of the sparse matrix/vector kernel
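For reference, a distributed ANSYS benchmark run of this kind might be launched as shown below; the executable name follows the usual ANSYS release naming, and the core count and file names are illustrative assumptions:

   # Launch Distributed ANSYS in batch mode on 8 cores
   ansys120 -b -dis -np 8 -i bmd-3.inp -o bmd-3.out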
Hardware configuration
• Sun Fire X2270 server
• Two 2.93 GHz quad-core Intel Xeon Processor X5570 CPUs
• 24 GB memory
• Two 32 GB SSDs
• Three 7200 RPM SATA 500 GB HDDs
The system was set up to boot from one of the hard disk drives. The baseline hard disk-based file system was set to stripe across two SATA HDDs. For comparative purposes, the SSD-based file system was configured across both SSDs.
Software configuration
• 64-bit SUSE Linux Enterprise Server (SLES) 10 SP2
• ANSYS V12.0 Prerelease 7
• ANSYS 11 Distributed BMD Benchmark Test Suite
About the authors

Larry McIntosh is a Principal Systems Engineer at Sun Microsystems and works within Sun’s Systems Engineering Solutions Group. He is responsible for designing and implementing high performance computing technologies at Sun’s largest customers. Larry has 35 years of experience in the computer, communications, and storage industries and has been a software developer and consultant in the commercial, government, education, and research sectors, as well as a computer science college professor. Larry’s recent work has included the deployment of the Ranger system serving the National Science Foundation and researchers at the Texas Advanced Computing Center (TACC) in Austin, Texas.
Michael Burke obtained his Ph.D. from Stanford University. Since then he has spent over 35 years in the development and application of MCAE software. He was the principal developer of the MARC code, now owned by MSC Software. Following the Space Shuttle Challenger disaster, he developed FANTASTIC (Failure Analysis Thermal and Structural Integrated Code) for NASA and its suppliers and contractors for the analysis of rocket nozzles. More recently he has been involved with the benchmarking of state-of-the-art HPC platforms using the more prominent commercial ISV MCAE/CFD/CRASH and other scientific applications. He has performed this benchmarking for Fujitsu and Hewlett-Packard, and is currently in the Strategic Applications Engineering group at Sun Microsystems.
References

Web Sites
• Sun Fire X4450 server: http://www.sun.com/servers/x64/x4450/
• Sun Fire X2270 server: http://www.sun.com/servers/x64/x2270/
• Sun HPC Software, Linux Edition: http://www.sun.com/software/products/hpcsoftware/index.xml
• Sun Blade X6250 server module: http://www.sun.com/servers/blades/x6250/
• Sun Blade 6000 Modular System chassis: http://www.sun.com/servers/blades/6000/
• JPerfMeter: http://jperfmeter.sourceforge.net/
• IOZone benchmark: http://www.iozone.org/

Sun BluePrints Articles
• Solving the HPC I/O Bottleneck: Sun Lustre Storage System: http://wikis.sun.com/display/BluePrints/Solving+the+HPC+IO+Bottleneck+-+Sun+Lustre+Storage+System
Ordering Sun Documents

The SunDocs℠ program provides more than 250 manuals from Sun Microsystems, Inc. If you live in the United States, Canada, Europe, or Japan, you can purchase documentation sets or individual manuals through this program.
Accessing Sun Documentation Online

The docs.sun.com Web site enables you to access Sun technical documentation online. You can browse the docs.sun.com archive or search for a specific book title or subject. The URL is http://docs.sun.com.
To reference Sun BluePrints Online articles, visit the Sun BluePrints Online Web site
at: http://www.sun.com/blueprints/online.html
Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 USA Phone 1-650-960-1300 or 1-800-555-9SUN (9786) Web sun.com
Solid State Drives in HPC: Reducing the I/O Bottleneck