Performance Study: Abaqus/Standard 6.8-3
Stan Posey, Director, Industry and Applications Market Development, Panasas, Fremont, CA, USA
Bill Loewe, Ph.D., Sr. Applications Engineer, Panasas, Fremont, CA, USA
Slide 2 | Please Keep Confidential Between CSC and Panasas | Please Keep Confidential to Customer and Panasas
Background on Abaqus/Standard Study
Abaqus is an application from SIMULIA -- not a benchmark kernel
The FEA model and tests are relevant to customer practice
All tests were run on a dedicated system at Panasas
The results were validated by SIMULIA
Since Apr 2007, SIMULIA and Panasas have made joint investments in a business and technical alliance that ensures Abaqus will fully leverage Panasas PanFS
This study demonstrates benefits of the Panasas parallel file system and parallel storage for Abaqus/Standard 6.8-3, with tests for both single-job and multi-job computing
Motivation
Considerations
Slide 3
Abaqus/Standard 6.8-3: Model S4b 5M DOF Non-linear Static Analysis
Automotive engine block cylinder head bolt-up
Panasas Study on Abaqus/Standard 6.8-3
Slide 4
Abaqus/Standard I/O Scheme
CSM implicit solver
Abaqus/Standard is direct and single-step, with out-of-core READS and WRITES
-- I/O occurs in the sparse factor phase of the solver
-- this scheme is for a static analysis; for an eigen (Lanczos) solution, I/O can be VERY heavy
-- NOTE: Abaqus also has an implicit iterative solver
Job Task | IO Scheme | IO Operation
(start)
Element matrix generation and assembly into global matrix | Work Dir: serial IO | Read nodes, elements and control file
Matrix factor (dominant phase, as much as 85% of total time, often I/O wait) | Scratch Dir: parallel IO | Factor matrix out-of-core, reads/writes
FBS solve phase, stress recovery, multiple RHS's | Work Dir: serial IO | Write solution results [100's of GB's of I/O]
(complete)
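The remark that the matrix factor phase can take as much as 85% of total time suggests a rough ceiling on what faster scratch I/O in that phase alone could buy. The following is a minimal sketch using Amdahl's law, assuming the 85% figure as the accelerated fraction `f`. The function name and the factor-phase speedups `s` are hypothetical and do not come from the slides.

```python
def overall_speedup(f: float, s: float) -> float:
    """Amdahl's law: overall speedup when a fraction f of run time is sped up by s."""
    if not 0.0 <= f <= 1.0 or s <= 0.0:
        raise ValueError("f must be in [0, 1] and s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# "as much as 85% of total time" is taken from the slide; treat it as an upper estimate.
FACTOR_FRACTION = 0.85

if __name__ == "__main__":
    for s in (2, 4, 8):  # hypothetical speedups of the factor phase only
        gain = overall_speedup(FACTOR_FRACTION, s)
        print(f"factor phase {s}x faster -> whole job {gain:.2f}x faster")
    # Limit as the factor phase cost goes to zero: 1 / (1 - f)
    print(f"upper bound: {1.0 / (1.0 - FACTOR_FRACTION):.2f}x")
```

The remaining 15% (input read, solve, results write) uses the serial Work Dir path in the scheme above, which is why the bound stays finite however fast the scratch directory becomes.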
Slide 5
Features of the Hardware System Configurations
NOTE: Panasas total 30 TB in 12U, installed and operational in just 1 hr!
Features of Penguin cluster configuration:
Processors: 2.3 GHz QC AMD Opteron
Nodes: 8 x 2 Sockets x 4 cores; 2 GB/core
Interconnect: 10 GigE
Local FS: Ext3, single drive per node, 160 GB SATA, 7200 RPM
Features of the Panasas storage system:
3 shelves: 1 director + 10 storage blades
Each shelf 10 TB, total of 30 TB
[Diagram labels: 8 nodes, 64 cores; 10 GigE]
Slide 6
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS
S4b Performance for Single Core
Times for Single Job on a Single Core
1 Job x 1 Core x 1 Node | 5M DOF Engine Block
[Bar chart: Total Time in Seconds, lower is better]
File System | Num Ops | Total Time
PanFS | 12732 | 13674
Local FS | 12770 | 15108
NOTE: Num-Ops times within 1%; difference is IO
NOTE: PanFS 11% Advantage in Total Time vs. Local FS
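The slide does not say how the "advantage" percentage is derived. As a hedged illustration, the sketch below computes two common definitions from the single-core totals (13674 s for PanFS, 15108 s for Local FS). The function names are illustrative, and neither formula is confirmed as the one the authors used. For these totals the two definitions give roughly 9.5% and 10.5%.

```python
def time_reduction(t_fast: float, t_slow: float) -> float:
    """Share of the slower run time that is saved: (slow - fast) / slow."""
    return (t_slow - t_fast) / t_slow


def speed_ratio_gain(t_fast: float, t_slow: float) -> float:
    """How much faster the quicker run is: slow / fast - 1."""
    return t_slow / t_fast - 1.0


PANFS_TOTAL = 13674  # seconds, PanFS total time from the chart
LOCAL_TOTAL = 15108  # seconds, Local FS total time from the chart

if __name__ == "__main__":
    print(f"time reduction:   {time_reduction(PANFS_TOTAL, LOCAL_TOTAL):.1%}")
    print(f"speed ratio gain: {speed_ratio_gain(PANFS_TOTAL, LOCAL_TOTAL):.1%}")
```

The two definitions diverge as the gap grows, so it helps to state which one is meant when comparing percentages across different test cases.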
Slide 7
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS
Numerical vs. IO Computational Profile
Job Profiles of Numerical Ops % vs. IO %
1 Job x 1 Core x 1 Node | 5M DOF Engine Block
[Stacked bar chart: Total Time in Seconds, lower is better]
File System | Total Time | Numerical Operations | IO
PanFS | 13674 | 93% | 7%
Local FS | 15108 | 84% | 16%
[Unlabeled chart annotation: Solver, 97%, 50%]
NOTE: PanFS 11% Advantage in Total Time vs. Local FS
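The numerical and IO shares in this profile can be approximated from the single-core chart values (Num Ops 12732 s and total 13674 s for PanFS; Num Ops 12770 s and total 15108 s for Local FS). This is a minimal sketch, assuming IO time is simply total time minus numerical time; the slides do not state how the shares were measured, and the function name is illustrative.

```python
def io_profile(num_ops_s: float, total_s: float) -> dict:
    """Split a run into numerical and IO shares, treating IO time as total minus numerical."""
    io_s = total_s - num_ops_s
    return {
        "io_seconds": io_s,
        "numerical_share": num_ops_s / total_s,
        "io_share": io_s / total_s,
    }


# (Num Ops seconds, Total seconds) per file system, from the single-core chart
RUNS = {
    "PanFS": (12732, 13674),
    "Local FS": (12770, 15108),
}

if __name__ == "__main__":
    for name, (num_ops, total) in RUNS.items():
        p = io_profile(num_ops, total)
        print(f"{name}: IO {p['io_seconds']} s "
              f"({p['io_share']:.1%} IO, {p['numerical_share']:.1%} numerical)")
```

Under that assumption the IO shares come out close to the 7% and 16% shown, and nearly all of the difference in total time is IO time, since the numerical times differ by well under 1%.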
Slide 8
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS
S4b Performance for 1 Core x 8 Nodes
Average Times of 8 Simultaneous Jobs
8 Jobs x 1 Core x 8 Nodes | 5M DOF Engine Block
Average of 8 Jobs | Each on 1 Core | Each on 1 Node | 7 Cores Idle on Each Node
[Bar chart: Total Time in Seconds, lower is better]
File System | Num Ops | Total Time
PanFS | 12641 | 14373
Local FS | 12654 | 15064
NOTE: Num-Ops times within 1%; difference is IO
NOTE: PanFS 5% Advantage in Total Time vs. Local FS
Slide 9
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS
Numerical vs. IO Computational Profile
Job Profiles of Numerical Ops % vs. IO %
8 Jobs x 1 Core x 8 Nodes | 5M DOF Engine Block
Average of 8 Jobs | Each on 1 Core | Each on 1 Node | 7 Cores Idle on Each Node
[Stacked bar chart: Total Time in Seconds, lower is better]
File System | Total Time | Numerical Operations | IO
PanFS | 14373 | 88% | 12%
Local FS | 15064 | 81% | 19%
[Unlabeled chart annotation: Solver, 97%, 50%]
NOTE: PanFS 5% Advantage in Total Time vs. Local FS
Slide 10
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS
S4b Performance for 8 Cores x 1 Node
Single Job on Single 8-Core Node
1 Job x 8 Cores x 1 Node | 5M DOF Engine Block
[Bar chart: Total Time in Seconds, lower is better]
File System | Total Time
PanFS | 2946
Local FS | 5495
NOTE: PanFS 46% Advantage in Total Time vs. Local FS
Slide 11
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS
S4b Performance for Single Job Scaling
Scalability of Single Job from 1 to 8 Cores
5M DOF Engine Block
[Bar chart: Total Time in Seconds, lower is better]
File System | 1 Job, 1 Core | 1 Job, 8 Cores | Speedup on 8 Cores
PanFS | 13674 | 2946 | 4.6x
Local FS | 15108 | 5495 | 2.8x
NOTE: PanFS 58% in Parallel Efficiency vs. 35% for Local FS
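The speedup and parallel-efficiency figures on this slide follow from the usual definitions: speedup is the one-core time divided by the eight-core time, and efficiency is speedup divided by the core count. The sketch below applies them to the chart totals (PanFS 13674 s and 2946 s; Local FS 15108 s and 5495 s). The function names are illustrative. With unrounded arithmetic the Local FS result is about 2.75x and 34%, so the 2.8x and 35% on the slide appear to reflect rounding.

```python
def speedup(t_one_core: float, t_n_cores: float) -> float:
    """Speedup of the n-core run relative to the one-core run."""
    return t_one_core / t_n_cores


def parallel_efficiency(t_one_core: float, t_n_cores: float, n_cores: int) -> float:
    """Speedup divided by core count; 1.0 would be ideal linear scaling."""
    return speedup(t_one_core, t_n_cores) / n_cores


# (1-core seconds, 8-core seconds) per file system, from the chart
TIMES = {
    "PanFS": (13674, 2946),
    "Local FS": (15108, 5495),
}
N_CORES = 8

if __name__ == "__main__":
    for name, (t1, t8) in TIMES.items():
        s = speedup(t1, t8)
        e = parallel_efficiency(t1, t8, N_CORES)
        print(f"{name}: {s:.2f}x on {N_CORES} cores, efficiency {e:.0%}")
```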
Slide 12
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS
S4b Performance for 8 Cores x 4 Nodes
Average Times of 4 Simultaneous Jobs
4 Jobs x 8 Cores x 4 Nodes | 5M DOF Engine Block
Average of 4 Jobs | Each Job on 8 Cores | Each Job on 1 Node Using All 8 Cores
[Bar chart: Total Time in Seconds, lower is better]
File System | Total Time
PanFS | 3773
Local FS | 5289
NOTE: PanFS 40% Advantage in Total Time vs. Local FS
Slide 13
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS
S4b Performance for Single vs. Multi-Job
Times of Single 8-way and Multi 8-way Jobs
5M DOF Engine Block
[Bar chart: Total Time in Seconds, lower is better]
File System | 1 Job, 8-way | 4 Jobs, 8-way
PanFS | 2946 | 3773
Local FS | 5495 | 5289
NOTE: PanFS degrades 22% for 1 to 4 nodes
NOTE: Local FS about the same for 1 to 4 nodes; each FS on node is independent
NOTE: PanFS 40% Advantage in Total Time vs. Local FS
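To put the single-job and multi-job times side by side, the sketch below computes the multi-job slowdown and an aggregate throughput from the chart totals (PanFS 2946 s single, 3773 s multi; Local FS 5495 s single, 5289 s multi). Two assumptions are made that the slides do not state: the slowdown is expressed relative to the multi-job time, which gives about 22% for PanFS, and the four jobs run concurrently so that throughput is the job count divided by the average time. The function names are illustrative.

```python
SECONDS_PER_HOUR = 3600


def slowdown_share(t_single: float, t_multi: float) -> float:
    """Share of the multi-job time added by sharing: (multi - single) / multi."""
    return (t_multi - t_single) / t_multi


def jobs_per_hour(n_concurrent: int, avg_time_s: float) -> float:
    """Aggregate throughput, assuming all jobs run concurrently and finish in avg_time_s."""
    return n_concurrent * SECONDS_PER_HOUR / avg_time_s


# (single-job seconds, 4-job average seconds) per file system, from the chart
TIMES = {
    "PanFS": (2946, 3773),
    "Local FS": (5495, 5289),
}
N_JOBS = 4

if __name__ == "__main__":
    for name, (single, multi) in TIMES.items():
        print(f"{name}: slowdown {slowdown_share(single, multi):+.1%}, "
              f"{jobs_per_hour(N_JOBS, multi):.2f} jobs/hour with {N_JOBS} jobs")
```

A negative slowdown for Local FS simply means the multi-job average was slightly lower than the single-job time, consistent with the note that each node's local file system is independent.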
Slide 14
Panasas and Intel Abaqus S4b Study
Intel ENDEAVOR Xeon Cluster
Location: Intel HPC Customer Enabling Center, Dupont, WA
Vendor: Intel; 80 nodes; 640 cores; 18 GB memory per node
CPU: Intel Xeon (Nehalem) QC, 2.8 GHz, 8 cores per node
Interconnect: InfiniBand
File Systems: Panasas PanFS; Lustre on DDN; Local disk
Operating System: RHEL Linux v5.2
ENDEAVOR File Systems and Storage
PanFS: 2 Shelves AS6000 (1+10 and 2+9), 38 TB FS; network connected through 10GigE switches and IB router, ~ 1.2 GB/s
Lustre: DDN storage, 100 TB FS, ~ 5 GB/s
Local FS: Ext2 FS, 370 GB SATA drive, 80 MB/s per disk
Panasas: 16 client iozone, 1180 MB/s write, 1260 MB/s read
DDN/Lustre: 16 client iozone, 5390 MB/s write, 3370 MB/s read
Local FS: Ext2, ~80 MB/s per disk
Slide 15
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS Ext2
S4b Performance for Single Job Scaling
Single Job Scalability 8 to 32 Cores; Memory 90%
5M DOF Engine Block
[Bar chart: Total Time in Seconds vs. Number of Cores, lower is better]
Number of Cores | PanFS | Local FS
8 | 2613 | 4180
16 | 1268 | 1289
32 | 639 | 574
NOTE: PanFS advantage over Local (60%) for single node case when IO is heavy; in the same range for 2-4 nodes when job goes in-memory
Slide 16
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS Ext2
S4b Performance for Single Job Scaling
Single Job Scalability on 16 Cores; Memory 90%/70%
16 Cores Each Case | 5M DOF Engine Block
[Bar chart: Total Time in Seconds, lower is better]
Memory Setting | PanFS | Local FS
Memory 90% | 1268 | 1289
Memory 70% | 1219 | 1219
NOTE: Effect of memory setting
Slide 17
Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS Ext2
S4b Performance for Multi-Job Thru-put
Average Times for 8 Jobs, Each 16 Cores; Mem 90%
8 Jobs x 16 Nodes x 128 Cores | 5M DOF Engine Block
Average Times for 8 Jobs | Each Job on 2 Nodes | Each Job on 16 Cores | Total 128 Cores
[Bar chart: Total Time in Seconds, lower is better]
File System | Total Time (Average of 8 Jobs)
PanFS | 1294
Local FS | 1360
NOTE: PanFS and Local FS difference ~ 5%
Slide 18
Questions
Thank You
For more information, call Panasas at:
1-888-PANASAS (US & Canada)
00 (800) PANASAS2 (UK & France)
00 (800) 787-702 (Italy)
+001 (510) 608-7790 (All Other Countries)