Aerodynamics of a high-performance vehicle: a parallel computing application inside the Hi-ZEV project
Workshop “HPC enabling of OpenFOAM® for CFD applications”
26-28 November, CINECA, Casalecchio di Reno (BO), Italy
A. De Maio(1), V. Krastev(2), P. Lanucara(3), F. Salvadore(3)
(1) Nu.m.i.d.i.a. S.r.l.
(2) Dept. of Industrial Engineering, University of Rome “Tor Vergata”
(3) CINECA Roma, Dipartimento SCAI
Summary
• Hi-ZEV project outline
• Preliminary evaluation of the OpenFOAM® code
• Prototype car simulations: aerodynamic results and scalability/performance tests
• Conclusions
Hi-ZEV: a collaborative industrial research project
• Granted by the Italian Ministry of Economic Development’s program «Industria 2015 – Nuove Tecnologie per il Made in Italy»
• The project aim is the development of an innovative high-performance car with low environmental impact, based on an electric/hybrid powertrain
• The project started on 01/01/2011 and will last until 31/12/2013
Hi-ZEV: the partners
Hi-ZEV: technical key points
• Very light vehicle (low weight/power ratio)
• High-performance hybrid powertrain for wide-range torque availability
• Very advanced chassis and suspensions for excellent road-holding
• Accurate fluid-dynamic design → CFD
The role of CFD inside the project
• In the early as well as in the more advanced design stages, CFD can be effectively used to optimize:
1. the external aerodynamics of the vehicle;
2. the underhood aerodynamics/thermal management;
3. the HVAC systems.
• The combination of an open-source, fully parallelized code (OpenFOAM®) with the HPC infrastructure of CASPUR/CINECA is a powerful and efficient answer to these needs.
[Diagram: OpenFOAM® + HPC serving the three CFD application areas: external aerodynamics, underhood, HVAC]
Preliminary simulations on the Matrix cluster
• Preliminary evaluation of OpenFOAM® on the Matrix infrastructure
• Standard external aerodynamics test case (Ahmed body)
• OpenFOAM-1.7.1 + OpenMPI-1.4.2 + Scotch for domain decomposition (see the sketch below)
• Steady-state solver (simpleFoam) on unstructured grids (up to 6×10⁶ cells)
• Three architectures selected for the performance tests
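The talk does not show the actual decomposition settings; a minimal decomposeParDict sketch for a Scotch partitioning of a case like this could look as follows (the subdomain count is illustrative, not taken from the slides):

    // system/decomposeParDict (sketch; subdomain count is illustrative)
    FoamFile
    {
        version     2.0;
        format      ascii;
        class       dictionary;
        object      decomposeParDict;
    }

    numberOfSubdomains  64;     // one subdomain per MPI process
    method              scotch; // graph partitioning, no geometric input needed

The case would then be split with decomposePar and run in parallel with, e.g., mpirun -np 64 simpleFoam -parallel.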
Prototype car simulations: computational domain
[Figure: computational domain for the half-car model, with boundaries: inlet, outlet, moving floor, symmetry plane, top, side]
Prototype car simulations: aerodynamic results (OF vs. Fluent)
OpenFOAM® settings:
• Symmetrical prism/tetra grid (exactly the same for both codes)
• simpleFoam pressure-based solver
• Realizable k-ε for turbulence + standard wall functions
• TVD scheme for momentum convection, upwind for k/ε (see the fvSchemes sketch below)
Fluent settings:
• Symmetrical prism/tetra grid (exactly the same for both codes)
• Pressure-based solver
• Realizable k-ε for turbulence + non-equilibrium wall functions
• Second-order upwind scheme for all convective terms
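The exact dictionaries are not shown in the slides; a minimal sketch of the convection-scheme and turbulence-model entries consistent with the settings listed above (the limitedLinear limiter is an assumption, one common TVD-type choice in OpenFOAM):

    // system/fvSchemes (sketch; the specific limiter is an assumption)
    divSchemes
    {
        div(phi,U)        Gauss limitedLinear 1; // TVD-limited momentum convection
        div(phi,k)        Gauss upwind;          // first-order upwind for k
        div(phi,epsilon)  Gauss upwind;          // first-order upwind for epsilon
    }

    // constant/RASProperties (realizable k-epsilon; the wall functions
    // are selected in the k/epsilon/nut boundary fields)
    RASModel        realizableKE;
    turbulence      on;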
Prototype car simulations: aerodynamic results (OF vs. Fluent)
Aerodynamic coefficients: OpenFOAM® Cd = 0.32, CL = 0.14; Fluent Cd = 0.31, CL = 0.17
Prototype car simulations: aerodynamic results (OF vs. Fluent)
Pressure distribution around the car, y=0 (symmetry plane)
Fluent, 6000 iterations
OpenFOAM, 4500 iterations
$C_p = \dfrac{p - p_\infty}{\tfrac{1}{2}\,\rho_\infty U_\infty^2}$
Prototype car simulations: aerodynamic results (OF vs. Fluent)
Pressure distribution around the car, y = -0.4
Fluent, 6000 iterations
OpenFOAM, 4500 iterations
Prototype car simulations: aerodynamic results (OF vs. Fluent)
Pressure distribution around the car, y = -0.7
Fluent, 6000 iterations
OpenFOAM, 4500 iterations
Prototype car simulations: aerodynamic results (OF vs. Fluent)
Total pressure distribution around the car, y=0 (symmetry plane)
Fluent, 6000 iterations
OpenFOAM, 4500 iterations
$C_{p,t} = \dfrac{p_t - p_{t,\infty}}{p_{t,\infty} - p_\infty}$
Prototype car simulations: aerodynamic results (OF vs. Fluent)
Total pressure distribution around the car, y = -0.4
Fluent, 6000 iterations
OpenFOAM, 4500 iterations
Prototype car simulations: aerodynamic results (OF vs. Fluent)
Total pressure distribution around the car, y = -0.7
Fluent, 6000 iterations
OpenFOAM, 4500 iterations
Prototype car simulations: aerodynamic results (OF vs. Fluent)
Total pressure distribution around the car, z = 0.11
Fluent, 6000 iterations
OpenFOAM, 4500 iterations
Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)
Speedup, Matrix vs Jazz, PCG
[Plot: speedup vs. number of nodes (up to 20); series: Matrix PCG, Jazz PCG]
Case description:
• Symmetrical grid (~7.5×10⁶ cells)
• PCG and GAMG linear solvers on the pressure equation (see the fvSolution sketch below)
• 50 iterations monitored, starting from a fairly converged solution
• The computing node is selected as the fundamental unit
$\text{speedup} = \dfrac{(\text{time per step})_{1\ \text{node}}}{(\text{time per step})_{N\ \text{nodes}}}$
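The pressure solver under test is selected in system/fvSolution; a minimal sketch of the two variants compared here (tolerances and multigrid parameters are illustrative, not taken from the talk):

    // system/fvSolution (sketch; values are illustrative)
    solvers
    {
        p           // PCG variant: preconditioned conjugate gradient
        {
            solver          PCG;
            preconditioner  DIC;   // diagonal incomplete-Cholesky
            tolerance       1e-7;
            relTol          0.01;
        }

        // GAMG variant (would replace the entries above):
        //  solver                GAMG;
        //  smoother              GaussSeidel;
        //  nCellsInCoarsestLevel 100;
        //  agglomerator          faceAreaPair;
        //  mergeLevels           1;
    }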
Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)
Speedup, Matrix vs Jazz, GAMG
[Plot: speedup vs. number of nodes (up to 20); series: Matrix GAMG, Jazz GAMG]
Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)
Speedup, Matrix, GAMG vs PCG
[Plot: speedup vs. number of nodes (up to 32); series: Matrix PCG, Matrix GAMG]
Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)
Speedup, Jazz, GAMG vs PCG
[Plot: speedup vs. number of nodes (up to 20); series: Jazz PCG, Jazz GAMG]
Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)
Comments:
• The PCG solver clearly outperforms GAMG when the parallelization becomes extensive (approximately above 100 processes for the half-car case)
• Jazz appears to scale better than Matrix, probably because of its more capable InfiniBand network (QDR vs. DDR) and of better cache “filling” as the single processes become smaller
Prototype car simulations: absolute and single-node performance (Matrix vs. Jazz)
Time-per-step, Matrix, GAMG vs PCG
Case description:
• Symmetrical grid (~7.5×10⁶ cells)
• PCG and GAMG linear solvers on the pressure equation
• 50 iterations monitored, starting from a fairly converged solution
• Time-per-step evaluated on a per-core basis
[Plot: time per step (s) vs. number of cores (8-256); series: Matrix PCG, Matrix GAMG]
Prototype car simulations: absolute and single-node performance (Matrix vs. Jazz)
Time-per-step, Jazz, GAMG vs PCG
[Plot: time per step (s) vs. number of cores (12-192); series: Jazz PCG, Jazz GAMG]
Prototype car simulations: absolute and single-node performance (Matrix vs. Jazz)
Time-per-step, single-node, Matrix, GAMG vs PCG
[Plot: time per step (s) vs. number of cores (1-8); series: Matrix PCG, Matrix GAMG]
Prototype car simulations: absolute and single-node performance (Matrix vs. Jazz)
Time-per-step, single-node, Jazz, GAMG vs PCG
[Plot: time per step (s) vs. number of cores (1-12); series: Jazz PCG, Jazz GAMG]
Prototype car simulations: absolute and single-node performance (Matrix vs. Jazz)
Comments:
• Despite the very inefficient intra-node scaling, the newer Intel architecture is (as expected) much faster than the AMD one
• If the process count is kept in the “acceptable scaling range”, the GAMG solver is always faster than PCG (e.g. 40% faster on 64 Matrix cores)
Prototype car simulations: scalability tests (Fermi, symmetrical grid)
Speedup efficiency, 16 ppn, PCG vs GAMG
Case description:
• Symmetrical grid (~7.5×10⁶ cells)
• PCG and GAMG linear solvers on the pressure equation
• 50 iterations monitored, starting from a fairly converged solution
• 16 and 32 MPI processes per node (ppn) considered
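On the BG/Q machine the ranks-per-node setting compared in these tests is chosen at launch time; a sketch of the invocation, with an illustrative total process count, would be: runjob --ranks-per-node 16 --np 256 : simpleFoam -parallel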
[Plot: speedup efficiency (%) vs. number of nodes (2-256); series: Fermi PCG 16 PPN, Fermi GAMG 16 PPN]
$\text{s.e.}(\%) = 100 \cdot \dfrac{(\text{time per step})_{1\ \text{node}}}{N \cdot (\text{time per step})_{N\ \text{nodes}}}$
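As a worked example (numbers illustrative, not from the talk): if one node takes 60 s per step and 16 nodes take 5 s, the speedup efficiency is 100 · 60/(16 · 5) = 75%.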
Prototype car simulations: scalability tests (Fermi, symmetrical grid)
Speedup efficiency, PCG, 16 ppn vs. 32 ppn
[Plot: speedup efficiency (%) vs. number of nodes (2-64); series: Fermi PCG 16 PPN, Fermi PCG 32 PPN]
What about absolute performance?
Prototype car simulations: scalability tests (Fermi, symmetrical grid)
Time-per-step, PCG, 16 ppn vs. 32 ppn
Apparently, using more ppn could be beneficial in terms of absolute performance, but when the number of nodes reaches a “practical” value (64) the benefit vanishes, and in addition…
[Plot: time per step (s) vs. number of nodes (2-64); series: Fermi PCG 16 PPN, Fermi PCG 32 PPN]
Prototype car simulations: I/O performance tests (Fermi, symmetrical grid)
Output generation time, PCG, 16 ppn vs. 32 ppn
Case description:
• Symmetrical grid (~7.5×10⁶ cells)
• PCG linear solver on pressure
• Output generation time and initialization time monitored
• 16 and 32 MPI processes per node considered
[Plot: output generation time (s) vs. number of nodes (4-128); series: Fermi PCG 16 PPN, Fermi PCG 32 PPN]
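OpenFOAM’s standard decomposed I/O writes a separate processorN directory for every MPI rank at each output time, so the file count grows with the process count; the write frequency and format are set in system/controlDict. A minimal sketch with illustrative values (not taken from the talk):

    // system/controlDict output entries (sketch; values are illustrative)
    writeControl     timeStep;
    writeInterval    50;          // write fields every 50 SIMPLE iterations
    writeFormat      binary;      // binary is more compact than ascii
    writeCompression uncompressed;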
Prototype car simulations: I/O performance tests (Fermi, symmetrical grid)
Initialization time, PCG, 16 ppn vs. 32 ppn
[Plot: initialization time (s) vs. number of nodes (4-128); series: Fermi PCG 16 PPN, Fermi PCG 32 PPN]
Prototype car simulations: comments about Fermi runs (symmetrical grid)
Comments:
• The case is of course too small to prove Fermi’s real potential, but…
• …up to the minimum “practical” node number (64) the SIMPLE iteration scaling is acceptable (PCG)
• …when the I/O capability of the nodes gets saturated, a dramatic drop in I/O efficiency occurs (and things get even worse with 32 ppn)
Further simulations on Fermi: doubled grid
Time-per-step, PCG, symm. vs. doubled
Case description:
• Doubled grid (~15×10⁶ cells)
• PCG solver on the pressure equation
• Only 16 ppn considered
• Comparison made assuming the same mesh-per-node load distribution (i.e. doubling the number of nodes for the bigger grid)
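In other words, each point pairs the symmetrical grid on N nodes with the doubled grid on 2N nodes (e.g. 32 vs. 64), keeping the cells-per-node load roughly constant, which is the usual weak-scaling setup.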
[Plot: time per step (s) vs. number of nodes (symm-double: 32-64, 64-128, 128-256); series: Fermi PCG symm, Fermi PCG double]
Further simulations on Fermi: doubled grid
Output generation time, PCG, symm. vs. doubled
[Plot: output generation time (s) vs. number of nodes (symm-double: 32-64, 64-128, 128-256); series: Fermi PCG symm, Fermi PCG double]
Further simulations on Fermi: doubled grid
Initialization time, PCG, symm. vs. doubled
[Plot: initialization time (s) vs. number of nodes (symm-double: 32-64, 64-128, 128-256); series: Fermi PCG symm, Fermi PCG double]
Further simulations on Fermi: doubled grid
Comments:
• The SIMPLE iteration weak-scaling performance appears fairly good and should thus encourage more tests on bigger cases, but…
• …the I/O issues are confirmed
Conclusions (1)
• Hi-ZEV is a successful example of how industry can take advantage of the combination of parallelized open-source CFD toolkits and highly qualified HPC infrastructures, within a collaborative project framework
• The OpenFOAM® code has been evaluated on “conventional” AMD and Intel HPC facilities for external aerodynamics applications, showing:
– Good accuracy compared to well-established commercial CFD codes;
– Interesting parallel performance (still not fully exploited), at least for small/medium-size cases (~10⁷ cells) and depending on the optimal pressure solver choice (PCG scales better, GAMG is faster for small process counts)
Conclusions (2)
• The OpenFOAM® performance has also been assessed on the BG/Q supercomputer Fermi and, in spite of the (relatively) small size of the considered cases, the following remarks can be drawn:
– The solver iteration scaling performance is promising (with PCG), especially in the perspective of coping with much bigger problems;
– Though for the considered cases a more conventional architecture (e.g. Intel Xeon) seems to be a better choice, a deeper investigation should be made in order to also include performance vs. energy consumption aspects;
– Unfortunately, for massively parallel applications (thousands of processes) a dramatic I/O efficiency issue arises (further evaluation needed)
Acknowledgements
M. Testa (Nu.m.i.d.i.a. S.r.l.), for providing the half-car grid and the Fluent results
Workshop “HPC enabling of OpenFOAM® for CFD applications”
26-28 November, CINECA, Casalecchio di Reno (BO), Italy