© 2011 ANSYS, Inc. May 27, 2016
Introduction to High-Performance Computing with ANSYS CFD
李龍育 (Dragon)
CFD Technical Manager
虎門科技 (CADMEN)
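Taiwan Auto-Design Co.
虎門科技 (CADMEN), founded in 1980, provides customers with ANSYS, the world's finest engineering analysis software, and technical services
• Structural strength analysis
ANSYS Mechanical
• Drop-test analysis
ANSYS LS-DYNA
• Thermal and thermal-flow analysis
ANSYS FLUENT, ICEPAK, CFX
• Electromagnetic field analysis
ANSYS Emag, Maxwell
• Coupled multiphysics analysis
Provider of Engineering Solutions and Methodology
• Headquarters: Banqiao District, New Taipei City
• Taichung branch
• Tainan branch
虎門科技 CADMEN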
Wave Front Co., Ltd.
Applications
• Rarefied gas flow analysis
• CVD, Plasma
• Magnetron sputtering
• Evaporation deposition
• Dry etching
Software
• DSMC-Neutrals
• Particle-Plus
TADC: Success Stories
• Semiconductor
• Electronics
• Equipment
• Green Energy
• Chemical Engineering
About ANSYS Advanced Physics Solvers
• Fluid Dynamics: ANSYS FLUENT, ANSYS CFX, ANSYS Icepak
• Structural Mechanics: ANSYS Mechanical, ANSYS LS-DYNA, ANSYS nCode, ANSYS Acoustics
• Electromagnetics: ANSYS HFSS, ANSYS Maxwell, ANSYS Q3D
• Systems and Multiphysics: ANSYS Simplorer, ANSYS Engineering Knowledge Manager, ANSYS HPC, ANSYS Workbench, ANSYS DesignXplorer
ANSYS Vision for the Future from the Beginning
• Predict complex product performance under real-world conditions
• Simulate complete virtual prototypes
Comprehensive Multiphysics
• Structural Mechanics
• Fluid Dynamics
• Electromagnetics
• Systems and Embedded Software
Aero-Vibro-Acoustics Coupling
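Requirements for Motor Analysis
• Modern motor applications demand high efficiency, while concerns about heat dissipation, structural strength, vibration, and noise keep growing. These physical quantities influence one another, so solving the different physical domains together gives a more complete assessment of a motor design.
• ANSYS provides strong capabilities for electric machine design and analysis:
– Electromagnetic performance
– Electric drive performance
– Structural analysis
– Thermal-flow analysis
– Acoustic analysis
• ANSYS coupling technology can transfer electromagnetic forces into multiphysics analyses
[Figures: coil temperature distribution; vibration and noise; core loss]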
Why Is HPC Important?
Faster
• Reduce turnaround time
• Consider more design variants
Larger
• Assess larger, more detailed models
• Consider more complex physics
• From single component to system simulations
Easier
• Put powerful computation resources at users’ fingertips
• Efficient decisions earlier in the product development cycle
[Chart: time steps done in one day with 16.0.0, oil_rig_7m, Intel Haswell]
Cores        28     56     112     224     448
Time steps   2504   4944   10053   19073   31476
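The chart values can be converted into speed-up and parallel efficiency relative to the smallest run. The following is a minimal Python sketch, assuming the five throughput values pair in order with the five core counts (that pairing is read off the chart, not stated on the slide). The names `scaling_table`, `cores`, and `steps_per_day` are illustrative only.

```python
# Time steps completed per day for the oil_rig_7m case (values from the chart).
# Assumption: the five values pair in order with the five core counts.
cores = [28, 56, 112, 224, 448]
steps_per_day = [2504, 4944, 10053, 19073, 31476]


def scaling_table(cores, throughput):
    """Return (cores, speedup, efficiency) tuples relative to the first entry."""
    base_cores, base_rate = cores[0], throughput[0]
    rows = []
    for n, rate in zip(cores, throughput):
        speedup = rate / base_rate   # throughput ratio vs. the baseline run
        ideal = n / base_cores       # what perfect linear scaling would give
        rows.append((n, speedup, speedup / ideal))
    return rows


table = scaling_table(cores, steps_per_day)
for n, s, e in table:
    print(f"{n:4d} cores: speed-up {s:5.2f}x, efficiency {e:6.1%}")
```

Under this pairing, the 448-core run reaches roughly 79% of ideal scaling relative to the 28-core run.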
Improved Solver Performance & Scaling
Improved parallel scalability:
• Also observed for more typical model sizes at more regular core counts!
• At extreme core counts:
o 86% efficiency for an 830M-cell case at 36K cores, with species transport
o 80% efficiency for a 91M-cell case at 16K cores, with complex physics
Improved Parallel Performance & Scaling
Case Details:
• Wave loading on oil rig
• Number of cells: 7 million
• Cell type: mixed
• Models used: SST k-omega turbulence
• Solver: pressure-based segregated, VOF, Green-Gauss cell-based, unsteady
[Chart: rating (0 to 4,000) vs. number of cores (0 to 768), releases 15.0.7 and 16.0.0]
Improved Parallel Performance & Scaling
Case Details:
• External flow over aircraft landing gear
• Number of cells: 15 million
• Cell type: mixed
• Models used: LES + acoustics
• Solver: pressure-based coupled, least-squares cell-based, unsteady
[Chart: rating (0 to 4000) vs. number of cores (0 to 2048), releases 15.0.7 and 16.0.0]
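Application Example
HPC performance example: transient VOF nozzle flow simulation
• Nozzle flow analysis
• Mesh size: 13.6M cells
• Model used: VOF
• After the transient solution stabilized, 10 iterations were run and the average time per iteration was recorded
Computation time on the CADMEN server
Item                              Run 1      Run 2
Number of cells                   13.6M      13.6M
Compute cores                     1          30
CPU clock                         2.6 GHz    2.6 GHz
Memory capacity                   256 GB     256 GB
Memory clock                      1600 MHz   1600 MHz
Time per time step (min)          192.1      4.6
Time for 500 time steps (days)    66.7       1.6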
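The last two table rows follow directly from the measured per-step times. A minimal Python sketch of that arithmetic is given below. The helper `wall_time_days` and the variable names are illustrative, not part of any ANSYS tool.

```python
# Measured minutes per time step from the table (cores -> minutes).
minutes_per_step = {1: 192.1, 30: 4.6}


def wall_time_days(minutes, n_steps):
    """Convert a per-step cost in minutes into total wall time in days."""
    return minutes * n_steps / (60 * 24)


speedup = minutes_per_step[1] / minutes_per_step[30]
efficiency = speedup / 30  # speed-up divided by the core-count ratio

days_1 = wall_time_days(minutes_per_step[1], 500)
days_30 = wall_time_days(minutes_per_step[30], 500)

print(f"speed-up on 30 cores: {speedup:.1f}x (efficiency {efficiency:.2f})")
print(f"500 time steps: {days_1:.1f} days on 1 core, {days_30:.1f} days on 30 cores")
```

The measured speed-up is about 41.8x on 30 cores, which is above linear. Super-linear results of this kind are often attributed to cache and memory-bandwidth effects, although the slide does not state a cause.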
4 Main Products
HPC (per-process)
HPC Pack
• HPC product rewarding volume parallel processing for high-fidelity simulations
• Each simulation consumes one or more Packs
• Parallel enabled increases quickly with added Packs
HPC Workgroup
• HPC product rewarding volume parallel processing for increased simulation throughput, shared among engineers at a single location or around the world
• 16 to 32768 parallel shared across any number of simulations on a single server
HPC Parametric Pack
• Enables simultaneous execution of multiple design points while consuming just one set of licenses
[Chart: parallel enabled (cores) vs. HPC Packs per simulation]
HPC Packs per simulation    1   2    3     4     5      6      7
Parallel enabled (cores)    8   32   128   512   2048   8192   32768
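The chart values grow by a factor of four per added Pack. The sketch below assumes the closed form 2 * 4^n, which is fitted to the seven chart values and is not an official licensing rule. The function names `cores_enabled` and `packs_needed` are hypothetical.

```python
def cores_enabled(packs: int) -> int:
    """Cores enabled per simulation for a given number of HPC Packs.

    Assumption: 2 * 4**packs, a formula fitted to the chart values
    (8, 32, 128, 512, 2048, 8192, 32768 for 1 to 7 Packs).
    """
    if packs < 1:
        raise ValueError("at least one HPC Pack is required")
    return 2 * 4 ** packs


def packs_needed(cores: int) -> int:
    """Smallest number of Packs whose enabled core count covers `cores`."""
    packs = 1
    while cores_enabled(packs) < cores:
        packs += 1
    return packs


for n in range(1, 8):
    print(f"{n} pack(s): {cores_enabled(n)} cores")
```

For example, under this assumption a 30-core run such as the nozzle case would need `packs_needed(30)`, which evaluates to 2 Packs.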
Deliver Outstanding Parallel Performance & Scaling at an Ever-Increasing Scale of Parallelism
ANSYS Features & Capabilities
● A continuous software development focus on HPC, with parallel improvements made release by release, including R17.0.
● ANSYS solvers are highly optimized to run fast and deliver outstanding parallel scaling at an increasing scale of parallelism!
● As we are committed to taking simulations to new levels of software scalability, we maintain strong technology partnerships with hardware vendors and supercomputer centres (e.g. NCSA, HLRS).
Customer Benefits
● As HPC evolves, ANSYS is the right choice to sustain the software investment required to stay ahead.
● And to ensure that your ROI in HPC resources is maximized, now and into the future!
● Reduced time to solution for your current models by leveraging more cores.
● Be less constrained by hardware limitations, because ‘bigger’ models can be sped up on your existing compute capacity.
Supercomputing Milestone
Improved Parallel Performance & Scaling
ANSYS Fluent 17.0
ANSYS Application Example
Big speed-ups for moving dynamic mesh due to:
• Neighborhood optimization
• Sliding interface optimization
• Parallel solver optimization
• Combustion code refactoring
In-Cylinder Combustion Model:
• 55% faster at 384 cores
• 7 cell zones, MDM, Spray, Partially premixed, 1.6 million cells
GPU-accelerated ANSYS products
Fluent®, Mechanical™, Nexxim™, HFSS™
GPU Acceleration of Water Jacket Analysis
Water jacket model
• Unsteady RANS model
• Fluid: water
• Internal flow
• CPU: Intel Xeon E5-2680; 8 cores
• GPU: 2 x Tesla K40
ANSYS Fluent 15.0 performance on pressure-based coupled solver
ANSYS Fluent time (sec) for 20 time steps (lower is better):
                    CPU only   CPU + GPU   Speed-up
AMG solver time     4557       775         5.9x
Solution time       6391       2520        2.5x
GPU Scaling on 111M Aerodynamic Problem
Truck Body Model
• 111M mixed cells
• External aerodynamics
• Steady, k-e turbulence
• Double-precision solver
• CPU: Intel Xeon E5-2667; 12 cores per node
• GPU: Tesla K40, 4 per node
Time per iteration in seconds (lower is better):
                         144 CPU cores (Amg)   144 CPU cores + 48 GPUs (AmgX)   Speed-up
AMG solver time          29                    11                               2.7x
Fluent solution time     36                    18                               2x
Note: AmgX is a GPU solver developed by NVIDIA and is implemented by ANSYS in Fluent for accelerating CFD
Better performance on problems with relatively high %AMG solver time (this case: 80% AMG solver time)
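The link between the AMG share of run time and the overall gain can be illustrated with Amdahl's law. The sketch below assumes only the AMG portion is accelerated by the GPU and everything else stays at CPU speed. That is a simplifying model, not a description of how Fluent schedules work, and `overall_speedup` is an illustrative name.

```python
def overall_speedup(amg_fraction, amg_speedup):
    """Amdahl's law: only the AMG share of the run time is accelerated."""
    return 1.0 / ((1.0 - amg_fraction) + amg_fraction / amg_speedup)


# Truck body numbers from this slide (seconds per iteration).
amg_cpu, amg_gpu = 29.0, 11.0
total_cpu = 36.0

fraction = amg_cpu / total_cpu   # share of time spent in AMG on CPU only
amg_gain = amg_cpu / amg_gpu     # speed-up of the AMG part alone
predicted = overall_speedup(fraction, amg_gain)
predicted_time = total_cpu / predicted

print(f"AMG share {fraction:.0%}, predicted overall speed-up {predicted:.2f}x")

# Same AMG speed-up applied to cases with a smaller or larger AMG share.
for f in (0.50, 0.65, 0.80, 0.95):
    print(f"AMG share {f:.0%}: overall speed-up {overall_speedup(f, amg_gain):.2f}x")
```

With the slide's numbers the model predicts 18 seconds per iteration, matching the reported Fluent solution time, and the sweep shows the overall gain shrinking as the AMG share drops.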
GPU Performance: Problem Size
Truck Body Model
• External aerodynamics
• Steady, k-e turbulence
• Double-precision solver
• CPU: Intel Xeon E5-2667; 12 cores per node
• GPU: Tesla K40, 4 per node
ANSYS Fluent time (sec) per iteration (lower is better):
Problem size        CPU only              CPU + GPU                         Speed-up
14 million cells    13 (36 CPU cores)     9.5 (36 CPU cores + 12 GPUs)      1.4x
111 million cells   36 (144 CPU cores)    18 (144 CPU cores + 48 GPUs)      2x
Better speed-ups on larger and harder-to-solve problems
Parametric Solutions, Inc. Proprietary
ANSYS FLUENT Accelerated with NVIDIA GPU
ANSYS 15.0
1.8M cells
8 Cores (Xeon E5-2687)
64 GB RAM
NVIDIA K40 GPU
Solution time with GPU: ~1 hr
Solution time without GPU: ~1.75 hrs
GPU Produces a 43% Reduction in Solution Time
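A percentage reduction in solution time and a speed-up factor describe the same measurement in two ways. The Python sketch below converts between them using the approximate times on this slide. The function names are illustrative.

```python
def reduction_from_times(t_before, t_after):
    """Fractional reduction in solution time."""
    return 1.0 - t_after / t_before


def speedup_from_reduction(reduction):
    """Speed-up factor equivalent to a fractional time reduction."""
    return 1.0 / (1.0 - reduction)


t_cpu_hours = 1.75   # approximate time without GPU, from the slide
t_gpu_hours = 1.0    # approximate time with GPU, from the slide

reduction = reduction_from_times(t_cpu_hours, t_gpu_hours)
speedup = speedup_from_reduction(reduction)

print(f"time reduction: {reduction:.0%}, equivalent speed-up: {speedup:.2f}x")
```

The 43% reduction quoted on the slide therefore corresponds to a speed-up of about 1.75x, which makes it comparable with the "x" factors reported on the other GPU slides.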
Particle-PLUS HPC Performance
• Parallel performance of PIC computation with Particle-PLUS: execution time and speed-up for 1, 2, 4, 8, 16 and 24 processes, where "speed-up" means computation speed relative to a single-process run.
• The graph is plotted using the following data:
number_of_processor   execution_time[hour]   speed_up
1                     76.1500                1.00
2                     39.4872                1.93
4                     21.6081                3.52
8                     11.7622                6.47
16                    6.40496                11.89
24                    6.03144                12.63
• Execution times were measured under the following conditions:
- number of super particles (effective): about 5E7
- electron density: about 5E16
- sampling time steps: 5000
- number of grids: 7200
- no output of results
- execution on a Unix PC cluster
Note that parallel performance depends strongly on the number of super particles that are used effectively. If that number is small, parallel performance degrades. For example, with about 5E5 super particles, the speed-up with 24 parallel processes is only 3.1 times that of a single process.
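The measured times can be examined further with parallel efficiency and the Karp-Flatt metric, which estimates the serial fraction from a measured speed-up. The sketch below is illustrative only. It uses the execution times listed on this slide, and the choice of the Karp-Flatt metric is an addition here, not something the slide itself uses.

```python
# Execution times in hours from the slide, indexed by process count.
procs = [1, 2, 4, 8, 16, 24]
hours = [76.1500, 39.4872, 21.6081, 11.7622, 6.40496, 6.03144]


def karp_flatt(speedup, p):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


results = []
for p, t in zip(procs, hours):
    s = hours[0] / t                          # speed-up vs. single process
    eff = s / p                               # parallel efficiency
    e = karp_flatt(s, p) if p > 1 else None   # undefined for one process
    results.append((p, s, eff, e))

for p, s, eff, e in results:
    serial = "n/a" if e is None else f"{e:.3f}"
    print(f"{p:2d} procs: speed-up {s:5.2f}, efficiency {eff:5.1%}, serial fraction {serial}")
```

Efficiency falls from about 74% at 16 processes to about 53% at 24, and the estimated serial fraction rises over the same step. That pattern suggests growing parallel overhead rather than a fixed serial portion, though the slide gives no breakdown to confirm it.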
[Diagram labels: CAD Import, Meshing, Workflow, Design Points, Fluids, Thermal, Emag, Structural, Post-process, EnSight]
Thank you for your attention!