Copyright, HiPERiSM Consulting, LLC, http://www.hiperism.com
George Delic, Ph.D.
HiPERiSM Consulting, LLC
(919)484-9803
P.O. Box 569, Chapel Hill, NC 27514
[email protected]
http://www.hiperism.com
Dec 15, 2015
CHOOSING A COMPILER FOR AQM APPLICATIONS ON LINUX
George Delic, Ph.D.
Models-3 User’s Workshop
October 27-29, 2003
RTP, NC
Overview
1. Introduction
2. Choice of Hardware
3. Choice of Compilers
4. Choice of Benchmarks
5. Comparing Execution Times
6. Evaluation of SSE Results
7. Tests for AQM’s
8. Conclusions
Introduction
Motivation
• AQM’s are migrating to COTS hardware
• Linux is preferred
• Rich choice of compilers is now available
• Need to learn about portability issues
What is known about compilers for IA-32?
• CMAQ releases switch compilers w/o comment
• Where is the analysis of differences in:
  • Performance?
  • Numerical accuracy & stability?
  • Portability problems?
Choice of Hardware & Compilers
Hardware
• Intel Pentium III (933 MHz, dual processor) with SSE extensions and 256 KB L2 cache
• Linux 2.4.20 kernel
Fortran compilers for IA-32
• Absoft 8.0
• Intel 7.1
• Lahey 5.6
• Portland CDK 4.0
Choice of Benchmarks
Kallman Integer and Logical Algorithm
• Uses only integer (I) and logical (L) operations with bit intrinsics
• Negligible I/O and memory operations
• Six cases with problem size scaling
Stommel Ocean Model (SOM): single-precision (sp) floating-point algorithm
• Jacobi iteration sweep over a 2-D physical domain
• Regular loops, optimal for testing vectorization
• Six cases in the range N = 2 x 10^3 to 7 x 10^3, with N^2 = 4 to 49 million data points
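To illustrate the kind of regular loop nest that a Jacobi sweep produces, here is a minimal Python sketch. It is an assumption for illustration only: it uses the generic 5-point averaging stencil on a small grid, not the Stommel model's actual equations, and the names `jacobi_sweep` and `jacobi` are hypothetical.

```python
def jacobi_sweep(u):
    """Return a new grid after one Jacobi sweep of a 5-point averaging stencil.

    Boundary values are held fixed. Every interior point becomes the
    average of its four neighbours in the OLD grid, so no iteration of the
    inner loop depends on another: the property that makes such loops
    good candidates for vectorization.
    """
    n_rows, n_cols = len(u), len(u[0])
    new = [row[:] for row in u]  # copy, so reads always see the old values
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            new[i][j] = 0.25 * (
                u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
            )
    return new


def jacobi(u, sweeps):
    """Apply a fixed number of Jacobi sweeps (illustrative stopping rule)."""
    for _ in range(sweeps):
        u = jacobi_sweep(u)
    return u
```

An N x N grid touches N^2 points per sweep, which is why the data volume in the benchmark grows as N^2.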
Choice of Benchmarks (cont.)
Princeton Ocean Model (POM): double-precision (dp) floating-point algorithm
• Example of “real-world” code that is numerically unstable with sp arithmetic!
• 500+ vectorizable loops to exercise compilers
• 9 procedures account for 85% of CPU time
• 2-day simulation for two cases:
  • Small problem: 65 x 49 x 21
  • Large problem: 100 x 40 x 15
Comparing Execution Times: Kallman compiler switches
Compiler and version    Compiler command and selected switches
Absoft 8.0              f90 -O3 -ffixed
Intel 7.1               ifc -O3 -tpp6 -FI
Lahey 5.6               lf95 -tpp -fix
Portland 4.0            pgf90 -fast
Comparing Execution Times: Kallman (seconds)
N Absoft Intel Lahey Portland
30 0.21 0.36 0.48 0.60
44 40.38 80.19 98.45 135.29
48 6.44 13.15 16.16 22.52
52 23.03 48.20 59.30 83.28
56 197.78 412.83 509.31 712.42
60 12891.58 26734.09 32833.08 45451.38
Comparing Execution Times: Kallman (log10 seconds)
[Figure: Kallman Integer & Logical Algorithm (PIII 933 MHz). Log10 of wall time (seconds) versus case (1 to 6) for Absoft, Intel, Lahey, and Portland.]
Comparing Execution Times: Kallman (ratio to Absoft time)
[Figure: Kallman Integer & Logical Algorithm (PIII 933 MHz). Ratio to Absoft time versus case (1 to 6) for Intel / Absoft, Lahey / Absoft, and Portland / Absoft.]
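The ratio to Absoft time can be recomputed from the Kallman timing table (seconds). The following is a minimal Python sketch; the names `KALLMAN_SECONDS` and `ratios_to_baseline` are hypothetical and were chosen here for illustration.

```python
# Wall times in seconds, copied from the Kallman timing table.
KALLMAN_SECONDS = {
    30: {"Absoft": 0.21, "Intel": 0.36, "Lahey": 0.48, "Portland": 0.60},
    44: {"Absoft": 40.38, "Intel": 80.19, "Lahey": 98.45, "Portland": 135.29},
    48: {"Absoft": 6.44, "Intel": 13.15, "Lahey": 16.16, "Portland": 22.52},
    52: {"Absoft": 23.03, "Intel": 48.20, "Lahey": 59.30, "Portland": 83.28},
    56: {"Absoft": 197.78, "Intel": 412.83, "Lahey": 509.31, "Portland": 712.42},
    60: {"Absoft": 12891.58, "Intel": 26734.09, "Lahey": 32833.08, "Portland": 45451.38},
}


def ratios_to_baseline(times, baseline="Absoft"):
    """Divide each compiler's time by the baseline compiler's time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items() if name != baseline}


if __name__ == "__main__":
    for n, times in KALLMAN_SECONDS.items():
        ratios = ratios_to_baseline(times)
        print(n, {name: round(r, 2) for name, r in ratios.items()})
```

For the largest case (N = 60) this gives roughly 2.1 for Intel, 2.5 for Lahey, and 3.5 for Portland relative to Absoft.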
Comparing Execution Times: SOM (POM) compiler switches (without SSE)
Compiler and version    Compiler command and selected switches
Absoft 8.0              f90 -s -cpu:p6 -O3 (-N113) -ffixed
Intel 7.1               ifc -O3 (-r8) -tpp6 -FI
Lahey 5.6               lf95 -tpp (-dbl) -fix
Portland 4.0            pgf90 -fast (-r8) -Mvect
Switches in parentheses are added for POM.
Comparing Execution Times: SOM without SSE (seconds)
N Absoft Intel Lahey Portland
2000 50.0 38.8 36.4 41.4
3000 110.5 94.4 87.7 92.7
4000 197.7 159.6 150.3 163.3
5000 305.3 224.3 246.8 253.1
6000 443.4 320.0 332.0 388.5
7000 586.5 427.6 477.9 524.4
Comparing Execution Times: SOM (without SSE)
[Figure: SOM Floating Point Algorithm (PIII 933 MHz). Wall time (seconds) versus case (1 to 6) for Absoft, Intel, Lahey, and Portland.]
Statistics for four compilers: SOM (without SSE)
[Figure: SOM Floating Point Algorithm (PIII 933 MHz): statistics for four compilers. Mean, standard deviation, and coefficient of variation x 1000 versus case (1 to 6); vertical axis labeled wall time (seconds).]
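The three statistics can be recomputed from the SOM timing table (without SSE). The sketch below is illustrative: `SOM_SECONDS` and `compiler_spread` are hypothetical names, and it assumes the sample standard deviation (n - 1 divisor), since the slides do not say which form was used.

```python
from statistics import mean, stdev

# Wall times in seconds from the SOM table (without SSE),
# in the order Absoft, Intel, Lahey, Portland.
SOM_SECONDS = {
    2000: [50.0, 38.8, 36.4, 41.4],
    3000: [110.5, 94.4, 87.7, 92.7],
    4000: [197.7, 159.6, 150.3, 163.3],
    5000: [305.3, 224.3, 246.8, 253.1],
    6000: [443.4, 320.0, 332.0, 388.5],
    7000: [586.5, 427.6, 477.9, 524.4],
}


def compiler_spread(times):
    """Return (mean, standard deviation, coefficient of variation)."""
    m = mean(times)
    s = stdev(times)  # ASSUMPTION: sample form (n - 1); slides do not specify
    return m, s, s / m


if __name__ == "__main__":
    for n, times in SOM_SECONDS.items():
        m, s, cv = compiler_spread(times)
        print(f"N={n}: mean={m:.1f} s, std={s:.1f} s, CV x 1000={1000 * cv:.0f}")
```

The coefficient of variation is dimensionless, which is presumably why the figure scales it by 1000 to share an axis with the times.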
Comparing Execution Times: POM (without SSE)
[Figure: POM Floating Point Algorithm (PIII 933 MHz). Wall time (seconds) versus case (1 and 2) for Absoft, Intel, Lahey, and Portland.]
Case Absoft Intel Lahey Portland
1 909.1 826.4 728.8 836.3
2 825.1 786.9 671.2 755.3
Statistics for four compilers: Variability vs. problem size
[Figure: Coefficient of variation for four compilers (PIII 933 MHz, without SSE). Standard deviation / mean versus case (1 to 6) for Kallman, SOM, and POM.]
Evaluation of SSE Results
IA-32 Hardware
• Intel Pentium III and later support Streaming Single-Instruction-Multiple-Data Extensions (SSE)
• Linux 2.4.20 kernel supports SSE
Fortran compilers that enable SSE
• Intel 7.1
• Portland CDK 4.0
Comparing Execution Times: SOM (POM) compiler switches (with SSE)
Compiler and version    Compiler command and selected switches
Intel 7.1               ifc -O3 -xK (-r8) -tpp6 -FI
Portland 4.0            pgf90 -fast (-r8) -Mvect=sse
Switches in parentheses are added for POM.
Comparing Execution Times: SOM (with SSE)
[Figure: SOM Floating Point Algorithm (PIII 933 MHz). Wall time (seconds) versus case (1 to 6) for Intel, Intel (SSE), Portland, and Portland (SSE).]
Comparing Execution Times: POM (with SSE)
[Figure: POM Floating Point Algorithm (PIII 933 MHz). Wall time (seconds) versus case (1 and 2) for Intel, Intel (SSE), Portland, and Portland (SSE).]
Evaluation of SSE Results
Fortran compilers with SOM (sp)
• Intel 7.1: average speedup of 1.44
• Portland CDK 4.0: average speedup of 1.70
Fortran compilers with POM (dp)
• Intel 7.1: average speedup of 1.25
• Portland CDK 4.0: average speedup of 1.19
Tests for AQM’s
Next steps for CMAQ with four compilers:
• Report on portability issues
• Re-compilation of all libraries
• Performance instrumentation & analysis
• Numerical & stability analysis
• OpenMP performance study
Please propose scenarios that would be worthwhile for these tests!
Conclusions
• Hardware: COTS is the way to go but ...
• Linux: Operating System is popular but ...
• Programming Environment: rich in choices
• Consequences for AQM: the combination of hardware, Linux, and programming environment needs careful on-going evaluation.
HiPERiSM is ready for this task!
HiPERiSM’s URL
http://www.hiperism.com
Talk to us about your requirements