Documentation and application notes
– IA-32 Intel® Architecture Software Developer’s Manual
– Intel Pentium® 4 and Intel Xeon™ Processor Optimization Manual
– Intel App Note AP485 - “Intel Processor Identification and the CPUID Instruction”
– Intel App Note AP 949 “Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor”
– Intel App Note “Detecting Support for Jackson Technology
C, C++ and Fortran95
– Available on Windows* and Linux*
– Available for 32-bit and 64-bit platforms
Utilization of latest processor/platform features
– Optimizations for NetBurst™ architecture (Pentium® 4 and Xeon™ processor)
– Optimizations for Itanium® architecture
Seamless integration into Windows* (IDE) and Linux* environment
Source and binary compatible with Microsoft* compiler; mostly source compatible with GNU (gcc)
(D850MD 850 motherboard) Chipset, 256 MB Memory, Windows* XP Professional Edition (build 2600), GeForce 3/nVidia* Graphics
[Figure: SPECfp_base2000 (geomean of Fortran), CVF* 6.6 vs. Intel® Fortran Compiler 6.0. Intel® Fortran Compiler 6.0: 28% faster floating-point performance.]
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. Users’ results are dependent upon the application characteristics (loopy vs. flat), mix of C and C++, and other factors. For more information on performance tests and on the performance of Intel products, reference [www.intel.com] or call (U.S.) 1-800-628-8686 or 1-916-356-3104.
[Figure: SPECint_base2000, leading C++ compiler (703) vs. Intel® C++ Compiler 6.0 (825). Intel® C++ Compiler 6.0: 17% faster integer performance. Geomean of Fortran: 686 (CVF* 6.6) vs. 881 (Intel® Fortran Compiler 6.0).]
Intel SW Development Tools – Compilers
Intel® C++ Compiler 6.0 for Linux*
PovRay Image Rendering Time
Special Performance Features
Auto-Vectorization for NetBurst™ architecture
Software-Pipelining for EPIC architecture
Auto-Parallelization and OpenMP based parallelization
– for Hyper-Threading and multi-processor systems
Data Pre-Fetching
Profile-Guided Optimization (PGO)
Inter-procedural Optimization (IPO)
CPU Dispatch
– Establishes code path at runtime dependent on actual processor type
– Allows single binary with optimal performance across processor families
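The CPU dispatch idea above can be sketched by hand as a function pointer chosen once at run time. This is only an illustration of the concept, not how the Intel compiler implements it: the kernel names are hypothetical, and the feature check uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which is an assumption about the toolchain (other compilers fall back to the generic path).

```c
#include <assert.h>
#include <stddef.h>

/* Generic code path: plain loop that runs on any processor. */
static float sum_generic(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s;
}

/* Hypothetical stand-in for a processor-specific path: four independent
   accumulators, a shape that maps well onto SIMD registers. */
static float sum_unrolled(const float *x, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]; s1 += x[i + 1]; s2 += x[i + 2]; s3 += x[i + 3];
    }
    for (; i < n; i++) s0 += x[i];   /* remainder elements */
    return (s0 + s1) + (s2 + s3);
}

typedef float (*sum_fn)(const float *, size_t);

/* Pick a code path based on the processor the binary is running on. */
static sum_fn select_sum(void) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();                       /* GCC/Clang builtin (assumed toolchain) */
    if (__builtin_cpu_supports("sse2")) return sum_unrolled;
#endif
    return sum_generic;
}

/* Single entry point; the choice is made on the first call and cached.
   Note: this lazy initialization is not thread-safe. */
float dispatched_sum(const float *x, size_t n) {
    static sum_fn impl = NULL;
    assert(x != NULL || n == 0);
    if (impl == NULL) impl = select_sum();
    return impl(x, n);
}
```

Callers only ever see `dispatched_sum`, so one binary can carry several code paths. Because the two paths add in a different order, floating-point results can differ in the last bits for general input.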
Techniques Overview
Exploit parallelism to speed up application
Vectorization
– Supported by programming languages and compilers
– Motivated by modern architectures
  Superscalarity, deeply pipelined core
  SIMD
  Software pipelining on Itanium™ architecture
Parallelization
– OpenMP™ directives for shared memory multiprocessor systems
– MPI computations for clusters
Features by Intel Compilers
Intel processors and vectorization
Pentium® with MMX™ technology, Pentium® II processors: integer types, 64 bits
Pentium® III processor: Streaming SIMD Extensions (SSE), single precision floating point
Pentium® 4 processor
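As a rough illustration of what an auto-vectorizer looks for, the sketch below contrasts a loop with independent, unit-stride iterations against one with a loop-carried dependence. The function names are hypothetical, and the C99 `restrict` qualifiers are an assumption added here to tell the compiler the arrays do not overlap. Whether a given compiler actually vectorizes the first loop depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* Candidate for vectorization: each iteration touches only element i,
   accesses are unit-stride, and restrict rules out overlap of x and y. */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y) {
    assert((x != NULL && y != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}

/* Poor candidate as written: iteration i reads the value written by
   iteration i-1, so the iterations cannot simply run side by side. */
void prefix_sum(size_t n, float *a) {
    assert(a != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

With single precision data and 128-bit SSE registers, a vectorized `saxpy` would process four elements per instruction.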
foo.c(7) : (col. 2) remark: LOOP WAS AUTO-PARALLELIZED....
./foo.exe -- Executable detects and uses number of processors…
-Qpar_report[n] - get helpful messages from the compiler
Features by Intel Compilers - Parallelization
OpenMP™ Directives
OpenMP* standard (www.openmp.org)
– Set of directives to enable the writing of multithreaded programs
Use of shared memory parallelism on programming language level
– Portability
– Performance
Support by Intel® Compilers
– Windows*, Linux*
– IA-32 and Itanium™ architectures
Simple Directives
foo(float *a, float *b, float *c)
{
  int i;
#pragma parallel
  for (i=0; i<N; i++) {
    *c++ = (*a++)*bar(b++);
  };
}
Pointers and procedure calls with escaped pointers prevent analysis for autoparallelization
Use simple directives instead
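For comparison, here is a sketch of the same loop written with a standard OpenMP directive rather than the compiler-specific `#pragma parallel`. This is a substitution made for illustration, not the slide's method. The body of `bar` is a hypothetical stand-in, and the loop is rewritten with array indexing because incrementing shared pointers (`*c++`) from several threads would be a data race.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for bar(): reads one element, has no side effects. */
static float bar(const float *p) { return *p + 1.0f; }

/* The directive is the programmer's promise that iterations are independent,
   i.e. that c does not overlap a or b. The compiler does not verify this. */
void foo_indexed(const float *a, const float *b, float *c, int n) {
    int i;
    assert((a != NULL && b != NULL && c != NULL) || n == 0);
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        c[i] = a[i] * bar(&b[i]);
}
```

Built without OpenMP support the pragma is ignored and the loop runs serially with the same result.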
void foo()
{ int a[1000], b[1000], c[1000], x[1000], i, NUM;
  /* parallel region */
  #pragma omp parallel private(NUM) shared(x, a, b, c)
  { NUM = omp_get_num_threads();
    #pragma omp for private(i) /* work-sharing for loop */
    for (i = 0; i < 1000; i++) {
      x[i] = bar(a[i], b[i], c[i], NUM); /* assume bar has no side-effects */
    }
  }
}
OpenMP* Directives
icl -Qopenmp -c foo.c { -xopenmp on Linux}
foo.c
foo.c(10) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
foo.c(7) : (col. 1) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
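The slide's example leaves `bar` undefined and the input arrays uninitialized. A self-contained variant might look like the sketch below. The body of `bar`, the name `foo_omp`, and passing the arrays in as parameters are all assumptions made for illustration. The `_OPENMP` guard lets the same source build and run serially when OpenMP is not enabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 1000

/* Hypothetical bar(): side-effect free; ignores the thread count so the
   result does not depend on how many threads run the loop. */
static int bar(int a, int b, int c, int num) {
    (void)num;
    return a + b * c;
}

void foo_omp(const int *a, const int *b, const int *c, int *x) {
    int i, num = 1;
    assert(a != NULL && b != NULL && c != NULL && x != NULL);
    /* parallel region: each thread gets its own copy of num */
    #pragma omp parallel private(num) shared(x, a, b, c)
    {
#ifdef _OPENMP
        num = omp_get_num_threads();
#else
        num = 1;                      /* serial build: a "team" of one */
#endif
        /* work-sharing loop: iterations are divided among the threads */
        #pragma omp for private(i)
        for (i = 0; i < N; i++)
            x[i] = bar(a[i], b[i], c[i], num);
    }
}
```

Each thread writes a disjoint set of `x[i]`, so no synchronization is needed beyond the implicit barrier at the end of the work-sharing loop.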
OpenMP™ + Vectorization
Combined speedup
Order of use might be important
Supported by Intel® Compilers
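One common way to combine the two, sketched here under assumptions, is to put the OpenMP directive on an outer loop and leave a simple unit-stride inner loop for the vectorizer. The function name, the row-major layout, and the `restrict` qualifiers are hypothetical choices for this illustration. Whether the inner loop is actually vectorized depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* out[r][c] = in[r][c] * w[c] for a rows x cols matrix stored row-major. */
void scale_rows(int rows, int cols,
                const float *restrict in,
                const float *restrict w,
                float *restrict out) {
    int r;
    assert(in != NULL && w != NULL && out != NULL);
    /* Threads: rows are independent, so they are split across the team. */
    #pragma omp parallel for
    for (r = 0; r < rows; r++) {
        const float *src = in + (size_t)r * (size_t)cols;
        float *dst = out + (size_t)r * (size_t)cols;
        /* SIMD candidate: unit stride, no dependence between iterations. */
        for (int c = 0; c < cols; c++)
            dst[c] = src[c] * w[c];
    }
}
```

Parallelizing the outer loop keeps each thread's work large, while the inner loop stays contiguous in memory. Swapping the roles would give threads very little work per iteration.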
Make performance a feature of your applications today –
stay competitive
Intel® Compilers
Leading-Edge compiler technologies
Compatible with leading industry standard compilers
Processor optimized code generation
Support single source code across Intel