
Hyper-Threading Intel Compilers Andrey Naraikin Senior Software Engineer Software Products Division Intel Nizhny Novgorod Lab November 29, 2002.

Dec 23, 2015

Transcript
Page 1:

Hyper-Threading
Intel Compilers

Andrey Naraikin
Senior Software Engineer
Software Products Division
Intel Nizhny Novgorod Lab
November 29, 2002

Page 2:

Agenda

Hyper-Threading Technology Overview

Introduction: Intel SW Development Tools
– Motivation
– Challenges
– Intel SW Tools

Intel Compilers Overview
– Technologies supported
– SPEC and other benchmarks
– Some features supported by Intel Compilers

Page 3:

Hyper-Threading Overview: Today’s Processors

Single Processor Systems
– Instruction Level Parallelism (ILP)
– Performance improved with more CPU resources

Multiprocessor Systems
– Thread Level Parallelism (TLP)
– Performance improved by adding more CPUs

Hyper-Threading Technology brings TLP to single-processor systems.

Page 4:

Hyper-Threading Overview: Today’s Software

Sequential tasks

Parallel tasks

[Figure: task boxes labeled Open File, Edit, Spell Check, Open DB’s, Address Book, InBox, Meeting]

Page 5:

Hyper-Threading Overview: Multi-Processing

Run parallel tasks using multiple processors

[Figure: tasks distributed across CPU 1, CPU 2, CPU 3]

Multi-tasking workload + processor resources => Improves MT Performance

Page 6:

Hyper-Threading: Quick View

Page 7:

Dual-Core Architecture

[Figure: block diagrams comparing Hyper-Threading with a multiprocessor]
Hyper-Threading: one set of processor execution resources with two architecture states (AS, AS)
Multiprocessor: two sets of processor execution resources, each with one architecture state (AS)

AS = Architecture State (eax, ebx, control registers, etc.), xAPIC

Hyper-Threading Technology looks like two processors to software

Hyper-Threading Technology
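Because each architecture state appears to software as a processor, a program sees the extra logical processors through the ordinary operating-system query. The sketch below is not from the slides; it assumes a POSIX-like system where `sysconf` accepts `_SC_NPROCESSORS_ONLN` (Linux and macOS do), and the function name `logical_processor_count` is made up for illustration.

```c
#include <unistd.h>

/* Number of logical processors the OS currently has online.
   With Hyper-Threading enabled, one physical processor is
   typically reported as two logical processors.
   Assumption: sysconf(_SC_NPROCESSORS_ONLN) is available. */
long logical_processor_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;   /* fall back to 1 if the query fails */
}
```

A threaded program could use this value to decide how many worker threads to start.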

Page 8:

Hyper-Threading Architecture Overview

Pentium, VTune and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.

Page 9:

Hyper-Threading Architecture Details

Page 10:

Hyper-Threading Overview: Resource Utilization

[Figure: execution-unit occupancy over time (proc. cycles) for four configurations: Superscalar, Multiprocessing, Hyper-Threading, Multiprocessing With Hyper-Threading]

Note: Each box represents a processor execution unit

Page 11:

Performance Benefit

[Figure: bar chart of relative speedup (axis 0 to 2) for applications A1 to A9; series: Serial, HTT, SMP]

Code  Description
A1    Engineering
A2    Genetics
A3    Chemistry
A4    Engineering
A5    Weather
A6    Genetics
A7    CFD
A8    FEA
A9    FEA

“Hyper-Threading Technology: Impact on Compute-Intensive Workloads,” Intel Technology Journal, Vol. 6, 2002.

Hyper-Threading Technology

Page 12:

Key Point

Hyper-Threading Technology gives better utilization of processor resources

Hyper-Threading Technology gives more computing power for multithreaded applications

Hyper-Threading Technology

Page 13:

Collateral

Web Sites
– http://developer.intel.com/technology/hyperthread/
– http://developer.intel.com/design/pentium4/applnots
– http://developer.intel.com/design/pentium4/manuals

Documentation and application notes
– IA-32 Intel® Architecture Software Developer’s Manual
– Intel Pentium® 4 and Intel Xeon™ Processor Optimization Manual
– Intel App Note AP485 - “Intel Processor Identification and CPU Instructions”
– Intel App Note AP 949 “Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor”
– Intel App Note “Detecting Support for Jackson Technology Enabled Processors”

Page 14:

Collateral (Cont’d)

Intel Technology Journal
– http://developer.intel.com/technology/itj/

Intel Threading Tools
– http://www.intel.com/software/products/

OpenMP
– http://www.openmp.org

HT Overview
– http://www.ixbt.com/cpu/pentium4-3ghz-ht.shtml

Page 15:

Performance Advantage: Optimization Path

[Figure: optimization path starting from a standard compiler (1x) and combining the Intel Compiler, OpenMP threading, minor code changes (1 line) and performance libraries (IPP or MKL); speedup labels 4x, 7x, 9x, 13x and 15x faster; categories "Little or No Code Change" and "Minor Code Change"; analysis with VTune™]

Intel SW Development Tools

Page 16:

Sunset Simulation: Optimized Performance

15x faster

Intel SW Development Tools

Page 17:

Intel® Compilers

C, C++ and Fortran95
– Available on Windows* and Linux*
– Available for 32-bit and 64-bit platforms

Utilization of latest processor/platform features
– Optimizations for NetBurst™ architecture (Pentium® 4 and Xeon™ processor)
– Optimizations for Itanium® architecture

Seamless integration into Windows* (IDE) and Linux* environment

Source and binary compatible with Microsoft* compiler; mostly source compatible with GNU (gcc)

Intel SW Development Tools – Compilers

Page 18:

Benchmarks: Intel® Compilers 6.0 for Windows*

Benchmark                              Baseline compiler            Intel® compiler                      Result
SPECint_base2000                       Leading C++ Compiler: 703    Intel® C++ Compiler 6.0: 825         17% faster integer performance
SPECfp_base2000 (Geomean of Fortran)   CVF* 6.6: 686                Intel® Fortran Compiler 6.0: 881     28% faster floating-point performance

[Figure: two bar charts, axis 400 to 900]

Configuration info: Intel® Pentium® 4 Processor, 2.4 GHz, Intel® Medford 850 Motherboard (D850MD, 850 chipset), 256 MB Memory, Windows* XP Professional Edition (build 2600), GeForce 3/nVidia* Graphics

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. Users’ results are dependent upon the application characteristics (loopy vs. flat), mix of C and C++, and other factors. For more information on performance tests and on the performance of Intel products, reference [www.intel.com] or call (U.S.) 1-800-628-8686 or 1-916-356-3104.

Intel SW Development Tools – Compilers

Page 19:

Intel® C++ Compiler 6.0 for Linux*

PovRay Image Rendering Time

Compiler and options                        Rendering time
gcc 2.96, O2 and fast-math optimization     20.30 seconds
Intel® 6.0, comparable optimization         14.75 seconds
Intel® 6.0, maximum optimization            13.57 seconds

[Figure: bar chart of improvement, axis 60% to 160%]

Configuration info: Intel® Pentium® 4 processor, 2.0 GHz, 256 MB Memory, nVidia* GeForce 2 graphics card, Linux* 2.4.7, PovRay 3.1G

Intel SW Development Tools – Compilers

Page 20:

Special Performance Features

Auto-Vectorization for NetBurst™ architecture
Software-Pipelining for EPIC architecture
Auto-Parallelization and OpenMP based parallelization
– for Hyper-Threading and multi-processor systems
Data Pre-Fetching
Profile-Guided Optimization (PGO)
Inter-procedural Optimization (IPO)
CPU Dispatch
– Establishes code path at runtime dependent on actual processor type
– Allows single binary with optimal performance across processor families

Intel SW Development Tools – Compilers
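The CPU Dispatch idea (one binary, code path chosen at run time) can be sketched by hand with a function pointer. This is only an illustration of the concept, not how the Intel compiler implements it. All names here (`add_fn`, `add_generic`, `add_sse2_path`, `cpu_has_sse2`, `select_add`) are hypothetical, and the detection step assumes the GCC/Clang builtin `__builtin_cpu_supports` on x86; on other compilers or architectures the sketch simply takes the generic path.

```c
#include <stddef.h>

typedef void (*add_fn)(float *a, const float *b, size_t n);

/* Baseline path: safe on any processor. */
static void add_generic(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];
}

/* Stand-in for a processor-specific path. In a real build this
   function would be compiled with SSE2 code generation enabled;
   here it is plain C unrolled by four so the sketch runs anywhere. */
static void add_sse2_path(float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i]     += b[i];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        a[i] += b[i];
}

/* Assumption: GCC/Clang builtin on x86; otherwise report "no". */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* Pick the code path once, at run time, from the actual processor. */
add_fn select_add(void)
{
    return cpu_has_sse2() ? add_sse2_path : add_generic;
}
```

A caller would obtain the pointer once (`add_fn add = select_add();`) and then call `add(a, b, n)`; both paths compute the same result.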

Page 21:

Techniques Overview

Exploit parallelism to speed up application

Vectorization
– Supported by programming languages and compilers
– Motivated by modern architectures
    Superscalarity, deeply pipelined core
    SIMD
    Software pipelining on Itanium™ architecture

Parallelization
– OpenMP™ directives for shared memory multiprocessor systems
– MPI computations for clusters

Features by Intel Compilers

Page 22:

Intel processors and vectorization

Type of processor                                          Vectorization features supported
Pentium® with MMX™ technology, Pentium® II processors      Integer types, 64 bits
Pentium® III processor                                     Streaming SIMD Extensions (SSE), single precision floating point
Pentium® 4 processor                                       Streaming SIMD Extensions 2 (SSE 2), double precision floating point, integer types, 128 bits

Features by Intel Compilers - Vectorization

Page 23:

Automatic Vectorization

Compiler automatically transforms sequential code for SIMD execution

Sequential code:

    for (i=0; i<n; i++) {
        a[i] = a[i] + b[i];
        a[i] = sin(a[i]);
    }

After vectorization with icl -Qx[MKW]:

    for (i=0; i<n; i=i+VL) {
        a(i : i+VL-1) = a(i : i+VL-1) + b(i : i+VL-1);    /* HW SIMD instruction */
        a(i : i+VL-1) = _vmlSin(a(i : i+VL-1));           /* Run-Time Library */
    }

Features by Intel Compilers - Vectorization
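The array-section notation on this slide is pseudocode. A minimal C sketch of the same strip-mining idea follows, assuming a vector length `VL` of 4 (four floats per 128-bit register) and leaving out the `sin` step. The inner `k` loop stands in for one SIMD add, and a remainder loop, which the slide does not show, covers trip counts that are not a multiple of `VL`. The function names are illustrative only.

```c
#include <stddef.h>

#define VL 4  /* assumed vector length: four floats per 128-bit register */

/* Scalar reference loop: a[i] = a[i] + b[i]. */
void add_scalar(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + b[i];
}

/* Strip-mined form: the main loop handles VL elements per iteration,
   the remainder loop handles the last n % VL elements. */
void add_strip_mined(float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + VL <= n; i += VL) {
        /* stands in for one SIMD add on a(i : i+VL-1) */
        for (size_t k = 0; k < VL; k++)
            a[i + k] = a[i + k] + b[i + k];
    }
    for (; i < n; i++)          /* remainder iterations */
        a[i] = a[i] + b[i];
}
```

Both functions produce identical results; the second only makes explicit the loop shape a vectorizing compiler generates.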

Page 24:

Vectorization Example

    double a[N], b[N];
    int i;

    for (i = 0; i < N; i++)
        a[i] = a[i] + b[i];

icl -QxW

[Figure: element-wise addition]
a:       0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
b:       0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
Scalar:  0.0 2.0
Vector:  4.0 6.0 8.0 10.0 12.0 14.0 16.0 18.0

Features by Intel Compilers - Vectorization

Page 25:

Reduction Example

    float a[N], x;
    int i;

    x = 0.0;
    for (i = 0; i < N; i++)
        x += a[i];

a:            0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0

Loop kernel:   0.0  0.0  0.0  0.0
               0.0  1.0  2.0  3.0
               4.0  6.0  8.0 10.0
              12.0 15.0 18.0 21.0

Postlude:     30.0 36.0
              66.0

Features by Intel Compilers - Vectorization
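The diagram's loop kernel and postlude can be written out by hand. The sketch below assumes four partial sums (one per SIMD lane) and combines them in the same order as the diagram (12 + 18 = 30, 15 + 21 = 36, then 66). The function name `sum_four_lanes` is illustrative, and the final remainder loop is an addition for lengths that are not a multiple of four.

```c
#include <stddef.h>

/* Sum n floats using four partial sums, mirroring the slide's diagram. */
float sum_four_lanes(const float *a, size_t n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Loop kernel: each lane accumulates every fourth element. */
    for (; i + 4 <= n; i += 4) {
        lane[0] += a[i];
        lane[1] += a[i + 1];
        lane[2] += a[i + 2];
        lane[3] += a[i + 3];
    }

    /* Postlude: fold four partial sums into two, then into one. */
    float even = lane[0] + lane[2];
    float odd  = lane[1] + lane[3];
    float x = even + odd;

    /* Remainder elements when n is not a multiple of 4. */
    for (; i < n; i++)
        x += a[i];

    return x;
}
```

For the slide's input 0.0 through 11.0 this returns 66.0. Note that the lanes add elements in a different order than the scalar loop, so for general floating-point data the result can differ from the sequential sum in the last bits.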

Page 26:

Parallel Program Development

Parallelization approaches, in order of increasing ease of use/maintenance:
– Explicit threading using operating system calls
– With industry standard OpenMP* directives
– Automatically using the compiler

Features by Intel Compilers - Parallelization

Page 27:

Autoparallelization

    float a[N], b[N], c[N];
    int i;
    for (i=0; i<N; i++)
        c[i] = a[i]*b[i];

icl -Qparallel foo.c    { -parallel on Linux }
….
foo.c
foo.c(7) : (col. 2) remark: LOOP WAS AUTO-PARALLELIZED.
....

./foo.exe -- Executable detects and uses number of processors…

-Qpar_report[n] - get helpful messages from the compiler

Features by Intel Compilers - Parallelization

Page 28:

OpenMP™ Directives

OpenMP* standard (www.openmp.org)
– Set of directives to enable the writing of multithreaded programs

Use of shared memory parallelism on programming language level
– Portability
– Performance

Support by Intel® Compilers
– Windows*, Linux*
– IA-32 and Itanium™ architectures

Features by Intel Compilers - Parallelization

Page 29:

Simple Directives

    void foo(float *a, float *b, float *c)
    {
        int i;
    #pragma parallel
        for (i=0; i<N; i++) {
            *c++ = (*a++)*bar(b++);
        }
    }

Pointers and procedure calls with escaped pointers prevent analysis for autoparallelization

Use simple directives instead

Features by Intel Compilers - Parallelization

Page 30:

OpenMP* Directives

    void foo()
    {
        int a[1000], b[1000], c[1000], x[1000], i, NUM;

        /* parallel region */
    #pragma omp parallel private(NUM) shared(x, a, b, c)
        {
            NUM = omp_get_num_threads();
    #pragma omp for private(i) /* work-sharing for loop */
            for (i = 0; i < 1000; i++) {
                x[i] = bar(a[i], b[i], c[i], NUM); /* assume bar has no side-effects */
            }
        }
    }

icl -Qopenmp -c foo.c    { -openmp on Linux }
foo.c
foo.c(10) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
foo.c(7) : (col. 1) remark: OpenMP DEFINED REGION WAS PARALLELIZED.

Features by Intel Compilers - Parallelization
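The slide's fragment leaves `bar` undefined and keeps the arrays local, so it cannot be compiled on its own. A self-contained variant might look like the sketch below. It is an adaptation, not the author's code: the arrays are passed in as parameters so the result can be inspected, `bar` is a hypothetical pure function, and the `_OPENMP` guards let the same file build serially when OpenMP is not enabled (the pragmas are then ignored).

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 1000

/* Hypothetical stand-in for the slide's bar(); it has no side effects. */
static int bar(int a, int b, int c, int num_threads)
{
    (void)num_threads;          /* unused in this sketch */
    return a + b * c;
}

/* Fills x[0..N-1]; iterations are independent, so they can be shared
   among the threads of the parallel region. */
void foo(const int *a, const int *b, const int *c, int *x)
{
    int i, num;

    /* parallel region */
#pragma omp parallel private(num) shared(x, a, b, c)
    {
#ifdef _OPENMP
        num = omp_get_num_threads();
#else
        num = 1;                /* serial build: one thread */
#endif
#pragma omp for private(i)      /* work-sharing for loop */
        for (i = 0; i < N; i++)
            x[i] = bar(a[i], b[i], c[i], num);
    }
}
```

Built with OpenMP enabled, the loop iterations are divided among the threads; built without it, the function runs the same loop on one thread and produces the same values.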

Page 31:

OpenMP™ + Vectorization

Combined speedup

Order of use might be important
– Parallelization overhead
– Vectorize inner loops
– Parallelize outer loops

Supported by Intel® Compilers

Features by Intel Compilers
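The "parallelize outer loops, vectorize inner loops" guidance can be illustrated with a small loop nest. In this sketch, which is not from the slides, the OpenMP directive distributes rows across threads, so the parallelization overhead is paid once per call rather than once per row, while the unit-stride inner loop is left in a form a vectorizing compiler can handle. The array sizes and the name `scale_add` are arbitrary choices; whether the inner loop is actually vectorized depends on the compiler and its options.

```c
#define ROWS 64
#define COLS 256

/* out[r][c] += s * in[r][c] for every element. */
void scale_add(float out[ROWS][COLS], float in[ROWS][COLS], float s)
{
    int r, c;

    /* Outer loop: rows are independent, so share them among threads. */
#pragma omp parallel for private(c)
    for (r = 0; r < ROWS; r++) {
        /* Inner loop: contiguous, unit-stride accesses, a natural
           candidate for SIMD execution. */
        for (c = 0; c < COLS; c++)
            out[r][c] = out[r][c] + s * in[r][c];
    }
}
```

Without OpenMP enabled the pragma is ignored and the function runs serially with the same result.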

Page 32:

Intel® Compilers

Leading-Edge compiler technologies
Compatible with leading industry standard compilers
Processor optimized code generation
Support single source code across Intel processor families

Make performance a feature of your applications today – stay competitive

Intel SW Development Tools

Page 33:

Collateral

Intel Technology Journal
– http://developer.intel.com/technology/itj/

Intel Threading Tools
– http://www.intel.com/software/products/

OpenMP
– http://www.openmp.org

HT Overview
– http://www.ixbt.com/cpu/pentium4-3ghz-ht.shtml

Page 34:

To be continued…