Page 1

Intel Confidential

Intel® Parallel Studio XE 2011 -- Industry Leading Tools For Advanced Performance

Xiaoping Duan, Technical Consulting Engineer

SSG/DPD/CMTS, Aug 16th, 2011

Page 2

The software industry goes parallel: increase application performance & scalability

The Challenge

• Serial applications cannot take advantage of multicore platforms.

• The number of processor cores is increasing.

• Remaining competitive requires parallelizing serial code or creating new parallel applications.

[Figure: Application Performance “Scaling”: speedup (1X, 2X, 4X, 8X) versus number of processor cores (1, 2, 4, 8)]

Page 3

Phase | Productivity Tool | Feature | Benefit
Advanced Build & Debug | Intel® Composer XE | C/C++ and Fortran compilers, performance libraries, and parallel models | Application performance, scalability and quality for current multicore and future many-core systems
Advanced Verify | Intel® Inspector XE | Memory and threading error checking tool for higher code reliability and quality | Increases productivity and lowers cost by catching memory and threading defects early
Advanced Tune | Intel® VTune™ Amplifier XE | Performance profiler to optimize performance and scalability | Removes guesswork, saves time, and makes it easier to find performance and scalability bottlenecks; combines ease of use with deeper insights

Intel® Parallel Studio XE 2011: powerful tools to create fast, reliable and secure code

Page 4

Optimizing Compiler and Libraries: Intel® Parallel Composer

• Built-in optimizations and libraries yield faster code

• World-class Fortran compilers

• Expanded and simplified parallelism options

“Intel® Parallel Studio XE 2011 is a great software development tool for performance-oriented Windows*-based C++ software developers. I achieved an astonishing boost in performance by using Intel® Cilk™ Plus and the Array Notations in my code. If you need performance, try Intel Parallel Studio XE 2011.”

Jorge Martinis, Research & Development Engineer, BR&E Inc.

Page 5

Intel® Parallel Building Blocks: enables portability, reliability, scalability, simplicity

• Data-parallel and general-purpose parallelism solutions

• Language extensions and library solutions

• Optimized high-level algorithms and low-level constructs to build custom algorithms

• Mix and match new parallel models within an application to suit the developer's environment, application and algorithms

Page 6

Intel® Cilk™ Plus Keywords

Feature | Example | Semantics
Spawning a function call | x = cilk_spawn func(g(y), h(z)); | func may run in parallel with the rest of the calling function; the arguments g(y) and h(z) are evaluated before the spawn.
Synchronization statement | cilk_sync; | Waits until all functions spawned by the current function have completed.
Parallel for loop | cilk_for (int i = 0; i < N; i++) { statement; } | Loop iterations may execute in parallel.
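The keywords above need a compiler with Cilk Plus support, such as the Intel® C++ Compiler in Composer XE 2011. As a rough, portable illustration of the same fork-join semantics, the sketch below uses standard C++ `std::async` in place of `cilk_spawn` and `future::get()` in place of `cilk_sync`. This is a swapped-in technique, not Cilk: `std::launch::async` starts a new thread per call, whereas the Cilk runtime uses a work-stealing scheduler. The function name `fib` and the `spawn_depth` cutoff are hypothetical.

```cpp
#include <cassert>
#include <future>

// Fork-join Fibonacci. With Cilk Plus the recursive step would read:
//   long x = cilk_spawn fib(n - 1);
//   long y = fib(n - 2);
//   cilk_sync;
//   return x + y;
// Here std::async stands in for cilk_spawn and future::get() for cilk_sync.
long fib(int n, int spawn_depth = 4) {
    assert(n >= 0);
    if (n < 2) return n;
    if (spawn_depth <= 0) {
        // Below the (hypothetical) cutoff, recurse serially to limit thread count.
        return fib(n - 1, 0) + fib(n - 2, 0);
    }
    // "Spawn": fib(n - 1) may run on another thread while the caller continues.
    std::future<long> x = std::async(std::launch::async, fib, n - 1, spawn_depth - 1);
    long y = fib(n - 2, spawn_depth - 1);  // the continuation runs in the caller
    return x.get() + y;                    // "sync": wait for the spawned child
}
```

The serial cutoff matters in this analogue because each `std::async` call here creates an OS thread; a work-stealing runtime makes fine-grained spawns much cheaper.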

Page 7

C/C++ Extensions for Array Notation

• Intel-specific extension to C/C++

• Vector operations are expressed directly in the language, making vectorized performance more predictable

• No new data types added

√ Available in Intel® Composer XE 2011

√ Integrates seamlessly with:
  • Intel® Threading Building Blocks
  • Intel® Cilk™ Plus
  • OpenMP*
  • pthreads
  • Windows* threads

√ Supports all hardware supported by the Intel® C/C++ Compiler
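Array notation sections are written `a[start:length]` or `a[start:length:stride]`, so `c[0:n] = a[0:n] + b[0:n];` adds two arrays element by element without an explicit loop. Because that syntax compiles only with the Intel compiler, the following sketch shows the same semantics with standard `std::valarray`, whose `std::slice(start, length, stride)` takes the same triple. It is an analogue only: `std::valarray` does not guarantee the vectorized code generation described above. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Array notation (Intel compiler):  c[0:n] = a[0:n] + b[0:n];
// valarray analogue: whole-array arithmetic, no explicit loop.
std::valarray<double> add_arrays(const std::valarray<double>& a,
                                 const std::valarray<double>& b) {
    assert(a.size() == b.size());
    return a + b;
}

// Array notation (Intel compiler):  x[0:(n+1)/2:2] = 0;
// Zero every element at an even index using a (start, length, stride) slice.
void zero_even_indices(std::valarray<double>& x) {
    std::size_t count = (x.size() + 1) / 2;
    x[std::slice(0, count, 2)] = 0.0;
}
```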

Page 8

Intel® Threading Building Blocks

Concurrent Containers

Common idioms for concurrent access: a scalable alternative to a serial container with a lock around it
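For context, the baseline that a concurrent container is meant to replace looks roughly like the following: an ordinary serial `std::queue` guarded by a single `std::mutex`. Every operation serializes on the same lock, which is what limits scalability under contention. This is a minimal standard-C++ sketch, not TBB code; the class name `LockedQueue` is hypothetical, and TBB's own containers (for example `tbb::concurrent_queue` or `tbb::concurrent_hash_map`) are designed to avoid this single point of contention.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// A serial container with one lock around it: correct, but every
// push, pop and size query contends on the same mutex.
template <typename T>
class LockedQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push(std::move(value));
    }

    // Returns an empty optional instead of blocking when the queue is empty.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return items_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so size() can lock in a const method
    std::queue<T> items_;
};
```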

Miscellaneous: thread-safe timers

Generic Parallel Algorithms

An efficient, scalable way to exploit the power of multi-core without having to start from scratch
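A parallel reduction is one of the generic algorithms meant here. The sketch below reproduces only its basic shape with standard C++: split the range into chunks, sum each chunk as a separate task, then combine the partial sums. It assumes a fixed chunk count chosen by the caller, whereas TBB's `tbb::parallel_reduce` splits ranges recursively and balances them through the task scheduler. The function name `parallel_sum` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Chunked parallel reduction: map each contiguous chunk to a partial sum
// on its own task, then combine the partial sums in the caller.
long long parallel_sum(const std::vector<int>& data, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t n = data.size();
    std::vector<std::future<long long>> partials;
    partials.reserve(chunks);
    for (std::size_t c = 0; c < chunks; ++c) {
        // Even split of [0, n); chunks may be empty if chunks > n.
        std::size_t begin = n * c / chunks;
        std::size_t end = n * (c + 1) / chunks;
        partials.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.data() + begin, data.data() + end, 0LL);
        }));
    }
    long long total = 0;
    for (auto& p : partials) total += p.get();  // combine step
    return total;
}
```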

Task Scheduler

The engine that powers the parallel algorithms; it employs task stealing to maximize concurrency

Synchronization Primitives

User-level and OS wrappers for mutual exclusion, ranging from atomic operations to several flavors of mutexes and condition variables
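The two ends of that range can be sketched with the standard library: an atomic read-modify-write for a simple shared counter, and a mutex-protected critical section for anything more complex. TBB ships its own equivalents (for example `tbb::atomic` and `tbb::spin_mutex`); this sketch deliberately uses `std::atomic` and `std::mutex` instead, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Lightest option: each increment is a single atomic read-modify-write.
long count_with_atomic(int threads, int increments) {
    assert(threads > 0 && increments >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();  // join() makes all updates visible here
    return counter.load();
}

// Heavier option: a mutex guards a plain counter inside a critical section.
long count_with_mutex(int threads, int increments) {
    assert(threads > 0 && increments >= 0);
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, &m, increments] {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<std::mutex> guard(m);
                ++counter;
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

Both versions should return `threads * increments`; the atomic one usually has lower per-update overhead, while the mutex generalizes to updates that touch several variables at once.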

Memory Allocation

Per-thread scalable memory manager and false-sharing free allocators

Threads

OS API wrappers

Thread Local Storage

Scalable implementation of thread-local data that supports an unlimited number of TLS variables
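The idea behind scalable thread-local data is that each thread updates a private copy in its hot loop and the copies are combined once at the end, so threads do not contend on shared state. A minimal sketch using the C++11 `thread_local` keyword follows; it is not TBB's `tbb::enumerable_thread_specific` or `tbb::combinable`, which package this pattern as library types, and `sum_with_tls` and `local_sum` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Every thread gets its own instance of this variable.
thread_local long local_sum = 0;

// Each worker sums 1..per_thread into its private copy, then publishes
// the result with one atomic add, so the hot loop touches no shared data.
long sum_with_tls(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<long> total{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&total, per_thread] {
            local_sum = 0;                  // this thread's private copy
            for (int i = 1; i <= per_thread; ++i)
                local_sum += i;
            total.fetch_add(local_sum);     // combine once per thread
        });
    for (auto& th : pool) th.join();
    return total.load();
}
```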

Page 9

Intel® Array Building Blocks

ArBB kernels in a “serial” C++ application

• Standard C++ compiler: templates and overloaded operators; links with a dynamic library

• ArBB runtime: dynamic compiler; threading and heterogeneous runtime

• Single source; targets include sequential execution, Intel® SSE**, Intel® AVX**, and future CPUs

Page 10

Where is my application…

Spending time?
• Focus tuning on functions taking time
• See call stacks
• See time on source

Wasting time?
• See cache misses on your source
• See functions sorted by # of cache misses

Waiting too long?
• See locks by wait time
• Red/Green for CPU utilization during wait

Intel® VTune™ Amplifier XE Performance Profiler
• Windows & Linux
• Low overhead
• No special recompiles

Advanced Profiling For Scalable Multicore Performance

“We improved the performance of the latest run threefold. We wouldn't have found the problem without something like Intel® VTune™ Amplifier XE.”

Claire Cates, Principal Developer, SAS Institute Inc.

Page 11

New: Intel® Inspector XE combines memory and thread checking in one tool

• Finds hard-to-detect coding defects
  – Memory leaks and memory corruption
  – Threading data races and deadlocks

• Supports different implementations of threading
  – Native threads and Intel® Parallel Building Blocks

• Works on standard builds and binaries

• Can identify over 250 security errors

Advanced memory checking: finds errors like memory leaks and corruption

Thread checking: finds data races and deadlocks

Security static analysis: finds security errors including buffer overruns and uninitialized variables

Make changes to source code in the context of the error

Page 12

Simplifies the Transition from Multicore to Future Manycore

Page 13

Page 14

Optimization Notice

Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options." Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. 
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20101101

Page 15

Backup

Page 16

Intel® Parallel Composer: Intel® C++ Performance on Intel Architecture

Page 17

Intel Compiler in Microsoft Visual Studio*


Page 18

Array Building Blocks

• Generalized data-parallel programming model

• Supports a wide variety of patterns and collections

• Supports explicit dynamic generation and management of code

• Implementation targets both threads and vector code

Architecture, from application to hardware:

• Application calling ArBB APIs, through the C++ API or other language bindings

• Virtual machine: virtual ISA, debug services, memory manager

• Backend JIT compiler: machine-independent optimization; offload management; machine-specific code generation and optimizations

• Threading runtime: scalable threading runtime

• Targets: CPU, accelerator, future hardware