Intel Confidential
Intel® Parallel Studio XE 2011 -- Industry Leading Tools For Advanced Performance
Xiaoping Duan, Technical Consulting Engineer
SSG/DPD/CMTS, Aug 16th, 2011
The software industry goes parallel
Increase application performance & scalability
The Challenge
• Serial applications cannot take advantage of multicore platforms.
• The number of processor cores is increasing.
• Remaining competitive requires parallelizing serial code or creating new parallel applications.
[Chart: Application Performance "Scaling". Speedup (1X, 2X, 4X, 8X) versus number of processor cores (1, 2, 4, 8).]
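The chart shows the ideal case, where speedup equals the core count. As a rough illustration of why the serial portion of an application matters, the sketch below evaluates Amdahl's law. The law itself and the function name `amdahl_speedup` are additions for illustration and do not come from the original slides.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law (an illustrative addition, not taken from the slides):
// if a fraction p of the run time can be parallelized, the speedup on
// n cores is 1 / ((1 - p) + p / n).
double amdahl_speedup(double parallel_fraction, int cores) {
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}
```

With `parallel_fraction = 1.0` the function reproduces the ideal 1X, 2X, 4X, 8X line from the chart. With 0.9, eight cores yield a speedup below 5X, which is why parallelizing more of the serial code pays off as core counts grow.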
Phase | Productivity Tool | Feature | Benefit
Advanced Build & Debug | Intel® Composer XE | C/C++ and Fortran compilers, performance libraries, and parallel models | Application performance, scalability and quality for current multicore and future many-core systems.
Advanced Verify | Intel® Inspector XE | Memory & threading error checking tool for higher code reliability & quality | Increases productivity and lowers cost by catching memory and threading defects early.
Advanced Tune | Intel® VTune™ Amplifier XE | Performance profiler to optimize performance and scalability | Removes guesswork, saves time, makes it easier to find performance and scalability bottlenecks. Combines ease of use with deeper insights.
Intel® Parallel Studio XE 2011
Powerful tools to create fast, reliable and secure code
Optimizing Compiler and Libraries
Intel® Parallel Composer
• Built-in optimizations and libraries yield faster code
• World Class Fortran Compilers
• Expanded and Simplified Parallelism Options
“Intel® Parallel Studio XE 2011 is a great software development tool for performance-oriented Windows*-based C++ software developers. I achieved an astonishing boost in performance by using Intel® Cilk™ Plus and the Array Notations in my code. If you need performance, try Intel Parallel Studio XE 2011.”
Jorge Martinis
Research & Development Engineer
BR&E Inc.
Intel® Parallel Building Blocks
Enables portability, reliability, scalability, simplicity
• Data parallel and general purpose parallelism solutions
• Language extensions and library solutions
• Optimized high-level algorithms and low-level constructs to build custom algorithms
• Mix and match new parallel models within an application to suit the developer’s environment / application and algorithms
Intel® Cilk™ Plus Keywords
Feature | Example | Semantics
Spawning a function call | x = cilk_spawn func(g(y),h(z)); | func executes asynchronously.
Synchronization statement | cilk_sync; | Wait for all children spawned inside the current function.
Parallel_for loop | cilk_for (int i = 0; i < N; i++) { statement; } | Loop iterations execute in parallel.
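The spawn/sync pattern in the table can be sketched in standard C++ for readers without an Intel® Cilk™ Plus compiler. This is only an analogue under stated assumptions: `std::async` stands in for `cilk_spawn` and `std::future::get()` for `cilk_sync`. It creates a thread per spawn instead of using the Cilk Plus work-stealing scheduler, and the names `fib_serial`, `fib_parallel`, and the `depth` cutoff are hypothetical.

```cpp
#include <cassert>
#include <future>

// Serial reference implementation.
long fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join Fibonacci. With the Cilk Plus keywords the body would read roughly:
//     long x = cilk_spawn fib(n - 1);
//     long y = fib(n - 2);
//     cilk_sync;
//     return x + y;
// Here std::async plays the role of the spawn and future::get() the sync.
long fib_parallel(int n, int depth = 3) {
    if (n < 2) return n;
    if (depth == 0) return fib_serial(n);  // stop creating threads below the cutoff
    // "Spawn": the child call may run concurrently with the rest of this function.
    std::future<long> x =
        std::async(std::launch::async, fib_parallel, n - 1, depth - 1);
    long y = fib_parallel(n - 2, depth - 1);
    // "Sync": wait for the spawned child before combining the results.
    return x.get() + y;
}
```

Calling `fib_parallel(20)` returns the same value as `fib_serial(20)`. The depth cutoff keeps the number of threads small, a concern that a work-stealing runtime handles for the programmer.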
C/C++ Extensions for Array Notation
• Intel-specific extension to C/C++
• Vector operations are expressed directly in the language, creating predictable performance
• No new data types added
√ Available in Intel® Composer XE 2011
√ Integrates seamlessly with:
• Intel® Threading Building Blocks
• Intel® Cilk™ Plus
• OpenMP*
• pthreads
• Windows*
√ Supports the same hardware as the Intel® C/C++ Compiler
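Array notation lets a whole-array operation be written as a single expression, for example `a[:] = b[:] * s + c[:];`, and an array section as `a[start:length:stride]`. Because that syntax needs a compiler that implements the extension, the sketch below uses `std::valarray` as a standard C++ stand-in for the same idea. The function names `scale_add` and `every_other` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Element-wise a = b * s + c over whole arrays, with no explicit loop.
// Array-notation equivalent (assumed form):  a[:] = b[:] * s + c[:];
std::valarray<float> scale_add(const std::valarray<float>& b,
                               const std::valarray<float>& c,
                               float s) {
    return b * s + c;
}

// Select elements 0, 2, 4, ... of a.
// Array-notation equivalent (assumed form):  a[0:n:2]
// std::slice takes the same (start, length, stride) triple.
std::valarray<float> every_other(const std::valarray<float>& a) {
    const std::size_t count = (a.size() + 1) / 2;
    return a[std::slice(0, count, 2)];
}
```

In both notations the operation is expressed on the array as a whole, which gives the compiler a direct opportunity to generate vector instructions.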
Intel® Threading Building Blocks
• Concurrent Containers: Common idioms for concurrent access; a scalable alternative to a serial container with a lock around it
• Miscellaneous: Thread-safe timers
• Generic Parallel Algorithms: Efficient, scalable way to exploit the power of multi-core without having to start from scratch
• Task Scheduler: The engine that powers the parallel algorithms, using task stealing to maximize concurrency
• Synchronization Primitives: User-level and OS wrappers for mutual exclusion, ranging from atomic operations to several flavors of mutexes and condition variables
• Memory Allocation: Per-thread scalable memory manager and false-sharing-free allocators
• Threads: OS API wrappers
• Thread Local Storage: Scalable implementation of thread-local data that supports an unlimited number of TLS variables
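To make "without having to start from scratch" concrete, the sketch below shows roughly what a developer writes by hand for a simple parallel sum when no generic parallel algorithm is available. It uses only standard C++ threads and is not the Intel® Threading Building Blocks API. The function name `parallel_sum` and the fixed worker count are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hand-rolled parallel reduction: split the input into one contiguous chunk
// per worker, sum each chunk on its own thread, then combine the partial sums.
long long parallel_sum(const std::vector<int>& data, unsigned num_workers = 4) {
    if (num_workers == 0) num_workers = 1;
    std::vector<long long> partial(num_workers, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_workers - 1) / num_workers;

    for (unsigned w = 0; w < num_workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, w, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            // Each worker writes only its own slot, so no lock is needed.
            partial[w] = std::accumulate(first, last, 0LL);
        });
    }
    for (auto& t : workers) t.join();  // wait for every worker to finish
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Chunking, thread management, and combining results are all manual here, and the static split cannot rebalance uneven work. Those are the concerns the slide assigns to the generic parallel algorithms and the task-stealing scheduler.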
Intel® Array Building Blocks
[Diagram: single-source ArBB kernels in a “serial” C++ app pass through a standard C++ compiler and the ArBB runtime. Labels: Templates; Overloaded operators; Links with dynamic library; Dynamic compiler; Threading and heterogeneous runtime. Targets shown: Sequential, CPU, Intel® SSE**, Intel® AVX**, Future.]
Where is my application…
Spending Time?
• Focus tuning on functions taking time
• See call stacks
• See time on source
Wasting Time?
• See cache misses on your source
• See functions sorted by # of cache misses
Waiting Too Long?
• See locks by wait time
• Red/Green for CPU utilization during wait
Intel® VTune™ Amplifier XE Performance Profiler
• Windows & Linux
• Low overhead
• No special recompiles
Advanced Profiling For Scalable Multicore Performance
“We improved the performance of the latest run 3 fold. We wouldn't have found the problem without something like Intel® VTune™ Amplifier XE.”
Claire Cates
Principal Developer, SAS Institute Inc.
New - Intel® Inspector XE
Combines memory and thread checking in one tool
• Finds hard-to-detect coding defects
– Memory leaks and memory corruption
– Threading data races and deadlocks
• Supports different implementations of threading
– Native threads and Intel® Parallel Building Blocks
• Works on standard builds and binaries
• Can identify over 250 security errors
• Advanced memory checking: Finds errors like memory leaks and corruption
• Thread checking: Finds data races and deadlocks
• Security static analysis: Finds security errors including buffer overruns and uninitialized variables
• Make changes to source code in context of error
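As an illustration of the threading defects named above, the sketch below shows a shared counter updated from several threads. The comment describes the racy pattern a thread checker would typically flag, and the code shows one possible fix using `std::atomic`. The function name `count_with_atomic` is hypothetical, and nothing here represents Intel® Inspector XE output.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Racy pattern (not compiled here): several threads executing `++counter`
// on a plain `int` with no synchronization. The read-modify-write steps can
// interleave, so increments are lost and the final value varies between runs.
//
// One possible fix: make the counter atomic so each increment is indivisible.
int count_with_atomic(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : threads) th.join();  // all increments are visible after join
    return counter.load();
}
```

A mutex around the increment would also remove the race. An atomic is used here because the shared state is a single integer.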
Optimization Notice
Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options." Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. 
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.
Notice revision #20101101
Array Building Blocks
• Generalized data-parallel programming model
• Supports wide variety of patterns and collections
• Supports explicit dynamic generation and management of code
• Implementation targets both threads and vector code
[Diagram: ArBB architecture. An application calls the ArBB APIs through the C++ API or other language bindings. Below them sits a virtual machine (virtual ISA, debug/services, memory manager, backend JIT compiler, threading runtime) targeting CPU, accelerator, and future hardware. Callouts: machine-independent optimization; offload management; machine-specific code generation and optimizations; scalable threading runtime.]