Introduction to Intel® Cluster Studio XE · community.hartree.stfc.ac.uk/access/content/group/admin... · 2013-04-15
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. This document contains information on products in the design phase of development. All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
Intel, VTune, Cilk, Xeon and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others
• Intel® Cluster Studio XE is a collection of tools for developing, debugging, tuning and maintaining HPC applications
• Intel® Inspector XE and Intel® VTune™ Amplifier XE extend the Cluster Tool Kit. These tools are especially useful for hybrid applications that combine shared-memory threading with distributed-memory message passing
• Intel® Cluster Studio XE includes the Intel C, C++ and Fortran compilers
• Intel® Composer XE offers many features that go beyond a classical compiler, for example correctness checking and libraries such as the Intel® Math Kernel Library (MKL)
• The Intel Fortran compiler implements key Fortran 2008 features such as Coarray Fortran, and an almost complete implementation of Fortran 2003
• ITAC (Intel® Trace Analyzer and Collector) is a tool for understanding the behavior of Intel® MPI programs, finding bottlenecks and analyzing performance
• ITAC is more than a profiler: it visualizes the temporal behavior of MPI routines, showing dependencies and load imbalances
• A correctness checking library is also available
• ITAC is easy to use: invoke it with an extra flag to mpirun/mpiexec or by setting an environment variable, without changing your application or your run scripts
• Intel® VTune™ Amplifier XE combines classical program performance analysis and thread profiling
• For hybrid HPC applications, the threaded part can be investigated with Intel® VTune™ Amplifier XE. Each MPI process can be profiled, with some limitations on the event counters
• A result directory is generated for each MPI rank for further analysis in the Intel® VTune™ Amplifier XE GUI. Result generation can be limited to a subset of ranks
• A simple test program is part of the Intel® MPI (IMPI) distribution, with versions for C, C++ and Fortran. We use the C version:
$ cp $I_MPI_ROOT/test/test.c .
$ mpiicc -o test.x test.c
mpiicc is the Intel MPI compiler wrapper script for icc; the equivalent for gcc is mpicc. The wrappers mpiifort, mpiicpc and mpicxx are also available
• Running $ mpirun -n <nprocs> ./test.x works on a single node and on systems that set up the node file automatically. For more nodes we usually need to define a hostfile with one node name per line and pass it to mpirun: $ mpirun -f <hostfile> -n <nprocs> ./test.x
• test.x prints out rank and hostname for each MPI process
• More information about the basic settings of this run can be obtained by setting $ export I_MPI_DEBUG=5 and running the program again. The environment variable is propagated to all ranks automatically.
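The hostfile and launch steps above can be wrapped in two small shell functions. This is a minimal sketch: write_hostfile and run_test are hypothetical helper names, node01/node02 are placeholder node names, and the MPIRUN variable is our own convention for swapping in a dry-run command such as `echo mpirun`.

```shell
# Source this file to define the helpers (nothing is launched on load).
# write_hostfile and run_test are hypothetical helper names for this sketch.

# write_hostfile FILE NODE... : write one node name per line into FILE
write_hostfile() {
    file=$1
    shift
    : > "$file"                          # create or truncate the hostfile
    for node in "$@"; do
        printf '%s\n' "$node" >> "$file"
    done
}

# run_test HOSTFILE NPROCS : launch ./test.x on the nodes listed in HOSTFILE
# Set MPIRUN='echo mpirun' to print the command instead of running it.
run_test() {
    ${MPIRUN:-mpirun} -f "$1" -n "$2" ./test.x
}
```

Usage: `write_hostfile hosts.txt node01 node02` followed by `run_test hosts.txt 4`, then compare the hostnames printed by test.x with the hostfile.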
• By default, Intel® MPI pins processes to cores, sockets and nodes. The default strategy might change in the future, but users can enforce different mappings
• The easiest way is to use the processes-per-node flag: $ mpirun -ppn <nprocs-per-node> -n <nprocs> ./test.x
With <nprocs-per-node> = 1 the placement is round robin: each successive process goes to the next node. The effect of other settings can be explored with test.x
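To reason about where ranks should land, the -ppn placement can be modelled with integer arithmetic. This sketch assumes that ranks fill each node with <nprocs-per-node> consecutive ranks in hostfile order and wrap around to the first node once every node is full; node_of_rank and print_mapping are hypothetical helpers, so compare their prediction with what test.x actually prints.

```shell
# Source this file to define the helpers.
# Model (an assumption, not taken from the Intel MPI documentation): ranks fill
# each node with PPN consecutive ranks in hostfile order, wrapping around to the
# first node once all NNODES nodes are full. Node indices are 0-based.

# node_of_rank RANK PPN NNODES : print the predicted node index for RANK
node_of_rank() {
    echo $(( ($1 / $2) % $3 ))
}

# print_mapping NPROCS PPN NNODES : print one "rank R -> node N" line per rank
print_mapping() {
    r=0
    while [ "$r" -lt "$1" ]; do
        echo "rank $r -> node $(node_of_rank "$r" "$2" "$3")"
        r=$((r + 1))
    done
}
```

For example, `print_mapping 4 1 2` models the round-robin case (nodes 0, 1, 0, 1), while `print_mapping 4 2 2` places ranks 0 and 1 on node 0 and ranks 2 and 3 on node 1.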
• ITAC can be applied without modifying the program or its environment. One way to get a first trace is: $ mpirun -trace -n <nprocs> ./test.x
• Alternatively, set the preload library and run without the -trace flag:
$ export LD_PRELOAD=libVT.so
$ mpirun -f <hostfile> -n <nprocs> ./test.x
This is what the -trace flag does internally. The approach is useful with complex run scripts where it is not obvious where mpirun is actually executed. Note: this does not work with a statically linked Intel® MPI (not recommended).
• The Correctness Checker validates the correctness of MPI usage. It uses a different library but is started like ordinary ITAC: $ mpirun -check -n <nprocs> ./test.x or $ export LD_PRELOAD=libVTmc.so
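Because ITAC is activated through LD_PRELOAD, an existing run script can be traced or checked without editing it. The sketch below uses the library names from the text (libVT.so for tracing, libVTmc.so for correctness checking); the with_itac function and the run_poisson.sh script in the usage line are hypothetical, and the libraries must be on the dynamic loader's search path, for example after the ITAC environment has been set up.

```shell
# Source this file to define the helper. with_itac is a hypothetical name.
# The library names are those given in the text; they must be found by the
# dynamic loader (e.g. after the ITAC environment has been set up).

# with_itac MODE COMMAND [ARGS...] : run COMMAND with ITAC preloaded
#   MODE "trace" -> libVT.so   (trace collection)
#   MODE "check" -> libVTmc.so (MPI correctness checking)
with_itac() {
    case $1 in
        trace) lib=libVT.so ;;
        check) lib=libVTmc.so ;;
        *) echo "usage: with_itac trace|check command [args...]" >&2
           return 2 ;;
    esac
    shift
    # Prepend the ITAC library to any existing LD_PRELOAD entries. The prefix
    # assignment exports it to COMMAND (and, through mpirun, to the ranks)
    # without changing the calling shell's environment.
    LD_PRELOAD="$lib${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
}
```

Usage: `with_itac trace ./run_poisson.sh` traces whatever mpirun call the script makes, and `with_itac check ./run_poisson.sh` runs the correctness checker instead. As with the plain LD_PRELOAD approach, this does not work with a statically linked Intel® MPI.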
• Intel® Inspector XE offers memory checking and correctness checking for threaded applications. For MPI applications it can be used as follows:
The command-line version inspxe-cl is used as the MPI executable. Lab example: $ mpirun -n 4 inspxe-cl --result-dir insp_mi3 --collect mi3 -- ./poisson.x
• After the MPI program has run, result directories appear with the base name given above, indexed by MPI rank (insp_mi3.0, insp_mi3.1, ...).
• Results can be viewed as ASCII output:
$ inspxe-cl --report problems -r insp_mi3.0
or in the Intel® Inspector XE GUI:
$ inspxe-gui insp_mi3.0
Results can also be transferred to a Windows* computer and viewed with the Windows* version of Intel® Inspector XE
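With one result directory per rank, it is convenient to generate all text reports in one go. This sketch assumes the <base>.<rank> directory naming of the Inspector XE lab example; report_all_ranks is a hypothetical helper and INSPXE_CL is our own override hook for a dry run.

```shell
# Source this file to define the helper. report_all_ranks is a hypothetical
# name; it assumes result directories are named <base>.<rank>.

# report_all_ranks BASE : for each result directory BASE.<rank>, write the
# "problems" report to BASE.<rank>.txt
# Set INSPXE_CL='echo inspxe-cl' to print the commands instead of running them.
report_all_ranks() {
    for dir in "$1".*; do
        [ -d "$dir" ] || continue        # skip report files and unmatched globs
        ${INSPXE_CL:-inspxe-cl} --report problems -r "$dir" > "$dir.txt"
    done
}
```

Usage: `report_all_ranks insp_mi3` writes insp_mi3.0.txt, insp_mi3.1.txt and so on next to the result directories.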
• Intel® VTune™ Amplifier XE provides detailed information on timings and core events. It can also provide insight into the behavior of threaded applications:
The command-line version amplxe-cl is used as the MPI executable. Lab example: $ mpirun -n 4 amplxe-cl --result-dir axe_ho -collect hotspots -- ./poisson.x
• After the MPI program has run, result directories appear with the base name given above, indexed by MPI rank.
• Results can be viewed as ASCII output:
$ amplxe-cl --report hotspots -r axe_ho.0
or in the Intel® VTune™ Amplifier XE GUI:
$ amplxe-gui axe_ho.0
Results can also be transferred to a Windows* laptop and viewed with the Windows* version of Intel® VTune™ Amplifier XE
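As mentioned in the overview, result generation can be limited to a subset of ranks. One way to do this, assuming mpirun's MPMD syntax (argument sets separated by ":"), is to launch amplxe-cl for the first ranks only and the plain executable for the rest. profile_first_ranks is a hypothetical helper; the result directory and collection type are taken from the lab example above.

```shell
# Source this file to define the helper. profile_first_ranks is a hypothetical
# name. The MPMD form "mpirun <set1> : <set2>" starts the argument sets in
# order, so the first NPROF ranks run under amplxe-cl and the remaining ranks
# run EXE without the collector.

# profile_first_ranks NPROF NTOTAL EXE
# Set MPIRUN='echo mpirun' to print the command instead of running it.
profile_first_ranks() {
    nprof=$1 ntotal=$2 exe=$3
    if [ "$ntotal" -le "$nprof" ]; then
        echo "NTOTAL must be larger than NPROF" >&2
        return 2
    fi
    ${MPIRUN:-mpirun} -n "$nprof" amplxe-cl --result-dir axe_ho -collect hotspots -- "$exe" \
        : -n $((ntotal - nprof)) "$exe"
}
```

Usage: `profile_first_ranks 1 4 ./poisson.x` profiles rank 0 only and runs ranks 1 to 3 without the collector, so result data is collected for a single rank.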
• A short overview of Intel® Cluster Studio XE, with one slide per tool showing the history and HPC-related purpose of each tool in the suite
• A short introduction to each tool, with explicit command lines for the simplest usage scenarios. These introductions can be viewed as “Quick Start Guides”; the emphasis is on simplicity.
• Support for hybrid threading + MPI applications through the inclusion of new MPI-aware versions of Intel® Inspector XE and Intel® VTune™ Amplifier XE