Introduction to the Intel tracing tools (blogs.fau.de)
R. Bader (LRZ), M. Deilmann (Intel)
March 2008, ©2005-8 LRZ, RRZE and Intel
Page 1:

Introduction to the Intel tracing tools

R. Bader (LRZ)

M. Deilmann (Intel)

Page 2:

©2005-8 LRZ, RRZE and Intel, March 2008

Help - my parallel program doesn’t scale!

Isolate performance issues in parallel programs

... especially those in large MPI programs

- inefficient MPI programming
- bottlenecks
- latency-dominated performance
- load imbalance
- deadlock (hmmm ...)

... but also those caused by time spent in subroutines

Existing facilities:
- gprof for subroutines
- PMPI profiling interface (MPI standard!) + upshot

... have problems with large parallel programs: Trace files cannot be easily analyzed
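
The PMPI profiling interface mentioned above works by interposition: a tracing library provides its own MPI_Send that records timing and then forwards to the real implementation via PMPI_Send. Below is a minimal Python sketch of the same interposition idea; the names `traced` and `send` are illustrative only, not part of any MPI binding:

```python
import time
import functools

def traced(fn, log):
    """Wrap fn so each call is timed and recorded before forwarding,
    analogous to a wrapper MPI_Send forwarding to PMPI_Send."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # record (routine name, elapsed time) even if fn raises
            log.append((fn.__name__, time.perf_counter() - t0))
    return wrapper

def send(payload):
    # stand-in for the real communication routine
    return len(payload)

log = []
send = traced(send, log)   # divert calls through the profiling wrapper
send(b"hello")
print(log[0][0], log[0][1] >= 0.0)  # -> send True
```

The tracing libraries apply this kind of pattern to the intercepted MPI routines at link time.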

Page 3:

Intel Tracing Tools

Solve the above problem:
- integration of subroutine and MPI profiling
- originally developed by ZI für Mathematik / FZ Jülich
- marketed by Pallas, now by Intel

Platforms:
- x86 (Solaris & Linux)
- IPF, EM64T (Linux)

Future development on non-Intel platforms: unclear (see Vampir NG, FZ Jülich / ZIH Dresden)

Supercomputers: Altix 3700 and 4700, NEC SX-4/5/6/8, CRAY

Version installed at LRZ:
- campus license
- GUI version 7.1
- tracing library version 7.1

At RRZE: not available (the trace libraries are rather expensive)

Page 4:

Documentation and resources

Intel web site: http://www.intel.com/cd/software/products/asmo-na/eng/cluster/tanalyzer

Note the name mapping:
- VAMPIR → Intel Trace Analyzer
- VAMPIRtrace → Intel Trace Collector

LRZ web site: http://www.lrz-muenchen.de/services/software/parallel/vampir

See especially the links to the user's guides at the end of this document, and the specific usage instructions for each platform the tracing libraries are available on.

Page 5:

Basic usage of the tracing libraries and the GUI

Page 6:

Two Components of ITT

1. Instrumentation

MPI library calls are diverted to the profiling interface. This is achieved by specifying the -vtrace switch to the mpif90/mpicc commands.

- this is an LRZ-specific setting, to make things work uniformly across platforms
- with Intel MPI, use -t=log instead

In most cases, relinking is sufficient.

Additional functionality (subroutine tracing, switching tracing on and off) requires source code changes and recompilation.

2. Visualization

during/after the program run, a tracefile is written to disk

may be visualized on any platform using the ITA GUI

many views of the trace data available via a transparent menu structure

hundreds of MPI processes viewable with good performance

the STF (Structured Trace File) format helps, but still leaves a lot of data

Page 7:

Step 1: Instrument your Code

Example: MPI heat conduction code
- uses non-blocking sends and receives as well as reduction operations
- sets up MPI data types and a topology (MPI_CART_CREATE)

set up environment: module load mpi_tracing

recompile code (for inclusion of subroutine tracing):

  mpif90 -vtrace -c <lots_of_options> <source_file>

relink the binary:

  mpif90 -vtrace -o heat_mpi.exe <lots_of_objects>

(heat_mpi.exe is the name of our application)

Warning: do not specify MPI libraries at your own discretion; this ruins the link sequence set up by -vtrace!

Page 8:

Step 2: Prepare Configuration File

Edit a file vt_heat to contain the following lines:

  # Log file
  LOGFILE-NAME heat.stf
  LOGFILE-FORMAT STF
  # disable all MPI activity
  ACTIVITY MPI OFF
  # re-enable selected MPI calls only
  SYMBOL MPI_WAITALL ON
  SYMBOL MPI_IRECV ON
  SYMBOL MPI_ISEND ON
  SYMBOL MPI_BARRIER ON
  SYMBOL MPI_ALLREDUCE ON
  # enable all activities in the Application class
  ACTIVITY Application ON

Page 9:

Step 3: Run the application

Set up environment:

  module load mpi_tracing    (if not yet done)
  export VT_CONFIG=vt_heat

Run the executable:

  mpirun -np 4 ./heat_mpi.exe

Upon completion you should get the message

  [0] Intel Trace Collector INFO: Writing tracefile heat.stf in /home/cluster/a2832ba/Kurse/ParProg_2005/vt/heat

and a number of files heat.stf* should exist.

To save disk space, convert to the STFSINGLE format:

  stftool heat.stf --convert - --logfile-format STFSINGLE | \
    gzip -c > heat.stfsingle.gz

Do not use this format for viewing!

Page 10:

Step 4: Start the GUI

X11R5 or higher is needed

Is X authentication working? (Use ssh -X ...)

Now enter

  traceanalyzer pmatmul.stf

The Qt-based Trace Analyzer GUI should now start up, showing the following widgets:

Main window; inside the main window: a panel referring to the presently analyzed tracefile. There can be more than one panel/tracefile in the main window at any time.

Page 11:

And this is what things look like at start. Chart: Function Profile (flat profile).

Page 12:

Overall load balance

times are inclusive of traced subroutine calls

Page 13:

Resolving the MPI calls: right-click → Ungroup Group MPI

can sort by category

Times for each MPI routine specified in the config file

Page 14:

Call tree

Notes:

call tree can only resolve activities switched on for tracing

By default, user code is not resolvable unless automatic or manual subroutine tracing is compiled in

(automatic subroutine tracing not yet available on Intel)

Page 15:

Call tree split up into user processes

Select “Children of Group All_Processes”

You can also select expansion into various groups, at least two of which are defined:
- MPI
- Application

(click the dotted circle)

Page 16:

... call up function group editor

Select e.g., one of the major function groups and press OK

You will then essentially filter out all other groups from the view ...

Page 17:

... like so:

Page 18:

Now a different view: the timeline of all MPI processes

Select Charts → Event Timeline

this will open an additional pane in the tracing subwindow

then zoom in to some region of interest by using the left mouse button

do this repeatedly to obtain ...

Page 19:

... this small, 0.0001 second trace section

- red parts (different shades): the various MPI activities
- blue parts: user code
- black lines: communication; left-click for context menu → Message Properties ... to obtain further information about this message

Note that the profile window (below) adjusts itself to the selected timeslice!

Page 20:

Next chart option: Qualitative Timeline

This gives cosynchronous information on, e.g., transfer rates.

other quantities selectable via the context menu:
- transfer duration
- transfer volume

selectable events can be:
- function events
- messages (as here)
- collective operations

Page 21:

Quantitative Timeline: Accumulated measure of activities

This answers the question:

How many CPUs are presently engaged in each activity?

Example: the yellow arrow shows an interval where the application does communication exclusively.

Note: you can remove activities via the context menu if the view is too cluttered.

Page 22:

The final chart type: Message Profile

gives you these metrics:
- message volume
- throughput
- time
- count

for senders (horizontal) vs. receivers (vertical)

Note:
- color codes enable you to easily find sore points
- non-dense communication patterns are good
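
The message profile chart is essentially a 2-D reduction of the point-to-point records in the trace: accumulate a metric per (sender, receiver) pair. A small Python sketch of that aggregation (the event tuples are made-up sample data, and `message_profile` is a hypothetical helper, not an ITC function):

```python
from collections import defaultdict

def message_profile(events, procs):
    """Accumulate message volume per (sender, receiver) pair,
    like the sender-vs-receiver matrix of the Message Profile chart."""
    volume = defaultdict(int)
    for src, dst, nbytes in events:
        volume[(src, dst)] += nbytes
    # senders horizontal (columns), receivers vertical (rows), as in the GUI
    return [[volume[(s, r)] for s in procs] for r in procs]

events = [(0, 1, 4096), (0, 1, 4096), (1, 0, 512), (2, 3, 1024)]
matrix = message_profile(events, procs=range(4))
print(matrix[1][0])  # volume sent from rank 0 to rank 1 -> 8192
```

A dense row or column in this matrix is exactly the kind of "sore point" the color coding makes visible.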

Page 23:

Some advanced features:

The Vampirtrace API

Controlling tracefile output

MPI message checking

Page 24:

User-level instrumentation: The ITC API (1)

Subroutine library:
- control the profiling process from within the program
- define your own activity classes and their members
- define performance counters (not discussed here)

Include files:
- Fortran: VT.inc
- C: VT.h

Fortran calls differ from C calls; this presentation refers to the Fortran calls. See the user's guide for further information.

The VT API has changed over time; existing instrumentation may need updating for newer releases.

Page 25:

User-level instrumentation: The ITC API (2)

Initialize / finalize:
- VTINIT(ierr) is automatically called from MPI_INIT(...)
- for tracing of non-MPI programs an explicit call is required (use libVTsp.a if available)

- VTFINI(ierr) is called from MPI_FINALIZE(...)

Control:
- VTTRACEON(), VTTRACEOFF(): switch tracing on/off
- VTFLUSH(ierr): write the memory buffers to the flush file

User-defined states: group subroutine calls into a class of activities ...

  call VTCLASSDEF( &
       'mylib', mylib_handle, ierr)

... and then add symbols for each state (subroutine):

  call VTFUNCDEF( &
       'mystate1', mylib_handle, &
       mystate1_handle, ierr)

(etc.) ... until the group is complete. The handles are default integers.

Page 26:

User-level instrumentation: The ITC API (3)

Actual measurement: start with

  call VTBEGIN(mystate1_handle, ierr)

and stop with

  call VTEND(mystate1_handle, ierr)

Nested calls are possible ... but no overlaps!

  program prog
    include 'VT.inc'
    ...
    [ call vttraceoff() ]
    call mpi_init(...)
    call vtclassdef( &
         'mylib', mylib_handle, ierr)
    call vtfuncdef('name', &
         mylib_handle, name_handle, ierr)
    [ call vttraceon(ierr) ]
    ...

  subroutine name(....)
    include 'VT.inc'
    (declarations)
    call vtbegin(name_handle, ierr)
    (executable statements)
    call vtend(name_handle, ierr)

(name_handle is a global variable)

Page 27:

Controlling tracefile output (1)

Tracefile production: trace data are stored in memory buffers, controlled by configuration entries:
- MEM-BLOCKSIZE: size of the buffers (default: 64 kByte)
- MEM-MAXBLOCKS: maximum number of memory buffers (default: 0 = unlimited)

VT should not use too much memory (this may disrupt the application!), but then exhaustion of the memory buffers can happen after some time ⇒ what to do? And what about applications which hang or crash?

Environment variables:
- VT_CONFIG: name of the configuration file
- VT_CONFIG_RANK: rank of the MPI process that reads the configuration file and writes the trace file (default: MPI process 0)

Always specify the name and format of the log file in the configuration file! To change the rank of the process writing the trace file, use a LOGFILE-RANK <number> entry in the configuration file.
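
Putting the buffer and log-file entries together, a VT configuration file might look like the following sketch (the file name and the buffer values are example values only, not recommendations):

  # example VT configuration
  LOGFILE-NAME heat.stf
  LOGFILE-FORMAT STF
  LOGFILE-RANK 0
  MEM-BLOCKSIZE 65536
  MEM-MAXBLOCKS 256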

Page 28:

Controlling tracefile output (2)

Coping with buffer overrun:

a) Flush data to disk: the default action (AUTOFLUSH is on); possibly use a suitable MEM-FLUSHBLOCKS value to trigger background flushing.

b) Overwrite from the beginning: only the last part of the tracefile is written (AUTOFLUSH off, MEM-OVERWRITE on).

c) Stop trace collection: only the first part of the tracefile is written (AUTOFLUSH off, MEM-OVERWRITE off, the default).

Coping with tracefile size: long runs may produce tens (or hundreds) of GBytes! STF reduces the difficulties with visualizing this.

Frame definition (not discussed here)

Reduction of size:
- insert VTTRACEON() and VTTRACEOFF() calls
- use activity/symbol filtering via the configuration file (e.g., no administrative MPI calls)
- use option b) or c) shown above
- trace a subset of processes, e.g.:

    PROCESS 0:N OFF
    PROCESS 0:N:3 ON
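
Several of these size-reduction measures can be combined in one configuration file; a sketch (the symbol and process selections are example values only):

  AUTOFLUSH OFF
  MEM-OVERWRITE ON
  ACTIVITY MPI OFF
  SYMBOL MPI_ISEND ON
  SYMBOL MPI_IRECV ON
  PROCESS 0:N OFF
  PROCESS 0:N:3 ON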

Page 29:

Controlling tracefile output (3): treatment of crashing applications

In this case, replace the -vtrace switch by -vtrace_fs ("failsafe tracing", a replacement of the tracing library), presently available only for Fortran on the Altix.

Failures handled:
- signals (SIGINT, SIGTERM); SIGKILL is not caught
- premature exit of processes without MPI_Finalize
- MPI errors (communication problems, wrong parameters)

What is done?
- freezes the MPI processes
- writing is done via TCP sockets
- SIGINT is sent after writing the data

Page 30:

Controlling tracefile output (4): treatment of hanging applications

Deadlock detection is performed automatically: if ITC observes no progress in any process for a certain amount of time, it assumes a deadlock, stops the application and writes a trace file. The timeout is configurable via DEADLOCK-TIMEOUT.

"No progress" is defined as "inside the same MPI call"; obviously this is just a heuristic approach and may fail: if all processes remain inside MPI for a long time, e.g. due to a long data transfer, the timeout might be reached and cause a premature abort. The default timeout is 5 minutes.

Page 31:

Further functionality

Recording source locations:
- enables you to dive into the source from a message line or function in the GUI
- potentially very large performance overhead
- presently only for GCC-based MPI

Automatic subroutine tracing:
- the Application class can be ungrouped
- no manual instrumentation needed
- use the -tcollect switch in addition to -vtrace (Intel compiler 10.0 or higher is needed)

Page 32:

MPI message checking

Error detection for MPI code:
- also makes use of the MPI profiling interface
- completely recompile the application; no static linkage
- supported module stack on LRZ systems:

    module unload mpi_tracing
    module unload mpi.parastation mpi.altix
    # may need to unload further modules
    module load mpi.intel
    module load mpi_tracing

Run the executable with LD_PRELOAD set:

    mpiexec -genv LD_PRELOAD libVTmc.so \
            -n [No. of tasks] ./myprog.exe

The report is written to stderr; check the lines marked ERROR or WARNING.

Further environment variables for execution control (use -genv to propagate them):
- VT_DEADLOCK_TIMEOUT (default 60 sec)
- VT_DEADLOCK_WARNING (default 300 s)
- VT_CHECK_MAX_ERRORS (default 1)
- VT_CHECK_MAX_REPORTS (default 0 = unlimited)
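
When adjusting the checking limits, the control variables can be passed on the same command line via -genv; a sketch (the variable values shown are examples only, not recommendations):

  mpiexec -genv LD_PRELOAD libVTmc.so \
          -genv VT_CHECK_MAX_ERRORS 0 \
          -genv VT_DEADLOCK_TIMEOUT 120 \
          -n 8 ./myprog.exe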

Page 33:

Message checking: Example output for insufficient buffering

[0] INFO: CHECK LOCAL:EXIT:SIGNAL ON
... many more info lines

[0] ERROR: LOCAL:BUFFER:INSUFFICIENT_BUFFER: error
[0] ERROR: Buffer [0x6000000000056920, 0x600000000005698e[ of size 110 cannot store message of size 111.
[0] ERROR: Free space [0x6000000000056920, 0x600000000005698e[, 110 bytes.
[0] ERROR: Check buffer handling (use larger buffer in MPI_Buffer_attach(),
[0] ERROR: receive the oldest message(s) to free up space before buffering new ones,
[0] ERROR: check for race conditions between buffering and receiving messages, etc).
[0] ERROR: Note that message sizes are calculated using the worst-case scenario that
[0] ERROR: the application has to be prepared for: MPI_Pack_size() + MPI_BSEND_OVERHEAD.
[0] ERROR: New message of size 111 was to be sent by:
[0] ERROR: MPI_Bsend(*buf=0x6006480, count=16, datatype=MPI_CHAR, dest=1, tag=100, comm=MPI_COMM_WORLD)

[0] ERROR: testpairs (/home/cluster/a2832ba/size.c:85)
[0] ERROR: wrongbuffer (/home/cluster/a2832ba/size.c:358)
[0] ERROR: main (/home/cluster/a2832ba/size.c:378)
[0] ERROR: __libc_start_main (/lib/libc-2.4.so)
[0] ERROR: _start (/home/cluster/a2832ba/size)
[0] INFO: 1 error, limit CHECK-MAX-ERRORS reached => aborting

points out source lines and faulty MPI call

provides recommendations on fixing the problem


Page 34:

MPI checking: Final remarks

Further possible environment settings: deduce them from <program>.prot.

Deadlock detection for a large number of MPI tasks: you may need to readjust the timeouts, else large messages may provoke a spurious deadlock report.

Documentation: check the Intel Trace Collector User's Guide; on x86, you can also integrate Valgrind for memory checking.
