
11/17/02


PAPI and Dynaprof

Application Signatures and Performance Analysis of Scientific Applications

Philip J. Mucci

Innovative Computing Laboratory, UTK

Performance Evaluation Research Center, LBL

[email protected]

http://icl.cs.utk.edu/~mucci/dynaprof/snapshots/sc2002.ppt


Goals

● Understanding the behavior of the application
  – Identification of bottlenecks.
  – Usage of the hardware resources.
  – Effects of that usage on performance.
● Using Dynaprof to achieve that goal
  – Command line usage
  – 3 Dynaprof probes
    ● Wallclock time
    ● Hardware performance counters
    ● Resource usage traces


Motivation

● Optimize the application's performance.
● Evaluate the algorithm's efficiency.
● Generate an application signature.
  – A collection of data that represents the major terms in the performance model.
● Develop a performance model.


Overview of Hardware Counters

● Data is NOT PORTABLE, but PAPI is...
● Small number of registers dedicated to performance monitoring functions.
  – AMD Athlon, 4 counters
  – Pentium <= III, 2 counters
  – Pentium IV, 18 counters
  – IA64, 4 counters
  – Alpha 21x64, 2 counters
  – Power 3, 8 counters
  – Power 4, 8 counters to a group
  – UltraSparc II, 2 counters
  – MIPS R14K, 2 counters


Applications used in this Tutorial

● Serial:
  – FSPX: A binary alloy solidification benchmark.
  – SWIM: The SPEC shallow water benchmark.
● Parallel (MPI):
  – Ex19 from the PETSc distribution.
  – Solves nonlinear driven cavity with multigrid. A 2D driven cavity problem solved in a velocity-vorticity formulation.


FSPX Execution Environment

● Intel PIII, 1.2 GHz
  – FP Results/Clock: 1 (1.2 Gflips)
    ● 4 SP/clk with SSE, 2 DP/clk with SSE2
  – Caches: 16K/16K, 256K
● G77 version 2.96 -g -O -malign-double -mpentiumpro -funroll-loops -fexpensive-optimizations
● Execution time:

> /bin/time fspx

115.370u 0.030s 1:58.17 97.6% 0+0k 0+0io 162pf+0w
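
The percentage in the /bin/time line is CPU time (user plus system) divided by elapsed wall-clock time. The following is a minimal sketch of that arithmetic using the fspx numbers above; the helper names `parse_clock` and `cpu_utilization` are made up for illustration and are not part of any tool in this tutorial.

```python
def parse_clock(text):
    """Convert an elapsed time such as '1:58.17' or '0:15' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_utilization(user, system, elapsed):
    """Percentage of wall-clock time spent on the CPU."""
    return 100.0 * (user + system) / elapsed


# Values copied from the /bin/time output for fspx above.
fspx_elapsed = parse_clock("1:58.17")
fspx_util = cpu_utilization(115.370, 0.030, fspx_elapsed)
print(f"fspx: {fspx_elapsed:.2f} s elapsed, {fspx_util:.2f}% CPU")
```

The result lands just above the 97.6% that /bin/time reports, which suggests the tool truncates rather than rounds. The same formula applied to the ex19 line later in the deck (0.520u, 0.200s, 0:44.18) gives roughly 1.6%.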


swim Execution Environment

● IBM Nighthawk, 16-way Power 3, 375 MHz
  – FP Results/Clock: 4 (1.5 Gflips)
  – Caches: 32K/64K, 8MB
  – MPI over TCP/IP via switch
● Xlc 5.0.2.1 built with -g -O3 -qstrict -qarch=pwr3 -qtune=pwr3
● Execution time:

> /bin/time poe swim -procs 2

0.4u 0.0s 0:15 3% 217+3933k 0+0io 1pf+0w


ex19 Execution Environment

● IBM Nighthawk, 16-way Power 3, 375 MHz
  – FP Results/Clock: 4 (1.5 Gflips)
  – Caches: 32K/64K, 8MB
● Xlc 5.0.2.1 built with -g
● Execution time:

> /bin/time poe ex19 -procs 2 -da_grid_x 56 -da_grid_y 56

0.520u 0.200s 0:44.18 1.6% 297+3580k 0+0io 0pf+0w
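
The peak rates quoted on the execution environment slides follow from multiplying floating point results per clock by the clock frequency. A small sketch of that calculation, with `peak_gflips` as a hypothetical helper name:

```python
def peak_gflips(results_per_clock, clock_hz):
    """Peak floating point results per second, in units of 1e9."""
    return results_per_clock * clock_hz / 1e9


# Figures taken from the execution environment slides.
pentium3_peak = peak_gflips(1, 1.2e9)   # Intel PIII, 1.2 GHz
power3_peak = peak_gflips(4, 375e6)     # IBM Power 3, 375 MHz
print(f"PIII: {pentium3_peak:.1f} Gflips, Power 3: {power3_peak:.1f} Gflips")
```

These are upper bounds only; the counter data gathered later in the tutorial is what shows how far the applications fall below them.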


Gprof

● Gathers timer interrupts vs. text address.
● Recompile with -p option.
● Gprof profile is useful for a high-level overview.
● Does it tell us why?


Gprof Profile of FSPX

%time  cumulative   self      calls  ms/call  tot/call  name
21.71       18.93  18.93       6080     3.11      3.11  flux_
19.99       36.36  17.43       9124     1.91      3.91  proflux_
 8.26       43.56   7.20       6080     1.18      1.18  pde_
 8.11       50.63   7.07       6080     1.16      4.17  phase_
 7.96       57.57   6.94  100061386     0.00      0.00  cplintg_
 7.46       64.08   6.51  100061388     0.00      0.00  cpsintg_
 6.05       69.36   5.28   49807360     0.00      0.00  tsofx_
 5.60       74.24   4.88   49807362     0.00      0.00  tlofx_
 4.07       77.79   3.55   62202877     0.00      0.00  cpl_
 2.44       79.92   2.13   37371906     0.00      0.00  cps_
 1.67       81.38   1.46   37371904     0.00      0.00  hl_
 1.43       82.63   1.25   37371904     0.00      0.00  hs_
 1.07       83.56   0.93   24903681     0.00      0.00  elqds_
 0.89       84.34   0.78   37371904     0.00      0.00  aks_
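
A flat profile like this is usually read by asking how much of the run the first few functions account for. The sketch below sums the %time column for the top entries; the rows are copied from the profile above, and `top_share` is an illustrative helper, not a gprof feature.

```python
# (percent of time, function) rows copied from the gprof flat profile above.
FLAT_PROFILE = [
    (21.71, "flux_"),
    (19.99, "proflux_"),
    (8.26, "pde_"),
    (8.11, "phase_"),
    (7.96, "cplintg_"),
    (7.46, "cpsintg_"),
    (6.05, "tsofx_"),
    (5.60, "tlofx_"),
]


def top_share(profile, n):
    """Sum the %time of the n most expensive functions."""
    ranked = sorted(profile, reverse=True)
    return sum(percent for percent, _ in ranked[:n])


print(f"top 4 share: {top_share(FLAT_PROFILE, 4):.2f}%")
```

For this profile the four largest entries are flux_, proflux_, pde_ and phase_, which together cover a little over half of the sampled time.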


FSPX: Top 4 functions

● Top 4 functions make up more than 50% of execution time (21.71 + 19.99 + 8.26 + 8.11 ≈ 58% in the gprof profile)
● In module update.F
  – flux
  – proflux
  – pde
● In module phase.F
  – phase
● Use the list command to explore modules and functions


Gprof Profile of SWIM

% time  cumulative seconds  self seconds  name
 37.3                 3.22          3.22  .calc2 [1]
 33.4                 6.10          2.88  .calc1 [2]
 24.7                 8.23          2.13  .calc3 [3]
  1.3                 8.34          0.11  .kickpipes [4]
  1.0                 8.43          0.09  .inital [5]


Gprof Profile of ex19

   %  cumulative   self  name
70.6       22.57  22.57  .MatLUFactorNumeric_SeqAIJ_Inode
 6.4       24.61   2.04  .MatFDColoringCreate_MPIAIJ [2]
 5.2       26.26   1.65  .MatSetValues_MPIAIJ [3]
 3.4       27.35   1.09  .MatLUFactorSymbolic_SeqAIJ [4]
 2.3       28.09   0.74  .MatSolve_SeqAIJ_Inode [5]
 2.3       28.82   0.73  .FormFunctionLocal [6]
 1.7       29.35   0.53  .memset [7]
 1.2       29.74   0.39  .MatSetValues [8]
 0.9       30.02   0.28  .MatFDColoringApply [9]
 0.7       30.24   0.22  .kickpipes [10]


Dynaprof Environment Variables

● LD_LIBRARY_PATH: Colon-separated list of directories in which to look for shared libraries. We need to find:
  – DynInst library
  – PAPI library
  – Any dependencies of the above (libperfctr.so, libcpc.so)
● DYNINSTAPI_RT_LIB: Full pathname of the DynInst runtime library.
● No settings necessary for AIX/DPCL port
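
As an illustration of these two settings, here is a minimal sketch that builds an environment for launching the tool from a script. The install directories and the runtime library filename are hypothetical placeholders, and `dynaprof_env` is an invented helper; only the two variable names come from the slide.

```python
import os


def dynaprof_env(lib_dirs, rt_lib, base=None):
    """Return a copy of the environment with the two variables set."""
    if not os.path.isabs(rt_lib):
        raise ValueError("DYNINSTAPI_RT_LIB must be a full pathname")
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_LIBRARY_PATH", "")
    # New directories go first so they take precedence over existing entries.
    parts = list(lib_dirs) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    env["DYNINSTAPI_RT_LIB"] = rt_lib
    return env


# Hypothetical install locations; substitute the real ones on your system.
env = dynaprof_env(
    ["/opt/dyninst/lib", "/opt/papi/lib"],
    "/opt/dyninst/lib/libdyninstAPI_RT.so.1",
    base={},
)
print(env["LD_LIBRARY_PATH"])
print(env["DYNINSTAPI_RT_LIB"])
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when starting dynaprof; on the AIX/DPCL port none of this should be needed.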


Running Dynaprof

● Usage:

dynaprof [-d] [serial_application]

● -d enables debugging output
● Specifying an application automatically loads it into the tool immediately after initialization.


Command Line Interface

● Uses GNU Readline library for input
● Full featured Command Line Editing
  – File and command completion: <Tab>
  – History: <Up>/<Down>
● Settings, macros and aliases in ~/.inputrc
● Allows Emacs or VI style bindings
  – set editing-mode emacs
  – set editing-mode vi
● See man page, TexInfo file or home page.


Load command

● Starts the application and stops it at the first instruction.

● Usage:

load <application> [args]

> dynaprof

(dynaprof) load tests/fspx


Poeload command

● For use with MPI applications on AIX and DPCL.
  – DPCL < 3.2.5 requires full path

● Usage:

poeload <application> [args]

(dynaprof) poeload tests/swim -procs 2


Mpiload command

● For use with MPI applications.
● Stops the application after it calls PMPI_Init().

● Mostly useful for script driven execution of MPI jobs

● Usage:

mpiload <application> [args]

(dynaprof) mpiload tests/mpicount


Attach command

● Attaches to a running application (or poe process) and stops it.

● Usage:

attach <application> <pid>

(dynaprof) ^Z

> tests/fspx &

[2] 17500

> fg

(dynaprof) attach tests/fspx 17500


Poeattach Command

● For use with MPI applications on AIX and DPCL.
  – DPCL < 3.2.5 requires full path

● Usage:

poeattach <application> <pid_of_poe>

(dynaprof) ^Z

> poe ex19 -da_grid_x 56 -da_grid_y 56 -procs 2 &

[2] 17500

> fg

(dynaprof) poeattach ex19 17500


List command

● list
  – List all modules in process
● list <pattern>
  – List all matching modules
● list <module>
  – List all functions in module
● list <module> <pattern>
  – List all matching functions in module
● list <module> <function>
  – List instrumentable points in function


Exploring FSPX

(dynaprof) list
DEFAULT_MODULE
eos.F
phase.F
setup.F
supmain.F
io.F
properties.F
solveT.F
update.F
libm.so.6
libc.so.6

● DEFAULT_MODULE: G77's Fortran runtime support. Code compiled with g77 without -g ends up in the DEFAULT_MODULE.
● eos.F through update.F: application code
● libm.so.6, libc.so.6: shared libraries


Exploring FSPX 2

(dynaprof) list DEFAULT_MODULE
call_gmon_start
fini_dummy
copy
ap_end
op_gen
gt_num
f_s
ne_d
e_d
i_tem
f_list
type_f
l_R
rd_count

● G77's Fortran runtime support. Code compiled with g77 without -g ends up in the DEFAULT_MODULE.


Exploring FSPX 3

(dynaprof) list phase.F
phase_
(dynaprof) list update.F
proflux_
flux_
pde_
(dynaprof) list phase.F phase_
Entry
Call tsofx_
Call tlofx_
Call eslds_
Call elqds_
Call tinsol_
Call s_wsle
Call do_lio
Call do_lio
Call do_lio

● The Call entries are function calls.


Use command

● Loads a probe shared library into address space

(dynaprof) use [probe [args]]

● Use by itself displays current probe.
● To change options, respecify probe.
● 3 probes in this release
  – Wallclock: Real time clock
  – PAPI: Hardware metrics
  – Perfometer: Real-time visualization of streaming hardware metrics


Instr command

● instr
  – List all instrumented functions
● instr module <pattern> [arg]
  – Instrument all functions in modules matching pattern
● instr function <module> <pattern> [arg]
  – Instrument all functions matching pattern in module
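
To make the selection rules concrete, the following toy model picks functions by module and function pattern. It assumes shell-style wildcards (the later `calc*` example suggests this, but the actual matching rules inside Dynaprof are not stated here), and `select_functions` and the `MODULES` table are illustrative only; the function names are the ones that appear in this tutorial's listings.

```python
from fnmatch import fnmatchcase

# Module contents as they appear in the list/instr output in this tutorial.
MODULES = {
    "update.F": ["proflux_", "flux_", "pde_"],
    "phase.F": ["phase_"],
    "swim.F": ["calc1_", "calc2_", "calc3_", "calc3z_"],
}


def select_functions(modules, module_pattern, function_pattern="*"):
    """Return (module, function) pairs selected by the two patterns."""
    selected = []
    for module, functions in modules.items():
        if not fnmatchcase(module, module_pattern):
            continue
        for function in functions:
            if fnmatchcase(function, function_pattern):
                selected.append((module, function))
    return selected


# Analogue of: instr function swim.F calc*
print(select_functions(MODULES, "swim.F", "calc*"))
# Analogue of: instr module update.F
print(select_functions(MODULES, "update.F"))
```

In this model `instr module` is simply the case where the function pattern defaults to `*`, which mirrors how the two command forms are described above.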


Threads and Dynaprof Probes

● For threaded code, use the same probe!
● Dynaprof detects threads and loads a special version of the probe library.
● Each probe specifies what to do when a new thread is discovered.
● Each thread gets the same instrumentation.


Probe Warning

● Instrumentation is not free.
● Consider granularity of region being measured.
● Overhead for PAPI 2.3 is O(100) cycles.
  – Between 500 and 2000 cycles for a 2 counter read.
● Overhead for Wallclock is O(100) cycles.
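
A rough way to apply this warning is to compare the probe cost against the cycles measured for a region. The sketch below does that with call counts and cycle totals taken from the FSPX PAPI_TOT_CYC report later in this deck; the per-call probe cost is an assumption picked from the 500 to 2000 cycle range above, and both helper names are invented for the example.

```python
def overhead_fraction(calls, probe_cycles, measured_cycles):
    """Fraction of the measured cycles attributable to the probe itself."""
    return calls * probe_cycles / measured_cycles


def cycles_per_call(measured_cycles, calls):
    """Average measured cycles per invocation."""
    return measured_cycles / calls


# phase_: 6080 calls, 1.953e+11 inclusive cycles; assume the worst-case 2000.
coarse = overhead_fraction(6080, 2000, 1.953e11)
# hl_: 3.737e+07 calls measured at 2.661e+10 cycles in the 1-level call tree.
fine = cycles_per_call(2.661e10, 3.737e7)
print(f"phase_ probe overhead: {coarse:.4%}")
print(f"hl_ cycles per call:   {fine:.0f}")
```

For the coarse-grained phase_ the probe cost is negligible. For hl_ the measured cost per call falls inside the quoted probe range, which suggests (but does not prove) that much of that figure is instrumentation rather than useful work.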


Wallclock Probe

● High resolution, low latency timer
● Usage:

use wallclockprobe

● Reports time in microseconds (1.0 x 10^-6 s).


PAPI Probe

● Count PAPI Presets or Native Events
● Usage:

use papiprobe [event,event,...]

● Default argument is PAPI_FP_INS, or PAPI_TOT_INS if the architecture doesn't support it.
● Available events can be obtained by using:

papi_avail -a


PAPI Probe and Multiplexing

● Requesting more metrics than there are physical counters automatically enables multiplexing.
● Minimum runtime of instrumented regions must be observed, such that all virtual counters get a chance to run at least once.

run-time_min = num_events * 0.01 s

● Automatic warning functionality is being rolled into PAPI.
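
The rule of thumb above can be turned into a quick check before instrumenting. This is a simplified sketch: it only compares the number of requested events against the counter counts from the hardware overview slide and ignores the restrictions real processors place on which event can use which counter. The function and constant names are made up for illustration.

```python
MULTIPLEX_SLICE_S = 0.01  # per-event time slice from the formula above


def needs_multiplexing(num_events, physical_counters):
    """True when more events are requested than counters exist."""
    return num_events > physical_counters


def min_runtime_s(num_events):
    """Minimum region runtime so every virtual counter runs at least once."""
    return num_events * MULTIPLEX_SLICE_S


# Counter counts taken from the hardware overview slide.
COUNTERS = {"Pentium III": 2, "AMD Athlon": 4, "Power 3": 8}
events = ["PAPI_TOT_CYC", "PAPI_TOT_INS", "PAPI_FP_INS", "PAPI_L1_DCM"]

for cpu, counters in COUNTERS.items():
    if needs_multiplexing(len(events), counters):
        floor = min_runtime_s(len(events))
        print(f"{cpu}: multiplexed, region should run at least {floor:.2f} s")
    else:
        print(f"{cpu}: fits in the hardware counters")
```

With four events, only the two-counter Pentium III would need multiplexing in this model, and each instrumented region would have to run for at least 0.04 s.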


PAPI Native Events

● Look in the PAPI distribution
● See the README file for your architecture in the src directory
● See the example program native.c in the src/tests directory


Power 3 Events

PAPI_L1_DCM   Yes  Level 1 data cache misses (PM_LD_MISS_L1,PM_ST_L1MISS)
PAPI_L1_ICM   No   Level 1 instruction cache misses (PM_IC_MISS)
PAPI_L1_TCM   Yes  Level 1 cache misses (PM_IC_MISS,PM_LD_MISS_L1,PM_ST_L1MISS)
PAPI_CA_SNP   No   Requests for a snoop (PM_SNOOP)
PAPI_CA_SHR   No   Requests for exclusive access to shared cache line (PM_SNOOP_E_TO_S)
PAPI_CA_ITV   No   Requests for cache line intervention (PM_SNOOP_PUSH_INT)
PAPI_BRU_IDL  No   Cycles branch units are idle (PM_BRU_IDLE)
PAPI_FXU_IDL  No   Cycles integer units are idle (PM_FXU_IDLE)
PAPI_FPU_IDL  No   Cycles floating point units are idle (PM_FPU_IDLE)
PAPI_LSU_IDL  No   Cycles load/store units are idle (PM_LSU_IDLE)
PAPI_TLB_TL   No   Total translation lookaside buffer misses (PM_TLB_MISS)
PAPI_L1_LDM   No   Level 1 load misses (PM_LD_MISS_L1)
PAPI_L1_STM   No   Level 1 store misses (PM_ST_L1MISS)
PAPI_L2_LDM   No   Level 2 load misses (PM_LD_MISS_EXCEED_L2)
PAPI_L2_STM   No   Level 2 store misses (PM_ST_MISS_EXCEED_L2)
PAPI_BTAC_M   No   Branch target address cache misses (PM_BTAC_MISS)
PAPI_PRF_DM   No   Data prefetch cache misses (PM_PREF_MATCH_DEM_MISS)
PAPI_TLB_SD   No   Translation lookaside buffer shootdowns (PM_TLBSYNC_RERUN)
PAPI_CSR_FAL  No   Failed store conditional instructions (PM_ST_COND_FAIL)
PAPI_CSR_SUC  No   Successful store conditional instructions (PM_RESRV_CMPL)
PAPI_CSR_TOT  No   Total store conditional instructions (PM_RESRV_RQ)
PAPI_MEM_SCY  Yes  Cycles Stalled Waiting for memory accesses (PM_CMPLU_WT_LD,PM_CMPLU_WT_ST)
PAPI_MEM_RCY  No   Cycles Stalled Waiting for memory Reads (PM_CMPLU_WT_LD)
PAPI_MEM_WCY  No   Cycles Stalled Waiting for memory writes (PM_CMPLU_WT_ST)
PAPI_STL_ICY  No   Cycles with no instruction issue (PM_0INST_DISP)
PAPI_STL_CCY  No   Cycles with no instructions completed (PM_0INST_CMPL)
PAPI_BR_CN    No   Conditional branch instructions (PM_CBR_DISP)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (PM_MPRED_BR_CAUSED_GC)
PAPI_BR_PRC   No   Conditional branch instructions correctly predicted (PM_BR_PRED)


Power 3 Events 2

PAPI_FMA_INS  No   FMA instructions completed (PM_EXEC_FMA)
PAPI_TOT_IIS  No   Instructions issued (PM_INST_DISP)
PAPI_TOT_INS  No   Instructions completed (PM_INST_CMPL)
PAPI_INT_INS  Yes  Integer instructions (PM_FXU0_PROD_RESULT,PM_FXU1_PROD_RESULT,PM_FXU2_PROD_RESULT)
PAPI_FP_INS   Yes  Floating point instructions (PM_FPU0_CMPL,PM_FPU1_CMPL)
PAPI_LD_INS   No   Load instructions (PM_LD_CMPL)
PAPI_SR_INS   No   Store instructions (PM_ST_CMPL)
PAPI_BR_INS   No   Branch instructions (PM_BR_CMPL)
PAPI_FLOPS    Yes  Floating point instructions per second (PM_CYC,PM_FPU0_CMPL,PM_FPU1_CMPL)
PAPI_TOT_CYC  No   Total cycles (PM_CYC)
PAPI_IPS      Yes  Instructions per second (PM_CYC,PM_INST_CMPL)
PAPI_LST_INS  Yes  Load/store instructions completed (PM_LD_CMPL,PM_ST_CMPL)
PAPI_SYC_INS  No   Synchronization instructions completed (PM_SYNC)
PAPI_FDV_INS  No   Floating point divide instructions (PM_FPU_FDIV)
PAPI_FSQ_INS  No   Floating point square root instructions (PM_FPU_FSQRT)


Power 4 Events

PAPI_L1_DCM   Yes  Level 1 data cache misses (PM_LD_MISS_L1,PM_ST_MISS_L1)
PAPI_FXU_IDL  No   Cycles integer units are idle (PM_FXU_IDLE)
PAPI_TLB_DM   No   Data translation lookaside buffer misses (PM_DTLB_MISS)
PAPI_TLB_IM   No   Instruction translation lookaside buffer misses (PM_ITLB_MISS)
PAPI_TLB_TL   Yes  Total translation lookaside buffer misses (PM_DTLB_MISS,PM_ITLB_MISS)
PAPI_L1_LDM   No   Level 1 load misses (PM_LD_MISS_L1)
PAPI_L1_STM   No   Level 1 store misses (PM_ST_MISS_L1)
PAPI_STL_ICY  No   Cycles with no instruction issue (PM_0INST_FETCH)
PAPI_HW_INT   No   Hardware interrupts (PM_EXT_INT)
PAPI_FMA_INS  No   FMA instructions completed (PM_FPU_FMA)
PAPI_TOT_IIS  No   Instructions issued (PM_INST_DISP)
PAPI_TOT_INS  No   Instructions completed (PM_INST_CMPL)
PAPI_INT_INS  No   Integer instructions (PM_FXU_FIN)
PAPI_FP_INS   No   Floating point instructions (PM_FPU_FIN)
PAPI_FLOPS    Yes  Floating point instructions per second (PM_CYC,PM_FPU_FIN)
PAPI_TOT_CYC  No   Total cycles (PM_CYC)
PAPI_IPS      Yes  Instructions per second (PM_CYC,PM_INST_CMPL)
PAPI_L1_DCA   Yes  Level 1 data cache accesses (PM_LD_REF_L1,PM_ST_REF_L1)
PAPI_L1_DCR   No   Level 1 data cache reads (PM_LD_REF_L1)
PAPI_L1_DCW   No   Level 1 data cache writes (PM_ST_REF_L1)
PAPI_FDV_INS  No   Floating point divide instructions (PM_FPU_FDIV)
PAPI_FSQ_INS  No   Floating point square root instructions (PM_FPU_FSQRT)


Pentium III Events

PAPI_L1_DCM   No   Level 1 data cache misses (0x45,0x45)
PAPI_L1_ICM   No   Level 1 instruction cache misses (0xf28,0xf28)
PAPI_L2_ICM   No   Level 2 instruction cache misses (0x68,0x68)
PAPI_L1_TCM   No   Level 1 cache misses (0xf2e,0xf2e)
PAPI_L2_TCM   No   Level 2 cache misses (0x24,0x24)
PAPI_CA_SHR   No   Requests for exclusive access to shared cache line (0x22e,0x22e)
PAPI_CA_CLN   No   Requests for exclusive access to clean cache line (0x66,0x66)
PAPI_CA_INV   No   Requests for cache line invalidation (0x69,0x69)
PAPI_CA_ITV   No   Requests for cache line intervention (0x4007b,0x4007b)
PAPI_TLB_IM   No   Instruction translation lookaside buffer misses (0x85,0x85)
PAPI_L1_LDM   No   Level 1 load misses (0xf29,0xf29)
PAPI_L1_STM   No   Level 1 store misses (0xf2a,0xf2a)
PAPI_L2_LDM   Yes  Level 2 load misses (0x24,0x25)
PAPI_L2_STM   No   Level 2 store misses (0x25,0x25)
PAPI_BTAC_M   No   Branch target address cache misses (0xe2,0xe2)
PAPI_HW_INT   No   Hardware interrupts (0xc8,0xc8)
PAPI_BR_CN    No   Conditional branch instructions (0xc4,0xc4)
PAPI_BR_TKN   No   Conditional branch instructions taken (0xc9,0xc9)
PAPI_BR_NTK   Yes  Conditional branch instructions not taken (0xc4,0xc9)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (0xc5,0xc5)
PAPI_BR_PRC   Yes  Conditional branch instructions correctly predicted (0xc4,0xc5)
PAPI_TOT_IIS  No   Instructions issued (0xd0,0xd0)
PAPI_TOT_INS  No   Instructions completed (0xc0,0xc0)
PAPI_FP_INS   No   Floating point instructions (0xc1,0x0)
PAPI_BR_INS   No   Branch instructions (0xc4,0xc4)
PAPI_VEC_INS  No   Vector/SIMD instructions (0xb0,0xb0)
PAPI_FLOPS    Yes  Floating point instructions per second (0xc1,0x79)


Intel Pentium IV Events

PAPI_L1_DCM   No   Level 1 data cache misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L2_DCM   No   Level 2 data cache misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L1_LDM   No   Level 1 load misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L1_STM   No   Level 1 store misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L2_LDM   No   Level 2 load misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L2_STM   No   Level 2 store misses (0x0003b000/0x12000204@0x8000000c)
PAPI_TOT_INS  No   Instructions completed (0x00039000/0x04000204@0x8000000c)
PAPI_FP_INS   No   Floating point instructions (0x0003b000/0x18000204@0x8000000c 0x00033000/0x09000034@0x80000008)
PAPI_TOT_CYC  No   Total cycles (0x00ff9000/0x7e000004@0x8000000d)

(Arguments to perfex -e from PerfCtr distribution)


Sun UltraSparc II Events

PAPI_L1_ICM   Yes  Level 1 instruction cache misses (0x8,0x8)
PAPI_L2_TCM   Yes  Level 2 cache misses (0xc,0xc)
PAPI_CA_SNP   No   Requests for a snoop (-1,0xe)
PAPI_CA_INV   No   Requests for cache line invalidation (0xe,-1)
PAPI_L1_LDM   Yes  Level 1 load misses (0x9,0x9)
PAPI_L1_STM   Yes  Level 1 store misses (0xa,0xa)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (-1,0x2)
PAPI_TOT_IIS  No   Instructions issued (-1,0x1)
PAPI_TOT_INS  No   Instructions completed (-1,0x1)
PAPI_LD_INS   No   Load instructions (0x9,-1)
PAPI_SR_INS   No   Store instructions (0xa,-1)
PAPI_TOT_CYC  No   Total cycles (0x0,0x0)
PAPI_IPS      Yes  Instructions per second (0x0,0x1)
PAPI_L1_DCR   No   Level 1 data cache reads (0x9,-1)
PAPI_L1_DCW   No   Level 1 data cache writes (0xa,-1)
PAPI_L1_ICH   No   Level 1 instruction cache hits (-1,0x8)
PAPI_L2_ICH   No   Level 2 instruction cache hits (-1,0xf)
PAPI_L1_ICA   No   Level 1 instruction cache accesses (0x8,-1)
PAPI_L2_TCH   No   Level 2 total cache hits (-1,0xc)
PAPI_L2_TCA   No   Level 2 total cache accesses (0xc,-1)


Sun UltraSparc III Events

PAPI_L1_ICM   No   Level 1 instruction cache misses (-1,0x8)
PAPI_L2_ICM   No   Level 2 instruction cache misses (-1,0xf)
PAPI_L2_TCM   No   Level 2 cache misses (-1,0xc)
PAPI_TLB_DM   No   Data translation lookaside buffer misses (-1,0x12)
PAPI_TLB_IM   No   Instruction translation lookaside buffer misses (-1,0x11)
PAPI_L1_LDM   No   Level 1 load misses (-1,0x9)
PAPI_L1_STM   No   Level 1 store misses (-1,0xa)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (-1,0x2)
PAPI_TOT_IIS  No   Instructions issued (0x1,0x1)
PAPI_TOT_INS  No   Instructions completed (0x1,0x1)
PAPI_FP_INS   Yes  Floating point instructions (0x18,0x27)
PAPI_TOT_CYC  No   Total cycles (0x0,0x0)
PAPI_IPS      Yes  Instructions per second (0x0,0x1)
PAPI_L1_DCR   No   Level 1 data cache reads (0x9,-1)
PAPI_L1_DCW   No   Level 1 data cache writes (0xa,-1)
PAPI_L1_ICH   No   Level 1 instruction cache hits (0x8,-1)
PAPI_L1_ICA   Yes  Level 1 instruction cache accesses (0x8,0x8)
PAPI_L2_TCH   Yes  Level 2 total cache hits (0xc,0xc)
PAPI_L2_TCA   No   Level 2 total cache accesses (0xc,-1)
PAPI_FML_INS  No   Floating point multiply instructions (-1,0x27)
PAPI_FAD_INS  No   Floating point add instructions (0x18,-1)


MIPS R12K Events

PAPI_L1_DCM   No   Level 1 data cache misses (25)
PAPI_L1_ICM   No   Level 1 instruction cache misses (9)
PAPI_L2_DCM   No   Level 2 data cache misses (26)
PAPI_L2_ICM   No   Level 2 instruction cache misses (10)
PAPI_L1_TCM   Yes  Level 1 cache misses (9,25)
PAPI_L2_TCM   Yes  Level 2 cache misses (10,26)
PAPI_CA_SHR   No   Requests for exclusive access to shared cache line (31)
PAPI_CA_INV   No   Requests for cache line invalidation (13)
PAPI_CA_ITV   No   Requests for cache line intervention (12)
PAPI_TLB_TL   No   Total translation lookaside buffer misses (23)
PAPI_PRF_DM   No   Data prefetch cache misses (17)
PAPI_CSR_FAL  No   Failed store conditional instructions (5)
PAPI_CSR_SUC  Yes  Successful store conditional instructions (20,5)
PAPI_CSR_TOT  No   Total store conditional instructions (20)
PAPI_BR_CN    No   Conditional branch instructions (6)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (24)
PAPI_BR_PRC   Yes  Conditional branch instructions correctly predicted (6,24)
PAPI_TOT_IIS  No   Instructions issued (1)
PAPI_TOT_INS  No   Instructions completed (15)
PAPI_FP_INS   No   Floating point instructions (21)
PAPI_LD_INS   No   Load instructions (18)
PAPI_SR_INS   No   Store instructions (19)
PAPI_FLOPS    Yes  Floating point instructions per second (0,21)
PAPI_TOT_CYC  No   Total cycles (0)
PAPI_IPS      Yes  Instructions per second (0,15)
PAPI_LST_INS  Yes  Load/store instructions completed (18,19)


Alpha/DADD 21264 Events

PAPI_L1_ICM   No   Level 1 instruction cache misses (0x3)
PAPI_L2_TCM   No   Level 2 cache misses (0x1)
PAPI_TLB_DM   No   Data translation lookaside buffer misses (0x2)
PAPI_BR_UCN   No   Unconditional branch instructions (0x15)
PAPI_BR_CN    No   Conditional branch instructions (0x16)
PAPI_BR_NTK   No   Conditional branch instructions not taken (0x18)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (0x19)
PAPI_BR_PRC   No   Conditional branch instructions correctly predicted (0x1a)
PAPI_TOT_IIS  No   Instructions issued (0x7)
PAPI_TOT_INS  No   Instructions completed (0x8)
PAPI_INT_INS  No   Integer instructions (0x9)
PAPI_FP_INS   No   Floating point instructions (0x14)
PAPI_LD_INS   No   Load instructions (0xa)
PAPI_SR_INS   No   Store instructions (0xb)
PAPI_TOT_CYC  No   Total cycles (0x0)
PAPI_LST_INS  No   Load/store instructions completed (0xc)
PAPI_SYC_INS  No   Synchronization instructions completed (0xd)
PAPI_FML_INS  No   Floating point multiply instructions (0x11)
PAPI_FAD_INS  No   Floating point add instructions (0x10)
PAPI_FDV_INS  No   Floating point divide instructions (0x12)
PAPI_FSQ_INS  No   Floating point square root instructions (0x13)


Perfometer Probe

● Sends a stream of performance data every N seconds to the Perfometer GUI.

● Functions can be colored at instrumentation time.
  – Default color is white, 0xFFFFFF
● Usage:

use perfometerprobe [0xRRGGBB]

instr <args> <0xRRGGBB>


Perfometer Probe 2

● Perfometer GUI is NOT launched automatically.
● showrgb in X11 lists colors and names.
● Run the Java GUI
  – java -jar Perfometer.jar
● Connect up to the specified hostname and port.


Instrumenting SWIM with perfometerprobe

Module perfometerprobe.so was loaded.
Module libperfometer.so was loaded.
Module libpapi.so was loaded.
(dynaprof) instr function swim.F calc1_ 0xff0000
swim.F, inserted 1 instrumentation points
(dynaprof) instr function swim.F calc2_ 0x00ff00
swim.F, inserted 1 instrumentation points
(dynaprof) instr function swim.F calc3_ 0x0000ff
swim.F, inserted 1 instrumentation points
(dynaprof) run
Module libnss_files.so.2 was loaded.
Module libnss_nisplus.so.2 was loaded.
Module libnsl.so.1 was loaded.
Module libnss_dns.so.2 was loaded.
Module libresolv.so.2 was loaded.
Perfometer client awaiting connection on port #33733
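
The color arguments in the session above use the 0xRRGGBB form, so the three calc routines are drawn in red, green and blue. A minimal sketch of how such a value decomposes into channels follows; `parse_rgb` is an illustrative helper and not part of the probe.

```python
def parse_rgb(text):
    """Split a 0xRRGGBB string into (red, green, blue) integers 0..255."""
    value = int(text, 16)
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"not a 24-bit color: {text}")
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


# Function/color pairs copied from the instr commands above.
SESSION_COLORS = [
    ("calc1_", "0xff0000"),
    ("calc2_", "0x00ff00"),
    ("calc3_", "0x0000ff"),
]

for function, color in SESSION_COLORS:
    print(function, parse_rgb(color))
```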


Instrumenting FSPX for Instructions Per Cycle

(dynaprof) use probes/papiprobe PAPI_TOT_CYC, PAPI_TOT_INS
Module papiprobe.so was loaded.
Module libpapi.so was loaded.
Module libperfctr.so was loaded.
(dynaprof) instr module update.F
update.F, inserted 3 instrumentation points
(dynaprof) instr module pde.F
(dynaprof) instr
proflux_
flux_
pde_
(dynaprof) instr module phase.F
phase.F, inserted 1 instrumentation points
(dynaprof) instr
proflux_
flux_
pde_
phase_


Instrumenting SWIM for Instructions Per Cycle

(dynaprof) use probes/papiprobe PAPI_TOT_CYC, PAPI_TOT_INS
Module papiprobe.so was loaded.
Module libpapi.so was loaded.
Module libperfctr.so was loaded.
(dynaprof) instr function swim.F calc*
swim.F, inserted 3 instrumentation points
(dynaprof) instr
calc1_
calc2_
calc3_
calc3z_


Reporting Probe Data

● The wallclock and PAPI probes produce very similar data.
● Both use a parsing script written in Perl.
  – wallclockrpt <file>
  – papiproberpt <file>
● Produce 3 profiles
  – Inclusive: T_function = T_self + T_children
  – Exclusive: T_function = T_self
  – 1-Level Call Tree: T_child = Inclusive T_function
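
The relationship between the three profiles can be checked by hand: subtracting the inclusive totals of a function's children from its own inclusive total should give its exclusive value. The sketch below does this for proflux_ using PAPI_TOT_INS figures from the FSPX report later in this deck; `exclusive` is an illustrative helper, not part of the report scripts.

```python
def exclusive(inclusive_total, child_totals):
    """T_self = inclusive T_function minus the inclusive totals of its children."""
    return inclusive_total - sum(child_totals)


# proflux_ and its children, PAPI_TOT_INS, from the 1-level call tree.
proflux_inclusive = 1.797e11
proflux_children = {
    "akl_": 1.529e10,
    "aks_": 1.51e10,
    "cpl_": 1.532e10,
    "cps_": 1.532e10,
    "hl_": 1.742e10,
    "hs_": 1.719e10,
}

proflux_exclusive = exclusive(proflux_inclusive, proflux_children.values())
print(f"proflux_ exclusive: {proflux_exclusive:.3e}")
```

The result agrees with the 8.411e+10 shown for proflux_ in the exclusive profile to within the rounding of the four-digit figures.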


FSPX Cycles & Instrs.

Exclusive Profile of Metric PAPI_TOT_INS.

Name           Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        3.031e+11  1
unknown        53.81      1.631e+11  1
proflux_       27.75      8.411e+10  9124
phase_         15.44      4.68e+10   6080
flux_          2.507      7.598e+09  6080
pde_           0.4884     1.48e+09   6080

Inclusive Profile of Metric PAPI_TOT_INS.

Name           Percent    Total      SubCalls
-------------  -------    -----      --------
TOTAL          100        3.031e+11  0
proflux_       59.31      1.797e+11  2.242e+08
phase_         37.69      1.142e+11  1.247e+08
flux_          2.507      7.598e+09  0
pde_           0.4884     1.48e+09   0

1-Level Inclusive Call Tree of Metric PAPI_TOT_INS.

Parent/-Child  Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        3.031e+11  1
proflux_       100        1.797e+11  9124
- akl_         8.504      1.529e+10  3.737e+07
- aks_         8.4        1.51e+10   3.737e+07
- cpl_         8.525      1.532e+10  3.737e+07
- cps_         8.525      1.532e+10  3.737e+07
- hl_          9.689      1.742e+10  3.737e+07
- hs_          9.564      1.719e+10  3.737e+07
flux_          100        7.598e+09  6080
pde_           100        1.48e+09   6080
phase_         100        1.142e+11  6080
- tsofx_       11.72      1.339e+10  2.49e+07
- tlofx_       11.49      1.312e+10  2.49e+07
- eslds_       12.88      1.471e+10  2.49e+07
- elqds_       12.69      1.449e+10  2.49e+07
- tinsol_      4.999e-07  571        1
- tinmush_     1.114      1.273e+09  7.271e+04
- xsoft_       0.121      1.383e+08  7.271e+04
- xloft_       0.1031     1.178e+08  7.271e+04
- cpl_         8.913      1.018e+10  2.483e+07

Exclusive Profile of Metric PAPI_TOT_CYC.

Name           Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        5.017e+11  1
unknown        53.62      2.69e+11   1
proflux_       27.75      1.393e+11  9124
phase_         14.9       7.475e+10  6080
flux_          3.096      1.554e+10  6080
pde_           0.6356     3.189e+09  6080

Inclusive Profile of Metric PAPI_TOT_CYC.

Name           Percent    Total      SubCalls
-------------  -------    -----      --------
TOTAL          100        5.017e+11  0
proflux_       57.32      2.876e+11  2.242e+08
phase_         38.92      1.953e+11  1.247e+08
flux_          3.096      1.554e+10  0
pde_           0.6356     3.189e+09  0

1-Level Inclusive Call Tree of Metric PAPI_TOT_CYC.

Parent/-Child  Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        5.017e+11  1
proflux_       100        2.876e+11  9124
- akl_         7.945      2.285e+10  3.737e+07
- aks_         7.871      2.264e+10  3.737e+07
- cpl_         8.84       2.542e+10  3.737e+07
- cps_         8.705      2.503e+10  3.737e+07
- hl_          9.252      2.661e+10  3.737e+07
- hs_          8.967      2.579e+10  3.737e+07
flux_          100        1.554e+10  6080
pde_           100        3.189e+09  6080
phase_         100        1.953e+11  6080
- tsofx_       12.42      2.425e+10  2.49e+07
- tlofx_       12.42      2.425e+10  2.49e+07
- eslds_       13.41      2.618e+10  2.49e+07
- elqds_       13.41      2.62e+10   2.49e+07
- tinsol_      1.013e-06  1978       1
- tinmush_     1.716      3.351e+09  7.271e+04
- xsoft_       0.1749     3.415e+08  7.271e+04
- xloft_       0.151      2.95e+08   7.271e+04
- cpl_         8.032      1.569e+10  2.483e+07


fspx IPC


Function  IPC
proflux   0.61
phase     0.63
flux      0.49
pde       0.46
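Instructions per cycle is the ratio PAPI_TOT_INS / PAPI_TOT_CYC. The sketch below recomputes it from the exclusive totals in the fspx reports, on the assumption that the per-function IPC values were taken from the exclusive profiles. The dictionary and function names are illustrative only. Because the report totals are rounded to about four significant digits, the results agree with the slide only to within roughly 0.01.

```python
# Exclusive totals copied from the fspx PAPI_TOT_INS and PAPI_TOT_CYC reports.
fspx_instructions = {
    "proflux_": 8.411e10,
    "phase_": 4.68e10,
    "flux_": 7.598e09,
    "pde_": 1.48e09,
}
fspx_cycles = {
    "proflux_": 1.393e11,
    "phase_": 7.475e10,
    "flux_": 1.554e10,
    "pde_": 3.189e09,
}


def instructions_per_cycle(instructions, cycles):
    """Return {function: instructions / cycles} for every function in both maps."""
    return {
        name: instructions[name] / cycles[name]
        for name in instructions
        if name in cycles and cycles[name] > 0
    }


fspx_ipc = instructions_per_cycle(fspx_instructions, fspx_cycles)
for name, value in fspx_ipc.items():
    print(f"{name:10s} {value:.2f}")
```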


Swim Cycles & Instrs.

Exclusive Profile of Metric PAPI_TOT_INS.

Name           Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        1.723e+09  1
calc2          38.28      6.598e+08  120
calc1          32.31      5.567e+08  120
calc3          22.33      3.847e+08  118
unknown        7.084      1.221e+08  1

Inclusive Profile of Metric PAPI_TOT_INS.

Name           Percent    Total      SubCalls
-------------  -------    -----      --------
TOTAL          100        1.723e+09  0
calc2          39.42      6.793e+08  1680
calc1          35.28      6.08e+08   1800
calc3          22.87      3.942e+08  1652

1-Level Inclusive Call Tree of Metric PAPI_TOT_INS.

Parent/-Child  Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        1.723e+09  1
calc1          100        6.08e+08   120
- fsav         0.02065    1.255e+05  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_isend    0.05911    3.593e+05  120
- mpi_isend    0.06434    3.912e+05  120
- mpi_waitall  0.9013     5.479e+06  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_isend    0.05356    3.256e+05  120
- mpi_isend    0.05079    3.088e+05  120
- mpi_waitall  6.813      4.142e+07  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_isend    0.07504    4.562e+05  120
- mpi_isend    0.06757    4.108e+05  120
- mpi_waitall  0.161      9.791e+05  120
calc2          100        6.793e+08  120
- fsav         0.01848    1.255e+05  120
- mpi_irecv    0.02804    1.904e+05  120
- mpi_irecv    0.02804    1.904e+05  120
- mpi_isend    0.07762    5.273e+05  120
- mpi_isend    0.048      3.26e+05   120
- mpi_waitall  0.8084     5.491e+06  120
- mpi_irecv    0.02804    1.904e+05  120
- mpi_isend    0.05213    3.541e+05  120

Exclusive Profile of Metric PAPI_TOT_CYC.

Name           Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        3.181e+09  1
calc2          34.85      1.108e+09  120
calc1          33.48      1.065e+09  120
calc3          26.1       8.301e+08  118
unknown        5.568      1.771e+08  1

Inclusive Profile of Metric PAPI_TOT_CYC.

Name           Percent    Total      SubCalls
-------------  -------    -----      --------
TOTAL          100        3.181e+09  0
calc2          35.98      1.144e+09  1680
calc1          35.61      1.133e+09  1800
calc3          26.88      8.55e+08   1652

1-Level Inclusive Call Tree of Metric PAPI_TOT_CYC.

Parent/-Child  Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        3.181e+09  1
calc1          100        1.133e+09  120
- fsav         0.03432    3.887e+05  120
- mpi_irecv    0.07356    8.332e+05  120
- mpi_isend    0.0663     7.51e+05   120
- mpi_isend    0.0739     8.371e+05  120
- mpi_waitall  0.7189     8.143e+06  120
- mpi_irecv    0.1646     1.864e+06  120
- mpi_irecv    0.03407    3.859e+05  120
- mpi_isend    0.1867     2.115e+06  120
- mpi_isend    0.06067    6.872e+05  120
- mpi_waitall  4.22       4.78e+07   120
- mpi_irecv    0.03979    4.506e+05  120
- mpi_irecv    0.03008    3.407e+05  120
- mpi_isend    0.1014     1.148e+06  120
- mpi_isend    0.07568    8.573e+05  120
- mpi_waitall  0.1076     1.219e+06  120
calc2          100        1.144e+09  120
- fsav         0.03382    3.87e+05   120
- mpi_irecv    0.03222    3.687e+05  120
- mpi_irecv    0.03554    4.067e+05  120
- mpi_isend    0.0959     1.097e+06  120
- mpi_isend    0.05655    6.471e+05  120
- mpi_waitall  0.7268     8.317e+06  120
- mpi_irecv    0.1865     2.134e+06  120
- mpi_isend    0.2616     2.993e+06  120
- mpi_isend    0.06976    7.983e+05  120
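In the 1-level call tree, the Percent column of a child row appears to be the child's inclusive total divided by its parent's inclusive total. That reading is inferred from the numbers rather than stated on the slides. The sketch below checks it against the largest mpi_waitall entry under calc1; the helper name is hypothetical.

```python
def percent_of_parent(child_total, parent_total):
    """Child's inclusive total as a percentage of the parent's inclusive total."""
    return 100.0 * child_total / parent_total


# Values copied from the Swim 1-level call trees (calc1 and its costliest mpi_waitall).
calc1_instructions = 6.08e08
waitall_instructions = 4.142e07
calc1_cycles = 1.133e09
waitall_cycles = 4.78e07

ins_share = percent_of_parent(waitall_instructions, calc1_instructions)
cyc_share = percent_of_parent(waitall_cycles, calc1_cycles)
print(f"mpi_waitall share of calc1 instructions: {ins_share:.3f}%")
print(f"mpi_waitall share of calc1 cycles:       {cyc_share:.3f}%")
```

Both results match the reported 6.813 and 4.22 to within rounding of the printed totals.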


Swim IPC


Function  IPC
calc2     0.59
calc1     0.53
calc3     0.46


Perfometer Screenshot


Dynaprof 0.8 SC Release

● Binary distribution for 4 platforms on the website
– AIX 3.x / DPCL 3.2.5 on Power 3
– Linux / DynInst 3.0 on Pentium <= III
– Solaris 2.8 / DynInst 3.0 on UltraSparc II/III
– IRIX / DynInst 3.0 on MIPS R10/12/14k
– Power 4 and Pentium 4 are coming...

● Xdynaprof Java/Swing GUI included
● perfometerprobe and GUI included
● Updated documentation


References

● The Dynaprof Homepage

http://www.cs.utk.edu/~mucci/dynaprof

● The PAPI Homepage

http://icl.cs.utk.edu/projects/papi

● The DynInst Homepage

http://www.dyninst.org

● The DPCL Homepage

http://oss.software.ibm.com/developerworks/opensource/dpcl

● The Vprof Homepage

http://aros.ca.sandia.gov/~cljanss/perf/vprof

● The GNU Readline Homepage

http://cnswww.cns.cwru.edu/~chet/readline/rltop.html