Advanced Modular Software Performance Monitoring CPU profiling with Intel® VTune™ Amplifier XE Alexander Mazurov Ferrara University, CERN
Advanced Modular Software Performance Monitoring

May 21, 2015

Transcript
Page 1: Advanced Modular Software Performance Monitoring

Advanced Modular Software Performance Monitoring

CPU profiling with Intel® VTune™ Amplifier XE

Alexander Mazurov

Ferrara University, CERN

Page 2: Advanced Modular Software Performance Monitoring

I. Event Processing Software
II. Profilers
III. Intel® VTune™ Amplifier XE
IV. Gaudi Framework
V. Gaudi Intel Profiler Auditor
VI. Profiling examples

Page 3: Advanced Modular Software Performance Monitoring

Physics events

The Higgs Boson

Simulation * Trigger * Analysis

I. Event Processing Software

Page 4: Advanced Modular Software Performance Monitoring

Detector events: 10^6 events/sec

Events to storage: 4500 events/sec

LHCb High Level Trigger (HLT) Software

Moore

Page 5: Advanced Modular Software Performance Monitoring

II. Profilers

Collect information about how an application or system performs.

Page 6: Advanced Modular Software Performance Monitoring

CPU Profiler

Measures the frequency and duration of function calls and/or code instructions.

Page 7: Advanced Modular Software Performance Monitoring

Profiling Techniques

- Hardware counters
- Instrumenting the code

Page 8: Advanced Modular Software Performance Monitoring

Hardware counters

Exploit hardware performance counters from the Performance Monitoring Unit (PMU)

Counters:
- Translation lookaside buffer (TLB) misses
- Cache misses
- Stall cycles
- Memory access latency
- ...

Perfmon2 * Intel VTune Amplifier

Page 9: Advanced Modular Software Performance Monitoring

Instrumenting the code

- Statically:
  * Change code manually / automatically
  * Compiler assisted (gcc -pg)

- Dynamically (at runtime):
  * Change code at runtime
    - Valgrind
    - Google Performance Tools
    - Intel VTune Amplifier
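Manual static instrumentation can be illustrated with a small sketch: a decorator wraps a function so that every call is counted and timed. This is only an illustration of the technique. The names `instrumented`, `call_stats` and `sum_of_squares` are made up for this example and do not come from any of the tools listed above.

```python
import functools
import time
from collections import defaultdict

# Per-function statistics filled in by the instrumentation.
call_stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def instrumented(func):
    """Wrap func so that each call is counted and its duration accumulated."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are measured too.
            stats = call_stats[func.__name__]
            stats["calls"] += 1
            stats["seconds"] += time.perf_counter() - started
    return wrapper


@instrumented
def sum_of_squares(n):
    """Example workload (hypothetical)."""
    return sum(i * i for i in range(n))
```

Every instrumented call pays for two clock reads and a dictionary update, which is why this approach tends to distort the timing of very short functions.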

Page 10: Advanced Modular Software Performance Monitoring

III. VTune™ Amplifier XE

Performance Profiling Tool

- x86 (32 and 64-bit)
- GUI and CLI

Page 11: Advanced Modular Software Performance Monitoring

VTune™ Features

Runtime instrumenting profiler

- User-mode sampling
- Hardware-based sampling
- Concurrency and locks and waits analysis
- Threading timeline
- Attach to a running process
- Source view

Page 12: Advanced Modular Software Performance Monitoring

How does user-mode sampling work?

1) Interrupts a process
2) Collects samples of all active instruction addresses
3) Restores a call sequence upon each sample
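The sampling steps can be mimicked in a few lines of Python. This is a toy sketch, not how VTune is implemented: it does not interrupt the process, it relies on the interpreter switching threads, and it sees Python frames rather than instruction addresses. `StackSampler`, `busy_work` and `profile_busy_work` are names invented for the example.

```python
import collections
import sys
import threading
import time


class StackSampler:
    """Toy sampler: periodically records the call stack of one thread."""

    def __init__(self, target_thread_id, interval=0.01):
        self.target_thread_id = target_thread_id
        self.interval = interval  # seconds between samples (10 ms)
        self.self_samples = collections.Counter()   # function on top of the stack
        self.total_samples = collections.Counter()  # function anywhere on the stack
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # wait() returns False on timeout, True once stop() has been called.
        while not self._stop.wait(self.interval):
            # Look at the frame the target thread is executing right now.
            frame = sys._current_frames().get(self.target_thread_id)
            if frame is None:
                continue
            self.self_samples[frame.f_code.co_name] += 1
            # Walk the frame chain to recover the call sequence for this sample.
            on_stack = set()
            while frame is not None:
                on_stack.add(frame.f_code.co_name)
                frame = frame.f_back
            self.total_samples.update(on_stack)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def busy_work(duration):
    """Spin for roughly `duration` seconds (example workload)."""
    deadline = time.perf_counter() + duration
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


def profile_busy_work(duration=0.3):
    """Sample the calling thread while it runs busy_work."""
    sampler = StackSampler(threading.get_ident())
    sampler.start()
    busy_work(duration)
    sampler.stop()
    return sampler
```

The two counters correspond to "self" time and "total" time: a function's share of `self_samples` estimates the time spent in its own code, while `total_samples` also includes time spent in its callees.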

Page 13: Advanced Modular Software Performance Monitoring

User-mode analysis types

- Hotspots
- Concurrency
- Locks and Waits

Page 14: Advanced Modular Software Performance Monitoring

User-mode sampling

Hotspots analysis:

Page 15: Advanced Modular Software Performance Monitoring

Group results

Page 16: Advanced Modular Software Performance Monitoring

Call Stack

Page 17: Advanced Modular Software Performance Monitoring

Filter by timeline

Page 18: Advanced Modular Software Performance Monitoring

CPU time by code line

Debug mode (-g)

Page 19: Advanced Modular Software Performance Monitoring

Sampling Accuracy

User-mode sampling is a statistical method and does not provide 100% accurate results.

Accuracy depends on:
- Duration of the collection
- Speed of processor
- Amount of software activity
- Sampling interval
  * recommended value is 10 ms
  * with this interval, profiling is only 5% slower
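A rough way to see why the duration of the collection matters is to treat every sample as an independent draw that lands in a given function with probability equal to its share of CPU time. This binomial model is an assumption made for this sketch, not something stated in the slides or documented for VTune, and the function names are made up.

```python
import math


def expected_samples(duration_s, interval_s=0.010):
    """Number of samples collected over a run at a fixed sampling interval."""
    return round(duration_s / interval_s)


def relative_error(time_fraction, n_samples):
    """Relative standard error of a function's measured share of CPU time.

    Assumes each sample independently lands in the function with
    probability `time_fraction` (binomial model).
    """
    return math.sqrt((1.0 - time_fraction) / (n_samples * time_fraction))
```

Under this model, a 60-second run at a 10 ms interval yields 6000 samples, and a function taking 1% of the time is measured with a relative error of about 13%. Collecting for four times as long halves that error.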

Page 20: Advanced Modular Software Performance Monitoring

Integrating VTune™ Amplifier into an Event Processing Framework

Page 21: Advanced Modular Software Performance Monitoring

IV. Gaudi

Event processing framework

- Moore: Trigger
- Gauss: Simulation
- Brunel: Reconstruction
- Online: Monitoring and commissioning
- DaVinci: Physics analysis

Page 22: Advanced Modular Software Performance Monitoring

Gaudi Architecture

Algorithms * Services * Tools

Page 23: Advanced Modular Software Performance Monitoring

Moore Event Loop

Algorithms Sequence:
- Hlt1DiMuonHighMassFilterSequence
- Hlt1DiMuonHighMassStreamer
- FastVeloHlt
- MuonRec
- Velo2CandidatesDiMuonHighMass
- GECLooseUnit
- createITLiteClusters
- createVeloLiteClusters

How to profile algorithms?

Page 24: Advanced Modular Software Performance Monitoring

V. Gaudi Intel Profiling Auditor

VTune™ User API + Gaudi Auditors API

Page 25: Advanced Modular Software Performance Monitoring

VTune™ User API

- Start/Pause profiling
- Mark profiling regions

Page 26: Advanced Modular Software Performance Monitoring

Gaudi Auditors API

Callback functions are invoked at the start event and end event of each algorithm.
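The callback idea can be sketched in plain Python: the event loop notifies an auditor before and after each algorithm runs, and the auditor switches a profiler on and off for a chosen window of events. All class and method names below are hypothetical stand-ins. They are not the Gaudi Auditors API or the VTune User API, and the algorithm names used with it are only labels.

```python
import time
from collections import defaultdict


class RecordingProfiler:
    """Stand-in for a profiler's resume/pause controls (hypothetical)."""

    def __init__(self):
        self.active = False
        self.calls = []

    def resume(self):
        self.active = True
        self.calls.append("resume")

    def pause(self):
        self.active = False
        self.calls.append("pause")


class EventWindowAuditor:
    """Hypothetical auditor: profiles events in [start_event, stop_event)."""

    def __init__(self, profiler, start_event, stop_event):
        self.profiler = profiler
        self.start_event = start_event
        self.stop_event = stop_event
        self.executions = defaultdict(int)   # profiled runs per algorithm
        self.elapsed = defaultdict(float)    # wall-clock seconds per algorithm
        self._started_at = {}

    def before_event(self, event_number):
        if event_number == self.start_event:
            self.profiler.resume()
        elif event_number == self.stop_event:
            self.profiler.pause()

    def before_execute(self, algorithm_name):
        # "Start event" callback: open a region only while profiling is on.
        if self.profiler.active:
            self._started_at[algorithm_name] = time.perf_counter()

    def after_execute(self, algorithm_name):
        # "End event" callback: close the region opened in before_execute.
        started = self._started_at.pop(algorithm_name, None)
        if started is not None:
            self.executions[algorithm_name] += 1
            self.elapsed[algorithm_name] += time.perf_counter() - started


def run_event_loop(algorithms, n_events, auditor):
    """Toy event loop: runs every algorithm once per event."""
    for event_number in range(1, n_events + 1):
        auditor.before_event(event_number)
        for name, algorithm in algorithms:
            auditor.before_execute(name)
            algorithm(event_number)
            auditor.after_execute(name)
```

For example, running ten events with `EventWindowAuditor(RecordingProfiler(), start_event=3, stop_event=6)` profiles events 3, 4 and 5 only. This keeps initialization and the first events out of the collected data.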

Page 27: Advanced Modular Software Performance Monitoring

Algorithms profiling (I)

CPU time per sequence branch

Page 28: Advanced Modular Software Performance Monitoring

Algorithms profiling (II)

Page 29: Advanced Modular Software Performance Monitoring

Gaudi configuration

from Configurables import AuditorSvc, IntelProfilerAuditor

profiler = IntelProfilerAuditor()
profiler.StartFromEventN = 5000   # start profiling at event 5000
profiler.StopAtEventN = 15000     # stop profiling at event 15000
AuditorSvc().Auditors += [profiler]

Page 30: Advanced Modular Software Performance Monitoring

Run: $> intelprofiler -o /collected/data job.py

Analyze (GUI): $> amplxe-gui /collected/data/r001hs

Analyze (CLI): $> amplxe-cl -report hotspots -r /collected/data/r001hs

Page 31: Advanced Modular Software Performance Monitoring

VI. Profiling examples

1. Memory allocation functions
2. Measuring profiling accuracy
3. Custom reports

Page 32: Advanced Modular Software Performance Monitoring

1. Memory allocation functions

operator new from libstdc++ library:

tc_new from tcmalloc library:

tc_new takes half the time of operator new

Page 33: Advanced Modular Software Performance Monitoring

2. Measuring profiling accuracy

Intel Profiling Auditor vs. Timing Auditor

Timing Auditor: measures the absolute time of an algorithm's run

1000 events

Page 34: Advanced Modular Software Performance Monitoring

3. Custom reports

Build reports using CSV files exported from VTune Amplifier
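A custom report of this kind could be built along the following lines. This is a minimal sketch that assumes the exported CSV has one row per function with a `Module` column and a `CPU Time` column in seconds. Those column names, and the function `cpu_time_by`, are assumptions made for the example and should be adapted to the actual export.

```python
import csv
import io
from collections import defaultdict


def cpu_time_by(csv_text, group_column="Module", time_column="CPU Time"):
    """Sum CPU time per group and return (name, seconds, percent) rows,
    sorted by descending CPU time."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[group_column]] += float(row[time_column])
    grand_total = sum(totals.values())
    report = [
        (name, seconds, 100.0 * seconds / grand_total if grand_total else 0.0)
        for name, seconds in totals.items()
    ]
    return sorted(report, key=lambda entry: entry[1], reverse=True)
```

Passing `group_column="Function"` instead would rank individual functions rather than libraries, provided the export contains such a column.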

Page 35: Advanced Modular Software Performance Monitoring

Conclusions

Intel® VTune™ Amplifier XE:

+ Various analysis types and reports
+ Rich User API
+ Reasonable overhead time