Cycle Accurate Profiling With Perf
Paweł Moll <[email protected]>
eLinux.org · 2016-07-06
- Embedded Trace Buffer
  - Dedicated, small SRAM
  - Flight recorder use case
- Trace Port
  - External analyser
  - Large buffer
  - External (high speed) pins
- Embedded Trace Router
  - Sinks data into main interconnect
  - Usually uses system DRAM
  - Consumes memory system bandwidth
ETM Protocol
- Highly compressed data
  - Generated against program memory
  - Based on E/N atoms
    - e.g. b1NEEEE00: up to 16+1 instructions
- Branches
  - Only if not evident (e.g. B <imm>)
  - No address == previous address
- Exceptions, instruction sets, processor state
- Synchronisation packets
- Data packets
- PTM protocol
  - One bit per conditional branch
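The b1NEEEE00 pattern above can be unpacked with a few bit operations. The following is a minimal sketch, not a real ETM decoder: it assumes bit 7 is always set, bit 6 is the N flag, bits 5..2 hold the number of E atoms and bits 1..0 are zero, with the E atoms emitted before the optional N atom. The function name `decode_p_header` is hypothetical.

```python
def decode_p_header(byte):
    """Expand one b1NEEEE00-style byte into a list of 'E'/'N' atoms.

    Assumed layout (read off the slide's bit pattern, not a full decoder):
    bit 7 = 1, bit 6 = N flag, bits 5..2 = number of E atoms, bits 1..0 = 00.
    """
    if byte & 0b10000011 != 0b10000000:
        raise ValueError(f"not a b1NEEEE00 header: {byte:#04x}")
    e_count = (byte >> 2) & 0xF   # bits 5..2
    has_n = (byte >> 6) & 1       # bit 6
    # Assumption: the E atoms come first, followed by the optional N atom.
    return ["E"] * e_count + ["N"] * has_n


print(decode_p_header(0b10001100))  # three E atoms, no N atom
print(decode_p_header(0b11000100))  # one E atom followed by one N atom
```

Turning the atoms back into addresses still needs the program image: the decoder walks the code and consumes one atom per traced instruction, which is why the memory contents matter for decompression.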
Issues
- Requires memory contents for decompression
  - Multitasking OS
  - JIT engines
  - Self-modifying code (kernel runtime patching, kprobes, dynamic trace events)
- Parallel and out-of-order execution
Additional features
- Filtering
  - Address
  - CONTEXTID
  - VMID
- Triggering
  - Address
  - DBG <imm>
  - Counter
  - Sequencer
- Timestamping
  - Correlation (synchronisation)
Linux perf
Linux perf framework
- PMU drivers
- Many use cases, e.g. statistics
- Sampling profiler
  - Periodic PC (IP) sampling
  - Timer or PMU counter overflow interrupt
  - Typical sampling rate 1 kHz (every 1 ms)
Sampling profile
- Statistical approximation of a process
- Think analog/digital converter
- Shannon’s theorem: “If a function x(t) contains no frequencies higher than B cps, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.”
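The analog/digital analogy can be made concrete with a toy simulation. This is only a sketch under invented assumptions: the workload, the function names `short_fn` and `long_fn`, and the one-entry-per-microsecond timeline are all hypothetical. It samples the same timeline at 1 kHz with two different phases, and then at a much finer period that is not synchronised with the workload.

```python
from collections import Counter


def sample_profile(timeline, period, offset=0):
    """Count which function is running at every `period`-th tick."""
    return Counter(timeline[t] for t in range(offset, len(timeline), period))


# Hypothetical workload, one entry per microsecond: each millisecond starts
# with 10 us in short_fn, followed by 990 us in long_fn (so short_fn is 1%).
timeline = (["short_fn"] * 10 + ["long_fn"] * 990) * 100

in_phase = sample_profile(timeline, period=1000)                  # 1 kHz
out_of_phase = sample_profile(timeline, period=1000, offset=500)  # 1 kHz, shifted
finer = sample_profile(timeline, period=7)                        # roughly 143 kHz

for name, profile in [("in phase", in_phase),
                      ("out of phase", out_of_phase),
                      ("finer", finer)]:
    total = sum(profile.values())
    print(name, {fn: round(count / total, 3) for fn, count in profile.items()})
```

With the 1 kHz sampler locked to the workload's own 1 kHz rhythm, the in-phase run attributes every sample to short_fn and the shifted run never sees it at all; only the finer, unsynchronised sampler lands near the true 1% share. Behaviour that repeats faster than, or in step with, the sampling rate is exactly what a trace-based profile is meant to recover.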
CoreSight Linux framework
- Developed by Mathieu Poirier at Linaro
- Based on 2012 code from Code Aurora
- At v7 stage now (http://lwn.net/Articles/614232/)
- Control trace components via sysfs
  - Enable sink, enable source, dump buffer contents
- Separate decoders
- Full series at http://git.linaro.org/kernel/coresight.git
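The enable sink, enable source, dump buffer sequence could be scripted roughly as follows. This sketch assumes the `enable_sink` and `enable_source` attribute names used by the CoreSight framework as later merged into mainline; the patch series referenced above may differ. The helper names, the device names in the usage note and the device node path are hypothetical.

```python
from pathlib import Path


def set_coresight_flag(devices_dir, device, attribute, enabled=True):
    """Write 1 or 0 to <devices_dir>/<device>/<attribute>."""
    path = Path(devices_dir) / device / attribute
    path.write_text("1" if enabled else "0")
    return path


def start_trace(devices_dir, sink, source):
    """Enable the sink first so the source has somewhere to write."""
    set_coresight_flag(devices_dir, sink, "enable_sink")
    set_coresight_flag(devices_dir, source, "enable_source")


def stop_trace(devices_dir, source):
    """Disable the source; the sink keeps the captured data."""
    set_coresight_flag(devices_dir, source, "enable_source", enabled=False)


def dump_buffer(device_node, out_file):
    """Copy the raw trace bytes exposed by the sink's device node to a file."""
    data = Path(device_node).read_bytes()
    Path(out_file).write_bytes(data)
    return len(data)
```

On a target this might be called as `start_trace("/sys/bus/coresight/devices", "20010000.etb", "2201c000.ptm")`, followed by the workload, `stop_trace(...)` and `dump_buffer("/dev/20010000.etb", "cstrace.bin")`. The device names here are placeholders and depend entirely on the platform.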
Intel PT
- “an exciting new feature coming in future processors” (2013)
- Integrating with perf
  - Auxiliary buffers
  - Decoder integrated with user space tool
- Kernel portions at v4 stage now (http://lwn.net/Articles/609010/)
- Full series at
- It’s all interesting…
- …but the goal is to see a cycle accurate profile of the process
- Limit data scope
  - Filter out kernel addresses (or cheat using entry/exit points)
  - Filter out other contexts (or cheat by protecting CPU)
  - Collate migrated data (or cheat by setting affinity)
  - Generate memory map with DSOs (or cheat by linking statically)
- Convert into perf data stream
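The first three scope-limiting steps can be sketched as a filter over already-decoded trace elements. Everything here is an illustrative assumption: the `TraceRecord` type and its fields are hypothetical, `KERNEL_BASE` assumes a 32-bit ARM system with a 3G/1G user/kernel split, and collating by cycle count assumes all CPUs share one timebase.

```python
from typing import NamedTuple


class TraceRecord(NamedTuple):
    """Hypothetical decoded trace element."""
    cpu: int
    context_id: int
    address: int
    cycle: int


KERNEL_BASE = 0xC0000000  # assumption: 32-bit ARM, 3G/1G user/kernel split


def limit_scope(records, context_id, kernel_base=KERNEL_BASE):
    """Keep user-space records of one context, collated across CPUs by cycle."""
    kept = [r for r in records
            if r.context_id == context_id and r.address < kernel_base]
    # Assumption: cycle counts are comparable between CPUs.
    return sorted(kept, key=lambda r: r.cycle)


records = [
    TraceRecord(cpu=1, context_id=42, address=0x00008010, cycle=300),
    TraceRecord(cpu=0, context_id=42, address=0xC0012340, cycle=200),  # kernel
    TraceRecord(cpu=0, context_id=7, address=0x00008000, cycle=150),   # other task
    TraceRecord(cpu=0, context_id=42, address=0x00008000, cycle=100),
]
for record in limit_scope(records, context_id=42):
    print(record.cpu, hex(record.address), record.cycle)
```

The "cheats" listed on the slide avoid this work altogether: pinning the task to one protected CPU and linking it statically removes the need for context filtering, collation and a DSO map.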
perf.data
- Starts with header
  - Description of all events
- Followed by data records …
  - selection of samples
    - by default: PERF_SAMPLE_IP, PERF_SAMPLE_TID, PERF_SAMPLE_TIME, PERF_SAMPLE_PERIOD
    - generated on every timer/counter interrupt
- …interleaved with system information
  - e.g. PERF_RECORD_MMAP, PERF_RECORD_COMM, PERF_RECORD_EXIT,
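A single sample record with the default selection listed above could be laid out as in the following sketch. The numeric constants are those of the Linux perf_event ABI; the little-endian byte order, the helper names and the example values are assumptions. It builds only the record itself, not the perf.data file header or the event descriptions.

```python
import struct

# Values from the Linux perf_event ABI (include/uapi/linux/perf_event.h).
PERF_RECORD_SAMPLE = 9
PERF_RECORD_MISC_USER = 2
PERF_SAMPLE_IP = 1 << 0
PERF_SAMPLE_TID = 1 << 1
PERF_SAMPLE_TIME = 1 << 2
PERF_SAMPLE_PERIOD = 1 << 8
SAMPLE_TYPE = (PERF_SAMPLE_IP | PERF_SAMPLE_TID |
               PERF_SAMPLE_TIME | PERF_SAMPLE_PERIOD)

# Assumption: little-endian producer and consumer.
HEADER = struct.Struct("<IHH")  # u32 type, u16 misc, u16 size
BODY = struct.Struct("<QIIQQ")  # u64 ip, u32 pid, u32 tid, u64 time, u64 period


def pack_sample(ip, pid, tid, time, period):
    """Build one PERF_RECORD_SAMPLE for the IP|TID|TIME|PERIOD sample type."""
    size = HEADER.size + BODY.size
    header = HEADER.pack(PERF_RECORD_SAMPLE, PERF_RECORD_MISC_USER, size)
    return header + BODY.pack(ip, pid, tid, time, period)


def unpack_sample(record):
    """Inverse of pack_sample; returns the fields as a dict."""
    rec_type, misc, size = HEADER.unpack_from(record, 0)
    ip, pid, tid, time, period = BODY.unpack_from(record, HEADER.size)
    return {"type": rec_type, "misc": misc, "size": size, "ip": ip,
            "pid": pid, "tid": tid, "time": time, "period": period}


record = pack_sample(ip=0x8000, pid=42, tid=42, time=1000, period=1)
print(len(record), unpack_sample(record))
```

When samples are synthesised from trace rather than taken on an interrupt, one record per executed instruction with a period of 1 is one possible choice; this is a design assumption here, not something the record format requires.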
- 616 lines of report
- perf annotate works as well!
Summary
- Proof of concept
- Can help with pathological cases
- Scaling issues
- Powerful but needs to be “civilised”
- Nearest future
  - Drivers in mainline
  - perf tool decoder integration
Thank You
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.