Trace Visualization
Visualization and Analysis of MPI Resources
Motivation & Mission
• Motivation
– Parallel programming is about performance!
– Scaling to thousands of cores is required
– You need a decent MPI implementation, e.g. Open MPI
– You also need a ready-to-use performance monitoring and analysis tool
• Mission
– Visualization of the dynamics of complex parallel processes
– Requires two components
• Monitor/Collector (VampirTrace)
• Charts/Browser (Vampir)
– Available for major platforms
– Open Source (partially)
Event Trace Visualization
• Trace Visualization
– Alternative and supplement to automatic analysis
– Show dynamic run-time behavior graphically
– Provide statistics and performance metrics
• Global timeline for parallel processes/threads
• Process timeline plus performance counters
• Statistics summary display
• Message statistics
• More
– Interactive browsing, zooming, selecting
• Adapt statistics to zoom level (time interval)
• Also for very large and highly parallel traces
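To illustrate what "adapt statistics to zoom level" means, here is a minimal Python sketch (not Vampir's implementation) that recomputes exclusive time per function for one process, counting only the time that falls inside a selected interval [t0, t1]. The event tuple format and the function name are hypothetical; real traces are stored in OTF, not in this form.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event record for ONE process: (timestamp, kind, function),
# where kind is "enter" or "leave". This is not the OTF record format.
Event = Tuple[float, str, str]


def exclusive_time_in_window(events: List[Event], t0: float, t1: float) -> Dict[str, float]:
    """Exclusive time per function, counting only time that falls inside [t0, t1]."""
    totals: Dict[str, float] = defaultdict(float)
    stack: List[str] = []  # active call stack; innermost function is stack[-1]
    last_t = None          # timestamp of the previous event
    for t, kind, func in sorted(events, key=lambda e: e[0]):
        if stack and last_t is not None:
            # The gap since the previous event belongs to the innermost active
            # function; clip it to the zoom window before adding it.
            lo, hi = max(last_t, t0), min(t, t1)
            if hi > lo:
                totals[stack[-1]] += hi - lo
        if kind == "enter":
            stack.append(func)
        else:  # "leave": assumes well-nested enter/leave pairs
            stack.pop()
        last_t = t
    return dict(totals)


events = [(0.0, "enter", "main"), (2.0, "enter", "MPI_Send"),
          (5.0, "leave", "MPI_Send"), (10.0, "leave", "main")]
print(exclusive_time_in_window(events, 0.0, 10.0))  # {'main': 7.0, 'MPI_Send': 3.0}
print(exclusive_time_in_window(events, 4.0, 6.0))   # {'MPI_Send': 1.0, 'main': 1.0}
```

Zooming into [4, 6] changes the statistics completely: over the whole run main dominates, but inside the window the time is split evenly between main and MPI_Send.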
Vampir History
• PARvis at Research Center Jülich
• 1995: Vampir at Research Center Jülich
http://www.top500.org/reports/1995/vampir/vampir.html
– 1997: Vampir at TU Dresden
– 2006: new version VampirServer (or Vampir NG)
• Distributed storage, enhanced scalability
• Client/server architecture
– 2009: Vampir 7, redesign of the GUI using Qt
Vampir Toolset Architecture
[Figure: a multi-core program instrumented with VampirTrace writes a trace file (OTF) that is viewed with Vampir 7; a many-core program writes a trace bundle that is analyzed with VampirServer.]
Vampir for Windows
• Vampir for UNIX
– VampirClassic (single threaded)
– VampirServer (MPI parallel)
• Vampir for Windows
– Based on parallel service engine
– All new browser
• A beta of the new browser for Linux is available at www.vampir.eu
[Figure: architecture comparison. Vampir Classic: all in one, single threaded. Vampir 7 for Windows: threaded service DLL behind the Windows GUI API. VampirServer: parallelized service engine, with the visualization (Motif) connected via sockets.]
Usage Order of the Vampir Performance Analysis Toolset
1. Instrument your application with VampirTrace
2. Run your application with an appropriate test set
3. Analyze your trace file with Vampir
   1. Small trace files with a low number of processes can be analyzed on your local workstation
      1. Start your local Vampir
      2. Load the trace file from your local disk
   2. Large trace files should be stored on the cluster file system
      1. Start VampirServer on your analysis cluster
      2. Start your local Vampir
      3. Connect your local Vampir to VampirServer on the analysis cluster
      4. Load the trace file from the cluster file system
Vampir Displays
The main displays of Vampir:
• Master Timeline (Global Timeline)
• Process and Counter Timeline
• Function Summary
• Message Summary
• Process Summary
• Communication Matrix
• Call Tree
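The Communication Matrix aggregates point-to-point messages by sender and receiver. A minimal Python sketch of that aggregation, assuming a hypothetical message record of (sender rank, receiver rank, bytes); it is not how Vampir stores or computes the display:

```python
from typing import Iterable, List, Tuple

# Hypothetical message record: (sender_rank, receiver_rank, num_bytes).
Message = Tuple[int, int, int]


def communication_matrix(messages: Iterable[Message],
                         nprocs: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (bytes, counts), two nprocs x nprocs matrices indexed [sender][receiver]."""
    nbytes = [[0] * nprocs for _ in range(nprocs)]
    counts = [[0] * nprocs for _ in range(nprocs)]
    for sender, receiver, size in messages:
        nbytes[sender][receiver] += size   # total volume on this pair
        counts[sender][receiver] += 1      # number of messages on this pair
    return nbytes, counts


msgs = [(0, 1, 100), (0, 1, 50), (1, 0, 10), (2, 0, 8)]
volume, count = communication_matrix(msgs, 3)
print(volume)  # [[0, 150, 0], [10, 0, 0], [8, 0, 0]]
print(count)   # [[0, 2, 0], [1, 0, 0], [1, 0, 0]]
```

Keeping volume and count separately helps tell bandwidth-bound pairs (large volume) from latency-bound pairs (many small messages).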
Vampir 7: Displays for a WRF trace
[Screenshots: Master Timeline (Global Timeline); Process and Counter Timeline (Process Timeline, Counter Timeline); Function Summary; Message Summary; Process Summary; Communication Matrix; Call Tree]
Customizable Chart Layout
• No cluttering
• Time-based alignment
• View impact at a glance
• Simple controls (hidden)
• User defined
– Combination
– Rows and columns
– Arrangement
– Size
Dresden, September 15th: Comprehensive Performance Tracking with Vampir 7.0
[Screenshot: Vampir 7 window with Toolbars, Master Timeline, Secondary Timeline, Process Timeline, Function Summary, Function Group Summary, Call Tree, Function Legend, and Context View]
Sessions
• What is a session?
– Trace file
– Chart selection
– Layout
– Preferences (e.g. colors)
– Chart options
• Scope of session properties
– Identical for all traces
– Trace specific
– Matter of taste
– Therefore: scope is customizable
• Can be attached to trace data
• Toolbars
• Master Timeline
• Secondary Timeline
• Process Timeline
• Function Summary
• Function Group Summary
• Call Tree
• Function Legend
• Context View
[Figure: Trace File (OTF) and Config File]
Typical Performance Problems
Finding Bottlenecks
• Trace Visualization
– Vampir provides a number of display types
– Each allows many different options
• Advice
– Identify essential parts of an application (initialization, main iteration, I/O, finalization)
– Identify important components of the code (serial computation, MPI P2P, collective MPI, OpenMP)
– Make a hypothesis about performance problems
– Consider the application’s internal workings if known
– Select the appropriate displays
– Use statistics displays in conjunction with timelines
Finding Bottlenecks
• Communication
• Computation
• Memory, I/O, etc.
• Tracing itself
Bottlenecks in Communication
• Communications as such (dominating over computation)
• Late sender, late receiver
• Point-to-point messages instead of collective communication
• Unmatched messages
• Overloading of MPI’s buffers
• Bursts of large messages (bandwidth)
• Frequent short messages (latency)
• Unnecessary synchronization (barrier)
All of the above usually result in a high share of time spent in MPI
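Since all of these problems show up as a high MPI time share, a first check is to compute that share per process. A minimal Python sketch, assuming a hypothetical flat profile of exclusive times per function (so nested calls are not counted twice) and treating every function whose name starts with "MPI_" as MPI time; the 0.25 threshold in the example is an arbitrary choice:

```python
from typing import Dict, List


def mpi_time_share(profile: Dict[int, Dict[str, float]]) -> Dict[int, float]:
    """Fraction of each rank's total time spent in functions whose name starts with 'MPI_'."""
    shares: Dict[int, float] = {}
    for rank, times in profile.items():
        total = sum(times.values())
        mpi = sum(t for name, t in times.items() if name.startswith("MPI_"))
        shares[rank] = mpi / total if total > 0 else 0.0
    return shares


def ranks_above(shares: Dict[int, float], threshold: float) -> List[int]:
    """Ranks whose MPI share exceeds the threshold (the threshold is a user choice)."""
    return sorted(rank for rank, share in shares.items() if share > threshold)


profile = {
    0: {"compute": 6.0, "MPI_Send": 2.0, "MPI_Barrier": 2.0},
    1: {"compute": 9.0, "MPI_Recv": 1.0},
}
shares = mpi_time_share(profile)
print(shares)                     # {0: 0.4, 1: 0.1}
print(ranks_above(shares, 0.25))  # [0]
```

A high share only says that communication dominates; the timeline displays are still needed to tell which of the causes listed above is responsible.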
Bottlenecks in Communication
Unnecessary MPI_Barrier calls
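Why an unnecessary barrier costs time: every rank must wait until the last rank has arrived. A minimal Python sketch of that idealized wait (it ignores the barrier's own latency), assuming hypothetical per-rank arrival times taken from the trace:

```python
from typing import Dict, List


def barrier_wait_times(arrivals: Dict[int, float]) -> Dict[int, float]:
    """Idealized wait per rank: nobody leaves before the last rank arrives."""
    last_arrival = max(arrivals.values())
    return {rank: last_arrival - t for rank, t in arrivals.items()}


def total_barrier_wait(barriers: List[Dict[int, float]]) -> float:
    """Sum of idealized wait over all ranks and all barrier instances."""
    return sum(sum(barrier_wait_times(b).values()) for b in barriers)


b1 = {0: 1.0, 1: 3.0, 2: 2.5}
print(barrier_wait_times(b1))                              # {0: 2.0, 1: 0.0, 2: 0.5}
print(total_barrier_wait([b1, {0: 4.0, 1: 4.0, 2: 5.0}]))  # 4.5
```

If the barrier is not needed for correctness, this accumulated wait is pure loss.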
Bottlenecks in Communication
Patterns of successive MPI_Allreduce calls
Bottlenecks in Communication
Inefficient implementation of MPI_Allgatherv
Further Bottlenecks
• Unbalanced computation
– Single latecomer
• Strictly serial parts of program
– Idle processes/threads
• Very frequent tiny function calls
• Sparse loops
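A single latecomer can be spotted from per-rank computation times. The Python sketch below uses max/mean − 1 as the imbalance measure; this is one common choice picked for illustration, not a metric the slides define, and the input dictionary is hypothetical:

```python
from typing import Dict, Tuple


def load_imbalance(compute_time: Dict[int, float]) -> Tuple[float, int]:
    """Return (max/mean - 1, slowest rank). 0.0 means perfectly balanced."""
    mean = sum(compute_time.values()) / len(compute_time)
    slowest = max(compute_time, key=compute_time.get)
    return compute_time[slowest] / mean - 1.0, slowest


imbalance, latecomer = load_imbalance({0: 10.0, 1: 10.0, 2: 10.0, 3: 14.0})
print(f"{imbalance:.3f}", latecomer)  # 0.273 3
```

Here the other three ranks each idle for about 4 time units at the next synchronization point while rank 3 finishes.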
Further Bottlenecks
Example: Idle OpenMP threads
Bottlenecks in Computation
• Memory bound computation
– Inefficient L1/L2/L3 cache usage
– TLB misses
– Detectable via HW performance counters
• I/O bound computation
– Slow input/output
– Sequential I/O on single process
– I/O load imbalance
• Exception handling
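Hardware counters are usually recorded as cumulative values, so detecting memory-bound phases means turning them into per-interval rates first. A minimal Python sketch, assuming hypothetical samples of (time, cumulative FP operations, cumulative cache misses) and user-chosen thresholds:

```python
from typing import List, Tuple

# Hypothetical cumulative counter sample: (time_s, fp_ops, cache_misses).
Sample = Tuple[float, int, int]
# Per-interval rates: (t_start, t_end, fp_ops_per_s, misses_per_s).
Rate = Tuple[float, float, float, float]


def interval_rates(samples: List[Sample]) -> List[Rate]:
    """Turn cumulative counter samples into per-interval rates."""
    rates: List[Rate] = []
    for (t0, fp0, m0), (t1, fp1, m1) in zip(samples, samples[1:]):
        dt = t1 - t0
        rates.append((t0, t1, (fp1 - fp0) / dt, (m1 - m0) / dt))
    return rates


def memory_bound_intervals(rates: List[Rate], max_fp_rate: float,
                           min_miss_rate: float) -> List[Tuple[float, float]]:
    """Intervals with a low FP rate AND a high miss rate (thresholds are user choices)."""
    return [(t0, t1) for t0, t1, fp, miss in rates
            if fp < max_fp_rate and miss > min_miss_rate]


samples = [(0.0, 0, 0), (1.0, 1000, 10), (2.0, 1100, 500)]
rates = interval_rates(samples)
print(rates)  # [(0.0, 1.0, 1000.0, 10.0), (1.0, 2.0, 100.0, 490.0)]
print(memory_bound_intervals(rates, max_fp_rate=500.0, min_miss_rate=100.0))  # [(1.0, 2.0)]
```

Requiring both conditions avoids flagging phases that are slow for other reasons, such as I/O or FP exceptions.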
Bottlenecks in Computation
Low FP rate due to heavy cache misses
Bottlenecks in Computation
Low FP rate due to heavy FP exceptions
Bottlenecks in Computation
Irregular slow I/O operations
Effects due to Tracing
• Measurement overhead
– Especially grave for tiny function calls
– Solve with selective instrumentation
• Long/frequent/asynchronous trace buffer flushes
• Too many concurrent counters
• Heisenbugs
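Why tiny functions suffer most: the per-call recording cost is roughly fixed, so its relative weight grows as the function gets shorter. A minimal Python sketch for choosing candidates for selective instrumentation, assuming a hypothetical per-call overhead value and profile format (neither is a VampirTrace setting or output):

```python
from typing import Dict, List, Tuple

# Hypothetical per-function profile: name -> (call_count, total_time_s).
Profile = Dict[str, Tuple[int, float]]


def estimated_overhead(profile: Profile, per_call_overhead_s: float) -> Dict[str, float]:
    """Estimated tracing overhead in seconds per function (calls x assumed cost)."""
    return {name: calls * per_call_overhead_s for name, (calls, _) in profile.items()}


def functions_to_exclude(profile: Profile, per_call_overhead_s: float,
                         max_rel_overhead: float) -> List[str]:
    """Functions whose per-call overhead is large relative to their mean duration."""
    excluded = []
    for name, (calls, total) in profile.items():
        mean_duration = total / calls
        if per_call_overhead_s / mean_duration > max_rel_overhead:
            excluded.append(name)
    return sorted(excluded)


profile = {"tiny_helper": (1_000_000, 0.5), "solve": (100, 50.0)}
print(estimated_overhead(profile, 1e-6))
print(functions_to_exclude(profile, 1e-6, 0.1))  # ['tiny_helper']
```

With an assumed 1 µs per call, tiny_helper would spend about twice its own run time on recording, while solve is unaffected; excluding tiny_helper removes most of the overhead.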
Effects due to Tracing
Trace buffer flushes are explicitly marked in the trace.
A flush is rather harmless at the end of a trace, as shown here.
Conclusion
– Performance analysis is very important in HPC
– Use performance analysis tools for profiling and tracing
– Do not spend effort on DIY solutions such as printf debugging
– Use tracing tools with some precautions
• Data volume
– Let us know about problems and about feature wishes
– vampirsupport@zih.tu-dresden.de
Summary
• Vampir & VampirServer
– Interactive trace visualization and analysis
– Intuitive browsing and zooming
– Scalable to large trace data sizes (100 GByte)
– Scalable to high parallelism (20,000 processes)
• Vampir for Linux in progress, beta available
• VampirTrace
– Convenient instrumentation and measurement
– Hides away complicated details
– Provides many options and switches for experts
• VampirTrace is part of Open MPI > 1.3