PINTOS: An Execution Phase Based Optimization and Simulation Tool
Wei Hsu, Jinpyo Kim, Sreekumar Kodak
Computer Science Department, University of Minnesota
October 9, 2004
PIN Tutorial at ASPLOS'04
Outline
• What is Pintos?
• What can Pintos do?
• Phase detection for optimization and simulation
• Optimization (instruction prefetching)
• Fast Simulation
• Summary
What is Pintos?
• PINTOS is a PIN-based Tool for Optimization and Simulation
• A research framework that supports adaptive object code optimization
– Supports deep analysis of run-time program behavior for object code optimization (e.g. instruction and data prefetching)
– Integrates HPM performance monitoring (Pfmon) with dynamic instrumentation (PIN)
• Also supports fast performance simulation
– Identifies program phases (with coarse and fine granularity)
– Generates simulation strings that capture representative program behaviors
Pintos Framework
[Figure: Pintos framework diagram with two paths.
Optimization path: program, pfmon, profile, profile analysis, Opt targets, PIN-based Analysis (control flow, CacheSim), Filtered Opt Targets.
Simulation path: program, pfmon, profile, profile analysis, phase targets, PIN-based Phase Detection, Phase Info, Simulation String Gen, Simulation Strings.]
Our Background
• ADORE dynamic optimization system
[Figure: ADORE architecture. Components: Main Thread; Dynamic Optimization Thread (Phase Detection, Trace Selection, Optimization, Deployment); Code Cache; Kernel / Pfmon; Hardware Performance Monitoring Unit.]
ADORE Performance: Speedup of ORC2.1 +O2 Compiled SPEC2000 Benchmarks
[Figure: bar chart of per-benchmark speedup; y-axis from 0.00% to 30.00%. Bar labels in chart order (benchmark names not recoverable): 0.00%, 1.59%, 8.75%, 1.32%, 6.18%, 4.97%, 0.00%, 1.00%, 0.71%, 4.14%, 18.63%, 0.83%, 0.00%, 7.02%, 0.06%, 0.00%, 8.66%, 115.25%, 22.40%.]
ADORE Performance at Different Sampling Rates
What can Pintos do for us?
• Pintos uses pfmon to identify high-level performance problems (e.g. I-cache miss) and locate target code (phases) for optimization
• Pintos then uses a PIN-based analysis tool to focus on the target code (phases) and conduct deep analysis
• Pintos provides a framework to support deep analysis of program behavior so that we may experiment with new object code optimization techniques and feed them to ADORE.
• Simulation strings can be generated by Pintos and used for more efficient micro-architecture simulations
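The first step above, turning a sampled profile into optimization targets, can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample format (one code-region name per sampled I-cache miss), the function name `select_opt_targets`, the region names, and the 25% threshold are all hypothetical and are not pfmon's actual output format or the Pintos implementation.

```python
from collections import Counter

def select_opt_targets(miss_samples, min_share=0.10):
    """Return (region, share) pairs for regions holding at least min_share
    of all sampled misses, most frequent first.

    miss_samples: iterable of region names, one entry per sampled miss
    (a hypothetical, already-symbolized form of an HPM sample profile).
    """
    counts = Counter(miss_samples)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(region, n / total)
            for region, n in counts.most_common()
            if n / total >= min_share]

# Hypothetical profile: 10 sampled I-cache misses across three regions.
samples = ["eval"] * 6 + ["search"] * 3 + ["init"] * 1
targets = select_opt_targets(samples, min_share=0.25)
print([region for region, _ in targets])
```

Only the regions that pass the threshold would then be handed to the slower PIN-based analysis, which is the point of filtering with cheap hardware sampling first.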
Phase based Optimization and Simulation
• In Pintos, a phase is a sequence of code that consistently exhibits certain performance behaviors, for example
– Gzip shows consistent and repeated data cache miss patterns
– Crafty exhibits consistent I-cache misses
• A repeating phase can serve as a unit for dynamic and adaptive optimization, or for fast performance simulations.
– Optimization unit can be a basic block, trace, procedure, or region (loops and loop nests including complex control transfers)
– Simulation unit can be an extended code sequence
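One simple way to make "consistently exhibits certain performance behaviors" concrete is to group consecutive sampling intervals whose miss rate stays close to the running mean. The sketch below is an illustrative assumption, not the detector used in Pintos or ADORE; the function name, the absolute tolerance, and the minimum phase length are all hypothetical choices.

```python
def detect_stable_phases(miss_rates, tolerance=0.02, min_len=3):
    """Split a series of per-interval miss rates into stable runs.

    A run ends when a new sample differs from the run's mean by more than
    `tolerance`. Runs shorter than `min_len` intervals are discarded.
    Returns a list of (start, end, mean) with `end` exclusive.
    """
    phases = []
    start, total = 0, 0.0
    for i, rate in enumerate(miss_rates):
        count = i - start
        if count > 0 and abs(rate - total / count) > tolerance:
            if count >= min_len:
                phases.append((start, i, total / count))
            start, total = i, 0.0  # begin a new candidate run at this sample
        total += rate
    count = len(miss_rates) - start
    if count >= min_len:
        phases.append((start, len(miss_rates), total / count))
    return phases

# Hypothetical per-interval cache miss rates.
rates = [0.10, 0.11, 0.10, 0.30, 0.31, 0.29, 0.30, 0.05]
for start, end, mean in detect_stable_phases(rates):
    print(f"intervals [{start}, {end}) mean miss rate {mean:.3f}")
```

A larger interval size or tolerance yields coarser phases, and a smaller one yields finer phases, so the same idea can be tuned toward either granularity.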
Phase Detection
• One phase detection method doesn’t fit all needs.
– Dynamic data cache prefetching requires coarse grain
• Truncated Execution gives very inaccurate results
• Reduced Input sets do not always behave the same as reference inputs so the performance estimation based on reduced input sets may be misleading.
Mechanism of SMARTS
[Figure: timeline over program run time, divided into repeating segments labeled W, U, and (K-1)*U.]
W: Warm-up time (fixed to 2000 instructions for SPEC2000)
U: Detailed simulation (fixed to 1000 instructions for SPEC2000)
(K-1)*U: Functional simulation with functional warming (the tool gives the value of K for which the IPC will be within ±3% of the actual value at a 99.7% confidence level)
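The relationship between K and the ±3% / 99.7% target can be sketched with standard sampling theory: for a coefficient of variation V of the measured metric across sampling units, roughly n ≥ (z·V/ε)² samples are needed, with z = 3 for 99.7% confidence and ε = 0.03. The code below is a minimal sketch under those assumptions; the function names, the example V, and the treatment of W as part of the detailed-simulation cost per period are illustrative choices, not the SMARTS tool itself.

```python
import math

def required_samples(cv, rel_error=0.03, z=3.0):
    """Samples needed so the estimated mean is within rel_error of the true
    mean at the confidence level implied by z (z = 3 is about 99.7%)."""
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round((z * cv / rel_error) ** 2, 6))

def sampling_plan(total_instructions, cv, unit=1000, warmup=2000):
    """Pick K so that one unit of U instructions is measured every K units."""
    n = required_samples(cv)
    total_units = total_instructions // unit
    k = max(1, total_units // n)
    # Assumption: each period costs U + W detailed instructions, capped at
    # the period length K*U.
    detailed = min(k * unit, unit + warmup)
    return {"samples": n, "K": k, "detailed_fraction": detailed / (k * unit)}

# Hypothetical run: 10 billion instructions, coefficient of variation 1.0.
plan = sampling_plan(10_000_000_000, cv=1.0)
print(plan)
```

With these example inputs, only a small fraction of the run needs detailed simulation, which is where the speedup over full detailed simulation comes from.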
Issues in Previous Work
SMARTS
• The values of U and W are fixed for the SPEC2000 suite and have to be identified again for every new benchmark suite (very time consuming)
• Oversampling in steady phases; does not effectively exploit the existence of phases in programs
SIMPOINT
• The user chooses the length of the simulation point (100 million, 10 million, or 1 million instructions)
• Provides simulation points based on clustering of basic block profiles, which are generated using sim-fast or ATOM
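The final step of this kind of approach, choosing one representative interval per cluster of basic block profiles, can be sketched as below. This is a simplified illustration, not the SimPoint tool: the cluster labels are assumed to be given already (the clustering step is omitted), Manhattan distance is an arbitrary choice for the sketch, and all names and numbers are hypothetical.

```python
def normalize(bbv):
    """Scale a basic block vector so its entries sum to 1."""
    total = sum(bbv)
    return [x / total for x in bbv] if total else list(bbv)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def pick_simulation_points(bbvs, labels):
    """For each cluster, return the index of the interval closest to the
    cluster centroid. bbvs[i] is the basic block execution-count vector of
    interval i; labels[i] is its (precomputed) cluster id."""
    vectors = [normalize(v) for v in bbvs]
    points = {}
    for cluster in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cluster]
        dims = len(vectors[members[0]])
        centroid = [sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dims)]
        points[cluster] = min(
            members, key=lambda i: manhattan(vectors[i], centroid))
    return points

# Hypothetical profile: six intervals, three basic blocks, two clusters.
bbvs = [[90, 10, 0], [80, 20, 0], [100, 0, 0],
        [0, 10, 90], [0, 20, 80], [0, 30, 70]]
labels = [0, 0, 0, 1, 1, 1]
print(pick_simulation_points(bbvs, labels))
```

Each selected interval would then be simulated in detail and weighted by the size of its cluster.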
[Figure panels: Selected Simpoints; Simpoint Clusters; IPC Data from Itanium2]
Summary
• We show the combination of HPM sampling (Pfmon) and dynamic instrumentation (Pin) in our research framework (Pintos) for adaptive object code optimization and micro-architectural simulation.
• PASS (Phase Aware Stratified Sampling) may lead to a more efficient way of simulating the interaction between compiler optimizations and new micro-architectural features.