Identifying Program Power Phase Behavior Using Power Vectors
Canturk Isci & Margaret Martonosi
WWC-6
10.27.2003, Austin, TX
Jan 13, 2016
2
Power Phase Behavior
Existence of distinguishable intervals during an application’s execution lifetime such that:
They share significantly higher resemblance within themselves in terms of the power behavior the application exhibits on a given processor
This similarity is reflected not only in the total processor power, but also in the distribution of power across processor sub-units
[Figures: (Filtered) VPR, GAP and Gcc power breakdowns; power (W) vs. time (s)]
3
Our Power Phase Analysis
Goal:
Identify phases in program power behavior
Determine execution points that correspond to these phases
Define a small set of power signatures that represent overall power behavior
4
Our Approach
Outline:
Collect samples of estimated power values for processor sub-units <Power Vectors> at application runtime
Define a power vector similarity metric
Group sampled program execution into phases
Determine execution points and representative signature vectors for each phase group
Analyze the accuracy of our approximation
5
Motivation
Characterizing power behavior:
Future power-aware architectures and applications
Dynamic power/thermal management
Architecture research
Utilizing power vectors:
Direct relation to actual processor power consumption
Acquired at runtime
Identify program phases with no knowledge of the application
6
[Figure: Sample power vectors Vector(0) to Vector(24), stacked by component: RETIRE, Schedule, Inst Queue2, Inst Queue1, Rename, Allocation, Ucode ROM, 1st Level BPU, Trace Cache, Inst Dec, FP Regfile, INT Regfile, FP Exec, INT Exec, Data TLB, MEM control, MOB, L1 cache, ITLB & Fetch, 2nd Level BPU, L2 Cache, Bus Control]
Generating Power Vectors
[Diagram: POWERCLIENT and POWERSERVER]
Voltage readings (1 mV/Adc conversion) via RS232 to the logging machine: convert voltage to measured power
Counter-based access rates over Ethernet: convert access rates to component powers
22 entries in each power vector sample
7
Power Vector Similarity Metric
How to quantify the ‘power behavior dissimilarity’ between two execution points?
1. Consider solely the total power difference
2. Consider the Manhattan distance between the corresponding 2 vectors
3. Consider the Manhattan distance between the corresponding 2 normalized vectors
4. Consider a combination of (2) & (3)
Construct a “similarity matrix” to represent similarity among all pairs of execution points
Each entry in the similarity matrix:
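The candidate dissimilarity measures above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes each power vector is a list of component powers in watts and that "normalized" means dividing a vector by its own total power (an assumption). Option (4) is not a separate function here; it amounts to requiring both the absolute and the normalized distance to be small, which is how the grouping step uses them. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def total_power_distance(a: Vector, b: Vector) -> float:
    """Metric 1: absolute difference in total power only."""
    return abs(sum(a) - sum(b))


def manhattan(a: Vector, b: Vector) -> float:
    """Metric 2: Manhattan (L1) distance between absolute power vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    """Scale a vector so its components sum to 1 (assumed normalization)."""
    total = sum(v)
    return [x / total for x in v] if total else [0.0] * len(v)


def normalized_manhattan(a: Vector, b: Vector) -> float:
    """Metric 3: Manhattan distance between normalized power vectors."""
    return manhattan(normalize(a), normalize(b))


def similarity_matrix(
    vectors: Sequence[Vector], metric: Callable[[Vector, Vector], float]
) -> List[List[float]]:
    """Entry (r, c) is the dissimilarity between samples r and c (0 on the diagonal)."""
    n = len(vectors)
    return [[metric(vectors[r], vectors[c]) for c in range(n)] for r in range(n)]


# Metric 1 cannot tell these apart (same 4 W total, different breakdown) ...
print(total_power_distance([2.0, 2.0], [1.0, 3.0]))  # 0.0
# ... while metric 3 ignores a pure change in magnitude.
print(normalized_manhattan([2.0, 2.0], [4.0, 4.0]))  # 0.0
```

The two printed cases mirror the weaknesses of options (1) and (3) taken alone, which is the motivation for combining absolute and normalized distances.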
8
Gcc & Gzip Matrix Plots (similar to the SimPoints work of Sherwood et al.)
[Figures: Gcc and Gzip component power breakdowns (Gcc: L1 cache, Trace Cache, RETIRE; Gzip: L1 cache, INT Exec, RETIRE); Gcc and Gzip total power, measured vs. counter-estimated; power (W) vs. time (s)]
9
Gcc & Gzip Matrix Plots
[Figure: Gcc component power breakdowns (L1 cache, Trace Cache, RETIRE); power (W) vs. time (s)]
Gcc elaboration: highly variable power
Almost identical power behavior at 30 s, 50 s and 180 s.
Although 88 s, 110 s, 140 s, 210 s and 230 s show similar total power, 88 s, 210 s and 230 s share higher similarity with one another.
[Figures: Gcc total measured power; Gcc total power, measured vs. counter-estimated; power (W) vs. time (s)]
10
Gcc & Gzip Matrix Plots
Gzip elaboration: much more regular power behavior
Spurious total-power similarities are again distinguished by the similarity analysis
[Figures: GZIP total measured power; Gzip total power, measured vs. counter-estimated; Gzip component power breakdowns (L1 cache, INT Exec, RETIRE); power (W) vs. time (s)]
11
Grouping Execution Points
“Thresholding Algorithm”:
Define a threshold of similarity <% of max dissimilarity>
Start from the first execution point (0,0) and identify the points along the forward execution path that lie within the threshold for both the normalized and absolute metrics
Tag the corresponding execution points (j,j) as the same group
Find the next untagged execution point (r,r) and do the same along the forward path
Rule: a tagged execution point cannot add new elements to its group!
We demonstrate the outcome of thresholding with Grouping Matrices
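A minimal Python sketch of the thresholding algorithm, assuming the absolute and normalized similarity matrices have already been computed, and assuming a point already tagged by an earlier group is never re-tagged (the slide states the rule only for tagged points starting groups). `threshold_groups` is a hypothetical name, and the toy matrix reuses the 4-vector distance matrix from the grouping matrix example in the extra slides.

```python
from typing import List, Sequence


def threshold_groups(
    abs_dist: Sequence[Sequence[float]],
    norm_dist: Sequence[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Assign a group id to every execution point by forward thresholding.

    `threshold` is a fraction of the maximum dissimilarity (0.10 for 10%),
    applied separately to the absolute and the normalized distance matrix.
    """
    n = len(abs_dist)
    abs_limit = threshold * max(max(row) for row in abs_dist)
    norm_limit = threshold * max(max(row) for row in norm_dist)
    group = [-1] * n  # -1 marks an untagged execution point
    next_id = 0
    for r in range(n):
        if group[r] != -1:
            # Rule: a tagged point cannot start a group or add new elements.
            continue
        group[r] = next_id
        # Scan the forward execution path (j > r) for untagged close points.
        for j in range(r + 1, n):
            if (
                group[j] == -1
                and abs_dist[r][j] <= abs_limit
                and norm_dist[r][j] <= norm_limit
            ):
                group[j] = next_id
        next_id += 1
    return group


# Toy 4-point distance matrix, reused for both metrics for brevity.
D = [[0, 6, 3, 7], [6, 0, 3, 6], [3, 3, 0, 7], [7, 6, 7, 0]]
print(threshold_groups(D, D, 0.50))  # [0, 1, 0, 2]
print(threshold_groups(D, D, 0.10))  # [0, 1, 2, 3]
```

The number of distinct group ids is the number of power vectors needed to represent the whole execution, so a larger threshold trades accuracy for fewer signatures.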
12
Gzip Grouping Matrices
Gzip has 974 power vectors
Cluster vectors based on similarity using “thresholding”
Max Gzip power dissimilarity: 47.35 W
13
Generated Group Distributions
[Figures: Gzip group distribution (group # vs. time, s) for threshold = 1% and threshold = 10%]
14
Representative Vectors & Execution Points
We have each execution point assigned to a group, so we can represent the whole execution with as many power vectors as the number of generated groups.
Representative vectors:
For each group: define a representative vector as the average of all instances of that group
For each execution point: assign the corresponding group’s representative vector as that point’s power vector
Selected execution points:
For each group: select the execution point that started the group (the earliest point in each group)
For each execution point: assign the power vector of the selected execution point for that group as that point’s power vector
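Both reconstruction options can be sketched as below, assuming `groups[i]` holds the group id of execution point i (for example, from a thresholding pass) and each vector is a list of component powers. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def representative_vectors(
    vectors: Sequence[Vector], groups: Sequence[int]
) -> Dict[int, List[float]]:
    """Representative vector of each group: the average of all its members."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for v, g in zip(vectors, groups):
        if g not in sums:
            sums[g] = [0.0] * len(v)
            counts[g] = 0
        sums[g] = [s + x for s, x in zip(sums[g], v)]
        counts[g] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def selected_vectors(
    vectors: Sequence[Vector], groups: Sequence[int]
) -> Dict[int, List[float]]:
    """Power vector of the execution point that started each group (the earliest)."""
    chosen: Dict[int, List[float]] = {}
    for v, g in zip(vectors, groups):
        chosen.setdefault(g, list(v))  # first occurrence wins
    return chosen


def reconstruct(
    groups: Sequence[int], signatures: Dict[int, List[float]]
) -> List[List[float]]:
    """Assign every execution point the signature vector of its group."""
    return [signatures[g] for g in groups]


vectors = [[2.0, 4.0], [6.0, 0.0], [4.0, 2.0], [6.0, 2.0]]
groups = [0, 1, 0, 1]
print(representative_vectors(vectors, groups))  # {0: [3.0, 3.0], 1: [6.0, 1.0]}
print(selected_vectors(vectors, groups))        # {0: [2.0, 4.0], 1: [6.0, 0.0]}
```

Summing each reconstructed vector gives the reconstructed total power trace at every execution point.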
15
Reconstructing Power Trace with Representative Vectors:
[Figures: Reconstructed GZIP power for threshold = 1% (254 vectors) and threshold = 10% (33 vectors); power (W) vs. time (s); curves: TOTAL_MEASURED_POWER, TOTAL_MODELED_POWER, RECONSTRUCTED_POWER (representative vectors)]
16
Reconstructing Power Trace with Selected Execution Points:
[Figures: Reconstructed GZIP power for threshold = 10% (33 vectors) and threshold = 1% (254 vectors); power (W) vs. time (s); curves: TOTAL_MEASURED_POWER, TOTAL_MODELED_POWER, RECONSTRUCTED_POWER (vectors based on selected execution points)]
17
Component Power Characterizations
[Figures: GZIP modeled power, vector components; GZIP reconstructed power, vector components (representative vectors); GZIP reconstructed power, vector components (vectors based on selected execution points); power stacked by the 22 components, Bus Control through RETIRE, vs. time from 44 s to 462 s]
18
[Figures: GZIP reconstructed power, absolute errors; absolute difference (W) and power (W) vs. time (s), for vectors based on selected execution points and for representative vectors]
Approximation Error
Due to the thresholding algorithm, errors for selected execution points are bounded by the threshold: max error 4.71 W, RMS error 3.08 W
As representative vectors are group centroids, cumulative errors for representative vectors are lower: RMS error 2.31 W, although the max error rises to 7.10 W
Error in total power ≤ Σ(component errors)
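The two error figures and the inequality on the last line can be checked with a short sketch. By the triangle inequality, |Σ_i (o_i - r_i)| ≤ Σ_i |o_i - r_i|, so the total-power error never exceeds the summed component errors. For a selected execution point, that sum is the absolute Manhattan distance to the point that started its group, which thresholding keeps within threshold × max dissimilarity (for instance, 10% of Gzip's 47.35 W is 4.735 W). The helper names and sample vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def total_power_errors(
    original: Sequence[Vector], reconstructed: Sequence[Vector]
) -> List[float]:
    """|total original power - total reconstructed power| at each execution point."""
    return [abs(sum(o) - sum(r)) for o, r in zip(original, reconstructed)]


def component_error_sums(
    original: Sequence[Vector], reconstructed: Sequence[Vector]
) -> List[float]:
    """Sum of per-component absolute errors (Manhattan distance) at each point."""
    return [
        sum(abs(x - y) for x, y in zip(o, r))
        for o, r in zip(original, reconstructed)
    ]


def max_and_rms(errors: Sequence[float]) -> Tuple[float, float]:
    """Maximum and root-mean-square of a series of errors."""
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return max(errors), rms


original = [[3.0, 1.0], [2.0, 2.0]]
reconstructed = [[1.0, 3.0], [2.0, 1.0]]
totals = total_power_errors(original, reconstructed)        # [0.0, 1.0]
components = component_error_sums(original, reconstructed)  # [4.0, 1.0]
assert all(t <= c for t, c in zip(totals, components))      # triangle inequality
```

The first sample point shows why total power alone can hide large component-level differences: its total error is zero while its component errors sum to 4 W.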
19
Conclusion
Presented a power-oriented methodology to identify program phases that uses power vectors generated during program runtime
Provided a similarity metric to quantify the power behavior similarity of different execution samples
Demonstrated our representative sampling technique to characterize program power behavior
Can be useful for power & characterization research:
Power phase identification/prediction
Reduced power simulation
Dynamic power/thermal management
20
Related Work
Dhodapkar and Smith [ISCA’02]: working set signatures to detect phase changes
Sherwood et al. [PACT’01, ASPLOS’02, ISCA’03]: similarity analysis based on program basic block profiles to identify phases
Todi [WWC’01]: clustering based on counter information to identify similar behavior
Our work in comparison:
Power oriented
Power behavior similarity metric
Runtime
No information about the application is required
Bounded approximation error with thresholding
21
EOP
22
Power Vector Components: more detail on power vectors
Different Similarity Metrics: similarity matrices and equations for all discussed techniques
Similarity & Grouping Matrices: exemplified description of the two matrices and plots
Current & Future Research: discussion of the ongoing and future research; also includes some new ideas and some things to do to make our current analysis solid
Questions & Rebuttals: starts with the discussion of the presented work; discusses some shortcomings and things that need to be done to improve it and to verify that it is unique and solid (some parts of the current work also discuss these issues)
Also provides some answers to reviewers’ questions; includes some possible new ideas
EXTRA SLIDES
23
Defining Components
24
P4 Architecture vs Layout
Components to Model:
1) Bus Control
2) L2 Cache
3) 2nd Level BPU
4) ITLB & Ifetch
5) L1 Cache
6) MOB
7) Mem Control
8) DTLB
9) Int EXE
10) FP EXE
11) Int RF
12) FP RF
13) Decode
14) Trace $
15) 1st Level BPU
16) Microcode ROM
17) Allocation
18) Rename
19) Inst-n Qs
20) Schedule
21) Inst-n Qs
22) Retirement
25
Defining Events for Access Rates
We determined 24 events to approximate access rates for 22 components
Used several heuristics to represent each access rate
Examples:
Need to rotate counters 4 times to collect all event data: used 15 counters & 4 rotations
26
Access Rates to Component Powers
“Performance counter based access rate estimations are used as a proxy for max component power weighting, together with microarchitectural details, in order to estimate processor sub-unit powers”
Example: the trace cache delivers 3 uops/cycle in deliver mode and 1 uop/cycle in build mode:
Power(TC) = [Access-Rate(TC)/3 + Access-Rate(ID)] x MaxPower(TC) + Non-gated TC CLK power
Total power is computed as the sum of all 22 component powers + measured idle power (8 W):
Total Power = Σ(i=1..22) Power(C_i) + Idle Power
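To make the arithmetic concrete, here is a Python sketch of the trace cache example and the total-power sum. The access rates and power values in the usage line are made-up numbers, and reading Access-Rate(ID) as the instruction decoder's access rate (build mode) is an assumption.

```python
from typing import Mapping

NUM_COMPONENTS = 22
IDLE_POWER_W = 8.0  # measured idle power quoted on the slide


def trace_cache_power(
    access_rate_tc: float,
    access_rate_id: float,
    max_power_tc: float,
    nongated_clk_power_tc: float,
) -> float:
    """Power(TC) = [AR(TC)/3 + AR(ID)] x MaxPower(TC) + non-gated TC clock power.

    AR(TC) is divided by 3 because the trace cache can deliver 3 uops/cycle
    in deliver mode; AR(ID) (assumed: decoder access rate) covers build mode.
    """
    return (access_rate_tc / 3.0 + access_rate_id) * max_power_tc + nongated_clk_power_tc


def total_power(component_powers: Mapping[str, float]) -> float:
    """Sum of all 22 component powers plus the measured idle power."""
    if len(component_powers) != NUM_COMPONENTS:
        raise ValueError(
            f"expected {NUM_COMPONENTS} components, got {len(component_powers)}"
        )
    return sum(component_powers.values()) + IDLE_POWER_W


# Hypothetical numbers, for illustration only.
print(trace_cache_power(1.5, 0.25, 4.0, 0.5))  # 3.5
```

Because the idle term is the same constant at every sample, it cancels out when two power vectors are compared.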
27
Counter Access Heuristics
1) BUS CONTROL:
No 3rd-level cache: BSQ allocations ~ IOQ allocations
Metric 1: Bus accesses from all agents
Event: IOQ_allocation (counts various types of bus transactions; should account for BSQ as well; access based rather than duration based)
MASK: default request type, all read (128B) and write (64B) types; include OWN, OTHER and PREFETCH
Metric 2: Bus utilization (the % of time the bus is utilized)
Event: FSB_data_activity (counts DataReaDY and DataBuSY events on the bus)
Mask: count when the processor or other agents drive/read/reserve the bus
Expression: FSB_data_activity x BusRatio / Clocks Elapsed (to account for clock ratios)
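The Metric 2 expression is a one-line calculation. The sketch below assumes FSB_data_activity is counted in bus clocks while Clocks Elapsed is in core clocks, so multiplying by BusRatio (core-to-bus clock ratio) puts both on the same scale; the function name and sample counts are hypothetical.

```python
def bus_utilization(fsb_data_activity: int, bus_ratio: float, clocks_elapsed: int) -> float:
    """Fraction of time the bus is utilized: FSB_data_activity x BusRatio / Clocks Elapsed.

    Assumes fsb_data_activity counts bus clocks and clocks_elapsed counts core
    clocks; bus_ratio (core clock / bus clock) converts the former to core clocks.
    """
    if clocks_elapsed <= 0:
        raise ValueError("clocks_elapsed must be positive")
    return fsb_data_activity * bus_ratio / clocks_elapsed


# Hypothetical counts: 250 busy bus clocks at a 4:1 clock ratio over 10,000 core clocks.
print(bus_utilization(250, 4.0, 10_000))  # 0.1
```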
28
Counter Access Heuristics
2) L2 Cache:
Metric: 2nd-level cache references
Event: BSQ_cache_reference (counts cache references as seen by the bus unit)
MASK: all MESI read misses (LD & RFO); 2nd-level WR misses
3) 2nd Level BPU:
Metric 1: Instructions fetched from L2 (predict)
Event: ITLB_Reference (counts ITLB translations)
Mask: all hits, misses & UC hits
Metric 2: Branches retired (history update)
Event: branch_retired (counts branches retired)
Mask: count all Taken/NT/Predicted/MissP
29
Counter Access Heuristics
4) ITLB & I-Fetch:
etc.
10) FP Execution:
Metric: FP instructions executed
event1: packed_SP_uop (counts packed single-precision uops)
event2: packed_DP_uop (counts packed double-precision uops)
event3: scalar_SP_uop (counts scalar single-precision uops)
event4: scalar_DP_uop (counts scalar double-precision uops)
event5: 64bit_MMX_uop (counts MMX uops with 64-bit SIMD operands)
event6: 128bit_MMX_uop (counts integer SSE2 uops with 128-bit SIMD operands)
event7: x87_FP_UOP (counts x87 FP uops)
event8: x87_SIMD_moves_uop (counts x87, FP, MMX, SSE, SSE2 ld/st/mov uops)
30
Similarity Based on Total Power
31
Similarity Based on Absolute Power Vectors
32
Similarity Based on Normalized Power Vectors
33
Similarity Based on Both Absolute and Normalized Power Vectors
Similarity Matrix Example
Consider 4 vectors, each with 4 dimensions:
Vector 0: (1, 2, 3, 5)
Vector 1: (2, 1, 5, 3)
Vector 2: (2, 2, 4, 4)
Vector 3: (4, 3, 5, 1)
Exemplary Manhattan distance calculation between vectors 1 & 2:
|2 - 2| + |2 - 1| + |4 - 5| + |4 - 3| = 3
Log all distances in the similarity matrix:
0 6 3 7
6 0 3 6
3 3 0 7
7 6 7 0
Color-scale from black to white (only for the upper diagonal)
35
Interpreting Similarity Matrix Plot
The level of darkness at any location (r,c) shows the amount of similarity between vectors (samples) r and c, e.g. 0 and 2
All samples are perfectly similar to themselves: all (r,r) are black
Vertically above the diagonal: similarity of the sample on the diagonal to previous samples, e.g. 1 vs. 0
Horizontally right of the diagonal: similarity of the sample on the diagonal to future samples, e.g. 1 vs. 2, 3
Grouping Matrix Example
Consider the same 4 vectors, with similarity matrix:
0 6 3 7
6 0 3 6
3 3 0 7
7 6 7 0
Maximum distance between vectors: 7
Mark execution pairs with distance ≤ threshold
Threshold 50%, i.e. 3.5: vectors with distance 3 can be grouped together
Threshold 10%, i.e. 0.7: none can be grouped together
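The marking step can be sketched in Python using this example's distance matrix. The function names are illustrative, and the threshold is taken as a fraction of the maximum distance, as on the slide.

```python
from typing import List, Sequence, Tuple


def grouping_matrix(dist: Sequence[Sequence[float]], threshold: float) -> List[List[bool]]:
    """Mark every pair (r, c) whose distance is <= threshold x (max distance)."""
    limit = threshold * max(max(row) for row in dist)
    return [[d <= limit for d in row] for row in dist]


def grouped_pairs(marks: Sequence[Sequence[bool]]) -> List[Tuple[int, int]]:
    """Marked pairs above the diagonal (the diagonal itself is always marked)."""
    n = len(marks)
    return [(r, c) for r in range(n) for c in range(r + 1, n) if marks[r][c]]


# Distance matrix of the 4-vector example (maximum distance 7).
DIST = [[0, 6, 3, 7], [6, 0, 3, 6], [3, 3, 0, 7], [7, 6, 7, 0]]
print(grouped_pairs(grouping_matrix(DIST, 0.50)))  # [(0, 2), (1, 2)]
print(grouped_pairs(grouping_matrix(DIST, 0.10)))  # []
```

At 50% only the two pairs at distance 3 fall under the 3.5 limit; at 10% the 0.7 limit leaves only the diagonal marked.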
37
Current & Future Research
FOLLOWING SLIDES DISCUSS ONGOING RESEARCH RELATED TO POWER PHASES. PLANS FOR FUTURE RESEARCH ARE ALSO DISCUSSED
38
THE BIG PICTURE
[Diagram: Performance Monitoring, Real Power Measurement, Power Modeling, Thermal Modeling; Power Phases, Program Profiling, Program Structure]
To estimate component power & temperature breakdowns for the P4 at runtime…
To analyze how power phase behavior relates to program structure
Bottom line…
39
Phase Branch
[Roadmap diagram: Power Modeling, Power Phases, Program Profiling, Program Structure]
Power Phase Behavior: similarity based on power vectors; identifying similar program regions
Profiling Execution Flow: sampling the process’ execution; “PCsampler” LKM
Program Structure: execution vs. code space; power phases vs. execution phases
NOT YET?
40
POWER PHASE BEHAVIOR
[Roadmap diagram repeated: Power Modeling, Power Phases, Program Profiling, Program Structure]
41
Identifying Power Phases
Most of the methodology is ready: complete Gzip case in WWC
Extensibility to other benchmarks: generated similarity metrics for several benchmarks; performing phase identification with thresholding
Repeatability of the experiment
Several other possible ideas, such as:
Thresholding + k-means clustering
Two-pass thresholding
PCA for dimension reduction (or SVD?)
Other norms: Manhattan = L1 norm; Euclidean (L2): not interesting; Chebyshev (L∞): ??
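For comparison, the norms listed above (L1/Manhattan, L2/Euclidean, L∞/Chebyshev) differ only in how component differences are combined. A minimal Python sketch, with hypothetical function names:

```python
from typing import Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """L_p distance: p = 1 is Manhattan, p = 2 is Euclidean."""
    if p < 1:
        raise ValueError("p must be >= 1 for a proper norm")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """L_inf distance: the single largest component difference."""
    return max(abs(x - y) for x, y in zip(a, b))


a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski(a, b, 1), minkowski(a, b, 2), chebyshev(a, b))  # 7.0 5.0 4.0
```

As p grows, the distance is increasingly dominated by the component with the largest difference, which is why L∞ reacts to a single sub-unit changing while L1 weighs all 22 components equally.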
42
Program Execution Profile
[Roadmap diagram repeated: Power Modeling, Power Phases, Program Profiling, Program Structure]
43
Program Execution Profile
Sample program flow simultaneously with power: our LKM implementation, “PCsampler”
Not finished…
Generate code-space similarity in parallel with power-space similarity
Relative comparisons of methods for: complexity, accuracy, applicability, etc.
44
CURRENT STATE
Sample PC
Binding to functions
Reacquire PID:
Those SPECs, runspec: always at a fixed address at ELF_program interpreter
Benches: change PID between datasets
Verify PC with objdump, so we can make sure it is the PC we’re sampling
45
Initial Data: PC?? Trace for gzip-source
[Figure: Gzip-source PC distribution; memory location (134515000 to 134545000) vs. sample (0 to 160)]
Correspond to the functions: send_bits, bit_reverse, lm_init, longest_match, fill_window
46
Discussion From Here On
Generality of the technique: do all SPEC benchmarks show distinct phase behavior?
Repeatability of the experiment: need to be able to arrive at similar phase behavior in order to characterize an application
Correlation between vector components: inherent redundancy in power vectors could be removed with PCA
Alternative norms for similarity
Applicability of selected execution points
47
Similarity Analysis for Other Applications?
SPECs show similar applicable behavior
Not always phase-like, e.g. twolf shows more of a power gradient
Results for other benchmarks (Gcc & Twolf): <NOT READY>
Number of groups w.r.t. thresholds
Error plots for reconstructed & selected vectors
Apply to other applications, e.g. desktop applications:
Will follow the bursty behavior; maybe determine action signatures?? (saving, computation, streaming, etc.)
Ghostscript might be interesting: correlation between phases and locations of images
48
Dependence of Results on the Applied Power Model?
The generic technique requires only sufficiently detailed power breakdowns that add up to total power; otherwise it doesn’t matter how you acquire the ‘power vectors’
If you use characterization data other than power vectors, you can still perform phase analysis, but cannot provide a direct estimate of the reconstructed power behavior
Maybe use some kind of mapping? Log measured power data as well as characterization metrics?
Would still be unable to predict component-wise
49
Possibility for Other Processors?
Most recent processors are keen on power management: there will be enough power variability to exploit for power phase analysis
Porting the power estimation to other architectures requires significant effort to:
Define power-related metrics
Implement counter reader and power estimation user and kernel SW
Porting to the same architecture, different implementation, is more straightforward:
Reevaluate max/idle/gated power estimates
Experiences with other architectures:
Castle project for Pentium Pro (P6): few watts of variation; low dimensionality
IBM Power3 II: very low measured power variation
50
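Other Statistical Techniques?
Alternative measures of distance:
Different norms
Canberra distance
Squared chi-squared distance, etc.
Other similarity metrics:
Pearson’s correlation coefficient
Cosine similarity
$$L_N(r,c) = \left( \sum_{i=1}^{22} \left| PV_r(i) - PV_c(i) \right|^{N} \right)^{1/N}$$
$$C(r,c) = \sum_{i=1}^{22} \frac{\left| PV_r(i) - PV_c(i) \right|}{\left| PV_r(i) \right| + \left| PV_c(i) \right|}$$
$$PC(r,c) = \frac{\sum_{i=1}^{22} \left( PV_r(i) - \overline{PV_r} \right)\left( PV_c(i) - \overline{PV_c} \right)}{\sqrt{\sum_{i=1}^{22} \left( PV_r(i) - \overline{PV_r} \right)^2}\,\sqrt{\sum_{i=1}^{22} \left( PV_c(i) - \overline{PV_c} \right)^2}}, \qquad \overline{PV_r} = \frac{1}{22}\sum_{i=1}^{22} PV_r(i)$$
$$CS(r,c) = \frac{\sum_{i=1}^{22} PV_r(i)\,PV_c(i)}{\sqrt{\sum_{i=1}^{22} PV_r(i)^2}\,\sqrt{\sum_{i=1}^{22} PV_c(i)^2}}$$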
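The remaining measures above can be sketched in Python following their standard definitions; how zero denominators are handled is an assumption of this sketch, not something the slide specifies. Like normalized Manhattan distance, cosine similarity and Pearson's correlation ignore the overall magnitude of power consumption. Function names are hypothetical.

```python
import math
from typing import Sequence


def canberra(a: Sequence[float], b: Sequence[float]) -> float:
    """Canberra distance; terms where both components are 0 contribute 0."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two power vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    dev_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    dev_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if dev_a == 0 or dev_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (dev_a * dev_b)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; insensitive to overall scale."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Note that C(r,c) is a distance (0 for identical vectors) while PC(r,c) and CS(r,c) are similarities (1 for perfectly matching shapes), so a threshold would have to be applied in the opposite direction.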
51
SPEClite vs. SimPoints vs. Power Vectors
Approaches of the 3 methods:
SPEClite: performance counts (runtime) (performance oriented) (k-means partitioning for vectors normalized to 0 mean and unit variance) (PCA to create reduced vectors) (selects vectors closest to centroids)
SimPoints: basic block accesses (simulation) (performance oriented) (k-means partitioning for normalized basic block vectors) (selects vectors closest to centroids, except for ‘Early’ SimPoints)
Power Vectors: component power consumptions (runtime) (power oriented) (threshold-based partitioning for ‘normalized + absolute’ vectors) (selects the earliest vector of each group)
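The practical differences between the selection policies can be illustrated with a small Python sketch: SPEClite-style zero-mean, unit-variance normalization, selection of the point closest to each group centroid (SPEClite, SimPoints), and selection of the earliest point (power vectors). The clustering step itself (k-means or thresholding) is assumed to have already produced the group labels, and using Euclidean distance to the centroid is an assumption of this sketch. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def zscore_columns(vectors: Sequence[Vector]) -> List[List[float]]:
    """Normalize every dimension to zero mean and unit (population) variance."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n)
        for d in range(dims)
    ]
    # A constant dimension carries no information, so it maps to 0.
    return [
        [(v[d] - means[d]) / stds[d] if stds[d] else 0.0 for d in range(dims)]
        for v in vectors
    ]


def earliest_per_group(groups: Sequence[int]) -> Dict[int, int]:
    """Power-vector policy: index of the first execution point of each group."""
    first: Dict[int, int] = {}
    for i, g in enumerate(groups):
        first.setdefault(g, i)
    return first


def closest_to_centroid(vectors: Sequence[Vector], groups: Sequence[int]) -> Dict[int, int]:
    """SPEClite/SimPoints-style policy: index of the member nearest its group centroid."""
    members: Dict[int, List[int]] = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    chosen: Dict[int, int] = {}
    for g, idx in members.items():
        dims = len(vectors[idx[0]])
        centroid = [sum(vectors[i][d] for i in idx) / len(idx) for d in range(dims)]
        chosen[g] = min(idx, key=lambda i, c=centroid: math.dist(vectors[i], c))
    return chosen


vectors = [[0.0], [10.0], [4.0], [6.0]]
groups = [0, 1, 0, 0]
print(earliest_per_group(groups))            # {0: 0, 1: 1}
print(closest_to_centroid(vectors, groups))  # {0: 2, 1: 1}
```

Choosing the earliest point is what lets thresholding bound the error of selected execution points, whereas a centroid-nearest point usually gives lower average error without that bound.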
52
Why Power Vectors w.r.t. Others?
Provides a direct interpretation of power consumption: could be used to identify specific power behavior for dynamic power/thermal management
Power phases might not be a perfect translation of performance phases <CURRENT WORK INVESTIGATES>
e.g. the same basic block accesses during different architectural states
Generated at runtime: easy repeatability, etc.
Thresholding provides an upper-bound estimate for the power approximation with selected execution points
53
Modified Stuff
FOLLOWING SLIDES I MODIFIED FROM THE ORIGINAL, BUT STILL KEEP ‘EM
54
Our Power Phase Analysis
Goal:
Identify phases in program power behavior
Determine execution points that correspond to these phases
Define a small set of power signatures that represent overall power behavior
Our Approach:
Collect samples of estimated power values for processor sub-units <Power Vectors> at application runtime
Define a similarity metric over these power vectors
Process the outcome of this similarity metric to group sampled execution points (power vectors) into phases
Determine execution points and representative signature vectors for each phase group
Quantify how close our approximation based on these vectors is to the original power behavior
55
Motivation
Characterizing power behavior:
Future power-aware architectures and applications
Dynamic power/thermal management
Multiconfigurable hardware
Thread scheduling, DVS, DFS
Recurring phase prediction
Architecture research
Representative (reduced) simulation points
Utilizing power vectors:
Direct relation to actual processor power consumption
Acquired at runtime
Similarity relations generated quickly
Easy repeatability for different datasets/compilations
Identify (recurring) phases over large scales of execution
Identify program phases with no knowledge of the application (i.e. no basic block profile, PC sampling, code space info, etc.)
56
Power Vector Similarity Metric
How to quantify the ‘power behavior dissimilarity’ between two execution points?
1. Consider solely the total power difference: conceals significant phase information
2. Consider the Manhattan distance between the corresponding 2 vectors: vectors with small magnitudes are inherently closer
3. Consider the Manhattan distance between the corresponding 2 normalized vectors: indifferent to the magnitude of power consumption
4. Consider a combination of (2) & (3): requires both absolute and ratio-wise similarity
Construct a “similarity matrix” to represent similarity among all pairs of execution points
Each entry in the similarity matrix:
57
Gcc & Gzip Matrix Plots
[Figures: Gcc and Gzip component power breakdowns (Gcc: L1 cache, Trace Cache, RETIRE; Gzip: L1 cache, INT Exec, RETIRE); Gcc and Gzip total power, measured vs. counter-estimated; power (W) vs. time (s)]
Gcc elaboration: highly variable power
Almost identical power behavior at 30 s, 50 s and 180 s.
Although 88 s, 110 s, 140 s, 210 s and 230 s show similar total power, 88 s, 210 s and 230 s share higher similarity with one another.
[Figure: Gcc total measured power; power (W) vs. time (s)]
Gzip elaboration: much more regular power behavior
Spurious similarities, such as those at 100-150 s and 200-280 s, are distinguished by the similarity analysis
[Figure: GZIP total measured power; power (W) vs. time (s)]
58
Similarity for Simplicity?
So, we can identify similar power phases:
Informally: if similarity matrix(r,c) is DARK, execution points r & c have similar power behavior
2 questions:
1) How do we group the execution points (power vectors) based on their similarity?
2) Could we represent power behavior with reasonable accuracy with a small number of ‘signature’ vectors?
Our answer to Q.1: “Thresholding Algorithm”:
Define a threshold of similarity <% of max dissimilarity>
Start from the first execution point (0,0) and identify the points along the forward execution path that lie within the threshold for both the normalized and absolute metrics
Tag the corresponding execution points (j,j) as the same group
Find the next untagged execution point (r,r) and do the same along the forward path
Rule: a tagged execution point cannot add new elements to its group!
We demonstrate the outcome of thresholding with Grouping Matrices
59
Gzip Grouping Matrices
Gzip has 974 power vectors
Cluster vectors based on similarity using “thresholding”
Max Gzip power dissimilarity: 47.35 W
60
Conclusion
Presented a power-oriented methodology to identify program phases that uses power vectors generated during program runtime
Provided a similarity metric to quantify the power behavior similarity of different execution samples
Demonstrated our representative sampling technique to characterize program power behavior:
Representative vectors for program power signatures
Execution points for representative simulation
Can be useful for power & characterization research:
Power phase identification/prediction
Reduced power simulation
Dynamic power/thermal management