Power Management in High Performance Processors through Dynamic Resource Adaptation and Multiple Sleep Mode Assignments Houman Homayoun National Science Foundation Computing Innovation Fellow Department of Computer Science University of California San Diego
Brief overview of state-of-the-art superscalar processors
Introducing the idea of multiple sleep modes design
Architectural control of multiple sleep modes
Results
Conclusions
Can identify the high IPB period, once the first low IPB period is detected.
The number of fetched branches is counted every 512 cycles. Once the number of branches is found to be less than a certain threshold (24 in this work), a high IPB period is identified. The IPB is then predicted to remain high for the next twenty 512-cycle intervals (10K cycles).
Branch predictor peripherals transition from basic-lp mode to lp mode when a high IPB period is identified.
During pre-stall and stall periods the branch predictor peripherals transition to aggr-lp and ultra-lp mode, respectively.
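The interval-based detection and the mode transitions above can be sketched in software as follows. This is a minimal illustrative model, not the hardware described in the talk: the class and function names are hypothetical, and three behaviors are assumptions the slides do not state (branch counts seen inside the 20-interval prediction window are ignored, stall and pre-stall take priority over the IPB state, and the peripherals return to basic-lp when no high IPB period is predicted).

```python
INTERVAL_CYCLES = 512    # branches are counted over 512-cycle intervals (from the slides)
BRANCH_THRESHOLD = 24    # fewer fetched branches than this marks a high IPB period
HOLD_INTERVALS = 20      # high IPB is predicted for the next twenty intervals


class IPBMonitor:
    """Hypothetical model of the interval-based high-IPB detector."""

    def __init__(self):
        self.hold_remaining = 0  # intervals still covered by the prediction

    def end_of_interval(self, fetched_branches):
        """Call at the end of each interval.

        Returns True if the next interval is predicted to have high IPB.
        """
        if self.hold_remaining > 0:
            # Assumption: counts inside the prediction window are ignored.
            self.hold_remaining -= 1
        elif fetched_branches < BRANCH_THRESHOLD:
            self.hold_remaining = HOLD_INTERVALS
        return self.hold_remaining > 0


def predictor_peripheral_mode(high_ipb, pre_stall=False, stall=False):
    """Pick the sleep mode of the branch predictor peripherals.

    Assumption: stall overrides pre-stall, which overrides the IPB state.
    """
    if stall:
        return "ultra-lp"
    if pre_stall:
        return "aggr-lp"
    return "lp" if high_ipb else "basic-lp"
```

A driver would call `end_of_interval` once every `INTERVAL_CYCLES` cycles and pass the result to `predictor_peripheral_mode` together with the current stall state.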
The Instruction Queue is a CAM-like structure which holds instructions until they can be issued.
Set entries for newly dispatched instructions
Read entries to issue instructions to functional units
Wake up instructions waiting in the IQ once a result is ready
Select instructions for issue when the number of available instructions exceeds the processor issue limit (Issue Width).
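The four instruction queue operations listed above can be illustrated with a small behavioral model. It is only a sketch: the names are hypothetical, result tags are plain strings, and the oldest-first selection policy is an assumption, since the slides do not say how instructions are chosen when more are ready than the issue width allows.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare entries by identity, not by field values
class IQEntry:
    name: str
    waiting_on: set  # source tags whose results have not been produced yet


class InstructionQueue:
    """Hypothetical behavioral model of set / wakeup / select / read."""

    def __init__(self, issue_width):
        self.issue_width = issue_width
        self.entries = []  # kept in dispatch order, oldest first

    def dispatch(self, name, source_tags):
        """Set an entry for a newly dispatched instruction."""
        self.entries.append(IQEntry(name, set(source_tags)))

    def wakeup(self, result_tag):
        """A result is ready: clear that tag in every waiting entry."""
        for entry in self.entries:
            entry.waiting_on.discard(result_tag)

    def select_and_issue(self):
        """Select up to issue_width ready entries and read them out.

        Assumption: oldest-first selection.
        """
        ready = [e for e in self.entries if not e.waiting_on]
        issued = ready[: self.issue_width]
        self.entries = [e for e in self.entries
                        if not any(e is i for i in issued)]
        return [e.name for e in issued]
```

With an issue width of 2 and three ready instructions, only the two oldest leave the queue in that cycle; the third waits for the next call to `select_and_issue`.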
No need to always have such aggressive wakeup/issue width!
At each cycle, the matchlines are precharged high to allow the individual bits associated with an instruction tag to be compared with the results broadcast on the taglines.
Upon a mismatch, the corresponding matchline is discharged. Otherwise, the matchline stays at Vdd, which indicates a tag match.
Because up to 4 instructions are broadcast on the taglines each cycle, four sets of one-bit comparators are needed for each one-bit cell.
All four matchlines must be ORed together to detect a match on any of the broadcast tags.
The result of the OR sets the ready bit of the instruction's source operand.
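The matchline behavior described above can be mimicked bit by bit in software. The sketch below is a logical model only, with hypothetical function names; the 8-bit tag width is an assumption chosen for the example, while the limit of four broadcast tags per cycle comes from the slides.

```python
BROADCAST_PORTS = 4  # up to 4 result tags are broadcast per cycle (from the slides)
TAG_BITS = 8         # assumption: tag width chosen only for this example


def matchline(stored_tag, broadcast_tag, tag_bits=TAG_BITS):
    """Return True if the matchline stays high, i.e. the tags match."""
    line_high = True  # the matchline is precharged high every cycle
    for bit in range(tag_bits):
        stored_bit = (stored_tag >> bit) & 1
        broadcast_bit = (broadcast_tag >> bit) & 1
        if stored_bit != broadcast_bit:
            line_high = False  # any mismatching bit discharges the line
    return line_high


def source_operand_ready(source_tag, broadcast_tags):
    """OR the matchlines of all broadcast ports to set the ready bit."""
    if len(broadcast_tags) > BROADCAST_PORTS:
        raise ValueError("more broadcast tags than taglines")
    return any(matchline(source_tag, tag) for tag in broadcast_tags)
```

For example, `source_operand_ready(0x2A, [0x01, 0x2A])` is true because the second port matches, while a cycle with no broadcast leaves the ready bit unset.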
The ROB and the register file are multi-ported SRAM structures with several functionalities:
Setting entries for up to IW instructions in each cycle
Releasing up to IW entries during the commit stage in a cycle
Flushing entries during branch recovery
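The three ROB functions listed above can be sketched as a circular buffer. This is an illustrative model under stated assumptions, not the design from the talk: the class and method names are hypothetical, instructions are identified by unique strings, and a flush is assumed to discard every entry younger than the mispredicted branch.

```python
class ReorderBuffer:
    """Hypothetical circular-buffer model of a ROB with issue width IW."""

    def __init__(self, size, issue_width):
        self.size = size
        self.iw = issue_width
        self.entries = [None] * size
        self.head = 0   # index of the oldest entry
        self.count = 0  # number of valid entries

    def _slot(self, offset):
        return (self.head + offset) % self.size

    def allocate(self, names):
        """Set entries for up to IW instructions; returns how many fit."""
        n = min(len(names), self.iw, self.size - self.count)
        for name in names[:n]:
            self.entries[self._slot(self.count)] = {"name": name, "done": False}
            self.count += 1
        return n

    def mark_done(self, name):
        """Record that an instruction has finished executing."""
        for offset in range(self.count):
            entry = self.entries[self._slot(offset)]
            if entry["name"] == name:
                entry["done"] = True

    def commit(self):
        """Release up to IW completed entries from the head, in order."""
        released = []
        while (self.count and len(released) < self.iw
               and self.entries[self.head]["done"]):
            released.append(self.entries[self.head]["name"])
            self.entries[self.head] = None
            self.head = self._slot(1)
            self.count -= 1
        return released

    def flush_after(self, branch_name):
        """Branch recovery: drop all entries younger than the branch."""
        for offset in range(self.count):
            if self.entries[self._slot(offset)]["name"] == branch_name:
                for younger in range(offset + 1, self.count):
                    self.entries[self._slot(younger)] = None
                self.count = offset + 1
                return
```

Because allocation and commit are both capped at IW entries per cycle, a real multi-ported structure needs that many write and release ports, which is the cost the surrounding slides aim to reduce when the full width is not needed.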
How Architecture Can Help Reduce Power in the ROB, Register File and Instruction Queue
[Figure: Issue rate decrease for Scenario I and Scenario II; vertical axis from -10% to 100%]
Significant issue width decrease!
Scenario I: The issue rate drops by more than 80%.
Scenario II: The issue rate drop is 22% for integer benchmarks and 32.6% for floating-point benchmarks.
How Architecture Can Help Reduce Power in the ROB, Register File and Instruction Queue
Register File occupancy

              Scenario I  non-Scenario I  Scenario I  non-Scenario I  Scenario II  non-Scenario II  Scenario II  non-Scenario II
Benchmark     IRF         IRF             FRF         FRF             IRF          IRF              FRF          FRF
bzip2         74.4        28.8            0.0         0.0             56.6         30.7             0.0          0.0
crafty        83.4        31.9            0.1         0.0             51.4         32.2             0.0          0.0
gap           46.2        41.1            0.1         0.7             65.8         42.9             0.6          0.5
gcc           46.3        21.2            0.2         0.1             28.7         24.0             0.0          0.1
gzip          45.1        27.2            0.0         0.0             39.8         27.2             0.0          0.0
mcf           40.8        29.3            1.0         1.1             46.8         36.4             3.2          0.1
parser        37.4        29.8            0.0         0.0             57.0         29.8             0.1          0.0
twolf         58.7        32.3            2.6         2.1             46.0         29.8             2.5          2.0
vortex        70.9        31.1            0.3         0.2             52.4         35.0             0.2          0.2
vpr           63.9        29.0            7.8         8.6             66.4         41.0             8.7          8.3
INT average   55.3        29.2            1.1         1.2             50.3         32.0             1.4          1.0
applu         6.0         5.6             76.6        64.8            1.7          6.2              77.3         73.7
apsi          16.1        18.3            65.7        37.6            15.8         17.9             58.8         43.6
art           35.4        25.0            36.2        30.7            23.0         29.0             42.9         6.3
equake        34.2        27.4            16.1        7.1             32.7         29.4             21.0         9.6
facerec       52.6        22.5            50.0        28.9            30.3         38.4             48.1         35.0
galgel        50.4        27.4            41.8        48.7            32.1         26.0             61.0         44.2
lucas         21.7        23.8            47.7        44.0            41.7         22.1             29.7         47.0
mgrid         5.9         6.2             90.0        80.7            1.9          6.4              96.7         87.2
swim          23.3        27.8            77.1        78.1            29.7         23.1             87.1         76.2
wupwise       26.3        28.8            53.5        28.7            40.5         26.9             38.0         42.2
FP average    26.6        20.9            56.5        44.7            24.0         22.1             56.2         46.0
IRF occupancy always grows in both scenarios when running integer benchmarks. A similar case holds for FRF when running floating-point benchmarks, but only during Scenario II.