On-demand solution to minimize I-cache leakage energy
Group members: Chenyu Lu and Tzyy-Juin Kao
Feb 22, 2016
Motivation
• High power dissipation causes thermal problems, such as higher packaging, power-delivery, and cooling costs
• In 70nm technology, leakage may constitute as much as 50% of total energy dissipation
• Use the super-drowsy leakage-saving technique
  – Lower the supply voltage to a level (0.25V) near the threshold voltage (0.2V)
  – Data is retained but cannot be accessed
  – Waking up from the saving mode to the active mode costs a one-cycle penalty
• Use the on-demand wakeup policy on the I-cache
  – Only the cache lines currently in use need to be awake
  – Accurately predict the next cache line using the branch predictor
  – On most branch mispredictions, the extra wakeup cycle is overlapped with the misprediction recovery
Overview
• Super-drowsy cache line
  – A Schmitt-trigger inverter controls the voltage of the cache line in the leakage-saving mode
  – It replaces the multiple supply-voltage sources, so only a single supply is needed
• Wakeup prediction policy – enables on-demand wakeup (illustrated in the sketch below)
  – The branch predictor already identifies which lines need to be woken up
  – No additional wakeup-prediction structure is needed
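As a rough illustration of this policy (a sketch only; predict_next_fetch_addr(), wake_line(), and fetch_from() are hypothetical helpers, not SimpleScalar identifiers):

typedef unsigned md_addr_t;    /* simplified stand-in for an address type */

/* Provided elsewhere: the branch predictor/BTB already produces the
 * predicted next fetch address, so no new structure is required. */
extern md_addr_t predict_next_fetch_addr(md_addr_t cur_pc);
extern void wake_line(md_addr_t addr);   /* start a 1-cycle wakeup */
extern void fetch_from(md_addr_t addr);  /* normal I-cache access  */

void fetch_stage(md_addr_t cur_pc)
{
  /* Fetch from the line that was woken up during the previous cycle. */
  fetch_from(cur_pc);

  /* Wake only the line predicted to be needed next cycle; all other
   * lines stay drowsy.  On most mispredictions the wakeup of the
   * correct line overlaps with the misprediction recovery. */
  wake_line(predict_next_fetch_addr(cur_pc));
}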
Methodology
• Leakage energy = drowsy_energy + active_energy + turn_on_energy
• Monitor active_lines and turn_on every cycle in sim-outorder
• Add a wake_bit to every block:
  – 0: in drowsy mode this cycle
  – 1: in active mode this cycle
  – 2: in active mode this cycle and the next cycle
  – 3: in drowsy mode this cycle, in active mode next cycle
• Update the wake_bit and count the active_lines every cycle using Update_wakeup()
• Change the wake_bit on every instruction fetch using fetch_line() (both routines are sketched after this list)
• Improved strategy: keep a line active whenever
  Interval * Active_Power < Interval * Drowsy_Power + Turn_On_Energy,
  i.e., when the idle interval is so short that the turn-on energy would outweigh the drowsy savings
• Speculate with a list of recently-accessed cache lines
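A minimal C sketch of this bookkeeping, assuming a placeholder line count and illustrative power constants (the names Update_wakeup() and fetch_line() come from the slides; everything else, including the numeric values, is an assumption for illustration):

#define NUM_LINES 512            /* placeholder I-cache line count */

enum wake_state {
  DROWSY      = 0,               /* drowsy this cycle              */
  ACTIVE      = 1,               /* active this cycle              */
  ACTIVE_NEXT = 2,               /* active this cycle and the next */
  WAKING      = 3                /* drowsy this cycle, active next */
};

static enum wake_state wake_bit[NUM_LINES];
static long active_lines;        /* statistic: line-cycles active  */
static double leakage_energy;    /* drowsy + active + turn_on      */

/* Illustrative per-line, per-cycle power numbers (arbitrary units). */
static const double ACTIVE_POWER   = 1.0;
static const double DROWSY_POWER   = 0.1;
static const double TURN_ON_ENERGY = 2.0;   /* per wakeup */

/* Called once per simulated cycle: charge each line for this cycle,
 * then advance its wake_bit state by one cycle. */
void Update_wakeup(void)
{
  for (int i = 0; i < NUM_LINES; i++) {
    switch (wake_bit[i]) {
    case ACTIVE:                          /* active now, drowsy next */
      active_lines++;
      leakage_energy += ACTIVE_POWER;
      wake_bit[i] = DROWSY;
      break;
    case ACTIVE_NEXT:                     /* active now and next     */
      active_lines++;
      leakage_energy += ACTIVE_POWER;
      wake_bit[i] = ACTIVE;
      break;
    case WAKING:                          /* drowsy now, active next */
      leakage_energy += DROWSY_POWER;
      wake_bit[i] = ACTIVE;
      break;
    case DROWSY:
      leakage_energy += DROWSY_POWER;
      break;
    }
  }
}

/* Called on every instruction fetch for the accessed line. */
void fetch_line(int line)
{
  if (wake_bit[line] == DROWSY) {
    wake_bit[line] = WAKING;              /* one-cycle wakeup penalty */
    leakage_energy += TURN_ON_ENERGY;     /* count the turn_on event  */
  } else {
    wake_bit[line] = ACTIVE_NEXT;         /* keep a hot line awake    */
  }
}

Rearranging the inequality above gives the break-even idle interval: staying active is cheaper as long as Interval < Turn_On_Energy / (Active_Power - Drowsy_Power). With the placeholder numbers in the sketch, that is 2.0 / 0.9 ≈ 2.2 cycles, so only very short idle gaps justify keeping a line awake.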
Results
(Result plots not reproduced here: one experiment varies the cache block size, another varies the interval.)
Future Work
• One cycle of extra latency on a target-address misprediction (a 0.08% performance drop according to the paper)
• Apply the on-demand policy to the data cache
  – No prediction needed
  – The extra latency can be hidden by locality and out-of-order execution