SRC Project 1617.001 Temperature-Aware SoC Optimization Framework PIs: Fadi Kurdahi & Nikil Dutt, UCI PIs: Fadi J. Kurdahi and Nikil D. Dutt Center for Embedded Computer Systems (CECS) University of California, Irvine {kurdahi, dutt}@uci.edu Temperature-Aware SoC Optimization Framework SRC Task # 1617.001

Dec 26, 2015


SRC Project 1617.001 Temperature-Aware SoC Optimization Framework PIs: Fadi Kurdahi & Nikil Dutt, UCI

PIs: Fadi J. Kurdahi and Nikil D. Dutt

Center for Embedded Computer Systems (CECS)

University of California, Irvine

{kurdahi, dutt}@uci.edu

Temperature-Aware SoC Optimization Framework

SRC Task # 1617.001


SRC Project 1617.001 Temperature-Aware SoC Optimization Framework PIs: Fadi Kurdahi & Nikil Dutt, UCI Annual Review: March 2009 #2

Outline

Background and Motivation

Task Details, Accomplishments

Technical Overview


Background and Motivation

SoC Design Methodologies

Traditionally focused on performance, cost, and switching power
• Temperature and its effects were second-tier metrics

Temperature is increasingly becoming a primary design constraint
• Particularly for sub-100 nm process technologies


Temperature & SRAM

Effects of high temperature:
• Increased leakage power
• Reduced lifetime (e.g., electromigration, stress)
• Increased interconnect signal propagation delay
• Increased switching delay of transistors

Increased cell delay due to temperature
• SRAM access time (read/write) will increase
• A failure occurs when access time > rated time period

Thus, an increase in temperature can cause an SRAM cell to fail.


Process Variation & SRAM

Random Dopant Fluctuation (RDF):
• Dominant impact on a transistor’s strength mismatch
• Intra-die variation (different characteristics of cells within an SRAM block)
• RDF is typically modeled as a Gaussian distribution of threshold voltage

Because of process variation, not all the cells in an SRAM block will fail at the same temperature
• Different cells will fail at different temperatures
• Read failure, write failure

Because of variation in threshold voltage, the value stored in a cell may flip (destructive read failure)


Outline

Background and Motivation

Task Details, Accomplishments

Technical Overview


RELOCATE

Register File Local Access Pattern Redistribution Mechanism for Power and Thermal Management in Out-of-Order Embedded Processor

Houman Homayoun, Aseem Gupta, Avesta Sasan, Alex Veidenbaum,

Fadi Kurdahi, Nikil Dutt

University of California Irvine


Outline

Motivation

Background study

Study of register file underutilization

Study of register file default access patterns

Access concentration and activity redistribution to relocate register file access patterns

Results


Why Register File?

RF is one of the hottest units in a processor
• A small, heavily multi-ported SRAM
• Accessed very frequently

Example: IBM PowerPC 750FX, AMD Athlon 64

[Figures: AMD Athlon 64 core floorplan blocks; thermal image of the AMD Athlon 64 core floorplan blocks taken with infrared cameras. Courtesy of Renau et al., ISCA 2007]


Why Temperature?

Higher power densities (W/mm²) lead to higher operating temperatures, which:

(i) Increase the probability of timing violations

(ii) Reduce IC lifetime

(iii) Lower operating frequency

(iv) Increase leakage power

(v) Require expensive cooling mechanisms

(vi) Increase overall design effort and cost


Prior Work: Activity Migration

Reduces temperature by migrating the activity to a replicated unit
• Requires a replicated unit
• Large area overhead
• Leads to a large performance degradation

[Figure: temperature vs. time over alternating active and idle periods, with temperature levels T ambient, T init, T crisis, and T final; curves labeled AM and AM+PG]


Conventional Register Renaming

[Figure: register renamer (register allocation-release) with a free list and an active list, annotated with head and tail pointers]

Instruction #   Original code   Renamed code
1               RA <- ...       PR1 <- ...
2               ... <- RA       ... <- PR1
3               branch to _L    branch to _L
4               RA <- ...       PR4 <- ...
5               ...             ...
6               _L:             _L:
7               ... <- RA       ... <- PR1

• Physical registers are allocated/released in a somewhat random order
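As a rough illustration of the conventional scheme on this slide, the Python sketch below models a renamer with a single FIFO free list. The class and method names (`ConventionalRenamer`, `rename_dest`, `rename_src`, `release`) are hypothetical, and the model ignores branch recovery and checkpointing; it only shows why the allocation order drifts across the whole register file.

```python
from collections import deque


class ConventionalRenamer:
    """Toy register renamer with one free list (hypothetical model)."""

    def __init__(self, num_physical):
        # All physical registers start on the free list, in index order.
        self.free_list = deque(range(num_physical))
        self.map_table = {}   # architectural name -> physical register
        self.active = set()   # physical registers currently allocated

    def rename_dest(self, arch_reg):
        """Allocate the register at the head of the free list for a destination."""
        if not self.free_list:
            raise RuntimeError("no free physical register: rename must stall")
        phys = self.free_list.popleft()
        self.map_table[arch_reg] = phys
        self.active.add(phys)
        return phys

    def rename_src(self, arch_reg):
        """A source operand reads the current mapping."""
        return self.map_table[arch_reg]

    def release(self, phys):
        """A released register is appended to the tail of the free list."""
        self.active.discard(phys)
        self.free_list.append(phys)


renamer = ConventionalRenamer(num_physical=8)
first = renamer.rename_dest("RA")    # RA <- ...
src = renamer.rename_src("RA")       # ... <- RA
second = renamer.rename_dest("RA")   # RA <- ... (gets a new physical register)
renamer.release(first)               # the old mapping is released later
order = list(renamer.free_list)
```

Because releases follow instruction lifetimes rather than register indices, the free list order is reshuffled over time, so successive allocations wander over every physical register.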


Analysis of Register File Operation

1. Register File Occupancy

[Figure: register file occupancy distribution (0% to 100%) over four ranges: RF_occupancy < 16, 16 < RF_occupancy < 32, 32 < RF_occupancy < 48, 48 < RF_occupancy < 64. Panels: (a) MiBench, (b) SPECint2K]


Performance Degradation with a Smaller Register File

[Figure: % performance degradation for 48-entry, 32-entry, and 16-entry register files. Panels: (a) MiBench, y-axis 0% to 35%; (b) SPECint2K, y-axis 0% to 60%]


Analysis of Register File Operation

2. Register File Access Distribution

Coefficient of variation (CV) shows the “deviation” from the average number of accesses for individual physical registers.

\[ CV_{access} = \frac{1}{\overline{na}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( na_i - \overline{na} \right)^2} \]

na_i is the number of accesses to physical register i during a specific period (10K cycles), \overline{na} is the average, and N is the total number of physical registers.
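The coefficient of variation described on this slide can be computed directly from per-register access counts. The sketch below assumes the population form (dividing by N rather than N - 1), which is one reading of the slide's formula; the function name and the sample counts are illustrative only.

```python
import math


def coefficient_of_variation(access_counts):
    """CV of per-register access counts over one sampling period."""
    n = len(access_counts)
    mean = sum(access_counts) / n
    if mean == 0:
        return 0.0  # no accesses at all: treat as perfectly uniform
    # Population variance: average squared deviation from the mean.
    variance = sum((c - mean) ** 2 for c in access_counts) / n
    return math.sqrt(variance) / mean


uniform = coefficient_of_variation([100, 100, 100, 100])
skewed = coefficient_of_variation([400, 0, 0, 0])
```

A CV near zero means accesses are spread evenly over the registers; a large CV means a few registers absorb most of the accesses.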


Coefficient of Variation

[Figure: % coefficient of variation. Panels: (a) MiBench, y-axis 0% to 12%; (b) SPEC2K, y-axis 0% to 14%]


Register File Operation

Underutilization that is distributed uniformly

While only a small number of registers are occupied at any given time, the total accesses are uniformly distributed over the entire physical register file during the course of execution


RELOCATE: Access Redistribution within a Register File

The goal is to “concentrate” accesses within a partition of the RF (a region)
• Some regions will be idle (for 10K cycles)
• These can be power gated and allowed to cool down

[Figure: register activity for (a) baseline, (b) in-order, and (c) distant patterns]


An Architectural Mechanism to Support Access Redistribution

Active partition: a register renamer partition currently used in register renaming

Idle partition: a register renamer partition which does not participate in renaming

Active region: a region of the register file corresponding to a register renamer partition (whether active or idle) which has live registers

Idle region: a region of the register file corresponding to a register renamer partition (whether active or idle) which has no live registers


Activity Migration without Replication

An access concentration mechanism allocates registers from only one partition

This default active partition (DAP) may run out of free registers before the 10K-cycle “convergence period” is over
• Another partition (chosen according to some algorithm) is then activated; such partitions are referred to as additional active partitions (AAPs)

To facilitate physical register concentration in the DAP, if two or more partitions are active and have free registers, allocation is performed in the same order in which the partitions were activated.
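The allocation rule on this slide can be sketched as a free list split into partitions, where allocation always tries active partitions in their activation order. This is a minimal model under stated assumptions: the class name `PartitionedFreeList` is hypothetical, partitions are equal-sized contiguous blocks, and the next partition to activate is simply the lowest-numbered inactive one, since the slides leave the selection algorithm open.

```python
class PartitionedFreeList:
    """Toy model of access concentration (names are hypothetical)."""

    def __init__(self, num_registers, num_partitions, default_partition=0):
        self.size = num_registers // num_partitions
        # Each partition owns a contiguous block of physical registers.
        self.free = [list(range(p * self.size, (p + 1) * self.size))
                     for p in range(num_partitions)]
        # Activation order: the default active partition (DAP) comes first.
        self.active_order = [default_partition]

    def _activate_next(self):
        """Activate an additional active partition (AAP).

        Assumption: pick the lowest-numbered partition not yet active.
        """
        for p in range(len(self.free)):
            if p not in self.active_order:
                self.active_order.append(p)
                return p
        return None

    def allocate(self):
        """Allocate from active partitions in the order they were activated."""
        for p in self.active_order:
            if self.free[p]:
                return self.free[p].pop(0)
        p = self._activate_next()
        if p is None:
            raise RuntimeError("no free physical register: rename must stall")
        return self.free[p].pop(0)

    def release(self, reg):
        """Return a register to the free list of the partition that owns it."""
        self.free[reg // self.size].append(reg)


rf = PartitionedFreeList(num_registers=16, num_partitions=4)
first_four = [rf.allocate() for _ in range(4)]  # fills the DAP (partition 0)
spill = rf.allocate()                           # activates an AAP (partition 1)
rf.release(first_four[0])                       # a DAP register is freed
back_in_dap = rf.allocate()                     # allocation returns to the DAP
```

Trying the DAP first on every allocation is what pulls activity back into one region as soon as it has a free register again.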


The Access Concentration Mechanism

Partition activation order is 1-3-2-4

[Figure: four renamer partitions (P1 to P4), each with its own free list and active list. Transition labels: “Free-list 1 full”, “Free-list 3 full”, “Free-list 2 full”, “Free-list 4 full”; “Active List 1 empty”, “Active List 2 empty”, “Active List 3 empty”, “Active List 4 empty”]


The Redistribution Mechanism

The default active partition is changed once every N cycles (according to some algorithm) to redistribute the activity within the register file
• Once a new default partition (NDP) is selected, all active partitions (DAP + AAPs) become idle

The idle partitions do not participate in register renaming, but their corresponding RF regions may have to be kept active (powered up)
• A physical register in an idle partition may be live

An idle RF region is power gated when its active list becomes empty.
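The bookkeeping behind this slide can be sketched as follows. This is a minimal model, not the authors' hardware: the class `RegionPowerState` and its methods are hypothetical, each region's active list is reduced to a live-register counter, and all regions other than the initial default are assumed to start power gated.

```python
class RegionPowerState:
    """Toy bookkeeping for redistribution and power gating (hypothetical names)."""

    def __init__(self, num_partitions, default_partition=0):
        self.live = [0] * num_partitions     # live registers per RF region
        self.renaming = {default_partition}  # active partitions (DAP + AAPs)
        # Assumption: every other region starts out power gated.
        self.gated = set(range(num_partitions)) - self.renaming

    def allocate_in(self, p):
        """A register is allocated in active partition p."""
        assert p in self.renaming, "idle partitions do not take part in renaming"
        self.live[p] += 1

    def release_in(self, p):
        """A register in region p is released; gate the region if idle and empty."""
        self.live[p] -= 1
        self._maybe_gate(p)

    def activate(self, p):
        """Add p as an additional active partition and power up its region."""
        self.renaming.add(p)
        self.gated.discard(p)

    def redistribute(self, new_default):
        """Periodic switch: all active partitions go idle, the NDP takes over."""
        old = self.renaming
        self.renaming = {new_default}
        self.gated.discard(new_default)      # wake the new default's region
        for p in old:
            self._maybe_gate(p)

    def _maybe_gate(self, p):
        # An idle region is gated only once it holds no live registers.
        if p not in self.renaming and self.live[p] == 0:
            self.gated.add(p)


state = RegionPowerState(num_partitions=4, default_partition=0)
state.allocate_in(0)
state.allocate_in(0)
state.redistribute(new_default=2)       # partition 0 goes idle, 2 becomes the NDP
gated_after_switch = set(state.gated)   # region 0 still holds live registers
state.release_in(0)
state.release_in(0)                     # last live register in region 0 released
gated_after_drain = set(state.gated)
```

The gap between the switch and the drain is the period in which an idle partition's region must stay powered because it still holds live values.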


The Redistribution Mechanism

[Figure: four renamer partitions (P1 to P4), each with its own free list and active list. Transition labels: “Free-list 1 full”, “Free-list 3 full”, “Free-list 2 full”, “Free-list 4 full”; “Active List 1 empty”, “Active List 2 empty”, “Active List 3 empty”, “Active List 4 empty”]


Performance Impact?

There is a two-cycle delay to wake up a power-gated physical register region

Register renaming occurs in the front end of the microprocessor pipeline, whereas register access occurs in the back end
• There is a delay of at least two pipeline stages between renaming and accessing a physical register
• The requested region can be woken up in time

A required register file region can be woken up without incurring a performance penalty at the time of access
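The timing argument on this slide reduces to a single comparison. The sketch below assumes one pipeline stage per cycle and that the wake-up is triggered at rename time; both function names are illustrative.

```python
def wakeup_is_hidden(wakeup_cycles, stages_between_rename_and_access):
    """True if a region woken at rename time is ready when the access happens."""
    return wakeup_cycles <= stages_between_rename_and_access


def stall_cycles(wakeup_cycles, stages_between_rename_and_access):
    """Cycles an access would wait if the wake-up were not fully hidden."""
    return max(0, wakeup_cycles - stages_between_rename_and_access)


hidden = wakeup_is_hidden(wakeup_cycles=2, stages_between_rename_and_access=2)
penalty = stall_cycles(wakeup_cycles=2, stages_between_rename_and_access=2)
penalty_if_slower = stall_cycles(wakeup_cycles=3, stages_between_rename_and_access=2)
```

With the two-cycle wake-up and at least two stages of slack stated on the slide, the penalty is zero; a slower wake-up would expose the difference as a stall.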


Experimental Setup

MASE (SimpleScalar 4.0)
• Models a MIPS-74K processor, 800 MHz

MiBench and SPECint2K benchmarks compiled with the Compaq compiler, -O4 flag

Industrial memory compiler used
• 64-entry, 64-bit single-ended SRAM memory in TSMC 45 nm technology

HotSpot to estimate thermal profiles


Experimental Setup

Table 1. Processor Architecture

L1 I-cache                8 KB, 4-way, 2 cycles
L1 D-cache                8 KB, 4-way, 2 cycles
L2 cache                  128 KB, 15 cycles
Fetch, dispatch           2 wide
Register file             64 entry
Memory                    50 cycles
Instruction fetch queue   2
Load/store queue          16 entry
Arithmetic units          2 integer
Complex unit              2 INT
Pipeline                  12 stages
Processor speed           800 MHz
Issue                     Out-of-order

Table 2. RF Design Specification

Process                                                 45 nm CMOS, 9 metal layers
Register file layout area                               0.009 mm²
Operating modes                                         Active: R/W; Sleep: no data retention
Operating voltage                                       0.6 V to 1.1 V
Read access cycle                                       200 MHz to 1.1 GHz
Access time, typical corner (0.9 V, 45 °C)              0.32 ns
Active power (total), typical corner (0.9 V, 45 °C)     66 mW @ 800 MHz
Active leakage power, typical corner (0.9 V, 45 °C)     15 mW
Sleep leakage power, typical corner (0.9 V, 45 °C)      2 mW
Wakeup delay                                            0.42 ns
Wakeup energy per register file row (64 bits)           0.42 nJ
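The RF design specification values above allow a back-of-the-envelope estimate of what power gating buys. The sketch below is not from the slides: it assumes leakage and wake-up energy split evenly across the 64 rows, ignores the energy needed to enter sleep, and uses illustrative function names.

```python
ACTIVE_LEAKAGE_MW = 15.0         # active leakage, typical corner
SLEEP_LEAKAGE_MW = 2.0           # sleep leakage, typical corner
WAKEUP_ENERGY_NJ_PER_ROW = 0.42  # wake-up energy per 64-bit row
RF_ROWS = 64                     # 64-entry register file
CLOCK_HZ = 800e6                 # processor speed


def leakage_mw(num_partitions, gated_partitions):
    """Whole-RF leakage when some equal-sized partitions are power gated.

    Assumes leakage splits evenly across partitions.
    """
    gated_fraction = gated_partitions / num_partitions
    return ((1 - gated_fraction) * ACTIVE_LEAKAGE_MW
            + gated_fraction * SLEEP_LEAKAGE_MW)


def break_even_cycles(rows_gated):
    """Cycles a region must stay gated before leakage savings repay its wake-up."""
    wake_energy_j = rows_gated * WAKEUP_ENERGY_NJ_PER_ROW * 1e-9
    saved_w = (ACTIVE_LEAKAGE_MW - SLEEP_LEAKAGE_MW) * 1e-3 * rows_gated / RF_ROWS
    return wake_energy_j / saved_w * CLOCK_HZ


all_on = leakage_mw(num_partitions=4, gated_partitions=0)
three_gated = leakage_mw(num_partitions=4, gated_partitions=3)
break_even = break_even_cycles(rows_gated=16)
```

Under the even-split assumption the row count cancels out of the break-even estimate, which comes to roughly 1,650 cycles at 800 MHz, well below the 10K-cycle period used elsewhere in the deck.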


Results

[Figure (a): MiBench RF power reduction (%), y-axis 0% to 55%, for num_partition = 2, 4, and 8]


Results

[Figure (b): SPEC2K RF power reduction (%), y-axis 0% to 50%, for num_partition = 2, 4, and 8]


Analysis of Power Reduction

Increasing the number of RF partitions provides more opportunity to capture and cluster unmapped registers into a partition
• Indicates that wakeup overhead is amortized for a larger number of partitions

Some exceptions
• The overall power overhead associated with waking up an idle region becomes larger as the number of partitions increases
• Power gating becomes frequent but ineffective, and its overhead grows, as the number of partitions increases


Peak Temperature Reduction

Table 1. Peak temperature reduction for MiBench benchmarks

                 Base temperature   Temperature reduction (°C) by number of partitions
Benchmark        (°C)               2P      4P      8P
basicMath        94.3               3.6     4.8     5.0
bc               95.4               3.8     4.4     5.2
crc              92.8               5.3     6.0     6.0
dijkstra         98.4               6.3     6.8     6.4
djpeg            96.3               2.8     3.5     2.4
fft              94.5               6.8     7.4     7.6
gs               89.8               6.5     7.4     9.7
gsm              92.3               5.8     6.7     6.9
lame             90.6               6.2     8.5     11.3
mad              93.3               3.8     4.3     2.2
patricia         79.2               11.0    12.4    13.2
qsort            88.3               10.1    11.6    11.9
search           93.8               8.7     9.3     9.1
sha              90.1               5.1     5.4     4.5
susan_corners    92.7               4.7     5.3     5.1
susan_edges      91.9               3.7     5.8     6.3
tiff2bw          98.5               4.5     5.9     4.1
average          92.5               5.6     6.8     6.9

Table 2. Peak temperature reduction for SPEC2K integer benchmarks

                 Base temperature   Temperature reduction (°C) by number of partitions
Benchmark        (°C)               2P      4P      8P
bzip2            92.7               4.8     3.9     3.1
crafty           83.6               9.5     11      10.4
eon              77.3               10.6    12.4    12.5
galgel           89.4               6.9     7.2     5.8
gap              86.7               4.8     5.9     7.1
gcc              79.8               7.9     9.4     10.1
gzip             95.4               3.2     3.8     3.9
mcf              85.8               6.9     8.7     9.4
parser           97.8               4.3     5.8     4.8
perlbmk          85.8               10.6    12.3    12.6
twolf            86.2               8.8     10.2    10.5
vortex           81.7               11.3    12.5    12.9
vpr              94.6               4.9     5.2     4.4
average          87.4               7.2     8.3     8.2


Analysis of Temperature Reduction

Increasing the number of partitions results in larger power density in each partition because RF access activity is concentrated in a smaller partition

While capturing more idle partitions and power gating them may potentially result in higher power reduction, larger power density due to smaller partition size results in overall higher temperature
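The power-density effect can be made concrete with a small calculation. The sketch below takes the worst case in which all access (dynamic) power lands in a single partition, and estimates that power as total active power minus active leakage (66 mW - 15 mW) from the RF design specification; both choices are assumptions for illustration, as is the function name.

```python
RF_AREA_MM2 = 0.009              # register file layout area
DYNAMIC_POWER_MW = 66.0 - 15.0   # assumed: total active power minus active leakage


def power_density_mw_per_mm2(power_mw, num_partitions, rf_area_mm2=RF_AREA_MM2):
    """Power density when power_mw is concentrated in one of N equal partitions."""
    partition_area = rf_area_mm2 / num_partitions
    return power_mw / partition_area


density_2p = power_density_mw_per_mm2(DYNAMIC_POWER_MW, num_partitions=2)
density_4p = power_density_mw_per_mm2(DYNAMIC_POWER_MW, num_partitions=4)
density_8p = power_density_mw_per_mm2(DYNAMIC_POWER_MW, num_partitions=8)
```

In this simple model the density in the active partition doubles each time the partition count doubles, which is the effect that offsets the extra power-gating savings at 8 partitions.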


Conclusions

Showed register file underutilization

Studied register file default access patterns

Proposed access concentration and activity redistribution to relocate register file accesses

Results show a noticeable power and temperature reduction in the RF

The RELOCATE technique can be applied when units are underutilized, as opposed to activity migration, which requires replication


Current and Future Work Extension

Formulate the best partition selection out of the available partitions for activity redistribution.

Apply the activity concentration and redistribution mechanism to other hot units, for example the L1 cache.

Apply proactive NBTI recovery to the idle partitions to improve lifetime reliability.

Trade off NBTI recovery and power gating to simultaneously reduce power and improve lifetime reliability.

Tackle the temperature barrier in 3D-stacked processor design using similar activity concentration and redistribution.