MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency. Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li. Key Laboratory of Computer System and Architecture, ICT (Institute of Computing Technology), CAS, Beijing, P.R. China; NVIDIA Corporation, USA

Jan 04, 2016

Page 1: MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.

MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency

Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li

Key Laboratory of Computer System and Architecture, ICT (Institute of Computing Technology), CAS, Beijing, P.R. China

NVIDIA Corporation, USA

Page 2

Outline

• What’s Path-grained Timing Adaptability (PTA)

• Potential of PTA for Efficiency Improvement

• How to Exploit PTA

• Case Study Results

• Conclusions

Page 3

Impact of DVFS on Path Delay

[Figure: the (K-1)th and Kth pipeline stages bounded by flip-flops (FF), showing the critical path, the non-critical paths P1 and P2, and the cycle period T.]

• Suppose that scaling the voltage down makes P1 and P2 timing-critical. What happens traditionally?

• The frequency is scaled down for all stages of the pipeline

Question:

• Can these emerging critical paths be salvaged so that the voltage can be scaled down further?

• Maybe yes, by fine-grained time stealing!

Page 4

Timing Imbalance

[Figure: consecutive pipeline stages bounded by flip-flops (FF), each with cycle period T, showing a PCP and an NCP.]

• Generous Flip-flop (GFF): Slack_up > TH, Slack_dn > TH

• Backward Adaptable Flip-flop (BAFF): Slack_up > TH, Slack_dn ≤ TH

• Forward Adaptable Flip-flop (FAFF): Slack_up ≤ TH, Slack_dn > TH

• Unadaptable Flip-flop (UAFF): Slack_up ≤ TH, Slack_dn ≤ TH
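
The four flip-flop classes can be read as a simple decision on the two slacks. The sketch below is illustrative only: it assumes the classes pair with the slack conditions in the order they are listed on the slide, and that both slacks and TH are expressed as fractions of the cycle period. The function name `classify_flip_flop` is hypothetical.

```python
from collections import Counter


def classify_flip_flop(slack_up, slack_dn, th):
    """Classify one flip-flop from its upstream and downstream slack.

    Assumption: slack_up, slack_dn and th are fractions of the cycle period,
    and the class/condition pairing follows the order listed on the slide.
    """
    up_has_slack = slack_up > th
    dn_has_slack = slack_dn > th
    if up_has_slack and dn_has_slack:
        return "GFF"   # generous: slack on both sides
    if up_has_slack:
        return "BAFF"  # backward adaptable: only the upstream path has slack
    if dn_has_slack:
        return "FAFF"  # forward adaptable: only the downstream path has slack
    return "UAFF"      # unadaptable: no slack on either side


# Hypothetical (slack_up, slack_dn) pairs, for illustration only.
slacks = [(0.30, 0.35), (0.30, 0.10), (0.05, 0.40), (0.10, 0.15)]
histogram = Counter(classify_flip_flop(up, dn, th=0.2) for up, dn in slacks)
```

Sweeping `th` over a set of slack pairs extracted from a timing report would give a per-class histogram of the kind the deck plots for the FPU.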

Page 5

Intrinsic Timing Imbalance

• Case study

  • FPU, adopted by OpenSPARC T1

  • Supports all IEEE 754 floating-point data types

  • Synthesized by Synopsys Design Compiler with UMC 0.18 µm technology

  • Cycle period: (1 + 10%) × T_critical

[Figure: number of flip-flops (log scale, 1 to 10000) in each class (GFF, FAFF, BAFF, UAFF) for TH = 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, and 0.4 cycle.]

The GFFs, FAFFs, and BAFFs account for a considerable, even dominant, proportion of the flip-flops!

Attractive Potential

Page 6

DVFS Exacerbating Imbalance

• Generally, the timing margin of longer paths diminishes much faster than that of shorter ones

Example

[Figure: a flip-flop (FF) between a path of n gates and a path of m gates, with slacks Slack_up and Slack_dn.]

• Assume that the path delay is the sum of the delays of the gates on the path

• $T_G$: the gate delay

• $\mathrm{Delta}$: the change in gate delay caused by scaling the voltage down

• Define: $\Delta S = |\mathrm{Slack\_dn} - \mathrm{Slack\_up}|$

• Before voltage scaling down: $\Delta S_1 = (n - m) \times T_G$

• After voltage scaling down: $\Delta S_2 = (n - m) \times (T_G + \mathrm{Delta})$

• Hence $\Delta S_1 < \Delta S_2$
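
A small numeric check of the example above may help. The sketch assumes, as the slide does, that a path's delay is its gate count times the gate delay; the gate counts and delay values are hypothetical and in arbitrary time units.

```python
def slack_imbalance(n, m, gate_delay):
    """Return |Slack_dn - Slack_up| for two paths of n and m gates.

    Both paths share the same cycle period, so the period cancels and the
    imbalance is just the difference in path delay.
    """
    return abs(n - m) * gate_delay


# Hypothetical values, for illustration only.
t_g = 1.0      # gate delay before voltage scaling
delta = 0.5    # extra gate delay after scaling the voltage down
n, m = 20, 12  # gate counts of the two paths

imbalance_before = slack_imbalance(n, m, t_g)          # (n - m) * T_G
imbalance_after = slack_imbalance(n, m, t_g + delta)   # (n - m) * (T_G + Delta)
```

With these numbers the imbalance grows from 8 to 12 time units, which is the sense in which voltage scaling exacerbates the imbalance.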

Page 7

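If the Imbalance Is Utilized…

• Check the lower bound of the cycle period T

• Traditionally: $T_1 = n \times (T_G + \mathrm{Delta})$

• From MicroFix’s perspective: $T_2 = \frac{m+n}{2} \times (T_G + \mathrm{Delta}) \le T_1 - \mathrm{TH}$

[Figure: T versus δ, with lines labelled n (without MicroFix) and (m+n)/2 (with MicroFix); δ = δ(V).]

[Figure: F versus 1/V, with lines labelled 1/n (without MicroFix) and 2/(m+n) (with MicroFix); F = 1/T.]

Note: the UAFFs are excluded from this analysis.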
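
The two lower bounds can be compared numerically. This is a minimal sketch under the slide's own model (path delay equals gate count times gate delay, with n ≥ m); the function name and the input values are hypothetical.

```python
def cycle_period_bounds(n, m, gate_delay):
    """Return (T1, T2): cycle-period lower bounds without and with MicroFix.

    gate_delay stands for T_G + Delta, the per-gate delay after voltage scaling.
    Assumes n >= m, i.e. the n-gate path is the longer one.
    """
    if n < m:
        raise ValueError("expected n >= m")
    t1 = n * gate_delay            # traditional: limited by the longer path
    t2 = (m + n) / 2 * gate_delay  # time stealing balances the two paths
    return t1, t2


# Hypothetical values, for illustration only.
t1, t2 = cycle_period_bounds(n=20, m=12, gate_delay=1.5)
frequency_gain = t1 / t2  # ratio of achievable clock frequencies, F = 1/T
```

The closer m is to n, the smaller the gain, which is consistent with the note that unadaptable flip-flops fall outside this analysis.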

Page 8

How to deal with UAFFs?

• Two-supply voltage scheme [Usami, JSSC’98] [Ghosh, TCAD’07]

• Critical Isolation: the critical paths resulting in UAFFs

• The supply voltage of the Critical Isolation is more conservative than that of the logic outside it.

[Figure: the Critical Isolation is powered by the conservative voltage; the remaining logic, the exploitable scope of MicroFix, is powered by the aggressive voltage.]

Page 9

How to “Fix”?

• Two-supply voltage scheme

• Timing sensors [Yan, DATE’09] [Agarwal, VTS’07]

• Multiple-phase clocks (generated by a DLL)

[Figure: MicroFix applied to the target pipeline. Timing sensors attached to the (K-1)th and Kth stage flip-flops send delay error prediction signals to the voltage/frequency control; the pipeline is fed by a normal and a conservative voltage supply; clock waveforms CLK, BCLK, and FCLK are annotated with T × TH and drive the GFFs, FAFFs, BAFFs, and UAFFs.]

Page 10

Operational Principles

[Figure: state diagrams over operating points (V, F). (a) Traditional DVFS: reducing power lowers the frequency and the voltage; increasing performance raises the voltage and the frequency. (b) MicroFix enhanced DVFS: the same transitions are followed by monitoring; while no error is predicted, V → V − v (or F → F + f); once an error is predicted, V → V + v (or F → F − f) to restore a tight margin.]

Ensure that the restored margins ‘v’ and ‘f’ are large enough to guard safe voltage and frequency tuning.
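
The monitoring loop on the voltage side can be sketched as follows. This is only an illustration of the step-down, step-back behaviour described on the slide, not the authors' controller: `tune_voltage`, the millivolt units, and the `error_predicted` callback (standing in for the timing sensors) are all hypothetical.

```python
def tune_voltage(v_start_mv, v_step_mv, v_min_mv, error_predicted):
    """Step the supply voltage down until the sensors predict a timing error.

    error_predicted(v_mv) is a hypothetical callback that returns True when
    any timing sensor flags an impending error at voltage v_mv.
    Voltages are integers in millivolts to avoid floating-point drift.
    """
    v = v_start_mv
    while v - v_step_mv >= v_min_mv:
        v -= v_step_mv          # V -> V - v while no error is predicted
        if error_predicted(v):
            v += v_step_mv      # V -> V + v: restore a tight margin
            break
    return v


# Hypothetical sensor model: errors are predicted below 830 mV.
settled_mv = tune_voltage(1000, 25, 600, lambda v_mv: v_mv < 830)
```

The frequency side would mirror this loop with the signs reversed (raise F while no error is predicted, back off by f once one is).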

Page 11

Experimental Setup

• Gate-level

  • Study the adaptability and overhead with a synthesized FPU

    – Timing info. -> PrimeTime

• Transistor-level

  • Investigate the power-performance tradeoffs with Hspice simulations

    – 32nm PTM models dedicated to HP and LP applications, respectively

Page 12

Exploring Design Tradeoffs

• ‘TH’ plays a critical role in determining the ultimate efficiency

[Figure: two partitions of the design into the Critical Isolation (CI) and the exploitable scope of MicroFix.]

Smaller ‘TH’, smaller CI, but less aggressive voltage reduction!

[Figure: number of flip-flops (log scale, 1 to 10000) in each class (GFF, FAFF, BAFF, UAFF) for TH = 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, and 0.4 cycle.]

Larger ‘TH’, larger CI, but more aggressive voltage reduction!

Which ‘TH’ is optimal?

Page 13

Exploring Design Tradeoffs /2

• Percentage of Cells in Critical Isolation

TH                     0.1     0.15    0.2     0.25    0.3      0.35     0.4
Percentage of cells    0.00%   0.00%   0.16%   2.04%   10.82%   22.70%   33.52%

Page 14

Exploring Design Tradeoffs /3

• Sensor Area Overhead

  • A sensor is about 8x the size of a pipeline flip-flop (based on the number of transistors) [Yan, DATE09]

• The paths in the critical isolation and those with ‘overly large’ slack (i.e., slack > T × TH + t_margin) do not need to be monitored by sensors

TH                     0.1     0.15    0.2     0.25    0.3      0.35     0.4
Sensor area overhead   0.00%   2.10%   3.75%   9.20%   12.34%   10.95%   9.97%
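
The rule for which paths need a sensor can be written as a small predicate. The sketch below is an assumption-laden illustration: the function names, the endpoint list, and expressing sensor area in flip-flop equivalents are hypothetical; only the 8x ratio and the slack condition come from the slide.

```python
SENSOR_TO_FF_AREA = 8  # from the slide: a sensor is about 8x a pipeline flip-flop


def needs_sensor(slack, in_critical_isolation, period, th, t_margin):
    """Return True if a path endpoint must be monitored by a timing sensor.

    Paths in the critical isolation, and paths whose slack exceeds
    T x TH + t_margin, are not monitored.
    """
    if in_critical_isolation:
        return False
    return slack <= period * th + t_margin


def count_sensors(endpoints, period, th, t_margin):
    """Count monitored endpoints; endpoints is a list of (slack, in_ci) pairs."""
    return sum(needs_sensor(s, ci, period, th, t_margin) for s, ci in endpoints)


# Hypothetical endpoints (slack in units of the cycle period, in-CI flag).
endpoints = [(0.05, True), (0.15, False), (0.5, False), (0.22, False)]
sensors = count_sensors(endpoints, period=1.0, th=0.2, t_margin=0.05)
sensor_area_in_ff_equivalents = sensors * SENSOR_TO_FF_AREA
```

Under this reading, raising TH both widens the monitored slack band and moves more paths into the critical isolation, so the sensor count need not grow monotonically with TH.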

Page 15

Exploring Design Tradeoffs /4

• Sensor Power Overhead

  • In the most pessimistic case (TH = 0.3, all sensors simultaneously flag timing errors): 14%

• However, such a worst-case power overhead can hardly occur, for three reasons:

1) Sensors do not need to be always on

2) It is almost impossible for all sensors to flag impending timing errors simultaneously

3) TH = 0.3 is actually not an optimal configuration

Therefore, the pessimistic power overhead won’t offset much of MicroFix’s efficiency gain!

Page 16

Hspice Simulations

• Objective: investigate the detailed delay-power relation of the target pipeline

• Ideally, the transistor-level model of the target pipeline would be simulated directly with Hspice; however, that is very labor-intensive and time-consuming.

• So we took an indirect approach to the Hspice simulations:

$P_{total}(V, F) = P_{comb}(V, F) + P_{ff}(V, F)$

$1/F = T = t_c + t_{setup} + t_{c\text{-}to\text{-}q}$
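
The indirect approach composes the pipeline's power and cycle period from separately simulated combinational and flip-flop components. A minimal sketch of that composition is shown below; the function names and numbers are hypothetical, and t_c is assumed here to be the combinational delay.

```python
def total_power(p_comb, p_ff):
    """P_total = P_comb + P_ff, both taken at the same (V, F) point."""
    return p_comb + p_ff


def max_frequency(t_c, t_setup, t_c_to_q):
    """F = 1 / T with T = t_c + t_setup + t_c-to-q.

    Assumption: t_c is the combinational delay between two flip-flops.
    """
    return 1.0 / (t_c + t_setup + t_c_to_q)


# Hypothetical component values at one voltage (times in ns, power in mW).
power_mw = total_power(p_comb=6.0, p_ff=2.0)
freq_ghz = max_frequency(t_c=0.75, t_setup=0.125, t_c_to_q=0.125)
```

Repeating this at each simulated voltage would give the V-P and V-D relations of the whole pipeline from the component curves.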

Page 17

Combinational Component

[Figure: panels (a) and (b) plot normalized power (0 to 1) and normalized delay (1 to 6) against voltage (1 V down to 0.6 V), each with High Perf. and Low Power curves.]

• ISCAS85 (c432, c499, c880, c1355, c1908, c2670)

• 32nm PTM models (HP and LP versions)

The normalized V-D (voltage-delay) and V-P (voltage-power) relations hold well across all of the simulated benchmarks!

Page 18

Sequential Component

• V-D

• V-P

[Figure: (a) normalized delay (1 to 6) of t_setup + t_c-to-q versus voltage (1 V down to 0.6 V); (b) normalized power (0 to 1) versus voltage for α = 1, 0.5, and 0.25. Both panels show High Perf. and Low Power curves.]

Page 19

Efficiency Comparison

TH = 0.2 is an optimal choice!

Efficiency Improvement: 35% EDP, 28% PDP

[Figure: sensor area overhead versus TH: 0.00% (0.1), 2.10% (0.15), 3.75% (0.2), 9.20% (0.25), 12.34% (0.3), 10.95% (0.35), 9.97% (0.4).]
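
For readers unfamiliar with the two metrics, the sketch below uses their usual definitions: PDP is power times delay, and EDP is energy times delay, that is power times delay squared. The slides do not spell out how the improvements were computed, so the relative-reduction formula and the input values here are assumptions for illustration, not the reported results.

```python
def pdp(power, delay):
    """Power-delay product: power x delay (energy per operation)."""
    return power * delay


def edp(power, delay):
    """Energy-delay product: (power x delay) x delay."""
    return power * delay * delay


def improvement(baseline, improved):
    """Relative reduction of a metric with respect to its baseline."""
    return 1.0 - improved / baseline


# Hypothetical normalized operating points (baseline power = delay = 1).
pdp_gain = improvement(pdp(1.0, 1.0), pdp(0.75, 1.0))
edp_gain = improvement(edp(1.0, 1.0), edp(0.75, 0.5))
```

Because EDP weights delay more heavily, it suits the high-performance (HP) case, while PDP suits the low-power (LP) case, which matches how the two figures are quoted.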

Page 20

Conclusion

• MicroFix can improve DVFS efficiency by exploiting the path-grained adaptability

• The timing imbalance threshold, TH, implies a critical design tradeoff

• Efficiency improves by up to 35% in EDP for HP applications and by up to 28% in PDP for LP applications, at the expense of only 7% area overhead

Page 21

Thanks!

Q&A