A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion Professor Shiyan Hu, Ph.D. Department of Electrical and Computer Engineering Michigan Technological University

Jan 06, 2016

Transcript
Page 1: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Professor Shiyan Hu, Ph.D.

Department of Electrical and Computer Engineering

Michigan Technological University

Page 2: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Moore’s law

2

Twice the number of transistors, approximately every two years

Page 3: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Interconnect Delay Dominates Gate Delay

3

Page 4: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Technology Scaling

4

[Figure labels: 130nm, 65nm]

• Global interconnect lengths do not shrink
• Local interconnect lengths shrink
• Delay ∝ RC
• Resistance R = rL/S, where the cross-sectional area S is reduced
• Capacitance C changes only slightly

Page 5: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Interconnect Delay Scaling

5

• Scaling factor s = 0.7 per generation
• Elmore delay of a wire of length l: $t_{int} = (rl)(cl)/2 = rcl^2/2$ (first order)
• Local interconnects: $t_{int} = (r/s^2)(c)(ls)^2/2 = rcl^2/2$
– Local interconnect delay is roughly unchanged
• Global interconnects: $t_{int} = (r/s^2)(c)(l)^2/2 = rcl^2/(2s^2) \approx rcl^2$, since $s^2 \approx 0.5$
– Global interconnect delay doubles, which is unsustainable
• Interconnect delay is increasingly dominant

Page 6: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Timing Driven Buffer Insertion

6

Page 7: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Buffers Reduce RC Wire Delay

7

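[Figure: a wire of length x between a driver with resistance R and a load C, shown unbuffered and with a buffer inserted at x/2. Each half-length segment is modeled with resistance rx/2 and capacitances cx/4. Axis labels: ∆t, x.]

$\Delta t = t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$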
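The ∆t expression on this slide can be checked numerically. The following is a minimal sketch, assuming the usual Elmore model (driver resistance R, uniform wire with per-unit r and c, load C) and assuming the inserted buffer has the same R and C as the driver and load plus an intrinsic delay t_b. The function names and sample values are illustrative, not taken from the slides.

```python
import math

def wire_stage_delay(R, r, c, length, C_load):
    """Elmore delay of a driver (resistance R) driving a uniform wire into C_load."""
    return R * (c * length + C_load) + r * length * (c * length / 2 + C_load)

def delta_t(R, C, r, c, x, t_b):
    """t_buf - t_unbuf for a single buffer placed at the midpoint x/2."""
    t_unbuf = wire_stage_delay(R, r, c, x, C)
    # Two identical half-length stages plus the buffer's intrinsic delay (assumed model).
    t_buf = 2 * wire_stage_delay(R, r, c, x / 2, C) + t_b
    return t_buf - t_unbuf

# Arbitrary sample values, chosen only for illustration.
R, C, r, c, t_b, x = 2.0, 3.0, 0.5, 0.25, 1.0, 16.0
closed_form = R * C + t_b - r * c * x**2 / 4
print(delta_t(R, C, r, c, x, t_b), closed_form)
```

Under this model ∆t is negative exactly when rcx²/4 exceeds RC + t_b, so a buffer only pays off on sufficiently long wires.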

Page 8: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Intuitive Analysis

8

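• Interconnect Elmore delay $= rcL^2/2$
• [Figure: wire of length L divided by buffers into segments of length l = 2]
• Since there are L/2 buffers: Interconnect delay $= \frac{L}{2} \cdot \frac{1}{2} rc \cdot 2^2 = rcL$

(Of course, we need to consider buffer delay)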

Page 9: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

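Detailed Analysis

9

The delay of a wire of length L is $T = rcL^2/2$.

Assume N identical buffers with equal inter-buffer length l (L = Nl).

• r, c: resistance and capacitance per unit length
• R_d: on-resistance of the inverter
• C_g: gate input capacitance

$$T = N\left[R_d(C_g + cl) + rl\left(C_g + \frac{cl}{2}\right)\right] = L\left[\frac{rcl}{2} + (rC_g + R_dc) + \frac{R_dC_g}{l}\right]$$

To minimize delay, set $dT/dl = 0$:

$$\frac{rcL}{2} - \frac{R_dC_gL}{l_{opt}^2} = 0 \quad\Rightarrow\quad l_{opt} = \sqrt{\frac{2R_dC_g}{rc}}$$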

Page 10: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Quadratic Delay -> Linear Delay

10

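Substituting $l_{opt}$ back into the interconnect delay expression:

$$T_{opt} = L\left[\frac{rc\,l_{opt}}{2} + (rC_g + R_dc) + \frac{R_dC_g}{l_{opt}}\right] = L\left[\frac{rc}{2}\sqrt{\frac{2R_dC_g}{rc}} + (rC_g + R_dc) + R_dC_g\sqrt{\frac{rc}{2R_dC_g}}\right]$$

$$T_{opt} = L\left(\sqrt{2R_dC_grc} + rC_g + R_dc\right)$$

Delay grows linearly with L instead of quadratically. This is why buffer insertion is highly effective and thus widely used for reducing circuit delay.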
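The closed forms l_opt = sqrt(2·R_d·C_g/(rc)) and T_opt = L(sqrt(2·R_d·C_g·rc) + r·C_g + R_d·c) can be sanity-checked numerically. This is a minimal sketch that treats the number of segments L/l as a real number rather than an integer; all function names and sample values are illustrative assumptions.

```python
import math

def buffered_delay(l, L, r, c, R_d, C_g):
    """Delay of a length-L wire split into L/l equal buffered segments."""
    n_seg = L / l  # treated as a real number for the purpose of this check
    return n_seg * (R_d * (c * l + C_g) + r * l * (c * l / 2 + C_g))

def optimal_spacing(r, c, R_d, C_g):
    """Inter-buffer length that zeroes dT/dl."""
    return math.sqrt(2 * R_d * C_g / (r * c))

def optimal_delay(L, r, c, R_d, C_g):
    """Closed-form delay at the optimal spacing; linear in L."""
    return L * (math.sqrt(2 * R_d * C_g * r * c) + r * C_g + R_d * c)

# Arbitrary sample values, chosen only for illustration.
r, c, R_d, C_g, L = 0.5, 0.2, 4.0, 1.5, 100.0
l_opt = optimal_spacing(r, c, R_d, C_g)
t_opt = buffered_delay(l_opt, L, r, c, R_d, C_g)
print(l_opt, t_opt, optimal_delay(L, r, c, R_d, C_g))
```

Doubling L in this sketch doubles the optimal delay, whereas the unbuffered delay rcL²/2 would quadruple.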

Page 11: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

25% Gates are Buffers

11

Saxena, et al. [TCAD 2004]

Page 12: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

ITRS Projections

12

Page 13: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Problem Formulation

13

Given

1. Steiner Tree

2. n candidate buffer locations

find the minimal cost (area/power) solution that meets the timing constraint T.

Page 14: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Characterization

14

To model the effect on the downstream subtree, a candidate solution is associated with

• v: a node
• C: downstream capacitance
• Q: required arrival time
• W: cumulative buffer cost

Page 15: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Candidate Buffering Solutions

15

Page 16: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Dynamic Programming (DP)

16

• Start from sinks
• Candidate solutions are generated
• Candidate solutions are propagated toward the source
• Three operations
– Add Wire
– Insert Buffer
– Merge
• Solution Pruning

Page 17: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Propagation: Add Wire

17

(v1, c1, w1, q1) → wire of length x → (v2, c2, w2, q2)

• $c_2 = c_1 + cx$
• $q_2 = q_1 - (rcx^2/2 + rxc_1)$
• r: wire resistance per unit length
• c: wire capacitance per unit length

Page 18: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Propagation: Insert Buffer

18

(v1, c1, w1, q1) → (v1, c1b, w1b, q1b)

• $q_{1b} = q_1 - d(b)$
• $c_{1b} = C(b)$
• $w_{1b} = w_1 + w(b)$
• d(b): buffer delay

Page 19: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Propagation: Merge

19

• $c_{merge} = c_l + c_r$
• $w_{merge} = w_l + w_r$
• $q_{merge} = \min(q_l, q_r)$

(v, c_l, w_l, q_l) and (v, c_r, w_r, q_r) are the two solutions being merged at node v.
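The merge rule extends naturally from a single pair to two lists of candidates, one per child branch. The sketch below is only an illustration: the `Sol` record, the function names, and the sample numbers are assumptions, and it enumerates every pair without the pruning discussed on later slides.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Sol:
    c: float  # downstream capacitance
    q: float  # required arrival time
    w: float  # cumulative buffer cost

def merge(left: Sol, right: Sol) -> Sol:
    """Combine one solution from each child branch at the same node."""
    return Sol(c=left.c + right.c, q=min(left.q, right.q), w=left.w + right.w)

def merge_branches(lefts, rights):
    """All pairwise merges; pruning of dominated results would normally follow."""
    return [merge(l, r) for l, r in product(lefts, rights)]

# Made-up candidate lists for the left and right branches.
lefts = [Sol(3, 16, 0), Sol(1, 12, 1)]
rights = [Sol(2, 14, 0)]
merged = merge_branches(lefts, rights)
print(merged)
```

Enumerating all pairs is what makes an unpruned merge grow as the product of the two list sizes.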

Page 20: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Example of Solution Propagation

20

Solutions are written (v, C, Q, W).

• r = 1, c = 1
• Rb = 1, Cb = 1, tb = 1
• Rd = 1

Start at sink v1: (v1, 1, 20, 0)
Add wire (length 2): (v2, 3, 16, 0)
Insert buffer: (v2, 1, 12, 1)
Add wire (length 2): (v3, 5, 8, 0) and (v3, 3, 8, 1)
Add driver: slack = 3 for (v3, 5, 8, 0), slack = 5 for (v3, 3, 8, 1)
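The numbers in this example can be reproduced with a few lines of code. This is a sketch under stated assumptions: both wires have length 2, the buffer costs 1, the buffer delay is modeled as tb + Rb·C, and the driver delay as Rd·C. The slides name d(b) without spelling out its form, so these models are inferred from the tuples shown. Solutions are (C, Q, W) tuples with the node omitted.

```python
def add_wire(sol, x, r=1, c=1):
    """Move a solution (C, Q, W) upstream across a wire of length x."""
    C, Q, W = sol
    return (C + c * x, Q - (r * c * x * x / 2 + r * x * C), W)

def insert_buffer(sol, Rb=1, Cb=1, tb=1, cost=1):
    """Insert a buffer; delay modeled as tb + Rb * C (an assumed linear model)."""
    C, Q, W = sol
    return (Cb, Q - (tb + Rb * C), W + cost)

def slack_at_driver(sol, Rd=1):
    """Slack after the driver, with driver delay modeled as Rd * C (assumed)."""
    C, Q, _ = sol
    return Q - Rd * C

sink = (1, 20, 0)
at_v2 = add_wire(sink, 2)            # (3, 16.0, 0)
at_v2_buf = insert_buffer(at_v2)     # (1, 12.0, 1)
unbuffered = add_wire(at_v2, 2)      # (5, 8.0, 0)
buffered = add_wire(at_v2_buf, 2)    # (3, 8.0, 1)
print(slack_at_driver(unbuffered), slack_at_driver(buffered))  # prints "3.0 5.0"
```

The buffered candidate costs one more buffer but gains two units of slack, which is why neither candidate dominates the other until a timing target is fixed.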

Page 21: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Propagation

21

(1)

(2)

(3)

Page 22: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Exponential Runtime

22

2 solutions

4 solutions

8 solutions

16 solutions

n candidate buffer locations lead to 2^n solutions

Page 23: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Too Many Solutions

23

Needs solution pruning for acceleration.

Two candidate solutions

– (v, c1, q1, w1)

– (v, c2, q2, w2)

Solution 1 is inferior to Solution 2 if

– c1 ≥ c2: larger load

– and q1 ≤ q2: tighter timing

– and w1 ≥ w2: larger cost

Page 24: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Car Race - Speed

24


Car Speed <=> RAT

Page 25: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Car Race - Load

25

Load <=> Load Capacitance

Page 26: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Faster & Smaller Load

26

• Faster & smaller load (larger RAT, smaller capacitance): Good

• Slower & larger load (smaller RAT, larger capacitance): Inferior

Page 27: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Faster & Larger Load: Result 1

27


Page 28: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Faster & Larger Load: Result 2

28

Who will be the winner? Cannot tell at this moment, so keep both of them.

Page 29: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Pruning

29

Solution (Q1, C1, W1) is inferior to (dominated by) solution (Q2, C2, W2) if C1 ≥ C2, W1 ≥ W2 and Q1 ≤ Q2.

• Non-dominated solutions are maintained: for the same Q and W, pick min C
• # of solutions depends on # of distinct W and Q, but not their values
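The dominance test translates directly into a pruning routine. The following is a simple quadratic-time sketch for illustration only; the `Sol` record, the function names, and the sample candidates are assumptions, and a production implementation would typically keep solutions sorted to prune faster.

```python
from typing import NamedTuple

class Sol(NamedTuple):
    c: float  # downstream capacitance
    q: float  # required arrival time
    w: float  # cumulative buffer cost

def dominates(a: Sol, b: Sol) -> bool:
    """True if a is at least as good as b in load, timing, and cost."""
    return a.c <= b.c and a.q >= b.q and a.w <= b.w

def prune(solutions):
    """Keep only non-dominated solutions (exact duplicates collapse to one)."""
    kept = []
    for s in solutions:
        if any(dominates(k, s) for k in kept):
            continue  # s is inferior to something already kept
        kept = [k for k in kept if not dominates(s, k)]
        kept.append(s)
    return kept

# Made-up candidates at one node.
candidates = [Sol(5, 8, 0), Sol(3, 8, 1), Sol(6, 7, 1), Sol(3, 9, 1)]
survivors = prune(candidates)
print(survivors)
```

In this sample, `Sol(6, 7, 1)` is dominated by `Sol(5, 8, 0)`, and `Sol(3, 8, 1)` is dominated by `Sol(3, 9, 1)`; the two survivors trade cost against load and timing, so both must be kept.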

Page 30: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Generating Candidates

30

(1)

(2)

(3)

Page 31: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Pruning Candidates

31

(3)

(a) (b)

Both (a) and (b) look the same to the source. Remove the one with the worse slack and cost.

(4)

Page 32: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Candidate Example Continued

32

(4)

(5)

Page 33: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Candidate Example Continued

33

After pruning

(5)

At driver, compute the candidate solution satisfying the timing target with minimum cost. The result is optimal.

Page 34: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Branch Merge

34

Right Candidates

Left Candidates

Page 35: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Pruning During Branch Merge

35

With pruning, $O(n_1n_2)$ solutions after each branch merge. Worst case: $O((n/m)^m)$ solutions.

Page 36: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Selected Milestone Works on Timing Buffering

36

[Timeline: 1990, 1991, ..., 1996, ..., 2003, 2004, ..., 2008, 2009]

Milestones, in chronological order:

• van Ginneken's algorithm
• Lillis' algorithm
• Shi and Li's algorithm
• NP-hardness proof

Is it possible to design a provably good algorithm running in polynomial time with a theoretical guarantee on the error relative to the optimal solution?

This has been a major open problem for a decade!

Page 37: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Bridging The Gap

37

We are bridging the gap!

• A Fully Polynomial Time Approximation Scheme (FPTAS)
– Provably good
– Computes a solution with cost at most (1+ɛ) times the optimal cost for any ɛ > 0
– Runs in time polynomial in n (nodes), b (buffer types) and 1/ɛ
• Best solution for an NP-hard problem in theory
• Highly practical

Page 38: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

The Rough Picture

38

W*: the cost of the optimal solution

1. Make a guess on W*
2. Check it
– Good (close to W*): return the solution
– Not good: make another guess

Key 1: Efficient checking

Key 2: Smart guess

Page 39: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Key 1: Efficient Checking

39

• Benefit of the guess: only maintain the solutions with cost no greater than the guessed cost
• This is the first reason for acceleration

Page 40: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

The Oracle

40

Oracle(x): the checker, able to decide whether x > W* or not

– Without knowing W*
– Answers efficiently

Page 41: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Construction of Oracle(x)

41

• Only interested in whether there is a solution with cost up to x satisfying the timing constraint
• Scale and round each buffer cost
• Dynamic programming: perform DP on the scaled problem with cost upper bound n/ɛ. Time polynomial in n/ɛ.

Page 42: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Scaling and Rounding

42

[Figure: buffer cost axis marked at 0, xɛ/n, 2xɛ/n, 3xɛ/n, 4xɛ/n]

Page 43: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Scaling and Rounding

43

[Figure: buffer cost axis after scaling, marked at 0, 1, 2, 3, 4]

• # distinct buffer costs is at most O(n/ɛ), since only solutions with W bounded by n/ɛ are propagated
• Rounding error at each buffer is xɛ/n; total rounding error is xɛ
– Larger xɛ/n: larger error, fewer distinct costs and faster
– Smaller xɛ/n: smaller error, more distinct costs and slower
• Rounding is the second reason for acceleration
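The effect of scaling and rounding can be seen on a small example. This sketch assumes costs are rounded down to integer multiples of xɛ/n, which is the direction consistent with a per-buffer error of at most xɛ/n; the function name and the numbers are illustrative only.

```python
import math

def scale_and_round(costs, x, eps, n):
    """Express each buffer cost in units of x*eps/n, rounded down to an integer."""
    unit = x * eps / n
    return [math.floor(w / unit) for w in costs], unit

# Illustrative numbers: guess x = 100, eps = 0.5, n = 10 candidate locations.
costs = [12.0, 7.5, 30.0]
scaled, unit = scale_and_round(costs, x=100.0, eps=0.5, n=10)
errors = [w - s * unit for w, s in zip(costs, scaled)]
print(scaled, unit, errors)
```

Each error stays below one unit xɛ/n, so a solution with at most n buffers accumulates an error below xɛ.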

Page 44: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Oracle Construction

44

Run dynamic programming with cost bound n/ɛ.

• Yes, there is a solution satisfying the timing constraint: with cost rounded and scaled back, the solution has cost at most n/ɛ · xɛ/n + xɛ = (1+ɛ)x, so (1+ɛ)x > W*
• No, no such solution: with cost rounded and scaled back, the solution has cost at least n/ɛ · xɛ/n = x, so x ≤ W*

Page 45: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Rounding on Q

45

• # solutions bounded by # distinct W and Q
• # W = O(n/ɛ1), where ɛ1 is used for W
– Rounding before DP
• # Q
– Round up Q to the nearest value in {0, ɛ2T/m, 2ɛ2T/m, 3ɛ2T/m, …, T} in branch merge (m is # sinks)
– Rounding during DP
– # Q = O(m/ɛ2), where ɛ2 is used for Q
– Rounding error bounded by ɛ2T/m per branch merge, and by ɛ2T for the whole tree
• # non-dominated solutions is O(mn/(ɛ1ɛ2))

[Figure: Q axis marked at 0, ɛ2T/m, 2ɛ2T/m, 3ɛ2T/m, 4ɛ2T/m]
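Rounding Q onto the grid {0, ɛ2T/m, 2ɛ2T/m, …, T} is a one-line computation. The sketch below assumes 0 ≤ q ≤ T and uses values that are exact in binary floating point so the grid points come out cleanly; the function name and numbers are illustrative.

```python
import math

def round_up_q(q, T, m, eps2):
    """Round q up to the next multiple of eps2*T/m, capped at T."""
    step = eps2 * T / m
    return min(T, math.ceil(q / step) * step)

T, m, eps2 = 8.0, 2, 0.5   # grid step is eps2*T/m = 2.0
rounded = [round_up_q(q, T, m, eps2) for q in (0.0, 3.1, 4.0, 7.9)]
print(rounded)
```

Each value moves up by less than one grid step, which is the ɛ2T/m per-merge error bound stated on the slide.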

Page 46: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Q-W Rounding Before Branch Merge

46

[Figure: Q-W grid. W axis marked 0, 1, 2, 3, 4 up to n/ɛ1; Q axis marked ɛ2T/m, 2ɛ2T/m, 3ɛ2T/m, 4ɛ2T/m up to T]

Page 47: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Buffer Insertion Runtime

47

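• At most $O\left(\frac{mn}{\epsilon_1\epsilon_2} + \frac{bn^2}{\epsilon_1}\right)$ non-dominated solutions in a single branch
• $O\left(\frac{bmn}{\epsilon_1\epsilon_2} + \frac{b^2n^2}{\epsilon_1}\right)$ time for each node. No cross W-bin pruning.
• $O\left(\frac{mn}{\epsilon_1\epsilon_2}\right)$ solutions after a branch merge
• A buffer insertion introduces $O\left(\frac{bn}{\epsilon_1}\right)$ non-dominated solutions with b buffer types
• $O\left(\frac{n}{\epsilon_1}\right)$ W-bins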

Page 48: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Branch Merge Runtime - 1

48

Target Q=0

When merging Wl=2 with Wr=1, previously we needed to try a quadratic # of combinations; now we need only a linear # of combinations.

Page 49: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Branch Merge Runtime - 2

49

Target Q= ɛ2T/m

Page 50: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Branch Merge Runtime - 3

50

Target Q= 2ɛ2T/m

Page 51: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Branch Merge Runtime - 4

51

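• For merged W = a, try all $W_l + W_r = a$, where each takes $O(m/\epsilon_2)$ time
• For a = 0, 1, ..., n/ɛ1, total runtime is $O\left(\frac{n^2m}{\epsilon_1^2\epsilon_2}\right)$
• Including time for putting solutions into bins, it is $O\left(\frac{mn}{\epsilon_1\epsilon_2} + \frac{bn^2}{\epsilon_1} + \frac{n^2m}{\epsilon_1^2\epsilon_2}\right)$
• $O\left(\frac{mn}{\epsilon_1\epsilon_2}\right)$ solutions after a branch merge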

Page 52: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Timing-Cost Approximate DP

52

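Lemma: a buffering solution with cost at most (1+ɛ1)W* and with timing at most (1+ɛ2)T can be computed in time

$$O\left(\frac{m^2n^2}{\epsilon_1^2\epsilon_2} + \frac{mn^2b}{\epsilon_1} + \frac{m^2n}{\epsilon_1\epsilon_2} + \frac{mn^2b}{\epsilon_1\epsilon_2} + \frac{n^3b^2}{\epsilon_1}\right)$$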

Page 53: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Key 2: Geometric Sequence Based Guess

53

• U (L): upper (lower) bound on W*
• Naive binary search style approach
• Runtime (# iterations) depends on the initial bounds U and L

Flow:

1. Set U and L on W*
2. x = (U+L)/2
3. Oracle(x)
– W* < (1+ɛ)x: set U = (1+ɛ)x
– W* ≥ x: set L = x
4. Return to step 2

Page 54: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Adapt ɛ1

54

• Rounding factor xɛ1/n for W
• Larger ɛ1: faster with rough estimation
• Smaller ɛ1: slower with accurate estimation
• Adapt ɛ1 according to U and L

Page 55: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

U/L Related Scale and Round

55

[Figure: buffer cost axis starting at 0, with rounding intervals xɛ/n that depend on U/L]

Page 56: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Conceptually

56

• Begin with large ɛ1 and progressively reduce it (towards ɛ) according to U/L as x approaches W*
• Fix ɛ2 = ɛ in rounding T for limiting timing violation
• Set ɛ1 as a geometric sequence of …, 8, 4, 2, 1, 1/2, …, ɛ
• Suppose that one run of DP takes O(n/ɛ1) time. The total runtime is then bounded by that of the last run: O(… + n/8 + n/4 + n/2 + … + n/ɛ) = O(n/ɛ).
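The geometric-sum argument can be written out explicitly. The following is a sketch under the slide's simplifying assumption that one DP run costs n/ɛ1, and under the further assumption that ɛ1 is halved each round, taking the values 2^k·ɛ for k = K, ..., 1, 0 (K is a hypothetical index for the first, coarsest run):

```latex
\sum_{k=0}^{K} \frac{n}{2^{k}\epsilon}
  = \frac{n}{\epsilon}\sum_{k=0}^{K} 2^{-k}
  < \frac{2n}{\epsilon}
  = O\!\left(\frac{n}{\epsilon}\right)
```

All the coarse runs together cost less than the final run with ɛ1 = ɛ, so adapting ɛ1 adds only a constant factor.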

Page 57: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Oracle Query Till U/L<2

57

[Equations: runtime analysis of the oracle queries, written in terms of the bounds W*_{l,i} and W*_{u,i} and the adapted ɛ'_i at query i. The surviving fragments show the ratio W*_u/W*_l being raised to the power 3/4 from one query to the next, and the per-query runtimes being summed; the full derivation is not recoverable from this transcript.]

Page 58: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Mathematically

58

Page 59: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

When U/L<2

59

1. W = 2n/ɛ
2. Scale and round each cost by Lɛ/n
3. Run DP
4. Pick the min cost solution satisfying timing at the driver

• At least one feasible solution exists; otherwise there would be no solution with cost 2n/ɛ · Lɛ/n = 2L ≥ U
• Lɛ/n rounding error per buffer, and Lɛ in a solution
• A single DP runtime

Page 60: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

The Algorithmic Flow

60

1. Set U and L of W*
2. Adapt ɛ1 = [U/L]^{1/2} - 1
3. Set x = [UL/(1+ɛ1)]^{1/2}
4. Oracle(x)
5. Update U or L
6. If U/L < 2, compute the final solution; otherwise return to step 2
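The search loop in this flow can be sketched in a few lines. Two assumptions are made here and should be read as such. First, the transcript's "[U/L-1]1/2" is ambiguous about where the -1 sits; this sketch takes ɛ1 = sqrt(U/L) - 1, which makes U/L shrink to (U/L)^(3/4) on every query. Second, the DP-based oracle is replaced by a toy stand-in that knows W*; `search_bounds`, `toy_oracle`, and `W_STAR` are hypothetical names.

```python
import math

def search_bounds(oracle, lower, upper):
    """Shrink [lower, upper] around W* until upper/lower < 2."""
    queries = 0
    while upper / lower >= 2:
        eps1 = math.sqrt(upper / lower) - 1       # assumed reading of the slide
        x = math.sqrt(upper * lower / (1 + eps1))
        if oracle(x, eps1):       # "yes" guarantees W* <= (1 + eps1) * x
            upper = (1 + eps1) * x
        else:                     # "no" guarantees W* >= x
            lower = x
        queries += 1
    return lower, upper, queries

W_STAR = 37.0  # hidden optimum, used only by the stand-in oracle below

def toy_oracle(x, eps1):
    """Stand-in for the DP oracle; its answers respect both guarantees above."""
    return W_STAR <= x

lo, hi, queries = search_bounds(toy_oracle, 1.0, 1000.0)
print(lo, hi, queries)
```

Because the ratio is raised to the power 3/4 per query under this assumption, the number of queries grows only with log log(U/L) of the initial bounds, in contrast to the naive binary search style approach.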

Page 61: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Main Theorem

61

Theorem: a (1+ɛ) approximation to the timing constrained minimum cost buffering problem can be computed in $O(m^2n^2b/\epsilon^3 + n^3b^2/\epsilon)$ time for 0 < ɛ < 1 and in $O(m^2n^2b/\epsilon + mn^2b + n^3b)$ time for ɛ ≥ 1.

Page 62: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Experiments

62

• Experimental setup
– 1000 industrial nets
– 48 industrial buffer types, including non-inverting buffers and inverting buffers
• Compared to dynamic programming, which is the state-of-the-art technique and is widely used in industry

Page 63: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Cost Ratio Compared to DP

63

[Figure: x-axis: Approximation; y-axis: Buffer Cost Ratio (0 to 0.14); series: FPTAS]

Page 64: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Speedup Compared to DP

64

[Figure: x-axis: Approximation (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5); y-axis: Speedup (0 to 6); series: FPTAS]

Page 65: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Observations

65

• FPTAS always achieves the theoretical guarantee
• Larger ɛ leads to more speedup
• On average about 5x faster than dynamic programming
• Can run 4.6x faster with 0.57% solution degradation
• <5% of nets have timing violations, which can be fixed by a simple timing recovery procedure

Page 66: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Our Bridge

66

NP-Hardness Complexity

Exponential Time Algorithm

Page 67: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Conclusion

67

• Propose a (1+ɛ) approximation for timing constrained minimum cost buffering for any ɛ > 0 (DAC’09)
– Runs in $O(m^2n^2b/\epsilon^3 + n^3b^2/\epsilon)$ time
– Timing-cost approximate dynamic programming
– Double-ɛ geometric sequence based oracle search
– 5x speedup in experiments
– Few percent additional buffers, as guaranteed theoretically
• The first provably good approximation algorithm for this problem, which was a major open problem in the field

Page 68: A  Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Thanks