Interconnect Optimizations
Buffer Insertion

Jul 11, 2016

Akshay Desai

basics of buffer insertion
Page 1: Buffer Insertion

Interconnect Optimizations

Page 2: Buffer Insertion

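A scaling primer

• Ideal process scaling:
– Device geometries shrink by $s$ (= 0.7x)
   • Device delay shrinks by $s$
– Wire geometries shrink by $s$
   • R per unit length: $\rho/(ws \cdot hs) = r/s^2$
   • Cc per unit length: $\varepsilon(hs)/(Ss) = C_c$
   • C per unit length: similar
   • R per unit length doubles; C and Cc per unit length are unchanged

[Figure: transistor cross-section (S, G, D) and two adjacent wires of width w, height h, length l, and spacing S]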

Page 3: Buffer Insertion

Interconnect role

• Short (local) interconnect
– Used to connect nearby cells
– Minimize wire C, i.e., use short min-width wires
• Medium to long-distance (global) interconnect
– Size wires to trade off area vs. delay
– Increasing width: capacitance increases, resistance decreases
– Need to find an acceptable tradeoff: the wire sizing problem
• “Fat” wires
– Thicker cross-sections in higher metal layers
– Useful for reducing delays for global wires
– Inductance issues, sharing of a limited resource

Page 4: Buffer Insertion

Cross-Section of A Chip

Page 5: Buffer Insertion

Block scaling

• Block area often stays the same
– # cells and # nets double
– Wiring histogram shape is invariant
• Global interconnect lengths don’t shrink
• Local interconnect lengths shrink by $s$

Page 6: Buffer Insertion

Interconnect delay scaling

• Delay of a wire of length $l$:
$\tau_{int} = (rl)(cl) = rcl^2$ (first order)
• Local interconnects: $\tau_{int}$: $(r/s^2)(c)(ls)^2 = rcl^2$
– Local interconnect delay unchanged (compare to faster devices)
• Global interconnects: $\tau_{int}$: $(r/s^2)(c)(l)^2 = (rcl^2)/s^2$
– Global interconnect delay doubles: unsustainable!
• Interconnect delay is increasingly dominant
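The scaling argument above can be checked numerically. The sketch below uses normalized values (r = c = l = 1, an assumption made only for illustration) and the ideal scaling factor s = 0.7 from the primer.

```python
def wire_delay(r, c, length):
    """First-order wire delay r*c*l^2 for per-unit-length r and c."""
    return r * c * length ** 2

s = 0.7                      # ideal scaling factor per generation
r, c, l = 1.0, 1.0, 1.0      # normalized values (assumed for illustration)

base = wire_delay(r, c, l)
# After scaling, r per unit length becomes r/s^2 and c per unit length is unchanged.
local_scaled = wire_delay(r / s**2, c, l * s)   # local wires shrink by s
global_scaled = wire_delay(r / s**2, c, l)      # global wires keep their length

print("local ratio: ", local_scaled / base)
print("global ratio:", global_scaled / base)
```

The local ratio stays at 1 while the global ratio is 1/s², roughly 2 for s = 0.7.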

Page 7: Buffer Insertion

Buffer Insertion For Delay Reduction

Page 8: Buffer Insertion

Analysis of Simple RC Circuit

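$$i(t)\,R + v(t) = v_T(t)$$

$$i(t) = \frac{d(Cv(t))}{dt} = C\,\frac{dv(t)}{dt}$$

$$RC\,\frac{dv(t)}{dt} + v(t) = v_T(t)$$

where $v(t)$ is the state variable and $v_T(t)$ is the input waveform.

[Figure: voltage source $v_T(t)$ driving resistor R in series with capacitor C; $i(t)$ is the current and $v(t)$ the capacitor voltage]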

Page 9: Buffer Insertion

Analysis of Simple RC Circuit

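Step-input response:

$$RC\,\frac{dv(t)}{dt} + v(t) = v_0 u(t)$$

$$v(t) = K e^{-t/RC} + v_0 u(t)$$

match initial state:

$$v(0) = 0 \;\Rightarrow\; K + v_0 u(t) = 0$$

output response for step-input:

$$v(t) = v_0 (1 - e^{-t/RC})\, u(t)$$

[Figure: input step $v_0 u(t)$ and output $v_0(1 - e^{-t/RC})u(t)$ rising toward $v_0$]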

Page 10: Buffer Insertion

Delays of Simple RC Circuit

• $v(t) = v_0(1 - e^{-t/RC})$: waveform under step input $v_0 u(t)$
• $v(t) = 0.5v_0 \Rightarrow t = 0.69RC$
– i.e., delay = 0.69RC (50% delay)
• $v(t) = 0.1v_0 \Rightarrow t = 0.1RC$; $v(t) = 0.9v_0 \Rightarrow t = 2.3RC$
– i.e., rise time = 2.2RC (if defined as time from 10% to 90% of Vdd)
• Commonly used metric: $T_D = RC$ (= Elmore delay)
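The threshold times quoted above follow from inverting the step response: t = -RC ln(1 - fraction). A small sketch, with RC normalized to 1 (an assumption for illustration):

```python
import math

def time_to_reach(fraction, rc=1.0):
    """Time for v(t) = v0*(1 - exp(-t/RC)) to reach fraction*v0."""
    return -rc * math.log(1.0 - fraction)

t10 = time_to_reach(0.1)   # about 0.105 RC
t50 = time_to_reach(0.5)   # ln 2, about 0.693 RC
t90 = time_to_reach(0.9)   # ln 10, about 2.303 RC
rise_time = t90 - t10      # 10% to 90% rise time, about 2.197 RC

print(t10, t50, t90, rise_time)
```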

Page 11: Buffer Insertion

Elmore Delay

Delay

Page 12: Buffer Insertion

Elmore Delay

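• Driver is modeled as R
• Driver intrinsic gate delay t(B)
• Delay = $\sum_{\text{all } R_i} \sum_{\text{all } C_j \text{ downstream from } R_i} R_i C_j$
• Elmore delay at n2 = R(B)*(C1+C2) + R(w)*C2
• Elmore delay at n1 = R(B)*(C1+C2)

[Figure: buffer B with output resistance R(B) driving node n1 (capacitance C1), followed by wire resistance R(w) to node n2 (capacitance C2)]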
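The double sum above can be evaluated on any RC tree by first accumulating downstream capacitance and then walking from the root. The sketch below is one possible way to do that; the array layout (`parent`, `res`, `cap`) and the component values are assumptions chosen for illustration, not part of the slides.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay from the root to every node of an RC tree.

    parent[i]: index of node i's parent (None for the root, node 0)
    res[i]:    resistance of the branch from parent[i] to node i
    cap[i]:    capacitance lumped at node i
    Nodes must be ordered so that parent[i] < i.
    """
    n = len(parent)
    downstream = list(cap)
    for i in range(n - 1, 0, -1):          # accumulate subtree capacitance
        downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    for i in range(1, n):                  # walk from the root toward the leaves
        delay[i] = delay[parent[i]] + res[i] * downstream[i]
    return delay

# The two-node circuit from the slide, with made-up values:
# R(B) = 2, C1 = 3, R(w) = 4, C2 = 5. Node 0 is the ideal source.
R_B, C1, R_w, C2 = 2.0, 3.0, 4.0, 5.0
delays = elmore_delays(parent=[None, 0, 1], res=[0.0, R_B, R_w], cap=[0.0, C1, C2])
print(delays[1], R_B * (C1 + C2))               # delay at n1
print(delays[2], R_B * (C1 + C2) + R_w * C2)    # delay at n2
```

The driver's intrinsic delay t(B) is not included here; it would be added as a constant.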

Page 13: Buffer Insertion

Elmore Delay

• For a uniform wire

[Figure: wire of length x driving load C; unit wire capacitance c, unit wire resistance r]

• No matter how the wire is lumped into segments, the Elmore delay is the same
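The lumping claim can be checked numerically. The sketch below assumes each segment is modeled as a π-section (half of the segment capacitance at each end), which is the model used on the buffering slide; the function name and sample values are illustrative only.

```python
def elmore_uniform_wire(r, c, x, n_segments, c_load=0.0):
    """Elmore delay to the far end of a wire split into n pi-segments."""
    seg_r = r * x / n_segments
    seg_c = c * x / n_segments
    delay = 0.0
    for i in range(n_segments):
        # Capacitance downstream of segment i's resistor: half of its own
        # capacitance, all later segments in full, and the load.
        later = n_segments - 1 - i
        downstream = seg_c / 2 + later * seg_c + c_load
        delay += seg_r * downstream
    return delay

r, c, x, C = 1.0, 1.0, 4.0, 2.0      # assumed sample values
expected = r * x * (c * x / 2 + C)   # closed form: rx(cx/2 + C)
for n in (1, 2, 10, 100):
    print(n, elmore_uniform_wire(r, c, x, n, C), expected)
```

With an L-type lumping (all segment capacitance at one end) the result would depend on the number of segments, so the invariance relies on the π-model assumption.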

Page 14: Buffer Insertion

Delay for Buffer

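[Figure: buffer between nodes u and v; the buffer presents input capacitance C(b) at u and drives load C at v]

• Intrinsic buffer delay
• Driver resistance
• Input capacitance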

Page 15: Buffer Insertion

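Buffers Reduce Wire Delay

[Figure: a driver of resistance R and a wire of length x with load C, shown unbuffered and with a buffer at the midpoint; each half of length x/2 is modeled as resistance rx/2 with capacitance cx/4 at each end. Labels: x, ∆t]

$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$

$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$

$t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$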
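Setting the difference t_buf - t_unbuf to zero gives the wire length beyond which a midpoint buffer pays off: x = 2·sqrt((RC + t_b)/(rc)). The sketch below evaluates the two delay expressions from the slide; the parameter values are assumptions for illustration.

```python
import math

def t_unbuf(R, C, r, c, x):
    """Elmore delay of a driver R, wire of length x, and load C."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)

def t_buf(R, C, r, c, x, tb):
    """Same wire with one buffer (resistance R, input cap C) at the midpoint."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb

def break_even_length(R, C, r, c, tb):
    """Length where t_buf equals t_unbuf, i.e. RC + tb = r*c*x^2/4."""
    return 2 * math.sqrt((R * C + tb) / (r * c))

R, C, r, c, tb = 1.0, 1.0, 1.0, 1.0, 1.0   # assumed sample values
x_star = break_even_length(R, C, r, c, tb)
print(x_star)
print(t_buf(R, C, r, c, 4.0, tb) - t_unbuf(R, C, r, c, 4.0))  # negative: buffer helps
print(t_buf(R, C, r, c, 1.0, tb) - t_unbuf(R, C, r, c, 1.0))  # positive: buffer hurts
```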

Page 16: Buffer Insertion

Combinational Logic Delay

Combinational logic delay <= clock period

[Figure: combinational logic between clocked registers, with primary input and primary output]

Page 17: Buffer Insertion

Buffered global interconnects: Intuition

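Interconnect delay = $r \cdot c \cdot l^2$

[Figure: a wire of length l split by buffers into segments l1, l2, l3, ..., ln]

Now, interconnect delay = $\sum_i r \cdot c \cdot l_i^2 < r \cdot c \cdot l^2$ (where $l = \sum_j l_j$)

since $\sum_j (l_j^2) < (\sum_j l_j)^2$

(Of course, account for buffer delay also)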

Page 18: Buffer Insertion

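Optimal inter-buffer length

• First order (lumped parasitic, Elmore delay) analysis
• Assume N identical buffers with equal inter-buffer length l (L = Nl)

$$T = N\left[R_d(C_g + cl) + rl\left(C_g + \frac{cl}{2}\right)\right] = L\left[\frac{rcl}{2} + (rC_g + R_d c) + \frac{R_d C_g}{l}\right]$$

• For minimum delay, $dT/dl = 0$:

$$L\left[\frac{rc}{2} - \frac{R_d C_g}{l_{opt}^2}\right] = 0 \;\Rightarrow\; l_{opt} = \sqrt{\frac{2 R_d C_g}{rc}}$$

$R_d$: on-resistance of inverter
$C_g$: gate input capacitance
$r$, $c$: resistance and capacitance per micron

[Figure: wire of total length L with buffers inserted every l]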

Page 19: Buffer Insertion

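Optimal interconnect delay

• Substituting $l_{opt}$ back into the interconnect delay expression:

$$T_{opt} = L\left[\frac{rc\,l_{opt}}{2} + (rC_g + R_d c) + \frac{R_d C_g}{l_{opt}}\right] = L\left[\frac{rc}{2}\sqrt{\frac{2R_dC_g}{rc}} + (rC_g + R_dc) + R_dC_g\sqrt{\frac{rc}{2R_dC_g}}\right]$$

$$T_{opt} = L\left(\sqrt{2 R_d C_g rc} + rC_g + R_d c\right)$$

Delay grows linearly with L (instead of quadratically)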
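The closed forms for the optimal spacing and delay can be cross-checked against the per-stage delay expression. This is a minimal sketch; the function names and the sample values (Rd = 2, Cg = 1, r = c = 1, L = 10) are assumptions for illustration, and buffer intrinsic delay is ignored as in the first-order analysis.

```python
import math

def total_delay(L, l, Rd, Cg, r, c):
    """Delay of a length-L wire with identical buffers every l (N = L/l stages)."""
    return L * (r * c * l / 2 + (r * Cg + Rd * c) + Rd * Cg / l)

def optimal_spacing(Rd, Cg, r, c):
    """Inter-buffer length that minimizes total_delay."""
    return math.sqrt(2 * Rd * Cg / (r * c))

def optimal_delay(L, Rd, Cg, r, c):
    """total_delay evaluated at the optimal spacing; linear in L."""
    return L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)

Rd, Cg, r, c, L = 2.0, 1.0, 1.0, 1.0, 10.0   # assumed sample values
l_opt = optimal_spacing(Rd, Cg, r, c)
print(l_opt, total_delay(L, l_opt, Rd, Cg, r, c), optimal_delay(L, Rd, Cg, r, c))
# Spacings on either side of l_opt give a larger delay.
print(total_delay(L, 0.9 * l_opt, Rd, Cg, r, c), total_delay(L, 1.1 * l_opt, Rd, Cg, r, c))
```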

Page 20: Buffer Insertion

Total buffer count

• Ever-increasing fractions of total cell count will be buffers
– 70% in 32nm

[Figure: % cells used to buffer nets (y-axis, 0 to 80) at the 90nm, 65nm, 45nm, and 32nm nodes; series: clk-buf, buf, tot-buf]

Page 21: Buffer Insertion

ITRS projections

[Figure: relative delay (log scale, 0.1 to 100) versus feature size (250, 180, 130, 90, 65, 45, 32 nm); series: gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, global interconnect without repeaters]

Source: ITRS, 2003

Page 22: Buffer Insertion

Buffers Improve Slack

RAT = Required Arrival Time
Slack = RAT - Delay

Without buffer:
• RAT = 300, Delay = 350, Slack = -50
• RAT = 700, Delay = 600, Slack = 100
• slackmin = -50

With buffer:
• RAT = 300, Delay = 250, Slack = 50
• RAT = 700, Delay = 400, Slack = 300
• slackmin = 50

Decouple capacitive load from critical path
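The slack figures on this slide can be recomputed directly from the definition Slack = RAT - Delay. A small sketch (the helper name `min_slack` is illustrative):

```python
def min_slack(sinks):
    """sinks: list of (rat, delay) pairs; returns the worst slack RAT - delay."""
    return min(rat - delay for rat, delay in sinks)

unbuffered = [(300, 350), (700, 600)]
buffered = [(300, 250), (700, 400)]

print(min_slack(unbuffered))  # -50
print(min_slack(buffered))    # 50
```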

Page 23: Buffer Insertion

Timing Driven Buffering Problem Formulation

• Given
– A Steiner tree
– RAT at each sink
– A buffer type
– RC parameters
– Candidate buffer locations
• Find a buffer insertion solution such that the slack at the driver is maximized

Page 24: Buffer Insertion

Candidate Buffering Solutions

Page 25: Buffer Insertion

Candidate Solution Characteristics

• Each candidate solution is associated with
– vi: a node
– ci: downstream capacitance
– qi: RAT

[Figure labels: when vi is a sink, ci is the sink capacitance; v is an internal node]

Page 26: Buffer Insertion

Van Ginneken’s Algorithm

Candidate solutions are propagated toward the source

Dynamic Programming

Page 27: Buffer Insertion

Solution Propagation: Add Wire

[Figure: wire of length x from (v1, c1, q1) to (v2, c2, q2)]

• $c_2 = c_1 + cx$
• $q_2 = q_1 - rcx^2/2 - rxc_1$
• r: wire resistance per unit length
• c: wire capacitance per unit length

Page 28: Buffer Insertion

Solution Propagation: Insert Buffer

[Figure: buffer inserted at v1, turning (v1, c1, q1) into (v1, c1b, q1b)]

• $c_{1b} = C_b$
• $q_{1b} = q_1 - R_b c_1 - t_b$
• $C_b$: buffer input capacitance
• $R_b$: buffer output resistance
• $t_b$: buffer intrinsic delay

Page 29: Buffer Insertion

Solution Propagation: Merge

• cmerge = cl + cr

• qmerge = min(ql , qr)

(v, cl , ql) (v, cr , qr)

Page 30: Buffer Insertion

Solution Propagation: Add Driver

• q0d = q0 – Rdc0 = slackmin

• Rd: driver resistance

• Pick solution with max slackmin

(v0, c0, q0)(v0, c0d, q0d)

Page 31: Buffer Insertion

Example of Solution Propagation

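• r = 1, c = 1
• Rb = 1, Cb = 1, tb = 1
• Rd = 1

[Figure: a two-segment path from sink v1 to the driver, each wire segment of length 2]

(v1, 1, 20)
Add wire: (v2, 3, 16)
Insert buffer: (v2, 1, 12)
Add wire: (v2, 3, 16) → (v3, 5, 8); (v2, 1, 12) → (v3, 3, 8)
Add driver: (v3, 5, 8) → slack = 3; (v3, 3, 8) → slack = 5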
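The numbers in this example can be reproduced with the three propagation rules (add wire, insert buffer, add driver). The sketch below represents a candidate as a (c, q) pair and drops the node label; the function names are illustrative, not from the slides.

```python
def add_wire(cand, x, r, c):
    """Propagate (c, q) across a wire of length x: c2 = c1 + cx, q2 = q1 - rcx^2/2 - rx*c1."""
    cap, q = cand
    return (cap + c * x, q - r * c * x * x / 2 - r * x * cap)

def insert_buffer(cand, Rb, Cb, tb):
    """Place a buffer in front of the candidate: c = Cb, q = q1 - Rb*c1 - tb."""
    cap, q = cand
    return (Cb, q - Rb * cap - tb)

def add_driver(cand, Rd):
    """Slack at the driver: q0 - Rd*c0."""
    cap, q = cand
    return q - Rd * cap

r, c = 1, 1
Rb, Cb, tb = 1, 1, 1
Rd = 1

at_v1 = (1, 20)
unbuffered_v2 = add_wire(at_v1, 2, r, c)                  # (3, 16)
buffered_v2 = insert_buffer(unbuffered_v2, Rb, Cb, tb)    # (1, 12)
at_v3 = [add_wire(s, 2, r, c) for s in (unbuffered_v2, buffered_v2)]  # (5, 8), (3, 8)
slacks = [add_driver(s, Rd) for s in at_v3]               # 3 and 5

print(unbuffered_v2, buffered_v2, at_v3, slacks)
print("best slack:", max(slacks))
```

The buffered candidate wins here because it presents a smaller load to the driver at the same required arrival time.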

Page 32: Buffer Insertion

Example of Merging

Left candidates

Right candidates

Merged candidates

Page 33: Buffer Insertion

Solution Pruning

• Two candidate solutions
– (v, c1, q1)
– (v, c2, q2)
• Solution 1 is inferior if
– c1 > c2: larger load
– and q1 < q2: tighter timing
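One way to apply the inferiority rule to a whole candidate list is to sort by load and keep only candidates whose q improves on everything with a smaller load. The sketch below does that; note that it also drops ties (equal c or equal q), which is slightly stricter than the strict inequalities on the slide. The sample values are the four candidates from the Van Ginneken example later in the deck.

```python
def prune(candidates):
    """Drop inferior (c, q) candidates; the result is sorted by increasing c and q."""
    kept = []
    # Sort by load; among equal loads, visit the best q first.
    for cap, q in sorted(candidates, key=lambda s: (s[0], -s[1])):
        if not kept or q > kept[-1][1]:
            kept.append((cap, q))
    return kept

print(prune([(45, 50), (5, 0), (20, 100), (5, 70)]))
```

After sorting, the scan itself is a single linear pass over the list.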

Page 34: Buffer Insertion

Pruning When Insert Buffer

All buffered candidates at a node have the same load cap Cb, so only the one with the max q is kept

Page 35: Buffer Insertion

Generating Candidates

(1)

(2)

(3)

From Dr. Charles Alpert

Page 36: Buffer Insertion

Pruning Candidates

(3)

(a) (b)

Both (a) and (b) “look” the same to the source. Throw out the one with the worst slack

(4)

Page 37: Buffer Insertion

Candidate Example Continued

(4)

(5)

Page 38: Buffer Insertion

Candidate Example Continued

After pruning

(5)

At driver, compute which candidate maximizes slack. Result is optimal.

Page 39: Buffer Insertion

Merging Branches

Right Candidates

Left Candidates

Page 40: Buffer Insertion

Pruning Merged Branches

Critical

With pruning

Page 41: Buffer Insertion

Van Ginneken Example

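Sink candidate: (20,400)

Wire (C=10, d=150): (20,400) → (30,250)
Buffer (C=5, d=30): (30,250) → (5,220)

Wire (C=15): (30,250) → (45,50) with d=200; (5,220) → (20,100) with d=120
Buffer (C=5): (45,50) → (5,0) with d=50; (20,100) → (5,70) with d=30

Candidates: (45,50), (5,0), (20,100), (5,70)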

Page 42: Buffer Insertion

Van Ginneken Example Cont’d

Downstream candidates: (20,400), (30,250), (5,220)

Candidates: (45,50), (5,0), (20,100), (5,70)

(5,0) is inferior to (5,70). (45,50) is inferior to (20,100).

After pruning: (20,100), (5,70)

Wire (C=10): (20,100) → (30,10); (5,70) → (15,-10)

Pick the solution with the largest slack, then follow the arrows to get the solution

Page 43: Buffer Insertion

Basic Data Structure

(c1, q1) (c2, q2) (c3, q3)

Sorted list such that
• c1 < c2 < c3
• If there are no inferior candidates, then q1 < q2 < q3

Moving right in the list: worse load cap, better timing

Page 44: Buffer Insertion

Prune Solution List

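(c1, q1) (c2, q2) (c3, q3) (c4, q4), in order of increasing c

[Flowchart: candidates are compared pairwise along the list (q1 < q2?, q2 < q3?, q3 < q4?). On "Y" the scan moves to the next pair. On "N" the later candidate is pruned (Prune 2, Prune 3, Prune 4) and the next comparison uses the last surviving candidate (q1 < q3?, q1 < q4?, q2 < q4?).]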

Page 45: Buffer Insertion

Pruning In Merging

Left candidates: (cl1, ql1), (cl2, ql2), (cl3, ql3)

Right candidates: (cr1, qr1), (cr2, qr2)

ql1 < ql2 < qr1 < ql3 < qr2

Merged candidates:
• (cl1+cr1, ql1)
• (cl2+cr1, ql2)
• (cl3+cr1, qr1)
• (cl3+cr2, ql3)
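The merged list above can be produced with a two-pointer walk over the left and right lists, which is why merging is additive in the number of candidates. The sketch below assumes both inputs are already pruned and sorted by increasing c and q; the numeric values are made up to satisfy ql1 < ql2 < qr1 < ql3 < qr2.

```python
def merge(left, right):
    """Merge two pruned (c, q) lists, each sorted by increasing c and q."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        (cl, ql), (cr, qr) = left[i], right[j]
        merged.append((cl + cr, min(ql, qr)))
        # Advance the branch that limits q: pairing it with a heavier
        # candidate from the other branch cannot improve the result.
        if ql < qr:
            i += 1
        elif qr < ql:
            j += 1
        else:
            i += 1
            j += 1
    return merged

left = [(1, 10), (2, 20), (3, 40)]    # (cl1, ql1), (cl2, ql2), (cl3, ql3)
right = [(4, 30), (5, 50)]            # (cr1, qr1), (cr2, qr2)
print(merge(left, right))
```

The output pairs follow the same pattern as the slide: (cl1+cr1, ql1), (cl2+cr1, ql2), (cl3+cr1, qr1), (cl3+cr2, ql3).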

Page 46: Buffer Insertion

Van Ginneken Complexity

• Generate candidates from sinks to source

• Quadratic runtime
– Adding a wire does not change #candidates
– Adding a buffer adds only one new candidate
– Merging branches is additive, not multiplicative
– Linear time solution list pruning

• Optimal for Elmore delay model

Page 47: Buffer Insertion

Multiple Buffer Types

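• r = 1, c = 1
• Rb1 = 1, Cb1 = 1, tb1 = 1
• Rb2 = 0.5, Cb2 = 2, tb2 = 0.5
• Rd = 1

[Figure: path from sink v1 toward the driver, wire segments of length 2]

(v1, 1, 20)
Add wire: (v2, 3, 16)
Insert buffer type 1: (v2, 1, 12)
Insert buffer type 2: (v2, 2, 14)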
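With several buffer types, the insert-buffer step produces one new candidate per type instead of one in total. A minimal sketch that reproduces the two buffered candidates at v2, representing each buffer type as an (Rb, Cb, tb) tuple (a layout assumed here for illustration):

```python
def insert_buffers(cand, buffer_types):
    """Return one buffered (c, q) candidate per buffer type (Rb, Cb, tb)."""
    cap, q = cand
    return [(Cb, q - Rb * cap - tb) for (Rb, Cb, tb) in buffer_types]

library = [(1, 1, 1), (0.5, 2, 0.5)]     # buffer types 1 and 2 from the slide
unbuffered_v2 = (3, 16)
print(insert_buffers(unbuffered_v2, library))
```

Neither buffered candidate dominates the other here (type 2 has the larger load but the better q), so both survive pruning.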