mwe/PHD/1 Critical ALU Path Optimization and Implementation in a BiCMOS Process for Gigahertz Range Processors Matthew W. Ernest Electrical, Computer and Systems Engineering Dept. Rensselaer Polytechnic Institute

Dec 21, 2015


mwe/PHD/1

Critical ALU Path Optimization and Implementation in a

BiCMOS Process for Gigahertz Range Processors

Matthew W. Ernest

Electrical, Computer and Systems Engineering Dept.

Rensselaer Polytechnic Institute


mwe/PHD/2

Overview

• Motivation

• Parallel Prefixes and Carry Types

• HBT Digital Circuits

• Pseudo-carry Adder

• Future Directions


mwe/PHD/3

Motivation

“Speed has always been important otherwise one wouldn't need the computer.” - Seymour Cray

• Ubiquity

• Simplicity

• Complexity


mwe/PHD/4

Parallel Prefixes

• The set of problems covering sequences of operations where each term is combined, in order, with the result of the previous operation

• Carry computation is an application of parallel prefix theory

Given: $x_0, x_1, x_2, \dots, x_k$

Find: $x_0,\; x_0 \circ x_1,\; x_0 \circ x_1 \circ x_2,\; \dots,\; x_0 \circ x_1 \circ x_2 \circ \dots \circ x_k$
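
A minimal sketch of the prefix problem stated above: all prefixes of a sequence under an associative operator, computed once serially and once in a logarithmic number of rounds. The function names and the doubling schedule are illustrative assumptions, not taken from the slides.

```python
from operator import add
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def prefix_serial(xs: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Return every prefix of xs under op, one sequential step per element."""
    out: List[T] = []
    for x in xs:
        out.append(x if not out else op(out[-1], x))
    return out

def prefix_doubling(xs: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Same result in ceil(log2(n)) rounds; op must be associative."""
    out = list(xs)
    dist = 1
    while dist < len(out):
        # Every position combines with the partial result `dist` places earlier.
        out = [out[i] if i < dist else op(out[i - dist], out[i])
               for i in range(len(out))]
        dist *= 2
    return out

print(prefix_doubling([1, 2, 3, 4, 5], add))
```

Each round of `prefix_doubling` could run all positions in parallel, which is where the logarithmic depth of look-ahead carry trees comes from.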


mwe/PHD/5

Carry Types: Carry Select

• Compute possible results in parallel

• Select when actual carry-in available

• Requires internal carry for blocks, e.g. ripple

• Delay: O(f(n/b) + b), min. O(n^(1/2))

• Area: O(f(n/b)b + b), approx. 2n

• Affected by block sizing
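
The carry-select idea can be sketched as a behavioural model. The block size of 4, the ripple carry inside each block, and all function names are assumptions made for illustration only.

```python
def ripple_block(a_bits, b_bits, carry_in):
    """Ripple-carry add of two equal-length, LSB-first bit lists."""
    sums, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry

def carry_select_add(a, b, width=32, block=4, carry_in=0):
    """Add two integers with carry-select blocks; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = 0, carry_in
    for lo in range(0, width, block):
        hi = min(lo + block, width)
        # Neither candidate depends on the incoming carry, so hardware can
        # compute both at the same time.
        sum0, cout0 = ripple_block(a_bits[lo:hi], b_bits[lo:hi], 0)
        sum1, cout1 = ripple_block(a_bits[lo:hi], b_bits[lo:hi], 1)
        # The actual carry-in only drives the selection.
        sums, carry = (sum1, cout1) if carry else (sum0, cout0)
        for i, s in enumerate(sums):
            result |= s << (lo + i)
    return result, carry

print(carry_select_add(0xFFFFFFFF, 1))
```

Every block holds two copies of the adder logic, which matches the "approx. 2n" area figure, and the selection still passes from block to block, which is why block sizing affects the delay.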


mwe/PHD/6

Carry Types: Carry look-ahead

• Carry-out can be “generated” at current position or carry-in “propagated”

• Delay: O(1)

• Area: O(n^2)

• High fan-in/fan-out
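
A sketch of the flat look-ahead expansion, assuming propagate is taken as A OR B: each carry is a single OR of product terms, and counting those terms shows the quadratic growth. The names `all_ones` and `lookahead_carries` are hypothetical.

```python
def all_ones(bits):
    """1 if every bit in the list is 1 (an empty list counts as 1)."""
    return int(all(bits))

def lookahead_carries(a, b, width, carry_in=0):
    """Return (carries, term_count); carries[i] is the carry out of bit i."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]  # propagate
    carries, term_count = [], 0
    for i in range(width):
        # One product term per possible origin of the carry: bit i itself,
        # any lower bit j, or the external carry-in.
        terms = [g[i]]
        for j in range(i - 1, -1, -1):
            terms.append(g[j] & all_ones(p[j + 1:i + 1]))
        terms.append(carry_in & all_ones(p[:i + 1]))
        term_count += len(terms)
        carries.append(int(any(terms)))
    return carries, term_count

print(lookahead_carries(0b1011, 0b0110, 4))
```

For n bits the count is n(n+3)/2 product terms, and the widest term has n+1 inputs, which is the fan-in problem noted on the slide.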


mwe/PHD/7

Carry Types: Block carry look-ahead

• A block propagates a carry if all bits in the block propagate a carry

• A block generates a carry if a bit generates a carry and all succeeding bits propagate

• Delay: O(log n)

• Area: O(n log n)
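
The two block rules above amount to one associative merge on (generate, propagate) pairs. The sketch below reduces a whole word with a balanced tree and counts the levels; it is an illustrative model with hypothetical names and computes only the word-level pair, not the carry at every bit position.

```python
def combine(high, low):
    """Merge (G, P) of a more significant block with a less significant one."""
    g_hi, p_hi = high
    g_lo, p_lo = low
    # Generate: the high block generates, or the low block generates and the
    # high block propagates it. Propagate: both blocks propagate.
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def block_lookahead(a, b, width):
    """Return (G, P, levels) for the whole word using a balanced tree."""
    blocks = [((a >> i) & (b >> i) & 1, ((a >> i) | (b >> i)) & 1)
              for i in range(width)]            # LSB first
    levels = 0
    while len(blocks) > 1:
        merged = [combine(blocks[i + 1], blocks[i])
                  for i in range(0, len(blocks) - 1, 2)]
        if len(blocks) % 2:                     # an odd block passes through
            merged.append(blocks[-1])
        blocks, levels = merged, levels + 1
    return blocks[0][0], blocks[0][1], levels

print(block_lookahead(0xFFFFFFFF, 1, 32))
```

With two-input merges a 32-bit word needs 5 levels, in line with the O(log n) delay bullet.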


mwe/PHD/8

Block carry look-ahead trees


mwe/PHD/9

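Carry vs. Pseudo-carry

$C_{out} = G_n + P_n \cdot G_{n-1} + \dots + P_n \cdot P_{n-1} \cdots P_0 \cdot C_{in}$

If $G = A \cdot B$ and $P = A + B$ then $G = G \cdot P$

$C_{out} = P_n \cdot G_n + P_n \cdot G_{n-1} + \dots + P_n \cdot P_{n-1} \cdots P_0 \cdot C_{in}$

$C_{out} = P_n (G_n + G_{n-1} + \dots + P_{n-1} \cdots P_0 \cdot C_{in})$

$C_{out} = P_n \cdot H_n$

$H_n = G_n + G_{n-1} + \dots + P_{n-1} \cdots P_0 \cdot C_{in}$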
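
The factorization can be checked exhaustively for small widths. This is a sketch with hypothetical function names; it relies on the slide's definitions G = A AND B and P = A OR B.

```python
from itertools import product

def carry_out(g, p, c_in):
    """Carry out of the top bit, from LSB-first generate and propagate lists."""
    c = c_in
    for g_k, p_k in zip(g, p):
        c = g_k | (p_k & c)
    return c

def pseudo_carry(g, p, c_in):
    """H_n: top-bit generate ORed with the ordinary carry into the top bit."""
    return g[-1] | carry_out(g[:-1], p[:-1], c_in)

def identity_holds(width):
    """Check C_out == P_n & H_n for every operand pair and carry-in."""
    for a, b, c_in in product(range(2 ** width), range(2 ** width), (0, 1)):
        g = [(a >> k) & (b >> k) & 1 for k in range(width)]
        p = [((a >> k) | (b >> k)) & 1 for k in range(width)]
        if carry_out(g, p, c_in) != (p[-1] & pseudo_carry(g, p, c_in)):
            return False
    return True

print(identity_holds(4))
```

The pseudo-carry drops the top propagate factor from every product term, so its logic has one fewer input per term than the full carry.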


mwe/PHD/10

Carry vs. Pseudo-carry

• Redundant terms create factorization opportunities

• Factorization moves terms from critical paths to non-critical paths

• Multiple paths can be parallelized

• Products with fewer terms lead to implementations with smaller, faster gates


mwe/PHD/11

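Deriving Block Pseudo-carry from Block Carry Look-ahead Terms

• Pseudo-carries can be generated in blocks like carries

Block Generate:

$G^{i \cdot j}_{0} = G^{i}_{j} + P^{i}_{j} G^{i}_{j-1i} + \dots + P^{i}_{j} P^{i}_{j-1i} P^{i}_{j-2i} \cdots G^{i}_{0}$

If $G = A \cdot B$ and $P = A + B$ then $G = G \cdot P$

$G^{i \cdot j}_{0} = P^{i}_{j} G^{i}_{j} + P^{i}_{j} G^{i}_{j-1i} + \dots + P^{i}_{j} P^{i}_{j-1i} P^{i}_{j-2i} \cdots G^{i}_{0}$

$G^{i \cdot j}_{0} = P^{i}_{j} (G^{i}_{j} + G^{i}_{j-1i} + \dots + P^{i}_{j-1i} P^{i}_{j-2i} \cdots G^{i}_{0})$

$H^{i \cdot j}_{0} = G^{i}_{j} + G^{i}_{j-1i} + \dots + P^{i}_{j-1i} P^{i}_{j-2i} \cdots G^{i}_{0}$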


mwe/PHD/12

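Generalized Pseudocarry Equations

$H^{2}_{s} = G^{1}_{s+1} + G^{1}_{s}$

$H^{i+j}_{s} = H^{j}_{s+i} + I^{j}_{s+i-1} \cdot H^{i}_{s}$

$H^{i+j+k}_{s} = H^{k}_{s+i+j} + I^{k}_{s+i+j-1} \cdot H^{j}_{s+i} + I^{k}_{s+i+j-1} \cdot I^{j}_{s+i-1} \cdot H^{i}_{s}$

$I^{p+q}_{t} = I^{q}_{t+p} \cdot I^{p}_{t}$

$I^{p+q+r}_{t} = I^{r}_{t+q+p} \cdot I^{q}_{t+p} \cdot I^{p}_{t}$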
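
A brute-force check of the two- and three-block recurrences. It assumes the superscript is the block size, the subscript is the lowest bit of the block, and I is the AND of consecutive propagate bits; that reading of the notation, and the function names, are assumptions.

```python
from itertools import product

def H(g, p, size, start):
    """Pseudo-carry of the `size`-bit block whose lowest bit is `start`."""
    top = start + size - 1
    carry = 0
    for k in range(start, top):      # ordinary block generate below `top`
        carry = g[k] | (p[k] & carry)
    return g[top] | carry

def I(p, size, start):
    """AND of `size` consecutive propagate bits beginning at `start`."""
    return int(all(p[start:start + size]))

def recurrences_hold(width=6):
    for a, b in product(range(2 ** width), repeat=2):
        g = [(a >> n) & (b >> n) & 1 for n in range(width)]
        p = [((a >> n) | (b >> n)) & 1 for n in range(width)]
        for s, i, j, k in product((0, 1), (1, 2), (1, 2), (1, 2)):
            if s + i + j + k > width:
                continue
            h_i, h_j, h_k = H(g, p, i, s), H(g, p, j, s + i), H(g, p, k, s + i + j)
            i_j, i_k = I(p, j, s + i - 1), I(p, k, s + i + j - 1)
            if (h_j | (i_j & h_i)) != H(g, p, i + j, s):
                return False
            if (h_k | (i_k & h_j) | (i_k & i_j & h_i)) != H(g, p, i + j + k, s):
                return False
            if I(p, i + j, s) != (I(p, j, s + i) & I(p, i, s)):
                return False
    return True

print(recurrences_hold(5))
```

Under this reading each I term starts one bit below its H block, so neighbouring I terms overlap the top bit of the block beneath them.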


mwe/PHD/13

Generating Sums Using Pseudocarry

$S_n = A_n \oplus B_n \oplus C_{n-1}$

If $T_n = A_n \oplus B_n$ and $C_m = P_m \cdot H_m$

then $S_n = T_n \oplus (P_{n-1} \cdot H_{n-1})$

• Sum with pseudo-carry no more complex than sum with carry

• Other look-ahead features still apply, e.g. Han-Carlson “every other carry”
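
A serial behavioural sketch of a pseudo-carry adder built on the sum relation above. It assumes the missing operators in the sum equations are exclusive-ORs and uses a carry-in of 0; the function name is hypothetical, and a real implementation would produce the H terms with a look-ahead tree rather than a loop.

```python
def pseudo_carry_add(a, b, width=32):
    """Add two integers using sum = T xor (P & H) of the next lower bit."""
    g = [(a >> n) & (b >> n) & 1 for n in range(width)]    # G = A and B
    p = [((a >> n) | (b >> n)) & 1 for n in range(width)]  # P = A or B
    t = [((a >> n) ^ (b >> n)) & 1 for n in range(width)]  # T = A xor B
    total = 0
    carry = 0               # at the top of iteration n this holds C_{n-1}
    h_prev, p_prev = 0, 0   # H and P below bit 0: no carry into bit 0
    for n in range(width):
        total |= (t[n] ^ (p_prev & h_prev)) << n
        # H_n = G_n + C_{n-1}, which expands to the pseudo-carry definition.
        h_prev, p_prev = g[n] | carry, p[n]
        carry = p_prev & h_prev          # C_n = P_n . H_n
    return total, carry

print(pseudo_carry_add(0xFFFFFFFF, 1))
```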


mwe/PHD/14

Adder comparison

Bits  Ripple  CSel A  CSel B  CSel C  CLA  PCLA
32    32      12      12      9       6    5
64    64      20      16      12      7    6


mwe/PHD/15

HBT Digital Circuits

• Exponential I/V relationship leads to high gain and fast switching

• Vertical arrangement allows critical dimensions to be smaller with tighter tolerances

• Traditionally high DC power consumption: compare increasing leakage and switching currents for FETs


mwe/PHD/16

Current Steering Logic

• Constant current source equals combined emitter currents

• Ratio of current through each transistor is exp. function of base voltage

• Difference in currents at collector converted to difference in voltage on pull-up resistors.
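
The three bullets can be turned into a small numeric model of an ideal bipolar differential pair. The thermal voltage, the 1 mA tail current and the 444-ohm pull-up used in the example call are illustrative assumptions, and base currents are neglected.

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, in volts (assumed)

def steer(i_tail, v_diff, r_pullup):
    """Split i_tail between two transistors whose bases differ by v_diff.

    Returns (i_c1, i_c2, v_out_diff), where v_out_diff is the voltage at the
    collector of transistor 2 minus the voltage at the collector of transistor 1.
    """
    # I_C1 / I_C2 = exp(v_diff / V_T) with I_C1 + I_C2 = i_tail
    i_c1 = i_tail / (1.0 + math.exp(-v_diff / V_T))
    i_c2 = i_tail - i_c1
    # Each collector sits at Vcc - I_C * R, so the larger current pulls lower.
    return i_c1, i_c2, (i_c1 - i_c2) * r_pullup

print(steer(1e-3, 0.25, 444.0))
```

A few multiples of the thermal voltage on the input are enough to steer nearly all of the tail current to one side, so the output swing approaches the tail current times the pull-up resistance.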


mwe/PHD/17

Single-ended vs. Double-ended

Single-ended:

• Limited to simple functions

• Large fan-in

Double-ended:

• Any function of inputs

• Fan-in limited by supply voltage


mwe/PHD/18

Look-ahead gate w/ fully differential logic

[Figure: gate schematic with differential inputs Hn, Hn-1, Hn-2, In and In-1]


mwe/PHD/19

Mixed input look-ahead gates

[Figure: gate schematic with inputs Hn, Hn-1 and In and reference Vr]

• $I_n (H_n + H_{n-1}) + \overline{I_n} \cdot H_n$

• $H_n + I_n \cdot H_{n-1}$

• Two series-gated levels for three inputs
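
A truth-table check that the series-gated form and the look-ahead function agree. It assumes the second I_n in the first expression is complemented (the overbar does not survive in the transcript); the function names are hypothetical.

```python
from itertools import product

def gate_form(i_n, h_n, h_n1):
    """Series-gated form: I_n steers between (H_n + H_{n-1}) and H_n alone."""
    return (i_n & (h_n | h_n1)) | ((1 - i_n) & h_n)

def lookahead_form(i_n, h_n, h_n1):
    """Target look-ahead function H_n + I_n . H_{n-1}."""
    return h_n | (i_n & h_n1)

def forms_agree():
    """Compare both forms over all eight input combinations."""
    return all(gate_form(*v) == lookahead_form(*v)
               for v in product((0, 1), repeat=3))

print(forms_agree())
```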


mwe/PHD/20

Mixed input look-ahead gates

[Figure: gate schematic with inputs Hn, Hn-1, Hn-2, In and In-1]

• $I_n I_{n-1} (H_n + H_{n-1} + H_{n-2}) + I_n I_{n-1} (H_n + H_{n-1}) + I_n \cdot I_{n-1} \cdot H_n$

• $H_n + I_n \cdot H_{n-1} + I_n \cdot I_{n-1} \cdot H_{n-2}$

• Three series-gated levels for five inputs


mwe/PHD/21

Pseudocarry Blocks

[Figure: pseudo-carry tree with block levels $H^{2}_{s}$, $H^{6}_{s}$, $H^{18}_{s}$ and $H^{14}_{s}$, and $H^{32}_{s}$]


mwe/PHD/22

Pseudocarry Tree Oscillator

[Figure: block diagram with inputs A, B, Cin and Select, and output Cout]


mwe/PHD/23

Carry Tree High-speed Output

2 x 165 ps


mwe/PHD/24

Breakdown of measured delay

• Devices: 71%

• Wire C: 12%

• Temperature: 6%

• Resistor model: 11%

Total measured delay = 165 ps


mwe/PHD/25

Loaded vs. unloaded toggling

• At design time, fT peak at 1.2 mA/µm² but limit at 2 mA/µm²

• For some devices, max. frequency when driving load can occur above fT peak current

• Models supported this; no reason at the time not to believe them

• However, models are never qualified above fT peak current!


mwe/PHD/26

Loaded vs. unloaded toggling

[Plot: buffer delay (axis 0.00E+00 to 8.00E-11) vs. tail current (axis 0.00E+00 to 2.50E-03)]


mwe/PHD/27

Resistor Model Effects

         9805A      99B
         Simulated  Fabricated
Pull-up  444        528
Tail     1000       1091


mwe/PHD/28

Model parameter variation

[Chart: parameter values RB, RE and RC (ohms, axis 0 to 500) by design kit (9708A, 9802, 9805, 1999B, v2.3), with the DARPA02 design and DARPA02 fabrication kits marked]


mwe/PHD/29

Cadence internal parasitic methods

• Approximates all capacitance as polynomial function of distance between conductors

• Cannot extract RC and capacitance between conductors at the same time: killer for differential wiring!

• Convenient, but window of usability small and shrinking


mwe/PHD/30

QuickCap capacitance extraction

• Field solving with floating random walk method

• Accuracy almost wholly a function of run time: 4x run time gives ½ error

• Random walks independent, near perfect parallelization
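
The "4x run time gives ½ error" rule is the usual behaviour of averaging independent random samples. The sketch below illustrates it with a generic Monte Carlo mean estimate; it is not QuickCap's floating random walk algorithm, and the sample counts, seeds and names are arbitrary assumptions.

```python
import math
import random

def rms_error(samples_per_run, runs=400, seed=1):
    """RMS error of a Monte Carlo estimate of the mean of U(0, 1), which is 0.5."""
    rng = random.Random(seed)
    squared = 0.0
    for _ in range(runs):
        est = sum(rng.random() for _ in range(samples_per_run)) / samples_per_run
        squared += (est - 0.5) ** 2
    return math.sqrt(squared / runs)

def error_ratio(samples_per_run=100):
    """Error with N samples divided by error with 4N samples (expected near 2)."""
    return rms_error(samples_per_run) / rms_error(4 * samples_per_run, seed=2)

print(error_ratio())
```

Because each sample is independent of the others, the work splits across processors with no communication beyond the final average, which is the parallelization point in the last bullet.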


mwe/PHD/31

Comparing parasitic extraction

[Plot: delay (ps, axis 0 to 50) vs. length (µm, axis 0 to 1200) for Qcap RC, RCNET, PCAP and Calc RC]


mwe/PHD/32

Cadence/QuickCap Design Flow

• Extract physical data from layout

• Compute RC with QuickCap

• Extract netlist from schematic

• Combine to simulate with Spectre


mwe/PHD/33

Partial manual extraction with QuickCap

• Identify main wires of oscillation paths: approx. dozen pairs

• QuickCap extraction for each wire-ground cap. and cap. between pair

• Add RC-ladder for each pair by hand to schematic and simulate
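
A sketch of what an RC ladder contributes to delay, using the Elmore approximation for a uniform ladder. The factor of two on the pair capacitance assumes ideal anti-phase switching of a differential pair; all names and numeric values are hypothetical, not extracted data.

```python
def effective_cap(c_ground, c_pair):
    """Capacitance seen by one wire of a pair that switches in anti-phase."""
    # The voltage across the coupling capacitor changes by twice the swing.
    return c_ground + 2.0 * c_pair

def elmore_delay(r_total, c_total, segments, c_load=0.0):
    """Elmore delay at the far end of a uniform RC ladder.

    The wire is split into `segments` equal series resistors, each followed
    by an equal shunt capacitor; `c_load` is added at the last node.
    """
    r_seg = r_total / segments
    c_seg = c_total / segments
    delay = 0.0
    for node in range(1, segments + 1):
        c_node = c_seg + (c_load if node == segments else 0.0)
        # Each capacitor charges through the resistance between source and node.
        delay += node * r_seg * c_node
    return delay

# Hypothetical wire: 100 ohms, 40 fF to ground, 30 fF to its partner wire.
print(elmore_delay(100.0, effective_cap(40e-15, 30e-15), segments=10))
```

A single lumped segment gives R·C, while many segments approach R·C/2, which is one reason to model a long wire as a ladder rather than one lumped capacitor.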


mwe/PHD/34

Simulation with Parasitic Extraction

Feedback path  w/o parasitics (ps)  QuickCap parasitic cap. (ps)  COEFGEN parasitic cap. (ps)  Raphael parasitic cap. (ps)  QuickCap parasitic RC (ps)
Cin            100                  121                           128                          131                          135
A1             103                  123                           130                          129                          137
A31            108                  127                           129                          132                          141


mwe/PHD/35

Pseudo-carry Tree configured as Ring Oscillator

[Figure: block diagram with inputs A, B, Cin, Sel0 and Sel1, output Cout, and constant operands 00...00 and 11...11]


mwe/PHD/36

SMI00 Test Structure Layout


mwe/PHD/37

SMI00 Test Structure


mwe/PHD/38

Carry Tree High-speed Outputs

16 x 146 ps


mwe/PHD/39

Comparisons of published adders

Reference  Type   Size       Gate Del.  Time
ZIMM96     Carry  32         5          -
STEL96     Adder  64(32)     12.5(12?)  -
WANG97     Adder  32         3          2.7 ns
CHAN98     Adder  64(32)     27(19.5)   -
SILB98     Fixed  64         -          550 ps
AIPP99     Adder  64         -          660 ps
SAGE01     Adder  32[16x2]   -          <500 ps
MATH01     Adder  64         -          482 ps
STAS01     Adder  64         -          440 ps
LEE02      Adder  64                    900 ps
VANA02     ALU    32         8          <200 ps


mwe/PHD/40

Cascode Output Stage

• Eliminates Miller capacitance between input and output

• Reduces Cjc and Cjs on outputs

• Shortens rise time, but increases delay


mwe/PHD/41

Dotted Emitter/Collector


mwe/PHD/42

“Wide/Short” gate with dotted emitter/collector


mwe/PHD/43

“Wide/Short” gate with dotted emitter/collector

• Shorter trees lead to lower supply voltages

• Wider trees reduce ratio of emitter-followers to terms computed, lowering total current

• More inputs per look-ahead gate means fewer look-ahead levels

• Elimination of single-ended inputs on critical H signals allows faster switching with reduced swing


mwe/PHD/44

Even wider look-ahead gate

Width limited by

• Accumulated Cjc and Cjs of dotted-and node

• Saturation vs. breakdown

• Fan-out loading from inputs and interconnect


mwe/PHD/45

Conclusions

• 32-bit addition depth reduced to 5 gates in the fabricated circuit. Circuits with depths of 4 and 3 gates designed.

• Gate to compute 3-way look-ahead fabricated. Up to 8-way look-ahead designed.

• Carry delay for 32-bit addition measured at 146 ps.

• QuickCap technology file for 5HP brings simulated results within 11% of measured.