CMOS Logic Circuit Design

Mattausch, CMOS Design, H21/6/12 1

Arithmetic Modules (Part 2)

• Circuits for Multiplication
– Manual Multiplication Process of Positive Binaries
– Combinational Multiplier Circuits for Positive Multi-Bit Binaries
– Sequential Multiplier Circuits for Positive Multi-Bit Binaries
– Handling of the Sign Bit

• Circuits for Division
– Manual Division Process of Positive Binaries
– Combinational Divider Circuits for Positive Multi-Bit Binaries
– Handling of the Sign Bit

• Parallel Arithmetic for Increased Throughput
– Matrix Arrangement of Simple Arithmetic Modules
– Important Application: Picture Processing

CMOS Logic Circuit Design: http://www.rnbs.hiroshima-u.ac.jp/RNBS/

Link: "CMOS Logic Circuit Design" (CMOS論理回路設計), listed under "Institute Faculty Lecture Notes" (研究所教員講義ノート)


Construction of the Datapath (Arithmetic Part)

Only digital processing systems for high-speed applications contain specialized multiplier/divider circuits in their datapath.

[Figure: datapath built from bit slices (Bit 0 to Bit 31) between Data-In and Data-Out, containing Register, Adder, Shifter, Boolean Unit, Multiplier, Divider, and Multiplexer blocks under a common Control. The blocks are marked "Last Lecture" or "Today's Lecture".]


Circuits for Multiplication
- Manual Multiplication Process of Positive Binaries
- Combinational Multiplier Circuits for Positive Multi-Bit Binaries
- Sequential Multiplier Circuits for Positive Multi-Bit Binaries
- Handling of the Sign Bit


Manual Multiplication Process (Positive Binaries)

In the manual multiplication process, partial products are calculated and then added to obtain the product.

[Figure: 4-bit example showing the Multiplicand (X), the Multiplier (Y), the Partial Products (PP), and the Product (Z).]

General Algorithm for N Bits:
- Start.
- Test Y0. If Y0=1, add the multiplicand to the product. If Y0=0, skip the addition.
- Shift the multiplicand left by 1 bit.
- Shift the multiplier right by 1 bit.
- Nth repetition? If no, return to the test of Y0. If yes, end.
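The flowchart steps can be sketched in a few lines of Python. This is only a behavioral illustration for positive binaries; the function name `shift_add_multiply` and its parameters are hypothetical and do not come from the lecture.

```python
def shift_add_multiply(x: int, y: int, n_bits: int) -> int:
    """Multiply positive binaries x and y (y has n_bits bits) by shift-and-add."""
    if x < 0 or y < 0 or y >= (1 << n_bits):
        raise ValueError("x and y must be non-negative and y must fit in n_bits")
    product = 0
    multiplicand = x
    multiplier = y
    for _ in range(n_bits):          # N repetitions, one per multiplier bit
        if multiplier & 1:           # test Y0
            product += multiplicand  # add multiplicand to product
        multiplicand <<= 1           # shift multiplicand left by 1 bit
        multiplier >>= 1             # shift multiplier right by 1 bit
    return product
```

For example, `shift_add_multiply(0b1000, 0b1001, 4)` adds the multiplicand twice (for Y0 and Y3) and returns 72.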


Mathematical Formulation (Positive Binaries)

The manual multiplication process of adding the partial products is used in CMOS digital multiplier circuits.

Multiplicand (X, M-bit binary): $X = \sum_{i=0}^{M-1} X_i \cdot 2^i$

Multiplier (Y, N-bit binary): $Y = \sum_{j=0}^{N-1} Y_j \cdot 2^j$

Product (Z, (M+N)-bit binary): $Z = \sum_{k=0}^{N+M-1} Z_k \cdot 2^k = \sum_{j=0}^{N-1} \left[ \sum_{i=0}^{M-1} X_i \cdot Y_j \cdot 2^{i+j} \right] = \sum_{j=0}^{N-1} PP_j$

Partial Product (PP_j, (j+M)-bit binary): $PP_j = 2^j \cdot Y_j \cdot \left[ \sum_{i=0}^{M-1} X_i \cdot 2^i \right]$
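The partial-product formula can be checked numerically. The following sketch computes each PP_j as 2^j · Y_j · X and leaves the summation to the caller; the helper name `partial_products` is an assumption made for this illustration.

```python
def partial_products(x: int, y: int, n_bits: int) -> list[int]:
    """Return [PP_0, ..., PP_(N-1)] with PP_j = 2^j * Y_j * X."""
    return [((y >> j) & 1) * (x << j) for j in range(n_bits)]


# X = 1011 (11), Y = 0101 (5): only Y0 and Y2 are 1.
pps = partial_products(0b1011, 0b0101, 4)
product = sum(pps)  # Z is the sum of all partial products
```

Here `pps` is `[11, 0, 44, 0]` and `product` is 55, which equals 11 · 5.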


Array-Multiplier Circuit (4-Bit Binary Example)

The combinational array-multiplier circuit directly imitates the manual adding of partial products.

[Figure: 4×4 array multiplier with inputs X0 to X3 and Y0 to Y3, rows of half adders (HA) and full adders (FA), and product outputs Z0 to Z7.]

FA = Full Adder
HA = Half Adder (Cin=0)
$C_{out,HA} = A \cdot B$
$S_{HA} = A \cdot \bar{B} + \bar{A} \cdot B$

Critical Path: 8 Adder Stages
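A bit-level sketch of the row-by-row addition may help to see how HA and FA cells combine. It assumes one ripple-carry row per multiplier bit (one HA followed by FAs), which is a simplification and not necessarily the exact cell placement of the figure; all function names are hypothetical.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits: S = A xor B, Cout = A and B."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Return (sum, carry) of three bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def to_bits(value, width):
    """Bits of value, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def array_multiply(x, y, m_bits=4, n_bits=4):
    """Add the partial products row by row, as in the manual process."""
    x_bits, y_bits = to_bits(x, m_bits), to_bits(y, n_bits)
    row = [xb & y_bits[0] for xb in x_bits]   # first partial product (AND gates)
    z = [row[0]]                              # Z0 needs no adder
    acc = row[1:] + [0]                       # running sum, aligned to the next row
    for j in range(1, n_bits):
        pp = [xb & y_bits[j] for xb in x_bits]
        s, carry = half_adder(acc[0], pp[0])  # first cell of a row has Cin=0
        new = [s]
        for i in range(1, m_bits):
            s, carry = full_adder(acc[i], pp[i], carry)
            new.append(s)
        z.append(new[0])                      # lowest sum bit is a final product bit
        acc = new[1:] + [carry]
    return from_bits(z + acc)
```

The carry ripples along each row before the next row can finish, which is why the critical path grows with both operand widths.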


Carry-Save Multiplier Circuit (4-Bit Example)

The carry-save multiplier circuit shortens the critical path by always passing each carry bit on to the adder row of the next partial product instead of rippling it along the same row.

Definition: XiYj denotes the AND of Xi and Yj.

[Figure: 4×4 carry-save multiplier. The AND terms X0Y0 to X3Y3 feed rows of half adders (HA) and full adders (FA); a final merging adder combines the remaining sum and carry bits. Outputs Z0 to Z7.]

Critical Path: 6 Adder Stages
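The carry-save idea can be sketched at word level: every row acts as a bank of independent full adders that produces a sum word and a carry word, and only the final merging adder propagates carries. This is a behavioral sketch under that assumption, with hypothetical names, not a gate-accurate model of the figure.

```python
def carry_save_multiply(x: int, y: int, n_bits: int) -> int:
    """Multiply positive binaries while keeping sum and carry words separate."""
    sum_word, carry_word = 0, 0
    for j in range(n_bits):
        pp = (x << j) if (y >> j) & 1 else 0
        # Bitwise full adders: no carry travels along the row.
        new_sum = sum_word ^ carry_word ^ pp
        new_carry = ((sum_word & carry_word) |
                     (sum_word & pp) |
                     (carry_word & pp)) << 1  # carries move one weight up
        sum_word, carry_word = new_sum, new_carry
    return sum_word + carry_word              # merging adder
```

After each row, `sum_word + carry_word` equals the sum of the partial products processed so far, so the single addition at the end yields the product.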


Wallace-Tree Multiplier Circuit (Principle)

The Wallace tree moves all XiYj terms to the first or second full-adder (FA) level. Each FA serves as a 3-to-2 compressor circuit.

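[Figure, first part: bit slice in a carry-save multiplier circuit for the input bits of order 2^5. The terms X5Y0, X4Y1, X3Y2, X2Y3, X1Y4, and X0Y5 enter a chain of one HA and four FAs, with carry signals C5,1 to C5,4, C6,1 to C6,4, and C6, and output Z5.]

[Figure, second part: the same bit slice in a Wallace-tree multiplier circuit, built from four FAs with carry signals C5,1 to C5,3, C6,1 to C6,3, and C6, and output Z5.]

The same bit slice in a Wallace-tree multiplier circuit has a critical path that is shorter by 2 adder stages.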


Wallace-Tree Multiplier Circuit (6×6 Example)

The Wallace-tree multiplier circuit has an irregular bit-slice structure, which makes the layout difficult.

In a Wallace-tree multiplier circuit, each bit slice has a different structure. However, the critical path has the smallest number of full-adder stages.

[Figure: 6×6 Wallace-tree multiplier with one column per bit weight from 2^0 to 2^10. Each "+" marks a full adder; the bit slice for 2^5 is highlighted, and a merging adder forms the final result.]
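The 3-to-2 compression can be illustrated with a column-wise reduction. The sketch below places all XiYj terms in weight columns, applies full adders to groups of three bits per column and level, and hands the last two rows to a merging adder. It uses full adders only (no half adders), so the number of levels it reports is a property of this sketch and is not claimed to match the figure; the names are hypothetical.

```python
def wallace_multiply(x: int, y: int, m_bits: int, n_bits: int):
    """Return (product, levels) using column-wise 3-to-2 compression."""
    cols = [[] for _ in range(m_bits + n_bits)]
    for i in range(m_bits):
        for j in range(n_bits):
            cols[i + j].append((x >> i) & (y >> j) & 1)  # AND term XiYj
    levels = 0
    while any(len(col) > 2 for col in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            grouped = len(col) // 3 * 3
            for t in range(0, grouped, 3):
                a, b, c = col[t:t + 3]
                nxt[k].append(a ^ b ^ c)                          # FA sum, same weight
                nxt[k + 1].append((a & b) | (a & c) | (b & c))    # FA carry, next weight
            nxt[k].extend(col[grouped:])                          # leftovers pass through
        cols = nxt
        levels += 1
    # Merging adder: at most two bits remain per column.
    product = sum(bit << k for k, col in enumerate(cols) for bit in col)
    return product, levels
```

Because all full adders of one level work in parallel, the depth depends on the number of levels rather than on the number of partial products.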




Bit-Serial Multiplier Circuit

The bit-serial multiplier determines the product of X (M-bit) and Y (N-bit) in (M+1)•(N+1) clock cycles.

Basic multiplier-circuit elements:
- N-bit shift register
- AND gate (definition: X & Y)

Serial streams: least-significant bit first.
Clock for Y is (M+1)•CLK.


Serial-Parallel Multiplier Circuit (Example Y=4-Bit)

The serial-parallel multiplier is fed with X (M-bit) serially and Y (N-bit) in parallel. The product Z is determined in M+N clock cycles.

Each coefficient Zk (•2^k) of the product is calculated in one clock cycle for all k.

[Figure: serial-parallel multiplier for 4-bit Y. Multiplier bits in (X), least-significant bit first; parallel inputs Y0 to Y3; a chain of clocked full adders (FA) with Cout fed back to Cin; product bits out (Z), least-significant bit first.]

The critical path of N-1 adder stages determines the clock cycle.
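A cycle-by-cycle behavioral model shows why M+N clock cycles suffice. It lumps the adder chain and its registers into one accumulator, which is an assumption of this sketch and not the register-level structure of the figure; the function name is hypothetical.

```python
def serial_parallel_multiply(x: int, y: int, m_bits: int, n_bits: int):
    """Feed X serially (LSB first), hold Y in parallel, emit Z serially."""
    acc = 0        # state held in the adder chain
    out_bits = []  # product bits, least-significant bit first
    for cycle in range(m_bits + n_bits):
        x_bit = (x >> cycle) & 1 if cycle < m_bits else 0  # zeros flush the result
        acc += x_bit * y          # AND gates plus adder chain
        out_bits.append(acc & 1)  # one product bit per clock cycle
        acc >>= 1                 # remaining bits move one weight down
    z = sum(bit << k for k, bit in enumerate(out_bits))
    return z, out_bits
```

For 4-bit operands, the model delivers the eight product bits in eight cycles, one bit Zk per cycle.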


Pipelined Serial-Parallel Multiplier Circuit (Example Y=4-Bit)

Again, the X (M-bit) and Y (N-bit) inputs are fed serially and in parallel, respectively. Z is determined in M+2N clock cycles in pipelined fashion.

Each coefficient Zk (•2^k) of the product is calculated in N-1 clock cycles, but pipelined, for all k.

[Figure: pipelined serial-parallel multiplier for 4-bit Y. Multiplier bits in (X), least-significant bit first; parallel inputs Y0 to Y3; clocked full adders (FA) with Cout/Cin connections; product bits out (Z), least-significant bit first.]

The critical path of 1 adder stage allows a shorter clock cycle.


Handling of the Sign Bit in Multiplications

The sign bit of the product is simply determined with an EXOR circuit from the sign bits of multiplicand and multiplier.

The sign bits S_X and S_Y of multiplicand and multiplier determine the sign bit S_Z of the product.

$X_{base2} = S_X X_{M-1} X_{M-2} \ldots X_3 X_2 X_1 X_0$
$Y_{base2} = S_Y Y_{N-1} Y_{N-2} \ldots Y_3 Y_2 Y_1 Y_0$

$S_Z = S_X \oplus S_Y = S_X \cdot \bar{S}_Y + \bar{S}_X \cdot S_Y$

S_X  S_Y  S_Z
0    0    0
0    1    1
1    0    1
1    1    0

The sign bit S_Z of the product is 1 (negative) only if S_X and S_Y are different.
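In a sign-magnitude representation, the sign rule separates cleanly from the magnitude multiplication. A minimal sketch, assuming operands are passed as (sign bit, magnitude) pairs; the function name is hypothetical.

```python
def sign_magnitude_multiply(sx: int, mag_x: int, sy: int, mag_y: int):
    """Return (S_Z, |Z|) for sign-magnitude operands."""
    sz = sx ^ sy             # EXOR of the two sign bits
    return sz, mag_x * mag_y  # magnitudes use a positive-binary multiplier


negative_result = sign_magnitude_multiply(1, 3, 0, 5)  # (-3) * (+5)
positive_result = sign_magnitude_multiply(1, 3, 1, 5)  # (-3) * (-5)
```

Here `negative_result` is `(1, 15)` and `positive_result` is `(0, 15)`. Note that this simple rule can yield a negative zero, for example when one magnitude is 0 and the signs differ.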


Circuits for Division
- Manual Division Process of Positive Binaries
- Combinational Divider Circuits for Positive Multi-Bit Binaries
- Handling of the Sign Bit


Manual Division Process (Positive Binaries)

The division process repeatedly subtracts 2^i•D from R (R_initial = A). The sign of each result determines bit q_i of Q.

[Figure: example with a 4-bit divisor and a 7-bit dividend, showing Dividend (A), Divisor (D), Quotient (Q), and Remainder (R).]

General Algorithm for M-Bit Divisor and N-Bit Dividend:
- Start: D' = D•2^(N-1), Q = 0, R = A.
- Subtract: R = R - D'.
- Test R ≥ 0. If yes, Q = Q•2^1 + 2^0. If no, Q = Q•2^1 and restore R = R + D'.
- Shift: D' = D'•2^(-1).
- Test R ≥ D. If yes, repeat from the subtraction. If no, end.
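The subtract, test, and restore steps can be sketched as follows. As an assumption of this sketch, the loop runs a fixed N times (one quotient bit per dividend bit) instead of using the flowchart's termination test, so that every quotient bit is produced; the function name is hypothetical.

```python
def restoring_divide(a: int, d: int, n_bits: int):
    """Divide positive binaries: return (Q, R) with A = Q*D + R and 0 <= R < D."""
    if d <= 0 or a < 0 or a >= (1 << n_bits):
        raise ValueError("need d > 0 and 0 <= a < 2**n_bits")
    d_shifted = d << (n_bits - 1)  # D' = D * 2^(N-1)
    q = 0
    r = a                          # R starts as the dividend
    for _ in range(n_bits):
        r -= d_shifted             # trial subtraction
        if r >= 0:
            q = (q << 1) | 1       # Q = Q*2 + 1
        else:
            q = q << 1             # Q = Q*2
            r += d_shifted         # restore the remainder
        d_shifted >>= 1            # D' = D' / 2
    return q, r
```

For example, `restoring_divide(100, 9, 7)` returns `(11, 1)`, since 100 = 11 · 9 + 1.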


Basic Unit of a Combinational Divider Circuit

The quotient-bit-dependent restore can be realized with a multiplexer and the shift function can be realized by interconnecting the divisor bit to the next column.

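The basic unit of a divider circuit has to provide a subtract function, a quotient-bit-dependent restore function, and a shift function for the divisor bit.

[Figure: divider cell (DC) and its internal structure. A full adder (FA) takes S_i, d_j, and the carry C_i and produces C_{i+1}; a multiplexer (MUX) controlled by the quotient bit q_k selects the output S'_i; d_j and q_k are passed through the cell.]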


Combinational Array-Divider Circuit (Example for 6-bit dividend and 3-bit divisor)

The critical delay path of array-divider circuits (N-bit dividend, M-bit divisor) contains (N-M/2)•(M-1) full-adder stages.

[Figure: array divider built from divider cells (DC) with dividend inputs a5 to a0, divisor inputs d2 to d0, constant 0 inputs, quotient outputs q5 to q0, and remainder outputs r2 to r0. The critical path is marked.]

Handling of the Sign Bit in Divisions

The sign bits of quotient and remainder are equal. An EXOR circuit of dividend and divisor sign bits determines them.

The sign bits S_A and S_D of dividend and divisor determine the sign bits S_Q and S_R of the quotient and remainder.

$A_{base2} = S_A A_{M-1} A_{M-2} \ldots A_3 A_2 A_1 A_0$
$D_{base2} = S_D D_{N-1} D_{N-2} \ldots D_3 D_2 D_1 D_0$

$S_Q = S_R = S_A \oplus S_D$

S_A  S_D  S_Q = S_R
0    0    0
0    1    1
1    0    1
1    1    0

The sign bits S_Q and S_R of the quotient and remainder are 1 (negative) only if S_A and S_D are different.


Parallel Arithmetic for Increased Throughput

- Matrix Arrangement of Simple Arithmetic Modules (Processing Elements)

- Important Application: Picture Processing


Parallel Processing with Many Simple Elements

High-performance applications can be realized with simple application-specific processing elements (PEs). Each PE has the basic structure of datapath, control, and memory.

[Figure: processing element (PE) consisting of Input/Output, Control, Memory, Datapath, and an Interconnect Unit, with the input/output of the PE.]

Construct simple processing elements that are optimized for a target application.


Parallel Processing Structures with PEs

The common structures for parallel processing with PEs are the linear structure and the matrix structure.

[Figure: linear structure for parallel processing (a row of PEs) and matrix structure for parallel processing (PEs in rows and columns), with the labels "Global Data-Exchange Bus" and "Local Data-Exchange between PEs".]


Real-Time Picture Processing Needs a PE Matrix

High-performance image processing needs the parallel processing power of a processing matrix with PEs.

[Figure: intelligent information processing of a picture of a car, with the stages picture segmentation, region extraction, object recognition, object tracking, and motion.]

Picture Segmentation
• A natural image is partitioned into meaningful regions.
• Important initial task for higher-level image processing.


Research at RCNS: Video-Picture Segmentation

Video-picture segmentation is an active research topic that illustrates the need for parallel processing with a PE matrix.

[Figure: matrix-processing network of processing elements (PE) P1 to P9 with interconnection registers of Type 1 and Type 2, together with a color-picture segmentation example.]

(RCNS: Research Center for Nanodevices and Systems)


Video-Picture Segmentation Test-Chip Design (RCNS: Research Center for Nanodevices and Systems)

Research results of 森本高志 (M2) and 原田洋明 (M1). (An award from ten major semiconductor manufacturers and others, covering hardware and software design assets for realizing system LSIs.)

[Figure: chip photograph with the cell network (10×10 pixels, 4.31mm2) and dimension labels 2.1mm and 2.6mm, plus a detail view of one PE with dimension labels 213μm and 207μm, surrounded by interconnection registers of Type 1 and Type 2.]

The CMOS test chip has a processing matrix for 100 (10×10) pixels (Processing Elements: 100, Interconnection Registers: 121).

The area of the processing matrix is 4.31mm2.
