Lecture 16: Arithmetic Units
• Basic Building Blocks
• Adders
Building Blocks for Digital Architectures
• Arithmetic Unit
  – Bit-sliced datapath (adder, multiplier, shifters, etc.)
• Memory
  – RAM, ROM, buffers, shift registers
• Control
  – Finite state machine (PLA, random logic)
  – Counters
• Interconnect
  – Switches
  – Arbiters
  – Bus
[Figure: block diagram of a digital system with Memory, Datapath, Control, and Input-Output blocks]
An Intel Microprocessor
[Figure: integer execution unit layout (1000 µm) with 9-1, 5-1 and 2-1 multiplexers, a carry generator (CARRYGEN), a combined sum generator and logical unit (SUMGEN + LU), sum select, and a register driving the cache]
Itanium has six integer execution units like this one.
Bit-Sliced Datapath
[Figure: 64-bit bit-sliced datapath (bit slices 0 to 63) with three adder stages separated by wiring, followed by sum select, shifter, and multiplexers; operands come from the register files, cache, or bypass, results go to the register files or cache, and loopback buses feed results back into the datapath]
Itanium Integer Datapath
Fetzer, Orton, ISSCC’02
Adders
• Single-bit Addition
• Carry-Ripple Adder
• Carry-Skip Adder
• Carry-Select Adder
• Carry-Lookahead Adder
• Tree Adder
• Reading: Chapter 10, W&H
Single-Bit Addition
Half Adder
A B | Cout S
0 0 |  0   0
0 1 |  0   1
1 0 |  0   1
1 1 |  1   0

Full Adder
A B C | Cout S
0 0 0 |  0   0
0 0 1 |  0   1
0 1 0 |  0   1
0 1 1 |  1   0
1 0 0 |  0   1
1 0 1 |  1   0
1 1 0 |  1   0
1 1 1 |  1   1
[Figure: half-adder symbol (inputs A, B; outputs S, Cout) and full-adder symbol (inputs A, B, C; outputs S, Cout)]
Half adder: $S = A \oplus B$, $C_{out} = AB$
Full adder: $S = A \oplus B \oplus C$, $C_{out} = \mathrm{MAJ}(A, B, C)$
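The half-adder and full-adder equations can be checked exhaustively in a few lines. A minimal Python sketch, assuming bits are modeled as the integers 0 and 1; the names `half_adder`, `maj`, and `full_adder` are illustrative, not from the lecture:

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (cout, s) for a half adder: S = A xor B, Cout = AB."""
    return a & b, a ^ b


def maj(a: int, b: int, c: int) -> int:
    """Majority of three bits: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (cout, s) for a full adder: S = A xor B xor C, Cout = MAJ(A, B, C)."""
    return maj(a, b, c), a ^ b ^ c


# Print the full-adder truth table with inputs in binary counting order.
for a, b, c in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, c)
    print(a, b, c, "|", cout, s)
```

In every row, `2*Cout + S` equals the number of inputs that are 1, which is exactly what a single-bit adder has to compute.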
Full Adder Design I
• Brute-force implementation from the equations
$S = A \oplus B \oplus C$
$C_{out} = \mathrm{MAJ}(A, B, C)$
[Figure: brute-force transistor-level full adder with a majority (MAJ) gate producing Cout and a separate network producing S from A, B, and C]
Complementary Static CMOS Full Adder
[Figure: transistor-level schematic of the complementary static CMOS full adder, with inputs A, B, Ci, internal node X, and outputs Co and S]
A Better Structure: The Mirror Adder
[Figure: mirror adder schematic with inputs A, B, Ci and outputs Co and S; the carry stage contains Kill, Generate, "0"-Propagate and "1"-Propagate paths]
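The mirror adder, like the complementary static CMOS adder, is commonly described as reusing the inverted carry to form the sum: $S = A B C_i + (A + B + C_i)\,\overline{C_o}$. That formula is not written on these slides, so treat it as an assumption; the Python sketch below only checks that it equals $A \oplus B \oplus C_i$ for every input. The function names are hypothetical:

```python
from itertools import product


def carry(a: int, b: int, ci: int) -> int:
    """Co = AB + Ci(A + B): generate, or propagate an incoming carry."""
    return (a & b) | (ci & (a | b))


def sum_from_carry(a: int, b: int, ci: int) -> int:
    """S = ABCi + (A + B + Ci) * not(Co), reusing the carry stage output."""
    co_bar = 1 - carry(a, b, ci)
    # Either all three inputs are high, or at least one is high but there
    # is no carry out (which means exactly one input is high).
    return (a & b & ci) | ((a | b | ci) & co_bar)


for a, b, ci in product((0, 1), repeat=3):
    assert sum_from_carry(a, b, ci) == a ^ b ^ ci
print("sum-from-carry form matches A xor B xor Ci for all 8 inputs")
```

Because the sum stage only needs A, B, Ci and the inverted carry, it can be built from small transistors hanging off the carry stage, with no separate XOR chain.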
Mirror Adder
[Figure: mirror adder stick diagram (VDD and GND rails; inputs A, B, Ci; outputs Co and S) and the corresponding layout]
• Clever layout circumvents the usual line of diffusion
  – Use wide transistors on the critical path
  – Eliminate output inverters
Carry Propagate Adders
• An N-bit adder is called a carry-propagate adder (CPA)
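[Figure: CPA symbol with inputs $A_{N \ldots 1}$, $B_{N \ldots 1}$, $C_{in}$ and outputs $S_{N \ldots 1}$, $C_{out}$, with two 4-bit examples: 1111 + 0000 with Cin = 1 gives carries 1111, S = 0000 and Cout = 1; 1111 + 0000 with Cin = 0 gives carries 0000, S = 1111 and Cout = 0]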
Carry-Ripple Adder
• Simplest design: cascade full adders
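[Figure: 4-bit carry-ripple adder: four cascaded full adders with inputs A1 to A4 and B1 to B4, carry in Cin, internal carries C1 to C3, sum outputs S1 to S4, and carry out Cout]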
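Cascading full adders translates directly into a loop in which each stage consumes the previous stage's carry. A minimal behavioral sketch in Python, assuming little-endian bit lists (index 0 is the least significant bit); `ripple_add`, `to_bits`, and `from_bits` are hypothetical helper names:

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (cout, s) for one bit: Cout = MAJ(A, B, C), S = A xor B xor C."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def ripple_add(a_bits: list[int], b_bits: list[int], cin: int) -> tuple[list[int], int]:
    """Add two equal-length bit vectors; the carry ripples from bit 0 upward."""
    assert len(a_bits) == len(b_bits)
    carry = cin
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each stage must wait for the carry from the stage below it.
        carry, s = full_adder(a, b, carry)
        s_bits.append(s)
    return s_bits, carry


def to_bits(x: int, n: int) -> list[int]:
    """Little-endian list of the low n bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


# 1111 + 0000 with Cin = 1: the carry ripples through all four stages.
s, cout = ripple_add(to_bits(0b1111, 4), to_bits(0b0000, 4), 1)
print(cout, format(from_bits(s), "04b"))  # 1 0000
```

The loop body runs once per bit, which mirrors why the ripple adder's delay grows linearly with N: the worst case (as in this 1111 + 0000 + 1 example) sends a carry through every stage.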
4-bit ripple carry adder
[Figure: 4-bit ripple carry adder (inputs A0 to A3, B0 to B3 and carry in C0; outputs S0 to S3 and carry out C4), shown at gate level with full-adder cells containing MINORITY gates and at transistor level]
Inversion Property
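[Figure: a full adder (FA) with inputs A, B, Ci and outputs S, Co, and the same FA with every input and output inverted]
$\overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i})$
$\overline{C_o}(A, B, C_i) = C_o(\overline{A}, \overline{B}, \overline{C_i})$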
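The inversion property is small enough to confirm by brute force. A Python sketch, assuming 0/1 integers for bits; `full_adder` and `inv` are hypothetical helpers:

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (cout, s): Cout = MAJ(A, B, C), S = A xor B xor C."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def inv(x: int) -> int:
    """Logical NOT of a 0/1 bit."""
    return 1 - x


# Inverting all three inputs inverts both outputs.
for a, b, c in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, c)
    cout_n, s_n = full_adder(inv(a), inv(b), inv(c))
    assert s_n == inv(s) and cout_n == inv(cout)
print("inversion property holds for all 8 input combinations")
```

Both functions are self-dual: XOR of three complemented bits flips the result, and majority of three complemented bits is the complement of the majority.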
Inversions
• The critical path passes through the majority gate of every stage
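[Figure: 4-bit carry-ripple adder (A1 to A4, B1 to B4, C1 to C3, S1 to S4, Cin, Cout) redrawn to show where inversions occur along the carry chain]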
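One common way to exploit the inversion property (an assumption here; the slide only marks where the inversions sit) is to build the carry chain from inverting MINORITY gates and alternate the polarity of successive stages, so no inverter sits on the critical carry path. Odd stages receive complemented operands, which by the inversion property turns an inverted incoming carry back into a true one. A behavioral Python sketch with hypothetical names, bits little-endian:

```python
def minority(a: int, b: int, c: int) -> int:
    """Inverting carry gate: NOT MAJ(A, B, C)."""
    return 1 - ((a & b) | (a & c) | (b & c))


def ripple_alternating(a_bits: list[int], b_bits: list[int], cin: int) -> tuple[list[int], int]:
    """Ripple adder whose carry chain uses only MINORITY gates (bit 0 = LSB).

    `carry` alternates polarity: a true carry enters even stages and an
    inverted carry enters odd stages. Odd stages complement their operands
    instead of re-inverting the carry, so the carry path has no inverters.
    """
    carry, inverted = cin, False
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        if inverted:
            a, b = 1 - a, 1 - b          # complement operands (off the carry path)
        s = a ^ b ^ carry                # sum in this stage's polarity
        s_bits.append(1 - s if inverted else s)  # odd stages compute NOT S; restore it
        carry = minority(a, b, carry)    # output polarity flips every stage
        inverted = not inverted
    cout = 1 - carry if inverted else carry
    return s_bits, cout


s, cout = ripple_alternating([1, 1, 1, 1], [0, 0, 0, 0], 1)
print(s, cout)  # [0, 0, 0, 0] 1
```

The extra inverters move onto the operand and sum paths, which are not on the carry chain, so only the MINORITY gates contribute to the ripple delay.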
The Mirror Adder
• The NMOS and PMOS chains are completely symmetrical
  – A maximum of two series transistors in the carry-generation circuitry
• Minimizing the capacitance at node Co is critical
  – Decrease diffusion capacitance
  – The Co capacitance consists of 4 diffusion capacitances, 2 internal gate capacitances, and 6 gate capacitances in the connecting adder cell
• The transistors connected to Ci are placed closest to the output
  – Ci arrives late, so the internal diffusion capacitances have already been charged or discharged by A and B
• Only the transistors in the carry stage have to be optimized for speed
  – All transistors in the sum stage can be minimal size
Adder design
• Adder delay is limited by the critical path
  – Carry generation and propagation
• High-speed adder design tries to minimize the delay associated with carry generation and propagation
• A structured way of looking at this is needed
  – Introduce carry generate and propagate notation
Carry Generation and Propagation
• For a full adder, define what happens to carries
  – Generate: Cout = 1 independent of C
    • $G = A \cdot B$
  – Propagate: Cout = C
    • $P = A \oplus B$
  – Kill: Cout = 0 independent of C
    • $K = \overline{A} \cdot \overline{B}$
Carry Generate/Propagate
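$C_i = A_iB_i + (A_i + B_i)C_{i-1} = G_i + P_iC_{i-1}$
(Here $P_i = A_i + B_i$. Using $P_i = A_i \oplus B_i$ gives the same carry, because the only input where OR and XOR differ, $A_i = B_i = 1$, already sets $C_i$ through $G_i$.)
$C_{out} = G + P\,C_{in}$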
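A small Python sketch that classifies each (A, B) pair as generate, propagate, or kill and evaluates $C_{out} = G + P\,C_{in}$ with $P = A \oplus B$; the names `gpk` and `carry_out` are hypothetical:

```python
def gpk(a: int, b: int) -> str:
    """Classify one bit position by what it does to an incoming carry."""
    g = a & b              # Generate: Cout = 1 regardless of Cin
    p = a ^ b              # Propagate: Cout = Cin
    k = (1 - a) & (1 - b)  # Kill: Cout = 0 regardless of Cin
    assert g + p + k == 1  # exactly one case applies to every (A, B) pair
    return "G" if g else ("P" if p else "K")


def carry_out(a: int, b: int, cin: int) -> int:
    """Cout = G + P * Cin, using bitwise OR/AND on 0/1 values."""
    return (a & b) | ((a ^ b) & cin)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, gpk(a, b), [carry_out(a, b, c) for c in (0, 1)])
```

Each printed list gives the carry-out for Cin = 0 and Cin = 1: the kill row gives [0, 0], the propagate rows give [0, 1], and the generate row gives [1, 1].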
Carry of Mirror Adder
$G = A \cdot B$, $P = A \oplus B$
Single-bit notation, where $G_{i:0} = C_i$
[Figure: gate with inputs A0 to A7 and output F]
When does this Generate and Propagate a Carry?
What about more complex adders?
Generate / Propagate
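• Equations are often factored into G and P
• Generate and propagate for groups spanning bits i:j, with $i \ge k > j$:
  $G_{i:j} = G_{i:k} + P_{i:k}G_{k-1:j}$, $P_{i:j} = P_{i:k}P_{k-1:j}$
• Base case: $G_{0:0} = G_0 = C_{in}$, $P_{0:0} = P_0 = 0$
• Sum: $S_i = P_i \oplus G_{i-1:0}$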
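Unrolling the group recurrence for a 4-bit group shows what the group logic has to compute. A worked expansion, assuming the base-case convention $G_{0:0} = G_0 = C_{in}$ and $P_0 = 0$:

```latex
\begin{aligned}
G_{1:0} &= G_1 + P_1 G_0 \\
G_{2:0} &= G_2 + P_2 G_{1:0} = G_2 + P_2 G_1 + P_2 P_1 G_0 \\
G_{3:0} &= G_3 + P_3 G_{2:0} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 \\
P_{3:0} &= P_3 P_2 P_1 P_0 = 0 \quad (\text{since } P_0 = 0)
\end{aligned}
```

Each product term reads as "a carry is generated at bit m and propagated through every bit above it"; with $G_0 = C_{in}$, the last term is the carry in propagating through the whole group.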
Addition: 3-Step Process
• Bitwise PG logic: compute the bitwise generate and propagate signals $G_i$ and $P_i$
• Group PG logic: combine these signals to determine the group generates $G_{i-1:0}$ for all $N \ge i \ge 1$
• Sum logic: calculate the sums using $S_i = P_i \oplus G_{i-1:0}$
PG Logic
[Figure: 4-bit adder organized in three layers. 1: bitwise PG logic turns A1 to A4, B1 to B4 and Cin into P0 to P4 and G0 to G4; 2: group PG logic forms G0:0 to G3:0; 3: sum logic produces S1 to S4. Carries C0 to C4 and Cout are labeled]
Carry-Ripple Revisited
$G_{i:0} = G_i + P_i \cdot G_{i-1:0}$
[Figure: the same 4-bit PG structure with the group PG logic implemented as a ripple chain from G0:0 to G3:0]
$G_{1:0} = G_1 + P_1 G_{0:0}$
$G_{2:0} = G_2 + P_2 G_{1:0}$
$G_{3:0} = G_3 + P_3 G_{2:0}$
$S_i = P_i \oplus G_{i-1:0}$
$G_i = A_i B_i$, $P_i = A_i \oplus B_i$
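The three steps, with ripple group logic, can be written directly from these equations. A minimal Python sketch, assuming little-endian bit lists and the base-case convention that PG position 0 holds $G_0 = C_{in}$, $P_0 = 0$ (so operand bit k, counting from 0, sits at PG position k + 1, matching the A1..AN numbering); `pg_ripple_add` is a hypothetical name:

```python
def pg_ripple_add(a_bits: list[int], b_bits: list[int], cin: int) -> tuple[list[int], int]:
    """Three-step addition with ripple group logic; returns (sum bits, Cout)."""
    assert len(a_bits) == len(b_bits)
    n = len(a_bits)

    # Step 1: bitwise PG logic. Position 0 encodes the carry in.
    g = [cin] + [a & b for a, b in zip(a_bits, b_bits)]
    p = [0] + [a ^ b for a, b in zip(a_bits, b_bits)]

    # Step 2: group PG logic, rippling G_{i:0} = G_i + P_i * G_{i-1:0}.
    group_g = [g[0]]  # G_{0:0} = Cin
    for i in range(1, n + 1):
        group_g.append(g[i] | (p[i] & group_g[i - 1]))

    # Step 3: sum logic, S_i = P_i xor G_{i-1:0}.
    s_bits = [p[i] ^ group_g[i - 1] for i in range(1, n + 1)]
    return s_bits, group_g[n]  # Cout = G_{N:0}


print(pg_ripple_add([1, 1, 1, 1], [0, 0, 0, 0], 1))  # ([0, 0, 0, 0], 1)
```

Only step 2 depends on the carry chain; steps 1 and 3 are one gate deep per bit, which is why faster adders differ almost entirely in how they build the group PG logic.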
PG Diagram Notation
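[Figure: PG diagram cells. A black cell combines ($G_{i:k}$, $P_{i:k}$) with ($G_{k-1:j}$, $P_{k-1:j}$); a gray cell combines ($G_{i:k}$, $P_{i:k}$) with $G_{k-1:j}$; a buffer passes ($G_{i:j}$, $P_{i:j}$) through]
Black cell: $G_{i:j} = G_{i:k} + P_{i:k}G_{k-1:j}$, $P_{i:j} = P_{i:k}P_{k-1:j}$
Gray cell: $G_{i:j} = G_{i:k} + P_{i:k}G_{k-1:j}$
Buffer: $G_{i:j}$ and $P_{i:j}$ unchanged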
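The black and gray cells are two views of one combining operator on (G, P) pairs. A Python sketch, with the alias `PG` and the functions `black_cell` and `gray_cell` as hypothetical names, that implements the cell equations and checks the property tree adders rely on: the operator is associative.

```python
from itertools import product

PG = tuple[int, int]  # (G, P) of a group of bits; hypothetical alias


def black_cell(upper: PG, lower: PG) -> PG:
    """(G_{i:k}, P_{i:k}) combined with (G_{k-1:j}, P_{k-1:j}) -> (G_{i:j}, P_{i:j})."""
    g_hi, p_hi = upper
    g_lo, p_lo = lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def gray_cell(upper: PG, g_lo: int) -> int:
    """Generate-only cell: G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}."""
    g_hi, p_hi = upper
    return g_hi | (p_hi & g_lo)


all_pg = list(product((0, 1), repeat=2))

# Associativity: the grouping (tree shape) of black cells does not change the result.
for x, y, z in product(all_pg, repeat=3):
    assert black_cell(black_cell(x, y), z) == black_cell(x, black_cell(y, z))

# A gray cell is a black cell with the P output dropped.
for x, y in product(all_pg, repeat=2):
    assert gray_cell(x, y[0]) == black_cell(x, y)[0]
print("black-cell operator is associative; gray cell matches its G output")
```

The operator is not commutative (the upper group must stay on the left), so cells can be regrouped but not reordered.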
Carry-Ripple PG Diagram
[Figure: 16-bit carry-ripple PG diagram; bit positions 15 to 0 on the horizontal axis and delay on the vertical axis, with group generates 15:0 down to 0:0 produced one after another along the ripple chain]
$t_{ripple} = t_{pg} + (N - 1)\,t_{AO} + t_{xor}$
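The delay formula can be evaluated directly to see the linear growth in N. A Python sketch; `ripple_delay` is a hypothetical name and the unit delays in the example are made up, not measured values:

```python
def ripple_delay(n: int, t_pg: float, t_ao: float, t_xor: float) -> float:
    """Critical path of an N-bit carry-ripple adder: t_pg + (N - 1) t_AO + t_xor.

    One bitwise PG stage, then N - 1 AND-OR (gray-cell) stages along the
    carry chain, then the final XOR in the sum logic.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return t_pg + (n - 1) * t_ao + t_xor


# Hypothetical unit delays, only to show how the total scales with N.
for n in (4, 16, 64):
    print(n, ripple_delay(n, t_pg=1.0, t_ao=1.0, t_xor=1.0))
```

With every gate at one unit, N = 4, 16 and 64 give 5, 17 and 65 units: the AND-OR chain dominates, which motivates the skip, select, lookahead, and tree adders listed in the outline.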