Processors, FPGAs, and ASICs
Stephen A. Edwards
Columbia University
Spring 2015
Spectrum of IC choices
From flexible and efficient (top) to cheap and quick to design (bottom):
  Type              You choose
  Full Custom       Polygons (Intel)
  ASIC              Circuit (Sony)
  Gate Array        Wires
  FPGA              Logic network
  PLD               Logic function
  GP Processor      Program (e.g., ARM)
  SP Processor      Program (e.g., DSP)
  Multifunction     Settings (e.g., Ethernet Ctrl.)
  Fixed-function    Part number (e.g., 74HCT00)
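An N-Channel MOS Transistor
[Figure: cross section of an n-channel MOS transistor. Two n regions (drain and source) sit in a p (holes) substrate, with the gate above them on a layer of SiO2. A 3 V supply and an ammeter are connected between drain and source.]
Gate at 0 V: Off
Gate positive (3 V): On. Positive charge on the gate draws negative charge to the surface under the SiO2.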
CMOS Inverter Layout
[Figure: two views of a CMOS inverter with input A, output Y, and supply rails Vdd and Vss: a cross section through the n-channel FET, and a top view.]
The CMOS NAND Gate
Two-input NAND gate: two n-FETs in series; two p-FETs in parallel.
[Figure: transistor schematic and gate symbol with inputs A and B and output Y.]
Both inputs 0 (Y = 1): both p-FETs turned on; output pulled high.
One input 1, the other 0 (Y = 1): one p-FET turned on; output pulled high. One n-FET turned on, but it does not control the output.
Both inputs 1 (Y = 0): both n-FETs turned on; output pulled low. Both p-FETs turned off.
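
The three cases above can be checked with a small switch-level model. This is only a sketch: it treats each transistor as an ideal switch, and the names nfet_on, pfet_on, and cmos_nand are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Ideal-switch assumption: an n-FET conducts when its gate is 1,
   a p-FET conducts when its gate is 0. */
static bool nfet_on(bool gate) { return gate; }
static bool pfet_on(bool gate) { return !gate; }

/* Hypothetical helper: output Y of a two-input CMOS NAND gate. */
bool cmos_nand(bool a, bool b)
{
    /* Two n-FETs in series: both must conduct to pull Y low. */
    bool pull_down = nfet_on(a) && nfet_on(b);
    /* Two p-FETs in parallel: either one can pull Y high. */
    bool pull_up = pfet_on(a) || pfet_on(b);
    /* Exactly one network conducts, so Y is never floating or shorted. */
    assert(pull_up != pull_down);
    return pull_up;
}
```

Calling cmos_nand with inputs (0,0), (0,1), and (1,0) returns 1; only (1,1) returns 0. The assertion inside the function reflects why the series/parallel pairing is used: the pull-up and pull-down networks are complements of each other.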
Full Custom: Intel 4004 Masks
Full Custom: Intel 4004 Die Photograph
Standard Cell ASICs
Channeled Gate Arrays
Sea-of-Gates Gate Arrays
FPGAs: Floorplan
[Figure: FPGA floorplan. Labeled blocks: four DLLs, two block RAMs, I/O logic, and four arrays of CLBs.]
FPGAs: Routing
FPGAs: CLB
[Figure: CLB schematic containing two identical halves. Each half has a look-up table with inputs I1-I4 and output O, a block of carry and control logic, and a D flip-flop (D, CK, EC, Q, R, S). Signal labels: F1-F4, G1-G4, BX, BY, F5IN, CIN, COUT, CLK, CE, SR, X, XB, XQ, Y, YB, YQ.]
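
A four-input look-up table can be modeled as a 16-bit truth table indexed by its inputs. The sketch below illustrates that idea only; it is not a model of any particular vendor's CLB, and the names lut4, lut4_program, lut4_eval, and parity4 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One output bit for each of the 16 combinations of I1..I4.
   "Programming" the LUT means choosing these 16 bits. */
typedef struct { uint16_t bits; } lut4;

/* Fill the table by tabulating an arbitrary 4-input Boolean function. */
lut4 lut4_program(int (*f)(int i1, int i2, int i3, int i4))
{
    lut4 t = { 0 };
    for (unsigned idx = 0; idx < 16; idx++) {
        int out = f(idx & 1, (idx >> 1) & 1, (idx >> 2) & 1, (idx >> 3) & 1);
        if (out)
            t.bits |= (uint16_t)(1u << idx);
    }
    return t;
}

/* Evaluate: the four inputs form an address into the table. */
int lut4_eval(lut4 t, int i1, int i2, int i3, int i4)
{
    assert(((i1 | i2 | i3 | i4) & ~1) == 0);   /* each input is 0 or 1 */
    unsigned idx = (unsigned)(i1 | (i2 << 1) | (i3 << 2) | (i4 << 3));
    return (t.bits >> idx) & 1;
}

/* Example function to load into the table: 4-input XOR (parity). */
int parity4(int a, int b, int c, int d) { return a ^ b ^ c ^ d; }
```

For example, lut4_program(parity4) produces a table that lut4_eval then reads back without recomputing the XOR. Any other function of four inputs costs the same single lookup, which is why the LUT serves as the generic logic element.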
PLAs/CPLDs: The 22v10
[Figure: excerpt of the 22v10 logic diagram and fuse map. Inputs: pin 1 (CLK/I) and pins 2-5 (I). Outputs: five macro-cells driving I/O/Q pins 23-19. An asynchronous reset term goes to all registers. Fuse columns are numbered in increments of 4 (0, 4, 8, ..., 40). First fuse numbers of the rows shown: 0, 396, 440, 880, 924, 1452, 1496, 2112, 2156, 2860. Macro-cell fuse pairs: R = 5809, P = 5808; R = 5811, P = 5810; R = 5813, P = 5812; R = 5815, P = 5814; R = 5817, P = 5816.]
Example: Euclid’s Algorithm
int gcd(int m, int n)
{
  int r;
  while ((r = m % n) != 0) {
    m = n;
    n = r;
  }
  return n;
}
i386 Programmer’s Model
32-bit registers (bits 31..0)
  eax, ebx, ecx, edx   Mostly general-purpose registers
  esi                  Source index
  edi                  Destination index
  ebp                  Base pointer
  esp                  Stack pointer
  eflags               Status word
  eip                  Instruction pointer
16-bit segment registers (bits 15..0)
  cs                   Code segment
  ds                   Data segment
  ss                   Stack segment
  es                   Extra segment
  fs                   Data segment
  gs                   Data segment
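Euclid on the i386
gcd:    pushl %ebp              # save caller's frame pointer
        movl  %esp,%ebp
        pushl %ebx              # save ebx; it will hold r
        movl  8(%ebp),%eax      # eax = m
        movl  12(%ebp),%ecx     # ecx = n
        jmp   .L6
.L4:    movl  %ecx,%eax         # m = n
        movl  %ebx,%ecx         # n = r
.L6:    cltd                    # sign-extend eax into edx:eax
        idivl %ecx              # edx = m % n
        movl  %edx,%ebx         # r = remainder
        testl %edx,%edx
        jne   .L4               # loop while r != 0
        movl  %ecx,%eax         # return n
        movl  -4(%ebp),%ebx     # restore ebx
        leave
        ret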
SPARC Programmer’s Model
32-bit registers (bits 31..0)
  r0               Always 0
  r1 ... r7        Global Registers
  r8/o0 ...        Output Registers
  r14/o6           Stack Pointer
  r15/o7
  r16/l0 ... r23/l7  Local Registers
  r24/i0 ...       Input Registers
  r30/i6           Frame Pointer
  r31/i7           Return Address
  PSW              Status Word
  PC               Program Counter
  nPC              Next PC
SPARC Register Windows
The output registers of the calling procedure become the inputs to the called procedure.
The global registers remain unchanged.
The local registers are not visible across procedures.
[Figure: three overlapping register windows. Each window consists of r24/i0 ... r31/i7, r16/l0 ... r23/l7, and r8/o0 ... r15/o7; the r8/o0 ... r15/o7 of one window line up with the r24/i0 ... r31/i7 of the next.]
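
The overlap can be sketched as index arithmetic on one circular register file. This is a simplified model under stated assumptions: eight windows (the real count depends on the implementation), a window pointer that counts up on a call, and no overflow trap when the windows run out. All names here (NWINDOWS, slot, get_reg, set_reg, enter_procedure, leave_procedure) are hypothetical.

```c
#include <assert.h>

#define NWINDOWS 8                  /* assumed number of windows */
#define NSLOTS   (NWINDOWS * 16)    /* each window adds 16 registers */

static int globals[8];              /* r0-r7, shared by every procedure */
static int windowed[NSLOTS];        /* circular file holding all windows */
static int cwp;                     /* current window pointer */

/* Slot in the circular file for windowed register r (8..31). */
static int slot(int r)
{
    int offset;
    assert(r >= 8 && r < 32);
    if (r >= 24)      offset = r - 24;         /* ins: first 8 slots */
    else if (r >= 16) offset = 8 + (r - 16);   /* locals: next 8 slots */
    else              offset = 16 + (r - 8);   /* outs: the next window's ins */
    return (cwp * 16 + offset) % NSLOTS;
}

int get_reg(int r)
{
    assert(r >= 0 && r < 32);
    if (r == 0) return 0;                      /* r0 always reads as 0 */
    return r < 8 ? globals[r] : windowed[slot(r)];
}

void set_reg(int r, int value)
{
    assert(r >= 0 && r < 32);
    if (r == 0) return;                        /* writes to r0 are discarded */
    if (r < 8) globals[r] = value;
    else windowed[slot(r)] = value;
}

/* Shift the window by 16 slots, as save and restore do. */
void enter_procedure(void) { cwp = (cwp + 1) % NWINDOWS; }
void leave_procedure(void) { cwp = (cwp + NWINDOWS - 1) % NWINDOWS; }
```

A caller that writes r8/o0 and then calls enter_procedure finds the same value in r24/i0, while its r16/l0 is out of reach until leave_procedure. A value the callee leaves in r24/i0 is visible to the caller as r8/o0 after returning.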
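Euclid on the SPARC
gcd:    save  %sp, -112, %sp    ! new register window and stack frame
        mov   %i0, %o1          ! o1 = m
        b     .LL3
        mov   %i1, %i0          ! (delay slot) i0 = n
.LL5:   mov   %o0, %i0          ! n = r
.LL3:   mov   %o1, %o0          ! first argument = m
        call  .rem, 0           ! o0 = m % n
        mov   %i0, %o1          ! (delay slot) second argument = n
        cmp   %o0, 0
        bne   .LL5              ! loop while r != 0
        mov   %i0, %o1          ! (delay slot) m = n
        ret
        restore                 ! (delay slot) result n is left in i0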
Motorola DSP56301
[Figure 1-1: DSP56301 block diagram. Recoverable labels:
  DSP56300 core: Data ALU (24 × 24 + 56 → 56-bit MAC, two 56-bit accumulators, 56-bit barrel shifter); Address Generation Unit; Six-Channel DMA Unit; Program Interrupt Controller; Program Decode Controller; Program Address Generator; PLL; Clock Generator (EXTAL, XTAL); OnCE; JTAG; Power Management.
  Memory: Program RAM 4096 × 24 (default); X Data RAM 2048 × 24 (default); Y Data RAM 2048 × 24 (default); Bootstrap ROM; memory expansion area (PM_EB, XM_EB, YM_EB).
  Peripherals: Triple Timer; Host Interface (HI32); ESSI; SCI; peripheral expansion area (PIO_EB).
  Buses: 24-bit address buses XAB, YAB, PAB, DAB; data buses XDB, YDB, PDB, GDB, DDB; internal data bus switch; external address bus switch and external data bus switch; External Bus Interface and I-Cache Control (ADDRESS, CONTROL, DATA).
  Pins: RESET, PINIT/NMI, DE, and the mode/interrupt pins MODA-MODD / IRQA-IRQD.]
DSP 56000 Programmer’s Model
Bits:    55..48   47..24   23..0
                  x1       x0       Source Registers
                  y1       y0       Source Registers
         a2       a1       a0       Accumulator
         b2       b1       b0       Accumulator
Address Registers (bits 15..0)
  r7 ... r4       n7 ... n4       m7 ... m4
  r3 ... r0       n3 ... n0       m3 ... m0
Bits 15..0
  Program Counter
  Status Register
  Loop Address
  Loop Count
  PC Stack (15 ... 0)
  SR Stack (15 ... 0)
  Stack pointer
Motorola DSP56301 ALU
[Figure: Data ALU block diagram. Recoverable labels: X Data Bus and Y Data Bus (24 bits each); P Data Bus; input registers X0, X1, Y0, Y1; Immediate Field; MUX; Multiplier; Pipeline Register; Accumulator and Rounding Unit; accumulators A (56) and B (56); Accumulator Shifter; Bit Field Unit and Barrel Shifter; Shifter/Limiter; Forwarding Register. Internal paths are marked 24, 48, and 56 bits wide.]
Motorola DSP56301 AGU
[Figure: Address Generation Unit block diagram. Recoverable labels: registers R0-R3, N0-N3, M0-M3 and R4-R7, N4-N7, M4-M7; Low Address ALU and High Address ALU; Triple Multiplexer; outputs XAB, YAB, PAB; Program Address Bus; Global Data Bus; EP.]
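FIR Filter in 56000
        move  #samples, r0                            ; r0 points to the sample buffer
        move  #coeffs, r4                             ; r4 points to the coefficients
        move  #n-1, m0                                ; modulo-n addressing for r0
        move  m0, m4                                  ; and for r4
        movep y:input, x:(r0)                         ; store the new input sample
        clr   a        x:(r0)+, x0   y:(r4)+, y0      ; clear a, load the first pair
        rep   #n-1                                    ; repeat the next instruction n-1 times
        mac   x0,y0,a  x:(r0)+, x0   y:(r4)+, y0      ; multiply-accumulate, load the next pair
        macr  x0,y0,a  (r0)-                          ; last product with rounding, step r0 back
        movep a, y:output                             ; write the result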
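
For comparison, here is a rough C rendering of one pass through that code. It is a sketch, not a bit-accurate model: it uses plain integers rather than 24-bit fractional values, omits the rounding that macr performs, and fixes the filter length at a hypothetical NTAPS. The names fir_state and fir_step are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NTAPS 4   /* hypothetical filter length (n in the assembly) */

typedef struct {
    int samples[NTAPS];   /* circular delay line, the x: buffer */
    size_t pos;           /* plays the role of r0 */
} fir_state;

/* Consume one input sample and produce one output sample. */
int fir_step(fir_state *s, const int coeffs[NTAPS], int input)
{
    assert(s->pos < NTAPS);
    s->samples[s->pos] = input;                   /* movep y:input, x:(r0) */
    long acc = 0;                                 /* clr a */
    size_t i = s->pos;
    for (size_t k = 0; k < NTAPS; k++) {          /* n multiply-accumulates */
        acc += (long)s->samples[i] * coeffs[k];   /* mac x0,y0,a */
        i = (i + 1) % NTAPS;                      /* x:(r0)+ with modulo n */
    }
    /* After n increments i has wrapped back to pos. Stepping back by one
       makes the next input overwrite the oldest sample: (r0)- */
    s->pos = (s->pos + NTAPS - 1) % NTAPS;
    return (int)acc;                              /* movep a, y:output */
}
```

Feeding an impulse (1, 0, 0, 0) through a zeroed state returns the coefficients one per call, which is a quick sanity check on the wraparound. On the DSP the modulo step costs nothing extra because the address unit applies m0 and m4 in parallel with each mac.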
TI TMS320C6000 VLIW DSP
[Figure: TMS320C6000 data paths. Data path A has register file A (A0–A15) and functional units .L1, .S1, .M1, .D1; data path B has register file B (B0–B15) and functional units .L2, .S2, .M2, .D2. Each unit has src1, src2, and dst ports, some with long src and long dst ports (8 bits). Cross paths 1X and 2X link the two register files. Memory paths: LD1, LD2, ST1, ST2 (32 bits) with data addresses DA1 and DA2. A control register file is also shown.]
FIR in One ’C6 Assembly Instruction
FIRLOOP:
LDH .D1 *A1++, A2 ; Fetch next sample
|| LDH .D2 *B1++, B2 ; Fetch next coeff.
|| [B0] SUB .L2 B0, 1, B0 ; Decrement count
|| [B0] B .S2 FIRLOOP ; Branch if non-zero
|| MPY .M1X A2, B2, A3 ; Sample × Coeff.
|| ADD .L1 A4, A3, A4 ; Accumulate result
Notation:
  LDH: load a halfword (16 bits)
  .D1: do this on unit D1
  .M1X: the X means use the cross path
  [B0]: predicated instruction (executes only if B0 is non-zero)
  ||: run in parallel with the previous instruction
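
Ignoring timing, the six parallel operations correspond to the body of an ordinary C loop. The sketch below shows only that correspondence. It does not model load, multiply, or branch latencies, so it says nothing about how the real loop is pipelined. The function name fir_dot is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of products of 16-bit samples and coefficients.
   Comments name the instruction each statement stands for. */
int32_t fir_dot(const int16_t *samples, const int16_t *coeffs, int count)
{
    assert(count > 0);
    int32_t acc = 0;                      /* A4 */
    do {
        int16_t s = *samples++;           /* LDH .D1 *A1++, A2 */
        int16_t c = *coeffs++;            /* LDH .D2 *B1++, B2 */
        int32_t p = (int32_t)s * c;       /* MPY .M1X A2, B2, A3 */
        acc += p;                         /* ADD .L1 A4, A3, A4 */
        count--;                          /* [B0] SUB .L2 B0, 1, B0 */
    } while (count != 0);                 /* [B0] B .S2 FIRLOOP */
    return acc;
}
```

In C these statements run one after another. On the 'C6 each goes to a different functional unit in the same cycle, which is why one instruction does the work of a whole loop iteration.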
AX88796 Ethernet Controller
[Figure: AX88796 block diagram. Recoverable labels: MAC Core & PHY + Transceiver (TPI, TPO; MII I/F); 8K × 16 SRAM and Memory Arbiter; Remote DMA FIFOs; NE2000 Registers; Host Interface (SD[15:0], SA[9:0], Ctl BUS); STA (SMDC, SMDIO); SEEPROM I/F (EECS, EECK, EEDI, EEDO); SPP / GPIO (Print Port or General I/O).]
Ethernet Controller Registers
PAGE 0 (PS1=0, PS0=0)
OFFSET     READ                                  WRITE
00H        Command Register (CR)                 Command Register (CR)
01H        Page Start Register (PSTART)          Page Start Register (PSTART)
02H        Page Stop Register (PSTOP)            Page Stop Register (PSTOP)
03H        Boundary Pointer (BNRY)               Boundary Pointer (BNRY)
04H        Transmit Status Register (TSR)        Transmit Page Start Address (TPSR)
05H        Number of Collisions Register (NCR)   Transmit Byte Count Register 0 (TBCR0)
06H        Current Page Register (CPR)           Transmit Byte Count Register 1 (TBCR1)
07H        Interrupt Status Register (ISR)       Interrupt Status Register (ISR)
08H        Current Remote DMA Address 0 (CRDA0)  Remote Start Address Register 0 (RSAR0)
09H        Current Remote DMA Address 1 (CRDA1)  Remote Start Address Register 1 (RSAR1)
0AH        Reserved                              Remote Byte Count 0 (RBCR0)
0BH        Reserved                              Remote Byte Count 1 (RBCR1)
0CH        Receive Status Register (RSR)         Receive Configuration Register (RCR)
0DH        Frame Alignment Errors (CNTR0)        Transmit Configuration Register (TCR)
0EH        CRC Errors (CNTR1)                    Data Configuration Register (DCR)
0FH        Missed Packet Errors (CNTR2)          Interrupt Mask Register (IMR)
10H, 11H   Data Port                             Data Port
12H        IFGS1                                 IFGS1
13H        IFGS2                                 IFGS2
14H        MII/EEPROM Access                     MII/EEPROM Access
15H        Test Register                         Test Register
16H        Inter-frame Gap (IFG)                 Inter-frame Gap (IFG)
17H        GPI                                   GPOC
18H - 1AH  Standard Printer Port (SPP)           Standard Printer Port (SPP)
1BH - 1EH  Reserved                              Reserved
1FH        Reset                                 Reserved
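
"Programming" a part like this means writing values to numbered registers. The sketch below shows the flavor using a few write-side offsets from the table. It makes several assumptions: the chip is replaced by a plain array (fake_nic) so the code runs anywhere, TBCR0 is taken to hold the low byte of the count and TBCR1 the high byte, and the names reg_write and set_transmit are invented for illustration. A real driver would also select the register page and use volatile accesses to the device's bus address.

```c
#include <assert.h>
#include <stdint.h>

/* Page 0 write-side offsets, copied from the table. */
enum {
    REG_CR    = 0x00,   /* Command Register */
    REG_TPSR  = 0x04,   /* Transmit Page Start Address */
    REG_TBCR0 = 0x05,   /* Transmit Byte Count Register 0 */
    REG_TBCR1 = 0x06    /* Transmit Byte Count Register 1 */
};

/* Stand-in for the controller: just remembers the last value written
   to each offset. It does not model separate read and write registers. */
typedef struct { uint8_t regs[0x20]; } fake_nic;

static void reg_write(fake_nic *nic, unsigned offset, uint8_t value)
{
    assert(offset < sizeof nic->regs);
    nic->regs[offset] = value;
}

/* Set where the outgoing packet starts and how many bytes it has.
   Assumption: low byte goes to TBCR0, high byte to TBCR1. */
void set_transmit(fake_nic *nic, uint8_t start_page, uint16_t nbytes)
{
    reg_write(nic, REG_TPSR, start_page);
    reg_write(nic, REG_TBCR0, (uint8_t)(nbytes & 0xFF));
    reg_write(nic, REG_TBCR1, (uint8_t)(nbytes >> 8));
}
```

Note that the table gives different meanings to a read and a write at offsets such as 04H through 06H, so reading back one of these offsets on the real part would not return the value just written.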
Philips SAA7114H Video Decoder
SAA7114H Registers, page 1 of 7 (!)
Fixed-function: The 7400 series
7400 Quad NAND Gate
  Gate   Inputs (pin)         Output (pin)
  1      1A (1), 1B (2)       1Y (3)
  2      2A (4), 2B (5)       2Y (6)
  3      3A (9), 3B (10)      3Y (8)
  4      4A (12), 4B (13)     4Y (11)
74374 Octal D Flip-Flop
  [Figure: eight D flip-flops with data inputs D0-D7 and outputs Q0-Q7, sharing one clock input CP and one output enable OE.]
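
Both parts are simple enough to describe in a few lines of C. The sketch below is a behavioral model only. It assumes what the standard parts do: the 74374 captures its inputs on the rising edge of CP, and its outputs are driven only while OE is low. The figure itself shows just the pin names, and the function and type names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 7400: four independent two-input NAND gates.
   Bit i of a and b feeds gate i+1; bit i of the result is its output. */
uint8_t ic7400(uint8_t a, uint8_t b)
{
    return (uint8_t)(~(a & b) & 0x0F);
}

/* 74374: eight D flip-flops sharing a clock and an output enable. */
typedef struct {
    uint8_t q;       /* stored bits Q0..Q7 */
    bool last_cp;    /* previous clock level, used to detect edges */
} ic74374;

/* Apply new pin levels. Returns true and fills *out when the outputs
   are driven; returns false when they are high-impedance. */
bool ic74374_step(ic74374 *chip, uint8_t d, bool cp, bool oe_n, uint8_t *out)
{
    assert(chip != 0 && out != 0);
    if (cp && !chip->last_cp)    /* rising clock edge: capture D0..D7 */
        chip->q = d;
    chip->last_cp = cp;
    if (oe_n)                    /* OE high: outputs disabled */
        return false;
    *out = chip->q;
    return true;
}
```

The only choice left to the designer with such parts is which part number to buy and how to wire it, which is what places them at the fixed-function end of the spectrum.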