SSS 4/9/99 CMU Reconfigurable Computing 1
The CMU Reconfigurable Computing Project
April 9, 1999
Mihai Budiu
Current Project Members
ECE Department
Herman Schmit, Srihari Cadambi, Matt Moe, Robert Taylor, Ronald Laufer
CS Department
Seth Copen Goldstein, Mihai Budiu
Why Study Reconfigurable Hardware?
It is a nice computation paradigm (wire your own computer)
Why Study Reconfigurable Hardware
Algorithm            Year  System         Versus               Speedup (x)
DNA matching         1992  SPLASH 2       SPARC 10             4300
FIR Filter           1998  PipeRench      UltraSparc 300 MHz   90
IDEA Encryption      1998  PipeRench      UltraSparc 300 MHz   61
SAT solver           1997  Pamette        SPARC 5 110 MHz      17-1100
Ray Casting          1995  RIPP-10        Pentium 75 MHz       33.8
Hidden Markov Model  1996  1 Xilinx FPGA  SPARC 10             24.4
DES Encryption       1996  GARP           UltraSparc 170 MHz   24
SPEC92               1994  MIPS+RC        MIPS                 1.22
Commercial Players
Source: In-stat, April 1998
*Does not include software, hardwire or support EPROMs
What Is “Reconfigurable Hardware”?
Universal gates and/or storage elements
Interconnection network
Switches
Basic Ingredient: RAM cell
Universal gate = RAM
[Figure: a RAM with address inputs a0 and a1 and contents 0001; its data output is labeled "a1 & a2".]
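The idea that a small RAM acts as a universal gate can be sketched in a few lines. This is an illustrative model, not code from the project: `make_gate` is a hypothetical helper, and the address ordering (a1 as the high bit) is an assumption. The contents 0001 are the ones shown on the slide.

```python
def make_gate(contents):
    """Return a 2-input gate whose truth table is the 4-bit RAM contents.

    contents[i] is the bit stored at address i, with i = (a1 << 1) | a0.
    The address ordering is an assumption made for this sketch.
    """
    if len(contents) != 4 or any(bit not in (0, 1) for bit in contents):
        raise ValueError("expected exactly four bits")

    def gate(a0, a1):
        # Reading the RAM at the address formed by the inputs evaluates the gate.
        return contents[(a1 << 1) | a0]

    return gate


and_gate = make_gate([0, 0, 0, 1])  # contents 0001: output is 1 only at address 3
xor_gate = make_gate([0, 1, 1, 0])  # same structure, different contents
```

Changing the stored bits changes the gate without changing the hardware structure, which is the sense in which the RAM is "universal".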
Basic Ingredients (ctd)
A switch is controlled by a 1-bit RAM cell
[Figure: switches with RAM cell values 0, 1, 1, 1.]
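A RAM-controlled switch can be modeled as a connection that passes a value only when its cell holds 1. The sketch below is a toy model under stated assumptions: `route` and the matrix layout `cells[output][input]` are hypothetical names chosen for illustration, not part of any real device interface.

```python
def route(inputs, cells):
    """Route input wires to output wires through RAM-controlled switches.

    cells[o][i] == 1 means the switch between input i and output o is closed.
    An output with no closed switch is left undriven (None).
    """
    outputs = []
    for row in cells:
        driven = [value for cell, value in zip(row, inputs) if cell == 1]
        if len(driven) > 1:
            raise ValueError("output driven by more than one input")
        outputs.append(driven[0] if driven else None)
    return outputs


# Two inputs, two outputs: the cell contents swap the wires.
swapped = route([7, 9], [[0, 1], [1, 0]])
```

Rewriting the cell contents reroutes the wires, so the interconnect is programmed the same way as the gates.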
Outline
• What is reconfigurable hardware
• RH vs other computation paradigms
• Challenges in RH research
• PipeRench: the CMU project:
– the hardware
– the software
• Conclusions
RH vs ASICs
• Generally, Application-Specific Integrated Circuits will be faster than RH:
– RH wires are slow & big
– RH bit-slices are costly to interconnect
– RH devices must store configuration on the chip
but
• RH can be reprogrammed
– new algorithms
– to fix bugs
• RH cheaper in small production
• RH tolerates faults better
• RH sometimes faster with staged computation
RH vs Microprocessors
• RH less flexible (like a VLIW with fixed instructions)
but
• RH provides more (customized) computation elements
• RH can decrease memory traffic
• RH can be tailored for specific algorithms and data types
RH will not replace µP, but complement them
Types of RH
• FPGAs: bit-level logic functionality
  (the basic processing elements compute on 1 bit)
• word-based architectures: PipeRench (CMU)
  (basic PE operates on 8 bits)
  (basic PE is a small ALU)
• coarse architectures: RAW (MIT)
  (basic PE is a MIPS 2000 core)
RH In A System
[Figure: coupling]
Challenges In RC
• Software tools:
– Programming RC like software development
– Automatic compilation from HLL
– Automatic program partitioning
• Mapping algorithms efficiently (no ISA)
• System issues
– interfaces
– find “ideal” RC fabric
Hardware Goals
• To build a complete reconfigurable hardware device
• To build the system integration hardware
• To host the device in a PC
Our Device:
• Word processing elements
• Pipelined architecture
• Virtualized hardware
• Local interconnection network
• Wide pipelined bus
[Figure: device block diagram. Labels: Configuration memory; Stripes; Data & Config controller; Processing elements.]
Hardware Virtualization
[Figure labels: Instructions currently in hardware; Instructions paged out; Actual available hardware; Program.]
Hardware Virtualization (2)
[Figure labels: compute (three times); configure; Page in; Page out; Program in configuration memory; hardware.]
Overlap configuration with computation.
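A small simulation can make the overlap of configuration with computation concrete. This is a sketch under assumptions the slides do not state: one physical stripe is reconfigured per cycle in round-robin order, and the program's virtual stripes are paged in sequentially. The function name `schedule` and its tuple layout are hypothetical.

```python
def schedule(num_virtual, num_physical, cycles):
    """Simulate paging virtual stripes into a smaller physical fabric.

    Returns one tuple per cycle:
    (cycle, physical_slot, paged_in, paged_out, computing)
    where `computing` lists the virtual stripes that keep executing
    in the other slots while `physical_slot` is being configured.
    """
    resident = [None] * num_physical  # virtual stripe held by each physical slot
    log = []
    for t in range(cycles):
        slot = t % num_physical       # assumed round-robin choice of slot
        incoming = t % num_virtual    # assumed sequential paging order
        computing = [v for i, v in enumerate(resident)
                     if i != slot and v is not None]
        log.append((t, slot, incoming, resident[slot], computing))
        resident[slot] = incoming
    return log


# A 5-stripe program on 3 physical stripes.
log = schedule(num_virtual=5, num_physical=3, cycles=5)
```

In this run, cycle 3 pages virtual stripe 3 into slot 0 and pages stripe 0 out, while stripes 1 and 2 continue to compute.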
Processing Elements
• Look-up table
• Any 3-to-1 function
[Figure labels: inputs a, b, Cin; output out; PE2, PE1, PE0.]
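A 3-input look-up table can compute any 3-to-1 function, and the a, b, Cin labels suggest a carry chain across PEs. The sketch below assumes each bit position uses two such tables (one for the sum, one for the carry); that pairing, and the names `lut3`, `table` and `add`, are illustrative assumptions rather than the actual PE design.

```python
def lut3(config):
    """Build a 3-input, 1-output function from an 8-bit configuration word.

    Bit number (a << 2) | (b << 1) | cin of `config` holds the output.
    """
    def f(a, b, cin):
        return (config >> ((a << 2) | (b << 1) | cin)) & 1
    return f


def table(fn):
    """Derive the 8-bit configuration word for an arbitrary 3-to-1 function."""
    config = 0
    for idx in range(8):
        a, b, cin = (idx >> 2) & 1, (idx >> 1) & 1, idx & 1
        config |= fn(a, b, cin) << idx
    return config


SUM = lut3(table(lambda a, b, c: a ^ b ^ c))
CARRY = lut3(table(lambda a, b, c: (a & b) | (a & c) | (b & c)))


def add(x, y, width=3):
    """Ripple-carry addition, one bit position per PE (PE0 is the low bit)."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= SUM(a, b, carry) << i
        carry = CARRY(a, b, carry)
    return result, carry
```

For example, `add(2, 3)` yields `(5, 0)`, and `add(5, 3)` overflows the three bits and yields `(0, 1)`.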
The Interconnection Network
[Figure labels: Word-level cross-bar; Pass Registers; PE N, PE, PE 1; constant 0; widths B bits, P*B bits, P*B*N bits.]
The PCI Board
[Figure: chip.eps]
Software Goal
To program reconfigurable devices using the standard software development processes:
– Compile C or Java
– Do it quickly
[Figure: compilation flow. Labels: Java; Partitioner; DIL (Data-flow Intermediate Language); Configuration; Reconfigurable HW; CPU; Built.]
Building Circuits From DIL
a = b + c * d;
e = c - d;
• variables → wires
• operators → gates
[Figure: data-flow graph with gates +, *, - over wires b, c, d, producing a and e.]
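The mapping from variables to wires and operators to gates can be sketched as a tiny translator. This is not the DIL compiler: it borrows Python's `ast` module as a stand-in parser, since the two statements on the slide are also valid Python expressions, and `build_circuit` and the temporary wire names `t1`, `t2`, ... are hypothetical.

```python
import ast

OP_NAMES = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def build_circuit(source):
    """Translate assignments into a gate list: (operator, in1, in2, out_wire)."""
    gates = []
    temp_count = 0

    def emit(node, out=None):
        nonlocal temp_count
        if isinstance(node, ast.Name):
            return node.id                    # a variable is just a wire
        if isinstance(node, ast.BinOp):
            left = emit(node.left)
            right = emit(node.right)
            if out is None:                   # intermediate result: fresh wire
                temp_count += 1
                out = f"t{temp_count}"
            gates.append((OP_NAMES[type(node.op)], left, right, out))
            return out
        raise ValueError("unsupported expression")

    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign) or len(stmt.targets) != 1:
            raise ValueError("expected simple assignments")
        emit(stmt.value, out=stmt.targets[0].id)
    return gates


gates = build_circuit("a = b + c * d\ne = c - d")
```

The result has three gates: a multiplier feeding an adder through the intermediate wire `t1`, and an independent subtractor that shares the input wires c and d.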
Mapping Circuits To
[Figure: four panels, each showing gates - and + over wires a, b, c.]
The DIL Compiler Front-End
[Figure: front-end block diagram. Labels: DIL input file; Parser; Evaluator; Loader (twice); Circuit; component library; Component circuits; Backend.]
The DIL Compiler Backend
[Figure: back-end block diagram. Labels: Front-end; Circuit (expanded); Optimizer; Circuit; Placer-Router; Circuit (placed); Code generator; Asm; C++; C++; xfig.]
The whole compilation process is very fast (compared to classical CAD tools).
We can compile two orders of magnitude faster.
Processing Element Size Tradeoffs
Small                   Big
Efficient usage         Wasteful
Slower                  Faster bit-slice
Flexible interconnect   Coarse routing
Bigger configuration    Fewer configuration bits
Place and route easier  Constrains the compiler
Stripe Width Tradeoffs
Wider             Narrower
Fewer stripes     More will fit
Virtualize more   Fewer page-ins
Bandwidth waste   Less bandwidth available
Placer freedom    Placement constrained
Bus Width Tradeoffs
Wider            Narrower
More area        Less area
High bandwidth   Time-mux bus
Clock Speed Tradeoffs (run-time)
Faster                   Slower
Short critical path      Big chains
Long pipeline built      Compact circuits
Decomposition overhead   Little decomposition
Virtualized more         Less virtualized
[Figure: adders (+) with widths labeled 24 and 8.]
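The decomposition that a faster clock forces can be illustrated with an adder. Assuming a 24-bit addition is split into three 8-bit stages chained by their carries (an interpretation of the 24 and 8 labels, not something the slide states outright), a sketch looks like this; `add8` and `add24` are hypothetical names.

```python
def add8(a, b, carry_in):
    """One 8-bit adder stage: returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add24(x, y):
    """24-bit addition built from three chained 8-bit stages."""
    result, carry = 0, 0
    for stage in range(3):
        a = (x >> (8 * stage)) & 0xFF
        b = (y >> (8 * stage)) & 0xFF
        s, carry = add8(a, b, carry)   # each stage could be its own pipeline step
        result |= s << (8 * stage)
    return result, carry
```

Each stage has a short critical path, but the single operation now occupies three pipeline steps, which matches the "long pipeline built" and "decomposition overhead" entries in the Faster column.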
Configuration Bits per Stripe
[Figure: chart. X axis: Stripe Width (64 to 144). Y axis: Configuration Bits (0 to 1600). Legend: PE bit width 2, 4, 8, 16, 32.]
[Figure: fir-throughput.eps]
Project Status
• Operational:
– Behavioral and structural models of PipeRench in Verilog
– Assembler, simulator
– Tools for visualization and debugging
– One tile fabricated and tested
– Very fast compiler from intermediate language
• In work:
– Prototype PipeRench to be taped out this summer
– PCI board to host PipeRench in a PC
Simulated Speed-up vs. UltraSparc @ 300 MHz
[Figure: bar chart, logarithmic axis from 1.0 to 1000.0. Kernels: ATR, Cordic, DCT, FIR, IDEA, Nqueens, Over. Bar labels: 328.8, 29.0, 20.6, 90.9, 61.8, 26.0, 76.1.]
Future Work
• Build the PCI board
• Build the OS device drivers
• Start investigating HLL issues:
– automatic partitioning
– translation to DIL
– special code transformations
Conclusions
• A set of important applications can benefit from RC devices
• RC offers the potential for substantial performance improvement at a low cost
• RC devices will soon be mainstream in the embedded computing world; perhaps in the future they will also permeate the desktop
[Figure label: Pentium V UVR]