SSS 4/9/99 CMU Reconfigurable Computing 1: The CMU Reconfigurable Computing Project, April 9, 1999, Mihai Budiu, mihaib@cs.cmu.edu

SSS 4/9/99 CMU Reconfigurable Computing 1

The CMU Reconfigurable Computing Project

April 9, 1999

Mihai Budiu

mihaib@cs.cmu.edu

SSS 4/9/99 CMU Reconfigurable Computing 2

Current Project Members

ECE Department

Herman Schmit, Srihari Cadambi, Matt Moe, Robert Taylor, Ronald Laufer

CS Department

Seth Copen Goldstein, Mihai Budiu

SSS 4/9/99 CMU Reconfigurable Computing 3

Why Study Reconfigurable Hardware?

It is a nice computation paradigm (wire your own computer)

SSS 4/9/99 CMU Reconfigurable Computing 4

Why Study Reconfigurable Hardware

Algorithm            Year  System         Versus               Speedup (x)
DNA matching         1992  SPLASH 2       SPARC 10             4300
FIR Filter           1998  PipeRench      UltraSparc 300 MHz   90
IDEA Encryption      1998  PipeRench      UltraSparc 300 MHz   61
SAT solver           1997  Pamette        SPARC 5 110 MHz      17 to 1100
Ray Casting          1995  RIPP-10        Pentium 75 MHz       33.8
Hidden Markov Model  1996  1 Xilinx FPGA  SPARC 10             24.4
DES Encryption       1996  GARP           UltraSparc 170 MHz   24
SPEC92               1994  MIPS+RC        MIPS                 1.22

SSS 4/9/99 CMU Reconfigurable Computing 5

Commercial Players

Source: In-stat, April 1998

* Does not include software, hardwire or support EPROMs

SSS 4/9/99 CMU Reconfigurable Computing 6

What Is “Reconfigurable Hardware?”

• Universal gates and/or storage elements

• Interconnection network

• Switches

SSS 4/9/99 CMU Reconfigurable Computing 7

Basic Ingredient: RAM cell

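Universal gate = RAM

[Figure: a RAM with address inputs a0 and a1 and stored contents 0001; its data output is the AND of the two inputs (labelled a1 & a2 on the slide)]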
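
To make the "universal gate = RAM" idea concrete, here is a minimal sketch in Python. It is an illustration only: the function name `make_lut` and the address bit order `(a1 << 1) | a0` are assumptions, not something the slides specify. The stored bits select which logic function the gate computes.

```python
def make_lut(contents):
    """Return a 2-input gate backed by a 4-entry, 1-bit-wide RAM.

    contents[i] is the bit stored at address i. The address is formed
    from the inputs as (a1 << 1) | a0, which is an assumed bit order.
    """
    if len(contents) != 4 or any(bit not in (0, 1) for bit in contents):
        raise ValueError("contents must be four bits")
    ram = list(contents)

    def gate(a0, a1):
        # Reading the RAM at the address given by the inputs is the gate.
        return ram[(a1 << 1) | a0]

    return gate


# Contents 0001 (addresses 0..3): only address 3 holds a 1, so this is AND.
and_gate = make_lut([0, 0, 0, 1])
# Loading different contents into the same structure gives XOR instead.
xor_gate = make_lut([0, 1, 1, 0])

and_table = [and_gate(a0, a1) for a1 in (0, 1) for a0 in (0, 1)]
xor_table = [xor_gate(a0, a1) for a1 in (0, 1) for a0 in (0, 1)]
```

Rewriting the four stored bits is all it takes to turn one gate into another, which is the sense in which the RAM cell is "universal".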

SSS 4/9/99 CMU Reconfigurable Computing 8

Basic Ingredients (ctd)

A switch is controlled by a 1-bit RAM cell

[Figure: switches with configuration bits 0, 1, 1, 1]
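
A programmable switch can be sketched the same way: one stored bit per switch decides whether a connection exists. The wire names and the `route` helper below are hypothetical, and the bit pattern 0, 1, 1, 1 is reused from the slide purely as sample data.

```python
def route(source_values, switches, config_bits):
    """Propagate values through a set of programmable switches.

    source_values: dict mapping a wire name to the bit driven onto it.
    switches: list of (source_wire, destination_wire) pairs.
    config_bits: one bit per switch; 1 closes the switch, 0 leaves it open.
    Returns a dict holding only the destination wires that got connected.
    """
    if len(switches) != len(config_bits):
        raise ValueError("need exactly one configuration bit per switch")
    outputs = {}
    for (src, dst), bit in zip(switches, config_bits):
        if bit == 1:
            outputs[dst] = source_values[src]
    return outputs


# Hypothetical wiring: four switches fed by two input wires.
switches = [("in0", "out0"), ("in0", "out1"), ("in1", "out2"), ("in1", "out3")]
values = {"in0": 1, "in1": 0}
connected = route(values, switches, [0, 1, 1, 1])
```

With the first bit set to 0, `out0` stays unconnected while the other three outputs follow their sources.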

SSS 4/9/99 CMU Reconfigurable Computing 9

Outline

• What is reconfigurable hardware

• RH vs other computation paradigms

• Challenges in RH research

• PipeRench: the CMU project:
– the hardware
– the software

• Conclusions

SSS 4/9/99 CMU Reconfigurable Computing 10

RH vs ASICs

• Generally Application-Specific Integrated Circuits will be faster than RH:
– RH wires are slow & big
– RH bit-slices are costly to interconnect
– RH devices must store configuration on the chip

but

• RH can be reprogrammed
– new algorithms
– to fix bugs

• RH cheaper in small production

• RH tolerates faults better

• RH sometimes faster with staged computation

SSS 4/9/99 CMU Reconfigurable Computing 11

RH vs Microprocessors

• RH less flexible (like a VLIW with fixed instructions)

but

• RH provides more (customized) computation elements

• RH can decrease memory traffic

• RH can be tailored for specific algorithms and data types

RH will not replace µPs, but complement them

SSS 4/9/99 CMU Reconfigurable Computing 12

Types of RH

• FPGAs: bit-level logic functionality (the basic processing elements compute on 1 bit)

• word-based architectures: PipeRench (CMU) (basic PE operates on 8 bits) (basic PE is a small ALU)

• coarse architectures: RAW (MIT) (basic PE is a MIPS 2000 core)

SSS 4/9/99 CMU Reconfigurable Computing 13

RH In A System

[Figure: coupling (EPS image not preserved)]

SSS 4/9/99 CMU Reconfigurable Computing 14

Challenges In RC

• Software tools:
– Programming RC like software development
– Automatic compilation from HLL
– Automatic program partitioning

• Mapping algorithms efficiently (no ISA)

• System issues
– interfaces
– find “ideal” RC fabric

SSS 4/9/99 CMU Reconfigurable Computing 15

The CMU Reconfigurable Computing Project

SSS 4/9/99 CMU Reconfigurable Computing 16

Hardware Goals

• To build a complete reconfigurable hardware device

• To build the system integration hardware

• To host the device in a PC

SSS 4/9/99 CMU Reconfigurable Computing 17

Our Device:

• Word processing elements

• Pipelined architecture

• Virtualized hardware

• Local interconnection network

• Wide pipelined bus

SSS 4/9/99 CMU Reconfigurable Computing 18

[Figure labels: Configuration memory; Stripes; Data & Config controller; Processing elements]

SSS 4/9/99 CMU Reconfigurable Computing 19

Hardware Virtualization

[Figure labels: Program; Instructions currently in hardware; Instructions paged out; Actual available hardware]

SSS 4/9/99 CMU Reconfigurable Computing 20

Hardware Virtualization (2)

[Figure labels: Program in configuration memory; hardware; configure; compute (three times); Page in; Page out]

Overlap configuration with computation.
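
The overlap of configuration and computation can be illustrated with a toy schedule. This is a sketch under stated assumptions, not the project's actual controller: it assumes exactly one stripe is paged in per cycle, round-robin, and the function name and parameters are made up for the example.

```python
def virtualization_schedule(num_virtual, num_physical, num_cycles):
    """Return, for each cycle, (stripe_being_configured, stripes_computing).

    Assumed policy: one virtual stripe is paged in per cycle into physical
    slot (cycle % num_physical), replacing (paging out) whatever was there.
    Stripes already resident in the other slots keep computing meanwhile.
    """
    resident = [None] * num_physical  # virtual stripe held by each slot
    schedule = []
    for cycle in range(num_cycles):
        slot = cycle % num_physical
        incoming = cycle % num_virtual
        resident[slot] = incoming  # page in; the old occupant is paged out
        computing = [v for s, v in enumerate(resident)
                     if v is not None and s != slot]
        schedule.append((incoming, computing))
    return schedule


# A 5-stripe program running on hardware that holds only 3 stripes.
schedule = virtualization_schedule(num_virtual=5, num_physical=3, num_cycles=6)
```

After the first few cycles, every cycle has one stripe being configured while the other `num_physical - 1` stripes compute, so under this model the program does not need to fit in the hardware all at once.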

SSS 4/9/99 CMU Reconfigurable Computing 21

Processing Elements

• Look-up table

• Any 3-to-1 function

[Figure labels: inputs a, b, Cin; output out; PE2, PE1, PE0]
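
As an illustration of what a 3-input look-up table with a carry input can do, the sketch below chains bit positions into a ripple-carry adder. Using a separate table for the carry, and the names `lut3`, `SUM`, `CARRY`, and `ripple_add`, are assumptions made for the example; the slides do not say how the real PE produces its carry.

```python
def lut3(truth_table):
    """3-input look-up table indexed by (a << 2) | (b << 1) | cin."""
    if len(truth_table) != 8:
        raise ValueError("a 3-input LUT needs 8 entries")
    return lambda a, b, cin: truth_table[(a << 2) | (b << 1) | cin]


# Two of the 256 possible 3-to-1 functions (index bits are a, b, cin).
SUM = lut3([0, 1, 1, 0, 1, 0, 0, 1])    # a XOR b XOR cin
CARRY = lut3([0, 0, 0, 1, 0, 1, 1, 1])  # majority(a, b, cin)


def ripple_add(x, y, width):
    """Add two width-bit numbers, one SUM/CARRY pair per bit position."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= SUM(a, b, carry) << i
        carry = CARRY(a, b, carry)  # carry ripples to the next bit position
    return result, carry
```

For example, `ripple_add(5, 3, 3)` returns `(0, 1)`: the 3-bit sum wraps to 0 and the carry out is 1.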

SSS 4/9/99 CMU Reconfigurable Computing 22

The Interconnection Network

[Figure labels: Word-level cross-bar; Pass Registers; PE, PE 1, PE N; 0; B bits; P*B bits; P*B*N bits]

SSS 4/9/99 CMU Reconfigurable Computing 23

The PCI Board

[Figure: chip.eps (EPS image not preserved)]

SSS 4/9/99 CMU Reconfigurable Computing 25

Software Goal

To program reconfigurable devices using the standard software development processes:

– Compile C or Java
– Do it quickly

[Figure labels: Java; Partitioner; CPU; DIL (Data-flow Intermediate Language); Configuration; Reconfigurable HW; Built]

SSS 4/9/99 CMU Reconfigurable Computing 26

Building Circuits From DIL

a = b + c * d;

e = c - d;

• variables → wires

• operators → gates

[Figure: data-flow circuit with inputs b, c, d; operators *, +, -; outputs a and e]
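
The "variables become wires, operators become gates" rule can be sketched in a few lines. This example borrows Python's own `ast` parser as a stand-in for a DIL parser, which is an assumption made for convenience (the two statements on the slide happen to be valid Python too); `build_circuit` and the gate names `g0`, `g1`, ... are hypothetical.

```python
import ast


def build_circuit(source):
    """Turn assignments like 'a = b + c * d' into gates and wires.

    Returns (gates, wires): gates is a list of (gate_id, operator) and
    wires is a list of (driver, sink), where a sink is a gate id or the
    name of an assigned variable.
    """
    op_names = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
    gates, wires = [], []

    def visit(node):
        # A variable is just a named wire; an operator becomes a gate.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in op_names:
            gate_id = f"g{len(gates)}"
            gates.append((gate_id, op_names[type(node.op)]))
            for operand in (node.left, node.right):
                wires.append((visit(operand), gate_id))
            return gate_id
        raise ValueError("unsupported expression")

    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError("expected simple assignments")
        wires.append((visit(stmt.value), stmt.targets[0].id))
    return gates, wires


gates, wires = build_circuit("a = b + c * d; e = c - d")
```

For the two statements above this yields three gates (`+`, `*`, `-`) and eight wires, with `c` and `d` each fanning out to both the multiplier and the subtractor.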

SSS 4/9/99 CMU Reconfigurable Computing 27

Mapping Circuits To

[Figure: four panels, each showing a circuit with operators - and + over inputs a, b, c]

SSS 4/9/99 CMU Reconfigurable Computing 28

The DIL Compiler Front-End

[Figure labels: Dil input file; Parser; Evaluator; Loader (twice); Circuit; component library; Component circuits; Backend]

SSS 4/9/99 CMU Reconfigurable Computing 29

The DIL Compiler Backend

[Figure labels: Front-end; Circuit (expanded); Optimizer; Placer-Router; Circuit; Circuit (placed); Code generator; Asm; C++; C++; xfig]

The whole compilation process is very fast (compared to classical CAD tools).

We can compile two orders of magnitude faster.

SSS 4/9/99 CMU Reconfigurable Computing 30

Processing Element Size Tradeoffs

Small                    Big
Efficient usage          Wasteful
Slower                   Faster bit-slice
Flexible interconnect    Coarse routing
Bigger configuration     Fewer configuration bits
Place and route easier   Constrains the compiler

SSS 4/9/99 CMU Reconfigurable Computing 31

Stripe Width Tradeoffs

Wider              Narrower
Fewer stripes      More will fit
Virtualize more    Fewer page-ins
Bandwidth waste    Less bandwidth available
Placer freedom     Placement constrained

SSS 4/9/99 CMU Reconfigurable Computing 32

Bus Width Tradeoffs

Wider             Narrower
More area         Less area
High bandwidth    Time-mux bus

SSS 4/9/99 CMU Reconfigurable Computing 33

Clock Speed Tradeoffs (run-time)

Faster                    Slower
Short critical path       Big chains
Long pipeline built       Compact circuits
Decomposition overhead    Little decomposition
Virtualized more          Less virtualized

[Figure: a 24-bit adder next to the same addition built from 8-bit adders]
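
The decomposition tradeoff can be illustrated by splitting a 24-bit addition into three 8-bit stages. This is only a sketch of the idea: the helper names are made up, and the real compiler's decomposition rules are not described in the slides.

```python
def add8(a, b, carry_in):
    """One 8-bit adder stage: returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add24_decomposed(x, y):
    """24-bit addition built from three chained 8-bit stages.

    Each loop iteration stands for one pipeline stage; the carry is the
    only value handed from one stage to the next.
    """
    result, carry = 0, 0
    for stage in range(3):
        shift = 8 * stage
        part, carry = add8((x >> shift) & 0xFF, (y >> shift) & 0xFF, carry)
        result |= part << shift
    return result, carry
```

Each stage has a short carry chain, which suits a fast clock, but the circuit now occupies three stages instead of one. That is the "decomposition overhead" and "long pipeline" side of the table above.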

SSS 4/9/99 CMU Reconfigurable Computing 34

Configuration Bits per Stripe

[Figure: chart of Configuration Bits (y axis, 0 to 1600) against Stripe Width (x axis, 64 to 144), one series per PE bit width: 2, 4, 8, 16, 32]

SSS 4/9/99 CMU Reconfigurable Computing 35

[Figure: fir-throughput.eps (EPS image not preserved)]

SSS 4/9/99 CMU Reconfigurable Computing 36

Project Status

• Operational:
– Behavioral and structural models of PipeRench in Verilog
– Assembler, simulator
– Tools for visualization and debugging
– One tile fabricated and tested
– Very fast compiler from intermediate language

• In work:
– Prototype PipeRench to be taped out this summer
– PCI board to host PipeRench in a PC

SSS 4/9/99 CMU Reconfigurable Computing 37

Simulated Speed-up vs. UltraSparc @ 300Mhz

[Figure: bar chart on a logarithmic axis (1.0 to 1000.0). Applications: ATR, Cordic, DCT, FIR, IDEA, Nqueens, Over. Bar labels: 328.8, 29.0, 20.6, 90.9, 61.8, 26.0, 76.1]

SSS 4/9/99 CMU Reconfigurable Computing 38

Future Work

• Build the PCI board

• Build the OS device drivers

• Start investigating HLL issues:
– automatic partitioning
– translation to DIL
– special code transformations

SSS 4/9/99 CMU Reconfigurable Computing 39

Conclusions

• A set of important applications can benefit from RC devices

• RC offers potential for substantial performance improvement at a low cost

• RC devices will soon be mainstream in the embedded computing world; perhaps in the future they will also permeate the desktop Pentium V
