C is for Circuits: Capturing FPGA Circuits as Sequential Code for Portability. Scott Sirowy*, Greg Stitt‡, Frank Vahid*†. This work was supported in part by the National Science Foundation and the Semiconductor Research Corporation. *Department of Computer Science and Engineering, University of California, Riverside, {ssirowy,vahid}@cs.ucr.edu. †Also with the Center for Embedded Computer Systems at UC Irvine. ‡Department of Electrical and Computer Engineering, University of Florida, [email protected]

Dec 27, 2015


C is for Circuits: Capturing FPGA Circuits as Sequential Code for Portability

Scott Sirowy*, Greg Stitt‡, Frank Vahid*†

This work was supported in part by the National Science Foundation and the Semiconductor Research Corporation

*Department of Computer Science and Engineering, University of California, Riverside
{ssirowy,vahid}@cs.ucr.edu

†Also with the Center for Embedded Computer Systems at UC Irvine

‡Department of Electrical and Computer Engineering, University of Florida
[email protected]


“C is for Circuits” vs. High Level Synthesis

Designer captures spatial algorithm as custom circuit

[Figure: merge-sort circuit in which N unsorted values pass through alternating Split and Merge stages that produce sorted runs of length 1, 2, and 4.]

Designer captures application with temporal algorithm

quicksort(array, left, right) {
    if right > left:
        pivot = array[left]
        newpivot = partition(array, left, right, pivot)
        quicksort(array, left, newpivot - 1)
        quicksort(array, newpivot + 1, right)
}

Synthesis → ?
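For reference, the temporal quicksort pseudocode above could be written in plain C roughly as follows. This is a minimal sketch: the slide does not show `partition`, so the helper below (which keeps the slide's convention of taking the pivot from `array[left]`) and the `swap_int` utility are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two ints in place. */
static void swap_int(int *a, int *b) {
    int t = *a;
    *a = *b;
    *b = t;
}

/* Hypothetical partition helper (not shown on the slide).
 * Moves every element smaller than `pivot` to the front of
 * array[left..right], places the pivot (initially at array[left])
 * between the two parts, and returns the pivot's final index. */
static int partition(int *array, int left, int right, int pivot) {
    int store = left;  /* rightmost slot of the "smaller than pivot" part */
    for (int i = left + 1; i <= right; i++) {
        if (array[i] < pivot) {
            store++;
            swap_int(&array[store], &array[i]);
        }
    }
    swap_int(&array[left], &array[store]);  /* pivot reaches its sorted slot */
    return store;
}

/* Temporal (recursive) quicksort, following the slide's pseudocode. */
void quicksort(int *array, int left, int right) {
    assert(array != NULL);
    if (right > left) {
        int pivot = array[left];
        int newpivot = partition(array, left, right, pivot);
        quicksort(array, left, newpivot - 1);
        quicksort(array, newpivot + 1, right);
    }
}
```

Calling `quicksort(a, 0, n - 1)` sorts `a` in place. The recursion and the data-dependent partition boundaries give this form no fixed stage structure, which is plausibly why the slide marks its synthesis result with a question mark.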


Capture in temporal language

Queue 1_1, 1_2, 2_1, 2_2, 4_s, 4_us;
Split(16_u.dequeue, 16_u.dequeue, 1_1, 1_2);
stage1 = Merge(1_1.dequeue, 1_2.dequeue);
Split(16_u.dequeue, 16_u.dequeue);
stage1 += Merge(1_1.dequeue, 1_2.dequeue);
Split(stage1, 2_1, 2_2);
stage2 = Merge(2_1, 2_2);
Split(16_u.dequeue, 16_u.dequeue);
stage1 = Merge(1_1.dequeue, 1_2.dequeue);
Split(16_u.dequeue, 16_u.dequeue);
stage1 += Merge(1_1.dequeue, 1_2.dequeue);
Split(stage1);
stage2 += Merge(2_1, 2_2);
Split(stage2, 4_1, 4_2);
…

Synthesis
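One way to read the Split/Merge listing above as compilable C is sketched below. It is an illustration only, not the authors' code: the fixed size `N`, the `merge_runs` helper, and the stage loop are assumptions. The two outer loops have bounds fixed at compile time, so a synthesis tool that unrolls loops could in principle map each pass of the outer loop to one Merge stage of the circuit.

```c
#include <assert.h>
#include <string.h>

#define N 16  /* hypothetical fixed input size, matching the 16-element figure */

/* Merge two adjacent sorted runs, src[lo..lo+len-1] and
 * src[lo+len..lo+2*len-1], into dst[lo..lo+2*len-1].
 * Plays the role of one Merge block. */
static void merge_runs(const int *src, int *dst, int lo, int len) {
    int i = lo, j = lo + len, k = lo;
    const int i_end = lo + len, j_end = lo + 2 * len;
    while (i < i_end && j < j_end)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < i_end) dst[k++] = src[i++];
    while (j < j_end) dst[k++] = src[j++];
}

/* Bottom-up merge sort written stage by stage: runs of length 1 are merged
 * into runs of 2, then 4, then 8, then 16, mirroring the circuit's stages. */
void spatial_sort(int data[N]) {
    int buf[N];
    assert(data != NULL);
    for (int len = 1; len < N; len *= 2) {       /* one iteration per stage */
        for (int lo = 0; lo < N; lo += 2 * len)  /* independent merges within a stage */
            merge_runs(data, buf, lo, len);
        memcpy(data, buf, sizeof buf);           /* stage output feeds the next stage */
    }
}
```

Unlike the recursive quicksort, the structure here does not depend on the data: the number of stages and the number of merges per stage are known before the code runs.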


Goal: Portable Circuit Distribution Format

Current circuit distribution method: Bitstreams
    Tightly coupled to a specific device

[Figure: an application is conceptualized and captured as a circuit (a merge-sort network of Split and Merge stages, 16 unsorted values in, 16 sorted values out) and distributed as bitstreams to several FPGA platforms that differ in their processors, memories, and arithmetic resources.]


Goal: Portable Circuit Distribution Format

Current circuit distribution method: RTL
    Good across multiple FPGA devices
    But requires resynthesis/mapping
    May not use FPGA resources most effectively
        Loop unrolling, memory mapping, hard-core use, …

[Figure: the same application, conceptualized and captured as a merge-sort circuit, distributed as VHDL (Entity Circuit port( … ); Architecture of … Begin … End arch;) to several FPGA platforms that differ in their processors, memories, and arithmetic resources.]


Goal: Portable Circuit Distribution Format

Higher abstraction: C code (or any sequential language)
    Can yield more effective resource usage
    Could even run on platforms with no FPGA
    But also requires resynthesis/mapping

[Figure: the same application, conceptualized and captured as a merge-sort circuit, distributed as C code (a skeleton main() containing a while(1) loop) to several FPGA platforms and also to processor-only platforms.]


Problem: Many FPGA Applications Captured “Spatially” as Circuits, not C

Designer captures spatial algorithm as custom circuit for max performance

[Figure: merge-sort circuit in which N unsorted values pass through alternating Split and Merge stages that produce sorted runs of length 1, 2, and 4, shown beside thumbnails of published FCCM papers.]

Circuits in FCCM           Year
3D Vector Normalization    2001
Regular Expression         2001
RC4                        2002
Gaussian Noise Gen.        2003
Molecular Dynamics         2004
Particle Graphics          2005
Shortest Path              2006

70 custom circuits in FCCM ’01-’06 alone


Capturing Circuit Level Designs in C

Designer captures spatial algorithm as custom circuit for max performance

[Figure: merge-sort circuit in which N unsorted values pass through alternating Split and Merge stages that produce sorted runs of length 1, 2, and 4.]

Can designers’ circuits be reverse-engineered to some form of C code?
    C code from which “standard” synthesis tools would re-create the original circuit

[Figure: the circuit is captured as the sequential Queue/Split/Merge listing shown earlier; a Synthesis arrow leads from that listing back to the circuit.]


Previous Work

Convert existing sequential algorithms to circuits
    Diniz, Eles, Frigo, Henkel, Najjar, Srinivasan, Stitt, etc.
Coding guidelines for synthesis
    Stitt, CODES/ISSS 2006
Reverse engineering techniques
    Doom, Hanson et al.
Languages that encapsulate spatial and temporal concepts
    SystemC, StreamsC, etc.


Study Methodology

Chose pseudo-random subset of all applicable FPGA circuit designs from past six years of FCCM (Field Programmable Custom Computing Machines)

Attempted to capture circuit with high level C such that a “standard” synthesis tool would output the original circuit


Study Methodology

[Figure: three-step flow. 1. Capture circuit in C code? 2. “Standard” synthesis, with boxes for CDFG creation, CDFG analysis, Optimizations/Scheduling, Resource Allocation, and VHDL Creation. 3. Compare the result with the published circuit (marked “?”).]

“Standard” HLS tool
    Manually performed
    Optimizations applied in same order for every application
        1. Function Inlining
        2. Loop Unrolling
        3. Predication
        4. Constant Propagation
        5. Dead Code Elimination
        6. Code Hoisting
        7. Pipeline Analysis

Each circuit either
    Re-derivable from C
    Not re-derivable from C

Re-derivable
    Temporal C (the “natural” algorithm)
    Spatial C (reflecting the circuit)

Not re-derivable
    Might still be possible
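To make two of the listed optimizations concrete, the sketch below shows a small loop before and after hand-applied loop unrolling and predication. The function names and the four-element size are hypothetical; the slides state only that the optimizations were performed manually and in a fixed order, so this illustrates what the transformations do rather than how the study applied them.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 4  /* hypothetical fixed trip count */

/* Before: a loop whose body contains a data-dependent branch. */
int sum_positive_loop(const int x[TAPS]) {
    int sum = 0;
    assert(x != NULL);
    for (int i = 0; i < TAPS; i++) {
        if (x[i] > 0)
            sum += x[i];
    }
    return sum;
}

/* After loop unrolling and predication: the loop is replaced by TAPS copies
 * of its body, and each branch becomes a select (a multiplexer in hardware).
 * The result is straight-line code whose four terms are independent. */
int sum_positive_unrolled(const int x[TAPS]) {
    assert(x != NULL);
    int t0 = (x[0] > 0) ? x[0] : 0;
    int t1 = (x[1] > 0) ? x[1] : 0;
    int t2 = (x[2] > 0) ? x[2] : 0;
    int t3 = (x[3] > 0) ? x[3] : 0;
    return (t0 + t1) + (t2 + t3);  /* balanced adder tree */
}
```

Both functions return the same value for every input; the second form simply exposes the parallel structure that a scheduler would otherwise have to discover.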


Gaussian Noise Generator (FCCM 2003, Lee et al.)

[Figure: original circuit, a four-stage pipeline (Stage1 to Stage4). Linear feedback shift registers produce u1 and u2; f(u1), g1(u2), and g2(u2) feed two multipliers that produce x1 and x2; two adders follow.]

[Figure: study flow. Capture circuit in C code, then “standard” synthesis (CDFG creation, CDFG analysis, Optimizations/Scheduling, Resource Allocation, VHDL Creation) using the seven optimizations listed earlier.]


Gaussian Noise Generator (FCCM 2003, Lee et al.)

[Figure: study flow, step 1: capture circuit in C code.]

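#include <math.h>    /* sqrt, log, sin, cos; M_PI is a POSIX extension */
#include <stdlib.h>  /* rand, RAND_MAX */

#define NUM_SAMPLES 1024  /* assumed size; the slide does not define it */

/* Stage result types (assumed; the slide omits their definitions). */
typedef struct { float u1, u2; } Stage1;
typedef struct { float x1, x2; } Stage2;
typedef struct { float x1, x2; } Stage3;

float noise[NUM_SAMPLES];  /* output buffer */

/* Uniform value, nominally in [0, 1); stands in for the circuit's LFSRs. */
static inline float rand0_1(void) { return rand() / ((float) RAND_MAX + 1); }

/* Stage 1: draw two uniform samples. */
static inline Stage1 doStage1(void) {
    Stage1 result;
    result.u1 = rand0_1();
    result.u2 = rand0_1();
    return result;
}

/* Stage 2: evaluate f, g1, g2 and multiply. Note that u1 == 0 makes log(u1) infinite. */
static inline Stage2 doStage2(float u1, float u2) {
    Stage2 result;
    float f_u1, g1_u2, g2_u2;

    f_u1 = sqrt(-log(u1));
    g1_u2 = sin(2 * M_PI * u2);
    g2_u2 = cos(2 * M_PI * u2);
    result.x1 = f_u1 * g1_u2;
    result.x2 = f_u1 * g2_u2;
    return result;
}

/* Stage 3: add each new sample to the previous one, held in a static register. */
static inline Stage3 doStage3(float x1, float x2) {
    static float acc1 = 0.0, acc2 = 0.0;
    Stage3 result;

    result.x1 = acc1 + x1;
    result.x2 = acc2 + x2;
    acc1 = x1;
    acc2 = x2;
    return result;
}

/* Stage 4: store the two results in the output buffer. */
static inline void doStage4(int i, int j, float x1, float x2) {
    noise[i] = x1;
    noise[j] = x2;
}

int main() {
    Stage1 stage1;
    Stage2 stage2;
    Stage3 stage3;
    unsigned int i = 0;

    while (1) {
        stage1 = doStage1();
        stage2 = doStage2(stage1.u1, stage1.u2);
        stage3 = doStage3(stage2.x1, stage2.x2);
        doStage4(i, (i + 1) % NUM_SAMPLES, stage3.x1, stage3.x2);
        i = (i + 2) % NUM_SAMPLES;
    }

    return 1;  /* not reached */
}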

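[Figure: original circuit, a four-stage pipeline (Stage1 to Stage4). Linear feedback shift registers produce u1 and u2; f(u1), g1(u2), and g2(u2) feed two multipliers that produce x1 and x2; two adders follow.]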
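In the C capture above, `rand()` stands in for the circuit's linear feedback shift registers. For readers unfamiliar with LFSRs, a minimal 16-bit Fibonacci LFSR is sketched below. The width, the taps (16, 14, 13, 11), and the function names are a common textbook choice and an assumption here, not the generator used by Lee et al.

```c
#include <assert.h>
#include <stdint.h>

/* Advance a 16-bit Fibonacci LFSR by one step and return the new state.
 * Feedback polynomial x^16 + x^14 + x^13 + x^11 + 1 (assumed for
 * illustration). It is maximal-length, so any nonzero seed cycles through
 * all 65535 nonzero states. */
uint16_t lfsr16_step(uint16_t state) {
    assert(state != 0);  /* the all-zero state would never change */
    uint16_t bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1u;
    return (uint16_t)((state >> 1) | (bit << 15));
}

/* Map a nonzero LFSR state to a float in (0, 1], a possible replacement
 * for rand0_1() that never returns 0. */
float lfsr16_to_unit(uint16_t state) {
    return (float)state / 65535.0f;
}
```

In hardware each step is a shift register plus three XOR gates, which is far cheaper than a library `rand()`; mapping the C call to such a block is a decision left to the synthesis flow.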


Gaussian Noise Generator (FCCM 2003, Lee et al.)

[Figure: study flow, step 2: “standard” synthesis of the C code.]

CDFG Creation/Analysis

[Figure: one dataflow graph per function. doStage1(): two rand() calls produce u1 and u2. doStage2(): f(u1), g1(u2), and g2(u2) feed two multipliers that produce x1 and x2. doStage3(): x1 and x2 are added to acc1 and acc2. doStage4(): x1 and x2 are written to noise[i] and noise[j].]

Scheduling/Resource Allocation

[Figure: the same four functions after allocation. doStage1() is an LFSR producing u1 and u2; doStage2() contains f(u1), g1(u2), g2(u2), and two multipliers; doStage3() contains two adders with registers acc1 and acc2; doStage4() writes x1 and x2 to the noise[] memory through a sel multiplexer.]


Gaussian Noise Generator (FCCM 2003, Lee et al.)

[Figure: the per-function graphs from CDFG Creation/Analysis and Scheduling/Resource Allocation, repeated from the previous slide.]

Circuit from “Standard” Synthesis

[Figure: main() with doStage1() to doStage4() inlined into one circuit: an LFSR, the f(u1), g1(u2), and g2(u2) units, two multipliers, two adders with registers acc1 and acc2, and a sel multiplexer.]


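Gaussian Noise Generator (FCCM 2003, Lee et al.)

Original Circuit

[Figure: four-stage pipeline (Stage1 to Stage4). Linear feedback shift registers produce u1 and u2; f(u1), g1(u2), and g2(u2) feed two multipliers that produce x1 and x2; two adders follow.]

Circuit from “Standard” Synthesis

[Figure: main() with doStage1() to doStage4() inlined into one circuit: an LFSR, the f(u1), g1(u2), and g2(u2) units, two multipliers, two adders with registers acc1 and acc2, and a sel multiplexer.]

If (nearly) same → “Re-derivable from C”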


Results

Year  Design                   Re-derivable from C  Method/Reason
2001  3D Vec. Normalization    Yes                  Spatial, if online algorithms can be specified
2001  Efficient CAM            No                   Uses dynamic FPGA routing
2001  Automated Sensor         Yes                  Temporal, floating point -> fixed point
2001  Regular Expression       Yes                  Spatial, creative connections of one-bit flip flops
2002  Hyperspectral Image      Yes                  Spatial, data reordering
2002  Machine Vision           Yes                  Spatial, custom pipelining
2002  RC4                      Yes                  Temporal, straightforward implementation
2002  Set Covering             Yes                  Spatial, data structures for easy hw implementation
2002  Template Matching        Yes                  Spatial, heavy modifications to original algorithm
2002  Triangle Mesh            Yes                  Spatial, custom encoding scheme
2003  Congruential Sieves      Yes                  Temporal, straightforward translation
2003  Content Scanning         Yes                  Temporal
2003  F.P and Square Root      Yes                  Spatial
2003  Gaussian Noise           Yes                  Spatial, requires the use of spatial C constructs
2003  TRNG                     No                   Requires sampling a high frequency clock for noise
2004  3D FDTD Method           Yes                  Spatial
2004  Deep Packet Filter       No                   Requires knowledge of underlying FPGA
2004  Online Floating Point    No                   Online algorithm, variable length buffers
2004  Molecular Dynamics       Yes                  Spatial
2004  Pattern Matching         Yes                  Spatial
2004  Seismic Migration        Yes                  Spatial
2004  Software Deceleration    No                   Use a uP for its cache
2004  V.M Window               No                   Specific timing schemes implemented
2005  Data Mining              Yes                  Spatial
2005  Cell Automata            Yes                  Temporal
2005  Particle Graphics        Yes                  Spatial
2005  Radiosity                Yes                  Temporal
2005  Transient Waves          Yes                  Spatial
2005  Road Traffic             Yes                  Temporal
2006  All Pairs Shortest Path  Yes                  Spatial
2006  Apriori Data Mining      Yes                  Spatial
2006  Molecular Dynamics       Yes                  Spatial, define separate memories, custom pipeline
2006  Gaussian Elimination     Yes                  Spatial
2006  Radiation Dose           Yes                  Temporal
2006  Random Variates          Yes                  Spatial

82% of the circuit designs were re-derivable from C


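Results: Performance Comparison

[Figure: bar chart of execution time (vertical axis 0 to 5) comparing Custom and Synthesized circuits. Horizontal axis labels as extracted: Float, MD, CLA-E C, Noise, MD`, Traffic, Elimination, Average. Bars are grouped into “Re-derivable from C” and “Not re-derivable from C”.]

Chart annotations:
    Similar or identical performance
    We couldn’t describe in C to re-derive same circuit
    Used separate on-board memories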


Results: Area Comparison

[Figure: bar chart of area (vertical axis 0 to 2) comparing Custom and Synthesized circuits.]

Extra area due to added multiplexors or registers, none of which significantly altered behavior of the circuit


Conclusion

Designers continue to conceptualize/capture some FPGA applications “spatially” as circuits
    Despite the growing number of C-based synthesis tools

For 35 FCCM circuits studied, 82% were re-derivable from some form of C

Distributing a circuit using C code expands the range of target platforms and the longevity of an application
    Compared to a netlist or RTL distribution

Future work
    Using C as part of a standard binary for FPGA


Sponsors

This presentation brought to you by the letters N, S, F

And viewers like you…