Implementing Algorithms in FPGA-Based Reconfigurable Computers Using C-Based Synthesis

Doug Johnson, Technical Marketing Manager

NCSA/OSC Reconfigurable Systems Summer Institute

Urbana, Illinois, July 11-13 2005

Celoxica

UK-based system design company

Provider of design tools, IP & services for digital imaging & signal processing
- Image processing
- Video processing
- Sonar/radar signal processing
- Biometrics
- Massively parallel data mining and matching

Complete solutions for Electronic System Level (ESL) design
- System/algorithm acceleration
- Co-design partitioning
- Co-simulation & co-verification (C/C++/SystemC/Handel-C/Matlab/VHDL/Verilog)
- Hardware compilation & C synthesis to reconfigurable architectures

Consulting and professional services
- Systems analysis and design strategy
- System implementation capability


Presentation Objectives

Prerequisites
- Motivations for using FPGAs in RC and HPC
- HPC and RC FPGA systems hardware and infrastructure

Objectives
- HPC algorithms and considerations for Reconfigurable Computing (RC)
- Share a perspective on the state of the art for C-based HW design
- Describe the C to FPGA flow
- Illustrate with code examples
- … Look forward to some critical debate…


Agenda

Reconfigurable Computing
- Considerations, core algorithm relationships, commercial applications

C-based design
- The solution space (its place in EDA, Electronic Design Automation)
- Nature of C for HW design

The Design Flow

Summary

JPEG2000 Design Example


Reconfigurable Computing (RC)

“RC = Using FPGAs for (algorithmic) computation”

1. Embedded: well established, with a body of knowledge and experience
2. Enterprise: some
3. HPC: starting out


Promised Opportunities
- Algorithm acceleration: exploit parallelism to increase performance with custom HW implementation
- Algorithm offload: free CPU resource by offloading bottleneck processes

BIG Challenges
- Development complexity: design framework and methods, deployment and integration/middleware
- Coupling to coprocessor/data bandwidth
- Price/Performance/Power!
- Choosing the right applications!

[Figure: Reconfigurable Computing timeline, axis 1980, 1990, 2000, 20X0?. Milestones shown: FPGAs; First RC Successes; Commercial C-to-FPGA tools; Closely Coupled Systems and Partitioning Frameworks; Intimately Coupled Systems and Advanced Compilers]


FPGA Computing and Methodology

High Performance Embedded and Reconfigurable Computing

Why FPGA Computing?
- Moore’s Law showing signs of strain
- Ability to parallelize in HW
- Price/GOPS coming down rapidly
- Hard IP blocks: excellent density

Example: Floating Point Performance
- Maximum for Virtex-4: 50 GFLOPS (courtesy of Dave Bennett, Xilinx Labs)
- Maximum for Virtex-2: 17.5 GFLOPS (courtesy of Dave Bennett, Xilinx Labs)
- “Can fit 10’s of FPUs on 2 Xilinx Virtex-4’s” (courtesy of Justin Tripp, LANL)
- Use of hard macros for functions is mandatory (example: DSP48 on Virtex-4)

C-based design for FPGAs
- Several offerings on the commercial marketplace or in research
  - Commercial: Celoxica, Mentor Graphics, Impulse Technologies, Mitrion…
  - Research: Sandia, UC Riverside, LANL
- RTL/HDL is the most widely used way to get to FPGAs but is not usable by SW engineers


Conventional Wisdom for RC

1. Small data objects
   - Data transfer overhead to coprocessor
   - High operation-to-byte ratio

2. Modest arithmetic
   - Difficult to design and implement complex algorithms in HW
   - Integer/fixed precision calculations
   - Floating point too resource expensive

3. Data-parallelism
   - Parallelism essential: FPGA clocks are an order of magnitude slower than CPUs
   - Fine grain: wide data widths
   - Medium grain: operation/function routine
   - Coarse grain: multiple instantiations of application processes

4. Pipeline-ability
   - Streaming applications: most successful

5. Simple control
   - Difficult to design complex scheduling schemes in parallel HW

Slide annotations (2005): High Density Devices; C-based design; Closely coupled systems; Essential; Fewer Issues with Latency in HPC; Soft Cores/C-based design
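The first and third points above (transfer overhead versus operation-to-byte ratio, and the need for parallelism to offset slower clocks) can be turned into a rough back-of-the-envelope model. The sketch below is illustrative only: the structure, the function name and every rate are hypothetical and do not come from the slides.

```c
#include <assert.h>

/* Hypothetical platform description; none of these figures are from the talk. */
typedef struct {
    double cpu_ops_per_s;     /* sustained CPU operation rate */
    double fpga_ops_per_s;    /* sustained FPGA rate after parallelization */
    double link_bytes_per_s;  /* host <-> coprocessor bandwidth */
} rc_platform;

/*
 * Estimated speedup from offloading a kernel that performs `ops` operations
 * on `bytes` of data that must cross the link. Compute and transfer are
 * assumed not to overlap, which is pessimistic for streaming designs.
 */
double rc_offload_speedup(const rc_platform *p, double ops, double bytes)
{
    assert(p->cpu_ops_per_s > 0.0);
    assert(p->fpga_ops_per_s > 0.0);
    assert(p->link_bytes_per_s > 0.0);

    double t_cpu  = ops / p->cpu_ops_per_s;
    double t_fpga = ops / p->fpga_ops_per_s + bytes / p->link_bytes_per_s;
    return t_cpu / t_fpga;
}
```

With the same operation count, a kernel with a high operation-to-byte ratio keeps most of the FPGA advantage, while one that moves a byte per operation can end up slower than the CPU once the link time is counted.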


Further Considerations

6. Exploiting “Soft” programmable HW
   - Configurable applications: schedule and load HW content prior to HW execution
   - Reconfigurable applications: dynamically change HW content during HW execution

Few compelling examples in HPC


Commercial RC Applications …using C-based design

Well established in embedded systems (slide labels: Defense & Security; Consumer, Automotive & Industrial):

Digital Video Technology and Image Processing
- “PROCESSING AT THE SENSOR” versus local and/or remote processing
- 3D LCD display development and test
- Real-time verification of HDTV image processing algorithms
- Robust image matching: product tracking and production line control

Digital Signal Processing
- Engine control unit for 3-phase motors
- Radar and sonar beamforming and spatial filtering
- Computer aided tomography security system

Communications and Networking
- Internet reconfigurable multimedia terminal, MP3, VoIP etc.
- Ground traffic simulation testbed for broadband satellite network communications
- Satellite based Internet data tracking system

Rapid Systems Prototyping
- Automotive safety system incorporating sensor fusion
- Robotic vision system for object detection and robot guidance


Commercial RC Applications …using C-based design

Enterprise Computing
- Content processing solutions
  - XML parsing, virus checking
  - Packet/pattern matching/filtering
  - Compression/decompression
  - Security/encryption: DES/3-DES, SHA, MD5, AES/Rijndael

High Performance Computing
- Image processing: CT scan analysis, 3D modeling, ray tracing
- Finite element analysis and simulation
- Custom vector engines
- Genome calculations
- Seismic data processing


[Figure: Core Algorithm Relationships in HPC. Source: Rick Stevens - ANL]

Core algorithms and methods shown: CFD, Fourier Methods, n-body, Graph Theoretic, Raster Graphics, Discrete Events, Pattern Matching, Symbolic Processing, Monte Carlo, Transport, PDE, ODE, Fields, Basic Algorithms & Numerical Methods

Application areas shown: Combustion, Structural Mechanics, Multibody Dynamics, Electromagnetics, Geophysical Fluids, Weather and Climate, Aerodynamics, Reservoir Modelling, Ecosystems, CVD, Plasma Processing, Astrophysics, Seismic Processing, Cloud Physics, Chemical Reactors, Boilers, Magnet Design, Economics Models, Phylogenetic Trees, Electrical Grids, Pipeline Flows, Distribution Networks, Biosphere/Geosphere, Neural Networks, Crystallography, Tomographic Reconstruction, MRI Imaging, Diffraction, Inversion Problems, Signal Processing, Condensed Matter Electronic Structure, Rational Drug Design, Biomolecular Dynamics, Nanotechnology, Data Assimilation, Chemical Dynamics, Atomic Scattering, Actinide Chemistry, Fracture Mechanics, Cosmology, Orbital Mechanics, Military Logistics, Manufacturing Systems, Population Genetics, Air Traffic Control, Transportation Systems, Economics, VLSI Design, QCD, Nuclear Structure, Neutron Transport, Virtual Reality, Virtual Prototypes, Computational Steering, Scientific Visualization, Multimedia Collaboration Tools, Genome Processing, Computer Vision, Databases, Data Mining, Cryptography, Intelligent Search, Computer Algebra, Number Theory, Automated Deduction, Intelligent Agents, CAD, Molecular Modeling, Electronic Structure, Quantum Chemistry, Flow in Porous Media, Radiation, Reaction-Diffusion, Multiphase Flow


[Figure: Core Algorithm Relationships in HPC, repeated from the previous slide. Source: Rick Stevens - ANL]

How do we map out the right Apps?


Exploiting FPGA in HPC

Hardware:
- “Enterprise Quality” co-processor system products (Cray XD1, SGI RASC)
- Robust PCI/PCI-X/VME-based FPGA card solutions for development

A software design methodology is essential:
- SW dominated application sector
- Target developers have a SW background
- Register Transfer Level (RTL) and Hardware Description Languages (HDL) are foreign
- Complete designs can be specified in a C environment
- Porting to HW implementations simplified
- Platform abstractions through APIs and libraries
- Simplified specification, development, deployment

How do we select and benchmark?


[Figure: Embedded Hardware (HW) Design solution space, organized by Function, Architecture and Implementation. Design stages: Specification, Algorithm Design, Architecture Exploration, Block Design, RTL, Physical Design. Tool and IP categories: Design Analysis, Interface Synthesis, Custom Processors, Fast Mixed Simulation, HW Accelerated Simulation, Fixed Point extraction, HLL Synthesis, Implementation IP Models, TLM Frameworks, DSP IP, Reconfigurable Prototypes, Emulation Platforms, Implementation IP, RTL Verification. Highlighted for C to FPGA/SoPC: Algorithm Design, Architecture Exploration, C-Based Synthesis, API’s/Libraries, FPGA/SoPC]


C to FPGA Accelerated System

[Figure: C to FPGA design flow, in two stages: Function & Architecture, and Implementation. Function & Architecture elements: Specification Model, Software Model, Testbench Design, HW/SW Partitioning, System Model, Algorithm Design, Architecture Exploration, Design Analysis, Optimization, Mixed Simulation. Software side: C/C++ (AL), API’s/Libraries, BSP, OBJ, Processor. Hardware side: C for HW (CA), C-Based Synthesis, RTL, Synthesis, EDIF, P&R, BSP, FPGA. Also shown: COMMS]


Challenges for C-based synthesis

- Concurrency (parallelism): compiler-determined (behavioral synthesis) or explicit
- Timing constraints: explicit or rules-based
- Data types: annotations, additional, or C++
- Communication: additional or C-like


Two Approaches to C-based Design

Handel-C (C algorithm to FPGA)
- Base: ANSI/ISO C language standard
- Core language: par{…}, seq{…}, interfaces, channels, bit manipulation, RAM & ROM, single cycle assignment
- Data types: bits and bit-vectors, arbitrary width integers, signals
- Core libraries: TLM (PAL/DSM), fixed/floating point …

SystemC (SoC, System-on-a-Chip, prototyping/verification)
- Base: ANSI/ISO C++ language standard
- Core language: modules, ports, processes, events, interfaces, channels, event driven sim kernel
- Data types: 4-valued logic/vectors, bits and bit-vectors, arbitrary width integers, fixed-point, C++ user-defined types
- Primitive channels: signal, timer, mutex, semaphore, FIFO, etc.
- Standard channels for various MOC: Kahn process networks, static dataflow…
- Core libraries: SCV, TLM, master/slave …


System Design Refinement

Function
• C/C++ (AL): system function, coarse-grain parallelism
• Handel-C (CA): parallel algorithm design, fine-grain parallelism, bit/cycle true processes, algorithm testbench

    par{
        processA(…);
        processB(…);
        processC(…);
        processD(…);
    }

    void processD(…){
        unsigned 9 a,b,c;
        par{
            a=1;
            b=2;
        }
        c=3;
    };

Architecture
• Handel-C (CA): add interfaces, signal/cycle accurate test

    void main(){
        interface port_in…
        interface port_out…
        …
    }

EDIF/RTL

[Diagram: processes A, B, C and D shown at each refinement level]
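As a rough illustration of what the processD fragment above expresses, the following plain C model tracks register widths and clock cycles by hand. It assumes the usual Handel-C timing rule that each assignment takes one clock cycle and that the branches of a par block run in the same cycle; the type and function names (process_d_state, process_d_model, add9) are invented for this sketch and are not part of Handel-C.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH9_MASK 0x1FFu  /* models Handel-C "unsigned 9": keep the low 9 bits */

typedef struct {
    uint16_t a, b, c;   /* 9-bit registers held in 16-bit containers */
    unsigned cycles;    /* clock cycles consumed so far */
} process_d_state;

/* Software model of the slide's processD. */
void process_d_model(process_d_state *s)
{
    assert(s != 0);

    /* par { a = 1; b = 2; } : both assignments complete in the same cycle */
    s->a = 1u & WIDTH9_MASK;
    s->b = 2u & WIDTH9_MASK;
    s->cycles += 1;

    /* c = 3; : sequential statement, one further cycle */
    s->c = 3u & WIDTH9_MASK;
    s->cycles += 1;
}

/* Addition on 9-bit values wraps at 512, unlike a native C integer. */
uint16_t add9(uint16_t x, uint16_t y)
{
    return (uint16_t)((x + y) & WIDTH9_MASK);
}
```

In this model processD costs two cycles rather than three, which is the fine-grain parallelism the slide refers to; the masking shows why bit-true simulation matters once widths are no longer 8, 16 or 32 bits.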


Systems Integration

Implementation
• Complete system design
• Interface to pins
• Multi-clock domain
• IP integration

    set clock = external "CLK";
    set reset = external "RST";
    interface Data(…)…
    void main() {
        par{
            processA(…);
            processB(…);
            processC(…);
            processD(…);
        }
    }

    { interface processD(…)… };

    { interface processB(…)… };

[Diagram: processes A, B, C and D with external pins CLK, RST and Data; processes B and D are brought in through interface declarations as external IP (RTL from HDL IP, and EDIF, Electronic Design Interchange Format); output is EDIF/RTL]


Parallel Debug in C environment

Algorithm Design


Resource Usage/Speed Estimations

Architecture Exploration


FPGA Support

Technology mapping

Optimizations


Handel-C Template Multiplier

Pipelined

    set clock = external "clk";

    void main(){
        …
        while(1) par{
            …
            process();
        }
    }

    void process(){
        unsigned W A, B, C;

        while(1) par {
            …
            Multiply(A, B, &C);
            …
        }
    }

    // W-stage shift-and-add multiplier; one pipeline stage per operand bit
    void Multiply(unsigned W A, unsigned W B, unsigned W *C){
        static unsigned W a[W], b[W], c[W];
        par{
            // Stage 0: latch the operands and form the first partial product
            a[0] = A;
            b[0] = B;
            c[0] = a[0][0] == 0 ? 0 : b[0];
            // Stages 1..W-1, replicated in parallel
            par (i = 1; i < W; i++) {
                a[i] = a[i-1] >> 1;
                b[i] = b[i-1] << 1;
                c[i] = c[i-1] + (a[i][0] == 0 ? 0 : b[i]);
            }
            *C = c[W-1];
        }
    }
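The arithmetic performed by the pipelined template can be checked against a small software reference. The sketch below is a plain C, non-pipelined model of the same shift-and-add scheme: one loop iteration stands in for one pipeline stage. The width W = 8 and the function name shift_add_multiply are choices made for this example, since the slide leaves W symbolic.

```c
#include <assert.h>
#include <stdint.h>

#define W 8  /* operand width in bits; an arbitrary choice for this sketch */

/*
 * Shift-and-add multiply truncated to W bits. Iteration i mirrors stage i
 * of the template: test the low bit of the (shifted) multiplier, add the
 * (shifted) multiplicand if it is set, then shift both for the next stage.
 */
uint32_t shift_add_multiply(uint32_t A, uint32_t B)
{
    const uint32_t mask = (1u << W) - 1u;
    uint32_t a = A & mask;
    uint32_t b = B & mask;
    uint32_t c = 0;

    for (int i = 0; i < W; i++) {
        if (a & 1u) {
            c = (c + b) & mask;   /* result register is only W bits wide */
        }
        a >>= 1;
        b = (b << 1) & mask;
    }
    return c;
}
```

The hardware version differs in timing rather than arithmetic: because every stage is a register updated on each clock, a new pair of operands can enter every cycle and each result emerges after the pipeline latency.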


Summary

Commercial C-based design is a reality

For the HPC and RC communities it offers:
- Fastest route to accelerating SW designs in FPGA
- Lower barrier to adoption than RTL technologies
- Greater customization and productivity than block based approaches
- Complete integration with RTL/block based approaches for “Power users”
- Deterministic and quality results

State of the art tools used by embedded systems designers
- RC platforms for rapid prototyping
- Simple migration, development to deployment with full library support


Design Example

JPEG2000 Image Compression Algorithm


Example Design

Five Steps to HW Platform:

1. Specification Model: algorithm profiling
2. Functional System Model: system estimations
3. Architecture and Communication Model: optimization
4. Implementation Model: direct synthesis, C to EDIF
5. HW Platform: board level integration

[Block diagram: JPEG 2000 Compressor. Original Image → Pre-processing → RGB to YUV conversion → DWT → Quantization → Tier-1 Encoder → Tier-2 Encoder → Coded Image, with a Rate Control block]
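To make the "RGB to YUV conversion" block concrete, here is a minimal sketch of the reversible color transform (RCT) defined by JPEG 2000 for lossless coding. The slides do not say which color transform the example design used, so treating it as the RCT is an assumption; the type and function names are invented for this sketch.

```c
#include <assert.h>

/* floor(v / 4) for any int; avoids relying on right-shifting negative values. */
static int floor_div4(int v)
{
    int q = v / 4;
    return (v % 4 != 0 && v < 0) ? q - 1 : q;
}

typedef struct { int r, g, b; } rgb_px;
typedef struct { int y, u, v; } yuv_px;

/* Forward RCT: integer-only, so it maps well to fixed-width hardware. */
yuv_px rct_forward(rgb_px p)
{
    yuv_px o;
    o.y = floor_div4(p.r + 2 * p.g + p.b);
    o.u = p.b - p.g;
    o.v = p.r - p.g;
    return o;
}

/* Inverse RCT: recovers the original RGB values exactly. */
rgb_px rct_inverse(yuv_px p)
{
    rgb_px o;
    o.g = p.y - floor_div4(p.u + p.v);
    o.r = p.v + o.g;
    o.b = p.u + o.g;
    return o;
}
```

Only additions, subtractions and a divide-by-four are needed, which is the kind of modest integer arithmetic the earlier "conventional wisdom" slide describes as a good fit for FPGAs.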


1. Specification Model

[Flow diagram: Function & Architecture stage with Software Model, Specification Model and Testbench Design; C/C++ (AL)]

[Block diagram: JPEG 2000 compressor blocks: Pre-processing, RGB to YUV conversion, DWT, Quantization, Tier-1 Encoder, Tier-2 Encoder, Rate Control; Original Image in, Coded Image out]

[Chart: Memory Usage (x86) in MB, axis 0 to 6; series: Current, Sum]

Algorithm Profiling
- Memory
- Processing Time
- Data Flow

22 *.c and *.h files

1468 lines of code

DWT/Tier1 are the compute intensive blocks
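Since the DWT is singled out as compute intensive, a small reference model helps show what the block computes. The sketch below implements one level of the 1-D reversible 5/3 lifting wavelet used by JPEG 2000 for lossless coding, with mirrored boundaries and an even-length input. Whether the example design used this filter is not stated in the slides, so that is an assumption, and the function names are invented here.

```c
#include <assert.h>

/* floor(a / b) for b > 0, correct for negative a as well. */
static int floor_div(int a, int b)
{
    int q = a / b;
    return (a % b != 0 && a < 0) ? q - 1 : q;
}

/* Forward transform: x[0..n-1] -> low-pass s[0..n/2-1], high-pass d[0..n/2-1]. */
void dwt53_forward(const int *x, int n, int *s, int *d)
{
    assert(n >= 2 && n % 2 == 0);
    int half = n / 2;

    for (int i = 0; i < half; i++) {             /* predict step */
        int left  = x[2 * i];
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2];  /* mirror */
        d[i] = x[2 * i + 1] - floor_div(left + right, 2);
    }
    for (int i = 0; i < half; i++) {             /* update step */
        int dprev = (i > 0) ? d[i - 1] : d[0];   /* mirror */
        s[i] = x[2 * i] + floor_div(dprev + d[i] + 2, 4);
    }
}

/* Inverse transform: undoes the two lifting steps in reverse order. */
void dwt53_inverse(const int *s, const int *d, int n, int *x)
{
    assert(n >= 2 && n % 2 == 0);
    int half = n / 2;

    for (int i = 0; i < half; i++) {
        int dprev = (i > 0) ? d[i - 1] : d[0];
        x[2 * i] = s[i] - floor_div(dprev + d[i] + 2, 4);
    }
    for (int i = 0; i < half; i++) {
        int left  = x[2 * i];
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2];
        x[2 * i + 1] = d[i] + floor_div(left + right, 2);
    }
}
```

Each output sample depends only on a few neighbouring inputs, so the two loops can be turned into a streaming pipeline in hardware; a 2-D transform applies the same steps along rows and then columns.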


2. Functional System Model

[Flow diagram: Function & Architecture stage with Software Model, Specification Model, Testbench Design, HW/SW Partitioning and System Model; C/C++ (AL) and Handel-C (CA)]

[Block diagram: JPEG 2000 compressor blocks: Pre-processing, RGB to YUV conversion, Quantization, Tier-1 Encoder, Tier-2 Encoder, Rate Control, with DWT drawn apart from the other blocks; Original Image in, Coded Image out]

    /* C */
    void sw_block(…){

    }

    /* Handel-C */
    extern "C" sw_block(…);

    void main(void){
        while(1) par{
            sw_block(…);
            hw_block(…);
        }
    }

    void hw_block(…){ … }

Outputs: Cycles/speed/area…


3. Architecture and Communication Model

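[Diagram: Function & Architecture stage. JPEG 2000 compressor blocks (Pre-processing, RGB to YUV conversion, DWT, Quantization, Tier-1 Encoder, Tier-2 Encoder, Rate Control; Original Image in, Coded Image out) split between C/C++ (AL) and Handel-C (CA), connected through two FIFOs]

Communication: DsmPortH2S, DsmRead(…), DsmWrite(…), DsmFlush(…)

Outputs: Dataflow/Cycles/speed/area…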
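The FIFOs in this model decouple the software and hardware sides so that each can run at its own rate. The sketch below is a generic bounded FIFO in plain C that shows the stall-on-full and stall-on-empty behaviour such a channel has. It is not the DSM API: the names fifo_t, fifo_init, fifo_write and fifo_read and the depth are hypothetical, and the signatures of DsmRead, DsmWrite and DsmFlush are not given in the slides.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_DEPTH 16  /* hypothetical depth chosen for this sketch */

typedef struct {
    int    data[FIFO_DEPTH];
    size_t head;   /* index of the next element to read */
    size_t tail;   /* index of the next free slot */
    size_t count;  /* number of elements currently stored */
} fifo_t;

void fifo_init(fifo_t *f)
{
    assert(f != NULL);
    f->head = f->tail = f->count = 0;
}

/* Returns 1 on success, 0 if the FIFO is full (the producer must stall). */
int fifo_write(fifo_t *f, int value)
{
    assert(f != NULL);
    if (f->count == FIFO_DEPTH) return 0;
    f->data[f->tail] = value;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return 1;
}

/* Returns 1 on success, 0 if the FIFO is empty (the consumer must stall). */
int fifo_read(fifo_t *f, int *out)
{
    assert(f != NULL && out != NULL);
    if (f->count == 0) return 0;
    *out = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}
```

Choosing the depth is part of the dataflow analysis named on the slide: too shallow and the faster side stalls often, too deep and on-chip memory is wasted.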


4. Implementation Model

[Flow diagram: Implementation stage. Processes A, B, C and D are output as EDIF or RTL for the selected device family]

    void main(){
        interface port_in…
        interface port_out…
        …
    }


Estimations from Synthesis

DWT ~ 6% VII1000


5. Hardware Platform

[Diagram: Implementation stage. EDIF → P&R → FPGA; platform variants combining a microprocessor (uP), hardware (HW) and RAM; processes A, B, C and D]

Example platforms
• Microblaze + Xilinx FPGA
• Nios + Altera FPGA
• Xilinx V2Pro
• Toshiba MeP + FPGA
• PowerPC + PLB + FPGA
• PC + FPGA PCI Card
• …etc

DWT, from P&R report for VII1000-4

    Slices               758
    Device utilization   7%
    Speed (MHz)          151
    Lines of code        395

Implementation Model Estimations: DWT ~6%

Board Level Integration
- Specific I/O implementations
- Pin location constraints


JPEG2000 DWT Implementation

Example taken from a “Xilinx Design Challenge”

Comparison made with HDL approach

See article in Xcell Volume 46: http://www.xilinx.com/publications/xcellonline/xcell_46/xc_celoxica46.htm

                          C-Based Design                        HDL
                          1st pass    2nd pass    Final
    Slices                646         546         758           800
    Device utilization    6%          5%          7%            7%
    Speed (MHz)           110         130         151           128
    Lines of code         386         386         395           435
    Design time (days)    6           7 (6+1)     7 (6+1)       20*
    Simulation time       5 mins      5 mins      20 mins       +6 hours

* Doesn’t include partitioning spec. development

Note: Lena used as testbench throughout, input bit width 12, max 1K image width

Observations
- Comparable
- Using C faster
- Using C quicker
- Expert vs Novice


JPEG2000 MQ coder Implementation

                                     Celoxica 1st pass   Celoxica Final   HDL
    Slices                           1,347               1,999            620
    Device utilization               12%                 18%              6%
    Speed (MHz)                      89.5                115.5            76
    Lines of code                    310                 330              800
    Design time (days)               10                  12 (10+2)        30*
    Simulation time for Lena jpeg    5 mins              5 mins           Hours

* Doesn’t include partitioning spec. development

Observations
- HDL smaller
- HC faster
- HC quicker
- Expert vs Novice

> A common language base eased porting the MQ coder source to hardware, and DSM allowed partitioning, co-verification, and movement of data between hardware & software

> Optimizations included adding parallelism, replacing for() loops with while() loops, & simplifying loop control.

> Design developed in a unified design environment
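One plausible reading of the for()-to-while() optimization, offered here as an assumption rather than as the authors' stated reasoning: in Handel-C each assignment costs one clock cycle, and a for loop behaves like an initialization followed by a while loop whose increment is a separate statement after the body. Rewriting it as a while loop with the increment placed in parallel with the body removes one cycle per iteration. The plain C functions below only count cycles under that model; their names are invented for this sketch.

```c
#include <assert.h>

/*
 * for (i = 0; i < n; i++) body;
 * modelled as: i = 0; while (i < n) { body; i++; }
 * One cycle for the initialization, then body plus one increment cycle
 * per iteration.
 */
unsigned cycles_for_loop(unsigned n, unsigned body_cycles)
{
    return 1u + n * (body_cycles + 1u);
}

/*
 * i = 0; while (i < n) par { body; i++; }
 * The increment overlaps with the body, so each iteration costs only
 * the body's cycles (the body must take at least one cycle).
 */
unsigned cycles_while_par(unsigned n, unsigned body_cycles)
{
    assert(body_cycles >= 1u);
    return 1u + n * body_cycles;
}
```

For a single-cycle body the rewritten loop takes roughly half the cycles, which is consistent in direction with the speed gain between the first pass and the final design, although the slides do not attribute the gain to any single change.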