Implementing Algorithms in FPGA-Based Reconfigurable Computers Using C-Based Synthesis
E2MATRIX RESEARCH LAB
Opp Phagwara Bus Stand, Backside Axis Bank, Parmar Complex, Phagwara, Punjab (India).
Contact: +91 9041262727
Web: www.e2matrix.com
Prerequisites
  Motivations for using FPGAs in RC and HPC
  HPC and RC FPGA systems hardware and infrastructure
Objectives
  HPC algorithms and considerations for Reconfigurable Computing (RC)
  Share a perspective on the state of the art for C-based HW design
  Describe the C to FPGA flow
  Illustrate with code examples …
  Look forward to some critical debate…
The solution space (its place in EDA)
Nature of C for HW design
The Design Flow
Summary
Reconfigurable Computing
“RC = Using FPGAs for (algorithmic) computation”
1. Embedded: Well established – body of knowledge/experience
2. Enterprise: Some
3. HPC: Starting Out
Promised Opportunities
  Algorithm Acceleration
    Exploit parallelism to increase performance with custom HW implementation
  Algorithm Offload
    Free CPU resource by offloading bottleneck processes
BIG Challenges
  Development complexity
    Design framework and methods, deployment and integration/middleware
  Coupling to coprocessor/data bandwidth
  Price/Performance/Power!
  Choosing the right applications!
High Performance Embedded and Reconfigurable Computing
Why FPGA Computing?
  Moore’s Law showing signs of strain
  Ability to parallelize in HW
  Price/GOPS coming down rapidly
  Hard IP blocks – excellent density
Example: Floating Point Performance
  Maximum for Virtex-4 – 50 GFLOPS (Courtesy of Dave Bennett, Xilinx Labs)
  Maximum for Virtex-2 – 17.5 GFLOPS (Courtesy of Dave Bennett, Xilinx Labs)
  “Can fit 10’s of FPUs on 2 Xilinx Virtex-4’s” (Courtesy of Justin Tripp, LANL)
  Use of hard macros for functions is mandatory (example DSP48 on Virtex-4)
C-based design for FPGAs
  Several offerings on commercial marketplace or in research
  RTL/HDL is the most widely used way to get to FPGAs but is not usable by SW engineers
Conventional Wisdom for RC
1. Small data objects
     Data transfer overhead to coprocessor
     High operation to byte ratio
2. Modest arithmetic
     Difficult to design and implement complex algorithms in HW
     Integer/fixed precision calculations
     Floating point too resource expensive
3. Data-parallelism
     Parallelism essential - FPGA clocks order of magnitude slower than CPUs
     Fine grain - wide data widths
     Medium grain - operation/function routine
     Coarse grain - multiple instantiations of application processes
4. Pipeline-ability
     Streaming Applications – most successful
5. Simple Control
     Difficult to design complex scheduling schemes in Parallel HW
Further Considerations
6. Exploiting “Soft” programmable HW
     Configurable Applications
       Schedule and load HW content prior to HW execution
     Reconfigurable Applications
       Dynamically change HW content during HW execution
Commercial RC Applications
Well established in embedded systems:
  Digital Video Technology and Image Processing
    “PROCESSING AT THE SENSOR” versus local and/or remote processing
    3D LCD display development and test
    Real-time verification of HDTV image processing algorithms
    Robust image matching - product tracking and production line control
  Digital Signal Processing
    Engine control unit for 3-phase motors
    Radar and sonar beamforming and spatial filtering
    Computer aided tomography security system
  Communications and Networking
    Internet reconfigurable multimedia terminal, MP3, VoIP etc.
    Ground traffic simulation testbed for broadband satellite network communications
    Satellite based Internet data tracking system
  Rapid Systems Prototyping
    Automotive safety system incorporating sensor fusion
    Robotic vision system for object detection and robot guidance
C-based design
  The solution space (its place in EDA)
  Nature of C for HW design
  The Design Flow
  Summary
  JPEG2000 Design Example
Summary
  Commercial C-based design is a reality
  For the HPC and RC communities it offers:
    Fastest route to accelerating SW designs in FPGA
    Lower barrier to adoption than RTL technologies
    Greater customization and productivity than block based approaches
    Complete integration with RTL/block based approaches for “Power users”
  Deterministic and quality results
    State of the art tools used by embedded systems designers
  RC platforms for rapid prototyping
    Simple migration, development to deployment with full library support
JPEG2000 MQ coder Implementation

                                 Celoxica 1st Pass   Celoxica Final   HDL
  Slices                         1,347               1,999            620
  Device utilization             12%                 18%              6%
  Speed (MHz)                    89.5                115.5            76
  Lines of code                  310                 330              800
  Design time (days)             10                  12 (10+2)        30*
  Simulation time for Lena jpeg  5 mins              5 mins           Hours

  * Doesn’t include partitioning spec. development

Observations
  HDL Smaller
  HC Faster
  HC Quicker
  Expert vs Novice
> A common language base eased porting the MQ coder source to hardware, and DSM allowed partitioning, co-verification, and moving data between hardware and software
> Optimizations included adding parallelism, replacing for() loops with while() loops, & simplifying loop control.
> Design developed in a unified design environment