Using High-Level Synthesis to Migrate Open Source Software Algorithms to Semiconductor Chip Designs
Umesh Sisodia, CEO, CircuitSutra Technologies
© Accellera Systems Initiative
High-Level Synthesis: Chip Design at a Higher Level of Abstraction
• Higher level of abstraction: segregation of design and implementation
• Design engineers: design decisions
– Abstract description: functionality and macro architecture
– Synthesizable C / C++ / SystemC
• HLS tool: RTL implementation from the abstract description
– Technology-aware microarchitecture: FSMs, datapaths, pipelining, multiplexers, registers, memory, etc.
– Handles the detailed and mechanical RTL implementation tasks on the basis of the technology library and the HLS directives / constraints
[Figure: HLS tool flow. Inputs: input design (C / C++), technology library (cell functions, area, timing, etc.) and directives / constraints (clock, pipelining, latency, memory architecture). Output: RTL (Verilog, VHDL).]
Fast, High-Quality Path to RTL
• 5–10x less code: reduced design effort
• 10–1000x faster simulation: increased productivity
• Multiple implementations from the same input source
– Technology library: FPGA / eFPGA / ASIC / different technology nodes
– Directives / constraints: optimize power / performance / area
• Bridges the hardware and software domains
– Opens the use of FPGAs to embedded software engineers
– Vast pool of free C / C++ tools available to designers
– Existing C / C++ implementations available to start with
HLS is more effective for computation- and algorithm-centric designs
• Audio processing, speech algorithms
• Computer vision, video processing, image processing
• Networking: 5G, Bluetooth
• Artificial intelligence, deep learning
Many software implementations (open source / in-house) are available for the algorithms in these domains.
Can we define a methodology to seamlessly migrate these to semiconductor chips?
Sobel Filter – Taking It through HLS
• Used in image processing and computer vision
• Used for edge detection
• 2D filtering operation
• Generates a 2D map of the gradient: the X and Y gradients of image intensity at each pixel
• Finds the direction of the largest increase from light to dark and the rate of change in that direction
[Figure: X gradient and Y gradient images]
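As a rough illustration of the 2D filtering operation described above, the following C++ sketch computes the X and Y gradients at one pixel from its 3x3 neighborhood, using the standard Sobel kernels. The function name `sobel_at`, the `Gradient` struct and the image dimensions are assumptions made for this sketch; they are not taken from the referenced open source implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical image size for this sketch; fixed at compile time.
constexpr int WIDTH  = 8;
constexpr int HEIGHT = 8;

struct Gradient {
    int gx;  // horizontal (X) gradient
    int gy;  // vertical (Y) gradient
};

// Standard 3x3 Sobel kernels.
constexpr int KX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
constexpr int KY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

// Compute both gradients at an interior pixel (row, col).
Gradient sobel_at(const uint8_t img[HEIGHT][WIDTH], int row, int col) {
    assert(row >= 1 && row < HEIGHT - 1 && col >= 1 && col < WIDTH - 1);
    Gradient g = {0, 0};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            const int p = img[row + i - 1][col + j - 1];
            g.gx += KX[i][j] * p;
            g.gy += KY[i][j] * p;
        }
    }
    return g;
}
```

On an image with a vertical dark-to-light edge, `gx` is large and `gy` is zero at the edge, which matches the X gradient / Y gradient split shown in the figure.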
We modified the C code to make it compliant with the synthesizable subset, then used Mentor Catapult to synthesize the C code and generate Verilog.
Open source implementation: https://github.com/petermlm/SobelFilter (License: Apache Version 2.0)
Contributors: Pedro Melgueira, Alessandro Capotondi
• Top-level design: C function / SystemC module
– The design should have only one top-level C function
– A C++ class should be instantiated in the top-level C function
• Functions
– C functions synthesize into RTL blocks
– C function arguments synthesize into RTL I/O ports
• Arrays
– Arrays in C code synthesize to memory: RAM / ROM / FIFO
– Arrays at the top-level interface synthesize to ports that access external memory
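To make the mapping above concrete, here is a minimal sketch of a single top-level function in plain C++. Under the rules listed above, an HLS tool would be expected to turn `scale_frame` into an RTL block, its scalar argument into an input port and its array arguments into memory interfaces. The function names, the array size `N` and the fixed-point gain format are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr int N = 16;  // hypothetical array size, fixed at compile time

// Helper function: would typically become a sub-block or be inlined.
static uint8_t saturate(int v) {
    return static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Single top-level function of the design.
// in / out : arrays at the interface (candidate memory ports)
// gain     : scalar argument (candidate input port), in units of 1/16
void scale_frame(const uint8_t in[N], uint8_t out[N], uint8_t gain) {
    for (int i = 0; i < N; ++i) {
        out[i] = saturate((in[i] * gain) >> 4);
    }
}
```

The loop bound is a compile-time constant, so the tool can determine the trip count and the memory depth without running the code.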
• The HLS tool parses the code to extract the design
– Functionality should be extractable at compile time
– Constructs must be unambiguous
– C constructs must be of fixed or bounded size
• Not supported
– Memory allocation: malloc, free, new, delete
– OS system calls: file read / write, time, date, etc.
– Function pointers, recursive functions, etc.
– Math library, STL classes, other utility libraries
– Non-const global variables
– Non-const static data members
• Avoid uninitialized variables: they may lead to inconsistent behavior between C and RTL
• Dynamic memory allocation => bounded arrays
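The replacement of dynamic allocation by bounded arrays can be sketched as follows. The software-style version appears only as a comment; the refined version uses a worst-case bound `MAX_LEN` and a clamped runtime length, and initializes every variable explicitly. The names `MAX_LEN` and `sum_of_squares` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr int MAX_LEN = 64;  // hypothetical worst-case size, known at compile time

// Software style (not synthesizable), shown only for comparison:
//   int *buf = (int *)malloc(len * sizeof(int));
//   ... use buf[0 .. len-1] ...
//   free(buf);

// Refined style: a bounded local array plus a runtime length that is
// clamped to the bound, so the loop has a known maximum trip count.
int32_t sum_of_squares(const int16_t data[MAX_LEN], int len) {
    if (len < 0) len = 0;
    if (len > MAX_LEN) len = MAX_LEN;
    int32_t buf[MAX_LEN] = {0};  // bounded storage, explicitly initialized
    int32_t acc = 0;             // explicitly initialized accumulator
    for (int i = 0; i < MAX_LEN; ++i) {
        if (i < len) {
            buf[i] = data[i] * data[i];
            acc += buf[i];
        }
    }
    return acc;
}
```

Iterating to `MAX_LEN` and guarding the body with `i < len` is one common way to keep the trip count fixed; a loop bounded by `len` with a declared maximum is another option, depending on the tool.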
• Datatypes: impact the precision, area and performance
– A 32-bit integer can be avoided if a 10-bit integer is sufficient
– Use bit-accurate datatypes instead of the standard C / C++ datatypes
• Bit-accurate datatype libraries
– SystemC datatypes
– Mentor Graphics: Algorithmic C datatypes (AC datatypes)
– Xilinx: arbitrary precision datatypes (AP datatypes)
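The effect of a bit-accurate type can be illustrated without the vendor headers. The sketch below uses a hypothetical `UIntN<W>` template as a stand-in for a W-bit unsigned type such as `ac_int<W, false>` or `ap_uint<W>`. It only mimics W-bit wrap-around on construction and addition, and it is not the API of either library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a W-bit unsigned type (1 <= W <= 32).
template <int W>
struct UIntN {
    static_assert(W >= 1 && W <= 32, "width must be 1..32 in this sketch");
    static constexpr uint64_t mask() { return (uint64_t{1} << W) - 1; }

    uint32_t v;  // only the low W bits are ever set

    explicit UIntN(uint64_t x = 0) : v(static_cast<uint32_t>(x & mask())) {}

    // Addition wraps at 2^W, as a W-bit register would.
    UIntN operator+(UIntN o) const { return UIntN(uint64_t{v} + o.v); }
};

// Example: a column counter for a 1024-pixel-wide line needs only 10 bits.
UIntN<10> next_column(UIntN<10> col) {
    return col + UIntN<10>(1);
}
```

With a real bit-accurate library, declaring the counter as 10 bits wide lets the tool build a 10-bit adder and register instead of 32-bit ones.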
• Change math.h function calls to the appropriate hardware math libraries optimized for synthesis
– abs() => ac_abs(), pow() => ac_pow_pwl(), sqrt() => ac_sqrt_pwl()
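The mapping above relies on the Catapult math library, which is not assumed to be available here. The sketch below swaps in a different, commonly used hardware-friendly technique: a bit-serial integer square root with a fixed 16-iteration loop and no `math.h` dependency. It is not how `ac_sqrt_pwl()` works (the `_pwl` suffix refers to a piecewise-linear approximation). The names `isqrt32` and `gradient_magnitude` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit-serial integer square root: returns floor(sqrt(x)).
// The loop count is fixed at 16, one iteration per result bit.
uint16_t isqrt32(uint32_t x) {
    uint32_t res = 0;
    uint32_t bit = uint32_t{1} << 30;  // highest power of four in 32 bits
    for (int i = 0; i < 16; ++i) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return static_cast<uint16_t>(res);
}

// Gradient magnitude for Sobel outputs, without floating point.
// Assumes |gx| and |gy| are small enough that gx*gx + gy*gy fits in 32 bits.
uint16_t gradient_magnitude(int gx, int gy) {
    const uint32_t sum = static_cast<uint32_t>(gx * gx) +
                         static_cast<uint32_t>(gy * gy);
    return isqrt32(sum);
}
```

Whether an exact bit-serial root or a piecewise-linear approximation is the better choice depends on the accuracy, latency and area targets of the design.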
Refining the Code for HLS
[Figure: refine / validate / optimize flow from the original code to optimized RTL]
REFINE (functionality)
• Original code
• Synthesizable HLS model: rewritten in the synthesizable subset of C, C++ or SystemC
• HLS tool: generates functionally correct RTL
• Well-defined guidelines; software engineers can do it
OPTIMIZE (optimization)
• Further code restructuring
• Optimization directives in the code: loop unrolling, loop pipelining, etc.
• Capture the macro architecture: registers, memory, interfaces, etc.
• HLS tool with directives and constraints: generates optimized RTL
• Requires RTL expertise
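Optimization directives in the code are often written as pragmas next to the loops they affect. The sketch below uses pragma spellings in the style of Catapult (`hls_pipeline_init_interval`, `hls_unroll`); treat the exact spellings and their placement as assumptions to be checked against the tool documentation. A regular C++ compiler ignores unknown pragmas (possibly with a warning), so the same source still runs in software. The function `fir_block` and its sizes are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr int TAPS = 4;     // hypothetical filter length
constexpr int SAMPLES = 8;  // hypothetical block size

// Simple FIR over one block of samples; samples before x[0] are treated as zero.
void fir_block(const int16_t x[SAMPLES], const int16_t coeff[TAPS],
               int32_t y[SAMPLES]) {
    // Outer loop: request a new iteration every clock cycle.
    #pragma hls_pipeline_init_interval 1
    for (int n = 0; n < SAMPLES; ++n) {
        int32_t acc = 0;
        // Inner loop: request full unrolling so the multiplies run in parallel.
        #pragma hls_unroll yes
        for (int k = 0; k < TAPS; ++k) {
            if (n - k >= 0) {
                acc += coeff[k] * x[n - k];
            }
        }
        y[n] = acc;
    }
}
```

The functional behavior does not depend on the directives, which is why the same test suite can be reused before and after optimization.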
VALIDATE
• Test suite: develop a comprehensive test suite for high functional coverage
– Reuse the test suite of the original software
– Reuse compliance test suites of protocols
• C-level validation
– Validate that the algorithm is correct
– Ensure that the algorithm is intact after refinement: use the same test suite and compare results, with the original implementation (the existing C / C++ software) as golden
– Faster simulation, less code to verify, bugs caught early, reduced effort for RTL verification
• RTL functional verification: same test suite and testbench, run in co-simulation
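C-level validation against a golden model can be sketched as a small harness that feeds the same test vectors to both versions and counts mismatches. Both `abs_diff_golden` (standing in for the original software) and `abs_diff_hls` (standing in for the refined, synthesizable model) are hypothetical placeholders, as are the test vectors.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Golden: original software style, using the standard library.
int abs_diff_golden(int a, int b) { return std::abs(a - b); }

// Refined: same function without library calls.
int abs_diff_hls(int a, int b) { return (a > b) ? (a - b) : (b - a); }

constexpr int NUM_VECTORS = 5;

// Run the shared test vectors through both models and
// return the number of mismatching results.
int count_mismatches() {
    const int a[NUM_VECTORS] = {0, 7, -3, 255, -128};
    const int b[NUM_VECTORS] = {0, 9, 4, 0, 127};
    int mismatches = 0;
    for (int i = 0; i < NUM_VECTORS; ++i) {
        if (abs_diff_golden(a[i], b[i]) != abs_diff_hls(a[i], b[i])) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A result of zero mismatches indicates that the refined model still matches the original on these vectors; the same vectors can later drive the RTL through co-simulation.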
Advanced ESL Flows – Shift Left
[Figure: advanced ESL flow]
• Original code: existing software implementation (C / C++)
• HLS model (functionality, synthesizable subset), synthesized by the HLS tool into RTL
• Optimization: macro architecture, tool directives
• VP model (loosely timed): wrap the C code in SystemC / TLM interfaces
• Virtual prototype: fast simulation model of the SoC, with IP models, SystemC wrapper, CPU ISS, interrupt controller, memory model and bus model
• Execution platforms: virtual prototype, RTL simulation, FPGA emulation, hybrid
• Tests: C / C++ unit test cases; system-level test cases (C, assembly, embedded applications); SystemVerilog, UVM; portable stimulus
HLS Resources
Free / open source HLS libraries
• Nvidia MatchLib: SystemC / C++ library of common hardware functions (https://github.com/NVlabs/matchlib)
• Open source libraries from Xilinx
– Vitis Accelerated Libraries (https://github.com/Xilinx/Vitis_Librarie)
– Vivado HLS library for FINN: Quantized Neural Network (QNN) using FINN (https://github.com/Xilinx/finn-hlslib)
– Vivado HLS Tiny Tutorial: algorithm, interface and miscellaneous (https://github.com/Xilinx/HLS-Tiny-Tutorials)
– Vivado HLS libraries for networking (https://github.com/Xilinx/HLS_packet_processing)
• Open source libraries from Mentor Graphics
– HLSLibs (https://hlslibs.org/)
• Others
– gemm_hls: scalable matrix-matrix multiplication on FPGA (https://github.com/spcl/gemm_hls)
– hls4ml: a package for machine learning inference in FPGAs (https://github.com/cornell-zhang/rosetta)
– Rosetta: a realistic high-level synthesis benchmark suite for software programmable FPGAs (https://github.com/cornell-zhang/rosetta)
• Porting Vivado HLS Designs to Catapult HLS Platform: https://www.mentor.com/hls-lp/resources/overview/porting-vivado-hls-designs-to-catapult-hls-platform-ca231e8d-a9d1-4983-a17b-e55dc7a6a8ae
Design Space Exploration
• HLS enables better design space exploration
– Better chances of getting an optimal implementation for the target application / device
• Easy switching between software and hardware (ASIC, FPGA or eFPGA)
• Hardware-software partitioning
– Identifying what to run on the processor and what to move to hardware
– FPGA SoC / eFPGA + HLS + virtual platforms
• The same code can produce designs for different purposes
– Supports hardware evaluation
• Rapidly explore options for power, performance and area without changing the source code
• Smaller designs, faster designs, optimal designs
– Re-targeted for different markets / applications within days
– Reuse between FPGA and ASIC
Thank you for your time.