A Short History Of HLS Discussion
Alex Grove, FirstEDA
Dr. David Thomas, Imperial College
HLS Tackles Complexity & Scale
• Abstraction is what enables design complexity
• Today’s BIG Question: why are we still using RTL?
Transistors Gates
RTL
IP & Reuse
IP (Flexible)
HLS ?
What is HLS, ESL, Systems, High Level ..
• ESL term originated with Gary Smith (EDA Analysis)
– Electronic System Level
• ESL is anything that operates above the RTL design process
– Targeting digital hardware
• Very large scope in terms of design tools
– Design capture
– Simulation & Analysis
– Implementation
• The HLS target is typically RTL in VHDL or Verilog
– This provides a path to implementation
– Few looked to skip the existing RTL implementation process
My historical relationship with HLS
• I’m a software engineer
– Never been taught RTL or digital design
• First used HLS to target an FPGA around 1999
– The HLS: Handel-C
– The device: Xilinx 4000 series
• For me it was the perfect hardware language
– Complex control-flow as easy as C
– Cycle-level pipelining as easy as VHDL
What is HLS Synthesis?
• H/W functionality is described in an untimed source
– Designers create the “how”, i.e. the architecture
– Tool optimizes the micro-architecture, the “when”
[Figure: dataflow graph with nodes labelled E and F and multiply (x) and add (+) operators]
“HLS is a term used to describe a new class of synthesis tools that work above the RTL level of abstraction. The key difference to RTL synthesis is that the input to an HLS tool is mostly untimed and the tool decides when operations take place. This is called scheduling and was first introduced some 20 years ago under the name of Behavioural Synthesis.”
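To make "the tool decides when operations take place" concrete, here is a minimal sketch of as-soon-as-possible (ASAP) scheduling over a small dataflow graph. ASAP is only one simple textbook strategy and ignores resource limits; it is not claimed to be what any particular HLS tool does. The `Node` type, the function names, the example expression and the operator latencies are all assumptions made for illustration.

```c
#define MAX_DEPS 2

/* One node of a dataflow graph: an operation plus the earlier nodes it reads. */
typedef struct {
    char op;              /* '+' or '*', documentation only */
    int latency;          /* assumed number of cycles the operation takes */
    int ndeps;            /* number of producer nodes */
    int deps[MAX_DEPS];   /* indices of producers; each must be < own index */
} Node;

/* ASAP scheduling: every operation starts in the first cycle in which all
 * of its producers have finished. Nodes must be in topological order.
 * Fills start[] with start cycles and returns the overall latency. */
int asap_schedule(const Node *g, int n, int *start)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        int s = 0;
        for (int d = 0; d < g[i].ndeps; d++) {
            int p = g[i].deps[d];
            int done = start[p] + g[p].latency;
            if (done > s) s = done;
        }
        start[i] = s;
        if (s + g[i].latency > total) total = s + g[i].latency;
    }
    return total;
}

/* Hypothetical example: y = (a*b) + (c*d) + e, with a 2-cycle multiply
 * and a 1-cycle add (assumed latencies, not taken from the slides). */
int example_total_latency(int *start)
{
    static const Node g[4] = {
        {'*', 2, 0, {0, 0}},   /* n0 = a*b      */
        {'*', 2, 0, {0, 0}},   /* n1 = c*d      */
        {'+', 1, 2, {0, 1}},   /* n2 = n0 + n1  */
        {'+', 1, 1, {2, 0}},   /* n3 = n2 + e   */
    };
    return asap_schedule(g, 4, start);
}
```

The source expression carries no timing at all; the start cycles come entirely from the scheduler. With these assumed latencies the two multiplies start together in cycle 0 and the adds follow in cycles 2 and 3.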
HLS, circa 2001
int filter(int x);
int combine(int a, int b);

int myFunc(int *pMem, int n)
{
    int acc = 0, i;
    for (i = 0; i < n; i++) {
        if (filter(i)) {
            acc = combine(pMem[i], acc);
        }
    }
    return acc;
}
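The slide only declares `filter` and `combine`, so the fragment cannot be compiled as it stands. The sketch below supplies hypothetical bodies (keep even indices, accumulate a running sum) purely so the loop can be exercised; the real functions in the original talk are unknown.

```c
/* Hypothetical definitions: the slide gives declarations only. */
int filter(int x)         { return (x % 2) == 0; }  /* assumed: select even indices */
int combine(int a, int b) { return a + b; }         /* assumed: running sum */

/* Loop from the slide: data-dependent control flow around a memory read. */
int myFunc(int *pMem, int n)
{
    int acc = 0, i;
    for (i = 0; i < n; i++) {
        if (filter(i)) {
            acc = combine(pMem[i], acc);
        }
    }
    return acc;
}
```

With these assumed bodies, calling `myFunc` on the array {1, 2, 3, 4, 5} with `n = 5` visits indices 0, 2 and 4 and returns 1 + 3 + 5 = 9.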
What about Software Acceleration?
• Software acceleration (C synthesis) is an automated process that optimizes a function, typically described in C/C++, for some parallel architecture. In this use case the functional description requires only minor refinements; however, such techniques provide modest performance improvements compared to HLS.
• Examples:
– Celoxica & Handel-C
– PICO (Program In Chip Out) HP Labs
• VLIW processor
Why Didn’t We All Use Handel-C?
• Users did not see the need for it
• FPGA real-estate too expensive
• Timing model is explicit and cycle-based
• Couldn’t automatically pipeline
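As a rough illustration of why automatic pipelining matters, the sketch below compares cycle counts for a loop whose iterations run strictly one after another with the same loop pipelined. The parameter names `latency` (cycles per iteration) and `ii` (initiation interval, cycles between successive iteration starts) are standard terminology but are not taken from the slides, and the model ignores stalls and memory effects.

```c
/* Cycles for n iterations executed strictly one after another. */
long sequential_cycles(long n, long latency)
{
    return (n <= 0) ? 0 : n * latency;
}

/* Cycles for n iterations in a pipeline: the first iteration completes
 * after `latency` cycles, and each later one starts `ii` cycles after
 * its predecessor. */
long pipelined_cycles(long n, long latency, long ii)
{
    if (n <= 0) return 0;
    return latency + (n - 1) * ii;
}
```

For example, 100 iterations of a 4-cycle body take 400 cycles sequentially but 103 cycles when pipelined with an initiation interval of 1. In an explicitly cycle-based language, getting from the first figure to the second is a manual rewrite.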
When? Yesterday, Today, Tomorrow?
[Timeline figure: devices and standards, shown on a 1990 to 2010 axis]
• Xilinx first FPGA (1985)
• VHDL (1987)
• WWW (1990)
• GSM (1991)
• FLEX 8000 (1992)
• DC 3.0c (1993)
• FLEX 10K (1995)
• DVB (1995)
• Virtex (1998)
• APEX (1999)
• DVB-T (1999)
• Virtex-II Pro (2002)
• Stratix (2002)
• Stratix II (2004)
• 3G (2004)
• Stratix V (2010)
• Virtex-7 (2010)
• DVB-T2
Capacity growth:
• FLEX 10K, 2.3K LE -> Stratix V, 952K LE
• Virtex-II Pro, 44K slices -> Virtex-7, 700K (690T) / 1900K (2000T)
When? Yesterday, Today, Tomorrow?
[Timeline figure: design tools and languages, shown on a 1990 to 2010 axis]
• IMEC CATHEDRAL-II (1986)
• SPW (1990)
• Handel-C (1990)
• Specman (1992)
• Synopsys BC (1994)
• Formal (EC) (1995)
• CoWare (1996)
• C Level (1998)
• OCAPI (1999)
• PhyOpt (1999)
• Mentor Monet (1999)
• Cynapps Cynlib (1999)
• SUPERLOG (2000)
• SystemC 1.0 (2000)
• Celoxica (2000)
• SpecC (2001)
• Forte (2001)
• Synfora (2003)
• AccelChip (2003)
• Mentor Catapult (2004)
• SystemVerilog (2005)
• Altera DSP Builder (2005)
• Synplify DSP (2006)
• CyberWorkBench (2006)
• AutoESL (2006)
• C2S (2008)
Current relationship with HLS
• Teaching: we use HLS for 1st year projects
• User: just another tool in the toolbox
• Research: trying to improve HLS tools
Tools We Focus Upon in Research
• LegUp
• Vivado HLS
• Altera OpenCL
• Maxeler OpenSPL
Input language
Tools compared: LegUp, Vivado HLS, Altera OpenCL, Maxeler OpenSPL
• OpenCL
• Plain C or C++
• High-level Streaming
Target users
Tools compared: LegUp, Vivado HLS, Altera OpenCL, Maxeler OpenSPL
• Advanced software developers
• Software developers
• Hardware developers
System Integration
Tools compared: LegUp, Vivado HLS, Altera OpenCL, Maxeler OpenSPL
• Advanced software developers
• None (full custom)
Overall Goals
Tools compared: LegUp, Vivado HLS, Altera OpenCL, Maxeler OpenSPL
• Robustness & Standardisation
• Low-level performance & IP development
• Exploration & Research
• Throughput & system delivery
• Floating Point
• Memory Access
Key questions in academia
• What or who is HLS for?
– Allow software developers to program hardware
– Make hardware developers more efficient
• How high is “High”?
– For loops and if statements?
– While loops and recursion?
• What is the right integration model?
– Is HLS just for making IP cores?
– How should HLS integrate into software systems?
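The "how high" question can be made concrete with a small contrast. The sketch below writes the same computation twice: once recursively, where call depth depends on run-time data, and once as a loop with a compile-time bound, which is the style C-based HLS flows have generally handled more readily. The function names and the digit-sum task are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Recursive form: the call depth depends on the run-time value of n,
 * so the amount of state needed is not known when hardware is built. */
uint32_t sum_digits_recursive(uint32_t n)
{
    return (n < 10) ? n : (n % 10) + sum_digits_recursive(n / 10);
}

/* Bounded-loop form: a 32-bit value has at most 10 decimal digits,
 * so the trip count is a constant and the loop could be unrolled. */
#define MAX_DIGITS 10

uint32_t sum_digits_bounded(uint32_t n)
{
    uint32_t acc = 0;
    for (int i = 0; i < MAX_DIGITS; i++) {
        acc += n % 10;   /* adds 0 once n has been reduced to 0 */
        n /= 10;
    }
    return acc;
}
```

Both functions return the same result for every 32-bit input; the difference is only in how much the tool must infer about control flow before it can build a datapath.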
So, Which language?
• C/C++ and all of its derivatives
– Includes SystemC (C + Concurrency)
• SystemVerilog
• Math languages like MathWorks M
• Model Based Design
– Simulink-like block-based schematics
• APIs like OpenCL / CUDA?
• Something else?
– VHDL 2008? Has fixed point types now.
Has the climate for HLS changed?
• Can the tools handle more input constructs?
• Has the synthesis quality got a lot better?
• Are the FPGAs getting bigger / cheaper?
• Are RTL programmers more expensive?