Generating Hardware Designs by Source Code Transformation
Ashley Brown, Wayne Luk, Paul Kelly
STS ‘06
The Queen’s Tower, Imperial College London, South Kensington, SW7
21st June 2005 | Ashley Brown # 2
What would we like to do?
• Take an algorithm written in C.
• Generate an efficient hardware design, run it on an FPGA.
• Fast design cycle, easy to maintain code.
• C programmers should be able to create fast hardware!

Background: Handel-C
• C-based programming language for digital system design.
• One clock-cycle per statement.
• Explicit parallelism.
• Compiler generates hardware design from Handel-C source.
while (j != 3) {
    par {
        t0 = aa[0] * bb[0];
        t1 = aa[1] * bb[1];
    }
    par {
        cc[i][j] = t0 + t1;
        j++;
    }
}
Handel-C code example.

Problems
• Software programmers: bad Handel-C, poor hardware.
– No exploitation of statement-level parallelism.
– Long expressions.
– Lots of for loops!
• Experienced Handel-C designers: good hardware, hard-to-read code.
– Trickery to reduce clock cycles, increase clock rate.
• Finding the “optimal” solution is not easy.
– Optimisation effectiveness depends on the target architecture (see the results later!)

Solutions
• Restructure Handel-C code to optimise.
– Can parallelise if desired.
– Duplicate hardware if necessary.
• Apply transformations to the original source, leaving it intact.
– The original readable description is still available.
– A more efficient version is used for hardware generation.
• Allow the user to define custom transformations with a transformation language.
• Generate a whole design-space of solutions, with different optimisations.

What’s New?
• Previous work with user-specified transformations has been:
– For software-based C.
– Aimed at parallelising/optimising for microprocessors.
• Can’t duplicate microprocessor hardware on the fly – it’s either there or not. We can duplicate hardware, pipeline – FASTER DESIGN!
• Previous work on hardware language transformations does not allow the user to describe transformations (Haydn-C). We do – the user can target their code explicitly.
• Exploring an entire design-space is usually done at the hardware level, not at the high-level language level (although not always, e.g. ASC). We generate a full design-space – find *the* best solution.

Basic Components
// 1 * x = x
always transform std_times_1_elim {
    pattern {
        1 * cmlexpr(operand)
    }
    generate {
        cmlexpr(operand)
    }
}
• CML transformations are defined within transform blocks.
• Each transformation can have a name to identify it for reporting.
• The optional always keyword indicates that this transformation should always be applied where it can.
• The pattern section describes the format of the code to match for this transformation.
• The generate section describes the code that should replace the pattern.
• Wildcards, such as cmlexpr, allow part of a pattern to be matched and substituted into the new tree.
Wildcard matching:
• cmlexpr - matches any expression
• cmlstmt - matches any statement
• cmlstmtlist - matches a list of statements
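The 1 * x rule can be pictured as a rewrite on an expression tree. The C sketch below is an illustration only: the Expr type and the eliminate_times_one function are hypothetical names invented here, not part of CML or its implementation. It matches a multiply node whose left operand is the constant 1 and returns the right operand in its place, which is the role the cmlexpr(operand) wildcard plays in the pattern and generate sections.

```c
#include <stddef.h>

/* Hypothetical expression-tree node; not the CML tool's real data structure. */
typedef enum { EXPR_CONST, EXPR_VAR, EXPR_MUL, EXPR_ADD } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;              /* used when kind == EXPR_CONST */
    char name;              /* used when kind == EXPR_VAR */
    struct Expr *lhs, *rhs; /* used by the binary operators */
} Expr;

/* Returns 1 if e is the integer constant 1, otherwise 0. */
static int is_one(const Expr *e) {
    return e != NULL && e->kind == EXPR_CONST && e->value == 1;
}

/* Rewrites every subtree of the form 1 * operand to operand, working
 * bottom-up, and returns the (possibly different) root of the tree.
 * Nodes that drop out of the tree are not freed in this sketch. */
Expr *eliminate_times_one(Expr *e) {
    if (e == NULL) {
        return NULL;
    }
    if (e->kind == EXPR_MUL || e->kind == EXPR_ADD) {
        e->lhs = eliminate_times_one(e->lhs);
        e->rhs = eliminate_times_one(e->rhs);
    }
    if (e->kind == EXPR_MUL && is_one(e->lhs)) {
        return e->rhs; /* the wildcard binding replaces the whole match */
    }
    return e;
}
```

Like the pattern on the slide, the sketch matches only 1 * operand; operand * 1 would need a second rule.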

Ensuring Data Integrity
• Three types of condition are defined to ensure data integrity:
– Data-flow sets.
– Expression evaluation.
– Constant validation.
• Transformations have a conditions section to define these.

Hand-coded vs Automated

Sequential:
do {
    if (A >= B) {
        A -= B;
        C = (C << 1) | 1;
    } else {
        C <<= 1;
    }
    B >>= 1;
    Bits--;
} while (Bits != 0);

Hand-coded:
do {
    par {
        if (A >= B) {
            par {
                A -= B;
                C = (C << 1) | 1;
            }
        } else {
            C <<= 1;
        }
        B >>= 1;
        Bits--;
    }
} while (Bits != 0);

Automated:
do {
    par {
        if (A >= B) {
            par {
                A -= B;
                C = (C << 1) | 1;
            }
        } else {
            C = (C << 1);
        }
        B = (B >> 1);
        Bits = (Bits - 1);
    }
} while (Bits != 0);
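The loop above is a shift-and-subtract division, and wrapping it in par is safe because no statement in the body reads a value that another statement in the same iteration writes first. The C sketch below checks that claim in software. It rests on several assumptions not stated on the slide: the function names, the DivResult type, and the initial set-up (A holds the dividend, B holds the divisor shifted left by bits - 1, C starts at 0) are all invented for illustration. The second function emulates par by reading every operand from a snapshot taken at the start of the iteration.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} DivResult;

/* Sequential form of the loop. Assumed preconditions: 1 <= bits <= 16,
 * dividend < 2^bits, and 1 <= divisor < 2^16. */
DivResult divide_sequential(uint32_t dividend, uint32_t divisor, unsigned bits) {
    uint32_t A = dividend;
    uint32_t B = divisor << (bits - 1); /* assumed set-up, not on the slide */
    uint32_t C = 0;
    unsigned Bits = bits;
    do {
        if (A >= B) {
            A -= B;
            C = (C << 1) | 1;
        } else {
            C <<= 1;
        }
        B >>= 1;
        Bits--;
    } while (Bits != 0);
    return (DivResult){ C, A };
}

/* Emulation of the par form: every right-hand side reads the values
 * held at the start of the iteration, as if all updates happened in
 * the same clock cycle. */
DivResult divide_par(uint32_t dividend, uint32_t divisor, unsigned bits) {
    uint32_t A = dividend;
    uint32_t B = divisor << (bits - 1);
    uint32_t C = 0;
    unsigned Bits = bits;
    do {
        const uint32_t a = A, b = B, c = C; /* start-of-cycle snapshot */
        const unsigned n = Bits;
        if (a >= b) {
            A = a - b;
            C = (c << 1) | 1;
        } else {
            C = c << 1;
        }
        B = b >> 1;
        Bits = n - 1;
    } while (Bits != 0);
    return (DivResult){ C, A };
}
```

Calling both functions with the same arguments should give identical quotients and remainders, which is the property a parallelising transformation has to preserve.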

Test Transformations
• Generic – applicable to all programs:
– autopar – parallelise sequential statements with no dependencies.
– fortowhile – convert for loops into corresponding while loops.
– lttoeq – convert for loops with < in the loop condition to use == instead.
• Application specific – targeted at the test programs:
– matrixpar – parallelisation of an inner loop.
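As a rough picture of what fortowhile and lttoeq do, the C sketch below writes the same loop three ways. The function names are hypothetical, and the third form assumes that the rewritten condition is an equality test of the form i != n, in the style of the while (j != 3) loop in the Handel-C example earlier; the slide does not give the exact output. The equality form is only equivalent when the counter starts at or below the bound and advances by exactly one each iteration, which holds here because both are unsigned and i starts at 0.

```c
/* Original form: a for loop with a less-than comparison. */
unsigned sum_for(const unsigned *data, unsigned n) {
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* After a fortowhile-style rewrite: same comparison, while loop. */
unsigned sum_while(const unsigned *data, unsigned n) {
    unsigned total = 0;
    unsigned i = 0;
    while (i < n) {
        total += data[i];
        i++;
    }
    return total;
}

/* After an lttoeq-style rewrite (assumed form): the less-than test
 * becomes an equality test. Valid because i starts at 0 <= n and
 * steps by exactly 1, so it reaches n without skipping past it. */
unsigned sum_while_eq(const unsigned *data, unsigned n) {
    unsigned total = 0;
    unsigned i = 0;
    while (i != n) {
        total += data[i];
        i++;
    }
    return total;
}
```

All three functions return the same total for the same input, including the empty case n = 0.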

More Transformations
• Various mathematical rearrangements:
– Factorise to reduce multiplies.
– Remove *1, *0, +0 etc.
• More interesting:
– Dead-code elimination (remember data conditions!)
– Variable replacement: remove dependencies in code by replacing variables with the expressions last assigned to them (again, remember data conditions!)
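A small worked case of "factorise to reduce multiplies", sketched in C with hypothetical function names. The first form needs two multiplications, the second only one. With C unsigned arithmetic the two agree even when intermediate results wrap around, because wrap-around arithmetic still obeys the distributive law. Whether the same holds for a particular Handel-C design depends on the operand widths chosen there, so treat this as an illustration of the algebra rather than of the tool's output.

```c
/* Unfactored form: two multiplications and one addition. */
unsigned weighted_sum(unsigned a, unsigned b, unsigned c) {
    return a * b + a * c;
}

/* Factorised form: one multiplication and one addition. */
unsigned weighted_sum_factored(unsigned a, unsigned b, unsigned c) {
    return a * (b + c);
}
```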

Execution Time Improvement
[Figure: Power/Execution Time Comparison at 50MHz. Execution Time (ns, axis 0 to 4000) and Dynamic Power Estimate (mW, axis 0 to 300) plotted against Code Version: base, autopar, fortowhile, lttoeq, matrixpar-noshift, matrixpar-shift.]
[Figure: Execution Time (s) against Optimisation Applied (optimisations are cumulative).]
lttoeq increases fmax on Altera, but decreases it on Xilinx.

Design-Space Exploration
• Difficult to decide which transformation is best.
• Don’t guess, produce several solutions.
• Branch the AST whenever a transformation is applied.
– In-place branches: small AST.
– Propagate branches when no more transformations can be applied.
– Repeat transformation process on each new solution.

Design Space Exploration
[Figure: fmax (axis 97 to 105) plotted against Code Version (axis 0 to 300), with solutions 139, 232 and 239 labelled.]

Design Space Exploration
• Assume an existing design with an fmax of 104MHz; the generated design must match that.
• Many solutions match.
– We should consider other factors such as area, power or number of cycles.
• Being brief: look at solutions 139 and 232.
• Both are only partially parallelised. The solution with the most parallelism (239) does not meet the fmax requirement.
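The selection step described above (keep the solutions that meet the fmax requirement, then rank them on another factor) could be sketched in C as follows. Everything here is an assumption made for illustration: the Design record, its fields, the pick_design function, and the choice of cycle count as the tie-breaking factor. The slides do not say how the tool stores or ranks solutions.

```c
#include <stddef.h>

/* Hypothetical record for one generated solution. */
typedef struct {
    int id;           /* solution number in the design space */
    double fmax_mhz;  /* reported maximum clock frequency */
    unsigned cycles;  /* clock cycles needed to run the program */
} Design;

/* Returns the index of the design that meets min_fmax_mhz using the
 * fewest cycles, or -1 if no design qualifies. On a tie, the earliest
 * entry in the array is kept. */
int pick_design(const Design *designs, size_t count, double min_fmax_mhz) {
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (designs[i].fmax_mhz < min_fmax_mhz) {
            continue; /* fails the clock-rate requirement */
        }
        if (best < 0 || designs[i].cycles < designs[(size_t)best].cycles) {
            best = (int)i;
        }
    }
    return best;
}
```

Swapping the cycles comparison for area or power would rank the qualifying solutions on those factors instead.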

Future Work
• Extensions to the language to allow additional matching.
– expr replicator, complex expression matching.
• Preservation of structure – e.g. a++; does not become a = a + 1;
• Heuristics for selecting transformations to apply.
• Genetic algorithms for transformation selection? “Breed” good transformation solutions.