SPIRAL DSP Transform Compiler: Application Specific Hardware Synthesis
Peter A. Milder¹, Franz Franchetti, James C. Hoe, and Markus Pueschel²
Department of ECE, Carnegie Mellon University
CMU/ECE/Hoe, February 2013, slide-1
¹ now with SUNY Stony Brook; ² now with ETH
The SPIRAL Project
• High-performance implementations of linear DSP transforms (DFT, DCT, DWT, filters, etc.) are an important class of design problems
• Hand design and tuning is tricky and expensive
  – needs both math and implementation knowledge
  – time-consuming and tedious
  – needs to repeat effort for every new context
• SPIRAL research goal: a flexible push-button design generator that produces SW & HW implementations comparable with expert hand design
Why we can do better than hand design
• SPIRAL is only focused on linear DSP transforms
• These transforms are highly structured, highly regular, and very well understood mathematically
• Algorithmic implementations of a transform can be enumerated following a known set of rules
• For a given objective function and mapping target, a computer generates a solution at least as good as the best human effort by trying enough implementations
SPIRAL Framework
[Figure: SPIRAL framework. Labels: "I want a DFT of size 1024 on a {Xilinx, P4, Cell, ...}"; "SPIRAL automation starts here"; "where most tools begin automating the problem"]
Principle 1: Domain knowledge in the system
Principle 2: Optimization at a high level of abstraction
www.spiral.net/hardware/dftgen.html
High-Level, Quality, and Specialization
• High level: tools know better than you
• RTL Synthesis: general-purpose, but special handling of structures like FSM, arith, etc.
• Place-and-Route: works the same no matter what design
Outline
• SPIRAL Formula Framework
• SPIRAL for HW FFT cores
• SPIRAL for HW FFT "un"-core
Linear Transforms
• A linear transform is a matrix-vector multiplication
  – computing it by definition takes O(N²) operations
  – the matrix has structure
• E.g., the discrete Fourier transform, y = DFT_N x:
$$\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{bmatrix} = \left[ e^{-2\pi i\, jk/N} \right]_{j,k = 0,\dots,N-1} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{bmatrix}$$
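To make the O(N²) cost of "computing by definition" concrete, here is a minimal Python sketch (not SPIRAL code) that evaluates y_j = Σ_k e^{-2πi jk/N} x_k directly. The function name dft_by_definition is hypothetical and chosen only for illustration.

```python
import cmath


def dft_by_definition(x):
    """Compute y = DFT_N x directly from y_j = sum_k exp(-2*pi*i*j*k/N) * x_k.

    Two nested loops (N outputs, N terms each) give O(N^2) operations.
    """
    n = len(x)
    w = -2 * cmath.pi * 1j / n  # exponent scale; matrix entry (j, k) is exp(w * j * k)
    return [sum(cmath.exp(w * j * k) * x[k] for k in range(n)) for j in range(n)]


if __name__ == "__main__":
    # DFT of a constant vector: everything lands in y_0
    print(dft_by_definition([1, 1, 1, 1]))  # approximately [4, 0, 0, 0]
```

Every output touches every input, which is exactly the dense matrix-vector product that the fast algorithms below avoid.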
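"Fast" Algorithms
• A "fast" algorithm factors the matrix into a sequence of structured, sparse matrices: cheaper sparse multiplies, O(N log N) operations
• E.g., Cooley-Tukey factorization of DFT_4:
$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -i \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
• Matrix formula representation:
$$\mathrm{DFT}_4 = (\mathrm{DFT}_2 \otimes I_2)\, D^4_2\, (I_2 \otimes \mathrm{DFT}_2)\, L^4_2$$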
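The factorization DFT_4 = (DFT_2 ⊗ I_2) D^4_2 (I_2 ⊗ DFT_2) L^4_2 can be read as four sparse stages applied right to left. The sketch below assumes D^4_2 = diag(1, 1, 1, -i) and that L^4_2 maps (x0, x1, x2, x3) to (x0, x2, x1, x3), which is what the explicit matrices encode; the function names are hypothetical. The stages use 8 complex additions/subtractions and one multiplication by -i, whereas a dense 4x4 matrix-vector product needs 16 multiplications and 12 additions.

```python
import cmath


def dft4_cooley_tukey(x):
    """Apply DFT_4 = (DFT_2 ⊗ I_2) D^4_2 (I_2 ⊗ DFT_2) L^4_2, rightmost factor first."""
    x0, x1, x2, x3 = x
    # L^4_2: stride permutation, read the input at stride 2
    a = [x0, x2, x1, x3]
    # I_2 ⊗ DFT_2: a butterfly on each adjacent pair
    b = [a[0] + a[1], a[0] - a[1], a[2] + a[3], a[2] - a[3]]
    # D^4_2 = diag(1, 1, 1, -i): the only nontrivial twiddle is exp(-2*pi*i/4) = -i
    c = [b[0], b[1], b[2], -1j * b[3]]
    # DFT_2 ⊗ I_2: butterflies on elements two positions apart
    return [c[0] + c[2], c[1] + c[3], c[0] - c[2], c[1] - c[3]]


def dft_by_definition(x):
    """Reference O(N^2) DFT used only for checking."""
    n = len(x)
    w = -2 * cmath.pi * 1j / n
    return [sum(cmath.exp(w * j * k) * x[k] for k in range(n)) for j in range(n)]


if __name__ == "__main__":
    x = [1, 2 + 1j, -3, 0.5j]
    fast, slow = dft4_cooley_tukey(x), dft_by_definition(x)
    # the maximum deviation is at floating-point rounding level
    print(max(abs(f - s) for f, s in zip(fast, slow)))
```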
Factorization Rules
E.g., Cooley-Tukey:
$$\mathrm{DFT}_{mn} = (\mathrm{DFT}_m \otimes I_n)\, D^{mn}_n\, (I_m \otimes \mathrm{DFT}_n)\, L^{mn}_m$$
  – DFT_2 is $\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$
  – D is a diagonal matrix of twiddle factors
  – L is a stride permutation matrix
  – A ⊗ B = [a_{j,k} B] is the tensor (or Kronecker) product, e.g.:
$$A \otimes I_n = \begin{bmatrix} a_{0,0} I_n & a_{0,1} I_n \\ a_{1,0} I_n & a_{1,1} I_n \end{bmatrix}, \qquad I_n \otimes B = \begin{bmatrix} B & & \\ & \ddots & \\ & & B \end{bmatrix}$$
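A quick way to check the general Cooley-Tukey rule DFT_mn = (DFT_m ⊗ I_n) D^{mn}_n (I_m ⊗ DFT_n) L^{mn}_m is to build every factor as a dense matrix and multiply them out. This is a sketch under stated assumptions: the index conventions for L^{mn}_m (read the input at stride m) and D^{mn}_n (entry w_{mn}^{i·j} at position i·n + j) are the ones that make the rule hold as written here, and all helper names are hypothetical.

```python
import cmath


def omega(n, e):
    """Root-of-unity power w_n^e = exp(-2*pi*i*e/n)."""
    return cmath.exp(-2 * cmath.pi * 1j * e / n)


def dft_matrix(n):
    return [[omega(n, j * k) for k in range(n)] for j in range(n)]


def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def kron(a, b):
    """Tensor (Kronecker) product A ⊗ B = [a_{j,k} B]."""
    p, q = len(b), len(b[0])
    return [[a[r // p][c // q] * b[r % p][c % q] for c in range(len(a[0]) * q)]
            for r in range(len(a) * p)]


def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def stride_perm(mn, m):
    """L^{mn}_m (assumed convention): (L x)[i*n + j] = x[j*m + i], i.e. read x at stride m."""
    n = mn // m
    mat = [[0] * mn for _ in range(mn)]
    for i in range(m):
        for j in range(n):
            mat[i * n + j][j * m + i] = 1
    return mat


def twiddle(mn, n):
    """D^{mn}_n (assumed convention): diagonal entry w_{mn}^(i*j) at position i*n + j."""
    m = mn // n
    mat = [[0] * mn for _ in range(mn)]
    for i in range(m):
        for j in range(n):
            mat[i * n + j][i * n + j] = omega(mn, i * j)
    return mat


def cooley_tukey_matrix(m, n):
    """Multiply out (DFT_m ⊗ I_n) D^{mn}_n (I_m ⊗ DFT_n) L^{mn}_m."""
    mn = m * n
    result = kron(dft_matrix(m), identity(n))
    for factor in (twiddle(mn, n), kron(identity(m), dft_matrix(n)), stride_perm(mn, m)):
        result = matmul(result, factor)
    return result


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    # a rounding-level difference confirms DFT_6 = (DFT_2 ⊗ I_3) D^6_3 (I_2 ⊗ DFT_3) L^6_2
    print(max_abs_diff(cooley_tukey_matrix(2, 3), dft_matrix(6)))
```

Dense matrices are used only for checking; the point of the rule is that each factor is sparse, so an implementation applies the factors as stages instead of multiplying them out.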
"Fast" Fourier Transform Algorithms
• Recursively factorize by the Cooley-Tukey rule until only leaf cases remain (e.g., DFT_r for radix r)
$$\mathrm{DFT}_8 = (\mathrm{DFT}_2 \otimes I_4)\, D^8_4\, (I_2 \otimes \mathrm{DFT}_4)\, L^8_2$$
$$\mathrm{DFT}_8 = (\mathrm{DFT}_2 \otimes I_4)\, D^8_4\, \left(I_2 \otimes \left[(\mathrm{DFT}_2 \otimes I_2)\, D^4_2\, (I_2 \otimes \mathrm{DFT}_2)\, L^4_2\right]\right) L^8_2$$
• Exponential number of alternatives
[Figure: two ruletrees for DFT_8: DFT_8 → (DFT_2, DFT_4) with DFT_4 → (DFT_2, DFT_2); and DFT_8 → (DFT_4, DFT_2) with DFT_4 → (DFT_2, DFT_2)]
• Each ruletree corresponds to a different algorithm
• All cost O(N log N)
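Recursive application of the Cooley-Tukey rule can be sketched by enumerating ruletrees and then running the algorithm each tree describes. This is a minimal illustration, not SPIRAL's implementation: the tuple encoding of ruletrees, the leaf set, and the names ruletrees and run_ruletree are all hypothetical, and only the Cooley-Tukey rule is used (SPIRAL has many more rules).

```python
import cmath

# Hypothetical ruletree encoding for this sketch (not SPIRAL's internal format):
#   an int n       -> leaf DFT_n, computed directly by definition
#   a pair (l, r)  -> Cooley-Tukey split DFT_{mn} with m = size(l), n = size(r)


def size(tree):
    return tree if isinstance(tree, int) else size(tree[0]) * size(tree[1])


def ruletrees(n, leaves=(2,)):
    """Enumerate every ruletree for DFT_n using Cooley-Tukey splits n = m * (n // m)."""
    trees = [n] if n in leaves else []
    for m in range(2, n):
        if n % m == 0:
            for left in ruletrees(m, leaves):
                for right in ruletrees(n // m, leaves):
                    trees.append((left, right))
    return trees


def dft_by_definition(x):
    n = len(x)
    return [sum(cmath.exp(-2 * cmath.pi * 1j * j * k / n) * x[k] for k in range(n))
            for j in range(n)]


def run_ruletree(tree, x):
    """Compute DFT_N x with the algorithm described by `tree`."""
    if isinstance(tree, int):
        return dft_by_definition(x)
    left, right = tree
    m, n = size(left), size(right)
    big_n = m * n
    # L^{mn}_m then (I_m ⊗ DFT_n): a DFT_n on each stride-m subsequence x[i], x[i+m], ...
    z = [run_ruletree(right, x[i::m]) for i in range(m)]
    # D^{mn}_n: scale element k2 of subsequence i by w_{mn}^(i*k2)
    for i in range(m):
        for k2 in range(n):
            z[i][k2] *= cmath.exp(-2 * cmath.pi * 1j * i * k2 / big_n)
    # (DFT_m ⊗ I_n): a DFT_m across the m subsequences for each k2
    y = [0] * big_n
    for k2 in range(n):
        column = run_ruletree(left, [z[i][k2] for i in range(m)])
        for k1 in range(m):
            y[k1 * n + k2] = column[k1]
    return y


if __name__ == "__main__":
    print(ruletrees(8))  # [(2, (2, 2)), ((2, 2), 2)]
    print([len(ruletrees(2 ** k)) for k in range(1, 7)])  # [1, 1, 2, 5, 14, 42]
```

With DFT_2 as the only leaf, the two trees printed for DFT_8 are the two shown on the slide, and the counts for DFT_{2^k} are the Catalan numbers, which grow exponentially in k. Every tree computes the same DFT; they differ only in the order and shape of the sparse stages.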
A System of Transforms and Rules
[Figure: sample breakdown rules relating the DFT, DCT-II, DCT-IV, DST, WHT, DWT, and filters; 50+ transforms, 150+ rules]
Examples:
$$\mathrm{DCT}^{(II)}_2 = \mathrm{diag}\left(1, \tfrac{1}{\sqrt{2}}\right) F_2, \qquad F_2 = \mathrm{DFT}_2$$
$$\mathrm{DFT}_{mn} = (\mathrm{DFT}_m \otimes I_n)\, D^{mn}_n\, (I_m \otimes \mathrm{DFT}_n)\, L^{mn}_m$$
$$\mathrm{WHT}_{2^k} = \prod_{i=1}^{t} \left( I_{2^{k_1+\cdots+k_{i-1}}} \otimes \mathrm{WHT}_{2^{k_i}} \otimes I_{2^{k_{i+1}+\cdots+k_t}} \right), \qquad k = k_1 + \cdots + k_t$$
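The WHT rule, WHT_{2^k} = ∏_i (I_{2^{k_1+...+k_{i-1}}} ⊗ WHT_{2^{k_i}} ⊗ I_{2^{k_{i+1}+...+k_t}}) with k = k_1 + ... + k_t, can be checked numerically with the same tensor-product machinery. This sketch assumes the unnormalized definition WHT_{2^k} = F_2 ⊗ ... ⊗ F_2 (k factors, F_2 = DFT_2); the helper names are hypothetical.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def kron(a, b):
    """Tensor (Kronecker) product A ⊗ B = [a_{j,k} B]."""
    p, q = len(b), len(b[0])
    return [[a[r // p][c // q] * b[r % p][c % q] for c in range(len(a[0]) * q)]
            for r in range(len(a) * p)]


def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


F2 = [[1, 1], [1, -1]]  # F_2 = DFT_2


def wht(k):
    """Assumed definition: WHT_{2^k} = F_2 ⊗ F_2 ⊗ ... ⊗ F_2 (k factors)."""
    mat = [[1]]
    for _ in range(k):
        mat = kron(mat, F2)
    return mat


def wht_by_rule(ks):
    """Product over i of (I_{2^(k_1+...+k_{i-1})} ⊗ WHT_{2^(k_i)} ⊗ I_{2^(k_{i+1}+...+k_t)})."""
    result = identity(2 ** sum(ks))
    for i, k_i in enumerate(ks):
        before, after = sum(ks[:i]), sum(ks[i + 1:])
        factor = kron(kron(identity(2 ** before), wht(k_i)), identity(2 ** after))
        result = matmul(result, factor)
    return result


if __name__ == "__main__":
    # k = 4 split as 2 + 1 + 1: the rule reproduces WHT_16 exactly (integer arithmetic)
    print(wht_by_rule([2, 1, 1]) == wht(4))  # True
```

Each composition of k (for example 4 = 2 + 1 + 1 or 4 = 1 + 3) gives a different factorization of the same WHT, just as different DFT ruletrees give different algorithms for the same DFT.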
Algorithmic Design Space
size    # of DFT    # of DCT-IV