Design of Accelerators with Constraints
Design of Processor Accelerators with Constraints
Christophe Wolinski (1), Krzysztof Kuchcinski (2), Kevin Martin (3), Erwan Raffin (3,4) and François Charot (3)
(1) Rennes University I/IRISA, France
(2) Dept. of Computer Science, Lund University, Sweden
(3) INRIA, Centre Rennes-Bretagne Atlantique, France
(4) Thomson R&D, Rennes, France
Problem Definition
/* Sample C code */
void fir(const int x[], const int h[], int y[])
{
    int i, j, sum;
    for (j = 0; j < 100; j = j + 1) {
        sum = 0;
        for (i = 0; i < 8; i = i + 1)
            sum += x[i + j] * h[i];
        sum = sum >> 15;
        y[j] = sum;
    }
}

[Figure: compilation of the C code onto a processor core and a data-path of multiplier (x) and adder (+) nodes with in/out ports.]
ASIP
[Figure: ASIP with a processor core, registers Reg1, Reg2, ..., RegN, interconnections and a data-path of reconfigurable cells. The reconfigurable cell shown combines multipliers (*), adders (+) and multiplexers (m) with control inputs ctrl 0 and ctrl 1.]

application specific instructions
reconfigurable units
better performance and lower power
etc.
Main Problems
Identification of computational patterns for instructions
Selection of a subset of instructions for implementation
Sequential or parallel execution scenarios
Pattern merging to build a reconfigurable cell

Goal: get speed-up of an application with minimal hardware cost.
Our Design Flow
[Figure: DURASE flow, with a hardware path and a software path. Labels in the GECOS compiler framework: C front-end, CDFG, HCDG builder, dataflow analyser, polyhedral transformations, polyhedric HCDG, pattern generation, DIPS, selection, graph covering, scheduling, pattern merging, merged patterns, SPS, NIS, architecture model. Hardware path: SystemC/VHDL generator, VHDL, synthesis, extension bitstream, SystemC extension CABA model. Software path: program generator, modified C, GCC NIOS, assembly code. Inputs: C program, target instruction set.]

CP-based methods:
Pattern generation
Graph covering and scheduling
Pattern merging
CP Solution
JaCoP.graph constraints
    (Sub-)graph isomorphism constraints
    Clique constraints
    Simple path
    etc.
Other standard constraints (number of inputs/outputs, critical path, etc.)

Methods based on constraints
    Pattern generation, purely based on constraints
    Match identification
    Pattern selection and scheduling
    Pattern merging
Pattern Generation
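[Figure: example graph with nodes n0 ... n8 and the seed node ns; the pattern shown has 3 inputs and 2 outputs.]

$\mathit{allsucc}(n_s) = \{n_4, n_5, n_6, n_7, n_8\}$

$\forall n \in N_p,\ n \neq n_s:\ \exists\, \mathit{path}(P_{n_s}, n, n_s)$

$\forall n \in (N - (\mathit{allsucc}(n_s) \cup \{n_s\})):\ n_{sel} = 1 \Rightarrow \sum_{m \in \mathit{succ}(n)} m_{sel} \geq 1$

$\forall n \in (N - (\mathit{allsucc}(n_s) \cup \{n_s\})):\ \sum_{m \in \mathit{succ}(n)} m_{sel} = 0 \Rightarrow n_{sel} = 0$

$\forall n \in \mathit{allsucc}(n_s):\ n_{sel} = 1 \Rightarrow \sum_{m \in (\mathit{pred}(n) \cap (\mathit{allsucc}(n_s) \cup \{n_s\}))} m_{sel} \geq 1$

$\forall n \in \mathit{allsucc}(n_s):\ \sum_{m \in (\mathit{pred}(n) \cap (\mathit{allsucc}(n_s) \cup \{n_s\}))} m_{sel} = 0 \Rightarrow n_{sel} = 0$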
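The selection constraints on this slide can be illustrated with a small checker. The sketch below is not the JaCoP model; it only tests whether a given 0/1 selection satisfies the implications, assuming an acyclic graph stored as an adjacency matrix. The names Graph, mark_succ and selection_is_consistent are hypothetical. Each pair of implications is a statement and its contrapositive, so one test per node is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical graph type: edge[u][v] is true when there is an edge u -> v. */
typedef struct {
    int n;
    bool edge[MAX_NODES][MAX_NODES];
} Graph;

/* Mark every node reachable from `from` along directed edges (allsucc). */
static void mark_succ(const Graph *g, int from, bool reach[MAX_NODES]) {
    for (int v = 0; v < g->n; v++) {
        if (g->edge[from][v] && !reach[v]) {
            reach[v] = true;
            mark_succ(g, v, reach);
        }
    }
}

/* Return true when every selected node other than the seed is "supported":
 * - a node in allsucc(seed) needs a selected predecessor that is the seed
 *   or also lies in allsucc(seed);
 * - any other node needs at least one selected successor. */
bool selection_is_consistent(const Graph *g, int seed, const bool sel[MAX_NODES]) {
    assert(g->n <= MAX_NODES && seed >= 0 && seed < g->n);
    bool below[MAX_NODES];
    memset(below, 0, sizeof below);
    mark_succ(g, seed, below);

    for (int n = 0; n < g->n; n++) {
        if (n == seed || !sel[n]) continue;
        int support = 0;
        for (int m = 0; m < g->n; m++) {
            if (!sel[m]) continue;
            if (below[n]) {
                if (g->edge[m][n] && (below[m] || m == seed)) support++;
            } else {
                if (g->edge[n][m]) support++;
            }
        }
        if (support == 0) return false;
    }
    return true;
}
```

For a graph with edges 0 -> 1, 1 -> 2 and 3 -> 2 and seed node 1, selecting {0, 1, 2} passes the check, while selecting {1, 3} fails because node 3 has no selected successor.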
Pattern Generation (cont’d)
DIPS ← ∅
for each ns ∈ N
    TPS ← ∅
    CPS ← FindAllPatterns(G, ns)
    for each p ∈ CPS
        if ∀pattern ∈ TPS : p ≢ pattern
            TPS ← TPS ∪ {p}
            NMPp ← |FindAllMatches(G, p)|
    NMPns ← |FindAllMatches(G, ns)|
    for each p ∈ TPS
        if coef · NMPns ≤ NMPp
            DIPS ← DIPS ∪ {p}
return DIPS
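The final filtering step of the algorithm above keeps a pattern only if it has enough matches relative to its seed node. A minimal sketch of that test, assuming the match counts have already been computed elsewhere; the Candidate type and filter_patterns function are hypothetical names, not part of DURASE:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one non-isomorphic candidate pattern from TPS. */
typedef struct {
    int id;            /* illustrative pattern identifier */
    int match_count;   /* NMP_p = |FindAllMatches(G, p)|, computed elsewhere */
} Candidate;

/* Copy into `kept` every candidate with coef * seed_matches <= match_count.
 * `kept` must have room for n entries. Returns the number of patterns kept. */
size_t filter_patterns(const Candidate *tps, size_t n, double coef,
                       int seed_matches, Candidate *kept) {
    assert(coef >= 0.0 && seed_matches >= 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (coef * seed_matches <= tps[i].match_count) {
            kept[count++] = tps[i];
        }
    }
    return count;
}
```

With coef = 0 every candidate is kept; larger values of coef discard patterns that occur rarely compared with their seed node.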
Pattern Selection
[Figure: application graph with multiplication nodes *0 ... *7, *14 ... *17, *20 ... *23 and addition nodes +8 ... +13, +18, +19, +24 ... +27, shown together with the identified patterns built from * and + nodes.]
// Inputs: G=(N,E) -- application graph,
// DIPS -- Definitively Identified Pattern Set
// Mp -- set of matches for pattern p,
// M -- set of all matches,
// matchesn -- set of matches that could cover the node n,
M ← ∅
for each p ∈ DIPS
    Mp ← FindAllMatches(G, p)
    M ← M ∪ Mp
for each m ∈ M
    for each n ∈ m
        matchesn ← matchesn ∪ {m}
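The second loop of the pseudocode builds, for each node, the list of matches that could cover it. A minimal sketch with fixed-size arrays; the Match and NodeIndex types and the size limits are assumptions made for illustration:

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES   32
#define MAX_MATCHES 32

/* Hypothetical match record: the graph nodes covered by one match. */
typedef struct {
    int node_count;
    int nodes[8];
} Match;

/* covers[n] lists the matches that could cover node n; count[n] is its length. */
typedef struct {
    int count[MAX_NODES];
    int covers[MAX_NODES][MAX_MATCHES];
} NodeIndex;

/* For each match m and each node n in m, add m to the list of node n. */
void build_node_index(const Match *matches, int match_count, NodeIndex *idx) {
    assert(match_count <= MAX_MATCHES);
    memset(idx, 0, sizeof *idx);
    for (int m = 0; m < match_count; m++) {
        for (int k = 0; k < matches[m].node_count; k++) {
            int n = matches[m].nodes[k];
            assert(n >= 0 && n < MAX_NODES);
            idx->covers[n][idx->count[n]++] = m;
        }
    }
}
```

A covering model can then require that each node is covered by exactly one selected match from its list.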
[Figure: table relating all matches (m0 ... m9) to all nodes (n0 ... n6); a * marks a match that could cover a node.]
Pattern Selection and Scheduling
Match selection - optimize execution time

$ExecutionTime = \sum_{m \in M} m_{sel} \cdot m_{delay}$

Match delay defined by constraints

$m_{delay} = \delta_{in_m} + \delta_m + \delta_{out_m}$
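The two formulas can be evaluated directly for a fixed selection. In the sketch below the three delay fields are read as input transfer, execution and output transfer times, which is an assumption; the slide only names the symbols. MatchCost, match_delay and execution_time are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool selected;   /* m_sel */
    int delay_in;    /* delta_in_m (assumed: input transfer) */
    int delay_exec;  /* delta_m (assumed: execution) */
    int delay_out;   /* delta_out_m (assumed: output transfer) */
} MatchCost;

/* m_delay = delta_in_m + delta_m + delta_out_m */
int match_delay(const MatchCost *m) {
    assert(m->delay_in >= 0 && m->delay_exec >= 0 && m->delay_out >= 0);
    return m->delay_in + m->delay_exec + m->delay_out;
}

/* ExecutionTime = sum over all matches of m_sel * m_delay */
int execution_time(const MatchCost *matches, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (matches[i].selected) total += match_delay(&matches[i]);
    }
    return total;
}
```

In the constraint model the selection variables are decided by the solver while this sum is minimized; here they are simply given.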
Scheduling Example - FIR filter

[Figure: FIR filter graph of integer mul, add and shr nodes with in/out ports, covered by the selected matches M_3, M_4, M_5, M_6, M_7, M_18 and M_20.]
Scheduling Example (cont’d)
[Figure: two schedules of the FIR filter example, each with one row for the NIOSII processor and one for the extensions. The first time axis runs from 0 to 20 and the second from 0 to 15. Scheduled items: in, out and mul operations and the matches M_3, M_4, M_5, M_6, M_7, M_18 and M_20.]
Pattern Merging
[Figure: merging of two patterns, one with nodes +0, +1, *2, +3, *4 and one with nodes +5, +6, *7.
a) original strategy: merged nodes +0/+5, +1/+6 and *4/*7, with multiplexers mux0, mux1 and mux3.
b) our method: merged nodes +0/+6, +3/+5 and *4/*7.]
Pattern Merging - compatibility graph

[Figure: compatibility graph with the following vertices.
*2/*7, *4/*7
+0/+5, +0/+6, +1/+5, +1/+6, +3/+5, +3/+6
(*2,+1)/(*7,+6), (*4,+1)/(*7,+6)
(+3,+0)/(+5,+6), (+1,+0)/(+5,+6)
(*2,+1,+0)/(*7,+6), (*4,+1,+0)/(*7,+6)]
Pattern Merging - additional constraints
Critical path constraints

$Delay_u = Latency(u) \cdot sel_u$

$\forall (u, v) \in E:\ Start_u + Delay_u \leq Start_v$

$\forall u \in Out:\ Start_u + Delay_u \leq CPL$
Number of multiplexers on critical path
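The critical path constraints on this slide can be checked for given start times as follows. This is a minimal sketch for a fixed assignment, not the constraint model itself; Edge, delay and meets_critical_path are hypothetical names, and nodes are assumed to be numbered from 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int from, to; } Edge;

/* Delay_u = Latency(u) * sel_u */
static int delay(const int *latency, const bool *sel, int u) {
    return sel[u] ? latency[u] : 0;
}

/* Check Start_u + Delay_u <= Start_v for every edge (u, v), and
 * Start_u + Delay_u <= CPL for every output node u. */
bool meets_critical_path(const int *start, const int *latency, const bool *sel,
                         const Edge *edges, size_t edge_count,
                         const int *outputs, size_t output_count, int cpl) {
    assert(cpl >= 0);
    for (size_t i = 0; i < edge_count; i++) {
        int u = edges[i].from, v = edges[i].to;
        if (start[u] + delay(latency, sel, u) > start[v]) return false;
    }
    for (size_t i = 0; i < output_count; i++) {
        int u = outputs[i];
        if (start[u] + delay(latency, sel, u) > cpl) return false;
    }
    return true;
}
```

For a chain of three nodes with latencies 2, 1, 1 and start times 0, 2, 3, the check passes with CPL = 4 and fails with CPL = 3.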
Results
Results obtained for MediaBench and MiBench benchmark sets compiled for the NIOS target processor with the DURASE system.

Patterns with 2 inputs / 1 output
                                                       model A                           model B
Benchmark                nodes cycles coef identified  selected coverage cycles speedup  selected coverage cycles speedup
JPEG Write BMP Header       34     34    0          6         2      82%     14    2.42         2      82%     14    2.42
JPEG Smooth Downsample      66     78    0          5         2      19%     68    1.14         2      19%     68    1.14
JPEG IDCT                  250    302  0.5         28        10      76%    214    1.41        10      76%    134    2.25
EPIC Collapse              274    287    0         11         8      68%    165    1.74         8      68%    165    1.74
BLOWFISH encrypt           201    169  0.5         11         3      74%     90    1.87         3      74%     90    1.87
SHA transform               53     57    0          5         3      64%     28    2.03         3      64%     28    2.03
MESA invert matrix         152    334  0.5          2         2      10%    320    1.04         2      10%    320    1.04
FIR unrolled                67    131    0          3         2       9%    126    1.04         2       9%    126    1.04
FFT                         10     18    0          0         -        -      -       -         -        -      -       -
Average                                                              50%            1.5                50%            1.7

Patterns with 4 inputs / 2 outputs
                                                       model A                           model B
Benchmark                nodes cycles coef identified  selected coverage cycles speedup  selected coverage cycles speedup
JPEG Write BMP Header       34     34    0         66         2      88%     12    2.83         3      88%     12    2.83
JPEG Smooth Downsample      66     78    0         49         4      95%     44    1.77         4     100%     35    2.22
JPEG IDCT                  250    302  0.5        254        13      83%    141    2.36        15      89%    112    2.69
EPIC Collapse              274    287    0        111        11      71%    156    1.83        14      71%    159     1.8
BLOWFISH encrypt           201    169    0        153         8      90%     81    2.08         7      92%     73    2.31
SHA transform               53     57    0         48         8      98%     22    2.59         6      95%     17    3.35
MESA invert matrix         152    334  0.5         53         9      65%    262    1.27         9      65%    243    1.37
FIR unrolled                67    131    1         10         2      94%     98    1.30         2      97%     67    1.95
FFT                         10     18    0         12         2      60%     10    1.80         2      60%     10    1.80
Average                                                              83%              2                84%            2.3
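The speedup column appears to be the cycle count without extensions divided by the cycle count with them. This definition is an assumption inferred from the numbers, not stated on the slide. It reproduces, for example, the SHA transform entry for 4 in / 2 out, model B (57 / 17, about 3.35) and the FFT entry (18 / 10 = 1.8). The function name speedup is hypothetical.

```c
#include <assert.h>

/* Assumed definition: baseline cycles divided by cycles with the extension. */
double speedup(int baseline_cycles, int accelerated_cycles) {
    assert(accelerated_cycles > 0);
    return (double)baseline_cycles / accelerated_cycles;
}
```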
Results (cont’d)
Rijndael and GSM encoders for patterns with a 7-node limit.

Application                          |V|  identified  selected  coverage  speed-up (Seq.)  speed-up (Par.)
Part of Rijndael encryption encoder  106          10         6       75%              1.9              2.9
Part of GSM encoder                  604          11         7       66%              2.1              3.4
Conclusions
Constraints make it possible to explore solutions that are difficult to examine using specific algorithms.
Constraints provide flexibility in defining different conditions.
(Sub-)graph isomorphism constraints offer an easy way to define design problems.
Experimental results are very encouraging.
Further Reading
Ch. Wolinski and K. Kuchcinski. Automatic selection of application-specific reconfigurable processor extensions. In Proc. Design Automation and Test in Europe, Munich, Germany, March 10-14, 2008.

Ch. Wolinski, K. Kuchcinski, K. Martin, E. Raffin, and F. Charot. How constrains programming can help you in the generation of optimized application specific reconfigurable processor extensions. In Proc. of The Intl. Conference on Engineering of Reconfigurable Systems and Algorithms, Las Vegas, USA, (Invited paper), July 13-16, 2009.

K. Martin, Ch. Wolinski, K. Kuchcinski, A. Floch, and F. Charot. Constraint-driven identification of application specific instructions in the DURASE system. In SAMOS IX: International Workshop on Systems, Architectures, Modeling and Simulation, Samos, Greece, July 20-23, 2009.