Improving Synthesis of Compressor Trees on FPGAs via Integer Linear Programming
Philip Brisk (2), Paolo Ienne (2), Hadi Parandeh-Afshar (1,2)
1: University of Tehran, ECE Department
2: EPFL, School of Computer and Communication Sciences
March 14, 2008 2
Outline
  Motivation
  Generalized Parallel Counters
  ILP Formulation
  Experimental Results
  Conclusion
Motivation: Why is multi-input addition important?
Partial product reduction in parallel multiplication
  Wallace and Dadda in the 1960s
Multi-input addition occurs in many multimedia and signal processing applications
  H.264/AVC Variable Block Size Motion Estimation
  FIR Filters
  3G Wireless Base Station Channel Cards
Flow graph transformations expose opportunities to use compressor trees in high-level synthesis [Verma and Ienne, ICCAD 2004]
Multi-Input Addition Implementation
ASIC
  Compressor Trees + Final Adder
  Counters are the basic blocks
  Wallace/Dadda/3-Greedy
FPGA
  Adder Trees
    Full Adder Implemented in CLB Structure
    Fast Carry-Chain (Xilinx and Altera) Reduces Routing Delay
  Compressor Trees have poor performance
    Fast carry chains cannot be used
    Counters are inflexible
GOAL: Better implementation of compressor trees on FPGAs
Generalized Parallel Counters (GPCs)
Parallel Counter: Sum bits with the same rank
Generalized Parallel Counter: Sum bits having different ranks
Example: (3; 2) Counter and (3, 3; 4) GPC
GPCs are more flexible and reduce the number of logic levels
GPCs are more complex, but the additional complexity is absorbed in LUTs!
GPCs are perfect building blocks to create better compressors out of FPGA LUTs
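The two examples can be made concrete with a small behavioral model. This is a minimal sketch, not the authors' implementation: the function name `gpc` and the LSB-first output order are assumptions made for illustration. Each input bit is weighted by 2^rank, and the sum is emitted as a binary word.

```python
from typing import List, Sequence


def gpc(bits_by_rank: Sequence[Sequence[int]], n_outputs: int) -> List[int]:
    """Behavioral model of a GPC (hypothetical helper, for illustration).

    bits_by_rank[r] holds the input bits of rank r (weight 2**r).
    Returns the sum as n_outputs bits, least significant bit first.
    """
    total = sum(sum(bits) << rank for rank, bits in enumerate(bits_by_rank))
    if total >= 1 << n_outputs:
        raise ValueError("sum does not fit in the requested number of outputs")
    return [(total >> k) & 1 for k in range(n_outputs)]


# (3; 2) counter: three bits of rank 0, two output bits.
counter_out = gpc([[1, 1, 1]], 2)           # 3 -> [1, 1]

# (3, 3; 4) GPC: three bits of rank 0 and three bits of rank 1, four outputs.
gpc_out = gpc([[1, 0, 1], [1, 1, 1]], 4)    # 2 + 2*3 = 8 -> [0, 0, 0, 1]
```

The largest value a (3, 3; 4) GPC can produce is 3 + 2*3 = 9, which is why four output bits are sufficient.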
GPC Implementation
[Figure: a GPC built from K-LUTs; the diagram carries the labels N and K]
Goal
How to best select GPC types and connect them to build a compressor tree
[Figure: diagram with a rank axis labeled 0 to 3]
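One way to picture the selection problem is as bookkeeping on column heights, where column r holds the number of bits of rank r still to be added. The sketch below is an illustration under stated assumptions, not the method from the talk: `apply_gpc` is a hypothetical helper, the starting heights are made up, and it ignores which level each bit belongs to. It only shows how one GPC removes bits from some columns and adds one output bit to each of several columns.

```python
from typing import List, Sequence


def apply_gpc(heights: List[int], base_rank: int,
              inputs_per_rank: Sequence[int], n_outputs: int) -> List[int]:
    """Return new column heights after one GPC is placed at base_rank.

    inputs_per_rank[k] bits are consumed from column base_rank + k, and one
    output bit is produced in each of columns base_rank .. base_rank + n_outputs - 1.
    """
    new = list(heights)
    width = max(len(new), base_rank + max(len(inputs_per_rank), n_outputs))
    new += [0] * (width - len(new))
    for offset, used in enumerate(inputs_per_rank):
        if new[base_rank + offset] < used:
            raise ValueError("column has too few bits for this GPC")
        new[base_rank + offset] -= used
    for offset in range(n_outputs):
        new[base_rank + offset] += 1
    return new


# Hypothetical starting point: four bits in each of ranks 0..3.
h0 = [4, 4, 4, 4]
h1 = apply_gpc(h0, 0, (3, 3), 4)   # (3, 3; 4) GPC on ranks 0 and 1 -> [2, 2, 5, 5]
h2 = apply_gpc(h1, 2, (3, 3), 4)   # (3, 3; 4) GPC on ranks 2 and 3 -> [2, 2, 3, 3, 1, 1]
```

After the second GPC every column holds at most three bits, which is the height a ternary final adder can accept.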
ILP Formulation
Objective Function
  Minimizing Levels of GPCs
GPC Representation in ILP
  [Figure: GPC block labeled with k_i = 0, k_i = 1 and k_j = 0, k_j = 1, k_j = 2]
ILP Formulation: Variables
p_{m,i,k_i} ∈ {0, 1}: True if there is a connection between the m-th input bit and an input of rank k_i of GPC_i.
q_{i,k_i,m} ∈ {0, 1}: True if there is a connection between the k_i-th output of GPC_i and an output bit of rank m.
e_{i,j,k_i,k_j} ∈ {0, 1}: True if there is a connection from the k_i-th output of GPC_i to an input of rank k_j of GPC_j.
D_{i,j} ∈ {0, 1}: True if there is a direct connection from the i-th input bit to an output bit of rank j.
[Figure: example network with input bits m0 to m3, GPC0, GPC1, GPC2, and output bits n0 to n3; labeled connections p_{0,0,0}, p_{1,0,1}, p_{2,1,0}, e_{1,2,0,1}, e_{0,2,1,0}, q_{0,0,0}, q_{2,1,1}, q_{1,2,2}, D_{3,3}]
ILP Formulation: Connection Rules
Circuit I/Os
  Each circuit input should be connected to either a GPC or the final adder
  Each output rank should be derived K times (K = 3; the final adder is a ternary adder)
GPC I/Os
  Satisfying number of allowable I/Os considering input ranks
Wires
  Satisfying rank constraints of source and destination of each wire
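The circuit I/O rules can be checked on the connections labeled in the variables figure (p_{0,0,0}, p_{1,0,1}, p_{2,1,0}, q_{0,0,0}, q_{2,1,1}, q_{1,2,2}, D_{3,3}). The sketch below is a hedged reading of the two rules rather than the exact ILP constraints: it assumes each circuit input is used exactly once, and it interprets the output rule as "at most K bits arrive at each output rank". The function names are hypothetical.

```python
from collections import Counter

K = 3  # the final adder is ternary, as stated on the slide

# Variables equal to 1 in the example network.
p = {(0, 0, 0), (1, 0, 1), (2, 1, 0)}   # (m, i, k_i): input bit m -> rank-k_i input of GPC i
q = {(0, 0, 0), (2, 1, 1), (1, 2, 2)}   # (i, k_i, m): output k_i of GPC i -> output rank m
D = {(3, 3)}                             # (i, j): input bit i -> output rank j directly


def inputs_connected_once(n_inputs, p, D):
    """Assumed rule: every circuit input feeds exactly one GPC input or the final adder."""
    uses = Counter(m for (m, _, _) in p) + Counter(i for (i, _) in D)
    return all(uses[m] == 1 for m in range(n_inputs))


def outputs_within_adder(n_ranks, q, D, k=K):
    """Assumed rule: no output rank receives more than k bits."""
    load = Counter(m for (_, _, m) in q) + Counter(j for (_, j) in D)
    return all(load[r] <= k for r in range(n_ranks))


inputs_ok = inputs_connected_once(4, p, D)
outputs_ok = outputs_within_adder(4, q, D)
```

In an ILP these checks would become linear constraints over the 0/1 variables, for example a sum over all p and D variables that share the same input bit.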
ILP Formulation: ILP Improvement
Using the heuristic of [Parandeh-Afshar et al., ASP-DAC 2008] to estimate the maximum number of GPCs at each level
A GPC on level L can only connect to inputs of GPCs on levels L+1 and L+2
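The effect of the level restriction on problem size can be sketched by counting the GPC-to-GPC pairs for which connection variables are still needed. This is an illustration only: the per-level bounds below are made-up numbers standing in for the heuristic's estimates, and `candidate_edges` is a hypothetical helper.

```python
from itertools import product


def allowed(src_level: int, dst_level: int) -> bool:
    """A GPC on level L may only feed GPCs on levels L+1 and L+2."""
    return dst_level - src_level in (1, 2)


def candidate_edges(gpcs_per_level):
    """gpcs_per_level[L] is an assumed upper bound on the number of GPCs on level L."""
    gpcs = [(lvl, idx) for lvl, n in enumerate(gpcs_per_level) for idx in range(n)]
    return [(a, b) for a, b in product(gpcs, gpcs) if allowed(a[0], b[0])]


bounds = [3, 2, 1]                 # hypothetical bounds for levels 0, 1, 2
edges = candidate_edges(bounds)    # 3*2 + 3*1 + 2*1 = 11 ordered pairs
all_pairs = sum(bounds) ** 2       # 36 ordered pairs without the restriction
```

With these bounds, only 11 of the 36 ordered GPC pairs remain, so far fewer e variables have to be created.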
Experimental Methodology
CPLEX ILP Solver
Altera Stratix-II
  90nm CMOS Technology
Implementations of multi-input addition
  Adder Tree: Ternary adder tree
    State of the art for FPGAs
  Heuristic: Mapping heuristic described in [13]
  ILP: ILP formulation described here
Experimental Results (Delay)
[Figure: bar chart of delay (ns), axis from 0 to 9, comparing Adder Tree, Heuristic, and ILP for each benchmark (including Motion Est., m12x12, and m16x16) and for the average]
ILP on average is:
  32% faster than Adder Tree
  5% faster than the Heuristic
Experimental Results (Area)
[Figure: bar chart of area (ALMs), axis from 0 to 350, comparing Adder Tree, Heuristic, and ILP for each benchmark (including Motion Est., m12x12, and m16x16) and for the average]
ILP on average consumes:
  3% fewer resources than Adder Tree
  13% fewer resources than the Heuristic
Conclusion
Conventional wisdom has held that adder trees outperform compressor trees on FPGAs
  Ternary adder trees were a major selling point of the Altera Stratix II architecture
Conventional wisdom is wrong!
  GPCs map nicely onto LUTs
  Compressor trees on FPGAs are faster than adder trees when built from GPCs