Improving Synthesis of Compressor Trees on FPGAs via Integer Linear Programming
Philip Brisk 2, Paolo Ienne 2, Hadi Parandeh-Afshar 1,2
1: University of Tehran, ECE Department; 2: EPFL, School of Computer and Communication Sciences




Page 1:

Philip Brisk 2
Paolo Ienne 2
Hadi Parandeh-Afshar 1,2

1: University of Tehran, ECE Department
2: EPFL, School of Computer and Communication Sciences

Improving Synthesis of Compressor Trees on FPGAs via Integer Linear Programming

Page 2:

March 14, 2008

Outline
- Motivation
- Generalized Parallel Counters
- ILP Formulation
- Experimental Results
- Conclusion


Page 4:

Motivation: Why is multi-input addition important?
- Partial product reduction in parallel multiplication
  - Wallace and Dadda in the 1960s
- Multi-input addition occurs in many multimedia and signal processing applications
  - H.264/AVC Variable Block Size Motion Estimation
  - FIR Filters
  - 3G Wireless Base Station Channel Cards
- Flow graph transformations expose opportunities to use compressor trees in high-level synthesis [Verma and Ienne, ICCAD 2004]

Page 5:

Multi-Input Addition Implementation
- ASIC
  - Compressor Trees + Final Adder
  - Counters are the basic blocks
  - Wallace/Dadda/3-Greedy
- FPGA
  - Adder Trees
    - Full Adder Implemented in CLB Structure
    - Fast Carry-Chain (Xilinx and Altera)
    - Reduces Routing Delay
  - Compressor Trees have poor performance
    - Fast carry chains cannot be used
    - Counters are inflexible
- GOAL: Better implementation of compressor trees on FPGAs


Page 7:

Generalized Parallel Counters (GPCs)
- Parallel Counter: Sum bits with the same rank
- Generalized Parallel Counter: Sum bits having different ranks
- Example: (3; 2) Counter, (3, 3; 4) GPC
- GPCs are more flexible and reduce the number of logic levels
- GPCs are more complex, but the additional complexity is absorbed in LUTs!
- GPCs are perfect building blocks to create better compressors out of FPGA LUTs
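To make the (3; 2) and (3, 3; 4) notation concrete, here is a minimal Python sketch. It assumes the usual reading of the notation: a GPC written (k_{n-1}, ..., k_0; m) takes k_j input bits of rank j and produces their weighted sum as an m-bit number. The helper names `gpc_sum` and `output_bits_needed` and their argument layout are illustrative and not from the slides.

```python
from itertools import product

def gpc_sum(bits_by_rank):
    """Weighted sum of a GPC's inputs.

    bits_by_rank[j] is the list of input bits of rank j (weight 2**j),
    listed from rank 0 upwards. Hypothetical helper for illustration.
    """
    return sum(bit << rank for rank, bits in enumerate(bits_by_rank) for bit in bits)

def output_bits_needed(counts_by_rank):
    """Number of output bits needed to represent the largest possible sum."""
    max_value = sum(count << rank for rank, count in enumerate(counts_by_rank))
    return max_value.bit_length()

# (3; 2) counter: three rank-0 bits, two output bits.
assert output_bits_needed([3]) == 2
# (3, 3; 4) GPC: three rank-0 bits and three rank-1 bits, four output bits.
assert output_bits_needed([3, 3]) == 4

# Exhaustive check of the (3, 3; 4) GPC over all 64 input patterns.
for pattern in product((0, 1), repeat=6):
    rank0, rank1 = list(pattern[:3]), list(pattern[3:])
    assert 0 <= gpc_sum([rank0, rank1]) <= 9  # 3*1 + 3*2 = 9 < 2**4
```

The (3, 3; 4) GPC takes six bits and returns four, whereas a (3; 2) counter takes three bits and returns two, which is one way to see why GPCs can reduce the number of logic levels.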

Page 8:

GPC Implementation

[Figure: a GPC implemented with K-LUTs; diagram labels: N, K]
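A sketch of why the extra complexity of a GPC can be absorbed in LUTs: assuming a GPC has at most K inputs, each output bit is a Boolean function of those inputs and can be stored as one K-input LUT truth table. The helper `gpc_lut_tables` and its bit ordering are illustrative assumptions; how a particular FPGA tool packs these LUTs is not covered here.

```python
def gpc_lut_tables(counts_by_rank, n_outputs):
    """Return one truth table (list of 0/1) per GPC output bit.

    counts_by_rank[j] = number of inputs of rank j (rank 0 first).
    Table index: the input pattern read as a binary number, with input 0
    as the least significant bit. Hypothetical helper for illustration.
    """
    # One weight per input pin: 2**rank for every input of that rank.
    weights = [1 << rank for rank, count in enumerate(counts_by_rank) for _ in range(count)]
    n_inputs = len(weights)
    tables = [[0] * (1 << n_inputs) for _ in range(n_outputs)]
    for index in range(1 << n_inputs):
        total = sum(w for pos, w in enumerate(weights) if (index >> pos) & 1)
        for out in range(n_outputs):
            tables[out][index] = (total >> out) & 1
    return tables

# (3, 3; 4) GPC: 6 inputs, so each of the 4 outputs fits one 6-input LUT.
tables = gpc_lut_tables([3, 3], 4)
print(len(tables), len(tables[0]))  # number of tables, entries per table
```

For the (3; 2) counter the same function yields the familiar full-adder tables: XOR for the sum bit and majority for the carry bit.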

Page 9:

Goal
- How to best select GPC types and connect them to build a compressor tree

[Figure: input bits arranged in columns by rank; rank axis labeled 0 to 3]
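The selection problem can be pictured on column heights, where `heights[r]` is the number of bits of rank r still to be summed. The sketch below, under assumed names (`apply_gpc`, a GPC described by its input counts per rank plus an output count), shows how placing one GPC changes those heights. It only models a single placement and requires every GPC input to be fed by a real bit; choosing which GPCs to place and how to connect them is the actual problem stated above.

```python
def apply_gpc(heights, counts_by_rank, n_outputs, base_rank):
    """Place one GPC with its rank-0 inputs aligned to column base_rank.

    Returns (remaining, produced): the bits left uncovered in the current
    level and the output bits handed to the next level, both as
    per-column counts. All names here are hypothetical.
    """
    size = max(len(heights), base_rank + max(len(counts_by_rank), n_outputs))
    remaining = list(heights) + [0] * (size - len(heights))
    produced = [0] * size
    for offset, count in enumerate(counts_by_rank):
        column = base_rank + offset
        if remaining[column] < count:
            raise ValueError(f"column {column} has too few bits")
        remaining[column] -= count
    # One output bit per rank, starting at the GPC's base rank.
    for offset in range(n_outputs):
        produced[base_rank + offset] += 1
    return remaining, produced

# Four columns (ranks 0..3) holding 3 bits each; place a (3, 3; 4) GPC at rank 0.
remaining, produced = apply_gpc([3, 3, 3, 3], [3, 3], 4, base_rank=0)
print(remaining)  # [0, 0, 3, 3]
print(produced)   # [1, 1, 1, 1]
```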



Page 15:

ILP Formulation
- Objective Function
  - Minimizing Levels of GPCs
- GPC Representation in ILP

[Figure: GPC representation in the ILP; diagram labels: ki = 0, ki = 1; kj = 0, kj = 1, kj = 2]

Page 16:

ILP Formulation: Variables

$p_{m,i,k_i} \in \{0, 1\}$: True if there is a connection between the m-th input bit and an input of rank $k_i$ of $GPC_i$.

[Figure: example network with input bits m0 to m3, output bits n0 to n3, and GPC0, GPC1, GPC2; labeled connections: p0,0,0; p1,0,1; p2,1,0; e0,2,1,0; e1,2,0,1; q0,0,0; q2,1,1; q1,2,2; D3,3]

Page 17:

ILP Formulation: Variables

$q_{i,k_i,m} \in \{0, 1\}$: True if there is a connection between the $k_i$-th output of $GPC_i$ and an output bit of rank m.

[Figure: same example network as on Page 16]

Page 18:

ILP Formulation: Variables

$e_{i,j,k_i,k_j} \in \{0, 1\}$: True if there is a connection from the $k_i$-th output of $GPC_i$ to an input of rank $k_j$ of $GPC_j$.

[Figure: same example network as on Page 16]

Page 19:

ILP Formulation: Variables

$D_{i,j} \in \{0, 1\}$: True if there is a direct connection from the i-th input bit to an output bit of rank j.

[Figure: same example network as on Page 16]

Page 20:

ILP Formulation: Connection rules
- Circuit I/Os
  - Each circuit input should be connected to either a GPC or the final adder
  - Each output rank should be derived K times (K = 3; the final adder is a ternary adder)
- GPC I/Os
  - Satisfying the number of allowable I/Os, considering input ranks
- Wires
  - Satisfying the rank constraints of the source and destination of each wire
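One plausible way to write the two circuit I/O rules with the variables p, q, and D defined on Pages 16 to 19 is sketched below. The exact constraints of the formulation are not shown on the slides, so the index sets, the symbol n for an output rank, and the use of an upper bound for the output rule are all assumptions.

```latex
% Assumed reading: each circuit input bit m drives exactly one GPC input
% or goes directly to one final-adder input.
\sum_{i} \sum_{k_i} p_{m,i,k_i} + \sum_{j} D_{m,j} = 1 \qquad \forall m

% Assumed reading: each output rank n receives at most K = 3 bits,
% because the final adder is a ternary adder.
\sum_{i} \sum_{k_i} q_{i,k_i,n} + \sum_{m} D_{m,n} \le K, \qquad K = 3, \qquad \forall n
```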

Page 21:

ILP Formulation: ILP Improvement
- Using the heuristic of [Parandeh-Afshar et al., ASP-DAC 2008] to estimate the maximum number of GPCs at each level
- A GPC on level L can only connect to inputs of GPCs on levels L+1 and L+2
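The level restriction shrinks the model because a connection variable between two GPCs is only needed when the destination sits one or two levels after the source. A minimal sketch, assuming the per-level GPC bounds are already known (on the slide they come from the heuristic) and using hypothetical names and illustrative numbers:

```python
def allowed_edges(gpcs_per_level, max_skip=2):
    """List (source, destination) GPC pairs permitted by the level rule.

    gpcs_per_level[L] = assumed maximum number of GPCs on level L.
    A GPC on level L may feed GPCs on levels L+1 .. L+max_skip only.
    GPCs are identified as (level, index) tuples; names are hypothetical.
    """
    last_level = len(gpcs_per_level) - 1
    edges = []
    for src_level, src_count in enumerate(gpcs_per_level):
        for dst_level in range(src_level + 1, min(src_level + max_skip, last_level) + 1):
            for s in range(src_count):
                for d in range(gpcs_per_level[dst_level]):
                    edges.append(((src_level, s), (dst_level, d)))
    return edges

levels = [4, 3, 2, 1]           # illustrative bounds, not taken from the paper
pruned = len(allowed_edges(levels))
total = sum(levels)
unpruned = total * (total - 1)  # every ordered pair of distinct GPCs
print(pruned, unpruned)         # 31 90
```

Each permitted pair still expands into one e variable per combination of output and input rank, so the saving in variables is larger than the pair counts alone suggest.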


Page 23:

Experimental Methodology
- CPLEX ILP Solver
- Altera Stratix-II
  - 90nm CMOS Technology
- Implementations of multi-input addition
  - Adder Tree: Ternary adder tree
    - State of the art for FPGAs
  - Heuristic: Mapping heuristic described in [13]
  - ILP: ILP formulation described here

Page 24:

Experimental Results (Delay)

[Figure: bar chart of delay (ns), y-axis from 0 to 9, comparing Adder Tree, Heuristic, and ILP per benchmark; x-axis labels as extracted: adpc, mad, d2Q, fir3, G72x_, 2mac, Motion Est., m12x12, m16x16, RQGQBQ, RYGYBYsa, mul, Average]

ILP on average is:
- 32% faster than Adder Tree
- 5% faster than the Heuristic

Page 25:

Experimental Results (Area)

[Figure: bar chart of area (ALMs), y-axis from 0 to 350, comparing Adder Tree, Heuristic, and ILP per benchmark; x-axis labels as extracted: adpc, mad, d2Q, fir3, G72x_, 2mac, Motion Est., m12x12, m16x16, RQGQBQ, RYGYBYsa, mul, Average]

ILP on average consumes:
- 3% fewer resources than Adder Tree
- 13% fewer resources than the Heuristic


Page 27:

Conclusion
- Conventional wisdom has held that adder trees outperform compressor trees on FPGAs
  - Ternary adder trees were a major selling point of the Altera Stratix II architecture
- Conventional wisdom is wrong!
  - GPCs map nicely onto LUTs
  - Compressor trees on FPGAs are faster than adder trees when built from GPCs