Recent Advances in Cut-based FPGA Technology Mapping
Kevin Chung
April 3, 2009

Oct 12, 2020


Preamble

• Logic synthesis and verification research is alive and vibrant
• FPGAs are growing fast: scalability in runtime and memory is paramount

Outline

1. Review of Cut-based Mapping
2. More Efficient Cut Computation
3. Lossless Synthesis
4. Priority Cuts
5. Area Recovery
6. WireMap

Cut-based Mapping Algorithm

Input: And-Inverter Graph (AIG)

1. Compute all K-feasible cuts
2. Compute the best arrival time at each node
   • In topological order (from PI to PO)
   • Assuming that each cut maps to a K-LUT
   • Assuming that each K-LUT has unit delay
3. Choose the best cover
   • In reverse topological order (from PO to PI)

Output: Mapped Netlist
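The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind these slides: the AIG is assumed to be a dict `fanins` mapping each AND node to its two fan-ins (inversions are ignored because they do not change the cuts), `order` is assumed to be a topological order, and the function names `enumerate_cuts` and `map_for_depth` are hypothetical.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """Step 1: all K-feasible cuts of every node (nodes visited from PI to PO)."""
    cuts = {}
    for n in order:
        if n not in fanins:
            # Primary input: only the trivial cut {n}.
            cuts[n] = {frozenset([n])}
            continue
        a, b = fanins[n]
        merged = {ca | cb
                  for ca, cb in product(cuts[a], cuts[b])
                  if len(ca | cb) <= k}
        merged.add(frozenset([n]))  # trivial cut of the node itself
        cuts[n] = merged
    return cuts


def map_for_depth(fanins, order, outputs, k):
    cuts = enumerate_cuts(fanins, order, k)
    arrival, best = {}, {}
    # Step 2: best arrival time, assuming one unit-delay K-LUT per cut.
    for n in order:
        if n not in fanins:
            arrival[n] = 0
            continue
        candidates = [c for c in cuts[n] if c != frozenset([n])]
        best[n] = min(
            candidates,
            key=lambda c: (max(arrival[l] for l in c), len(c), sorted(c)),
        )
        arrival[n] = 1 + max(arrival[l] for l in best[n])
    # Step 3: choose the cover, walking back from the outputs.
    cover = {}
    stack = [o for o in outputs if o in fanins]
    while stack:
        n = stack.pop()
        if n in cover:
            continue
        cover[n] = best[n]
        stack.extend(l for l in best[n] if l in fanins)
    return arrival, cover
```

Only nodes reached from the outputs in step 3 become LUTs; every other node is absorbed inside some selected cut.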

Cut-based Mapping Advantages

• Advantages
  – Cuts have direct correspondence to LUTs
    • Easy to create LUT-based cost functions, e.g. different LUT input delays, output switching activity
  – Cut computation is fast and simple
  – Dynamic programming mapping solution
    • guarantees optimal delay
    • efficient search of LUT design space

Cut-based Mapping Challenges

• Feasible cuts grow quickly with respect to LUT size
• Results depend upon AIG netlist structure
  – many possible equivalent AIG structures
  – logic restructuring optimizations that work well for one part of the design may not give a good mapping for another

K    Avg # of cuts per node
4      8
5     16
6     38
7     95
8    240

Outline

1. Review of Cut-based Mapping
2. More Efficient Cut Computation
   • Cut Dropping
   • Cut Dominance
3. Lossless Synthesis
4. Priority Cuts
5. Area Recovery
6. WireMap

Cut Dropping

During bottom-up computation of cuts, the set of cuts of a node can be freed once all its fan-outs have been processed.

[Figure: AIG with inputs a, b, c; node p over a and b, node q over b and c, node r over p and q; cuts are computed bottom-up]

Cuts(p) = { {p}, {a, b} }
Cuts(q) = { {q}, {b, c} }      (can delete these cuts once node r is done)
Cuts(r) = { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }

• Once the cuts of node r are computed, the cuts of q are no longer needed
• But the cuts of node p cannot be discarded, since not all fan-outs of p have been processed
• Dramatically reduces peak memory consumption on large designs
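Cut dropping amounts to reference counting on fan-outs. The sketch below is an illustration only: it assumes the AIG is a dict `fanins` from each AND node to its two fan-ins, and the name `cuts_with_dropping` is hypothetical. A real mapper would record each node's best cut before its cut set is freed; here the function just returns the cut sets that are still alive at the end, plus the peak number of stored cuts.

```python
from itertools import product


def cuts_with_dropping(fanins, order, k):
    # remaining[f] = number of fan-outs of f that have not been processed yet
    remaining = {}
    for pair in fanins.values():
        for f in set(pair):
            remaining[f] = remaining.get(f, 0) + 1

    live, peak = {}, 0
    for n in order:
        if n in fanins:
            a, b = fanins[n]
            live[n] = {ca | cb
                       for ca, cb in product(live[a], live[b])
                       if len(ca | cb) <= k}
            live[n].add(frozenset([n]))
        else:
            live[n] = {frozenset([n])}
        peak = max(peak, sum(len(s) for s in live.values()))
        # Free the cut set of every fan-in whose last fan-out was just processed.
        for f in set(fanins.get(n, ())):
            remaining[f] -= 1
            if remaining[f] == 0:
                del live[f]
    return live, peak
```

With an extra node s over p and c added to the slide's example, the cuts of q are freed right after r is processed, while the cuts of p survive until s is done.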

Cuts Behaving Badly

Bottom-up cut computation in the presence of re-convergence might produce dominated cuts.

x = ~a + a.b + ~b.c

[Figure: AIG over inputs a, b, c with internal nodes d, e, f and output x]

Cuts(f) = { .. {d, b, c} .. {a, b, c} .. }
Cuts(x) = { .. {a, d, b, c} .. {a, b, c} .. }

Cut {a, b, c} dominates cut {a, d, b, c}.

• The "good" cut {a, b, c} is there, so this is not a quality issue
• But the "bad" cut {a, d, b, c} may be propagated further, so it is a run-time issue
• Want to discard dominated cuts quickly

Signature-based Dominance

Problem: Given two cuts, how to quickly determine whether one is a subset of another

Define the signature of a cut:

sig(c) = \sum_{n \in c} 2^{ID(n) \bmod 32}

where ID(n) is the integer id of node n (Σ means bit-wise OR)

Observation: If cut c1 dominates cut c2, then

sig(c1) OR sig(c2) = sig(c2)

This is a cheap test for the common case that a cut does not dominate another. Only when the signature test does not rule out dominance is an actual comparison made.
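A small Python sketch of the signature test follows. It assumes a cut is a set of integer node ids; the names `signature` and `dominates` are hypothetical. Because ids are reduced mod 32, two different nodes can share a bit, so a passing signature test is followed by an exact subset check.

```python
def signature(cut):
    """Bit-wise OR of 2^(id mod 32) over the nodes of the cut."""
    sig = 0
    for node_id in cut:
        sig |= 1 << (node_id % 32)
    return sig


def dominates(c1, c2):
    """True if c1 is a subset of c2, i.e. c1 dominates c2."""
    s1, s2 = signature(c1), signature(c2)
    if s1 | s2 != s2:
        # Some bit of c1 is missing from c2: c1 cannot be a subset.
        return False
    # Signatures can alias, so confirm with an exact comparison.
    return c1 <= c2
```

In a real cut manager the signature would be stored with each cut rather than recomputed on every comparison.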

Example

• Let the node ids be a = 1, b = 2, c = 3, d = 4
• Let c1 = {a, b, c} and c2 = {a, d, b, c}
• sig(c1) = 2^1 OR 2^2 OR 2^3
          = 00010 OR 00100 OR 01000
          = 01110
• sig(c2) = 2^1 OR 2^4 OR 2^2 OR 2^3
          = 00010 OR 10000 OR 00100 OR 01000
          = 11110
• As sig(c1) OR sig(c2) ≠ sig(c1), c2 does not dominate c1
• But sig(c1) OR sig(c2) = sig(c2), so c1 may dominate c2

Run-time of K-feasible Cut Computation

                          K = 4          K = 5          K = 6          K = 7          K = 8
Name           N     C/N   T, s     C/N   T, s     C/N   T, s     C/N   T, s     C/N   T, s  L/N, %
alu4        2642     6.7   0.00    12.3   0.01    23.1   0.04    45.5   0.18    94.7   1.02    0.00
apex2       2940     7.2   0.01    14.2   0.02    29.2   0.07    62.6   0.32   139.7   1.90    0.00
apex4       2017     8.5   0.00    19.5   0.03    47.0   0.10   116.3   0.62   293.5   4.49    0.10
bigkey      3080     6.6   0.01    12.1   0.02    24.2   0.05    50.1   0.20    99.7   0.84    0.00
clma       11869     8.1   0.04    18.2   0.11    44.4   0.51   114.9   3.01   306.3  20.99    1.64
des         3020     8.0   0.01    17.0   0.03    38.7   0.12    92.0   0.69   218.0   4.80    4.37
diffeq      2566     6.5   0.01    12.3   0.01    26.6   0.07    65.0   0.50   155.9   2.80    3.66
dsip        2521     6.2   0.01    10.7   0.01    20.7   0.03    42.0   0.10    86.7   0.44    0.00
elliptic    5502     6.4   0.01    10.6   0.03    18.5   0.07    36.9   0.33    83.4   2.12    0.20
ex1010      7652     9.2   0.02    23.3   0.11    61.8   0.61   165.8   4.01   438.2  30.43    1.99
ex5p        1719     9.4   0.01    24.1   0.02    66.2   0.17   188.2   1.30   514.8  10.50   14.14
frisc       5905     7.1   0.01    14.4   0.04    32.3   0.16    79.8   0.88   209.0   6.30    1.24
misex3      2441     7.7   0.01    15.7   0.02    33.3   0.08    73.7   0.38   170.7   2.48    0.00
pdc         7527     9.4   0.03    24.8   0.12    67.4   0.68   183.7   4.41   489.4  31.71    4.40
s298        2514     7.9   0.00    17.5   0.02    44.0   0.13   121.9   0.94   346.5   7.10    7.56
s38417     12867     6.6   0.03    13.5   0.10    32.0   0.46    83.1   3.24   225.9  23.72    3.38
s38584     11074     6.1   0.03    11.4   0.06    22.4   0.20    46.7   0.98   101.5   5.81    0.86
seq         2761     7.5   0.00    15.2   0.02    31.7   0.08    68.6   0.37   153.3   2.25    0.04
spla        6556     9.6   0.03    25.8   0.11    73.9   0.69   215.5   4.98   561.4  31.14   13.83
tseng       1920     6.5   0.01    11.8   0.01    23.5   0.04    50.6   0.21   112.7   1.32    1.35
Average   4954.6    7.56   0.01   16.22   0.05   38.05   0.22   95.15   1.38   240.0   9.61    2.94

Peak Memory in Mb with Cut Dropping

                   K = 4          K = 5          K = 6          K = 7          K = 8
Name       Total   Drop   Total   Drop   Total   Drop   Total   Drop   Total   Drop
clma        2.56   0.10    6.60   0.22   18.09   0.54   52.03   1.47  152.55   4.07
ex1010      1.87   0.37    5.45   0.97   16.25   2.27   48.40   4.68  140.70   8.38
pdc         1.90   0.27    5.69   0.75   17.42   2.00   52.75   4.98  154.56  11.83
s38417      2.28   0.15    5.28   0.37   14.12   1.10   40.80   3.55  121.98  10.25
s38584.1    1.80   0.11    3.86   0.20    8.52   0.40   19.72   0.86   47.15   1.94
spla        1.68   0.21    5.15   0.59   16.63   1.65   53.88   4.34  154.44  10.04
Ratio       1.00   0.11    1.00   0.10    1.00   0.08    1.00   0.07    1.00   0.06

Outline

1. Review of Cut-based Mapping
2. More Efficient Cut Computation
3. Lossless Synthesis
4. Priority Cuts
5. Area Recovery
6. WireMap

Structural Bias

The mapped netlist very closely resembles the subject graph.

[Figure: a subject graph (inputs a, b, c, d, e; internal nodes m, p; output f) and the mapped netlist produced from it by technology mapping]

Every input of every LUT in the mapped netlist must be present in the subject graph; otherwise technology mapping will not find the match.

The Problem of Structural Bias

Root problem: Best matches for mapping may not be found.

[Figure: three networks over inputs a, b, c, d, e with output f and internal nodes m, p; the rightmost one uses a point q and is marked "This match is not found"]

Since the point q is not present in the subject graph, the match on the extreme right will not be found.

The Problem of Structural Bias

The match would be found with a different subject graph.

[Figure: an equivalent subject graph over inputs a, b, c, d, e that contains the point q, shown next to the match that uses q]

Traditional Synthesis Flow

Boolean Network
  → Technology-independent synthesis: sweep, eliminate, resub, simplify, fx, resub, sweep, eliminate, sweep, full simplify
  → Technology-specific mapping
  → Mapped Netlist

• No guarantee of optimality, since each synthesis step is heuristic.
• But structural bias means the mapped netlist depends heavily on the final network.
• Since only the network at the end of technology-independent synthesis is used for mapping, good intermediate netlists are not used.

Lossless Synthesis Flow

Idea: Merge intermediate networks into a single network with choices, which can be explored during mapping.

Boolean Network
  → Technology-independent synthesis: sweep, eliminate, resub, simplify, fx, resub, sweep, eliminate, sweep, full simplify
  → Choice operator
  → Technology Mapping
  → Mapped Netlist

Technology mapping is not any harder with choices (Lehman-Watanabe '95, Chen and Cong '01).

Lossless Synthesis Flow

Can combine the results of different technology-independent optimization scripts.

[Figure: the lossless synthesis flow, with networks from a script that optimizes area and a script that optimizes delay (labels: speed up, reduce depth) combined before technology mapping]

Mapping with Choices

[Figure: lossless synthesis flow from the Boolean network through technology-independent synthesis to technology mapping and the mapped netlist]

Question 1: How to implement an efficient choice operator?
Question 2: How to map quickly with choices?


Detecting Choices

Task: Given two Boolean networks, we need to create a network with choices.

Network 1:  x = (a + b).c    y = b.c.d
Network 2:  x = a.c + b.c    y = b.c.d

Step 1: Make And-Inverter decomposition of the networks

[Figure: AIGs of the two networks over inputs a, b, c, d with outputs x, y; dotted means inversion]

Detecting Choices

Step 2: Use combinational equivalence to detect functionally equivalent nodes up to complementation (Kuehlmann '04, …)
  – Random simulation to detect possibly equivalent nodes
  – SAT-based decision procedure to prove equivalence

[Figure: the AIGs of Network 1 (x = (a + b).c, y = b.c.d) and Network 2 (x = a.c + b.c, y = b.c.d) over inputs a, b, c, d]

Detecting Choices

Step 3: Merge equivalent nodes with choice edges

[Figure: the two AIGs merged into a single AIG over inputs a, b, c, d with outputs x, y]

x now represents a class of nodes that are functionally equivalent up to complementation.
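The detection step can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: each node is given as a hypothetical `(name, function)` pair that computes its value from a dict of already computed values using bit-wise operators, random simulation is bit-parallel on Python integers, and an exhaustive check over all input assignments stands in for the SAT-based proof, which is practical only for very small networks.

```python
import random
from itertools import product


def simulate(nodes, input_values):
    """Evaluate all nodes (listed in topological order) on the given inputs."""
    vals = dict(input_values)
    for name, fn in nodes:
        vals[name] = fn(vals)
    return vals


def candidate_classes(nodes, inputs, n_patterns=64, seed=0):
    """Group nodes whose simulation vectors agree up to complementation."""
    rng = random.Random(seed)
    mask = (1 << n_patterns) - 1
    vals = simulate(nodes, {i: rng.getrandbits(n_patterns) for i in inputs})
    classes = {}
    for name, _ in nodes:
        v = vals[name] & mask
        key = min(v, ~v & mask)  # same key for a vector and its complement
        classes.setdefault(key, []).append(name)
    return [c for c in classes.values() if len(c) > 1]


def proven_equivalent(nodes, inputs, x, y):
    """Exhaustive stand-in for the SAT check: equal or complementary everywhere."""
    same = differ = True
    for bits in product((0, 1), repeat=len(inputs)):
        vals = simulate(nodes, dict(zip(inputs, bits)))
        vx, vy = vals[x] & 1, vals[y] & 1
        same = same and vx == vy
        differ = differ and vx != vy
    return same or differ


# The two networks from the slides; x1, y1 belong to Network 1 and x2, y2 to Network 2.
nodes = [
    ("x1", lambda v: (v["a"] | v["b"]) & v["c"]),
    ("y1", lambda v: v["b"] & v["c"] & v["d"]),
    ("x2", lambda v: (v["a"] & v["c"]) | (v["b"] & v["c"])),
    ("y2", lambda v: v["b"] & v["c"] & v["d"]),
]
inputs = ["a", "b", "c", "d"]
classes = candidate_classes(nodes, inputs)
```

Simulation only proposes candidates; every candidate pair still has to be proven before the nodes are merged into one choice class.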

Mapping with Choices

Question 1: How to implement an efficient choice operator?
Question 2: How to map quickly with choices?

Mapping without Choices

Input: And-Inverter Graph

1. Compute all K-feasible cuts
2. Compute the best arrival time at each node
   • In topological order (from PI to PO)
   • Assuming that each cut maps to a K-LUT
   • Assuming that each K-LUT has unit delay
3. Choose the best cover
   • In reverse topological order (from PO to PI)

Output: Mapped Netlist

Mapping with Choices

Input: And-Inverter Graph with Choices

1. Compute all K-feasible cuts with choices
2. Compute the best arrival time at each node
   • In topological order (from PI to PO)
   • Assuming that each cut maps to a K-LUT
   • Assuming that each K-LUT has unit delay
3. Choose the best cover
   • In reverse topological order (from PO to PI)

Output: Mapped Netlist

Only Step 1 requires modification.

Cut Computation with Choices

Cuts are now computed for equivalence classes of nodes.

[Figure: AIG over inputs a, b, c, d with internal nodes p, q, r; x1 and x2 are the members of the class x; y is a second output]

Cuts(x1) = { {x1}, {p, r}, {p, b, c}, {a, c, r}, {a, b, c} }
Cuts(x2) = { {x2}, {q, c}, {a, b, c} }

Cuts(x) = Cuts(x1) ∪ Cuts(x2)
        = { {x1}, {p, r}, {p, b, c}, {a, c, r}, {a, b, c}, {x2}, {q, c} }
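The union over a class is a one-liner once the cuts of the members are known. The sketch below reuses the cut sets listed on this slide; the function name `class_cuts` and the dict layout are assumptions made for the example.

```python
def class_cuts(cuts, members):
    """Cuts of an equivalence class: the union of the cuts of its members."""
    merged = set()
    for m in members:
        merged |= cuts[m]
    return merged


cuts = {
    "x1": {frozenset(["x1"]), frozenset("pr"), frozenset("pbc"),
           frozenset("acr"), frozenset("abc")},
    "x2": {frozenset(["x2"]), frozenset("qc"), frozenset("abc")},
}
cuts_x = class_cuts(cuts, ["x1", "x2"])
```

Storing cuts as sets makes the shared cut {a, b, c} appear only once in the result, which is why the class has seven cuts rather than eight.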

Mapping with Choices

Input: And-Inverter Graph with Choices

1. Compute all K-feasible cuts with choices
2. Compute the best arrival time at each node
   • In topological order (from PI to PO)
   • Assuming that each cut maps to a K-LUT
   • Assuming that each K-LUT has unit delay
3. Choose the best cover
   • In reverse topological order (from PO to PI)

Output: Mapped Netlist

No changes needed except for Step 1.

Lossless Synthesis Summary

Also called Mapping with Structure Choices

Advantages
• Equivalent netlist variations are recorded
  – mapping algorithm selects the best among alternative structures to optimize a cost function
• Simple extension of the mapping algorithm

Disadvantages
• Even more cuts to explore!

Outline

1. Review of Technology Mapping
2. More Efficient Cut Computation
3. Lossless Synthesis
4. Priority Cuts
5. Area Recovery
   1. Area-flow
   2. Exact Area
6. WireMap

Exhaustive Cut Enumeration Mapping

• Large designs have many K-feasible cuts
  – 1M node AIG has ~40M 6-cuts
  – Needs ~2GB and ~30 sec for computation
• Past ways of tackling the problem
  – Detect and remove dominated cuts
    • Does not help much
  – Perform cut pruning (store N cuts/node)
    • Throws away useful cuts even if N = 1000
  – Store only cuts on the frontier
    • Reduces memory but increases runtime

Priority Cuts: A Bag of Tricks

• Compute and prioritize cuts (select a subset of all cuts)
  – Fast and memory efficient, so affordable for multiple passes
  – Potentially lower quality is overcome via multiple passes
• Use different sorting criteria in each mapping pass to explore additional cost criteria
• Include the best cut from the previous pass in the set of candidate cuts of the current pass
• Efficient memory management
  – Only maintain the complete set of priority cuts for nodes on the mapping frontier
  – Precompute the frontier to create an efficiently managed memory pool
  – Only save the best cut for each node

Computing Priority Cuts

• Consider nodes in a topological order
  – At each node, merge two sets of fanin cuts (each containing up to C cuts), getting up to (C+1) * (C+1) + 1 cuts
  – Sort these cuts using a given cost function, select the C best cuts, and use them for computing priority cuts of the fanouts
  – Select one best cut, and use it to map the node
• Sorting criteria

Mapping pass   Primary metric   Tie-breaker 1   Tie-breaker 2
depth          depth            cut size        area flow
area flow      area flow        fanin refs      depth
exact area     exact area       fanin refs      depth
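The depth pass of this procedure can be sketched as follows. This is an illustration, not the mapper described in the slides: the AIG is assumed to be a dict `fanins` from each AND node to its two fan-ins, fan-out counts are taken from the AIG as a rough estimate, and the name `priority_cuts` is hypothetical. Cuts are ranked by depth, then cut size, then area flow, as in the first row of the table above.

```python
from itertools import product


def priority_cuts(fanins, order, k, c_max):
    # Fan-out counts in the AIG, used as an estimate when computing area flow.
    nfanout = {}
    for a, b in fanins.values():
        for f in (a, b):
            nfanout[f] = nfanout.get(f, 0) + 1

    depth, area_flow, best, pcuts = {}, {}, {}, {}
    for n in order:
        if n not in fanins:
            depth[n], area_flow[n] = 0, 0.0
            pcuts[n] = [frozenset([n])]
            continue
        a, b = fanins[n]
        # Each fan-in holds at most C cuts plus its trivial cut.
        merged = {ca | cb
                  for ca, cb in product(pcuts[a], pcuts[b])
                  if len(ca | cb) <= k}

        def cost(cut):
            d = 1 + max(depth[l] for l in cut)
            af = 1 + sum(area_flow[l] / nfanout[l] for l in cut)
            return (d, len(cut), af, sorted(cut))

        ranked = sorted(merged, key=cost)[:c_max]  # keep the C best cuts
        best[n] = ranked[0]                        # one best cut maps the node
        depth[n], _, area_flow[n], _ = cost(best[n])
        pcuts[n] = ranked + [frozenset([n])]       # add the trivial cut
    return best, depth
```

Because only `c_max` cuts per node are kept, the work per node is bounded by the square of C rather than by the total number of K-feasible cuts.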

Priority-Cut-Based Mapping

Input: And-Inverter Graph

1. Compute all K-feasible cuts for each node
2. Compute arrival time at each node
   • In topological order (from PI to PO)
   • Compute the depth of all cuts and choose the best one
   • Compute at most C good cuts and choose the best one
3. Choose the best cover
   • In reverse topological order (from PO to PI)

Output: Mapped Netlist

Complexity Analysis

• Traditional mapping algorithms
  – FlowMap O(Kmn) (J. Cong et al., TCAD '94)
  – CutMap O(2Kmn^K) (J. Cong et al., FPGA '95)
  – DAOmap O(Kn^K) (J. Cong et al., ICCAD '04)
• Proposed mapping algorithm
  – O(KC^2 n)
    • 6-LUT mapping has about 5X speedup
    • 8-LUT mapping has up to 100X speedup

K is the max cut size
C is the max number of cuts
n is the number of nodes
m is the number of edges

C between 8 and 16 achieves optimal delay with good runtime.

Outline

1. Review of Technology Mapping
2. More Efficient Cut Computation
3. Lossless Synthesis
4. Priority Cuts
5. Area Recovery
   1. Area-flow
   2. Exact Area
6. WireMap

Overview of Area Recovery

• Initial mapping is delay oriented
  – Gets best delay for all paths
  – Area-based tie-breaking
• Not all paths are critical
  – Area recovery tries to slow down non-critical paths to reduce area
  – For each node with positive slack: choose a different cut that reduces area
  – Done as subsequent passes after delay-oriented mapping
• Question: how to measure area?

How to Measure Area?

Naïve definition: Area (cut) = 1 + [ Σ area (fan-in) ]

[Figure: network over inputs a, b, c, d, e, f with internal nodes p, q, r and outputs x, y, shown twice with a different cut highlighted]

Area of cut {p, c, d} = 1 + [1 + 0 + 0] = 2
Area of cut {a, b, q} = 1 + [0 + 0 + 1] = 2

• Naïve definition says both cuts are equally good in area
• Naïve definition ignores sharing due to multiple fan-outs

Area-flow

AF(n) = 1 + \sum_i \frac{AF(Leaf_i(n))}{NumFanout(Leaf_i(n))}

[Figure: network over inputs a, b, c, d, e, f with internal nodes p, q, r and outputs x, y, shown twice with a different cut highlighted]

Area-flow of cut {p, c, d} = 1 + [1 + 0 + 0] = 2
Area-flow of cut {a, b, q} = 1 + [0/1 + 0/1 + 1/2] = 1.5

• Area-flow "correctly" accounts for sharing and penalizes replication
• It is a floating point value!
• Area-flow recognizes that cut {a, b, q} is better
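The two numbers above can be reproduced with a direct transcription of the area-flow formula. The values in `af` and `num_fanout` are assumptions read off the worked example: primary inputs have area-flow 0, p and q each cost one LUT, p has one fan-out and q has two. The function name `area_flow` is hypothetical.

```python
def area_flow(cut, af, num_fanout):
    """AF of a node implemented by `cut`: 1 + sum of AF(leaf) / NumFanout(leaf)."""
    return 1 + sum(af[leaf] / num_fanout[leaf] for leaf in cut)


# Assumed values for the example: inputs cost nothing, p and q are one LUT each.
af = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.0, "p": 1.0, "q": 1.0}
num_fanout = {"a": 1, "b": 1, "c": 1, "d": 1, "p": 1, "q": 2}

af_pcd = area_flow({"p", "c", "d"}, af, num_fanout)
af_abq = area_flow({"a", "b", "q"}, af, num_fanout)
```

Dividing by the fan-out count is what lets the shared node q contribute only half of its area to each of its users.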

Area Recovery with Area-flow

1. Do delay-optimal mapping
2. Compute slack at each node
3. Do area recovery with area-flow
   – Done in topological order from PI to PO
   – Among all the cuts which do not exceed the slack budget, choose the cut with the smallest area-flow
   – Fan-out of a node is estimated from the delay-optimal mapping
   – We only do one pass
     • Saw only marginal improvement on subsequent passes

Exact Area

Exact-area (cut) = 1 + [ Σ exact-area (fan-in with no other fan-out) ]
– Gives minimum area solution within an MFFC

[Figure: node X over inputs a, b, c, d, e, f with internal nodes s, t, q, p, shown twice with a different cut highlighted; some nodes carry the label 6]

Cut {s, t, q}
  Area flow = 1 + [.25 + .25 + 1] = 2.5
  Exact area = 1 + 1 = 2 (due to q)
  Area flow will choose this cut.

Cut {p, e, f}
  Area flow = 1 + [(.25 + .25 + 3)/2] = 2.75
  Exact area = 1 + 0 = 1 (p is used elsewhere)
  Exact area will choose this cut.
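Exact area is commonly computed by temporarily referencing a cut and counting the LUTs that become necessary, then undoing the change. The sketch below follows that idea under stated assumptions: `best_cut` holds the currently selected cut of each internal node, `refs` holds how often each node is used in the current mapping, and the small network is a hypothetical stand-in for the figure, with s, t, and p already used elsewhere and q unused.

```python
from collections import defaultdict


def cut_ref(cut, best_cut, refs):
    """Reference the leaves of `cut`; return the number of LUTs this adds."""
    area = 1
    for leaf in cut:
        if refs[leaf] == 0 and leaf in best_cut:
            # Unused internal node: its own LUT has to be added as well.
            area += cut_ref(best_cut[leaf], best_cut, refs)
        refs[leaf] += 1
    return area


def cut_deref(cut, best_cut, refs):
    """Undo cut_ref; return the number of LUTs this frees."""
    area = 1
    for leaf in cut:
        refs[leaf] -= 1
        if refs[leaf] == 0 and leaf in best_cut:
            area += cut_deref(best_cut[leaf], best_cut, refs)
    return area


def exact_area(cut, best_cut, refs):
    area = cut_ref(cut, best_cut, refs)
    cut_deref(cut, best_cut, refs)  # leave the reference counts unchanged
    return area


# Hypothetical mapping state: s, t, p are used elsewhere, q is not.
best_cut = {
    "s": frozenset("ab"), "t": frozenset("cd"),
    "q": frozenset("ef"), "p": frozenset("st"),
}
refs = defaultdict(int, {"s": 3, "t": 3, "p": 1})

area_stq = exact_area(frozenset("stq"), best_cut, refs)
area_pef = exact_area(frozenset("pef"), best_cut, refs)
```

Unlike area-flow, this count involves no fan-out estimate: it is the exact number of LUTs the cut adds to the current mapping.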

Area Recovery with Exact-area

1. Do delay-optimal mapping
2. Compute slack at each node
3. Do area recovery with area-flow
4. Do area recovery with exact-area
   – Done in topological order from PI to PO
   – Among all the cuts which do not exceed the slack budget, choose the cut with the smallest exact-area
   – Note: Unlike area-flow, no estimation involved
   – We only do one pass
     • Saw only marginal improvement on subsequent passes

Priority-Cut Mapping with Area Recovery

Input: And-Inverter Graph

1. Compute all K-feasible cuts for each node
2. Compute arrival time at each node
   • In topological order (from PI to PO)
   • Compute the depth of all cuts and choose the best one
   • Compute at most C good cuts and choose the best one
3. Perform area recovery
   • Using area flow
   • Using exact local area
   • In each iteration, re-compute at most C good cuts and choose the best one
4. Choose the best cover
   • In reverse topological order (from PO to PI)

Output: Mapped Netlist

Area Recovery Summary

• Two-step area recovery
• Area-flow has a global view
• Exact area has a local view
  – Ensures a local minimum is reached
• Order in which nodes are processed for both steps is important
• Order of the two passes is important

Experimental Comparison

• Compare area recovery with the state-of-the-art academic mapper DAOmap
  – DAOmap uses many (~10) different area recovery heuristics
  – Some are more effective than others
• Just the two heuristics of area-flow and exact-area give better results on their benchmarks
• Also a separate comparison with choices obtained from the lossless synthesis flow
  – Six snapshots of MVSIS script.rugged
  – Not the best FPGA optimization script ☺
  – Improves both area and delay

Comparison with DAOmap

                      DAOmap     MVSIS-baseline      MVSIS-choices   MVSIS-choices 2x
Example   Depth  LUTs   T, s Depth  LUTs   T, s Depth  LUTs   T, s Depth  LUTs   T, s
alu4          6  1065    0.5     6   992   0.34     6   972   0.64     6   949  +0.84
apex2         7  1352    0.6     7  1200   0.36     7  1249   0.95     7  1191  +1.34
apex4         6   931    0.7     6   891   0.24     6   895   0.74     6   894  +1.47
bigkey        3  1245    0.6     3   797   0.34     3   797   0.75     3   684  +1.07
clma         13  5425    5.9    13  4426   1.50    11  3883   4.30    11  3453  +5.20
des           5   965    0.8     5  1024   0.36     5   947   0.93     5  1104  +1.87
diffeq       10   817    0.6    10   844   0.30     9   745   0.46     9   736  +0.43
dsip          3   686    0.5     3   686   0.23     3   685   0.19     3   684  +0.36
elliptic     12  1965    2.0    12  2017   0.61    12  2005   0.72    12  2022  +1.25
ex1010        7  3564    4.0     7  3258   1.15     7  3305   3.39     7  3302  +5.80
ex5p          6   778    1.0     6   744   0.36     5   724   1.17     5   675  +1.40
frisc        16  1999    1.9    15  2009   0.76    14  1875   1.54    13  1867  +1.58
misex3        6   980    0.8     6   957   0.26     6   926   0.73     6   861  +0.94
pdc           7  3222    4.6     8  2920   1.13     7  2738   4.73     7  2692  +5.59
s298         13  1258    2.4    13   826   0.30    12   863   4.07    11   826  +1.49
s38417        9  3815    3.8     9  3864   1.46     8  2989   4.04     7  2729  +2.76
s38584        7  2987   27.0     7  2844   1.11     7  2497   2.58     6  2470  +1.69
seq           6  1188    0.8     6  1109   0.30     5  1136   0.79     6  1016  +1.38
spla          7  2734    4.0     7  2535   1.03     7  2319   4.68     7  2224  +4.79
tseng        10   706    0.6    10   752   0.25     8   719   0.39     8   705  +0.31
Ratio      1.00  1.00   1.00  1.00  0.93   0.37  0.95  0.89   0.95  0.93  0.86   1.46

Outline

1. Review of Cut-based Mapping
2. More Efficient Cut Computation
3. Lossless Synthesis
4. Priority Cuts
5. Area Recovery
6. WireMap

Motivation

• Cut-based mapping algorithms do well in minimizing LUT levels and area (LUT count)
  – Performance of the circuit correlates with LUT levels
  – Logic block utilization correlates well with LUT count
• Could we change cut-based mapping to improve the netlist for packing, placement, and routing?
• Area calculation gives each LUT equal weight, but should this be the case?

Virtex-5 LUT6

[Figure: LUT6 block with inputs A1 to A6 and outputs O6 and O5]

V5 LUT6 Details and Packing

[Figure: LUT6 internals with inputs A1 to A6, two 5-LUTs, and outputs O6 and O5 (labels: 5LUT, 5LUT, O6LUT, O5LUT)]

Can we produce smaller LUTs without increasing LUT levels?

Placement and Routing

• Routing is done for connections between inputs and outputs of a LUT (and other design elements)
• Fewer connections to route should make the design easier to place and route
• Can we come up with a mapping algorithm to minimize the total # of connections in a design?

Motivation Revisited

• Could we use cut-based mapping to improve the netlist for clustering, placement, and routing?
  – Can we come up with a mapping algorithm to minimize the total # of connections in a design?
  – Can we produce smaller LUTs without increasing LUT levels?
• Area calculation gives equal weight to all LUTs. Should that be the case?

Edge Recovery Overview

Key: Find a simple-to-compute cut metric that minimizes edge counts and creates more small LUTs

$$\mathrm{EF}(n) = \mathrm{NumFanin}(n) + \sum_i \frac{\mathrm{EF}(\mathrm{Leaf}_i(n))}{\mathrm{NumFanout}(\mathrm{Leaf}_i(n))}$$

1. Edge flow phase: Use edge flow cost function to minimize global edge counts

2. Exact edge phase: Use optimal algorithm to minimize edge counts within MFFCs

• Contrast with Area Flow eqn:

$$\mathrm{AF}(n) = 1 + \sum_i \frac{\mathrm{AF}(\mathrm{Leaf}_i(n))}{\mathrm{NumFanout}(\mathrm{Leaf}_i(n))}$$
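
The two recurrences can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping is given as one chosen cut (a list of leaves) per node, primary inputs have zero flow, NumFanin(n) is taken to be the number of leaves of the chosen cut, and the names `flows`, `leaves`, and `num_fanout` are hypothetical rather than taken from ABC or WireMap.

```python
from typing import Dict, List, Tuple


def flows(
    order: List[str],
    leaves: Dict[str, List[str]],
    num_fanout: Dict[str, int],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (area_flow, edge_flow) for every node.

    order      -- nodes in topological order, primary inputs first
    leaves     -- leaves of the cut chosen at each non-PI node
    num_fanout -- fanout count per node (assumed input; clamped to >= 1)
    """
    af: Dict[str, float] = {}
    ef: Dict[str, float] = {}
    for n in order:
        cut = leaves.get(n)
        if cut is None:
            # Primary input: contributes no LUT and no edge.
            af[n] = 0.0
            ef[n] = 0.0
            continue
        # AF(n) = 1 + sum_i AF(leaf_i) / NumFanout(leaf_i)
        af[n] = 1.0 + sum(af[l] / max(1, num_fanout[l]) for l in cut)
        # EF(n) = NumFanin(n) + sum_i EF(leaf_i) / NumFanout(leaf_i)
        ef[n] = len(cut) + sum(ef[l] / max(1, num_fanout[l]) for l in cut)
    return af, ef


# Tiny example: x = f(a, b) is shared by y = f(x, c) and z = f(x, b).
order = ["a", "b", "c", "x", "y", "z"]
leaves = {"x": ["a", "b"], "y": ["x", "c"], "z": ["x", "b"]}
num_fanout = {"a": 1, "b": 2, "c": 1, "x": 2, "y": 1, "z": 1}
af, ef = flows(order, leaves, num_fanout)
print(af["y"], ef["y"])
```

Because x has two fanouts, each of y and z is charged only half of its area and half of its edges. The only difference between the two metrics is the constant term: 1 per LUT for area flow, the cut size for edge flow.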

Edge Flow Phase

1. Do delay-optimal mapping

2. Compute slack at each node

3. Do area recovery with area-flow with one change in how cuts are selected

– Done in topological order from PI to PO

– Among all the cuts which do not exceed slack budget choose cut with smallest area-flow

– If 2 cuts have the same area-flow then choose the cut with the lower edge-flow

• Edge flow is a tie breaker when area is within epsilon
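
The selection rule above can be sketched as follows. This is a hedged illustration, not ABC's code: the `Cut` record, the `pick_cut` name, the reading of "does not exceed slack budget" as "arrival time no later than the required time", and the value of `eps` are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cut:
    name: str
    arrival: float    # arrival time at the node if this cut is used
    area_flow: float
    edge_flow: float


def pick_cut(cuts: List[Cut], required: float, eps: float = 1e-3) -> Optional[Cut]:
    """Pick the cut with the smallest area flow among those meeting timing.

    Cuts whose area flow is within `eps` of the best are treated as tied,
    and the tie is broken by the lower edge flow.
    """
    feasible = [c for c in cuts if c.arrival <= required]
    if not feasible:
        return None
    best_af = min(c.area_flow for c in feasible)
    tied = [c for c in feasible if c.area_flow - best_af <= eps]
    return min(tied, key=lambda c: c.edge_flow)


cuts = [
    Cut("fast", arrival=2, area_flow=3.0, edge_flow=9.0),
    Cut("a", arrival=3, area_flow=2.0, edge_flow=7.0),
    Cut("b", arrival=3, area_flow=2.0, edge_flow=5.0),
    Cut("late", arrival=5, area_flow=1.0, edge_flow=1.0),
]
print(pick_cut(cuts, required=3).name)
```

With a required time of 3, the cut "late" is excluded by timing, "a" and "b" tie on area flow, and "b" wins on edge flow.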

Exact Edge Phase

1. Do delay-optimal mapping

2. Compute slack at each node

3. Do edge recovery with edge-flow

4. Do edge recovery with exact edge with one change

– Done in topological order from PI to PO

– Among all the cuts which do not exceed slack budget choose cut with smallest area, and to break ties choose cuts with lower number of edges

– Note: Unlike edge-flow, no estimation involved
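
The slides do not spell out how the exact counts are obtained. One common way, assumed here, is reference counting: temporarily reference a candidate cut, count every LUT and edge that becomes newly needed, then undo the change. The sketch below uses hypothetical names (`ref_cut`, `deref_cut`, `exact_cost`) and ignores the slack check for brevity.

```python
from typing import Dict, List, Tuple


def ref_cut(leaves: List[str], best: Dict[str, List[str]],
            refs: Dict[str, int]) -> Tuple[int, int]:
    """Reference a cut; return the (LUTs, edges) it adds to the mapping."""
    area, edges = 1, len(leaves)
    for leaf in leaves:
        refs[leaf] += 1
        # A leaf that was unused until now needs its own LUT (unless it is a PI).
        if refs[leaf] == 1 and leaf in best:
            a, e = ref_cut(best[leaf], best, refs)
            area += a
            edges += e
    return area, edges


def deref_cut(leaves: List[str], best: Dict[str, List[str]],
              refs: Dict[str, int]) -> Tuple[int, int]:
    """Undo ref_cut; return the (LUTs, edges) that are freed."""
    area, edges = 1, len(leaves)
    for leaf in leaves:
        refs[leaf] -= 1
        if refs[leaf] == 0 and leaf in best:
            a, e = deref_cut(best[leaf], best, refs)
            area += a
            edges += e
    return area, edges


def exact_cost(leaves, best, refs) -> Tuple[int, int]:
    """Exact (area, edges) of using this cut, leaving refs unchanged."""
    cost = ref_cut(leaves, best, refs)
    deref_cut(leaves, best, refs)
    return cost


# x = f(a, b) is already used elsewhere (refs 1); y = f(c, d) is unused.
best = {"x": ["a", "b"], "y": ["c", "d"]}
refs = {"a": 0, "b": 0, "c": 0, "d": 0, "x": 1, "y": 0}
candidates = [["a", "b", "c", "d"], ["x", "c", "d"], ["x", "y"]]
# Smallest exact area first, fewer edges as the tie-breaker.
chosen = min(candidates, key=lambda c: exact_cost(c, best, refs))
print(chosen, exact_cost(chosen, best, refs))
```

In this example the cuts {a, b, c, d} and {x, c, d} both cost one LUT, so the edge count decides and the 3-input cut is chosen. Since the counts come from the actual mapping rather than from fanout estimates, no estimation is involved.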

Modified Cut Prioritization Heuristics

• Consider nodes in a topological order

– At each node, merge two sets of fanin cuts (each containing C cuts) getting (C+1) * (C+1) + 1 cuts

– Sort these cuts using a given cost function, select C best cuts, and use them for computing priority cuts of the fanouts

– Select one best cut, and use it to map the node

• Sorting criteria

Mapping pass       Primary metric   Tie-breaker 1   Tie-breaker 2
Depth              depth            cut size        area flow
Area/edge flow     area flow        edge flow       depth
Exact area/edge    exact area       exact edge      depth
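
A minimal sketch of the merge and sort steps, assuming each fanin contributes its C stored cuts plus its own trivial cut (which would account for the (C+1) * (C+1) pairs) and that the node's trivial cut supplies the final +1. The names `merge_cuts`, `CutCost`, `SORT_KEYS`, and `select_priority_cuts` are hypothetical, and the costs are taken as given rather than computed.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, List


def merge_cuts(node: str, fanin0: str, cuts0: List[FrozenSet[str]],
               fanin1: str, cuts1: List[FrozenSet[str]], k: int):
    """Candidate cuts of `node`: K-feasible pairwise unions plus the trivial cut."""
    set0 = cuts0 + [frozenset([fanin0])]
    set1 = cuts1 + [frozenset([fanin1])]
    merged = {a | b for a, b in product(set0, set1) if len(a | b) <= k}
    merged.add(frozenset([node]))
    return merged


@dataclass(frozen=True)
class CutCost:
    leaves: FrozenSet[str]
    depth: int
    area_flow: float
    edge_flow: float
    exact_area: int
    exact_edge: int


# One sort key per mapping pass: (primary metric, tie-breaker 1, tie-breaker 2).
SORT_KEYS = {
    "depth": lambda c: (c.depth, len(c.leaves), c.area_flow),
    "flow": lambda c: (c.area_flow, c.edge_flow, c.depth),
    "exact": lambda c: (c.exact_area, c.exact_edge, c.depth),
}


def select_priority_cuts(cuts: List[CutCost], mapping_pass: str, c: int):
    """Keep the C best cuts for the given pass; the first one maps the node."""
    return sorted(cuts, key=SORT_KEYS[mapping_pass])[:c]


cands = merge_cuts("n", "p", [frozenset("ab"), frozenset("ac")],
                   "q", [frozenset("de"), frozenset("df")], k=6)
print(len(cands))  # upper bound (C+1)*(C+1)+1 is reached when all unions are distinct
```

Tuple comparison treats floating-point flows as tied only when they are exactly equal; a practical implementation would more likely compare within a small epsilon, as the edge flow slide suggests.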

Experimental Method

• Implemented WireMap using ABC

• Compared against two ABC mapping algorithms

– Baseline: mapping with area recovery

– Mapping with Structure Choices (MSC): area-recovery mapping with alternative netlists produced by synthesis

• WireMap was built on top of MSC

• Performed packing of single-output LUTs to dual-output LUTs using maximum cardinality matching

• Used VPR to place/route design for wirelength and critical path delays

WireMap Results

• MSC is superior to baseline mapping

– Single-output LUT count reduced by 9.1%

– Edge count reduced by 8.1% and dual-output LUT count reduced by 7.7%, a similar level of reduction as single-output LUT count

• WireMap leads to further reduction in edges by 9.3% and dual-output LUT count by 9.4% versus MSC

– Single-output LUT count only reduced by 1.3% wrt. MSC

• WireMap improvements to edges and dual-output LUTs not directly related to single-output LUT count reduction

WireMap Results - Packing

The histogram below shows how the single-output LUT size distribution was modified, leading to a 9.4% reduction in dual-output LUT6s

[Figure: histogram "LUT Distribution: MSC vs. WireMap"; y-axis: % LUTs; x-axis: bins LT2 to LT6]

          LT2      LT3      LT4      LT5      LT6
MSC       4.71%    8.00%    15.87%   23.49%   47.93%
WireMap   10.12%   12.66%   17.89%   20.19%   39.14%
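
The shift toward smaller LUTs is easier to see when the bins are grouped. The short calculation below assumes the five columns partition the single-output LUTs (each row does sum to 100%); the grouping into LT2 to LT4 versus LT6 is an illustrative choice, not one made in the slides.

```python
BINS = ["LT2", "LT3", "LT4", "LT5", "LT6"]
MSC = [4.71, 8.00, 15.87, 23.49, 47.93]
WIREMAP = [10.12, 12.66, 17.89, 20.19, 39.14]


def share(dist, bins):
    """Total percentage of LUTs falling in the named bins."""
    return sum(p for name, p in zip(BINS, dist) if name in bins)


small = {"LT2", "LT3", "LT4"}
for label, dist in (("MSC", MSC), ("WireMap", WIREMAP)):
    print(f"{label}: {share(dist, small):.2f}% in LT2-LT4, "
          f"{share(dist, {'LT6'}):.2f}% in LT6")
```

The three smallest bins hold 28.58% of the LUTs under MSC and 40.67% under WireMap, while the LT6 share drops from 47.93% to 39.14%.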

WireMap Results – Place and Route

• Wirelength was reduced by 8.5% vs. MSC

• Minimum channel width reduced by 6%.

• Critical path delay reduced by 2.3%.

twl = total wire length, mcw = minimum channel width required to route in VPR

*cpd = critical path delay using the smallest possible channel width across the three implementations

WireMap Summary

• Edge recovery cut-based mapping algorithm that extends area recovery heuristic with an edge cost function

– area flow -> edge flow

– exact area -> exact edge

• Minimizes total # of connections in the design

• Improves packing by increasing frequency of smaller LUTs

Overall Summary

• Cut-based mapping is efficient and flexible

• Lossless Synthesis

– Map over multiple synthesis snapshots

• Priority Cuts

– Limit # of cuts explored

    • Runtime and memory scalability

    • Without compromising QoR

• Improved area recovery

– Global area-flow and local exact area

– Order of application is important

• WireMap

– Pack/place/route friendly cut-based mapping algorithm

Key Takeaways

• Pay attention to runtime and memory scalability

• Defer choices between alternative implementations to later phases that make better decisions

• Global optimization followed by exact local optimization is effective

• Overcome suboptimal solution via multiple passes that explore different corners of the optimization space

• Best solutions consider what is done in synthesis, mapping, placement and routing

References

• S. Jang, B. Chan, K. Chung, and A. Mishchenko, "WireMap: FPGA technology mapping for improved routability". Proc. FPGA '08.

• S. Cho, S. Chatterjee, A. Mishchenko, and R. Brayton, "Efficient FPGA mapping using priority cuts". (Poster.) Proc. FPGA '07.

• A. Mishchenko, S. Chatterjee, and R. Brayton, "Improvements to technology mapping for LUT-based FPGAs". IEEE TCAD, Vol. 26(2), Feb 2007, pp. 240-253.

• All publications for ABC: http://www.eecs.berkeley.edu/~alanmi/publications/