Derivation of Efficient FSM from Polyhedral Loop Nests

Tomofumi Yuki, Antoine Morvan, Steven Derrien

INRIA/Université de Rennes 1

Post on 14-Dec-2015


High-Level Synthesis

Writing HDL is too costly
  Led to emergence of HLS tools
HLS is sensitive to input source
  Must be written in “HW-aware” manner
Source-to-Source transformations
  Common in optimizing compilers
  (semi-)automated exploration at HLS stage
  Further enhance productivity/performance

HLS Specific Transformations

Not all optimizing compiler transformations make sense in embedded context
  Its converse is also true
Finite State Machines are an example
  for loops are preferred in general purpose context

    for i
      for j
        S0
      for k
        S1

FSM derivation turns this loop nest into:

    while (…)
      if (…) S0;
      if (…) S1;
      if (…) k = k+1;
      if (…) i=i+1; j=0;

Contributions

Analytical model of Loop Pipelining
  Understanding when to use Nested LP w.r.t. Single Loop Pipelining
Derivation of Finite State Machines
  Handles imperfectly nested loops
  Based on polyhedral techniques
Pipelining of the control-path
  Computing n-states ahead
  Improves performance of the control-path

Outline

Modeling Loop Pipelining
  Single Loop Pipelining
  Nested Loop Pipelining
  NLP vs SLP
FSM Derivation
Evaluation
Conclusion

Single Loop Pipelining

Overlapped execution of innermost loop

    for i=1:M
      for j=1:N
        S(i,j);

    for i=1:M
      for j=1:N
        stage0(i,j); stage1(i,j); stage2(i,j); stage3(i,j);

    for i=1:M
      s0(i,1);
      s1(i,1); s0(i,2);
      s2(i,1); s1(i,2); …
      s3(i,1); s2(i,2); … s0(i,N);
      s3(i,2); … s1(i,N);
      … s2(i,N);
      s3(i,N);

Pipeline flush/fill Overhead

Overhead for each iteration of the outer loop

    for i=1:M
      for j=1:N
        s0(i,j); s1(i,j); s2(i,j); s3(i,j);

[Figure: pipeline diagram for outer iterations i=1, i=2, i=3, with under-utilized stages marked]

Nested Loop Pipelining

“Compress” by pipelining all together

[Figure: pipeline diagram for outer iterations i=1, i=2, i=3, compressed together]

    for i=1:M
      for j=1:N
        s0(i,j); s1(i,j); s2(i,j); s3(i,j);

    for i=1:M
      j=j+1; j<N
      s0(i,j); s1(i,j); s2(i,j); s3(i,j);

    while(has_next)
      i,j=next(i,j)
      s0(i,j); s1(i,j); s2(i,j); s3(i,j);
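As a rough illustration of why compressing the pipeline helps, the sketch below counts cycles under a simplified model. The model is an assumption, not something stated on the slides: one iteration is issued per cycle, the pipeline has S stages, and each fill/flush costs S - 1 extra cycles. The function names are hypothetical.

```python
def slp_cycles(M, N, S):
    """Cycles when only the innermost loop is pipelined.

    Each of the M outer iterations fills and flushes the S-stage
    pipeline once, so it costs N + S - 1 cycles (assumed model).
    """
    return M * (N + S - 1)


def nlp_cycles(M, N, S):
    """Cycles when the whole nest is pipelined as one stream of M*N iterations.

    The pipeline is filled and flushed only once (assumed model).
    """
    return M * N + S - 1


if __name__ == "__main__":
    M, N, S = 100, 8, 4
    print("SLP:", slp_cycles(M, N, S))
    print("NLP:", nlp_cycles(M, N, S))
```

Under these assumptions the gap grows with the number of stages relative to the innermost trip count, which is the situation the slides describe as under-utilized stages.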

NLP Overhead

Larger control-path
  FSM for loop nest, instead of a single loop
  FSM for SLP is a simple check on loop bound
Hinders maximum frequency
  Complex control-path may take longer than one data-path stage
Savings in flush/fill overhead must be greater than the loss in frequency

Modeling Trade-offs

Important parameters:
  Frequency degradation due to NLP
  Innermost trip count
  Number of pipeline stages
f*: NLP frequency normalized to SLP
  f* = 0.9 means 10% degradation in frequency
α = #stages / trip count
  larger α means large flush/fill overhead

When is NLP Beneficial?

Model speedup as a function of f* and α

Speedup of NLP relative to SLP (rows: α, columns: f*)

α \ f*   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
0.2     0.12  0.24  0.36  0.48  0.60  0.72  0.84  0.96  1.08  1.20
0.4     0.14  0.28  0.42  0.56  0.70  0.84  0.98  1.12  1.27  1.41
0.6     0.16  0.32  0.48  0.64  0.80  0.96  1.12  1.28  1.45  1.59
0.8     0.18  0.36  0.54  0.72  0.90  1.08  1.27  1.45  1.61  1.79
1.0     0.20  0.40  0.60  0.80  1.00  1.20  1.41  1.59  1.79  2.00
1.2     0.22  0.44  0.66  0.88  1.10  1.32  1.54  1.75  1.96  2.22
1.4     0.24  0.48  0.72  0.96  1.20  1.45  1.67  1.92  2.17  2.38
1.6     0.26  0.52  0.78  1.04  1.30  1.56  1.82  2.08  2.33  2.63
1.8     0.28  0.56  0.84  1.12  1.41  1.67  1.96  2.22  2.50  2.78
2.0     0.30  0.60  0.90  1.20  1.49  1.79  2.08  2.38  2.70  3.03

f*: higher = less degradation
α: larger = small trip count (innermost)
  Program characteristic (cannot change)
Improving control-path is possible
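The values in the table above agree, to within a few hundredths, with the expression f* × (1 + α). The sketch below spells out one way to arrive at that expression. It rests on an assumed cycle model that the slides do not state: per outer iteration, SLP takes about N × (1 + α) cycles and NLP about N cycles, with the NLP clock scaled by f*. The function names are hypothetical.

```python
def nlp_speedup(f_star, alpha):
    """Approximate speedup of NLP over SLP under the assumed model.

    f_star: NLP frequency normalized to SLP (1.0 means no degradation).
    alpha:  number of pipeline stages divided by the innermost trip count.
    """
    return f_star * (1.0 + alpha)


def break_even_f_star(alpha):
    """Smallest f_star at which NLP is no slower than SLP in this model."""
    return 1.0 / (1.0 + alpha)


if __name__ == "__main__":
    for alpha in (0.2, 1.0, 2.0):
        print(alpha, nlp_speedup(0.5, alpha), break_even_f_star(alpha))
```

Read this way, NLP pays off whenever f* exceeds 1 / (1 + α), so a loop with few iterations per pipeline fill can tolerate a larger frequency loss.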

Outline

Modeling Loop Pipelining
FSM Derivation
  Polyhedral Representation
  Computing Transitions
  State Look Ahead
Evaluation
Conclusion

Polyhedral Representation

Represent loops as mathematical objects

    for i = 0:N
      for j = 0:M
        S

[Figure: iteration domain of S, with extents N and M]

    for i = 0:N
      for j = 0:i
        S0
      for k = 0:N-i
        S1

[Figure: iteration domains of S0 and S1]
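To make "loops as mathematical objects" concrete, the sketch below treats the statement instances of the second loop nest as sets of integer points and lists them in execution order. The bounds are taken as inclusive, which is an assumption about the 0:N notation. The function name and the tuple encoding are hypothetical.

```python
def imperfect_nest_points(N):
    """Enumerate statement instances in execution order for

        for i = 0:N
          for j = 0:i     -> S0(i, j)
          for k = 0:N-i   -> S1(i, k)

    assuming inclusive bounds. As sets of integer points:
        S0: 0 <= i <= N, 0 <= j <= i
        S1: 0 <= i <= N, 0 <= k <= N - i
    """
    points = []
    for i in range(N + 1):
        for j in range(i + 1):
            points.append(("S0", i, j))
        for k in range(N - i + 1):
            points.append(("S1", i, k))
    return points


if __name__ == "__main__":
    for point in imperfect_nest_points(1):
        print(point)
```

For each value of i there are (i + 1) instances of S0 and (N - i + 1) instances of S1, so the nest has (N + 1)(N + 2) instances in total under the inclusive-bounds assumption.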

FSM Derivation

next function
  Find a piece-wise function that gives the immediate successor in lexicographic order
  Proposed in 1998 for low-level code generation
Direct Application to FSM
  Each piece = condition of transition
  Function = transition
Can be composed to obtain next^n
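A minimal sketch of a next function, written by hand for a rectangular domain 0 <= i <= N, 0 <= j <= M rather than derived with polyhedral tools. The names next_state, next_n, and run_fsm are hypothetical, and the inclusive bounds are an assumption.

```python
def next_state(i, j, N, M):
    """Immediate lexicographic successor of (i, j), or None after the last point.

    Each branch is one piece of the piece-wise function: the condition
    is the transition guard, the returned tuple is the state update.
    """
    if j < M:
        return (i, j + 1)
    if i < N:
        return (i + 1, 0)
    return None


def next_n(state, n, N, M):
    """Compose next_state n times to obtain next^n."""
    for _ in range(n):
        if state is None:
            return None
        state = next_state(*state, N, M)
    return state


def run_fsm(N, M):
    """Visit every point using only next_state, as a while-loop FSM would."""
    visited = []
    state = (0, 0)
    while state is not None:
        visited.append(state)
        state = next_state(*state, N, M)
    return visited


if __name__ == "__main__":
    print(run_fsm(1, 2))
    print(next_n((0, 0), 2, 1, 2))
```

The order produced by run_fsm is the same as the order of the original two-level for loop, which is what makes the FSM a drop-in replacement for the loop control.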

State Look Ahead

Pipelining the control-flow
  When data-path is heavily pipelined, control-path becomes the critical path
Computing n-states ahead
  Allows n-stage pipelining of the control-path

[Figure: block diagrams of the datapath driven by next^2, with states (i,j), (i',j'), (i'',j''), and by next, with states (i,j), (i',j')]
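The look-ahead idea can be simulated in software. In the sketch below, the datapath is fed from a two-entry window of states, and the window is refilled only through next^2, so each new state is computed two steps before it is consumed. This is a behavioral model under assumed inclusive rectangular bounds, not the authors' hardware design, and all names are hypothetical.

```python
from collections import deque


def next_state(state, N, M):
    """Lexicographic successor in 0 <= i <= N, 0 <= j <= M, or None at the end."""
    if state is None:
        return None
    i, j = state
    if j < M:
        return (i, j + 1)
    if i < N:
        return (i + 1, 0)
    return None


def next2(state, N, M):
    """next composed with itself: the state two steps ahead."""
    return next_state(next_state(state, N, M), N, M)


def run_with_lookahead(N, M):
    """Visit all points while updating the state window only through next2."""
    first = (0, 0)
    window = deque([first, next_state(first, N, M)])  # (i,j) and (i',j')
    visited = []
    while window[0] is not None:
        current = window.popleft()
        visited.append(current)
        # The appended state is consumed two iterations from now, so in
        # hardware its computation could be spread over two cycles.
        window.append(next2(current, N, M))
    return visited


if __name__ == "__main__":
    print(run_with_lookahead(1, 2))
```

The visiting order is unchanged; only the latency available for computing each state grows from one step to two.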

Other Optimizations

Merging transitions
  next computed can have many transitions
  Some can be merged by looking at its context
Common Sub-expressions
  HLS tools sometimes fail to catch

Merging example:

    before:  next(i,j) = (i,j+1) if i<N
             next(i,j) = (N,j+1) if i=N
    after:   next(i,j) = (i,j+1) if i≤N

Common sub-expression example:

    before:  if (a>b && c>d) A;
             if (a>b && e>f) B;
    after:   x = a>b;
             if (x && c>d) A;
             if (x && e>f) B;
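The merging example can be checked by brute force: the two pieces guarded by i<N and i=N give the same result as the single piece guarded by i≤N, because i equals N in the second piece. The sketch below compares both forms over a small range. Returning None outside the guards is an assumption made only to keep the functions total, and the function names are hypothetical.

```python
def next_split(i, j, N):
    """The two transitions before merging."""
    if i < N:
        return (i, j + 1)
    if i == N:
        return (N, j + 1)
    return None  # outside the pieces shown in the example


def next_merged(i, j, N):
    """The single transition after merging."""
    if i <= N:
        return (i, j + 1)
    return None  # outside the piece shown in the example


def equivalent(N, j_max):
    """Compare both forms for 0 <= i <= N + 1 and 0 <= j <= j_max."""
    return all(
        next_split(i, j, N) == next_merged(i, j, N)
        for i in range(N + 2)
        for j in range(j_max + 1)
    )


if __name__ == "__main__":
    print(equivalent(5, 5))
```

Fewer pieces mean fewer guards to evaluate in the control-path, which is the motivation the slide gives for merging.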

Evaluation Methodology

Focus on control-path
  empty data-path (incrementing arrays)
  independent iterations
  loops with different shapes
3 versions: different pipelining
  SLP: innermost loop
  NLP: all loops
  FSM-LA2: while loop of FSM with next^2

Evaluation: HLS Phase

Maximum Target Frequency

[Figure: bar chart of maximum target frequency (axis 0 to 500) for rect 2d, rect 3d, triangular 2d, and triangular 3d loop nests; series: SLP, NLP, FSM-LA2]

Evaluation: Synthesized Design

Achieved Frequency

[Figure: bar chart of achieved frequency (axis 0 to 500) for rect 2d, rect 3d, triangular 2d, and triangular 3d loop nests; series: SLP, NLP, FSM-LA2]

Conclusion

Improved FSM generation from for loops
  Example of HLS specific transformation
  State look ahead to pipeline control-path
  HLS tools currently lack compiler optimizations
Applied to Nested Loop Pipelining
  Enlarge applicability by reducing its overhead
Future Directions
  Other uses of next function
  Other HLS-specific transformations
