Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis Using SSA Form. Philip Brisk, Ajay K. Verma, Paolo Ienne. csda

Philip Brisk

Dec 31, 2015

Page 1: Philip Brisk

Optimal Polynomial-Time Interprocedural Register Allocation

for High-Level Synthesis Using SSA Form

Philip Brisk

Ajay K. Verma

Paolo Ienne

csda

Page 2

Outline

Register Allocation Overview

Interprocedural Register Allocation

Related Work

SSA Form With Launch and Landing Pads

Optimal Solution

Experimental Results

Conclusion

Page 3

Modeling Register Allocation

For procedure Pi, build the interference graph Gi = (Vi, Ei)

Vi – one vertex for each variable

Ei – an edge between each pair of interfering variables

Two variables interfere if their lifetimes overlap

Compute the chromatic number χ(Gi)

Color assignment = register assignment

NP-complete in general
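The following is a minimal Python sketch of this model. It assumes that each lifetime is a half-open interval [start, end) of program points. This is a simplification: in a real control-flow graph, a lifetime is a set of program points. The names `build_interference_graph` and `greedy_color` are hypothetical. Computing χ(Gi) exactly is NP-complete in general, so the coloring shown is a simple greedy heuristic. On other graphs it may use more than χ(Gi) colors.

```python
from itertools import combinations

def build_interference_graph(lifetimes):
    """lifetimes maps each variable to a half-open interval [start, end) of program points."""
    graph = {v: set() for v in lifetimes}
    for a, b in combinations(lifetimes, 2):
        (s1, e1), (s2, e2) = lifetimes[a], lifetimes[b]
        if s1 < e2 and s2 < e1:  # lifetimes overlap, so the variables interfere
            graph[a].add(b)
            graph[b].add(a)
    return graph

def greedy_color(graph, order):
    """Give each vertex the smallest color (1, 2, ...) not used by an already-colored neighbor."""
    color = {}
    for v in order:
        used = {color[u] for u in graph[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

lifetimes = {"X": (0, 4), "Y": (2, 6), "Z": (5, 8)}  # hypothetical lifetimes
g = build_interference_graph(lifetimes)
colors = greedy_color(g, ["X", "Y", "Z"])
print(colors)  # {'X': 1, 'Y': 2, 'Z': 1}
```

Here X and Z never overlap, so they share a color (register), and two registers suffice.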

Page 4

Local Interferences

Local interferences – within a single procedure: overlapping lifetimes

Static Single Assignment (SSA) Form: the interference graph is chordal

[Figure: interference example over variables X, Y, and Z]
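The chordality claim can be checked on a concrete graph with a standard test, which the slides do not prescribe. Maximum cardinality search (MCS, due to Tarjan and Yannakakis) visits vertices in an order whose reverse is a perfect elimination order exactly when the graph is chordal. This sketch assumes a graph stored as a dict of neighbor sets. All function names are hypothetical, and the verification step is a simple clique check rather than the linear-time version.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the vertex with the most visited neighbors."""
    weight = {v: 0 for v in graph}
    unvisited = set(graph)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        unvisited.remove(v)
        order.append(v)
        for u in graph[v]:
            if u in unvisited:
                weight[u] += 1
    return order

def is_perfect_elimination_order(graph, order):
    """True if, for every vertex, its neighbors later in `order` form a clique."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        later = [u for u in graph[v] if pos[u] > pos[v]]
        for i, a in enumerate(later):
            for b in later[i + 1:]:
                if b not in graph[a]:
                    return False
    return True

def is_chordal(graph):
    """A graph is chordal iff the reverse of an MCS visit order is a perfect elimination order."""
    return is_perfect_elimination_order(graph, mcs_order(graph)[::-1])

def undirected(edges):
    """Build a dict-of-sets undirected graph from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

square = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
print(is_chordal(square))  # False: the 4-cycle a-b-c-d has no chord
square_with_chord = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")])
print(is_chordal(square_with_chord))  # True
```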

Page 5

Global Interferences

Global interferences: if variable V is live across a call to procedure P, then V interferes with EVERY local variable in P

It also interferes with all variables in all procedures reachable from P

We must consider all paths through the call graph

Example:

  Main: V … Call P … V

  P: Call Q

  Q: …

  Call graph: Main → P → Q
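This is a small sketch of the global-interference rule. It assumes a call graph stored as a dict from each procedure to its callees, plus a hypothetical `locals_of` map from each procedure to its local variables. A variable live across a call to P interferes with every local of every procedure reachable from P. The sketch finds those procedures with a breadth-first search.

```python
from collections import deque

def reachable_procedures(call_graph, start):
    """Every procedure reachable from `start` in the call graph, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in call_graph.get(p, ()):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen

def global_interferences(callee, call_graph, locals_of):
    """Variables that any variable live across a call to `callee` must interfere with."""
    result = set()
    for p in reachable_procedures(call_graph, callee):
        result |= locals_of.get(p, set())
    return result

call_graph = {"Main": ["P"], "P": ["Q"], "Q": []}
locals_of = {"Main": {"V"}, "P": {"p1", "p2"}, "Q": {"q1"}}  # hypothetical local variables
print(sorted(global_interferences("P", call_graph, locals_of)))  # ['p1', 'p2', 'q1']
```

In this example, V (live across `Call P`) interferes with the locals of both P and Q.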

Page 6

Global Interferences and Recursion

Fact: no register can hold a local variable across a recursive function call; a runtime stack is required

Some exceptions exist (e.g., static local variables); they are ignored here

Call graph: compute the strongly connected components (SCCs) and collapse each SCC into a single node; the resulting “Augmented Component Graph” is acyclic
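Any SCC algorithm can collapse the SCCs; the slides do not say which one. This sketch uses Tarjan's algorithm on a call graph given as a dict of callee lists, then builds the acyclic component graph. Tarjan's algorithm is written recursively here, so very deep call graphs may hit Python's recursion limit. The function names and the example call graph are illustrative only. In the example, hypothetical procedures A and B are mutually recursive.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps each node to a list of successors."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []

    def visit(v):
        index[v] = len(index)  # discovery order
        low[v] = index[v]
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop it off the stack
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

def component_graph(graph):
    """Collapse each SCC into one node; edges inside an SCC (recursion) disappear."""
    sccs = strongly_connected_components(graph)
    comp_of = {v: c for c in sccs for v in c}
    dag = {c: set() for c in sccs}
    for v, succs in graph.items():
        for w in succs:
            if comp_of[v] != comp_of[w]:
                dag[comp_of[v]].add(comp_of[w])
    return dag

calls = {"main": ["A"], "A": ["B"], "B": ["A", "C"], "C": []}  # A and B are mutually recursive
dag = component_graph(calls)
for comp, succs in dag.items():
    print(sorted(comp), "->", [sorted(s) for s in succs])
```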

Page 7

Interprocedural Register Allocation

Interprocedural Interference Graph (IIG): undirected graph G = (V, E)

V – all variables in all procedures

E – local AND global interferences

Compute the chromatic number χ(G)

Page 8

Related Work

Interprocedural register allocation in HLS: color the IIG with a heuristic [Vemuri et al., TODAES ’02]

The IIG is large; polynomial heuristics are still slow

Scalable approach [Beidas and Zhu, ASP-DAC ’05]: color each procedure individually

Use any heuristic you want; use any intermediate representation you want

Propagate global interferences at call points; the IIG is never built

Page 9

Contribution

Interprocedural register allocation: an optimal, polynomial-time, scalable algorithm

The IIG is never built; if it were built, it would be chordal

Each procedure is colored individually; in SSA form, each interference graph is chordal

Special case of [Beidas and Zhu, ASP-DAC ’05]: top-down color propagation, a novel SSA-based intermediate representation, and chordal color assignment (with offset)

Page 10

Preallocation of Global Registers

Global registers hold variables that are live across procedure calls. How many do we need?

Pi – procedure

ck – call point (ck : Call Pj)

Pi → Pj via ck – procedure call

P – set of procedures in the application

L(ck) – set of variables live across ck

Page 11

Preallocation of Global Registers

Compute δ, the number of variables live at the entry of a procedure and across a call point:

Across a call point ck in procedure Pi (δi is known):

$$\delta_k = \delta_i + |L(c_k)|$$

At the entry of procedure Pi, over all call points c1, …, cm that call Pi:

$$\delta_i = \max_{1 \le k \le m} \{\delta_k\}$$

Page 12

Example

Call graph: P1 calls P2 (at c7 and c8), P3 (c9), P4 (c10), and P6 (c11); P2 calls P5 (c12); P3 calls P5 (c13); P4 calls P6 (c14). All δ values are initialized to 0.

ci     |L(ci)|
c7     1
c8     2
c9     3
c10    2
c11    5
c12    3
c13    3
c14    2

δ1 = 0

δ7 = |L(c7)| + δ1 = 1 + 0 = 1

δ8 = |L(c8)| + δ1 = 2 + 0 = 2

δ9 = |L(c9)| + δ1 = 3 + 0 = 3

δ10 = |L(c10)| + δ1 = 2 + 0 = 2

δ11 = |L(c11)| + δ1 = 5 + 0 = 5

δ2 = MAX{δ7, δ8} = MAX{1, 2} = 2

δ3 = MAX{δ9} = MAX{3} = 3

δ4 = MAX{δ10} = MAX{2} = 2

δ12 = |L(c12)| + δ2 = 3 + 2 = 5

δ13 = |L(c13)| + δ3 = 3 + 3 = 6

δ14 = |L(c14)| + δ4 = 2 + 2 = 4

δ5 = MAX{δ12, δ13} = MAX{5, 6} = 6

δ6 = MAX{δ11, δ14} = MAX{5, 4} = 5

Final values:

Pi     δi
P1     0
P2     2
P3     3
P4     2
P5     6
P6     5

ck     δk
c7     1
c8     2
c9     3
c10    2
c11    5
c12    5
c13    6
c14    4
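The δ recurrences can be evaluated in one pass over the acyclic call graph. This sketch makes two assumptions. First, procedures are processed in topological order, so every δk for the calls into Pi is known before δi is computed. Second, a procedure with no callers starts at δ = 0, as P1 does above. The code, whose names are hypothetical, encodes the Page 12 example and reproduces its values. It also computes the global register count N = MAX{δi}.

```python
from collections import defaultdict, deque

def compute_deltas(procedures, call_points, live_count):
    """
    procedures:  procedure names of an acyclic call graph (e.g. after collapsing SCCs).
    call_points: call point -> (caller, callee).
    live_count:  call point -> |L(ck)|.
    Returns (delta_proc, delta_call).
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    pending = {p: 0 for p in procedures}  # unprocessed call points into each procedure
    for c, (caller, callee) in call_points.items():
        outgoing[caller].append(c)
        incoming[callee].append(c)
        pending[callee] += 1
    ready = deque(p for p in procedures if pending[p] == 0)
    delta_proc, delta_call = {}, {}
    while ready:  # Kahn-style topological traversal
        p = ready.popleft()
        # delta_i = MAX{delta_k} over call points that call P_i (0 if there are none)
        delta_proc[p] = max((delta_call[c] for c in incoming[p]), default=0)
        for c in outgoing[p]:
            delta_call[c] = delta_proc[p] + live_count[c]  # delta_k = delta_i + |L(c_k)|
            callee = call_points[c][1]
            pending[callee] -= 1
            if pending[callee] == 0:
                ready.append(callee)
    if len(delta_proc) != len(procedures):
        raise ValueError("call graph has a cycle; collapse SCCs first")
    return delta_proc, delta_call

procedures = ["P1", "P2", "P3", "P4", "P5", "P6"]
call_points = {
    "c7": ("P1", "P2"), "c8": ("P1", "P2"), "c9": ("P1", "P3"), "c10": ("P1", "P4"),
    "c11": ("P1", "P6"), "c12": ("P2", "P5"), "c13": ("P3", "P5"), "c14": ("P4", "P6"),
}
live_count = {"c7": 1, "c8": 2, "c9": 3, "c10": 2, "c11": 5, "c12": 3, "c13": 3, "c14": 2}

delta_proc, delta_call = compute_deltas(procedures, call_points, live_count)
print(delta_proc)  # {'P1': 0, 'P2': 2, 'P3': 3, 'P4': 2, 'P5': 6, 'P6': 5}
N = max(delta_proc.values())
print(N)  # 6
```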

Page 13

Preallocation of Global Registers

When procedure Pi is called, at most δi variables are live across the calls leading to Pi

This holds for every path in the call graph

How do we ensure that all variables live across calls leading to Pi are assigned to the right registers?

Allocate N global registers T = {T1, …, TN}, where

$$N = \max_{P_i \in P} \{\delta_i\}$$

Page 14

Launch and Landing Pads

Procedure Pi calls Pj; let m = δi. The variables live across calls leading to Pi are assigned to T1…Tm

Let ck be the call point and n = |L(ck)|

Launch pad: a parallel copy placed before the call

$$(T_{m+1}, \dots, T_{m+n}) \leftarrow \psi(L(c_k))$$

Landing pad: copy the values back after the call

$$L(c_k) \leftarrow \psi(T_{m+1}, \dots, T_{m+n})$$

Since m + n = δi + |L(ck)| = δk ≤ δj ≤ N, the registers Tm+1…Tm+n always exist
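This sketch shows how the two parallel copies for one call point could be generated. It assumes that L(ck) is given as an ordered list. Each copy is represented as a list of (destination, source) pairs, all executed together as a single parallel copy. The function name and the string names `T1`, `T2`, … for global registers are hypothetical.

```python
def launch_and_landing_pads(m, live_vars):
    """
    m:         delta_i of the calling procedure P_i (T1..Tm already hold values live
               across the calls leading to P_i).
    live_vars: the variables in L(ck), in some fixed order.
    Returns (launch, landing): lists of (destination, source) pairs, each list meant
    to be executed as one parallel copy.
    """
    targets = [f"T{m + j}" for j in range(1, len(live_vars) + 1)]
    launch = list(zip(targets, live_vars))   # (T_{m+1}..T_{m+n}) <- psi(L(ck)), before the call
    landing = list(zip(live_vars, targets))  # L(ck) <- psi(T_{m+1}..T_{m+n}), after the call
    return launch, landing

launch, landing = launch_and_landing_pads(2, ["a", "b", "c"])
print(launch)   # [('T3', 'a'), ('T4', 'b'), ('T5', 'c')]
print(landing)  # [('a', 'T3'), ('b', 'T4'), ('c', 'T5')]
```

The highest register used is T(m+n) = T5, which matches δk = δi + |L(ck)| = 2 + 3.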

Page 15

Theoretical Consequences of Launch and Landing Pads

Theorem: all global interferences involve at least one global register

Corollary: local variables in distinct procedures do not interfere

Corollary: no local variable in “main” has a global interference

Theorem: every variable defined locally in Pi (m = δi) interferes with global registers T1…Tm and does NOT interfere with global registers Tm+1, …, TN

=> Local variables in Pi can be assigned to global registers Tm+1, …, TN

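Page 16

Reducing the Chromatic Number

Procedure A:

  V …
  Call B
  W …
  … V
  X …
  … W
  Y …
  … X
  Call B
  … Y

Procedure B:

  Z …
  … Z

[Figure: interference graph over V, W, X, Y, and Z]

V and Y are live across the calls to B, so both interfere with Z; together with the local interferences V–W, W–X, and X–Y, this forms an odd cycle

Chromatic Number = 3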

Page 17

Reducing the Chromatic Number

Procedure A:

  V …
  T1 ← Ψ(V)
  Call B
  V ← Ψ⁻¹(T1)
  W …
  … V
  X …
  … W
  Y …
  … X
  T1 ← Ψ(Y)
  Call B
  Y ← Ψ⁻¹(T1)
  … Y

Procedure B:

  Z …
  … Z

[Figure: interference graph over V, W, X, Y, Z, and T1]

With launch and landing pads, only T1 is live across the calls to B, so Z interferes only with T1

Chromatic Number = 2
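Exhaustive search confirms the two chromatic numbers; it is only practical for tiny graphs like these. The edge lists are a reading of the Procedure A and B listings on Pages 16 and 17, not data taken from the slide figures. Without pads, V-W, W-X, and X-Y are local interferences, and Z interferes with V and Y. With pads, Z interferes only with T1. T1 is assumed not to interfere with V or Y, because each copy ends one lifetime where the other begins.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Exact chromatic number by exhaustive search (only sensible for tiny graphs)."""
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            assign = dict(zip(vertices, colors))
            if all(assign[a] != assign[b] for a, b in edges):
                return k
    return 0  # empty graph

# Page 16: V and Y live across calls to B, so both interfere with Z (a 5-cycle).
without_pads = [("V", "W"), ("W", "X"), ("X", "Y"), ("Z", "V"), ("Z", "Y")]
print(chromatic_number(["V", "W", "X", "Y", "Z"], without_pads))  # 3

# Page 17: only T1 is live across the calls, so Z interferes with T1 alone.
with_pads = [("V", "W"), ("W", "X"), ("X", "Y"), ("T1", "Z")]
print(chromatic_number(["V", "W", "X", "Y", "Z", "T1"], with_pads))  # 2
```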

Page 18

Characterizing the IIG

Theorem: T is a clique in the IIG

Theorem: the IIG is chordal

Theorem: the chromatic number of the IIG is

$$R = \max_{P_i \in P} \{\delta_i + \chi(G_i)\}$$

Page 19

Example

T1, T2, T3, T4, T5, T6 form a CLIQUE (N = 6)

G1: δ1 = 0

G2: δ2 = 2

G3: δ3 = 3

G4: δ4 = 2

G5: δ5 = 6

G6: δ6 = 5

Global interference: Tj interferes with each local variable in Gi, for 1 ≤ j ≤ δi

Page 20

Coloring Algorithm

1. Use SSA+LLP (launch and landing pad) form, but DON’T build the IIG

2. For Pi, colors in the range 1..δi are unavailable

3. Color the local (chordal) interference graph Gi of Pi. Complexity: O(|Vi| + |Ei|)

4. For each vertex in Pi, replace color c with c + δi. Complexity: O(|Vi|)
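Steps 3 and 4 can be sketched as follows, assuming Gi is chordal and stored as a dict of neighbor sets. Greedy coloring in maximum-cardinality-search (MCS) order is a standard optimal method for chordal graphs, because each vertex's already-colored neighbors form a clique. The slides do not say which chordal coloring they use. This MCS picks the next vertex with a linear scan, so it is quadratic. The O(|Vi| + |Ei|) bound needs the usual bucket-based MCS. All names are hypothetical.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the vertex with the most visited neighbors."""
    weight = {v: 0 for v in graph}
    unvisited = set(graph)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        unvisited.remove(v)
        order.append(v)
        for u in graph[v]:
            if u in unvisited:
                weight[u] += 1
    return order

def color_chordal(graph):
    """Step 3: optimal coloring of a chordal graph by greedy coloring in MCS order (colors from 1)."""
    color = {}
    for v in mcs_order(graph):
        used = {color[u] for u in graph[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

def color_procedure(graph, delta_i):
    """Step 4: colors 1..delta_i belong to T1..T_delta_i, so shift every local color by delta_i."""
    return {v: c + delta_i for v, c in color_chordal(graph).items()}

def undirected(edges):
    """Build a dict-of-sets undirected graph from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

g_i = undirected([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])  # hypothetical chordal G_i
colors = color_procedure(g_i, 2)  # assume delta_i = 2
print(max(colors.values()))  # 5, i.e. delta_i + chi(G_i) = 2 + 3
```

The largest color used in Pi is δi + χ(Gi), so the maximum over all procedures is R.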

Page 21

Experiments

Applications taken from MediaBench and MiBench, written in C and compiled using Machine SUIF

Optimal color assignment

Compared to heuristics:

  Color palette propagation: Top-Down and Bottom-Up [Beidas and Zhu, ASP-DAC ’05]

  Heuristic color assignment [Matula and Beck, JACM ’83]

Page 22

Registers Allocated (Normalized to Optimal)

[Chart: registers allocated by Optimal, Top-Down, and Bottom-Up, normalized to Optimal]

Page 23

Runtime (Normalized to Optimal)

[Chart: runtime of Optimal, Top-Down, and Bottom-Up, normalized to Optimal]

Page 24

Runtime of Pegwit (Normalized to Optimal)

[Chart: runtime on pegwit for Optimal, Top-Down, and Bottom-Up, normalized to Optimal]

Page 25

Limitations

Global variables: interfere with all variables in the program; their lifetimes can still be analyzed

Static local variables: initialized on first access; hold their values across function calls

Function pointers: resolution is NP-complete

Page 26

Conclusion

Interprocedural register allocation in HLS: an optimal, polynomial-time algorithm

Uses SSA form + launch/landing pads; the IIG is a chordal graph

Scalable: no need to build the IIG; significantly faster than sub-optimal heuristics

A few limitations: global variables, local static variables, and function pointers (resolution is NP-complete)

Page 27

Related Work

Register allocation in HLS: clique partitioning/coloring problem [Tseng and Siewiorek, ’86]

Scheduled DFGs – interval graphs [Kurdahi and Parker, ’87]

Scheduled cyclic DFGs – circular-arc graphs (NP-complete) [Stok, ’92]

Restrictions on variable lifetimes – chordal graphs [Springer and Thomas, ’94]

Static Single Assignment form – chordal graphs [Brisk et al., 2005/6], [Hack and Goos, 2005/6], [Bouchez et al., 2005]