Page 1: Code Optimization

CS502: Compiler Design
Manas Thakur, Fall 2020

Page 2: Fast. Faster. Fastest?

(Figure: the compiler pipeline; the Symbol Table is shared by all phases.)

Front end:
    Character stream → Lexical Analyzer → Token stream
    → Syntax Analyzer → Syntax tree
    → Semantic Analyzer → Syntax tree
    → Intermediate Code Generator → Intermediate representation

Back end:
    → Machine-Independent Code Optimizer → Intermediate representation
    → Code Generator → Target machine code
    → Machine-Dependent Code Optimizer → Target machine code

Page 3: Role of Code Optimizer

● Make the program better

– time, memory, energy, ...

● No guarantees in this land!

– Will a particular optimization for sure improve something?

– Will performing an optimization affect something else?

– In what order should I perform the optimizations?

– At what “scope” should a certain optimization be performed?

– Is the optimizer fast enough?

● Can an optimized program be optimized further?

Page 4: Full employment theorem for compiler writers

● Statement: There is no fully optimizing compiler.

● Assume it exists:

– such that it transforms a program P to the smallest program Opt(P) that has the same behaviour as P.

– The halting problem comes to the rescue:
  ● The smallest program that never halts is:

        L1: goto L1

– Thus, a fully optimizing compiler could solve the halting problem: to decide whether a given program never halts, just check whether its optimized form is L1: goto L1!

– But the halting problem is undecidable.

– Hence, a fully optimizing compiler can’t exist!

● Therefore we talk just about an optimizing compiler.

– and keep working without worrying about future prospects!

Page 5: How to perform optimizations?

● Analysis

– Go over the program

– Identify some (potentially useful) properties

● Transformation

– Use the information computed by the analysis to transform the program

● without affecting the semantics

● An example that we have (not literally) seen:

– Compute liveness information

– Delete assignments to variables that are dead

Page 6: Classifying optimizations

● Based on scope:

– Local to basic blocks

– Intraprocedural

– Interprocedural

● Based on positioning:

– High-level (transform source code or high-level IR)

– Low-level (transform mid/low-level IR)

● Based on (in)dependence w.r.t. target machine:

– Machine independent (general enough)

– Machine dependent (specific to the architecture)

Page 7: May versus Must information

● Consider the program:

    if (c) {
        a = ...
        b = ...
    } else {
        a = ...
        c = ...
    }

● Which variables may be assigned?

– a, b, c

● Which variables must be assigned?

– a

● May analysis:

– the computed information may hold in at least one execution of the program.

● Must analysis:

– the computed information must hold every time the program is executed.

Page 8: Many, many optimizations

● Constant folding, constant propagation, tail-call elimination, redundancy elimination, dead code elimination, loop-invariant code motion, loop splitting, loop fusion, strength reduction, array scalarization, inlining, synchronization elision, cloning, data prefetching, parallelization, etc.

● How do they interact?

– Optimist: we get the sum of all improvements.

– Realist: many are in direct opposition.

● Let us study some of them!

Page 9: Constant propagation

● Idea:

– If the value of a variable is known to be a constant at compile-time, replace the use of the variable with the constant.

Before:
    n = 10;
    c = 2;
    for (i = 0; i < n; ++i)
        s = s + i * c;

After:
    n = 10;
    c = 2;
    for (i = 0; i < 10; ++i)
        s = s + i * 2;

– Usually a very helpful optimization

– e.g., Can we now unroll the loop?
  ● Why is it good?
  ● Why could it be bad?

– When can we eliminate n and c themselves?
  ● Now you know how well different optimizations might interact!

Page 10: Constant folding

● Idea:

– If the operands are known at compile-time, evaluate the expression at compile-time:

    r = 3.141 * 10;    becomes    r = 31.41;

– What if the code was?

    PI = 3.141;
    r = PI * 10;

  ● Constant propagation of PI followed by constant folding still yields r = 31.41.

– And what now?

    PI = 3.141;
    r = PI * 10;
    d = 2 * r;

  ● Propagation and folding applied repeatedly; this combination is called partial evaluation.
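As a sketch of how folding might be implemented, here is a minimal bottom-up constant folder over a hypothetical toy expression AST (the Expr type and fold routine are illustrative, not the course's code):

    /* Toy AST (hypothetical): a node is either a constant or a binary op. */
    typedef enum { CONST, BINOP } Kind;

    typedef struct Expr {
        Kind kind;
        double value;            /* valid when kind == CONST          */
        char op;                 /* '+' or '*' when kind == BINOP     */
        struct Expr *lhs, *rhs;  /* operands when kind == BINOP       */
    } Expr;

    /* Fold bottom-up: once both operands of a BINOP are constants,
       evaluate the operation at compile time (only + and * shown). */
    Expr *fold(Expr *e) {
        if (e->kind == BINOP) {
            e->lhs = fold(e->lhs);
            e->rhs = fold(e->rhs);
            if (e->lhs->kind == CONST && e->rhs->kind == CONST) {
                e->kind  = CONST;
                e->value = (e->op == '+') ? e->lhs->value + e->rhs->value
                                          : e->lhs->value * e->rhs->value;
            }
        }
        return e;
    }

On the tree for 3.141 * 10, fold rewrites the node to the constant 31.41, which is exactly the r = 31.41 transformation above.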

Page 11: Common sub-expression elimination

● Idea:

– If a program computes the same value multiple times, reuse the value.

– Subexpressions can be reused until operands are redefined.

Before:
    a = b + c;
    c = b + c;
    d = b + c;

After:
    t = b + c;
    a = t;
    c = t;
    d = b + c;    (cannot reuse t here: c was redefined)

Page 12: Copy propagation

● Idea:

– After an assignment x = y, replace the uses of x with y.

– Can only apply up to another assignment to x, or

... another assignment to y!

– What if there was an assignment y = z earlier?
  ● Apply transitively to all assignments.

Before:
    x = y;
    if (x > 1)
        s = x + f(x);

After:
    x = y;
    if (y > 1)
        s = y + f(y);
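For the transitive case, a small made-up illustration with an earlier copy y = z:

    y = z;                    y = z;
    x = y;          ==>       x = z;
    s = x + f(x);             s = z + f(z);

Both copies may then become dead and can be removed by dead-code elimination (next slide).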

Page 13: Dead-code elimination

● Idea:

– If the result of a computation is never used, remove the computation.

– Remove code that assigns to dead variables.
  ● Liveness analysis done before would help!

– This may, in turn, create more dead code.
  ● Dead-code elimination usually works transitively.

Before:
    x = y + 1;
    y = 1;
    x = 2 * z;

After:
    y = 1;
    x = 2 * z;

Page 14: Unreachable-code elimination

● Idea:

– Eliminate code that can never be executed

– High-level: look for if (false) or while (false)
  ● perhaps after constant folding!

– Low-level: more difficult
  ● Code is just labels and gotos
  ● Traverse the CFG, marking reachable blocks (a sketch follows below)

    #define DEBUG 0
    if (DEBUG)
        print("Current value = ", v);
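A minimal sketch of the low-level approach, assuming a simple array-based CFG (names and sizes are illustrative):

    #define MAXN 1024

    int nsucc[MAXN];        /* number of successors of block n        */
    int succ[MAXN][2];      /* successor block ids (at most two here) */
    int reachable[MAXN];    /* set to 1 for every block we can reach  */

    /* Depth-first traversal from the entry block; any block left
       unmarked afterwards is unreachable and can be deleted. */
    void mark(int n) {
        if (reachable[n]) return;
        reachable[n] = 1;
        for (int i = 0; i < nsucc[n]; i++)
            mark(succ[n][i]);
    }

Calling mark(entry) and then deleting every block b with reachable[b] == 0 implements the "traverse the CFG, marking reachable blocks" step.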

Page 15: Next class

● Next class:

– How to perform the optimizations that we have seen using a dataflow analysis?

● Starting with:

– The back-end full form of CFG (Control-Flow Graph)!

● Only about 10 more classes left.

– Hope this course is being successful in making (y)our hectic days a bit more exciting :-)

Page 16: Code Optimization (Cont.)

CS502: Compiler Design
Manas Thakur, Fall 2020

Page 17: Recall A2

● Is ‘a’ initialized in this program?

    int a;
    if (*) {
        a = 10;
    } else {
        // something that doesn’t touch ‘a’
    }
    x = a;

– Reality during run-time: depends

– What to tell at compile-time?
  ● Is this a ‘must’ question or a ‘may’ question?
  ● Correct answer: No

– How do we obtain such answers?
  ● Need to model the control-flow

Page 18: Control-Flow Graph (CFG)

● Nodes represent instructions; edges represent flow of control

    a = 0
L1: b = a + 1
    c = c + b
    a = b * 2
    if a < N goto L1
    return c

(CFG: a = 0 → b = a + 1 → c = c + b → a = b * 2 → a < N; the comparison branches back to b = a + 1 or on to return c.)

Page 19: Some CFG terminology

● pred[n] gives predecessors of n

– pred[1]? pred[4]? pred[2]?

● succ[n] gives successors of n

– succ[2]? succ[5]?

● def(n) gives variables defined by n

– def(3) = {c}

● use(n) gives variables used by n

– use(3) = {b, c}

(CFG nodes, numbered:)

    1: a = 0
    2: b = a + 1
    3: c = c + b
    4: a = b * 2
    5: a < N
    6: return c

(Edges: 1→2, 2→3, 3→4, 4→5, 5→2, 5→6.)
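In code, one plausible way to keep these per-node facts (a sketch; the bitset layout and field names are assumptions, not something prescribed by the course):

    #define MAXV 64                      /* assume at most 64 variables */
    typedef unsigned long long VarSet;   /* one bit per variable        */

    typedef struct Node {
        int id;
        int npred, nsucc;
        struct Node **pred, **succ;      /* pred[n] and succ[n]          */
        VarSet def, use;                 /* def(n) and use(n) as bitsets */
    } Node;

For node 3 (c = c + b), def would have only c's bit set, while use would have the bits for b and c, matching def(3) = {c} and use(3) = {b, c}.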

Page 20: Live ranges revisited

● A variable is live if its current value may be used in the future.

– Insight:
  ● work from future to past
  ● backward over the CFG

● Live ranges:

– a: {1->2, 4->5->2}

– b: {2->3, 3->4}

– c: All edges except 1->2

(Same CFG as before, nodes 1–6.)

Page 21: Liveness

● A variable v is live on an edge if there is a directed path from that edge to a use of v that does not go through any def of v.

● A variable is live-in at a node if it is live on any of the in-edges of that node.

● A variable is live-out at a node if it is live on any of the out-edges of that node.

● Verify:

– a: {1->2, 4->5->2}

– b: {2->4}

(Same CFG as before, nodes 1–6.)

Page 22: Computation of liveness

● Say live-in of n is in[n], and live-out of n is out[n].

● We can compute in[n] and out[n] for any n as follows:

    in[n]  = use[n] ∪ (out[n] – def[n])
    out[n] = ∪_{s ∈ succ[n]} in[s]

These are called dataflow equations; the functions on their right-hand sides are called flow functions.

Page 23: Liveness as an iterative dataflow analysis (IDFA)

    for each n                                      // Initialize
        in[n] = {}; out[n] = {}
    repeat
        for each n
            in’[n] = in[n]; out’[n] = out[n]        // Save previous values
            in[n]  = use[n] ∪ (out[n] – def[n])     // Compute new values
            out[n] = ∪_{s ∈ succ[n]} in[s]
    until in’[n] == in[n] and out’[n] == out[n] ∀n  // Repeat till fixed point
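A runnable C sketch of this algorithm, under the assumption that variables are numbered and each set is a 64-bit bitset (so ∪ becomes |, set difference becomes & ~, and set comparison becomes ==):

    #include <string.h>

    #define MAXN 1024
    typedef unsigned long long VarSet;   /* one bit per variable (≤64) */

    int N;                               /* number of CFG nodes        */
    int nsucc[MAXN], succ[MAXN][2];
    VarSet use[MAXN], def[MAXN];
    VarSet in[MAXN], out[MAXN];

    void liveness(void) {
        memset(in, 0, sizeof in);        /* in[n]  = {} */
        memset(out, 0, sizeof out);      /* out[n] = {} */
        int changed = 1;
        while (changed) {                /* repeat ... until fixed point */
            changed = 0;
            for (int n = 0; n < N; n++) {
                VarSet in0 = in[n], out0 = out[n];    /* save previous  */
                out[n] = 0;
                for (int i = 0; i < nsucc[n]; i++)
                    out[n] |= in[succ[n][i]];         /* ∪ over succ[n] */
                in[n] = use[n] | (out[n] & ~def[n]);  /* use ∪ (out–def) */
                if (in[n] != in0 || out[n] != out0)
                    changed = 1;
            }
        }
    }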

Page 24: Liveness analysis example

(Worked on the CFG with nodes 1–6 from before, repeatedly applying

    in[n]  = use[n] ∪ (out[n] – def[n])
    out[n] = ∪_{s ∈ succ[n]} in[s]

until the sets stop changing; the slide's iteration table reaches a fixed point.)

Page 25: In backward order

(Same CFG, nodes 1–6, but now the nodes are processed in backward order.)

● Fixed point in only 3 iterations!

● Thus, the order of processing statements is important for efficiency.

Page 26: Complexity of our liveness computation algorithm

● For an input program of size N:

– ≤ N nodes in the CFG, and ≤ N variables
  ⇒ ≤ N elements per in/out set
  ⇒ O(N) time per set union

– The for loop performs a constant number of set operations per node
  ⇒ O(N²) time per pass of the for loop

– Each iteration of the for loop can only add to each set (monotonicity)

– The sizes of all in and out sets sum to at most 2N², bounding the number of iterations of the repeat loop
  ⇒ worst-case complexity of O(N⁴)

– Much less in practice (usually O(N) or O(N²)) if the nodes are ordered properly.

    repeat
        for each n
            in’[n] = in[n]; out’[n] = out[n]
            in[n]  = use[n] ∪ (out[n] – def[n])
            out[n] = ∪_{s ∈ succ[n]} in[s]
    until in’[n] == in[n] and out’[n] == out[n] ∀n

Page 27: Least fixed points

● There is often more than one solution for a given dataflow problem.

– Any solution to dataflow equations is a conservative approximation.

● Conservatively assuming a variable is live does not break the program:

– Just means more registers may be needed.

● Assuming a variable is dead when it is really live will break things.

● Many possible solutions; but we want the smallest: the least fixed point.

● The iterative algorithm computes this least fixed point.

Page 28: Confused!?

● Is compilers a theoretical topic or a practical one?

● Recall:

– “A sangam (confluence) of theory and practice.”

● Next class:

– We are not leaving a topic as important as IDFA so soon!

Page 29: Code Optimization (Cont.)

CS502: Compiler Design
Manas Thakur, Fall 2020

Page 30: Recall our IDFA algorithm

    for each n                                      // Initialize
        in[n] = ...; out[n] = ...
    repeat
        for each n
            in’[n] = in[n]; out’[n] = out[n]        // Save previous values
            in[n]  = ...                            // Compute new values
            out[n] = ...
    until in’[n] = in[n] and out’[n] = out[n] for all n   // Repeat till fixed point

Do we need to process all the nodes in each iteration?

Page 31: Worklist-based Implementation of IDFA

● Initialize a worklist of statements

● Forward analysis:

– Start with the entry node

– If OUT(n) changes, then add succ(n) to the worklist

● Backward analysis:

– Start with the exit node

– If IN(n) changes, then add pred(n) to the worklist

● In both cases, iterate till a fixed point.
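A C sketch of the backward case (liveness), reusing the bitset representation from earlier; the stack-based worklist and the onlist de-duplication flag are implementation choices, not part of the slide:

    #define MAXN 1024
    typedef unsigned long long VarSet;

    int N;
    int npred[MAXN], pred[MAXN][4];   /* predecessor ids (capped for the sketch) */
    int nsucc[MAXN], succ[MAXN][2];
    VarSet use[MAXN], def[MAXN], in[MAXN], out[MAXN];

    int worklist[MAXN], top;   /* stack of node ids awaiting processing */
    int onlist[MAXN];          /* avoids duplicate worklist entries     */

    static void push(int n) {
        if (!onlist[n]) { onlist[n] = 1; worklist[top++] = n; }
    }

    void liveness_worklist(int exit_node) {
        push(exit_node);                       /* start with the exit node */
        while (top > 0) {
            int n = worklist[--top];
            onlist[n] = 0;
            out[n] = 0;
            for (int i = 0; i < nsucc[n]; i++)
                out[n] |= in[succ[n][i]];
            VarSet in0 = in[n];
            in[n] = use[n] | (out[n] & ~def[n]);
            if (in[n] != in0)                  /* IN(n) changed:           */
                for (int i = 0; i < npred[n]; i++)
                    push(pred[n][i]);          /* ... re-queue pred(n)     */
        }
    }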

Page 32: Writing an IDFA (Cont.)

● Initialization of IN and OUT sets depends on the analysis:

– empty if the information grows

– all the nodes if the information shrinks

● Requirement for termination:

– unidirectional growth/shrinkage

– Called monotonicity

● Confluence/Meet operation (at control-flow merges):

– Union, or

– Intersection (which one depends on the analysis)

Page 33: Live-variable analysis revisited

● Direction:

– Backward

● Initialization:

– Empty sets

● Flow functions:

– out[n] = ∪_{s ∈ succ[n]} in[s]
– in[n]  = use[n] ∪ (out[n] – def[n])

● Confluence operation:

– Union

Page 34: Common sub-expressions revisited

● Idea:

– If a program computes the same value multiple times, reuse the value.

– Subexpressions can be reused until operands are redefined.

– Given a node n, the expressions computed at n are denoted gen(n), and the ones killed at n (their operands redefined) are denoted kill(n).

Before:
    a = b + c;
    c = b + c;
    d = b + c;

After:
    t = b + c;
    a = t;
    c = t;
    d = b + c;

Page 35: Common subexpressions as an IDFA

● Direction:

– Forward

● Initialization:

– Empty set at the entry; by the rule on the previous slide, the remaining sets start as the full set of expressions, since available-expression information shrinks under intersection.

● Flow functions:

– in[n]  = ∩_{p ∈ pred[n]} out[p]
– out[n] = gen[n] ∪ (in[n] – kill[n])

● Confluence operation:

– Intersection
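A C sketch mirroring the earlier liveness code, but forward and with intersection at merges; the full-set initialization for non-entry nodes follows the shrinking-information rule above (all names are illustrative):

    #define MAXN 1024
    typedef unsigned long long ExprSet;   /* one bit per expression (≤64) */

    int N;                                /* node 0 is the entry          */
    int npred[MAXN], pred[MAXN][4];
    ExprSet gen[MAXN], kill[MAXN];
    ExprSet in[MAXN], out[MAXN];

    void available_expressions(void) {
        ExprSet all = ~0ULL;              /* full set: information shrinks */
        for (int n = 0; n < N; n++) { in[n] = all; out[n] = all; }
        in[0] = 0;                        /* nothing available at entry    */
        out[0] = gen[0];
        int changed = 1;
        while (changed) {
            changed = 0;
            for (int n = 1; n < N; n++) {
                ExprSet i = all;
                for (int p = 0; p < npred[n]; p++)
                    i &= out[pred[n][p]];             /* ∩ over pred[n]    */
                ExprSet o = gen[n] | (i & ~kill[n]);  /* gen ∪ (in – kill) */
                if (i != in[n] || o != out[n]) {
                    in[n] = i; out[n] = o; changed = 1;
                }
            }
        }
    }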

Page 36: Are we efficient enough?

● When can IDFAs take a lot of time?

● Which operations could be expensive?

– Confluence

– Equality

● Compilers may have to perform several IDFAs.

● How can we make an IDFA more efficient (perhaps with some loss of precision)?

    repeat
        for each n
            in’[n] = in[n]; out’[n] = out[n]
            in[n]  = use[n] ∪ (out[n] – def[n])
            out[n] = ∪_{s ∈ succ[n]} in[s]
    until in’[n] == in[n] and out’[n] == out[n] ∀n

Page 37: Basic Blocks

    a = 0
L1: b = a + 1
    c = c + b
    a = b * 2
    if a < N goto L1
    return c

Each instruction as a node: six CFG nodes, one per statement.

Using basic blocks: three nodes:

    [ a = 0 ]
    [ b = a + 1 ; c = c + b ; a = b * 2 ; a < N ]
    [ return c ]

Page 38: Basic Blocks (Cont.)

● Idea:

– Once execution enters a basic block, all statements are executed in sequence.

– Single-entry, single-exit region

● Details:

– Starts with a label

– Ends with one or more branches

– Edges may be labeled with predicates
  ● True/false
  ● Exceptions

● Key: Improve efficiency, with reasonable precision.
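A small C sketch of carving blocks out of a straight-line instruction array using the classic "leader" rule (array names are illustrative):

    #define MAXI 4096

    int M;                    /* number of instructions               */
    int is_branch[MAXI];      /* 1 for jumps and conditional branches */
    int target[MAXI];         /* branch-target index, or -1           */
    int leader[MAXI];         /* 1 if a basic block starts here       */

    /* A block starts at the first instruction, at every branch target,
       and right after every branch; it runs up to the next leader. */
    void find_leaders(void) {
        leader[0] = 1;
        for (int i = 0; i < M; i++) {
            if (is_branch[i]) {
                if (target[i] >= 0) leader[target[i]] = 1;
                if (i + 1 < M)      leader[i + 1] = 1;
            }
        }
    }

On the previous slide's example this yields exactly the three blocks shown there.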

Page 39: Have you got a compiler’s eyes yet?

● What properties can you identify about this program?

● What’s the advantage if it was rewritten as follows?

● Def-use becomes explicit.

Before:
    S1: y = 1;
    S2: y = 2;
    S3: x = y;

After renaming:
    S1: y1 = 1;
    S2: y2 = 2;
    S3: x = y2;

Page 40: Static Single Assignment (SSA)

● A form of IR in which each use can be mapped to a single definition.

– Achieved using variable renaming and phi nodes.

● Many compilers use SSA form in their IRs.

Before:
    if (flag) x = -1;
    else x = 1;
    y = x * a;

After SSA conversion:
    if (flag) x1 = -1;
    else x2 = 1;
    x3 = Φ(x1, x2);
    y = x3 * a;

Page 41: SSA Classwork

● Convert the following program to SSA form:

– (Hint: First convert to 3AC)

Original program:
    x = 0;
    for (i = 0; i < N; ++i) {
        x += i;
        i = i + 1;
        x--;
    }
    x = x + i;

SSA form (after conversion to 3AC):
    x1 = 0;
    i1 = 0;
    L1: i13 = Φ(i1, i3);
        if (i13 < N) {
            x13 = Φ(x1, x3);
            x2 = x13 + i13;
            i2 = i13 + 1;
            x3 = x2 – 1;
            i3 = i2 + 1;
            goto L1;
        }
    x4 = Φ(x1, x3);
    x5 = x4 + i13;

Page 42: Effect of SSA on Register Allocation!?

● What is the effect of SSA form on liveness?

● What does SSA do?

– Breaks a single variable into multiple instances

– Instances represent distinct, non-overlapping uses

● Effect:

– Breaks up live ranges; often improves register allocation

(Figure: the single live range of x splits into the shorter live ranges of x1 and x2.)

Page 43: Featuring Next in Code Optimization

● Heard of the 80-20 or 90-10 rule?

– X% of time is spent in executing y% of the code, where X >> y.

● Which kinds of code portions tend to form the region ‘y’ in typical programs?

– Loops

– Methods

● Tomorrow: Loop optimizations

Page 44: Loop Optimizations

CS502: Compiler Design
Manas Thakur, Fall 2020

Page 45: Why optimize loops?

● Loops form a significant portion of the time spent in executing programs.

    for (i = 0; i < N; i++) {
        S1;
        S2;
    }

– If N is just 10000 (not uncommon), we have too many instructions!
  ● How many in the loop above?

– What if S1/S2 is/are function calls?

● Loops involve costly instructions in each iteration:

– Comparisons

– Jumps

Page 46: What is a loop?

● A loop in a CFG is a set of nodes S such that:

– There is a designated header node h in S

– There is a path from each node in S to h

– There is a path from h to each node in S

– h is the only node in S with an incoming edge from outside S

Page 47: Are all these loops?

(Figure: several candidate CFGs.)

Page 48: What about these?

(Figure: more candidate CFGs.)

Page 49: Identifying loops using dominators

● A node d dominates a node n if every path from entry to n goes through d.

● Compute dominators of each node:

Page 50: Flow function for computing dominators

● Assuming D[i] is the set of dominators of node i:

    D[entry] = {entry}
    D[n] = {n} ∪ (∩_{p ∈ pred[n]} D[p])
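A C sketch that iterates these equations to a fixed point; following the initialization rule from the "Writing an IDFA" slide, every D[n] except the entry starts as the full node set, since dominator information shrinks:

    #define MAXN 64                      /* ≤64 nodes: a set fits a word */
    typedef unsigned long long NodeSet;

    int N;
    int npred[MAXN], pred[MAXN][4];
    NodeSet D[MAXN];                     /* D[n] = dominators of node n  */

    void dominators(int entry) {
        NodeSet all = (N == 64) ? ~0ULL : (1ULL << N) - 1;
        for (int n = 0; n < N; n++) D[n] = all;
        D[entry] = 1ULL << entry;        /* D[entry] = {entry}           */
        int changed = 1;
        while (changed) {
            changed = 0;
            for (int n = 0; n < N; n++) {
                if (n == entry) continue;
                NodeSet d = all;
                for (int p = 0; p < npred[n]; p++)
                    d &= D[pred[n][p]];  /* ∩ over pred[n]               */
                d |= 1ULL << n;          /* ∪ {n}                        */
                if (d != D[n]) { D[n] = d; changed = 1; }
            }
        }
    }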

Page 51: Identifying loops using dominators (Cont.)

● First, identify a back edge:

– An edge from a node n to another node h, where h dominates n

● Each back edge leads to a loop:

– The set X of nodes such that for each x ∈ X, h dominates x and there is a path from x to n not containing h

– h is the header

● Verify:

Page 52: Loop-Invariant Code Motion (LICM)

● Loop-invariant code:

– d: t = a OP b is loop-invariant if:
  ● a and b are constants; or
  ● all the definitions of a and b that reach d are outside the loop; or
  ● only one definition each of a and b reaches d, and that definition is loop-invariant.

● Example:

    L0: t = 0
    L1: i = i + 1
        t = a * b      (loop-invariant)
        M[i] = t
        if i < N goto L1
    L2: x = t

Page 53: LICM: Get ready for code hoisting

● Can we always hoist loop-invariant code?

● Criteria for hoisting d: t = a OP b:

– d dominates all loop exits at which t is live-out, and

– there is only one definition of t in the loop, and

– t is not live-out of the loop preheader

● How can we hoist code in the two variants below (shown in pink and orange on the slide)?

Original loop:

    L0: t = 0
    L1: i = i + 1
        t = a * b
        M[i] = t
        if i < N goto L1
    L2: x = t

Variant 1 (the loop may execute zero times, so t = a * b does not dominate the exit at which t is live-out):

    L0: t = 0
    L1: if i >= N goto L2
        i = i + 1
        t = a * b
        M[i] = t
        goto L1
    L2: x = t

Variant 2 (M[j] = t uses t before it is redefined, i.e., t is live-out of the loop preheader):

    L0: t = 0
    L1: M[j] = t
        i = i + 1
        t = a * b
        M[i] = t
        if i < N goto L1
    L2: x = t

Page 54: Induction-variable optimization

● Induction variables:

– Variables whose value depends on iteration variable

● Optimization:

– Compute them efficiently, if possible

Before:

    s = 0
    i = 0
    L1: if i >= N goto L2
        j = i * 4
        k = j + a
        x = M[k]
        s = s + x
        i = i + 1
        goto L1
    L2:

After:

    s = 0
    k’ = a
    b = N * 4
    c = a + b
    L1: if k’ >= c goto L2
        x = M[k’]
        s = s + x
        k’ = k’ + 4
        goto L1
    L2:

Page 55: Loop unrolling

● Minimize the number of increments and condition-checks

● Be careful about the increase in code size (I-cache misses!)

Original:

    L1: x = M[i]
        s = s + x
        i = i + 4
        if i < N goto L1
    L2:

Unrolled by a factor of 2 (correct only for an even number of iterations):

    L1: x = M[i]
        s = s + x
        x = M[i+4]
        s = s + x
        i = i + 8
        if i < N goto L1
    L2:

Unrolled, handling any number of iterations (an epilogue finishes the leftovers):

        if i < N-8 goto L1
        goto L2
    L1: x = M[i]
        s = s + x
        x = M[i+4]
        s = s + x
        i = i + 8
        if i < N-8 goto L1
    L2: x = M[i]
        s = s + x
        i = i + 4
        if i < N goto L2
    L3:

Page 56: Loop interchange

● A C/Java programmer starting with MATLAB:

    for i=1:1000,
        for j=1:1000,
            a(i) = a(i) + b(i,j)*c(i)
        end
    end

● But MATLAB stores matrices in column-major order!

● Implication?

– Cache misses (perhaps in each iteration)!

● Solution (interchange the loops!):

    for j=1:1000,
        for i=1:1000,
            a(i) = a(i) + b(i,j)*c(i)
        end
    end
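The same reasoning applies to C, mirrored: C stores arrays in row-major order, so in C the cache-friendly nesting keeps the column index j innermost. A sketch:

    /* C is row-major: b[i][j] and b[i][j+1] are adjacent in memory,
       so keeping j innermost walks each row sequentially. */
    void sum_rows(double a[1000], double b[1000][1000], double c[1000]) {
        for (int i = 0; i < 1000; i++)
            for (int j = 0; j < 1000; j++)
                a[i] += b[i][j] * c[i];
    }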

Page 57: Many more loop optimizations

● Loop fusion
● Loop fission
● Loop inversion
● Loop tiling
● Loop unswitching
● . . .

(Next class!)

● Vectorization
● Parallelization

(Some other time!)