Optimizing Compiler. Scalar optimizations.

Software & Services Group, Developer Products Division. Copyright © 2010, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 10/17/10



Main characteristics of an application that affect its performance

• Efficiency of calculations
• Effectiveness of memory usage
• Correct branch prediction
• Efficient use of vector instructions
• Effectiveness of parallelization
• Level of instruction-level parallelism


Optimizing compiler role

A compiler translates the entire source program into an equivalent program in machine code or assembly language.

The main objective of an optimizing compiler is to obtain efficient code for the target computer system.

From a developer's point of view, the program must be:
• easy to read and modify
• easy to debug
• fast to execute

A developer needs:
• a reliable, unified development environment
• the ability to vary the levels of debugging and performance
• the possibility of obtaining high-performance code for different operating systems and microprocessor architectures.


An optimizing compiler is a complex software system, driven by the requirements placed on the resulting code. Compiler developers face several difficulties: the complexity of proving that an optimization is legal, the estimation of its profitability, the lack of compile-time information about typical input data, etc. Achieving the best results therefore requires close cooperation with the application developer.

To use the features of the compiler successfully, the programmer must:
• understand the computer systems on which the application will run;
• know the compiler command-line options;
• learn the basic performance improvement techniques used by the compiler;
• be familiar with the main problems that cause application slowdown;
• have an idea of the input data the application will process;
• know how to analyze program performance.


Intel compilers

Intel provides C/C++ and Fortran compilers for Windows, Linux and Mac OS operating systems.

On Windows, the Intel compiler is provided as a plug-in for Microsoft Visual Studio.

The main goals of the Intel compilers are timely support for all new computer systems, compatibility with Microsoft Visual Studio on Windows and with gcc on Linux and Mac OS, and a convenient environment for developing efficient applications.

www.intel.com/software/products


[Figure: Two-pass and single-pass compilation scheme (-Qipo/-Qip). Blocks shown: source files; FE (C/C++ or Fortran); internal representation; profiler; scalar optimizations; loop optimizations; code generation; object files; temporary files or object files with IR; interprocedural optimizations; scalar optimizations; loop optimizations; code generation; executable file or library.]


Front End

Parsing is the process of analyzing a sequence of input characters, usually in accordance with a given formal grammar.

During parsing, the source code is converted into a data structure. Usually it is a tree that reflects the syntactic structure of the input sequence and is well suited for further processing.

Typically, parsing is divided into two levels:

• lexical analysis - the input stream of characters is partitioned into a linear sequence of tokens, the "words" of the language (e.g., integers, identifiers, string constants, etc.);

• syntactic analysis - tokens are combined into the statements and expressions of the language, according to its grammatical rules.

The output of the FE is a set of tables, called the internal representation (IR) of the program. The usual practice is to share one internal representation among the various high-level languages.
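The lexical analysis level described above can be illustrated with a minimal sketch. It only counts tokens and makes simplifying assumptions that a real front end does not: every punctuation character is a separate token (so "++" counts as two), and keywords are not distinguished from identifiers. The function name count_tokens is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count tokens in `src`: identifiers, integer literals, and
   single-character punctuation (simplification: "++" is two tokens). */
static int count_tokens(const char *src)
{
    assert(src != NULL);
    int count = 0;
    const unsigned char *s = (const unsigned char *)src;
    while (*s != '\0') {
        if (isspace(*s)) {              /* whitespace only separates tokens */
            s++;
            continue;
        }
        if (isalpha(*s) || *s == '_') { /* identifier or keyword */
            while (isalnum(*s) || *s == '_')
                s++;
        } else if (isdigit(*s)) {       /* integer literal */
            while (isdigit(*s))
                s++;
        } else {                        /* punctuation or operator character */
            s++;
        }
        count++;
    }
    return count;
}
```

For the loop body of the example that follows, count_tokens("a[i]=r;") sees the tokens a, [, i, ], =, r and ;.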



void sub(int *a, int k, int r)
{
    int i;

    for (i = 0; i < k; i++)
        a[i] = r;
}

Statements: STMT_ENTRY, STMT_ASSIGN, STMT_WHILE_DO, STMT_RETURN

The list of statements is the base structure of the internal representation.

Statements may be regarded as the smallest independent elements of the programming language. Statements describe assignments, flow control commands (such as IF, GOTO, CALL, RETURN), function calls, etc.


The statements are usually kept in a list and can be linked in two ways:
1.) Lexically. Each statement has a predecessor and a successor.
2.) By the control flow graph.

struct Stmt {
    /* common members */
    int type;
    struct Stmt *pred;
    struct Stmt *succ;
    Basic_Block bblock;
    …
};

Some simple scalar optimizations are based on walking through the list of statements to find specific statements and process them:

For_All_Subroutine_Stmt(subroutine, stmt) {
    if (Stmt_type(stmt) == Stmt_Assign) {
        // assignment processing
    }
}
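The list walk above can be sketched in plain C. This is an illustration under assumptions, not the compiler's actual data structure: the enum values reuse the STMT_* labels from the example, while struct stmt, stmt_append and count_assignments are hypothetical names, and the Basic_Block link is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds, named after the STMT_* labels above. */
enum stmt_type { STMT_ENTRY, STMT_ASSIGN, STMT_WHILE_DO, STMT_RETURN };

struct stmt {
    enum stmt_type type;
    struct stmt *pred;   /* lexical predecessor */
    struct stmt *succ;   /* lexical successor */
};

/* Link `s` after `tail` in lexical order and return the new tail. */
static struct stmt *stmt_append(struct stmt *tail, struct stmt *s)
{
    assert(s != NULL);
    s->pred = tail;
    s->succ = NULL;
    if (tail != NULL)
        tail->succ = s;
    return s;
}

/* Walk the list from `first` and count the assignment statements. */
static int count_assignments(const struct stmt *first)
{
    int n = 0;
    for (const struct stmt *s = first; s != NULL; s = s->succ)
        if (s->type == STMT_ASSIGN)
            n++;   /* a real pass would process the assignment here */
    return n;
}

/* Build the four-statement list shown for sub() and walk it. */
static int demo_count_assignments(void)
{
    struct stmt entry  = { STMT_ENTRY,    NULL, NULL };
    struct stmt assign = { STMT_ASSIGN,   NULL, NULL };
    struct stmt loop   = { STMT_WHILE_DO, NULL, NULL };
    struct stmt ret    = { STMT_RETURN,   NULL, NULL };

    struct stmt *tail = stmt_append(NULL, &entry);
    tail = stmt_append(tail, &assign);
    tail = stmt_append(tail, &loop);
    tail = stmt_append(tail, &ret);
    (void)tail;
    return count_assignments(&entry);
}
```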


Expressions

a = b + c;

Stmt_Assign
    lval: Expr_Var 'a'
    rval: Expr_Add
        Expr_Var 'b'
        Expr_Var 'c'

Expressions are represented as expression trees.

The leaf expressions can be variables or constants.

The internal representation also contains many tables describing different objects such as variables, functions, types, etc.


Control Flow Graph

A Control Flow Graph (CFG) represents all paths that program control could traverse during execution. In a control flow graph each node represents a basic block (a straight-line piece of code without any jumps or jump targets). A jump target starts a block, and a jump ends a block. Directed edges represent transfers of control. There are two specially designated blocks: the entry block, through which control enters the flow graph, and the exit block, through which all control flow leaves.

The CFG is essential to many compiler optimizations.


#include <stdio.h>

int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    return 0;
}

CFG example

Entry
B1:  sum = 0; i = 1;                        (goes to B2)
B2:  L12: if (i < 11)                       (true: B3, false: B4)
B3:  sum = sum + i; i = i + 1; goto L12     (goes to B2)
B4:  printf(..); return

struct BBLOCK {
    STMT first_stmt;
    STMT last_stmt;
    BBLOCK_LIST pred_list;
    BBLOCK_LIST succ_list;
    …
};
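A minimal sketch of how predecessor and successor lists might be maintained for the while-loop example. The fixed-size arrays, the MAX_EDGES bound and the names bblock, add_edge and loop_header_pred_count are assumptions made for illustration; the Entry and Exit blocks are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EDGES 4   /* arbitrary small bound for this sketch */

struct bblock {
    const char *text;                 /* statements of the block, for display */
    struct bblock *pred[MAX_EDGES];
    struct bblock *succ[MAX_EDGES];
    int npred, nsucc;
};

/* Record a control-flow edge in both directions. */
static void add_edge(struct bblock *from, struct bblock *to)
{
    assert(from->nsucc < MAX_EDGES && to->npred < MAX_EDGES);
    from->succ[from->nsucc++] = to;
    to->pred[to->npred++] = from;
}

/* Build the CFG of the while-loop example and return the number of
   edges entering the loop header (the block starting at L12). */
static int loop_header_pred_count(void)
{
    struct bblock init   = { .text = "sum = 0; i = 1;" };
    struct bblock header = { .text = "L12: if (i < 11)" };
    struct bblock body   = { .text = "sum = sum + i; i = i + 1; goto L12" };
    struct bblock tail   = { .text = "printf(..); return" };

    add_edge(&init, &header);   /* fall through into the loop test */
    add_edge(&header, &body);   /* condition true */
    add_edge(&header, &tail);   /* condition false */
    add_edge(&body, &header);   /* back edge: goto L12 */

    return header.npred;
}
```

The loop header has two incoming edges, one from the initialization block and one back edge from the loop body, which is exactly the kind of merge point that later makes data flow analysis non-trivial.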




Scalar optimizations

Well-known scalar optimizations include constant folding, constant propagation and copy propagation.

Constant folding is the evaluation of constant expressions at compile time.

Constant propagation is the substitution of variables that have known constant values by these values in expressions.

    int x = 14;
    int y = 7 - x/2;

Constant propagation:

    int x = 14;
    int y = 7 - 14/2;

Constant folding:

    int x = 14;
    int y = 0;

Copy propagation is the replacement of a variable that is a copy of another variable by the original variable.

    y = x;
    z = 3 + y;

Copy propagation:

    y = x;
    z = 3 + x;
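Constant propagation followed by constant folding can be sketched on a small expression tree. The node layout and the names expr, propagate, fold and fold_y_for_x are hypothetical; the sketch supports only +, - and /, ignores arithmetic overflow, and leaves a division by zero unfolded.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum expr_kind { EXPR_CONST, EXPR_VAR, EXPR_ADD, EXPR_SUB, EXPR_DIV };

struct expr {
    enum expr_kind kind;
    int value;               /* EXPR_CONST: the constant */
    char name;               /* EXPR_VAR: one-letter variable name */
    struct expr *lhs, *rhs;  /* operands of binary nodes, NULL in leaves */
};

/* Constant propagation: replace every use of `name` by `value`. */
static void propagate(struct expr *e, char name, int value)
{
    if (e == NULL)
        return;
    if (e->kind == EXPR_VAR && e->name == name) {
        e->kind = EXPR_CONST;
        e->value = value;
        return;
    }
    propagate(e->lhs, name, value);
    propagate(e->rhs, name, value);
}

/* Constant folding: evaluate operators whose operands are constants. */
static void fold(struct expr *e)
{
    if (e == NULL || e->kind == EXPR_CONST || e->kind == EXPR_VAR)
        return;
    assert(e->lhs != NULL && e->rhs != NULL);
    fold(e->lhs);
    fold(e->rhs);
    if (e->lhs->kind != EXPR_CONST || e->rhs->kind != EXPR_CONST)
        return;
    int a = e->lhs->value, b = e->rhs->value;
    switch (e->kind) {
    case EXPR_ADD: e->value = a + b; break;
    case EXPR_SUB: e->value = a - b; break;
    case EXPR_DIV:
        if (b == 0)
            return;          /* keep the division for run time */
        e->value = a / b;
        break;
    default:
        return;
    }
    e->kind = EXPR_CONST;
    e->lhs = e->rhs = NULL;
}

/* y = 7 - x/2 with a known x; returns INT_MIN if y did not fold. */
static int fold_y_for_x(int x_value)
{
    struct expr seven = { EXPR_CONST, 7, 0, NULL, NULL };
    struct expr x     = { EXPR_VAR, 0, 'x', NULL, NULL };
    struct expr two   = { EXPR_CONST, 2, 0, NULL, NULL };
    struct expr div   = { EXPR_DIV, 0, 0, &x, &two };
    struct expr y     = { EXPR_SUB, 0, 0, &seven, &div };

    propagate(&y, 'x', x_value);
    fold(&y);
    return y.kind == EXPR_CONST ? y.value : INT_MIN;
}
```

With x = 14 the tree for y collapses to the single constant 0, matching the example above.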


Common subexpression elimination

Search for identical subexpressions and save the result of the calculation in a temporary variable for later reuse.

    a = b * c + g;
    d = b * c * d;

CSE:

    tmp = b * c;
    a = tmp + g;
    d = tmp * d;
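A minimal sketch of local CSE on straight-line three-address code, assuming one-letter variable names and ignoring commutativity (b * c and c * b are treated as different). The instruction layout and the names instr, redefined, local_cse and demo_cse are hypothetical. A repeated computation is turned into a copy of the earlier result when none of its operands, and not the earlier result itself, has been reassigned in between.

```c
#include <assert.h>
#include <stddef.h>

/* Three-address instruction: dst = a op b, or a copy dst = a when op is '='. */
struct instr { char dst, op, a, b; };

/* True if `var` is assigned by any instruction in code[from..to). */
static int redefined(const struct instr *code, int from, int to, char var)
{
    for (int k = from; k < to; k++)
        if (code[k].dst == var)
            return 1;
    return 0;
}

/* Replace recomputations by copies; returns how many were replaced. */
static int local_cse(struct instr *code, int n)
{
    assert(code != NULL && n >= 0);
    int replaced = 0;
    for (int i = 0; i < n; i++) {
        if (code[i].op == '=')
            continue;                        /* a copy computes nothing */
        for (int j = i - 1; j >= 0; j--) {
            int same = code[j].op == code[i].op &&
                       code[j].a == code[i].a && code[j].b == code[i].b;
            int clobbers = code[j].dst == code[i].a ||
                           code[j].dst == code[i].b;
            if (same && !clobbers &&
                !redefined(code, j + 1, i, code[j].dst)) {
                code[i].op = '=';            /* dst = earlier result */
                code[i].a = code[j].dst;
                code[i].b = 0;
                replaced++;
                break;
            }
            if (clobbers)
                break;   /* an operand changed here: older results are stale */
        }
    }
    return replaced;
}

/* The example above in three-address form, with temporaries s and t. */
static int demo_cse(void)
{
    struct instr code[] = {
        { 's', '*', 'b', 'c' },   /* s = b * c */
        { 'a', '+', 's', 'g' },   /* a = s + g */
        { 't', '*', 'b', 'c' },   /* t = b * c  (repeated) */
        { 'd', '*', 't', 'd' },   /* d = t * d */
    };
    int replaced = local_cse(code, 4);
    return replaced == 1 && code[2].op == '=' && code[2].a == 's';
}
```

After the pass the third instruction reads t = s; copy propagation can then replace t by s in the last instruction.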


Dead code elimination

Removal of code that does not change the output of the program.

    int foo() {
        int a = 24;
        int b = 25;
        int c;
        if (a < 0) printf("a<0");
        c = a << 2;
        return c;
    }

Dead code elimination:

    int foo() {
        int a = 24;
        int c;
        c = a << 2;
        return c;
    }

There are many cases in which dead code can appear. It can be the result of scalar optimizations, inlining, etc.
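The removal of the unused assignment to b can be sketched as a backward liveness scan over straight-line code. This sketch assumes one-letter lowercase variable names and instructions without side effects; it does not cover the removal of the never-taken branch, which needs constant propagation first. The names instr, bit, eliminate_dead and demo_dce are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* dst = f(a, b); an operand of 0 means "unused". `dead` is set by the pass. */
struct instr { char dst, a, b; int dead; };

/* Bit mask for a lowercase one-letter variable; 0 for "no operand". */
static unsigned bit(char v)
{
    if (v == 0)
        return 0u;
    assert(v >= 'a' && v <= 'z');
    return 1u << (v - 'a');
}

/* Scan backwards with the set of live variables; mark assignments whose
   result is not live afterwards. Returns the number of dead instructions. */
static int eliminate_dead(struct instr *code, int n, unsigned live_out)
{
    unsigned live = live_out;
    int removed = 0;
    for (int i = n - 1; i >= 0; i--) {
        if ((live & bit(code[i].dst)) == 0u) {
            code[i].dead = 1;      /* result never read: drop it, and do */
            removed++;             /* not make its operands live         */
            continue;
        }
        live &= ~bit(code[i].dst);                 /* defined here */
        live |= bit(code[i].a) | bit(code[i].b);   /* needed before */
    }
    return removed;
}

/* The assignments of foo() above; only c is live at the return. */
static int demo_dce(void)
{
    struct instr code[] = {
        { 'a', 0,   0, 0 },   /* a = 24 */
        { 'b', 0,   0, 0 },   /* b = 25      (never used) */
        { 'c', 'a', 0, 0 },   /* c = a << 2 */
    };
    int removed = eliminate_dead(code, 3, bit('c'));
    return removed == 1 && code[1].dead && !code[0].dead && !code[2].dead;
}
```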


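Removal of redundant branches (condition propagation)

Sometimes a conditional branch can be removed because its outcome is already determined by an earlier condition, provided the tested variable is not modified in between.

    if (x > 0) {
        …
        if (x > 0) {
            a = x;
        } else {
            a = -x;
        }
        …
    }

Condition propagation:

    if (x > 0) {
        …
        a = x;
        …
    }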


Why is the Control Flow Graph important for scalar optimizations?

[Figure: control flow graph built from the following blocks]

    X = C1; L = X;
    Y = X; X = C2;
    Z = X;
    IF (X > C1)

When can we propagate the information about the value of X? For a straight-line piece of code the answer is trivial. In the general case, the CFG resolves the ambiguity.


Data Flow analysis

Data flow analysis is a technique for gathering information about the possible set of values of each variable at various points of a program. The control flow graph (CFG) is used to identify the parts of the program to which a value assigned to a variable can be propagated.

A definition-use graph is a graph that contains an edge from each variable definition point in the program to every point where that definition is used.


Construction of def-use chains for a basic block is trivial. Each variable definition is associated with all subsequent uses of it. Each subsequent redefinition ends the current chain and starts a new one.

To extend these local chains over the whole CFG, several sets that characterize the behavior of each block are computed:

• Uses(b): the set of variables used in the block that have no definition within the block before the use.

• Defsout(b): the set of definitions made in b that reach the end of the block.

• Killed(b): the set of definitions that are canceled within the block by other definitions.

• Reaches(b): the set of all definitions, made in any block (including b itself), that can reach the entry of b.


To understand which definitions will be used in a basic block, it is important to know Reaches(b).

It can be constructed by an iterative process that calculates Reaches(b) from the sets of the predecessor blocks:

Reaches(b) = ⋃ over all predecessors p of b of ( Defsout(p) ∪ ( Reaches(p) ∩ ¬Killed(p) ) )

The difficulty is that in the presence of loops the set Reaches(b) may depend on itself. If the equation is applied repeatedly to every basic block of the CFG until no set changes, the final solution is obtained.
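The iteration can be sketched with bit sets, one bit per definition. The block and definition numbering, the struct layout and the names block_sets, solve_reaches and demo_reaches are assumptions made for this sketch; it uses the while-loop example from the CFG slide and assumes at most 32 blocks and 32 definitions.

```c
#include <assert.h>
#include <stddef.h>

struct block_sets {
    unsigned defsout;  /* definitions made in the block that reach its end */
    unsigned killed;   /* definitions canceled inside the block */
    unsigned preds;    /* bit p set: block p is a predecessor */
};

/* Apply the Reaches equation to every block until nothing changes. */
static void solve_reaches(const struct block_sets *b, int n, unsigned *reaches)
{
    assert(n >= 0 && n <= 32);
    for (int i = 0; i < n; i++)
        reaches[i] = 0u;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            unsigned in = 0u;
            for (int p = 0; p < n; p++)
                if (b[i].preds & (1u << p))
                    in |= b[p].defsout | (reaches[p] & ~b[p].killed);
            if (in != reaches[i]) {
                reaches[i] = in;
                changed = 1;
            }
        }
    }
}

/* Definitions: bit 0 "sum = 0", bit 1 "i = 1",
   bit 2 "sum = sum + i", bit 3 "i = i + 1". */
static unsigned demo_reaches(int block)
{
    const struct block_sets cfg[4] = {
        { 0x3u, 0xCu, 0u },                     /* 0: sum = 0; i = 1 */
        { 0x0u, 0x0u, (1u << 0) | (1u << 2) },  /* 1: if (i < 11)    */
        { 0xCu, 0x3u, 1u << 1 },                /* 2: loop body      */
        { 0x0u, 0x0u, 1u << 1 },                /* 3: printf; return */
    };
    unsigned reaches[4];
    solve_reaches(cfg, 4, reaches);
    return reaches[block];
}
```

The sets only grow from one pass to the next and are bounded by the number of definitions, so the loop terminates. In this example all four definitions reach the loop test: the two initial ones from block 0 and the two loop-carried ones along the back edge.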


The constructed sets are used by many scalar optimizations such as dead code elimination, constant propagation, etc. The main drawbacks of this approach are the large number of edges in the def-use graph and the long time needed to compute these sets. As a result, processing requires a lot of resources.

    S1: X =      S2: X =      S3: X =

                 S4

    S5: = X      S6: = X      S7: = X

This example illustrates the problem. The definitions in S1, S2 and S3 all pass through S4. Since each definition reaches every use, there are nine def-use edges. Static single assignment (SSA) form was proposed to simplify def-use chains.


SSA (Static single assignment form)

SSA form gives a unique name to each variable definition and introduces special pseudo-assignments.

    S1: X1 =     S2: X2 =     S3: X3 =

                 S4: X4 = φ(X1, X2, X3)

    S5: = X4     S6: = X4     S7: = X4


SSA is designed to save developers from building complex use/def chains for local variables. The power of SSA is that each variable has exactly one definition in the program. Therefore, the use/def chain is trivial.

SSA introduces special Phi-functions (φ) at points where it is uncertain which definition reaches a use; each Phi-function creates a new variable. This is the so-called pseudo-assignment. Constructing SSA form requires placing the Phi-functions and creating new unique variables.

The new variables are generated by appending a unique version number to the variable name. To insert the Phi-functions correctly, some concepts from graph theory are needed.
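Within a single basic block no Phi-functions are needed, so the renaming step alone can be sketched easily. The sketch assumes one-letter lowercase variable names and uses version 0 for the value a variable has on entry to the block; the names ssa_instr, var_index, rename_block and demo_rename are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct ssa_instr {
    char dst, a, b;              /* dst = f(a, b); 0 marks an unused operand */
    int dst_ver, a_ver, b_ver;   /* versions, filled in by rename_block() */
};

static int var_index(char v)
{
    assert(v >= 'a' && v <= 'z');
    return v - 'a';
}

/* Give every definition a fresh version and make every use refer to
   the latest version defined before it. */
static void rename_block(struct ssa_instr *code, int n)
{
    int version[26] = { 0 };     /* version 0: value on block entry */
    for (int i = 0; i < n; i++) {
        /* Uses are renamed first, so "x = x + c" reads the old x. */
        if (code[i].a)
            code[i].a_ver = version[var_index(code[i].a)];
        if (code[i].b)
            code[i].b_ver = version[var_index(code[i].b)];
        code[i].dst_ver = ++version[var_index(code[i].dst)];
    }
}

static int demo_rename(void)
{
    struct ssa_instr code[] = {
        { 'x', 'a', 'b', 0, 0, 0 },   /* x = a + b  ->  x1 = a0 + b0 */
        { 'x', 'x', 'c', 0, 0, 0 },   /* x = x + c  ->  x2 = x1 + c0 */
        { 'y', 'x', 0,   0, 0, 0 },   /* y = x      ->  y1 = x2      */
    };
    rename_block(code, 3);
    return code[0].dst_ver == 1 && code[1].a_ver == 1 &&
           code[1].dst_ver == 2 && code[2].a_ver == 2 &&
           code[2].dst_ver == 1;
}
```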

http://en.wikipedia.org/wiki/Image:SSA_example1.1.png

http://en.wikipedia.org/wiki/Image:SSA_example1.3.png


The dominance frontier of node x is the set of all nodes w such that x dominates a predecessor of w but does not strictly dominate w.

Example: Dom[5] = {5,6,7,8}, DF[5] = {5,4,12,11}

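[Figure: example control flow graph with numbered nodes (1 to 12); the nodes dominated by node 5 (5, 6, 7 and 8) are shown as a separate region.]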

Node N dominates node M if every path from the entry node to M passes through N. Node N is the immediate dominator of node M if it is the last strict dominator of M on any path from the entry node to M.
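The definition of the dominance frontier can be turned directly into code once the dominator sets are known. In this sketch the dominator sets are written out by hand for a small loop CFG, which is an assumption of the example and not the graph from the figure; the names dominance_frontier and demo_frontier are hypothetical, and sets are bit masks over at most 32 nodes.

```c
#include <assert.h>
#include <stddef.h>

/* preds[w]: bit p set when p is a predecessor of w.
   dom[m]:   bit d set when d dominates m (every node dominates itself).
   Returns DF(x) as a bit mask. */
static unsigned dominance_frontier(const unsigned *preds, const unsigned *dom,
                                   int n, int x)
{
    assert(n >= 1 && n <= 32 && x >= 0 && x < n);
    unsigned df = 0u;
    for (int w = 0; w < n; w++) {
        int strictly_dominates_w = (w != x) && (dom[w] & (1u << x)) != 0u;
        if (strictly_dominates_w)
            continue;
        for (int p = 0; p < n; p++) {
            /* x dominates some predecessor p of w */
            if ((preds[w] & (1u << p)) && (dom[p] & (1u << x))) {
                df |= 1u << w;
                break;
            }
        }
    }
    return df;
}

static unsigned demo_frontier(int x)
{
    /* Loop CFG: 0 -> 1, 1 -> 2, 2 -> 1, 1 -> 3 (node 1 is the loop header). */
    const unsigned preds[4] = { 0u, (1u << 0) | (1u << 2), 1u << 1, 1u << 1 };
    /* Dominator sets written out by hand for this graph. */
    const unsigned dom[4] = { 0x1u, 0x3u, 0x7u, 0xBu };
    return dominance_frontier(preds, dom, 4, x);
}
```

Here DF(2) = {1} and DF(1) = {1}: the loop header lies in its own dominance frontier because of the back edge, in the same way that node 5 appears in DF[5] in the example above.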


In SSA form, each definition of a variable must dominate every use of that variable.

The dominator set of each basic block can be constructed as follows:

The set of dominators of a node N is the node N itself together with the intersection of the dominator sets of all its predecessors.

• Strict dominator of N: a dominator of N other than N itself.
• Immediate dominator: the closest node from the set of strict dominators.
• idom(N): the immediate dominator of basic block N.
• children(N): the set of basic blocks that N immediately dominates.
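The construction rule above can be sketched as a fixed-point iteration over bit sets. The sketch assumes that node 0 is the entry node, that every node is reachable from it, and that there are at most 32 nodes; the names compute_dominators and demo_dom and the diamond-shaped example graph are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* preds[i]: bit p set when p is a predecessor of i.
   On return, dom[i] has bit d set when d dominates i. */
static void compute_dominators(const unsigned *preds, int n, unsigned *dom)
{
    assert(n >= 1 && n <= 32);
    unsigned all = (n == 32) ? ~0u : (1u << n) - 1u;

    dom[0] = 1u;                 /* the entry node is dominated only by itself */
    for (int i = 1; i < n; i++)
        dom[i] = all;            /* start from "everything" and shrink */

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 1; i < n; i++) {
            unsigned meet = all; /* intersection over all predecessors */
            for (int p = 0; p < n; p++)
                if (preds[i] & (1u << p))
                    meet &= dom[p];
            unsigned next = meet | (1u << i);   /* plus the node itself */
            if (next != dom[i]) {
                dom[i] = next;
                changed = 1;
            }
        }
    }
}

static unsigned demo_dom(int node)
{
    /* Diamond CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3. */
    const unsigned preds[4] = { 0u, 1u << 0, 1u << 0, (1u << 1) | (1u << 2) };
    unsigned dom[4];
    compute_dominators(preds, 4, dom);
    return dom[node];
}
```

In the diamond, node 3 is dominated only by the entry node and itself, since neither branch lies on every path to it.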



Dominance frontier criterion: if basic block N contains a definition of variable A, then every node on the dominance frontier of N requires a Phi-function for A. Each Phi-function is itself a definition, so the criterion must be applied repeatedly until no more nodes require a Phi-function.

Before:

    B = A
    A = x

After:

    A_2 = φ(A_1, A_3)
    B = A_2
    A_3 = x

Inserting φ-functions for node 5 of the scheme on slide 25.


Optimization using the SSA form:

• Dead code elimination
    • If the variable a_ver is not used, then its definition should be removed (provided the definition has no side effects).

• Constant propagation
    • If there is an assignment a_ver = const, then all uses of a_ver should be replaced by const.
    • If there is a φ-function a_next = φ(c, c), then the φ should be replaced by c.

• Copy propagation
    • If there is an assignment a_n = b_k, then all uses of a_n should be replaced with b_k.
    • If there is an assignment a_n = φ(b_k, b_k), then the φ should be replaced with b_k.
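The two φ rules above can be sketched as follows, with SSA names represented by small integer ids. The representation and the names phi, phi_common_arg, replace_uses and demo_ssa_cleanup are hypothetical, and the list of uses is flattened into a plain array for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 4   /* arbitrary small bound for this sketch */

/* dst = φ(args[0], ..., args[nargs-1]); SSA names are non-negative ids. */
struct phi { int dst; int nargs; int args[MAX_ARGS]; };

/* If all arguments of the φ are the same name, return it; otherwise -1. */
static int phi_common_arg(const struct phi *p)
{
    assert(p != NULL && p->nargs >= 1 && p->nargs <= MAX_ARGS);
    for (int i = 1; i < p->nargs; i++)
        if (p->args[i] != p->args[0])
            return -1;
    return p->args[0];
}

/* Copy propagation: rewrite every use of `from` into a use of `to`.
   Returns the number of rewritten uses. */
static int replace_uses(int *uses, int n, int from, int to)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (uses[i] == from) {
            uses[i] = to;
            count++;
        }
    return count;
}

/* a_5 = φ(b_3, b_3): the φ is a plain copy, so uses of 5 become uses of 3. */
static int demo_ssa_cleanup(void)
{
    struct phi p = { 5, 2, { 3, 3 } };
    int uses[] = { 5, 7, 5 };          /* names read by three instructions */
    int src = phi_common_arg(&p);
    if (src < 0)
        return 0;
    return replace_uses(uses, 3, p.dst, src);
}
```

Once no uses of the φ result remain, the φ itself is unused and falls under the dead code elimination rule.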


Thank you!
