Compiler Construction - cdeep.iitb.ac.in

Post on 11-May-2022


CADSL

Compiler Construction: An Introduction

Virendra Singh
Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in

EE-717/453: Advance Computing for Electrical Engineers
Lecture 17 (30 Sep 2013)

CADSLEE-717/EE-453@IITB 2

What is a compiler?

A program that reads a program written in one language and translates it into another language.

Source language → Target language

Traditionally, compilers go from high-level languages to low-level languages.

30 Sep 2012

Compilers

Common compilation tasks
● Language translation
● Error checking and reporting
● Performance improvement

Fundamental compilation principles
● The compiler must preserve the meaning of the source program
● The compiler must improve the source program in some discernible way

Compiler Architecture

Front End – language specific
Back End – machine specific

Source Language → Front End → Intermediate Language → Back End → Target Language

Separation of Concerns, Retargeting

In more detail:

Compiler Architecture

Source language
→ Scanner (lexical analysis) → tokens
→ Parser (syntax analysis) → syntactic structure
→ Semantic Analysis (IC generator) → Intermediate Language
→ Code Optimizer → Intermediate Language
→ Code Generator → Target language

Symbol Table (shared by the phases)

Translation of an assignment statement

Lexical Analysis

Character stream → token stream
● Recognize "words" of a language
● Theoretical problem: specify and recognize patterns in strings
● Scanner as a practical application
● Regular expressions, finite automata
● Tools that automatically generate scanners are commonly used

Input: index := start + step * 20

After scanning: identifier(index) operator(:=) identifier(start) operator(+) identifier(step) operator(*) number(20)
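The scanning step can be sketched with regular expressions. This is a minimal illustration, not the lecture's implementation: the token patterns and the names TOKEN_SPEC, MASTER, and scan are assumptions chosen to cover only the three token classes named above.

```python
import re

# Token classes from the slide: identifier, operator, number.
# The exact patterns are assumptions made for this sketch.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r":=|[+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Turn a character stream into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":      # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(scan("index := start + step * 20"))
```

Each alternative of the combined pattern is a named group, so the name of the group that matched gives the token class directly. Generated scanners do the same job with a finite automaton instead of a backtracking regex engine.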

Syntactical Analysis

Token stream → syntax tree
● Recognize "sentences" of a language
● Grammars and parsers
  ● CFG
  ● Parsers can be automatically generated
  ● Top-down and bottom-up parsing
  ● Predictive parsing
● The parser drives the compiler front end

After scanning: index := start + step * 20

After parsing:

Assign
  ID: index
  :=
  Exp
    Exp
      ID: start
    +
    Exp
      Exp
        ID: step
      *
      Exp
        Num: 20
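A predictive (top-down) parser for this statement can be sketched as a set of mutually recursive functions, one per grammar rule. The grammar below (Assign, Exp, Term, Factor) is an assumption: it is a precedence-layered variant of the Exp rules in the tree above, and the tuple-based tree shape and the function name parse_assignment are hypothetical.

```python
def parse_assignment(tokens):
    """Predictive recursive-descent parser; returns a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def expect(kind, lexeme=None):
        nonlocal pos
        tok = peek()
        if tok[0] != kind or (lexeme is not None and tok[1] != lexeme):
            raise SyntaxError(f"expected {lexeme or kind}, found {tok[1]!r}")
        pos += 1
        return tok

    def factor():                      # Factor -> ID | Num
        if peek()[0] == "identifier":
            return ("ID", expect("identifier")[1])
        return ("Num", int(expect("number")[1]))

    def term():                        # Term -> Factor { "*" Factor }
        node = factor()
        while peek() == ("operator", "*"):
            expect("operator", "*")
            node = ("*", node, factor())
        return node

    def exp():                         # Exp -> Term { "+" Term }
        node = term()
        while peek() == ("operator", "+"):
            expect("operator", "+")
            node = ("+", node, term())
        return node

    target = expect("identifier")[1]   # Assign -> ID ":=" Exp
    expect("operator", ":=")
    tree = ("Assign", ("ID", target), exp())
    expect("eof")                      # reject trailing tokens
    return tree


tokens = [
    ("identifier", "index"), ("operator", ":="), ("identifier", "start"),
    ("operator", "+"), ("identifier", "step"), ("operator", "*"), ("number", "20"),
]
print(parse_assignment(tokens))
```

The parser decides which rule to apply by looking only at the next token, which is what makes it predictive. Splitting Exp into Term and Factor is one common way to make multiplication bind tighter than addition.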

Semantic Analysis

The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.

It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.

An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.

The language specification may permit some type conversions called coercions. For example, a binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert or coerce the integer into a floating-point number.
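Type checking with coercion can be sketched as a bottom-up walk over the expression tree. The node shapes, the two-type universe ("int" and "float"), the symbol-table layout, and the conversion node name "inttofloat" are all assumptions made for this illustration.

```python
def check(node, symbols):
    """Return (typed_node, type) for an expression tree of nested tuples.

    symbols maps a variable name to "int" or "float"; for an array it maps
    the array name to its element type (an assumption of this sketch).
    """
    kind = node[0]
    if kind == "num":
        return node, ("float" if isinstance(node[1], float) else "int")
    if kind == "id":
        return node, symbols[node[1]]
    if kind == "index":                       # ("index", array_name, index_expr)
        index, index_type = check(node[2], symbols)
        if index_type != "int":
            raise TypeError("array index must be an integer")
        return ("index", node[1], index), symbols[node[1]]
    if kind in ("+", "*"):
        left, left_type = check(node[1], symbols)
        right, right_type = check(node[2], symbols)
        if left_type == right_type:
            return (kind, left, right), left_type
        # Mixed int/float operands: coerce the integer side to floating point.
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (kind, left, right), "float"
    raise ValueError(f"unknown node kind {kind!r}")


typed, result_type = check(("+", ("id", "rate"), ("num", 60)), {"rate": "float"})
print(typed, result_type)
```

The checker returns a rewritten tree, so the conversion becomes an explicit node that later phases can translate like any other operator.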

Semantic Analysis

Understand/annotate meaning of the program
● Syntax-directed translation
● Check semantic errors
  ● Inconsistent variable definitions and uses
  ● Type systems
● Collect knowledge of the input program
  ● Symbol tables
  ● Scopes


Intermediate Code Generation

Representation of the input program
● Internal to the compiler
● Encodes knowledge collected during compilation
● Varied forms and levels
  ● Typically a compiler uses more than one kind of IR
● Desired properties
  ● Easy to produce, manipulate, and translate into the target code

IR scheme

• front end produces IR

• optimizer transforms IR to more efficient program

• back end transforms IR to target code

Kinds of IR
● Abstract syntax trees (AST)
● Linear operator form of tree (e.g., postfix notation)
● Directed acyclic graphs (DAG)
● Control flow graphs (CFG)
● Program dependence graphs (PDG)
● Static single assignment form (SSA)
● 3-address code
● Hybrid combinations

Categories of IR

Structural
● graphically oriented (trees, DAGs)
● nodes and edges tend to be large
● heavily used in source-to-source translators

Linear
● pseudo-code for abstract machine
● large variation in level of abstraction
● simple, compact data structures
● easier to rearrange

Hybrid
● combination of graphs and linear code (e.g. CFGs)
● attempt to achieve best of both worlds

Important IR properties
● Ease of generation
● Ease of manipulation
● Cost of manipulation
● Level of abstraction
● Freedom of expression (!)
● Size of typical procedure
● Original or derivative

Subtle design decisions in the IR can have far-reaching effects on the speed and effectiveness of the compiler! The degree of exposed detail can be crucial.

Abstract Syntax Tree

An AST is a parse tree with the nodes for most non-terminals removed.

[Figure: AST for x - 2 * y]

A linear operator form of this tree (postfix) would be:

x 2 y * -
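The postfix form falls out of a post-order walk of the tree. A minimal sketch, assuming the AST for x - 2 * y is stored as nested tuples of the form (operator, left, right); that representation and the function name postfix are choices made for this example.

```python
def postfix(node):
    """Post-order walk: emit both operands first, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        return postfix(left) + postfix(right) + [op]
    return [str(node)]                 # a leaf: variable name or constant


ast = ("-", "x", ("*", 2, "y"))        # AST for x - 2 * y
print(" ".join(postfix(ast)))          # prints: x 2 y * -
```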

Directed Acyclic Graph

A DAG is an AST with unique, shared nodes for each value.

x := 2 * y + sin(2*x)
z := x / 2
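One common way to get unique, shared nodes is to look each node up in a table before creating it. The sketch below applies that idea to the two statements above; the class DagBuilder and its method names are hypothetical, and nodes are identified by small integers purely for brevity.

```python
class DagBuilder:
    """Builds expression nodes, reusing an existing node for a repeated value."""

    def __init__(self):
        self.nodes = []          # node id -> (op, operand ids...)
        self.index = {}          # (op, operand ids...) -> node id

    def node(self, op, *operands):
        key = (op, *operands)
        if key not in self.index:            # create only if this value is new
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]


dag = DagBuilder()
x0, y0, two = dag.node("x"), dag.node("y"), dag.node(2)
# x := 2 * y + sin(2 * x)   (the right-hand side reads the old value of x)
x1 = dag.node("+", dag.node("*", two, y0), dag.node("sin", dag.node("*", two, x0)))
# z := x / 2   (uses the new x and shares the single node for the constant 2)
z1 = dag.node("/", x1, two)
print(len(dag.nodes))
```

A plain tree would hold the constant 2 three times; here all three uses point at the same node, and asking for an already built subexpression returns the existing node id.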

Control Flow Graph

A CFG models transfer of control in a program
● nodes are basic blocks (straight-line blocks of code)
● edges represent control flow (loops, if/else, goto …)

if x = y then
  S1
else
  S2
end
S3
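The CFG for this if/else fragment can be written down as two small maps. This is an illustrative encoding only: the block labels (entry, then, else, join) and the helper predecessors are assumptions, not names from the lecture.

```python
# Basic blocks of the example, keyed by a label; each holds straight-line code.
blocks = {
    "entry": ["if x = y"],
    "then":  ["S1"],
    "else":  ["S2"],
    "join":  ["S3"],
}
# Edges record the possible transfers of control between blocks.
successors = {
    "entry": ["then", "else"],
    "then":  ["join"],
    "else":  ["join"],
    "join":  [],
}


def predecessors(successors):
    """Invert the successor map to get each block's predecessors."""
    preds = {label: [] for label in successors}
    for source, targets in successors.items():
        for target in targets:
            preds[target].append(source)
    return preds


print(predecessors(successors)["join"])   # prints: ['then', 'else']
```

The block holding S3 has two predecessors, which makes it a control-flow merge point.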

3-address code

Statements take the form: x = y op z
● single operator and at most three names

x - 2 * y becomes:
t1 = 2 * y
t2 = x - t1

Advantages:
● Compact form
● Names for intermediate values
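Generating 3-address code from an expression tree amounts to giving every interior node a fresh temporary. A minimal sketch, assuming a nested-tuple AST of the form (operator, left, right); the function name to_three_address and the t1, t2, ... naming scheme are illustrative.

```python
import itertools


def to_three_address(ast):
    """Flatten a nested-tuple expression into 3-address statements."""
    code = []
    temps = (f"t{n}" for n in itertools.count(1))

    def emit(node):
        if not isinstance(node, tuple):
            return str(node)               # a name or constant is its own operand
        op, left, right = node
        lhs, rhs = emit(left), emit(right)
        target = next(temps)               # fresh name for this intermediate value
        code.append(f"{target} = {lhs} {op} {rhs}")
        return target

    emit(ast)
    return code


print(to_three_address(("-", "x", ("*", 2, "y"))))
# prints: ['t1 = 2 * y', 't2 = x - t1']
```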

Typical 3-address codes

assignments                        x = y op z
                                   x = op y
                                   x = y[i]
                                   x = y
branches                           goto L
conditional branches               if x relop y goto L
procedure calls                    param x
                                   param y
                                   call p
address and pointer assignments    x = &y
                                   *y = z

IR choices

Other hybrids exist
● combinations of graphs and linear codes
● CFG with 3-address code for basic blocks

Many variants used in practice
● no widespread agreement
● compilers may need several different IRs!

Advice:
● choose IR with right level of detail
● keep manipulation costs in mind

Static Single Assignment Form

Goal: simplify procedure-global optimizations

Definition: A program is in SSA form if every variable is only assigned once.

Static Single Assignment (SSA)

Each assignment to a temporary is given a unique name
● All uses reached by that assignment are renamed
● Compact representation
● Useful for many kinds of compiler optimization

Ron Cytron, et al., "Efficiently computing static single assignment form and the control dependence graph," ACM TOPLAS, 1991.

Original:
x := 3;
x := x + 1;
x := 7;
x := x*2;

SSA:
x1 := 3;
x2 := x1 + 1;
x3 := 7;
x4 := x3*2;
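For straight-line code the renaming can be done in one pass with a version counter per variable. This sketch handles only sequences of "target := expression" strings without control flow; the function name, the string-based statement format, and the convention that a variable read before any assignment gets version 1 are assumptions.

```python
import re


def rename_straight_line(statements):
    """Rename variables in a list of 'target := expression' strings."""
    version = {}                            # variable -> current version number
    result = []

    def use(match):
        name = match.group()
        # A variable read before any assignment gets version 1 (assumed convention).
        return f"{name}{version.setdefault(name, 1)}"

    for statement in statements:
        target, expression = (part.strip() for part in statement.split(":="))
        expression = re.sub(r"[A-Za-z_]\w*", use, expression)   # rename uses first
        version[target] = version.get(target, 0) + 1            # then the definition
        result.append(f"{target}{version[target]} := {expression}")
    return result


for line in rename_straight_line(["x := 3", "x := x + 1", "x := 7", "x := x*2"]):
    print(line)
```

Uses are renamed before the target's counter is bumped, so a statement such as x := x + 1 reads the old version and defines the new one.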

Why Static?
● We only look at the static program
● One assignment per variable in the program

At runtime variables are assigned multiple times!

Example: Sequence

Easy to do for sequential programs:

Original        SSA
a := b + c      a1 := b1 + c1
b := c + 1      b2 := c1 + 1
d := b + c      d1 := b2 + c1
a := a + 1      a2 := a1 + 1
e := a + b      e1 := a2 + b2

Example: Condition

Conditions: what to do on control-flow merge?

Original:
if B then
  a := b
else
  a := c
end
… a …

SSA:
if B then
  a1 := b
else
  a2 := c
end
… a? …

Solution: Φ-Function

Original:
if B then
  a := b
else
  a := c
end
… a …

SSA:
if B then
  a1 := b
else
  a2 := c
end
a3 := Φ(a1,a2)
… a3 …

The Φ-Function

Φ-functions are always at the beginning of a basic block

They select between values depending on control flow

a(k+1) := Φ(a1, …, ak): the block has k preceding blocks, one argument per predecessor

Φ-functions are evaluated simultaneously within a basic block.
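The two properties above (selection by incoming edge, simultaneous evaluation) can be illustrated with a tiny interpreter step. This is a hypothetical model, not how a compiler stores Φ-functions: the function run_phis, its parameters, and the dictionary-based environment are all assumptions.

```python
def run_phis(phis, env, came_from, predecessors):
    """Evaluate the Φ-functions at the start of a block.

    phis maps each target name to its argument names, one per predecessor,
    listed in the same order as `predecessors`.
    """
    slot = predecessors.index(came_from)
    # Read every argument from the old environment before writing any target,
    # which models simultaneous evaluation.
    new_values = {target: env[args[slot]] for target, args in phis.items()}
    return {**env, **new_values}


# a3 := Φ(a1, a2) at the merge of an if/else, entered from the else branch.
env = run_phis({"a3": ["a1", "a2"]}, {"a2": 7}, "else", ["then", "else"])
print(env["a3"])   # prints: 7
```

Simultaneous evaluation matters when two Φ-functions in the same block refer to each other's targets, as can happen in a loop header: reading all arguments first lets the two values swap, whereas evaluating one after the other would overwrite a value before it is read.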


SSA and CFG

SSA is normally used for control-flow graphs (CFG)

Basic blocks are in 3-address form


Recall: Control flow graph

A CFG models transfer of control in a program
● nodes are basic blocks (straight-line blocks of code)
● edges represent control flow (loops, if/else, goto …)

if x = y then
  S1
else
  S2
end
S3

© Marcus Denker

SSA: a Simple Example

if B then
  a1 := 1
else
  a2 := 2
end
a3 := Φ(a1,a2)
… a3 …

Recall: IR

• front end produces IR

• optimizer transforms IR to more efficient program

• back end transforms IR to target code

SSA as IR


Transforming to SSA

Problem: Performance / Memory
● Minimize number of inserted Φ-functions
● Do not spend too much time

Many relatively complex algorithms
● We do not go into too much detail
● See literature!

Minimal SSA

Two steps:
● Place Φ-functions
● Rename variables

Where to place Φ-functions?

We want the minimal number of Φ-functions needed
● Saves memory
● Algorithms will work faster


Optimization: The Idea

Transform the program to improve efficiency

● Performance: faster execution
● Size: smaller executable, smaller memory footprint

Tradeoffs:
1) Performance vs. Size
2) Compilation speed and memory

No Magic Bullet!

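There is no perfect optimizer

Example: optimize for simplicity
● Opt(P): the smallest program equivalent to P
● Q: a program with no output that does not stop
● Opt(Q)?

Opt(Q) would have to be the trivial infinite loop, so recognizing every such Q would require solving the halting problem.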

Optimization on many levels

Optimizations both in the optimizer and back-end

Examples for Optimizations (© Marcus Denker)
● Constant Folding / Propagation
● Copy Propagation
● Algebraic Simplifications
● Strength Reduction
● Dead Code Elimination
● Structure Simplifications
● Loop Optimizations
● Partial Redundancy Elimination
● Code Inlining
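Two of the listed optimizations, constant folding and algebraic simplification, are small enough to sketch on an expression tree. The nested-tuple AST, the integer-only constants, and the names FOLDABLE and fold are assumptions made for this illustration; a production optimizer would also have to respect the target's arithmetic rules (overflow, floating point).

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Constant folding plus two algebraic simplifications on a tuple AST."""
    if not isinstance(node, tuple):
        return node                                # leaf: name or constant
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if isinstance(left, int) and isinstance(right, int) and op in FOLDABLE:
        return FOLDABLE[op](left, right)           # constant folding
    if op == "*" and 1 in (left, right):
        return right if left == 1 else left        # x * 1 = x
    if op == "+" and 0 in (left, right):
        return right if left == 0 else left        # x + 0 = x
    return (op, left, right)


print(fold(("+", "start", ("*", "step", ("+", 4, 16)))))
# prints: ('+', 'start', ('*', 'step', 20))
```

Children are simplified before their parent, so a constant produced deeper in the tree can enable a further simplification one level up.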
