CADSL

Compiler Construction: An Introduction

Virendra Singh
Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: [email protected]

EE-717/453: Advance Computing for Electrical Engineers
Lecture 17 (30 Sep 2013)

Compiler Construction - cdeep.iitb.ac.in



CADSL EE-717/EE-453@IITB 2

What is a compiler?

A program that reads a program written in one language and translates it into another language.

Source language → Compiler → Target language

Traditionally, compilers go from high-level languages to low-level languages.

30 Sep 2012


Compilers

Common compilation tasks
● Language translation
● Error checking and reporting
● Performance improvement

Fundamental compilation principles
● The compiler must preserve the meaning of the source program
● The compiler must improve the source program in some discernible way
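To make "translation that preserves meaning" concrete, here is a minimal sketch of a tiny compiler, assuming a toy expression language as the source and a hypothetical stack machine as the target. The names Num, Var, BinOp, evaluate, compile_expr, and run are invented for this illustration. The final check compares the value of the source expression with the result of running the generated target code; matching results are what preserving meaning looks like on this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def evaluate(e: Expr, env: dict) -> int:
    """Source-language semantics: evaluate the expression tree directly."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    return OPS[e.op](evaluate(e.left, env), evaluate(e.right, env))


def compile_expr(e: Expr) -> list:
    """Translate a tree into (hypothetical) stack-machine instructions via a post-order walk."""
    if isinstance(e, Num):
        return [("PUSH", e.value)]
    if isinstance(e, Var):
        return [("LOAD", e.name)]
    return compile_expr(e.left) + compile_expr(e.right) + [("OP", e.op)]


def run(code: list, env: dict) -> int:
    """Target-language semantics: execute the instructions on an operand stack."""
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        elif instr == "LOAD":
            stack.append(env[arg])
        else:  # "OP": pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()


# start + step * 20, with start = 1 and step = 2
expr = BinOp("+", Var("start"), BinOp("*", Var("step"), Num(20)))
env = {"start": 1, "step": 2}
assert run(compile_expr(expr), env) == evaluate(expr, env) == 41
```

This sketch only translates; it makes no attempt at the second principle (improving the program).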


Compiler Architecture

Front End – language specific

Back End – machine specific

Source Language → Front End → Intermediate Language → Back End → Target Language

In more detail:

● Separation of concerns
● Retargeting



Compiler Architecture

[Figure: Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → Intermediate Language → Code Optimizer → Intermediate Language → Code Generator → Target language, with a shared Symbol Table]



Translation of an assignment

[Figure: Translation of an assignment statement]



Lexical Analysis

Character stream → token stream

● Recognize “words” of a language
● Theoretical problem: specify and recognize patterns in strings
● Scanner as a practical application
● Regular expressions, finite automata
● Tools that automatically generate scanners are commonly used

Input: index := start + step * 20

After scanning: index (identifier) := (operator) start (identifier) + (operator) step (identifier) * (operator) 20 (number)
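The scanning step above can be sketched with regular expressions. This is a minimal hand-written scanner using Python's re module, not a generated one; the token classes (number, identifier, operator) follow the slide's labels, and the patterns and the name scan are assumptions that only cover this example statement.

```python
import re

# Hypothetical token classes, one regular expression each; "skip" discards whitespace.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("operator", r":=|[+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Turn a character stream into a list of (category, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # lastgroup names the alternative that matched
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(scan("index := start + step * 20"))
```

Internally, re compiles these patterns into a matching engine, which plays the role of the finite automaton mentioned above.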


Syntactical Analysis

Token stream → syntax tree
● Recognize “sentences” of a language
● Grammars and parsers
● Context-free grammars (CFG)
● Parsers can be automatically generated
● Top-down and bottom-up parsing
● Predictive parsing
● Drives the process of compiler front ends

After scanning: index := start + step * 20

After parsing:

Assign
    ID (index)
    :=
    Exp
        Exp
            ID (start)
        +
        Exp
            Exp
                ID (step)
            *
            Exp
                Num (20)
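A predictive (top-down) parser for this statement can be written as recursive descent. The following is a sketch under assumptions: the small grammar in the docstring is hypothetical. Left recursion is replaced by loops, because a predictive parser cannot use left-recursive rules directly. The tree uses nested tuples instead of the slide's Exp nodes, so it records the same structure with fewer nodes.

```python
def parse_assignment(tokens):
    """Recursive-descent parser for a tiny hypothetical grammar:
         Assign -> identifier ':=' Exp
         Exp    -> Term { ('+' | '-') Term }
         Term   -> Factor { ('*' | '/') Factor }
         Factor -> identifier | number
    tokens: list of (category, lexeme) pairs, as produced by a scanner.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def expect(category, lexeme=None):
        nonlocal pos
        cat, lex = peek()
        if cat != category or (lexeme is not None and lex != lexeme):
            raise SyntaxError(f"expected {lexeme or category}, got {lex!r}")
        pos += 1
        return lex

    def factor():
        if peek()[0] == "identifier":
            return ("ID", expect("identifier"))
        return ("Num", int(expect("number")))

    def term():
        node = factor()
        while peek() in (("operator", "*"), ("operator", "/")):
            op = expect("operator")
            node = (op, node, factor())  # loop keeps * and / left-associative
        return node

    def exp():
        node = term()
        while peek() in (("operator", "+"), ("operator", "-")):
            op = expect("operator")
            node = (op, node, term())
        return node

    target = expect("identifier")
    expect("operator", ":=")
    tree = ("Assign", target, exp())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


tokens = [("identifier", "index"), ("operator", ":="), ("identifier", "start"),
          ("operator", "+"), ("identifier", "step"), ("operator", "*"), ("number", "20")]
print(parse_assignment(tokens))
```

Because Term sits below Exp in the grammar, the parser binds * more tightly than +, which produces the same shape as the tree above.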


Semantic Analysis

The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.

It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.

An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.

The language specification may permit some type conversions, called coercions. For example, a binary arithmetic operator may be applied either to a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert (coerce) the integer into a floating-point number.
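Both checks described above can be sketched as a recursive type checker: array indices must be integers, and mixed int/float arithmetic is resolved by coercion. In this minimal sketch the node encoding, the symbol-table format, and the conversion-node name inttofloat are all hypothetical. The coercion is made explicit in the returned tree so that later phases can generate code for it.

```python
def check(node, symtab):
    """Type-check an expression and return (annotated_node, type).

    Hypothetical node forms:
      ("num", value)           integer or float literal
      ("id", name)             variable; its type comes from the symbol table
      ("index", array, expr)   array access array[expr]
      (op, left, right)        binary arithmetic operator, op in + - * /
    symtab maps names to "int", "float", or ("array", element_type).
    """
    kind = node[0]
    if kind == "num":
        return node, "float" if isinstance(node[1], float) else "int"
    if kind == "id":
        return node, symtab[node[1]]
    if kind == "index":
        index, index_type = check(node[2], symtab)
        if index_type != "int":
            raise TypeError(f"array index must be int, got {index_type}")
        return ("index", node[1], index), symtab[node[1]][1]
    left, lt = check(node[1], symtab)
    right, rt = check(node[2], symtab)
    if lt not in ("int", "float") or rt not in ("int", "float"):
        raise TypeError(f"operator {kind} needs numeric operands, got {lt} and {rt}")
    if lt == rt:
        return (kind, left, right), lt
    # Mixed int/float: coerce the int operand by inserting an explicit conversion node.
    if lt == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (kind, left, right), "float"


symtab = {"initial": "float", "rate": "float", "i": "int", "x": "float", "a": ("array", "float")}
tree, t = check(("+", ("id", "initial"), ("*", ("id", "rate"), ("num", 60))), symtab)
print(t, tree)  # the integer literal 60 ends up wrapped in an inttofloat node
```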



Semantic Analysis

Understand/annotate the meaning of the program
● Syntax-directed translation
● Check semantic errors
● Inconsistent variable definitions and uses
● Type systems
● Collect knowledge of the input program
● Symbol tables
● Scopes
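Symbol tables and scopes often work together as a stack of scopes. This is a minimal sketch under that assumption. The class and method names are hypothetical, and the stored attributes are reduced to a type string. The sketch also shows how an inconsistent definition or use (a duplicate declaration, or an undeclared name) is caught.

```python
class SymbolTable:
    """A stack of scopes; each scope maps names to their attributes (here, just a type)."""

    def __init__(self):
        self.scopes = [{}]  # the outermost (global) scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, attrs):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} already declared in this scope")
        scope[name] = attrs

    def lookup(self, name):
        # Search from the innermost scope outward, so inner declarations shadow outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} used but not declared")


table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")    # shadows the outer x
inner = table.lookup("x")      # "float"
table.exit_scope()
outer = table.lookup("x")      # "int" again
```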



Intermediate Code Generation

Representation of the input program
● Internal to the compiler
● Encodes knowledge collected during compilation
● Varied forms and levels
● Typically a compiler uses more than one kind of IR
● Desired properties: easy to produce, manipulate, and translate into the target code


IR scheme


• front end produces IR

• optimizer transforms IR to more efficient program

• back end transforms IR to target code


Kinds of IR
● Abstract syntax trees (AST)
● Linear operator form of tree (e.g., postfix notation)
● Directed acyclic graphs (DAG)
● Control flow graphs (CFG)
● Program dependence graphs (PDG)
● Static single assignment form (SSA)
● 3-address code
● Hybrid combinations



Categories of IR

Structural
● graphically oriented (trees, DAGs)
● nodes and edges tend to be large
● heavily used in source-to-source translators

Linear
● pseudo-code for an abstract machine
● large variation in level of abstraction
● simple, compact data structures
● easier to rearrange

Hybrid
● combination of graphs and linear code (e.g., CFGs)
● attempts to achieve the best of both worlds



Important IR properties

● Ease of generation
● Ease of manipulation
● Cost of manipulation
● Level of abstraction
● Freedom of expression (!)
● Size of typical procedure
● Original or derivative


Subtle design decisions in the IR can have far-reaching effects on the speed and effectiveness of the compiler! The degree of exposed detail can be crucial.


Abstract Syntax Tree


An AST is a parse tree with the nodes for most non-terminal symbols removed.

A linear operator form of this tree (postfix) would be:

x 2 y * -
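The linear operator form comes from a post-order walk of the tree. This sketch assumes that the slide's tree is x - 2 * y, which is the expression the postfix string encodes. It uses nested tuples as a hypothetical AST encoding. A stack-based evaluator is included to show that the linear form still carries all the information of the tree.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def postfix(node):
    """Linearize an AST of nested (op, left, right) tuples into postfix tokens."""
    if isinstance(node, tuple):
        op, left, right = node
        return postfix(left) + postfix(right) + [op]
    return [str(node)]  # leaf: a name or a constant


def eval_postfix(tokens, env):
    """Evaluate postfix tokens with an operand stack; names are looked up in env."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif tok.isdigit():
            stack.append(int(tok))
        else:
            stack.append(env[tok])
    return stack.pop()


ast = ("-", "x", ("*", 2, "y"))  # the tree for x - 2 * y
print(" ".join(postfix(ast)))    # prints: x 2 y * -
```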


Directed Acyclic Graph


A DAG is an AST with unique, shared nodes for each value.

x := 2 * y + sin(2*x)
z := x / 2
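One common way to obtain the shared nodes is hash-consing, also called value numbering. Before a node is created, the builder looks up its key (operator plus operand node ids) and reuses an existing node if there is one. This minimal sketch assumes straight-line code. Variables map to the node for their most recent value, so the x in z := x / 2 refers to the value computed by the first statement, while the x inside sin(2*x) is the value x had on entry. The class and function names are hypothetical.

```python
class Dag:
    """Hash-consed expression DAG: structurally identical nodes get one shared id."""

    def __init__(self):
        self.nodes = []  # node id -> key tuple
        self.table = {}  # key tuple -> node id

    def node(self, *key):
        if key not in self.table:  # create a node only the first time this key is seen
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]


def build_dag(statements):
    """statements: list of (target, expr); expr is a name, a constant, or (op, *args)."""
    dag = Dag()
    current = {}  # variable -> node id of its most recently assigned value

    def expr(e):
        if isinstance(e, tuple):
            op, *args = e
            return dag.node(op, *(expr(a) for a in args))
        if isinstance(e, str):
            if e in current:
                return current[e]
            return dag.node("var", e)  # the value the variable had on entry
        return dag.node("const", e)

    for target, e in statements:
        current[target] = expr(e)
    return dag, current


# x := 2 * y + sin(2*x);  z := x / 2
dag, current = build_dag([
    ("x", ("+", ("*", 2, "y"), ("sin", ("*", 2, "x")))),
    ("z", ("/", "x", 2)),
])
print(len(dag.nodes))  # 8 nodes; the constant 2 is stored only once
```

Repeated subexpressions such as b + c in a := b + c; d := b + c map to the same node, which is why DAGs expose common subexpressions.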


Control Flow Graph

A CFG models transfer of control in a program
● nodes are basic blocks (straight-line blocks of code)
● edges represent control flow (loops, if/else, goto …)


if x = y then
    S1
else
    S2
end
S3
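A CFG is usually built by splitting linear code into basic blocks and then adding edges between them. This sketch uses the classic leader rule: a block starts at the first instruction, at every jump target, and at every instruction that follows a jump. The tuple encoding and the name build_cfg are hypothetical. Applied to a lowered form of the if/else example, it produces the expected diamond: a test block, an S1 block, an S2 block, and a join block for S3.

```python
def build_cfg(code):
    """Split linear code into basic blocks and add control-flow edges.

    Hypothetical instruction encoding (code must be non-empty):
      ("label", name)           jump target
      ("goto", name)            unconditional jump
      ("ifgoto", cond, name)    conditional jump; falls through when cond is false
      ("stmt", text)            any straight-line statement
    """
    # Leaders: the first instruction, every jump target, every instruction after a jump.
    leaders = {0}
    for i, instr in enumerate(code):
        if instr[0] == "label":
            leaders.add(i)
        elif instr[0] in ("goto", "ifgoto") and i + 1 < len(code):
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]
    block_of_label = {b[0][1]: n for n, b in enumerate(blocks) if b[0][0] == "label"}

    edges = set()
    for n, block in enumerate(blocks):
        last = block[-1]
        if last[0] == "goto":
            edges.add((n, block_of_label[last[1]]))
            continue  # no fall-through after an unconditional jump
        if last[0] == "ifgoto":
            edges.add((n, block_of_label[last[2]]))
        if n + 1 < len(blocks):
            edges.add((n, n + 1))  # fall-through to the next block
    return blocks, edges


# if x = y then S1 else S2 end; S3
code = [
    ("ifgoto", "x != y", "Else"),
    ("stmt", "S1"),
    ("goto", "End"),
    ("label", "Else"),
    ("stmt", "S2"),
    ("label", "End"),
    ("stmt", "S3"),
]
blocks, edges = build_cfg(code)
print(sorted(edges))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```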


3-address code


Statements take the form: x = y op z
● single operator and at most three names

Example: x - 2 * y becomes
t1 = 2 * y
t2 = x - t1

Advantages:
● Compact form
● Names for intermediate values
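The example translation can be produced by a post-order walk that assigns a fresh temporary to every operator node. This is a minimal sketch. The nested-tuple tree encoding, the temporary naming scheme t1, t2, …, and the name gen_tac are assumptions made for illustration.

```python
import itertools


def gen_tac(tree):
    """Generate three-address code for an expression tree of nested (op, left, right) tuples.

    Returns (instructions, name holding the result).
    """
    code = []
    temps = itertools.count(1)

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)            # names and constants appear directly as operands
        op, left, right = node
        a, b = walk(left), walk(right)
        temp = f"t{next(temps)}"        # a fresh temporary names the intermediate value
        code.append(f"{temp} = {a} {op} {b}")
        return temp

    result = walk(tree)
    return code, result


code, result = gen_tac(("-", "x", ("*", 2, "y")))
print("\n".join(code))
# t1 = 2 * y
# t2 = x - t1
```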


Typical 3-address codes

Assignments:
    x = y op z
    x = op y
    x = y[i]
    x = y

Branches:
    goto L

Conditional branches:
    if x relop y goto L

Procedure calls:
    param x
    param y
    call p

Address and pointer assignments:
    x = &y
    *y = z



IR choices

Other hybrids exist
● combinations of graphs and linear codes
● CFG with 3-address code for basic blocks

Many variants used in practice
● no widespread agreement
● compilers may need several different IRs!

Advice:
● choose an IR with the right level of detail
● keep manipulation costs in mind



Static Single Assignment Form

Goal: simplify procedure-global optimizations

Definition: a program is in SSA form if every variable is assigned only once.



Static Single Assignment (SSA)
● Each assignment to a temporary is given a unique name
● All uses reached by that assignment are renamed
● Compact representation
● Useful for many kinds of compiler optimization


Ron Cytron et al., “Efficiently computing static single assignment form and the control dependence graph,” ACM TOPLAS, 1991.

Original:
x := 3;
x := x + 1;
x := 7;
x := x*2;

SSA:
x1 := 3;
x2 := x1 + 1;
x3 := 7;
x4 := x3*2;
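For straight-line code, renaming into SSA needs only a version counter per variable. The key detail is to rename an instruction's uses before its definition, so that x := x + 1 reads the previous version. This sketch follows the numbering convention of the slides' examples, where a variable read before any assignment gets version 1 for its initial value. The statement encoding and the names to_ssa and show are hypothetical, and control-flow merges, which need Φ-functions, are not handled.

```python
from collections import defaultdict


def to_ssa(statements):
    """Rename straight-line code into SSA form.

    statements: list of (target, operands); operands is a list of tokens where
    identifier strings are variables and everything else (numbers, operators) is copied.
    """
    counter = defaultdict(int)  # last version number handed out per variable
    current = {}                # variable -> SSA name of its current value

    def new_version(var):
        counter[var] += 1
        current[var] = f"{var}{counter[var]}"
        return current[var]

    result = []
    for target, operands in statements:
        renamed = []
        for tok in operands:  # rename uses first
            if isinstance(tok, str) and tok.isidentifier():
                renamed.append(current[tok] if tok in current else new_version(tok))
            else:
                renamed.append(tok)
        result.append((new_version(target), renamed))  # then the definition
    return result


def show(ssa):
    return [f"{target} := {' '.join(map(str, ops))}" for target, ops in ssa]


print(show(to_ssa([("x", [3]), ("x", ["x", "+", 1]), ("x", [7]), ("x", ["x", "*", 2])])))
# prints ['x1 := 3', 'x2 := x1 + 1', 'x3 := 7', 'x4 := x3 * 2']
```

The same function reproduces the longer sequence example that follows (a := b + c; b := c + 1; …).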


Why Static?

● We only look at the static program
● One assignment per variable in the program

At runtime, variables may still be assigned multiple times (e.g., inside a loop)!



Example: Sequence

Easy to do for sequential programs:

Original:
a := b + c
b := c + 1
d := b + c
a := a + 1
e := a + b

SSA:
a1 := b1 + c1
b2 := c1 + 1
d1 := b2 + c1
a2 := a1 + 1
e1 := a2 + b2



Example: Condition

Conditions: what to do on control-flow merge?

Original:
if B then
    a := b
else
    a := c
end
… a …

SSA:
if B then
    a1 := b
else
    a2 := c
end
… a? …



Solution: Φ-Function

Conditions: what to do on control-flow merge?

Original:
if B then
    a := b
else
    a := c
end
… a …

SSA:
if B then
    a1 := b
else
    a2 := c
end
a3 := Φ(a1, a2)
… a3 …



The Φ-Function

Φ-functions are always at the beginning of a basic block

Select between values depending on control flow

$a_{k+1} := \Phi(a_1, \ldots, a_k)$: the block has k preceding blocks, and the Φ-function takes one argument per predecessor

Φ-functions are evaluated simultaneously within a basic block.
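Simultaneous evaluation matters when one Φ-function's argument is another Φ-function's target in the same block. A standard illustration is a loop that swaps two variables on every iteration, often called the swap problem. In this sketch the encoding and function names are hypothetical. Reading every selected argument before writing any target gives the intended swap, while the sequential variant silently loses a value.

```python
def eval_phis(phis, env, pred):
    """Evaluate a block's Φ-functions simultaneously.

    phis: list of (target, [argument for predecessor 0, argument for predecessor 1, ...])
    env:  current values of SSA names (modified in place and returned)
    pred: index of the predecessor block control arrived from
    """
    values = [env[args[pred]] for _, args in phis]  # read all selected arguments first...
    for (target, _), value in zip(phis, values):    # ...then write all targets
        env[target] = value
    return env


def eval_phis_sequentially(phis, env, pred):
    """Incorrect variant, for comparison: later Φs observe earlier Φs' new values."""
    for target, args in phis:
        env[target] = env[args[pred]]
    return env


# Loop header that swaps a and b each iteration (predecessor 0: entry, 1: back edge).
phis = [("a2", ["a1", "b2"]), ("b2", ["b1", "a2"])]
env = eval_phis(phis, {"a1": 1, "b1": 2}, pred=0)  # on entry: a2 = 1, b2 = 2
env = eval_phis(phis, env, pred=1)                  # via the back edge: a2 = 2, b2 = 1
```

This is also why translating out of SSA has to turn a block's Φ-functions into a parallel copy, not a naive sequence of assignments.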



SSA and CFG

SSA is normally used together with control-flow graphs (CFGs)

Basic blocks are in 3-address form



Recall: Control flow graph

A CFG models transfer of control in a program
● nodes are basic blocks (straight-line blocks of code)
● edges represent control flow (loops, if/else, goto …)

© Marcus Denker

if x = y then
    S1
else
    S2
end
S3



SSA: a Simple Example

if B then
    a1 := 1
else
    a2 := 2
end
a3 := PHI(a1, a2)
… a3 …



Recall: IR

• front end produces IR
• optimizer transforms IR to more efficient program
• back end transforms IR to target code



SSA as IR



Transforming to SSA

Problem: performance / memory
● Minimize the number of inserted Φ-functions
● Do not spend too much time

Many relatively complex algorithms
● We do not go into too much detail
● See the literature!



Minimal SSA

Two steps:
● Place Φ-functions
● Rename variables

Where to place Φ-functions?

We want the minimal number of Φ-functions needed:
● Save memory
● Algorithms will work faster
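The slides leave the placement algorithm to the literature. One standard answer, from Cytron et al. (1991), is to place Φ-functions for a variable at the iterated dominance frontier of the blocks that define it. This sketch computes dominators with simple iterative data flow, and dominance frontiers with the predecessor walk described by Cooper, Harvey, and Kennedy. The graph encoding and function names are hypothetical. The result is minimal SSA in Cytron's sense; it is not "pruned", so a Φ-function may still be placed for a variable that is dead at that point.

```python
def dominance_frontiers(succ, entry):
    """Return DF(n) for every node of a CFG given as {node: [successors]}.

    Assumes every node is reachable from entry.
    """
    nodes = list(succ)
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            preds[s].append(n)

    # Dom(n) = {n} ∪ (intersection of Dom(p) over predecessors p), iterated to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry or not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True

    # idom(n): the strict dominator closest to n, i.e. the one with the largest Dom set.
    idom = {entry: None}
    for n in nodes:
        if n != entry:
            idom[n] = max(dom[n] - {n}, key=lambda d: len(dom[d]))

    # At each join point b, walk up the dominator tree from every predecessor to idom(b).
    df = {n: set() for n in nodes}
    for b in nodes:
        if len(preds[b]) >= 2:
            for p in preds[b]:
                runner = p
                while runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    return df


def place_phis(df, def_sites):
    """Place Φ-functions at the iterated dominance frontier of each variable's definitions."""
    phis = {}  # block -> set of variables needing a Φ-function there
    for var, sites in def_sites.items():
        work = list(sites)
        seen = set(sites)  # blocks already queued for this variable
        has_phi = set()
        while work:
            x = work.pop()
            for y in df[x]:
                if y not in has_phi:
                    has_phi.add(y)
                    phis.setdefault(y, set()).add(var)
                    if y not in seen:  # the new Φ is itself a definition of var
                        seen.add(y)
                        work.append(y)
    return phis


diamond = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
print(place_phis(dominance_frontiers(diamond, "B0"), {"a": {"B1", "B2"}}))  # {'B3': {'a'}}
```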




Optimization: The Idea

Transform the program to improve efficiency

Performance: faster execution
Size: smaller executable, smaller memory footprint

Tradeoffs:
1) Performance vs. size
2) Compilation speed and memory



No Magic Bullet!

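There is no perfect optimizer. Example: optimize for simplicity

Opt(P): the smallest program equivalent to P

Q: a program with no output that does not stop

Opt(Q)? It would have to be a trivial infinite loop (L: goto L). A perfect Opt would therefore decide whether Q halts, and the halting problem is undecidable.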



Optimization on many levels

Optimizations happen both in the optimizer and in the back end



© Marcus Denker

Examples of Optimizations
● Constant folding / propagation
● Copy propagation
● Algebraic simplifications
● Strength reduction
● Dead code elimination
● Structure simplifications
● Loop optimizations
● Partial redundancy elimination
● Code inlining
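Two of the listed optimizations, constant folding/propagation and dead code elimination, can be sketched on straight-line three-address code. This is a minimal sketch under simplifying assumptions: a single basic block, integer constants only, no side effects or aliasing, and a caller-supplied set of variables live at the end of the block (live_out). The instruction encoding and function names are hypothetical; real compilers typically run such passes over a whole CFG, often in SSA form.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def const_fold_propagate(code):
    """Constant folding + propagation over straight-line three-address code.

    Each instruction is (target, op, arg1, arg2); op "=" is a copy (arg2 is None).
    Operands are variable names (str) or int constants.
    """
    known = {}  # variable -> constant it currently holds
    out = []
    for target, op, a, b in code:
        a = known.get(a, a) if isinstance(a, str) else a  # propagate known constants
        b = known.get(b, b) if isinstance(b, str) else b
        if op in FOLD and isinstance(a, int) and isinstance(b, int):
            instr = (target, "=", FOLD[op](a, b), None)  # fold at compile time
        else:
            instr = (target, op, a, b)
        known.pop(target, None)  # target is redefined: forget its old value
        if instr[1] == "=" and isinstance(instr[2], int):
            known[target] = instr[2]
        out.append(instr)
    return out


def dead_code_elim(code, live_out):
    """Drop assignments whose result is never read (backward liveness, no side effects)."""
    live = set(live_out)
    kept = []
    for target, op, a, b in reversed(code):
        if target not in live:
            continue  # dead: nothing later reads this value
        live.discard(target)
        live.update(x for x in (a, b) if isinstance(x, str))
        kept.append((target, op, a, b))
    return kept[::-1]


code = [
    ("t1", "=", 4, None),
    ("t2", "*", "t1", 5),
    ("t3", "+", "x", "t2"),
    ("t4", "-", "t2", "t1"),  # result never used
    ("y", "=", "t3", None),
]
folded = const_fold_propagate(code)
optimized = dead_code_elim(folded, live_out={"y"})
print(optimized)  # [('t3', '+', 'x', 20), ('y', '=', 't3', None)]
```

The order matters: folding first turns t1, t2, and t4 into constants whose only remaining uses have been replaced, which is what makes them dead.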
