Compiler Construction I Dr. Michael Petter TECHNISCHE UNIVERSITÄT MÜNCHEN FAKULTÄT FÜR INFORMATIK SoSe 2015 1 / 58
Compiler Construction I

Jan 23, 2022

Page 1: Compiler Construction I

Compiler Construction I

Dr. Michael Petter

TECHNISCHE UNIVERSITÄT MÜNCHEN

FAKULTÄT FÜR INFORMATIK

Summer Semester (SoSe) 2015

1 / 58

Page 2: Compiler Construction I

Organizing

Master or Bachelor in the 6th semester with 5 ECTS

Prerequisites:
  Informatik 1 & 2
  Theoretische Informatik
  Technische Informatik
  Grundlegende Algorithmen

Delve deeper with:
  Virtual Machines
  Program Optimization
  Programming Languages
  Praktikum Compilerbau
  Seminars

Materials:
  TTT-based lecture recordings
  the slides
  Related literature list online (⇒ Wilhelm/Seidl/Hack: Compiler Design)
  Tools for visualization of virtual machines (VAM)
  Tools for generating components of compilers (JFlex/CUP)

2 / 58

Page 3: Compiler Construction I

Organizing

Dates:
  Lecture: Mo. 14:15-15:45
  Tutorial: You can vote on two dates via Moodle

Exam:
  One exam in the summer, none in the winter
  Exam managed via TUM-online
  Successfully completed tutorial exercises (50% of the credits) earn a 0.3 bonus

3 / 58

Page 4: Compiler Construction I

Preliminary content

  Basics in regular expressions and automata
  Specification and implementation of scanners
  Reduced context-free grammars and pushdown automata
  Bottom-up syntax analysis
  Attribute systems
  Type checking
  Code generation for stack machines
  Register assignment
  Basic optimization

4 / 58

Page 5: Compiler Construction I

Topic:

Introduction

5 / 58

Page 6: Compiler Construction I

Interpreter

[Diagram: Program and Input are fed into the Interpreter, which produces the Output]

Pro:
  No precomputation on the program text necessary ⇒ no/small startup time

Con:
  Program components are analyzed multiple times during the execution ⇒ longer runtime

6 / 58

Page 7: Compiler Construction I

Concept of a Compiler

[Diagram: Program → Compiler → Code; Input → Machine (executing the Code) → Output]

Two phases:
  1. Translating the program text into machine code
  2. Executing the machine code on the input

7 / 58

Page 8: Compiler Construction I

Compiler

A precomputation on the program allows
  a more sophisticated variable management
  discovery and implementation of global optimizations

Disadvantage

The translation costs time

Advantage

The execution of the program becomes more efficient ⇒ payoff for more sophisticated programs or programs that are run multiple times.
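The trade-off between the two approaches can be sketched in a few lines of Python. The mini-language (nested tuples such as `("add", ...)`) and the names `interpret` and `compile_expr` are assumptions made for this illustration only; they are not part of the lecture material. The interpreter inspects the program structure on every evaluation, whereas the "compiler" inspects it once and returns a function that can be run on many inputs.

```python
# Hypothetical mini-language: expressions are nested tuples such as
# ("add", ("var", "x"), ("const", 1)).

def interpret(expr, env):
    """Interpreter: re-analyzes the expression structure on every evaluation."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return interpret(expr[1], env) + interpret(expr[2], env)
    if kind == "mul":
        return interpret(expr[1], env) * interpret(expr[2], env)
    raise ValueError(f"unknown node {kind!r}")


def compile_expr(expr):
    """'Compiler': analyzes the structure once, returns a function of the input."""
    kind = expr[0]
    if kind == "const":
        value = expr[1]
        return lambda env: value
    if kind == "var":
        name = expr[1]
        return lambda env: env[name]
    if kind in ("add", "mul"):
        left, right = compile_expr(expr[1]), compile_expr(expr[2])
        if kind == "add":
            return lambda env: left(env) + right(env)
        return lambda env: left(env) * right(env)
    raise ValueError(f"unknown node {kind!r}")


program = ("add", ("mul", ("var", "x"), ("const", 2)), ("const", 1))  # x * 2 + 1
code = compile_expr(program)                    # phase 1: translate once
results = [code({"x": x}) for x in range(3)]    # phase 2: execute on several inputs
```

The closures stand in for machine code here; a real compiler would of course emit instructions for a concrete or virtual machine.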

8 / 58

Page 9: Compiler Construction I

Compiler

General compiler setup:

[Diagram: Program code → Analysis → Int. Representation → Synthesis → Code; Analysis and Synthesis together form the Compiler]

9 / 58

Page 14: Compiler Construction I

Compiler

The analysis phase is divided into several parts:

Program code
  ↓  Scanner: lexical analysis (partitioning into tokens)
Token stream
  ↓  Parser: syntactic analysis (detecting hierarchical structure)
Syntax tree
  ↓  Type checker, ...: semantic analysis (inferring semantic properties)
(annotated) syntax tree

9 / 58

Page 15: Compiler Construction I

Topic:

Lexical Analysis

10 / 58

Page 19: Compiler Construction I

The lexical Analysis

[Diagram: Program code → Scanner → Token stream; the input "xyz + 42" becomes the tokens xyz (I), + (O), 42 (C)]

A token is a sequence of characters which together form a unit.
Tokens are subsumed in classes. For example:

→ Names (Identifiers) e.g. xyz, pi, ...

→ Constants e.g. 42, 3.14, "abc", ...

→ Operators e.g. +, ...

→ Reserved terms e.g. if, int, ...
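A minimal sketch of such a scanner, written with Python's `re` module. The class names (`CONSTANT`, `NAME`, ...), the concrete patterns, and the set of reserved terms are assumptions chosen for this example; the slides only fix the idea of classifying character sequences.

```python
import re

# Token classes as on the slide; patterns and reserved terms are assumptions.
RESERVED = {"if", "int"}
TOKEN_SPEC = [
    ("CONSTANT", r'\d+\.\d+|\d+|"[^"]*"'),
    ("NAME",     r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[+\-*/=]"),
    ("SKIP",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Partition text into (class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue                      # whitespace is dropped right away for brevity
        if kind == "NAME" and lexeme in RESERVED:
            kind = "RESERVED"             # reserved terms look like names
        tokens.append((kind, lexeme))
    return tokens
```

For the slide's input, `scan("xyz + 42")` yields one name, one operator, and one constant.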

11 / 58

Page 20: Compiler Construction I

The Lexical Analysis

Classified tokens allow for further pre-processing:

  Dropping irrelevant fragments, e.g. spacing, comments, ...
  Separating pragmas, i.e. directives for the compiler which are not directly part of the program, like include statements;
  Replacing tokens of particular classes with their meaning / internal representation, e.g.

    → Constants;

    → Names: typically managed centrally in a symbol table, possibly compared to reserved terms (if not already done by the scanner) and possibly replaced with an index.

⇒ Siever
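The siever's tasks can be illustrated with a small sketch. The token format (pairs of class and lexeme), the class names, and the restriction to integer constants are assumptions of this example, not something the slides prescribe.

```python
# Hypothetical token stream as a scanner might deliver it: (class, lexeme) pairs.
RESERVED = {"if", "int"}


def sieve(tokens):
    """Drop irrelevant tokens, convert constants, intern names in a symbol table."""
    symbol_table = []      # index -> name
    index_of = {}          # name -> index
    result = []
    for kind, lexeme in tokens:
        if kind in ("SPACE", "COMMENT"):
            continue                                   # irrelevant fragments
        if kind == "CONSTANT":
            result.append((kind, int(lexeme)))         # internal representation
        elif kind == "NAME" and lexeme in RESERVED:
            result.append(("RESERVED", lexeme))        # comparison with reserved terms
        elif kind == "NAME":
            if lexeme not in index_of:
                index_of[lexeme] = len(symbol_table)
                symbol_table.append(lexeme)
            result.append((kind, index_of[lexeme]))    # name replaced with an index
        else:
            result.append((kind, lexeme))
    return result, symbol_table


raw = [("NAME", "int"), ("SPACE", " "), ("NAME", "xyz"), ("SPACE", " "),
       ("OPERATOR", "="), ("SPACE", " "), ("NAME", "xyz"), ("OPERATOR", "+"),
       ("CONSTANT", "42"), ("COMMENT", "// answer")]
sieved, table = sieve(raw)
```

Both occurrences of `xyz` end up as the same index into `table`, which is what lets later phases compare names cheaply.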

12 / 58

Page 21: Compiler Construction I

The Lexical Analysis

Discussion:
  Scanner and Siever are often combined into a single component, mostly by providing appropriate callback actions in the event that the scanner detects a token.
  Scanners are mostly not written manually, but generated from a specification.

[Diagram: Specification → Generator → Scanner]

13 / 58

Page 23: Compiler Construction I

The Lexical Analysis - Generating:

... in our case:

[Diagram: Specification 0 | [1-9][0-9]* → Generator → finite automaton with edges labelled 0, [1-9] and [0-9]]

Specification of token classes: regular expressions;
Generated implementation: finite automata + X
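To make the example concrete, here is a hand-written sketch of a deterministic automaton for `0 | [1-9][0-9]*`, roughly what a generator could produce from that specification. The state numbering and function names are assumptions of this sketch.

```python
# States: 0 = start, 1 = after a single "0" (final), 2 = after [1-9][0-9]* (final).
FINAL = {1, 2}


def step(state, ch):
    """Transition function; returns None when no transition exists."""
    if state == 0 and ch == "0":
        return 1
    if state == 0 and ch in "123456789":
        return 2
    if state == 2 and ch in "0123456789":
        return 2
    return None


def accepts(word):
    """Run the automaton on word and report whether it ends in a final state."""
    state = 0
    for ch in word:
        state = step(state, ch)
        if state is None:
            return False
    return state in FINAL
```

Note that `"007"` is rejected: after the leading `0` the automaton is in state 1, which has no outgoing transitions.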

14 / 58

Page 24: Compiler Construction I

Chapter 1:

Basics: Regular Expressions

15 / 58

Lexical Analysis

Page 26: Compiler Construction I

Regular Expressions

Basics
  Program code is composed from a finite alphabet Σ of input characters, e.g. Unicode
  The set of text fragments of a token class is in general regular.
  Regular languages can be specified by regular expressions.

Definition Regular Expressions

The set E_Σ of (non-empty) regular expressions is the smallest set E with:

  ε ∈ E (ε a new symbol not from Σ);
  a ∈ E for all a ∈ Σ;
  (e1 | e2), (e1 · e2), e1∗ ∈ E if e1, e2 ∈ E.

16 / 58

[Photo: Stephen Kleene]

Page 27: Compiler Construction I

Regular Expressions

... Example:
  ((a · b∗) · a)
  (a | b)
  ((a · b) · (a · b))

Attention:
  We distinguish between characters a, 0, $, ... and meta-symbols (, |, ), ...
  To avoid (ugly) parentheses, we make use of operator precedences:

    ∗ > · > |

  and omit “·”
  Real specification languages offer additional constructs:

    e? ≡ (ε | e)
    e+ ≡ (e · e∗)

  and omit “ε”

17 / 58

Page 30: Compiler Construction I

Regular Expressions

Specifications need semantics
... Example:

  Specification   Semantics
  abab            {abab}
  a | b           {a, b}
  ab∗a            {ab^n a | n ≥ 0}

For e ∈ E_Σ we define the specified language [[e]] ⊆ Σ∗ inductively by:

  [[ε]]     = {ε}
  [[a]]     = {a}
  [[e∗]]    = ([[e]])∗
  [[e1|e2]] = [[e1]] ∪ [[e2]]
  [[e1·e2]] = [[e1]] · [[e2]]
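The inductive definition translates almost line by line into a recursive function. Since [[e]] may be infinite, the sketch below only enumerates words up to a length bound `max_len`; that bound, the tuple encoding of expressions, and the function name `language` are assumptions made for this illustration.

```python
from itertools import product

# Regular expressions as nested tuples (an encoding assumed for this sketch):
# ("eps",), ("sym", a), ("alt", e1, e2), ("cat", e1, e2), ("star", e)


def language(e, max_len):
    """Return all words of [[e]] that have length <= max_len."""
    kind = e[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {e[1]} if max_len >= 1 else set()
    if kind == "alt":
        return language(e[1], max_len) | language(e[2], max_len)
    if kind == "cat":
        left, right = language(e[1], max_len), language(e[2], max_len)
        return {u + v for u, v in product(left, right) if len(u + v) <= max_len}
    if kind == "star":
        base = language(e[1], max_len)
        words = {""}
        frontier = {""}
        while frontier:                   # append one more factor per round
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - words
            words |= frontier
        return words
    raise ValueError(f"unknown node {kind!r}")


a, b = ("sym", "a"), ("sym", "b")
ab_star_a = ("cat", ("cat", a, ("star", b)), a)      # ab*a
```

With a bound of 4, `language(ab_star_a, 4)` contains exactly the words a b^n a for n = 0, 1, 2.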

18 / 58

Page 31: Compiler Construction I

Keep in Mind:

The operators (_)∗, ∪, · are interpreted in the context of sets of words:

  (L)∗ = {w1 ... wk | k ≥ 0, wi ∈ L}
  L1 · L2 = {w1 w2 | w1 ∈ L1, w2 ∈ L2}

Regular expressions are internally represented as annotated ranked trees:

  (ab|ε)∗

  *
  └── |
      ├── ·
      │   ├── a
      │   └── b
      └── ε

Inner nodes: operator applications;
Leaves: particular symbols or ε.
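One possible in-memory form of such a ranked tree is sketched below. The class `Node`, the `RANK` table, and the `render` helper are hypothetical names; for simplicity the sketch also assumes that the characters `*`, `|` and `.` are not themselves alphabet symbols.

```python
from dataclasses import dataclass

RANK = {"*": 1, "|": 2, ".": 2}   # rank = number of children of each operator


@dataclass(frozen=True)
class Node:
    label: str                    # operator, alphabet symbol, or "ε"
    children: tuple = ()

    def __post_init__(self):
        expected = RANK.get(self.label, 0)      # leaves have rank 0
        if len(self.children) != expected:
            raise ValueError(f"{self.label!r} expects {expected} children")


def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# The tree for (ab|ε)*
tree = Node("*", (Node("|", (Node(".", (Node("a"), Node("b"))), Node("ε"))),))
```

Checking the rank at construction time means a malformed tree, such as an alternative with a single child, is rejected immediately.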

19 / 58

Page 33: Compiler Construction I

Regular Expressions

Example: Identifiers in Java:

  le = [a-zA-Z_\$]
  di = [0-9]
  Id = {le} ({le} | {di})*

  Float = {di}* (\.{di}|{di}\.) {di}*((e|E)(\+|\-)?{di}+)?

Remarks:
  “le” and “di” are token classes.
  Defined names are enclosed in “{”, “}”.
  Symbols are distinguished from meta-symbols via “\”.
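The two specifications can be tried out by transcribing them into Python's `re` syntax, where the named definitions become string constants that are spliced into the larger patterns. The helper names `is_identifier` and `is_float` are assumptions of this sketch, and the patterns mirror the slide's simplified definitions rather than the full Java rules.

```python
import re

LE = r"[a-zA-Z_$]"      # le: letters, underscore, dollar sign
DI = r"[0-9]"           # di: digits
ID = re.compile(rf"{LE}({LE}|{DI})*")
FLOAT = re.compile(rf"{DI}*(\.{DI}|{DI}\.){DI}*((e|E)(\+|-)?{DI}+)?")


def is_identifier(text):
    """True if the whole text matches the Id specification."""
    return ID.fullmatch(text) is not None


def is_float(text):
    """True if the whole text matches the Float specification."""
    return FLOAT.fullmatch(text) is not None
```

The middle group `(\.{di}|{di}\.)` forces a decimal point with at least one digit next to it, so `1e5` and `42` are not floats under this specification, while `.5`, `1.` and `6.02e+23` are.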

20 / 58

Page 36: Compiler Construction I

Chapter 2:

Basics: Finite Automata

21 / 58

Lexical Analysis

Page 37: Compiler Construction I

Finite Automata

Example:

[Diagram: automaton with edges labelled a, b, ε, ε]

Nodes: States;
Edges: Transitions;
Labels: Consumed input;

22 / 58

Page 39: Compiler Construction I

Finite Automata

Definition Finite Automata
A non-deterministic finite automaton (NFA) is a tuple A = (Q, Σ, δ, I, F) with:

  Q a finite set of states;
  Σ a finite alphabet of inputs;
  I ⊆ Q the set of start states;
  F ⊆ Q the set of final states and
  δ the set of transitions (transition relation)

For an NFA, we define:

Definition Deterministic Finite Automata
Given δ : Q × Σ → Q a function and |I| = 1, then we call the NFA A deterministic (DFA).
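The tuple definition maps directly onto a small data structure. In the sketch below, the class `NFA`, its field names, and the choice of the empty string as the label for ε-transitions are assumptions; the determinism check follows the definition above (δ is a total function on Q × Σ and there is exactly one start state).

```python
from dataclasses import dataclass

EPS = ""   # label used for ε-transitions in this sketch


@dataclass(frozen=True)
class NFA:
    states: frozenset
    alphabet: frozenset
    delta: frozenset      # set of triples (p, x, q) with x in alphabet or EPS
    start: frozenset
    final: frozenset

    def is_deterministic(self):
        """True if delta is a function Q × Σ → Q and there is one start state."""
        if len(self.start) != 1:
            return False
        if any(x == EPS for _, x, _ in self.delta):
            return False
        for p in self.states:
            for a in self.alphabet:
                targets = {q for (p1, x, q) in self.delta if p1 == p and x == a}
                if len(targets) != 1:
                    return False
        return True


dfa = NFA(frozenset({0, 1}), frozenset("ab"),
          frozenset({(0, "a", 1), (0, "b", 0), (1, "a", 1), (1, "b", 0)}),
          frozenset({0}), frozenset({1}))
nfa = NFA(frozenset({0, 1}), frozenset("ab"),
          frozenset({(0, "a", 0), (0, "a", 1), (0, EPS, 1)}),
          frozenset({0}), frozenset({1}))
```

Here `dfa` has exactly one successor for every state and symbol, while `nfa` has two `a`-successors of state 0 and an ε-transition.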

23 / 58

[Photos: Michael Rabin, Dana Scott]

Page 41: Compiler Construction I

Finite Automata

  Computations are paths in the graph.
  Accepting computations lead from I to F.
  An accepted word is the sequence of labels along an accepting computation ...

[Diagram: automaton with edges labelled a, b, ε, ε]

24 / 58

Page 43: Compiler Construction I

Finite Automata

Once again, more formally:

We define the (reflexive) transitive closure δ∗ of δ as the smallest set δ′ with:

  (p, ε, p) ∈ δ′ and
  (p, xw, q) ∈ δ′ if (p, x, p1) ∈ δ and (p1, w, q) ∈ δ′.

For two states p and q, δ∗ characterizes the words along each path between them.

The set of all accepted words, i.e. A’s accepted language, can be described compactly as:

  L(A) = {w ∈ Σ∗ | ∃ i ∈ I, f ∈ F : (i, w, f) ∈ δ∗}
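Membership in L(A) can be decided without enumerating paths by tracking the set of states reachable after each input character. The following is a sketch under assumptions: transitions are triples, ε is encoded as the empty string, and the example automaton for (ab)∗ is hypothetical rather than the one drawn on the slides.

```python
EPS = ""   # marks ε-transitions (an encoding chosen for this sketch)


def eps_closure(states, delta):
    """All states reachable from `states` via ε-transitions only."""
    closure, todo = set(states), list(states)
    while todo:
        p = todo.pop()
        for (p1, x, q) in delta:
            if p1 == p and x == EPS and q not in closure:
                closure.add(q)
                todo.append(q)
    return closure


def accepts(word, delta, start, final):
    """Decide whether (i, word, f) ∈ δ* for some i in start and f in final."""
    current = eps_closure(start, delta)
    for ch in word:
        moved = {q for (p, x, q) in delta if p in current and x == ch}
        current = eps_closure(moved, delta)
    return bool(current & final)


# Hypothetical NFA for (ab)*: 0 -a-> 1 -b-> 2 -ε-> 0, with 0 as start and final state.
delta = {(0, "a", 1), (1, "b", 2), (2, EPS, 0)}
```

After reading a prefix u, `current` holds exactly the states q with (i, u, q) ∈ δ∗ for some start state i, so the final intersection with F mirrors the formula for L(A).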

25 / 58

Page 44: Compiler Construction I

Chapter 3:

Converting Regular Expressions to NFAs

26 / 58

Lexical Analysis

Page 45: Compiler Construction I

In linear time from Regular Expressions to NFAs

[Diagram: NFA fragments for the cases e = a, e = ε, e = a|b, concatenation, and e = a∗, glued together with ε-transitions]

Thompson’s Algorithm

Produces O(n) states for regular expressions of length n.
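A compact sketch of the construction in one common textbook formulation; the exact placement of ε-edges may differ from the fragments drawn on the slide. The tuple encoding of expressions and the names `thompson` and `accepts` are assumptions of this example. Each expression node contributes at most two fresh states, which gives the O(n) bound.

```python
from itertools import count

EPS = ""   # label for ε-transitions (encoding assumed for this sketch)

# Regular expressions as nested tuples:
# ("eps",), ("sym", a), ("alt", e1, e2), ("cat", e1, e2), ("star", e)


def thompson(e, fresh=None, delta=None):
    """Return (start, final, delta) of an NFA for e with one start and one final state."""
    fresh = fresh if fresh is not None else count()
    delta = delta if delta is not None else set()
    kind = e[0]
    if kind in ("eps", "sym"):
        s, f = next(fresh), next(fresh)
        delta.add((s, EPS if kind == "eps" else e[1], f))
    elif kind == "cat":
        s, m, _ = thompson(e[1], fresh, delta)
        m2, f, _ = thompson(e[2], fresh, delta)
        delta.add((m, EPS, m2))                    # glue the two fragments
    elif kind == "alt":
        s, f = next(fresh), next(fresh)
        for sub in (e[1], e[2]):
            s1, f1, _ = thompson(sub, fresh, delta)
            delta.update({(s, EPS, s1), (f1, EPS, f)})
    elif kind == "star":
        s, f = next(fresh), next(fresh)
        s1, f1, _ = thompson(e[1], fresh, delta)
        delta.update({(s, EPS, s1), (f1, EPS, s1), (f1, EPS, f), (s, EPS, f)})
    else:
        raise ValueError(f"unknown node {kind!r}")
    return s, f, delta


def accepts(word, nfa):
    """Simulate the NFA on word by tracking the set of reachable states."""
    start, final, delta = nfa

    def closure(states):
        seen, todo = set(states), list(states)
        while todo:
            p = todo.pop()
            for (p1, x, q) in delta:
                if p1 == p and x == EPS and q not in seen:
                    seen.add(q)
                    todo.append(q)
        return seen

    current = closure({start})
    for ch in word:
        current = closure({q for (p, x, q) in delta if p in current and x == ch})
    return final in current


a, b = ("sym", "a"), ("sym", "b")
a_or_b_star = thompson(("star", ("alt", a, b)))     # (a|b)*
```

For `(a|b)*` the expression tree has four nodes (star, alternative, a, b), so the resulting automaton has eight states.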

27 / 58

[Photo: Ken Thompson]