1 Course Overview PART I: overview material 1 Introduction 2 Language processors (tombstone diagrams, bootstrapping) 3 Architecture of a compiler PART II: inside a compiler 4 Syntax analysis 5 Contextual analysis 6 Runtime organization 7 Code generation PART III: conclusion 8 Interpretation 9 Review Supplementary material: Theoretical foundations (Context-free grammars)

Jan 02, 2016

Course Overview

PART I: overview material
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler

PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation

PART III: conclusion
8 Interpretation
9 Review

Supplementary material: Theoretical foundations (Context-free grammars)

Syntactic Analysis (a peek ahead at Chapter 4)

Sub-phase | Input                | Theoretical tools                        | Output
Scanner   | String of characters | Regular expression, finite-state machine | Sequence of tokens
Parser    | Sequence of tokens   | Context-free grammar, BNF, EBNF          | Syntax tree or parse tree

Example

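• The program: x * y + z

• Input to parser: ID TIMES ID PLUS ID, which we’ll write as: id * id + id

• Output of parser: a parse tree

[Figure: parse tree for id * id + id. The root E has children E, +, E; the left child E has children E, *, E; each remaining E has the single child id.]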
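The scanner step in this example can be sketched in Python. This is a minimal illustration, not the course's scanner: the token names ID, TIMES and PLUS come from the slide, while the regular expressions, the SKIP rule for whitespace, and the function name `scan` are assumptions made for the sketch.

```python
import re

# Token names follow the slide; the patterns themselves are assumed for illustration.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),
    ("TIMES", r"\*"),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),  # whitespace is matched but never emitted as a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Return the list of token names for text; raise ValueError on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


print(scan("x * y + z"))  # ['ID', 'TIMES', 'ID', 'PLUS', 'ID']
```

The returned list is the sequence of tokens that the parser would receive as input.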

What must parser do?

1. Recognizer: not all sequences of tokens are programs
   – must distinguish between valid and invalid strings of tokens

2. Translator: must expose program structure
   • e.g., associativity and precedence
   • hence must return the syntax tree

We need:
– A language for describing valid sequences of tokens
  • context-free grammars
  • (analogous to regular expressions in the scanner)

– A method for distinguishing valid from invalid strings of tokens (and for building the syntax tree)
  • the parser
  • (analogous to the finite state machine in the scanner)

Context-free grammars (CFGs)

• Example: Simple Arithmetic Expressions
– In English:
  • An integer is an arithmetic expression.
  • If exp1 and exp2 are arithmetic expressions, then so are the following:
      exp1 - exp2
      exp1 / exp2
      ( exp1 )

– The corresponding CFG (left), and how we’ll write it with tokens abbreviated (right):

exp → INTLITERAL           E → intlit
exp → exp MINUS exp        E → E - E
exp → exp DIVIDE exp       E → E / E
exp → LPAREN exp RPAREN    E → ( E )

Reading the CFG

• The grammar has five terminal symbols:
  – intlit, -, /, (, )
  – terminals of a grammar = tokens returned by the scanner.

• The grammar has one non-terminal symbol:
  – E
  – non-terminals describe valid sequences of tokens

• The grammar has four productions or rules,
  – each of the form: E → α
    • left-hand side = a single non-terminal.
    • right-hand side (α) = either
      – a sequence of one or more terminals and/or non-terminals, or
      – ε (the empty string)
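As a small illustration of this reading, the grammar can be written down as plain Python data and its terminals and non-terminals computed from the productions. The representation (a list of left-hand side and right-hand side pairs) and the helper names are assumptions chosen for this sketch.

```python
# Productions of the expression grammar as (left-hand side, right-hand side) pairs.
PRODUCTIONS = [
    ("E", ["intlit"]),
    ("E", ["E", "-", "E"]),
    ("E", ["E", "/", "E"]),
    ("E", ["(", "E", ")"]),
]


def nonterminals(productions):
    """Every symbol that appears on a left-hand side is a non-terminal."""
    return {lhs for lhs, _ in productions}


def terminals(productions):
    """Every right-hand-side symbol that is not a non-terminal is a terminal."""
    nts = nonterminals(productions)
    return {sym for _, rhs in productions for sym in rhs if sym not in nts}


print(sorted(nonterminals(PRODUCTIONS)))  # ['E']
print(sorted(terminals(PRODUCTIONS)))     # ['(', ')', '-', '/', 'intlit']
print(len(PRODUCTIONS))                   # 4
```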

Example, revisited

• Note:
  – a more compact way to write the previous grammar:

    E → intlit | E - E | E / E | ( E )

    or

    E → intlit
      | E - E
      | E / E
      | ( E )

A formal definition of CFGs

• A CFG consists of
  – A set of terminals T
  – A set of non-terminals N
  – A start symbol S (one of the non-terminals)
  – A set of productions:

    $X \to Y_1 Y_2 \cdots Y_n$

    where $X \in N$ and $Y_i \in T \cup N$

Notational Conventions

• In these lecture notes
  – Non-terminals are written in upper-case
  – Terminals are written in lower-case
  – The start symbol is the left-hand side of the first production (unless specified otherwise)

The Language of a CFG

The language defined by a CFG is the set of strings that can be derived from the start symbol of the grammar.

Derivation: Read productions as rules:

$X \to Y_1 \cdots Y_n$

means $X$ can be replaced by $Y_1 \cdots Y_n$

Derivation: key idea

1. Begin with a string consisting of the start symbol “S”

2. Replace any non-terminal $X$ in the string by the right-hand side of some production $X \to Y_1 \cdots Y_n$

3. Repeat (2) until there are no non-terminals in the string

Derivation: an example

CFG:
E → id
E → E + E
E → E * E
E → ( E )

String id * id + id is in the language defined by the grammar.

Derivation:
E
⇒ E + E
⇒ E * E + E
⇒ id * E + E
⇒ id * id + E
⇒ id * id + id
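The replacement procedure can be sketched in Python and used to reproduce this derivation of id * id + id. The sketch makes two assumptions that are not in the slides: it always rewrites the left-most non-terminal (the general rule allows any non-terminal), and the caller supplies the production to use at each step as an index into a hypothetical `GRAMMAR` table.

```python
# Right-hand sides for E, in the order listed in the CFG above.
GRAMMAR = {"E": [["id"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]]}


def derive(start, choices, grammar=GRAMMAR):
    """Rewrite the left-most non-terminal once per entry in choices; return every step."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        index = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if index is None:
            raise ValueError("no non-terminal left to replace")
        # Replace the non-terminal by the chosen right-hand side.
        form[index:index + 1] = grammar[form[index]][choice]
        steps.append(" ".join(form))
    if any(sym in grammar for sym in form):
        raise ValueError("derivation incomplete: non-terminals remain")
    return steps


for step in derive("E", [1, 2, 0, 0, 0]):
    print(step)
```

The choice list `[1, 2, 0, 0, 0]` selects E → E + E, then E → E * E, then E → id three times, which matches the six lines of the derivation.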

Terminals

• Terminals are so called because there are no rules for replacing them

• Once generated, terminals are permanent

• Therefore, terminals are the tokens of the language

The Language of a CFG (continued)

More formally, we can write

$X_1 \cdots X_i \cdots X_n \Rightarrow X_1 \cdots X_{i-1} Y_1 \cdots Y_m X_{i+1} \cdots X_n$

if there is a production

$X_i \to Y_1 \cdots Y_m$

The Language of a CFG (continued)

Write

$X_1 \cdots X_n \Rightarrow^{*} Y_1 \cdots Y_m$

if

$X_1 \cdots X_n \Rightarrow \cdots \Rightarrow Y_1 \cdots Y_m$

using a sequence of 0 or more replacement steps

The Language of a CFG

Let G be a context-free grammar with start symbol S. Then the language of G is:

$\{\, a_1 \cdots a_n \mid S \Rightarrow^{*} a_1 \cdots a_n \text{ and every } a_i \text{ is a terminal} \,\}$

Example

Strings of balanced parentheses: $\{\, (^i \, )^i \mid i \ge 0 \,\}$

The grammar:

S → ( S )
S → ε

which is the same as

S → ( S ) | ε
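To check that the grammar S → ( S ) | ε really generates the set of strings made of i opening parentheses followed by i closing ones, the two descriptions can be compared on all short strings. This is only a sketch; the function names `derives` and `in_language` are made up for the illustration.

```python
from itertools import product


def derives(s):
    """Return True if s can be derived from S using S -> ( S ) | epsilon."""
    if s == "":
        return True                 # S -> epsilon
    if len(s) >= 2 and s[0] == "(" and s[-1] == ")":
        return derives(s[1:-1])     # S -> ( S )
    return False


def in_language(s):
    """Direct membership test: i opening parentheses followed by i closing ones."""
    i = len(s) // 2
    return s == "(" * i + ")" * i


# Compare the two definitions on every string over "(" and ")" of length 0 to 8.
for length in range(9):
    for chars in product("()", repeat=length):
        candidate = "".join(chars)
        assert derives(candidate) == in_language(candidate)

print(derives("((()))"), derives("()()"))  # True False
```

Note that a string such as ()() is rejected: this particular grammar only nests parentheses, it never places groups side by side.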

Another Example

A simple arithmetic expression grammar:

E → E + E | E * E | ( E ) | id

Some strings in the language of this grammar:

id             id + id
( id )         id * id
( id ) * id    id * ( id )

Derivations and Parse Trees

A derivation is a sequence of productions

$S \Rightarrow \cdots \Rightarrow \cdots$

A derivation can be drawn as a tree
– Start symbol is the tree’s root
– For a production $X \to Y_1 \cdots Y_n$, add children $Y_1 \cdots Y_n$ to node $X$

Derivation Example

• Grammar

E → E + E | E * E | ( E ) | id

• String

id * id + id

Derivation Example (continued)

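E
⇒ E + E
⇒ E * E + E
⇒ id * E + E
⇒ id * id + E
⇒ id * id + id

[Figure: the corresponding parse tree. The root E has children E, +, E; the left child E has children E, *, E; each remaining E has the single child id.]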

Notes on Derivations

• A syntax tree or parse tree has
  – Terminals at the leaves
  – Non-terminals at the interior nodes

• An in-order (left-to-right) traversal of the leaves yields the original input string

• As in the preceding example, we usually show a left-most derivation, that is, we replace the left-most non-terminal remaining at each step
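The claim about the leaves can be checked on the parse tree for id * id + id. The sketch below assumes a simple representation that is not part of the notes: each node is a (symbol, children) pair and a leaf has an empty list of children.

```python
def leaf(token):
    """A terminal node: a symbol with no children."""
    return (token, [])


# Parse tree for id * id + id, with E -> E + E at the root and E -> E * E on the left.
TREE = ("E", [
    ("E", [("E", [leaf("id")]), leaf("*"), ("E", [leaf("id")])]),
    leaf("+"),
    ("E", [leaf("id")]),
])


def leaves(node):
    """Collect the leaf symbols of the tree from left to right."""
    symbol, children = node
    if not children:
        return [symbol]
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


print(" ".join(leaves(TREE)))  # id * id + id
```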

Ambiguity

• Grammar

E → E + E | E * E | ( E ) | id

• String

id * id + id

Ambiguity (continued)

This string has two parse trees

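[Figure: two parse trees for id * id + id. In the first, the root is E → E * E and its right child expands to E + E, which groups the string as id * (id + id). In the second, the root is E → E + E and its left child expands to E * E, which groups the string as (id * id) + id.]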

TEST YOURSELF

Question 1:
– for each of the two parse trees, find the corresponding left-most derivation

Question 2:
– for each of the two parse trees, find the corresponding right-most derivation

Ambiguity (continued)

• A grammar is ambiguous if for at least one string:
  – the string has more than one parse tree
  – the string has more than one left-most derivation
  – the string has more than one right-most derivation
  • Note that these three conditions are equivalent

• Ambiguity is BAD
  – because if the grammar is ambiguous then the meaning of some programs is not well-defined
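One way to observe ambiguity directly is to count parse trees. The sketch below is written specifically for the grammar E → E + E | E * E | ( E ) | id and is not a general parsing algorithm; the function name and the token-list input format are assumptions. A count greater than 1 for some string shows that the grammar is ambiguous.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees that derive tokens[i:j] from E.
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                                   # E -> id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)   # E -> E op E, split at k
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "*", "id", "+", "id"]))  # 2
print(count_parse_trees(["(", "id", ")"]))              # 1
```

The count of 2 for id * id + id corresponds to the two parse trees for that string, one with * at the root and one with + at the root.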

Dealing with Ambiguity

• There are several ways to handle ambiguity

• The most direct method is to rewrite the grammar so that it is unambiguous

• For example, enforce precedence of * and / over + and –

Enforcing Correct Precedence

• Rewrite the grammar
  – use a different nonterminal for each precedence level
  – start with the lowest precedence (MINUS)

E → E - E | E / E | ( E ) | id

rewrite to

E → E - E | T
T → T / T | F
F → id | ( E )

Example

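parse tree for id - id / id

E → E - E | T
T → T / T | F
F → id | ( E )

[Figure: parse tree. The root E has children E, -, E. The left E derives T, then F, then id. The right E derives T, which has children T, /, T; each of those T derives F, then id.]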

TEST YOURSELF

Question 3:
• Attempt to construct a parse tree for id-id/id that shows the wrong precedence.
  – Why do you fail to construct such a parse tree?

Question 4:
• Draw two parse trees for the expression a-b-c
  – One should correctly group ((a-b)-c), and one should incorrectly group (a-(b-c))

Enforcing Correct Associativity

• The grammar captures operator precedence, but it is still ambiguous
  – it fails to express that both subtraction and division are left associative;
    • 5-3-2 is equivalent to ((5-3)-2) but not to (5-(3-2)).
    • 8/4/2 is equivalent to ((8/4)/2) but not to (8/(4/2)).

Recursion

• A grammar is recursive in nonterminal X if:
  – $X \Rightarrow^{+} \ldots X \ldots$
    • the notation $\Rightarrow^{+}$ means “after one or more steps, X derives a sequence of symbols that includes another X”

• A grammar is left recursive in X if:
  – $X \Rightarrow^{+} X \ldots$
    • after one or more steps, X derives a sequence of symbols that starts with an X

• A grammar is right recursive in X if:
  – $X \Rightarrow^{+} \ldots X$
    • after one or more steps, X derives a sequence of symbols that ends with an X

How to fix associativity

• The grammar given above is both left and right recursive in non-terminals E and T (exp and term)
  – try at home: write the derivation steps that show this.

• To correctly express operator associativity:
  – For left associativity, use only left recursion.
  – For right associativity, use only right recursion.

• Here's the correct grammar:

E → E - T | T
T → T / F | F
F → id | ( E )

Ambiguity: The Dangling Else Problem

• Consider the grammar

S → if E then S | if E then S else S | a
E → b

• This grammar is ambiguous

The Dangling Else Problem: Example

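• The input string if b then if b then a else a has two different parse trees:

[Figure: two parse trees. In one, the root is S → if E then S else S and its then-branch is S → if E then S, so the else attaches to the outer if. In the other, the root is S → if E then S and its then-branch is S → if E then S else S, so the else attaches to the inner if.]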

The Dangling Else Problem: How to Fix

• else should match the closest unmatched then

• We can enforce this in a grammar:

S → M    /* all then are matched */
  | U    /* some then are unmatched */

M → if E then M else M
  | a

U → if E then S
  | if E then M else U

• Note: still generates the same set of strings

The Dangling Else Problem: Example Revisited

• Consider: if b then if b then a else a

• There is now only one possible parse tree for this string

• Try to draw a different parse tree and you should see why this is true

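[Figure: the parse tree for this string. The outer if b then … is derived with U → if E then S, and the inner if b then a else a is derived with M → if E then M else M, so the else matches the closest then.]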

Reg Exp are a Subset of CFG

We can inductively build a grammar for each Reg Exp:

ε          S → ε
a          S → a
R1 R2      S → S1 S2
R1 | R2    S → S1 | S2
R1*        S → S1 S | ε

Where:
G1 = grammar for R1, with start symbol S1
G2 = grammar for R2, with start symbol S2
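The inductive construction translates almost line for line into a recursive function. In this sketch a regular expression is given as nested tuples, such as `("star", ("alt", ("sym", "a"), ("sym", "b")))` for (a|b)*. That encoding, the generated non-terminal names S0, S1, …, and the function name are all assumptions made for the illustration; an empty right-hand side stands for ε.

```python
import itertools


def regex_to_grammar(regex):
    """Return (start_symbol, productions) for a regex given as nested tuples."""
    counter = itertools.count()
    productions = {}

    def build(node):
        start = f"S{next(counter)}"   # fresh start symbol for this sub-expression
        kind = node[0]
        if kind == "eps":
            productions[start] = [[]]                  # S -> epsilon
        elif kind == "sym":
            productions[start] = [[node[1]]]           # S -> a
        elif kind == "cat":
            s1, s2 = build(node[1]), build(node[2])
            productions[start] = [[s1, s2]]            # S -> S1 S2
        elif kind == "alt":
            s1, s2 = build(node[1]), build(node[2])
            productions[start] = [[s1], [s2]]          # S -> S1 | S2
        elif kind == "star":
            s1 = build(node[1])
            productions[start] = [[s1, start], []]     # S -> S1 S | epsilon
        else:
            raise ValueError(f"unknown regex node {kind!r}")
        return start

    return build(regex), productions


start, rules = regex_to_grammar(("star", ("alt", ("sym", "a"), ("sym", "b"))))
print(start)
for lhs, alternatives in rules.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```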

Backus–Naur Form

BNF is a special syntax for writing a CFG:

CFG
E → E + T | E - T | T
T → T * F | T / F | F
F → id | ( E )

BNF
<Expr> ::= <Expr> + <Term> | <Expr> - <Term> | <Term>
<Term> ::= <Term> * <Fact> | <Term> / <Fact> | <Fact>
<Fact> ::= id | ( <Expr> )

Extended BNF

EBNF permits any Reg Exp on right side of productions:

BNF
<Expr> ::= <Expr> + <Term> | <Expr> - <Term> | <Term>
<Term> ::= <Term> * <Fact> | <Term> / <Fact> | <Fact>
<Fact> ::= id | ( <Expr> )

EBNF
<Expr> ::= <Term> ( “+” <Term> | “-” <Term> )*
<Term> ::= <Fact> ( “*” <Fact> | “/” <Fact> )*
<Fact> ::= id | “(” <Expr> “)”
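The EBNF form maps naturally onto a hand-written recursive-descent parser: each non-terminal becomes a function and each ( … )* becomes a loop. The following is a sketch under some assumptions, not the parser developed in the course: tokens arrive as a Python list of strings, the minus sign is the ASCII "-", and the syntax tree is returned as nested tuples of the form (operator, left, right). Folding each new operand into the tree built so far keeps the operators left associative.

```python
def parse(tokens):
    """Parse a token list with the EBNF grammar; return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token!r} but found {peek()!r} at position {pos}")
        pos += 1

    def expr():   # <Expr> ::= <Term> ( "+" <Term> | "-" <Term> )*
        node = term()
        while peek() in ("+", "-"):
            op = peek()
            expect(op)
            node = (op, node, term())   # fold to the left: left associative
        return node

    def term():   # <Term> ::= <Fact> ( "*" <Fact> | "/" <Fact> )*
        node = fact()
        while peek() in ("*", "/"):
            op = peek()
            expect(op)
            node = (op, node, fact())
        return node

    def fact():   # <Fact> ::= id | "(" <Expr> ")"
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        expect("id")
        return "id"

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse(["id", "-", "id", "-", "id"]))  # ('-', ('-', 'id', 'id'), 'id')
print(parse(["id", "-", "id", "/", "id"]))  # ('-', 'id', ('/', 'id', 'id'))
```

The first result groups as ((id - id) - id), showing left associativity; the second shows / binding more tightly than -.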

TEST YOURSELF

Question 5: Write a CFG, BNF, and/or EBNF for each of these languages:
– Strings of the form $a^n b^n$. Example: aaabbb
– Strings $a^m b^n$ such that m>n. Example: aaaabb
– Strings $a^m b^n$ such that m<n. Example: aabbbb
– Strings over {a, b} such that the number of a’s equals the number of b’s. Example: baabba
– Strings of the form $a^m b^n c^p$. Example: aabbbcccc
– Strings of the form $a^m b^n c^p$ such that either m=n or n=p. Examples: aabbcccc, aabbbbcccc