Fall 2014-2015 Compiler Principles Lecture 2: Parsing part 1 Roman Manevich Ben-Gurion University 1

Jun 20, 2020

Page 1

Fall 2014-2015 Compiler Principles
Lecture 2: Parsing part 1

Roman Manevich
Ben-Gurion University

Page 2

Previously: lexical analysis

• High-level process

• Scanner generator (e.g., JFlex) automatically generates scanner code

[Diagram: list of regular expressions (one per lexeme) → NFA+ε → DFA → minimization → code implementing maximal munch with a tie-breaking policy: Token nextToken() {…}]

Page 3

Books

• Compilers: Principles, Techniques, and Tools. Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

• Advanced Compiler Design and Implementation. Steven Muchnick

• Modern Compiler Design. D. Grune, H. Bal, C. Jacobs, K. Langendoen

• Modern Compiler Implementation in Java. Andrew W. Appel

Page 4

Tentative syllabus

• Front End: Scanning, Top-down Parsing (LL), Bottom-up Parsing (LR), Attribute Grammars

• Intermediate Representation: Lowering

• Optimizations: Local Optimizations, Dataflow Analysis, Loop Optimizations

• Code Generation: Register Allocation, Instruction Selection

Also marked on the slide: mid-term exam

Page 5

Agenda

5

• Understand role of syntax analysis

• Context-free grammars refresher

• Top-down parsing

Page 6

The bigger picture

• Compilers include different kinds of program analyses, each of which further constrains the set of legal programs

– Lexical constraints: program consists of legal tokens

– Syntax constraints: program is included in a given context-free language

– Semantic constraints: program is included in a given attribute grammar (type checking, legal inheritance graph, variables initialized before use)

– “Logical” constraints (Verifying Compiler grand challenge): memory safety (null dereference, array-out-of-bounds access, data races), functional correctness (program meets specification)

Page 7

Syntax analysis overview

7

Page 8

Role of syntax analysis

• Recover structure from stream of tokens
– Parse tree / abstract syntax tree

• Error reporting (recovery)

• Other possible tasks
– Syntax directed translation (one pass compilers)
– Create symbol table
– Create pretty-printed version of the program, e.g., Auto Formatting function in Eclipse

[Diagram: High-level Language (scheme) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Inter. Rep. (IR) → Code Generation → Executable Code]

Page 9

From tokens to abstract syntax trees

Program text: 59 + (1257 * xPosition)

Lexical Analyzer (regular expressions, finite automata): reports a lexical error, or produces a valid token stream

Token stream: num + ( num * id )

Parser (context-free grammars, push-down automata): reports a syntax error, or produces a valid abstract syntax tree

Grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )

Abstract Syntax Tree: +(num, *(num, x))

Page 10

Context-free grammars refresher

Page 11

Example grammar

S → S ; S
S → id := E
S → print (L)
E → id
E → num
E → E + E
L → E
L → L, E

S is shorthand for Statement, E is shorthand for Expression, L is shorthand for List (of expressions)

Page 12

CFG terminology

Symbols:
Terminals (tokens): ; := ( ) + , id num print
Non-terminals: S E L

Start non-terminal: S
Convention: the non-terminal appearing in the first derivation rule

Grammar productions (rules): N → α

S → S ; S | id := E | print (L)
E → id | num | E + E
L → E | L, E

Page 13

More definitions

• Sentential form: a sequence of symbols, terminals (tokens) and non-terminals

• Sentence: a sequence of terminals (tokens)

• Derivation step: given a sentential form αNβ and a rule N → µ, a step is the transition αNβ => αµβ

• Derivation sequence: a sequence of derivation steps φ1 => … => φk such that φi => φi+1 is the result of applying one production and φk is a sentence

Page 14

Language of a CFG

• A word ω is in L(G) (valid program) if there exists a corresponding derivation sequence
– Start with the start symbol
– Repeatedly replace one of the non-terminals by a right-hand side of a production
– Stop when the sentential form contains only terminals

• ω is in L(G) if S =>* ω
– Rightmost derivation
– Leftmost derivation

Page 15

Leftmost derivation

15

S

=> S ; S

=> id := E ; S

=> id := num ; S

=> id := num ; id := E

=> id := num ; id := E + E

=> id := num ; id := num + E

=> id := num ; id := num + num

Input: a := 56 ; b := 7 + 3
Token stream: id := num ; id := num + num

Grammar:
S → S ; S | id := E | print (L)
E → id | num | E + E
L → E | L, E
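
A leftmost derivation is fully determined by the sequence of productions applied, so it can be replayed mechanically. The following Python sketch illustrates this for the grammar on this slide; the names `GRAMMAR`, `leftmost_step`, `derive` and the rule numbering (position of the alternative within each non-terminal) are assumptions made for this example, not part of the lecture.

```python
# Grammar from the slide: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["S", ";", "S"], ["id", ":=", "E"], ["print", "(", "L", ")"]],
    "E": [["id"], ["num"], ["E", "+", "E"]],
    "L": [["E"], ["L", ",", "E"]],
}

def leftmost_step(form, rule_index):
    """Rewrite the leftmost non-terminal of `form` with its alternative number `rule_index`."""
    for pos, symbol in enumerate(form):
        if symbol in GRAMMAR:
            return form[:pos] + GRAMMAR[symbol][rule_index] + form[pos + 1:]
    raise ValueError("form is already a sentence")

def derive(rule_indices, start="S"):
    """Return every sentential form of the leftmost derivation, starting from `start`."""
    form = [start]
    steps = [form]
    for index in rule_indices:
        form = leftmost_step(form, index)
        steps.append(form)
    return steps

# The seven productions used in the derivation on this slide.
steps = derive([0, 1, 1, 1, 2, 1, 1])
for form in steps:
    print("=>", " ".join(form))
```

Because the non-terminal to rewrite is always the leftmost one, the list of rule numbers alone is enough to reconstruct the whole derivation.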

Page 16

Rightmost derivation

16

S

=> S ; S

=> S ; id := E

=> S ; id := E + E

=> S ; id := E + num

=> S ; id := num + num

=> id := E ; id := num + num

=> id := num ; id := num + num

Input: a := 56 ; b := 7 + 3
Token stream: id := num ; id := num + num

Grammar:
S → S ; S | id := E | print (L)
E → id | num | E + E
L → E | L, E

Page 17

Canonical derivations

• A word may have more than one leftmost (or rightmost) derivation, but fixing the order allows describing a derivation by the sequence of production rules taken (since the non-terminal to rewrite at each step is already known)

Page 18

Parse trees

• Tree nodes are symbols, children ordered left-to-right

• Each internal node is a non-terminal and its children correspond to one of its productions: N → µ1 … µk

• Root is start non-terminal

• Leaves are tokens

• Yield of parse tree: left-to-right walk over leaves

[Figure: node N with children µ1 … µk]
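
As a small illustration of these definitions, the sketch below models a parse tree in Python and computes its yield. The `Node` class and the `tree_yield` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str                                   # terminal or non-terminal
    children: list = field(default_factory=list)  # ordered left to right

def tree_yield(node):
    """Left-to-right walk over the leaves of the tree."""
    if not node.children:
        return [node.symbol]
    leaves = []
    for child in node.children:
        leaves.extend(tree_yield(child))
    return leaves

# Parse tree for "id := num" using S -> id := E and E -> num.
tree = Node("S", [Node("id"), Node(":="), Node("E", [Node("num")])])
print(tree_yield(tree))
```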

Page 19

Parse tree exercise

Draw the parse tree for the expression

id := num ; id := num + num

Grammar:
S → S ; S | id := E | print (L)
E → id | num | E + E
L → E | L, E

Page 20

Parse tree exercise

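id := num ; id := num + num

[Figure: parse tree with root S → S ; S; the first S derives id := E with E over num, the second S derives id := E with E → E + E over num + num]

Order-independent representation

Equivalently, add parentheses labeled by non-terminal names:

(S (S a := (E 56)E )S ; (S b := (E (E 7)E + (E 3)E )E )S )S

Grammar:
S → S ; S | id := E | print (L)
E → id | num | E + E
L → E | L, E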

Page 21

Capabilities and limitations of CFGs

• CFGs naturally express
– Hierarchical structure
• A program is a list of classes, a class is a list of definitions…
– Alternatives
• A definition is either a field definition or a method definition
– Beginning-end type of constraints
• Balanced parentheses: S → (S)S | ε

• Cannot express
– Correlations between unbounded strings (identifiers)
– For example: variables are declared before use: ω S ω
• Handled by semantic analysis (attribute grammars)

p. 173

Page 22

Bad grammars

22

By Oren neu dag (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Page 23

Badly-formed grammars

• A non-terminal N is reachable if S =>* αNβ

• A non-terminal N is generating if N =>* ω for some string of terminals ω

• A grammar G is badly-formed if it contains either unreachable non-terminals or non-generating non-terminals
– G1 = { S → x, N → y } (N is unreachable)
– G2 = { S → x | N, N → a N b N } (N is non-generating)

• Theorem: for every grammar G there exists an equivalent well-formed grammar G’ (that is, L(G)=L(G’))
Proof: exercise

• From now on, we will only handle well-formed grammars
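
Both properties can be computed by simple fixed-point iterations. The sketch below is one possible way to do so in Python, not the lecture's method; the dictionary encoding of grammars and the function names `generating` and `reachable` are assumptions of this example.

```python
def generating(grammar):
    """Non-terminals that derive some string of terminals."""
    result = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in result:
                continue
            for body in bodies:
                # A body is generating if every symbol is a terminal
                # or an already-known generating non-terminal.
                if all(sym not in grammar or sym in result for sym in body):
                    result.add(head)
                    changed = True
                    break
    return result

def reachable(grammar, start):
    """Non-terminals that occur in some sentential form derived from `start`."""
    seen = {start}
    worklist = [start]
    while worklist:
        head = worklist.pop()
        for body in grammar[head]:
            for sym in body:
                if sym in grammar and sym not in seen:
                    seen.add(sym)
                    worklist.append(sym)
    return seen

G1 = {"S": [["x"]], "N": [["y"]]}
G2 = {"S": [["x"], ["N"]], "N": [["a", "N", "b", "N"]]}
print(reachable(G1, "S"), generating(G2))
```

In G1 the non-terminal N is generating but never reached from S; in G2 it is reachable but its only production always reintroduces N, so it never produces a terminal string.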

23

Page 24

Ambiguity in Context-free grammars

24

Page 25

Sometimes there are two parse trees

Arithmetic expressions:
E → id | num | E + E | E * E | ( E )

Input: 1 + 2 + 3

Leftmost derivation, giving the parse tree 1 + (2 + 3):
E
=> E + E
=> num + E
=> num + E + E
=> num + num + E
=> num + num + num

Rightmost derivation, giving the parse tree (1 + 2) + 3:
E
=> E + E
=> E + num
=> E + E + num
=> E + num + num
=> num + num + num

[Figure: the two parse trees over num(1) + num(2) + num(3)]

Page 26

Is ambiguity a problem for compilers?

Arithmetic expressions:
E → id | num | E + E | E * E | ( E )

Input: 1 + 2 + 3

Leftmost derivation: E => E + E => num + E => num + E + E => num + num + E => num + num + num
Parse tree: 1 + (2 + 3) = 6

Rightmost derivation: E => E + E => E + num => E + E + num => E + num + num => num + num + num
Parse tree: (1 + 2) + 3 = 6

Depends on semantics

Page 27

Problematic ambiguity example

Arithmetic expressions:
E → id | num | E + E | E * E | ( E )

Input: 1 + 2 * 3

Leftmost derivation: E => E + E => num + E => num + E * E => num + num * E => num + num * num
Parse tree: 1 + (2 * 3) = 7
This is what we usually want: * has precedence over +

Rightmost derivation: E => E * E => E * num => E + E * num => E + num * num => num + num * num
Parse tree: (1 + 2) * 3 = 9
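
To see why this ambiguity matters, the two parse trees can be evaluated directly. The Python sketch below encodes each tree as nested tuples, a representation assumed only for this example, and shows that the two readings of the same token sequence produce different values.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a tree that is either a number or a tuple (operator, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree

plus_at_root = ("+", 1, ("*", 2, 3))    # 1 + (2 * 3)
times_at_root = ("*", ("+", 1, 2), 3)   # (1 + 2) * 3
print(evaluate(plus_at_root), evaluate(times_at_root))
```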

27

Page 28

Ambiguous grammars

• A grammar is ambiguous if there exists a word for which there are (equivalently)
– Two different leftmost derivations
– Two different rightmost derivations
– Two different parse trees

• Property of grammars, not languages

• Some languages are inherently ambiguous: no unambiguous grammars exist for them

• No algorithm can decide whether an arbitrary grammar is ambiguous (the problem is undecidable)

28

Page 29

Drawbacks of ambiguous grammars

• Ambiguous semantics

• Parsing complexity

• May affect other phases

• Solutions?

29

Page 30

Drawbacks of ambiguous grammars

• Ambiguous semantics

• Parsing complexity

• May affect other phases

• Solutions

– Allow only non-ambiguous grammars

– Transform grammar into non-ambiguous

– Handle as part of parsing method

• Using special form of “precedence”

• Wait for bottom-up parsing lecture

30

Page 31

Transforming ambiguous grammars to non-ambiguous by layering

Ambiguous grammar
E → E + E
E → E * E
E → id
E → num
E → ( E )

Unambiguous grammar
Layer 1: E → E + T | T
Layer 2: T → T * F | F
Layer 3: F → id | num | ( E )

Each layer takes care of one way of composing sub-strings to form a string:
1: by +
2: by *
3: atoms

Let’s derive 1 + 2 * 3

Page 32

Transformed grammar: * precedes +

Ambiguous grammar
E → E + E | E * E | id | num | ( E )

Unambiguous grammar
E → E + T | T
T → T * F | F
F → id | num | ( E )

Derivation
E
=> E + T
=> T + T
=> F + T
=> 1 + T
=> 1 + T * F
=> 1 + F * F
=> 1 + 2 * F
=> 1 + 2 * 3

[Figure: parse tree for 1 + 2 * 3, grouping the input as 1 + (2 * 3)]
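
The effect of layering can be checked with a small evaluator that has one function per layer. This Python sketch is an illustration under stated assumptions: the left-recursive rules are written as loops (E as T followed by any number of "+ T", T as F followed by any number of "* F"), only numeric atoms are supported (no id), and the names `tokenize` and `parse_expression` are invented for this example.

```python
import re

def tokenize(text):
    # Numbers, +, * and parentheses; any other character is silently skipped.
    return re.findall(r"\d+|[+*()]", text)

def parse_expression(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_E():              # layer 1: E -> E + T | T
        value = parse_T()
        while peek() == "+":
            advance()
            value += parse_T()
        return value

    def parse_T():              # layer 2: T -> T * F | F
        value = parse_F()
        while peek() == "*":
            advance()
            value *= parse_F()
        return value

    def parse_F():              # layer 3: F -> num | ( E )
        token = peek()
        if token == "(":
            advance()
            value = parse_E()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return value
        if token is not None and token.isdigit():
            advance()
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    result = parse_E()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(parse_expression(tokenize("1 + 2 * 3")))
```

Because `parse_T` sits below `parse_E`, every product is completed before it is added, which is exactly the grouping 1 + (2 * 3).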

32

Page 33

Transformed grammar: + precedes *

Ambiguous grammar
E → E + E | E * E | id | num | ( E )

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Derivation
E
=> E * T
=> T * T
=> T + F * T
=> F + F * T
=> 1 + F * T
=> 1 + 2 * T
=> 1 + 2 * F
=> 1 + 2 * 3

[Figure: parse tree for 1 + 2 * 3, grouping the input as (1 + 2) * 3]

Page 34

Another example for layering

Ambiguous grammar
P → ε
  | P P
  | ( P )

[Figure: two different parse trees for the same string of balanced parentheses]

Page 35

Another example for layering

Ambiguous grammar
P → ε
  | P P
  | ( P )

Unambiguous grammar
S → P S | ε   (takes care of “concatenation”)
P → ( S )     (takes care of nesting)

[Figure: the single parse tree for the same string under the unambiguous grammar]
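
The unambiguous grammar maps directly onto two mutually recursive functions. The following Python recognizer is a sketch written for this transcript: the function names are assumptions, and the rule S → P S is written as a loop that keeps choosing P S while the next character is an opening parenthesis.

```python
def balanced(text):
    pos = 0

    def parse_S():
        # S -> P S | ε: take the P S alternative exactly when "(" is next.
        while pos < len(text) and text[pos] == "(":
            parse_P()

    def parse_P():
        # P -> ( S )
        nonlocal pos
        pos += 1                              # consume "("
        parse_S()
        if pos >= len(text) or text[pos] != ")":
            raise SyntaxError("expected )")
        pos += 1                              # consume ")"

    try:
        parse_S()
    except SyntaxError:
        return False
    return pos == len(text)                   # all input must be consumed

print(balanced("()(())"), balanced("(()"))
```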

Page 36

“dangling-else” example

Ambiguous grammar
S → if E then S
  | if E then S else S
  | other

Input: if E1 then if E2 then S1 else S2

[Figure: two parse trees for the input]
• if E1 then (if E2 then S1 else S2): this is what we usually want: match else to closest unmatched then
• if E1 then (if E2 then S1) else S2

p. 174

Page 37

“dangling-else” example

Ambiguous grammar
S → if E then S
  | if E then S else S
  | other

Input: if E1 then if E2 then S1 else S2

[Figure: two parse trees for the input]
• if E1 then (if E2 then S1 else S2)
• if E1 then (if E2 then S1) else S2

Unambiguous grammar
S → M | U
M → if E then M else M   (matched statements)
  | other
U → if E then S          (unmatched statements)
  | if E then M else U

p. 174

Page 38

Parsing strategies

38

Page 39

Broad kinds of parsers

• Parsers for arbitrary grammars
– Cocke-Younger-Kasami [‘65] method, O(n^3)
– Earley’s method (implemented by NLTK)
– Not commonly used by compilers

• Parsers for restricted classes of grammars
– Top-Down
• With/without backtracking
– Bottom-Up
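
For reference, a minimal Cocke-Younger-Kasami recognizer is sketched below in Python. It assumes the grammar is already in Chomsky normal form and is passed as two rule lists; this encoding, the function name `cyk`, and the example grammar for the language aⁿbⁿ (n ≥ 1) are choices made for this illustration, not material from the lecture.

```python
def cyk(word, unary, binary, start="S"):
    """CYK recognizer for a grammar in Chomsky normal form.

    unary:  list of (head, terminal) rules
    binary: list of (head, left, right) rules
    """
    n = len(word)
    if n == 0:
        return False
    # table[length][i] holds the non-terminals deriving word[i:i+length].
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    for i, ch in enumerate(word):
        for head, terminal in unary:
            if terminal == ch:
                table[1][i].add(head)
    for length in range(2, n + 1):              # substring length
        for i in range(n - length + 1):         # substring start
            for split in range(1, length):      # length of the left part
                for head, left, right in binary:
                    if (left in table[split][i]
                            and right in table[length - split][i + split]):
                        table[length][i].add(head)
    return start in table[n][0]

# S -> A B | A C,  C -> S B,  A -> a,  B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "A", "C"), ("C", "S", "B")]
print(cyk("aabb", UNARY, BINARY))
```

The three nested loops over length, start and split point are the source of the cubic running time in the length of the input.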

39

Page 40

Top-down parsing

• Constructs parse tree in a top-down manner

• Preorder tree traversal

• Find the leftmost derivation

• Predictive: for every non-terminal and the next k tokens, predict the next production: LL(k)

• Challenge: beginning with the start symbol, try to guess the productions to apply to end up at the user's program

By Fidelio (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0-2.5-2.0-1.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Page 41

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: complete parse tree for the input, grouping it as (1 + 2) * 3]

Page 42

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

We need this rule (E → E * T) to match the * in the input

[Figure: partial parse tree; nodes so far: E]

Page 43

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree; nodes so far: E, E, T]

Page 44

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree; nodes so far: E, E, T, T]

Page 45

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree; nodes so far: E, E, T, T, T, F]

Page 46

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree; nodes so far: E, E, T, T, T, F, F]

Page 47

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree; nodes so far: E, E, T, T, T, F, F]

Page 48

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree; nodes so far: E, E, T, T, T, F, F]

Page 49

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree; nodes so far: E, E, T, T, T, F, F, F]

Page 50

Top-down parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: complete parse tree; nodes: E, E, T, T, T, F, F, F]

Page 51

Bottom-up parsing

• Construct parse tree in a bottom-up manner

• Find the rightmost derivation in a reverse order

• For every potential right-hand side and the next k tokens, decide when a production is found: LR(k)

• Postorder tree traversal

• Challenge: beginning with the user's program, try to apply productions in reverse to convert the program back into the start symbol

51

Page 52

Bottom-up parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: input tokens only; no tree nodes built yet]

Page 53

Bottom-up parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree built bottom-up; nodes so far: F]

Page 54

Bottom-up parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree built bottom-up; nodes so far: F, T]

Page 55

Bottom-up parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree built bottom-up; nodes so far: F, F, T]

Page 56

Bottom-up parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree built bottom-up; nodes so far: F, F, F, T]

Page 57

Bottom-up parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree built bottom-up; nodes so far: F, F, F, T, T]

Page 58

Bottom-up parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree built bottom-up; nodes so far: F, F, F, T, T, T]

Page 59

Bottom-up parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: partial parse tree built bottom-up; nodes so far: F, F, F, T, T, T, E]

Page 60

Bottom-up parsing example

Unambiguous grammar
E → E * T | T
T → T + F | F
F → id | num | ( E )

Input: 1 + 2 * 3

[Figure: complete parse tree built bottom-up; nodes: F, F, F, T, T, T, E, E]

Page 61

Top-down parsing via recursive descent

By Vahram Mekhitarian (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Page 62

Challenges in top-down parsing

• Top-down parsing begins with virtually no information
– Begins with just the start symbol, which matches every program

• How can we know which productions to apply?

• In general, we can’t
– There are some grammars for which the best we can do is guess and backtrack if we're wrong

• If we have to guess, how do we do it?
– Parsing as a search algorithm
– Too expensive in theory (exponential worst-case time) and practice

Page 63

Predictive parsing

• Given a grammar G and a word ω, attempt to derive ω using G

• Idea
– Apply production to leftmost nonterminal
– Pick production rule based on next input token

• General grammar
– More than one option for choosing the next production based on a token

• Restricted grammars (LL)
– Know exactly which single rule to apply
– May require some lookahead to decide

Page 64

Boolean expressions example

Grammar:
E → LIT | (E OP E) | not E
LIT → true | false
OP → and | or | xor

Input: not ( not true or false )

Derivation (production to apply known from next token):
E
=> not E
=> not ( E OP E )
=> not ( not E OP E )
=> not ( not LIT OP E )
=> not ( not true OP E )
=> not ( not true or E )
=> not ( not true or LIT )
=> not ( not true or false )

[Figure: parse tree corresponding to the derivation]

Page 65

Recursive descent parsing

• Define a function for every nonterminal

• Every function works as follows

– Find applicable production rule

– Terminal function checks match with next input token (if no match reports error)

– Nonterminal function calls (recursively) other functions

• If there are several applicable productions for a nonterminal, use lookahead

65

Page 66: Fall 2014-2015 Compiler Principles Lecture 2: Parsing part 1comp151/wiki.files/02-parsing-1.pdf · Fall 2014-2015 Compiler Principles Lecture 2: Parsing part 1 Roman Manevich Ben-Gurion

Matching tokens

• Variable current holds the current input token

match(token t) {
  if (current == t)
    current = next_token();
  else
    error;
}

Grammar:
E → LIT | (E OP E) | not E
LIT → true | false
OP → and | or | xor

Page 67

Functions for nonterminals

E() {
  if (current ∈ {TRUE, FALSE}) {      // E → LIT
    LIT();
  } else if (current == LPAREN) {     // E → ( E OP E )
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {        // E → not E
    match(NOT); E();
  } else {
    error;
  }
}

LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}

Grammar:
E → LIT | (E OP E) | not E
LIT → true | false
OP → and | or | xor

Page 68

Implementation via recursion

E → LIT

| ( E OP E )

| not E

LIT → true

| false

OP → and

| or

| xor

E() {
  if (current ∈ {TRUE, FALSE}) { LIT(); }
  else if (current == LPAREN) { match(LPAREN); E(); OP(); E(); match(RPAREN); }
  else if (current == NOT) { match(NOT); E(); }
  else error;
}

LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}

OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error;
}
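
The pseudocode above can be turned into a small runnable recognizer. The Python version below is a sketch under stated assumptions: tokens are plain strings obtained by splitting the input on whitespace, an "EOF" marker is appended to the token list, errors are reported by raising `SyntaxError`, and the names `Parser` and `accepts` are invented for this example.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]   # assumed end-of-input marker
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.current != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def E(self):
        if self.current in ("true", "false"):     # E -> LIT
            self.LIT()
        elif self.current == "(":                 # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":               # E -> not E
            self.match("not")
            self.E()
        else:
            raise SyntaxError(f"unexpected token {self.current!r}")

    def LIT(self):
        if self.current in ("true", "false"):     # LIT -> true | false
            self.match(self.current)
        else:
            raise SyntaxError(f"expected a literal, found {self.current!r}")

    def OP(self):
        if self.current in ("and", "or", "xor"):  # OP -> and | or | xor
            self.match(self.current)
        else:
            raise SyntaxError(f"expected an operator, found {self.current!r}")

def accepts(text):
    parser = Parser(text.split())
    try:
        parser.E()
        parser.match("EOF")     # the whole input must be consumed
    except SyntaxError:
        return False
    return True

print(accepts("not ( not true or false )"))
```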

68

Page 69

Adding semantic actions

• Can add an action to perform on each production rule

• Can build the parse tree

– Every function returns an object of type Node

– Every Node maintains a list of children

– Function calls can add new children

69

Page 70

Building the parse tree

// Assumption: match() returns a leaf Node for the token it consumed
Node E() {
  result = new Node();
  result.name = "E";
  if (current ∈ {TRUE, FALSE}) {      // E → LIT
    result.addChild(LIT());
  } else if (current == LPAREN) {     // E → ( E OP E )
    result.addChild(match(LPAREN));
    result.addChild(E());
    result.addChild(OP());
    result.addChild(E());
    result.addChild(match(RPAREN));
  } else if (current == NOT) {        // E → not E
    result.addChild(match(NOT));
    result.addChild(E());
  } else {
    error;
  }
  return result;
}
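
A runnable counterpart of the tree-building parser might look as follows in Python. It is a sketch, not the course's implementation: `Node`, `TreeParser` and the string token encoding are assumptions of this example, and `match` is made to return a leaf node so that its result can be added as a child.

```python
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def leaves(self):
        """Yield of the tree: leaf names from left to right."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]

class TreeParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]   # assumed end-of-input marker
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        """Consume the expected token and return it as a leaf node."""
        if self.current != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1
        return Node(expected)

    def E(self):
        result = Node("E")
        if self.current in ("true", "false"):     # E -> LIT
            result.children.append(self.LIT())
        elif self.current == "(":                 # E -> ( E OP E )
            result.children.append(self.match("("))
            result.children.append(self.E())
            result.children.append(self.OP())
            result.children.append(self.E())
            result.children.append(self.match(")"))
        elif self.current == "not":               # E -> not E
            result.children.append(self.match("not"))
            result.children.append(self.E())
        else:
            raise SyntaxError(f"unexpected token {self.current!r}")
        return result

    def LIT(self):
        if self.current not in ("true", "false"):
            raise SyntaxError(f"expected a literal, found {self.current!r}")
        return Node("LIT", [self.match(self.current)])

    def OP(self):
        if self.current not in ("and", "or", "xor"):
            raise SyntaxError(f"expected an operator, found {self.current!r}")
        return Node("OP", [self.match(self.current)])

tokens = "not ( not true or false )".split()
tree = TreeParser(tokens).E()
print(tree.leaves())
```

The yield of the resulting tree is the original token sequence, which is a quick sanity check that no token was dropped or reordered.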

70

Page 71

Recursive descent

• How do you pick the right A-production?

• Generally – try them all and use backtracking

• In our case – use lookahead

void A() {
  choose an A-production, A → X1 X2 … Xk;
  for (i = 1; i ≤ k; i++) {
    if (Xi is a nonterminal)
      call procedure Xi();
    else if (Xi == current terminal)
      advance input;
    else
      report error;
  }
}

Page 72

Technical challenges with recursive descent

Page 73

Recursive descent: problem 1

term → ID | indexed_elem
indexed_elem → ID [ expr ]

• With lookahead 1, the function for indexed_elem will never be tried…
– What happens for input of the form ID[expr]?

Page 74

Recursive descent: problem 2

int S() {
  return A() && match(token('a')) && match(token('b'));
}

int A() {
  return match(token('a')) || 1;
}

Grammar:
S → A a b
A → a | ε

What happens for input “ab”?

What happens if you flip order of alternatives and try “aab”?
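
The two questions can be answered by running the procedure. The Python sketch below mimics the slide's code, with a flag that flips the order of the alternatives for A; the function name `naive_accepts`, the flag `epsilon_first`, and the extra end-of-input check are additions made for this example.

```python
def naive_accepts(text, epsilon_first=False):
    """Recursive descent for S -> A a b, A -> a | ε, without backtracking."""
    pos = 0

    def match(ch):
        nonlocal pos
        if pos < len(text) and text[pos] == ch:
            pos += 1
            return True
        return False

    def A():
        # Commits to the first alternative that succeeds and never reconsiders.
        if epsilon_first:
            return True                  # A -> ε tried first, always succeeds
        return match("a") or True        # A -> a tried first, else ε

    def S():
        return A() and match("a") and match("b") and pos == len(text)

    return S()

# The language of the grammar is {"ab", "aab"}, yet each ordering rejects one word.
print(naive_accepts("ab"), naive_accepts("aab", epsilon_first=True))
```

With A → a tried first, the only `a` of "ab" is consumed by A and the following match fails; with ε first, "aab" fails because A never consumes the extra `a`. Either way a valid word is rejected, since the parser does not backtrack into A.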

74

Page 75

Recursive descent: problem 3

int E() {
  return E() && match(token('-')) && term();
}

Grammar:
E → E - term | term

What happens with this procedure?
(E() calls itself before consuming any token, so the recursion never terminates.)

Recursive descent parsers cannot handle left-recursive grammars

p. 127
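
The behaviour can be reproduced in Python, together with one common workaround: writing E as a term followed by any number of "- term" and folding the result to the left. This is a sketch under assumptions: terms are integer literals, the token list is non-empty, and the names `naive_E`, `recurses_forever` and `parse_E` are invented for this example.

```python
def naive_E(tokens, pos=0):
    # Direct transcription of E -> E - term: recurses before consuming any input.
    return naive_E(tokens, pos)

def recurses_forever():
    """Show that the direct transcription exhausts the call stack."""
    try:
        naive_E(["7", "-", "2"])
    except RecursionError:
        return True
    return False

def parse_E(tokens):
    """E as term followed by zero or more '- term', evaluated left to right."""
    pos = 0
    value = int(tokens[pos])            # first term
    pos += 1
    while pos < len(tokens):
        if tokens[pos] != "-" or pos + 1 >= len(tokens):
            raise SyntaxError("expected '- term'")
        value -= int(tokens[pos + 1])   # fold to the left: (7 - 2) - 1
        pos += 2
    return value

print(recurses_forever(), parse_E("7 - 2 - 1".split()))
```

The loop keeps the left-associative meaning of the original grammar: 7 - 2 - 1 is evaluated as (7 - 2) - 1.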

75

Page 76

Indirect left recursion

Grammar:
E → F - term | term
F → E

int E() {
  return F() && match(token('-')) && term();
}

int F() {
  return E();
}

A grammar is left-recursive if it allows a derivation sequence of the form N =>+ Nβ for some non-terminal N (reachable from S)

Example: E => F - term => E - term
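
Direct and indirect left recursion can both be detected by following the leftmost symbols of the productions. The Python sketch below is an illustration only: it assumes the grammar has no ε-productions (otherwise a later symbol could become leftmost), treats any symbol without productions, such as `term` here, as a terminal, and uses the hypothetical name `left_recursive`.

```python
def left_recursive(grammar):
    """Return the non-terminals N with N =>+ N β, assuming no ε-productions."""
    # direct[N]: non-terminals that can appear leftmost after one step from N.
    direct = {
        head: {body[0] for body in bodies if body and body[0] in grammar}
        for head, bodies in grammar.items()
    }
    result = set()
    for start in grammar:
        seen = set()
        worklist = list(direct[start])
        while worklist:
            sym = worklist.pop()
            if sym not in seen:
                seen.add(sym)
                worklist.extend(direct[sym])
        if start in seen:           # start reaches itself in leftmost position
            result.add(start)
    return result

INDIRECT = {"E": [["F", "-", "term"], ["term"]], "F": [["E"]]}
BOOLEAN = {
    "E": [["LIT"], ["(", "E", "OP", "E", ")"], ["not", "E"]],
    "LIT": [["true"], ["false"]],
    "OP": [["and"], ["or"], ["xor"]],
}
print(left_recursive(INDIRECT), left_recursive(BOOLEAN))
```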

Page 77

Next lecture: more on top-down parsing