Compiler Principles Fall 2014-2015 Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University.


Tentative syllabus

• Front End
– Scanning
– Top-down Parsing (LL)
– Bottom-up Parsing (LR)
– Attribute Grammars
• Intermediate Representation
– Lowering
• Optimizations
– Local Optimizations
– Dataflow Analysis
– Loop Optimizations
• Code Generation
– Register Allocation
– Instruction Selection
• mid-term exam

Previously
• Top-down parsing
– Recursive descent
– Handling conflicts
– LL(k) via pushdown automata

Agenda
• Shift-reduce (LR) parsing model
• Building the LR parsing table
• Types of conflicts

Shift-reduce parsing

Some terminology

• The opposite of derivation is called reduction
– Let A → α be a production rule
– Let βAµ be a sentential form
– A reduction replaces α with A, taking βαµ to βAµ

• A handle is a substring that is reduced during a series of steps in a rightmost derivation

Using shift and reduce to parse

Grammar:
E → E + (E)
E → i

action   Input            Stack
shift    1 + (2) + (3)
reduce   + (2) + (3)      1
shift    + (2) + (3)      E
shift    (2) + (3)        E +
shift    2) + (3)         E + (
reduce   ) + (3)          E + (2
shift    ) + (3)          E + (E
reduce   + (3)            E + (E)
shift    + (3)            E
shift    (3)              E +
shift    3)               E + (
reduce   )                E + (3
shift    )                E + (E
reduce                    E + (E)
accept                    E

On each step we either:
– shift a symbol from the input to the stack, or
– reduce symbols on the stack
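The trace above can be reproduced with a very small sketch. It is not the table-driven LR algorithm developed later in the lecture: it simply reduces whenever the top of the stack matches a right-hand side, which happens to be enough for this particular grammar. The names `RULES`, `tokenize` and `shift_reduce` are illustrative, and treating every digit as the terminal i is an assumption.

```python
# Grammar from the slide: E -> E + ( E ) | i
# Assumption: every digit token stands for the terminal i.
RULES = [("E", ["E", "+", "(", "E", ")"]), ("E", ["i"])]

def tokenize(text):
    """Turn '1 + (2)' into ['i', '+', '(', 'i', ')']."""
    return ["i" if ch.isdigit() else ch for ch in text if not ch.isspace()]

def shift_reduce(text):
    """Return the list of (action, stack) steps, or raise SyntaxError."""
    tokens, stack, trace = tokenize(text), [], []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:        # a right-hand side is on top
                del stack[-len(rhs):]           # reduce: replace it by lhs
                stack.append(lhs)
                trace.append(("reduce", list(stack)))
                break
        else:                                   # no reduction was possible
            if tokens:
                stack.append(tokens.pop(0))     # shift the next input symbol
                trace.append(("shift", list(stack)))
            elif stack == ["E"]:
                trace.append(("accept", list(stack)))
                return trace
            else:
                raise SyntaxError(f"cannot parse, stack = {stack}")

for action, stack in shift_reduce("1 + (2) + (3)"):
    print(f"{action:7} {' '.join(stack)}")
```

Greedy reduction is a shortcut that works here only because no right-hand side of this grammar is a suffix of a stack content that should be extended instead; in general the parser needs the state and table machinery that follows.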

How will the parser know what to do?

• A state will keep the info gathered so far
• A stack will maintain formerly reduced handles and partially reduced handles
• A table will tell it “what to do” based on
– Current state,
– Symbol on top of stack, and
– k-next tokens (k≥0)

Model of an LR parser

[Figure: the LR parsing program reads the Input (id + id + id $), maintains a Stack and a current State, consults the Parser table, and produces the Output]

States and LR(0) items

Grammar:
E → E * B | E + B | B
B → 0 | 1

• The state will “remember” the potential derivation rules given the part that was already identified
• For example, if we have already identified E then the state will remember the two alternatives: (1) E → E * B, (2) E → E + B
• Actually, we will also remember where we are in each of them: (1) E → E ● * B, (2) E → E ● + B
• A derivation rule with a location marker is called an LR(0) item
• The state is actually a set of LR(0) items
– For example: q13 = { E → E ● * B , E → E ● + B }

Intuition

• We gather the input token by token until we find a right-hand side of a rule and then we replace it with the nonterminal on the left side

• Going over a token and remembering it in the stack is a shift

• Each shift moves to a state that remembers what we’ve seen so far

• A reduce replaces a string in the stack with the nonterminal that derives it

Why do we need the stack?

Grammar: E → E + (E), E → i (same input 1 + (2) + (3) and same trace as in the earlier shift-reduce example)

• Suppose so far we have discovered E → 1 and gathered information on “E +”
• In the given grammar this can only mean E → E + ● (E)
• Suppose state q represents this possibility
• Now, the next token is (, and we need to ignore q for a minute, and work on E → 2 to obtain E + (E)
• Therefore, we push q to the stack, and after identifying E, we pop it to continue

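LR parser stack

[Figure: the LR parsing program with a Stack of alternating states and symbols, from bottom to top: 0 id 7 + 2 T 5. The state on top of the stack and the next Input token (input: id + id + id $) select an entry in the action and goto parts of the table; the program produces the Output]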

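LR parsing table

• One row per state (0, 1, ...)
• The terminal columns hold the shift/reduce actions; the non-terminal columns hold the goto part
• Entries:
– sn: shift, moving to state n
– rk: reduce by rule k
– gm: goto state m
– acc: accept
– empty entry: error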

LR(0) parser table example

(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

         action                          goto
STATE    id    +     (     )     $       E     T
0        s5          s7                  g1    g6
1              s3                s2
2        acc
3        s5          s7                        g4
4        r3    r3    r3    r3    r3
5        r4    r4    r4    r4    r4
6        r2    r2    r2    r2    r2
7        s5          s7                  g8    g6
8              s3          s9
9        r5    r5    r5    r5    r5

• State 2 (S → E $●) is the accepting state
• A reduce state (4, 5, 6, 9) always has an entire row of rk
• Any other state always has an entire row of shifts and gotos (possibly accept)

LR parser moves

Shift move

• If action[q, t] = sn then push t, push n
– q is the current state (on top of the stack)
– t is the next input token
– n is the next state

[Figure: before the shift the stack is ... q and the input is ... t ... $; as a result of the shift the stack is ... q t n]

Reduce move

• If action[qn, t] = rk
• Production: (k) A → σ1 … σn
• Top of stack looks like q σ1 q1 … σn qn (qn on top)
1. Pop qn σn … q1 σ1 (2*n stack entries), exposing state q
2. If goto[q, A] = q’ then push A, push q’

[Figure: before the reduce by rule k the stack is ... q σ1 q1 ... σn qn; as a result of the reduce the stack is ... q A q’; the input ... t ... $ is unchanged]

Accept move

• If action[q, t] = accept then parsing is completed

Error move

• If action[q, t] = error then parsing has discovered a syntactic error

Example of shift-reduce parser run

Parsing id+id$ with the LR(0) parser table of the earlier example, for the grammar

(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Initialize the stack with state 0.

Stack             Input        Action
0                 id + id $    s5
0 id 5            + id $       r4
0 T 6             + id $       r2
0 E 1             + id $       s3
0 E 1 + 3         id $         s5
0 E 1 + 3 id 5    $            r4
0 E 1 + 3 T 4     $            r3
0 E 1             $            s2
0 E 1 $ 2                      acc

• The first r4 step in detail: pop id 5, then push T 6 (because goto[0, T] = g6)
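The run above can be replayed mechanically. The following is a minimal sketch of a table-driven driver, with the action and goto entries copied from the LR(0) table of this example; the names `RULES`, `ACTION`, `GOTO` and `parse` are illustrative, and treating state 2 as accepting regardless of the next token is an assumption made to keep the sketch short.

```python
# Rules numbered as on the slide: k -> (left-hand side, length of right-hand side)
RULES = {2: ("E", 1), 3: ("E", 3), 4: ("T", 1), 5: ("T", 3)}

TERMINALS = ["id", "+", "(", ")", "$"]
ACTION = {
    0: {"id": "s5", "(": "s7"},
    1: {"+": "s3", "$": "s2"},
    3: {"id": "s5", "(": "s7"},
    7: {"id": "s5", "(": "s7"},
    8: {"+": "s3", ")": "s9"},
}
# LR(0) reduce states carry the same reduce action in every terminal column.
for state, rule in {4: 3, 5: 4, 6: 2, 9: 5}.items():
    ACTION[state] = {t: f"r{rule}" for t in TERMINALS}
GOTO = {(0, "E"): 1, (0, "T"): 6, (3, "T"): 4, (7, "E"): 8, (7, "T"): 6}
ACCEPT_STATE = 2  # assumption: state 2 accepts whatever the next token is

def parse(tokens):
    """Run the parser on a token list ending in '$'; return the actions taken."""
    stack, pos, actions = [0], 0, []
    while True:
        state = stack[-1]
        if state == ACCEPT_STATE:
            actions.append("acc")
            return actions
        token = tokens[pos] if pos < len(tokens) else "$"
        act = ACTION.get(state, {}).get(token)
        if act is None:                       # empty table entry: error move
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        actions.append(act)
        if act[0] == "s":                     # shift move: push t, push n
            stack += [token, int(act[1:])]
            pos += 1
        else:                                 # reduce move by rule k
            lhs, n = RULES[int(act[1:])]
            del stack[-2 * n:]                # pop 2*n stack entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]

print(parse(["id", "+", "id", "$"]))
```

Calling `parse(["id", "+", "id", "$"])` goes through the same nine actions as the table above, ending in `acc`.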

Constructing an LR(0) parsing table

Overall process

1. Construct a (determinized) transition diagram from LR(0) items
2. If there are conflicts – stop
– Grammar is not LR(0)
3. Otherwise, fill table entries from diagram

LR(0) item

N → α●β

• α: the part of the input already matched
• β: the part still to be matched
• Hypothesis about αβ being a possible handle: so far we’ve matched α, expecting to see β

LR(0) items

• N → α●β with β not empty: shift item
• N → αβ●: reduce item

LR(0) items enumeration example

• All items can be obtained by placing a dot at every position for every production:

Grammar
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

LR(0) items
1: S → ●E $
2: S → E ●$
3: S → E $●
4: E → ●T
5: E → T●
6: E → ●E + T
7: E → E ●+ T
8: E → E +●T
9: E → E + T●
10: T → ●id
11: T → id●
12: T → ●( E )
13: T → (●E )
14: T → ( E●)
15: T → ( E )●
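The enumeration can be written down directly. This sketch assumes an item is represented as a triple (left-hand side, right-hand side, dot position); that representation and the helper names `items`, `show` and `is_reduce_item` are illustrative choices, not part of the lecture.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]

def items(grammar):
    """Yield every LR(0) item: one per dot position of every production."""
    for lhs, rhs in grammar:
        for dot in range(len(rhs) + 1):
            yield (lhs, rhs, dot)

def show(item):
    """Render an item with '.' standing for the dot."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"

def is_reduce_item(item):
    """A reduce item has the dot at the very end of the right-hand side."""
    return item[2] == len(item[1])

all_items = list(items(GRAMMAR))
for number, item in enumerate(all_items, start=1):
    kind = "reduce" if is_reduce_item(item) else "shift"
    print(f"{number}: {show(item)}  ({kind} item)")
```

A production with n symbols on its right-hand side contributes n + 1 items, which gives 3 + 2 + 4 + 2 + 4 = 15 items for this grammar.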

Operations for transition diagram construction

• Initial = {S’ → ●S $}
• For an item set I, Closure(I) is the smallest set that solves:
Closure(I) = I ∪ {X → ●µ | X → µ is in the grammar and N → α●Xβ is in Closure(I)}
• Goto(I, σ) = {N → ασ●β | N → α●σβ is in I}
– σ is either a terminal or a nonterminal

Grammar
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Initial example

• Initial = {S → ●E $}

Closure example

• Closure({S → ●E $}) = {S → ●E $, E → ●T, E → ●E + T, T → ●id, T → ●( E )}

Goto example

• Goto({S → ●E $, E → ●E + T, T → ●id}, E) = {S → E●$, E → E●+ T}

Constructing the transition diagram

1. Start with state 0 containing the item set Closure({S → ●E $})
2. Repeat until no new states are discovered
– For every state p containing item set Ip, and every symbol σ (terminal or nonterminal), compute state q containing item set Iq = Closure(Goto(Ip, σ))

Why does it terminate?
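The Closure and Goto operations and the construction loop can be sketched as follows, assuming items are triples (left-hand side, right-hand side, dot position) and item sets are frozensets; the function names are illustrative. The loop terminates because the grammar has finitely many items, so there are only finitely many distinct item sets to discover.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add X -> . mu for every nonterminal X found right after a dot."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it (no closure)."""
    return frozenset((lhs, rhs, dot + 1) for lhs, rhs, dot in items
                     if dot < len(rhs) and rhs[dot] == symbol)

def build_automaton():
    """Return (states, transitions); states are numbered in discovery order."""
    start = closure({("S", ("E", "$"), 0)})
    states, transitions, work = [start], {}, [start]
    while work:
        state = work.pop(0)
        symbols = sorted({rhs[dot] for _, rhs, dot in state if dot < len(rhs)})
        for symbol in symbols:
            target = closure(goto(state, symbol))
            if target not in states:          # a new state was discovered
                states.append(target)
                work.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = build_automaton()
print(len(states), "states,", len(transitions), "transitions")
```

The state numbers produced by this sketch follow discovery order, so they need not match the q0 ... q9 numbering used on the slides; only the item sets and the shape of the diagram are the same.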

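LR(0) automaton example

(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

States (item sets):
q0 = {S → ●E $, E → ●T, E → ●E + T, T → ●id, T → ●( E )}
q1 = {S → E●$, E → E●+ T}
q2 = {S → E $●}
q3 = {E → E +●T, T → ●id, T → ●( E )}
q4 = {E → E + T●}
q5 = {T → id●}
q6 = {E → T●}
q7 = {T → (●E ), E → ●T, E → ●E + T, T → ●id, T → ●( E )}
q8 = {T → ( E●), E → E●+ T}
q9 = {T → ( E )●}

Transitions:
q0 --E--> q1, q0 --T--> q6, q0 --id--> q5, q0 --(--> q7
q1 --+--> q3, q1 --$--> q2
q3 --T--> q4, q3 --id--> q5, q3 --(--> q7
q7 --E--> q8, q7 --T--> q6, q7 --id--> q5, q7 --(--> q7
q8 --)--> q9, q8 --+--> q3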

LR(0) automaton construction example

1. Initialize: q0 = {S → ●E $}
2. Apply Closure: q0 = {S → ●E $, E → ●T, E → ●E + T, T → ●id, T → ●( E )}
3. Add the states reached from q0: q0 --T--> q6 = {E → T●}, q0 --id--> q5 = {T → id●}, q0 --E--> q1 = {S → E●$, E → E●+ T}, and q0 --(--> q7
4. Repeat for the new states until the full automaton q0 ... q9 of the previous example is obtained

Reading the parse table off the automaton:
• A terminal transition corresponds to a shift action in the parse table
• A non-terminal transition corresponds to a goto action in the parse table
• A single reduce item corresponds to a reduce action
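The three correspondences can be turned into a small table-filling sketch. The transitions and reduce states below are transcribed by hand from the q0 ... q9 automaton of this example; the names `TRANSITIONS`, `REDUCE_STATES` and `fill_table` are illustrative, and putting `acc` in every terminal column of state 2 is an assumption.

```python
TERMINALS = ["id", "+", "(", ")", "$"]

# Transitions of the example automaton: (state, symbol) -> state
TRANSITIONS = {
    (0, "E"): 1, (0, "T"): 6, (0, "id"): 5, (0, "("): 7,
    (1, "+"): 3, (1, "$"): 2,
    (3, "T"): 4, (3, "id"): 5, (3, "("): 7,
    (7, "E"): 8, (7, "T"): 6, (7, "id"): 5, (7, "("): 7,
    (8, ")"): 9, (8, "+"): 3,
}
# States that hold a single reduce item, mapped to the rule number
REDUCE_STATES = {4: 3, 5: 4, 6: 2, 9: 5}
ACCEPT_STATE = 2   # holds S -> E $ .

def fill_table():
    """Build one row per state: {symbol: 'sn' | 'gm' | 'rk' | 'acc'}."""
    table = {q: {} for q in range(10)}
    for (q, symbol), target in TRANSITIONS.items():
        kind = "s" if symbol in TERMINALS else "g"   # terminal: shift, else goto
        table[q][symbol] = f"{kind}{target}"
    for q, rule in REDUCE_STATES.items():            # entire row of rk
        table[q] = {t: f"r{rule}" for t in TERMINALS}
    table[ACCEPT_STATE] = {t: "acc" for t in TERMINALS}  # assumption, see lead-in
    return table

for q, row in fill_table().items():
    print(q, row)
```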

LR(0) conflicts


Conflicts

• Can construct a diagram for every grammar but some may introduce conflicts

• shift-reduce conflict: an item set contains at least one shift item and one reduce item

• reduce-reduce conflict: an item set contains two reduce items

What about shift-shift conflicts?

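Shift-reduce conflict example

Grammar:
S → E $
E → T
E → E + T
T → id
T → ( E )
T → id[E]

q0 = {S → ●E $, E → ●T, E → ●E + T, T → ●id, T → ●( E ), T → ●id[E]}
q0 --id--> q5 = {T → id●, T → id●[E]}: shift/reduce conflict
(q0 also has transitions on T, ( and E, as before)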

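Reduce-reduce conflict example

Grammar:
S → E $
E → T
E → V
E → E + T
T → id
V → id
T → ( E )

q0 = {S → ●E $, E → ●T, E → ●V, E → ●E + T, T → ●id, V → ●id, T → ●( E )}
q0 --id--> q5 = {T → id●, V → id●}: reduce/reduce conflict
(q0 also has transitions on T, ( and E, as before)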
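The two conflict definitions can be checked on a single item set. In this sketch items are triples (left-hand side, right-hand side, dot position), and the sets `Q5_SHIFT_REDUCE`, `Q5_REDUCE_REDUCE` and `Q1` are transcribed from the examples; the function name `conflicts` is illustrative.

```python
def conflicts(item_set):
    """Return the LR(0) conflicts present in one item set."""
    reduces = [it for it in item_set if it[2] == len(it[1])]   # dot at the end
    shifts = [it for it in item_set if it[2] < len(it[1])]     # dot before a symbol
    found = []
    if shifts and reduces:
        found.append("shift/reduce")
    if len(reduces) > 1:
        found.append("reduce/reduce")
    return found

# q5 of the shift-reduce example: T -> id .   and   T -> id . [ E ]
Q5_SHIFT_REDUCE = {("T", ("id",), 1), ("T", ("id", "[", "E", "]"), 1)}
# q5 of the reduce-reduce example: T -> id .   and   V -> id .
Q5_REDUCE_REDUCE = {("T", ("id",), 1), ("V", ("id",), 1)}
# q1 of the original automaton: two shift items, no conflict
Q1 = {("S", ("E", "$"), 1), ("E", ("E", "+", "T"), 1)}

print(conflicts(Q5_SHIFT_REDUCE))   # ['shift/reduce']
print(conflicts(Q5_REDUCE_REDUCE))  # ['reduce/reduce']
print(conflicts(Q1))                # []
```

The check never reports a shift-shift conflict: items that shift the same symbol all move into the same Goto target, and items that shift different symbols occupy different table columns, so several shift items in one set are harmless.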

LR(0) conflicts

• Any grammar with an ε-rule cannot be LR(0)
• Inherent shift/reduce conflict
– A → ε● – reduce item
– P → α●Aβ – shift item
– A → ε● can always be predicted from P → α●Aβ
• Similar to FIRST-FOLLOW conflicts in LL(1) parsing
– Similar solution

LR parsing variants

• LR(0) – what we’ve seen so far
• SLR(1)
– Removes infeasible reduce actions via FOLLOW set reasoning
• LR(1)
– LR(0) with one lookahead token in items
• LALR(1)
– LR(1) with merging of states with same LR(0) component

SLR parsing

• A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N
• A reduce item N → α● is applicable only when the lookahead is in FOLLOW(N)
– If b is not in FOLLOW(N) we just proved there is no terminating derivation S ⇒* βNb and thus it is safe to remove the reduce item from the conflicted state
• Differs from LR(0) only on the ACTION table
– Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take

SLR action table

Grammar: as before, with both T → id and T → id[E]

SLR – use 1 token look-ahead (the columns are the lookahead token from the input):

State   id      +        (       )        [         ]    $
0       shift            shift
1               shift                                     shift
2       accept
3       shift            shift
4               E→E+T            E→E+T                   E→E+T
5               T→id             T→id     r5, s6         T→id
6               E→T              E→T                     E→T
7       shift            shift
8               shift            shift
9               T→(E)            T→(E)                   T→(E)

• [ is not in FOLLOW(T), so the reduce in the [ entry of state 5 is dropped and only the shift remains

vs. LR(0) – no look-ahead:

state   action
q0      shift
q1      shift
q2      accept
q3      shift
q4      E→E+T
q5      T→id
q6      E→T
q7      shift
q8      shift
q9      T→(E)
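The FOLLOW-set reasoning can be checked with a short sketch. It computes FIRST and FOLLOW by fixed-point iteration for the grammar extended with T → id[E], relying on the assumption that the grammar has no ε-rules (true here), and then derives the SLR actions of the conflicting state q5. The function names are illustrative.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
    ("T", ("id", "[", "E", "]")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """FIRST sets; valid only because the grammar has no epsilon rules."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets():
    """FOLLOW sets by fixed-point iteration (again assuming no epsilon rules)."""
    first, follow = first_sets(), {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):          # something follows sym in this rule
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                          # sym ends the rule
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

def slr_actions(item_set, follow):
    """Map each lookahead token to the set of actions an item set allows."""
    actions = {}
    for lhs, rhs, dot in item_set:
        if dot == len(rhs):                    # reduce item: only on FOLLOW(lhs)
            tokens, act = follow[lhs], f"reduce {lhs} -> {' '.join(rhs)}"
        elif rhs[dot] not in NONTERMINALS:     # shift item on a terminal
            tokens, act = {rhs[dot]}, "shift"
        else:
            continue
        for t in tokens:
            actions.setdefault(t, set()).add(act)
    return actions

FOLLOW = follow_sets()
Q5 = {("T", ("id",), 1), ("T", ("id", "[", "E", "]"), 1)}
print(sorted(FOLLOW["T"]))
print(slr_actions(Q5, FOLLOW))
```

Every lookahead of q5 ends up with exactly one action, so the LR(0) shift/reduce conflict disappears. Under this extended grammar the token ] is also in FOLLOW(T), so a complete SLR table would carry reduce entries in the ] column as well.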

Next lecture: SLR/LR(1)/LALR(1)/Parser generation
