Fall 2014-2015 Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University
Dec 14, 2015
Tentative syllabus
• Front End
• Scanning
• Top-down Parsing (LL)
• Bottom-up Parsing (LR)
• Attribute Grammars
• Intermediate Representation
• Lowering
• Optimizations
• Local Optimizations
• Dataflow Analysis
• Loop Optimizations
• Code Generation
• Register Allocation
• Instruction Selection
• mid-term exam

Previously
• Top-down parsing
  – Recursive descent
  – Handling conflicts
  – LL(k) via pushdown automata
Agenda
• Shift-reduce (LR) parsing model
• Building the LR parsing table
• Types of conflicts

Shift-reduce parsing
Some terminology
• The opposite of derivation is called reduction
  – Let A → α be a production rule
  – Let βAµ be a sentential form
  – A reduction replaces α with A: βαµ is reduced to βAµ
• A handle is a substring that is reduced during a series of steps in a rightmost derivation

Using shift and reduce to parse
Grammar: E → E + (E), E → i
(in the input below, the tokens 1, 2, 3 are instances of i)

Stack      Input            Action
           1 + (2) + (3)    shift
1          + (2) + (3)      reduce
E          + (2) + (3)      shift
E +        (2) + (3)        shift
E + (      2) + (3)         shift
E + (2     ) + (3)          reduce
E + (E     ) + (3)          shift
E + (E)    + (3)            reduce
E          + (3)            shift
E +        (3)              shift
E + (      3)               shift
E + (3     )                reduce
E + (E     )                shift
E + (E)                     reduce
E                           accept

On each step we either:
- shift a symbol from the input to the stack, or
- reduce symbols on the stack
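The shift/reduce loop can be sketched in a few lines of Python. This is only a naive illustration, not the table-driven LR parser developed in the rest of the lecture: it reduces greedily whenever the top of the stack matches a right-hand side, which happens to be enough for this particular grammar. The names `RULES` and `shift_reduce` are hypothetical.

```python
# Greedy shift-reduce recognizer for the toy grammar  E -> E + ( E ) | i
# (assumption: greedy reduction is sufficient for this grammar only)
RULES = [
    ("E", ["E", "+", "(", "E", ")"]),
    ("E", ["i"]),
]

def shift_reduce(tokens):
    """Return (accepted, trace); trace holds (stack, input, action) rows."""
    stack, pos, trace = [], 0, []
    while True:
        row = (" ".join(stack), " ".join(tokens[pos:]))
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Reduce: replace the matched right-hand side by its nonterminal
                trace.append(row + ("reduce",))
                stack[-len(rhs):] = [lhs]
                break
        else:
            if pos < len(tokens):
                # Shift: move the next input token onto the stack
                trace.append(row + ("shift",))
                stack.append(tokens[pos])
                pos += 1
            else:
                accepted = stack == ["E"]
                trace.append(row + ("accept" if accepted else "error",))
                return accepted, trace

accepted, trace = shift_reduce("i + ( i ) + ( i )".split())
for stack, rest, action in trace:
    print(f"{stack:<12} {rest:<18} {action}")
```

Run on `i + ( i ) + ( i )`, the sketch goes through the same sequence of shift and reduce actions as the table above.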
How will the parser know what to do?
• A state will keep the info gathered so far
• A stack will maintain formerly reduced handles and partially reduced handles
• A table will tell it “what to do” based on
  – Current state,
  – Symbol on top of stack, and
  – k next tokens (k≥0)
Model of an LR parser
[Figure: the LR parsing program reads the Input (id + id + id $), keeps a Stack and a current State, consults the Parser table, and produces the Output.]
States and LR(0) items
Example grammar: E → E * B | E + B | B, B → 0 | 1
• The state will “remember” the potential derivation rules given the part that was already identified
• For example, if we have already identified E then the state will remember the two alternatives: (1) E → E * B, (2) E → E + B
• Actually, we will also remember where we are in each of them: (1) E → E ● * B, (2) E → E ● + B
• A derivation rule with a location marker is called an LR(0) item
• The state is actually a set of LR(0) items
  – For example: q13 = { E → E ● * B , E → E ● + B }
Intuition
• We gather the input token by token until we find a right-hand side of a rule and then we replace it with the nonterminal on the left side
• Going over a token and remembering it in the stack is a shift
• Each shift moves to a state that remembers what we’ve seen so far
• A reduce replaces a string in the stack with the nonterminal that derives it
Why do we need the stack?
Grammar: E → E + (E), E → i
• Suppose so far we have discovered E → 1 and gathered information on “E +”
• In the given grammar this can only mean E → E + ● (E)
• Suppose state q represents this possibility
• Now, the next token is (, and we need to set q aside for a minute and work on E → 2 to obtain E + (E)
• Therefore, we push q to the stack, and after identifying E, we pop it to continue
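LR parser stack
[Figure: the LR parsing program with its Input (id + id + id $), Output, current State, and a table with goto and action parts. The Stack interleaves states and symbols; from bottom to top it holds 0, id, 7, +, 2, T, 5.]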
LR parsing table
• One row per state: 0, 1, ...
• Columns for terminals hold the shift/reduce actions
• Columns for non-terminals hold the goto part
• Entries:
  – sn: shift state n
  – rk: reduce by rule k
  – gm: goto state m
  – acc: accept
  – empty entry: error
LR(0) parser table example
Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

         action                          goto
STATE    id    +     (     )     $       E     T
0        s5          s7                  g1    g6
1              s3                s2
2        acc (entire row)
3        s5          s7                        g4
4        r3    r3    r3    r3    r3
5        r4    r4    r4    r4    r4
6        r2    r2    r2    r2    r2
7        s5          s7                  g8    g6
8              s3          s9
9        r5    r5    r5    r5    r5

• A reduce state is always an entire row of rk
• Any other state is always an entire row of shifts and gotos (possibly accept)
LR parser moves
Shift move
If action[q, t] = sn then push t, push n
• q is the current state (top of the stack)
• t is the next input token
• n is the next state

Result of shift
The top of the stack is now q t n (n on top).
Reduce move
• If action[qn, t] = rk
• Production: (k) A → σ1 … σn
• Top of stack looks like q σ1 q1 … σn qn (qn on top)
1. Pop qn σn … q1 σ1 (2*n stack entries for rule k)
2. If goto[q, A] = q’ then push A, push q’

Result of reduce move
The top of the stack is now q A q’ (q’ on top); the input ($ … t …) is unchanged.
Accept move
If action[q, t] = accept, parsing is completed

Error move
If action[q, t] = error, parsing discovered a syntactic error
Example of Shift-reduce parser run
Parsing id+id$
Grammar and table: as in the LR(0) parser table example. Initialize with state 0.

Stack              Input        Action
0                  id + id $    s5
0 id 5             + id $       r4   (pop id 5, then push T 6)
0 T 6              + id $       r2
0 E 1              + id $       s3
0 E 1 + 3          id $         s5
0 E 1 + 3 id 5     $            r4
0 E 1 + 3 T 4      $            r3
0 E 1              $            s2
0 E 1 $ 2                       acc
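The run above can be reproduced with a small table-driven driver. The following is a minimal sketch, assuming the table from the LR(0) parser table example is encoded as Python dictionaries; the names `ACTION`, `GOTO`, `RULES` and `parse` are hypothetical, and placing `acc` in every terminal column of state 2 is an assumption about how the "entire row" entry is stored.

```python
# Rule number -> (left-hand side, length of right-hand side), numbered as on the slides
RULES = {1: ("S", 2), 2: ("E", 1), 3: ("E", 3), 4: ("T", 1), 5: ("T", 3)}
TERMINALS = ["id", "+", "(", ")", "$"]

# Shift entries of the action part
ACTION = {
    (0, "id"): "s5", (0, "("): "s7",
    (1, "+"): "s3", (1, "$"): "s2",
    (3, "id"): "s5", (3, "("): "s7",
    (7, "id"): "s5", (7, "("): "s7",
    (8, "+"): "s3", (8, ")"): "s9",
}
# LR(0) reduce states carry the same reduction in every terminal column
for state, rule in {4: 3, 5: 4, 6: 2, 9: 5}.items():
    for t in TERMINALS:
        ACTION[(state, t)] = f"r{rule}"
# Assumption: the accepting row is stored in every terminal column of state 2
for t in TERMINALS:
    ACTION[(2, t)] = "acc"

# Goto part
GOTO = {(0, "E"): 1, (0, "T"): 6, (3, "T"): 4, (7, "E"): 8, (7, "T"): 6}

def parse(tokens):
    """Return (accepted, trace); tokens are expected to end with '$'."""
    stack, pos, trace = [0], 0, []
    while True:
        state = stack[-1]
        lookahead = tokens[pos] if pos < len(tokens) else "$"
        act = ACTION.get((state, lookahead), "error")  # missing entry = error
        trace.append((" ".join(map(str, stack)), " ".join(tokens[pos:]), act))
        if act in ("acc", "error"):
            return act == "acc", trace
        n = int(act[1:])
        if act[0] == "s":
            # Shift move: push the token, then the next state n
            stack += [lookahead, n]
            pos += 1
        else:
            # Reduce move: pop 2 * |rhs| entries, then push A and goto[q, A]
            lhs, length = RULES[n]
            del stack[len(stack) - 2 * length:]
            stack += [lhs, GOTO[(stack[-1], lhs)]]

ok, trace = parse(["id", "+", "id", "$"])
for stack, rest, act in trace:
    print(f"{stack:<18} {rest:<12} {act}")
```

The printed rows follow the Stack / Input / Action columns of the trace above.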
Constructing an LR(0) parsing table
Overall process
1. Construct a (determinized) transition diagram from LR(0) items
2. If there are conflicts – stop
   – Grammar is not LR(0)
3. Otherwise, fill table entries from diagram

LR(0) item
N → α ● β
• α: already matched (part of the input)
• β: to be matched
Hypothesis about αβ being a possible handle: so far we’ve matched α, expecting to see β
LR(0) items
N → α ● β    Shift item
N → αβ ●     Reduce item
LR(0) items enumeration example
• All items can be obtained by placing a dot at every position for every production:

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

LR(0) items:
1: S → ● E $
2: S → E ● $
3: S → E $ ●
4: E → ● T
5: E → T ●
6: E → ● E + T
7: E → E ● + T
8: E → E + ● T
9: E → E + T ●
10: T → ● id
11: T → id ●
12: T → ● ( E )
13: T → ( ● E )
14: T → ( E ● )
15: T → ( E ) ●
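The enumeration can be sketched in Python by pairing every production with every dot position. Representing an item as a `(lhs, rhs, dot)` tuple is an assumption made for illustration; `GRAMMAR`, `lr0_items` and `show` are hypothetical names.

```python
# Productions of the example grammar, in the order used on the slides
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]

def lr0_items(grammar):
    """One item per production and per dot position 0..len(rhs)."""
    return [(lhs, rhs, dot)
            for lhs, rhs in grammar
            for dot in range(len(rhs) + 1)]

def show(item):
    """Render an item with the dot marker, e.g. 'E → E ● + T'."""
    lhs, rhs, dot = item
    return f"{lhs} → {' '.join(rhs[:dot] + ('●',) + rhs[dot:])}"

items = lr0_items(GRAMMAR)
for number, item in enumerate(items, start=1):
    print(f"{number}: {show(item)}")
```

A production with n right-hand-side symbols contributes n + 1 items, which gives 3 + 2 + 4 + 2 + 4 = 15 items for this grammar.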
Operations for transition diagram construction
• Initial = {S’ → ● S $}
• For an item set I, Closure(I) is the smallest set that solves:
  Closure(I) = I ∪ { X → ● µ | X → µ is in the grammar and N → α ● X β is in Closure(I) }
• Goto(I, σ) = { N → α σ ● β | N → α ● σ β in I }
  – σ is either a terminal or nonterminal
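Both operations can be sketched directly from their definitions. The code below is an illustrative sketch for the lecture's example grammar, assuming items are `(lhs, rhs, dot)` tuples and item sets are frozensets; `closure` and `goto` are hypothetical helper names, and the closure is computed with a worklist rather than by solving the equation symbolically.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add X → ● µ for every nonterminal X that appears right after a dot."""
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, body in GRAMMAR:
                item = (lhs, body, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item that has the dot before it."""
    return frozenset((lhs, rhs, dot + 1)
                     for lhs, rhs, dot in items
                     if dot < len(rhs) and rhs[dot] == symbol)

initial = {("S", ("E", "$"), 0)}   # Initial = {S → ● E $}
q0 = closure(initial)              # five items, one per production
after_E = goto(q0, "E")            # {S → E ● $, E → E ● + T}
print(len(q0), sorted(after_E))
```

Here `q0` contains one item per production of the grammar, and `goto(q0, "E")` yields the two items in which the dot has moved past E.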
Initial example
• Initial = {S → ● E $}

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )
Closure example
• Initial = {S → ● E $}
• Closure({S → ● E $}) =
  S → ● E $
  E → ● T
  E → ● E + T
  T → ● id
  T → ● ( E )

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )
Goto example
• Initial = {S → ● E $}
• Closure({S → ● E $}) =
  S → ● E $
  E → ● T
  E → ● E + T
  T → ● id
  T → ● ( E )
• Goto({S → ● E $, E → ● E + T, T → ● id}, E) = {S → E ● $, E → E ● + T}

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )
Constructing the transition diagram
1. Start with state 0 containing the item set Closure({S → ● E $})
2. Repeat until no new states are discovered
   – For every state p containing item set Ip, and symbol N, compute state q containing item set Iq = Closure(Goto(Ip, N))

Why does it terminate?
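The two construction steps can be sketched as a worklist loop over item sets. This is an illustrative sketch for the example grammar, not a general parser generator; `build_automaton` and the helpers are hypothetical names, and the state numbers it assigns depend on discovery order, so they need not match the q0 to q9 labels on the slides. Since the grammar has finitely many items, there are finitely many item sets, so the loop must stop.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, body in GRAMMAR:
                item = (lhs, body, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(items, symbol):
    return frozenset((l, r, d + 1) for l, r, d in items
                     if d < len(r) and r[d] == symbol)

def build_automaton():
    """Return (states, transitions) where transitions maps (p, symbol) -> q."""
    start = closure({GRAMMAR[0] + (0,)})       # step 1: Closure({S → ● E $})
    states, index, transitions = [start], {start: 0}, {}
    p = 0
    while p < len(states):                     # step 2: until no new states
        # Only symbols that appear right after a dot can give a non-empty Goto
        symbols = sorted({r[d] for _, r, d in states[p] if d < len(r)})
        for symbol in symbols:
            target = closure(goto(states[p], symbol))
            if target not in index:
                index[target] = len(states)
                states.append(target)
            transitions[(p, symbol)] = index[target]
        p += 1
    return states, transitions

states, transitions = build_automaton()
print(len(states), "states,", len(transitions), "transitions")
```

For this grammar the sketch discovers ten item sets, matching the ten states of the LR(0) automaton example.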
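LR(0) automaton example
Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

States:
q0 = { S → ● E $, E → ● T, E → ● E + T, T → ● id, T → ● ( E ) }
q1 = { S → E ● $, E → E ● + T }
q2 = { S → E $ ● }
q3 = { E → E + ● T, T → ● id, T → ● ( E ) }
q4 = { E → E + T ● }
q5 = { T → id ● }
q6 = { E → T ● }
q7 = { T → ( ● E ), E → ● T, E → ● E + T, T → ● id, T → ● ( E ) }
q8 = { T → ( E ● ), E → E ● + T }
q9 = { T → ( E ) ● }

Transitions:
q0: E to q1, T to q6, id to q5, ( to q7
q1: $ to q2, + to q3
q3: T to q4, id to q5, ( to q7
q7: E to q8, T to q6, id to q5, ( to q7
q8: ) to q9, + to q3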
LR(0) automaton construction example
1. Initialize: q0 = { S → ● E $ }
2. Apply Closure: q0 = { S → ● E $, E → ● T, E → ● E + T, T → ● id, T → ● ( E ) }
3. Follow each symbol out of q0:
   – on T: q6 = { E → T ● }
   – on (: q7 = { T → ( ● E ), E → ● T, E → ● E + T, T → ● id, T → ● ( E ) }
   – on id: q5 = { T → id ● }
   – on E: q1 = { S → E ● $, E → E ● + T }
4. Repeating this for every new state yields the full automaton with states q0 to q9
• A terminal transition corresponds to a shift action in the parse table
• A non-terminal transition corresponds to a goto action in the parse table
• A single reduce item corresponds to a reduce action
LR(0) conflicts
Conflicts
• Can construct a diagram for every grammar but some may introduce conflicts
• shift-reduce conflict: an item set contains at least one shift item and one reduce item
• reduce-reduce conflict: an item set contains two reduce items
What about shift-shift conflicts?
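The two conflict definitions can be checked mechanically on an item set. The sketch below assumes the `(lhs, rhs, dot)` item representation and follows the slide's definitions literally: an item with the dot at the end is a reduce item, any other item is a shift item. The function name `conflicts` and the two example sets (taken from the q5 states of the conflict examples in this lecture) are illustrative.

```python
def conflicts(state):
    """Return the kinds of LR(0) conflict present in an item set."""
    reduce_items = [item for item in state if item[2] == len(item[1])]
    shift_items = [item for item in state if item[2] < len(item[1])]
    found = []
    if shift_items and reduce_items:
        found.append("shift-reduce")
    if len(reduce_items) >= 2:
        found.append("reduce-reduce")
    return found

# q5 = { T → id ●, T → id ● [ E ] } from the grammar with T → id[E]
Q5_SHIFT_REDUCE = {
    ("T", ("id",), 1),
    ("T", ("id", "[", "E", "]"), 1),
}
# q5 = { T → id ●, V → id ● } from the grammar with V → id
Q5_REDUCE_REDUCE = {
    ("T", ("id",), 1),
    ("V", ("id",), 1),
}
# q1 = { S → E ● $, E → E ● + T } has only shift items
Q1 = {
    ("S", ("E", "$"), 1),
    ("E", ("E", "+", "T"), 1),
}

print(conflicts(Q5_SHIFT_REDUCE))
print(conflicts(Q5_REDUCE_REDUCE))
print(conflicts(Q1))
```

A grammar is LR(0) under this check only if `conflicts` returns an empty list for every state of its automaton.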
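Shift-reduce conflict example
Grammar:
S → E $
E → T
E → E + T
T → id
T → ( E )
T → id[E]

q0 = { S → ● E $, E → ● T, E → ● E + T, T → ● id, T → ● ( E ), T → ● id[E] }
q5 = { T → id ●, T → id ● [E] }   (reached from q0 on id)

Shift/reduce conflict in q5
(the transitions from q0 on T, ( and E are elided: …)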
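Reduce-reduce conflict example
Grammar:
S → E $
E → T
E → V
E → E + T
T → id
V → id
T → ( E )

q0 = { S → ● E $, E → ● T, E → ● V, E → ● E + T, T → ● id, V → ● id, T → ● ( E ) }
q5 = { T → id ●, V → id ● }   (reached from q0 on id)

Reduce/reduce conflict in q5
(the transitions from q0 on T, ( and E are elided: …)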
LR(0) conflicts
• Any grammar with an ε-rule cannot be LR(0)
• Inherent shift/reduce conflict
  – A → ε ● – reduce item
  – P → α ● A β – shift item
  – A → ε ● can always be predicted from P → α ● A β
• Similar to FIRST-FOLLOW conflicts in LL(1) parsing
  – Similar solution
LR parsing variants
LR variants
• LR(0) – what we’ve seen so far
• SLR(1)
  – Removes infeasible reduce actions via FOLLOW set reasoning
• LR(1)
  – LR(0) with one lookahead token in items
• LALR(1)
  – LR(1) with merging of states with same LR(0) component
SLR parsing
SLR parsing
• A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N
• A reduce item N → α ● is applicable only when the lookahead is in FOLLOW(N)
  – If b is not in FOLLOW(N) we just proved there is no terminating derivation S =>* βNb and thus it is safe to remove the reduce item from the conflicted state
• Differs from LR(0) only on the ACTION table
  – Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take
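The FOLLOW-based filtering can be sketched for the conflicted state of the grammar extended with T → id[E]. The code is a simplified illustration that assumes the grammar has no ε-rules (true for this grammar), which keeps the FIRST and FOLLOW computations short; `first_sets`, `follow_sets` and `slr_actions_q5` are hypothetical names.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
    ("T", ("id", "[", "E", "]")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """FIRST sets by fixed-point iteration (assumes no ε-rules)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets():
    """FOLLOW sets by fixed-point iteration (assumes no ε-rules)."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]   # symbol ends the rule: inherit FOLLOW(lhs)
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow

def slr_actions_q5(lookahead):
    """Actions for q5 = { T → id ●, T → id ● [ E ] } under the SLR rule."""
    actions = []
    if lookahead == "[":
        actions.append("shift")            # from the item T → id ● [ E ]
    if lookahead in follow_sets()["T"]:
        actions.append("reduce T → id")    # only when lookahead is in FOLLOW(T)
    return actions

print(sorted(follow_sets()["T"]))
print(slr_actions_q5("["), slr_actions_q5("+"))
```

Because `[` is not in FOLLOW(T), the state keeps only the shift on `[`, while the reduce by T → id remains for the tokens that can follow T.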
SLR action table
Grammar: … as before …, T → id, T → id[E]

SLR – use 1 token look-ahead (the lookahead token from the input selects the column)

State  id + ( ) [ ] $
0      shift  shift
1      shift  shift
2      accept
3      shift  shift
4      E→E+T  E→E+T  E→E+T
5      T→id  T→id  r5, s6  T→id
6      E→T  E→T  E→T
7      shift  shift
8      shift  shift
9      T→(E)  T→(E)  T→(E)

vs.

LR(0) – no look-ahead

state  action
q0     shift
q1     shift
q2
q3     shift
q4     E→E+T
q5     T→id
q6     E→T
q7     shift
q8     shift
q9     T→(E)

[ is not in FOLLOW(T)
Next lecture: SLR/LR(1)/LALR(1)/Parser generation