CSC 3130: Automata theory and formal languages
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
The Chinese University of Hong Kong
LR(k) grammars
Fall 2008
Jan 29, 2016
LR(0) example from last time
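Grammar: A → aAb | ab

DFA states (sets of LR(0) items):
  1: A → •aAb, A → •ab
  2: A → a•Ab, A → a•b, A → •aAb, A → •ab
  3: A → ab•
  4: A → aA•b
  5: A → aAb•

DFA transitions:
  1 --a--> 2
  2 --a--> 2
  2 --b--> 3
  2 --A--> 4
  4 --b--> 5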
LR(0) parsing example revisited
A → aAb | ab          A ⇒ aAb ⇒ aabb

Stack        Input    Action
1            aabb     S
1a2          abb      S
1a2a2        bb       S
1a2a2b3      b        R
1a2A4        b        S
1a2A4b5      ε        R
1A           ε

[Figure: the five-state DFA for A → aAb | ab, traced alongside the partial parse tree as the input is read]
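The stack trace above can be reproduced with a small table-driven loop. This is only a sketch: the dictionaries GOTO and REDUCE are a hypothetical encoding of the five-state DFA (state 3 holds A → ab•, state 5 holds A → aAb•), and the input is assumed to use only the terminals a and b.

```python
# Hypothetical encoding of the five-state DFA for A -> aAb | ab.
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
# Reduce states: state -> (left-hand side, length of right-hand side).
REDUCE = {3: ("A", 2), 5: ("A", 3)}


def lr0_parse(word):
    """Return the (stack, input, action) steps, or None if word is rejected."""
    stack = [1]  # alternates state, symbol, state, ...
    pos = 0
    trace = []
    while True:
        state = stack[-1]
        shown = "".join(str(x) for x in stack)
        if state in REDUCE:
            lhs, length = REDUCE[state]
            trace.append((shown, word[pos:], "R"))
            del stack[-2 * length:]  # pop the right-hand side with its states
            if stack == [1] and pos == len(word):
                trace.append(("1" + lhs, "", "accept"))
                return trace
            nxt = GOTO.get((stack[-1], lhs))
            if nxt is None:
                return None
            stack += [lhs, nxt]
        else:
            nxt = GOTO.get((state, word[pos])) if pos < len(word) else None
            if nxt is None:
                return None
            trace.append((shown, word[pos:], "S"))
            stack += [word[pos], nxt]
            pos += 1


for row in lr0_parse("aabb"):
    print(row)
```

On aabb the recorded actions are S, S, S, R, S, R followed by acceptance, matching the table row by row; a string such as aab is rejected because no action is available once the input runs out.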
Meaning of LR(0) items
A → α•Xβ    (• marks the focus; the part to the right of • is undiscovered)
NFA transitions to:
  X → •γ     shift focus to subtree rooted at X (if X is nonterminal)
  A → αX•β   move past subtree rooted at X
Outline of LR(0) parsing algorithm
• Algorithm can perform two actions:
  – shift (S): no complete item is valid
  – reduce (R): there is one valid item, and it is complete
• What if:
  – some valid items complete, some not: S / R conflict
  – more than one valid complete item: R / R conflict
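The four cases in this outline can be written as a small classifier over a set of valid items. This is a sketch under an assumed encoding: an item is a triple (lhs, rhs, dot), where dot is the position of • in rhs; the function name classify is made up for illustration.

```python
def classify(items):
    """Classify a set of valid LR(0) items given as (lhs, rhs, dot) triples."""
    complete = [it for it in items if it[2] == len(it[1])]
    incomplete = [it for it in items if it[2] < len(it[1])]
    conflicts = []
    if complete and incomplete:
        conflicts.append("S/R")  # some valid items complete, some not
    if len(complete) > 1:
        conflicts.append("R/R")  # more than one valid complete item
    if conflicts:
        return " and ".join(conflicts) + " conflict"
    return "reduce" if complete else "shift"


# Item sets for the grammar A -> aAb | ab.
print(classify({("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)}))
print(classify({("A", "ab", 2)}))
```

The first call prints shift (no complete item is valid) and the second prints reduce (one valid item, and it is complete).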
Definition of LR(0) grammar
• A grammar is LR(0) if S/R, R/R conflicts never occur
  – LR means parsing happens left to right and produces a rightmost derivation
• LR(0) grammars are unambiguous and have a fast parsing algorithm
• Unfortunately, they are not “expressive” enough to describe programming languages
Hierarchy of context-free grammars
(each class contains the ones listed below it)
context-free grammars: parse using CYK algorithm (slow)
  LR(∞) grammars
    …
      LR(1) grammars: java, perl, python, …
        LR(0) grammars: parse using LR(0) algorithm
A grammar that is not LR(0)
S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)
input: a • • •
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
possibilities: shift (3), reduce (4), reduce (5), shift (6)
[Figure: candidate parse trees, one for each possibility]
S/R, R/R conflicts!
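The valid items listed on this slide can be recomputed by following the item transitions directly. The sketch below assumes an item encoding (lhs, rhs, dot) and uses hypothetical helper names closure and goto for "follow all ε-transitions" and "read one symbol".

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}


def closure(items):
    """Add every item C -> •γ reachable when the dot stands before C."""
    items, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in items:
                    items.add(new)
                    work.append(new)
    return items


def goto(items, symbol):
    """Move the dot past symbol in every item that allows it."""
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol}
    return closure(moved)


start = closure({("S", rhs, 0) for rhs in GRAMMAR["S"]})
after_a = goto(start, "a")
for lhs, rhs, dot in sorted(after_a):
    print(f"{lhs} -> {rhs[:dot]}•{rhs[dot:]}")
```

After reading a, the set contains two complete items (A → a• and B → a•) next to incomplete ones, which is exactly the S/R and R/R situation named on the slide.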
Lookahead
S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)
input: a, next symbol hidden (peek inside!)
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
[Figure: candidate parse trees, one for each possibility]
Lookahead
S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)
input: a, next symbol a (peek inside!)
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
parse tree must look like this: [Figure: partial parse tree]
action: shift
Lookahead
S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)
input: a a, next symbol a (peek inside!)
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
parse tree must look like this: [Figure: partial parse tree]
action: shift
Lookahead
S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)
input: a a a
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
parse tree must look like this: [Figure: partial parse tree]
action: reduce
LR(0) items vs. LR(1) items
A → aAb | ab
LR(0): A → a•Ab
LR(1): [A → a•Ab, b]
[Figure: partial parse trees illustrating the LR(0) item and the LR(1) item]
LR(1) items
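• LR(1) items are of the form
    [A → α•β, x]  or  [A → α•β, ε]
  to represent this state in the parsing
[Figure: subtree rooted at A with the focus • between α and β, followed by the next symbol x (or by the end of the input)]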
Outline of LR(1) parsing algorithm
• Step 1: Build NFA that describes valid item updates
• Step 2: Convert NFA to DFA
  – As in LR(0), DFA will have shift and reduce states
• Step 3: Run DFA on input, using stack to remember sequence of states
  – Use lookahead to eliminate wrong reduce items
Recall NFA transitions for LR(0)
• States of NFA will be items (plus a start state q0)
• For every item S → •α we have a transition
    q0 --ε--> S → •α
• For every item A → α•Xβ we have a transition
    A → α•Xβ --X--> A → αX•β
• For every item A → α•Cβ and production C → γ we have a transition
    A → α•Cβ --ε--> C → •γ
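The three rules above can be turned into a short enumeration of NFA edges. The sketch below uses the grammar A → aAb | ab with A as the start symbol; the item encoding (lhs, rhs, dot), the label "ε" for ε-moves, and the function name nfa_transitions are assumptions made for illustration.

```python
GRAMMAR = {"A": ["aAb", "ab"]}
START = "A"  # start symbol of this small grammar


def nfa_transitions():
    """Return the set of NFA edges (source, label, target)."""
    edges = set()
    for lhs, prods in GRAMMAR.items():
        for rhs in prods:
            if lhs == START:
                edges.add(("q0", "ε", (lhs, rhs, 0)))  # rule 1
            for dot, x in enumerate(rhs):
                # rule 2: move the dot past the symbol x
                edges.add(((lhs, rhs, dot), x, (lhs, rhs, dot + 1)))
                if x in GRAMMAR:
                    # rule 3: descend into every production of x
                    for gamma in GRAMMAR[x]:
                        edges.add(((lhs, rhs, dot), "ε", (x, gamma, 0)))
    return edges


EDGES = nfa_transitions()
print(len(EDGES), "transitions")
```

Applying the usual subset construction to these edges gives the DFA whose states are sets of items.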
NFA transitions for LR(1)
• For every item [S → •α, ε] we have a transition
    q0 --ε--> [S → •α, ε]
• For every item [A → α•Xβ, x] we have a transition
    [A → α•Xβ, x] --X--> [A → αX•β, x]
• For every item [A → α•Cβ, x] and production C → γ we have a transition
    [A → α•Cβ, x] --ε--> [C → •γ, y]
  for every y in FIRST(βx)
FIRST sets
FIRST(α) is the set of terminals that occur on the left in some derivation starting from α
• Example
  S → A (1) | cB (2)    A → aA (3) | a (4)    B → a (5) | ab (6)
  FIRST(a) = {a}
  FIRST(A) = {a}
  FIRST(S) = {a, c}
  FIRST(bAc) = {b}
  FIRST(BA) = {a}
  FIRST(ε) = ∅
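The example values can be checked with a small fixed-point computation. This sketch relies on a simplifying assumption that holds for the example grammar: there are no ε-productions, so FIRST of a string is FIRST of its first symbol. The names first_of_nonterminals and first are made up for illustration.

```python
GRAMMAR = {"S": ["A", "cB"], "A": ["aA", "a"], "B": ["a", "ab"]}


def first_of_nonterminals():
    """Iterate until no FIRST set grows (assumes no ε-productions)."""
    table = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for rhs in prods:
                head = rhs[0]
                new = table[head] if head in GRAMMAR else {head}
                if not new <= table[nt]:
                    table[nt] |= new
                    changed = True
    return table


FIRST_NT = first_of_nonterminals()


def first(alpha):
    """FIRST of a string of grammar symbols; the empty string gives the empty set."""
    if alpha == "":
        return set()
    head = alpha[0]
    return set(FIRST_NT[head]) if head in GRAMMAR else {head}


for alpha in ["a", "A", "S", "bAc", "BA", ""]:
    print(repr(alpha), sorted(first(alpha)))
```

A grammar with ε-productions would need the extra step of skipping over nullable symbols, which this sketch leaves out.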
Explaining the transitions
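[A → α•Xβ, x] --X--> [A → αX•β, x]
[A → α•Cβ, x] --ε--> [C → •γ, y],  y ∈ FIRST(βx)
[Figure: parse-tree fragments for the two transitions: the focus moves past the subtree rooted at X, or descends into the subtree rooted at C, which is followed by y]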
Example
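S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)
q0 --ε--> [S → •A, ε]
q0 --ε--> [S → •Bc, ε]
[S → •A, ε] --A--> [S → A•, ε]
[S → •A, ε] --ε--> [A → •aA, ε]
[S → •A, ε] --ε--> [A → •a, ε]
[S → •Bc, ε] --B--> [S → B•c, ε]
[S → •Bc, ε] --ε--> [B → •a, c]
[S → •Bc, ε] --ε--> [B → •ab, c]
. . .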
Convert NFA to DFA
• Each DFA state is a subset of LR(1) items, e.g.
    [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]
• States can contain S/R, R/R conflicts
• In an LR(1) grammar, lookahead can always resolve such conflicts
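To see how lookahead resolves the conflicts in the example state, one can list the actions that remain possible for each next symbol. The sketch below hard-codes that state; the item encoding (lhs, rhs, dot, lookahead), the use of the empty string for the ε lookahead, and the function name actions are assumptions made for illustration.

```python
# The example DFA state, with "" standing for the ε lookahead.
STATE = {
    ("A", "aA", 1, ""), ("A", "a", 1, ""),
    ("B", "a", 1, "c"), ("B", "ab", 1, "c"),
    ("A", "aA", 0, ""), ("A", "a", 0, ""),
}


def actions(items, lookahead):
    """Return the actions still possible once the next symbol is known."""
    found = set()
    for lhs, rhs, dot, la in items:
        if dot == len(rhs):
            if la == lookahead:  # complete item whose lookahead matches
                found.add(f"reduce {lhs} -> {rhs}")
        elif rhs[dot] == lookahead:  # the dot stands before the next symbol
            found.add("shift")
    return found


for symbol in ["a", "b", "c", ""]:
    print(repr(symbol), actions(STATE, symbol))
```

Each lookahead leaves exactly one action (shift on a or b, reduce B → a on c, reduce A → a at the end of the input), so this state causes no conflict once the next symbol is taken into account.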
Example
S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)

stack   input   action   valid items
ε       abc     S        [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]
a       bc      S        [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]   (look ahead!)
ab      c       R        [B → ab•, c]
B       c       S        [S → B•c, ε]
Bc      ε       R        [S → Bc•, ε]
S       ε
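The run above can be replayed by computing the sets of valid LR(1) items on the fly. This is a sketch, not a table-generating parser: the empty string END stands for the ε lookahead and is passed through when β is empty (an assumption chosen to reproduce the items shown), the grammar is assumed to have no ε-productions and no left recursion, and the input is assumed to contain terminals only. All function names are hypothetical.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = ""  # assumption: the empty string plays the role of the ε lookahead


def first(beta, x):
    """FIRST(beta x); assumes no ε-productions and no left recursion."""
    if beta == "":
        return {x}
    head = beta[0]
    if head not in GRAMMAR:
        return {head}
    return set().union(*(first(rhs, x) for rhs in GRAMMAR[head]))


def closure(items):
    items, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for y in first(rhs[dot + 1:], la):
                for gamma in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], gamma, 0, y)
                    if new not in items:
                        items.add(new)
                        work.append(new)
    return items


def goto(items, sym):
    return closure({(l, r, d + 1, la) for l, r, d, la in items
                    if d < len(r) and r[d] == sym})


def lr1_parse(word):
    """Return the list of actions ("S"/"R"), or None if word is rejected."""
    states = [closure({("S", rhs, 0, END) for rhs in GRAMMAR["S"]})]
    pos, actions = 0, []
    while True:
        la = word[pos] if pos < len(word) else END
        top = states[-1]
        reduces = [it for it in top if it[2] == len(it[1]) and it[3] == la]
        can_shift = any(d < len(r) and r[d] == la for _, r, d, _ in top)
        if len(reduces) + can_shift != 1:
            return None  # no action, or a conflict lookahead cannot resolve
        if can_shift:
            states.append(goto(top, la))
            pos += 1
            actions.append("S")
        else:
            lhs, rhs, _, _ = reduces[0]
            actions.append("R")
            if lhs == "S":
                return actions  # the lookahead is END here, so accept
            del states[-len(rhs):]  # pop one state per right-hand-side symbol
            states.append(goto(states[-1], lhs))


print(lr1_parse("abc"))
```

For abc the action sequence is S, S, R, S, R, the same as in the table; recomputing item sets at every step is slow, and a practical parser would precompute the DFA once.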
LR(k) grammars
• A context-free grammar is LR(1) if all S/R, R/R conflicts can be resolved with one lookahead
• More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols
  – Items have the form [A → α•β, x1...xk]
• LR(1) grammars describe the syntax of most programming languages