CSC 3130: Automata theory and formal languages, LR(k) grammars, Fall 2008. MELJUN P. CORTES, MBA, MPA, BSCS, ACS

MELJUN CORTES Automata Theory (Automata14)

Jul 18, 2015

Page 1: MELJUN CORTES Automata Theory (Automata14)

CSC 3130: Automata theory and formal languages

LR(k) grammars

Fall 2008

MELJUN P. CORTES, MBA, MPA, BSCS, ACS

Page 2: MELJUN CORTES Automata Theory (Automata14)

LR(0) example from last time

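Grammar: A → aAb | ab

LR(0) DFA states:

1: A → •aAb, A → •ab
2: A → a•Ab, A → a•b, A → •aAb, A → •ab
3: A → ab•
4: A → aA•b
5: A → aAb•

Transitions:

1 --a--> 2
2 --a--> 2
2 --b--> 3
2 --A--> 4
4 --b--> 5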

Page 3: MELJUN CORTES Automata Theory (Automata14)

LR(0) parsing example revisited

A → aAb | ab          A ⇒ aAb ⇒ aabb

Stack        Input   Action
1            aabb    S
1a2          abb     S
1a2a2        bb      S
1a2a2b3      b       R
1a2A4        b       S
1a2A4b5      ε       R
1A           ε

(Figure: the LR(0) DFA with states 1 to 5 from the previous slide, alongside the parse tree for aabb as it is built.)
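The stack trace above can be replayed mechanically. The following is a minimal Python sketch, assuming the five DFA states and transitions from the previous slide; the names `GOTO`, `REDUCE` and `parse_lr0` are illustrative and do not come from the lecture. Acceptance is modelled here as "the stack is back to state 1 after reducing to A with no input left", which is one reading of the final row 1A.

```python
# LR(0) DFA for A -> aAb | ab, transcribed from the slide (states 1..5).
GOTO = {
    (1, "a"): 2,
    (2, "a"): 2,
    (2, "b"): 3,
    (2, "A"): 4,
    (4, "b"): 5,
}
# Reduce states: state -> (left-hand side, right-hand side).
REDUCE = {3: ("A", "ab"), 5: ("A", "aAb")}
START_STATE, START_SYMBOL = 1, "A"


def parse_lr0(word):
    """Return the (stack, input, action) rows of the parse, or raise ValueError."""
    stack = [START_STATE]              # alternates state, symbol, state, ...
    pos = 0
    rows = []
    while True:
        state = stack[-1]
        snapshot = "".join(str(entry) for entry in stack)
        rest = word[pos:] or "ε"
        if state in REDUCE:
            lhs, rhs = REDUCE[state]
            rows.append((snapshot, rest, "R"))
            del stack[-2 * len(rhs):]  # pop one (symbol, state) pair per rhs symbol
            if stack == [START_STATE] and lhs == START_SYMBOL:
                if pos != len(word):
                    raise ValueError("input left over after reducing to A")
                rows.append((f"{START_STATE}{lhs}", "ε", ""))
                return rows
            target = GOTO.get((stack[-1], lhs))
            if target is None:
                raise ValueError(f"no transition on {lhs}")
            stack += [lhs, target]
        else:
            if pos == len(word) or (state, word[pos]) not in GOTO:
                raise ValueError(f"cannot shift at position {pos}")
            rows.append((snapshot, rest, "S"))
            stack += [word[pos], GOTO[(state, word[pos])]]
            pos += 1


for row in parse_lr0("aabb"):
    print("{:<10} {:<6} {}".format(*row))
```

Running it on aabb prints the same seven rows as the table on this slide; strings outside the language, such as aab, raise ValueError.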

Page 4: MELJUN CORTES Automata Theory (Automata14)

Meaning of LR(0) items

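A → α•Xβ
  α has already been read; the focus (•) sits just before X; Xβ is the undiscovered part.

εNFA transitions to:

X → •γ      shift focus to subtree rooted at X (if X is nonterminal)
A → αX•β    move past subtree rooted at X

(Figure: parse tree with root A and children α, X, β, with the focus • after α.)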

Page 5: MELJUN CORTES Automata Theory (Automata14)

Outline of LR(0) parsing algorithm

• Algorithm can perform two actions:
  – shift (S): no complete item is valid
  – reduce (R): there is one valid item, and it is complete

• What if:
  – some valid items complete, some not: S / R conflict
  – more than one valid complete item: R / R conflict
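The four cases on this slide amount to a small decision rule over the set of valid items. A minimal Python sketch follows, assuming an item is represented as a tuple (lhs, rhs, dot) with the dot given as a position in rhs; this representation and the name `classify` are choices made for the example, not part of the lecture.

```python
def classify(items):
    """Classify one parser state from its valid LR(0) items.

    Each item is (lhs, rhs, dot); it is complete when the dot is at the end.
    Returns ['shift'] or ['reduce'], or the list of conflicts found.
    """
    items = list(items)
    if not items:
        return ["error"]               # no valid item at all: the input is rejected
    complete = [it for it in items if it[2] == len(it[1])]
    incomplete = [it for it in items if it[2] < len(it[1])]
    conflicts = []
    if complete and incomplete:
        conflicts.append("S/R conflict")   # some valid items complete, some not
    if len(complete) > 1:
        conflicts.append("R/R conflict")   # more than one valid complete item
    if conflicts:
        return conflicts
    return ["reduce"] if complete else ["shift"]


# State 2 and state 3 of the DFA for A -> aAb | ab.
print(classify([("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)]))
print(classify([("A", "ab", 2)]))
```

For the two states shown, the calls print ['shift'] and ['reduce']; a state that mixes complete and incomplete items, or holds two complete items, reports the corresponding conflicts instead.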

Page 6: MELJUN CORTES Automata Theory (Automata14)

Definition of LR(0) grammar

• A grammar is LR(0) if S/R, R/R conflicts never occur
  – LR means parsing happens left to right and produces a rightmost derivation (in reverse)

• LR(0) grammars are unambiguous and have a fast parsing algorithm

• Unfortunately, they are not “expressive” enough to describe programming languages

Page 7: MELJUN CORTES Automata Theory (Automata14)

Hierarchy of context-free grammars

(Figure: nested classes of grammars, from the outermost to the innermost.)

context-free grammars: parse using CYK algorithm (slow)
  LR(∞) grammars
    LR(1) grammars: java, perl, python, …
      LR(0) grammars: parse using LR(0) algorithm

Page 8: MELJUN CORTES Automata Theory (Automata14)

A grammar that is not LR(0)

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a

Page 9: MELJUN CORTES Automata Theory (Automata14)

A grammar that is not LR(0)

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a • • •   (remaining input not yet examined)

possibilities: shift (3), reduce (4), reduce (5), shift (6)

valid LR(0) items:
A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

S/R, R/R conflicts!

(Figure: the candidate partial parse trees after reading a.)

Page 10: MELJUN CORTES Automata Theory (Automata14)

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a • • •   (peek inside!)

valid LR(0) items:
A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

(Figure: the candidate partial parse trees after reading a.)

Page 11: MELJUN CORTES Automata Theory (Automata14)

Lookahead

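S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a a   (peek inside: the next symbol is a)

valid LR(0) items:
A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

parse tree must look like this:
(Figure: partial parse tree S ⇒ A ⇒ aA, the only shape consistent with the lookahead a.)

action: shift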

Page 12: MELJUN CORTES Automata Theory (Automata14)

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a a a   (peek inside: the next symbol is a)

valid LR(0) items:
A → a•A, A → a•, A → •aA, A → •a

parse tree must look like this:
(Figure: partial parse tree S ⇒ A ⇒ aA ⇒ aaA.)

action: shift

Page 13: MELJUN CORTES Automata Theory (Automata14)

Lookahead

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a a a   (no more input)

valid LR(0) items:
A → a•A, A → a•, A → •aA, A → •a

parse tree must look like this:
(Figure: complete parse tree S ⇒ A ⇒ aA ⇒ aaA ⇒ aaa.)

action: reduce

Page 14: MELJUN CORTES Automata Theory (Automata14)

LR(0) items vs. LR(1) items

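A → aAb | ab

LR(0) item:   A → a•Ab
LR(1) item:   [A → a•Ab, b]

(Figure: the same partial parse tree shown twice, with the focus • after the first a. The LR(1) item also records b, the symbol that follows this subtree.)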

Page 15: MELJUN CORTES Automata Theory (Automata14)

LR(1) items

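• LR(1) items are of the form

  [A → α•β, x]   or   [A → α•β, ε]

  to represent this state in the parsing

(Figure: subtree with root A and children α, β, with the focus • after α. In the first form the subtree is followed by the symbol x; in the second form nothing follows it.)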

Page 16: MELJUN CORTES Automata Theory (Automata14)

Outline of LR(1) parsing algorithm

• Step 1: Build εNFA that describes valid item updates

• Step 2: Convert εNFA to DFA
  – As in LR(0), DFA will have shift and reduce states

• Step 3: Run DFA on input, using stack to remember sequence of states
  – Use lookahead to eliminate wrong reduce items

Page 17: MELJUN CORTES Automata Theory (Automata14)

Recall εNFA transitions for LR(0)

• States of εNFA will be items (plus a start state q0)

• For every item S → •α we have a transition

  q0 --ε--> S → •α

• For every item A → α•Xβ we have a transition

  A → α•Xβ --X--> A → αX•β

• For every item A → α•Cβ and production C → δ we have a transition

  A → α•Cβ --ε--> C → •δ
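Following the ε-transitions until nothing new appears and then grouping items by the symbol after the dot is the usual subset construction. The following is a minimal Python sketch for the grammar A → aAb | ab, assuming single-character symbols and items stored as (lhs, rhs, dot); the function names `closure`, `goto` and `build_dfa` are illustrative. States are numbered from 0 in discovery order, so the numbering differs from the 1 to 5 used on the slides.

```python
GRAMMAR = {"A": ["aAb", "ab"]}   # nonterminal -> right-hand sides


def closure(items):
    """Follow A → α•Cβ --ε--> C → •δ until no new item appears."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for delta in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], delta, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(state, symbol):
    """Take A → α•Xβ --X--> A → αX•β for every matching item, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def build_dfa(start_symbol):
    """Return (states, edges) where edges maps (state index, symbol) to a state index."""
    start = closure({(start_symbol, rhs, 0) for rhs in GRAMMAR[start_symbol]})
    states, edges, todo = [start], {}, [start]
    while todo:
        state = todo.pop(0)
        for symbol in sorted({r[d] for (_, r, d) in state if d < len(r)}):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                todo.append(target)
            edges[(states.index(state), symbol)] = states.index(target)
    return states, edges


states, edges = build_dfa("A")
print(len(states))   # 5
print(edges)
```

The five states found correspond to the five states of the LR(0) example at the start of the lecture, up to renumbering.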

Page 18: MELJUN CORTES Automata Theory (Automata14)

εNFA transitions for LR(1)

• For every item [S → •α, ε] we have a transition

  q0 --ε--> [S → •α, ε]

• For every item [A → α•Xβ, x] we have a transition

  [A → α•Xβ, x] --X--> [A → αX•β, x]

• For every item [A → α•Cβ, x] and production C → δ we have a transition

  [A → α•Cβ, x] --ε--> [C → •δ, y]

  for every y in FIRST(βx)
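The third rule is the only place where lookaheads change, so it is worth seeing in isolation. Below is a minimal Python sketch of the ε-closure for LR(1) items, using the grammar S → A | Bc, A → aA | a, B → a | ab from this lecture. Two assumptions are made for the example: the grammar has no ε-productions, and when β is empty the lookahead x (possibly the end marker ε) is passed on unchanged, which is what the worked example in these slides appears to do. Items are tuples (lhs, rhs, dot, lookahead), and all function names are illustrative.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = "ε"                         # lookahead meaning "end of input"


def first_sets(grammar):
    """FIRST of each nonterminal, assuming the grammar has no ε-productions."""
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, bodies in grammar.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)


def first_of(beta, x):
    """FIRST(βx): FIRST of the head of β, or x itself when β is empty."""
    if not beta:
        return {x}
    head = beta[0]
    return set(FIRST[head]) if head in GRAMMAR else {head}


def closure(items):
    """Apply [A → α•Cβ, x] --ε--> [C → •δ, y], y in FIRST(βx), until stable."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            c, beta = rhs[dot], rhs[dot + 1:]
            for delta in GRAMMAR[c]:
                for y in first_of(beta, x):
                    new = (c, delta, 0, y)
                    if new not in result:
                        result.add(new)
                        todo.append(new)
    return frozenset(result)


start = closure({("S", rhs, 0, END) for rhs in GRAMMAR["S"]})
for lhs, rhs, dot, x in sorted(start):
    print(f"[{lhs} → {rhs[:dot]}•{rhs[dot:]}, {x}]")
```

The start state contains six items: the two S items and the two A items carry lookahead ε, while the two B items carry lookahead c, because in S → •Bc the symbol after B is c.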

Page 19: MELJUN CORTES Automata Theory (Automata14)

FIRST sets

FIRST(α) is the set of terminals that occur on the left in some derivation starting from α

• Example

S → A (1) | cB (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

FIRST(a) = {a}
FIRST(A) = {a}
FIRST(S) = {a, c}
FIRST(bAc) = {b}
FIRST(BA) = {a}
FIRST(ε) = ∅
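FIRST sets can be computed by repeating one pass over the productions until nothing changes. The following is a minimal Python sketch for the example grammar on this slide (with S → A | cB), assuming single-character symbols; the names `first_tables` and `first_of` are illustrative. It also tracks which nonterminals can derive the empty string, which does not occur in this grammar but is needed in general.

```python
GRAMMAR = {"S": ["A", "cB"], "A": ["aA", "a"], "B": ["a", "ab"]}


def first_tables(grammar):
    """Return (first, nullable) for the nonterminals of the grammar."""
    first = {n: set() for n in grammar}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, bodies in grammar.items():
            for body in bodies:
                all_nullable = True
                for sym in body:
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[lhs]:
                        first[lhs] |= new
                        changed = True
                    if sym not in nullable:   # terminals are never nullable
                        all_nullable = False
                        break
                if all_nullable and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
    return first, nullable


FIRST, NULLABLE = first_tables(GRAMMAR)


def first_of(alpha):
    """FIRST of a string of terminals and nonterminals."""
    out = set()
    for sym in alpha:
        out |= FIRST[sym] if sym in GRAMMAR else {sym}
        if sym not in NULLABLE:
            break
    return out


for alpha in ["a", "A", "S", "bAc", "BA", ""]:
    print(f"FIRST({alpha or 'ε'}) =", sorted(first_of(alpha)))
```

The printed sets agree with the six values listed on the slide, including the empty set for FIRST(ε).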

Page 20: MELJUN CORTES Automata Theory (Automata14)

Explaining the transitions

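[A → α•Xβ, x] --X--> [A → αX•β, x]

(Figure: tree with root A and children α, X, β, followed by x; the focus • moves from before X to after X.)

[A → α•Cβ, x] --ε--> [C → •δ, y],   y ∈ FIRST(βx)

(Figure: tree with root A and children α, C, β, followed by x; the focus moves into the subtree C → δ, and y is the symbol that follows that subtree.)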

Page 21: MELJUN CORTES Automata Theory (Automata14)

Example

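S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

εNFA transitions:

q0 --ε--> [S → •A, ε]
q0 --ε--> [S → •Bc, ε]
[S → •A, ε] --A--> [S → A•, ε]
[S → •A, ε] --ε--> [A → •aA, ε]
[S → •A, ε] --ε--> [A → •a, ε]
[S → •Bc, ε] --B--> [S → B•c, ε]
[S → •Bc, ε] --ε--> [B → •a, c]
[S → •Bc, ε] --ε--> [B → •ab, c]
. . .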

Page 22: MELJUN CORTES Automata Theory (Automata14)

Convert NFA to DFA

• Each DFA state is a subset of LR(1) items, e.g.

  [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]

• States can contain S/R, R/R conflicts

• But if the grammar is LR(1), lookahead can always resolve such conflicts

Page 23: MELJUN CORTES Automata Theory (Automata14)

Example

S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

stack   input   action   valid items
ε       abc     S        [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]
a       bc      S        [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]   (look ahead!)
ab      c       R        [B → ab•, c]
B       c       S        [S → B•c, ε]
Bc      ε       R        [S → Bc•, ε]
S       ε
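The trace above can be reproduced by computing each item set on the fly and letting the next input symbol choose between shifting and reducing. This is a minimal Python sketch under the same assumptions as the rest of the example: single-character symbols, no ε-productions, and the end marker ε passed on as lookahead when nothing follows a nonterminal. The names `closure`, `goto` and `parse_lr1` are illustrative, and the sketch keeps item sets on the stack instead of precomputing a DFA table.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
START, END = "S", "ε"

# FIRST of each nonterminal (valid because the grammar has no ε-productions).
FIRST = {n: set() for n in GRAMMAR}
changed = True
while changed:
    changed = False
    for lhs, bodies in GRAMMAR.items():
        for body in bodies:
            new = FIRST[body[0]] if body[0] in GRAMMAR else {body[0]}
            if not new <= FIRST[lhs]:
                FIRST[lhs] |= new
                changed = True


def first_of(beta, x):
    """FIRST(βx), with x passed through when β is empty."""
    if not beta:
        return {x}
    return set(FIRST[beta[0]]) if beta[0] in GRAMMAR else {beta[0]}


def closure(items):
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for delta in GRAMMAR[rhs[dot]]:
                for y in first_of(rhs[dot + 1:], x):
                    new = (rhs[dot], delta, 0, y)
                    if new not in result:
                        result.add(new)
                        todo.append(new)
    return frozenset(result)


def goto(state, symbol):
    return closure({(l, r, d + 1, x) for (l, r, d, x) in state
                    if d < len(r) and r[d] == symbol})


def parse_lr1(word):
    """Return the (stack, input, action) rows of the parse, or raise ValueError."""
    states = [closure({(START, rhs, 0, END) for rhs in GRAMMAR[START]})]
    symbols, pos, rows = [], 0, []
    while True:
        look = word[pos] if pos < len(word) else END
        state = states[-1]
        # Complete items whose lookahead matches the next symbol may be reduced.
        reduces = [(l, r) for (l, r, d, x) in state if d == len(r) and x == look]
        can_shift = look != END and any(d < len(r) and r[d] == look
                                        for (_, r, d, _) in state)
        if len(reduces) + can_shift > 1:
            raise ValueError("conflict that one lookahead symbol cannot resolve")
        row = ("".join(symbols) or END, word[pos:] or END)
        if can_shift:
            rows.append(row + ("S",))
            symbols.append(look)
            states.append(goto(state, look))
            pos += 1
        elif reduces:
            lhs, rhs = reduces[0]
            rows.append(row + ("R",))
            del symbols[len(symbols) - len(rhs):]
            del states[len(states) - len(rhs):]
            symbols.append(lhs)
            if lhs == START and len(states) == 1:
                rows.append((START, END, ""))
                return rows
            states.append(goto(states[-1], lhs))
        else:
            raise ValueError(f"no valid action at position {pos}")


for row in parse_lr1("abc"):
    print("{:<6} {:<6} {}".format(*row))
```

On abc the rows match the table on this slide. In the second row the state holds two complete items, [A → a•, ε] and [B → a•, c], but the lookahead b matches neither, so the only possible action is to shift.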

Page 24: MELJUN CORTES Automata Theory (Automata14)

LR(k) grammars

• A context-free grammar is LR(1) if all S/R, R/R conflicts can be resolved with one lookahead symbol

• More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols
  – Items have the form [A → α•β, x1...xk]

• LR(1) grammars describe the syntax of most programming languages