CSC 3130: Automata theory and formal languages. Andrej Bogdanov, http://www.cse.cuhk.edu.hk/~andrejb/csc3130. The Chinese University of Hong Kong. Normal forms and parsing. Fall 2009.

Jan 18, 2018


Silvia Barnett

Page 1: CSC 3130: Automata theory and formal languages Andrej Bogdanov  The Chinese University of Hong Kong Normal forms.

CSC 3130: Automata theory and formal languages

Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130

The Chinese University of Hong Kong

Normal forms and parsing

Fall 2009

Page 2: Testing membership and parsing

• Given a grammar

S → 0S1 | 1S0S1 | T
T → S | ε

• How can we know if a string x is in its language?

• If so, can we obtain a parse tree for x?

• Can we tell if the parse tree is unique?

Page 3: First attempt

• Maybe we can try all possible derivations:

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1
S ⇒ 1S0S1 ⇒ 10S10S1, ...
S ⇒ T ⇒ S, ε

When do we stop?

Page 4: Problems

• How do we know when to stop?

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1
S ⇒ 1S0S1 ⇒ 10S10S1, ...

When do we stop?

Page 5: Problems

• Idea: Stop derivation when length exceeds |x|

• Not right because of ε-productions

• We might want to eliminate ε-productions too

S → 0S1 | 1S0S1 | T
T → S | ε

x = 01011

derivation: S, 0S1, 01S0S11, 01S011, 01011
lengths: 1, 3, 7, 6, 5

The sentential form grows to length 7 before shrinking back to |x| = 5, because each remaining S is erased through S ⇒ T ⇒ ε.

Page 6: Problems

• Loops among the variables (S → T → S) might make us go forever

• We want to eliminate such loops

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

Page 7: Removal of ε-productions

• A variable N is nullable if there is a derivation N ⇒* ε

• How to remove ε-productions (except from S):

Find all nullable variables N1, ..., Nk
For every production of the form A → αNiβ, add another production A → αβ
If Ni → ε is a production, remove it
If S is nullable, add the special production S → ε

Page 8: Example

• Find the nullable variables

Find all nullable variables N1, ..., Nk

grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

Page 9: Finding nullable variables

• To find nullable variables, we work backwards

– First, mark all variables A s.t. A → ε as nullable
– Then, as long as there are productions of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
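The marking procedure can be sketched in Python as follows. This is only an illustration: the dictionary encoding of a grammar (variables as keys, right-hand sides as tuples, `()` for ε) and the names `find_nullable` and `EXAMPLE` are assumptions made here, not part of the slides.

```python
def find_nullable(grammar):
    """Return the set of nullable variables.

    grammar maps each variable to a list of right-hand sides; each
    right-hand side is a tuple of symbols and () stands for epsilon.
    """
    # First, mark every variable A that has a production A -> epsilon.
    nullable = {a for a, bodies in grammar.items() if () in bodies}
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            if a in nullable:
                continue
            # Mark A if some right-hand side uses only marked variables.
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(a)
                changed = True
    return nullable


# The example grammar from the previous slides, in the assumed encoding.
EXAMPLE = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

print(sorted(find_nullable(EXAMPLE)))
```

The loop stops once a full pass marks nothing new, so it runs at most once per variable plus one final pass.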

Page 10: Eliminating ε-productions

S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

For every production of the form A → αNiβ, add another production A → αβ

If Ni → ε is a production, remove it

added productions: D → C, S → AD, D → B, D → ε, S → AC, S → A, C → E
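A minimal Python sketch of this elimination step is given below, under an assumed encoding (a dictionary from variables to lists of tuples, with `()` for ε). The function name `remove_epsilon` and the `start` parameter are hypothetical. Instead of adding productions one at a time, the sketch generates every way of dropping nullable symbols from a right-hand side and discards empty results directly.

```python
from itertools import product


def remove_epsilon(grammar, start="S"):
    """Return a grammar whose only epsilon-production is possibly
    start -> epsilon. () stands for epsilon."""
    # Find the nullable variables by working backwards.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            if a not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(a)
                changed = True

    result = {}
    for a, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; try every choice.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = tuple(s for part in choice for s in part)
                if new_body:  # discard epsilon-productions
                    new_bodies.add(new_body)
        result[a] = sorted(new_bodies)
    if start in nullable:
        result[start].append(())  # the special production S -> epsilon
    return result


EXAMPLE = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

for variable, bodies in remove_epsilon(EXAMPLE).items():
    print(variable, "->", bodies)
```

On the example grammar, B ends up with no productions at all, so the right-hand sides that still mention B can never produce a terminal string.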

Page 11: Dealing with loops

• A unit production is a production of the form A1 → A2 where A1 and A2 are both variables

• Example

grammar:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

unit productions: S → T, T → S, T → R

Page 12: Removal of unit productions

• If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace everything with A1

• Example

S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

unit productions: S → T, T → S, T → R

T is replaced by S in the {S, T} cycle:

S → 0S1 | 1S0S1
S → R | ε
R → 0SR

Page 13: Removal of unit productions

• For other unit productions, replace every chain A1 → A2 → ... → Ak → α by productions A1 → α, ..., Ak → α

• Example

S → 0S1 | 1S0S1 | R | ε
R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR

S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR
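One way to sketch unit-production removal in Python is shown below. It uses a slightly different technique from the slides: rather than first merging cycles and then replacing chains, it computes for each variable the set of variables reachable through unit productions and copies their non-unit productions. The grammar encoding (dictionary of tuples, `()` for ε) and the name `remove_unit` are assumptions for illustration.

```python
def remove_unit(grammar):
    """Return an equivalent grammar without unit productions A -> B.

    Assumes epsilon-productions were already eliminated, except
    possibly from the start variable.
    """
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    result = {}
    for a in grammar:
        # Collect every variable reachable from a via unit productions.
        reachable, stack = {a}, [a]
        while stack:
            b = stack.pop()
            for body in grammar[b]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # a inherits the non-unit productions of all reachable variables.
        # An epsilon body is kept only if it belongs to a itself.
        bodies = {
            body
            for b in reachable
            for body in grammar[b]
            if not is_unit(body) and (body or b == a)
        }
        result[a] = sorted(bodies)
    return result


GRAMMAR = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("R",), ()],
    "R": [("0", "S", "R")],
}

for variable, bodies in remove_unit(GRAMMAR).items():
    print(variable, "->", bodies)
```

Because cycles are handled by the reachability search, a variable in a cycle is kept (with the same productions as its partners) instead of being renamed.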

Page 14: Recap

• After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak, where a1, …, ak are terminals, doesn’t shrink in length and doesn’t go into cycles

• Exception: S → ε
– We will not use this rule at all, except to check if ε ∈ L

• Note: ε-productions must be eliminated before unit productions

Page 15: Example: testing membership

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

eliminate unit, ε-productions:

S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

S ⇒ 01, 101
S ⇒ 10S1 ⇒ 10011, strings of length ≥ 6
S ⇒ 1S01 ⇒ 10101, strings of length ≥ 6
S ⇒ 1S0S1 ⇒ only strings of length ≥ 6
S ⇒ 0S1 ⇒ 0011, 01011, 00S11, strings of length ≥ 6
0S1 ⇒ 00S11 ⇒ only strings of length ≥ 6

None of the derived strings of length 5 equals x, so x = 00111 is not in the language.

Page 16: Algorithm 1 for testing membership

• How to check if a string x ≠ ε is in L(G)

Eliminate all ε-productions and unit productions
Let X := S
While some new rule R can be applied to X
    Apply R to X
    If X = x, you have found a derivation for x
    If |X| > |x|, backtrack
If no more rules can be applied to X, x is not in L
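The search can be sketched in Python roughly as follows. The sketch makes two choices that the slide leaves open, both assumptions: it always expands the leftmost variable (leftmost derivations suffice for context-free grammars), and it remembers sentential forms it has already seen so that no form is explored twice. The name `derives` and the grammar encoding are hypothetical.

```python
def derives(grammar, x, start="S"):
    """Brute-force membership test for a nonempty string x.

    Assumes the grammar has no unit productions and no
    epsilon-productions other than start -> epsilon (ignored here).
    """
    target = tuple(x)
    seen = set()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        if form == target:
            return True  # found a derivation for x
        # Position of the leftmost variable, if there is one.
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            continue  # only terminals, but not equal to x
        for body in grammar[form[pos]]:
            if not body:
                continue  # never apply start -> epsilon
            new_form = form[:pos] + body + form[pos + 1:]
            # Backtrack when the sentential form gets longer than x.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                stack.append(new_form)
    return False


# S -> epsilon | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
GRAMMAR = {
    "S": [
        (),
        ("0", "1"),
        ("1", "0", "1"),
        ("0", "S", "1"),
        ("1", "0", "S", "1"),
        ("1", "S", "0", "1"),
        ("1", "S", "0", "S", "1"),
    ]
}

print(derives(GRAMMAR, "00111"), derives(GRAMMAR, "01011"))
```

Termination relies on the earlier eliminations: with no ε- or unit productions, only finitely many sentential forms of length at most |x| exist.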

Page 17: Practical limitations of Algorithm 1

• This method can be very slow if x is long

• There is a faster algorithm, but it requires that we do some more transformations on the grammar

G = CFG of the Java programming language
x = code for a 200-line Java program

algorithm might take about 10^200 steps!

Page 18: Chomsky Normal Form

• A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type

A → BC   or   A → a

• Conversion to Chomsky Normal Form is easy:

A → BcDE

replace terminals with new variables:

A → BCDE
C → c

break up sequences with new variables:

A → BX1
X1 → CX2
X2 → DE
C → c
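The two conversion steps can be sketched in Python as below, assuming the grammar already has no unit productions and no ε-productions other than S → ε. The fresh variable names (`T_c` for a terminal c, and `X1`, `X2`, ...) are hypothetical and assumed not to clash with existing variables; the slide uses C where this sketch uses `T_c`.

```python
def to_cnf(grammar):
    """Convert a grammar without unit or epsilon-productions
    (except possibly S -> epsilon) to Chomsky Normal Form."""
    variables = set(grammar)
    result = {a: [] for a in grammar}
    terminal_var = {}  # terminal -> variable that produces it
    counter = 0

    for a, bodies in grammar.items():
        for body in bodies:
            if len(body) <= 1:
                result[a].append(body)  # A -> a or S -> epsilon stays
                continue
            # Step 1: replace terminals with new variables.
            symbols = []
            for s in body:
                if s not in variables:
                    if s not in terminal_var:
                        terminal_var[s] = "T_" + s
                        result[terminal_var[s]] = [(s,)]
                    s = terminal_var[s]
                symbols.append(s)
            # Step 2: break up sequences with new variables.
            head = a
            while len(symbols) > 2:
                counter += 1
                fresh = "X" + str(counter)
                result[head].append((symbols[0], fresh))
                result[fresh] = []
                head, symbols = fresh, symbols[1:]
            result[head].append(tuple(symbols))
    return result


GRAMMAR = {
    "A": [("B", "c", "D", "E")],
    "B": [("b",)],
    "D": [("d",)],
    "E": [("e",)],
}

for variable, bodies in to_cnf(GRAMMAR).items():
    print(variable, "->", bodies)
```

In this example the productions for B, D and E are placeholders added only so that every variable has a rule; the slide shows just the production for A.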

Page 19: Exercise

• Convert this CFG into Chomsky Normal Form:

S → ε | ADDA
A → a
C → c
D → bCb

Page 20: Algorithm 2 for testing membership

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

Idea: We generate each substring of x bottom up

length 5:   SAC
length 4:   –     SAC
length 3:   –     B     B
length 2:   SA    B     SC    SA
length 1:   B     AC    AC    B     AC
x:          b     a     a     b     a

Each cell lists the variables that generate the substring of the given length starting at that column; – marks an empty cell.

Page 21: Parse tree reconstruction

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

length 5:   SAC
length 4:   –     SAC
length 3:   –     B     B
length 2:   SA    B     SC    SA
length 1:   B     AC    AC    B     AC
x:          b     a     a     b     a

Tracing back the derivations, we obtain the parse tree
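A possible way to recover a parse tree is to store, next to each variable in a cell, one way it was derived. The Python sketch below does this under assumptions of its own: it uses 0-based half-open index pairs instead of the slide's cell numbering, keeps only the first derivation found for each variable, and represents trees as nested tuples. The names `cyk_parse` and `tree_yield` are hypothetical.

```python
def cyk_parse(grammar, x, start="S"):
    """Return one parse tree for x as nested tuples, or None.

    Assumes the grammar is in Chomsky Normal Form and x is nonempty.
    A leaf is (A, terminal); an inner node is (A, left, right).
    """
    k = len(x)
    # back[(s, t)][A] is one parse tree deriving x[s:t] from A.
    back = {}
    for i in range(k):
        back[(i, i + 1)] = {
            a: (a, x[i]) for a, bodies in grammar.items() if (x[i],) in bodies
        }
    for length in range(2, k + 1):
        for s in range(k - length + 1):
            t = s + length
            cell = {}
            for j in range(s + 1, t):  # split x[s:t] into x[s:j], x[j:t]
                left, right = back[(s, j)], back[(j, t)]
                for a, bodies in grammar.items():
                    for b in bodies:
                        if (a not in cell and len(b) == 2
                                and b[0] in left and b[1] in right):
                            cell[a] = (a, left[b[0]], right[b[1]])
            back[(s, t)] = cell
    return back[(0, k)].get(start)


def tree_yield(tree):
    """Concatenate the terminals at the leaves of a parse tree."""
    if len(tree) == 2:
        return tree[1]
    return tree_yield(tree[1]) + tree_yield(tree[2])


GRAMMAR = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}

tree = cyk_parse(GRAMMAR, "baaba")
print(tree)
print(tree_yield(tree))
```

Since only one derivation per variable is kept, the result is a single parse tree even when the string has several.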

Page 22: Cocke-Younger-Kasami algorithm

Input: Grammar G in CNF, string x = x1…xk

For cells in last row
    If there is a production A → xi
        Put A in table cell ii
For cells st in other rows
    If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, for some j with s ≤ j < t
        Put A in cell st

table cells:

1k
…     …
12    23    …
11    22    …    kk
x1    x2    …    xk

Cell ij remembers all variables that derive the substring xi…xj
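The table-filling procedure can be sketched in Python as follows. The dictionary encoding of the grammar and the names `cyk_table` and `cyk_accepts` are assumptions for illustration; cells are keyed by 1-based pairs (s, t) to match the cell numbering on this slide.

```python
def cyk_table(grammar, x):
    """Fill the CYK table for a CNF grammar and a nonempty string x.

    table[(s, t)] is the set of variables that derive x_s ... x_t
    (1-based, inclusive).
    """
    k = len(x)
    table = {}
    # Last row: cell ii holds every A with a production A -> x_i.
    for i in range(1, k + 1):
        table[(i, i)] = {
            a for a, bodies in grammar.items() if (x[i - 1],) in bodies
        }
    # Other rows, in order of increasing substring length.
    for length in range(2, k + 1):
        for s in range(1, k - length + 2):
            t = s + length - 1
            cell = set()
            for j in range(s, t):  # split into x_s..x_j and x_{j+1}..x_t
                for a, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[(s, j)]
                                and body[1] in table[(j + 1, t)]):
                            cell.add(a)
            table[(s, t)] = cell
    return table


def cyk_accepts(grammar, x, start="S"):
    """x is in the language iff the start variable is in cell 1k."""
    return start in cyk_table(grammar, x)[(1, len(x))]


GRAMMAR = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}

table = cyk_table(GRAMMAR, "baaba")
print(sorted(table[(1, 5)]), cyk_accepts(GRAMMAR, "baaba"))
```

The three nested loops over length, start position and split point give a running time cubic in |x| for a fixed grammar, in contrast to the exhaustive search of Algorithm 1.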