Simplification of grammar

[Figure: a compiler, consisting of a lexical analyzer and a parser, takes a program as input and produces machine code as output.]

A parser knows the grammar of the programming language

The parser finds the derivation of a particular input

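Input: 10 + 2 * 5

Grammar known to the parser: E -> E + E | E * E | INT

Derivation: E => E + E => E + E * E => 10 + E * E => 10 + 2 * E => 10 + 2 * 5

[Figure: the parser takes the input and produces the derivation above, also shown as a derivation tree. The root E has children E, +, E; the left E yields 10, and the right E has children E, *, E yielding 2 and 5.]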

Simplifications of Context-Free Grammars

A Substitution Rule

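Consider the grammar:

A -> a
A -> aaA
A -> abBc
B -> abbA
B -> b

Substitute B in A -> abBc by each right-hand side of B:

A -> a
A -> aaA
A -> ababbAc
A -> abbc

The result is an equivalent grammar (B no longer occurs on any right-hand side, so its productions are dropped).

In general:

A -> xBz
B -> y1 | y2 | ... | yn

Substitute B:

A -> x y1 z | x y2 z | ... | x yn z

The result is an equivalent grammar.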
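The substitution rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: productions are stored as (head, body) pairs, every symbol is a single character, and the function name `substitute` is hypothetical, not taken from the slides.

```python
def substitute(productions, target, var):
    """Replace the first occurrence of `var` in the body of `target`
    by every right-hand side of `var` (the substitution rule)."""
    head, body = target
    if head == var:
        raise ValueError("the rule needs two distinct variables")
    bodies_of_var = [rhs for lhs, rhs in productions if lhs == var]
    i = body.index(var)  # position of the variable being substituted
    replacements = [(head, body[:i] + y + body[i + 1:]) for y in bodies_of_var]
    result = []
    for production in productions:
        if production == target:
            result.extend(replacements)  # A -> x y1 z | ... | x yn z
        else:
            result.append(production)
    return result


# Example grammar: A -> a | aaA | abBc, B -> abbA | b
grammar = [("A", "a"), ("A", "aaA"), ("A", "abBc"), ("B", "abbA"), ("B", "b")]
new_grammar = substitute(grammar, ("A", "abBc"), "B")
for lhs, rhs in new_grammar:
    print(lhs, "->", rhs)
```

The sketch keeps the productions of B in the result; they can be dropped afterwards if B no longer occurs in any right-hand side.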

Useless Productions

S -> aSb
S -> λ
S -> A
A -> aA     (useless production)

Some derivations never terminate:

S => A => aA => aaA => ... => aa...aA => ...

Another grammar:

S -> A
A -> aA
A -> λ
B -> bA     (useless production: not reachable from S)

In general:

If S =>* xAy =>* w with w in L(G), then variable A is useful. Otherwise, variable A is useless.

A is useful if it occurs in some sentential form and a string of terminals can be derived from it.

A production A -> x is useful if all its variables are useful.

Removing Useless Productions

Example Grammar:

S -> aS | A | C
A -> a
B -> aa
C -> aCb

Recognizing useless symbols

A variable may be useless because there is no way of getting a terminal string from it (the variable cannot derive a string of terminals).

Another reason for a variable B to be useless may be that there are no x and y such that S =>* xBy (the variable does not occur in any sentential form).

First: find all variables that can produce strings with only terminals

S -> aS | A | C
A -> a
B -> aa
C -> aCb

Round 1: {A, B}
Round 2: {A, B, S}

Keep only the variables that produce terminal symbols, {A, B, S}. C is dropped, together with every production that mentions it:

S -> aS | A
A -> a
B -> aa

New grammar G' = (V', T', P', S) with V' = {S, A, B}, T' = {a}, and P':

S -> aS | A
A -> a
B -> aa

Algorithm

G = (V, T, P, S): the given context-free grammar

Construct G' = (V', T', P', S) such that V' contains only variables A for which A =>* w with w in T* is possible.

Step 1: Set V' to empty.

Step 2: Repeat the following step until no more variables are added to V':

For every A in V for which P has a production of the form A -> x1 x2 ... xn with all xi in V' ∪ T, add A to V'.

Step 3: Take P' as all the productions in P whose symbols are all in V' ∪ T.

If a terminal is not present in any production of P', remove it from T to get T'.
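Steps 1 to 3 can be sketched as a fixed-point loop. The following is a minimal illustration, assuming productions are (head, body) pairs with single-character symbols; the function names are hypothetical.

```python
def generating_variables(productions, terminals):
    """Steps 1 and 2: collect every variable that derives a terminal string."""
    generating = set()  # this is V', initially empty
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in generating and all(
                s in terminals or s in generating for s in body
            ):
                generating.add(head)
                changed = True
    return generating


def remove_nongenerating(productions, terminals):
    """Step 3: keep the productions whose symbols are all in V' or T."""
    generating = generating_variables(productions, terminals)
    allowed = generating | set(terminals)
    return [
        (head, body)
        for head, body in productions
        if head in generating and all(s in allowed for s in body)
    ]


# Example grammar: S -> aS | A | C, A -> a, B -> aa, C -> aCb
grammar = [("S", "aS"), ("S", "A"), ("S", "C"), ("A", "a"), ("B", "aa"), ("C", "aCb")]
print(generating_variables(grammar, {"a", "b"}))
print(remove_nongenerating(grammar, {"a", "b"}))
```

On the example grammar the loop finds A and B in the first pass and S in the second, matching Round 1 and Round 2.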

Example

S -> AB
A -> b
B -> a
B -> D
E -> a

V' = {} (initially)
V' = {A, B, E}
V' = {A, B, E, S}

P':

S -> AB
A -> b
B -> a
E -> a

Recognize variables that cannot be reached from the start symbol

Dependency graph: its vertices are labeled with the variables. There is an edge from C to D if and only if there is a production of the form C -> xDy.

If a variable is not reachable from the start symbol, removing it together with the affected productions and terminals does not change the language of the grammar.

Second: find all variables reachable from S

S -> aS | A
A -> a
B -> aa

Dependency graph: there is an edge from S to A; B is not reachable from S.

Keep only the variables reachable from S:

Final grammar

S -> aS | A
A -> a
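The reachability step amounts to a graph search over the dependency graph. A minimal sketch, assuming (head, body) pairs with single-character symbols and an explicit set of variables; the function names are hypothetical.

```python
def reachable_variables(productions, start, variables):
    """Variables reachable from `start` in the dependency graph."""
    reached = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for head, body in productions:
            if head != current:
                continue
            for symbol in body:  # edge current -> symbol for each variable in the body
                if symbol in variables and symbol not in reached:
                    reached.add(symbol)
                    stack.append(symbol)
    return reached


def remove_unreachable(productions, start, variables):
    """Keep only the productions whose head is reachable from `start`."""
    reached = reachable_variables(productions, start, variables)
    return [(head, body) for head, body in productions if head in reached]


# Grammar after the first step: S -> aS | A, A -> a, B -> aa
grammar = [("S", "aS"), ("S", "A"), ("A", "a"), ("B", "aa")]
print(remove_unreachable(grammar, "S", {"S", "A", "B"}))
```

Filtering by head is enough here, because every variable in the body of a reachable head is itself reachable.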

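Example (continued)

After the first step, V' = {A, B, E, S} and P' is:

S -> AB
A -> b
B -> a
E -> a

[Figure: dependency graph over the vertices S, A, B, E.]

Only A and B are reachable from S, so E can be removed.

So the new equivalent grammar has V' = {A, B, S}, T' = {a, b}, and P':

S -> AB
A -> b
B -> a

Starting from the original grammar

• S -> AB
• A -> b
• B -> a
• B -> D
• E -> a

this grammar now has only useful symbols.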

Process of removing useless symbols

1 Find an equivalent grammar by removing the symbols that cannot produce a string of terminals, i.e. keep only symbols A such that A =>* w with w in T*.

2 From the grammar so obtained, get a new grammar having only those symbols A that are present in some sentential form, i.e. S =>* xAy.

Nullable Variables

λ-production: A -> λ (λ denotes the empty string)

Nullable variable: A =>* λ

Such productions are undesirable.

Removing Nullable Variables

Example Grammar:

S -> aMb
M -> aMb
M -> λ

Nullable variable: M

Substitute M -> λ:

S -> aMb | ab
M -> aMb | ab

Final Grammar

Both grammars generate the same language {a^n b^n : n >= 1}.

Theorem: Let G be any CFG with λ not in L(G). Then there exists an equivalent grammar G' having no λ-productions.

First find the set Vn of all nullable variables of G as follows:

1 For all productions A -> λ, put A in Vn.

2 Repeat the following until no further variables are added to Vn:

For all productions B -> A1 A2 ... An where all Ai are in Vn, put B into Vn, as B is also nullable.

Once the set Vn has been found, we are ready to construct the production set P' as follows.

Look at all productions of P of the form A -> x1 x2 ... xm, m >= 1, where each xi is in V ∪ T. For each such production of P, we put into P' that production as well as all those generated by replacing nullable variables with λ in all possible combinations.

If all xi are nullable, then A -> λ is not added to P'.

Example

S -> AB
A -> aAA | λ
B -> bBB | λ

Here A and B are nullable (found in the first step), and S is also nullable (found in the second step).

So Vn = {S, A, B}, and the new set of rules is:

S -> AB | A | B
A -> aAA | aA | a
B -> bBB | bB | b

Note: the grammar obtained is not equivalent to the given one, because λ is in the given language but not in the language generated by the new grammar. The reason is that the condition of the theorem (λ not in L(G)) is not satisfied.

Example

S -> ABaC
A -> BC
B -> b | λ
C -> D | λ
D -> d

Nullable variables are A, B, C.

S -> ABaC is replaced by S -> ABaC | BaC | AaC | ABa | aC | Aa | Ba | a
A -> BC is replaced by A -> BC | B | C
B -> b | λ is replaced by B -> b
C -> D | λ is replaced by C -> D
D -> d is copied as such: D -> d

So the new equivalent grammar without any nullable symbol is:

S -> ABaC | BaC | AaC | ABa | aC | Aa | Ba | a
A -> BC | B | C
B -> b
C -> D
D -> d
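The construction of Vn and P' can be sketched as follows. This is an illustrative sketch, assuming (head, body) pairs with single-character symbols and the empty Python string standing for λ; the function names are hypothetical.

```python
from itertools import product


def nullable_variables(productions):
    """Compute Vn: an empty body (λ) or an all-nullable body makes the head nullable."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


def remove_lambda_productions(productions):
    """Build P': keep or drop each nullable symbol in every possible combination."""
    nullable = nullable_variables(productions)
    result = set()
    for head, body in productions:
        # A nullable symbol may stay or be replaced by λ (the empty string).
        options = [(s, "") if s in nullable else (s,) for s in body]
        for choice in product(*options):
            new_body = "".join(choice)
            if new_body:  # never add A -> λ
                result.add((head, new_body))
    return result


# Example grammar: S -> ABaC, A -> BC, B -> b | λ, C -> D | λ, D -> d
grammar = [("S", "ABaC"), ("A", "BC"), ("B", "b"), ("B", ""),
           ("C", "D"), ("C", ""), ("D", "d")]
for lhs, rhs in sorted(remove_lambda_productions(grammar)):
    print(lhs, "->", rhs)
```

A set is used for the result so that bodies produced by more than one combination are stored only once.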

Unit-Productions

Unit production: A -> B

Such productions are undesirable.

Removing Unit Productions

Observation: A -> A is removed immediately.

Example Grammar:

S -> aA
A -> a
A -> B
B -> A
B -> bb

Substitute A -> B:

S -> aA | aB
A -> a
B -> A | B
B -> bb

Remove B -> B:

S -> aA | aB
A -> a
B -> A
B -> bb

Substitute B -> A:

S -> aA | aB | aA
A -> a
B -> bb

Remove repeated productions:

Final grammar

S -> aA | aB
A -> a
B -> bb

In general: first find, for each A, all variables B such that A =>* B. This can be done by drawing a dependency graph with an edge (C, D) whenever the grammar has a unit production C -> D.

So A =>* B whenever there is a walk from A to B in the graph.

The new grammar G' is obtained as:

1 Include in P' all non-unit productions of P.

2 If A =>* B, add to P'

A -> y1 | y2 | ... | yn

where B -> y1 | y2 | ... | yn is the set of all rules in P' with B on the left.

Example

S -> Aa | B
B -> A | bb
A -> a | bc | B

[Figure: dependency graph over the vertices S, A, B.]

S =>* A, S =>* B, B =>* A and A =>* B

In the first step we add:

• S -> Aa
• B -> bb
• A -> a | bc

In the second step we add:

S -> a | bc | bb
A -> bb
B -> a | bc

So the new grammar is:

S -> a | bc | bb | Aa
A -> a | bb | bc
B -> a | bb | bc
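The two-step construction can be sketched as below. It is a minimal illustration, assuming (head, body) pairs with single-character symbols and an explicit set of variables; the function name is hypothetical.

```python
def remove_unit_productions(productions, variables):
    """Replace unit productions by the non-unit rules they lead to."""
    unit = [(h, b) for h, b in productions if b in variables]
    non_unit = [(h, b) for h, b in productions if b not in variables]

    # reach[A] holds every B with A =>* B using unit productions only
    # (walks in the dependency graph, including the empty walk A =>* A).
    reach = {v: {v} for v in variables}
    changed = True
    while changed:
        changed = False
        for a in variables:
            for head, body in unit:
                if head in reach[a] and body not in reach[a]:
                    reach[a].add(body)
                    changed = True

    # Steps 1 and 2: A gets every non-unit body of each B with A =>* B.
    result = set()
    for a in variables:
        for head, body in non_unit:
            if head in reach[a]:
                result.add((a, body))
    return result


# Example grammar: S -> Aa | B, B -> A | bb, A -> a | bc | B
grammar = [("S", "Aa"), ("S", "B"), ("B", "A"), ("B", "bb"),
           ("A", "a"), ("A", "bc"), ("A", "B")]
for lhs, rhs in sorted(remove_unit_productions(grammar, {"S", "A", "B"})):
    print(lhs, "->", rhs)
```

Because reach[A] always contains A itself, the non-unit productions of step 1 are included without a separate pass.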

Step 1: Remove Nullable Variables

Step 2: Remove Unit-Productions

Step 3: Remove Useless Variables

Normal Forms for Context-Free Grammars

Chomsky Normal Form

All productions have the form:

A -> BC   (variable variable)

and

A -> a   (terminal)

Examples:

S -> AS
S -> a
A -> SA
A -> b

Chomsky Normal Form

S -> AS
S -> AAS
A -> SA
A -> aa

Not Chomsky Normal Form

Conversion to Chomsky Normal Form

Example:

S -> ABa
A -> aab
B -> Ac

Not Chomsky Normal Form

Introduce variables for terminals: Ta, Tb, Tc

S -> A B Ta
A -> Ta Ta Tb
B -> A Tc
Ta -> a
Tb -> b
Tc -> c

Introduce intermediate variable V1:

S -> A V1
V1 -> B Ta
A -> Ta Ta Tb
B -> A Tc
Ta -> a
Tb -> b
Tc -> c

Introduce intermediate variable V2:

S -> A V1
V1 -> B Ta
A -> Ta V2
V2 -> Ta Tb
B -> A Tc
Ta -> a
Tb -> b
Tc -> c

Final grammar in Chomsky Normal Form:

S -> A V1
V1 -> B Ta
A -> Ta V2
V2 -> Ta Tb
B -> A Tc
Ta -> a
Tb -> b
Tc -> c

Initial grammar:

S -> ABa
A -> aab
B -> Ac

In general:

From any context-free grammar not in Chomsky Normal Form, provided the empty string is not a member of the language, we can obtain an equivalent grammar in Chomsky Normal Form.

The Procedure

First remove:

Nullable variables
Unit productions

For every terminal symbol a:

In productions: replace a with Ta

Add production Ta -> a

New variable: Ta

Replace any production A -> C1 C2 ... Cn with

A -> C1 V1
V1 -> C2 V2
...
V(n-2) -> C(n-1) Cn

New intermediate variables: V1, V2, ..., V(n-2)
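The procedure can be sketched as below, under these assumptions: λ-productions and unit productions are already gone, bodies are tuples of symbol names, and the generated names "Ta", "V1", ... do not clash with existing variables. The function name `to_cnf` is hypothetical. The sketch replaces terminals only in bodies of length two or more, so a production such as A -> a is left alone.

```python
def to_cnf(productions, terminals):
    """Convert (head, body-tuple) productions to Chomsky Normal Form."""
    terminal_var = {}  # terminal a -> new variable name "Ta"
    step1 = []
    for head, body in productions:
        if len(body) >= 2:
            new_body = []
            for s in body:
                if s in terminals:
                    terminal_var.setdefault(s, "T" + s)
                    new_body.append(terminal_var[s])
                else:
                    new_body.append(s)
            body = tuple(new_body)
        step1.append((head, tuple(body)))
    for t, var in terminal_var.items():
        step1.append((var, (t,)))  # add Ta -> a

    result = []
    counter = 0
    for head, body in step1:
        while len(body) > 2:  # A -> C1 C2 ... Cn becomes A -> C1 V, V -> C2 ... Cn
            counter += 1
            v = "V" + str(counter)
            result.append((head, (body[0], v)))
            head, body = v, body[1:]
        result.append((head, body))
    return result


# Example grammar: S -> ABa, A -> aab, B -> Ac
grammar = [("S", ("A", "B", "a")), ("A", ("a", "a", "b")), ("B", ("A", "c"))]
for lhs, rhs in to_cnf(grammar, {"a", "b", "c"}):
    print(lhs, "->", " ".join(rhs))
```

Tuples are used for bodies here because the new variable names are longer than one character.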

Theorem: For any context-free grammar there is an equivalent grammar in Chomsky Normal Form, provided the empty string is not a member of the language.

If the empty string is in the language L: first find an equivalent Chomsky Normal Form grammar for L - {λ}, then add a new start variable S' and one new production

S' -> S | λ

Observations

• Chomsky normal forms are good for parsing and proving theorems

• It is very easy to find the Chomsky normal form of any context-free grammar

Greibach Normal Form

All productions have the form:

A -> a V1 V2 ... Vk,   k >= 0

where a is a terminal symbol and V1, ..., Vk are variables.

Examples:

S -> cAB
A -> aA | bB | b
B -> b

Greibach Normal Form

S -> abSb
S -> aa

Not Greibach Normal Form

Conversion to Greibach Normal Form:

S -> abSb
S -> aa

becomes

S -> a Tb S Tb
S -> a Ta
Ta -> a
Tb -> b

Greibach Normal Form

Theorem: For any context-free grammar there is an equivalent grammar in Greibach Normal Form, if the empty string is not a member of the language.

Observations

• Greibach normal forms are very good for parsing

• It is hard to find the Greibach normal form of any context-free grammar

An Application of Chomsky Normal Forms

The CYK Membership Algorithm

Input:

• Grammar G in Chomsky Normal Form

• String w

Output:

find if w ∈ L(G)

w = a1 a2 a3 ... an

wij = ai ... aj

Vij = {A ∈ V : A =>* wij}

Clearly w ∈ L(G) if and only if S ∈ V1n

Vii = {A : A -> ai is a production}

Vij = union over k = i, ..., j-1 of {A : A -> BC with B ∈ Vik and C ∈ Vk+1,j}

The Algorithm

Input example:

• Grammar G:

S -> AB
A -> BB
A -> a
B -> AB
B -> b

• String w = aabbb

Substrings wij of w, by length:

Length 1: a, a, b, b, b
Length 2: aa, ab, bb, bb
Length 3: aab, abb, bbb
Length 4: aabb, abbb
Length 5: aabbb

Sets Vij for each substring:

Length 1: V11 = {A}, V22 = {A}, V33 = {B}, V44 = {B}, V55 = {B}
Length 2: V12 (aa) = {}, V23 (ab) = {S, B}, V34 (bb) = {A}, V45 (bb) = {A}
Length 3: V13 (aab) = {S, B}, V24 (abb) = {A}, V35 (bbb) = {S, B}
Length 4: V14 (aabb) = {A}, V25 (abbb) = {S, B}
Length 5: V15 (aabbb) = {S, B}

Since S is a member of V15, the given string is a member of L(G).
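The table-filling above can be sketched in Python. This is a minimal illustration, assuming the grammar is in Chomsky Normal Form and is given as (head, body) pairs with single-character symbols; the names `cyk_table` and `is_member` are hypothetical.

```python
def cyk_table(productions, w):
    """Return the sets Vij as a dict keyed by (i, j), 1-indexed as in the text."""
    n = len(w)
    V = {}
    for i in range(1, n + 1):
        # Vii: variables with a production A -> ai
        V[i, i] = {head for head, body in productions if body == w[i - 1]}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            V[i, j] = set()
            for k in range(i, j):  # split wij into wik and w(k+1)j
                for head, body in productions:
                    if (len(body) == 2
                            and body[0] in V[i, k]
                            and body[1] in V[k + 1, j]):
                        V[i, j].add(head)
    return V


def is_member(productions, start, w):
    """w is in L(G) if and only if the start variable is in V1n."""
    if not w:
        return False  # the productions A -> BC and A -> a never derive the empty string
    return start in cyk_table(productions, w)[1, len(w)]


# Grammar from the example: S -> AB, A -> BB | a, B -> AB | b
grammar = [("S", "AB"), ("A", "BB"), ("A", "a"), ("B", "AB"), ("B", "b")]
table = cyk_table(grammar, "aabbb")
print(table[1, 5], is_member(grammar, "S", "aabbb"))
```

The three nested loops over length, start position and split point give the usual cubic running time in the length of w.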

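Example

S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a

String: baaba

V11 = V44 = {B}
V22 = V33 = V55 = {A, C}

w12 = ba: V11 and V22 give the pairs BA, BC, so V12 = {S, A}
V23 = {B}
w34 = ab: V33 and V44 give the pairs AB, CB, so V34 = {S, C}
V45 = {S, A}

w13 = baa: the split b | aa gives the pair BB; the split ba | a gives the pairs SA, SC, AA, AC. No production matches, so V13 is the empty set.
V24 = {B}
V35 = {B}

V14 = empty set
V25 = {S, A, C}

V15 = {S, A, C}, so the string is a member of L(G).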
