Nov 02, 2014
[Figure: a compiler takes a program as input and produces machine code as output; inside the compiler, a lexical analyzer feeds a parser.]
A parser knows the grammar of the programming language.
The parser finds the derivation of a particular input.
Example: the parser receives the input 10 + 2 * 5 and uses the grammar
E -> E + E | E * E | INT
to find the derivation
E => E + E => E + E * E => 10 + E * E => 10 + 2 * E => 10 + 2 * 5
[Figure: derivation tree for 10 + 2 * 5 with root E and children E, +, E; the left E derives 10, and the right E expands to E * E, whose children derive 2 and 5.]
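To make the notion of a derivation concrete, here is a small Python sketch that checks each step of the derivation of 10 + 2 * 5: a step is valid if exactly one E is rewritten by one of the three productions. The helper names (`tokens`, `matches`, `is_one_step`) are hypothetical, and treating INT as any unsigned decimal literal is an assumption.

```python
import re

# Grammar from the slide: E -> E + E | E * E | INT.
# Assumption: INT matches any unsigned decimal literal such as 10.
RULES = [["E", "+", "E"], ["E", "*", "E"], ["INT"]]


def tokens(form: str) -> list[str]:
    """Split a sentential form such as '10 + E*E' into ['10', '+', 'E', '*', 'E']."""
    return re.findall(r"\d+|[E+*]", form)


def matches(pattern: list[str], toks: list[str]) -> bool:
    """Symbol-by-symbol comparison in which the placeholder INT matches a number."""
    return len(pattern) == len(toks) and all(
        p == t or (p == "INT" and t.isdigit()) for p, t in zip(pattern, toks)
    )


def is_one_step(before: str, after: str) -> bool:
    """True if `after` follows from `before` by rewriting exactly one E."""
    b, a = tokens(before), tokens(after)
    return any(
        matches(b[:i] + rhs + b[i + 1:], a)
        for i, sym in enumerate(b)
        if sym == "E"
        for rhs in RULES
    )


derivation = ["E", "E + E", "E + E * E", "10 + E*E", "10 + 2 * E", "10 + 2 * 5"]
print(all(is_one_step(x, y) for x, y in zip(derivation, derivation[1:])))  # True
```

Rewriting two variables at once, for example E + E to 10 + 2, is rejected, since a single derivation step applies one production.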
Simplifications of Context-Free Grammars
A Substitution Rule
Example grammar:
A -> a | aaA | abBc
B -> abbA | b
Substitute B in A -> abBc:
A -> a | aaA | ababbAc | abbc
B -> abbA | b
The result is an equivalent grammar.
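In general, given
A -> xBz
B -> y1 | y2 | … | yn
where B -> y1 | y2 | … | yn are all the productions of B and B is different from A, substitute B in A -> xBz to get the equivalent grammar
A -> xy1z | xy2z | … | xynz
B -> y1 | y2 | … | yn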
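The substitution rule can be sketched in Python as follows. This is a minimal illustration under an assumed toy encoding that is not from the slides: uppercase letters are variables, lowercase letters are terminals, and each right-hand side is a string. The function name `substitute` is hypothetical.

```python
# Toy encoding (an assumption, not from the slides): uppercase letters are
# variables, lowercase letters are terminals, each right-hand side is a string.
Grammar = dict[str, list[str]]


def substitute(g: Grammar, a: str, rhs: str, pos: int) -> Grammar:
    """Replace A -> xBz, where B = rhs[pos], by A -> x y1 z | ... | x yn z
    for all productions B -> y1 | ... | yn. Returns a new grammar."""
    b = rhs[pos]
    if not b.isupper() or b == a:
        raise ValueError("rhs[pos] must be a variable different from A")
    x, z = rhs[:pos], rhs[pos + 1:]
    new_g = {v: list(alts) for v, alts in g.items()}  # copy, leave g untouched
    new_g[a].remove(rhs)  # delete A -> xBz
    for y in g.get(b, []):
        if x + y + z not in new_g[a]:
            new_g[a].append(x + y + z)  # add A -> x y z
    return new_g


g = {"A": ["a", "aaA", "abBc"], "B": ["abbA", "b"]}
print(substitute(g, "A", "abBc", 2))
# {'A': ['a', 'aaA', 'ababbAc', 'abbc'], 'B': ['abbA', 'b']}
```

The input grammar is copied rather than modified, so the original and the substituted grammar can be compared side by side.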
Useless Productions
S -> aSb
S -> λ
S -> A
A -> aA   (useless production)
Some derivations never terminate:
S => A => aA => aaA => … => aa…aA => …
Another grammar:
S -> A
A -> aA
A -> λ
B -> bA   (useless production: B is not reachable from S)
In general:
If S =>* xAy =>* w with w ∈ L(G), then variable A is useful.
Otherwise, variable A is useless.
A is useful if it occurs in some sentential form and a string of terminals can be derived from it.
A production A -> x is useful if all its variables are useful.
Removing Useless Productions
Example grammar:
S -> aS | A | C
A -> a
B -> aa
C -> aCb
Recognizing useless symbols
A variable may be useless because there is no way of getting a terminal string from it (the variable cannot derive a string of terminals). Another reason for a variable B to be useless may be that there are no x and y such that S =>* xBy (the variable does not occur in any sentential form).
First: find all variables that can produce strings with only terminals.
Round 1: {A, B}, since A -> a and B -> aa
Round 2: {A, B, S}, since S -> A
C is never added, because its only production C -> aCb contains C itself.
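Keep only the variables that produce terminal strings, {A, B, S}, and remove C together with every production that contains it:
S -> aS | A
A -> a
B -> aa
The new grammar is G' = (V', T', P', S) with V' = {S, A, B}, T' = {a}, and P' as listed above; it is equivalent to the original grammar G.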
Algorithm
Given a context-free grammar G = (V, T, P, S), construct G' = (V', T', P', S) such that V' contains only variables A for which A =>* w with w ∈ T* is possible.
Step 1: Set V' to the empty set.
Step 2: Repeat the following step until no more variables are added to V':
For every A ∈ V for which P has a production of the form A -> x1 x2 … xn with all xi in V' ∪ T, add A to V'.
Step 3: Take P' as all the productions in P whose symbols are all in V' ∪ T.
If a terminal is not present in any production of P', remove it from T to get T'.
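A minimal Python sketch of Steps 1 to 3, using an assumed toy encoding that is not from the slides: uppercase letters are variables, lowercase letters are terminals, each right-hand side is a string, and the empty string stands for λ. The function names are hypothetical, and the sketch does not compute T' explicitly.

```python
Grammar = dict[str, list[str]]  # toy encoding: uppercase = variable, lowercase = terminal, "" = λ


def generating_variables(g: Grammar) -> set[str]:
    """Steps 1-2: the variables A with A =>* w for some terminal string w."""
    v_prime: set[str] = set()  # Step 1: V' is empty
    changed = True
    while changed:  # Step 2: repeat until no variable is added
        changed = False
        for a, alts in g.items():
            if a not in v_prime and any(
                all(s.islower() or s in v_prime for s in rhs) for rhs in alts
            ):
                v_prime.add(a)  # A -> x1...xn with every xi in V' ∪ T
                changed = True
    return v_prime


def remove_non_generating(g: Grammar) -> Grammar:
    """Step 3: keep only the productions whose symbols all lie in V' ∪ T."""
    v_prime = generating_variables(g)
    return {
        a: [rhs for rhs in alts if all(s.islower() or s in v_prime for s in rhs)]
        for a, alts in g.items()
        if a in v_prime
    }


g = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_non_generating(g))  # {'S': ['aS', 'A'], 'A': ['a'], 'B': ['aa']}
```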
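Example
Grammar: S -> AB, A -> b, B -> a, B -> D, E -> a
Initially V' = ∅
After round 1: V' = {A, B, E}
After round 2: V' = {A, B, E, S}
P': S -> AB; A -> b; B -> a; E -> a
(D never enters V', so B -> D is dropped.)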
Recognize variables that cannot be reached from the start symbol
Dependency graph: it has one vertex per variable, and an edge from C to D if and only if there is a production of the form C -> xDy.
If a variable is not reachable from the start symbol, removing it together with the affected productions and terminals does not change the language generated by the grammar.
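Reachability in the dependency graph is an ordinary graph search. The Python sketch below assumes a toy encoding (uppercase letters are variables, lowercase letters are terminals, each right-hand side is a string) and uses hypothetical function names. It runs a breadth-first search from the start symbol and keeps only the reachable variables.

```python
from collections import deque

Grammar = dict[str, list[str]]  # toy encoding: uppercase = variable, lowercase = terminal


def reachable_variables(g: Grammar, start: str = "S") -> set[str]:
    """Breadth-first search of the dependency graph (edge C -> D for each C -> xDy)."""
    seen = {start}
    queue = deque([start])
    while queue:
        c = queue.popleft()
        for rhs in g.get(c, []):
            for d in rhs:
                if d.isupper() and d not in seen:
                    seen.add(d)
                    queue.append(d)
    return seen


def remove_unreachable(g: Grammar, start: str = "S") -> Grammar:
    """Drop every variable not reachable from start, together with its productions."""
    keep = reachable_variables(g, start)
    return {a: alts for a, alts in g.items() if a in keep}


g = {"S": ["aS", "A"], "A": ["a"], "B": ["aa"]}
print(remove_unreachable(g))  # {'S': ['aS', 'A'], 'A': ['a']}
```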
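Second: find all variables reachable from S in the grammar obtained in the first step:
S -> aS | A
A -> a
B -> aa
[Figure: dependency graph with vertices S, A, B and an edge from S to A; B is not reachable from S.]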
Keep only the variables reachable from S.
Final grammar:
S -> aS | A
A -> a
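Original grammar of this example:
•S -> AB
•A -> b
•B -> a
•B -> D
•E -> a
[Figure: dependency graph of P' with vertices S, A, B, E and edges S -> A and S -> B.]
Only A and B are reachable from S, so E can be removed.
So the new equivalent grammar is V' = {A, B, S}, T' = {a, b}, with
P': S -> AB; A -> b; B -> a
This grammar now has only useful symbols.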
Process of removing useless symbols
1. Find an equivalent grammar by removing the symbols that cannot produce a string of terminals, i.e., keep only symbols A such that A =>* w for some w ∈ T*.
2. From the grammar so obtained, get a new grammar having only those symbols A each of which is present in some sentential form, i.e., S =>* xAy.
Nullable Variables
λ-production: A -> λ
Nullable variable: A =>* λ
Such productions are undesirable.
Removing Nullable Variables
Example grammar:
S -> aMb
M -> aMb
M -> λ
Nullable variable: M
Substitute M -> λ to obtain the final grammar:
S -> aMb | ab
M -> aMb | ab
Both grammars generate the same language {a^n b^n : n >= 1}.
Theorem: Let G be any CFG with λ not in L(G). Then there exists an equivalent grammar G' having no λ-productions.
First find the set Vn of all nullable variables of G as follows:
1. For all productions A -> λ, put A in Vn.
2. Repeat the following until no further variables are added to Vn: for all productions B -> A1A2…An where all Ai are in Vn, put B into Vn, as B is also nullable.
Once the set Vn has been found, construct the production set P' as follows. Look at all productions of P of the form A -> x1 x2 … xm, m >= 1, where each xi ∈ V ∪ T. For each such production, put into P' that production as well as all those generated by replacing nullable variables with λ in all possible combinations.
If all xi are nullable, the resulting production A -> λ is not added to P'.
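The construction of Vn and P' can be sketched in Python as follows. The sketch assumes a toy encoding that is not from the slides: uppercase letters are variables, lowercase letters are terminals, each right-hand side is a string, and the empty string "" represents λ. Each nullable symbol is either kept or erased, so `itertools.product` enumerates all combinations. The function names are hypothetical.

```python
from itertools import product

Grammar = dict[str, list[str]]  # toy encoding: uppercase = variable, lowercase = terminal, "" = λ


def nullable_variables(g: Grammar) -> set[str]:
    vn = {a for a, alts in g.items() if "" in alts}  # 1: every A with A -> λ
    changed = True
    while changed:  # 2: B -> A1...An with all Ai in Vn makes B nullable
        changed = False
        for b, alts in g.items():
            if b not in vn and any(all(s in vn for s in rhs) for rhs in alts):
                vn.add(b)
                changed = True
    return vn


def remove_lambda_productions(g: Grammar) -> Grammar:
    vn = nullable_variables(g)
    new_g: Grammar = {}
    for a, alts in g.items():
        out: list[str] = []
        for rhs in alts:
            # each nullable symbol is either kept or replaced by λ; others are kept
            options = [(s, "") if s in vn else (s,) for s in rhs]
            for combo in product(*options):
                new_rhs = "".join(combo)
                if new_rhs and new_rhs not in out:  # never emit A -> λ
                    out.append(new_rhs)
        new_g[a] = out
    return new_g


g = {"S": ["ABaC"], "A": ["BC"], "B": ["b", ""], "C": ["D", ""], "D": ["d"]}
print(sorted(nullable_variables(g)))  # ['A', 'B', 'C']
print(remove_lambda_productions(g)["S"])
# ['ABaC', 'ABa', 'AaC', 'Aa', 'BaC', 'Ba', 'aC', 'a']
```

A λ-production itself yields only the empty combination, which is discarded, so no A -> λ survives.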
Example
S -> AB
A -> aAA | λ
B -> bBB | λ
Here A and B are nullable (found in the first step), and S is also nullable (found in the second step), so Vn = {S, A, B}. The new set of rules is
S -> AB | A | B
A -> aAA | aA | a
B -> bBB | bB | b
Note: the grammar obtained is not equivalent to the given one, because λ is in the given language but not in the language generated by the new grammar, which is L(G) - {λ}. The reason is that the condition of the theorem (λ not in L(G)) is not satisfied.
Example
S -> ABaC
A -> BC
B -> b | λ
C -> D | λ
D -> d
Nullable variables are A, B, C.
S -> ABaC is replaced by S -> ABaC | BaC | AaC | ABa | aC | Aa | Ba | a
A -> BC is replaced by A -> BC | B | C
B -> b | λ is replaced by B -> b
C -> D | λ is replaced by C -> D
D -> d is copied as is
So the new equivalent grammar without any nullable variable is
S -> ABaC | BaC | AaC | ABa | aC | Aa | Ba | a
A -> BC | B | C
B -> b
C -> D
D -> d
Unit-Productions
Unit production: A -> B (a variable rewritten as a single variable)
Such productions are undesirable.
Removing Unit Productions
Observation: a unit production A -> A is removed immediately.
Example grammar:
S -> aA
A -> a
A -> B
B -> A
B -> bb
Substitute A -> B:
S -> aA | aB
A -> a
B -> A | B
B -> bb
Remove B -> B:
S -> aA | aB
A -> a
B -> A
B -> bb
Substitute B -> A:
S -> aA | aB | aA
A -> a
B -> bb
Remove repeated productions to obtain the final grammar:
S -> aA | aB
A -> a
B -> bb
First, for each A find all variables B such that A =>* B. This can be done by drawing a dependency graph with an edge (C, D) whenever the grammar has a unit production C -> D; then A =>* B whenever there is a walk from A to B in the graph.
The new grammar G' is obtained as follows:
1. Include all non-unit productions of P in P'.
2. If A =>* B, add to P'
A -> y1 | y2 | … | yn
where B -> y1 | y2 | … | yn is the set of all rules in P' with B on the left.
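A Python sketch of this two-step construction, assuming a toy encoding that is not from the slides: uppercase letters are variables, lowercase letters are terminals, and each right-hand side is a string. The function names are hypothetical. The walk in the dependency graph is done with a depth-first search from each variable.

```python
Grammar = dict[str, list[str]]  # toy encoding: uppercase = variable, lowercase = terminal


def is_unit(rhs: str) -> bool:
    return len(rhs) == 1 and rhs.isupper()  # A -> B with B a single variable


def unit_reachable(g: Grammar, a: str) -> set[str]:
    """All B != A with A =>* B, i.e. walks from A in the unit-production graph."""
    seen: set[str] = set()
    stack = [a]
    while stack:
        c = stack.pop()
        for rhs in g.get(c, []):
            if is_unit(rhs) and rhs not in seen:
                seen.add(rhs)
                stack.append(rhs)
    seen.discard(a)
    return seen


def remove_unit_productions(g: Grammar) -> Grammar:
    # Step 1: keep all non-unit productions (this also drops A -> A)
    new_g = {a: [r for r in alts if not is_unit(r)] for a, alts in g.items()}
    # Step 2: for A =>* B, give A a copy of every non-unit production of B
    for a in g:
        for b in sorted(unit_reachable(g, a)):
            for rhs in g.get(b, []):
                if not is_unit(rhs) and rhs not in new_g[a]:
                    new_g[a].append(rhs)
    return new_g


g = {"S": ["Aa", "B"], "B": ["A", "bb"], "A": ["a", "bc", "B"]}
print(remove_unit_productions(g))
# {'S': ['Aa', 'a', 'bc', 'bb'], 'B': ['bb', 'a', 'bc'], 'A': ['a', 'bc', 'bb']}
```

Copying from the original non-unit productions, rather than from the partially built P', keeps the result independent of the order in which variables are processed.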
Example
S -> Aa | B
B -> A | bb
A -> a | bc | B
S =>* A, S =>* B, B =>* A, and A =>* B. In the first step we add
•S -> Aa
•B -> bb
•A -> a | bc
In the second step we add
S -> a | bc | bb
A -> bb
B -> a | bc
So the new grammar is
S -> a | bc | bb | Aa
A -> a | bb | bc
B -> a | bb | bc
Step 1: Remove Nullable Variables
Step 2: Remove Unit-Productions
Step 3: Remove Useless Variables
Normal Forms for Context-Free Grammars
Chomsky Normal Form
All productions have the form
A -> BC (B and C are variables)
or
A -> a (a is a terminal)
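Examples:
S -> AS
S -> a
A -> SA
A -> b
This grammar is in Chomsky Normal Form.
S -> AS
S -> AAS
A -> SA
A -> aa
This grammar is not in Chomsky Normal Form (S -> AAS has three variables on the right, and A -> aa has two terminals).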
Conversion to Chomsky Normal Form
Example:
S -> ABa
A -> aab
B -> Ac
This grammar is not in Chomsky Normal Form.
Introduce variables for terminals, Ta, Tb, Tc:
S -> ABTa
A -> TaTaTb
B -> ATc
Ta -> a
Tb -> b
Tc -> c
Introduce intermediate variable V1:
S -> AV1
V1 -> BTa
A -> TaTaTb
B -> ATc
Ta -> a
Tb -> b
Tc -> c
Introduce intermediate variable V2:
S -> AV1
V1 -> BTa
A -> TaV2
V2 -> TaTb
B -> ATc
Ta -> a
Tb -> b
Tc -> c
The last grammar is the final grammar in Chomsky Normal Form; it is equivalent to the initial grammar
S -> ABa
A -> aab
B -> Ac
From any context-free grammar not in Chomsky Normal Form, provided the empty string is not a member of the language, we can obtain an equivalent grammar in Chomsky Normal Form.
In general, the procedure is:
First remove:
Nullable variables
Unit productions
Then, for every terminal a:
In productions with at least two symbols on the right-hand side, replace a with Ta.
Add the production Ta -> a.
Ta is a new variable.
Finally, replace any production A -> C1C2…Cn with n > 2 by
A -> C1V1
V1 -> C2V2
…
V(n-2) -> C(n-1)Cn
using the new intermediate variables V1, V2, …, V(n-2).
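The procedure can be sketched in Python, assuming the input already has no λ-productions and no unit productions. Because new names such as Ta and V1 have more than one character, this sketch encodes each right-hand side as a tuple of symbol names and treats all-lowercase names as terminals. Both choices, the function names, and the requirement that the generated names must not already occur in the grammar are assumptions for illustration.

```python
Production = tuple[str, ...]
Grammar = dict[str, list[Production]]


def is_terminal(sym: str) -> bool:
    return sym.islower()  # assumption: terminals lowercase; "A", "Ta", "V1" are variables


def to_cnf(g: Grammar) -> Grammar:
    """Convert g to CNF, assuming g has no λ-productions and no unit productions.
    New names Ta, Tb, ... and V1, V2, ... must not already occur in g."""
    out: Grammar = {}
    counter = 0

    def add(lhs: str, rhs: Production) -> None:
        alts = out.setdefault(lhs, [])
        if rhs not in alts:
            alts.append(rhs)

    for a, alts in g.items():
        for rhs in alts:
            if len(rhs) == 1:  # A -> a is already in CNF
                add(a, rhs)
                continue
            syms = []
            for s in rhs:  # replace each terminal a by Ta and add Ta -> a
                if is_terminal(s):
                    add("T" + s, (s,))
                    syms.append("T" + s)
                else:
                    syms.append(s)
            lhs = a
            while len(syms) > 2:  # A -> C1 C2 ... Cn becomes A -> C1 V1, V1 -> C2 ...
                counter += 1
                v = f"V{counter}"
                add(lhs, (syms[0], v))
                lhs, syms = v, syms[1:]
            add(lhs, tuple(syms))
    return out


g = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
for lhs, alts in to_cnf(g).items():
    print(lhs, "->", " | ".join(" ".join(r) for r in alts))
```

Applied to S -> ABa, A -> aab, B -> Ac, it produces S -> A V1, V1 -> B Ta, A -> Ta V2, V2 -> Ta Tb, B -> A Tc, Ta -> a, Tb -> b, Tc -> c. This is the same grammar as the worked example, although the variables may be printed in a different order.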
Theorem: For any context-free grammar there is an equivalent grammar in Chomsky Normal Form, provided the empty string is not a member of the language.
If the empty string is in the language L, first find an equivalent Chomsky Normal Form grammar for L - {λ}, then add a new start variable S' and the new productions
S' -> S | λ
Observations
• Chomsky normal forms are good for parsing and proving theorems
• It is very easy to find the Chomsky normal form of any context-free grammar
Greibach Normal Form
All productions have the form
A -> aV1V2…Vk, k >= 0
where a is a terminal symbol and V1, …, Vk are variables.
Examples:
S -> cAB
A -> aA | bB | b
B -> b
This grammar is in Greibach Normal Form.
S -> abSb
S -> aa
This grammar is not in Greibach Normal Form.
Conversion to Greibach Normal Form:
S -> aTbSTb
S -> aTa
Ta -> a
Tb -> b
This grammar is in Greibach Normal Form.
Theorem: For any context-free grammar there is an equivalent grammar in Greibach Normal Form, provided the empty string is not a member of the language.
Observations
• Greibach normal forms are very good for parsing
• It is hard to find the Greibach normal form of any context-free grammar
An Application of Chomsky Normal Forms
The CYK Membership Algorithm
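Input:
• Grammar G in Chomsky Normal Form
• String w
Output: find whether w ∈ L(G).
Let w = a1 a2 a3 … an and wij = ai … aj, and define
Vij = {A ∈ V : A =>* wij}
Clearly w ∈ L(G) if and only if S ∈ V1n. The sets are computed in order of increasing substring length:
Vii = {A : A -> ai}
Vij = {A : A -> BC with B ∈ Vik and C ∈ V(k+1)j for some k with i <= k < j}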
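The recurrence translates directly into a dynamic-programming table. The following Python sketch assumes a grammar already in Chomsky Normal Form. It encodes A -> a as the tuple (a,) and A -> BC as (B, C), and every terminal is a single character of the input string. The function names are hypothetical. The running time is cubic in the length of w.

```python
from itertools import product

Production = tuple[str, ...]
Grammar = dict[str, list[Production]]  # CNF: each rhs is (terminal,) or (B, C)


def cyk_table(g: Grammar, w: str) -> list[list[set[str]]]:
    """table[i][j] holds V_{i+1, j+1}: the variables deriving w[i..j] (0-based)."""
    n = len(w)
    table: list[list[set[str]]] = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):  # V_ii = {A : A -> a_i}
        table[i][i] = {a for a, alts in g.items() if (ch,) in alts}
    for length in range(2, n + 1):  # longer substrings from shorter ones
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split w_ij into w_ik and w_(k+1)j
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= {a for a, alts in g.items() if (b, c) in alts}
    return table


def cyk(g: Grammar, w: str, start: str = "S") -> bool:
    """w is in L(G) iff S is in V_1n (the empty string is never accepted here)."""
    return len(w) > 0 and start in cyk_table(g, w)[0][len(w) - 1]


g = {"S": [("A", "B")], "A": [("B", "B"), ("a",)], "B": [("A", "B"), ("b",)]}
print(cyk(g, "aabbb"))  # True
print(sorted(cyk_table(g, "aabbb")[0][4]))  # ['B', 'S']
```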
The Algorithm
• Grammar G:
S -> AB
A -> BB | a
B -> AB | b
• String: w = aabbb
Input example: fill in the sets row by row, from substrings of length 1 up to the whole string.
length 1: a: {A}   a: {A}   b: {B}   b: {B}   b: {B}
length 2: aa: ∅   ab: {S, B}   bb: {A}   bb: {A}
length 3: aab: {S, B}   abb: {A}   bbb: {S, B}
length 4: aabb: {A}   abbb: {S, B}
length 5: aabbb: {S, B} (this entry is V15)
Since S is a member of V15, the given string is a member of L(G).
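Another example:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a
String: w = baaba
V11 = V44 = {B}
V22 = V33 = V55 = {A, C}
V12 = {S, A} (w12 = ba: BA gives A, BC gives S)
V23 = {B}
V34 = {S, C} (w34 = ab: AB gives S and C, CB gives nothing)
V45 = {S, A}
V13 = ∅ (w13 = baa: neither the split b|aa nor ba|a matches a production)
V24 = {B}
V35 = {B}
V14 = ∅
V25 = {S, A, C}
V15 = {S, A, C} (for example, the split w15 = ba|aba gives AB, hence S and C)
Since S ∈ V15, the string is a member of L(G).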