
Lecture Five: Context Free Grammar (CFG)

Jan 24, 2016

Page 1: Lecture Five: Context Free Grammar (CFG)

Lecture Five:Context Free Grammar

(CFG)

Amjad Ali

CFG, Lecture 5, slide

Definition of Context-Free Grammar

There are four important components in a grammatical description of a language:

1. There is a finite set of symbols that form the strings of the language being defined. This set was {0,1} in the palindrome example we just saw. We call this alphabet the terminals, or terminal symbols.

2. There is a finite set of variables, sometimes also called nonterminals or syntactic categories. Each variable represents a language; i.e., a set of strings. In our example above, there was only one variable, P, which we used to represent the class of palindromes over alphabet {0,1}.

3. One of the variables represents the language being defined; it is called the start symbol. Other variables represent auxiliary classes of strings that are used to help define the language of the start symbol. In our example, P, the only variable, is the start symbol.

4. There is a finite set of productions or rules that represent the recursive definition of a language. Each production consists of:

a) A variable that is being (partially) defined by the production. This variable is often called the head of the production.

b) The production symbol →.

c) A string of zero or more terminals and variables. This string, called the body of the production, represents one way to form strings in the language of the variable of the head. In so doing, we leave terminals unchanged and substitute for each variable of the body any string that is known to be in the language of that variable.

Alternate Definition of Context-Free Grammar

A context-free grammar (CFG) is a collection of three things:

1. An alphabet Σ of letters called terminals, from which we are going to make strings that will be the words of a language.

2. A set of symbols called nonterminals, one of which is the symbol S, standing for “start here”.

3. A finite set of productions of the form

one nonterminal → finite string of terminals and/or nonterminals

Formal Definition of CFG

A context-free grammar is a 4-tuple (V, Σ, R, S), where

1. V is a finite set called the variables,

2. Σ is a finite set, disjoint from V, called the terminals,

3. R is a finite set of rules, with each rule being a variable and a string of variables and terminals, and

4. S ∈ V is the start variable.

Palindrome Example

Some of the rules that define the palindromes, expressed in the context-free grammar notation, are:

1. P → ^

2. P → 0

3. P → 1

4. P → 0P0

5. P → 1P1
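The palindrome grammar can be written down directly as the 4-tuple (V, Σ, R, S) of the formal definition. The sketch below is only an illustration: the names `V`, `SIGMA`, `R`, `S` and `derives` are chosen here, the empty Python string stands for the null string written ^ on the slides, and the membership test assumes that every production body contains at most one variable, which holds for this grammar but not for CFGs in general.

```python
# The palindrome grammar as a 4-tuple (V, Sigma, R, S).
# "" plays the role of the null string ^ (an encoding assumed for this sketch).
V = {"P"}
SIGMA = {"0", "1"}
R = [("P", ""), ("P", "0"), ("P", "1"), ("P", "0P0"), ("P", "1P1")]
S = "P"


def derives(variable, w):
    """Return True if `variable` derives the terminal string w.

    Assumes each body has at most one variable, so a body is either a
    terminal string or has the shape prefix + variable + suffix.
    """
    for head, body in R:
        if head != variable:
            continue
        positions = [i for i, sym in enumerate(body) if sym in V]
        if not positions:
            # Body of terminals only: it must equal w exactly.
            if body == w:
                return True
            continue
        i = positions[0]
        prefix, inner_var, suffix = body[:i], body[i], body[i + 1:]
        if (len(w) >= len(prefix) + len(suffix)
                and w.startswith(prefix) and w.endswith(suffix)):
            inner = w[len(prefix):len(w) - len(suffix)]
            if derives(inner_var, inner):
                return True
    return False


print(derives(S, "0110"), derives(S, "01"))
```

Each recursive call strips one matching symbol from both ends, mirroring productions 4 and 5, until one of the base productions 1 to 3 applies.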


Notation for CFG Derivations

Some conventions used while discussing CFG’s:

1. Lower-case letters near the beginning of the alphabet, a, b, and so on, are terminal symbols. Digits and other characters such as + or parentheses can also be used as terminals.

2. Upper-case letters near the beginning of the alphabet, A, B, and so on, are variables.

3. Lower-case letters near the end of the alphabet, such as w or z, are strings of terminals. This convention reminds us that the terminals are analogous to the input symbols of an automaton.

4. Upper-case letters near the end of the alphabet, such as X or Y, are either terminals or variables.

5. Lower-case Greek letters, such as α and β, are strings consisting of terminals and/or variables.

There is no special notation for strings that consist of variables only, since this concept plays no important role. However, a string named α or another Greek letter might happen to have only variables.

Example: A more complex CFG that represents (a simplification of) expressions in a typical programming language. The operators are limited to + and *, representing addition and multiplication respectively. Arguments are identifiers, but instead of the full set of typical identifiers (letters followed by zero or more letters and digits), we allow only the letters a and b and the digits 0 and 1. Every identifier begins with a or b, which may be followed by any string in {a, b, 0, 1}*.

Two variables are used in this grammar:

1. E represents expressions. It is the start symbol and represents the language of expressions we are defining.

2. I represents identifiers.

The productions will be:

1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1

Consider the string a*(a+b00), which is in the language of the above CFG.

One derivation of it is:

E => E * E              Production no. 3
  => I * E              Production no. 1
  => a * E              Production no. 5
  => a * (E)            Production no. 4
  => a * (E + E)        Production no. 2
  => a * (I + E)        Production no. 1
  => a * (a + E)        Production no. 5
  => a * (a + I)        Production no. 1
  => a * (a + I0)       Production no. 9
  => a * (a + I00)      Production no. 9
  => a * (a + b00)      Production no. 6

Leftmost and Rightmost Derivations

Leftmost derivation: In order to restrict the number of choices we have in deriving a string, it is often useful to require that at each step we replace the leftmost variable by one of its production bodies. Such a derivation is called a leftmost derivation.

Rightmost derivation: Similarly, we can require that at each step we replace the rightmost variable by one of its production bodies. Such a derivation is called a rightmost derivation.

Example: The inference that a*(a+b00) is in the language of variable E can be reflected in a derivation of that string, starting with the string E.

The leftmost derivation will be (=>lm marks a leftmost step and =>*lm zero or more leftmost steps):

E =>lm E * E =>lm I * E =>lm a * E =>lm a * (E) =>lm a * (E + E)
=>lm a * (I + E) =>lm a * (a + E) =>lm a * (a + I)
=>lm a * (a + I0) =>lm a * (a + I00) =>lm a * (a + b00)

We can summarize the leftmost derivation as E =>*lm a * (a + b00), or express several of its steps at once, as in E * E =>*lm a * (E).
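A leftmost derivation is fully determined by the sequence of production numbers applied, because the variable to replace is always the leftmost one. The following sketch replays the derivation of a*(a+b00) that way. The dictionary `PRODUCTIONS` uses the numbering of the ten productions given earlier; the function name and the spaceless string encoding are assumptions made for this illustration.

```python
# Productions of the expression grammar, keyed by their number on the slides.
PRODUCTIONS = {
    1: ("E", "I"), 2: ("E", "E+E"), 3: ("E", "E*E"), 4: ("E", "(E)"),
    5: ("I", "a"), 6: ("I", "b"), 7: ("I", "Ia"), 8: ("I", "Ib"),
    9: ("I", "I0"), 10: ("I", "I1"),
}
VARIABLES = {"E", "I"}


def leftmost_derivation(start, production_numbers):
    """Apply each numbered production to the leftmost variable in turn.

    Returns the list of sentential forms, starting with `start`.
    """
    form = start
    steps = [form]
    for n in production_numbers:
        head, body = PRODUCTIONS[n]
        # Position of the leftmost variable, or None if only terminals remain.
        i = next((k for k, sym in enumerate(form) if sym in VARIABLES), None)
        if i is None or form[i] != head:
            raise ValueError(f"production {n} does not apply to {form!r}")
        form = form[:i] + body + form[i + 1:]
        steps.append(form)
    return steps


steps = leftmost_derivation("E", [3, 1, 5, 4, 2, 1, 5, 1, 9, 9, 6])
print(" => ".join(steps))
```

If a production number does not match the leftmost variable, the function raises an error instead of silently rewriting a different variable, which makes it usable as a quick check of a hand-written leftmost derivation.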


The rightmost derivation will be (=>rm marks a rightmost step and =>*rm zero or more rightmost steps):

E =>rm E * E =>rm E * (E) =>rm E * (E + E) =>rm E * (E + I) =>rm E * (E + I0)
=>rm E * (E + I00) =>rm E * (E + b00) =>rm E * (I + b00)
=>rm E * (a + b00) =>rm I * (a + b00) =>rm a * (a + b00)

So the rightmost derivation can be expressed as E =>*rm a * (a + b00).
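Whether a listed sequence of sentential forms really is a rightmost derivation can be checked mechanically: each form must arise from the previous one by rewriting its rightmost variable with one of that variable's bodies. The sketch below does this for the derivation of a*(a+b00). `BODIES`, `is_rightmost_step` and `RIGHTMOST_FORMS` are names invented for this illustration, and the forms are written without spaces.

```python
# Production bodies of the expression grammar, grouped by head variable.
BODIES = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def is_rightmost_step(before, after):
    """True if `after` results from rewriting the rightmost variable of `before`."""
    positions = [k for k, sym in enumerate(before) if sym in BODIES]
    if not positions:
        return False  # no variable left to rewrite
    i = positions[-1]
    return any(before[:i] + body + before[i + 1:] == after
               for body in BODIES[before[i]])


RIGHTMOST_FORMS = [
    "E", "E*E", "E*(E)", "E*(E+E)", "E*(E+I)", "E*(E+I0)", "E*(E+I00)",
    "E*(E+b00)", "E*(I+b00)", "E*(a+b00)", "I*(a+b00)", "a*(a+b00)",
]

print(all(is_rightmost_step(x, y)
          for x, y in zip(RIGHTMOST_FORMS, RIGHTMOST_FORMS[1:])))
```

The same check rejects a leftmost step such as E*E => I*E, since there the rewritten variable is not the rightmost one.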


Inference, Derivations and Parse Trees

For a grammar with variable A and a terminal string w, the following five statements are equivalent:

I. The recursive inference procedure determines that terminal string w is in the language of variable A.

II. A =>* w.

III. A =>*lm w.

IV. A =>*rm w.

V. There is a parse tree with root A and yield w.

Some Examples:

Example#1: Let the only terminal be a and the only nonterminal be S, and let the productions be

S → aS
S → ^

The language generated is a*.

To derive a^6 in this CFG, the following derivation is used:

S => aS => aaS => aaaS => aaaaS => aaaaaS => aaaaaaS => aaaaaa^ = aaaaaa

Notice:
i. → means “can be replaced by”, as in S → aS.
ii. => means “can develop into”, as in aaS => aaaS.

Example#2: Let the terminals be a and b and the only nonterminal be S, and let the productions be

S → aS
S → bS
S → a
S → b

The language generated by this CFG is the set of all possible strings of letters a and b except for the null string, which we cannot generate.

To produce the string baab, the following derivation is used:

S => bS => baS => baaS => baab

Example#3: Let the terminals be a and b, the only nonterminal be S, and the productions be

S → aS
S → bS
S → a
S → b
S → ^

The word ab can be generated by the derivation

S => aS => abS => ab^ = ab

or by the derivation

S => aS => ab

The language of this CFG is (a+b)*, that is, all strings of a's and b's including the null string, but the sequence of productions that is used to generate a specific word is not unique.

The third and fourth productions are redundant.

Example#4: Let the terminals be a and b, the nonterminals be S and X, and the productions be

S → XaaX
X → aX
X → bX
X → ^

The words generated from S have the form

anything aa anything

or (a+b)*aa(a+b)*, which is the language of all words with a double a in them somewhere.

For example, to generate baabaab, we can proceed as follows:

S => XaaX => bXaaX => baXaaX => baaXaaX => baabXaaX => baab^aaX = baabaaX => baabaabX => baabaab^ = baabaab
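The claim that this grammar generates exactly the words containing a double a can be checked for short words by exhaustive search. The sketch below expands the leftmost variable of every sentential form, discards forms that already hold more than n terminals, and compares the resulting words with a direct test for the substring aa. The names `RULES` and `words_up_to` and the bound n = 7 are choices made for this illustration, and the empty string encodes ^.

```python
from itertools import product

# S -> XaaX, X -> aX | bX | ^   ("" encodes the null string ^)
RULES = {"S": ["XaaX"], "X": ["aX", "bX", ""]}


def words_up_to(n, start="S"):
    """Return every terminal word of length <= n derivable from `start`."""
    seen, stack, words = {start}, [start], set()
    while stack:
        form = stack.pop()
        # Expand only the leftmost variable; every word has a leftmost derivation.
        i = next((k for k, sym in enumerate(form) if sym in RULES), None)
        if i is None:
            words.add(form)
            continue
        for body in RULES[form[i]]:
            new = form[:i] + body + form[i + 1:]
            terminals = sum(1 for sym in new if sym not in RULES)
            if terminals <= n and new not in seen:
                seen.add(new)
                stack.append(new)
    return words


n = 7
generated = words_up_to(n)
expected = {"".join(p) for k in range(n + 1)
            for p in product("ab", repeat=k) if "aa" in "".join(p)}
print(generated == expected, "baabaab" in generated)
```

The search terminates because no production removes a terminal, so a form with more than n terminals can never lead to a word of length n or less. Agreement up to a fixed length is evidence for the claim, not a proof of it.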


Example#5: Let the terminals be a and b, the nonterminals be S, X, and Y, and the productions be

S → XY
X → aX
X → bX
X → a
Y → Ya
Y → Yb
Y → a

The X productions are:

X → aX
X → bX
X → a

From these productions it can be seen that:
o any string of terminals that comes from X must end in an a
o any word ending in an a can be derived from X

To derive the word babba from X, the procedure will be:

X => bX => baX => babX => babbX => babba

Now consider the variable Y. The Y productions are:

Y → Ya
Y → Yb
Y → a

It can be seen that the words that can be derived from Y are exactly those that begin with an a.

To derive abbab, the procedure will be:

Y => Yb => Yab => Ybab => Ybbab => abbab

Since S → XY, every word derived from S consists of a word ending in a followed by a word beginning with a, so the words that can be derived from S have a double a in them.

To derive babaabb, the procedure will be:

S => XY => bXY => baXY => babXY => babaY => babaYb => babaYbb => babaabb

Example#6: Let the terminals be a and b, and the three nonterminals be S, BALANCED, and UNBALANCED.

The productions are:

S → SS
S → BALANCED S
S → S BALANCED
S → ^
S → UNBALANCED S UNBALANCED
BALANCED → aa
BALANCED → bb
UNBALANCED → ab
UNBALANCED → ba

From these productions it can be seen that the language generated is the set of all words with an even number of a’s and an even number of b’s, i.e. the language EVEN-EVEN.

Derivation of word aababbab:

S=>BALANCED S

=>aaS

=>aa UNBALANCED S UNBALANCED

=>aa ba S UNBALANCED

=>aa ba S ab

=>aa ba BALANCED S ab

=>aa ba bb S ab

=>aa ba bb ^ ab

= aababbab
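Because of the productions S → SS and S → ^, sentential forms of this grammar can grow without producing terminals, so a plain search over forms would not stop. A sketch that avoids this computes, for each nonterminal, the set of words of length at most n it derives, repeating until nothing new appears. The names `RULES` and `languages_up_to`, the tuple encoding of bodies, and the bound n = 8 are assumptions of this illustration.

```python
from itertools import product

# Bodies are tuples of symbols because nonterminal names are longer than one letter.
RULES = {
    "S": [("S", "S"), ("BALANCED", "S"), ("S", "BALANCED"), (),
          ("UNBALANCED", "S", "UNBALANCED")],
    "BALANCED": [("a", "a"), ("b", "b")],
    "UNBALANCED": [("a", "b"), ("b", "a")],
}


def languages_up_to(n):
    """For each nonterminal, collect the derivable words of length <= n."""
    lang = {v: set() for v in RULES}
    changed = True
    while changed:
        changed = False
        for head, bodies in RULES.items():
            for body in bodies:
                # Concatenate the known words of each symbol in the body.
                partial = {""}
                for sym in body:
                    choices = lang[sym] if sym in RULES else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= n}
                if not partial <= lang[head]:
                    lang[head] |= partial
                    changed = True
    return lang


n = 8
generated = languages_up_to(n)["S"]
even_even = {"".join(p) for k in range(n + 1) for p in product("ab", repeat=k)
             if p.count("a") % 2 == 0 and p.count("b") % 2 == 0}
print(generated == even_even, "aababbab" in generated)
```

The loop ends because each set can only grow and contains words of bounded length. As with any bounded check, equality up to length 8 supports the EVEN-EVEN claim without proving it for all lengths.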


Example#7: Let the terminals be a and b, and let S be the only nonterminal.

The productions are:

S → aSb
S → ^

The language generated by these productions is the nonregular language a^n b^n.

Derivation of a^6 b^6 using the above productions:

S => aSb => aaSbb
=> aaaSbbb => aaaaSbbbb
=> aaaaaSbbbbb => aaaaaaSbbbbbb
=> aaaaaabbbbbb
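The derivation above follows a fixed pattern: apply S → aSb n times, then S → ^. A small sketch makes the pattern explicit and adds a membership test that runs the productions backwards by peeling one a and one b off the ends. The function names `derive_anbn` and `in_anbn` are invented for this illustration.

```python
def derive_anbn(n):
    """Return the sentential forms of the derivation of a^n b^n."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "aSb")  # production S -> aSb
        steps.append(form)
    steps.append(form.replace("S", ""))  # production S -> ^
    return steps


def in_anbn(w):
    """True if w has the form a^n b^n for some n >= 0."""
    # Undo S -> aSb while the word starts with a and ends with b.
    while len(w) >= 2 and w[0] == "a" and w[-1] == "b":
        w = w[1:-1]
    return w == ""  # what remains must be the body of S -> ^


print(" => ".join(derive_anbn(6)))
```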


Example#8: Let the terminals be a and b, and let S be the only nonterminal.

The productions are:

S → aSa
S → bSb
S → ^

The language generated by these productions is the nonregular language of even-length palindromes (a palindrome is a word that reads the same backwards as forwards). Each production adds either two letters or none, so a palindrome of odd length such as aba cannot be derived.

Derivation of word abbaabba using the above productions:

S =>aSa

=>abSba

=>abbSbba

=>abbaSabba

=>abbaabba


Example#9:

ODD PALINDROME is the language of palindromes that contain an odd number of letters.

A grammar for ODD PALINDROME is:

S → aSa
S → bSb
S → a
S → b

Adding the production S → ^ extends this grammar to the entire language PALINDROME, whose words can have either an even or an odd number of letters:

S → aSa
S → bSb
S → a
S → b
S → ^

Example#10:

A nonregular language that can be generated by a CFG is a^n b a^n.

S → aSa
S → b

Example#11:

Let the terminals be a and b, the nonterminals be S, A, and B, and the productions be

S → aB
S → bA
A → a
A → aS
A → bAA
B → b
B → bS
B → aBB

The language that this CFG generates is the language EQUAL of all strings that have an equal number of a’s and b’s in them.

Some words of this language are abba, aaabbb, and ba.
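In this grammar every production body begins with a terminal and none is the null string, so a leftmost derivation of a given word can be found by a simple guided search: the terminals already produced must match the start of the word, and a form longer than the word can be abandoned. The sketch below returns such a derivation as a list of sentential forms. The names `RULES` and `find_derivation` are assumptions of this illustration.

```python
RULES = {
    "S": ["aB", "bA"],
    "A": ["a", "aS", "bAA"],
    "B": ["b", "bS", "aBB"],
}


def find_derivation(word, form="S"):
    """Return a leftmost derivation of `word` as a list of forms, or None."""
    i = next((k for k, sym in enumerate(form) if sym in RULES), None)
    if i is None:
        return [form] if form == word else None
    # Prune: forms never shrink, and the terminal prefix must match the word.
    if len(form) > len(word) or form[:i] != word[:i]:
        return None
    for body in RULES[form[i]]:
        rest = find_derivation(word, form[:i] + body + form[i + 1:])
        if rest is not None:
            return [form] + rest
    return None


for w in ["abba", "aaabbb", "ba"]:
    print(" => ".join(find_derivation(w)))
```

For a word with unequal numbers of a's and b's, such as aab, the search runs out of options and returns None.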


Ambiguity

Definition: A CFG is called ambiguous if, for at least one word in the language that it generates, there are two possible derivations of the word that correspond to different syntax trees. If a CFG is not ambiguous, it is called unambiguous.

Ambiguous Grammars:

Consider the sentential form E + E * E. It has two derivations from E:

1. E => E + E => E + E * E

2. E => E * E => E + E * E

      E                        E
    / | \                    / | \
   E  +  E                  E  *  E
       / | \              / | \
      E  *  E            E  +  E

    fig. I                  fig. II

Two parse trees with the same yield
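The number of parse trees with a given yield can be counted by trying every operator as the one introduced at the root. The sketch below models only the productions E → E + E and E → E * E and treats a lone E as a one-node tree, which matches the sentential forms used in this example. The name `count_trees` and the spaceless encoding are choices made for this illustration.

```python
from functools import lru_cache

OPERATORS = "+*"  # only E -> E + E and E -> E * E are modelled here


@lru_cache(maxsize=None)
def count_trees(form):
    """Count parse trees with root E whose yield is `form` (written without spaces)."""
    total = 1 if form == "E" else 0  # a single node E has yield E
    for k, sym in enumerate(form):
        if sym in OPERATORS:
            # The root applies E -> E op E with the operator at position k.
            total += count_trees(form[:k]) * count_trees(form[k + 1:])
    return total


print(count_trees("E+E*E"))  # 2
```

A count greater than one for some yield is exactly what makes the grammar ambiguous. The same function gives two trees for E+E+E, the case of identical operators grouping from the left or from the right.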


Removing Ambiguity from Grammars

There are two causes of ambiguity in the previous ambiguous grammar:

I. The precedence of operators is not respected. While fig. I properly groups the * before the + operator, fig. II is also a valid parse tree and groups the + ahead of the *. We need to force only the structure of fig. I to be legal in an unambiguous grammar.

II. A sequence of identical operators can group either from the left or from the right. For example, if the *’s in figs. I and II were replaced by +’s, we would see two different parse trees for the string E + E + E. Since addition and multiplication are associative, it doesn’t matter whether we group from the left or the right, but to eliminate ambiguity, we must pick one. The conventional approach is to insist on grouping from the left, so the structure of fig. II is the only correct grouping of two +-signs.