
Dec 26, 2015


Chapter 4

Context-Free Languages

Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.


Using Grammar Rules to Define a Language

• Regular languages and FAs are too simple for many purposes
  – Using context-free grammars allows us to describe more interesting languages
  – Much high-level programming language syntax can be expressed with context-free grammars
  – Context-free grammars with a very simple form provide another way to describe the regular languages

• Grammars can be ambiguous

• We will study how derivations can be related to the structure of the string being derived


Using Grammar Rules to Define a Language (cont’d.)

• A grammar is a set of rules, usually simpler than those of English, by which strings in a language can be generated

• Consider the language AnBn = {aⁿbⁿ | n ≥ 0}, defined using the recursive definition:
  – Λ ∈ AnBn
  – For every S ∈ AnBn, aSb ∈ AnBn

• Think of S as a variable representing an arbitrary element, and write these rules as S → Λ and S → aSb (in the process of obtaining an element of AnBn, S can be replaced by either string)


Using Grammar Rules to Define a Language (cont’d.)

• If α and β are strings, and α contains at least one occurrence of S, then α ⇒ β means that β is obtained from α in one step, by using one of the two rules to replace a single occurrence of S by either Λ or aSb

• For example, we could write S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb to describe a derivation of the string aaabbb

• We can simplify the rules by using the | symbol to mean “or”, so that the rules become S → Λ | aSb
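The derivation of aaabbb can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the function name derive_anbn is hypothetical, and the empty Python string stands in for Λ.

```python
def derive_anbn(n):
    """List the sentential forms in a derivation of a^n b^n.

    Applies S -> aSb exactly n times, then S -> Λ once.
    """
    current = "S"
    steps = [current]
    for _ in range(n):
        current = current.replace("S", "aSb", 1)  # rule S -> aSb
        steps.append(current)
    current = current.replace("S", "", 1)         # rule S -> Λ
    steps.append(current)
    return steps


# Print the derivation, writing Λ for the empty string.
print(" => ".join(form or "Λ" for form in derive_anbn(3)))
# S => aSb => aaSbb => aaaSbbb => aaabbb
```

Each call uses the second rule n times and the first rule once, so the derivation always has n + 1 steps.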


Context-Free Grammars: Definitions and More Examples

• Definition: A context-free grammar (CFG) is a 4-tuple G = (V, Σ, S, P), where V and Σ are disjoint finite sets, S ∈ V, and P is a finite set of formulas of the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*
  – Elements of Σ are terminal symbols, or terminals, and elements of V are variables, or nonterminals
  – S is the start variable, and elements of P are grammar rules, or productions
  – We use → for productions in a grammar and ⇒ for a step in a derivation
  – The notations ⇒ⁿ and ⇒* refer to n steps and zero or more steps, respectively


Context-Free Grammars: Definitions and More Examples (cont’d.)

• We will sometimes write ⇒_G to indicate a derivation in a particular grammar G

• α ⇒ β means that there are strings α₁, α₂, and γ in (V ∪ Σ)* and a production A → γ in P such that α = α₁Aα₂ and β = α₁γα₂
  – This is a single step in a derivation

• What makes the grammar context-free is that the production above, with left side A, can be applied wherever A occurs in the string (irrespective of the context; i.e., regardless of what α₁ and α₂ are)
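A single derivation step can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the book's notation: every symbol is one character, the dictionary keys are the variables, and the empty string stands for Λ. The name one_step is hypothetical.

```python
def one_step(alpha, productions):
    """Return every string beta that can be obtained from alpha in one step.

    productions maps each variable A to the list of right sides gamma
    of its productions A -> gamma.
    """
    results = set()
    for i, symbol in enumerate(alpha):
        # A production for A applies wherever A occurs, whatever surrounds it.
        for gamma in productions.get(symbol, []):
            # alpha = alpha1 A alpha2  becomes  beta = alpha1 gamma alpha2
            results.add(alpha[:i] + gamma + alpha[i + 1:])
    return results


P = {"S": ["", "aSb"]}            # S -> Λ | aSb
print(sorted(one_step("aSb", P)))  # ['aaSbb', 'ab']
```

The loop never inspects the symbols around position i, which is exactly the sense in which the grammar is context-free.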


Context-Free Grammars: Definitions and More Examples (cont’d.)

• Definition: If G = (V, Σ, S, P) is a CFG, the language generated by G is

L(G) = { x ∈ Σ* | S ⇒_G* x }

(S is the start variable, and x is a string of terminals)

• A language L is a context-free language (CFL) if there is a CFG G with L = L(G)


Context-Free Grammars: Definitions and More Examples (cont’d.)

• Consider AEqB = {x ∈ {a,b}* | n_a(x) = n_b(x)}

• Let’s develop a CFG for AEqB

• If x is a non-null string in AEqB then either x = ay, where y ∈ L_b = {z | n_b(z) = n_a(z) + 1}, or x = by, where y ∈ L_a = {z | n_a(z) = n_b(z) + 1}
  – We represent L_b by the variable B and L_a by the variable A
  – The productions so far are S → Λ | aB | bA
  – All we need now are productions for A and B


Context-Free Grammars: Definitions and More Examples (cont’d.)

• If a string x ∈ L_a starts with a, then the remainder is a member of AEqB

• If it starts with b, the rest has two more a’s than b’s

• Observation: a string containing two more a’s than b’s must be the concatenation of two strings, each with one more a; similarly with a and b reversed

• The grammar resulting from these observations is
  S → Λ | aB | bA
  A → aS | bAA
  B → bS | aBB

(Note: if A were the start variable, it would generate L_a)
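One way to gain confidence in the AEqB grammar is to enumerate the short strings it generates and compare them with a brute-force listing. The sketch below is an illustration only; the helper names generate and aeqb are hypothetical. It follows leftmost derivations and prunes any sentential form with more than max_len terminals, which terminates for this grammar because every production except S → Λ adds a terminal.

```python
from itertools import product

# S -> Λ | aB | bA,  A -> aS | bAA,  B -> bS | aBB  (empty string = Λ)
P = {"S": ["", "aB", "bA"], "A": ["aS", "bAA"], "B": ["bS", "aBB"]}


def generate(productions, start, max_len):
    """Collect all terminal strings of length <= max_len derivable from start."""
    seen, stack, language = {start}, [start], set()
    while stack:
        form = stack.pop()
        # Index of the leftmost variable in the sentential form, if any.
        i = next((k for k, c in enumerate(form) if c in productions), None)
        if i is None:
            language.add(form)  # no variables left: a string of terminals
            continue
        for rhs in productions[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(1 for c in new if c not in productions)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return language


def aeqb(max_len):
    """Brute force: strings over {a, b} with equally many a's and b's."""
    return {"".join(w)
            for n in range(max_len + 1)
            for w in product("ab", repeat=n)
            if w.count("a") == w.count("b")}


print(generate(P, "S", 6) == aeqb(6))  # True
```

Agreement up to a length bound is evidence, not a proof; the argument on the slides is what establishes correctness.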


Context-Free Grammars: Definitions and More Examples (cont’d.)

• Theorem 4.9: If L1 and L2 are CFLs over Σ, then so are L1 ∪ L2, L1L2, and L1*

• Suppose G1 and G2 are CFGs that generate L1 and L2 respectively, and assume that they have no variables in common

• Suppose that S1 and S2 are the start variables. Su, Sc, and Sk, the start variables of the new grammars, will be new variables.
  – Gu just adds the rules Su → S1 | S2 to G1 and G2
  – Gc just adds the rule Sc → S1S2 to G1 and G2
  – Gk just adds the rules Sk → Λ | SkS1 to G1
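The three constructions are small enough to write out directly. The following Python sketch assumes a representation that is not in the slides: a grammar is a dictionary from variable names to lists of right sides, each right side is a list of symbol names, and the empty list stands for Λ. The function names are hypothetical.

```python
def union_grammar(p1, s1, p2, s2, new_start="Su"):
    """Gu: the productions of G1 and G2 plus Su -> S1 | S2."""
    assert not set(p1) & set(p2), "G1 and G2 must have no variables in common"
    assert new_start not in p1 and new_start not in p2
    return {**p1, **p2, new_start: [[s1], [s2]]}, new_start


def concat_grammar(p1, s1, p2, s2, new_start="Sc"):
    """Gc: the productions of G1 and G2 plus Sc -> S1 S2."""
    assert not set(p1) & set(p2), "G1 and G2 must have no variables in common"
    assert new_start not in p1 and new_start not in p2
    return {**p1, **p2, new_start: [[s1, s2]]}, new_start


def star_grammar(p1, s1, new_start="Sk"):
    """Gk: the productions of G1 plus Sk -> Λ | Sk S1."""
    assert new_start not in p1
    return {**p1, new_start: [[], [new_start, s1]]}, new_start


g1 = {"S1": [[], ["a", "S1", "b"]]}   # generates a^n b^n
g2 = {"S2": [["c"]]}                  # generates {c}
gu, su = union_grammar(g1, "S1", g2, "S2")
print(gu[su])  # [['S1'], ['S2']]
```

The disjointness check matters: if G1 and G2 shared a variable, a derivation in the combined grammar could mix productions from both and generate strings outside the intended language.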


Regular Languages and Regular Grammars

• The three operations in Theorem 4.9 are the ones involved in the recursive definition of regular languages

• The “basic” regular languages over Σ, namely ∅ and {σ} for σ ∈ Σ, are easily seen to be CFLs

• Now we can prove by structural induction that every regular language over Σ is a CFL

• In fact, however, the CFG can be of a simpler form.

Definition 4.13: A context-free grammar is regular if every production is of the form A → σB or A → Λ


Regular Languages and Regular Grammars (cont’d.)

• Theorem 4.14: For every language L ⊆ Σ*, L is regular if and only if L = L(G) for some regular grammar G

• Proof:
  – If L is a regular language, then there is an FA M = (Q, Σ, q0, A, δ) that accepts it
  – Define G = (V, Σ, S, P) by letting V be Q, S the initial state q0, and P the set containing the production T → aU for every transition δ(T, a) = U in M and the production T → Λ for every accepting state T of M


Regular Languages and Regular Grammars (cont’d.)

• G is a regular grammar, and G generates the same language that M accepts
  – For every x = a1a2…an, the transitions on these symbols that start at q0 end at an accepting state if and only if there is a derivation of x in G

• To prove the other direction we can start with a regular grammar G and reverse the construction to produce M
  – M may be an NFA, but it still accepts L(G), and it follows that L(G) is regular
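The construction in the proof translates almost line for line into code. The sketch below is an assumption-laden illustration: the transition function is a dictionary keyed by (state, symbol), a production T → aU is stored as the pair (a, U), and T → Λ as the empty tuple. The function names and the example FA (strings with an even number of a's) are hypothetical.

```python
def fa_to_regular_grammar(delta, q0, accepting):
    """Build the regular grammar of the proof from an FA.

    Variables are the states; the start variable is q0.
    """
    productions = {}
    for (state, symbol), target in delta.items():
        productions.setdefault(state, []).append((symbol, target))  # T -> aU
        productions.setdefault(target, [])
    for state in accepting:
        productions.setdefault(state, []).append(())                # T -> Λ
    return productions, q0


def generates(productions, start, x):
    """Decide x in L(G) by tracking which variables can remain after each symbol.

    This is the reverse direction of the proof: the grammar is read as an NFA.
    """
    current = {start}
    for symbol in x:
        current = {rhs[1] for v in current for rhs in productions[v]
                   if rhs and rhs[0] == symbol}
    return any(() in productions[v] for v in current)


# FA with states E (even number of a's, accepting) and O (odd).
delta = {("E", "a"): "O", ("E", "b"): "E", ("O", "a"): "E", ("O", "b"): "O"}
P, S = fa_to_regular_grammar(delta, "E", {"E"})
print(generates(P, S, "abab"), generates(P, S, "ab"))  # True False
```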


Derivation Trees and Ambiguity

• So far we’ve been interested in what strings a CFG generates

• It is also useful to consider how a string is generated by a CFG

• A derivation may provide information about the structure of a string, and if a string has several possible derivations, one may be more appropriate than another

• We can draw trees to represent derivations


Derivation Trees and Ambiguity (cont’d.)

• The root node represents the start variable S

• Any interior node and its children represent a production A → α used in the derivation; the node represents A, and the children, from left to right, represent the symbols in α

• Each leaf node represents a symbol or Λ

• The string derived is read off from left to right, ignoring Λ’s

• Every derivation has exactly one derivation tree, but a tree can represent more than one derivation


Derivation Trees and Ambiguity (cont’d.)

• In a derivation, at each step some production is applied to some occurrence of a variable

• Consider a derivation that starts S ⇒ S + S. We could apply a production to either the first or second of the S’s, but the resulting trees would be the same

• When we talk about a string having several possible derivations, one being more appropriate, we are talking about derivations corresponding to different trees


Derivation Trees and Ambiguity (cont’d.)

• We can distinguish between trivially different derivations and essentially different ones by specifying that in a derivation, we always choose the left-most variable to expand

• Definition 4.16: A derivation in a CFG is a leftmost derivation (LMD) if, at each step, a production is applied to the leftmost variable-occurrence in the current string
  – A rightmost derivation (RMD) is defined similarly


Derivation Trees and Ambiguity (cont’d.)

• Theorem 4.17: If G is a CFG, then for any x ∈ L(G) these three statements are equivalent:
  – x has more than one derivation tree
  – x has more than one LMD
  – x has more than one RMD

• Proof: see book

• Definition 4.18: A CFG G is ambiguous if, for at least one x ∈ L(G), x has more than one derivation tree (or equivalently, according to Theorem 4.17, more than one LMD)
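To make the LMD criterion concrete, the sketch below enumerates every leftmost derivation of a given string and reports how many there are. It is an illustration under stated assumptions, not a general parser: symbols are single characters, the grammar has no Λ-productions and no unit productions (so sentential forms never shrink and the search can be pruned by length), and the small grammar S → S+S | a is chosen only as an example. The name leftmost_derivations is hypothetical.

```python
def leftmost_derivations(productions, start, x):
    """Return every leftmost derivation of x as a list of sentential forms."""
    found = []

    def extend(path):
        form = path[-1]
        # Index of the leftmost variable, if any.
        i = next((k for k, c in enumerate(form) if c in productions), None)
        if i is None:
            if form == x:
                found.append(path)
            return
        # Forms never shrink, and the terminals before the leftmost
        # variable are final, so mismatches can be abandoned early.
        if len(form) > len(x) or form[:i] != x[:i]:
            return
        for rhs in productions[form[i]]:
            extend(path + [form[:i] + rhs + form[i + 1:]])

    extend([start])
    return found


G = {"S": ["S+S", "a"]}
for d in leftmost_derivations(G, "S", "a+a+a"):
    print(" => ".join(d))
# Two derivations are printed, so this grammar is ambiguous.
```

The two derivations differ in their second step: one expands the leftmost S by S → S+S again, the other by S → a, and they correspond to the two different derivation trees for a+a+a.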


Derivation Trees and Ambiguity (cont’d.)

• A classic example of ambiguity is the dangling else

• In C, an if-statement can be defined by

S → if ( E ) S | if ( E ) S else S | OS

(where OS stands for “other statement”)

• Consider the statement

if (e1) if (e2) f(); else g();
  – In C, the else belongs to the second if, but this grammar does not rule out the other interpretation

• The two derivation trees shown on the next slide show the two interpretations of a dangling else


[Figure: the two derivation trees for the statement if (e1) if (e2) f(); else g(); one for each interpretation of the dangling else]


Derivation Trees and Ambiguity (cont’d.)

• Clearly the grammar given is ambiguous, but there are equivalent grammars that allow only the correct interpretation

• Example:

S → S1 | S2

S1 → if ( E ) S1 else S1 | OS

S2 → if ( E ) S | if ( E ) S1 else S2


Derivation Trees and Ambiguity (cont’d.)

• Consider the CFG G: S → S + S | S * S | (S) | a

• G generates simple algebraic expressions

• One reason for ambiguity is that the relative precedence of + and * hasn’t been specified: a+a*a could be interpreted as (a+a)*a or as a+(a*a)

• In fact, S → S + S causes ambiguity by itself, because a+a+a could be interpreted as either (a+a)+a or a+(a+a). Similarly for S → S * S

• We might try to correct both problems by using the productions

S → S + T | T
T → T * F | F

(think of T as “term” and F as “factor”)


Derivation Trees and Ambiguity (cont’d.)

• * now has higher precedence than + (all the multiplications are performed within a term)

• By making the production S → S + T, not S → T + S, we make + associate to the left. Similarly for *

• We want parenthetical expressions to be evaluated first; this means we should consider such an expression to be part of a factor. The resulting unambiguous CFG generating L(G) is

S → S + T | T
T → T * F | F
F → (S) | a

(proofs of unambiguity and equivalence are both somewhat complicated)
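The difference between the two expression grammars can be observed by counting derivation trees for sample strings. The following Python sketch is illustrative only and is no substitute for the proofs: it assumes single-character symbols, no Λ-productions, and no cycles of unit productions (both grammars here satisfy this), and the name count_trees is hypothetical.

```python
from functools import lru_cache


def count_trees(productions, start, x):
    """Count the derivation trees with root `start` and yield x."""

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Number of trees with root `symbol` whose yield is x[i:j].
        if symbol not in productions:  # terminal symbol
            return 1 if j - i == 1 and x[i] == symbol else 0
        return sum(split(tuple(rhs), i, j) for rhs in productions[symbol])

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        # Number of ways the symbols of rhs can jointly derive x[i:j].
        if not rhs:
            return 1 if i == j else 0
        if len(rhs) == 1:
            return trees(rhs[0], i, j)
        # Every symbol must derive at least one character.
        return sum(trees(rhs[0], i, k) * split(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return trees(start, 0, len(x))


ambiguous = {"S": ["S+S", "S*S", "(S)", "a"]}
unambiguous = {"S": ["S+T", "T"], "T": ["T*F", "F"], "F": ["(S)", "a"]}

print(count_trees(ambiguous, "S", "a+a*a"))    # 2
print(count_trees(unambiguous, "S", "a+a*a"))  # 1
```

A count of 1 on a few test strings is consistent with unambiguity but does not establish it; a count greater than 1 on any single string does prove ambiguity.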


Simplified Forms and Normal Forms

• Questions about the strings generated by a CFG are sometimes easier to answer if we know something about the form of the productions
  – For example, if we know that a grammar has no Λ-productions and no unit productions (A → B) we can deduce that no derivation of a string x can take more than 2|x| - 1 steps (see book for details). We could then, in principle, determine whether x can be derived by considering derivations no longer than this

• We show how to modify an arbitrary CFG to have no productions of either of these types


Simplified Forms and Normal Forms (cont’d.)

• Suppose we have the production A → BCDCB, and Λ can be derived from either B or C. If we get rid of Λ-productions, then the steps that replace B and C by Λ will no longer be possible, but we must still be able to get all the same non-null strings from A

• We must retain the production A → BCDCB but we should add A → CDCB, A → DCB, A → BDCB, and so on

• We will need to know what variables can derive Λ (we will call such a variable a nullable variable)


Simplified Forms and Normal Forms (cont’d.)

• Definition 4.26: A recursive definition of the set of nullable variables of G
  – If there is a production A → Λ then A is nullable
  – If A1, A2, …, Ak are nullable variables and there is a production B → A1A2…Ak, then B is nullable

• This leads immediately to an algorithm for identifying the nullable variables
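The algorithm suggested by the recursive definition is a fixed-point iteration: keep adding variables until a full pass adds nothing new. The sketch below is one possible Python rendering, with single-character symbols and the empty string standing for Λ; the function name and the example grammar are hypothetical.

```python
def nullable_variables(productions):
    """Compute the set of nullable variables by repeated passes."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, right_sides in productions.items():
            if var in nullable:
                continue
            # A right side qualifies if every symbol in it is already known
            # to be nullable; for the empty right side (A -> Λ) this holds
            # trivially, which covers the base case of the definition.
            if any(all(s in nullable for s in rhs) for rhs in right_sides):
                nullable.add(var)
                changed = True
    return nullable


G = {"S": ["ABCBCDA"], "A": ["CD"], "B": ["Cb"], "C": ["a", ""], "D": ["bD", ""]}
print(sorted(nullable_variables(G)))  # ['A', 'C', 'D']
```

Here C and D are nullable by the first rule, A by the second (through A → CD), while B and S are not, because every right side of B contains the terminal b and the only right side of S contains B.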


Simplified Forms and Normal Forms (cont’d.)

• Theorem 4.27: For every CFG G = (V, Σ, S, P) the following algorithm produces a CFG G1 = (V, Σ, S, P1) having no Λ-productions for which L(G1) = L(G) – {Λ}
  – Identify the nullable variables in V and initialize P1 to P
  – For every production A → α in P, add to P1 every production obtained by deleting from α one or more variable-occurrences involving a nullable variable
  – Delete every Λ-production from P1, as well as every production of the form A → A
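The three steps of the algorithm can be sketched as follows. This is a minimal illustration assuming single-character symbols and the empty string for Λ; the function names are hypothetical. The example reuses the production A → BCDCB discussed earlier, with B and C made nullable by hypothetical productions B → b | Λ and C → c | Λ.

```python
from itertools import product


def nullable_variables(productions):
    """Fixed-point computation of the nullable variables."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for var, right_sides in productions.items():
            if var not in nullable and any(
                    all(s in nullable for s in rhs) for rhs in right_sides):
                nullable.add(var)
                changed = True
    return nullable


def eliminate_null_productions(productions):
    """Return P1: no Λ-productions, same language except possibly Λ."""
    nullable = nullable_variables(productions)
    new = {}
    for var, right_sides in productions.items():
        result = set()
        for rhs in right_sides:
            # Each nullable occurrence may be kept or deleted independently.
            options = [(s, "") if s in nullable else (s,) for s in rhs]
            for choice in product(*options):
                candidate = "".join(choice)
                # Drop Λ-productions and productions of the form A -> A.
                if candidate and candidate != var:
                    result.add(candidate)
        new[var] = sorted(result)
    return new


G = {"A": ["BCDCB"], "B": ["b", ""], "C": ["c", ""], "D": ["d"]}
G1 = eliminate_null_productions(G)
print(len(G1["A"]))  # 16
```

The 16 right sides for A come from keeping or deleting each of the four nullable occurrences in BCDCB; D is never deleted, so none of them is empty.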


Simplified Forms and Normal Forms (cont’d.)

• The procedure we use to eliminate unit productions is similar

• We first identify pairs of variables (A, B) for which A ⇒* B (in this case we call B A-derivable); then for each such pair (A, B) and each nonunit production B → α, we add the production A → α

• Such pairs can be found as follows:
  – If A → B is a production, then B is A-derivable
  – If C is A-derivable and C → B is a production, then B is A-derivable
  – No other variables are A-derivable


Simplified Forms and Normal Forms (cont’d.)

• Theorem 4.28: For every CFG G = (V, Σ, S, P) without Λ-productions, the CFG G1 = (V, Σ, S, P1) produced by the following algorithm generates the same language as G and has no unit productions:
  – Initialize P1 to P, and for each A ∈ V, identify the A-derivable variables
  – For every such pair (A, B) and every nonunit production B → α, add the production A → α to P1
  – Delete all unit productions from P1
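A possible Python rendering of this algorithm is sketched below. It assumes single-character symbols and a grammar that already has no Λ-productions, as the theorem requires; the function name is hypothetical, and the expression grammar with S, T, and F is used only as a convenient example containing unit productions.

```python
def eliminate_unit_productions(productions):
    """Return P1 with the same language and no unit productions."""

    def is_unit(rhs):
        # A unit production has a single variable as its right side.
        return len(rhs) == 1 and rhs in productions

    new = {}
    for a in productions:
        # Find the A-derivable variables by closing over unit productions.
        derivable, frontier = set(), [a]
        while frontier:
            c = frontier.pop()
            for rhs in productions[c]:
                if is_unit(rhs) and rhs not in derivable:
                    derivable.add(rhs)
                    frontier.append(rhs)
        # Keep A's own nonunit productions and copy those of each
        # A-derivable variable B as productions of A.
        result = {rhs for rhs in productions[a] if not is_unit(rhs)}
        for b in derivable:
            result |= {rhs for rhs in productions[b] if not is_unit(rhs)}
        new[a] = sorted(result)
    return new


G = {"S": ["S+T", "T"], "T": ["T*F", "F"], "F": ["(S)", "a"]}
G1 = eliminate_unit_productions(G)
print(G1["S"])  # ['(S)', 'S+T', 'T*F', 'a']
```

In this example T and F are S-derivable and F is T-derivable, so S inherits the nonunit productions of both T and F.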


Simplified Forms and Normal Forms (cont’d.)

• Definition 4.29: A CFG is said to be in Chomsky normal form if every production is of one of these two types:

A → BC (where B and C are variables)

A → σ (where σ is a terminal)

• Theorem 4.30: For every context-free grammar G, there is another CFG G1 in Chomsky normal form such that L(G1) = L(G) – {Λ}

• The algorithm on the next slide shows how to generate G1


Simplified Forms and Normal Forms (cont’d.)

• The first step is to eliminate Λ-productions and unit productions

• The second step is to introduce for every terminal symbol σ a new variable X_σ and production X_σ → σ

• In every production whose right side has at least two symbols, replace every terminal by its new variable (productions of the form A → σ, including the new ones above, are left unchanged)

• Replace a production like A → BACB by the productions A → BY1, Y1 → AY2, Y2 → CB, where Y1 and Y2 are new variables

• The resulting CFG is in Chomsky normal form
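The last three steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: Λ-productions and unit productions are assumed to have been eliminated already, right sides are tuples of symbol names, and the generated names X_σ and Y1, Y2, … are assumed not to clash with existing variables. The function name and the example grammar are hypothetical.

```python
def to_chomsky_normal_form(productions, terminals):
    """Convert a grammar without Λ- or unit productions to Chomsky normal form."""
    # Step 2 and 3: add X_σ -> σ and replace terminals in longer right sides.
    replaced = {}
    for var, right_sides in productions.items():
        replaced[var] = []
        for rhs in right_sides:
            if len(rhs) == 1:  # A -> σ already has the required form
                replaced[var].append(tuple(rhs))
            else:
                replaced[var].append(
                    tuple("X_" + s if s in terminals else s for s in rhs))
    for t in sorted(terminals):
        replaced["X_" + t] = [(t,)]

    # Step 4: break right sides longer than two into chains of new variables.
    result, counter = {}, 0
    for var, right_sides in replaced.items():
        result.setdefault(var, [])
        for rhs in right_sides:
            left = var
            while len(rhs) > 2:
                counter += 1
                fresh = "Y" + str(counter)
                result.setdefault(left, []).append((rhs[0], fresh))
                left, rhs = fresh, rhs[1:]
            result.setdefault(left, []).append(rhs)
    return result


G = {"A": [("B", "A", "C", "B"), ("a",)], "B": [("b",)], "C": [("c", "B")]}
cnf = to_chomsky_normal_form(G, {"a", "b", "c"})
print(cnf["A"], cnf["Y1"], cnf["Y2"])
# [('B', 'Y1'), ('a',)] [('A', 'Y2')] [('C', 'B')]
```

The sketch creates X_σ for every terminal, whether or not it is needed, which matches the description above at the cost of a few unused variables.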