CSCI 3130: Automata theory and formal languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130 The Chinese University of Hong Kong Context-free languages Fall 2010

Jan 19, 2016

Page 1: CSCI 3130: Automata theory and formal languages

CSCI 3130: Automata theory and formal languages

Andrej Bogdanov

http://www.cse.cuhk.edu.hk/~andrejb/csc3130

The Chinese University of Hong Kong

Context-free languages

Fall 2010

Page 2: CSCI 3130: Automata theory and formal languages

Context-free grammar

A → 0A1
A → B
B → #

A, B are variables
0, 1, # are terminals
A is the start variable

A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111

This is a derivation.

Page 3: CSCI 3130: Automata theory and formal languages

Context-free grammar

• A context-free grammar (CFG) is (V, Σ, R, S) where
– V is a finite set of variables or non-terminals
– Σ is a finite set of terminals (V ∩ Σ = ∅)
– R is a set of productions or substitution rules of the form A → α, where A is a variable in V and α is a string of variables and terminals
– S is a variable called the start variable
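The four-part definition can be written down directly as a data structure. The following is a minimal sketch, assuming symbols are plain strings and each production is a (head, body) pair; the class name CFG and the helper example_grammar are illustrative, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    rules: tuple           # R: pairs (A, body), body is a tuple of symbols
    start: str             # S

    def __post_init__(self):
        # V and Sigma must not share symbols.
        if self.variables & self.terminals:
            raise ValueError("V and Sigma must be disjoint")
        if self.start not in self.variables:
            raise ValueError("the start symbol must be a variable")
        symbols = self.variables | self.terminals
        for head, body in self.rules:
            # The left side of a production is a single variable.
            if head not in self.variables:
                raise ValueError(f"{head!r} is not a variable")
            # The right side is a string of variables and terminals.
            if any(sym not in symbols for sym in body):
                raise ValueError(f"unknown symbol in body of {head!r}")


def example_grammar():
    """The grammar A -> 0A1 | B, B -> # from the previous slide."""
    return CFG(
        variables=frozenset({"A", "B"}),
        terminals=frozenset({"0", "1", "#"}),
        rules=(("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))),
        start="A",
    )
```

The constructor rejects inputs that violate the definition, for example a symbol that is both a variable and a terminal.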

Page 4: CSCI 3130: Automata theory and formal languages

The grammar of English

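a girl with a flower likes the boy
ART NOUN PREP ART NOUN VERB ART NOUN

[Figure: parse tree of the sentence]
SENTENCE
  NOUN-PHRASE
    CMPLX-NOUN (a girl)
    PREP-PHRASE (with a flower)
      PREP (with)
      CMPLX-NOUN (a flower)
  VERB-PHRASE
    CMPLX-VERB (likes the boy)
      VERB (likes)
      NOUN-PHRASE
        CMPLX-NOUN (the boy)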

Page 5: CSCI 3130: Automata theory and formal languages

The grammar of English

SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → CMPLX-NOUN
NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
VERB-PHRASE → CMPLX-VERB
VERB-PHRASE → CMPLX-VERB PREP-PHRASE
PREP-PHRASE → PREP CMPLX-NOUN
CMPLX-NOUN → ARTICLE NOUN
CMPLX-VERB → VERB NOUN-PHRASE
CMPLX-VERB → VERB

ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with

variables: SENTENCE, NOUN-PHRASE, …

terminals: a, the, boy, girl, flower, likes, touches, sees, with

start variable: SENTENCE

This grammar describes (a part of) English

Page 6: CSCI 3130: Automata theory and formal languages

Derivations in English

(1) SENTENCE → NOUN-PHRASE VERB-PHRASE
(2) NOUN-PHRASE → CMPLX-NOUN
(3) NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
(4) VERB-PHRASE → CMPLX-VERB
(5) VERB-PHRASE → CMPLX-VERB PREP-PHRASE
(6) PREP-PHRASE → PREP CMPLX-NOUN
(7) CMPLX-NOUN → ARTICLE NOUN
(8) CMPLX-VERB → VERB NOUN-PHRASE
(9) CMPLX-VERB → VERB

(10) ARTICLE → a
(11) ARTICLE → the
(12) NOUN → boy
(13) NOUN → girl
(14) NOUN → flower
(15) VERB → likes
(16) VERB → touches
(17) VERB → sees
(18) PREP → with

SENTENCE
⇒ NOUN-PHRASE VERB-PHRASE (1)
⇒ CMPLX-NOUN VERB-PHRASE (2)
⇒ ARTICLE NOUN VERB-PHRASE (7)
⇒ a NOUN VERB-PHRASE (10)
⇒ a boy VERB-PHRASE (12)
⇒ a boy CMPLX-VERB (4)
⇒ a boy VERB (9)
⇒ a boy sees (17)
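The derivation on this slide can be replayed mechanically from the rule numbers. This is a small sketch, assuming each numbered rule is applied to the leftmost occurrence of its left-hand side; only the rules used in the derivation are listed, and the function names are illustrative.

```python
# Subset of the numbered productions used in the derivation of "a boy sees".
RULES = {
    1: ("SENTENCE", ["NOUN-PHRASE", "VERB-PHRASE"]),
    2: ("NOUN-PHRASE", ["CMPLX-NOUN"]),
    4: ("VERB-PHRASE", ["CMPLX-VERB"]),
    7: ("CMPLX-NOUN", ["ARTICLE", "NOUN"]),
    9: ("CMPLX-VERB", ["VERB"]),
    10: ("ARTICLE", ["a"]),
    12: ("NOUN", ["boy"]),
    17: ("VERB", ["sees"]),
}


def apply_rule(form, number):
    """Rewrite the leftmost occurrence of the rule's head in the form."""
    head, body = RULES[number]
    i = form.index(head)  # raises ValueError if the rule does not apply
    return form[:i] + body + form[i + 1:]


def replay(numbers, start="SENTENCE"):
    """Return every sentential form produced by applying the rules in order."""
    form = [start]
    steps = [form]
    for number in numbers:
        form = apply_rule(form, number)
        steps.append(form)
    return steps


steps = replay([1, 2, 7, 10, 12, 4, 9, 17])
for form in steps:
    print(" ".join(form))
```

The last line printed is the derived sentence, a boy sees.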

Page 7: CSCI 3130: Automata theory and formal languages

Grammars for programming languages

E → E + E
E → E * E
E → (E)
E → 0
E → 1
…
E → 9

Variables: E
Terminals: + * ( ) 0 1 2 3 4 5 6 7 8 9

E ⇒ E * E ⇒ (E) * E ⇒ (E + E) * E ⇒ (2 + E) * E ⇒ (2 + 3) * E ⇒ (2 + 3) * 5

meaning: “add 2 and 3, and then multiply by 5”

bash-3.2$ python
Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55)
>>> (2+3)*5
25
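Python's own parser makes the structure of this expression visible. The sketch below assumes Python 3 (the slide shows a Python 2.6 session) and uses the standard ast module; it is only an illustration of how a parse result reflects "add first, then multiply", not a description of the grammar Python actually uses.

```python
import ast

# Parse the expression from the slide into a syntax tree.
tree = ast.parse("(2+3)*5", mode="eval")
top = tree.body                        # outermost operation of the expression

print(type(top.op).__name__)           # Mult: the multiplication is on top
print(type(top.left.op).__name__)      # Add: the addition sits below it

# Evaluating the tree gives the same value as the interactive session.
print(eval(compile(tree, "<expr>", "eval")))  # 25
```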

Page 8: CSCI 3130: Automata theory and formal languages

Notation and conventions

E → E + E
E → E * E
E → (E)
E → N
N → 0N
N → 1N
N → 0
N → 1

shorthand:
E → E + E | E * E | (E) | N
N → 0N | 1N | 0 | 1

Variables: E, N
Terminals: +, *, (, ), 0, 1
Start variable: E

conventions:
Variables in UPPERCASE
Start variable comes first

Page 9: CSCI 3130: Automata theory and formal languages

Derivation

• A derivation is a sequential application of productions:

E → E + E | E * E | (E) | N
N → 0N | 1N | 0 | 1

derivation:
E ⇒ E * E ⇒ (E) * E ⇒ (E) * N ⇒ (E + E) * N ⇒ (E + E) * 1 ⇒ (E + N) * 1 ⇒ (N + N) * 1 ⇒ (N + 1N) * 1 ⇒ (N + 10) * 1 ⇒ (1 + 10) * 1

α ⇒ β: β is obtained from α in one production
α ⇒* β: β is obtained from α in zero or more productions

E ⇒* (1 + 10) * 1
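The one-step relation can be checked by a program. The sketch below assumes sentential forms are written as strings without spaces and that every variable is a single uppercase character; one_step and is_derivation are illustrative names.

```python
# Productions of the expression grammar; each variable is one character.
RULES = {
    "E": ["E+E", "E*E", "(E)", "N"],
    "N": ["0N", "1N", "0", "1"],
}


def one_step(form):
    """All forms obtained from `form` by applying exactly one production."""
    results = set()
    for i, symbol in enumerate(form):
        for body in RULES.get(symbol, []):
            results.add(form[:i] + body + form[i + 1:])
    return results


def is_derivation(forms):
    """True if each form follows from the previous one in one production."""
    return all(b in one_step(a) for a, b in zip(forms, forms[1:]))


chain = ["E", "E*E", "(E)*E", "(E)*N", "(E+E)*N", "(E+E)*1", "(E+N)*1",
         "(N+N)*1", "(N+1N)*1", "(N+10)*1", "(1+10)*1"]
print(is_derivation(chain))            # True
print(is_derivation(["E", "(E+E)"]))   # False: this needs two productions
```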

Page 10: CSCI 3130: Automata theory and formal languages

Context-free languages

• The language of a CFG is the set of all strings of terminals that can be derived from the start variable

L(G) = {w : w ∈ Σ* and S ⇒* w}

• Questions we will ask:

I give you a CFG, what is the language?

I give you a language, write a CFG for it
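For small grammars, the first question can be explored by listing the short strings of L(G). This is a minimal sketch for the grammar A → 0A1 | B, B → #, assuming no production shrinks a sentential form (no ε-productions), so forms longer than the bound can be discarded; the function name is illustrative.

```python
from collections import deque

RULES = {"A": ["0A1", "B"], "B": ["#"]}


def language_up_to(max_len, start="A"):
    """Strings of terminals of length <= max_len derivable from `start`."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is all terminals.
        idx = next((i for i, s in enumerate(form) if s in RULES), None)
        if idx is None:
            words.add(form)
            continue
        for body in RULES[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            # Safe to prune: without ε-productions a form never gets shorter.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=len)


print(language_up_to(5))  # ['#', '0#1', '00#11']
```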

Page 11: CSCI 3130: Automata theory and formal languages

Analysis example 1

A → 0A1 | B
B → #

• Can you derive:

00#11: Yes, A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
#: Yes, A ⇒ B ⇒ #
00#111: No, the numbers of 0s and 1s are unequal
00##11: No, there are too many #

L(G) = {0^n#1^n : n ≥ 0}

Page 12: CSCI 3130: Automata theory and formal languages

Analysis example 1

A → 0A1 | B
B → #

variables: A, B
terminals: 0, 1, #
start variable: A

• Can you derive: 00#11, 00#111, 00##11, #

• What is the language of this CFG?

L = {0^n#1^n : n ≥ 0}

Page 13: CSCI 3130: Automata theory and formal languages

Analysis example 2

S → SS | (S) | ε

• Can you derive

(): S ⇒ (S) (2) ⇒ () (3)

(()()): S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())

Page 14: CSCI 3130: Automata theory and formal languages

Parse trees

S → SS | (S) | ε

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())

• A parse tree gives a more compact representation:

[Figure: parse tree for (()())]
S
  ( S )
    S S
      left S: ( S ), inner S ⇒ ε
      right S: ( S ), inner S ⇒ ε

Page 15: CSCI 3130: Automata theory and formal languages

Parse trees

• One parse tree can represent several derivations

[Figure: parse tree for (()())]

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ (S()) ⇒ ((S)()) ⇒ (()())
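To see how one tree gives rise to different derivations, the tree can be stored once and read off in different orders. This is a sketch under the assumption that a node is a pair ("S", children) and a terminal leaf is a plain string; the names TREE, render and derivation are illustrative.

```python
# Parse tree for (()()) under S -> SS | (S) | ε.
EPS = ("S", [])                          # S -> ε
PAIR = ("S", ["(", EPS, ")"])            # S -> (S), inner S -> ε
TREE = ("S", ["(", ("S", [PAIR, PAIR]), ")"])


def leaf_yield(node):
    """Concatenate the leaves of the tree from left to right."""
    if isinstance(node, str):
        return node
    return "".join(leaf_yield(child) for child in node[1])


def render(form):
    """Write a list of terminals and unexpanded nodes as a sentential form."""
    return "".join(n if isinstance(n, str) else n[0] for n in form)


def derivation(tree, leftmost=True):
    """Expand the nodes of the tree, always picking the leftmost
    (or rightmost) unexpanded variable."""
    form = [tree]
    steps = [render(form)]
    while True:
        pending = [i for i, n in enumerate(form) if not isinstance(n, str)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form = form[:i] + list(form[i][1]) + form[i + 1:]
        steps.append(render(form))


print(" => ".join(derivation(TREE, leftmost=True)))
print(" => ".join(derivation(TREE, leftmost=False)))
```

Both printed derivations end in (()()) but pass through different intermediate forms; they correspond to the second and fourth derivations listed on the slide.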

Page 16: CSCI 3130: Automata theory and formal languages

Analysis example 2

S → SS | (S) | ε

• Can you derive

(()(): No, because the numbers of ( and ) are unequal

())()): No, because there is a prefix with an excess of )

Page 17: CSCI 3130: Automata theory and formal languages

Analysis example 2

S → SS | (S) | ε

L(G) = {w : w has the same number of ( and ), and no prefix of w has more ) than (}

Parsing rules:

Divide w up in blocks with same number of ( and )

Each block is in L(G)

Parse each block recursively

Example: ( ( ) ( ) ) ( )

[Figure: parse tree for (()())(), built block by block]
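The parsing rules translate into a short recursive procedure. The following sketch assumes the input contains only the characters ( and ); it returns a nested list with one entry per block, or None when the string is not in L(G). The function name parse is illustrative.

```python
def parse(w):
    """Split w into balanced blocks and parse each block recursively.

    Returns a nested list (one sub-list per block), or None if w is not
    a string of properly nested parentheses.
    """
    blocks, depth, start = [], 0, 0
    for i, ch in enumerate(w):
        if ch not in "()":
            return None
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return None                  # some prefix has more ) than (
        if depth == 0:
            blocks.append(w[start:i + 1])  # a block with equally many ( and )
            start = i + 1
    if depth != 0:
        return None                      # more ( than ) overall
    # Each block has the form ( ... ); parse what is inside the outer pair.
    return [parse(block[1:-1]) for block in blocks]


print(parse("(()())()"))   # [[[], []], []]
print(parse("(()()"))      # None
print(parse("())())"))     # None
```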

Page 18: CSCI 3130: Automata theory and formal languages

Design example 1

L = {0^n1^n | n ≥ 0}

These strings have recursive structure:

000000111111
0000011111
00001111
000111
0011
01

S → 0S1 | ε
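The recursive structure means that each string is obtained by wrapping a shorter one in 0…1. A minimal sketch of the derivation of 0^n1^n, assuming the form is kept as a string with a single S in the middle; derive is an illustrative name.

```python
def derive(n):
    """Derivation of 0^n 1^n: apply S -> 0S1 n times, then S -> ε."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "0S1")   # the form contains exactly one S
        steps.append(form)
    form = form.replace("S", "")          # S -> ε
    steps.append(form)
    return steps


print(" => ".join(derive(3)))  # S => 0S1 => 00S11 => 000S111 => 000111
```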

Page 19: CSCI 3130: Automata theory and formal languages

Design example 2

L = numbers without leading zeros

allowed: 0, 109, 2, 23
not allowed: 01, 003

S → 0 | LN
N → ND | ε
D → 0 | L
L → 1|2|3|4|5|6|7|8|9

Example: 1052870032 has leading digit L = 1, followed by any number N = 052870032
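Each variable of this grammar can be read as a small membership test. The sketch below assumes one function per variable, with L and D represented as character sets; the function names are illustrative.

```python
NONZERO = set("123456789")     # variable L: a leading digit
DIGITS = NONZERO | {"0"}       # variable D: any digit


def in_N(s):
    """N -> ND | ε: any (possibly empty) string of digits."""
    return s == "" or (s[-1] in DIGITS and in_N(s[:-1]))


def in_S(s):
    """S -> 0 | LN: the number 0, or a nonzero digit followed by N."""
    return s == "0" or (s != "" and s[0] in NONZERO and in_N(s[1:]))


for word in ["0", "109", "2", "23", "01", "003"]:
    print(word, in_S(word))
```

The first four strings are accepted and the last two are rejected, matching the allowed and not allowed examples on the slide.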

Page 20: CSCI 3130: Automata theory and formal languages

Design examples

L = {0^n1^n0^m1^m | n ≥ 0, m ≥ 0}

These strings have two parts:

L1 = {0^n1^n | n ≥ 0}
L2 = {0^m1^m | m ≥ 0}

L = L1L2

rules for L1: S1 → 0 S1 1 | ε

L2 is the same as L1

S → S1 S1
S1 → 0 S1 1 | ε

010011

00011100110011
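The production S → S1S1 says that a string is in L exactly when it can be cut into two pieces that are each in L1. A brute-force sketch of that idea, trying every cut point; in_L1 and in_L are illustrative names.

```python
def in_L1(s):
    """S1 -> 0 S1 1 | ε."""
    if s == "":
        return True
    return len(s) >= 2 and s[0] == "0" and s[-1] == "1" and in_L1(s[1:-1])


def in_L(s):
    """S -> S1 S1: some split of s puts both halves in L1."""
    return any(in_L1(s[:k]) and in_L1(s[k:]) for k in range(len(s) + 1))


print(in_L("010011"))   # True: 01 followed by 0011
print(in_L("0110"))     # False
```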

Page 21: CSCI 3130: Automata theory and formal languages

Design examples

L = {0^n1^m0^m1^n | n ≥ 0, m ≥ 0}

These strings have nested structure:

outer part: 0^n1^n

inner part: 1^m0^m

S → 0S1 | I
I → 1I0 | ε

011001

11000011

00110011
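Nesting shows up in code as one recursion inside another: peel off the outer 0…1 pairs, then the inner 1…0 pairs. A small sketch with one function per variable; the names are illustrative.

```python
def in_I(s):
    """I -> 1I0 | ε: the inner part 1^m 0^m."""
    if s == "":
        return True
    return len(s) >= 2 and s[0] == "1" and s[-1] == "0" and in_I(s[1:-1])


def in_S(s):
    """S -> 0S1 | I: outer 0...1 pairs wrapped around an inner part."""
    if in_I(s):
        return True
    return len(s) >= 2 and s[0] == "0" and s[-1] == "1" and in_S(s[1:-1])


print(in_S("011001"))     # True: n = 1, m = 2
print(in_S("00110011"))   # True: n = 2, m = 2
print(in_S("0011011"))    # False
```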

Page 22: CSCI 3130: Automata theory and formal languages

Design examples

L = {x : x has two 0-blocks with same number of 0s}

allowed: 01011, 001011001, 10010101001
not allowed: 01001000, 01111

10010011010010110 = A B C (initial part, middle part, final part)

A: ε, or ends in 1

C: ε, or begins with 1

Page 23: CSCI 3130: Automata theory and formal languages

Design examples

10010011010010110 = A B C, with A = 1001, B = 00110100, C = 10110

S → ABC
A → ε | U1
C → ε | 1U
U → 0U | 1U | ε
B → 0D0 | 0B0
D → 1U1 | 1

A: ε, or ends in 1
C: ε, or begins with 1
U: any string
D: begins and ends in 1

B has recursive structure: 00110100 = 00 D 00 with D = 1101 (same number of 0s on both sides, at least one 0)
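One way to gain confidence in a designed grammar is to compare it with the plain description of the language on all short strings. The sketch below assumes that "0-block" means a maximal run of 0s; it implements the grammar as one test per variable and checks it against a direct test. All function names are illustrative.

```python
import re
from itertools import product


def in_D(s):   # D -> 1U1 | 1: begins and ends in 1
    return s != "" and s[0] == "1" and s[-1] == "1"


def in_B(s):   # B -> 0D0 | 0B0: equally many 0s around a D
    if len(s) < 3 or s[0] != "0" or s[-1] != "0":
        return False
    return in_D(s[1:-1]) or in_B(s[1:-1])


def in_A(s):   # A -> ε | U1
    return s == "" or s[-1] == "1"


def in_C(s):   # C -> ε | 1U
    return s == "" or s[0] == "1"


def in_S(s):   # S -> ABC: try every way of cutting s into three parts
    n = len(s)
    return any(in_A(s[:i]) and in_B(s[i:j]) and in_C(s[j:])
               for i in range(n + 1) for j in range(i, n + 1))


def two_equal_blocks(s):
    """Direct test: two maximal runs of 0s have the same length."""
    lengths = [len(run) for run in re.findall(r"0+", s)]
    return len(lengths) != len(set(lengths))


def agree_up_to(n):
    """Compare the grammar with the direct test on all strings of length <= n."""
    return all(in_S("".join(bits)) == two_equal_blocks("".join(bits))
               for k in range(n + 1) for bits in product("01", repeat=k))


print(agree_up_to(8))
```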

Page 24: CSCI 3130: Automata theory and formal languages

Context-free versus regular

• Write a CFG for the language (0 + 1)*111

S → U111
U → 0U | 1U | ε

• Can you do so for every regular language?

Every regular language is context-free

[Figure: regular expression, NFA, DFA]

Page 25: CSCI 3130: Automata theory and formal languages

From regular to context-free

regular expression      CFG
∅                       grammar with no rules
ε                       S → ε
a (alphabet symbol)     S → a
E1 + E2                 S → S1 | S2
E1E2                    S → S1S2
E1*                     S → SS1 | ε

In all cases, S becomes the new start symbol
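The table can be applied recursively to a whole regular expression. This is a sketch under several assumptions that are not in the lecture: the expression is given as a nested tuple such as ("star", ("sym", "a")), fresh variables are named S1, S2, … in the order they are created, and alphabet symbols never clash with those names.

```python
from itertools import count


def regex_to_cfg(expr):
    """Return (start_variable, rules) for a regular expression tree.

    Node kinds: ("empty",), ("eps",), ("sym", a), ("union", e1, e2),
    ("concat", e1, e2), ("star", e1). A rule is (head, body) with body
    a tuple of symbols; the empty tuple stands for ε.
    """
    rules = []
    fresh = count(1)

    def build(e):
        s = f"S{next(fresh)}"            # new start symbol for this subexpression
        kind = e[0]
        if kind == "empty":
            pass                         # grammar with no rules
        elif kind == "eps":
            rules.append((s, ()))
        elif kind == "sym":
            rules.append((s, (e[1],)))
        elif kind == "union":
            s1, s2 = build(e[1]), build(e[2])
            rules.extend([(s, (s1,)), (s, (s2,))])
        elif kind == "concat":
            s1, s2 = build(e[1]), build(e[2])
            rules.append((s, (s1, s2)))
        elif kind == "star":
            s1 = build(e[1])
            rules.extend([(s, (s, s1)), (s, ())])
        else:
            raise ValueError(f"unknown node kind {kind!r}")
        return s

    return build(expr), rules


start, rules = regex_to_cfg(("star", ("union", ("sym", "0"), ("sym", "1"))))
for head, body in rules:
    print(head, "->", " ".join(body) or "ε")
```

The grammar produced this way has more variables than a hand-written one such as U → 0U | 1U | ε, but it follows the table row by row.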

Page 26: CSCI 3130: Automata theory and formal languages

Context-free versus regular

• Is every context-free language regular?

S → 0S1 | ε

L = {0^n1^n : n ≥ 0}

Is context-free but not regular

[Figure: the regular languages drawn inside the context-free languages]