Jan 19, 2016
CSCI 3130: Automata theory and formal languages
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
The Chinese University of Hong Kong
Context-free languages
Fall 2010
Context-free grammar
A → 0A1
A → B
B → #
A, B are variables
0, 1, # are terminals
A is the start variable
A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
this is a derivation
Context-free grammar
• A context-free grammar (CFG) is (V, Σ, R, S) where
– V is a finite set of variables or non-terminals
– Σ is a finite set of terminals (V ∩ Σ = ∅)
– R is a set of productions or substitution rules of the form A → α, where A is a variable in V and α is a string of variables and terminals
– S is a variable called the start variable
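One way to make the 4-tuple concrete is to write it down as plain data. The sketch below encodes the grammar A → 0A1, A → B, B → # from the first slide and checks the side conditions of the definition. The names `Grammar` and `is_well_formed` are illustrative choices, not part of the course material.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    rules: tuple           # R: pairs (head, body), body is a tuple of symbols
    start: str             # S

def is_well_formed(g: Grammar) -> bool:
    """Check the side conditions in the definition of a CFG."""
    if g.variables & g.terminals:          # V and Sigma must be disjoint
        return False
    if g.start not in g.variables:         # S must be a variable
        return False
    symbols = g.variables | g.terminals
    for head, body in g.rules:
        if head not in g.variables:        # left side is a single variable
            return False
        if any(sym not in symbols for sym in body):
            return False                   # right side uses only known symbols
    return True

G = Grammar(
    variables=frozenset({"A", "B"}),
    terminals=frozenset({"0", "1", "#"}),
    rules=(("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))),
    start="A",
)
print(is_well_formed(G))
```

An empty body `()` would stand for a production with ε on the right-hand side.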
The grammar of English
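a girl with a flower likes the boy
ART NOUN PREP ART NOUN VERB ART NOUN
Parse tree:
SENTENCE
  NOUN-PHRASE
    CMPLX-NOUN: ART NOUN (a girl)
    PREP-PHRASE: PREP (with)
      CMPLX-NOUN: ART NOUN (a flower)
  VERB-PHRASE
    CMPLX-VERB: VERB (likes)
      NOUN-PHRASE
        CMPLX-NOUN: ART NOUN (the boy)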
The grammar of English
SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → CMPLX-NOUN
NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
VERB-PHRASE → CMPLX-VERB
VERB-PHRASE → CMPLX-VERB PREP-PHRASE
PREP-PHRASE → PREP CMPLX-NOUN
CMPLX-NOUN → ARTICLE NOUN
CMPLX-VERB → VERB NOUN-PHRASE
CMPLX-VERB → VERB
ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with
variables: SENTENCE, NOUN-PHRASE, …
terminals: a, the, boy, girl, flower, likes, touches, sees, with
start variable: SENTENCE
This grammar describes (a part of) English
Derivations in English
(1) SENTENCE → NOUN-PHRASE VERB-PHRASE
(2) NOUN-PHRASE → CMPLX-NOUN
(3) NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
(4) VERB-PHRASE → CMPLX-VERB
(5) VERB-PHRASE → CMPLX-VERB PREP-PHRASE
(6) PREP-PHRASE → PREP CMPLX-NOUN
(7) CMPLX-NOUN → ARTICLE NOUN
(8) CMPLX-VERB → VERB NOUN-PHRASE
(9) CMPLX-VERB → VERB
(10) ARTICLE → a
(11) ARTICLE → the
(12) NOUN → boy
(13) NOUN → girl
(14) NOUN → flower
(15) VERB → likes
(16) VERB → touches
(17) VERB → sees
(18) PREP → with
SENTENCE
⇒ NOUN-PHRASE VERB-PHRASE (1)
⇒ CMPLX-NOUN VERB-PHRASE (2)
⇒ ARTICLE NOUN VERB-PHRASE (7)
⇒ a NOUN VERB-PHRASE (10)
⇒ a boy VERB-PHRASE (12)
⇒ a boy CMPLX-VERB (4)
⇒ a boy VERB (9)
⇒ a boy sees (17)
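The derivation of "a boy sees" can be replayed mechanically by applying the numbered rules one at a time. The sketch below is only an illustration: `RULES`, `apply_rule` and `derive` are hypothetical names, and each step rewrites the leftmost occurrence of the rule's left-hand side.

```python
# Rules (1)-(18) of the English grammar, indexed by their numbers on the slide.
RULES = {
    1: ("SENTENCE", ["NOUN-PHRASE", "VERB-PHRASE"]),
    2: ("NOUN-PHRASE", ["CMPLX-NOUN"]),
    3: ("NOUN-PHRASE", ["CMPLX-NOUN", "PREP-PHRASE"]),
    4: ("VERB-PHRASE", ["CMPLX-VERB"]),
    5: ("VERB-PHRASE", ["CMPLX-VERB", "PREP-PHRASE"]),
    6: ("PREP-PHRASE", ["PREP", "CMPLX-NOUN"]),
    7: ("CMPLX-NOUN", ["ARTICLE", "NOUN"]),
    8: ("CMPLX-VERB", ["VERB", "NOUN-PHRASE"]),
    9: ("CMPLX-VERB", ["VERB"]),
    10: ("ARTICLE", ["a"]),
    11: ("ARTICLE", ["the"]),
    12: ("NOUN", ["boy"]),
    13: ("NOUN", ["girl"]),
    14: ("NOUN", ["flower"]),
    15: ("VERB", ["likes"]),
    16: ("VERB", ["touches"]),
    17: ("VERB", ["sees"]),
    18: ("PREP", ["with"]),
}

def apply_rule(form, number):
    """Rewrite the leftmost occurrence of the rule's head in the sentential form."""
    head, body = RULES[number]
    i = form.index(head)          # raises ValueError if the rule does not apply
    return form[:i] + body + form[i + 1:]

def derive(start, numbers):
    """Return every sentential form obtained by applying the rules in order."""
    steps = [[start]]
    for n in numbers:
        steps.append(apply_rule(steps[-1], n))
    return steps

for step in derive("SENTENCE", [1, 2, 7, 10, 12, 4, 9, 17]):
    print(" ".join(step))
```

The same function replays a derivation of "a girl with a flower likes the boy" with the rule sequence 1, 3, 7, 10, 13, 6, 18, 7, 10, 14, 4, 8, 15, 2, 7, 11, 12.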
Grammars for programming languages
E → E + E
E → E * E
E → (E)
E → 0
E → 1
…
E → 9
Variables: E
Terminals: + * ( ) 0 1 2 3 4 5 6 7 8 9
E ⇒ E * E ⇒ (E) * E ⇒ (E + E) * E ⇒ (2 + E) * E ⇒ (2 + 3) * E ⇒ (2 + 3) * 5
meaning: “add 2 and 3, and then multiply by 5”
bash-3.2$ python
Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55)
>>> (2+3)*5
25
Notation and conventions
E → E + E
E → E * E
E → (E)
E → N
N → 0N
N → 1N
N → 0
N → 1
shorthand:
E → E + E | E * E | (E) | N
N → 0N | 1N | 0 | 1
conventions:
Variables in UPPERCASE
Start variable comes first
Variables: E, N
Terminals: +, *, (, ), 0, 1
Start variable: E
Derivation
• A derivation is a sequential application of productions:
E → E + E | E * E | (E) | N
N → 0N | 1N | 0 | 1
E ⇒ E * E ⇒ (E) * E ⇒ (E) * N ⇒ (E + E) * N ⇒ (E + E) * 1 ⇒ (E + N) * 1 ⇒ (N + N) * 1 ⇒ (N + 1N) * 1 ⇒ (N + 10) * 1 ⇒ (1 + 10) * 1
α ⇒ β: β is obtained from α in one production
α ⇒* β: β is obtained from α in zero or more productions
E ⇒* (1 + 10) * 1
Context-free languages
• The language of a CFG is the set of all strings of terminals that can be derived from the start variable
L(G) = {w : w ∈ Σ* and S ⇒* w}
• Questions we will ask:
I give you a CFG, what is the language?
I give you a language, write a CFG for it
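For the first question, a small experiment can suggest what L(G) looks like before proving anything. The sketch below lists all short strings derivable from the start variable by breadth-first search over leftmost derivations. It assumes single-character symbols and a grammar with no ε-productions, so that sentential forms never shrink; the function name `language_up_to` is made up for this illustration.

```python
from collections import deque

def language_up_to(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Assumes no rule has an empty right-hand side, so sentential forms never
    shrink and any form longer than max_len can be discarded.
    Variables are the keys of `rules`; every other character is a terminal.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost variable; if there is none, the form is a word.
        pos = next((i for i, c in enumerate(form) if c in rules), None)
        if pos is None:
            words.add(form)
            continue
        for body in rules[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))

# The grammar A -> 0A1 | B, B -> # from the first slide.
rules = {"A": ["0A1", "B"], "B": ["#"]}
print(language_up_to(rules, "A", 7))
```

Restricting the search to leftmost derivations loses nothing, because every derivable string also has a leftmost derivation.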
Analysis example 1
A → 0A1 | B
B → #
variables: A, B
terminals: 0, 1, #
start variable: A
• Can you derive:
00#11    Yes: A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
#        Yes: A ⇒ B ⇒ #
00#111   No, the numbers of 0s and 1s are unequal
00##11   No, there are too many #s
• What is the language of this CFG?
L(G) = {0^n#1^n : n ≥ 0}
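The four questions above can also be answered by a recursive check that mirrors the two productions of A. This is a minimal sketch for this one grammar, not a general parser; `derives_A` is a hypothetical name.

```python
def derives_A(w: str) -> bool:
    """True if A =>* w in the grammar A -> 0A1 | B, B -> #."""
    if w == "#":                       # A => B => #
        return True
    if len(w) >= 3 and w[0] == "0" and w[-1] == "1":
        return derives_A(w[1:-1])      # A => 0A1, then recurse on the middle
    return False

for w in ["00#11", "00#111", "00##11", "#"]:
    print(w, derives_A(w))
```

Each recursive call strips one 0 from the front and one 1 from the back, which is why the accepted strings are exactly those of the form 0^n#1^n.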
Analysis example 2
• Can you derive
S → SS | (S) | ε
()
S ⇒ (S) ⇒ ()    using rules (2) S → (S) and (3) S → ε
(()())
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
Parse trees
• A parse tree gives a more compact representation:
S → SS | (S) | ε
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
Parse tree for (()()):
S
  (
  S
    S
      (
      S: ε
      )
    S
      (
      S: ε
      )
  )
Parse trees
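[Figure: parse tree of (()()). The root S has children ( S ); the inner S splits into S S, and each of these derives ().]
• One parse tree can represent several derivations
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ (S()) ⇒ ((S)()) ⇒ (()())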
Analysis example 2
• Can you derive
S → SS | (S) | ε
(()()    No, because the numbers of ( and ) are unequal
())())   No, because there is a prefix with an excess of )
Analysis example 2
S → SS | (S) | ε
L(G) = {w : w has the same number of ( and ), and no prefix of w has more ) than (}
Example: ( ( ) ( ) ) ( )
Parsing rules:
Divide w up in blocks with same number of ( and )
Each block is in L(G)
Parse each block recursively
[Figure: parse tree of (()())() built block by block with the rule S → SS]
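The two conditions that describe L(G), and the first parsing rule, translate directly into a counter that tracks how many ( are currently open. The sketch below is an illustration under that reading; `in_language` and `blocks` are hypothetical names.

```python
def in_language(w: str) -> bool:
    """Check the two conditions that characterise L(G) for S -> SS | (S) | ε."""
    depth = 0
    for c in w:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        else:
            return False               # only ( and ) are terminals
        if depth < 0:                  # some prefix has more ) than (
            return False
    return depth == 0                  # same number of ( and )

def blocks(w: str) -> list:
    """Split a string of L(G) into its shortest non-empty balanced blocks."""
    out, depth, start = [], 0, 0
    for i, c in enumerate(w):
        depth += 1 if c == "(" else -1
        if depth == 0:                 # a block closes when the counter returns to 0
            out.append(w[start:i + 1])
            start = i + 1
    return out

print(in_language("(()())()"), blocks("(()())()"))
print(in_language("(()()"), in_language("())())"))
```

Each block has the form (v) with v again in L(G), so the same split can be applied to v to parse recursively.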
Design example 1
L = {0^n1^n | n ≥ 0}
These strings have recursive structure:
000000111111
0000011111
00001111
000111
0011
01
ε
S → 0S1 | ε
Design example 2
L = numbers without leading zeros
allowed: 0, 109, 2, 23
not allowed: 01, 003
S → 0 | LN
N → ND | ε
D → 0 | L
L → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Example: 1052870032 = 1 (leading digit L) followed by 052870032 (any number N)
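The grammar can be sanity-checked against the allowed and not allowed examples. The sketch below hand-codes what each variable derives (S is either 0 or a nonzero digit followed by any digits) and compares the result with an equivalent regular expression. Both `derives_S` and `PATTERN` are assumptions made for this illustration.

```python
import re

DIGITS = "0123456789"

def derives_S(w: str) -> bool:
    """True if S =>* w for S -> 0 | LN, N -> ND | ε, D -> 0 | L, L -> 1|...|9."""
    if w == "0":                                   # S -> 0
        return True
    if not w or w[0] not in DIGITS[1:]:            # S -> LN needs a nonzero leading digit
        return False
    return all(c in DIGITS for c in w[1:])         # N derives any string of digits

# The same language written as a regular expression, for comparison.
PATTERN = re.compile(r"0|[1-9][0-9]*")

for w in ["0", "109", "2", "23", "01", "003"]:
    print(w, derives_S(w), PATTERN.fullmatch(w) is not None)
```

That a regular expression exists for this language shows it is regular as well as context-free.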
Design examples
L = {0^n1^n0^m1^m | n ≥ 0, m ≥ 0}
These strings have two parts:
L1 = {0^n1^n | n ≥ 0}
L2 = {0^m1^m | m ≥ 0}
L = L1L2
rules for L1: S1 → 0 S1 1 | ε
L2 is the same as L1
S → S1 S1
S1 → 0 S1 1 | ε
010011
00011100110011
Design examples
L = {0^n1^m0^m1^n | n ≥ 0, m ≥ 0}
These strings have nested structure:
outer part: 0^n … 1^n
inner part: 1^m0^m
S → 0S1 | I
I → 1I0 | ε
011001
11000011
00110011
Design examples
L = {x: x has two 0-blocks with same number of 0s}
allowed: 01011, 001011001, 10010101001
not allowed: 01001000, 01111
Example: 1001 00110100 10110
initial part A = 1001, middle part B = 00110100, final part C = 10110
A: ε, or ends in 1
C: ε, or begins with 1
B has recursive structure: 00 D 00 with D = 1101
same number of 0s on both sides, at least one 0
U: any string
D: begins and ends in 1
S → ABC
A → ε | U1
C → ε | 1U
B → 0D0 | 0B0
D → 1U1 | 1
U → 0U | 1U | ε
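The allowed and not allowed examples can be checked against the definition of L directly, without the grammar. The sketch below reads "0-block" as a maximal run of 0s, which is the reading the grammar enforces through A, C and D; the function name is hypothetical.

```python
from itertools import groupby

def has_two_equal_zero_blocks(x: str) -> bool:
    """True if two maximal blocks of 0s in x have the same length."""
    # groupby yields one (symbol, run) pair per maximal run of equal symbols.
    lengths = [len(list(run)) for sym, run in groupby(x) if sym == "0"]
    # Two blocks share a length exactly when the list contains a duplicate.
    return len(lengths) != len(set(lengths))

for x in ["01011", "001011001", "10010101001", "01001000", "01111"]:
    print(x, has_two_equal_zero_blocks(x))
```

This tests membership in L only; it does not build a derivation from S → ABC.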
Context-free versus regular
• Write a CFG for the language (0 + 1)*111
S → U111
U → 0U | 1U | ε
• Can you do so for every regular language?
Every regular language is context-free
[Figure: regular expression, NFA and DFA as equivalent descriptions of regular languages]
From regular to context-free
regular expression        CFG
∅                         grammar with no rules
ε                         S → ε
a (alphabet symbol)       S → a
E1 + E2                   S → S1 | S2
E1E2                      S → S1S2
E1*                       S → SS1 | ε
Here S1 and S2 are the start variables of the grammars for E1 and E2.
In all cases, S becomes the new start symbol
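The table can be read as a recursive procedure: build grammars for the subexpressions, then add one new start variable with the rule from the matching row. The sketch below follows that reading. The nested-tuple encoding of regular expressions, the variable names S1, S2, … and both function names are assumptions made for this illustration. `words_up_to` is a helper for inspecting the result by listing all short strings the grammar generates.

```python
from itertools import count

def regex_to_cfg(regex):
    """Translate a regular-expression tree into a CFG using the table's rules.

    regex is a nested tuple: ("empty",), ("eps",), ("sym", a),
    ("union", r1, r2), ("concat", r1, r2) or ("star", r1).
    Returns (start_variable, rules); rules maps each variable to a list of
    right-hand sides, each a list of symbols ([] stands for ε).
    """
    rules = {}
    fresh = count(1)

    def build(r):
        s = f"S{next(fresh)}"              # new start variable for this subexpression
        kind = r[0]
        if kind == "empty":
            rules[s] = []                  # grammar with no rules
        elif kind == "eps":
            rules[s] = [[]]                # S -> ε
        elif kind == "sym":
            rules[s] = [[r[1]]]            # S -> a
        elif kind == "union":
            s1, s2 = build(r[1]), build(r[2])
            rules[s] = [[s1], [s2]]        # S -> S1 | S2
        elif kind == "concat":
            s1, s2 = build(r[1]), build(r[2])
            rules[s] = [[s1, s2]]          # S -> S1 S2
        elif kind == "star":
            s1 = build(r[1])
            rules[s] = [[s, s1], []]       # S -> S S1 | ε
        else:
            raise ValueError(f"unknown node {kind!r}")
        return s

    return build(regex), rules

def words_up_to(rules, start, n):
    """All terminal strings of length <= n derivable from start (fixed point)."""
    lang = {v: set() for v in rules}
    changed = True
    while changed:
        changed = False
        for v, bodies in rules.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    options = lang[sym] if sym in rules else {sym}
                    partial = {p + o for p in partial for o in options
                               if len(p + o) <= n}
                if not partial <= lang[v]:
                    lang[v] |= partial
                    changed = True
    return lang[start]

# (0 + 1)*111 as a tree
any_bit = ("union", ("sym", "0"), ("sym", "1"))
ones = ("concat", ("sym", "1"), ("concat", ("sym", "1"), ("sym", "1")))
start, rules = regex_to_cfg(("concat", ("star", any_bit), ones))
print(start, sorted(words_up_to(rules, start, 4)))
```

The resulting grammar has more variables than the hand-written S → U111, U → 0U | 1U | ε, since every subexpression gets its own start variable.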
Context-free versus regular
• Is every context-free language regular?
S → 0S1 | ε
L = {0^n1^n : n ≥ 0}
Is context-free but not regular
[Figure: the regular languages drawn as a proper subset of the context-free languages]