Computational Models - Lecture 4 (tau-cm2019.wdfiles.com/local--files/course-schedule/Lecture-4.pdf)


Computational Models - Lecture 4

Handout Mode

Roded Sharan.

Tel Aviv University.

March, 2019


Talk Outline

I Context Free Grammars/Languages (CFG/CFL)

I Chomsky Normal Form (CNF)

I Checking membership in a CNF grammar

I Stochastic CFG

I Sipser’s book, 2.1

I Additional reading: Hopcroft, 5.4 & 7.1; Durbin, 9.6


Context Free Grammars (CFG)

An example of a context free grammar, G1:

I A→ 0A1

I A→ B

I B → #

Terminology:

I Each line is a substitution rule or production.

I Each rule has the form: symbol → string. The left-hand symbol is a variable (usually upper-case).

I A string consists of variables and terminals.

I One variable is the start variable (lhs of top rule). In this case, it is A.


Rules for generating strings

I Write down the start variable.

I Pick a variable written down in the current string and a rule whose left-hand side is that variable.

I Replace that variable with the right-hand side of that rule.

I Repeat until no variables remain.

I Return final string (concatenation of terminals).

The process is inherently nondeterministic.


Example

Grammar G1:
I A → 0A1
I A → B

I B → #

Derivation with G1:

A → 0A1 → 00A11 → 000A111 → 000B111 → 000#111

Question 1
What strings can be generated in this way from the grammar G1?

Answer: Exactly those of the form 0^n#1^n (n ≥ 0).

Context-Free Languages (CFL)

The language generated in this way is called the language of the grammar.

For example, L(G1) = {0^n#1^n : n ≥ 0}.

Any language generated by a context-free grammar is called a context-free language.


A useful abbreviation

Rules with same variable on left hand side

A → 0A1
A → B

are written as:

A→ 0A1 | B


English-like sentences

A grammar G2 to describe a few English sentences:

< SENTENCE > → < NP >< VERB >

< NP > → < ARTICLE >< NOUN >

< NOUN > → boy | girl | flower
< ARTICLE > → a | the

< VERB > → touches | likes | sees

A specific derivation in G2:

< SENTENCE > → < NP >< VERB >

→ < ARTICLE >< NOUN >< VERB >

→ a < NOUN >< VERB >

→ a boy < VERB >

→ a boy sees

More strings generated by G2: a flower sees, the girl touches


Formal definition

A context-free grammar is a 4-tuple (V ,Σ,R,S), where

I V is a finite set of variables

I Σ is a finite set of terminals (V ∩ Σ = ∅)
I R is a finite set of rules of the form A → x, where A ∈ V and x ∈ (V ∪ Σ)∗.

I S ∈ V is the start symbol.

I Let u, v ∈ (V ∪ Σ)∗. If (A → w) ∈ R, then uAv yields uwv, denoted uAv → uwv.

I u ∗→ v if u = v, or u → u1 → . . . → uk → v for some sequence u1, u2, . . . , uk

Note that if A ∗→ xBy and B ∗→ z, then A ∗→ xzy .

Definition 2

The language of the grammar G, denoted L(G), is {w ∈ Σ∗ : S ∗→ w}

where ∗→ is determined by G.

Example 1

G3 = ({S}, {a,b},R,S).

R (Rules): S → aSb | SS | ε

Some words in the language: aabb, aababb.

Question 3
What is this language?

Hint: think of parentheses, i.e., a is "(" and b is ")". Examples: (()), (()())

Using a larger alphabet (i.e., more terminals), e.g. ([]()), we can represent well-formed programs with many kinds of nested loops and "if then/else" statements.


Example 2

G4 = ({S}, {a,b},R,S).

R (Rules): S → aSa | bSb | ε

Some words in the language: abba, aabaabaa.

Question 4
What is this language?

L(G4) = {ww^R : w ∈ {a,b}∗} (almost but not quite the set of palindromes)

Proving L(G4) = {ww^R : w ∈ {a,b}∗}:

We show:

1. {ww^R} ⊆ L(G4).

2. L(G4) ⊆ {ww^R}.


Proving {ww^R} ⊆ L(G4)

We prove by induction on the length of z that z ∈ {ww^R} =⇒ z ∈ L(G4).

I Base: |z| = 0. Since (S → ε) ∈ R, it holds that z = ε ∈ L(G4).

I Inductive step. Let z = ww^R be a word of length 2k (k ≥ 1).

I Let w = σw′ (hence |w′| = k − 1)
I Hence, z = σw′(w′)^Rσ (since z = ww^R)
I By i.h., S ∗→ w′(w′)^R
I Hence, S → σSσ ∗→ σw′(w′)^Rσ = z


Proving z ∈ L(G4) =⇒ z ∈ {ww^R}

We prove by induction on the number of derivation steps used to derive z that z ∈ L(G4) =⇒ z ∈ {ww^R}.

I Single derivation. The only possible derivation is S → ε, and indeed ε ∈ {ww^R}.

I Inductive step. Assume S ∗→ z in k > 1 derivation steps.

I The first derivation step is S → σSσ for some σ ∈ {a,b}
I Hence z = σz′σ, and z′ is derived from S using k − 1 steps
I By i.h., z′ = w′(w′)^R
I Hence, z = σw′(w′)^Rσ = ww^R for w = σw′


Example 3

G5 = ({S,A,B}, {a,b},R,S)

R (Rules):
S → aB | bA | ε
A → a | aS | bAA
B → b | bS | aBB

Some words in the language: aababb, baabba.

Question 5
What is this language?

L(G5) = {w ∈ {a,b}∗ : #a(w) = #b(w)}

#x (w) — number of occurrences of x in w


Proving L(G5) = L = {w ∈ {a,b}∗ : #a(w) = #b(w)}

Again we prove:

1. w ∈ L(G5) =⇒ w ∈ L.

2. w ∈ L =⇒ w ∈ L(G5).

Claim 6 (Implies 1.)

If S ∗→ w ∈ {a,b,A,B}∗, then #a(w) + #A(w) = #b(w) + #B(w).

Proof: DIY

Claim 7 (Implies 2.)

Let w ∈ {a,b}∗ and k = k(w) = #a(w)−#b(w). Then:

1. If k = 0, then S ∗→ wS

2. If k > 0, then S ∗→ wB^k

3. If k < 0, then S ∗→ wA^|k|


Proving Claim 7

Let w ∈ {a,b}∗ and k = k(w) = #a(w) − #b(w). Then:

1. If k = 0, then S ∗→ wS

2. If k > 0, then S ∗→ wB^k

3. If k < 0, then S ∗→ wA^|k|

Proof by induction on |w |:

I Basis: w = ε, then k(w) = 0. Since S ∗→ S, we have S ∗→ εS

I Induction step: for w ∈ {a,b}^n, write w = w′σ (|w′| = n − 1)

Assume for concreteness (other cases proved analogously)

1. σ = a
2. k′ = k(w′) = #a(w′) − #b(w′) = k − 1 > 0

Then

I By i.h., S ∗→ w′B^k′

I Hence, S ∗→ w′B^k′ = w′BB^(k′−1) ∗→ w′aBBB^(k′−1) = wB^(k′+1) = wB^k


Designing CFGs

No general recipe, but a few rules of thumb:

I If the language is the union of simpler CFLs, take a grammar for each, rename variables (not terminals) so that the variable sets are disjoint, and add a new rule S → S1 | S2 | . . . | Si.

I For languages with linked substrings (like {0^n#1^n : n ≥ 0}), a rule of the form R → uRv is helpful to force the desired relation between the substrings.

I For a regular language, the grammar "follows" a DFA for the language (see the next slide).


CFG for regular languages

Given a DFA M = (Q, Σ, δ, q0, F)

CFG G for L(M)− {ε}:

1. Let R0 be the starting variable

2. Add rule Ri → aRj , for any qi ,qj ∈ Q and a ∈ Σ with δ(qi ,a) = qj

3. Add rule Ri → a, for any qi ∈ Q and a ∈ Σ s.t. δ(qi ,a) ∈ F .

Claim 8
L(G) = L(M) − {ε}

Proof? DIY

We can also show the opposite: any grammar of this form generates aregular language.


Section 1

Parse Trees


Parse trees

Definition 9 (parse tree)

A labeled tree T is a parse tree of CFG G = (V, Σ, R, S) if:
I Inner nodes are labeled by elements of V.
I Leaves are labeled by elements of V ∪ Σ ∪ {ε}.
I If n1, . . . , nk are the labels of the direct descendants (from left to right) of a node labeled A, then (A → n1 . . . nk) ∈ R

The yield of T is the string of labels of its leaves, read from left to right.

Theorem 10

Let G = (V, Σ, R, S) be a CFG, let A ∈ V and let x ∈ (Σ ∪ V)∗. Then A ∗→ x iff G has a parse tree with root labeled A that yields x.

Proof? DIY


Example 1

A→ 0A1|B

B → #

The (unique) derivation sequence from A to 000#111:
A → 0A1 → 00A11 → 000A111 → 000B111 → 000#111


Example 2

< SENTENCE > → < NP >< VERB >
< NP > → < ARTICLE >< NOUN >
< NOUN > → boy | girl | flower
< ARTICLE > → a | the
< VERB > → touches | likes | sees

[Parse tree of G2 for "a boy sees": SENTENCE has children NP and VERB; NP has children ARTICLE and NOUN; ARTICLE yields a, NOUN yields boy, VERB yields sees.]

Two derivation sequences from < SENTENCE > to "a boy sees":

I < SENTENCE > → < NP >< VERB > → < ARTICLE >< NOUN >< VERB > → a < NOUN >< VERB > → a boy < VERB > → a boy sees
I < SENTENCE > → < NP >< VERB > → < ARTICLE >< NOUN >< VERB > → < ARTICLE >< NOUN > sees → < ARTICLE > boy sees → a boy sees

Conclusion

I There may be several derivation sequences (i.e., S → . . . → w) showing that w ∈ L(G), but fewer (or even a single) parse trees rooted at S that yield w.

I A parse tree is indifferent to derivation order.

I We will focus on leftmost derivations, where at every step the left-most variable is replaced.


Section 2

Ambiguity in Context-Free Languages


Ambiguity in CFLs

Grammar G: E → E + E | E × E | (E) | a

[Two different parse trees of G for the string a + a × a: one groups the addition first, (a + a) × a, the other groups the multiplication first, a + (a × a).]

Is this a problem?

Ambiguity in CFLs cont.

Consider the grammar G′ = (V ,Σ,R,E), where

I V = {E, T, F}
I Σ = {a, +, ×, (, )}
I Rules:
E → E + T | T
T → T × F | F
F → (E) | a

Claim 11
L(G′) = L(G)

Proof: By induction on derivation length.

But G′ is not ambiguous. Proof: see Hopcroft 5.4 for a similar grammar.


Parsing tree of G′ for a + a× a

G′:

E → E + T | T
T → T × F | F
F → (E) | a

[Parse tree of G′ for a + a × a: the root E is expanded as E + T, so a × a is grouped under the T subtree; multiplication binds tighter than addition.]


Parsing tree of G′ for (a + a)× a

G′:

E → E + T | T
T → T × F | F
F → (E) | a

[Parse tree of G′ for (a + a) × a: the root is expanded via T → T × F, and the parenthesized a + a is derived inside F → (E).]


Ambiguity

Definition 12
A string w is derived ambiguously from grammar G if w has two or more different parse trees that generate it from G. A CFG is ambiguous if it ambiguously derives some string.

I Ambiguity is usually not only a syntactic notion but also a semantic one, implying multiple meanings for the same string. Think of a + a × a in the grammar G above.

I It is sometimes possible to eliminate ambiguity by finding a different context-free grammar generating the same language. This is true for the arithmetic expressions grammar.

I Some languages are inherently ambiguous.
I Example: L = {a^n b^n c^m d^m : n,m ≥ 1} ∪ {a^n b^m c^m d^n : n,m ≥ 1}
I L is a CFL. Proof? DIY
I L is inherently ambiguous. Proof?

The problematic strings are of the form a^n b^n c^n d^n, see Hopcroft


Section 3

Checking membership in CFLs


Checking membership in CFLs

Challenge

Given a CFG G and a string w, decide whether w ∈ L(G).

Initial Idea: Design an algorithm that tries all derivations.

Problem: If G does not generate w , we’ll never stop.

Possible solution: Use special grammars that are:

I just as expressive!

I better for checking membership.


Section 4

Chomsky Normal Form (CNF)


Chomsky Normal Form (CNF)

A simplified, canonical form of context free grammars.

G = (V, Σ, R, S) is in CNF if every rule in R has one of the following forms:

A → a, where A ∈ V and a ∈ Σ
A → BC, where A ∈ V and B, C ∈ V \ {S}
S → ε.

Simpler to analyze: each derivation step adds at most a single terminal, S never appears on a right-hand side, and ε can be derived only as the empty word itself (via S → ε).

What does a parse tree look like?

"Most" internal nodes have degree 2; the exceptions are parents of leaves, which have degree 1.


Generality of CNF

Theorem 13
Any context-free language is generated by a context-free grammar in Chomsky Normal Form.

Proof Idea:

I Add new start symbol S0.

I Eliminate all ε rules of the form A→ ε.

I Eliminate all “unit” rules of the form A→ B.

I Convert remaining “long rules” to proper form.


Add new start symbol

Add new start symbol S0 and rule S0 → S

(Guarantees that the new start symbol is never on the right-hand side of a rule.)

e.g.

S → A | ab | ε
A → baA | S

becomes

S0 → S
S → A | ab | ε
A → baA | S


Convert "long rules": terminals

S → ccAbA | bc | b
A → a | bb

becomes

S → CCABA | BC | b
A → a | BB
B → b
C → c


Convert "long rules": multiple nonterminals

S → AAAB

becomes

S → AN1

N1 → AN2

N2 → AB


Eliminate "ε-rules"

Repeat until all A → ε rules (A ≠ S) are gone:

I remove A → ε

I for any rule of the form C → AB or C → BA: add C → B.

I for any rule of the form C → AA: add C → A and C → ε (unless C → ε has already been removed).

I for any rule of the form C → A: add C → ε (unless C → ε has already been removed).


Eliminate "unit rules"

Repeat until all unit rules are removed:

I remove some unit rule A → B

I for each rule B → U (where U ∈ (Σ ∪ V)∗), add A → U (unless A → U is a previously removed unit rule)


CNF: Example

S → ASA | aB
A → B | S
B → b | ε

Is transformed into:

S0 → AA1 | UB | a | SA | AS
S → AA1 | UB | a | SA | AS
A → b | AA1 | UB | a | SA | AS

A1 → SA
U → a
B → b


CNF has bounded derivation length

Lemma 14
Let G be a CFG in CNF and let w ∈ L(G) with |w| = n ≥ 1. Then every derivation of w by G has length 2n − 1.

Proof? By induction on |w|:
I Base case is clear.

I For |w| > 1, the derivation must start with S → A1A2, where each Ai yields a non-empty substring wi of w and w = w1w2.

I By i.h. the derivation length is 1 + (2|w1| − 1) + (2|w2| − 1) = 2|w| − 1.

We next use CNF to check whether w ∈ L(G).


Section 5

Checking Membership for CFGs in Chomsky Normal Form


Checking membership for a CFG in CNF

Let G = (V, Σ, R, S) be a CFG in CNF, and let A ∈ V and w ∈ Σ∗.

Algorithm 15 (Derive(A,w))

I w = ε: if (A→ ε) ∈ R return TRUE, otherwise return FALSE.

I |w | = 1: if (A→ w) ∈ R return TRUE, otherwise return FALSE.

I |w | > 1: for each (A→ BC) ∈ R and non-trivial partition w = w1w2:

I Call Derive(B,w1) and Derive(C,w2).
I Return TRUE if both return TRUE.

I Return FALSE.

Claim: A ∗→ w ⇐⇒ Derive(A,w) = TRUE. Proof?

⇒: A ∗→ w =⇒ Derive(A,w) = TRUE, by induction on # of derivation steps.

⇐: Derive(A,w) = TRUE =⇒ A ∗→ w , by induction on |w |.

I Hence, Derive(S,w) = TRUE ⇐⇒ w ∈ L(G).
I Procedure Derive can also output a parse tree for w.
I Where have we used the fact that G is in CNF?


Time complexity of Derive

What is the time complexity T : N → N of Derive?

I Each recursive call tests |R| rules and n partitions.
I T(n) ≤ |R| · n · 2T(n − 1)

I T(n) ∈ O((|R| · n)^n).

Exponential :-(


Efficient variant

I Keep in memory the results of Derive(A,w).

I Number of (different) possible inputs: |V| · n^2. (?)
=⇒ Only |V| · n^2 calls, each taking O(|R| · n) time
=⇒ T(n) ∈ O(|R| · n^3 · |V|).

I Polynomial time!

I This approach is called Dynamic Programming

Basic idea:

I If the number of different inputs is limited, say I(n)
I Each run (excluding recursive calls) takes at most R(n) time
I Total running time is bounded by T(n) ≤ R(n) · I(n).


The CYK Parsing Algorithm

[Figure: the CYK table. Cell (X, i, j) is TRUE if variable X can derive the substring from position i to position j; for a rule A → BC, cell (A, i, j) is set to TRUE when (B, i, k) and (C, k+1, j) are TRUE for some k, and the input is accepted iff cell (S, 1, n) is TRUE.]

B. Majoros, Duke

(Cocke and Schwartz, 1970; Younger, 1967; Kasami, 1965)

Stochastic CFG

A stochastic context-free grammar (SCFG) is a CFG plus a probability distribution on productions:

G = (V, Σ, R, S, Pp)

where Pp : R → [0,1], and probabilities are normalized at the level of each nonterminal X:

∀ X ∈ V : ∑_{X→λ ∈ R} Pp(X → λ) = 1

The probability of a derivation S ⇒* x is the product of the probabilities of all its productions: ∏_i P(X_i → λ_i)

B. Majoros, Duke

CYK for SCFG

D(i, j, v) – probability of the optimal parse tree with root v for x_i … x_j

Initialization: ∀ 1 ≤ i ≤ n, v: D(i, i, v) = P(v → x_i)
Iteration: D(i, j, v) = max_{y,z,k} [ D(i, k, y) · D(k+1, j, z) · P(v → yz) ]
Termination: P(x, p*) = D(1, L, S), where p* is the optimal parse

Complexity: O(L^3 M^3) time; O(L^2 M) memory (L – length; M – number of non-terminals)
[compare to HMM: O(L M^2) time, O(L M) memory]

