Computational Models - Lecture 4 (tau-cm2019.wdfiles.com/local--files/course-schedule/Lecture-4.pdf)


Computational Models - Lecture 4

Handout Mode

Roded Sharan.

Tel Aviv University.

March, 2019


Talk Outline

I Context Free Grammars/Languages (CFG/CFL)

I Chomsky Normal Form (CNF)

I Checking membership in a CNF grammar

I Stochastic CFG

I Sipser’s book, 2.1

I Additional reading: Hopcroft, 5.4 & 7.1; Durbin, 9.6


Context Free Grammars (CFG)

An example of a context free grammar, G1:

I A→ 0A1

I A→ B

I B → #

Terminology:

I Each line is a substitution rule or production.

I Each rule has the form: symbol → string. The left-hand symbol is a variable (usually upper-case).

I A string consists of variables and terminals.

I One variable is the start variable (lhs of top rule). In this case, it is A.


Rules for generating strings

I Write down the start variable.

I Pick a variable written down in the current string and a rule whose left-hand side is that variable.

I Replace that variable with the right-hand side of that rule.

I Repeat until no variables remain.

I Return final string (concatenation of terminals).

The process is inherently nondeterministic.


Example

Grammar G1:
I A → 0A1
I A → B

I B → #

Derivation with G1:

A → 0A1 → 00A11 → 000A111 → 000B111 → 000#111

Question 1
What strings can be generated in this way from the grammar G1?

Answer: Exactly those of the form 0^n#1^n (n ≥ 0).

Context-Free Languages (CFL)

The language generated in this way is called the language of the grammar.

For example, L(G1) = {0^n#1^n : n ≥ 0}.

Any language generated by a context-free grammar is called a context-free language.


A useful abbreviation

Rules with same variable on left hand side

A → 0A1
A → B

are written as:

A→ 0A1 | B


English-like sentences

A grammar G2 to describe a few English sentences:

< SENTENCE > → < NP >< VERB >

< NP > → < ARTICLE >< NOUN >

< NOUN > → boy | girl | flower
< ARTICLE > → a | the

< VERB > → touches | likes | sees

A specific derivation in G2:

< SENTENCE > → < NP >< VERB >

→ < ARTICLE >< NOUN >< VERB >

→ a < NOUN >< VERB >

→ a boy < VERB >

→ a boy sees

More strings generated by G2: a flower sees, the girl touches


Formal definition

A context-free grammar is a 4-tuple (V ,Σ,R,S), where

I V is a finite set of variables

I Σ is a finite set of terminals (V ∩ Σ = ∅)
I R is a finite set of rules of the form A → x, where A ∈ V and x ∈ (V ∪ Σ)∗.

I S ∈ V is the start symbol.

I Let u, v ∈ (V ∪ Σ)∗. If (A → w) ∈ R, then uAv yields uwv, denoted uAv → uwv.

I u ∗→ v if u = v, or u → u1 → . . . → uk → v for some sequence u1, u2, . . . , uk

Note that if A ∗→ xBy and B ∗→ z, then A ∗→ xzy .

Definition 2

The language of the grammar G, denoted L(G), is {w ∈ Σ∗ : S ∗→ w}

where ∗→ is determined by G.

Example 1

G3 = ({S}, {a,b},R,S).

R (Rules): S → aSb | SS | ε

Some words in the language: aabb, aababb.

Question 3
What is this language?

Hint: think of parentheses, i.e., a is "(" and b is ")". Examples: (()), (()())

Using a larger alphabet (i.e., more terminals), e.g. ([]()), we can represent well-formed programs with many kinds of nested loops and "if then/else" statements.


Example 2

G4 = ({S}, {a,b},R,S).

R (Rules): S → aSa | bSb | ε

Some words in the language: abba, aabaabaa.

Question 4
What is this language?

L(G4) = {ww^R : w ∈ {a,b}∗} (almost but not quite the set of palindromes)

Proving L(G4) = {ww^R : w ∈ {a,b}∗}:

We show:

1. {ww^R} ⊆ L(G4).

2. L(G4) ⊆ {ww^R}.


Proving {ww^R} ⊆ L(G4)

We prove by induction on the length of z that z ∈ {ww^R} =⇒ z ∈ L(G4).

I Base: |z| = 0. Since (S → ε) ∈ R, it holds that z = ε ∈ L(G4).

I Inductive step. Let z = ww^R be a word of length 2k (k ≥ 1).

I Let w = σw′ (hence |w′| = k − 1)
I Hence, z = σw′(w′)^Rσ (since z = ww^R)
I By i.h., S ∗→ w′(w′)^R
I Hence, S → σSσ ∗→ σw′(w′)^Rσ = z


Proving z ∈ L(G4) =⇒ z ∈ {ww^R}

We prove by induction on the number of derivation steps used to derive z that z ∈ L(G4) =⇒ z ∈ {ww^R}.

I Single derivation. The only possible derivation is S → ε, and indeed ε ∈ {ww^R}.

I Inductive step. Assume S ∗→ z in k > 1 derivation steps.

I The first derivation step is S → σSσ for some σ ∈ {a,b}
I Hence z = σz′σ, and z′ is derived from S using k − 1 steps
I By i.h., z′ = w′(w′)^R
I Hence, z = σw′(w′)^Rσ = ww^R for w = σw′


Example 3

G5 = ({S,A,B}, {a,b},R,S)

R (Rules):
S → aB | bA | ε
A → a | aS | bAA
B → b | bS | aBB

Some words in the language: aababb, baabba.

Question 5
What is this language?

L(G5) = {w ∈ {a,b}∗ : #a(w) = #b(w)}

#x (w) — number of occurrences of x in w


Proving L(G5) = L = {w ∈ {a,b}∗ : #a(w) = #b(w)}

Again we prove:

1. w ∈ L(G5) =⇒ w ∈ L.

2. w ∈ L =⇒ w ∈ L(G5).

Claim 6 (Implies 1.)

If S ∗→ w ∈ {a,b,A,B}∗, then #a(w) + #A(w) = #b(w) + #B(w).

Proof: DIY

Claim 7 (Implies 2.)

Let w ∈ {a,b}∗ and k = k(w) = #a(w)−#b(w). Then:

1. If k = 0, then S ∗→ wS

2. If k > 0, then S ∗→ wB^k

3. If k < 0, then S ∗→ wA^|k|


Proving Claim 7

Let w ∈ {a,b}∗ and k = k(w) = #a(w) − #b(w). Then:

1. If k = 0, then S ∗→ wS

2. If k > 0, then S ∗→ wB^k

3. If k < 0, then S ∗→ wA^|k|

Proof by induction on |w |:

I Basis: w = ε, then k(w) = 0. Since S ∗→ S, we have S ∗→ εS

I Induction step: for w ∈ {a,b}^n, write w = w′σ (|w′| = n − 1)

Assume for concreteness (other cases proved analogously)

1. σ = a
2. k′ = k(w′) = #a(w′) − #b(w′) = k − 1 > 0

Then

I By i.h., S ∗→ w′B^k′

I Hence, S ∗→ w′B^k′ = w′BB^(k′−1) ∗→ w′aBBB^(k′−1) = wB^(k′+1) = wB^k


Designing CFGs

No general recipe, but a few rules of thumb:

I If the language is the union of simpler CFLs, take a grammar for each, rename variables (not terminals) so that the variable sets are disjoint, and add a new rule S → S1 | S2 | . . . | Si.

I For languages with linked substrings (like {0^n#1^n : n ≥ 0}), a rule of the form R → uRv is helpful to force the desired relation between the substrings.

I For a regular language, the grammar "follows" a DFA for the language (see the next slide).


CFG for regular languages

Given a DFA M = (Q, Σ, δ, q0, F)

CFG G for L(M)− {ε}:

1. Let R0 be the starting variable

2. Add rule Ri → aRj , for any qi ,qj ∈ Q and a ∈ Σ with δ(qi ,a) = qj

3. Add rule Ri → a, for any qi ∈ Q and a ∈ Σ s.t. δ(qi ,a) ∈ F .

Claim 8
L(G) = L(M) − {ε}

Proof? DIY

We can also show the opposite: any grammar of this form generates aregular language.


Section 1

Parse Trees


Parse trees

Definition 9 (parse tree)

A labeled tree T is a parse tree of CFG G = (V, Σ, R, S) if:
I Inner nodes are labeled by elements of V.
I Leaves are labeled by elements of V ∪ Σ ∪ {ε}.
I If n1, . . . , nk are the labels of the direct descendants (from left to right) of a node labeled A, then (A → n1 . . . nk) ∈ R

The yield of T is the string of labels of its leaves, read from left to right.

Theorem 10

Let G = (V, Σ, R, S) be a CFG, let A ∈ V and let x ∈ (Σ ∪ V)∗. Then A ∗→ x iff G has a parse tree with root labeled A that yields x.

Proof? DIY


Example 1

A→ 0A1|B

B → #

The (unique) derivation sequence from A to 000#111:
A → 0A1 → 00A11 → 000A111 → 000B111 → 000#111


Example 2

< SENTENCE > → < NP >< VERB >
< NP > → < ARTICLE >< NOUN >
< NOUN > → boy | girl | flower
< ARTICLE > → a | the
< VERB > → touches | likes | sees

[Parse tree of G2 for "a boy sees": SENTENCE has children NP and VERB; NP has children ARTICLE and NOUN; ARTICLE yields a, NOUN yields boy, VERB yields sees.]

Two derivation sequences from < SENTENCE > to "a boy sees":

I < SENTENCE > → < NP >< VERB > → < ARTICLE >< NOUN >< VERB > → a < NOUN >< VERB > → a boy < VERB > → a boy sees
I < SENTENCE > → < NP >< VERB > → < ARTICLE >< NOUN >< VERB > → < ARTICLE >< NOUN > sees → < ARTICLE > boy sees → a boy sees

Conclusion

I There may be several derivation sequences (i.e., S → . . . → w) showing that w ∈ L(G), but fewer (or even a single) parse trees rooted at S that yield w.

I A parse tree is indifferent to derivation order.

I We will focus on leftmost derivations, where at every step the left-most variable is replaced.


Section 2

Ambiguity in Context-Free Languages


Ambiguity in CFLs

Grammar G: E → E + E | E × E | (E) | a

[Two different parse trees of G for the string a + a × a: one groups the addition first, (a + a) × a, the other groups the multiplication first, a + (a × a).]

Is this a problem?

Ambiguity in CFLs cont.

Consider the grammar G′ = (V ,Σ,R,E), where

I V = {E, T, F}
I Σ = {a, +, ×, (, )}
I Rules:
E → E + T | T
T → T × F | F
F → (E) | a

Claim 11
L(G′) = L(G)

Proof: By induction on derivation length.

But G′ is not ambiguous. Proof: see Hopcroft 5.4 for a similar grammar.


Parsing tree of G′ for a + a× a

G′:

E → E + T | T
T → T × F | F
F → (E) | a

[Parse tree of G′ for a + a × a: the root E is expanded as E + T, so a × a is grouped under the T subtree; multiplication binds tighter than addition.]


Parsing tree of G′ for (a + a)× a

G′:

E → E + T | T
T → T × F | F
F → (E) | a

[Parse tree of G′ for (a + a) × a: the root is expanded via T → T × F, and the parenthesized a + a is derived inside F → (E).]


Ambiguity

Definition 12
A string w is derived ambiguously from grammar G if w has two or more different parse trees that generate it from G. A CFG is ambiguous if it ambiguously derives some string.

I Ambiguity is usually not only a syntactic notion but also a semantic one, implying multiple meanings for the same string. Think of a + a × a in the grammar G above.

I It is sometimes possible to eliminate ambiguity by finding a different context-free grammar generating the same language. This is true for the arithmetic expressions grammar.

I Some languages are inherently ambiguous.
I Example: L = {a^n b^n c^m d^m : n,m ≥ 1} ∪ {a^n b^m c^m d^n : n,m ≥ 1}
I L is a CFL. Proof? DIY
I L is inherently ambiguous. Proof?

The problematic strings are of the form a^n b^n c^n d^n, see Hopcroft


Section 3

Checking membership in CFLs


Checking membership in CFLs

Challenge

Given a CFG G and a string w, decide whether w ∈ L(G).

Initial Idea: Design an algorithm that tries all derivations.

Problem: If G does not generate w , we’ll never stop.

Possible solution: Use special grammars that are:

I just as expressive!

I better for checking membership.


Section 4

Chomsky Normal Form (CNF)


Chomsky Normal Form (CNF)

A simplified, canonical form of context free grammars.

G = (V, Σ, R, S) is in CNF if every rule in R has one of the following forms:

A → a, where A ∈ V and a ∈ Σ
A → BC, where A ∈ V and B, C ∈ V \ {S}
S → ε.

Simpler to analyze: each derivation step adds at most a single terminal, S never appears on a right-hand side, and ε can be derived only as the empty word itself (via S → ε).

What does a parse tree look like?

"Most" internal nodes have degree 2; the exceptions are parents of leaves, which have degree 1.


Generality of CNF

Theorem 13
Any context-free language is generated by a context-free grammar in Chomsky Normal Form.

Proof Idea:

I Add new start symbol S0.

I Eliminate all ε rules of the form A→ ε.

I Eliminate all “unit” rules of the form A→ B.

I Convert remaining “long rules” to proper form.


Add new start symbol

Add new start symbol S0 and rule S0 → S

(Guarantees that the new start symbol is never on the right-hand side of a rule.)

e.g.

S → A | ab | ε
A → baA | S

becomes

S0 → S
S → A | ab | ε
A → baA | S


Convert "long rules": terminals

S → ccAbA | bc | b
A → a | bb

becomes

S → CCABA | BC | b
A → a | BB
B → b
C → c


Convert "long rules": multiple nonterminals

S → AAAB

becomes

S → AN1

N1 → AN2

N2 → AB


Eliminate "ε-rules"

Repeat until all A → ε rules (A ≠ S) are gone:

I remove A → ε

I for any rule of the form C → AB or C → BA: add C → B.

I for any rule of the form C → AA: add C → A and C → ε (unless C → ε has already been removed).

I for any rule of the form C → A: add C → ε (unless C → ε has already been removed).


Eliminate "unit rules"

Repeat until all unit rules are removed:

I remove some unit rule A → B

I for each rule B → U (where U ∈ (Σ ∪ V)∗), add A → U (unless A → U is a previously removed unit rule)


CNF: Example

S → ASA | aB
A → B | S
B → b | ε

Is transformed into:

S0 → AA1 | UB | a | SA | AS
S → AA1 | UB | a | SA | AS
A → b | AA1 | UB | a | SA | AS

A1 → SA
U → a
B → b


CNF has bounded derivation length

Lemma 14
Let G be a CFG in CNF and let w ∈ L(G) with |w| = n ≥ 1. Then every derivation of w by G has length 2n − 1.

Proof? By induction on |w|:
I Base case is clear.

I For |w| > 1, the derivation must start with S → A1A2, where each Ai yields a non-empty substring wi of w and w = w1w2.

I By i.h. the derivation length is 1 + (2|w1| − 1) + (2|w2| − 1) = 2|w| − 1.

We next use CNF to check whether w ∈ L(G).


Section 5

Checking Membership for CFGs in Chomsky Normal Form


Checking membership for a CFG in CNF

Let G = (V, Σ, R, S) be a CFG in CNF, and let A ∈ V and w ∈ Σ∗.

Algorithm 15 (Derive(A,w))

I w = ε: if (A→ ε) ∈ R return TRUE, otherwise return FALSE.

I |w | = 1: if (A→ w) ∈ R return TRUE, otherwise return FALSE.

I |w | > 1: for each (A→ BC) ∈ R and non-trivial partition w = w1w2:

I Call Derive(B,w1) and Derive(C,w2).
I Return TRUE if both return TRUE.

I Return FALSE.

Claim: A ∗→ w ⇐⇒ Derive(A,w) = TRUE. Proof?

⇒: A ∗→ w =⇒ Derive(A,w) = TRUE, by induction on # of derivation steps.

⇐: Derive(A,w) = TRUE =⇒ A ∗→ w , by induction on |w |.

I Hence, Derive(S,w) = TRUE ⇐⇒ w ∈ L(G).
I Procedure Derive can also output a parse tree for w.
I Where have we used the fact that G is in CNF?


Time complexity of Derive

What is the time complexity T : N → N of Derive?

I Each recursive call tests |R| rules and n partitions.
I T(n) ≤ |R| · n · 2T(n − 1)

I T(n) ∈ O((|R| · n)^n).

Exponential :-(


Efficient variant

I Keep in memory the results of Derive(A,w).

I Number of (different) possible inputs: |V| · n^2. (?)
=⇒ Only |V| · n^2 calls, each taking O(|R| · n) time
=⇒ T(n) ∈ O(|R| · n^3 · |V|).

I Polynomial time!

I This approach is called Dynamic Programming

Basic idea:

I If the number of different inputs is limited, say I(n)
I Each run (excluding recursive calls) takes at most R(n) time
I Total running time is bounded by T(n) ≤ R(n) · I(n).


The CYK Parsing Algorithm

[Figure: the CYK table. Cell (X, i, j) is TRUE if variable X can derive the substring from position i to position j; for a rule A → BC, cell (A, i, j) is set to TRUE when (B, i, k) and (C, k+1, j) are TRUE for some k, and the input is accepted iff cell (S, 1, n) is TRUE.]

B. Majoros, Duke

(Cocke and Schwartz, 1970; Younger, 1967; Kasami, 1965)

Stochastic CFG

A stochastic context-free grammar (SCFG) is a CFG plus a probability distribution on productions:

G = (V, Σ, R, S, Pp)

where Pp : R → [0,1], and probabilities are normalized at the level of each nonterminal X:

∀ X ∈ V : ∑_{X→λ ∈ R} Pp(X → λ) = 1

The probability of a derivation S ⇒* x is the product of the probabilities of all its productions: ∏_i P(X_i → λ_i)

B. Majoros, Duke

CYK for SCFG

D(i, j, v) – probability of the optimal parse tree with root v for x_i … x_j

Initialization: ∀ 1 ≤ i ≤ n, v: D(i, i, v) = P(v → x_i)
Iteration: D(i, j, v) = max_{y,z,k} [ D(i, k, y) · D(k+1, j, z) · P(v → yz) ]
Termination: P(x, p*) = D(1, L, S), where p* is the optimal parse

Complexity: O(L^3 M^3) time; O(L^2 M) memory (L – length; M – number of non-terminals)
[compare to HMM: O(L M^2) time, O(L M) memory]

