Page 1: Natural Language Grammars and Parsing

MASTER DI SCIENZE COGNITIVE, GENOVA 2005

14-10-05

Natural Language Grammars and Parsing

Alessandro Mazzei
Dipartimento di Informatica
Università di Torino

Page 2: Natural Language Grammars and Parsing

Natural Language Processing

Phonetics: acoustic and perceptual elements

Phonology: inventory of basic sounds (phonemes) and basic rules for their combination, e.g. vowel harmony

Morphology: how morphemes combine to form words, relationship of phonemes to meaning

Syntax: sentence formation, word order and the formation of constituents from word groupings

Semantics: how word meanings recursively compose to form sentence meanings (from syntax to logical formulas)

Pragmatics: meaning that is not part of compositional meaning

Page 3: Natural Language Grammars and Parsing

Natural Language Syntax

Syntactic Parsing: deriving a syntactic structure from the word sequence

[Figure: word sequence → syntactic structure]

Page 4: Natural Language Grammars and Parsing

Natural Language Syntax

Syntactic Parsing: deriving a syntactic structure from the word sequence

Example: "Paolo ama Francesca"

[S [NP [N Paolo]] [VP [V ama] [N Francesca]]]   (grammatical relations: Paolo = sub, Francesca = obj)

Page 5: Natural Language Grammars and Parsing

Generative approach to Syntax

● Formal languages
● Generative grammars
● Context-Free Parser
● Probabilistic parsing
● Treebank

Page 6: Natural Language Grammars and Parsing

Formal Languages

Σ = {a1, a2, ..., an}  (alphabet)

Σ* = the set of all strings over Σ

Example: Σ = {0,1}; then 001, 111110, ε, 0 ∈ Σ*

Formal Language L ⊆ Σ*

Page 7: Natural Language Grammars and Parsing

Formal Languages

Σ = {0,1}

L1 = {01,0101,010101,01010101,...}

L2 = {01,0011,000111,00001111,...}

L3 = {11, 1111, 11111111, ...}
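As an aside (not from the slides), membership in these three example languages can be checked with a few lines of Python:

```python
import re

def in_L1(w):
    """(01)^n, n >= 1"""
    return re.fullmatch(r"(01)+", w) is not None

def in_L2(w):
    """0^n 1^n, n >= 1"""
    return re.fullmatch(r"0+1+", w) is not None and w.count("0") == w.count("1")

def in_L3(w):
    """1^(2^n), n >= 1: all 1s, length a power of two, at least 2"""
    return set(w) == {"1"} and len(w) >= 2 and (len(w) & (len(w) - 1)) == 0

assert in_L1("010101") and not in_L1("0011")
assert in_L2("000111") and not in_L2("0101")
assert in_L3("1111") and not in_L3("111")
```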

Page 8: Natural Language Grammars and Parsing

Formal Languages

Σ4 = {I,Anna,John,Harry,saw,see,swimming}

L4 = {I swim, I saw Harry swimming,...}

Page 9: Natural Language Grammars and Parsing

Natural and Formal languages

Σ = {a,aback,...,zoom,zucchini}

Natural Language L ⊆ Σ*

Page 10: Natural Language Grammars and Parsing

Generative approach to Syntax

● Formal languages
● Generative grammars
● Context-Free Parser
● Probabilistic parsing
● Treebank

Page 11: Natural Language Grammars and Parsing

Rewriting Systems

● Turing, Post

Rewriting rule: Ψ → θ

Page 12: Natural Language Grammars and Parsing

Generative grammar

G=(Σ,V,S,P)

Σ = alphabet (terminal symbols)

V = {A, B, ...} (non-terminal symbols)

S ∈ V (start symbol)

P = {Ψ → θ, ...} (rewriting rules)

Page 13: Natural Language Grammars and Parsing

Grammar and derivation

If A → β ∈ P:

αAγ ⇒ αβγ   (directly derives)

If α1 ⇒ α2, α2 ⇒ α3, ..., αm-1 ⇒ αm:

α1 ⇒* αm   (derives)

L(G) = {x ∈ Σ* : S ⇒* x}

Page 14: Natural Language Grammars and Parsing

Grammar 1

● G1=({0,1},{A,B},A,{A→0B,B→1A,B→1})

A⇒0B⇒01

A⇒0B⇒01A⇒010B⇒0101

A⇒0B⇒01A⇒010B⇒0101A⇒01010B⇒010101

L(G1)={01,0101,010101,...}

Page 15: Natural Language Grammars and Parsing

Grammar 2

● G2=({0,1},{S},S,{S→0S1,S→01})

S⇒01

S⇒0S1⇒0011

S⇒0S1⇒00S11⇒000111

L(G2)={01,0011,000111,...}
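As an aside (not on the slides), the derivations of G1 and G2 can be simulated with a brute-force Python sketch that repeatedly rewrites the leftmost non-terminal, breadth-first:

```python
from collections import deque

def generate(grammar, start, max_len=8, limit=20):
    """Breadth-first expansion of sentential forms; collects terminal strings.
    Only practical for tiny grammars like G1 and G2."""
    results, queue = [], deque([(start,)])
    while queue and len(results) < limit:
        form = queue.popleft()
        if all(sym not in grammar for sym in form):   # all symbols are terminals
            results.append("".join(form))
            continue
        if len(form) > max_len:                       # stop expanding very long forms
            continue
        # rewrite the leftmost non-terminal in every possible way
        i = next(k for k, sym in enumerate(form) if sym in grammar)
        for rhs in grammar[form[i]]:
            queue.append(form[:i] + rhs + form[i + 1:])
    return results

G1 = {"A": [("0", "B")], "B": [("1", "A"), ("1",)]}
G2 = {"S": [("0", "S", "1"), ("0", "1")]}
print(generate(G1, "A"))   # ['01', '0101', '010101', ...]
print(generate(G2, "S"))   # ['01', '0011', '000111', ...]
```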

Page 16: Natural Language Grammars and Parsing

Derivation tree

S⇒0S1⇒00S11⇒000111

Derivation tree: [S 0 [S 0 [S 0 1] 1] 1]

Page 17: Natural Language Grammars and Parsing

Generative Grammars and Natural Languages

● Generative Grammars can model a natural language as a formal language

● The derivation tree can model the syntactic structure of sentences

Page 18: Natural Language Grammars and Parsing

Grammar 3

● G4 = (Σ4, {S, NP, VP, V1, V2}, S, P4)

Σ4 = {I, Anna, John, Harry, saw, see, swimming}

P4 = {S → NP VP, VP → V1 S, VP → V2, NP → I | John | Harry | Anna, V1 → saw | see, V2 → swimming}

Page 19: Natural Language Grammars and Parsing

Grammar 3

● G4 = (Σ4, {S, NP, VP, V1, V2}, S, P4)

S ⇒ NP VP ⇒ I VP ⇒ I V1 S ⇒ I saw S ⇒ I saw NP VP ⇒ I saw Harry VP ⇒ I saw Harry V2 ⇒ I saw Harry swimming

L(G4) = {I swim, I saw Harry swimming, ...}

Page 20: Natural Language Grammars and Parsing

Grammar 3

Derivation tree:

[S [NP I] [VP [V1 saw] [S [NP Harry] [VP [V2 swimming]]]]]

S ⇒ NP VP ⇒ I VP ⇒ I V1 S ⇒ I saw S ⇒ I saw NP VP ⇒ I saw Harry VP ⇒ I saw Harry V2 ⇒ I saw Harry swimming
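For illustration only (not part of the slides), the same grammar can be written and parsed with the NLTK toolkit, assuming NLTK is available:

```python
import nltk

# Grammar G4 from the slide, in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
S  -> NP VP
VP -> V1 S | V2
NP -> 'I' | 'Anna' | 'John' | 'Harry'
V1 -> 'saw' | 'see'
V2 -> 'swimming'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I saw Harry swimming".split()):
    print(tree)   # (S (NP I) (VP (V1 saw) (S (NP Harry) (VP (V2 swimming)))))
```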

Page 21: Natural Language Grammars and Parsing

Generative Power

● What is the smallest class of generative grammars that can generate the natural languages?

● Weak vs. Strong Generative power

Page 22: Natural Language Grammars and Parsing

Languages Chomsky hierarchy

[Figure: Chomsky hierarchy of languages, with example languages (ab)^n, a^n b^n, a^n b^n c^n, a^(2^n), L_Diag]

● Linear: A → aB
● Context-free: S → aSb
● Context-sensitive: Caa → aaCa
● Type 0: Ψ → θ

Page 23: Natural Language Grammars and Parsing

Languages Chomsky hierarchy

[Figure: Chomsky hierarchy of languages, with example languages (ab)^n, a^n b^n, a^n b^n c^n, a^(2^n), L_Diag]

● Linear: A → aB
● Context-free: S → aSb
● Mildly Context-sensitive: CB → f(C,B)
● Context-sensitive: Caa → aaCa
● Type 0: Ψ → θ

Page 24: Natural Language Grammars and Parsing

Generative approach to Syntax

● Formal languages
● Generative grammars
● Context-Free Parser
● Probabilistic parsing
● Treebank

Page 25: Natural Language Grammars and Parsing

Context-Free Grammars

G = (Σ, V, S, P), with rules of the form A → β (A ∈ V)

● Constituency
● Grammatical relations
● Subcategorization

Page 26: Natural Language Grammars and Parsing

Constituency

Constituent = group of contiguous (?!) words

● that act as a unit [Fodor-Bever, Bock-Loebell]

● that have syntactic properties, e.g. preposing/postposing, substitutability

Noun Phrases (NP), Verb Phrases (VP), ...

● CFG: constituents ⇔ non-terminal symbols V

Page 27: Natural Language Grammars and Parsing

Grammar lexicon

Page 28: Natural Language Grammars and Parsing

Grammar rules

Page 29: Natural Language Grammars and Parsing

Derivation Tree

Page 30: Natural Language Grammars and Parsing

Generative approach to Syntax

● Formal languages
● Generative grammars
● Context-Free Parser
● Probabilistic parsing
● Treebank

Page 31: Natural Language Grammars and Parsing

Parser

Parser: "Paolo ama Francesca" → [S [NP [N Paolo]] [VP [V ama] [N Francesca]]]

Page 32: Natural Language Grammars and Parsing

Anatomy of a Parser

(1) Grammar

Context-Free, ...

(2) Algorithm

I. Search strategy: top-down, bottom-up, left-to-right, ...

II. Memory organization: back-tracking, dynamic programming, ...

(3) Oracle

Probabilistic, rule-based, ...

Page 33: Natural Language Grammars and Parsing

Grammar

Page 34: Natural Language Grammars and Parsing

Target Parse

Page 35: Natural Language Grammars and Parsing

Top-Down

Page 36: Natural Language Grammars and Parsing

Bottom-Up

Page 37: Natural Language Grammars and Parsing

Parser 1

(1) Grammar

Context-Free, ...

(2) Algorithm

I. Search strategy: top-down, bottom-up, left-to-right, ...

II. Memory organization: back-tracking, dynamic programming, ...

(3) Oracle

Probabilistic, rule-based, ...

Page 38: Natural Language Grammars and Parsing

Parser 1 (1)

S→NP VP, NP→DET Nom, NP→PropN

S→AUX NP VP, AUX→does, NP→DET Nom

DET→this, Nom→Noun

Noun→flight, VP→Verb

Page 39: Natural Language Grammars and Parsing

Parser 1 (2)

VP→Verb NP, Verb→include

NP→Det Nom, Det→a

Nom→Noun

Noun→meal

Page 40: Natural Language Grammars and Parsing

Left-Recursion

NP → NP PP
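A top-down parser expanding NP → NP PP keeps re-deriving NP without consuming any input, so the search never terminates. A standard textbook remedy (not shown on the slide), assuming the grammar also has a non-recursive rule such as NP → N, is to rewrite the recursion to the right:

NP → N NP'
NP' → PP NP' | ε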

Page 41: Natural Language Grammars and Parsing

Repeated Parsing subtrees

Page 42: Natural Language Grammars and Parsing

Ambiguity

● One sentence can have several "legal" parse trees

● 15 words ⇒ ~1,000,000 parse trees

Dynamic Programming ⇒ Earley Algorithm

Page 43: Natural Language Grammars and Parsing

Generative approach to Syntax

● Formal languages
● Generative grammars
● Context-Free Parser
● Probabilistic parsing
● Treebank

Page 44: Natural Language Grammars and Parsing

Probabilistic CFG

G=(Σ,V,S,P)

A → β [p] p ∈ (0,1)
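The probability of a parse tree T is then the product of the probabilities of all rule applications in its derivation, which is what the computation on the following slides multiplies out:

P(T) = ∏ p(A → β), over every rule application A → β used in T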

Page 45: Natural Language Grammars and Parsing

PCFG

Page 46: Natural Language Grammars and Parsing

PCFG

P(Ta) = .15 × .4 × .05 × .05 × .35 × .75 × .4 × .4 × .4 × .3 × .4 × .5 = 1.5 × 10⁻⁶

P(Tb) = .15 × .4 × .4 × .05 × .05 × .75 × .4 × .4 × .4 × .3 × .4 × .5 = 1.7 × 10⁻⁶

Page 47: Natural Language Grammars and Parsing

Parser 2 (CKY)

(1) Grammar

Context-Free, ...

(2) Algorithm

I. Search strategy: top-down, bottom-up, left-to-right, ...

II. Memory organization: back-tracking, dynamic programming, ...

(3) Oracle

Probabilistic, rule-based, ...

Page 48: Natural Language Grammars and Parsing

CKY idea

Given the rules A → BC [pA] and D → BC [pD], the probability of A (or D) spanning words W1...W4 is computed from the probabilities of the sub-spans:

P(1,4,A) = pA × P(1,2,B) × P(3,4,C)

P(1,4,D) = pD × P(1,2,B) × P(3,4,C)

[Figure: chart over W1 W2 W3 W4 W5, with B covering W1 W2, C covering W3 W4, and A/D covering W1...W4]
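A compact probabilistic CKY sketch in Python, following the idea above; the toy grammar (in Chomsky Normal Form) and its probabilities are illustrative and not taken from the slides:

```python
from collections import defaultdict

lexical = {                                   # A -> w  [p]
    ("NP", "Paolo"): 0.5,
    ("NP", "Francesca"): 0.5,
    ("V", "ama"): 1.0,
}
binary = [                                    # A -> B C  [p]
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 1.0),
]

def cky(words):
    n = len(words)
    # P[(i, j)][A] = best probability of A spanning words[i:j]
    P = defaultdict(dict)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                P[(i, i + 1)][A] = p
    for span in range(2, n + 1):              # widen spans bottom-up
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # split point
                for A, B, C, p in binary:
                    if B in P[(i, k)] and C in P[(k, j)]:
                        prob = p * P[(i, k)][B] * P[(k, j)][C]
                        if prob > P[(i, j)].get(A, 0.0):
                            P[(i, j)][A] = prob
    return P[(0, n)].get("S", 0.0)

# 0.25 = 1.0 (S->NP VP) * 0.5 (NP->Paolo) * 1.0 (VP->V NP) * 1.0 (V->ama) * 0.5 (NP->Francesca)
print(cky("Paolo ama Francesca".split()))
```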

Page 49: Natural Language Grammars and Parsing

Parser 2 (CKY)

Page 50: Natural Language Grammars and Parsing

Generative approach to Syntax

● Formal languages
● Generative grammars
● Context-Free Parser
● Probabilistic parsing
● Treebank

Page 51: Natural Language Grammars and Parsing

Treebank

● How can we estimate the probabilities of a PCFG? By counting

● Treebank: a collection of syntactically annotated sentences (trees)

● Penn TB: 1M words

Page 52: Natural Language Grammars and Parsing

Treebank Grammars (PCFG)

P(A→β)=Count(A→β)/Count(A)

P(S→NP VP) =2/2=1 P(NP→N) =2/2=1

P(VP→V N) =1/2=.5 P(VP→V) =1/2=.5

P(N→Paolo) =2/3=.66 P(N→Francesca) =1/3=.33

P(V→corre) =1/2=.5 P(V→ama) =1/2=.5

Treebank (two trees):

[S [NP [N Paolo]] [VP [V ama] [N Francesca]]]

[S [NP [N Paolo]] [VP [V corre]]]
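The same counts can be reproduced with a short Python sketch, writing the two slide trees as nested tuples (illustration only):

```python
from collections import Counter

# Toy treebank: the two trees from the slide, as (label, child, child, ...) tuples.
treebank = [
    ("S", ("NP", ("N", "Paolo")), ("VP", ("V", "ama"), ("N", "Francesca"))),
    ("S", ("NP", ("N", "Paolo")), ("VP", ("V", "corre"))),
]

rule_counts = Counter()
lhs_counts = Counter()

def count_rules(node):
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        rhs = (children[0],)                  # lexical rule, e.g. N -> Paolo
    else:
        rhs = tuple(child[0] for child in children)
        for child in children:
            count_rules(child)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1

for tree in treebank:
    count_rules(tree)

# Relative-frequency estimate: P(A -> beta) = Count(A -> beta) / Count(A)
for (lhs, rhs), c in sorted(rule_counts.items()):
    print(f"P({lhs} -> {' '.join(rhs)}) = {c}/{lhs_counts[lhs]} = {c / lhs_counts[lhs]:.2f}")
```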

Page 53: Natural Language Grammars and Parsing

References

● Speech and Language Processing, D. Jurafsky and J.H. Martin, Prentice Hall, 2000

● Introduction to Automata Theory, Languages, and Computation, J.E. Hopcroft and J.D. Ullman, Addison-Wesley, 1979

● Natural Language Understanding, J.F. Allen, Benjamin/Cummings, 1995