CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 28 – Grammar; Constituency, Dependency) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 21st March, 2011

Dec 26, 2015


Abner Sharp
Page 1:

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 28– Grammar; Constituency, Dependency)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

21st March, 2011

Page 2:

Grammar

A finite set of rules

• that generates all and only the sentences of a language;

• that assigns an appropriate structural description to each one.

Page 3:

Grammatical Analysis Techniques

Two main devices:

• Breaking up a String
– Sequential
– Hierarchical
– Transformational

• Labeling the Constituents
– Morphological
– Categorial
– Functional

Page 4:

Breaking up and Labeling

• Sequential Breaking up
– Sequential Breaking up and Morphological Labeling
– Sequential Breaking up and Categorial Labeling
– Sequential Breaking up and Functional Labeling

• Hierarchical Breaking up
– Hierarchical Breaking up and Categorial Labeling
– Hierarchical Breaking up and Functional Labeling

Page 5:

Sequential Breaking up

• That student solved the problems.

that + student + solve + ed + the + problem + s

Page 6:

Sequential Breaking up and Morphological Labeling

• That student solved the problems.

that  student  solve  ed     the   problem  s
word  word     stem   affix  word  stem     affix
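The segmentation and labels on this slide can be reproduced with a very small sketch. The lexicon below (STEMS, AFFIXES) and the function names are hypothetical and cover only this one sentence; the single spelling rule (a stem-final "e" is dropped before "-ed") is an assumption added so that "solved" splits into solve + ed as on the slide. It is not a general morphological analyser.

```python
# Toy lexicon (hypothetical): just enough for "That student solved the problems."
STEMS = ("solve", "problem")
AFFIXES = ("ed", "s")


def split_word(word):
    """Return (piece, label) pairs for one word: stem + affix, or a bare word."""
    for stem in STEMS:
        for affix in AFFIXES:
            joined = stem + affix
            # Assumed spelling rule: solve + ed is written "solved", not "solveed".
            if stem.endswith("e") and affix.startswith("e"):
                joined = stem[:-1] + affix
            if word == joined:
                return [(stem, "stem"), (affix, "affix")]
    return [(word, "word")]


def label(sentence):
    """Break a sentence up sequentially and label each piece morphologically."""
    words = sentence.lower().rstrip(".").split()
    return [piece for word in words for piece in split_word(word)]


for piece, tag in label("That student solved the problems."):
    print(piece, tag)
```

Words that are not in the toy lexicon fall through to the label "word", which mirrors how the slide treats that, student and the.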

Page 7:

Sequential Breaking up and Categorial Labeling

• This boy can solve the problem.

this  boy  can  solve  the  problem
Det   N    Aux  V      Det  N

• They called her a taxi.

They  call  ed     her   a    taxi
Pron  V     Affix  Pron  Det  N

Page 8:

Sequential Breaking up and Functional Labeling

• They called her a taxi.

They     called  her            a taxi
Subject  Verbal  Direct Object  Indirect Object

They     called  her              a taxi
Subject  Verbal  Indirect Object  Direct Object

Page 9:

Hierarchical Breaking up

• Old men and women

[Old [men and women]]

[[Old men] and women]

Page 10:

Hierarchical Breaking up and Categorial Labeling

• Poor John ran away.

(S
  (NP (A Poor) (N John))
  (VP (V ran) (Adv away)))

Page 11:

Hierarchical Breaking up and Functional Labeling

• Immediate Constituent (IC) Analysis
• Construction types in terms of the function of the constituents:
– Predication (subject + predicate)
– Modification (modifier + head)
– Complementation (verbal + complement)
– Subordination (subordinator + dependent unit)
– Coordination (independent unit + coordinator)

Page 12:

Predication

• [Birds]subject [fly]predicate

(S (Subject Birds) (Predicate fly))

Page 13:

Modification

• [A]modifier [flower]head
• John [slept]head [in the room]modifier

(S
  (Subject John)
  (Predicate (Head slept) (Modifier in the room)))

Page 14:

Complementation

• He [saw]verbal [a lake]complement

(S
  (Subject He)
  (Predicate (Verbal saw) (Complement a lake)))

Page 15:

Subordination

• John slept [in]subordinator [the room]dependent unit

(S
  (Subject John)
  (Predicate
    (Head slept)
    (Modifier (Subordinator in) (Dependent Unit the room))))

Page 16:

Coordination

• [John came in time]independent unit [but]coordinator [Mary was not ready]independent unit

(S
  (Independent Unit John came in time)
  (Coordinator but)
  (Independent Unit Mary was not ready))

Page 17:

An Example

• In the morning, the sky looked much brighter.

(S
  (Modifier
    (Subordinator In)
    (DU (Modifier the) (Head morning)))
  (Head
    (Subject (Modifier the) (Head sky))
    (Predicate
      (Verbal looked)
      (Complement (Modifier much) (Head brighter)))))

Page 18:

Hierarchical Breaking up and Categorial / Functional Labeling

• Hierarchical Breaking up coupled with Categorial / Functional Labeling is a very powerful device.

• But there are ambiguities which demand something more powerful.

• E.g., Love of God
– Someone loves God
– God loves someone

Page 19:

Hierarchical Breaking up

• Love of God

Categorial Labeling:

(Noun Phrase love (Prepositional Phrase of God))

Functional Labeling:

((Head love) (Modifier (Sub of) (DU God)))

Page 20:

Types of Generative Grammar

Finite State Model (sequential)

Phrase Structure Model (sequential + hierarchical) + (categorial)

Transformational Model (sequential + hierarchical + transformational) + (categorial + functional)

Page 21:

Phrase Structure Grammar (PSG)

A phrase-structure grammar G is a four-tuple (V, T, S, P), where

• V is a finite set of symbols (the alphabet or vocabulary). E.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP, student, sing, etc.

• T is a finite set of terminal symbols: T ⊆ V. E.g., student, sing, etc.

• S is a distinguished non-terminal symbol, also called the start symbol: S ∈ V.

• P is a set of productions.
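The four-tuple can be written down directly as a data structure. The sketch below is one possible encoding, not a standard library API: the class name PSG, its field names and the toy grammar TOY are all hypothetical, and a production is assumed to be a pair (left-hand side, tuple of right-hand-side symbols). The checks encode the two conditions stated on the slide, T ⊆ V and S being a non-terminal in V.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSG:
    """A phrase-structure grammar G = (V, T, S, P); names are illustrative."""
    vocabulary: frozenset   # V: every symbol, terminal or not
    terminals: frozenset    # T, required to be a subset of V
    start: str              # S, required to be a non-terminal in V
    productions: tuple      # P: pairs (lhs, rhs) with rhs a tuple of symbols

    def __post_init__(self):
        if not self.terminals <= self.vocabulary:
            raise ValueError("T must be a subset of V")
        if self.start not in self.nonterminals:
            raise ValueError("S must be a non-terminal symbol in V")
        for lhs, rhs in self.productions:
            if lhs not in self.vocabulary or not set(rhs) <= self.vocabulary:
                raise ValueError(f"production uses unknown symbol: {lhs} -> {rhs}")

    @property
    def nonterminals(self):
        return self.vocabulary - self.terminals


# Toy grammar for "the boy hit the ball" (the category "V" here means verb).
TOY = PSG(
    vocabulary=frozenset({"S", "NP", "VP", "Det", "N", "V",
                          "the", "boy", "ball", "hit"}),
    terminals=frozenset({"the", "boy", "ball", "hit"}),
    start="S",
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V", "NP")),
        ("Det", ("the",)),
        ("N", ("boy",)),
        ("N", ("ball",)),
        ("V", ("hit",)),
    ),
)
print(sorted(TOY.nonterminals))
```

Keeping V as the full symbol set and deriving the non-terminals as V minus T follows the slide's definition; many textbooks instead list non-terminals and terminals as two disjoint sets.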

Page 22:

Noun Phrases

• John

(NP (N John))

• the student

(NP (Det the) (N student))

• the intelligent student

(NP (Det the) (AdjP intelligent) (N student))

Page 23:

Noun Phrase

• his first five PhD students

(NP (Det his) (Ord first) (Quant five) (N PhD) (N students))

Page 24:

Noun Phrase

• The five best students of my class

(NP (Det the) (Quant five) (AP best) (N students) (PP of my class))

Page 25:

Verb Phrases

• can sing

(VP (Aux can) (V sing))

• can hit the ball

(VP (Aux can) (V hit) (NP the ball))

Page 26:

Verb Phrase

• can give a flower to Mary

(VP (Aux can) (V give) (NP a flower) (PP to Mary))

Page 27:

Verb Phrase

• may make John the chairman

(VP (Aux may) (V make) (NP John) (NP the chairman))

Page 28:

Verb Phrase

• may find the book very interesting

(VP (Aux may) (V find) (NP the book) (AP very interesting))

Page 29:

Prepositional Phrases

• in the classroom

(PP (P in) (NP the classroom))

• near the river

(PP (P near) (NP the river))

Page 30:

Adjective Phrases

• intelligent

(AP (A intelligent))

• very honest

(AP (Degree very) (A honest))

• fond of sweets

(AP (A fond) (PP of sweets))

Page 31:

Adjective Phrase

• very worried that she might have done badly in the assignment

(AP
  (Degree very)
  (A worried)
  (S’ that she might have done badly in the assignment))

Page 32:

Phrase Structure Rules

Rewrite Rules:
(i) S → NP VP
(ii) NP → Det N
(iii) VP → V NP
(iv) Det → the
(v) N → man, boy, ball
(vi) V → hit

We interpret each rule X → Y as the instruction "rewrite X as Y".

• The boy hit the ball.

Page 33:

Derivation

• The boy hit the ball.

Sentence
NP + VP (i)
Det + N + VP (ii)
Det + N + V + NP (iii)
The + N + V + NP (iv)
The + boy + V + NP (v)
The + boy + hit + NP (vi)
The + boy + hit + Det + N (ii)
The + boy + hit + the + N (iv)
The + boy + hit + the + ball (v)
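The derivation can be replayed mechanically. In the sketch below each step names a rewrite rule by its label and, where the rule has several alternatives, which one to pick; the rule is applied to the leftmost occurrence of its left-hand side. The names RULES, STEPS, rewrite and derive are hypothetical, the six rules are assumed to be labelled (i) to (vi), and "boy" is assumed to be among the alternatives for N because the derivation uses it.

```python
# Rewrite rules keyed by label: (left-hand side, list of alternatives).
RULES = {
    "i": ("S", [["NP", "VP"]]),
    "ii": ("NP", [["Det", "N"]]),
    "iii": ("VP", [["V", "NP"]]),
    "iv": ("Det", [["the"]]),
    "v": ("N", [["man"], ["boy"], ["ball"]]),
    "vi": ("V", [["hit"]]),
}


def rewrite(form, label, choice=0):
    """Rewrite the leftmost occurrence of the rule's left-hand side."""
    lhs, alternatives = RULES[label]
    i = form.index(lhs)  # raises ValueError if the rule does not apply
    return form[:i] + alternatives[choice] + form[i + 1:]


def derive(steps, start="S"):
    """Apply the steps in order and return every sentential form."""
    forms = [[start]]
    for label, choice in steps:
        forms.append(rewrite(forms[-1], label, choice))
    return forms


# The nine rule applications of the slide's derivation, in the same order.
STEPS = [("i", 0), ("ii", 0), ("iii", 0), ("iv", 0), ("v", 1),
         ("vi", 0), ("ii", 0), ("iv", 0), ("v", 2)]

for form in derive(STEPS):
    print(" + ".join(form))
```

The sketch does not capitalise the sentence-initial "the"; otherwise each printed line corresponds to one line of the derivation.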

Page 34:

PSG Parse Tree

• The boy hit the ball.

(S
  (NP (Det the) (N boy))
  (VP
    (V hit)
    (NP (Det the) (N ball))))

Page 35:

PSG Parse Tree

• John wrote those words in the Book of Proverbs.

(S
  (NP (PropN John))
  (VP
    (V wrote)
    (NP those words)
    (PP
      (P in)
      (NP
        (NP the Book)
        (PP of Proverbs)))))

Page 36:

Penn POS Tags

• John wrote those words in the Book of Proverbs.

[ John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN [ the/DT Book/NN ] of/IN [ Proverbs/NNS ]
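The word/TAG notation is easy to read back into pairs. The sketch below is a minimal reader for this one-line format, assuming that tokens are separated by spaces, that the square brackets only mark chunk boundaries and can be skipped, and that the tag is whatever follows the last slash. The names SAMPLE and read_tagged are hypothetical.

```python
SAMPLE = ("[ John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN "
          "[ the/DT Book/NN ] of/IN [ Proverbs/NNS ]")


def read_tagged(text):
    """Return (word, tag) pairs, ignoring the chunk brackets."""
    pairs = []
    for token in text.split():
        if token in ("[", "]"):
            continue
        # Split on the last slash so a word containing "/" keeps its own slashes.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


for word, tag in read_tagged(SAMPLE):
    print(f"{word:10s}{tag}")
```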

Page 37:

Penn Treebank

• John wrote those words in the Book of Proverbs.

(S (NP-SBJ (NP John))
   (VP wrote
       (NP those words)
       (PP-LOC in
               (NP (NP-TTL (NP the Book)
                           (PP of (NP Proverbs)))))))
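A bracketed tree of this kind can be read into a nested structure with a short recursive reader. The sketch below assumes balanced parentheses and a label directly after every opening parenthesis, so the closing parentheses that the transcript lost are restored in SAMPLE; it does not handle the extra unlabelled outer pair that Penn Treebank files often wrap around a sentence. The names read_tree, leaves and SAMPLE are hypothetical.

```python
SAMPLE = ("(S (NP-SBJ (NP John)) (VP wrote (NP those words) "
          "(PP-LOC in (NP (NP-TTL (NP the Book) (PP of (NP Proverbs)))))))")


def read_tree(text):
    """Parse '(LABEL child ...)' into (label, [children]); words stay strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree


def leaves(tree):
    """Collect the words of the tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = read_tree(SAMPLE)
print(tree[0], " ".join(leaves(tree)))
```

Reading the leaves back in order recovers the original sentence, which is a quick check that the bracketing was transcribed correctly.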

Page 38:

PSG Parse Tree

• Official trading in the shares will start in Paris on Nov 6.

(S
  (NP
    (NP (AP (A official)) (N trading))
    (PP (P in) (NP the shares)))
  (VP
    (Aux will)
    (V start)
    (PP in Paris)
    (PP on Nov 6)))

Page 39:

Penn POS Tags

• Official trading in the shares will start in Paris on Nov 6.

[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]

Page 40:

Penn Treebank

• Official trading in the shares will start in Paris on Nov 6.

( (S (NP-SBJ (NP Official trading)
             (PP in (NP the shares)))
     (VP will
         (VP start
             (PP-LOC in (NP Paris))
             (PP-TMP on (NP (NP Nov 6)))))) )

Page 41:

Penn POS Tag Set

• Adjective: JJ
• Adverb: RB
• Cardinal Number: CD
• Determiner: DT
• Preposition: IN
• Coordinating Conjunction: CC
• Subordinating Conjunction: IN
• Singular Noun: NN
• Plural Noun: NNS
• Personal Pronoun: PRP
• Proper Noun: NNP
• Verb base form: VB
• Modal verb: MD
• Verb (3sg Pres): VBZ
• Wh-determiner: WDT
• Wh-pronoun: WP

Page 42:

Difference between constituency and dependency

Page 43:

Constituency Grammar

• Categorial
• Uses part of speech
• Context Free Grammar (CFG)
• Basic elements: Phrases

Page 44:

Dependency Grammar

• Functional
• Context Free Grammar
• Basic elements: Units of Predication / Modification / Complementation / Subordination / Co-ordination

Page 45:

Bridge between Constituency and Dependency parse

• Constituency uses phrases.
• Dependencies consist of head-modifier combinations.
• This is a cricket bat.
– Cricket (Category: N, Functional: Adj)
– Bat (Category: N, Functional: N)
• For free-word-order languages we use a dependency parser to uncover the relations between the words.
– Raam ne Shaam ko dekha. (Ram saw Shyam)
– Shaam ko Raam ne dekha. (Ram saw Shyam)
– Case markers cling to the nouns they subordinate.
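The point about case markers can be illustrated with a toy sketch: if each noun is read together with the marker that follows it, the two word orders give the same relations. Everything here is an assumption made for illustration: the mapping of "ne" to subject and "ko" to object, the verb-final word order, and the names CASE_ROLES and relations. It is not a dependency parser.

```python
# Assumed mapping from Hindi case markers to grammatical roles (illustrative).
CASE_ROLES = {"ne": "subject", "ko": "object"}


def relations(sentence):
    """Return (verb, {role: noun}) for a verb-final sentence with case markers."""
    words = sentence.rstrip(" .").split()
    verb = words[-1]  # assumption: the verb comes last
    roles = {}
    # A case marker clings to the noun immediately before it.
    for noun, marker in zip(words, words[1:]):
        if marker in CASE_ROLES:
            roles[CASE_ROLES[marker]] = noun
    return verb, roles


print(relations("Raam ne Shaam ko dekha."))
print(relations("Shaam ko Raam ne dekha."))
```

Both calls return the same verb and the same role assignment, because the roles are read off the markers rather than off the positions of the nouns.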

Page 46:

Example of CG and DG output

• Birds fly.

CG output:

(S
  (NP (N Birds))
  (VP (V fly)))

DG output:

(S (Subject Birds) (Predicate fly))

Page 47:

Some probabilistic parsers and why they are used

• Stanford, Collins, Charniak, RASP

Why probabilistic parsers?

• A single sentence can have multiple parses.

• A probability is calculated for each parse, and the parse with the highest probability is selected.

• This is needed in the many NLP applications that require parsing.
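A minimal sketch of this selection step, using the two bracketings of "old men and women" from the earlier slide. It assumes a probabilistic context-free grammar in which the probability of a parse is the product of the probabilities of the rules it uses. The rule probabilities are made up for illustration, word-level (lexical) probabilities are treated as 1.0, and the names RULE_PROB, tree_prob, NARROW and WIDE are hypothetical. None of this describes how the parsers listed above are actually implemented.

```python
from math import prod

# Made-up rule probabilities for NP expansions (they sum to 1.0).
RULE_PROB = {
    ("NP", ("A", "N")): 0.3,
    ("NP", ("A", "NP")): 0.1,
    ("NP", ("NP", "Conj", "NP")): 0.2,
    ("NP", ("N",)): 0.4,
}


def tree_prob(tree):
    """Probability of a parse: product of the probabilities of its rules."""
    label, children = tree
    if all(isinstance(child, str) for child in children):
        return 1.0  # category over a word; lexical probability assumed to be 1
    rhs = tuple(child[0] for child in children)
    return RULE_PROB[(label, rhs)] * prod(tree_prob(child) for child in children)


# [[old men] and women]: "old" modifies only "men".
NARROW = ("NP", [("NP", [("A", ["old"]), ("N", ["men"])]),
                 ("Conj", ["and"]),
                 ("NP", [("N", ["women"])])])

# [old [men and women]]: "old" modifies the whole coordination.
WIDE = ("NP", [("A", ["old"]),
               ("NP", [("NP", [("N", ["men"])]),
                       ("Conj", ["and"]),
                       ("NP", [("N", ["women"])])])])

best = max([NARROW, WIDE], key=tree_prob)
print(tree_prob(NARROW), tree_prob(WIDE), best is NARROW)
```

With these illustrative numbers the narrow reading scores 0.2 × 0.3 × 0.4 and the wide reading 0.1 × 0.2 × 0.4 × 0.4, so the narrow reading is selected; different probabilities would change the outcome.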