03/25/22 CPSC503 Winter 2010 1 CPSC 503 Computational Linguistics Lecture 8 Giuseppe Carenini

Dec 21, 2015


Elaine Ross

CPSC 503 Computational Linguistics

Lecture 8

Giuseppe Carenini


Today 5/10

• Start Syntax / Parsing (Chp 12!)


Knowledge-Formalisms Map (next three lectures)

Formalisms:
• State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
• Rule systems (and prob. versions): e.g., (Prob.) Context-Free Grammars
• Logical formalisms (First-Order Logics)
• AI planners

Linguistic knowledge:
• Morphology
• Syntax
• Semantics
• Pragmatics
• Discourse and Dialogue


Today 5/10

• English Syntax
• Context-Free Grammar for English
– Rules
– Trees
– Recursion
– Problems
• Start Parsing


Syntax

Def. The study of how sentences are formed by grouping and ordering words

Example: Ming and Sue prefer morning flights

* Ming Sue flights morning and prefer

Groups behave as a single unit with respect to Substitution, Movement, and Coordination


Syntax: Useful tasks

• Why should you care?
– Grammar checkers
– Basis for semantic interpretation
  • Question answering
  • Information extraction
  • Summarization
– Machine translation
– ……


Key Constituents – with heads (English)

• Noun phrases: (Det) N (PP)
• Verb phrases: (Qual) V (NP)
• Prepositional phrases: (Deg) P (NP)
• Adjective phrases: (Deg) A (PP)
• Sentences: (NP) (I) (VP)

Some simple specifiers

Category       Typical function        Examples
Determiner     specifier of N          the, a, this, no..
Qualifier      specifier of V          never, often..
Degree word    specifier of A or P     very, almost..

Complements?

(Specifier) X (Complement)


Key Constituents: Examples

• Noun phrases: (Det) N (PP), e.g., the cat on the table
• Verb phrases: (Qual) V (NP), e.g., never eat a cat
• Prepositional phrases: (Deg) P (NP), e.g., almost in the net
• Adjective phrases: (Deg) A (PP), e.g., very happy about it
• Sentences: (NP) (I) (VP), e.g., a mouse -- ate it


Context Free Grammar (Example)

• S -> NP VP
• NP -> Det NOMINAL
• NOMINAL -> Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left

Terminal

Non-terminal

Start-symbol
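
As a rough illustration of the three labels above, the example grammar can be written down as plain data and used to generate strings. This is only a sketch: the dictionary layout and the names GRAMMAR, START, is_terminal and generate are assumptions made for this example, not part of the lecture.

```python
import itertools

# Toy grammar from the slide: each non-terminal maps to its right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}
START = "S"  # start symbol


def is_terminal(symbol):
    # Anything without a rule of its own is treated as a terminal (a word).
    return symbol not in GRAMMAR


def generate(symbol=START):
    """Yield every word sequence derivable from `symbol` (finite grammars only)."""
    if is_terminal(symbol):
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol, in order.
        expansions = [list(generate(s)) for s in rhs]
        for parts in itertools.product(*expansions):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in generate()]
print(sentences)
```

With this grammar the only sentence generated is "a flight left". A recursive rule would make the language infinite, so this exhaustive generator is only suitable for toy grammars like this one.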


CFG: more complex example

[Figure: lexicon, and grammar with example phrases]


Context Free Grammars (CFGs)

• Define a Formal Language (un/grammatical sentences)
• Generative Formalism
– Generate strings in the language
– Reject strings not in the language
– Impose structures (trees) on strings in the language


CFG: Formal Definitions

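• 4-tuple (non-terminals, terminals, productions, start)
• $(N, \Sigma, P, S)$
• $P$ is a set of rules $A \rightarrow \alpha$; $A \in N$, $\alpha \in (\Sigma \cup N)^*$
• A derivation is the process of rewriting $\alpha_1$ into $\alpha_m$ (both strings in $(\Sigma \cup N)^*$) by applying a sequence of rules: $\alpha_1 \Rightarrow^* \alpha_m$
• $L_G = \{ w \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \}$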
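
To make the notion of a derivation concrete, the sketch below rewrites the start symbol step by step using the toy grammar from the earlier example slide. It is a minimal illustration, assuming a leftmost derivation that always picks the first rule for each non-terminal; the names RULES and leftmost_derivation are hypothetical.

```python
# Toy grammar (S -> NP VP, NP -> Det NOMINAL, ...) as non-terminal -> list of RHSs.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}


def leftmost_derivation(start, rules):
    """Return the sentential forms obtained by repeatedly rewriting the
    leftmost non-terminal with its first rule, until only terminals remain."""
    form = [start]
    steps = [form]
    while any(sym in rules for sym in form):
        i = next(k for k, sym in enumerate(form) if sym in rules)
        form = form[:i] + rules[form[i]][0] + form[i + 1:]
        steps.append(form)
    return steps


steps = leftmost_derivation("S", RULES)
print(" => ".join(" ".join(form) for form in steps))
```

Each intermediate form is a string over terminals and non-terminals, and the final form contains only terminals, so it is a member of the language generated by the grammar.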


Derivations as Trees

[Figure: parse tree for a derivation; surviving node labels: Nominal, flight]

Context Free?


CFG Parsing

• It is completely analogous to running a finite-state transducer with a tape
– It’s just more powerful
• Chpt. 13

[Figure: the parser takes "I prefer a morning flight" and produces a parse tree]


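Other Options

• Regular languages (FSA): $A \rightarrow xB$ or $A \rightarrow x$
– Too weak (e.g., cannot deal with recursion in a general way – no center-embedding)
• CFGs: $A \rightarrow \gamma$ (also produce more understandable and “useful” structure)
• Context-sensitive: $\alpha A \beta \rightarrow \alpha \gamma \beta$; $\gamma \neq \epsilon$
– Can be computationally intractable
• Turing equiv.: $\alpha \rightarrow \beta$; $\alpha \neq \epsilon$
– Too powerful / Computationally intractable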


Common Sentence-Types

• Declaratives: A plane left
S -> NP VP
• Imperatives: Leave!
S -> VP
• Yes-No Questions: Did the plane leave?
S -> Aux NP VP
• WH Questions: Which flights serve breakfast?
S -> WH NP VP
When did the plane leave?
S -> WH Aux NP VP


NP: more details

NP -> Specifiers N Complements

• NP -> (Predet)(Det)(Card)(Ord)(Quant) (AP) Nom
e.g., all the other cheap cars
• Nom -> Nom PP (PP) (PP)
e.g., reservation on BA456 from NY to YVR
• Nom -> Nom GerundVP
e.g., flight arriving on Monday
• Nom -> Nom RelClause
• RelClause -> (who | that) VP
e.g., flight that arrives in the evening


Conjunctive Constructions

• S -> S and S
– John went to NY and Mary followed him
• NP -> NP and NP
– John went to NY and Boston
• VP -> VP and VP
– John went to NY and visited MOMA
• …
• In fact the right rule for English is X -> X and X


Problems with CFGs

• Agreement

• Subcategorization


Agreement

• In English,
– Determiners and nouns have to agree in number
– Subjects and verbs have to agree in person and number
• Many languages have agreement systems that are far more complex than this (e.g., gender).


Agreement

Grammatical:
• This dog
• Those dogs
• This dog eats
• You have it
• Those dogs eat

Ungrammatical:
• *This dogs
• *Those dog
• *This dog eat
• *You has it
• *Those dogs eats


Possible CFG Solution

OLD Grammar
• S -> NP VP
• NP -> Det Nom
• VP -> V NP
• …

NEW Grammar (Sg = singular, Pl = plural)
• SgS -> SgNP SgVP
• PlS -> PlNP PlVP
• SgNP -> SgDet SgNom
• PlNP -> PlDet PlNom
• PlVP -> PlV NP
• SgVP3p -> SgV3p NP
• …


CFG Solution for Agreement

• It works and stays within the power of CFGs

• But it doesn’t scale all that well (explosion in the number of rules)
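
The rule explosion can be seen by mechanically copying each rule once per feature value. The sketch below is only an illustration under simple assumptions: number (Sg/Pl) is the only feature, and the helper expand_rule and the set of "agreeing" symbols per rule are made up for this example.

```python
def expand_rule(lhs, rhs, agreeing, values):
    """Return one copy of the rule per feature value, prefixing every symbol
    listed in `agreeing` with that value (e.g. NP becomes SgNP and PlNP)."""
    expanded = []
    for value in values:
        def mark(symbol):
            return value + symbol if symbol in agreeing else symbol
        expanded.append((mark(lhs), [mark(s) for s in rhs]))
    return expanded


# (lhs, rhs, symbols that must share the number feature)
OLD_GRAMMAR = [
    ("S", ["NP", "VP"], {"S", "NP", "VP"}),
    ("NP", ["Det", "Nom"], {"NP", "Det", "Nom"}),
    ("VP", ["V", "NP"], {"VP", "V"}),  # the object NP need not agree with the verb
]

new_grammar = []
for lhs, rhs, agreeing in OLD_GRAMMAR:
    new_grammar.extend(expand_rule(lhs, rhs, agreeing, ["Sg", "Pl"]))

for lhs, rhs in new_grammar:
    print(lhs, "->", " ".join(rhs))
print(len(OLD_GRAMMAR), "rules became", len(new_grammar))
```

Three rules become six with a two-valued feature. Adding person, or gender in other languages, multiplies the count again for every rule the feature touches.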


Subcategorization

• *John sneezed the book
• *I prefer United has a flight
• *Give with a flight

• Def. It expresses constraints that a predicate (verb here) places on the number and type of its arguments (see first table)


Subcategorization

• Sneeze: John sneezed

• Find: Please find [a flight to NY]NP

• Give: Give [me]NP[a cheaper fare]NP

• Help: Can you help [me]NP[with a flight]PP

• Prefer: I prefer [to leave earlier]TO-VP

• Told: I was told [United has a flight]S

• …


So?

• So the various rules for VPs overgenerate.
– They allow strings containing verbs and arguments that don’t go together
– For example:
  • VP -> V NP, therefore: Sneezed the book
  • VP -> V S, therefore: go she will go there


Possible CFG Solution

OLD Grammar
• VP -> V
• VP -> V NP
• VP -> V NP PP
• …

NEW Grammar
• VP -> IntransV
• VP -> TransV NP
• VP -> TransPPto NP PPto
• …
• TransPPto -> hand, give, ..

This solution has the same problem as the one for agreement
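
One way to picture subcategorization is as a table of complement frames per verb, checked before a VP is accepted. The sketch below is hypothetical: the SUBCAT table and the licenses function are assumptions for illustration, and the frames are loosely based on the examples in these slides rather than on a real lexicon.

```python
# Hypothetical subcategorization frames: verb -> allowed complement sequences.
SUBCAT = {
    "sneezed": [[]],                          # intransitive: no complements
    "find": [["NP"]],
    "give": [["NP", "NP"], ["NP", "PPto"]],
    "prefer": [["NP"], ["TO-VP"]],
    "told": [["S"]],
}


def licenses(verb, complements):
    """True if `verb` allows exactly this sequence of complement categories."""
    return list(complements) in SUBCAT.get(verb, [])


print(licenses("sneezed", []))       # John sneezed
print(licenses("sneezed", ["NP"]))   # *John sneezed the book
print(licenses("prefer", ["S"]))     # *I prefer United has a flight
```

Splitting V into IntransV, TransV, TransPPto and so on encodes the same table directly in the grammar, which is why the number of categories and rules grows with every new frame.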


CFG for NLP: summary

• CFGs cover most syntactic structure in English.
• But there are problems (over-generation)
– That can be dealt with adequately, although not elegantly, by staying within the CFG framework.
• Many practical computational grammars simply rely on CFG
• For more elegant / concise approaches see Chpt 15 “Features and Unification”


Dependency Grammars

• Syntactic structure: binary relations between words
• Links: grammatical function or very general semantic relation
• Abstract away from word-order variations (simpler grammars)
• Useful features in many NLP applications (for classification, summarization and NLG)


Today 5/10

• English Syntax
• Context-Free Grammar for English
– Rules
– Trees
– Recursion
– Problems
• Start Parsing (if time left)


Parsing with CFGs

Assign valid trees: a valid tree covers all and only the elements of the input and has an S at the top

[Figure: a sequence of words ("I prefer a morning flight") and a CFG go into the parser, which outputs valid parse trees]


Parsing as Search

CFG:
• S -> NP VP
• S -> Aux NP VP
• NP -> Det Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left, arrive
• Aux -> do, does

The CFG defines the search space of possible parse trees

Parsing: find all trees that cover all and only the words in the input


Constraints on Search

[Figure: a sequence of words ("I prefer a morning flight") and a CFG (the search space) go into the parser, which outputs valid parse trees]

Search Strategies:
• Top-down or goal-directed
• Bottom-up or data-directed


Top-Down Parsing

• Since we’re trying to find trees rooted with an S (Sentences), start with the rules that give us an S.
• Then work your way down from there to the words.

Input: flight


Next step: Top Down Space

• When POS categories are reached, reject trees whose leaves fail to match all words in the input



Bottom-Up Parsing

• Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.

[Figure: initial bottom-up trees built over the input word "flight"]
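
A data-directed search can be sketched as repeated reduction: wherever the right-hand side of a rule appears in the current string, replace it with the left-hand side, and accept if the string can be reduced to S. The code below is a naive illustration, assuming the small grammar from the "Parsing as Search" slide; RULES and bottom_up_recognize are names invented for this sketch, and it only recognizes, it does not build trees.

```python
# Grammar from the "Parsing as Search" slide, as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("S", ("Aux", "NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
    ("Verb", ("left",)),
    ("Verb", ("arrive",)),
    ("Aux", ("do",)),
    ("Aux", ("does",)),
]


def bottom_up_recognize(words, rules=RULES, start="S"):
    """Explore every sequence of reductions; True if one of them reduces
    the input words to the single start symbol."""
    agenda = [tuple(words)]
    seen = set(agenda)
    while agenda:
        form = agenda.pop()
        if form == (start,):
            return True
        for lhs, rhs in rules:
            n = len(rhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:
                    reduced = form[:i] + (lhs,) + form[i + n:]
                    if reduced not in seen:
                        seen.add(reduced)
                        agenda.append(reduced)
    return False


print(bottom_up_recognize("a flight left".split()))
print(bottom_up_recognize("does a flight arrive".split()))
print(bottom_up_recognize("flight a left".split()))
```

For "does a flight arrive" the search also builds the form "Aux S" (by reducing "NP VP" to S too early), which can never reach a lone S. Every structure it builds is consistent with the words, but not necessarily with any complete sentence.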


Two more steps: Bottom-Up Space

[Figure: bottom-up search space, two steps further]


Top-Down vs. Bottom-Up

• Top-down
– Only searches for trees that can be answers
– But suggests trees that are not consistent with the words
• Bottom-up
– Only forms trees consistent with the words
– Suggests trees that make no sense globally


So Combine Them

• Top-down: control strategy to generate trees
• Bottom-up: to filter out inappropriate parses

Top-down Control strategy:
• Depth vs. Breadth first
• Which node to try to expand next (left-most)
• Which grammar rule to use to expand a node (textual order)
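
The control strategy can be sketched as a small backtracking parser: depth-first, always expanding the left-most unexpanded node, and trying rules in the order they are written. This is a minimal illustration, assuming the grammar from the "Parsing as Search" slide; the function names are hypothetical, and the sketch would recurse forever on left-recursive rules such as Nom -> Nom PP.

```python
# Grammar from the "Parsing as Search" slide; rule order is the textual order.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"], ["arrive"]],
    "Aux": [["do"], ["does"]],
}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can cover words[pos:next_pos]."""
    if symbol not in GRAMMAR:  # terminal: must match the next input word
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # rules tried in textual order
        yield from expand_sequence(symbol, rhs, words, pos, [])


def expand_sequence(parent, rhs, words, pos, children):
    """Expand the symbols of `rhs` left to right, collecting child subtrees."""
    if not rhs:
        yield (parent, children), pos
        return
    for subtree, next_pos in expand(rhs[0], words, pos):
        yield from expand_sequence(parent, rhs[1:], words, next_pos,
                                   children + [subtree])


def parse(sentence):
    """Return all trees rooted in S that cover all and only the input words."""
    words = sentence.split()
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


trees = parse("does a flight arrive")
print(trees)
```

On this input the parser first tries S -> NP VP and expands NP down to Det before discovering that the first word is an auxiliary. That wasted work is what bottom-up filtering is meant to avoid.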


Top-Down, Depth-First, Left-to-Right Search

Sample sentence: “Does this flight include a meal?”


Example “Does this flight include a meal?”




Adding Bottom-up Filtering

The following sequence was a waste of time because an NP cannot generate a parse tree starting with an Aux

[Figure: sequence of top-down expansions attempted against an initial Aux]


Bottom-Up Filtering

Category    Left Corners
S           Det, Proper-Noun, Aux, Verb
NP          Det, Proper-Noun
Nominal     Noun
VP          Verb
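
A left-corner table like the one above can be computed from the grammar by following the first symbol of each rule until a part-of-speech category is reached. The sketch below is illustrative only: the grammar is an assumption chosen so that the result matches the table (the slides do not list the full grammar here), and left_corners is a hypothetical helper.

```python
# Assumed grammar over POS categories (not given explicitly on the slide).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}


def left_corners(grammar):
    """Map each non-terminal to the set of POS categories that can begin it."""
    table = {category: set() for category in grammar}
    changed = True
    while changed:  # iterate until no set grows any further
        changed = False
        for category, alternatives in grammar.items():
            for rhs in alternatives:
                first = rhs[0]
                # A POS category is its own left corner; a non-terminal
                # contributes whatever left corners it has so far.
                new = table[first] if first in grammar else {first}
                if not new <= table[category]:
                    table[category] |= new
                    changed = True
    return table


table = left_corners(GRAMMAR)
for category, corners in table.items():
    print(category, sorted(corners))
```

During top-down parsing, a rule is then only worth trying if the category of the next input word is among the left corners of the rule's first symbol.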


Problems with TD-BU-filtering

• Ambiguity
• Repeated Parsing

• SOLUTION: Earley Algorithm (once again dynamic programming!)


For Next Time

• Read Chapter 13 (Parsing)
• Optional: Read Chapter 15 (Features and Unification) – skip algorithms and implementation


Grammars and Constituency

• Of course, there’s nothing easy or obvious about how we come up with the right set of constituents and the rules that govern how they combine...
• That’s why there are so many different theories of grammar and competing analyses of the same data.
• The approach to grammar, and the analyses, adopted here are very generic (and don’t correspond to any modern linguistic theory of grammar).


Syntactic Notions so far...

• N-grams: the prob. distr. for the next word can be effectively approximated knowing only the previous n-1 words
• POS categories are based on:
– distributional properties (what other words can occur nearby)
– morphological properties (affixes they take)
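
As a reminder of the N-gram idea in the first bullet, a bigram model estimates the next-word distribution from counts of adjacent word pairs. The sketch below uses a tiny made-up corpus and plain relative frequencies with no smoothing; the corpus and the name p_next are assumptions for illustration.

```python
from collections import Counter

# Tiny made-up corpus; "." is treated as an ordinary token.
corpus = "a flight left . a flight arrived . a plane left .".split()

# Count each word as a context (every token except the last) and each adjacent pair.
contexts = Counter(corpus[:-1])
bigrams = Counter(zip(corpus, corpus[1:]))


def p_next(word, prev):
    """Relative-frequency estimate of P(word | prev); no smoothing."""
    return bigrams[(prev, word)] / contexts[prev]


print(p_next("flight", "a"))
print(p_next("plane", "a"))
```

Here "a" is followed by "flight" in two of its three occurrences, so the estimate for P(flight | a) is 2/3.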