Word classes and the distribution of words, and Part of Speech tagging Computational linguistics

Jan 15, 2016

Page 1: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Word classes and the distribution of words, and Part of Speech tagging

Computational linguistics

Page 2: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Eats shoots and leaves

• We move on to finding higher level structure of natural language. Most of this is expressed in terms of categories – categories of words, and categories of sequences of words called phrases.

Page 3: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Classes or categories of words

• Roughly: words whose distributions are very similar. Two words are in the same category iff we can substitute one for the other in a sentence and preserve grammaticality.

• We will return to this question.

Page 4: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Open and closed classes

• Open: Noun, verb, adjective.

• Closed: preposition, adverbs, conjunctions.

• Open: large classes, and more words can be added to them.

• Closed: small classes, and they are resistant to adding new members. A new preposition?

Page 5: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Two points of view

What’s real and central in grammar are notions like Noun and Verb (and Noun Phrase and Verb Phrase). Then we find real nouns, like dog and John and Monday. Many of them are good nouns, but some of them are defective; they don’t “do” all the things that they “should do”.

Page 6: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

2nd point of view

What’s real are sentences (or corpora):

John is leaving Wednesday with his dog.

When we look at a language, we find an enormous range of “places” where a given word can appear. (“Places” meaning environments, perhaps meanings). No two words are quite alike, but words do form clusters with regard to their grammatical behavior. For example, ...

Page 7: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

The days of the week (Monday…Sunday) share a lot in common. We can simplify our description by generalizing over that set of words.

John left __. John left last __. John leaves next __. He leaves on __. You must do it before __. Do it by __. Your horoscope for __. __’s weather forecast. The __ after Christmas.

*at __. *to __. *saw __. *We __. *I __.
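The frame test above can be approximated mechanically. The following is a minimal sketch, assuming a hypothetical toy corpus; the names `contexts` and `jaccard` are illustrative and not from the slides. It records the (previous word, next word) environments of each word and measures how much two words' environments overlap.

```python
from collections import defaultdict

# Hypothetical toy corpus; each sentence is a list of lowercase tokens.
CORPUS = [
    "john left monday".split(),
    "john left friday".split(),
    "he leaves on monday".split(),
    "he leaves on friday".split(),
    "do it by monday".split(),
    "do it by friday".split(),
    "john saw mary".split(),
    "he saw mary".split(),
]

def contexts(corpus):
    """Map each word to the set of (previous, next) environments it occurs in."""
    envs = defaultdict(set)
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            envs[padded[i]].add((padded[i - 1], padded[i + 1]))
    return envs

def jaccard(a, b):
    """Overlap of two environment sets; 1.0 means identical environments."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

ENVS = contexts(CORPUS)
print(jaccard(ENVS["monday"], ENVS["friday"]))  # 1.0: same environments
print(jaccard(ENVS["monday"], ENVS["saw"]))     # 0.0: no shared environment
```

On real text no two words would score exactly 1.0, which is the point of the second view: words cluster by degree of similarity rather than falling into perfectly clean classes.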

Page 8: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Proper given names

Likewise, proper given names (John, Jerry, …).

As we form larger and larger classes, there are fewer things that they have in common.

How do these J-words (!) differ from other “nouns”?

They rarely take articles (the Jim), relative clauses (Mary who bought a book), or adjectives, but they certainly can: the Jim I went to elementary school with, the Bush who made those campaign promises, a fresh and smiling Ralph Nader.

Page 9: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Back to first view

• Grammar consists of a set of non-terminal nodes, terminal nodes, a set of context-free expansion rules, and a lexicon, at the least.

• Depending on your analysis, also a set of transformations.

• Syntax is responsible for the generation of phrase-structures, whose terminal nodes are lexical categories.

• Lexical categories are expanded to words of the appropriate category.

Page 10: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Syntax

• Non-terminal categories: two correspond to semantic primitives (proposition and term); these are Sentence (S) and Noun Phrase (NP).

• Terminals: the categories into which words are put. Perhaps these are universal, perhaps they aren’t. (Some) Linguists tend to think they are; computational linguists tend to think they aren’t.

• Non-terminals based on terminal categories. Noun begets Noun Phrase, Adjective begets Adjective Phrase, etc.

• Context-free phrase structure rules: Non-terminal node expands to both non-terminals and terminal nodes.

• Terminals are expanded to words (“lexical elements”, in the parlance).

Page 11: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

[S [NP [N John]] [INFL might] [VP [V be] [VP [V sleeping]]]]

Page 12: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Syntactic rules

• S → NP + INFL + VP

• INFL → { can, could, may, might, will, should, do }

• VP → ( [Adv not] ) VP

• VP → V NP NP PP*

• VP → VP AdvP[hrase]

• VP → V (NP) S: allows for recursive structure: sentences within sentences, of unbounded length.
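To see how rules like these generate sentences, and how the S inside VP gives recursion, here is a minimal sketch of a top-down generator. It uses only a simplified subset of the rules; the NP → N rule, the bare VP → V rule, the lexicon, and the `max_depth` cutoff are all assumptions added so that the example produces words and always terminates.

```python
import random

# Simplified subset of the rules above. NP -> N and VP -> V are hypothetical
# additions so that every expansion can bottom out in words.
RULES = {
    "S": [["NP", "INFL", "VP"]],
    "VP": [["V"], ["V", "NP"], ["V", "S"]],  # last option: sentence inside sentence
    "NP": [["N"]],
}
# Hypothetical lexicon; the INFL words are the ones listed in the rules.
LEXICON = {
    "INFL": ["can", "could", "may", "might", "will", "should", "do"],
    "N": ["John", "Kim", "Terry"],
    "V": ["sleep", "leave", "say", "think"],
}

def generate(symbol, rng, depth=0, max_depth=3):
    """Expand `symbol` top-down; at max_depth only the first rule is used."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    options = RULES[symbol]
    rule = options[0] if depth >= max_depth else rng.choice(options)
    words = []
    for child in rule:
        words.extend(generate(child, rng, depth + 1, max_depth))
    return words

print(" ".join(generate("S", random.Random(0))))
```

Without the depth cutoff the grammar itself places no bound on sentence length, which is what "unbounded" means above. The sketch ignores subcategorization, so it will happily produce strings such as "sleep" followed by an embedded sentence.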

Page 13: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

S → NP + INFL + VP

S has other expansions in English, such as in infinitives; there, an INFL with to is found, but no tense, no auxiliary verbs, no dummy do.

S → NP + [INFL to ] + VP

It is important for John to leave, but not …*for John to should leave, …*for John should to leave, etc.

Page 14: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

• NP → det AdjP N’

• N’ → N PrepP

[NP [det The] [AP [A former]] [N’ [N king] [PP [P of] [NP [N England]]]]]

N (king) is the head of NP.

Page 15: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

N is head of NP

• The semantically central word:

A big book is a book.

And the one whose form is determined by the governing verb in a case-marking language, and the one that determines the number and gender of any words that agree with the NP.

Page 16: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Categories

We have 4 things in mind when we make them:

1. (Lexical categories): Morphological structure

2. Meaning (semantics)

3. External distribution

4. (Phrasal categories): internal distribution

...

Page 17: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Morphology

• What suffixes may appear with a given stem:

• Nouns: ’s, NULL, s

• Verbs: ed, s, ing, ed

• Adjectives: er, est, ness

Page 18: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Meaning

• Reference to objects in the world

• Reference to n-ary predicates:

• unary: tall, sleep

• binary: eat (human, food), saw (human, object)

• ternary: give (human, human, object)

Page 19: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

External distribution

Roughly speaking, this means what this word (or phrase) can appear next to (before, after).

Nouns appear after articles (= noun determiners, nominal determiners), after adjectives, and before Prepositional Phrase complements.

the dog, my dog, the taste of champagne, the war of the worlds

Page 20: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Internal distribution (phrases)

• A “noun phrase” has three parts: a determiner, followed by an adjective, followed by a noun.

• Some of these are “optional”: that is, we may still call something a noun phrase even if not all 3 are present.
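The description above amounts to a pattern over category sequences. As a minimal sketch, the pattern can be written as a regular expression over one-letter tags; the tag letters and the choice to treat only the determiner and adjective as optional are assumptions of this example.

```python
import re

# One letter per tag so a tag sequence becomes a string:
# D = determiner, A = adjective, N = noun.
# Assumption of this sketch: determiner and adjective are optional, the noun is not.
NP_PATTERN = re.compile(r"D?A?N")

def is_noun_phrase(tags):
    """True if the whole tag sequence fits (Det) (Adj) N."""
    return NP_PATTERN.fullmatch("".join(tags)) is not None

print(is_noun_phrase(["D", "A", "N"]))  # the former king -> True
print(is_noun_phrase(["N"]))            # John -> True
print(is_noun_phrase(["D", "A"]))       # the former -> False
```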

Page 21: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Back to categories for words

Noun properties (?English):

• Takes articles

• Takes preceding adjectives

• May appear as subject of a sentence

• May appear as object of a preposition

• Has singular and plural form; plural is realized as /s/

• Refers to an object or set of objects

• May take possessive ’s

• May serve as antecedent to a pronoun

Page 22: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Verb

• Has present-tense form (-s in 3rd singular)

• Has past-tense form (-ed)

• Agrees with its subject noun phrase

• Refers to a predicate (1 or more arguments)

• Follows the subject immediately

• Appears at the beginning of a verb-phrase

Page 23: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Lexical categories in language

• One view is that there is a small number of categories, and they can be identified across languages. (I think most people believe that.)

• The core criterion for membership is semantic, and the only effective way of identifying across languages is semantic.

• All languages have a category of phrases that refer to things (NP), and one that expresses propositions (S).

Page 24: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Nouns and pronouns

• Nouns in many languages are inflected for number and case.

• Case: Nominative, accusative, genitive, dative, and often others.

• Pronouns, but not nouns, in English are inflected for case: nominative, genitive, and accusative (or other).

Page 25: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Pronouns

Tag            Nom    Acc    Possessive  Genitive head  Reflexive
1st sg         I      Me     My          Mine           Myself
2nd sg         You    You    Your        Yours          Yourself
3rd sg m       He     Him    His         His            Himself
3rd sg f       She    Her    Her         Hers           Herself
3rd sg neuter  It     It     Its         Its            Itself
1st plural     We     Us     Our         Ours           Ourselves
2nd plural     You    You    Your        Yours          Yourselves
3rd plural     They   Them   Their       Theirs         Themselves

Page 26: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Penn Treebank noun categories

NN: noun, common, singular or mass
common-carrier cabbage knuckle-duster Casino afghan shed thermostat investment slide humour falloff slick wind hyena override subhumanity machinist ...

NNP: noun, proper, singular
Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA Shannon A.K.C. Meltex Liverpool ...

NNPS: noun, proper, plural
Americans Americas Amharas Amityvilles Amusements Anarcho-Syndicalists Andalusians Andes Andruses Angels Animals Anthony Antilles Antiques Apache Apaches Apocrypha ...

NNS: noun, common, plural
undergraduates scotches bric-a-brac products bodyguards facets coasts divestitures storehouses designs clubs fragrances averages subjectivists apprehensions muses factory-jobs ...

Page 27: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Along with nouns…

• Determiners:

– Articles (a, an, the): definite, indefinite

– Possessive pronouns (my, your, his…)

– Demonstrative determiners: this, that…

• Adjectives

– In many languages, agree with the noun that they modify for case and number (but not in English). Spanish: l-a-s mes-a-s pequeñ-a-s ‘the tables small-fem-plural’

Page 28: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Adjectives

• Absolute (or positive) form: big

• Comparative: bigger

Your car is bigger than theirs.

• Superlative: biggest

Of these cars, John’s car is the biggest.

Page 29: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Quantifiers

• Often appear in pre-noun positions, inside the Noun Phrase

• Express notions of “some, all, none”

• May be pre-noun modifiers, or a full NP (like pronouns): something, anyone, etc. (Are these really two words stuck together?)

• Question and relative clause words: who, what, where, when, why, whose, which.

Page 30: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Relative clauses in English: that-Comp, gap in clause

[NP [NP The thing] [S’ [Comp (that)] [S I saw [e]]]]

that is optional if the gap is not in subject position. [e] marks the “gap”.

Page 31: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Relative clauses in English: wh-phrase

[NP [NP The ideas] [S’ [Comp which] [S I disagree with [e]]]]

Page 32: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Relative clauses in English: wh-phrase w/ pied-piping of P

[NP [NP The ideas] [S’ [Comp with which] [S I disagree [e]]]]

Page 33: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Relative clause formation can rip out of embedded clauses

[NP [NP The ideas] [S’ [Comp with which] [S Your manager said [S You disagree [e]]]]]

Page 34: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Verbs

• Verbs are words that refer to actions, and which are the essential component of most sentences.

• There are non-verbal sentences, but they are relatively infrequent. Most frequent of these: Linking a noun (NP) with an adjective or a location. English uses the copula (to be) for this function.

Page 35: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Verbs

• Have an argument structure: typically 1, 2, or 3 nominal arguments.

• 1 argument: typically the subject NP. Intransitive verb: John slept/arrived/left/yawned. The door opened. The phone rang.

• 0 arguments?

Page 36: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Verb arguments

• 2 arguments (transitive): Subject and direct object, usually:

Kim shut the door, helped the students, wrote a book.

• 3 arguments (ditransitive): Subject, indirect object, direct object:

Kim gave Terry a book/a hand/a hard time.

Page 37: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Syntactic/semantic ambiguities

• I saw the man with the telescope.

• Time flies like an arrow.

Page 38: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

[S [NP I] [VP [V saw] [NP [Det the] [N’ [N man] [PP [P with] [NP [det the] [N’ [N telescope]]]]]]]]

Page 39: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

[S [NP I] [VP [V saw] [NP [Det the] [N’ [N man]]] [PP [P with] [NP [det the] [N’ [N telescope]]]]]]
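The two trees differ only in where the PP attaches: inside the object NP, or directly under VP. As a minimal sketch, a CKY-style chart can count the analyses a small grammar assigns to the sentence. The grammar below is an assumption built for this example: it is in binary form, so the flat VP → V NP PP is replaced by VP → VP PP, and the unary step N’ → N is folded into the lexicon.

```python
from collections import defaultdict

# (parent, left child, right child); a hypothetical binary grammar for this sentence.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # binarized stand-in for the flat VP -> V NP PP
    ("NP", "Det", "N'"),
    ("N'", "N", "PP"),    # PP attached inside the noun phrase
    ("PP", "P", "NP"),
]
# Nouns are listed as both N and N' so that no unary rule is needed.
LEXICAL = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "with": ["P"],
    "man": ["N", "N'"], "telescope": ["N", "N'"],
}

def count_parses(words, root="S"):
    """Count the distinct derivations of `root` over the whole word sequence."""
    n = len(words)
    # chart[i][j][X] = number of ways category X derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICAL[word]:
            chart[i][i + 1][cat] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][root]

print(count_parses("I saw the man with the telescope".split()))  # 2
print(count_parses("I saw the man".split()))                     # 1
```

Each additional PP at the end of the sentence multiplies the number of attachment sites, which is one reason full parsing is so much harder than tagging.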

Page 40: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Part of Speech tagging

An attempt to assign categories to words without doing a whole syntactic parse:

Getting a whole parse is extremely difficult;

Much of the difficulty is the constituency, not the part of speech tagging.

Page 41: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

High frequency words are the most ambiguous regarding PoS

• table

• like

– I like ice cream

– I like things like ice cream

– I’ve been there like 100 times.

– People like him.

– People like him are obnoxious.

Page 42: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Taggers

• Start with a lexicon with ranges of PoSs

– each word is marked with its range of permitted PoS

– an OOV word is given a PoS based on its morphology, if we’re lucky

– A mechanism finds the best combination of PoS, given the order of the words.
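The first two steps can be sketched as a lookup with a fallback. This is a minimal illustration, not a description of any particular tagger: the mini-lexicon, the suffix heuristics, and the default-to-noun rule are all hypothetical.

```python
# Hypothetical mini-lexicon: each word lists its permitted PoS tags.
LEXICON = {
    "the": ["Det"],
    "design": ["N", "V"],
    "like": ["V", "Prep", "Adv"],
    "people": ["N"],
}
# Hypothetical suffix heuristics for out-of-vocabulary (OOV) words.
# Order matters: longer, more specific suffixes are checked first.
SUFFIX_GUESSES = [
    ("ness", ["N"]),
    ("ing", ["V"]),
    ("ed", ["V"]),
    ("est", ["Adj"]),
    ("s", ["N", "V"]),
]

def permitted_tags(word):
    """Return the lexicon's tag range, or a morphology-based guess for OOV words."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    for suffix, tags in SUFFIX_GUESSES:
        if word.endswith(suffix):
            return tags
    return ["N"]  # assumption: an unanalyzable unknown word defaults to noun

print(permitted_tags("like"))       # ['V', 'Prep', 'Adv']
print(permitted_tags("blorfing"))   # ['V']
print(permitted_tags("happiness"))  # ['N']
```

The "if we’re lucky" caveat shows up immediately: a suffix such as -s leaves the guess ambiguous, and a word like "ceiling" would be misjudged as a verb.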

Page 43: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

The      Det
design   N, Vpres, Vinf, Vimperative
of       Prep
taggers  Nplural
is       Vpres
often    Adv
based    VpastParticiple, VpastTense
on       Adverb, Preposition
what     WhPronoun, WhDeterminer
is       Vpresent
known    Vpast tense
about    Adverb, Prep
the      Det
lexicon  Noun
.        punctuation
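Even this short sentence leaves the tagger a real search problem. The sketch below copies the tag ranges from the example sentence and multiplies their sizes to get the number of candidate tag sequences; the variable name `TAG_RANGES` is illustrative.

```python
from math import prod

# Tag ranges copied from the example sentence above, one entry per token.
TAG_RANGES = [
    ("The", ["Det"]),
    ("design", ["N", "Vpres", "Vinf", "Vimperative"]),
    ("of", ["Prep"]),
    ("taggers", ["Nplural"]),
    ("is", ["Vpres"]),
    ("often", ["Adv"]),
    ("based", ["VpastParticiple", "VpastTense"]),
    ("on", ["Adverb", "Preposition"]),
    ("what", ["WhPronoun", "WhDeterminer"]),
    ("is", ["Vpresent"]),
    ("known", ["Vpast tense"]),
    ("about", ["Adverb", "Prep"]),
    ("the", ["Det"]),
    ("lexicon", ["Noun"]),
    (".", ["punctuation"]),
]

# Each token's choice is independent, so the candidate count is the product.
print(prod(len(tags) for _, tags in TAG_RANGES))  # 64
```

Only five of the fifteen tokens are ambiguous, yet they already yield 4 × 2 × 2 × 2 × 2 = 64 candidate sequences, and the count grows multiplicatively with sentence length.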

Page 44: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

History of PoS tagging

• First large scale system in 1971: TAGGIT (Greene and Rubin): 71 items in tag set, based on 3,300 hand-written rules, using a window of up to 5 words of the word being disambiguated. But almost all of the rules looked at immediate neighbors.

Page 45: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

CLAWS1

• Part of the annotation of the Lancaster-Oslo/Bergen corpus; produced at the University of Lancaster.

• Used largely statistical techniques rather than hand-crafted rules, trained off a tagged 200K words of the Brown corpus.

• 96-97% accuracy of top PoS guess.

• Used an open (not hidden) Markov model

Page 46: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.

Markov model

I’m not sure that this is exactly the model that CLAWS used, but it’s in the spirit:

$$p(W[1..n] \;\&\; PoS[1..n]) = \prod_{i=1}^{n} \mathrm{prob}(PoS[i] \mid PoS[i-1]) \cdot \mathrm{prob}(W[i] \mid PoS[i])$$
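A model of this shape scores a tagged sentence by multiplying, position by position, a tag-to-tag probability and a word-given-tag probability. The following is a minimal sketch with made-up numbers: the probability tables, the `START` pseudo-tag standing in for the tag before the first word, and the three-tag tagset are all assumptions. The best tagging is found here by brute-force enumeration, which is fine for three words; a practical tagger would use dynamic programming instead.

```python
from itertools import product

START = "<s>"  # assumed pseudo-tag preceding the first word

# Hypothetical probabilities, chosen only for illustration.
TRANSITION = {  # prob(PoS[i] | PoS[i-1])
    (START, "N"): 0.6, (START, "V"): 0.4,
    ("N", "V"): 0.5, ("N", "Prep"): 0.3, ("N", "N"): 0.2,
    ("V", "N"): 0.6, ("V", "Prep"): 0.2, ("V", "V"): 0.2,
    ("Prep", "N"): 0.9, ("Prep", "V"): 0.1,
}
EMISSION = {  # prob(W[i] | PoS[i]); the pronoun is folded into N for brevity
    ("people", "N"): 0.02,
    ("like", "V"): 0.01, ("like", "Prep"): 0.05,
    ("him", "N"): 0.01,
}

def joint_prob(words, tags):
    """Product over positions of transition times emission probability."""
    p, prev = 1.0, START
    for word, tag in zip(words, tags):
        p *= TRANSITION.get((prev, tag), 0.0) * EMISSION.get((word, tag), 0.0)
        prev = tag
    return p

def best_tagging(words, tagset=("N", "V", "Prep")):
    """Score every tag sequence and keep the most probable one."""
    return max(product(tagset, repeat=len(words)),
               key=lambda tags: joint_prob(words, tags))

WORDS = ["people", "like", "him"]
print(joint_prob(WORDS, ("N", "V", "N")))
print(joint_prob(WORDS, ("N", "Prep", "N")))
print(best_tagging(WORDS))
```

With these invented numbers the preposition reading of "like" outscores the verb reading; different tables would reverse that, which is exactly what training on a tagged corpus is meant to settle.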

Page 47: Word classes and the distribution of words, and Part of Speech tagging Computational linguistics.