Ch 9 Part of Speech Tagging (slides adapted from Dan Jurafsky, Jim Martin, Dekang Lin, Rada Mihalcea, and Bonnie Dorr and Mitch Marcus.)

Dec 17, 2015


Barrie Perkins
Transcript
Page 1: Ch 9 Part of Speech Tagging (slides adapted from Dan Jurafsky, Jim Martin, Dekang Lin, Rada Mihalcea, and Bonnie Dorr and Mitch Marcus.)

Ch 9 Part of Speech Tagging

(slides adapted from Dan Jurafsky, Jim Martin, Dekang Lin, Rada Mihalcea, and Bonnie Dorr and Mitch Marcus.)

Page 2

Parts of Speech

8 (ish) traditional parts of speech

• Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc

• This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.)

• Called: parts-of-speech, lexical category, word classes, morphological classes, lexical tags, POS

• We’ll use POS most frequently

Page 3

POS examples for English

N     noun         chair, bandwidth, pacing
V     verb         study, debate, munch
ADJ   adjective    purple, tall, ridiculous
ADV   adverb       unfortunately, slowly
P     preposition  of, by, to
PRO   pronoun      I, me, mine
DET   determiner   the, a, that, those

Page 4

Open Class Words

Every known human language has nouns and verbs

Nouns: people, places, things
• Classes of nouns
  – proper vs. common
  – count vs. mass

Verbs: actions and processes

Adjectives: properties, qualities

Adverbs: hodgepodge!
• Unfortunately, John walked home extremely slowly yesterday

Page 5

Definition:

An adverb is a part of speech. It is any word that modifies any other part of language: verbs, adjectives (including numbers), clauses, sentences, and other adverbs, except for nouns; modifiers of nouns are primarily determiners and adjectives.

Page 6

Closed Class Words

Differ more from language to language than open class words

Examples:
• prepositions: on, under, over, …
• particles: up, down, on, off, …
• determiners: a, an, the, …
• pronouns: she, who, I, …
• conjunctions: and, but, or, …
• auxiliary verbs: can, may, should, …
• numerals: one, two, three, third, …

Page 7

Prepositions from CELEX

Page 8

Pronouns in CELEX

Page 9

Conjunctions

Page 10

Auxiliaries

Page 11

NLP Task I – Determining Part of Speech Tags

The Problem:

Word     POS listing in Brown Corpus
heat     noun, verb
oil      noun
in       prep, noun, adv
a        det, noun, noun-proper
large    adj, noun, adv
pot      noun

Page 12

POS Tagging: Definition

The process of assigning a part-of-speech or lexical class marker to each word in a corpus:

WORDS: the koala put the keys on the table

TAGS: N, V, P, DET

Page 13

POS Tagging example

WORD    tag
the     DET
koala   N
put     V
the     DET
keys    N
on      P
the     DET
table   N

Page 14

What is POS tagging good for?

Speech synthesis:
• How to pronounce “lead”?
• INsult inSULT
• OBject obJECT
• OVERflow overFLOW
• DIScount disCOUNT
• CONtent conTENT

Stemming for information retrieval
• Knowing a word is an N tells you it gets plurals
• Can search for “aardvarks” and get “aardvark”

Parsing, speech recognition, etc.
• Possessive pronouns (my, your, her) are likely to be followed by nouns
• Personal pronouns (I, you, he) are likely to be followed by verbs

Page 15

Related Problem in Bioinformatics

Durbin et al. Biological Sequence Analysis, Cambridge University Press.

Several applications, e.g. proteins

From primary structure ATCPLELLLD

Infer secondary structure HHHBBBBBC..

Page 16

History: From Yair Halevi (Bar-Ilan U.)

Events shown on the timeline (axis: 1960, 1970, 1980, 1990, 2000):

• Brown Corpus created (EN-US), 1 million words
• Brown Corpus tagged
• Greene and Rubin: rule based, 70%
• LOB Corpus created (EN-UK), 1 million words
• LOB Corpus tagged
• HMM tagging (CLAWS): 93%-95%
• POS tagging separated from other NLP
• DeRose/Church: efficient HMM, sparse data, 95%+
• British National Corpus (tagged by CLAWS)
• Penn Treebank Corpus (WSJ, 4.5M)
• Transformation-based tagging (Eric Brill): rule based, 95%+
• Tree-based statistics (Helmut Schmid): rule based, 96%+
• Neural network: 96%+
• Trigram tagger (Kempe): 96%+
• Combined methods: 98%+

Page 17

British National Corpus

What is it used for?

Ultimately, its use is limited only by our imagination; if you have any need for up to 100 million words of modern British English, you can make use of the British National Corpus.

The main uses of the corpus are as follows:

Reference Book Publishing
• Dictionaries, grammar books, teaching materials, usage guides, thesauri. Increasingly, publishers are referring to the use they make of corpus facilities: it's important to know how well their corpora are planned and constructed.

Linguistic Research
• Raw data for studying lexis, syntax, morphology, semantics, discourse analysis, stylistics, sociolinguistics...

Artificial Intelligence
• Extensive data test bed for program development.

Natural Language Processing
• Taggers, parsers, natural language understanding programs, spell checking word lists...

English Language Teaching
• Syllabus and materials design, classroom reference, independent learner research.

Page 18

Penn Treebank Tagset

Page 19

A Simplified Tagset for English

Tagsets for English have grown progressively larger since the Brown Corpus until the Penn Treebank project.

Brown Corpus:            87 tags
LOB Corpus:              135 tags
Lancaster UCREL group:   166 tags
London-Lund Corpus:      197 tags
UPenn Treebank:          34 tags + punctuation

Page 20

Rationale behind British & European tag sets

To provide “distinct codings for all classes of words having distinct grammatical behaviour” – Garside et al. 1987

The Lund tagset for adverbs distinguishes between

• Adjunct – Process, Space, Time
• Wh-type – Manner, Reason, Space, Time, Wh-type + ‘S
• Conjunct – Appositional, Contrastive, Inferential, Listing, …
• Disjunct – Content, Style
• Postmodifier – “else”
• Negative – “not”
• Discourse Item – Appositional, Expletive, Greeting, Hesitator, …

Page 21

Reasons for a Smaller Tagset

Many tags are unique to particular lexical items, and can be recovered automatically if desired.

Brown Tags For Verbs

sing/VB       have/HV       be/BE
sings/VBZ     has/HVZ       is/BEZ
sang/VBD      had/HVD       was/BED
singing/VBG   having/HVG    being/BEG
sung/VBN      had/HVN       been/BEN

Penn Treebank Tags For Verbs

sing/VB       have/VB       be/VB
sings/VBZ     has/VBZ       is/VBZ
sang/VBD      had/VBD       was/VBD
singing/VBG   having/VBG    being/VBG
sung/VBN      had/VBN       been/VBN
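
The claim that lexically specific tags can be recovered automatically can be illustrated with a small sketch. It covers only the word/tag pairs listed on this slide; the names `BROWN_TO_PENN`, `collapse`, and `recover` are hypothetical, and the word lists are not a full lexicon.

```python
# Brown verb tags from this slide mapped to their Penn Treebank counterparts.
BROWN_TO_PENN = {
    "VB": "VB", "HV": "VB", "BE": "VB",
    "VBZ": "VBZ", "HVZ": "VBZ", "BEZ": "VBZ",
    "VBD": "VBD", "HVD": "VBD", "BED": "VBD",
    "VBG": "VBG", "HVG": "VBG", "BEG": "VBG",
    "VBN": "VBN", "HVN": "VBN", "BEN": "VBN",
}

# Forms of "have" and "be" that appear on the slide (illustrative only).
HAVE_FORMS = {"have", "has", "had", "having"}
BE_FORMS = {"be", "is", "was", "being", "been"}


def collapse(brown_tag):
    """Map a Brown verb tag to the coarser Penn Treebank tag."""
    return BROWN_TO_PENN[brown_tag]


def recover(word, penn_tag):
    """Rebuild the Brown tag from the word itself plus its Penn tag."""
    suffix = penn_tag[2:]  # "", "Z", "D", "G", or "N"
    lowered = word.lower()
    if lowered in HAVE_FORMS:
        return "HV" + suffix
    if lowered in BE_FORMS:
        return "BE" + suffix
    return penn_tag


print(collapse("HVN"))         # VBN
print(recover("had", "VBN"))   # HVN
print(recover("sung", "VBN"))  # VBN
```

Because the extra Brown distinctions depend only on which word is being tagged, dropping them from the tagset loses no information for these forms.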

Page 22

Task I – Determining Part of Speech Tags

The Problem:

Word     POS listing in Brown
heat     noun, verb
oil      noun
in       prep, noun, adv
a        det, noun, noun-proper
large    adj, noun, adv
pot      noun

The Old Solution: Combinatorial search.
• If each of n words has k tags on average, try the k^n combinations until one works.
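
To make the cost of combinatorial search concrete, the sketch below enumerates every candidate tagging of the example sentence. It assumes the Brown listing from this slide as the only lexicon; `CANDIDATES`, `count_taggings`, and `all_taggings` are hypothetical names.

```python
from itertools import product
from math import prod

# Candidate tags per word, taken from the Brown listing on this slide.
CANDIDATES = {
    "heat": ["noun", "verb"],
    "oil": ["noun"],
    "in": ["prep", "noun", "adv"],
    "a": ["det", "noun", "noun-proper"],
    "large": ["adj", "noun", "adv"],
    "pot": ["noun"],
}


def count_taggings(words):
    """Number of tag sequences: the product of the per-word tag counts."""
    return prod(len(CANDIDATES[w]) for w in words)


def all_taggings(words):
    """Lazily generate every possible tag sequence for the sentence."""
    return product(*(CANDIDATES[w] for w in words))


sentence = ["heat", "oil", "in", "a", "large", "pot"]
print(count_taggings(sentence))  # 54
```

Six words already give 54 candidates; a 20-word sentence with 3 tags per word would give 3^20, roughly 3.5 billion, which is why exhaustive search does not scale.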

Page 23

NLP Task I – Determining Part of Speech Tags

Machine Learning Solutions: Automatically learn Part of Speech (POS) assignment.

• The best techniques achieve 96-97% accuracy per word on new materials, given large training corpora.

Page 24

Simple Statistical Approaches: Idea 1

Page 25

Simple Statistical Approaches: Idea 2

For a string of words

W = w1 w2 w3 … wn

find the string of POS tags

T = t1 t2 t3 … tn

which maximizes P(T|W)
• i.e., the probability of tag string T given that the word string was W
• i.e., that W was tagged T

Page 26

Again, The Sparse Data Problem …

A Simple, Impossible Approach to Compute P(T|W):

Count up instances of the string "heat oil in a large pot" in the training corpus, and pick the most common tag assignment to the string.

Page 27

A Practical Statistical Tagger

Page 28

A Practical Statistical Tagger II

But we can't accurately estimate more than tag bigrams or so…

We change to a model that we CAN estimate:

Page 29

A Practical Statistical Tagger III

So, for a given string W = w1w2w3…wn, the tagger needs to find the string of tags T which maximizes
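
The expression to be maximized did not survive in this transcript. A common formulation, consistent with the tag-bigram restriction mentioned on the preceding slide, is sketched here; it is an assumption about what the slide showed, not a recovery of it. The symbol t_0 is a hypothetical start-of-sentence tag.

```latex
\begin{align*}
\hat{T} &= \arg\max_{T} P(T \mid W)
         = \arg\max_{T} \frac{P(W \mid T)\, P(T)}{P(W)}
         = \arg\max_{T} P(W \mid T)\, P(T) \\
        &\approx \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
\end{align*}
```

P(W) can be dropped because it is the same for every candidate T. The final line assumes that each word depends only on its own tag and each tag only on the previous tag.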

Page 30

Training and Performance

To estimate the parameters of this model, given an annotated training corpus:

Because many of these counts are small, smoothing is necessary for best results…

Such taggers typically achieve about 95-96% correct tagging, for tag sets of 40-80 tags.
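
The estimation formulas for this slide did not survive in the transcript. The sketch below shows one way such a tagger might be trained and run, assuming a bigram tag model, add-one smoothing, and Viterbi decoding. The toy corpus, the `START` symbol, and all function names are invented for illustration and are not taken from the slides.

```python
import math
from collections import Counter

# Toy annotated corpus (invented for illustration).
CORPUS = [
    [("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
     ("keys", "N"), ("on", "P"), ("the", "DET"), ("table", "N")],
    [("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
     ("table", "N"), ("on", "P"), ("the", "DET"), ("keys", "N")],
]
START = "<s>"  # hypothetical start-of-sentence tag


def make_model(corpus):
    """Count tag bigrams and word/tag pairs; return smoothed log-probabilities."""
    trans, context, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for sent in corpus:
        prev = START
        for word, tag in sent:
            trans[(prev, tag)] += 1
            context[prev] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(tag_count)

    def log_p_trans(prev, tag):
        # Add-one smoothing over the tag set.
        return math.log((trans[(prev, tag)] + 1) / (context[prev] + len(tags)))

    def log_p_emit(tag, word):
        # Add-one smoothing over the vocabulary plus one unknown-word slot.
        return math.log((emit[(tag, word)] + 1) / (tag_count[tag] + len(vocab) + 1))

    return tags, log_p_trans, log_p_emit


def viterbi(words, tags, log_p_trans, log_p_emit):
    """Most probable tag sequence for a non-empty list of words."""
    # best[i][t] = (best log-score of a path ending in tag t at word i, previous tag)
    best = [{t: (log_p_trans(START, t) + log_p_emit(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_p_trans(p, t), p) for p in tags)
            column[t] = (score + log_p_emit(t, word), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


tags, log_p_trans, log_p_emit = make_model(CORPUS)
print(viterbi("the koala put the keys on the table".split(), tags, log_p_trans, log_p_emit))
# ['DET', 'N', 'V', 'DET', 'N', 'P', 'DET', 'N']
```

Working in log space avoids numerical underflow on long sentences. Add-one smoothing is only the simplest option; better smoothing methods would be expected to matter on realistic data.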