Page 1: Natural Language Processing

Natural Language Processing

Page 2: Natural Language Processing

A Broad Topic

• Named Entity Recognition (NER)
• Speech Recognition
• Sentiment Analysis
• Translation
• Optical Character Recognition (OCR)

Page 3: Natural Language Processing

The Goals of NER

1. Detecting mentions of specific entities
2. Classifying those entities (person, place, …)
3. Grouping different references to the same entity

Page 4: Natural Language Processing

The Challenges of NER

1. Named entities are often compound noun phrases
》e.g., “United Airlines”
2. Capitals aren’t a foolproof indicator
3. A single named entity can be referred to variously
》e.g., “United,” “the airline,” even just “they”
4. Rule-based approaches just don’t work
》“Every time I fire a linguist, the performance of the speech recognizer goes up.” —Frederick Jelinek

Page 5: Natural Language Processing

Aside: Prescriptivism vs. Descriptivism

• Descriptivism holds that:
》grammar is a function of empirical usage
》“correctness” is context-dependent
》languages have fuzzy edges, like species
• Prescriptivism holds that “correctness” is absolute
• Not a real debate! All serious linguists are descriptivists
• But style guides are still useful

Page 6: Natural Language Processing

The Process

1. Sentence segmentation
2. Tokenization
3. Part-of-speech (POS) tagging
4. Parsing
5. Co-reference resolution

Page 7: Natural Language Processing

Aside: n-grams

• n-grams are word sequences of a fixed length
• they become useful when we count their occurrences within a corpus
• we can characterize a corpus by using these counts to determine transition probabilities
》e.g., P(yellow|the sun is)
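A minimal sketch of that idea in Python; the toy corpus and the count-based probability estimate are illustrative assumptions, not from the slides:

    from collections import Counter

    def ngram_counts(tokens, n):
        """Count every n-gram of length n in a token sequence."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def transition_prob(tokens, context, word):
        """Estimate P(word | context) as count(context + word) / count(context)."""
        n = len(context)
        denom = ngram_counts(tokens, n)[tuple(context)]
        num = ngram_counts(tokens, n + 1)[tuple(context) + (word,)]
        return num / denom if denom else 0.0

    corpus = "the sun is yellow and the sun is bright".split()
    print(transition_prob(corpus, ("sun", "is"), "yellow"))  # 0.5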

Page 8: Natural Language Processing

Sentence Segmentation

Usually ML-based, since it’s hard to devise rules:

1. Assemble an annotated training set
2. Create a set of features, e.g.:
》previous word, next word, prefix, etc.
》probabilities, like P(sentence start|previous word)
3. Train a classifier and predict on punctuation
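A toy version of that pipeline; the feature set and the logistic-regression classifier are illustrative choices on my part, not mandated by the slides:

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    def features(text, i):
        """Context features for the punctuation mark at position i."""
        before = text[:i].split()
        prev_word = before[-1] if before else ""
        after = text[i + 1:].lstrip()
        return {
            "prev_word": prev_word.lower(),
            "prev_is_short": len(prev_word) <= 2,   # crude abbreviation cue, e.g. "Mr"
            "next_is_upper": after[:1].isupper(),
        }

    text = "Mr. Smith went to Washington. He left on Friday."
    candidates = [i for i, ch in enumerate(text) if ch in ".!?"]
    X = [features(text, i) for i in candidates]
    y = [0, 1, 1]  # annotations: the period in "Mr." is not a sentence boundary

    vec = DictVectorizer()
    clf = LogisticRegression().fit(vec.fit_transform(X), y)
    print(clf.predict(vec.transform(X)))  # recovers [0 1 1] on its training data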

Page 9: Natural Language Processing

Tokenization: Basically the Same Damn Thing

NICE!!!!!!!!!!
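The joke is that tokenization reuses the same classify-the-boundary recipe. For contrast, the rule-based baseline it replaces can be a one-liner; this regex is my illustrative assumption:

    import re

    def naive_tokenize(text):
        """Rule-based baseline: words and punctuation become separate tokens.
        It stumbles on exactly the cases a trained tokenizer learns, e.g. "Don't"."""
        return re.findall(r"\w+|[^\w\s]", text)

    print(naive_tokenize("Don't blame United, they lost my bag."))
    # ['Don', "'", 't', 'blame', 'United', ',', 'they', 'lost', 'my', 'bag', '.']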

Page 10: Natural Language Processing

POS Tagging

Dominated by Hidden Markov Models (HMMs).

1. Assemble an annotated training set
2. Determine emission probabilities
》e.g., P(dog|noun)
3. Determine transition probabilities
》e.g., P(adjective|noun,verb)
4. For a given word sequence, determine most likely sequence of underlying POS using HMM
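Step 4 is typically done with Viterbi decoding. A compact sketch over a two-tag model; every probability below is a made-up illustration, not a trained value:

    # Toy two-tag HMM; all probabilities are invented for illustration.
    tags = ["noun", "verb"]
    start = {"noun": 0.6, "verb": 0.4}                    # P(tag) at position 0
    trans = {"noun": {"noun": 0.3, "verb": 0.7},          # P(next tag | tag)
             "verb": {"noun": 0.8, "verb": 0.2}}
    emit = {"noun": {"dogs": 0.4, "bark": 0.1},           # P(word | tag)
            "verb": {"dogs": 0.05, "bark": 0.5}}

    def viterbi(words):
        """Most likely tag sequence for `words` under the toy HMM."""
        # best[tag] = (probability of the best path ending in tag, that path)
        best = {t: (start[t] * emit[t].get(words[0], 1e-6), [t]) for t in tags}
        for w in words[1:]:
            best = {t: max(((p * trans[prev][t] * emit[t].get(w, 1e-6), path + [t])
                            for prev, (p, path) in best.items()),
                           key=lambda x: x[0])
                    for t in tags}
        return max(best.values(), key=lambda x: x[0])[1]

    print(viterbi(["dogs", "bark"]))  # ['noun', 'verb']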

Page 11: Natural Language Processing

Page 12: Natural Language Processing

Parsing

In a sense, similar to PL parsing.

1. Select a formal context-free grammar for the language being parsed
》in practice, usually augmented with probabilities for the replacement rules
2. Parse according to taste
3. Select most likely parse by multiplying probabilities of each replacement rule
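NLTK bundles exactly this recipe in its ViterbiParser, which returns the parse maximizing the product of rule probabilities. A sketch, assuming a toy grammar I made up for the example:

    import nltk

    # A tiny probabilistic CFG; probabilities for each left-hand side sum to 1.
    grammar = nltk.PCFG.fromstring("""
        S  -> NP VP    [1.0]
        NP -> Det N    [0.6]
        NP -> 'dogs'   [0.4]
        VP -> V NP     [0.7]
        VP -> V        [0.3]
        Det -> 'the'   [1.0]
        N  -> 'cat'    [1.0]
        V  -> 'chase'  [0.5]
        V  -> 'see'    [0.5]
    """)

    parser = nltk.ViterbiParser(grammar)
    for tree in parser.parse("dogs chase the cat".split()):
        print(tree)  # the single most probable parse, with its probability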

Page 13: Natural Language Processing

Coreference Resolution

Using the Hobbs algorithm:

1. Create parse tree from input text
2. With text and parse tree as input, traverse successive parent trees of pronouns and proper nouns, ranking the remaining candidates according to several factors:
》words in common
》proximity
》etc.
3. Pick most likely candidate and label
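The full Hobbs tree walk won’t fit on a slide, but the ranking in step 2 might look like this toy sketch; the features and weights are invented for illustration, not part of the algorithm as published:

    def score(candidate, pronoun):
        """Toy antecedent ranking: number agreement plus proximity (invented weights)."""
        agrees = candidate["plural"] == pronoun["plural"]
        distance = pronoun["position"] - candidate["position"]
        return 3.0 * agrees - 0.5 * distance

    pronoun = {"text": "they", "plural": True, "position": 6}
    candidates = [
        {"text": "the pilots", "plural": True,  "position": 1},
        {"text": "the plane",  "plural": False, "position": 4},
    ]
    best = max(candidates, key=lambda c: score(c, pronoun))
    print(best["text"])  # 'the pilots': agreement outweighs the extra distance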

Page 14: Natural Language Processing

Other Cool Stuff: Morphology

uygarlaştıramadıklarımızdanmışsınızcasına

uygar     +laş +tır  +ama  +dık  +lar +ımız +dan +mış  +sınız +casına
civilized +BEC +CAUS +NABL +PART +PL  +P1PL +ABL +PAST +2PL   +AsIf

“(behaving) as if you are among those whom we could not civilize”

+BEC   “become”
+CAUS  the causative verb marker (‘cause to X’)
+NABL  “not able”
+PART  past participle form
+P1PL  1st person pl possessive agreement
+2PL   2nd person pl
+ABL   ablative (from/among) case marker
+AsIf  derivationally forms an adverb from a finite verb

Page 15: Natural Language Processing

Other Cool Stuff: Language Synthesis with RNNs

http://www.cs.toronto.edu/~graves/handwriting.html

Page 16: Natural Language Processing

The Future

• NLP is “AI-complete,” that is, we expect that solving it is tantamount to solving hard AI

• In the meantime, it all comes down to more data

Page 17: Natural Language Processing

References

Jurafsky, Dan, and James H. Martin. Speech and Language Processing. 2nd ed., Pearson New International ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2014. Print.

Graves, Alex. “Recurrent Neural Network Handwriting Generation Demo.” Department of Computer Science, University of Toronto. Web. 13 Aug. 2014. <http://www.cs.toronto.edu/~graves/handwriting.html>.

Page 18: Natural Language Processing

The End