Lecture 22: Word Similarity
CSCE 771 Natural Language Processing
April 8, 2013

Topics: word similarity; thesaurus-based word similarity; introduction to distributional word similarity
Readings: NLTK book Chapter 2 (WordNet); text Chapter 20
– 2 – CSCE 771 Spring 2013
Overview
Last Time (Programming): features in NLTK; NL queries to SQL; NLTK support for interpretations and models; propositional and predicate logic support; Prover9
Today: last lecture's slides 25-29; features in NLTK; computational lexical semantics
Readings: Text Chapters 19, 20; NLTK Book Chapter 10
Next Time: Computational Lexical Semantics II
Figure 20.1 Possible sense tags for bass
Chapter 20 – Word Sense Disambiguation (WSD)
Machine translation
Supervised vs. unsupervised learning
Semantic concordance – a corpus with words tagged with sense tags
Feature Extraction for WSD
"The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities."
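The two standard feature families for WSD can be sketched in a few lines: collocational features (the words at fixed offsets around the target) and bag-of-words features (unordered counts from a context window). The window sizes and feature names below are illustrative assumptions, not the text's exact choices, and a real system would also include POS tags.

```python
from collections import Counter

def collocational_features(tokens, target_index, window=2):
    """Words at fixed positions around the target word (offsets are illustrative)."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        i = target_index + offset
        feats["w%+d" % offset] = tokens[i] if 0 <= i < len(tokens) else "<pad>"
    return feats

def bag_of_words_features(tokens, target_index, window=10):
    """Unordered counts of words co-occurring with the target within the window."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    context = tokens[lo:target_index] + tokens[target_index + 1:hi]
    return Counter(context)

sent = ("the bank can guarantee deposits will eventually cover "
        "future tuition costs").split()
colloc = collocational_features(sent, sent.index("bank"))
bow = bag_of_words_features(sent, sent.index("bank"))
```

For the example sentence, the collocational features record "the" at offset -1 and "can" at offset +1, while the bag-of-words features simply count context words such as "deposits".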
– 15 – CSCE 771 Spring 2013
Corpus Lesk
Using equal weights on words just does not seem right.
Weights applied to overlap words: inverse document frequency
idf_i = log(N_docs / number of docs containing w_i)
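The slide's idf weighting can be sketched directly; the toy document collection below is invented for illustration.

```python
import math

# Toy document collection (hypothetical), each document as a set of words.
docs = [
    {"bank", "deposit", "money"},
    {"bank", "river", "water"},
    {"money", "loan"},
    {"river", "fishing"},
]

def idf(term, documents):
    """idf_i = log(N_docs / number of docs containing w_i), as on the slide."""
    df = sum(1 for d in documents if term in d)
    return math.log(len(documents) / df) if df else 0.0

# "bank" appears in 2 of 4 docs; "loan" in 1 of 4, so "loan" gets a higher weight.
```

A rare overlap word like "loan" thus contributes more to the Lesk score than a common one like "bank", which is the intuition the slide appeals to.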
SENSEVAL competitions
http://www.senseval.org/
Check the Senseval-3 website.
SemEval-2 – Evaluation Exercises on Semantic Evaluation – ACL SigLex event
Task Name – Area
#1 Coreference Resolution in Multiple Languages – Coref
#6 Classification of Semantic Relations between MeSH Entities in Swedish Medical Texts
#7 Argument Selection and Coercion – Metonymy
#8 Multi-Way Classification of Semantic Relations Between Pairs of Nominals
#10 Linking Events and their Participants in Discourse – Semantic Role Labeling, Information Extraction
#11 Event Detection in Chinese News Sentences – Semantic Role Labeling, Word Senses
#12 Parser Training and Evaluation using Textual Entailment
#13 TempEval 2 – Time Expressions
#14 Word Sense Induction
#15 Infrequent Sense Identification for Mandarin Text to Speech Systems
#16 Japanese WSD – Word Senses
#17 All-words Word Sense Disambiguation on a Specific Domain (WSD-domain)
#18 Disambiguating Sentiment Ambiguous Adjectives – Word Senses, Sentiment
20.4.2 Selectional Restrictions and Preferences
• verb eat: theme = object has feature Food+
• Katz and Fodor (1963) used this idea to rule out senses that were not consistent
• WSD of dish:
(20.12) "In our house, everybody has a career and none of them includes washing dishes," he says.
(20.13) In her tiny kitchen, Ms. Chen works efficiently, stir-frying several simple dishes, including …
Resnik's model of Selectional Association
How much does a predicate tell you about the semantic class of its arguments?
• eat – a great deal
• was, is, to be … – very little
• the selectional preference strength of a verb is indicated by two distributions:
1. P(c): how likely the direct object is to be in class c
2. P(c|v): the distribution of expected semantic classes for the particular verb v
• the greater the difference in these distributions, the more information the verb provides
Relative entropy – Kullback-Leibler divergence
Given two distributions P and Q:
D(P || Q) = Σ_x P(x) log₂ (P(x) / Q(x))
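The KL divergence above can be computed directly. The class distributions below are invented for illustration; they show that a selective verb like eat diverges from the prior P(c) far more than a near-empty one like be, which is the quantity Resnik's model cares about.

```python
import math

def kl_divergence(p, q):
    """D(P || Q) = sum_x P(x) * log2(P(x) / Q(x)).
    Assumes q[x] > 0 wherever p[x] > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# Hypothetical semantic-class distributions (invented numbers):
p_c = {"food": 0.25, "people": 0.25, "artifact": 0.25, "event": 0.25}   # prior P(c)
p_c_given_eat = {"food": 0.85, "people": 0.05, "artifact": 0.05, "event": 0.05}
p_c_given_be = {"food": 0.25, "people": 0.30, "artifact": 0.25, "event": 0.20}

# "eat" concentrates its objects in one class, so D(P(c|eat) || P(c)) is large;
# "be" barely changes the prior, so its divergence is near zero.
```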
Resnik's model of Selectional Association
Selectional preference strength: S(v) = D(P(c|v) || P(c)) = Σ_c P(c|v) log₂ (P(c|v) / P(c))
High and Low Selectional Associations – Resnik 1996
Selectional Associations
20.5 Minimally Supervised WSD: Bootstrapping
"supervised and dictionary methods require large hand-built resources"
bootstrapping (also called semi-supervised or minimally supervised learning) addresses the no-data problem
Start with a seed set and grow it.
Idea of bootstrapping: "create a larger training set from a small set of seeds"
Heuristics: senses of "bass"
1. one sense per collocation – within a sentence, both senses of bass are not used
2. one sense per discourse – Yarowsky showed that of 37,232 examples of bass occurring in a discourse, there was only one sense per discourse
Yarowsky algorithm
Goal: learn a word-sense classifier for a word
Input: Λ0, a small seed set of labeled instances of each sense
1. train a classifier on the seed set Λ0
2. label the unlabeled corpus V0 with the classifier
3. select the examples Δ in V that you are "most confident in"
4. Λ1 = Λ0 + Δ
5. repeat
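The five steps above can be sketched as a loop. The `train` and `classify` functions, the confidence threshold, and the toy data below are all placeholders for illustration, not Yarowsky's actual decision-list learner.

```python
def yarowsky(seed_set, unlabeled, train, classify, threshold=0.9, max_iters=10):
    """Sketch of the bootstrapping loop.  Caller supplies the (hypothetical)
    train(labeled) -> model and classify(model, x) -> (sense, confidence)."""
    labeled = dict(seed_set)          # Lambda_0: the labeled seed set
    pool = list(unlabeled)            # V_0: the unlabeled corpus
    for _ in range(max_iters):
        model = train(labeled)                        # 1. train on current labels
        delta = {}
        for x in pool:                                # 2. label the unlabeled corpus
            sense, confidence = classify(model, x)
            if confidence >= threshold:               # 3. keep most-confident examples
                delta[x] = sense
        if not delta:
            break
        labeled.update(delta)                         # 4. Lambda_{i+1} = Lambda_i + delta
        pool = [x for x in pool if x not in delta]    # 5. repeat on the rest
    return labeled

# Toy illustration with an invented one-rule "classifier":
def train(labeled):
    return labeled

def classify(model, x):
    if "fish" in x:
        return ("fish", 1.0)
    if "play" in x:
        return ("music", 1.0)
    return (None, 0.0)

seeds = {"caught a huge bass fish": "fish"}
corpus = ["bass fish dinner", "play bass guitar", "the bass was loud"]
labels = yarowsky(seeds, corpus, train, classify)
```

Examples the toy classifier is unsure about ("the bass was loud") are never moved into the labeled set, which is the point of step 3.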
Figure 20.4 Two senses of plant
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html (partially in Chapter 02 of the NLTK book, but a different version)
http://grey.colorado.edu/mingus/index.php/Objrec_Wordnet.py – code for similarity; runs for a while, with lots of results
Hi, I was wondering if it is possible for me to use NLTK + WordNet to group (nouns) words together via similar meanings?

Assuming I have 2000 words or topics, is it possible for me to group them together according to similar meanings using NLTK?

So that at the end of the day I would have different groups of words that are similar in meaning? Can that be done in NLTK? And possibly be able to detect salient patterns emerging (trends in topics, etc.)?

Is there a further need for a word classifier based on the CMU BOW toolkit to classify words to get them into categories, or would the above grouping be good enough? Is there a need to classify words further?

How would one classify words in NLTK effectively? Really hope you can enlighten me.
– FM
Response from Steven Bird
> Assuming I have 2000 words or topics. Is it possible for me to group
> them together according to similar meanings using NLTK?

You could compute WordNet similarity (pairwise), so that each word/topic is represented as a vector of distances, which could then be discretized, so each vector would have a form like this: [0,2,3,1,0,0,2,1,3,...]. These vectors could then be clustered using one of the methods in the NLTK cluster package.

> So that at the end of the day I would have different groups of words
> that are similar in meaning? Can that be done in NLTK? and possibly be
> able to detect salient patterns emerging? (trend in topics etc...).

This suggests a temporal dimension, which might mean recomputing the clusters as more words or topics come in.

It might help to read the NLTK book sections on WordNet and on text classification, and also some of the other cited material.
-Steven Bird
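Bird's suggestion (represent each word as a vector of pairwise similarities, then cluster the vectors) can be sketched in pure Python. The similarity table below is invented to stand in for real WordNet scores, and the greedy threshold grouping stands in for the NLTK cluster package; both are assumptions for illustration only.

```python
# Stand-in pairwise similarity scores (invented; a real system would use, e.g.,
# WordNet-based similarity from NLTK).
sim = {
    ("dog", "cat"): 0.8, ("dog", "wolf"): 0.9, ("cat", "wolf"): 0.7,
    ("car", "truck"): 0.9, ("dog", "car"): 0.1, ("dog", "truck"): 0.1,
    ("cat", "car"): 0.1, ("cat", "truck"): 0.1, ("wolf", "car"): 0.1,
    ("wolf", "truck"): 0.1,
}
words = ["dog", "cat", "wolf", "car", "truck"]

def similarity(a, b):
    if a == b:
        return 1.0
    return sim.get((a, b), sim.get((b, a), 0.0))

# Each word becomes a vector of its similarities to every word ...
vectors = {w: [similarity(w, v) for v in words] for w in words}

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# ... and words whose vectors are close are greedily merged into one group.
clusters = []
for w in words:
    for group in clusters:
        if euclidean(vectors[w], vectors[group[0]]) < 0.8:  # ad hoc threshold
            group.append(w)
            break
    else:
        clusters.append([w])
```

With these made-up scores the animals end up in one group and the vehicles in another; with real data one would use a proper clustering algorithm rather than a fixed threshold.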
More general? Stack Overflow
import nltk
from nltk.corpus import wordnet as wn
Fig 20.7 WordNet with Lin P(c) values
Extended Lesk
based on:
1. glosses
2. glosses of hypernyms, hyponyms
Example
• drawing paper: paper that is specially prepared for use in drafting
• decal: the art of transferring designs from specially prepared paper to a wood, glass or metal surface
• Lesk score = sum of squares of lengths of common phrases
• Example: the overlaps "paper" and "specially prepared" give 1 + 2² = 5
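The scoring rule above (sum of squared lengths of maximal common phrases) can be sketched as follows. This simple version matches greedily from the longest phrase down and, unlike a full implementation, does not filter function words; on the two glosses from the slide it still reproduces the 1 + 2² = 5 score.

```python
def extended_lesk_overlap(gloss1, gloss2):
    """Sum of squared lengths of maximal word sequences common to both glosses."""
    a, b = gloss1.split(), gloss2.split()
    used_a = [False] * len(a)     # mark words already consumed by a longer match
    used_b = [False] * len(b)
    score = 0
    for n in range(min(len(a), len(b)), 0, -1):      # longest phrases first
        for i in range(len(a) - n + 1):
            if any(used_a[i:i + n]):
                continue
            for j in range(len(b) - n + 1):
                if any(used_b[j:j + n]):
                    continue
                if a[i:i + n] == b[j:j + n]:
                    used_a[i:i + n] = [True] * n
                    used_b[j:j + n] = [True] * n
                    score += n * n                    # squared phrase length
                    break
    return score

drawing_paper = "paper that is specially prepared for use in drafting"
decal = ("the art of transferring designs from specially prepared paper "
         "to a wood glass or metal surface")
score = extended_lesk_overlap(drawing_paper, decal)
```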
Figure 20.8 Summary of Thesaurus Similarity measures
• Tezguino makes you drunk.
• We make tezguino out of corn.
• What do you know about tezguino?
Term-document matrix
Collection of documents
Identify a collection of important, discriminatory terms (words)
Matrix: terms × documents – term frequency tf_{w,d}
Each document is a vector in Z^V (Z = integers; N = natural numbers would be more accurate but perhaps misleading)
Example
Distributional Word Similarity D. Jurafsky
Example term-document matrix
Subset of terms = {battle, soldier, fool, clown}

          As You Like It   12th Night   Julius Caesar   Henry V
battle          1               1             8            15
soldier         2               2            12            36
fool           37              58             1             5
clown           6             117             0             0
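Word similarity can be read off this matrix by comparing rows, for example with cosine similarity; the counts below are exactly those in the table.

```python
import math

# Rows of the term-document matrix above (columns: the four plays).
counts = {
    "battle":  [1, 1, 8, 15],
    "soldier": [2, 2, 12, 36],
    "fool":    [37, 58, 1, 5],
    "clown":   [6, 117, 0, 0],
}

def cosine(u, v):
    """cos(u, v) = (u . v) / (|u| |v|)"""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# "battle" patterns with "soldier" across the plays, and "fool" with "clown".
```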
Figure 20.9 Term-in-context matrix for word similarity
window of 20 words – 10 before, 10 after – from the Brown corpus
Pointwise Mutual Information
• tf-idf (inverse document frequency) rating instead of raw counts – the idf intuition again
• pointwise mutual information (PMI): do events x and y occur more often than if they were independent?
PMI(X,Y) = log₂ [ P(X,Y) / (P(X) P(Y)) ]
• PMI between words
• positive PMI between two words (PPMI)
Computing PPMI
Matrix with W rows (words) and C columns (contexts)
f_ij is the frequency of w_i in c_j
p_ij = f_ij / Σ_i Σ_j f_ij;  p_i = Σ_j p_ij;  p_j = Σ_i p_ij
PPMI_ij = max(0, log₂ (p_ij / (p_i p_j)))
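A minimal sketch of the PPMI computation from a word-by-context count matrix, following the standard definitions; the toy counts are invented for illustration.

```python
import math

def ppmi_matrix(f):
    """PPMI from a count matrix f (rows = words, columns = contexts):
    p_ij = f_ij / total; PPMI_ij = max(0, log2(p_ij / (p_i * p_j)))."""
    total = sum(sum(row) for row in f)
    row_p = [sum(row) / total for row in f]                               # p_i
    col_p = [sum(f[i][j] for i in range(len(f))) / total
             for j in range(len(f[0]))]                                   # p_j
    ppmi = []
    for i, row in enumerate(f):
        out = []
        for j, count in enumerate(row):
            if count == 0:
                out.append(0.0)       # log2(0) is undefined; PPMI maps it to 0
            else:
                pmi = math.log2((count / total) / (row_p[i] * col_p[j]))
                out.append(max(0.0, pmi))
        ppmi.append(out)
    return ppmi

# Toy counts (hypothetical): 3 words x 3 context words.
f = [[0, 1, 3],
     [2, 1, 0],
     [1, 0, 1]]
ppmi = ppmi_matrix(f)
```

Cells whose observed probability exceeds what independence predicts get a positive score; everything else (including zero counts) is clipped to 0, which is the "positive" in PPMI.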
Example computing PPMI
Figure 20.10
Figure 20.11
Figure 20.12
Figure 20.13
Figure 20.14
Figure 20.15
Figure 20.16
http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf – how to do it in NLTK
NLTK 3.0a1 released February 2013. This version adds support for NLTK's graphical user interfaces.
Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words?
I want to use a function for word clustering and the Yarowsky algorithm to find similar collocations in a large text.