CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 4– Wordnet and Word Sense Disambiguation, cntd) Pushpak Bhattacharyya CSE Dept., IIT Bombay 11 th Jan, 2011

Dec 30, 2015


CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 4– Wordnet and Word Sense Disambiguation, cntd)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

11th Jan, 2011


Foundation of Wordnet: Lexical Matrix


Meaning-form relationship

[Figure: a "Meanings container" (M1, M2, M3, ..., Mk-1, Mk) connected to a "Word forms container" (W1, W2, W3, ..., Wk-1, Wk) by a many-to-many relationship.]
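The many-to-many relationship between meanings and word forms can be sketched as two inverse maps. This is a minimal illustration only: the word forms and meaning identifiers below are hypothetical toy data, not entries from any real wordnet.

```python
from collections import defaultdict

# Hypothetical (word form, meaning id) pairs, chosen only for illustration.
PAIRS = [
    ("bank", "M1_financial_institution"),
    ("bank", "M2_river_side"),
    ("depository", "M1_financial_institution"),
    ("riverside", "M2_river_side"),
]

def build_lexical_matrix(pairs):
    """Return (form -> set of meanings, meaning -> set of forms)."""
    form_to_meanings = defaultdict(set)
    meaning_to_forms = defaultdict(set)
    for form, meaning in pairs:
        form_to_meanings[form].add(meaning)
        meaning_to_forms[meaning].add(form)
    return form_to_meanings, meaning_to_forms

form_to_meanings, meaning_to_forms = build_lexical_matrix(PAIRS)

# One form with several meanings: an ambiguous word form.
ambiguous_forms = sorted(f for f, ms in form_to_meanings.items() if len(ms) > 1)

# One meaning with several forms: a synonym set (synset).
synonym_sets = {m: sorted(fs) for m, fs in meaning_to_forms.items() if len(fs) > 1}
```

Reading the matrix along a word form gives its candidate senses; reading it along a meaning gives the synset.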


Meaning-form

Homonymy (accidental identity or word borrowing)

Polysemy (shades of meaning)
  Eg: Fall
  1. The kingdom fell
  2. The fruit fell

Homography (same picture, i.e. same written form)
  Eg: river bank and financial bank

Homophony (same sound)
  Eg: write and right


Distributional Similarity

Words which are semantically similar tend to appear in syntactically similar contexts.

The neighbors of the words tend to be the same.

Technically known as Distributional Similarity.
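As a rough illustration of "the neighbors of the words tend to be the same", the sketch below builds a bag of neighbouring words for each target word and compares the bags with cosine similarity. The corpus, the window size, and the function names are assumptions made for this example.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus, for illustration only.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]

def context_vector(target, sentences, window=1):
    """Count the words seen within `window` positions of each occurrence of target."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

cat = context_vector("cat", CORPUS)
dog = context_vector("dog", CORPUS)
car = context_vector("car", CORPUS)
```

In this toy corpus "cat" and "dog" share all their neighbours, so `cosine(cat, dog)` comes out higher than `cosine(cat, car)`.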


Synset+Gloss+Example

Crucially needed for concept explication, wordnet building using another wordnet and wordnet linking.

English Synset: {earthquake, quake, temblor, seism} -- (shaking and vibration at the surface of the earth resulting from underground movement along a fault plane or from volcanic activity)

Hindi Synset: {भूकंप, भूचाल, भूडोल, जलजला, भूकम्प, भू-कंप, भू-कम्प, ज़लज़ला, भूमिकंप, भूमिकम्प} - प्राकृतिक कारणों से पृथ्वी के भीतरी भाग में कुछ उथल-पुथल होने से ऊपरी भाग के सहसा हिलने की क्रिया "२००१ में गुजरात में आये भूकंप में काफ़ी लोग मारे गये थे"

(sudden shaking of the surface of the earth caused by upheaval in its interior due to natural causes; "many people were killed in the earthquake that struck Gujarat in 2001")

Marathi Synset: {धरणीकंप, भूकंप} - पृथ्वीच्या पोटात द्रव्यक्षोभ होऊन पृष्ठभाग हालण्याची क्रिया "२००१ साली गुजरातमध्ये झालेल्या धरणीकंपात अनेक लोक मृत्युमुखी पडले"

(shaking of the surface caused by a disturbance of matter inside the earth; "many people died in the earthquake that struck Gujarat in 2001")


Semantic Relations

Hypernymy and Hyponymy
  Relation between word senses (synsets)
  X is a hyponym of Y if X is a kind of Y
  Hyponymy is transitive and asymmetrical
  Hypernymy is inverse of Hyponymy
  (lion -> animal -> animate entity -> entity)
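Transitivity and asymmetry can be checked on the lion chain with a small sketch. It assumes each synset has at most one direct hypernym, which is a simplification (a real wordnet may list several); the dictionary and function names are hypothetical.

```python
# Direct-hypernym links for the chain on the slide (single parent assumed).
HYPERNYM = {
    "lion": "animal",
    "animal": "animate entity",
    "animate entity": "entity",
}

def hypernym_chain(synset, hypernym=HYPERNYM):
    """Follow direct hypernym links upward and return every ancestor in order."""
    chain = []
    while synset in hypernym:
        synset = hypernym[synset]
        chain.append(synset)
    return chain

def is_hyponym_of(x, y, hypernym=HYPERNYM):
    """True if y is reachable from x through one or more hypernym links."""
    return y in hypernym_chain(x, hypernym)
```

Because the check walks the whole chain, `is_hyponym_of("lion", "entity")` holds even though no direct link exists (transitivity), while `is_hyponym_of("entity", "lion")` does not (asymmetry).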


Semantic Relations (continued)

Meronymy and Holonymy
  Part-whole relation: branch is a part of tree
  X is a meronym of Y if X is a part of Y
  Holonymy is the inverse relation of Meronymy
  Example: {kitchen} is a meronym of {house}


Lexical Relation

Antonymy
  Oppositeness in meaning
  Relation between word forms
  Often determined by phonetics, word length etc. ({rise, ascend} vs. {fall, descend})


WordNet Sub-Graph

[Figure: WordNet sub-graph around the synset {house, home}. Gloss: "A place that serves as the living quarters of one or more families". Other nodes: {dwelling, abode}, study, bedroom, kitchen, guestroom, veranda, backyard, hermitage, cottage. Edge labels: hypernymy, hyponymy, meronymy.]


Troponym and Entailment

Entailment {snoring – sleeping}

Troponym {limp, strut – walk} {whisper – talk}


Entailment

Snoring entails sleeping.
Buying entails paying.

Proper Temporal Inclusion. Inclusion can be in any way.
  Sleeping temporally includes snoring.
  Buying temporally includes paying.

Co-extensiveness (Troponymy).
  Limping is a manner of walking.


Opposition among verbs.

{Rise, ascend} vs. {fall, descend}
Tie - untie (do - undo)
Walk - run (slow, fast)
Teach - learn (same activity, different perspective)
Rise - fall (motion upward or downward)

Opposition and Entailment.
  Hit or miss (entail aim). Backward presupposition.
  Succeed or fail (entail try).


The causal relationship.

Show - see.
Give - have.

Causation and Entailment.
  Giving entails having.
  Feeding entails eating.


Kinds of Antonymy

Size: Small - Big
Quality: Good - Bad
State: Warm - Cool
Personality: Dr. Jekyll - Mr. Hyde
Direction: East - West
Action: Buy - Sell
Amount: Little - A lot
Place: Far - Near
Time: Day - Night
Gender: Boy - Girl


Kinds of Meronymy

Component-object: Head - Body
Stuff-object: Wood - Table
Member-collection: Tree - Forest
Feature-Activity: Speech - Conference
Place-Area: Palo Alto - California
Phase-State: Youth - Life
Resource-process: Pen - Writing
Actor-Act: Physician - Treatment


Gradation

State: Childhood, Youth, Old age
Temperature: Hot, Warm, Cold
Action: Sleep, Doze, Wake


Metonymy

Associated with Metaphors which are epitomes of semantics

Oxford Advanced Learner's Dictionary definition: “The use of a word or phrase to mean something different from the literal meaning”

Does it mean Careless Usage?!


Insight from Sanskritic Tradition

Power of a word: Abhidha, Lakshana, Vyanjana

Meaning of "hall":
  The hall is packed (abhidha)
  The hall burst into laughter (lakshana)
  The hall is full (unsaid: and so we cannot enter) (vyanjana)


Metaphors in Indian Tradition

upamana and upameya
  upameya: object being compared
  upamana: object being compared with

Puru was like a lion in the battle with Alexander (Puru: upameya; Lion: upamana)


Upamana, rupak, atishayokti

upamana: Explicit comparison
  Puru was like a lion in the battle with Alexander

rupak: Implicit comparison
  Puru was a lion in the battle with Alexander

Atishayokti (exaggeration): upamana and upameya dropped
  Puru’s army fled. But the lion fought on.


Modern study (1956 onwards, Richards et al.)

Three constituents of metaphor
  Vehicle (items used metaphorically)
  Tenor (the metaphorical meaning of the former)
  Ground (the basis for metaphorical extension)

“The foot of the mountain”
  Vehicle: “foot”
  Tenor: “lower portion”
  Ground: “spatial parallel between the relationship of the foot to the human body and that of the lower portion of the mountain to the rest of the mountain”


Interaction of semantic fields (Haas)

Core vs. peripheral semantic fields

Interaction of two words in metonymic relation brings in new semantic fields with selective inclusion of features

Leg of a table
  Does not stretch or move
  Does stand and support


Lakoff’s (1987) contribution

Source Domain
Target Domain
Mapping Relations


Mapping Relations: ontological correspondences

Anger is heat of fluid in container

Source domain                  Target domain
Heat                           Anger
(i) Container                  Body
(ii) Agitation of fluid        Agitation of mind
(iii) Limit of resistance      Limit of ability to suppress
(iv) Explosion                 Loss of control


Image Schemas

Categories: Container, Contained

Quantity
  More is up, less is down: Outputs rose dramatically; accident rates were lower
  Linear scales and paths: Ram is by far the best performer

Time
  Stationary event: we are coming to exam time
  Stationary observer: weeks rush by

Causation: desperation drove her to extreme steps


Patterns of Metonymy

Container for contained
  The kettle boiled (water)

Possessor for possessed/attribute
  Where are you parked? (car)

Represented entity for representative
  The government will announce new targets

Whole for part
  I am going to fill up the car with petrol


Patterns of Metonymy (contd)

Part for whole
  I noticed several new faces in the class

Place for institution
  Lalbaug witnessed the largest Ganapati

Question: Can you have part-part metonymy?


Purpose of Metonymy

More idiomatic/natural way of expression
  More natural to say the kettle is boiling as opposed to the water in the kettle is boiling

Economy
  Room 23 is answering (but not *is asleep)

Ease of access to referent
  He is in the phone book (but not *on the back of my hand)

Highlighting of associated relation
  The car in the front decided to turn right (but not *to smoke a cigarette)


Feature sharing not necessary

In a restaurant:
  Jalebii ko abhi dudh chaiye ("The jalebi now wants milk", said of the customer who ordered jalebi; no feature sharing)
  The elephant now wants some coffee (feature sharing)


Proverbs

Describes a specific event or state of affairs which is applicable metaphorically to a range of events or states of affairs provided they have the same or sufficiently similar image-schematic structure


WSD APPROACHES


WSD – Problem Definition

Obtain the sense of
  A set of target words, or of
  All words (all word WSD, more difficult)

Against a
  Sense repository (like the wordnet), or
  A thesaurus (not same as wordnet, does not have semantic relations)

Using the
  Context in which the word appears.


Example word: operation

operation ((computer science) data processing in which the result is completely specified by a rule (especially the processing that results from a single instruction)) "it can perform millions of operations per second"

operation, military operation (activity by a military or naval force (as a maneuver or campaign)) "it was a joint operation of the navy and air force"

operation, surgery, surgical operation, surgical procedure, surgical process (a medical procedure involving an incision with instruments; performed to repair damage or arrest disease in a living body) "they will schedule the operation as soon as an operating room is available"; "he died while undergoing surgery"

mathematical process, mathematical operation, operation ((mathematics) calculation by mathematical methods) "the problems at the end of the chapter demonstrated the mathematical processes involved in the derivation"; "they were learning the basic operations of arithmetic"


KNOWLEDGE BASED v/s MACHINE LEARNING BASED v/s HYBRID APPROACHES

Knowledge Based Approaches
  Rely on knowledge resources like WordNet, Thesaurus etc.
  May use grammar rules for disambiguation.
  May use hand coded rules for disambiguation.

Machine Learning Based Approaches
  Rely on corpus evidence.
  Train a model using tagged or untagged corpus.
  Probabilistic/Statistical models.

Hybrid Approaches
  Use corpus evidence as well as semantic relations from WordNet.


SELECTIONAL PREFERENCES (INDIAN TRADITION)

“Desire” of some words in the sentence (“aakaangksha”).

  I saw the boy with long hair.
  The verb “saw” and the noun “boy” desire an object here.

“Appropriateness” of some other words in the sentence to fulfil that desire (“yogyataa”).

  I saw the boy with long hair.
  The PP “with long hair” can be appropriately connected only to “boy” and not “saw”.

In case the ambiguity is still present, “proximity” (“sannidhi”) can determine the meaning.

  E.g. I saw the boy with a telescope.
  The PP “with a telescope” can be attached to both “boy” and “saw”, so the ambiguity is still present. It is then attached to “boy” using the proximity check.


SELECTIONAL PREFERENCES (RECENT LINGUISTIC THEORY)

There are words which demand arguments, like, verbs, prepositions, adjectives and sometimes nouns. These arguments are typically nouns.

Arguments must have the property to fulfil the demand. They must satisfy selectional preferences.

Example: Give (verb)
  agent – animate
  obj – direct
  obj – indirect

  I gave him the book
  I gave him the book (yesterday in the school) -> adjunct

How does this help in WSD?

One type of contextual information is the information about the type of arguments that a word takes.



WSD USING SELECTIONAL PREFERENCES AND ARGUMENTS

Sense 1: This airline serves dinner in the evening flight.
  serve (Verb): agent; object – edible

Sense 2: This airline serves the sector between Agra & Delhi.
  serve (Verb): agent; object – sector

Requires exhaustive enumeration of:
  Argument-structure of verbs.
  Selectional preferences of arguments.
  Description of properties of words such that meeting the selectional preference criteria can be decided.

E.g. This flight serves the “region” between Mumbai and Delhi

How do you decide if “region” is compatible with “sector”?
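One possible way to decide compatibility, sketched here under stated assumptions, is to look the object noun up in an is-a hierarchy and test it against the type each sense demands. The hierarchy, the common ancestor "area" for "sector" and "region", and the sense labels are all hypothetical and hand-made for this example.

```python
# Hypothetical is-a links for object nouns (assumption: "sector" and
# "region" share the ancestor "area").
ISA = {
    "dinner": "edible",
    "snacks": "edible",
    "sector": "area",
    "region": "area",
}

# Hypothetical sense inventory for "serve": each sense constrains its object.
SERVE_SENSES = {
    "sense 1 (provide food)": "edible",
    "sense 2 (operate in an area)": "area",
}

def satisfies(noun, required, isa=ISA):
    """True if noun is the required type or has it as an ancestor."""
    while noun is not None:
        if noun == required:
            return True
        noun = isa.get(noun)
    return False

def disambiguate_serve(obj):
    """Return the senses of 'serve' whose object preference obj satisfies."""
    return [sense for sense, required in SERVE_SENSES.items() if satisfies(obj, required)]
```

With this toy hierarchy, `disambiguate_serve("region")` selects sense 2 because "region" and "sector" meet at "area". The cost noted on the slide remains: every noun and every verb frame has to be enumerated by hand.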


OVERLAP BASED APPROACHES

Require a Machine Readable Dictionary (MRD).

Find the overlap between the features of different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag).

These features could be sense definitions, example sentences, hypernyms etc.

The features could also be given weights.

The sense which has the maximum overlap is selected as the contextually appropriate sense.
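A minimal sketch of a weighted overlap score follows. The sense bags, feature kinds, weights, and context bag are hypothetical toy values; a real system would read them from a machine readable dictionary.

```python
# Hypothetical sense bags for "bank", split by feature kind.
SENSES = {
    "bank/finance": {
        "gloss": ["institution", "money", "deposit"],
        "hypernyms": ["organization"],
    },
    "bank/river": {
        "gloss": ["sloping", "land", "river"],
        "hypernyms": ["slope"],
    },
}

# Hypothetical context bag built from the words around the target.
CONTEXT = {"money", "interest", "deposit"}

def weighted_overlap(sense_features, context_bag, weights=None):
    """Sum, over feature kinds, of weight * number of words shared with the context."""
    weights = weights or {}
    score = 0.0
    for kind, words in sense_features.items():
        score += weights.get(kind, 1.0) * len(set(words) & context_bag)
    return score

def best_sense(senses, context_bag, weights=None):
    """Pick the sense with the maximum weighted overlap."""
    return max(senses, key=lambda s: weighted_overlap(senses[s], context_bag, weights))
```

Feature kinds missing from `weights` default to 1.0, so the unweighted case is the plain overlap count.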



LESK’S ALGORITHM

Sense Bag: contains the words in the definition of a candidate sense of the ambiguous word.

Context Bag: contains the words in the definition of each sense of each context word.

E.g. “On burning coal we get ash.”

Ash
  Sense 1: Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches.
  Sense 2: The solid residue left when combustible material is thoroughly burned or oxidized.
  Sense 3: To convert into ash

Coal
  Sense 1: A piece of glowing carbon or burnt wood.
  Sense 2: charcoal.
  Sense 3: A black solid combustible substance formed by the partial decomposition of vegetable matter without free access to air and under the influence of moisture and often increased pressure and temperature that is widely used as a fuel for burning

In this case Sense 2 of ash would be the winner sense.
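The ash/coal example can be reproduced with a simplified sketch of the overlap count. The stopword list and the tokenisation are assumptions added here, and only "coal" is treated as a context word.

```python
import re

# Assumed stopword list; function words would otherwise inflate the overlap.
STOPWORDS = {"a", "the", "of", "and", "or", "is", "to", "with", "by", "as",
             "for", "that", "when", "into", "under", "without"}

ASH = {
    1: "Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches.",
    2: "The solid residue left when combustible material is thoroughly burned or oxidized.",
    3: "To convert into ash",
}
COAL = {
    1: "A piece of glowing carbon or burnt wood.",
    2: "charcoal.",
    3: ("A black solid combustible substance formed by the partial decomposition of "
        "vegetable matter without free access to air and under the influence of moisture "
        "and often increased pressure and temperature that is widely used as a fuel for burning"),
}

def bag(text):
    """Lowercased content words of a definition."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def lesk(target_senses, context_word_senses):
    """Score each target sense by its overlap with the pooled context bag."""
    context_bag = set()
    for senses in context_word_senses:
        for gloss in senses.values():
            context_bag |= bag(gloss)
    scores = {sid: len(bag(gloss) & context_bag) for sid, gloss in target_senses.items()}
    return max(scores, key=scores.get), scores

winner, scores = lesk(ASH, [COAL])
```

Sense 2 of ash shares "solid" and "combustible" with the third definition of coal, while the other two senses share no content words. Note that "burned" does not match "burnt" or "burning" without stemming, one reason exact-match overlap is brittle.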


WALKER’S ALGORITHM

A Thesaurus Based approach.

Step 1: For each sense of the target word find the thesaurus category to which that sense belongs.

Step 2: Calculate the score for each sense by using the context words. A context word will add 1 to the score of the sense if the thesaurus category of the word matches that of the sense.

E.g. The money in this bank fetches an interest of 8% per annum

Target word: bank
Clue words from the context: money, interest, annum, fetch

Context word   Sense1: Finance   Sense2: Location
Money          +1                0
Interest       +1                0
Fetch          0                 0
Annum          +1                0
Total          3                 0

Context words add 1 to the sense when the topic of the word matches that of the sense.
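The scoring step can be sketched as below. The thesaurus lookup is a hypothetical hand-made dictionary that mirrors the bank example; a real implementation would query an actual thesaurus for each word's categories.

```python
# Hypothetical thesaurus: word -> set of categories (toy data for the example).
THESAURUS = {
    "money": {"Finance"},
    "interest": {"Finance"},
    "annum": {"Finance"},
    "fetch": set(),
}

# Step 1: thesaurus category of each sense of the target word "bank".
SENSE_CATEGORY = {"Sense1": "Finance", "Sense2": "Location"}

def walker_scores(sense_category, context_words, thesaurus):
    """Step 2: each context word adds 1 to every sense whose category it shares."""
    scores = {sense: 0 for sense in sense_category}
    for word in context_words:
        categories = thesaurus.get(word, set())
        for sense, category in sense_category.items():
            if category in categories:
                scores[sense] += 1
    return scores

scores = walker_scores(SENSE_CATEGORY, ["money", "interest", "fetch", "annum"], THESAURUS)
winner = max(scores, key=scores.get)
```

The resulting totals match the table: 3 for the Finance sense and 0 for the Location sense, so `winner` is "Sense1".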