WORDNET Approach on word sense techniques - AKILAN VELMURUGAN

Dec 26, 2015

Clinton Small
What is WORDNET

Machine-readable semantic dictionary, interlinked by semantic relations

Developed by Princeton University

Large lexical database for the English language

Language forms a scale-free network with a small average shortest path, having words as nodes and concepts as links

source: http://wordnet.princeton.edu/
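To make the "small average shortest path" property concrete, here is a minimal Python sketch that computes the mean shortest-path length of a graph with breadth-first search. The word graph is hypothetical toy data, not taken from WordNet, and testing whether a degree distribution is scale-free is outside its scope.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy word graph (not real WordNet data): words are nodes and
# each pair below is an undirected link between two words.
EDGES = [
    ("fly", "lure"), ("lure", "bait"), ("lure", "decoy"),
    ("bait", "ground bait"), ("decoy", "stool pigeon"), ("fly", "spinner"),
]


def build_graph(edge_list):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, start):
    """Breadth-first search: hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_shortest_path(graph):
    """Mean shortest-path length over all pairs of nodes that are connected."""
    total, pairs = 0, 0
    for u, v in combinations(sorted(graph), 2):
        d = bfs_distances(graph, u).get(v)
        if d is not None:
            total += d
            pairs += 1
    return total / pairs if pairs else 0.0


graph = build_graph(EDGES)
print(round(average_shortest_path(graph), 3))  # 2.286 for this toy graph
```

On a real lexical network the same measure would be computed over far more nodes, where running one BFS per source node (rather than per pair) becomes worthwhile.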

Use of WordNet:
- easily navigable
- used as an online dictionary for English
- freely available to the public

Structure showing relations in the form of:
- noun, verb, adjective, adverb
- synonym
- hypernym (is a kind of …)
- hyponym (… is a kind of)
- troponym (particular ways to …)
- meronym (parts of …)

WORDNET Application

source: http://wordnet.princeton.edu/
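A minimal sketch of how the hypernym and hyponym relations read, assuming a hypothetical dictionary that maps each concept to its hypernym. The concept pairs come from the "fly" example used later in this deck; the data layout is an illustration, not WordNet's actual storage format.

```python
# Hypothetical toy fragment of a WordNet-style hierarchy (not real WordNet
# data): each entry maps a concept to its hypernym ("is a kind of").
HYPERNYM = {
    "fly": "fish lure",
    "spinner": "fish lure",
    "troll": "fish lure",
    "fish lure": "lure",
}


def hypernyms(word):
    """Chain of hypernyms: word is a kind of h1, h1 is a kind of h2, ..."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def hyponyms(word):
    """Direct hyponyms: the concepts that are 'a kind of' word."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == word)


print(hypernyms("fly"))       # ['fish lure', 'lure']
print(hyponyms("fish lure"))  # ['fly', 'spinner', 'troll']
```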

A few representations of WORDNET

- Schema representation
- Graph theory
- Tree structure
- Force graph structure
- WordNet Explorer

Visual interface for WordNet

Using RDF Schema and OWL ontology

Wordnet classes and properties are represented as wn:word and wn:wordsense

Source: www.w3.org/.../WNET/wordnet-sw-20040713.html
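A toy illustration of the RDF idea, written in plain Python rather than with an RDF library: facts are (subject, predicate, object) triples and queries are patterns with wildcards. The class names wn:word and wn:wordsense come from the slide and rdf:type is the standard RDF typing predicate; every name with the ex: prefix is hypothetical and not taken from the W3C schema.

```python
# Toy triple store in plain Python (no RDF library). wn:word and wn:wordsense
# are the class names from the slide; rdf:type is standard RDF; everything
# with the ex: prefix is hypothetical.
TRIPLES = [
    ("ex:word-fly", "rdf:type", "wn:word"),
    ("ex:sense-fly-1", "rdf:type", "wn:wordsense"),
    ("ex:word-fly", "ex:hasSense", "ex:sense-fly-1"),
    ("ex:sense-fly-1", "ex:inSynset", "ex:synset-fish-lure"),
]


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None works as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


print(match(p="rdf:type"))  # the two typed resources
```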

Represented using Graph theory

Can be a directed or an undirected graph

Source: www.nodebox.net/code/index.php/Graph
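A minimal sketch of the directed/undirected distinction, assuming hypothetical hypernym edges taken from this deck's "fly" example. In the directed version links run only from a concept to its hypernym, so reachability is one-way; the undirected version also stores each reverse link.

```python
# Hypothetical hypernym edges (concept, its hypernym) from the "fly" example.
EDGES = [("fly", "fish lure"), ("spinner", "fish lure"), ("fish lure", "lure")]


def adjacency(edges, directed=True):
    """Adjacency sets; an undirected graph also stores every reverse link."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
        if not directed:
            adj[v].add(u)
    return adj


def reachable(adj, start):
    """All nodes reachable from start by following links (depth-first)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {start}


print(sorted(reachable(adjacency(EDGES), "fly")))                   # ['fish lure', 'lure']
print(sorted(reachable(adjacency(EDGES, directed=False), "lure")))  # ['fish lure', 'fly', 'spinner']
```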

Represented using Tree structure

uses tokens and lexical relations

Source: www.docs.huihoo.com/nltk/0.9.5/en/ch02.html
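One way to picture the tree structure is to print a hyponym hierarchy with indentation. This minimal sketch assumes a hypothetical parent-to-children dictionary built from the "lure" example used later in the deck; it does not use NLTK or the chapter cited above.

```python
# Hypothetical hyponym tree (parent -> children) based on the "lure" example.
TREE = {
    "lure": ["fish lure", "ground bait", "stool pigeon"],
    "fish lure": ["fly", "spinner", "troll"],
}


def render(node, depth=0):
    """Return the subtree under node as indented lines, two spaces per level."""
    lines = ["  " * depth + node]
    for child in TREE.get(node, []):
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render("lure")))
```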

Represented using Force Graph Structure

Presents words and meanings as graph nodes and the relations between them as edges

Source: www.code.google.com/p/synonym/
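A force-directed layout needs only a node list and an edge list. This minimal sketch turns a hypothetical word-to-meanings map into that shape. The "nodes"/"links" keys follow a convention common in force-layout tools but are an assumption here, and the meanings are illustrative labels rather than WordNet glosses.

```python
import json

# Hypothetical word -> meanings map; meanings are illustrative labels only.
SENSES = {
    "fly": ["insect", "fishing lure"],
    "lure": ["fishing lure", "attraction"],
}


def to_graph(senses):
    """Words and meanings become nodes; each word-meaning pair becomes an edge."""
    nodes, links = {}, []
    for word, meanings in senses.items():
        nodes[word] = {"id": word, "kind": "word"}
        for meaning in meanings:
            # A meaning shared by several words becomes a single node.
            nodes.setdefault(meaning, {"id": meaning, "kind": "meaning"})
            links.append({"source": word, "target": meaning})
    return {"nodes": list(nodes.values()), "links": links}


print(json.dumps(to_graph(SENSES), indent=2))
```

Because "fishing lure" is shared, both words attach to one meaning node, which is what lets a force layout pull related words together.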

Represented using WORDNET Explorer

Applies visual principles to lexical semantics

Source: www.cs.toronto.edu/~ccollins/research/wnVis.htm

Flow of study

- Background study on word sense
- Word ontology
- Word Sense Disambiguation
- Variable lexical notation for a concept
- i-level generic notation
- i-level specific notation
- Semantic relatedness in WSD
- Experiment results
- Thesaurus as a complex network
- Visual interface for WordNet

WORDNET – synsets – word ontology – set algebra – rules for representing lexical notations – semantic relatedness between concepts – concept distribution statistics – Degree of semantic relatedness :: WSD – Word Sense Disambiguation – semcor – Test cases – WSD on a complex network – WSD in English Thesaurus – Future work

Source: http://kylescholz.com/projects/wordnet

WordNet as a common-sense ontology: symbols are words, and concept meanings are synsets.

A synset is represented by one or more words; the words used for the representation are synonyms.

Synonyms and polysemous words: a synset comprises a list of words and a list of semantic relations to other synsets.

Part I: a list of words, each with the list of synsets that the word represents.

Part II: a set of semantic relations between synsets (is-a, part-of, substance-of, member-of).
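The two parts described above can be sketched as plain Python data structures. The synset names and the split of "fly" into a lure sense and an insect sense are hypothetical illustrations of polysemy; only the word lists of the two lure synsets and the is-a relation come from this deck.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A concept: its synonymous words plus relations to other synsets."""
    name: str
    words: list
    relations: dict = field(default_factory=dict)  # relation -> [synset names]


# Hypothetical synsets; "fly" appears in two of them (polysemy).
# "insect" is used only as a relation target and is not defined here.
SYNSETS = [
    Synset("fish_lure", ["fisherman's lure", "fish lure"], {"is-a": ["lure"]}),
    Synset("lure", ["bait", "decoy", "lure"]),
    Synset("fly_lure", ["fly"], {"is-a": ["fish_lure"]}),
    Synset("fly_insect", ["fly"], {"is-a": ["insect"]}),
]


def word_index(synsets):
    """Part I: each word with the list of synsets it represents."""
    index = {}
    for s in synsets:
        for w in s.words:
            index.setdefault(w, []).append(s.name)
    return index


def relation_list(synsets):
    """Part II: semantic relations as (source, relation, target) triples."""
    return [(s.name, rel, target)
            for s in synsets
            for rel, targets in s.relations.items()
            for target in targets]


print(word_index(SYNSETS)["fly"])  # ['fly_lure', 'fly_insect']
```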

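WSD: variable lexical notations for a concept

Generic concept notation:

$$D = I \cup J \cup K \quad\therefore\quad J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)} = D \cap (\overline{I} \cap \overline{K})$$

(the step J = D − (I ∪ K) requires J to share no members with I or K)

Since

$$B = D \cup E \cup F, \quad D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)} = B \cap (\overline{E} \cap \overline{F})$$

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications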
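The derivation can be checked numerically on small sets. This sketch uses arbitrary integer sets for I, J, K and a universe U for the complements (all assumptions for illustration). It also shows that the step J = D − (I ∪ K) no longer recovers J once J overlaps I.

```python
# Arbitrary example sets; I, J, K are chosen pairwise disjoint.
U = set(range(10))  # universe used for complements (the overline)
I, J, K = {1, 2}, {3}, {4, 5}
D = I | J | K       # D = I ∪ J ∪ K


def complement(s):
    """Complement relative to the universe U."""
    return U - s


# Each step of the derivation, left to right.
steps = [
    D - (I | K),
    (D - I) & (D - K),
    D & complement(I | K),
    D & (complement(I) & complement(K)),
]
print(all(step == J for step in steps))  # True

# If J overlaps I, the first step no longer recovers J.
J2 = {2, 3}
D2 = I | J2 | K
print(D2 - (I | K) == J2)  # False
```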

WSD: variable lexical notations for a concept

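$$J = D \cap (\overline{I} \cap \overline{K}) = \big(B \cap (\overline{E} \cap \overline{F})\big) \cap (\overline{I} \cap \overline{K})$$

$$J = B \cap \big((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K})\big)$$

where J = fly, D = fish lure, I = spinner, K = troll. Introducing Boolean operators:

- AND for ∩
- OR for ∪
- NOT for the complement (overline)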

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

WSD: variable lexical notations for a concept

(“fly”) becomes: (“fisherman's lure” OR “fish lure”) AND ((NOT “spinner”) AND (NOT “troll”))

Then, with B = lure, E = ground bait, F = stool pigeon,

(“fly”) becomes: (“bait” OR “decoy” OR “lure”) AND (((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”)))

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
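The translation from set notation to a search query can be sketched as string building: the synonyms of the ancestor synset are ORed together and every excluded synset becomes a NOT term joined with AND. The function names are hypothetical; because AND is associative, the flat output is equivalent to the nested parentheses in the queries above even though it is formatted differently.

```python
def or_group(terms):
    """OR together the synonyms of one synset (the ∪ of its words)."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def notation_query(ancestor_terms, excluded_terms):
    """Ancestor synset AND NOT each excluded sibling: ∩ -> AND, complement -> NOT."""
    parts = [or_group(ancestor_terms)]
    parts += [f'(NOT "{t}")' for t in excluded_terms]
    return " AND ".join(parts)


# Terms taken from the "fly" example; the function names are hypothetical.
level1 = notation_query(["fisherman's lure", "fish lure"], ["spinner", "troll"])
level2 = notation_query(["bait", "decoy", "lure"],
                        ["ground bait", "stool pigeon", "spinner", "troll"])
print(level1)
print(level2)
```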

Notation for a synset

i-level generic notation for a synset

If $S_k$ is a synset and $F_i$ is the synset located i links away from $S_k$ following the hypernym links, then the i-level generic notation for $S_k$ is:

Note: $F_i$ is the parent node of $F_{i-1}$, $F_{i-1}$ is the parent node of $F_{i-2}$, …
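Finding $F_i$ amounts to following i hypernym links upward. A minimal sketch, assuming a hypothetical child-to-parent dictionary built from the "fly" example; the slide's full generic-notation formula did not survive extraction, so only the ancestor lookup is shown.

```python
# Hypothetical hypernym links (child -> parent) from the "fly" example.
HYPERNYM = {"fly": "fish lure", "fish lure": "lure"}


def ancestor(synset, i):
    """Return F_i, the synset i hypernym links above `synset`, or None."""
    node = synset
    for _ in range(i):
        node = HYPERNYM.get(node)
        if node is None:
            return None
    return node


def ancestor_chain(synset, i):
    """Return [F_1, ..., F_i]; each F_k is the parent of F_(k-1)."""
    return [ancestor(synset, k) for k in range(1, i + 1)]


print(ancestor_chain("fly", 2))  # ['fish lure', 'lure']
```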

i-level specific notation for a synset

$$J = P \cup Q \cup R, \quad \text{where } P = T,\ Q = U,\ R = V \cup W \quad\therefore\quad J = T \cup U \cup (V \cup W)$$

If S is a synset and $L_i$ is the set of synsets $C_{ik}$ located i links away from S following the hyponym links, then the i-level specific regular notation for S is:

Note: if $C_{ik}$ is null, then $C_{(i-1)k}$ is used ($C_{(i-1)k}$ is a leaf node in this case).

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
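The i-level specific notation can be sketched as a level-by-level walk down the hyponym links, where a branch that ends before level i contributes its leaf. The hyponym dictionary below is hypothetical and simply encodes the J, P, Q, R example from this slide.

```python
# Hypothetical hyponym links (parent -> children) encoding the J, P, Q, R example.
HYPONYMS = {
    "J": ["P", "Q", "R"],
    "P": ["T"],
    "Q": ["U"],
    "R": ["V", "W"],
}


def specific_notation(synset, i):
    """Synsets i hyponym links below `synset`. A branch that ends early
    contributes its leaf instead (the C_(i-1)k fallback)."""
    frontier = [synset]
    for _ in range(i):
        next_level = []
        for node in frontier:
            # A leaf has no hyponyms, so it stands in for its missing children.
            next_level.extend(HYPONYMS.get(node, [node]))
        frontier = next_level
    return frontier


print(" ∪ ".join(specific_notation("J", 2)))  # T ∪ U ∪ V ∪ W
```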

WSD: Semantic relatedness and word sense disambiguation

Procedure for determining the semantic relatedness of two given WordNet synsets

Conception 1: concepts that appear together more frequently and closer to each other are "more related" than concepts that appear together less frequently and farther apart.

Conception 1 mapped to synset relatedness measurement:
- concepts → synset lexical notation
- close or far appearance → whether the synsets exist in a web page or not
- co-occurrence frequency → number of web pages containing the synsets

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
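A minimal sketch of Conception 1 as a page-count measure, under two stated assumptions: a page "contains" a synset when it mentions any of the synset's terms (NOT terms are ignored), and relatedness is pages containing both synsets divided by pages containing either (a Jaccard ratio). The slide does not give its formula, and the pages are a hypothetical stand-in for web search results.

```python
# Hypothetical corpus standing in for web pages.
PAGES = [
    "tie a fly or a fish lure to the line",
    "a spinner is a kind of fish lure",
    "the fly landed on the window",
    "bait and lure shops sell every fly pattern",
]


def contains(page, terms):
    """Assumed rule: a page contains a synset if it mentions any of its terms."""
    text = f" {page} "
    return any(f" {t} " in text for t in terms)  # whole-word match


def relatedness(terms_a, terms_b, pages):
    """Assumed Jaccard ratio: pages with both synsets / pages with either."""
    a = {i for i, p in enumerate(pages) if contains(p, terms_a)}
    b = {i for i, p in enumerate(pages) if contains(p, terms_b)}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(relatedness(["fly"], ["fish lure", "lure"], PAGES))  # 0.5
```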

WSD: tested on four random texts

i-level generic notation: i = 1, 2, 3

Size of the context window (target word vs. context words): 3, 5, 7

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
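The context-window parameter can be sketched as slicing tokens around the target word. Here a window of size n is assumed to mean n tokens centred on the target (n odd), truncated at the text edges; the sentence is a made-up example, and the source may count window size differently.

```python
def context_window(tokens, target_index, size):
    """Return `size` tokens centred on the target (size assumed odd),
    truncated where the text starts or ends."""
    half = size // 2
    start = max(0, target_index - half)
    return tokens[start:target_index + half + 1]


# Made-up sentence; "fly" is the target word.
tokens = "he tied a dry fly to the end of the line".split()
target = tokens.index("fly")
for size in (3, 5, 7):
    print(size, context_window(tokens, target, size))
```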

Thesaurus as a complex network

As a directed graph:

- Sinks: the 73,046 terms with $k_{out} = 0$
- Sources: the 30,260 terms with at least one outgoing link ($k_{out} > 0$), i.e. the root words
- Absolute sources: without incoming links ($k_{in} = 0$)
- Normal sources: $k_{out} > 0$ and $k_{in} > 0$
- Bridge sources: without outgoing links to root words ($k_{out}(\text{source}) = 0$)

Figure legend: 1: normal source; 2: bridge source; 3: absolute source; 4: sink

Source: arXiv:cond-mat/0312586 v1 2003
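The node categories can be computed from in- and out-degrees. This minimal sketch uses hypothetical thesaurus links (root word to a term listed under it) and returns a set of labels per node, treating absolute/normal and bridge as overlapping properties of a source; how the paper resolves that overlap is not stated on the slide.

```python
# Hypothetical thesaurus links: (root word, term listed under it).
LINKS = [
    ("happy", "glad"), ("happy", "cheerful"),
    ("glad", "happy"), ("glad", "pleased"),
    ("cheerful", "pleased"), ("merry", "joyful"),
]


def classify(links):
    """Label each term as a sink or a source (absolute/normal, maybe bridge)."""
    nodes = {n for link in links for n in link}
    kout = {n: 0 for n in nodes}
    kin = {n: 0 for n in nodes}
    for u, v in links:
        kout[u] += 1
        kin[v] += 1
    roots = {n for n in nodes if kout[n] > 0}  # sources, i.e. root words
    labels = {}
    for n in nodes:
        if kout[n] == 0:
            labels[n] = {"sink"}
            continue
        tags = {"source", "absolute" if kin[n] == 0 else "normal"}
        # Bridge: none of this source's outgoing links reaches a root word.
        if not any(u == n and v in roots for u, v in links):
            tags.add("bridge")
        labels[n] = tags
    return labels


for term, tags in sorted(classify(LINKS).items()):
    print(term, sorted(tags))
```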

Thesaurus as a complex network

Frequency of outgoing links

Frequency of incoming links

Source: arXiv:cond-mat/0312586 v1 2003

Thesaurus as a complex network

Figure: incoming vs. outgoing link frequency distributions. $k_{out}$ is plotted for root words; $k_{in}$ is plotted for all words, for root words, and for non-root words.
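The frequency distributions amount to histograms of degrees. This sketch counts, over hypothetical thesaurus links, how many root words have each $k_{out}$ and how many words (all, root, non-root) have each $k_{in}$; it does not reproduce the paper's data.

```python
from collections import Counter

# Hypothetical thesaurus links: (root word, term listed under it).
LINKS = [
    ("happy", "glad"), ("happy", "cheerful"),
    ("glad", "happy"), ("glad", "pleased"),
    ("cheerful", "pleased"), ("merry", "joyful"),
]


def degree_histograms(links):
    """Histograms {degree: number of words} for k_out and k_in."""
    nodes = {n for link in links for n in link}
    kout = Counter(u for u, _ in links)
    kin = Counter(v for _, v in links)  # missing keys count as 0
    roots = {n for n in nodes if kout[n] > 0}
    return {
        "kout_roots": Counter(kout[n] for n in roots),
        "kin_all": Counter(kin[n] for n in nodes),
        "kin_roots": Counter(kin[n] for n in roots),
        "kin_non_roots": Counter(kin[n] for n in nodes - roots),
    }


for name, hist in degree_histograms(LINKS).items():
    print(name, dict(sorted(hist.items())))
```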

Extensions of WordNet

- Transforming a tree structure to a matrix structure
- WordNet in other languages (Japanese, Korean, Thai)
- ImageNet interlinked with WordNet
- REBUILDER: a repository of software designs; retrieves using a Bayesian network and WordNet
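A minimal sketch of the first extension: converting a tree into an adjacency matrix where M[p][c] = 1 marks a parent-to-child (hyponym) link. The tree is hypothetical data from the "lure" example, and the slide does not describe how the original transformation is done.

```python
# Hypothetical hyponym tree (parent -> children) from the "lure" example.
TREE = {"lure": ["fish lure", "ground bait"], "fish lure": ["fly", "spinner"]}


def tree_to_matrix(tree):
    """Return (node order, matrix) with matrix[p][c] = 1 for each parent -> child link."""
    order = []
    for parent, children in tree.items():
        for node in [parent, *children]:
            if node not in order:
                order.append(node)
    pos = {node: k for k, node in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for parent, children in tree.items():
        for child in children:
            matrix[pos[parent]][pos[child]] = 1
    return order, matrix


order, matrix = tree_to_matrix(TREE)
print(order)
for row in matrix:
    print(row)
```

A matrix form makes operations like "all hyponyms two levels down" expressible as matrix products, at the cost of storing mostly zeros for a sparse tree.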
