Sectoral Operational Programme "Increase of Economic Competitiveness"
"Investments for your future"
Project co-financed by the European Regional Development Fund
General Word Sense Disambiguation System applied to Romanian and English Languages - SenDiS -
Alin Ştefănescu, Oana Șoica, Andrei Mincă & SenDiS team
June 27, 2013
Word sense disambiguation using lexicon nets
Alin Ştefănescu
Introduction
Page 3
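The ambiguous hen
„Găina cea nouă ne ouă nouă nouă ouă.“ ("The new hen lays us nine new eggs." In Romanian, "nouă" can mean "new", "nine" or "to us", and "ouă" can mean "eggs" or "lays".)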
Page 4
Natural Language Processing (NLP)
NLP develops systems that allow computers to communicate with people using everyday language.
An important area: natural language understanding
Subproblem: word sense disambiguation
Page 5
NLP @ SOFTWIN Research
NLP is an active research area at Softwin Research
biometrics is the other active area
previously, antivirus research in the same R&D department led to the creation of an award-winning, internationally certified internet security and antivirus software
Page 6
NLP @ Softwin Research – SenDiS project
SenDiS project at Softwin Research
„A general Word Sense Disambiguation System applied to Romanian and English languages“
2010-2013
co-financed through Sectoral Operational Programme “Increase of Economic Competitiveness” (POS-CCE)
team of 7-10 computer scientists and linguists
method: use of structured linguistic knowledge encoded with
SenDiS builds upon and further develops the NLP system GRAALAN at Softwin Research
Page 8
Word Sense Disambiguation (WSD)
identify the meaning of words in context in a computational manner
very difficult problem
three main approaches:
supervised disambiguation
unsupervised disambiguation
knowledge-based disambiguation
“Tower of Babel” by Brueghel
Page 9
Dealing with ambiguity
GRAALAN knowledge bases can encode several types of ambiguities:
multiword expression (MWE) ambiguity
morphologic ambiguity (synthetic & analytic)
lexical ambiguity (synthetic & analytic)
morphemic ambiguity
syntactic ambiguity
Page 10
Lesk Algorithm - basic idea
a simple and intuitive knowledge-based WSD approach
computes the word overlap between the sense definitions of the target words in a context
For a two-word context (w1, w2), with S1 in Senses(w1) and S2 in Senses(w2):
score_Lesk(S1, S2) = |gloss(S1) ∩ gloss(S2)|
another variant, less computationally intensive, computes the word overlap between a word's sense definition and the other context words
For a sense S of a target word w:
score_LeskVar(S) = |context(w) ∩ gloss(S)|
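The two overlap scores on this slide can be illustrated with a minimal Python sketch. The tokenisation (lowercased word tokens, no stop-word filtering or lemmatisation), the function names and the toy English glosses are assumptions made for illustration only; they are not taken from the SenDiS system or its lexicon.

```python
import re

def tokens(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def score_lesk(gloss1, gloss2):
    """Number of distinct words shared by two sense definitions."""
    return len(tokens(gloss1) & tokens(gloss2))

def score_lesk_var(context, gloss):
    """Number of distinct words shared by the context and one sense definition."""
    return len(tokens(context) & tokens(gloss))

# Hypothetical toy glosses for the word "bank" (not from the SenDiS lexicon).
bank_senses = {
    "finance": "an institution that accepts deposits of money",
    "river": "the sloping land beside a body of water",
}
context = "he deposits money at his bank"

# Pick the sense whose definition overlaps most with the context.
best = max(bank_senses, key=lambda s: score_lesk_var(context, bank_senses[s]))
print(best)
```

In practice a plain word overlap like this is dominated by function words, so a real system would likely filter stop words and match lemmas rather than surface forms.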
Page 11
Our approach: Lesk Algorithm extended
[Figure: the words W1 ... Wn of the text each link to lexicon senses S1 ... Sk; every sense definition is itself a sequence of words W1 ... Wm, whose words link on to further senses, and so on. Legend: sense definition; annotated/WSD-selected definition; link to a lexicon entry/sense; link to an annotated lexicon entry/sense; link to a non-annotated lexicon entry/sense.]
Our approach: Lesk algorithm reasoning extended. Every annotated sense is extended with its definition, whose words also have disambiguated senses, and so on.
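One way to read the extension described above is as a recursive expansion of a sense definition through its annotated words. The sketch below is only an interpretation under stated assumptions: the data layout (each definition as a list of `(word, linked_sense)` pairs, with `None` for non-annotated words), the depth limit, the function names and the toy English lexicon are all hypothetical, and the actual SenDiS scoring may differ.

```python
# Hypothetical toy lexicon: lemma -> sense id -> definition.
# Each definition word is a pair (word, linked_sense); linked_sense is the
# annotated sense id of that word, or None if the word is not annotated.
LEXICON = {
    "radio": {0: [("device", 0), ("for", None), ("reception", 0)]},
    "device": {0: [("system", None), ("of", None), ("parts", None)]},
    "reception": {0: [("receiving", None), ("of", None), ("waves", None)]},
}

def extended_gloss(lexicon, lemma, sense, depth):
    """Collect the words of a definition, following annotated links
    to other definitions up to `depth` levels deep."""
    words = set()
    for word, linked_sense in lexicon[lemma][sense]:
        words.add(word)
        if depth > 0 and linked_sense is not None and word in lexicon:
            words |= extended_gloss(lexicon, word, linked_sense, depth - 1)
    return words

def score_extended(context_words, lexicon, lemma, sense, depth):
    """Overlap between the context words and the extended definition."""
    return len(set(context_words) & extended_gloss(lexicon, lemma, sense, depth))

context = {"antenna", "waves", "parts"}
print(score_extended(context, LEXICON, "radio", 0, depth=0))  # plain definition only
print(score_extended(context, LEXICON, "radio", 0, depth=1))  # one level of links followed
```

The depth limit keeps the expansion finite even when definitions refer to each other in a cycle.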
Page 12
Lesk Algorithm extended - example
Generic example (Principle):
<lemma>…= Sense 1 : <word> <word> <word> <word>
Sense 2 : <word> <word> <word> <word>
Sense 3 : <word> <word> <word> <word>
<lemma>…= Sense 1 : <word> <word> <word> <word>
Sense 2 : <word> <word> <word> <word>
Sense 3 : <word> <word> <word> <word>
<lemma>…= Sense 1 : <word> <word> <word> <word>
Sense 2 : <word> <word> <word> <word>
Sense 3 : <word> <word> <word> <word>
Page 13
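Romanian example (English translations in square brackets):
"radio" =
"0" : "Aparat de receptie radiofonica; radioreceptor." [Device for radio reception; radio receiver.]
"1" : "Instalatie de transmitere a sunetelor prin unde electromagnetice, cuprinzând aparatele de emisiune şi pe cele de receptie." [Installation for transmitting sounds by electromagnetic waves, comprising both the transmitting and the receiving devices.]
"aparat" =
"0" : "Sistem de piese care serveste pentru o operatie mecanica, tehnica, stiintifica etc." [System of parts used for a mechanical, technical, scientific, etc. operation.]
"1" : "Sistem tehnic care transforma o forma de energie în alta." [Technical system that transforms one form of energy into another.]
"2" : "Ansamblu de organe anatomice care servesc la îndeplinirea unei functiuni fundamentale." [Set of anatomical organs that serve to carry out a fundamental function.]
"3" : "Totalitatea serviciilor sau a personalului care asigura bunul mers al unei institutii sau al unui domeniu de activitate." [The totality of services or personnel that ensure the smooth running of an institution or a field of activity.]
"4" : "Ansamblul mijloacelor care servesc pentru un anumit scop." [The set of means that serve a particular purpose.]
"receptie" =
"0" : "Operatie de luare în primire a unui material sau a unei lucrari, pe baza verificarii lor cantitative şi calitative." [Operation of taking delivery of a material or a piece of work, based on quantitative and qualitative verification.]
"1" : "Serviciu într-o întreprindere hoteliera care are evidenta persoanelor aflate în hotel, face repartizarea în camere a solicitatorilor etc." [Service in a hotel business that keeps records of the persons staying in the hotel, assigns rooms to applicants, etc.]
"2" : "(Tehn) Primire a unei anumite forme de energie pentru a o transforma în alta forma de energie." [(Tech.) Receiving a certain form of energy in order to transform it into another form of energy.]
"3" : "Reuniune, banchet cu caracter festiv (În cercurile oficiale)." [Gathering, banquet of a festive nature (in official circles).]
"4" : "Primire, întâmpinare (cu caracter ceremonios) a unui oaspete." [Reception, welcome (of a ceremonious nature) of a guest.]
"radiofonic" =
"0" : "Care aparţine radiofoniei, privitor la radiofonie, care utilizeaza radiofonia." [That belongs to radiophony, relating to radiophony, that uses radiophony.]
"radioreceptor" =
"0" : "Aparat folosit pentru receptionarea undelor radiofonice (prin antene), pentru transformarea lor în semnale sonore şi transmiterea lor prin intermediul difuzoarelor; radio." [Device used for receiving radio waves (through antennas), for transforming them into sound signals and transmitting them through loudspeakers; radio.]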
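The words in the first definition of "radio" are themselves entries in the example lexicon above. The short sketch below shows one way to find those links automatically. The inflection table `LEMMAS` is a hypothetical stand-in for a real lemmatiser, and the function name is an assumption; how SenDiS actually resolves word forms to lexicon entries is not described on the slide.

```python
import re

# Lemmas that have an entry in the example lexicon on this slide.
LEXICON_LEMMAS = {"radio", "aparat", "receptie", "radiofonic", "radioreceptor"}

# Hypothetical stand-in for a lemmatiser: inflected form -> lemma.
LEMMAS = {"radiofonica": "radiofonic"}

def linked_entries(gloss):
    """Return the lexicon lemmas occurring in a definition, in order of appearance."""
    found = []
    for token in re.findall(r"\w+", gloss.lower()):
        lemma = LEMMAS.get(token, token)
        if lemma in LEXICON_LEMMAS and lemma not in found:
            found.append(lemma)
    return found

# Definition "0" of "radio", copied from the slide.
radio_0 = "Aparat de receptie radiofonica; radioreceptor."
links = linked_entries(radio_0)
print(links)
```

Each lemma found this way still has several senses (five for "aparat" and "receptie"), so choosing the right one for each link is itself a disambiguation step.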