
Computación y Sistemas, Vol. 15, No. 2, 2011, pp. 209–220, ISSN 1405-5546

Evaluating n-gram Models for a Bilingual Word Sense Disambiguation Task

David Pinto, Darnes Vilariño, Carlos Balderas, Mireya Tovar, and Beatriz Beltrán

Facultad de Ciencias de la Computación, Benemérita Universidad Autónoma de Puebla, Puebla, Mexico

    {dpinto,darnes,mtovar,bbeltran}@cs.buap.mx

Abstract. The problem of Word Sense Disambiguation (WSD) is that of selecting the correct sense of an ambiguous word in a given context. Difficult as WSD already is, its bilingual version is much more complex: it is necessary not only to find the correct translation, but that translation must take into account the contextual senses of the original sentence (in the source language) in order to find the correct sense (in the target language) of the source word. In this paper we present a probabilistic model for bilingual WSD based on n-grams (2-grams, 3-grams, 5-grams, and k-grams, for a sentence S of length k). The aim is to analyze the behavior of the system with different representations of a given sentence containing an ambiguous word. We use a Naïve Bayes classifier to determine the probability of the target sense (in the target language) given a sentence which contains an ambiguous word (in the source language). For this purpose, we use a bilingual statistical dictionary, which is calculated with Giza++ using the EUROPARL parallel corpus. On average, the representation model based on 5-grams with mutual information demonstrated the best performance.

Keywords. Bilingual word sense disambiguation, machine translation, parallel corpus, Naïve Bayes classifier.


    1 Introduction

Word Sense Disambiguation (WSD) is a task that has been studied for a long time. The aim of WSD is to select the correct sense of a given ambiguous word in some context. The fact that automatic WSD continues to be an open problem has attracted great interest in the computational linguistics community; therefore, many approaches have been introduced in recent years [1]. Competitions such as Senseval and, more recently, SemEval¹ have also motivated the development of new systems for WSD, providing an interesting environment for testing those systems.

¹ http://www.senseval.org/; http://nlp.cs.swarthmore.edu/semeval/


Although the WSD task has been studied for a long time, the general expectation is that WSD should be integrated into real applications such as monolingual and multilingual search engines, machine translation systems, automatic question answering systems, etc. [1]. Different studies on this issue have demonstrated that such applications benefit from WSD [4], [3].

Monolingual word sense disambiguation is known to be a difficult task; however, when we consider its bilingual version, the problem becomes much more complex. In this case, it is necessary not only to find the correct translation, but that translation must take into account the contextual senses of the original sentence (in the source language) in order to find the correct sense (in the target language) of the source word.

For the experiments reported in this paper, we considered English as the source language and Spanish as the target language. Thus, we attempted the bilingual version of WSD. We do not use an inventory of senses, as most WSD systems do. Instead, we try to find those senses automatically by means of a bilingual statistical dictionary, which is calculated on the basis of the IBM-1 translation model² by using a filtered version of the EUROPARL parallel corpus³. This filtered version is obtained by selecting the sentences of EUROPARL that contain some of the ambiguous words of the test corpus.

The bilingual statistical dictionary is fed into a Naïve Bayes classifier in order to determine the probability of a target sense given a source sentence (which contains the ambiguous word). Note that we do not use a training corpus of disambiguated words. Instead, we construct a classification model based on the probability of translating each ambiguous word (and the words that surround it). We are aware that other classification models exist, such as CRF [8] and SVM [5]. However, since we have chosen a probabilistic model based on independent features (calculated by means of the IBM-1 translation model) in order to find the correct target sense, we believe that the Naïve Bayes classifier fits this kind of approach perfectly.

² We used Giza++ (http://fjoch.com/GIZA++.html)
³ http://www.statmt.org/europarl/

The main aim of this research is to evaluate to what extent each word in a neighborhood of the ambiguous word contributes to improving the process of bilingual WSD. We hypothesize the following: the closer a term is to the ambiguous word, the more it helps to find the correct target sense. A natural document representation is the use of n-grams. In our work, we decided to evaluate six different approaches based on n-grams, whose performance is shown later in this paper. A brief explanation of the general approach is as follows. Given a sentence S, we consider its representation as one |S|-gram. The first approach weights each sentence term by its distance to the ambiguous word (the weighted version). The second approach also uses one |S|-gram but disregards the distance of each term to the ambiguous word (the unweighted version). The third approach uses bigrams, i.e., sequences of two terms containing the ambiguous word (a window of size 1 around the ambiguous word). The fourth approach uses 3-grams. The fifth and sixth approaches both use 5-grams; the former filters 5-grams using pointwise mutual information between each pair of terms of a 5-gram, while the latter uses Student's t-distribution to determine those bigrams that are likely to be collocations, i.e., that do not co-occur by chance. For each proposed approach, we obtain a candidate set of translations for the source ambiguous word by applying the probabilistic model on the basis of the n-grams selected.

The rest of this paper is structured as follows. Section 2 presents some efforts reported in the literature that we consider related to the present research. Section 3 introduces the problem of bilingual word sense disambiguation. In Section 4 we define the probabilistic model used as a classifier for the bilingual WSD task. The experimental results obtained on the two datasets are shown in Section 5. Finally, the conclusions are given in Section 6.


    2 Related Work

The selection of the appropriate sense for a given ambiguous word is commonly carried out by considering the words surrounding the ambiguous word. A comprehensive survey of several approaches may be found in [1]. As may be seen there, a lot of work has been done on finding the best supervised learning approach for WSD (for instance, see [6], [9], [11], [14]); despite the wide range of learning algorithms, it has been noted that some classifiers, such as Naïve Bayes, are very competitive, and that their performance relies basically on the representation scheme and the feature selection process.

There are other works described in the literature in which parallel corpora (bilingual or multilingual) were used for dealing with the problem of WSD (for instance, see [12], [4]). Such approaches are expected to find the best sense in the same language (despite using other languages for training the learning model); in our research, however, we are interested in finding the best translated word, i.e., the word with the correct sense in a different language.

3 Bilingual Word Sense Disambiguation

Word sense disambiguation is an important task in multilingual scenarios due to the fact that the meanings represented by an ambiguous word in the source language may be represented by multiple words in the target language. Consider the word bank, which may have up to 42 different meanings⁴. Suppose we select one of these meanings, namely, to put into a bank account (to bank). The corresponding meaning in other languages would be to make a deposit. In Spanish, for instance, you would never say She banks her paycheck every month (*Ella bankea su cheque cada mes), but She deposits her paycheck every month (Ella deposita su cheque cada mes). Therefore, the ability to disambiguate a polysemous word across languages is essential to the task of machine translation and to such a related Natural Language Processing (NLP) task as bilingual lexical substitution [13].

⁴ http://ardictionary.com/Bank/742

In the task of bilingual word sense disambiguation, we are required to obtain those translations of a given ambiguous word which match the original word sense. As an example, let us consider the following sentence containing one polysemous word to be disambiguated. This sentence serves as the input, and the expected results are the following:

Input sentence: equivalent to giving fish to people living on the bank of the river ... [English]

Output sense labels:
Sense Label = {oever/dijk} [Dutch]
Sense Label = {rives/rivage/bord/bords} [French]
Sense Label = {Ufer} [German]
Sense Label = {riva} [Italian]
Sense Label = {orilla} [Spanish]

The bilingual WSD system is able to find the corresponding translation of bank in the target language with the same sense. In order to deal with this problem, we propose to use a probabilistic model based on n-grams. This proposal is discussed in the following section.

4 A Naïve Bayes Approach to Bilingual WSD

We approached the bilingual word sense disambiguation task by means of a probabilistic system based on Naïve Bayes, which considers the probability of a word sense (in the target language) given a sentence (in the source language) containing the ambiguous word. We calculated the probability of each word in the source language being associated with/translated to the corresponding word in the target language. The probabilities were estimated by means of a bilingual statistical dictionary calculated using the Giza++ system over the EUROPARL parallel corpus. We filtered this corpus by selecting only the sentences containing candidate senses of the ambiguous word (which were obtained by translating the ambiguous word with the Google search engine).
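For concreteness, the dictionary p(w|t) can be held as a nested lookup table mapping each target word t to a distribution over source words. The minimal Python sketch below assumes the Giza++ output has been post-processed into plain whitespace-separated "source target probability" triples, one per line; this three-column layout and the function name are illustrative assumptions of ours, not Giza++'s native output format:

    from collections import defaultdict

    def load_bilingual_dictionary(path):
        # Build p(w | t): for each target-language word t, a dict mapping
        # source-language words w to their translation probability.
        p_w_given_t = defaultdict(dict)
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                fields = line.split()
                if len(fields) != 3:
                    continue  # skip malformed lines
                source, target, prob = fields
                p_w_given_t[target][source] = float(prob)
        return p_w_given_t

    # Usage (hypothetical file name):
    # dictionary = load_bilingual_dictionary("europarl_filtered.lex")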

We will start this section by explaining the manner in which we represent the source sentences (as n-grams) in order to solve the bilingual word sense disambiguation problem. We then discuss some particularities of the general approach to the evaluated task.

    4.1 The n-gram Model

In order to represent an input sentence, we considered a model based on n-grams. Recall that we attempt to evaluate, for each word in a neighborhood of the ambiguous word, its degree of support in the process of bilingual disambiguation. Thus, a natural document representation is the use of n-grams, and, therefore, each sentence is split into grams of n terms. In order to illustrate this process, let us consider the following example with the ambiguous word execution and its pre-processed version, which was obtained by eliminating punctuation symbols and stop words (no other pre-processing step was performed):

Input sentence: Allegations of Iraqi army brutality, including summary executions and the robbing of civilians at gun-point for food, were also reported frequently during February.

Pre-processed input sentence: Allegations Iraqi army brutality including summary executions robbing civilians gun-point food reported frequently during February

In the experiments reported in this paper, we considered six different approaches but only four types of n-grams (bigrams, 3-grams, 5-grams, and the complete sentence, i.e., |S|-grams), which are described below, each with an example.

The representation of sentences by means of bigrams is constructed by selecting sequences of two terms that co-occur sequentially in the sentence, with the constraint that at least one of the terms is the ambiguous word. This constraint yields bigrams of terms in a neighborhood of window size one around the ambiguous word. For the example sentence presented above, we obtain the following representation (two bigrams):

2-grams: {summary, executions}, {executions, robbing}

If we represent the sentence by using 3-grams, we must consider sequences of three terms containing the ambiguous word. For the same example sentence, we get the following set of 3-grams:

3-grams: {including, summary, executions}, {summary, executions, robbing}, {executions, robbing, civilians}

In the case of representing the sentences by 5-grams, we select sequences of five terms containing the ambiguous word, i.e., a window of size two around this word. The set of 5-grams for the same example sentence is:

5-grams: {army, brutality, including, summary, executions}, {brutality, including, summary, executions, robbing}, {including, summary, executions, robbing, civilians}, {summary, executions, robbing, civilians, gun-point}, {executions, robbing, civilians, gun-point, food}

Finally, if we consider the example sentence S as a whole, we must take all the terms inside it. The sentence representation by means of |S|-grams is as follows:

|S|-gram: {Allegations Iraqi army brutality including summary executions robbing civilians gun-point food reported frequently during February}
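Since the four representations differ only in the window placed around the ambiguous word, a single helper covers all of them. The following minimal sketch (the function name is ours) reproduces the examples above:

    def ngrams_around(tokens, ambiguous, n):
        # Every n-gram of consecutive tokens that contains the ambiguous
        # word, i.e., a window of size (n - 1) / 2 around it for odd n.
        return [tuple(tokens[i:i + n])
                for i in range(len(tokens) - n + 1)
                if ambiguous in tokens[i:i + n]]

    tokens = ("Allegations Iraqi army brutality including summary executions "
              "robbing civilians gun-point food reported frequently during "
              "February").split()

    print(ngrams_around(tokens, "executions", 2))           # the two 2-grams
    print(ngrams_around(tokens, "executions", 3))           # the three 3-grams
    print(ngrams_around(tokens, "executions", 5))           # the five 5-grams
    print(ngrams_around(tokens, "executions", len(tokens))) # the single |S|-gram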

We experimented with two different n-gram filtering methods for the particular case of representing the sentences by 5-grams. First, we discarded those bigrams belonging to the 5-grams that do not offer enough evidence of co-occurrence. For this purpose, we use the pointwise mutual information formula presented in Eq. (1):

PMI(t_1, t_2) = \log_2 \frac{N \cdot freq(t_1 t_2)}{freq(t_1) \cdot freq(t_2)} \qquad (1)

where the bigram t_1 t_2 is the sequence of the two terms t_1 and t_2 occurring in the 5-gram, freq(t_1 t_2) is the frequency of that bigram in the complete corpus,


freq(t_1) is the frequency of the term t_1 in the complete corpus, and N is the number of terms in the corpus.
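As an illustration of this filter, the sketch below scores each adjacent bigram of a 5-gram with Eq. (1) and keeps only those above a threshold; the counts are assumed to come from the (filtered) EUROPARL corpus, and the threshold value of 0 is our illustrative choice, not a figure from the original system:

    import math

    def pmi(freq_bigram, freq_t1, freq_t2, n_terms):
        # Eq. (1): log2( N * freq(t1 t2) / (freq(t1) * freq(t2)) )
        return math.log2(n_terms * freq_bigram / (freq_t1 * freq_t2))

    def pmi_filter(gram, unigram_counts, bigram_counts, n_terms, threshold=0.0):
        # Keep the adjacent bigrams of the 5-gram whose PMI exceeds the
        # threshold; the rest are treated as chance co-occurrences.
        kept = []
        for t1, t2 in zip(gram, gram[1:]):
            f12 = bigram_counts.get((t1, t2), 0)
            if f12 and pmi(f12, unigram_counts[t1], unigram_counts[t2],
                           n_terms) > threshold:
                kept.append((t1, t2))
        return kept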

The second method used for filtering the terms in the 5-grams is Student's t-test, applied in order to eliminate those terms that co-occur by chance.

Given two terms t_1 and t_2 contained in one 5-gram, we considered the following hypotheses:

H_0: P(t_1 t_2) = P(t_1) \cdot P(t_2)
H_1: P(t_1 t_2) > P(t_1) \cdot P(t_2)

We assume that each of the terms t_1 and t_2 was generated independently; therefore, the null hypothesis (H_0) states that the bigram t_1 t_2 co-occurs by chance, i.e., that this bigram is not a collocation, whereas the alternative hypothesis (H_1) states that the bigram is in fact a collocation. The t-statistic is calculated as follows:

t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}} \qquad (2)

where

\bar{x} = P(t_1 t_2) = \frac{freq(t_1 t_2)}{N}, \quad \mu = P(t_1) \cdot P(t_2) \qquad (3)

and

s^2 \approx \bar{x}(1 - \bar{x}) \approx \bar{x} \qquad (4)

with N equal to the number of bigrams in the whole corpus. If the t value is greater than 2.576, then we reject the null hypothesis at significance level α = 0.005.
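Under the same corpus counts, the test of Eqs. (2)–(4) can be sketched as follows (the function names are ours); a bigram is accepted as a collocation when its score exceeds the critical value 2.576 given in the text:

    import math

    def t_score(freq_bigram, freq_t1, freq_t2, n_bigrams):
        x_bar = freq_bigram / n_bigrams                  # Eq. (3)
        mu = (freq_t1 / n_bigrams) * (freq_t2 / n_bigrams)
        s2 = x_bar                                       # Eq. (4): s^2 ~ x_bar
        return (x_bar - mu) / math.sqrt(s2 / n_bigrams)  # Eq. (2)

    def is_collocation(freq_bigram, freq_t1, freq_t2, n_bigrams):
        # Reject H0 (chance co-occurrence) at significance level 0.005.
        return t_score(freq_bigram, freq_t1, freq_t2, n_bigrams) > 2.576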

The representation of sentences by means of n-grams aims at studying the impact of the terms that co-occur with the ambiguous word. We believe that the importance of such terms should be emphasized and that they should be exploited in the process of bilingual word sense disambiguation. For each proposed n-gram sentence representation, we obtain a candidate set of translations for the source ambiguous word by applying a probabilistic model on the basis of the n-grams selected. Further details are explained in the next section.

    4.2 The Probabilistic Model

Given an English sentence S_E, we consider its representation based on n-grams as discussed in the previous section. Let S = {w_1, w_2, …, w_k, …, w_{|S|}} be the n-gram representation of S_E, constructed by bringing together all the n-grams (w_k is the ambiguous word). Let us consider N candidate translations of w_k, {t_1, t_2, …, t_N}, obtained somehow (we will discuss this issue later in this section). We are interested in finding the most likely candidate translations of the polysemous word w_k. Therefore, we may use a Naïve Bayes classifier which takes into account the probability of t_i given the sentence containing w_k. A formal description of the classifier is given as follows.

p(t_i \mid S) = \frac{p(t_i) \, p(S \mid t_i)}{p(S)} \qquad (5)

p(t_i \mid S) \propto p(t_i) \, p(S \mid t_i) \qquad (6)

We are looking for the argument that maximizes p(t_i \mid S); therefore, the calculation of the denominator may be avoided. Moreover, if we assume that all the different translations are equally distributed, then Eq. (6) may be approximated by Eq. (7).

p(t_i \mid S) \propto p(S \mid t_i) \qquad (7)

The complete calculation of Eq. (7) requires applying the chain rule. However, if we assume that the words of the sentence are independent given the translation, then Eq. (7) may be rewritten as Eq. (8).

p(S \mid t_i) = \prod_{j=1}^{|S|} p(w_j \mid t_i) \qquad (8)

The best translation is obtained as shown in Eq. (9). Irrespective of the position of the ambiguous word, we consider only the product of the translation probabilities. Algorithm 1 provides implementation details.

\hat{t} = \arg\max_{t_i} \prod_{j=1}^{|S|} p(w_j \mid t_i) \qquad (9)

where i = 1, …, N.

An alternative approach (the weighted version) is proposed as well and is shown in Eq. (10). The aim of this approach is to verify whether a better performance in the bilingual word sense disambiguation task can be obtained when the distance of each term to the ambiguous word is taken into account in the probabilistic model. Algorithm 2 provides implementation details.

\hat{t} = \arg\max_{t_i} \prod_{j=1}^{|S|} p(w_j \mid t_i) \, f(j, k) \qquad (10)

with i = 1, …, N, where f(j, k) is a weighting factor that gives less importance to terms w_j far from the ambiguous word w_k, i.e., it decreases with the distance |k − j|.

We have used the Google translator⁵ in order to obtain the N candidate translations {t_1, …, t_N} of the polysemous word w_k. Google provides all the possible translations of w_k together with their grammatical category; therefore, we are able to use the translations that match the grammatical category of the ambiguous word. Although we attempted other approaches, such as selecting the most probable translations from the statistical dictionary, we confirmed that the Google online translator yields the best performance. We consider that this result derives from the fact that Google has a better language model than ours, since our bilingual statistical dictionary was trained only on the EUROPARL parallel corpus.

⁵ http://translate.google.com.mx/

Input: a set Q of sentences, Q = {S_1, S_2, …}; Dictionary = p(w|t): a bilingual statistical dictionary.
Output: the best word/sense for the ambiguous word of each sentence S_l.

for l = 1 to |Q| do
    for i = 1 to N do
        P(l, i) = 1
        foreach w_j in S_l do
            if w_j in Dictionary then
                P(l, i) = P(l, i) · p(w_j | t_i)
            else
                P(l, i) = P(l, i) · ε   (a small smoothing constant)
            end
        end
    end
end
return, for each sentence S_l, the candidate t_i that maximizes P(l, i)

Algorithm 1. A Naïve Bayes approach to bilingual WSD

Input: a set Q of sentences, Q = {S_1, S_2, …}; Dictionary = p(w|t): a bilingual statistical dictionary.
Output: the best word/sense for the ambiguous word of each sentence S_l.

for l = 1 to |Q| do
    for i = 1 to N do
        P(l, i) = 1
        foreach w_j in S_l do
            if w_j in Dictionary then
                P(l, i) = P(l, i) · p(w_j | t_i) · f(j, k)
            else
                P(l, i) = P(l, i) · ε   (a small smoothing constant)
            end
        end
    end
end
return, for each sentence S_l, the candidate t_i that maximizes P(l, i)

where f(j, k) is the distance-based weighting factor of Eq. (10) and w_k is the ambiguous word.

Algorithm 2. A weighted Naïve Bayes approach to cross-lingual WSD
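Both algorithms reduce to a single scoring routine over the n-gram representation. The sketch below assumes the nested dictionary p(w|t) built earlier, a small smoothing constant ε for out-of-dictionary words, and, for the weighted variant, the distance factor 1/(1 + |k − j|); that factor is our illustrative choice, not necessarily the exact one used by the authors:

    def score_translations(sentence, k, candidates, p_w_given_t,
                           epsilon=1e-6, weighted=False):
        # Naive Bayes score for each candidate translation t_i of the
        # ambiguous word sentence[k] (Algorithms 1 and 2). epsilon smooths
        # words absent from the dictionary; the weight 1 / (1 + |k - j|)
        # decreases with the distance to the ambiguous word.
        scores = {}
        for t in candidates:
            p = 1.0
            for j, w in enumerate(sentence):
                prob = p_w_given_t.get(t, {}).get(w, epsilon)
                if weighted:
                    prob *= 1.0 / (1 + abs(k - j))
                p *= prob
            scores[t] = p
        best = max(scores, key=scores.get)
        return best, scores

For instance, with the pre-processed sentence from Section 4.1 and hypothetical candidates such as ['ejecución', 'ejecuciones'], the routine returns the highest-scoring Spanish translation together with the scores of all candidates.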


    Fig. 1. An overview of the presented approach for bilingual word sense disambiguation

Table 1. Test set for the bilingual WSD task

coach      education   execution   figure
job        post        pot         range
rest       ring        mood        soil
strain     match       scene       test
mission    letter      paper       side

To summarize, we have proposed a probabilistic model (see Figure 1) that uses a statistical bilingual dictionary constructed with the IBM-1 translation model on the basis of a filtered parallel corpus extracted from EUROPARL. This corpus includes the Spanish translations of each English ambiguous word. The translation probability of each ambiguous word is then modeled by means of the Naïve Bayes classifier in order to classify the original ambiguous words and find the correct translation in Spanish. As may be seen, we do not use any training corpus of disambiguated words.

    5 Experimental Results

This section presents the results of experiments with different sentence representations based on n-grams for bilingual word sense disambiguation. First, we describe the corpus used in the experiments; then we present the evaluation of the six different n-gram-based approaches.

    5.1 Datasets

In our experiments, 25 polysemous English nouns were used. We selected five nouns (movement, plant, occupation, bank, and passage), each with 20 example instances, to form a development corpus. The remaining twenty polysemous nouns constituted the test corpus, with 50 instances per noun. The list of the ambiguous nouns in the test corpus is given in Table 1. Notice that this corpus is not a sense repository, because the task requires finding the most probable translation (with the correct sense) of a given ambiguous word.

5.2 Evaluation of the n-gram-Based Sentence Representation

Figure 2 shows the results obtained by the different approaches evaluated on the corpus presented in Table 1. The runs are labeled as follows:

2-grams: a representation of the sentence based on bigrams.

3-grams: a representation of the sentence based on trigrams.

tStudent 5-grams: a representation of the sentence based on 5-grams, removing all those bigrams which are not considered to be collocations according to Student's t-test.

PMI 5-grams: a representation of the sentence based on 5-grams, removing all those bigrams which are not considered to be collocations according to pointwise mutual information.

unweighted |S|-gram: a sentence representation based on a unique n-gram of length |S|.

weighted |S|-gram: a sentence representation based on a unique n-gram of length |S|, considering the distance of each sentence term to the ambiguous word.

The bigram model showed the worst performance. We think that using only one term (besides the ambiguous one) in the disambiguation model is responsible for this failure: the information available for disambiguating the polysemous word is simply not sufficient. The model based on the 3-gram representation outperformed the bigram one, but the number of terms around the ambiguous word is still insufficient. With these results in mind, we proposed to use a representation with a greater number of terms (in this case, 5-grams). This representation model was analyzed with the purpose of detecting those bigrams, inside the 5-grams, that are actual collocations and do not co-occur by chance. Therefore, we proposed two different filtering methods: pointwise mutual information and Student's t-test. The former obtained the best results. The reason is that PMI does not need as many occurrences of a bigram as Student's t-test does in order to detect that a given bigram is in fact a collocation.

Finally, when we considered all the terms in the process of disambiguation (|S|-gram), we observed that some terms were positioned too far from the ambiguous word to provide valuable information. Actually, such terms introduce noise, making the performance of the method decrease. With the latter representation, we were interested in finding out whether the closeness of the terms in the sentence to the ambiguous word had a positive impact on the process of disambiguation. Therefore, we proposed a weighted version of the representation model which gives less importance to the terms that are far from the ambiguous word and more importance to the closer ones.

Unfortunately, the formula did not give enough weight to emphasize this characteristic. That is one of the reasons why the 5-gram representation reached a better performance. In other words, the 5-gram representation uses only the necessary terms, all of which are close to the ambiguous word and therefore receive high importance.

In order to assess the performance of the proposed approaches, Table 2 presents a comparison of our runs with other approaches presented at the SemEval-2⁶ competition. The UvT team submitted two runs (UvT-WSD1 and UvT-WSD2) with an oof (out-of-five) evaluation, which outputs the five best translations/senses. This team made use of a k-nearest-neighbors classifier built for each target ambiguous word, and selected translations from a bilingual dictionary obtained by executing the GIZA package on the EUROPARL parallel corpus [10].

The University of Heidelberg submitted two runs (UHD-1 and UHD-2). They approached bilingual word sense disambiguation by finding the most appropriate translation in different languages on the basis of a multilingual co-occurrence graph automatically induced from the target words aligned with the texts found in the EUROPARL and JRC-Acquis parallel corpora [10].

Finally, another team submitted one run (ColEur2) with a supervised approach, using the translations obtained with GIZA from the EUROPARL parallel corpus in order to distinguish between senses in the English source sentences [10]. In general, it may be seen that all the teams used the GIZA software to build a bilingual statistical dictionary; therefore, the main differences among these approaches lie in the way of representing the original ambiguous sentence (including the pre-processing stage) and in the manner of filtering the results obtained by GIZA.

Table 2 is given only as a reference for the behavior of our approaches with respect to those presented in the literature. However, we must emphasize that these results are not directly comparable, because the teams participating at the SemEval-2 competition were allowed to repeat the target translation/sense among the five possible outputs. This type of evaluation leads to higher performance figures (even greater than 100%) compared with the case when repeating translations is not allowed. Despite the unfair comparison, it can be seen that the approach named PMI 5-gram outperforms the best result obtained in the competition.

⁶ http://semeval2.fbk.eu/

In Table 3, we compare our approaches when the repetition of translations is allowed. Again, it may be noticed that some of our approaches perform better than the other systems.

By observing the precision values for the different ambiguous words (see Figure 3), we can gauge the significant improvement that may be reached when representing the sentence with 5-grams. In particular, we show the approach that filters terms using pointwise mutual information, which obtained the best results among all the approaches analyzed. Figure 3 also shows that some words are easier to disambiguate (e.g., soil and education) than others (e.g., match). For research purposes, we also consider it important to focus on those words that are hard to disambiguate.

Table 2. Evaluation of the bilingual WSD (removing repeated translations/senses); five best translations (oof)

System name           Precision (%)   Recall (%)
PMI 5-gram            43.26           43.26
UvT-WSD2              43.12           43.12
UvT-WSD1              42.17           42.17
unweighted |S|-gram   40.82           40.82
weighted |S|-gram     40.76           40.76
UHD-1                 38.78           31.81
UHD-2                 37.74           31.30
3-gram                36.82           36.82
ColEur2               35.84           35.46
tStudent 5-gram       33.52           33.52
2-gram                21.25           21.25

    6 Conclusions

Bilingual word sense disambiguation is the task of obtaining those translations of a given ambiguous word that match the original word sense. In this paper, we presented an evaluation of different n-gram-based representations of sentences containing an ambiguous word. In particular, we used a Naïve Bayes classifier for determining the probability of a target sense (in the target language) given a


sentence which contains the ambiguous word (in the source language). The probabilities were modeled by means of a bilingual statistical dictionary calculated with Giza++ (using the EUROPARL parallel corpus). Six different approaches based on n-grams were evaluated. The 5-gram representation that employed mutual information demonstrated the best performance, slightly outperforming the results reported in the literature for the bilingual word sense disambiguation task at the SemEval-2 international competition.

Fig. 2. A comparison among all the approaches proposed

Table 3. Evaluation of the bilingual WSD (considering repeated translations/senses); five best translations (oof)

System name           Precision (%)   Recall (%)
3-gram                70.36           70.36
PMI 5-gram            54.87           54.87
UvT-WSD2              43.12           43.12
UvT-WSD1              42.17           42.17
unweighted |S|-gram   40.76           40.76
UHD-1                 38.78           31.81
weighted |S|-gram     38.46           38.46
UHD-2                 37.74           31.30
ColEur2               35.84           35.46
tStudent 5-gram       33.52           33.52
2-gram                21.25           21.25


Adding a filtering step by means of pointwise mutual information allowed us to identify the terms which give the best support to the process of bilingual WSD.

We observed that in some cases the use of a reduced window size in the neighborhood of the ambiguous word may exclude some important terms that would help to improve the precision of finding the correct target sense. This leads us to conclude that statistical methods do have some limitations, but they may be enriched by considering the use of linguistic and/or semantic techniques able to capture those terms.

Finally, we consider that the hypothesis of Harris [7], which states that the closer the words are to the polysemous word, the better they serve for disambiguating it, holds for this task, although at the same time it is important to avoid Type I errors (false positives) by using techniques such as pointwise mutual information.

    Fig. 3. An analysis of the evaluation of all the ambiguous words with the PMI 5-grams approach

    References

1. Agirre, E. & Edmonds, P. (2006). Word Sense Disambiguation: Algorithms and Applications. Dordrecht: Springer.

2. Barceló, G. (2010). Desambiguación de los sentidos de las palabras en español usando textos paralelos. Tesis de Doctorado, Instituto Politécnico Nacional, Centro de Investigación en Computación, México, D.F.

3. Carpuat, M. & Wu, D. (2007). Improving statistical machine translation using word sense disambiguation. 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007), Prague, Czech Republic, 61–72.

4. Chan, Y., Ng, H. & Chiang, D. (2007). Word sense disambiguation improves statistical machine translation. 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, 33–40.

5. Cortes, C. & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

6. Florian, R. & Yarowsky, D. (2002). Modeling consensus: Classifier combination for word sense disambiguation. ACL-02 Conference on Empirical Methods in Natural Language Processing, Philadelphia, USA, 10, 25–32.

7. Harris, Z. (1981). Distributional structure. In Henry Hiz (Ed.), Papers on Syntax (3–22). Boston: Kluwer Boston Inc.


8. Lafferty, J.D., McCallum, A. & Pereira, F.C.N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Eighteenth International Conference on Machine Learning (ICML '01), Massachusetts, USA, 282–289.

9. Lee, Y.K. & Ng, H.T. (2002). An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation. ACL-02 Conference on Empirical Methods in Natural Language Processing, Philadelphia, USA, 10, 41–48.

10. Lefever, E. & Hoste, V. (2010). SemEval-2010 Task 3: Cross-lingual word sense disambiguation. NAACL HLT Workshop on Semantic Evaluations: Recent Achievements and Future Directions, Colorado, USA, 82–87.

11. Mihalcea, R.F. & Moldovan, D.I. (2001). Pattern learning and active feature selection for word sense disambiguation. Second International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL-2), Toulouse, France, 127–130.

12. Ng, H.T., Wang, B. & Chan, Y.S. (2003). Exploiting parallel texts for word sense disambiguation: An empirical study. 41st Annual Meeting of the Association for Computational Linguistics (ACL '03), Sapporo, Japan, 455–462.

13. Sinha, R., McCarthy, D. & Mihalcea, R. (2010). SemEval-2010 Task 2: Cross-lingual lexical substitution. NAACL HLT Workshop on Semantic Evaluations: Recent Achievements and Future Directions, Colorado, USA, 76–81.

14. Yarowsky, D., Cucerzan, S., Florian, R., Schafer, C. & Wicentowski, R. (2001). The Johns Hopkins SENSEVAL2 system descriptions. Second International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL-2), Toulouse, France, 163–166.

David Eduardo Pinto Avendaño obtained his PhD in computer science in the area of artificial intelligence and pattern recognition at the Polytechnic University of Valencia, Spain, in 2008. At present he is a full-time professor at the Faculty of Computer Science of the Benemérita Universidad Autónoma de Puebla (BUAP). His areas of interest include clustering, information retrieval, cross-lingual NLP tasks, and computational linguistics in general.

Darnes Vilariño Ayala obtained her PhD in mathematics in the area of optimization at the University of Havana, Cuba, in 1997. At present she is a full-time professor at the Faculty of Computer Science of the BUAP. Her areas of interest include artificial intelligence, business intelligence, and computational linguistics.

Carlos Balderas is currently a master's student at the Faculty of Computer Science of BUAP. His areas of interest include information retrieval and word sense disambiguation.

Mireya Tovar Vidal obtained her master's degree in computer science at Cinvestav-IPN in 2002. She is currently a PhD student at the CENIDET research institute. She is also a full-time professor at the Faculty of Computer Science of BUAP. Her areas of interest include ontologies and computational linguistics.

Beatriz Beltrán Martínez obtained her master's degree in computer science in 1997 at the Faculty of Computer Science of BUAP, where she now holds a position as a full-time professor. Her areas of interest include pattern recognition and computational linguistics.

    Article received on 12/03/2010; accepted 05/02/2011.