A Hybrid Relational Approach for Word Sense Disambiguation in Machine Translation Lucia Specia Mark Stevenson Maria G. V. Nunes.

A Hybrid Relational Approach for Word Sense Disambiguation in Machine Translation

Lucia Specia

Mark Stevenson

Maria G. V. Nunes

WSD in Machine Translation (MT)

Lexical choice in the case of semantic ambiguity.

Examples (English-Portuguese): take =

tomar (carry out),

levar (lead, direct, conduct, guide),

aceitar (accept),

pegar (choose, pick out), etc.

WSD in Machine Translation (cont.)

One of the main challenges in MT.

Conflicting results on the usefulness of WSD for (statistical) MT: (Vickrey et al., 2005); (Carpuat and Wu, 2005).

Particularly for English-Portuguese, studies have shown that the lack of WSD modules is one of the main reasons for the unsatisfactory results of existing MT systems.

We suggested that an effective WSD module, specifically designed for MT, would improve MT performance.

Approaches to WSD

Knowledge-based: linguistic knowledge manually codified or extracted from lexical resources. Accurate, but suffers from the knowledge acquisition bottleneck.

Corpus-based: knowledge automatically acquired from text using machine learning. Wide coverage, but needs a consistent and significant sample corpus.

Hybrid: merges characteristics of the two other approaches. Explores their advantages and minimizes their limitations → wide coverage and accurate results.

Approaches to multilingual WSD

Approaches to WSD as an application-independent task date back to the 1960s.

Most are monolingual, for English disambiguation. However, WSD is application-dependent (Wilks and Stevenson, 1998; Kilgarriff, 1997; Resnik and Yarowsky, 1997).

WSD for MT differs from monolingual WSD (Hutchins and Somers, 1992), particularly with respect to the sense repository (Specia et al., 2006).

Approaches to multilingual WSD (cont.)

Corpus-based and hybrid approaches use propositional formalisms (attribute-value vectors), which have limited expressiveness and suffer from data sparseness:

Ex1) John gave Mary a big cake.

Ex2) Give me something.

Attributes: verb1-subj1 verb1-obj1 mod1-obj1 verb1-obj2 …

Ex1: give-john give-cake big-cake give-mary …

Ex2: … give-something give-me …

Consequences: impractical to represent substantial knowledge and use it during the learning process.

Hybrid approaches use knowledge in pre-processing steps, before applying machine learning algorithms.

Proposal – a novel approach

LeAR (Lexical Ambiguity Resolution):

Specific for MT: senses, knowledge, techniques.

Hybrid (corpus- and knowledge-based): several knowledge sources (KSs) automatically acquired from corpus and lexical resources; evidence provided by examples of disambiguation extracted from automatically created sense-tagged corpora.

Relational formalism: highly expressive, avoiding data sparseness: each example is represented independently.

Inductive Logic Programming (ILP): relational symbolic supervised learning approach.

Inductive Logic Programming

Allows the efficient representation of substantial knowledge about the problem, and allows this knowledge to be used during the learning process (Muggleton, 1991).

[Figure: ILP at the intersection of Machine Learning and Logic Programming. Aleph takes examples and background knowledge (1st-order clauses) and produces a theory (1st-order clauses).]

Inductive Logic Programming (cont.)

Given: a set of positive and negative examples $E = E^{+} \cup E^{-}$; a predicate p specifying the target relation to be learned; background knowledge K of a certain domain, which specifies which predicates $q_i$ can be part of the definition of p.

The goal is: to induce a hypothesis (or theory) h for p, with respect to E and K, which covers most of the $E^{+}$ without covering the $E^{-}$.

Additionally: clauses representing K, E, and h must satisfy a set of syntactic restrictions S (language bias).

h can be used to classify new cases of disambiguation.


Aleph (Srinivasan, 2000): Provides a complete relational learning inference engine.

Provides various customization options: Induction methods; Search strategies; Evaluation functions; etc.

We are using: bottom-up search (generalisation); non-incremental learning (batch learning); non-interactive learning (without user intervention); learning based on positive examples only.


The default inference engine induces a theory iteratively by means of the following steps:

1. One example is selected to be generalized.

Ex.: sense(sent1,voltar).

2. The most specific clause (bottom clause) that explains the selected example is built. It usually consists of the representation of all knowledge about that example.

3. A clause that is more generic than the bottom clause is searched for, by means of different search, evaluation, and generalization strategies.

4. The best clause found is added to the theory and the examples covered by that clause are removed from the example set. If there are more examples, return to step 1.
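The loop above can be sketched in Python as a simplified, propositional stand-in. This is not Aleph's implementation: examples are reduced to sets of feature strings, the "bottom clause" is just the feature set of the selected example, candidate clauses are subsets of it, and the score (positives covered minus negatives covered) is an assumption chosen for brevity. The deck's own setting learns from positive examples only with a Bayesian score. All feature names are hypothetical.

```python
from itertools import combinations


def covers(body, features):
    """A clause body covers an example if all its literals hold for it."""
    return body <= features


def induce(positives, negatives, max_len=2):
    """Greedy covering loop: select, build bottom clause, generalize, remove."""
    remaining = list(positives)
    theory = []
    while remaining:
        # Steps 1-2: select an example; its features act as the bottom clause.
        bottom = sorted(remaining[0])
        best, best_score = frozenset(), None
        # Step 3: search generalizations, shorter clauses first.
        for size in range(1, max_len + 1):
            for literals in combinations(bottom, size):
                body = frozenset(literals)
                pos = sum(covers(body, e) for e in remaining)
                neg = sum(covers(body, e) for e in negatives)
                if best_score is None or pos - neg > best_score:
                    best, best_score = body, pos - neg
        # Step 4: keep the best clause and drop the examples it covers.
        theory.append(best)
        remaining = [e for e in remaining if not covers(best, e)]
    return theory


positives = [
    frozenset({"prep_right_out", "subj_human"}),
    frozenset({"prep_right_out"}),
    frozenset({"word_right_back", "subj_human"}),
]
negatives = [frozenset({"subj_human", "word_right_from"})]
theory = induce(positives, negatives)
```

Because every candidate is a subset of the selected example's features, the chosen clause always covers that example, so the loop terminates.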

[Figure: LeAR architecture. Seven knowledge sources, the examples, and the mode, type and general settings feed the ILP inference engine, which produces a rule-based model. KS1: bag-of-words (10). KS2: POS of the narrow context (10), from a POS tagger. KS3: subject-object syntactic relations, from a parser. KS4: 11 collocations. KS5: verb selectional restrictions and noun semantic features (LDOCE, WordNet hierarchical relations, feature types hierarchy). KS6: phrasal verbs and idioms (bilingual MRDs). KS7: definition overlap counts (verb definitions and examples from LDOCE + Password, bag-of-words (200)). Each KS comes with rules for its use by the engine.]

Scope

Experiments with:

English-Portuguese MT. No studies have examined English-Portuguese.

10 highly frequent and ambiguous verbs. Relevant and difficult cases for English-Portuguese MT (Specia, 2005a).

Knowledge from syntactic, semantic and pragmatic sources. Working on knowledge which is specific for translation.

Although especially designed for MT of verbs, the approach can be adapted for WSD of any words and languages.

Sample data

Corpus: fiction books, automatically tagged with the verb translation and manually reviewed (Specia et al., 2005a).

Verb      # Translations   # Examples   Accuracy of most frequent translation
ask       5                209          0.77
come      11               183          0.5
get       17               157          0.21
give      5                180          0.89
go        11               197          0.69
live      8                209          0.69
look      7                191          0.5
make      11               170          0.7
take      13               142          0.29
tell      17               209          0.35
Average   10.5                          0.56
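The last column is the baseline obtained by always choosing the most frequent translation of a verb. A minimal Python sketch of that computation follows; the function name and the toy label list are hypothetical, not the corpus data.

```python
from collections import Counter


def most_frequent_accuracy(translations):
    """Accuracy of always predicting the most frequent translation."""
    counts = Counter(translations)
    return counts.most_common(1)[0][1] / len(translations)


# Toy labels for ten occurrences of one verb (illustrative only).
toy_labels = ["voltar"] * 5 + ["vir"] * 3 + ["chegar"] * 2
baseline = most_frequent_accuracy(toy_labels)
```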

Knowledge sources

Example: sent1, verb “to come”: “If there is such a thing as reincarnation, I would not mind coming back as a squirrel”.

KS1: Bag-of-words – ±5 words (lemmas) surrounding the verb for every sentence (sent_id)

bag(sent_id, list_of_words).

Ex.: bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel]).

KS2: Part-of-speech (POS) tags of content words in a ±5 word window surrounding the verb

has_pos(sent_id, word_position, pos).

Ex.: has_pos(sent1, word_left_1, nn).

has_pos(sent1, word_left_2, vbp). …
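A KS1 fact like the one above could be produced as in the following sketch. It assumes the sentence is already tokenized and lemmatized with punctuation removed; the nearest-first ordering of the left context is inferred from the slide's example rather than stated in the source. Function names are hypothetical.

```python
def bag_of_words(lemmas, verb_index, window=5):
    """Return up to `window` lemmas on each side of the verb.

    Left context is listed nearest-first, matching the slide's example.
    """
    left = lemmas[max(0, verb_index - window):verb_index][::-1]
    right = lemmas[verb_index + 1:verb_index + 1 + window]
    return left + right


def to_fact(sent_id, words):
    """Render the bag as a Prolog-style fact string."""
    return f"bag({sent_id}, [{','.join(words)}])."


# Lemmatized example sentence (lemmatization assumed done elsewhere).
lemmas = ["if", "there", "be", "such", "a", "thing", "as", "reincarnation",
          "i", "will", "not", "mind", "come", "back", "as", "a", "squirrel"]
words = bag_of_words(lemmas, lemmas.index("come"))
fact = to_fact("sent1", words)
```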

Knowledge sources

KS3: Subject and object syntactic relations with respect to the verb

has_rel(sent_id, subject_word, object_word).

Ex.: has_rel(sent1, i, nil).

KS4: Context words represented by 11 collocations with respect to the verb: 1st preposition to the right, 1st and 2nd words to the left and right, 1st noun, 1st adjective, and 1st verb to the left and right

has_collocation(sent_id, collocation_type, collocation).

Ex.: has_collocation(sent1, word_right_1, back). has_collocation(sent1, word_left_1, mind).…

Knowledge sources

KS5: Selectional restrictions of verbs and semantic features of their arguments from LDOCE

rest(verb, subj_restriction, obj_restriction, translation).

Ex.: rest(come, [], nil, voltar). rest(come, [animal,human], nil, vir).

rest(come, [], nil, aparecer). ...

feature(noun, sense_id, features).

Ex.: feature(reincarnation, 0_1, [abstract]). feature(reincarnation, 0_2, [animate]).

feature(squirrel, 0_0, [animal]). …

Knowledge sources

KS5 (cont.): Hierarchy for LDOCE feature types (Bruce and Guthrie, 1992)

relation(feature1, feature2).

Ex.: sub(human, animate). …

Ontological relations from WordNet

relation(word1, sense_id1, word2, sense_id2).

Ex.: hyper(reincarnation, 1, avatar, 1). hyper(reincarnation, 3, religious_doctrine, 2). synon(rebirth, 2, reincarnation, -1). …

Knowledge sources

KS6: Idioms and phrasal verbs

exp(verbal_expression, translation)

Ex.: exp('come about', acontecer). exp('come about', chegar). exp('come to fruition', amadurecer). …

KS7: A count of the overlapping words in dictionary definitions for the possible translations of the verb and the words surrounding it in the sentence

highest_overlap(sent_id, translation, overlapping).

Ex.: highest_overlap(sent1, voltar, 0.222222). highest_overlap(sent1, chegar, 0.0857143).

…
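The source does not say how the overlap values are normalized. The sketch below assumes one plausible choice, the fraction of distinct definition words that also occur in the sentence context, and uses invented toy definitions. It illustrates the idea of KS7 and does not reproduce the values shown above.

```python
def overlap_score(definition_words, context_words):
    """Fraction of distinct definition words found in the context (assumed)."""
    definition = set(definition_words)
    if not definition:
        return 0.0
    return len(definition & set(context_words)) / len(definition)


def highest_overlap(definitions, context_words):
    """Return the translation whose definition overlaps most, with its score."""
    scores = {t: overlap_score(w, context_words) for t, w in definitions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy dictionary definitions (hypothetical, not taken from LDOCE or Password).
definitions = {
    "voltar": ["return", "back", "again", "place"],
    "chegar": ["arrive", "reach", "place"],
}
context = ["mind", "not", "will", "i", "reincarnation", "back", "as", "a", "squirrel"]
best_translation, best_score = highest_overlap(definitions, context)
```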

Additional predicates

Examples:

sense(sent_id, translation).

Ex.: sense(sent1, voltar). sense(sent2, ir). …

Mode definitions:

Ex.: :- modeh(1, sense(sent, translation)).

:- modeb(11, has_collocation(sent, colloc_id, colloc)). :- modeb(10, has_bag(sent, word)). …

Auxiliary predicates:

Ex.: has_bag(Sent, Word) :- bag(Sent, List), member(Word, List). …

bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel]).

Example of rules produced

Verb “to come”:

1. sense(A, sair) :- has_collocation(A, preposition_right_1, out).

2. sense(A, chegar) :- satisfy_restrictions(A, [animal,human],[concrete]), has_expression(A, 'come at').

3. sense(A, vir) :- satisfy_restriction(A, [human],[abstract]); has_collocation(A, word_right_1, from); (has_rel(A, subj, B), (has_pos(B,nn);has_pos(B,pron))).

4. sense(A, passar) :- (has_bag(A, to), has_bag(A, propernoun)); highest_overlapping(A,passar).

In order to classify new cases, rules must be applied in the order they are produced.
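Applying rules in production order amounts to a decision list: the first rule whose body holds decides the translation. The Python sketch below shows this with simplified versions of rules 1, 3 and 4 (only one disjunct of rules 3 and 4 is kept). The example dictionaries and the `classify` helper are hypothetical.

```python
def classify(example, rules, default=None):
    """Return the translation of the first rule whose condition holds."""
    for translation, condition in rules:
        if condition(example):
            return translation
    return default


# Simplified stand-ins for the induced rules, kept in production order.
rules = [
    ("sair", lambda ex: ex["collocations"].get("preposition_right_1") == "out"),
    ("vir", lambda ex: ex["collocations"].get("word_right_1") == "from"),
    ("passar", lambda ex: "to" in ex["bag"] and "propernoun" in ex["bag"]),
]

# An example matching both the first and the second rule.
ambiguous = {
    "collocations": {"preposition_right_1": "out", "word_right_1": "from"},
    "bag": ["out", "from"],
}
unmatched = {"collocations": {}, "bag": ["back"]}

first_match = classify(ambiguous, rules)
reversed_match = classify(ambiguous, list(reversed(rules)))
no_match = classify(unmatched, rules)
```

The same example receives a different translation when the rule order is reversed, which is why the production order has to be preserved.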

Evaluation

Induction methods:

induce: builds one clause at a time, removing the examples covered by that clause. The theory produced depends on the order of the examples.

induce_max: builds one clause at a time, without removing the examples covered by the clause. Builds a bottom clause for all the examples, and not only for the first one.

Search strategies:

bf: enumerates shorter clauses before longer ones.

df: enumerates longer clauses before shorter ones.

heuristic: enumerates clauses in a best-first manner.

Evaluation

Generalization strategy: relative least general generalisation (rlgg): the lgg of two clauses c1 and c2, which is the least upper bound of c1 and c2 in the lattice induced by θ-subsumption, relative to the background knowledge.

Evaluation function: Only positive examples (Bayesian score).

Knowledge sources: All 7; All – 1 = 6 each time; 1 each time.
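The three knowledge-source settings give 15 experimental configurations per verb (1 + 7 + 7). A short Python sketch enumerating them, with hypothetical names:

```python
KNOWLEDGE_SOURCES = ["KS1", "KS2", "KS3", "KS4", "KS5", "KS6", "KS7"]


def ks_configurations(sources):
    """All KSs, then all-but-one for each KS, then each KS alone."""
    all_ks = [tuple(sources)]
    leave_one_out = [tuple(s for s in sources if s != dropped)
                     for dropped in sources]
    single = [(s,) for s in sources]
    return all_ks + leave_one_out + single


configs = ks_configurations(KNOWLEDGE_SOURCES)
```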

Comparison: propositional approaches

Algorithm: C4.5, Naive Bayes, Memory-based, SVM.

Knowledge sources:

Narrow context: 5 surrounding words and/or POS tags;

Broad context: 1-100 surrounding words and/or POS tags;

11 collocations: 1st preposition to the right, 1st and 2nd words to the left and right, 1st noun, 1st adjective, and 1st verb to the left and right;

Subject-object syntactic relations.

Best combination of these features, along with use of filters and optimization of parameters.

Best algorithm: SVM. See Specia et al. (2005b).

Results

                                Accuracy
Verb      # Rules in Aleph   Aleph*   Most frequent sense   C4.5   SVM
ask       5                  0.94     0.77                  0.76   0.93
come      11                 0.91     0.5                   0.68   0.8
get       17                 0.67     0.21                  0.53   0.66
give      6                  0.96     0.89                  0.9    0.96
go        12                 0.88     0.69                  0.81   0.85
live      8                  0.88     0.69                  0.61   0.81
look      6                  0.83     0.5                   0.83   0.87
make      11                 0.83     0.7                   0.77   0.87
take      11                 0.81     0.29                  0.45   0.6
tell      8                  0.94     0.35                  0.66   0.7
Average                      0.87     0.56                  0.7    0.8

* 10-fold cross-validation, best experimental setting: induce_max, all KSs, heuristic search, but without filters or other optimizations.
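For readers unfamiliar with the protocol, 10-fold cross-validation can be sketched as follows. The source does not describe how the folds were built, so the interleaved split without shuffling or stratification is an assumption, and the sentence identifiers are placeholders.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) pairs; fold i holds every k-th item starting at i."""
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# 209 placeholder ids, the number of examples available for "ask".
sentence_ids = [f"sent{n}" for n in range(1, 210)]
splits = list(k_fold_splits(sentence_ids))
fold_sizes = [len(test) for _, test in splits]
```

Each example is used for testing exactly once, and the reported accuracy is the average over the ten test folds.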

Results - KSs

All KSs together yield better results than subsets of KSs. Some KSs seem to be more relevant than others for certain verbs. Using one KS at a time gives very low-quality results (accuracy and rules).

                    All KSs except:
Verb    All KSs   Bag-of-words   Collocations   Expressions   Definition overlapping   POS    Selectional restrictions   Syntactic relations
ask     0.94      0.64           0.9            0.64          0.64                     0.76   0.64                       0.68
come    0.91      0.8            0.82           0.78          0.8                      0.82   0.87                       0.78
get     0.67      0.56           0.46           0.56          0.56                     0.54   0.62                       0.56
give    0.96      0.93           0.8            0.93          0.93                     0.91   0.93                       0.84
go      0.88      0.76           0.76           0.84          0.84                     0.84   0.76                       0.84
live    0.88      0.81           0.67           0.81          0.81                     0.77   0.81                       0.81
look    0.83      0.77           0.62           0.77          0.77                     0.77   0.68                       0.79
make    0.83      0.71           0.83           0.71          0.71                     0.88   0.76                       0.74
take    0.81      0.41           0.47           0.41          0.41                     0.44   0.41                       0.47
tell    0.94      0.87           0.75           0.85          0.87                     0.85   0.85                       0.87
Aver.   0.87      0.73           0.71           0.73          0.73                     0.76   0.73                       0.74
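The effect of removing each KS can be read off the "Aver." row by subtracting each value from the all-KSs average. The Python sketch below does this arithmetic on the averages reported above; the variable names are hypothetical.

```python
# Average accuracies from the "Aver." row of the table.
ALL_KS_AVERAGE = 0.87
AVERAGE_WITHOUT = {
    "Bag-of-words": 0.73,
    "Collocations": 0.71,
    "Expressions": 0.73,
    "Definition overlapping": 0.73,
    "POS": 0.76,
    "Selectional restrictions": 0.73,
    "Syntactic relations": 0.74,
}

# Drop in average accuracy when a single KS is left out.
drops = {ks: round(ALL_KS_AVERAGE - acc, 2) for ks, acc in AVERAGE_WITHOUT.items()}
largest_drop = max(drops, key=drops.get)
smallest_drop = min(drops, key=drops.get)
```

On these averages, leaving out collocations costs the most (0.16) and leaving out POS the least (0.11), although the per-verb rows show that the ranking varies by verb.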

Next steps

Try different corpora: larger and of different domains/genres (although the verbs are not domain specific).

Other ILP options: Induction methods; Manual pruning; Manual constraints; Search strategies; etc.

Optimization (time).

Use of the translation context KS specific to MT.

Extrinsic evaluation: transfer rule-based MT system.

Final remarks

Results are promising: hybrid relational approach outperforms propositional approaches, yielding a small set of symbolic rules, which are easy to understand and adapt, if necessary.

All KSs seem to play an important role.

In general, the approach proved feasible, and we expect the resulting system to improve the quality of English-Portuguese MT systems.


“Which of you shall we say doth love us most?

That we our largest bounty may extend

Where nature doth with merit challenge.”

References

Bruce, R. and Guthrie, L. (1992). Genus disambiguation: A study in weighted preference. In Proceedings of the 14th COLING, Nantes, pp. 1187-1191.

Carpuat, M. and Wu, D. (2005). Word sense disambiguation vs. statistical machine translation. 43rd ACL Meeting, Ann Arbor, pp. 387-394.

Hutchins, W.J. and Somers, H.L. (1992). An Introduction to Machine Translation. Academic Press, Great Britain.

Kilgarriff, A. (1997). I Don't Believe in Word Senses. Computers and the Humanities, 31 (2), pp. 91-113.

Muggleton, S. (1991). Inductive Logic Programming. New Generation Computing, 8 (4), pp. 295-318.

Resnik, P. and Yarowsky, D. (1997). A Perspective on Word Sense Disambiguation Methods and their Evaluation. ACL-SIGLEX Workshop Tagging Texts with Lexical Semantics: Why, What and How?, Washington.

Specia, L. (2005a). A Hybrid Model for Word Sense Disambiguation in English-Portuguese Machine Translation. In Proceedings of the 8th CLUK, Manchester, pp. 71-78.

Specia, L. (2005b). Knowledge sources for disambiguating highly ambiguous verbs in machine translation. In Proceedings of the Student Session of the 17th ESSLLI, Edinburgh.

Specia, L., Nunes, M.G.V., Stevenson, M. (2005). Exploiting Parallel Texts to Produce a Multilingual Sense-tagged Corpus for Word Sense Disambiguation. In Proceedings of RANLP-05, Borovets, pp. 525-531.

Specia, L., Nunes, M.G.V., Stevenson, M. (2006, to appear). Multilingual versus Monolingual WSD. Proceedings of EACL Workshop Making Sense of Sense, April 4th, Trento.

Srinivasan, A. (2000). The Aleph Manual. Technical Report. Computing Laboratory, Oxford University (http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/).

Vickrey, D., Biewald, L., Teyssier, M., and Koller, D. (2005). Word-Sense Disambiguation for Machine Translation. HLT/EMNLP, Vancouver.

Wilks, Y. and Stevenson, M. (1998). The Grammar of Sense: Using Part-of-speech Tags as a First Step in Semantic Disambiguation. Journal of Natural Language Engineering, 4(1):1-9.
