A Hybrid Relational Approach for Word Sense Disambiguation in Machine Translation Lucia Specia Mark Stevenson
Dec 27, 2015
A Hybrid Relational Approach for Word Sense Disambiguation in Machine
Translation
Lucia Specia
Mark Stevenson
Maria G. V. Nunes
WSD in Machine Translation (MT)
Lexical choice in the case of semantic ambiguity.
Examples (English-Portuguese): take =
tomar (carry out),
levar (lead, direct, conduct, guide),
aceitar (accept),
pegar (choose, pick out), etc.
WSD in Machine Translation (cont.)
One of the main challenges in MT.
Conflicting results on the usefulness of WSD for (statistical) MT: (Vickrey et al., 2005); (Carpuat and Wu, 2005).
Particularly for English-Portuguese, studies have shown that the lack of WSD modules is one of the main reasons for the unsatisfactory results of the existent MT systems We suggested that an effective WSD module, specifically
designed for MT, would improve MT performance.
Approaches to WSD
Knowledge-based: linguistic knowledge manually codified or extracted from lexical resources
Corpus-based: knowledge automatically acquired from text using machine learning
Hybrid: merge characteristics of the two other approaches
Accurate, but suffer from the knowledge acquisition bottelneck.
Wide coverage, but need consistent and significant sample corpus.
Explore advantages and minimize limitations of other approaches → wide coverage and accurate results.
Approaches to multilingual WSD
Approaches to WSD as an application-independent task date back to 1960’s.
Most are monolingual, for English disambiguation: WSD is application-dependent (Wilks and Stevenson,
1998; Kilgarriff, 1997; Resnik and Yarowsky, 1997).
WSD for MT differs from monolingual WSD (Hutchins and Sommers, 1992), particularly with respect to the sense repository (Specia et al., 2006).
Approaches to multilingual WSD
Corpus-based and hybrid approaches use propositional formalisms (attribute-value vectors): Limited expressiveness; data sparseness:
Ex1) John gave Mary a big cake.
Ex2) Give me something.
Consequences: Impractical to represent substantial knowledge and use it during the learning process
Hybrid approaches use knowledge in pre-processing steps, before applying machine learning algorithms.
verb1-subj1 verb1-obj1 mod1-obj1 verb1-obj2 …
give-john give-cake big-cake give-mary …give-something give-me …
Proposal – a novel approach
LeAR (Lexical Ambiguity Resolution): Specific for MT: senses, knowledge, techniques.
Hybrid - corpus and knowledge-based Several knowledge sources (KSs) automatically acquired
from corpus and lexical resources; Evidence provided by examples of disambiguation extracted
from automatically created sense tagged corpora.
Relational formalism Highly expressive, avoiding data sparseness:
each example is represented independently.
Inductive Logic Programming (ILP)
Relational symbolic supervised learning approach.
Inductive Logic Programming
Allows the efficient representation of substantial knowledge about the problem, and allows this knowledge to be used during the learning process (Muggleton, 1991).
Machine Learning Logic ProgrammingILP
Theory(1st-order clauses)
Examples(1st-order clauses)
Back. Knowledge(1st-order clauses)
Aleph
Inductive Logic Programming (cont.)
Given: a set of positive and negative examples E = E+ E- a predicate p specifying the target relation to be learned knowledge of a certain domain which specifies which
predicates qi can be part of the definition of p.
The goal is: to induce a hypothesis (or theory) h for p, with respect to E and , which covers most of the E+, without covering the E-.
Additionally: clauses representing K, E, and h must satisfy a set of syntactic restrictions S (language bias).
h can be used to classify new cases of disambiguation.
Inductive Logic Programming (cont.)
Aleph (Srinivasan, 2000): Provides a complete relational learning inference engine.
Provides various customization options: Induction methods; Search strategies; Evaluation functions; etc.
We are using: bottom-up search (generalisation); non-incremental learning (batch learning); non-interactive learning (without user intervention); learning based on positive examples only.
Inductive Logic Programming (cont.)
The default inference engine induces a theory iteratively by means of the following steps:1. One example is selected to be generalized.
Ex.: sense(sent1,voltar).
1. A more specific clause (bottom clause), which explains the selected example, is built. It usually consists of the representation of all knowledge about that example.
2. A clause that is more generic than the bottom clause is searched, by means of different search, evaluation, and generalization strategies.
3. The best clause found is added to the theory and the examples covered by such clause are removed from the example set. If there are more instances, return to step 1.
KS4KS7
KS6
KS1
ILP Inference Engine
Rules to use Bag-of-words (10)
Rules to use Collocations
KS2
POS of the Narrow Context (10)
Rules to use POS
KS3
Subject-object syntactic relations
Rules to use syntactic relations
Rules to use context, ph. verbs & idioms
KS5
Verbs selectional restrictions
Rules to use selectional restrictions
Subject-object syntactic relations
Nouns semantic features
Rules to use definitions overlapping
Overlapping counting
Rule-based model
Examples
Bag-of-words (10)
POS tagger
LDOCE Wordnet
Hierarchical relations
Feature types hierarchy
Bilingual MRDs
Definitions overlapping
Bag-of-words (200)
Bag-of-words (10)
Mode + type + general settings
Phrasal verbs and idioms
Bag-of-words (10)
11 Collocations
Parser
Verb definitions and examples
LDOCE + Password
Scope
Experiments with: English-Portuguese MT
No studies have examined English-Portuguese.
10 highly frequent and ambiguous verbs Relevant and difficult cases for English-Portuguese MT (Specia,
2005a).
Knowledge from syntactic, semantic and pragmatic sources Working on knowledge which is specific for translation.
Although especially designed for MT of verbs, the approach can be adapted for WSD of any words and languages.
Sample data
Corpus: fiction books, automatically tagged with the verb translation and manually reviewed (Specia et al., 2005a).
Verb # Translation # Examples Accuracy of mostfrequent translation
ask 5 209 0.77come 11 183 0.5get 17 157 0.21give 5 180 0.89go 11 197 0.69live 8 209 0.69
look 7 191 0.5
make 11 170 0.7
take 13 142 0.29
tell 17 209 0.35
Average 10.5 0.56
Knowledge sources
Example: sent1, verb “to come”:“If there is such a thing as reincarnation, I would not mind coming
back as a squirrel”.
KS1: Bag-of-words – ± 5 words (lemmas) surrounding the verb for every sentence (sent_id)
bag(sent_id, list_of_words).
Ex.: bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel])
KS2: Part-of-speech (POS) tags of content words in a ±5 word window surrounding the verb
has_pos(sent_id, word_position, pos).
Ex.: has_pos(sent1, word_left_1, nn).
has_pos(sent1, word_left_2, vbp). …
Knowledge sources
KS3: Subject and object syntactic relations with respect to the verb
has_rel(sent_id, subject_word, object_word).
Ex.: has_rel(sent1, i, nil).
KS4: Context words represented by 11 collocations with respect to the verb: 1st preposition to the right, 1st and 2nd words to the left and right, 1st noun, 1st adjective, and 1st verb to the left and right
has_collocation(sent_id, collocation_type, collocation).
Ex.: has_collocation(sent1, word_right_1, back). has_collocation(sent1, word_left_1, mind).…
Knowledge sources
KS5: Selectional restrictions of verbs and semantic features of their arguments from LDOCE
rest(verb, subj_restrition, obj_ restriction, translation)
Ex.: rest(come, [], nil, voltar). rest(come, [animal,human], nil, vir).
rest(come, [], nil, aparecer). ...
feature(noun, sense_id, features).
Ex.: feature(reincarnation, 0_1, [abstract]). feature(reincarnation, 0_2, [animate]).
feature(squirrel, 0_0, [animal]). …
Knowledge sources
KS5 (cont.): Hierarchy for LDOCE feature types (Bruce and Guthrie,
1992)
relation(feature1, feature2).
Ex.: sub(human, animate). …
Ontological relations from WordNet
relation(word1, sense_id1, word2, sense_id2).
Ex.: hyper(reincarnation, 1, avatar, 1). hyper(reincarnation, 3, religious_doctrine, 2). synon(rebirth, 2, reincarnation, -1). …
Knowledge sources
KS6: Idioms and phrasal verbs
exp(verbal_expression, translation)
Ex.: exp('come about', acontecer). exp('come about', chegar). exp('come to fruition', amadurecer). …
KS7: A count of the overlapping words in dictionary definitions for the possible translations of the verb and the words surrounding it in the sentence
highest_overlap(sent_id, translation, overlapping).
Ex.: highest_overlap(sent1, voltar, 0.222222). highest_overlap(sent1, chegar, 0.0857143).
…
Additional predicates
Examples:sense(sent_id, translation).
Ex.: sense(sent1, voltar). sense(sent2, ir). …
Mode definitionsEx.: :- modeh(1, sense(sent, translation)).
:- modeb(11, has_collocation(sent, colloc_id, colloc)). :- modeb(10, has_bag(sent, word)). …
Auxiliary predicates:Ex.: has_bag(Sent, Word) :-
bag(Sent, List), member(Word, List). …
bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel])
Example of rules produced
Verb “to come”:1. sense(A, sair) :- has_collocation(A, preposition_right_1, out).
2. sense(A, chegar) :- satisfy_restrictions(A, [animal,human],[concrete]), has_expression(A, 'come at').
3. sense(A, vir) :- satisfy_restriction(A, [human],[abstract]);
has_collocation(A, word_right_1, from); (has_rel(A, subj, B), (has_pos(B,nn);has_pos(B,pron))).
4. sense(A, passar) :- (has_bag(A, to), has_bag(A, propernoun)); highest_overlapping(A,passar).
In order classify new cases, rules must be applied in
the order they are produced.
Evaluation
Induction methods: induce: builds one clause each time, removing the
examples covered by that clause. Theory to be produced depends on the order of the examples;
induce_max: builds one clause each time, without removing the examples covered by the clause. Builds a bottom clause for all the examples, and not only for the first one.
Search strategies: bf: enumerates shorter clauses before longer ones; df: enumerates longer clauses before shorter ones; heuristic: enumerates clauses in a best-first manner.
Evaluation
Generalization strategy: Relative least general generalisation (rlgg): lgg of two
clauses c1 and c2, which is the minimum upper bound of c1 and c2 in the lattice introduced by -subsumption, with relation to the background knowledge.
Evaluation function: Only positive examples (Bayesian score).
Knowledge sources: All 7; All – 1 = 6 each time; 1 each time.
Comparison: propositional approaches
Algorithm: C4.5, Naive Bayes, Memory-based, SVM.
Knowledge sources: Narrow context: 5 surrounding words and/or POS tags; Broad context: 1-100 surrounding words and/or POS tags; 11 collocations: 1st preposition to the right, 1st and 2nd words
to the left and right, 1st noun, 1st adjective, and 1st verb to the left and right;
Subject-object syntactic relations.
Best combination of these features, along with use of filters and optimization of parameters.
Best algorithm: SVM.Specia et al. (2005b).
Results
* 10 fold-cross validation, best experimental setting: induce_max, all KSs, heuristic search, but without filters or other optimizations.
Verb # Rules in Aleph
% Accuracy
Aleph* Most frequent sense
C4.5 SVM
ask 5 0.94 0.77 0.76 0.93
come 11 0.91 0.5 0.68 0.8
get 17 0.67 0.21 0.53 0.66
give 6 0.96 0.89 0.9 0.96
go 12 0.88 0.69 0.81 0.85
live 8 0.88 0.69 0.61 0.81
look 6 0.83 0.5 0.83 0.87
make 11 0.83 0.7 0.77 0.87
take 11 0.81 0.29 0.45 0.6
tell 8 0.94 0.35 0.66 0.7
Average 0.87 0.56 0.7 0.8
Results - KSs
All KSs together yield better results than subsets of KSs. Different KSs seem to be more relevant than others for certain verbs. One KS each time – very low quality results (accuracy and rules).
Verb All KSs
All KSs except:Bag-of-words
Collocations Expressions Definition Overlapping
POS SelectionalRestrictions
Syntacticrelations
ask 0.94 0.64 0.9 0.64 0.64 0.76 0.64 0.68come 0.91 0.8 0.82 0.78 0.8 0.82 0.87 0.78get 0.67 0.56 0.46 0.56 0.56 0.54 0.62 0.56give 0.96 0.93 0.8 0.93 0.93 0.91 0.93 0.84go 0.88 0.76 0.76 0.84 0.84 0.84 0.76 0.84live 0.88 0.81 0.67 0.81 0.81 0.77 0.81 0.81look 0.83 0.77 0.62 0.77 0.77 0.77 0.68 0.79make 0.83 0.71 0.83 0.71 0.71 0.88 0.76 0.74take 0.81 0.41 0.47 0.41 0.41 0.44 0.41 0.47tell 0.94 0.87 0.75 0.85 0.87 0.85 0.85 0.87Aver. 0.87 0.73 0.71 0.73 0.73 0.76 0.73 0.74
Next steps
Try different corpora: larger and of different domains/genres (although the verbs are not domain specific).
Other ILP options: Induction methods; Manual pruning; Manual constraints; Search strategies; etc.
Optimization (time).
Use of the translation context KS specific to MT.
Extrinsic evaluation: transfer rule-based MT system.
Final remarks
Results are promising: hybrid relational approach outperforms propositional approaches, yielding a small set of symbolic rules, which are easy to understand and adapt, if necessary.
All KSs seem to play an important role.
In general, the approach showed to be
feasible and we expect the resultant
system will be able to improve
the quality of English-Portuguese MT
systems.
Lucia [email protected]
“Which of you shall we say doth love us most?
That we our largest bounty may extend
Where nature doth with merit challenge.”
References
Bruce, R. and Guthrie. L. (1992). Genus disambiguation: A study in weighted performance. In Proceedings of the 14th COLING, Nantes, pp. 1187-1191. Carpuat, M. and Wu, D. (2005). Word sense disambiguation vs. statistical machine translation. 43rd ACL Meeting, Ann Arbor, pp. 387–394.
Kilgarriff, A. (1997). I Don't Believe in Word Senses. Computers and the Humanities, 31 (2), pp. 91-113.
Hutchins, W.J. and Somers, H.L. 1992. An Introduction to Machine Translation. Academic Press, Great Britain.
Muggleton, S. 1991. Inductive Logic Programming. New Generation Computing, 8 (4):295-318.
Resnik, P. and Yarowsky, D. (1997). A Perspective on Word Sense Disambiguation Methods and their Evaluating. ACL-SIGLEX Workshop Tagging Texts with Lexical Semantics: Why, What and How?. Washington.
Specia, L. (2005a). A Hybrid Model for Word Sense Disambiguation in English-Portuguese Machine Translation. In Proceedings of the 8th CLUK, Manchester, pp. 71-78.
Specia, L. (2005b). Knowledge sources for disambiguating highly ambiguous verbs in machine translation. In Proceedings of the Student Session of the 17th ESSLLI, Edinburgh.
References
Specia, L., Nunes, M.G.V., Stevenson, M. (2005). Exploiting Parallel Texts to Produce a Multilingual Sense-tagged Corpus for Word Sense Disambiguation. In Proceedings of RANLP-05, Borovets, pp. 525-531.
Specia, L., Nunes, M.G.V., Stevenson, M. 2006 (to appear). Multilingual versus Monolingual WSD. Proceedings of EACL Workshop Making Sense of Sense, April 4th, Trento.
Srinivasan, A. 2000. The Aleph Manual. Technical Report. Computing Laboratory, Oxford University (http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/).
Vickrey, D., Biewald, L., Teyssier, M., and Koller, D. (2005). Word-Sense Disambiguation for Machine Translation. HLT/EMNLP, Vancouver.
Wilks, Y. and Stevenson. M. 1998. The Grammar of Sense: Using Part-of-speech Tags as a First Step in Semantic Disambiguation. Journal of Natural Language Engineering, 4(1):1-9.