Exploration of system combination in statistical machine translation
Le Truong Vinh Phu
Supervisor: Prof. Ng Hwee Tou
Master of Computing dissertation
School of Computing
27th May 2014
Outline
• Introduction
• Literature Review
• Multi-Engine Machine Translation (MEMT)
• Experiments
• Conclusion and Future Research

Outline
• Introduction
  ♦ Machine translation (MT)
  ♦ Statistical machine translation (SMT)
  ♦ Machine translation system combination
  ♦ Problem description & objective
• Literature Review
• Multi-Engine Machine Translation (MEMT)
• Experiments
• Conclusion and Future Research

Machine translation (MT)
• the use of computers to automate translation
• difficulty: translation divergences
• real-world benefits
• different paradigms and approaches
  ♦ dictionary-based
  ♦ rule-based
  ♦ statistical

Statistical machine translation (SMT)
• enabled by the availability of large corpora (monolingual and bilingual)
• relying on probability models
  ♦ faithfulness
  ♦ fluency
• P(F|E): translation model, P(E): language model
• Phrase-based SMT (Koehn et al., 2003)

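The two models on this slide fit together through the standard noisy-channel decomposition (Bayes' rule), where the translation model P(F|E) captures faithfulness and the language model P(E) captures fluency:

```latex
\hat{E} = \operatorname*{arg\,max}_{E} P(E \mid F)
        = \operatorname*{arg\,max}_{E} P(F \mid E)\, P(E)
```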
Statistical machine translation (SMT)
• Language model:
  ♦ conditional probability of a word given previous words
  ♦ requires monolingual corpus
• Alignment

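A minimal sketch of the language-model point above: estimating the conditional probability of a word given its previous word (a bigram model) by maximum likelihood over a toy monolingual corpus. Real systems use higher-order smoothed models; the corpus here is purely illustrative.

```python
from collections import Counter

def bigram_probs(corpus):
    """Maximum-likelihood estimate of P(word | previous word)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])                 # history counts
        bigrams.update(zip(toks[:-1], toks[1:]))   # (prev, word) counts
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}

# Toy monolingual corpus (two tokenized sentences).
corpus = [["the", "cat", "sat"], ["the", "cat", "slept"]]
probs = bigram_probs(corpus)
# P(cat | the) = 1.0 here, since "the" is always followed by "cat";
# P(sat | cat) = 0.5, since "cat" is followed by "sat" once in two cases.
```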
Statistical machine translation (SMT)
• Reordering model:
  ♦ penalties for long-distance reordering
  ♦ distance-based (Koehn et al., 2005), phrase-based and hierarchical reordering (Galley & Manning, 2008)
• Automatic evaluation:
  ♦ BLEU (Papineni et al., 2002)

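The BLEU metric cited above can be illustrated with a simplified sentence-level version: clipped n-gram precisions combined by a geometric mean, times a brevity penalty. Real BLEU (Papineni et al., 2002) is computed at the corpus level; the small smoothing constant here only avoids log(0) on short sentences.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum(min(c, ref[g]) for g, c in cand.items())  # clip counts
        total = max(sum(cand.values()), 1)
        log_prec += math.log(max(clipped, 1e-9) / total)        # smooth zeros
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(log_prec / max_n)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
score = bleu(cand, ref)
```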
MT system combination
• different MT systems => different strengths and weaknesses
• synthesizing a consensus translation
• main aspects:
  ♦ combination method
  ♦ selection of good component systems to combine

Problem description & objective
• Problem description
  ♦ in which situations and settings does system combination work well?
• Objective:
  ♦ evaluating system combination via empirical experiments
    Ø available datasets: NIST OpenMT, WMT
  ♦ utilizing system combination to improve a Chinese-to-English phrase-based system

Outline
• Introduction
• Literature Review
  ♦ System combination
  ♦ Confusion network decoding
  ♦ Other approaches
  ♦ Diverse hypotheses generation
• Multi-Engine Machine Translation (MEMT)
• Experiments
• Conclusion and Future Research

System combination
• successfully applied in speech recognition (Fiscus, 1997; Mangu et al., 2000)
• crucial steps: aligning hypotheses, controlling word order
• variety of approaches:
  ♦ hypothesis re-ranking (Hildebrand & Vogel, 2008)
  ♦ confusion networks (Rosti et al., 2007a, 2007b)
  ♦ collaborative decoding (Li et al., 2009)

Confusion network decoding
• current mainstream approach
• Bangalore et al. (2001), Matusov et al. (2006), Rosti et al. (2007a, 2007b), Sim et al. (2007), He et al. (2008)
• Rosti et al. (2007a)
  ♦ Sentence level
  ♦ Phrase level
  ♦ Word level

Confusion network decoding
• Example hypotheses: “cat sat the mat”, “cat sitting on the mat”, and “hat on a mat”

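The three hypotheses above can be arranged into a confusion network and decoded by majority voting per column. The column alignment below is written out by hand for illustration only; real systems derive it with TER- or METEOR-style alignment and vote with tuned system weights rather than raw counts.

```python
from collections import Counter

# One column per word position; "*" marks an epsilon (empty) arc.
network = [
    ["cat", "cat",     "hat"],
    ["sat", "sitting", "*"],
    ["*",   "on",      "on"],
    ["the", "the",     "a"],
    ["mat", "mat",     "mat"],
]

def decode(net):
    """Keep the majority word in each column, dropping epsilon arcs."""
    winners = (Counter(col).most_common(1)[0][0] for col in net)
    return [w for w in winners if w != "*"]

consensus = " ".join(decode(network))  # "cat sat on the mat"
```

Note how the consensus recovers “on”, which the first hypothesis dropped, while out-voting the errors “hat” and “a” of the third.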
Other approaches
• Collaborative decoding (Li et al., 2009)
  ♦ avoid early pruning of potentially good translations
  ♦ leverage agreement information of n-grams
• Multi-Engine Machine Translation (MEMT)
  ♦ METEOR alignment (Banerjee & Lavie, 2005)
  ♦ no fixed backbone

Diverse hypothesis generation
• Not a trivial problem (Siohan et al., 2005)
• Key point: complementary error patterns
• Approaches:
  ♦ selecting systems from different paradigms
  ♦ diversifying one baseline system
    Ø introducing randomness (Siohan et al., 2005)
    Ø different morphological decompositions of the source language (de Gispert et al., 2009)
    Ø varying alignment algorithms (Xu & Rosti, 2010)
    Ø controlling target “trait” values (Devlin and Matsoukas, 2012)

Diverse hypothesis generation
• Exploiting multiple Chinese word segmentation standards: Zhang et al. (2008), Dyer et al. (2008), Xu et al. (2005)
• Zhang et al. (2008):
  ♦ Exploiting four SIGHAN standards: AS, CITYU, MSR, PKU

Outline
• Introduction
• Literature Review
• Multi-Engine Machine Translation (MEMT)
  ♦ Overview
  ♦ Description
• Experiments
• Conclusion and Future Research

Overview
• Open source toolkit: http://kheafield.com/code/memt/
• WMT system name: cmu-combo (2009), cmu-heafield-combo (2010, 2011)
• Superior performance in WMT 2011
• Easy to use, robust, and efficient

Description
• Combining 1-best outputs of component systems
  ♦ Pair-wise alignment (METEOR)
  ♦ Beam search
  ♦ Z-MERT tuning (Zaidan, 2009)
• Features:
  ♦ length
  ♦ language model
  ♦ backoff
  ♦ match

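The features listed above are combined linearly: each hypothesis is scored by a dot product of feature values and weights, with the weights tuned by Z-MERT. The numeric values below are made-up placeholders; only the scoring form reflects how such combiners rank hypotheses.

```python
# Hypothetical feature values for one hypothesis (placeholders, not real MEMT output).
features = {"length": 7.0, "lm": -12.4, "backoff": -1.0, "match": 5.0}
# Weights as tuned by an optimizer such as Z-MERT (also placeholder values).
weights = {"length": 0.1, "lm": 1.0, "backoff": 0.5, "match": 0.3}

def score(feats, wts):
    """Linear hypothesis score: dot product of features and tuned weights."""
    return sum(wts[k] * v for k, v in feats.items())

total = score(features, weights)
```

During beam search, partial hypotheses are ranked by this score, so tuning the weights directly shapes which weaves of component outputs survive pruning.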
Description
• METEOR alignment:
  ♦ exact matches
  ♦ identical stems (Porter, 2001)
  ♦ WordNet synonyms (Miller, 1995)
  ♦ TERp unigram paraphrases (Snover et al., 2009)

Description
• Search space:
  ♦ picking one word at a time, from left to right
  ♦ maintaining two sets of “captured” and “uncaptured” words
  ♦ no duplication, fluency across switches
  ♦ no fixed backbone

Description
• the final hypothesis weaves together parts of component outputs

Outline
• Introduction
• Literature Review
• Multi-Engine Machine Translation (MEMT)
• Experiments
  ♦ MEMT on WMT11
  ♦ MEMT on NIST MT08
  ♦ Diversifying Chinese-English phrase-based SMT
  ♦ Exploiting multiple CWS standards
• Conclusion and Future Research

MEMT on WMT11
• http://www.statmt.org/wmt11
• two language pairs: French-English and Spanish-English
• Ranking participating systems by BLEU on the test set
• Selecting different component systems for system combination

• French-English MEMT on WMT11
  [Results table omitted: component-system BLEU scores and the system combination gain]

• Spanish-English MEMT on WMT11
  [Results table omitted: component-system BLEU scores and the system combination gain]

MEMT on WMT11
• Spanish-English
  ♦ why E1 (combining all) < E2 (excluding the bottom two)?

MEMT on NIST MT08
• LDC catalog no. LDC2010T21 and LDC2010T01
• No accompanying system papers
• Challenging: mix of newswire and web texts
• Chinese-English and Arabic-English
  ♦ split datasets into a tuning set and a test set

MEMT on NIST MT08
• Chinese-English:
  ♦ Tuning set: 524 sentences, test set: 788 sentences
  ♦ Combining the top 5 systems out of 23 systems
  ♦ similar to Ma and McKeown (2012)
• Arabic-English:
  ♦ Tuning set: 509 sentences, test set: 803 sentences
  ♦ Combining the top 7 systems out of 14 systems

MEMT on NIST MT08
• Chinese-English, gain = 3.76

MEMT on NIST MT08
• Arabic-English, gain = 3.47

Diversifying Chinese-English SMT
• Varying different steps of the training pipeline
• Tune on MTC1+MTC3 datasets (LDC2002T01 and LDC2004T07), test on NIST02-NIST08 evaluation sets
• Varying the decoding algorithm: Maximum A Posteriori (MAP), Minimum Bayes Risk (MBR), Lattice Minimum Bayes Risk (LMBR)
• Varying the reordering model: word-based (wbe), phrase-based (phrase), hierarchical (hier), combined reordering (phrase-hier)

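Of the decoding algorithms listed, MBR can be sketched compactly: instead of the single most probable hypothesis (MAP), pick the one with the lowest expected loss, i.e. the highest expected similarity to the rest of the n-best list. The toy similarity below is unigram overlap, a stand-in for BLEU; LMBR applies the same idea over a lattice rather than an n-best list.

```python
def mbr_select(hyps, probs, similarity):
    """Pick the hypothesis with maximum expected similarity (minimum Bayes risk)."""
    def expected_gain(h):
        return sum(p * similarity(h, other) for other, p in zip(hyps, probs))
    return max(hyps, key=expected_gain)

def overlap(a, b):
    """Toy similarity: Jaccard overlap of word sets (stand-in for BLEU)."""
    aw, bw = set(a.split()), set(b.split())
    return len(aw & bw) / max(len(aw | bw), 1)

# Toy n-best list with made-up posterior probabilities.
hyps = ["the cat sat", "a cat sat", "the dog ran"]
probs = [0.4, 0.35, 0.25]
best = mbr_select(hyps, probs, overlap)  # "the cat sat"
```

The MAP choice and the MBR choice coincide here, but they can differ: a hypothesis with modest probability that agrees with many others can beat the single most probable one.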
Diversifying Chinese-English SMT
• Varying the decoding algorithm, gain = -0.17

Diversifying Chinese-English SMT
• Varying the reordering model, gain = 0.19

Exploiting multiple CWS standards
• Chinese Word Segmentation
  ♦ Correlates weakly with MT quality
  ♦ Potential source of diversity
• SIGHAN Bakeoff evaluation campaign
  ♦ Academia Sinica (AS)
  ♦ City University of Hong Kong (CITYU)
  ♦ Penn Chinese Treebank (CTB)
  ♦ Microsoft Research (MSR)
  ♦ Peking University (PKU)

Exploiting multiple CWS standards
• Chinese Word Segmentation

Exploiting multiple CWS standards
• Baseline System
  ♦ Chinese-English phrase-based SMT systems trained with Moses
  ♦ Segmenting and training five different systems corresponding to the five CWS standards
  ♦ Training bi-text: 8,290,649 sentence pairs
  ♦ Interpolated language model of order 5
  ♦ Tuning set MTC1+MTC3: 1,928 sentences, 4 references each
  ♦ GIZA++ alignment, combined reordering scheme, MBR decoding

Exploiting multiple CWS standards
• System combination experiments
  ♦ Same tuning set MTC1+MTC3
  ♦ Z-MERT and PRO tuning
  ♦ Test sets: NIST 2002 to 2006, 2008
  ♦ Evaluation: mteval-v11b, case-insensitive

Exploiting multiple CWS standards
• Results – component systems

Exploiting multiple CWS standards
• Results – combining 5 systems
  ♦ Avg gain: 0.52 (Z-MERT) and 0.82 (PRO)

Exploiting multiple CWS standards
• Results – combining the top 3 systems
  ♦ Avg gain: 0.35 (Z-MERT) and 0.64 (PRO)
  ♦ Lower than when combining 5 systems

Exploiting multiple CWS standards
• Discussion
  ♦ CWS is a good source for generating diverse SMT systems
  ♦ Benefits:
    Ø Reducing segmentation errors
    Ø Reducing out-of-vocabulary words
    Ø Providing diverse translations

Exploiting multiple CWS standards
• Component system outputs

Exploiting multiple CWS standards
• Combined system output

Conclusion and future research
• Conclusion
  ♦ System combination does benefit MT
  ♦ Exceptions:
    Ø Combining very few systems
    Ø Some component systems with exceptionally bad performance
    Ø Combining very similar systems (non-complementary)
  ♦ Achieved the goal of improving a Chinese-English SMT system

Conclusion and future research
• Future research
  ♦ Evaluating different combination algorithms
    Ø Collaborative decoding (Li et al., 2009)
  ♦ Trait-based approach as a way to generate diverse inputs (Devlin and Matsoukas, 2012)

Summary
• Empirical experiments
  ♦ MEMT as the system combination module
  ♦ WMT and NIST evaluation sets
• System combination does benefit MT quality
  ♦ given comparable, complementary input systems
• Exploiting multiple CWS standards as a way to diversify SMT systems
  ♦ improves a strong Chinese-English phrase-based system
  ♦ average gain of 0.5-0.8 BLEU on NIST02-06 and NIST08

Thank You