Pre-Translation for Neural Machine Translation
Jan Niehues, Eunah Cho, Thanh-Le Ha and Alex Waibel
KIT - Institute for Anthropomatics and Robotics
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association, www.kit.edu
2016-12-15
Pre-Translation for Neural Machine Translation · Neural machine translation sets state-of-the art End-to-End neural network approach to machine translation Comparison to SMT Significant
Transcript
Motivation
- NMT has different problems
  - Small vocabulary
  - Problems translating rare words
- English: the goalie parried
- NMT: der Gott
- NMT (gloss): the god
- Combine SMT and NMT
  - Simplify the task of NMT
Outline
- Motivation
- MT approaches
- Idea
  - Pipeline
  - Mixed Input
- Evaluation
- Conclusion
Statistical Machine Translation (SMT)
- Build translations from blocks of source and target words (phrase pairs)
Neural Machine Translation (NMT)
- Neural network to predict the most probable target sequence
- Jointly trained model
- Large improvements in translation quality
Neural Machine Translation (NMT)
- Fixed vocabulary size
- Byte pair encoding (Sennrich et al. 2016)
  - Represent all words with n sub-words
  - Start with character representation
  - Join most common bi-gram sequence to new symbol
- Example (successive merge steps):
  t h e _ g o a l i e _ p a r r i e d
  t h e _ g o a l ie _ p a r r ie d
  t h e _ g o a l ie _ p a r r ied
  t h e _ g o a l ie _ pa r r ied
  the _ go al ie _ par ried
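The merge procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the function names (`count_pairs`, `merge_pair`, `learn_bpe`) are made up for this example, words are segmented independently so no merge crosses a word boundary, and ties between equally frequent pairs are broken by first occurrence, so later merges need not follow the order shown on the slide.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs over all words (each word is a list of symbols)."""
    pairs = Counter()
    for symbols in words:
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += 1
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single joined symbol."""
    merged_words = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged_words.append(out)
    return merged_words


def learn_bpe(text, num_merges):
    """Start from characters and apply `num_merges` most-frequent-pair merges."""
    words = [list(word) for word in text.split()]
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair encountered wins
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


merges, segmented = learn_bpe("the goalie parried", 1)
print(merges)     # the first merge joins 'i' and 'e', as on the slide
print(segmented)
```

In "the goalie parried" the pair (i, e) is the only one that occurs twice, which is why it is merged first. On a real corpus the number of merge operations fixes the size of the sub-word vocabulary.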
Difference SMT/NMT
- SMT:
  - Handle large vocabulary
  - Easily extensible
    - Add translation via new phrase pairs
- NMT:
  - Joint model
  - Long context
  - Better generalization due to word embeddings
Pre-Translation
- Combine advantages of both approaches
- Exploit advantages of SMT
- Successful combination of other approaches
- Idea:
  - Use SMT as input to NMT
  - Encode words using byte pair encoding
    - Use translation of words not in NMT vocabulary
Related Work
- Combination of SMT and rule-based MT (Dugast et al., 2007; Simard et al., 2007)
- Automatic post-editing (Junczys-Dowmunt and Grundkiewicz, 2016)
- Preprocessing for PBMT
  - Compound splitting
  - Pre-reordering
- Handling of rare words in NMT (Luong et al., 2014; Sennrich et al., 2015)
Pipeline
- Input: Source sentence
- Translate using PBMT
- Translate from PBMT German to German using NMT
Mixed Input
- Input: Source sentence
- Translate using PBMT
- Combine source and PBMT translation
- Translate joined text using NMT
Mixed Input
- Implementation:
  - Join source sentence and PBMT translation
    - the goalie der Torwart
  - RNN states encode both the source and the PBMT translation
  - Language-specific word embeddings
    - E_the E_goalie D_der D_Torwart
  - BPE for word encoding
    - E_the E_go E_al E_ie D_der D_Tor D_wart
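One way to build the joined input shown here is sketched below. It is only an illustration under assumptions: the helper names (`tag`, `mixed_input`, `toy_segment`) and the tiny segmentation table are hypothetical, the table stands in for a trained BPE model, and the slides do not say whether the authors apply the language prefix before or after sub-word segmentation. The sketch segments first and prefixes each sub-word afterwards, which reproduces the strings on the slide.

```python
def tag(tokens, prefix):
    """Prefix every token so English and German units get separate embeddings."""
    return [prefix + token for token in tokens]


def mixed_input(source, pbmt_translation, segment=lambda word: [word]):
    """Join source and PBMT output into one language-tagged token sequence.

    `segment` maps a word to its sub-word units; the default keeps words whole.
    """
    source_units = [u for word in source.split() for u in segment(word)]
    target_units = [u for word in pbmt_translation.split() for u in segment(word)]
    return " ".join(tag(source_units, "E_") + tag(target_units, "D_"))


# Toy segmentation table standing in for a trained BPE model (hypothetical).
TOY_SEGMENTS = {"goalie": ["go", "al", "ie"], "Torwart": ["Tor", "wart"]}


def toy_segment(word):
    """Look up a word's sub-word units, falling back to the whole word."""
    return TOY_SEGMENTS.get(word, [word])


print(mixed_input("the goalie", "der Torwart"))
# E_the E_goalie D_der D_Torwart
print(mixed_input("the goalie", "der Torwart", toy_segment))
# E_the E_go E_al E_ie D_der D_Tor D_wart
```

Because the prefixes keep the two vocabularies disjoint, a word form that exists in both languages is still mapped to two different embeddings.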
Training
- Training data:
  - Parallel corpus
  - PBMT translation of corpus
- Problem:
  - PBMT tends to overfit on the training data
- Filter singletons from phrase table
  - Successfully used in other models
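The singleton filter can be illustrated as follows, assuming that a "singleton" is a phrase pair extracted only once from the training data. The function name `filter_singletons` and the example pairs are invented for this sketch. A real phrase table would also carry translation scores, which are left out here.

```python
from collections import Counter


def filter_singletons(extracted_pairs):
    """Keep only phrase pairs that were extracted more than once.

    `extracted_pairs` is an iterable of (source_phrase, target_phrase) tuples,
    one entry per extraction from the training corpus.
    """
    counts = Counter(extracted_pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


# Invented extractions, for illustration only.
extracted = [
    ("the goalie", "der Torwart"),
    ("the goalie", "der Torwart"),
    ("parried", "pariert"),
]
print(filter_singletons(extracted))
# {('the goalie', 'der Torwart'): 2}
```

Dropping pairs seen only once keeps the PBMT system from reproducing long memorized phrases on its own training sentences, so the PBMT output seen by the NMT system during training looks more like what it will see at test time.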
Experiments
- Training data:
  - WMT EN-DE data
- PBMT
  - In-house translation system
- NMT
  - Nematus
  - BPE with 40K operations
Results English - German
[Slide 15: results table not preserved in the transcript]
Result by Word Frequency
[Slide 16: figure not preserved in the transcript]
Examples
- English: Then with a shot which the goalie parried with his knee in the 35th minute.
- PBMT: Dann mit einem Schuss, die der Torwart pariert mit seinem Knie in der 35. Minute.
- NMT: Dann mit einem Schuss, den der Gott mit seinem Knie in der 35. Minute.
- Pre: Dann mit einem Schuss, das der Torwart mit seinem Knie in der 35. Minute pariert.
- Pre (gloss): Then with a shot, that the goalie with his knee in the 35th minute parried.
Examples
- English: ... a riot in the stadium.
- PBMT: ... einen Aufruhr im Stadion.
- NMT: ... einen Riot im Stadion.
- Pre: ... einen Aufruhr im Station.
- Pre (gloss): ... a riot in_the stadium.
Alignment
Conclusion
- Combine advantages of NMT and SMT
- Improve handling of rare words
- Easy handling of different input streams
- Increase overall translation performance
- Further work:
  - Do we need to do a full translation?