Chunk-based Decoder for Neural Machine Translation

Shonosuke Ishiwatari¹, Jingtao Yao², Shujie Liu³, Mu Li³, Ming Zhou³, Naoki Yoshinaga⁴, Masaru Kitsuregawa⁴⁵, Weijia Jia²
¹The University of Tokyo ²Shanghai Jiao Tong University ³Microsoft Research Asia ⁴Institute of Industrial Science, the University of Tokyo ⁵National Institute of Informatics

Overview

Proposal: a chunk-based decoder for neural machine translation
- Idea: use a chunk, rather than a word, as the basic translation unit
- SOTA performance on a distant language pair (En -> Ja)
- Motivation: translation between distant language pairs is difficult

Word-based Encoder-Decoder

[Figure: an encoder-decoder with attention [Bahdanau+ 15] encoding "someone was bitten by a dog" and decoding 「だれ か が 犬 …」 word by word; α(4, dog) marks one attention weight.]
- Encodes / decodes "word-by-word"

Difficulties of En -> Ja translation

- Example: "I heard that someone was bitten by a dog, weren't you injured?"
  だれ か が / 犬 に / 噛ま れ た そう だ けれど、/ 君 は / 怪我 し なかっ た ?
- A Japanese sentence is a longer sequence (En: 25 vs. Ja: 30 [words/sentence])
- Japanese allows free chunk order (e.g., 「だれかが / 犬に」 = 「犬に / だれかが」)

Our decoder for NMT

- Two-step decoding: first a chunk, then the words inside the chunk
- Two additional connections:
  1. to capture the interaction between chunks
  2. to memorize previous outputs well

Experiments

- Data: ASPEC [Nakazawa+ 16], 1.6M En/Ja sentence pairs
- Preprocessing: bunsetsu chunking with J.DepP [Yoshinaga & Kitsuregawa 09]
- Baseline systems:
  1. Word-based encoder-decoder [Bahdanau+ 15]
  2. Tree-based encoder [Eriguchi+ 16] (SOTA)

Results

- Quality of translation and quality of generated chunks
  ※ Scores inside ( ) are not our implementations

Translation examples

[Figure: example translations. Source sentences include "the atmospheric glow discharge obtained by applying alternating voltage after introducing atmospheric He gas in a typical dielectric barrier discharge reactor" and "because user operation is important for the idea support in material development, an interface for a substance operation at atomic level was developed."]
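The two-step decoding above ("first a chunk, then words inside the chunk") can be sketched as a pair of nested recurrences. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: plain tanh RNN cells stand in for the actual decoder units, the hidden size and weights are toy values, and the comments map the loops onto the two additional connections listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                                                  # toy hidden size

def rnn_step(h, x, W, U):
    """One plain (Elman) RNN step; a stand-in for the real decoder cells."""
    return np.tanh(W @ h + U @ x)

W_c, U_c = rng.normal(size=(D, D)), rng.normal(size=(D, D))  # chunk-level RNN
W_w, U_w = rng.normal(size=(D, D)), rng.normal(size=(D, D))  # word-level RNN

def decode(n_chunks, words_per_chunk):
    """Two-step decoding: first a chunk state, then the words inside it.
    - the chunk state c is fed to every word step
      (captures the interaction between chunks)
    - the last word state is fed back to the chunk-level RNN
      (helps memorize previous outputs)
    """
    c = np.zeros(D)            # chunk-level hidden state
    feedback = np.zeros(D)     # last word state of the previous chunk
    states = []
    for _ in range(n_chunks):                    # chunk-level step
        c = rnn_step(c, feedback, W_c, U_c)      # word-to-chunk feedback
        h = np.zeros(D)
        for _ in range(words_per_chunk):         # word-level steps
            h = rnn_step(h, c, W_w, U_w)         # chunk state at every step
            states.append(h)
        feedback = h
    return states

states = decode(n_chunks=2, words_per_chunk=3)
print(len(states))   # 6 word states; the real model would also emit end-of-chunk
```

In the real model the inner loop runs until the word-level decoder predicts an end-of-chunk token, rather than for a fixed number of steps.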
[Figure: for each example, the Japanese reference translation and the outputs of the word-based encoder-decoder and our chunk-based decoder.]

[Table: quality of translation — BLEU and RIBES for the word-based encoder-decoder [Bahdanau+ 15], the tree-based encoder [Eriguchi+ 16], and the chunk-based decoders (Models 1-3).]

[Table: quality of generated chunks — BLEU and RIBES computed over chunk sequences for the word-based and chunk-based models.]

Why is word-by-word translation hard between distant languages?
1. Some languages use many words to represent one thing while others use fewer words
2. Some languages have free word order while others do not

Our decoder:
- Decodes sentences in a "chunk-by-chunk" manner to overcome the differences in length and word order
- The sequence for a sentence becomes much shorter
- Fixed word order and free chunk order can be modeled independently

[Figure: the chunk-based decoder generating 「だれ か が / 犬 に …」. A chunk-level RNN runs over shorter chunk sequences; a word-level RNN generates the words inside each chunk until an end-of-chunk token, modeling fixed word order and free chunk order independently (Model 1). Model 2 adds an inter-chunk connection; Model 3 adds word-to-chunk feedback.]
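Training a decoder that closes each chunk with an end-of-chunk token requires chunk-segmented target sentences, which the preprocessing step obtains by bunsetsu chunking with J.DepP. A minimal sketch of turning a "/"-separated chunked sentence (the notation used in the example above) into such a flat target; the token names `<eoc>`/`<eos>` and the helper function are illustrative assumptions, not J.DepP's actual output format:

```python
EOC, EOS = "<eoc>", "<eoc>".replace("c", "s")  # hypothetical special tokens
EOS = "<eos>"

def chunked_to_target(chunked_sentence):
    """Turn a '/'-separated, space-tokenized sentence into a flat
    target sequence with an end-of-chunk token closing every chunk."""
    chunks = [c.split() for c in chunked_sentence.split("/")]
    target = []
    for words in chunks:
        target.extend(words)
        target.append(EOC)   # the word-level decoder learns to emit this
    target.append(EOS)       # and the chunk-level decoder learns to stop here
    return target

# The chunked example sentence from the poster:
ja = "だれ か が / 犬 に / 噛ま れ た そう だ けれど、/ 君 は / 怪我 し なかっ た ?"
target = chunked_to_target(ja)
print(target[:4])   # ['だれ', 'か', 'が', '<eoc>']
```

At decoding time the mapping is reversed: each `<eoc>` the word-level decoder emits hands control back to the chunk-level decoder.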