Chunk-based Decoder for Neural Machine Translation

Shonosuke Ishiwatari¹, Jingtao Yao², Shujie Liu³, Mu Li³, Ming Zhou³, Naoki Yoshinaga⁴, Masaru Kitsuregawa⁴⁵, Weijia Jia²
¹The University of Tokyo ²Shanghai Jiao Tong University ³Microsoft Research Asia ⁴Institute of Industrial Science, the University of Tokyo ⁵National Institute of Informatics

Overview

Proposal: a chunk-based decoder for neural machine translation
- Idea: use a chunk, rather than a word, as the basic translation unit
- SOTA performance on a distant language pair (En -> Ja)
- Motivation: translation between distant language pairs is difficult

Word-based Encoder-Decoder

[Figure: an encoder-decoder with attention [Bahdanau+ 15] encoding "someone was bitten by a dog" and decoding 「だれ か が 犬 …」 word by word; α(4, dog) marks one attention weight.]
- Encodes / decodes "word-by-word"

Difficulties of En -> Ja translation

- Example: "I heard that someone was bitten by a dog, weren't you injured?"
  だれ か が / 犬 に / 噛ま れ た そう だ けれど、/ 君 は / 怪我 し なかっ た ?
- A Japanese sentence is a longer sequence (En: 25 vs. Ja: 30 [words/sentence])
- Japanese allows free chunk order (e.g., 「だれかが / 犬に」 = 「犬に / だれかが」)

Our decoder for NMT

- Two-step decoding: first a chunk, then the words inside the chunk
- Two additional connections:
  1. to capture the interaction between chunks
  2. to memorize previous outputs well

Experiments

- Data: ASPEC [Nakazawa+ 16], 1.6M En/Ja sentence pairs
- Preprocessing: bunsetsu chunking with J.DepP [Yoshinaga & Kitsuregawa 09]
- Baseline systems:
  1. Word-based encoder-decoder [Bahdanau+ 15]
  2. Tree-based encoder [Eriguchi+ 16] (SOTA)

Results

- Quality of translation and quality of generated chunks
  ※ Scores inside ( ) are not our implementations

Translation examples

[Figure: example translations. Source sentences include "the atmospheric glow discharge obtained by applying alternating voltage after introducing atmospheric He gas in a typical dielectric barrier discharge reactor" and "because user operation is important for the idea support in material development, an interface for a substance operation at atomic level was developed."]
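The two-step decoding above ("first a chunk, then words inside the chunk") can be sketched as a pair of nested recurrences. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: plain tanh RNN cells stand in for the actual decoder units, the hidden size and weights are toy values, and the comments map the loops onto the two additional connections listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                                                  # toy hidden size

def rnn_step(h, x, W, U):
    """One plain (Elman) RNN step; a stand-in for the real decoder cells."""
    return np.tanh(W @ h + U @ x)

W_c, U_c = rng.normal(size=(D, D)), rng.normal(size=(D, D))  # chunk-level RNN
W_w, U_w = rng.normal(size=(D, D)), rng.normal(size=(D, D))  # word-level RNN

def decode(n_chunks, words_per_chunk):
    """Two-step decoding: first a chunk state, then the words inside it.
    - the chunk state c is fed to every word step
      (captures the interaction between chunks)
    - the last word state is fed back to the chunk-level RNN
      (helps memorize previous outputs)
    """
    c = np.zeros(D)            # chunk-level hidden state
    feedback = np.zeros(D)     # last word state of the previous chunk
    states = []
    for _ in range(n_chunks):                    # chunk-level step
        c = rnn_step(c, feedback, W_c, U_c)      # word-to-chunk feedback
        h = np.zeros(D)
        for _ in range(words_per_chunk):         # word-level steps
            h = rnn_step(h, c, W_w, U_w)         # chunk state at every step
            states.append(h)
        feedback = h
    return states

states = decode(n_chunks=2, words_per_chunk=3)
print(len(states))   # 6 word states; the real model would also emit end-of-chunk
```

In the real model the inner loop runs until the word-level decoder predicts an end-of-chunk token, rather than for a fixed number of steps.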
[Figure: for each example, the Japanese reference translation and the outputs of the word-based encoder-decoder and our chunk-based decoder.]

[Table: quality of translation — BLEU and RIBES for the word-based encoder-decoder [Bahdanau+ 15], the tree-based encoder [Eriguchi+ 16], and the chunk-based decoders (Models 1-3).]

[Table: quality of generated chunks — BLEU and RIBES computed over chunk sequences for the word-based and chunk-based models.]

Why is word-by-word translation hard between distant languages?
1. Some languages use many words to represent one thing while others use fewer words
2. Some languages have free word order while others do not

Our decoder:
- Decodes sentences in a "chunk-by-chunk" manner to overcome the differences in length and word order
- The sequence for a sentence becomes much shorter
- Fixed word order and free chunk order can be modeled independently

[Figure: the chunk-based decoder generating 「だれ か が / 犬 に …」. A chunk-level RNN runs over shorter chunk sequences; a word-level RNN generates the words inside each chunk until an end-of-chunk token, modeling fixed word order and free chunk order independently (Model 1). Model 2 adds an inter-chunk connection; Model 3 adds word-to-chunk feedback.]
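Training a decoder that closes each chunk with an end-of-chunk token requires chunk-segmented target sentences, which the preprocessing step obtains by bunsetsu chunking with J.DepP. A minimal sketch of turning a "/"-separated chunked sentence (the notation used in the example above) into such a flat target; the token names `<eoc>`/`<eos>` and the helper function are illustrative assumptions, not J.DepP's actual output format:

```python
EOC, EOS = "<eoc>", "<eoc>".replace("c", "s")  # hypothetical special tokens
EOS = "<eos>"

def chunked_to_target(chunked_sentence):
    """Turn a '/'-separated, space-tokenized sentence into a flat
    target sequence with an end-of-chunk token closing every chunk."""
    chunks = [c.split() for c in chunked_sentence.split("/")]
    target = []
    for words in chunks:
        target.extend(words)
        target.append(EOC)   # the word-level decoder learns to emit this
    target.append(EOS)       # and the chunk-level decoder learns to stop here
    return target

# The chunked example sentence from the poster:
ja = "だれ か が / 犬 に / 噛ま れ た そう だ けれど、/ 君 は / 怪我 し なかっ た ?"
target = chunked_to_target(ja)
print(target[:4])   # ['だれ', 'か', 'が', '<eoc>']
```

At decoding time the mapping is reversed: each `<eoc>` the word-level decoder emits hands control back to the chunk-level decoder.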