Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task Kyoto University Toshiaki Nakazawa Sadao Kurohashi

Dec 13, 2015

Page 1

Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task

Kyoto University

Toshiaki Nakazawa, Sadao Kurohashi

Page 2

Overview of Kyoto-U System

Translation Examples

J: 図書館で新聞を読む
E: I read a newspaper in the library

J: 政治の本が売れ残っている
E: A book in politics was left on the shelf

・・・・・

Page 3

Overview of Kyoto-U System

[Figure: the translation examples stored as pairs of aligned dependency trees.
J: 図書館 で (library in) / 新聞 を (newspaper ACC) / 読む (read) ↔ E: I / read / a newspaper / in the library
J: 政治 の (politics in) / 本 が (book NOM) / 売れ残って いる (left unsold) ↔ E: a book / in politics / was left / on the shelf]

Page 4

Overview of Kyoto-U System

Input: 図書館で政治の本を読む。

Output: I read a book in politics in the library

[Figure: the input dependency tree 図書館 で (library in) / 政治 の (politics in) / 本 を (book ACC) / 読む (read) is covered by parts of the two translation examples: 図書館 で, 読む ↔ I, read, in the library from the first example, and 政治 の 本 ↔ a book in politics from the second.]

Page 5

Alignment

Page 6

Alignment

J: 交差点で、突然あの車が飛び出して来たのです。

E: The car came at me from the side at the intersection.

Page 7

Alignment

[Figure: J dependency tree (交差 / 点 で 、 / 突然 / あの車 が / 飛び出して 来た のです) and E dependency tree (the car / came / at me / from the side / at the intersection)]

1. Transformation into dependency structure

J: JUMAN/KNP; E: Charniak's nlparser → dependency tree

Page 8

Alignment


2. Detection of word(s) correspondences

Page 9

Finding Correspondences

• Bilingual dictionaries (500K entries)

• Substring co-occurrence (Cromieres 2006)

• Numeral normalization

  二百十六万 → 2,160,000 ← 2.16 million

• Transliteration (katakana words, named entities)

  ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)

  新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)

Substring co-occurrence score:

$$\frac{\mathrm{count}(j, e)}{\mathrm{count}(j)\,\mathrm{count}(e)}$$
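The numeral-normalization bullet can be illustrated with a small Python sketch. It assumes that normalization means mapping each numeral to a plain integer so that all three forms compare equal. The helper names kanji_to_int and english_to_int and the unit tables are hypothetical, and only the units needed for examples like this one are handled.

```python
import re
from decimal import Decimal

# Hypothetical unit tables covering common kanji numerals and English scales.
KANJI_DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}
EN_SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def kanji_to_int(text):
    """Convert a kanji numeral such as 二百十六万 to an int."""
    total = 0    # value of completed 万/億/兆 groups
    section = 0  # value of the current group below 万
    digit = 0    # digit waiting for a unit
    for ch in text:
        if ch in KANJI_DIGITS:
            digit = KANJI_DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]  # a bare 十 means 10
            digit = 0
        elif ch in LARGE_UNITS:
            total += (section + digit) * LARGE_UNITS[ch]
            section = digit = 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + section + digit


def english_to_int(text):
    """Convert '2,160,000' or '2.16 million' to an int."""
    m = re.fullmatch(r"([\d,]+(?:\.\d+)?)\s*(thousand|million|billion)?",
                     text.strip().lower())
    if m is None:
        raise ValueError(f"not a number: {text!r}")
    value = Decimal(m.group(1).replace(",", ""))  # Decimal avoids float error
    if m.group(2):
        value *= EN_SCALES[m.group(2)]
    return int(value)


print(kanji_to_int("二百十六万"))       # 2160000
print(english_to_int("2,160,000"))     # 2160000
print(english_to_int("2.16 million"))  # 2160000
```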

Page 10

Alignment


3. Disambiguation of correspondences

Page 11

Alignment


4. Handling of remaining phrases (extension to leaf nodes)

Page 12

Alignment


5. Registration to translation example database

Page 13

Alignment Ambiguities

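[Figure: an ambiguous alignment.
E dependency tree: you / will have to file / insurance / an claim / insurance / with the office / in Japan
J dependency tree with glosses: 日本 で [in Japan] / 保険 [insurance] / 会社 に 対して [to the company] / 保険 [insurance] / 請求 の [of claim] / 申し立て が [file] / 可能です よ [be able to]
Each 保険 node could be aligned with either "insurance" node.]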

Page 14

Alignment: Consistency

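[Figure: pairs of alignment candidates whose nodes are near each other versus far from each other in the dependency trees]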

Page 15

• For each pair of candidates a_i and a_j, calculate the J-side distance d_J and the E-side distance d_E

• Give the pair a consistency score based on d_J and d_E

• Calculate consistency scores for all pairs in each possible set of alignment candidates, and choose the set with the highest average score

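$$\operatorname*{argmax}_{\text{alignment}} \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} cs\big(d_J(a_i, a_j),\, d_E(a_i, a_j)\big)}{n(n-1)/2}$$

where n is the number of alignment candidates in the set.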

Page 16

Baseline

Distance of each branch: 1

Consistency score:

$$cs(d_J, d_E) = \frac{1}{d_J} + \frac{1}{d_E}$$

For example, distances of 1 and 2 give 1/1 + 1/2 = 1.5.
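The consistency objective with the baseline score can be sketched in Python. Assumptions not stated on the slides: each tree is a parent array with -1 at the root, an alignment candidate is a (J node, E node) pair, and only one-to-one candidate sets are considered so that no distance is 0. All function names are hypothetical.

```python
from itertools import combinations, product


def path_to_root(parent, node):
    """Nodes from `node` up to the root; parent[root] == -1."""
    path = [node]
    while parent[node] != -1:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, u, v):
    """Number of branches between u and v (each branch has length 1)."""
    steps_from_u = {n: i for i, n in enumerate(path_to_root(parent, u))}
    for steps_from_v, n in enumerate(path_to_root(parent, v)):
        if n in steps_from_u:  # first common ancestor
            return steps_from_u[n] + steps_from_v
    raise ValueError("nodes are not in the same tree")


def baseline_cs(d_j, d_e):
    """Baseline consistency score: nearer nodes on either side score higher."""
    return 1 / d_j + 1 / d_e


def average_consistency(alignment, j_parent, e_parent, cs=baseline_cs):
    """Mean cs over all n(n-1)/2 pairs of (J node, E node) candidates."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        return 0.0
    total = sum(cs(tree_distance(j_parent, a[0], b[0]),
                   tree_distance(e_parent, a[1], b[1]))
                for a, b in pairs)
    return total / len(pairs)


def is_one_to_one(alignment):
    """True if no J node and no E node is used twice (so no distance is 0)."""
    j_nodes = [j for j, _ in alignment]
    e_nodes = [e for _, e in alignment]
    return len(set(j_nodes)) == len(j_nodes) and len(set(e_nodes)) == len(e_nodes)


def best_alignment(options, j_parent, e_parent, cs=baseline_cs):
    """argmax over every way of picking one candidate from each option list."""
    sets = [list(choice) for choice in product(*options)]
    sets = [s for s in sets if is_one_to_one(s)]
    return max(sets, key=lambda s: average_consistency(s, j_parent, e_parent, cs))


# Toy trees (hypothetical): J 0 申し立て <- 1 請求 <- 2 保険;
# E 0 file <- 1 claim <- 2 insurance, 0 file <- 3 office <- 4 insurance.
j_parent = [-1, 0, 1]
e_parent = [-1, 0, 1, 0, 3]
options = [[(0, 0)], [(1, 1)], [(2, 2), (2, 4)]]
print(best_alignment(options, j_parent, e_parent))  # [(0, 0), (1, 1), (2, 2)]
```

Because the baseline score is a sum of a J-side term and an E-side term, two one-to-one candidate sets that cover exactly the same J and E nodes always receive the same average. It can only separate sets that cover different nodes, as in the toy example. A score defined on the (d_J, d_E) pair jointly does not have this limitation.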

Page 17

Consistency Score

• The frequency of each distance pair (d_J, d_E) in gold-standard alignment data (Mainichi newspaper, 40K sentence pairs) [Uchimoto04]

[Figure: log frequency of distance pairs, plotted against J-side distance and E-side distance]
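The slide does not say exactly how frequencies become a score. A minimal Python sketch, assuming the score is the relative frequency of a (d_J, d_E) pair among pairs measured on gold-standard alignments; the name frequency_cs and the toy data are hypothetical. Unlike the baseline, this score depends on the two distances jointly.

```python
from collections import Counter


def frequency_cs(gold_distance_pairs):
    """Return cs(d_J, d_E) = relative frequency of (d_J, d_E) in gold data.

    gold_distance_pairs: iterable of (d_J, d_E) tuples measured between
    pairs of correct alignments in a hand-aligned corpus.
    """
    counts = Counter(gold_distance_pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no gold-standard distance pairs given")

    def cs(d_j, d_e):
        # Counter returns 0 for pairs never seen in the gold data.
        return counts[(d_j, d_e)] / total

    return cs


# Hypothetical gold data in which equal distances dominate.
cs = frequency_cs([(1, 1), (1, 1), (1, 2), (2, 2)])
print(cs(1, 1), cs(1, 2), cs(3, 1))  # 0.5 0.25 0.0
```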

Page 18

Distance based on Dependency Type

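[Figure: the same sentence pair, with each branch labelled by its dependency type instead of a uniform distance of 1.
E nodes: you / will have to file / insurance / an claim / insurance / with the office / in Japan; branch labels: NP, NN, PP
J nodes: 日本 で [in Japan] / 保険 [insurance] / 会社 に 対して [to the company] / 保険 [insurance] / 請求 の [of claim] / 申し立て が [file] / 可能です よ [be able to]; branch labels: デ格 (de-case), 文節内 (within a bunsetsu), 連用 (adverbial modification), ノ格 (no-case, genitive), ガ格 (ga-case, nominative)]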

Page 19


Distance based on Dependency Type

Page 20


Distance based on Dependency Type
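Replacing the uniform branch length with a type-dependent one can be sketched in Python. The slides show the dependency-type labels but not the distances assigned to them, so the weights below (for example, 0.5 for 文節内, within a bunsetsu) are hypothetical placeholders, as is the function name typed_distance.

```python
# Hypothetical branch lengths per dependency type; the real values are not given.
HYPOTHETICAL_WEIGHTS = {
    "文節内": 0.5,  # within the same bunsetsu: treated as very close
    "NN": 0.5,      # English noun compound
    "デ格": 1.0, "ガ格": 1.0, "ノ格": 1.0, "連用": 1.0,
    "NP": 1.0, "PP": 1.0,
}


def typed_distance(parent, dep_type, u, v, weights=HYPOTHETICAL_WEIGHTS, default=1.0):
    """Sum of branch lengths on the tree path between nodes u and v.

    parent[n] is the head of node n (-1 for the root) and dep_type[n] is the
    label of the branch from n to parent[n].
    """
    def cost_to_ancestors(n):
        costs, acc = {n: 0.0}, 0.0
        while parent[n] != -1:
            acc += weights.get(dep_type[n], default)
            n = parent[n]
            costs[n] = acc
        return costs

    from_u = cost_to_ancestors(u)
    acc, n = 0.0, v
    while True:
        if n in from_u:  # first common ancestor reached from v's side
            return from_u[n] + acc
        if parent[n] == -1:
            raise ValueError("nodes are not in the same tree")
        acc += weights.get(dep_type[n], default)
        n = parent[n]


# Toy J tree (hypothetical): 0 可能です <- 1 申し立て が (ガ格) <- 2 請求 の (ノ格)
# <- 3 保険 (文節内); 0 可能です <- 4 日本 で (デ格).
parent = [-1, 0, 1, 2, 0]
dep_type = [None, "ガ格", "ノ格", "文節内", "デ格"]
print(typed_distance(parent, dep_type, 3, 2))  # 0.5
print(typed_distance(parent, dep_type, 3, 4))  # 3.5
```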

Page 21

Example of Alignment Improvement

[Figure: alignment produced by the proposed model compared with word-based alignment]

Page 22

Translation

Page 23

Translation

Input: 図書館で政治の本を読む。

Output: I read a book in politics in the library

[Figure: the input dependency tree 図書館 で (library in) / 政治 の (politics in) / 本 を (book ACC) / 読む (read) is translated by combining parts of the translation examples 図書館 で 新聞 を 読む ↔ I read a newspaper in the library and 政治 の 本 が 売れ残って いる ↔ a book in politics was left on the shelf.]

Page 24

Selection of Translation Examples

• Score for an example

1. Size of an example

2. Similarity of neighboring nodes

3. Translation probability

• Beam search from the root of the input

[Sato 91]
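The selection step can be sketched in Python. The slide lists the three score components and a beam search from the root of the input, but not how the components are combined, so the score below (size squared times similarity times translation probability) is an assumption. The names Example, example_score and beam_search are hypothetical, as are all numbers in the toy input, which reuses the overview sentence 図書館で政治の本を読む。

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """A translation example matched against part of the input tree."""
    name: str
    covers: frozenset  # input node ids matched by the example
    sim: float         # similarity of neighbouring nodes, 0..1
    prob: float        # translation probability

    @property
    def size(self):
        return len(self.covers)


def example_score(ex):
    # Assumed combination: squaring the size favours one large example over
    # several small ones that cover the same nodes.
    return ex.size ** 2 * ex.sim * ex.prob


def beam_search(order, examples, beam_width=3):
    """Cover the input nodes with non-overlapping examples, root first.

    order: input node ids listed from the root downward.
    Returns (score, [example names]) of the best cover, or None.
    """
    beam = [(0.0, frozenset(), ())]
    for node in order:
        expanded = []
        for score, covered, chosen in beam:
            if node in covered:  # already translated by an earlier example
                expanded.append((score, covered, chosen))
                continue
            for ex in examples:
                if node in ex.covers and not (ex.covers & covered):
                    expanded.append((score + example_score(ex),
                                     covered | ex.covers, chosen + (ex,)))
        beam = sorted(expanded, key=lambda h: h[0], reverse=True)[:beam_width]
        if not beam:  # some node could not be covered
            return None
    score, _, chosen = beam[0]
    return score, [ex.name for ex in chosen]


# Input nodes: 0 読む (root), 1 図書館 で, 2 本 を, 3 政治 の.
examples = [
    Example("library+read", frozenset({0, 1}), sim=0.7, prob=0.9),
    Example("read", frozenset({0}), sim=0.5, prob=0.8),
    Example("politics+book", frozenset({2, 3}), sim=1.0, prob=0.9),
    Example("library", frozenset({1}), sim=1.0, prob=0.9),
    Example("book", frozenset({2}), sim=1.0, prob=0.9),
    Example("politics", frozenset({3}), sim=1.0, prob=0.9),
]
score, names = beam_search([0, 1, 2, 3], examples)
print(names)  # ['library+read', 'politics+book']
```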

Page 25

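[Figure: scoring candidate translation examples for the input tree 図書館 で (library in) / 政治 の (politics in) / 本 を (book ACC) / 読む (read). The English sides shown include "I read a newspaper", "I study in the library" and "I read a newspaper in the library"; each candidate is scored with the weights w_size, w_sim and w_trans.]

Translation example: 図書館 で 新聞 を 読む ↔ I read a newspaper in the library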

Page 26

Combination of TMs

Input: 図書館で政治の本を読む。

[Figure: the matched parts of the translation examples (図書館 で, 読む ↔ I, read, in the library; 政治 の 本 ↔ a book in politics) are combined into the output dependency tree.]

Page 27

Input: 記録領域での変形形状と,記録特性の関係を調べた。

[Figure: input and output dependency trees, and the translation examples combined to produce the output.
Input tree: 記録 / 領域 で の / 変形 / 形状 と , / 記録 / 特性 の / 関係 を / 調べた 。
Output tree: the relationship / deformation / shape and / recording / in the region / recording / between characteristics / was examined
Translation examples:
状況 を 調べた 。 ↔ the situation was examined
相互 作用 と 記録 特性 の 関係 を 調べた 。 ↔ the relationship between interaction and recording characteristics was investigated
大変 形 領域 で の 断面 形状 を 模擬 した ↔ cross-sectional / shape / large / deformation / in the region / was simulated
記録領域 の ↔ recording of the areas
変形パターン を ↔ deformation the pattern]

Output: The relationship between deformation shape in the recording region and recording characteristics was examined .

Page 28

Evaluation Results and Discussion

Page 29

Intrinsic J-E Evaluation Result

Each column is ranked independently.

| Rank | BLEU | Adequacy | Fluency | Average |
|---|---|---|---|---|
| 1 | NTT 27.20 | tsbmt 3.81 | Japio 4.02 | tsbmt 3.88 |
| 2 | moses 27.14 | Japio 3.71 | tsbmt 3.94 | Japio 3.86 |
| 3 | MIT 27.14 | MIT 3.15 | MIT 3.66 | MIT 3.40 |
| 4 | NAIST-NTT 25.48 | NTT 2.96 | NTT 3.65 | NTT 3.30 |
| 5 | NICT-ATR 24.79 | Kyoto-U 2.85 | moses 3.55 | moses 3.18 |
| 6 | KLE 24.49 | moses 2.81 | tori 3.44 | Kyoto-U 3.10 |
| 7 | tsbmt 23.10 | NAIST-NTT 2.66 | NAIST-NTT 3.43 | NAIST-NTT 3.04 |
| 8 | tori 22.29 | KLE 2.59 | Kyoto-U 3.35 | tori 3.01 |
| 9 | Kyoto-U 21.57 | tori 2.58 | HIT2 3.28 | KLE 2.94 |
| 10 | mibel 19.93 | NICT-ATR 2.47 | KLE 3.28 | HIT2 2.86 |
| 11 | HIT2 19.48 | HIT2 2.44 | mibel 3.09 | NICT-ATR 2.78 |
| 12 | Japio 19.46 | mibel 2.38 | NICT-ATR 3.08 | mibel 2.74 |
| 13 | TH 15.90 | TH 1.87 | FDU-MCandWI 2.42 | TH 2.13 |
| 14 | FDU-MCandWI 9.55 | FDU-MCandWI 1.75 | TH 2.39 | FDU-MCandWI 2.08 |
| 15 | NTNU 1.41 | NTNU 1.08 | NTNU 1.04 | NTNU 1.06 |

Page 30

Intrinsic E-J Evaluation Result

Each column is ranked independently.

| Rank | BLEU | Adequacy | Fluency | Average |
|---|---|---|---|---|
| 1 | moses 30.58 | tsbmt 3.53 | moses 3.69 | tsbmt 3.60 |
| 2 | NICT-ATR 29.15 | moses 2.90 | tsbmt 3.67 | moses 3.30 |
| 3 | NTT 28.07 | NTT 2.74 | NTT 3.54 | NTT 3.14 |
| 4 | Kyoto-U 22.65 | NICT-ATR 2.59 | NICT-ATR 3.20 | NICT-ATR 2.89 |
| 5 | tsbmt 17.46 | Kyoto-U 2.42 | Kyoto-U 2.54 | Kyoto-U 2.48 |

Page 31

Critical Defect in EJ Translation

• The system did not take into account whether a child node is a pre-child (placed before its head) or a post-child (placed after it)
  – As a result, the target structure is built incorrectly

• After fixing this defect, the BLEU score in E-J translation rose from 22.65 to 24.02

Kyoto-U after the fix (E-J):

| BLEU | Adequacy | Fluency | Average |
|---|---|---|---|
| 24.02 | ? | ? | ? |
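The defect concerns which side of its head a child ends up on when the target tree is linearized. A minimal Python sketch, assuming each output node keeps separate lists of pre-children (placed before the head) and post-children (placed after it). TargetNode, attach and linearize are hypothetical names, and the tree is the Japanese side of the overview example, where every child precedes its head.

```python
from dataclasses import dataclass, field


@dataclass
class TargetNode:
    """A node of the output dependency tree (hypothetical representation)."""
    words: str
    pre: list = field(default_factory=list)   # children placed before the head
    post: list = field(default_factory=list)  # children placed after the head

    def attach(self, child, before_head):
        """Attach a child on the side it had in the translation example."""
        (self.pre if before_head else self.post).append(child)
        return self


def linearize(node):
    """In-order walk: pre-children, then the head, then post-children."""
    parts = [linearize(child) for child in node.pre]
    parts.append(node.words)
    parts.extend(linearize(child) for child in node.post)
    return " ".join(parts)


# Japanese side of the overview example: every child precedes its head.
book = TargetNode("本 を").attach(TargetNode("政治 の"), before_head=True)
root = TargetNode("読む")
root.attach(TargetNode("図書館 で"), before_head=True)
root.attach(book, before_head=True)
print(linearize(root))   # 図書館 で 政治 の 本 を 読む

# Ignoring the side (here: always attaching after the head) scrambles the order.
wrong = TargetNode("読む", post=[TargetNode("図書館 で"),
                                 TargetNode("本 を", post=[TargetNode("政治 の")])])
print(linearize(wrong))  # 読む 図書館 で 本 を 政治 の
```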

Page 32

Conclusion

• Kyoto-U: a fully syntactic EBMT system
  1. Alignment: consistency
  2. Alignment: extension
  3. Translation: discontinuous examples
  4. Translation: easy combination

• By using syntactic information, we achieved reasonably high-quality translation

• For patent translation, some pre-processing may be needed to handle special expressions that cause parsing errors