Finding Translation Correspondences from Parallel Parsed Corpus for Example-based Translation Eiji Aramaki (Kyoto-U), Sadao Kurohashi (U-Tokyo), Satoshi Sato (Kyoto-U), Hideo Watanabe (IBM Japan)

Dec 14, 2015


Walter Evans
Transcript
Page 1:

Finding Translation Correspondences from Parallel Parsed Corpus for Example-based Translation

Eiji Aramaki (Kyoto-U),

Sadao Kurohashi (U-Tokyo),

Satoshi Sato (Kyoto-U),

Hideo Watanabe (IBM Japan)

Page 2:

Introduction

Parallel corpus → translation examples

Statistical approach (co-occurrence information): 1-2%

Our method (syntactic information, translation dictionary): 50%

Page 3:

Goal

English: This paper shows great contributions of TFP ・・・

Japanese:

・・・ 全要素生産性 が (TFP) case-marker

大きく 寄与して いること が (great) (contribution) case-marker

示されている (show)

Page 4:

Problems

• To find many correspondences, we consult a translation dictionary, which raises 2 problems:

1: Some words cannot be found in the dictionary.

2: Dictionary lookups are ambiguous and need to be resolved.

Page 5:

Overview

• Introduction

• Method

• Experiments

• Conclusion

Page 6:

Method

Step 1: Detection of Phrasal Dependency Structure

Step 2: Detection of Basic Phrasal Correspondences by Consulting Dictionary

Step 3: Discovery of New Correspondences by Handling Remaining Phrases

Page 7:

Step1: Phrasal Dependency Structures

I bought this car by monthly installments.

↓ ESG (English Parser) and Rules

Phrases: I / bought / this car / by monthly installments

Page 8:

Step1: Phrasal Dependency Structures

Rules

Function words are grouped together with a following content-word.

A compound noun is considered as one phrase.

Auxiliary verbs are grouped together with a following verb. (is playing, was tired, …)

A parallel-relation word is considered as one phrase. (and, or, …)
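The four grouping rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coarse tags (DET, PREP, AUX, NOUN, VERB, CONJ), the function name `group_phrases`, and the hand-tagged example sentence are all hypothetical and do not reflect the actual output of the ESG parser.

```python
# Hypothetical coarse tags; the real system works on ESG parser output.
FUNCTION_TAGS = {"DET", "PREP", "AUX"}


def group_phrases(tagged_tokens):
    """Group (word, tag) pairs into phrases following the four slide rules."""
    phrases = []
    current = []
    last_tag = None
    for word, tag in tagged_tokens:
        if tag == "CONJ":
            # A parallel-relation word (and, or, ...) is a phrase by itself.
            if current:
                phrases.append(" ".join(current))
            phrases.append(word)
            current, last_tag = [], None
            continue
        # Stay in the current phrase after a function word or auxiliary,
        # or when a noun follows a noun (compound noun).
        continues = last_tag in FUNCTION_TAGS or (last_tag == "NOUN" and tag == "NOUN")
        if current and not continues:
            phrases.append(" ".join(current))
            current = []
        current.append(word)
        last_tag = tag
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hand-tagged example sentence (an assumption for illustration).
sentence = [
    ("The", "DET"), ("boy", "NOUN"), ("is", "AUX"), ("playing", "VERB"),
    ("in", "PREP"), ("the", "DET"), ("computer", "NOUN"), ("room", "NOUN"),
]
print(group_phrases(sentence))
```

A single left-to-right pass is enough here because every rule only looks at the previous tag; a real implementation would work on the parser's dependency tree rather than on a flat tag sequence.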

Page 9:

Step2: Detection of Phrasal Correspondences

English: … information technology in science technology …

Japanese: … 科学 技術 に (Science Technology) / おける 情報 技術 (Information Technology) …

Page 13:

Step2: Detection of Phrasal Correspondences

… in science technology ↔ 科学 技術 に (Science Technology)

… information technology ↔ おける 情報 技術 (Information Technology)

Page 14:

Step2: Detection of Phrasal Correspondences

• Criteria to choose phrasal correspondences

– Correspondences of content words

(# of word-links × 2) / (# of J content-words + # of E content-words)

– Correspondences of neighboring phrases
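The content-word ratio on this slide is straightforward to compute. The sketch below assumes the three counts are already available for a candidate phrase pair; the function name `content_word_score` and the zero-division guard are illustrative choices, and the slide does not say how this ratio is combined with the neighboring-phrase criterion.

```python
def content_word_score(num_word_links, num_j_content_words, num_e_content_words):
    """(# of word-links x 2) / (# of J content-words + # of E content-words)."""
    total = num_j_content_words + num_e_content_words
    if total == 0:
        # Assumption: a pair with no content words gets no score.
        return 0.0
    return 2 * num_word_links / total


# 情報 技術 vs. "information technology": both content words are linked
# through the dictionary, so the ratio reaches its maximum.
print(content_word_score(num_word_links=2, num_j_content_words=2, num_e_content_words=2))
```

The ratio is 1.0 when every content word on both sides is linked and falls toward 0 as unlinked content words accumulate.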

Page 15:

Method

Step 1: Detection of Phrasal Dependency Structure

Step 2: Detection of Basic Phrasal Correspondences by Consulting Dictionary

Step 3: Discovery of New Correspondences by Handling Remaining Phrases

Page 16:

Step3: Discovery of New Correspondences by Handling Remaining Phrases

(New) in post Cold war years ↔ 冷戦 終結 後 に (cold-war) (end) (after) case-marker

(merge) goods and services ↔ 物 や (object) / サービス の (service)

Page 17:

Step3: Discovery of New Correspondences by Handling Remaining Phrases

• Criteria to discover new correspondences

– Local and Global supports

• Local support: other phrasal correspondences within two-phrase distance in the dependency structure.

• Global support: phrase correspondences in the parallel sentences.

– POS Consistency

– Inner Sufficiency
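One way to read the local-support criterion is as a count of already-found correspondences that lie near the candidate pair on both sides. The sketch below makes several assumptions that the slides do not confirm: dependency structures are given as undirected adjacency dictionaries, "two-phrase distance" means at most two edges, and support is a plain count. The names `phrases_within` and `local_support` are hypothetical.

```python
from collections import deque


def phrases_within(tree, start, max_dist=2):
    """Phrases reachable from `start` in at most `max_dist` dependency edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue
        for neighbor in tree.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[start]
    return set(dist)


def local_support(j_tree, e_tree, j_phrase, e_phrase, links):
    """Count existing (J, E) links whose two ends are both near the candidate."""
    near_j = phrases_within(j_tree, j_phrase)
    near_e = phrases_within(e_tree, e_phrase)
    return sum(1 for j, e in links if j in near_j and e in near_e)


# Toy structures for 日本は 役割を 果たす / "Japan play the role" (illustrative).
j_tree = {"果たす": ["日本は", "役割を"], "日本は": ["果たす"], "役割を": ["果たす"]}
e_tree = {"play": ["Japan", "the role"], "Japan": ["play"], "the role": ["play"]}
links = [("日本は", "Japan"), ("役割を", "the role")]
print(local_support(j_tree, e_tree, "果たす", "play", links))
```

Under this reading, a remaining pair such as 果たす / play is supported by both of its already-linked neighbors.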

Page 18:

Step3: Discovery of New Correspondences by Handling Remaining Phrases

English: Japan / play / the role

Japanese:

日本 は (Japan) case-marker

役割 を (Role) case-marker

果たす (Achieve)

Page 19:

Step3: Discovery of New Correspondences by Handling Remaining Phrases

English: ・・・ technology has become important ・・・

Japanese:

技術 が (technology) case-marker

重要 と (important)

なっている (become)

Page 20:

Experiments

Evaluation data:

200 sentence pairs from a White Paper and from example sentences in a Japanese-English dictionary

Gold standard data:

We manually tagged the correct correspondences in these sentences.

Correct: exactly matches a pre-aligned correspondence

Near-correct: partly matches a pre-aligned correspondence

Wrong: neither Correct nor Near-correct
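The three evaluation labels can be illustrated with a small classifier. This is a sketch under assumptions: each correspondence is represented as a pair of word sets, and "partly matches" is taken to mean that the output shares at least one word with a gold correspondence on both the English and the Japanese side. The slides do not define partial matching precisely, and the name `classify` is hypothetical.

```python
def classify(candidate, gold):
    """Label a (english_words, japanese_words) pair against gold pairs."""
    if candidate in gold:
        return "Correct"
    english, japanese = candidate
    for gold_english, gold_japanese in gold:
        # Assumed definition of a partial match: overlap on both sides.
        if english & gold_english and japanese & gold_japanese:
            return "Near-correct"
    return "Wrong"


# Illustrative gold correspondence: "went to bed" aligned with 寝る.
gold = [(frozenset({"went", "to", "bed"}), frozenset({"寝る"}))]

print(classify((frozenset({"went", "to", "bed"}), frozenset({"寝る"})), gold))
print(classify((frozenset({"went"}), frozenset({"寝る"})), gold))
print(classify((frozenset({"She"}), frozenset({"その木は"})), gold))
```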

Page 21:

Output Examples

English | Japanese | Score

is being pursued | 行われている (is doing by) | 2.75

of G7 nations | 先進 7 カ国の (advanced 7 countries) | 2.6

geographical proximity | 地理的に近い (near in geography) | 2.0

tree (become) | その木は (That tree is) | 1.2

went [to bed] | 寝る (Go to bed) | 1.0

She (held) | 彼女は (She is) | 0.5

(Evaluation labels shown on the slide: Near-correct, Correct)

Page 22:

Precision – Recall

[Figure: precision and recall plotted for two evaluation settings, "Correct" and "Correct + Near-Correct × 0.5".]
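The label "Correct + Near-Correct × 0.5" suggests that near-correct outputs earn half credit. A minimal sketch of that weighting follows, assuming the weighted count is used as the numerator for both precision and recall. The function name `weighted_precision_recall` is hypothetical, and the counts are made up for illustration; they are not results from the experiments.

```python
def weighted_precision_recall(n_correct, n_near, n_output, n_gold, near_weight=0.5):
    """Precision and recall where a near-correct output earns partial credit."""
    credit = n_correct + near_weight * n_near
    precision = credit / n_output if n_output else 0.0
    recall = credit / n_gold if n_gold else 0.0
    return precision, recall


# Made-up counts, for illustration only.
precision, recall = weighted_precision_recall(
    n_correct=60, n_near=20, n_output=100, n_gold=120
)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Setting `near_weight=0` gives the stricter "Correct" setting, so both evaluation settings come from the same function.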

Page 23:

Conclusion

• We can find more correspondences than a statistical approach.

• On a comparable corpus, a statistical approach seems to be effective; on a parallel corpus, however, our approach is more effective at obtaining a large number of translation examples.

Statistical approach: 1-2% of the input corpus

Our system: 51-68% of the input corpus