Page 1: Title slide

An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering

Kyoshiro Sugiyama, Masahiro Mizukami, Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
NAIST, Japan

Page 2: Question answering (QA)

One of the techniques for information retrieval. Input: a question; output: an answer.

Example: "Where is the capital of Japan?" → retrieval from an information source → retrieval result → "Tokyo."


Page 3: QA using knowledge bases

Convert the question sentence into a query. Queries have low ambiguity, but the knowledge base has a linguistic restriction, so cross-lingual QA is necessary.

Example: "Where is the capital of Japan?" → query (Type.Location, Country.Japan.CapitalCity) → knowledge base → Location.City.Tokyo → response "Tokyo."
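Below is a minimal illustrative sketch (not the actual SEMPRE query engine) of what answering such a query against a knowledge base looks like; the triples, labels, and function name are hypothetical examples.

```python
# Toy knowledge base of Freebase-style triples; identifiers are hypothetical.
TRIPLES = {
    ("Country.Japan", "CapitalCity"): "Location.City.Tokyo",
    ("Country.France", "CapitalCity"): "Location.City.Paris",
}
LABELS = {"Location.City.Tokyo": "Tokyo", "Location.City.Paris": "Paris"}

def answer_capital_query(country_entity):
    """Resolve a query like (Country.Japan, CapitalCity) against the toy KB."""
    entity = TRIPLES[(country_entity, "CapitalCity")]
    return LABELS[entity]

print(answer_capital_query("Country.Japan"))  # -> Tokyo
```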

Page 4: Cross-lingual QA (CLQA)

The question sentence and the information source may be in different languages (the question can be in any language).

Example: 日本の首都はどこ? ("Where is the capital of Japan?") → query (Type.Location, Country.Japan.CapitalCity) → knowledge base → Location.City.Tokyo → response 東京 ("Tokyo")

Creating such a mapping directly from each language to the knowledge base has high cost and is not re-usable in other languages.

Page 5: CLQA using machine translation

Machine translation (MT) can be used to perform CLQA. It is easy, low cost, and usable in many languages, but QA accuracy depends on MT quality.

Pipeline: 日本の首都はどこ? (question in any language) → machine translation → "Where is the capital of Japan?" → existing QA system → "Tokyo" → machine translation → 東京 (answer in the user's language)
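The pipeline on this slide can be sketched as three calls; `translate` and `answer_with_kb_qa` below are hypothetical placeholders for an MT system and the existing English QA system, not real APIs.

```python
def translate(text, src, tgt):
    """Placeholder for any MT system (e.g., Google Translate, Moses, Travatar)."""
    raise NotImplementedError

def answer_with_kb_qa(english_question):
    """Placeholder for the existing English QA system over the knowledge base."""
    raise NotImplementedError

def cross_lingual_qa(question, lang):
    # 1) Translate the question into the QA system's language (English).
    english_question = translate(question, src=lang, tgt="en")
    # 2) Answer with the existing monolingual QA system.
    english_answer = answer_with_kb_qa(english_question)
    # 3) Translate the answer back into the user's language.
    return translate(english_answer, src="en", tgt=lang)
```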

Page 6: Purpose of our work

To make clear how translation affects QA accuracy:

Which MT metrics are suitable for the CLQA task? → Creation of QA data sets using various translation systems; evaluation of translation quality and QA accuracy.

What kinds of translation results influence QA accuracy? → Case study (manual analysis of the QA results).


Page 7: QA system

SEMPRE framework [Berant et al., 2013]. Three steps of query generation:

Alignment: convert entities in the question sentence into "logical forms".
Bridging: generate predicates compatible with neighboring predicates.
Scoring: evaluate candidate queries using a scoring function.

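As a rough sketch of the scoring step, SEMPRE-style systems rank candidate logical forms with a linear model over features; the feature names and weights below are made-up placeholders, not the actual SEMPRE feature set.

```python
def score(features, weights):
    """Linear model score: dot product of candidate features and weights."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def rank_candidates(candidates, weights):
    """Sort candidate logical forms by descending model score."""
    return sorted(candidates, key=lambda c: score(c["features"], weights), reverse=True)

candidates = [
    {"logical_form": "Country.Japan.CapitalCity", "features": {"alignment": 1.0, "bridging": 0.0}},
    {"logical_form": "Country.Japan.LargestCity", "features": {"alignment": 0.4, "bridging": 1.0}},
]
weights = {"alignment": 2.0, "bridging": 0.5}
print(rank_candidates(candidates, weights)[0]["logical_form"])  # -> Country.Japan.CapitalCity
```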

Page 8: Data set creation

Free917 data set: Training (512 pairs), Dev. (129 pairs), Test (276 pairs) — the original English questions (OR set).

Manual translation into Japanese yields the JA set; translation into English by five methods yields the HT, GT, YT, Mo, and Tra sets.

Page 9: Translation method

Manual Translation (“HT” set): Professional humans

Commercial MT systems: Google Translate (“GT” set), Yahoo! Translate (“YT” set)

Moses (“Mo” set): Phrase-based MT system

Travatar (“Tra” set): Tree-to-string based MT system

Page 10: Experiments

Evaluation of the translation quality of the created data sets (reference: the questions in the OR set).

QA accuracy evaluation using the created data sets (with the same QA model for every set).

Investigation of the correlation between them.

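A minimal sketch of the set-level correlation analysis, assuming one translation-quality score and one QA accuracy value per created data set; the numbers are placeholders, not the paper's results.

```python
from scipy.stats import pearsonr

sets = ["HT", "GT", "YT", "Mo", "Tra"]                # order of the created data sets
translation_quality = [0.80, 0.55, 0.50, 0.45, 0.48]  # metric score per set (made-up values)
qa_accuracy = [0.45, 0.35, 0.33, 0.30, 0.31]          # QA accuracy per set (made-up values)

r, p_value = pearsonr(translation_quality, qa_accuracy)
print(f"Pearson r between translation quality and QA accuracy: {r:.3f} (p = {p_value:.3f})")
```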

Page 11: Metrics for evaluation of translation quality

BLEU+1: Evaluates local n-grams

1-WER: Evaluates whole word order strictly

RIBES: Evaluates rank correlation of word order

NIST: Evaluates local word order and correctness of infrequent words

Acceptability: Human evaluation
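As an illustration, two of these metrics can be computed at the sentence level as follows. NLTK's add-one smoothing (SmoothingFunction().method2) is used here as an approximation of BLEU+1 and may differ slightly from the paper's implementation; RIBES, NIST, and Acceptability are not shown.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_plus_one(reference, hypothesis):
    """Sentence-level BLEU with add-one smoothing (approximation of BLEU+1)."""
    ref, hyp = reference.split(), hypothesis.split()
    return sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method2)

def one_minus_wer(reference, hypothesis):
    """1 - WER, where WER = word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return 1.0 - d[len(ref)][len(hyp)] / len(ref)

ref = "where is the capital of japan"
hyp = "where is capital of japan"
print(bleu_plus_one(ref, hyp), one_minus_wer(ref, hyp))
```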

Page 12: Translation quality

Page 13: QA accuracy

Page 14: Translation quality and QA accuracy

Page 15: Translation quality and QA accuracy

Page 16: Sentence-level analysis

47% of the questions in the OR set are not answered correctly. These questions might be difficult to answer even with a correct translation result.

The questions are divided into two groups:
Correct group (141 × 5 = 705 questions): translated from the 141 questions answered correctly in the OR set.
Incorrect group (123 × 5 = 615 questions): translated from the remaining 123 questions in the OR set.

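A minimal sketch of this grouping, assuming the sentence-level correlation is computed between per-question metric scores and binary QA correctness within each group; the records and field names are hypothetical, and the paper's exact procedure may differ.

```python
from scipy.stats import pearsonr

# Hypothetical per-question records: metric score of the translated question
# against its OR reference, whether it was answered correctly (1/0), and
# whether the original OR question was answered correctly.
examples = [
    {"metric": 0.92, "answered": 1, "or_answered": True},
    {"metric": 0.75, "answered": 1, "or_answered": True},
    {"metric": 0.40, "answered": 0, "or_answered": True},
    {"metric": 0.85, "answered": 0, "or_answered": False},
    {"metric": 0.60, "answered": 1, "or_answered": False},
    {"metric": 0.30, "answered": 0, "or_answered": False},
]

for group, label in [(True, "correct group"), (False, "incorrect group")]:
    subset = [e for e in examples if e["or_answered"] == group]
    r, _ = pearsonr([e["metric"] for e in subset], [e["answered"] for e in subset])
    print(f"{label}: r = {r:.3f} over {len(subset)} questions")
```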

Page 17: Sentence-level correlation

Metric          Correct group    Incorrect group
BLEU+1          0.900            0.007
1-WER           0.690            0.092
RIBES           0.418            0.311
NIST            0.942            0.210
Acceptability   0.890            0.547


Page 18: Sentence-level correlation (continued)


In the incorrect group, there is very little correlation.

NIST has the highest correlation → importance of content words.

If the reference cannot be answered correctly, the sentences are not suitable, even as negative samples.


Page 19: Sample 1

Page 20: Sample 2

Lack of the question type word.

Page 21: Sample 3

All questions were answered correctly even though they are grammatically incorrect.

Page 22: Conclusion

The NIST score has the highest correlation with QA accuracy; NIST is sensitive to changes in content words.

If the reference cannot be answered correctly, there is very little correlation between translation quality and QA accuracy; answerable references should be used.

Three factors cause changes in QA results: content words, question types, and syntax.
