CMSC 723 / LING 645: Intro to Computational Linguistics September 8, 2004: Dorr MT (continued), MT Evaluation Prof. Bonnie J. Dorr Dr. Christof Monz TA: Adam Lee

Dec 20, 2015

CMSC 723 / LING 645: Intro to Computational Linguistics

September 8, 2004: Dorr

MT (continued), MT Evaluation

Prof. Bonnie J. Dorr

Dr. Christof Monz

TA: Adam Lee

MT Challenges: Ambiguity

Syntactic Ambiguity
– I saw the man on the hill with the telescope

Lexical Ambiguity
– E: book; S: libro, reservar

Semantic Ambiguity
– Homography: ball (E) = pelota, baile (S)
– Polysemy: kill (E) = matar, acabar (S)
– Semantic granularity:
  esperar (S) = wait, expect, hope (E)
  be (E) = ser, estar (S)
  fish (E) = pez, pescado (S)

MT Challenges: Divergences

Meaning of two translationally equivalent phrases is distributed differently in the two languages

Example:
– English: [RUN INTO ROOM]
– Spanish: [ENTER IN ROOM RUNNING]

Divergence Frequency

32% of sentences in UN Spanish/English Corpus (5K)

35% of sentences in TREC El Norte Corpus (19K)

Divergence Types
– Categorial (X tener hambre → X have hunger) [98%]
– Conflational (X dar puñaladas a Z → X stab Z) [83%]
– Structural (X entrar en Y → X enter Y) [35%]
– Head Swapping (X cruzar Y nadando → X swim across Y) [8%]
– Thematic (X gustar a Y → Y like X) [6%]

Spanish/Arabic Divergences

Divergence: E/E’ (Spanish); E/E’ (Arabic)

– Categorial: be jealous / have jealousy [tener celos]; when he returns / upon his return [عند رجوعه]
– Conflational: float / go floating [ir flotando]; come again / return [عاد]
– Structural: enter the house / enter in the house [entrar en la casa]; seek / search for [بحث عن]
– Head Swap: run in / enter running [entrar corriendo]; do something quickly / go-quickly in doing something [اسرع]
– Thematic: I have a headache / my-head hurts me [me duele la cabeza]; no Arabic example

[Arg1 [V]] → [Arg1 [MotionV] Modifier(v)]

“The boat floated” → “The boat went floating”

Automatic Divergence Detection

(using narrowly defined divergence detection rules)

Language          Detected   Human Confirmed   Sample Size   Corpus Size
Spanish – Total   11.1%      10.5%             19K           150K
Arabic – Total    31.9%      12.5%             1K            28K

Application of Divergence Detection: Bilingual Alignment for MT

Word-level alignments of bilingual texts are an integral part of MT models

Divergences present a great challenge to the alignment task

Common divergence types can be found in multiple language pairs, systematically identified, and resolved

The Problem: Alignment & Projection

I began to eat the fish

Yo empecé a comer el pescado

Why is this a hard problem?

I run into the room

Yo entro en el cuarto corriendo

Divergences!

English: [RUN INTO ROOM]

Spanish: [ENTER IN ROOM RUNNING]

Our Goal: Improved Alignment & Projection

Induce a higher inter-annotator agreement rate

Increase the number of aligned words

Decrease multiple alignments

DUSTer Approach: Divergence Unraveling

E: I run into the room

E’: I move-in running the room

S: Yo entro en el cuarto corriendo

Word-Level Alignment (1): Test Setup

[Figure: two trees for the example below, one over the words run, John, into, room and one over the words John, enter, room, running]

Ex: John ran into the room → John entered the room running

Divergence Detection: Categorize English sentences into one of 5 divergence types

Divergence Correction: Apply the appropriate structural transformation [E → E’]

Word-Level Alignment (2): Testing Impact of Divergence Correction

Humans align the English and foreign sentences, with and without divergence correction applied to the English

Compare inter-annotator agreement, unaligned units, multiple alignments

Word-Level Alignment Results

Inter-Annotator Agreement:
– English-Spanish: agreement increased from 80.2% to 82.9%
– English-Arabic: agreement increased from 69.7% to 75.1%

Number of aligned words:
– English-Spanish: aligned words increased from 82.8% to 86%
– English-Arabic: aligned words increased from 61.5% to 88.1%

Multiple Alignments:
– English-Spanish: number of links went from 1.35 to 1.16
– English-Arabic: number of links increased from 1.48 to 1.72

Divergence Unraveling Conclusions

Divergence handling shows promise for improvement of automatic alignment

Conservative lower bound on divergence frequency

Effective solution: syntactic transformation of English

Validity of solution shown through alignment experiments

How do we evaluate MT?

Human-based Metrics
– Semantic Invariance
– Pragmatic Invariance
– Lexical Invariance
– Structural Invariance
– Spatial Invariance
– Fluency
– Accuracy
– “Do you get it?”

Automatic Metrics: Bleu

BiLingual Evaluation Understudy (BLEU; Papineni, 2001)

Automatic technique, but it requires pre-existing human (reference) translations

Approach:
– Produce corpus of high-quality human translations
– Judge “closeness” numerically (word-error rate)
– Compare n-gram matches between candidate translation and 1 or more reference translations

http://www.research.ibm.com/people/k/kishore/RC22176.pdf

Bleu Comparison

Chinese-English Translation Example:

Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.

Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.

Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.

Reference 3: It is the practical guide for the army always to heed the directions of the party.

How Do We Compute Bleu Scores?

Key Idea: A reference word should be considered exhausted after a matching candidate word is identified.

• For each word compute:
  (1) candidate word count
  (2) maximum ref count

• Add counts for each candidate word using the lower of the two numbers.

• Divide by number of candidate words.
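
The three steps above can be written as one formula. This is a sketch in notation that is not on the slides: C is the candidate, R_1, ..., R_k are the references, count_X(w) is the number of times word w occurs in X, and the sums run over the distinct words of C.

```latex
p_1 \;=\; \frac{\displaystyle\sum_{w \in C} \min\Bigl(\mathrm{count}_C(w),\; \max_{1 \le j \le k} \mathrm{count}_{R_j}(w)\Bigr)}
               {\displaystyle\sum_{w \in C} \mathrm{count}_C(w)}
```

The min implements the "lower of the two numbers" step, and the denominator is simply the number of candidate words.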

Modified Unigram Precision: Candidate #1

It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2) the(4) military(1) always(1) obeys(0) the commands(1) of(1) the party(1)

What’s the answer??????

17/18
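
The count can be checked mechanically. The following is a minimal Python sketch, not code from the lecture: the function names are illustrative, and the tokenization (lowercasing, whitespace splitting, dropping periods and commas) is an assumption chosen so that "Party" and "party" match.

```python
from collections import Counter


def tokenize(sentence):
    # Assumed normalization: lowercase, split on whitespace, drop periods and commas.
    return [t for t in (w.strip(".,").lower() for w in sentence.split()) if t]


def modified_unigram_precision(candidate, references):
    """Return (clipped match count, number of candidate words)."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = [Counter(tokenize(ref)) for ref in references]
    clipped = 0
    for word, count in cand_counts.items():
        # (2) maximum ref count: the most times this word occurs in any one reference
        max_ref_count = max(rc[word] for rc in ref_counts)
        # use the lower of the candidate count and the maximum ref count
        clipped += min(count, max_ref_count)
    return clipped, sum(cand_counts.values())


REFERENCES = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
CANDIDATE_1 = ("It is a guide to action which ensures that the military "
               "always obeys the commands of the party.")
CANDIDATE_2 = ("It is to insure the troops forever hearing the activity "
               "guidebook that party direct.")

print(modified_unigram_precision(CANDIDATE_1, REFERENCES))  # (17, 18)
print(modified_unigram_precision(CANDIDATE_2, REFERENCES))  # (8, 14)
```

The only unmatched word in Candidate 1 is "obeys"; "the" occurs three times in the candidate and up to four times in a single reference, so all three occurrences count.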

Modified Unigram Precision: Candidate #2

It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0) the activity(0) guidebook(0) that(2) party(1) direct(0)

What’s the answer??????

8/14

Modified Bigram Precision: Candidate #1

It is(1) is a(1) a guide(1) guide to(1) to action(1) action which(0) which ensures(0) ensures that(1) that the(1) the military(1) military always(0) always obeys(0) obeys the(0) the commands(0) commands of(0) of the(1) the party(1)

What’s the answer??????

10/17
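
The same clipping idea extends from words to n-grams. As a hedged sketch (the helper names and the tokenization are assumptions, not part of the lecture), the bigram counts for both candidates can be reproduced like this:

```python
from collections import Counter


def tokenize(sentence):
    # Assumed normalization: lowercase, split on whitespace, drop periods and commas.
    return [t for t in (w.strip(".,").lower() for w in sentence.split()) if t]


def ngrams(tokens, n):
    # All contiguous n-token windows; empty if the sentence is shorter than n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Return (clipped n-gram matches, number of candidate n-grams)."""
    cand_counts = Counter(ngrams(tokenize(candidate), n))
    ref_counts = [Counter(ngrams(tokenize(ref), n)) for ref in references]
    clipped = sum(
        min(count, max(rc[gram] for rc in ref_counts))
        for gram, count in cand_counts.items()
    )
    return clipped, sum(cand_counts.values())


REFERENCES = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
CANDIDATE_1 = ("It is a guide to action which ensures that the military "
               "always obeys the commands of the party.")
CANDIDATE_2 = ("It is to insure the troops forever hearing the activity "
               "guidebook that party direct.")

print(modified_precision(CANDIDATE_1, REFERENCES, 2))  # (10, 17)
print(modified_precision(CANDIDATE_2, REFERENCES, 2))  # (1, 13)
```

With n=1 the same function gives the unigram counts, so one routine covers both cases.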

Modified Bigram Precision: Candidate #2

It is(1) is to(0) to insure(0) insure the(0) the troops(0) troops forever(0) forever hearing(0) hearing the(0) the activity(0) activity guidebook(0) guidebook that(0) that party(0) party direct(0)

What’s the answer??????

1/13

Catching Cheaters

Reference 1: The cat is on the mat

Reference 2: There is a cat on the mat

Candidate: the the the the the the the

the(2) the(0) the(0) the(0) the(0) the(0) the(0)

What’s the unigram answer?

2/7

What’s the bigram answer?

0/6 (the seven-word candidate contains only six bigrams, and none of them appears in a reference)
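
To see why the clipping matters, it helps to compare the clipped count with a naive count that credits every candidate word found anywhere in a reference. The sketch below is illustrative only: the `clip` flag and the function names are assumptions introduced for this comparison, and the tokenization is the same assumed lowercase whitespace split.

```python
from collections import Counter


def tokenize(sentence):
    # Assumed normalization: lowercase, split on whitespace, drop periods and commas.
    return [t for t in (w.strip(".,").lower() for w in sentence.split()) if t]


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, references, n, clip=True):
    """Return (matches, number of candidate n-grams), with or without clipping."""
    cand_counts = Counter(ngrams(tokenize(candidate), n))
    ref_counts = [Counter(ngrams(tokenize(ref), n)) for ref in references]
    matches = 0
    for gram, count in cand_counts.items():
        max_ref_count = max(rc[gram] for rc in ref_counts)
        if clip:
            # Reference occurrences are "used up" once matched.
            matches += min(count, max_ref_count)
        elif max_ref_count > 0:
            # Naive precision: every occurrence counts if the n-gram appears at all.
            matches += count
    return matches, sum(cand_counts.values())


REFERENCES = ["The cat is on the mat", "There is a cat on the mat"]
CHEATER = "the the the the the the the"

print(ngram_precision(CHEATER, REFERENCES, 1, clip=False))  # (7, 7)
print(ngram_precision(CHEATER, REFERENCES, 1))              # (2, 7)
print(ngram_precision(CHEATER, REFERENCES, 2))              # (0, 6)
```

Without clipping the degenerate candidate scores a perfect 7 out of 7; with clipping it gets credit for only the two occurrences of "the" in Reference 1. A seven-word candidate has six bigrams, so the bigram result is 0 out of 6.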