CS 479, section 1: Natural Language Processing
Lecture #35: Word Alignment Models (cont.)
Feb 24, 2016
Transcript
Page 1: CS 479, section 1: Natural Language Processing

CS 479, section 1:Natural Language Processing

Lecture #35: Word Alignment Models (cont.)

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Content by Eric Ringger, partially based on earlier slides from Dan Klein of U.C. Berkeley.

Page 2: CS 479, section 1: Natural Language Processing

Announcements

Project #4: Your insights into treebank grammars?

Project #5: Model 2 discussed today!

Propose-your-own: Reminder: no presentation, unless you really want to give one!

Check the schedule. Plan enough time to succeed! Don't get or stay blocked. Get your questions answered early. Get the help you need to keep moving forward. No late work accepted after the last day of instruction.

Page 3: CS 479, section 1: Natural Language Processing

Announcements (2)

Project Report:
Early: Wednesday
Due: Friday

Homework 0.4:
Due: today

Reading Report #14: Phrase-based MT paper
Due: next Monday (online again)

Page 4: CS 479, section 1: Natural Language Processing

EM Revisited

1. What are the four steps of the Expectation Maximization (EM) algorithm? Think of document clustering and/or training IBM Model 1!

2. What are the two primary purposes of EM?

Page 5: CS 479, section 1: Natural Language Processing

Objectives

Observe problems with IBM Model 1

Model word-ordering issues with IBM Model 2!

Page 6: CS 479, section 1: Natural Language Processing

“Monotonic Translation”

Le Japon secoué par deux nouveaux séismes

Japan shaken by two new quakes

NULL

How would you implement a monotone decoder (to translate the French)?
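One possible answer, as a minimal sketch (not the course's reference implementation): translate left to right, emitting one English word per French word. The names `t_table` and `lm` are hypothetical, assuming a word-to-word translation table t(f | e) and a bigram language model P(e | prev).

```python
# A minimal sketch of a greedy monotone decoder, assuming
# (hypothetical names): t_table[f][e] = t(f | e), and
# lm[(prev, e)] = P(e | prev), a bigram language model.

def greedy_monotone_decode(french, english_vocab, t_table, lm):
    """Translate left to right, one English word per French word,
    greedily maximizing P(e | prev) * t(f | e) at each position."""
    output = []
    prev = "<s>"  # sentence-start symbol
    for f in french:
        best_e, best_score = None, 0.0
        for e in english_vocab:
            score = lm.get((prev, e), 0.0) * t_table.get(f, {}).get(e, 0.0)
            if score > best_score:
                best_e, best_score = e, score
        output.append(best_e if best_e is not None else f)  # pass unknowns through
        prev = output[-1]
    return output
```

A Viterbi decoder would instead keep, for each position, the best-scoring hypothesis per language-model state rather than committing greedily.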

Page 7: CS 479, section 1: Natural Language Processing

MT System

You could now build a simple MT system using:

An English language model

An English-to-French alignment model (IBM Model 1), trained on Canadian Hansard data

A monotone decoder: greedy or Viterbi
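For reference, these components combine in the standard noisy-channel decoding rule (the usual formulation; an assumption here, since the slide leaves it implicit):

$$\hat{e} = \arg\max_{e} P(e)\, P(f \mid e)$$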

Page 8: CS 479, section 1: Natural Language Processing

IBM Model 1

The alignment vector $a = (a_1, \ldots, a_J)$ maps each target (French) position $j$ to a source (English) position $a_j \in \{0, 1, \ldots, I\}$, where position 0 is NULL. Model 1 defines:

$$\hat{P}(f \mid e) = \sum_{a} \hat{P}(f, a \mid e)$$

$$\hat{P}(f, a \mid e) = \prod_{j=1}^{J} \frac{1}{I+1}\, \hat{t}(f_j \mid e_{a_j})$$

Source: English $e_1, \ldots, e_I$. Target: French $f_1, \ldots, f_J$.
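A minimal sketch of computing the Model 1 likelihood above, assuming a hypothetical translation table `t_table[f][e]` = t(f | e) and that `english[0]` is the NULL word. Because alignment positions are chosen independently and uniformly, the sum over all alignments factors into one sum over English positions per French word:

```python
# A minimal sketch, assuming (hypothetical name) t_table[f][e] = t(f | e)
# and that english[0] is the NULL word.

def model1_likelihood(french, english, t_table):
    """P(f | e) under IBM Model 1: the sum over alignments factors
    into an independent sum over English positions for each
    French position j."""
    I = len(english) - 1  # english = [NULL, e_1, ..., e_I]
    prob = 1.0
    for f in french:
        prob *= sum(t_table.get(f, {}).get(e, 0.0) for e in english) / (I + 1)
    return prob
```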

Page 9: CS 479, section 1: Natural Language Processing

One-to-Many Alignments

But there are other problems to think about, as the following examples show:

Page 10: CS 479, section 1: Natural Language Processing

Problem: Many-to-One Alignments

Page 11: CS 479, section 1: Natural Language Processing

Problem: Many-to-Many Alignments

Page 12: CS 479, section 1: Natural Language Processing

Problem: Local Order Change

Le Japon est au confluent de quatre plaques tectoniques

Japan is at the junction of four tectonic plates

“Distortions”

Page 13: CS 479, section 1: Natural Language Processing

Problem: More Distortions

Le tremblement de terre a fait 39 morts et 3,183 blessés.

The earthquake killed 39 and wounded 3,183.

Page 14: CS 479, section 1: Natural Language Processing

Insights

How to include “distortion” in the model?

How to prefer nearby distortions over long-distance distortions?

Page 15: CS 479, section 1: Natural Language Processing

IBM Model 2

Reminder: Model 1 (above) treats all alignment positions as equally likely.

We could model distortions, without any strong assumptions about where they occur, as a distribution over target-language positions:

Or we could build a model as a distribution over distortion distances:
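In standard Model 2 notation (a reconstruction, used here as an assumption about what the slide showed), the two options can be written as:

$$P(a_j = i \mid j, I, J) = a(i \mid j, I, J) \qquad \text{(position-based)}$$

$$P(a_j = i \mid j, I, J) \propto d\!\left(i - j\,\tfrac{I}{J}\right) \qquad \text{(distance-based)}$$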

Page 16: CS 479, section 1: Natural Language Processing

Matrix View of an Alignment

Page 17: CS 479, section 1: Natural Language Processing

Preference for the Diagonal

But alignments for some language pairs tend to the diagonal in general: can use a normal distribution for the distortion model.
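One standard way to encode this preference (a sketch, assuming a Gaussian centered on the diagonal position $j\,I/J$ with a tunable variance $\sigma^2$):

$$a(i \mid j, I, J) \propto \exp\!\left(-\frac{\left(i - j\,\frac{I}{J}\right)^{2}}{2\sigma^{2}}\right)$$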

Page 18: CS 479, section 1: Natural Language Processing

EM for Model 2

Model 2 parameters:

Translation probabilities: $t(f \mid e)$

Distortion parameters: $a(i \mid j, I, J)$

Initialize $t(f \mid e)$ with Model 1; initialize $a(i \mid j, I, J)$ as uniform.

E-step: For each pair of sentences $(f, e)$, for each French position $j$:

1. Calculate the posterior over English positions $i$:

$$P(a_j = i \mid f, e) = \frac{a(i \mid j, I, J)\; t(f_j \mid e_i)}{\sum_{i'=0}^{I} a(i' \mid j, I, J)\; t(f_j \mid e_{i'})}$$

2. Increment the expected count of word $f_j$ with word $e_i$ by this amount:

$$C(f_j, e_i) \mathrel{+}= P(a_j = i \mid f, e)$$

3. Similarly, for each English position $i$, update the distortion count:

$$C(i \mid j, J, I) \mathrel{+}= P(a_j = i \mid f, e)$$
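A minimal sketch of this E-step over a parallel corpus, assuming (hypothetical names) `t[(f, e)]` = t(f | e) and `a[(i, j, I, J)]` = a(i | j, I, J), with NULL at index 0 of each English sentence:

```python
from collections import defaultdict

# A minimal sketch of the Model 2 E-step, assuming (hypothetical names):
# t[(f, e)] = t(f | e) and a[(i, j, I, J)] = a(i | j, I, J);
# each english sentence includes the NULL word at index 0.

def e_step(corpus, t, a):
    """Accumulate expected counts C(f_j, e_i) and C(i | j, J, I)."""
    word_counts = defaultdict(float)  # (f, e)       -> expected count
    dist_counts = defaultdict(float)  # (i, j, I, J) -> expected count
    for french, english in corpus:
        J, I = len(french), len(english) - 1
        for j, f in enumerate(french, start=1):
            # Unnormalized posterior over English positions for f_j
            scores = [a.get((i, j, I, J), 0.0) * t.get((f, english[i]), 0.0)
                      for i in range(I + 1)]
            total = sum(scores)
            if total == 0.0:
                continue
            for i in range(I + 1):
                p = scores[i] / total  # P(a_j = i | f, e)
                if p == 0.0:
                    continue
                word_counts[(f, english[i])] += p
                dist_counts[(i, j, I, J)] += p
    return word_counts, dist_counts
```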

Page 19: CS 479, section 1: Natural Language Processing

EM for Model 2 (cont.)

M-step:

Re-estimate $a(i \mid j, I, J)$ by normalizing these distortion counts: one conditional distribution for each context $(j, I, J)$.

Re-estimate $t(f \mid e)$ by normalizing the earlier word-pair counts: one conditional distribution per English word $e$.

Iterate until convergence of the likelihood, or for a handful of iterations.

See the directions for Project #5 on the course wiki for a more detailed version of this EM algorithm, including implementation tips.
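A minimal sketch of the matching M-step, under the same assumptions as the E-step sketch above (each family of expected counts is normalized into a conditional distribution):

```python
from collections import defaultdict

# A minimal sketch of the M-step: normalize each family of expected
# counts from e_step into a conditional distribution.

def m_step(word_counts, dist_counts):
    """Return re-estimated t(f | e) and a(i | j, I, J)."""
    # t(f | e): one distribution per English word e
    e_totals = defaultdict(float)
    for (f, e), c in word_counts.items():
        e_totals[e] += c
    t = {(f, e): c / e_totals[e] for (f, e), c in word_counts.items()}

    # a(i | j, I, J): one distribution per context (j, I, J)
    ctx_totals = defaultdict(float)
    for (i, j, I, J), c in dist_counts.items():
        ctx_totals[(j, I, J)] += c
    a = {(i, j, I, J): c / ctx_totals[(j, I, J)]
         for (i, j, I, J), c in dist_counts.items()}
    return t, a
```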

Page 20: CS 479, section 1: Natural Language Processing

Next

Even better alignment models

Evaluating alignment models

Evaluating translation end-to-end!