Integrating Morphology in Probabilistic Translation Models
Chris Dyer (LTI)
January 24, 2011
Joint work with Jon Clark, Alon Lavie, and Noah Smith

Transcript
Page 1

Integrating Morphology in Probabilistic Translation Models

Chris Dyer

lti

January 24, 2011

joint work with Jon Clark, Alon Lavie, and Noah Smith

Tuesday, January 25, 2011

Page 2

das alte Haus

the old house

mach das

do that

Page 3

das alte Haus

the old house

mach das

do that

guten Tag

hello

Page 4


Page 5

das alte Haus

the old house

mach das

do that

guten Tag

hello

Page 6

das alte Haus

the old house

mach das

do that

guten Tag

hello

Haus

Page 7

das alte Haus

the old house

mach das

do that

Haus

guten Tag

hello

house

Page 8

das alte Haus

the old house

mach das

do that

das

guten Tag

hello

Page 9

das alte Haus

the old house

mach das

do that

das

the

guten Tag

hello

Page 10

das alte Haus

the old house

mach das

do that

das

that

guten Tag

hello

Page 11

das alte Haus

the old house

mach das

do that

markant

guten Tag

hello

Page 12

das alte Haus

the old house

mach das

do that

markant

???

guten Tag

hello

Page 13

So far so good,

but....

Page 14

das alte Haus

the old house

mach das

do that

alten

guten Tag

hello

Page 15

das alte Haus

the old house

mach das

do that

alten

guten Tag

hello

???

Page 16

the old house

mach das

do that

old?

guten Tag

hello

das alte Haus

alten

Page 17

Problems

1. Source language inflectional richness.

Page 18

the old house

mach das

do that

guten Tag

hello

das alte Haus

Page 19

the old house

mach das

do that

guten Tag

hello

das alte Haus

Page 20

the old house

mach das

do that

old

guten Tag

hello

das alte Haus

Page 21

the old house

mach das

do that

old

guten Tag

hello

das alte Haus

alte

Page 22

the old house

mach das

do that

old

guten Tag

hello

das alte Haus

alten?

Page 23

Problems

1. Source language inflectional richness.

2. Target language inflectional richness.

Page 24

Bauchschmerzen

abdominal pain

Kopfschmerzen

head ache

Rücken

back

Rückenschmerzen

Kopf

head

Page 25

Bauchschmerzen

abdominal pain

Kopfschmerzen

head ache

Rücken

back

???

Rückenschmerzen

Kopf

head

Page 26

Bauchschmerzen

abdominal pain

Kopfschmerzen

head ache

Rücken

back

back pain

Rückenschmerzen

Kopf

head

Page 27

Bauchschmerzen

abdominal pain

Kopfschmerzen

head ache

back ache

Rückenschmerzen

Kopf

head

Rücken

back

Page 28

Problems

1. Source language inflectional richness.

2. Target language inflectional richness.

3. Source language sublexical semantic compositionality.

Page 29

General Solution

MORPHOLOGY

Page 30

[Diagram: f → (Analysis) → f′ → (Translation) → e′ → (Synthesis) → e]

Page 31

[Diagram: f → f′ → e′]

Page 32

[Diagram: f → f′ → e′]

f = AlAbAmA

Page 33

[Diagram: f → f′ → e′]

f = AlAbAmA
f′ = Al# Abama (looks like Al + OOV)

Page 34

[Diagram: f → f′ → e′]

f = AlAbAmA
f′ = Al# Abama (looks like Al + OOV)
e′ = the Ibama

Page 35

But...Ambiguity!

• Morphology is an inherently ambiguous problem

• Competing linguistic theories

• Lexicalization

• Morphological analyzers (tools) make mistakes

• Are minimal linguistic morphemes the optimal morphemes for MT?

Page 36

Problems

1. Source language inflectional richness.

2. Target language inflectional richness.

3. Source language sublexical semantic compositionality.

4. Ambiguity everywhere!

Page 37

General Solution

MORPHOLOGY + PROBABILITY

Page 38

Why probability?

• Probabilistic models formalize uncertainty

• e.g., words can be formed via a morphological derivation according to a joint distribution:

• The probability of a word is naturally defined as the marginal probability:

• Such a model can even be trained by observing just words (EM!)

p(word, derivation)

p(word) = Σ_derivation p(word, derivation)

Page 39

p(derived) = p(derived, de+rive+d) + p(derived, derived+∅) + p(derived, derive+d) + p(derived, deriv+ed) + ...
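For short words, the marginal above can be computed by brute force over all segmentations. A minimal sketch; the morpheme weights below are invented for illustration and are not the talk's actual model:

```python
from itertools import combinations

def segmentations(word):
    """Yield every split of `word` into contiguous morphemes."""
    n = len(word)
    for k in range(n):  # choose k of the n-1 character gaps as boundaries
        for gaps in combinations(range(1, n), k):
            cuts = [0, *gaps, n]
            yield tuple(word[a:b] for a, b in zip(cuts, cuts[1:]))

# Toy joint model: p(word, derivation) proportional to a per-morpheme weight.
morpheme_weight = {"derive": 3.0, "deriv": 1.0, "de": 0.5, "rive": 0.5,
                   "d": 1.0, "ed": 1.5, "derived": 2.0}

def joint(word, seg):
    p = 1.0
    for m in seg:
        p *= morpheme_weight.get(m, 1e-4)  # tiny weight for unknown morphemes
    return p

def marginal(word):
    # the slide's p(word) = sum over derivations of p(word, derivation)
    return sum(joint(word, s) for s in segmentations(word))

p_word = marginal("derived")
```

A 7-letter word has 2^6 = 64 segmentations, so exhaustive enumeration is fine here; real systems sum over a lattice instead.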

Page 40

Outline

• Introduction: 4 problems

• Three probabilistic modeling solutions

• Embracing uncertainty: multi-segmentations for decoding and learning

• Rich morphology via sparse lexical features

• Hierarchical Bayesian translation: infinite translation lexicons

• Conclusion

Page 41

Outline

• Introduction: 4 problems

• Three probabilistic modeling solutions

• Embracing uncertainty: multi-segmentations for decoding and learning

• Rich morphology via sparse lexical features

• Hierarchical Bayesian translation: infinite translation lexicons

• Conclusion

Page 42

f

Page 43

f = AlAbAmA

Page 44

f = AlAbAmA

f′ = Al# Abama    f′ = AlAbama

Page 45

f = AlAbAmA

f′ = Al# Abama    f′ = AlAbama

e′ = the Ibama    e′ = Alabama

Page 46

Two problems

• We need to decode lots of similar source candidates efficiently

• Lattice / confusion network decoding

• We need a model to generate a set of candidate sources

• What are the right candidates?

Kumar & Byrne (EMNLP, 2005), Bertoldi, Zens, Federico (ICASSP, 2007), Dyer et al. (ACL, 2008), inter alia
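To make the lattice idea concrete, here is a toy sketch of a source lattice as a data structure; the nodes, tokens, and probabilities are invented for illustration, and real decoders use weighted finite-state or hypergraph machinery rather than this naive enumeration:

```python
# nodes are integer positions; each edge is (token, probability, next node)
lattice = {
    0: [("AlAbAmA", 0.6, 2), ("Al#", 0.4, 1)],  # keep whole, or split off Al#
    1: [("AbAmA", 1.0, 2)],
    2: [],  # final node
}

def paths(node=0, prefix=(), p=1.0):
    """Enumerate every (token sequence, probability) path through the lattice."""
    if not lattice[node]:
        yield prefix, p
        return
    for token, prob, nxt in lattice[node]:
        yield from paths(nxt, prefix + (token,), p * prob)

all_paths = list(paths())
```

The decoder then translates all paths jointly, letting the translation model arbitrate between competing analyses instead of committing to one up front.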

Page 47

Two problems

• We need to decode lots of similar source candidates efficiently

• Lattice / confusion network decoding

• We need a model to generate a set of candidate sources

• What are the right candidates?

Kumar & Byrne (EMNLP, 2005), Bertoldi, Zens, Federico (ICASSP, 2007), Dyer et al. (ACL, 2008), inter alia

Page 48

Uncertainty is everywhere


Requirement: a probabilistic model p(f’|f) that transforms f → f’

Possible solution: a discriminatively trained model, e.g., a CRF

Required data: example (f,f’) pairs from a linguistic expert or other source

Page 49

Uncertainty is everywhere

AlAntxAbAt (DEF+election+PL)

What is the best/right analysis ... for MT?

Page 50

Uncertainty is everywhere

AlAntxAbAt (DEF+election+PL)

What is the best/right analysis ... for MT?

AlAntxAb +At    Al+ AntxAb +At    Al+ AntxAbAt    AlAntxAbAt

Some possibilities: Sadat & Habash (NAACL, 2007)

Page 51

Uncertainty is everywhere

AlAntxAbAt (DEF+election+PL)

What is the best/right analysis ... for MT?

AlAntxAb +At    Al+ AntxAb +At    Al+ AntxAbAt    AlAntxAbAt

Some possibilities: Sadat & Habash (NAACL, 2007)

Let’s use them all!

Page 52

Wait...multiple references?!?

• Train with an EM variant

• Lattices can encode very large sets of references and support efficient inference

• Bonus: the annotation task is much simpler

• Don’t know whether to label an example with A or B?

• Label it with both!

Dyer (NAACL, 2009), Dyer (thesis, 2010)

Page 53

Wait...multiple references?!?

• Train with EM variant

• Lattices can encode very large sets of references and support efficient inference

• Bonus: annotation task is much simpler

• Don’t know whether to label an example with A or B?

• Label it with both!


Dyer (NAACL, 2009), Dyer (thesis, 2010)

Page 54

Reference Segmentations

f → f′

freitag → freitag
tonbandaufnahme → tonband + aufnahme

Page 55

Phonotactic features!

Rückenschmerzen

Rücken + schmerzen (good phonotactics!)
Rückensc + hmerzen (bad phonotactics!)
Rü + cke + nschme + rzen (bad phonotactics!)

Page 56

Just 20 features

• Phonotactic probability

• Lexical features (in vocab, OOV)

• Lexical frequencies

• Is high frequency?

• Segment length

• ...
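A hedged sketch of how a small feature set like the one above can score candidate segmentations; the lexicon, feature definitions, and weights here are invented for illustration, and the actual cdec compound splitter differs in detail:

```python
import math

# toy lexicon with corpus frequencies (invented numbers)
lexicon = {"ton": 800, "band": 1200, "aufnahme": 300, "tonband": 150}

def features(segments):
    """A handful of the feature types named on the slide."""
    return {
        "in_vocab": sum(1 for s in segments if s in lexicon),
        "oov": sum(1 for s in segments if s not in lexicon),
        "log_freq": sum(math.log(lexicon[s]) for s in segments if s in lexicon),
        "short_segment": sum(1 for s in segments if len(s) < 3),
        "num_segments": len(segments),
    }

# hand-set weights for illustration; the real model learns these from data
weights = {"in_vocab": 1.0, "oov": -5.0, "log_freq": 0.3,
           "short_segment": -2.0, "num_segments": -0.5}

def score(segments):
    f = features(segments)
    return sum(weights[k] * v for k, v in f.items())
```

Under these weights, splitting "tonbandaufnahme" into known lexicon words outscores both leaving the unknown compound whole and over-splitting it into short OOV fragments.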

https://github.com/redpony/cdec/tree/master/compound-split

Page 57

Input: tonbandaufnahme

Page 58

[Figure: segmentation lattice for the input, with one weighted edge per candidate segment; edge labels not recoverable from the OCR.]

Input: tonbandaufnahme

Page 59

[Figure: segmentation lattices pruned at density parameters a=0.4, a=0.250, and a=∞; edge labels not recoverable from the OCR.]

Input: tonbandaufnahme

Page 60

[Figure: segmentation precision–recall curve; recall 0.955–1.0 plotted against precision 0.6–1.0.]

Page 61

Translation Evaluation

Input                 BLEU   TER
Unsegmented           20.8   61.0
1-best segmentation   20.3   60.2
Lattice (a=0.2)       21.5   59.8

Unsegmented: in police raids found illegal guns , ammunition stahlkern , laserzielfernrohr and a machine gun .
Segmented:   in police raids found with illegal guns and ammunition steel core , a laser objective telescope and a machine gun .
REF:         police raids found illegal guns , steel core ammunition , a laser scope and a machine gun .

Page 62

Outline

• Introduction: 4 problems

• Three probabilistic modeling solutions

• Embracing uncertainty: multi-segmentations for decoding and learning

• Rich morphology via sparse lexical features

• Hierarchical Bayesian translation: infinite translation lexicons

• Conclusion

Page 63


What do we see when we look inside the IBM models?

(or any multinomial-based generative model...like parsing models!)

Page 64

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1

car → Wagen 0.2, Auto 0.6, PKW 0.2

What do we see when we look inside the IBM models?

(or any multinomial-based generative model...like parsing models!)

Page 65

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1

car → Wagen 0.2, Auto 0.6, PKW 0.2

What do we see when we look inside the IBM models?

(or any multinomial-based generative model...like parsing models!)

Page 66

DLVM for Translation

Addresses problems:
1. Source language inflectional richness.
2. Target language inflectional richness.

How?
1. Replace the locally normalized multinomial parameterization in a translation model p(e | f) with a globally normalized log-linear model.
2. Add lexical association features sensitive to sublexical units.

C. Dyer, J. Clark, A. Lavie, and N. Smith (in review)

Page 67

[Figure: graphical models over alignment variables a1…an, translation variables t1…tn, source sentence s, and length n. Top: the fully directed model (Brown et al., 1993; Vogel et al., 1996; Berg-Kirkpatrick et al., 2010). Bottom: our model.]

Page 68

[Figure repeated from the previous page.]

Page 69

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1

car → Wagen 0.2, Auto 0.6, PKW 0.2

Page 70

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1

car → Wagen 0.2, Auto 0.6, PKW 0.2

New model:

old → alt+ Ω[0,2], gammelig+ Ω[0,2]

score(e,f) = 0.2 h1(e,f) + 0.9 h2(e,f) + 1.3 h3(e,f) + ...

Page 71

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1

car → Wagen 0.2, Auto 0.6, PKW 0.2

New model:

old → alt+ Ω[0,2], gammelig+ Ω[0,2]

score(e,f) = 0.2 h1(e,f) + 0.9 h2(e,f) + 1.3 h3(e,f) + ...

(~ Incremental vs. realizational)

Page 72

Sublexical Features

každoroční → annual

PREFIX kaž_ann
PREFIX každ_annu
PREFIX každo_annua
ID každoroční_annual
SUFFIX ní_al
SUFFIX í_l
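The feature templates above can be sketched as a small extractor; the exact prefix and suffix lengths are my guess from the slide's examples, not a confirmed detail of the model:

```python
def sublexical_features(src, tgt, prefix_lens=(3, 4, 5), suffix_lens=(1, 2)):
    """Fire an ID feature on the full word pair, plus character
    prefix-pair and suffix-pair features at several lengths."""
    feats = [f"ID:{src}_{tgt}"]
    for n in prefix_lens:
        if len(src) > n and len(tgt) > n:
            feats.append(f"PREFIX:{src[:n]}_{tgt[:n]}")
    for n in suffix_lens:
        if len(src) > n and len(tgt) > n:
            feats.append(f"SUFFIX:{src[-n:]}_{tgt[-n:]}")
    return feats
```

Because inflected variants of the same pair share their prefix features, evidence for každoroční→annual also supports každoročního→annually.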

Page 73

Sublexical Features

každoroční → annually

PREFIX kaž_ann
PREFIX každ_annu
PREFIX každo_annua
ID každoroční_annually
SUFFIX ní_ly
SUFFIX í_y

Page 74

Sublexical Features

každoročního → annually

PREFIX kaž_ann
PREFIX každ_annu
PREFIX každo_annua
ID každoročního_annually
SUFFIX ho_ly
SUFFIX o_y

Page 75

Sublexical Features

každoročního → annually

PREFIX kaž_ann
PREFIX každ_annu
PREFIX každo_annua
ID každoročního_annually
SUFFIX ho_ly
SUFFIX o_y

Abstract away from inflectional variation!

Page 76

Evaluation

• Given a parallel corpus (no supervised alignments!), we can infer

• The weights in the log-linear translation model

• The MAP alignment

• The model is a translation model, but we evaluate it as applied to alignment

Page 77

Alignment Evaluation

AER     Model 4   DLVM
e|f     24.8      21.9
f|e     33.6      29.3
sym.    23.4      20.5

Czech-English, 3.1M words training, 525 sentences gold alignments.

Page 78

Translation Evaluation

Table 2: Czech-English experimental results. φsing. is the average fertility of singleton source words.

                  AER ↓   φsing. ↓   # rules ↑
Model 4     e|f   24.8    4.1
            f|e   33.6    6.6
            sym.  23.4    2.7        993,953
Our model   e|f   21.9    2.3
            f|e   29.3    3.8
            sym.  20.5    1.6        1,146,677

Alignment   BLEU ↑        METEOR ↑      TER ↓
Model 4     16.3 σ=0.2    46.1 σ=0.1    67.4 σ=0.3
Our model   16.5 σ=0.1    46.8 σ=0.1    67.0 σ=0.2
Both        17.4 σ=0.1    47.7 σ=0.1    66.3 σ=0.5

Table 3: Chinese-English experimental results.

                  φsing. ↓   # rules ↑
Model 4     e|f   4.4
            f|e   3.9
            sym.  3.6        52,323
Our model   e|f   3.5
            f|e   2.6
            sym.  3.1        54,077

Alignment   BLEU ↑        METEOR ↑      TER ↓
Model 4     56.5 σ=0.3    73.0 σ=0.4    29.1 σ=0.3
Our model   57.2 σ=0.8    73.8 σ=0.4    29.3 σ=1.1
Both        59.1 σ=0.6    74.8 σ=0.7    27.6 σ=0.5

as well. Second, there has been no previous work on discriminative modeling of Urdu, since, to our knowledge, no gold alignments have been generated. Finally, unlike English, Urdu is a head-final language: not only does it have SOV word order, but rather than prepositions, it has postpositions, which follow the nouns they modify, meaning its large-scale word order is wholly different from that of English. Table 4 demonstrates the same pattern of improving results with our discriminative model.

5.3 Analysis

The quantitative results presented in this section strongly suggest that our modeling approach produces better alignments. In this section, we try to characterize how the model is doing what it does and what it has learned. Because of the ℓ1 regularization, the number of active (non-zero) features for the various models is small, relative to the

Table 4: Urdu-English experimental results.

                 φsing. ↓   # rules ↑
Model 4    e|f   6.5
           f|e   8.0
           sym.  3.2        244,570
Our model  e|f   4.8
           f|e   8.3
           sym.  2.3        260,953

Alignment   BLEU ↑       METEOR ↑     TER ↓
Model 4     23.3 σ=0.2   49.3 σ=0.2   68.8 σ=0.8
Our model   23.4 σ=0.2   49.7 σ=0.1   67.7 σ=0.2
Both        24.1 σ=0.2   50.6 σ=0.1   66.8 σ=0.5

number of features available to explain the data. The number of active features ranged from about 300k for the small Chinese-English corpus to 800k for Urdu-English, with Czech in between, which is less than one tenth of all features. Coarse features (Model 1 probabilities, Dice coefficient, coarse positional features, etc.) typically received weights with large magnitudes. However, language differences manifest in many ways. For example, orthographic features were unsurprisingly more valuable in Czech (with its Latin alphabet) than in Chinese and Urdu. Examining the more fine-grained features is also illuminating. Table 5 shows the most highly weighted source path bigram features on the three models where English was the source language, and in each, we may observe some interesting characteristics of the target language. Left-most is English-Czech. At first it may be surprising that words like since and that have a highly weighted feature for transitioning to themselves. However, Czech punctuation rules require that relative clauses and subordinating conjunctions be preceded by a comma (which is forbidden or only optional in English); therefore our model translates these words ‘twice’, once to produce the comma and a second time to produce the lexical item. The middle column is the English-Chinese model. In the training data, many of the sentences are questions directed to a second person, you. However, Chinese questions do not invert and the subject remains in the canonical first position, thus the transition from the start of sentence to you is highly weighted. Finally, Figure 2 illustrates how Model 4 (left) and our discriminative model (right) align an English-Urdu sentence pair (the English side is

Czech-English, WMT 2010 test set, 1 reference


Outline

• Introduction: 4 problems

• Three probabilistic modeling solutions

• Embracing uncertainty: multi-segmentations for decoding and learning

• Rich morphology via sparse lexical features

• Hierarchical Bayesian translation: infinite translation lexicons

• Conclusion

70


Bayesian Translation

71

Addresses problems:

2. Target language inflectional richness.

How?

1. Replace multinomials in a lexical translation model with a process that generates target language lexical items by combining stems and suffixes.

2. Fully inflected forms can be generated, but a hierarchical prior backs off to a component-wise generation.


Chinese Restaurant Process

72

a b a c x ...


Chinese Restaurant Process

73

a b a c x

New customer

...


Chinese Restaurant Process

74

a b a c x ...

The new customer joins an existing table with probability proportional to the number of customers already seated there — here 1/(7+α), 1/(7+α), 3/(7+α), and 2/(7+α) — or opens a new table serving dish x with probability αP0(x)/(7+α).


Chinese Restaurant Process

75

a b a c x ...

1/(7+α)   1/(7+α)   3/(7+α)   2/(7+α)   αP0(x)/(7+α)

α: “concentration” parameter
P0(x): base distribution
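The seating rule above can be sketched in a few lines of code. This is a minimal illustration (the function name is ours), matching the slide's example of 7 customers seated at tables of sizes 3, 2, 1, 1:

```python
# Chinese Restaurant Process seating probabilities: a customer joins an
# existing table with probability count/(n + alpha), or opens a new table
# with probability alpha/(n + alpha) (times P0(x) for the new dish x).
def seating_probs(table_counts, alpha):
    n = sum(table_counts)
    probs = [c / (n + alpha) for c in table_counts]
    probs.append(alpha / (n + alpha))  # new-table mass, before P0(x)
    return probs

print(seating_probs([3, 2, 1, 1], alpha=1.0))
# [0.375, 0.25, 0.125, 0.125, 0.125]
```

Note the "rich get richer" behavior: the largest table is the most likely destination, while α controls how often brand-new dishes appear.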


76

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1

car → Wagen 0.2, Auto 0.6, PKW 0.2


77

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1

car → Wagen 0.2, Auto 0.6, PKW 0.2

New model: decompose each inflected form into stem + suffix — alt+e, alt+es, alt+∅ share the stem alt — with a shared inventory of suffixes: +e, +es, +en, +er, +∅.


Modeling assumptions

• Observed words are formed by an unobserved process that concatenates a stem α and a suffix β, yielding αβ

• A source word should have only a few translations αβ

• and translate into only a few stems α

• The suffix β occurs many times, with many different stems

• β may be null

• β will have a maximum length of r

• Once a word has been translated into some inflected form, that inflected form, its stem, and its suffix should be more likely (“rich get richer”)

78
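These assumptions can be sketched as a toy generator. All distributions and probabilities below are invented for illustration (the actual model learns them under Pitman-Yor priors); only the stem + suffix concatenation step mirrors the slides:

```python
import random

# Toy sketch of the stem+suffix generative story: translate a source word
# by drawing a stem from a per-word stem distribution and a suffix from a
# distribution shared across stems, then concatenating them (alpha + beta).
STEMS = {"old": {"alt": 0.8, "gammelig": 0.2}}          # made-up numbers
SUFFIXES = {"": 0.4, "e": 0.2, "es": 0.2, "en": 0.1, "er": 0.1}

def translate(f, rng):
    stems, w = zip(*STEMS[f].items())
    stem = rng.choices(stems, weights=w)[0]
    sfxs, sw = zip(*SUFFIXES.items())
    suffix = rng.choices(sfxs, weights=sw)[0]  # suffix may be null
    return stem + suffix

rng = random.Random(0)
print([translate("old", rng) for _ in range(5)])
```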


79

79

Diagram: the source word f is mapped by Translation to an intermediate form e′, which Synthesis maps to the surface word e; f and e are observed during training, while e′ is a latent variable.


80

Diagram (as before): f →Translation→ e′ →Synthesis→ e, where synthesis now concatenates two pieces (+, i.e. stem + suffix); f and e are observed during training, e′ is latent.


81

Task: translate the word old


82

Task: translate the word old.

inflected|old distribution: old → alt+e, gammelig+∅, alt+∅


83

Task: translate the word old.

inflected|old distribution: old → alt+e, gammelig+∅, alt+∅
Backoff distributions: stem|old → gammelig, alt; suffix|old → +e, +∅
The stem alt is drawn.


84

Task: translate the word old.

inflected|old distribution: old → alt+e, gammelig+∅, alt+∅
From stem|old (gammelig, alt) the stem alt is drawn; its suffix (?) is still to be chosen.


85

The suffix for alt+? is drawn from a suffix distribution shared across all stems: +en, +e, +s, +∅, +er.


86

The suffix +en is drawn from the shared suffix distribution (+en, +e, +s, +∅, +er), completing the translation alt+en.


Evaluation

• Given a parallel corpus, we can infer

• The MAP alignment

• The MAP segmentation of each target word into <stem+suffix>

87


Alignment Evaluation

88

AER

                 f|e    e|f
Model 1 - EM     43.3   38.4
Model 1 - HPYP   37.5   36.6

English-French, 115k words, 447 sentences gold alignments.


Frequent suffixes

89

Suffix Count

+∅ 20,837

+s 334

+d 217

+e 156

+n 156

+y 130

+ed 121

+ing 119


Assessment

• Breaking the “lexical independence assumption” is computationally costly

• The search space is much, much larger!

• Dealing only with inflectional morphology simplifies the problems

• Sparse priors are crucial for avoiding degenerate solutions

90


In conclusion ...

91

Why don’t we have integrated morphology?

92


Why don’t we have integrated morphology?

93

Because we spend all our time working on English, which doesn’t have much morphology!


Why don’t we have integrated morphology?

• Translation with words is already hard: an n-word sentence has n! permutations

• But, if you’re looking at a sentence with m letters, there are m! permutations

• Search is ... considerably harder

• m > n, so m! ≫ n!

• Modeling is harder too

• it must also support all these permutations!

94
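The gap the slide gestures at is easy to make concrete. The numbers below are illustrative (a 10-word sentence of roughly 50 characters):

```python
from math import factorial

# Word-level vs. character-level search spaces for reordering:
# permuting 10 words is bad; permuting their ~50 characters is hopeless.
n, m = 10, 50
word_perms = factorial(n)
char_perms = factorial(m)
print(word_perms)                     # 3628800
print(char_perms > word_perms ** 5)   # True: m! dwarfs n!
```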


Take away messages

• Morphology matters for MT

• Probabilistic models are a great fit for the uncertainty involved

• Breaking the lexical independence assumption is hard

95

Thank you! Toda!

$krAF!

https://github.com/redpony/cdec/


97

PAPER UNDER REVIEW – DO NOT DISTRIBUTE

Unsupervised Word Alignment with Arbitrary Features

Chris Dyer Jonathan Clark Alon Lavie Noah A. Smith

Language Technologies Institute

Carnegie Mellon University

Pittsburgh, PA 15213, USA

cdyer,jhclark,alavie,[email protected]

Abstract

We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by. In our model, arbitrary, non-independent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. However, unlike previous work on discriminative modeling of word alignment (which also permits the use of arbitrary features), the parameters in our models are learned from unannotated parallel sentences, rather than from supervised word alignments. Using a variety of intrinsic and extrinsic metrics, we show our model yields better alignments than generative baselines in a number of language pairs.

1 Introduction

n ∼ Poisson(λ)
a_i ∼ Uniform(1/|f|)
e_i | f_{a_i} ∼ T_{f_{a_i}}
T_{f_{a_i}} | a, b, M ∼ PYP(a, b, M(· | f_{a_i}))

M(e = α+β | f) = G_f(α) × H_f(β)
G_f | a, b, f, P0 ∼ PYP(a, b, P0(·))
H_f | a, b, f, H0 ∼ PYP(a, b, H0(·))
H0 | a, b, Q0 ∼ PYP(a, b, Q0(·))

P0(α; p) = (p^{|α|} / |V|^{|α|}) × (1 − p)
Q0(β; r) = 1 / (|V| × r)^{|β|}
