Joint Parsing and Alignment with Weakly Synchronized Grammars David Burkett, John Blitzer, & Dan Klein

Mar 29, 2015

Page 1:

Joint Parsing and Alignment with Weakly Synchronized Grammars

David Burkett, John Blitzer, & Dan Klein

John: PhD, visiting researcher, postdoc; all on correspondences. Say: Correspondences for Domain Adaptation; Correspondences for Machine Translation.
Page 2:

Statistical MT Training Pipeline

1) Align sentence pairs (GIZA++)

2) Parse English sentences (Berkeley parser); parse foreign sentences

3) Extract rules (Galley et al. 2006)

4) Tune discriminative parameters

[Figure: example sentence pair 在 (at) 办公室 (office) 里 (in) 读了 (read) 书 (book) / "read the book in the office"]

Joint model for steps (1) & (2)

Page 3:

Data Setting for Joint Models

English WSJ: (EN; ), (EN; ), (EN; ), ...

Chinese CTB: (中文; ), (中文; ), (中文; ), ...

Parallel, Aligned CTB: (EN, 中文; ), (EN, 中文; ), (EN, 中文; ), ...

Unlabeled parallel text: (EN; 中文), (EN; 中文), (EN; 中文), ...

Page 4:

Word alignment grids

[Figure: word alignment grid for 在 (at) 办公室 (office) 里 (in) 读了 (read) 书 (book) and "read the book in the office"]
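One way to make the grid concrete is to store it as a set of (Chinese index, English index) links. The sketch below is illustrative only: the links are read off the slide's word glosses rather than taken from a gold alignment, and the helper name `render_grid` is hypothetical.

```python
# Toy alignment grid for the slide's example sentence pair.
# ASSUMPTION: links follow the slide's glosses (办公室=office, 里=in,
# 读了=read, 书=book); they are not a gold-standard alignment.
zh = ["在", "办公室", "里", "读了", "书"]
en = ["read", "the", "book", "in", "the", "office"]

links = {
    (1, 5),  # 办公室 - office
    (2, 3),  # 里 - in
    (3, 0),  # 读了 - read
    (4, 2),  # 书 - book
}

def render_grid(n_rows, n_cols, links):
    """Return the grid as text: one row per Chinese word, '#' marks a link."""
    return "\n".join(
        "".join("#" if (i, j) in links else "." for j in range(n_cols))
        for i in range(n_rows)
    )

grid = render_grid(len(zh), len(en), links)
print(grid)
```

Rows correspond to Chinese words and columns to English words, so the reordering between the two sentences shows up as links that run against the diagonal.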

Page 5:

Syntactic Correspondences

[Figure: EN and 中文 syntactic structures]

Build a model

Page 6:

Correspondence via Synchronous Grammars

Page 7:

Synchronous Derivation

Page 8:

Synchronous Derivation

Page 9:

Weakly Synchronized Example

Page 10:

Weakly Synchronized Example

Separate PCFGs

Page 11:

Weakly Synchronized Example

ITG alignment

Page 12:

Weakly Synchronized Example

Points for synchronization, but not required
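A simplified way to picture "points for synchronization, but not required": each language keeps its own tree, the alignment has its own ITG derivation, and a synchronization point is a bispan of that derivation whose two sides both happen to be constituents. The sketch below uses hand-made spans for the example sentence pair; the trees, the bispans, and the function name `synchronization_points` are all assumptions for illustration, not the authors' data or code.

```python
# Toy illustration of weak synchronization.
# ASSUMPTION: every span below is hand-made for the slide's example pair
# ("read the book in the office" / "在 办公室 里 读了 书").
# Spans are half-open (start, end) word offsets.

en_tree = {(0, 6): "VP", (0, 1): "VBD", (1, 3): "NP", (3, 6): "PP", (4, 6): "NP"}
zh_tree = {(0, 5): "VP", (0, 3): "PP", (1, 2): "NP", (3, 5): "VP",
           (3, 4): "VV", (4, 5): "NP"}

# Bispans of an ITG alignment derivation: (English span, Chinese span).
itg_bispans = [
    ((0, 6), (0, 5)),  # whole sentence pair
    ((0, 3), (3, 5)),  # "read the book" <-> "读了 书"
    ((3, 6), (0, 3)),  # "in the office" <-> "在 办公室 里"
    ((0, 1), (3, 4)),  # "read" <-> "读了"
    ((1, 3), (4, 5)),  # "the book" <-> "书"
]

def synchronization_points(en_tree, zh_tree, bispans):
    """Keep bispans whose English side and Chinese side are both constituents."""
    return [
        (en_tree[e], e, zh_tree[z], z)
        for e, z in bispans
        if e in en_tree and z in zh_tree
    ]

points = synchronization_points(en_tree, zh_tree, itg_bispans)
for en_label, e, zh_label, z in points:
    print(en_label, e, "<->", zh_label, z)
```

In this toy setup the bispan pairing "read the book" with "读了 书" is not a synchronization point, because the flat English VP has no constituent over words 0 to 3. The three structures remain valid anyway, which is the sense in which synchronization is available but not required.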

Page 13:

Correspondence Model & Feature Types

Feature type 1: Word Alignment (办公室 ↔ office)

Feature type 2: Monolingual Parser (EN: PP over "in the office")

Feature type 3: Correspondence (EN PP ↔ 中文 PP)

[HBDK09]

Page 14:

Estimating

• Set the parameters to maximize the log-likelihood of the correct parses & alignments

• A normalizer makes the probabilities over (EN parse, alignment, 中文 parse) sum to 1
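As a rough illustration of this kind of estimation, the sketch below fits a small log-linear model by gradient ascent on the log-likelihood of one "correct" analysis. Everything in it is an assumption made for the example: the three candidates stand in for the full space of (EN parse, alignment, 中文 parse) triples, and the feature names, feature values, step size, and lack of regularization are invented rather than taken from the talk.

```python
import math

# ASSUMPTION: a tiny hand-made candidate set with invented feature values,
# grouped by the three feature types (alignment, monolingual parser,
# correspondence).
candidates = {
    "correct":   {"alignment": 3.0, "en_parser": 2.0, "zh_parser": 2.0, "correspondence": 2.0},
    "wrong_pp":  {"alignment": 3.0, "en_parser": 2.5, "zh_parser": 2.0, "correspondence": 0.0},
    "bad_align": {"alignment": 1.0, "en_parser": 2.0, "zh_parser": 2.0, "correspondence": 1.0},
}

def score(weights, feats):
    """Linear score: dot product of weights and feature values."""
    return sum(weights[name] * value for name, value in feats.items())

def probabilities(weights):
    """Exponentiate the scores and normalize so they sum to 1."""
    scores = {n: score(weights, f) for n, f in candidates.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {n: math.exp(s - m) / z for n, s in scores.items()}

def log_likelihood(weights, gold="correct"):
    return math.log(probabilities(weights)[gold])

def gradient(weights, gold="correct"):
    """Observed feature values minus their expectation under the model."""
    probs = probabilities(weights)
    return {
        k: candidates[gold][k] - sum(probs[n] * candidates[n][k] for n in candidates)
        for k in weights
    }

weights = {k: 0.0 for k in candidates["correct"]}
before = log_likelihood(weights)
for _ in range(50):
    grad = gradient(weights)
    weights = {k: weights[k] + 0.1 * grad[k] for k in weights}
after = log_likelihood(weights)
print(before, after)
```

In the real setting the candidate space is far too large to enumerate, so the expectation inside `gradient` is exactly the part that has to be computed or approximated some other way.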

Page 15:

Computing

Correspondence features tie pieces together (e.g. an EN PP and a 中文 PP)

Exact computation is intractable

The individual models (EN parse, alignment, 中文 parse) have polynomial-time dynamic programming algorithms

Page 16:

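Approximation: Mean Field

• Exploit tractability in individual models

• Factored approximation: one factor each for the EN parse, the alignment, and the 中文 parse

Algorithm

1) Initialize the factors separately

2) Iterate:

• Set each factor in turn to minimize the KL divergence between the factored approximation and the joint model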
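The initialize-then-iterate scheme can be sketched on a toy problem. The example below is a minimal mean-field illustration, not the talk's implementation: it assumes three binary variables standing in for the EN parse, the alignment, and the 中文 parse, with invented local scores and a single invented coupling term. It uses brute-force enumeration where a real system would use the dynamic programs of the individual models.

```python
import math
from itertools import product

# ASSUMPTION: toy unnormalized log-scores; each variable has two options.
local = [
    [0.2, 0.0],  # EN parse candidates
    [0.0, 0.1],  # alignment candidates
    [0.0, 0.3],  # 中文 parse candidates
]

def coupling(t, a, u):
    """Hypothetical correspondence bonus when all three pick option 1."""
    return 1.5 if (t, a, u) == (1, 1, 1) else 0.0

def log_joint(t, a, u):
    return local[0][t] + local[1][a] + local[2][u] + coupling(t, a, u)

def normalize(log_scores):
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]

def kl_q_p(q):
    """KL(q || p) between the factored q and the exact joint p."""
    states = list(product(range(2), repeat=3))
    log_z = math.log(sum(math.exp(log_joint(*x)) for x in states))
    kl = 0.0
    for t, a, u in states:
        qx = q[0][t] * q[1][a] * q[2][u]
        if qx > 0:
            kl += qx * (math.log(qx) - (log_joint(t, a, u) - log_z))
    return kl

# 1) Initialize each factor separately from its own local scores.
q = [normalize(scores) for scores in local]
kl_history = [kl_q_p(q)]

# 2) Iterate: update one factor at a time, holding the others fixed.
for _ in range(10):
    for i in range(3):
        new_scores = []
        for v in range(2):
            # Expected joint log-score with variable i fixed to v,
            # averaging the other two variables under their current factors.
            expected = 0.0
            for x in product(range(2), repeat=3):
                if x[i] != v:
                    continue
                weight = 1.0
                for j in range(3):
                    if j != i:
                        weight *= q[j][x[j]]
                expected += weight * log_joint(*x)
            new_scores.append(expected)
        q[i] = normalize(new_scores)
        kl_history.append(kl_q_p(q))

print(q)
print(kl_history[0], kl_history[-1])
```

Each coordinate update is the exact minimizer of the KL divergence with respect to that one factor, so the recorded divergence never increases from one update to the next.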

Page 17:

Large scale inference

We can approximate the joint model in polynomial time, but . . .

Sum over possible alignments is an O(n^6) algorithm.

But computers are fast, right?

• Medium-length sentences are 50 words long

• Small translation data sets are 250,000 sentences

• ~4 quadrillion operations (See [BBK10, HBDK09] for speedup details)
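The "~4 quadrillion" figure can be checked with a few lines of arithmetic. The sketch assumes the alignment sum costs about n^6 elementary operations per sentence pair, which is the standard cost of ITG bitext parsing and the exponent that reproduces the slide's total. The rate of one billion operations per second is a further illustrative assumption and does not come from the talk.

```python
# ASSUMPTION: n**6 operations per sentence pair, with no constant factors.
sentence_length = 50          # "medium-length" sentence, in words
num_sentence_pairs = 250_000  # "small" translation data set

ops_per_pair = sentence_length ** 6
total_ops = ops_per_pair * num_sentence_pairs

print(f"{ops_per_pair:,} operations per sentence pair")
print(f"{total_ops:,} operations in total")

# ASSUMPTION: a machine doing 1e9 such operations per second.
seconds = total_ops / 1e9
days = seconds / 86_400
print(f"about {days:.1f} days for one pass over the data")
```

Under these assumptions a single pass takes on the order of weeks, and training needs many passes, which is why the pruning and speedups in the cited papers matter.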

Page 18:

Quantitative Results: Parsing

[Bar chart: Monolingual vs. Joint; vertical axis from 78 to 90]

Page 19:

Quantitative Results: Parsing

[Bar chart: Monolingual vs. Joint; vertical axis from 78 to 90]

Chinese parser: Monolingual 83.6%, Joint 85.7%

Page 20:

Quantitative Results: Parsing

[Bar chart: Chinese parser and English parser, Monolingual vs. Joint; vertical axis from 78 to 90]

English parser: Monolingual 81.2%, Joint 84.5%

Page 21:

Incorrect English PP Attachment

Page 22:

Corrected English PP Attachment

Page 23:

Quantitative Results: Translation

[Bar chart: word alignment; vertical axis from 65 to 89]

HMM 69.5%, Discriminative ITG 79.5%, Joint 85.0%

BLEU improvement from 29.4 to 30.6

Page 24:

Better Translations with Bilingual Adaptation

Reference: At this point the cause of the plane collision is still unclear. The local caa will launch an investigation into this .

Baseline (GIZA++): The cause of planes is still not clear yet, local civil aviation department will investigate this .

Source: 目前 导致 飞机 相撞 的 原因 尚 不 清楚 , 当地 民航 部门 将 对此 展开 调查

Gloss: Currently cause plane crash DE reason yet not clear, local civil aeronautics bureau will toward open investigations

Bilingual Adaptation Model: The cause of plane collision remained unclear, local civil aviation departments will launch an investigation .

Page 25:

Thanks