Course Summary LING 575 Fei Xia 03/06/07
Dec 21, 2015

Course Summary

LING 575

Fei Xia

03/06/07

Outline

• Introduction to MT: 1

• Major approaches
  – SMT: 3
  – Transfer-based MT: 2
  – Hybrid systems: 2

• Other topics

Introduction to MT

Major challenges

• Translation is hard.

• Getting the right words:
  – Choosing the correct root form
  – Getting the correct inflected form
  – Inserting “spontaneous” words

• Putting the words in the correct order:
  – Word order: SVO vs. SOV, …
  – Unique constructions:
  – Divergence

Lexical choice

• Homonymy/Polysemy: bank, run

• Concept gap: no corresponding concept in the other language: go Greek, go Dutch, feng shui, lame duck, …

• Coding (concept-to-lexeme mapping) differences:
  – More distinctions in one language: e.g., kinship vocabulary.
  – Different division of conceptual space:

Major approaches

• Transfer-based

• Interlingua

• Example-based (EBMT)

• Statistical MT (SMT)

• Hybrid approach

The MT triangle

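[Figure: the MT triangle. Analysis runs up the source side and synthesis runs down the target side. Levels from base to apex: word to word (word-based SMT, EBMT); phrase-based SMT, EBMT; transfer-based; meaning (interlingua).]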

Comparison of resource requirements

                    Transfer-based   Interlingua                 EBMT        SMT
dictionary          +                +                           +
Transfer rules      +
parser              +                +                           +           (?)
semantic analyzer                    +
parallel data                                                    +           +
others                               Universal representation,   thesaurus
                                     Generator

Evaluation

• Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT.

• Human evaluation: accuracy, fluency, …
  – Problem: expensive, slow, subjective, non-reusable.

• Automatic measures:
  – Edit distance
  – Word error rate (WER), Position-independent WER (PER)
  – Simple string accuracy (SSA), Generation string accuracy (GSA)
  – BLEU
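
Two of the automatic measures can be sketched in a few lines. This is a minimal illustration, assuming whitespace-tokenized sentences and a single reference; the function names are hypothetical, and the PER formula is one common bag-of-words formulation (definitions vary in the literature).

```python
from collections import Counter

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)

def per(ref, hyp):
    """Position-independent error rate (assumed formulation): ignores word order."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)

ref = "the cat sat on the mat".split()
hyp = "the cat on the mat sat".split()
print(wer(ref, hyp), per(ref, hyp))
```

The hypothesis contains the right words in a different order, so WER penalizes it while PER does not.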

Major approaches

Word-based SMT

• IBM Models 1-5

• Main concepts:
  – Source channel model
  – Hidden word alignment
  – EM training

Source channel model for MT

$$E^* = \arg\max_E P(E) \cdot P(F \mid E)$$

[Diagram: Eng sent, generated with P(E) → Noisy channel, modeled by P(F | E) → Fr sent]

Two types of parameters:
• Language model: P(E)
• Translation model: P(F | E)
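
The argmax can be illustrated over an explicit candidate list. This is only a sketch: the probability tables `LM` and `TM` are invented toy numbers, and a real decoder searches a much larger hypothesis space instead of enumerating candidates.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
LM = {"the house": 0.6, "house the": 0.05}          # P(E)
TM = {("la maison", "the house"): 0.3,              # P(F | E)
      ("la maison", "house the"): 0.3}

def decode(f_sent, candidates):
    """Return the candidate E maximizing log P(E) + log P(F | E)."""
    def score(e_sent):
        return math.log(LM[e_sent]) + math.log(TM[(f_sent, e_sent)])
    return max(candidates, key=score)

best = decode("la maison", ["house the", "the house"])
print(best)
```

Both candidates get the same translation-model score here, so the language model decides in favor of the fluent word order.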

Modeling p(F | E) with alignment

$$P(F \mid E) = \sum_a P(a, F \mid E) = \sum_a P(a \mid E) \cdot P(F \mid a, E)$$

Modeling

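Model 1:

$$P(F \mid E) = \frac{P(m \mid l)}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)$$

Model 2:

$$P(F \mid E) = P(m \mid l) \prod_{j=1}^{m} \sum_{i=0}^{l} \big( d(i \mid j, m, l) \cdot t(f_j \mid e_i) \big)$$

Here l = |E|, m = |F|, and e_0 is the NULL word.

Parameters:
• Length prob: P(m | l)
• Translation prob: t(fj | ei)
• Distortion prob (for Model 2): d(i | j, m, l)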
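
A minimal sketch of how the two likelihoods could be computed, assuming the translation table `t` and distortion table `d` are plain dictionaries and that the English sentence already carries a NULL token at position 0. The function names and toy numbers are hypothetical.

```python
def model1_prob(f_sent, e_sent, t, length_prob=1.0):
    """Model 1: P(m|l) / (l+1)^m * prod_j sum_i t(f_j | e_i)."""
    l = len(e_sent) - 1          # e_sent[0] is the NULL word
    m = len(f_sent)
    prob = length_prob / (l + 1) ** m
    for f in f_sent:
        prob *= sum(t.get((f, e), 0.0) for e in e_sent)
    return prob

def model2_prob(f_sent, e_sent, t, d, length_prob=1.0):
    """Model 2: P(m|l) * prod_j sum_i d(i | j, m, l) * t(f_j | e_i)."""
    l = len(e_sent) - 1
    m = len(f_sent)
    prob = length_prob
    for j, f in enumerate(f_sent, start=1):
        prob *= sum(d.get((i, j, m, l), 0.0) * t.get((f, e), 0.0)
                    for i, e in enumerate(e_sent))
    return prob

# Toy translation table keyed by (f, e); values are invented.
t = {("la", "NULL"): 0.1, ("la", "the"): 0.7, ("la", "house"): 0.1,
     ("maison", "NULL"): 0.1, ("maison", "the"): 0.1, ("maison", "house"): 0.8}
e_sent = ["NULL", "the", "house"]
f_sent = ["la", "maison"]
uniform_d = {(i, j, 2, 2): 1 / 3 for i in range(3) for j in (1, 2)}

p1 = model1_prob(f_sent, e_sent, t)
p2 = model2_prob(f_sent, e_sent, t, uniform_d)
print(p1, p2)
```

With a uniform distortion table d(i | j, m, l) = 1/(l+1), Model 2 reduces to Model 1, which is a quick sanity check on both functions.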

Training

• Model 1:

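Expected counts, summed over all sentence pairs (E, F):

$$Ct(f, e) = \sum_{(E,F)} \frac{t'(f \mid e)}{\sum_{i=0}^{|E|} t'(f \mid e_i)} \cdot \Big( \sum_{j=1}^{|F|} \delta(f, f_j) \Big) \cdot \Big( \sum_{i=0}^{|E|} \delta(e, e_i) \Big)$$

Re-estimation:

$$t(f \mid e) = \frac{Ct(f, e)}{\sum_{x \in V_F} Ct(x, e)}$$

Here t' is the estimate from the previous iteration, δ is 1 when its two arguments are equal and 0 otherwise, and V_F is the vocabulary of the F side.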
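
The EM loop for Model 1 can be sketched as follows. This is a minimal version under stated assumptions: uniform initialization, a `"NULL"` token prepended to every English sentence, a fixed number of iterations, and a hypothetical function name `train_model1`. Looping over token occurrences plays the role of the two δ sums in the count formula.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """corpus: list of (e_tokens, f_tokens). Returns t keyed by (f, e)."""
    pairs = [(["NULL"] + e_sent, f_sent) for e_sent, f_sent in corpus]
    f_vocab = {f for _, f_sent in pairs for f in f_sent}
    t = defaultdict(lambda: 1.0 / len(f_vocab))      # uniform start
    for _ in range(iterations):
        count = defaultdict(float)                   # Ct(f, e)
        total = defaultdict(float)                   # sum over x of Ct(x, e)
        for e_sent, f_sent in pairs:
            for f in f_sent:
                # Denominator: sum over i of t'(f | e_i)
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # Re-estimate: t(f | e) = Ct(f, e) / sum_x Ct(x, e)
        t = defaultdict(float,
                        {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

corpus = [("the house".split(), "la maison".split()),
          ("the book".split(), "le livre".split()),
          ("a book".split(), "un livre".split())]
t = train_model1(corpus)
print(t[("livre", "book")], t[("le", "book")])
```

On this three-sentence toy corpus, "livre" co-occurs with "book" in both of its sentences, so its translation probability grows at the expense of "le" and "un".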

Finding the best alignment

Given E and F, we are looking for

$$a^* = \arg\max_a P(a \mid E, F)$$

Model 1:

$$a^* = \arg\max_a \prod_{j=1}^{m} t(f_j \mid e_{a_j})$$

$$a^* = (a_1^*, \ldots, a_m^*), \qquad a_j^* = \arg\max_i t(f_j \mid e_i)$$
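
Because the Model 1 product factorizes over the positions j, the best alignment can be picked one word at a time. A minimal sketch, assuming `t` is a dictionary keyed by (f, e) and position 0 of the English sentence is NULL; the function name and numbers are hypothetical.

```python
def best_alignment(f_sent, e_sent, t):
    """Return [a_1*, ..., a_m*] with a_j* = argmax_i t(f_j | e_i)."""
    return [max(range(len(e_sent)), key=lambda i: t.get((f, e_sent[i]), 0.0))
            for f in f_sent]

# Toy translation table; values are invented.
t = {("la", "NULL"): 0.1, ("la", "the"): 0.7, ("la", "house"): 0.1,
     ("maison", "NULL"): 0.1, ("maison", "the"): 0.1, ("maison", "house"): 0.8}
alignment = best_alignment(["la", "maison"], ["NULL", "the", "house"], t)
print(alignment)
```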

Clump-based SMT

• The unit of translation is a clump.

• Training stage:
  – Word alignment
  – Extracting clump pairs

• Decoding stage:
  – Try all segmentations of the src sent and all the allowed permutations
  – For each src clump, try TopN tgt clumps
  – Prune the hypotheses
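
The first decoding step, trying all segmentations of the source sentence, can be sketched as a small generator. The `max_len` cap on clump length is an assumption added for illustration (the slides do not mention one), and permutation, TopN lookup, and pruning are left out.

```python
def segmentations(words, max_len=None):
    """Yield every split of `words` into contiguous clumps (tuples of words)."""
    if not words:
        yield []
        return
    limit = len(words) if max_len is None else min(max_len, len(words))
    for k in range(1, limit + 1):
        # Take the first k words as one clump, then segment the remainder.
        for rest in segmentations(words[k:], max_len):
            yield [tuple(words[:k])] + rest

all_segs = list(segmentations("we like tea".split()))
for seg in all_segs:
    print(seg)
```

A sentence of n words has 2^(n-1) segmentations when clump length is unrestricted, which is one reason the hypotheses have to be pruned.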

Transfer-based MT

• Analysis, transfer, generation:
  – Example: (Quirk et al., 2005)
    1. Parse the source sentence
    2. Transform the parse tree with transfer rules
    3. Translate source words
    4. Get the target sentence from the tree

• Translation as parsing:
  – Example: (Wu, 1995)
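
The four analysis, transfer, generation steps can be mimicked on a toy tree. This is not the method of Quirk et al. (2005); it is a hand-built illustration in which the parse (step 1) is written out by hand, the single reordering rule and the lexicon are invented, and target-language particles are ignored.

```python
# Trees are tuples: (label, child, ...) for internal nodes, (tag, word) for leaves.

def is_leaf(tree):
    return len(tree) == 2 and isinstance(tree[1], str)

def transfer(tree, rules):
    """Step 2: reorder children of nodes whose label has a permutation rule."""
    if is_leaf(tree):
        return tree
    label, *children = tree
    children = [transfer(child, rules) for child in children]
    order = rules.get(label)
    if order is not None and len(order) == len(children):
        children = [children[k] for k in order]
    return (label, *children)

def translate_words(tree, lexicon):
    """Step 3: replace each source word using a word-for-word lexicon."""
    if is_leaf(tree):
        return (tree[0], lexicon.get(tree[1], tree[1]))
    label, *children = tree
    return (label, *(translate_words(child, lexicon) for child in children))

def target_sentence(tree):
    """Step 4: read the target words off the leaves, left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in target_sentence(child)]

# Step 1 (parsing) is assumed to have produced this tree.
source_tree = ("S", ("NP", ("PRP", "I")),
                    ("VP", ("V", "eat"), ("NP", ("N", "apples"))))
rules = {"VP": (1, 0)}                      # hypothetical SVO -> SOV rule
lexicon = {"I": "watashi", "eat": "taberu", "apples": "ringo"}

result = target_sentence(translate_words(transfer(source_tree, rules), lexicon))
print(" ".join(result))
```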

Hybrid approaches

• Preprocessing with transfer rules: (Xia and McCord, 2004), (Collins et al., 2005)

• Postprocessing with taggers, parsers, etc.: JHU 2003 workshop

• Hierarchical phrase-based model: (Chiang, 2005)

• …

Other topics

Other issues

• Resources
  – MT for low-density languages
  – Using comparable corpora and Wikipedia

• Special translation modules
  – Identifying and translating named entities and abbreviations
  – …

To build an MT system (1)

• Gather resources
  – Parallel corpora, comparable corpora
  – Grammars, dictionaries, …

• Process data
  – Document alignment, sentence alignment
  – Tokenization, parsing, …

To build an MT system (2)

• Modeling

• Training
  – Word alignment and extracting clump pairs
  – Learning transfer rules

• Decoding
  – Identifying entities and translating them with special modules (optional)
  – Translation as parsing, or parse + transfer + translation
  – Segmenting the src sentence, replacing each src clump with a target clump, …

To build an MT system (3)

• Post-processing
  – System combination
  – Reranking

• Using the system for other applications:
  – Cross-lingual IR
  – Computer-assisted translation
  – …

Misc

• Grades
  – Assignments (hw1-hw3): 30%
  – Class participation: 20%
  – Project:
    • Presentation: 25%
    • Final paper: 25%