Page 1: Sequence to Sequence Models for Machine Translation

CMSC 723 / LING 723 / INST 725

Marine Carpuat

Slides & figure credits: Graham Neubig

Page 2: Machine Translation

• Translation system
  • Input: source sentence F
  • Output: target sentence E
  • Can be viewed as a function

• Statistical machine translation systems
  • 3 problems
    • Modeling: how to define P(.)?
    • Training/Learning: how to estimate parameters from parallel corpora?
    • Search: how to solve the argmax efficiently? (see the formulation below)
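
A standard way to tie the three problems together, with θ denoting the model parameters (the exact notation on the slide is assumed): modeling defines P(E|F; θ), training estimates θ from parallel corpora, and search computes the argmax.

\[
\hat{E} = \operatorname*{argmax}_{E} \; P(E \mid F; \theta)
\]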

Page 3: Introduction to Neural Machine Translation

• Neural language models review

• Sequence to sequence models for MT
  • Encoder-Decoder

• Sampling and search (greedy vs beam search)

• Practical tricks

• Sequence to sequence models for other NLP tasks

Page 4: A feedforward neural 3-gram model
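
As a rough stand-in for this slide's figure, here is a minimal sketch of a feedforward 3-gram language model (PyTorch, layer sizes, and names are illustrative assumptions, not from the slides): embed the two previous words, concatenate, apply a hidden layer, and score the whole vocabulary.

import torch
import torch.nn as nn

class TrigramLM(nn.Module):
    # Feedforward 3-gram LM: predict word t from words t-2 and t-1.
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(2 * emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev2, prev1):
        # Concatenate the embeddings of the two context words.
        x = torch.cat([self.embed(prev2), self.embed(prev1)], dim=-1)
        h = torch.tanh(self.hidden(x))
        # Unnormalized scores; a softmax gives P(w_t | w_{t-2}, w_{t-1}).
        return self.out(h)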

Page 5: A recurrent language model

Page 6: A recurrent language model
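
In the same illustrative PyTorch style, the recurrent language model replaces the fixed two-word context with a hidden state carried across the whole sentence (sizes and names are assumptions):

import torch.nn as nn

class RNNLM(nn.Module):
    # RNN LM: the hidden state summarizes the entire history, not just 2 words.
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):  # tokens: (batch, seq_len) word ids
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)  # next-word scores at every position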

Page 7: Examples of RNN variants

• LSTMs
  • Aim to address vanishing/exploding gradient issue

• Stacked RNNs

• …
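
In the same sketch style, both variants are one-line changes (sizes arbitrary): an LSTM swaps in gated state updates to mitigate vanishing/exploding gradients, and num_layers stacks recurrent layers.

import torch.nn as nn

# LSTM instead of a vanilla RNN, stacked two layers deep.
rnn = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True)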

Page 8: Training in practice: online

Page 9: Training in practice: batch

Page 10: Training in practice: minibatch

• Compromise between online and batch

• Computational advantages
  • Can leverage vector processing instructions in modern hardware by processing multiple examples simultaneously (see the sketch below)
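
A minimal runnable sketch of the spectrum on a toy problem (data, sizes, and learning rate are arbitrary assumptions): batch_size = 1 gives online updates, batch_size = len(X) gives one batch update, and values in between are minibatches.

import torch
import torch.nn.functional as F

X, Y = torch.randn(256, 10), torch.randn(256, 1)  # toy regression data
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

batch_size = 32  # 1 => online; 256 (all data) => batch; in between => minibatch
for start in range(0, len(X), batch_size):
    xb, yb = X[start:start + batch_size], Y[start:start + batch_size]
    opt.zero_grad()
    loss = F.mse_loss(model(xb), yb)
    loss.backward()
    opt.step()  # one parameter update per (mini)batch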

Page 11: Problem with minibatches: in language modeling, examples don’t have the same length

• 3 tricks (sketched below)
  • Padding: add </s> symbols to make all sentences the same length
  • Masking: multiply the loss calculated over padded symbols by zero
  • Plus: sort sentences by length
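
A minimal sketch of the padding and masking tricks (PyTorch; using id 0 for the padding symbol is an assumption):

import torch
from torch.nn.utils.rnn import pad_sequence

# Toy minibatch of word-id sequences with unequal lengths.
seqs = [torch.tensor([5, 2, 9]), torch.tensor([7, 1])]
padded = pad_sequence(seqs, batch_first=True, padding_value=0)
mask = (padded != 0).float()  # 1 for real tokens, 0 for padding

# Per-token losses (random here, standing in for cross-entropy)
# are multiplied by zero wherever the position is padding.
per_token_loss = torch.rand_like(mask)
masked_loss = (per_token_loss * mask).sum() / mask.sum()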

Page 12: Introduction to Neural Machine Translation

• Neural language models review

• Sequence to sequence models for MT
  • Encoder-Decoder

• Sampling and search (greedy vs beam search)

• Training tricks

• Sequence to sequence models for other NLP tasks

Page 13: Encoder-decoder model

Page 14: Encoder-decoder model
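
As a rough stand-in for the figures on these two slides, a minimal PyTorch sketch of the encoder-decoder idea (GRUs, sizes, and names are illustrative assumptions): one RNN reads the source sentence F into a final hidden state, which initializes a second RNN that predicts the target sentence E word by word.

import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.src_embed(src))  # h summarizes the source F
        dec_h, _ = self.decoder(self.tgt_embed(tgt_in), h)
        return self.out(dec_h)  # next-word scores at each target position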

Page 15: Generating Output

• We have a model P(E|F); how can we generate translations from it?

• 2 methods
  • Sampling: generate a random sentence according to the probability distribution
  • Argmax: generate the sentence with the highest probability

Page 16: Ancestral Sampling

• Randomly generate words one by one
  • Until the end-of-sentence symbol is generated

• Done!
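
A minimal sketch of the loop; step_probs is a hypothetical stand-in for the model's next-word distribution P(e_t | e_<t, F):

import torch

def ancestral_sample(step_probs, bos, eos, max_len=50):
    # Sample a sentence left to right, one random draw per word.
    sent = [bos]
    for _ in range(max_len):
        probs = step_probs(sent)  # 1-D tensor over the vocabulary
        word = torch.multinomial(probs, 1).item()  # random draw, not argmax
        sent.append(word)
        if word == eos:  # stop at the end-of-sentence symbol
            break
    return sent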

Page 17: Greedy search

• One by one, pick the single highest-probability word (sketched below)

• Problems
  • Often generates easy words first
  • Often prefers multiple common words to rare words
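
The greedy counterpart of the sampling sketch above, with the same hypothetical step_probs interface; the only change is taking the argmax instead of a random draw:

import torch

def greedy_decode(step_probs, bos, eos, max_len=50):
    # Pick the single highest-probability word at each step.
    sent = [bos]
    for _ in range(max_len):
        word = int(torch.argmax(step_probs(sent)))
        sent.append(word)
        if word == eos:
            break
    return sent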

Page 18: Greedy Search Example

Page 19: Beam Search Example with beam size b = 2

We consider the top b hypotheses at each time step
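
A minimal beam-search sketch in the same style; step_logprobs is a hypothetical stand-in returning log P(next word | prefix), so hypothesis scores add:

import torch

def beam_search(step_logprobs, bos, eos, b=2, max_len=50):
    beams = [([bos], 0.0)]  # (hypothesis, log-probability)
    for _ in range(max_len):
        candidates = []
        for hyp, score in beams:
            if hyp[-1] == eos:  # finished hypotheses are kept as-is
                candidates.append((hyp, score))
                continue
            topv, topi = torch.topk(step_logprobs(hyp), b)  # top b extensions
            for v, i in zip(topv.tolist(), topi.tolist()):
                candidates.append((hyp + [i], score + v))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:b]
        if all(h[-1] == eos for h, _ in beams):
            break
    return beams[0][0]  # best-scoring hypothesis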

Page 20: Introduction to Neural Machine Translation

• Neural language models review

• Sequence to sequence models for MT
  • Encoder-Decoder

• Sampling and search (greedy vs beam search)

• Practical tricks

• Sequence to sequence models for other NLP tasks