Variational Attention - NTNU184pc128.csie.ntnu.edu.tw/presentation/20-01-03/Variational Attenti… · Variational Autoencoder [Bowman et al. 2016] Generating Sentences from a Continuous

Variational Attention for Sequence-to-Sequence Models

Source : COLING 2018

Speaker : Ya-Fang, Hsiao

Advisor : Jia-Ling, Koh

Date : 2020/01/03

PART

Introduction

                 Auto-Encoder    Encoder-Decoder
Deterministic    DAE             DED
Variational      VAE             VED

PART

Variational Autoencoder [Bowman et al. 2016] Generating Sentences from a Continuous Space

$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - \mathrm{KL}\left(q_\phi(z|x) \,\|\, p(z)\right)$$

First term: data likelihood under the posterior (cross entropy)

Second term: KL divergence of the posterior from the prior
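The two terms of the objective above can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's implementation: it assumes a diagonal Gaussian posterior parameterized by `mu` and `log_var`, a standard normal prior, and a reconstruction log-likelihood that is computed elsewhere and passed in as a number. All function names are hypothetical.

```python
import math
import random

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def elbo(log_likelihood, mu, log_var):
    """Single-sample ELBO estimate: log p(x|z) minus the KL term."""
    return log_likelihood - gaussian_kl_to_standard_normal(mu, log_var)
```

When the posterior equals the prior the KL term is zero, so the objective reduces to the reconstruction term alone.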

Variational Seq2Seq model

Bypassing phenomenon

[Figure: four panels labeled A, B, C, D]

PART

VED+VAttn

$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z, a|x)}\left[\log p_\theta(y|z, a)\right] - \mathrm{KL}\left(q_\phi(z, a|x) \,\|\, p(z, a)\right)$$
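To make the random attention variable a concrete, here is a hedged sketch of how one might sample it. It assumes (this is not stated on the slide) that the posterior mean of a is the usual deterministic attention context vector, that scores are plain dot products, and that the log-variance is supplied externally. The names `attention_context` and `sample_attention` are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(decoder_state, source_states):
    """Deterministic dot-product attention: weighted sum of source states."""
    scores = [sum(d * h for d, h in zip(decoder_state, hs))
              for hs in source_states]
    weights = softmax(scores)
    dim = len(source_states[0])
    return [sum(w * hs[k] for w, hs in zip(weights, source_states))
            for k in range(dim)]

def sample_attention(context, log_var, rng):
    """Assumption: use the context as the posterior mean, a = mean + sigma * eps."""
    return [c + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for c, lv in zip(context, log_var)]
```

With a deterministic context the decoder can read the source directly; sampling a instead places the attention path under the same KL regularization as z.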

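Variational Attention for Variational Encoder Decoder

$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi^{(z)}(z|x),\, q_\phi^{(a)}(a|x)}\left[\log p_\theta(y|z, a)\right] - \mathrm{KL}\left(q_\phi^{(z)}(z|x) \,\|\, p(z)\right) - \mathrm{KL}\left(q_\phi^{(a)}(a|x) \,\|\, p(a)\right)$$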
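The step from the joint KL term to two separate KL terms can be written out as follows. This is a sketch under the assumption that the posterior and the prior both factorize, i.e. $q_\phi(z, a|x) = q_\phi^{(z)}(z|x)\, q_\phi^{(a)}(a|x)$ and $p(z, a) = p(z)\, p(a)$.

```latex
\begin{aligned}
\mathrm{KL}\left(q_\phi(z, a|x) \,\|\, p(z, a)\right)
&= \mathbb{E}_{q_\phi(z, a|x)}\left[\log \frac{q_\phi^{(z)}(z|x)\, q_\phi^{(a)}(a|x)}{p(z)\, p(a)}\right] \\
&= \mathbb{E}_{q_\phi^{(z)}(z|x)}\left[\log \frac{q_\phi^{(z)}(z|x)}{p(z)}\right]
 + \mathbb{E}_{q_\phi^{(a)}(a|x)}\left[\log \frac{q_\phi^{(a)}(a|x)}{p(a)}\right] \\
&= \mathrm{KL}\left(q_\phi^{(z)}(z|x) \,\|\, p(z)\right)
 + \mathrm{KL}\left(q_\phi^{(a)}(a|x) \,\|\, p(a)\right)
\end{aligned}
```

The second line holds because each log ratio depends on only one of the two variables, so the expectation over the other variable integrates to one.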

1. $\mathcal{N}(0, I)$

2. $\mathcal{N}(\bar{h}^{src}, I)$
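Assuming the two items above are candidate priors p(a) for the attention vector, and that h-bar^src denotes the mean of the source hidden states (neither is spelled out on the slide), the KL term against either prior has a closed form for a diagonal Gaussian posterior. The sketch below uses hypothetical names and plain Python lists.

```python
import math

def mean_source_state(source_states):
    """Average the encoder hidden states element-wise (assumed meaning of h-bar^src)."""
    n = len(source_states)
    dim = len(source_states[0])
    return [sum(h[k] for h in source_states) / n for k in range(dim)]

def kl_to_unit_variance_prior(mu, log_var, prior_mean):
    """KL( N(mu, diag(exp(log_var))) || N(prior_mean, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + (m - pm) ** 2 - 1.0 - lv
                     for m, lv, pm in zip(mu, log_var, prior_mean))
```

Passing a zero vector as `prior_mean` recovers option 1; passing the output of `mean_source_state` gives option 2.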

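VED+VAttn

Variational Attention for Variational Encoder Decoder

$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi^{(z)}(z|x),\, q_\phi^{(a)}(a|x)}\left[\log p_\theta(y|z, a)\right] - \lambda_{KL}\left[\mathrm{KL}\left(q_\phi^{(z)}(z|x) \,\|\, p(z)\right) + \gamma_a\, \mathrm{KL}\left(q_\phi^{(a)}(a|x) \,\|\, p(a)\right)\right]$$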
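The weighting of the two KL terms can be sketched as a one-line function. This assumes that λ_KL scales both KL terms and that γ_a additionally scales only the attention KL, which is how the symbols are read here. The function name and default values are hypothetical.

```python
def weighted_objective(log_likelihood, kl_z, kl_a, lambda_kl=1.0, gamma_a=1.0):
    """Objective to maximize: reconstruction term minus weighted KL penalties.

    Assumption: lambda_kl multiplies the whole bracket, gamma_a only the
    attention KL. With both set to 1.0 this is the unweighted objective.
    """
    return log_likelihood - lambda_kl * (kl_z + gamma_a * kl_a)
```

Setting `gamma_a` below 1 relaxes the penalty on the attention posterior relative to the penalty on z.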

PART

Question Generation

Stanford Question Answering Dataset (SQuAD; Rajpurkar et al., 2016)



Case study

PART

Using variational attention to solve the bypassing phenomenon

Generating more diversified sentences while retaining high quality
