Variational Attention for Sequence-to-Sequence Models
Source : COLING 2018
Speaker : Ya-Fang, Hsiao
Advisor : Jia-Ling, Koh
Date : 2020/01/03
PART
Introduction
                 Auto-Encoder    Encoder-Decoder
Deterministic    DAE             DED
Variational      VAE             VED
PART
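Variational Autoencoder [Bowman et al., 2016]: Generating Sentences from a Continuous Space
$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right)$
First term: data likelihood under the posterior (cross entropy)
Second term: KL divergence of the posterior from the prior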
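The two terms of this objective can be sketched numerically. The snippet below is a minimal illustration, assuming a diagonal Gaussian posterior and a standard normal prior p(z) = N(0, I); the function names, the `log_var` parameterization and the example values are assumptions made for this sketch, not taken from the slides.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def elbo(log_likelihood, mu, log_var):
    """Single-sample estimate of the objective: log p(x|z) minus the KL term."""
    return log_likelihood - kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
mu = np.array([0.5, -0.5])      # hypothetical posterior mean
log_var = np.zeros(2)           # hypothetical posterior log-variance (sigma = 1)
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In a real model `log_likelihood` would come from the decoder evaluated at the sampled `z`; here it is simply an argument so the sketch stays independent of any particular network.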
Variational Seq2Seq model
[Figure: panels A and B]
Bypassing phenomenon
[Figure: panels C and D]
PART
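VED+VAttn: Variational Attention for Variational Encoder Decoder
Joint objective over the latent code z and the attention vector a:
$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z, a \mid x)}\left[\log p_\theta(y \mid z, a)\right] - \mathrm{KL}\left(q_\phi(z, a \mid x) \,\|\, p(z, a)\right)$
Factorizing the posterior and the prior over z and a:
$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi^{(z)}(z \mid x),\, q_\phi^{(a)}(a \mid x)}\left[\log p_\theta(y \mid z, a)\right] - \mathrm{KL}\left(q_\phi^{(z)}(z \mid x) \,\|\, p(z)\right) - \mathrm{KL}\left(q_\phi^{(a)}(a \mid x) \,\|\, p(a)\right)$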
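To make the idea of treating the attention vector a as a random variable concrete, the following is a rough sketch. It assumes dot-product attention, a posterior mean equal to the deterministic attention vector, and a log-variance produced by a single hypothetical layer `W_sigma`; none of these choices is specified on the slides, so they should be read as illustrative assumptions only.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def deterministic_attention(h_src, h_tgt):
    """Weighted sum of source states; dot-product scoring is an assumed choice."""
    alpha = softmax(h_src @ h_tgt)   # attention weights over source positions
    return alpha @ h_src             # attention (context) vector

def variational_attention(h_src, h_tgt, W_sigma, rng):
    """Sample a ~ N(mu_a, diag(sigma_a^2)) around the deterministic attention vector."""
    a_det = deterministic_attention(h_src, h_tgt)
    mu_a = a_det                           # assumption: mean is the deterministic vector
    log_var_a = np.tanh(W_sigma @ a_det)   # assumption: hypothetical variance layer
    eps = rng.standard_normal(mu_a.shape)
    a = mu_a + np.exp(0.5 * log_var_a) * eps
    return a, mu_a, log_var_a

rng = np.random.default_rng(0)
h_src = rng.standard_normal((5, 4))   # 5 source positions, hidden size 4 (made-up sizes)
h_tgt = rng.standard_normal(4)        # current decoder state
W_sigma = np.zeros((4, 4))            # zero weights give unit variance in this sketch
a, mu_a, log_var_a = variational_attention(h_src, h_tgt, W_sigma, rng)
```

Because `a` is sampled rather than computed deterministically, the decoder no longer receives a noise-free copy of the source information, and `mu_a` and `log_var_a` are what the attention KL term would be evaluated on.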
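Prior p(a), two choices:
1. $\mathcal{N}(0, I)$
2. $\mathcal{N}(\bar{h}^{(src)}, I)$, where $\bar{h}^{(src)}$ is the mean of the source hidden states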
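The two priors differ only in their mean, so the attention KL term can be sketched with one closed-form function. The snippet assumes a diagonal Gaussian posterior over a and takes the second prior mean to be the average of the source hidden states; the function name and the example values are hypothetical.

```python
import numpy as np

def kl_to_gaussian_prior(mu, log_var, prior_mean):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(prior_mean, I) )."""
    diff = mu - prior_mean
    return 0.5 * np.sum(np.exp(log_var) + diff ** 2 - 1.0 - log_var, axis=-1)

h_src = np.array([[1.0, 2.0],
                  [3.0, 4.0]])    # made-up source hidden states
h_bar = h_src.mean(axis=0)        # assumed prior mean for choice 2

mu_a = np.array([2.5, 3.0])       # hypothetical posterior mean of the attention vector
log_var_a = np.zeros(2)           # hypothetical posterior log-variance

kl_standard = kl_to_gaussian_prior(mu_a, log_var_a, np.zeros(2))   # prior 1: N(0, I)
kl_source = kl_to_gaussian_prior(mu_a, log_var_a, h_bar)           # prior 2: N(h_bar, I)
```

In this toy case the posterior mean lies close to the mean source state, so the KL against the second prior is much smaller than against N(0, I).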
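Training objective with KL weights $\lambda_{KL}$ and $\gamma_a$:
$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi^{(z)}(z \mid x),\, q_\phi^{(a)}(a \mid x)}\left[\log p_\theta(y \mid z, a)\right] - \lambda_{KL}\left[\mathrm{KL}\left(q_\phi^{(z)}(z \mid x) \,\|\, p(z)\right) + \gamma_a\, \mathrm{KL}\left(q_\phi^{(a)}(a \mid x) \,\|\, p(a)\right)\right]$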
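The role of the two weights can be sketched as a loss to minimize, i.e. the negated objective. This assumes that λ_KL scales both KL terms and that γ_a additionally scales the attention KL; the function name, the argument names and the numbers are hypothetical.

```python
import numpy as np

def ved_vattn_loss(rec_nll, kl_z, kl_a, lambda_kl=1.0, gamma_a=1.0):
    """Negative weighted objective: reconstruction NLL plus weighted KL terms.

    rec_nll : negative log-likelihood, -log p(y | z, a)
    kl_z    : KL of the posterior over z from its prior
    kl_a    : KL of the posterior over a from its prior (scalar or array, summed)
    """
    return rec_nll + lambda_kl * (kl_z + gamma_a * np.sum(kl_a))

# With both weights equal to 1 this is the plain negative of the factorized objective.
plain = ved_vattn_loss(rec_nll=2.0, kl_z=0.5, kl_a=0.25)

# A smaller gamma_a down-weights the attention KL relative to the KL on z.
weighted = ved_vattn_loss(rec_nll=2.0, kl_z=0.5, kl_a=0.25, lambda_kl=1.0, gamma_a=0.1)
```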
PART
Question Generation: Stanford Question Answering Dataset (SQuAD, Rajpurkar et al., 2016)
Case study
PART
Variational attention addresses the bypassing phenomenon
Generated sentences are more diverse while retaining high quality