Unsupervised Transfer Learning
Irene Li, 18 July 2018

Unsupervised Transfer Learning - WordPress.com · Overview: related areas Transfer Learning Tutorial (Hung-yi Lee) Domain Adaptation

May 29, 2020




Why Transfer Learning?

Ganin, Yaroslav, and Victor Lempitsky. "Unsupervised domain adaptation by backpropagation." arXiv preprint arXiv:1409.7495 (2014).

Train on MNIST: .9891; test on MNIST-M: .5749 (DROP)


Overview: related areas

Transfer Learning Tutorial (Hung-yi Lee)

Domain Adaptation


Transfer Learning in NLP?

▪ Unsupervised TL setting

▪ Source domain:

▪ Target domain: ?

▪ Problems in NLP

• Frequency bias: the same word has different frequencies in different domains

• Context feature bias: e.g. “monitor” in Wall Street Journal and Amazon reviews
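Both biases can be seen with simple counts. The sketch below is only an illustration: the two token lists are made-up toy sentences standing in for a news domain and a review domain, and `relative_freq` and `context_words` are hypothetical helper names.

```python
from collections import Counter

def relative_freq(tokens):
    """Relative frequency of every word in one domain."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def context_words(tokens, word, window=1):
    """Count the neighbours of `word` within `window` positions."""
    ctx = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return ctx

# Toy stand-ins for two domains (assumed data, not from any real corpus).
news = "regulators monitor the market and monitor bank lending".split()
reviews = "the monitor has a sharp screen and the stand is sturdy".split()

# Frequency bias: the same word has a different relative frequency.
print(relative_freq(news)["monitor"], relative_freq(reviews)["monitor"])
# Context feature bias: the same word appears with different neighbours.
print(context_words(news, "monitor"), context_words(reviews, "monitor"))
```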


General Methods

Feature-based method (popular!):

Transfer the features into the same feature space!

Multi-layer feature learning (representation learning)

Model-based method:

Parameter init + fine-tune (a lot!)

Parameter sharing

Instance-based method:

Re-weighting: make source inputs similar to target inputs

Pseudo samples for target domain


Feature-based method: Intuition

[Figure: Source Domain and Target Domain mapped into a New Feature Space]


Source domain with labels

Target domain without labels

Feature-based method: Deep Adaptation Network

Long, Mingsheng, et al. "Learning transferable features with deep adaptation networks." ICML (2015).


Source error (CNN loss) + domain discrepancy (MK-MMD)

Loss function: discriminativeness and domain invariance

Multi-kernel Maximum Mean Discrepancy
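The two terms above can be written as a single objective. This is a sketch of the form used in Long et al. (2015) as recalled for this note, not a formula taken from the slide. The symbols are assumed notation: n_a labeled source examples, J the classification loss of the CNN, λ the trade-off weight, l_1..l_2 the adapted layers, and d_k^2 the MK-MMD between source and target activations at layer ℓ.

```latex
\min_{\Theta}\;
\underbrace{\frac{1}{n_a}\sum_{i=1}^{n_a} J\bigl(\theta(x_i^a),\, y_i^a\bigr)}_{\text{source error (discriminativeness)}}
\;+\;
\lambda \underbrace{\sum_{\ell = l_1}^{l_2} d_k^2\bigl(\mathcal{D}_s^{\ell},\, \mathcal{D}_t^{\ell}\bigr)}_{\text{domain discrepancy (domain invariance)}}
```

A larger λ pushes the adapted layers toward domain-invariant features at the cost of source accuracy.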


Maximum Mean Discrepancy (MMD)

Two-sample problem (unknown p and q):

Maximum Mean Discrepancy (Müller, 1997): map the layer features into a Reproducing Kernel Hilbert Space H with kernel function k:

Cost of the empirical estimate: O(n^2)
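A minimal NumPy sketch of the quadratic-time estimate, assuming a single Gaussian kernel with a fixed bandwidth. The squared MMD is estimated as mean k(x, x') + mean k(y, y') - 2 mean k(x, y), which needs all pairwise kernel values and is therefore O(n^2). The function names and the synthetic Gaussian samples are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2_biased(x, y, sigma=1.0):
    """Biased O(n^2) estimate of MMD^2 between samples x ~ p and y ~ q."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

# Synthetic "features": one target set matches the source, one is shifted.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 5))
target_same = rng.normal(0.0, 1.0, size=(200, 5))
target_shifted = rng.normal(2.0, 1.0, size=(200, 5))

print(mmd2_biased(source, target_same))     # small: same distribution
print(mmd2_biased(source, target_shifted))  # larger: shifted distribution
```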


MK-MMD: Optimization

Unbiased estimation in O(n):

Kernel: Gaussian kernel (RBF); the bandwidth sigma can be estimated.

Multi-kernel:
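A sketch of the linear-time estimate with a multi-kernel, under stated assumptions: samples are grouped into disjoint quadruples (x1, x2, y1, y2) and the statistic k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1) is averaged over them, following the linear-time two-sample test of Gretton et al. The multi-kernel is taken to be a convex combination of Gaussian kernels. The bandwidths in `sigmas` and the uniform `betas` are placeholder choices, not values from the slides; in DAN the kernel weights are learned.

```python
import numpy as np

def rbf(a, b, sigma):
    """Row-wise Gaussian kernel values k(a_i, b_i)."""
    return np.exp(-np.sum((a - b) ** 2, axis=1) / (2.0 * sigma**2))

def multi_kernel(a, b, sigmas, betas):
    """Convex combination of Gaussian kernels: sum_u beta_u * k_u(a_i, b_i)."""
    return sum(beta * rbf(a, b, sigma) for sigma, beta in zip(sigmas, betas))

def mk_mmd2_linear(x, y, sigmas=(0.5, 1.0, 2.0), betas=None):
    """Unbiased O(n) estimate of MK-MMD^2 from disjoint sample pairs."""
    n = min(len(x), len(y)) // 2 * 2  # use an even number of samples
    if n < 2:
        raise ValueError("need at least two samples per domain")
    if betas is None:
        betas = np.full(len(sigmas), 1.0 / len(sigmas))  # uniform weights (assumption)
    x1, x2 = x[0:n:2], x[1:n:2]
    y1, y2 = y[0:n:2], y[1:n:2]
    h = (
        multi_kernel(x1, x2, sigmas, betas)
        + multi_kernel(y1, y2, sigmas, betas)
        - multi_kernel(x1, y2, sigmas, betas)
        - multi_kernel(x2, y1, sigmas, betas)
    )
    return h.mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(2000, 2))
tgt_same = rng.normal(0.0, 1.0, size=(2000, 2))
tgt_shifted = rng.normal(3.0, 1.0, size=(2000, 2))

print(mk_mmd2_linear(src, tgt_same))     # close to zero, may be slightly negative
print(mk_mmd2_linear(src, tgt_shifted))  # clearly positive
```

Because the estimate is unbiased, it can come out slightly negative when the two distributions match; that is expected and not a bug.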


Replace MK-MMD with CORAL Loss (Second-order Statistics)

Unsupervised domain adaptation method: aligns the second-order statistics of the source and target distributions with a linear transformation

Sun, Baochen, and Kate Saenko. "Deep coral: Correlation alignment for deep domain adaptation." European Conference on Computer Vision. Springer, Cham, 2016.

Correlation Matrix
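A minimal sketch of the CORAL loss on a batch of features, assuming the Deep CORAL form: the squared Frobenius distance between the source and target feature covariance matrices, scaled by 1 / (4 d^2) for d-dimensional features. The function name and the synthetic batches are illustrative.

```python
import numpy as np

def coral_loss(source, target):
    """Squared Frobenius distance between feature covariances, over 4 d^2."""
    d = source.shape[1]
    # rowvar=False: rows are examples, columns are feature dimensions.
    c_s = np.atleast_2d(np.cov(source, rowvar=False))
    c_t = np.atleast_2d(np.cov(target, rowvar=False))
    return np.sum((c_s - c_t) ** 2) / (4.0 * d * d)

rng = np.random.default_rng(0)
src_feats = rng.normal(size=(128, 8))
tgt_feats = 3.0 * rng.normal(size=(128, 8))  # same mean, larger variance

print(coral_loss(src_feats, src_feats))  # 0.0
print(coral_loss(src_feats, tgt_feats))  # positive
```

Since only second-order statistics are compared, a pure mean shift between domains leaves this loss unchanged.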


Task: sentiment classification (pos or neg)

Method: word embeddings -> sentence encoding -> Logistic Regressor

Domain Adapted (DA) embeddings

Generic embeddings + Domain Specific (DS) embeddings via CCA/KCCA.

Word embeddings to sentence encoding:

i.e. a weighted combination of their constituent word embeddings.

Use a Logistic Regressor to do classification (pos or neg).
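The sentence-encoding step can be sketched as a weighted average of word vectors. This is an illustration under assumptions: the slide does not say which weights are used, so `weights` is a hypothetical per-word dictionary that defaults to uniform weighting, and the two-dimensional embeddings are toy values.

```python
import numpy as np

def encode_sentence(tokens, embeddings, weights=None):
    """Weighted average of the embeddings of the in-vocabulary tokens."""
    vecs, ws = [], []
    for tok in tokens:
        if tok in embeddings:  # skip out-of-vocabulary words
            vecs.append(embeddings[tok])
            ws.append(1.0 if weights is None else weights.get(tok, 1.0))
    if not vecs:
        dim = len(next(iter(embeddings.values())))
        return np.zeros(dim)
    ws = np.asarray(ws)
    return (ws[:, None] * np.asarray(vecs)).sum(axis=0) / ws.sum()

# Toy embeddings (assumed values for illustration only).
emb = {
    "good": np.array([1.0, 0.0]),
    "bad": np.array([-1.0, 0.0]),
    "food": np.array([0.0, 1.0]),
}

print(encode_sentence(["the", "good", "food"], emb))
print(encode_sentence(["good", "food"], emb, weights={"good": 3.0}))
```

The resulting fixed-length vectors are what the logistic regressor takes as input.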

- Canonical Correlation Analysis (CCA)

- Kernel CCA (nonlinear CCA)

Model-based Method (1): share word embeddings

Domain Adapted Word Embeddings for Improved Sentiment Classification ACL 2018 (short)


Combine two embedding feature spaces

LSA Embedding

GloVe Embedding

Domain Adapted Word Embeddings for Improved Sentiment Classification ACL 2018 (short)


Canonical Correlation Analysis (CCA): given two vectors X = (X1, ..., Xn) and Y = (Y1, ..., Ym) of random variables with correlations among the variables, canonical-correlation analysis finds linear combinations of X and Y that have maximum correlation with each other.

Canonical Correlation Analysis (CCA)

Domain Adapted Word Embeddings for Improved Sentiment Classification ACL 2018 (short)

LSA Embedding * Mapping

GloVe Embedding * Mapping

Final Embedding
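The pipeline in the figure (embedding * mapping for each space, then a combined final embedding) can be sketched with a plain linear CCA in NumPy. This is a sketch under assumptions, not the paper's implementation: rows are words shared by both vocabularies, `x_ds` stands for the domain-specific (LSA) matrix, `x_gen` for the generic (GloVe) matrix, the matrices are synthetic, and the equal weights `alpha = beta = 0.5` for the final combination are an assumed choice.

```python
import numpy as np

def inv_sqrt(mat, eps=1e-8):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T

def cca(x, y, k, reg=1e-6):
    """Return mappings (a, b) and the top-k canonical correlations."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    c_xx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    c_yy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    c_xy = x.T @ y / (n - 1)
    wx, wy = inv_sqrt(c_xx), inv_sqrt(c_yy)
    # Singular values of the whitened cross-covariance are the correlations.
    u, s, vt = np.linalg.svd(wx @ c_xy @ wy)
    return wx @ u[:, :k], wy @ vt.T[:, :k], s[:k]

def domain_adapted(x_ds, x_gen, k, alpha=0.5, beta=0.5):
    """Project both embeddings with CCA and combine them (weights assumed)."""
    a, b, _ = cca(x_ds, x_gen, k)
    ds_proj = (x_ds - x_ds.mean(axis=0)) @ a
    gen_proj = (x_gen - x_gen.mean(axis=0)) @ b
    return alpha * ds_proj + beta * gen_proj

# Synthetic stand-ins: 500 words, a shared 2-d latent signal plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
lsa = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))
glove = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))

a, b, corrs = cca(lsa, glove, k=2)
final = domain_adapted(lsa, glove, k=2)
print(corrs, final.shape)
```

Kernel CCA follows the same idea with the linear maps replaced by kernel-induced ones; it is not covered by this sketch.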


Model-based Method (1): share word embeddings

Domain Adapted Word Embeddings for Improved Sentiment Classification ACL 2018 (short)

Result on Yelp Dataset


Datasets:

(Source) MovieQA

(Target 1) TOEFL listening comprehension

(Target 2) MCTest

Task: QA

Read an article and a question, then pick the correct answer from 4 or 5 choices.

Models: MemN2N, QACNN

Model-based Method (2): fine-tune

Supervised and Unsupervised Transfer Learning for Question Answering. NAACL, 2018


Supervised and Unsupervised Transfer Learning for Question Answering. NAACL, 2018

Model-based Method (2): fine-tune

Results


References

Muandet, Krikamol, et al. "Kernel mean embedding of distributions: A review and beyond." Foundations and Trends® in Machine Learning 10.1-2 (2017): 1-141.

Gretton, Arthur, et al. "A kernel method for the two-sample-problem." Advances in neural information processing systems. 2007.

Mou, Lili, et al. "How transferable are neural networks in NLP applications?" arXiv preprint arXiv:1603.06111 (2016).

Pan, Sinno Jialin, and Qiang Yang. "A survey on transfer learning." IEEE Transactions on knowledge and data engineering 22.10 (2010): 1345-1359.

Sun, Baochen, Jiashi Feng, and Kate Saenko. "Return of frustratingly easy domain adaptation." AAAI. Vol. 6. No. 7. 2016.

http://alex.smola.org/icml2008/

https://github.com/jindongwang/transferlearning

http://cs231n.github.io/transfer-learning/


Q & A