Irene Li, 18 July 2018
Unsupervised Transfer Learning
Why Transfer Learning?
Ganin, Yaroslav, and Victor Lempitsky. "Unsupervised domain adaptation by backpropagation." arXiv preprint arXiv:1409.7495 (2014).
Train on MNIST: 0.9891; test on MNIST-M: 0.5749 (a big drop)
Overview: related areas
Transfer Learning Tutorial (Hung-yi Lee)
Domain Adaptation
Transfer Learning in NLP?
▪ Unsupervised TL setting
▪ Source domain: labeled data
▪ Target domain: ?
▪ Problems in NLP
• Frequency bias: the same word has different frequencies in different domains
• Context feature bias: e.g. “monitor” in the Wall Street Journal vs. Amazon reviews
General Methods
Feature-based method (popular!):
Map the features into a shared feature space!
Multi-layer feature learning (representation learning)
Model-based method:
Parameter init + fine-tune (very common!)
Parameter sharing
Instance-based method:
Re-weighting: make source inputs similar to target inputs
Pseudo samples for target domain
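The instance re-weighting idea can be sketched with a domain classifier: train a classifier to tell source (0) from target (1); its odds approximate the density ratio p_target(x)/p_source(x) used to re-weight source examples. This density-ratio-via-classifier trick and all data below are illustrative assumptions, not from the slides.

```python
import numpy as np

def importance_weights(X_s, X_t, steps=500, lr=0.1):
    """Instance re-weighting sketch: fit a tiny logistic-regression
    domain classifier (source=0, target=1) by gradient descent; the
    odds p(target|x)/p(source|x) on source points serve as weights."""
    X = np.vstack([X_s, X_t])
    X = np.hstack([X, np.ones((len(X), 1))])           # bias column
    y = np.concatenate([np.zeros(len(X_s)), np.ones(len(X_t))])
    w = np.zeros(X.shape[1])
    for _ in range(steps):                             # plain gradient descent
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    p_s = 1.0 / (1.0 + np.exp(-X[: len(X_s)] @ w))     # p(target | x) on source
    return p_s / np.clip(1.0 - p_s, 1e-12, None)       # odds ratio as weight

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(200, 5))   # toy source: mean 0
Xt = rng.normal(0.5, 1.0, size=(200, 5))   # toy target: shifted mean
weights = importance_weights(Xs, Xt)
```

Source examples that look more like target data receive larger weights, so a downstream source-trained classifier pays more attention to them.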
Feature-based method: Intuition
Source Domain, Target Domain → New Feature Space
Source domain with labels
Target domain without labels
Feature-based method: Deep Adaptation Network
Long, Mingsheng, et al. "Learning transferable features with deep adaptation networks." ICML (2015).
Source error (CNN loss) + domain discrepancy (MK-MMD)
Loss function: discriminativeness and domain invariance
Multi-kernel Maximum Mean Discrepancy
Maximum Mean Discrepancy (MMD)
Two-sample problem (unknown p and q): are samples X ~ p and Y ~ q drawn from the same distribution?
Maximum Mean Discrepancy (Müller, 1997): map the layer representations into a Reproducing Kernel Hilbert Space H with kernel function k:
MMD^2(p, q) = || E_{x~p}[phi(x)] - E_{y~q}[phi(y)] ||_H^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')]
The plug-in estimate costs O(n^2).
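The quadratic-time MMD estimate can be written in a few lines; a minimal numpy sketch, assuming a Gaussian kernel with a fixed bandwidth:

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased O(n^2) estimate of squared MMD between samples X ~ p and Y ~ q:
    mean k(x, x') - 2 mean k(x, y) + mean k(y, y')."""
    return (rbf(X, X, sigma).mean()
            - 2 * rbf(X, Y, sigma).mean()
            + rbf(Y, Y, sigma).mean())

rng = np.random.default_rng(0)
X  = rng.normal(0, 1, size=(100, 2))
Y1 = rng.normal(0, 1, size=(100, 2))   # same distribution as X
Y2 = rng.normal(2, 1, size=(100, 2))   # shifted distribution
```

`mmd2(X, Y1)` stays near zero while `mmd2(X, Y2)` is clearly larger, which is what makes MMD usable as a domain-discrepancy penalty.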
MK-MMD: Optimization
Unbiased estimation in O(n), pairing samples z_i = (x_i, y_i):
MMD_l^2 = (2/n) * sum_i h(z_{2i-1}, z_{2i}), with h((x, y), (x', y')) = k(x, x') + k(y, y') - k(x, y') - k(x', y)
Kernel: Gaussian (RBF) kernel; the bandwidth sigma can be estimated (e.g. with the median heuristic).
Multi-kernel: k = sum_u beta_u * k_u, a weighted combination of several base kernels.
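Because squared MMD is linear in the kernel, the MK-MMD under k = sum_u beta_u k_u is just the beta-weighted sum of per-kernel squared MMDs. A sketch with uniform weights over a few RBF bandwidths (DAN learns the betas jointly; uniform betas and the bandwidth grid here are assumptions):

```python
import numpy as np

def rbf_mmd2(X, Y, sigma):
    """Quadratic-time squared MMD under a single RBF kernel."""
    k = lambda A, B: np.exp(
        -((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    return k(X, X).mean() - 2 * k(X, Y).mean() + k(Y, Y).mean()

def mk_mmd2(X, Y, sigmas=(0.5, 1.0, 2.0, 4.0), betas=None):
    """MK-MMD sketch: beta-weighted sum of per-kernel squared MMDs,
    valid because MMD^2 is linear in the kernel."""
    betas = betas or [1.0 / len(sigmas)] * len(sigmas)
    return sum(b * rbf_mmd2(X, Y, s) for b, s in zip(betas, sigmas))

rng = np.random.default_rng(0)
X  = rng.normal(0, 1, size=(80, 3))
Y1 = rng.normal(0, 1, size=(80, 3))   # same distribution
Y2 = rng.normal(2, 1, size=(80, 3))   # shifted distribution
```

Using several bandwidths hedges against a single sigma being too narrow or too wide for the layer being matched.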
Replace MK-MMD with Coral Loss (Second-order Statistics)
CORAL is an unsupervised domain adaptation method: it aligns the second-order statistics of the source and target feature distributions with a linear transformation.
Sun, Baochen, and Kate Saenko. "Deep coral: Correlation alignment for deep domain adaptation." European Conference on Computer Vision. Springer, Cham, 2016.
Covariance matrices C_S, C_T; loss: l_CORAL = ||C_S - C_T||_F^2 / (4 d^2)
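The CORAL loss is just the Frobenius distance between the two feature covariance matrices, scaled by 1/(4 d^2). A minimal numpy sketch:

```python
import numpy as np

def coral_loss(Xs, Xt):
    """Deep CORAL loss sketch (Sun & Saenko, 2016): squared Frobenius
    distance between source and target feature covariances, / (4 d^2)."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False)   # source covariance, features in columns
    Ct = np.cov(Xt, rowvar=False)   # target covariance
    return ((Cs - Ct) ** 2).sum() / (4 * d ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
```

The loss is zero when the two covariances match and grows as the second-order statistics diverge, so minimizing it alongside the source classification loss pushes the network toward domain-invariant features.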
Task: sentiment classification (pos or neg)
Method: word embeddings -> sentence encoding -> Logistic Regressor
Domain Adapted (DA) embeddings
Generic embeddings + Domain Specific (DS) embeddings via CCA/KCCA.
Word embeddings to sentence encoding: a weighted combination of the constituent word embeddings.
Use a Logistic Regressor to do classification (pos or neg).
- Canonical Correlation Analysis (CCA)
- Kernel CCA (nonlinear CCA)
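The "word embeddings -> sentence encoding" step above can be sketched as a weighted average; the slides only say "weighted combination", so the SIF-style weight a/(a + p(w)) used here is an assumption, as are the toy vocabulary and frequencies:

```python
import numpy as np

def sentence_encoding(tokens, emb, word_freq, a=1e-3):
    """Weighted average of word embeddings. Weight a/(a + p(w)) downweights
    frequent words (SIF-style); the exact weighting scheme is an assumption."""
    vecs = [emb[w] * (a / (a + word_freq.get(w, 0.0)))
            for w in tokens if w in emb]            # skip out-of-vocab tokens
    if not vecs:
        return np.zeros(next(iter(emb.values())).shape)
    return np.mean(vecs, axis=0)

# Toy vocabulary for illustration only.
emb = {"good": np.ones(3), "movie": np.zeros(3)}
freq = {"good": 0.01, "movie": 0.02}
vec = sentence_encoding(["good", "movie", "oov"], emb, freq)
```

The resulting fixed-size vector is then fed to the logistic regressor for pos/neg classification.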
Model-based Method (1): share word embeddings
Domain Adapted Word Embeddings for Improved Sentiment Classification ACL 2018 (short)
Combine two embedding feature space
LSA Embedding
GloVe Embedding
Canonical Correlation Analysis (CCA): given two sets of random variables X = (X1, ..., Xn) and Y = (Y1, ..., Ym) with correlations among them, CCA finds linear combinations of X and Y that have maximum correlation with each other.
Canonical Correlation Analysis (CCA)
LSA Embedding × Mapping
GloVe Embedding × Mapping
→ Final Embedding
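A minimal numpy sketch of linear CCA combining two embedding views of the same vocabulary. The whitened-SVD formulation is standard; averaging the two projected views into the final embedding, the regularizer, and the random data are assumptions about the exact combination rule:

```python
import numpy as np

def cca_combine(X, Y, k):
    """Rows of X (e.g. GloVe) and Y (e.g. domain-specific LSA) correspond
    to the same words. Returns one k-dim "domain adapted" row per word."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n, reg = len(X), 1e-8                    # small ridge for stability
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):                          # C^(-1/2) via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)   # whitened cross-covariance
    A = Wx @ U[:, :k]                         # "mapping" for view X
    B = Wy @ Vt[:k].T                         # "mapping" for view Y
    return (X @ A + Y @ B) / 2                # assumed combination: average

rng = np.random.default_rng(0)
da_emb = cca_combine(rng.normal(size=(50, 10)), rng.normal(size=(50, 8)), k=4)
```

KCCA follows the same pattern but replaces the linear covariances with kernel matrices.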
Model-based Method (1): share word embeddings
Result on Yelp Dataset
Datasets:
(Source) MovieQA
(Target 1) TOEFL listening comprehension
(Target2) MCTest
Task: QA
Read an article and a question, then find the correct answer from 4 or 5 choices.
Models: MemN2N, QACNN
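The paper fine-tunes MemN2N/QACNN pretrained on MovieQA; as a stand-in, the init + fine-tune recipe can be sketched with a toy logistic regression, pretrained on a large "source" set and then continued on a small "target" set with a lower learning rate. All data and hyperparameters here are synthetic illustrations:

```python
import numpy as np

def train_logreg(X, y, w=None, steps=300, lr=0.5):
    """Minimal logistic-regression trainer; passing in `w` continues
    training from existing parameters (the fine-tuning step)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])          # bias column
    w = np.zeros(Xb.shape[1]) if w is None else w.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(0)
# Stand-ins for a large labeled source task and a small target task.
Xs = rng.normal(size=(500, 5)); ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(float)
Xt = rng.normal(size=(40, 5)) + 0.3; yt = (Xt[:, 0] + Xt[:, 1] > 0).astype(float)

w_src = train_logreg(Xs, ys)                       # 1) pretrain on source
w_ft  = train_logreg(Xt, yt, w=w_src, lr=0.05)     # 2) fine-tune on target
```

The smaller fine-tuning learning rate keeps the parameters close to the source solution, which is the usual reason init + fine-tune beats training on the small target set from scratch.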
Model-based Method (2): fine-tune
Supervised and Unsupervised Transfer Learning for Question Answering. NAACL 2018.
Model-based Method (2): fine-tune
Results
References
Muandet, Krikamol, et al. "Kernel mean embedding of distributions: A review and beyond." Foundations and Trends® in Machine Learning 10.1-2 (2017): 1-141.
Gretton, Arthur, et al. "A kernel method for the two-sample-problem." Advances in neural information processing systems. 2007.
Mou, Lili, et al. "How transferable are neural networks in NLP applications?" arXiv preprint arXiv:1603.06111 (2016).
Pan, Sinno Jialin, and Qiang Yang. "A survey on transfer learning." IEEE Transactions on knowledge and data engineering 22.10 (2010): 1345-1359.
Sun, Baochen, Jiashi Feng, and Kate Saenko. "Return of frustratingly easy domain adaptation." AAAI. Vol. 6. No. 7. 2016.
http://alex.smola.org/icml2008/
https://github.com/jindongwang/transferlearning
http://cs231n.github.io/transfer-learning/
Q & A