Transfer Learning Task

Problem Identification

Dataset A (Year: 2000, Features: 48)

Training -> Model ‘M’ -> Testing: 98.6%

Training -> Model ‘M’ -> Testing: 97%

Dataset B (Year: 2006, Features: 96)

Training -> Model ‘M’ -> Testing: 60.9% ??

Transfer Learning

Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.

Traditional Machine Learning vs. Transfer Learning

[Figure: In traditional machine learning, each of the different tasks gets its own learning system. In transfer learning, knowledge from the source task is passed to the learning system for the target task.]

Transfer Learning Definition

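Given a source domain $\mathcal{D}_S$ and source learning task $\mathcal{T}_S$, a target domain $\mathcal{D}_T$ and a target learning task $\mathcal{T}_T$, transfer learning aims to help improve the learning of the target predictive function $f_T(\cdot)$ in $\mathcal{D}_T$ using the source knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.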

Transfer Definition

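• Therefore, transfer learning applies if either of the following holds:

  • Domain differences: the source and target domains differ.

  • Task differences: the source and target tasks differ.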
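The two cases can be written out in the notation commonly used for this definition. The symbols below are an assumption, since the slide's own formulas did not survive extraction: a domain pairs a feature space $\mathcal{X}$ with a marginal distribution $P(X)$, and a task pairs a label space $\mathcal{Y}$ with a predictive function $f(\cdot)$.

```latex
\begin{align*}
\mathcal{D} &= \{\mathcal{X},\, P(X)\}, &
\mathcal{T} &= \{\mathcal{Y},\, f(\cdot)\} \\
\mathcal{D}_S \neq \mathcal{D}_T &\iff \mathcal{X}_S \neq \mathcal{X}_T
  \ \text{or}\ P_S(X) \neq P_T(X) \\
\mathcal{T}_S \neq \mathcal{T}_T &\iff \mathcal{Y}_S \neq \mathcal{Y}_T
  \ \text{or}\ f_S(\cdot) \neq f_T(\cdot)
\end{align*}
```

Under this reading, two datasets with different feature sets have different feature spaces (a domain difference), and two classification problems with different sets of classes have different label spaces (a task difference).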

Examples: Cancer Data

Feature set 1: Age, Smoking

Feature set 2: Age, Height, Smoking

Examples: Cancer Data

Task Source: Classify into cancer or no cancer

Task Target: Classify into cancer level one, cancer level two, cancer level three

Settings of Transfer Learning

Transfer learning settings        Labelled data in a source domain    Labelled data in a target domain    Tasks
Inductive Transfer Learning       ×                                   √                                   Classification, Regression, …
                                  √                                   √
Transductive Transfer Learning    √                                   ×                                   Classification, Regression, …
Unsupervised Transfer Learning    ×                                   ×                                   Clustering, …
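The settings above reduce to a lookup on which domains have labelled data. The following is a minimal sketch of that lookup; the function name `transfer_setting` and its return strings are hypothetical and chosen only for illustration.

```python
def transfer_setting(source_labelled: bool, target_labelled: bool) -> str:
    """Return the transfer learning setting for a given label availability."""
    # Labelled target data means the inductive setting,
    # whether or not the source domain is labelled.
    if target_labelled:
        return "inductive"
    # Labels only in the source domain: transductive setting.
    if source_labelled:
        return "transductive"
    # No labels in either domain: unsupervised setting.
    return "unsupervised"
```

For example, `transfer_setting(True, False)` returns `"transductive"`.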

Questions to answer when transferring

• What to transfer? Instances? Model? Features?

• How to transfer? Map model? Unify features? Weight instances?

• When to transfer? In which situations?

What to Transfer?

Transfer learning approaches and their descriptions:

• Instance-transfer: re-weight some labeled data in a source domain for use in the target domain.

• Feature-representation-transfer: find a “good” feature representation that reduces the difference between a source and a target domain, or minimizes the error of models.

• Model-transfer: discover shared parameters or priors of models between a source domain and a target domain.

• Relational-knowledge-transfer: build a mapping of relational knowledge between a source domain and a target domain.

Inductive Transfer Learning (Instance-transfer)

• Assumption: the source domain and target domain data use exactly the same features and labels.

• Motivation: Although the source domain data cannot be reused directly, some parts of the data can still be reused by re-weighting.

• Main Idea: Discriminatively adjust the weights of data in the source domain for use in the target domain.

Instance-transfer

• Assumptions:

  • Source and target tasks have the same feature space: $\mathcal{X}_S = \mathcal{X}_T$

  • Marginal distributions are different: $P_S(X) \neq P_T(X)$

Not all source data might be helpful!
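One simple way to see why some source instances help more than others is to weight each source sample by how common its region of feature space is in the target data relative to the source data. The sketch below uses a histogram density ratio on a single numeric feature. This is a swapped-in illustration, not the method from the slides, and `importance_weights` and `bin_width` are hypothetical names.

```python
from collections import Counter


def importance_weights(source_x, target_x, bin_width=1.0):
    """Weight each source sample by an estimate of p_target(x) / p_source(x)."""
    def bin_of(x):
        # Map a value to the index of its histogram bin.
        return int(x // bin_width)

    src_counts = Counter(bin_of(x) for x in source_x)
    tgt_counts = Counter(bin_of(x) for x in target_x)
    n_src, n_tgt = len(source_x), len(target_x)

    weights = []
    for x in source_x:
        b = bin_of(x)
        p_src = src_counts[b] / n_src   # never zero: x itself falls in bin b
        p_tgt = tgt_counts[b] / n_tgt   # zero if no target sample falls in bin b
        weights.append(p_tgt / p_src)
    return weights


# Hypothetical ages: the source mixes young and old patients, the target is older.
print(importance_weights([20, 25, 60, 65], [60, 62, 66, 70], bin_width=10))
```

Source samples from regions the target never visits receive weight zero, while samples from regions that are over-represented in the target are up-weighted.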

Algorithm: TrAdaBoost

• Idea: iteratively reweight source samples so as to:

  • reduce the effect of “bad” source instances

  • encourage the effect of “good” source instances

• Requires:

  • Source task labeled data set

  • Very small target task labeled data set

  • Unlabeled target data set

  • Base learner
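The reweighting idea can be sketched as a single update step. The code below follows the commonly published form of the TrAdaBoost update as an assumption, since the slides do not give the formulas: misclassified source samples are shrunk by a fixed factor, and misclassified target samples are grown AdaBoost-style. The function name, argument names and the clipping of the error are hypothetical, and training the base learner is left out.

```python
import math


def tradaboost_update(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One TrAdaBoost-style weight update (sketch).

    w_src, w_tgt       current weights of source / target labeled samples
    miss_src, miss_tgt 1 if the base learner misclassified the sample, else 0
    n_rounds           total number of boosting rounds planned
    """
    # Weighted error of the base learner, measured on target data only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    # Keep the error inside (0, 0.5) so the multipliers stay well defined.
    eps = min(max(eps, 1e-10), 0.499)

    beta_tgt = eps / (1.0 - eps)
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(w_src)) / n_rounds))

    # "Bad" source instances (misclassified) lose weight.
    new_src = [w * beta_src ** m for w, m in zip(w_src, miss_src)]
    # Misclassified target instances gain weight.
    new_tgt = [w * beta_tgt ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt, eps
```

In a full loop, the weights would be normalized, the base learner retrained on the combined weighted data, and this update repeated for `n_rounds` iterations.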

Self taught clustering

• Unsupervised transfer learning

  • Co-clustering, no labelled data

• Feature-based transfer learning

  • Features are not the same

  • Tasks may not be the same

• First applied to image clustering

• Key idea: find high-level shared features that give a new feature representation

Self Taught Learning


Latent Dirichlet Allocation (LDA)

• LDA is a generative probabilistic model of a corpus. The basic idea is that the documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words.

• Typically used for topic modeling

  • Forums, Twitter messages, text corpora

• Does not consider word order

• Can be viewed as a dimensionality reduction technique.
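The generative story behind LDA can be sketched in a few lines: draw a topic mixture for the document, then draw each word by first picking a topic and then picking a word from that topic. The topics, vocabulary and `alpha` value below are made up for illustration. This only simulates the generative process and does not perform inference.

```python
import random

# Hypothetical topics: each is a distribution over words.
TOPICS = {
    "health": {"cancer": 0.5, "smoking": 0.3, "age": 0.2},
    "ml": {"model": 0.4, "training": 0.4, "features": 0.2},
}


def sample_dirichlet(rng, alpha, k):
    """Draw a k-dimensional symmetric Dirichlet sample via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, topics, alpha, n_words):
    """Generate a bag of words from a random mixture over latent topics."""
    names = list(topics)
    theta = sample_dirichlet(rng, alpha, len(names))  # per-document topic mixture
    words = []
    for _ in range(n_words):
        topic = rng.choices(names, weights=theta, k=1)[0]      # pick a topic
        vocab = list(topics[topic])
        probs = [topics[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs, k=1)[0])  # pick a word
    return theta, words


theta, words = generate_document(random.Random(0), TOPICS, alpha=0.5, n_words=8)
print(theta, words)
```

Shuffling `words` would not change its probability under this process, which is the sense in which word order is ignored. The two-number `theta` is also a compact summary of the document, which is why LDA can be read as dimensionality reduction.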
