Semi-supervised and Active Learning 4/22 Amr Credit: lecture slides

Semi-supervised and Active Learningepxing/Class/10701-10s/recitation/...Active Learning •When does it help? Passive = Active Active learning is useful if complexity of target function

Jul 07, 2020

Page 1:

Semi-supervised and Active Learning

4/22

Amr

Credit: lecture slides

Page 2:

The big picture

• Semi-supervised Learning

• Active Learning

[Figure labels: Learning algorithm; Selective labeling; Learning algorithm]

Page 3:

How?

• There is no free lunch!

• You need to make assumptions

• Leverage them to construct an algorithm

• If the assumptions are correct, we can improve

Page 4:

Assumption: Overview

• Active learning

– Uncertainty sampling: query the instances the model is least confident about

– Query-by-committee (QBC): use ensembles to rapidly reduce the version space

• Semi-supervised learning

– Self-training, expectation-maximization (EM): propagate confident labelings among unlabeled data

– Co-training, multi-view learning: use ensembles with multiple views to constrain the version space

• Both try to attack the same problem: making the most of unlabeled data U

Page 5:

Semi-supervised

• Assumption: if x and x’ are similar, then they are likely to have the same label

• Algorithms

– Assume a generative model

– Cluster and label

– Regularize the classifier using unlabeled data

– Multi-view learning

• Does it help?
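
One simple way to act on the similarity assumption is to propagate labels with 1-nearest-neighbor: repeatedly take the unlabeled point that is closest to any already-labeled point and give it that neighbor's label. The sketch below is an illustration under that assumption, not necessarily the exact procedure from the lecture; the function name `propagate_1nn` and the toy data are made up for this example.

```python
import math


def propagate_1nn(labeled, unlabeled):
    """Greedy 1-NN label propagation (hypothetical helper).

    labeled:   list of (point, label) pairs, points are tuples of floats
    unlabeled: list of points
    Returns a list of (point, label) pairs covering every input point.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Find the unlabeled point that is closest to any labeled point.
        _, i, y = min(
            ((math.dist(u, p), i, y)
             for i, u in enumerate(remaining)
             for p, y in labeled),
            key=lambda t: t[0],
        )
        # Commit that label and treat the point as labeled from now on.
        labeled.append((remaining.pop(i), y))
    return labeled


# Toy data: a chain of points starting at the "a" example.
labeled = [((0.0,), "a"), ((10.0,), "b")]
unlabeled = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,), (9.0,)]
result = dict(propagate_1nn(labeled, unlabeled))
```

Here the point at 6.0 ends up labeled "a" because the label travels along the chain, whereas plain 1-NN on the two labeled points alone would call it "b" (distance 4 to the "b" example versus 6 to the "a" example). The same chaining is also why the method can fail: a single stray point between two clusters can carry a label across the gap.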

Page 6:

Example

Page 7:

Example: 1-NN, works!

Page 8:

Example: 1-NN, doesn’t work

Page 9:

Can we be more robust?

• So, in general, how do we deal with this problem?

– Generative model

– Regularization

Page 10:

SSL using Mixture Models

• Use all the data, not one point at a time!

Page 11:

SSL using Mixture Models

Page 12:

SSL using Mixture Models

• Inference and learning

– This was your midterm problem!

– You know more than you think you do!

• Is this robust to noise?

– At least you get the Bayes-optimal classifier if the assumption is correct

– What if the assumptions are wrong?
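
As a rough sketch of the inference and learning step, the code below runs EM on a two-class, one-dimensional Gaussian mixture in which labeled points keep their known class and unlabeled points receive soft class responsibilities. The Gaussian class-conditionals, the function names, and the toy data are assumptions for illustration; the model used in the lecture may differ.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, n_iter=50):
    """EM for a 2-class 1-D Gaussian mixture (illustrative sketch).

    labeled:   list of (x, k) with class k in {0, 1}; each class is assumed
               to have at least one labeled point
    unlabeled: list of x values, assumed to lie on a modest numeric scale
    Returns (prior, mean, var), each a dict keyed by class.
    """
    classes = (0, 1)
    n = len(labeled) + len(unlabeled)
    # Initialise the parameters from the labeled data alone.
    prior = {k: 0.5 for k in classes}
    mean = {}
    for k in classes:
        xs = [x for x, y in labeled if y == k]
        mean[k] = sum(xs) / len(xs)
    var = {k: 1.0 for k in classes}

    for _ in range(n_iter):
        # E-step: soft class responsibilities for the unlabeled points only.
        resp = []
        for x in unlabeled:
            joint = {k: prior[k] * gaussian_pdf(x, mean[k], var[k]) for k in classes}
            total = sum(joint.values())
            resp.append({k: joint[k] / total for k in classes})
        # M-step: labeled points count with weight 1 for their own class.
        for k in classes:
            weighted = [(x, 1.0 if y == k else 0.0) for x, y in labeled]
            weighted += [(x, r[k]) for x, r in zip(unlabeled, resp)]
            mass = sum(w for _, w in weighted)
            prior[k] = mass / n
            mean[k] = sum(w * x for x, w in weighted) / mass
            # Floor the variance so a class cannot collapse onto one point.
            var[k] = max(sum(w * (x - mean[k]) ** 2 for x, w in weighted) / mass, 1e-6)
    return prior, mean, var


prior, mean, var = semi_supervised_em(
    labeled=[(0.0, 0), (5.0, 1)],
    unlabeled=[-0.5, 0.2, 0.4, -0.1, 4.8, 5.3, 5.1, 4.6],
)
```

With only one labeled point per class, the variances cannot be estimated from the labeled data at all; the unlabeled points are what shape them. If the data are not actually Gaussian, the same updates still run but can pull the class boundary to the wrong place, which is the robustness question raised above.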

Page 13:

Page 14:

Can we be more robust?

• So, in general, how do we deal with this problem?

– Generative model

– Regularization

Page 15:

So why a new method?

• As we said earlier

• A different kind of assumption

• What if data is not Gaussian?

– Remember spectral clustering

Page 16:

Graph Regularization

• Regularized classifier

• Learn a classifier that minimizes

– Loss term + regularizer

– Example: ridge regression

• Can we use unlabeled data for regularization?

Loss on labeled data (mean square, 0-1)

Graph-based smoothness prior on labeled and unlabeled data
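
A minimal sketch of this idea, assuming squared loss on the labeled nodes plus the smoothness penalty lam * sum over pairs i<j of w_ij * (f_i - f_j)^2 on a given similarity graph. Setting the gradient to zero gives a per-node update, which the code applies repeatedly (Gauss-Seidel sweeps). The function name, the choice of squared loss, and the toy graph are assumptions for illustration.

```python
def graph_regularized_labels(weights, labels, lam=1.0, n_sweeps=200):
    """Minimise  sum_{i labeled} (f_i - y_i)^2 + lam * sum_{i<j} w_ij (f_i - f_j)^2.

    weights: symmetric n x n list of lists with w_ij >= 0 and zero diagonal
    labels:  dict {node index: target value, e.g. +1.0 / -1.0} for labeled nodes
    Assumes every connected component contains at least one labeled node.
    Returns the list of scores f, one per node.
    """
    n = len(weights)
    f = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(n_sweeps):
        for i in range(n):
            degree = sum(weights[i])
            neighbor_sum = sum(weights[i][j] * f[j] for j in range(n))
            is_labeled = 1.0 if i in labels else 0.0
            denom = is_labeled + lam * degree
            if denom > 0:
                # Stationarity condition of the objective with respect to f_i.
                f[i] = (is_labeled * labels.get(i, 0.0) + lam * neighbor_sum) / denom
    return f


# Chain graph 0 - 1 - 2 - 3 with the two end nodes labeled.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = graph_regularized_labels(chain, {0: 1.0, 3: -1.0}, lam=1.0)
```

On this chain the scores come out as roughly 0.6, 0.2, -0.2, -0.6, so thresholding at zero gives each unlabeled node the label of the nearer end. A larger `lam` pulls neighboring scores closer together and lets the fitted values drift further from the given labels.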

Page 17:

Is it robust?

• You can play with the regularization parameter

• Sensitive to graph construction

Loss on labeled data (mean square, 0-1)

Graph-based smoothness prior on labeled and unlabeled data

Page 18:

The big picture

• Semi-supervised Learning

• Active Learning

[Figure labels: Learning algorithm; Selective labeling; Learning algorithm]

There is no free lunch

Pay a little than passive

Page 19:

Active Learning

• Passive learning

– Input: a set of labeled examples

– Output: a classifier

• Observation:

– Labels are expensive

– Sometimes you can get the same classifier with a subset of the data

• Example?
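
A standard illustration (not necessarily the one used in the lecture) is learning a threshold on a line with noise-free labels: every point left of the threshold is 0 and every point right of it is 1. A passive learner labels the whole pool, while an active learner can binary-search for the boundary and arrive at the same answer. The function names and the oracle below are hypothetical.

```python
def passive_first_positive(points, oracle):
    """Label every point; return (index of first positive, labels used)."""
    xs = sorted(points)
    labels = [oracle(x) for x in xs]
    first = labels.index(1) if 1 in labels else len(xs)
    return first, len(xs)


def active_first_positive(points, oracle):
    """Binary-search for the boundary; return (index of first positive, labels used)."""
    xs = sorted(points)
    lo, hi = 0, len(xs)  # invariant: the first positive index lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:
            hi = mid        # boundary is at mid or to its left
        else:
            lo = mid + 1    # boundary is strictly to the right of mid
    return lo, queries


pool = list(range(100))


def oracle(x):
    return 1 if x >= 37 else 0  # hidden threshold, assumed noise-free


passive_result = passive_first_positive(pool, oracle)
active_result = active_first_positive(pool, oracle)
```

Both calls locate the same boundary, but the passive version pays for 100 labels while the binary search asks for about log2(100), roughly 7. The saving depends on the noise-free assumption: a single flipped label can send the search down the wrong half.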

Page 20:

Active Learning

• SVM

– Only the support vectors are needed

• Is it that easy?

• What assumption are we making here?

– A noise-free environment

• In general, we need a localized function

Page 21:

Active Learning

• When does it help?

Passive = Active

Active learning is useful if complexity of target function is localized – labels of some data points are more informative than others.

[Castro et al.,’05]

Page 22:

Active Learning setup

• Induce a model

• Inspect unlabeled data

• Select “queries”

• Label new instances, repeat
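
The cycle above can be sketched as a generic pool-based loop. Everything here is an assumption for illustration: the function names, the way the model and the selection score are passed in as plain callables, and the toy threshold learner used to exercise the loop.

```python
def active_learning_loop(seed, pool, oracle, fit, informativeness, budget):
    """Pool-based active learning (illustrative sketch).

    seed:            initial list of (x, y) labeled examples
    pool:            list of unlabeled x values
    oracle:          callable x -> y, stands in for the human labeler
    fit:             callable labeled -> model
    informativeness: callable (model, x) -> score, higher means query sooner
    budget:          maximum number of label queries
    """
    labeled = list(seed)
    pool = list(pool)
    model = fit(labeled)                                # induce a model
    for _ in range(budget):
        if not pool:
            break
        # Inspect the unlabeled data and select the next query.
        query = max(pool, key=lambda x: informativeness(model, x))
        pool.remove(query)
        labeled.append((query, oracle(query)))          # label the new instance
        model = fit(labeled)                            # repeat with the new model
    return model, labeled


def fit_threshold(labeled):
    """Toy learner: midpoint between the largest negative and smallest positive."""
    largest_negative = max(x for x, y in labeled if y == 0)
    smallest_positive = min(x for x, y in labeled if y == 1)
    return (largest_negative + smallest_positive) / 2


def closeness_to_threshold(threshold, x):
    return -abs(x - threshold)


model, labeled = active_learning_loop(
    seed=[(0, 0), (100, 1)],
    pool=range(1, 100),
    oracle=lambda x: 1 if x >= 37 else 0,   # hidden rule, assumed noise-free
    fit=fit_threshold,
    informativeness=closeness_to_threshold,
    budget=7,
)
```

With a budget of 7 queries the toy learner narrows the threshold down to the gap between 36 and 37. Swapping in a different `fit` and `informativeness` changes the strategy without touching the loop itself.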

Page 23:

Algorithms from Insights

• We need to learn a decision boundary

• Classification uncertainty

– Query the examples closest to the decision boundary

– We become more confident if we get them right

– Somehow these are still local decisions

• Version-space uncertainty

– Somehow makes a global decision

Page 24:

Version Space

• The set of hypotheses consistent with the labeled examples

Page 25:

Version Space

• Our goal: get a single hypothesis

• Select the example that results in the maximum reduction of the hypothesis space

• What is the problem with that?

Page 26:

Version space: Algorithm

• Query by committee

– Keep an ensemble of classifiers to approximate the version space

• Goal: reduce “entropy” over their contributions

• Idea

– Sample from P(parameters | data)
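
A small sketch of query by committee, assuming the simplest possible hypothesis class: thresholds on a line with noise-free labels. Under a uniform prior, sampling from P(parameters | data) then amounts to drawing thresholds uniformly from the interval that is still consistent with the labels. The function names, the committee size, and the use of vote entropy as the disagreement measure are choices made for this example.

```python
import math
import random


def sample_committee(labeled, size, rng):
    """Draw thresholds uniformly from the version space (the consistent interval)."""
    lo = max(x for x, y in labeled if y == 0)
    hi = min(x for x, y in labeled if y == 1)
    return [rng.uniform(lo, hi) for _ in range(size)]


def vote_entropy(x, committee):
    """Entropy of the committee's votes on x; 0 means full agreement."""
    positive = sum(1 for t in committee if x >= t) / len(committee)
    return -sum(p * math.log(p) for p in (positive, 1 - positive) if p > 0)


def qbc_query(labeled, pool, size=25, seed=0):
    """Return the pool point the sampled committee disagrees on most."""
    rng = random.Random(seed)
    committee = sample_committee(labeled, size, rng)
    return max(pool, key=lambda x: vote_entropy(x, committee))


labeled = [(0.0, 0), (10.0, 1)]
pool = [1.0, 5.0, 9.0, 12.0, -3.0]
query = qbc_query(labeled, pool, size=200)
```

Points outside the consistent interval (12.0 and -3.0 here) get zero vote entropy because every committee member agrees on them, so labeling them would not shrink the version space. The committee splits most evenly near the middle of the interval, which is where a label removes the most hypotheses.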

Page 27:

Case study: SVM

• How to represent the version space?

• This is a slightly re-parameterized SVM objective, but it is equivalent

Page 28:

Case study: SVM

• How to represent the version space?

Page 29:

Page 30:

Given the current labeled data we have an explicit representation of the version space

Page 31:

Query point

• Halving the version space (query point c)
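
The slide's figure is not in the transcript. One heuristic commonly used with SVMs for this purpose (often called "simple margin" in the work of Tong and Koller) is to query the unlabeled point closest to the current separating hyperplane, on the reasoning that the current weight vector sits near the center of the version space, so either answer removes roughly half of it. The sketch assumes a linear SVM whose weights `w` and bias `b` have already been trained; the function name is hypothetical.

```python
def simple_margin_query(w, b, pool):
    """Return the pool point closest to the hyperplane w . x + b = 0.

    w:    weight vector of an already-trained linear SVM (assumed given)
    b:    bias term
    pool: list of unlabeled points, each a tuple of the same length as w
    """
    def distance_score(x):
        # |w . x + b| is proportional to the distance from x to the hyperplane.
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)

    return min(pool, key=distance_score)


query_point = simple_margin_query(
    w=(1.0, -1.0),
    b=0.0,
    pool=[(3.0, 0.0), (1.0, 0.9), (-2.0, 2.0)],
)
```

Dividing by the norm of `w` is unnecessary here because it is the same constant for every candidate and does not change which point is closest.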

Page 32:

Is it the End?

• Supervised

• Semi-supervised

• Active

• Transductive

– You still get to see unlabeled data

– But these are also your test data

– What can you do with that?

Page 33:

Transductive SVM

• Choose a confident labeling of the unlabeled data

[Figure label: Unlabeled data]
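
For reference, one common way to write the transductive SVM problem treats the unknown labels of the unlabeled points as extra optimization variables. The slide's own formula is not in the transcript, so the notation below is an assumption: l labeled pairs (x_i, y_i), u unlabeled points x*_j with labels y*_j to be chosen, and trade-off constants C and C*.

```latex
\begin{aligned}
\min_{w,\,b,\,y^*_1,\dots,y^*_u,\,\xi,\,\xi^*}\quad
  & \tfrac{1}{2}\lVert w\rVert^2
    + C\sum_{i=1}^{l}\xi_i
    + C^*\sum_{j=1}^{u}\xi^*_j \\
\text{s.t.}\quad
  & y_i\,(w^\top x_i + b) \ge 1 - \xi_i,
    \quad \xi_i \ge 0,
    \quad i = 1,\dots,l \\
  & y^*_j\,(w^\top x^*_j + b) \ge 1 - \xi^*_j,
    \quad \xi^*_j \ge 0,
    \quad y^*_j \in \{-1,+1\},
    \quad j = 1,\dots,u
\end{aligned}
```

Read this way, a "confident" labeling is one for which the resulting separator keeps a large margin on the labeled and the unlabeled points together, so the boundary is pushed away from regions where unlabeled data are dense.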

Page 34:

Transductive SVM

• Why does it make sense?

Page 35:

Transductive SVM

• When is it useful?

• News filtering

– Labeled data: news users liked in the past

– Test data (unlabeled): today’s news

– We only need to do well on those test data