Graph-Based Semi-Supervised Learning with a Generative Model
Speaker: Jingrui He
Advisor: Jaime Carbonell
Machine Learning Department
04-10-2008
04/03/2008 2
Semi-Supervised Learning

(Figure: labeled examples (+ / -) are very few; unlabeled examples are abundant.)
Outline

► Background
► Existing Methods
► Proposed Method
   Ideal Case
   General Case
► Experimental Results
► Conclusion
Overview

Semi-Supervised Learning

Feature-based (gradually generate class labels):
   Self-Training [Yarowsky, ACL95]
   Co-Training [Blum, COLT98]
   TSVMs [Joachims, ICML99]
   EM-based [Nigam, ML00]

Graph-based (collectively generate class labels):
   Mincut [Blum, ICML01]
   Gaussian Random Fields [Zhu, ICML03]
   Local and Global Consistency [Zhou, NIPS04]
   Generative Model [He, IJCAI07]
Self-Training [Yarowsky, ACL95]

(Figure: a classifier trained on the few labeled examples (+ / -) adds its most confident predictions on unlabeled data to the labeled set, then retrains.)
Co-Training [Blum, COLT98]

Assumes two feature views of the data, each:
   Sufficient to train a good classifier
   Conditionally independent given the class
Transductive SVMs [Joachims, ICML99]

Inductive SVMs vs. Transductive SVMs
(Figure: the transductive classification boundary stays away from the dense regions of unlabeled data.)
EM-based Method [Nigam, ML00]

Text corpus with classes such as Computer Science, Medicine, Politics.
Mixture model:  P(x) = \sum_y P(y) P(x | y);  EM fills in the missing labels.
Graph-Based Semi-Supervised Learning

(Figure: labeled nodes (+ / -) propagate their labels to unlabeled nodes through the edges of a similarity graph.)
Graph-Based Methods

► G = {V, E}
► Estimate a function f on the graph:
   f should be close to the given labels on the labeled nodes
   f should be smooth on the whole graph
► Regularization framework
Graph-Based Methods cont.

► Mincut [Blum, ICML01]
   \min_{f} \; \infty \sum_{i \in L} (f_i - y_i)^2 + \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2,   f_i \in \{0, 1\}

► Gaussian Random Fields [Zhu, ICML03]
   \min_{f} \; \infty \sum_{i \in L} (f_i - y_i)^2 + \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2,   f_i \in \mathbb{R}

► Local and Global Consistency [Zhou, NIPS04]
   \min_{f} \; \sum_{i} (f_i - y_i)^2 + \frac{1}{2} \sum_{i,j} w_{ij} \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^2

► Discriminative in nature!
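The Gaussian Random Fields objective has a closed-form minimizer: clamp the labeled nodes to their labels and solve the harmonic system for the unlabeled ones. A minimal numpy sketch, where the RBF affinity, sigma, and the toy data are illustrative assumptions:

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) with zero diagonal."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def harmonic_solution(W, labeled_idx, y_labeled):
    """Gaussian Random Fields: clamp labeled nodes to their labels and
    solve the harmonic system L_uu f_u = W_ul y_l for the unlabeled nodes."""
    n = W.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian D - W
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    f[unlabeled_idx] = np.linalg.solve(L_uu, W_ul @ y_labeled)
    return f

# Toy data: two well-separated clusters, one labeled point in each
X = np.vstack([np.random.RandomState(0).randn(10, 2) * 0.1,
               np.random.RandomState(1).randn(10, 2) * 0.1 + 5.0])
f = harmonic_solution(rbf_affinity(X), np.array([0, 10]), np.array([0.0, 1.0]))
```

Unlabeled points inherit the label of their cluster: entries of f near 0 in the first cluster, near 1 in the second.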
Outline

► Background
► Existing Methods
► Proposed Method
   Ideal Case
   General Case
► Experimental Results
► Conclusion
Motivation

► Existing graph-based methods:
   f as P(y | x): NO justification
   Discriminative: an inaccurate class proportion in the labeled set greatly AFFECTS the performance
► Proposed method:
   f as P(x | y): WELL justified
   Generative: estimated class priors COMPENSATE for the inaccurate proportion in the labeled set
Notation

► n training examples x_1, \ldots, x_n \in \mathbb{R}^d
► n_l labeled examples, n_u = n - n_l unlabeled examples
► Affinity matrix W \in \mathbb{R}^{n \times n}: W_{ij} = similarity between x_i and x_j
► Diagonal matrix D: D_{ii} = \sum_{j=1}^{n} W_{ij}
► S = D^{-1/2} W D^{-1/2};  f^+, f^- \in \mathbb{R}^n, set to 1 on the labeled examples of the corresponding class and 0 elsewhere
► y_i \in \{0, 1\}
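The normalized matrix S follows directly from these definitions; a small numpy sketch, with toy data and an RBF affinity as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))                      # 6 toy points in R^2
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
W = np.exp(-sq)                                      # RBF affinity (illustrative)
np.fill_diagonal(W, 0.0)
D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))   # D^{-1/2}
S = D_inv_sqrt @ W @ D_inv_sqrt                      # S = D^{-1/2} W D^{-1/2}
# S is symmetric and its eigenvalues lie in [-1, 1], with top eigenvalue 1
```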
Ideal Case

► The two classes are far apart, so W, D, and S are block diagonal:
   W = \begin{pmatrix} W^+ & 0 \\ 0 & W^- \end{pmatrix},
   D = \begin{pmatrix} D^+ & 0 \\ 0 & D^- \end{pmatrix},
   S = D^{-1/2} W D^{-1/2} = \begin{pmatrix} (D^+)^{-1/2} W^+ (D^+)^{-1/2} & 0 \\ 0 & (D^-)^{-1/2} W^- (D^-)^{-1/2} \end{pmatrix}

► f^+ and f^- are nonzero only on the labeled examples of their own class, e.g.
   f^+ = (0, 0, 1, 0, 0)^T
Derivation Sketch

Relate D_{ii} to P(x | y)  →  Relate the eigenvectors of S to P(x | y)  →  Relate f^+, f^- to P(x | y)
Class Conditional Probability

► Theorem 1: as n^+ \to \infty and n^- \to \infty,
   D^+_{ii} / n^+ \propto P(x_i | y_i = 1),   D^-_{ii} / n^- \propto P(x_i | y_i = 0)
   Similar to kernel density estimation
► Unlabeled data??
Class Conditional Probability cont.

► Eigenvectors of S with eigenvalue 1, one per block:
   S^+ v^+ = v^+ with v^+ = (D^+)^{1/2} \mathbf{1};   S^- v^- = v^- with v^- = (D^-)^{1/2} \mathbf{1}
   The two have disjoint supports: (v^+)^T v^- = 0
► Element-wise:
   (v^+_i)^2 \propto D^+_{ii} \propto P(x_i | y_i = 1);   (v^-_i)^2 \propto D^-_{ii} \propto P(x_i | y_i = 0)
Class Conditional Probability cont.

► To get v^+ and v^-, iterate:
   f^+ \leftarrow S f^+,   f^- \leftarrow S f^-
► Upon convergence:
   f^+ \to v^+,   f^- \to v^-
► After normalization:
   P(x_i | y_i = 1) \propto (f^+_i)^2,   P(x_i | y_i = 0) \propto (f^-_i)^2
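The iteration above is the power method on S. A toy sketch of the ideal case, where the two far-apart clusters, RBF affinity, and per-step renormalization (for numerical stability) are illustrative assumptions:

```python
import numpy as np

def power_iterate(S, f0, n_iter=200):
    """Power method f <- S f; with dominant eigenvalue 1, f converges to
    the dominant eigenvector of S (renormalized each step for stability)."""
    f = f0.astype(float).copy()
    for _ in range(n_iter):
        f = S @ f
        f /= np.linalg.norm(f)
    return f

# Ideal case: two far-apart clusters, one labeled positive at index 0
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
W = np.exp(-sq)
np.fill_diagonal(W, 0.0)
d_is = 1.0 / np.sqrt(W.sum(1))
S = d_is[:, None] * W * d_is[None, :]

f_pos = np.zeros(10)
f_pos[0] = 1.0                      # indicator of the labeled positive
f_pos = power_iterate(S, f_pos)
# the mass stays concentrated on the positive block, as in the derivation
```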
Example of the Ideal Case

(Figure: two well-separated 2-D clusters; the estimated P(x | y = 1) and P(x | y = 0) each cover one cluster.)
General Case

► The two classes are not far apart
► S is not block diagonal
► Iterating f^+ \leftarrow S f^+ and f^- \leftarrow S f^- now drives both vectors to the same dominant eigenvector of S: upon convergence f^+ = f^-, and the class information is lost
(Figure: two overlapping 2-D clusters.)
Class Conditional Probability

► Iteration process: the labeled examples gradually spread their information to nearby points
► Solution: stop the iteration when a certain criterion is satisfied
Stopping Criterion

► Average probability of the negative labeled examples in the positive class:
   \frac{1}{n_L^-} \sum_{i: y_i = 0} P(x_i | y = 1)
   where n_L^- is the number of negative labeled examples

(Figure: this quantity over 1000 iterations, for the labeled positives L+ and negatives L-.)
Stopping Criterion cont.

(Figure: the same curves annotated; stopping too early causes pre-maturity, stopping too late causes excessive propagation.)
Stopping Criterion cont.

► Average probability of the positive labeled examples in the negative class:
   \frac{1}{n_L^+} \sum_{i: y_i = 1} P(x_i | y = 0)
   where n_L^+ is the number of positive labeled examples
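A hedged sketch of how such a stopping rule might be wired into the iteration. The threshold `tol`, the toy data, and the RBF affinity are illustrative assumptions, not values from the slides:

```python
import numpy as np

def iterate_with_stopping(S, f_pos, y, labeled_mask, tol=1e-3, max_iter=1000):
    """Iterate f+ <- S f+ and stop once the average normalized, squared
    score that f+ assigns to the NEGATIVE labeled examples exceeds `tol`,
    i.e. once the positive class starts leaking onto labeled negatives."""
    neg_labeled = labeled_mask & (y == 0)
    for t in range(max_iter):
        f_pos = S @ f_pos
        p = f_pos ** 2
        p = p / p.sum()                    # normalize to a distribution over points
        if p[neg_labeled].mean() > tol:    # negatives start absorbing positive mass
            break
    return f_pos, t

# Tiny demo: two nearby clusters, one labeled point per class (indices 0 and 5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (5, 2)), rng.normal(2, 0.5, (5, 2))])
W = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))
np.fill_diagonal(W, 0.0)
d = 1 / np.sqrt(W.sum(1))
S = d[:, None] * W * d[None, :]
y = np.array([1] * 5 + [0] * 5)
labeled = np.zeros(10, bool)
labeled[[0, 5]] = True
f0 = np.zeros(10)
f0[0] = 1.0
f, t = iterate_with_stopping(S, f0, y, labeled)
```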
Example of the General Case

(Figure: two overlapping 2-D clusters; the estimated P(x | y = 1) and P(x | y = 0) each peak on one cluster but overlap in between.)
Estimating Class Priors

► Theorem 2: in the general case, as n \to \infty,
   D_{ii} / n \propto P(x_i | y_i = 1) P(y_i = 1) + P(x_i | y_i = 0) P(y_i = 0)
► To get estimates of P(y = 1): plug the estimated class conditionals into the n equations
   D_{ii} / n \propto \hat{p}(x_i | y_i = 1) \, p_i + \hat{p}(x_i | y_i = 0) (1 - p_i),   i = 1, \ldots, n
   solve for the p_i, and set \hat{P}(y = 1) = \frac{1}{n} \sum_{i=1}^{n} p_i,   \hat{P}(y = 0) = 1 - \hat{P}(y = 1)
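One natural way to combine the n equations into a single prior estimate is a least-squares fit for the lone unknown. A sketch with synthetic inputs (the least-squares choice and the data are illustrative assumptions, not from the slides):

```python
import numpy as np

def estimate_prior(d_over_n, p_pos, p_neg):
    """Least-squares fit of pi = P(y=1) in
    d_over_n_i ≈ p_pos_i * pi + p_neg_i * (1 - pi), clipped to [0, 1]."""
    a = p_pos - p_neg                       # coefficient of pi in each equation
    b = d_over_n - p_neg                    # residual target
    pi = float(a @ b) / float(a @ a)        # closed-form least squares
    return float(np.clip(pi, 0.0, 1.0))

# Synthetic check: build d_over_n from a known prior 0.3 and recover it
rng = np.random.default_rng(0)
p_pos, p_neg = rng.random(50), rng.random(50)
d = 0.3 * p_pos + 0.7 * p_neg
pi_hat = estimate_prior(d, p_pos, p_neg)
```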
Prediction

► To classify a new example x \in \mathbb{R}^d:
   Calculate the class conditional probabilities by smoothing over the training examples:
   P(x | y) = \frac{\sum_{i=1}^{n} W(x, x_i) P(x_i | y)}{\sum_{i=1}^{n} W(x, x_i)}
   Classify according to Bayes' rule:
   P(y | x) = \frac{P(x | y) P(y)}{\sum_{y'} P(x | y') P(y')}
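The two prediction steps above can be sketched as follows; the RBF kernel, its bandwidth, and the toy conditionals are illustrative assumptions:

```python
import numpy as np

def predict(x_new, X, p_pos, p_neg, prior_pos, sigma=1.0):
    """Kernel-smooth the per-point class conditionals P(x_i|y) over the
    training set, then apply Bayes' rule; returns P(y=1 | x_new)."""
    w = np.exp(-((X - x_new) ** 2).sum(-1) / (2 * sigma ** 2))
    px_pos = w @ p_pos / w.sum()          # smoothed P(x | y = 1)
    px_neg = w @ p_neg / w.sum()          # smoothed P(x | y = 0)
    post = px_pos * prior_pos
    return post / (post + px_neg * (1 - prior_pos))

# Toy: class conditionals concentrated on two far-apart clusters
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 4.0)])
p_pos = np.array([0.2] * 5 + [0.0] * 5)   # P(x_i | y = 1) mass on cluster 1
p_neg = np.array([0.0] * 5 + [0.2] * 5)   # P(x_i | y = 0) mass on cluster 2
```

A point near the first cluster gets posterior near 1; near the second cluster, near 0.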
Outline

► Background
► Existing Methods
► Proposed Method
   Ideal Case
   General Case
► Experimental Results
► Conclusion
Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]

► Balanced classification
(Figure: accuracy vs. labeled set size on two tasks, "1 vs 2" and "odd vs even", comparing Our Method, Gaussian Random Fields, and Local and Global Consistency.)
Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]

► Unbalanced classification
(Figure: accuracy vs. labeled set size on "1 vs 2" and "odd vs even", same three methods.)
Genre Data Set [Liu, ECML03]

► Classification between random partitions
(Figure: accuracy vs. labeled set size in balanced and unbalanced settings, same three methods.)
Genre Data Set [Liu, ECML03]

► Unbalanced classification
(Figure: accuracy vs. labeled set size on "newspapers vs other" and "biographies vs other", same three methods.)
Conclusion

► A new graph-based semi-supervised learning method
   Generative in nature
   Ideal case: theoretical guarantee
   General case: reasonable estimates
   Prediction: easy and intuitive

Questions?