Graph-Based Semi-Supervised Learning with a Generative Model
Speaker: Jingrui He
Advisor: Jaime Carbonell
Machine Learning Department
04-10-2008
04/03/2008 2
Semi-Supervised Learning

(Figure: labeled examples (+ / -) are very few; unlabeled examples are abundant.)
Outline

► Background
► Existing Methods
► Proposed Method
   Ideal Case
   General Case
► Experimental Results
► Conclusion
Overview

Semi-Supervised Learning

Feature-based (gradually generate class labels):
   Self-Training [Yarowsky, ACL95]
   Co-Training [Blum, COLT98]
   TSVMs [Joachims, ICML99]
   EM-based [Nigam, ML00]

Graph-based (collectively generate class labels):
   Mincut [Blum, ICML01]
   Gaussian Random Fields [Zhu, ICML03]
   Local and Global Consistency [Zhou, NIPS04]
   Generative Model [He, IJCAI07]
Self-Training [Yarowsky, ACL95]

(Figure: a classifier trained on the few labeled examples (+ / -) adds its most confident predictions on unlabeled data to the labeled set, then retrains.)
Co-Training [Blum, COLT98]

Assumes two feature views of the data, each:
   Sufficient to train a good classifier
   Conditionally independent given the class
Transductive SVMs [Joachims, ICML99]

Inductive SVMs vs. Transductive SVMs
(Figure: the transductive classification boundary stays away from the dense regions of unlabeled data.)
EM-based Method [Nigam, ML00]

Text corpus with classes such as Computer Science, Medicine, Politics.
Mixture model:  P(x) = \sum_y P(y) P(x | y);  EM fills in the missing labels.
Graph-Based Semi-Supervised Learning

(Figure: labeled nodes (+ / -) propagate their labels to unlabeled nodes through the edges of a similarity graph.)
Graph-Based Methods

► G = {V, E}
► Estimate a function f on the graph:
   f should be close to the given labels on the labeled nodes
   f should be smooth on the whole graph
► Regularization framework
Graph-Based Methods cont.

► Mincut [Blum, ICML01]
   \min_{f} \; \infty \sum_{i \in L} (f_i - y_i)^2 + \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2,   f_i \in \{0, 1\}

► Gaussian Random Fields [Zhu, ICML03]
   \min_{f} \; \infty \sum_{i \in L} (f_i - y_i)^2 + \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2,   f_i \in \mathbb{R}

► Local and Global Consistency [Zhou, NIPS04]
   \min_{f} \; \sum_{i} (f_i - y_i)^2 + \frac{1}{2} \sum_{i,j} w_{ij} \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^2

► Discriminative in nature!
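The Gaussian Random Fields objective has a closed-form minimizer: clamp the labeled nodes to their labels and solve the harmonic system for the unlabeled ones. A minimal numpy sketch, where the RBF affinity, sigma, and the toy data are illustrative assumptions:

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) with zero diagonal."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def harmonic_solution(W, labeled_idx, y_labeled):
    """Gaussian Random Fields: clamp labeled nodes to their labels and
    solve the harmonic system L_uu f_u = W_ul y_l for the unlabeled nodes."""
    n = W.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian D - W
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    f[unlabeled_idx] = np.linalg.solve(L_uu, W_ul @ y_labeled)
    return f

# Toy data: two well-separated clusters, one labeled point in each
X = np.vstack([np.random.RandomState(0).randn(10, 2) * 0.1,
               np.random.RandomState(1).randn(10, 2) * 0.1 + 5.0])
f = harmonic_solution(rbf_affinity(X), np.array([0, 10]), np.array([0.0, 1.0]))
```

Unlabeled points inherit the label of their cluster: entries of f near 0 in the first cluster, near 1 in the second.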
Outline

► Background
► Existing Methods
► Proposed Method
   Ideal Case
   General Case
► Experimental Results
► Conclusion
Motivation

► Existing graph-based methods:
   f as P(y | x): NO justification
   Discriminative: an inaccurate class proportion in the labeled set greatly AFFECTS the performance
► Proposed method:
   f as P(x | y): WELL justified
   Generative: estimated class priors COMPENSATE for the inaccurate proportion in the labeled set
Notation

► n training examples x_1, \ldots, x_n \in \mathbb{R}^d
► n_l labeled examples, n_u = n - n_l unlabeled examples
► Affinity matrix W \in \mathbb{R}^{n \times n}: W_{ij} = similarity between x_i and x_j
► Diagonal matrix D: D_{ii} = \sum_{j=1}^{n} W_{ij}
► S = D^{-1/2} W D^{-1/2};  f^+, f^- \in \mathbb{R}^n, set to 1 on the labeled examples of the corresponding class and 0 elsewhere
► y_i \in \{0, 1\}
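The normalized matrix S follows directly from these definitions; a small numpy sketch, with toy data and an RBF affinity as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))                      # 6 toy points in R^2
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
W = np.exp(-sq)                                      # RBF affinity (illustrative)
np.fill_diagonal(W, 0.0)
D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))   # D^{-1/2}
S = D_inv_sqrt @ W @ D_inv_sqrt                      # S = D^{-1/2} W D^{-1/2}
# S is symmetric and its eigenvalues lie in [-1, 1], with top eigenvalue 1
```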
Ideal Case

► The two classes are far apart, so W, D, and S are block diagonal:
   W = \begin{pmatrix} W^+ & 0 \\ 0 & W^- \end{pmatrix},
   D = \begin{pmatrix} D^+ & 0 \\ 0 & D^- \end{pmatrix},
   S = D^{-1/2} W D^{-1/2} = \begin{pmatrix} (D^+)^{-1/2} W^+ (D^+)^{-1/2} & 0 \\ 0 & (D^-)^{-1/2} W^- (D^-)^{-1/2} \end{pmatrix}

► f^+ and f^- are nonzero only on the labeled examples of their own class, e.g.
   f^+ = (0, 0, 1, 0, 0)^T
Derivation Sketch

Relate D_{ii} to P(x | y)  →  Relate the eigenvectors of S to P(x | y)  →  Relate f^+, f^- to P(x | y)
Class Conditional Probability

► Theorem 1: as n^+ \to \infty and n^- \to \infty,
   D^+_{ii} / n^+ \propto P(x_i | y_i = 1),   D^-_{ii} / n^- \propto P(x_i | y_i = 0)
   Similar to kernel density estimation
► Unlabeled data??
Class Conditional Probability cont.

► Eigenvectors of S with eigenvalue 1, one per block:
   S^+ v^+ = v^+ with v^+ = (D^+)^{1/2} \mathbf{1};   S^- v^- = v^- with v^- = (D^-)^{1/2} \mathbf{1}
   The two have disjoint supports: (v^+)^T v^- = 0
► Element-wise:
   (v^+_i)^2 \propto D^+_{ii} \propto P(x_i | y_i = 1);   (v^-_i)^2 \propto D^-_{ii} \propto P(x_i | y_i = 0)
Class Conditional Probability cont.

► To get v^+ and v^-, iterate:
   f^+ \leftarrow S f^+,   f^- \leftarrow S f^-
► Upon convergence:
   f^+ \to v^+,   f^- \to v^-
► After normalization:
   P(x_i | y_i = 1) \propto (f^+_i)^2,   P(x_i | y_i = 0) \propto (f^-_i)^2
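The iteration above is the power method on S. A toy sketch of the ideal case, where the two far-apart clusters, RBF affinity, and per-step renormalization (for numerical stability) are illustrative assumptions:

```python
import numpy as np

def power_iterate(S, f0, n_iter=200):
    """Power method f <- S f; with dominant eigenvalue 1, f converges to
    the dominant eigenvector of S (renormalized each step for stability)."""
    f = f0.astype(float).copy()
    for _ in range(n_iter):
        f = S @ f
        f /= np.linalg.norm(f)
    return f

# Ideal case: two far-apart clusters, one labeled positive at index 0
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
W = np.exp(-sq)
np.fill_diagonal(W, 0.0)
d_is = 1.0 / np.sqrt(W.sum(1))
S = d_is[:, None] * W * d_is[None, :]

f_pos = np.zeros(10)
f_pos[0] = 1.0                      # indicator of the labeled positive
f_pos = power_iterate(S, f_pos)
# the mass stays concentrated on the positive block, as in the derivation
```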
Example of the Ideal Case

(Figure: two well-separated 2-D clusters; the estimated P(x | y = 1) and P(x | y = 0) each cover one cluster.)
General Case

► The two classes are not far apart
► S is not block diagonal
► Iterating f^+ \leftarrow S f^+ and f^- \leftarrow S f^- now drives both vectors to the same dominant eigenvector of S: upon convergence f^+ = f^-, and the class information is lost
(Figure: two overlapping 2-D clusters.)
Class Conditional Probability

► Iteration process: the labeled examples gradually spread their information to nearby points
► Solution: stop the iteration when a certain criterion is satisfied
Stopping Criterion

► Average probability of the negative labeled examples in the positive class:
   \frac{1}{n_L^-} \sum_{i: y_i = 0} P(x_i | y = 1)
   where n_L^- is the number of negative labeled examples

(Figure: this quantity over 1000 iterations, for the labeled positives L+ and negatives L-.)
Stopping Criterion cont.

(Figure: the same curves annotated; stopping too early causes pre-maturity, stopping too late causes excessive propagation.)
Stopping Criterion cont.

► Average probability of the positive labeled examples in the negative class:
   \frac{1}{n_L^+} \sum_{i: y_i = 1} P(x_i | y = 0)
   where n_L^+ is the number of positive labeled examples
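A hedged sketch of how such a stopping rule might be wired into the iteration. The threshold `tol`, the toy data, and the RBF affinity are illustrative assumptions, not values from the slides:

```python
import numpy as np

def iterate_with_stopping(S, f_pos, y, labeled_mask, tol=1e-3, max_iter=1000):
    """Iterate f+ <- S f+ and stop once the average normalized, squared
    score that f+ assigns to the NEGATIVE labeled examples exceeds `tol`,
    i.e. once the positive class starts leaking onto labeled negatives."""
    neg_labeled = labeled_mask & (y == 0)
    for t in range(max_iter):
        f_pos = S @ f_pos
        p = f_pos ** 2
        p = p / p.sum()                    # normalize to a distribution over points
        if p[neg_labeled].mean() > tol:    # negatives start absorbing positive mass
            break
    return f_pos, t

# Tiny demo: two nearby clusters, one labeled point per class (indices 0 and 5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (5, 2)), rng.normal(2, 0.5, (5, 2))])
W = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))
np.fill_diagonal(W, 0.0)
d = 1 / np.sqrt(W.sum(1))
S = d[:, None] * W * d[None, :]
y = np.array([1] * 5 + [0] * 5)
labeled = np.zeros(10, bool)
labeled[[0, 5]] = True
f0 = np.zeros(10)
f0[0] = 1.0
f, t = iterate_with_stopping(S, f0, y, labeled)
```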
Example of the General Case

(Figure: two overlapping 2-D clusters; the estimated P(x | y = 1) and P(x | y = 0) each peak on one cluster but overlap in between.)
Estimating Class Priors

► Theorem 2: in the general case, as n \to \infty,
   D_{ii} / n \propto P(x_i | y_i = 1) P(y_i = 1) + P(x_i | y_i = 0) P(y_i = 0)
► To get estimates of P(y = 1): plug the estimated class conditionals into the n equations
   D_{ii} / n \propto \hat{p}(x_i | y_i = 1) \, p_i + \hat{p}(x_i | y_i = 0) (1 - p_i),   i = 1, \ldots, n
   solve for the p_i, and set \hat{P}(y = 1) = \frac{1}{n} \sum_{i=1}^{n} p_i,   \hat{P}(y = 0) = 1 - \hat{P}(y = 1)
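One natural way to combine the n equations into a single prior estimate is a least-squares fit for the lone unknown. A sketch with synthetic inputs (the least-squares choice and the data are illustrative assumptions, not from the slides):

```python
import numpy as np

def estimate_prior(d_over_n, p_pos, p_neg):
    """Least-squares fit of pi = P(y=1) in
    d_over_n_i ≈ p_pos_i * pi + p_neg_i * (1 - pi), clipped to [0, 1]."""
    a = p_pos - p_neg                       # coefficient of pi in each equation
    b = d_over_n - p_neg                    # residual target
    pi = float(a @ b) / float(a @ a)        # closed-form least squares
    return float(np.clip(pi, 0.0, 1.0))

# Synthetic check: build d_over_n from a known prior 0.3 and recover it
rng = np.random.default_rng(0)
p_pos, p_neg = rng.random(50), rng.random(50)
d = 0.3 * p_pos + 0.7 * p_neg
pi_hat = estimate_prior(d, p_pos, p_neg)
```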
Prediction

► To classify a new example x \in \mathbb{R}^d:
   Calculate the class conditional probabilities by smoothing over the training examples:
   P(x | y) = \frac{\sum_{i=1}^{n} W(x, x_i) P(x_i | y)}{\sum_{i=1}^{n} W(x, x_i)}
   Classify according to Bayes' rule:
   P(y | x) = \frac{P(x | y) P(y)}{\sum_{y'} P(x | y') P(y')}
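The two prediction steps above can be sketched as follows; the RBF kernel, its bandwidth, and the toy conditionals are illustrative assumptions:

```python
import numpy as np

def predict(x_new, X, p_pos, p_neg, prior_pos, sigma=1.0):
    """Kernel-smooth the per-point class conditionals P(x_i|y) over the
    training set, then apply Bayes' rule; returns P(y=1 | x_new)."""
    w = np.exp(-((X - x_new) ** 2).sum(-1) / (2 * sigma ** 2))
    px_pos = w @ p_pos / w.sum()          # smoothed P(x | y = 1)
    px_neg = w @ p_neg / w.sum()          # smoothed P(x | y = 0)
    post = px_pos * prior_pos
    return post / (post + px_neg * (1 - prior_pos))

# Toy: class conditionals concentrated on two far-apart clusters
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 4.0)])
p_pos = np.array([0.2] * 5 + [0.0] * 5)   # P(x_i | y = 1) mass on cluster 1
p_neg = np.array([0.0] * 5 + [0.2] * 5)   # P(x_i | y = 0) mass on cluster 2
```

A point near the first cluster gets posterior near 1; near the second cluster, near 0.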
Outline

► Background
► Existing Methods
► Proposed Method
   Ideal Case
   General Case
► Experimental Results
► Conclusion
Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]

► Balanced classification
(Figure: accuracy vs. labeled set size on two tasks, "1 vs 2" and "odd vs even", comparing Our Method, Gaussian Random Fields, and Local and Global Consistency.)
Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]

► Unbalanced classification
(Figure: accuracy vs. labeled set size on "1 vs 2" and "odd vs even", same three methods.)
Genre Data Set [Liu, ECML03]

► Classification between random partitions
(Figure: accuracy vs. labeled set size in balanced and unbalanced settings, same three methods.)
Genre Data Set [Liu, ECML03]

► Unbalanced classification
(Figure: accuracy vs. labeled set size on "newspapers vs other" and "biographies vs other", same three methods.)
Conclusion

► A new graph-based semi-supervised learning method
   Generative in nature
   Ideal case: theoretical guarantee
   General case: reasonable estimates
   Prediction: easy and intuitive

Questions?