Machine Learning, Fall 2017
Generative and Discriminative Learning
What we saw most of the semester

• A fixed, unknown distribution D over X × Y
– X: instance space, Y: label space (e.g. {+1, -1})
• Given a dataset S = {(x_i, y_i)}
• Learning
– Identify a hypothesis space H, define a loss function L(h, x, y)
– Minimize average loss over training data (plus regularization)
• The guarantee
– If we find an algorithm that minimizes loss on the observed data
– Then, learning theory guarantees good future behavior (as a function of H)
Is this different from assuming a distribution over X and a fixed oracle function f?
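The learning setup above can be sketched in a few lines. This is an illustrative example, not code from the lecture: the hypothesis class is 1-D linear predictors h(x) = w·x, the loss is squared error, and we minimize the average loss over the training data plus an L2 regularizer by gradient descent. All constants are made up.

```python
import random

def fit_linear(data, reg=0.1, lr=0.05, epochs=200):
    """Minimize (1/n) * sum (w*x - y)^2 + reg * w^2 by gradient descent."""
    w = 0.0
    n = len(data)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / n + 2 * reg * w
        w -= lr * grad
    return w

# Samples from a fixed distribution D over X x Y (known only to us here):
# y = 2x plus a little Gaussian noise.
random.seed(0)
train = [(x, 2.0 * x + random.gauss(0, 0.1)) for x in [i / 10 for i in range(1, 21)]]

w = fit_linear(train)
print(round(w, 2))  # close to the true slope 2.0, slightly shrunk by the regularizer
```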
Discriminative models

Goal: learn directly how to make predictions
• Look at many (positive/negative) examples
• Discover regularities in the data
• Use these to construct a prediction policy
• Assumptions come in the form of the hypothesis class

Bottom line: approximating h: X → Y is estimating P(Y | X)
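A minimal sketch of a discriminative learner (illustrative, not the lecture's code): logistic regression models P(y = 1 | x) directly as a logistic function of a linear score, trained by gradient ascent on the log-likelihood. The toy dataset and all hyperparameters are invented.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(data, lr=0.5, epochs=500):
    """Gradient ascent on the average log-likelihood of P(y | x)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        gw = sum((y - sigmoid(w * x + b)) * x for x, y in data) / n
        gb = sum((y - sigmoid(w * x + b)) for x, y in data) / n
        w += lr * gw
        b += lr * gb
    return w, b

# Toy 1-D data: negatives centered at -2, positives at +2.
random.seed(1)
data = [(random.gauss(m, 1.0), y) for y, m in [(0, -2.0), (1, 2.0)] for _ in range(50)]

w, b = fit_logistic(data)
p = sigmoid(w * 3.0 + b)  # the model's estimate of P(y = 1 | x = 3)
print(p > 0.9)            # prints True: far on the positive side
```

Note that the learner never models where the x values come from; it only estimates P(y | x).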
Generative models

• Explicitly model how instances in each category are generated
• That is, learn P(X | Y) and P(Y)
• We did this for naïve Bayes
– Naïve Bayes is a generative model
• Predict P(Y | X) using Bayes' rule
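The prediction step is a direct application of Bayes' rule, P(Y | X) = P(X | Y) P(Y) / P(X). A small numeric sketch (the probabilities below are made up):

```python
# Learned prior P(Y) and class-conditional likelihoods P(x | Y)
# for one observed input x. All numbers are invented for illustration.
prior = {"+1": 0.3, "-1": 0.7}
likelihood = {"+1": 0.8, "-1": 0.1}

# P(x) = sum over labels of P(x | Y) P(Y)
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Bayes' rule: P(Y | x) = P(x | Y) P(Y) / P(x)
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}

print(round(posterior["+1"], 3))  # 0.24 / (0.24 + 0.07) = 0.774
```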
Example: Generative story of naïve Bayes

[Diagram: label node Y drawn from P(Y), with feature nodes X_1, X_2, X_3, ..., X_d drawn from P(X_1 | Y), P(X_2 | Y), P(X_3 | Y), ..., P(X_d | Y)]

First sample a label Y from P(Y). Then, given the label, sample the features X_1, ..., X_d independently from the conditional distributions P(X_i | Y).
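The generative story above can be run directly as a sampler. This sketch uses binary features, and all the probabilities are made up for illustration:

```python
import random

p_y = {"+1": 0.4, "-1": 0.6}   # P(Y)
p_x_given_y = {                 # P(X_i = 1 | Y), one entry per feature
    "+1": [0.9, 0.7, 0.2],
    "-1": [0.1, 0.4, 0.8],
}

def sample_example():
    # First sample a label ...
    y = "+1" if random.random() < p_y["+1"] else "-1"
    # ... then, given the label, sample the features independently
    # from the conditional distributions P(X_i | Y).
    x = [1 if random.random() < p else 0 for p in p_x_given_y[y]]
    return x, y

random.seed(0)
for _ in range(3):
    print(sample_example())
```

Running the sampler many times reproduces the joint distribution P(X, Y) that naïve Bayes assumes.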
Generative vs. discriminative models

• Generative models: learn P(x, y)
– Characterize how the data is generated (both inputs and outputs)
– E.g. naïve Bayes, Hidden Markov Model
• Discriminative models: learn P(y | x)
– Directly characterize the decision boundary only
– E.g. logistic regression, conditional models (several names)

A generative model tries to characterize the distribution of the inputs; a discriminative model doesn't care.
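The contrast can be made concrete on a tiny dataset (toy numbers, not from the lecture): the generative route estimates the full joint P(x, y) and conditions on x; the discriminative route estimates P(y | x) directly and never models P(x).

```python
from collections import Counter

data = [(0, "-1"), (0, "-1"), (0, "+1"), (1, "+1"), (1, "+1"), (1, "-1")]
n = len(data)

# Generative route: estimate the joint P(x, y), then condition on x.
joint = {xy: c / n for xy, c in Counter(data).items()}
p_x = {x: sum(p for (xv, _), p in joint.items() if xv == x) for x in (0, 1)}
posterior = {(x, y): joint.get((x, y), 0) / p_x[x]
             for x in (0, 1) for y in ("+1", "-1")}

# Discriminative route: estimate P(y | x) from the rows with that x,
# without ever modelling P(x).
def p_y_given_x(y, x):
    rows = [yy for xx, yy in data if xx == x]
    return rows.count(y) / len(rows)

print(posterior[(1, "+1")], p_y_given_x("+1", 1))  # both are 2/3 on this data
```

With enough data both routes agree on P(y | x); the generative model additionally commits to a model of P(x), which it had to spend parameters (and assumptions) on.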