Machine Learning
Neural Networks: Introduction
Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
Where are we?

General learning principles
• Overfitting
• Mistake-bound learning
• PAC learning, sample complexity
• Hypothesis choice & VC dimensions
• Training and generalization errors
• Regularized Empirical Loss Minimization
• Bayesian Learning

Learning algorithms
• Decision Trees
• Perceptron
• AdaBoost
• Support Vector Machines
• Naïve Bayes
• Logistic Regression

Several of these algorithms (e.g., Perceptron, SVM, Logistic Regression) produce linear classifiers.
Neural Networks
• What is a neural network?
• Predicting with a neural network
• Training neural networks
• Practical concerns
This lecture
• What is a neural network?
  – The hypothesis class
  – Structure, expressiveness
• Predicting with a neural network
• Training neural networks
• Practical concerns
We have seen linear threshold units

[Figure: input features feed into a dot product, followed by a threshold]

Prediction: sgn(wᵀx + b) = sgn(∑ᵢ wᵢxᵢ + b)

Learning: various algorithms (Perceptron, SVM, logistic regression, ...); in general, minimize a loss

But where do these input features come from?
What if the features were outputs of another classifier?
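As a concrete illustration (my own sketch, not from the slides), a linear threshold unit in NumPy; the weights w and bias b are assumed to have already been learned by one of the algorithms above.

```python
import numpy as np

def predict_ltu(w, b, x):
    """Linear threshold unit: the sign of a dot product plus bias."""
    return np.sign(np.dot(w, x) + b)

# Hypothetical learned parameters and a single example
w = np.array([0.5, -1.2, 2.0])
b = 0.1
x = np.array([1.0, 0.0, 1.0])
print(predict_ltu(w, b, x))  # +1 or -1
```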
Features from classifiers

[Figure: the outputs of several linear threshold units are fed in as the input features of another linear threshold unit]

Each of these connections has its own weight as well.

This is a two-layer feedforward neural network:
– The input layer
– The hidden layer
– The output layer

Think of the hidden layer as learning a good representation of the inputs.

The dot product followed by the threshold constitutes a neuron. There are five neurons in this picture: four in the hidden layer and one in the output layer.
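A minimal sketch of this idea (my own illustration, assuming sign activations and random placeholder weights): the outputs of several linear threshold units become the input features of another one.

```python
import numpy as np

def ltu_layer(W, b, x):
    """A layer of linear threshold units: sign(Wx + b), elementwise."""
    return np.sign(W @ x + b)

rng = np.random.default_rng(0)
W_hidden, b_hidden = rng.normal(size=(4, 3)), rng.normal(size=4)  # 4 hidden units over 3 inputs
w_out, b_out = rng.normal(size=4), 0.0                            # 1 output unit over the 4 hidden outputs

x = np.array([1.0, -0.5, 2.0])
h = ltu_layer(W_hidden, b_hidden, x)   # hidden layer: features produced by classifiers
y = np.sign(w_out @ h + b_out)         # output layer: a classifier over those features
```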
But where do the inputs come from?

What if the inputs were themselves the outputs of a classifier? Then the input layer is fed by yet another layer of classifiers: we can make a three-layer network... and so on.

Let us try to formalize this.
Neural networks

A robust approach for approximating real-valued, discrete-valued, or vector-valued functions.

Among the most effective general-purpose supervised learning methods currently known, especially for complex and hard-to-interpret data such as real-world sensory data.

The backpropagation algorithm for neural networks has proved successful in many practical problems, across various application domains.
Artificial neurons

Functions that very loosely mimic a biological neuron.

A neuron accepts a collection of inputs (a vector x) and produces an output by:
1. Applying a dot product with weights w and adding a bias b
2. Applying a (possibly non-linear) transformation called an activation

output = activation(wᵀx + b)

[Figure: a dot product followed by a threshold activation; other activations are possible]
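A minimal sketch of this two-step recipe (my own illustration): a dot product plus bias, followed by whichever activation we choose.

```python
import numpy as np

def neuron(w, b, x, activation):
    """output = activation(w^T x + b)"""
    return activation(np.dot(w, x) + b)

# With the sign activation, this is exactly the linear threshold unit from before
y = neuron(np.array([0.5, -1.0]), 0.2, np.array([1.0, 3.0]), np.sign)
```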
Activation functions

Name of the neuron              Activation function: activation(z)
Linear unit                     z
Threshold/sign unit             sgn(z)
Sigmoid unit                    1 / (1 + exp(−z))
Rectified linear unit (ReLU)    max(0, z)
Tanh unit                       tanh(z)

output = activation(wᵀx + b)

Many more activation functions exist (sinusoid, sinc, Gaussian, polynomial, ...). Activation functions are also called transfer functions.
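Each activation in the table is a one-liner; a NumPy sketch (my own illustration):

```python
import numpy as np

def linear(z):  return z
def sign(z):    return np.sign(z)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))
def relu(z):    return np.maximum(0.0, z)
def tanh(z):    return np.tanh(z)
```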
A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph:
– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another, and are associated with weights

To define a neural network, we need to specify:
– The structure of the graph: how many nodes, and the connectivity
– The activation function on each node
– The edge weights

The graph structure and the activation functions are called the architecture of the network; they are typically predefined, part of the design of the classifier. The edge weights are learned from data.

[Figure: input, hidden, and output layers, with weights w¹ and w² labeling the two sets of edges]
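Putting the pieces together, a minimal forward-pass sketch (my own illustration, with arbitrary layer sizes and sigmoid activations): the architecture is fixed up front, while the weights are what training would learn (here they are just random placeholders).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params, activations):
    """Feedforward pass through a layered, fully connected network."""
    h = x
    for (W, b), act in zip(params, activations):
        h = act(W @ h + b)
    return h

# Architecture (predefined): 3 inputs -> 4 hidden units -> 1 output, sigmoid everywhere
sizes = [3, 4, 1]
activations = [sigmoid, sigmoid]

# Edge weights (learned from data in practice; random placeholders here)
rng = np.random.default_rng(0)
params = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(sizes[:-1], sizes[1:])]

y = forward(np.array([1.0, 0.5, -2.0]), params, activations)
```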
A brief history of neural networks

• 1943: McCulloch and Pitts showed how linear threshold units can compute logical functions
• 1949: Hebb suggested a learning rule that has some physiological plausibility
• 1950s: Rosenblatt, the Perceptron algorithm for a single threshold neuron
• 1969: Minsky and Papert studied the neuron from a geometrical perspective
• 1980s: Convolutional neural networks (Fukushima, LeCun), the backpropagation algorithm (various)
• Early 2000s-today: More compute, more data, deeper networks

See also: http://people.idsia.ch/~juergen/deep-learning-overview.html
What functions do neural networks express?
A single neuron with threshold activation

Prediction = sgn(b + w1 x1 + w2 x2)

[Figure: positively and negatively labeled points in the plane, separated by the line b + w1 x1 + w2 x2 = 0]
Two layers, with threshold activations

In general, convex polygons: intersections of the halfspaces defined by the hidden units.

[Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014]
Three layers, with threshold activations

In general, unions of convex polygons, as sketched in the code below.

[Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014]
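A minimal sketch of both constructions (my own illustration, with hand-picked weights and sign activations): the first layer's units define halfspaces, a second-layer unit ANDs them into a convex region, and a third-layer unit could OR several such regions into a union.

```python
import numpy as np

def layer(W, b, x):
    """A layer of threshold units: sign(Wx + b)."""
    return np.sign(W @ x + b)

# Layer 1: four halfspaces whose intersection is (roughly) the unit square [0,1] x [0,1]
W1 = np.array([[ 1.0,  0.0],   # fires when x1 > 0
               [-1.0,  0.0],   # fires when x1 < 1
               [ 0.0,  1.0],   # fires when x2 > 0
               [ 0.0, -1.0]])  # fires when x2 < 1
b1 = np.array([0.0, 1.0, 0.0, 1.0])

# Layer 2: AND of the four hidden units (fires only if all four fire)
w_and, b_and = np.ones(4), -3.5

def in_unit_square(x):
    h = layer(W1, b1, x)
    return np.sign(w_and @ h + b_and)         # +1 inside the square, -1 outside

print(in_unit_square(np.array([0.5, 0.5])))   # +1
print(in_unit_square(np.array([2.0, 0.5])))   # -1

# Layer 3 (sketch): OR two such convex-region detectors d1, d2 into a union:
#   sign(d1 + d2 + 1.5) fires if either detector fires.
```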
Neural networks are universal function approximators

• Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]
• The approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]
• Two-layer threshold networks can express any Boolean function
  – Exercise: Prove this (a concrete special case, XOR, is sketched below)
• VC dimension of a threshold network with edges E: VC = O(|E| log |E|)
• VC dimension of sigmoid networks with nodes V and edges E:
  – Upper bound: O(|V|² |E|²)
  – Lower bound: Ω(|E|²)

Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness.
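Not a proof, but a concrete instance of the Boolean-expressiveness claim (my own construction, with hand-picked weights): XOR, which no single threshold unit can compute, expressed with one hidden layer of threshold units.

```python
import numpy as np

def xor_net(x1, x2):
    """Two-layer threshold network computing XOR on {0,1} inputs (+1 = true, -1 = false)."""
    h_or  = np.sign(x1 + x2 - 0.5)   # fires unless both inputs are 0
    h_and = np.sign(x1 + x2 - 1.5)   # fires only if both inputs are 1
    return np.sign(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # +1 exactly when a != b
```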