Machine Learning – Neural Networks: Introduction. Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others.

May 20, 2020

Transcript
Page 1:

Machine Learning

Neural Networks: Introduction

Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

Page 2:

Where are we?

General learning principles
• Overfitting
• Mistake-bound learning
• PAC learning, sample complexity
• Hypothesis choice & VC dimensions
• Training and generalization errors
• Regularized Empirical Loss Minimization
• Bayesian Learning

Learning algorithms
• Decision Trees
• Perceptron
• AdaBoost
• Support Vector Machines
• Naïve Bayes
• Logistic Regression


These produce linear classifiers

Page 3:

Neural Networks

• What is a neural network?
• Predicting with a neural network
• Training neural networks
• Practical concerns


Page 4:

This lecture

• What is a neural network?
– The hypothesis class
– Structure, expressiveness
• Predicting with a neural network
• Training neural networks
• Practical concerns


Page 5:

We have seen linear threshold units


[Figure: input features feeding a dot product followed by a threshold]

Prediction: sgn(wᵀx + b) = sgn(∑ᵢ wᵢxᵢ + b)

Learning: various algorithms (perceptron, SVM, logistic regression, …)

In general, minimize a loss

But where do these input features come from?

What if the features were the outputs of another classifier?
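To make the prediction rule above concrete, here is a minimal sketch of a linear threshold unit in Python/NumPy; the particular weights, bias, and feature values are illustrative placeholders, not values from the slides.

```python
import numpy as np

def predict_ltu(w, b, x):
    """Linear threshold unit: sgn(w·x + b), breaking ties toward +1."""
    return 1.0 if np.dot(w, x) + b >= 0 else -1.0

# Illustrative values (not from the slides)
w = np.array([0.5, -1.0, 2.0])   # one weight per input feature
b = 0.1                          # bias term
x = np.array([1.0, 0.0, 1.0])    # input feature vector
print(predict_ltu(w, b, x))      # -> 1.0
```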

Page 6:

Features from classifiers


Page 7:

Features from classifiers


Page 8:

Features from classifiers


Each of these connections has its own weight as well

Page 9:

Features from classifiers


Page 10:

Features from classifiers


This is a two-layer feed-forward neural network

Page 11:

Features from classifiers


The input layer, the hidden layer, the output layer

This is a two-layer feed-forward neural network

Think of the hidden layer as learning a good representation of the inputs
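As a rough sketch of the two-layer feed-forward network just described, the hidden layer applies several threshold units to the input and the output layer applies one more threshold unit to the hidden outputs. The sizes and random weights below are assumptions for illustration only.

```python
import numpy as np

def step(z):
    """Threshold activation, applied elementwise (ties go to +1)."""
    return np.where(z >= 0, 1.0, -1.0)

def two_layer_forward(x, W1, b1, w2, b2):
    """Hidden layer of threshold units feeding a single output threshold unit."""
    h = step(W1 @ x + b1)      # each row of W1 is one hidden neuron's weights
    return step(w2 @ h + b2)   # the output neuron sees the hidden activations as its features

# Illustrative sizes: 3 input features, 4 hidden neurons, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
w2, b2 = rng.normal(size=4), 0.0
print(two_layer_forward(np.array([1.0, -1.0, 0.5]), W1, b1, w2, b2))
```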

Page 12:

Features from classifiers


The dot product followed by the threshold constitutes a neuron

Five neurons in this picture (four in the hidden layer and one output)

This is a two-layer feed-forward neural network

Page 13:

But where do the inputs come from?


What if the inputs (the input layer) were the outputs of a classifier?

We can make a three-layer network… and so on.

Page 14:

Let us try to formalize this


Page 15:

Neural networks

A robust approach for approximating real-valued, discrete-valued or vector-valued functions

Among the most effective general-purpose supervised learning methods currently known, especially for complex and hard-to-interpret data such as real-world sensory data

The Backpropagation algorithm for neural networks has been shown to be successful in many practical problems, across various application domains


Page 16:

Artificial neurons

Functions that very loosely mimic a biological neuron

A neuron accepts a collection of inputs (a vector x) and produces an output by:
1. Applying a dot product with weights w and adding a bias b
2. Applying a (possibly non-linear) transformation called an activation


output = activation(wᵀx + b)

Page 17:

Artificial neurons

Functions that very loosely mimic a biological neuron

A neuron accepts a collection of inputs (a vector x) and produces an output by:
1. Applying a dot product with weights w and adding a bias b
2. Applying a (possibly non-linear) transformation called an activation


output = activation(wᵀx + b)

The dot product here is wᵀx + b and the activation is a threshold; other activations are possible.

Page 18:

Activation functions

Name of the neuron and its activation function activation(z):
• Linear unit: z
• Threshold/sign unit: sgn(z)
• Sigmoid unit: 1 / (1 + exp(−z))
• Rectified linear unit (ReLU): max(0, z)
• Tanh unit: tanh(z)


output = activation(wᵀx + b)

Many more activation functions exist (sinusoid, sinc, Gaussian, polynomial, …)

Also called transfer functions
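The activation functions from the table above, written out as a small sketch in Python/NumPy (the dictionary layout is just one convenient way to organize them):

```python
import numpy as np

activations = {
    "linear":         lambda z: z,
    "threshold/sign": lambda z: np.sign(z),
    "sigmoid":        lambda z: 1.0 / (1.0 + np.exp(-z)),
    "ReLU":           lambda z: np.maximum(0.0, z),
    "tanh":           lambda z: np.tanh(z),
}

z = np.array([-2.0, 0.0, 2.0])
for name, fn in activations.items():
    print(f"{name:>14}: {fn(z)}")
```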

Page 19:

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph
– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another and are associated with weights

• To define a neural network, we need to specify:
– The structure of the graph (how many nodes, the connectivity)
– The activation function on each node
– The edge weights


[Figure: a network with an input layer, a hidden layer, and an output layer; each edge between layers carries a weight]

Page 20:

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph
– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another and are associated with weights

• To define a neural network, we need to specify:
– The structure of the graph (how many nodes, the connectivity)
– The activation function on each node
– The edge weights


[Figure: a network with an input layer, a hidden layer, and an output layer; each edge between layers carries a weight]

Page 21:

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph
– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another and are associated with weights

• To define a neural network, we need to specify:
– The structure of the graph (how many nodes, the connectivity)
– The activation function on each node
– The edge weights


Called the architecture of the network. Typically predefined, part of the design of the classifier.

[Figure: a network with an input layer, a hidden layer, and an output layer; each edge between layers carries a weight]

Page 22:

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph
– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another and are associated with weights

• To define a neural network, we need to specify:
– The structure of the graph (how many nodes, the connectivity)
– The activation function on each node
– The edge weights


Called the architecture of the network. Typically predefined, part of the design of the classifier.

The edge weights are learned from data.

[Figure: a network with an input layer, a hidden layer, and an output layer; each edge between layers carries a weight]
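Putting the three ingredients together (graph structure, activation per node, edge weights), one way to sketch a small fully connected feed-forward network in Python/NumPy is shown below. The layer sizes, choice of sigmoid activations, and random initial weights are assumptions for illustration: the architecture is fixed by design, and only the weights would be learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Architecture (structure + activations): predefined, part of the design
layer_sizes = [3, 4, 1]                 # input width, hidden width, output width
layer_activations = [sigmoid, sigmoid]  # one activation per non-input layer

# Edge weights and biases: the part that is learned from data
rng = np.random.default_rng(0)
params = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, params, activations):
    """Pass the input through each layer in turn."""
    for (W, b), act in zip(params, activations):
        x = act(W @ x + b)
    return x

print(forward(np.array([1.0, 0.0, -1.0]), params, layer_activations))
```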

Page 23:

A brief history of neural networks

• 1943: McCulloch and Pitts showed how linear threshold units can compute logical functions

• 1949: Hebb suggested a learning rule that has some physiological plausibility

• 1950s: Rosenblatt, the Perceptron algorithm for a single threshold neuron

• 1969: Minsky and Papert studied the neuron from a geometrical perspective

• 1980s: Convolutional neural networks (Fukushima, LeCun), the backpropagation algorithm (various)

• Early 2000s to today: more compute, more data, deeper networks

See also: http://people.idsia.ch/~juergen/deep-learning-overview.html


Page 24:

What functions do neural networks express?


Page 25:

A single neuron with threshold activation


Prediction = sgn(b + w₁x₁ + w₂x₂)

[Figure: positive and negative points in the plane, separated by the decision boundary b + w₁x₁ + w₂x₂ = 0]

Page 26:

Two layers, with threshold activations


In general, convex polygons

Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014

Page 27:

Three layers, with threshold activations


In general, unions of convex polygons

Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014

Page 28:

Neural networks are universal function approximators

• Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]

• Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]

• Two-layer threshold networks can express any Boolean function
– Exercise: Prove this

• VC dimension of threshold networks with edges E: VC = O(|E| log |E|)

• VC dimension of sigmoid networks with nodes V and edges E:
– Upper bound: O(|V|²|E|²)
– Lower bound: Ω(|E|²)


Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness (a numerical check is sketched below)
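For the exercise above, a quick numerical illustration of the claim, with arbitrary matrices and biases omitted for brevity: stacking linear units just composes linear maps, which is again a single linear map, so extra layers add no expressiveness.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first "layer" of linear units
W2 = rng.normal(size=(2, 4))   # second "layer" of linear units
x = rng.normal(size=3)

deep = W2 @ (W1 @ x)           # two stacked linear layers
shallow = (W2 @ W1) @ x        # one linear layer with weight matrix W2 @ W1

print(np.allclose(deep, shallow))  # True: the two-layer linear network equals a one-layer one
```

With bias terms included, the composition is an affine map, which a single layer with a bias can also represent.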

Page 29:

Neural networks are universal function approximators

• Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]

• Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]

• Two-layer threshold networks can express any Boolean function
– Exercise: Prove this

• VC dimension of threshold networks with edges E: VC = O(|E| log |E|)

• VC dimension of sigmoid networks with nodes V and edges E:
– Upper bound: O(|V|²|E|²)
– Lower bound: Ω(|E|²)


Page 30:

Neural networks are universal function approximators

• Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]

• Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]

• Two-layer threshold networks can express any Boolean function
– Exercise: Prove this (a concrete XOR construction is sketched after this slide)

• VC dimension of threshold networks with edges E: VC = O(|E| log |E|)

• VC dimension of sigmoid networks with nodes V and edges E:
– Upper bound: O(|V|²|E|²)
– Lower bound: Ω(|E|²)


Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness
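As one concrete instance of the claim that two-layer threshold networks can express any Boolean function, here is a sketch, with hand-picked illustrative weights, of such a network computing XOR on ±1 inputs: each hidden unit detects one of the two "exactly one input is +1" cases, and the output unit fires if either hidden unit fires.

```python
import numpy as np

def step(z):
    return np.where(z >= 0, 1.0, -1.0)

# Hand-picked weights: the network outputs +1 exactly when x1 != x2 (XOR on +/-1 inputs)
W1 = np.array([[ 1.0, -1.0],    # hidden unit 1 fires only for (+1, -1)
               [-1.0,  1.0]])   # hidden unit 2 fires only for (-1, +1)
b1 = np.array([-1.0, -1.0])
w2 = np.array([1.0, 1.0])       # output unit fires if either hidden unit fires
b2 = 1.0

for x1 in (-1.0, 1.0):
    for x2 in (-1.0, 1.0):
        x = np.array([x1, x2])
        y = float(step(w2 @ step(W1 @ x + b1) + b2))
        print(f"x1={x1:+.0f}, x2={x2:+.0f} -> {y:+.0f}")
```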