
Stats 5 Seminar: Machine Learning

Winter 2018

Professor Padhraic Smyth
Departments of Computer Science and Statistics
University of California, Irvine

P. Smyth: Stats 5: Data Science Seminar, Winter 2018

Class Organization

• Meet weekly for a 40-minute seminar with 5-10 minutes of discussion

• 8 topics (with guest speakers), weeks 2 through 9
– You are encouraged to ask questions during and after the talks

• Intro and wrap-up talks in weeks 1 and 10

• Class website is at www.ics.uci.edu/~smyth/courses/stats5
– Slides and related materials will be posted during the quarter

Schedule of Lectures

Date    Speaker                        Department or Organization    Topic
Jan 9   Padhraic Smyth                 Computer Science              Introduction to Data Science
Jan 16  Padhraic Smyth                 Computer Science              Classification Algorithms in Machine Learning
Jan 23  Michael Carey                  Computer Science              Databases and Data Management
Jan 30  Sameer Singh                   Computer Science              Statistical Natural Language Processing
Feb 6   Zhaoxia Yu                     Statistics                    An Introduction to Cluster Analysis
Feb 13  Erik Sudderth                  Computer Science              Computer Vision and Machine Learning
Feb 20  John Brock                     Cylance, Inc.                 Data Science and Cybersecurity
Feb 27  Video Lecture (Kate Crawford)  Microsoft Research and NYU    Bias in Machine Learning
Mar 6   Matt Harding                   Economics                     Data Science in Economics and Finance
Mar 13  Padhraic Smyth                 Computer Science              Review: Past and Future of Data Science

SubmissionofReviewForms(Weeks2to10)

• SubmitReviewformsforLectures2through10• Availableathttp://www.ics.uci.edu/~smyth/courses/stats5/Forms/

• Reviewformswillbeavailableonlineatthestartofeachclass– Afewrelativelyshortquestionsbasedonthelecturethatday– Needstobesubmitted toEEEby12:15foreachlecture– Bringyour laptoporotherdevice

• Requirementstopasstheclass– Attendandsubmit reviewform for least8lecturesforweeks2through 10

(allowedtomissoneifyouneedtoforsomereason)

• Nofinalexam:pass/failbasedonattendanceandreviewforms

P.Smyth:Stats5:DataScience Seminar,Winter2018:5

Outline of Today's Topic

• What is machine learning?

• Classification algorithms

• Examples from image and sequence classification

• Conclusions and discussion

[Acknowledgement to Professor Alex Ihler for various slides and figures in this lecture]

What is Machine Learning?

Machine Learning (ML)

• Learning models from data
• Making predictions (or decisions)
• Getting better with experience (data)
• Problems whose solutions are "hard to describe"

Types of Machine Learning Problems

• Supervised learning
– "Labeled" training data
– Every example has a desired target value (a "known answer")
– Reward predictions close to the target; penalize predictions with large errors
– Classification: a discrete-valued prediction
– Regression: a continuous-valued prediction

Types of Machine Learning Problems

• Supervised learning
– "Labeled" training data
– Every example has a desired target value (a "best answer")
– Reward prediction being close to the target
– Classification: a discrete-valued prediction
– Regression: a continuous-valued prediction
– Recommender systems

[Figure: a users-by-movies matrix of star ratings (1 to 5), with some entries unknown ("?") for the model to predict]

Types of Machine Learning Problems

• Supervised learning
– Training data has labels or target values

• Unsupervised learning
– Training data has no labels or target values
– Interested in discovering natural structure in the data
– Often used in exploration of data, e.g., in science, in business
– Examples:
• Clustering customers or medical patients into groups
• Discovering a numerical representation of words or movies
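As a concrete illustration of the clustering idea, here is a minimal k-means sketch in plain Python. This is a hedged example: the lecture does not prescribe a particular clustering algorithm, and the data points, the choice of k, and the first-k-points initialization below are all made up for illustration.

```python
def kmeans(points, k, iters=20):
    # Minimal k-means: start from the first k points as centers, then
    # alternate (1) assign each point to its nearest center and
    # (2) recompute each center as the mean of its assigned points.
    centers = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

# Two well-separated groups of 2-D points; note that no labels are used anywhere
pts = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.8, 8.1)]
centers, clusters = kmeans(pts, k=2)
```

The algorithm discovers the two groups on its own, which is exactly the "natural structure in the data" point above.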


Data in 2 Dimensions with 5 Clusters

[Figure: scatterplot of 2-dimensional data falling into 5 clusters]

See the lecture by Prof. Zhaoxia Yu later this quarter on clustering algorithms

Embeddings of Words as Vectors

From: https://www.mathworks.com/help/examples/textanalytics/

Figure from Koren, Bell, Volinsky, IEEE Computer, 2009

Types of Machine Learning Problems

• Supervised learning

• Unsupervised learning

• Reinforcement learning
– Algorithm gets indirect feedback on its progress (rather than correct/incorrect)
– E.g., a program learning to play chess, or Go, or a video game
– E.g., an autonomous vehicle learning how to navigate a city
– Mathematical models for delayed reward, credit assignment, explore/exploit

Classification using Supervised Learning

Learning a Classification Model

Training Data

Patient ID  Zip Code  Age  ...  Test Score  Diagnosis
18261       92697     55        83          1
42356       92697     19        99          1
00219       90001     35        21          0
83726       24351     0         35          0

The learning algorithm learns a function that uses the values on the left to predict the value (Diagnosis) on the right
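To make "learning a function" concrete, here is a hedged one-feature sketch: it fits the simplest possible classifier, a single threshold on the Test Score column of the table above, by trying every candidate threshold and keeping the one with the fewest training errors. The actual model used in the lecture is not specified, and the other columns are ignored here purely for simplicity.

```python
def best_threshold(scores, labels):
    # Try a threshold at every observed score; keep the one with the
    # fewest training errors when predicting 1 for score >= threshold.
    best_t, best_err = None, len(labels) + 1
    for t in sorted(set(scores)):
        err = sum(int(s >= t) != y for s, y in zip(scores, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

scores = [83, 99, 21, 35]   # Test Score column of the training data
labels = [1, 1, 0, 0]       # Diagnosis column
t, err = best_threshold(scores, labels)   # t = 83 separates the classes perfectly
```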


Making Predictions with a Classification Model

Training Data

Patient ID  Zip Code  Age  ...  Test Score  Diagnosis
18261       92697     55        83          1
42356       92697     19        99          1
00219       90001     35        21          0
83726       24351     0         35          0

Test Data

12837       92697     40        70          ??
72623       92697     32        44          ??

We can then use the model to make predictions when target values are unknown

[Figure: scatterplot with AGE (0 to 90) on the x-axis and MONTHLY INCOME (0 to 14,000) on the y-axis. Each dot is a 2-dimensional point representing one person = [AGE, MONTHLY INCOME]]

[Figure: the same AGE vs. MONTHLY INCOME scatterplot. Blue dots = good loans, red dots = bad loans. Two candidate decision boundaries are drawn, labeled "Good boundary?" and "Better boundary?"]

[Figure: the same scatterplot with a much more complex boundary, but perhaps overfitting to noise?]

Basic Concepts

• The curve represents a classifier (a model, a predictor)
– Points on one side of the line get classified as one class
– Points on the other side get classified as the other class
– Once we know the curve we can take new points and classify them

• The curve is represented internally by a set of coefficients
– These are also known as "parameters" or "weights"

• The algorithm systematically adjusts the coefficients on the training data to reduce the error as much as it can

• This process of finding the weights is known as "learning a model"

• The foundational ideas are from statistics and optimization

[Figure: the AGE vs. MONTHLY INCOME scatterplot with an initial guess for the coefficients (not very good, high error)]

[Figure: the same scatterplot showing both the initial guess for the coefficients (not very good, high error) and the final solution for the coefficients (much better, low error)]

[Figure: 3-D scatterplot with axes AGE, MONTHLY INCOME, and ASSETS. Now each dot is a 3-dimensional point representing one person = [AGE, MONTHLY INCOME, ASSETS]. Our boundary line will now become a plane]

How Does this Work in Practice?

• We use computer algorithms to search for the best line or curve

• These search algorithms are quite simple:
1. Start with an initial random guess for the coefficients
2. Change the coefficients slightly to reduce the error (can use calculus to do this)
3. Move to the new coefficients
4. Keep repeating until "convergence"

• This search can be done in 10, 100, 1000, or 1 million "dimensions" ... with tens of millions of examples

• This search process is at the core of machine learning algorithms
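The four-step search above can be sketched in a few lines of Python. This is a hedged toy example: the error surface J(theta) = (theta0 - 3)^2 + (theta1 + 1)^2 and the step size are invented for illustration, and step 1's "random guess" is replaced by a fixed starting point so the run is reproducible.

```python
def gradient_descent(grad, theta, step=0.1, iters=100):
    # Steps 2-4 from the slide: nudge the coefficients a small amount
    # against the gradient of the error, and repeat until (near) convergence.
    for _ in range(iters):
        theta = [t - step * g for t, g in zip(theta, grad(theta))]
    return theta

# Toy error surface J(theta) = (theta0 - 3)^2 + (theta1 + 1)^2, minimized at (3, -1)
grad_J = lambda th: [2 * (th[0] - 3), 2 * (th[1] + 1)]
theta = gradient_descent(grad_J, [0.0, 0.0])   # converges to about [3.0, -1.0]
```

The same loop works unchanged whether theta has 2 entries or a million; only the gradient computation grows.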


Key Points

• We represent our training data as points in a multi-dimensional space
– How do we obtain the labels for the data points?

• We want to find a boundary curve that can separate the points into two classes

• The curves are represented by sets of coefficients (or weights)

• Machine learning algorithms use search (or optimization) to automatically find the coefficients with the lowest error on the training data

If the Model is too Complex it can Overfit

[Figure: four panels of the same (x, y) data: the raw data, a fit that is too simple, a fit that is too complex, and a fit that is about right]

Neural Network Classifiers

Machine Learning Notation

Features     x        e.g., pixel inputs (usually a multidimensional vector)
Targets      y        e.g., the true label for an image: "cat" or "no cat"
Predictions  ŷ        e.g., the model's prediction given the inputs, e.g., "cat"
Error        e(y, ŷ)  e.g., e = 0 if the prediction matches the target, 1 otherwise
Parameters   θ        e.g., the weights or coefficients specifying the model
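The error function e(y, ŷ) in the table is just the 0/1 loss, and averaging it over a set of examples gives the error rate. A small sketch (the example labels below are made up):

```python
def zero_one_error(y, y_hat):
    # e(y, yhat) from the slide: 0 if the prediction matches the target, 1 otherwise
    return 0 if y == y_hat else 1

targets = ["cat", "no cat", "cat"]   # y: true labels
preds = ["cat", "cat", "cat"]        # yhat: model predictions
error_rate = sum(zero_one_error(y, p) for y, p in zip(targets, preds)) / len(targets)
```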


Example: A Simple Linear Model

[Diagram: inputs x1, x2, x3, and a constant +1 node, each connected by an arrow to a single output f(x)]

The machine learning algorithm will learn a weight for each arrow in the diagram

This is a simple model: one weight per input
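As a sketch, the diagram's model is just a weighted sum of the inputs plus a bias weight for the +1 node. The specific weights below are made up; in practice they would be learned from data.

```python
def linear_model(x, w, b):
    # One weight per input, plus a bias weight b on the constant +1 node
    return sum(wi * xi for wi, xi in zip(w, x)) + b

f = linear_model([1.0, 2.0, 3.0], w=[0.5, -0.25, 0.1], b=1.0)
```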


A Simple Neural Network

[Diagram: inputs x1, x2, x3, and a +1 node feed into a hidden layer of 3 units, whose outputs combine into a single output f(x)]

Here the model learns 3 different functions and then combines the outputs of the 3 to make a prediction

This is more complex and has more parameters than the simple model
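A hedged forward-pass sketch of the diagram: 3 inputs, a hidden layer of 3 units, one output. The sigmoid squashing function and all the weights below are illustrative choices, not something the slide specifies.

```python
import math

def neuron(x, w, b):
    # One hidden unit: weighted sum of the inputs, passed through a
    # sigmoid nonlinearity (one common choice of squashing function)
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def small_net(x, hidden, out_w, out_b):
    # 3 different learned functions (hidden units), combined by the output layer
    h = [neuron(x, w, b) for w, b in hidden]
    return sum(wo * hi for wo, hi in zip(out_w, h)) + out_b

hidden = [([1.0, 0.0, 0.0], 0.0),   # (weights, bias) for each hidden unit
          ([0.0, 1.0, 0.0], 0.0),
          ([0.0, 0.0, 1.0], 0.0)]
y = small_net([0.0, 0.0, 0.0], hidden, out_w=[1.0, 1.0, 1.0], out_b=-1.5)
```

Counting parameters makes the "more parameters" point concrete: the hidden layer has 3 x (3 + 1) = 12, and the output layer adds 3 + 1 = 4 more.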



Deep Learning: Models with More Hidden Layers

[Diagram: inputs x1, x2, x3, and a +1 node feed into hidden layer 1, then hidden layer 2, then a single output f(x)]

We can build on this idea to create "deep models" with many hidden layers

Very flexible and complex functions

Example of a Network for Image Recognition

Mathematically this is just a function (a complicated one)

Figure from http://parse.ele.tue.nl/

A Brief History of Neural Networks ...

• The Perceptron Era: 1950s and 60s
– Great optimism with perceptrons (linear models) ...
– ... until Minsky, 1969: perceptrons have limited representational power
– Hard problems require hidden layers ... but there was no training algorithm

• The Backpropagation Era: late 1980s to mid-1990s
– Invention of backpropagation: training of models with hidden layers
– Wild enthusiasm (in the US at least) ... NIPS conference, funding, etc.
– Mid-1990s: enthusiasm dies out; training deep NNs is hard

• The Deep Learning Era: 2010 to present
– Third wave of neural network enthusiasm
– What has happened since the mid-1990s?
• Much larger datasets
• Much greater computational power
• Fast optimization techniques

Learning via Gradient Descent

Finding Good Parameters

• We want to find the parameters θ which minimize our error ...

• Think of a cost "surface": the error residual for that θ ...


Gradient Descent

• How to change θ to improve J(θ)?

• Choose a direction in which J(θ) is decreasing

• Derivative dJ/dθ:
– Positive => J is increasing
– Negative => J is decreasing

Gradient Descent in More Dimensions

• Gradient vector: the vector of partial derivatives of J(θ), one per parameter

• Indicates the direction of steepest ascent (its negative is the direction of steepest descent)
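When working out the calculus is inconvenient, the gradient vector can also be estimated numerically with finite differences: perturb one parameter at a time and measure the change in J. A hedged sketch, using a toy J invented for the example:

```python
def num_grad(J, theta, h=1e-6):
    # Finite-difference estimate of the gradient vector of J at theta
    g = []
    for i in range(len(theta)):
        up, down = theta[:], theta[:]
        up[i] += h
        down[i] -= h
        g.append((J(up) - J(down)) / (2 * h))
    return g

J = lambda th: (th[0] - 3) ** 2 + th[1] ** 2
g = num_grad(J, [0.0, 0.0])   # steepest-ascent direction; step along -g to descend
```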


Comments on Gradient Descent

• Simple and general algorithm
– Usable in a broad variety of models

• Local minima
– Sensitive to the starting point

Image Classification Examples

Example: Classifying Handwritten Digits

[Figure: sample handwritten digit images (what the data looks like to the human eye)]

Inputs: pixel values from each image
Output: 10 possible classes (0, 1, ..., 9)
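One common way (an assumption here, not stated on the slide) to map a network's 10 raw output scores onto the digit classes is the softmax function, which turns the scores into probabilities. A minimal sketch with made-up scores:

```python
import math

def softmax(scores):
    # Turn raw class scores into probabilities that sum to 1
    # (subtracting the max first for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [0.0] * 10
scores[7] = 5.0                 # made-up scores: the network strongly favors "7"
probs = softmax(scores)
predicted_digit = probs.index(max(probs))
```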


Pixel Inputs Represented Numerically

From https://www.tensorflow.org/get_started/mnist/beginners

Example: Classifying Handwritten Digits

Classification accuracy has gone from 93% to 99.9% in the past 10 years

Examples of Errors Made by the Neural Network Classifier

[Figure: misclassified digits, each shown with the human label ("truth") and the label predicted by the classifier]

Image from http://neuralnetworksanddeeplearning.com/chap6.html

Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge, 2015

Deep network architecture for the GoogLeNet network, 27 layers

Training data: inputs x = raw pixel values; labels y = values from 1 to 1000

Trained on millions of images

How is the network structure determined? Essentially trial and error (expensive!)

Figure from Kevin Murphy, Google, 2016


Figure from Krizhevsky, Sutskever, Hinton, 2012


Figure from Krizhevsky, Sutskever, Hinton, 2012


Figure from Lee et al., ICML 2009


Sequence Prediction Examples

Learning by Predicting What's Next

• Examples
– Predict the next word a person will type or speak, given the words up to this point
– Predict the value of the Dow Jones tomorrow afternoon, given its history

• We can use the same general methodologies as before
– The model now uses past data to predict the next event

• Applications
– Speech recognition
– Auto-suggest in human typing
– Machine translation
– Consumer modeling
– Chatbots
– ... and more
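A drastically simplified sketch of "predict what's next": a character bigram model that just counts which character most often follows each character. This is a hedged toy, with a made-up training string; the recurrent networks in the examples that follow learn far richer models of context.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # For each character, count which characters follow it in the text
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def predict_next(counts, ch):
    # Predict the most frequent follower of ch seen during training
    return counts[ch].most_common(1)[0][0] if counts[ch] else None

counts = train_bigram("banana bandana")
nxt = predict_next(counts, "n")   # "a" follows "n" most often in this string
```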


Example: Predicting the Next Character

Figure from http://cs.stanford.edu/people/karpathy/recurrentjs/

Example: Predicting Characters with a Recurrent Network

Figure from http://cs.stanford.edu/people/karpathy/recurrentjs/

Output from a Model Learned on Shakespeare

KING LEAR: O, if you were a feeble sight, the courtesy of your law, Your sight and several breath, will wear the gods With his heads, and my hands are wonder'd at the deeds, So drop upon your lordship's head, and your opinion Shall be against your honour.

Second Senator: They are away this miseries, produced upon my soul, Breaking and strongly should be buried, when I perish The earth and thoughts of many states.

DUKE VINCENTIO: Well, your wit is in the care of side and that.

Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Output from a Model Learned on Cooking Recipes

From https://gist.github.com/nylki/1efbaa36635956d35bcc

Output from a Model Learned on Source Code

Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Output from a Model Learned on Mathematics Papers

Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Output from a Model Learned from US Presidential Speeches

From https://medium.com/@samim/

Limitations of Classification Algorithms

A Deep Neural Network for Image Recognition
From Nguyen, Yosinski, Clune, CVPR 2015

A Deep Neural Network for Image Recognition

[Figure: images used for training vs. new images]

From Nguyen, Yosinski, Clune, CVPR 2015


Schedule of Lectures

Date    Speaker                        Department or Organization    Topic
Jan 9   Padhraic Smyth                 Computer Science              Introduction to Data Science
Jan 16  Padhraic Smyth                 Computer Science              Machine Learning
Jan 23  Michael Carey                  Computer Science              Databases and Data Management
Jan 30  Sameer Singh                   Computer Science              Statistical Natural Language Processing
Feb 6   Zhaoxia Yu                     Statistics                    An Introduction to Cluster Analysis
Feb 13  Erik Sudderth                  Computer Science              Computer Vision and Machine Learning
Feb 20  John Brock                     Cylance, Inc.                 Data Science and Cybersecurity
Feb 27  Video Lecture (Kate Crawford)  Microsoft Research and NYU    Bias in Machine Learning
Mar 6   Matt Harding                   Economics                     Data Science in Economics and Finance
Mar 13  Padhraic Smyth                 Computer Science              Review: Past and Future of Data Science
