Deep Learning for Personalized Search and Recommender Systems
Ganesh Venkataraman (Airbnb); Nadia Fawaz, Saurabh Kataria, Benjamin Le, Liang Zhang (LinkedIn)


Jan 21, 2018

Transcript
Page 1: Deep Learning for Personalized Search and Recommender Systems

Deep Learning for Personalized Search and Recommender Systems

Ganesh Venkataraman (Airbnb)

Nadia Fawaz, Saurabh Kataria, Benjamin Le, Liang Zhang (LinkedIn)

Page 2: Deep Learning for Personalized Search and Recommender Systems

Tutorial Outline

• Part I (45 min): Deep Learning Key Concepts
• Part II (45 min): Deep Learning for Search and Recommendations at Scale
• Coffee break (30 min)
• Deep Learning Case Studies
  • Part III (45 min): Jobs You May Be Interested In (JYMBII) at LinkedIn
  • Part IV (45 min): Job Search at LinkedIn
• Q&A at the end of each part

Page 3: Deep Learning for Personalized Search and Recommender Systems

Motivation – Why Recommender Systems?

• Recommendation systems are everywhere. Some examples of impact:
  • "Netflix values recommendations at half a billion dollars to the company" [netflix recsys]
  • "LinkedIn job matching algorithms improve performance by 50%" [San Jose Mercury News]
  • "Instagram switches to using an algorithmic feed" [Instagram blog]

Page 4: Deep Learning for Personalized Search and Recommender Systems

Motivation – Why Search?

PERSONALIZED SEARCH

Query = "things to do in halifax"
• Search view – this is a classic IR problem
• Recommendations view – for this query, what are the recommended results?

Page 5: Deep Learning for Personalized Search and Recommender Systems

Why Deep Learning? Why now?

• Many of the fundamental algorithmic techniques have existed since the 80s or before
• 2.5 exabytes of data produced per day (roughly 530,000,000 songs, or 150,000,000 iPhones)

Page 6: Deep Learning for Personalized Search and Recommender Systems

Why Deep Learning?

Image classification, eCommerce fraud, search, recommendations, NLP

Deep learning is eating the world

Page 7: Deep Learning for Personalized Search and Recommender Systems

Why Deep Learning and Recommender Systems?

• Features
  • Semantic understanding of words/sentences possible with embeddings
  • Better classification of images (identifying cats in YouTube videos)
• Modeling
  • Can we cast matching problems into a deep (and possibly wide) net and learn a family of functions?

Page 8: Deep Learning for Personalized Search and Recommender Systems

Part I – Representation Learning and Deep Learning: Key Concepts

Page 9: Deep Learning for Personalized Search and Recommender Systems

Deep Learning and AI

http://www.deeplearningbook.org/contents/intro.html

Page 10: Deep Learning for Personalized Search and Recommender Systems

Part I Outline

• Shallow Models for Embedding Learning
  • Word2Vec
• Deep Architectures
  • FF, CNN, RNN
• Training Deep Neural Networks
  • SGD, Backpropagation, Learning Rate Schedule, Regularization, Pre-Training

Page 11: Deep Learning for Personalized Search and Recommender Systems

Learning Embeddings

Page 12: Deep Learning for Personalized Search and Recommender Systems

Representation learning for automated feature generation

• Natural Language Processing
  • Word embedding: word2vec, GloVe
  • Sequence modeling using RNNs and LSTMs
• Graph inputs
  • DeepWalk
• Multiple hierarchies of features at varying granularities of semantic meaning with deep networks

Page 13: Deep Learning for Personalized Search and Recommender Systems

Example Application of Representation Learning – Understanding Text

• One of the keys to any content-based recommender system is understanding text
• What does "understanding" mean?
  • How similar/dissimilar are any two words?
  • What does the word represent? (Named Entity Recognition)
    • "Abraham Lincoln, the 16th President..."
    • "My cousin drives a Lincoln"

Page 14: Deep Learning for Personalized Search and Recommender Systems

How to represent a word?

• Vocabulary: run, jog, math
• Simple representation:
  • [1,0,0], [0,1,0], [0,0,1]
  • No representation of meaning
• Co-occurrence in a word/document matrix

Page 15: Deep Learning for Personalized Search and Recommender Systems

How to represent a word?

• Trouble with the co-occurrence matrix
  • Large dimension, lots of memory
• Dimensionality reduction using SVD
  • High computational cost: for an n×m matrix, O(mn²)
  • Adding a new word means redoing everything

Page 16: Deep Learning for Personalized Search and Recommender Systems

Word embeddings taking context into account

• Key conjecture
  • Context matters
  • Words that convey a certain context occur together
• "Abraham Lincoln was the 16th President of the United States"
• Bigram model
  • P("Lincoln" | "Abraham")
• Skip-gram model
  • Consider all words within the context and ignore position
  • P(Context | Word)

Page 17: Deep Learning for Personalized Search and Recommender Systems

Word2vec

Page 18: Deep Learning for Personalized Search and Recommender Systems

Word2Vec: Skip-Gram Model

• Basic notation:
  • $w$ represents a word, $C(w)$ represents all the context around that word
  • $\theta$ represents the parameter space
  • $D$ represents all the $(w, c)$ pairs
  • $p(c \mid w; \theta)$ represents the probability of context $c$ given word $w$, parametrized by $\theta$
• The probability of all the context appearing given a word is:
  $$\prod_{c \in C(w)} p(c \mid w; \theta)$$
• The objective then becomes:
  $$\arg\max_{\theta} \prod_{(w, c) \in D} p(c \mid w; \theta)$$

Page 19: Deep Learning for Personalized Search and Recommender Systems

Word2vec details

• Let $v_w$ and $v_c$ represent the vectors of the current word and the context. Note that $v_c$ and $v_w$ are the parameters we want to learn

$$p(c \mid w; \theta) = \frac{e^{v_c \cdot v_w}}{\sum_{c' \in C} e^{v_{c'} \cdot v_w}}$$

• $C$ represents the set of all available contexts

Page 20: Deep Learning for Personalized Search and Recommender Systems

Negative Sampling – basic intuition

$$p(c \mid w; \theta) = \frac{e^{v_c \cdot v_w}}{\sum_{c' \in C} e^{v_{c'} \cdot v_w}}$$

• Sample from the unigram distribution instead of taking all contexts into account
• Word2vec itself is a shallow model and can be used to initialize a deep model
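
To make the skip-gram objective above concrete, here is a minimal numpy sketch of one negative-sampling update. The array names, sizes, and the unigram_probs input are illustrative assumptions, not the original implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, K = 10_000, 100, 5          # vocabulary size, embedding dim, negatives per pair
W_in  = rng.normal(scale=0.01, size=(VOCAB, DIM))   # word ("target") vectors v_w
W_out = rng.normal(scale=0.01, size=(VOCAB, DIM))   # context vectors v_c

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(word, context, unigram_probs, lr=0.025):
    """One skip-gram negative-sampling update for a single (word, context) pair."""
    negatives = rng.choice(VOCAB, size=K, p=unigram_probs)   # sampled "noise" contexts
    v_w = W_in[word]
    ids = np.concatenate(([context], negatives))
    labels = np.concatenate(([1.0], np.zeros(K)))            # positive pair -> 1, negatives -> 0
    scores = sigmoid(W_out[ids] @ v_w)
    grad = scores - labels                                   # logistic-loss gradient w.r.t. scores
    W_in[word] -= lr * (grad @ W_out[ids])
    W_out[ids] -= lr * np.outer(grad, v_w)
```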

Page 21: Deep Learning for Personalized Search and Recommender Systems

Deep Architectures: FF, CNN, RNN

Page 22: Deep Learning for Personalized Search and Recommender Systems

Neuron: Computational Unit

• Input vector: x = [x1, x2, ..., xn]
• Neuron
  • Weight vector: W
  • Bias: b
  • Activation function: f
• Output: a = f(Wᵀx + b)

[Diagram: inputs x1..x4 feed a neuron with parameters W, b, f, producing output a = f(Wᵀx + b)]

Page 23: Deep Learning for Personalized Search and Recommender Systems

Activation Functions

• Tanh: ℝ → (-1, 1)
  $$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
• Sigmoid: ℝ → (0, 1)
  $$\sigma(x) = \frac{1}{1 + e^{-x}}$$
• ReLU: ℝ → [0, +∞)
  $$f(x) = \max(0, x)$$

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
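
A small numpy sketch of the computational unit a = f(Wᵀx + b) with the three activation functions above; the example input and weights are made up.

```python
import numpy as np

def tanh(x):    return np.tanh(x)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def relu(x):    return np.maximum(0.0, x)

def neuron(x, W, b, f=relu):
    """Single computational unit: a = f(W.T x + b)."""
    return f(W @ x + b)

x = np.array([0.5, -1.2, 3.0, 0.1])   # input vector x1..x4
W = np.array([0.2, -0.4, 0.1, 0.7])   # weight vector
b = -0.05                             # bias
print(neuron(x, W, b, tanh), neuron(x, W, b, sigmoid), neuron(x, W, b, relu))
```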

Page 24: Deep Learning for Personalized Search and Recommender Systems

Layer

• Layer l: n_l neurons
  • weight matrix: W = [W1, ..., W_{n_l}]
  • bias vector: b = [b1, ..., b_{n_l}]
  • activation function: f
• Output vector: a = f(Wᵀx + b)

[Diagram: inputs x1..x4 feed a layer of three neurons (W1, b1, f), (W2, b2, f), (W3, b3, f), producing outputs a1 = f(W1ᵀx + b1), a2 = f(W2ᵀx + b2), a3 = f(W3ᵀx + b3)]

Page 25: Deep Learning for Personalized Search and Recommender Systems

Layer: Matrix Notation

• Layer l: n_l neurons
  • weight matrix: W
  • bias vector: b
  • activation function: f
• Output vector: a = f(Wᵀx + b)
• More compact notation
  • Fast linear-algebra routines for quick computations in the network

[Diagram: input x (x1..x4) feeds a layer with parameters W, b, f, producing output a = f(Wᵀx + b)]

Page 26: Deep Learning for Personalized Search and Recommender Systems

Feed-Forward Network

• Depth: L layers
• Activation at layer l+1: a(l+1) = f(W(l)ᵀ a(l) + b(l))
• Output: prediction in supervised learning
  • Goal: approximate y = F(x)

[Diagram: depth L = 4; Input Layer 1 (x1..x4) → Hidden Layer 2 (W(1), b(1), f(1); activations a(2)) → Hidden Layer 3 (W(2), b(2), f(2); activations a(3)) → Output Layer 4: prediction layer (W(3), b(3), f(3); output a(L))]
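
A minimal sketch of the feed-forward pass a(l+1) = f(W(l)ᵀ a(l) + b(l)), assuming a toy 4-5-3-1 network with made-up weights.

```python
import numpy as np

def relu(x): return np.maximum(0.0, x)

def forward(x, layers):
    """Feed-forward pass: a(l+1) = f(W(l).T a(l) + b(l)) through a list of layers."""
    a = x
    for W, b, f in layers:
        a = f(W.T @ a + b)
    return a

rng = np.random.default_rng(0)
# a depth-4 network: 4 inputs -> 5 hidden -> 3 hidden -> 1 output
layers = [
    (rng.normal(size=(4, 5)), np.zeros(5), relu),
    (rng.normal(size=(5, 3)), np.zeros(3), relu),
    (rng.normal(size=(3, 1)), np.zeros(1), lambda z: 1 / (1 + np.exp(-z))),  # prediction layer
]
print(forward(np.array([0.5, -1.2, 3.0, 0.1]), layers))
```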

Page 27: Deep Learning for Personalized Search and Recommender Systems

Why CNN: Convolutional Neural Networks?

• Large grid-structured data
  • 1D: time series
  • 2D: image
• Convolution to extract features from an image (e.g. edges, texture)
  • Local connectivity
  • Parameter sharing
  • Equivariance to translation: a small translation of the input produces a correspondingly translated output

Page 28: Deep Learning for Personalized Search and Recommender Systems

Convolution example

https://docs.gimp.org/en/plug-in-convmatrix.html

[Figure: edge-detect kernel and sharpen kernel applied to an image]

Page 29: Deep Learning for Personalized Search and Recommender Systems

2D convolution

http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/

[Diagram: an input matrix convolved with a kernel matrix; the original slide shows a 3x3 2D kernel and a 2x2 kernel matrix with weights W1..W4]
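
A minimal numpy sketch of "valid" 2D convolution as used for feature extraction (deep learning libraries typically implement this as cross-correlation); the image and kernel below are made up.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (cross-correlation as used in CNNs):
    slide the kernel over the image and take dot products with overlapping patches."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image  = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # a tiny 2x2 edge-like kernel
print(conv2d_valid(image, kernel))             # (4, 4) feature map
```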

Page 30: Deep Learning for Personalized Search and Recommender Systems

• Fully connected
  • hidden unit connected to all input units
  • computationally expensive
  • Large image of N×N pixels and a hidden layer with K features
    • Number of parameters: ~KN²
• Locally connected
  • hidden unit connected to some contiguous input units
  • no parameter sharing
• Convolution
  • locally connected
  • kernel: parameter sharing
  • 1D kernel vector [W1, W2]
  • 1D Toeplitz weight matrix W
  • Scaling to large inputs, images
  • Equivariance to translation

[Diagram: for a 4-unit input and 3 hidden units, the fully connected weight matrix has entries W11..W34; the locally connected matrix keeps only the banded entries (W11, W12; W22, W23; W33, W34) with no sharing; the convolutional (Toeplitz) weight matrix reuses the kernel vector [W1, W2] along the band: (W1, W2, 0, 0; 0, W1, W2, 0; 0, 0, W1, W2)]

Page 31: Deep Learning for Personalized Search and Recommender Systems

Pooling

• Summary statistics
  • Aggregate over a region
  • Reduce size
  • Less overfitting
• Translation invariance
• Max, mean

http://ufldl.stanford.edu/tutorial/supervised/Pooling/
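
A small sketch of non-overlapping max pooling over a feature map; the sizes are illustrative.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: summarize each size x size region by its maximum."""
    H, W = feature_map.shape
    H2, W2 = H // size, W // size
    regions = feature_map[:H2 * size, :W2 * size].reshape(H2, size, W2, size)
    return regions.max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))   # 2x2 summary, invariant to small shifts within each region
```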

Page 32: Deep Learning for Personalized Search and Recommender Systems

CNN: Convolutional Neural Network

Combination of:
• Convolutional layers
• Pooling layers
• Fully connected layers

http://colah.github.io/posts/2014-07-Conv-Nets-Modular/

[LeCun et al., 1998]

Page 33: Deep Learning for Personalized Search and Recommender Systems

CNN example for image recognition: ImageNet [Krizhevsky et al., 2012]

Pictures courtesy of [Krizhevsky et al., 2012], http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

[Figure: filters learned by the first CNN layer, split across the 1st and 2nd GPU]

Page 34: Deep Learning for Personalized Search and Recommender Systems

Why RNN: Recurrent Neural Network?

• Sequential data processing
  • e.g. predict the next word in a sentence: "I was born in France. I can speak ..."
• RNN
  • Persist information through a feedback loop
  • The loop passes information from one step to the next
• Parameter sharing across time indexes
  • An output unit depends on previous output units through the same update rule

[Diagram: input x_t and previous hidden state h_{t-1} produce hidden state h_t]
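
A minimal sketch of one vanilla RNN step, showing the parameters shared across time steps; the dimensions and weights are made up.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: the same parameters are reused at every time index."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
D_in, D_h = 8, 16
W_xh = rng.normal(scale=0.1, size=(D_h, D_in))
W_hh = rng.normal(scale=0.1, size=(D_h, D_h))
b_h  = np.zeros(D_h)

h = np.zeros(D_h)                        # initial hidden state
for x_t in rng.normal(size=(5, D_in)):   # a length-5 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)   # information persists through h
print(h.shape)
```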

Page 35: Deep Learning for Personalized Search and Recommender Systems

Unfolded RNN

• Copies of the NN passing feedback to one another

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 36: Deep Learning for Personalized Search and Recommender Systems

LSTM: Long Short-Term Memory [Hochreiter et al., 1997]

• Avoid vanishing or exploding gradients
• Long-term dependencies
  • Large gap between relevant information and where it is needed
  • Cell state: long-term memory
  • Can remember relevant information over a long period of time
• Cell state updates regulated by gates
  • Forget: how much info from the cell state to let through
  • Input: which cell state components to update
  • Tanh: values to add to the cell state
  • Output: select component values to output

Picture courtesy of http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 37: Deep Learning for Personalized Search and Recommender Systems

Examples of RNN applications

• Speech recognition [Graves et al., 2013]
• Language modeling [Mikolov, 2012]
• Machine translation [Kalchbrenner et al., 2013] [Sutskever et al., 2014]
• Image captioning [Vinyals et al., 2014]

Page 38: Deep Learning for Personalized Search and Recommender Systems

Training a Deep Neural Network

Page 39: Deep Learning for Personalized Search and Recommender Systems

Cost Function

• m training samples (feature vector, label): $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$
• Per-sample cost: error between the label and the output of the prediction layer
  $$J(W, b; x^{(i)}, y^{(i)}) = \left\| a^{(L)}(x^{(i)}) - y^{(i)} \right\|^2$$
• Minimize the cost function over the parameters: weights W and biases b
  $$J(W, b) = \underbrace{\frac{1}{m} \sum_{i=1}^{m} J(W, b; x^{(i)}, y^{(i)})}_{\text{average error}} \;+\; \underbrace{\frac{\lambda}{2} \sum_{l=1}^{L} \left\| W^{(l)} \right\|^2}_{\text{regularization}}$$

Page 40: Deep Learning for Personalized Search and Recommender Systems

Gradient Descent

• Random parameter initialization: symmetry breaking
• Gradient descent step: update for every parameter $W_{ij}^{(l)}$ and $b_i^{(l)}$
  $$\theta = \theta - \alpha \nabla_{\theta} \, \mathbb{E}[J(\theta)]$$
• Gradient computed by backpropagation
• High cost of backpropagation over the full training set

Page 41: Deep Learning for Personalized Search and Recommender Systems

Stochastic Gradient Descent (SGD)

• SGD: follow the negative gradient after
  • a single sample:
    $$\theta = \theta - \alpha \nabla_{\theta} J(\theta; x^{(i)}, y^{(i)})$$
  • a few samples: mini-batch (e.g. 256)
• Epoch: full pass through the training set
• Randomly shuffle the data prior to each training epoch
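
A minimal sketch of a mini-batch SGD loop with per-epoch shuffling; grad_fn stands in for whatever routine (e.g. backpropagation) computes the batch gradients, so it is an assumed interface rather than a real API.

```python
import numpy as np

def sgd(params, grad_fn, data, lr=0.01, batch_size=256, epochs=10):
    """Mini-batch SGD: shuffle each epoch, then step against the gradient of each batch."""
    rng = np.random.default_rng(0)
    n = len(data)
    for _ in range(epochs):
        order = rng.permutation(n)                  # shuffle prior to each epoch
        for start in range(0, n, batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            grads = grad_fn(params, batch)          # e.g. computed by backpropagation
            for name in params:
                params[name] -= lr * grads[name]    # theta = theta - alpha * grad
    return params
```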

Page 42: Deep Learning for Personalized Search and Recommender Systems

Backpropagation [Rumelhart et al., 1986]

Goal: compute the gradient

Recursively apply the chain rule for the derivative of a composition of functions. Let $y = g(x)$ and $z = f(y) = f(g(x))$; then
$$\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx} = f'(g(x)) \, g'(x)$$

Backpropagation steps:
1. Feed-forward pass: compute all activations
2. Output error: measures each node's contribution to the output error
3. Backpropagate the error through all layers
4. Compute partial derivatives

Page 43: Deep Learning for Personalized Search and Recommender Systems

Training optimization

• Learning rate schedule
  • Change the learning rate as learning progresses
• Pre-training
  • Goal: train a simple model on a simple task before training the desired model to perform the desired task
  • Greedy supervised pre-training: pre-train for the task on a subset of layers as initialization for the final network
• Regularization to curb overfitting
  • Goal: reduce generalization error
  • Penalize parameter norm: L2, L1
  • Augment the dataset: train on more data
  • Early stopping: return the parameter set at the point in time with the lowest validation error
  • Dropout [Srivastava, 2013]: train an ensemble of all sub-networks formed by removing non-output units
• Gradient clipping to avoid exploding gradients (see the sketch below)
  • Norm clipping
  • Element-wise clipping
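
A small sketch of the two gradient-clipping variants mentioned above, assuming the gradients are kept in a dict of numpy arrays.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Norm clipping: rescale all gradients if their global L2 norm exceeds max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
    scale = min(1.0, max_norm / (total + 1e-8))
    return {name: g * scale for name, g in grads.items()}

def clip_elementwise(grads, limit=1.0):
    """Element-wise clipping: cap each gradient component to [-limit, limit]."""
    return {name: np.clip(g, -limit, limit) for name, g in grads.items()}
```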

Page 44: Deep Learning for Personalized Search and Recommender Systems

Part II – Deep Learning for Personalized Recommender Systems at Scale

Page 45: Deep Learning for Personalized Search and Recommender Systems

Examples of Personalized Recommender Systems

Page 46: Deep Learning for Personalized Search and Recommender Systems

Examples of Personalized Recommender Systems

Job Search

Page 47: Deep Learning for Personalized Search and Recommender Systems

Examples of Personalized Recommender Systems

Page 48: Deep Learning for Personalized Search and Recommender Systems

Personalized Recommender Systems

[Diagram: user i, with <user features, query (optional)> (e.g. industry, behavioral features, demographic features, ...), visits; the algorithm selects item j from a set of candidates; (i, j) yields response y_ij (action or not, e.g. click, like, share, apply, ...)]

Which item(s) should we recommend to the user?
• The item(s) with the best expected utility
• Utility examples:
  • CTR, revenue, job apply rates, ads conversion rates, ...
  • Can be a combination of the above for trade-offs

Page 49: Deep Learning for Personalized Search and Recommender Systems

An Example Architecture of Personalized Recommender Systems

Page 50: Deep Learning for Personalized Search and Recommender Systems

An example of a Recommender System architecture

[Diagram. Offline system: user interaction logs feed an offline modeling workflow that produces user/item derived features and pushes a model to the ranking model store. Online system: the user's request is served using the user feature store and the item store with features; recommendation ranking is applied, followed by additional re-ranking steps. The original figure numbers the flow 1-5.]

Page 51: Deep Learning for Personalized Search and Recommender Systems

An example of a Personalized Search System architecture

[Diagram. Offline system: user interaction logs feed an offline modeling workflow that produces user/item derived features and pushes a model to the ranking model store. Online system: the user's query goes through query construction, then search-based candidate selection & retrieval against a search index of items (with item derived features), then recommendation ranking using the user feature store and the ranking model store, followed by additional re-ranking steps. The original figure numbers the flow 1-7.]

Page 52: Deep Learning for Personalized Search and Recommender Systems

Key Components – Offline Modeling

• Train the model offline (e.g. on Hadoop)
• Push the model to the online ranking model store
• Pre-generate user/item derived features for online systems to consume
  • E.g. user/item embeddings from word2vec / DNNs based on the raw features

Page 53: Deep Learning for Personalized Search and Recommender Systems

Key Components – Candidate Selection

• Personalized search (with a user query):
  • Form a query to the index based on user query annotation [Arya et al., 2016]
  • Example: "Panda Express Sunnyvale" → +restaurant:"panda express" +location:"sunnyvale"
• Recommender system (optional):
  • Can help dramatically reduce the number of items to score in the ranking steps [Cheng et al., 2016; Borisyuk et al., 2016]
  • Form a query based on the user features
  • Goal: fetch only the items with at least some match with the user's features
  • Example: a user with title "software engineer" → +title:"software engineer" for job recommendations
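
A toy sketch of forming a structured retrieval query from tagged segments, in the spirit of the examples above; the +field:"value" syntax mirrors the slide and is not the query language of any particular search engine.

```python
def build_retrieval_query(tagged_segments):
    """Build a structured retrieval query (field:value clauses) from tagged query segments."""
    return " ".join(f'+{field}:"{value}"' for field, value in tagged_segments)

# "Panda Express Sunnyvale" after tagging:
print(build_retrieval_query([("restaurant", "panda express"), ("location", "sunnyvale")]))
# A candidate-selection query built from user features for job recommendations:
print(build_retrieval_query([("title", "software engineer")]))
```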

Page 54: Deep Learning for Personalized Search and Recommender Systems

Key Components – Ranking

• Recommendation ranking
  • The main ML model that ranks items retrieved by candidate selection based on the expected utility
• Additional re-ranking steps
  • Often for user experience optimization related to business rules, e.g.
    • Diversification of the ranking results
    • Recency boost
    • Impression discounting
    • ...

Page 55: Deep Learning for Personalized Search and Recommender Systems

Integration of Deep Learning Models into Personalized Recommender Systems at Scale

Page 56: Deep Learning for Personalized Search and Recommender Systems

Literature: Deep Learning for Recommendation Systems

• RBM for Collaborative Filtering [Salakhutdinov et al., 2007]
• Deep Belief Networks [Hinton et al., 2006]
• Neural Autoregressive Distribution Estimator (NADE) [Zheng, 2016]
• Neural Collaborative Filtering [He et al., 2017]
• Siamese networks for user-item matching [Huang et al., 2013]
• Deep Belief Networks with Pre-training [Hinton et al., 2006]
• Collaborative Deep Learning [Wang et al., 2015]

Page 57: Deep Learning for Personalized Search and Recommender Systems

[Diagram: the Personalized Search System architecture from Page 51, shown again: query construction, search-based candidate selection & retrieval against the search index of items, recommendation ranking with the user feature store and ranking model store, and additional re-ranking steps, with the offline modeling workflow feeding user/item derived features and models.]

Page 58: Deep Learning for Personalized Search and Recommender Systems

Offline Modeling + User/Item Embeddings

[Diagram: user features and item features are mapped to a user embedding vector and an item embedding vector; Sim(U, I) is computed between them; the embeddings are pushed to the user feature store and the item store/index with features]

Page 59: Deep Learning for Personalized Search and Recommender Systems

Query Formulation & Candidate Selection

• Issues of using raw text: noisy or incorrect query tagging due to
  • Failure to capture semantic meaning
    • Ex. query: "Apple watch" → +food:apple +product:watch, or +product:"apple watch"?
  • Multilingual text
    • Query: 熊猫快餐 ("Panda Express") → +restaurant:"panda express"
  • Cross-domain understanding
    • People search vs. job search

Page 60: Deep Learning for Personalized Search and Recommender Systems

Query Formulation & Candidate Selection

• Represent the query as an embedding
• Expand the query to similar queries in a semantic space
• KNN search in a dense feature space with an inverted index [Cheng et al., 2016]

[Diagram: the query Q = "Apple Watch" and documents D = "iphone", D = "Orange Swatch", D = "ipad" placed in the semantic embedding space]

Page 61: Deep Learning for Personalized Search and Recommender Systems

Recommendation Ranking Models

• Wide and Deep models to capture all possible signals [Cheng et al., 2016]

https://arxiv.org/pdf/1606.07792.pdf
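
A minimal sketch of Wide & Deep scoring: a linear "wide" part over sparse cross features plus a feed-forward "deep" part over dense/embedded features, joined in one logistic output. The function and parameter names are illustrative, not the paper's code.

```python
import numpy as np

def relu(x):    return np.maximum(0.0, x)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def wide_and_deep_score(x_wide, x_deep, w_wide, deep_layers, w_deep, bias=0.0):
    """Wide & Deep scoring: joint logistic layer over wide features and the deep tower output."""
    a = x_deep
    for W, b in deep_layers:              # the deep (feed-forward) tower
        a = relu(W @ a + b)
    return sigmoid(w_wide @ x_wide + w_deep @ a + bias)

rng = np.random.default_rng(0)
x_wide = np.array([1.0, 0.0, 1.0])        # e.g. binary cross features
x_deep = rng.normal(size=8)               # e.g. concatenated embeddings
deep_layers = [(rng.normal(size=(16, 8)), np.zeros(16)),
               (rng.normal(size=(8, 16)), np.zeros(8))]
print(wide_and_deep_score(x_wide, x_deep, rng.normal(size=3), deep_layers, rng.normal(size=8)))
```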

Page 62: Deep Learning for Personalized Search and Recommender Systems

Challenges & Open Problems for Deep Learning in Recommender Systems

• Distributed training on very large data
  • TensorFlow on Spark (https://github.com/yahoo/TensorFlowOnSpark)
  • CNTK (https://github.com/Microsoft/CNTK)
  • MXNet (http://mxnet.io/)
  • Caffe (http://caffe.berkeleyvision.org/)
  • ...
• Latency issues from online scoring
  • Pre-generation of user/item embeddings
  • Multi-layer scoring (simple models => complex)
• Batch vs. online training

Page 63: Deep Learning for Personalized Search and Recommender Systems

Part III – Case Study: Jobs You May Be Interested In (JYMBII)

Page 64: Deep Learning for Personalized Search and Recommender Systems

Outline

• Introduction
• Generating Embeddings via Word2vec
• Generating Embeddings via Deep Networks
• Tree Feature Transforms in the Deep + Wide Framework

Page 65: Deep Learning for Personalized Search and Recommender Systems

Introduction: JYMBII

Page 66: Deep Learning for Personalized Search and Recommender Systems

Introduction: Problem Formulation

• Rank jobs by P(user u applies to job j | u, j)
• Model the response given:
  • User: career history, skills, education, connections
  • Job: title, description, location, company

Page 67: Deep Learning for Personalized Search and Recommender Systems

Introduction: JYMBII Modeling – Generalization

• The model should learn general rules to predict which jobs to recommend to a member
• Learn generalizations based on similarity in title, skill, location, etc. between the profile and the job posting

Page 68: Deep Learning for Personalized Search and Recommender Systems

Introduction: JYMBII Modeling – Memorization

• The model should memorize exceptions to the rules
• Learn exceptions based on frequent co-occurrence of features

[Diagram: a specific member profile "applies to" a specific job posting]

Page 69: Deep Learning for Personalized Search and Recommender Systems

Introduction: Baseline Features

• Dense BoW similarity features for generalization
  • i.e. similarity in title text is a good predictor of response
• Sparse two-depth cross features for memorization
  • i.e. memorize that computer science students will transition to entry engineering roles

Examples:
• Vector BoW similarity feature: Sim(User Title BoW, Job Title BoW)
• Sparse cross feature: AND(user = Comp. Sci. Student, job = Software Engineer)
• Sparse cross feature: AND(user = In Silicon Valley, job = In Austin, TX)
• Sparse cross feature: AND(user = ML Engineer, job = UX Designer)

Page 70: Deep Learning for Personalized Search and Recommender Systems

Introduction: Issues

• BoW features don't capture semantic similarity between user/job
  • Cosine similarity between "Application Developer" and "Software Engineer" is 0
• Generating three-depth, four-depth cross features won't scale
  • i.e. memorizing that factory workers from Detroit are applying to fracking jobs in Pennsylvania
• Hand-engineered features are time consuming and will have low coverage
  • The number of permutations of three-depth, four-depth cross features grows exponentially

Page 71: Deep Learning for Personalized Search and Recommender Systems

Introduction: Deep + Wide for JYMBII

• BoW features don't capture semantic similarity between user/job
  • Generate embeddings to capture generalization through semantic similarity
  • Deep + Wide model for JYMBII [Cheng et al., 2016]

Features:
• Semantic similarity feature: Sim(User Embedding, Job Embedding)
• Global model cross feature: AND(user = Comp. Sci. Student, job = Software Engineer)
• User model cross feature: AND(user = User 2, job = Job Latent Feature 1)
• Job model cross feature: AND(user = User Latent Feature, job = Job 1)
• Sparse cross feature: AND(user = Comp. Sci. Student, job = Software Engineer)
• Sparse cross feature: AND(user = In Silicon Valley, job = In Austin, TX)
• Sparse cross feature: AND(user = ML Engineer, job = UX Designer)
• Vector BoW similarity feature: Sim(User Title BoW, Job Title BoW)

Page 72: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Word2vec: Training Word Vectors

• Key ideas
  • The same users (context) apply to similar jobs (target)
  • Similar users (target) will apply to the same jobs (context)
• Example: Application Developer => Software Engineer
• Train word vectors via the word2vec skip-gram architecture
  • Concatenate the user's current title and the applied job's title as input: [User Title, Applied Job Title]

Page 73: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Word2vec: Model Structure

[Diagram: tokenized titles ("Application, Developer" for the user; "Software, Engineer" for the job) → word embedding lookup from pre-trained word vectors → entity embeddings via average pooling → cosine similarity between the user and job embeddings → response prediction (logistic regression)]
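
A small sketch of the scoring path on this slide: average-pool pre-trained word vectors into entity embeddings and use their cosine similarity as a feature; the word_vectors dict is a made-up stand-in for real pre-trained vectors.

```python
import numpy as np

def entity_embedding(tokens, word_vectors):
    """Entity embedding via average pooling of pre-trained word vectors."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=50) for w in
                ["application", "developer", "software", "engineer"]}

user_emb = entity_embedding(["application", "developer"], word_vectors)
job_emb  = entity_embedding(["software", "engineer"], word_vectors)
sim = cosine(user_emb, job_emb)   # used as a feature in the logistic-regression response model
print(sim)
```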

Page 74: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Word2vec: Results and Next Steps

• Receiver Operating Characteristic – Area Under Curve (ROC AUC) for evaluation
  • Response prediction is binary classification: apply or don't apply
  • Highly skewed data: low CTR for the apply action
  • Good metric for ranking quality: focuses on the discriminatory ability of the model
• Marginal 0.87% ROC AUC gain
• How to improve the quality of embeddings?
  • Optimize embeddings for the prediction task with supervised training
  • Leverage richer context about the user and job

Page 75: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Deep Networks: Model Structure

[Diagram: for both user and job, sparse features (title, skill, company) → embedding layer → hidden layer → entity embedding; the two entity embeddings are combined with a Hadamard product (element-wise product) and fed into response prediction (logistic regression)]
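
A rough two-tower sketch of this architecture, with the Hadamard product feeding a logistic output; all parameter names and shapes are assumptions for illustration.

```python
import numpy as np

def relu(x):    return np.maximum(0.0, x)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def tower(sparse_ids, emb_table, W_h, b_h):
    """One tower: pool sparse-feature embeddings, then a hidden layer -> entity embedding."""
    pooled = emb_table[sparse_ids].sum(axis=0)
    return relu(W_h @ pooled + b_h)

def score(user_ids, job_ids, params):
    """Response prediction: logistic regression over the Hadamard product of the
    user and job entity embeddings (a sketch of the slide's architecture)."""
    u = tower(user_ids, params["user_emb"], params["user_W"], params["user_b"])
    j = tower(job_ids,  params["job_emb"],  params["job_W"],  params["job_b"])
    interaction = u * j                       # Hadamard (element-wise) product
    return sigmoid(params["w_lr"] @ interaction + params["b_lr"])

rng = np.random.default_rng(0)
params = {
    "user_emb": rng.normal(size=(1000, 32)), "user_W": rng.normal(size=(32, 32)), "user_b": np.zeros(32),
    "job_emb":  rng.normal(size=(5000, 32)), "job_W":  rng.normal(size=(32, 32)), "job_b":  np.zeros(32),
    "w_lr": rng.normal(size=32), "b_lr": 0.0,
}
print(score([3, 42, 917], [10, 2048], params))   # sparse feature ids for one user and one job
```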

Page 76: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Deep Networks: Hyperparameters, Lots of Knobs!

• Optimizer used
  • SGD w/ momentum and exponential decay vs. Adam [Kingma et al., 2015] (selected: Adam)
• Learning rate
  • Swept over a range of powers of 10 (selected: 10⁻⁴)
• Embedding layer size
  • 50 to 200 (selected: 100)
• Dropout
  • 0% to 50% dropout (selected: 0% dropout)
• Sharing parameter space for both user/job embeddings
  • Assumes commutative property of recommendations (a + b = b + a) (selected: no shared parameter space)
• Hidden layer sizes
  • 0 to 2 hidden layers (selected: 200 → 200 hidden layer sizes)
• Activation function
  • ReLU vs. Tanh (selected: ReLU)

Page 77: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Deep Networks: Training Challenges

• Millions of rows of training data: impossible to store it all in memory
  • Stream data incrementally directly from files into a fixed-size example pool
  • Add shuffling by randomly sampling from the example pool for training batches
• Extreme dimensionality of the company sparse feature
  • Reduce the dimensionality of the company feature from millions to tens of thousands
  • Perform feature selection by frequency in the training set
• Hyperparameter tuning
  • Distribute grid search through parallel modeling in single-driver Spark jobs
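
A possible sketch of the fixed-size example pool for approximate shuffling of data that does not fit in memory; the exact pool management used in the original system is not specified, so this is one plausible variant.

```python
import random

def shuffled_batches(example_stream, pool_size=10_000, batch_size=256, seed=0):
    """Approximate shuffling for data that does not fit in memory: stream examples
    into a fixed-size pool; each time the pool fills, draw a shuffled batch from it."""
    rng = random.Random(seed)
    pool = []
    for example in example_stream:          # e.g. a generator reading training files line by line
        pool.append(example)
        if len(pool) == pool_size:
            rng.shuffle(pool)
            batch, pool = pool[:batch_size], pool[batch_size:]
            yield batch
    rng.shuffle(pool)                        # flush what remains at the end of the stream
    for i in range(0, len(pool), batch_size):
        yield pool[i:i + batch_size]
```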

Page 78: Deep Learning for Personalized Search and Recommender Systems

Generating Embeddings via Deep Networks: Results

Model               | ROC AUC
Baseline Model      | 0.753
Deep + Wide Model   | 0.790 (+4.91%***)

*** For reference, a previous major JYMBII modeling improvement with a 20% lift in ROC AUC resulted in a 30% lift in job applications

Page 79: Deep Learning for Personalized Search and Recommender Systems

The Current Deep + Wide Model

[Diagram: response prediction (logistic regression) on top of deep embedding features (feed-forward NN) and wide sparse cross features (two-depth)]

• Generating three-depth, four-depth cross features won't scale
• Smart feature selection is required

Page 80: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Feature Selection via Gradient Boosted Decision Trees

• Each tree outputs a path from root to leaf, encoding a combination of feature crosses [He et al., 2014]
• GBDTs select the most useful combinations of feature crosses for memorization

[Diagram: example decision trees splitting on features such as Member Seniority: Vice President, Member Industry: Banking, Member Location: Silicon Valley, Member Skill: Statistics, Job Seniority: CXO, and Job Title: ML Engineer; each root-to-leaf path encodes a feature cross]

Page 81: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: The Full Picture

[Diagram: response prediction (logistic regression) on top of deep embedding features (feed-forward NN) and wide sparse cross features (GBDT)]

How do we train both the NN model and the GBDT model jointly with each other?

Page 82: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Joint Training via Block-wise Cyclic Coordinate Descent

• Treat the NN model and the GBDT model as separate block-wise coordinates
• Implemented by (see the sketch below):
  1. Training the NN until convergence
  2. Training the GBDT w/ fixed NN embeddings
  3. Training the regression-layer weights w/ generated cross features from the GBDT
  4. Training the NN until convergence w/ fixed cross features
  5. Cycling steps 2-4 until a global convergence criterion is met
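
A pseudocode-style sketch of the cycle above; nn, gbdt, and lr_layer are hypothetical model objects with assumed fit/predict/transform interfaces, standing in for the actual training routines.

```python
def train_deep_wide_with_gbdt(nn, gbdt, lr_layer, data, max_cycles=4,
                              converged=lambda models: False):
    """Block-wise cyclic coordinate descent over the NN block and the GBDT block.
    All model objects and their methods are hypothetical interfaces for illustration."""
    nn.fit(data)                                          # 1. train the NN until convergence
    for _ in range(max_cycles):
        margins = nn.predict_margin(data)                 # NN part held fixed
        gbdt.fit(data, init_margin=margins)               # 2. train GBDT with NN output as initial margin
        crosses = gbdt.transform(data)                    # root-to-leaf paths as sparse cross features
        lr_layer.fit([nn.embeddings(data), crosses], data.labels)   # 3. retrain the regression layer
        nn.fit(data, fixed_wide_features=crosses)         # 4. retrain the NN with fixed cross features
        if converged((nn, gbdt, lr_layer)):               # 5. cycle until global convergence
            break
    return nn, gbdt, lr_layer
```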

Page 83: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train NN Until Convergence

Initially no trees are in our forest

[Diagram: response prediction (logistic regression) on top of deep embedding features (feed-forward NN) and wide sparse cross features (GBDT)]

Page 84: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train GBDT w/ NN Section as Initial Margin

[Diagram: response prediction (logistic regression) on top of deep embedding features (feed-forward NN) and wide sparse cross features (GBDT)]

Page 85: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train GBDT w/ NN Section as Initial Margin

[Diagram: response prediction (logistic regression) on top of deep embedding features (feed-forward NN) and wide sparse cross features (GBDT)]

Page 86: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train Regression Layer Weights

[Diagram: response prediction (logistic regression) on top of deep embedding features (feed-forward NN) and wide sparse cross features (GBDT)]

Page 87: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Train NN w/ GBDT Section as Initial Margin

[Diagram: response prediction (logistic regression) on top of deep embedding features (feed-forward NN) and wide sparse cross features (GBDT)]

Page 88: Deep Learning for Personalized Search and Recommender Systems

Tree Feature Transforms: Block-wise Coordinate Descent Results

Model                                    | ROC AUC
Baseline Model                           | 0.753
Deep + Wide Model                        | 0.790 (+4.91%)
Deep + Wide Model w/ GBDT Iteration 1    | 0.792 (+5.18%)
Deep + Wide Model w/ GBDT Iteration 2    | 0.794 (+5.44%)
Deep + Wide Model w/ GBDT Iteration 3    | 0.795 (+5.57%)
Deep + Wide Model w/ GBDT Iteration 4    | 0.796 (+5.71%)

Page 89: Deep Learning for Personalized Search and Recommender Systems

JYMBII Deep + Wide: Future Direction

• Generating embeddings w/ LSTM networks
  • Leverage sequential career history data
  • Promising results in NEMO: Next Career Move Prediction with Contextual Embedding [Li et al., 2017]
• Semi-supervised training
  • Leverage pre-trained title, skill, and company embeddings on profile data
• Replace the Hadamard product for the entity embedding similarity function
  • Deep Crossing [Shan et al., 2016]
• Add even richer context
  • i.e. location, education, and network features

Page 90: Deep Learning for Personalized Search and Recommender Systems

Part IV – Case Study: Deep Learning Networks for Job Search

Page 91: Deep Learning for Personalized Search and Recommender Systems

Outline

• Introduction
• Representations via Word2vec
• Robust Representations via DSSM

Page 92: Deep Learning for Personalized Search and Recommender Systems

Introduction: Job Search

Page 93: Deep Learning for Personalized Search and Recommender Systems

Introduction: Search Architecture

[Diagram: the user query goes through query understanding; top-K retrieval against the index (built by the indexer) produces candidates; result ranking, backed by offline training and the model, produces the final results]

Page 94: Deep Learning for Personalized Search and Recommender Systems

Introduction: Query Understanding – Segmentation and Tagging

• First divide the search query into segments
• Tag query segments based on recognized entity tags

Query segmentations:
• [Oracle] [Java] [Application Developer]
• [Oracle] [Java Application Developer]

Query tagging:
• COMPANY = Oracle, SKILL = Java, TITLE = Application Developer
• COMPANY = Oracle, TITLE = Java Application Developer

Page 95: Deep Learning for Personalized Search and Recommender Systems

Introduction: Query Understanding – Expansion

• The task of adding synonyms / related entities to the query to improve recall
• Current approach: a curated dictionary of common synonyms and related entities

COMPANY = Oracle OR NetSuite OR Taleo OR Sun Microsystems OR ...
SKILL = Java OR Java EE OR J2EE OR JVM OR JRE OR JDK ...
TITLE = Application Developer OR Software Engineer OR Software Developer OR Programmer ...

(Green – synonyms; blue – related entities in the original slide)

Page 96: Deep Learning for Personalized Search and Recommender Systems

Introduction: Query Understanding – Retrieval and Ranking

COMPANY = Oracle OR NetSuite OR Taleo OR Sun Microsystems OR ...
SKILL = Java OR Java EE OR J2EE OR JVM OR JRE OR JDK ...
TITLE = Application Developer OR Software Engineer OR Software Developer OR Programmer ...

[Diagram: the expanded query clauses are matched against the Title, Skills, and Company fields of job documents for retrieval and ranking]

Page 97: Deep Learning for Personalized Search and Recommender Systems

Introduction: Issues – Retrieval and Ranking

• Term retrieval has limitations
  • Cross-language retrieval
    • Softwareentwickler ⇔ Software Developer
  • Word inflections
    • Engineering Management ⇔ Engineering Manager
• Query expansion via a curated dictionary of synonyms is not scalable
  • Expensive to refresh and store synonyms for all possible entities
• Heavy reliance on query tagging is not robust enough
  • Novel title, skill, and company entities will not be tagged correctly
  • Errors upstream propagate to poor retrieval and ranking

Page 98: Deep Learning for Personalized Search and Recommender Systems

Introduction: Solution – Deep Learning for Query and Document Representations

• Query and document representations
  • Map query and document text to vectors in a semantic space
  • Robust handling of out-of-vocabulary words
• Term retrieval has limitations; query expansion via a curated dictionary of synonyms is not scalable
  • Map synonyms, translations, and inflections to similar vectors in the semantic space
  • Term retrieval on cluster id, or KNN-based retrieval
• Heavy reliance on query tagging is not robust enough
  • Complement structured query representations with semantic representations

Page 99: Deep Learning for Personalized Search and Recommender Systems

Representations via Word2vec: Leverage JYMBII Work

• Key ideas
  • Similar users (context) apply to the same job (target)
  • The same user (target) will apply to similar jobs (context)
• Example: Application Developer => Software Engineer
• Train word vectors via the word2vec skip-gram architecture
  • Concatenate the user's current title and the applied job's title as input: [User Title, Applied Job Title]

Page 100: Deep Learning for Personalized Search and Recommender Systems

Representations via Word2vec: Word2vec in Ranking

[Diagram: tokenized text ("Application, Developer" for the query; "Software, Engineer" for the job) → word embedding lookup from pre-trained word vectors → entity embeddings via average pooling → cosine similarity between query and job → learning-to-rank model (NDCG loss)]

Page 101: Deep Learning for Personalized Search and Recommender Systems

Representations via Word2vec: Ranking Model Results

Model                               | NDCG@5          | CTR@5 Lift (%)
Baseline Model                      | 0.582           | +0.0%
Baseline Model + Word2Vec Feature   | 0.595 (+2.2%)   | +1.6%

Page 102: Deep Learning for Personalized Search and Recommender Systems

Representations via Word2vec: Optimize Embeddings for the Job Search Use Case

• Leverage apply and click feedback to guide learning of embeddings
  • Fine-tune embeddings for the task using supervised feedback
• Handle out-of-vocabulary words and scale to the query vocabulary size
  • Compared to JYMBII, the query vocabulary is much larger and less well-formed
    • Misspellings
    • Word inflections
    • Free-text search
  • Need to make representations more robust for these free-text queries

Page 103: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: Deep Structured Semantic Model [Huang et al., 2013]

[Diagram: raw text for the query ("Application Developer"), the applied job (positive, "Software Engineer"), and a randomly sampled applied job (negative, "Hairdresser") is tri-letter hashed (#Ap, App, ppl, ...; #So, Sof, oft, ...; #Ha, Hai, air, ...), passed through hidden layers 1-3, and compared via cosine similarity; training uses a softmax w/ cross-entropy loss over the positive and negative similarities]
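
A minimal sketch of the training objective on this slide: softmax with cross-entropy over the cosine similarities of the query with one positive and several sampled negative documents (gamma is the usual smoothing factor; the values are illustrative).

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

def dssm_loss(query_vec, pos_doc_vec, neg_doc_vecs, gamma=10.0):
    """DSSM-style objective: softmax w/ cross-entropy over (query, doc) cosine similarities."""
    sims = np.array([cosine(query_vec, pos_doc_vec)] +
                    [cosine(query_vec, d) for d in neg_doc_vecs])
    logits = gamma * sims                        # gamma sharpens the softmax
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[0]                         # negative log-likelihood of the positive doc

rng = np.random.default_rng(0)
q, pos = rng.normal(size=128), rng.normal(size=128)
negs = [rng.normal(size=128) for _ in range(4)]   # randomly sampled negatives
print(dssm_loss(q, pos, negs))
```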

Page 104: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: Tri-letter Hashing

• Tri-letter hashing example
  • Engineer → #en, eng, ngi, gin, ine, nee, eer, er#
• Benefits of tri-letter hashing
  • More compact bag-of-tri-letters vs. bag-of-words representation
    • 700K word vocabulary → 75K tri-letters
  • Can generalize to out-of-vocabulary words
  • Tri-letter hashing is robust to minor misspellings and inflections of words
    • Engneer → #en, eng, ngn, gne, nee, eer, er#
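
A small sketch of tri-letter hashing that reproduces the example above.

```python
from collections import Counter

def tri_letter_hash(word):
    """Letter tri-gram ('tri-letter') hashing as in DSSM:
    pad the word with boundary markers and take all overlapping 3-grams."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(tri_letter_hash("Engineer"))   # ['#en', 'eng', 'ngi', 'gin', 'ine', 'nee', 'eer', 'er#']
print(tri_letter_hash("Engneer"))    # the misspelling still shares most tri-letters

def bag_of_tri_letters(text):
    """Bag-of-tri-letters representation of a whole query or title."""
    return Counter(t for word in text.split() for t in tri_letter_hash(word))
```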

Page 105: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: Training Details

• Parameter sharing helps
  • Better and faster convergence
  • Model size is reduced
• Regularization
  • L2 performs better than dropout
• Toolkit comparisons (CNTK vs. TensorFlow)
  • CNTK: faster convergence and better model quality
  • TensorFlow: easy to implement and better community support; comparable model quality

[Figure: training performance with/without parameter sharing]

Page 106: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: Lessons in a Production Environment

• Bottlenecks in the production environment
  • Latency due to extra computation
  • Latency due to GC activity
  • Fat JARs in the JVM environment
• Practical lessons
  • Avoid the JVM heap while serving the model
  • Cache the most accessed entities' embeddings

[Figure: latency impact chart, with markers at +40%, +70%, and +100% in the original slide]

Page 107: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: DSSM Qualitative Results

Head query           | Similar queries
Software Engineer    | Engineer Software; Software Engineers; Software Engineering
Data Mining          | Data Miner; Machine Learning Engineer; Research
LinkedIn             | Google; Microsoft
Softwareentwickler   | Software; Software Engineer; Engineer Software

For qualitative results, only top head queries are taken to analyze similarity to each other

Page 108: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: DSSM Metric Results

Model                               | NDCG@5          | CTR@5 Lift (%)
Baseline Model                      | 0.582           | +0.0%
Baseline Model + Word2Vec Feature   | 0.595 (+2.2%)   | +1.6%
Baseline Model + DSSM Feature       | 0.602 (+3.4%)   | +3.2%

Page 109: Deep Learning for Personalized Search and Recommender Systems

Robust Representations via DSSM: DSSM Future Direction

• Leverage current query understanding in the DSSM model
  • Query tag entity information for richer context embeddings
  • Query segmentation structure can be incorporated into the network design
• Deep Crossing for the similarity layer [Shan et al., 2016]
• Convolutional DSSM [Shen et al., 2014]

Page 110: Deep Learning for Personalized Search and Recommender Systems

Conclusion

• Recommender systems and personalized search are very similar problems
• Deep learning is here to stay and can have significant impact on both
  • Understanding and constructing queries
  • Ranking
• Deep learning and more traditional techniques are *not* mutually exclusive (hint: Deep + Wide)

Page 111: Deep Learning for Personalized Search and Recommender Systems

References

• [Rumelhart et al., 1986] Learning representations by back-propagating errors, Nature 1986
• [Hochreiter et al., 1997] Long short-term memory, Neural Computation 1997
• [LeCun et al., 1998] Gradient-based learning applied to document recognition, Proceedings of the IEEE 1998
• [Krizhevsky et al., 2012] ImageNet classification with deep convolutional neural networks, NIPS 2012
• [Graves et al., 2013] Speech recognition with deep recurrent neural networks, ICASSP 2013
• [Mikolov, 2012] Statistical language models based on neural networks, PhD Thesis, Brno University of Technology, 2012
• [Kalchbrenner et al., 2013] Recurrent continuous translation models, EMNLP 2013
• [Srivastava, 2013] Improving neural networks with dropout, PhD Thesis, University of Toronto, 2013
• [Sutskever et al., 2014] Sequence to sequence learning with neural networks, NIPS 2014
• [Vinyals et al., 2014] Show and tell: a neural image caption generator, arXiv 2014
• [Zaremba et al., 2015] Recurrent Neural Network Regularization, ICLR 2015

Page 112: Deep Learning for Personalized Search and Recommender Systems

References (continued)

• [Arya et al., 2016] Personalized Federated Search at LinkedIn, CIKM 2016
• [Cheng et al., 2016] Wide & Deep Learning for Recommender Systems, DLRS 2016
• [He et al., 2014] Practical Lessons from Predicting Clicks on Ads at Facebook, ADKDD 2014
• [Kingma et al., 2015] Adam: A Method for Stochastic Optimization, ICLR 2015
• [Huang et al., 2013] Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, CIKM 2013
• [Li et al., 2017] NEMO: Next Career Move Prediction with Contextual Embedding, WWW 2017
• [Shan et al., 2016] Deep Crossing: Web-scale modeling without manually crafted combinatorial features, KDD 2016
• [Zhang et al., 2016] GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction, KDD 2016
• [Salakhutdinov et al., 2007] Restricted Boltzmann Machines for Collaborative Filtering, ICML 2007
• [Zheng, 2016] http://tech.hulu.com/blog/2016/08/01/cfnade.html
• [Hinton et al., 2006] A fast learning algorithm for deep belief nets, Neural Computation 2006
• [Wang et al., 2015] Collaborative Deep Learning for Recommender Systems, KDD 2015
• [He et al., 2017] Neural Collaborative Filtering, WWW 2017
• [Borisyuk et al., 2016] CaSMoS: A Framework for Learning Candidate Selection Models over Structured Queries and Documents, KDD 2016

Page 113: Deep Learning for Personalized Search and Recommender Systems

References (continued)

• [netflix recsys] http://nordic.businessinsider.com/netflix-recommendation-engine-worth-1-billion-per-year-2016-6/
• [San Jose Mercury News] http://www.mercurynews.com/2017/01/06/at-linkedin-artificial-intelligence-is-like-oxygen/
• [Instagram blog] http://blog.instagram.com/post/145322772067/160602-news