

The Automated Travel Agent: Machine Learning for Hotel Cluster Recommendation
Michael Arruza-Cruz, Michael Straka, John Pericich

Abstract

Expedia users who prefer the same types of hotels presumably share other, non-hotel commonalities with each other. With this in mind, Kaggle challenged developers to recommend hotels to Expedia users. Armed with a training set containing data on 37 million Expedia users, we set out to do just that. Our machine-learning algorithms ranged from direct applications of material learned in class to multi-part algorithms with novel combinations of recommender-system techniques. Kaggle's benchmark for randomly guessing a user's hotel cluster is 0.02260, and the mean average precision (K = 5) for naïve recommender systems is 0.05949. Our best combination of machine-learning algorithms achieved a score just over 0.30. Our results provide insight into performing multi-class classification on data sets that lack linear structure.

Data and Features

• 37 million data entries of user data, each with a total of 23 features covering hotel destination, number of rooms, number of children, length of stay, etc.

• Hotel clusters are anonymized, and only user data is given. This makes typical user-item matrix methods, such as Alternating Least Squares, impossible to use.

• The data set is skewed towards certain hotel clusters over others: some clusters are overrepresented while others appear very rarely (Figure 1; a quick check is sketched after this list).

• Likewise, some destinations appear very frequently, and others appear only a handful of times.

• The data is also not linearly separable (Figure 2).
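A minimal sketch of how one might check this skew, assuming the public Kaggle CSV layout; the column names hotel_cluster and srch_destination_id come from the competition data, and the sample size is illustrative:

```python
import pandas as pd

# Load a manageable sample of the ~37M-row training set; nrows and the
# column names are illustrative of the public Expedia competition data.
train = pd.read_csv("train.csv", nrows=1_000_000,
                    usecols=["srch_destination_id", "hotel_cluster"])

# Relative frequency of each anonymized hotel cluster (cf. Figure 1).
cluster_freq = train["hotel_cluster"].value_counts(normalize=True)
print(cluster_freq.head())   # a few clusters dominate
print(cluster_freq.tail())   # others appear very rarely

# Destinations are similarly long-tailed.
print(train["srch_destination_id"].value_counts().describe())
```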

Methodology

Baseline methods:

• Our first attempt implemented a basic multinomial Naïve Bayes classifier that returns a list of the ten most likely clusters for a user. This method served as our baseline moving forward.

• Our second attempt used a support vector machine with an RBF kernel. This underperformed compared to Naïve Bayes. Given the lack of linear separability, we suspect it is difficult to fit a hyperplane to the data without using parameters that result in overfitting. A sketch of the baseline follows this list.
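A minimal sketch of such a top-10 Naïve Bayes baseline with scikit-learn, assuming the 23 features have been encoded as non-negative integers; X_train, y_train, and X_test are illustrative names, not from the poster:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def top_k_clusters(model, X, k=10):
    """Return the k most probable hotel clusters per row, best first."""
    proba = model.predict_proba(X)                      # (n_samples, n_classes)
    top_idx = np.argsort(proba, axis=1)[:, ::-1][:, :k]
    return model.classes_[top_idx]

# MultinomialNB expects count-like, non-negative features; X_train and
# y_train (hotel cluster labels) are assumed to be prepared upstream.
nb = MultinomialNB()
nb.fit(X_train, y_train)
recommendations = top_k_clusters(nb, X_test, k=10)
```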

Gradient Boosting:

• Our first real success, found by using an ensemble of decision trees minimizing a softmax loss function.

• Its success is likely due to intelligent learning of the non-linear structure in the data, along with boosting's resistance to overfitting.

• It converged more slowly than the SVM and Naïve Bayes for increased values of K in MAP@K, implying that its rankings are more nuanced. A sketch follows this list.
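The poster does not name an implementation; as one hedged possibility, scikit-learn's GradientBoostingClassifier fits exactly this kind of tree ensemble under a multinomial deviance (softmax cross-entropy) loss:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# For multi-class labels, GradientBoostingClassifier minimizes the
# multinomial deviance (softmax cross-entropy), fitting one regression
# tree per class per boosting round. Hyper-parameters are illustrative.
gbm = GradientBoostingClassifier(n_estimators=100,
                                 learning_rate=0.1,
                                 max_depth=3)
gbm.fit(X_train, y_train)   # same assumed encoded features as above

# Rank clusters by predicted probability and keep the top 10 per row.
proba = gbm.predict_proba(X_test)
top10 = gbm.classes_[np.argsort(proba, axis=1)[:, ::-1][:, :10]]
```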

Kernelized User Similarity (Most Effective Method):

• First, we cluster the data together based on destination: training examples sharing the same destination id are grouped together.

• For each new test example, we retrieve the training group with a matching destination id and create user similarity matrices. The matrices are built using a kernel function; we tried the following three kernels:

𝐾"(𝑥, 𝑦) =)1{𝑥, = 𝑦,}.

,/"

exp(−𝑧6

2𝜏6)

Wherezistheplacementoftheuserintermsofsimilaritytothetestexample(firstmostsimilar,second,third,etc.)andtauis60.

𝐾6(𝑥, 𝑦) = 91𝑚)1{𝑥, = 𝑦,}

.

,/"

;<

wheree=5.

𝐾=(𝑥, 𝑦) =1𝑚>)1{𝑥, = 𝑦,}

.?

,/"

+ 𝑘)𝑥,𝑦,

.??

,/"

wherexandyaredividedintotwovectorsofsizem’andm’’,withdifferentfeaturesseparatedintoeach.

• Once the similarity matrix is created, find the 150 users most similar to the test example, and for each hotel cluster represented among those 150 users, sum their similarity scores (as determined by the chosen kernel).

• Then, recommend the top 10 hotel clusters (by summed similarity score) found among the most similar users. The full procedure is sketched after this list.
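A minimal sketch of the whole procedure, assuming categorical feature vectors grouped by destination id; K2 is shown since it performed best, and train_by_dest / clusters_by_dest are illustrative names, not from the poster:

```python
import numpy as np
from collections import defaultdict

def k2(x, y, e=5):
    """Kernel 2: fraction of matching discretized features, raised to e."""
    return np.mean(x == y) ** e

def recommend(test_x, dest_id, train_by_dest, clusters_by_dest,
              n_neighbors=150, n_rec=10):
    """Recommend hotel clusters for one test example."""
    group = train_by_dest[dest_id]        # (n_users, n_features) array
    labels = clusters_by_dest[dest_id]    # hotel cluster per training user
    sims = np.array([k2(test_x, user) for user in group])

    # Take the 150 training users most similar to the test example.
    top = np.argsort(sims)[::-1][:n_neighbors]

    # Sum similarity scores per hotel cluster over those users.
    scores = defaultdict(float)
    for i in top:
        scores[labels[i]] += sims[i]

    # Recommend the 10 clusters with the highest summed score.
    return sorted(scores, key=scores.get, reverse=True)[:n_rec]
```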

• Of the three kernels above, the second proved most effective, likely due to the heavily discretized nature of the user features.

Results

Algorithm                   Precision   Recall    F1
SVM (RBF Kernel)            0.0054      0.0101    0.0010
Naïve Bayes                 0.0759      0.0725    0.0735
Gradient Boosting           0.1239      0.1284    0.1087
User Similarity: Kernel 1   0.1825      0.1822    0.1717
User Similarity: Kernel 3   0.1836      0.1826    0.1716
User Similarity: Kernel 2   0.1856      0.1846    0.1734

Discussion and Future Work

This project provides an excellent case study for applying machine learning algorithms to large data sets lacking obvious structure. It also embodies the challenge of recommending items about which we have no features. In addressing these challenges, we demonstrated that a creative combination of user similarity matrices and Jaccard similarity outperforms gradient boosting, a technique currently well known for winning Kaggle competitions. For future work, we recommend using ensemble stacking methods to combine predictions from the various algorithms (sketched below). Further work could also explore tuning the hyper-parameters for gradient boosting.
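As a hedged illustration of the stacking idea, not something the poster implemented: one could concatenate each base model's per-cluster scores on a held-out fold and train a simple meta-classifier on top (X_hold, y_hold, nb, and gbm are assumed from the earlier sketches):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stacking sketch: base-model probabilities become features
# for a meta-classifier trained on a held-out fold.
meta_features = np.hstack([
    nb.predict_proba(X_hold),    # Naïve Bayes per-cluster scores
    gbm.predict_proba(X_hold),   # gradient boosting per-cluster scores
    # ...plus per-cluster user-similarity scores, suitably normalized
])
meta = LogisticRegression(max_iter=1000)
meta.fit(meta_features, y_hold)
```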


Figure 1: Frequency of hotel clusters in the data set.

Figure 2: PCA in three dimensions of the data for the three most popular hotel clusters. While not linearly separable, there does appear to be some non-linear structure to the hotel clusters. The success of our methods based on user similarity supports this.

Overall, the best methods were those that utilized user similarity and kernels to recommend hotel clusters that other, similar users booked. Gradient Boosting was also effective, but its mean average precision seemed to hit a hard cap at 0.25 regardless of the parameters used. The SVM performed very poorly, as did the other basic machine learning methods we initially attempted on the data.

Chart: Mean Average Precision (5 predictions) by algorithm: SVM (RBF Kernel), Gradient Boosting, User Similarity Kernels 1–3, and Naïve Bayes; scores plotted on a 0 to 0.35 scale.
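For reference, a minimal sketch of the MAP@5 metric reported throughout, assuming one true hotel cluster per test example as in this competition (function names are illustrative):

```python
def apk(actual, predicted, k=5):
    """Average precision at k when there is a single relevant item."""
    for rank, p in enumerate(predicted[:k], start=1):
        if p == actual:
            return 1.0 / rank   # precision at the position of the hit
    return 0.0

def mapk(actuals, predictions, k=5):
    """Mean average precision at k over all test examples."""
    return sum(apk(a, p, k) for a, p in zip(actuals, predictions)) / len(actuals)

# Example: the true cluster 42 ranked second in a top-5 list scores 0.5.
print(mapk([42], [[7, 42, 13, 99, 5]]))   # -> 0.5
```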