Research Article
A Joint Deep Recommendation Framework for Location-Based Social Networks
Omer Tal and Yang Liu
Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, Ontario, Canada
Correspondence should be addressed to Yang Liu; yangliu@wlu.ca
Received 1 December 2018; Revised 11 February 2019; Accepted 5 March 2019; Published 19 March 2019
Guest Editor: Jiajie Xu
Copyright © 2019 Omer Tal and Yang Liu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Location-based social networks, such as Yelp and Tripadvisor, which allow users to share experiences about visited locations with their friends, have gained increasing popularity in recent years. However, as more locations become available, the need for accurate systems able to present personalized suggestions arises. By providing such a service, point-of-interest recommender systems have attracted much interest from different communities, leading to improved methods and techniques. Deep learning provides an exciting opportunity to further enhance these systems by utilizing additional data to better understand users' preferences. In this work, we propose Textual and Contextual Embedding-based Neural Recommender (TCENR), a deep framework that employs contextual data, such as users' social networks and locations' geo-spatial data, along with textual reviews. To make the best use of these inputs, we utilize multiple types of deep neural networks, each best suited to one type of data. TCENR adopts the popular multilayer perceptrons to analyze historical activities in the system, while the learning of textual reviews is achieved using two variations of the suggested framework: one is based on convolutional neural networks to extract meaningful data from textual reviews, and the other employs recurrent neural networks. Our proposed network is evaluated over the Yelp dataset and found to outperform multiple state-of-the-art baselines in terms of accuracy, mean squared error, precision, and recall. In addition, we provide further insight into the design selections and hyperparameters of our recommender system, hoping to shed light on the benefit of deep learning for location-based social network recommendation.
1. Introduction
In today's age of information, it has become a prerequisite to have reliable data prior to making any decision. Preferably, we expect to obtain reviews from like-minded users before investing our time and money in any product or service. Location-based social networks (LBSN), such as Yelp, TripAdvisor, and Foursquare, provide such information about potential locations while allowing users to connect with their friends and other users who have similar tastes. Indeed, these networks are gaining popularity: Yelp recently (https://www.yelp.ca/factsheet) reported having 72 million mobile active users every month, while TripAdvisor achieved no less than 390 million monthly users (https://www.tripadvisor.com/TripAdvisorInsights/w828). However, as LBSNs gain more users with various preferences and additional locations are added on a daily basis, it becomes time-consuming to browse through all possibilities before finding an appropriate restaurant or venue to visit.
Point-of-interest (POI) recommendation, a subfield of recommender systems (RS), attempts to provide LBSN users with personalized suggestions. Properly exploited, it can save time and effort for the end user and encourage her to make further use of the LBSN, both as a consumer and as a content provider. Collaborative filtering (CF) is probably the most prominent RS paradigm. It is based on the assumption that users who had similar past activities will have similar future preferences.
While POI recommendation methods usually attempt to solve a problem similar to that of standard recommender systems, they must overcome additional challenges. First, data sparsity, commonly linked to the cold-start problem, stems from the fact that most users interact with only a small fraction of the possible locations, and vice versa. When adopting CF methods, it becomes harder to represent users and locations based on similar past activities when the variation is high. This issue is worsened for users and locations that have few or no past interactions known to the system, as the available data is insufficient to fully capture their features. Second, in contrast to other recommendation scenarios, the decision whether to visit a location depends not only on the target user's preferences but also on those of her friends [1]. They might share different interests that are unknown to the system, resulting in a more complex decision process for the RS to learn. Furthermore, attributes that relate to the location itself and are unknown to the system may have an impact on the decision. For example, the availability of parking or convenient public transport, or the popularity of the area itself, can be the decisive factor for a user. As reported by previous works [2, 3], these challenges have shown classic CF methods to be insufficient and prone to overfitting.

Hindawi, Complexity, Volume 2019, Article ID 2926749, 11 pages, https://doi.org/10.1155/2019/2926749
A common approach to address sparsity while acknowledging the user's driving factors is to utilize contextual data, such as social networks [2, 4], geographical locations [5, 6], categories [1], and textual reviews [7, 8]. Deep neural networks have been enthusiastically adopted in recent years [2, 9, 10] to employ such complex contextual inputs while being able to process the vast amount of available data. Different neural network architectures have been introduced to improve recommendation performance, such as multilayer perceptrons (MLP) [3, 11, 12], convolutional neural networks (CNN) [13–15], and recurrent neural networks (RNN) [16–18]. These techniques were shown to benefit greatly from the addition of contextual attributes. As more data becomes available, incorporating these features presents a promising opportunity to improve the personalized POI recommendation process, as will be demonstrated in this paper.
Although numerous works have established the potential of recommender systems based on deep learning, most have focused on only the single type of neural network best suited to their given task. In this work, however, we incorporate multiple types of neural networks, i.e., MLP, CNN, and RNN, to provide POI recommendation using various types of inputs. First, we describe our proposed method, Textual and Contextual Embedding-based Neural Recommender (TCENR), a framework that takes users' social networks, locations' geographical data, and textual reviews, along with historical activities, to provide personalized top-k POI recommendations. By optimizing MLP and CNN jointly, the proposed model learns different aspects of the same user-location interaction in conjunction and is therefore better suited to capture the underlying factors in the user's selection. We then present an extension of TCENR, denoted as TCENRseq, where we replace the CNN component with an RNN and attempt to learn user preferences and location attributes by treating the written reviews as a sequential input rather than by focusing on their most important words. Although the proposed solution has been developed to provide recommendations for specific types of inputs, we claim it can be easily generalized to a framework able to support additional features.
The main contributions of our work are as follows:
(1) We present TCENR, a framework that jointly trains MLP and CNN to provide POI recommendations, as well as a variation, TCENRseq, that performs the same task while adopting RNN instead of the CNN component.
(2) To the best of our knowledge, no prior work has jointly trained MLP and CNN for the task of POI recommendation using social networks, geographical locations, and natural language reviews as inputs.
(3) Evaluated over the Yelp dataset, our proposed frameworks were found to outperform seven state-of-the-art baselines in terms of accuracy, MSE, precision, and recall.
(4) By comparing the two alternatives of our suggested model, we provide insight into the impact of analyzing textual reviews as a secondary input to the common past interactions, as well as a comparison of CNN and RNN for the task of sentiment analysis in the same experimental settings.
(5) We further present a comprehensive analysis of the most important hyperparameters and design selections of our proposed networks, shedding some light on the different components of deep neural networks in the task of POI recommendation.
In the following section, we first lay the background and introduce works related to our model. In Section 3, we develop our proposed frameworks, which are evaluated and analyzed in Section 4. Finally, we summarize our work and introduce future work in Section 5.
2. Related Work
Incorporating additional data about users and items is a common strategy to provide meaningful recommendations and to mitigate the cold-start problem. In the area of POI recommender systems, where the data is highly sparse, such practices are essential.
2.1. Context-Based Recommender Systems. Location-based social networks are usually rich in contextual inputs, which present various opportunities for data enrichment in RS. Such features include time [9], spatial location [19], the user's social network [4], the item's metadata [1], photos [20], and demographics [11].
Contextual data is usually incorporated into RS either as part of the input or as a regularizing factor. References [11, 21] exploit the strengths of MLP-based networks in modeling complex relationships by concatenating multiple feature embeddings to the input before feeding it to a series of nonlinear layers. The final layer's output is then a representation of a user-item interaction adjusted to the given context. Sequentiality is very often introduced to recommender systems by treating sequences of past user and item interactions in temporal order. In the case of POI recommendation, this consists of feeding sequences of check-ins to RNN-based layers, such as long short-term memory (LSTM) [22] or the more concise gated recurrent units (GRU) [23].
The use of spatial data is often done by dividing the input space into roles and regions. Assuming users' behavior varies when traveling far from home, previous works [5, 6] generated two profiles for each user: one to be used in her home region and another in more distant locations. A recent approach [24–26] divides the input space into geographical regions before incorporating them into the model, often through hierarchical structures. Although methods based on regions and roles are able to better distinguish user behaviors in varied locations, they do not provide a personalized user representation and can ignore potential shifts of preferences from one region to another. For example, a user might prefer to visit a Starbucks location in different regions, whether close to or far from home. Enriching geographical features with additional data is demonstrated in [25, 27], where the next-location prediction is partially determined by past sequences of check-ins. However, these methods are not generic and cannot be extended to include additional features derived from social networks or textual data.
Furthermore, in highly sparse environments such as POI recommendation, adding user- or item-specific inputs may diminish the model's ability to generalize. However, applying the same data as a regularizing factor can enhance the model's performance and reduce overfitting. This was done in [4], where the similarity between connected users in the social network was used to constrain a matrix factorization (MF) model. Reference [2] utilized social networks and geographical distances to enforce similar embeddings for users and locations, thus improving the model's ability to generalize for users and locations with few historical records.
2.2. Text-Based Recommendation. Since many websites encourage users to provide a written explanation of their numeric ratings, textual reviews are one of the most popular types of data to be integrated into RS. Previous works have adopted probabilistic approaches to alleviate the data sparsity problem using textual input [28–31]. By expressing each review as a bag of words, LDA-based models are able to extract topics, which can be used to represent users' interests and locations' characteristics [5, 6, 25]. These probabilistic methods are usually successful in handling issues that standard CF approaches struggle with, such as out-of-town recommendations, where similar users lack sufficient historical data. However, as demonstrated in recent works [17, 32], failing to preserve the original order of words and ignoring their semantic meaning prevent the successful modeling of a given review. On the other hand, an emerging trend of adopting neural networks over reviews allows such learning without this loss of information. These implementations can generally be categorized into RNN-based [17, 18] and CNN-based [14, 15] models.
RNN-based recommender systems usually rely on the sequential structure of sentences to learn their meaning. In [18], the words describing a target item are fed to a bidirectional RNN layer. Following the GRU architecture, the model utilizes an accumulated context from successive words to provide a better representation of each word. To preserve and update the context of words, the recurrent model has to manage a large number of parameters. The problem becomes more prominent when adopting the popular LSTM paradigm, as done in [17].
Following its success in the field of computer vision [33], CNN-based models are gaining popularity in other areas, such as textual modeling [34]. By employing a sliding window over a given document, such networks are able to represent different features found within the text by identifying relevant subsets of words. Reference [15] follows the standard CNN structure, comprising an embedding layer to represent the semantic meaning of each word, a convolution layer for generating local features, and a max pooling operation to identify the most relevant factors. In [14], two CNNs are developed to represent the target user and item based on their reviews. The resulting vectors are regarded as the user and item representations, which are then fed to a nonlinear layer to learn their corresponding rating.
We claim that, by jointly learning contextual and textual deep models, our proposed method better exploits the strengths of collaborative filtering while being more resilient to its shortcomings in sparse scenarios. This is achieved by learning users' and locations' representations from similarities in direct interactions, along with the correlation in underlying features extracted from their written reviews. The notable work of [35] proposed JRL, a framework that similarly attempts to jointly optimize multiple models, each responsible for learning a unique perspective of the same task by focusing on different inputs. However, while JRL is a general framework focused on extendability to many types of input, our proposed method is tailored to the task of POI recommendation.
3. The Neural Recommender
3.1. Neural Network Architecture. The proposed recommender system aims to improve the POI recommendation task by learning user-location interactions using two parallel neural networks, as shown in Figure 1. The context-based network, presented in the left part of the figure, is designed to model user-POI preferences using social and geographical attributes and is based on a multilayer perceptron structure [2, 3]. Shown on the right side of Figure 1, the convolutional neural network is responsible for the textual modeling unit [14]. It attempts to learn the same preference by analyzing the underlying meaning in users' and locations' reviews. Each of the two networks models the user and POI inputs individually, with regard to their shared interaction, which is defined in the merge layers.
3.1.1. Context-Based Network Layers. To better capture the complex relations between users and locations in LBSN, we chose to adopt the multilayer perceptron architecture. By stacking multiple layers of nonlinearities, MLP is capable of learning relevant latent factors of its inputs. It is first fed with user and location vectors of sizes N and M, where each input tuple $\langle u, p \rangle$ is transformed into sparse one-hot encoding representations. The two fully connected embedding layers on top of the input layer project the sparse representations of users and locations into smaller and denser vectors. For u and p, the respective embedding matrices are $E_u \in \mathbb{R}^{k_u \times N}$ and $E_p \in \mathbb{R}^{k_p \times M}$, where $k_u$ and $k_p$ are the corresponding dimensions.

[Figure 1: TCENR Framework. Context-based branch (left): User u and POI p feed a User Embedding ($E_u$) and a POI Embedding ($E_p$); softmax layers $\psi(c \mid E_u)$ and $\psi(c \mid E_p)$ are matched to the User Graph Output $g_u$ and the POI Graph Output $g_p$; the embeddings are concatenated (⊕) in a merge layer followed by hidden layers. Textual branch (right): User Reviews ($d_u$) and POI Reviews ($d_p$) pass through Word Embedding ($V_u$, $V_p$), Conv Layer ($Z_u$, $Z_p$), Pooling Layer ($O_u$, $O_p$), and Fully Connected ($h_u$, $h_p$) layers into a merge layer and hidden layer. The two branches meet in a final merge layer and hidden layer that produce the Prediction $\hat{y}$.]
We exploit the users' social networks along with the locations' geographical graphs to constrain the learned embeddings. Two softmax layers take the representations $E_u$ and $E_p$ as input and transform them back to N- and M-sized vectors, respectively. The user output layer $\psi(c \mid E_u) \in \mathbb{R}^{N}$ describes the similarity of a user's embedding to all N users and can be formally denoted as
$$\psi(c \mid E_u) = a\left(W^{u}_{c} \times E_u + b^{u}_{c}\right) \qquad (1)$$
where $W^{u}_{c}$ and $b^{u}_{c}$ are the layer's weight matrix and bias vector, and a is a nonlinear activation function. Due to the similarity between the user- and POI-specific layers, the location output layer $\psi(c \mid E_p)$ is not developed in this section. Enforcing $\psi(c \mid E_u)$ and $\psi(c \mid E_p)$ to resemble user u's social graph $g_u$ and location p's spatial graph $g_p$, respectively, results in a smoothing factor that limits how much the embeddings of connected entities can differ.
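As a minimal plain-Python sketch of the embedding lookup and the context softmax layer in (1), assuming that the activation a is a softmax (consistent with the "softmax layers" described above), toy sizes, and random initialization, the code below shows that multiplying $E_u$ by a one-hot vector simply selects one column, and that $\psi(c \mid E_u)$ is a distribution over all N users. All names are illustrative and not taken from the authors' code.

```python
import math
import random

def one_hot(index, size):
    """Sparse one-hot encoding of an entity id."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def matvec(matrix, vector):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
N, k_u = 5, 3                                   # toy number of users and embedding size
E_u = [[rng.gauss(0, 0.1) for _ in range(N)] for _ in range(k_u)]   # k_u x N
W_c = [[rng.gauss(0, 0.1) for _ in range(k_u)] for _ in range(N)]   # N x k_u
b_c = [0.0] * N

user = 2
e_u = matvec(E_u, one_hot(user, N))             # same values as column `user` of E_u
logits = [z + b for z, b in zip(matvec(W_c, e_u), b_c)]
psi = softmax(logits)                           # similarity of this user to all N users
```

During training, this distribution is pushed toward the user's social-graph context, which is what produces the smoothing effect described above.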
The user and location embeddings are projected to a merge layer and combined by a concatenation operator. Using concatenation instead of a dot product allows varied embedding structures, which in turn improves the generated representations [3]. As the input layer for the following neural network, the merge layer can be represented as $h^{(0)}(x)$, where
$$x = \left[E_u, E_p\right] \qquad (2)$$
Since simple concatenation of the user-location embedded vectors does not allow interactions to be modeled, hidden layers are added to learn these connections. The popular Rectified Linear Unit (ReLU) is employed as the activation function for these layers. More formally, the q-th hidden layer can be defined as

$$h^{(q)}(x) = \mathrm{ReLU}\left(W_q \times h^{(q-1)}(x) + b_q\right) \qquad (3)$$

where $W_q$ and $b_q$ are the q-th layer parameters.
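The merge layer of (2) and the stacked ReLU hidden layers can be sketched in plain Python as below, using the embedding size of 10 and the 32/16 tower reported in the experimental setup (Section 4.1). Random vectors stand in for learned embeddings, and the Gaussian weights and zero biases are illustrative choices only.

```python
import random

def dense(x, W, b):
    """Fully connected layer: W holds one row of weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, z) for z in v]

def init_layer(n_in, n_out, rng):
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

rng = random.Random(42)
k_u = k_p = 10                                  # embedding sizes from the setup
e_u = [rng.gauss(0, 1) for _ in range(k_u)]     # stand-in for a user embedding
e_p = [rng.gauss(0, 1) for _ in range(k_p)]     # stand-in for a POI embedding

h = e_u + e_p                                   # merge layer: x = [E_u, E_p], 20 values
tower = [init_layer(len(h), 32, rng), init_layer(32, 16, rng)]  # halve width per layer
for W, b in tower:                              # h_q = ReLU(W_q h_{q-1} + b_q)
    h = relu(dense(h, W, b))
h_context = h                                   # 16-dimensional context representation
```

The tower shape (each layer half the width of the previous one) is what lets the network compress the concatenated pair into a compact interaction vector.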
3.1.2. Textual Modeling Network Layers. To improve the model's coverage, a textual network is introduced. It simultaneously learns the same interaction as the context-based network, but from a natural language input. Two additional vectors, $d_u$ and $d_p$, representing user u's and location p's textual reviews, respectively, are applied as inputs for this network. Each vector comprises all n words written by the user or about the location, merged together and kept in their original order. These words are then mapped to $k_w$-dimensional vectors defining their semantic meaning in the following embedding layers. The output of the user embedding layer is the representation of all words used by a user u in the form of a matrix and can be denoted as

$$V_u = \left[\phi(d_{u1}), \phi(d_{u2}), \ldots, \phi(d_{un})\right] \qquad (4)$$
where $[\phi_1, \phi_2]$ denotes the concatenation of two vectors, $d_{ul}$ is user u's l-th word, and $\phi: D \rightarrow \mathbb{R}^{k_w}$ is a lookup function to a pre-trained textual embedding layer [36, 37] that represents each word in vocabulary D as a vector of size $k_w$. Similarly, $V_p$ denotes the word embedding matrix for location p.

[Figure 2: Proposed alternatives to learn user and location representations from textual reviews. (a) Textual modeling component using CNN, employed in TCENR: word embeddings, convolutional layers, max pooling, and a fully connected layer. (b) Textual modeling component using RNN, the suggested extension: word embeddings, a bidirectional GRU, max pooling, and a fully connected layer. Both panels use the example input "delicious healthy food steak is amazing".]
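To make the word-to-matrix lookup described above concrete, here is a small sketch that merges a user's reviews in their original order and looks each word up in a pretrained table. The 500-word cap mirrors the truncation mentioned in Section 4.3; the zero vector for out-of-vocabulary words, the lower-casing, the whitespace tokenizer, and the toy two-dimensional table are assumptions. Padding of short documents, which a fixed-size network input would need, is not shown.

```python
def build_document(reviews, max_words=500):
    """Merge all reviews of one user (or about one location), keeping word order."""
    words = []
    for review in reviews:
        words.extend(review.lower().split())
    return words[:max_words]                    # keep only the first max_words words

def embed_document(words, lookup, k_w):
    """V = [phi(d_1), ..., phi(d_n)]; unknown words map to zeros (an assumption)."""
    zero = [0.0] * k_w
    return [lookup.get(word, zero) for word in words]

# Hypothetical 2-d "pretrained" table; the setup reports 50-d pretrained vectors.
phi = {"delicious": [0.9, 0.1], "steak": [0.2, 0.8], "amazing": [0.7, 0.3]}
d_u = build_document(["Delicious healthy food", "steak is amazing"])
V_u = embed_document(d_u, phi, k_w=2)           # one row per word, in order
```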
Due to the large number of parameters required to train the aforementioned contextual model, the textual network is implemented using a CNN-based architecture, which is usually more computationally efficient than an RNN. The semantic representations of users' and locations' reviews are fed to convolution layers to detect the parts of the text that best capture the review's meaning. These layers produce t feature maps over the embedded word vectors using a window size of $w_s$ and filter $K \in \mathbb{R}^{k_w \times t}$. As suggested by [14], ReLU is used as the activation function for this layer.
where $V_{ul}$ is user u's l-th input word embedding and $z_{uj}$ the j-th feature extracted from the complete text.

Based on the standard CNN structure, feature maps produced by the convolution layers are reduced by a pooling layer
where max pooling is selected to identify the most important words and their latent values. $O_u$ is the collection of all concise features extracted from user u's textual input. These are followed by fully connected layers that jointly model the different features and produce the latent textual representations for user u and location p, denoted as $h_u$ and $h_p$, respectively:
$$h_u = \mathrm{ReLU}\left(W^{u}_{1} \times O_u + b^{u}_{1}\right) \qquad (7)$$
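The convolution, pooling, and fully connected steps (the last given in (7)) can be sketched as follows. This uses the common Conv1D formulation in which each of the $t$ filters spans $w_s$ consecutive word vectors (we do not reproduce the paper's exact filter shape), "valid" windows taken with a stride, non-overlapping max pooling, and zero biases. The window of 10, stride of 3, 3 feature maps, and pool size of 2 follow the experimental setup; the document length and word-vector size are toy values.

```python
import random

def conv1d_relu(V, filters, biases, window, stride):
    """Slide each filter (window x k_w weights) over the word vectors, then apply ReLU."""
    k_w = len(V[0])
    maps = []
    for K, b in zip(filters, biases):
        fmap = []
        for start in range(0, len(V) - window + 1, stride):
            s = sum(K[i][j] * V[start + i][j] for i in range(window) for j in range(k_w))
            fmap.append(max(0.0, s + b))
        maps.append(fmap)
    return maps

def max_pool(fmap, pool):
    """Non-overlapping max pooling (stride equal to the pool size)."""
    return [max(fmap[i:i + pool]) for i in range(0, len(fmap) - pool + 1, pool)]

rng = random.Random(1)
n, k_w, t, window, stride, pool = 40, 4, 3, 10, 3, 2   # n and k_w are toy sizes
V_u = [[rng.gauss(0, 1) for _ in range(k_w)] for _ in range(n)]
filters = [[[rng.gauss(0, 0.1) for _ in range(k_w)] for _ in range(window)]
           for _ in range(t)]
Z_u = conv1d_relu(V_u, filters, [0.0] * t, window, stride)      # t feature maps
O_u = [v for fmap in Z_u for v in max_pool(fmap, pool)]          # pooled and flattened
W1 = [[rng.gauss(0, 0.1) for _ in range(len(O_u))] for _ in range(32)]
h_u = [max(0.0, sum(w * o for w, o in zip(row, O_u))) for row in W1]  # Eq. (7), no bias
```

With 40 words, the stride of 3 leaves 11 window positions per map, and pooling by 2 keeps 5 of them, so $O_u$ has 3 × 5 = 15 entries.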
To combine the outputs of the users' and locations' fully connected layers in the same feature space, a shared layer is utilized. It concatenates its two inputs and learns their interaction by employing an additional hidden layer:
$$h_{\text{reviews}} = \mathrm{ReLU}\left(W_2 \times \left[h_u, h_p\right] + b_2\right) \qquad (8)$$
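A sketch of the shared review layer in (8), followed by the final sigmoid merge with the context branch that the next paragraph describes, might look like this. Random vectors stand in for $h_u$, $h_p$, and $h_{\text{context}}$, and the 8-unit width and zero biases are illustrative choices.

```python
import math
import random

def dense_relu(x, W, b):
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(7)
h_u = [rng.random() for _ in range(32)]          # stand-in for the user's Eq. (7) output
h_p = [rng.random() for _ in range(32)]          # stand-in for the POI's Eq. (7) output
h_context = [rng.random() for _ in range(16)]    # stand-in for the MLP tower output

W2 = [[rng.gauss(0, 0.1) for _ in range(64)] for _ in range(8)]
h_reviews = dense_relu(h_u + h_p, W2, [0.0] * 8)               # Eq. (8)

w3 = [rng.gauss(0, 0.1) for _ in range(len(h_context) + len(h_reviews))]
logit = sum(w * x for w, x in zip(w3, h_context + h_reviews))  # Eq. (9), zero bias
y_hat = sigmoid(logit)                                         # visit probability
```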
The two neural networks are then finally merged to produce a prediction $\hat{y}_{up} \in [0, 1]$. The last layers of the two networks, each representing a different view of the user-location interaction, are concatenated and fed to yet another hidden layer responsible for blending the learning:
$$\hat{y}_{up} = \sigma\left(W_3 \times \left[h_{\text{context}}, h_{\text{reviews}}\right] + b_3\right) \qquad (9)$$
where the sigmoid function σ was selected to transform the hidden layer output to the desired range of [0, 1].

3.2. Sequential Textual Modeling. To further investigate the gain achieved by integrating a textual modeling component over reviews in TCENR, we suggest an extension denoted as TCENRseq. Following its success in previous language modeling tasks [17, 18] and its ability to capture the sequential nature of sentences, we employ an RNN component to learn latent features from reviews. An illustration of the proposed extension is presented in Figure 2(b), while the CNN method used in the vanilla TCENR is shown in Figure 2(a) to provide a convenient basis for comparison. More specifically, we follow the findings of previous works [18, 23] and implement our recurrent network using GRU, an architecture that achieves competitive performance compared to LSTM but with fewer parameters, making it more efficient.
where $f_l$ is the forget gate for input word l, $s_l$ is the output gate, $c_l$ is the new candidate state combining the current word embedding $V_{ul}$ with the previous hidden state, and $h_l$ is the current state for word l modeled by the output gate. ⊙ denotes the element-wise product; $W_f$, $R_f$, $W_s$, $R_s$, $W_c$, and $R_c$ are the GRU weight matrices, while $b_f$, $b_s$, and $b_c$ are the bias vectors.
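Since the GRU equations are not reproduced in this copy, the following sketch shows one step of a standard GRU cell (Cho et al.) using the weight names above. Treating $f_l$ as the update gate and $s_l$ as the reset gate applied to the previous state is our assumption, as is the interpolation $h_l = (1 - f_l) \odot h_{l-1} + f_l \odot c_l$; some formulations swap the roles of $f_l$ and $1 - f_l$. Sizes and inputs are toy values.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, R, h, b):
    """W x + R h + b for one gate; W is d_h x d_in, R is d_h x d_h."""
    return [sum(w * xi for w, xi in zip(W[r], x)) +
            sum(u * hi for u, hi in zip(R[r], h)) + b[r] for r in range(len(b))]

def gru_step(x, h_prev, p):
    """One step of a standard GRU cell for the input word vector x."""
    f = [sigmoid(z) for z in affine(p["Wf"], x, p["Rf"], h_prev, p["bf"])]  # update gate
    s = [sigmoid(z) for z in affine(p["Ws"], x, p["Rs"], h_prev, p["bs"])]  # reset gate
    reset_h = [si * hi for si, hi in zip(s, h_prev)]                        # s * h_prev
    c = [math.tanh(z) for z in affine(p["Wc"], x, p["Rc"], reset_h, p["bc"])]  # candidate
    return [(1 - fi) * hi + fi * ci for fi, hi, ci in zip(f, h_prev, c)]

def init_gru(d_in, d_h, rng):
    def mat(rows, cols):
        return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for g in "fsc":
        p["W" + g], p["R" + g], p["b" + g] = mat(d_h, d_in), mat(d_h, d_h), [0.0] * d_h
    return p

rng = random.Random(3)
params = init_gru(d_in=4, d_h=5, rng=rng)
h = [0.0] * 5
for word_vec in ([0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.5, 0.2]):
    h = gru_step(word_vec, h, params)           # state after reading both words
```

Because each new state is a gated mix of the previous state and a tanh candidate, every entry stays strictly between -1 and 1, which is one reason GRUs train stably on long review texts.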
Since the context of a word can be determined by other preceding and succeeding words or sentences, our proposed method employs a bidirectional GRU over the user embedding $V_u$ and the location embedding $V_p$. Each word l's hidden state is learned by forward and backward GRU layers, denoted as $\overrightarrow{h}^{1}_{l}$ and $\overleftarrow{h}^{1}_{l}$, respectively. To learn a more concise and combined representation of a word while taking into account the context of all surrounding words, we feed the concatenation of $\overrightarrow{h}^{1}_{l}$ and $\overleftarrow{h}^{1}_{l}$ to an additional bidirectional GRU layer, such that its input for every word l is $e^{2}_{l} = [\overrightarrow{h}^{1}_{l}, \overleftarrow{h}^{1}_{l}]$. The second recurrent unit outputs n latent vectors, each a sequentially infused representation of a word written by the target user or about the candidate item. To allow the method of textual modeling to be the only difference between TCENR and TCENRseq, and to further reduce the number of learned parameters, all modified word vectors are fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This allows us to directly determine the effect RNN has on textual modeling for POI recommender systems compared to CNN, as well as enabling the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged in order to learn the user-location interaction.
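The two stacked bidirectional layers can be sketched as follows. To keep the block short, a plain tanh recurrent cell stands in for the GRU cell (a GRU step would slot into the same place); the stacking, the re-alignment of the backward pass, the per-word concatenation $e^{2}_{l} = [\overrightarrow{h}^{1}_{l}, \overleftarrow{h}^{1}_{l}]$, and the pool-size-2 max pooling mirror the description above, while all sizes are toy values.

```python
import math
import random

def make_cell(d_in, d_h, rng):
    """Stand-in recurrent cell h_l = tanh(W x_l + R h_{l-1}); a GRU step fits here too."""
    W = [[rng.gauss(0, 0.3) for _ in range(d_in)] for _ in range(d_h)]
    R = [[rng.gauss(0, 0.3) for _ in range(d_h)] for _ in range(d_h)]
    def step(x, h):
        return [math.tanh(sum(w * xi for w, xi in zip(W[r], x)) +
                          sum(u * hi for u, hi in zip(R[r], h))) for r in range(d_h)]
    return step, d_h

def run(cell, seq):
    """Run a cell over a sequence and return the hidden state at every position."""
    step, d_h = cell
    h, states = [0.0] * d_h, []
    for x in seq:
        h = step(x, h)
        states.append(h)
    return states

def bidirectional(fwd, bwd, seq):
    """Per word l, concatenate the forward and backward states: [->h_l, <-h_l]."""
    forward = run(fwd, seq)
    backward = run(bwd, seq[::-1])[::-1]      # re-align backward states to word order
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(5)
n, k_w, d_h = 6, 4, 3                          # toy sizes
V_u = [[rng.gauss(0, 1) for _ in range(k_w)] for _ in range(n)]
layer1 = bidirectional(make_cell(k_w, d_h, rng), make_cell(k_w, d_h, rng), V_u)
layer2 = bidirectional(make_cell(2 * d_h, d_h, rng), make_cell(2 * d_h, d_h, rng), layer1)
pooled = [[max(a, b) for a, b in zip(layer2[i], layer2[i + 1])]
          for i in range(0, n - 1, 2)]         # max pooling over word pairs (pool size 2)
```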
3.3. Training the Network. To train the recommendation models, we adopt a pointwise loss objective function, as done in [2, 3, 14], where the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$ is minimized. To address the implicit feedback nature of LBSNs, we sample a set of negative samples from the dataset, denoted as $Y^{-}$.
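Due to the implicit feedback nature of the recommendation task, the algorithm's output can be treated as a binary classification problem. As the sigmoid activation function is used over the last hidden layer, the output probability can be defined as

$$p\left(Y, Y^{-} \mid E_u, E_p, V_u, V_p, \Theta_f\right) = \prod_{(u,p) \in Y} \hat{y}_{up} \prod_{(u,p) \in Y^{-}} \left(1 - \hat{y}_{up}\right) \qquad (11)$$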
where $E_u$ and $E_p$ are the embedding layers for users and locations, respectively. Similarly, $V_u$ and $V_p$ are the textual review embedding layers, and $\Theta_f$ represents the model parameters. Taking the negative log-likelihood of p results in the binary cross-entropy loss function for the prediction portion of the model:

$$L_{\text{pred}} = -\sum_{(u,p) \in Y \cup Y^{-}} \left[ y_{up} \log \hat{y}_{up} + \left(1 - y_{up}\right) \log\left(1 - \hat{y}_{up}\right) \right] \qquad (12)$$
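As a quick numerical check of (12), the sketch below sums the binary cross-entropy over one observed visit and four sampled negatives, matching the 4:1 negative sampling ratio of the experimental setup. The scores are made-up illustrative values, and the clipping constant is an implementation assumption that keeps log(0) out of the sum. The result equals the negative log of the product of $\hat{y}_{up}$ over positives and $1 - \hat{y}_{up}$ over negatives.

```python
import math

def bce_loss(y_true, y_pred, eps=1e-12):
    """Summed binary cross-entropy over positives and sampled negatives."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total

labels = [1, 0, 0, 0, 0]                         # one visit, four sampled negatives
scores = [0.9, 0.2, 0.1, 0.3, 0.05]              # illustrative predictions y_hat
loss = bce_loss(labels, scores)
likelihood = 0.9 * (1 - 0.2) * (1 - 0.1) * (1 - 0.3) * (1 - 0.05)
```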
As there are two more outputs in the model, the users' social network output $\psi(c \mid E_u)$ and the locations' distance graph output $\psi(c \mid E_p)$, two additional loss functions are required to train the network. We follow the process done in [2], assuming that two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:

$$L_{u,\text{context}} = -\sum_{(u, u_c)} \left( \psi(c \mid E_u) - \log \sum_{u'_c \in C_u} \exp\left(\psi(c' \mid E_u)\right) \right) \qquad (13)$$
where $\psi(c \mid E_u)$ is as defined in (1). Taking the binary class label into account yields the following loss function, which corresponds to minimizing the cross-entropy loss of user i and context c with respect to the class label y
where I is a function that returns 1 if y is in the given set and 0 otherwise. The same logic is used to formulate the loss function for the POI context, which is omitted due to space limitations.
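Reading (13) as a log-softmax over the candidate context set, with ψ acting as the unnormalized score of each context entity, the per-pair loss can be sketched as below. The log-sum-exp shift is a standard numerical safeguard, and the scores are hypothetical.

```python
import math

def log_softmax_loss(scores, target):
    """-(psi_target - log sum_c' exp(psi_c')), using the log-sum-exp trick."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -(scores[target] - log_z)

def context_loss(pairs, score_fn):
    """Sum the loss over observed (entity, context index) pairs, as in Eq. (13)."""
    return sum(log_softmax_loss(score_fn(u), c) for u, c in pairs)

# Hypothetical scores psi(c | E_u) of user 0 over three candidate context users.
psi = {0: [2.0, 0.5, -1.0]}
loss = context_loss([(0, 0), (0, 1)], lambda u: psi[u])
```

Lowering this loss raises the score of observed context users relative to all other candidates, which is what pulls the embeddings of connected users together.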
We simultaneously minimize the three loss functions $L_{\text{pred}}$, $L_{u,\text{context}}$, and $L_{p,\text{context}}$. The joint optimization improves recommendation accuracy while enforcing similar representations for locations in close proximity and for users connected in the social network. The loss functions are combined using two hyperparameters, $\lambda_1$ and $\lambda_2$, that weight the contextual contribution:

$$L = L_{\text{pred}} + \lambda_1 L_{u,\text{context}} + \lambda_2 L_{p,\text{context}} \qquad (15)$$
To optimize the combined loss function, a gradient descent method can be adopted; more specifically, we utilize Adaptive Moment Estimation (Adam) [38]. This optimizer automatically adapts the learning rate and yields faster convergence than standard gradient descent, in addition to making the learning rate tuning process more efficient. To avoid overfitting when training the model, an early stopping criterion is integrated. The model parameters are initialized with a Gaussian distribution, while the output layer's parameters are set to follow a uniform distribution.
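The optimizer and stopping rule can be illustrated with a scalar sketch of Adam (using the default β1 = 0.9, β2 = 0.999, and ε = 1e-8 suggested in the Adam paper) together with patience-based early stopping. The learning rate of 0.005 and the 50-epoch cap come from the experimental setup; the patience of 3, the quadratic toy loss, and the reuse of that loss as the "validation" signal are assumptions made only for illustration.

```python
import math

def adam_with_early_stopping(grad_fn, val_loss_fn, theta, lr=0.005, beta1=0.9,
                             beta2=0.999, eps=1e-8, max_epochs=50, patience=3):
    """Minimize a scalar parameter with Adam; stop after `patience` epochs without gain."""
    m = v = 0.0
    best_loss, best_theta, stale = float("inf"), theta, 0
    for t in range(1, max_epochs + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g           # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g       # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # bias corrections
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
        loss = val_loss_fn(theta)
        if loss < best_loss:
            best_loss, best_theta, stale = loss, theta, 0
        else:
            stale += 1
            if stale >= patience:
                break                             # early stopping
    return best_theta, best_loss

# Toy problem: minimize (theta - 1)^2 starting from theta = 0.
theta_star, loss_star = adam_with_early_stopping(
    grad_fn=lambda th: 2 * (th - 1), val_loss_fn=lambda th: (th - 1) ** 2, theta=0.0)
```

Because Adam normalizes each step by the running gradient magnitude, the early steps here move θ by roughly the learning rate regardless of the gradient's scale.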
4. Experiments and Evaluation
4.1. Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews along with the users' friends and the businesses' locations. Due to the limited resources used in the model evaluation, we chose to filter the dataset and keep only a concise subset, where all users and locations with fewer than 100 written reviews or fewer than 10 friends are removed. The filtered dataset includes 141,028 reviews, and the rating matrix has a sparsity of 98.08%. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, while 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are up to 1 km apart.
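One possible reading of the graph construction is sketched below: locations are linked when their great-circle (haversine) distance is at most 1 km, and each base node collects context nodes from short random walks of 3 steps. How the paper turns walks into the 20 or 30 context vertices per base node is not fully specified, so the sampling loop, the 6,371 km Earth radius, and the toy coordinates are assumptions.

```python
import math
import random

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def poi_graph(coords, radius_km=1.0):
    """Connect two locations directly if they are at most radius_km apart."""
    adj = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if haversine_km(coords[i], coords[j]) <= radius_km:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def sample_context(adj, base, n_context, window, rng):
    """Collect n_context (base, node) pairs from random walks of `window` steps."""
    pairs = []
    if not adj[base]:
        return pairs                              # isolated node: no context
    while len(pairs) < n_context:
        node = base
        for _ in range(window):
            node = rng.choice(sorted(adj[node]))
            if node != base:
                pairs.append((base, node))
            if len(pairs) == n_context:
                break
    return pairs

coords = [(0.0, 0.0), (0.0, 0.005), (0.0, 0.05)]  # toy points on the equator
adj = poi_graph(coords)                           # ~0.56 km apart: linked; ~5 km: not
pairs = sample_context(adj, base=0, n_context=5, window=3, rng=random.Random(0))
```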
To test our models' performance, the original data was split into training, validation, and test sets by random sampling with the respective ratios of 56%, 24%, and 20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled with 4 negative locations for every positive one.
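The data split and the 4:1 negative sampling can be sketched as follows. The ratios and the sampling ratio come from the text above, while uniform sampling of unvisited locations, the seeds, and the toy sizes are assumptions.

```python
import random

def split_indices(n, ratios=(0.56, 0.24, 0.20), seed=0):
    """Random train/validation/test split of n instance ids."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def negative_samples(visited, all_locations, k, rng):
    """Draw k distinct locations the user has not visited (uniform sampling assumed)."""
    candidates = [p for p in all_locations if p not in visited]
    return rng.sample(candidates, k)

train, val, test = split_indices(1000)
negs = negative_samples(visited={0, 3, 5}, all_locations=range(20), k=4,
                        rng=random.Random(1))   # 4 negatives for one positive
```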
To effectively compare our proposal with other alternatives, we adopt the same settings as applied in [2, 3]. The MLP input vectors are represented with an embedding size of 10, while two hidden layers are added on top of the merged result. Following the tower architecture, where the size of each layer is half the size of its predecessor, the numbers of hidden units are 32 and 16 for the first and second layers, respectively.
In the CNN component, each word is represented by a pretrained embedding layer with 50 units, while the convolutional layer is constructed with a window size of 10 and a stride of 3. It produces 3 feature maps that are flattened after performing the max pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. Following the merge of the two hidden units, their interaction is learned using another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For the training phase of the model, a learning rate of 0.005 was used over a maximum of 50 epochs with a batch size of 512 samples.
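The hyperparameters above can be collected in a configuration and used for a quick shape check. Assuming Keras-style "valid" padding, a pooling stride equal to the pool size, and the 500-word input length mentioned in Section 4.3, the convolution yields 164 positions per feature map, pooling reduces this to 82, and flattening the 3 maps gives a 246-dimensional vector before the 32-unit hidden layer. The dictionary keys are illustrative names.

```python
def conv_out_len(n, window, stride):
    """Output length of a 1-D convolution with 'valid' padding."""
    return (n - window) // stride + 1

def pool_out_len(n, pool):
    """Output length of max pooling with stride equal to the pool size ('valid')."""
    return (n - pool) // pool + 1

config = {
    "mlp_embedding": 10, "mlp_hidden": (32, 16),
    "word_embedding": 50, "conv_window": 10, "conv_stride": 3,
    "feature_maps": 3, "pool_size": 2, "text_hidden": 32, "merge_hidden": 8,
    "lambda1": 0.1, "lambda2": 0.1,
    "learning_rate": 0.005, "max_epochs": 50, "batch_size": 512,
}

n_words = 500                                    # first 500 words per user or location
conv_len = conv_out_len(n_words, config["conv_window"], config["conv_stride"])
pooled_len = pool_out_len(conv_len, config["pool_size"])
flattened = pooled_len * config["feature_maps"]
```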
4.2. Baselines. To evaluate our algorithm, we chose to compare it to these seven empirically proven frameworks:
(vi) PACE [2]: Preference and Context Embedding. An MLP-based framework with the addition of contextual graph smoothing for POI recommendation.
(vii) DeepCoNN [14]: Deep Cooperative Neural Networks. A CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.
To evaluate our model and the baselines, we chose to apply Accuracy and Mean Squared Error (MSE) over all n test samples, as well as Precision (Pre@10) and Recall (Rec@10) averaged over the top 10 predictions per user.
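The four evaluation measures can be sketched as below. This is an illustration only: the 0.5 decision threshold for accuracy is an assumption, and Precision@k divides by k even when fewer than k items are ranked. Function names are hypothetical.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the binary label."""
    hits = sum(int((score >= threshold) == bool(label)) for label, score in zip(labels, scores))
    return hits / len(labels)


def mse(labels, scores):
    """Mean squared error between binary labels and predicted probabilities."""
    return sum((label - score) ** 2 for label, score in zip(labels, scores)) / len(labels)


def precision_recall_at_k(ranked, relevant, k=10):
    """Precision@k and Recall@k for one user's ranked location list."""
    hits = len(set(ranked[:k]) & set(relevant))
    recall = hits / len(relevant) if relevant else 0.0
    return hits / k, recall


def mean_precision_recall_at_k(per_user, k=10):
    """per_user maps user -> (ranked locations, set of held-out relevant locations)."""
    scores = [precision_recall_at_k(ranked, relevant, k) for ranked, relevant in per_user.values()]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```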
The proposed models were implemented using Keras (https://keras.io) on top of the TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted using an NVIDIA GTX 1070 GPU.
4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are based on the average of three individual executions.
As can be seen from the results, the proposed model TCENR achieves the best results overall compared to all baselines. Furthermore, it was found to significantly outperform HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN in terms of accuracy and MSE (p < 0.05, one-sided unpaired t-test). The contrasting results in terms of precision and recall compared to NeuMF suggest that TCENR offers fewer but more relevant recommendations to the user. While NMF provides the best precision score of all methods, it underperforms in all other measures, making it a less desirable model. Taking a closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully harnessed. In addition, the use of only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on the dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.

Figure 3: Runtime (seconds) of all models on the Yelp dataset.
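With three runs per model, the significance test reported above can be sketched as a one-sided Student's t-test with pooled variance. This is an assumption about the exact variant (Welch's test is the other common unpaired choice); 2.132 is the standard one-sided 5% critical value for 3 + 3 − 2 = 4 degrees of freedom. Function names are hypothetical.

```python
import math
from statistics import mean, variance

# One-sided 5% critical value of Student's t with 4 degrees of freedom,
# i.e. two groups of three runs each (3 + 3 - 2 = 4).
T_CRIT_ONE_SIDED_DF4 = 2.132


def pooled_t_statistic(a, b):
    """Student's unpaired t statistic (pooled variance) for the hypothesis mean(a) > mean(b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def significantly_greater(a, b):
    """True if mean(a) exceeds mean(b) at p < 0.05 (one-sided), for three runs per model."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError("the hard-coded critical value assumes exactly three runs per model")
    return pooled_t_statistic(a, b) > T_CRIT_ONE_SIDED_DF4
```

For accuracy, `a` holds the proposed model's runs and `b` a baseline's; for MSE, where lower is better, the arguments are swapped.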
Comparing TCENR with its proposed extension TCENRseq yields contrasting results. By employing an RNN instead of a CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and an improved precision score, while accuracy and recall are worsened. It may be that, by accurately capturing different aspects of user reviews, the model is able to reinforce its hypotheses and therefore reduce the uncertainty in some cases. However, when faced with a contrast between textual aspects and the ground truth, it might choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable techniques and measures to learn different data types, rather than employing a single method over all inputs. Moreover, they show the positive impact of using textual data in conjunction with historical activities. The reported performance also offers insight into the choice between CNN and RNN for language modeling in future recommendation tasks.
To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As demonstrated by the results, TCENR is competitive with most baselines and was found to be more efficient than DeepCoNN and LCARS. The reported runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks. As the number of trainable parameters increases due to the use of recurrent layers, our RNN-based extension takes 329 longer to train compared to TCENR, while achieving comparable results.
4.4. Model Design Analysis. In this section, we discuss the effect of several design choices on the suggested model's performance.
4.4.1. Merge Layer. The model's final layers, responsible for combining the dense outputs of both the MLP and convolutional networks, require close attention, as they affect the networks' ability to learn jointly as well as the prediction itself. To properly select the fusion operator, the following methods were considered:
(i) Combining the last hidden layers of the two models using concatenation. A model using this method will be denoted as TCENR_con and is described in (9).
(ii) Merging the last hidden layers using a dot product, resulting in a model named TCENR_dot, defined in (16).
(iii) Combining the two previously described methods, where the two representations are jointly learned by concatenation and dot product. The resulting model, denoted as TCENR_dot_con, can be developed by combining (9) and (16) using addition and mapping the result to the range [0, 1] with the sigmoid function.
(iv) Adopting a weighted average of the prediction results of the two networks. Denoted as TCENR_weight, this model can be defined as
$$\hat{y}_{up} = \lambda_1 \sigma\left(W_5 \times h_{context} + b_5\right) + \lambda_2 \sigma\left(W_6 \times h_{reviews} + b_6\right) \quad (18)$$
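The four fusion operators can be contrasted in a small sketch. Equations (9) and (16) are not reproduced in this section, so two readings here are assumptions: the "dot product" merge is taken as an element-wise product of equal-sized vectors followed by a dense output (as in neural collaborative filtering [3]), and each variant ends in a single sigmoid output. Only `merge_weight` follows (18) directly; the defaults λ1 = λ2 = 0.5 are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(weights, h, bias):
    """Single-output linear layer: weights . h + bias."""
    return sum(w * x for w, x in zip(weights, h)) + bias


def elementwise(h_context, h_reviews):
    """ASSUMED reading of the 'dot product' merge: an element-wise product of equal-sized vectors."""
    return [x * y for x, y in zip(h_context, h_reviews)]


def merge_con(h_context, h_reviews, w, b):
    return sigmoid(dense(w, h_context + h_reviews, b))


def merge_dot(h_context, h_reviews, w, b):
    return sigmoid(dense(w, elementwise(h_context, h_reviews), b))


def merge_dot_con(h_context, h_reviews, w_con, b_con, w_dot, b_dot):
    # Add the two pre-activation scores, then squash the sum to [0, 1].
    z = dense(w_con, h_context + h_reviews, b_con) + dense(w_dot, elementwise(h_context, h_reviews), b_dot)
    return sigmoid(z)


def merge_weight(h_context, h_reviews, w5, b5, w6, b6, lam1=0.5, lam2=0.5):
    # Equation (18): weighted average of the two subnetworks' separate predictions.
    return lam1 * sigmoid(dense(w5, h_context, b5)) + lam2 * sigmoid(dense(w6, h_reviews, b6))
```

The sketch makes the design difference visible: only the concatenation-based variants let a single output layer weigh features from both subnetworks together, whereas the weighted average combines two predictions that were formed independently.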
As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of utilizing the latent features learned by each subnetwork jointly. When combined with the underperforming dot product in TCENR_dot_con, concatenation improves over dot product alone. However, since the two methods are integrated using a simple average, employing only concatenation, as done in TCENR_con, produces the best results and was therefore integrated into the final model.
4.4.2. MLP Layer Design. Although [3] found that adding more layers and units to the MLP-based recommender has a positive effect, the use of CNN and the additional hidden layer suggest that this is a subject worth investigating.
Complexity 9
Figure 4: Comparison of merging methods (TCENR_con, TCENR_dot, TCENR_dot_con, and TCENR_weight) in terms of accuracy.
Table 2: Model's accuracy with different layers.
To this end, we test the proposed algorithm with 1–4 hidden layers used to learn the user-item interaction with contextual regularization, with sizes varying from 8 to 128 hidden units. The results in terms of test-set accuracy are presented in Table 2, where the number of hidden layers is given as columns and the size of the first layer as rows. Unlike previous results, we find that two hidden layers with 32 and 16 hidden units result in the best performance for our dataset.
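The configurations examined in Table 2 follow the tower rule, where each layer is half the size of the previous one. The sketch below enumerates them, assuming the first-layer sizes are the powers of two between 8 and 128; the helper names are hypothetical.

```python
def tower_sizes(first_units, num_layers):
    """Hidden-layer widths where each layer is half the size of its predecessor (at least 1)."""
    sizes = []
    width = first_units
    for _ in range(num_layers):
        sizes.append(max(width, 1))
        width //= 2
    return sizes


def layer_grid(first_units=(8, 16, 32, 64, 128), depths=(1, 2, 3, 4)):
    """All (first-layer size, depth) configurations, one per cell of the grid."""
    return {(units, depth): tower_sizes(units, depth) for units in first_units for depth in depths}
```

The selected configuration corresponds to `tower_sizes(32, 2)`, that is, `[32, 16]`.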
4.4.3. Number of Words. The use of written reviews in their original order allows the strengths of CNN and RNN to be exploited by finding the best representation for every few words and eventually for the whole text. Our final dataset, however, is composed of very long reviews, where more than 20,000 words are required to fully learn a single user or location, making it computationally expensive to extract relevant representations. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500–6,000. As can be seen in Figure 5, accuracy improves slightly as the number of words increases up to 3,000, while additional words result in an increased bias towards users and locations with longer reviews and, in turn, reduce the model's learning capabilities.
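Limiting each user or location to a fixed number of words can be sketched as truncating the concatenated reviews and padding short histories. This assumes whitespace tokenization and a hypothetical `<pad>` token; the paper's actual preprocessing is not specified here.

```python
def first_n_words(reviews, n_words, pad_token="<pad>"):
    """Concatenate an entity's reviews in order and return exactly n_words tokens (truncated or padded)."""
    tokens = []
    for review in reviews:
        tokens.extend(review.split())
        if len(tokens) >= n_words:
            break
    tokens = tokens[:n_words]
    return tokens + [pad_token] * (n_words - len(tokens))
```

Because every entity is cut to the same length, a larger `n_words` mainly adds text for entities with long review histories. This is consistent with the bias towards such users and locations described above.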
5. Conclusion and Future Work
In this paper, we developed a neural POI recommender system called TCENR. The model exploits data about users and locations, including spatial data, social networks, and textual reviews, to predict the implicit preferences of users regarding POIs. TCENR models two types of user-location interactions: native check-ins regularized by contextual information, and the words used to describe the users' experiences. We further extended our proposed method and presented TCENRseq, where textual data was modeled using an RNN instead of a CNN. Evaluated over the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.

Figure 5: Number of words comparison in terms of accuracy (x-axis: number of words, 0–6,000; y-axis: accuracy).
For future work, we intend to extend our model's evaluation to additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data, while taking new users and locations with few reviews into account.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.
References
[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975–984, ACM, San Francisco, California, USA, August 2016.
[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254, ACM, Halifax, NS, Canada, August 2017.
[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173–182, Perth, Australia, April 2017.
[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287–296, ACM, February 2011.
[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264, ACM, Sydney, NSW, Australia, August 2015.
[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221–229, ACM, Chicago, Illinois, USA, August 2013.
[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448–456, ACM, August 2011.
[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244, ACM, Sydney, NSW, Australia, August 2015.
[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429–1438, ACM, Singapore, Singapore, November 2017.
[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053–1058, IEEE, Barcelona, Spain, December 2016.
[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10, ACM, 2016.
[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.
[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 2643–2651, 2013.
[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434, ACM, Cambridge, UK, February 2017.
[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240, ACM, 2016.
[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.
[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147–154, ACM, Vienna, Austria, September 2015.
[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107–114, ACM, 2016.
[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.
[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), pp. 916–924, ACM Press, Mountain View, California, USA, October 2017.
[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198, ACM, 2016.
[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495–503, ACM, 2017.
[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54, ACM, Marina Del Rey, CA, USA, 2018.
[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537–2551, 2017.
[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640, ACM, Melbourne, Australia, October 2015.
[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.
[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835–1844, Lyon, France, April 2018.
[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pp. 2782–2788, NY, USA, July 2016.
[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318, London, UK, August 2018.
[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639–648, Lyon, France, April 2018.
[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015), pp. 1340–1346, Argentina, July 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.
[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449–1458, ACM, Singapore, Singapore, November 2017.
[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.
[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.
[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 326–335, Netherlands, July 2015.
[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.
locations that have few or no past interactions known to the system, as the available data is insufficient to fully capture their features. Second, in contrast to other recommendation scenarios, the decision whether to visit a location depends not only on the target user's preferences but also on those of her friends [1]. They might share different interests that are unknown to the system, resulting in a more complex decision process for the RS to learn. Furthermore, attributes that relate to the location itself and are unknown to the system may have an impact on the decision. For example, the availability of parking or convenient public transport, or the popularity of the area itself, can be the decisive factor for a user. These challenges have shown classic CF methods to be insufficient and prone to overfitting the data, as reported by previous works [2, 3].
A common approach to addressing sparsity while acknowledging the user's driving factors is to utilize contextual data such as social networks [2, 4], geographical locations [5, 6], categories [1], and textual reviews [7, 8]. Deep neural networks have been enthusiastically adopted in recent years [2, 9, 10] to employ such complex contextual inputs while being able to process the vast amount of available data. Different neural network architectures have been introduced to improve recommendation performance, such as multilayer perceptrons (MLP) [3, 11, 12], convolutional neural networks (CNN) [13–15], and recurrent neural networks (RNN) [16–18]. These techniques were shown to benefit greatly from the addition of contextual attributes. As more data becomes available, incorporating these features presents a promising opportunity to improve the personalized POI recommendation process, as will be demonstrated in this paper.
Although numerous works have established the potential of recommender systems based on deep learning, most have focused on only a single type of neural network best suited to their given task. In this work, however, we incorporate multiple types of neural networks, i.e., MLP, CNN, and RNN, to provide POI recommendation using various types of inputs. First, we describe our proposed method, Textual and Contextual Embedding-based Neural Recommender (TCENR), a framework that takes users' social networks, locations' geographical data, and textual reviews, along with historical activities, to provide personalized top-k POI recommendations. By optimizing MLP and CNN jointly, the proposed model learns different aspects of the same user-location interaction in conjunction and is therefore better suited to capture the underlying factors in the user's selection. We then present an extension of TCENR, denoted as TCENRseq, where we replace the CNN component with an RNN and attempt to learn user preferences and location attributes by treating the written reviews as a sequential input rather than by focusing on their most important words. Although the proposed solution has been developed to provide recommendations for specific types of inputs, we claim it can be easily generalized to a framework able to support additional features.
The main contributions of our work are as follows:
(1) We present TCENR, a framework that jointly trains MLP and CNN to provide POI recommendations, as well as a variation, TCENRseq, that performs the same task while adopting an RNN instead of the CNN component.
(2) To the best of our knowledge, no work has been done on jointly training MLP and CNN for the task of POI recommendation using social networks, geographical locations, and natural language reviews as inputs.
(3) Evaluated over the Yelp dataset, our proposed frameworks were found to outperform seven state-of-the-art baselines in terms of accuracy, MSE, precision, and recall.
(4) By comparing the two alternatives of our suggested model, we provide insight into the impact gained by analyzing textual reviews as a secondary input to the common past interactions, as well as a comparison of CNN and RNN for the task of sentiment analysis in the same experimental settings.
(5) We further present a comprehensive analysis of the most important hyperparameters and design choices of our proposed networks, shedding some light on the different components of deep neural networks in the task of POI recommendation.
In the following section, we first lay the background and introduce works related to our model. In Section 3, we develop our proposed frameworks, which are evaluated and analyzed in Section 4. Finally, we summarize our work and introduce future work in Section 5.
2. Related Work
Incorporating additional data about users and items is a common strategy to provide meaningful recommendations and to mitigate the cold-start problem. In the area of POI recommender systems, where the data is highly sparse, such practices are essential.
2.1. Context-Based Recommender Systems. Location-based social networks are usually rich in contextual inputs, which present various opportunities for data enrichment in RS. Such features include time [9], spatial location [19], the user's social network [4], the item's metadata [1], photos [20], and demographics [11].
Contextual data is usually incorporated into RS either as part of the input or as a regularizing factor. References [11, 21] exploit the strengths of MLP-based networks in modeling complex relationships by concatenating multiple feature embeddings to the input before feeding it to a series of nonlinear layers. The final layer's output is then a representation of a user-item interaction adjusted to the given context. Sequentiality is often introduced to recommender systems by treating sequences of past user and item interactions in temporal order. In the case of POI recommendation, this consists of feeding sequences of check-ins to RNN-based layers such as long short-term memory (LSTM) [22] or the more concise gated recurrent units (GRU) [23].
Spatial data is often used by dividing the input space into roles and regions. Assuming users' behavior varies when traveling far from home, previous works [5, 6] generated two profiles for each user: one to be used in her home region and another for more distant locations. A recent approach [24–26] attempts to divide the input space into geographical regions before incorporating them into the model, often through hierarchical structures. Although methods based on regions and roles are able to better distinguish user behaviors in different locations, they do not provide a personalized user representation and can ignore potential shifts of preferences from one region to another. For example, a user might prefer to visit a Starbucks location in different regions, whether close to or far from home. Enriching geographical features with additional data is demonstrated in [25, 27], where the next-location prediction is partially determined by past sequences of check-ins. However, these methods are not generic and cannot be extended to include additional features derived from social networks or textual data.
Furthermore, in highly sparse settings such as POI recommendation, adding user- or item-specific inputs may diminish the model's ability to generalize. However, applying the same data as a regularizing factor can enhance the model's performance and reduce overfitting. This was done in [4], where the similarity between connected users in the social network was used to constrain a matrix factorization (MF) model. Reference [2] utilized social networks and geographical distances to enforce similar embeddings for users and locations, thus improving the model's ability to generalize for users and locations with few historical records.
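A social regularizer of the kind used in [4] adds a penalty that grows when connected users' latent vectors differ. The sketch below follows the individual-based form β/2 · Σ_i Σ_{f ∈ friends(i)} sim(i, f) · ‖U_i − U_f‖², with the similarity defaulting to 1. The function name and the value β = 0.1 are illustrative assumptions.

```python
def social_regularization(user_vecs, friends, beta=0.1, sim=None):
    """beta/2 * sum over users i and their friends f of sim(i, f) * ||U_i - U_f||^2."""
    total = 0.0
    for i, friend_ids in friends.items():
        for f in friend_ids:
            weight = 1.0 if sim is None else sim(i, f)
            total += weight * sum((a - b) ** 2 for a, b in zip(user_vecs[i], user_vecs[f]))
    return beta / 2 * total
```

Adding this term to an MF loss pulls friends' vectors together without feeding social features into the prediction itself, which is the regularization-versus-input distinction drawn above.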
2.2. Text-Based Recommendation. Since many websites encourage users to provide a written explanation of their numeric ratings, textual reviews are one of the most popular types of data to be integrated into RS. Previous works have adopted probabilistic approaches to alleviate the data sparsity problem using textual input [28–31]. By expressing each review as a bag of words, LDA-based models are able to extract topics, which can be used to represent users' interests and locations' characteristics [5, 6, 25]. These probabilistic methods are usually successful in handling issues that standard CF approaches struggle with, such as out-of-town recommendations, where similar users lack sufficient historical data. However, as demonstrated in recent works [17, 32], failing to preserve the original order of words and ignoring their semantic meaning prevents the successful modeling of a given review. On the other hand, an emerging trend of applying neural networks to reviews allows such learning without loss of information. These implementations can generally be categorized into RNN-based [17, 18] and CNN-based [14, 15] models.
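A one-line example shows why a bag-of-words representation loses information. The two hypothetical reviews below say opposite things about the food and the service, yet their word counts are identical. Even adjacent word pairs, the simplest order-aware feature, tell them apart.

```python
from collections import Counter


def bag_of_words(text):
    """Unordered word counts, the representation topic models such as LDA start from."""
    return Counter(text.lower().split())


def bigrams(text):
    """Adjacent word pairs: a minimal representation that keeps some word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


review_a = "the food was good but the service was bad"
review_b = "the food was bad but the service was good"
```

Here `bag_of_words(review_a) == bag_of_words(review_b)` holds, while their bigram counts differ.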
RNN-based recommender systems usually rely on the sequential structure of sentences to learn their meaning. In [18], the words describing a target item are fed to a bidirectional RNN layer. Following the GRU architecture, the model utilizes an accumulated context from successive words to provide a better representation of each word. To preserve and update the context of words, the recurrent model has to manage a large number of parameters. The problem becomes more prominent when adopting the popular LSTM paradigm, as done in [17].
Following its success in the field of computer vision [33], CNN-based models are gaining popularity in other areas, such as textual modeling [34]. By employing a sliding window over a given document, such networks are able to represent different features found within the text by identifying relevant subsets of words. Reference [15] follows the standard CNN structure, comprising an embedding layer to represent the semantic meaning of each word, a convolution layer to generate local features, and a max-pooling operation to identify the most relevant factors. In [14], two CNNs are developed to represent the target user and item based on their reviews. The resulting vectors are regarded as the user and item representations, which are then fed to a nonlinear layer to learn their corresponding rating.
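The standard text-CNN pipeline of embedding, sliding-window convolution, and max pooling can be sketched in plain Python. This is an illustration only: it uses ReLU activations and Kim-style max-over-time pooling [34], whereas TCENR's own CNN pools with a window of 2 and then flattens. The names `embed`, `conv1d`, and `max_over_time` are hypothetical.

```python
def embed(tokens, table):
    """Look up each token's word vector (e.g. from a pretrained table)."""
    return [table[token] for token in tokens]


def conv1d(embedded, filters, stride=1):
    """Unpadded 1-D convolution over word vectors with ReLU; one feature map per (kernel, bias) filter.

    Each kernel is a list of `window` vectors, each the same size as a word vector.
    """
    maps = []
    for kernel, bias in filters:
        window = len(kernel)
        feature_map = []
        for start in range(0, len(embedded) - window + 1, stride):
            z = bias
            for k_vec, x_vec in zip(kernel, embedded[start:start + window]):
                z += sum(k * x for k, x in zip(k_vec, x_vec))
            feature_map.append(max(0.0, z))
        maps.append(feature_map)
    return maps


def max_over_time(maps):
    """Keep the strongest response of each feature map."""
    return [max(feature_map) for feature_map in maps]
```

Each filter slides over `window` consecutive words, so a feature map scores every local phrase, and pooling keeps the most salient ones.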
We claim that, by jointly learning contextual and textual deep models, our proposed method will better exploit the strengths of collaborative filtering while being more resilient to its shortcomings in sparse scenarios. This is achieved by learning users' and locations' representations from similarities in direct interactions, along with the correlation in underlying features extracted from their written reviews. The notable work of [35] proposed JRL, a framework that similarly attempts to jointly optimize multiple models, where each is responsible for learning a unique perspective of the same task by focusing on different inputs. However, while JRL is a general framework focused on extensibility to many types of input, our proposed method is tailored to the task of POI recommendation.
3. The Neural Recommender
3.1. Neural Network Architecture. The proposed recommender system aims to improve the POI recommendation task by learning user-location interactions using two parallel neural networks, as shown in Figure 1. The context-based network, presented in the left part of the figure, is designed to model user-POI preferences using social and geographical attributes and is based on a multilayer perceptron structure [2, 3]. Shown on the right side of Figure 1, the convolutional neural network is responsible for the textual modeling unit [14]. It attempts to learn the same preference by analyzing the underlying meaning in users' and locations' reviews. Each of the two networks models the user and POI inputs individually with regard to their shared interaction, defined in the merge layers.
311 Context-Based Network Layers To better capture thecomplex relations between users and locations in LBSN wechose to adopt the multilayer perceptron architecture Bystacking multiple layers of nonlinearities MLP is capable oflearning relevant latent factors of its inputs It is first fedwith user and location vectors of sizes N and M whereeach input tuple ⟨119906 119901⟩ is transformed into sparse one-hotencoding representations The two fully connected embed-ding layers found on top of the input layer project the sparserepresentations of users and locations into smaller and denser
4 Complexity
[Figure 1 diagram. Context-based network: user $u$ and POI $p$ inputs; user and POI embeddings ($E_u$, $E_p$); softmax layers ($\psi^{c}_{E_u}$, $\psi^{c}_{E_p}$) matched to the user graph output $g_u$ and POI graph output $g_p$; merge and hidden layers. Textual network: user and POI reviews ($d_u$, $d_p$); word embeddings ($V_u$, $V_p$); convolutional layers ($Z_u$, $Z_p$); pooling layers ($O_u$, $O_p$); fully connected layers ($h_u$, $h_p$); merge and hidden layer. A final merge layer and hidden layer output the prediction $\hat{y}$.]
Figure 1: TCENR framework.
vectors. For $u$ and $p$, the respective embedding matrices are $E_u \in \mathbb{R}^{k_u \times N}$ and $E_p \in \mathbb{R}^{k_p \times M}$, where $k_u$ and $k_p$ are the corresponding dimensions.
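As a minimal sketch of this projection, multiplying $E_u$ by a one-hot vector simply selects one column of the matrix, which is why embedding layers are implemented as table lookups. The toy sizes and random values below are hypothetical (the paper uses an embedding size of 10), and the helper names are ours:

```python
import random

def one_hot(index, size):
    """Sparse one-hot encoding of a user or location ID."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def matvec(matrix, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def embedding_lookup(matrix, index):
    """Shortcut: read column `index` of the k_u x N embedding matrix directly."""
    return [row[index] for row in matrix]

N, k_u = 6, 3  # hypothetical toy sizes: 6 users, 3-dimensional embeddings
rng = random.Random(0)
E_u = [[rng.gauss(0.0, 0.1) for _ in range(N)] for _ in range(k_u)]  # shape k_u x N

u = 4
via_matmul = matvec(E_u, one_hot(u, N))  # dense projection of the one-hot input
via_lookup = embedding_lookup(E_u, u)    # same vector without the multiplication
```

In a framework such as Keras, this is what an embedding layer does internally, so the $N$-sized one-hot vector never needs to be materialized.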
We exploit the users' social networks, along with the locations' geographical graphs, to constrain the learned embeddings. Two softmax layers take the representations $E_u$ and $E_p$ as input and transform them back to $N$- and $M$-sized vectors, respectively. The user output layer $\psi^{c}_{E_u} \in \mathbb{R}^{N}$ describes the similarity of a user's embedding to all $N$ users and can be formally denoted as
$$\psi^{c}_{E_u} = a\left(W^{u}_{c} \times E_u + b^{u}_{c}\right) \tag{1}$$
where $W^{u}_{c}$ and $b^{u}_{c}$ are the layer's weight matrix and bias vector, and $a$ is a nonlinear activation function. Because the user- and POI-specific layers are analogous, the location output layer $\psi^{c}_{E_p}$ is not developed further in this section. Forcing $\psi^{c}_{E_u}$ and $\psi^{c}_{E_p}$ to resemble user $u$'s social graph $g_u$ and location $p$'s spatial graph $g_p$, respectively, acts as a smoothing factor that limits how much the embeddings of connected entities can differ.
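The user output layer of (1) can be sketched as follows. We assume the activation $a$ is a softmax, since the text describes these as softmax layers. All sizes and weights are hypothetical toy values. Each entry of the result scores how similar user $u$'s embedding is to one of the $N$ users, and training then pushes these scores toward the user's social graph $g_u$:

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def context_output(W_c, b_c, e_u):
    """psi^c_{E_u}: one softmax score per user, computed from user u's embedding."""
    logits = [b + sum(w * x for w, x in zip(row, e_u)) for row, b in zip(W_c, b_c)]
    return softmax(logits)

N, k_u = 5, 3  # hypothetical toy sizes
rng = random.Random(1)
W_c = [[rng.gauss(0.0, 0.5) for _ in range(k_u)] for _ in range(N)]  # N x k_u weights
b_c = [0.0] * N
e_u = [rng.gauss(0.0, 0.5) for _ in range(k_u)]  # user u's embedding
psi = context_output(W_c, b_c, e_u)
```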
The user and location embeddings are projected to a merge layer and combined by a concatenation operator. Using concatenation instead of the dot product allows the two embeddings to have different structures, which in turn improves the generated representations [3]. As the input layer of the following neural network, the merge layer can be represented as $h^{(0)}(x)$, where
$$x = \left[E_u, E_p\right] \tag{2}$$
Since simple concatenation of the embedded user and location vectors does not allow interactions to be modeled, hidden layers are added to learn these connections. The popular Rectified Linear Unit (ReLU) is employed as the activation function for these layers. More formally, the $q$-th hidden layer can be defined as
$$h^{(q)}(x) = \mathrm{ReLU}\left(W_q \times h^{(q-1)}(x) + b_q\right) \tag{3}$$
where $W_q$ and $b_q$ are the $q$-th layer parameters.
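The merge layer of (2) and the stacked ReLU hidden layers can be sketched as below. The embedding size of 10 and the 32-16 tower follow the settings in Section 4.1. The random initialization, toy embedding values, and helper names are assumptions made only for this illustration:

```python
import random

def dense_relu(W, b, x):
    """Fully connected layer followed by ReLU."""
    return [max(0.0, bi + sum(w * xi for w, xi in zip(row, x))) for row, bi in zip(W, b)]

rng = random.Random(2)
k_u = k_p = 10                                   # embedding size from Section 4.1
e_u = [rng.gauss(0.0, 1.0) for _ in range(k_u)]  # user embedding (toy values)
e_p = [rng.gauss(0.0, 1.0) for _ in range(k_p)]  # location embedding (toy values)

x = e_u + e_p                                    # Eq. (2): concatenation, length 20
h = x                                            # h^(0)(x): the merge layer
for size in (32, 16):                            # tower: each layer half its predecessor
    W_q = [[rng.gauss(0.0, 0.1) for _ in range(len(h))] for _ in range(size)]
    b_q = [0.0] * size
    h = dense_relu(W_q, b_q, h)                  # q-th hidden layer: ReLU(W_q h + b_q)
h_context = h
```

The final output, called `h_context` here, plays the role of the context network's last hidden layer that is later merged with the textual branch.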
3.1.2. Textual Modeling Network Layers. To improve the model's coverage, a text-based network is introduced. It learns the same interaction as the context-based network simultaneously, but from natural language input. Two additional vectors, $d_u$ and $d_p$, represent user $u$'s and location $p$'s textual reviews, respectively, and serve as inputs for this network. Each vector contains all $n$ words written by the user or about the location, merged together and kept in their original order. The following embedding layers then map these words to $k_w$-dimensional vectors that capture their semantic meaning. The output of the user embedding layer is a matrix representing all words used by user $u$ and can be denoted as
$$V_u = \left[\phi(d_{u1}), \phi(d_{u2}), \ldots, \phi(d_{un})\right] \tag{4}$$
where $[\phi_1, \phi_2]$ denotes the concatenation of two vectors, $d_{ul}$ is user $u$'s $l$-th word, and $\phi: D \rightarrow \mathbb{R}^{k_w}$ is a lookup function to
[Figure 2 diagram. (a) Textual modeling component using CNN: word embeddings, convolutional layers, max pooling, fully connected layer. (b) Textual modeling component using RNN: word embeddings, bidirectional GRU, max pooling, fully connected layer. Both panels use the example review text "delicious healthy food steak is amazing".]
Figure 2: Proposed alternatives to learn user and location representations from textual reviews. 2(a) is the CNN-based solution employed in TCENR, while 2(b) illustrates the suggested extension using an RNN.
a pre-trained textual embedding layer [36, 37] that represents each word in vocabulary $D$ as a vector of size $k_w$. Similarly, $V_p$ denotes the word embedding matrix for location $p$.
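A minimal sketch of the lookup $\phi$ and the resulting word matrix $V_u$ is shown below. The three-dimensional toy vectors stand in for a pre-trained embedding table such as GloVe or word2vec [36, 37]. Mapping unknown words to a zero vector, truncating to the first $n$ words, and zero-padding shorter texts are our assumptions, not details stated in the paper:

```python
def build_word_matrix(words, phi, n, k_w):
    """Return V_u: an n x k_w matrix of word vectors, kept in original word order."""
    unknown = [0.0] * k_w                                  # assumption: OOV words -> zeros
    rows = [list(phi.get(w, unknown)) for w in words[:n]]  # keep only the first n words
    rows += [[0.0] * k_w for _ in range(n - len(rows))]    # assumption: zero padding
    return rows

# Hypothetical 3-dimensional lookup table; the paper uses 50-dimensional pre-trained vectors.
phi = {
    "delicious": [0.9, 0.1, 0.0],
    "healthy": [0.7, 0.3, 0.1],
    "food": [0.2, 0.8, 0.1],
    "steak": [0.1, 0.9, 0.3],
    "amazing": [0.8, 0.0, 0.2],
}
user_reviews = ["delicious healthy food", "steak is amazing"]
words = " ".join(user_reviews).split()  # all of the user's reviews merged in order
V_u = build_word_matrix(words, phi, n=8, k_w=3)
```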
Because the contextual model described above already requires a large number of parameters, the textual network is implemented with a CNN-based architecture, which is usually more computationally efficient than an RNN. The semantic representations of users' and locations' reviews are fed to convolution layers to detect the parts of the text that best capture each review's meaning. These layers produce $t$ feature maps over the embedded word vectors using a window size of $w_s$ and filter $K \in \mathbb{R}^{k_w \times t}$. As suggested by [14], ReLU is used as the activation function for this layer
where $V_{ul}$ is user $u$'s $l$-th input word embedding and $z_{uj}$ is the $j$-th feature extracted from the complete text.
Following the standard CNN structure, the feature maps produced by the convolution layers are reduced by a pooling layer
where max-pooling is selected to identify the most important words and their latent values. $O_u$ is the collection of all concise features extracted from user $u$'s textual input. Fully connected layers follow to model the different features jointly and produce the latent textual representations for user $u$ and location $p$, denoted as $h_u$ and $h_p$, respectively:
$$h_u = \mathrm{ReLU}\left(W^{u}_{1} \times O_u + b^{u}_{1}\right) \tag{7}$$
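The convolution, max-pooling, and fully connected steps for one user can be sketched as below. The sketch uses the window size of 10, stride of 3, 3 feature maps, pool size of 2, and 32 hidden units reported in Section 4.1. The convolution is written as a standard "valid" 1D convolution over word positions, which is our assumption about its exact form. The toy word matrix, random weights, and helper names are hypothetical:

```python
import random

def conv1d_relu(V, filters, biases, stride):
    """Slide each (w_s x k_w) filter over the n x k_w word matrix V ('valid' positions)."""
    w_s = len(filters[0])
    maps = []
    for K, b in zip(filters, biases):
        fmap = []
        for start in range(0, len(V) - w_s + 1, stride):
            window = V[start:start + w_s]
            z = b + sum(k * v for k_row, v_row in zip(K, window) for k, v in zip(k_row, v_row))
            fmap.append(max(0.0, z))  # ReLU activation
        maps.append(fmap)
    return maps

def max_pool(fmap, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(fmap[i:i + size]) for i in range(0, len(fmap) - size + 1, size)]

def dense_relu(W, b, x):
    return [max(0.0, bi + sum(w * xi for w, xi in zip(row, x))) for row, bi in zip(W, b)]

rng = random.Random(3)
n, k_w, w_s, stride, t, pool = 40, 4, 10, 3, 3, 2  # toy n and k_w; the rest from Section 4.1
V_u = [[rng.uniform(-1, 1) for _ in range(k_w)] for _ in range(n)]
filters = [[[rng.gauss(0, 0.2) for _ in range(k_w)] for _ in range(w_s)] for _ in range(t)]
feature_maps = conv1d_relu(V_u, filters, [0.0] * t, stride)      # t maps of length 11
O_u = [v for fmap in feature_maps for v in max_pool(fmap, pool)]  # flattened, length 15
W1 = [[rng.gauss(0, 0.2) for _ in range(len(O_u))] for _ in range(32)]
h_u = dense_relu(W1, [0.0] * 32, O_u)                             # Eq. (7): 32 hidden units
```

The same code applied to $V_p$ yields $h_p$.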
To bring the outputs of the user and location fully connected layers into the same feature space, a shared layer is utilized. It concatenates its two inputs and learns their interaction through an additional hidden layer:
$$h_{\text{reviews}} = \mathrm{ReLU}\left(W_2 \times \left[h_u, h_p\right] + b_2\right) \tag{8}$$
Finally, the two neural networks are merged to produce a prediction $\hat{y}_{up} \in [0, 1]$. The last layers of the two networks, each representing a different view of the user-location interaction, are concatenated and fed to another hidden layer that blends the two views:
$$\hat{y}_{up} = \sigma\left(W_3 \times \left[h_{\text{context}}, h_{\text{reviews}}\right] + b_3\right) \tag{9}$$
where the sigmoid function $\sigma$ was selected to transform the hidden layer output to the desired range of $[0, 1]$.
3.2. Sequential Textual Modeling. To further investigate the gain achieved by integrating a textual modeling component over reviews in TCENR, we suggest an extension denoted as TCENR_seq. Following its success in previous language modeling tasks [17, 18] and its ability to capture the sequential nature of sentences, we employ an RNN component to learn latent features from reviews. Figure 2(b) illustrates the proposed extension, while Figure 2(a) shows the CNN method used in the vanilla TCENR as a convenient basis for comparison. More specifically, we follow the findings of previous works [18, 23] and implement our recurrent network with the GRU, an architecture that performs competitively with LSTM but has fewer parameters, making it more efficient.
where $f_l$ is the forget gate for input word $l$, $s_l$ is the output gate, $c_l$ is the new candidate state combining the current word embedding $V_{ul}$ with the previous hidden state, and $h_l$ is the current state for word $l$ modeled by the output gate. $\odot$ denotes the element-wise product. $W_f$, $R_f$, $W_s$, $R_s$, $W_c$, and $R_c$ are the GRU weight matrices, while $b_f$, $b_s$, and $b_c$ are the bias vectors.
Since the context of a word can be determined by the preceding and following words or sentences, our proposed method employs a bidirectional GRU over the user embedding $V_u$ and the location embedding $V_p$. Each word $l$'s hidden state is learned by forward and backward GRU layers, denoted as $\overrightarrow{h}_{1l}$ and $\overleftarrow{h}_{1l}$, respectively. To learn a more concise and combined representation of a word that accounts for the context of all surrounding words, we feed the concatenation of $\overrightarrow{h}_{1l}$ and $\overleftarrow{h}_{1l}$ to an additional bidirectional GRU layer, so that its input for every word $l$ is $e_{2l} = [\overrightarrow{h}_{1l}, \overleftarrow{h}_{1l}]$. The second recurrent unit outputs $n$ latent vectors. Each vector is a sequentially infused representation of a word written by the target user or about the candidate item. All modified word vectors are then fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This makes the textual modeling method the only difference between TCENR and TCENR_seq and further reduces the number of learned parameters. It lets us directly measure the effect of an RNN, compared to a CNN, on textual modeling for POI recommender systems, and it enables the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged to learn the user-location interaction.
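A two-layer bidirectional GRU encoder can be sketched as below. This sketch uses the textbook GRU formulation (update gate z, reset gate r, candidate state c) rather than reproducing TCENR_seq's own gate equations. Sizes and weights are hypothetical. The steps taken from the text are the reversed backward pass, the per-word concatenation $[\overrightarrow{h}_{1l}, \overleftarrow{h}_{1l}]$, and feeding those vectors to a second bidirectional layer:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class GRUCell:
    """Textbook GRU cell with update gate z, reset gate r, and candidate state c."""

    def __init__(self, n_in, n_hid, rng):
        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]
        self.n_hid = n_hid
        self.W = {g: mat(n_hid, n_in) for g in "zrc"}   # input-to-hidden weights
        self.R = {g: mat(n_hid, n_hid) for g in "zrc"}  # hidden-to-hidden weights
        self.b = {g: [0.0] * n_hid for g in "zrc"}      # biases

    def _pre(self, g, x, h):
        return [a + c + d for a, c, d in zip(matvec(self.W[g], x), matvec(self.R[g], h), self.b[g])]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]      # element-wise product
        c = [math.tanh(v) for v in self._pre("c", x, reset_h)]
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]

    def run(self, xs):
        h, states = [0.0] * self.n_hid, []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states

def bidirectional(fwd, bwd, xs):
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]                  # re-align backward states with words
    return [fs + bs for fs, bs in zip(forward, backward)]  # per-word concatenation

rng = random.Random(4)
k_w, hid = 4, 5                                          # hypothetical sizes
V_u = [[rng.uniform(-1, 1) for _ in range(k_w)] for _ in range(6)]  # 6 toy words
layer1 = bidirectional(GRUCell(k_w, hid, rng), GRUCell(k_w, hid, rng), V_u)
layer2 = bidirectional(GRUCell(2 * hid, hid, rng), GRUCell(2 * hid, hid, rng), layer1)
```

Each of the `n` vectors in `layer2` would then go through the same max-pooling and fully connected steps as in the CNN branch.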
3.3. Training the Network. To train the recommendation models, we adopt a pointwise loss objective function, as done in [2, 3, 14], which minimizes the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$. To address the implicit feedback nature of LBSNs, we sample a set of negative samples from the dataset, denoted as $Y^{-}$.
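Pointwise training with sampled negatives can be sketched as below. The ratio of 4 negatives per positive comes from Section 4.1. Drawing negatives uniformly, without replacement, from locations the user has not visited is our assumption. The IDs are toy values:

```python
import random

def sample_negatives(positives, all_pois, n_neg, rng):
    """Label observed (user, POI) pairs 1 and add n_neg unobserved POIs per pair, labelled 0."""
    visited = {}
    for u, p in positives:
        visited.setdefault(u, set()).add(p)
    samples = []
    for u, p in positives:
        samples.append((u, p, 1))                        # member of Y
        candidates = [q for q in all_pois if q not in visited[u]]
        for q in rng.sample(candidates, n_neg):          # members of Y^-
            samples.append((u, q, 0))
    return samples

positives = [(0, 1), (0, 2), (1, 3)]  # toy check-ins: (user, POI)
data = sample_negatives(positives, all_pois=range(10), n_neg=4, rng=random.Random(5))
```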
Because of the implicit feedback nature of the recommendation task, the algorithm's output can be treated as a binary classification problem. Since the sigmoid activation function is applied to the last hidden layer, the output probability can be defined as
where $E_u$ and $E_p$ are the embedding layers for users and locations, respectively. Similarly, $V_u$ and $V_p$ are the textual review embedding layers, and $\Theta_f$ represents the model parameters. Taking the negative log-likelihood of $p$ results in the binary cross-entropy loss function for the prediction part of the model:
$$L_{pred} = -\sum_{(u,p) \in Y \cup Y^{-}} \left( y_{up} \log \hat{y}_{up} + \left(1 - y_{up}\right) \log\left(1 - \hat{y}_{up}\right) \right) \tag{12}$$
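Equation (12) can be checked numerically with the short sketch below. The clipping constant `eps` is an assumption added to avoid `log(0)` and is not part of the equation. The labels and predictions are toy values:

```python
import math

def binary_cross_entropy(labels, preds, eps=1e-7):
    """Summed binary cross-entropy over observed and sampled (user, POI) pairs."""
    total = 0.0
    for y, y_hat in zip(labels, preds):
        y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return total

loss = binary_cross_entropy([1, 0], [0.9, 0.2])  # = -(log 0.9 + log 0.8), about 0.3285
```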
The model has two more outputs, the users' social network $\psi^{c}_{E_u}$ and the locations' distance graph $\psi^{c}_{E_p}$, so two additional loss functions are required to train the network. We follow the process in [2] and assume that two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:
$$L_{u\,context} = -\sum_{(u, u_c)} \left( \psi^{c}_{E_u} - \log \sum_{u'_c \in C_u} \exp\left(\psi^{c'}_{E_u}\right) \right) \tag{13}$$
where $\psi^{c}_{E_u}$ is as defined in (1). Taking the binary class label into account leads to the following loss function, which corresponds to minimizing the cross-entropy loss of user $i$ and context $c$ with respect to the class label $y$:
where $I$ is a function that returns 1 if $y$ is in the given set and 0 otherwise. The loss function for the POI context follows the same logic and is omitted due to space limitations.
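The per-pair term of (13) is the negative log-softmax of the observed context's score. The sketch below uses toy scores and the usual max-subtraction for numerical stability, which is an implementation detail rather than part of the equation:

```python
import math

def context_log_loss(scores, observed):
    """-(psi_c - log sum_{c'} exp(psi_{c'})) for the observed context index."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))  # stable log-sum-exp
    return -(scores[observed] - log_norm)

scores = [2.0, 1.0, 0.0]  # toy outputs of psi^c_{E_u} for three candidate contexts
loss = context_log_loss(scores, observed=0)
```

The loss is smallest when the observed context, such as a friend in the social graph, receives the highest score, which is what pulls connected users' embeddings together.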
We simultaneously minimize the three loss functions $L_{pred}$, $L_{u\,context}$, and $L_{p\,context}$. The joint optimization improves recommendation accuracy while enforcing similar representations for locations in close proximity and for users connected in the social network. The loss functions are combined using two hyperparameters that weight the contextual contribution:
$$L = L_{pred} + \lambda_1 L_{u\,context} + \lambda_2 L_{p\,context} \tag{15}$$
To optimize the combined loss function, a gradient descent method can be adopted. More specifically, we utilize Adaptive Moment Estimation (Adam) [38]. This optimizer adapts the learning rate for each parameter automatically and typically converges faster than standard gradient descent, which also makes tuning the learning rate more efficient. To avoid overfitting during training, an early stopping criterion is integrated. The model parameters are initialized from a Gaussian distribution, while the output layer's parameters are initialized from a uniform distribution.
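The Adam update [38] and an early stopping rule can be sketched on a one-dimensional toy problem as below. The learning rate of 0.005 matches Section 4.1. The values $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$ are the defaults proposed in [38]. The patience-based stopping rule is an assumption, since the paper does not specify its criterion:

```python
import math

def adam_minimize(grad, theta, lr=0.005, beta1=0.9, beta2=0.999, eps=1e-8, steps=3000):
    """Run Adam on a scalar parameter given its gradient function."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g      # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias corrections
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def should_stop(val_losses, patience=3):
    """Stop once the best validation loss is `patience` or more epochs old."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience

theta = adam_minimize(lambda x: 2 * (x - 3.0), theta=0.0)  # minimizes (x - 3)^2
```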
4. Experiments and Evaluation
4.1. Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews along
with the users' friends and the businesses' locations. Due to the limited resources available for model evaluation, we chose to filter the dataset and keep only a concise subset, removing all users and locations with fewer than 100 written reviews or fewer than 10 friends. The filtered dataset includes 141,028 reviews, and the rating matrix has a sparsity of 98.08%. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, and 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are at most 1 km apart.
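The 1 km rule for the POI graph, and one way of turning a graph into context pairs with random walks, can be sketched as below. The haversine formula and the DeepWalk-style walk are our assumptions about how distances and walks were computed. The coordinates, walk length, and helper names are hypothetical. Only the 1 km threshold and the window size of 3 come from the text:

```python
import math
import random

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def poi_graph(coords, max_km=1.0):
    """Connect every pair of POIs that are at most max_km apart."""
    adj = {i: set() for i in coords}
    ids = sorted(coords)
    for a, i in enumerate(ids):
        for j in ids[a + 1:]:
            if haversine_km(*coords[i], *coords[j]) <= max_km:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def walk_context_pairs(adj, start, walk_len, window, rng):
    """Uniform random walk from `start`; pair nodes that co-occur within `window` steps."""
    walk = [start]
    while len(walk) < walk_len and adj[walk[-1]]:
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    pairs = set()
    for i, node in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if walk[j] != node:
                pairs.add((node, walk[j]))
    return walk, pairs

coords = {0: (43.4723, -80.5449), 1: (43.4730, -80.5400), 2: (43.6532, -79.3832)}  # toy POIs
adj = poi_graph(coords)
walk, pairs = walk_context_pairs(adj, start=0, walk_len=5, window=3, rng=random.Random(6))
```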
To test our models' performance, the original data was split into training, validation, and test sets by random sampling with respective ratios of 56%, 24%, and 20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled with 4 negative locations for every positive one.
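A random 56/24/20 split can be sketched as follows. The seed and the rounding of set sizes are our assumptions:

```python
import random

def split_indices(n, ratios=(0.56, 0.24, 0.20), seed=0):
    """Shuffle n instance indices and cut them into train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(1000)
```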
To compare our proposal fairly with other alternatives, we adopt the same settings as in [2, 3]. The MLP input vectors are represented with an embedding size of 10, and two hidden layers are added on top of the merged result. Following the tower architecture, where each layer is half the size of its predecessor, the first and second layers have 32 and 16 hidden units, respectively.
In the CNN component, each word is represented by a pretrained embedding layer with 50 units, and the convolutional layer uses a window size of 10 and a stride of 3. It produces 3 feature maps, which are flattened after a max-pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. After the two resulting hidden layers are merged, their interaction is learned by another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. During training, we used a learning rate of 0.005, a maximum of 50 epochs, and a batch size of 512 samples.
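The size of the flattened vector fed to the 32-unit hidden layer follows from these settings. The sketch below assumes "valid" convolution positions and non-overlapping pooling, which are the defaults of Keras' `Conv1D` and `MaxPooling1D` layers; the paper does not state these choices explicitly. The input lengths 500 and 3000 are taken from the range explored in Section 4.4.3:

```python
def cnn_flat_size(n_words, window=10, stride=3, n_maps=3, pool=2):
    """Length of the flattened CNN output for one user's or location's text."""
    conv_len = (n_words - window) // stride + 1  # 'valid' convolution positions
    pooled_len = conv_len // pool                # non-overlapping max pooling
    return n_maps * pooled_len

def conv_params(k_w=50, window=10, n_maps=3):
    """Trainable weights plus biases of the convolutional layer."""
    return n_maps * (window * k_w + 1)

sizes = {n: cnn_flat_size(n) for n in (500, 3000)}  # {500: 246, 3000: 1494}
```

Under these assumptions, the convolution itself has only 1,503 parameters, so most of the textual branch's weights sit in the 32-unit hidden layer, whose input grows linearly with the number of words.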
4.2. Baselines. To evaluate our algorithm, we compare it with the following seven empirically proven frameworks:
(vi) PACE [2]: Preference and Context Embedding. An MLP-based framework that adds contextual graph smoothing for POI recommendation.
(vii) DeepCoNN [14]: Deep Cooperative Neural Networks. A CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.
To evaluate our model and the baselines, we apply accuracy and mean squared error (MSE) over all $n$ test samples, as well as precision (Pre@10) and recall (Rec@10) over the top 10 predictions, averaged per user.
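These measures can be sketched as below. Treating a prediction as positive when $\hat{y} \geq 0.5$, and dividing precision by $k$ rather than by the number of returned items, are our assumptions. The inputs are toy values:

```python
def accuracy(labels, probs, threshold=0.5):
    """Share of test samples whose thresholded prediction matches the label."""
    return sum((p >= threshold) == bool(y) for y, p in zip(labels, probs)) / len(labels)

def mse(labels, probs):
    """Mean squared error between binary labels and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)

def precision_recall_at_k(ranked, relevant, k=10):
    """Precision@k and Recall@k for one user's ranked list of POIs."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k, (hits / len(relevant) if relevant else 0.0)

labels, probs = [1, 0, 1, 0], [0.9, 0.4, 0.3, 0.1]
acc = accuracy(labels, probs)  # 3 of 4 correct
err = mse(labels, probs)
pre10, rec10 = precision_recall_at_k(list(range(20)), {0, 3, 15})
```

Pre@10 and Rec@10 are computed per user and then averaged over all users.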
The proposed models were implemented using Keras (https://keras.io) on top of the TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted on an Nvidia GTX 1070 GPU.
4.3. Performance Evaluation. Table 1 reports the performance of our proposed algorithms and the seven baselines, with TCENR's improvement ratio over each method in brackets. The presented results are the average of three independent runs.
As the results show, the proposed model TCENR achieves the best results overall compared to all baselines. Furthermore, it significantly outperforms HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN in terms of accuracy and MSE ($p < 0.05$, one-sided unpaired t-test). The contrasting precision and recall results relative to NeuMF suggest that TCENR offers fewer but more relevant recommendations to the user. While NMF achieves the best precision score of all methods, it underperforms on all other measures, making it a less desirable model. A closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less
Figure 3: Runtime (seconds) of all models on the Yelp dataset.
sparse dataset tested, which does not allow the benefits of contextual regularization to be fully realized. In addition, using only the first 500 words to represent the textual input of each user and location may explain the DeepCoNN model's relatively low scores on this dataset. The performance of Geo-SAGE and LCARS shows that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.
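The one-sided unpaired t-test mentioned above compares, for example, three runs of TCENR against three runs of a baseline. A minimal sketch of the t statistic with pooled variance is shown below. The pooled-variance (Student's) form is our assumption, since the paper does not specify the variant. The accuracy values are invented for illustration and are not results from Table 1:

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Unpaired two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical accuracies from three runs each (not the paper's numbers).
t, df = pooled_t([0.83, 0.82, 0.84], [0.80, 0.81, 0.79])
significant = t > 2.132  # one-sided critical value of t at alpha = 0.05 with df = 4
```

With SciPy, `scipy.stats.ttest_ind(a, b, alternative="greater")` returns the same statistic together with a one-sided p-value.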
Comparing TCENR with its proposed extension TCENR_seq yields mixed results. By employing an RNN instead of a CNN to extract user and location features from textual reviews, TCENR_seq achieves a lower error rate and a better precision score, while its accuracy and recall are worse. One possible explanation is that, by accurately capturing different aspects of user reviews, the model reinforces its hypotheses and therefore reduces its uncertainty in some cases. However, when textual aspects contradict the ground truth, it may choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable technique for each data type, rather than applying a single method to all inputs. They also show the positive impact of using textual data in conjunction with historical activities. Finally, the reported performance offers insight into choosing between CNNs and RNNs for language modeling in future recommendation tasks.
To further evaluate our suggested frameworks and the seven baselines in terms of runtime, Figure 3 presents the average time required to fully train each method. The results show that TCENR is competitive with most baselines and more efficient than DeepCoNN and LCARS. The runtime of TCENR_seq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks. Because the recurrent layers increase the number of trainable parameters, our RNN-based extension takes 329 longer to train than TCENR while achieving comparable results.
4.4. Model Design Analysis. In this section, we discuss the effect of several design choices on the suggested model's performance.
4.4.1. Merge Layer. The model's final layers combine the dense outputs of the MLP and convolutional networks. They require close attention, because they affect both the networks' ability to learn jointly and the prediction itself. To select the fusion operator, we considered the following methods:
(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted as TCENR_con and is described in (9).
(ii) Merging the last hidden layers using the dot product, resulting in a model named TCENR_dot that can be defined as
(iii) Combining the two previously described methods, so that the two representations are jointly learned by concatenation and dot product. The resulting model is denoted as TCENR_dot_con. It is obtained by adding (9) and (16) and mapping the result to the range $[0, 1]$ with the sigmoid function.
(iv) Adopting a weighted average of the two networks' predictions. Denoted as TCENR_weight, this model can be defined as
$$\hat{y}_{up} = \lambda_1 \sigma\left(W_5 \times h_{\text{context}} + b_5\right) + \lambda_2 \sigma\left(W_6 \times h_{\text{reviews}} + b_6\right) \tag{18}$$
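The four fusion strategies can be compared side by side in the sketch below. The forms of TCENR_dot and TCENR_dot_con are our assumptions. The dot-product variant is taken as the sigmoid of the inner product of equally sized hidden vectors. The combined variant adds the concatenation logit of (9) to that inner product before the sigmoid, as described in (iii). The weights $\lambda_1 = \lambda_2 = 0.5$ and all vectors are hypothetical:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(w, b, x):
    return b + sum(wi * xi for wi, xi in zip(w, x))

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_con(h_c, h_r, w3, b3):
    """Eq. (9): sigmoid of a linear layer over the concatenation [h_c, h_r]."""
    return sigmoid(linear(w3, b3, h_c + h_r))

def fuse_dot(h_c, h_r):
    """Assumed dot-product fusion: sigmoid of the inner product."""
    return sigmoid(inner(h_c, h_r))

def fuse_dot_con(h_c, h_r, w3, b3):
    """Assumed combination: add the concatenation logit and the inner product."""
    return sigmoid(linear(w3, b3, h_c + h_r) + inner(h_c, h_r))

def fuse_weight(h_c, h_r, w5, b5, w6, b6, lam1=0.5, lam2=0.5):
    """Eq. (18): weighted average of two separate sigmoid predictions."""
    return lam1 * sigmoid(linear(w5, b5, h_c)) + lam2 * sigmoid(linear(w6, b6, h_r))

h_c, h_r = [0.4, 0.0, 1.2], [0.3, 0.9, 0.0]  # toy h_context and h_reviews
w3 = [0.5, -0.2, 0.1, 0.3, 0.2, -0.4]
preds = {
    "con": fuse_con(h_c, h_r, w3, 0.0),
    "dot": fuse_dot(h_c, h_r),
    "dot_con": fuse_dot_con(h_c, h_r, w3, 0.0),
    "weight": fuse_weight(h_c, h_r, [0.5, -0.2, 0.1], 0.0, [0.3, 0.2, -0.4], 0.0),
}
```

Only the concatenation and combined variants let a single layer weigh features from both subnetworks against each other. The weighted average scores each view independently before mixing them.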
As shown in Figure 4, the simpler weighted-average and dot-product methods lead to inferior TCENR performance, which demonstrates the added value of using the latent features learned by each subnetwork jointly. When concatenation is combined with the underperforming dot product in TCENR_dot_con, it improves over the dot product alone. However, because the two methods are integrated through a simple average, using concatenation alone, as in TCENR_con, produces the best results, so it is integrated into the final model.
4.4.2. MLP Layer Design. Although [3] found that adding more layers and units to an MLP-based recommender has a positive effect, the use of a CNN and an additional hidden layer in our model suggests that the question is worth revisiting.
[Figure 4 bar chart: test accuracy of TCENR_dot_con, TCENR_weight, TCENR_dot, and TCENR_con.]
Figure 4: Comparison of merging methods in terms of accuracy.
Table 2: Model's accuracy with different layers.
To this end, we test the proposed algorithm with 1 to 4 hidden layers for learning the user-item interaction with contextual regularization, with sizes varying from 8 to 128 hidden units. Table 2 presents the test set accuracy, with the number of hidden layers as columns and the size of the first layer as rows. Unlike previous results, we find that two hidden layers with 32 and 16 hidden units give the best performance on our dataset.
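The configurations in this grid can be enumerated as below. We assume the first-layer sizes are the powers of two between 8 and 128 and that deeper layers keep halving, as in the tower architecture:

```python
def tower(first_size, n_layers):
    """Hidden layer sizes when each layer is half the size of its predecessor."""
    return [first_size // 2 ** i for i in range(n_layers)]

grid = {(first, depth): tower(first, depth)
        for first in (8, 16, 32, 64, 128)
        for depth in range(1, 5)}
best = grid[(32, 2)]  # the configuration reported as best: [32, 16]
```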
4.4.3. Number of Words. Using written reviews in their original order lets CNNs and RNNs find the best representation for every few words and, eventually, for the whole text. Our final dataset, however, consists of very long reviews: fully learning a single user or location requires more than 20,000 words, which makes extracting relevant representations computationally expensive. To benefit from the sequential nature of the written reviews while keeping the solution feasible, we limited the number of words to between 500 and 6000. As Figure 5 shows, accuracy improves slightly as the number of words increases up to 3000. Beyond that point, additional words bias the model towards users and locations with longer reviews, which in turn reduces its learning capability.
5. Conclusion and Future Work
In this paper, we developed a neural POI recommender system called TCENR. The model exploits data about users, locations' spatial data, social networks, and textual reviews to predict users' implicit preferences for POIs. TCENR models two types of user-location interactions: native check-ins regularized by contextual information, and the words users write to describe their experiences. We further extended our proposed method and presented TCENR_seq, which models textual data using an RNN instead of a CNN. Evaluated over the Yelp dataset, the proposed algorithms
[Figure 5 line chart: test accuracy versus number of words (0 to 6000).]
Figure 5: Number of words comparison in terms of accuracy.
consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.
In future work, we intend to evaluate our models on additional LBSN datasets. We also plan to investigate how the proposed frameworks address the cold-start problem by analyzing their performance on additional data that includes new users and locations with few reviews.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.
References
[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975-984, ACM, San Francisco, California, USA, August 2016.
[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245-1254, ACM, Halifax, NS, Canada, August 2017.
[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173-182, Perth, Australia, April 2017.
[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287-296, ACM, February 2011.
[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255-1264, ACM, Sydney, NSW, Australia, August 2015.
[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221-229, ACM, Chicago, Illinois, USA, August 2013.
[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448-456, ACM, August 2011.
[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235-1244, ACM, Sydney, NSW, Australia, August 2015.
[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429-1438, ACM, Singapore, Singapore, November 2017.
[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053-1058, IEEE, Barcelona, Spain, December 2016.
[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7-10, ACM, 2016.
[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.
[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 2643-2651, 2013.
[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425-434, ACM, Cambridge, UK, February 2017.
[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233-240, ACM, 2016.
[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.
[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147-154, ACM, Vienna, Austria, September 2015.
[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107-114, ACM, 2016.
[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.
[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), pp. 916-924, ACM Press, Mountain View, California, USA, October 2017.
[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191-198, ACM, 2016.
[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495-503, ACM, 2017.
[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46-54, ACM, Marina Del Rey, CA, USA, 2018.
[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537-2551, 2017.
[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631-1640, ACM, Melbourne, Australia, October 2015.
[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.
[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835-1844, Lyon, France, April 2018.
[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pp. 2782-2788, NY, USA, July 2016.
[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309-2318, London, UK, August 2018.
[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639-648, Lyon, France, April 2018.
[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015), pp. 1340-1346, Argentina, July 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.
[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449-1458, ACM, Singapore, Singapore, November 2017.
[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, 2014.
[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 3111-3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.
[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.
[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 326-335, Netherlands, July 2015.
[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273-1284, 2014.
Spatial data is often exploited by dividing the input space into roles and regions. Assuming that users' behavior varies when traveling far from home, previous works [5, 6] generated two profiles for each user: one used in her home region and another used in more distant locations. More recent approaches [24–26] divide the input space into geographical regions before incorporating it into the model, often through hierarchical structures. Although methods based on regions and roles are able to better distinguish user behaviors in different locations, they do not provide a personalized user representation and can ignore potential shifts of preferences from one region to another. For example, a user might prefer to visit a Starbucks location in different regions, whether close to or far from home. Enriching geographical features with additional data is demonstrated in [25, 27], where the next-location prediction is partially determined by past sequences of check-ins. However, these methods are not generic and cannot be extended to include additional features derived from social networks or textual data.
Furthermore, for tasks in highly sparse environments such as POI recommendation, adding user- or item-specific inputs may diminish the model's ability to generalize. However, applying the same data as a regularizing factor can enhance the model's performance and reduce overfitting. This was done in [4], where the similarity between connected users in the social network was used to constrain a matrix factorization (MF) model. Reference [2] utilized social networks and geographical distances to enforce similar embeddings for users and locations, thus improving the model's ability to generalize for users and locations with few historical records.
2.2. Text-Based Recommendation. Since many websites encourage users to provide a written explanation of their numeric ratings, textual reviews are one of the most popular types of data to be integrated into RS. Previous works adopted probabilistic approaches to alleviate the data sparsity problem using textual input [28–31]. By expressing each review as a bag of words, LDA-based models are able to extract topics that can be used to represent users' interests and locations' characteristics [5, 6, 25]. These probabilistic methods are usually successful in handling issues that standard CF approaches struggle with, such as out-of-town recommendations, where similar users lack sufficient historical data. However, as demonstrated in recent works [17, 32], failing to preserve the original order of words and ignoring their semantic meaning prevent the successful modeling of a given review. On the other hand, an emerging trend of applying neural networks to reviews allows such learning without this loss of information. These implementations can generally be categorized into RNN-based [17, 18] and CNN-based [14, 15] models.
RNN-based recommender systems usually rely on the sequential structure of sentences to learn their meaning. In [18], the words describing a target item are fed to a bidirectional RNN layer. Following the GRU architecture, the model utilizes an accumulated context from successive words to provide a better representation of each word. To preserve and update the context of words, the recurrent model has to manage a large number of parameters. The problem becomes more prominent when adopting the popular LSTM paradigm, as done in [17].
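To make the parameter argument concrete, the sketch below counts the weights of one LSTM layer and one GRU layer for the same input and hidden sizes, using the standard formulations (four gated transforms for LSTM, three for GRU). The sizes are hypothetical and the helper functions are illustrative, not part of any cited model.

```python
def lstm_params(input_dim: int, hidden: int) -> int:
    """Weights of one LSTM layer: 4 gated transforms, each with an input
    matrix (hidden x input_dim), a recurrent matrix (hidden x hidden) and a bias."""
    return 4 * (hidden * input_dim + hidden * hidden + hidden)


def gru_params(input_dim: int, hidden: int, reset_after: bool = False) -> int:
    """Weights of one GRU layer: 3 gated transforms of the same shape.
    reset_after=True mimics implementations (e.g. tf.keras, where it is the
    default) that keep a second bias vector per gate."""
    base = 3 * (hidden * input_dim + hidden * hidden + hidden)
    return base + (3 * hidden if reset_after else 0)


d, h = 50, 32  # hypothetical: 50-d word embeddings, 32 hidden units
print("LSTM:", lstm_params(d, h))
print("GRU :", gru_params(d, h))
print("GRU/LSTM ratio:", gru_params(d, h) / lstm_params(d, h))
```

With 50-dimensional inputs and 32 hidden units the counts are 10,624 (LSTM) and 7,968 (GRU); the 3:4 ratio holds for any sizes, and a bidirectional layer doubles either count.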
Following their success in the field of computer vision [33], CNN-based models are gaining popularity in other areas such as textual modeling [34]. By sliding a window over a given document, such networks are able to represent different features found within the text by identifying relevant subsets of words. Reference [15] follows the standard CNN structure, comprising an embedding layer to represent the semantic meaning of each word, a convolution layer to generate local features, and a max-pooling operation to identify the most relevant factors. In [14], two CNNs are developed to represent the target user and item based on their reviews. The resulting vectors are regarded as the user and item representations, which are then fed to a nonlinear layer to learn their corresponding rating.
We claim that by jointly learning contextual and text-based deep models, our proposed method will better exploit the strengths of collaborative filtering while being more resilient to its shortcomings in sparse scenarios. This will be achieved by learning users' and locations' representations from similarities in direct interactions along with correlations in underlying features extracted from their written reviews. The notable work of [35] proposed JRL, a framework that similarly attempts to jointly optimize multiple models, each responsible for learning a unique perspective of the same task by focusing on different inputs. However, while JRL is a general framework focused on extensibility to many types of input, our proposed method is tailored to the task of POI recommendation.
3. The Neural Recommender
3.1. Neural Network Architecture. The proposed recommender system aims to improve the POI recommendation task by learning user-location interactions with two parallel neural networks, as shown in Figure 1. The context-based network, presented in the left part of the figure, is designed to model user-POI preferences using social and geographical attributes and is based on a multilayer perceptron structure [2, 3]. Shown on the right side of Figure 1, the convolutional neural network forms the textual modeling unit [14]. It attempts to learn the same preference by analyzing the underlying meaning in users' and locations' reviews. Each of the two networks models the user and POI inputs individually and then captures their shared interaction in the merge layers.
3.1.1. Context-Based Network Layers. To better capture the complex relations between users and locations in LBSNs, we chose to adopt the multilayer perceptron architecture. By stacking multiple layers of nonlinearities, an MLP is capable of learning relevant latent factors of its inputs. It is first fed with user and location vectors of sizes $N$ and $M$, where each input tuple $\langle u, p \rangle$ is transformed into sparse one-hot encoding representations. The two fully connected embedding layers found on top of the input layer project the sparse representations of users and locations into smaller and denser vectors. For $u$ and $p$, the respective embedding matrices are $E_u \in \mathbb{R}^{k_u \times N}$ and $E_p \in \mathbb{R}^{k_p \times M}$, where $k_u$ and $k_p$ are the corresponding dimensions.
Figure 1: TCENR Framework. (Context-based branch: user $u$ and POI $p$, user embedding ($E_u$) and POI embedding ($E_p$), softmax layers ($\psi_c(E_u)$, $\psi_c(E_p)$) compared with the user graph output $g_u$ and POI graph output $g_p$, followed by a merge layer and hidden layers. Textual branch: user reviews ($d_u$) and POI reviews ($d_p$), word embeddings ($V_u$, $V_p$), conv layers ($Z_u$, $Z_p$), pooling layers ($O_u$, $O_p$), and fully connected layers ($h_u$, $h_p$), followed by a merge layer and a hidden layer. A final merge layer and hidden layer produce the prediction $\hat{y}$.)
We exploit the users' social networks along with the locations' geographical graphs to constrain the learned embeddings. Two softmax layers take the representations $E_u$ and $E_p$ as input and transform them back to $N$- and $M$-sized vectors, respectively. The user output layer $\psi_c(E_u) \in \mathbb{R}^N$ describes the similarity of a user's embedding to all $N$ users and can be formally denoted as
$$\psi_c(E_u) = a\left(W^u_c \times E_u + b^u_c\right) \tag{1}$$
where $W^u_c$ and $b^u_c$ are the layer's weight matrix and bias vector, and $a$ is a nonlinear activation function. Due to the similarity between the user- and POI-specific layers, the location output layer $\psi_c(E_p)$ is not developed in this section. Enforcing $\psi_c(E_u)$ and $\psi_c(E_p)$ to resemble user $u$'s social graph $g_u$ and location $p$'s spatial graph $g_p$, respectively, results in a smoothing factor that limits the amount by which connected entities' embeddings differ.
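A minimal NumPy sketch of the user-side context layer in (1), assuming the activation $a$ is the softmax suggested by the layer names in Figure 1; the sizes, initialization scale, and helper names are hypothetical. It also shows why feeding a one-hot vector to a fully connected embedding layer is equivalent to looking up one column of $E_u$.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 6      # hypothetical number of users
k_u = 4    # user embedding size (Section 4.1 uses 10)

# Embedding matrix E_u in R^{k_u x N}: column u is user u's dense vector.
E = rng.normal(scale=0.1, size=(k_u, N))

def embed(one_hot: np.ndarray) -> np.ndarray:
    """Multiplying E by a one-hot vector selects one column (a table lookup)."""
    return E @ one_hot

# Output layer of Eq. (1): maps the k_u-dim embedding back to N user scores.
W_c = rng.normal(scale=0.1, size=(N, k_u))
b_c = np.zeros(N)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()          # shift for numerical stability; result unchanged
    e = np.exp(z)
    return e / e.sum()

def context_output(e_u: np.ndarray) -> np.ndarray:
    """psi_c(E_u) = a(W_c x E_u + b_c), taking the activation a to be softmax."""
    return softmax(W_c @ e_u + b_c)

u = 2
e_u = embed(np.eye(N)[u])
psi = context_output(e_u)
print(psi.round(3), psi.sum())
```

During training, this distribution would be pushed toward the contexts sampled from $g_u$ by the context loss of Section 3.3; the POI side is symmetric.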
The user and location embeddings are projected to a merge layer and combined by a concatenation operator. Using concatenation instead of the dot product allows varied embedding structures, which in turn improves the generated representations [3]. As the input layer of the following neural network, the merge layer can be represented as $h^{(0)}(x)$, where
$$x = [E_u, E_p] \tag{2}$$
Since simple concatenation of the user and location embedding vectors does not allow interactions to be modeled, hidden layers are added to learn these connections. The popular rectified linear unit (ReLU) is employed as the activation function for these layers. More formally, the $q$-th hidden layer can be defined as
$$h^{(q)}(x) = \mathrm{ReLU}\left(W_q \times h^{(q-1)}(x) + b_q\right) \tag{3}$$
where $W_q$ and $b_q$ are the $q$-th layer parameters.
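The merge layer (2) and the stacked hidden layers can be sketched as a small NumPy forward pass. The 10-unit embeddings and the 32 → 16 tower follow Section 4.1; the random inputs and the initialization scale are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def init_tower(input_dim, sizes):
    """Create (W_q, b_q) for each hidden layer, Gaussian-initialized (Sec. 3.3)."""
    params, prev = [], input_dim
    for size in sizes:
        params.append((rng.normal(scale=0.1, size=(size, prev)), np.zeros(size)))
        prev = size
    return params

def context_tower(e_u, e_p, params):
    h = np.concatenate([e_u, e_p])     # Eq. (2): merge layer x = [E_u, E_p]
    for W, b in params:                # each layer: h_q = ReLU(W_q h_{q-1} + b_q)
        h = relu(W @ h + b)
    return h                           # last hidden layer, i.e. h_context

k_u = k_p = 10                                 # embedding size from Section 4.1
params = init_tower(k_u + k_p, [32, 16])       # tower layout from Section 4.1
h_context = context_tower(rng.normal(size=k_u), rng.normal(size=k_p), params)
print(h_context.shape)
```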
3.1.2. Textual Modeling Network Layers. To improve the model's coverage, a text-based network is introduced. It simultaneously learns the same interaction as the context-based network, but from natural-language input. Two additional vectors, $d_u$ and $d_p$, representing user $u$'s and location $p$'s textual reviews, respectively, are applied as inputs to this network. Each vector comprises all $n$ words written by the user or about the location, merged together and kept in their original order. In the following embedding layers, these words are mapped to $k_w$-dimensional vectors that capture their semantic meaning. The output of the user embedding layer represents all words used by user $u$ in the form of a matrix and can be denoted as
$$V_u = \left[\phi(d_{u1}), \phi(d_{u2}), \ldots, \phi(d_{un})\right] \tag{4}$$
where $[\phi_1, \phi_2]$ denotes the concatenation of two vectors, $d_{ul}$ is user $u$'s $l$-th word, and $\phi: D \rightarrow \mathbb{R}^{k_w}$ is a lookup function to a pre-trained textual embedding layer [36, 37] that represents each word in vocabulary $D$ as a vector of size $k_w$. Similarly, $V_p$ denotes the word embedding matrix for location $p$.
Figure 2: Proposed alternatives to learn user and location representations from textual reviews. (a) Textual modeling component using CNN: word embeddings, convolutional layers, max pooling, fully connected. (b) Textual modeling component using RNN: word embeddings, bidirectional GRU, max pooling, fully connected. Both panels use the example input "delicious healthy food steak is amazing". Panel (a) is the CNN-based solution employed in TCENR, while panel (b) illustrates the suggested extension using RNN.
Due to the large number of parameters already required to train the contextual model described above, the textual network is implemented using a CNN-based architecture, which is usually more computationally efficient than an RNN. The semantic representations of users' and locations' reviews are fed to convolution layers to detect the parts of the text that best capture the reviews' meaning. These layers produce $t$ feature maps over the embedded word vectors using a window size of $w_s$ and filter $K \in \mathbb{R}^{k_w \times t}$. As suggested by [14], ReLU is used as the activation function for this layer.
Here, $V_{ul}$ is user $u$'s $l$-th input word embedding and $z_{uj}$ is the $j$-th feature extracted from the complete text. Following the standard CNN structure, the feature maps produced by the convolution layers are reduced by a pooling layer, where max-pooling is selected to identify the most important words and their latent values. $O_u$ is the collection of all concise features extracted from user $u$'s textual input. These are followed by fully connected layers that jointly model the different features and produce the latent textual representations for user $u$ and location $p$, denoted as $h_u$ and $h_p$, respectively:
$$h_u = \mathrm{ReLU}\left(W^u_1 \times O_u + b^u_1\right) \tag{7}$$
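The convolution, pooling, and fully connected steps can be illustrated with a NumPy forward pass. It follows the standard 1-D text CNN of [14, 34], so the exact indexing of (5) and (6) may differ. Assumptions: "valid" convolution with the window size, stride, and three feature maps of Section 4.1, non-overlapping max pooling of size 2, and a filter tensor that also spans the window (the text writes $K \in \mathbb{R}^{k_w \times t}$). The sequence length is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(x):
    return np.maximum(x, 0.0)

def conv1d(V, K, b, stride):
    """'Valid' 1-D convolution over words.
    V: (n, k_w) word embeddings; K: (w_s, k_w, t) filters; b: (t,) biases.
    Returns the (L, t) ReLU feature maps, L = (n - w_s) // stride + 1."""
    w_s, _, t = K.shape
    L = (V.shape[0] - w_s) // stride + 1
    Z = np.empty((L, t))
    for j in range(L):
        window = V[j * stride : j * stride + w_s]              # (w_s, k_w)
        Z[j] = np.tensordot(window, K, axes=([0, 1], [0, 1])) + b
    return relu(Z)

def max_pool(Z, pool):
    """Non-overlapping max pooling along the word axis; a ragged tail is dropped."""
    L = (Z.shape[0] // pool) * pool
    return Z[:L].reshape(-1, pool, Z.shape[1]).max(axis=1)

def text_tower(V, K, b, W1, b1, stride=3, pool=2):
    O = max_pool(conv1d(V, K, b, stride), pool).ravel()   # flattened O_u
    return relu(W1 @ O + b1)                              # Eq. (7): h_u

n, k_w, w_s, t, hidden = 40, 50, 10, 3, 32   # n is hypothetical; others from Sec. 4.1
V_u = rng.normal(size=(n, k_w))
K = rng.normal(scale=0.1, size=(w_s, k_w, t))
b = np.zeros(t)
flat = ((n - w_s) // 3 + 1) // 2 * t         # 11 positions -> 5 pooled -> 15 features
W1, b1 = rng.normal(scale=0.1, size=(hidden, flat)), np.zeros(hidden)
h_u = text_tower(V_u, K, b, W1, b1)
print(flat, h_u.shape)
```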
To combine the outputs of the user and location fully connected layers in the same feature space, a shared layer is utilized. It concatenates its two inputs and learns their interaction by employing an additional hidden layer:
$$h_{reviews} = \mathrm{ReLU}\left(W_2 \times [h_u, h_p] + b_2\right) \tag{8}$$
The two neural networks are then merged to produce a prediction $\hat{y}_{up} \in [0, 1]$. The last layers of the two networks, each representing a different view of the user-location interaction, are concatenated and fed to yet another hidden layer responsible for blending what each network has learned:
$$\hat{y}_{up} = \sigma\left(W_3 \times [h_{context}, h_{reviews}] + b_3\right) \tag{9}$$
where the sigmoid function $\sigma$ was selected to transform the hidden layer output to the desired range of $[0, 1]$.
3.2. Sequential Textual Modeling. To further investigate the gain achieved by integrating a textual modeling component over reviews in TCENR, we suggest an extension denoted as TCENR_seq. Following its success in previous language modeling tasks [17, 18] and its ability to capture the sequential nature of sentences, we employ an RNN component to learn latent features from reviews. The proposed extension is illustrated in Figure 2(b), while the CNN method used in the vanilla TCENR is shown in Figure 2(a) to provide a convenient basis for comparison. More specifically, we follow the findings of previous works [18, 23] and implement our recurrent network using GRU, an architecture that achieves performance competitive with LSTM but with fewer parameters, making it more efficient.
Here, $f_l$ is the forget gate for input word $l$, $s_l$ is the output gate, $c_l$ is the new candidate state combining the current word embedding $V_{ul}$ with the previous hidden state, and $h_l$ is the current state for word $l$, modeled by the output gate. $\odot$ denotes the element-wise product; $W_f$, $R_f$, $W_s$, $R_s$, $W_c$, and $R_c$ are the GRU weight matrices, while $b_f$, $b_s$, and $b_c$ are the bias vectors.
Since the context of a word can be determined by other preceding and succeeding words or sentences, our proposed method employs a bidirectional GRU over the user word embeddings $V_u$ and the location word embeddings $V_p$. Each word $l$'s hidden state is learned by forward and backward GRU layers, denoted as $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$, respectively. To learn a more concise and combined representation of a word while taking into account the context of all surrounding words, we feed the concatenation of $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$ to an additional bidirectional GRU layer, such that its input for every word $l$ is $e^2_l = [\overrightarrow{h}^1_l, \overleftarrow{h}^1_l]$. The second recurrent unit outputs $n$ latent vectors, each a sequentially infused representation of a word written by the target user or about the candidate item. To make the method of textual modeling the only difference between TCENR and TCENR_seq, and to further reduce the number of learned parameters, all modified word vectors are fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This allows us to directly determine the effect of RNN compared to CNN on textual modeling for POI recommender systems, and it also enables the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged in order to learn the user-location interaction.
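The bidirectional, two-layer recurrent encoder can be sketched in NumPy as follows. The GRU uses the common update/reset-gate equations; how these correspond to the gate names $f_l$ and $s_l$ above is an assumption, as are all sizes. The second layer consumes $e^2_l = [\overrightarrow{h}^1_l, \overleftarrow{h}^1_l]$ for every word, and its outputs would then go to the pooling and fully connected layers of (6) and (7).

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRU:
    """One unidirectional GRU layer in the common update/reset-gate form.
    Mapping these gates onto the paper's f_l and s_l is an assumption."""

    def __init__(self, in_dim, hidden):
        self.hidden = hidden
        self.W = rng.normal(scale=0.1, size=(3, hidden, in_dim))  # input weights
        self.R = rng.normal(scale=0.1, size=(3, hidden, hidden))  # recurrent weights
        self.b = np.zeros((3, hidden))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.R[0] @ h + self.b[0])          # update gate
        r = sigmoid(self.W[1] @ x + self.R[1] @ h + self.b[1])          # reset gate
        c = np.tanh(self.W[2] @ x + self.R[2] @ (r * h) + self.b[2])    # candidate
        return (1.0 - z) * h + z * c                                    # new state

    def run(self, X):
        """X: (n, in_dim) word vectors -> (n, hidden) states, one per word."""
        h, states = np.zeros(self.hidden), []
        for x in X:
            h = self.step(x, h)
            states.append(h)
        return np.stack(states)

class BiGRU:
    def __init__(self, in_dim, hidden):
        self.fwd, self.bwd = GRU(in_dim, hidden), GRU(in_dim, hidden)

    def run(self, X):
        h_fwd = self.fwd.run(X)               # reads words left to right
        h_bwd = self.bwd.run(X[::-1])[::-1]   # reads right to left, re-aligned to word order
        return np.concatenate([h_fwd, h_bwd], axis=1)

n, k_w, H = 12, 50, 16                  # hypothetical sizes
V_u = rng.normal(size=(n, k_w))         # word embeddings for one user's reviews
first, second = BiGRU(k_w, H), BiGRU(2 * H, H)
E2 = first.run(V_u)                     # row l is e^2_l = [forward h^1_l, backward h^1_l]
out = second.run(E2)                    # n sequence-aware word vectors
print(E2.shape, out.shape)
```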
3.3. Training the Network. To train the recommendation models, we adopt a pointwise loss objective function, as done in [2, 3, 14], where the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$ is minimized. To address the implicit feedback nature of LBSNs, we sample a set of negative samples from the dataset, denoted as $Y^-$.
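Due to the implicit feedback nature of the recommendation task, the algorithm's output can be treated as a binary classification problem. As the sigmoid activation function is used over the last hidden layer, the output probability can be defined as
$$p\left(Y, Y^- \mid E_u, E_p, V_u, V_p, \Theta_f\right) = \prod_{(u,p) \in Y} \hat{y}_{up} \prod_{(u,p) \in Y^-} \left(1 - \hat{y}_{up}\right) \tag{11}$$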
where $E_u$ and $E_p$ are the embedding layers for users and locations, respectively. Similarly, $V_u$ and $V_p$ are the textual review embedding layers, and $\Theta_f$ represents the model parameters. Taking the negative log-likelihood of $p$ results in the binary cross-entropy loss function for the prediction portion of the model:
$$L_{pred} = -\sum_{(u,p) \in Y \cup Y^-} \left[ y_{up} \log \hat{y}_{up} + \left(1 - y_{up}\right) \log\left(1 - \hat{y}_{up}\right) \right] \tag{12}$$
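Equation (12) in code, for one observed check-in and four sampled negatives; the predictions are made-up values, and the clipping constant is an implementation assumption that keeps the logarithms finite.

```python
import math

def bce_loss(y_true, y_pred, eps=1e-12):
    """Eq. (12): binary cross-entropy summed over observed pairs (label 1)
    and sampled negatives (label 0)."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        y_hat = min(max(y_hat, eps), 1.0 - eps)    # clip so log() stays finite
        total += y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return -total

# One observed check-in followed by four sampled negatives (the 1:4 ratio of Sec. 4.1).
labels = [1, 0, 0, 0, 0]
preds = [0.9, 0.2, 0.1, 0.3, 0.05]
print(round(bce_loss(labels, preds), 4))
```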
As there are two more outputs in the model, the users' social network $\psi_c(E_u)$ and the locations' distance graph $\psi_c(E_p)$, two additional loss functions are required to train the network. We follow the process in [2], assuming that two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:
$$L_{u,context} = -\sum_{(u, u_c)} \left( \psi_c(E_u) - \log \sum_{u'_c \in C_u} \exp\left(\psi_{c'}(E_u)\right) \right) \tag{13}$$
where $\psi_c(E_u)$ is as defined in (1). Taking the binary class label into account prompts the following loss function, corresponding to minimizing the cross-entropy loss of user $i$ and context $c$ with respect to the class label $y$:
where $\mathbb{I}$ is a function that returns 1 if $y$ is in the given set and 0 otherwise. The same logic is used to formulate the loss function for the POI context, which is omitted due to space limitations.
We simultaneously minimize the three loss functions $L_{pred}$, $L_{u,context}$, and $L_{p,context}$. The joint optimization improves the recommendation accuracy while enforcing similar representations for locations in close proximity and for users connected in the social network. The loss functions are combined using two hyperparameters, $\lambda_1$ and $\lambda_2$, that weight the contextual contribution:
$$L = L_{pred} + \lambda_1 L_{u,context} + \lambda_2 L_{p,context} \tag{15}$$
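A sketch of the context term and of the weighted sum, assuming the joint objective is $L_{pred} + \lambda_1 L_{u,context} + \lambda_2 L_{p,context}$ with $\lambda_1 = \lambda_2 = 0.1$ (Section 4.1). The context term uses the plain log-softmax form; the label-aware variant described afterwards is not reproduced. Scores and loss values are hypothetical.

```python
import numpy as np

def log_softmax(scores):
    m = scores.max()
    return scores - m - np.log(np.exp(scores - m).sum())

def context_loss(score_rows, context_ids):
    """Log-loss of the observed context given the instance embedding, summed over
    sampled (u, u_c) pairs. score_rows[i] holds pre-softmax scores over all
    candidate contexts for pair i; context_ids[i] indexes its observed context."""
    return -float(sum(log_softmax(s)[c] for s, c in zip(score_rows, context_ids)))

def joint_loss(l_pred, l_u_ctx, l_p_ctx, lam1=0.1, lam2=0.1):
    """Weighted sum of the three losses with the lambda values of Section 4.1."""
    return l_pred + lam1 * l_u_ctx + lam2 * l_p_ctx

scores = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])   # hypothetical scores
l_u = context_loss(scores, [0, 2])
print(round(l_u, 4), round(joint_loss(0.84, l_u, 1.2), 4))
```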
To optimize the combined loss function, a gradient descent method can be adopted; more specifically, we utilize Adaptive Moment Estimation (Adam) [38]. This optimizer automatically adjusts the learning rate and yields faster convergence than standard gradient descent, in addition to making the learning rate optimization process more efficient. To avoid overfitting when training the model, an early stopping criterion is integrated. The model parameters are initialized with a Gaussian distribution, while the output layer's parameters are set to follow a uniform distribution.
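For reference, the Adam update of [38] and a simple patience-based early-stopping rule look as follows in NumPy. The default learning rate of 0.005 matches Section 4.1, while the $\beta$ constants, $\epsilon$, the patience of 3 epochs, the validation curve, and the toy objective (run with a larger step) are assumptions rather than the paper's settings.

```python
import numpy as np

class Adam:
    """Adam update rule [38] with its commonly used default constants."""

    def __init__(self, lr=0.005, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, theta, grad):
        if self.m is None:
            self.m, self.v = np.zeros_like(theta), np.zeros_like(theta)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad         # first moment
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2    # second moment
        m_hat = self.m / (1 - self.b1 ** self.t)                 # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Adam on a toy quadratic ||theta - 1||^2, whose gradient is 2 (theta - 1).
theta, opt = np.zeros(3), Adam(lr=0.05)
for _ in range(1000):
    theta = opt.step(theta, 2 * (theta - 1.0))

# Early stopping on a hypothetical validation curve.
stopper = EarlyStopping(patience=3)
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.70, 0.50]
stop_epoch = next(i for i, loss in enumerate(val_losses) if stopper.should_stop(loss))
print(theta.round(3), stop_epoch)
```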
4. Experiments and Evaluation
4.1. Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews along with the users' friends and the businesses' locations. Due to the limited resources available for model evaluation, we chose to filter the dataset and keep only a concise subset, removing all users and locations with fewer than 100 written reviews or fewer than 10 friends. The filtered dataset includes 141,028 reviews, and the rating matrix has 98.08% sparsity. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, while 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are up to 1 km apart.
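The graph construction can be sketched with the standard library. Two assumptions are made explicit: distances are great-circle (haversine) distances, and "20 and 30 vertices connected to each base node" is read as the length of a random walk started from each base node, from which (node, context) pairs are taken within a window of 3. Coordinates and helper names are hypothetical.

```python
import math
import random

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def build_poi_graph(coords, max_km=1.0):
    """Connect two locations if they are at most max_km apart (O(M^2) sketch)."""
    adj = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if haversine_km(coords[i], coords[j]) <= max_km:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def context_pairs(adj, base_frac=0.1, walk_len=30, window=3, seed=0):
    """Sample base nodes, walk the graph, and emit (node, context) pairs
    for nodes that co-occur within `window` steps on a walk."""
    rnd = random.Random(seed)
    nodes = [n for n in adj if adj[n]]          # skip isolated vertices
    bases = rnd.sample(nodes, max(1, int(base_frac * len(nodes))))
    pairs = []
    for start in bases:
        walk = [start]
        while len(walk) < walk_len and adj[walk[-1]]:
            walk.append(rnd.choice(sorted(adj[walk[-1]])))
        for i, node in enumerate(walk):
            for ctx in walk[i + 1 : i + 1 + window]:
                if ctx != node:
                    pairs.append((node, ctx))
    return pairs

# Hypothetical coordinates: three POIs within 1 km of each other, one about 10 km away.
coords = [(43.4723, -80.5449), (43.4750, -80.5400), (43.4700, -80.5480), (43.5600, -80.5449)]
graph = build_poi_graph(coords)
print(graph)
print(context_pairs(graph)[:5])
```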
To test our models' performance, the original data was split into training, validation, and test sets by random sampling with respective ratios of 56%, 24%, and 20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled, with 4 negative locations for every positive one.
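A sketch of the 56/24/20 random split with four sampled negatives per observed pair. Drawing negatives only from POIs the user never visited, and applying the sampling to every split, are assumptions; the data are synthetic.

```python
import random

def split_and_sample(checkins, all_pois, ratios=(0.56, 0.24, 0.20), n_neg=4, seed=0):
    """Randomly split observed (user, poi) pairs into train/validation/test sets
    and add n_neg unvisited POIs per observed pair as label-0 examples."""
    rnd = random.Random(seed)
    data = list(checkins)
    rnd.shuffle(data)
    n_train = int(ratios[0] * len(data))
    n_val = int(ratios[1] * len(data))
    splits = {"train": data[:n_train],
              "val": data[n_train:n_train + n_val],
              "test": data[n_train + n_val:]}       # test takes the remainder
    visited = {}
    for u, p in checkins:
        visited.setdefault(u, set()).add(p)
    labeled = {}
    for name, pairs in splits.items():
        rows = []
        for u, p in pairs:
            rows.append((u, p, 1))
            unvisited = [q for q in all_pois if q not in visited[u]]
            for q in rnd.sample(unvisited, min(n_neg, len(unvisited))):
                rows.append((u, q, 0))
        labeled[name] = rows
    return labeled

# Hypothetical data: 10 users, each with 5 visited POIs out of 20.
checkins = [(u, p) for u in range(10) for p in range(u % 5, u % 5 + 5)]
data = split_and_sample(checkins, range(20))
print({name: len(rows) for name, rows in data.items()})
```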
To effectively compare our proposal with other alternatives, we adopt the same settings as in [2, 3]. The MLP input vectors are represented with an embedding size of 10, and two hidden layers are added on top of the merged result. Following the tower architecture, where each layer is half the size of its predecessor, the first and second layers have 32 and 16 hidden units, respectively.
In the CNN component, each word is represented by a pretrained embedding layer with 50 units, while the convolutional layer is constructed with a window size of 10 and a stride of 3. It produces 3 feature maps, which are flattened after a max-pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. Following the merge of the two hidden layers, their interaction is learned using another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For training, a learning rate of 0.005 was used, with a maximum of 50 epochs and a batch size of 512 samples.
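Collecting the reported hyperparameters in one place also makes the tensor sizes easy to check. The helper below assumes "valid" convolution and non-overlapping pooling (the Keras defaults for `Conv1D` and `MaxPooling1D`); the word counts are examples, not the paper's final choice.

```python
# Hyperparameters reported in Section 4.1, collected in one place.
CONFIG = {
    "mlp_embedding_size": 10,
    "mlp_hidden_units": (32, 16),
    "word_embedding_size": 50,
    "conv_window": 10,
    "conv_stride": 3,
    "conv_feature_maps": 3,
    "pool_size": 2,
    "text_hidden_units": 32,
    "merge_hidden_units": 8,
    "lambda1": 0.1,
    "lambda2": 0.1,
    "learning_rate": 0.005,
    "max_epochs": 50,
    "batch_size": 512,
}

def flattened_text_features(n_words, cfg=CONFIG):
    """Size of the flattened pooling output, assuming 'valid' convolution and
    non-overlapping pooling (stride equal to the pool size)."""
    conv_len = (n_words - cfg["conv_window"]) // cfg["conv_stride"] + 1
    pooled_len = conv_len // cfg["pool_size"]
    return pooled_len * cfg["conv_feature_maps"]

for n in (500, 3000, 6000):
    print(n, flattened_text_features(n))
```

For 3,000 words, for example, this gives (3000 − 10) // 3 + 1 = 997 convolution positions, 498 after pooling, and 1,494 flattened features feeding the 32-unit layer.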
4.2. Baselines. To evaluate our algorithm, we chose to compare it to the following seven empirically proven frameworks:
(vi) PACE [2]: Preference and Context Embedding. An MLP-based framework with the addition of contextual graph smoothing for POI recommendation.
(vii) DeepCoNN [14]: Deep Cooperative Neural Networks. A CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.
To evaluate our model and the baselines, we chose to apply accuracy and mean squared error (MSE) over all $n$ test samples, as well as precision (Pre@10) and recall (Rec@10) for the average top 10 predictions per user.
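One way to compute the four measures is sketched below. The 0.5 decision threshold for accuracy, dividing precision by k even when fewer than k items are ranked, and skipping users without test positives are assumptions, since the evaluation details are not spelled out in the text.

```python
def accuracy(y_true, y_pred, threshold=0.5):
    """Share of test samples whose thresholded prediction matches the label."""
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_at_k(scores_by_user, relevant_by_user, k=10):
    """Average Pre@k and Rec@k over users.
    scores_by_user[u]: dict poi -> predicted score; relevant_by_user[u]: set of test POIs."""
    precisions, recalls = [], []
    for u, scores in scores_by_user.items():
        relevant = relevant_by_user.get(u, set())
        if not relevant:
            continue                      # recall is undefined without relevant items
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits = len(set(top_k) & relevant)
        precisions.append(hits / k)
        recalls.append(hits / len(relevant))
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```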
The proposed models were implemented using Keras (https://keras.io) on top of a TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted using an Nvidia GTX 1070 GPU.
4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are the average of three individual executions.
Figure 3: Runtime (seconds) of all models on the Yelp dataset.
As can be seen from the results, the proposed model TCENR achieves the best results overall compared to all baselines. Furthermore, it was found to significantly outperform HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN (p < 0.05, one-sided unpaired t-test) in terms of accuracy and MSE. The contrasting results in terms of precision and recall compared to NeuMF suggest that TCENR offers fewer but more relevant recommendations to the user. While NMF provides the best precision score of all methods, it underperforms in all other measures, making it a less desirable model. A closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully exploited. In addition, using only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on this dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.
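The significance claim compares three runs per model. A pooled-variance (Student) version of the one-sided unpaired t-test is sketched below with hypothetical accuracies; whether the paper pooled variances or used Welch's correction is not stated. The critical value 2.132 is the standard one-sided 5% value for 4 degrees of freedom.

```python
import statistics

def pooled_t(a, b):
    """Student's unpaired t statistic (equal variances assumed) for mean(a) > mean(b)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)   # pooled variance
    return (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Hypothetical accuracies from three runs per model (not the paper's numbers).
model_a = [0.829, 0.827, 0.828]
model_b = [0.815, 0.818, 0.816]
t = pooled_t(model_a, model_b)
T_CRIT = 2.132   # one-sided critical value, alpha = 0.05, df = 3 + 3 - 2 = 4
print(round(t, 2), t > T_CRIT)
```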
Comparing TCENR with its proposed extension TCENR_seq yields contrasting results. By employing RNN instead of CNN to extract user and location features from textual reviews, TCENR_seq achieves a lower error rate and an improved precision score, while accuracy and recall are worse. One possible explanation is that, by accurately capturing different aspects of user reviews, the model is able to reinforce its hypotheses and therefore reduce uncertainty in some cases; however, when faced with a contrast between textual aspects and the ground truth, it may choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable techniques to learn each data type rather than employing a single method for all inputs. Moreover, they show the positive impact of using textual data in conjunction with historical activities. The reported performance also offers insight into the choice between CNN and RNN for language modeling in future recommendation tasks.
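The explanation above relies on accuracy and MSE being able to rank two models differently. A toy illustration with hand-picked predictions (not the paper's outputs), assuming a 0.5 decision threshold:

```python
def accuracy(y_true, y_pred, threshold=0.5):
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

labels = [1, 1, 0, 0]
hesitant = [0.55, 0.55, 0.45, 0.45]   # always on the right side, never confident
confident = [0.95, 0.45, 0.05, 0.05]  # sharp predictions, one wrong label

for name, preds in (("hesitant", hesitant), ("confident", confident)):
    print(name, accuracy(labels, preds), round(mse(labels, preds), 4))
# prints "hesitant 1.0 0.2025" then "confident 0.75 0.0775"
```

The confident predictor cuts the squared error by more than half yet loses accuracy on the one example where it commits to the wrong side, which matches the pattern described for TCENR_seq.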
To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As the results show, TCENR is competitive with most baselines and is more efficient than DeepCoNN and LCARS. The reported runtime of TCENR_seq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks. As the number of trainable parameters increases due to the use of recurrent layers, our RNN-based extension takes 329 longer to train than TCENR while achieving comparable results.
4.4. Model Design Analysis. In this section, we discuss the effect of several design choices on the suggested model's performance.
4.4.1. Merge Layer. The model's final layers, responsible for combining the dense outputs of both the MLP and convolutional networks, require close attention, as they affect the networks' ability to learn jointly as well as the prediction itself. To properly select the fusion operator, the following methods were considered:
(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted as TCENR_con and is described in (9).
(ii) Merging the last hidden layers using the dot product, resulting in a model named TCENR_dot, defined in (16).
(iii) Combining the two previously described methods, where the two representations are jointly learned by concatenation and dot product. The resulting model is denoted as TCENR_dot_con and can be developed by combining (9) and (16) using addition and mapping the result to the range $[0, 1]$ with the sigmoid function.
(iv) Adopting a weighted average of the prediction results of the two networks. Denoted as TCENR_weight, this model can be defined as
$$\hat{y}_{up} = \lambda_1 \sigma\left(W_5 \times h_{context} + b_5\right) + \lambda_2 \sigma\left(W_6 \times h_{reviews} + b_6\right) \tag{18}$$
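The four fusion options can be compared side by side in NumPy. TCENR_con follows (9) and TCENR_weight follows (18); since the exact forms of (16) and (17) are not shown in this section, the dot-product variants below (a sigmoid over the inner product, and over the sum of both pre-activations) are one plausible reading, and equal widths for $h_{context}$ and $h_{reviews}$ are assumed so that the inner product is defined. All weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 8                                     # hypothetical shared width of both outputs
h_context, h_reviews = rng.normal(size=d), rng.normal(size=d)
W3, b3 = rng.normal(scale=0.1, size=2 * d), 0.0
W5, b5 = rng.normal(scale=0.1, size=d), 0.0
W6, b6 = rng.normal(scale=0.1, size=d), 0.0
lam1, lam2 = 0.5, 0.5                     # hypothetical weights for Eq. (18)

def fuse_con():
    """Eq. (9): a sigmoid output layer over the concatenation."""
    return sigmoid(W3 @ np.concatenate([h_context, h_reviews]) + b3)

def fuse_dot():
    """Assumed form: sigmoid of the inner product of the two representations."""
    return sigmoid(h_context @ h_reviews)

def fuse_dot_con():
    """Assumed form: add both pre-activation terms, then squash to [0, 1]."""
    con = W3 @ np.concatenate([h_context, h_reviews]) + b3
    return sigmoid(con + h_context @ h_reviews)

def fuse_weight():
    """Eq. (18): weighted average of two separately squashed predictions."""
    return lam1 * sigmoid(W5 @ h_context + b5) + lam2 * sigmoid(W6 @ h_reviews + b6)

for fn in (fuse_con, fuse_dot, fuse_dot_con, fuse_weight):
    print(fn.__name__, round(float(fn()), 4))
```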
As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of using the latent features learned by each subnetwork jointly. When combined with the underperforming dot product in TCENR_dot_con, concatenation improves over the dot product alone. However, since the two methods are integrated using a simple average, employing only concatenation, as done in TCENR_con, produces the best results; it is therefore integrated into the final model.
Figure 4: Comparison of merging methods in terms of accuracy (TCENR_dot_con, TCENR_weight, TCENR_dot, and TCENR_con; the plotted accuracies range from 0.802 to 0.828).
4.4.2. MLP Layer Design. Although [3] found that adding more layers and units to an MLP-based recommender has a positive effect, the use of CNN and the additional hidden layer make this a subject worth investigating. To this end, we test the proposed algorithm with 1–4 hidden layers used to learn the user-item interaction with contextual regularization, with sizes varying from 8 to 128 hidden units. The results in terms of test set accuracy are presented in Table 2, where the columns give the number of hidden layers and the rows give the size of the first layer. Unlike previous results, we find that two hidden layers with 32 and 16 hidden units result in the best performance on our dataset.
Table 2: Model's accuracy with different layers.
4.4.3. Number of Words. Using written reviews in their original order allows the strengths of CNN and RNN to be exploited by finding the best representation for every few words and eventually for the whole text. Our final dataset, however, is composed of very long reviews: fully learning a single user or location requires more than 20,000 words, making it computationally expensive to extract relevant representations. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500–6,000. As can be seen in Figure 5, accuracy improves slightly as the number of words increases up to 3,000, while additional words result in an increased bias toward users and locations with longer reviews and in turn reduce the model's learning capabilities.
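Capping the input length can be sketched as below. The text only states that the number of words was limited to 500–6,000; keeping the first words in order (as described for DeepCoNN's first 500 words), whitespace tokenization, and the padding token are assumptions.

```python
def fix_length(words, n_max, pad="<pad>"):
    """Keep the first n_max words in their original order; pad shorter inputs."""
    kept = list(words[:n_max])
    return kept + [pad] * (n_max - len(kept))

def build_document(reviews, n_max):
    """Concatenate all reviews of one user (or location), preserving order, then cap."""
    words = [w for review in reviews for w in review.lower().split()]
    return fix_length(words, n_max)

doc = build_document(["Delicious healthy food", "steak is amazing"], n_max=4)
print(doc)   # ['delicious', 'healthy', 'food', 'steak']
```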
Figure 5: Number of words comparison in terms of accuracy (x-axis: number of words, 0 to 6,000; y-axis: accuracy, 0.822 to 0.834).
5. Conclusion and Future Work
In this paper, we developed a neural POI recommender system called TCENR. The model exploits data about users and locations, including spatial data, social networks, and textual reviews, to predict users' implicit preferences regarding POIs. TCENR models two types of user-location interactions: native check-ins regularized by contextual information, and the words used to describe users' experiences. We further extended our proposed method and presented TCENR_seq, where textual data is modeled using RNN instead of CNN. Evaluated on the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.
For future work, we intend to extend our models' evaluation to additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data while taking new users and locations with few reviews into account.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.
References
[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975-984, ACM, San Francisco, California, USA, August 2016.
[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245-1254, ACM, Halifax, NS, Canada, August 2017.
[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173-182, Perth, Australia, April 2017.
[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287-296, ACM, February 2011.
10 Complexity
[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255-1264, ACM, Sydney, NSW, Australia, August 2015.
[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221-229, ACM, Chicago, Illinois, USA, August 2013.
[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448-456, ACM, August 2011.
[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235-1244, ACM, Sydney, NSW, Australia, August 2015.
[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429-1438, ACM, Singapore, Singapore, November 2017.
[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053-1058, IEEE, Barcelona, Spain, December 2016.
[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7-10, ACM, 2016.
[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.
[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 2643-2651, 2013.
[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425-434, ACM, Cambridge, UK, February 2017.
[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233-240, ACM, 2016.
[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.
[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147-154, ACM, Vienna, Austria, September 2015.
[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107-114, ACM, 2016.
[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.
[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference - MM '17, pp. 916-924, ACM Press, Mountain View, California, USA, October 2017.
[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191-198, ACM, 2016.
[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495-503, ACM, 2017.
[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46-54, ACM, Marina Del Rey, CA, USA, 2018.
[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537-2551, 2017.
[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631-1640, ACM, Melbourne, Australia, October 2015.
[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.
[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835-1844, Lyon, France, April 2018.
[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 2782-2788, NY, USA, July 2016.
[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309-2318, London, UK, August 2018.
[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639-648, Lyon, France, April 2018.
[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1340-1346, Argentina, July 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.
[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449-1458, ACM, Singapore, Singapore, November 2017.
[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, 2014.
[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 3111-3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.
[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.
[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, UAI 2015, pp. 326-335, Netherlands, July 2015.
[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273-1284, 2014.
Figure 1: TCENR framework. [Diagram. Contextual branch: user $u$ and POI $p$ feed the user embedding ($E_u$) and POI embedding ($E_p$); softmax layers ($\psi_c(E_u)$, $\psi_c(E_p)$) are aligned with the user graph output $g_u$ and POI graph output $g_p$; the embeddings are concatenated ($\oplus$) in a merge layer followed by hidden layers. Textual branch: user reviews ($d_u$) and POI reviews ($d_p$) pass through word embedding ($V_u$, $V_p$), convolutional ($Z_u$, $Z_p$), pooling ($O_u$, $O_p$), and fully connected ($h_u$, $h_p$) layers, then a merge layer and a hidden layer. A final merge layer and hidden layer produce the prediction $\hat{y}$.]
vectors. For $u$ and $p$, the respective embedding matrices are $E_u \in \mathbb{R}^{k_u \times N}$ and $E_p \in \mathbb{R}^{k_p \times M}$, where $k_u$ and $k_p$ are the corresponding dimensions.
We exploit the users' social networks along with the locations' geographical graphs to constrain the learned embeddings. Two softmax layers take the representations $E_u$ and $E_p$ as input and transform them back to $N$- and $M$-sized vectors, respectively. The user output layer $\psi_c(E_u) \in \mathbb{R}^N$ describes the similarity of a user's embedding to all $N$ users and can be formally denoted as
$$\psi_c(E_u) = a\left(W_c^u \times E_u + b_c^u\right) \tag{1}$$
where $W_c^u$ and $b_c^u$ are the layer's weight matrix and bias vector, and $a$ is a nonlinear activation function. Due to the similarity between the user- and POI-specific layers, the location output layer $\psi_c(E_p)$ is not developed in this section. Enforcing $\psi_c(E_u)$ and $\psi_c(E_p)$ to resemble user $u$'s social graph $g_u$ and location $p$'s spatial graph $g_p$, respectively, results in a smoothing factor that limits the amount by which connected entities' embeddings differ.
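To make (1) concrete, the following is a minimal plain-Python sketch, assuming that the activation $a$ is a softmax (the text calls these "softmax layers") and that a single user's embedding is one column of $E_u$. The sizes, random weights, and the helper names `softmax` and `context_layer` are illustrative assumptions, not taken from the authors' code.

```python
import math
import random

def softmax(z):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def context_layer(e_u, W, b):
    """Affine map from R^{k_u} to R^N followed by softmax, in the spirit of Eq. (1).

    e_u: one user's embedding (length k_u)
    W:   N x k_u weight matrix, given as a list of N rows
    b:   bias vector of length N
    """
    logits = [sum(w * x for w, x in zip(row, e_u)) + b_i for row, b_i in zip(W, b)]
    return softmax(logits)

rng = random.Random(0)
N, k_u = 5, 10  # 5 hypothetical users; embedding size 10 as in Section 4.1
e_u = [rng.gauss(0, 1) for _ in range(k_u)]
W = [[rng.gauss(0, 0.1) for _ in range(k_u)] for _ in range(N)]
b = [0.0] * N
psi = context_layer(e_u, W, b)
print([round(p, 3) for p in psi])  # one score per user; the scores sum to 1
```

During training, this output is compared with the user's neighbours in the social graph $g_u$, which pulls the embeddings of connected users together.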
The user and location embeddings are projected to a merge layer and combined by a concatenation operator. Using concatenation instead of a dot product allows varied embedding structures, which in turn improves the generated representations [3]. As the input layer for the following neural network, the merge layer can be represented as $h^{(0)}(x)$, where
$$x = [E_u, E_p] \tag{2}$$
Since simple concatenation of the user and location embedding vectors does not allow interactions to be modeled, hidden layers are added to learn these connections. The popular Rectified Linear Unit (ReLU) is employed as the activation function for these layers. More formally, the $q$-th hidden layer can be defined as
$$h^{(q)}(x) = \mathrm{ReLU}\left(W_q \times h^{(q-1)}(x) + b_q\right) \tag{3}$$
where $W_q$ and $b_q$ are the $q$-th layer parameters.
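The contextual branch in (2) and (3) amounts to concatenating the two embeddings and passing the result through a ReLU tower. A minimal plain-Python sketch follows, assuming the 32- and 16-unit tower described in Section 4.1 and randomly initialized weights in place of learned ones; `dense` and `contextual_tower` are hypothetical helper names.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, W, b):
    # y = W x + b, with W stored as a list of rows (out_dim x in_dim).
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def contextual_tower(e_u, e_p, sizes=(32, 16), seed=0):
    """Concatenate the embeddings (Eq. 2) and apply a ReLU tower (Eq. 3).

    Weights are drawn at random on every call; a trained model would learn
    and reuse them.
    """
    rng = random.Random(seed)
    h = e_u + e_p  # h^(0)(x) with x = [E_u, E_p]
    for size in sizes:
        W = [[rng.gauss(0, 0.1) for _ in range(len(h))] for _ in range(size)]
        h = relu(dense(h, W, [0.0] * size))  # h^(q) = ReLU(W_q h^(q-1) + b_q)
    return h  # plays the role of h_context

e_u = [0.1] * 10   # user embedding of size 10 (Section 4.1)
e_p = [-0.2] * 10  # location embedding of size 10
h_context = contextual_tower(e_u, e_p)
print(len(h_context))  # 16
```

Halving the width at each layer follows the tower pattern the paper adopts from [3]; the sketch only fixes the shapes, not trained values.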
3.1.2. Textual Modeling Network Layers. To improve the model's coverage, a textual-based network is introduced. It simultaneously learns the same interaction as the contextual-based network, but with natural language input. Two additional vectors, $d_u$ and $d_p$, representing user $u$'s and location $p$'s textual reviews, respectively, are applied as inputs for this network. Each vector comprises all $n$ words written by the user or about the location, merged together and kept in their original order. These words are then mapped to $k_w$-dimensional vectors defining their semantic meaning in the following embedding layers. The output of the user embedding layer is the representation of all words used by user $u$ in the form of a matrix and can be denoted as
$$V_u = \left[\phi(d_u^1), \phi(d_u^2), \ldots, \phi(d_u^n)\right] \tag{4}$$
where $[\phi_1, \phi_2]$ denotes the concatenation of two vectors, $d_u^l$ is user $u$'s $l$-th word, and $\phi: D \rightarrow \mathbb{R}^{k_w}$ is a lookup function to a pre-trained textual embedding layer [36, 37] that represents each word in vocabulary $D$ as a vector of size $k_w$. Similarly, $V_p$ denotes the word embedding matrix for location $p$.
Figure 2: Proposed alternatives to learn user and location representations from textual reviews. (a) Textual modeling component using CNN (word embeddings → convolutional layers → max pooling → fully connected), employed in TCENR. (b) Textual modeling component using RNN (word embeddings → bidirectional GRU → max pooling → fully connected), the suggested extension. Both panels illustrate the example input "delicious healthy food steak is amazing".
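The lookup in (4) can be sketched as follows. This is a minimal illustration, assuming a tiny hand-written embedding table in place of the pretrained GloVe or word2vec vectors [36, 37], zero vectors for unknown words and padding, and truncation to a fixed number of words (Section 4.3 mentions using the first 500 words). The function name `build_word_matrix` is hypothetical.

```python
def build_word_matrix(words, table, max_words, dim):
    """Map a word sequence d_u to the matrix V_u, one row per word position.

    words:     the user's (or location's) reviews concatenated in original order
    table:     dict from word to a vector of length dim (stand-in for phi)
    max_words: keep only the first max_words words; pad shorter inputs
    """
    rows = [list(table.get(w, [0.0] * dim)) for w in words[:max_words]]
    # Right-pad with zero vectors so every user/location yields the same shape.
    rows += [[0.0] * dim for _ in range(max_words - len(rows))]
    return rows

table = {"delicious": [0.9, 0.1], "steak": [0.2, 0.8], "amazing": [0.7, 0.6]}
d_u = "delicious healthy food steak is amazing".split()
V_u = build_word_matrix(d_u, table, max_words=8, dim=2)
for row in V_u:
    print(row)
```

A fixed output shape is what lets the same convolutional or recurrent layers process every user and location.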
Due to the large number of parameters required to train the aforementioned contextual model, the textual network is implemented using a CNN-based architecture, which is usually more computationally efficient than an RNN. The semantic representations of users' and locations' reviews are fed to convolution layers to detect the parts of the text that best capture the review's meaning. These layers produce $t$ feature maps over the embedded word vectors using a window size of $ws$ and filter $K \in \mathbb{R}^{k_w \times t}$. As suggested by [14], ReLU is used as the activation function for this layer.
where $V_u^l$ is user $u$'s $l$-th input word embedding and $z_j^u$ is the $j$-th feature extracted from the complete text.
Based on the standard CNN structure, feature maps produced by the convolution layers are reduced by a pooling layer
where max-pooling is selected to identify the most important words and their latent values, and $O_u$ is the collection of all concise features extracted from user $u$'s textual input. These are followed by fully connected layers that jointly model the different features and result in the latent textual representations for user $u$ and location $p$, denoted as $h_u$ and $h_p$, respectively:
$$h_u = \mathrm{ReLU}\left(W_1^u \times O_u + b_1^u\right) \tag{7}$$
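The exact forms of the convolution and pooling steps are not spelled out in this section, so the following sketch assumes the common sentence-CNN layout of Kim [34]: each of the $t$ filters spans $ws$ consecutive word vectors, its output passes through ReLU, non-overlapping max pooling is applied, and the flattened result feeds the fully connected layer of (7). It uses the window of 10, stride of 3, 3 feature maps, pool size of 2, and 32 hidden units from Section 4.1, but only 20 toy words of dimension 4; all helper names are hypothetical.

```python
import random

def conv1d_relu(V, filters, biases, window, stride):
    """'Valid' 1-D convolution over the word axis of V (n x k_w), then ReLU.

    Each filter is a window x k_w block of weights; one feature map per filter.
    """
    maps = []
    for K, b in zip(filters, biases):
        fmap = []
        for start in range(0, len(V) - window + 1, stride):
            s = b + sum(k * v
                        for i in range(window)
                        for k, v in zip(K[i], V[start + i]))
            fmap.append(max(0.0, s))
        maps.append(fmap)
    return maps

def max_pool(fmap, pool):
    # Non-overlapping max pooling (stride = pool size); a trailing remainder is dropped.
    return [max(fmap[i:i + pool]) for i in range(0, len(fmap) - pool + 1, pool)]

def textual_branch(V, filters, biases, window, stride, pool, W1, b1):
    maps = conv1d_relu(V, filters, biases, window, stride)
    O = [x for fmap in maps for x in max_pool(fmap, pool)]  # flatten into O_u
    # Eq. (7): h_u = ReLU(W_1 O_u + b_1)
    return [max(0.0, sum(w * o for w, o in zip(row, O)) + bi) for row, bi in zip(W1, b1)]

rng = random.Random(0)
n, k_w, window, stride, t, pool, hidden = 20, 4, 10, 3, 3, 2, 32
V_u = [[rng.gauss(0, 1) for _ in range(k_w)] for _ in range(n)]
filters = [[[rng.gauss(0, 0.1) for _ in range(k_w)] for _ in range(window)] for _ in range(t)]
biases = [0.0] * t
maps = conv1d_relu(V_u, filters, biases, window, stride)
flat_len = t * len(max_pool(maps[0], pool))
W1 = [[rng.gauss(0, 0.1) for _ in range(flat_len)] for _ in range(hidden)]
h_u = textual_branch(V_u, filters, biases, window, stride, pool, W1, [0.0] * hidden)
print(len(maps[0]), flat_len, len(h_u))  # 4 6 32
```

The same function applied to $V_p$ with its own weights would give $h_p$.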
To bring the outputs of the users' and locations' fully connected layers into the same feature space, a shared layer is utilized. It concatenates its two inputs and learns their interaction by employing an additional hidden layer:
$$h_{reviews} = \mathrm{ReLU}\left(W_2 \times [h_u, h_p] + b_2\right) \tag{8}$$
The two neural networks are then finally merged to produce a prediction $\hat{y}_{up} \in [0, 1]$. The last layers of the two networks, each representing a different view of the user-location interaction, are concatenated and fed to yet another hidden layer responsible for blending the learning:
$$\hat{y}_{up} = \sigma\left(W_3 \times [h_{context}, h_{reviews}] + b_3\right) \tag{9}$$
where the sigmoid function $\sigma$ was selected to transform the hidden layer output to the desired range of [0, 1].
3.2. Sequential Textual Modeling. To further investigate the gain achieved by integrating a textual modeling component over reviews in TCENR, we suggest an extension denoted as TCENRseq. Following its success in previous language modeling tasks [17, 18] and its ability to capture the sequential nature of sentences, we employ an RNN component to learn latent features from reviews. An illustration of the proposed extension is presented in Figure 2(b), while the CNN method used in the vanilla TCENR is shown in Figure 2(a) to provide a convenient base for comparison. More specifically, we follow the findings of previous works [18, 23] and implement our recurrent network using GRU, an architecture that achieves competitive performance compared to LSTM but with fewer parameters, making it more efficient.
where $f_l$ is the forget gate for input word $l$, $s_l$ is the output gate, $c_l$ is the new candidate state combining the current word embedding $V_u^l$ with the previous hidden state, and $h_l$ is the current state for word $l$ modeled by the output gate. $\odot$ denotes the element-wise product, $W_f$, $R_f$, $W_s$, $R_s$, $W_c$, and $R_c$ are the GRU weight matrices, and $b_f$, $b_s$, and $b_c$ are the bias vectors.
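The GRU equations themselves are not reproduced in this section, so the sketch below follows the standard GRU of Cho et al. as an assumption, mapping the text's forget gate $f_l$, output gate $s_l$, and candidate $c_l$ onto the usual reset gate, update gate, and candidate state. Gate conventions differ between references and libraries (for example, whether $s_l$ or $1 - s_l$ multiplies the previous state), so this is one consistent choice rather than the authors' exact formulation.

```python
import math
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate(W, R, b, x, h, act):
    # act(W x + R h + b), applied element-wise.
    return [act(a + r + c) for a, r, c in zip(matvec(W, x), matvec(R, h), b)]

def gru_step(x, h_prev, p):
    """One GRU step in the standard formulation of Cho et al. (assumed here).

    f plays the role the text calls the forget gate (usually named the reset gate);
    s plays the role the text calls the output gate (usually named the update gate).
    """
    f = gate(p["W_f"], p["R_f"], p["b_f"], x, h_prev, sigmoid)
    s = gate(p["W_s"], p["R_s"], p["b_s"], x, h_prev, sigmoid)
    reset_h = [fi * hi for fi, hi in zip(f, h_prev)]  # f ⊙ h_{l-1}
    c = gate(p["W_c"], p["R_c"], p["b_c"], x, reset_h, math.tanh)
    # New state: s decides how much of the candidate c replaces the old state.
    return [(1 - si) * hi + si * ci for si, hi, ci in zip(s, h_prev, c)]

def init_params(in_dim, hid, rng, scale=0.3):
    p = {}
    for g in "fsc":
        p["W_" + g] = [[rng.gauss(0, scale) for _ in range(in_dim)] for _ in range(hid)]
        p["R_" + g] = [[rng.gauss(0, scale) for _ in range(hid)] for _ in range(hid)]
        p["b_" + g] = [0.0] * hid
    return p

rng = random.Random(0)
params = init_params(in_dim=2, hid=3, rng=rng)
h = [0.0, 0.0, 0.0]
for word_vec in [[0.9, 0.1], [0.0, 0.0], [0.2, 0.8]]:  # toy word embeddings
    h = gru_step(word_vec, h, params)
print(h)
```

Because each new state is a convex combination of the previous state and a tanh candidate, the hidden values stay inside (-1, 1) when the state starts at zero.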
Since the context of a word can be determined by other preceding and succeeding words or sentences, our proposed method employs a bidirectional GRU over the user embedding $V_u$ and the location embedding $V_p$. Each word $l$'s hidden state is learned by forward and backward GRU layers, denoted as $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$, respectively. To learn a more concise and combined representation of a word while taking into account the context of all surrounding words, we feed the concatenation of $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$ to an additional bidirectional GRU layer, such that its input for every word $l$ is $e^2_l = [\overrightarrow{h}^1_l, \overleftarrow{h}^1_l]$. The second recurrent unit outputs $n$ latent vectors, each a sequentially infused representation of a word written by the target user or about the candidate item. To make the textual modeling method the only difference between TCENR and TCENRseq, and to further reduce the number of learned parameters, all modified word vectors are fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This allows us to directly determine the effect an RNN has on textual modeling for POI recommender systems compared to a CNN, and enables the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged in order to learn the user-location interaction.
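The two stacked bidirectional layers can be illustrated with a short sketch. To keep the example checkable by hand, it uses a trivial moving-average cell as a stand-in for the GRU (a deliberate simplification; any step function, such as a GRU step, can be passed in), and it shows how the backward states are re-aligned with word order and concatenated into $e^2_l$ before the second layer. In Keras, the same structure would typically be built with `Bidirectional(GRU(units, return_sequences=True))` layers, whose default merge mode is concatenation.

```python
def run_direction(xs, step, h0):
    # Apply a recurrent step over a sequence, keeping every hidden state.
    h, states = h0, []
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states

def bidirectional(xs, step_fwd, step_bwd, h0):
    """Return e_l = [forward h_l, backward h_l] for every word position l."""
    fwd = run_direction(xs, step_fwd, h0)
    bwd = run_direction(xs[::-1], step_bwd, h0)[::-1]  # re-align with word order
    return [f + b for f, b in zip(fwd, bwd)]

def ema_step(x, h):
    # Stand-in cell (NOT a GRU): exponential moving average, easy to check by hand.
    # Its state has the same size as its input.
    return [0.5 * hi + 0.5 * xi for hi, xi in zip(h, x)]

words = [[1.0], [0.0], [0.0]]  # toy 1-dimensional word vectors
e2 = bidirectional(words, ema_step, ema_step, [0.0])
e3 = bidirectional(e2, ema_step, ema_step, [0.0, 0.0])  # second bidirectional layer
print(e2)  # [[0.5, 0.5], [0.25, 0.0], [0.125, 0.0]]
print(len(e3), len(e3[0]))  # 3 4
```

Note how the first word's backward state (0.5) already reflects the whole sequence read from the end, which is the property the bidirectional design relies on.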
3.3. Training the Network. To train the recommendation models, we adopt a pointwise loss objective function, as done in [2, 3, 14], where the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$ is minimized. To address the implicit feedback nature of LBSNs, we sample a set of negative samples from the dataset, denoted as $Y^-$.
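A minimal sketch of building $Y \cup Y^-$ is given below, using the ratio of 4 negative locations per positive one reported in Section 4.1. The function name and the uniform sampling from each user's unvisited locations are assumptions for illustration; the text does not specify the sampling scheme.

```python
import random

def sample_negatives(positives, all_locations, ratio=4, seed=0):
    """Pair every observed (user, location) check-in with `ratio` unvisited locations.

    Positives get label 1 and sampled negatives get label 0. Uniform sampling from
    each user's unvisited locations is an assumption.
    """
    rng = random.Random(seed)
    visited = {}
    for u, p in positives:
        visited.setdefault(u, set()).add(p)
    unvisited = {u: [p for p in all_locations if p not in seen] for u, seen in visited.items()}
    samples = [(u, p, 1) for u, p in positives]
    for u, _ in positives:
        k = min(ratio, len(unvisited[u]))
        samples += [(u, p, 0) for p in rng.sample(unvisited[u], k)]
    return samples

positives = [("u1", "p1"), ("u1", "p2"), ("u2", "p3")]
locations = ["p%d" % i for i in range(1, 9)]
data = sample_negatives(positives, locations)
print(len(data))  # 15 = 3 positives + 3 * 4 negatives
```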
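Due to the implicit feedback nature of the recommendation task, the algorithm's output can be treated as a binary classification problem. As the sigmoid activation function is used over the last hidden layer, the output probability can be defined as
$$p(Y, Y^- \mid E_u, E_p, V_u, V_p, \Theta_f) = \prod_{(u,p) \in Y} \hat{y}_{up} \prod_{(u,p) \in Y^-} \left(1 - \hat{y}_{up}\right) \tag{11}$$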
where $E_u$ and $E_p$ are the embedding layers for users and locations, respectively. Similarly, $V_u$ and $V_p$ are the textual review embedding layers, and $\Theta_f$ represents the model parameters. Taking the negative log-likelihood of $p$ results in the binary cross-entropy loss function for the prediction portion of the model:
$$L_{pred} = -\sum_{(u,p) \in Y \cup Y^-} \left( y_{up} \log \hat{y}_{up} + \left(1 - y_{up}\right) \log\left(1 - \hat{y}_{up}\right) \right) \tag{12}$$
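As a quick numerical check of (12), the sketch below sums the binary cross-entropy over one positive pair and four sampled negatives. The predictions are toy values, and clipping the probabilities away from 0 and 1 is a common numerical safeguard rather than something stated in the text. Keras' built-in binary cross-entropy averages over samples by default instead of summing, which changes only the scale of the loss.

```python
import math

def bce_loss(y_true, y_pred, eps=1e-7):
    """Eq. (12): -sum of y*log(y_hat) + (1 - y)*log(1 - y_hat) over Y ∪ Y^-."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total

# One observed check-in followed by four sampled negatives (toy predictions).
loss = bce_loss([1, 0, 0, 0, 0], [0.9, 0.2, 0.1, 0.3, 0.05])
print(round(loss, 4))
```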
As there are two more outputs in the model, the users' social network $\psi_c(E_u)$ and the locations' distance graph $\psi_c(E_p)$, two additional loss functions are required to train the network. We follow the process of [2], assuming that two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:
$$L_{u\,context} = -\sum_{(u,u_c)} \left( \psi_c(E_u) - \log \sum_{u'_c \in C_u} \exp\left(\psi_{c'}(E_u)\right) \right) \tag{13}$$
where $\psi_c(E_u)$ is as defined in (1). Taking the binary class label into account prompts the following loss function, corresponding to minimizing the cross-entropy loss of user $i$ and context $c$ with respect to the class label $y$:
where $I$ is a function that returns 1 if $y$ is in the given set and 0 otherwise. The same logic is used to formulate the loss function for the POI context, which is omitted due to space limitations.
We simultaneously minimize the three loss functions $L_{pred}$, $L_{u\,context}$, and $L_{p\,context}$. The joint optimization improves recommendation accuracy while enforcing similar representations for locations in close proximity and for users connected in the social network. The loss functions are combined using two hyperparameters that weight the contextual contribution:
$$L = L_{pred} + \lambda_1 L_{u\,context} + \lambda_2 L_{p\,context} \tag{15}$$
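The context objective in (13) is a log-softmax over candidate context users, and the combined objective adds the weighted context losses to the prediction loss. The sketch below assumes access to the context layer's pre-softmax scores and uses $\lambda_1 = \lambda_2 = 0.1$ as in Section 4.1; the binary-labelled variant is omitted, and all numbers and helper names are illustrative.

```python
import math

def log_softmax_at(scores, c):
    # log p(c | u) = score_c - log(sum over c' of exp(score_c')), computed stably.
    m = max(scores)
    return scores[c] - (m + math.log(sum(math.exp(s - m) for s in scores)))

def context_loss(pairs, scores):
    """Loss in the style of (13): -sum over observed (u, u_c) pairs of log p(u_c | u).

    scores[u] holds the pre-softmax outputs of the context layer for user u.
    """
    return -sum(log_softmax_at(scores[u], c) for u, c in pairs)

def total_loss(l_pred, l_u_context, l_p_context, lam1=0.1, lam2=0.1):
    # Prediction loss plus the two weighted context losses.
    return l_pred + lam1 * l_u_context + lam2 * l_p_context

scores = {"u1": [0.0, 2.0, 0.0]}        # u1's scores over 3 candidate context users
l_u = context_loss([("u1", 1)], scores)  # u1 observed with context user index 1
print(round(l_u, 4), round(total_loss(0.8418, l_u, 0.5), 4))
```

With small $\lambda$ values, the context terms act as a regularizer on the embeddings rather than dominating the prediction objective.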
To optimize the combined loss function, a gradient descent method can be adopted; more specifically, we utilize Adaptive Moment Estimation (Adam) [38]. This optimizer automatically adjusts the learning rate and yields faster convergence than standard gradient descent, in addition to making the learning rate optimization process more efficient. To avoid overfitting when training the model, an early stopping criterion is integrated. The model parameters are initialized with a Gaussian distribution, while the output layer's parameters are set to follow a uniform distribution.
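The optimizer and stopping rule can be sketched in plain Python. The Adam update below follows the published rule with its usual default moment coefficients ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$); the patience value of 3, the toy quadratic objective, and its learning rate of 0.5 (chosen so the toy converges quickly, unlike the 0.005 used for TCENR in Section 4.1) are assumptions for illustration. In Keras, the same setup would normally use the built-in `Adam` optimizer and an `EarlyStopping` callback.

```python
import math

class Adam:
    """Adam update rule (Kingma and Ba [38]) for a flat list of parameters."""

    def __init__(self, lr=0.005, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = None, None, 0

    def step(self, params, grads):
        if self.m is None:
            self.m, self.v = [0.0] * len(params), [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g      # first moment
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g  # second moment
            m_hat = self.m[i] / (1 - self.b1 ** self.t)              # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated

def train_with_early_stopping(grad_fn, val_loss, params, lr, max_epochs=50, patience=3):
    """Stop once the validation loss has not improved for `patience` epochs.

    For brevity, one 'epoch' here is a single full-batch Adam step.
    """
    opt = Adam(lr=lr)
    best, best_params, stale = float("inf"), params, 0
    for _ in range(max_epochs):
        params = opt.step(params, grad_fn(params))
        loss = val_loss(params)
        if loss < best:
            best, best_params, stale = loss, params, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_params, best

# Toy problem: minimise (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, best = train_with_early_stopping(
    grad_fn=lambda p: [2 * (p[0] - 3.0)],
    val_loss=lambda p: (p[0] - 3.0) ** 2,
    params=[0.0], lr=0.5)
print(theta, best)
```

Returning the best parameters seen, rather than the last ones, is what keeps an overshooting run from ending on a worse model.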
4 Experiments and Evaluation
4.1. Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews along with the users' friends and the businesses' locations. Due to the limited resources used in the model evaluation, we chose to filter the dataset and keep only a concise subset, where all users and locations with fewer than 100 written reviews or fewer than 10 friends are removed. The filtered dataset includes 141,028 reviews, with a sparsity of 98.08% for the rating matrix. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, while 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are up to 1 km apart.
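The POI graph rule (an edge between locations at most 1 km apart) and window-based context extraction from a random walk can be sketched as follows. The haversine distance, the walk length, and the way a walk is turned into (node, context) pairs are assumptions for illustration; the text specifies only the 1 km threshold, the 10% base-node sampling, the 20 and 30 connected vertices, and the window size of 3. The coordinates are made up.

```python
import math
import random

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance, assuming a spherical Earth of radius 6371 km.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def build_poi_graph(coords, max_km=1.0):
    """Connect two locations when they are at most max_km apart (O(M^2) pairwise scan)."""
    ids = list(coords)
    adj = {i: [] for i in ids}
    for a, i in enumerate(ids):
        for j in ids[a + 1:]:
            if haversine_km(*coords[i], *coords[j]) <= max_km:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def walk_context_pairs(adj, start, walk_len, window, rng):
    """Uniform random walk from `start`; nodes at most `window` steps apart on the
    walk become (node, context) pairs."""
    walk = [start]
    while len(walk) < walk_len and adj[walk[-1]]:
        walk.append(rng.choice(adj[walk[-1]]))
    pairs = [(walk[i], walk[j])
             for i in range(len(walk))
             for j in range(max(0, i - window), min(len(walk), i + window + 1))
             if j != i]
    return walk, pairs

coords = {"A": (43.4723, -80.5449), "B": (43.4750, -80.5400),
          "C": (43.6532, -79.3832), "D": (43.4760, -80.5380)}
adj = build_poi_graph(coords)
walk, pairs = walk_context_pairs(adj, "A", walk_len=5, window=3, rng=random.Random(0))
print(adj)
print(walk, len(pairs))
```

For a large number of locations, a spatial index would replace the quadratic pairwise scan; the sketch keeps the scan for clarity.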
To test our models' performance, the original data was split into training, validation, and test sets by random sampling with respective ratios of 56%, 24%, and 20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled with 4 negative locations for every positive one.
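A 56/24/20 random split can be sketched as below; the helper name, seed, and rounding rule are illustrative assumptions. One common way to arrive at these ratios is to hold out 20% for testing and then hold out 30% of the remaining 80% for validation (0.8 × 0.3 = 0.24), though the text does not say which procedure was used.

```python
import random

def split_indices(n, ratios=(0.56, 0.24, 0.20), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Sizes are rounded to the nearest integer and the test set takes the remainder;
    the text does not specify how fractional sizes are handled.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 560 240 200
```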
To effectively compare our proposal with other alternatives, we adopt the same settings as applied in [2, 3]. The MLP input vectors are represented with an embedding size of 10, while two hidden layers are added on top of the merged result. Following the tower architecture, where the size of each layer is half the size of its predecessor, the numbers of hidden units are 32 and 16 for the first and second layers, respectively.
In the CNN component, each word is represented by a pretrained embedding layer with 50 units, while the convolutional layer is constructed with a window size of 10 and a stride of 3. It results in 3 feature maps that are flattened after performing the max-pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. Following the merge of the two hidden layers, their interaction is learned using another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For the training phase of the model, a learning rate of 0.005 was used over a maximum of 50 epochs with a batch size of 512 samples.
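The CNN settings above determine how many features reach the 32-unit hidden layer. Assuming no zero-padding ('valid' padding, which is the Keras default for `Conv1D` and `MaxPooling1D`) and a pooling stride equal to the pool size, the sizes work out as follows for the 500-6000 word inputs studied in Section 4.4.3.

```python
def conv_out_len(n, window, stride):
    # 'valid' convolution: every window must fit entirely inside the sequence.
    return (n - window) // stride + 1

def pool_out_len(n, pool):
    # Non-overlapping 'valid' max pooling (stride equal to the pool size).
    return (n - pool) // pool + 1

def cnn_flat_size(n_words, window=10, stride=3, pool=2, feature_maps=3):
    """Length of the flattened vector that feeds the 32-unit hidden layer."""
    return feature_maps * pool_out_len(conv_out_len(n_words, window, stride), pool)

for n_words in (500, 3000, 6000):
    print(n_words, cnn_flat_size(n_words))
# 500 246
# 3000 1494
# 6000 2994
```

With 500 words, the convolution yields 164 positions per map, pooling reduces them to 82, and the 3 maps flatten to 246 values; the 32-unit layer then has 246 × 32 = 7,872 weights plus 32 biases. The input to this layer grows roughly linearly with the number of words.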
4.2. Baselines. To evaluate our algorithm, we chose to compare it to the following seven empirically proven frameworks:
(vi) PACE [2]: Preference and Context Embedding. An MLP-based framework with the addition of contextual graphs' smoothing for POI recommendation.
(vii) DeepCoNN [14]: Deep Cooperative Neural Networks. A CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.
To evaluate our model and the baselines, we apply Accuracy and Mean Squared Error (MSE) over all $n$ test samples, as well as Precision (Pre@10) and Recall (Rec@10) for the top 10 predictions per user, averaged over users.
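The four measures can be sketched as follows, assuming a 0.5 threshold for accuracy, that Pre@10 divides the hits by 10 even when fewer than 10 candidates exist, and that Rec@10 divides by the number of held-out positives per user. These conventions are common but are not stated in the text, and the function names are hypothetical.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the binary label.
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob)) / len(y_true)

def mse(y_true, y_prob):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def precision_recall_at_k(ranked, relevant, k=10):
    """ranked: one user's candidate locations sorted by predicted score (best first);
    relevant: that user's held-out positive locations."""
    hits = len(set(ranked[:k]) & relevant)
    return hits / k, (hits / len(relevant) if relevant else 0.0)

def mean_precision_recall_at_k(per_user, k=10):
    # Average Pre@k and Rec@k over users.
    scores = [precision_recall_at_k(ranked, rel, k) for ranked, rel in per_user]
    return (sum(p for p, _ in scores) / len(scores),
            sum(r for _, r in scores) / len(scores))

print(accuracy([1, 0, 1, 0], [0.8, 0.4, 0.3, 0.1]))  # 0.75
print(mse([1, 0, 1, 0], [0.8, 0.4, 0.3, 0.1]))
print(precision_recall_at_k(["p3", "p1", "p7", "p2"], {"p1", "p2", "p9"}, k=2))
```

Accuracy and MSE score every test pair independently, whereas Pre@10 and Rec@10 only look at each user's top-ranked list, which is why a model can lead on one pair of measures and trail on the other.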
The proposed models were implemented using Keras (https://keras.io) on top of the TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted using an NVIDIA GTX 1070 GPU.
4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are based on the average of three individual executions.
As the results show, the proposed model TCENR achieves the best results overall compared to all baselines. Furthermore, it was found to significantly outperform HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN in terms of accuracy and MSE ($p < 0.05$, one-sided unpaired t-test). The contrasting results in terms of precision and recall compared to NeuMF suggest that TCENR offers fewer but more relevant recommendations to the user. While NMF provides the best precision score of all methods, it underperforms in all other measures, making it a less desirable model. Taking a closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully harvested. In addition, the use of only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on the dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.
Figure 3: Runtime (seconds) of all models on the Yelp dataset.
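With three runs per model, a one-sided unpaired t-test compares two samples of size 3. The sketch below computes Student's pooled-variance t statistic and compares it with the tabulated one-sided 5% critical value for 4 degrees of freedom. Assuming equal variances (rather than Welch's test) is our assumption, and the input numbers are placeholders, not results from Table 1.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's unpaired two-sample t statistic (equal variances) and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# One-sided critical value for alpha = 0.05 with 4 degrees of freedom (standard t table).
T_CRIT_ONE_SIDED_DF4 = 2.132

a = [4.0, 5.0, 6.0]  # hypothetical scores from three runs of model A
b = [1.0, 2.0, 3.0]  # hypothetical scores from three runs of model B
t, df = pooled_t(a, b)
print(round(t, 3), df, t > T_CRIT_ONE_SIDED_DF4)  # 3.674 4 True
```

For MSE, where lower is better, the same test applies with the samples swapped.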
Comparing TCENR and its proposed extension TCENRseq yields contrasting results. By employing an RNN instead of a CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and an improved precision score, while accuracy and recall are worsened. One possible explanation is that, by accurately capturing different aspects of user reviews, the model is able to reinforce its hypotheses and therefore reduce the uncertainty in some cases; however, when faced with a contrast between textual aspects and the ground truth, it might choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable techniques and measures to learn different data types, rather than employing a single method over all inputs. Moreover, they show the positive impact of using textual data in conjunction with historical activities. The reported performance also offers insight into the choice between CNN and RNN for language modeling in future recommendation tasks.
To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As the results show, TCENR is competitive with most baselines and is more efficient than DeepCoNN and LCARS. The reported runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks: as the number of trainable parameters increases due to the recurrent layers, our RNN-based extension takes 32.9% longer to train than TCENR while achieving comparable results.
4.4. Model Design Analysis. In this section, we discuss the effect of several design selections on the suggested model's performance.
4.4.1. Merge Layer. The model's final layers, responsible for combining the dense outputs of both the MLP and convolutional networks, require close attention, as they affect the networks' ability to learn jointly and the prediction itself. To properly select the fusion operator, the following methods were considered:
(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted as $TCENR_{con}$ and is described in (9).
(ii) Merging the last hidden layers using a dot product, resulting in a model named $TCENR_{dot}$ that can be defined as
(iii) Combining the two previously described methods, where the two representations are jointly learned by concatenation and dot product. The resulting model is denoted as $TCENR_{dot\_con}$ and can be developed by combining (9) and (16) using addition and mapping the result to the range [0, 1] with the sigmoid function.
(iv) Adopting a weighted average of the prediction results of the two networks. Denoted as $TCENR_{weight}$, this model can be defined as
$$\hat{y}_{up} = \lambda_1 \sigma\left(W_5 \times h_{context} + b_5\right) + \lambda_2 \sigma\left(W_6 \times h_{reviews} + b_6\right) \tag{18}$$
As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of utilizing the latent features learned by each subnetwork jointly. When combined with the underperforming dot-product method in $TCENR_{dot\_con}$, the use of concatenation improves over the dot product alone. However, since the two methods are integrated using a simple average, employing only concatenation, as done in $TCENR_{con}$, produces the best results and is therefore integrated into the final model.
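The difference between the concatenation head of (9) and the weighted average of (18) can be seen with a small example. In the sketch below, each branch alone supports a positive label with a logit of 2.0: concatenation adds the evidence in logit space before the sigmoid, while the weighted average (with $\lambda_1 = \lambda_2 = 0.5$, an assumed setting) averages two separate probabilities. The hand-picked weights are illustrative only; the layer sizes of 16 and 8 units follow Section 4.1.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def head_concat(h_context, h_reviews, w3, b3):
    # Eq. (9): one sigmoid unit over the concatenation [h_context, h_reviews].
    return sigmoid(dot(w3, h_context + h_reviews) + b3)

def head_weighted(h_context, h_reviews, w5, b5, w6, b6, lam1=0.5, lam2=0.5):
    # Eq. (18): each branch predicts separately; the predictions are averaged.
    return lam1 * sigmoid(dot(w5, h_context) + b5) + lam2 * sigmoid(dot(w6, h_reviews) + b6)

h_context = [1.0] * 16  # last contextual hidden layer (16 units)
h_reviews = [1.0] * 8   # last textual hidden layer (8 units)
w_ctx, w_rev = [0.125] * 16, [0.25] * 8  # each branch contributes a logit of 2.0

p_concat = head_concat(h_context, h_reviews, w_ctx + w_rev, 0.0)
p_weight = head_weighted(h_context, h_reviews, w_ctx, 0.0, w_rev, 0.0)
print(round(p_concat, 3), round(p_weight, 3))  # 0.982 0.881
```

When both views agree, the concatenation head becomes more confident than either view alone, whereas the weighted average can never exceed its most confident branch.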
4.4.2. MLP Layer Design. Although [3] found that adding more layers and units to the MLP-based recommender has a positive effect, the use of a CNN and the additional hidden layer suggest that this subject is worth investigating.
Complexity 9
Figure 4: Comparison of merging methods ($TCENR_{dot\_con}$, $TCENR_{weight}$, $TCENR_{dot}$, and $TCENR_{con}$) in terms of accuracy.
Table 2: Model's accuracy with different layers.
To this end, we test the proposed algorithm with 1 to 4 hidden layers, used to learn the user-item interaction with contextual regularization, in sizes varying from 8 to 128 hidden units. The test-set accuracy results are presented in Table 2, where the number of hidden layers is given as columns and the size of the first layer as rows. Unlike previous results, we find that two hidden layers with 32 and 16 hidden units yield the best performance on our dataset.
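The layer sweep described above can be enumerated with a small helper. This sketch assumes the tested first-layer sizes are the powers of two between 8 and 128 and that deeper configurations follow the same halving tower rule used in the experimental setup; both are our assumptions about how the grid behind Table 2 was built.

```python
from typing import Dict, List, Tuple


def tower_sizes(first_layer: int, num_layers: int) -> List[int]:
    """Hidden-unit sizes for a tower MLP in which every layer halves its predecessor."""
    sizes = []
    size = first_layer
    for _ in range(num_layers):
        sizes.append(size)
        size = max(1, size // 2)  # never drop below one unit
    return sizes


def layer_grid(first_sizes=(8, 16, 32, 64, 128), depths=(1, 2, 3, 4)) -> Dict[Tuple[int, int], List[int]]:
    """All (first-layer size, depth) configurations of one plausible sweep grid."""
    return {(f, d): tower_sizes(f, d) for f in first_sizes for d in depths}


grid = layer_grid()
print(grid[(32, 2)])  # [32, 16], the configuration reported as best
```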
4.4.3. Number of Words. Using written reviews in their original order allows the strengths of CNN and RNN to be exploited by finding the best representation for every few words and, eventually, for the whole text. Our final dataset, however, is composed of very long reviews: fully learning a single user or location requires more than 20,000 words, making it computationally expensive to extract relevant representations. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500 to 6,000. As can be seen in Figure 5, accuracy improves slightly as the number of words increases up to 3,000, while additional words result in an increased bias towards users and locations with longer reviews and, in turn, reduce the model's learning capabilities.
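Limiting the input to a fixed number of words can be sketched as follows. The sketch concatenates a user's (or location's) reviews in their original order, keeps only the first `max_words` tokens (consistent with the "first 500 words" setting mentioned in the performance discussion), and pads shorter inputs. Whitespace tokenization, lower-casing, and the hypothetical `PAD_ID`/`UNK_ID` values are assumptions, not details given in the paper.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical padding index
UNK_ID = 1  # hypothetical out-of-vocabulary index


def encode_reviews(reviews: List[str], vocab: Dict[str, int], max_words: int) -> List[int]:
    """Concatenate reviews in order, keep the first max_words tokens, and right-pad with PAD_ID."""
    tokens = [tok for review in reviews for tok in review.lower().split()]
    ids = [vocab.get(tok, UNK_ID) for tok in tokens[:max_words]]
    return ids + [PAD_ID] * (max_words - len(ids))


vocab = {"delicious": 2, "healthy": 3, "food": 4, "steak": 5, "is": 6, "amazing": 7}
print(encode_reviews(["Delicious healthy food", "steak is amazing"], vocab, 8))
# [2, 3, 4, 5, 6, 7, 0, 0]
```

Truncating from the front keeps the cost of the convolutional or recurrent layers fixed per user and location, at the price of discarding later reviews entirely.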
Figure 5: Number of words comparison in terms of accuracy (x-axis: number of words, 0 to 6,000; y-axis: accuracy, 0.822 to 0.834).
5. Conclusion and Future Work
In this paper, we developed a neural POI recommender system called TCENR. The model exploits data about users, locations, spatial data, social networks, and textual reviews to predict the implicit preferences of users regarding POIs. TCENR models two types of user-location interactions: native check-ins regularized by contextual information, and the words used to describe the users' experiences. We further extended our proposed method and presented TCENRseq, where textual data is modeled using an RNN instead of a CNN. Evaluated over the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.
For future work, we intend to extend our models' evaluation to additional LBSN datasets. In addition, we plan to investigate the proposed framework's contribution to the cold-start problem by analyzing its performance on additional data while taking new users and locations with few reviews into account.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.
References
[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975–984, ACM, San Francisco, California, USA, August 2016.
[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254, ACM, Halifax, NS, Canada, August 2017.
[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173–182, Perth, Australia, April 2017.
[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287–296, ACM, February 2011.
[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264, ACM, Sydney, NSW, Australia, August 2015.
[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221–229, ACM, Chicago, Illinois, USA, August 2013.
[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448–456, ACM, August 2011.
[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244, ACM, Sydney, NSW, Australia, August 2015.
[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429–1438, ACM, Singapore, Singapore, November 2017.
[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053–1058, IEEE, Barcelona, Spain, December 2016.
[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10, ACM, 2016.
[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.
[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 2643–2651, 2013.
[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434, ACM, Cambridge, UK, February 2017.
[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240, ACM, 2016.
[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.
[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147–154, ACM, Vienna, Austria, September 2015.
[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107–114, ACM, 2016.
[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.
[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), pp. 916–924, ACM Press, Mountain View, California, USA, October 2017.
[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198, ACM, 2016.
[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495–503, ACM, 2017.
[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54, ACM, Marina Del Rey, CA, USA, 2018.
[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537–2551, 2017.
[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640, ACM, Melbourne, Australia, October 2015.
[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.
[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835–1844, Lyon, France, April 2018.
[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 2782–2788, NY, USA, July 2016.
[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318, London, UK, August 2018.
[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639–648, Lyon, France, April 2018.
[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1340–1346, Argentina, July 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.
[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449–1458, ACM, Singapore, Singapore, November 2017.
[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.
[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.
[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, UAI 2015, pp. 326–335, Netherlands, July 2015.
[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.
Figure 2: Proposed alternatives to learn user and location representations from textual reviews: (a) textual modeling component using CNN (word embeddings, convolutional layers, max pooling, fully connected), the CNN-based solution employed in TCENR; (b) textual modeling component using RNN (word embeddings, bidirectional GRU, max pooling, fully connected), the suggested extension. Example input in both panels: "delicious healthy food steak is amazing".
a pre-trained textual embedding layer [36, 37] that represents each word in vocabulary $D$ as a vector of size $k_w$. Similarly, $V_p$ denotes the word embedding matrix for location $p$.
Due to the large number of parameters required to train the aforementioned contextual model, the textual network is implemented using a CNN-based architecture, which is usually more computationally efficient than an RNN. The semantic representations of users' and locations' reviews are fed to convolution layers to detect the parts of the text that best capture the review's meaning. These layers produce $t$ feature maps over the embedded word vectors using a window size of $w_s$ and filter $K \in \mathbb{R}^{k_w \times t}$. As suggested by [14], ReLU is used as the activation function for this layer.
In this layer, $V_u^l$ is user $u$'s $l$-th input word embedding and $z_u^j$ is the $j$-th feature extracted from the complete text. Based on the standard CNN structure, the feature maps produced by the convolution layers are then reduced by a pooling layer.
Max-pooling is selected to identify the most important words and their latent values, and $O_u$ denotes the collection of all concise features extracted from user $u$'s textual input. These are followed by fully connected layers that jointly model the different features and produce the latent textual representations for user $u$ and location $p$, denoted $h_u$ and $h_p$, respectively:
$$h_u = \mathrm{ReLU}(W_1^u \times O_u + b_1^u) \quad (7)$$
To map the outputs of the user and location fully connected layers to the same feature space, a shared layer is utilized. It concatenates its two inputs and learns their interaction by employing an additional hidden layer:
$$h_{reviews} = \mathrm{ReLU}(W_2 \times [h_u, h_p] + b_2) \quad (8)$$
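The CNN text pipeline described above (convolution with ReLU, max-pooling into $O_u$, a fully connected layer producing $h_u$ or $h_p$, then the shared layer of (8)) can be sketched in plain Python. This is a minimal illustration with hypothetical toy dimensions and random weights, not the authors' Keras model: each filter is taken to span `window` consecutive word vectors (our reading of $K$ and $w_s$), convolution uses no padding, pooling is non-overlapping, and biases are fixed at zero for brevity.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def relu(x: float) -> float:
    return max(0.0, x)


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.1) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def conv1d(words: Matrix, filters: List[Matrix], biases: Vector, stride: int = 1) -> Matrix:
    """Valid 1D convolution over an (n_words x k_w) embedding matrix followed by ReLU.
    Each filter spans `window` consecutive word vectors; the output is (positions x t)."""
    window = len(filters[0])
    out = []
    for start in range(0, len(words) - window + 1, stride):
        row = []
        for f, b in zip(filters, biases):
            s = b + sum(w * x for i in range(window) for w, x in zip(f[i], words[start + i]))
            row.append(relu(s))
        out.append(row)
    return out


def max_pool1d(maps: Matrix, pool: int = 2) -> Matrix:
    """Non-overlapping max pooling along the word axis (stride equal to pool size)."""
    return [[max(maps[i + j][c] for j in range(pool)) for c in range(len(maps[0]))]
            for i in range(0, len(maps) - pool + 1, pool)]


def dense_relu(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    return [relu(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(weights, bias)]


def encode_text(words: Matrix, filters: List[Matrix], fc_w: Matrix, stride: int, pool: int) -> Vector:
    # convolution + ReLU -> max pooling (O_u) -> fully connected ReLU layer (h_u, as in (7))
    pooled = max_pool1d(conv1d(words, filters, [0.0] * len(filters), stride), pool)
    flat = [v for row in pooled for v in row]
    return dense_relu(flat, fc_w, [0.0] * len(fc_w))


rng = random.Random(0)
k_w, t, window, stride, pool, hidden, n_words = 5, 3, 4, 2, 2, 6, 20
flat_len = ((n_words - window) // stride + 1) // pool * t  # (9 positions // 2) * 3 = 12

V_u = rand_matrix(rng, n_words, k_w, 1.0)  # hypothetical embedded user reviews
V_p = rand_matrix(rng, n_words, k_w, 1.0)  # hypothetical embedded location reviews
h_u = encode_text(V_u, [rand_matrix(rng, window, k_w) for _ in range(t)],
                  rand_matrix(rng, hidden, flat_len), stride, pool)
h_p = encode_text(V_p, [rand_matrix(rng, window, k_w) for _ in range(t)],
                  rand_matrix(rng, hidden, flat_len), stride, pool)

# shared layer, eq. (8): ReLU(W_2 x [h_u, h_p] + b_2)
h_reviews = dense_relu(h_u + h_p, rand_matrix(rng, 4, 2 * hidden), [0.0] * 4)
```

Separate filters and fully connected weights are used for users and locations here, so the shared layer is the first point where the two review views interact.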
The two neural networks are then merged to produce a prediction $\hat{y}_{up} \in [0, 1]$. The last layers of the two networks, each representing a different view of the user-location interaction, are concatenated and fed to yet another hidden layer responsible for blending what both have learned:
$$\hat{y}_{up} = \sigma(W_3 \times [h_{context}, h_{reviews}] + b_3) \quad (9)$$
where the sigmoid function was selected to transform the hidden layer output to the desired range of [0, 1].
3.2. Sequential Textual Modeling. To further investigate the gain achieved by integrating a textual modeling component over reviews in TCENR, we suggest an extension denoted TCENRseq. Following its success in previous language modeling tasks [17, 18] and its ability to capture the sequential nature of sentences, we employ an RNN component to learn latent features from reviews. An illustration of the proposed extension is presented in Figure 2(b), while the CNN method used in the vanilla TCENR is shown in Figure 2(a) to provide a convenient basis for comparison. More specifically, we follow the findings of previous works [18, 23] and implement our recurrent network using a GRU, an architecture that achieves performance competitive with LSTM but with fewer parameters, making it more efficient.
In the GRU formulation, $f_l$ is the forget gate for input word $l$, $s_l$ is the output gate, $c_l$ is the new candidate state combining the current word embedding $V_u^l$ with the previous hidden state, and $h_l$ is the current state for word $l$, modeled by the output gate. $\odot$ denotes the element-wise product; $W_f$, $R_f$, $W_s$, $R_s$, $W_c$, and $R_c$ are the GRU weight matrices, while $b_f$, $b_s$, and $b_c$ are the bias vectors.
Since the context of a word can be determined by the preceding and succeeding words or sentences, our proposed method employs a bidirectional GRU over the user embedding $V_u$ and the location embedding $V_p$. Each word $l$'s hidden state is learned by forward and backward GRU layers, denoted $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$, respectively. To learn a more concise and combined representation of a word while taking into account the context of all surrounding words, we feed the concatenation of $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$ to an additional bidirectional GRU layer, such that its input for every word $l$ is $e^2_l = [\overrightarrow{h}^1_l, \overleftarrow{h}^1_l]$. The second recurrent unit outputs $n$ latent vectors, each a sequentially infused representation of a word written by the target user or about the candidate item. To make the method of textual modeling the only difference between TCENR and TCENRseq, and to further reduce the number of learned parameters, all modified word vectors are fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This allows us to directly determine the effect of an RNN on textual modeling for POI recommender systems compared to a CNN, while also enabling the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged in order to learn the user-location interaction.
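The two-layer bidirectional recurrent encoder described above can be sketched in plain Python. This is a minimal illustration, not the authors' Keras implementation: it uses one common GRU convention with an update gate `z` and a reset gate `r` (how these correspond to the paper's $f_l$ and $s_l$ is our assumption), small random weights, and hypothetical dimensions. The second layer consumes $e^2_l = [\overrightarrow{h}^1_l, \overleftarrow{h}^1_l]$, and its per-word outputs would then go to the pooling and fully connected layers of (6) and (7).

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vectors: Vector) -> Vector:
    return [sum(parts) for parts in zip(*vectors)]


def init_gru(rng: random.Random, input_dim: int, hidden_dim: int) -> Dict[str, list]:
    params: Dict[str, list] = {}
    for gate in ("z", "r", "h"):  # update gate, reset gate, candidate state
        params["W" + gate] = [[rng.gauss(0.0, 0.1) for _ in range(input_dim)] for _ in range(hidden_dim)]
        params["U" + gate] = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
        params["b" + gate] = [0.0] * hidden_dim
    return params


def gru_step(p: Dict[str, list], x: Vector, h_prev: Vector) -> Vector:
    z = [sigmoid(v) for v in add(matvec(p["Wz"], x), matvec(p["Uz"], h_prev), p["bz"])]
    r = [sigmoid(v) for v in add(matvec(p["Wr"], x), matvec(p["Ur"], h_prev), p["br"])]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]  # reset gate applied element-wise
    cand = [math.tanh(v) for v in add(matvec(p["Wh"], x), matvec(p["Uh"], gated), p["bh"])]
    # interpolate between the previous state and the candidate (one common GRU convention)
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def gru(p: Dict[str, list], seq: List[Vector]) -> List[Vector]:
    h = [0.0] * len(p["bz"])
    states = []
    for x in seq:
        h = gru_step(p, x, h)
        states.append(h)
    return states


def bigru(p_fwd: Dict[str, list], p_bwd: Dict[str, list], seq: List[Vector]) -> List[Vector]:
    forward = gru(p_fwd, seq)
    backward = gru(p_bwd, seq[::-1])[::-1]  # re-align backward states with word order
    return [f + b for f, b in zip(forward, backward)]  # [->h_l, <-h_l] per word


rng = random.Random(0)
k_w, hidden, n_words = 4, 3, 6
V_u = [[rng.uniform(-1.0, 1.0) for _ in range(k_w)] for _ in range(n_words)]  # hypothetical embeddings
layer1 = bigru(init_gru(rng, k_w, hidden), init_gru(rng, k_w, hidden), V_u)
layer2 = bigru(init_gru(rng, 2 * hidden, hidden), init_gru(rng, 2 * hidden, hidden), layer1)
```

Each state is a convex combination of the previous state and a `tanh` candidate, so all hidden values stay strictly inside (-1, 1).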
3.3. Training the Network. To train the recommendation models, we adopt a pointwise loss objective function, as done in [2, 3, 14], where the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$ is minimized. To address the implicit feedback nature of LBSNs, we sample a set of negative samples from the dataset, denoted $Y^-$.
Due to the implicit feedback nature of the recommendation task, the algorithm's output can be treated as a binary classification problem. As the sigmoid activation function is used over the last hidden layer, the output can be interpreted as a probability $p$ conditioned on $E_u$ and $E_p$, the embedding layers for users and locations, respectively; $V_u$ and $V_p$, the textual review embedding layers; and $\Theta_f$, the model parameters. Taking the negative log-likelihood of $p$ results in the binary cross-entropy loss function for the prediction portion of the model:
$$L_{pred} = -\sum_{(u,p) \in Y \cup Y^-} \left( y_{up} \log \hat{y}_{up} + (1 - y_{up}) \log (1 - \hat{y}_{up}) \right) \quad (12)$$
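Because the sigmoid output is read as a probability, the likelihood whose negative logarithm yields (12) is $\prod_{(u,p) \in Y} \hat{y}_{up} \prod_{(u,p) \in Y^-} (1 - \hat{y}_{up})$. This product is implied by (12) rather than copied from the paper's own likelihood equation. The sketch below computes both quantities and checks that they agree; the clipping constant `eps` and the example predictions are illustrative assumptions.

```python
import math
from typing import List


def bce_loss(labels: List[int], preds: List[float], eps: float = 1e-12) -> float:
    """Eq. (12): summed binary cross-entropy over positives (y=1) and sampled negatives (y=0)."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total


def likelihood(labels: List[int], preds: List[float]) -> float:
    """Product of p for observed check-ins and (1 - p) for sampled negatives."""
    out = 1.0
    for y, p in zip(labels, preds):
        out *= p if y == 1 else 1.0 - p
    return out


# One positive and four sampled negatives (the 1:4 ratio used in the experiments)
labels = [1, 0, 0, 0, 0]
preds = [0.9, 0.2, 0.1, 0.3, 0.05]
loss = bce_loss(labels, preds)
```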
As there are two more outputs in the model, the users' social network $\psi_c E_u$ and the locations' distance graph $\psi_c E_p$, two additional loss functions are required to train the network. We follow the process in [2], assuming that two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:
$$L_{u\_context} = -\sum_{(u, u_c)} \left( \psi_c E_u - \log \sum_{u'_c \in C_u} \exp(\psi_{c'} E_u) \right) \quad (13)$$
where $\psi_c E_u$ is as defined in (1). Taking the binary class label into account yields a loss function that corresponds to minimizing the cross-entropy loss of user $i$ and context $c$ with respect to the class label $y$. This formulation uses an indicator function $\mathbb{I}$ that returns 1 if $y$ is in the given set and 0 otherwise. The same logic is used to formulate the loss function for the POI context, which is omitted due to space limitations.
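The context loss in (13) is a log-softmax over the candidate contexts, which can be sketched as follows. This assumes that $\psi_c E_u$ denotes the inner product between a context vector and the user embedding, uses a full softmax over a small hypothetical context set $C_u$ (a large graph would normally use sampling), and does not reproduce the binary-label variant described above. All names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_score(target: str, user_emb: Vector, context_emb: Dict[str, Vector]) -> float:
    """psi_c . E_u - log sum_{c'} exp(psi_{c'} . E_u), using the max trick for numerical stability."""
    scores = {c: dot(v, user_emb) for c, v in context_emb.items()}
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[target] - log_z


def context_loss(pairs: List[Tuple[str, str]], user_embs: Dict[str, Vector],
                 context_embs: Dict[str, Vector]) -> float:
    """Eq. (13): negative log-likelihood of each observed (user, context) pair."""
    return -sum(log_softmax_score(c, user_embs[u], context_embs) for u, c in pairs)


contexts = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [1.0, 1.0]}
uninformed = context_loss([("u1", "c1"), ("u1", "c2")], {"u1": [0.0, 0.0]}, contexts)
aligned = context_loss([("u1", "c1")], {"u1": [5.0, 0.0]}, contexts)
```

With an all-zero user embedding every context is equally likely, so each pair costs $\log |C_u|$; pulling the user embedding toward its observed context lowers the loss, which is how the regularizer pushes connected users toward similar embeddings.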
We simultaneously minimize the three loss functions $L_{pred}$, $L_{u\_context}$, and $L_{p\_context}$. The joint optimization improves the recommendation accuracy while enforcing similar representations for locations in close proximity and for users connected in the social network. The loss functions are combined using two hyperparameters that weight the contextual contribution.
To optimize the combined loss function, a gradient descent method can be adopted; more specifically, we utilize Adaptive Moment Estimation (Adam) [38]. This optimizer adapts the learning rate of each parameter and typically converges faster than standard gradient descent, in addition to making learning rate tuning more efficient. To avoid overfitting when training the model, an early stopping criterion is integrated. The model parameters are initialized with a Gaussian distribution, while the output layer's parameters are initialized from a uniform distribution.
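The optimization loop can be sketched as follows. The Adam update follows Kingma and Ba [38] with their default $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$; `joint_loss` is only an assumed form of the combined objective (prediction loss plus the two contextual losses weighted by $\lambda_1 = \lambda_2 = 0.1$), since equation (15) is not reproduced here; and the patience-based `EarlyStopping` rule is a hypothetical criterion, as the paper does not specify one. The toy objective $(x - 3)^2$ stands in for the network loss.

```python
import math
from typing import List, Optional


def joint_loss(l_pred: float, l_user_ctx: float, l_poi_ctx: float,
               lam1: float = 0.1, lam2: float = 0.1) -> float:
    # Assumed form of the combined objective: prediction loss plus weighted contextual losses
    return l_pred + lam1 * l_user_ctx + lam2 * l_poi_ctx


class Adam:
    """Adam update rule [38] for a flat list of parameters."""

    def __init__(self, lr: float = 0.005, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: Optional[List[float]] = None
        self.v: Optional[List[float]] = None
        self.t = 0

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if self.m is None or self.v is None:
            self.m, self.v = [0.0] * len(params), [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g      # first moment
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g  # second moment
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias corrections
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


class EarlyStopping:
    """Stop once the validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


opt = Adam(lr=0.1)
x = [0.0]
for _ in range(2000):
    x = opt.step(x, [2.0 * (x[0] - 3.0)])  # gradient of the toy loss (x - 3)^2

stopper = EarlyStopping(patience=3)
val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.69]
stop_epoch = next(i for i, loss in enumerate(val_losses) if stopper.should_stop(loss))
```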
4. Experiments and Evaluation
4.1. Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews along with the users' friends and the businesses' locations. Due to the limited resources available for model evaluation, we chose to filter the dataset and keep only a concise subset, from which all users and locations with fewer than 100 written reviews or fewer than 10 friends are removed. The filtered dataset includes 141,028 reviews, with 98.08% sparsity for the rating matrix. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, while 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are up to 1 km apart.
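The graph construction above can be sketched as follows. The 1 km rule is applied with the haversine great-circle distance (our choice; the paper does not name a distance formula), random walks start from a base node, and (node, context) pairs are taken from nodes that co-occur within a window of 3 steps. Reading "20 and 30 vertices connected to each base node" as walk lengths is our assumption, and the coordinates are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def build_poi_graph(coords: Dict[str, Tuple[float, float]], max_km: float = 1.0) -> Dict[str, List[str]]:
    """Connect two locations when they are at most max_km apart (O(n^2) for clarity)."""
    ids = sorted(coords)
    graph: Dict[str, List[str]] = {i: [] for i in ids}
    for x, a in enumerate(ids):
        for b in ids[x + 1:]:
            if haversine_km(coords[a], coords[b]) <= max_km:
                graph[a].append(b)
                graph[b].append(a)
    return graph


def random_walk(graph: Dict[str, List[str]], start: str, length: int, rng: random.Random) -> List[str]:
    walk = [start]
    while len(walk) < length and graph[walk[-1]]:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def context_pairs(walk: List[str], window: int = 3) -> List[Tuple[str, str]]:
    """(node, context) pairs for nodes that co-occur within `window` steps of a walk."""
    pairs = []
    for i, node in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((node, walk[j]))
    return pairs


coords = {"A": (43.4723, -80.5449), "B": (43.4750, -80.5449), "C": (43.5000, -80.5449)}
graph = build_poi_graph(coords)
walk = random_walk(graph, "A", 5, random.Random(0))
```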
To test our models' performance, the original data was split into training, validation, and test sets by random sampling, with respective ratios of 56%, 24%, and 20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled with 4 negative locations for every positive one.
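The data preparation described above can be sketched as a seeded random split followed by negative sampling. The seeds, the sampling of negatives with replacement from each user's unobserved locations, and the toy identifiers are assumptions for illustration.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (user, location)


def split_data(pairs: List[Pair], ratios=(0.56, 0.24, 0.20), seed: int = 42):
    """Shuffle and cut into training, validation, and test sets."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return shuffled[:n_train], shuffled[n_train:n_train + n_val], shuffled[n_train + n_val:]


def add_negatives(positives: List[Pair], all_locations: List[str], observed: Set[Pair],
                  n_neg: int = 4, seed: int = 0) -> List[Tuple[str, str, int]]:
    """Return (user, location, label) triples with n_neg unobserved locations per positive."""
    rng = random.Random(seed)
    samples = []
    for user, loc in positives:
        samples.append((user, loc, 1))
        candidates = [l for l in all_locations if (user, l) not in observed]
        for _ in range(n_neg if candidates else 0):
            samples.append((user, rng.choice(candidates), 0))
    return samples


pairs = [(f"u{i}", f"p{i % 7}") for i in range(100)]
train, val, test = split_data(pairs)
samples = add_negatives([("u1", "a")], ["a", "b", "c"], {("u1", "a")})
```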
To effectively compare our proposal with other alternatives, we adopt the same settings as applied in [2, 3]. The MLP input vectors are represented with an embedding size of 10, while two hidden layers are added on top of the merged result. Following the tower architecture, where the size of each layer is half the size of its predecessor, the numbers of hidden units are 32 and 16 for the first and second layers, respectively.
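The MLP tower with these settings can be sketched as a forward pass from two size-10 embeddings through ReLU layers of 32 and 16 units. Merging the embeddings by concatenation, the ReLU activations, the Gaussian initialization scale, and naming the output `h_context` (as in (9)) are our assumptions; the contextual regularization of the embeddings is not shown.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def init_tower(rng: random.Random, input_dim: int, sizes: List[int]) -> List[Layer]:
    layers = []
    for size in sizes:
        weights = [[rng.gauss(0.0, 0.1) for _ in range(input_dim)] for _ in range(size)]
        layers.append((weights, [0.0] * size))
        input_dim = size  # the next layer consumes this layer's output
    return layers


def mlp_tower(user_vec: List[float], loc_vec: List[float], layers: List[Layer]) -> List[float]:
    h = user_vec + loc_vec  # merge the two embeddings by concatenation
    for weights, bias in layers:
        h = [max(0.0, sum(w * x for w, x in zip(row, h)) + b) for row, b in zip(weights, bias)]
    return h


rng = random.Random(1)
E_u = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical user embedding (size 10)
E_p = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical location embedding (size 10)
h_context = mlp_tower(E_u, E_p, init_tower(rng, 20, [32, 16]))
```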
In the CNN component, each word is represented by a pretrained embedding layer with 50 units, while the convolutional layer is constructed with a window size of 10 and a stride of 3. It results in 3 feature maps that are flattened after performing the max-pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. Following the merge of the two hidden layers, their interaction is learned using another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For the training phase of the model, a learning rate of 0.005 was used over a maximum of 50 epochs with a batch size of 512 samples.
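The convolution settings above determine how many values reach the 32-unit hidden layer. The following arithmetic sketch assumes "valid" (no) padding for the convolution and a pooling stride equal to the pool size, which is the Keras `MaxPooling1D` default; neither is stated explicitly in the paper.

```python
def conv_output_length(n_words: int, window: int = 10, stride: int = 3) -> int:
    """Number of valid convolution positions (no padding)."""
    return (n_words - window) // stride + 1


def pooled_length(length: int, pool: int = 2) -> int:
    """Max pooling with stride equal to the pool size."""
    return length // pool


def flattened_size(n_words: int, feature_maps: int = 3, window: int = 10,
                   stride: int = 3, pool: int = 2) -> int:
    return pooled_length(conv_output_length(n_words, window, stride), pool) * feature_maps


print(flattened_size(500))  # 246: 164 positions, 82 after pooling, times 3 feature maps
```

Under these assumptions, the input to the 32-unit layer grows linearly with the number of words, from 246 values at 500 words to 1,494 at 3,000 words.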
4.2. Baselines. To evaluate our algorithm, we chose to compare it to these seven empirically proven frameworks:
(vi) PACE [2]: Preference and Context Embedding. An MLP-based framework with the addition of contextual graph smoothing for POI recommendation.
(vii) DeepCoNN [14]: Deep Cooperative Neural Networks. A CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.
To evaluate our model and the baselines, we apply Accuracy and Mean Squared Error (MSE) over all $n$ test samples, as well as Precision (Pre@10) and Recall (Rec@10) averaged over the top 10 predictions per user.
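The four evaluation measures can be sketched as follows. The 0.5 decision threshold for accuracy, dividing precision by $k$ even when fewer than $k$ items are ranked, and averaging per-user scores uniformly are common conventions that we assume here; the paper does not spell them out.

```python
from typing import Dict, List, Set, Tuple


def accuracy(labels: List[int], preds: List[float], threshold: float = 0.5) -> float:
    return sum((p >= threshold) == (y == 1) for y, p in zip(labels, preds)) / len(labels)


def mse(labels: List[int], preds: List[float]) -> float:
    return sum((y - p) ** 2 for y, p in zip(labels, preds)) / len(labels)


def precision_recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> Tuple[float, float]:
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k, (hits / len(relevant) if relevant else 0.0)


def mean_pr_at_k(rankings: Dict[str, List[str]], truth: Dict[str, Set[str]], k: int = 10) -> Tuple[float, float]:
    """Average Pre@k and Rec@k over users."""
    scores = [precision_recall_at_k(rankings[u], truth[u], k) for u in rankings]
    return sum(p for p, _ in scores) / len(scores), sum(r for _, r in scores) / len(scores)
```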
The proposed models were implemented using Keras (https://keras.io) on top of a TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted using an NVIDIA GTX 1070 GPU.
4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are based on the average of three individual executions.
As can be seen from the results, the proposed model TCENR achieves the best results overall compared to all baselines. Furthermore, it was found to significantly outperform HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN in terms of accuracy and MSE ($p < 0.05$, one-sided unpaired t-test). The contrasting results in terms of precision and recall compared to NeuMF suggest that TCENR offers fewer but more relevant recommendations to the user. While NMF provides the best precision score of all methods, it underperforms in all other measures, making it a less desirable model. Taking a closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully harnessed. In addition, the use of only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on this dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.
Figure 3: Runtime (seconds) of all models on the Yelp dataset.
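The significance test mentioned above can be illustrated with a pooled-variance (Student) two-sample t statistic over per-run scores. The paper does not say whether equal variances were assumed or exactly which samples entered the test, so treating each model's three executions as the samples, the pooled-variance choice, and the accuracy values below are all hypothetical. With three runs per model (df = 4), the one-sided 5% critical value from standard t tables is 2.132.

```python
import math
from statistics import mean, variance
from typing import List, Tuple

T_CRIT_ONE_SIDED_05_DF4 = 2.132  # t_{0.95, 4} from standard t tables


def unpaired_t(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical per-run accuracies for two models (not values from the paper)
model_a = [0.830, 0.828, 0.826]
model_b = [0.815, 0.812, 0.818]
t_stat, df = unpaired_t(model_a, model_b)
significant = df == 4 and t_stat > T_CRIT_ONE_SIDED_05_DF4  # one-sided: model_a better than model_b
```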
Comparing TCENR and its proposed extension TCENRseq provides contrasting results. By employing an RNN instead of a CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and an improved precision score, while accuracy and recall are worse. One possible explanation is that, by accurately capturing different aspects of user reviews, the model is able to reinforce its hypotheses and therefore reduce uncertainty in some cases; however, when faced with a contrast between textual aspects and the ground truth, it might choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable techniques and measures to learn different data types rather than employing a single method over all inputs. Moreover, they show the positive impact of using textual data in conjunction with historical activities. The reported performance further offers insight into the choice between CNN and RNN for language modeling in future recommendation tasks.
To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As demonstrated by the results, TCENR is competitive with most baselines and more efficient than DeepCoNN and LCARS. The reported runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks: as the number of trainable parameters increases due to the use of recurrent layers, our RNN-based extension takes 32.9% longer to train than TCENR while achieving comparable results.
4.4. Model Design Analysis. In this section, we discuss the effect of several design choices on the suggested model's performance.
441Merge Layer The importance of themodelrsquos final layersresponsible for combining the dense output of both the MLPand convolutional networks requires a close attention as it
effects the networksrsquo ability to jointly learn and the predictionitself To properly select the fusion operator the followingmethods had been considered
(i) Combining the last hidden layers of the two modelsusing concatenation A model using this method willbe denoted as 119879119862119864119873119877119888119900119899 and described in (9)
(ii) Merging the last hidden layers using dot productresulting in a model named 119879119862119864119873119877119889119900119905 that can bedefined as
(iii) Combining the two previously described methodswhere the two representations will be jointly learnedby concatenation and dot productThe resultedmodelwill be denoted as 119879119862119864119873119877119889119900119905 119888119900119899 and can be devel-oped by combining (9) and (16) using addition andtranslating the result to a range of [0 1] with thesigmoid function
(iv) Adopting a weighted average for the prediction resultof the two networks Denoted as 119879119862119864119873119877119908119890119894119892ℎ119905 thismodel can be defined as
119910119906119901 = 1205821120590 (1198825 times ℎ119888119900119899119905119890119909119905 + 1198875)+ 1205822120590 (1198826 times ℎ119903119890V119894119890119908119904 + 1198876) (18)
As shown in Figure 4 adopting the more simple methodsof weighted average and dot product leads to an inferiorperformance of TCENR demonstrating the added valueof utilizing the latent features learned by each subnetworkjointly When combined with the underperforming methodof dot product in 119879119862119864119873119877119889119900119905 119888119900119899 the use of concatenationimproves over dot product alone However since the twomethods are integrated using a simple average employingonly concatenation as done in 119879119862119864119873119877119888119900119899 produces the bestresults and therefore integrated into the final model
442 MLP Layer Design Although it was found by [3] thatadding more layers and units to the MLP-based recom-mender has a positive effect the use of CNN and the addi-tional hidden layer suggests it is a subject worth investigating
Complexity 9
0812
0823
0802
0828
TCENRdot_con
TCENRweight
TCENRdot
TCENR con
Figure 4 Comparison of merging methods in terms of accuracy
Table 2 Modelrsquos accuracy with different layers
To this end we test the proposed algorithm with 1-4 hiddenlayers used to learn the user-item interaction with contextualregularization in varying sizes from 8 to 128 hidden unitsThe results in terms of test setrsquos accuracy are presented inTable 2 where the number of hidden layers is defined ascolumns and the size of the first unit is presented as rowsUnlike previous results we find that two hidden layers with32 and 16 hidden units result in the best performance for ourdataset
4.4.3 Number of Words. Using written reviews in their original order lets the CNN and RNN find the best representation for every few words and, eventually, for the whole text. However, our final dataset is composed of very long reviews. Fully representing a single user or location would require more than 20,000 words, which makes extracting relevant representations computationally expensive. To benefit from the sequential nature of the reviews while keeping the solution feasible, we limited the number of words to a range of 500-6,000. As Figure 5 shows, accuracy improves slightly as the number of words increases up to 3,000. Beyond that point, additional words increase the bias towards users and locations with longer reviews, which in turn reduces the model's learning capabilities.
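Limiting each user's or location's text to a fixed word budget can be sketched as follows. The sketch makes several assumptions about the paper's unspecified tokenizer and padding scheme. Reviews are concatenated in their original order and split on whitespace. The result is truncated to the first `max_words` tokens and right-padded with a hypothetical `<pad>` token, so every document has the same length.

```python
def build_document(reviews, max_words, pad_token="<pad>"):
    """Concatenate reviews in order, keep the first max_words tokens, pad the rest."""
    tokens = []
    for review in reviews:
        tokens.extend(review.split())
        if len(tokens) >= max_words:
            break  # later reviews would be truncated away anyway
    tokens = tokens[:max_words]
    tokens += [pad_token] * (max_words - len(tokens))
    return tokens


reviews_by_user = {
    "u1": ["great tacos and friendly staff", "too loud on weekends"],
    "u2": ["best espresso in town"],
}
docs = {user: build_document(r, max_words=6) for user, r in reviews_by_user.items()}
print(docs["u1"])  # ['great', 'tacos', 'and', 'friendly', 'staff', 'too']
print(docs["u2"])  # ['best', 'espresso', 'in', 'town', '<pad>', '<pad>']
```

In the experiments, `max_words` would take values between 500 and 6,000. Users with short histories are then mostly padding, while prolific users fill the whole budget. This asymmetry is consistent with the bias described above.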
5 Conclusion and Future Work
In this paper, we developed a neural POI recommender system called TCENR. The model uses data about users and locations, including spatial data, social networks and textual reviews, to predict users' implicit preferences regarding POIs. TCENR models two types of user-location interactions: native check-ins regularized by contextual information, and the words users wrote to describe their experiences. We further extended the method into TCENRseq, which models textual data with an RNN instead of a CNN. Evaluated on the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.
Figure 5: Number of words comparison in terms of accuracy (x-axis: number of words, 0 to 6,000; y-axis: accuracy).
For future work, we intend to extend our models' evaluation to additional LBSN datasets. We also plan to investigate the proposed framework's contribution to the cold-start problem by analyzing its performance on additional data that includes new users and locations with few reviews.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.
References
[1] H. Li, Y. Ge, R. Hong, and H. Zhu, “Point-of-interest recommendations: learning potential check-ins from friends,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975–984, ACM, San Francisco, California, USA, August 2016.
[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, “Bridging collaborative filtering and semi-supervised learning: a neural approach for poi recommendation,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254, ACM, Halifax, NS, Canada, August 2017.
[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173–182, Perth, Australia, April 2017.
[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, “Recommender systems with social regularization,” in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287–296, ACM, February 2011.
[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, “Geo-sage: a geographical sparse additive generative model for spatial item recommendation,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264, ACM, Sydney, NSW, Australia, August 2015.
[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, “Lcars: a location-content-aware recommender system,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221–229, ACM, Chicago, Illinois, USA, August 2013.
[7] C. Wang and D. M. Blei, “Collaborative topic modeling for recommending scientific articles,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 448–456, ACM, August 2011.
[8] H. Wang, N. Wang, and D.-Y. Yeung, “Collaborative deep learning for recommender systems,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244, ACM, Sydney, NSW, Australia, August 2015.
[9] J. Manotumruksa, C. Macdonald, and I. Ounis, “A deep recurrent collaborative filtering framework for venue recommendation,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429–1438, ACM, Singapore, Singapore, November 2017.
[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, “Context-aware sequential recommendation,” in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053–1058, IEEE, Barcelona, Spain, December 2016.
[11] H.-T. Cheng, L. Koc, J. Harmsen et al., “Wide & deep learning for recommender systems,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10, ACM, 2016.
[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, “Neural personalized ranking via poisson factor model for item recommendation,” Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.
[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, “Deep content-based music recommendation,” in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 2643–2651, 2013.
[14] L. Zheng, V. Noroozi, and P. S. Yu, “Joint deep modeling of users and items using reviews for recommendation,” in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434, ACM, Cambridge, UK, February 2017.
[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, “Convolutional matrix factorization for document context-aware recommendation,” in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240, ACM, 2016.
[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, “Session-based recommendations with recurrent neural networks,” 2015, https://arxiv.org/abs/1511.06939.
[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, “Learning distributed representations from reviews for collaborative filtering,” in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147–154, ACM, Vienna, Austria, September 2015.
[18] T. Bansal, D. Belanger, and A. McCallum, “Ask the gru: multi-task learning for deep text recommendations,” in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107–114, ACM, 2016.
[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, “Exploiting spatial and temporal for point of interest recommendation,” Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.
[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, “Photo2trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation,” in Proceedings of the 2017 ACM on Multimedia Conference (MM ’17), pp. 916–924, ACM Press, Mountain View, California, USA, October 2017.
[21] P. Covington, J. Adams, and E. Sargin, “Deep neural networks for youtube recommendations,” in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198, ACM, 2016.
[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, “Recurrent recommender networks,” in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495–503, ACM, 2017.
[23] A. Beutel, P. Covington, S. Jain et al., “Latent cross: making use of context in recurrent recommender systems,” in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54, ACM, Marina Del Rey, CA, USA, 2018.
[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, “Spatial-aware hierarchical collaborative deep learning for POI recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537–2551, 2017.
[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, “Joint modeling of user check-in behaviors for point-of-interest recommendation,” in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640, ACM, Melbourne, Australia, October 2015.
[26] P. Zhao, X. Xu, Y. Liu et al., “Exploiting hierarchical structures for POI recommendation,” in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.
[27] P. Zhao, H. Zhu, Y. Liu et al., “Where to go next: a spatio-temporal gated network for next poi recommendation,” in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[28] H. Wang, F. Zhang, X. Xie, and M. Guo, “Dkn: deep knowledge-aware network for news recommendation,” in Proceedings of the 2018 World Wide Web Conference, pp. 1835–1844, Lyon, France, April 2018.
[29] Y. Gong and Q. Zhang, “Hashtag recommendation using attention-based convolutional neural network,” in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pp. 2782–2788, NY, USA, July 2016.
[30] Y. Tay, A. T. Luu, and S. C. Hui, “Multi-pointer co-attention networks for recommendation,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318, London, UK, August 2018.
[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, “Aspect-aware latent factor model: rating prediction with ratings and reviews,” in Proceedings of the 2018 World Wide Web Conference, pp. 639–648, Lyon, France, April 2018.
[32] D. Tang, B. Qin, T. Liu, and Y. Yang, “User modeling with neural network for review rating prediction,” in Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015), pp. 1340–1346, Argentina, July 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[34] Y. Kim, “Convolutional neural networks for sentence classification,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.
[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, “Joint representation learning for top-N recommendation with heterogeneous information sources,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449–1458, ACM, Singapore, Singapore, November 2017.
[36] J. Pennington, R. Socher, and C. Manning, “GloVe: global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.
[38] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” 2014, https://arxiv.org/abs/1412.6980.
[39] P. Gopalan, J. M. Hofman, and D. M. Blei, “Scalable recommendation with hierarchical Poisson factorization,” in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 326–335, Netherlands, July 2015.
[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, “An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems,” IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.
where $f_l$ is the forget gate for input word $l$ and $s_l$ is the output gate. $c_l$ is the new candidate state, which combines the current word embedding $V_u^l$ with the previous hidden state. $h_l$ is the current state for word $l$, modeled by the output gate. $\odot$ denotes the element-wise product. $W_f$, $R_f$, $W_s$, $R_s$, $W_c$ and $R_c$ are the GRU weight matrices, and $b_f$, $b_s$ and $b_c$ are the bias vectors.
Since the context of a word can be determined by the preceding and following words or sentences, our method applies a bidirectional GRU over the user embedding $V_u$ and the location embedding $V_p$. The hidden state of each word $l$ is learned by forward and backward GRU layers, denoted as $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$, respectively. To learn a more concise, combined representation of each word that accounts for the context of all surrounding words, we feed the concatenation of $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$ to an additional bidirectional GRU layer. Its input for every word $l$ is therefore $e^2_l = [\overrightarrow{h}^1_l, \overleftarrow{h}^1_l]$. The second recurrent unit outputs $n$ latent vectors, each a sequentially infused representation of a word written by the target user or about the candidate item. All modified word vectors are then fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This keeps textual modeling as the only difference between TCENR and TCENRseq and further reduces the number of learned parameters. It allows us to directly measure the effect of an RNN, compared to a CNN, on textual modeling for POI recommender systems. It also enables the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged to learn the user-location interaction.
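The two stacked bidirectional GRU layers described above can be illustrated with a toy sketch in plain Python. It is not the paper's implementation. It uses the standard GRU formulation of Cho et al. (update gate z, reset gate r, candidate state c), whose gate naming may differ from the paper's f/s/c notation. It also uses tiny random weights from a seeded generator and lists instead of tensors. As in the text, the second layer consumes the concatenation [forward h, backward h] of the first layer's outputs.

```python
import math
import random


def matvec(m, v):
    """Matrix-vector product with lists: m is rows x cols, v has length cols."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Standard GRU cell: z = update gate, r = reset gate, c = candidate state."""

    def __init__(self, input_dim, hidden_dim, rng):
        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W = {g: mat(hidden_dim, input_dim) for g in "zrc"}   # input weights
        self.R = {g: mat(hidden_dim, hidden_dim) for g in "zrc"}  # recurrent weights
        self.b = {g: [0.0] * hidden_dim for g in "zrc"}           # bias vectors

    def _affine(self, g, x, h):
        # W_g x + R_g h + b_g
        return [w + u + b for w, u, b in zip(matvec(self.W[g], x), matvec(self.R[g], h), self.b[g])]

    def step(self, x, h_prev):
        z = [sigmoid(v) for v in self._affine("z", x, h_prev)]
        r = [sigmoid(v) for v in self._affine("r", x, h_prev)]
        reset_h = [ri * hi for ri, hi in zip(r, h_prev)]  # element-wise product
        c = [math.tanh(v) for v in self._affine("c", x, reset_h)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, c)]

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h = self.step(x, h)
            outputs.append(h)
        return outputs


def bidirectional(fwd, bwd, xs):
    """Concatenate forward and backward hidden states for every position."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # re-align backward states with word order
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
emb_dim, hidden = 4, 3
words = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(5)]  # 5 toy word embeddings

layer1 = bidirectional(GRUCell(emb_dim, hidden, rng), GRUCell(emb_dim, hidden, rng), words)
layer2 = bidirectional(GRUCell(2 * hidden, hidden, rng), GRUCell(2 * hidden, hidden, rng), layer1)
print(len(layer2), len(layer2[0]))  # 5 6
```

Each of the n output vectors has twice the hidden size. In TCENRseq these vectors would then go through the pooling and fully connected layers shared with TCENR.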
3.3 Training the Network. To train the recommendation models, we adopt a pointwise loss objective, as done in [2, 3, 14], which minimizes the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$. To address the implicit feedback nature of LBSNs, we sample a set of negative samples from the dataset, denoted as $Y^{-}$.
Because the recommendation task relies on implicit feedback, the algorithm's output can be treated as a binary classification problem. Since the sigmoid activation function is applied to the last hidden layer, the output probability can be defined as
where $E_u$ and $E_p$ are the embedding layers for users and locations, respectively. Similarly, $V_u$ and $V_p$ are the textual review embedding layers, and $\Theta_f$ represents the model parameters. Taking the negative log-likelihood of $p$ yields the binary cross-entropy loss function for the prediction portion of the model:
$$L_{pred} = -\sum_{(u,p) \in Y \cup Y^{-}} \left[ y_{up} \log \hat{y}_{up} + (1 - y_{up}) \log (1 - \hat{y}_{up}) \right] \qquad (12)$$
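Equation (12) is the standard binary cross-entropy summed over observed pairs in $Y$ (label 1) and sampled negatives in $Y^{-}$ (label 0). The minimal sketch below assumes the predictions are already sigmoid outputs. It clips them with a small, arbitrarily chosen `eps` to keep the logarithms finite.

```python
import math


def pointwise_bce(samples, eps=1e-7):
    """samples: (y_true, y_pred) pairs drawn from Y (y = 1) and Y^- (y = 0)."""
    loss = 0.0
    for y, p in samples:
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss


# One observed check-in and four sampled negatives, mirroring the 1:4 ratio used in the experiments
batch = [(1, 0.9), (0, 0.2), (0, 0.1), (0, 0.3), (0, 0.05)]
print(round(pointwise_bce(batch), 4))  # 0.8418
```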
The model has two more outputs: the users' social network $\psi_c E_u$ and the locations' distance graph $\psi_c E_p$. Two additional loss functions are therefore required to train the network. Following the process in [2], we assume that two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:
$$L_{u,context} = -\sum_{(u, u_c)} \left( \psi_{c} E_u - \log \sum_{u'_c \in C_u} \exp(\psi_{c'} E_u) \right) \qquad (13)$$
where $\psi_c E_u$ is as defined in (1). Taking the binary class label into account leads to the following loss function, which minimizes the cross-entropy loss of user $i$ and context $c$ with respect to the class label $y$:
where $I$ is a function that returns 1 if $y$ is in the given set and 0 otherwise. The loss function for the POI context follows the same logic and is omitted due to space limitations.
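The context loss in (13) is a softmax log-loss. For each (instance, context) pair, it rewards a high score $\psi_c E_u$ for the observed context relative to all candidate contexts. The sketch below uses dot-product scores between toy embedding vectors. It computes the log-normalizer with the usual max-shift for numerical stability. It shows the plain form of (13), not the label-aware variant derived from it.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sum_exp(values):
    m = max(values)  # shift by the maximum so exp() cannot overflow
    return m + math.log(sum(math.exp(v - m) for v in values))


def context_loss(pairs, instance_emb, context_emb):
    """pairs: (instance_id, context_id) tuples; embeddings: dicts of id -> vector.

    Loss = -sum over pairs of [score(c, u) - log sum_{c'} exp(score(c', u))],
    where c' ranges over every context embedding.
    """
    candidates = list(context_emb)
    loss = 0.0
    for u, c in pairs:
        e_u = instance_emb[u]
        scores = [dot(context_emb[k], e_u) for k in candidates]
        loss -= dot(context_emb[c], e_u) - log_sum_exp(scores)
    return loss


users = {"alice": [1.0, 0.0], "bob": [0.9, 0.1]}      # toy instance embeddings E_u
contexts = {"alice": [0.0, 1.0], "bob": [2.0, 0.0]}   # toy context vectors psi
print(round(context_loss([("alice", "bob")], users, contexts), 4))  # 0.1269
```

The loss shrinks as alice's embedding aligns with the context vector of her friend bob more than with the other candidates. This pull is what makes socially connected users end up with similar embeddings.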
We simultaneously minimize the three loss functions $L_{pred}$, $L_{u,context}$ and $L_{p,context}$. The joint optimization improves recommendation accuracy while enforcing similar representations for nearby locations and for users connected in the social network. The loss functions are combined using two hyperparameters that weight the contextual contribution.
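The combined objective referred to as (15) can be written as below. This assumes the additive weighting used in PACE [2], whose hyperparameter values this work adopts. $\lambda_1$ and $\lambda_2$ are the two contextual weights, which the experiments set to 0.1:

$$L = L_{pred} + \lambda_1 L_{u,context} + \lambda_2 L_{p,context}$$

For instance, take hypothetical loss values $L_{pred} = 0.84$, $L_{u,context} = 0.13$ and $L_{p,context} = 0.20$. The combined loss would be $0.84 + 0.1 \times 0.13 + 0.1 \times 0.20 = 0.873$. With weights this small, the prediction loss dominates and the context terms act as regularizers.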
To optimize the combined loss function, a gradient descent method can be used; specifically, we use Adaptive Moment Estimation (Adam) [38]. This optimizer adapts the learning rate for each parameter and converges faster than standard gradient descent. It also makes tuning the learning rate more efficient. To avoid overfitting during training, an early stopping criterion is integrated. The model parameters are initialized from a Gaussian distribution, while the output layer's parameters follow a uniform distribution.
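For reference, the Adam update from [38] and a simple patience-based early-stopping rule can be sketched as follows. The defaults β1 = 0.9, β2 = 0.999 and ε = 1e-8 are the values recommended in [38], and 0.005 is the learning rate reported in the experimental setup. The `patience` value, the larger toy learning rate and the quadratic objective are illustrative assumptions, because the paper does not detail its stopping criterion.

```python
import math


class Adam:
    """Adam update (Kingma and Ba) for a flat list of scalar parameters."""

    def __init__(self, lr=0.005, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimates
        self.v = None  # second-moment (uncentered variance) estimates
        self.t = 0

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


class EarlyStopping:
    """Signal a stop when the monitored loss has not improved for `patience` checks."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, loss):
        if loss < self.best:
            self.best, self.bad_epochs = loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy objective f(w) = (w - 3)^2 with gradient 2 (w - 3), for at most 50 epochs
params, opt, stopper = [0.0], Adam(lr=0.1), EarlyStopping(patience=3)
for epoch in range(50):
    params = opt.step(params, [2 * (params[0] - 3.0)])
    if stopper.should_stop((params[0] - 3.0) ** 2):
        break
print(params)
```

In real training, the monitored quantity would be the validation loss, not the training objective used in this toy loop.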
4 Experiments and Evaluation
4.1 Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews along with the users' friends and the businesses' locations. Because of the limited resources available for model evaluation, we filtered the dataset to a concise subset. All users and locations with fewer than 100 written reviews or fewer than 10 friends were removed. The filtered dataset includes 141,028 reviews, and its rating matrix has 98.08% sparsity. The social and geographical graphs were constructed by random walks. 10% of the original vertices were sampled as base nodes, and 20 (users) or 30 (locations) vertices were connected to each base node, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are at most 1 km apart.
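The graph construction described above can be sketched in two steps. First, POIs within 1 km are connected using the haversine great-circle distance. Second, random walks are sampled from base nodes, and each node is paired with the nodes within a window of 3 steps. Several details are illustrative assumptions, because the paper gives no details beyond the numbers quoted. These are the Earth radius, treating the 20/30 connected vertices as a walk length, the uniform choice of neighbors and the toy coordinates.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def poi_graph(coords, max_km=1.0):
    """Adjacency lists: two POIs are connected if they are at most max_km apart."""
    ids = list(coords)
    adj = {p: [] for p in ids}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if haversine_km(*coords[p], *coords[q]) <= max_km:
                adj[p].append(q)
                adj[q].append(p)
    return adj


def context_pairs(adj, base_nodes, walk_length, window=3, rng=None):
    """Random walk from each base node; pair each node with others within `window` steps."""
    rng = rng or random.Random(0)
    pairs = []
    for start in base_nodes:
        walk = [start]
        while len(walk) < walk_length and adj[walk[-1]]:
            walk.append(rng.choice(adj[walk[-1]]))
        for i, node in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    pairs.append((node, walk[j]))
    return pairs


coords = {"cafe": (43.4723, -80.5449), "bar": (43.4750, -80.5400), "museum": (43.6532, -79.3832)}
graph = poi_graph(coords)
print(graph)  # cafe and bar (about 0.5 km apart) are linked; the distant museum is isolated
print(len(context_pairs(graph, ["cafe"], walk_length=4)))
```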
To test our models' performance, we split the original data into training, validation and test sets by random sampling, with respective ratios of 56%, 24% and 20%. This yielded 78,899 training instances. In addition, the input data was negatively sampled with 4 negative locations for every positive one.
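The 56/24/20 split and the 4:1 negative sampling can be sketched as below. The sketch assumes a shuffled random split of observed (user, location) pairs. Negatives are drawn uniformly from locations the user has not reviewed. The function names, the seed, the toy data and the uniform sampling distribution are illustrative assumptions, not details given in the paper.

```python
import random


def split_pairs(pairs, ratios=(0.56, 0.24, 0.20), seed=0):
    """Shuffle observed (user, location) pairs and cut them into train/validation/test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_train = round(len(pairs) * ratios[0])
    n_val = round(len(pairs) * ratios[1])
    return pairs[:n_train], pairs[n_train:n_train + n_val], pairs[n_train + n_val:]


def add_negatives(positives, all_locations, visited, n_neg=4, seed=0):
    """Label positives 1 and add n_neg unvisited locations per positive, labelled 0."""
    rng = random.Random(seed)
    samples = []
    for user, loc in positives:
        samples.append((user, loc, 1))
        candidates = [p for p in all_locations if p not in visited[user]]
        for neg in rng.sample(candidates, min(n_neg, len(candidates))):
            samples.append((user, neg, 0))
    return samples


# Toy data: 10 users, each with 10 distinct visited locations out of 30
locations = [f"p{i}" for i in range(30)]
observed = [(f"u{u}", f"p{(3 * u + k) % 30}") for u in range(10) for k in range(10)]
visited = {}
for user, loc in observed:
    visited.setdefault(user, set()).add(loc)

train, val, test = split_pairs(observed)
print(len(train), len(val), len(test))  # 56 24 20
train_samples = add_negatives(train, locations, visited)
print(len(train_samples))  # 56 positives + 4 * 56 negatives = 280
```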
To compare our proposal fairly with the alternatives, we adopt the same settings as [2, 3]. The MLP input vectors have an embedding size of 10, and two hidden layers are added on top of the merged result. The layers follow the tower architecture, where each layer is half the size of its predecessor, so the first and second layers have 32 and 16 hidden units, respectively.
In the CNN component, each word is represented by a pretrained embedding layer with 50 units. The convolutional layer uses a window size of 10 and a stride of 3. It produces 3 feature maps, which are flattened after a max-pooling operation with a pool size of 2. The result is further modeled by a hidden layer with 32 units. After the outputs of the two subnetworks are merged, their interaction is learned by another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For training, we used a learning rate of 0.005, a maximum of 50 epochs and a batch size of 512 samples.
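The CNN settings above determine the size of the flattened feature vector, as the sketch below computes. It assumes a "valid" convolution (no padding), non-overlapping max-pooling and a ReLU activation, none of which the paper states explicitly. The toy `conv1d` uses 2-dimensional word vectors instead of 50-dimensional embeddings to keep the example short.

```python
def conv_output_length(n_words, window=10, stride=3):
    """Positions visited by a 'valid' (unpadded) 1-D convolution."""
    return (n_words - window) // stride + 1


def flattened_size(n_words, window=10, stride=3, n_filters=3, pool=2):
    """Length of the flattened vector after max-pooling every feature map."""
    return (conv_output_length(n_words, window, stride) // pool) * n_filters


def conv1d(embeddings, filters, stride=3):
    """embeddings: list of word vectors; filters: list of (window x dim) weight matrices."""
    window = len(filters[0])
    maps = []
    for f in filters:
        feature_map = []
        for start in range(0, len(embeddings) - window + 1, stride):
            patch = embeddings[start:start + window]
            score = sum(w * x
                        for row_w, row_x in zip(f, patch)
                        for w, x in zip(row_w, row_x))
            feature_map.append(max(0.0, score))  # ReLU (assumed activation)
        maps.append(feature_map)
    return maps


def max_pool(feature_map, pool=2):
    """Non-overlapping max-pooling; a trailing incomplete window is dropped."""
    return [max(feature_map[i:i + pool]) for i in range(0, len(feature_map) - pool + 1, pool)]


for n in (500, 3000):
    print(n, conv_output_length(n), flattened_size(n))  # 500 164 246, then 3000 997 1494
```

Under these assumptions, going from 500 to 3,000 words grows the flattened vector from 246 to 1,494 values before the 32-unit hidden layer. This is one reason longer inputs become costly.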
4.2 Baselines. To evaluate our algorithm, we compare it with the following seven empirically proven frameworks:
(vi) PACE [2]: Preference and Context Embedding. An MLP-based framework that adds contextual graph smoothing for POI recommendation.
(vii) DeepCoNN [14]: Deep Cooperative Neural Networks. A CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.
To evaluate our model and the baselines, we apply Accuracy and Mean Squared Error (MSE) over all n test samples. We also report Precision (Pre@10) and Recall (Rec@10), averaged over the top 10 predictions per user.
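The four metrics can be sketched as follows. The sketch assumes conventions that are common but not spelled out in the paper. Accuracy uses a 0.5 decision threshold, and MSE is computed between predicted probabilities and binary labels. Pre@10 and Rec@10 are computed per user against that user's held-out positive locations and then averaged.

```python
def accuracy(labels, probs, threshold=0.5):
    """Share of samples whose thresholded prediction matches the binary label."""
    return sum((p >= threshold) == bool(y) for y, p in zip(labels, probs)) / len(labels)


def mse(labels, probs):
    """Mean squared error between binary labels and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)


def precision_recall_at_k(ranked, relevant, k=10):
    """ranked: locations sorted by predicted score; relevant: set of held-out positives."""
    hits = len(set(ranked[:k]) & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    return hits / k, recall


def mean_precision_recall_at_k(rankings, truths, k=10):
    """Average Pre@k and Rec@k over the users that appear in both dicts."""
    scores = [precision_recall_at_k(rankings[u], truths[u], k) for u in rankings if u in truths]
    return (sum(p for p, _ in scores) / len(scores),
            sum(r for _, r in scores) / len(scores))


labels, probs = [1, 0, 1, 0], [0.8, 0.3, 0.4, 0.1]
print(accuracy(labels, probs))  # 0.75
ranked = [f"p{i}" for i in range(20)]
print(precision_recall_at_k(ranked, {"p0", "p5", "p15"}))  # 2 of 3 positives in the top 10
```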
The proposed models were implemented using Keras (https://keras.io) with the TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted on an Nvidia GTX 1070 GPU.
4.3 Performance Evaluation. Table 1 reports the performance of our proposed algorithms and the seven baselines, with the improvement ratio of TCENR over each method in brackets. The results are averaged over three individual executions.
Figure 3: Runtime (seconds) of all models on the Yelp dataset.
As the results show, TCENR achieves the best overall results compared to all baselines. Furthermore, it significantly outperforms HPF, NMF, Geo-SAGE, LCARS, PACE and DeepCoNN in terms of accuracy and MSE (p < 0.05, one-sided unpaired t-test). Compared to NeuMF, the contrasting precision and recall results suggest that TCENR offers fewer but more relevant recommendations to the user. NMF provides the best precision score of all methods, but it underperforms on every other measure, which makes it a less desirable model. Surprisingly, NeuMF outperforms PACE in accuracy, precision and recall. This may be because the tested dataset is less sparse, so the contextual regularization cannot be fully exploited. Representing each user and location by only the first 500 words of textual input may explain DeepCoNN's relatively low scores on the dataset. The performance of Geo-SAGE and LCARS shows that relying solely on geographical data does not let such models fully capture users' preferences in LBSNs.
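The significance test mentioned above can be sketched with a one-sided two-sample t statistic. Assume equal variances and three runs per model, since results are averaged over three executions. This gives 4 degrees of freedom, and the one-sided 5% critical value of Student's t with 4 degrees of freedom is 2.132. The accuracy values below are hypothetical placeholders, not results from Table 1. The paper does not state whether it used the pooled-variance or the Welch form of the test.

```python
import math
from statistics import mean, variance

T_CRIT_DF4_ONE_SIDED_05 = 2.132  # Student's t, 4 degrees of freedom, alpha = 0.05 one-sided


def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances) for H1: mean(a) > mean(b)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical accuracies from three runs of two models (NOT the paper's numbers)
model_a = [0.830, 0.828, 0.832]
model_b = [0.815, 0.818, 0.812]
t = pooled_t(model_a, model_b)
print(round(t, 2), t > T_CRIT_DF4_ONE_SIDED_05)
```

With only three runs per model, the critical value is large. Small but consistent gaps can still pass the test, while noisy ones will not.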
Comparing TCENR with its extension TCENRseq gives mixed results. By using an RNN instead of a CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and a better precision score, but worse accuracy and recall. One possible explanation is that accurately capturing different aspects of user reviews lets the model reinforce its hypotheses and reduce uncertainty in some cases. However, when textual aspects contradict the ground truth, it may choose the wrong class label. Nonetheless, the results show the importance of choosing the most suitable technique for each data type rather than applying a single method to all inputs. They also show the positive impact of using textual data together with historical activities. The reported performance further informs the choice between CNN and RNN for language modeling in future recommendation tasks.
To compare our frameworks and the seven baselines in terms of runtime, Figure 3 presents the average time required to fully train each method. TCENR is competitive with most baselines and more efficient than DeepCoNN and LCARS. The runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling. Because the recurrent layers increase the number of trainable parameters, our RNN-based extension takes 32.9% longer to train than TCENR while achieving comparable results.
4.4 Model Design Analysis. In this section, we discuss how several design choices affect the model's performance.
4.4.1 Merge Layer. The model's final layers combine the dense outputs of the MLP and convolutional networks. They require close attention, because they affect both the networks' ability to learn jointly and the prediction itself. To select the fusion operator, we considered the following methods:
(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted as TCENR_con and described in (9).
(ii) Merging the last hidden layers using the dot product, resulting in a model named TCENR_dot that can be defined as
[23] A Beutel P Covington S Jain et al ldquoLatent cross making useof context in recurrent recommender systemsrdquo inProceedings ofthe Eleventh ACM International Conference on Web Search andDataMining pp 46ndash54 ACMMarinaDel Rey CA USA 2018
[24] H Yin W Wang H Wang L Chen and X Zhou ldquoSpatial-aware hierarchical collaborative deep learning for POI rec-ommendationrdquo IEEE Transactions on Knowledge and DataEngineering vol 29 no 11 pp 2537ndash2551 2017
[25] H Yin X Zhou Y Shao H Wang and S Sadiq ldquoJointmodeling of user check-in behaviors for point-of-interest rec-ommendationrdquo in Proceedings of the 24th ACM Internationalon Conference on Information and Knowledge Management pp1631ndash1640 ACMMelbourne Australia October 2015
[26] P Zhao X Xu Y Liu et al ldquoExploiting hierarchical structuresfor POI recommendationrdquo in Proceedings of the 2017 IEEEInternational Conference on Data Mining (ICDM) IEEE NewOrleans LA USA November 2017
[27] P Zhao H Zhu Y Liu et al ldquoWhere to go next a spatio-temporal gated network for next poi recommendationrdquo inProceedings of the 33rdAAAIConference onArtificial Intelligence(AAAI 2019) 2019
[28] HWang F Zhang X Xie andMGuo ldquoDkn deep knowledge-aware network for news recommendationrdquo in Proceedings of the2018World Wide Web Conference pp 1835ndash1844 Lyon FranceApril 2018
[29] Y Gong and Q Zhang ldquoHashtag recommendation usingattention-based convolutional neural networkrdquo in Proceedingsof the 25th International Joint Conference on Artificial Intelli-gence IJCAI 2016 pp 2782ndash2788 NY USA July 2016
[30] Y Tay A T Luu and S C Hui ldquoMulti-pointer co-attentionnetworks for recommendationrdquo in Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Discovery ampData Mining pp 2309ndash2318 London UK August 2018
[31] Z Cheng Y Ding L Zhu and M Kankanhalli ldquoAspect-awarelatent factormodel rating prediction with ratings and reviewsrdquoin Proceedings of the 2018World WideWeb Conference pp 639ndash648 Lyon France April 2018
[32] D Tang BQin T Liu andY Yang ldquoUsermodeling with neuralnetwork for review rating predictionrdquo in Proceedings of the 24thInternational Conference on Artificial Intelligence IJCAI 2015pp 1340ndash1346 Argentina July 2015
[33] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2324 1998
[34] Y Kim ldquoConvolutional neural networks for sentence classi-ficationrdquo in Proceedings of the 2014 Conference on Empirical
Complexity 11
Methods in Natural Language Processing (EMNLP) pp 1746ndash1751 Association for Computational Linguistics Doha Qatar2014 httpsaclanthologyinfopapersD14-1181d14-1181
[35] Y Zhang Q Ai X Chen andW B Croft ldquoJoint representationlearning for top-N recommendation with heterogeneous infor-mation sourcesrdquo in Proceedings of the 2017 ACM on Conferenceon Information and Knowledge Management pp 1449ndash1458ACM Singapore Singapore November 2017
[36] J Pennington R Socher and C Manning ldquoGloVe globalvectors for word representationrdquo in Proceedings of the 2014 Con-ference on Empirical Methods in Natural Language Processing(EMNLP) pp 1532ndash1543 2014
[37] T Mikolov I Sutskever K Chen G S Corrado and JDean ldquoDistributed representations of words and phrases andtheir compositionalityrdquo in Proceedings of the 26th Interna-tional Conference on Neural Information Processing SystemsAdvances in neural information processing systems pp 3111ndash3119 Lake TahoeNevada 2013 httpsdlacmorgcitationcfmid=2999959
[38] D P Kingma and J Ba ldquoAdam a method for stochasticoptimizationrdquo 2014 httpsarxivorgabs14126980
[39] P Gopalan J M Hofman and DM Blei ldquoScalable recommen-dationwith hierarchical Poisson factorizationrdquo inProceedings ofthe 31st Conference on Uncertainty in Artificial Intelligence UAI2015 pp 326ndash335 Netherlands July 2015
[40] X Luo M Zhou Y Xia and Q Zhu ldquoAn efficient non-negativematrix-factorization-based approach to collaborative filteringfor recommender systemsrdquo IEEE Transactions on IndustrialInformatics vol 10 no 2 pp 1273ndash1284 2014
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
Complexity 7
with the users' friends and the businesses' locations. Due to the limited resources used in the model evaluation, we chose to filter the dataset and keep only a concise subset, where all users and locations with fewer than 100 written reviews or fewer than 10 friends are removed. The filtered dataset includes 141,028 reviews and a rating matrix with 98.08% sparsity. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, while 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are at most 1 km apart.
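The 1 km POI-graph rule and the sparsity figure above can be made concrete with a small sketch. The sketch rests on two assumptions. First, it measures great-circle (haversine) distance on a spherical Earth of radius 6371 km, which the text does not specify. Second, the names `build_poi_graph` and `rating_matrix_sparsity` are hypothetical and do not come from the authors' code.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_poi_graph(locations, max_km=1.0):
    """Return adjacency sets; two POIs are linked if they are at most max_km apart.

    locations: dict mapping a POI id to a (lat, lon) tuple in degrees.
    The pairwise check is O(n^2), which is acceptable for a filtered subset.
    """
    graph = {poi: set() for poi in locations}
    for a, b in combinations(locations, 2):
        if haversine_km(*locations[a], *locations[b]) <= max_km:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def rating_matrix_sparsity(num_users, num_items, num_ratings):
    """Fraction of user-item cells without an observed rating."""
    return 1.0 - num_ratings / (num_users * num_items)


if __name__ == "__main__":
    # Hypothetical coordinates: "a" and "b" are about 0.56 km apart, "c" is about 11 km from "a".
    pois = {"a": (43.4723, -80.5449), "b": (43.4773, -80.5449), "c": (43.5723, -80.5449)}
    print(build_poi_graph(pois))
```

A grid or k-d tree index would avoid the quadratic pairwise loop on larger datasets. The brute-force version keeps the connection rule easy to read.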
To test our models' performance, the original data was split into training, validation, and test sets by random sampling with respective ratios of 56%, 24%, and 20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled with 4 negative locations for every positive one.
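Below is a minimal sketch of the 56%/24%/20% random split and of drawing 4 negative locations per positive one. The text gives only the 4:1 ratio, so the sampling scheme here is an assumption: negatives are drawn uniformly, without replacement, from locations the user has not interacted with. All function names are hypothetical.

```python
import random


def split_train_val_test(samples, ratios=(0.56, 0.24, 0.20), seed=0):
    """Randomly split samples into train/validation/test sets (default 56%/24%/20%)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])  # the remainder (about 20%) is the test set


def add_negatives(positives, all_locations, num_neg=4, seed=0):
    """Label observed (user, location) pairs 1 and add num_neg unvisited locations labelled 0.

    Negatives are drawn uniformly without replacement from the locations a user
    has never interacted with (an assumption; the text only states the 4:1 ratio).
    """
    rng = random.Random(seed)
    visited = {}
    for user, loc in positives:
        visited.setdefault(user, set()).add(loc)
    locations = list(all_locations)
    # Precompute each user's pool of unvisited locations once.
    candidates = {u: [c for c in locations if c not in seen] for u, seen in visited.items()}
    labelled = []
    for user, loc in positives:
        labelled.append((user, loc, 1))
        for neg in rng.sample(candidates[user], num_neg):
            labelled.append((user, neg, 0))
    return labelled
```

`rng.sample` raises `ValueError` if a user has fewer than `num_neg` unvisited locations. On a filtered Yelp subset with thousands of locations that situation does not arise.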
To compare our proposal fairly with the alternatives, we adopt the same settings as in [2, 3]. The MLP input vectors are represented with an embedding size of 10, and two hidden layers are added on top of the merged result. Following the tower architecture, where each layer is half the size of its predecessor, the first and second hidden layers have 32 and 16 units, respectively.
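The tower MLP settings (embedding size 10, hidden layers of 32 and 16 units) can be sketched in plain Python. This is an illustrative forward pass with random weights, not the authors' Keras model. Two choices are assumptions borrowed from the usual NCF-style MLP: the user and location embeddings are merged by concatenation, and the hidden layers use ReLU activations. All names are hypothetical.

```python
import random


def tower_sizes(first_units, num_layers):
    """Tower pattern: every hidden layer has half the units of its predecessor."""
    return [first_units // 2 ** i for i in range(num_layers)]


def init_params(num_users, num_locs, emb_size=10, first_units=32, num_layers=2, seed=0):
    """Random embeddings and dense layers for an illustrative forward pass."""
    rng = random.Random(seed)
    params = {
        "user_emb": [[rng.gauss(0.0, 0.1) for _ in range(emb_size)] for _ in range(num_users)],
        "loc_emb": [[rng.gauss(0.0, 0.1) for _ in range(emb_size)] for _ in range(num_locs)],
        "layers": [],
    }
    n_in = 2 * emb_size  # concatenated user and location embeddings
    for units in tower_sizes(first_units, num_layers):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(units)]
        params["layers"].append((weights, [0.0] * units))
        n_in = units
    return params


def mlp_forward(params, user_id, loc_id):
    """Concatenate the two embeddings and pass them through the ReLU tower."""
    h = params["user_emb"][user_id] + params["loc_emb"][loc_id]
    for weights, bias in params["layers"]:
        h = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(weights, bias)]
    return h


if __name__ == "__main__":
    p = init_params(num_users=5, num_locs=7)
    print(tower_sizes(32, 2), len(mlp_forward(p, 0, 3)))
```

The 16-dimensional output is the representation that is later merged with the text subnetwork.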
In the CNN component, each word is represented by a pretrained embedding layer with 50 units, and the convolutional layer is constructed with a window size of 10 and a stride of 3. This produces 3 feature maps, which are flattened after a max-pooling operation with a pool size of 2. The result is further modeled by a hidden layer with 32 units. After the outputs of the two hidden layers are merged, their interaction is learned by another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters λ1 = λ2 = 0.1. For the training phase, a learning rate of 0.005 was used with a maximum of 50 epochs and a batch size of 512 samples.
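The convolution hyperparameters determine the size of the flattened text representation for a given number of words. The sketch assumes "valid" (unpadded) convolution and non-overlapping pooling, where the pool stride equals the pool size. These match the Keras defaults for `Conv1D` padding and `MaxPooling1D` strides, but the text does not state them. The ReLU activation is also an assumption, and the helper names are hypothetical.

```python
def conv_output_len(seq_len, window=10, stride=3):
    """Number of positions of a 'valid' (unpadded) 1-D convolution."""
    return (seq_len - window) // stride + 1


def cnn_flat_size(seq_len, window=10, stride=3, num_filters=3, pool=2):
    """Length of the flattened vector after convolution and non-overlapping max-pooling."""
    return (conv_output_len(seq_len, window, stride) // pool) * num_filters


def conv1d(seq, filters, stride=3):
    """Strided 1-D convolution with ReLU over a sequence of word vectors.

    seq: list of word embedding vectors (all the same length).
    filters: list of (kernel, bias); kernel is a list of `window` weight vectors.
    Returns one feature map per filter.
    """
    maps = []
    for kernel, bias in filters:
        window = len(kernel)
        fmap = []
        for start in range(0, len(seq) - window + 1, stride):
            s = bias
            for k_vec, w_vec in zip(kernel, seq[start:start + window]):
                s += sum(k * w for k, w in zip(k_vec, w_vec))
            fmap.append(max(0.0, s))  # ReLU
        maps.append(fmap)
    return maps


def max_pool(fmap, pool=2):
    """Non-overlapping max-pooling; a trailing incomplete window is dropped."""
    return [max(fmap[i:i + pool]) for i in range(0, len(fmap) - pool + 1, pool)]
```

With 500 words, the convolution yields (500 - 10) // 3 + 1 = 164 positions per filter. Pooling halves this to 82, and 3 filters give a 246-value vector before the 32-unit hidden layer. Keras flattens in position-major order, so the element order differs from this per-filter concatenation, but the size is the same.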
4.2. Baselines. To evaluate our algorithm, we chose to compare it to these seven empirically proven frameworks:
(vi) PACE [2]: Preference and Context Embedding, an MLP-based framework that adds contextual graph smoothing for POI recommendation.
(vii) DeepCoNN [14]: Deep Cooperative Neural Networks, a CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.
To evaluate our model and the baselines, we apply Accuracy and Mean Square Error (MSE) over all n test samples, as well as Precision (Pre@10) and Recall (Rec@10) for the top 10 predictions per user, averaged over users.
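The formulas for the four evaluation measures are not given in this section, so the sketch below assumes the common definitions:

- accuracy thresholds the predicted score at 0.5;
- MSE compares the predicted score with the 0/1 label;
- Pre@10 and Rec@10 are computed per user on that user's ranked list and then averaged.

The function names are hypothetical.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def mse(labels, scores):
    """Mean squared error between predicted scores and 0/1 labels over all n samples."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


def precision_recall_at_k(ranked, relevant, k=10):
    """Precision@k and recall@k for one user's ranked list of locations."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall_at_k(per_user, k=10):
    """Average precision@k and recall@k over users.

    per_user: dict mapping a user to (ranked list of locations, set of relevant locations).
    """
    pairs = [precision_recall_at_k(ranked, rel, k) for ranked, rel in per_user.values()]
    return (sum(p for p, _ in pairs) / len(pairs),
            sum(r for _, r in pairs) / len(pairs))
```

Dividing precision by k rather than by the list length penalizes users with fewer than k candidates. Other conventions exist, so published numbers are only comparable under the same definition.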
The proposed models were implemented using Keras (https://keras.io) on top of the TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted on an Nvidia GTX 1070 GPU.
4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are the average of three individual executions.
As the results show, the proposed model TCENR achieves the best results overall compared to all baselines. Furthermore, it significantly outperforms HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN in terms of accuracy and MSE (p < 0.05, one-sided unpaired t-test). The contrasting precision and recall results relative to NeuMF suggest that TCENR offers fewer but more relevant recommendations to the user. While NMF provides the best precision score of all methods, it underperforms in all other measures, making it a less desirable model. A closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully exploited. In addition, using only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of DeepCoNN on this dataset. The performance of Geo-SAGE and LCARS, in turn, shows that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.
Figure 3: Runtime (seconds) of all models on the Yelp dataset.
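The significance claim (one-sided unpaired t-test, p < 0.05, over three executions per model) can be checked with a short computation. The text does not say whether equal variances were assumed, so this sketch uses Student's pooled-variance t statistic. With 3 + 3 runs there are 4 degrees of freedom, and the one-sided 5% critical value is about 2.132. The run values below are made up for illustration and are not taken from Table 1.

```python
import math
from statistics import mean, variance

# One-sided critical value of Student's t for alpha = 0.05 with 4 degrees of freedom (3 + 3 - 2).
T_CRIT_ONE_SIDED_DF4 = 2.132


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (equal-variance assumption) for H1: mean(a) > mean(b)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


if __name__ == "__main__":
    # Made-up accuracies from three runs of two models (illustrative only).
    ours = [0.828, 0.830, 0.826]
    baseline = [0.810, 0.815, 0.812]
    t = pooled_t_statistic(ours, baseline)
    print(f"t = {t:.2f}, significant at 5%: {t > T_CRIT_ONE_SIDED_DF4}")
```

For accuracy, the model expected to score higher goes first. For MSE, where lower is better, swap the arguments.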
Comparing TCENR with its proposed extension, TCENR_seq, yields contrasting results. By employing an RNN instead of a CNN to extract user and location features from textual reviews, TCENR_seq achieves a lower error rate and an improved precision score, while accuracy and recall are worse. One possible explanation is that, by accurately capturing different aspects of user reviews, the model reinforces its hypotheses and therefore reduces uncertainty in some cases. When the textual aspects contradict the ground truth, however, it may choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable technique for each data type rather than applying a single method to all inputs. They also show the positive impact of using textual data in conjunction with historical activities. Finally, the reported performance offers insight into choosing between CNNs and RNNs for language modeling in future recommendation tasks.
To further compare our suggested frameworks and the seven baselines in terms of runtime, Figure 3 presents the average time required to fully train each method. The results show that TCENR is competitive with most baselines and more efficient than DeepCoNN and LCARS. The runtime of TCENR_seq further demonstrates the relative efficiency of CNN-based solutions for textual modeling. Because the recurrent layers increase the number of trainable parameters, our RNN-based extension takes 32.9% longer to train than TCENR while achieving comparable results.
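To illustrate why recurrent layers add trainable parameters, the sketch below counts parameters for two layers:

- a 1-D convolution with the settings given earlier (window 10, 50-dimensional word embeddings, 3 filters);
- an LSTM layer.

The text does not specify the recurrent layer type or width here, so the 32-unit LSTM is purely hypothetical. Parameter count is also only one driver of training time, since recurrent layers must process the sequence step by step.

```python
def conv1d_params(window, in_dim, filters):
    """Trainable parameters of a 1-D convolution with bias: window * in_dim * filters + filters."""
    return window * in_dim * filters + filters


def lstm_params(in_dim, units):
    """Trainable parameters of a standard LSTM layer with bias (four gates)."""
    return 4 * (units * (in_dim + units) + units)


if __name__ == "__main__":
    print("Conv1D:", conv1d_params(10, 50, 3))   # 10 * 50 * 3 + 3
    print("LSTM (hypothetical 32 units):", lstm_params(50, 32))
```

Both formulas follow the standard bias-included parameterization used by Keras `Conv1D` and `LSTM` layers.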
4.4. Model Design Analysis. In this section, we discuss the effect of several design choices on the suggested model's performance.
4.4.1. Merge Layer. The model's final layers, which combine the dense outputs of the MLP and convolutional networks, require close attention because they affect both the networks' ability to learn jointly and the prediction itself. To select the fusion operator, the following methods were considered:
(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted as TCENR_con and is described in (9).
(ii) Merging the last hidden layers using a dot product, resulting in a model named TCENR_dot, defined in (16).
(iii) Combining the two previously described methods, so that the two representations are jointly learned by concatenation and dot product. The resulting model is denoted as TCENR_dot_con and is obtained by adding (9) and (16) and mapping the result to the range [0, 1] with the sigmoid function.
(iv) Adopting a weighted average of the prediction results of the two networks. Denoted as TCENR_weight, this model can be defined as
$$y_{up} = \lambda_1 \sigma\left(W_5 \times h_{context} + b_5\right) + \lambda_2 \sigma\left(W_6 \times h_{reviews} + b_6\right) \qquad (18)$$
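The four fusion operators listed above can be compared side by side in a small sketch. Only `fuse_weight` follows an equation given in this section, (18). Equations (9) and (16) are not reproduced here, so the other forms are assumptions:

- the concatenation variant feeds [h_context; h_reviews] straight to a single sigmoid output unit, omitting the 8-unit hidden layer mentioned in the experimental settings;
- the dot-product variants require both representations to have the same length.

The weights λ1 = λ2 = 0.5 in the weighted variant are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fuse_con(h_ctx, h_rev, w, b):
    """Concatenate both representations, then one linear output unit and a sigmoid."""
    return sigmoid(dot(w, h_ctx + h_rev) + b)


def fuse_dot(h_ctx, h_rev):
    """Dot product of equally sized representations, squashed by a sigmoid."""
    return sigmoid(dot(h_ctx, h_rev))


def fuse_dot_con(h_ctx, h_rev, w, b):
    """Add the concatenation logit and the dot-product logit, then apply a sigmoid."""
    return sigmoid(dot(w, h_ctx + h_rev) + b + dot(h_ctx, h_rev))


def fuse_weight(h_ctx, h_rev, w5, b5, w6, b6, lam1=0.5, lam2=0.5):
    """Weighted average of two separate predictions, following the form of equation (18)."""
    return lam1 * sigmoid(dot(w5, h_ctx) + b5) + lam2 * sigmoid(dot(w6, h_rev) + b6)
```

`fuse_weight` keeps the two subnetworks' predictions separate until the very end. The concatenation variants, by contrast, let one output unit weigh features from both representations together, which matches the intuition behind the results discussed next.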
As shown in Figure 4, the simpler methods of weighted average and dot product lead to inferior performance of TCENR, demonstrating the added value of using the latent features learned by each subnetwork jointly. When combined with the underperforming dot product in TCENR_dot_con, concatenation improves over the dot product alone. However, because the two methods are integrated by a simple average, using concatenation alone, as in TCENR_con, produces the best results, so it was adopted in the final model.
Figure 4: Comparison of merging methods in terms of accuracy (TCENR_dot_con, TCENR_weight, TCENR_dot, and TCENR_con).
4.4.2. MLP Layer Design. Although [3] found that adding more layers and units to the MLP-based recommender has a positive effect, the use of a CNN and the additional hidden layer make this a subject worth investigating. To this end, we test the proposed algorithm with 1 to 4 hidden layers, used to learn the user-item interaction with contextual regularization, with sizes varying from 8 to 128 hidden units. The results in terms of test set accuracy are presented in Table 2, where columns give the number of hidden layers and rows give the size of the first layer. Unlike the earlier findings of [3], we find that two hidden layers with 32 and 16 hidden units give the best performance for our dataset.
Table 2: Model's accuracy with different layers.
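The layer-size search behind Table 2 can be enumerated as below: 1 to 4 hidden layers, a first layer of 8 to 128 units, and each following layer halving. Two values are assumptions, since neither is stated explicitly: the grid uses powers of two (8, 16, 32, 64, 128), and the input width of 20 assumes concatenated 10-dimensional user and location embeddings. The training step is left as a comment because it depends on the full model.

```python
def tower_sizes(first_units, num_layers):
    """Each hidden layer has half the units of its predecessor (at least 1)."""
    return [max(1, first_units // 2 ** i) for i in range(num_layers)]


def dense_params(layer_sizes, in_dim=20):
    """Weights plus biases of a stack of fully connected layers."""
    total = 0
    for units in layer_sizes:
        total += in_dim * units + units
        in_dim = units
    return total


def layer_grid(first_sizes=(8, 16, 32, 64, 128), depths=(1, 2, 3, 4)):
    """Map (depth, first-layer units) to the tower's layer sizes."""
    return {(d, f): tower_sizes(f, d) for f in first_sizes for d in depths}


if __name__ == "__main__":
    for (depth, first), sizes in sorted(layer_grid().items()):
        # A real search would train and evaluate each configuration here.
        print(depth, first, sizes, dense_params(sizes))
```

The grid has 20 configurations. The selected one, [32, 16], has 1,200 dense parameters under the assumed 20-dimensional input.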
4.4.3. Number of Words. Keeping written reviews in their original order lets CNNs and RNNs play to their strengths, learning a representation for every few words and eventually for the whole text. Our final dataset, however, is composed of very long reviews: fully learning a single user or location requires more than 20,000 words, which makes extracting relevant representations computationally expensive. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500 to 6000. As Figure 5 shows, accuracy improves slightly as the number of words increases up to 3000. Beyond that, additional words increase the bias towards users and locations with longer reviews and, in turn, reduce the model's learning capabilities.
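Limiting the text input to a fixed number of words amounts to three steps: concatenate a user's (or location's) reviews in their original order, keep the first N words, and pad shorter texts. Whitespace tokenization, the `<pad>` token, and both function names are assumptions made for this sketch.

```python
def build_word_sequence(reviews, max_words, pad_token="<pad>"):
    """Join reviews in their original order, keep the first max_words words, pad the rest."""
    words = [w for review in reviews for w in review.split()]
    words = words[:max_words]
    return words + [pad_token] * (max_words - len(words))


def truncation_rate(word_counts, max_words):
    """Fraction of users or locations whose full text is longer than max_words."""
    return sum(1 for c in word_counts if c > max_words) / len(word_counts)


if __name__ == "__main__":
    print(build_word_sequence(["great tacos", "slow service here"], 6))
```

Because truncation keeps only the beginning of the concatenated text, a larger limit mainly adds words for users and locations with long review histories. This is consistent with the bias described above.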
Figure 5: Number of words comparison in terms of accuracy (x-axis: number of words, 0 to 6000; y-axis: accuracy, about 0.822 to 0.834).
5. Conclusion and Future Work
In this paper, we developed a neural POI recommender system called TCENR. The model exploits data about users and locations, spatial data, social networks, and textual reviews to predict users' implicit preferences regarding POIs. TCENR models two types of user-location interactions: native check-ins regularized by contextual information, and the words used to describe the users' experiences. We further extended the proposed method into TCENR_seq, where textual data is modeled using an RNN instead of a CNN. Evaluated on the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.
For future work, we intend to extend our models' evaluation to additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data that includes new users and locations with few reviews.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.
References
[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975–984, ACM, San Francisco, California, USA, August 2016.
[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254, ACM, Halifax, NS, Canada, August 2017.
[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173–182, Perth, Australia, April 2017.
[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287–296, ACM, February 2011.
[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264, ACM, Sydney, NSW, Australia, August 2015.
[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221–229, ACM, Chicago, Illinois, USA, August 2013.
[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448–456, ACM, August 2011.
[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244, ACM, Sydney, NSW, Australia, August 2015.
[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429–1438, ACM, Singapore, Singapore, November 2017.
[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053–1058, IEEE, Barcelona, Spain, December 2016.
[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10, ACM, 2016.
[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.
[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 2643–2651, 2013.
[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434, ACM, Cambridge, UK, February 2017.
[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240, ACM, 2016.
[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.
[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147–154, ACM, Vienna, Austria, September 2015.
[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107–114, ACM, 2016.
[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.
[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), pp. 916–924, ACM Press, Mountain View, California, USA, October 2017.
[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198, ACM, 2016.
[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495–503, ACM, 2017.
[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54, ACM, Marina Del Rey, CA, USA, 2018.
[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537–2551, 2017.
[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640, ACM, Melbourne, Australia, October 2015.
[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.
[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835–1844, Lyon, France, April 2018.
[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pp. 2782–2788, NY, USA, July 2016.
[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318, London, UK, August 2018.
[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639–648, Lyon, France, April 2018.
[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015), pp. 1340–1346, Argentina, July 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.
[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449–1458, ACM, Singapore, Singapore, November 2017.
[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.
[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.
[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 326–335, Netherlands, July 2015.
[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Figure 3 Runtime (seconds) of all models on the Yelp dataset
sparse dataset tested which does not allow the contextualregularization to be fully harvested In addition the use ofonly the first 500words to represent the textual input for eachuser and location may explain the relatively low scores of theDeepCoNN model on the dataset while the performance ofGeo-SAGE and LCARS demonstrates that relying solely ongeographical data does not allow suchmodels to fully captureusersrsquo preferences in LBSNs
Comparing TCENR and its proposed extensionTCENRseq provides contrasting results By employingRNN instead of CNN to extract user and location featuresfrom textual reviews TCENRseq achieves lower error rateand improved precision score while accuracy and recallare worsened It may be considered that by accuratelycapturing different aspects from user reviews the modelis able to reinforce its hypotheses and therefore reducethe uncertainty in some cases However when faced witha contrast between textual aspects and the ground truthit might choose the wrong class label Nonetheless theresults demonstrate the importance of adopting the mostsuitable techniques and measures to learn different datatypes rather than employing a single method over all inputsMoreover it shows the positive impact of using textualdata in conjunction to historical activities The reportedperformance further suggests additional insight towards theselection of CNN and RNN for the task of language modelingin future recommendation tasks
To further evaluate our suggested frameworks and theseven baselines in terms of runtime the average time requiredto fully train each method is presented in Figure 3 Asdemonstrated by the results TCENR is competitivewithmostbaselines and found to be more efficient than DeepCoNNand LCARS The reported runtime of TCENRseq furtherdemonstrates the relative efficiency of CNN-based solutionsfor textual modeling tasks As the number of trainableparameters is increased due to the use of recurrent layers ourRNN-based extension takes 329 longer to train comparedto TCENR while achieving comparative results
44 Model Design Analysis In this section we discuss theeffect of several design selections over the suggested modelrsquosperformance
441Merge Layer The importance of themodelrsquos final layersresponsible for combining the dense output of both the MLPand convolutional networks requires a close attention as it
effects the networksrsquo ability to jointly learn and the predictionitself To properly select the fusion operator the followingmethods had been considered
(i) Combining the last hidden layers of the two modelsusing concatenation A model using this method willbe denoted as 119879119862119864119873119877119888119900119899 and described in (9)
(ii) Merging the last hidden layers using dot productresulting in a model named 119879119862119864119873119877119889119900119905 that can bedefined as
(iii) Combining the two previously described methods, so that the two representations are jointly learned by concatenation and dot product. The resulting model is denoted as TCENR_dot_con and can be obtained by combining (9) and (16) through addition and mapping the result to the range [0, 1] with the sigmoid function.
(iv) Adopting a weighted average of the prediction results of the two networks. Denoted as TCENR_weight, this model can be defined as
$$y_{up} = \lambda_1 \sigma\left(W_5 \times h_{context} + b_5\right) + \lambda_2 \sigma\left(W_6 \times h_{reviews} + b_6\right) \quad (18)$$
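The four fusion operators can be sketched in plain Python for a single user-location pair. This is a minimal illustration under stated assumptions, not the authors' implementation: Equations (9) and (16) are not reproduced in this section, so (i) is assumed to be concatenation followed by one sigmoid output unit and (ii) is assumed to be the sigmoid of the dot product of the two hidden vectors. Variant (iv) follows Equation (18), treating W_5 and W_6 as single-output weight vectors. The hidden vectors h_context and h_reviews and all weights are hypothetical inputs.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    """Logistic function mapping any real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(w: Vector, x: Vector, b: float = 0.0) -> float:
    """Single-output dense layer: w . x + b."""
    if len(w) != len(x):
        raise ValueError("weight and input sizes differ")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def merge_con(h_context: Vector, h_reviews: Vector, w: Vector, b: float) -> float:
    """(i) TCENR_con: concatenate both hidden vectors, then one sigmoid unit (assumed form of (9))."""
    return sigmoid(linear(w, h_context + h_reviews, b))


def merge_dot(h_context: Vector, h_reviews: Vector) -> float:
    """(ii) TCENR_dot: sigmoid of the dot product (assumed form of (16); needs equal sizes)."""
    return sigmoid(linear(h_context, h_reviews))


def merge_dot_con(h_context: Vector, h_reviews: Vector, w: Vector, b: float) -> float:
    """(iii) TCENR_dot_con: add the two pre-activation scores, then squash with the sigmoid."""
    score = linear(w, h_context + h_reviews, b) + linear(h_context, h_reviews)
    return sigmoid(score)


def merge_weight(h_context: Vector, h_reviews: Vector, w5: Vector, b5: float,
                 w6: Vector, b6: float, lam1: float, lam2: float) -> float:
    """(iv) TCENR_weight, Eq. (18): weighted sum of two separately squashed predictions."""
    return lam1 * sigmoid(linear(w5, h_context, b5)) + lam2 * sigmoid(linear(w6, h_reviews, b6))


# Hypothetical 2-dimensional hidden outputs of the context (MLP) and text subnetworks.
h_context = [0.2, -0.1]
h_reviews = [0.4, 0.3]
w = [0.5, -0.5, 0.25, 0.25]  # weights over the 4-dimensional concatenation
print(merge_con(h_context, h_reviews, w, 0.1))
print(merge_dot(h_context, h_reviews))
print(merge_dot_con(h_context, h_reviews, w, 0.1))
print(merge_weight(h_context, h_reviews, [1.0, 1.0], 0.0, [1.0, 1.0], 0.0, 0.5, 0.5))
```

The section does not say how λ_1 and λ_2 are set, so they are plain arguments here. Note the structural difference: (i) and (iii) let one output unit see both representations at once, while (iv) squashes each subnetwork's score separately before averaging.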
As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of jointly utilizing the latent features learned by each subnetwork. When combined with the underperforming dot product in TCENR_dot_con, concatenation improves over the dot product alone. However, since the two methods are integrated using a simple average, employing concatenation alone, as done in TCENR_con, produces the best results; it was therefore integrated into the final model.
4.4.2. MLP Layer Design. Although He et al. [3] found that adding more layers and units to an MLP-based recommender has a positive effect, our use of a CNN and the additional hidden layer suggest that this is a subject worth investigating.
Complexity 9
Figure 4: Comparison of merging methods (TCENR_dot_con, TCENR_weight, TCENR_dot, and TCENR_con) in terms of accuracy.
Table 2: Model's accuracy with different layers.
To this end, we test the proposed algorithm with 1 to 4 hidden layers, used to learn the user-item interaction with contextual regularization, in sizes varying from 8 to 128 hidden units. The results in terms of test set accuracy are presented in Table 2, where the columns give the number of hidden layers and the rows give the number of units in the first hidden layer. Unlike the findings of [3], we find that two hidden layers with 32 and 16 hidden units, respectively, yield the best performance on our dataset.
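The configuration grid behind Table 2 can be enumerated as follows. This sketch assumes that the first-layer sizes are the powers of two from 8 to 128 and that each subsequent hidden layer halves the previous one, the tower pattern used in [3]. The section confirms only the range of sizes, the 1 to 4 layer depths, and the best 32-16 configuration, so the halving rule and the exact size steps are assumptions.

```python
from typing import Dict, List, Tuple


def tower_sizes(first_units: int, num_layers: int) -> List[int]:
    """Hidden-layer widths when each layer halves the previous one (tower pattern)."""
    sizes: List[int] = []
    units = first_units
    for _ in range(num_layers):
        if units < 1:
            raise ValueError("too many layers for this first-layer width")
        sizes.append(units)
        units //= 2
    return sizes


def table2_grid(first_options: Tuple[int, ...] = (8, 16, 32, 64, 128),
                depths: Tuple[int, ...] = (1, 2, 3, 4)) -> Dict[Tuple[int, int], List[int]]:
    """Map (first-layer units, number of layers) to the assumed layer widths."""
    return {(first, depth): tower_sizes(first, depth)
            for first in first_options for depth in depths}


grid = table2_grid()
print(len(grid))       # 20
print(grid[(32, 2)])   # [32, 16]
print(grid[(128, 4)])  # [128, 64, 32, 16]
```

Under this assumption, the best configuration reported above corresponds to the grid cell with 32 first-layer units and two layers.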
4.4.3. Number of Words. Using written reviews in their original order allows the strengths of CNNs and RNNs to be exploited by finding the best representation for every few words and, eventually, for the whole text. Our final dataset, however, is composed of very long reviews: fully learning a single user or location requires more than 20,000 words, which makes extracting relevant representations computationally expensive. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500 to 6,000. As Figure 5 shows, accuracy improves slightly as the number of words increases up to 3,000, whereas additional words increase the bias towards users and locations with longer reviews and, in turn, reduce the model's learning capabilities.
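One simple way to impose the word budget described above is to concatenate each user's (or location's) reviews in their original order and then truncate or pad the result to a fixed length. The sketch below assumes whitespace tokenization, keeps the earliest words when truncating, and uses a hypothetical `<pad>` token; the section does not state which words were kept or how padding was done.

```python
from typing import List

PAD = "<pad>"  # hypothetical padding token


def build_document(reviews: List[str], max_words: int, pad: str = PAD) -> List[str]:
    """Concatenate reviews in their original order, then truncate or pad to max_words."""
    words = [word for review in reviews for word in review.split()]
    words = words[:max_words]                        # keep the earliest words
    return words + [pad] * (max_words - len(words))  # pad short documents


reviews = ["great tacos and friendly staff", "slow service but tasty food"]
print(build_document(reviews, 8))
# ['great', 'tacos', 'and', 'friendly', 'staff', 'slow', 'service', 'but']
print(build_document(reviews, 12)[-3:])
# ['food', '<pad>', '<pad>']
```

Entities whose reviews fall short of the budget are padded and those that exceed it are truncated, so a larger budget lets users and locations with long review histories contribute proportionally more real words than those with short ones.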
Figure 5: Comparison of the number of words in terms of accuracy.
5. Conclusion and Future Work
In this paper, we developed a neural POI recommender system called TCENR. The model exploits data about users, locations, spatial data, social networks, and textual reviews to predict the implicit preferences of users regarding POIs. TCENR models two types of user-location interactions: native check-ins regularized by contextual information, and the words used to describe the users' experiences. We further extended our proposed method and presented TCENRseq, where textual data is modeled using an RNN instead of a CNN. Evaluated on the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.
For future work, we intend to extend the evaluation of our models to additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data that includes new users and locations with few reviews.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.
References
[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975–984, ACM, San Francisco, California, USA, August 2016.
[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254, ACM, Halifax, NS, Canada, August 2017.
[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173–182, Perth, Australia, April 2017.
[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287–296, ACM, February 2011.
[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264, ACM, Sydney, NSW, Australia, August 2015.
[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221–229, ACM, Chicago, Illinois, USA, August 2013.
[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448–456, ACM, August 2011.
[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244, ACM, Sydney, NSW, Australia, August 2015.
[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429–1438, ACM, Singapore, Singapore, November 2017.
[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053–1058, IEEE, Barcelona, Spain, December 2016.
[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10, ACM, 2016.
[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.
[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 2643–2651, 2013.
[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434, ACM, Cambridge, UK, February 2017.
[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240, ACM, 2016.
[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.
[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147–154, ACM, Vienna, Austria, September 2015.
[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107–114, ACM, 2016.
[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.
[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), pp. 916–924, ACM Press, Mountain View, California, USA, October 2017.
[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198, ACM, 2016.
[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495–503, ACM, 2017.
[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54, ACM, Marina Del Rey, CA, USA, 2018.
[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537–2551, 2017.
[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640, ACM, Melbourne, Australia, October 2015.
[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.
[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835–1844, Lyon, France, April 2018.
[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 2782–2788, NY, USA, July 2016.
[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318, London, UK, August 2018.
[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639–648, Lyon, France, April 2018.
[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1340–1346, Argentina, July 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.
[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449–1458, ACM, Singapore, Singapore, November 2017.
[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.
[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.
[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, UAI 2015, pp. 326–335, Netherlands, July 2015.
[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
To this end we test the proposed algorithm with 1-4 hiddenlayers used to learn the user-item interaction with contextualregularization in varying sizes from 8 to 128 hidden unitsThe results in terms of test setrsquos accuracy are presented inTable 2 where the number of hidden layers is defined ascolumns and the size of the first unit is presented as rowsUnlike previous results we find that two hidden layers with32 and 16 hidden units result in the best performance for ourdataset
443 Number of Words The use of written reviews in theiroriginal order allows the strengths of CNN and RNN to beexploited by finding the best representation for every fewwords and eventually for the whole text Our final datasethowever is composed of very long reviews where to fullylearn a single user or location more than 20000 words arerequired making it computationally expensive to extract rel-evant representations To benefit from the sequential natureof the written reviews while keeping the solution feasible thenumber of words was limited to a range of 500-6000 As canbe witnessed from Figure 5 there is a slight improvement inaccuracy as the number of words increase up to 3000 whileadditional words result in an increased bias towards users andlocations with longer reviews and in turn reduce the modelrsquoslearning capabilities
5 Conclusion and Future Work
In this paper we developed a neural POI recommendersystem called TCENR The model exploits data about userslocations spatial data social networks and textual reviewsto predict the implicit preference of users regarding POIsTCENR models two types of user-location interactionsnative check-ins regularized by contextual information andthe words used to describe the usersrsquo experiences We furtherextended our proposed method and presented TCENRseqwhere textual data was modeled using RNN instead of CNNEvaluated over the Yelp dataset the proposed algorithms
0822
0825
0828
0831
0834
0 2000 4000 6000
Figure 5 Number of words comparison in terms of accuracy
consistently achieved superior results compared to sevenstate-of-the-art baselines in terms of accuracy and MSE
For future work we intend to extend our modelsrsquoevaluation over additional LBSN datasets In addition weplan to investigate the proposed frameworksrsquo contributionto the cold-start problem by analyzing its performance onadditional data while taking newusers and locationswith fewreviews into account
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
This work was supported in part by the National NaturalScience Foundation of China Grant (61572289) and NSERCDiscovery Grants
References
[1] H Li Y Ge R Hong and H Zhu ldquoPoint-of-interest rec-ommendations learning potential check-ins from friendsrdquo inProceedings of the 22nd ACM SIGKDD International Conferenceon Knowledge Discovery And Data Mining pp 975ndash984 ACMSan Francisco California USA August 2016
[2] C Yang L Bai C Zhang Q Yuan and J Han ldquoBridgingcollaborative filtering and semi-supervised learning a neuralapproach for poi recommendationrdquo in Proceedings of the 23rdACM SIGKDD International Conference on Knowledge Discov-ery andDataMining pp 1245ndash1254 ACMHalifaxNS CanadaAugust 2017
[3] X He L Liao H Zhang L Nie X Hu and T-S Chua ldquoNeuralcollaborative filteringrdquo in Proceedings of the 26th InternationalConference onWorldWideWeb InternationalWorldWideWebConferences Steering Committee pp 173ndash182 Perth AustraliaApril 2017
[4] H Ma D Zhou C Liu M R Lyu and I King ldquoRecommendersystems with social regularizationrdquo in Proceedings of the 4thACM International Conference onWeb Search and DataMiningpp 287ndash296 ACM February 2011
10 Complexity
[5] W Wang H Yin L Chen Y Sun S Sadiq and X ZhouldquoGeo-sage a geographical sparse additive generative model forspatial item recommendationrdquo in Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery andData Mining pp 1255ndash1264 ACM Sydney NSW AustraliaAugust 2015
[6] H Yin Y Sun B Cui Z Hu and L Chen ldquoLcars a location-content-aware recommender systemrdquo in Proceedings of the 19thACM SIGKDD International Conference on Knowledge Discov-ery andDataMining pp 221ndash229 ACMChicago Illinois USAAugust 2013
[7] C Wang and D M Blei ldquoCollaborative topic modeling for rec-ommending scientific articlesrdquo in Proceedings of the 17th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo11) pp 448ndash456 ACM August 2011
[8] HWang NWang andD-Y Yeung ldquoCollaborative deep learn-ing for recommender systemsrdquo in Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery andData Mining pp 1235ndash1244 ACM Sydney NSW AustraliaAugust 2015
[9] J Manotumruksa C Macdonald and I Ounis ldquoA deep recur-rent collaborative filtering framework for venue recommen-dationrdquo in Proceedings of the 2017 ACM on Conference onInformation and KnowledgeManagement pp 1429ndash1438 ACMSingapore Singapore November 2017
[10] Q Liu S Wu D Wang Z Li and L Wang ldquoContext-awaresequential recommendationrdquo in Proceedings of the 2016 IEEE16th International Conference on Data Mining (ICDM) pp1053ndash1058 IEEE Barcelona Spain December 2016
[11] H-T Cheng L Koc J Harmsen et al ldquoWide amp deep learningfor recommender systemsrdquo inProceedings of the 1stWorkshop onDeep Learning for Recommender Systems pp 7ndash10 ACM 2016
[12] Y Yu L Zhang C Wang R Gao W Zhao and J JiangldquoNeural personalized ranking via poisson factormodel for itemrecommendationrdquoComplexity vol 2019 Article ID 3563674 16pages 2019
[13] A Van Den Oord S Dieleman and B Schrauwen ldquoDeepcontent-based music recommendationrdquo in Proceedings of the26th International Conference on Neural Information ProcessingSystems Advances in neural information processing systemspp 2643ndash2651 2013
[14] L Zheng V Noroozi and P S Yu ldquoJoint deepmodeling of usersand items using reviews for recommendationrdquo in Proceedingsof the Tenth ACM International Conference on Web Search andData Mining pp 425ndash434 ACM Cambridge UK Feburary2017
[15] D Kim C Park J Oh S Lee andH Yu ldquoConvolutional matrixfactorization for document context-aware recommendationrdquoin Proceedings of the 10th ACM Conference on RecommenderSystems pp 233ndash240 ACM 2016
[16] B Hidasi A Karatzoglou L Baltrunas and D Tikk ldquoSession-based recommendations with recurrent neural networksrdquo 2015httpsarxivorgabs151106939
[17] A Almahairi K Kastner K Cho and A Courville ldquoLearningdistributed representations from reviews for collaborative filter-ingrdquo in Proceedings of the 9th ACMConference on RecommenderSystems pp 147ndash154 ACM Vienna Austria September 2015
[18] T Bansal D Belanger and A McCallum ldquoAsk the gru multi-task learning for deep text recommendationsrdquo in Proceedings ofthe 10th ACMConference on Recommender Systems pp 107ndash114ACM 2016
[19] J Chen W Zhang P Zhang P Ying K Niu and M ZouldquoExploiting spatial and temporal for point of interest recom-mendationrdquoComplexity vol 2018 Article ID 6928605 16 pages2018
[20] P Zhao X Xu Y Liu V S Sheng K Zheng and H XiongldquoPhoto2trip exploiting visual contents in geo-tagged photos forpersonalized tour recommendationrdquo in Proceedings of the 2017ACM on Multimedia Conference - MM 17 pp 916ndash924 ACMPress Mountain View California USA October 2017
[21] P Covington J Adams and E Sargin ldquoDeep neural networksfor youtube recommendationsrdquo in Proceedings of the 10th ACMConference on Recommender Systems pp 191ndash198 ACM 2016
[22] C-Y Wu A Ahmed A Beutel A J Smola and H JingldquoRecurrent recommender networksrdquo in Proceedings of the TenthACM International Conference onWeb Search and DataMiningpp 495ndash503 ACM 2017
[23] A Beutel P Covington S Jain et al ldquoLatent cross making useof context in recurrent recommender systemsrdquo inProceedings ofthe Eleventh ACM International Conference on Web Search andDataMining pp 46ndash54 ACMMarinaDel Rey CA USA 2018
[24] H Yin W Wang H Wang L Chen and X Zhou ldquoSpatial-aware hierarchical collaborative deep learning for POI rec-ommendationrdquo IEEE Transactions on Knowledge and DataEngineering vol 29 no 11 pp 2537ndash2551 2017
[25] H Yin X Zhou Y Shao H Wang and S Sadiq ldquoJointmodeling of user check-in behaviors for point-of-interest rec-ommendationrdquo in Proceedings of the 24th ACM Internationalon Conference on Information and Knowledge Management pp1631ndash1640 ACMMelbourne Australia October 2015
[26] P Zhao X Xu Y Liu et al ldquoExploiting hierarchical structuresfor POI recommendationrdquo in Proceedings of the 2017 IEEEInternational Conference on Data Mining (ICDM) IEEE NewOrleans LA USA November 2017
[27] P Zhao H Zhu Y Liu et al ldquoWhere to go next a spatio-temporal gated network for next poi recommendationrdquo inProceedings of the 33rdAAAIConference onArtificial Intelligence(AAAI 2019) 2019
[28] HWang F Zhang X Xie andMGuo ldquoDkn deep knowledge-aware network for news recommendationrdquo in Proceedings of the2018World Wide Web Conference pp 1835ndash1844 Lyon FranceApril 2018
[29] Y Gong and Q Zhang ldquoHashtag recommendation usingattention-based convolutional neural networkrdquo in Proceedingsof the 25th International Joint Conference on Artificial Intelli-gence IJCAI 2016 pp 2782ndash2788 NY USA July 2016
[30] Y Tay A T Luu and S C Hui ldquoMulti-pointer co-attentionnetworks for recommendationrdquo in Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Discovery ampData Mining pp 2309ndash2318 London UK August 2018
[31] Z Cheng Y Ding L Zhu and M Kankanhalli ldquoAspect-awarelatent factormodel rating prediction with ratings and reviewsrdquoin Proceedings of the 2018World WideWeb Conference pp 639ndash648 Lyon France April 2018
[32] D Tang BQin T Liu andY Yang ldquoUsermodeling with neuralnetwork for review rating predictionrdquo in Proceedings of the 24thInternational Conference on Artificial Intelligence IJCAI 2015pp 1340ndash1346 Argentina July 2015
[33] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2324 1998
[34] Y Kim ldquoConvolutional neural networks for sentence classi-ficationrdquo in Proceedings of the 2014 Conference on Empirical
Complexity 11
Methods in Natural Language Processing (EMNLP) pp 1746ndash1751 Association for Computational Linguistics Doha Qatar2014 httpsaclanthologyinfopapersD14-1181d14-1181
[35] Y Zhang Q Ai X Chen andW B Croft ldquoJoint representationlearning for top-N recommendation with heterogeneous infor-mation sourcesrdquo in Proceedings of the 2017 ACM on Conferenceon Information and Knowledge Management pp 1449ndash1458ACM Singapore Singapore November 2017
[36] J Pennington R Socher and C Manning ldquoGloVe globalvectors for word representationrdquo in Proceedings of the 2014 Con-ference on Empirical Methods in Natural Language Processing(EMNLP) pp 1532ndash1543 2014
[37] T Mikolov I Sutskever K Chen G S Corrado and JDean ldquoDistributed representations of words and phrases andtheir compositionalityrdquo in Proceedings of the 26th Interna-tional Conference on Neural Information Processing SystemsAdvances in neural information processing systems pp 3111ndash3119 Lake TahoeNevada 2013 httpsdlacmorgcitationcfmid=2999959
[38] D P Kingma and J Ba ldquoAdam a method for stochasticoptimizationrdquo 2014 httpsarxivorgabs14126980
[39] P Gopalan J M Hofman and DM Blei ldquoScalable recommen-dationwith hierarchical Poisson factorizationrdquo inProceedings ofthe 31st Conference on Uncertainty in Artificial Intelligence UAI2015 pp 326ndash335 Netherlands July 2015
[40] X Luo M Zhou Y Xia and Q Zhu ldquoAn efficient non-negativematrix-factorization-based approach to collaborative filteringfor recommender systemsrdquo IEEE Transactions on IndustrialInformatics vol 10 no 2 pp 1273ndash1284 2014
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
10 Complexity
[5] W Wang H Yin L Chen Y Sun S Sadiq and X ZhouldquoGeo-sage a geographical sparse additive generative model forspatial item recommendationrdquo in Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery andData Mining pp 1255ndash1264 ACM Sydney NSW AustraliaAugust 2015
[6] H Yin Y Sun B Cui Z Hu and L Chen ldquoLcars a location-content-aware recommender systemrdquo in Proceedings of the 19thACM SIGKDD International Conference on Knowledge Discov-ery andDataMining pp 221ndash229 ACMChicago Illinois USAAugust 2013
[7] C Wang and D M Blei ldquoCollaborative topic modeling for rec-ommending scientific articlesrdquo in Proceedings of the 17th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo11) pp 448ndash456 ACM August 2011
[8] HWang NWang andD-Y Yeung ldquoCollaborative deep learn-ing for recommender systemsrdquo in Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery andData Mining pp 1235ndash1244 ACM Sydney NSW AustraliaAugust 2015
[9] J Manotumruksa C Macdonald and I Ounis ldquoA deep recur-rent collaborative filtering framework for venue recommen-dationrdquo in Proceedings of the 2017 ACM on Conference onInformation and KnowledgeManagement pp 1429ndash1438 ACMSingapore Singapore November 2017
[10] Q Liu S Wu D Wang Z Li and L Wang ldquoContext-awaresequential recommendationrdquo in Proceedings of the 2016 IEEE16th International Conference on Data Mining (ICDM) pp1053ndash1058 IEEE Barcelona Spain December 2016
[11] H-T Cheng L Koc J Harmsen et al ldquoWide amp deep learningfor recommender systemsrdquo inProceedings of the 1stWorkshop onDeep Learning for Recommender Systems pp 7ndash10 ACM 2016
[12] Y Yu L Zhang C Wang R Gao W Zhao and J JiangldquoNeural personalized ranking via poisson factormodel for itemrecommendationrdquoComplexity vol 2019 Article ID 3563674 16pages 2019
[13] A Van Den Oord S Dieleman and B Schrauwen ldquoDeepcontent-based music recommendationrdquo in Proceedings of the26th International Conference on Neural Information ProcessingSystems Advances in neural information processing systemspp 2643ndash2651 2013
[14] L Zheng V Noroozi and P S Yu ldquoJoint deepmodeling of usersand items using reviews for recommendationrdquo in Proceedingsof the Tenth ACM International Conference on Web Search andData Mining pp 425ndash434 ACM Cambridge UK Feburary2017
[15] D Kim C Park J Oh S Lee andH Yu ldquoConvolutional matrixfactorization for document context-aware recommendationrdquoin Proceedings of the 10th ACM Conference on RecommenderSystems pp 233ndash240 ACM 2016
[16] B Hidasi A Karatzoglou L Baltrunas and D Tikk ldquoSession-based recommendations with recurrent neural networksrdquo 2015httpsarxivorgabs151106939
[17] A Almahairi K Kastner K Cho and A Courville ldquoLearningdistributed representations from reviews for collaborative filter-ingrdquo in Proceedings of the 9th ACMConference on RecommenderSystems pp 147ndash154 ACM Vienna Austria September 2015
[18] T Bansal D Belanger and A McCallum ldquoAsk the gru multi-task learning for deep text recommendationsrdquo in Proceedings ofthe 10th ACMConference on Recommender Systems pp 107ndash114ACM 2016
[19] J Chen W Zhang P Zhang P Ying K Niu and M ZouldquoExploiting spatial and temporal for point of interest recom-mendationrdquoComplexity vol 2018 Article ID 6928605 16 pages2018
[20] P Zhao X Xu Y Liu V S Sheng K Zheng and H XiongldquoPhoto2trip exploiting visual contents in geo-tagged photos forpersonalized tour recommendationrdquo in Proceedings of the 2017ACM on Multimedia Conference - MM 17 pp 916ndash924 ACMPress Mountain View California USA October 2017
[21] P Covington J Adams and E Sargin ldquoDeep neural networksfor youtube recommendationsrdquo in Proceedings of the 10th ACMConference on Recommender Systems pp 191ndash198 ACM 2016
[22] C-Y Wu A Ahmed A Beutel A J Smola and H JingldquoRecurrent recommender networksrdquo in Proceedings of the TenthACM International Conference onWeb Search and DataMiningpp 495ndash503 ACM 2017
[23] A. Beutel, P. Covington, S. Jain et al., “Latent cross: making use of context in recurrent recommender systems,” in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54, ACM, Marina Del Rey, CA, USA, 2018.
[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, “Spatial-aware hierarchical collaborative deep learning for POI recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537–2551, 2017.
[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, “Joint modeling of user check-in behaviors for point-of-interest recommendation,” in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640, ACM, Melbourne, Australia, October 2015.
[26] P. Zhao, X. Xu, Y. Liu et al., “Exploiting hierarchical structures for POI recommendation,” in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.
[27] P. Zhao, H. Zhu, Y. Liu et al., “Where to go next: a spatio-temporal gated network for next POI recommendation,” in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[28] H. Wang, F. Zhang, X. Xie, and M. Guo, “DKN: deep knowledge-aware network for news recommendation,” in Proceedings of the 2018 World Wide Web Conference, pp. 1835–1844, Lyon, France, April 2018.
[29] Y. Gong and Q. Zhang, “Hashtag recommendation using attention-based convolutional neural network,” in Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 2782–2788, NY, USA, July 2016.
[30] Y. Tay, A. T. Luu, and S. C. Hui, “Multi-pointer co-attention networks for recommendation,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318, London, UK, August 2018.
[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, “Aspect-aware latent factor model: rating prediction with ratings and reviews,” in Proceedings of the 2018 World Wide Web Conference, pp. 639–648, Lyon, France, April 2018.
[32] D. Tang, B. Qin, T. Liu, and Y. Yang, “User modeling with neural network for review rating prediction,” in Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1340–1346, Argentina, July 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[34] Y. Kim, “Convolutional neural networks for sentence classification,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.
[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, “Joint representation learning for top-N recommendation with heterogeneous information sources,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449–1458, ACM, Singapore, Singapore, November 2017.
[36] J. Pennington, R. Socher, and C. Manning, “GloVe: global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in neural information processing systems, pp. 3111–3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.
[38] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” 2014, https://arxiv.org/abs/1412.6980.
[39] P. Gopalan, J. M. Hofman, and D. M. Blei, “Scalable recommendation with hierarchical Poisson factorization,” in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, UAI 2015, pp. 326–335, Netherlands, July 2015.
[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, “An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems,” IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.