
A Neural Network Approach to Joint Modeling Social Networks and Mobile Trajectories

Cheng Yang, Tsinghua University, China

Maosong Sun, Tsinghua University, China

Wayne Xin Zhao, Renmin University, China

Zhiyuan Liu, Tsinghua University, China

Edward Y. Chang, HTC Research & Innovation

October 24, 2018

Abstract

The accelerated growth of mobile trajectories in location-based services brings valuable data resources for understanding users' moving behaviors. Apart from recording the trajectory data, another major characteristic of these location-based services is that they also allow the users to connect whomever they like or are interested in. A combination of social networking and location-based services is called a location-based social network (LBSN). As shown in [11], locations that are frequently visited by socially-related persons tend to be correlated, which indicates the close association between social connections and trajectory behaviors of users in LBSNs. In order to better analyze and mine LBSN data, we need a comprehensive view to analyze and mine the information from the two aspects, i.e., the social network and mobile trajectory data.

In this paper, we present a novel neural network model which can jointly model both social networks and mobile trajectories. Specifically, our model consists of two components: the construction of social networks and the generation of mobile trajectories. We first adopt a network embedding method for the construction of social networks: a network representation can be derived for a user. The key of our model lies in the component generating mobile trajectories. We have considered four factors that influence the generation process of mobile trajectories, namely user visit preference, influence of friends, short-term sequential contexts and long-term sequential contexts. To characterize the last two contexts, we employ the RNN and GRU models to capture the sequential relatedness in mobile trajectories at different levels, i.e., short term or long term. Finally, the two components are tied by sharing the user network representations. Experimental results on two important applications demonstrate the effectiveness of our model. Especially, the improvement over baselines is more significant when either network structure or trajectory data is sparse.

arXiv:1606.08154v1 [cs.SI] 27 Jun 2016


1 Introduction

In recent years, mobile devices (e.g., smartphones and tablets) have become widely used. With the innovation and development of Internet technology, mobile devices serve as an essential connection to the broader world of online information for users. In daily life, a user can use her smartphone for navigating numerous important life activities, including researching a travel plan, accessing online education, and looking for a job. The accelerated growth of mobile usage brings unique opportunities to data mining research communities. Among these rich mobile data, an important kind of data resource is the huge amount of mobile trajectory data obtained from GPS sensors on mobile devices. These sensor footprints provide a valuable information resource to discover users' trajectory patterns and understand their moving behaviors. Several location-based sharing services have emerged and received much attention, such as Gowalla1

and Brightkite2. Apart from recording user trajectory data, another major characteristic

of these location-based services is that they also allow the users to connect whomever they like or are interested in. For example, with Brightkite you can track your friends or any other Brightkite users nearby using the phone's built-in GPS. A combination of social networking and location-based services has led to a specific style of social networks, termed location-based social networks (LBSN) [1, 10]. We present an illustrative example for LBSNs in Fig. 1, and it can be seen that LBSNs usually include both the social network and mobile trajectory data. As shown in [11], locations that are frequently visited by socially-related persons tend to be correlated, which indicates the close association between social connections and trajectory behaviors of users in LBSNs. We need a comprehensive view to analyze and mine the information from the two aspects. In this paper, our focus is to develop a joint approach to model LBSN data by characterizing both the social network and mobile trajectory data.

(a) Friendship Network (b) User Trajectory

Figure 1: An illustrative example for the data in LBSNs: (a) Link connections represent the friendship between users. (b) A trajectory generated by a user is a sequence of chronologically ordered check-in records.

1 https://en.wikipedia.org/wiki/Gowalla
2 https://en.wikipedia.org/wiki/Brightkite


In the first aspect, social network analysis has attracted increasing attention during the past decade. It characterizes network structures in terms of nodes (individual actors, people, or things within the network) and the ties or edges (relationships or interactions) that connect them. A variety of applications have been developed on social networks, including network classification [35], link prediction [25], anomaly detection [6] and community detection [16]. A fundamental issue is how to represent network nodes. Recently, network embedding models [30] have been proposed in order to solve the data sparsity in networks. In the second aspect, location-based services provide a convenient way for users to record their trajectory information, usually called check-ins. Independent of social network analysis, many studies have been conducted to improve location-based services. A typical application task is location recommendation, which aims to infer users' visit preferences and make meaningful recommendations of locations for users to visit. It can be divided into three different settings: general location recommendation [8, 48], time-aware location recommendation [50, 51] and next-location recommendation [9, 47, 52]. General location recommendation generates an overall recommendation list of locations for a user to visit, while time-aware or next-location recommendation further imposes a temporal constraint on the recommendation task by either specifying the time period or producing sequential predictions.

These two aspects capture different data characteristics on LBSNs and tend to be correlated with each other [8, 22]. To conduct better and more effective data analysis and mining studies, there is a need to develop a joint model capturing both network structure and trajectory behaviors on LBSNs. However, such a task is challenging. Social networks and mobile trajectories are heterogeneous data types. A social network is typically characterized by a graph, while a trajectory is usually modelled as a sequence of check-in records. A commonly used way to incorporate social connections into an application system (e.g., recommender systems) is to adopt regularization techniques by assuming that the links convey user similarity. In this way, the social connections are exploited as side information but not characterized by a joint data model, and the model performance highly relies on the "homophily principle" of like associates with like. In this paper, we take the initiative to jointly model social networks and mobile trajectories using a neural network approach. Our approach is inspired by the recent progress in deep learning. First, several pioneering studies try to embed the vertices of a network into low-dimensional vector spaces [30, 37, 39], called network embedding. With such a low-dimensional dense vector, the data sparsity that a sparse network representation suffers from can be alleviated. Second, neural network models are powerful computational data models that are able to capture and represent complex input/output relationships. Especially, several neural network models for processing sequential data have been proposed, such as recurrent neural networks (RNN) [27]. RNN and its variants, including LSTM and GRU, have shown good performance in many applications.

By combining the merits of both network embedding and sequential modelling from deep learning, we present a novel neural network model which can jointly model both social networks and mobile trajectories. Specifically, our model consists of two components: the construction of social networks and the generation of mobile trajectories. We first adopt a network embedding method for the construction of social networks: a network representation can be derived for


a user. The key of our model lies in the component generating mobile trajectories. We have considered four factors that influence the generation process of mobile trajectories, namely user visit preference, influence of friends, short-term sequential contexts and long-term sequential contexts. The first two factors are mainly related to the users themselves, while the last two factors mainly reflect the sequential characteristics of historical trajectories. We set two different user representations to model the first two factors: a visit interest representation and a network representation. To characterize the last two contexts, we employ the RNN and GRU models to capture the sequential relatedness in mobile trajectories at different levels, i.e., short term or long term. Finally, the two components are tied by sharing the user network representations: the information from the network structure is encoded in the user network representation, which is subsequently utilized in the generation process of mobile trajectories.

To demonstrate the effectiveness of the proposed model, we evaluate it using real-world datasets on two important LBSN applications, namely next-location recommendation and friend recommendation. For the first task, the trajectory data is the major information signal while the network structure serves as auxiliary data. Our method consistently outperforms several competitive baselines. Interestingly, we have found that for users with little check-in data, the auxiliary data (i.e., network structure) becomes more important to consider. For the second task, the network data is the major information signal while trajectory data serves as auxiliary data. The finding is similar to that in the first task: our method still performs best, especially for those users with few friend links. Experimental results on the two important applications demonstrate the effectiveness of our model. In our approach, network structure and trajectory information complement each other. Hence, the improvement over baselines is more significant when either network structure or trajectory data is sparse.

Our contributions are three-fold, summarized below:

• We propose a novel neural network model to jointly characterize social network structure and users' trajectory behaviors. In our approach, network structure and trajectory information complement each other. It provides a promising way to characterize heterogeneous data types in LBSNs.

• Our model considers four factors in the generation of mobile trajectories, including user visit preference, influence of friends, short-term sequential contexts and long-term sequential contexts. The first two factors are modelled by two different embedding representations for users. The model further employs both RNN and GRU models to capture short-term and long-term sequential contexts.

• Experimental results on two important applications demonstrate the effectiveness of our model. Interestingly, the improvement over baselines is more significant when either network structure or trajectory information is sparse.

The rest of this paper is organized as follows. Section 2 reviews the related work and Section 3 presents the problem formulation. The proposed model together with the learning algorithm is given in Section 4. We present the experimental evaluation in Section 5. Section 6 concludes the paper and presents future work.


2 Related Work

Our work is mainly related to distributed representation learning, social link prediction and location recommendation.

2.1 Distributed Representation Learning and Neural Network Models

Machine learning algorithms based on data representation learning have achieved great success in the past few years. Representation learning of the data can extract useful information for learning classifiers and other predictors. Distributed representation learning has been widely used in many machine learning tasks [3], such as computer vision [20] and natural language processing [28].

During the last decade, many works have also been proposed for network embedding learning [7, 30, 38, 39]. Traditional network embedding learning algorithms learn vertex representations by computing eigenvectors of affinity matrices [2, 39, 45]. For example, DGE [7] solves a generalized eigenvector computation problem on the combinatorial Laplacian matrix; SocioDim [39] computes the k smallest eigenvectors of the normalized graph Laplacian matrix as k-dimensional vertex representations.

DeepWalk [30] adapts Skip-Gram [28], a widely used language model in natural language processing, for NRL on truncated random walks. DeepWalk, which leverages deep learning techniques for network analysis, is much more efficient than traditional NRL algorithms and makes large-scale NRL possible. Following this line, LINE [37] is a scalable network embedding algorithm which models the first-order and second-order proximities between vertices, and GraRep [5] characterizes local and global structural information for network embedding by computing an SVD on the k-step transition probability matrix.

TADW [46] and PTE [36] extend DeepWalk and LINE, respectively, by incorporating other information into NRL. TADW embeds text information into vertex representations through a matrix factorization framework, and PTE learns semi-supervised embeddings from heterogeneous text networks. However, both TADW and PTE conduct experiments on document networks and fail to take the sequential information between words into consideration.

Neural network models have achieved great success during the last decade. Two well-known neural network architectures are the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN). CNNs are used for extracting fixed-length representations from data of various sizes [20]. RNN and its variant GRU, which aim at sequential modeling, have been successfully applied in sentence modeling [27], speech signal modeling [12] and sequential click prediction [53].

2.2 Social Link Prediction

Social link prediction has been widely studied in various social networks by mining graph structure patterns such as the triadic closure process [34] and user demographics [19]. In this paper, we mainly focus on the applications on trajectory data.

Researchers used to measure user similarity by evaluating sequential patterns. For example, they used a sequence of stay points to represent a user trajectory and evaluated user similarity by a sequence matching algorithm [23]. In order to improve these methods, people also took pre-defined tags and weights into consideration to better characterize stay points [44]. As LBSNs become increasingly popular, trajectory similarity mining has attracted much more attention. A number of factors have been considered to better characterize the similarity. As a result, physical distance [13], location category [21], spatial or temporal co-location rate [41] and co-occurrence with time and distance constraints [31] were proposed for social link prediction. The diversity of co-occurrence and the popularity of locations [32] were proved to be important features among all the factors. Using associated social ties to cluster locations, the social strength can in turn be inferred from the extracted clusters shared by users [11].

2.3 Location Recommendation

One of the most important tasks in trajectory modeling is location recommendation. For general location recommendation, several kinds of side information are considered, such as geographical [8, 48] and social network information [22]. To address the data sparsity issue, content information including location category labels is also considered [49, 56]. The location labels and tags can also be used in probabilistic models such as aggregate LDA [18]. Textual information, including text descriptions [18, 24, 54], is applied for location recommendation as well. W4 employs tensor factorization for multi-dimensional collaborative recommendation over Who (user), What (location category), When (time) and Where (location) [4, 55]. However, these methods, which are mainly based on collaborative filtering, matrix factorization or LDA, do not model the sequential information in the trajectory.

For the time-aware location recommendation task, which recommends locations at a specific time, it is also worth modeling the temporal effect. A collaborative filtering based method [50] unifies temporal and geographical information with a linear combination. A geographical-temporal graph was proposed for time-aware location recommendation by propagating preferences on the graph [51]. In addition, the temporal effect has also been studied via nonnegative matrix factorization [17] and RNN [26].

Different from general location recommendation, next-location recommendation also needs to take the current state into account. Therefore, sequential information is more important to consider in next-location recommendation. Most previous works model sequential behaviors, i.e., trajectories of check-in locations, based on the Markov chain assumption, which assumes the next location is determined only by the current location and is independent of previous ones [9, 33, 47, 52]. For example, the Factorized Personalized Markov Chain (FPMC) algorithm [33] factorizes the tensor of the transition cube which includes the transition probability matrices of all users. Personalized Ranking Metric Embedding (PRME) [15] further extends FPMC by modeling user-location distances and location-location distances in two different vector spaces. The Hierarchical Representation Model (HRM) [42], which was originally designed for user purchase behavior modeling, can be easily adapted for modeling user trajectories. HRM builds a two-layer structure to predict the items in the next transaction from user features and the items in the last transaction. These methods are applied for next-location recommendation, which aims at predicting the next location that a user will visit, given the check-in history and current location of the user. Note that the Markov chain property is a strong assumption which posits that the next location is determined only by the current location. In practice, the next location may also be influenced by the entire check-in history.

3 Problem Formalization

We use L to denote the set of locations (a.k.a. check-in points or POIs). When a user v checks in at a location l at timestamp s, the information can be modeled as a triplet <v, l, s>. Given a user v, her trajectory T_v is a sequence of triplets related to v: <v, l_1, s_1>, ..., <v, l_i, s_i>, ..., <v, l_N, s_N>, where N is the sequence length and the triplets are ordered by timestamps ascendingly. For brevity, we rewrite the above formulation of T_v as a sequence of locations T_v = {l_1^{(v)}, l_2^{(v)}, ..., l_N^{(v)}} in chronological order. Furthermore, we can split a trajectory into multiple consecutive subtrajectories: the trajectory T_v is split into m_v subtrajectories T_v^1, ..., T_v^{m_v}. Each subtrajectory is essentially a subsequence of the original trajectory sequence. In order to split the trajectory, we compute the time interval between two check-in points in the original trajectory sequence, and we follow [9] to make a split when the time interval is larger than six hours. To this end, each user corresponds to a trajectory sequence T_v consisting of several consecutive subtrajectories T_v^1, ..., T_v^{m_v}. Let T denote the set of trajectories for all the users.
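As a concrete illustration of this splitting rule, the sketch below (Python; the `checkins` list and its timestamps are hypothetical, not from the paper's datasets) breaks one user's chronologically ordered check-ins into subtrajectories whenever the gap between consecutive check-ins exceeds six hours.

```python
from datetime import datetime, timedelta

# Hypothetical input: one user's check-ins as (location_id, timestamp),
# already sorted chronologically.
checkins = [
    ("L12", datetime(2010, 7, 1, 9, 0)),
    ("L47", datetime(2010, 7, 1, 11, 30)),
    ("L12", datetime(2010, 7, 2, 8, 15)),   # more than 6 hours after the previous check-in
]

def split_into_subtrajectories(checkins, gap=timedelta(hours=6)):
    """Split an ordered check-in list into subtrajectories wherever the
    time interval between consecutive check-ins exceeds `gap`."""
    subtrajectories, current = [], []
    for i, (loc, ts) in enumerate(checkins):
        if current and ts - checkins[i - 1][1] > gap:
            subtrajectories.append(current)
            current = []
        current.append(loc)
    if current:
        subtrajectories.append(current)
    return subtrajectories

print(split_into_subtrajectories(checkins))  # [['L12', 'L47'], ['L12']]
```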

Besides trajectory data, location-based services also provide social connection links among users. Formally, we model the social network as a graph G = (V, E), where each vertex v ∈ V represents a user and each edge e ∈ E represents the friendship between two users. In real applications, the edges can be either undirected or directed. As we will see, our model is flexible enough to deal with both types of social networks. Note that these links mainly reflect online friendship, which does not necessarily indicate that two users are friends in real life.

Given the social network information G = (V, E) and the mobile trajectory information T, we aim to develop a joint model which can characterize and utilize both kinds of data resources. Such a joint model should be more effective than models built with a single data resource alone. In order to test the model performance, we set up two application tasks in LBSNs.

Task I. For the task of next-location recommendation, our goal is to recommend a ranked list of locations that a user v is likely to visit next at each step.

Task II. For the task of friend recommendation, our goal is to recommend a ranked list of users that are likely to be the friends of a user v.

We select these tasks because they are widely studied in LBSNs, respectively representing the two aspects of mobile trajectory mining and social network analysis. Other tasks related to LBSN data could also be addressed by our model, but they are not our focus in this paper.

4 The Proposed Model

In this section, we present a novel neural network model for generating both social network and mobile trajectory data. In what follows, we first study how to characterize each individual component. Then, we present the joint model followed by the parameter learning algorithm. Before introducing the model details, we first summarize the notations used in this paper in Table 1.


Table 1: Notations used in this paper.

Notation       Description
V, E           vertex and edge set
L              location set
T_v, T_v^j     trajectory and the j-th subtrajectory of user v
m_v            number of subtrajectories in the trajectory T_v of user v
m_{v,j}        number of locations in the j-th subtrajectory of trajectory T_v of user v
l_i^{(v,j)}    the i-th location of the j-th subtrajectory of user v
U_{l_i}        representation of location l_i used in representation modelling
U'_{l_i}       representation of location l_i used for prediction
P_v, F_v       interest and friendship representation of user v
F'_v           context friendship representation of user v
S_i            short-term context representation after visiting location l_{i-1}
h_t            long-term context representation after visiting location l_{t-1}


4.1 Modeling the Construction of the Social Network

Recently, network representation learning has been widely studied [7, 30, 38, 39], and it provides a way to explore network structure patterns using low-dimensional embedding vectors. Not limited to discovering structure patterns, network representations have been shown to serve as effective features in many network-independent tasks, such as demographic prediction [19] and text classification [46]. In our task, we characterize the network representations based on two considerations. First, a user is likely to have visit behaviors similar to those of her friends, and user links can be leveraged to share common visit patterns. Second, the network structure is utilized as auxiliary information to enhance the trajectory modelling.

Formally, we use a d-dimensional embedding vector F_v ∈ R^d to denote the network representation of user v and a matrix F ∈ R^{|V|×d} to denote the network representations for all the users. The network representation is learned from the user links on the social network, and it encodes the information about the structural patterns of a user.

The social network is constructed based on users' network representations F. We first study how to model the generative probability of an edge v_i → v_j, formally Pr[(v_i, v_j) ∈ E]. The main intuition is that if two users v_i and v_j form a friendship link on the network, their network representations should be similar. In other words, the inner product F_{v_i}^T · F_{v_j} between the two corresponding network representations should yield a large similarity value for two linked users. A potential problem is that such a formulation can only deal with undirected networks. In order to characterize both undirected and directed networks, we propose to incorporate a context representation for a user v_j, i.e., F'_{v_j}. Given a directed link v_i → v_j, we model the representation similarity as F_{v_i}^T · F'_{v_j} instead of F_{v_i}^T · F_{v_j}. The context representations are only used in the network construction. We define the probability of a link v_i → v_j by using a sigmoid function as follows

\Pr[(v_i, v_j) \in E] = \sigma(F_{v_i}^\top \cdot F'_{v_j}) = \frac{1}{1 + \exp(-F_{v_i}^\top \cdot F'_{v_j})}.    (1)

When dealing with undirected networks, a friend pair (v_i, v_j) will be split into two directed links, namely v_i → v_j and v_j → v_i. For edges not existing in E, we propose to use the following formulation

\Pr[(v_i, v_j) \notin E] = 1 - \sigma(F_{v_i}^\top \cdot F'_{v_j}) = \frac{\exp(-F_{v_i}^\top \cdot F'_{v_j})}{1 + \exp(-F_{v_i}^\top \cdot F'_{v_j})}.    (2)

Combining Eqs. 1 and 2, we essentially adopt a Bernoulli distribution for modelling network links. Following studies on network representation learning [30], we assume that each user pair is independent in the generation process. That is to say, the probabilities Pr[(v_i, v_j) ∈ E | F] are independent for different pairs (v_i, v_j). With this assumption, we can factorize the generative probabilities by user pairs

L(G) = \sum_{(v_i, v_j) \in E} \log \Pr[(v_i, v_j) \in E] + \sum_{(v_i, v_j) \notin E} \log \Pr[(v_i, v_j) \notin E]
     = -\sum_{v_i, v_j} \log(1 + \exp(-F_{v_i}^\top \cdot F'_{v_j})) - \sum_{(v_i, v_j) \notin E} F_{v_i}^\top \cdot F'_{v_j}.    (3)
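To make Eqs. 1-3 concrete, the following sketch (NumPy; the toy network, dimensionality and random initialization are assumptions for illustration, not the paper's setup) computes the link probability of Eq. 1 and the network log likelihood of Eq. 3 by summing over linked and unlinked user pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, d = 5, 8
F = rng.normal(scale=0.02, size=(num_users, d))      # user representations F_v
F_ctx = rng.normal(scale=0.02, size=(num_users, d))  # context representations F'_v
edges = {(0, 1), (1, 0), (2, 3), (3, 2)}             # toy directed links

def link_prob(i, j):
    """Eq. 1: Pr[(v_i, v_j) in E] = sigmoid(F_i . F'_j)."""
    return 1.0 / (1.0 + np.exp(-F[i].dot(F_ctx[j])))

def network_log_likelihood():
    """Eq. 3: sum of log-probabilities over linked and unlinked pairs
    (self-pairs are skipped for simplicity in this sketch)."""
    ll = 0.0
    for i in range(num_users):
        for j in range(num_users):
            if i == j:
                continue
            p = link_prob(i, j)
            ll += np.log(p) if (i, j) in edges else np.log(1.0 - p)
    return ll

print(link_prob(0, 1), network_log_likelihood())
```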

4.2 Modeling the Generation of the Mobile Trajectories

In Section 3, a user trajectory is formatted as an ordered check-in sequence. Therefore, we model the trajectory generation process with a sequential neural network method. To generate a trajectory sequence, we generate the locations in it one by one, conditioned on four important factors. We first summarize the four factors below:

• General visit preference: A user's preference or habits directly determine her own visit behaviors.

• Influence of friends: The visit behavior of a user is likely to be influenced by her friends. Previous studies [8, 22] indeed showed that socially correlated users tend to visit common locations.

• Short-term sequential contexts: The next location is closely related to the last few locations visited by a user. The idea is intuitive in that the visit behaviors of a user in a short time window are usually related to a single activity or a series of related activities, so that the visited locations have strong correlations.

• Long-term sequential contexts: There is likely to be a long-term dependency among the locations visited by a user over a long time period. A specific case of long-term dependency is periodical visit behavior. For example, a user may regularly travel for vacation every summer.


The first two factors are mainly related to the two-way interactions between users and locations, while the last two factors mainly reflect the sequential relatedness among the locations visited by a user.

4.2.1 Characterization of General Visit Preference

We first characterize the general visit preference by interest representations. We use a d-dimensional embedding vector P_v ∈ R^d to denote the visit interest representation of user v and a matrix P ∈ R^{|V|×d} to denote the visit preference representations for all the users. The visit interest representation encodes the information about the general preference of a user over the set of locations in terms of visit behaviors.

We assume that one’s general visit interests are relatively stable and doesnot vary too much in a given period. Such an assumption is reasonable in thata user typically has a fixed lifestyle (e.g., with a relatively fixed residence area)and her visiting behaviors are likely to show some overall patterns. The visitinterest representation aims to capture and encode such visit patterns by usinga d-dimensional embedding vector. For convenience, we call Pv as the interestrepresentation for user v.

4.2.2 Characterization of Influence of Friends

For characterizing the influence of friends, a straightforward approach is to model the correlation between the interest representations of two linked users with some regularization terms. However, such a method usually has high computational complexity. In this paper, we adopt a more flexible method: we incorporate the network representation into the trajectory generation process. Because the network representations are learned through the network links, the information from a user's friends is implicitly encoded and used. We still use the formulation of the network representation F_v introduced in Section 4.1.

4.2.3 Characterization of Short-Term Sequential Contexts

Usually, the locations visited by a user in a short time window are closely correlated. A short sequence of visited locations tends to be related to some activity. For example, a sequence "Home → Traffic → Office" refers to one's transportation activity from home to office. In addition, geographical or traffic limits play an important role in the trajectory generation process. For example, a user is more likely to visit a nearby location. Therefore, when a user decides which location to visit next, the last few locations visited by herself should be of importance for next-location prediction.

Based on the above considerations, we treat the last few visited locations in a short time window as the sequential history and predict the next location based on them. To capture the short-term visit dependency, we use the Recurrent Neural Network (RNN), a convenient way of modelling sequential data, to develop our model. Formally, given the j-th subtrajectory T_v^j = {l_1^{(v,j)}, l_2^{(v,j)}, ..., l_{m_{v,j}}^{(v,j)}} from the trajectory of user v, we recursively define the short-term sequential relatedness as follows:

S_i = \tanh(U_{l_{i-1}} + W \cdot S_{i-1})    (4)


where S_i ∈ R^d is the embedding representation for the state after visiting location l_{i-1}, U_{l_i} ∈ R^d is the representation of location l_i^{(v,j)} and W ∈ R^{d×d} is a transition matrix. Here we call the S_i states, which are similar to those in Hidden Markov Models. The RNN resembles Hidden Markov Models in that the sequential relatedness is also reflected through the transitions between two consecutive states. A major difference is that in an RNN each hidden state is characterized by a d-dimensional embedding vector. As shown in Fig. 2, we derive the state representation S_i by forwarding S_{i-1} with a transformation matrix W and adding the embedding representation of the current location U_{l_{i-1}}. The initial representation S_0 is the same for all users because short-term correlation is supposed to be irrelevant to user preference in our model. Our formulation in Eq. 4 is essentially an RNN model without outputs. The embedding vector corresponding to each state can be understood as an information summary up to the corresponding location in the sequence. In particular, the state corresponding to the last location can be considered the embedding representation of the entire sequence.
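A minimal sketch of the recurrence in Eq. 4 (NumPy; the dimensionality, the location vocabulary and the random initialization are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_locations = 16, 100
U = rng.normal(scale=0.02, size=(num_locations, d))   # location representations U_l
W = rng.normal(scale=0.02, size=(d, d))               # transition matrix W
S0 = np.zeros(d)                                      # shared initial state S_0

def short_term_states(location_ids):
    """Eq. 4: S_i = tanh(U_{l_{i-1}} + W . S_{i-1}) over one subtrajectory."""
    S = S0
    states = [S]
    for loc in location_ids:
        S = np.tanh(U[loc] + W.dot(S))
        states.append(S)
    return states   # states[i] summarizes the first i visited locations

states = short_term_states([3, 17, 42])
print(states[-1].shape)   # (16,)
```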

Figure 2: An illustrative example of recurrent neural networks for modelling short-term sequential contexts.

4.2.4 Characterization of Long-Term Sequential Contexts

In the above, short-term sequential contexts (five locations on average in our datasets) aim to capture the sequential relatedness in a short time window. The long-term sequential contexts are also important to consider when modelling trajectory sequences. For example, a user is likely to show some periodical or long-range visit patterns. To capture the long-term dependency, a straightforward approach would be to use another RNN model over the entire trajectory sequence. However, the entire trajectory sequence generated by a user over a long time period tends to contain a large number of locations, e.g., several hundred locations or more. An RNN model over long sequences usually suffers from the problem of vanishing gradients.

To address this problem, we employ the Gated Recurrent Unit (GRU) for capturing the long-term dependency in the trajectory sequence. Compared with the traditional RNN, the GRU incorporates several extra gates to control the input and output. Specifically, we use two gates in our model: an input gate and a forget gate. With the help of the input and forget gates, the memory of the GRU, i.e., the state C_t, can remember the "important stuff" even when the sequence is very long and forget less important information if necessary. We present an illustrative figure of the architecture of recurrent neural networks with GRUs in Fig. 3.

Figure 3: An illustrative architecture of recurrent neural networks with GRUs. Let C̃_t denote a candidate state. The current state C_t is a mixture of the last state C_{t-1} and the current candidate state C̃_t. I_t and F_t are the input and forget gates respectively, which control this mixture.

Formally, consider a location sequence {l_1, l_2, ..., l_m}. We denote the initial state by C_0 ∈ R^d and the initial representation by h_0 = tanh(C_0) ∈ R^d. At timestep t, the new candidate state is updated as follows:

\tilde{C}_t = \tanh(W_{c_1} U_{l_t} + W_{c_2} h_{t-1} + b_c)    (5)

where W_{c_1} ∈ R^{d×d} and W_{c_2} ∈ R^{d×d} are model parameters, U_{l_t} is the embedding representation of location l_t, which is the same representation used in the short-term sequential relatedness, h_{t-1} is the embedding representation at the last step, and b_c ∈ R^d is a bias vector. Note that the computation of \tilde{C}_t remains the same as that in the RNN.

The GRU does not directly replace the state with \tilde{C}_t as the RNN does. Instead, the GRU tries to find a balance between the last state C_{t-1} and the new candidate state \tilde{C}_t:

C_t = i_t * \tilde{C}_t + f_t * C_{t-1}    (6)

where * is the entrywise product and i_t, f_t ∈ R^d are the input and forget gates respectively, defined as

i_t = \sigma(W_{i_1} U_{l_t} + W_{i_2} h_{t-1} + b_i)    (7)

f_t = \sigma(W_{f_1} U_{l_t} + W_{f_2} h_{t-1} + b_f)    (8)

where σ(·) is the sigmoid function, W_{i_1}, W_{i_2} ∈ R^{d×d} and W_{f_1}, W_{f_2} ∈ R^{d×d} are the input and forget gate parameters, and b_i, b_f ∈ R^d are bias vectors.

Finally, the representation of the long-term interest variation at timestep t is derived as follows:

h_t = \tanh(C_t).    (9)

Similar to Eq. 4, h_t provides a summary which encodes the information up to the t-th location in a trajectory sequence. We can recursively learn the representations after each visit of a location.
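The gated update of Eqs. 5-9 can be sketched as follows (NumPy; the parameter shapes follow the text, while the initialization scale and the toy input sequence are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

def init(shape):
    return rng.normal(scale=0.02, size=shape)

Wc1, Wc2 = init((d, d)), init((d, d))
Wi1, Wi2 = init((d, d)), init((d, d))
Wf1, Wf2 = init((d, d)), init((d, d))
bc, bi, bf = np.zeros(d), np.zeros(d), np.zeros(d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(U_lt, C_prev, h_prev):
    """One step of Eqs. 5-9: gated mixture of the previous state and a candidate."""
    C_tilde = np.tanh(Wc1.dot(U_lt) + Wc2.dot(h_prev) + bc)   # Eq. 5
    i_t = sigmoid(Wi1.dot(U_lt) + Wi2.dot(h_prev) + bi)       # Eq. 7
    f_t = sigmoid(Wf1.dot(U_lt) + Wf2.dot(h_prev) + bf)       # Eq. 8
    C_t = i_t * C_tilde + f_t * C_prev                        # Eq. 6
    h_t = np.tanh(C_t)                                        # Eq. 9
    return C_t, h_t

C, h = np.zeros(d), np.tanh(np.zeros(d))     # C_0 and h_0 = tanh(C_0)
for U_lt in init((5, d)):                    # a toy sequence of location embeddings
    C, h = gru_step(U_lt, C, h)
print(h.shape)   # (16,)
```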

4.2.5 The Final Objective Function for Generating Trajectory Data

Given the above discussions, we are now ready to present the objective function for generating trajectory data. Given the trajectory sequence T_v = {l_1^{(v)}, l_2^{(v)}, ..., l_m^{(v)}} of user v, we factorize the log likelihood according to the chain rule as follows:

L(T_v) = \log \Pr[l_1^{(v)}, l_2^{(v)}, \ldots, l_m^{(v)} \mid v, \Phi] = \sum_{i=1}^{m} \log \Pr[l_i^{(v)} \mid l_1^{(v)}, \ldots, l_{i-1}^{(v)}, v, \Phi],    (10)

where Φ denotes all the related parameters. As we can see, L(T_v) is characterized as a sum of log probabilities conditioned on the user v and the related parameters Φ. Recall that the trajectory T_v is split into m_v subtrajectories T_v^1, ..., T_v^{m_v}. Let l_i^{(v,j)} denote the i-th location in the j-th subtrajectory. The contextual locations for l_i^{(v,j)} contain the preceding (i − 1) locations (i.e., l_1^{(v,j)}, ..., l_{i-1}^{(v,j)}) in the same subtrajectory, denoted by l_1^{(v,j)} : l_{i-1}^{(v,j)}, and all the locations in the previous (j − 1) subtrajectories (i.e., T_v^1, ..., T_v^{j-1}), denoted by T_v^1 : T_v^{j-1}. With these notions, we can rewrite Eq. 10 as follows:

L(T_v) = \sum_{i=1}^{m} \log \Pr[l_i^{(v,j)} \mid \underbrace{l_1^{(v,j)} : l_{i-1}^{(v,j)}}_{\text{short-term contexts}}, \underbrace{T_v^1 : T_v^{j-1}}_{\text{long-term contexts}}, v, \Phi].    (11)

Given the target location l_i^{(v,j)}, the term l_1^{(v,j)} : l_{i-1}^{(v,j)} corresponds to the short-term contexts, the term T_v^1 : T_v^{j-1} corresponds to the long-term contexts, and v corresponds to the user context. The key problem becomes how to model the conditional probability Pr[l_i^{(v,j)} | l_1^{(v,j)} : l_{i-1}^{(v,j)}, T_v^1 : T_v^{j-1}, v, Φ].

For short-term contexts, we adopt the RNN model described in Eq. 4 to characterize the location sequence l_1^{(v,j)} : l_{i-1}^{(v,j)}. We use S_i^j to denote the derived short-term representation after visiting the i-th location in the j-th subtrajectory. For long-term contexts, the locations in the preceding subtrajectories T_v^1, ..., T_v^{j-1} are characterized using the GRU model in Eqs. 5-9. We use h_j to denote the derived long-term representation after visiting the locations in the first j subtrajectories. We present an illustrative example of the combination of short-term and long-term contexts in Fig. 4.

Figure 4: An illustrative figure for modelling both short-term and long-term sequential contexts. The locations in a rounded rectangle indicate a subtrajectory. The locations in red and blue rectangles are used for long-term and short-term sequential contexts respectively. "?" is the next location for prediction.

So far, given a target location l_i^{(v,j)}, we have obtained four representations corresponding to the four factors: the network representation (i.e., F_v), the visit interest representation (i.e., P_v), the short-term context representation S_{i-1}^j, and the long-term context representation h_{j-1}. We concatenate them into a single context representation R_v^{(i,j)} = [F_v; P_v; S_{i-1}^j; h_{j-1}] ∈ R^{4d} and use it for next-location generation. Given the context representation R_v^{(i,j)}, we define the probability of l_i^{(v,j)} as

\Pr[l_i^{(v,j)} \mid l_1^{(v,j)} : l_{i-1}^{(v,j)}, T_v^1 : T_v^{j-1}, v, \Phi] = \Pr[l_i^{(v,j)} \mid R_v^{(i,j)}] = \frac{\exp(R_v^{(i,j)} \cdot U'_{l_i^{(v,j)}})}{\sum_{l \in L} \exp(R_v^{(i,j)} \cdot U'_l)}    (12)

where the parameter U'_l ∈ R^{4d} is the location representation of location l ∈ L used for prediction. Note that this location representation U'_l is different from the location representation U_l ∈ R^d used in short-term and long-term context modelling. The overall log likelihood of trajectory generation can be computed by summing over all the locations.
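The prediction step of Eq. 12 can be sketched as follows (NumPy; the four component representations are random placeholders standing in for F_v, P_v, S_{i-1}^j and h_{j-1} produced by the components above, and the vocabulary size is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_locations = 16, 100
U_pred = rng.normal(scale=0.02, size=(num_locations, 4 * d))  # prediction embeddings U'_l

# Placeholder component representations for one prediction step.
F_v = rng.normal(size=d)       # network representation
P_v = rng.normal(size=d)       # visit interest representation
S_prev = rng.normal(size=d)    # short-term context S^j_{i-1}
h_prev = rng.normal(size=d)    # long-term context h_{j-1}

R = np.concatenate([F_v, P_v, S_prev, h_prev])   # R^{(i,j)}_v in R^{4d}

scores = U_pred.dot(R)                    # R . U'_l for every candidate location
probs = np.exp(scores - scores.max())     # softmax of Eq. 12 (shifted for stability)
probs /= probs.sum()

top5 = np.argsort(-probs)[:5]             # ranked list used for recommendation
print(top5, probs[top5])
```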

4.3 The Joint Model

Our general model is a linear combination of the objective functions for the two parts. Given the friendship network G = (V, E) and the user trajectories T, we have the following log likelihood function:

L(G, T) = L_{network}(G) + L_{trajectory}(T) = L(G) + \sum_{v \in V} L(T_v),    (13)

where L_{network}(G) is defined in Eq. 3 and L_{trajectory}(T) is the sum over all users of the trajectory log likelihood L(T_v) defined in Eq. 11. We name our model the Joint Network and Trajectory Model (JNTM).

We present an illustrative architecture of the proposed model JNTM in Fig. 5. Our model is a three-layer neural network for generating both the social network and the user trajectories. In training, both the social network and the user trajectories are required as the objective output to train the model. Based on such data signals, our model naturally consists of two objective functions. For generating the social network, a network-based user representation is incorporated; for generating the user trajectories, four factors are considered: the network-based representation, general visit preference, and short-term and long-term sequential contexts. The two parts are tied by sharing the network-based user representation.

Figure 5: An illustrative architecture of the proposed model JNTM.

4.4 Parameter Learning

Now we will show how to train our model and learn its parameters, i.e., the user interest representations P ∈ R^{|V|×d}, the user friendship representations F, F' ∈ R^{|V|×d}, the location representations U ∈ R^{|L|×d} and U' ∈ R^{|L|×4d}, the initial short-term representation S_0 ∈ R^d, the transition matrix W ∈ R^{d×d}, the initial GRU state C_0 ∈ R^d, and the GRU parameters W_{i_1}, W_{i_2}, W_{f_1}, W_{f_2}, W_{c_1}, W_{c_2} ∈ R^{d×d} and b_i, b_f, b_c ∈ R^d.

Negative Sampling. Recall that the log likelihood of network generation in Eq. 3 includes |V| × |V| terms. Thus it takes at least O(|V|^2) time to compute, which is time-consuming. Therefore we employ the negative sampling technique, which is commonly used in the NLP area [28], to accelerate the training process.


ALGORITHM 1: One Iteration of Network Generation

for each user v ∈ V do
    Randomly pick 100 vertices {v_1, ..., v_{100}} which are not connected with v;
    Compute the log likelihood \sum_{v': (v, v') \in E} \log \Pr[(v, v') \in E] + \sum_{k=1}^{100} \log \Pr[(v, v_k) \notin E];
    Compute the gradients of F and F' by back propagation;
end
Update F and F' according to the gradients;

Note that real-world networks are usually sparse, i.e., O(E) = O(V). The number of connected vertex pairs (positive examples) is much smaller than the number of unconnected vertex pairs (negative examples). The core idea of negative sampling is that most vertex pairs serve as negative examples and thus we do not need to compute all of them. Instead we compute all connected vertex pairs and n_1 random unconnected vertex pairs as an approximation, where n_1 ≪ |V|^2 is the number of negative samples. In our setting, we set n_1 = 100|V|. The log likelihood can be rewritten as

L(G \mid F, F') = \sum_{(v_i, v_j) \in E} \log \Pr[(v_i, v_j) \in E] + \sum_{k=1, (v_{i_k}, v_{j_k}) \notin E}^{n_1} \log \Pr[(v_{i_k}, v_{j_k}) \notin E].    (14)

Then the computation of the likelihood of network generation only includes O(E + n_1) = O(V) terms.

On the other hand, the computation of Eq. 12 takes at least O(|L|) time because the denominator contains |L| terms. Note that this conditional probability needs to be computed for every location. Therefore the computation of trajectory generation needs at least O(|L|^2) time, which is not efficient. Similarly, we do not compute every term in the denominator. Instead we only compute the term for location l_i^{(v,j)} and those for n_2 random locations. In this paper we use n_2 = 100. Then we reformulate Eq. 12 as

\Pr[l_i^{(v,j)} \mid R_v^{(i,j)}] = \frac{\exp(R_v^{(i,j)} \cdot U'_{l_i^{(v,j)}})}{\exp(R_v^{(i,j)} \cdot U'_{l_i^{(v,j)}}) + \sum_{k=1, l_k \neq l_i^{(v,j)}}^{n_2} \exp(R_v^{(i,j)} \cdot U'_{l_k})}.    (15)

Then the computation of the denominator only includes O(n_2 + 1) = O(1) terms. We compute the gradients of the parameters by back propagation through time (BPTT) [43]. The parameters are then updated with AdaGrad [14], a variant of stochastic gradient descent (SGD), in mini-batches.
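A sketch of the sampled approximation in Eq. 15 (NumPy; the vocabulary size, the target location and the scoring vectors are hypothetical, and n_2 = 100 follows the text):

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_locations, n2 = 16, 10000, 100
U_pred = rng.normal(scale=0.02, size=(num_locations, 4 * d))
R = rng.normal(size=4 * d)    # context representation R^{(i,j)}_v
target = 123                  # the observed next location l^{(v,j)}_i

# Draw n2 random negative locations and drop the target if sampled (Eq. 15).
negatives = rng.choice(num_locations, size=n2, replace=False)
negatives = negatives[negatives != target]

pos_score = np.exp(R.dot(U_pred[target]))
neg_scores = np.exp(U_pred[negatives].dot(R))
log_prob = np.log(pos_score / (pos_score + neg_scores.sum()))
print(log_prob)   # sampled log-likelihood of the observed location
```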

In more detail, we use the pseudocode in Algorithms 1 and 2 to illustrate the training process of our model. The network iteration and the trajectory iteration are executed alternately until the performance on the validation set becomes stable.

Complexity Analysis. The network generation for user v takes O(d) time to compute the log likelihood and the gradients of F_v and the corresponding rows of F'. Thus the complexity of network generation is O(d|V|).

In trajectory generation, we denote the total number of check-ins by |D|. The forward and backward propagation of the GRU then takes O(d^2|D|) time to compute, since the complexity of a single check-in is O(d^2).


ALGORITHM 2: One Iteration of Trajectory Generation

for each user v ∈ V do
    Compute the forward propagation of the GRU by Eqs. 5-9 and get the long-term context representations h_1, ..., h_{m_v};
    for each subtrajectory T_v^j of user v do
        for each location l_i^{(v,j)} of T_v^j do
            Update the short-term context representation by Eq. 4;
            Concatenate the four representations [F_v; P_v; S_{i-1}^j; h_{j-1}];
            Compute the log likelihood by Eq. 15;
            Compute the gradients of U', F_v, P_v, S_{i-1}^j and h_{j-1};
        end
        for i = m_{v,j}, ..., 1 do
            Compute the gradients of S_0, U, W through back propagation of the gradient of S_{i-1}^j;
        end
    end
    for j = m_v, ..., 1 do
        Compute the gradients of C_0, W_{i_1}, W_{i_2}, W_{f_1}, W_{f_2}, W_{c_1}, W_{c_2}, b_i, b_f, b_c through back propagation of the gradient of h_j;
    end
end
Update all parameters according to their gradients;

Each step of the RNN takes O(d^2) time to update the local dependency representation and compute the gradients of S_0, U and W. The computation of the log likelihood and the gradients of U', F_v, P_v, S_{i-1}^j and h_{j-1} also takes O(d^2) time. Hence the overall complexity of our model is O(d^2|D| + d|V|). Note that the representation dimension d and the number of negative samples per user/location are much smaller than the data sizes |V| and |D|. Hence the time complexity of our algorithm JNTM is linear in the data size and scalable for large datasets.

5 Experimental Evaluation

In this section, we evaluate the performance of our proposed model JNTM. We consider two application tasks, namely next-location recommendation and friend recommendation. In what follows, we discuss the data collection, baselines, parameter settings and evaluation metrics. Then we present the experimental results together with the related analysis.

5.1 Data Collection

We use two publicly available LBSN datasets3 [10], i.e., Gowalla and Brightkite, for our evaluation. Gowalla and Brightkite have released mobile apps for users. For example, with Brightkite you can track your friends or any other Brightkite users nearby using a phone's built-in GPS; Gowalla has a similar function: it uses GPS data to show where you are and what is near you.

These two datasets provide both connection links and users' check-in information. A connection link indicates reciprocal friendship, and a check-in record contains the location ID and the corresponding check-in timestamp. We organized the check-in information as trajectory sequences. Following [9], we split a trajectory wherever the interval between two successive check-ins is larger than six hours. We performed some preprocessing steps on both datasets. For Gowalla, we removed all users who have fewer than 10 check-ins and all locations which have fewer than 15 check-ins, and finally obtained 837,352 subtrajectories. For Brightkite, since this dataset is smaller, we only removed users who have fewer than 10 check-ins and locations which have fewer than 5 check-ins, and finally obtained 503,037 subtrajectories after preprocessing.

3 http://snap.stanford.edu/data/
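A minimal sketch of this filtering step (Python; `checkins_by_user` is a hypothetical mapping from user ID to (location, timestamp) records, the thresholds follow the Gowalla setting, and in practice the filtering may need to be applied iteratively):

```python
from collections import Counter

def filter_dataset(checkins_by_user, min_user_checkins=10, min_loc_checkins=15):
    """Drop locations with too few check-ins, then users with too few check-ins."""
    # Count check-ins per location over the whole dataset.
    loc_counts = Counter(loc for recs in checkins_by_user.values() for loc, _ in recs)
    filtered = {}
    for user, recs in checkins_by_user.items():
        recs = [(loc, ts) for loc, ts in recs if loc_counts[loc] >= min_loc_checkins]
        if len(recs) >= min_user_checkins:
            filtered[user] = recs
    return filtered
```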

Table 2 presents the statistics of the preprocessed datasets. Note that our datasets are larger than those used in previous works [9, 15].

Table 2: Statistics of the datasets. |V|: number of vertices; |E|: number of edges; |D|: number of check-ins; |L|: number of locations.

Dataset      |V|      |E|       |D|         |L|
Gowalla      37,800   390,902   2,212,652   58,410
Brightkite   11,498   140,372   1,029,959   51,866

5.2 Evaluation Tasks and Baselines

Next-Location Recommendation

For the task of next-location recommendation, we consider the following baselines:

• FPMC [33], a state-of-the-art recommendation algorithm, factorizes the tensor of transition matrices of all users and predicts the next location by computing the transition probability based on the Markov chain assumption. It was originally proposed for product recommendation; however, it is easy to adapt FPMC to next-location recommendation.

• PRME [15] extends FPMC by modeling user-location and location-location pairs in different vector spaces. PRME achieves state-of-the-art performance on the next-location recommendation task.

• HRM [42] is a recent algorithm for next-basket recommendation. By taking each subtrajectory as a transaction basket, we can easily adapt HRM for next-location recommendation. It is the first study to apply distributed representation learning to the recommendation problem.

We select these three baselines because they represent three different kinds of recommendation algorithms. FPMC was mainly developed in the matrix factorization framework, PRME made special extensions of FPMC to adapt to the task of next-location recommendation, and HRM adopted the distributed representation learning method for sequential modelling.

Next, we split the data collection into a training set and a test set. The first 90% of check-in subtrajectories of each user are used as the training data and the remaining 10% as test data. Given a user, we predict the locations in the test set in a sequential way: for each location slot, we recommend five


or ten locations to the user. For JNTM, we naturally rank the locations by the log likelihood shown in Eq. 12. Note that negative sampling is not used in evaluation. For the baselines, we rank the locations by the transition probability for FPMC and HRM and by the transition distance for PRME. Then we report Recall@5 and Recall@10 as the evaluation metrics, where Recall@K is defined as

\text{Recall@}K = \frac{\#\text{ ground truth locations in the } K \text{ recommended locations}}{\#\text{ ground truth locations in test data}}.

Note that Precision@K is another commonly used metric. In our experiments, we found it to be positively correlated with Recall@K, so we omit the corresponding results for brevity.
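Recall@K can be computed as in the following sketch (Python; the recommendation lists and ground-truth locations are hypothetical):

```python
def recall_at_k(recommendations, ground_truth, k=5):
    """Fraction of ground-truth locations that appear in the top-k recommendations.

    `recommendations` is a list of ranked location lists (one per test slot) and
    `ground_truth` is the corresponding list of actually visited locations.
    """
    hits = sum(1 for recs, truth in zip(recommendations, ground_truth)
               if truth in recs[:k])
    return hits / len(ground_truth)

# Example: two test slots, one hit in the top-5 list.
print(recall_at_k([["a", "b", "c", "d", "e"], ["x", "y", "z", "w", "v"]],
                  ["c", "q"], k=5))   # 0.5
```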

Friend Recommendation

For the task of friend recommendation, we consider three kinds of baselines based on the data resources used, including a method with only the network data (i.e., DeepWalk), a method with only the trajectory data (i.e., PMF), and methods with both network and trajectory data (i.e., PTE and TADW).

• DeepWalk [30] is a state-of-the-art NRL method which learns vertex embeddings from random walk sequences. It first employs the random walk algorithm to generate truncated random paths, and then applies the word embedding technique to learn the representations of network vertices.

• PMF [29] is a general collaborative filtering method based on user-item matrix factorization. In our experiments, we build the user-location matrix using the trajectory data, and then we utilize the user latent representations for friend recommendation.

• PTE: We adapt the semi-supervised text embedding algorithm PTE [36] for unsupervised embedding learning by removing the supervised part and optimizing over the adjacency matrix and the user-location co-occurrence matrix. Then we reconstruct the low-rank adjacency matrix from the learned vertex embeddings and use it for friend recommendation.

• TADW [46] further extends DeepWalk to take advantage of the text information of a network. We replace the text feature matrix in TADW with the user-location co-occurrence matrix, disregarding the sequential information of locations. Similar to PTE, we then reconstruct the low-rank adjacency matrix and use it for friend recommendation.

To construct the evaluation collection, we randomly select 20% to 50% of the existing connection links as the training set and leave the rest for testing. We recommend 5 or 10 friends for each user and report Recall@5 and Recall@10. The final results are compared by varying the training ratio from 20 to 50 percent.

The baseline methods and our model involve an important parameter, i.e., the number of latent (or embedding) dimensions. We use a grid search from 25 to 100 and set the optimal value using the validation set. Other parameters in the baselines and our model are tuned in a similar way. For our model, the learning rate and the number of negative samples are empirically set to 0.1 and 100, respectively. We randomly initialize parameters according to the uniform distribution U(−0.02, 0.02).
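As a rough illustration of this setup (a sketch under our own assumptions, not the released training code; train_model and validate are hypothetical callables), the grid search and parameter initialization described above could be organized as follows.

import numpy as np

LEARNING_RATE = 0.1      # learning rate reported above
NUM_NEG_SAMPLES = 100    # number of negative samples reported above
DIM_GRID = (25, 50, 75, 100)

def init_params(shape, rng=np.random.default_rng(0)):
    # All parameters are drawn from the uniform distribution U(-0.02, 0.02).
    return rng.uniform(-0.02, 0.02, size=shape)

def grid_search(train_model, validate, dim_grid=DIM_GRID):
    """train_model(dim, lr, neg_samples, init) and validate(model) are
    placeholders for the actual training and validation routines."""
    best_dim, best_recall = None, -1.0
    for dim in dim_grid:
        model = train_model(dim=dim, lr=LEARNING_RATE,
                            neg_samples=NUM_NEG_SAMPLES, init=init_params)
        recall = validate(model)   # e.g., Recall@5 on the validation set
        if recall > best_recall:
            best_dim, best_recall = dim, recall
    return best_dim, best_recall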

To rank candidate users, DeepWalk and PMF adopt the cosine similarity of user representations, while for PTE and TADW we rank users based on the corresponding entries of the reconstructed adjacency matrix.

For our model, we naturally rank users and locations according to their log likelihoods in Equations (1) and (12), respectively.
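The ranking step for the baselines can be sketched as follows (our own illustration with hypothetical array names, not the original evaluation code): DeepWalk and PMF rank candidates by cosine similarity of user embeddings, whereas PTE and TADW rank by entries of the low-rank adjacency matrix reconstructed from two learned factor matrices U and V.

import numpy as np

def rank_by_cosine(user_emb, u, k=10):
    """DeepWalk / PMF: rank candidate friends of user u by cosine similarity.
    user_emb: (num_users, dim) array of user representations."""
    sims = (user_emb @ user_emb[u]) / (
        np.linalg.norm(user_emb, axis=1) * np.linalg.norm(user_emb[u]) + 1e-12)
    sims[u] = -np.inf                      # exclude the user themself
    return np.argsort(-sims)[:k]

def rank_by_reconstruction(U, V, u, k=10):
    """PTE / TADW: rank candidates by the u-th row of the reconstructed
    low-rank adjacency matrix U V^T (U, V: hypothetical factor matrices)."""
    scores = V @ U[u]
    scores[u] = -np.inf
    return np.argsort(-scores)[:k]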

All experiments are executed on a 12-core CPU server with Intel Xeon E5-2620 CPUs @ 2.0GHz.

5.3 Experimental Results on Next-location Recommendation

Table 3 shows the results of different methods on next-location recommendation. Compared with FPMC and PRME, HRM models the sequential relatedness between consecutive subtrajectories, while the sequential relatedness within a subtrajectory is ignored. In the Brightkite dataset, the average number of locations in a subtrajectory is much smaller than that in the Gowalla dataset. Therefore, short-term sequential contexts are more important in the Gowalla dataset and less useful in the Brightkite dataset. The experimental results in Table 3 confirm this intuition: HRM outperforms FPMC and PRME on Brightkite, while PRME works best on Gowalla.

As shown in Table 3, our model JNTM consistently outperforms the baseline methods. JNTM yields 4.9% and 12.4% improvements in Recall@5 compared to the best baselines, HRM on the Brightkite dataset and PRME on the Gowalla dataset. Recall that JNTM considers four factors, namely user preference, influence of friends, and short-term and long-term sequential contexts, whereas the baseline methods only characterize user preference and a single kind of sequential context. Thus, JNTM achieves the best performance on both datasets.

Table 3: Results of different methods on next-location recommendation.

Dataset        Brightkite          Gowalla
Metric (%)     R@5     R@10        R@5     R@10
FPMC           45.6    53.8        24.9    31.6
PRME           44.6    53.0        31.9    38.2
HRM            46.2    56.4        26.2    37.0
JNTM           51.1    59.8        38.6    47.7

The above results are reported by averaging over all users. In recommender systems, an important issue is how a method performs in the cold-start setting, i.e., for new users or new items. To examine the effectiveness on new users with very few check-ins, we present results for users with no more than five subtrajectories. In a cold-start scenario, a common approach is to leverage side information (e.g., user links [8] and text information [18, 24, 54]) to alleviate data sparsity. For our model, we characterize two kinds of user representations, learned from either the network data or the trajectory data. The user representations learned from network data can be exploited to improve the recommendation performance for new users to some extent. Indeed, network representations have been applied to multiple network-independent tasks, including profession prediction [40] and text classification [46]. As shown in Table 4, by utilizing the network representations, our model JNTM is very promising for next-location recommendation in a cold-start setting.

Table 4: Results of next-location recommendation for users with no more than five subtrajectories.

Dataset        Brightkite          Gowalla
Metric (%)     R@5     R@10        R@5     R@10
FPMC           30.0    33.9        13.5    18.5
PRME           36.3    40.0        12.2    15.1
HRM            31.2    39.7        15.2    21.5
JNTM           52.2    57.4        23.1    29.4

Table 5: Friend recommendation results on the Brightkite dataset.

Training Ratio    20%           30%           40%           50%
Metric (%)        R@5   R@10    R@5   R@10    R@5   R@10    R@5   R@10
DeepWalk          2.3   3.8     3.9   6.7     5.5   9.2     7.4   12.3
PMF               2.1   3.6     2.1   3.7     2.3   3.4     2.3   3.8
PTE               1.5   2.5     3.8   4.7     4.0   6.6     5.1   8.3
TADW              2.2   3.4     3.6   3.9     2.9   4.3     3.2   4.5
JNTM              3.7   6.0     5.4   8.7     6.7   11.1    8.4   13.9

Table 6: Friend recommendation results on the Gowalla dataset.

Training Ratio    20%           30%           40%           50%
Metric (%)        R@5   R@10    R@5   R@10    R@5   R@10    R@5   R@10
DeepWalk          2.6   3.9     5.1   8.1     7.9   12.1    10.5  15.8
PMF               1.7   2.4     1.8   2.5     1.9   2.7     1.9   3.1
PTE               1.1   1.8     2.3   3.6     3.6   5.6     4.9   7.6
TADW              2.1   3.1     2.6   3.9     3.2   4.7     3.6   5.4
JNTM              3.8   5.5     5.9   8.9     7.9   11.9    10.0  15.1

In the above, we have shown the effectiveness of the proposed model JNTM on the task of next-location recommendation. Since trajectory data is itself sequential, our model leverages the flexibility of recurrent neural networks for modelling sequential data, including both short-term and long-term sequential contexts. We now study the effect of sequential modelling on this task.

We prepare three variants of our model JNTM:

• JNTMbase: it removes both short-term and long-term contexts. It only employs the user interest representation and network representation to generate the trajectory data.

• JNTMbase+short: it adds the modelling of short-term contexts to JNTMbase.

• JNTMbase+short+long: it adds the modelling of both short-term and long-term contexts to JNTMbase (a simplified illustration of the three scoring variants is sketched below).
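The three variants can be thought of as progressively richer scoring functions. The sketch below is a deliberately simplified illustration under our own assumptions (plain vector addition and a softmax over location embeddings); it is not the actual JNTM architecture, and all tensor names are hypothetical.

import numpy as np

def next_location_distribution(loc_emb, user_interest, user_network,
                               short_ctx=None, long_ctx=None):
    """loc_emb: (num_locations, dim); the remaining arguments are (dim,) vectors.
    JNTMbase uses only the two user representations; passing short_ctx adds
    short-term contexts (JNTMbase+short), and passing long_ctx as well adds
    long-term contexts (JNTMbase+short+long)."""
    h = user_interest + user_network
    if short_ctx is not None:
        h = h + short_ctx
    if long_ctx is not None:
        h = h + long_ctx
    scores = loc_emb @ h
    scores -= scores.max()                 # for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()             # softmax over candidate locations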

Table 7: Performance comparison of the three variants of JNTM on next-location recommendation.

Dataset                Brightkite                Gowalla
Metric (%)             R@1    R@5    R@10        R@1    R@5    R@10
JNTMbase               20.2   49.3   59.5        15.0   37.7   46.3
JNTMbase+short         21.6   49.7   59.6        15.5   38.4   47.5
JNTMbase+short+long    23.7   51.1   59.8        16.1   38.8   47.7

Table 7 shows the experimental results of the three JNTM variants on the Brightkite and Gowalla datasets. We can observe a clear performance ranking: JNTMbase < JNTMbase+short < JNTMbase+short+long. This indicates that both kinds of sequential contexts are useful for improving the performance of next-location recommendation.

5.4 Experimental Results on Friend Recommendation

We continue to present and analyze the experimental results on the task of friend recommendation. Tables 5 and 6 show the results when the training ratio varies from 20% to 50%.

Among the baselines, DeepWalk performs best, even better than the baselines using both network data and trajectory data (i.e., PTE and TADW). A major reason is that DeepWalk is tailored to the reconstruction of network connections and adopts a distributed representation method to capture the topological structure. As indicated in related studies [30, 37], distributed representation learning is particularly effective for network embedding. Although PTE and TADW utilize both network and trajectory data, their performance is still low, since these two methods cannot capture the sequential relatedness in trajectory sequences. Another observation is that PMF (i.e., factorizing the user-location matrix) is better than PTE at a training ratio of 20% but becomes the worst baseline as the ratio increases. This is because PMF learns user representations using only the trajectory data, while the labeled data (i.e., the links) is mainly used for training a classifier.

Our algorithm is competitive with the state-of-the-art network embedding method DeepWalk and outperforms it when the network structure is sparse. The explanation is that trajectory information is more useful when network information is insufficient; as the network becomes dense, trajectory information is not as useful as the connection links. To verify this explanation, we further report the results for users with fewer than five friends when the training ratio is 50%. As shown in Table 8, our method yields 2.1% and 1.5% improvements over DeepWalk for these inactive users on the Brightkite and Gowalla datasets, respectively. The results indicate that trajectory information helps improve the performance of friend recommendation for users with very few friends.

Table 8: Friend recommendation results for users with fewer than five friends when the training ratio is 50%.

Dataset        Brightkite          Gowalla
Metric (%)     R@5     R@10        R@5     R@10
DeepWalk       14.0    18.6        19.8    23.5
JNTM           16.1    20.4        21.3    25.5

In summary, our method significantly outperforms existing state-of-the-art methods on both next-location recommendation and friend recommendation. Experimental results on both tasks demonstrate the effectiveness of the proposed model.

5.5 Parameter Tuning

In this section, we study how different parameters affect the performance of our model. We focus on two important parameters, i.e., the number of training iterations and the number of embedding dimensions.

Figure 6: Performance of the iteration number on the Brightkite dataset: (a) network log likelihood, (b) trajectory log likelihood, (c) Recall@5 and Recall@10.

Figure 7: Performance of the iteration number on the Gowalla dataset: (a) network log likelihood, (b) trajectory log likelihood, (c) Recall@5 and Recall@10.

We conduct the tuning experiments by varying the number of iterations from 5 to 50. We report the log likelihood of the network and trajectory data on the training sets, and Recall@5 and Recall@10 of next-location recommendation on the validation sets.

Figures 6 and 7 show the tuning results for the iteration number on both datasets. From the results, we can see that our algorithm converges within 50 iterations on both datasets, and the growth of the log likelihood slows down after 30 iterations. On the other hand, the next-location recommendation performance on the validation sets is relatively stable: JNTM yields a reasonably good performance after only 5 iterations. The recall values then increase slowly and reach their highest scores at 45 iterations on Brightkite and 15 iterations on Gowalla. The Gowalla dataset converges more quickly and smoothly than Brightkite, mainly because it contains about three times more check-in data than Brightkite and thus provides sufficient training data. To summarize, an iteration number of 40 is a reasonably good choice.

The number of embedding dimensions is also vital to the performance of our model. A larger dimensionality provides stronger expressive ability but is also more likely to lead to overfitting. We conduct experiments with different embedding dimensionalities on next-location recommendation and measure the performance on the validation sets. In Fig. 8, we can see that the performance of our algorithm is relatively stable when the dimension number varies from 25 to 100, while the recall values start to decrease once the dimension number exceeds 50. We finally set the dimension number to 50.
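A minimal sketch of this tuning procedure is given below (assuming hypothetical train_one_iteration and evaluate callables; this is not the authors' code). It simply records the training log likelihoods and validation recalls after each iteration, so that curves like those in Figures 6 and 7 can be plotted and the iteration number chosen on the validation set.

def tune_iterations(train_one_iteration, evaluate, max_iters=50):
    """train_one_iteration(): one pass over the training data (placeholder);
    evaluate(): returns a dict with the training log likelihoods and the
    validation Recall@5 / Recall@10 (placeholder)."""
    history = []
    for it in range(1, max_iters + 1):
        train_one_iteration()
        history.append({"iteration": it, **evaluate()})
    return history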

Figure 8: Performance tuning with different dimension numbers: (a) Brightkite, (b) Gowalla.

6 Conclusion and Future Work

In this paper, we presented a novel neural network model to jointly model social networks and mobile trajectories. Specifically, our model consists of two components: the construction of social networks and the generation of mobile trajectories. We first adopted a network embedding method for the construction of social networks. We then considered four factors that influence the generation process of mobile trajectories, namely user visit preference, influence of friends, short-term sequential contexts and long-term sequential contexts. To characterize the last two contexts, we employed the RNN and GRU models to capture the sequential relatedness in mobile trajectories at different levels, i.e., short term and long term. Finally, the two components were tied by sharing the user network representations. On two important application tasks, our model was consistently better than several competitive baseline methods. In our approach, network structure and trajectory information complement each other; hence, the improvement over baselines was more significant when either the network structure or the trajectory data is sparse.

Currently, our model does not consider GPS information, i.e., the pair of longitude and latitude values usually attached to a check-in record, since our focus mainly lies in how to jointly model social networks and mobile trajectories. As future work, we will study how to incorporate GPS information into the neural network models. In addition, check-in locations can also be attached with categorical labels, and we will investigate how to leverage such semantic information to improve the performance; it can also be used to explain the generated recommendations. Our current model provides a flexible neural network framework for characterizing LBSN data, and we believe it will inspire more follow-up studies along this direction.

References

[1] J. Bao, Y. Zheng, and M. F. Mokbel. Location-based and preference-aware recommendation using sparse geo-social networking data. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pages 199–208. ACM, 2012.

[2] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proceedings of NIPS, 2001.

[3] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

[4] P. Bhargava, T. Phan, J. Zhou, and J. Lee. Who, what, when, and where: Multi-dimensional collaborative recommendations using tensor factorization on sparse user-generated data. In Proceedings of the 24th International Conference on World Wide Web, pages 130–140. ACM, 2015.

[5] S. Cao, W. Lu, and Q. Xu. GraRep: Learning graph representations with global structural information. In Proceedings of CIKM, 2015.

[6] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 2009.

[7] M. Chen, Q. Yang, and X. Tang. Directed graph embedding. In Proceedings of IJCAI, pages 2707–2712, 2007.

[8] C. Cheng, H. Yang, I. King, and M. R. Lyu. Fused matrix factorization with geographical and social influence in location-based social networks. In AAAI, volume 12, pages 17–23, 2012.

[9] C. Cheng, H. Yang, M. R. Lyu, and I. King. Where you like to go next: Successive point-of-interest recommendation. In Proceedings of IJCAI, 2013.

[10] E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of SIGKDD, 2011.

[11] Y.-S. Cho, G. Ver Steeg, and A. Galstyan. Socially relevant venue clustering from check-in data. In KDD Workshop on Mining and Learning with Graphs, 2013.

[12] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.

[13] J. Cranshaw, E. Toch, J. Hong, A. Kittur, and N. Sadeh. Bridging the gap between physical location and online social networks. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing, pages 119–128. ACM, 2010.

[14] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121–2159, 2011.

[15] S. Feng, X. Li, Y. Zeng, G. Cong, Y. M. Chee, and Q. Yuan. Personalized ranking metric embedding for next new POI recommendation. In Proceedings of IJCAI, 2015.

[16] S. Fortunato. Community detection in graphs. Physics Reports, 2010.

[17] H. Gao, J. Tang, X. Hu, and H. Liu. Exploring temporal effects for location recommendation on location-based social networks. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 93–100. ACM, 2013.

[18] H. Gao, J. Tang, X. Hu, and H. Liu. Content-aware point of interest recommendation on location-based social networks. In AAAI, pages 1721–1727. Citeseer, 2015.

[19] H. Huang, J. Tang, S. Wu, L. Liu, et al. Mining triadic closure patterns in social networks. In Proceedings of the 23rd International Conference on World Wide Web, pages 499–504. ACM, 2014.

[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of NIPS, pages 1097–1105, 2012.

[21] M.-J. Lee and C.-W. Chung. A user similarity calculation based on the location for social network services. In International Conference on Database Systems for Advanced Applications, pages 38–52. Springer, 2011.

[22] J. J. Levandoski, M. Sarwat, A. Eldawy, and M. F. Mokbel. LARS: A location-aware recommender system. In 2012 IEEE 28th International Conference on Data Engineering, pages 450–461. IEEE, 2012.

[23] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma. Mining user similarity based on location history. In Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, page 34. ACM, 2008.

[24] Y. Li, J. Nie, Y. Zhang, B. Wang, B. Yan, and F. Weng. Contextual recommendation based on text mining. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 692–700. Association for Computational Linguistics, 2010.

[25] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 2007.

[26] Q. Liu, S. Wu, L. Wang, and T. Tan. Predicting the next location: A recurrent model with spatial and temporal contexts. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.

[27] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur. Recurrent neural network based language model. In Interspeech, volume 2, page 3, 2010.

[28] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS, 2013.

[29] A. Mnih and R. Salakhutdinov. Probabilistic matrix factorization. In Advances in NIPS, 2007.

[30] B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In Proceedings of SIGKDD, 2014.

[31] H. Pham, L. Hu, and C. Shahabi. Towards integrating real-world spatiotemporal data with social networks. In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 453–457. ACM, 2011.

[32] H. Pham, C. Shahabi, and Y. Liu. EBM: an entropy-based model to infer social strength from spatiotemporal data. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 265–276. ACM, 2013.

[33] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of WWW, 2010.

[34] D. M. Romero and J. M. Kleinberg. The directed closure process in hybrid social-information networks, with an analysis of link formation on Twitter. In ICWSM, 2010.

[35] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad. Collective classification in network data. AI Magazine, 2008.

[36] J. Tang, M. Qu, and Q. Mei. PTE: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of SIGKDD, 2015.

[37] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale information network embedding. In Proceedings of WWW, 2015.

[38] L. Tang and H. Liu. Relational learning via latent social dimensions. In Proceedings of SIGKDD, pages 817–826, 2009.

[39] L. Tang and H. Liu. Leveraging social media networks for classification. In Proceedings of SIGKDD, 2011.

[40] C. Tu, Z. Liu, and M. Sun. PRISM: Profession identification in social media with personal information and community structure. In Chinese National Conference on Social Media Processing, pages 15–27. Springer, 2015.

[41] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi. Human mobility, social ties, and link prediction. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1100–1108. ACM, 2011.

[42] P. Wang, J. Guo, Y. Lan, J. Xu, S. Wan, and X. Cheng. Learning hierarchical representation model for next-basket recommendation. In Proceedings of SIGIR, 2015.

[43] P. J. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 1990.

[44] X. Xiao, Y. Zheng, Q. Luo, and X. Xie. Finding similar users using category-based location history. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 442–445. ACM, 2010.

[45] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: a general framework for dimensionality reduction. PAMI, 2007.

[46] C. Yang, Z. Liu, D. Zhao, M. Sun, and E. Y. Chang. Network representation learning with rich text information. In Proceedings of IJCAI, 2015.

[47] J. Ye, Z. Zhu, and H. Cheng. What's your next move: User activity prediction in location-based social networks. In Proceedings of the SIAM International Conference on Data Mining. SIAM, 2013.

[48] M. Ye, P. Yin, W.-C. Lee, and D.-L. Lee. Exploiting geographical influence for collaborative point-of-interest recommendation. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 325–334. ACM, 2011.

[49] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen. LCARS: a location-content-aware recommender system. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 221–229. ACM, 2013.

[50] Q. Yuan, G. Cong, Z. Ma, A. Sun, and N. M. Thalmann. Time-aware point-of-interest recommendation. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 363–372. ACM, 2013.

[51] Q. Yuan, G. Cong, and A. Sun. Graph-based point-of-interest recommendation with geographical and temporal influences. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pages 659–668. ACM, 2014.

[52] J.-D. Zhang, C.-Y. Chow, and Y. Li. LORE: Exploiting sequential influence for location recommendations. In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 103–112. ACM, 2014.

[53] Y. Zhang, H. Dai, C. Xu, J. Feng, T. Wang, J. Bian, B. Wang, and T.-Y. Liu. Sequential click prediction for sponsored search with recurrent neural networks. arXiv preprint arXiv:1404.5772, 2014.

[54] K. Zhao, G. Cong, Q. Yuan, and K. Q. Zhu. SAR: a sentiment-aspect-region model for user preference analysis in geo-tagged reviews. In 2015 IEEE 31st International Conference on Data Engineering, pages 675–686. IEEE, 2015.

[55] V. W. Zheng, Y. Zheng, X. Xie, and Q. Yang. Collaborative location and activity recommendations with GPS history data. In Proceedings of the 19th International Conference on World Wide Web, pages 1029–1038. ACM, 2010.

[56] N. Zhou, W. X. Zhao, X. Zhang, J.-R. Wen, and S. Wang. A general multi-context embedding model for mining human trajectory data. IEEE Transactions on Knowledge and Data Engineering, 2016.