
The Call of the Crowd: Event Participation in Location-based Social Services

Petko Georgiev, Computer Laboratory, University of Cambridge

Anastasios Noulas, Computer Laboratory


Cecilia Mascolo, Computer Laboratory


Abstract

Understanding the social and behavioral forces behind event participation is not only interesting from the viewpoint of social science, but also has important applications in the design of personalized event recommender systems. This paper takes advantage of data from a widely used location-based social network, Foursquare, to analyze event patterns in three metropolitan cities. We put forward several hypotheses on the motivating factors of user participation and confirm that social aspects play a major role in determining the likelihood of a user to participate in an event. While an explicit social filtering signal accounting for whether friends are attending dominates the factors, the popularity of an event proves to also be a strong attractor. Further, we capture an implicit social signal by performing random walks in a high dimensional graph that encodes the place type preferences of friends and that proves especially suited to identify relevant niche events for users. Our findings on the extent to which the various temporal, spatial and social aspects underlie users’ event preferences lead us to further hypothesize that a combination of factors better models users’ event interests. We verify this through a supervised learning framework. We show that for one in three users in London and one in five users in New York and Chicago it identifies the exact event the user would attend among the pool of suggestions.

Introduction

Organized events such as festivals, concerts and sports games are important social phenomena offering individuals a source of recreation and opportunities to socialize. Understanding the collective dynamics of user participation in such events can provide critical insights that help in venue resource planning (Liang et al. 2013), personalized event recommendation (Minkov et al. 2010) and targeted advertising that increase customer satisfaction and trust in online services. With the rise in popularity of location-based services such as Foursquare, we now have the tools to analyze and model social event participation at scale. The data from millions of users broadcasting their locations provides an unprecedented opportunity to accurately model the socio-spatial dynamics of events that motivate people to share their location and visit certain venues.

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

The present work studies the social and behavioral underpinnings of event participation as represented by location-based social networks. The main research question we address in this work is: what is the extent to which geospatial, temporal, and social factors influence users’ preferences towards events? To answer this question, we formulate a predictive modeling task where we try to match a user’s mobility profile against the collective past check-in activity of potential event attendees. The design of this prediction task allows us to empirically measure homophily effects on users’ event choices as reflected by location-based social media.

Besides its societal importance, solving the above-mentioned challenge finds crucial applications in the domain of personalized event recommendations. First, the insights on the relationship between social media usage and event interests can be used to augment the credibility of recommendations by accompanying event suggestions with evidence elicited from publicly shared data. Further, under the assumption that some information about potential or ongoing attendance is available, our framework can be directly used as a content-based event recommender system for users of location-based services. Such an assumption is being increasingly supported by the rapid growth of event-based social services provided by Facebook, Meetup, Plancast, DoubanEvent, and Eventbrite. These online networks offer a platform for users to not only organize and establish social events, but also to express their intention to join by signing up in advance. While event-based networks proliferate, Foursquare as a location-based service provides a unique chance to investigate event participation from multiple angles (temporal, spatial, geo-social) which are not simultaneously available elsewhere.

To reveal the underlying forces of users’ attraction to certain events, we first present a methodology to mine existing events from check-in data and then test three hypotheses through which we make our major contributions:

• [H1] Events attract users with similar mobility patterns. To test this hypothesis we motivate the selection of core established and novel features which we subsequently evaluate in the context of the prediction task presented above. We find evidence of similarity in the past spatio-temporal activity of event participants through the hours they tend to check in at, the distance they are willing to travel, and the types of places they are inclined to visit.

• [H2] Social factors are a driving force when determining the likelihood of users to attend certain events. Through extensive evaluation in three cities we confirm that social factors are the strongest predictors. On the one hand, event popularity, which can be related to forces of social contagion (Le Bon 2001), dominates the results in London. On the other hand, an explicit social filtering that checks whether friends are visiting the event tops the results in New York and Chicago, hinting at the presence of a social group identity in collective behavior (Aveni 1977). Third, an implicit social signal, inspired by trust-based recommendations (Jamali and Ester 2009; Andersen et al. 2008) and based on the place type preferences of friends encoded in a socio-spatial graph, identifies relevant niche events in London.

• [H3] A combination of multiple factors is a more powerful signal than individual features in determining event participation preferences. To test this hypothesis we implement a fused mobility model based on supervised learning techniques the performance of which we compare against the best single features. Overall, the prediction framework successfully identifies the exact attended event for one in three users in London and one in five users in New York and Chicago.

Figure 1: User check-in distribution before and during the UEFA Champions League Final event in London. (a) Check-ins, 26 May 2011. (b) Check-ins, 28 May 2011. (c) Check-ins at the Wembley area, 28 May. Darker shaded regions denote a higher number of check-ins closer to the observed maximum among all regions during the same day. The size of the location markers in Figure (c) is proportional to the number of check-ins at the place. Notice the significantly increased activity at the Wembley area in Northwestern London on the 28th of May 2011 when the UEFA football match was held.

arXiv:1403.7657v1 [cs.SI] 29 Mar 2014

Our work is one of the first to investigate event participation from the viewpoint of location-based social services. We demonstrate that such services are successful in capturing social phenomena related to crowd behavior in mass gatherings which has vital implications for personalized event recommender systems.

Data Collection and Event Extraction

Foursquare, a location-based social service created in 2009, has quickly become one of the most popular location-based services with over 40 million users as of September 2013 (footnote 1). The primary means of expressing activity through the online service is creating check-ins, which are location broadcasts tagged with tips and comments about the places visited by Foursquare users. Users can optionally share their check-ins via their Twitter accounts, which enables us to crawl the check-ins via the Twitter streaming API. Over a period of 8 months, from December 2010 to September 2011, we were able to collect the 3,586,374 check-ins of 190,883 users across 184,280 venues in London, New York, and Chicago (Table 1).

| City | # users | # check-ins | # venues |
|---|---|---|---|
| London | 41,397 | 533,931 | 41,701 |
| Chicago | 42,790 | 715,650 | 33,261 |
| New York | 106,696 | 2,336,793 | 109,318 |

Table 1: Dataset properties.

We additionally mined the city social networks from Twitter where users can subscribe to follow the public message feeds of arbitrary users. Two users are considered friends if the subscription is bidirectional, i.e. both of them follow each other’s activity. Foursquare does not allow unauthorized access to a user’s friend list, which is the reason why the Twitter social graph is used. Although it may not be identical to the actual Foursquare graph, our evaluation results suggest it is a useful approximation sufficient for the purposes of this work.

Event Detection

One characteristic of events is that they cause some of the places to become unusually busy on certain days. This observation has been used by Sklar et al. in building their real-time event recommendation engine (Sklar, Shaw, and Hogue 2012) and is our guiding principle in uncovering events in the dataset. Figure 1 shows the check-in attention levels of one of the most popular events in London, the UEFA Champions League Final, which was revealed by tracking changes in the popularity of the London Wembley arena during the different days. In Figure 1b the visualized check-in levels during the event day, 28th of May 2011, illustrate the significantly increased activity at the places in the Wembley area and hint at the simplicity of the method of tracking the place popularity for mining events.

Figure 2: Word clouds of the words used in the names of the event places and place types: (a) MCM Expo and (b) UEFA Champions League Final in London; (c) Internet Week and (d) Webby Awards in New York; (e) DrupalCon and (f) Lollapalooza Festival in Chicago.

Footnote 1: http://goo.gl/VNtDRP

An event is considered to be an anomalous activity, measured in number of check-ins, that is unusually high for a place given its check-in history. To detect events we compute the average number of check-ins per place and look for significant deviations (more than double the average) from this number on individual days. The place-day pairs are then sorted in decreasing order of the absolute difference between the observed and average place popularity. For each city we pick the top 60 most popular organized events the existence of which could be verified.
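The detection rule described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `detect_candidate_events` and the `(place_id, day)` input format are hypothetical, and the per-place average is assumed to be taken over the days on which the place received at least one check-in, since the averaging window is not specified in the text.

```python
from collections import Counter

def detect_candidate_events(checkins, ratio=2.0):
    """Return (place, day, observed, average) tuples for unusually busy place-days.

    checkins: iterable of (place_id, day) pairs, one pair per check-in.
    ratio: a place-day is flagged when its count exceeds ratio * average.
    """
    per_place_day = Counter(checkins)          # check-ins per (place, day)
    totals, active_days = Counter(), Counter()
    for (place, _day), n in per_place_day.items():
        totals[place] += n
        active_days[place] += 1
    candidates = []
    for (place, day), n in per_place_day.items():
        # Assumption: the average is taken over days with at least one check-in.
        average = totals[place] / active_days[place]
        if n > ratio * average:
            candidates.append((place, day, n, average))
    # Largest absolute gap between observed and average popularity first.
    candidates.sort(key=lambda c: c[2] - c[3], reverse=True)
    return candidates
```

The sorted list would then be inspected from the top, keeping only the place-days for which an organized event can be verified.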

We validate the actual existence of the events at the places with increased check-in activity by performing a simple linguistic analysis on the words in the names of the event places. As shown in Figure 2 many of the events have dedicated Foursquare places whose names exactly match the event ones. The words used to describe an event are highly informative of both its name and type (music venue, conference, football match, etc.). In the few cases when we could not obtain the exact event name from the dataset itself we resorted to manual validation via a web search engine. Although the manual labeling is a tentative task, it is a method that allows for the extraction of ground truth labels and avoids the incorporation of irrelevant items in the analysis.

Event Scope Definition

The dataset has a diverse set of events that may span several hours (concerts or sports games) or a whole day (festivals or conferences). To account for this diversity and not restrict the actual check-in time of users, all the check-ins at the event place that happen during the same day are considered. In addition, for some of the events we observe check-in activity at several nearby places. For instance, the UEFA finals football match has multiple check-in hotspots at the Wembley Stadium in London (Figure 1c). Once the most popular event place is identified as described in the previous section, we search for other event places with a greater than average number of check-ins in a 300-meter radius. The check-ins from these additional places are also included in the analysis. We manually verify that there are no two major unrelated events happening on the same day in the neighborhood area.

A user is assumed to have attended an event if they checked in anywhere at the event places during the day. It is possible that the true intention of users might be different from attending the event when they check in there. However, this information is not readily available and we allow for some noise in the event data.

Event Participation Factors

In the previous section we have extracted the check-in data for a range of events: from sports games and festivals to concerts, shows and conferences. We now pose our main hypotheses on the forces underlying users’ event choices and motivate a core set of spatial, temporal and social factors.

Events and User Mobility

The motives of visitors attending organized events can range from cultural exploration to socialization and gregariousness (Crompton and McKay 1997). Regardless of the concrete reason for participation, the event acts as a focal point for its attendees sharing a common experience. We hypothesize that some level of commonality also propagates to the mobility patterns of participants.

1) Attending nearby events: Our first conjecture is that geographic distance might restrict the venue preferences of Foursquare users to nearby places and, by extension, to nearby events. Some evidence in favor of this intuition can be found in previous work suggesting that a large proportion of human movements are short-range (Cho, Myers, and Leskovec 2011) and predictable (Song et al. 2010). We therefore model the role of spatial proximity by introducing the factor Home Distance: a user’s likelihood to attend an event is inversely proportional to the distance between their most frequently visited place, or home, and the most popular event place.
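As an illustration of the Home Distance factor, the sketch below ranks events by the distance between a user's most frequently visited place and each event's most popular place. It is a minimal sketch rather than the implementation used in this work: the helper names are hypothetical, places are assumed to be given as (latitude, longitude) pairs, and the haversine great-circle formula is our choice of distance measure, which the text does not prescribe.

```python
import math
from collections import Counter

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def home_location(user_checkins):
    """Most frequently visited place in a list of (lat, lon) check-ins."""
    return Counter(user_checkins).most_common(1)[0][0]

def rank_by_home_distance(user_checkins, event_locations):
    """Event ids ordered from nearest to farthest from the user's home.

    event_locations: {event_id: (lat, lon) of the most popular event place}.
    """
    home = home_location(user_checkins)
    return sorted(event_locations,
                  key=lambda e: haversine_km(home, event_locations[e]))
```

Ranking by increasing distance is equivalent to ranking by any score that decreases monotonically with distance, so no explicit inverse needs to be computed.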

2) Place type like-mindedness: The next dimension of event participants’ potential similarity is their past activity. In Foursquare the activities, and by extension the type of past attended events, can be inferred by users’ visited types of places. For instance, football matches and large concerts take place at stadiums, while festivals are typically outdoor attractions located in parks and open spaces. By looking at the types of places users tend to visit we expect to gain a broader view on the events they are interested in.

| Blogworld Expo: place type | Score | Orioles-Yankees Baseball: place type | Score | Lollapalooza: place type | Score | Chicago Comic Con: place type | Score |
|---|---|---|---|---|---|---|---|
| Convention Center | 0.0074 | Baseball | 0.0138 | Music Venue | 0.0947 | Indie Theater | 0.0106 |
| Event Space | 0.0033 | Bar | 0.0070 | Bar | 0.0353 | Bookstore | 0.0098 |
| Hotel | 0.0025 | Sports Bar | 0.0067 | American | 0.0195 | Convention Center | 0.0076 |
| Vegetarian / Vegan | 0.0024 | Pub | 0.0049 | Mexican | 0.0162 | Cineplex | 0.0072 |
| Train Station | 0.0020 | Pizza | 0.0039 | Sports Bar | 0.0162 | Other - Buildings | 0.0059 |
| American | 0.0016 | Stadium | 0.0038 | Pub | 0.0162 | Electronics | 0.0052 |
| Tech Startup | 0.0015 | American | 0.0031 | Other - Entertainment | 0.0161 | Fast Food | 0.0047 |
| Corporate / Office | 0.0015 | Pier | 0.0030 | Corporate / Office | 0.0145 | Other - Entertainment | 0.0045 |
| Other - Entertainment | 0.0014 | Coffee Shop | 0.0029 | Stadium | 0.0145 | Movie Theater | 0.0044 |
| Bookstore | 0.0013 | Gym | 0.0029 | Burgers | 0.0139 | Grocery Store | 0.0042 |

Table 2: Top 10 place categories observed in the check-in history of participants in four events. The first two events are held in New York, while the second two in Chicago. Place types in bold match the general theme of the event.

| Symbol | Meaning |
|---|---|
| $L$ | set of city locations |
| $C$ | set of place types |
| $U$ | set of city users |
| $E$ | set of city events |
| $L(e)$ | set of event places |
| $U(e)$ | set of event attendees |
| $G$ | city social graph |
| $G(e)$ | social network of event attendees |
| $\Gamma_u$ | neighborhood set of user $u \in U$ in $G$ |
| $N_u^c$ | accumulated # check-ins for user $u$ at places of type $c$ |
| $N_u^h$ | total # check-ins between hours $h$ and $h + 1$ for user $u$ |

Table 3: Notation. In the context of a particular event and user we imply the check-ins at place category $c$ or hour $h$ up to the day before the event occurs.

Taking advantage of this intuition, we quantify the level of attractiveness of an event for a user by comparing the user’s activity patterns to the collective activity of the event crowd. Our hypothesis is that the closer the user profile is to the collective behavior of the mass, the higher the chances are of the event attracting the user. One way to materialize this notion through location-based data is to compute the cosine similarity $\cos\angle(r_u, r_e)$ between two vectors representing the profiles of the user and the event. On the user’s side, the vector $r_u$ is built from assigning scores to the visited place types: higher values are given to categories that are popular for a particular user but at the same time are not popular among most users in general. These requirements are highly reminiscent of the Term Frequency-Inverse Document Frequency (TF-IDF) commonly used in Information Retrieval (Baeza-Yates and Ribeiro-Neto 1999). Users could be modeled as documents and the place categories as terms. The weight of a term in a document is simply the number of check-ins of a user at places of the type associated with the term. Employing the notation from Table 3, the user’s score for category $i$ is defined as:

$$ r_u^i = \frac{N_u^i}{\max(\{N_u^j : j \in C\})} \times \ln\frac{|U|}{|\{v \in U : N_v^i > 0\}|} \qquad (1) $$
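The TF-IDF style user profile of Equation 1 can be sketched in a few lines of Python. This is an illustrative sketch under our own assumptions, not the code used in the study: the function name `user_profile` and the nested-dictionary input format are hypothetical.

```python
import math

def user_profile(counts, user):
    """Return {category: score} for one user, following Equation (1).

    counts: {user: {category: number of check-ins}} over all city users,
            restricted to check-ins made before the event day.
    """
    own = counts[user]
    max_count = max(own.values())      # the user's most visited category
    n_users = len(counts)              # |U|
    profile = {}
    for category, n in own.items():
        if n <= 0:
            continue
        # Number of users with at least one check-in at this category.
        visitors = sum(1 for c in counts.values() if c.get(category, 0) > 0)
        profile[category] = (n / max_count) * math.log(n_users / visitors)
    return profile
```

A category visited by every user receives a score of zero because ln(1) = 0, which reflects the requirement that generally popular categories carry little weight.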

The aggregated event profile is similarly built from the past visited place types of its attendees where place categories are ranked differently based on their specificity for event participants. The ranking strategy should give higher ranks to place types that are common among the majority of the participants (a). Higher ranks should be also given to place types whose attendance contribution from participants is relatively large compared to other place categories (b). An element $r_e^i$ from the event vector $r_e$ corresponds to places of type $i$ and is the result of the multiplication of the two factors (a) and (b):

$$ r_e^i = a \cdot b = \frac{|\{u' \in U(e) : N_{u'}^i > 0\}|}{|U(e)|} \times \frac{\sum_{u' \in U(e)} N_{u'}^i}{\sum_{u' \in U} N_{u'}^i} \qquad (2) $$

We call this metric that captures the "herding" behavior of participants the Place Category Score. When building the event profiles and looking at the related place types, we see that the adopted metric is highly effective in uncovering an important aspect of event attendance preference. As demonstrated in Table 2 where the top 10 most highly ranked place categories for events are listed, participants in an event appear to have a preference to visit places of a similar type as the one of the most popular/central event place. In our dataset, baseball and football matches, for instance, attract fans that previously visited Stadiums, conferences appear to attract people visiting Convention Centers, and concerts attract users visiting Music Venues.
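To make the Place Category Score concrete, the following Python sketch computes the event vector of Equation 2 and the cosine similarity used to compare it with a user profile. It is an illustration only: the function names and the dictionary-based sparse vectors are our own assumptions and are not taken from the original implementation.

```python
import math

def event_profile(counts, attendees):
    """Place Category Score (Equation 2) for each category visited by attendees.

    counts: {user: {category: number of check-ins}} over all city users.
    attendees: iterable of user ids forming U(e).
    """
    attendees = list(attendees)
    categories = {c for u in attendees for c, n in counts[u].items() if n > 0}
    profile = {}
    for category in categories:
        # (a) share of attendees who have visited this category at all
        visitors = sum(1 for u in attendees if counts[u].get(category, 0) > 0)
        # (b) attendees' share of all check-ins at this category
        from_attendees = sum(counts[u].get(category, 0) for u in attendees)
        from_everyone = sum(c.get(category, 0) for c in counts.values())
        profile[category] = (visitors / len(attendees)) * (from_attendees / from_everyone)
    return profile

def cosine(p, q):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0
```

Events would then be ranked for a user by the cosine between the user's vector and each event's vector, with higher similarity indicating stronger attraction.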

Figure 3: Temporal shapes of six events in New York city (BBQ Block Party, LCD Sound System Concert, Webby Awards, US Open Tennis, Mongo Conference, Next Wave Festival). The x-axis shows the hour of the day and the y-axis the check-in intensity. The number of check-ins at each hour is normalized by the maximum value reached on the day of the event. A global maximum, or a peak, is often observed.

3) Hourly patterns: A third dimension of the factors driving users’ decision to visit events is the temporal preferences of users to get involved in activities. Our assumption is that if a user is mostly active during a particular time of the day such as the evening, they would rather attend an event aligned with these temporal preferences. As an approximation for the event time we could adopt its peak hour. As portrayed in Figure 3, the temporal distribution of check-ins at events of different types usually has a well-defined shape that reflects how users arrive at the event venues before and during the event. The peak is an often observed phenomenon that marks the onset of an expected activity such as the beginning of a concert or a sports game. In fact, more than 40% of the events in all cities have at least half of their check-ins created at the peak and the hours immediately before and after it. The alignment between the event and the user’s past temporal activity can then be captured by measuring the extent to which users tend to check in at the hours around the event peak $p_e$:

$$ d = \sum_{h \in [0,24)} \frac{N_u^h}{\max_{h'} N_u^{h'}} \times \min(|h - p_e|,\ 24 - |h - p_e|) \qquad (3) $$

A small Temporal Distance $d$ implies that the user prefers to check in predominantly at hours coinciding with the event ones.
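The Temporal Distance of Equation 3 translates directly into code. The sketch below is a minimal illustration with a hypothetical function name. Returning infinity for a user without any past check-ins is our own assumption, since that case is not discussed in the text.

```python
def temporal_distance(hourly_counts, peak_hour):
    """Temporal Distance d (Equation 3) between a user and an event peak.

    hourly_counts: list of 24 values, the user's past check-ins per hour.
    peak_hour: the event peak hour, an integer in 0..23.
    """
    top = max(hourly_counts)
    if top == 0:
        # Assumption: a user with no history is treated as maximally distant.
        return float("inf")
    total = 0.0
    for hour, n in enumerate(hourly_counts):
        gap = abs(hour - peak_hour)
        # Circular distance on the 24-hour clock, weighted by normalized activity.
        total += (n / top) * min(gap, 24 - gap)
    return total
```

The `min(gap, 24 - gap)` term makes the distance wrap around midnight, so a user active at 1:00 is only two hours away from an event peaking at 23:00.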

Events and Social Forces

Our next main hypothesis is that social forces are the primary means of luring users to attend events. These forces can take various forms and in this section we give three prominent examples.

4) Following the crowd: A strong motivation for users to participate in an event might be its Popularity, which can be measured by the number of check-ins. Attending events because of their popularity can be considered as a form of crowd behavior where individuals follow trends through social contagion or imitation (Le Bon 2001). This claim is partly supported by our findings that the events in the dataset feature a long-tail popularity distribution. A few of them attract large masses of users (such as the Royal Wedding in London or Lollapalooza Festival in Chicago), while the rest have a markedly lower number of users checking in at related venues.

5) Social group identity: Many events such as festivals and concerts are social activities by nature which is why we expect the purely social motivation for users to attend certain events to be particularly strong. This intuition is confirmed in social systems with respect to crowd behavior (Aveni 1977) where participants in mass gatherings are more likely to be found among a group of friends. Drury and Reicher (Drury and Reicher 1999) further develop a social identity model by proposing that crowd behavior is driven by intergroup dynamics where individuals adopt the collective identity of their social group to interact with others. Falling back on these studies, we put forward a Social Influence factor that assumes that a user would prefer to join events for which the number of visiting friends is larger.

It is possible that some events are ranked equally high for a user because the number of friends attending is the same. We argue that in this case the probability of joining the events may not be the same and it often depends on the social importance of the event for the user’s friends. In such situations we break ties by considering the maximum degree centrality ($|\Gamma_u \cap U(e)|$) of a friend in the social network of event attendees. Our reasoning is that if an event is of a particular interest to a friend, they would most probably play a central role in the social network of attendees and would attract more of their friends in turn to participate.
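The Social Influence ranking with its tie-breaking rule can be sketched as follows. This is a hedged illustration, not the original code: the function name and the dictionary inputs are hypothetical, and the degree of a friend within the attendee network is computed as the number of that friend's own friends who attend the same event.

```python
def social_influence_ranking(user, friends, attendees):
    """Rank events by the number of attending friends, breaking ties by the
    highest degree of any attending friend inside the attendee network G(e).

    friends: {user: set of friends}, the city social graph G.
    attendees: {event: set of users known to attend}, i.e. U(e).
    """
    def key(event):
        crowd = attendees[event]
        going = friends.get(user, set()) & crowd
        # Degree of friend f in G(e): size of (friends of f) intersected with U(e).
        best_degree = max((len(friends.get(f, set()) & crowd) for f in going),
                          default=0)
        return (len(going), best_degree)

    return sorted(attendees, key=key, reverse=True)
```

Sorting on the pair (number of friends, maximum friend degree) means the centrality only matters when two events have the same number of attending friends.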

6) Place-focused social interactions: Our next hypothesis is that the friends’ visited place categories and the associated activities with them can be indicative of the users’ event preferences. The types of places visited by friends may act as gravitational forces for social interactions where friendship is fostered and ultimately manifested in collective participation in events. This could be considered as a type of homophily in social systems (McPherson, Smith-Lovin, and Cook 2001) where networks are homogeneous with respect to behavioral characteristics. In our case the homogeneity is captured through the common place types such as Bars and Theaters where friends meet.

To model the above-mentioned assumptions we design a graph that seamlessly combines social and spatial signals and that connects users, place types and events as shown in Figure 4. Personalized random walks with restart (Tong, Faloutsos, and Pan 2006) are performed on the graph to compute user attraction scores towards events.

Figure 4: An example socio-spatial graph. Nodes represent events (left), users (right) and place types (middle). Dashed links denote social network relations. User-user and user-category links are bidirectional but have different weights depending on the direction. A random walk starts from an event node and reaches out to users via place types.

Graph Definition. The graph is a directed one with three types of nodes: users U, events E and place categories C. There are three types of links which we weigh differently to encode domain-specific transition probabilities. User-user links connect two users i and j if they are friends. The weight $w_{ij} = \frac{1}{|\Gamma_i|}$ of the connecting arc is inversely proportional to the number of friends user i has. User-category links connect users to their visited place categories. The arc between user u and category c is weighed according to the TF-IDF score as defined in Equation 1. The reverse link is weighed similarly with the difference being that now a place category is represented as a document and the users checking in there as terms:

$$ w_{cu} = \frac{N_u^c}{\max(\{N_v^c : v \in U\})} \times \ln\frac{|C|}{|\{j \in C : N_u^j > 0\}|} \qquad (4) $$

Last, an event-category link connects an event to a place type if it is among the top K categories visited by users participating in the event. The place types are sorted in descending order of the place category score (Equation (2)) which is used as the weight on the links. A too low value of K might overlook an important place preference signal, while a too high value might introduce unwanted noise. We find that K = 10 offers a good balance between sufficient detail and tolerable noise. Finally, to correctly set transition probabilities in the resulting graph, we normalize the weight on each link by dividing its value by the sum of the weights on the out links of the source node.

Random Walks with Restart. Random walks on graphs have been used to rank nodes in a way that encodes the probability of reaching a target node from a source. The ranking information has been successfully employed by variations on the PageRank algorithm (Page et al. 1999) to compute importance scores of web pages in the web page citation graph. Random walks with restart (Tong, Faloutsos, and Pan 2006) are personalized versions of the model that additionally incorporate a constant probability of jumping back to a specific graph node in order to bias the walks nearer the node’s neighborhood. The restart step is essential for acquiring a personalized view of the graph with respect to a specific node. In our case, this allows us to measure the extent to which a user is related to a concrete event.

A random walker starts from an event node, keeps traversing adjacent links and with constant probability (1 − α) jumps back to the event node which guarantees the personalized view of the graph. The parameter α is a scaling factor that is usually set to 0.85 (Page et al. 1999). By setting the restart probabilities at other nodes to zero we ensure that the random walker explores nodes close to the event neighborhood more often. We are then interested in the steady-state probability that we reach the user nodes. If a user is easily reachable from an event via place types, friends or any combination of factors, the random walk score of the user node in the graph will be higher. The preference towards events is considered stronger when the computed user random walk scores are higher.
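The walk described above can be approximated by simple power iteration. The Python sketch below is an illustration under our own assumptions, not the implementation used in this work: the function names and the nested-dictionary graph format are hypothetical, the number of iterations is fixed instead of testing for convergence, and any probability mass that reaches a node without outgoing links is sent back to the event node.

```python
def normalize(out_links):
    """Turn raw link weights {source: {target: weight}} into transition probabilities."""
    probs = {}
    for source, links in out_links.items():
        total = sum(links.values())
        probs[source] = {t: w / total for t, w in links.items()} if total else {}
    return probs

def random_walk_with_restart(out_links, start, alpha=0.85, iters=100):
    """Approximate steady-state visit probabilities for a walk that restarts
    at `start` (the event node) with probability 1 - alpha."""
    probs = normalize(out_links)
    nodes = set(probs) | {t for links in probs.values() for t in links}
    score = {n: 0.0 for n in nodes}
    score[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for source, links in probs.items():
            for target, p in links.items():
                nxt[target] += alpha * score[source] * p
        # Restart mass, plus (assumption) mass stranded on nodes without out-links.
        nxt[start] += 1.0 - sum(nxt.values())
        score = nxt
    return score
```

For a given event, the scores of the user nodes in the returned dictionary can be read off directly as the users' attraction scores towards that event.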

Experimental Evaluation

In this section we formulate an event prediction task in the context of which we evaluate the strength of the described factors. In doing so we confirm our hypotheses: [H1] there is similarity in the event participants’ mobility patterns; [H2] social signals and popularity play a leading role in the prediction task; [H3] a combination of factors is more informative of users’ event preferences compared to individual features.

Evaluation Methodology

We define an event participation prediction problem as follows: given a set of events and a set of users, find a way to rank events so that those at the forefront of the prediction list are the ones that the user actually attends. Events are ranked according to the preference scores produced by the participation factors as described in the previous section. In this context the factors behave as prediction features. A feature is considered more successful in explaining the motivation behind a user’s participation if it gives higher ranks for events that are truly attended by the user.

For the evaluation, we use a stratified 10-fold cross validation with respect to users. From each event 10% of the participants are repeatedly held out as test users. The rest of the users in the training set are assumed to be the ones who have signed up for the event and they are the ones from whom the event profiles are built. When building the user and event profiles, only the check-in activity prior to the day of the event is considered without including the check-in data from the event itself. For each test user all items are ranked and a single preference list of events is produced, many of which happen on different days.
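One possible way to realize the per-event hold-out is sketched below in Python. It is an assumption-laden illustration, not the procedure actually used here: the generator name `user_folds` is hypothetical, the split is made independently for each event, and the sketch does not handle users who attend several events in any special way.

```python
import random

def user_folds(attendees, k=10, seed=0):
    """Yield (train, test) dicts of {event: set of users} for k folds, where
    each fold holds out roughly 1/k of every event's participants.

    attendees: {event: set of user ids}.
    """
    rng = random.Random(seed)
    shuffled = {}
    for event, users in attendees.items():
        order = sorted(users)      # fixed starting order for reproducibility
        rng.shuffle(order)
        shuffled[event] = order
    for fold in range(k):
        train, test = {}, {}
        for event, order in shuffled.items():
            test[event] = set(order[fold::k])       # every k-th user from offset `fold`
            train[event] = set(order) - test[event]
        yield train, test
```

Event profiles would be built from the `train` sets only, and the ranking features evaluated on the users in the `test` sets.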

Metrics

The performance of the event ranking features is evaluated with respect to two metrics: normalized discounted cumulative gain (NDCG) and accuracy. The NDCG@N metric is commonly used in information retrieval (Jarvelin and Kekalainen 2002) to measure the effectiveness in the ranking of relevant items in a list of recommendations:

$$ \mathrm{NDCG@}N = \frac{1}{Z_N} \sum_{i=1}^{N} \frac{2^{rel(l_i)} - 1}{\log_2(1 + i)} \qquad (5) $$

The relevance $rel(l_i)$ of an item (event) $l_i$ at position $i$ in our case is equal to 1 when the user attended the event and 0 otherwise. The idealized cumulative gain $Z_N$ is a normalizing constant such that a perfect ranking with all relevant items ordered first would result in an NDCG value of 1. We also use the Accuracy@N metric which for a user is defined as 1 if and only if an event that the user attended is ranked within the top N items in the prediction list. The accuracy results are averaged across users. This metric is complementary to the NDCG one and shows for what proportion of the users a feature brings relevant events to the front of the prediction list. The Accuracy@X% is similar and represents the cut-off threshold equal to X% of the total number of events eligible for prediction.
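Both metrics are straightforward to compute for a single user. The following Python sketch is a minimal illustration with hypothetical function names. It relies on the binary relevance described above, for which the gain $2^{rel(l_i)} - 1$ reduces to 1 for attended events and 0 otherwise.

```python
import math

def ndcg_at_n(ranked_events, attended, n):
    """Binary-relevance NDCG@N (Equation 5) for one user.

    ranked_events: list of event ids, best first.
    attended: set of event ids the user actually attended.
    """
    dcg = sum(1.0 / math.log2(1 + i)
              for i, event in enumerate(ranked_events[:n], start=1)
              if event in attended)
    # Z_N: gain of a perfect ranking with all attended events placed first.
    ideal_hits = min(len(attended), n)
    z = sum(1.0 / math.log2(1 + i) for i in range(1, ideal_hits + 1))
    return dcg / z if z else 0.0

def accuracy_at_n(ranked_events, attended, n):
    """1 if any attended event appears among the top n items, else 0."""
    return int(any(event in attended for event in ranked_events[:n]))
```

Averaging the two functions over all test users yields the figures reported for each feature.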

| Model | London | Chicago | New York |
|---|---|---|---|
| Random | 0.118 | 0.142 | 0.115 |
| Temporal Dist. | 0.203 | 0.221 | 0.194 |
| Home Dist. | 0.219 | 0.245 | 0.223 |
| Category Score | 0.315 | 0.267 | 0.235 |
| Popularity | **0.411** | **0.275** | **0.262** |
| Social Influence | 0.290 | **0.306** | **0.268** |
| Random Walk | **0.347** | 0.221 | 0.244 |

Table 4: Averaged NDCG@10 for the different ranking strategies. Top 2 features for each city are in bold.

Figure 5: Averaged user accuracy of the mobility features as a function of the prediction list size. Panels: London, Chicago, New York. The x-axis shows the list size (%) and the y-axis the Accuracy@X%. Plotted features: Random Baseline, Temporal Distance, Home Distance, Centroid Distance, Popularity, Category Score, Social Influence, Random Walk.

[H1] Events and User Mobility

Here we test our hypothesis that event interests imply some similarity in the mobility patterns of participants. We confirm this by comparing the spatio-temporal features’ performance against a random baseline (Table 4). The Temporal Distance and Home Distance, albeit being weaker than the other signals, still perform significantly better than random, which implies similarity in the temporal and spatial dimensions: 1) events are likely to attract users that historically check in more at hours around the event peak, and 2) events appeal more to users that are geographically close to the activity hot spot. Among the spatial-only factors, the one that encodes the Place Category preferences of users performs better than the simple distance-based metric in all cases. This confirms that the semantics (types) of places are more informative than pure distance when it comes to event preferences, which is unlike standard place mobility models where distance is a dominant factor (Scellato et al. 2011). These observations suggest that place types alone, as already hinted by the Random Walk model that incorporates them too, can be an important source of information for inferring event interests since certain events appear to attract users with common activity patterns captured in check-ins at particular types of places.

[H2] Events and Social Forces

Friends in the Crowd: In this part of the analysis we test whether social factors in their various forms are driving forces for event participation. A first discovery in testing this claim is that event popularity, as captured by the number of attendees, is truly among the best predictors across cities. In London the feature achieves the highest NDCG score, 0.411, observed for a factor. The reason for this is that there are massively popular events in cities, such as the Royal Wedding and the UEFA Champions League Final in London, that attract a large number of people. This phenomenon is highly reminiscent of preferential attachment models (Albert and Barabasi 2002) where popular entities (events) lure even more followers governed by forces such as gregariousness and social contagion.

On the other hand, we confirm that the Social Influence feature is extremely strong in the event domain, scoring as high as 0.306 in Chicago and 0.268 in New York and outperforming even Popularity, which reaches 0.275 and 0.262 in the two cities respectively. These figures suggest that events foster social participation, which is in line with Aveni’s findings on the role of social groups in collective behavior (Aveni 1977). We recall that when designing the Social Influence feature we additionally incorporated the degree centrality of the user’s most socially involved friend as a way to measure the social importance of an event for a user. To understand whether this additional complexity is worthwhile, in Table 5 we compare the performance of the enhanced signal to the no-centrality baseline for users that have events with an equal number of participating friends. The improvement of more than 12% in Accuracy@1 implies that the centrality technique is successful in breaking ties among already highly ranked events. This suggests that a user’s preference towards an event can be successfully inferred from the social engagement of their friends.

                      NDCG@10          ACC@1
City       # Users   Base   Centr.   Base   Centr.
London         843   0.43   0.46     0.32   0.36
Chicago       2323   0.31   0.33     0.21   0.26
New York      3972   0.38   0.40     0.24   0.28

Table 5: Comparison of the Social Influence feature performance between its two variants: with and without centrality. These results are obtained through leave-one-out cross validation and averaged across users for whom there are at least 2 events with an equal number of friends.
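The tie-breaking role of centrality compared in Table 5 can be illustrated with a small sketch. It assumes that events are ordered first by the number of the user's friends attending and then, among ties, by the highest degree found among those attending friends. The exact way the two quantities are combined in the Social Influence feature is not restated in this section, so the lexicographic ordering and all names below are hypothetical.

```python
def social_influence_key(attendees, friends, degree):
    """Sort key: (friends attending, degree of the best-connected attending friend)."""
    attending_friends = friends & attendees
    top_degree = max((degree[f] for f in attending_friends), default=0)
    return (len(attending_friends), top_degree)


def rank_events(events, friends, degree):
    """Rank event ids; `events` maps event id -> set of attendee ids."""
    return sorted(
        events,
        key=lambda e: social_influence_key(events[e], friends, degree),
        reverse=True,
    )
```

With one friend at each of two events, the event attended by the friend with more connections is ranked first, which is the kind of tie the centrality variant is meant to resolve.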

In terms of Accuracy@X, the results shown in Figure 5 are consistent with the NDCG ones: the best performing features are the socially influenced models and, in the case of London, Popularity. For all cities the Accuracy@5% for the Social Influence feature is around 30%, which means that for roughly 1 in 3 users on average the metric correctly identifies a relevant event within the top 5% of the items in the prediction list. This hints that social factors are better at predicting the exact event a user would attend, as discussed above.

Where Friends Meet: An intriguing outcome is that the implicit social signal hidden in the place type preferences of friends and captured by the Random Walk model exhibits diversity in its performance (Table 4). In London it achieves a high score of 0.347, ranking second best overall, whereas in Chicago and New York the results of 0.221 and 0.244 respectively are clearly lower than the ones of Social Influence and Popularity. We demonstrate that this heterogeneity is related to the presence of niche events that engage users who prefer to check in at place types that are not generally popular.

A niche event such as a football game can be characterized by the highly targeted interests of its fans. This can be reflected in the participants’ place type preferences, where visiting certain place categories such as football stadiums may be common among the attendees, or among certain friend circles, but not popular in general. To formalize the notion we look at the Kendall’s τ correlation coefficient (Kendall 1938) between the ranking of place categories for an event profile, as shown in Table 2, and the overall ranking of the place type popularity as reflected in the Foursquare data. An event is considered more niche if its Kendall’s τ correlation coefficient is lower or negative. In such cases the discrepancy in the two rankings dominates, which implies that there are place types less popular among the common user but high on the list among event attendees.
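The niche score described above can be sketched with a plain implementation of Kendall's τ between two rankings of the same place categories. The sketch uses the tau-a variant and assumes no tied ranks; the helper names and the category labels in the usage note are hypothetical and not taken from the paper's data.

```python
from itertools import combinations


def to_ranks(ordered_categories):
    """Map each category to its position in an ordered list (0 = most popular)."""
    return {category: pos for pos, category in enumerate(ordered_categories)}


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings over the same items (no ties assumed)."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs
```

For example, if the overall popularity order were Coffee Shop, Bar, Train Station, Football Stadium and an event profile listed the same categories in exactly the reverse order, every pair would be discordant and the event would sit at the niche end of the scale.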

London    Chicago   New York
−0.50∗    −0.38∗    −0.42∗

Table 6: Spearman correlation coefficients between the Kendall’s τ score and the Accuracy@5% of the random walk model. ∗ p-value < 0.01.

The key observation illustrated in Table 6 is that there is a statistically significant negative correlation between the Kendall’s τ score and the Accuracy@5% for the random walk model on the socio-spatial graph. This means that the more niche an event is, the better the performance of the random walk model becomes. In London 62% of the events have a negative Kendall’s τ score, implying a highly niche content for their participants. In contrast, there are only three such events in Chicago and none in New York. Our findings recognize the influence of friends and common interests on the motivation to visit niche or special events whose value is higher to the social group than to the general community.
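The correlation reported in Table 6 pairs each event's Kendall's τ score with the random walk model's Accuracy@5% for that event. A minimal sketch of the Spearman coefficient, computed as the Pearson correlation of average ranks, is given below. The function names are assumptions and the sketch does not compute the p-value.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

A negative coefficient, as in Table 6, corresponds to accuracy rising as the τ score falls, that is, as events become more niche.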

[H3] Inter-signal Interactions

We have observed that while certain features such as Social Influence and Popularity dominate in most cases, there is some heterogeneity in their relative performance across cities. We have also seen that the random walk model performs well for niche events, which can vary in number from one city to another. The question we address here is whether we could adopt a supervised learning procedure for combining participation features into a fused prediction system that automatically dissects the heterogeneities and outperforms the individual participation factors. By building this framework we hypothesize that a combination of factors better reflects users’ decision to participate in an event.

Training Strategy. The features we have examined produce a score for a user-event pair which indicates the likelihood that a user attends an event. For each user-event example we build an instance by assembling the scores of the individual predictors into a feature list and appending a positive (+1) or a negative (-1) label depending on whether the user truly attended the event. A training set is built from a subset of the users. For each user we include the positive examples as well as 15 randomly chosen instances corresponding to events the user has not attended. Regression models are trained that produce a real-valued output for user-event pairs, which allows us to rank events according to the predicted preference scores.
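The instance construction described in the training strategy can be sketched as follows. The function `feature_scores` stands in for the individual predictors and is a hypothetical callable returning a list of scores for a user-event pair; the sampling seed and the data layout are also assumptions made for illustration.

```python
import random


def build_training_set(train_users, events, attended, feature_scores,
                       n_negatives=15, seed=0):
    """Return (feature_list, label) pairs: +1 for attended events and
    -1 for randomly sampled events the user did not attend."""
    rng = random.Random(seed)
    instances = []
    for user in train_users:
        positives = attended[user]
        for event in sorted(positives):
            instances.append((feature_scores(user, event), +1))
        candidates = [e for e in events if e not in positives]
        sampled = rng.sample(candidates, min(n_negatives, len(candidates)))
        for event in sampled:
            instances.append((feature_scores(user, event), -1))
    return instances
```

Sampling a fixed number of negatives per user keeps the training set small, while the test phase can still score every user-event combination.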

Evaluation Strategy. We adopt the same 10-fold cross validation procedure as presented in the “Evaluation Methodology” section. The difference is that now for each of our training users we have a set of positive and negative examples which constitute the training set. Note that although the training phase includes a reduced set of user-event samples, in the testing phase we evaluate against all possible combinations of test users and events. The supervised learning algorithms we have experimented with are linear ridge regression (with the regularization parameter set to λ = 10^-8) (Hoerl and Kennard 1970) and M5 model trees (Quinlan 1992). We have used the publicly available implementations in the WEKA framework (Witten, Frank, and Hall 2011). Two versions of the algorithms are considered: one that combines all features and one that excludes the random walk probability scores from the socio-spatial graph. This separation allows us to evaluate an additional hypothesis: the place type preferences of friends implicitly expressed with the random walk scores are a fundamentally different signal not captured in a combination of other features.
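To make the linear model concrete, the following is a minimal closed-form ridge regression sketch that solves (XᵀX + λI)w = Xᵀy. It is a stand-in for illustration only: the paper uses the WEKA implementations, whereas this sketch is plain Python, adds a constant column as an intercept, and for simplicity applies the penalty to the intercept as well. M5 model trees are not sketched here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(features, labels, lam=1e-8):
    """Return weights (last entry is the intercept) minimizing
    squared error plus lam times the squared norm of the weights."""
    rows = [list(f) + [1.0] for f in features]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature_vector):
    """Real-valued preference score for one user-event feature list."""
    return sum(w * x for w, x in zip(weights, list(feature_vector) + [1.0]))
```

Events for a test user would then be sorted by `predict` in descending order, giving a ranked list that can be evaluated in the same way as the single features.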

Results. By comparing the supervised models against the single predictors in Table 7, we find that the M5+RWR trees attain the best performance. They outperform the incorporated best single features by a clear margin (0.117 for London, 0.057 for Chicago, and 0.099 for New York) and better the results of the linear regression models. This suggests that a combination of temporal, spatial and social signals integrated into a supervised learning framework can prove highly effective in predicting the participation of users in events in location-based services. Furthermore, the regularized linear regression model does not provide consistently good results, even when it is compared with the single features. In Chicago, for example, the linear regression model LR achieves a score of 0.311, which is only slightly above the 0.306 value of the Social Influence feature. Thus, a non-linear combination of features may provide a more effective modeling recipe in inferring the event interests of Foursquare users. A similar finding with respect to non-linearly mixing spatio-temporal signals for personalized venue search in Foursquare has been highlighted by Shaw et al. (Shaw et al. 2013).

                   London  Chicago  New York
Popularity         0.411   0.275    0.262
Social Influence   0.290   0.306    0.268
LR                 0.481   0.311    0.336
M5                 0.494   0.346    0.344
LR + RWR           0.505   0.324    0.343
M5 + RWR           0.528   0.363    0.367

Table 7: Averaged NDCG@10 for the different supervised learning algorithms.

                   London  Chicago  New York
Popularity         0.267   0.168    0.151
Social Influence   0.220   0.198    0.160
LR                 0.293   0.152    0.179
M5                 0.344   0.205    0.185
LR + RWR           0.307   0.165    0.182
M5 + RWR           0.372   0.229    0.212

Table 8: Averaged user Accuracy@1 for the different supervised learning algorithms compared against Popularity and Social Influence.

Our further hypothesis is that the place type preferences of friends are a fundamentally different signal not captured as a combination of other features. We confirm this by the important observation that using the random walk scores as a feature in the supervised learning framework improves the results for both the linear regression and the M5 model trees algorithms. In Chicago, for instance, the averaged NDCG@10 for the random walk on the socio-spatial graph achieves a score of 0.221, which is lower than that of the home distance. When this random walk signal is fused into the M5 tree, the results soar to 0.363, which is much higher than the 0.306 value of the best performing feature. Similarly strong results hold for London and New York as well.

In terms of user accuracy it is also notable that the only model that is able to substantially outperform the Accuracy@1 of the best single feature across all cities is the M5 Tree + RWR (Table 8). The accuracy for London goes as high as 37%, which means that for roughly 1 in every 3 users the model correctly identifies the exact event the user will attend. In New York and Chicago the corresponding figure is about one in five users.

Discussion and Implications

The analysis and subsequent evaluation of the event participation prediction problem in Foursquare has revealed interesting insights both on the nature of social events, as seen through the lens of location-based services, and the algorithmic strategies one may employ to recommend events.

The superiority in the performance of social signals can be identified on three fronts. First, event popularity, which can be related to the strong social urge to follow trending behavior, tops the results in London. Second, the explicit social filtering, which accounts for whether friends are attending an event, has performed very well in all cities. To some extent, this behavior could be attributed to the presence of a social identity where individuals participate in the event to share collective experiences with friends. Third, the mechanics and performance of the random walk strategy have uncovered the presence of an implicit social signal hidden in the user preferences (interests) for particular place types and by extension for specific event types. This could be viewed as a form of homophily that brings together like-minded users at social events. We have shown that in the cases of niche events this signal yields excellent performance.

Although we have observed some diversity in the performance of the various participation features both across event types and cities, we have offered a recipe that copes with these issues. A supervised learning approach has proven effective in combining the different information signals into a unified framework so as to provide top performance in all contexts. In the event recommendation task, our findings suggest not only that combining multiple factors is highly desirable, but also that extracting social signals is of utmost importance for achieving high accuracy.

These results should be interpreted in the context of potential biases originating from the data collection and the check-in process in Foursquare. On the one hand, our dataset relies on users who have explicitly shared their whereabouts via Twitter. According to Scellato et al. (Scellato et al. 2011) such users constitute between 20% and 25% of the total Foursquare population in 2010. On the other hand, it is hard to validate the true intention of the users when they check in at particular venues. As we primarily focus on studying aggregated behavior from a large user base, however, our approach is able to tolerate a certain amount of noise.

Related Work

Event Mobility Analysis and Detection. To our knowledge, event analysis so far has been limited to isolated cases and specific types of events. Xavier et al. (Xavier et al. 2013), for instance, focus on mobility aspects of users during large-scale events but fail to provide any insights as to why users attend the event. Calabrese et al. (Calabrese et al. 2010) have studied crowd mobility during special events but they have solely concentrated on correlating the type of the event with the origin of people attending it. Only recently have online social networks entered the event detection arena (Sakaki, Okazaki, and Matsuo 2010) due to the massive amounts of timely user-generated content in response to external anomalous events. Sklar et al. (Sklar, Shaw, and Hogue 2012) have built a real-time event detection engine in Foursquare that is based on a probabilistic model for measuring how unusually busy a place becomes. Although they recommend the detected nearby events to users, they do not focus on understanding the relationship between the user’s past check-in patterns and the likelihood of attending certain events.

Event Prediction. The event prediction problem has been studied by Quercia et al. (Quercia et al. 2010) when providing cold-start event recommendations for users whose home location is known. However, the authors have not focused on personalization. Three other prominent examples of event recommender systems have been built in the domains of on-going cultural events, scientific and conference talks. Lee (Lee 2008) exploits trust relations together with explicit user feedback to recommend cultural events, while Minkov et al. (Minkov et al. 2010) combine content-based with collaborative filtering approaches to capture user preferences towards latent topics hidden in scientific talk announcements. Liao et al. (Liao et al. 2013) further develop latent models based on offline spontaneous interactions and co-attendance information to recommend related events in offline ephemeral social networks formed around conference talks. In comparison to these works, the events that we study in location-based social services currently lack many of the contextual advantages that the above mentioned systems take for granted: explicit event preference information, on-going nature of specific events, detailed topic descriptions and offline interaction data.

Conclusions

In this work we have studied the spatio-temporal and social forces behind users’ decisions to attend certain events as seen through location-based social networks. We have defined a prediction framework that at the expense of some potential attendance knowledge assesses different dimensions of homophily effects observed through collective participation in events. While social forces tend to dominate over the others, confirming theories on crowd behavior, we uncover some heterogeneities in the performance of the prediction features across cities and event types. This indicates that combining the disparate signals into a supervised learning framework for event participation prediction is necessary for obtaining top performance in all cases. The insights drawn and the framework developed in this work could help towards designing better personalized event recommender systems in the context of mobile applications and help the new generation of location-based services, including Foursquare, to engage further with their users.

Acknowledgments

We acknowledge the support of Microsoft Research and EPSRC through grant GALE (EP/K019392).

References

[Albert and Barabasi 2002] Albert, R., and Barabasi, A.-L. 2002. Statistical mechanics of complex networks. Rev. Mod. Phys. 74:47–97.

[Andersen et al. 2008] Andersen, R.; Borgs, C.; Chayes, J.; Feige, U.; Flaxman, A.; Kalai, A.; Mirrokni, V.; and Tennenholtz, M. 2008. Trust-based recommendation systems: an axiomatic approach. In WWW ’08.

[Aveni 1977] Aveni, A. F. 1977. The Not-So-Lonely Crowd: Friendship Groups in Collective Behavior. Sociometry 40(1):96–99.

[Baeza-Yates and Ribeiro-Neto 1999] Baeza-Yates, R. A., and Ribeiro-Neto, B. 1999. Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc.

[Calabrese et al. 2010] Calabrese, F.; Pereira, F. C.; Di Lorenzo, G.; Liu, L.; and Ratti, C. 2010. The geography of taste: Analyzing cell-phone mobility and social events. In Pervasive ’10.

[Cho, Myers, and Leskovec 2011] Cho, E.; Myers, S. A.; and Leskovec, J. 2011. Friendship and mobility: user movement in location-based social networks. In KDD ’11.

[Crompton and McKay 1997] Crompton, J. L., and McKay, S. L. 1997. Motives of visitors attending festival events. Annals of Tourism Research 24:425–439.

[Drury and Reicher 1999] Drury, J., and Reicher, S. 1999. The Intergroup Dynamics of Collective Empowerment: Substantiating the Social Identity Model of Crowd Behavior. Group Process and Intergroup Relations 381–402.

[Hoerl and Kennard 1970] Hoerl, A. E., and Kennard, R. W. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12:55–67.

[Jamali and Ester 2009] Jamali, M., and Ester, M. 2009. Trustwalker: a random walk model for combining trust-based and item-based recommendation. In KDD ’09.

[Jarvelin and Kekalainen 2002] Jarvelin, K., and Kekalainen, J. 2002. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4):422–446.

[Kendall 1938] Kendall, M. G. 1938. A new measure of rank correlation. Biometrika 30(1/2):81–93.

[Le Bon 2001] Le Bon, G. 2001. The crowd: a study of the popular mind. Batoche Books. Originally published 1896.

[Lee 2008] Lee, D. H. 2008. Pittcult: trust-based cultural event recommender. In RecSys ’08.

[Liang et al. 2013] Liang, Y.; Caverlee, J.; Cheng, Z.; and Kamath, K. Y. 2013. How big is the crowd?: event and location based population modeling in social media. In HT.

[Liao et al. 2013] Liao, G.; Zhao, Y.; Xie, S.; and Yu, P. S. 2013. Latent networks fusion based model for event recommendation in offline ephemeral social networks. In CIKM ’13.

[McPherson, Smith-Lovin, and Cook 2001] McPherson, M.; Smith-Lovin, L.; and Cook, J. M. 2001. Birds of a feather: Homophily in social networks. Annual Review of Sociology 27(1):415–444.

[Minkov et al. 2010] Minkov, E.; Charrow, B.; Ledlie, J.; Teller, S.; and Jaakkola, T. 2010. Collaborative future event recommendation. In CIKM ’10.

[Page et al. 1999] Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1999. The pagerank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab. Previous number = SIDL-WP-1999-0120.

[Quercia et al. 2010] Quercia, D.; Lathia, N.; Calabrese, F.; Di Lorenzo, G.; and Crowcroft, J. 2010. Recommending social events from mobile phone location data. In ICDM ’10.

[Quinlan 1992] Quinlan, J. R. 1992. Learning with continuous classes. 343–348. World Scientific.

[Sakaki, Okazaki, and Matsuo 2010] Sakaki, T.; Okazaki, M.; and Matsuo, Y. 2010. Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. In WWW ’10.

[Scellato et al. 2011] Scellato, S.; Noulas, A.; Lambiotte, R.; and Mascolo, C. 2011. Socio-spatial properties of online location-based social networks. Proceedings of ICWSM 11.

[Shaw et al. 2013] Shaw, B.; Shea, J.; Sinha, S.; and Hogue, A. 2013. Learning to rank for spatiotemporal search. In WSDM ’13.

[Sklar, Shaw, and Hogue 2012] Sklar, M.; Shaw, B.; and Hogue, A. 2012. Recommending interesting events in real-time with foursquare check-ins. In RecSys ’12.

[Song et al. 2010] Song, C.; Qu, Z.; Blumm, N.; and Barabasi, A.-L. 2010. Limits of Predictability in Human Mobility. Science 327(5968):1018–1021.

[Tong, Faloutsos, and Pan 2006] Tong, H.; Faloutsos, C.; and Pan, J.-Y. 2006. Fast random walk with restart and its applications. In ICDM ’06.

[Witten, Frank, and Hall 2011] Witten, I. H.; Frank, E.; and Hall, M. A. 2011. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 3 edition.

[Xavier et al. 2013] Xavier, F. H. Z.; Silveira, L. M.; Almeida, J. M.; Malab, C. H. S.; Ziviani, A.; and Marques-Neto, H. T. 2013. Understanding human mobility due to large-scale events. In NetMob ’13.
