
TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 6, NO. 1, NOVEMBER 2012 1

LARS*: An Efficient and Scalable Location-Aware Recommender System
Mohamed Sarwat∗, Justin J. Levandoski†, Ahmed Eldawy∗ and Mohamed F. Mokbel∗

∗Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455
†Microsoft Research, Redmond, WA 98052-6399

Abstract—This paper proposes LARS*, a location-aware recommender system that uses location-based ratings to produce recommendations. Traditional recommender systems do not consider spatial properties of users nor items; LARS*, on the other hand, supports a taxonomy of three novel classes of location-based ratings, namely, spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items. LARS* exploits user rating locations through user partitioning, a technique that influences recommendations with ratings spatially close to querying users in a manner that maximizes system scalability while not sacrificing recommendation quality. LARS* exploits item locations using travel penalty, a technique that favors recommendation candidates closer in travel distance to querying users in a way that avoids exhaustive access to all spatial items. LARS* can apply these techniques separately, or together, depending on the type of location-based rating available. Experimental evidence using large-scale real-world data from both the Foursquare location-based social network and the MovieLens movie recommendation system reveals that LARS* is efficient, scalable, and capable of producing recommendations twice as accurate compared to existing recommendation approaches.

Index Terms—Recommender System, Spatial, Location, Performance, Efficiency, Scalability, Social.

1 INTRODUCTION

RECOMMENDER systems make use of community opinions to help users identify useful items from a considerably large search space (e.g., Amazon inventory [1], Netflix movies¹). The technique used by many of these systems is collaborative filtering (CF) [2], which analyzes past community opinions to find correlations of similar users and items to suggest k personalized items (e.g., movies) to a querying user u. Community opinions are expressed through explicit ratings represented by the triple (user, rating, item) that represents a user providing a numeric rating for an item.

Currently, myriad applications can produce location-based ratings that embed user and/or item locations. For example, location-based social networks (e.g., Foursquare² and Facebook Places [3]) allow users to “check-in” at spatial destinations (e.g., restaurants) and rate their visit, thus are capable of associating both user and item locations with ratings. Such ratings motivate an interesting new paradigm of location-aware recommendations, whereby the recommender system exploits the spatial aspect of ratings when producing recommendations. Existing recommendation techniques [4] assume ratings are represented by the (user, rating, item) triple, thus are ill-equipped to produce location-aware recommendations.

In this paper, we propose LARS*, a novel location-aware recommender system built specifically to produce high-quality location-based recommendations in an efficient manner. LARS* produces recommendations using a taxonomy of three types of location-based ratings within a single framework.

This work is supported in part by the National Science Foundation under Grants IIS-0811998, IIS-0811935, CNS-0708604, IIS-0952977 and by a Microsoft Research Gift.

1. Netflix: http://www.netflix.com
2. Foursquare: http://foursquare.com

(a) MovieLens preference locality: top-4 movie genres and average ratings of users from three U.S. states.
    Minnesota: Film-Noir (3.6), War (4.0), Drama (4.0), Documentary (3.9)
    Wisconsin: War (3.8), Film-Noir (4.3), Mystery (4.1), Romance (4.0)
    Florida: Fantasy (4.0), Animation (3.6), War (3.7), Musical (3.8)

(b) Foursquare preference locality: visited venues (percentage of visits) for users from three Minnesota cities.
    Users from Edina, MN: Minneapolis, MN (37%), Edina, MN (59%), Eden Prairie, MN (5%)
    Users from Robbinsdale, MN: Brooklyn Park, MN (32%), Robbinsdale, MN (20%), Minneapolis, MN (15%)
    Users from Falcon Heights, MN: St. Paul, MN (17%), Minneapolis, MN (13%), Roseville, MN (10%)

Fig. 1. Preference locality in location-based ratings.

The three types of location-based ratings are: (1) spatial ratings for non-spatial items, represented as a four-tuple (user, ulocation, rating, item), where ulocation represents a user location, for example, a user located at home rating a book; (2) non-spatial ratings for spatial items, represented as a four-tuple (user, rating, item, ilocation), where ilocation represents an item location, for example, a user with unknown location rating a restaurant; and (3) spatial ratings for spatial items, represented as a five-tuple (user, ulocation, rating, item, ilocation), for example, a user at his/her office rating a restaurant visited for lunch. Traditional rating triples can be classified as non-spatial ratings for non-spatial items and do not fit this taxonomy.

1.1 Motivation: A Study of Location-Based Ratings

The motivation for our work comes from analysis of two real-world location-based rating datasets: (1) a subset of the well-known MovieLens dataset [5] containing approximately 87K movie ratings associated with user zip codes (i.e., spatial ratings for non-spatial items) and (2) data from the Foursquare [6] location-based social network containing user visit data for 1M users to 643K venues across the United States (i.e., spatial ratings for spatial items). In our analysis we consistently observed two interesting properties that motivate the need for location-aware recommendation techniques.

Preference locality. Preference locality suggests users from a spatial region (e.g., neighborhood) prefer items (e.g., movies, destinations) that are manifestly different than items preferred by users from other, even adjacent, regions. Figure 1(a) lists the top-4 movie genres using average MovieLens ratings of users from different U.S. states. While each list is different, the top genres from Florida differ vastly from the others. Florida’s list contains three genres (“Fantasy”, “Animation”, “Musical”) not in the other lists. This difference implies movie preferences are unique to specific spatial regions, and confirms previous work from the New York Times [7] that analyzed Netflix user queues across U.S. zip codes and found similar differences. Meanwhile, Figure 1(b) summarizes our observation of preference locality in Foursquare by depicting the visit destinations for users from three adjacent Minnesota cities. Each sample exhibits diverse behavior: users from Falcon Heights, MN favor venues in St. Paul, MN (17% of visits), Minneapolis, MN (13%), and Roseville, MN (10%), while users from Robbinsdale, MN prefer venues in Brooklyn Park, MN (32%) and Robbinsdale (20%). Preference locality suggests that recommendations should be influenced by location-based ratings spatially close to the user. The intuition is that localization influences recommendation using the unique preferences found within the spatial region containing the user.

Travel locality. Our second observation is that, when recommended items are spatial, users tend to travel a limited distance when visiting these venues. We refer to this property as “travel locality.” In our analysis of Foursquare data, we observed that 45% of users travel 10 miles or less, while 75% travel 50 miles or less. This observation suggests that spatial items closer in travel distance to a user should be given precedence as recommendation candidates. In other words, a recommendation loses efficacy the further a querying user must travel to visit the destination. Existing recommendation techniques do not consider travel locality, thus may recommend users destinations with burdensome travel distances (e.g., a user in Chicago receiving restaurant recommendations in Seattle).

1.2 Our Contribution: LARS* - A Location-Aware Recommender System

Like traditional recommender systems, LARS* suggests k items personalized for a querying user u. However, LARS* is distinct in its ability to produce location-aware recommendations using each of the three types of location-based ratings within a single framework.

LARS* produces recommendations using spatial ratings for non-spatial items, i.e., the tuple (user, ulocation, rating, item), by employing a user partitioning technique that exploits preference locality. This technique uses an adaptive pyramid structure to partition ratings by their user location attribute into spatial regions of varying sizes at different hierarchies. For a querying user located in a region R, we apply an existing collaborative filtering technique that utilizes only the ratings located in R. The challenge, however, is to determine whether all regions in the pyramid must be maintained in order to balance two contradicting factors: scalability and locality. Maintaining a large number of regions increases locality (i.e., recommendations unique to smaller spatial regions), yet adversely affects system scalability because each region requires storage and maintenance of a collaborative filtering data structure necessary to produce recommendations (i.e., the recommender model). The LARS* pyramid dynamically adapts to find the right pyramid shape that balances scalability and recommendation locality.

LARS* produces recommendations using non-spatial ratings for spatial items, i.e., the tuple (user, rating, item, ilocation), by using travel penalty, a technique that exploits travel locality. This technique penalizes recommendation candidates the further they are in travel distance to a querying user. The challenge here is to avoid computing the travel distance for all spatial items to produce the list of k recommendations, as this would greatly consume system resources. LARS* addresses this challenge by employing an efficient query processing framework capable of terminating early once it discovers that the list of k answers cannot be altered by processing more recommendation candidates. To produce recommendations using spatial ratings for spatial items, i.e., the tuple (user, ulocation, rating, item, ilocation), LARS* employs both the user partitioning and travel penalty techniques to address the user and item locations associated with the ratings. This is a salient feature of LARS*, as the two techniques can be used separately, or in concert, depending on the location-based rating type available in the system.

We experimentally evaluate LARS* using real location-based ratings from Foursquare [6] and MovieLens [5], along with a generated user workload of both snapshot and continuous queries. Our experiments show LARS* is scalable to real large-scale recommendation scenarios. Since we have access to real data, we also evaluate recommendation quality by building LARS* with 80% of the spatial ratings and testing recommendation accuracy with the remaining 20% of the (withheld) ratings. We find LARS* produces recommendations that are twice as accurate (i.e., able to better predict user preferences) compared to traditional collaborative filtering. In summary, the contributions of this paper are as follows:

• We provide a novel classification of three types of location-based ratings not supported by existing recommender systems: spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items.

• We propose LARS*, a novel location-aware recommender system capable of using three classes of location-based ratings. Within LARS*, we propose: (a) a user partitioning technique that exploits user locations in a way that maximizes system scalability while not sacrificing recommendation locality and (b) a travel penalty technique that exploits item locations and avoids exhaustively processing all spatial recommendation candidates.

• LARS* distinguishes itself from LARS [8] in the following points: (1) LARS* achieves higher locality gain than LARS using a better user partitioning data structure and algorithm. (2) LARS* exhibits a more flexible tradeoff between locality and scalability. (3) LARS* provides a more efficient way to maintain the user partitioning structure, as opposed to the expensive maintenance operations of LARS.

Fig. 2. Item-based CF model generation: (a) the user-item ratings matrix; (b) the item-based CF model, which stores for each item a list of similar items with their similarity scores.

• We provide experimental evidence that LARS* scales to large-scale recommendation scenarios and provides better quality recommendations than traditional approaches.

This paper is organized as follows: Section 2 gives an overview of LARS*. Section 3 describes how LARS* handles traditional non-spatial ratings for non-spatial items. Sections 4, 5, and 6 cover LARS* recommendation techniques using spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items, respectively. Section 7 provides experimental analysis. Section 8 covers related work, while Section 9 concludes the paper.

2 LARS* OVERVIEW

This section provides an overview of LARS* by discussing the query model and the collaborative filtering method.

2.1 LARS* Query Model

Users (or applications) provide LARS* with a user id U, a numeric limit K, and a location L; LARS* then returns K recommended items to the user. LARS* supports both snapshot (i.e., one-time) queries and continuous queries, whereby a user subscribes to LARS* and receives recommendation updates as her location changes. The technique LARS* uses to produce recommendations depends on the type of location-based rating available in the system. Query processing support for each type of location-based rating is discussed in Sections 4 to 6.

2.2 Item-Based Collaborative Filtering

LARS* uses item-based collaborative filtering (abbr. CF) as its primary recommendation technique, chosen due to its popularity and widespread adoption in commercial systems (e.g., Amazon [1]). Collaborative filtering (CF) assumes a set of n users U = {u1, ..., un} and a set of m items I = {i1, ..., im}. Each user uj expresses opinions about a set of items Iuj ⊆ I. Opinions can be a numeric rating (e.g., the Netflix scale of one to five stars), or unary (e.g., Facebook “check-ins” [3]). Conceptually, ratings are represented as a matrix with users and items as dimensions, as depicted in Figure 2(a). Given a querying user u, CF produces a set of k recommended items Ir ⊂ I that u is predicted to like the most.
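For illustration, the sketch below shows one way the sparse ratings matrix of Figure 2(a) could be represented in code; the nested-dictionary layout and helper name are illustrative assumptions, not part of the LARS* implementation.

from collections import defaultdict

# Sparse user-item rating matrix (Figure 2(a)): ratings[user][item] = value.
# The value is a numeric rating (e.g., 1-5 stars) or 1 for unary "check-ins".
ratings = defaultdict(dict)

def add_rating(user, item, value):
    """Record one (user, rating, item) triple."""
    ratings[user][item] = value

add_rating("u_k", "i_p", 4)   # user u_k rates item i_p
add_rating("u_k", "i_q", 5)   # user u_k rates item i_q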

Phase I: Model Building. This phase computes a similarity score sim(ip, iq) for each pair of items ip and iq that have at least one common rating by the same user (i.e., co-rated dimensions). Similarity computation is covered below. Using these scores, a model is built that stores, for each item i ∈ I, a list L of similar items ordered by similarity score sim(ip, iq), as depicted in Figure 2(b). Building this model is an O(R^2/U) process [1], where R and U are the number of ratings and users, respectively. It is common to truncate the model by storing, for each list L, only the n most similar items with the highest similarity scores [9]. The value of n is referred to as the model size and is usually much less than |I|.

Fig. 3. Item-based similarity calculation: the vectors for items ip and iq, with their co-rated dimensions highlighted.

Phase II: Recommendation Generation. Given a querying user u, recommendations are produced by computing u's predicted rating P(u,i) for each item i not rated by u [9]:

P(u, i) = \frac{\sum_{l \in L} sim(i, l) \cdot r_{u,l}}{\sum_{l \in L} |sim(i, l)|}    (1)

Before this computation, we reduce each similarity list L to contain only items rated by user u. The prediction is the sum of r_{u,l}, user u's rating for a related item l ∈ L, weighted by sim(i,l), the similarity of l to candidate item i, then normalized by the sum of similarity scores between i and l. The user receives as recommendations the top-k items ranked by P(u,i).
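A minimal sketch of this recommendation-generation phase follows, assuming the CF model is a dictionary mapping each item to its (truncated) similarity list; the function and variable names are illustrative.

def predict(user_ratings, model, item):
    """P(u,i) per Equation 1: similarity-weighted average of the user's ratings
    over the items in item's similarity list; items not rated by u are skipped,
    i.e., the list is reduced to items rated by u."""
    num = den = 0.0
    for other, sim in model.get(item, []):
        if other in user_ratings:
            num += sim * user_ratings[other]
            den += abs(sim)
    return num / den if den else 0.0

def recommend(user_ratings, model, k):
    """Return the top-k items (not yet rated by the user) ranked by P(u,i)."""
    candidates = [i for i in model if i not in user_ratings]
    candidates.sort(key=lambda i: predict(user_ratings, model, i), reverse=True)
    return candidates[:k]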

Computing Similarity. To compute sim(ip, iq), we represent each item as a vector in the user-rating space of the rating matrix. For instance, Figure 3 depicts vectors for items ip and iq from the matrix in Figure 2(a). Many similarity functions have been proposed (e.g., Pearson Correlation, Cosine); we use the Cosine similarity in LARS* due to its popularity:

sim(i_p, i_q) = \frac{\vec{i_p} \cdot \vec{i_q}}{\|\vec{i_p}\| \, \|\vec{i_q}\|}    (2)

This score is calculated using the vectors’ co-rated dimensions, e.g., the Cosine similarity between ip and iq in Figure 3 is .7, calculated using the circled co-rated dimensions. Cosine distance is useful for numeric ratings (e.g., on a scale [1,5]). For unary ratings, other similarity functions are used (e.g., absolute sum [10]).
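The sketch below illustrates this computation over co-rated dimensions, together with the Phase I truncation of each list to the n most similar items; item vectors are assumed (for illustration only) to be dictionaries mapping user ids to ratings.

from math import sqrt

def cosine_sim(vec_p, vec_q):
    """Equation 2, restricted to co-rated dimensions (users who rated both items)."""
    co_rated = set(vec_p) & set(vec_q)
    if not co_rated:
        return 0.0
    dot = sum(vec_p[u] * vec_q[u] for u in co_rated)
    norm_p = sqrt(sum(vec_p[u] ** 2 for u in co_rated))
    norm_q = sqrt(sum(vec_q[u] ** 2 for u in co_rated))
    return dot / (norm_p * norm_q)

def build_model(item_vectors, n):
    """Phase I sketch: for each item, keep only the n most similar items (the model size)."""
    model = {}
    for ip, vp in item_vectors.items():
        sims = [(iq, cosine_sim(vp, vq)) for iq, vq in item_vectors.items() if iq != ip]
        sims.sort(key=lambda s: s[1], reverse=True)
        model[ip] = sims[:n]
    return model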

While we opt to use item-based CF in this paper, no factors disqualify us from employing other recommendation techniques. For instance, we could easily employ user-based CF [4], which uses correlations between users (instead of items).

3 NON-SPATIAL USER RATINGS FOR NON-SPATIAL ITEMS

The traditional item-based collaborative filtering (CF) method is a special case of LARS*. CF takes as input the classical rating triplet (user, rating, item) such that neither the user location nor the item location is specified. In this case, LARS* directly employs the traditional model building phase (Phase I in Section 2) to calculate the similarity scores between all items. Moreover, recommendations are produced to the users using the recommendation generation phase (Phase II in Section 2). In the rest of the paper, we explain how LARS* incorporates either the user spatial location or the item spatial location to serve location-aware recommendations to the system users.

Fig. 4. Pyramid data structure: a partial pyramid whose levels partition the space into successively finer grids, with cells maintained as α-Cells, β-Cells, or γ-Cells.

4 SPATIAL USER RATINGS FOR NON-SPATIAL ITEMS

This section describes how LARS* produces recommendations using spatial ratings for non-spatial items represented by the tuple (user, ulocation, rating, item). The idea is to exploit preference locality, i.e., the observation that user opinions are spatially unique (based on the analysis in Section 1.1). We identify three requirements for producing recommendations using spatial ratings for non-spatial items: (1) Locality: recommendations should be influenced by those ratings with user locations spatially close to the querying user location (i.e., in a spatial neighborhood); (2) Scalability: the recommendation procedure and data structure should scale up to large numbers of users; (3) Influence: system users should have the ability to control the size of the spatial neighborhood (e.g., city block, zip code, or county) that influences their recommendations.

LARS* achieves its requirements by employing a user partitioning technique that maintains an adaptive pyramid structure, where the shape of the adaptive pyramid is driven by the three goals of locality, scalability, and influence. The idea is to adaptively partition the rating tuples (user, ulocation, rating, item) into spatial regions based on the ulocation attribute. Then, LARS* produces recommendations using any existing collaborative filtering method (we use item-based CF) over the remaining three attributes (user, rating, item) of only the ratings within the spatial region containing the querying user. We note that ratings can come from users with varying tastes, and that our method only forces collaborative filtering to produce personalized user recommendations based only on ratings restricted to a specific spatial region. In this section, we describe the pyramid structure in Section 4.1, query processing in Section 4.2, and finally data structure maintenance in Section 4.3.

!"#

$"#

$%#

$&#

$'#

"()#

&%((#

"'#

*'#

!%#

$"#

$%#

$&#

$'#

""("#

"*((#

)+,#

"&')#

!&#

$"#

$%#

$&#

$'#

*'#

*(*#

%&((#

"-(,#

./01#23456/#$7859# ./01#23456/#$7859# ./01#23456/#$7859#

Fig. 5. Example of Items Ratings Statistics Table

4.1 Data Structure

LARS* employs a partial in-memory pyramid structure [11] (equivalent to a partial quad-tree [12]) as depicted in Figure 4. The pyramid decomposes the space into H levels (i.e., the pyramid height). For a given level h, the space is partitioned into 4^h equal-area grid cells. For example, at the pyramid root (level 0), one grid cell represents the entire geographic area, level 1 partitions space into four equi-area cells, and so forth. We represent each cell with a unique identifier cid.
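As a concrete illustration of this regular decomposition, the sketch below maps a location to the grid cell containing it at a given level. The bounding box, normalization, and (level, row, column) identifier scheme are assumptions made for illustration, not the exact cid encoding used by LARS*.

def cell_id(lat, lon, level, bounds=(-90.0, 90.0, -180.0, 180.0)):
    """Identify the grid cell containing (lat, lon) at a pyramid level.
    Level h partitions the space into 4**h equal-area cells (2**h per axis)."""
    min_lat, max_lat, min_lon, max_lon = bounds
    side = 2 ** level
    row = min(int((lat - min_lat) / (max_lat - min_lat) * side), side - 1)
    col = min(int((lon - min_lon) / (max_lon - min_lon) * side), side - 1)
    return (level, row, col)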

A rating may belong to up to H pyramid cells: one per pyramid level, starting from the lowest maintained grid cell containing the embedded user location up to the root level. To provide a tradeoff between recommendation locality and system scalability, the pyramid data structure maintains three types of cells (see Figure 4): (1) Recommendation Model Cell (α-Cell), (2) Statistics Cell (β-Cell), and (3) Empty Cell (γ-Cell), explained as follows:

Recommendation Model Cell (α-Cell). Each α-Cell stores an item-based collaborative filtering model built using only the spatial ratings with user locations contained in the cell’s spatial region. Note that the root cell (level 0) of the pyramid is an α-Cell and represents a “traditional” (i.e., non-spatial) item-based collaborative filtering model. Moreover, each α-Cell maintains statistics about all the ratings located within the spatial extents of the cell. Each α-Cell Cp maintains a hash table, named the Items Ratings Statistics Table, that indexes all items (by their IDs) that have been rated in this cell. For each indexed item i in the Items Ratings Statistics Table, we maintain four parameters; each parameter represents the number of user ratings for item i located in one of the four children cells (i.e., C1, C2, C3, and C4) of cell Cp. An example of the maintained parameters is given in Figure 5. Assume that cell Cp contains ratings for three items I1, I2, and I3. Figure 5 shows the maintained statistics for each item in cell Cp. For example, for item I1, the number of user ratings located in child cells C1, C2, C3, and C4 is equal to 109, 3200, 14, and 54, respectively. Similarly, the number of user ratings is calculated for items I2 and I3.
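The per-cell hash table could look like the following sketch, keyed by item id and holding one counter per child quadrant; the class and method names are illustrative assumptions.

from collections import defaultdict

class ItemsRatingsStatisticsTable:
    """Per-cell hash table: item id -> number of ratings whose user location
    falls in each of the cell's four child cells (C1..C4)."""
    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0, 0, 0])

    def add_rating(self, item_id, child_index):
        """Count one rating of item_id located in child cell child_index (0..3)."""
        self.counts[item_id][child_index] += 1

    def ratings_per_child(self, item_id):
        return self.counts[item_id]

# Mirroring Figure 5: item I1 would map to [109, 3200, 14, 54],
# its rating counts in child cells C1..C4 of cell Cp.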

Statistics Cell (β-Cell). Like an α-Cell, a β-Cell maintains statistics (i.e., an Items Ratings Statistics Table) about the user/item ratings that are located within the spatial range of the cell. The only difference between an α-Cell and a β-Cell is that a β-Cell does not maintain a collaborative filtering (CF) model for the user/item ratings lying within its boundaries. Consequently, a β-Cell is a lightweight cell that incurs less storage than an α-Cell. In favor of system scalability, LARS* prefers a β-Cell over an α-Cell to reduce the total system storage.

Empty Cell (γ-Cell). A γ-Cell is a cell that maintains neither the statistics nor the recommendation model for the ratings lying within its boundaries. A γ-Cell is the most lightweight cell among all cell types, as it incurs almost no storage overhead. Note that an α-Cell can have α-Cell, β-Cell, or γ-Cell children. Also, a β-Cell can have α-Cell, β-Cell, or γ-Cell children. However, a γ-Cell cannot have any children.

4.1.1 Pyramid structure intuition

An α-Cell requires the highest storage and maintenance overhead because it maintains a CF model as well as the user/item ratings statistics. On the other hand, an α-Cell (as opposed to a β-Cell or γ-Cell) is the only cell type that can be leveraged to answer recommendation queries. A pyramid structure that only contains α-Cells achieves the highest recommendation locality, and this is why an α-Cell is considered the highest ranked cell type in LARS*. A β-Cell is the second ranked cell type, as it only maintains statistics about the user/item ratings. The storage and maintenance overhead incurred by a β-Cell is less expensive than that of an α-Cell. The statistics maintained at a β-Cell determine whether the children of that cell need to be maintained as α-Cells to serve more localized recommendations. Finally, a γ-Cell (lowest ranked cell type) has the least maintenance cost, as neither a CF model nor statistics are maintained for that cell. Moreover, a γ-Cell is a leaf cell in the pyramid.

LARS* upgrades (downgrades) a cell to a higher (lower) cell rank based on trade-offs between recommendation locality and system scalability (discussed in Section 4.3). If recommendation locality is preferred over scalability, more α-Cells are maintained in the pyramid. On the other hand, if scalability is favored over locality, more γ-Cells exist in the pyramid. β-Cells come as an intermediary stage between α-Cells and γ-Cells to further increase the recommendation locality while system scalability is not much affected.

We chose to employ a pyramid as it is a “space-partitioning” structure that is guaranteed to completely cover a given space. For our purposes, “data-partitioning” structures (e.g., R-trees) are less ideal than a “space-partitioning” structure for two main reasons: (1) “data-partitioning” structures index data points, and hence cover only locations that are inserted into them. In other words, “data-partitioning” structures are not guaranteed to completely cover a given space, which is not suitable for queries issued at arbitrary spatial locations. (2) In contrast to “data-partitioning” structures (e.g., R-trees [13]), “space-partitioning” structures show better performance for dynamic memory-resident data [14], [15], [16].

4.1.2 LARS* versus LARS

Table 1 compares LARS* against LARS. Like LARS*, LARS [8] employs a partial pyramid data structure to support spatial user ratings for non-spatial items. LARS differs from LARS* in the following aspects: (1) As shown in Table 1, LARS* maintains α-Cells, β-Cells, and γ-Cells, whereas LARS only maintains α-Cells and γ-Cells. In other words, LARS either merges or splits a pyramid cell based on a tradeoff between scalability and recommendation locality. LARS* employs the same tradeoff and further increases the recommendation locality by allowing more α-Cells to be maintained at lower pyramid levels. (2) As opposed to LARS, LARS* does not perform a speculative splitting operation to decide whether to maintain more localized CF models. Instead, LARS* maintains extra statistics at each α-Cell and β-Cell that help in quickly deciding whether a CF model needs to be maintained at a child cell. (3) As Table 1 shows, LARS* achieves higher recommendation locality than LARS. That is due to the fact that LARS maintains a CF recommendation model in a cell at pyramid level h if and only if a CF model at its parent cell at level h − 1 is also maintained. However, LARS* may maintain an α-Cell at level h even though its parent cell at level h − 1 does not maintain a CF model, i.e., the parent cell is a β-Cell. In LARS*, the role of a β-Cell is to keep the user/item ratings statistics that are used to quickly decide whether the child cells need to be γ-Cells or α-Cells. (4) As given in Table 1, LARS* incurs more storage overhead than LARS, which is explained by the fact that LARS* maintains an additional type of cell, i.e., β-Cells, whereas LARS only maintains α-Cells and γ-Cells. In addition, LARS* may also maintain more α-Cells than LARS in order to increase the recommendation locality. (5) Even though LARS* may maintain more α-Cells than LARS, besides the extra statistics maintained at β-Cells, LARS* nonetheless incurs less maintenance cost. That is because LARS* avoids the expensive speculative splitting operation employed by the LARS maintenance algorithm. Instead, LARS* employs the user/item ratings statistics maintained at either a β-Cell or an α-Cell to quickly decide whether the cell's children need to maintain a CF model (i.e., be upgraded to α-Cells), need only to maintain statistics (i.e., become β-Cells), or should be downgraded to γ-Cells.

TABLE 1
Comparison between LARS and LARS*. Detailed experimental evaluation results are provided in Section 7.

Supported Features:
  α-Cell:              LARS: Yes    LARS*: Yes
  β-Cell:              LARS: No     LARS*: Yes
  γ-Cell:              LARS: Yes    LARS*: Yes
  Speculative Split:   LARS: Yes    LARS*: No
  Rating Statistics:   LARS: No     LARS*: Yes
Performance Factors:
  Locality:            LARS*: ≈26% higher than LARS
  Storage:             LARS: ≈5% lower than LARS*
  Maintenance:         LARS*: ≈38% lower than LARS

4.2 Query Processing

Given a recommendation query (as described in Section 2.1) with user location L and a limit K, LARS* performs two query processing steps: (1) The user location L is used to find the lowest maintained α-Cell C in the adaptive pyramid that contains L. This is done by hashing the user location to retrieve the cell at the lowest level of the pyramid. If an α-Cell is not maintained at the lowest level, we return the nearest maintained ancestor α-Cell. (2) The top-k recommended items are generated using the item-based collaborative filtering technique (covered in Section 2.2) using the model stored at C. As mentioned earlier, the model in C is built using only the spatial ratings associated with user locations within C.
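Step (1) might be sketched as follows, assuming the pyramid is a dictionary keyed by cell identifiers (using the illustrative cell_id helper sketched in Section 4.1) and that each cell object carries its type; these names are assumptions, not the LARS* API.

def lowest_alpha_cell(pyramid, lat, lon, lowest_level):
    """Hash the user location to the lowest pyramid level, then walk up
    until a maintained alpha-Cell is found (the root is always an alpha-Cell)."""
    for level in range(lowest_level, -1, -1):
        cell = pyramid.get(cell_id(lat, lon, level))
        if cell is not None and cell.kind == "alpha":
            return cell
    return None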

In addition to traditional recommendation queries (i.e., snapshot queries), LARS* also supports continuous queries and can account for the influence requirement as follows.

Continuous queries. LARS* evaluates a continuous query in full once it is issued, and sends recommendations back to a user U as an initial answer. LARS* then monitors the movement of U using her location updates. As long as U does not cross the boundary of her current grid cell, LARS* does nothing, as the initial answer is still valid. Once U crosses a cell boundary, LARS* reevaluates the recommendation query only if the new cell is an α-Cell; in that case, LARS* only sends incremental updates [16] to the last reported answer. Like snapshot queries, if a cell at level h is not maintained, the query is temporarily transferred higher in the pyramid to the nearest maintained ancestor α-Cell. Note that since higher-level cells cover larger spatial regions, the continuous query will cross spatial boundaries less often, reducing the amount of recommendation updates.
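A hedged sketch of this continuous-query logic is shown below, building on the lowest_alpha_cell and recommend sketches above; the query object and its fields (current_cell, user_ratings, k) are hypothetical names used only to illustrate when reevaluation is triggered.

def on_location_update(pyramid, query, lat, lon, lowest_level):
    """Reevaluate a continuous query only when the user enters a different
    maintained alpha-Cell; otherwise the last reported answer remains valid."""
    cell = lowest_alpha_cell(pyramid, lat, lon, lowest_level)
    if cell is not query.current_cell:
        query.current_cell = cell
        answer = recommend(query.user_ratings, cell.cf_model, query.k)
        query.send_incremental_update(answer)   # push only the changed items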

Influence level. LARS* addresses the influence requirement by allowing querying users to specify an optional influence level (in addition to location L and limit K) that controls the size of the spatial neighborhood used to influence their recommendations. An influence level I maps to a pyramid level and acts much like a “zoom” level in Google or Bing maps (e.g., city block, neighborhood, entire city). The level I instructs LARS* to process the recommendation query starting from the grid α-Cell containing the querying user location at level I, instead of the lowest maintained grid α-Cell (the default). An influence level of zero forces LARS* to use the root cell of the pyramid, and thus act as a traditional (non-spatial) collaborative filtering recommender system.

4.3 Data Structure Maintenance

This section describes building and maintaining the pyramid data structure. Initially, to build the pyramid, all location-based ratings currently in the system are used to build a complete pyramid of height H, such that all cells in all H levels are α-Cells and contain ratings statistics and a collaborative filtering model. The initial height H is chosen according to the level of locality desired, where the cells in the lowest pyramid level represent the most localized regions. After this initial build, we invoke a cell type maintenance step that scans all cells starting from the lowest level h and downgrades cell types to either β-Cell or γ-Cell if necessary (cell type switching is discussed in Section 4.5.2). We note that while the original partial pyramid [11] was concerned with spatial queries over static data, it did not address pyramid maintenance.

4.4 Main Idea

As time goes by, new users, ratings, and items will be added to the system. This new data will both increase the size of the collaborative filtering models maintained in the pyramid cells and alter the recommendations produced from each cell.

Algorithm 1 Pyramid maintenance algorithm
 1: /* Called after cell C receives N% new ratings */
 2: Function PyramidMaintenance(Cell C, Level h)
 3: /* Step I: Statistics Maintenance */
 4: Maintain cell C statistics
 5: /* Step II: Model Rebuild */
 6: if (Cell C is an α-Cell) then
 7:    Rebuild item-based collaborative filtering model for cell C
 8: end if
 9: /* Step III: Cell Child Quadrant Maintenance */
10: if (C children quadrant q cells are α-Cells) then
11:    CheckDownGradeToSCells(q, C) /* covered in Section 4.5.2 */
12: else if (C children quadrant q cells are γ-Cells) then
13:    CheckUpGradeToSCells(q, C)
14: else
15:    isSwitchedToMcells ← CheckUpGradeToMCells(q, C) /* covered in Section 4.5.3 */
16:    if (isSwitchedToMcells is False) then
17:       CheckDownGradeToECells(q, C)
18:    end if
19: end if
20: return

To account for these changes, LARS* performs maintenance on a cell-by-cell basis. Maintenance is triggered for a cell C once it receives N% new ratings; the percentage is computed from the number of existing ratings in C. We do this because an appealing quality of collaborative filtering is that as a model matures (i.e., more data is used to build the model), more updates are needed to significantly change the top-k recommendations produced from it [17]. Thus, maintenance is needed less often.

We note the following features of pyramid maintenance: (1) Maintenance can be performed completely offline, i.e., LARS* can continue to produce recommendations using the “old” pyramid cells while part of the pyramid is being updated; (2) maintenance does not entail rebuilding the whole pyramid at once; instead, only one cell is rebuilt at a time; (3) maintenance is performed only after N% new ratings are added to a pyramid cell, meaning maintenance will be amortized over many operations.

4.5 Maintenance Algorithm

Algorithm 1 provides the pseudocode for the LARS* maintenance algorithm. The algorithm takes as input a pyramid cell C and level h, and includes three main steps: Statistics Maintenance, Model Rebuild, and Cell Child Quadrant Maintenance, explained below.

Step I: Statistics Maintenance. The first step (line 4) is to maintain the Items Ratings Statistics Table. The maintained statistics are necessary for the cell type switching decision, especially when new location-based ratings enter the system. As the Items Ratings Statistics Table is implemented using a hash table, it can be queried and maintained in O(1) time, requiring O(|IC|) space, where IC is the set of all items rated at cell C and |IC| is the total number of items in IC.
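Step I amounts to constant-time hash updates per new rating, as in the sketch below; the cell.child_index_for helper (mapping a user location to one of the four child quadrants) is an illustrative assumption.

def maintain_statistics(stats, new_ratings, cell):
    """Fold a batch of new location-based ratings into the cell's
    Items Ratings Statistics Table (one O(1) hash update per rating)."""
    for user_location, item_id in new_ratings:
        child = cell.child_index_for(user_location)   # quadrant 0..3 holding the user
        stats.add_rating(item_id, child)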

Step II: Model Rebuild. The second step is to rebuild the item-based collaborative filtering (CF) model for a cell C, as described in Section 2.2 (line 7). The model is rebuilt at cell C only if cell C is an α-Cell; otherwise (β-Cell or γ-Cell), no CF recommendation model is maintained, and hence the model rebuild step does not apply. Rebuilding the CF model is necessary to allow the model to “evolve” as new location-based ratings enter the system (e.g., accounting for new items, ratings, or users). Given that the cost of building the item-based CF model is O(R^2/U) (per Section 2.2), the cost of the model rebuild for a cell C at level h is (R/4^h)^2 / (U/4^h) = R^2 / (4^h U), assuming ratings and users are uniformly distributed.

Step III: Cell Child Quadrant Maintenance. LARS* invokes a maintenance step that decides whether cell C's child quadrant needs to be switched to a different cell type, based on trade-offs between scalability and locality. The algorithm first checks if cell C's child quadrant q at level h+1 is of type α-Cell (line 10). If that case holds, LARS* considers quadrant q cells as candidates to be downgraded to β-Cells (calling function CheckDownGradeToSCells on line 11). We provide details of the Downgrade α-Cells to β-Cells operation in Section 4.5.2. On the other hand, if C has a child quadrant of type γ-Cell at level h+1 (line 12), LARS* considers upgrading cell C's four children cells at level h+1 to β-Cells (calling function CheckUpGradeToSCells on line 13). The Upgrade from γ-Cells to β-Cells operation is covered in Section 4.5.4. However, if C has a child quadrant of type β-Cell at level h+1 (line 14), LARS* first considers upgrading cell C's four children cells at level h+1 from β-Cells to α-Cells (calling function CheckUpGradeToMCells on line 15). If the children cells are not switched to α-Cells, LARS* then considers downgrading them to γ-Cells (calling function CheckDownGradeToECells on line 17). Cell type switching operations are performed completely in quadrants (i.e., four equi-area cells with the same parent). We made this decision for simplicity in maintaining the partial pyramid.

4.5.1 Recommendation Locality

In this section, we explain the notion of locality in recommendation that is essential to understanding the cell type switching (upgrade/downgrade) operations highlighted in the PyramidMaintenance algorithm (Algorithm 1). We use the following example to give the intuition behind recommendation locality.

Running Example. Figure 6 depicts a two-level pyramid in which Cp is the root cell and its children cells are C1, C2, C3, and C4. In the example, we assume eight users (U1, U2, ..., U8) have rated eight different items (I1, I2, ..., I8). Figure 6(b) gives the spatial distributions of users U1 to U8 as well as the items that each user rated.

Intuition. Consider the example given in Figure 6. In cell Cp, users U2 and U5, who belong to the child cell C2, have both rated items I2 and I5. In that case, the similarity score between items I2 and I5 in the item-based collaborative filtering (CF) model built at cell C2 is exactly the same as the one in the CF model built at cell Cp. This happens because the items (i.e., I2 and I5) have been rated mostly by users located in the same child cell, and hence the recommendation model at the parent cell will not be different from the model at the children cells. In this case, if the CF model at C2 is not maintained, LARS* does not lose recommendation locality at all.

TABLE 2
Summary of Mathematical Notations.

RP_{c,i}: The set of user pairs that co-rated item i in cell c.
RS_{c,i}: The set of user pairs that co-rated item i in cell c such that the two users in each pair ⟨u1, u2⟩ ∈ RS_{c,i} are not located in the same child cell of c.
LG_{c,i}: The degree of locality lost for item i from downgrading the four children of cell c to β-Cells, such that 0 ≤ LG_{c,i} ≤ 1.
LG_c: The amount of locality lost by downgrading cell c's four children cells to β-Cells (0 ≤ LG_c ≤ 1).

The opposite case happens when an item is rated by users located in different pyramid cells (spatially skewed). For example, item I4 is rated by users U2, U4, and U7 in three different cells (C2, C3, and C4). In this case, U2, U4, and U7 are spatially skewed. Hence, the similarity score between item I4 and other items at the children cells is different from the similarity score calculated at the parent cell Cp, because not all users that have rated item I4 exist in the same child cell. Based on that, we observe the following:

Observation 1: The more the user/item ratings in a parent cell C are geographically skewed, the higher the locality gained from building the item-based CF model at the four children cells.

The amount of locality gained/lost by maintaining the child cells of a given pyramid cell depends on whether the CF models at the child cells are similar to the CF model built at the parent cell. In other words, LARS* loses locality if the child cells are not maintained even though the CF models at these cells produce different recommendations than the CF model at the parent cell. LARS* leverages Observation 1 to determine the amount of locality gained/lost due to maintaining an item-based CF model at the four children. LARS* calculates the locality loss/gain as follows:

Locality Loss/Gain. Table 2 gives the main mathematical notations used in calculating the recommendation locality loss/gain. First, the Item Ratings Pairs Set (RP_{c,i}) is defined as the set of all possible pairs of users that rated item i in cell c. For example, in Figure 6(c) the item ratings pairs set for item I7 in cell Cp (RP_{Cp,I7}) has three elements (i.e., RP_{Cp,I7} = {⟨U3, U6⟩, ⟨U3, U7⟩, ⟨U6, U7⟩}), as users U3, U6, and U7 have rated item I7. Similarly, RP_{Cp,I2} is equal to {⟨U2, U5⟩} (i.e., users U2 and U5 have rated item I2).

For each item, we define the Skewed Item Ratings Set (RS_{c,i}) as the set of user pairs in cell c that rated item i such that the two users in each pair ∈ RS_{c,i} do not exist in the same child cell of c. For example, in Figure 6(c), the skewed item ratings set for item I2 in cell Cp (RS_{Cp,I2}) is ∅, as all users that rated I2, i.e., U2 and U5, are collocated in the same child cell C2. For I4, the skewed item ratings set RS_{Cp,I4} = {⟨U2, U7⟩, ⟨U2, U4⟩, ⟨U4, U7⟩}, as all users that rated item I4 are located in different child cells, i.e., U2 in C2, U4 in C4, and U7 in C3.

Given the aforementioned parameters, we calculate the Item Locality Loss (LG_{c,i}) for each item as follows:

Definition 1: Item Locality Loss (LG_{c,i}). LG_{c,i} is defined as the degree of locality lost for item i from downgrading the four children of cell c to β-Cells, such that 0 ≤ LG_{c,i} ≤ 1.

LG_{c,i} = \frac{|RS_{c,i}|}{|RP_{c,i}|}    (3)

Fig. 6. Item Ratings Spatial Distribution Example: (a) a two-level pyramid with root cell Cp and children cells C1, C2, C3, and C4 containing users U1 to U8; (b) the ratings distribution and the recommendation models built at Cp and at C1 to C4; (c) the locality loss/gain values at Cp.

The values of both |RS_{c,i}| and |RP_{c,i}| can be easily extracted using the Items Ratings Statistics Table. Then, we use the LG_{c,i} values calculated for all items in cell c to calculate the overall Cell Locality Loss (LG_c) from downgrading the children cells of c to β-Cells.

Definition 2: Locality Loss (LG_c). LG_c is defined as the total locality lost by downgrading cell c's four children cells to β-Cells (0 ≤ LG_c ≤ 1). It is calculated as the sum of all items' locality losses normalized by the total number of items |I_c| in cell c.

LG_c = \frac{\sum_{i \in I_c} LG_{c,i}}{|I_c|}    (4)

The cell locality loss (or gain) is harnessed by LARS* to determine whether the cell's children need to be downgraded from α-Cell to β-Cell rank, upgraded from γ-Cell to β-Cell rank, or downgraded from β-Cell to γ-Cell rank. In the rest of Section 4, we explain the cell rank upgrade/downgrade operations.
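Because RP_{c,i} and RS_{c,i} depend only on how many raters of item i fall in each child cell, Equations 3 and 4 can be evaluated directly from the Items Ratings Statistics Table, as the sketch below illustrates (counting pairs rather than materializing the sets); the function names are illustrative.

def item_locality_loss(child_counts):
    """LG_{c,i} (Equation 3): fraction of rating pairs for item i that span
    different child cells; child_counts[j] = ratings of i in child cell j."""
    total = sum(child_counts)
    all_pairs = total * (total - 1) // 2                      # |RP_{c,i}|
    same_cell = sum(n * (n - 1) // 2 for n in child_counts)   # co-located pairs
    return (all_pairs - same_cell) / all_pairs if all_pairs else 0.0

def cell_locality_loss(stats):
    """LG_c (Equation 4): item locality losses averaged over all items in the cell."""
    losses = [item_locality_loss(c) for c in stats.counts.values()]
    return sum(losses) / len(losses) if losses else 0.0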

4.5.2 Downgrade α-Cells to β-Cells

This operation entails downgrading an entire quadrant of cells at level h, with a common parent at level h − 1, from α-Cells to β-Cells. Downgrading α-Cells to β-Cells improves the scalability (i.e., storage and computational overhead) of LARS*, as it reduces storage by discarding the item-based collaborative filtering (CF) models of the four children cells. Furthermore, downgrading α-Cells to β-Cells leads to the following performance improvements: (a) less maintenance cost, since fewer CF models are periodically rebuilt, and (b) less continuous query processing computation, as β-Cells do not maintain a CF model; hence, if many β-Cells cover a large spatial region, we do not need to update the recommendation query answer for users crossing β-Cell boundaries. Downgrading children cells from α-Cells to β-Cells might hurt recommendation locality, since no CF models are maintained at the granularity of the child cells anymore.

At cell Cp, in order to determine whether to downgrade a quadrant q's cells to β-Cells (i.e., function CheckDownGradeToSCells on line 11 in Algorithm 1), we calculate two percentage values: (1) locality loss (see Equation 4), the amount of locality lost by (potentially) downgrading the children cells to β-Cells, and (2) scalability gain, the amount of scalability gained by (potentially) downgrading the children cells to β-Cells. Details of calculating these percentages are covered next. When deciding to downgrade cells to β-Cells, we use a system parameter M, a real number in the range [0,1] that defines a tradeoff between scalability gain and locality loss. LARS* downgrades a quadrant q's cells to β-Cells (i.e., discards quadrant q's CF models) if:

(1 − M) · scalability gain > M · locality loss    (5)

A smaller M value implies gaining scalability is important and the system is willing to lose a large amount of locality for small gains in scalability. Conversely, a larger M value implies scalability is not a concern, and the amount of locality lost must be small in order to allow a downgrade to β-Cells. At the extremes, setting M = 0 (i.e., always switch to β-Cells) implies LARS* will function as a traditional CF recommender system, while setting M = 1 causes all LARS* pyramid cells to be α-Cells, i.e., LARS* will employ a complete pyramid structure maintaining a recommendation model at all cells at all levels.

Calculating Locality Loss. To calculate the locality loss at a cell Cp, LARS* leverages the Items Ratings Statistics Table maintained in that cell. First, LARS* calculates the item locality loss LG_{Cp,i} for each item i in cell Cp. Then, LARS* aggregates the item locality loss values calculated for each item i ∈ Cp to deduce the overall cell locality loss LG_{Cp}.

Calculating Scalability Gain. Scalability gain is measured in storage and computation savings. We measure scalability gain by summing the recommendation model sizes for each of the downgraded (i.e., child) cells (abbr. size_m), and divide this value by the sum of size_m and the recommendation model size of the parent cell. We refer to this percentage as the storage gain. We also quantify computation savings using storage gain as a surrogate measurement, as computation is considered a direct function of the amount of data in the system.

Cost. Using the Items Ratings Statistics Table maintained at cell Cp, the locality loss at cell Cp can be calculated in O(|I_{Cp}|) time, where |I_{Cp}| represents the total number of items in Cp. As the scalability gain can be calculated in O(1) time, the total time cost of the Downgrade to β-Cells operation is O(|I_{Cp}|).

Example. For the example given in Figure 6(c), the locality loss of downgrading cell Cp's four children cells {C1, C2, C3, C4} to β-Cells is calculated as follows: First, we retrieve the locality loss LG_{Cp,i} for each item i ∈ {I1, I2, I3, I4, I5, I6, I7, I8} from the maintained statistics at cell Cp. As given in Figure 6(c), LG_{Cp,I1}, LG_{Cp,I2}, LG_{Cp,I3}, LG_{Cp,I4}, LG_{Cp,I5}, LG_{Cp,I6}, LG_{Cp,I7}, and LG_{Cp,I8} are equal to 0.0, 0.0, 1.0, 1.0, 0.0, 0.666, 0.166, and 1.0, respectively. Then, we calculate the overall locality loss at Cp (using Equation 4), LG_{Cp}, by summing the locality loss values of all items and dividing the sum by the total number of items. Hence, the locality loss is equal to (0.0 + 0.0 + 1.0 + 1.0 + 0.0 + 0.666 + 0.166 + 1.0) / 8 ≈ 0.48 = 48%. To calculate the scalability gain, assume the sum of the model sizes for cells C1 to C4 and Cp is 4 GB, and the sum of the model sizes for cells C1 to C4 is 2 GB. Then, the scalability gain is 2/4 = 50%. Assuming M = 0.7, then (0.3 × 50) < (0.7 × 48), meaning that LARS* will not downgrade cells C1, C2, C3, C4 to β-Cells.
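The arithmetic of this example can be checked directly with a small script that reproduces the numbers above (using should_downgrade_to_beta as sketched earlier):

item_losses = [0.0, 0.0, 1.0, 1.0, 0.0, 0.666, 0.166, 1.0]   # LG_{Cp,I1..I8}
locality_loss = sum(item_losses) / len(item_losses)          # ~0.479, i.e., ~48%
scalability_gain = 2.0 / 4.0                                  # 2 GB saved out of 4 GB = 50%

m = 0.7
print(round(locality_loss, 2))                                        # 0.48
print(should_downgrade_to_beta(locality_loss, scalability_gain, m))   # False: keep alpha-Cells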

4.5.3 Upgrade β-Cells to α-Cells

The Upgrade β-Cells to α-Cells operation entails upgrading the cell type of a cell's child quadrant, at pyramid level h under a cell at level h − 1, to α-Cells. This operation improves locality in LARS*, as it leads to maintaining CF models at children cells that represent more granular spatial regions capable of producing recommendations unique to the smaller, more “local”, spatial regions. On the other hand, upgrading cells to α-Cells hurts scalability by requiring storage and maintenance of more item-based collaborative filtering models. The upgrade to α-Cells operation also negatively affects continuous query processing, since it creates more granular α-Cells, causing user locations to cross α-Cell boundaries more often and triggering recommendation updates.

To determine whether to upgrade cell Cp's four children cells (quadrant q) to α-Cells (i.e., function CheckUpGradeToMCells on line 15 of Algorithm 1), two percentages are calculated: locality gain and scalability loss. These values are the opposite of those calculated for the Downgrade to β-Cells operation. LARS* changes cell Cp's child quadrant q to α-Cells only if the following condition holds:

M · locality gain > (1 − M) · scalability loss    (6)

This equation represents the opposite criterion of that presented for the Downgrade to β-Cells operation in Equation 5.

Calculating locality gain. To calculate the locality gain, LARS* does not need to speculatively build the CF models at the four children cells. The locality gain is calculated the same way the locality loss is calculated in Equation 4.

Calculating scalability loss. We calculate scalability loss by estimating the storage necessary to maintain the children cells. Recall from Section 2.2 that the maximum size of an item-based CF model is approximately n|I|, where n is the model size. We can multiply n|I| by the number of bytes needed to store an item in a CF model to find an upper-bound storage size of each cell that would be upgraded to an α-Cell. The sum of these four estimated sizes (abbr. size_s), divided by the sum of the size of the existing parent cell and size_s, represents the scalability loss metric.
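A small sketch of this estimate under the stated n|I| upper bound (the per-entry byte count and helper names are illustrative assumptions, not the paper's implementation):

#include <cstddef>

// Estimate the scalability loss of upgrading four children cells to alpha-Cells.
// n                 : CF model size parameter (Section 2.2)
// items_in_child[i] : number of items |I| whose ratings fall in child cell i
// bytes_per_entry   : assumed storage cost of one CF model entry
// parent_model_bytes: measured size of the existing parent cell's CF model
double estimateScalabilityLoss(std::size_t n,
                               const std::size_t items_in_child[4],
                               std::size_t bytes_per_entry,
                               std::size_t parent_model_bytes) {
  std::size_t size_s = 0;
  for (int i = 0; i < 4; ++i) {
    // Upper bound on each child model: n * |I| entries.
    size_s += n * items_in_child[i] * bytes_per_entry;
  }
  return static_cast<double>(size_s) /
         static_cast<double>(size_s + parent_model_bytes);
}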

Cost. Similar to the CheckDownGradeToSCells operation, the scalability loss is calculated in O(1) time and the locality gain can be calculated in O(|I_Cp|) time. The total time cost of the CheckUpGradeToMCells operation is therefore O(|I_Cp|).

Example. Consider the example given in Figure 6(c). Assume the cell C_p is an α-Cell and its four children C1, C2, C3, and C4 are β-Cells. The locality gain (LG_Cp) is calculated using Equation 4 to be 0.48 (i.e., 48%), as depicted in the table in Figure 6(c). Further, assume that we estimate the extra storage overhead for upgrading the children cells to α-Cells (i.e., the scalability loss) to be 50%. Assuming M = 0.7, then (0.7 × 48) > (0.3 × 50), meaning that LARS* will decide to upgrade C_p's four children cells to α-Cells, as the locality gain is significantly higher than the scalability loss.

4.5.4 Downgrade β-Cells to γ-Cells and Vice Versa

The Downgrade β-Cells to γ-Cells operation entails downgrading the cell type of a cell's child quadrant at pyramid level h, under a cell at level h − 1, to γ-Cells (i.e., empty cells). Downgrading the child quadrant type to γ-Cells means that the statistics are no longer maintained in the children cells, which reduces the overhead of maintaining the Item Ratings Statistics Table at these cells. Even though γ-Cells incur no maintenance overhead, they reduce the amount of recommendation locality that LARS* provides.

The decision to downgrade from β-Cells to γ-Cells is taken based on a system parameter, named MAX_SLEVELS, defined as the maximum number of consecutive pyramid levels in which descendant cells can be β-Cells. MAX_SLEVELS can take any value between zero and the total height of the pyramid. A high value of MAX_SLEVELS results in maintaining more β-Cells and fewer γ-Cells in the pyramid. For example, in Figure 4, MAX_SLEVELS is set to two, which is why, when two consecutive pyramid levels are β-Cells, the third-level β-Cells are automatically downgraded to γ-Cells. For each β-Cell C, a counter, called the S-Levels Counter, is maintained. The S-Levels Counter stores the total number of consecutive levels in the direct ancestry of cell C such that all these levels contain β-Cells.

At a β-Cell C, if the cell's children are β-Cells, we compare the S-Levels Counter at the children cells with the MAX_SLEVELS parameter. Note that the counter counts only consecutive β-Cell levels, so if some levels in the chain are α-Cells, the counter is reset to zero at those α-Cell levels. If the S-Levels Counter is greater than or equal to MAX_SLEVELS, then the children cells of C are downgraded to γ-Cells. Otherwise, the children cells of C are not downgraded to γ-Cells. Similarly, LARS* also makes use of the same S-Levels Counter to decide whether to upgrade γ-Cells to β-Cells.
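A condensed sketch of this rule (the cell representation and field names are our own illustrative assumptions; the real LARS* structures differ):

enum class CellType { Alpha, Beta, Gamma };

struct Cell {
  CellType type;
  int s_levels_counter;   // consecutive beta-Cell levels in the direct ancestry
  Cell* children[4];      // nullptr at the lowest pyramid level
};

// Apply the MAX_SLEVELS rule at a beta-Cell whose children are beta-Cells.
void maybeDowngradeChildrenToGamma(Cell& c, int max_slevels) {
  if (c.type != CellType::Beta) return;
  for (Cell* child : c.children) {
    if (child == nullptr || child->type != CellType::Beta) continue;
    // The counter is reset to zero whenever an alpha-Cell breaks the chain.
    if (child->s_levels_counter >= max_slevels) {
      child->type = CellType::Gamma;  // drop statistics maintenance at this child
    }
  }
}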

5 NON-SPATIAL USER RATINGS FOR SPATIAL ITEMS

This section describes how LARS* produces recommendations using non-spatial ratings for spatial items, represented by the tuple (user, rating, item, ilocation). The idea is to exploit travel locality, i.e., the observation that users limit their choice of spatial venues based on travel distance (based on the analysis in Section 1.1). Traditional (non-spatial) recommendation techniques may produce recommendations with burdensome travel distances (e.g., hundreds of miles away). LARS* produces recommendations within reasonable travel distances by using travel penalty, a technique that penalizes the recommendation rank of items the further they are in travel distance from a querying user. Travel penalty may incur expensive computational overhead by calculating the travel distance to each item. Thus, LARS* employs an efficient query processing technique capable of early termination that produces the recommendations without calculating the travel distance to all items. Section 5.1 describes the query processing framework, while Section 5.2 describes travel distance computation.

5.1 Query Processing

Query processing for spatial items using the travel penalty technique employs a single system-wide item-based collaborative filtering model to generate the top-k recommendations by ranking each spatial item i for a querying user u based on RecScore(u, i), computed as:

RecScore(u, i) = P(u, i) − TravelPenalty(u, i)    (7)

P(u, i) is the standard item-based CF predicted rating of item i for user u (see Section 2.2). TravelPenalty(u, i) is the road network travel distance between u and i, normalized to the same value range as the rating scale (e.g., [0, 5]).
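The text does not spell out the normalization formula; a plausible min-max scaling to the rating range, shown only for illustration, would be:

// Scale a road-network distance (e.g., miles) to the rating range [0, max_rating].
// max_distance is an assumed cap (e.g., the largest distance in the penalty grid).
double normalizePenalty(double distance, double max_distance, double max_rating) {
  if (distance >= max_distance) return max_rating;
  return (distance / max_distance) * max_rating;
}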

When processing recommendations, we aim to avoid calculating Equation 7 for all candidate items to find the top-k recommendations, which can become quite expensive given the need to compute travel distances. To avoid such computation, we evaluate items in monotonically increasing order of travel penalty (i.e., travel distance), enabling us to use early termination principles from top-k query processing [18], [19], [20]. We now present the main idea of our query processing algorithm and, in the next section, discuss how to compute travel penalties in increasing order of travel distance.

Algorithm 2 Travel Penalty Algorithm for Spatial Items
1: Function LARS*_SpatialItems(User U, Location L, Limit K)
2: /* Populate a list R with a set of K items */
3: R ← ∅
4: for (K iterations) do
5:   i ← Retrieve the item with the next lowest travel penalty (Section 5.2)
6:   Insert i into R ordered by RecScore(U, i) computed by Equation 7
7: end for
8: LowestRecScore ← RecScore of the kth object in R
9: /* Retrieve items one by one in order of their penalty value */
10: while there are more items to process do
11:   i ← Retrieve the next item in order of penalty score (Section 5.2)
12:   MaxPossibleScore ← MAX_RATING − i.penalty
13:   if MaxPossibleScore ≤ LowestRecScore then
14:     return R /* early termination - end query processing */
15:   end if
16:   RecScore(U, i) ← P(U, i) − i.penalty /* Equation 7 */
17:   if RecScore(U, i) > LowestRecScore then
18:     Insert i into R ordered by RecScore(U, i)
19:     LowestRecScore ← RecScore of the kth object in R
20:   end if
21: end while
22: return R

Algorithm 2 provides the pseudo-code of our query processing algorithm, which takes a querying user id U, a location L, and a limit K as input, and returns the list R of top-k recommended items. The algorithm starts by running a k-nearest-neighbor algorithm to populate the list R with the k items with the lowest travel penalty; R is sorted by the recommendation score computed using Equation 7. This initial part is concluded by setting the lowest recommendation score value (LowestRecScore) as the RecScore of the kth item in R (Lines 3 to 8). Then, the algorithm retrieves items one by one in order of their penalty score. This can be done using an incremental k-nearest-neighbor algorithm, as described in the next section. For each item i, we calculate the maximum possible recommendation score that i can have by subtracting the travel penalty of i from MAX_RATING, the maximum possible rating value in the system, e.g., 5 (Line 12). If i cannot make it into the list of top-k recommended items with this maximum possible score, we immediately terminate the algorithm by returning R as the top-k recommendations, without computing the recommendation score (and travel distance) for more items (Lines 13 to 15). The rationale here is that, since we are retrieving items in increasing order of their penalty and calculating the maximum score that any remaining item can have, there is no chance that any unprocessed item can beat the lowest recommendation score in R. If the early termination case does not arise, we continue to compute the score for each item i using Equation 7, insert i into R sorted by its score (removing the kth item if necessary), and adjust the lowest recommendation value accordingly (Lines 16 to 20).
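For intuition, the core early-termination loop of Algorithm 2 might look as follows in C++ (the penalty source and CF predictor are abstracted as std::function parameters; these interfaces are our own illustrative assumptions, not the paper's API):

#include <algorithm>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

struct Scored { int item; double score; };

// pred(u, item)    : item-based CF prediction P(u, item)
// next_by_penalty(): next (item, penalty) in increasing penalty order, or empty
std::vector<Scored> topKWithTravelPenalty(
    int u, int k, double max_rating,
    const std::function<double(int, int)>& pred,
    const std::function<std::optional<std::pair<int, double>>()>& next_by_penalty) {
  std::vector<Scored> R;
  auto insert_sorted = [&](Scored s) {
    R.push_back(s);
    std::sort(R.begin(), R.end(),
              [](const Scored& a, const Scored& b) { return a.score > b.score; });
    if ((int)R.size() > k) R.pop_back();
  };
  // Seed R with the k items of lowest penalty (Lines 3-8).
  for (int i = 0; i < k; ++i) {
    auto nxt = next_by_penalty();
    if (!nxt) break;
    insert_sorted({nxt->first, pred(u, nxt->first) - nxt->second});
  }
  double lowest = R.empty() ? -1e9 : R.back().score;
  // Process remaining items in penalty order with early termination (Lines 10-21).
  while (auto nxt = next_by_penalty()) {
    double max_possible = max_rating - nxt->second;     // Line 12
    if (max_possible <= lowest) break;                  // Lines 13-15
    double score = pred(u, nxt->first) - nxt->second;   // Equation 7
    if (score > lowest) {                               // Lines 17-20
      insert_sorted({nxt->first, score});
      lowest = R.back().score;
    }
  }
  return R;
}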

Travel penalty requires very little maintenance. The only maintenance necessary is to occasionally rebuild the single system-wide item-based collaborative filtering model in order to account for new location-based ratings that enter the system. Following the reasoning discussed in Section 4.3, we rebuild the model after receiving N% new ratings.

5.2 Incremental Travel Penalty Computation

This section gives an overview of the two methods we implemented in LARS* to incrementally retrieve items one by one, ordered by their travel penalty. The two methods exhibit a trade-off between query processing efficiency and penalty accuracy: (1) an online method that provides exact travel penalties but is expensive to compute, and (2) an offline heuristic method that is less exact but efficient in penalty retrieval. Both methods can be employed interchangeably in Line 11 of Algorithm 2.

5.2.1 Incremental KNN: An Exact Online Method

To calculate an exact travel penalty for a user u to item i, we employ an incremental k-nearest-neighbor (KNN) technique [21], [22], [23]. Given a user location l, incremental KNN algorithms return, on each invocation, the next item i nearest to u with regard to travel distance d. In our case, we normalize distance d to the ratings scale to get the travel penalty in Equation 7. Incremental KNN techniques exist for both Euclidean distance [22] and (road) network distance [21], [23]. The advantage of using incremental KNN techniques is that they provide exact travel distances between a querying user's location and each recommendation candidate item. The disadvantage is that distances must be computed online at query runtime, which can be expensive. For instance, the run-time complexity of retrieving a single item using incremental KNN in Euclidean space is O(k + log N) [22], where N and k are the total number of items and the number of items retrieved so far, respectively.

5.2.2 Penalty Grid: A Heuristic Offline Method

A more efficient, yet less accurate, method to retrieve travel penalties incrementally is to use a pre-computed penalty grid. The idea is to partition space using an n × n grid. Each grid cell c is of equal size and contains all items whose location falls within the spatial region defined by c. Each cell c contains a penalty list that stores the pre-computed penalty values for traveling from anywhere within c to all other n^2 − 1 destination cells in the grid; this means all items within a destination grid cell share the same penalty value. The penalty list for c is sorted by penalty value and always stores c (itself) as the first entry with a penalty of zero. To retrieve items incrementally, all items within the cell containing the querying user are returned one by one (in any order) since they have no penalty. After these items are exhausted, items contained in the next cell in the penalty list are returned, and so forth until Algorithm 2 terminates early or processes all items.
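A compact sketch of this incremental retrieval, with an illustrative in-memory layout (the struct and field names are assumptions made for exposition):

#include <cstddef>
#include <utility>
#include <vector>

struct PenaltyGridCell {
  // Destination cells ordered by ascending penalty; the first entry is the
  // cell itself with penalty 0.
  std::vector<std::pair<int, double>> penalty_list;  // (cell id, penalty)
};

struct PenaltyGrid {
  std::vector<PenaltyGridCell> cells;           // n*n cells, row-major
  std::vector<std::vector<int>> items_in_cell;  // item ids per cell
};

// Cursor that yields (item id, penalty) in non-decreasing penalty order for a
// query issued from cell `source` (usable as the item source in Algorithm 2).
class PenaltyCursor {
 public:
  PenaltyCursor(const PenaltyGrid& g, int source) : grid_(g), source_(source) {}
  bool next(int& item, double& penalty) {
    const auto& list = grid_.cells[source_].penalty_list;
    while (list_pos_ < list.size()) {
      const auto& items = grid_.items_in_cell[list[list_pos_].first];
      if (item_pos_ < items.size()) {
        item = items[item_pos_++];
        penalty = list[list_pos_].second;  // all items in a cell share one penalty
        return true;
      }
      ++list_pos_;       // move to the next-closest destination cell
      item_pos_ = 0;
    }
    return false;        // all items processed
  }
 private:
  const PenaltyGrid& grid_;
  int source_;
  std::size_t list_pos_ = 0, item_pos_ = 0;
};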

To populate the penalty grid, we must calculate the penalty value for traveling from each cell to every other cell in the grid. We assume items and users are constrained to a road network; however, we could also use Euclidean space without consequence. To calculate the penalty from a single source cell c to a destination cell d, we first find the average distance to travel from anywhere within c to all item destinations within d. To do this, we generate an anchor point p within c that both (1) lies on a road network segment within c and (2) lies as close as possible to the center of c. With these criteria, p serves as an approximate average "starting point" for traveling from c to d. We then calculate the shortest path distance from p to all items contained in d on the road network (any shortest path algorithm can be used). Finally, we average all calculated shortest path distances from c to d. As a final step, we normalize the average distance from c to d to fall within the rating value range. Normalization is necessary as the rating domain is usually small (e.g., zero to five), while distance is measured in miles or kilometers and can have large values that would heavily influence Equation 7. We repeat this entire process for each cell to all other cells to populate the entire penalty grid.
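The population step could be sketched as follows, reusing the hypothetical structures above and abstracting the anchor-point, shortest-path, and normalization routines (any road-network shortest-path implementation would do):

#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Populate one source cell's penalty list.
// anchor_of(c)        : anchor point on the road network near the center of cell c
// shortest_path(p, i) : road-network distance from point p to item i
// normalize(d)        : scale a distance to the rating range (see Section 5.1)
void populatePenaltyList(
    int source, int num_cells,
    const std::vector<std::vector<int>>& items_in_cell,
    const std::function<Point(int)>& anchor_of,
    const std::function<double(const Point&, int)>& shortest_path,
    const std::function<double(double)>& normalize,
    std::vector<std::pair<int, double>>& penalty_list_out) {
  Point p = anchor_of(source);
  penalty_list_out.clear();
  penalty_list_out.push_back({source, 0.0});  // the source cell itself, penalty 0
  for (int d = 0; d < num_cells; ++d) {
    if (d == source || items_in_cell[d].empty()) continue;
    double sum = 0.0;
    for (int item : items_in_cell[d]) sum += shortest_path(p, item);
    double avg = sum / items_in_cell[d].size();
    penalty_list_out.push_back({d, normalize(avg)});
  }
  // Keep the source first; sort the remaining destinations by penalty.
  std::sort(penalty_list_out.begin() + 1, penalty_list_out.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
}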

When new items are added to the system, their presence in a cell d can alter the average distance value used in the penalty calculation for each source cell c. Thus, we recalculate penalty scores in the penalty grid after N new items enter the system. We assume spatial items are relatively static, e.g., restaurants do not change location often. Thus, it is unlikely that existing items will change cell locations and in turn alter penalty scores.

6 SPATIAL USER RATINGS FOR SPATIAL ITEMS

This section describes how LARS* produces recommendations using spatial ratings for spatial items, represented by the tuple (user, ulocation, rating, item, ilocation). A salient feature of LARS* is that both the user partitioning and travel penalty techniques can be used together, with very little change, to produce recommendations using spatial user ratings for spatial items. The data structures and maintenance techniques remain exactly the same as discussed in Sections 4 and 5; only the query processing framework requires a slight modification. Query processing uses Algorithm 2 to produce recommendations. The only difference is that the item-based collaborative filtering prediction score P(u, i) used in the recommendation score calculation (Line 16 in Algorithm 2) is generated using the (localized) collaborative filtering model from the partial pyramid cell that contains the querying user, instead of the system-wide collaborative filtering model that was used in Section 5.

7 EXPERIMENTS

This section provides an experimental evaluation of LARS* based on an actual system implementation using C++ and STL. We compare LARS* with the standard item-based collaborative filtering technique along with several variations of LARS*. We also compare LARS* to LARS [8]. Experiments are based on three data sets:

Foursquare: a real data set consisting of spatial user ratings for spatial items derived from Foursquare user histories. We crawled Foursquare and collected data for 1,010,192 users and 642,990 venues across the United States. Foursquare does not publish each "check-in" for a user; however, we were able to collect the following pieces of data: (1) user tips for a venue, (2) the venues for which the user is the mayor, and (3) the completed to-do list items for a user. In addition, we extracted each user's friend list.

Extracting location-based ratings. To extract spatial user ratings for spatial items from the Foursquare data (i.e., the five-tuple (user, ulocation, rating, item, ilocation)), we map each user visit to a single location-based rating. The user and item attributes are represented by the unique Foursquare user and venue identifiers, respectively. We employ the user's home city in Foursquare as the ulocation attribute. Meanwhile, the ilocation attribute is the item's inherent location. We use a numeric rating value range of [1, 3], translated as follows: (a) 3 represents that the user is the "mayor" of the venue, (b) 2 represents that the user left a "tip" at the venue, and (c) 1 represents that the user visited the venue as a completed "to-do" list item. Using this scheme, a user may have multiple ratings for a venue; in this case, we use the highest rating value.
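The mapping from raw Foursquare signals to ratings can be summarized by the following small sketch (the enum and function names are ours, for illustration only):

#include <algorithm>
#include <vector>

enum class FoursquareSignal { ToDoCompleted, Tip, Mayor };

// Map one raw signal to the [1, 3] rating scale used in the experiments.
int signalToRating(FoursquareSignal s) {
  switch (s) {
    case FoursquareSignal::Mayor:         return 3;
    case FoursquareSignal::Tip:           return 2;
    case FoursquareSignal::ToDoCompleted: return 1;
  }
  return 1;
}

// When a user has several signals for the same venue, keep the highest rating.
int venueRating(const std::vector<FoursquareSignal>& signals) {
  int best = 0;
  for (auto s : signals) best = std::max(best, signalToRating(s));
  return best;
}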

Data properties. Our experimental data consisted of 22,390 location-based ratings from 4K users for 2K venues, all from the state of Minnesota, USA. We used this reduced data set in order to focus our quality experiments on a dense rating sample. Use of dense ratings data has been shown to be a very important factor when testing and comparing recommendation quality [17], since use of sparse data (i.e., having users or items with very few ratings) tends to cause inaccuracies in recommendation techniques.

MovieLens: a real data set consisting of spatial user ratings for non-spatial items taken from the popular MovieLens recommender system [5]. The Foursquare and MovieLens data are used to test recommendation quality. The MovieLens data used in our experiments was real movie rating data taken from the popular MovieLens recommendation system at the University of Minnesota [5]. This data consisted of 87,025 ratings for 1,668 movies from 814 users. Each rating was associated with the zip code of the user who rated the movie, thus giving us a real data set of spatial user ratings for non-spatial items.

Synthetic: a synthetically generated data set consisting of spatial user ratings for spatial items for venues in the state of Minnesota, USA. The synthetic data set we use in our experiments is generated to contain 2,000 users, 1,000 items, and 500,000 ratings. User and item locations are randomly generated over the state of Minnesota, USA. Users' ratings of items are assigned random values between zero and five. As this data set contains about twenty-five times more ratings than the Foursquare data set and about five times more than the MovieLens data set, we use it to test scalability and query efficiency.

Unless mentioned otherwise, the default value of M is 0.3, k is 10, the number of pyramid levels is 8, the influence level is the lowest pyramid level, and MAX_SLEVELS is set to two. The rest of this section evaluates LARS* recommendation quality (Sections 7.1 to 7.3), trade-offs between storage and locality (Section 7.4), scalability (Section 7.5), and query processing efficiency (Section 7.6). As the system stores its data structures in main memory, all reported time measurements represent CPU time.

7.1 Recommendation Quality for Varying Pyramid Levels

These experiments test the recommendation quality improvement that LARS* achieves over the standard (non-spatial) item-based collaborative filtering method using both the Foursquare and MovieLens data. To test the effectiveness of our proposed techniques, we test the quality improvement of LARS* with only travel penalty enabled (abbr. LARS*-T), LARS* with only user partitioning enabled and M set to one (abbr. LARS*-U), and LARS* with both techniques enabled and M set to one (abbr. LARS*). Notice that LARS*-T represents the traditional item-based collaborative filtering augmented with the travel penalty technique (Section 5) to take the distance between the querying user and the recommended items into account. We do not plot LARS alongside LARS*, as both give the same result for M=1, and the quality experiments are meant to show how locality increases recommendation quality.

Fig. 7. Quality experiments for varying locality (quality improvement vs. pyramid level). (a) Foursquare data; (b) MovieLens data.

Quality Metric. To measure quality, we build each recommendation method using 80% of the ratings from each data set. Each rating in the withheld 20% represents a Foursquare venue or MovieLens movie a user is known to like (i.e., rated highly). For each rating t in this 20%, we request a set of k ranked recommendations S by submitting the user and ulocation associated with t. We first calculate the quality as the weighted sum of the number of occurrences in S of the item associated with t (the higher the better). The weight of an item is a value between zero and one that reflects how close the rank of this item is to its real rank. The quality of each recommendation method is calculated and compared against the baseline, i.e., traditional item-based collaborative filtering. We finally report the ratio of improvement in quality each recommendation method achieves over the baseline. The rationale for this metric is that, since each withheld rating represents a real visit to a venue (or a movie a user liked), the technique that produces a large number of correctly ranked answers containing venues (or movies) a user is known to like is considered of higher quality.
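A rough reading of this metric in code (our own illustrative formulation; the exact weighting function is not given in the text, so a simple reciprocal-rank weight is assumed here):

#include <cstddef>
#include <vector>

// Score one withheld rating: if the held-out item appears in the ranked
// recommendation list S, credit it with a weight in (0, 1] that decays with rank.
double hitWeight(const std::vector<int>& S, int held_out_item) {
  for (std::size_t rank = 0; rank < S.size(); ++rank) {
    if (S[rank] == held_out_item) {
      return 1.0 / (rank + 1);   // assumed weighting; top rank counts fully
    }
  }
  return 0.0;
}

// Quality of a method over all withheld ratings, reported relative to a baseline.
double qualityRatio(const std::vector<double>& method_hits,
                    const std::vector<double>& baseline_hits) {
  double m = 0.0, b = 0.0;
  for (double h : method_hits) m += h;
  for (double h : baseline_hits) b += h;
  return b > 0.0 ? m / b : 0.0;
}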

Figure 7(a) compares the quality improvement of each technique (over traditional collaborative filtering) for varying locality (i.e., different levels of the adaptive pyramid) using the Foursquare data. LARS*-T does not use the adaptive pyramid and thus has constant quality improvement. However, LARS*-T shows some quality improvement over traditional collaborative filtering. This quality boost is due to the fact that LARS*-T uses the travel penalty technique, which recommends items within a feasible distance. Meanwhile, the quality of LARS* and LARS*-U increases as more localized pyramid cells are used to produce recommendations, which verifies that user partitioning is indeed beneficial and necessary for location-based ratings. Ultimately, LARS* has superior performance due to the additional use of travel penalty. While travel penalty produces a moderate quality gain, it also enables more efficient query processing, as we observe later in Section 7.6.

Figure 7(b) compares the quality improvement of LARS*-U over CF (traditional collaborative filtering) for varying locality using the MovieLens data. Notice that LARS* gives the same quality improvement as LARS*-U because LARS*-T does not apply to this data set, since movies are not spatial. Compared to CF, the quality improvement achieved by LARS*-U (and LARS*) increases when it produces movie recommendations from more localized pyramid cells. This behavior further verifies that user partitioning is beneficial in providing quality recommendations localized to a querying user's location, even when items are not spatial. Quality decreases (or levels off for MovieLens) for both LARS*-U and LARS* at lower levels of the adaptive pyramid. This is due to recommendation starvation, i.e., not having enough ratings to produce meaningful recommendations.

Fig. 8. Quality experiments for varying answer sizes (quality improvement vs. number of recommended items k). (a) Foursquare data; (b) MovieLens data.

7.2 Recommendation Quality for Varying k

These experiments test the recommendation quality improvement of LARS*, LARS*-U, and LARS*-T for different values of k (i.e., recommendation answer sizes). We do not plot LARS alongside LARS*, as both give the same result for M=1, and the quality experiments are meant to show how the degree of locality increases recommendation quality. We perform experiments using both the Foursquare and MovieLens data. Our quality metric is exactly the same as presented previously in Section 7.1.

Figure 8(a) depicts the effect of the recommendation list size k on the quality of each technique using the Foursquare data set. We report quality numbers using a pyramid height of four (i.e., the level exhibiting the best quality from Section 7.1 in Figure 7(a)). For all sizes of k from one to ten, LARS* and LARS*-U consistently exhibit better quality. In fact, LARS* consistently achieves better quality over CF for all k. LARS*-T exhibits similar quality to CF for smaller k values, but does better for k values of three and larger.

Figure 8(b) depicts the effect of the recommendation list size k on the quality improvement of LARS*-U (and LARS*) over CF using the MovieLens data. Notice that LARS* gives the same quality improvement as LARS*-U because LARS*-T does not apply to this data set, since movies are not spatial. This experiment was run using a pyramid height of seven (i.e., the level exhibiting the best quality in Figure 7(b)). Again, LARS*-U (and LARS*) consistently exhibits better quality than CF for sizes of k from one to ten.

7.3 Recommendation Quality for Varying M

These experiments compare the quality improvement achieved by both LARS and LARS* for different values of M. We perform experiments using both the Foursquare and MovieLens data. Our quality metric is exactly the same as presented previously in Section 7.1.

Fig. 9. Quality experiments for varying value of M (quality improvement vs. M, for LARS and LARS*). (a) Foursquare data; (b) MovieLens data.

Fig. 10. Effect of M on storage and locality (Synthetic data). (a) Storage (GB); (b) Locality (%).

Figure 9(a) depicts the effect of M on the quality of both LARS and LARS* using the Foursquare data set. Notice that we enable both the user partitioning and travel penalty techniques for both LARS and LARS*. We report quality numbers using a pyramid height of four and a number of recommended items of ten. When M is equal to zero, both LARS and LARS* exhibit the same quality improvement, as M = 0 represents traditional collaborative filtering with the travel penalty technique applied. Also, when M is set to one, both LARS and LARS* achieve the same quality improvement, as a complete pyramid is maintained in both cases. For M values between zero and one, the quality improvement of both LARS and LARS* increases for higher values of M due to the increase in recommendation locality. LARS* achieves better quality improvement than LARS because LARS* maintains α-Cells at lower levels of the pyramid.

Figure 9(b) depicts the effect of M on the quality of both LARS and LARS* using the MovieLens data set. We report quality improvement over traditional collaborative filtering using a pyramid height of seven and the number of recommended items set to ten. Similar to the Foursquare data set, the quality improvement of both LARS and LARS* increases for higher values of M due to the increase in recommendation locality. For M values between zero and one, LARS* consistently achieves higher quality improvement than LARS, as LARS* maintains more α-Cells at more granular levels of the pyramid structure.

7.4 Storage Vs. Locality

Figure 10 depicts the impact of varying M on both the storage and locality in LARS* using the synthetic data set. We plot LARS*-M=0 and LARS*-M=1 as constants to delineate the extreme values of M, i.e., M=0 mirrors traditional collaborative filtering, while M=1 forces LARS* to employ a complete pyramid. Our metric for locality is the locality loss (defined in Section 4.5.2) compared to a complete pyramid (i.e., M=1). LARS*-M=0 requires the lowest storage overhead but exhibits the highest locality loss, while LARS*-M=1 exhibits no locality loss but requires the most storage. For LARS*, increasing M results in increased storage overhead, since LARS* favors switching cells to α-Cells, requiring the maintenance of more pyramid cells, each with its own collaborative filtering model. Each additional α-Cell incurs a high storage overhead over the original data size, as an additional collaborative filtering model needs to be maintained. Meanwhile, increasing M results in smaller locality loss, as LARS* merges less and maintains more localized cells. The most drastic drop in locality loss is between 0 and 0.3, which is why we chose M=0.3 as a default. LARS* leads to smaller locality loss (≈26% less) than LARS because LARS* maintains α-Cells below β-Cells, which results in higher locality gain. On the other hand, LARS* exhibits slightly higher storage cost (≈5% more storage) than LARS due to the fact that LARS* stores the Item Ratings Statistics Table per each α-Cell and β-Cell.

7.5 Scalability

Figure 11 depicts the storage and aggregate maintenance overhead required for an increasing number of ratings using the synthetic data set. We again plot LARS*-M=0 and LARS*-M=1 to indicate the extreme cases for LARS*. Figure 11(a) depicts the impact of increasing the number of ratings from 10K to 500K on storage overhead. LARS*-M=0 requires the lowest amount of storage since it only maintains a single collaborative filtering model. LARS*-M=1 requires the highest amount of storage since it requires storage of a collaborative filtering model for all cells (in all levels) of a complete pyramid. The storage requirement of LARS* is in between the two extremes since it merges cells to save storage. Figure 11(b) depicts the cumulative computational overhead necessary to maintain the adaptive pyramid, initially populated with 100K ratings and then updated with 200K ratings (increments of 50K reported). The trend is similar to the storage experiment, where LARS* exhibits better performance than LARS*-M=1 due to switching some cells from α-Cells to β-Cells. Though LARS*-M=0 has the best performance in terms of maintenance and storage overhead, previous experiments show that it has unacceptable drawbacks in quality/locality. Compared to LARS, LARS* has less maintenance overhead (≈38% less) due to the fact that the maintenance algorithm in LARS* avoids the expensive speculative splitting used by LARS.

7.6 Query Processing Performance

Figure 12 depicts the snapshot and continuous query processing performance of LARS, LARS*, LARS*-U (LARS* with only user partitioning), LARS*-T (LARS* with only travel penalty), CF (traditional collaborative filtering), and LARS*-M=1 (LARS* with a complete pyramid), using the synthetic data set.

Fig. 11. Scalability of the adaptive pyramid (Synthetic data). (a) Storage (GB) vs. number of ratings (×1K); (b) Aggregate maintenance time (×1K sec) vs. number of ratings so far (×1K).

Fig. 12. Query Processing Performance (Synthetic data). (a) Snapshot Queries: response time (ms) vs. number of ratings (×1K); (b) Continuous Queries: aggregate response time (sec) vs. travel distance (miles).

Snapshot queries. Figure 12(a) gives the effect of varying the number of ratings (10K to 500K) on snapshot query performance, averaged over 500 queries posed at random locations. LARS* and LARS*-M=1 consistently outperform all other techniques; LARS*-M=1 is slightly better because recommendations are always produced from the smallest (i.e., most localized) CF models. The performance gap between LARS* and LARS*-U (and between CF and LARS*-T) shows that employing the travel penalty technique with early termination leads to better query response time. Similarly, the performance gap between LARS* and LARS*-T shows that employing the user partitioning technique, with its localized (i.e., smaller) collaborative filtering models, also benefits query processing. LARS* performance is slightly better than LARS, as LARS* sometimes maintains more localized CF models than LARS, which incurs less query processing time.

Continuous queries. Figure 12(b) provides the continuous query processing performance of the LARS* variants by reporting the aggregate response time of 500 continuous queries. A continuous query is issued once by a user u to get an initial answer; the answer is then continuously updated as u moves. We report the aggregate response time when varying the travel distance of u from 1 to 30 miles using a random walk over the spatial area covered by the pyramid. CF has a constant query response time for all travel distances, as it requires no updates since only a single cell is present. However, since CF is unaware of user location changes, the consequence is poor recommendation quality (per the experiments from Section 7.1). LARS*-M=1 exhibits the worst performance, as it maintains all cells on all levels and updates the continuous query whenever the user crosses pyramid cell boundaries. LARS*-U has a lower response time than LARS*-M=1 due to switching cells from α-Cells to β-Cells: when a cell is not present at a given influence level, the query is transferred to its next highest ancestor in the pyramid. Since cells higher in the pyramid cover larger spatial regions, query updates occur less often. LARS*-T exhibits slightly higher query processing overhead compared to LARS*-U: even though LARS*-T employs the early termination algorithm, it uses a large (system-wide) collaborative filtering model to (re)generate recommendations once users cross boundaries in the penalty grid. LARS* exhibits a better aggregate response time since it employs the early termination algorithm using a localized (i.e., smaller) collaborative filtering model to produce results, while also switching cells to β-Cells to reduce update frequency. LARS has slightly better performance than LARS*, as LARS tends to merge more cells at higher levels in the pyramid structure.

8 RELATED WORK

Location-based services. Current location-based services employ two main methods to provide interesting destinations to users. (1) KNN techniques [22] and variants (e.g., aggregate KNN [24]) simply retrieve the k objects nearest to a user and are completely removed from any notion of user personalization. (2) Preference methods such as skylines [25] (and spatial variants [26]) and location-based top-k methods [27] require users to express explicit preference constraints. Conversely, LARS* is the first location-based service to consider implicit preferences by using location-based ratings to help users discover new items.

Recent research has proposed the problem of hyper-local place ranking [28]. Given a user location and query string (e.g., "French restaurant"), hyper-local ranking provides a list of top-k points of interest influenced by previously logged directional queries (e.g., map direction searches from point A to point B). While similar in spirit to LARS*, hyper-local ranking is fundamentally different from our work, as it does not personalize answers to the querying user, i.e., two users issuing the same search term from the same location will receive exactly the same ranked answer.

Traditional recommenders. A wide array of techniques are capable of producing recommendations using non-spatial ratings for non-spatial items represented as the triple (user, rating, item) (see [4] for a comprehensive survey). We refer to these as "traditional" recommendation techniques. The closest these approaches come to considering location is by incorporating contextual attributes into statistical recommendation models (e.g., weather, traffic to a destination) [29]. However, no traditional approach has studied explicit location-based ratings as done in LARS*. Some existing commercial applications make cursory use of location when proposing interesting items to users. For instance, Netflix displays a "local favorites" list containing popular movies for a user's given city. However, these movies are not personalized to each user (e.g., using recommendation techniques); rather, this list is built using aggregate rental data for a particular city [30]. LARS*, on the other hand, produces personalized recommendations influenced by location-based ratings and a query location.

Location-aware recommenders. The CityVoyager system [31] mines a user's personal GPS trajectory data to determine her preferred shopping sites, and provides recommendations based on where the system predicts the user is likely to go in the future. LARS*, conversely, does not attempt to predict future user movement, as it produces recommendations influenced by user and/or item locations embedded in community ratings.

The spatial activity recommendation system [32] mines GPS trajectory data with embedded user-provided tags in order to detect interesting activities located in a city (e.g., art exhibits and dining near downtown). It uses this data to answer two query types: (a) given an activity type, return where in the city this activity is happening, and (b) given an explicit spatial region, provide the activities available in this region. This is a vastly different problem than the one we study in this paper. LARS* does not mine activities from GPS data for use as suggestions for a given spatial region. Rather, we apply LARS* to a more traditional recommendation problem that uses community opinion histories to produce recommendations.

Geo-measured friend-based collaborative filtering [33] produces recommendations by using only ratings from a querying user's social-network friends who live in the same city. This technique only addresses user locations embedded in ratings. LARS*, on the other hand, addresses three possible types of location-based ratings. More importantly, LARS* is a complete system (not just a recommendation technique) that employs efficiency and scalability techniques (e.g., the partial pyramid structure, early query termination) necessary for deployment in actual large-scale applications.

9 CONCLUSION

LARS*, our proposed location-aware recommender system, tackles a problem untouched by traditional recommender systems by dealing with three types of location-based ratings: spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items. LARS* employs user partitioning and travel penalty techniques to support spatial ratings and spatial items, respectively. Both techniques can be applied separately or in concert to support the various types of location-based ratings. Experimental analysis using real and synthetic data sets shows that LARS* is efficient, scalable, and provides better quality recommendations than techniques used in traditional recommender systems.

REFERENCES

[1] G. Linden et al., "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76–80, 2003.

[2] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," in CSCW, 1994.

[3] "The Facebook Blog, "Facebook Places": http://tinyurl.com/3aetfs3."

[4] G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions on Knowledge and Data Engineering, TKDE, vol. 17, no. 6, pp. 734–749, 2005.

[5] "MovieLens: http://www.movielens.org/."

[6] "Foursquare: http://foursquare.com."

[7] "New York Times - A Peek Into Netflix Queues: http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html."

[8] J. J. Levandoski, M. Sarwat, A. Eldawy, and M. F. Mokbel, "LARS: A Location-Aware Recommender System," in Proceedings of the International Conference on Data Engineering, ICDE, 2012.

[9] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-Based Collaborative Filtering Recommendation Algorithms," in Proceedings of the International World Wide Web Conference, WWW, 2001.

[10] J. S. Breese, D. Heckerman, and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in Proceedings of the Conference on Uncertainty in Artificial Intelligence, UAI, 1998.

[11] W. G. Aref and H. Samet, "Efficient Processing of Window Queries in the Pyramid Data Structure," in Proceedings of the ACM Symposium on Principles of Database Systems, PODS, 1990.

[12] R. A. Finkel and J. L. Bentley, "Quad Trees: A Data Structure for Retrieval on Composite Keys," Acta Informatica, vol. 4, pp. 1–9, 1974.

[13] A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching," in Proceedings of the ACM International Conference on Management of Data, SIGMOD, 1984.

[14] K. Mouratidis, S. Bakiras, and D. Papadias, "Continuous Monitoring of Spatial Queries in Wireless Broadcast Environments," IEEE Transactions on Mobile Computing, TMC, vol. 8, no. 10, pp. 1297–1311, 2009.

[15] K. Mouratidis and D. Papadias, "Continuous Nearest Neighbor Queries over Sliding Windows," IEEE Transactions on Knowledge and Data Engineering, TKDE, vol. 19, no. 6, pp. 789–803, 2007.

[16] M. F. Mokbel et al., "SINA: Scalable Incremental Processing of Continuous Queries in Spatiotemporal Databases," in Proceedings of the ACM International Conference on Management of Data, SIGMOD, 2004.

[17] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, "Evaluating Collaborative Filtering Recommender Systems," ACM Transactions on Information Systems, TOIS, vol. 22, no. 1, pp. 5–53, 2004.

[18] M. J. Carey et al., "On Saying "Enough Already!" in SQL," in Proceedings of the ACM International Conference on Management of Data, SIGMOD, 1997.

[19] S. Chaudhuri et al., "Evaluating Top-K Selection Queries," in Proceedings of the International Conference on Very Large Data Bases, VLDB, 1999.

[20] R. Fagin, A. Lotem, and M. Naor, "Optimal Aggregation Algorithms for Middleware," in Proceedings of the ACM Symposium on Principles of Database Systems, PODS, 2001.

[21] J. Bao, C.-Y. Chow, M. F. Mokbel, and W.-S. Ku, "Efficient Evaluation of k-Range Nearest Neighbor Queries in Road Networks," in Proceedings of the International Conference on Mobile Data Management, MDM, 2010.

[22] G. R. Hjaltason and H. Samet, "Distance Browsing in Spatial Databases," ACM Transactions on Database Systems, TODS, vol. 24, no. 2, pp. 265–318, 1999.

[23] K. Mouratidis, M. L. Yiu, D. Papadias, and N. Mamoulis, "Continuous Nearest Neighbor Monitoring in Road Networks," in Proceedings of the International Conference on Very Large Data Bases, VLDB, 2006.

[24] D. Papadias, Y. Tao, K. Mouratidis, and C. K. Hui, "Aggregate Nearest Neighbor Queries in Spatial Databases," ACM Transactions on Database Systems, TODS, vol. 30, no. 2, pp. 529–576, 2005.

[25] S. Borzsonyi et al., "The Skyline Operator," in Proceedings of the International Conference on Data Engineering, ICDE, 2001.

[26] M. Sharifzadeh and C. Shahabi, "The Spatial Skyline Queries," in Proceedings of the International Conference on Very Large Data Bases, VLDB, 2006.

[27] N. Bruno, L. Gravano, and A. Marian, "Evaluating Top-k Queries over Web-Accessible Databases," in Proceedings of the International Conference on Data Engineering, ICDE, 2002.

[28] P. Venetis, H. Gonzalez, C. S. Jensen, and A. Y. Halevy, "Hyper-Local, Directions-Based Ranking of Places," PVLDB, vol. 4, no. 5, pp. 290–301, 2011.

[29] M.-H. Park et al., "Location-Based Recommendation System Using Bayesian User's Preference Model in Mobile Devices," in Proceedings of the International Conference on Ubiquitous Intelligence and Computing, UIC, 2007.

[30] "Netflix News and Info - Local Favorites: http://tinyurl.com/4qt8ujo."

[31] Y. Takeuchi and M. Sugimoto, "An Outdoor Recommendation System Based on User Location History," in Proceedings of the International Conference on Ubiquitous Intelligence and Computing, UIC, 2006.

[32] V. W. Zheng, Y. Zheng, X. Xie, and Q. Yang, "Collaborative Location and Activity Recommendations with GPS History Data," in Proceedings of the International World Wide Web Conference, WWW, 2010.

[33] M. Ye, P. Yin, and W.-C. Lee, "Location Recommendation for Location-based Social Networks," in Proceedings of the ACM Symposium on Advances in Geographic Information Systems, ACM GIS, 2010.

Mohamed Sarwat is a doctoral candidate at the Department of Computer Science and Engineering, University of Minnesota. He obtained his Bachelor's degree in computer engineering from Cairo University in 2007 and his Master's degree in computer science from the University of Minnesota in 2011. His research interest lies in the broad area of data management systems. More specifically, some of his interests include database systems (i.e., query processing and optimization, data indexing), database support for recommender systems, personalized databases, database support for location-based services, database support for social networking applications, distributed graph databases, and large-scale data management. Mohamed was awarded the University of Minnesota Doctoral Dissertation Fellowship in 2012. His research work has been recognized by the Best Research Paper Award at the 12th International Symposium on Spatial and Temporal Databases 2011.

Justin J. Levandoski is a researcher in the database group at Microsoft Research. Justin received his Bachelor's degree at Carleton College, MN, USA in 2003, and his Master's and PhD degrees at the University of Minnesota, MN, USA in 2008 and 2011, respectively. His research lies in a broad range of topics dealing with large-scale data management systems. More specifically, some of his interests include cloud computing, database support for new hardware paradigms, transaction processing, query processing, and support for new data-intensive applications such as social/recommender systems.

Ahmed Eldawy is a PhD student at the Department of Computer Science and Engineering, the University of Minnesota. His main research interests are spatial data management, social networks, and cloud computing. More specifically, his research focuses on building scalable spatial data management systems over cloud computing platforms. Ahmed received his Bachelor's and Master's degrees in Computer Science from Alexandria University in 2005 and 2010, respectively.

Mohamed F. Mokbel (Ph.D., Purdue University, 2005; M.S., B.Sc., Alexandria University, 1999, 1996) is an associate professor in the Department of Computer Science and Engineering, University of Minnesota. His main research interests focus on advancing the state of the art in the design and implementation of database engines to cope with the requirements of emerging applications (e.g., location-based applications and sensor networks). His research work has been recognized by three best paper awards at IEEE MASS 2008, MDM 2009, and SSTD 2011. Mohamed is a recipient of the NSF CAREER award 2010. Mohamed has actively participated in several program committees for major conferences including ICDE, SIGMOD, VLDB, SSTD, and ACM GIS. He is/was a program co-chair for ACM SIGSPATIAL GIS 2008, 2009, and 2010. Mohamed is an ACM and IEEE member and a founding member of ACM SIGSPATIAL.