Mining User Similarity Based on Location History Yu Zheng, Quannan Li, Xing Xie Microsoft Research Asia
Mar 31, 2015
Outline
• Introduction
• Architecture
  – Modeling Location History
  – Measuring User Similarity
• Experimental Results
• Conclusion
Introduction (1)
• Goals
  – Infer the similarity (correlation) between users from their location histories
  – Enable friend recommendation and personalized location recommendation
• Motivation
  – The increasing availability of user-generated trajectories
    • Life logging, travel experience sharing
    • Sports activity analysis, multimedia content management, …
  – People’s outdoor movements in the real world imply their interests
    • Likes sports: frequently visits gyms and stadiums
    • Likes travel: usually visits mountains and lakes
  – According to the first law of geography
    • Everything is related to everything else, but near things are more related than distant things.
    • People with similar location histories might share similar interests and preferences.
  – Significance of user similarity in Web communities
    • Generally, it helps users find more relevant information in a large-scale dataset
    • In the GIS community: friend discovery and location recommendation
Introduction (2)
• Difficulties & Challenges
  – How to model different users’ location histories uniformly
    • Different users’ location histories are inconsistent and incomparable
    • What is a shared location? Defined by distance? (X)
  – How to measure the similarity between users
    • By counting the number of shared locations?
    • By the Pearson correlation or cosine similarity?
    • These do not take into account two important properties of people’s outdoor movements.
• Contribution and insights
  – A step towards integrating social networking into GIS
  – A hierarchical graph
    • Uniformly models different users’ location histories on various scales of geo-space
  – A similarity measure considering
    • The sequence property of users’ movement behavior
    • The hierarchy property of geographic spaces
Preliminary
• GPS logs P and GPS trajectory
• Stay points S = {s1, s2, …, sn}
  – A stay point stands for a geo-region where a user has stayed for a while
  – E.g., a user spent more than 20 minutes within a distance of 200 meters
  – Carries a semantic meaning beyond a raw GPS point
• Location history
  – Represented by a sequence of stay points
  – With transition intervals
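[Figure: a GPS trajectory of points p1, p2, …, p7, with a group of neighboring points marked as a stay point S. Each GPS point is a (Latitude, Longitude, Time) record: p1: Lat1, Lngt1, T1; p2: Lat2, Lngt2, T2; …; pn: Latn, Lngtn, Tn]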
$LocH = (s_1 \xrightarrow{\Delta t_1} s_2 \xrightarrow{\Delta t_2} \cdots \xrightarrow{\Delta t_{n-1}} s_n)$
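The stay point definition above (more than 20 minutes within 200 meters) can be sketched in code. This is a minimal illustration, not the authors' implementation: the names `GpsPoint`, `StayPoint`, `haversine_m`, and `detect_stay_points` are hypothetical, distances are measured from the first point of each candidate group, and a stay point is placed at the mean coordinate of its group.

```python
import math
from typing import List, NamedTuple


class GpsPoint(NamedTuple):
    lat: float  # degrees
    lng: float  # degrees
    t: float    # timestamp in seconds


class StayPoint(NamedTuple):
    lat: float     # mean latitude of the grouped GPS points
    lng: float     # mean longitude of the grouped GPS points
    arrive: float  # timestamp of the first grouped point
    leave: float   # timestamp of the last grouped point


def haversine_m(a: GpsPoint, b: GpsPoint) -> float:
    """Great-circle distance between two points, in meters."""
    r = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lng - a.lng)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def detect_stay_points(points: List[GpsPoint],
                       dist_m: float = 200.0,
                       min_stay_s: float = 20 * 60) -> List[StayPoint]:
    """Group consecutive points that stay within dist_m of an anchor point
    for longer than min_stay_s (thresholds taken from the slide's example)."""
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the group while points remain close to the anchor points[i].
        while j < n and haversine_m(points[i], points[j]) <= dist_m:
            j += 1
        if points[j - 1].t - points[i].t > min_stay_s:
            group = points[i:j]
            stays.append(StayPoint(
                sum(p.lat for p in group) / len(group),
                sum(p.lng for p in group) / len(group),
                group[0].t,
                group[-1].t,
            ))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stays
```

A location history can then be read off as the ordered list of stay points returned by `detect_stay_points`.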
Architecture (1)
[Figure: architecture overview. GPS logs of users 1, 2, …, i, i+1, …, n-1, n → Modeling Location History → a hierarchical graph for each individual (graph layers G1, G2, G3 over clusters c20, c21 and c30 … c34) → Measuring Similarity → a similarity score Sij for each pair of users]
Modeling Location History (1)
1. Stay point detection
2. Hierarchical clustering
3. Individual graph building
[Figure: the architecture overview from the Architecture (1) slide, repeated for the Modeling Location History stage]
Modeling Location History (2)
[Figure: building the hierarchical graphs from the GPS logs of users 1, 2, …, n.
1. Stay point detection: stay points S are extracted from each user’s GPS logs.
2. Hierarchical clustering: the stay points of all users are grouped into stay point clusters cij, which form a shared hierarchical framework {C} with Layer 1: c10; Layer 2: c20, c21; Layer 3: c30, c31, c32, c33, c34 (layers labeled from High to Low).
3. Individual graph building: for each user (User 1 and User 2 are shown), the clusters visited on Layers 1 to 3 are connected into the graphs G1, G2, G3.]
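The idea of a shared hierarchical framework plus per-user graphs can be sketched as follows. The slides do not name the clustering algorithm, so this sketch swaps in a fixed latitude/longitude grid at several cell sizes as a stand-in: a coarser grid plays the role of a higher layer, and a grid cell plays the role of a cluster cij. The names `cell_id` and `build_hierarchical_graph` and the cell sizes are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def cell_id(lat: float, lng: float, cell_deg: float) -> Cell:
    """Identify the grid cell (stand-in for a cluster) containing a coordinate."""
    return (math.floor(lat / cell_deg), math.floor(lng / cell_deg))


def build_hierarchical_graph(
    stay_points: List[Tuple[float, float]],
    cell_sizes: List[float],
) -> Dict[int, Tuple[Set[Cell], Set[Tuple[Cell, Cell]]]]:
    """Build one graph per layer for a single user.

    stay_points: (lat, lng) pairs in visiting order.
    cell_sizes:  grid cell size in degrees per layer, coarsest first.
    Returns {layer: (nodes, edges)}, where nodes are the visited cells and
    edges are the directed transitions between different cells.
    """
    graph = {}
    for layer, size in enumerate(cell_sizes, start=1):
        seq = [cell_id(lat, lng, size) for lat, lng in stay_points]
        nodes = set(seq)
        edges = {(a, b) for a, b in zip(seq, seq[1:]) if a != b}
        graph[layer] = (nodes, edges)
    return graph
```

Because every user is mapped onto the same grid, node identifiers are directly comparable across users, which is the property the shared framework is meant to provide.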
Measuring User Similarity (1)
1. Sequence extraction
2. Sequence matching
3. Similarity score calculation
[Figure: the architecture overview from the Architecture (1) slide, repeated for the Measuring Similarity stage]
Measuring Similarity (2)
• Similar sequence extraction
[Figure: the hierarchical graphs of two users (layers G1, G2, G3 over clusters c20, c21 and c30 … c34). On layer 3, each user’s stay points s1 … s8 are laid out along a time axis and mapped to the clusters c30 … c34 they fall in, giving one cluster sequence per user.]
$seq_3^1 = c_{32}(1) \to c_{31}(1) \to c_{33}(2) \to c_{32}(2) \to c_{33}(1) \to c_{32}(1)$
$seq_3^2 = c_{31}(1) \to c_{33}(1) \to c_{32}(1) \to c_{31}(2) \to c_{32}(1) \to c_{31}(1)$
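A minimal sketch of sequence extraction on one layer. It assumes each stay point has already been assigned a cluster ID, and it reads the number in parentheses as the count of consecutive stay points falling in the same cluster; that reading is an assumption about the slide's notation. The function name `extract_sequence` is hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def extract_sequence(cluster_ids: List[str]) -> List[Tuple[str, int]]:
    """Collapse a time-ordered list of cluster IDs into (cluster, count) pairs,
    where count is the number of consecutive stay points in that cluster."""
    return [(cid, sum(1 for _ in run)) for cid, run in groupby(cluster_ids)]


# Eight stay points of a hypothetical user, in time order.
visits = ["c32", "c31", "c33", "c33", "c32", "c32", "c33", "c32"]
print(extract_sequence(visits))
```

With this input the result has the same shape as the first sequence on the slide: c32(1) → c31(1) → c33(2) → c32(2) → c33(1) → c32(1).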
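Measuring Similarity (3)
• Sequence matching
  – We aim to find the maximum-length similar sequence
  – A pair of similar sequences: two individuals visit the same sequence of places with similar time intervals
    • Same visiting order: $a_i = b_i$
    • Similar transition time: $\frac{|\Delta t_j - \Delta t_j'|}{\max(\Delta t_j, \Delta t_j')} \le p$
[Figure: matching example for users u1 and u2 over places A, B, C. u1: B(1) → A(1) → C(2) → B(2); u2: A(1) → C(1) → B(1) → A(2). Transition times shown on the edges: 8 h, 6 h, 7 h, 14 h, 5 h, 6.5 h. Candidate sequences are labeled AC, ABC (√), AB, and BA, with accepted candidates marked √ and rejected ones marked X.]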
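The two matching conditions can be sketched as a small dynamic program. This is an illustrative sketch rather than the authors' algorithm: visits are modeled as hypothetical `(place, arrival time in hours)` pairs, the transition time between two visits is the difference of their arrival times, and `p` is the ratio threshold from the slide. The function names and the sample times are made up for the example.

```python
from typing import List, Optional, Tuple

Visit = Tuple[str, float]  # (place label, arrival time in hours)


def similar_transition(dt_a: float, dt_b: float, p: float) -> bool:
    """Check |dt_a - dt_b| / max(dt_a, dt_b) <= p."""
    longest = max(dt_a, dt_b)
    if longest == 0:
        return True
    return abs(dt_a - dt_b) / longest <= p


def longest_similar_sequence(a: List[Visit], b: List[Visit], p: float) -> List[str]:
    """Find a maximum-length sequence of places visited by both users in the
    same order, with similar transition times between consecutive places."""
    n, m = len(a), len(b)
    best = [[0] * m for _ in range(n)]  # best[i][j]: longest match ending at a[i], b[j]
    prev: List[List[Optional[Tuple[int, int]]]] = [[None] * m for _ in range(n)]
    end: Optional[Tuple[int, int]] = None
    for i in range(n):
        for j in range(m):
            if a[i][0] != b[j][0]:
                continue  # different places cannot be matched
            best[i][j] = 1
            for i2 in range(i):
                for j2 in range(j):
                    if best[i2][j2] == 0:
                        continue
                    dt_a = a[i][1] - a[i2][1]
                    dt_b = b[j][1] - b[j2][1]
                    if similar_transition(dt_a, dt_b, p) and best[i2][j2] + 1 > best[i][j]:
                        best[i][j] = best[i2][j2] + 1
                        prev[i][j] = (i2, j2)
            if end is None or best[i][j] > best[end[0]][end[1]]:
                end = (i, j)
    places = []
    while end is not None:  # walk the back-pointers to recover the sequence
        places.append(a[end[0]][0])
        end = prev[end[0]][end[1]]
    return places[::-1]


u1 = [("B", 0), ("A", 5), ("C", 11), ("B", 19)]
u2 = [("A", 0), ("C", 7), ("B", 14), ("A", 28)]
print(longest_similar_sequence(u1, u2, p=0.2))  # ['A', 'C', 'B']
```

Here A → C takes 6 h for u1 and 7 h for u2 (ratio 1/7), and C → B takes 8 h and 7 h (ratio 1/8), so both transitions pass the threshold p = 0.2.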
Measuring Similarity (4)
• Similarity calculation
  – Two factors
    • The length of the matched similar sequence
    • The level of the matched similar sequence
  – Calculation
    1. Calculate a similarity score for each matched sequence (weighted by its length): $s(m) = \alpha(m) \sum_{i=1}^{m} \min(k_i, k_i')$
    2. Add up the similarity scores of the sequences found on a level: $S_l = \frac{1}{N_1 \cdot N_2} \sum_{i=1}^{n} s_i$
    3. Take the weighted sum of the scores over multiple levels: $S_{overall} = \sum_{l=1}^{H} \beta_l S_l$
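The three calculation steps can be written out directly. The slides do not define the weights or the normalizers, so this sketch makes labeled assumptions: `alpha` and `beta` are passed in as functions (the example uses the illustrative choices α(m) = 2^(m-1) and β_l = 2^(l-1)), and N1 and N2 are taken to be the sizes of the two users' location histories. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# One matched sequence: a list of (k_i, k_i') visit-count pairs, one per matched place.
MatchedSequence = List[Tuple[int, int]]


def sequence_score(seq: MatchedSequence, alpha: Callable[[int], float]) -> float:
    """Step 1: s(m) = alpha(m) * sum_i min(k_i, k_i'), with m the sequence length."""
    m = len(seq)
    return alpha(m) * sum(min(k, k2) for k, k2 in seq)


def level_score(seqs: List[MatchedSequence], n1: int, n2: int,
                alpha: Callable[[int], float]) -> float:
    """Step 2: S_l = (1 / (N1 * N2)) * sum of the sequence scores on one level."""
    return sum(sequence_score(s, alpha) for s in seqs) / (n1 * n2)


def overall_score(levels: Dict[int, List[MatchedSequence]], n1: int, n2: int,
                  alpha: Callable[[int], float],
                  beta: Callable[[int], float]) -> float:
    """Step 3: S_overall = sum_l beta(l) * S_l over all levels."""
    return sum(beta(l) * level_score(seqs, n1, n2, alpha) for l, seqs in levels.items())


# Assumed weights: longer sequences and deeper levels count more.
alpha = lambda m: 2 ** (m - 1)
beta = lambda l: 2 ** (l - 1)

matches = {
    2: [[(1, 1), (2, 1)]],
    3: [[(1, 1), (2, 1), (2, 2)], [(1, 2)]],
}
print(overall_score(matches, n1=8, n2=8, alpha=alpha, beta=beta))  # 1.1875
```

In the example, level 2 contributes 2 · (4 / 64) = 0.125 and level 3 contributes 4 · (17 / 64) = 1.0625.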
Measuring Similarity (5)
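[Figure: example with three users on a three-layer hierarchy (Layers 1 to 3 with graphs G1, G2, G3, labeled from High to Low). Coarse-level clusters are A and B; fine-level clusters are a, b, c, d, e.
User 1: A B on the coarse level; a c e on the fine level
User 2: A B on the coarse level; b d on the fine level
User 3: A B on the coarse level; b c e on the fine level
User 1 and User 2 share only A B; User 1 and User 3 share A B and c e.]
• For User 1: User 3 > User 2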
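Experiments (1)
• GPS devices and users
  – 112 users collected the data over the past year
[Figure: two pie charts describing the users. Age groups: age<=22, 22<age<=25, 26<=age<29, age>=30; percentages shown: 16%, 45%, 30%, 9%. Occupations: Microsoft employees, employees of other companies, government staff, college students; percentages shown: 18%, 14%, 10%, 58%.]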
Experiments (2)
• GPS dataset
  – > 6 million GPS points
  – > 170,000 kilometers
  – 36 cities in China and a few cities in the USA, Korea and Japan
Experiments (3)
Relevance level        Relationship suggestion
4  Strongly similar    Family members / intimate lovers / roommates
3  Similar             Good friends / workmates / classmates
2  Weakly similar      Ordinary friends, neighbors in a community
1  Different           Strangers in the same city
0  Quite different     Strangers in other cities
• Evaluation approach
  – Evaluated as an information retrieval problem
  – Ground truth: users label their relationships with the ratings shown in the table above
[Figure: evaluation pipeline. A query user → Retrieve Similar Users → Top Ten Similar Users (U2, U3, …, U4) → Get Ground Truth from the relationship matrix (rows and columns U1, U2, …, Ui, …, Un; entries are the 0 to 4 ratings) → rating list of the retrieved users G = (4, 3, 2, 3, 0, 1, …, 0, 0), compared with the ideal ordering (4, 3, 3, 2, 2, 1, …, 0, 0) → Calculating nDCG and MAP]
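A minimal sketch of the two metrics, applied to lists of 0 to 4 ratings like the ones in the figure. The slides do not give the exact formulas, so this uses one common formulation: DCG with gain 2^rating − 1 and a log2 rank discount, and average precision normalized by the number of relevant users retrieved. Treating a rating of 3 or higher as "relevant" for MAP is a hypothetical threshold, as are the function names.

```python
import math
from typing import List


def dcg(gains: List[int]) -> float:
    """Discounted cumulative gain: sum of (2^g - 1) / log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(gains, start=1))


def ndcg_at_k(gains: List[int], ideal_gains: List[int], k: int) -> float:
    """DCG of the top-k retrieved ratings divided by the DCG of the best
    possible top-k ordering of ideal_gains."""
    ideal = dcg(sorted(ideal_gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


def average_precision(gains: List[int], relevant_from: int = 3) -> float:
    """Mean of precision@rank over the ranks holding a relevant user.
    relevant_from is an assumed cut-off, not taken from the slides."""
    hits, total = 0, 0.0
    for rank, g in enumerate(gains, start=1):
        if g >= relevant_from:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: List[List[int]], relevant_from: int = 3) -> float:
    """Average of average_precision over all query users."""
    return sum(average_precision(r, relevant_from) for r in rankings) / len(rankings)


retrieved = [4, 3, 2, 3, 0]  # ratings of the retrieved users, in rank order
ideal = [4, 3, 3, 2, 2]      # ratings in the best possible order
print(ndcg_at_k(retrieved, ideal, k=5))
print(average_precision(retrieved))
```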
Experiments (4)
• Comparison with baselines
  – The Pearson correlation
  – Cosine similarity
[Figure: bar chart of MAP (y-axis, 0.72 to 0.96) by method (x-axis)]
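The two baselines can be sketched over per-user vectors. The slides do not say how the vectors are built, so this assumes each user is represented by visit counts over the same ordered list of shared locations; the function names and sample counts are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two visit-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson_correlation(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation between two visit-count vectors of equal length."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (std_u * std_v) if std_u and std_v else 0.0


# Visit counts of two hypothetical users over four shared locations.
user_a = [5, 0, 2, 1]
user_b = [4, 1, 2, 0]
print(cosine_similarity(user_a, user_b))
print(pearson_correlation(user_a, user_b))
```

Both measures see only how often each location was visited; they ignore the visiting order and the geographic hierarchy, the two properties the proposed measure adds.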
Experiments (5)
• nDCG comparison
[Figure: bar chart of nDCG@5 and nDCG@10 (y-axis, 0.78 to 0.94) by method (x-axis)]
Conclusion
• A hierarchical graph
  – A uniform framework to measure various users’ location histories
  – Effectively models users’ outdoor movements
    • Sequentially
    • Hierarchically
• Our similarity measure outperformed existing methods
  – The Pearson correlation
  – Cosine similarity
  – Hierarchy + Sequence achieved the best performance