Mining User Similarity Based on Location History Yu Zheng, Quannan Li, Xing Xie Microsoft Research Asia.


Outline

• Introduction
• Architecture
  – Modeling Location History
  – Measuring User Similarity
• Experimental Results
• Conclusion

Introduction (1)

• Goals
  – Infer the similarity (correlation) between users from their location histories
  – Enable friend recommendation and personalized location recommendation
• Motivation
  – The increasing availability of user-generated trajectories
    • Life logging, travel experience sharing
    • Sports activity analysis, multimedia content management, …
  – People's outdoor movements in the real world imply their interests
    • Likes sports: if they frequently visit gyms and stadiums
    • Likes travel: if they often visit mountains and lakes
  – According to the first law of geography
    • Everything is related to everything else, but near things are more related than distant things.
    • People with similar location histories might share similar interests and preferences.
  – Significance of user similarity in Web communities
    • Generally, it helps users find more relevant information in a large-scale dataset
    • In the GIS community: friend discovery and location recommendation

Introduction (2)

• Difficulties & challenges
  – How to model different users' location histories uniformly
    • Different users' location histories are inconsistent and incomparable
    • What is a shared location? By distance? (X)
  – How to measure the similarity between users
    • By counting the number of shared locations?
    • By the Pearson correlation or the cosine similarity?
    • These do not take into account two important properties of people's outdoor movements.
• Contributions and insights
  – A step towards integrating social networking into GIS
  – A hierarchical graph
    • Uniformly models different users' location histories on various scales of geo-space
  – A similarity measure considering
    • The sequence property of users' movement behavior
    • The hierarchy property of geographic spaces

Preliminary

• GPS logs P and GPS trajectories
• Stay points S = {s1, s2, …, sn}
  – A stay point stands for a geo-region where a user has stayed for a while
  – E.g., a user spent more than 20 minutes within a distance of 200 meters
  – Carries a semantic meaning beyond a raw GPS point
• Location history
  – Represented by a sequence of stay points
  – With transition intervals

[Figure: a GPS trajectory of points p1, p2, …, pn, each logged as (Latitude, Longitude, Time), with a group of consecutive points marked as a stay point S.]

$$LocH = (s_1 \xrightarrow{\Delta t_1} s_2 \xrightarrow{\Delta t_2} \cdots \xrightarrow{\Delta t_{n-1}} s_n)$$
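The stay-point definition above can be sketched in code. This is a minimal illustration, not the authors' implementation: the names `GpsPoint`, `StayPoint`, `detect_stay_points` and `transition_intervals` are hypothetical. Using the haversine distance and averaging the coordinates of a group are assumptions. The 200 m and 20 minute defaults come from the example on the slide.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class GpsPoint:
    lat: float
    lng: float
    t: float  # timestamp in seconds


@dataclass
class StayPoint:
    lat: float     # mean latitude of the grouped GPS points (assumption)
    lng: float     # mean longitude of the grouped GPS points (assumption)
    arrive: float  # time of the first grouped point
    leave: float   # time of the last grouped point


def haversine_m(a: GpsPoint, b: GpsPoint) -> float:
    """Great-circle distance in meters between two GPS points."""
    r = 6371000.0  # mean Earth radius in meters
    dlat = radians(b.lat - a.lat)
    dlng = radians(b.lng - a.lng)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlng / 2) ** 2
    return 2 * r * asin(sqrt(h))


def detect_stay_points(points, dist_m=200.0, min_stay_s=20 * 60):
    """Group consecutive points that stay within dist_m of an anchor point
    for longer than min_stay_s into one stay point."""
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the group while points remain close to the anchor points[i].
        while j < n and haversine_m(points[i], points[j]) <= dist_m:
            j += 1
        group = points[i:j]
        if group[-1].t - group[0].t > min_stay_s:
            stays.append(StayPoint(
                lat=sum(p.lat for p in group) / len(group),
                lng=sum(p.lng for p in group) / len(group),
                arrive=group[0].t,
                leave=group[-1].t,
            ))
            i = j  # continue after the stay
        else:
            i += 1  # too short to count as a stay, move the anchor forward
    return stays


def transition_intervals(stays):
    """Time between leaving one stay point and arriving at the next."""
    return [b.arrive - a.leave for a, b in zip(stays, stays[1:])]
```

The list of stay points together with `transition_intervals(stays)` corresponds to the location history LocH: a sequence of stay points with the transition intervals between them.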

Architecture (1)

[Figure: system architecture in two stages. Modeling Location History: the GPS logs of users 1 to n are turned into a hierarchical graph for each individual (stay point clusters such as c20, c21 and c30 to c34, linked into graphs G1, G2, G3 on layers l1, l2, l3). Measuring Similarity: the individual graphs are compared to produce a similarity score Sij for each pair of users.]

Modeling Location History (1)

1. Stay point detection
2. Hierarchical clustering
3. Individual graph building

[Figure: the architecture diagram with the Modeling Location History stage highlighted, next to a three-layer example (Layer 1 to Layer 3) in which lower-level clusters a, c, e and higher-level clusters A, B form the graphs G1, G2, G3.]

Modeling Location History (2)

[Figure: the three modeling steps. 1. Stay point detection: stay points are extracted from each user's GPS logs. 2. Hierarchical clustering: the stay points from the users' GPS logs are clustered into a shared hierarchical framework of stay point clusters cij (c10 on Layer 1; c20, c21 on Layer 2; c30 to c34 on Layer 3), drawn from High to Low. 3. Individual graph building: each user's stay points are placed on the shared framework, giving one graph per layer (G1, G2, G3); the example shows clusters a, b, d, e on the lower level and A, B on the higher level for User 1 and User 2.]
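Step 3, individual graph building, can be illustrated with a small sketch. It assumes that steps 1 and 2 have already run, so that each of a user's stay points carries a cluster id on every layer of the shared framework. The slides do not name a clustering algorithm, so none is shown. The function names and the choice of counting visits and transitions are assumptions made for illustration.

```python
from collections import Counter


def build_layer_graph(cluster_ids):
    """Build one user's graph on one layer.

    cluster_ids: the cluster id of each of the user's stay points on this
    layer, in time order. Nodes are the clusters the user visited (with
    visit counts); edges are transitions between different clusters.
    """
    nodes = Counter(cluster_ids)
    edges = Counter(
        (a, b) for a, b in zip(cluster_ids, cluster_ids[1:]) if a != b
    )
    return nodes, edges


def build_hierarchical_graph(assignments):
    """assignments maps a layer number to the time-ordered cluster ids
    of the user's stay points on that layer."""
    return {layer: build_layer_graph(ids) for layer, ids in assignments.items()}


# Hypothetical user with three stay points, labelled on layers 2 and 3.
user_graph = build_hierarchical_graph({
    2: ["c20", "c20", "c21"],
    3: ["c30", "c31", "c34"],
})
```

Because every user is projected onto the same clusters, graphs of different users become directly comparable layer by layer.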

Measuring User Similarity (1)

1. Sequence extraction
2. Sequence matching
3. Similarity score calculation

[Figure: the architecture diagram with the Measuring Similarity stage highlighted.]


Measuring Similarity (2)

• Similar sequence extraction

[Figure: two users' hierarchical graphs (labelled User 1 and User n). On layer 3, each user's stay points are laid out along a time axis and mapped to the clusters c30 to c34, which yields one cluster sequence per user.]

$$seq_3^1 = c_{32}(1) \to c_{31}(1) \to c_{33}(2) \to c_{32}(2) \to c_{33}(1) \to c_{32}(1)$$

$$seq_3^2 = c_{31}(1) \to c_{33}(1) \to c_{32}(1) \to c_{31}(2) \to c_{32}(1) \to c_{31}(1)$$
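Sequence extraction can be sketched as a run-length encoding of the cluster ids of a user's stay points on one layer. This reading assumes that the number in parentheses counts the successive stay points that fall into the same cluster; the slides do not state this explicitly. The function names are hypothetical.

```python
from itertools import groupby


def extract_sequence(cluster_ids):
    """Collapse time-ordered cluster ids into (cluster, count) pairs,
    where count is the number of successive stay points in that cluster."""
    return [(cid, len(list(run))) for cid, run in groupby(cluster_ids)]


def format_sequence(seq):
    """Render a sequence in the notation used on the slide."""
    return " → ".join(f"{cid}({k})" for cid, k in seq)


# Eight stay points on layer 3 (cluster ids chosen for illustration).
stays_on_layer3 = ["c32", "c31", "c33", "c33", "c32", "c32", "c33", "c32"]
print(format_sequence(extract_sequence(stays_on_layer3)))
# prints: c32(1) → c31(1) → c33(2) → c32(2) → c33(1) → c32(1)
```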

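Measuring Similarity (3)

• Sequence matching
  – We aim to find the maximum-length similar sequence
  – A pair of similar sequences: two individuals visit the same sequence of places with similar time intervals between the visits
  – Same visiting order: $a_i = b_i$
  – Similar transition time: $\frac{|\Delta t_j - \Delta t_j'|}{\max(\Delta t_j, \Delta t_j')} \le p$

[Figure: matching example for two users u1 and u2 over the places A, B and C, with the sequences B(1) → A(1) → C(2) → B(2) and A(1) → C(1) → B(1) → A(2) and transition times of 5 h, 8 h, 6 h, 7 h, 14 h and 6.5 h. The candidate sequences AC, ABC, AB and BA are marked as matched (√) or rejected (X).]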
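The two matching conditions can be sketched as follows. This is a brute-force illustration for short sequences, not the authors' algorithm. It assumes that each visit is a `(place, time)` pair and that the transition time between two visits is the difference of their times. The function names, the threshold value and the example timestamps are all made up.

```python
from itertools import combinations


def transition_ok(dt_a, dt_b, p):
    """Similar transition time: |dt_a - dt_b| / max(dt_a, dt_b) <= p."""
    longest = max(dt_a, dt_b)
    if longest == 0:
        return True  # both transitions are instantaneous
    return abs(dt_a - dt_b) / longest <= p


def is_similar(a, b, p):
    """a, b: lists of (place, time). True if both visit the same places in
    the same order and every pair of transition times is similar."""
    if len(a) != len(b) or any(x[0] != y[0] for x, y in zip(a, b)):
        return False
    return all(
        transition_ok(a[i + 1][1] - a[i][1], b[i + 1][1] - b[i][1], p)
        for i in range(len(a) - 1)
    )


def longest_similar(a, b, p):
    """Try all order-preserving subsequences, longest first (exponential,
    so only suitable for small illustrative inputs)."""
    for m in range(min(len(a), len(b)), 0, -1):
        for ia in combinations(range(len(a)), m):
            sub_a = [a[i] for i in ia]
            for ib in combinations(range(len(b)), m):
                sub_b = [b[i] for i in ib]
                if is_similar(sub_a, sub_b, p):
                    return sub_a, sub_b
    return [], []


# Made-up visits, times in hours.
u1 = [("A", 0.0), ("C", 8.0), ("B", 14.0)]
u2 = [("A", 0.0), ("C", 7.0), ("B", 21.0)]
match_u1, match_u2 = longest_similar(u1, u2, p=0.2)
```

With these inputs the full sequence A → C → B is rejected because the C → B transitions (6 h against 14 h) differ too much for p = 0.2, while A → C (8 h against 7 h) passes, so the longest match has length two.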

Measuring Similarity (4)

• Similarity calculation
  – Two factors
    • The length of the matched similar sequence
    • The level of the matched similar sequence
  – Calculation
    1. Calculate a similarity score for each matched sequence (weighted by its length):

       $$s(m) = \alpha(m) \sum_{i=1}^{m} \min(k_i, k_i') \qquad (2)$$

    2. Add up the similarity scores of the sequences found on a level:

       $$S_l = \frac{1}{N_1 N_2} \sum_{i=1}^{n} s_i$$

    3. Take the weighted sum of the scores over multiple levels:

       $$S_{overall} = \sum_{l=1}^{H} \beta_l S_l$$
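The three calculation steps translate directly into code. The sketch below follows the formulas as reconstructed from the slide. The slide does not give the weights α(m) and β_l or define N1 and N2, so they are left as parameters. The exponential weights in the example, and reading k_i and k_i' as the visit counts shown in parentheses in the extracted sequences, are assumptions.

```python
def sequence_score(k_a, k_b, alpha):
    """s(m) = alpha(m) * sum_i min(k_i, k_i') for one matched sequence.

    k_a, k_b: per-place counts of the two users along the matched
    sequence (same length m)."""
    m = len(k_a)
    return alpha(m) * sum(min(x, y) for x, y in zip(k_a, k_b))


def level_score(matches, n1, n2, alpha):
    """S_l = (1 / (N1 * N2)) * sum of the scores of all matched sequences
    found on one level. matches is a list of (k_a, k_b) pairs."""
    return sum(sequence_score(k_a, k_b, alpha) for k_a, k_b in matches) / (n1 * n2)


def overall_score(matches_by_level, n1, n2, alpha, beta):
    """S_overall = sum over levels l of beta(l) * S_l."""
    return sum(
        beta(level) * level_score(matches, n1, n2, alpha)
        for level, matches in matches_by_level.items()
    )


# Illustrative weights only: longer sequences and deeper levels count more.
alpha = lambda m: 2 ** (m - 1)
beta = lambda level: 2 ** (level - 1)

matches_by_level = {
    2: [([1], [2])],        # one matched sequence of length 1 on level 2
    3: [([1, 2], [1, 1])],  # one matched sequence of length 2 on level 3
}
score = overall_score(matches_by_level, n1=2, n2=2, alpha=alpha, beta=beta)
```

Passing `alpha` and `beta` as functions keeps the sketch independent of any particular weighting scheme.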

Measuring Similarity (5)

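[Figure: worked example on a three-layer hierarchy (Layer 1 to Layer 3, High to Low) with clusters a to e on the low level and A, B on the high level.
User 1: A B (high level), a c e (low level)
User 2: A B (high level), b d (low level)
User 3: A B (high level), b c e (low level)
User 1 and User 3 share A B and c e; User 1 and User 2 share only A B.
Result for User 1: User 3 > User 2.]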

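Experiments (1)

• GPS devices and users
  – 112 users collected the data over the past year

[Figure: two pie charts. Age groups (legend: age ≤ 22, 22 < age ≤ 25, 26 ≤ age < 29, age ≥ 30) with slice labels 16%, 45%, 30%, 9%. Occupations (legend: Microsoft employees, employees of other companies, government staff, college students) with slice labels 18%, 14%, 10%, 58%. Slice labels and legend entries are listed in extraction order.]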

Experiments (2)

• GPS dataset
  – > 6 million GPS points
  – > 170,000 kilometers
  – 36 cities in China and a few cities in the USA, Korea and Japan

Experiments (3)

| Level | Relevance        | Suggested relationships                   |
|-------|------------------|-------------------------------------------|
| 4     | Strongly similar | Family members / intimate lovers / roommates |
| 3     | Similar          | Good friends / workmates / classmates     |
| 2     | Weakly similar   | Ordinary friends, neighbors in a community |
| 1     | Different        | Strangers in the same city                |
| 0     | Quite different  | Strangers in other cities                 |

• Evaluation approach
  – Evaluated as an information retrieval problem
  – Ground truth: users label their relationships with the ratings shown in the table above

[Figure: evaluation pipeline. A relationship matrix over the users U1, U2, …, Ui, …, Un holds the labelled ratings (0 to 4). For a query user, the system retrieves the top ten similar users, e.g. (U2, U3, …, U4). Their ground-truth ratings give the list G = (4, 3, 2, 3, 0, 1, …, 0, 0), which is compared with the ideal ordering (4, 3, 3, 2, 2, 1, …, 0, 0) to calculate nDCG and MAP.]
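The nDCG part of this evaluation can be sketched as below. The slides do not say which DCG variant was used. This sketch assumes the common form with gain 2^rating − 1 and a log2 position discount. It takes the ideal ordering from all ground-truth ratings of the query user. The function names and example ratings are illustrative.

```python
from math import log2


def dcg(ratings, k):
    """Discounted cumulative gain of the first k ratings (rank 1 first)."""
    return sum((2 ** r - 1) / log2(rank + 2) for rank, r in enumerate(ratings[:k]))


def ndcg(retrieved_ratings, all_ratings, k):
    """nDCG@k: DCG of the retrieved list divided by the DCG of the ideal
    ordering of all ground-truth ratings for the query user."""
    ideal = dcg(sorted(all_ratings, reverse=True), k)
    return dcg(retrieved_ratings, k) / ideal if ideal > 0 else 0.0


# Made-up ratings (0 to 4) of the users retrieved for one query user.
retrieved = [4, 3, 2, 3, 0, 1]
ground_truth = [4, 3, 3, 2, 2, 1, 0, 0]
ndcg_at_5 = ndcg(retrieved, ground_truth, k=5)
```

A retrieved list that already follows the ideal ordering scores 1.0; placing a highly rated user lower in the list reduces the score.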

Experiments (4)

• Comparison with baselines
  – The Pearson correlation
  – Cosine similarity

[Figure: bar chart of MAP (y-axis from 0.72 to 0.96) for each method.]
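For reference, the two baseline measures can be written in a few lines. The slides do not describe how the baselines were set up. The sketch assumes that each user is represented by a vector of visit counts over a common list of locations, which is a typical choice and not confirmed by the source. The function names and the example vectors are hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson_correlation(u, v):
    """Pearson correlation, computed as the cosine similarity of the
    mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])


# Made-up visit counts of two users over the same four locations.
user_a = [5, 0, 2, 1]
user_b = [4, 1, 0, 1]
cos_ab = cosine_similarity(user_a, user_b)
pearson_ab = pearson_correlation(user_a, user_b)
```

Both measures look only at how often locations were visited. They ignore the visiting order and the level of the hierarchy, the two properties the proposed measure adds.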

Experiments (5)

• nDCG comparison

[Figure: bar chart of nDCG@5 and nDCG@10 (y-axis from 0.78 to 0.94) for each method.]

Conclusion

• A hierarchical graph
  – A uniform framework for measuring various users' location histories
  – Effectively models users' outdoor movements
    • Sequentially
    • Hierarchically
• Our similarity measure outperformed existing methods
  – The Pearson correlation
  – The cosine similarity
  – Hierarchy + sequence achieved the best performance

Thanks!

Microsoft Research [email protected]

