Page 1: 13 sdm-blda-slides

It Is Not Just What We Say, But How We Say Them: Joint Behaviour-Topic Modelling

Minghui QIU, Feida ZHU and Jing JIANG

Singapore Management University

Page 2: 13 sdm-blda-slides

Microblogs

• Rich user interactions with textual information (posting behaviors)

[Figure: the four posting behaviors POST, RETWEET, REPLY and MENTION]

Why do we need to consider user behaviors?

Page 3: 13 sdm-blda-slides

Observation 1: users with similar topics of interest can have different behavioral patterns

• Users who are interested in the ‘politics’ topic

The different behaviors people exhibit on Twitter suggest different motivations for using the platform.

Page 4: 13 sdm-blda-slides

Observation 2: user clusters with distinct behavioral patterns usually represent different user profiles

• Top 5 users who frequently post tweets about the topic ‘politics’

Official news media accounts

Page 5: 13 sdm-blda-slides

It Is Not Just What We Say, But How We Say Them

• The way people interact with text is critical to understanding user behavior patterns and modeling user interest in social networks

• Goal: jointly model a user’s topic interests and the user’s interactions with each topic in microblogs such as Twitter

Page 6: 13 sdm-blda-slides

Outline

• Topic Modeling in Twitter
• Joint behavior-topic model
• Applications and Empirical Results
– Topic analysis
– User clustering
– Followee Recommendation
• Summary

Page 7: 13 sdm-blda-slides

Topic Modeling in Twitter

• Twitter
– 140-character limit
– Noisy tweets

• Comparison between LDA and Twitter-LDA (T-LDA) [Zhao et al., ECIR’10]
– Document (both models): all tweets of a given Twitter user
– Words (both models): words in the user’s tweets
– Topic assignment: in LDA each word has a topic; in T-LDA each tweet has a topic
– Word pools: LDA uses topical words; T-LDA uses topical words or background words

Goal: extend T-LDA to jointly model the topic interests and interactions of a user.

Page 8: 13 sdm-blda-slides

LDA-based Behaviour-Topic Modelling – B-LDA

• U: # of users
• N: # of tweets
• L: # of words
• z: a topic label
• y: a switch

[Figure: plate diagram of B-LDA. Nodes: θ (user’s topic distribution), ψ (topic’s behavior distribution, plate T), 𝜙 (topic’s word distribution, plate T), 𝜙′ (background word distribution), 𝜑, z, y, b, w. Plates: U, N, L. Hyperparameters: α, β, β′, γ, η.]
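The plate diagram can be read as a generative story. The sketch below is one possible reading, not the authors' specification. It assumes the T-LDA convention from the comparison slide (one topic per tweet, a per-word switch y between topical and background words) plus one behavior b per tweet drawn from the topic's behavior distribution ψ. The function names, the hyperparameter defaults and the way the switch probability is drawn are illustrative assumptions.

```python
import random

BEHAVIORS = ["POST", "RETWEET", "REPLY", "MENTION"]  # behavior types named on the slides


def dirichlet(rng, concentration, size):
    """Draw one sample from a symmetric Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(rng, probs):
    """Draw an index according to the given probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_user_tweets(rng, num_topics, vocab_size, num_tweets, words_per_tweet,
                         alpha=0.5, beta=0.1, beta_bg=0.1, eta=0.1, gamma=1.0):
    """Sample tweets for one user. All hyperparameter defaults are placeholders."""
    phi = [dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]     # topic word distributions
    psi = [dirichlet(rng, eta, len(BEHAVIORS)) for _ in range(num_topics)]  # topic behavior distributions
    phi_bg = dirichlet(rng, beta_bg, vocab_size)                            # background word distribution
    p_topical = dirichlet(rng, gamma, 2)[1]                                 # assumed switch probability
    theta = dirichlet(rng, alpha, num_topics)                               # user's topic distribution

    tweets = []
    for _ in range(num_tweets):
        z = categorical(rng, theta)              # one topic label per tweet
        b = BEHAVIORS[categorical(rng, psi[z])]  # one behavior per tweet, chosen by the topic
        words = []
        for _ in range(words_per_tweet):
            y = 1 if rng.random() < p_topical else 0  # 1: topical word, 0: background word
            words.append(categorical(rng, phi[z] if y else phi_bg))
        tweets.append({"topic": z, "behavior": b, "words": words})
    return tweets
```

Calling `generate_user_tweets(random.Random(0), 3, 20, 5, 4)` returns five tweets, each with a topic index, a behavior label and four word indices.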

Page 9: 13 sdm-blda-slides

Outline

• Topic Modeling in Twitter
• Joint behavior-topic model
• Applications and Empirical Results
– Topic analysis
– User clustering
– Followee Recommendation
• Summary

Page 10: 13 sdm-blda-slides

Data sets

• Base data set
– 151,055 Twitter users in Singapore and their tweets
• Our data set
– Randomly selected 5,000 users, among whom 1,000 are further selected to obtain their followees; 9,688 users in total
– Tweets from Sep 1, 2011 to Nov 30, 2011
– Total tweets: 11,882,441
• Preprocessing
– Remove stop words
– Remove words with non-standard characters (URLs, emoticons, etc.)
• Parameter settings (LDA, Twitter-LDA, B-LDA)
– # of topics: T = 80
– α = 50/T, β = 0.01

Page 11: 13 sdm-blda-slides

Topic Analysis

• Do the resulting topics in B-LDA have dominant behaviors?

• Entropy of each topic’s behavior distribution
– B-LDA: p(b|t) is learnt by the model
– LDA and T-LDA: p(b|t) is estimated as C(t,b) / δ
– C(t,b): # of times topic t co-occurs with behavior b
– δ: normalization factor
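The entropy measure can be sketched in a few lines. This is a minimal illustration, assuming that δ is the sum of C(t,b) over all behaviors b for the topic and that entropy is measured in bits. The function names are hypothetical.

```python
import math


def behavior_distribution(counts):
    """Normalize co-occurrence counts C(t, b) for one topic into p(b|t)."""
    delta = sum(counts.values())  # assumed normalization factor: total count for the topic
    return {b: c / delta for b, c in counts.items()}


def entropy(dist):
    """Shannon entropy in bits; zero-probability behaviors contribute nothing."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)
```

A topic spread evenly over four behaviors has an entropy of 2 bits, while a topic with a single behavior has an entropy of 0, so lower values indicate a more dominant behavior. The sketch assumes each topic co-occurs with at least one behavior, since otherwise δ would be zero.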


Page 12: 13 sdm-blda-slides

Topic Analysis

• Do the resulting topics in B-LDA have dominant behaviors?
– Low entropy means the topic has dominant behaviors
– B-LDA: topics are enhanced by dominant behavior patterns

Page 13: 13 sdm-blda-slides

Topic Analysis

• Topics of distinct behavior patterns

[Figure: example topics labeled POST, REPLY and RETWEET]

Topics that are similar but with different behaviors

Page 14: 13 sdm-blda-slides

Topic Analysis

• Topics in T-LDA would be split by different behavioral patterns in B-LDA
– 15 topic groups, each with two topics
– 1 topic group with three topics

[Figure: T-LDA topics 1, 2, …, T linked to B-LDA topics 1, 2, …, T. Distance: KL-divergence. Linked topics are marked as a topic group.]
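One way to find such topic groups is to map every B-LDA topic to its closest T-LDA topic under KL-divergence and keep the T-LDA topics that attract two or more B-LDA topics. The slides name only the distance, so the direction of the divergence, the smoothing constant and the matching rule below are assumptions, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two word distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def group_by_nearest_topic(tlda_topics, blda_topics):
    """Map each B-LDA topic index to its closest T-LDA topic and group them."""
    groups = defaultdict(list)
    for j, b_topic in enumerate(blda_topics):
        nearest = min(range(len(tlda_topics)),
                      key=lambda i: kl_divergence(b_topic, tlda_topics[i]))
        groups[nearest].append(j)
    # Keep only T-LDA topics matched by two or more B-LDA topics (a "topic group").
    return {i: js for i, js in groups.items() if len(js) >= 2}
```

With T-LDA topics `[[0.9, 0.1], [0.1, 0.9]]` and B-LDA topics `[[0.85, 0.15], [0.8, 0.2], [0.1, 0.9]]`, the first two B-LDA topics both map to T-LDA topic 0 and form one group.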

Page 15: 13 sdm-blda-slides

Topic Analysis

• Topics split by different behavioral patterns

Topics in B-LDA have more distinct behavioral patterns than those in T-LDA.

Topic 16 is mainly contributed by news media accounts, but topic 13 is not. Topic 61 is a retweet topic and contains more words with hashtags.

Page 16: 13 sdm-blda-slides

Applications – followee recommendation

• Followee recommendation
– User profile: the user’s or the user’s followees’ textual content
– Does not consider behavior patterns
• Behavior matters
– People who use Twitter as an instant messenger follow users they may interact with
– People who use Twitter as an information source and news feed follow official news media channels
– Twitter: news media or social network? [Kwak et al., WWW’10]
• Definition: users who care about the behavioral patterns of their followees, explicitly or implicitly, are “behavior-driven followers”.

Page 17: 13 sdm-blda-slides

Applications – followee recommendation

• Finding behavior-driven followers
– A behavior-driven follower’s followees will naturally form a small number of clusters, within each of which the followees share similar behavioral patterns.

– k-nearest-neighbor distance

• S: a given space, U: a set of users, : user v’s k-nearest-neighbors

– Behavior-driven index

• ST: the topic space, SB: the joint behavior-topic space, Fu: followees of u

• A behavior-driven follower has a large βK
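The formulas for the k-nearest-neighbor distance and the behavior-driven index did not survive in this transcript, so the sketch below shows only one plausible form. It assumes the distance is the average Euclidean distance from each followee to its k nearest fellow followees, and that βK is the ratio of that distance in the topic space ST to the distance in the joint behavior-topic space SB. Under this assumption, followees that cluster more tightly in SB than in ST yield a large βK. Neither formula is confirmed by the slides, and the function names are hypothetical.

```python
import math


def knn_distance(vectors, k):
    """Average distance from each user to its k nearest neighbors in one space."""
    if len(vectors) <= k:
        raise ValueError("need more than k users")
    total = 0.0
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        total += sum(dists[:k]) / k
    return total / len(vectors)


def behavior_driven_index(topic_vectors, behavior_topic_vectors, k=1):
    """Assumed form: kNN distance in topic space divided by that in behavior-topic space."""
    return knn_distance(topic_vectors, k) / knn_distance(behavior_topic_vectors, k)
```

The ratio is undefined when all followees coincide in the behavior-topic space, so a real implementation would need to handle a zero denominator.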

Page 18: 13 sdm-blda-slides

Applications – followee recommendation

• Definition
– βK ≥ τ: behavior-driven follower
– βK < τ: topic-driven follower
• Behavior-driven index
– K = 1; topic space: LDA; joint behavior-topic space: B-LDA
– Half of the users are to some extent behavior-driven

Page 19: 13 sdm-blda-slides

Applications – followee recommendation

• Followee recommendation approach [Chen et al., WWW’09]

– For a target user u, we randomly pick one followee from u’s current followee set, and then combine her with another m randomly-selected non-followees.

– For these m + 1 users, any recommendation algorithm would generate a ranking of them in descending order.

– The performance is measured by examining how high the real followee is ranked.
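The evaluation protocol above can be sketched as follows. Here `score` is a hypothetical callable that returns a recommendation score for a candidate with respect to the target user; any of the compared models could supply it. Sorting the input sets before sampling is an added choice that keeps runs reproducible for a seeded generator.

```python
import random


def rank_of_real_followee(rng, score, followees, non_followees, m):
    """Hide one real followee among m random non-followees and return its rank (1 = best)."""
    real = rng.choice(sorted(followees))
    candidates = rng.sample(sorted(non_followees), m) + [real]
    ranked = sorted(candidates, key=score, reverse=True)  # descending by recommendation score
    return ranked.index(real) + 1
```

A rank of 1 means the model placed the real followee above all m non-followees. Ties are resolved by the candidate order, which is a simplification.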


Page 20: 13 sdm-blda-slides

Applications – followee recommendation

• Followee recommendation approach [Chen et al., WWW’09]


Page 21: 13 sdm-blda-slides

Applications – followee recommendation

• Evaluation
– Rank of the real followee
– Mean reciprocal rank
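Mean reciprocal rank (MRR) averages 1/rank over all test cases. The helper below is a minimal sketch using the standard definition; the function name is hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test cases; ranks start at 1."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)
```

MRR rewards top positions heavily, so it can disagree with the average rank. Ranks of 1, 1, 10, 10 give an MRR of about 0.55 and an average rank of 5.5, while ranks of 3, 3, 3, 3 give an MRR of about 0.33 and an average rank of 3.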

Page 22: 13 sdm-blda-slides

Applications – followee recommendation

• Evaluation
– A smaller neighbourhood size K gives better results
– B-LDA and T-LDA rank real followees higher than LDA, with a smaller deviation
– Adding behaviours to topic modelling helps the task: B-LDA > T-LDA
– LDA has a better MRR but a worse average rank: LDA is not robust and performs particularly well or particularly badly on some sets of users

Page 23: 13 sdm-blda-slides

Applications – followee recommendation

• Study on behavior-driven index
– Correlation between DKNN and rank of the real followee
– Correlation between βK and relative rank rLDA/rBLDA
– β1 will be used to judge whether a given user is a behavior-driven or topic-driven follower

Page 24: 13 sdm-blda-slides

Applications – followee recommendation

• Topic-driven follower vs. behavior-driven follower

• Results on behavior-driven followers

B-LDA performs significantly better than LDA on behavior-driven followers.

Page 25: 13 sdm-blda-slides

Applications – followee recommendation

• A combined followee recommendation method (comModel)
– Using behavior-driven index to choose model
• Model selection
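The model selection step can be sketched as a simple routing rule. This assumes that behavior-driven users (βK ≥ τ) are served by B-LDA and topic-driven users by LDA, which follows the deck's definitions but is not spelled out on this slide. The two ranker callables are hypothetical stand-ins for the trained models.

```python
from typing import Callable, List


def com_model_rank(user: str, beta_k: float, tau: float,
                   blda_ranker: Callable[[str], List[str]],
                   lda_ranker: Callable[[str], List[str]]) -> List[str]:
    """Route a user to B-LDA if behavior-driven (beta_k >= tau), otherwise to LDA."""
    ranker = blda_ranker if beta_k >= tau else lda_ranker
    return ranker(user)
```

The threshold τ is a tuning parameter. The slides do not state its value, so it is left as an argument here.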

Page 26: 13 sdm-blda-slides

Applications – followee recommendation

• Comparisons of comModel, B-LDA and LDA
– Rank of the real followee and MRR
– Cumulative distribution of ranks (CDR) for real followees

Page 27: 13 sdm-blda-slides

Summary

• We propose B-LDA, a behaviour-integrated topic model based on LDA
• Comparison of B-LDA with LDA and Twitter-LDA
– Experimental results show that B-LDA can find topics with dominant behaviours
– We propose an index βK to characterize users who are behaviour-driven followers, and demonstrate that B-LDA significantly outperforms other models on followee recommendation for behaviour-driven followers
– Based on the βK index, we propose a new recommendation framework combining B-LDA and LDA, which gives promising recommendations

Page 28: 13 sdm-blda-slides

• Thanks!


Page 29: 13 sdm-blda-slides

References

• [Zhao et al., ECIR’10] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li, “Comparing Twitter and traditional media using topic models,” in ECIR, 2011, pp. 338–349.

• [Kwak et al., WWW’10] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in WWW, 2010, pp. 591–600.

• [Chen et al., WWW’09] W.-Y. Chen, J.-C. Chu, J. Luan, H. Bai, Y. Wang, and E. Y. Chang, “Collaborative filtering for Orkut communities: discovery of user latent behavior,” in WWW, 2009, pp. 681–690.

