It Is Not Just What We Say, But How We Say Them: Joint Behaviour-Topic Modelling Minghui QIU, Feida ZHU and Jing JIANG Singapore Management University

13 sdm-blda-slides

Dec 15, 2014


Minghui QIU

Textual information exchanged among users on online social network platforms provides deep insight into users' interests and behavioral patterns. However, unlike traditional text-dominant settings such as online publishing, a distinct feature of online social networks is users' rich interactions with the textual content, which, unfortunately, have not yet been well incorporated into existing topic modeling frameworks.
In this paper, we propose an LDA-based behavior-topic model (B-LDA) which jointly models user topic interests and behavioral patterns. We focus the study of the model on online social network settings such as microblogs like Twitter, where the textual content is relatively short but user interactions with it are rich. We conduct experiments on real Twitter data to demonstrate that the topics obtained by our model are both informative and insightful. As an application of our B-LDA model, we also propose a Twitter followee recommendation algorithm combining B-LDA and LDA, which we show in a quantitative experiment outperforms LDA by a significant margin.
Page 1: 13 sdm-blda-slides

It Is Not Just What We Say, But How We Say Them: Joint Behaviour-Topic Modelling

Minghui QIU, Feida ZHU and Jing JIANG

Singapore Management University

Page 2: 13 sdm-blda-slides

Microblogs

• Rich user interactions with textual information (posting behaviors)

Posting behaviors: POST, RETWEET, REPLY, MENTION

Why do we need to consider user behaviors?

Page 3: 13 sdm-blda-slides

Observation 1: users with similar topics of interest can have different behavioral patterns

• Users who are interested in the ‘politics’ topic

The different behaviors people exhibit on Twitter suggest different motivations for using the platform.

Page 4: 13 sdm-blda-slides

Observation 2: user clusters with distinct behavioral patterns usually represent different user profiles

• Top 5 users who frequently post tweets about the topic ‘politics’

[Figure label: official news media accounts]

Page 5: 13 sdm-blda-slides

It Is Not Just What We Say, But How We Say Them

• The way people interact with text is critical in understanding user behavior patterns and modeling user interest in social networks

• Goal: to jointly model a user's topic interests and the user's interactions with those topics in microblogs like Twitter

Page 6: 13 sdm-blda-slides

Outline

• Topic Modeling in Twitter
• Joint behavior-topic model
• Applications and Empirical Results
– Topic analysis
– User clustering
– Followee recommendation
• Summary

Page 7: 13 sdm-blda-slides

Topic Modeling in Twitter

• Twitter
– 140-character limit
– Noisy tweets

• Comparison between LDA and Twitter-LDA [Zhao et al., ECIR’10]

                 | LDA                                | T-LDA (Twitter-LDA)
Document         | All tweets of a given Twitter user | All tweets of a given Twitter user
Words            | Words in user’s tweets             | Words in user’s tweets
Topic assignment | Each word has a topic              | Each tweet has a topic
Word pools       | Topical words                      | Topical words or background words

To extend T-LDA to jointly model the topic interests and interactions of a user.

Page 8: 13 sdm-blda-slides

LDA-based Behaviour-Topic Modelling – B-LDA

• U: # of users
• N: # of tweets
• L: # of words
• z: a topic label
• y: a switch

[Figure: plate diagram of B-LDA. Nodes: word w, behavior b, topic label z, switch y. Distributions: θ (user’s topic distribution), ψ (topic’s behavior distribution), φ (topic’s word distribution), φ′ (background word distribution). Other symbols shown: 𝜑, α, β, β′, γ, η. Plates: U, N, L, T.]
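The structure described on this slide can be sketched as a small sampler. This is a minimal illustration, not the authors' implementation: it assumes, following the T-LDA comparison above, that each tweet gets one topic z, that the switch y picks between the topic's word distribution and the background word distribution for each word, and that each tweet's behavior b is drawn from its topic's behavior distribution. The function names, the `p_topical` switch probability, and all numeric values are hypothetical.

```python
import random

BEHAVIORS = ["POST", "RETWEET", "REPLY", "MENTION"]

def dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension k."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def categorical(probs, rng):
    """Sample an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_user_tweets(n_tweets, words_per_tweet, phi, phi_bg, psi,
                         alpha, p_topical, rng):
    """Generate tweets for one user (assumed generative story, see lead-in)."""
    theta = dirichlet(alpha, len(phi), rng)      # user's topic distribution
    tweets = []
    for _ in range(n_tweets):
        z = categorical(theta, rng)              # one topic label per tweet
        words = []
        for _ in range(words_per_tweet):
            y = rng.random() < p_topical         # switch: topical or background
            dist = phi[z] if y else phi_bg
            words.append(categorical(dist, rng))
        b = BEHAVIORS[categorical(psi[z], rng)]  # behavior from topic's behavior dist.
        tweets.append({"topic": z, "behavior": b, "words": words})
    return tweets

# Toy usage with hypothetical sizes: 3 topics, 10-word vocabulary.
rng = random.Random(0)
T, V = 3, 10
phi = [dirichlet(0.5, V, rng) for _ in range(T)]            # topic word distributions
phi_bg = dirichlet(0.5, V, rng)                              # background word distribution
psi = [dirichlet(0.5, len(BEHAVIORS), rng) for _ in range(T)]  # topic behavior distributions
tweets = generate_user_tweets(5, 6, phi, phi_bg, psi,
                              alpha=0.5, p_topical=0.7, rng=rng)
```

Word ids are plain integers here; a real corpus would map them to vocabulary entries.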


Page 10: 13 sdm-blda-slides

Data sets

• Base data set
– 151,055 Twitter users in Singapore and their tweets
• Our data set
– Randomly selected 5,000 users, among whom 1,000 are further selected to obtain their followees; 9,688 users in total
– Tweets from Sep 1, 2011 to Nov 30, 2011
– Total tweets: 11,882,441
• Preprocessing
– Remove stop words
– Remove words with non-standard characters (URLs, emoticons, etc.)
• Parameter settings (LDA, Twitter-LDA, B-LDA)
– # of topics: T = 80
– α = 50/T, β = 0.01
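A rough sketch of the preprocessing and parameter settings listed above. The stop-word list, the regular expressions, and the decision to keep hashtags and @-mentions as tokens are illustrative assumptions; the slides do not specify them.

```python
import re

# Tiny illustrative stop-word list (assumption; the actual list is not given).
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in"}

URL_RE = re.compile(r"https?://\S+")
# A "standard" token: letters, digits, underscore, apostrophe, optional # or @ prefix.
STANDARD_TOKEN_RE = re.compile(r"^[#@]?[a-z0-9_']+$")

def preprocess(tweet):
    """Lowercase, drop URLs, stop words, and tokens with non-standard characters."""
    text = URL_RE.sub(" ", tweet.lower())
    return [tok for tok in text.split()
            if STANDARD_TOKEN_RE.match(tok) and tok not in STOP_WORDS]

# Parameter settings from the slide.
T = 80          # number of topics
ALPHA = 50 / T  # 0.625
BETA = 0.01

tokens = preprocess("The election is today :) http://t.co/abc #politics")
```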

Page 11: 13 sdm-blda-slides

Topic Analysis

• Do the resulting topics in B-LDA have dominant behaviors?
• Entropy of each topic’s behavior distribution
– B-LDA: p(b|t) is learnt by the model
– LDA and T-LDA: p(b|t) is estimated from co-occurrence counts (the formula is not preserved in this transcript), where
– C(t,b): # of times topic t co-occurs with behavior b
– δ: normalization factor
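The entropy measure can be illustrated as follows. Because the slide's formula for LDA and T-LDA did not survive extraction, the count-based estimate below uses plain normalization of C(t,b), which is an assumption and may differ from the authors' use of δ. The counts are made up.

```python
import math

def behavior_distribution(counts):
    """Estimate p(b|t) from co-occurrence counts C(t,b) by simple normalization
    (assumed form; the slide's exact formula is not available)."""
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def entropy(dist):
    """Shannon entropy (in nats) of a behavior distribution p(b|t)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Hypothetical counts for two topics.
mixed = behavior_distribution({"POST": 25, "RETWEET": 25, "REPLY": 25, "MENTION": 25})
dominant = behavior_distribution({"POST": 2, "RETWEET": 94, "REPLY": 2, "MENTION": 2})

h_mixed = entropy(mixed)        # no dominant behavior: maximal entropy
h_dominant = entropy(dominant)  # one dominant behavior: low entropy
```

A topic whose tweets are almost all retweets gets a much lower entropy than a topic spread evenly over the four behaviors.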

Page 12: 13 sdm-blda-slides

Topic Analysis

• Do the resulting topics in B-LDA have dominant behaviors?
– Low entropy means the topic has dominant behaviors
– B-LDA: topics are enhanced by dominant behavior patterns

Page 13: 13 sdm-blda-slides

Topic Analysis

• Topics with distinct behavior patterns

[Figure: example topics for the behaviors POST, REPLY, and RETWEET. Annotation: topics that are similar but with different behaviors.]

Page 14: 13 sdm-blda-slides

Topic Analysis

• Topics in T-LDA would be split by different behavioral patterns in B-LDA
– 15 topic groups, each with two topics
– 1 topic group with three topics

[Figure: topics 1, 2, …, T of T-LDA linked to topics 1, 2, …, T of B-LDA, with KL-divergence as the distance; linked topics form a topic group.]
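One way to form such topic groups is sketched below. The slide only states that the distance is KL-divergence; the direction of the divergence, the nearest-topic matching rule, and the smoothing constant `eps` are assumptions made for illustration.

```python
import math
from collections import defaultdict

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two word distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def topic_groups(tlda_topics, blda_topics):
    """Assign each B-LDA topic to its nearest T-LDA topic (assumed rule) and
    return the T-LDA topics that attract more than one B-LDA topic."""
    groups = defaultdict(list)
    for j, q in enumerate(blda_topics):
        nearest = min(range(len(tlda_topics)),
                      key=lambda i: kl_divergence(q, tlda_topics[i]))
        groups[nearest].append(j)
    return {i: js for i, js in groups.items() if len(js) > 1}

# Toy word distributions over a 3-word vocabulary (hypothetical values).
tlda = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
blda = [[0.7, 0.2, 0.1], [0.75, 0.1, 0.15], [0.1, 0.2, 0.7]]
groups = topic_groups(tlda, blda)
```

In this toy case, B-LDA topics 0 and 1 both map to T-LDA topic 0 and so form one topic group with two topics.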

Page 15: 13 sdm-blda-slides

Topic Analysis

• Topics split by different behavioral patterns

Topics in B-LDA have more distinct behavioral patterns than those in T-LDA.

Topic 16 is mainly contributed by news media accounts, but topic 13 is not. Topic 61 is a retweet topic and contains more words with hashtags.

Page 16: 13 sdm-blda-slides

Applications – followee recommendation

• Followee recommendation
– User profile: user’s or user’s followees’ textual content
– Does not consider behavior patterns
• Behavior matters
– People who use Twitter as an instant messenger: follow users who they may interact with
– People who use Twitter as an information source and news feed: follow official news media channels
– Twitter: news media or social network? [Kwak et al., WWW’10]
• Definition: users who care about the behavioral patterns of their followees, explicitly or implicitly, are “behavior-driven followers”.

Page 17: 13 sdm-blda-slides

Applications – followee recommendation

• Finding behavior-driven followers
– A behavior-driven follower’s followees will naturally form a small number of clusters, within each of which the followees share similar behavioral patterns.
– k-nearest-neighbor distance (formula not preserved in this transcript)
• S: a given space, U: a set of users, and a symbol (lost in extraction) denoting user v’s k-nearest neighbors
– Behavior-driven index βK (formula not preserved in this transcript)
• ST: the topic space, SB: the joint behavior-topic space, Fu: followees of u
• A behavior-driven follower has a large βK
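Since neither formula survived in the transcript, the following is only one plausible reading, labeled as an assumption throughout: the k-nearest-neighbor distance is taken as the mean distance from each user to its k nearest neighbors in a space, and the behavior-driven index as the ratio of that distance over a user's followees in the topic space to the same quantity in the joint behavior-topic space. Euclidean distance and all function names are hypothetical choices.

```python
import math

def knn_distance(points, k):
    """Mean distance from each point to its k nearest neighbors
    (assumed form of the k-nearest-neighbor distance)."""
    total = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += sum(dists[:k]) / k
    return total / len(points)

def behavior_driven_index(followees_topic, followees_joint, k=1):
    """Assumed index: ratio of kNN distance in the topic space to kNN distance
    in the joint behavior-topic space. Tight clusters in the joint space give
    a small denominator and hence a large index."""
    return knn_distance(followees_topic, k) / knn_distance(followees_joint, k)

# Hypothetical followee representations of one user in the two spaces.
topic_space = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # spread out
joint_space = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]  # two tight clusters
beta_1 = behavior_driven_index(topic_space, joint_space, k=1)
```

Under this reading, followees that cluster tightly only once behavior is taken into account yield a large index, consistent with the statement that a behavior-driven follower has a large βK.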

Page 18: 13 sdm-blda-slides

Applications – followee recommendation

• Definition
– βK ≥ τ: behavior-driven follower
– βK < τ: topic-driven follower
• Behavior-driven index
– K = 1; topic space: LDA; joint behavior-topic space: B-LDA
– Half of the users are to some extent behavior-driven

Page 19: 13 sdm-blda-slides

Applications – followee recommendation

• Followee recommendation approach [Chen et al., WWW’09]

– For a target user u, we randomly pick one followee from u’s current followee set and combine her with m randomly selected non-followees.
– For these m + 1 users, any recommendation algorithm generates a ranking in descending order.
– The performance is measured by how high the real followee is ranked.
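The evaluation protocol above can be sketched as follows. The `score` callable stands in for any recommendation algorithm and is hypothetical, as are the user ids.

```python
import random

def rank_of_real_followee(user, followees, non_followees, score, m, rng):
    """Hide one real followee among m random non-followees and return the
    1-based rank the scoring function gives to the real followee."""
    real = rng.choice(sorted(followees))
    candidates = [real] + rng.sample(sorted(non_followees), m)
    # Rank the m + 1 candidates in descending order of score.
    ranked = sorted(candidates, key=lambda c: score(user, c), reverse=True)
    return ranked.index(real) + 1

# Toy usage with hypothetical users and two extreme scoring functions.
rng = random.Random(0)
followees = {"a", "b"}
non_followees = {"c", "d", "e", "f"}
perfect = lambda u, c: 1.0 if c in followees else 0.0
worst = lambda u, c: -1.0 if c in followees else 0.0
best_rank = rank_of_real_followee("u", followees, non_followees, perfect, 3, rng)
worst_rank = rank_of_real_followee("u", followees, non_followees, worst, 3, rng)
```

A rank of 1 means the real followee was placed at the top of the m + 1 candidates; m + 1 is the worst possible rank.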


Page 21: 13 sdm-blda-slides

Applications – followee recommendation

• Evaluation
– Rank of the real followee
– Mean reciprocal rank
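Both evaluation measures are simple to compute from a list of ranks of the real followee, one rank per trial. The rank lists below are invented for illustration only.

```python
def mean_rank(ranks):
    """Average rank of the real followee (lower is better)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Mean reciprocal rank (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hypothetical ranks from two methods over three trials.
ranks_a = [1, 1, 10]   # very good on some users, poor on another
ranks_b = [2, 2, 2]    # consistently decent

avg_a, mrr_a = mean_rank(ranks_a), mean_reciprocal_rank(ranks_a)
avg_b, mrr_b = mean_rank(ranks_b), mean_reciprocal_rank(ranks_b)
```

The two measures can disagree: here method A has the better MRR (0.7 vs. 0.5) but the worse average rank (4 vs. 2), because MRR rewards top-ranked hits heavily while the average rank is dominated by the poor trial.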

Page 22: 13 sdm-blda-slides

Applications – followee recommendation

• Evaluation
– A smaller neighbourhood size K gives better results
– BLDA and TLDA rank real followees higher than LDA, with a smaller deviation
– Adding behaviours to topic modelling helps the task: BLDA > TLDA
– LDA: better MRR but worse average rank; LDA is not robust and performs particularly well or particularly poorly on some sets of users

Page 23: 13 sdm-blda-slides

Applications – followee recommendation

• Study on behavior-driven index
– Correlation between DKNN and rank of the real followee
– Correlation between βK and relative rank rLDA/rBLDA
– β1 will be used to judge whether a given user is a behavior-driven or topic-driven follower

Page 24: 13 sdm-blda-slides

Applications – followee recommendation

• Topic-driven follower vs. behavior-driven follower
• Results on behavior-driven followers

BLDA performs significantly better than LDA for behavior-driven followers.

Page 25: 13 sdm-blda-slides

Applications – followee recommendation

• A combined followee recommendation method (comModel)
– Using the behavior-driven index to choose the model
• Model selection
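The model selection step can be sketched as a threshold rule on the behavior-driven index. This assumes, in line with the earlier definition (βK ≥ τ means behavior-driven), that B-LDA is used for behavior-driven followers and LDA for topic-driven followers. The threshold value and the scoring callables are hypothetical.

```python
def choose_model(beta_k, tau):
    """Pick the model for a user from the behavior-driven index (assumed rule)."""
    return "B-LDA" if beta_k >= tau else "LDA"

def com_model_score(user, candidate, beta_k, tau, score_blda, score_lda):
    """Score a candidate followee with whichever model the index selects."""
    scorer = score_blda if choose_model(beta_k, tau) == "B-LDA" else score_lda
    return scorer(user, candidate)

# Toy usage: tau = 1.0 is an illustrative threshold, not a value from the slides.
TAU = 1.0
blda_scorer = lambda u, c: 0.9   # stand-in for a B-LDA-based similarity
lda_scorer = lambda u, c: 0.4    # stand-in for an LDA-based similarity
s_behavior = com_model_score("u1", "v", 1.5, TAU, blda_scorer, lda_scorer)
s_topic = com_model_score("u2", "v", 0.5, TAU, blda_scorer, lda_scorer)
```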

Page 26: 13 sdm-blda-slides

Applications – followee recommendation

• Comparisons of comModel, B-LDA and LDA
– Rank of the real followee and MRR
– Cumulative distribution of ranks (CDR) for real followees

Page 27: 13 sdm-blda-slides

Summary

• We propose B-LDA, a behaviour-integrated topic model based on LDA
• Comparison of B-LDA with LDA and Twitter-LDA
– Experimental results show that B-LDA can find topics with dominant behaviours
– We propose an index βK to characterize users who are behaviour-driven followers, and demonstrate that B-LDA significantly outperforms other models on followee recommendation for behaviour-driven followers
– Based on the βK index, we propose a new recommendation framework combining B-LDA and LDA, which gives promising recommendations

Page 28: 13 sdm-blda-slides

• Thanks!


Page 29: 13 sdm-blda-slides

References
• [Zhao et al., ECIR’10] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li, “Comparing Twitter and traditional media using topic models,” in ECIR, 2011, pp. 338–349.
• [Kwak et al., WWW’10] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in WWW, 2010, pp. 591–600.
• [Chen et al., WWW’09] W.-Y. Chen, J.-C. Chu, J. Luan, H. Bai, Y. Wang, and E. Y. Chang, “Collaborative filtering for Orkut communities: discovery of user latent behavior,” in WWW, 2009, pp. 681–690.