It Is Not Just What We Say, But How We Say Them: Joint Behaviour-Topic Modelling Minghui QIU, Feida ZHU and Jing JIANG Singapore Management University
Dec 15, 2014
Microblogs
• Rich user interactions with textual information (posting behaviors)
– Behaviors: POST, RETWEET, REPLY, MENTION
Why do we need to consider user behaviors?
Observation 1: users with similar topics of interest can have different behavioral patterns
• Users who are interested in the ‘politics’ topic
The different behaviors people exhibit on Twitter suggest different motivations for using the platform.
• Top 5 users who frequently post tweets about the topic ‘politics’
Observation 2: user clusters with distinct behavioral patterns usually represent different user profiles
Official news media accounts
It Is Not Just What We Say, But How We Say Them
• The way people interact with text is critical in understanding user behavior patterns and modeling user interest in social networks
• Goal: jointly model a user’s topic interests and the user’s interactions with those topics in microblogs such as Twitter
Outline
• Topic Modeling in Twitter
• Joint behavior-topic model
• Applications and Empirical Results
– Topic analysis
– User clustering
– Followee Recommendation
• Summary
Topic Modeling in Twitter
• Twitter
– 140 character limit
– Noisy tweets
• Comparison between LDA and Twitter-LDA [Zhao et al., ECIR’10]
– Document (LDA and T-LDA): all tweets of a given Twitter user
– Words (LDA and T-LDA): words in the user’s tweets
– Topic assignment: in LDA each word has a topic; in T-LDA (Twitter-LDA) each tweet has a topic
– Word pools: LDA uses topical words; T-LDA uses topical words or background words
Goal: extend T-LDA to jointly model a user’s topic interests and interactions.
LDA-based Behaviour-Topic Modelling – B-LDA
• U: # of users
• N: # of tweets
• L: # of words
• T: # of topics
• z: a topic label
• y: a switch
• w: a word
• b: a behavior
• θ: user’s topic distribution
• 𝜙: topic’s word distribution
• 𝜙′: background word distribution
• 𝜓: topic’s behavior distribution
[Figure: plate diagram of B-LDA over plates U, N, L and T, with additional parameters 𝜑, α, β, β′, γ and η]
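The notation above can be read as a generative story. The following is only a minimal sketch for one user, assuming that each tweet draws one topic z from θ, each word uses the switch y to choose between the topic’s word distribution and the background word distribution, and the tweet’s behavior b is drawn from the topic’s behavior distribution. The pairing of hyperparameters with distributions, the fixed switch probability `p_topic_word`, and all function names are assumptions made for illustration, not details taken from the slides.

```python
import random

BEHAVIORS = ["POST", "RETWEET", "REPLY", "MENTION"]


def dirichlet(rng, concentration, size):
    """Sample a symmetric Dirichlet vector via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for tiny concentrations
        return [1.0 / size] * size
    return [d / total for d in draws]


def categorical(rng, probs):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_tweets(num_topics, vocab_size, num_tweets, words_per_tweet,
                    seed=0, alpha=0.5, beta=0.01, beta_bg=0.01, eta=0.5,
                    p_topic_word=0.7):
    """Generate tweets for a single user (all defaults are illustrative)."""
    rng = random.Random(seed)
    theta = dirichlet(rng, alpha, num_topics)                      # user's topic distribution
    phi = [dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]  # topic word distributions
    phi_bg = dirichlet(rng, beta_bg, vocab_size)                   # background word distribution
    psi = [dirichlet(rng, eta, len(BEHAVIORS)) for _ in range(num_topics)]  # topic behavior distributions
    tweets = []
    for _ in range(num_tweets):
        z = categorical(rng, theta)                                # one topic per tweet
        words = []
        for _ in range(words_per_tweet):
            y = rng.random() < p_topic_word                        # switch: topical or background word
            words.append(categorical(rng, phi[z] if y else phi_bg))
        b = BEHAVIORS[categorical(rng, psi[z])]                    # behavior depends on the topic
        tweets.append({"topic": z, "behavior": b, "words": words})
    return tweets
```

In a full model the switch probability would itself be a learned parameter with a prior; it is fixed here only to keep the sketch short.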
Data sets
• Base data set
– 151,055 Twitter users in Singapore and their tweets
• Our data set
– Randomly selected 5,000 users, of whom 1,000 are further selected to obtain their followees: 9,688 users in total
– Tweets from Sep 1, 2011 to Nov 30, 2011
– Total: 11,882,441 tweets
• Preprocessing
– Remove stop words
– Remove words with non-standard characters (URLs, emoticons, etc.)
• Parameter settings (LDA, Twitter-LDA, B-LDA)
– # of topics: T = 80
– α = 50/T, β = 0.01
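As a rough illustration of the preprocessing and parameter settings, here is a minimal sketch. The stop-word list, the token pattern that decides what counts as a "standard" word, and the function name `preprocess` are all assumptions; the slides do not specify them.

```python
import re

NUM_TOPICS = 80               # T, as stated in the slides
ALPHA = 50 / NUM_TOPICS       # alpha = 50/T = 0.625
BETA = 0.01

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in", "on", "for", "at"}

# Assumption: a "standard" token is letters, digits, underscore or apostrophe,
# optionally prefixed by '#' (hashtag) or '@' (mention).
TOKEN_OK = re.compile(r"^[#@]?[a-z0-9_']+$")


def preprocess(tweet):
    """Lowercase, split on whitespace, drop stop words and non-standard tokens."""
    tokens = tweet.lower().split()
    return [t for t in tokens if TOKEN_OK.match(t) and t not in STOP_WORDS]


print(preprocess("Reading the news at http://t.co/abc :) #politics"))
# ['reading', 'news', '#politics']
```

Note that this sketch also drops tokens with attached punctuation (for example "news,"); a real pipeline would likely strip punctuation first.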
Topic Analysis
• Do the resulting topics in B-LDA have dominant behaviors?
• Entropy on topic’s behavior distribution
– B-LDA: p(b|t) is learnt by the model
– LDA and T-LDA: p(b|t) is estimated from co-occurrence counts
– C(t,b): # of times topic t co-occurs with behavior b
– δ: normalization factor
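The entropy comparison can be sketched as follows. This assumes the simplest reading of the definitions above, namely p(b|t) = C(t,b)/δ with δ equal to the sum of the counts for topic t; the slides’ exact formula did not survive, so any smoothing they may have used is not reproduced. Function names are illustrative.

```python
import math


def behavior_distribution(counts):
    """counts maps behavior -> C(t, b) for one topic t; returns p(b|t)."""
    delta = sum(counts.values())          # assumed normalization factor
    return {b: c / delta for b, c in counts.items()}


def entropy(dist):
    """Shannon entropy (natural log) of a behavior distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


uniform = behavior_distribution({"POST": 5, "RETWEET": 5, "REPLY": 5, "MENTION": 5})
peaked = behavior_distribution({"POST": 1, "RETWEET": 97, "REPLY": 1, "MENTION": 1})
print(entropy(uniform) > entropy(peaked))  # True: the peaked topic has a dominant behavior
```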
• Low entropy means the topic has dominant behaviors
• B-LDA: topics are enhanced by dominant behavior patterns
• Topics with distinct behavior patterns
[Figure: example topics dominated by POST, REPLY and RETWEET; callout: topics that are similar but with different behaviors]
• Topics in T-LDA would be split by different behavioral patterns in B-LDA
– 15 topic groups each with two topics
– 1 topic group with three topics
[Figure: topics 1…T of T-LDA linked to topics 1…T of B-LDA; distance: KL-divergence; linked topics form a topic group]
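One possible way to form topic groups with KL-divergence is sketched below. The direction of the divergence, the small `eps` smoothing, the `threshold` value and the grouping rule are all assumptions; the slides only state that the distance is KL-divergence.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two word distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def topic_groups(tlda_topics, blda_topics, threshold):
    """Map each T-LDA topic index to the B-LDA topics within `threshold` of it."""
    groups = {}
    for i, p in enumerate(tlda_topics):
        close = [j for j, q in enumerate(blda_topics)
                 if kl_divergence(p, q) <= threshold]
        if close:
            groups[i] = close
    return groups


# Toy word distributions over a 3-word vocabulary (made up for illustration).
tlda = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
blda = [[0.68, 0.22, 0.10], [0.72, 0.18, 0.10], [0.1, 0.2, 0.7]]
print(topic_groups(tlda, blda, threshold=0.05))  # {0: [0, 1], 1: [2]}
```

Here T-LDA topic 0 is matched by two B-LDA topics, which corresponds to a topic group with two topics.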
• Topics split by different behavioral patterns
– Topics in B-LDA have more distinct behavioral patterns than those in T-LDA
– Topic 16 is mainly contributed by news media accounts, but topic 13 is not.
– Topic 61 is a retweet topic and contains more words with hashtags.
Applications – followee recommendation
• Followee recommendation (existing approaches)
– User profile: user’s or user’s followees’ textual content
– Does not consider behavior patterns
• Behavior matters
– People who use Twitter as an instant messenger follow users they may interact with
– People who use Twitter as an information source and news feed follow official news media channels
– Twitter: news media or social network? [Kwak et al., WWW’10]
• Definition: users who care about the behavioral patterns of their followees, explicitly or implicitly, are “behavior-driven followers”.
• Finding behavior-driven followers
– A behavior-driven follower’s followees will naturally form a small number of clusters, within each of which the followees share similar behavioral patterns.
– k-nearest-neighbor distance
• Defined over a given space S and a set of users U, using each user v’s k-nearest-neighbors
– Behavior-driven index
• ST: the topic space, SB: the joint behavior-topic space, Fu: followees of u
• A behavior-driven follower has a large βK
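The formulas for the k-nearest-neighbor distance and the behavior-driven index are not preserved in these slides, so the following is only a hypothetical sketch. It assumes the k-nearest-neighbor distance is the average, over all users in the set, of the mean Euclidean distance to their k nearest neighbors, and that the index is the ratio of that distance in the topic space to the distance in the joint behavior-topic space. That ratio is consistent with "a behavior-driven follower has a large βK", but it is an assumption, as are the function names.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distance(vectors, k):
    """Average over users of the mean distance to their k nearest neighbors."""
    if len(vectors) <= k:
        raise ValueError("need more than k vectors")
    total = 0.0
    for i, v in enumerate(vectors):
        dists = sorted(euclidean(v, w) for j, w in enumerate(vectors) if j != i)
        total += sum(dists[:k]) / k
    return total / len(vectors)


def behavior_driven_index(topic_vectors, behavior_topic_vectors, k=1):
    """Assumed form: distance in topic space divided by distance in joint space."""
    d_topic = knn_distance(topic_vectors, k)
    d_joint = knn_distance(behavior_topic_vectors, k)
    return float("inf") if d_joint == 0 else d_topic / d_joint


# Followees of one user: spread out in topic space, tightly clustered in joint space.
followees_topic = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
followees_joint = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 1.1]]
print(behavior_driven_index(followees_topic, followees_joint, k=1) > 1)  # True
```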
• Definition
– βK ≥ τ: behavior-driven follower
– βK < τ: topic-driven follower
• Behavior-driven index
– K = 1, topic space: LDA, joint behavior-topic space: B-LDA
– Half of the users are to some extent behavior-driven
• Followee recommendation approach [Chen et al., WWW’09]
– For a target user u, we randomly pick one followee from u’s current followee set and combine that followee with m randomly selected non-followees.
– Any recommendation algorithm then ranks these m + 1 users in descending order of its score.
– The performance is measured by examining how high the real followee is ranked.
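The evaluation protocol above can be sketched as follows. The function name, the `score` callback (standing in for whichever recommendation model is being tested) and the toy user ids are hypothetical.

```python
import random


def rank_real_followee(target, followees, non_followees, score, m, rng):
    """Return the 1-based rank of one held-out real followee among m + 1 candidates."""
    real = rng.choice(sorted(followees))                 # one real followee of the target
    negatives = rng.sample(sorted(non_followees), m)     # m random non-followees
    candidates = [real] + negatives
    rng.shuffle(candidates)                              # avoid positional bias on ties
    ranked = sorted(candidates, key=lambda c: score(target, c), reverse=True)
    return ranked.index(real) + 1


followees = {"a", "b"}
non_followees = {"w", "x", "y", "z"}


def perfect_score(user, candidate):
    # Toy scorer that always prefers real followees.
    return 1.0 if candidate in followees else 0.0


print(rank_real_followee("u", followees, non_followees, perfect_score, 3, random.Random(0)))  # 1
```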
• Evaluation
– Rank of the real followee
– Mean reciprocal rank
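Both metrics are simple aggregates over the ranks of the real followees. A minimal sketch with made-up rank lists (the function names are illustrative):

```python
def average_rank(ranks):
    """Mean 1-based rank of the real followee; lower is better."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank; higher is better."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Made-up ranks for two hypothetical models over three target users.
model_a = [1, 1, 10]   # excellent on two users, poor on one
model_b = [2, 2, 2]    # consistently good

print(average_rank(model_a), mean_reciprocal_rank(model_a))
print(average_rank(model_b), mean_reciprocal_rank(model_b))
```

The toy numbers show that the two metrics can disagree: `model_a` has the higher MRR but the worse average rank, because MRR rewards a few very good ranks while average rank penalizes a few very bad ones.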
• Evaluation results
– A smaller neighbourhood size K gives better results
– B-LDA and T-LDA rank real followees higher than LDA, with a smaller deviation
– Adding behaviours to topic modelling helps the task: B-LDA > T-LDA
– LDA: better MRR but worse average rank; LDA is not robust and performs particularly well on some users and particularly poorly on others
• Study on behavior-driven index
– Correlation between DKNN and rank of the real followee
– Correlation between βK and relative rank rLDA/rBLDA
– β1 will be used to judge whether a given user is a behavior-driven or a topic-driven follower
• Topic-driven follower vs. behavior-driven follower
• Results on behavior-driven followers
– B-LDA performs significantly better than LDA on behavior-driven followers.
• A combined followee recommendation method (comModel)
– Uses the behavior-driven index to choose the model
• Model selection
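The model selection step can be sketched as a threshold rule on the behavior-driven index: B-LDA for behavior-driven followers (βK ≥ τ) and LDA otherwise. The function names, the scoring callbacks and the value of τ used below are hypothetical; the slides do not give a value for τ.

```python
def choose_model(beta_k, tau):
    """Pick the model name based on the behavior-driven index."""
    return "B-LDA" if beta_k >= tau else "LDA"


def com_model_ranking(candidates, beta_k, tau, score_blda, score_lda):
    """Rank candidates with the scorer of whichever model is selected."""
    score = score_blda if choose_model(beta_k, tau) == "B-LDA" else score_lda
    return sorted(candidates, key=score, reverse=True)


# Toy per-candidate scores from the two models (made up for illustration).
blda_scores = {"a": 0.1, "b": 0.9, "c": 0.5}
lda_scores = {"a": 0.8, "b": 0.2, "c": 0.5}

print(com_model_ranking(["a", "b", "c"], 1.5, 1.0, blda_scores.get, lda_scores.get))
# ['b', 'c', 'a']  (behavior-driven user: B-LDA scores are used)
```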
• Comparisons of comModel, B-LDA and LDA
– Rank of the real followee and MRR
– Cumulative distribution of ranks (CDR) for real followees
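A cumulative distribution of ranks can be computed as the fraction of target users whose real followee is ranked at or above each cutoff. This sketch assumes that reading of CDR; the function name and the sample ranks are illustrative.

```python
def cumulative_distribution_of_ranks(ranks, max_rank):
    """For each cutoff 1..max_rank, the fraction of ranks that are <= cutoff."""
    n = len(ranks)
    return [sum(1 for r in ranks if r <= cutoff) / n
            for cutoff in range(1, max_rank + 1)]


print(cumulative_distribution_of_ranks([1, 2, 2, 4], max_rank=4))
# [0.25, 0.75, 0.75, 1.0]
```

A curve that rises faster means more real followees are placed near the top of the ranking.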
Summary
• We propose B-LDA, a behaviour-integrated topic model based on LDA
• Comparison of B-LDA with LDA and Twitter-LDA
– Experimental results show B-LDA can find topics with dominant behaviours
– We propose an index βK to characterize users who are behaviour-driven followers, and demonstrate that B-LDA significantly outperforms other models on followee recommendation for behaviour-driven followers.
– Based on the βK index, we propose a new recommendation framework combining B-LDA and LDA, which gives promising recommendations.
• Thanks!
References
• [Zhao et al., ECIR’10] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li, “Comparing Twitter and traditional media using topic models,” in ECIR, 2011, pp. 338–349.
• [Kwak et al., WWW’10] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in WWW, 2010, pp. 591–600.
• [Chen et al., WWW’09] W.-Y. Chen, J.-C. Chu, J. Luan, H. Bai, Y. Wang, and E. Y. Chang, “Collaborative filtering for Orkut communities: discovery of user latent behavior,” in WWW, 2009, pp. 681–690.