Collaborative Personalized Twitter Search with Topic-Language Models
Post on 02-Jul-2015
Collaborative Personalized
Twitter Search with Topic-Language Models
Jan Vosecky
Kenneth Wai-Ting Leung
Wilfred Ng
Supported by SIGIR Travel Grant
Microblogs
Tweet 1
Tweet 2
User-generated content
– Short length
– Informal language, free-form
– Diverse topics
Very high volume
Information overload
Searching on Twitter
“When you've got 5 minutes to fill,
Twitter is a great way to fill 35 minutes”
@mattcutts
Searching for “ipad” on Twitter
Around 50 tweets mentioning “iPad” posted within 1 minute
Personalizing Twitter Search
Microblog data
• Compared with traditional domains
(e.g. web search, news search):
– Explicitly stated user interests
• tweets, conversations, re-tweets
– Social network structure
• following
Personalization challenge
• Individual user’s data
– Diverse
– Sparse
• Short messages
• Few messages
• Few social connections
• Little search history
• User’s social connections
– Diverse friends, topics
– Need to carefully organize friends’ information for it to be useful
Putting all kinds of information into a single user model: inaccurate, noisy
Topics
Contributions
Novel User Model structure
Collaborative User Model
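Language modeling IR
Query likelihood model
– Given a query Q and a document D, rank D by $P(Q \mid D) = \prod_{w \in Q} P(w \mid D)$,
where $P(w \mid D)$ is the smoothed language model of document D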
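The slide's formula did not survive extraction. As a minimal sketch of the query likelihood idea, the Python below scores documents by log P(Q|D) with Jelinek-Mercer smoothing, the variant named later as a baseline. The function name, the tiny corpus, and the value `lam=0.5` are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter


def jm_query_log_likelihood(query, doc, collection, lam=0.5):
    """Return log P(Q|D), mixing document and collection statistics.

    lam is the Jelinek-Mercer interpolation weight (assumed value).
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    doc_len = len(doc)
    coll_len = len(collection)
    score = 0.0
    for w in query:
        p_doc = doc_counts[w] / doc_len if doc_len else 0.0
        p_coll = coll_counts[w] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # Word unseen in both document and collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy corpus (hypothetical tokenized tweets).
docs = [
    "top 10 restaurants in australia".split(),
    "ipads and macs hacked in australia".split(),
    "manchester playing tonight".split(),
]
collection = [w for d in docs for w in d]
query = ["australia"]
scores = [jm_query_log_likelihood(query, d, collection) for d in docs]
```

Both tweets that mention the query term score above the one that does not, and the unrelated tweet still gets a finite score because the collection model backs it off.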
Topic Models
A latent topic in LDA:
“Information Technology”
Google 0.00040
Android 0.00020
Microsoft 0.00010
App 0.00010
Security 0.00009
Email 0.00008
Login 0.00005
Virus 0.00004
Scope of our approach
• Input to our algorithm:
– Set of n documents returned by Twitter given query Q
• Our task:
– Rank the documents according to:
• Query
• User model
Proposed Framework
At a Glance: Proposed User Model
Individual User Model
ID  Tweet                                    Time   Topic
1   Manchester playing tonight               1.1.   Sport
2   Doing some android coding                2.1.   IT
3   Great game, great win for manchester!    5.1.   Sport
4   Had a great apple cake with chocolate    6.1.   Food
5   My java code keeps throwing exceptions   10.1.  IT
IT: W = 2/5 = 40%
– Android: 6, Coding: 2, Java: 2
Sport: W = 2/5 = 40%
– Manchester: 5, Play: 4, Win: 2
Food: W = 1/5 = 20%
– Cake: 6, Apple: 5, Oven: 2
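The topic weights W in this example are simply the share of the user's tweets assigned to each topic. The sketch below reproduces them from the five example tweets; the per-topic word counts on the slide cannot be derived from these five tweets alone, so they are left out. The function name is hypothetical.

```python
from collections import Counter

# (ID, tweet text, topic label) as listed on the slide.
tweets = [
    (1, "Manchester playing tonight", "Sport"),
    (2, "Doing some android coding", "IT"),
    (3, "Great game, great win for manchester!", "Sport"),
    (4, "Had a great apple cake with chocolate", "Food"),
    (5, "My java code keeps throwing exceptions", "IT"),
]


def topic_weights(labelled_tweets):
    """Fraction of the user's tweets that fall into each topic."""
    counts = Counter(topic for _, _, topic in labelled_tweets)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


weights = topic_weights(tweets)
```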
Individual User Model (IM)
Is u interested in word w from topic k?
Is u interested in topic k?
Is word w related to topic k?
Prior prob. of topic k
Recent interest is more important:
From user From topic model
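The formula behind "recent interest is more important" is not preserved in this transcript. One common way to express the idea, shown here purely as an assumption, is to weight each tweet by an exponential decay of its age before normalizing per topic. The decay rate, the reading of the example times as day numbers, and the function name are all hypothetical.

```python
import math

# (day of posting, topic) for the five example tweets; the times
# 1.1., 2.1., 5.1., 6.1., 10.1. are read as days 1, 2, 5, 6, 10
# (an assumption about the date format).
dated_topics = [(1, "Sport"), (2, "IT"), (5, "Sport"), (6, "Food"), (10, "IT")]


def decayed_topic_weights(dated, now, decay=0.1):
    """Topic weights where each tweet counts exp(-decay * age_in_days)."""
    raw = {}
    for day, topic in dated:
        raw[topic] = raw.get(topic, 0.0) + math.exp(-decay * (now - day))
    total = sum(raw.values())
    return {topic: value / total for topic, value in raw.items()}


recent = decayed_topic_weights(dated_topics, now=10)
```

With plain counting, IT and Sport tie at 40% each; with decay, IT ranks higher because its latest tweet is the most recent one.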
Personalization using IM
Is the Query relevant to topic k?
Is Q related to topic k in general?
Is the User interested in topic k?
Is Q related to the words in topic k that User is interested in?
Is the Document relevant to topic k?
Is D related to topic k in general?
Is the User interested in topic k?
Is D related to the words in topic k that User is interested in?
Prior Document probability
Personalization using IM
Q = australia
User: “I’m interested in IT and travel. I’ve never tweeted about Australia.”
Topics: …, TV, Music, IT, Travel, Politics, Business, …
Values shown: 0.1, 0.3
Tweets (D):
– Top 10 restaurants in Australia
– iPhones, iPads, and Macs Hacked and Hijacked for Ransom in Australia - Gotta Be Mobile
Personalization using IM
Q = australia
User: “I’m interested in IT and travel. I have tweeted about IT in Australia.”
Topics: …, TV, Music, IT, Travel, Politics, Business, …
Values shown: 0.6, 0.3
Tweets (D):
– Top 10 restaurants in Australia
– iPhones, iPads, and Macs Hacked and Hijacked for Ransom in Australia - Gotta Be Mobile
Collaborative User Model
Friend 1
– Sport: Manchester: 5, Play: 4, Win: 2
– Food: Cake: 6, Apple: 5, Oven: 2
Friend 2
– Sport: Manchester: 5, Play: 4, Win: 2
Friend 3
– IT: Android: 6, Coding: 2, Java: 2
– Music: Radiohead: 4, Listen: 2, Song: 5
Collaborative Model
– Sport: Manchester: 5, Play: 4, Win: 2
– IT: Android: 6, Coding: 2, Java: 2
– Music: Radiohead: 4, Listen: 2, Song: 5
– Food: Cake: 6, Apple: 5, Oven: 2
Collaborative User Model
• Weighted sum of the IMs of the top-n friends
– based on the amount of interactions (re-tweets, mentions, conversations)
• Weight of each friend f:
– wP(f): Popularity of f
– wA(u,f): Affinity of u and f
• Weight of each f’s topic k:
– wB(u,k): Topic bias
– wI(u,f,k): Topic-interaction between u and f
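The weighted sum described above can be sketched as follows. The slides name the four weights but this transcript does not show how they are combined, so the sketch assumes one precomputed weight per friend (standing in for wP and wA) and one optional weight per friend-topic pair (standing in for wB and wI), multiplied together. All numeric weights and the function name are hypothetical.

```python
def collaborative_model(friend_models, friend_weight, topic_weight):
    """Merge friends' individual models into one topic -> word -> score map.

    friend_models: {friend: {topic: {word: count}}}
    friend_weight: {friend: weight}            (assumed: combines wP and wA)
    topic_weight:  {(friend, topic): weight}   (assumed: combines wB and wI)
    """
    cm = {}
    for friend, im in friend_models.items():
        for topic, words in im.items():
            w = friend_weight[friend] * topic_weight.get((friend, topic), 1.0)
            bucket = cm.setdefault(topic, {})
            for word, count in words.items():
                bucket[word] = bucket.get(word, 0.0) + w * count
    return cm


sport = {"Manchester": 5, "Play": 4, "Win": 2}
friend_models = {
    "Friend 1": {"Sport": sport, "Food": {"Cake": 6, "Apple": 5, "Oven": 2}},
    "Friend 2": {"Sport": sport},
    "Friend 3": {
        "IT": {"Android": 6, "Coding": 2, "Java": 2},
        "Music": {"Radiohead": 4, "Listen": 2, "Song": 5},
    },
}
friend_weight = {"Friend 1": 0.5, "Friend 2": 0.3, "Friend 3": 0.2}
cm = collaborative_model(friend_models, friend_weight, topic_weight={})
```

Topics shared by several friends, such as Sport here, accumulate evidence from each of them, while topics held by a single low-weight friend contribute less.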
Personalization using IM and CM
From user From topic model From friends
Dirichlet smoothing
Depends on the amount of user’s tweets
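Dirichlet smoothing makes the balance between a user's own evidence and a background model depend on how much the user has tweeted. A minimal sketch, assuming the background is a word distribution for one topic (for example from the collaborative or topic model) and that `mu` is a tunable pseudo-count; neither the value of `mu` nor the function name comes from the slides.

```python
def dirichlet_smoothed(word, user_counts, background, mu=10.0):
    """P(word) = (c_user(word) + mu * P_bg(word)) / (n_user + mu)."""
    n = sum(user_counts.values())
    return (user_counts.get(word, 0) + mu * background.get(word, 0.0)) / (n + mu)


# Hypothetical background distribution for an "IT" topic.
background = {"android": 0.5, "java": 0.3, "coding": 0.2}

sparse_user = {"android": 1}    # very little data: background dominates
active_user = {"android": 100}  # plenty of data: user's own counts dominate

p_sparse = dirichlet_smoothed("android", sparse_user, background)
p_active = dirichlet_smoothed("android", active_user, background)
p_unseen = dirichlet_smoothed("java", sparse_user, background)
```

A word the user has never used still receives non-zero probability from the background, which matters for users with few tweets.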
Search User Model (SM)
• Feedback sources: Queries + clicks
• What does a ‘click’ mean?
– URL click, re-tweet, favorite
• Feedback from a ‘click’:
– Query-topic: preference for topic k when issuing Q
– Topic-word: preference for words in topic k
– Topic: user’s search bias towards topic k
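The three kinds of feedback listed above can be kept as three sets of counters that are updated on every click. The sketch below is one possible bookkeeping scheme under the assumption that each clicked tweet comes with a topic distribution from the topic model; the class, its method, and the example topic probabilities are hypothetical.

```python
from collections import defaultdict


class SearchUserModel:
    """Accumulates click feedback at three levels."""

    def __init__(self):
        self.query_topic = defaultdict(float)  # (query, topic) -> weight
        self.topic_word = defaultdict(float)   # (topic, word)  -> weight
        self.topic = defaultdict(float)        # topic          -> weight

    def record_click(self, query, tweet_words, tweet_topics):
        """Credit each topic of the clicked tweet in proportion to p."""
        for k, p in tweet_topics.items():
            self.query_topic[(query, k)] += p
            self.topic[k] += p
            for w in tweet_words:
                self.topic_word[(k, w)] += p


sm = SearchUserModel()
sm.record_click(
    query="australia",
    tweet_words=["ipads", "hacked", "australia"],
    tweet_topics={"IT": 0.8, "Travel": 0.2},  # assumed topic-model output
)
```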
Evaluation
Query log collection
• Evaluation interface
– Submit query, returns tweets from Twitter API
– Rate relevant tweets
Datasets
• Controlled user study (Log_CoS)
– 11 users
• In-the-wild user study (Log_IwS)
– 24 users
Log_CoS Log_IwS
Ranking Results
Baselines:
Query likelihood (J-M smoothing)
Topic model-based IR
Personalized search (User-specific language models)
Collaborative search (Cluster-specific language models)
Collaborative Personalized search
Average per-user ranking performance after processing i of the user’s queries
Comparison of models
(a) Log_CoS (b) Log_IwS
Query types
(a) Log_CoS (b) Log_IwS
Performance by query type
In summary
• Collaborative Personalized Twitter Search
– User’s tweets
– User’s friends’ tweets
– User’s search activity
– Organized around topics
• topic-specific language models
Future work
• Query-dependent personalization strategies
• Selection of an optimal set of friends for the collaborative model
• Integrating spatial and temporal features
Thank You!
Jan Vosecky
Kenneth Wai-Ting Leung
Wilfred Ng
Supported by SIGIR Travel Grant