Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations
Post on 14-Jul-2015
Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, Kevin Chen-Chuan Chang
University of Illinois at Urbana-Champaign
13/04/24 KDE Seminar: Yuto Yamaguchi 1
Paper Introduction Speaker: Yuto Yamaguchi
KDD ‘12
Introduction
• Users' locations are important to many applications
  • e.g., advertisement, recommendation
• But most users do not provide their location information
  • On Twitter, only 16% of users register city-level locations in their profiles
• The objective of this paper is to profile users' home locations in a social network
General Ideas for Location Inference
• A user is more likely to follow another user who lives nearby
  • e.g., a user in Chicago follows another user in Chicago
  • [Backstrom et al., WWW '10], [Clodoveu et al., T-GIS '11], …
• A user is more likely to post about locations near them
  • e.g., a user in Houston posts about rockets
  • [Cheng et al., CIKM '10], [Chandra et al., SocialCom '11], [Kinsella et al., SMUC '11], …
Challenges
• On Twitter, the following network and tweets provide valuable signals for profiling users' home locations
• But there are two challenges:
  • Scarce signals
    • Users have 126 friends on average, but only 16% of them provide locations
    • Only 6 location-related terms appear in every 100 tweets
  • Noisy signals
    • A user may follow another user who lives in a distant location
    • A user may post about distant locations
Ideas in this paper
• The authors propose a unified discriminative influence model (UDI), which has the two features below
• Unified signals (for the scarce-signal challenge)
  • Integrates the social network and user-centric data (i.e., tweets) in a probabilistic framework, which is viewed as a heterogeneous graph
• Discriminative influence (for the noisy-signal challenge)
  • Users and locations have their own influence scopes
  • e.g., Lady Gaga (with a broad influence scope) is more likely to be followed by a user far away
  • → Users with broad scopes do not provide strong signals for location inference
Contributions
• Propose a unified discriminative influence model (UDI)
  • Heterogeneous graph
  • Influence scope
• Propose two location profiling methods using the above model (introduced later)
  • Local prediction method
  • Global prediction method
• Conduct extensive experiments using a Twitter dataset
  • Their method places 66% of users within a 100-mile error distance
Heterogeneous Graph
• User nodes: $u_i \in U$
• Venue nodes: $v_j \in V$
• If $u_i$ posts about $v_j$, create an edge $\langle u_i, v_j \rangle$
• If $u_i$ follows $u_j$, create an edge $\langle u_i, u_j \rangle$
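The graph construction on this slide could be sketched in Python roughly as follows. This is only an illustration: the class name `TwitterGraph`, its methods, and the example node names are hypothetical and do not come from the paper or the slides.

```python
from dataclasses import dataclass, field


@dataclass
class TwitterGraph:
    """Heterogeneous graph with user nodes, venue nodes, and two edge types.

    All names here are illustrative assumptions, not the authors' code.
    """
    users: set = field(default_factory=set)
    venues: set = field(default_factory=set)
    follow_edges: set = field(default_factory=set)  # (u_i, u_j): u_i follows u_j
    tweet_edges: set = field(default_factory=set)   # (u_i, v_j): u_i posts about v_j

    def add_follow(self, follower, followee):
        # Both endpoints of a follow edge are user nodes.
        self.users.update((follower, followee))
        self.follow_edges.add((follower, followee))

    def add_tweet(self, user, venue):
        # A tweet edge links a user node to a venue node.
        self.users.add(user)
        self.venues.add(venue)
        self.tweet_edges.add((user, venue))

    def followers(self, user):
        return {a for a, b in self.follow_edges if b == user}

    def followees(self, user):
        return {b for a, b in self.follow_edges if a == user}

    def venues_of(self, user):
        return {v for u, v in self.tweet_edges if u == user}


g = TwitterGraph()
g.add_follow("alice", "bob")     # alice follows bob
g.add_tweet("alice", "Chicago")  # alice posts about Chicago
```

Storing the two edge types separately keeps the follow and tweet relations distinguishable, which the later likelihood functions rely on.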
Location Profiling Problem
Given a Twitter graph $G$, estimate a location $\hat{L}_{u_i}$ for each user $u_i$ so as to make $\hat{L}_{u_i}$ close to $u_i$'s true location $L_{u_i}$
Motivation 1/2
Nearby users (venues) are more likely to be followed (tweeted about) by other users
Motivation 2/2
Each user (venue) has an influence scope of a different size
[Figure: influence scopes of an influential user and a regular user]
Basic Ideas for the Influence model
• A geographically influential user has a broad influence scope
  • e.g., worldwide celebrities such as Lady Gaga
• The fact that a user follows a geographically influential user does NOT provide a valuable signal for location inference
  • NOT VALUABLE: a user follows Lady Gaga
  • VALUABLE: a user follows a regular user in Chicago
Model Formulation
• The authors adopt a Gaussian distribution to model the above characteristics
• Node $n_i$'s influence scope is $N(L_{n_i}, \Sigma_{n_i})$
[Figure: probability to follow (tweet) over latitude and longitude]
Influence scope – users
• User $u_i$'s influence scope is $N(L_{u_i}, \Sigma_{u_i})$, centered at user $u_i$'s home location
• High probability to follow $u_i$ near the center, low probability to follow $u_i$ far from it
[Figure: probability to follow over latitude and longitude]
Influence scope – venues
• Venue $v_i$'s influence scope is $N(L_{v_i}, \Sigma_{v_i})$, centered at venue $v_i$'s location
• High probability to tweet near the center, low probability to tweet far from it
[Figure: probability to tweet over latitude and longitude]
Different scope size – users
[Figure: influence scopes of a regular user and a geographically influential user (high influence)]
• A geographically influential user is more likely to be followed by distant users
Different scope size – venues
[Figure: influence scopes of a regular venue and a geographically influential venue (high influence)]
• A geographically influential venue is more likely to be tweeted about by distant users
Model Parameters
• Mean and variance for each Gaussian $N(L_{n_i}, \Sigma_{n_i})$
  • The mean $L_{n_i}$ is the location of node $n_i$
  • The variance $\Sigma_{n_i}$ decides the size of each influence scope

$$
\Sigma_{n_i} =
\begin{pmatrix}
\sigma_{n_i}^2 & 0 \\
0 & \sigma_{n_i}^2
\end{pmatrix}
$$

• The number of parameters is $2(|U| + |V|)$
LOCATION PROFILING METHODS
• Local prediction method
• Global prediction method
Basic Ideas for Location Profiling
• Estimate the model parameters that maximize the likelihood of obtaining the given Twitter graph
• Parameters: $L_{n_i}$ and $\Sigma_{n_i}$ for each node $n_i$
Local Prediction Method
• This method only considers the ego-network
• Maximize the likelihood of this network
  • labeled user: a user whose location is known
  • unlabeled user: a user whose location is unknown
[Figure: ego-network of an unlabeled user, with follow edges to labeled users and tweet edges]
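Likelihood Function of Local Method

$$
\begin{aligned}
P(\text{ego-network of } u_i \mid \text{parameters}) ={}
& \prod_{u_j \in \mathrm{Followers}(u_i)} P(u_j \text{ follows } u_i \mid L_{u_j}, L_{u_i}, \Sigma_{u_i}) \\
\times{} & \prod_{u_j \in \mathrm{Followees}(u_i)} P(u_i \text{ follows } u_j \mid L_{u_i}, L_{u_j}, \Sigma_{u_j}) \\
\times{} & \prod_{v_j \in \mathrm{Venues}(u_i)} P(u_i \text{ tweets } v_j \mid L_{u_i}, L_{v_j}, \Sigma_{v_j})
\end{aligned}
$$

• Each factor is a Gaussian
• Maximize this function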
Each Gaussian

$$
P(u_j \text{ follows } u_i \mid L_{u_j}, L_{u_i}, \Sigma_{u_i}) =
\frac{1}{2\pi\sigma_{u_i}^2}
\exp\left( \frac{(X_{u_i} - X_{u_j})^2 + (Y_{u_i} - Y_{u_j})^2}{-2\sigma_{u_i}^2} \right)
$$

• High probability if $u_i$ and $u_j$ are close
• High probability if $u_i$ has a broad influence scope
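To make the two bullet points concrete, the Gaussian on this slide can be evaluated numerically. The sketch below assumes planar (x, y) coordinates and takes the variance $\sigma^2$ directly as the argument `sigma2`. The function name `follow_probability` is made up for illustration. Strictly speaking the value is a probability density rather than a probability.

```python
import math


def follow_probability(follower_xy, target_xy, sigma2):
    """Isotropic 2-D Gaussian density centered at the followed node.

    follower_xy: (x, y) of the user who follows (u_j on the slide)
    target_xy:   (x, y) of the followed node (u_i on the slide)
    sigma2:      variance of the followed node's influence scope
    """
    dx = target_xy[0] - follower_xy[0]
    dy = target_xy[1] - follower_xy[1]
    squared_distance = dx * dx + dy * dy
    return math.exp(-squared_distance / (2.0 * sigma2)) / (2.0 * math.pi * sigma2)


# A nearby follower scores higher than a distant one for the same scope.
near = follow_probability((1.0, 0.0), (0.0, 0.0), sigma2=1.0)
far = follow_probability((10.0, 0.0), (0.0, 0.0), sigma2=1.0)

# For a distant follower, a broad scope scores higher than a narrow one.
far_broad = follow_probability((10.0, 0.0), (0.0, 0.0), sigma2=100.0)
```

Note that the second bullet applies to distant followers: for a follower at the center, a narrow scope gives the higher density.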
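Solution of Local Method

$$
X_{u_i} =
\frac{
\sum_{u_j \in \mathrm{followers}(u_i)} \frac{X_{u_j}}{\sigma_{u_i}^2}
+ \sum_{u_j \in \mathrm{followees}(u_i)} \frac{X_{u_j}}{\sigma_{u_j}^2}
+ \sum_{v_j \in \mathrm{venues}(u_i)} \frac{X_{v_j}}{\sigma_{v_j}^2}
}{
\sum_{u_j \in \mathrm{followers}(u_i)} \frac{1}{\sigma_{u_i}^2}
+ \sum_{u_j \in \mathrm{followees}(u_i)} \frac{1}{\sigma_{u_j}^2}
+ \sum_{v_j \in \mathrm{venues}(u_i)} \frac{1}{\sigma_{v_j}^2}
}
$$

$$
\sigma_{u_i}^2 =
\frac{\sum_{u_j \in \mathrm{followers}(u_i)} \left[ (X_{u_i} - X_{u_j})^2 + (Y_{u_i} - Y_{u_j})^2 \right]}
{2\,|\mathrm{followers}(u_i)|}
$$

• Obtained in closed form (no need to memorize)
• The two expressions are combined by substitution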
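The closed-form updates can be read as an inverse-variance weighted average of neighbor locations plus a mean squared distance to followers. The sketch below is an illustration under stated assumptions, not the authors' implementation: coordinates are planar, weights use $1/\sigma^2$ as implied by the Gaussian likelihood, the Y coordinate is treated the same way as X, and both function names are hypothetical.

```python
def estimate_location(followers, followees, venues, sigma2_ui):
    """Inverse-variance weighted mean of neighbor locations.

    followers: list of (x, y) for labeled followers of u_i
    followees: list of ((x, y), sigma2) for labeled followees of u_i
    venues:    list of ((x, y), sigma2) for venues u_i tweeted about
    sigma2_ui: variance of u_i's own influence scope
    """
    # Followers are governed by u_i's own scope; followees and venues by theirs.
    weighted = [(xy, 1.0 / sigma2_ui) for xy in followers]
    weighted += [(xy, 1.0 / s2) for xy, s2 in followees]
    weighted += [(xy, 1.0 / s2) for xy, s2 in venues]
    total = sum(w for _, w in weighted)
    x = sum(xy[0] * w for xy, w in weighted) / total
    y = sum(xy[1] * w for xy, w in weighted) / total
    return (x, y)


def estimate_sigma2(location, followers):
    """Variance of u_i's scope from squared distances to its followers."""
    squared = sum((location[0] - x) ** 2 + (location[1] - y) ** 2
                  for x, y in followers)
    return squared / (2.0 * len(followers))


# One follower at the origin, one followee at (4, 0) with a broader scope.
loc = estimate_location([(0.0, 0.0)], [((4.0, 0.0), 3.0)], [], sigma2_ui=1.0)
var = estimate_sigma2(loc, [(0.0, 0.0)])
```

In this example the estimate lands closer to the follower, because the followee's broader scope gives it a smaller weight.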
Global Prediction Method
• This method maximizes the likelihood of the whole network
• Predicts the locations of unknown users simultaneously
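Likelihood Function of Global Method

$$
\begin{aligned}
P(\text{whole network} \mid \text{parameters}) ={}
& \prod_{\langle u_i, u_j \rangle \in \mathrm{FollowEdges}} P(u_i \text{ follows } u_j \mid L_{u_i}, L_{u_j}, \Sigma_{u_j}) \\
\times{} & \prod_{\langle u_i, v_j \rangle \in \mathrm{TweetEdges}} P(u_i \text{ tweets } v_j \mid L_{u_i}, L_{v_j}, \Sigma_{v_j})
\end{aligned}
$$

• Each factor is a Gaussian
• Maximize this function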
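Since every factor has the Gaussian form given on the "Each Gaussian" slide, taking the logarithm turns the product into a sum that is easier to maximize. The expression below is a reader's derivation sketch, not an equation from the slides. The shorthand $d(a, b)^2 = (X_a - X_b)^2 + (Y_a - Y_b)^2$ is introduced here only for brevity.

$$
\begin{aligned}
\log P(\text{whole network} \mid \text{parameters}) ={}
& \sum_{\langle u_i, u_j \rangle \in \mathrm{FollowEdges}}
\left[ -\log\left(2\pi\sigma_{u_j}^2\right) - \frac{d(u_i, u_j)^2}{2\sigma_{u_j}^2} \right] \\
+{} & \sum_{\langle u_i, v_j \rangle \in \mathrm{TweetEdges}}
\left[ -\log\left(2\pi\sigma_{v_j}^2\right) - \frac{d(u_i, v_j)^2}{2\sigma_{v_j}^2} \right]
\end{aligned}
$$

Each unlabeled user's location appears in several terms together with the locations of its unlabeled neighbors, so the unknowns are coupled across the network.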
Iterative Algorithm for Global Method
• The global method has no closed-form solution
  • → Iterative algorithm

1. Initialize locations $L_u$ for all unlabeled users
2. $k \leftarrow 1$
3. repeat
   1. update $\sigma_n^k$ for all nodes using $L_u$
   2. repeat
      1. update $L_u^k$ for all unlabeled users using $\sigma_n^k$ and $L_u^k$
   3. until convergence
   4. $L_u \leftarrow L_u^k$
   5. $k \leftarrow k + 1$
4. until convergence
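The loop structure above might look roughly like the following Python sketch. It makes several simplifying assumptions that are not in the slides: planar coordinates, follow and tweet edges handled uniformly as (source, target) pairs, fixed iteration counts instead of convergence tests, initialization at the centroid of known locations, and a variance floor `min_var` to avoid division by zero. The function name `global_predict` and all parameter names are hypothetical.

```python
def global_predict(edges, known, unlabeled,
                   outer_iters=20, inner_iters=50, min_var=1e-6):
    """Alternate between variance updates and location updates.

    edges:     iterable of (source, target); source follows or tweets about target
    known:     dict node -> (x, y) for labeled users and venues
    unlabeled: list of users to locate; every edge endpoint is assumed to be
               either in `known` or in `unlabeled`
    """
    sources_of, targets_of = {}, {}
    for a, b in edges:
        sources_of.setdefault(b, []).append(a)
        targets_of.setdefault(a, []).append(b)

    # Initialize every unlabeled user at the centroid of the known locations.
    cx = sum(x for x, _ in known.values()) / len(known)
    cy = sum(y for _, y in known.values()) / len(known)
    loc = dict(known)
    for u in unlabeled:
        loc[u] = (cx, cy)

    var = {}
    for _outer in range(outer_iters):
        # Update the variance of every node that receives at least one edge.
        for b, srcs in sources_of.items():
            squared = sum((loc[a][0] - loc[b][0]) ** 2 +
                          (loc[a][1] - loc[b][1]) ** 2 for a in srcs)
            var[b] = max(squared / (2.0 * len(srcs)), min_var)
        # Update unlabeled locations with the variances held fixed.
        for _inner in range(inner_iters):
            for u in unlabeled:
                nbrs = [(a, 1.0 / var[u]) for a in sources_of.get(u, [])]
                nbrs += [(b, 1.0 / var[b]) for b in targets_of.get(u, [])]
                if not nbrs:
                    continue  # isolated user: keep the initial guess
                total = sum(w for _, w in nbrs)
                loc[u] = (sum(loc[n][0] * w for n, w in nbrs) / total,
                          sum(loc[n][1] * w for n, w in nbrs) / total)
    return {u: loc[u] for u in unlabeled}, var


predicted, variances = global_predict(
    edges=[("u", "a"), ("u", "b")],
    known={"a": (0.0, 0.0), "b": (2.0, 0.0)},
    unlabeled=["u"],
)
```

In this small example the unlabeled user follows two labeled users placed symmetrically, so the estimate stays at their midpoint.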
Dataset
• Twitter dataset
  • Crawled profiles, followers, and followees of 3,980,061 users
  • Geocoded their location profiles into coordinates based on a U.S. gazetteer
  • 630,187 users were correctly geocoded (these are the labeled users)
  • 158,220 of the labeled users have at least one labeled neighbor
    • neighbor: follower or followee
  • Crawled at most 600 tweets for each labeled user, and obtained 139,180 users' tweets
    • The other users are protected users
• Using this dataset, the authors conducted five-fold cross-validation
  • 80% of the 139,180 users form the training set and 20% form the test set
  • Repeated for 5 runs
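The 80%/20% split repeated over five runs can be sketched as below. The shuffling seed and the function name `five_fold_splits` are illustrative assumptions, and the slides do not say how the authors assigned users to folds.

```python
import random


def five_fold_splits(user_ids, seed=0):
    """Yield (train, test) lists so that each user is tested exactly once."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::5] for i in range(5)]  # five nearly equal parts
    for k in range(5):
        test = folds[k]
        train = [u for j, fold in enumerate(folds) if j != k for u in fold]
        yield train, test


splits = list(five_fold_splits(range(100)))
```

In each run the test users are treated as unlabeled, and their known locations are used only for evaluation.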
Methods
• Compared 6 methods
  • BaseU: Backstrom et al.'s method [1], using only the social graph (no influence model)
  • BaseC: Cheng et al.'s method [2], using only tweets (no influence model)
  • UDIU: local prediction method, but using only user nodes
  • UDIC: local prediction method, but using only venue nodes
  • UDII: local prediction method
  • UDIG: global prediction method

[1] Backstrom et al., "Find me if you can: improving geographical prediction with social and spatial proximity", WWW '10
[2] Cheng et al., "You are where you tweet: a content-based approach to geo-locating twitter users", CIKM '10
Results – Prediction results
• ACC: ratio of users whose predicted location is within 100 miles of the true location
• AED@k%: average error distance of the top k% of users
• The influence model is effective for predicting locations
  • Comparing BaseU and UDIU (BaseC and UDIC)
• Integrating both signals is effective for predicting locations
  • Comparing UDIU and UDII (UDIC and UDII)
• The global method improves on the local one by only 1.5%
  • Comparing UDIG and UDII
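The two metrics defined on this slide, ACC and AED@k%, can be computed from per-user error distances as in the sketch below. Several details are assumptions rather than facts from the slides: error distance is taken as the great-circle (haversine) distance with a mean Earth radius of 3958.8 miles, "within 100 miles" is read as less than or equal to 100, and "top k%" is read as the k% of users with the smallest errors. All function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def acc(errors, threshold=100.0):
    """Fraction of users whose error distance is within the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


def aed_at(errors, k_percent):
    """Average error distance over the k% of users with the smallest errors."""
    ranked = sorted(errors)
    n = max(1, int(len(ranked) * k_percent / 100))
    return sum(ranked[:n]) / n


errors = [10.0, 50.0, 150.0, 400.0]  # made-up error distances in miles
accuracy = acc(errors)
aed_50 = aed_at(errors, 50)
```

With these made-up errors, two of the four users fall within 100 miles, and AED@50% averages the two smallest errors.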
Results – Global and Local
• With 20% training users and 80% test users: +9% in ACC
• When most users are unlabeled, the global method improves on the local one substantially
Results – Influence scope
• Users with a large number of followers do not always have a large σ
  • e.g., MythBusters Official has a larger σ than Lady Gaga but a smaller number of followers
Conclusion
• Proposed
  • A unified discriminative influence model (UDI)
  • Two location prediction methods based on the influence model: global and local
• Conducted experiments using a large Twitter dataset
  • The proposed methods significantly outperform existing methods
• No future work