Indrajit Bhattacharya Research Scientist IBM Research, Bangalore

Post on 03-Jan-2016

DESCRIPTION

Workshop on Social Computing, IIT Kharagpur, Oct 5-6 2012. Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media. Indrajit Bhattacharya, Research Scientist, IBM Research, Bangalore.

Transcript

Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*

Indrajit Bhattacharya

Research Scientist, IBM Research, Bangalore

*Collaboration w/ Himabindu Lakkaraju & Chiranjib Bhattacharyya

Workshop on Social Computing, IIT Kharagpur, Oct 5-6 2012

Social Media Analysis: Motivation

Microblogs: Twitter, Facebook, MySpace

Understanding and analyzing topics & trends

Influences on users

Variety of stakeholders

Business

Government

Social scientists

Social Media Analysis: Challenges

Network and Influences on Users

User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]

Dynamic nature

Topics & user personalities evolve over time

Volume of data

Existing approaches fall short

Social Media Analysis: State of the Art

Content Analysis

Ramage ICWSM 2010, Hong SOMA 2010

Variants of LDA

Inferring User Interests

Ahmed KDD 2011, Wen KDD 2010

Individual features such as user activity or network

Patterns in Temporal Evolution

Yang et al WSDM 2011

Bayesian Non-parametric Models

Choosing the number of components in a mixture model

Particularly severe problem for large data volumes such as for social media data

Bayesian solution

Infinite dimensional prior

Allows the number of mixture components to grow with data size

Cannot capture richness of social media data

Algorithms often not scalable
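To illustrate how an infinite-dimensional prior lets the number of components grow with the data, here is a minimal sketch that simulates the standard Chinese Restaurant Process. The function name `sample_crp` and the concentration parameter `alpha` are illustrative choices, not names from the talk.

```python
import random


def sample_crp(num_customers, alpha, seed=0):
    """Seat customers one at a time and return the list of table sizes."""
    rng = random.Random(seed)
    table_sizes = []
    for _ in range(num_customers):
        # An existing table is chosen in proportion to its size,
        # a new table in proportion to alpha.
        weights = table_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)       # open a new table (new component)
        else:
            table_sizes[choice] += 1    # join an existing table
    return table_sizes


if __name__ == "__main__":
    # The number of occupied tables grows slowly as more data arrives.
    for n in (100, 1000, 5000):
        print(n, len(sample_crp(n, alpha=1.0)))
```

Nothing in the sketch fixes the number of components in advance; it is determined by the data size and `alpha`.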

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Dirichlet Process (Informal)

Dirichlet Process: Properties

Chinese Restaurant Process (CRP)
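The body of this slide did not survive in the transcript. For reference, the standard CRP seating rule is given below; the notation (concentration parameter $\alpha$, table assignment $z_n$ of the $n$-th customer, and $n_k$ customers already at table $k$) is assumed here and may differ from the slides.

```latex
P(z_n = k \mid z_1, \dots, z_{n-1}) =
\begin{cases}
  \dfrac{n_k}{n - 1 + \alpha} & \text{if } k \text{ is an existing table,} \\[2ex]
  \dfrac{\alpha}{n - 1 + \alpha} & \text{if } k \text{ is a new table.}
\end{cases}
```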

Relational Chinese Restaurant Process (RelCRP)

Influence of World-wide Factors

Influence of Personal Preferences

Influence of Friend Network

Influence of Geography

[Figure: regions labeled India, China, UK]

Aggregating Influences

RelCRP is exchangeable like the CRP

Useful as a prior for infinite mixture model

RelCRP captures influence of one relation on posts

Influences act simultaneously on any user

Aggregated influence pattern is user specific

Different users affected differently by same combination of world-wide and geographic factors
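The idea of user-specific aggregation can be sketched as follows. This is a toy illustration under strong assumptions, not the MRelCRP definition from the talk: the class name `MultiRelationalCRP`, the per-user `weights` over relations, and the rule that each post is counted in every group its author belongs to are all hypothetical.

```python
import random
from collections import defaultdict


class MultiRelationalCRP:
    """Toy prior: one CRP-style count table per (relation, group) pair."""

    def __init__(self, alpha, seed=0):
        self.alpha = alpha
        self.rng = random.Random(seed)
        self.num_topics = 0
        # counts[(relation, group)][topic] = number of posts with that topic
        self.counts = defaultdict(lambda: defaultdict(int))

    def assign(self, groups, weights):
        """groups: relation -> group of the author (e.g. "geo" -> "India").
        weights: relation -> user-specific influence weight (assumed)."""
        relations = list(groups)
        # Step 1: pick which relation influences this post.
        rel = self.rng.choices(
            relations, weights=[weights[r] for r in relations])[0]
        # Step 2: CRP draw from the topic counts of the chosen relation.
        table = self.counts[(rel, groups[rel])]
        topics = list(table)
        w = [table[t] for t in topics] + [self.alpha]
        idx = self.rng.choices(range(len(w)), weights=w)[0]
        if idx == len(topics):
            topic = self.num_topics      # brand-new topic
            self.num_topics += 1
        else:
            topic = topics[idx]
        # Assumption: the post contributes to every group of its author.
        for r in relations:
            self.counts[(r, groups[r])][topic] += 1
        return rel, topic


if __name__ == "__main__":
    model = MultiRelationalCRP(alpha=1.0)
    groups = {"world": "all", "geo": "India", "personal": "user42"}
    weights = {"world": 0.2, "geo": 0.3, "personal": 0.5}
    print([model.assign(groups, weights) for _ in range(5)])
```

Two users in the same geography but with different `weights` will draw topics differently, which is the user-specific aggregation described above.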

Multi Relational CRP

Evolving Patterns in Social Media

Number of Topics

Topics die and new ones are born

User Personalities

Susceptibility to influence by world-wide, geographic and friends’ preferences

Existing Topic Distributions

Words go out of fashion, new ones enter vocabulary

Topic Characters:

Popularity of a topic changes world-wide, in users' preferences, sub-networks and geographies

Dynamic MultiRelCRP

User Personality Trends

Evolving Topic Distributions

Topic Character Trends

Inference and Estimation Tasks

Online Algorithm

Traditional iterative framework does not scale for social media data

Sequential Monte Carlo methods [Canini AIStats ‘09] that rejuvenate some old labels are also infeasible

Online sampling [Banerjee SDM ‘07] does not revisit old labels at all; initial batch phase

Adapt for non-parametric setting
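A single-pass sampler of this kind can be sketched as below. It is a simplified stand-in, not the talk's algorithm: it assumes a plain CRP prior over topics and a Dirichlet-multinomial word likelihood with symmetric parameter `beta`; the function names and the `vocab_size` parameter are hypothetical. Each post is labeled once and never revisited.

```python
import math
import random
from collections import Counter


def post_log_lik(words, word_counts, total, beta, vocab_size):
    """Log-probability of a post's words under one topic's counts."""
    ll, seen = 0.0, Counter()
    for i, w in enumerate(words):
        ll += math.log((word_counts.get(w, 0) + seen[w] + beta)
                       / (total + i + vocab_size * beta))
        seen[w] += 1
    return ll


def online_crp_assign(posts, alpha=1.0, beta=0.1, vocab_size=1000, seed=0):
    """Label each post (a list of words) in one pass; return the labels."""
    rng = random.Random(seed)
    topic_words, topic_totals, topic_posts = [], [], []
    labels = []
    for words in posts:
        # Score every existing topic, then the option of a new topic.
        scores = [
            math.log(topic_posts[k])
            + post_log_lik(words, topic_words[k], topic_totals[k],
                           beta, vocab_size)
            for k in range(len(topic_posts))
        ]
        scores.append(math.log(alpha)
                      + post_log_lik(words, {}, 0, beta, vocab_size))
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # avoids underflow
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topic_posts):  # the non-parametric step: add a topic
            topic_words.append(Counter())
            topic_totals.append(0)
            topic_posts.append(0)
        topic_words[k].update(words)
        topic_totals[k] += len(words)
        topic_posts[k] += 1
        labels.append(k)  # old labels are never resampled
    return labels
```

Because no label is revisited, the cost per post depends only on the current number of topics, not on the number of posts seen so far.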

Multi-threaded Implementation

Sequential online implementation does not scale

Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]

Our algorithm is parallel, online and non-parametric

Explicit consolidation by master thread at the end of each iteration

Only new topics consolidated
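The worker/master structure can be sketched as follows. This is an illustration of the consolidation step only, under stated assumptions: a deterministic nearest-topic rule replaces sampling, and new topics proposed by different workers are merged when their cosine similarity reaches a hypothetical `threshold`. The slides do not specify the merge criterion, and the talk's framework is Java-based; Python threads are used here only to show the control flow.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def process_shard(shard, global_topics, threshold):
    """Worker: read global topics, return count updates and new topics."""
    deltas = {}      # global topic index -> word counts to add
    new_topics = []  # topics created locally by this worker
    for words in shard:
        post = Counter(words)
        cands = [(cosine(post, t), ("g", i))
                 for i, t in enumerate(global_topics)]
        cands += [(cosine(post, t), ("n", i))
                  for i, t in enumerate(new_topics)]
        score, target = max(cands, default=(0.0, None))
        if target is not None and score >= threshold:
            kind, i = target
            if kind == "g":
                deltas.setdefault(i, Counter()).update(post)
            else:
                new_topics[i].update(post)
        else:
            new_topics.append(post)
    return deltas, new_topics


def run_iteration(shards, global_topics, threshold=0.3, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda s: process_shard(s, global_topics, threshold), shards))
    # Master thread: apply updates to existing topics as-is.
    for deltas, _ in results:
        for i, counts in deltas.items():
            global_topics[i].update(counts)
    # Master thread: consolidate only the newly proposed topics.
    first_new = len(global_topics)
    for _, new_topics in results:
        for topic in new_topics:
            for existing in global_topics[first_new:]:
                if cosine(topic, existing) >= threshold:
                    existing.update(topic)
                    break
            else:
                global_topics.append(topic)
    return global_topics
```

Workers only read `global_topics`; all writes happen in the master after the pool has finished, so no locking is needed in this sketch.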

Datasets and Baselines

Twitter: 360 million tweets (Jun-Dec 2009)

Facebook: 300,000 posts (public profiles, 3 months)

Latent Dirichlet Allocation (LDA)

[Hong SOMA 2010]

Labeled LDA (L-LDA)

Hashtags as topics [Ramage ICWSM 2010]

Timeline

Dynamic non-parametric topic model [Ahmed UAI 2010]

1 Model Goodness

Perplexity: Ability to generalize to unseen data

Both network and dynamics are important for modeling social media data

Perplexity

Model      Twitter   Facebook
DMRelCRP   1188.29   1562.34
Timeline   1582.86   1802.9
L-LDA      1982.76   -
LDA        2932.06   3602
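For readers unfamiliar with the metric, perplexity is the exponential of the negative average log-likelihood per held-out token, so lower is better. A minimal sketch follows; the function name and the per-token input format are illustrative, and how the talk obtained held-out likelihoods is not stated in the slides.

```python
import math


def perplexity(token_log_probs):
    """token_log_probs: natural-log probability of each held-out token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # A model that spreads probability uniformly over 1000 words
    # has perplexity of about 1000 on any held-out text.
    uniform = [math.log(1 / 1000)] * 50
    print(perplexity(uniform))
```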

2 Quality of Discovered Topics

Label assigned to each post indicating category

Distribution over words indicating semantics

A. Clustering posts using topic labels

B. Prediction using topic labels

Predicting post authorship & user commenting activity

C. Major event detection

2A Post Clustering using Topics

Use hashtags as gold standard (for Twitter)

16K posts #NIPS2009, #ICML2009, #bollywood etc

DMRelCRP close to L-LDA without using hashtags

DMRelCRP produces ‘finer-grained’ clusters

Clustering accuracy (Tw)

Model      nMI    R-Index   F1
DMRelCRP   0.93   0.88      0.86
Timeline   0.81   0.72      0.73
L-LDA      1      1         1
LDA        0.55   0.52      0.48
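The three clustering scores can be computed from predicted topic labels and gold hashtags roughly as follows. The slides do not say which normalization of mutual information or which F1 variant was used; this sketch assumes geometric-mean normalization for nMI and pair-counting for the Rand index and F1.

```python
import math
from collections import Counter
from itertools import combinations


def nmi(pred, gold):
    """Mutual information normalized by sqrt(H(pred) * H(gold))."""
    n = len(pred)
    joint = Counter(zip(pred, gold))
    p, g = Counter(pred), Counter(gold)
    mi = sum(c / n * math.log(c * n / (p[a] * g[b]))
             for (a, b), c in joint.items())
    h_p = -sum(c / n * math.log(c / n) for c in p.values())
    h_g = -sum(c / n * math.log(c / n) for c in g.values())
    denom = math.sqrt(h_p * h_g)
    if denom > 0:
        return mi / denom
    # Degenerate case: at least one side is a single cluster.
    return 1.0 if h_p == h_g else 0.0


def pairwise_scores(pred, gold):
    """Return (Rand index, pairwise F1) over all pairs of posts."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
        else:
            tn += 1
    rand = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return rand, f1
```

The pair loop is quadratic in the number of posts, so for a set the size of the 16K-post sample a contingency-table formulation would be preferable; the loop is kept here for readability.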

2B Prediction Using Topics

Authorship: Given post and user, predict if author

Commenting activity: Given post and (non-author) user, predict if user comments on that post

DMRelCRP topics lead to more accurate prediction

           Authorship           Commenting
Model      Twitter   Facebook   Twitter   Facebook
DMRelCRP   0.793     0.734      0.683     0.648
Timeline   0.718     0.669      0.582     0.579
L-LDA      0.521     0.432      0.429     0.482
LDA        0.647     -          0.542     -

2C Major Event Detection

3 Analysis of Influences

3A Global Personality Trends

Michael Jackson’s death

FIFA WC

Google Wave

3B Geo-specific Personality Trends

Personality trends very similar in UK and US

Geographic influences high at different epochs

India: world-wide and geographic influences weaker

China: world-wide weak, geo strong; stable pattern

3C Topic Character Trends

Scaling with Data Size

Java-based multi-threaded framework; 7 threads

8-core 32 GB RAM

Scales largely because of multi-threading

Summary

First attempt at studying user influences in social media data

New non-parametric model that captures multiple relationships and temporal evolution

Multi-threaded online Gibbs sampling algorithm

Extensive evaluation on large real dataset

Topics lead to better clustering and prediction

Insights on user influence patterns
