Page 1: Indrajit Bhattacharya Research Scientist IBM Research, Bangalore

Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*

Indrajit Bhattacharya
Research Scientist
IBM Research, Bangalore

*Collaboration w/ Himabindu Lakkaraju & Chiranjib Bhattacharyya

Workshop on Social Computing
IIT Kharagpur, Oct 5-6 2012

Page 2

Social Media Analysis: Motivation

Microblogs: Twitter, Facebook, MySpace

Understanding and analyzing topics & trends

Influences on users

Variety of stakeholders

Business

Government

Social scientists

Page 3

Social Media Analysis: Challenges

Network and Influences on Users

User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]

Dynamic nature

Topics & user personalities evolve over time

Volume of data

Existing approaches fall short

Page 4

Social Media Analysis: State of the Art

Content Analysis

Ramage ICWSM 2010, Hong SOMA 2010

Variants of LDA

Inferring User Interests

Ahmed KDD 2011, Wen KDD 2010

Individual features such as user activity or network

Patterns in Temporal Evolution

Yang et al WSDM 2011

Page 5

Bayesian Non-parametric Models

Choosing the number of components in a mixture model

Particularly severe problem for large data volumes such as for social media data

Bayesian solution

Infinite dimensional prior

Allows the number of mixture components to grow with data size

Cannot capture richness of social media data

Algorithms often not scalable

Page 6

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Page 7

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Page 8

Dirichlet Process (Informal)

Page 9

Dirichlet Process: Properties

Page 10

Chinese Restaurant Process (CRP)
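
The body of this slide did not survive in the transcript. As a minimal sketch of the standard CRP seating rule (not necessarily the notation used in the talk): customer n+1 joins an existing table k with probability n_k / (n + alpha) and opens a new table with probability alpha / (n + alpha). The names `crp_sample` and `alpha` are illustrative.

```python
import random

def crp_sample(num_customers, alpha, seed=0):
    """Seat customers one at a time under the standard CRP rule.

    Returns (assignments, table_sizes). `alpha` is the concentration
    parameter; all names here are illustrative, not from the talk.
    """
    rng = random.Random(seed)
    assignments = []   # table index chosen by each customer
    table_sizes = []   # number of customers at each table
    for _ in range(num_customers):
        # Existing table k has weight n_k; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes

assignments, table_sizes = crp_sample(1000, alpha=2.0)
print(len(table_sizes), "tables for 1000 customers")
```

Because the number of tables is not fixed in advance, it can keep growing as more customers arrive, which is the property that makes the CRP useful as a prior for mixture models with an unknown number of components.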

Page 11

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Parallelized Online Inference Algorithm

Experimental Results

Page 12

Relational Chinese Restaurant Process (RelCRP)

Page 13

Relational Chinese Restaurant Process (RelCRP)

Page 14

Influence of World-wide Factors

Page 15

Influence of World-wide Factors

Page 16

Influence of Personal Preferences

Page 17

Influence of Personal Preferences

Page 18

Influence of Friend Network

Page 19

Influence of Friend Network

Page 20

Influence of Geography

[Figure: regions India, China, UK]

Page 21

Influence of Geography

Page 22

Aggregating Influences

RelCRP is exchangeable like the CRP

Useful as a prior for infinite mixture model

RelCRP captures influence of one relation on posts

Influences act simultaneously on any user

Aggregated influence pattern is user specific

Different users affected differently by same combination of world-wide and geographic factors
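
The transcript does not preserve the model equations, so the following is only a rough sketch of the idea in the bullets above, under two stated assumptions: (1) under a single relation, a new post picks a table (topic) in proportion to how many earlier posts related to it sit at that table, with weight `alpha` for a new table; and (2) the aggregated distribution is a mixture of the per-relation distributions with user-specific weights. The function names, relation names, and numbers are all hypothetical.

```python
from collections import Counter

def relcrp_probs(related_tables, num_tables, alpha):
    """Table probabilities under one relation (assumed RelCRP-style rule).

    related_tables: table ids of earlier posts related to the new post.
    Index `num_tables` in the result stands for "open a new table".
    """
    counts = Counter(related_tables)
    total = len(related_tables) + alpha
    probs = [counts.get(k, 0) / total for k in range(num_tables)]
    probs.append(alpha / total)
    return probs

def aggregate(per_relation_probs, user_weights):
    """Mix per-relation distributions with user-specific weights (assumed)."""
    size = len(next(iter(per_relation_probs.values())))
    mixed = [0.0] * size
    for relation, probs in per_relation_probs.items():
        for k, p in enumerate(probs):
            mixed[k] += user_weights[relation] * p
    return mixed

# Hypothetical example with 3 existing topics (tables).
per_relation = {
    "world": relcrp_probs([0, 0, 0, 1, 2], num_tables=3, alpha=1.0),
    "geo": relcrp_probs([1, 1, 2], num_tables=3, alpha=1.0),
    "friends": relcrp_probs([2, 2], num_tables=3, alpha=1.0),
    "personal": relcrp_probs([0], num_tables=3, alpha=1.0),
}
# Two users facing the same factors but weighting them differently.
user_a = {"world": 0.7, "geo": 0.1, "friends": 0.1, "personal": 0.1}
user_b = {"world": 0.1, "geo": 0.1, "friends": 0.7, "personal": 0.1}
probs_a = aggregate(per_relation, user_a)
probs_b = aggregate(per_relation, user_b)
print([round(p, 3) for p in probs_a])
print([round(p, 3) for p in probs_b])
```

In this toy setup the two users see identical per-relation distributions, yet their most likely topics differ because their weights differ, which is the sense in which the aggregated influence pattern is user specific.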

Page 23

Multi Relational CRP

Page 24

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Page 25

Evolving Patterns in Social Media

Number of Topics

Topics die and new ones are born

User Personalities

Susceptibility to influence by world-wide, geographic and friends’ preferences

Existing Topic Distributions

Words go out of fashion, new ones enter vocabulary

Topic Characters:

Popularity of a topic changes world-wide, in user preferences, sub-networks and geographies

Page 26

Dynamic MultiRelCRP

Page 27

User Personality Trends

Page 28

Evolving Topic Distributions

Page 29

Topic Character Trends

Page 30

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Page 31

Inference and Estimation Tasks

Page 32

Online Algorithm

Traditional iterative framework does not scale for social media data

Sequential Monte Carlo methods [Canini AIStats ‘09], which rejuvenate some old labels, are also infeasible

Online sampling [Banerjee SDM ‘07] does not revisit old labels at all; initial batch phase

Adapt for non-parametric setting

Page 33

Multi-threaded Implementation

Sequential online implementation does not scale

Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]

Our algorithm is parallel, online and non-parametric

Explicit consolidation by master thread at the end of each iteration

Only new topics consolidated
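
The structure described on this slide (parallel workers, one online pass, a master that consolidates only newly created topics at the end of an iteration) can be illustrated with a toy sketch. The assignment rule and merge criterion below are assumptions made for illustration only: the talk's actual sampler is a Gibbs-style online sampler whose details are not in the transcript. `process_shard` and `run_iteration` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def process_shard(posts, global_topics):
    """One online pass over a shard; old assignments are never revisited.

    Toy rule (an assumption, not the talk's sampler): a post joins the
    global topic sharing the most words with it, else it starts or joins
    a shard-local new topic keyed by its first word.
    """
    counts_delta = {}   # topic id -> Counter of words added by this shard
    new_topics = {}     # local key -> Counter for topics born in this shard
    for post in posts:
        words = post.split()
        if not words:
            continue
        best, overlap = None, 0
        for tid, topic in global_topics.items():
            score = sum(1 for w in words if w in topic)
            if score > overlap:
                best, overlap = tid, score
        if best is not None:
            counts_delta.setdefault(best, Counter()).update(words)
        else:
            new_topics.setdefault(words[0], Counter()).update(words)
    return counts_delta, new_topics

def run_iteration(shards, global_topics):
    """Workers run in parallel; the master then consolidates the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(lambda s: process_shard(s, global_topics), shards))
    # Master step: existing topics only receive count updates; new topics
    # proposed by different workers under the same key are merged into one.
    merged_new = {}
    for counts_delta, new_topics in results:
        for tid, delta in counts_delta.items():
            global_topics[tid].update(delta)
        for key, counts in new_topics.items():
            merged_new.setdefault(key, Counter()).update(counts)
    for counts in merged_new.values():
        global_topics[len(global_topics)] = counts
    return global_topics

topics = {0: Counter({"cricket": 3, "india": 2})}
shards = [["cricket final today", "wave invite please"],
          ["wave beta launch", "india cricket win"]]
topics = run_iteration(shards, topics)
print(len(topics), "topics after one iteration")
```

Workers only read the shared topic table during the pass, and all writes happen in the master step, so no locking is needed in this sketch. Both shards independently propose a "wave" topic, and the master merges them into a single new global topic.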

Page 34

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Page 35

Datasets and Baselines

Twitter: 360 million tweets (Jun-Dec 2009)

Facebook: 300,000 posts (public profiles, 3 months)

Latent Dirichlet Allocation (LDA)

[Hong SOMA 2010]

Labeled LDA (L-LDA)

Hashtags as topics [Ramage ICWSM 2010]

Timeline

Dynamic non-parametric topic model [Ahmed UAI 2010]

Page 36

1 Model Goodness

Perplexity: Ability to generalize to unseen data

Both network and dynamics are important for modeling social media data

Perplexity

Model      Twitter   Facebook
DMRelCRP   1188.29   1562.34
Timeline   1582.86   1802.9
L-LDA      1982.76   -
LDA        2932.06   3602
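
For readers unfamiliar with the metric, here is a minimal sketch of the standard perplexity definition: the exponential of the negative mean log-probability that a model assigns to held-out tokens, so lower values mean better generalization. How the per-token probabilities were obtained from each model in the talk is not described in the transcript; the function name and the toy input are illustrative.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of held-out text."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Hypothetical held-out tokens: a model that assigns every token
# probability 1/1000 is as uncertain as a uniform choice among 1000 words.
uniform = [math.log(1 / 1000)] * 50
# A sharper model that assigns every token probability 1/100.
sharper = [math.log(1 / 100)] * 50
print(round(perplexity(uniform), 2), round(perplexity(sharper), 2))
```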

Page 37

2 Quality of Discovered Topics

Label assigned to each post indicating category

Distribution over words indicating semantics

A. Clustering posts using topic labels

B. Prediction using topic labels

Predicting post authorship & user commenting activity

C. Major event detection

Page 38

2A Post Clustering using Topics

Use hashtags as gold standard (for Twitter)

16K posts tagged #NIPS2009, #ICML2009, #bollywood, etc.

DMRelCRP close to L-LDA without using hashtags

DMRelCRP produces ‘finer-grained’ clusters

Clustering accuracy (Twitter)

Model      nMI    R-Index   F1
DMRelCRP   0.93   0.88      0.86
Timeline   0.81   0.72      0.73
L-LDA      1      1         1
LDA        0.55   0.52      0.48
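
As a sketch of how such scores can be computed from gold hashtag labels and predicted topic labels, the code below implements normalized mutual information, the Rand index, and a pairwise F1. The transcript does not say which variants were used, so the normalization of nMI (by the geometric mean of the two entropies), the reading of "R-Index" as the Rand index, and the pairwise form of F1 are assumptions. The example labels are made up.

```python
import math
from collections import Counter
from itertools import combinations

def nmi(gold, pred):
    """Normalized mutual information, normalized by sqrt(H(gold) * H(pred))."""
    n = len(gold)
    joint = Counter(zip(gold, pred))
    pg, pp = Counter(gold), Counter(pred)
    mi = sum(c / n * math.log(c * n / (pg[g] * pp[p]))
             for (g, p), c in joint.items())
    hg = -sum(c / n * math.log(c / n) for c in pg.values())
    hp = -sum(c / n * math.log(c / n) for c in pp.values())
    if hg == 0 or hp == 0:
        # Degenerate case: at least one side is a single cluster.
        return 1.0 if hg == hp else 0.0
    return mi / math.sqrt(hg * hp)

def pair_scores(gold, pred):
    """Rand index and pairwise F1 over all pairs of posts (O(n^2) pairs)."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_gold, same_pred = gold[i] == gold[j], pred[i] == pred[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
        else:
            tn += 1
    rand = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return rand, f1

gold = ["#nips", "#nips", "#icml", "#icml", "#bolly", "#bolly"]
exact = [0, 0, 1, 1, 2, 2]   # clusters match the hashtags exactly
finer = [0, 0, 1, 2, 3, 3]   # one hashtag split into two finer clusters
print(nmi(gold, exact), pair_scores(gold, exact))
print(nmi(gold, finer), pair_scores(gold, finer))
```

The second example shows how a finer-grained clustering is penalized against a hashtag gold standard even when every cluster is pure.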

Page 39

2B Prediction Using Topics

Authorship: Given post and user, predict if author

Commenting activity: Given post and (non-author) user, predict if user comments on that post

DMRelCRP topics lead to more accurate prediction

           Authorship            Commenting
Model      Twitter   Facebook    Twitter   Facebook
DMRelCRP   0.793     0.734       0.683     0.648
Timeline   0.718     0.669       0.582     0.579
L-LDA      0.521     0.432       0.429     0.482
LDA        0.647     -           0.542     -

Page 40

2C Major Event Detection

Page 41

2C Major Event Detection

Page 42

3 Analysis of Influences

Page 43

3A Global Personality Trends

Page 44

3A Global Personality Trends

[Figure annotations: Michael Jackson’s death, FIFA WC, Google Wave]

Page 45

3A Global Personality Trends

Page 46

3B Geo-specific Personality Trends

Personality trends very similar in UK and US

Geographic influences high at different epochs

Page 47

3B Geo-specific Personality Trends

India: World-wide and geographic influences weaker

China: World-wide weak, geographic strong; stable pattern

Page 48

3C Topic Character Trends

Page 49

3C Topic Character Trends

Page 50

3C Topic Character Trends

Page 51

Scaling with Data Size

Java-based multi-threaded framework; 7 threads

8-core 32 GB RAM

Scales largely because of multi-threading

Page 52

Summary

First attempt at studying user influences in social media data

New non-parametric model that captures multiple relationships and temporal evolution

Multi-threaded online Gibbs sampling algorithm

Extensive evaluation on large real dataset

Topics lead to better clustering and prediction

Insights on user influence patterns
