Identifying Relevant Social Media Content: Leveraging Information Diversity and User Cognition Munmun De Choudhury 1, Scott Counts 2 & Mary Czerwinski 2 1 Rutgers, The State University of New Jersey 2 Microsoft Research, Redmond


May 20, 2020

Page 1:

Identifying Relevant Social Media Content: Leveraging Information Diversity and User Cognition

Munmun De Choudhury1, Scott Counts2 & Mary Czerwinski2

1Rutgers, The State University of New Jersey
2Microsoft Research, Redmond

Page 2:

6/8/11  2 

Modern Social Interactional Modes

Facebook, Slashdot, Engadget, Flickr, LiveJournal, Digg, YouTube, Blogger, MetaFilter, Reddit, MySpace, Orkut, Twitter

Page 3:

140 characters can cause revolutions

Page 4:

During the elections in Iran

Page 5:

And during the earthquake in Haiti

Page 6:

However, the social web is changing at a fast rate

Page 7:

And what exactly is changing? 

Page 8:

New people appear 

Page 9:

New ties are formed

Page 10:

New interactional data appears too!

{pink, story, design}

{visualization, Environment}

Page 11:

We are attracted to social media, in part due to large-scale datasets

Page 12:

Is there something more fundamental happening here than just scale? 

Page 13:

This talk is about selecting content that matters

Page 14:

“Information overload” problem – Get me the right content!

Page 15:

How do we identify the most “relevant” or “best” items on a topic, from millions and even billions of units of social media content?

Page 16:

Let’s contrast this with a familiar example 

Page 17:

Discrete, regular and fixed sampling lattice

• Shannon-Nyquist sampling theorem: “If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.”
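
As a quick worked instance of the theorem quoted above, the sample spacing 1/(2B) can be computed directly. The 4 kHz band limit below is an illustrative assumption, not a figure from the talk, and `nyquist_interval` is a hypothetical helper name.

```python
def nyquist_interval(bandwidth_hz: float) -> float:
    """Largest sample spacing (in seconds) that still determines a signal
    with no frequencies above `bandwidth_hz`, i.e. 1 / (2B)."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return 1.0 / (2.0 * bandwidth_hz)


# Hypothetical example: a signal band-limited to B = 4 kHz.
spacing = nyquist_interval(4000.0)
print(f"sample every {spacing * 1e6:.0f} microseconds")
```

The contrast drawn in the talk is that social media activity has no comparable band limit B, so no fixed spacing of this kind is available.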

Page 18:

Time to sample each pixel is constant 

Page 19:

Note that web activity has no notion of bandwidth!

Page 20:

Interfaces / tools                                    #Responses
Twitter website                                       50
Twitter clients, such as Tweetdeck, Twitterific etc.  25
Search engines, such as Bing Social                   19
Third party apps, such as Twitter plugin for Google

Uni-dimensional information presentation; but social media information is diverse.

Page 21:

Characteristics of social media – high dimensionality

Geography

Authority

Conversational nature

Thematic category

Information Diversity

[Simon 1971, Zaichkowsky 1985, Jost 2006]

Page 22:

Also, social media content selection needs to benefit from mechanisms of human cognition

Page 23:

“Goodness of a set” – using measures of human information processing

Engagement, Memory encoding, Interestingness, Informativeness

Page 24:

How do we select such content that matches a certain degree of information diversity?

Page 25:

Dimensional Importance

•  Survey-based feedback on the importance of different dimensions – referred to as “concentration parameters”.
   –  Participants (11 ‘active’ Twitter users) were asked to rate each of the tweet dimensions on a scale of 1 through 7, where 1 implied “not important at all”, and 7 meant “highly important”.
   –  The survey also allowed them to identify other dimensions that they considered significant.
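
The slide does not say how the 1-to-7 survey ratings are turned into dimension weights. One plausible reading, shown here purely as an assumption, is to average each dimension's ratings across participants and normalise the averages to sum to 1. The function name, the dimension keys, and the ratings below are all made up for illustration.

```python
from statistics import mean


def dimension_weights(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Average each dimension's 1-7 ratings and normalise the means to sum to 1.

    Assumption: this averaging-and-normalising step is NOT taken from the
    talk; it is just one simple way to derive weights from survey ratings.
    """
    for dim, scores in ratings.items():
        if not scores or any(not 1 <= s <= 7 for s in scores):
            raise ValueError(f"ratings for {dim!r} must lie on the 1-7 scale")
    means = {dim: mean(scores) for dim, scores in ratings.items()}
    total = sum(means.values())
    return {dim: m / total for dim, m in means.items()}


# Invented ratings from three hypothetical participants.
example = {
    "geography": [2, 3, 4],
    "authority": [6, 7, 5],
    "thematic_category": [6, 6, 6],
}
weights = dimension_weights(example)
print(weights)
```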

Page 26:

Social Media Content Selection

•  Every tweet t_i is represented as a vector of its dimensions and their corresponding weights.

•  We propose an iterative clustering for tweet set generation, based on an entropy distortion minimization technique.
   –  The sets are constructed given a sampling ratio ρ and a diversity parameter value ω.
   –  The (sub-)optimal set to be constructed is represented as Ψ_S*(ρ, ω).

•  Start with a random tweet as a seed.
•  Iteratively keep adding tweets t_i from Ψ_S such that the distortion (in terms of L1-norm) of the entropy of the set (say, Ψ_S(i, ω)) on addition of the tweet t_i is least with respect to the specified diversity measure ω.
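
The greedy procedure on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slide does not define how the entropy of a tweet set is computed, so the sketch sums the tweet vectors, normalises the result into a distribution over dimensions, and scales its Shannon entropy to [0, 1] so that it is comparable with ω. The names `normalized_entropy` and `select_tweets` are hypothetical.

```python
import math
import random


def normalized_entropy(vectors):
    """Shannon entropy of the summed feature mass, scaled to [0, 1].

    Assumption: this set-level entropy is a stand-in; the talk does not
    spell out its exact definition.
    """
    dims = len(vectors[0])
    totals = [sum(v[d] for v in vectors) for d in range(dims)]
    mass = sum(totals)
    if mass == 0 or dims < 2:
        return 0.0
    probs = [t / mass for t in totals if t > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(dims)


def select_tweets(pool, rho, omega, seed=None):
    """Greedily pick about rho * len(pool) tweet indices from `pool`.

    Starts from a random seed tweet, then repeatedly adds the tweet whose
    inclusion leaves |entropy(set) - omega| smallest.
    """
    if not pool:
        return []
    rng = random.Random(seed)
    target = max(1, round(rho * len(pool)))
    remaining = list(range(len(pool)))
    chosen = [remaining.pop(rng.randrange(len(remaining)))]  # random seed tweet
    while len(chosen) < target and remaining:
        current = [pool[j] for j in chosen]
        best = min(
            remaining,
            key=lambda i: abs(normalized_entropy(current + [pool[i]]) - omega),
        )
        remaining.remove(best)
        chosen.append(best)
    return chosen


# Invented three-dimensional tweet vectors, for illustration only.
pool = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.3, 0.3, 0.4],
    [0.8, 0.0, 0.2],
]
picked = select_tweets(pool, rho=0.5, omega=0.1, seed=7)
print(picked)
```

Because the seed tweet is random and each step is a local choice, the result is only a sub-optimal set, which matches the "(sub-)optimal" wording on the slide.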

Page 27:

How does this method compare to state-of-the-art techniques?

Twitter Firehose, June 2010, total 1.4 billion tweets

Page 28:

Quantitative evaluation framework

•  We defined a set of baseline techniques using simplified versions of our proposed algorithm:
   –  Random set (B1)
   –  Randomly sampled diversity level (B2)
   –  Equal weighting of tweet dimensions (B3)
   –  Another two methods were used: “most recent” tweets (MR) and “most tweeted URL” (MTU), meaning the tweets corresponding to URLs that were highly shared in the network
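
Three of the baselines above are simple enough to sketch directly. The record fields (`text`, `timestamp`, `url`) and the function names are assumptions made for illustration; B2 and B3 are variants of the proposed algorithm itself and are not covered here.

```python
import random
from collections import Counter


def random_set(tweets, k, seed=None):
    """B1: k tweets drawn uniformly at random, without replacement."""
    return random.Random(seed).sample(tweets, k)


def most_recent(tweets, k):
    """MR: the k tweets with the latest timestamps."""
    return sorted(tweets, key=lambda t: t["timestamp"], reverse=True)[:k]


def most_tweeted_url(tweets, k):
    """MTU: tweets whose URL is shared most often, most-shared URL first."""
    with_url = [t for t in tweets if t.get("url")]
    counts = Counter(t["url"] for t in with_url)
    return sorted(with_url, key=lambda t: counts[t["url"]], reverse=True)[:k]


# Invented records, for illustration only.
tweets = [
    {"text": "a", "timestamp": 1, "url": "http://example.com/x"},
    {"text": "b", "timestamp": 2, "url": None},
    {"text": "c", "timestamp": 3, "url": "http://example.com/x"},
    {"text": "d", "timestamp": 4, "url": "http://example.com/y"},
]
print([t["text"] for t in most_recent(tweets, 2)])
print([t["text"] for t in most_tweeted_url(tweets, 2)])
```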

Page 29:

Subjective Evaluation

Page 30:

Cognitive metrics

•  Explicit Measures. Explicit measures consisted of three 7-point Likert scale ratings made after reading each tweet set:
   –  “interestingness”
   –  “informativeness”

•  Implicit Measures.
   –  Cognitive Engagement [Czerwinski 2001] – ideally, if the information presented in a tweet sample is very engaging, the participant would underestimate the time taken to go through it.
   –  Recognition Memory for tweets already shown – related to encoding in long-term memory [Sperling 1973, Smith 1979].
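
The cognitive engagement measure rests on comparing how long a participant thinks a tweet set took to read with how long it actually took. The slide gives only this qualitative idea; the relative-underestimation formula below is an assumed operationalisation, and the timings are invented.

```python
def time_underestimation(actual_seconds: float, estimated_seconds: float) -> float:
    """Fraction by which the participant underestimated the reading time.

    Positive values mean the time felt shorter than it was (read here as
    higher engagement); negative values mean it felt longer.
    Assumption: this exact formula is illustrative, not from the talk.
    """
    if actual_seconds <= 0:
        raise ValueError("actual time must be positive")
    return (actual_seconds - estimated_seconds) / actual_seconds


# Hypothetical participant: took 120 s, guessed 90 s.
score = time_underestimation(120.0, 90.0)
print(score)
```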

Page 31:

User Study…

Page 32:

User Study…

Page 33:

User Study…

Page 34:

Hypothesis I. Tweet sets generated by the proposed method will be better than those from baseline methods.

Page 35:

Performance Evaluation

Page 36:

Performance Evaluation (Contd.)

Interactions  Interestingness  Informativeness  Cognitive engagement  Recognition Memory
B1 × PM       0.002            0.009            0.007                 0.097
B2 × PM       0.027            0.117            0.011                 0.105
B3 × PM       0.241            0.351            0.138                 0.411
MR × PM       0.0003           <0.0001          0.003                 0.005
MTU × PM      0.061            0.171            0.004                 0.214

Testing for statistical significance: one-tail paired t-test; significance level p<0.1.
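
To read the table against the stated threshold, the values can be checked mechanically. This sketch assumes the table entries are p-values for each baseline compared with the proposed method (PM), which the slide implies but does not state outright; "<0.0001" is stored as its upper bound.

```python
# Values copied from the table above; the names below are hypothetical.
METRICS = (
    "interestingness",
    "informativeness",
    "cognitive engagement",
    "recognition memory",
)
P_VALUES = {
    "B1": (0.002, 0.009, 0.007, 0.097),
    "B2": (0.027, 0.117, 0.011, 0.105),
    "B3": (0.241, 0.351, 0.138, 0.411),
    "MR": (0.0003, 0.0001, 0.003, 0.005),  # 0.0001 stands in for "<0.0001"
    "MTU": (0.061, 0.171, 0.004, 0.214),
}
ALPHA = 0.1  # threshold stated on the slide


def significant_metrics(baseline: str) -> list[str]:
    """Metrics on which the comparison with `baseline` falls below ALPHA."""
    return [m for m, p in zip(METRICS, P_VALUES[baseline]) if p < ALPHA]


for name in P_VALUES:
    print(name, significant_metrics(name))
```

Under this reading, the comparisons with B1 and MR clear the threshold on all four metrics, while the comparison with B3 (equal weighting of dimensions) clears it on none.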

Page 37:

Hypothesis II: Participants will perceive the diversity of sets by our method more accurately than by baselines.

Page 38:

Diversity Perception

          B1             B2             B3             Proposed Method
          d’    Error    d’    Error    d’    Error    d’    Error
ω = 0.1   2.8   20.6%    2.2   11.1%    2.1   8.8%     1.1   7.8%
ω = 0.6   1.7   47.5%    2.9   28.1%    3.3   20.8%    5.4   13.6%
ω = 0.9   5.1   20.6%    5.5   14.6%    6.1   9.5%     6.8   7.3%

Perceived diversity is more accurate for highly heterogeneous and highly homogeneous tweet samples. Diversity perception is better for our proposed method.
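
The slide does not define d’. Assuming it is the standard signal-detection sensitivity index, it is the difference between the z-scores of the hit rate and the false-alarm rate. The sketch below uses only the Python standard library; the rates are invented, and the talk's own d’ values may have been computed differently.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float, eps: float = 1e-4) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf

    def clamp(rate: float) -> float:
        # Keep rates away from 0 and 1, where the inverse normal CDF is undefined.
        return min(max(rate, eps), 1.0 - eps)

    return z(clamp(hit_rate)) - z(clamp(false_alarm_rate))


# Hypothetical participant: 90% hits, 20% false alarms.
print(round(d_prime(0.9, 0.2), 2))
```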

Page 39:

Hypothesis III: Participants’ responses will be affected by the level of diversity in the various tweet sets shown.

Page 40:

Impact of Diversity on User Response

Participant ratings on different cognitive aspects of information consumption seem to be higher for highly homogeneous and highly heterogeneous information samples

Page 41:

Conclusions

•  Content selection methodologies for large social spaces that incorporate cognitive metrics of content consumption can enable the design of better content exploration interfaces.
   –  Information diversity is key
   –  Users appear to cognitively encode information better when presented with samples of high or low diversity

Page 42:

Open Questions

Page 43:

Are there empirical bounds on what degrees of diversity in a sample best suit content consumption?

Page 44:

Does the information space seem to exhibit entropy signatures?

Page 45:

If so, can these entropy signatures guide the content selection methodology more adequately?

Page 46:

Questions?

For details: [email protected]
Web: http://www.public.asu.edu/~mdechoud/
Twitter: @munmun10

Page 47:

Appendix 

Page 48:

Qualitative evaluation

@Paramedic_Fla  Some oil spill events from Monday, June 7, 2010 http://bit.ly/cRwfXn

@miamiauto  Some oil spill events from Monday, June 7, 2010: A summary of events on Monday, June 7, Day 48 of the Gulf of Mexi... http://bit.ly/9HNG9Z

@franklanguage  RT @DAYLEE F@CK  that! Broken pipe is not NATURAL! RT @RayBeckermanFreedomWorks CEO, Calls Oil Spill Natural Disaster http://bit.ly/coUY4l

@Teasdallqrb  Public offers 'helpful' ideas on containing BP oil spill - NEWS.com.au

@_paigenesss  RT @TEDchris: A Gulf oil spill picture I will never forget. http://twitpic.com/1toz8a

@LeiaOfAlderaan  Citizen Speaks The Truth ON BP Gulf Oil Spill--the Govt, BP Are Doing Nothing, There Are No Leaders Here http://bit.ly/BP-Gulf-Oil-Spill

@Faustinagwlxo  WOOW! NO WAY! so brutal! http://ilil.me/h MTV Movie Summer Jam WWDC Oil Spill Xtina Another Cinderella Story

@minxdeluxe  RT @OliBarrett: Visualizing the BP Oil Spill http://www.ifitwasmyhome.com/

[Twitter search-alike] Most Recent tweets  [Bing-alike] Most tweeted URL-containing tweets

@JosephAGallant  Erin Brockovich to meet with fishermen who say oil spill dispersant used by BP made them sick. http://huff.to/aGVWIl #tcot #BP #oilspill

@dixie_patriot  Oil spill cap catching about 10,000 barrels a day|LONDON ? BP's oil spill cap, designed to stop a huge leak from .. http://oohja.com/xeWhD

@MoCuad  My heart breaks all over again, every time I'm reminded of the oil spill.

@NFGNL  Looking for Liability in BP's Gulf Oil Spill: White Collar Watch examines the potential criminal and civil liab.. http://nyti.ms/9lUMaT

@jameelee  How You Can Volunteer to Clean Up the Gulf of Mexico Oil Spill http://ow.ly/1V3cu

@conchkid  Gulf;Oil Spill Many Federal Judges Have Links To Oil Industry http://bit.ly/9v45UT

@NewsOnGreen  BP Oil Spill: Containment Cap To Be Replaced Next Month http://dlvr.it/1WDZ8

@TrinitySaveNeo  Citizen Speaks The Truth ON BP Gulf Oil Spill--the Govt, BP Are Doing Nothing, There Are No Leaders Here http://bit.ly/BP-Gulf-Oil-Spill

Proposed Method (user-weighted; ω=0.1; ordered)  Proposed Method (user-weighted; ω=0.6; ordered)