Information Retrieval and Social Media Prof.dr.ir. Arjen P. de Vries [email protected] Lecture for the User-Centred Social Media Summer School Duisburg, September 19, 2017

Information Retrieval and Social Media

Jan 21, 2018

Page 1: Information Retrieval and Social Media

Information Retrieval and Social Media

Prof.dr.ir. Arjen P. de Vries

[email protected]

Lecture for the User-Centred Social Media Summer School

Duisburg, September 19, 2017

Page 2: Information Retrieval and Social Media

Social Media

Noun

social media (uncountable)

Interactive forms of media that allow users to interact with and publish to each other, generally by means of the Internet.

The early 21st century saw a huge increase in social media thanks to the widespread availability of the Internet.

Page 3: Information Retrieval and Social Media

Social Media

• “Social bookmarking” sites
• “User generated content”
- Images (Flickr) and videos (YouTube, Vimeo), but also blogs, Wikipedia, etc.
• Social network services
- Twitter, Facebook, Instagram, Snapchat

Page 8: Information Retrieval and Social Media

Not just one beast!

Page 9: Information Retrieval and Social Media

User contributed content

Page 10: Information Retrieval and Social Media

Permission based tagging, Set model

Page 11: Information Retrieval and Social Media

Bag model

Global Content

Free for all tagging
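The difference between the set model and the bag model can be made concrete with a few lines of Python. This is only a minimal sketch: the (user, item, tag) triples and the function names are hypothetical and not taken from the lecture. In the set model a tag is recorded at most once per item; in the bag model several users may assign the same tag, so tag frequencies are available.

```python
from collections import Counter

# Hypothetical tag assignments as (user, item, tag) triples.
assignments = [
    ("alice", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),
    ("carol", "photo1", "beach"),
]

def set_model(assignments, item):
    """Set model: each tag is recorded at most once per item."""
    return {tag for _, it, tag in assignments if it == item}

def bag_model(assignments, item):
    """Bag model: repeated tags are kept, so tag counts are available."""
    return Counter(tag for _, it, tag in assignments if it == item)

photo_set = set_model(assignments, "photo1")
photo_bag = bag_model(assignments, "photo1")
```

The counts in the bag model are what make tag-frequency weighting possible; the set model only records presence.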

Page 12: Information Retrieval and Social Media

Social Media to help improve IR (1)

Page 13: Information Retrieval and Social Media

‘Co-creation’ Social Media:

- Consumer becomes a co-creator
- Many ‘data consumption’ traces in social media are public

Page 14: Information Retrieval and Social Media

Richer information representations

Page 15: Information Retrieval and Social Media

Richer information representations

• User profiles
- User name, full name, description, image, homepage URL, etc.
• Connections between users
- Networks of friends, followers, etc.
• Comments/reactions
• Endorsing and sharing

Page 16: Information Retrieval and Social Media

E.g., Twitter

• Bio
- Often includes a geo-location of the profile
• Friends
• Followers
• Lists
- Groups followed Twitter accounts; lists can be followed
• Hashtags
• Mentions

Page 17: Information Retrieval and Social Media

User Demographics

• Gender from Tweet author’s first name
• Geographic location from profile

Diaz, Gamon, Hofman, Kiciman, Rothschild. Online and Social Media as an Imperfect Continuous Panel Survey. In PLOS ONE, 2016

Page 18: Information Retrieval and Social Media

Detailed User Characteristics…

de Volkskrant, March 13, 2013

Michal Kosinski, David Stillwell, and Thore Graepel. Private traits and attributes are predictable from digital records of human behavior. PNAS 2013.

Youyou, W., Kosinski, M. & Stillwell, D. Computer-based personality judgments are more accurate than those made by humans. PNAS 2015.

Page 19: Information Retrieval and Social Media

… in Search

• Age and Gender, and perhaps also political and religious views
• Maps both Page Likes from myPersonality dataset and search results on a common space of ODP categories
• Learning approach to overcome the difference in distribution between myPersonality data and Search data
- E.g., their FB dataset has 63% female, vs. only 47% in Bing

Bi, Kosinski, Shokouhi, Graepel. Inferring the Demographics of Search Users. WWW 2013
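To get a feeling for the size of the distribution gap mentioned on this slide, here is a small worked example using plain importance reweighting. It is not the learning approach of Bi et al.; it is only a sketch of the simplest possible correction. The male shares (37% and 53%) are assumed here as the complements of the female shares quoted above.

```python
# Shares quoted on the slide; male shares are assumed complements.
source_share = {"female": 0.63, "male": 0.37}  # Facebook (myPersonality) sample
target_share = {"female": 0.47, "male": 0.53}  # Bing search users

# Importance weight per group: target share divided by source share.
weights = {g: target_share[g] / source_share[g] for g in source_share}

# After reweighting, the source sample matches the target distribution.
reweighted = {g: source_share[g] * weights[g] for g in source_share}
```

With these numbers each female profile in the source sample counts for roughly 0.75 of a user and each male profile for roughly 1.43.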

Page 20: Information Retrieval and Social Media

Many Opportunities for IR

• Expand content representation
• Reduce the vocabulary gap(s) between creators of content (the indexers) and consumers of content (the users)
• More diverse views on the same content

Page 21: Information Retrieval and Social Media

LibraryThing

Items People Tags Ratings

Page 22: Information Retrieval and Social Media

Synonyms

Page 23: Information Retrieval and Social Media

Synonyms

Dissimilar users… … with similar items

(Pearson Correlation)

Note: this representation ignored the item ratings
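The idea of finding synonyms as tags with dissimilar users but similar items can be sketched with Pearson correlation over binary occurrence vectors (ratings ignored, as noted above). The two tags and all vectors below are made-up toy data, and the helper name `pearson` is an assumption for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant vectors
    return cov / sqrt(var_x * var_y)

# Toy data: 1 means the tag was applied to that item / by that user.
humour_items = [1, 1, 1, 0, 0]
humor_items = [1, 1, 0, 0, 0]
humour_users = [1, 1, 0, 0]
humor_users = [0, 0, 1, 1]

item_similarity = pearson(humour_items, humor_items)
user_similarity = pearson(humour_users, humor_users)
```

In this toy case the two spellings are positively correlated on items and negatively correlated on users, which is the pattern the slide associates with synonyms.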

Page 25: Information Retrieval and Social Media

Examples

• Humour

• Classic

Page 26: Information Retrieval and Social Media

IR to help improve Social Media

Page 27: Information Retrieval and Social Media

LibraryThing – beyond terms

Items People Tags Ratings

Page 28: Information Retrieval and Social Media

Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders. The task dependent effect of tags and ratings on social media access. TOIS 28, 4, article 21 (November 2010), 42 pages.

Page 29: Information Retrieval and Social Media

Search with Random Walk

Present nodes according to the estimated probability that a random walk, starting from (task-dependent) starting nodes, would end at this node
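A minimal sketch of this ranking idea, assuming a tiny hypothetical graph of two users and three items with uniform edge weights. The function name `walk_distribution` and the graph are illustrative only; the model in the cited TOIS paper is richer (it also includes tags and ratings).

```python
def walk_distribution(graph, start, steps):
    """Probability of being at each node after a fixed number of steps.

    graph: node -> {neighbour: edge weight}
    start: node -> starting probability (should sum to 1)
    """
    dist = dict(start)
    for _ in range(steps):
        nxt = {}
        for node, prob in dist.items():
            out = graph[node]
            total = sum(out.values())
            for neighbour, weight in out.items():
                nxt[neighbour] = nxt.get(neighbour, 0.0) + prob * weight / total
        dist = nxt
    return dist

# Hypothetical undirected user-item graph.
graph = {
    "u1": {"i1": 1.0, "i2": 1.0},
    "u2": {"i2": 1.0, "i3": 1.0},
    "i1": {"u1": 1.0},
    "i2": {"u1": 1.0, "u2": 1.0},
    "i3": {"u2": 1.0},
}

# Start at user u1 and walk three steps: user -> item -> user -> item.
dist = walk_distribution(graph, {"u1": 1.0}, steps=3)
ranked = sorted(dist, key=dist.get, reverse=True)
```

Item `i3`, which `u1` never touched, receives probability mass through the shared item `i2`; this is what turns the walk into a recommender.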

Page 30: Information Retrieval and Social Media

Tagging Relationships

Page 31: Information Retrieval and Social Media

Note: this representation used the item ratings in the user – item transitions

Page 32: Information Retrieval and Social Media

An item recommendation walk

Page 33: Information Retrieval and Social Media

Personalized Search

Assume a user who types a single tag as query

Page 34: Information Retrieval and Social Media

A soft clustering effect smoothly relates similar concepts before converging to the background probability

Page 35: Information Retrieval and Social Media

Homographs like “Java” are disambiguated because the walk starts in both the query tag and the target user

- So, content that matches the user’s preference is more likely to be found first
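The disambiguation effect can be illustrated with a single walk step on a toy graph. Everything here is hypothetical: the node names, the uniform transition probabilities, and the 50/50 split of the starting probability between query tag and user are assumptions made for this sketch.

```python
def one_step(graph, start):
    """Distribute each start node's probability uniformly over its neighbours."""
    nxt = {}
    for node, prob in start.items():
        neighbours = graph[node]
        for neighbour in neighbours:
            nxt[neighbour] = nxt.get(neighbour, 0.0) + prob / len(neighbours)
    return nxt

# The tag "java" is attached to a programming book and a travel guide;
# the user has previously interacted with two programming books.
graph = {
    "tag:java": ["item:java_programming", "item:java_travel_guide"],
    "user:ann": ["item:java_programming", "item:python_cookbook"],
}

tag_only = one_step(graph, {"tag:java": 1.0})
personalised = one_step(graph, {"tag:java": 0.5, "user:ann": 0.5})
```

Starting from the tag alone, both “Java” items tie; once half of the starting probability sits on the user, the programming book comes out ahead of the travel guide.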

Page 36: Information Retrieval and Social Media

Expert Finding on Twitter

• Empirical evidence demonstrates that a mix of tweet text, friends, followers and lists is most effective to infer expertise
• Expertise ground truth taken from Quora, where (many) users list their expertise and their social media accounts

Xu, Zhou and Lawless. Inferring your expertise from Twitter: combining multiple types of user activity. WI 2017

Page 37: Information Retrieval and Social Media

Multiple Social Networks

• Accounts linked via services like about.me and Quora
• Users explicitly list their multiple accounts in one profile
• Missing data addressed via non-negative matrix factorization (NMF)
- E.g., 57% list school in FB, 81% in LinkedIn
• Applied to various prediction tasks, e.g., topics users are interested in

Page 38: Information Retrieval and Social Media

Social Media to help improve IR (2)

Page 39: Information Retrieval and Social Media

Relevant for Search… (1/4)

• Wikipedia contains semantically very rich annotations:
- Wikipedia Categories, Lists
- Times (1930, 1931, 1932, etc.)
- Disambiguation pages
- Edit history
- Etc.

Note: DBPedia is “just” Wikipedia

Page 40: Information Retrieval and Social Media

Relevant for Search… (2/4)

• “Twanchor text”
- Tweets citing online media can be used as additional resources describing the content, just like anchor text
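As a rough sketch of how such “twanchor text” could be collected: group the text of tweets by the URL they cite, with the URL itself removed, so that the text can be indexed as an extra field of the cited page. The regular expression, the function name and the example tweets are assumptions for illustration; a real pipeline would also have to resolve shortened links.

```python
import re
from collections import defaultdict

# Simplified URL matcher, good enough for this toy example only.
URL_PATTERN = re.compile(r"https?://\S+")

def twanchor_text(tweets):
    """Map each cited URL to the surrounding tweet text (URLs stripped)."""
    by_url = defaultdict(list)
    for text in tweets:
        urls = URL_PATTERN.findall(text)
        context = " ".join(URL_PATTERN.sub(" ", text).split())
        for url in urls:
            by_url[url].append(context)
    return dict(by_url)

tweets = [
    "Great intro to random walks https://example.org/walks",
    "random walks explained https://example.org/walks #ir",
    "no link here",
]
result = twanchor_text(tweets)
```

Each list in `result` plays the same role for its URL as the collected anchor texts of incoming hyperlinks do in Web search.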

Page 41: Information Retrieval and Social Media

Relevant for Search… (3/4)

• Geotags / POIs
- Recommend geo-locations to people
- Recommend people to geo-locations
- Predict a user’s whereabouts (or “trails”)

Page 42: Information Retrieval and Social Media

Relevant for Search… (4/4)

• Timestamps
- Help reveal trends, e.g., which documents went viral?
- Allow searching “in the past”

Page 43: Information Retrieval and Social Media

Searching the Social Web

• Do not improve Web search with social annotations, but improve search in Social
• Builds on the observation in prior work (Goel et al., 2016) that virality is really different from popularity
- The most viral content is often distinct from the most popular content being shared online
- Can we surface that content more easily?

Alonso, Kandylas, Tremblay, Hofman, Sen. What’s Happening and What Happened: Searching the Social Web. WebSci ’17.

Page 45: Information Retrieval and Social Media

Pipeline

• Content selection:
- Select tweets that contain links and satisfy simple user, content and time range criteria
• User selection:
- Extract and normalize links and select those that have been shared by a minimum number of trusted users
• Link selection:
- Clean-up links, compute link virality and popularity, cluster similar links, and apply heuristic criteria to select good quality links
• Annotations:
- Generate metadata for the selected links from the associated tweets
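The first two stages of this pipeline can be sketched as follows. This is not the implementation of Alonso et al.: the normalisation rule (lower-case scheme and host, drop query string, fragment and trailing slash), the `min_users` threshold and the notion of a given set of trusted users are all assumptions for this example, and the virality, clustering and annotation stages are left out.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Simplified URL matcher, good enough for this toy example only.
URL_PATTERN = re.compile(r"https?://\S+")

def normalize(url):
    """Hypothetical normalisation: canonical scheme/host, no query or fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))

def select_links(tweets, trusted_users, min_users=2):
    """Keep links shared by at least `min_users` distinct trusted users."""
    sharers = {}
    for user, text in tweets:
        if user not in trusted_users:
            continue  # user selection: ignore untrusted accounts
        for url in URL_PATTERN.findall(text):  # content selection: tweets with links
            sharers.setdefault(normalize(url), set()).add(user)
    return {url: users for url, users in sharers.items() if len(users) >= min_users}

tweets = [
    ("ann", "Read this http://Example.org/story?utm=x"),
    ("bob", "http://example.org/story/ wow"),
    ("spam", "http://example.org/ad"),
    ("ann", "http://example.org/other"),
]
selected = select_links(tweets, trusted_users={"ann", "bob"})
```

Only the story link survives: it is the one link that two distinct trusted users shared once tracking parameters and the trailing slash are normalised away.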

Page 47: Information Retrieval and Social Media

Collecting Data

Page 48: Information Retrieval and Social Media

API Blues

Bit.ly API used in my own research:

/v3/link/content (deprecated)

Note: This endpoint was deprecated on 10/15/2014.

Page 49: Information Retrieval and Social Media

API Blues

• The combination of rate limits and Terms of Service of most social media platforms complicates our life
• Not even to mention volume
- TREC Microblog collection of 2013 “Tweets2013” consists of 107 GB compressed (for only 2 months of data!)
• Did I mention ToS?
- Mandatory continual processing of deletions…

Page 50: Information Retrieval and Social Media

Good News for Twitter

• The Internet Archive distributes two collections from 2013 that can be used as drop-in replacement for evaluation purposes
• Deletions seem to affect non-relevant documents more than relevant documents

Sequiera and Lin. Finally, a Downloadable Test Collection of Tweets. SIGIR 2017.

Page 51: Information Retrieval and Social Media

Social Media as Panel Survey

• Online population is a non-representative sample of the off-line world
• Demographic skew and user participation are non-stationary and difficult to predict over time
- E.g., women are underrepresented in the raw volume of tweets, but tweet more often about politics than men
- Half of the activity on a specific debate came from individuals who had not previously posted about the election

Diaz, Gamon, Hofman, Kiciman, Rothschild. Online and Social Media as an Imperfect Continuous Panel Survey. In PLOS ONE, 2016

Page 52: Information Retrieval and Social Media

API Blues

Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. ICWSM 2013

Page 56: Information Retrieval and Social Media

Take home message(s)

• Social media give access to a rich resource of context
- Including time & location!
• The academic’s alternative to click data?
• A big open research question:
  Can one theory (about matching users and content) address the complete spectrum of IR tasks that arise in social media?