Topical search in the Twitter OSN

Collaborators: Naveen Sharma, Parantapa Bhattacharya, Niloy Ganguly (IITKGP); Muhammad Bilal Zafar, Krishna Gummadi (MPI-SWS)

Saptarshi Ghosh

Dec 23, 2015

Topical search in Twitter

Twitter has emerged as an important source of information and real-time news

Search for breaking news and trending topics

Topical search: searching for topical experts, and searching for information on specific topics

Primary requirement: identify the topical expertise of users

Profile of a Twitter user

Example tweets

Prior approaches to find topic experts

Research studies:
Pal et al. (WSDM 2011) use 15 features from tweets and the network to identify topical experts
Weng et al. (WSDM 2010) use an ML approach

Application systems: Twitter Who To Follow (WTF), Wefollow, …
Methodology not fully public, but reported to utilize several features

Prior approaches use features extracted from:

User profiles: screen-name, bio, …

Tweets posted by a user: hashtags, others retweeting a given user, …

Social graph of a user: number of followers, PageRank, …

Problems with prior approaches

User profiles (screen-name, bio, …): the bio often does not give meaningful information

Tweets posted by a user: tweets mostly contain day-to-day conversation

Social graph of a user (number of followers, PageRank): helps to identify authoritative users, but does not provide topical information

We propose …

Use a completely different feature to infer the topics of expertise of an individual Twitter user

Utilize social annotations: how does the Twitter crowd describe a user?

Social annotations are obtained through Twitter Lists

The approach essentially relies on crowdsourcing

Twitter Lists

Primarily an organizational feature, used to organize the people one is following

Create a named List, and add an optional List description

Add related users to the List

Tweets posted by these users are grouped together as a separate stream

How Lists work

Using Lists to infer topics for users

If U is an expert / authority in a certain topic, U is likely to be included in several Lists

List names / descriptions provide valuable semantic cues to the topics of expertise of U

Inferring topical attributes of users

Dataset

Collected the Lists of 55 million Twitter users who joined in or before 2009

88 million Lists collected in total

All studies consider the 1.3 million users who are included in 10 or more Lists

Most List names / descriptions are in English, but a significant fraction are in French, Portuguese, …

Mining Lists to infer expertise

Collect the Lists containing a given user U

List names / descriptions are collected into a ‘topic document’ for U

Identify U’s topics from the document:
Ignore domain-specific stopwords
Identify nouns and adjectives
Unify similar words based on edit distance, e.g., journalists and jornalistas, politicians and politicos (not unified by stemming)
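The edit-distance unification step might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: words are merged greedily (most frequent first) into an existing canonical word when their Levenshtein distance is at most 30% of the longer word's length and both words have at least five characters. These thresholds and the function names `levenshtein` and `unify` are hypothetical, chosen only so that the slide's two examples merge.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = cur
    return prev[-1]


def unify(word_counts: dict, max_ratio: float = 0.3, min_len: int = 5) -> dict:
    """Merge near-duplicate words; thresholds are hypothetical assumptions."""
    canonical = []
    merged = {}
    # Most frequent words become the canonical spellings
    for word, count in sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = word
        if len(word) >= min_len:
            for c in canonical:
                if len(c) >= min_len and \
                        levenshtein(word, c) / max(len(word), len(c)) <= max_ratio:
                    target = c
                    break
        if target == word:
            canonical.append(word)
        merged[target] = merged.get(target, 0) + count
    return merged
```

For example, "journalists" and "jornalistas" are at distance 2 and "politicians" and "politicos" at distance 3, so both pairs fall under the assumed 30% threshold, while the minimum length keeps short words such as "news" and "new" apart.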

Mining Lists to infer expertise

Unigrams and bigrams are considered as topics

Extracted from the topic document of U: the topics for user U, and the frequencies of those topics in the document
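A minimal sketch of extracting unigram and bigram topic frequencies from a user's topic document, assuming the document is given as the list of names / descriptions of the Lists that include the user. `STOPWORDS` is a hypothetical stand-in for the domain-specific stopword list, and the noun/adjective filtering step is omitted because it would need a part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical stand-in for the domain-specific stopword list
STOPWORDS = {"the", "and", "of", "my", "list", "lists", "people", "twitter", "follow"}


def topic_counts(list_texts):
    """Count unigram and bigram topics over the List names/descriptions of one user."""
    counts = Counter()
    for text in list_texts:
        # Each List is tokenized separately, so no bigram spans two Lists
        words = re.findall(r"\w+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
        # Bigrams only from adjacent words that are both non-stopwords
        counts.update(
            f"{a} {b}" for a, b in zip(words, words[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        )
    return counts
```

Calling `topic_counts(texts).most_common(30)` would then give a user's top 30 topics with their frequencies.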

Topics inferred from Lists

linux, tech, open, software, libre, gnu, computer, developer, ubuntu, unix

politics, senator, congress, government, republicans, Iowa, gop, conservative

politics, senate, government, congress, democrats, Missouri, progressive, women

celebs, actors, famous, movies, comedy, funny, music, hollywood, pop culture

Lists vs. other features

Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool

Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture

Profile bio

Lists vs. other features

Most common words from tweets: Fallon, happy, love, fun, video, song, game, hope, #fjoln, #fallonmono

Most common words from Lists: celeb, funny, humor, music, movies, laugh, comics, television, entertainers

Profile bio

Evaluation of inferred topics (1)

Evaluated through a user survey

Evaluators were shown the top 30 topics for a chosen user and asked whether the inferred attributes are (i) accurate and (ii) informative, with a binary response for both questions

More than 93% of evaluators judged the topics to be both accurate and informative

The few negative judgments were a result of subjectivity

Evaluation of inferred topics (2)

Comparison with topics identified by Twitter WTF

Obtained the top 20 WTF results for about 200 queries: 3495 distinct users

Topics inferred by us from Lists include the query topic for 2916 users (83.4%)

For the rest:
Case 1: inferred topics include semantically very similar words, but not the exact query word (18%)
Case 2: wrong results by WTF, unrelated to the query (58%)

Comparison with Twitter WTF

Case 1:
Restaurant dineLA for query “dining”; inferred topics: food, restaurant, recipes, los angeles
Space explorer HubbleHugger77 for query “hubble”; inferred topics: science, tech, space, cosmology, nasa

Case 2:
Comedian jimmyfallon for query “astrophysicist”; inferred topics: celebs, comedy, humor, actor
Web developer ScreenOrigami for query “origami”; inferred topics: webdesign, html, designers

Who-is-who service

Developed a Who-is-Who service for Twitter

Shows a word cloud of the major topics for a user

http://twitter-app.mpi-sws.org/who-is-who/

Inferring Who-is-who in the Twitter Social Network, WOSN 2012 (highest-rated paper in the workshop)

Identifying topical experts

Topical experts in Twitter

400 million tweets posted daily

The quality of tweets posted by different users varies widely: news, pointless babble, conversational tweets, spam, …

Challenge: to find topical experts, i.e., sources of authoritative information on specific topics

Basic methodology

Given a query (topic):

Identify experts on the topic using Lists (discussed earlier)

Rank the identified experts w.r.t. their expertise on the given topic; this needs a suitable ranking algorithm, since commonly used ranking metrics such as number of followers and PageRank do not consider the topic

Ranking experts

Two components of ranking user U w.r.t. query Q: the relevance of U to Q, and the popularity of U

Relevance of user to query: cover density ranking between the topic document T_U of user U and Q (cover density ranking is preferred for short queries)

Popularity of user: number of Lists including the user

Ranking score = Topic relevance(T_U, Q) × log(#Lists including U)
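The ranking formula can be sketched in Python. This is a minimal sketch under assumptions, not the Cognos implementation: it uses a simplified form of cover density ranking in which every minimal span of the topic document that contains all query terms (a "cover") contributes 1, or K/length when the span is longer than K, with K = 16 as an assumed constant. The natural logarithm and the names `covers`, `cover_density` and `expert_score` are also assumptions.

```python
import math


def covers(tokens, query_terms):
    """Return the minimal spans (p, q) of `tokens` that contain every query term."""
    terms = set(query_terms)
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t] for t in terms}
    if not terms or any(not pos for pos in positions.values()):
        return []  # some query term never occurs, so there is no cover
    result, u = [], 0
    while True:
        # q: earliest point by which every term has occurred at or after u
        nexts = [next((i for i in positions[t] if i >= u), None) for t in terms]
        if None in nexts:
            return result
        q = max(nexts)
        # p: shrink the span from the left while it still contains every term
        p = min(max(i for i in positions[t] if i <= q) for t in terms)
        result.append((p, q))
        u = p + 1


def cover_density(tokens, query_terms, k=16):
    """Simplified cover density: short covers count 1, long ones K / length."""
    score = 0.0
    for p, q in covers(tokens, query_terms):
        length = q - p + 1
        score += k / length if length > k else 1.0
    return score


def expert_score(topic_doc_tokens, query_terms, num_lists):
    """relevance(T_U, Q) x log(#Lists including U), natural log assumed."""
    return cover_density(topic_doc_tokens, query_terms) * math.log(num_lists)
```

Because log(1) = 0, a user included in a single List scores zero under this form; the dataset described earlier only considers users included in 10 or more Lists.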

Cognos

Search system for topical experts in Twitter

Publicly deployed at http://twitter-app.mpi-sws.org/whom-to-follow/

Cognos: Crowdsourcing Search for Topic Experts in Microblogs, ACM International SIGIR Conference 2012

Cognos results for “politics”

Cognos results for “stem cell”

Cognos results for “earthquake”

Evaluation of Cognos

System evaluated ‘in the wild’

People were asked to try the system and give feedback

Evaluators were students and researchers from the researchers’ home institutes

Advantage: a large variety of queries was tried

Disadvantage: subjectivity in relevance judgments

User-evaluation of Cognos

Sample queries for evaluation

Evaluation results

Overall, 2136 relevance judgments over 55 queries; 1680 judged relevant (78.7%)

Large amount of subjectivity in evaluations: the same result for the same query received both relevant and non-relevant judgments

E.g., for the query “cloud computing”, Werner Vogels got 4 relevant and 6 non-relevant judgments

Cognos vs Twitter Who-to-follow

Evaluators were shown the top 10 results from both systems, with the result sets anonymized

Evaluators judged which is better, or whether both are good / both are bad

Queries were chosen by the evaluators themselves

27 distinct queries were asked at least twice; in total, they were asked 93 times

Judgment by majority voting
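Per-query majority voting over the evaluators' judgments can be sketched as follows. This is a minimal sketch, assuming each vote is a (query, judgment) pair such as ("linux", "cognos"). The deck does not say how split votes were handled, so returning `None` when the top judgments tie is an assumption, as are the `min_votes` filter (mirroring "asked at least twice") and the function name `majority_outcomes`; strictly, the code picks the most common judgment (a plurality).

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def majority_outcomes(votes, min_votes=2) -> Dict[str, Optional[str]]:
    """votes: iterable of (query, judgment) pairs.

    Returns {query: most common judgment, or None if the top judgments tie}.
    Queries with fewer than `min_votes` judgments are skipped.
    """
    by_query = defaultdict(list)
    for query, judgment in votes:
        by_query[query].append(judgment)
    outcomes: Dict[str, Optional[str]] = {}
    for query, judgments in by_query.items():
        if len(judgments) < min_votes:
            continue
        ranked = Counter(judgments).most_common()
        # A tie between the two most common judgments means no majority
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            outcomes[query] = None
        else:
            outcomes[query] = ranked[0][0]
    return outcomes
```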

Cognos vs Twitter WTF

Cognos judged better on 12 queries: Computer science, Linux, mac, Apple, ipad, India, internet, windows phone, photography, political journalist

Twitter WTF judged better on 11 queries: Music, Sachin Tendulkar, Angelina Jolie, Harry Potter, metallica, cloud computing, IIT Kharagpur (mostly names of individuals or organizations)

Tie on 4 queries: Microsoft, Dell, Kolkata, Sanskrit as an official language

Topical content search

Challenges in topical content search

Services today are limited to keyword search: a search for ‘politics’ gets only tweets that contain the word ‘politics’

Knowing which keywords to search for is itself an issue

Individual tweets are too short to deduce topics

Scalability: 400M tweets posted per day

Tweets may contain spam / rumors / phishing URLs

Our approach

Look at tweets posted by a selected set of topical experts

Inferring the topic of tweets from the tweeters’ expertise: a large fraction of tweets posted by experts are only about day-to-day conversation

Solution: if multiple experts on a topic tweet about something, it is most likely related to the topic

Sampling tweets from experts

We capture all tweets from 585K topical experts, identified through Lists, with expertise in a wide variety of topics

The experts generate 1.46 million tweets per day, 0.268% of all tweets on Twitter, so the approach is scalable

Trustworthiness: experts are not likely to post spam / phishing URLs, and there is less chance of rumors in what is posted by several experts

Methodology at a glance

Gather tweets from experts on the given topic

Group tweets on the same news story; we use a group of hashtags to represent a news story

Multi-level clustering (cluster: news story):
Cluster tweets based on the hashtags they contain
Cluster hashtags based on co-occurrence

Rank news stories by popularity:
Number of distinct experts tweeting on the story
Number of tweets on the story
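As a rough illustration, the two clustering levels and the popularity ranking can be sketched as below. This is a minimal sketch, not the authors' system: it assumes each expert tweet is already reduced to an (expert, hashtags) pair, merges two hashtags into the same story (via union-find) when they co-occur in at least `min_cooccur` tweets, a hypothetical threshold, attaches each tweet to every hashtag cluster it mentions, and ranks stories by distinct experts with tweet count as the tie-breaker. The names `rank_stories` and `_find` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def rank_stories(tweets, min_cooccur=2):
    """tweets: iterable of (expert_id, hashtags) pairs.

    Returns (hashtags, distinct_experts, tweet_count) per story, most popular first.
    """
    tweets = [(e, {h.lower() for h in tags}) for e, tags in tweets if tags]
    parent, pair_counts = {}, Counter()
    for _, tags in tweets:
        for h in tags:
            parent.setdefault(h, h)
        pair_counts.update(combinations(sorted(tags), 2))
    # Level 1: merge hashtags that co-occur in at least `min_cooccur` tweets
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            parent[_find(parent, a)] = _find(parent, b)
    members = defaultdict(set)
    for h in parent:
        members[_find(parent, h)].add(h)
    # Level 2: attach each tweet to every hashtag cluster it mentions
    experts, counts = defaultdict(set), Counter()
    for expert, tags in tweets:
        for root in {_find(parent, h) for h in tags}:
            experts[root].add(expert)
            counts[root] += 1
    # Rank by distinct experts, breaking ties by number of tweets
    ranked = sorted(members, key=lambda r: (len(experts[r]), counts[r]), reverse=True)
    return [(sorted(members[r]), len(experts[r]), counts[r]) for r in ranked]
```

Ranking on distinct experts first follows the heuristic stated earlier: a story tweeted by several experts on the topic is more likely to be about that topic than one tweeted many times by a single expert.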

Results for the last week on politics (a popular topic)

Related tweets grouped together by common hashtags.

The most popular tweet in the story shown

Hashtags which co-occur frequently grouped together

Our system performs especially well for niche topics.

Evaluation: relevance

Evaluated using human feedback

Used Amazon Mechanical Turk for user evaluation; evaluated the top 10 clusters for 20 topics

Users had to judge whether the tweet shown was relevant to the given topic; the options were Relevant / Not Relevant / Can’t Say

Evaluating tweet relevance

We obtained 3150 judgments

80% of tweets were marked relevant by majority judgment

Non-relevant results were primarily due to:
Global events that were discussed by experts across all topics, e.g., Hurricane Sandy in the USA
Topics that are too specific, where several experts tweet on a broader topic (e.g., baseball and ESPN Sports Update)

Effect of global events

Experts on all topics tweeting on #sandy; most of these got negative judgments

Diversity of topics in Twitter

Topics in Twitter

Discovering thousands of experts on diverse topics enables characterizing the Twitter platform as a whole

On what topics is expert content available in Twitter?

Popular view: a few topics such as politics, sports, music, celebs, …

We find: lots of niche topics along with the popular ones

Topics in Twitter: from the major topics (what Twitter is mostly known for) to a wide variety of niche topics

Thank You

Contact: [email protected]