Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Jan 27, 2015

Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement, Mohammad Ali Abbasi,
Arizona State University
http://dmml.asu.edu
Transcript
Page 1: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Data Mining and Machine Learning in a Nutshell
Arizona State University Data Mining and Machine Learning Lab

LEARNING TO RECOGNIZE RELIABLE USERS AND CONTENT IN SOCIAL MEDIA WITH COUPLED MUTUAL REINFORCEMENT

Mohammad-Ali Abbasi
http://www.public.asu.edu/~mabbasi2/

SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING
ARIZONA STATE UNIVERSITY

http://dmml.asu.edu/

Page 2: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

About the paper

• Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement
– Jiang Bian, Georgia Institute of Technology
– Yandong Liu, Emory University
– Ding Zhou, Facebook Inc.
– Eugene Agichtein, Emory University
– Hongyuan Zha, Georgia Institute of Technology

• WWW 2009, April 20–24, 2009, Madrid, Spain.

Page 3: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Community Question Answering (CQA)

• A popular forum where users pose questions for other users to answer

• Users can ask questions in natural language

• Comparable to regular web search

Page 4: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Sample: Yahoo! Answers

• Introduction

Page 5: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

What is the problem?

• Retrieve answers from a social media archive containing a large amount of information
– The quality, accuracy, and comprehensiveness of the submitted questions and answers vary widely
– A large fraction of the content is not useful for answering queries
– Current approaches require large amounts of manually labeled data

Page 6: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

CQA environment

• Users

• Questions

• Answers

Page 7: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

The goal

• Identify
– High-quality answers
– High-quality questions
– High-reputation users

• Simultaneously

• With minimal manual labeling

Page 8: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

The contribution of this paper

• Develops a semi-supervised coupled mutual reinforcement framework that calculates content quality and user reputation simultaneously and requires relatively few labeled examples to initialize the training process

• Is more effective than previous approaches at finding high-quality answers, questions, and users

• Improves the accuracy of search over CQA archives

Page 9: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Current approaches

• Rely on user reputation,

• OR require large amounts of supervision,

• OR focus on the network properties of the CQA community

• without considering the actual content of the information exchanged

Page 10: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

How to rank?

• Current approaches:
– Content quality
OR
– User reputation

• This paper:
– Content quality
AND
– User reputation

Page 11: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Definitions

• Question Quality
– A question's effectiveness at attracting high-quality answers

• Answer Quality
– The responsiveness, accuracy, and comprehensiveness of the answer to a question

• Question Reputation
– The expected quality of the questions posted by a user

• Answer Reputation
– The expected quality of the answers posted by a user
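
The definitions above tie user reputation to content quality. The following is a minimal sketch of that relationship, assuming that "expected quality" can be estimated as a plain average of known quality scores. The toy data, the function names, and the averaging rule are illustrative assumptions, not the paper's update equations.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: answer id -> (author, question id, quality in [0, 1]).
answers = {
    "a1": ("alice", "q1", 0.9),
    "a2": ("bob", "q1", 0.3),
    "a3": ("alice", "q2", 0.7),
}
# Hypothetical mapping: question id -> user who asked it.
askers = {"q1": "carol", "q2": "bob"}


def answer_reputation(answers):
    """Answer reputation: average quality of the answers each user posted."""
    by_user = defaultdict(list)
    for author, _question, quality in answers.values():
        by_user[author].append(quality)
    return {user: mean(scores) for user, scores in by_user.items()}


def question_quality(answers):
    """Question quality: average quality of the answers a question attracted."""
    by_question = defaultdict(list)
    for _author, question, quality in answers.values():
        by_question[question].append(quality)
    return {q: mean(scores) for q, scores in by_question.items()}


def question_reputation(q_quality, askers):
    """Question reputation: average quality of the questions each user asked."""
    by_user = defaultdict(list)
    for question, quality in q_quality.items():
        by_user[askers[question]].append(quality)
    return {user: mean(scores) for user, scores in by_user.items()}


print(answer_reputation(answers))
print(question_reputation(question_quality(answers), askers))
```

In this sketch the dependency runs only from answer quality outward; in a mutual reinforcement setting the reputation estimates would also feed back into the quality estimates.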

Page 12: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Model the problem

• Solution

Page 13: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Mutual reinforcement Principle

• Solution

Page 14: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Feature Space: X(Q), X(A), X(U)

• Solution

Page 15: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Learning quality and reputation (Coupled Mutual Reinforcement)

• P(x): probability of being “good”

• Model of P(x)

• B holds the coefficients of the linear model and can be found by maximizing:
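
The formulas for this slide did not survive in the transcript. Assuming the slide refers to a standard logistic (log-odds) linear model, which the mention of coefficients fitted by likelihood maximization suggests, they would take roughly the following form. Here x is the feature vector of a question, answer, or user, y_i is a 0/1 label for example i, and β stands for the coefficients the slide calls B; this notation is an assumption and is not copied from the slide.

```latex
% Assumed logistic model for P(x)
\log \frac{P(x)}{1 - P(x)} = \beta^{T} x
\quad\Longleftrightarrow\quad
P(x) = \frac{1}{1 + e^{-\beta^{T} x}}

% Assumed log-likelihood maximized over \beta
\ell(\beta) = \sum_{i} \Big[ y_i \log P(x_i) + (1 - y_i) \log\big(1 - P(x_i)\big) \Big]
```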

Page 16: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Non-independent equations

• Conditional log-likelihood

• Objective function

Page 17: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

CQA-MR Algorithm

• Solution

Page 18: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Experimental Setup- Data Collection

• Data collected from Yahoo! Answers through its API

• TREC QA benchmark questions used to crawl the QA archive (http://trec.nist.gov/data.html)

• All available answers retrieved for each question
– 107293 users
– 27354 questions
– 224617 answers

Page 19: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Evaluation Metrics

• Mean Reciprocal Rank (MRR)
– The reciprocal of the rank at which the first relevant answer was returned, or 0 if none of the top N results contained a relevant answer, averaged over all queries

• Precision at K
– For a given query, P(K) reports the fraction of answers ranked in the top K results that are labeled as relevant

• Mean Average Precision (MAP)
– The mean of the precision at K values calculated after each relevant answer was retrieved, averaged over all queries
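
The three metrics above can be sketched in a few lines. This is an illustrative implementation, assuming each query's results are given as a list of booleans in ranked order (True meaning "labeled relevant"); the function names and the toy rankings are hypothetical.

```python
def reciprocal_rank(relevance, n=None):
    """1 / rank of the first relevant result, or 0.0 if none is in the top n."""
    top = relevance if n is None else relevance[:n]
    for rank, is_relevant in enumerate(top, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(relevance, k):
    """Fraction of the top k results that are relevant (assumes k > 0)."""
    return sum(relevance[:k]) / k


def average_precision(relevance):
    """Mean of the precision values taken at the rank of each relevant result."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_over_queries(metric, rankings, **kwargs):
    """Average a per-query metric over all queries (gives MRR or MAP)."""
    return sum(metric(r, **kwargs) for r in rankings) / len(rankings)


# Hypothetical relevance judgments for two queries.
rankings = [[False, True, True], [True, False, False]]
print("MRR:", mean_over_queries(reciprocal_rank, rankings))
print("P@2:", mean_over_queries(precision_at_k, rankings, k=2))
print("MAP:", mean_over_queries(average_precision, rankings))
```

Note that this version of average precision divides by the number of relevant results actually retrieved, which matches the wording of the slide; some definitions divide by the total number of relevant results for the query instead.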

Page 20: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

User reputation methods

• Baseline
– Users are ranked by "indegree" (number of answers posted)

• HITS
– Users are ranked by their authority scores

• CQA-Supervised
– A classifier trained over the features separates users into those with "high" and "low" reputation

• CQA-MR
– Predicts user reputation with the mutual reinforcement algorithm
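
The first two methods can be illustrated with a short sketch. It assumes the community is modeled as a directed graph with one edge from the asker to the answerer per answer posted, so that good answerers accumulate authority; that graph construction, the toy edges, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical (asker, answerer) pairs: one pair per answer posted.
edges = [
    ("carol", "alice"),
    ("bob", "alice"),
    ("carol", "bob"),
    ("dave", "alice"),
]


def indegree_ranking(edges):
    """Baseline: rank users by the number of answers they posted."""
    counts = Counter(answerer for _asker, answerer in edges)
    return [user for user, _count in counts.most_common()]


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {user: v / norm for user, v in scores.items()}


def hits_authority(edges, iterations=50):
    """Plain HITS power iteration; returns the authority score of each user."""
    users = {user for edge in edges for user in edge}
    hub = {user: 1.0 for user in users}
    auth = {user: 1.0 for user in users}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the users one has answered.
        new_auth = dict.fromkeys(users, 0.0)
        for asker, answerer in edges:
            new_auth[answerer] += hub[asker]
        auth = _normalize(new_auth)
        # Hub: sum of the authority scores of the users who answered one.
        new_hub = dict.fromkeys(users, 0.0)
        for asker, answerer in edges:
            new_hub[asker] += auth[answerer]
        hub = _normalize(new_hub)
    return auth


print(indegree_ranking(edges))
authority = hits_authority(edges)
print(sorted(authority, key=authority.get, reverse=True))
```

Both rankings use only the link structure; neither looks at the content of the questions or answers.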

Page 21: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

CQA Retrieval methods

• Baseline
– Score computed as the difference between up votes and down votes

• GBrank
– Does not include answer and question quality or user reputation

• GBrank-HITS
– GBrank optimized by adding user reputation calculated by the HITS algorithm

• GBrank-Supervised
– GBrank optimized by adding quality obtained through supervised learning
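
Of the methods above, only the vote-based baseline is simple enough to show directly. A minimal sketch, assuming each answer is a dictionary with hypothetical "up" and "down" vote counts:

```python
def rank_by_votes(answers):
    """Baseline: order answers by (up votes - down votes), highest first."""
    return sorted(answers, key=lambda a: a["up"] - a["down"], reverse=True)


# Hypothetical candidate answers for one query.
candidates = [
    {"id": "a1", "up": 3, "down": 1},
    {"id": "a2", "up": 10, "down": 2},
    {"id": "a3", "up": 0, "down": 4},
]
print([a["id"] for a in rank_by_votes(candidates)])
```

The GBrank variants are learned rankers and are not sketched here.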

Page 22: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Precision at K for the top contributors

• Experiments

Page 23: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Precision at K

• Experiments

Page 24: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Accuracy

• Experiments

Page 25: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Training Labels

• Experiments

Page 26: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Training Labels

• Experiments

Page 27: Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

Mohammad-Ali Abbasi (Ali) is a Ph.D. student at the Data Mining and Machine Learning Lab, Arizona State University. His research interests include data mining, machine learning, social computing, and social media behavior analysis.

http://www.public.asu.edu/~mabbasi2/