Question-Answering on Yahoo!Answers: Preliminary Results
Rong Tang
Sheila Denn
OCLC/ALISE LIS Research Grant Presentation
ALISE 2009
January 23, 2009
Question-Answering on Yahoo!Answers:
Users post questions, answer questions, rate answers, provide comments
One best answer chosen by the asker or through vote
Users may provide comments
Rating/Voting/Commenting
Our Research Project
Funded by OCLC/ALISE Grant Program and Simmons College President’s Fund for Research
The project wiki page documents the relevant literature and project progression, with extensive meeting notes on coding decisions
Are existing question taxonomies (such as those in Graesser et al. (1994) and Freed (1994)) valid in a social Q&A environment?
What are the relationships between the linguistic characteristics, functional properties, and subject content of the questions and the kinds of responses that they receive?
What are the characteristics of answers that are chosen as “best” answers?
What is the role of the social function vs. the information function in social Q&A?
What are the implications of the above for provision of library and information services?
Answer classification
Much less research here than with question classification
Criteria based on Yahoo!Answers comments (Kim et al., 2007)
Previous Research (cont.)
Formal studies of Online Q&A
Answerers: “specialists” vs. “synthesists” (Gazan, 2006)
Questioners: “seekers” vs. “sloths” (Gazan, 2007)
Question purpose (Graesser et al., 1994)
Filling knowledge gaps
Establishing and monitoring common ground
Coordinating social action
Directing the conversation and controlling attention
Research Plan
Data collection and sampling
Gathered a stratified random sample of 3,000 question-answer sets, including any comments
Stratified by 25 top-level categories assigned by Yahoo!Answers
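The sampling step can be illustrated with a minimal Python sketch. It assumes equal allocation across strata (3,000 / 25 = 120 per category), which the slides do not state; the function name `stratified_sample`, the record fields, and the toy population are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, total, seed=0):
    """Draw an equal-allocation stratified random sample.

    records: list of dicts; key: the field that names the stratum.
    Equal allocation is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    per_stratum = total // len(strata)
    sample = []
    for name in sorted(strata):
        pool = strata[name]
        # Take the whole stratum when it is smaller than its quota.
        k = min(per_stratum, len(pool))
        sample.extend(rng.sample(pool, k))
    return sample


# Toy population: 25 hypothetical categories with 500 question sets each.
population = [
    {"id": f"c{c}-q{i}", "category": f"category_{c:02d}"}
    for c in range(25)
    for i in range(500)
]
sample = stratified_sample(population, key="category", total=3000)
print(len(sample))  # 3000 (25 strata x 120 each)
```

Proportional allocation, where each category contributes in proportion to its size, would be an equally plausible reading of "stratified" here.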
Data coding
Content analysis at multiple levels
Data Analysis
Descriptive statistics will be produced for:
Frequency of answers provided per question
Average length of time to first answer
Distribution of subject categories
Distribution of question and answer types
Distribution of chosen answer types
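As a rough illustration of the first two descriptive statistics, the sketch below computes the mean number of answers per category and the mean time to first answer. The record layout, field names, and timestamps are invented for the example and are not the project's data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Hypothetical records: one dict per question with answer timestamps.
questions = [
    {"category": "pets", "asked": "2008-10-01 12:00",
     "answers": ["2008-10-01 12:05", "2008-10-01 13:00"]},
    {"category": "pets", "asked": "2008-10-02 09:00",
     "answers": ["2008-10-02 09:30"]},
    {"category": "travel", "asked": "2008-10-03 18:00", "answers": []},
    {"category": "travel", "asked": "2008-10-04 08:00",
     "answers": ["2008-10-04 08:10", "2008-10-04 08:20",
                 "2008-10-04 09:00", "2008-10-04 11:00"]},
]


def parse(ts):
    return datetime.strptime(ts, FMT)


def answers_per_category(qs):
    """Mean number of answers per question, grouped by category."""
    counts = defaultdict(list)
    for q in qs:
        counts[q["category"]].append(len(q["answers"]))
    return {cat: mean(n) for cat, n in counts.items()}


def mean_minutes_to_first_answer(qs):
    """Mean wait for the earliest answer; unanswered questions are skipped."""
    waits = []
    for q in qs:
        if q["answers"]:
            first = min(parse(a) for a in q["answers"])
            waits.append((first - parse(q["asked"])).total_seconds() / 60)
    return mean(waits)


per_category = answers_per_category(questions)
first_answer_wait = mean_minutes_to_first_answer(questions)
print(per_category, first_answer_wait)
```

Skipping unanswered questions when averaging the wait is a design choice of this sketch; treating them as censored observations would be another option.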
Correlation analysis will be performed for:
Linguistic characteristics of questions and answers
Functional categories of questions and answers
Subject categories of questions and answers
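The slides do not name a correlation measure. Because question and answer codes are categorical, one common option is Cramér's V, derived from the chi-square statistic of a contingency table. The sketch below is an assumption about how such an analysis might look, with made-up code labels.

```python
from collections import Counter
from math import sqrt


def cramers_v(pairs):
    """Cramér's V for a list of (x, y) categorical pairs (0 = none, 1 = perfect)."""
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(x for x, _ in pairs)
    cols = Counter(y for _, y in pairs)
    chi2 = 0.0
    for r, r_n in rows.items():
        for c, c_n in cols.items():
            expected = r_n * c_n / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical (question type, answer type) codes.
perfect = [("what", "fact")] * 10 + [("why", "opinion")] * 10
independent = (
    [("what", "fact")] * 5 + [("what", "opinion")] * 5
    + [("why", "fact")] * 5 + [("why", "opinion")] * 5
)
v_perfect = cramers_v(perfect)
v_independent = cramers_v(independent)
print(v_perfect, v_independent)
```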
Progress to Date
Sample has been collected
Preliminary coding has begun
Syntactic coding of questions is complete
Syntactic coding of question descriptions is complete
Number of questions included in description text
Type of questions
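The two description-level codes (number of questions and type of questions) were assigned by human coders. Purely as an illustration, a naive automated first pass might look like the sketch below; the word lists and the type labels ("yes/no", "other") are hypothetical and are not the project's coding scheme.

```python
import re

WH_WORDS = ("what", "why", "how", "who", "whom", "whose",
            "which", "when", "where")
AUX_WORDS = ("is", "are", "was", "were", "do", "does", "did", "can",
             "could", "will", "would", "should", "has", "have")


def split_questions(text):
    """Return the sentence-like chunks that end in a question mark."""
    chunks = re.findall(r"[^.!?]+[.!?]+", text)
    return [c.strip() for c in chunks if c.strip().endswith("?")]


def question_type(question):
    """Label a question by its first word: a wh-word, 'yes/no', or 'other'."""
    words = re.findall(r"[a-z']+", question.lower())
    if not words:
        return "other"
    if words[0] in WH_WORDS:
        return words[0]
    if words[0] in AUX_WORDS:
        return "yes/no"
    return "other"


description = "My dog barks a lot. Why does he do that? Is it normal?"
found = split_questions(description)
types = [question_type(q) for q in found]
print(len(found), types)
```

A rule this simple misses questions written without a question mark and mislabels informal openers, which is why consensus coding by people remains necessary.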
Data Coding
Two coders perform coding individually, then go over the coding to reach consensus on the final coding of each question
Use of informal language presents a challenge for coding
Is it a question if it doesn’t include a question mark? Is it a question simply because it has a question mark at the end?
Should “WTF” be coded as a “what” question or as an other question? Or not at all?
Coding multipart questions, e.g., “Why do husbands feel they have to lie to other women about being married, and when the other woman finds out?”
Double coding questions such as “Is there anywhere you can listen to citizen band radio online?”
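The slides describe consensus coding and do not report an agreement statistic. If one wanted to quantify how often the two coders agreed before reconciliation, Cohen's kappa is a standard choice. The sketch below uses invented codes and is an assumption, not part of the project's stated method.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders over the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical initial codes for six questions.
coder_a = ["what", "why", "yes/no", "what", "other", "why"]
coder_b = ["what", "why", "yes/no", "other", "other", "why"]

kappa = cohens_kappa(coder_a, coder_b)
# Items the coders would need to discuss to reach consensus.
to_discuss = [i for i, (a, b) in enumerate(zip(coder_a, coder_b)) if a != b]
print(round(kappa, 3), to_discuss)
```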
Preliminary Results
Number of Answers Per Question
Average Number of Answers per Question by Category
[Bar chart over the 25 top-level categories; value axis from 0 to 9. The value-to-category pairing is not preserved in this transcript.]
Values as listed: 8.2, 7.86, 7.14, 6.98, 6.92, 6.72, 6.46, 6.37, 6.28, 6.18, 6.08, 5.79, 5.51, 4.78, 3.84, 3.76, 3.68, 3.68, 3.65, 3.61, 3.28, 3.15, 2.98, 2.89, 2.63
Category labels as listed: pregnancyparenting, diningout, politicsgov, beautystyle, socialscience, environment, familyrelationships, pets, societyculture, fooddrink, newsevents, sports, entertainmentmusic, artshumanities, health, educationreference, homegarden, travel, gamrecreation, carstransportation, consumerelectronic, sciencemath, computerinternet, businessfinance, localbusiness
Length to Receive 1st Answer