Natural Language Processing @ Emory
Eugene Agichtein, Math & Computer Science and CCI
Andrew Post, CCI and Biomedical Engineering (?)
Projects in the IR Lab (Agichtein Lab)
Patterns in Text (Author Behavior)
Patterns in Search (Searcher Behavior)
– Structuring Information in Bio- and Medical Text
– Discovering Implicit Networks: Entity, Relation, and Event Extraction
– Content Creation and Discovery in Social Media
– Understanding Searcher Inference and Decision Process
– Question Answering
NLP & Text Mining Projects in IRLab
EMText: Information Extraction from Text in Electronic Medical Records
Other projects:
– Collaborative filtering for Med. Literature
– Recognizing textual entailment (TAC 2008 RTE track)
– Web-scale semantic network extraction
Information Extraction From EMR Text
Electronic Medical Records (EMRs) contain important metadata for analysis, data mining, and decision support
– Example: a patient who has had diabetes should have a different interpretation of MPI results; it depends on how long, how severe, and how long since it has been controlled
– This information often resides in the text of the EMR (physician/nurse reports, notes, discharge summaries)
Challenges:
– Access to data
– Inconsistent information
– Little or no manually labeled data
Participated in the I2B2 2008 NLP Obesity Challenge
– The Challenge: to build systems that will correctly replicate the textual and intuitive judgments of the obesity experts on obesity and [15] co-morbidities based on the narrative patient records.
Our approach: machine learning over lexical, semantic, and statistical features
– Words, phrases, UMLS terms in text
– Negation
– Corpus co-occurrence statistics
– SVM, boosting, TBL to combine predictions
Outcome:
– Much room for improvement exists both for accuracy and efficiency; great learning experience
I2B2 NLP Challenge 2010
User Behavior: The 3rd Dimension of the Web
Amount exceeds web content and structure
– Published: 4Gb/day; Social Media:
Instrumenting the Emory Library and Beyond
Evaluate effectiveness of search/discovery with behavioral metrics (task-specific)
– Perform aggregate, longitudinal studies
Develop tools for usability studies “in the wild”
– Scale (hundreds/thousands of “participants”)
– Realistic behavior and tasks
– On-demand playback of “interesting” sessions
Unified analysis/query framework for internal and external resource access and usage statistics
– Web-based query and statistics interface
– Access auditing, privacy, anonymity enforced
Emory User Behavior Analysis System (EUBA)
(Firefox toolbar)
– Data mining/machine learning components
– Log DB management system, web-based interface for querying, playback, annotation
Plan: to release the system to research/library community (Q2 2009)?
Simple features
Basic Features
– Trajectory length
– Horizontal range
– Vertical range
[Figure: a mouse trajectory annotated with its horizontal range, vertical range, and trajectory length]
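The three basic features can be illustrated with a minimal sketch. It assumes the trajectory is available as a time-ordered list of (x, y) cursor positions in pixels; the function name `basic_trajectory_features` and the input format are hypothetical and are not taken from the EUBA system.

```python
import math


def basic_trajectory_features(points):
    """Compute trajectory length, horizontal range, and vertical range.

    points: time-ordered list of (x, y) cursor positions (assumed format).
    """
    if not points:
        return {"trajectory_length": 0.0, "horizontal_range": 0.0, "vertical_range": 0.0}
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Trajectory length: sum of straight-line distances between consecutive samples.
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return {
        "trajectory_length": float(length),
        "horizontal_range": float(max(xs) - min(xs)),
        "vertical_range": float(max(ys) - min(ys)),
    }


sample_points = [(0, 0), (3, 4), (3, 10)]
basic = basic_trajectory_features(sample_points)
```

With real logs, the sampling rate of the cursor positions affects the measured length, so comparisons are only meaningful across sessions recorded the same way.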
Intelligent Information Access Lab
http://ir.mathcs.emory.edu/
Mouse Movement Representation Features
Second representation:
– 5 segments: initial, early, middle, late, and end
– Each segment: speed, acceleration, rotation, slope, etc.
[Figure: a mouse trajectory divided into segments 1 to 5]
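A rough sketch of the segment-based representation follows. The slides do not say how the five segments are delimited, so this example assumes segments with roughly equal numbers of samples, and it computes only two of the listed per-segment features (average speed and slope). The input format of (t, x, y) tuples and the name `segment_features` are hypothetical.

```python
import math

SEGMENT_NAMES = ["initial", "early", "middle", "late", "end"]


def segment_features(samples):
    """Compute average speed and slope for each of five trajectory segments.

    samples: list of (t, x, y) tuples, t in seconds and strictly increasing.
    Assumption: each segment holds roughly the same number of samples.
    """
    n = len(samples)
    k = len(SEGMENT_NAMES)
    if n < 2 * k:
        raise ValueError("need at least two samples per segment")
    features = {}
    for i, name in enumerate(SEGMENT_NAMES):
        seg = samples[i * n // k:(i + 1) * n // k]
        # Path length inside the segment, using only the (x, y) part of each sample.
        path = sum(math.dist(a[1:], b[1:]) for a, b in zip(seg, seg[1:]))
        duration = seg[-1][0] - seg[0][0]
        dx = seg[-1][1] - seg[0][1]
        dy = seg[-1][2] - seg[0][2]
        features[name] = {
            "speed": path / duration,
            # Slope of the net displacement; undefined for purely vertical moves.
            "slope": dy / dx if dx != 0 else None,
        }
    return features


samples = [(i * 0.1, i * 10.0, i * 5.0) for i in range(10)]
per_segment = segment_features(samples)
```

Acceleration and rotation could be added in the same loop by differencing speed and heading between consecutive samples.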
Summary of Experimental Results
Client-side behavior mining significantly outperforms aggregate, server-side measures for user intent detection and satisfaction tasks
Can be used even if user does not generate server-trackable action (e.g., click or download)
Feasible to perform inference on search instance vs. aggregating across different users/searchers
Outline
Overview of Intelligent Information Access Lab Research
– Information retrieval & extraction, text mining, and data integration
– User behavior modeling, interactions, and collaborative filtering
Some goals of mining social media
Find high-quality content
Find relevant and high-quality content
Use millions of interactions to
– Understand complex information needs
– Model subjective information seeking
– Understand cultural dynamics
Community
Editorial Quality != User Perception!
Lifecycle of a Question
[Flowchart: the user chooses a category, composes the question, and opens it. Other users post answers and rate them (+/-) while the asker examines the answers. Find the answer? Yes: the asker closes the question, chooses the best answers, and gives ratings. No: the question is closed by the system and the best answer is chosen by voters.]
Yahoo! Answers: The Good News
Active community of millions of users in many countries and languages
Accumulated a great number of questions and answers
Effective for subjective information needs
– Great forum for socialization/chat
(Can be) invaluable for hard-to-find information not available on web
Yahoo! Answers: The Bad News
May have to wait a long time to get a satisfactory answer
May never obtain a satisfying answer
[Figure: Time to close a question (hours) for sample question categories. Y-axis: time to close, 0 to 40 hours. X-axis categories: 1. 2006 FIFA World Cup; 2. Optical; 3. Poetry; 4. Football (American); 5. Scottish Football (Soccer); 6. Medicine; 7. Winter Sports; 8. Special Education; 9. General Health Care; 10. Outdoor Recreation]
The Problem of Asker Satisfaction
Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community.
– Where “Satisfied” is defined as: the asker personally has closed the question AND selected the best answer AND provided a rating of at least 3 “stars” for the best answer
– Otherwise, the asker is “Unsatisfied”
– The overall fraction of instances classified correctly into the proper class.
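The satisfaction definition above is a conjunction of three conditions, which can be sketched as a labeling function. The record type `QuestionRecord` and its field names are hypothetical and do not reflect the format of the released dataset. The `accuracy` helper implements the "fraction of instances classified correctly" measure from the last bullet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionRecord:
    # Hypothetical fields; the actual crawl format is not shown in the slides.
    closed_by_asker: bool
    best_answer_chosen_by_asker: bool
    best_answer_rating: Optional[int]  # stars, or None if no rating was given


def is_satisfied(q, min_stars=3):
    """Return True only if all three conditions of the definition hold."""
    return bool(
        q.closed_by_asker
        and q.best_answer_chosen_by_asker
        and q.best_answer_rating is not None
        and q.best_answer_rating >= min_stars
    )


def accuracy(predicted, actual):
    """Fraction of instances whose predicted class matches the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Any question that fails one condition, including a question closed by the system rather than the asker, falls into the "Unsatisfied" class.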
Dataset
Crawled from Yahoo! Answers in early 2008
Data is available at http://ir.mathcs.emory.edu/
Questions   Answers     Askers    Categories   % Satisfied
216,170     1,963,615   158,515   100          50.7%
Dataset (cont.)
Realistic prediction task: given an asker’s previous history, we try to predict satisfaction with her current (most recent) question
[Diagram: 216,170 questions, 1,963,615 answers, 158,515 askers, 100 categories; boxes labeled “most recent 10,000 questions”, “randomize”, “random 5000 questions”, “training”, “test”]
Dataset Statistics
Category         #Q    #A     #A per Q   Satisfied   Avg asker rating   Time to close by asker
-                -     -      7.68       70.9%       4.30               1 day and 13 hours
Mathematics      651   2329   3.58       44.5%       4.48               33 minutes
Diet & Fitness   450   2436   5.41       68.4%       4.30               1.5 days
Asker satisfaction varies significantly across different categories.
#Q, #A, Time to close… -> Asker Satisfaction
Human Satisfaction Prediction
Truth: asker’s rating
A random sample of 130 questions
Annotated by researchers to calibrate the asker satisfaction
– Agreement: 0.82
– F1: 0.45
Human Satisfaction Prediction (Cont’d): Amazon Mechanical Turk
A service provided by Amazon. Workers submit responses to a Human Intelligence Task (HIT) for a small fee
HIT:
– Used the same 130 questions
– For each question, list the best answer, as well as four other answers ordered by votes
– Five independent raters for each question.
– Agreement: 0.9; F1: 0.61.
– Best accuracy achieved when at least 4 out of 5 raters predicted asker to be “satisfied” (otherwise, labeled as “unsatisfied”).
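The 4-out-of-5 rule in the last bullet can be written as a small aggregation function. This is only a sketch of the thresholding step; the function name `aggregate_raters` and the boolean vote format are assumptions, and nothing here calls the Mechanical Turk service.

```python
def aggregate_raters(votes, min_satisfied=4):
    """Combine independent rater judgments into one label.

    votes: list of booleans, True when a rater predicted the asker was satisfied.
    The question is labeled "satisfied" only if at least `min_satisfied`
    raters agree; otherwise it is labeled "unsatisfied".
    """
    return "satisfied" if sum(bool(v) for v in votes) >= min_satisfied else "unsatisfied"
```

For example, `aggregate_raters([True, True, True, True, False])` yields "satisfied", while three positive votes out of five yield "unsatisfied". The threshold is stricter than a simple majority, which the slide reports as giving the best accuracy.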
Amazon Mechanical Turk
Comparison of Classifiers (F-score)
[Table columns: Classifier, With Text, Without Text, Selected Features]
C4.5 is the most effective classifier in this task
Human F1 performance is lower than the naïve baseline!
F1 (Satisfied) with varying training sizes
ASP_C4.5 substantially outperforms others
2000 questions is sufficient to achieve 0.75 F1
Features by Information Gain (Satisfied)
0.14219 Q: Asker’s previous rating
0.13965 Q: Average past rating by asker
0.10237 UH: Member since (interval)
0.04878 UH: Average # answers for past Q
0.04878 UH: Previous Q resolved for the asker
0.04381 CA: Average asker rating for the category
0.04306 UH: Total number of answers received
0.03274 CA: Average voter rating
0.03159 Q: Question posting time
0.02840 CA: Average # answers per Q
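For readers unfamiliar with the ranking criterion, here is a minimal sketch of information gain for a discrete feature: the entropy of the class labels minus the expected entropy after splitting on the feature. It is an illustration of the standard definition, not the tool used to produce the numbers above; numeric features such as ratings would first need to be discretized, which this sketch assumes has already been done.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that separates satisfied from unsatisfied askers perfectly on a balanced sample has a gain of 1 bit, and a feature unrelated to the label has a gain of 0.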
“Offline” vs. “Online” Prediction
Offline prediction:
– All features (question, answer, asker & category)
– F1: 0.77
Online prediction:
– all answer features
– question features (stars, #comments, sum of votes…)
– F1: 0.74
Feature Ablation
[Table columns: Precision, Recall, F1]
Caring or supportive answers might be preferred sometimes
Satisfaction with varying experience
Group together questions from askers with the same number of previous questions
Accuracy of prediction increases dramatically
Reaching F1 of 0.9 for askers with >= 5 questions
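The grouping step can be sketched as follows: bucket each test question by how many questions its asker had posted before, then compute F1 for the "satisfied" class within each bucket. The tuple format, the function names, and the choice to pool everything at or above a cap of 5 are assumptions made for illustration.

```python
from collections import defaultdict


def f1_score(predicted, actual):
    """F1 for the positive ("satisfied") class, given parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_experience(records, cap=5):
    """records: iterable of (num_previous_questions, predicted, actual) tuples.

    Askers with `cap` or more previous questions are pooled into one group.
    """
    groups = defaultdict(lambda: ([], []))
    for num_previous, predicted, actual in records:
        key = min(num_previous, cap)
        groups[key][0].append(predicted)
        groups[key][1].append(actual)
    return {key: f1_score(p, a) for key, (p, a) in sorted(groups.items())}
```

Plotting the resulting dictionary against the group key gives the kind of experience curve the slide describes.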
Summary
Asker satisfaction is predictable
– Can achieve higher than human accuracy by exploiting history
User’s experience is important
General model: one-size-fits-all
– 2000 questions for training model are enough
Current work
– Personalized satisfaction prediction
– Y. Liu, E. Agichtein. You've Got Answers: Towards Personalized Models for Predicting Success in Community Question Answering (ACL 2008)
ACL08
Textual features only become helpful for users with more than 20 questions
Personalized classifier achieves surprisingly good accuracy
For users with only 1 previous question, personalized classifiers work very well
Simple strategy of grouping users by number of previous questions is even more effective than other methods for users with moderate amount of history
For users with few questions, non-textual features are dominant
For users with lots of questions, textual features are more significant
Some personalized models
Other tasks
Subjectivity, sentiment analysis
– B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, in EMNLP 2008
Discourse analysis
Cross-cultural comparisons
CQA vs. web search comparison