text mining, machine learning, NLP and all that (in 10 minutes) Byron C Wallace Brown Center for Evidence Based Medicine #CochraneTech

Dec 17, 2014

Byron C Wallace, from #CochraneTech Symposium, Québec 2013
Transcript
Page 1: Text mining, machine learning, NLP and all that (in 10 minutes)

text mining, machine learning, NLP and all that (in 10 minutes)

Byron C Wallace
Brown Center for Evidence Based Medicine

#CochraneTech

Page 2

why do we need this stuff?

[Bastian et al, PLoS Medicine 2010]

eleven systematic reviews. every day.

Page 4

PubMed growth

[http://altmetrics.org/wp-content/uploads/2010/10/medline-articles-by-year-lg.png]

Page 7

what can we automate?

Page 8

abstracts from PubMed search

doctor conducting review

manually screened abstracts

SVM

how does this work?

Page 9

SVMs
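
A minimal sketch of how a linear SVM could be trained to separate relevant from irrelevant abstracts. It assumes subgradient descent on the regularized hinge loss; the slides do not say which solver was used. The function names, hyperparameters, and the two toy feature vectors (hypothetical counts of two words) are illustrative only.

```python
def train_linear_svm(X, y, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss.

    X: list of numeric feature vectors; y: labels in {+1, -1}.
    Returns the weight vector w and the bias b.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score < 1:
                # Inside the margin or misclassified: move towards the example.
                w = [wi + lr * (label * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * label
            else:
                # Correct with enough margin: only apply regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (relevant) or -1 (irrelevant) for feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data: each vector holds counts of two made-up words.
X = [[2, 0], [0, 2]]
y = [1, -1]  # +1 = relevant abstract, -1 = irrelevant abstract
w, b = train_linear_svm(X, y)
```

Calling predict(w, b, x) on a new count vector then gives the screening decision for that abstract.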

Page 10

bag of words
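
A minimal sketch of the bag-of-words idea: each abstract becomes a vector of word counts and word order is thrown away. The tokenizer (lowercase alphabetic tokens), the function names, and the two example abstracts are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, count vectors); word order is discarded."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors


# Hypothetical example abstracts.
abstracts = [
    "Randomized trial of aspirin",
    "Aspirin and aspirin alternatives",
]
vocabulary, vectors = bag_of_words(abstracts)
```

Each row of vectors lines up with vocabulary, so a classifier sees only how often each word occurs in an abstract.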

Page 11

special considerations for the case of systematic reviews

• class imbalance
  – far fewer relevant than irrelevant abstracts
  – asymmetric costs: sensitivity more important than specificity

• reviewer time is scarce and expensive
  – better models, fewer labels: active learning and dual supervision
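
The two considerations above can be sketched in a few lines. This is an illustration under stated assumptions, not the method used in the talk: the labels and classifier scores are made-up toy data, lowering the decision threshold stands in for asymmetric costs, and uncertainty sampling (asking the reviewer about the abstract whose score is closest to 0.5) stands in for active learning.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Compute (sensitivity, specificity) at a decision threshold.

    labels: 1 = relevant, 0 = irrelevant; scores: classifier outputs in [0, 1].
    Assumes both classes are present in labels.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def most_uncertain(scores, unlabeled_indices):
    """Uncertainty sampling: index of the score closest to 0.5."""
    return min(unlabeled_indices, key=lambda i: abs(scores[i] - 0.5))


# Hypothetical toy data: 2 relevant abstracts among 8 (class imbalance).
labels = [1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.7, 0.1, 0.4, 0.38, 0.05, 0.15]

# The default threshold misses a relevant abstract; a lower one catches it
# at the price of more irrelevant abstracts to screen by hand.
default_result = sensitivity_specificity(labels, scores, 0.5)
lowered_result = sensitivity_specificity(labels, scores, 0.35)

# Next abstract to show the reviewer, assuming none are labeled yet.
next_to_label = most_uncertain(scores, range(len(scores)))
```

Because missing a relevant study costs more than screening an extra irrelevant one, the lower threshold is the better trade here even though specificity drops.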

Page 12

how do we do?

we can achieve 100% sensitivity while substantially reducing workload

“Towards Modernizing the Systematic Review Pipeline: Efficient Updating via Data Mining” Genetics in Medicine 2012

Page 14

beyond citation screening