Using Citizen Science to organize biomedical knowledge Andrew Su, Ph.D. @andrewsu [email protected] http://sulab.org March 5, 2015 Future of Genomic Medicine Slides posted at slideshare.net/andrewsu
Transcript
Page 1: Using Citizen Science to organize biomedical knowledge

Using Citizen Science to organize biomedical knowledge

Andrew Su, Ph.D.
@andrewsu
[email protected]
http://sulab.org

March 5, 2015
Future of Genomic Medicine

Slides posted at slideshare.net/andrewsu

Page 2: Using Citizen Science to organize biomedical knowledge

Candidate genes

FLNB, CTNNB1, EPHA3, SMAD3, XPO1, RPS27, FLCN, ATR, FLT3, BRD2, ERG, RAF1, EGFR, ERBB4, RARA, JAK3, LRP1, WT1, PML, SMARCA4

Page 3: Using Citizen Science to organize biomedical knowledge

The biomedical literature is growing fast…

[Figure: number of new PubMed-indexed articles per year, 1983 to 2013; vertical axis from 0 to 1,200,000]

Page 4: Using Citizen Science to organize biomedical knowledge

… but it is very hard to query and compute

Page 5: Using Citizen Science to organize biomedical knowledge

… but it is very hard to query and compute

Imatinib, Crizotinib, Erlotinib, Gefitinib, Sorafenib, Lapatinib, Dasatinib

AND

Acute myeloid leukemia, Acute lymphoblastic leukemia, Chronic myelogenous leukemia, Chronic lymphocytic leukemia, Hodgkin lymphoma, Non-Hodgkin lymphoma, Myeloma
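A query like the one on this slide can be assembled programmatically. The sketch below is illustrative only: it assumes the terms within each list are joined with OR and the two lists with AND (the slide shows only the AND), and the helper name `build_query` is hypothetical rather than part of any PubMed tool.

```python
def build_query(drugs, diseases):
    """Join each list with OR, then join the two groups with AND (assumed semantics)."""
    def or_group(terms):
        # Quote each term so multi-word names are treated as phrases.
        return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

    return or_group(drugs) + " AND " + or_group(diseases)


# Terms copied from the slide.
drugs = [
    "Imatinib", "Crizotinib", "Erlotinib", "Gefitinib",
    "Sorafenib", "Lapatinib", "Dasatinib",
]
diseases = [
    "Acute myeloid leukemia", "Acute lymphoblastic leukemia",
    "Chronic myelogenous leukemia", "Chronic lymphocytic leukemia",
    "Hodgkin lymphoma", "Non-Hodgkin lymphoma", "Myeloma",
]

query = build_query(drugs, diseases)
pair_count = len(drugs) * len(diseases)  # drug-disease combinations the query spans

print(query)
print(pair_count, "drug-disease pairs")
```

Even when such a query runs, it returns a list of articles rather than a computable answer about which drug relates to which disease, which is the difficulty the slide points at.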

Page 6: Using Citizen Science to organize biomedical knowledge

Pathways, Diseases, Proteins, Variants, Genes, Drugs

Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.

Page 7: Using Citizen Science to organize biomedical knowledge

Information Extraction

1. Identify high level concepts in text

2. Identify relationships between concepts

Page 8: Using Citizen Science to organize biomedical knowledge

NCBI Disease Corpus

• 593 PubMed abstracts
• 12 expert annotators (2 per document)
• 6,900 “disease concept” mentions

Doğan and Lu. Proceedings of the 2012 Workshop on BioNLP, 2012, 91-9.

Page 9: Using Citizen Science to organize biomedical knowledge

Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts?

Page 10: Using Citizen Science to organize biomedical knowledge

Amazon Mechanical Turk (AMT)

Requester, Amazon, Workers

1. Create tasks
2. Execute
3. Aggregate

Page 11: Using Citizen Science to organize biomedical knowledge

Experimental design

Task: Identify the “disease concepts” in the 593 abstracts from the NCBI disease corpus

– $0.06 per Human Intelligence Task (HIT)
– HIT = annotate one abstract from PubMed
– 15 workers annotate each abstract

Page 12: Using Citizen Science to organize biomedical knowledge

Comparison to gold standard

K = 6
F score = 0.87

• 593 documents
• 15 users / doc
• 9 days
• 145 workers
• $630.96

[Figure: precision and recall]
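The aggregation and scoring behind these numbers can be sketched as follows. This is a minimal illustration, not the analysis code used for the talk: it assumes K is the minimum number of workers (out of the 15 per abstract) who must mark the same mention for it to be kept, it treats mentions as plain strings, and the function names and toy data are made up for the example.

```python
from collections import Counter


def aggregate_by_vote(worker_annotations, k):
    """Keep every mention that at least k workers marked."""
    votes = Counter()
    for mentions in worker_annotations:
        votes.update(set(mentions))  # one vote per worker per mention
    return {mention for mention, count in votes.items() if count >= k}


def precision_recall_f(predicted, gold):
    """Compare an aggregated annotation set against the gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data (hypothetical): three workers and a threshold of 2,
# instead of the 15 workers and K = 6 on the slide.
workers = [
    {"breast cancer", "BRCA1"},
    {"breast cancer", "ovarian cancer"},
    {"breast cancer", "ovarian cancer", "tumor"},
]
gold = {"breast cancer", "ovarian cancer", "hereditary cancer"}

consensus = aggregate_by_vote(workers, k=2)
precision, recall, f_score = precision_recall_f(consensus, gold)
print(sorted(consensus), round(precision, 2), round(recall, 2), round(f_score, 2))
```

Raising the threshold generally trades recall for precision, which is why a single operating point such as K = 6 is reported alongside the F score.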

Page 13: Using Citizen Science to organize biomedical knowledge

Comparisons to text-mining algorithms

[Figure: F score for text-mining algorithms and for the AMT experiments]

Page 14: Using Citizen Science to organize biomedical knowledge

Comparisons to human annotators

Average level of agreement between expert annotators (stage 1)

F = 0.76

Page 15: Using Citizen Science to organize biomedical knowledge

Comparisons to human annotators

F = 0.76
F = 0.87

Average level of agreement between expert annotators (stage 2)

Page 16: Using Citizen Science to organize biomedical knowledge

Does Mechanical Turk scale?

1,000,000 articles per year
× 10 annotators / article
× 4 tasks / doc
× $0.06 / task
= $2,400,000 / year
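The yearly figure is the product of the four quantities on the slide. A short check, with variable names chosen for this example and the price kept in cents to avoid floating-point rounding:

```python
articles_per_year = 1_000_000
annotators_per_article = 10
tasks_per_doc = 4
cost_per_task_cents = 6  # $0.06 per task

total_cents = (
    articles_per_year
    * annotators_per_article
    * tasks_per_doc
    * cost_per_task_cents
)
total_dollars = total_cents // 100

print(f"${total_dollars:,} / year")  # $2,400,000 / year
```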

Page 17: Using Citizen Science to organize biomedical knowledge

Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts, and will they do it for free?

Page 18: Using Citizen Science to organize biomedical knowledge

http://mark2cure.org

Page 19: Using Citizen Science to organize biomedical knowledge

Mark2Cure Campaign #0

• Goal: replicate the NCBI disease corpus
– 593 documents, 15x redundancy
• Launched Jan 19, 2015
• Completed Feb 16, 2015
– 4 weeks
– 10,275 document annotation events
– 212 unique users

Page 20: Using Citizen Science to organize biomedical knowledge

Comparison to gold standard

[Figure: precision and recall, axes from 0 to 1, labeled by voting threshold]

k = 6
F score = 0.84

Total cost: $0

Page 21: Using Citizen Science to organize biomedical knowledge

Does Citizen Science scale?

AE = Annotation events

Volunteers needed
= (number of annotation events per year) / (number of annotation events per year per volunteer)
= (1,000,000 articles × 10 AE / article) / ((10,275 AE × 365 days) / (212 annotators × 28 days))
= 15,828 volunteers needed
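The estimate can be reproduced from the campaign numbers. The sketch below follows the slide's own assumption that the per-volunteer pace of the 28-day campaign would hold for a full year; the variable names are illustrative.

```python
# Demand: annotation events (AE) needed per year.
articles_per_year = 1_000_000
ae_per_article = 10
ae_needed_per_year = articles_per_year * ae_per_article

# Supply: AE per volunteer per year, extrapolated from Campaign #0.
campaign_ae = 10_275
campaign_annotators = 212
campaign_days = 28
ae_per_volunteer_per_year = (campaign_ae * 365) / (campaign_annotators * campaign_days)

volunteers_needed = ae_needed_per_year / ae_per_volunteer_per_year

print(round(ae_per_volunteer_per_year, 1))  # about 631.8 AE per volunteer per year
print(round(volunteers_needed))             # 15828
```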

Page 22: Using Citizen Science to organize biomedical knowledge

Does Citizen Science scale?

15,828 volunteers needed

175,000 volunteers
300,000 volunteers
37,000 volunteers
1,000,000 volunteers

Page 23: Using Citizen Science to organize biomedical knowledge

Annotating the relationships

This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.

subject: GENE
predicate: therapeutic target
object: DISEASE
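One way to hold such a relationship in computable form is a typed subject-predicate-object record. The sketch below is an assumption-laden illustration: the class name, field names, and the choice of "CDK9" and "hematologic malignancies" as the annotated spans are inferred from the passage above, not taken from the Mark2Cure data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A typed subject-predicate-object statement extracted from text."""
    subject: str
    subject_type: str
    predicate: str
    obj: str
    obj_type: str

    def as_triple(self):
        # Drop the type labels and return the bare triple.
        return (self.subject, self.predicate, self.obj)


# Hypothetical annotation of the final sentence of the passage.
relation = Relation(
    subject="CDK9",
    subject_type="GENE",
    predicate="therapeutic target",
    obj="hematologic malignancies",
    obj_type="DISEASE",
)

print(relation.as_triple())
```

Records like this are the edges of the gene, disease, and drug network described earlier in the talk; the frozen dataclass makes them hashable, so duplicates from multiple annotators can be collapsed with a set.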

Page 24: Using Citizen Science to organize biomedical knowledge

Candidate genes

FLNB, CTNNB1, EPHA3, SMAD3, XPO1, RPS27, FLCN, ATR, FLT3, BRD2, ERG, RAF1, EGFR, ERBB4, RARA, JAK3, LRP1, WT1, PML, SMARCA4

Page 25: Using Citizen Science to organize biomedical knowledge

Mark2Cure
Ben Good, Max Nanis, Ginger Tsueng, Chunlei Wu
All Mark2Curators!

Other group members
Cyrus Afrasiabi, Sebastian Burgstaller, Ramya Gamini, Louis Gioia, Salvatore Loguercio, Adam Mark, Erick Scott, Greg Stupp, Andra Waagmeester, Kevin Xin

Matt and Cristina Might
NGLY1 community

Funding and Support
BioGPS: GM83924
Gene Wiki: GM089820
BD2K Center of Excellence: GM114833

Contact
http://sulab.org
[email protected]
@andrewsu
+Andrew Su

Icon credits (Noun Project, Wikimedia Commons): Zach VanDeHey, hunotika, Viktorvoigt, Alberto Rojas, Lloyd Humphreys

Page 26: Using Citizen Science to organize biomedical knowledge

Why do I Mark2Cure?

I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.

My 4 year old daughter Phoebe is living with and battling rare disease.

I have Ehlers Danlos Syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.

Take part in something that helps humanity.

I Mark2Cure in memory of my son Mike who had type 1 diabetes.

Studied biology in college and I really miss it!

In memory of my daughter who had Cystic Fibrosis

To give back