infrastructure for communicating data-intensive science

Jan 23, 2017

Brian Bot
Transcript
Page 1: infrastructure for communicating data-intensive science

infrastructure for communicating data-intensive science

brian m. bot | senior scientist | community manager | sage bionetworks

clearScience

Page 2: infrastructure for communicating data-intensive science

Sage Bionetworks

a non-profit organization which pilots a variety of components that are necessary to build a scientific research “commons”

why?

Page 3: infrastructure for communicating data-intensive science

“We must guard against the acquisition of unwarranted influence, whether sought or unsought, by the Military Industrial Complex”

- Dwight D. Eisenhower, 1961

Medical

Page 4: infrastructure for communicating data-intensive science

not conducive for a ‘commons’

Page 5: infrastructure for communicating data-intensive science

institutional incrementalism

individual tenure

proprietary shortsighted solutions

not conducive for a ‘commons’

Page 6: infrastructure for communicating data-intensive science

enabling a commons

open data

accessible platform

clear communication

Page 7: infrastructure for communicating data-intensive science

“The problem is that right now, it’s not easy to donate your data to health research.”

“The goal of Consent to Research is to play a part in the transformation of health from something we experience passively to something we experience actively.”

John Wilbanks, Chief Commons Officer

http://weconsent.us

open data

Page 8: infrastructure for communicating data-intensive science

open data

accessible platform

clear communication

enabling a commons

Page 9: infrastructure for communicating data-intensive science

accessible platform

a collaborative compute space that allows scientists to share and analyze data together

Page 10: infrastructure for communicating data-intensive science

open data

accessible platform

clear communication

enabling a commons

Page 11: infrastructure for communicating data-intensive science

clear communication

Page 12: infrastructure for communicating data-intensive science

Deception at Duke

Page 13: infrastructure for communicating data-intensive science

research scandals represent merely the extreme of a continuum in the culture of academic research

Page 14: infrastructure for communicating data-intensive science

the status quo tolerates poor communication of findings

[pie chart] 6% | 21% | 8% | 11% | 54%

categories: cannot reproduce | can reproduce in principle | can reproduce w/ discrepancies | can reproduce from processed data w/ discrepancies | can reproduce partially

Ioannidis J.P.A. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149-155 (2009) | doi:10.1038/ng.295

Page 15: infrastructure for communicating data-intensive science

208,294,724 datapoints

124 pages supplemental material

?? lines unobtainable source code

?? version or architecture of statistical analysis program (R)

innumerable R packages and package dependencies

key R package “ClaNC” no longer available

442 citations

often what is reproducible in principle is not reproducible in practice

unidentified publication
‣ from journal with 5 year impact factor of 28
‣ article freely available for download
‣ data freely available for download
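
The unknowns listed on this slide (program version, architecture, packages and their dependencies) are the kind of detail that can be recorded automatically when an analysis runs. The following is a minimal sketch in Python, not the workflow used for the publication above or by Sage Bionetworks; the function names and the manifest file name are hypothetical, chosen only for illustration. In R, `sessionInfo()` reports comparable information.

```python
import json
import platform
from importlib import metadata


def capture_environment():
    """Describe the interpreter, the machine, and every installed package."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "language": "Python",
        "version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "architecture": platform.machine(),
        "operating_system": platform.system(),
        "packages": dict(sorted(packages.items())),
    }


def write_manifest(path="environment_manifest.json"):
    """Save the environment description as JSON (hypothetical file name)."""
    manifest = capture_environment()
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest
```

Calling `write_manifest()` at the end of an analysis script leaves a small JSON file that can be archived alongside the data and source code, so a later reader knows which versions produced the result.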

Page 16: infrastructure for communicating data-intensive science

how are we to move science forward if we cannot understand what was done previously?

Page 17: infrastructure for communicating data-intensive science

let’s go back to basics

Page 18: infrastructure for communicating data-intensive science

scientific method

1. define a question
2. gather information and resources (background research)
3. form a hypothesis
4. test hypothesis experimentally
5. analyze experimental data
6. draw conclusions based on data
7. publish results
8. retest (frequently done by other scientists)

Page 19: infrastructure for communicating data-intensive science

7. publish results

Page 20: infrastructure for communicating data-intensive science

in finite

∞...

Page 21: infrastructure for communicating data-intensive science

experimentally generate data

store on local server

analyze on local machine

write a document

submit to journal

sent to reviewers as pdf

printed on paper

accepted & digitally typeset

static pdf representation

static html representation

Page 22: infrastructure for communicating data-intensive science

scientific claims are being artificially uncoupled from science itself

Page 23: infrastructure for communicating data-intensive science

clearScience

re-imagining scientific communication

allow consumption of content at a variety of levels of complexity and abstraction

leverage Synapse RESTful APIs

Page 24: infrastructure for communicating data-intensive science

clearScience

allow consumption of content at a variety of levels of complexity and abstraction

“hand the keys over” to the reviewers

Page 25: infrastructure for communicating data-intensive science
Page 26: infrastructure for communicating data-intensive science

scientific communication needs to evolve

Page 27: infrastructure for communicating data-intensive science

needs to evolve along with science

Page 28: infrastructure for communicating data-intensive science

“Scientists often study the past as obsessively as historians because few other professions depend so acutely on it. Every experiment is a conversation with a prior experiment, every new theory a refutation of the old”

- Siddhartha Mukherjee, The Emperor of All Maladies

Page 29: infrastructure for communicating data-intensive science

Acknowledgements

Sage Bionetworks

David Burdick - Senior Software Engineer

Stephen Friend - President and CEO

Erich S. Huang - Director of Cancer Research

Michael Kellen - Director of Technology

External Partners

Myles Axton - Nature Genetics

Phil Bourne - PLoS Computational Biology

Josh Greenberg - Alfred P. Sloan Foundation

Kelly LaMarco - Science Translational Medicine

Eric Schadt - Mount Sinai School of Medicine