Looking for Data: Finding New Science. Anita de Waard, VP Research Data Collaborations, [email protected], http://researchdata.elsevier.com/

Looking for Data: Finding New Science

Sep 10, 2014



Anita de Waard

Keynote for the STM Innovations Seminar 2014: http://www.stm-assoc.org/events/stm-innovations-seminar-u-s-2014/
Transcript
Page 1: Looking for Data: Finding New Science

Looking for Data: Finding New Science

Anita de Waard
VP Research Data Collaborations

[email protected]

http://researchdata.elsevier.com/

Page 2: Looking for Data: Finding New Science

Why should science publishers care about Research Data?

Funding bodies: demonstrate impact; guarantee permanence and discoverability; avoid fraud; avoid double funding; serve the general public

Research management/library: generate and track outputs; comply with mandates; ensure availability

Phil Bourne, (then) Associate Vice Chancellor, UCSD, 4/13: “We need to think about the university as a digital enterprise.”

Mike Huerta, Assoc. Director, NLM: “Today, the major public product of science are concepts, written down in papers. But tomorrow, data will be the main product of science…. We will require scientists to track and share their data at least as well, if not better, than they are sharing their ideas today.”

Researchers: derive credit; comply with mandates; discover and use; cite/acknowledge

Nathan Urban, PI Urban Lab, CMU, 3/13: “If we can share our data, we can write a paper that will knock everybody’s socks off!”

Barbara Ransom, NSF Program Director Earth Sciences: “We’re not going to spend any more money for you to go out and get more data! We want you first to show us how you’re going to use all the data we paid y’all to collect in the past!”

Page 3: Looking for Data: Finding New Science

Research data management today:

Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides and writes a paper. End of story.

Page 4: Looking for Data: Finding New Science

[Diagram: two separate labs, each running its own cycle of Prepare, Observe, Analyze, Ponder, Communicate]

Most of biology is quite insular

Page 5: Looking for Data: Finding New Science

But it is also VERY complicated:

http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg

• Interspecies variability: a specimen is not a species
• Gene expression variability: knowing genes is not knowing how they are expressed
• Microbiome: an animal is an ecosystem
• Systems biology: a whole is more than the sum of its parts
• Male researchers stress out rodents!

Reductionist science does not work for living systems! Statistics to the rescue!

Page 6: Looking for Data: Finding New Science

What if the research data was connected?

[Diagram: two labs, each with Prepare, Analyze and Communicate stages, linked through shared Observations]

Across labs, experiments: track reagents and how they are used
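Tracking reagents across labs amounts to indexing every observation by the reagent it used, so that all uses of one reagent can be pulled together regardless of which lab recorded them. A minimal Python sketch, assuming each observation carries a lab, an experiment, a reagent identifier and a free-text usage note; the `Observation` class, the `index_by_reagent` function and all IDs and values below are hypothetical, not part of any existing system:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One recorded observation; all field names are illustrative assumptions."""
    lab: str
    experiment: str
    reagent_id: str  # hypothetical reagent identifier (e.g. an antibody ID)
    usage: str       # free-text description of how the reagent was used


def index_by_reagent(observations):
    """Group observations from many labs under the reagent they used."""
    index = defaultdict(list)
    for obs in observations:
        index[obs.reagent_id].append(obs)
    return dict(index)


# Hypothetical observations from two labs sharing one reagent
observations = [
    Observation("Lab A", "exp-1", "AB-001", "1:500 dilution, mouse cortex"),
    Observation("Lab B", "exp-7", "AB-001", "1:1000 dilution, rat olfactory bulb"),
    Observation("Lab B", "exp-8", "AB-002", "1:200 dilution, rat olfactory bulb"),
]

index = index_by_reagent(observations)
for reagent, uses in index.items():
    labs = sorted({o.lab for o in uses})
    print(reagent, len(uses), labs)
```

The index is the point where two otherwise insular labs become comparable: both Lab A and Lab B appear under `AB-001`.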

Page 7: Looking for Data: Finding New Science

[Diagram: two labs, each with Prepare, Analyze and Communicate stages, linked through shared Observations]

Compare outcome of interactions with these entities

What if the research data was connected?

Page 8: Looking for Data: Finding New Science

[Diagram: two labs, each with Prepare, Analyze and Communicate stages, linked through shared Observations and a shared "Think" step]

Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments

What if the research data was connected?
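One possible reading of the ‘virtual reagent spectrogram’ is a reagent-by-experiment table of outcomes, where each row shows how one reagent behaved across many experiments. A minimal sketch, assuming each record is a (reagent, experiment, numeric outcome) triple pooled from several labs; `reagent_profile`, the reagent IDs and the outcome values are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def reagent_profile(records):
    """Build a reagent-by-experiment table of mean outcomes.

    records: iterable of (reagent_id, experiment, outcome) tuples, where
    outcome is a hypothetical numeric measure (e.g. a normalized response).
    """
    cells = defaultdict(list)
    for reagent, experiment, outcome in records:
        cells[(reagent, experiment)].append(outcome)
    reagents = sorted({r for r, _ in cells})
    experiments = sorted({e for _, e in cells})
    # None marks reagent/experiment pairs that were never observed together
    table = {
        r: {e: (mean(cells[(r, e)]) if (r, e) in cells else None) for e in experiments}
        for r in reagents
    }
    return experiments, table


records = [
    ("AB-001", "exp-1", 1.0),
    ("AB-001", "exp-1", 0.5),
    ("AB-001", "exp-7", 0.25),
    ("AB-002", "exp-7", 1.5),
]

experiments, table = reagent_profile(records)
print("reagent", *experiments)
for reagent, row in table.items():
    print(reagent, *(row[e] for e in experiments))
```

The empty (None) cells mark combinations no lab has recorded yet, which is one place a connected dataset could suggest new experiments.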

Page 9: Looking for Data: Finding New Science

Maslow Hierarchy of Research Data Needs:

Useful

Trusted

Reproducible

Discoverable

Comprehensible

Archived

Accessible

Preserved in digital format
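Read like Maslow's pyramid, the hierarchy implies that a dataset only benefits from a higher level once every level below it is met: a discoverable dataset that nobody can comprehend is not yet reproducible. A small sketch of that rule, assuming the slide lists the levels from the top of the pyramid down, so "Preserved in digital format" is the base; `LEVELS` and `highest_level` are hypothetical names:

```python
# Levels from the base of the pyramid upward; this ordering assumes the slide
# lists them top-down, with "Preserved in digital format" as the base.
LEVELS = [
    "Preserved in digital format",
    "Accessible",
    "Archived",
    "Comprehensible",
    "Discoverable",
    "Reproducible",
    "Trusted",
    "Useful",
]


def highest_level(satisfied):
    """Return the highest level reached without skipping any level below it.

    satisfied: set of level names a dataset meets. Returns None if even the
    base level is missing.
    """
    reached = None
    for level in LEVELS:
        if level not in satisfied:
            break  # a gap blocks every level above it
        reached = level
    return reached


# Hypothetical dataset: discoverable, but not yet comprehensible
dataset = {"Preserved in digital format", "Accessible", "Archived", "Discoverable"}
print(highest_level(dataset))
```

Here the dataset stops at "Archived": being "Discoverable" does not count until "Comprehensible" is also met.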

Page 10: Looking for Data: Finding New Science

1: Urban Legend
How can we make a standard neuroscience wet lab store and share its data?
• Incorporate structured workflows into the daily practice of a typical electrophysiology lab (the Urban Lab at CMU)
– What does it take?
– Where are the points of conflict?
• 1-year pilot, funded by Elsevier RDS:
– CMU: Shreejoy Tripathy, management/user testing
– Elsevier: development, UI, project management
• Next steps: NIH grant to scale up to 4 labs

[Figure: the hierarchy of research data needs from Page 9]

Page 11: Looking for Data: Finding New Science

de Waard, A., Burton, S. et al., 2013

Urban Legend Components

Page 13: Looking for Data: Finding New Science

Data dashboard (e.g. SDB140225c4_onbeam_CC)

Page 14: Looking for Data: Finding New Science

2: Moonrocks
How can we scale up data curation?
Pilot project with IEDA:
• Build a database for lunar geochemistry
• Leapfrog & improve curation time
• Write a joint report on processes, costs and challenges
• 1-year pilot, funded by Elsevier
• Next step: NSF grant on schemas > spreadsheets

[Figure: the hierarchy of research data needs from Page 9]

Page 15: Looking for Data: Finding New Science

Moonrocks Data Import:

Moonrocks: pushing data curation to the researcher
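"Pushing data curation to the researcher" and "schemas > spreadsheets" suggest checking each upload against a schema at import time, so problems surface while the researcher still remembers the data. A minimal sketch using only Python's standard `csv` module; the two columns (`sample_id`, `sio2_wt_pct`), their checks and `validate_rows` are hypothetical and are not the actual IEDA lunar database schema:

```python
import csv
import io

# Hypothetical schema: column name -> (type converter, validity check)
SCHEMA = {
    "sample_id": (str, lambda v: len(v) > 0),
    "sio2_wt_pct": (float, lambda v: 0.0 <= v <= 100.0),
}


def validate_rows(text):
    """Split spreadsheet rows into accepted records and human-readable errors."""
    accepted, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        return [], [f"missing columns: {sorted(missing)}"]
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        record, problems = {}, []
        for column, (convert, check) in SCHEMA.items():
            raw = (row[column] or "").strip()
            try:
                value = convert(raw)
            except ValueError:
                problems.append(f"{column}={raw!r} is not a valid {convert.__name__}")
                continue
            if not check(value):
                problems.append(f"{column}={raw!r} fails validation")
            record[column] = value
        if problems:
            errors.append(f"line {line_no}: " + "; ".join(problems))
        else:
            accepted.append(record)
    return accepted, errors


# Hypothetical upload: one good row, one non-numeric value, one empty ID and out-of-range value
sheet = "sample_id,sio2_wt_pct\nS-1,45.2\nS-2,abc\n,120\n"
accepted, errors = validate_rows(sheet)
for message in errors:
    print(message)
```

The design choice is that the schema, not the spreadsheet layout, defines what a valid record is, and errors are reported per line so the researcher can fix them at the source.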

Page 16: Looking for Data: Finding New Science

3: How do we improve how data (and software) are published?

• E.g. with the Virtual Microscope
• Or Interactive Plots
• Or Executable Papers

[Figure: the hierarchy of research data needs from Page 9]

Page 17: Looking for Data: Finding New Science

So what does research data need?

Trusted (validated by domain specialists or curated, checked by reviewers)

Reproducible (others can redo the experiments given the description)

Usable (with normalized metadata that allows cross-dataset tools to run)

Comprehensible (someone other than the researcher can understand the data and how it was created)

Accessible (can be accessed in some form by someone other than the researcher)

Preserved (exists in some form, not deleted)

Discoverable (described in a way that can be indexed by some system)

Archived (preserved long-term, in a format-independent way)

Experimental Metadata: Workflows, Samples, Settings, Reagents, Organisms, etc.

Record Metadata: DOI, Date, Author, Institute, etc.

Processed Data: Mathematically/computationally processed data: correlations, plots, etc.

Raw Data: Direct outputs from equipment: images, traces, spectra, etc.

Methods and Equipment: Reagents, settings, manufacturer’s details, etc.

Validation: Approval, Reproduction, Selection, Quality Stamp

[Figure arrow: More curation → More usable]
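The layers listed on this slide can be read as the fields of a single dataset record. A minimal sketch as a Python dataclass, assuming one field per layer; the field names and example values are hypothetical, and counting populated layers is only a crude stand-in for the "more curation, more usable" arrow, not a measure proposed in the talk:

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ResearchDataRecord:
    """One dataset described by the slide's layers (field names are hypothetical)."""
    record_metadata: dict = field(default_factory=dict)        # DOI, date, author, institute
    experimental_metadata: dict = field(default_factory=dict)  # workflows, samples, settings, reagents, organisms
    methods_and_equipment: dict = field(default_factory=dict)  # reagents, settings, manufacturer's details
    raw_data: list = field(default_factory=list)               # images, traces, spectra
    processed_data: list = field(default_factory=list)         # correlations, plots
    validation: Optional[str] = None                           # approval, reproduction, selection, quality stamp

    def populated_layers(self):
        """Names of layers that contain something; a crude proxy for curation effort."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


# A barely curated record: some record metadata and one raw trace, nothing else
rec = ResearchDataRecord(
    record_metadata={"author": "hypothetical author"},
    raw_data=["trace-001"],
)
print(rec.populated_layers())
```

Each further layer a curator fills in (methods, processed data, a validation stamp) lengthens that list, which is the intuition the slide's arrow expresses.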

Page 18: Looking for Data: Finding New Science

Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
• UCSD: Phil Bourne, Brian Shoettlander, David Minor, Declan Fleming, Ilya Zaslavsky
• NIF: Maryann Martone, Anita Bandrowski
• MSU: Brian Bothner
• OHSU: Melissa Haendel, Nicole Vasilevsky
• California Digital Library: Carly Strasser, John Kunze, Stephen Abrams
• Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
• CNI: Clifford Lynch
• Harvard: Michael Kurtz, Chris Erdmann
• MIT: Micah Altman
• UVM: Mara Saurle

Thank you!

http://researchdata.elsevier.com/