
The Economics of Data Sharing

Apr 11, 2017


Anita de Waard
Page 1: The Economics of Data Sharing

The Economics of Data Sharing

Anita de Waard (ORCID 0000-0002-9034-4119)
VP Research Data Collaborations
Elsevier RDM Services
[email protected]

CMMI Workshop
February 6, 2016

Page 2: The Economics of Data Sharing

How do we get scientists to share their data?

How do we make data repositories sustainable?

• The economics of science
• Cost recovery models of data repositories
• Some examples that work
• Some thoughts on the future

How do we create effective and sustainable ecosystems for storing, sharing, and reusing data, and get people to use them?

Page 3: The Economics of Data Sharing

Two Economies of Science [1]:

Debit Economy (like a pie)

• Single pile of ‘stuff’ gets divided:
  - A thing can belong to only one person at a time
  - “If you get more, I get less”
• Examples:
  - Money
  - Jobs
  - Samples, equipment, space, etc.
• Behaviors:
  - Hoarding, secrecy
  - (Cut-throat) competition
  - Winning by owning (and not sharing)

Credit Economy (like a song)

• Credit comes from visibility:
  - The more you give away, the more you benefit
  - “Only if I share do I really own” (“You need me to do you!” JW)
• Examples:
  - Papers, citations
  - Good ideas (if credited)
  - Skills
• Behaviors:
  - Open access, citation game
  - Collaboration with top-X
  - Winning by sharing (to enable priority & visibility)

DATA: ???

[1] Paula Stephan: “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1

Page 4: The Economics of Data Sharing

RDA IG Repository Cost Recovery

• Interviewed 22 repositories, globally
• Different income streams:
  1. Structurally funded
  2. Mostly data access charges
  3. Mostly data deposit fees
  4. Membership fees (for deposits and/or access)
  5. Serial project funding
  6. Supported by host institution
• Different new models under consideration:
  - Sponsorships/services for the commercial sector
  - Contracts for specific services offered (hosting, archiving, curation)
  - Expanding the number of affiliated institutions
  - Deposit fees
  - More services for “national memory institutes”
• Some comments:
  - Some countries structurally fund repositories (not US!)
  - Some repositories embedded in scholarly practice
  - Hard to come up with new models: no time, no skill sets!

Page 5: The Economics of Data Sharing

Four Types of Repositories:

[Figure: research workflow diagram. Labels: Research Question, Object of Study, Method, Raw Data, Analysis, Processed Data, Tables/Figures, Data With Paper, Curate, Curated Record, Methods, Software, Publication.]

• Local Storage/Instrument Repositories (Size: PB; Nr of files: Trillions)
  - NOAA: 20 TB/NASA streaming > 24 PB/day
  - NASA Reverb: 12 PB Data
  - NSSD: > 230 TB of digital data
  - NSIDC: 1 PB data, : 1 PB total
  - ALMA Telescope: 40 TB/day
• Institutional/Local Repositories (Size: GB; Nr of files: Billions)
  - Deep Blue (UMich): 80 k
  - MIT DSpace: 75 k
  - HAL (France): 60 k
  - D-Space Cambr: 1.5 k
  - Of which data: hundreds
• Non-Domain Repositories (Size: MB; Nr of files: Millions)
  - Figshare: 1.2 M
  - DataDryad: 3 k
  - Dataverse: 58 k
• Domain Repositories (Size: kB; Nr of files: 100ks)
  - PetDB: 6 k
  - PDB: 100 k
  - NIST ASD: 170 k

Page 6: The Economics of Data Sharing

Where is data sharing happening?

YES:
• Astronomy: telescopes
• High-energy physics: accelerators
• Earth science: satellites
• Social science: censuses
• Medicine (sometimes): patient data in large studies
• Life science: sequence data

Characteristics:
• Big equipment, not a single lab/person can run
• Can’t do science without it
• Tools in place to be effective

NO:
• Low-temperature physics: cryostats
• Earth science: samples
• Materials science: catalysts, microscopes, etc.
• Social science: interviews
• Medicine: individual patient data
• Neuroscience: microscope

Characteristics:
• Small equipment, single lab/person can run
• Can do science without sharing
• No effective tools in place

[Figure: research cycle: Prepare, Observe, Analyze, Ponder, Communicate]

Page 7: The Economics of Data Sharing

Connecting small science

[Figure: two Prepare / Analyze / Communicate workflows and three sets of Observations]

Identify entities from the start

Page 8: The Economics of Data Sharing

Connecting small science

[Figure: two Prepare / Analyze / Communicate workflows and three sets of Observations]

Compare outcome of interactions with these entities

Page 9: The Economics of Data Sharing

Connecting small science

[Figure: two Prepare / Analyze / Communicate workflows and three sets of Observations, joined by a shared Think step]

Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments

Reason collectively!
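The idea of identifying entities from the start and then comparing their outcomes across experiments can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the experiment IDs, entity identifiers, outcome values, and the function names `outcomes_by_entity` and `shared_entities` are all hypothetical and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical observations: (experiment id, entity id, measured outcome).
# The identifiers and numbers are made up for illustration.
OBSERVATIONS = [
    ("exp-A", "reagent:ab-001", 0.82),
    ("exp-B", "reagent:ab-001", 0.35),
    ("exp-B", "reagent:ab-002", 0.90),
    ("exp-C", "reagent:ab-001", 0.79),
]


def outcomes_by_entity(records):
    """Group outcomes per entity so results from different experiments line up."""
    grouped = defaultdict(dict)
    for experiment, entity, outcome in records:
        grouped[entity][experiment] = outcome
    return dict(grouped)


def shared_entities(profile, minimum=2):
    """Return entities observed in at least `minimum` experiments."""
    return sorted(e for e, runs in profile.items() if len(runs) >= minimum)


profile = outcomes_by_entity(OBSERVATIONS)
comparable = shared_entities(profile)
```

The grouping only works if every lab uses the same identifier for the same entity, which is why the identifiers need to be assigned at the start of the experiment rather than reconstructed afterwards.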

Page 10: The Economics of Data Sharing

A small change for small science: Urban Legend [2]

• Encourage sharing of raw data files + experimental metadata
• Add metadata to your experiment while you’re performing it
• Improved data practices made the lab more productive and more creative, and enabled effective and novel collaborations
• Lesson: split the data storage and curation from data sharing!
  - Provide direct reward to storage: now we can find our own data!
  - Enable simple upload to an embargoed data set when the owner is ready.

[2] Tripathy et al., 2014: http://www.frontiersin.org/10.3389/conf.fninf.2014.18.00077/event_abstract

Page 11: The Economics of Data Sharing

Addressing the fear of scooping with embargoes:

[Figure: Researcher, Dataset, Paper, Journal, Data Repository, Institution, and Funding Agency, connected by arrows numbered to match the steps below]

1. Researcher creates datasets
2. Researcher writes paper & publishes in journal
3. (Sometimes,) dataset gets posted to repository
4. Researcher reports (post-hoc) to Institution and Funder

Page 12: The Economics of Data Sharing

Addressing the fear of scooping with embargoes:

[Figure: Researcher, Dataset, Paper, Journal, Data Repository, Institution, and Funding Agency, connected by arrows numbered 1 to 4 and annotated with the problems below]

i. Too much work for researchers
ii. Data posting not mandatory
iii. No links between data and paper
iv. Funders/Institutions informed as an afterthought

Page 13: The Economics of Data Sharing

Addressing the fear of scooping with embargoes:

[Figure: Researcher, Dataset, Paper, Journal, Data Repository, Institution, and Funding Agency, connected by arrows numbered to match the steps below]

1. Researcher creates datasets and posts to repository (under embargo, not publicly viewable)
2. Funder is automatically notified of dataset posting
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
   - NB this also allows release of non-used data for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting
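The proposed embargo workflow (deposit under embargo, notify the funder, then lift the embargo and link the data on publication) could be modeled roughly as follows. This is a minimal sketch, not a description of any existing repository system: the `Dataset` and `Repository` classes, their method names, the notification strings, and the sample identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    title: str
    embargoed: bool = True               # not publicly viewable until lifted
    linked_paper: Optional[str] = None   # set when the paper is published


@dataclass
class Repository:
    # Hypothetical log of messages sent to funder and institution.
    notifications: List[str] = field(default_factory=list)

    def deposit(self, dataset: Dataset) -> None:
        """Steps 1-2: store the dataset under embargo and notify the funder."""
        dataset.embargoed = True
        self.notifications.append(f"funder: dataset deposited: {dataset.title}")

    def publish_paper(self, dataset: Dataset, paper_id: str) -> None:
        """Steps 3-4: lift the embargo, link data to paper, report to both parties."""
        dataset.embargoed = False
        dataset.linked_paper = paper_id
        self.notifications.append(
            f"funder+institution: embargo lifted on {dataset.title}, "
            f"linked to {paper_id}"
        )


repo = Repository()
ds = Dataset("cryostat run 7")           # hypothetical dataset title
repo.deposit(ds)
embargoed_after_deposit = ds.embargoed
repo.publish_paper(ds, "paper-123")      # hypothetical paper identifier
```

In this sketch the reporting is a side effect of depositing and publishing, so the researcher never has to file a separate post-hoc report.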

Page 14: The Economics of Data Sharing

A System for Linking Data Links: Scholix

• ICSU-WDS/RDA Publishing Data Service Working group, merged with National Data Service pilot
• Cross-stakeholder, with input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources)
• Making links between datasets and articles available could/should encourage data citation and deposition
• Together with Force11 Data Citation Principles, encourage Research Object citation/credit metrics.
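To make the idea of a shared collection of dataset-article links concrete, here is a small sketch of what such link records and a lookup might look like. It is not the Scholix schema or the prototype's API: the `Link` fields, the relation names, the provider names, and the example identifiers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_id: str   # e.g. an article identifier (hypothetical)
    target_id: str   # e.g. a dataset identifier (hypothetical)
    relation: str    # how the two are related, e.g. "references"
    provider: str    # who asserted the link


def index_by_source(links):
    """Build a lookup from source identifier to all of its links."""
    index = defaultdict(list)
    for link in links:
        index[link.source_id].append(link)
    return dict(index)


# Made-up example links from two different providers.
LINKS = [
    Link("doi:10.1234/example-article", "doi:10.5678/dataset-1", "references", "publisher-x"),
    Link("doi:10.1234/example-article", "doi:10.5678/dataset-2", "references", "repository-y"),
    Link("doi:10.1234/other-article", "doi:10.5678/dataset-1", "references", "repository-y"),
]

index = index_by_source(LINKS)
datasets_for_article = [
    link.target_id for link in index.get("doi:10.1234/example-article", [])
]
```

Recording the provider on every link lets links from publishers, repositories, and aggregators be merged into one collection without losing track of who asserted each one.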

Page 15: The Economics of Data Sharing

A System for A New Data Economics: NIH Data Commons

[Figure: the NIH Data Commons (Phil Bourne, Dec15). Nodes: NIH, NIH BD2K, Investigator, User, The Commons, Cloud Provider A, Cloud Provider B, Discovery Index, Search Engines. Arrow labels: Provides credits; Uses credits in the Commons; Option: Direct Funding; Indexes; Enables Search.]
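One possible reading of the "Provides credits" and "Uses credits in the Commons" arrows is a simple ledger: the funder grants credits to an investigator, who spends them with a cloud provider of their choice. The sketch below only illustrates that reading. The `CreditLedger` class, its methods, and the account and provider names are hypothetical, and nothing here describes how the NIH Data Commons is actually implemented.

```python
class CreditLedger:
    """Toy ledger: a funder grants credits, investigators spend them at providers."""

    def __init__(self):
        self.balances = {}    # investigator -> remaining credits
        self.spend_log = []   # (investigator, provider, credits) tuples

    def grant(self, investigator, credits):
        """Funder provides credits to an investigator."""
        if credits <= 0:
            raise ValueError("credits must be positive")
        self.balances[investigator] = self.balances.get(investigator, 0) + credits

    def spend(self, investigator, provider, credits):
        """Investigator uses credits at a chosen provider in the Commons."""
        balance = self.balances.get(investigator, 0)
        if credits <= 0 or credits > balance:
            raise ValueError("invalid amount or insufficient credits")
        self.balances[investigator] = balance - credits
        self.spend_log.append((investigator, provider, credits))


ledger = CreditLedger()
ledger.grant("investigator-1", 100)                     # hypothetical grant
ledger.spend("investigator-1", "cloud-provider-a", 30)  # hypothetical usage
remaining = ledger.balances["investigator-1"]
```

Because the investigator chooses where to spend, providers in such a model compete for credits rather than being funded directly.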

Page 16: The Economics of Data Sharing

Drivers for Data Sharing: A Study in Behavioral Economics

• Study scholarly reward systems from point of view of economics
• Develop economic model for entire scholarly rewards ecosystem: career, prestige, tenure, finances, etc.
• Two intended outcomes:
  - Understanding current behavior with respect to data sharing: can we explain what we see, and the differences between different domains?
  - Theoretical foundation for recommendations for policies and practices to stakeholders such as funders, publishers and standards bodies
• Small group working on it, planning first meeting:
  - Mike Huerta (NLM), Micah Altman (MIT), Fran Berman (RPI), Carol Tenopir (TN), Carole Palmer (UW), Greg Gordon (SSRN).
• Thoughts? Join?

Page 17: The Economics of Data Sharing

In summary: cyberinfrastructure

• The Economy of Science: pies vs. songs
  - RDA Data Repositories Cost Recovery IG
  - Different types of repositories, different types of science
  - Need to move from ‘small’ to ‘big’ science thinking
• Some examples of successful data sharing:
  - Online electronic lab notebooks: making it too easy not to use
  - RDA Scholix: linking systems of links using existing technology
  - The NIH Data Commons: enabling a data economy in practice
• Some things we can do:
  - Embargo pilots: circumvent the fear of scooping
  - Drivers for data sharing report: science is a human endeavor

Page 18: The Economics of Data Sharing

Thank you!

Links:
• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data

Anita de Waard, [email protected]