Public Sharing of Research Datasets: A Pilot Study of Associations

Post on 01-Nov-2014

Category:

Health & Medicine

DESCRIPTION

Presented at ASIST & ISSI Pre-Conference Symposium on Informetrics and Scientometrics on Nov 7, 2009 http://www.sois.uwm.edu/MetricsPreCon/program.html

Transcript

Public sharing of research datasets:

a pilot study of associations

Heather Piwowar and Wendy Chapman

Department of Biomedical Informatics, University of Pittsburgh

data data

http://www.flickr.com/photos/vroomvroommm/3457772539

stale

http://www.flickr.com/photos/75166820@N00/5318468/

sounds great

http://www.flickr.com/photos/ryanr/142455033/

but not easy

http://www.flickr.com/photos/faerie-dust/2315927946/

does it work?

http://www.flickr.com/photos/mesh/14102209/

aim

Prior work has focused on surveys and studies of intention.

Our aim: measure associations between observed data sharing behaviour and environmental variables

aim

Funder Journal Investigator Institution Study

Is research data shared after publication?

microarray data

http://en.wikipedia.org/wiki/DNA_microarray

http://en.wikipedia.org/wiki/Image:Heatmap.png

http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG

data sample

Ochsner et al. (2008). Much room for improvement in deposition rates of expression microarray datasets. Nature Methods, 5(12), 991.

Manually reviewed 20 journals for 2007:

400 studies

200 shared their microarray data

variables

Is research data shared after publication?

Funder mandates

Journal impact factor

Investigator “experience”

Journal mandates

variables

Funder mandates

NIH 2003 Data Sharing Requirement

Requires a data sharing plan

for studies funded after October 2003

that receive more than $500 000 in direct funding per year

variables

Funder mandates

Assumed data sharing requirement was applicable if:

the NIH grant numbers associated with PubMed entry had

$750 000 in total funding any year since 2004

plus

an NIH grant number with a leading “1” or “2” since 2004
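The applicability rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GrantYear` record, its field names, and the example grant numbers are hypothetical, the threshold is read as "at least $750 000" for a single grant in a single year, and "since 2004" is read as fiscal year 2004 or later. The slides do not say how the original scripts handled these details.

```python
from dataclasses import dataclass

# Values taken from the slide; how they were applied is an assumption.
TOTAL_FUNDING_THRESHOLD_USD = 750_000
FIRST_YEAR = 2004


@dataclass
class GrantYear:
    """One year of funding for one NIH grant (hypothetical record layout)."""
    grant_number: str        # e.g. "1R01XX000001-01" (made-up number)
    fiscal_year: int
    total_funding_usd: int


def mandate_assumed_applicable(records):
    """Return True if both conditions on the slide hold for a paper's grants."""
    recent = [r for r in records if r.fiscal_year >= FIRST_YEAR]
    # Condition 1: some grant-year since 2004 reaches the funding threshold.
    large_award = any(
        r.total_funding_usd >= TOTAL_FUNDING_THRESHOLD_USD for r in recent
    )
    # Condition 2: some grant number since 2004 starts with "1" or "2".
    leading_1_or_2 = any(r.grant_number[:1] in ("1", "2") for r in recent)
    return large_award and leading_1_or_2


if __name__ == "__main__":
    example = [GrantYear("1R01XX000001-01", 2005, 800_000)]
    print(mandate_assumed_applicable(example))
```
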

variables

Journal mandates

Piwowar and Chapman.

A review of journal policies for sharing research data.

International Conference on Electronic Publishing (ELPUB) 2008

Journal Policy Strength: Strong, Weak, or None

variables

Author experience

Publication history and impact

variables

“Experience and impact” proxy:

• years since first publication

• h-index estimate

• a-index estimate

Scriptable, to allow scaling up to thousands of authors?
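A minimal sketch of how the three proxy measures could be scripted, assuming the standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the a-index is the mean citation count of those h papers. The function names and the input format (a plain list of per-paper citation counts) are illustrative choices, not the authors' code.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def a_index(citations):
    """Mean citation count of the papers in the h-core (0.0 if h is 0)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    core = sorted(citations, reverse=True)[:h]
    return sum(core) / h


def years_since_first_publication(publication_years, as_of_year):
    """Career length proxy: years elapsed since the earliest paper."""
    return as_of_year - min(publication_years)


if __name__ == "__main__":
    counts = [10, 8, 5, 4, 3]   # made-up citation counts for one author
    print(h_index(counts), a_index(counts))
    print(years_since_first_publication([1998, 2003, 2007], as_of_year=2009))
```

Because each function takes plain lists, the same code can be mapped over thousands of disambiguated author records once citation counts are available.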

Author experience

variables

Author experience

Author publication history

variables

Author experience

Citation counts

variables

Author experience

Author-ity web service: Torvik & Smalheiser. (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11.

Author name disambiguation

variables

PubMed + PubMed Central + Author-ity to compute pubmedi citation estimates

Author experience

➡ not a comprehensive account of publication accomplishments

➡ for aggregate analysis: free, open, scriptable, flexible, reproducible.

variables

For each first and last author, we used the first principal component of:

• years since first publication

• pubmedi h-index estimate

• pubmedi a-index estimate
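One way to collapse the three measures into a single score is sketched below. It assumes each variable is standardized first (so the component comes from the correlation matrix) and finds the leading eigenvector by power iteration using only the standard library. Whether the original analysis standardized the variables, and which software it used, is not stated on the slides; the example rows are made up.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample standard deviation.

    Assumes at least two rows and no constant column.
    """
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    return [
        [(x - m) / s for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component."""
    z = standardize(rows)
    n, k = len(z), len(z[0])
    # Correlation matrix of the standardized columns.
    corr = [
        [sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1) for b in range(k)]
        for a in range(k)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Fix the arbitrary sign so that larger inputs give larger scores.
    if sum(v) < 0:
        v = [-x for x in v]
    scores = [sum(zi[j] * v[j] for j in range(k)) for zi in z]
    return v, scores


if __name__ == "__main__":
    # Made-up rows: (years since first publication, h-index, a-index)
    authors = [
        [2, 1, 3.0],
        [5, 3, 6.0],
        [10, 6, 9.5],
        [20, 12, 15.0],
        [30, 15, 22.0],
    ]
    loadings, scores = first_principal_component(authors)
    print(loadings)
    print(scores)
```
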

Author experience

variables

Is research data shared after publication?

Funder mandates

Journal impact factor

Investigator “experience”

Journal mandates

stats

Univariate odds ratios

Multivariate logistic regression
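For a binary variable, a univariate odds ratio can be computed from a 2×2 table of counts. The sketch below uses the usual log-based (Woolf) approximation for a 95% confidence interval; the counts in the example are made up for illustration and are not results from this study, and the choice of interval method is an assumption.

```python
import math


def odds_ratio(exposed_shared, exposed_not_shared,
               unexposed_shared, unexposed_not_shared, z=1.96):
    """Odds ratio and approximate confidence interval from a 2x2 table.

    All four cells must be non-zero for this approximation to work.
    """
    a, b = exposed_shared, exposed_not_shared
    c, d = unexposed_shared, unexposed_not_shared
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio), Woolf's method.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts: studies in journals with vs. without a strong policy.
    ratio, lower, upper = odds_ratio(30, 20, 10, 40)
    print(f"OR = {ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes 1 corresponds to a statistically significant association at roughly the 5% level. The multivariate logistic regression on the slide estimates the same kind of ratio for each variable while adjusting for the others.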

results

Is research data shared after publication?

Funder mandates

Journal impact factor

Investigator “experience”

Journal mandates

Legend: Statistically significant / Not statistically significant

results

33%

Funder mandates

results

Strength of journal data sharing policy is strongly correlated with impact factor

Journal mandates

Journal impact factor

results

Investigator “experience”

limitations

• Association does not imply causation

• Only one datatype

• Small sample, limited variables

• Dataset contains disproportionate number of high-impact studies

http://www.flickr.com/photos/vlastula/300102949/

prelim conclusions

• NIH data sharing plan applies to a minority of NIH microarray studies

• NIH data sharing plan does not seem to increase frequency of data sharing

• More experienced investigators are more likely to share data

next steps

PhD dissertation!

• More samples

• More variables

http://www.flickr.com/photos/krcla/2069243613/

future

Spin-off projects:

• Quantify usefulness of pubmedi h-index

• Study the patterns and prevalence of data reuse

http://www.flickr.com/photos/cogdog/123072/

thanks

Dept of Biomedical Informatics at U of Pittsburgh

NLM for training grant funding

Open science online community and those who release their articles, datasets and photos openly

Dr Wendy Chapman for her support and feedback

variables

Journal mandates

Policy strength categorization:

None: No applicable mention of data sharing

Weak: Request or unenforceable requirement

Strong: Require data deposit accession number as a condition of publication

open science

I post my data, code, and statistical scripts at http://www.dbmi.pitt.edu/piwowar

Share yours too!

http://www.flickr.com/photos/myklroventine/892446624/
