Public sharing of research datasets: a pilot study of associations. Heather Piwowar and Wendy Chapman, Department of Biomedical Informatics, University of Pittsburgh

Public Sharing of Research Datasets: A Pilot Study of Associations

Nov 01, 2014

Heather Piwowar

Presented at ASIST & ISSI Pre-Conference Symposium on Informetrics and Scientometrics on Nov 7, 2009

http://www.sois.uwm.edu/MetricsPreCon/program.html
Transcript
Page 1: Public Sharing of Research Datasets: A Pilot Study of Associations

Public sharing of research datasets:

a pilot study of associations

Heather Piwowar and Wendy Chapman

Department of Biomedical Informatics, University of Pittsburgh

Page 2: Public Sharing of Research Datasets: A Pilot Study of Associations

data data

http://www.flickr.com/photos/vroomvroommm/3457772539

Page 3: Public Sharing of Research Datasets: A Pilot Study of Associations

stale

http://www.flickr.com/photos/75166820@N00/5318468/

Page 4: Public Sharing of Research Datasets: A Pilot Study of Associations

sounds great

http://www.flickr.com/photos/ryanr/142455033/

Page 5: Public Sharing of Research Datasets: A Pilot Study of Associations

but not easy

http://www.flickr.com/photos/faerie-dust/2315927946/

Page 7: Public Sharing of Research Datasets: A Pilot Study of Associations

does it work?

http://www.flickr.com/photos/mesh/14102209/

Page 8: Public Sharing of Research Datasets: A Pilot Study of Associations

aim

Prior work has focused on surveys and studies of intention.

Our aim: measure associations between observed data sharing behaviour and environmental variables

Page 9: Public Sharing of Research Datasets: A Pilot Study of Associations

aim

Funder Journal Investigator Institution Study

Is research data shared after publication?

Page 11: Public Sharing of Research Datasets: A Pilot Study of Associations

microarray data

http://en.wikipedia.org/wiki/DNA_microarray

http://en.wikipedia.org/wiki/Image:Heatmap.png

http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG

Page 12: Public Sharing of Research Datasets: A Pilot Study of Associations

microarray data

Page 13: Public Sharing of Research Datasets: A Pilot Study of Associations

data sample

Ochsner et al. (2008). Much room for improvement in deposition rates of expression microarray datasets. Nature Methods, 5(12), 991.

Manually reviewed 20 journals for 2007:

400 studies

200 shared their microarray data

Page 14: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Is research data shared after publication?

Funder mandates

Journal impact factor

Investigator “experience”

Journal mandates

Page 15: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Funder mandates

Page 16: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Funder mandates

NIH 2003 Data Sharing Requirement

Requires a data sharing plan

for studies funded after October 2003

that receive more than $500 000 in direct funding per year

Page 17: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Funder mandates

Assumed data sharing requirement was applicable if:

the NIH grant numbers associated with PubMed entry had

$750 000 in total funding any year since 2004

plus

a NIH grant number with a leading “1” or “2” since 2004
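
The applicability rule on this slide can be sketched as a small predicate. This is only an illustration, not the authors' script: the `GrantYear` record, its field names, and the example grant numbers are hypothetical, and reading "$750 000" as "at least $750 000 for a single grant in a single fiscal year" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable

THRESHOLD_USD = 750_000   # total-funding threshold stated on the slide
FIRST_YEAR = 2004         # "since 2004"


@dataclass
class GrantYear:
    """One funded year of one NIH grant linked to a PubMed entry (hypothetical record)."""
    grant_number: str      # full grant number, e.g. "1R01XX000001-01" (made-up example)
    fiscal_year: int
    total_funding: float   # USD awarded in that fiscal year


def mandate_assumed_applicable(grant_years: Iterable[GrantYear]) -> bool:
    """Return True when both conditions from the slide hold for a paper's grants."""
    recent = [g for g in grant_years if g.fiscal_year >= FIRST_YEAR]
    # Condition 1: some grant-year since 2004 reaches the funding threshold
    # (assumption: "at least", evaluated per grant-year record).
    large_award = any(g.total_funding >= THRESHOLD_USD for g in recent)
    # Condition 2: some grant number since 2004 starts with "1" or "2".
    leading_1_or_2 = any(g.grant_number[:1] in ("1", "2") for g in recent)
    return large_award and leading_1_or_2
```

The two conditions are checked independently across all of a paper's grant records, so a large award and a leading "1" or "2" may come from different records; whether the original analysis required them on the same grant is not stated on the slide.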

Page 18: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Journal mandates

Page 19: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Journal mandates

Piwowar and Chapman.

A review of journal policies for sharing research data.

International Conference on Electronic Publishing (ELPUB) 2008

Journal Policy Strength: Strong, Weak, or None

Page 20: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Author experience

Page 21: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Author experience

Publication history and impact

Page 22: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

“experience and impact” proxy:

• years since first publication

• h-index estimate

• a-index estimate
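
A minimal sketch of how the three proxy values could be computed for one author, assuming a list of per-paper citation counts and publication years is already available. The function names and inputs are illustrative and not taken from the authors' code. The h-index is the largest h such that h papers have at least h citations each, and the a-index is the mean citation count of those h papers.

```python
from typing import Sequence


def years_since_first_publication(pub_years: Sequence[int], reference_year: int) -> int:
    """Years between the author's earliest publication and a reference year."""
    return reference_year - min(pub_years)


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def a_index(citations: Sequence[int]) -> float:
    """Mean citations of the papers in the h-core (the top h papers)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    core = sorted(citations, reverse=True)[:h]
    return sum(core) / h
```

Because each function only needs plain lists, the same code can be run in a loop over any number of authors once their citation counts have been gathered.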

Scriptable, to allow scaling up to thousands of authors?

Author experience

Page 23: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Author experience

Author publication history

Page 24: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Author experience

Citation counts

Page 25: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Author experience

Author-ity web service: Torvik & Smalheiser. (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11.

Author name disambiguation

Page 26: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

PubMed + PubMed Central + Author-ity to compute pubmedi citation estimates

Author experience

➡ not a comprehensive account of publication accomplishments

➡ for aggregate analysis: free, open, scriptable, flexible, reproducible.

Page 27: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

For each first and last author, we used the first principal component of:

• years since first publication

• pubmedi h-index estimate

• pubmedi a-index estimate
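
Collapsing the three measures into one score via the first principal component might look like the sketch below. It assumes NumPy, one row per author, and columns standardized to zero mean and unit variance before the decomposition; whether the original analysis standardized the inputs is not stated on the slide, and the function name is illustrative.

```python
import numpy as np


def first_principal_component(features) -> np.ndarray:
    """Score each row on the first principal component of standardized columns.

    `features` has one row per author and one column per measure
    (years since first publication, h-index estimate, a-index estimate).
    Assumes no column is constant, since that would give a zero standard deviation.
    """
    X = np.asarray(features, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each column
    _, _, vt = np.linalg.svd(Z, full_matrices=False)   # rows of vt are the component directions
    loadings = vt[0]
    # The sign of a principal component is arbitrary; orient it so that
    # larger input values tend to give larger scores.
    if loadings.sum() < 0:
        loadings = -loadings
    return Z @ loadings
```

The returned vector holds one score per author and can be used as a single "experience" covariate in place of the three correlated measures.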

Author experience

Page 28: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Is research data shared after publication?

Funder mandates

Journal impact factor

Investigator “experience”

Journal mandates

Page 29: Public Sharing of Research Datasets: A Pilot Study of Associations

stats

Univariate odds ratios

Multivariate logistic regression
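
For the univariate step, an odds ratio for one binary variable (for example, "journal has a strong policy") against "data shared" can be computed from a 2x2 table. The sketch below is a generic textbook calculation with a Wald 95% confidence interval, not the authors' script, and it covers only the univariate part, not the logistic regression. The argument names are illustrative.

```python
import math
from typing import Tuple


def odds_ratio(shared_with: int, unshared_with: int,
               shared_without: int, unshared_without: int) -> Tuple[float, float, float]:
    """Odds ratio and Wald 95% confidence interval from a 2x2 table.

    `*_with` counts studies where the variable is present, `*_without`
    where it is absent. All four counts must be non-zero.
    """
    a, b, c, d = shared_with, unshared_with, shared_without, unshared_without
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - 1.96 * se)
    high = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, low, high
```

An interval that excludes 1 corresponds to a statistically significant association at roughly the 5% level for that single variable.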

Page 30: Public Sharing of Research Datasets: A Pilot Study of Associations

results

http://www.flickr.com/photos/paperpariah/3002687604/

Page 31: Public Sharing of Research Datasets: A Pilot Study of Associations

results

Is research data shared after publication?

Funder mandates

Journal impact factor

Investigator “experience”

Journal mandates

Legend: Statistically significant / Not statistically significant

Page 32: Public Sharing of Research Datasets: A Pilot Study of Associations

results

33%

Funder mandates

Page 33: Public Sharing of Research Datasets: A Pilot Study of Associations

results

Strength of journal data sharing policy is strongly correlated with impact factor

Journal mandates

Journal impact factor

Page 34: Public Sharing of Research Datasets: A Pilot Study of Associations

results

Investigator “experience”

Page 40: Public Sharing of Research Datasets: A Pilot Study of Associations

limitations

• Association does not imply causation

• Only one datatype

• Small sample, limited variables

• Dataset contains disproportionate number of high-impact studies

http://www.flickr.com/photos/vlastula/300102949/

Page 41: Public Sharing of Research Datasets: A Pilot Study of Associations

prelim conclusions

• NIH data sharing plan applies to a minority of NIH microarray studies

• NIH data sharing plan does not seem to increase frequency of data sharing

• More experienced investigators are more likely to share data

Page 42: Public Sharing of Research Datasets: A Pilot Study of Associations

next steps

PhD dissertation!

• More samples

• More variables

http://www.flickr.com/photos/krcla/2069243613/

Page 43: Public Sharing of Research Datasets: A Pilot Study of Associations

future

Spin-off projects:

• Quantify usefulness of pubmedi h-index

• Study the patterns and prevalence of data reuse

http://www.flickr.com/photos/cogdog/123072/

Page 44: Public Sharing of Research Datasets: A Pilot Study of Associations

thanks

Dept of Biomedical Informatics at U of Pittsburgh

NLM for training grant funding

Open science online community and those who release their articles, datasets and photos openly

Dr Wendy Chapman for her support and feedback

Page 46: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Journal mandates

Page 47: Public Sharing of Research Datasets: A Pilot Study of Associations

variables

Journal mandates

Policy strength categorization:

None: No applicable mention of data sharing

Weak: Request or unenforceable requirement

Strong: Require data deposit accession number as a condition of publication

Page 48: Public Sharing of Research Datasets: A Pilot Study of Associations

open science

I post my data, code, and statistical scripts at http://www.dbmi.pitt.edu/piwowar

Share yours too!

http://www.flickr.com/photos/myklroventine/892446624/