Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing
Data Citation Principles Workshop, IQSS, Harvard University 16-17 May 2011
http://www.orcid.org G. A. Thorisson, University of Leicester / ORCID
ORCID and data publication
Identifying knowledge contributors to motivate sharing
Gudmundur A. Thorisson <[email protected]>
Tony Brookes bioinformatics group
Department of Genetics
University of Leicester
-- Outline --
• Pretext: my route to workshop
• Ongoing & planned data publication projects
• Disease genetics data
• Planned integration with ORCID for researcher identification
• Role of ORCID in data publication ecosystem?
• [shameless] plug for Sept workshop on researcher identity
This work can be freely copied, redistributed and adapted, as long as proper attribution is given
Prologue
Monday, 16 May 2011
Prof Anthony J Brookes
GEN2PHEN coordinator
Chair, Bioinformatics and Genomics
Department of Genetics
University of Leicester, UK
The data sharing problem
Lack of incentives for sharing
• Effort required to prepare, package and submit datasets to public repositories
• Time better spent writing papers & grants
• All sticks (funders, journals) - no carrots
• Need incentives - treat data as publications and credit creators
“[...] Many of the issues regarding data availability can be addressed if the principles of “publication” rather than “sharing” are applied. However, online data publication systems also need to develop mechanisms for data citation and indices of data access comparable to those for citation systems in print journals”
Costello, M. Motivating Online Publication of Data. BioScience (2009) vol. 59 (5) pp. 418-427
Name ambiguity => attribution challenges
Are these authors all the same person?
• G. Thorisson, University of Leicester
• G. A. Thorisson, University of Leicester
• G. A. Thorisson, Cold Spring Harbor Laboratory
Or these?
• J. Smith, J. Smith, J. Smith, J. Smith, J. Smith [etc.]
∼2/3 of the ∼6 million authors in MEDLINE share a last name and first initial with at least one other author, and an ambiguous name refers to ∼8 persons on average.
Torvik and Smalheiser. Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data (2009) vol. 3 (3)
How about these?
ORCID - tackling the contributor identity problem
ORCID ID: B-1242-2010
• G. Thorisson, Univ. Leicester
• G. A. Thorisson, Univ. Leicester
• G. A. Thorisson, Cold Spring Harbor Lab.
ORCID ID: G-1442-2009
• J. Smith, Univ. North Pole
ORCID ID: D-2400-2010
• J. Smith, Luthor Corporation
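As a rough illustration of the slide above (a sketch only, not ORCID's actual data model or API): grouping the example author records by name string merges the two J. Smiths and splits Thorisson across two spellings, while grouping by contributor ID yields one group per person. The IDs are the illustrative ones from the slide; the `records` layout and the `group_by` helper are assumptions made for this example.

```python
from collections import defaultdict

# Author records as they might appear on publications:
# (name as printed, affiliation, contributor ID).
# The IDs are the illustrative ones from the slide, not real identifiers.
records = [
    ("G. Thorisson", "Univ. Leicester", "B-1242-2010"),
    ("G. A. Thorisson", "Univ. Leicester", "B-1242-2010"),
    ("G. A. Thorisson", "Cold Spring Harbor Lab.", "B-1242-2010"),
    ("J. Smith", "Univ. North Pole", "G-1442-2009"),
    ("J. Smith", "Luthor Corporation", "D-2400-2010"),
]


def group_by(rows, key_index):
    """Group record tuples by the field at position key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


by_name = group_by(records, 0)  # ambiguous: keyed on the printed name
by_id = group_by(records, 2)    # unambiguous: keyed on the contributor ID

for contributor_id, rows in by_id.items():
    names = sorted({name for name, _, _ in rows})
    print(contributor_id, "->", ", ".join(names))
```

Keying on the identifier rather than the name is the whole point: the name string can vary or collide, while the identifier stays fixed per person.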
Projects
Cafe Variome - facilitating exchange of genetic data
1. Diagnostic laboratories
2. Central ‘clearinghouse’
3. End-users (e.g. LSDB curators)
Publish data: Submitting mutations from diagnostic labs using “Café RouGE enabled” software via simple button click
Retrieve Atom feeds: Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval
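The feed-based monitoring step could look roughly like the sketch below, which parses an Atom document and reports entries not seen before. This is not Cafe Variome's actual interface: the sample feed, its `urn:example:` IDs and the `new_entries` helper are all hypothetical. Only the Atom namespace and the Python standard library calls are real.

```python
import xml.etree.ElementTree as ET

# The Atom namespace defined by the Atom Syndication Format.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content; a real client would fetch this over HTTP.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical variant feed</title>
  <id>urn:example:feed</id>
  <updated>2011-05-16T09:30:00Z</updated>
  <entry>
    <title>Variant submission 1</title>
    <id>urn:example:entry:1</id>
    <updated>2011-05-15T10:00:00Z</updated>
  </entry>
  <entry>
    <title>Variant submission 2</title>
    <id>urn:example:entry:2</id>
    <updated>2011-05-16T09:30:00Z</updated>
  </entry>
</feed>"""


def new_entries(feed_xml, seen_ids):
    """Return the feed entries whose Atom <id> is not in seen_ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", namespaces=ATOM_NS)
        if entry_id not in seen_ids:
            fresh.append({
                "id": entry_id,
                "title": entry.findtext("atom:title", namespaces=ATOM_NS),
                "updated": entry.findtext("atom:updated", namespaces=ATOM_NS),
            })
    return fresh


seen = set()
first_poll = new_entries(SAMPLE_FEED, seen)
seen.update(e["id"] for e in first_poll)
second_poll = new_entries(SAMPLE_FEED, seen)  # nothing new on the second poll
```

An end-user such as an LSDB curator would run the poll on a schedule and persist `seen` between runs.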
Cafe Variome - facilitating exchange of genetic data
• Submission from diag. lab
– ✔ DOI assigned to incoming data upload
• Metadata describing variation data published elsewhere (dbSNP (coding), UniProt, PhenCode)
– × Already stable IDs so no DOI assigned
• Attribution given to data submitters via ORCID unique identifier
• Data shared with diverse 3rd parties and data usage/citation tracked via DOI
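The DOI rule on this slide can be sketched as a small decision function: a record that arrives without a stable identifier gets a DOI, while one that already carries a stable external ID does not, and either way the submitter's contributor ID travels with the record. Everything named here is an assumption made for illustration (the `Record` fields, the `register` function, the `10.9999/example` DOI prefix and the `example:12345` external ID); the planned Cafe Variome implementation is not described in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    submitter_id: str                  # contributor ID used for attribution
    external_id: Optional[str] = None  # stable ID from another database, if any
    doi: Optional[str] = None          # filled in only when a DOI is assigned


def register(record: Record, serial: int,
             doi_prefix: str = "10.9999/example") -> Record:
    """Assign a DOI only to records that lack a stable external ID."""
    if record.external_id is None:
        # Incoming upload from a lab: mint a (hypothetical) DOI.
        record.doi = f"{doi_prefix}.{serial}"
    # Records described elsewhere keep their existing stable ID; no DOI.
    return record


lab_upload = register(Record(submitter_id="B-1242-2010"), serial=1)
external = register(
    Record(submitter_id="B-1242-2010", external_id="example:12345"), serial=2
)
```

In both cases the record stays linked to `submitter_id`, so attribution does not depend on whether a DOI was assigned.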
• Digital resources, incl. biomedical databases
– E.g. locus-specific databases (LSDBs), variation archives (e.g. Cafe Variome)
– How to acknowledge researchers who:
• Maintain vital community resource (e.g. http://www.wormbase.org )
• Undertake value-adding curation
– Micro-attribution: Giardine, B. et al. Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nature Genetics advance on, (2011). http://dx.doi.org/10.1038/ng.785
This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.