NATIONAL FEDERATION OF ADVANCED INFORMATION SERVICES Mastering the Curation, Integrity and Citation of Quality Research Data: Research Data Publication, Part II Richard Huffine, Independent Consultant NFAIS Hybrid One-Day Workshop, April 20, 2015
Page 1: Data Publishing Overview

NATIONAL FEDERATION OF ADVANCED INFORMATION SERVICES

Mastering the Curation, Integrity and Citation of Quality Research Data:

Research Data Publication, Part II

Richard Huffine, Independent Consultant

NFAIS Hybrid One-Day Workshop, April 20, 2015

Page 2: Data Publishing Overview

Overview

• Publishing: The Change in Expectations

• Where it Starts: Planning for Data

• Role of Data in Publishing

• Metadata and Citation

• Emerging U.S. Federal Policies

• Case Study: USGS

• Capabilities of Data Repositories

• Access to Data

– Role of Librarians

– Role of Publishers

– Other Commercial Interests

• *A Note About Copyrights

• Questions and Feedback

Page 3: Data Publishing Overview

Publishing: the Change in Expectations

• Just a generation ago, publishing was a very prospective industry

– Most publications were allowed to go out of print

– Publishers were always looking for their next releases, not focusing on what they had published in the past

• With the dawn of electronic publishing, the industry changed

• Publishers digitized their entire backlists and are making the most of everything they publish

• This opens the door for publishers to take an interest in the data associated with publications

Page 4: Data Publishing Overview

Publishing: the Change in Expectations

• Data publishing will continue to grow in both volume and diversity:

– Data associated with publications

– Data as the publication, largely to replace reference books

– Data sets and sub-sets for specific applications

– Data systems which aggregate and collect data from multiple sources

• The processes for licensing both content and data are converging

Page 5: Data Publishing Overview

Where it Starts: Planning for Data

• A number of funders are now requiring data management plans as part of the funding process

• Those plans include the strategies authors will use to archive, distribute, and preserve the data collected using those funds

• That data – the entirety of what is collected in support of a research project – is typically archived by the institutional sponsors

• Smaller subsets of the data – those that support specific publications – are being published either with the published research or at the same time.

Page 6: Data Publishing Overview

Role of Data in Publishing

• Publishers have adopted a variety of strategies for making data available

• Some support ancillary file delivery; others point to author-provided locations for data

• Some publishers are developing policies for data availability

– PLoS, Sharing of Data, Materials, and Software

• https://www.plos.org/policies/#sharing

• Beyond publication, institutions are investing in data management infrastructures that can assist researchers in managing their data in perpetuity.

– Purdue University Research Repository (PURR)

• https://purr.purdue.edu/

Page 7: Data Publishing Overview

Metadata and Citation

• Metadata Standards

– Repositories should collect everything they can about the data they hold and only limit what they share based on the specified standards of requestors

• Citation

– Repositories should provide persistent identifiers to data sets and to their descriptive metadata records

– The elements for citation should be among the required elements during the ingest of a data collection.
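
One way to picture the citation point above is a minimal Python sketch, assuming a hypothetical ingest step that will not produce a citation unless every citation element is present. The element names, the `validate_ingest` and `format_citation` helpers, and the example identifier are illustrative assumptions, not any repository's actual schema.

```python
# Hypothetical citation elements a repository might require at ingest.
REQUIRED_CITATION_ELEMENTS = ("creator", "year", "title", "publisher", "identifier")


def validate_ingest(record: dict) -> list:
    """Return the citation elements that are missing or empty in a record."""
    return [name for name in REQUIRED_CITATION_ELEMENTS if not record.get(name)]


def format_citation(record: dict) -> str:
    """Build a data citation string from the required elements."""
    missing = validate_ingest(record)
    if missing:
        raise ValueError("cannot cite; missing elements: " + ", ".join(missing))
    return "{creator} ({year}). {title}. {publisher}. {identifier}".format(**record)


# Example record; every value here is made up for illustration.
example = {
    "creator": "Doe, J.",
    "year": 2015,
    "title": "Stream gauge readings",
    "publisher": "Example Repository",
    "identifier": "https://doi.org/10.1234/example",
}
print(format_citation(example))
```

Checking the elements at ingest, rather than when someone asks for a citation, means a data set never enters the repository in a state where it cannot be cited.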

Page 8: Data Publishing Overview

Emerging U.S. Federal Policies

• The U.S. federal government does not currently have a legislative mandate to share data produced using federal funding

• But the current administration has made commitments to increase the availability of data – both government produced and funded

• All agencies are developing strategies for managing their data and making it more accessible

• Some agencies have developed policies in response to a Presidential Memorandum from 2013 – to enhance public access to publicly funded research

• Only the Departments of Health & Human Services and Education have mandates defined in law

Page 9: Data Publishing Overview

USGS: A Case Study

• Longstanding Data Series publication process

• Community for Data Integration (www.usgs.gov/cdi/)

• Data Management Web site (www.usgs.gov/datamanagement/share.php)

• USGS Policies (www.usgs.gov/usgs-manual/95imlist.html)

– Scientific Data Management Foundation

– Metadata for Scientific Data, Software, and Other Information Products

– Review and Approval of Scientific Data for Release

– Preservation Requirements for Digital Scientific Data

• Moving towards repository services to manage data and publications for discovery and access.

Page 10: Data Publishing Overview

Capabilities of Data Repositories

• The technical infrastructure for managing digital content is evolving.

• Publishers and research institutions currently have very different strategies

• Publishers are using a variety of expensive commercial products that can scale to their needs.

• Research institutions are developing repository solutions using Open Source products that require significant investment and development

• Neither path is sustainable and the two rarely interface with one another

• The two also place different priorities on persistence, relationships to other objects, and end-user capabilities

• Capabilities like Application Programming Interfaces (APIs) and Representational State Transfer (REST) services are being sought by users of both of these solutions
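
As a rough illustration of the kind of REST access users are asking for, the Python sketch below builds a request URL that addresses one data set and reads a JSON metadata response. The base URL, the `/datasets/<id>` path, the `format` parameter, and the response fields are all hypothetical; a real repository would document its own API. The response is a hard-coded sample so the sketch runs without a network call.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical repository API root; not a real service.
BASE_URL = "https://repository.example.org/api"


def dataset_url(dataset_id: str, fmt: str = "json") -> str:
    """Build a REST-style URL that addresses one data set as a resource."""
    path = quote(dataset_id, safe="")  # escape characters that are unsafe in a path segment
    return "{}/datasets/{}?{}".format(BASE_URL, path, urlencode({"format": fmt}))


def parse_metadata(body: str) -> dict:
    """Pull a few descriptive fields out of a JSON metadata response."""
    doc = json.loads(body)
    return {"title": doc.get("title"), "identifier": doc.get("identifier")}


# Stand-in for the body a GET request to dataset_url(...) might return.
sample_response = '{"title": "Stream gauge readings", "identifier": "doi:10.1234/example"}'

print(dataset_url("ds 42"))
print(parse_metadata(sample_response))
```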

Page 11: Data Publishing Overview

Access to Data: Role of Librarians

• Librarians have a unique skill set to help address the requirements for data access in this changing environment

• Librarians sit at the intersection between:

– Publishers and Users

– Researchers and Institutional Repositories

– Publishers and Institutions

• Librarians can, and should, be working to improve the path for data access through improving the interchange that occurs at these intersections

• Librarians – and not necessarily libraries – are needed to support a culture of continuous improvement in access and usability of both publications and data

• Boston Public Library to Tackle Boston’s Data

– http://www.bostonherald.com/business/business_markets/2015/04/library_to_tackle_bostons_data

Page 12: Data Publishing Overview

Access to Data: Role of Publishers

• Publishers, like funders, need to establish standards for data sharing and facilitate access to data, regardless of where it is housed

• Publishers need to build on current trends and make the data they provide more useful to the users of their content, including:

– Interactivity and visualization with data associated with publications

– Data services for accessing data through direct interchange

• Improved integration with institutional repositories to facilitate access to ancillary material regardless of where it resides

Page 13: Data Publishing Overview

Access to Data: Other Commercial Interests

• The responsibility for data publication and access does not reside solely with publishers and institutional repositories

• A number of other commercial interests can step up to support improved access to data and enhanced services for its re-use

• Research Data Management is a growing opportunity for both commercial and non-commercial development

• Commercial services that support scientists, such as Mendeley, Flow, and EndNote, could develop tools for discovery, visualization, and integration with other sources of data and analysis

• As publishers and repositories improve, the ability to support researchers with enhanced tools grows.

Page 14: Data Publishing Overview

* A Note About Copyrights

• Just a note to clarify that in the United States, copyright law does not apply to facts, data, or ideas. However, copyright may protect a collection of data as contained in a database or compilation, but only if it meets certain requirements.

• European law, however, provides much greater protection of databases. It prohibits the extraction or reutilization of any database in which there has been a substantial investment in obtaining, verifying, or presenting the data contents.

– Database legal protection

• http://www.bitlaw.com/copyright/database.html

Page 15: Data Publishing Overview

Questions and Feedback

• What questions did today’s presentation raise for you?

Feedback: Richard [email protected]