NATIONAL FEDERATION OF ADVANCED INFORMATION SERVICES
Mastering the Curation, Integrity and Citation of Quality Research Data:
Research Data Publication, Part II
Richard Huffine, Independent Consultant
NFAIS Hybrid One-Day Workshop, April 20, 2015
Jul 16, 2015
Overview
• Publishing: The Change in Expectations
• Where it Starts: Planning for Data
• Role of Data in Publishing
• Metadata and Citation
• Emerging U.S. Federal Policies
• Case Study: USGS
• Capabilities of Data Repositories
• Access to Data
– Role of Librarians
– Role of Publishers
– Other Commercial Interests
• *A Note About Copyrights
• Questions and Feedback
Publishing: the Change in Expectations
• Just a generation ago, publishing was a very prospective industry
– Most publications were allowed to go out of print
– Publishers were always looking for their next releases, not focusing on what they had published in the past
• With the dawn of electronic publishing, the industry changed
• Publishers digitized their entire backlists and are making the most of everything they publish
• This opens the door for publishers to take an interest in the data associated with publications
• Data publishing will continue to grow in both volume and diversity:
– Data associated with publications
– Data as the publication, largely to replace reference books
– Data sets and sub-sets for specific applications
– Data systems which aggregate and collect data from multiple sources
• The processes for licensing both content and data are converging
Where it Starts: Planning for Data
• A number of funders are now requiring data management plans as part of the funding process
• Those plans include the strategies authors will use to archive, distribute, and preserve the data collected using those funds
• That data – the entirety of what is collected in support of a research project – is typically archived by the institutional sponsors
• Smaller subsets of the data – those that support specific publications – are being published either with or at the same time as the published research
Role of Data in Publishing
• Publishers have adopted a variety of strategies for making data available
• Some support ancillary file delivery; others point to author-provided locations for data
• Some publishers are developing policies for data availability
– PLoS, Sharing of Data, Materials, and Software
• https://www.plos.org/policies/#sharing
• Beyond publication, institutions are investing in data management infrastructures that can assist researchers in managing their data in perpetuity.
– Purdue University Research Repository (PURR)
• https://purr.purdue.edu/
Metadata and Citation
• Metadata Standards
– Repositories should collect everything they can about the data they hold and only limit what they share based on the specified standards of requestors
• Citation
– Repositories should provide persistent identifiers to data sets and to their descriptive metadata records
– The elements for citation should be among the required elements during the ingest of a data collection.
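As an illustration of the citation point above, here is a minimal sketch of an ingest check that refuses a data collection until its citation elements are present, then builds a citation string from them. The field list, the function names, and the example DOI are assumptions made for this sketch; they are not drawn from any particular repository or metadata standard.

```python
from dataclasses import dataclass

# Hypothetical set of required citation elements; a real repository
# would take these from whichever metadata standard it has adopted.
REQUIRED_CITATION_FIELDS = ("creators", "title", "year", "publisher", "identifier")


@dataclass
class IngestResult:
    accepted: bool
    missing: list
    citation: str = ""


def format_citation(record: dict) -> str:
    """Build a simple citation string from the required elements."""
    creators = "; ".join(record["creators"])
    return (
        f'{creators} ({record["year"]}). {record["title"]}. '
        f'{record["publisher"]}. {record["identifier"]}'
    )


def ingest(record: dict) -> IngestResult:
    """Accept a record only if every citation element is present and non-empty."""
    missing = [f for f in REQUIRED_CITATION_FIELDS if not record.get(f)]
    if missing:
        return IngestResult(False, missing)
    return IngestResult(True, [], format_citation(record))


# Made-up example record; the identifier is a placeholder, not a real DOI.
example = {
    "creators": ["Doe, J.", "Roe, R."],
    "title": "Example water quality measurements",
    "year": 2015,
    "publisher": "Example Data Repository",
    "identifier": "doi:10.0000/example",
}
result = ingest(example)
incomplete = ingest({"title": "Untitled upload"})
```

Here `result.citation` holds the formatted citation, while `incomplete.missing` lists the elements the depositor still has to supply before the collection can be accepted.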
Emerging U.S. Federal Policies
• The U.S. federal government does not currently have a legislative mandate to share data produced using federal funding
• But the current administration has made commitments to increase the availability of data – both government produced and funded
• All agencies are developing strategies for managing their data and making it more accessible
• Some agencies have developed policies in response to a Presidential Memorandum from 2013 – to enhance public access to publicly funded research
• Only the Departments of Health & Human Services and Education have mandates defined in law
USGS: A Case Study
• Longstanding Data Series publication process
• Community for Data Integration (www.usgs.gov/cdi/)
• Data Management Web site (www.usgs.gov/datamanagement/share.php)
• USGS Policies (www.usgs.gov/usgs-manual/95imlist.html)
– Scientific Data Management Foundation
– Metadata for Scientific Data, Software, and Other Information Products
– Review and Approval of Scientific Data for Release
– Preservation Requirements for Digital Scientific Data
• Moving towards repository services to manage data and publications for discovery and access.
Capabilities of Data Repositories
• The technical infrastructure for managing digital content is evolving.
• Publishers and research institutions currently have very different strategies
• Publishers are using a variety of expensive commercial products that can scale to their needs.
• Research institutions are developing repository solutions using Open Source products that require significant investment and development
• Neither path is sustainable and the two rarely interface with one another
• The two also place different priorities on persistence, relationships to other objects, and end-user capabilities
• Capabilities like Application Programming Interfaces (APIs) and Representational State Transfer (REST) services are being sought by users of both of these solutions
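To make the REST idea above concrete, the following is a rough sketch of how a repository might map resource paths to JSON responses. The `/datasets` path layout, the `handle_get` function, and the in-memory catalog are all hypothetical and chosen only for illustration; a production service would sit behind a real web server and its own data store.

```python
import json

# Hypothetical in-memory catalog standing in for a repository's data store.
DATASETS = {
    "ds-001": {"title": "Example stream gauge readings", "format": "text/csv"},
}


def handle_get(path: str):
    """Resolve a REST-style GET path to a (status_code, JSON body) pair."""
    parts = [p for p in path.strip("/").split("/") if p]
    if parts == ["datasets"]:
        # Collection resource: list the identifiers of all data sets.
        return 200, json.dumps({"datasets": sorted(DATASETS)})
    if len(parts) == 2 and parts[0] == "datasets":
        # Item resource: return the metadata record for one data set.
        record = DATASETS.get(parts[1])
        if record is None:
            return 404, json.dumps({"error": "dataset not found"})
        return 200, json.dumps({"id": parts[1], **record})
    return 404, json.dumps({"error": "unknown resource"})


status, body = handle_get("/datasets/ds-001")
```

Each data set gets its own stable address (`/datasets/<id>`), so a publisher platform or a researcher's script can retrieve a record without knowing anything about the system behind it.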
Access to Data: Role of Librarians
• Librarians have a unique skill set to help address the requirements for data access in this changing environment
• Librarians sit at the intersection between:
– Publishers and Users
– Researchers and Institutional Repositories
– Publishers and Institutions
• Librarians can, and should, be working to improve the path for data access through improving the interchange that occurs at these intersections
• Librarians – and not necessarily libraries – are needed to support a culture of continuous improvement in access and usability of both publications and data
• Boston Public Library to Tackle Boston’s Data
– http://www.bostonherald.com/business/business_markets/2015/04/library_to_tackle_bostons_data
Access to Data: Role of Publishers
• Publishers, like funders, need to establish standards for data sharing and facilitate access to data, regardless of where it is housed
• Publishers need to build on the current trends and make the data they provide more useful to the users of their content, including:
– Interactivity and visualization with data associated with publications
– Data services for accessing data through direct interchange
• Improved integration with institutional repositories to facilitate access to ancillary material regardless of where it resides
Access to Data: Other Commercial Interests
• The responsibility for data publication and access does not reside solely with publishers and institutional repositories
• A number of other commercial interests can step up to support improved access to data and enhanced services for its re-use
• Research Data Management is a growing opportunity for both commercial and non-commercial development
• Commercial services that support scientists like Mendeley, Flow, and EndNote could develop tools for discovery, visualization, and integration with other sources of data and analysis
• As publishers and repositories improve, the ability to support researchers with enhanced tools grows.
* A Note About Copyrights
• Just a note to clarify that in the United States, copyright law does not apply to facts, data, or ideas. However, copyright may protect a collection of data as contained in a database or compilation, but only if it meets certain requirements.
• In Europe, however, the law provides much greater protection of databases. It prohibits the extraction or reutilization of any database in which there has been a substantial investment in obtaining, verifying, or presenting the data contents.
– Database legal protection
• http://www.bitlaw.com/copyright/database.html
Questions and Feedback
• What questions did today’s presentation raise for you?
Feedback: Richard [email protected]