EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 www.eudat.eu Research Data Management Version 2 August 2016 This work is licensed under the Creative Commons CC-BY 4.0 licence

EUDAT Research Data Management

Jan 24, 2017

Transcript
Page 1: EUDAT Research Data Management

EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065

www.eudat.eu

Research Data Management

Version 2, August 2016

This work is licensed under the Creative Commons CC-BY 4.0 licence

Page 2: EUDAT Research Data Management

The changing data landscape

Managing and sharing research data

EUDAT services

Overview

Page 3: EUDAT Research Data Management

THE CHANGING DATA LANDSCAPE

Image CC-BY-SA ‘data.path Ryoji.Ikeda - 3’ by r2hox

www.flickr.com/photos/rh2ox/9990016123

Page 4: EUDAT Research Data Management

Data explosion

More and more data is being created

The issue is not creating data, but being able to navigate and use it

Data management is critical to make sure data are well-organised, understandable and reusable

Image by ‘Coupmedia’ by http://www.coupmedia.com/resources/

Page 5: EUDAT Research Data Management

Digital data are fragile and susceptible to loss for a wide variety of reasons

• Natural disaster
• Facilities infrastructure failure
• Storage failure
• Server hardware/software failure
• Application software failure
• Format obsolescence
• Legal encumbrance
• Human error
• Malicious attack
• Loss of staffing competencies
• Loss of institutional commitment
• Loss of financial stability
• Changes in user expectations

Data loss

Image CC-BY ‘Hard Drive 016’ by Jon Ross www.flickr.com/photos/jon_a_ross/1482849745

Page 6: EUDAT Research Data Management

Link rot – more 404 errors generated over time

Reference rot* – link rot plus content drift i.e. webpages evolving and no longer reflecting original content cited

* Term coined by Hiberlink http://hiberlink.org

Data persistency issues

Jonathan D. Wren Bioinformatics 2008;24:1381-1385
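The difference between link rot and content drift can be sketched in code. This is a minimal illustration, assuming a SHA-256 checksum of the page was recorded at the time it was cited; the function name `classify_reference` and its return labels are hypothetical and not part of any Hiberlink or EUDAT tool.

```python
import hashlib
from typing import Optional


def classify_reference(status_code: int, cited_sha256: str,
                       current_content: Optional[bytes]) -> str:
    """Classify a cited web reference (hypothetical helper, illustration only)."""
    # Link rot: the URL no longer resolves (e.g. 404 Not Found, 410 Gone).
    if status_code >= 400 or current_content is None:
        return "link rot"
    # Content drift: the URL resolves, but the content differs from what was cited.
    if hashlib.sha256(current_content).hexdigest() != cited_sha256:
        return "content drift"
    return "intact"


# Checksum recorded when the page was cited (made-up content).
cited = hashlib.sha256(b"original page").hexdigest()

print(classify_reference(404, cited, None))
print(classify_reference(200, cited, b"rewritten page"))
print(classify_reference(200, cited, b"original page"))
```

A byte-exact checksum is a strict test: pages with dynamic elements would be flagged as drifted even when the cited content is unchanged, so a real check would likely need a more tolerant comparison.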

Page 7: EUDAT Research Data Management

A reproducibility crisis

Nature special issue

http://www.nature.com/news/reproducibility-1.17552

Several studies have shown alarming numbers of published papers that don’t stand up to scrutiny

Page 8: EUDAT Research Data Management

A wildlife biologist for a small field office was the in-house GIS expert and provided support for all the staff’s GIS needs. However, the data was stored on her own workstation. When the biologist relocated to another office, no one understood how the data was stored or managed.

Solution: A state office GIS specialist retrieved the workstation and sifted through files trying to salvage relevant data.

Cost: 1 work month ($4,000) plus the value of data that was not recovered

Consider that the situation could have been worse, because the data was not being backed up as it would have been if stored on a server.

Poor data management - science example

Page 9: EUDAT Research Data Management

In preparation for a Resource Management Plan, an office discovered 14 duplicate GPS inventories of roads. However, because none of the inventories had enough metadata, it was impossible to know which inventory was best or if any of the inventories actually met their requirements.

Solution: Re-Inventory roads

Cost: Estimated 9 work months per inventory @$4,000/wm (14 inventories = $504,000)
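The cost figures in the two examples follow from simple arithmetic on the quoted rate of $4,000 per work month. A minimal sketch, where the function name `rework_cost` is hypothetical:

```python
WORK_MONTH_COST_USD = 4_000  # rate quoted in both examples


def rework_cost(work_months: int, repeats: int = 1,
                rate: int = WORK_MONTH_COST_USD) -> int:
    """Cost in USD of redoing work: work months x number of repeats x rate."""
    return work_months * repeats * rate


salvage_cost = rework_cost(1)                  # science example: 1 work month
reinventory_cost = rework_cost(9, repeats=14)  # federal example: 14 inventories

print(salvage_cost)      # 4000
print(reinventory_cost)  # 504000
```

The science example's total is higher than the first figure suggests, because the value of the unrecovered data is not counted.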

Poor data management - federal example

Image CC-BY ‘Minature fake highway interchange in Chicago’ by Ryan www.flickr.com/photos/ryanready/4692092024

Page 10: EUDAT Research Data Management

Why manage research data?

• To make your research easier!
• To stop yourself drowning in irrelevant stuff
• In case you need the data later
• To avoid accusations of fraud or bad science
• To share your data for others to use and learn from
• To get credit for producing it
• Because funders or your organisation require it

Well-managed data opens up opportunities for re-use, integration and new science

Page 11: EUDAT Research Data Management

MANAGING & SHARING DATA

Image CC-BY-SA by https://www.flickr.com/photos/notbrucelee/8016192302

Page 12: EUDAT Research Data Management

Research data lifecycle

• CREATING DATA: designing research, DMPs, planning consent, locate existing data, data collection and management, capturing and creating metadata
• PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, manage and store data
• ANALYSING DATA: interpreting & deriving data, producing outputs, authoring publications, preparing for sharing
• PRESERVING DATA: data storage, back-up & archiving, migrating to best format & medium, creating metadata and documentation
• GIVING ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data
• RE-USING DATA: follow-up research, new research, undertake research reviews, scrutinising findings, teaching & learning

Ref: UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle

Page 13: EUDAT Research Data Management

• Bitstream
• Persistent Identifier
• Metadata

Digital objects can be aggregated to digital collections

What is a digital object?

https://b2share.eudat.eu/record/1
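The three parts of a digital object, and the aggregation of objects into a collection, can be sketched as a small data structure. This is only an illustration: the class names, the identifier strings and the idea that a collection carries its own identifier are assumptions, not a description of how B2SHARE or any other EUDAT service models records.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DigitalObject:
    bitstream: bytes          # the raw content itself
    pid: str                  # persistent identifier (placeholder values below)
    metadata: Dict[str, str]  # descriptive information about the content


@dataclass
class DigitalCollection:
    pid: str  # assumption: the collection has an identifier of its own
    members: List[DigitalObject] = field(default_factory=list)

    def member_pids(self) -> List[str]:
        """Return the identifiers of all objects aggregated in the collection."""
        return [obj.pid for obj in self.members]


# Made-up example values; "example-pid/..." is not a real identifier scheme.
obj = DigitalObject(
    bitstream=b"temperature,depth\n4.1,10\n",
    pid="example-pid/0001",
    metadata={"title": "Sample measurements"},
)
collection = DigitalCollection(pid="example-pid/collection-1", members=[obj])

print(collection.member_pids())
```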

Page 14: EUDAT Research Data Management

A DMP is a brief plan to define:
• how the data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved

DMPs are often submitted as part of grant applications, but are useful whenever researchers are creating data.

Data Management Planning
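The points a DMP has to define can be treated as a checklist. The sketch below flags which of them a draft plan has not yet answered; the dictionary keys and the function name `unanswered` are hypothetical and do not follow any funder's template.

```python
from typing import Dict, List

# One entry per point a DMP should define, as listed on the slide.
DMP_QUESTIONS: Dict[str, str] = {
    "creation": "How will the data be created?",
    "documentation": "How will it be documented?",
    "access": "Who will access it?",
    "storage": "Where will it be stored?",
    "backup": "Who will back it up?",
    "sharing_preservation": "Will it be shared and preserved, and how?",
}


def unanswered(plan: Dict[str, str]) -> List[str]:
    """Return the questions that the draft plan leaves empty or missing."""
    return [question for key, question in DMP_QUESTIONS.items()
            if not plan.get(key, "").strip()]


# Made-up draft with only two points filled in.
draft = {
    "creation": "Field sensors logging every 10 minutes",
    "storage": "Departmental file store",
}

for question in unanswered(draft):
    print("Still to define:", question)
```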

Page 15: EUDAT Research Data Management

Metadata and documentation are needed to locate and understand research data

Think about what others would need in order to find, evaluate, understand, and reuse your data.

Get others to check the metadata to improve quality

Use standards to enable interoperability

Metadata and documentation

Page 16: EUDAT Research Data Management

Where to store your data?

Your own drive (PC, server, flash drive, etc.)
– And if you lose it? Or it breaks?

Somebody else’s drive / departmental drive

“Cloud” drive
– Do they care as much about your data as you do?

Large scale infrastructure services like EUDAT

Page 17: EUDAT Research Data Management

How to back up?

3... 2... 1... backup!
– at least 3 copies of a file
– on at least 2 different media
– with at least 1 offsite

Use managed services where possible e.g. University filestores or infrastructure services like EUDAT rather than local or external hard drives

Ask IT teams for advice
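The 3-2-1 rule is easy to check mechanically. A minimal sketch, assuming each copy of a file is described by where it lives, what medium it is on and whether it is offsite; the `FileCopy` class and the example locations are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileCopy:
    location: str  # free-text description of where the copy lives
    medium: str    # e.g. "internal disk", "tape", "network filestore"
    offsite: bool  # True if stored away from the primary site


def satisfies_3_2_1(copies: List[FileCopy]) -> bool:
    """Check: at least 3 copies, on at least 2 different media, at least 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


laptop_only = [FileCopy("laptop", "internal disk", offsite=False)]
managed = [
    FileCopy("laptop", "internal disk", offsite=False),
    FileCopy("university filestore", "network filestore", offsite=False),
    FileCopy("remote data centre", "tape", offsite=True),
]

print(satisfies_3_2_1(laptop_only))  # False
print(satisfies_3_2_1(managed))      # True
```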

Page 18: EUDAT Research Data Management

Backup and preservation – not the same thing!

Backups
o Used to take periodic snapshots of data in case the current version is destroyed or lost
o Backups are copies of files stored for short or near-long-term
o Often performed on a somewhat frequent schedule

Archiving
o Used to preserve data for historical reference or potentially during disasters
o Archives are usually the final version, stored for long-term, and generally not copied over
o Often performed at the end of a project or during major milestones

Page 19: EUDAT Research Data Management

A mistake in a spreadsheet led to dramatically different results from those published.

These results were cited by the International Monetary Fund and the UK Treasury to justify austerity programmes.

Had the data been shared, this could have been picked up earlier.

The importance of sharing data

Page 21: EUDAT Research Data Management

Concerns About Data Sharing

Concern: inappropriate use due to misunderstanding of research purpose or parameters
Solution: metadata

Concern: security and confidentiality of sensitive data
Solution: metadata

Concern: lack of acknowledgement / credit
Solution: metadata

Concern: loss of advantage when competing for research dollars
Solution: metadata

Page 22: EUDAT Research Data Management

Concerns About Data Sharing

Concern: inappropriate use due to misunderstanding of research purpose or parameters
Solution: provide rich Abstract, Purpose, Use Constraints and Supplemental Information where needed

Concern: security and confidentiality of sensitive data
Solution:
• the metadata does NOT contain the data
• Use Constraints specify who may access the data and how

Concern: lack of acknowledgement / credit
Solution: specify a required data citation within the Use Constraints

Concern: loss of data insight and competitive advantage when vying for research dollars
Solution: create a second, public version with generalized Data Processing Description

Page 23: EUDAT Research Data Management

Making data shareable

Create robust metadata that has been checked

Include reference information e.g. unique IDs & properly formatted data citations

Publish your metadata so it’s discoverable. Use portals, clearing houses, online resources…

Package up the data and associated metadata to deposit in repositories
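Packaging data together with its metadata can be sketched using only the Python standard library. The layout below (a `data/` folder, a `metadata.json` file and a SHA-256 manifest) is an assumption chosen for illustration, not a format required by any particular repository; the function name `build_package` and the example values are hypothetical.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict


def build_package(files: Dict[str, bytes], metadata: Dict[str, str]) -> bytes:
    """Return a ZIP archive holding the data files, their metadata and checksums."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        manifest_lines = []
        for name, content in sorted(files.items()):
            archive.writestr(f"data/{name}", content)
            # Record a checksum so a recipient can verify each file.
            digest = hashlib.sha256(content).hexdigest()
            manifest_lines.append(f"{digest}  data/{name}")
        archive.writestr("metadata.json", json.dumps(metadata, indent=2))
        archive.writestr("manifest-sha256.txt", "\n".join(manifest_lines) + "\n")
    return buffer.getvalue()


# Made-up example content and metadata.
package = build_package(
    files={"roads.csv": b"id,name\n1,Example Road\n"},
    metadata={"title": "Road inventory", "identifier": "example-id-0001"},
)

with zipfile.ZipFile(io.BytesIO(package)) as archive:
    print(sorted(archive.namelist()))
```

Keeping the checksums inside the package means the deposit can be verified after transfer without contacting the depositor.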

Page 24: EUDAT Research Data Management

Deciding what to preserve and share

It’s not possible to keep everything. Select based on:
• What has to be kept e.g. data underlying publications
• What can’t be recreated e.g. environmental recordings
• What is potentially useful to others
• What has scientific, cultural or historical value
• What legally must be destroyed

How to select and appraise research data: www.dcc.ac.uk/resources/how-guides/appraise-select-research-data
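The selection criteria can be read as a simple decision procedure. The sketch below is one possible reading, not the DCC's method: it assumes that a legal duty to destroy overrides every reason to keep, and that data meeting none of the criteria becomes a candidate for disposal. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Appraisal:
    underlies_publication: bool = False  # has to be kept
    irreplaceable: bool = False          # can't be recreated
    useful_to_others: bool = False       # potentially useful to others
    lasting_value: bool = False          # scientific, cultural or historical value
    must_be_destroyed: bool = False      # legal obligation to destroy


def decide(appraisal: Appraisal) -> str:
    """Return 'destroy', 'keep' or 'candidate for disposal'."""
    # Assumption: a legal duty to destroy takes precedence over any reason to keep.
    if appraisal.must_be_destroyed:
        return "destroy"
    if (appraisal.underlies_publication or appraisal.irreplaceable
            or appraisal.useful_to_others or appraisal.lasting_value):
        return "keep"
    # Assumption: data meeting no criterion is reviewed for disposal.
    return "candidate for disposal"


print(decide(Appraisal(irreplaceable=True)))
print(decide(Appraisal(useful_to_others=True, must_be_destroyed=True)))
print(decide(Appraisal()))
```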

Page 25: EUDAT Research Data Management

EUDAT SERVICE SUITE

Image CC-BY-NC ‘Data centre’ by Bob Mical www.flickr.com/photos/small_realm/15995555571

Page 26: EUDAT Research Data Management

EUDAT services

EUDAT offers a pan-European solution, providing a generic set of services to ensure a minimum level of interoperability

Building common data services in close collaboration with 25+ communities

Page 27: EUDAT Research Data Management

EUDAT B2 service suite

Covering both access and deposit, from informal data sharing to long-term archiving, and addressing identification, discoverability and computability of both long-tail and big data, EUDAT’s services will address the full lifecycle of research data

Page 28: EUDAT Research Data Management

Support throughout the lifecycle

CREATING DATA

PROCESSING DATA

ANALYSING DATA

PRESERVING DATA

GIVING ACCESS TO DATA

RE-USING DATA

Page 29: EUDAT Research Data Management

www.eudat.eu

Authors Contributors

Sarah Jones, Digital Curation Centre

Mark van de Sanden, SURFsara

Thank you

Content has also been repurposed from the DataONE Educational modules ‘Data Management’ and ‘Data Sharing’. Retrieved from https://www.dataone.org/education-modules