Peter Lund, Anton Angelo, Chris Thomson (CEISMIC)
Jul 17, 2015
Overview
• What is data?
• Challenges in working with data
• Advantages of good data management
• Data management plans
• Practicalities
– Back up and storage
– Ethics
– Sharing data
– Licensing
– Resources
Learning outcomes
• Identify the benefits and drivers for good data management
• Appreciate the common elements of an effective data
management plan and why it is desirable to complete one
• Understand the benefits and challenges of sharing data
• Know how to describe your data
• Reflect on best practice for managing digital data effectively
• Understand what further help is available in managing data
• What kind of data do you collect?
• What challenges do you face in collecting data?
• Discuss in groups for 3 minutes
What is data?
Advantages of RDM
Compliance with funders’ & institutional policies
Reduces the risk of data loss
Facilitates sharing and reuse of data
Enhances the visibility of your research
Provides opportunities for collaborations
Funder requirements
Include the following matters in the final report to the Society required under
clause 4.2(c):
(i) Which data and sample repositories will be used to store the metadata,
data and samples collected as part of the Programme and
(ii) Where the metadata will be stored if no data or sample repositories are
available
A view from RCUK
1. Make data openly available where possible
2. Have policies and plans for research data and preserve data with long-term value
3. Provide sufficient metadata for discovery and provide information on
access to data in publications
4. Consider legal, ethical and commercial constraints on release of research
data
5. Protect the efforts of research data creators with appropriate embargoes
6. Acknowledge the source of research datasets and abide by the terms
and conditions of use
7. Ensure cost-effective use of public funds for RDM
Credit: Loughborough University
Research lifecycle
Credit: University of
California: Irvine
What is a data management plan?
• DCC Checklist
“A Data Management Plan is a project document
which describes the data (or similar evidence) that
a project will collect, how it will be stored during
the project, how it will be archived at the end of
the project and how access will be granted to it
where appropriate.”
Some practicalities…
Organise your files
• Directory structure naming conventions
• File naming conventions
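A file naming convention is easier to keep if a small script builds the names. The sketch below is only one possible convention (project, description, ISO date, two-digit version); the function name `build_filename` and the pattern itself are assumptions for illustration, not a prescribed standard.

```python
import re
from datetime import date


def build_filename(project: str, description: str, when: date,
                   version: int, ext: str) -> str:
    """Build a name such as 'quakes_survey-responses_2015-07-17_v02.csv'."""

    def slug(text: str) -> str:
        # Lower-case the text and turn any run of other characters into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    # ISO dates (YYYY-MM-DD) sort chronologically when sorted alphabetically.
    return (f"{slug(project)}_{slug(description)}_"
            f"{when.isoformat()}_v{version:02d}.{extension}")


print(build_filename("Quakes", "Survey responses", date(2015, 7, 17), 2, "CSV"))
# prints: quakes_survey-responses_2015-07-17_v02.csv
```

Whatever pattern you choose, the point is that every file in the project follows the same one.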
File formats for long-term access
• Non-proprietary
• Open, documented standard
• Common usage by research community
• Standard representation (ASCII, Unicode)
• Unencrypted
• Uncompressed
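As one way to meet these criteria, tabular data can be saved as plain, uncompressed UTF-8 CSV. This is a minimal sketch using Python's standard `csv` module; the helper names `rows_to_csv` and `save_csv` and the sample values are hypothetical.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Return the table as CSV text with Unix line endings."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def save_csv(path, header, rows):
    """Write the table to disk as unencrypted, uncompressed UTF-8 text."""
    # newline="" stops Python from rewriting the line endings on Windows.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(rows_to_csv(header, rows))


# Hypothetical sample values, for illustration only.
print(rows_to_csv(["sample_id", "reading"], [["sample-01", 1.5], ["sample-02", 2.25]]))
```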
Make it so one thing can’t ruin everything
Pen drives fail
Hard disk stolen with laptop
Hacked email account
Viruses and Malware
Cloud service issues
Fire
Sunspots
Cosmic rays
Alien attack
The Apocalypse
When Toy Story 2 almost
vanished
Video: https://www.youtube.com/embed/yIz9eqwLt9U
Rule of three
Removable Storage
• USB Key
• Hard Drive
Laptop or Desktop
• Backed up corporate folder?
Cloud Storage
• OneDrive / Google Drive
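The rule of three (removable storage, laptop or desktop, cloud storage) can be partly automated. The sketch below copies a file to extra folders and confirms each copy with a SHA-256 checksum. The folder names are hypothetical stand-ins for a mounted USB key and a cloud-synced folder, and `back_up` is an illustrative helper, not an ITS-provided tool.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


# Demonstration in a temporary directory; real paths would differ.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "laptop" / "interviews.csv"
    original.parent.mkdir()
    original.write_text("id,answer\n1,yes\n", encoding="utf-8")
    copies = back_up(original, [root / "usb_key", root / "cloud_sync"])
    print(len(copies) + 1, "copies in total")
```

Verifying the checksum matters because a copy that silently failed is not a backup.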
Ethics
Anonymity and confidentiality
• What personal information have you collected?
• What commitments have you made to protect personal data?
• The Privacy Act
• What have you said in your ethics application?
• Whose data is it?
Data Sharing
Sharing data and management snafu
in 3 short acts
Metadata
• Data about data
• What elements might
you use to describe
data?
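One possible answer, sketched as a small metadata record. The element names loosely follow Dublin Core terms, but the list of required elements, the helper `missing_elements`, and all the sample values are assumptions for illustration; a repository will have its own schema.

```python
import json

# Assumed minimum set of descriptive elements (illustrative only).
REQUIRED = ("title", "creator", "date", "description", "format", "rights")


def missing_elements(record: dict) -> list:
    """List the required elements that are absent or blank in a record."""
    missing = []
    for name in REQUIRED:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical record describing a hypothetical data set.
record = {
    "title": "Interview transcripts, pilot study",
    "creator": "A. Researcher",
    "date": "2015-07-17",
    "description": "Anonymised transcripts of ten pilot interviews.",
    "format": "text/csv",
    "rights": "CC BY",
}

print(json.dumps(record, indent=2))
print("Missing:", missing_elements(record))
```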
Data citation
• Academic impact is measured by
citation counts
• Your data should be cited by you and
others
Data set citation
• Cool, H. E. M., & Bell, M. (2011). Excavations at St
Peter’s Church, Barton-upon-Humber [Data set].
doi:10.5284/1000389
• DOIs are available from repositories e.g. UC
Research Repository, Figshare
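If you hold the citation elements as structured data, a citation in the style shown above can be generated consistently. This is a minimal sketch; `cite_dataset` is a hypothetical helper that follows the pattern of the example on this slide and does not cover every citation style.

```python
def cite_dataset(authors, year, title, doi):
    """Format a data set citation as: Authors (Year). Title [Data set]. doi:..."""
    if len(authors) > 1:
        # Join all but the last author with commas, then add ", & Last".
        names = ", ".join(authors[:-1]) + ", & " + authors[-1]
    else:
        names = authors[0]
    return f"{names} ({year}). {title} [Data set]. doi:{doi}"


print(cite_dataset(
    ["Cool, H. E. M.", "Bell, M."],
    2011,
    "Excavations at St Peter’s Church, Barton-upon-Humber",
    "10.5284/1000389",
))
```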
Publishing data
• PLOS
• Data journals, e.g.
– Scientific Data
– Geoscience Data Journal
• Subject repositories, e.g. RePEc, arXiv
• Figshare, Dryad
• UC Research Repository
Licensing
Image: Copyright Graffiti Sign by Horia Varlan
CC-BY
https://flic.kr/p/7vBD4T
Public Domain: Few Restrictions
Some Rights Reserved: Range of Licence Options
All Rights Reserved: Few Freedoms
Case Study: CEISMIC Canterbury
Earthquakes Digital Archive
Enabling effective data
management and reuse:
• Discoverability
• Ethics
• Licensing
• Technical
Discoverability
- Submit to your IR
- Use unique identifiers or URIs
- Provide metadata – you are the
best source
Ethics
- Identify data of long-term value
- Consent forms should cover:
- Storage & access
- How data can be reused
Licensing
- Use NZ CC licences for data
- Consider how ethics requirements
affect licensing
Technical
- Use ‘open’ formats, eg CSV
- Consider standards, eg http://dataprotocols.org/tabular-data-package/
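As a rough illustration of the Tabular Data Package idea, a CSV file is accompanied by a `datapackage.json` descriptor that names the file and its columns. The sketch below builds such a descriptor; the helper `describe_csv`, the package name, the file path and the column names are all hypothetical, and the exact required fields should be checked against the current specification.

```python
import json


def describe_csv(package_name, csv_path, fields):
    """Build a minimal descriptor: one CSV resource with a column schema."""
    return {
        "name": package_name,
        "resources": [
            {
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical data set: names and columns are for illustration only.
descriptor = describe_csv(
    "example-survey",
    "data/readings.csv",
    [("sample_id", "string"), ("depth_km", "number"), ("recorded", "date")],
)

# This text would be saved as datapackage.json next to the data folder.
print(json.dumps(descriptor, indent=2))
```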
Why you should manage your data
Compliance with funders’ & institutional policies
Reduces the risk of data loss
Facilitates sharing and reuse of data
Enhances the visibility of your research
Provides opportunities for collaborations
Resources
Mantra from Edinburgh University
DMPonline from the Digital Curation Centre
ITS support
Virtual machines: Windows (currently Windows Server 2012) or Linux (Red Hat Enterprise)
Bandwidth quota per month: 20 GB (40 GB for international students)
KAREN network from REANNZ
Storage and further resources on request
More help
RDM Subject guide
Anton Angelo
Research Data Coordinator
Liaison Librarians:
Kerry Gilmour
Dave Lane
Janette Nicolle
Cuiying Mu
Departmental IT Technicians
Peter Lund, Manager, Research Support
Importance of data management
plans
Credit: Mantra –
University of
Edinburgh
Photo credits
Taken from Flickr and used with attribution under CC licence
• Slide 1 Janeneka Staaks
• Slide 9 Caroline and Louis Volant
• Slide 10 Global Panorama