A basic course on Research data management, part 2: protecting and organizing your data
PROOF course Information Literacy and Research Data Management
TU/e, 24-01-2017
[email protected], TU/e IEC/Library
Available under CC BY-SA license, which permits copying and redistributing the material in any medium or format and adapting the material for any purpose, provided the original author and source are credited and you distribute the adapted material under the same license as the original.
Research data management
Sharing your data, or making your data findable and accessible with good data practices
→ protecting your data: backup, access control; file naming, organizing data, versioning
+ sharing your data via collaboration platforms and archives
Caring for your data, or making your data re-usable and interoperable with good data practices
+ metadata, tidy data, licenses
Research data management: what was it again?
Be safe
+ storage, backup (data safety, protecting against loss): use local ICT infrastructure (including SURFdrive) as much as possible
+ access control (data security, protecting against unauthorized use): with DataverseNL for example
Be organized, or: you should be able to tell what's in a file without opening it
+ file naming, organizing data in folders, versioning
+ data classification and retention; different treatment of different data (raw versus processed data)
Protecting your data: good data practices during your research
“…we can copy everything and do not manage it well.” (Indra Sihar)
File naming #1: be consistent and aim for concise but informative names
Good file names are consistent (use file-naming conventions), unique (each name distinguishes a file from files on similar subjects and from other versions of the same file) and meaningful (use descriptive names).
File-naming conventions help you find your data, help others find your data, and help track which version of a file is the most current.
Avoid using special characters in a file name: \ / : * ? < > | [ ] & $
Use underscores instead of periods or spaces to separate logical elements in a file name.
Avoid very long names: usually 25 characters is sufficient.
Names should include all necessary descriptive information, independent of where the file is stored.
Include dates and a version number in file names.
Add a readme.txt to each folder in which the file naming and its meaning are explained.
Source: File naming conventions
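The rules above can be checked mechanically. The following is a minimal sketch, not part of the original course material: the function name `check_file_name`, the choice to count only the part before the extension towards the 25-character rule of thumb, and the ISO date pattern (YYYY-MM-DD) are all assumptions made for illustration.

```python
import re

# Characters the slide advises against in file names.
FORBIDDEN = set('\\/:*?<>|[]&$')
MAX_STEM_LENGTH = 25  # the slide's rule of thumb, not a hard limit


def check_file_name(name: str) -> list[str]:
    """Return the convention violations found in one file name (empty if none)."""
    problems = []
    stem, dot, _extension = name.rpartition(".")
    if not dot:  # no extension present: the whole name is the stem
        stem = name
    bad = sorted(FORBIDDEN.intersection(name))
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if " " in name:
        problems.append("contains spaces")
    if "." in stem:
        problems.append("periods used as separators")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"stem longer than {MAX_STEM_LENGTH} characters")
    # Assumption: dates are written as YYYY-MM-DD.
    if not re.search(r"\d{4}-\d{2}-\d{2}", stem):
        problems.append("no ISO date (YYYY-MM-DD)")
    return problems
```

Calling `check_file_name("my data: final?.txt")` reports the special characters, the spaces and the missing date. A check for version numbers could be added in the same way once a project has settled on how it writes them.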
File naming #2: think about the ordering of elements within a file name
Order by date:
2013-04-12_interview-recording_THD.mp3
2013-04-12_interview-transcript_THD.docx
2012-12-15_interview-recording_MBD.mp3
2012-12-15_interview-transcript_MBD.docx
Order by subject:
MBD_interview-recording_2012-12-15.mp3
MBD_interview-transcript_2012-12-15.docx
THD_interview-recording_2013-04-12.mp3
THD_interview-transcript_2013-04-12.docx
Order by type:
Interview-recording_MBD_2012-12-15.mp3
Interview-recording_THD_2013-04-12.mp3
Interview-transcript_MBD_2012-12-15.docx
Interview-transcript_THD_2013-04-12.docx
Forced order with numbering:
01_THD_interview-recording_2013-04-12.mp3
02_THD_interview-transcript_2013-04-12.docx
03_MBD_interview-recording_2012-12-15.mp3
04_MBD_interview-transcript_2012-12-15.docx
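The effect of element order can be tried out in a few lines. This is an illustrative sketch only: the `Recording` class and the `file_name` helper are hypothetical names, and the sorting is plain alphabetical sorting as most file browsers apply it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    date: str       # ISO date (YYYY-MM-DD), so alphabetical order equals time order
    subject: str    # interviewee code, e.g. "THD"
    kind: str       # e.g. "interview-recording"
    extension: str  # file extension without the dot


def file_name(item: Recording, order: tuple[str, ...]) -> str:
    """Join the chosen elements with underscores, in the given order."""
    parts = [getattr(item, field) for field in order]
    return "_".join(parts) + "." + item.extension


items = [
    Recording("2013-04-12", "THD", "interview-recording", "mp3"),
    Recording("2012-12-15", "MBD", "interview-transcript", "docx"),
    Recording("2012-12-15", "MBD", "interview-recording", "mp3"),
    Recording("2013-04-12", "THD", "interview-transcript", "docx"),
]

# The first element of the name decides how the files group when sorted.
by_date = sorted(file_name(i, ("date", "kind", "subject")) for i in items)
by_subject = sorted(file_name(i, ("subject", "kind", "date")) for i in items)
```

Whichever element comes first determines the grouping you see in a sorted folder listing, so put the element you search by most often at the front.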
File organization
Source: Beatriz Ramirez, Data management plan for the PhD project: development and application of a monitoring system to assess the impacts of climate and land cover changes on eco-hydrological processes in an eastern Andes catchment area
Source: Haselager, dr. G.J.T. (Radboud University Nijmegen); Aken, prof. dr. M.A.G. van (Utrecht University) (2000): Personality and Family Relationships. DANS. http://dx.doi.org/10.17026/dans-xk5-y7vc
1. Main project folder (name of your research project/working title of your paper)
  1.1. Original data and metadata
    1.1.1. Original data (keep these read only)
      Any data that were necessary for any part of the processing and/or analysis you reported in your paper.
      Copies of all your original data files, saved in exactly the format they were in when you first obtained them.
      The name of the original data file may be changed.
    1.1.2. Metadata
      1.1.2.1. Supplements
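One way to keep the original data read only is to remove write permission from the files once they are in place. The sketch below is an assumption about how this could be done with the Python standard library; the function name `make_read_only` is hypothetical, and on Windows `chmod` only toggles the read-only flag.

```python
import stat
from pathlib import Path


def make_read_only(folder: Path) -> int:
    """Remove write permission from every file under folder; return the count."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    count = 0
    for path in folder.rglob("*"):
        if path.is_file():
            mode = path.stat().st_mode
            path.chmod(mode & ~write_bits)  # keep all other permission bits
            count += 1
    return count
```

This guards against accidental edits only. It does not replace a backup or access control on the storage itself.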
Organizing your data in folders #2: based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
  1.1. Original data and metadata
    1.1.1. Original data
    1.1.2. Metadata
      The Metadata Guide: a document that provides information about each of your original data files. Applies especially to obtained data files.
      A bibliographic citation of the original data files, including the date you downloaded or obtained them and any unique identifiers that have been assigned to them.
      Information about how to obtain a copy of the original data file.
      Whatever additional information is needed to understand and use the data in the original data file.
      1.1.2.1. Supplements
        Additional information about an original data file that is not written by yourself but is found in existing supplementary documents, such as users' guides and code books that accompany the original data file.
Organizing your data in folders #3: based on the TIER documentation protocol
Organizing your data in folders #4: based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
  1.1. Original data and metadata
    1.1.1. Original data
    1.1.2. Metadata
      1.1.2.1. Supplements
  1.2. Processing and analysis files
    1.2.1. Importable data files (the data you work with)
      A corresponding version for each of the original data files. This version can be identical to the original version, or in some cases it will be a modified version, for example with modifications required to allow your software to read the file (converting the file to another format, removing explanatory notes from a table…).
      The original and importable versions of a data file should be given different names.
      The importable data file should be as nearly identical as possible to the original.
      The changes you make to your original data files to create the corresponding importable data files should be described in a Readme file.
    1.2.2. Command files
    1.2.3. Analysis files
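The folder structure in the outline above can be created with a short script. This is a sketch under assumptions: the folder names are lower-case, underscore-separated renderings of the outline headings (the TIER protocol itself may prescribe other names), and `create_project_skeleton` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: folder names are underscore versions of the outline headings.
TIER_FOLDERS = [
    "original_data_and_metadata/original_data",
    "original_data_and_metadata/metadata/supplements",
    "processing_and_analysis_files/importable_data_files",
    "processing_and_analysis_files/command_files",
    "processing_and_analysis_files/analysis_files",
]


def create_project_skeleton(root: Path) -> list[Path]:
    """Create the folder tree under root and return the leaf folders."""
    created = []
    for relative in TIER_FOLDERS:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    # Placeholder for the Readme that documents changes to the original files.
    readme = (root / "processing_and_analysis_files"
              / "importable_data_files" / "readme.txt")
    if not readme.exists():
        readme.write_text(
            "Describe here how each importable file differs from its original.\n"
        )
    return created
```

Running `create_project_skeleton(Path("my_project"))` at the start of a project gives every paper the same layout, which makes it easier for others to find their way around your files.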
Organizing your data in folders #5: based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
  1.1. Original data and metadata
    1.1.1. Original data
    1.1.2. Metadata
      1.1.2.1. Supplements
  1.2. Processing and analysis files
    1.2.1. Importable data files
    1.2.2. Command files
      One or more files containing code written in the syntax of the (statistical) software you use for the study.
      Importing phase: commands to import or read the files and save them in a format that suits your software.
      Processing phase: commands that execute all the processing required to transform the importable version of your files into the final data files that you will use in your analysis (i.e. cleaning, recoding, joining two or more data files, dropping variables or cases, generating new variables).
      Generating the results: commands that open the analysis data file(s), and then generate the results reported in your paper.
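The three phases can be sketched as three small functions. This is an illustration only, assuming Python is the software used for the study: the column names `id` and `height_cm` and the inline data are invented, and in a real project each phase would typically live in its own command file and read from the importable data folder.

```python
import csv
import io
import statistics


def import_data(raw_text: str) -> list[dict[str, str]]:
    """Importing phase: read the importable file into rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows: list[dict[str, str]]) -> list[dict]:
    """Processing phase: drop incomplete cases and generate a new variable."""
    cleaned = []
    for row in rows:
        if row["height_cm"] == "":  # drop cases with a missing value
            continue
        cleaned.append({"id": row["id"],
                        "height_m": float(row["height_cm"]) / 100})
    return cleaned


def generate_results(analysis_rows: list[dict]) -> dict:
    """Generating the results: compute what is reported in the paper."""
    heights = [row["height_m"] for row in analysis_rows]
    return {"n": len(heights), "mean_height_m": statistics.mean(heights)}


# Hypothetical importable data, inlined so the sketch is self-contained.
raw = "id,height_cm\n1,180\n2,\n3,160\n"
results = generate_results(process(import_data(raw)))
```

Keeping the phases separate means anyone can rerun the whole chain from the importable files to the reported results, which is the point of storing command files next to the data.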
Organizing your data in folders #6: based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
  1.1. Original data and metadata
    1.1.1. Original data
    1.1.2. Metadata
      1.1.2.1. Supplements
  1.2. Processing and analysis files
    1.2.1. Importable data files
    1.2.2. Command files
    1.2.3. Analysis files
      The fully cleaned and processed data files that you use to generate the results reported in your paper.
      The Data Appendix: codebook for your analysis data files: brief description of the analysis data file(s), a complete definition of each variable (including coding and/or units of measurement), the name of the original data files from which the variable was extracted, the number of valid observations for the variable, and the number of cases with missing values.
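Two of the Data Appendix items, the number of valid observations and the number of cases with missing values, can be counted per variable. The sketch below assumes the analysis data are available as rows of text values and that an empty string or "NA" marks a missing value; both the function name `summarize_variables` and the example data are hypothetical.

```python
def summarize_variables(
    rows: list[dict[str, str]],
    missing: tuple[str, ...] = ("", "NA"),  # assumption: codes for missing values
) -> dict[str, dict[str, int]]:
    """Count valid and missing observations for every variable."""
    summary: dict[str, dict[str, int]] = {}
    for row in rows:
        for variable, value in row.items():
            counts = summary.setdefault(variable, {"valid": 0, "missing": 0})
            key = "missing" if value.strip() in missing else "valid"
            counts[key] += 1
    return summary


# Invented example rows standing in for an analysis data file.
rows = [
    {"age": "34", "income": "2100"},
    {"age": "NA", "income": "1800"},
    {"age": "29", "income": ""},
]
appendix = summarize_variables(rows)
```

The variable definitions, units and source files still have to be written by hand; only the counts lend themselves to being generated.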
95eef49d777c&owner=c057b578-4a6a-4449-881b-17fff17e2f1a (paragraph 6, example 1)
3. File organization: Haselager, dr. G.J.T., Aken, prof. dr. M.A.G. van (2000): Personality and Family Relationships. DANS. http://dx.doi.org/10.17026/dans-xk5-y7vc (Data guide, p. 24-26)
4. Version control: http://www.data-archive.ac.uk/create-manage/format/versions
5. Storage, back up of data: http://www.data-archive.ac.uk/create-manage/storage
6. Local ICT infrastructure: https://intranet.tue.nl/en/university/services/ict-services/ict-service-