The Ethics of Digital Preservation: Data Corruption is Worse than you Know. Kevin J. Comerford, MS, MFA, Associate Professor, Director of Digital Initiatives & Scholarly Communication, Director of Library IT Services, University Libraries

The Ethics of Digital Preservation

Jan 21, 2017

Transcript
Page 1: The Ethics of Digital Preservation

The Ethics of Digital Preservation: Data Corruption is Worse than you Know

Kevin J. Comerford, MS, MFA
Associate Professor
Director of Digital Initiatives & Scholarly Communication
Director of Library IT Services
University Libraries

Page 2: The Ethics of Digital Preservation

Data Loss

• Data loss is an error condition in information systems in which information is destroyed by failures or neglect in storage, transmission, or processing. Information systems implement backup and disaster recovery equipment and processes to prevent data loss or restore lost data. - Wikipedia. 2016. “Data Loss.”

Page 3: The Ethics of Digital Preservation

How Bad is Data Loss?

• CERN data studies show that on average, 1 out of every 1,500 files is corrupt
• 3-500 corrupt files on each PC hard drive
• Disk write errors most frequent
• Network transfer errors also frequent
• Memory, RAID, machine errors 3%
• 10% of catastrophic data loss in large organizations is due to “silent corruption” or “bit rot”

References:
Robin Harris. 2007. “Data Corruption is worse than you know.”
Wikipedia. 2016. “Data Degradation.”
Iron Mountain. 2016. “When data dies without a sound.”
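To get a feel for what a rate of 1 corrupt file in 1,500 means for a collection of a given size, the arithmetic can be sketched as follows. This is only an illustration: the function names are hypothetical, and the second function assumes that files are corrupted independently of one another, which the cited studies do not claim.

```python
def expected_corrupt_files(total_files: int, rate: float = 1 / 1500) -> float:
    """Expected number of corrupt files at the given per-file corruption rate.

    The default rate is the 1-in-1,500 figure quoted on the slide.
    """
    return total_files * rate


def prob_at_least_one_corrupt(total_files: int, rate: float = 1 / 1500) -> float:
    """Probability that at least one file is corrupt.

    Assumption (not from the source): each file is corrupt independently
    with probability `rate`.
    """
    return 1 - (1 - rate) ** total_files
```

Under these assumptions a collection of 150,000 files would be expected to contain about 100 corrupt files, and even a collection of only 1,500 files has roughly a 63% chance of containing at least one.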

Page 4: The Ethics of Digital Preservation

DATA PRESERVATION ETHICS

Page 5: The Ethics of Digital Preservation

Why Preserve Research Data?

• Responsibility to Colleagues
• Responsibility to Research Subjects
• Responsibility to the Public

• Preservation for Access
• Preservation for Validation
• Preservation for Re-Use

Page 6: The Ethics of Digital Preservation

An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications. Any restrictions on the availability of materials or information must be disclosed to the editors at the time of submission. Any restrictions must also be disclosed in the submitted manuscript, including details of how readers can obtain materials and information.

- Nature’s data accessibility policy (http://www.nature.com/authors/policies/availability.html)

Preservation for Access

Page 7: The Ethics of Digital Preservation

Preservation for Access

US Federal Funding Agency: Policy and Guideline Status

National Science Foundation (NSF)

"Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing."

"Proposals submitted to NSF must include a supplementary document of no more than two pages labeled 'Data Management Plan'. This supplementary document should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results."

National Endowment for the Humanities (NEH)

Beginning in Jan 2012, applicants for the NEH Office of Digital Humanities' Digital Humanities Implementation Grants will be required to submit a data management plan and a sustainability plan.

National Oceanic and Atmospheric Administration (NOAA)

"The Federal Ocean Data Policy requires that appropriate ocean data and related information collected under federal sponsorship be submitted to and archived by designated national data centers."

Page 8: The Ethics of Digital Preservation

Data Management Ethics

Data loss due to poor management affects not only you but also others. That is why good data management is an ethical responsibility for every researcher.

• Costing colleagues and subjects time and inconvenience is irresponsible

• Data mismanagement in publicly funded projects is wasteful and violates the public trust

High Security Data

• NEVER store private information on human subjects on public online services, not even for a moment. Never store budgetary data or confidential research data in a public location

• NO Google Docs
• NO Dropbox or SkyDrive
• NO Basecamp or other Data Sharing Site

Page 9: The Ethics of Digital Preservation

Data Management Ethics: Maintaining Data Accessibility

• Just as you need to keep the wrong people away from your data, you need to ensure that the RIGHT people have access to it

• Hit by a Bus Proposition: Consider what will become of project or lab research data if you leave for another post, or become incapacitated for a long period of time

• Document where your data is stored, how it is organized, and how to access it, so you or others can reconstruct your work

• Store access information securely, but make sure PIs and research team members know how to get to it

• Keep colleagues informed of changes and updates to your professional data storage areas

Page 10: The Ethics of Digital Preservation

DATA PRESERVATION IN PRACTICE

Page 11: The Ethics of Digital Preservation

Saving vs. Preserving Data

• SAVING is a 1-time event: A temporary measure to store the current state of data. Examples: Saving a Word document or burning a draft copy of a paper to disc

• Saving only ensures that data will remain available for a limited period of time

• Saving is the first step in preservation

Page 12: The Ethics of Digital Preservation

Saving vs. Preserving Data

• PRESERVING is an ongoing activity
• Preservation involves saving a final, complete version of data
• Preservation generally requires multiple copies of data
• Preservation requires planning, and often special tools
• Preservation requires regular monitoring of data to ensure that it does not become corrupt
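One common way to monitor data for corruption is to record a checksum for every file at the time it is preserved and to recompute the checksums later. The sketch below illustrates that idea with SHA-256; it is not a tool named in the slides, and the function names (`sha256_of`, `build_manifest`, `find_changes`) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under `root` (as a relative POSIX path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_changes(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the recorded files that are now missing or whose checksum differs."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A manifest built when the final version of the data is saved can be stored alongside it (for example as JSON) and passed to `find_changes` at each later check. A changed checksum only shows that the bytes differ; it cannot distinguish corruption from a deliberate edit.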

Page 13: The Ethics of Digital Preservation

Preserving Data

Preservation Planning is an important formal activity for data management

Page 14: The Ethics of Digital Preservation

Research data is preserved at a discrete point in the data life cycle

Preserving Data

Page 15: The Ethics of Digital Preservation

Strategies for Preservation

• Personal Information Management (PIM) research shows that most of us manage our data and information passively

• Data loss is inevitable when data is managed passively

• Data loss costs both time and money

• To avoid data loss, we have to manage our data and information ACTIVELY

Page 16: The Ethics of Digital Preservation

Strategies for Preservation: 5 Principles of Good Data and Information Management

– PLAN: Create a personal plan for managing your data

– CENTRALIZE: Minimize the number of data storage locations you have to manage

– ORGANIZE: Keep your data files ordered and identified according to a simple but effective master plan

– DUPLICATE: Routinely make safety copies of your important files

– SECURE: Keep your files and user accounts safe from theft or vandalism

Page 17: The Ethics of Digital Preservation

Strategies for Preservation

• Storage Strategy: LOCKSS Principle
– Digital Preservation research has developed the LOCKSS principle: Lots of Copies Keep Stuff Safe
– Keep 3 copies of your important data files
  • Active (Working) Copy
  • Safety Copy – Store separately from the active copy
  • Archive Copy – Store separately on semi-permanent media

• Maintain the Data Storage Chain
– Active Storage → Safety Storage → Archival Storage
– Don’t violate the order of access in the chain
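The three-copy chain can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's procedure: the function names are hypothetical, and it reads "order of access" to mean that the archive copy is made from the safety copy rather than directly from the active copy, which is an interpretation of the slide. Each copy is checked against its source before moving on.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file; reads the whole file at once, so suited to modest file sizes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copy_verified(source: Path, dest_dir: Path) -> Path:
    """Copy `source` into `dest_dir` and confirm the copy matches byte for byte."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    if file_digest(source) != file_digest(target):
        raise OSError(f"copy of {source} to {target} does not match the source")
    return target


def propagate(active_file: Path, safety_dir: Path, archive_dir: Path) -> tuple[Path, Path]:
    """Follow the chain in order: active -> safety -> archive."""
    safety_copy = copy_verified(active_file, safety_dir)
    archive_copy = copy_verified(safety_copy, archive_dir)
    return safety_copy, archive_copy
```

In practice `safety_dir` and `archive_dir` would point to separate devices or media, since copies on the same disk share the same failure.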

Page 18: The Ethics of Digital Preservation

Strategies for Preservation

[Diagram: the data storage chain. 1. Active Storage (Original Copy) → 2. Safety Storage (Backup Copy; back up daily) → 3. Archival Storage (Archive Copy; back up monthly or periodically)]

This is a simple but effective plan for managing your personal and research data

Page 19: The Ethics of Digital Preservation

Strategies for Preservation

To preserve your important data:

1. Store multiple copies of data files (min. 3)

2. Store data on several formats or devices (HDD, disc, tape, SSD)

3. Monitor the condition of your storage media (every 1-3 years)

4. Test the actual data files (every 1-3 years)
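One reason to keep at least three copies is that, when the data files are tested, a copy that disagrees with the other two can be singled out. The sketch below illustrates this with a simple majority comparison of SHA-256 checksums. The function name is hypothetical, and treating the majority as correct is an assumption: agreement between copies is strong evidence, not proof.

```python
import hashlib
from collections import Counter
from pathlib import Path


def find_divergent_copies(copies: list[Path]) -> list[Path]:
    """Return the copies that are missing or differ from the majority checksum.

    Raises ValueError when no checksum is shared by more than half of the
    copies, because then there is no basis for picking a "good" version.
    """
    digests = {
        path: hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        for path in copies
    }
    counts = Counter(d for d in digests.values() if d is not None)
    if not counts:
        raise ValueError("none of the copies could be read")
    majority, votes = counts.most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority among the copies")
    return [path for path in copies if digests[path] != majority]
```

A copy flagged this way can be replaced from one of the agreeing copies. Checksums do not show whether a file still opens in current software, so a periodic test should also include opening a sample of files.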

Page 20: The Ethics of Digital Preservation

UNM Data Services Available

• Data Management Planning
  • DMPs are required by agencies that fund research

• Data Curation
  • Converts research data into common formats

• Data Archiving
  • Preserves research data

• Data Publishing
  • Makes data accessible for peer review

http://library.unm.edu/services/data.php

Page 21: The Ethics of Digital Preservation

Questions & Discussion