Introduction to Data Management

Introduction to Data Management. Data Management: Overview of research data – Joel Roselin, Office of Research Compliance and Training.

Jan 01, 2016


Jocelyn Lester
Transcript
Page 1

Introduction to Data Management

Page 2

Data Management

• Overview of research data – Joel Roselin, Office of Research Compliance and Training
• Data Storage and Retention – Danianne Mizzy, Engineering Librarian
• Data Sharing – Kathryn Pope, Center for Digital Research and Scholarship

Page 3

Goals of research

• The primary goals of research are:
  – To advance knowledge
  – To improve life for people (or animals)
• Secondary goals of research:
  – Career advancement
  – Professional recognition
  – Financial gain

Page 4

When you conduct research…

• …You are entrusted with:
  – Human subjects
  – Animals
  – Access to specialized materials and technology
    • Chemicals
    • Drugs
    • Machinery
    • Information (personal or confidential)
  – Funding from government or industry

Page 5

When you conduct research…

• Not everyone is granted the privilege to conduct research:
  – Qualifications include:
    • Advanced degree (or enrolled in a degree program)
    • Position in a research institution
  – Promise to:
    • Be responsible in the conduct of the research
    • Be responsible stewards of the research dollars and other resources
    • Share the results of the research for the good of society

Page 6

When you conduct research…

• The privilege can be revoked for failing to fulfill professional responsibilities:
  – Not getting funding
  – Debarment
  – Loss of position

Page 7

What are data?

• What counts as data in your field?

Page 8

What are data?

• What counts as data in your field?
  – Subject data (humans or animals)
    • Blood cell counts
    • Observational
    • Survey responses
  – Lab data
    • Test results
    • Assays
  – Other data
    • Library information
    • Photographs

Page 9

What are data?

True or False

In scientific research, only the information and observations that are made as part of scientific inquiry are considered data.

Page 10

It’s ALL data

• False!
• Data are not only the information and observations made as part of scientific inquiry but also the materials, the means, and the products of that inquiry (sometimes called data sources).
• Examples:
  – Cell lines
  – Survey instruments
  – Associated software
  – Specimens

Page 11

Everything is Data

Everything is data and data is everything!

Page 12

Sensitive Data

• Some data are highly sensitive
  – Protected Health Information (PHI), including insurance information
  – Personal information such as Social Security numbers, financial data
• Inappropriate release of sensitive information can lead to harms:
  – Privacy violations
  – Identity theft
  – Financial liability for the University
• Sensitive information is highly regulated and requires security, e.g. encryption
• University resources:
  – HIPAA website
  – IRB website
  – Policy on Electronic Data Security Breach Reporting and Response


Page 14

Takeaways

• Everything is data and data is everything!
• The PI has stewardship (control) of a project's data, with regard to publication and copyright.

Page 15

Data Management & Retention

Danianne Mizzy, Engineering Librarian

Page 16

Data Management & Retention

• Funder requirements
  – Minimum or maximum?
  – Just because it is not required doesn’t mean you don’t need to consider and address long-term access
• Columbia Data Retention Policy
  – Research data must be archived for a minimum of three years after the final project close-out, with original data retained wherever possible.

Page 17

Relevant Policies

• CU Policies & Procedures
  – Administrative Code of Conduct
  – Statement of Ethical Conduct
  – Faculty Handbook
  – Sponsored Projects Handbook
  – Clinical Research Handbook
  – Electronic Information Resources Security
• Funder Requirements

Page 18

Agency Retention Periods

• HIPAA – At least 6 years
• NIH – 3 years
• NSF – What constitutes reasonable procedures will be determined by the community of interest through the process of peer review and program management.
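When several of these rules apply to one project, the longest minimum governs. The following is a minimal sketch of that arithmetic, not an official calculator: the year counts are the ones quoted on the slides (Columbia's three-year figure comes from the retention policy slide), NSF is left out because the slide gives no fixed number, and counting every period from the project close-out date is a simplifying assumption. The function names are hypothetical.

```python
from datetime import date

# Minimum retention periods in years, as quoted on the slides.
# NSF is omitted because the slide gives no fixed number of years.
RETENTION_YEARS = {"Columbia": 3, "NIH": 3, "HIPAA": 6}


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def earliest_disposal_date(close_out: date, rules: list[str]) -> date:
    """Apply the longest minimum among the rules that cover the project.

    Assumption: every period is counted from the close-out date.
    """
    years = max(RETENTION_YEARS[rule] for rule in rules)
    return add_years(close_out, years)


print(earliest_disposal_date(date(2016, 6, 30), ["Columbia", "NIH", "HIPAA"]))
# prints 2022-06-30
```

Check the actual policy text and award terms before disposing of anything; the starting point of each period may differ from the assumption made here.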

Page 19

Data Storage Planning

• Need to plan for entire life-cycle
• Establish a baseline and project the rate of growth for the duration of the project.
• Active
  – Frequent additions & updates
• Archival
  – In fixed form, only need periodic access
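Projecting growth from a baseline can be as simple as a straight-line estimate. The sketch below assumes linear growth; the baseline, the monthly growth figure, and the 36-month duration are made-up illustration values, and the function name is hypothetical.

```python
def projected_storage_gb(baseline_gb: float, growth_gb_per_month: float, months: int) -> list[float]:
    """Return the expected storage need at the start and after each month.

    Assumes linear growth; replace with a measured rate where one exists.
    """
    return [baseline_gb + growth_gb_per_month * m for m in range(months + 1)]


# Illustration values only: 50 GB today, 5 GB of new data per month, 3-year project.
projection = projected_storage_gb(baseline_gb=50.0, growth_gb_per_month=5.0, months=36)
print(f"Start: {projection[0]:.0f} GB, end of project: {projection[-1]:.0f} GB")
# prints Start: 50 GB, end of project: 230 GB
```

The end-of-project figure is the one to size archival storage against; the month-by-month values help size active storage.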

Page 20

Data Storage Considerations

• Size
• Retention period
• Privacy or security requirements?
• Sharing?

Page 21

Data Storage Options at CU

Active (Working) Storage

CUIT
– 500 MB personal critical data
– Workgroup Space on Central
  • $400 per gigabyte per year, with a minimum of a half gigabyte (500 MB)
– Research Computing Services
  • High Performance Cluster
  • For more information contact [email protected]

School & Departmental servers
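The workgroup-space pricing on this slide lends itself to a quick budget estimate. This sketch uses only the two figures quoted on the slide; treating the charge as strictly proportional to size above the half-gigabyte minimum is an assumption, and the function name is hypothetical. Current rates may differ.

```python
RATE_PER_GB_YEAR = 400.0  # dollars per gigabyte per year, as quoted on the slide
MINIMUM_GB = 0.5          # smallest billable allocation, as quoted on the slide


def annual_workgroup_cost(size_gb: float) -> float:
    """Estimate the yearly charge for a workgroup allocation.

    Assumption: the charge scales linearly with size above the minimum.
    """
    billable_gb = max(size_gb, MINIMUM_GB)
    return billable_gb * RATE_PER_GB_YEAR


print(annual_workgroup_cost(0.2))  # prints 200.0 (billed at the 0.5 GB minimum)
print(annual_workgroup_cost(3))    # prints 1200.0
```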

Page 22

Data Storage Options at CU

Active Storage

Library

Center for Digital Research & Scholarship (CDRS)
– Consultation available

Page 23

Data Storage Options at CU

Archival Storage

• Library – Academic Commons

Page 24

Data Management Planning

• What file formats? Are they long-lived?
  – Long-lived
  – Non-proprietary
• Storage and backup strategy?
  – Media – CDs and DVDs not long-lived
• What project and data identifiers will be assigned?
• Naming conventions, file/directory structure
• Version control
• Is there a metadata scheme or other community standard for data sharing/integration?
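A naming convention is easiest to follow when it is generated rather than typed. The pattern below (project identifier, ISO-style date, short description, two-digit version) is one possible convention chosen for illustration, not one prescribed by the slides; the function name and the sample project ID are hypothetical.

```python
import re
from datetime import date


def data_file_name(project_id: str, description: str, created: date, version: int, ext: str) -> str:
    """Build a file name of the form PROJECT_YYYYMMDD_description_vNN.ext.

    The description is lower-cased and anything other than letters and
    digits is collapsed to a single hyphen, so names sort predictably
    and stay portable across operating systems.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{project_id}_{created:%Y%m%d}_{slug}_v{version:02d}.{ext}"


print(data_file_name("PRJ001", "Survey responses (wave 2)", date(2011, 5, 13), 3, "csv"))
# prints PRJ001_20110513_survey-responses-wave-2_v03.csv
```

Putting the date in year-month-day order and zero-padding the version keeps a plain alphabetical listing in chronological and version order.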

Page 25

CU Security Policy

• Individuals who access or control University electronic information resources must take appropriate and necessary measures to ensure the security, integrity, and protection of these resources, using appropriate physical and logical security measures.

Page 26

Data Security and Data Integrity

• Unencrypted vs. Encrypted
  – Keep passwords & keys on paper in a secure location
  – and in an encrypted digital file
• Uncompressed vs. Compressed

Page 27

Security - Physical

• Restrict access to computers, offices and storage media
• Store lab notebooks, samples in locked cabinets
• Only let trusted individuals troubleshoot computer problems
• Appropriate environmental controls

Page 28

Security - Network

• Keep confidential and sensitive data on computers not connected to the Internet
• Keep virus protection up to date
• Don't send confidential data via e-mail or FTP (use encryption, if you must)
• Use passwords on files and computers
• Data disposition at end of retention period

Page 29

Security – CU Encryption Options

CUIT

• BitLocker for removable storage devices
• Can purchase Guardian Hard Disk Encryption through CUIT
• Windows Encrypting File System (native)
• Apple – FileVault (native)
• WinZip/7-Zip/TrueCrypt
• Savant Application Whitelist software

Page 30

Back-ups

• Make 3 copies
  – Original
  – External/local
  – External/remote – different geographic area
• Verify recovery is possible
  – Checksum validation
  – Test file restore after initial set-up
  – Periodically thereafter
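Checksum validation means computing a digest of the original file and confirming that each backup copy produces the same digest. The sketch below uses SHA-256 from Python's standard `hashlib`; the choice of algorithm and the helper names `sha256_of` and `verify_copies` are assumptions made for illustration, not something the slides specify.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original: Path, copies: list[Path]) -> dict[Path, bool]:
    """Report, for each backup copy, whether it exists and matches the original."""
    expected = sha256_of(original)
    return {copy: copy.exists() and sha256_of(copy) == expected for copy in copies}
```

Calling `verify_copies` with the original file and the paths of the local and remote copies returns a mapping from each copy to `True` or `False`. A matching checksum shows the bytes are intact; it does not replace an actual test restore, which also confirms that the backup tooling works.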

Page 31

Data Back-up Options

• Hard Drive
• Tape Back-up
• Server
• Cloud Storage
  – Amazon S3
  – Subject Repository/Data Centers
    • (PubChem, Dryad, IRI/LDEO)

Page 32

Metadata

Structured information that describes, explains, locates, and otherwise makes it easier to retrieve and use an information resource.

3 main types:

• Descriptive
• Administrative
• Structural
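To make the three types concrete, here is a small illustrative record grouped by type: descriptive fields say what the resource is, administrative fields say how it is managed, and structural fields say how its parts fit together. Every field name and value below is hypothetical and follows no particular standard; a real project would use the scheme its community has adopted.

```python
import json

# Hypothetical metadata record for one dataset, grouped by metadata type.
record = {
    "descriptive": {      # what the resource is, for discovery
        "title": "Survey responses, wave 2",
        "creator": "Example Research Group",
        "keywords": ["survey", "wave 2"],
    },
    "administrative": {   # how the resource is managed
        "file_format": "text/csv",
        "created": "2011-05-13",
        "access": "restricted until de-identified",
    },
    "structural": {       # how the parts relate to each other
        "files": ["responses.csv", "codebook.txt"],
        "relation": "codebook.txt documents the columns of responses.csv",
    },
}

print(json.dumps(record, indent=2))
```

Storing the record as a plain-text JSON file next to the data keeps it readable without special software.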

Page 33

Major Research Metadata Standards

• Darwin Core (Biology)
• DDI (Data Documentation Initiative, for data sets in social and behavioral sciences)
• DIF (Directory Interchange Format, for scientific data sets)
• EML (Ecological Metadata Language)
• FGDC/CSDGM (geographic data)
• National Biological Information Infrastructure (NBII)

Page 34

Other DMP elements

• Who in the research group will be responsible for data management?
• Are there tools or software needed to create/process/visualize the data?

Page 35

Writing Data Management Plans

• Follow CU and funder policies and guidelines
• Can use CUL template as starting point
• Visit SCP web site for further information

http://scholcomm.columbia.edu/

Page 36

Data Management Plans - NSF

1. TYPES of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2. STANDARDS to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3. ACCESS and sharing policies including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. Policies and provisions for RE-USE, re-distribution, and the production of derivatives
5. Plans for ARCHIVING data, samples, and other research products, and for preservation of access to them
6. OR a justification of why no plan is needed

Page 37

Data Sharing Plan - NIH

1. Expected schedule for data sharing
2. Format of the final dataset
3. Documentation to be provided
4. Whether or not any analytic tools will be provided
5. Whether or not a data-sharing agreement will be required and, if so, a brief description of such an agreement
6. Mode of data sharing

Page 38

Takeaways

• Create a plan to manage your research data before the project begins
• Follow the plan
• At the end of the project, securely archive data of long-term value
• Properly dispose of obsolete or sensitive data
• Guidance available from OVPR and Scholarly Communications Program

Page 39

Sharing your data

Emerging practices

Page 40

Why isn’t data sharing the norm?

• not common in many disciplines
• not recognized in promotion/tenure
• researcher gives up control of data
• worries about being scooped or misinterpreted
• time required to present data in usable format
• lack of infrastructure and standards

Page 41

Sharing increasingly seen as valuable

“More and more often these days, a research project's success is measured not just by the publications it produces, but also by the data it makes available to the wider community.”

- Nature editorial, 9.10.09

“It is obvious that making data widely available is an essential element of scientific research.”

- Science editorial, 2.11.11

Page 42

New need for openness

“Science has always been about open debate. But incidents such as the UEA email leaks have prompted the Royal Society to look at how open science really is. With the advent of the Internet, the public now expect a greater degree of transparency. The impact of science on people’s lives, and the implications of scientific assessments for society and the economy are now so great that people won’t just believe scientists when they say “trust me, I’m an expert.” … Science has to adapt.”

- Geoffrey Boulton, chair, Royal Society working group for study: Science as a public enterprise: opening up scientific information, 5.13.11

Page 43

Sharing advances science

Sharing can help produce significant advances in research, as these projects have demonstrated.

• Human Genome Project
• NIH-funded Alzheimer’s study published in April 2011
• Sloan Digital Sky Survey

Page 44

Sharing benefits researchers

Rewards of sharing may include:

• opportunities to do innovative research
• research with higher impact
• support for transparency in research
• recognition, reciprocity from colleagues
• more opportunities to preserve data

Page 45

You may have to share

More funders are requiring it

The National Science Foundation now asks researchers requesting funding to show how they will share data.

• Grant applications must include a two-page data management plan.
• Data management and access plans will be evaluated “through the process of peer review and program management.”

Page 46

You may have to share

More journals are requiring it

“…authors are required to make materials, data and associated protocols promptly available to readers…. Nature journals reserve the right to refuse publication in cases where authors do not provide adequate assurances that they can comply...”

Page 47

What do you share?

NSF says data covered by its data management and sharing requirements will “be determined by the community of interest.”

This “may include, but is not limited to: data, publications, samples, physical collections, software and models.”

Page 48

Some data are not shareable

Be aware of reasons you may NOT want to share your data:

• Data must be scrubbed of confidential information before sharing.
• You may be able to justify not sharing if your data includes proprietary licenses or patentable items, is useful for further analyses, etc.

Page 49

How and when do you share?

“How” depends on…

• the format of your data
• funder and publisher requirements
• any restrictions on your data

“When” depends on…

• customary embargo periods
• whether relevant guidelines specify an amount of time within which data must be shared

Page 50: Introduction to Data Management. 2 Data Management Overview of research dataOverview of research data –Joel Roselin, Office of Research Compliance and.

5050

Guidelines from the NSF

Data should be provided at lowest possible cost.

Data may be made available via
• national data center
• widely available journal, book, or website
• institutional archives standard for discipline
• other EAR-specified repositories.

Data should be made available as soon as possible, but no later than two years after collection.

Division of Earth Sciences (EAR)
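The two-year limit above can be turned into a concrete date. The sketch below is only an illustration: the function name `latest_share_date` is hypothetical, and it assumes the period is counted in calendar years from the collection date, which the guideline text does not spell out.

```python
from datetime import date


def latest_share_date(collected, years=2):
    """Return the date `years` calendar years after the collection date.

    A Feb 29 collection date falls back to Feb 28 when the target year
    is not a leap year.
    """
    try:
        return collected.replace(year=collected.year + years)
    except ValueError:
        # Only reached for Feb 29 in a non-leap target year.
        return collected.replace(year=collected.year + years, day=28)
```

For data collected on 15 June 2010, `latest_share_date(date(2010, 6, 15))` gives 15 June 2012 under this assumption.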

Page 51

Online repositories

Repositories are:
• organized around institutions or subjects
• often open access
• archival, not active, storage for digital data

They may offer:
o long-term preservation and access
o search engine optimization
o permanent URL or DOI

Page 52

Columbia’s repository

Academic Commons (AC) accepts data and other materials from Columbia faculty, students, and staff, and provides:
• a permanent URL
• secure replicated storage
• accurate metadata
• globally accessible repository
• option for contextual linking between data and published research results

Page 53

Some subject-based repositories

NASA’s space science mission repository

Cryospheric data repository run by U of Colorado

Macromolecular structural data repository run by international consortium

NOAA’s marine data repository

Biological activities of small molecules data repository run by NCBI at Nat’l Library of Medicine

Page 54

More subject-based repositories

Deep-sea core samples repository housed at LDEO

Data repository for archeology and related disciplines run by nonprofit consortium

Basic and applied biosciences data repository run by consortium of publishers

Geodesy data repository run by university consortium

Social science data repository run by consortium

Page 55

Data licenses

• Copyright issues around data can be complex

• These groups offer “ready-made” licenses for data that help clarify any restrictions on reuse

Page 56

Data sharing is here to stay

Initiatives are underway to:
• establish norms for sharing
• create sharing and preservation infrastructure
• establish standards for interoperability
• clarify copyright and licensing issues

Data Conservancy

Digital Curation Centre

Page 57

Takeaways

• Data sharing requirements are being implemented by more funders and publishers.

• Norms and standards for sharing are not set and vary across disciplines.

• Be aware of sharing requirements and restrictions on your data.

• Find links to a variety of institutional and data repositories at http://scholcomm.columbia.edu

Page 58

Contacts

• Joel Roselin
• Office of Research Compliance and Training
• [email protected]

• Danianne Mizzy
• Engineering Librarian
• [email protected]

• Kathryn Pope
• Center for Digital Research and Scholarship
• [email protected]