GRAD 521, Research Data Management Winter 2014 - Lecture 1 Amanda L. Whitmire, Asst. Professor
Page 1: Introduction to research data management; Lecture 01 for GRAD521

GRAD 521, Research Data Management

Winter 2014 - Lecture 1

Amanda L. Whitmire, Asst. Professor

Page 2: Introduction to research data management; Lecture 01 for GRAD521

Lesson One Outline

Introductions

The importance of data management

What is/are ‘data’?

Page 3: Introduction to research data management; Lecture 01 for GRAD521

B.S. in Aquatic Biology, 2000
Worked in a bioluminescence laboratory

Ph.D. in Oceanography, emphasis in biological oceanography, 2008
Dissertation study area: bio-optics; using optical tools to study ocean ecology (N. California Current)

Post-doc in Oceanography, emphasis in biological oceanography, 2008-2012
Study area: bio-optics; using optical tools to study ocean ecology in low oxygen zones (N. Chile)

Assistant Professor, Data Management Specialist, Sept. 2012 - present

Page 4: Introduction to research data management; Lecture 01 for GRAD521
Page 5: Introduction to research data management; Lecture 01 for GRAD521

Course Overview

Overview of research data management, definitions & best practices

Types, formats & stages of research data

Data storage, backup & security

Metadata (data documentation)

Legal & ethical considerations of research data

Data sharing & reuse

Archiving & preservation

Page 6: Introduction to research data management; Lecture 01 for GRAD521

Pair & Share

Name

College/Department/Unit/etc.

1st year, 2nd year, etc.

What is/are data?

Page 7: Introduction to research data management; Lecture 01 for GRAD521

Why actively manage it?

What is data?

Page 8: Introduction to research data management; Lecture 01 for GRAD521

Research data is:

“…the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”

U.S. Office of Management and Budget, Circular A-110

Page 9: Introduction to research data management; Lecture 01 for GRAD521

What is research data?

“Unlike other types of information, research data are collected, observed, or created, for the purposes of analysis to produce and validate original research results.”

University of Edinburgh, MANTRA Research Data Management Training, ‘Research Data Explained’

Page 10: Introduction to research data management; Lecture 01 for GRAD521

What is data management?

Actions that contribute to effective storage, preservation and reuse of data and documentation throughout the research lifecycle.

Page 11: Introduction to research data management; Lecture 01 for GRAD521

Data management is not:

Data science

Computational science

Database administration

A research method:

• what data to collect
• how to collect them
• how to design an experiment

Page 12: Introduction to research data management; Lecture 01 for GRAD521

Why Data Management?

Page 13: Introduction to research data management; Lecture 01 for GRAD521

Images collected by DataOne.org

Page 14: Introduction to research data management; Lecture 01 for GRAD521

Data deluge

Data is collected from sensors, sensor networks, remote sensing, observations, and more - this calls for increased attention to data management and stewardship

Photo courtesy of www.carboafrica.net

Photo courtesy of http://modis.gsfc.nasa.gov/

Photo courtesy of http://www.futurlec.com

CC image by tajai on Flickr

CC image by CIMMYT on Flickr

Image collected by Viv Hutchinson

Page 15: Introduction to research data management; Lecture 01 for GRAD521

The World of Data Around Us

[Chart: petabytes worldwide (axis from 0 to 1,000,000) by year, 2005 to 2010. Legend: Information; Available Storage; Transient information or unfilled demand for storage.]

Source: John Gantz, IDC Corporation: The Expanding Digital Universe

Page 16: Introduction to research data management; Lecture 01 for GRAD521

Natural disaster

Facilities infrastructure failure

Storage failure

Server hardware/software failure

Application software failure

External dependencies (e.g. PKI failure)

Format obsolescence

Legal encumbrance

Human error

Malicious attack by human or automated agents

Loss of staffing competencies

Loss of institutional commitment

Loss of financial stability

Changes in user expectations and requirements

The World of Data Around Us: Data Loss

CC image by Sharyn Morrow on Flickr

CC image by momboleum on Flickr

Page 17: Introduction to research data management; Lecture 01 for GRAD521

Poor Data Management Affects Everyone

“MEDICARE PAYMENT ERRORS NEAR $20B” | (CNN) December 2004

Miscoding and billing errors from doctors and hospitals totaled $20,000,000,000 in FY2003 (9.3% error rate). The error rate measured claims that were paid despite being medically unnecessary, inadequately documented or improperly coded. In some instances, Medicare asked health care providers for medical records to back up their claims and got no response. The survey did not document instances of alleged fraud. This error rate actually was an improvement over the previous fiscal year (9.8% error rate).

“AUDIT: JUSTICE STATS ON ANTI-TERROR CASES FLAWED” | (AP) February 2007

The Justice Department Inspector General found only two sets of data out of 26 concerning terrorism attacks were accurate. The Justice Department uses these statistics to argue for their budget. The Inspector General said the data “appear to be the result of decentralized and haphazard methods of collections … and do not appear to be intentional.”

“OOPS! TECH ERROR WIPES OUT ALASKA INFO” | (AP) March 2007

A technician managed to delete the data and backup for the $38 billion Alaska oil revenue fund – money received by residents of the State. Correcting the errors cost the State an additional $220,700 (which of course was taken off the receipts to Alaska residents.)

Slide courtesy of BLM

Page 18: Introduction to research data management; Lecture 01 for GRAD521

A wildlife biologist for a small field office was the in-house GIS expert and provided support for all the staff’s GIS needs. However, the data was stored on her own workstation. When the biologist relocated to another office, no one understood how the data was stored or managed.

Solution: A state office GIS specialist retrieved the workstation and sifted through files trying to salvage relevant data.

Cost: 1 work month ($4,000) plus the value of data that was not recovered

Poor Science Data Management Example

CC image by DTRave on Open Clip Art Library

Page 19: Introduction to research data management; Lecture 01 for GRAD521

Importance of Data Management

The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work.

Page 20: Introduction to research data management; Lecture 01 for GRAD521
Page 21: Introduction to research data management; Lecture 01 for GRAD521

Manage your data for yourself:

o Keep yourself organized

o Track your research processes for reproducibility

o Better control versions of data

o Quality control your data more efficiently

Why Data Management: Researcher Perspective

Page 22: Introduction to research data management; Lecture 01 for GRAD521

Make backups to avoid data loss

Format your data for re-use (by yourself or others)

Be prepared: Document your data for your own recollection, accountability, and re-use (by yourself or others)

Prepare it to share it – gain credibility and recognition for your science efforts!

CC image by UWW ResNet on Flickr

Why Data Management: Researcher Perspective
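
The "make backups to avoid data loss" advice on this slide can be illustrated with a small sketch. It is not from the lecture: the function names `sha256_of` and `backup_and_verify` and the file `observations.csv` are hypothetical, and a real backup strategy would also involve off-site copies and a regular schedule. The sketch only shows one common idea, copying a file and then comparing checksums to confirm the copy is byte-identical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy `source` into `backup_dir`; return True if the checksums match."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy_path = backup_dir / source.name
    shutil.copy2(source, copy_path)  # copy2 also preserves file timestamps
    return sha256_of(source) == sha256_of(copy_path)


if __name__ == "__main__":
    # Demonstration in a throwaway directory with a made-up data file.
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "observations.csv"
        data_file.write_text("station,depth_m,oxygen\nA1,10,5.2\n")
        verified = backup_and_verify(data_file, Path(tmp) / "backup")
        print("backup verified:", verified)
```

Storing the checksum alongside the backup would also let you detect later corruption of either copy, not just a failed copy operation.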

Page 23: Introduction to research data management; Lecture 01 for GRAD521

Data is a valuable asset

It is expensive & time consuming to collect

Why data management: Foundation to advance science

Page 24: Introduction to research data management; Lecture 01 for GRAD521

Well-managed data can result in re-use, integration & new science

Spatio-Temporal Exploratory Models predict the probability of occurrence of bird species across the United States at a 35 km x 35 km grid.

Land Cover

Potential Uses:

• Examine patterns of migration
• Infer impacts of climate change
• Measure patterns of habitat usage
• Measure population trends

Model results

eBird

Meteorology

MODIS – Remote sensing data

Occurrence of Indigo Bunting (2008), map panels: Jan, Apr, Jun, Sep, Dec

Slide courtesy of DataONE

Page 25: Introduction to research data management; Lecture 01 for GRAD521

Data Integration Results

Images courtesy of Cornell Ornithology Lab

http://www.youtube.com/watch?v=Cik6fIuoPDk

Page 26: Introduction to research data management; Lecture 01 for GRAD521

Where a majority of data end up now…

Page 27: Introduction to research data management; Lecture 01 for GRAD521

Imagine if data were more accessible

Page 28: Introduction to research data management; Lecture 01 for GRAD521

New discoveries

A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799.

D. Lafrenière et al., Astrophysical Journal Letters

“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble.

“The second thing it tells you is having a well calibrated archive is necessary but not sufficient to make breakthroughs; it also takes a very innovative group of people to develop very smart extraction routines that can get rid of all the artifacts to reveal the planet hidden under all that telescope and detector structure.”

“Planet hidden in Hubble archives”

Science News Feb. 27, 2009

Image: D. Lafrenière et al., ApJ Letters

Page 29: Introduction to research data management; Lecture 01 for GRAD521

The data deluge has created a surge of information that needs to be well-managed and made accessible.

The cost of not doing data management can be very high.

Be cognizant of best practices and tools associated with the data lifecycle to manage your data well.

Many benefits are associated with the act of managing data, including the ability to find, access, understand, integrate and re-use data.

Summary

Page 30: Introduction to research data management; Lecture 01 for GRAD521

Summary, continued

If data are:

Well-organized

Documented

Preserved

Accessible

Verified as to accuracy and validity

The result is:

High quality data

Easy to share and re-use

Citation & credibility to the researcher

Cost-savings to science

Page 31: Introduction to research data management; Lecture 01 for GRAD521

Thursday

Data management plans & the research lifecycle

Homework: Take the pre-assessment survey (link in Canvas)

Page 32: Introduction to research data management; Lecture 01 for GRAD521

Archived slides

Page 33: Introduction to research data management; Lecture 01 for GRAD521

About You