Science Metadata Viv Hutchison US Geological Survey Core Science Analytics Synthesis & Libraries Denver, CO vhutchison@usgs.gov 5th NACP Principal Investigator’s Meeting Washington, DC January 25, 2015

Jan 11, 2016

Transcript
Page 1: Science Metadata

Viv Hutchison
US Geological Survey
Core Science Analytics Synthesis & Libraries
Denver, CO
vhutchison@usgs.gov

5th NACP Principal Investigator’s Meeting
Washington, DC
January 25, 2015

Page 2: Presenter

NACP Best Data Management Practices, January 25, 2015

Presenter: Viv Hutchison
• US Geological Survey
• Core Science Analytics Synthesis & Libraries program
• Branch Chief, Science Data Management
• Lead a team that works on application of the science data lifecycle for USGS scientists through best practices, tools, training

vhutchison@usgs.gov

ORNL, Oak Ridge, TN

Page 3: Topics

• Why metadata?
• Examples of metadata standards and how to choose one to use
• Tips on how to write quality metadata records
• Publishing metadata

CC image by Alec Couros on Flickr

Page 4: The Data Life Cycle

Page 5: Data Collection

CC images by Justin See, CIMMYT, acordova, kukkurovaca, SEDAC, and ISAS on Flickr

Page 6: From Field Notes to Datasets

Average Temperature of Observation for Each Species

Species                  | Average Temperature | Temperature Standard Deviation | Number of Observations | Minimum Temperature | Maximum Temperature
Northern Red-legged Frog | 4.4                 | ---                            | 1                      | 4.4                 | 4.4
Tailed Frog              | 7.0                 | 3.0                            | 3                      | 4                   | 10
Arizona Toad             | 10.0                | ---                            | 1                      | 10                  | 10
Strecker's Chorus Frog   | 10.5                | 2.0                            | 11                     | 9                   | 16
Oregon Spotted Frog      | 11.0                | 15.5                           | 2                      | 0                   | 22
New Jersey Chorus Frog   | 11.5                | 4.5                            | 17                     | 3                   | 22
Wood Frog                | 12.5                | 5.5                            | 897                    | 0                   | 28.8
Spring Peeper            | 13.2                | 5.6                            | 569                    | -1                  | 32
Red-legged Frog          | 13.3                | 5.9                            | 16                     | 4                   | 27
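
A summary table like this one can be derived from raw field observations in a few lines. The sketch below is only an illustration: the function name `summarize` and the sample observations are hypothetical, and it assumes the standard deviation in the table is a sample standard deviation, which is undefined for a single observation (shown as "---" in the table).

```python
from statistics import mean, stdev


def summarize(observations):
    """Group (species, temperature) pairs and compute per-species summary statistics."""
    by_species = {}
    for species, temp in observations:
        by_species.setdefault(species, []).append(temp)

    summary = {}
    for species, temps in by_species.items():
        summary[species] = {
            "mean": round(mean(temps), 1),
            # A sample standard deviation needs at least two observations
            "stdev": round(stdev(temps), 1) if len(temps) > 1 else None,
            "n": len(temps),
            "min": min(temps),
            "max": max(temps),
        }
    return summary


# Hypothetical field notes: (species, temperature at time of observation)
field_notes = [
    ("Oregon Spotted Frog", 0),
    ("Oregon Spotted Frog", 22),
    ("Arizona Toad", 10),
]
summary = summarize(field_notes)
```

Recording how each column was computed (sample versus population standard deviation, units, rounding) is exactly the kind of detail a metadata record should carry.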

Page 7: From Datasets to Published Papers

CC image by Heather Kennedy on Flickr

Page 8: Metadata is a critical part of the data picture

CC image by I like on Flickr

Page 9: Why Care About Metadata?

• Fourth Paradigm: scientific breakthroughs will increasingly be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.
• “Metadata must be preserved when scientific data is generated…” -- Jim Gray, The Fourth Paradigm
• The further apart in time and space the data producer and the re-user are, the more detailed the metadata must be.

Page 10: Metadata: Why Care?

“Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it.

Several times, I've seen colleagues called into court in order to testify about conditions they have observed.

Without a strong tradition of constant review and approval of basic data, they would've been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like to back up their testimony.

It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.”

Nelson Williams, Eastern Region, USGS Water

Page 11: Metadata: Why Care?

The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work.

• Senior climatologists were accused of manipulating important global temperature data
• Investigations emphasized need for data to be more open to ensure credibility and avoid future misguided controversy
• Metadata aids in open science

Page 12: Metadata: Why Care?

“Planet hidden in Hubble archives”, Science News (Feb. 27, 2009)

A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799. (D. Lafrenière et al., Astrophysical Journal Letters)

“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble.

…Metadata is critical in maintaining data in archives – for understanding data you discover

Page 13: The Value of Metadata

Metadata helps…
• Data developers
• Data users
• Organizations

Page 14: What is the Value to Data Developers?

• Metadata allows data developers to:
– Avoid data duplication
– Share reliable information
– Publicize efforts – promote the work of a scientist and his/her contributions to a field of study
– Reduce workload

CC image by US Embassy Guyana on Flickr

Page 15: What is the Value to Data Users?

• Metadata gives a user the ability to:
– Search, retrieve, and evaluate data set information from both inside and outside an organization
– Find data: Determine what data exists for a geographic location and/or topic
– Determine applicability: Decide if a data set meets a particular need
– Discover how to acquire the dataset you identified; process and use the dataset

CC image by ASEE on Flickr

Page 16: What is the Value to Organizations?

• Metadata helps ensure an organization’s investment in data:
– Documentation of data processing steps, quality control, definitions, data uses, and restrictions
– Ability to use data after initial intended purpose
• Transcends people and time:
– Offers data permanence
– Creates institutional memory
• Advertises an organization’s research:
– Creates possible new partnerships and collaborations through data sharing

CC image by mambol on Flickr

Page 17: When data isn’t well managed…

[Figure: information content (vertical axis) versus time (horizontal axis). Labels: Time of publication; Specific details; General details; Accident; Retirement or career change; Death.]

(Michener et al. 1997)

Page 18: Memory Check

50% change in global average

Why?

i checked my 2002 email archives, and here is what i found out:

it appears that the current 3rd generation algorithm was implemented into operations around Oct-Nov 2002 time frame. cannot say more precisely, as all email correspondence i am looking at, talks about this indirectly. (maybe it's what's referred to as the Phase II algorithm.) At the same time, we had implemented quite a few other changes fixing data bugs and formats: view angle problem, increased digitization in all channel's reflectances and AODs, etc.

The jump is deemed due to introducing 3rd generation algorithm, which replaced the 2nd generation. The new numbers (~0.08) look more realistic than the previous ones (~0.05 or so). The changes seen in the data is close to the expected effect of this change. The 3rd gen alg takes into account the exact spectral response, whereas the 2nd gen is generic ("one size fits all").

hopefully this settles the issue..

Page 19: Information Entropy

[Figure: data details (vertical axis) versus time (horizontal axis)]

Sound information management, including metadata development, can arrest the loss of dataset detail.

Page 20: Still…There are Occasional Concerns About Creating Metadata

Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describe the data.

CC image by waterlilysage on Flickr

Page 21: Let’s Address these Concerns…

Concern | Solution
workload required to capture accurate robust metadata | incorporate metadata creation into data development process – distribute the effort
time and resources to create, manage, and maintain metadata | include in grant budget and schedule
readability / usability of metadata | use a standardized metadata format
discipline specific information and ontologies | use ‘profile’ standard to require specific information and use specific values

Page 22: Selecting a Standard

Page 23: Choosing a Metadata Standard

Many standards collect similar information…factors to consider:

Your data type:
• Are you working mainly with GIS data? Raster/vector or point data? Do you have biological or shoreline information in your dataset?
– Consider the FGDC Content Standard for Digital Geospatial Metadata with one of its profiles: the Biological Data Profile or the Shoreline Data Profile.
• Are you working with data retrieved from instruments such as monitoring stations or satellites? Are you using geospatial data services such as web-mapping applications or data modeling?
– If so, then consider using the ISO 19115-2 standard
• Are you mainly working with ecological data?
– Consider Ecological Metadata Language (EML)
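
The questions on this slide amount to a small decision procedure. The sketch below encodes them as a hypothetical helper, `suggest_standard`; the parameter names are assumptions made for illustration, and the result is only a starting point, since organizational policy and available tools also matter.

```python
def suggest_standard(gis_data=False, biological=False, shoreline=False,
                     instrument_or_services=False, ecological=False):
    """Return candidate metadata standards based on the slide's data-type questions."""
    suggestions = []
    if gis_data or biological or shoreline:
        name = "FGDC CSDGM"
        # The FGDC profiles extend the base standard for specific content
        if biological:
            name += " with the Biological Data Profile"
        elif shoreline:
            name += " with the Shoreline Data Profile"
        suggestions.append(name)
    if instrument_or_services:
        suggestions.append("ISO 19115-2")
    if ecological:
        suggestions.append("Ecological Metadata Language (EML)")
    return suggestions


candidates = suggest_standard(gis_data=True, biological=True)
```

More than one standard can apply to the same dataset, which is why the helper returns a list and not a single answer.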

Page 24: Choosing a Metadata Standard

– Your organization’s policies: do they state which standard to use?
– What tools are available to create metadata? Examples of tools:

FGDC CSDGM:
– Mermaid (NOAA)
– Metavist (Forest Service)
– Online Metadata Editor (USGS)
EML:
– Morpho (KNB)
ISO:
– XML Spy or Oxygen
– CatMD

Other factors: Availability of human support; instructional materials; use of controlled vocabularies; output formats

Page 25: Writing Quality Metadata

Page 26: Steps to Create Quality Metadata

• Organize your information
– Did you write a project abstract to obtain funding for your proposal? Re-use it in your metadata!
– Did you use a lab notebook or other notes during the data development process that define measurements and other parameters?
– Do you have the contact information for colleagues you worked with?
– What about citations for other data sources you used in your project?

CC image on Google Images

Page 27: Steps to Create Quality Metadata

• Write your metadata using a metadata tool
• Submitting to the DAAC? A metadata creation process is in place for you.

Page 28: Steps to Create Quality Metadata

• Review for accuracy and completeness
• Have someone else read your record
• Revise the record, based on comments from your reviewer
• Review once more before you publish

CC images by mujalifah and Shelly Munkberg on Flickr

Page 29: Tips for Writing Quality Metadata

• Do not use jargon -- define technical terms and acronyms:
– CA, LA, GPS, GIS : what do these mean?
• Clearly state data limitations
– E.g., data set omissions, completeness of data
– Express considerations for appropriate re-use of the data
• Use “none” or “unknown” meaningfully
– None usually means that you knew about data and nothing existed (e.g., a “0” cubic feet per second discharge value)
– Unknown means that you don’t know whether that data existed or not (e.g., a null value)

CC image by kruuscht on Flickr
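
The difference between “none” and “unknown” carries over directly into how data values are handled in code. A minimal sketch, assuming a discharge reading is stored as a number when measured and as `None` when no measurement exists (the function name `describe_discharge` is hypothetical):

```python
def describe_discharge(value):
    """Describe a discharge reading, keeping a measured zero distinct from a missing value."""
    # Test for None explicitly: a plain truthiness check would treat 0 and None alike
    if value is None:
        return "unknown: no measurement recorded"
    if value == 0:
        return "none: measured, and discharge was 0 cubic feet per second"
    return f"{value} cubic feet per second"


measured_zero = describe_discharge(0)
missing = describe_discharge(None)
```

Stating in the metadata which convention a dataset uses for missing values saves the next user from guessing.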

Page 30: Tips for Writing Quality Metadata

Titles, Titles, Titles…
• Titles are critical in helping readers find your data
– While individuals are searching for the most appropriate data sets, they are most likely going to use the title as the first criterion to determine if a dataset meets their needs.
– Treat the title as the opportunity to sell your dataset.
• A complete title includes: What, Where, When, Who, and Scale
• An informative title includes: topic, timeliness of the data, specific information about place and geography

Page 31: Tips for Writing Quality Metadata

• A Clear Choice: Which title is better?
• Rivers
OR
• Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)

Greater Yellowstone (where) Rivers (what) from 1:126,700 (scale) U.S. Forest Service (who) Visitor Maps (1961-1983) (when)

CC image by dolfi on Flickr
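
One way to make sure a title covers every component is to assemble it from named parts. The sketch below is illustrative only: `build_title` and its parameters are hypothetical, and the word order simply mirrors the Greater Yellowstone example.

```python
def build_title(where, what, scale, who, source, when):
    """Assemble a dataset title from its components, refusing to skip any of them."""
    parts = {"where": where, "what": what, "scale": scale,
             "who": who, "source": source, "when": when}
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Title is missing: {', '.join(missing)}")
    return f"{where} {what} from {scale} {who} {source} ({when})"


title = build_title(
    where="Greater Yellowstone",
    what="Rivers",
    scale="1:126,700",
    who="U.S. Forest Service",
    source="Visitor Maps",
    when="1961-1983",
)
```

A title of just "Rivers" would fail this check, because every component other than "what" is empty.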

Page 32: Tips for Writing Quality Metadata

• Be specific and quantify when you can! The goal of a metadata record is to give the user enough information to know if they can use the data without contacting the dataset owner.

Vague: We checked our work and it looks complete.

Specific: We checked our work using a random sample of 5 monitoring sites reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections.

CC image by PNASH on Flickr

Page 33: Tips for Writing Quality Metadata

• Use descriptive and clear writing
• Fully qualify geographic locations
• Select keywords wisely - use thesauri for keywords whenever possible
Example: USGS Biocomplexity Thesaurus (over 9,500 terms)

CC image by Marco Arment on Flickr

Page 34: Tips for Writing Quality Metadata

• Remember: a computer will read your metadata
• Do not use symbols that could be misinterpreted:
Examples: ! @ # % { } | / \ < > ~
• Do not use tabs, indents, or line feeds/carriage returns
• When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters

CC image by Ben on Google Images
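
Pasting through a text editor can also be approximated with a short script. The sketch below is one possible approach, not a required workflow: the function names are hypothetical, the symbol list is taken from the examples above, and "hidden characters" is interpreted as Unicode control and format characters.

```python
import re
import unicodedata

# Symbols listed on the slide as open to misinterpretation
RISKY_SYMBOLS = set("!@#%{}|/\\<>~")


def clean_metadata_text(text):
    """Collapse tabs and line breaks to single spaces and drop hidden characters."""
    # Any run of whitespace (tabs, line feeds, carriage returns) becomes one space
    text = re.sub(r"\s+", " ", text)
    # Drop control (Cc) and format (Cf) characters, e.g. a zero-width space
    text = "".join(ch for ch in text
                   if unicodedata.category(ch) not in ("Cc", "Cf"))
    return text.strip()


def find_risky_symbols(text):
    """Return the risky symbols present in the text, for manual review."""
    return sorted(set(text) & RISKY_SYMBOLS)


cleaned = clean_metadata_text("Stream\tflow\r\n data\u200b")
flagged = find_risky_symbols("95% <complete>")
```

The second function only reports symbols and does not remove them, since a percent sign or slash may be legitimate and is better rewritten by a person ("95 percent").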

Page 35: Tips for Writing Quality Metadata

• Fully define entities, attributes, units of measure
• Resist the temptation to fill in only the mandatory fields in the standard -- sections labeled “mandatory if applicable” or “optional” are often critical portions of the standard
– Example:

Seven Major Metadata Sections:
Section 1 - Identification Information*
Section 2 - Data Quality Information
Section 3 - Spatial Data Organization Information
Section 4 - Spatial Reference Information
Section 5 - Entity and Attribute Information
Section 6 - Distribution Information
Section 7 - Metadata Information*

Three Supporting Sections:
Section 8 - Citation Information*
Section 9 - Time Period Information*
Section 10 - Contact Information*

* Minimum required metadata
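
A quick completeness check can show which sections of a record are still empty. This is a minimal sketch under stated assumptions: a record is represented simply as the set of section numbers that have been filled in, `review_sections` is a hypothetical helper, and the section names follow the FGDC CSDGM as listed above.

```python
FGDC_SECTIONS = {
    1: "Identification Information",
    2: "Data Quality Information",
    3: "Spatial Data Organization Information",
    4: "Spatial Reference Information",
    5: "Entity and Attribute Information",
    6: "Distribution Information",
    7: "Metadata Information",
    8: "Citation Information",
    9: "Time Period Information",
    10: "Contact Information",
}
# Sections marked with an asterisk on the slide
MINIMUM_REQUIRED = {1, 7, 8, 9, 10}


def review_sections(filled_sections):
    """Return (missing required, missing other) section names for a record."""
    filled = set(filled_sections)
    missing_required = [FGDC_SECTIONS[n]
                        for n in sorted(MINIMUM_REQUIRED - filled)]
    missing_other = [FGDC_SECTIONS[n]
                     for n in sorted(set(FGDC_SECTIONS) - MINIMUM_REQUIRED - filled)]
    return missing_required, missing_other


# A record with only the minimum required sections filled in
required_gaps, other_gaps = review_sections({1, 7, 8, 9, 10})
```

Here the record passes the minimum, yet the second list still names Data Quality, Entity and Attribute, and the other sections that users often need most.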

Page 36: Share Your Metadata: Distribution

• Share your metadata with other researchers
Examples of metadata search portals:

– DAAC
• Distributed Active Archive Center for Biogeochemical Dynamics
http://daac.ornl.gov/index.shtml
– Data.gov
• Federal e-gov geospatial data portal
http://www.geo.data.gov
– Metacat
• Repository for data and metadata
http://knb.ecoinformatics.org/index.jsp
– DataONE
• NSF-funded data infrastructure
http://dataone.org

Page 37: DAAC Search

Page 38: Summary

• Metadata is documentation of data
• A metadata record captures critical information about the content of a dataset
• Metadata allows data to be discovered, accessed, and re-used
• A metadata standard provides structure and consistency to data documentation
• Standards and tools vary – select according to defined criteria such as data type, organizational guidance, and available resources
• Metadata is of critical importance to data developers, data users, and organizations
• Writing quality metadata is important because records are expected to last with the data over decades
• Metadata completes a dataset.

Creating robust metadata is in your OWN best interest!