Metadata RCN Workshop Samantha Romanello Long Term Ecological Research University of New Mexico

Dec 29, 2015


Bryan Wilcox
Page 1: Metadata RCN Workshop Samantha Romanello Long Term Ecological Research University of New Mexico.

Metadata RCN Workshop

Samantha Romanello
Long Term Ecological Research
University of New Mexico

Page 2

In this session we will discuss…

• Metadata: what are they? and why should they be created?
• Metadata standards: why do we need them?
• Metadata tools: what’s out there to help?
• Creating metadata: just how much work is this?
• Finding and evaluating metadata: what is good?
• Metadata resources: what’s out there?

Page 3

Metadata

what are they? and why should they be created?

Page 4

What are metadata?

“higher level information that describe the content, quality, structure, and accessibility of a specific data set” (Michener et al., 1997)

Metadata?

Page 5

Example

In front of you are two tuna-shaped cans. How do you decide which can you would like to eat?

Page 6

Metadata help you decide which can you would like to eat!

Page 7

Metadata are

• The label
• The information the label contains
• Our understanding of what a label is and the information it describes

Page 8

Metadata

• Provide the context of when, where, why, and how the data were collected
• They also provide the who – some insight into the analytical framework of the scientist who collected the data

Page 9

Metadata is all around…

Page 10

Data

072998 29.5 17.0
073098 29.7 6.1
073198 29.1 0

Page 11

Data -- Metadata

         Date    Temp (C)  Precip. (mm)
Obs. #1  072998  29.5      17.0
Obs. #2  073098  29.7      6.1
Obs. #3  073198  29.1      0
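
Once the column labels and units are known, a program can read the same three observations. The following is a minimal sketch in Python, not part of the original slides. It assumes the date column is MMDDYY (the slide does not state the format) and that whitespace separates the fields; the names RAW and parse_observation are illustrative only.

```python
from datetime import datetime

# The three raw lines from the slide.
RAW = ["072998 29.5 17.0", "073098 29.7 6.1", "073198 29.1 0"]


def parse_observation(line):
    """Turn one raw line into a labelled record.

    Assumption: the first field is a date written as MMDDYY.
    The units (C, mm) come from the column headers on the slide.
    """
    date_text, temp_text, precip_text = line.split()
    return {
        "date": datetime.strptime(date_text, "%m%d%y").date(),
        "temp_c": float(temp_text),
        "precip_mm": float(precip_text),
    }


observations = [parse_observation(line) for line in RAW]
for number, obs in enumerate(observations, start=1):
    print(f"Obs. #{number}: {obs['date'].isoformat()} "
          f"{obs['temp_c']} C, {obs['precip_mm']} mm")
```

Without the metadata, nothing in the raw lines tells the program which field is the temperature or how to read the date, which is the point of the slide.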

Page 12

www.utexas.edu/depts/grg/gcraft/notes/mapproj/gif/threepro.gif

Page 13

Value of Metadata

• Maintains internal investment in data
• Provides information to data catalogs and clearinghouses
• Promotes data sharing
• Leads to potential research partners (e.g., promotes data discovery)
• Clarifies semantics
• Enables machine-processing

Page 14

Metadata describe:

• Who?
• What?
• When?
• Where?
• How?

about every facet of the data!

Page 15

In this session we will discuss…

• Metadata: what are they? and why should they be created?
• Metadata standards: why do we need them?
• Metadata tools: what’s out there to help?
• Creating metadata: just how much work is this?
• Finding and evaluating metadata: what is good?
• Metadata resources: what’s out there?

Page 16

Metadata Standards

What are they and why do we need them?

Page 17

Why do we need Metadata Standards …

… to reduce “information entropy”

en·tro·py : a process of degradation or running down or a trend to disorder – Merriam-Webster

Page 18

Information Entropy over Time

[Figure, after Michener et al., 1997: information content (vertical axis) plotted against time (horizontal axis). Labels on the figure: time of publication; specific details; general details; retirement or career change; accident; death.]

Information usefulness at 10 years, 20 years, 30 years…

Page 19

Why are metadata standardized?

• To provide a common set of understandable terms to describe data;

• To facilitate entry and retrieval of metadata and data; and

• To create tools which can automate entry, search and integration of data

Page 20

Metadata Standardization

• Defines a common terminology
  – Allows for system “cross-walks”; that is, mapping one metadata structure to another
• Format and Structure
  – Binary (GeoTIFF header) … Text (XML)
  – Proprietary (MrSID) … Open (EML)
• Allows software engineers to automate
  – Entry
  – Searching
  – Integration
  – Synthesis
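
A cross-walk can be pictured as a lookup table from one set of field names to another. The sketch below is an illustration only, not part of the original slides: both vocabularies are invented and are not taken from Dublin Core, EML, or any other real specification, and apply_crosswalk is a hypothetical helper.

```python
# Hypothetical crosswalk: source field name -> target field name.
# Both vocabularies are invented for illustration.
CROSSWALK = {
    "author": "creator",
    "summary": "abstract",
    "collected_on": "temporal_coverage",
}


def apply_crosswalk(record, crosswalk):
    """Rename the fields of `record`; collect fields that have no mapping."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        if field in crosswalk:
            mapped[crosswalk[field]] = value
        else:
            unmapped.append(field)
    return mapped, unmapped


source_record = {
    "author": "Field crew A",
    "summary": "Daily temperature and precipitation",
    "plot_id": "P-07",
}
mapped, unmapped = apply_crosswalk(source_record, CROSSWALK)
print(mapped)
print(unmapped)
```

Reporting the unmapped fields separately matters in practice: a field with no counterpart in the target structure is information that would otherwise be lost silently in the conversion.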

Page 21

Metadata Content Specifications

• Dublin Core
• NBII (National Biological Information Infrastructure) Biological Data Profile / CSDGM (Content Standard for Digital Geospatial Metadata)
• ISO (International Organization for Standardization) CD 19115, Geographic information - metadata
• LTER Data Table of Contents
• Darwin Core
• Ecological Metadata Language (EML)

Page 22

Ecological Metadata Language

• Adopted by the LTER Information Management
• Metadata specification developed by the ecology discipline for the ecology discipline
• Based on prior work of Ecological Society of America and others (Michener et al., 1997)
• Seven years in development – 14 versions
  – EML 2.0.1
• Implemented as an XML Schema
• Supports four separate modules
  – Dataset
  – Citation
  – Software
  – Protocol

Page 23

5 Classes of ecological metadata descriptors

• Data set
• Research origin
• Data set status and accessibility
• Data structural
• Supplemental

Page 24

Metadata Descriptors

• What relevant data exist?
• Why were those data collected and are they suitable for a particular use?
• How can these data be obtained?
• How are the data organized and structured?
• What additional information is available that would facilitate data use and interpretation?

Page 25

XML: eXtensible Markup Language

• Development influenced by SGML and HTML – Version 1.0 in early 1998
• A semantic language that lets you annotate text more meaningfully (where HTML lets you define how text is displayed, XML gives it meaning).
• Important for presentation, exchange, and management of information
• Tools include DTD, Schema, XSL, and more…
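
To illustrate how XML markup gives text a meaning that software can act on, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not part of the original slides, and the element and attribute names (dataset, title, attribute, unit) are hypothetical; this is not EML or any other published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names chosen for illustration only.
DOCUMENT = """
<dataset>
  <title>Daily weather observations</title>
  <attribute name="temp" unit="celsius"/>
  <attribute name="precip" unit="millimeter"/>
</dataset>
"""

root = ET.fromstring(DOCUMENT)

# Because each piece of text is tagged, a program can ask for it by name.
title = root.findtext("title")
units = {a.get("name"): a.get("unit") for a in root.findall("attribute")}

print(title)
print(units)
```

The program never guesses which string is the title or which unit belongs to which variable; the tags say so, which is what makes automated entry, search, and integration feasible.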

Page 26

In this session we will discuss…

• Metadata: what are they? and why should they be created?
• Metadata standards: why do we need them?
• Metadata tools: what’s out there to help?
• Creating metadata: just how much work is this?
• Finding and evaluating metadata: what is good?
• Metadata resources: what’s out there?

Page 27

Metadata tools

what’s out there to help?

Page 28

A Smorgasbord of Metadata Tools

• Proprietary
• Non-proprietary
• On-line
• Standalone
• Windows
• ASCII
• Unix

Page 29

Tools for Managing Metadata

• Flat-file System
• Hybrid Flat-file System
• Relational Databases
  – Oracle, PostgreSQL, MySQL
• Hybrid Relational Databases
  – Metacat, Digital Library eXtension Service
• Hierarchical Databases
  – Adabas, IMS
• Object-Relational Databases
  – Birdstep, XDb, JADE

Page 30

Metacat Data Repository

Page 31

Tools for Creating Metadata

• Text editors
  – Notepad (Windows)
  – Emacs, vi (UNIX, Linux, …)
  – XML Specific (XMLSpy, oXygen, …)
• Custom software
  – NBII Metamaker
  – ESRI ArcCatalog
  – ecoinformatics.org Morpho

Page 32

ESRI ArcCatalog Metadata

• Metadata properties: Derived from the data itself and automatically created by ArcCatalog

• Documentation: Written by a person

• Metadata editor enforces Federal Geographic Data Committee (FGDC) standards

• Stored in XML format within the geodatabase

• Automatically exports/transfers with coverages

Page 33

ESRI ArcCatalog Metadata

Attribute Metadata:

Properties…automatic
Documentation…input

Page 34

ESRI ArcCatalog Metadata

Catalog list

Metadata Tab

Metadata Sections

Metadata creation/import selections

Metadata Parts

Page 35

ESRI ArcCatalog Metadata

Location Metadata

Page 36

ESRI ArcCatalog Metadata

Import a template or existing file

Edit screen

Page 37

Morpho

• Metadata editor enforces EML 2.0 standards
• Stored in XML format within Metacat server
• Automatically imports Metadata and data
• 5 Classes of Metadata Descriptors
  – Data set descriptors
  – Research origin descriptors
  – Data set status and accessibility
  – Data structural descriptors
  – Supplemental descriptors

Page 38

Morpho

• Create & Edit Metadata

• Search & Query Metadata Collections

Page 39

Data Set Descriptors

• Identity
• Identification code
• Originator

Page 40

Data Set Descriptors

• Abstract
• Keywords
• Project description

Page 41

Research Origin Descriptors

• Research methods
• Experimental or sampling design

Page 42

Data Set Status & Accessibility

• Where data is located
• Who is the contact person
• How to access and use the data

Page 43

Data Set Status & Accessibility

Dates when the data were accessed & modified

Page 44

Data Structural Descriptors

• Variable information
• Units of measurement

Page 45

Data Structural Descriptors

• Data type

• Data format

• Data anomalies

Page 46

Supplemental Descriptors

• Data acquisition
• Quality assurance
• Supplemental materials
• Computer programs
• Archival info
• Publications
• History of usage

Page 47

In this session we will discuss…

• Metadata: what are they? and why should they be created?
• Metadata standards: why do we need them?
• Metadata tools: what’s out there to help?
• Creating metadata: just how much work is this?
• Finding and evaluating metadata: what is good?
• Metadata resources: what’s out there?

Page 48

Creating Metadata

Just how much work is this?

Page 49

How much work is this going to be???

Page 50

Benefits of using Metadata

• Reduced information entropy, increased data longevity

• Data reuse and sharing

– Even original researchers need refreshing

• System interoperability

• Broad-based data synthesis

• Compliance with funding agencies

Page 51

Canadian Information Management Resource Centre

Page 52

Variable Levels of Metadata Content & Structure Necessary for Specific Objectives (Michener 2000)

[Figure: objectives plotted on two axes, Content and Structure, each running from low to high. Labels on the figure: personal, reuse, expert colleague, publication, resampling, 3rd party exchange, indexing, interoperability.]

Page 53

Rules of Thumb (Michener 2000)

• The more comprehensive the metadata, the greater the longevity (& value) of the data
• Structured metadata can greatly facilitate data discovery, encourage “best metadata practices” & support data & metadata use by others
• Metadata implementation takes time!!!
• Start implementing metadata for new data collection efforts and then prioritize “legacy” & ongoing data sets that are of greatest benefit to the broadest user community

Page 54

The price you have to pay…

• Personnel costs

– time for learning and training

– additional effort

• Media for metadata storage

• Hardware/software requirements

• Long-term stewardship and curation

Page 55

Make metadata implementation a team effort

• Team Leader
• GIS Specialist
• Field Personnel
• Database Manager
• Laboratory Specialist
• Voucher/Repository Specialist
• And others as appropriate…

Page 56

The Planning Process

• Develop a community-based plan
  – Legacy conversion
  – Present to future data entry
  – Metadata storage strategy
• Determine infrastructure needs
  – Personnel
  – Hardware
  – Software
• Reuse standards where possible
• Be flexible when necessary

Page 57

Long Term Ecological Research Network

• 26 U.S. LTER sites
  – 20 Continental/Coastal United States
  – 2 Alaska
  – 1 Puerto Rico
  – 2 Antarctica
  – 1 French Polynesia
• Funding Agency – NSF
• 20+ years of research
• IM IMExec CC

Page 58

LTER Tiered Trajectory for Metadata

Future Outcome
• Discovery: Semantic-based discovery through machine-based searches
• Access: Data access through a knowledge-based query process
• Usability: Semi-automated knowledge extraction

Tier 3
• Discovery: Discovery-enabling metadata structured in EML; data discovery integrated across network
• Access: Access-enabling metadata structured in EML; data access is integrated across network
• Usability: Complete validated EML; data analysis is integrated across the network

Tier 2
• Discovery: Online, enhanced metadata with consistent internal structure; data discovery through machine search
• Access: Automated access to data; access to site data and metadata does not require human intervention
• Usability: Structured, comprehensive metadata and data; data use does not require human intervention

Tier 1
• Discovery: Unstructured, online site catalog with minimal metadata; data discovery through manual searches
• Access: Establish data access policy; data and metadata access requires human intervention
• Usability: Unstructured, machine-readable metadata and data; data use requires human intervention

[Figure labels: Tiered Trajectory; axes: Metadata completeness, Metadata structure]

Page 59

LTER Best Practices

• Identification
• Discovery
• Evaluation
• Access
• Integration
• Semantic Use

Page 60

In this session we will discuss…

• Metadata: what are they? and why should they be created?
• Metadata standards: why do we need them?
• Metadata tools: what’s out there to help?
• Creating metadata: just how much work is this?
• Finding and evaluating metadata: what is good?
• Metadata resources: what’s out there?

Page 61

Finding & evaluating metadata

what is good?

Page 62

Finding Data & Metadata

• Colleagues
• Scientific literature
• WWW searches
• Data and metadata registries
  – Global Change Master Directory
• Metadata Clearinghouses
  – National Biological Information Infrastructure

Page 63

Finding Data & Metadata

Page 64

Finding Data & Metadata

Page 65

Finding Data & Metadata

You can perform a simple search by either clicking on a given keyword or typing in your own

Page 66

Finding Data & Metadata

Or, you can perform an advanced search

Page 67

Search Results

Page 68

Associated Metadata

• Data Set
• Data Table
• XML files

Page 69

In this session we will discuss…

• Metadata: what are they? and why should they be created?
• Metadata standards: why do we need them?
• Metadata tools: what’s out there to help?
• Creating metadata: just how much work is this?
• Finding and evaluating metadata: what is good?
• Metadata resources: what’s out there?

Page 70

Finding and evaluating metadata

what is good?

Page 71

Evaluating Metadata

Page 72

Page 73

Evaluation Service

http://knb.ecoinformatics.org/emlparser/index.html

Page 74

Evaluation (in short)

• Do the who? what? where? why? and how? of the data as documented in the metadata meet your particular needs?
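
A first-pass version of this check can be automated: before judging whether the answers fit your needs, confirm that each question has an answer at all. The sketch below is an illustration only, not part of the original slides. It assumes the metadata have already been read into a plain Python dictionary keyed by the five questions, and the sample record and the helper name unanswered are hypothetical.

```python
# The five questions from the slide.
QUESTIONS = ("who", "what", "where", "why", "how")


def unanswered(metadata):
    """Return the questions whose entry is missing or blank."""
    return [q for q in QUESTIONS if not str(metadata.get(q) or "").strip()]


# Hypothetical record: "where" is blank and "why" is absent.
record = {
    "who": "Field crew A",
    "what": "Daily temperature and precipitation",
    "where": "",
    "how": "Automated weather station",
}
missing = unanswered(record)
print(missing)
```

A check like this only finds gaps. Whether the documented answers suit a particular use still needs a human reader.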

Page 75

In this session we will discuss…

• Metadata: what are they? and why should they be created?
• Metadata standards: why do we need them?
• Metadata tools: what’s out there to help?
• Creating metadata: just how much work is this?
• Finding and evaluating metadata: what is good?
• Metadata resources: what’s out there?

Page 76

Metadata resources

what’s out there?

Page 77

Resources

• http://www.nbii.gov
• http://www.megrin.org/
• http://www.eurogi.org
• http://www.dlib.org
• http://knb.ecoinformatics.org

Page 78

Resources

• Michener, W.K. 2000. Metadata. In: Ecological Data: Design, Management and Processing. (eds. W.K. Michener & J.W. Brunt), pp. 92-116. Blackwell Science, Oxford, United Kingdom.

• Michener, W.K., J.W. Brunt, J.J. Helly, T.B. Kirchner, and S.G. Stafford. 1997. Nongeospatial metadata for the ecological sciences. Ecological Applications 7(1):330-342.