GRAD 521, Research Data Management Winter 2014 – Lecture 9 Amanda L. Whitmire, Asst. Professor.
Post on 28-Dec-2015
Data documentation through metadata
Lesson topics
1. Definition of metadata
2. Examine information included in a metadata record
3. Examples of metadata standards and how to choose
4. Illustrate the value of metadata to data users, data providers, and organizations
5. Describe the utility of metadata for a variety of scenarios beyond discovery
The data lifecycle
Data collection
CC image by Justin See on Flickr
CC image by CIMMYT on Flickr
CC image by acordova on Flickr
CC image by kukkurovaca on Flickr
CC image by SEDAC on Flickr
CC image by ISAS on Flickr
From field notes to datasets
Average temperature of observation for each species
Species | Average Temperature | Temperature Standard Deviation | Number of Observations | Minimum Temperature | Maximum Temperature
Northern Red-legged Frog | 4.4 | --- | 1 | 4.4 | 4.4
Tailed Frog | 7.0 | 3.0 | 3 | 4 | 10
Arizona Toad | 10.0 | --- | 1 | 10 | 10
Strecker's Chorus Frog | 10.5 | 2.0 | 11 | 9 | 16
Oregon Spotted Frog | 11.0 | 15.5 | 2 | 0 | 22
New Jersey Chorus Frog | 11.5 | 4.5 | 17 | 3 | 22
Wood Frog | 12.5 | 5.5 | 897 | 0 | 28.8
Spring Peeper | 13.2 | 5.6 | 569 | -1 | 32
Red-legged Frog | 13.3 | 5.9 | 16 | 4 | 27
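One way to get from field notes to a summary like the one above is sketched below in Python. This is only an illustration: the raw observation values are hypothetical, chosen to be consistent with the Tailed Frog and Arizona Toad rows, and the function and key names are assumptions rather than anything used in the course.

```python
import statistics

def summarize(temps):
    """Summarize the temperature observations recorded for one species."""
    n = len(temps)
    return {
        "average": round(statistics.mean(temps), 1),
        # The sample standard deviation is undefined for a single
        # observation, which is why the table shows "---" in those rows.
        "std_dev": round(statistics.stdev(temps), 1) if n > 1 else None,
        "n": n,
        "min": min(temps),
        "max": max(temps),
    }

# Hypothetical field-note values (not the real observations).
field_notes = {
    "Tailed Frog": [4, 7, 10],
    "Arizona Toad": [10],
}

for species, temps in field_notes.items():
    print(species, summarize(temps))
```

Without documentation, a reader of the summary cannot tell which units were used or how the standard deviation was computed; that is the kind of detail metadata is meant to carry.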
From datasets to published papers
CC image by Heather Kennedy on Flickr
Working with data
When you provide data to someone else, what types of information would you want to include with the data?
When you receive a dataset from an external source, what types of details do you want to know about the data?
Working with data
Providing data:
o Why were the data created?
o What limitations, if any, do the data have?
o What do the data mean?
o How should the data be cited if they are re-used in a new study?
Receiving data:
o What are the data gaps?
o What processes were used for creating the data?
o Are there any fees associated with the data?
o In what scale were the data created?
o What do the values in the tables mean?
o What software do I need in order to read the data?
o What projection are the data in?
o Can I give these data to someone else?
What is metadata?
“Data about data”
“Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.”
NISO, Understanding Metadata
Metadata
“The metadata accompanying your data should be written for a user 20 years into the future -- what does that person need to know to use your data properly? Prepare the metadata for a user who is unfamiliar with your project, methods, or observations.”
Oak Ridge National Laboratory Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC)
What is metadata?
o WHO created the data?
o WHAT is the content of the data?
o WHEN were the data created?
o WHERE is it geographically?
o HOW were the data developed?
o WHY were the data developed?
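The six questions map naturally onto the fields of a simple record. The following is a minimal Python sketch, not a formal metadata standard: the class, the field names, and the example values are all hypothetical, and the helper only reports which questions are still unanswered.

```python
from dataclasses import dataclass, fields

@dataclass
class MetadataRecord:
    """Hypothetical record with one field per question (not a standard)."""
    who: str = ""
    what: str = ""
    when: str = ""
    where: str = ""
    how: str = ""
    why: str = ""

def unanswered(record):
    """Return the names of the questions the record leaves blank."""
    return [f.name for f in fields(record)
            if not getattr(record, f.name).strip()]

# Illustrative values only, invented for this sketch.
record = MetadataRecord(
    who="J. Doe, Example University",
    what="Air temperature at each frog observation",
    when="2013-03-01 to 2013-06-30",
    where="Western Oregon",
)

print(unanswered(record))
```

Here the record still lacks the HOW and WHY, which are often the hardest details to reconstruct later.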
Photo by Michelle Chang. All Rights Reserved
Metadata is: Data ‘reporting’
Levels of metadata
PROJECT LEVEL: Descriptive information
DATA LEVEL: Granular information
Metadata in real life
You use it all the time…
Metadata standards
Dublin Core (DC), Darwin Core (DwC), EML, DDI, NBII, FGDC/CSDGM, ISO 19139, ISO 19115, DIF, LDIF, e-GMS, AGLS, METS, MODS, PREMIS, OAI-PMH, MARC, CDWA, CIDOC/CRM, DACS, DIG35, GILS, GML, ISBD, LCSH, KML, MARCXML, MEI, MODS, MIX, OAIS, ANSI/NISO Z39.88, PB Core, PRISM, QDC, RDF, SGML, VSO, XML, XMP
What is a metadata standard?
A standard provides a structure to describe data with:
o Common terms to allow consistency between records
o Common definitions for easier interpretation
o Common language for ease of communication
o Common structure to quickly locate information
In search and retrieval, standards provide:
o Documentation structure in a reliable and predictable format for computer interpretation
o A uniform summary description of the dataset
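To make "a reliable and predictable format for computer interpretation" concrete, here is a small Python sketch that writes a record using element names from the Dublin Core element set. It is an illustration under assumptions: the wrapper element name `metadata`, the helper function, and the creator, date, and subject values are hypothetical, and a real repository would usually prescribe its own schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(values):
    """Build a flat XML record with one dc:* element per supplied value."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for term, text in values.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{term}")
        child.text = text
    return root

record = dublin_core_record({
    "title": "Average temperature of observation for each species",
    "creator": "Example Researcher",        # hypothetical
    "date": "2014",                         # hypothetical
    "subject": "amphibians; temperature",   # hypothetical
})

print(ET.tostring(record, encoding="unicode"))
```

Because every record uses the same element names, a program can locate the title or creator of any dataset without knowing anything else about it.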
CC image by ccarlstead on Flickr
What does a metadata record look like?
o Ocean Currents and Biogeochemistry: Nearshore Water Profiles (Monthly CTD and Chemistry; SBC-LTER) [web link]
o New York City Community Health Survey, 2009 (ICPSR) [web link]
o Mountain hemlock tree-ring width chronologies from the western Oregon Cascade Mountains (USFS Research Data Archive) [web link]
Muddiest point…
What did you find unclear about the concept of metadata?
Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describe the data.
Concerns about creating metadata
Concerns about creating metadata
Concern | Solution
workload required to capture accurate, robust metadata | incorporate metadata creation into data development process – distribute the effort
time and resources to create, manage, and maintain metadata | include in grant budget and schedule
readability / usability of metadata | use a standardized metadata format
discipline-specific information and ontologies | ‘profile’ standard to require specific information and use specific values
The value of metadata
Metadata helps…
o Data creators
o Data users
o Organizations
What is the value to data creators?
Metadata allows data creators to:
o Avoid data duplication
o Share reliable information
o Publicize efforts – promote the work of a scientist and his/her contributions to a field of study
CC image by US Embassy Guyana on Flickr
What is the value to data users?
Metadata gives a user the ability to:
o Search, retrieve, and evaluate dataset information from both inside and outside an organization
o Find data: Determine what data exists for a geographic location and/or topic
o Determine applicability: Decide if a dataset meets a particular need
o Discover how to acquire the dataset you identified; process and use the dataset
CC image by ASEE on Flickr
What is the value to organizations?
Metadata helps ensure an organization’s investment in data
o Documentation of data processing steps, quality control, definitions, data uses, and restrictions
o Ability to use data after initial intended purpose
Transcends people & time
o Offers data permanence
o Creates institutional memory
Advertises an organization’s research
o Creates possible new partnerships and collaborations through data sharing
Information Entropy
[Figure: DATA DETAILS (vertical axis) versus TIME (horizontal axis); from Michener et al. 1997]
o Time of data development
o Specific details about problems with individual items or specific dates are lost relatively rapidly
o General details about datasets are lost through time
o Accident or technology change may make data unusable
o Retirement or career change makes access to “mental storage” difficult or unlikely
o Loss of data developer leads to loss of remaining information
Information Entropy
[Figure: DATA DETAILS (vertical axis) versus TIME (horizontal axis)]
Sound information management, including metadata development, can arrest the loss of dataset detail.
A closer look: the utility of metadata
Metadata can support:
o data distribution
o data management
o [project management]
If it is:
o considered a component of the data
o created during data development
o populated with rich content
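[Figure: project workflow diagram with stages labeled collect, derive, classify, planimetric imagery, analysis, alternative, committee review, charette, and PLAN; "meta" labels mark metadata attached along the workflow]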
Data distribution via metadata
o metadata publication
o data portals
o data discovery
Distribution: data discovery
The descriptive content of the metadata file can be used to identify, assess, and access available data resources.
IDENTIFY
• keywords
• geographic location
• time period
• attributes
ASSESS
• use constraints
• access constraints
• data quality
• availability/pricing
ACCESS
• online access
• order process
• contacts
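The IDENTIFY step can be sketched as a filter over a metadata catalog. The Python below is a minimal illustration, not how any particular portal works: the catalog entries, their field names, and all coordinates and dates are hypothetical.

```python
from datetime import date

# Hypothetical catalog entries; bbox is (west, south, east, north) in degrees.
catalog = [
    {"title": "Coastal water profiles (example)",
     "keywords": {"ocean", "ctd", "chemistry"},
     "bbox": (-120.5, 34.0, -119.5, 34.5),
     "start": date(2000, 1, 1), "end": date(2012, 12, 31)},
    {"title": "Tree-ring chronologies (example)",
     "keywords": {"tree rings", "hemlock"},
     "bbox": (-122.5, 43.5, -121.5, 45.0),
     "start": date(1600, 1, 1), "end": date(2005, 12, 31)},
]

def overlaps(a, b):
    """True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def identify(records, keyword=None, bbox=None, period=None):
    """Return titles of records matching every criterion that was given."""
    hits = []
    for r in records:
        if keyword and keyword.lower() not in r["keywords"]:
            continue
        if bbox and not overlaps(r["bbox"], bbox):
            continue
        # period is a (start, end) pair of dates; keep overlapping ranges
        if period and not (r["start"] <= period[1] and period[0] <= r["end"]):
            continue
        hits.append(r["title"])
    return hits

print(identify(catalog, keyword="ctd"))
print(identify(catalog, bbox=(-123.0, 44.0, -122.0, 44.5)))
```

None of these searches is possible unless the keywords, geographic extent, and time period were recorded in the metadata in the first place.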
Distribution: metadata publication
A metadata collection can be published to the internet via:
o website catalog
o web accessible folder (WAF)
o Z39.50 metadata clearinghouse
o metadata service
o geospatial data portal
[Figure labels: User Query, Internet, Metadata Collection, Internet / Intranet, Dataset]
Distribution: data portals
Examples of metadata search portals:
o Data.gov: Federal e-gov geospatial data portal, http://www.geo.data.gov
o Metacat: Repository for data and metadata, http://knb.ecoinformatics.org/index.jsp
o US Geological Survey: USGS Core Science Metadata Clearinghouse, http://mercury.ornl.gov/clearinghouse
o ICPSR: Political and Social Science data portal
Data management via metadata
o Data Accountability
o Discovery & Re-use
o Maintenance & Update
o Data Liability
Management: maintenance & update
Metadata records can be used to track data provenance and accuracy
Data Maintenance:
• Are the data current?
o Do we have data older than ten years?
o Were the data created before some political or geophysical event that resulted in significant change?
• Are the data valid?
o Were they created prior to the most current source data?
o Were they created prior to the most current methodologies?
Data Update:
• Contact information
• Distribution policies, availability, pricing, URLs
• New derivations of the dataset
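The "are the data current?" questions can be answered automatically when the creation date is part of the metadata. The Python sketch below is one hedged way to do it: the record layout, the function name, and the example dates are assumptions, and "older than ten years" is interpreted here as at least ten full years.

```python
from datetime import date

def needs_review(record, today, max_age_years=10, event_date=None):
    """Return the reasons a dataset may be out of date, per its metadata."""
    reasons = []
    created = record["created"]
    # Number of full years elapsed since the data were created
    age = today.year - created.year - (
        (today.month, today.day) < (created.month, created.day)
    )
    if age >= max_age_years:
        reasons.append(f"data are {age} years old")
    # Optional: a known event that changed conditions on the ground
    if event_date is not None and created < event_date:
        reasons.append("data predate a significant event")
    return reasons

# Hypothetical records and dates, for illustration only.
old = {"title": "Example survey A", "created": date(2003, 5, 1)}
new = {"title": "Example survey B", "created": date(2012, 6, 15)}
today = date(2014, 2, 1)
event = date(2010, 1, 1)

print(needs_review(old, today, event_date=event))
print(needs_review(new, today, event_date=event))
```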
Discovery: data reuse
If you create metadata, other people can discover your data
If you create metadata, you can find your own data
CC image by Oceanit Daily Photo on Flickr
Management: data discovery & reuse
Find your data by:
o themes / attributes
o geographic location
o time ranges
o analytical methods used
o sources & contributors
o data quality
Discoverable data is usable data!
CC image by NASA Goddard Space Flight Center on Flickr
Management: data accountability
Metadata allows you to repeat a scientific process if:
o methodologies are defined
o variables are defined
o analytical parameters are defined
Metadata allows you to defend your scientific process:
o demonstrate process
o increasingly GIS-savvy public requires metadata for consumer information
[Figure labels: INPUT, RESULTS]
Management: data accountability
Metadata is an exercise in data accountability. It requires you to assess:
o What do you know about the dataset?
o What don’t you know about the dataset?
o What should you know about the dataset?
Are you willing to associate yourself with the metadata record?
Management: data liability
Metadata is a declaration of:
Purpose
o the originator’s intended application of the data
Use Constraints
o inappropriate applications of the data
Completeness
o features or geographies excluded from the data
Distribution Liability
o explicit liability of the data producer and assumed liability of the consumer
What to do…
What not to do…
Review: the utility of metadata
Metadata can support:
Data distribution
o discovery
o metadata publication
o data portals
Data management
o maintenance & update
o discovery & reuse
o data accountability
o data liability
[Project management]
Choosing Metadata Standards
Image courtesy of Viv Hutchinson
Darwin Core | biological diversity, taxonomy
Dublin Core | general
DDI (Data Documentation Initiative) | social & behavioral sci.
DIF (Directory Interchange Format) | environmental sci.
EML (Ecological Metadata Language) | ecology, biology
ISO 19115 | geographic data
Multiple standards exist
Browse by discipline: http://www.dcc.ac.uk/resources/metadata-standards
Comparing metadata standards
EML | FGDC
Title | Title
Abstract | Abstract
Entity Description | Entity Type Definition
Intellectual Rights | Use Constraints
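Because the two standards collect similar information, a record in one can often be translated to the other with a simple lookup, sometimes called a crosswalk. The Python sketch below uses only the four field pairs from the comparison above. The labels are simplified strings rather than the standards' actual XML element paths, and the function and example record are hypothetical.

```python
# Field pairs taken from the EML / FGDC comparison; labels are simplified.
EML_TO_FGDC = {
    "Title": "Title",
    "Abstract": "Abstract",
    "Entity Description": "Entity Type Definition",
    "Intellectual Rights": "Use Constraints",
}

def crosswalk(eml_record):
    """Rename EML-style fields to FGDC-style ones; report what has no match."""
    converted, unmapped = {}, []
    for field, value in eml_record.items():
        target = EML_TO_FGDC.get(field)
        if target is None:
            unmapped.append(field)
        else:
            converted[target] = value
    return converted, unmapped

# Hypothetical record, for illustration only.
eml_record = {
    "Title": "Example frog temperature dataset",
    "Intellectual Rights": "Cite the data creator when re-using",
    "Sampling Notes": "No counterpart in this small mapping",
}

fgdc_record, leftovers = crosswalk(eml_record)
print(fgdc_record)
print(leftovers)
```

Fields that do not map are reported rather than dropped, since losing information silently would defeat the purpose of documenting the data.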
Choosing a metadata standard
Many standards collect similar information
Factors to consider:
1. Your data type
• raster/vector GIS data, images, surveys/text, etc.
2. Organization [funder] policies
3. Future preservation/sharing location
4. Tools to support creation & distribution
5. Other factors: Availability of human support; instructional materials; use of controlled vocabularies; output formats
Summary
o Metadata is documentation of data
o A metadata record captures critical information about the content of a dataset
o Metadata allows data to be discovered, accessed, and re-used
o A metadata standard provides structure and consistency to data documentation
o Standards and tools vary – select according to defined criteria such as data type, organizational guidance, and available resources
o Metadata is of critical importance to data developers, data users, and organizations
o Metadata can be effectively used for:
• data distribution
• data management
• project management
o Metadata completes a dataset.
Creating robust metadata is in your OWN best interest!
On Thursday
Barnard Classroom, 5th Floor