The Research Data Archive at NCAR: A Metadata System that Enables Discovery Across a Diverse Archive. Robert Dattore and Steven Worley, National Center for Atmospheric Research, Boulder, CO, USA. 01/25/2011, AMS 2011
Feb 22, 2016
The Research Data Archive at NCAR: A Metadata System
that Enables Discovery Across a Diverse Archive
Robert Dattore and Steven Worley
National Center for Atmospheric Research
Boulder, CO, USA
01/25/2011 AMS 2011 1
Outline
o Introduction
o RDA - Then
o RDA - Now
o Data Discovery
Introduction
o Purpose - support climate & weather research at NCAR; services are extended worldwide as resources permit
o Observations, derived products; focus on historical atmosphere/ocean data
o Metrics
  - Established in 1960s
  - 600+ datasets, 4M files, 600 TB
  - 7000 users annually
Introduction
o Changing data landscape
  - Then: small datasets, single country/experiment, specialized formats
  - Now: global coverage, high spatial/temporal resolutions, standard formats
o Result and challenge:
  - Lots of diversity
  - How can we provide uniform discovery?
Then
Then
o Bottom line
  - Increasing data diversity, evolving technology; difficult to develop good systematic discovery
  - README files, directory names
  - Primarily via personal communications
o Major limiting factor: insufficient metadata
  - No metadata standard, dictionaries
  - Collection not uniform across all datasets
  - Rigidly-structured flat ASCII files
  - Archiving separate from metadata collection
Unscalable System!
Now
Now
o Developed local standard for discovery based on DIF [1] & THREDDS [2]; applied across all datasets
o Adopted GCMD [3] controlled vocabularies
  - Local enhancements; e.g. data formats
o Harvest two types of file metadata
  - File attribute: name, size, compression, ...
  - File content: variables, levels, date range, ...
o Storage using XML

[1] Directory Interchange Format, NASA/GCMD [3]; [2] Thematic Real-time Environmental Distributed Data Services; [3] Global Change Master Directory
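
The file attribute harvest described above can be pictured with a small Python sketch. It is only an illustration, not the RDA's actual tooling: the element names (`file`, `name`, `size`, `compression`), the helper functions, and the gzip-only compression check are all assumptions made for this example.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def detect_compression(path):
    """Return 'gzip' if the file starts with the gzip magic bytes, else 'none'."""
    with open(path, "rb") as fh:
        return "gzip" if fh.read(2) == b"\x1f\x8b" else "none"


def file_attributes_to_xml(path):
    """Build a <file> element holding name, size in bytes, and compression.

    The element and child names are hypothetical, chosen for illustration.
    """
    elem = ET.Element("file")
    ET.SubElement(elem, "name").text = os.path.basename(path)
    ET.SubElement(elem, "size").text = str(os.path.getsize(path))
    ET.SubElement(elem, "compression").text = detect_compression(path)
    return elem


def demo():
    """Harvest attributes from a temporary gzip file and return the XML text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.dat.gz")
        with gzip.open(path, "wb") as fh:
            fh.write(b"example payload")
        return ET.tostring(file_attributes_to_xml(path), encoding="unicode")
```

Calling `demo()` returns a single XML string for the temporary file. File content metadata (variables, levels, date range) would need format-specific readers and is left out of this sketch.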
Metadata Collection
Metadata Collection
o Tools that automatically capture file metadata
  - Integrated with archiving activities
o Web-based GUI: guided entry of dataset discovery metadata
  - Required fields, constrained entries
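
As a rough illustration of "required fields, constrained entries", the following Python sketch checks a metadata entry before it is accepted. The field names and the list of allowed formats are hypothetical; the slides do not give the actual required fields or vocabulary terms.

```python
# Hypothetical required fields and controlled vocabulary, for illustration only.
REQUIRED_FIELDS = ("title", "summary", "data_format")
ALLOWED_FORMATS = {"NetCDF", "GRIB", "ASCII"}


def validate_entry(entry):
    """Return a list of problems found in a dataset metadata entry (empty if none)."""
    errors = []
    # Required fields must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not str(entry.get(field, "")).strip():
            errors.append(f"missing required field: {field}")
    # Constrained entry: the format must come from the controlled vocabulary.
    fmt = str(entry.get("data_format", "")).strip()
    if fmt and fmt not in ALLOWED_FORMATS:
        errors.append(f"data_format not in controlled vocabulary: {fmt}")
    return errors
```

A guided-entry form could run a check like this on submit and show the returned messages next to the offending fields.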
Relational Databases
Relational Databases
o Fast access
o Dataset discovery metadata
  - Single database (~0.3M rows)
o File attribute metadata
  - Single database (~45M rows)
  - Maintains dataset/data file relationships
o File content metadata
  - Four databases structured to handle diversity of data (~920M rows)
  - Maintains detailed parameter relationships
All together, support accurate data discovery
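
One way to picture how the three kinds of metadata combine for discovery is the SQLite sketch below. It is a toy model under stated assumptions: the slides describe separate databases, whereas this example uses three tables in one in-memory database, and every table name, column name, dataset ID, and file name is hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical dataset, file, and content tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE datasets (dsid TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE files (
            file_id INTEGER PRIMARY KEY,
            dsid TEXT REFERENCES datasets(dsid),
            name TEXT,
            size_bytes INTEGER);
        CREATE TABLE file_contents (
            file_id INTEGER REFERENCES files(file_id),
            variable TEXT,
            start_date TEXT,
            end_date TEXT);
    """)
    conn.execute("INSERT INTO datasets VALUES ('ds000.0', 'Example reanalysis')")
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", [
        (1, "ds000.0", "temp_1990.grb", 1000),
        (2, "ds000.0", "temp_2000.grb", 1200),
        (3, "ds000.0", "wind_2000.grb", 900),
    ])
    conn.executemany("INSERT INTO file_contents VALUES (?, ?, ?, ?)", [
        (1, "temperature", "1990-01-01", "1990-12-31"),
        (2, "temperature", "2000-01-01", "2000-12-31"),
        (3, "wind", "2000-01-01", "2000-12-31"),
    ])
    return conn


def find_files(conn, dsid, variable, start, end):
    """List files in a dataset holding `variable` with data overlapping [start, end].

    Dates are ISO 8601 strings, so plain string comparison orders them correctly.
    """
    rows = conn.execute(
        """
        SELECT f.name
        FROM files AS f
        JOIN file_contents AS c ON c.file_id = f.file_id
        WHERE f.dsid = ? AND c.variable = ?
          AND c.start_date <= ? AND c.end_date >= ?
        ORDER BY f.name
        """,
        (dsid, variable, end, start),
    )
    return [row[0] for row in rows]
```

For example, `find_files(build_demo_db(), "ds000.0", "temperature", "1995-01-01", "2005-01-01")` selects only the file whose date range overlaps the request, which is the kind of narrowing a file-list tool over a large collection would need.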
Data Discovery
Data Discovery
o Dataset discovery
  - Google-like dataset search
  - “Look For Data” interface: user-defined dataset catalogs
  - Auto-generated dataset pages: always up-to-date
  - Collections: all reanalyses, upper air obs, surface obs
Data Discovery
o Data file discovery
  - “Create Your Own List” for data file lists
    - Show specific files from terabyte-sized collections
o Other
  - “Station Viewer”
    - Google maps; see stations, metadata
Metadata Sharing
o OAI-PMH
  - UCAR Community Data Portal (THREDDS)
  - Global Change Master Directory (DIF)
  - also Dublin Core, native
  - easy to add others as necessary
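
For readers unfamiliar with OAI-PMH, a harvester asks a repository for records in a named metadata format using a plain HTTP query string. The sketch below builds such a `ListRecords` request. The base URL is a placeholder, not the RDA's endpoint, and `oai_dc` is the Dublin Core prefix defined by the protocol itself; the prefixes a given server uses for DIF, THREDDS, or its native format are server-specific and would have to be looked up with the `ListMetadataFormats` verb.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real repository URL is not given in the slides.
BASE_URL = "https://example.org/oai"


def list_records_url(metadata_prefix, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    When a resumption token is supplied it is sent on its own, because the
    protocol treats resumptionToken as an exclusive argument.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{BASE_URL}?{urlencode(params)}"
```

A harvester would fetch the first URL, then keep requesting with each returned resumption token until the server stops supplying one.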
Thank You!
Web: http://dss.ucar.edu
Email: [email protected]
Questions/comments?