The Research Data Archive at NCAR: A Metadata System that Enables Discovery Across a Diverse Archive
Robert Dattore and Steven Worley
National Center for Atmospheric Research, Boulder, CO, USA
01/25/2011, AMS 2011

Feb 22, 2016

Transcript
Page 1: Robert Dattore and Steven Worley National Center for Atmospheric Research Boulder, CO, USA

The Research Data Archive at NCAR: A Metadata System that Enables Discovery Across a Diverse Archive

Robert Dattore and Steven Worley
National Center for Atmospheric Research
Boulder, CO, USA

01/25/2011 AMS 2011 1

Page 2

Outline

o Introduction
o RDA - Then
o RDA - Now
o Data Discovery

Page 3

Introduction

o Purpose - support climate & weather research at NCAR; services are extended worldwide as resources permit
o Observations, derived products; focus on historical atmosphere/ocean data
o Metrics
  - Established in 1960s
  - 600+ datasets, 4M files, 600 TB
  - 7000 users annually

Page 4

Introduction

o Changing data landscape
  - Then – small datasets, single country/experiment, specialized formats
  - Now – global coverage, high spatial/temporal resolutions, standard formats
o Result and challenge:
  - Lots of diversity
  - How can we provide uniform discovery?

Page 5

Then

Page 6

Then

o Bottom line
  - Increasing data diversity, evolving technology; difficult to develop good systematic discovery
  - README files, directory names
  - Primarily via personal communications
o Major limiting factor – insufficient metadata
  - No metadata standard, dictionaries
  - Collection not uniform across all datasets
  - Rigidly-structured flat ASCII files
  - Archiving separate from metadata collection

Unscalable System!

Page 7

Now

Page 8

Now

o Developed local standard for discovery based on DIF[1] & THREDDS[2]; applied across all datasets
o Adopted GCMD[3] controlled vocabularies
  - Local enhancements; e.g. data formats
o Harvest two types of file metadata
  - File attribute - name, size, compression, ...
  - File content - variables, levels, date range, ...
o Storage using XML

[1] Directory Interchange Format, NASA/GCMD[3]; [2] Thematic Realtime Environmental Distributed Data Services; [3] Global Change Master Directory
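
The file attribute harvest and XML storage listed on this slide could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function names, the suffix-to-compression mapping, and the XML element names are all hypothetical and are not taken from the RDA tools.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical mapping from file suffix to a compression label.
COMPRESSION_BY_SUFFIX = {".gz": "gzip", ".bz2": "bzip2", ".Z": "compress"}


def harvest_file_attributes(path):
    """Collect file attribute metadata (name, size, compression) for one file."""
    suffix = os.path.splitext(path)[1]
    return {
        "name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "compression": COMPRESSION_BY_SUFFIX.get(suffix, "none"),
    }


def to_xml(attributes):
    """Serialize harvested attributes as a small XML record."""
    root = ET.Element("file")  # element names are illustrative only
    for key, value in attributes.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Calling `to_xml(harvest_file_attributes(path))` on an archived file yields one XML record per file. File content metadata (variables, levels, date range) would need a format-specific reader and is left out of this sketch.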

Page 9

Metadata Collection

Page 10

Metadata Collection

o Tools that automatically capture file metadata
  - Integrated with archiving activities
o Web-based GUI - guided entry of dataset discovery metadata
  - Required fields, constrained entries
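
"Required fields, constrained entries" can be pictured as a validation step applied before a dataset record is accepted. The sketch below is a minimal illustration, assuming a hypothetical set of required fields and a tiny stand-in vocabulary; the actual RDA fields and the GCMD keyword lists are not reproduced here.

```python
# Hypothetical required fields and controlled vocabulary (illustrative only).
REQUIRED_FIELDS = ("title", "summary", "data_format")
ALLOWED_VALUES = {"data_format": {"NetCDF", "GRIB", "ASCII"}}


def validate_record(record):
    """Return a list of problems found in a dataset discovery record."""
    problems = []
    # Required fields must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            problems.append(f"missing required field: {field}")
    # Constrained fields must use a value from the controlled vocabulary.
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"{field} not in controlled vocabulary: {value}")
    return problems
```

An empty result means the record passes; a GUI built on a check like this could refuse to save until the list is empty.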

Page 11

Relational Databases

Page 12

Relational Databases

o Fast access
o Dataset discovery metadata
  - Single database (~0.3M rows)
o File attribute metadata
  - Single database (~45M rows)
  - Maintains dataset/data file relationships
o File content metadata
  - Four databases structured to handle diversity of data (~920M rows)
  - Maintains detailed parameter relationships

All together, support accurate data discovery
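
To make the three metadata layers concrete, the sketch below models them as three tables in an in-memory SQLite database and runs one discovery query across them. The schema, table names, and sample rows are assumptions made for illustration; the slides do not describe the actual RDA schema or database engine.

```python
import sqlite3

# Hypothetical schema: one table per metadata layer named on the slide.
SCHEMA = """
CREATE TABLE datasets (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    dataset_id TEXT REFERENCES datasets(id),
    name TEXT,
    size_bytes INTEGER
);
CREATE TABLE file_content (
    file_id INTEGER REFERENCES files(id),
    parameter TEXT,
    start_date TEXT,  -- ISO dates compare correctly as strings
    end_date TEXT
);
"""


def build_example_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO datasets VALUES ('example', 'Example reanalysis')")
    conn.executemany(
        "INSERT INTO files VALUES (?, 'example', ?, ?)",
        [(1, "a_1990.grb", 1000), (2, "a_1991.grb", 1200)],
    )
    conn.executemany(
        "INSERT INTO file_content VALUES (?, ?, ?, ?)",
        [
            (1, "temperature", "1990-01-01", "1990-12-31"),
            (1, "wind", "1990-01-01", "1990-12-31"),
            (2, "temperature", "1991-01-01", "1991-12-31"),
        ],
    )
    return conn


def find_files(conn, parameter, start, end):
    """Return (dataset id, file name) pairs whose content overlaps the request."""
    rows = conn.execute(
        """
        SELECT d.id, f.name
        FROM file_content AS c
        JOIN files AS f ON f.id = c.file_id
        JOIN datasets AS d ON d.id = f.dataset_id
        WHERE c.parameter = ? AND c.start_date <= ? AND c.end_date >= ?
        ORDER BY f.name
        """,
        (parameter, end, start),
    )
    return rows.fetchall()
```

The join runs from file content through file attributes to the dataset, which is the relationship chain the slide says the databases maintain.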

Page 13

Data Discovery

Page 14

Data Discovery

o Dataset discovery
  - Google-like dataset search
  - “Look For Data” interface – user-defined dataset catalogs
  - Auto-generated dataset pages – always up-to-date
  - Collections – all reanalyses, upper air obs, surface obs

Page 15

Data Discovery

o Data file discovery
  - “Create Your Own List” for data file lists
    - Show specific files from terabyte-sized collections
o Other
  - “Station Viewer”
    - Google maps; see stations, metadata

Page 16

Metadata Sharing

o OAI-PMH
  - UCAR Community Data Portal (THREDDS)
  - Global Change Master Directory (DIF)
  - also Dublin Core, native
  - easy to add others as necessary
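
An OAI-PMH harvester asks for records in a given format by passing a `metadataPrefix` with the `ListRecords` verb. The sketch below only builds such a request URL. The base URL is a placeholder, since the slides do not give the archive's endpoint; `oai_dc` is the Dublin Core prefix that the OAI-PMH specification requires every repository to support, while any prefix for DIF, THREDDS, or the native format would be repository-specific and is not assumed here.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the archive's real OAI-PMH base URL is not given in the slides.
BASE_URL = "https://example.org/oai"


def list_records_url(metadata_prefix, base_url=BASE_URL, from_date=None):
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"
```

For example, `list_records_url("oai_dc", from_date="2011-01-25")` produces a request for Dublin Core records changed on or after that date.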

Page 17

Thank You!

Web: http://dss.ucar.edu
Email: [email protected]
Questions/comments?