Best Practices to Promote Data Interoperability Chris Lynnes Joe Glassy Technology Infusion Working Group

Dec 18, 2015


Nathan Warren
Page 1

Best Practices to Promote Data Interoperability

Chris Lynnes, Joe Glassy

Technology Infusion Working Group

Page 2

Outline

• Data interoperability: what and why?
• Factors affecting data interoperability
• Implementations that support interoperability

Page 3

What is Data Interoperability?

Data interoperability exists when a data user is able to work with (view, analyze, process, etc.) a data provider's science data or model output “transparently,” without having to reformat the data, write special tools to read or extract the data, or rely on specific proprietary software.

Quicker data usability, easier portability, more transparency – S. Volz

Page 4

Illustration: Panoply

DATASET COMPARISON
• North American Regional Reanalysis (NARR) from NCDC
• Atmospheric Infrared Sounder (AIRS) from GES DISC

PROCEDURE
1. Cut and paste NARR OPeNDAP URL
2. Double-click variable to display
3. Repeat for AIRS

Page 5

What good is data interoperability?

• Makes it easier to write tools that work with many datasets...

• ...Which increases the ability to work with multiple datasets together...

• ...And promotes user satisfaction and good early experiences with {your|my|our} data...

• ...Which enhances a dataset’s life-cycle economics.

Page 6

FACTORS AFFECTING DATA INTEROPERABILITY

There is no single path to interoperability…

Page 7

File Formats

• Standard formats
– More economical to develop general tools
– Format is well documented
– APIs* exist
– Many datasets enabled by one set of code modules
• “Self-describing” formats
– Contain embedded metadata to interpret the content, context, and/or structure of the file

*Application Programming Interfaces

Page 8

File Structures

• Coordinates: where and named how?
– Latitude, longitude
– Vertical dimension: altitude, pressure, sigma level, depth, ...
– Time
• Flat vs. hierarchical
• Simple vs. complex

Page 9

Usage Metadata

• Inside file vs. separate file
– Easy for users to lose a separate file
– A key benefit of self-describing formats
• Variable-level metadata
– Units
– Fill Value
– Scale / offset
• File-level metadata
• Standards (e.g., CF-1, HDF-EOS, ISO 19115)
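
To make the variable-level items concrete, here is a minimal sketch of how a reader might apply scale, offset, and fill value to stored numbers. The attribute names (`scale_factor`, `add_offset`, `_FillValue`) follow the netCDF/CF naming convention; the attribute values and raw numbers are made up for illustration, and other formats may name these attributes differently.

```python
from typing import Optional


def unpack_value(raw: int,
                 scale_factor: float = 1.0,
                 add_offset: float = 0.0,
                 fill_value: Optional[int] = None) -> Optional[float]:
    """Convert a stored (packed) value to a physical value.

    Returns None when the raw value equals the fill value (missing data).
    """
    if fill_value is not None and raw == fill_value:
        return None
    return raw * scale_factor + add_offset


# Hypothetical variable-level metadata, as it might be embedded in a file.
attrs = {"units": "K", "_FillValue": -9999,
         "scale_factor": 0.01, "add_offset": 273.15}
raw_values = [0, 150, -9999, -500]

physical = [
    unpack_value(v, attrs["scale_factor"], attrs["add_offset"], attrs["_FillValue"])
    for v in raw_values
]
```

Without the embedded attributes, a user would have no way to tell that -9999 means "missing" or that the stored integers are hundredths of a kelvin above 273.15.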

Page 10

Grids

• Common grids enable dataset comparison, merging, etc.

• Reprojection from one grid to another usually loses information

• Tradeoff
– Most appropriate grid for a dataset vs. ...
– ...most commonly used grid in the “community”
– Keep in mind that the potential community may be much broader than you think
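
The information-loss point can be illustrated with a toy one-dimensional example. This is only a stand-in for real reprojection (the helper names `coarsen` and `refine` and the sample values are hypothetical): averaging onto a coarser grid and then going back does not recover the original values.

```python
def coarsen(values, factor):
    """Average consecutive groups of `factor` cells (fine grid -> coarse grid)."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values), factor)]


def refine(values, factor):
    """Replicate each coarse cell `factor` times (coarse grid -> fine grid)."""
    return [v for v in values for _ in range(factor)]


fine = [1.0, 3.0, 2.0, 6.0]          # original data on the fine grid
round_trip = refine(coarsen(fine, 2), 2)

# The mean is preserved, but the cell-to-cell detail is gone.
lost_detail = round_trip != fine
```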

Page 11

Names and Units

• Variable names
– Standard names (CF-1)
– Unique names within file
  • Some tools have difficulty with hierarchies having variables with the same name in different branches
• Dimension / coordinate names
– Latitude, longitude, time, altitude/pressure
• Unit names
– Standard units
– Unit conversion
  • Note that altitude <-> pressure requires additional information
• Filenames
– Descriptive filenames: dataset, version, data date/time…
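
As an illustration of why altitude <-> pressure is not a plain unit conversion, the sketch below uses a simple isothermal-atmosphere approximation. The model choice, the function names, and the default surface pressure and scale height are all assumptions made for this example; the point is that the answer depends on information that is not in the unit string.

```python
import math


def pressure_from_altitude(altitude_m,
                           surface_pressure_hpa=1013.25,
                           scale_height_m=7500.0):
    """Pressure (hPa) at an altitude, assuming an isothermal atmosphere."""
    return surface_pressure_hpa * math.exp(-altitude_m / scale_height_m)


def altitude_from_pressure(pressure_hpa,
                           surface_pressure_hpa=1013.25,
                           scale_height_m=7500.0):
    """Inverse of pressure_from_altitude under the same assumptions."""
    return -scale_height_m * math.log(pressure_hpa / surface_pressure_hpa)


# The same altitude maps to different pressures under different assumed
# scale heights, so the extra information must come from somewhere.
p_cold = pressure_from_altitude(5000.0, scale_height_m=7000.0)
p_warm = pressure_from_altitude(5000.0, scale_height_m=8000.0)
```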

Page 12

Sidebar: Data Identifiers

• Filenames, even descriptive ones, may not be completely reliable as unique identifiers
• Identifiers are ideally embedded within the data file
• Uniquely identifying datasets and data files helps:
– Catalog interoperability
– Transparency / provenance
– Citation metrics
• See Ruth Duerr’s talk on recommendations for unique identifiers for datasets and granules
• Future tools may make use of these embedded identifiers: look up references, get related data...

Page 13

IMPLEMENTATIONS OF DATA INTEROPERABILITY

Page 14

CF-1

• Climate and Forecast (CF) convention
– Popular in modeling community
– Extending to point and satellite data
• Coordinate system: Key for tool usage
– Latitude + longitude
  • Specifications for both regular L3 grids and L2 swaths
– Time, vertical
– Recognizable via units (e.g. “degrees_north”)
• Standard variable names: Key for model incorporation
• Most often associated with netCDF
– Also applicable in OPeNDAP
– Work is underway to apply to HDF5
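
The "recognizable via units" idea can be sketched as a small lookup that a generic tool might perform. The function name `coordinate_type` is hypothetical, and the time pattern is deliberately simplified; the latitude and longitude unit spellings are the ones the CF convention accepts. Vertical coordinates are left out because identifying them needs more than the unit string.

```python
import re

# Unit spellings that CF accepts for latitude and longitude coordinates.
LATITUDE_UNITS = {"degrees_north", "degree_north", "degree_N",
                  "degrees_N", "degreeN", "degreesN"}
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degree_E",
                   "degrees_E", "degreeE", "degreesE"}
# Simplified pattern for time units such as "days since 1970-01-01".
TIME_UNITS = re.compile(r"^\s*(seconds?|minutes?|hours?|days?)\s+since\s+\S+",
                        re.IGNORECASE)


def coordinate_type(units):
    """Guess the coordinate type from a CF-style units string, or None."""
    if units in LATITUDE_UNITS:
        return "latitude"
    if units in LONGITUDE_UNITS:
        return "longitude"
    if TIME_UNITS.match(units):
        return "time"
    return None


guesses = {u: coordinate_type(u)
           for u in ["degrees_north", "degrees_east", "days since 1970-01-01", "K"]}
```

Because the check relies only on the units attribute, a tool can find the coordinate variables regardless of what the data provider chose to name them.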

Page 15

OPeNDAP

• Open-Source Project for a Network Data Access Protocol
• Client-Server framework
– Standard web (GET) request syntax
• Remote fine-grained access to data files
• Presents a standard data model and “format” to clients
• Supports multiple formats on the back end
– HDF, netCDF, ASCII, GRIB, binary
• Multiple server implementations
– Hyrax, THREDDS, ERDDAP, GDS, Dapper, PyDAP, TSDS...
• Client support in many tools
– IDV, McIDAS-V, GrADS, Matlab, IDL, Ferret, Panoply

Page 16

Web Coverage Service

• Client-Server framework
– Open Geospatial Consortium protocol
– Standard web (GET) request syntax
• Multiple response formats, including GeoTIFF, netCDF/CF-1 and HDF-EOS
• Includes spatial subsetting
• BUT:
– Client support is still nascent outside GIS community
– Some datatypes are difficult or impossible to fit into WCS (e.g., limb-scanning profiles)
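
A rough sketch of what the GET request syntax with spatial subsetting might look like, using WCS 1.0.0 key-value parameters. The endpoint, coverage name, and bounding box below are placeholders, not a real service, and a given server may require a different version or parameter set.

```python
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint, coverage, bbox, width, height,
                        fmt="GeoTIFF", crs="EPSG:4326"):
    """Build a WCS 1.0.0 GetCoverage URL for a bounding-box subset.

    bbox is (min_lon, min_lat, max_lon, max_lat) in the given CRS.
    """
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    # Keep ":" and "," unescaped so the URL stays readable.
    return f"{endpoint}?{urlencode(params, safe=':,')}"


# Hypothetical endpoint and coverage name, for illustration only.
url = wcs_getcoverage_url("http://example.com/wcs", "my_coverage",
                          (-130.0, 20.0, -60.0, 55.0), 700, 350)
```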

Page 17

Semantic Web

• Enables machine recognition of:
– names
– relationships
• Effective for:
– Metadata
– Small ASCII data
• Use of semantic web to make Earth Science data interoperable is still in its experimental phase

Page 18

Data Tools for Use with Interoperable Data

• Panoply
– http://www.giss.nasa.gov/tools/panoply/
• IDV
– http://www.unidata.ucar.edu/software/idv/
• McIDAS-V
– http://www.ssec.wisc.edu/mcidas/software/v/
• GrADS
– http://www.iges.org/grads/
• Ferret
– http://ferret.wrc.noaa.gov/Ferret/

Page 19

Summary

• Data users benefit from data interoperability
– More tools available to handle more datasets
• Consider format, structure, grids, metadata and naming
• If interoperability cannot be built in at data production, some tools (OPeNDAP, WCS, semantic web) can compensate...
• ...IF the metadata and information content of the data are sufficient

Page 20

BACKUP SLIDES

Page 21

References

• Practical Data Interoperability for Earth Scientists
  http://www.esdswg.org/techinfusion/downloads/pdies/view
• Recommendations for Data Level Interoperability
  http://tiwg.wik.is/Interoperability/Interoperability_Recommendations
• HDF
  http://www.hdfgroup.org/
• HDF-EOS
  http://hdfeos.org/
• netCDF
  http://www.unidata.ucar.edu/software/netcdf/
• OPeNDAP
  http://www.opendap.org
• CF-1
  http://cf-pcmdi.llnl.gov/
• Web Coverage Service
  http://en.wikipedia.org/wiki/Web_Coverage_Service

Page 22

OPeNDAP URL examples

• Get metadata in XML:
  http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ddx
• Get data slice in ASCII:
  http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ascii?H2OMMRStd[0:1:44][0:1:29][4:1:5]
• Data access URL for clients (IDV, Panoply):
  http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf
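
The three URL forms above share one pattern: the dataset URL, an optional response suffix, and an optional constraint of the form variable[start:stride:stop]. A minimal sketch of assembling them follows; the helper name `opendap_url` is hypothetical, and the code only builds strings, it does not contact the server.

```python
def opendap_url(dataset_url, response="", variable=None, slices=()):
    """Assemble an OPeNDAP request URL.

    response: suffix such as "ddx" or "ascii"; empty for the plain access URL.
    slices:   one (start, stride, stop) tuple per dimension of `variable`.
    """
    url = dataset_url + (f".{response}" if response else "")
    if variable:
        hyperslab = "".join(f"[{start}:{stride}:{stop}]"
                            for start, stride, stop in slices)
        url += f"?{variable}{hyperslab}"
    return url


# Dataset URL taken from the examples above.
BASE = ("http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/"
        "2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf")

metadata_url = opendap_url(BASE, "ddx")
slice_url = opendap_url(BASE, "ascii", "H2OMMRStd",
                        [(0, 1, 44), (0, 1, 29), (4, 1, 5)])
client_url = opendap_url(BASE)
```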