Fox

January 4, 2005

Project sponsors

Earth System Grid - DOE/SciDAC

Coupled Energetics and Dynamics of Atmospheric Regions - NSF/GEO/ATM

Virtual Solar-Terrestrial Observatory - NSF/CISE/SCI

Related DODS/OPeNDAP work - NASA and NCAR/HAO

Overview

Report on experience with data ‘systems’ and data ‘frameworks’:
• CEDARWEB
• Earth System Grid

Compare and contrast success in terms of use(rs)

Technology integration: when and how does it work and scale?

Outline a merged approach for a Virtual Observatory concept

CEDARWEB

CEDARWEB: heritage

CEDAR is a large scientific and technical community focusing on the Earth’s middle and upper atmosphere. The program features ground-based observing networks, models and integrative studies. It is funded by NSF and is in its third phase (third decade).

CEDAR data history: started as an incoherent scatter radar database in 1983, as a tape archive (with data back to 1966). Grew by the late ’80s, adding other instruments, models and indices. Went on-line in the early ’90s (becoming a single-tiered data system). Web access since 1996, with three versions of the interface.

Holdings: some satellite data, geophysical indices, models (GCM, empirical, tides, etc.), ISRs, HF Radars, Digisondes, FPIs, IR Michelson Interferometers, Spectrometers, Airglow Imagers, All-Sky Cameras, LIDARs, Multi-Channel Photometers, MST Radars, MF Radars, LF Radars, Meteor Wind Radars, Campaigns, Presentations, Surveys, Jobs, Workshops, etc.

Community: 600+, with 300+ registered users and ~100 active data users per year. NCAR is tasked with community support, especially (in the early days) to ‘take care’ of the data and work with data providers and users. Significant effort has gone into catalogs, metadata and controlled vocabulary. The system has labored to get past the code/mnemonic schemes of the past and the base data format.

CEDAR pre-web

Data query, selection and retrieval interface, without any integrated tools or ability to preview data before retrieving it.

CEDARWEB 2.0

CEDARWEB 3.x

Data query, selection and retrieval interface, with integrated tools, e.g. ability to plot (preview) data before retrieving it.

CEDARWEB - OPeNDAP

CEDARWEB 3.1

Ability to quickly plot data to assess suitability and quality, and to produce a quick copy with some customization for a preliminary study.

Experience: CEDARWEB

Don’t just provide data, but also build in community information and ancillary information that is of value.

Inside CEDARWEB

• Rich metadata, categorized
• OPeNDAP for data access and transport
• MySQL for catalog and user records
• HTTPS and cookies for session authentication
• Script-enabled interface with plotting built in (ION), delivering HTML to browsers
• ‘Hides’ the organizational data record structure (sort of)
• Low-level data products, but also high-level ones
• Disconnect between delivery of data and attributes

Today: framework is inside the data system!
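To make the OPeNDAP piece concrete: a DAP2 client asks the server for a subset by appending a constraint expression to the dataset URL, so only the requested index ranges leave the server. The sketch below builds such a request with Python's standard library. The dataset URL, the variable name `TEMP` and the helper names are hypothetical placeholders, not CEDARWEB's actual holdings or code.

```python
from urllib.parse import quote


def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range; DAP2 writes it as [start:stride:stop], stop inclusive."""
    if stride == 1:
        return f"[{start}:{stop}]"
    return f"[{start}:{stride}:{stop}]"


def dap_subset_url(dataset_url, variable, ranges, response="dods"):
    """Build a DAP2 request that asks the server to subset `variable` before sending it.

    ranges   -- one (start, stop) or (start, stop, stride) tuple per dimension
    response -- DAP2 response suffix, e.g. "dds" (structure), "das" (attributes), "dods" (data)
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    # Keep the DAP2 bracket/colon syntax readable; percent-encode anything else.
    return f"{dataset_url}.{response}?{quote(constraint, safe='[]:')}"


# Hypothetical dataset URL and variable name, for illustration only.
url = dap_subset_url("http://example.org/opendap/isr_2004.nc", "TEMP", [(0, 9), (0, 99, 10)])
print(url)  # http://example.org/opendap/isr_2004.nc.dods?TEMP[0:9][0:10:99]
```

Requesting the `dds` or `das` suffix instead returns the dataset's structure or attributes, which a portal can use to describe data before anyone downloads it.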

Experience: CEDARWEB

CEDARWEB has been developed and improved over more than 10 years of interaction with users, data providers, and a community steering committee. Each of these elements has directly contributed to changes in what services are provided, what information and materials are made available via the web site and what levels of authorization and authentication are required.

Biggest lesson: the systems approach has worked because of the heritage of the data collection, but users (especially new or very experienced ones) see a barrier to entry and don’t understand where the system starts and stops.

http://cedarweb.hao.ucar.edu

Earth System Grid Overview

The goal of ESG is to make climate data, particularly climate model data, an easily accessible community resource. The project is funded by the SciDAC program: Scientific Discovery through Advanced Computing.

Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical. The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.

Multiple interfaces to ESG will allow researchers to focus on science rather than on issues of data transfer, format, and data set manipulation.

The foundation is Globus Grid technology.

ESG: U.S. Collaborations & Development

ORNL: Climate storage & computational resources

LANL: Next-generation coupled models & computing

ANL: Computational grids & grid-based applications

USC/ISI: Computational grids & grid-based applications

NCAR: Climate change prediction and scenarios

LBNL: Climate storage facility

LLNL: Model diagnostics & inter-comparison

ESG leverages existing software and projects

DODS/OPeNDAP: Distributed Oceanographic Data System (Unidata); integration of Globus GridFTP with DODS data access

THREDDS: THematic Real‑time Environmental Distributed Data Services (Unidata)

LAS: Live Access Server (NOAA Pacific Marine Environmental Laboratory); works with CDAT, Ferret, GrADS, …

CDAT: Climate Data Analysis Tools (PCMDI), including CDMS (Climate Data Management System) and VCDAT visualization

Community Data Portal project (NCAR)

NCL (NCAR)

Globus Grid technology (ANL, ISI): GridFTP, CAS, Community Access Portal

ESG: Requirements & Priority Matrix

D  A  U   Service
          ESG services
H  H  H     Framework
L  L  H     Automatic installation
          Distributed computing
H  H  M     Authorization & authentication
H  H  L     Registration
L  L  M     Event services
L  L  L     Task management
L  H  H     Logging services
          Data systems
M  H  H     Search and discovery
L  H  H     Data movement (transport)
H  H  M     Metadata framework
M  L  H     Collaboratories
          Tools
M  M  H     Analysis
L  L  H     Visualization
M  M  H     Collaboration

D = ESG Developer, A = ESG Administrator, U = ESG User; L = LOW, M = MEDIUM, H = HIGH

ESG: ESG-II Architecture

[Figure: The Earth System Grid, ESG-II architecture diagram. Sites: NCAR, LBNL, LLNL, ISI, ANL, ORNL. Security services: GSI, CAS server and clients, MyProxy client and server. Metadata services: RLS with MySQL, THREDDS catalogs, Xindice, OGSA-DAI, MCS, auth metadata. Transport services: GridFTP servers/clients, HRM, OPeNDAP-g servers. Analysis & viz services: NCL and CDAT as OPeNDAP-g clients, LAS server. Monitoring services: SLAMON daemons. Data storage: NERSC HPSS, NCAR MSS, ORNL HPSS, disk. Also shown: framework services, GRAM, Tomcat, Axis.]

Earth System Grid Portal

Community Data Portal

[Screenshot callouts: free text search, applications, Live Access, news, authentication, THREDDS catalog]

LAS/CDAT: Example of a Web-based Data Portal

Technology: Web-based (end-user requirements): LAS, DODS, ESG (i.e., Globus), CDAT

The portal should hide/simplify the Grid for users:
• Single sign-on
• Community-based authorization
• Simplified resource location
• Remote job submission and management

Accesses the ESG Grid testbed

ESG: Example of a Web-based Data Portal (serving 40+ simulations: AMIP, CMIP, and PCM)

ESG: Example of a Client Application

Metadata-centric view of ESG services

[Figure: metadata services at the center, linked to the services that depend on them (user authentication and authorization; data transport; system monitoring and control; data search & discovery; data analysis & visualization; data browsing) and to the metadata categories they use (access and authorization, location, logging, content, annotation & history, aggregation, cataloguing).]

ESG Metadata Services Architecture

3-layer architecture:
• Metadata Holdings: physical metadata content, stored in a system of relational and/or native XML databases
• Core Metadata Services: modules and libraries that mediate all access to the Metadata Holdings (insert, update, delete, query), exposing an API that hides the specific implementation of the databases and query languages
• High Level Metadata Services: a system of applications that use the Core Metadata Services to fulfill a specific atomic functionality; these are invoked by external clients
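A minimal sketch of the three layers, assuming a plain Python dictionary in place of the relational/XML databases: the abstract class plays the role of the Core Metadata Services API (insert, update, delete, query), the in-memory class stands in for the Metadata Holdings, and `find_datasets` is a high-level service that only ever talks to the core API. All class and function names, and the sample records, are hypothetical.

```python
from abc import ABC, abstractmethod


class MetadataStore(ABC):
    """Core Metadata Services API: clients never see which database sits underneath."""

    @abstractmethod
    def insert(self, record_id, record): ...

    @abstractmethod
    def update(self, record_id, fields): ...

    @abstractmethod
    def delete(self, record_id): ...

    @abstractmethod
    def query(self, **criteria): ...


class InMemoryStore(MetadataStore):
    """Stand-in for the Metadata Holdings; a SQL or native XML database could replace it."""

    def __init__(self):
        self._records = {}

    def insert(self, record_id, record):
        if record_id in self._records:
            raise KeyError(f"duplicate record id: {record_id}")
        self._records[record_id] = dict(record)

    def update(self, record_id, fields):
        self._records[record_id].update(fields)

    def delete(self, record_id):
        del self._records[record_id]

    def query(self, **criteria):
        # Return ids of records whose fields match every criterion exactly.
        return sorted(
            rid for rid, rec in self._records.items()
            if all(rec.get(k) == v for k, v in criteria.items())
        )


def find_datasets(store: MetadataStore, **criteria):
    """High-level 'search and discovery' service built only on the core API."""
    return store.query(**criteria)


store = InMemoryStore()
store.insert("pcm.run1", {"model": "PCM", "variable": "tas"})
store.insert("ccsm.run2", {"model": "CCSM", "variable": "tas"})
print(find_datasets(store, variable="tas"))  # ['ccsm.run2', 'pcm.run1']
```

Swapping the backend means writing another `MetadataStore` subclass; `find_datasets` and any other high-level service stay unchanged, which is the point of putting the API in the middle layer.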

[Figure: ESG metadata services architecture. Layers: ESG clients API & user interfaces; High Level Metadata Services; Core Metadata Services, including metadata access (update, insert, delete, query) and a service translation library; Metadata Holdings (Replica Location Services, Metadata Cataloguing Services, XML DB, THREDDS catalogs). Service boxes: metadata extraction, display, browsing, search/query & discovery, annotation, validation, aggregation, conversion, metadata & data registration, publishing. Use areas: search & discovery, administration, browsing & display, analysis & visualization.]

ESG Metadata Services Goal Functionality

Services responsible for the creation, management and utilization of metadata associated with geophysical data

Functionality:
• Metadata extraction (automatically, from files in different formats and according to various possible metadata standards)
• Metadata conversion (from one standard to another)
• Metadata aggregation (associated with data collections)
• Metadata annotation (manually, by humans)
• Metadata validation (basic quality control of metadata)
• Registration (population of metadata holdings)
• Harvesting (combination of metadata from different repositories)
• Metadata browsing and display (for humans)
• Search and discovery of data through metadata
• Metadata query (by agents or clients for data analysis and visualization)
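Three of these functions (conversion, validation and harvesting) can be illustrated on plain dictionaries. The sketch assumes metadata records are flat key/value maps; the crosswalk table, the list of required fields and the sample records are hypothetical and do not correspond to any particular metadata standard.

```python
# Hypothetical crosswalk from one provider's attribute names to a target convention.
CROSSWALK = {"long_name": "title", "units": "units", "standard_name": "parameter"}
REQUIRED = ("title", "units")


def convert(record, crosswalk=CROSSWALK):
    """Metadata conversion: rename keys per the crosswalk; unmapped keys pass through."""
    return {crosswalk.get(key, key): value for key, value in record.items()}


def validate(record, required=REQUIRED):
    """Metadata validation: basic quality control, returning a list of problems."""
    return [f"missing {key}" for key in required if not record.get(key)]


def harvest(*repositories):
    """Harvesting: merge records by dataset id; earlier repositories win on conflicts."""
    merged = {}
    for repo in repositories:
        for dataset_id, record in repo.items():
            target = merged.setdefault(dataset_id, {})
            for key, value in record.items():
                target.setdefault(key, value)  # fill gaps, never overwrite
    return merged


record = convert({"long_name": "Surface air temperature", "units": "K"})
print(record)            # {'title': 'Surface air temperature', 'units': 'K'}
print(validate(record))  # []
```

The "earlier repository wins" rule in `harvest` is one arbitrary choice; a real harvester would need an explicit policy for conflicting values.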

ESG Metadata Services Current Development

Currently in production:
• Replica Location Services: database to manage and index multiple copies of the same data stored at different centers
• Metadata Cataloguing Services: relational database to store scientific metadata (developed for high energy physics and geophysical data)
• Native XML (**) and SQL databases
• THREDDS (by Unidata): system for hierarchical cataloguing of datasets and associated metadata (http://www.unidata.ucar.edu/projects/THREDDS)
• NcML (NetCDF Markup Language): XML language for encoding metadata associated with data in netCDF format (and more…)
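The core idea behind a replica location service is an index from one logical file name to the physical copies held at different centers, plus a rule for picking one. The toy class below shows that idea only; it is not the Globus RLS API, and the class name, the host-preference rule and the example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy logical-name -> physical-replica index (not the Globus RLS API)."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record that a copy of `logical_name` exists at `physical_url`."""
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """All known copies, in registration order."""
        return list(self._replicas.get(logical_name, []))

    def choose(self, logical_name, preferred_hosts):
        """First replica on the most preferred host; otherwise any replica; None if unknown."""
        replicas = self.lookup(logical_name)
        for host in preferred_hosts:
            for url in replicas:
                if urlparse(url).hostname == host:
                    return url
        return replicas[0] if replicas else None


catalog = ReplicaCatalog()
catalog.register("pcm/run1/tas.nc", "gsiftp://hpss.example.gov/pcm/run1/tas.nc")
catalog.register("pcm/run1/tas.nc", "http://data.example.org/pcm/run1/tas.nc")
print(catalog.choose("pcm/run1/tas.nc", ["data.example.org"]))  # http://data.example.org/pcm/run1/tas.nc
```

Users and portals work with the logical name; which center actually serves the bytes becomes a separate, replaceable decision.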

ESG Metadata Policy

Premise: the geophysical sciences are too broad and complex to impose a single, all-encompassing metadata standard that captures the relevant information for all datasets, projects, instruments and scientists.

ESG will not mandate use of any metadata schema or convention. It allows data providers and scientists to use their metadata of choice, and provides technologies and tools to store and access metadata through common services (MCS, XML DB, THREDDS catalogs).

ESG encourages development and reuse of a limited set of domain-specific standards (climate data, radar data, airborne instrumentation, etc.), encoding in XML (according to community-developed schemas), and interoperability and combination of schemas (XML namespaces and RDF-based ontologies, developed but not used).
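The policy relies on XML namespaces to let records mix terms from different community schemas without collisions. The sketch below uses Python's `xml.etree.ElementTree` to write and read one record that combines two vocabularies; both namespace URIs and the element names are hypothetical stand-ins for real community schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs standing in for two community schemas.
CLIMATE_NS = "http://example.org/schemas/climate"
RADAR_NS = "http://example.org/schemas/radar"
NAMESPACES = {"clim": CLIMATE_NS, "radar": RADAR_NS}


def build_record():
    """Write one dataset record that mixes elements from both vocabularies."""
    for prefix, uri in NAMESPACES.items():
        ET.register_namespace(prefix, uri)
    root = ET.Element("dataset", {"id": "demo-001"})
    ET.SubElement(root, f"{{{CLIMATE_NS}}}variable").text = "air_temperature"
    ET.SubElement(root, f"{{{RADAR_NS}}}instrument").text = "incoherent scatter radar"
    return root


def read_fields(root):
    """Read each field by its namespace-qualified name, so equal local names cannot collide."""
    return {
        "variable": root.findtext("clim:variable", namespaces=NAMESPACES),
        "instrument": root.findtext("radar:instrument", namespaces=NAMESPACES),
    }


xml_text = ET.tostring(build_record(), encoding="unicode")
print(xml_text)
print(read_fields(ET.fromstring(xml_text)))
```

Each community keeps control of its own schema; a consumer only needs to know the namespace URIs it cares about.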

OPeNDAP for ESG II

Since ~1995, DODS has been based on an HTTP and CGI-style architecture

Two concerns:
• Application support and performance of HTTP
• Housekeeping abilities of the CGI architecture

Solution: evolve OPeNDAP, the discipline-neutral aspect of DODS

OPeNDAP ctd.

Data transport protocol and access protocol separated

Revised server architecture:
• Address Grid-style authentication
• Memory management
• Exception handling
• Make all these changes while retaining interoperation with HTTP and CGI

Advanced requirements: a URL should support more than one dataset, or object, i.e. aggregation
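Aggregation means one request can stand for several physical datasets. As an assumption-laden sketch, the function below joins granules (for example monthly files) along a shared time axis and rejects overlaps; the `Granule` type, the join-along-time rule and the sample values are illustrative choices, not how OPeNDAP-g or NcML implement aggregation internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Granule:
    """One physical dataset holding a contiguous block of time steps (hypothetical)."""
    url: str
    times: List[float]
    values: List[float]


def aggregate(granules):
    """Join granules along the time axis into one logical series, in time order."""
    times, values = [], []
    for granule in sorted(granules, key=lambda g: g.times[0]):
        if times and granule.times[0] <= times[-1]:
            raise ValueError(f"time axis of {granule.url} overlaps the previous granule")
        times.extend(granule.times)
        values.extend(granule.values)
    return times, values


jan = Granule("http://example.org/data/jan.nc", [0.0, 1.0], [271.2, 270.8])
feb = Granule("http://example.org/data/feb.nc", [2.0, 3.0], [272.5, 273.1])
print(aggregate([feb, jan]))  # ([0.0, 1.0, 2.0, 3.0], [271.2, 270.8, 272.5, 273.1])
```

The caller passes the granules in any order and receives one series, which is the user-facing effect the "more than one dataset per URL" requirement is after.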

OPeNDAP 3.x vs OPeNDAP-g Architecture

OPeNDAP 3.x:
• Simple and easy to install
• One CGI process per URL request
• Limited memory management (external)
• Limited scalability
• Limited status reporting to web server
• Returns data stream from one format

OPeNDAP-g:
• Standalone server or httpd module
• Can manage multiple daemon processes
• Strong memory management (internal)
• Reuses processes, scales
• Coupled to OPeNDAP server for status
• Returns multiple formats in a single stream, multiple protocols
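The key architectural difference in this comparison is process reuse. The sketch below contrasts the two styles using threads as a stand-in for processes: one fresh worker per request (CGI-style) versus a fixed pool of long-lived workers (daemon-style). The function names and the pool size are hypothetical, and a real server would use processes and real request handling.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(url):
    """Stand-in for serving one data request; reports which worker thread served it."""
    return url, threading.get_ident()


def serve_per_request(urls):
    """CGI-style: start (and tear down) a new worker for every request."""
    results = []
    for url in urls:
        box = {}
        worker = threading.Thread(target=lambda u=url: box.setdefault("r", handle_request(u)))
        worker.start()
        worker.join()
        results.append(box["r"])
    return results


def serve_pooled(urls, workers=2):
    """Daemon-style: a fixed pool of long-lived workers is reused across requests."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, urls))


requests = [f"/data/file{i}.nc" for i in range(8)]
pooled = serve_pooled(requests)
print(len({tid for _, tid in pooled}))  # never more than 2 distinct workers
```

Startup cost is paid once per pool worker rather than once per request, and the server keeps control over how many workers exist, which is where the scalability and memory-management gains come from.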

Application development

Status

• Refactor core classes to remove http/libwww, etc.
• Operational/production release of a standalone OPeNDAP server (no dependence on a web server)
• Multi-protocol support: file, HTTP, GridFTP, FTP, etc.
• Re-architected for aggregation support and performance
• Run the OPeNDAP server as a client to a GridFTP server
• Portal application client in production; prototype netCDF client operational
• Authentication is handled outside the OPeNDAP server
• URL syntax is more complex
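Multi-protocol support amounts to choosing a transport from the URL rather than hard-wiring HTTP. A minimal sketch, assuming a registry of handler functions keyed by URL scheme; the registry, the handler names and the GridFTP placeholder are hypothetical and say nothing about how the OPeNDAP-g server is actually organized.

```python
from pathlib import Path
from urllib.parse import urlparse


def read_local(url):
    """file:// URLs (or bare paths): read the bytes straight from disk."""
    return Path(urlparse(url).path).read_bytes()


def gridftp_placeholder(url):
    """Hypothetical hook where a GridFTP client would be called."""
    raise NotImplementedError(f"no GridFTP client configured for {url}")


# Hypothetical registry mapping URL schemes to transport handlers.
HANDLERS = {"file": read_local, "gsiftp": gridftp_placeholder}


def fetch(url, handlers=HANDLERS):
    """Pick a transport from the URL scheme, so callers need not know the protocol."""
    scheme = urlparse(url).scheme or "file"
    handler = handlers.get(scheme)
    if handler is None:
        raise ValueError(f"unsupported protocol: {scheme}")
    return handler(url)
```

Adding HTTP or FTP would mean registering another handler (for instance one built on `urllib.request.urlopen`) rather than changing `fetch`.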

ESG: Framework experience

ESG is a highly collaborative effort and will allow users to quickly access data storage facilities holding petabytes of raw or processed data in an application-independent manner.

Payoffs of this distributed collaborative infrastructure have included:
• Distributed data sharing: RLS works! SRM/HRM work! OPeNDAP-g works!
• Simplified discovery of climate data: the work on metadata paid off!
• Scalability?
• Large-scale climate data processing and analysis via a highly integrated portal
• Increased collaboration among climate research scientists: people use it!
• Aid in climate assessments and estimates of future climate variability and trends: IPCC!

Authentication and authorization have been a significant challenge:
• GSI to CAS
• MyProxy: session based, seems to work well, and is more compatible with heterogeneous framework services
• SAML is working for multi-file batch transfer

ESG: Framework experience

Privatization: the portal interface (and much of the holdings) are cloned; closed communities are breeding dead-end alley developments, e.g. delivering netCDF

Transport: GridFTP versus HTTP
• Server to server: very good performance, but depends on a very specific version of the GridFTP server (striped)
• Clients are not as capable due to the ‘weight’ of Globus, and revert to HTTP

Scalability and response times (data AND metadata): the framework architecture supports re-layering for tuning

Service monitoring to support the distributed collaborative infrastructure: lots of, or all, services are needed to really make a production environment work

Many Globus services are not used (GRIS, MDS, GIIS, …)

Feeling lucky? Try out ESG by visiting the website at http://www.earthsystemgrid.org
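The "revert to HTTP" experience with transports suggests a client pattern: try the preferred transport first and fall back when it is unavailable. A minimal sketch, assuming a caller-supplied `fetch` function and a list of candidate URLs for the same file; the URLs and the `fake_fetch` client in the example are hypothetical.

```python
def fetch_first_available(candidates, fetch):
    """Try candidate URLs in order (e.g. GridFTP, then HTTP); return (url, data) for the first success."""
    errors = []
    for url in candidates:
        try:
            return url, fetch(url)
        except (OSError, NotImplementedError) as exc:
            errors.append(f"{url}: {exc}")  # remember why, then try the next transport
    raise RuntimeError("all transports failed: " + "; ".join(errors))


def fake_fetch(url):
    """Hypothetical client without GridFTP support, as on a lightweight desktop."""
    if url.startswith("gsiftp://"):
        raise NotImplementedError("GridFTP client not installed")
    return b"contents of " + url.encode()


print(fetch_first_available(
    ["gsiftp://data.example.org/pcm/tas.nc", "http://data.example.org/pcm/tas.nc"],
    fake_fetch,
))
```

Servers can still use GridFTP among themselves for performance, while lightweight clients get a working path without installing the full Globus stack.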

Success?

Users are generally happy

Exploited new technology components

Integration: when and how does it work and scale?
• XML, SQL
• DODS, OPeNDAP and OPeNDAP-g
• Portals
• P2P: clients are not as ready as we think

Globus provides a suite of framework components; some are easier to integrate than others, and some just don’t fit our use cases and architecture

Data framework: e.g. OPeNDAP has been extremely successful

User needs

In discussions with data providers and users, the needs are clear:

"Fast access to ‘portable’ data, in a way that works with the tools we have; information must be easy to access, retrieve and work with."

Too often users (and data providers) have to deal with the organizational structure of the data sets, which varies significantly: data may be stored at one site in a small number of large files, while similar data may be stored at another site in a large number of relatively smaller files. There is an equally large problem with the range of metadata descriptions for the data. Users often want only subsets of the data and struggle to get them efficiently. One user expresses it as:

"(Please) solve the interface problem."
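One way to attack the file-organization part of the "interface problem" is to let users ask for a logical range (say, time steps 20 through 39) and have the system map it onto whatever files a site happens to use. The sketch below does that mapping; the `FileSpan` type, the paths, the site layouts and the half-open index convention are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSpan:
    path: str
    first_time: int  # index of the first time step stored in this file
    count: int       # number of time steps in this file


def locate(spans, start, stop):
    """Map the logical time-index range [start, stop) onto (path, local_start, local_stop) pieces."""
    pieces = []
    for span in sorted(spans, key=lambda s: s.first_time):
        lo = max(start, span.first_time)
        hi = min(stop, span.first_time + span.count)
        if lo < hi:  # this file overlaps the request
            pieces.append((span.path, lo - span.first_time, hi - span.first_time))
    return pieces


# Site A: one large file for the year; site B: one file per month (hypothetical layouts).
site_a = [FileSpan("a/1999.nc", 0, 365)]
site_b = [FileSpan("b/1999-01.nc", 0, 31), FileSpan("b/1999-02.nc", 31, 28), FileSpan("b/1999-03.nc", 59, 31)]
print(locate(site_a, 20, 40))  # [('a/1999.nc', 20, 40)]
print(locate(site_b, 20, 40))  # [('b/1999-01.nc', 20, 31), ('b/1999-02.nc', 0, 9)]
```

The user's request is identical for both sites; only the system needs to know how each site has split its files.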

Vision for building science cyberinfrastructure

Use case first, then requirements; then derive the architecture and choose technology components

Build a working system for users from the start

Get your funding source and community to commit to an evolving architecture

If you choose a major framework technology, e.g. Globus, OPeNDAP or THREDDS, partner with them

Data framework: e.g. OPeNDAP has been extremely successful

One paradigm

Goal: find the right balance of data/model holdings, portals and client software that a researcher can use without effort or interference, as if all the materials were available on his/her local computer.

E.g. the Virtual Solar-Terrestrial Observatory (VSTO) is proposed to be:
• a distributed, scalable education and research environment for searching, integrating, and analyzing observational, experimental and model databases in the fields of solar, solar-terrestrial and space physics

It comprises:
• a system-like framework which provides virtual access to specific data, model, tool and material archives containing items from a variety of space- and ground-based instruments and experiments, as well as individual and community modeling and software efforts bridging research and educational use

Virtual Observatory? Need better glue

• Basic problem: schemas are categorized rather than developed from an object model/class hierarchy, which significantly limits non-human use. However, they all form the basis for organizing catalog interfaces for all types of data, images, etc.

• This limits data systems that use frameworks and prevents frameworks from truly interoperating (SOAP and WSDL are only a start)

• Directories (e.g. NASA GCMD, the CEDAR catalog) and FITS (flat) keyword/value pairs are being turned into ontologies (SWEET, VSTO)

• Markup languages, e.g. ESML, SPDML, ESG/NcML, are excellent bases

• Evolve, recast and merge (where appropriate) using formal processes and tools, with the intended use in mind: interface specifications, reasoning, validation, etc., beyond the usual search and access
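The contrast between flat keyword catalogs and an object model can be shown in a few lines. Below, a flat catalog only matches exact keywords, while a catalog typed by a small class hierarchy can answer "all radars" through subclassing, the kind of inference an ontology makes available to software. The class names follow instrument types listed in CEDAR's holdings, but the hierarchy and the catalog entries are hypothetical and far simpler than SWEET or VSTO.

```python
class Instrument: ...
class Radar(Instrument): ...
class IncoherentScatterRadar(Radar): ...
class MSTRadar(Radar): ...
class OpticalInstrument(Instrument): ...
class FabryPerotInterferometer(OpticalInstrument): ...

# Flat catalog: each entry carries only a keyword string.
FLAT_CATALOG = {"isr_site_a": "ISR", "mst_site_b": "MST Radar", "fpi_site_c": "FPI"}

# Typed catalog: each entry points at a class in the hierarchy.
TYPED_CATALOG = {
    "isr_site_a": IncoherentScatterRadar,
    "mst_site_b": MSTRadar,
    "fpi_site_c": FabryPerotInterferometer,
}


def flat_search(keyword):
    """Exact keyword match: no notion that an ISR is a kind of radar."""
    return sorted(k for k, v in FLAT_CATALOG.items() if v == keyword)


def typed_search(cls):
    """Class-based match: every subclass of `cls` counts."""
    return sorted(k for k, v in TYPED_CATALOG.items() if issubclass(v, cls))


print(flat_search("Radar"))  # []
print(typed_search(Radar))   # ['isr_site_a', 'mst_site_b']
```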

Summary

Basic success in both data systems and data framework approaches

Satisfying user and sponsor needs (from ‘just’ to ‘outstanding’)

Experience with Globus ranges from very good to not ready for our needs

Experience with OPeNDAP is very good, especially with core services

Scalability and performance require an adaptable architecture, which system-level interfaces can still hide from the user

Challenge: to bring these attributes to a framework, i.e. one in which the user is more exposed

Interoperate, interoperate, interoperate - interface, interface, interface

User interfaces still require significant HCI efforts

Metadata services are extremely important