Page 1: The Earth System Grid (ESG)

Supercomputing • Communications • Data

NCAR Scientific Computing Division

The Earth System Grid (ESG)

Don Middleton

On behalf of many project collaborators and a lot of great work!

NCAR Scientific Computing Division

Section Head, Visualization & Enabling Technologies

APAN eScience Workshop

January 27, 2005

Page 2: The Earth System Grid (ESG)

The ESG Collaboration

ORNL: Climate storage & computational resources

LANL: High-resolution ocean models & computing

USC/ISI: Computational grids & grid-based applications

NCAR: Climate change prediction and scenarios

LBNL: Climate storage facility

LLNL: Model diagnostics & inter-comparison

ANL: Computational grids & grid-based applications

Page 3: The Earth System Grid (ESG)

A Global Coupled Climate Model

Page 4: The Earth System Grid (ESG)

A Lot of Data: Simulation Dataset Sizes by Resolution

T42 CCSM (current, 280km): 7.5GB/yr; 100 years = 0.75TB for one run

T85 CCSM (140km): 29GB/yr; 100 years = 2.9TB for one run

T170 CCSM (70km): 110GB/yr; 100 years = 11TB for one run
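
The per-run totals above follow from multiplying the yearly output by the run length. The short Python sketch below, assuming decimal units (1 TB = 1000 GB) and the 100-year run length stated on the slide, reproduces them and also computes the ratio between successive resolutions; each halving of grid spacing multiplies the yearly volume by a little under four, roughly consistent with doubling the number of grid points in both horizontal directions.

```python
"""Reproduce the per-run dataset sizes quoted for CCSM at three resolutions."""

# Yearly output volumes (GB per simulated year), as quoted on the slide.
GB_PER_YEAR = {
    "T42 (280km)": 7.5,
    "T85 (140km)": 29.0,
    "T170 (70km)": 110.0,
}


def run_size_tb(gb_per_year: float, years: int = 100) -> float:
    """Total output of one run in TB, assuming decimal units (1 TB = 1000 GB)."""
    return gb_per_year * years / 1000.0


def growth_factors(rates: dict[str, float]) -> list[float]:
    """Ratio of each resolution's yearly volume to the next-coarser one."""
    values = list(rates.values())
    return [finer / coarser for coarser, finer in zip(values, values[1:])]


if __name__ == "__main__":
    for name, rate in GB_PER_YEAR.items():
        print(f"{name}: {run_size_tb(rate):.2f} TB per 100-year run")
    print("growth per halving of grid spacing:",
          [round(f, 2) for f in growth_factors(GB_PER_YEAR)])
```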

Page 5: The Earth System Grid (ESG)

CCM at T170 Resolution

Page 6: The Earth System Grid (ESG)

Advances at the Earth Simulator

ESC Climate Model at T1279 (approx. 10km)

Page 7: The Earth System Grid (ESG)

Longer-term Missions - Observation of Key Earth System Interactions

Terra

Aura

Aqua

Landsat 7

Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies

GRACE

PICASSO

CloudSat

QuikScat

EO-1

ICESat

Jason-1

SRTM

VCL

We Will Examine Practically Every Aspect of the Earth System from Space in This Decade

Triana

Courtesy of Tim Killeen, NCAR

Page 8: The Earth System Grid (ESG)

IPCC

Page 9: The Earth System Grid (ESG)

The ESG Collaboration

ORNL: Climate storage & computational resources

LANL: High-resolution ocean models & computing

USC/ISI: Computational grids & grid-based applications

NCAR: Climate change prediction and scenarios

LBNL: Climate storage facility

LLNL: Model diagnostics & inter-comparison

ANL: Computational grids & grid-based applications

Page 10: The Earth System Grid (ESG)

The Earth System Grid

U.S. DOE SciDAC-funded R&D effort: a “Collaboratory Pilot Project”

Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data

Build upon Globus Toolkit and DataGrid technologies and deploy

Potential broad application to other areas

http://www.earthsystemgrid.org

Page 11: The Earth System Grid (ESG)

ESG People

PIs: Ian Foster (ANL), Don Middleton (NCAR), Dean Williams (LLNL)

Team: Veronika Nefedova (ANL), Luca Cinquini (NCAR), Gary Strand (NCAR), Peter Fox (NCAR), Jose Garcia (NCAR), Rob Markel (NCAR), Bob Drach (LLNL), David Bernholdt (ORNL), Kasidit Chanchio (ORNL), Line Pouchard (ORNL), Carl Kesselman (ISI), Ann Chervenak (ISI), Arie Shoshani (LBNL), Alex Sim (LBNL)

Page 12: The Earth System Grid (ESG)

ESG: Challenges

Enabling the simulation and data management team

Enabling the core research community in analyzing and visualizing results

Enabling broad multidisciplinary communities to access simulation results

We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.

Page 13: The Earth System Grid (ESG)

ESG: Strategies

Keep track of what we have, particularly what’s on deep storage: Metadata and Replica Catalogs

Move data a minimal amount and keep it close to its computational point of origin when possible: Data access protocols, distributed analysis

When we must move data, do it fast and with a minimum amount of human intervention: Storage Resource Management, fast networks

Harness a federation of sites, web portals: Globus Toolkit -> The Earth System Grid -> The UltraDataGrid

Page 14: The Earth System Grid (ESG)

ESG Home

Page 15: The Earth System Grid (ESG)

PCM Metadata

Page 16: The Earth System Grid (ESG)

PCM Files and MSS

Page 17: The Earth System Grid (ESG)

ESG CCSM

Page 18: The Earth System Grid (ESG)

CCSM Datasets

Page 19: The Earth System Grid (ESG)

Subsetting List

Page 20: The Earth System Grid (ESG)

Subsetting Interface

Page 21: The Earth System Grid (ESG)

IPCC DataClick

Page 22: The Earth System Grid (ESG)

ESG architecture

Page 23: The Earth System Grid (ESG)

ESG topology

Page 24: The Earth System Grid (ESG)

Page 25: The Earth System Grid (ESG)

ESG Technologies: Security

Core security infrastructure provided by Globus GSI: digital certificates, public/private keys, proxies

ESG web-based digital registration system:

Hides the complex details of digital certificate generation from the user

Allows common users easy web access to ESG data services
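
GSI delegates a user's identity through short-lived proxy certificates signed by the user's longer-lived certificate. The toy model below is plain Python, not the Globus API, and its class and field names (and the sample subjects and dates) are hypothetical. It illustrates one consequence of ordinary certificate-chain validation: a credential is usable only while every certificate in its chain is inside its validity period, so a proxy's effective lifetime ends at the earliest expiry in the chain.

```python
"""Toy model of proxy-credential lifetimes (illustrative only, not the GSI API).

Class names, subjects, and dates are hypothetical.
"""
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Cert:
    subject: str
    not_before: datetime
    not_after: datetime


def usable_window(chain: list[Cert]) -> tuple[datetime, datetime]:
    """Interval in which every certificate in the chain is valid at once.

    Path validation checks each certificate's validity period, so the chain
    as a whole works only inside the intersection of those periods.
    """
    start = max(c.not_before for c in chain)
    end = min(c.not_after for c in chain)
    if start >= end:
        raise ValueError("chain has no common validity window")
    return start, end


if __name__ == "__main__":
    now = datetime(2005, 1, 27, 9, 0)
    user = Cert("CN=Example User", now - timedelta(days=300), now + timedelta(days=65))
    # A proxy requested for 12 hours, signed by the user's certificate.
    proxy = Cert("CN=Example User/CN=proxy", now, now + timedelta(hours=12))
    start, end = usable_window([user, proxy])
    print(f"proxy chain usable from {start} until {end}")
```

Keeping proxies short-lived limits the damage if one is stolen, while the registration portal spares users from handling the long-term certificate machinery directly.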

Page 26: The Earth System Grid (ESG)

ESG Technologies: Metadata

Collection-level description metadata (“climate metadata”):

Describes the logical objects involved in climate modeling

Stored in a set of relational tables in an OGSA-DAI MySQL database (an RDBMS with a Grid Service interface)

Input to and output from the database is XML
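
The pattern of keeping collection-level metadata in relational tables while exchanging it as XML can be sketched with Python's built-in sqlite3 and xml.etree modules. This is a stand-in only: sqlite3 replaces MySQL, there is no OGSA-DAI service layer, and the single `dataset` table, its columns, and the sample rows are hypothetical rather than the real ESG schema.

```python
"""Sketch: relational storage of collection metadata with XML output.

sqlite3 stands in for the MySQL database; the table layout and sample rows
are hypothetical and do not reproduce the real ESG schema.
"""
import sqlite3
import xml.etree.ElementTree as ET


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset (id TEXT PRIMARY KEY, model TEXT, "
        "experiment TEXT, resolution TEXT)"
    )
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?, ?, ?)",
        [
            ("ds-001", "CCSM", "example-run-a", "T42"),
            ("ds-002", "CCSM", "example-run-b", "T85"),
            ("ds-003", "PCM", "example-run-c", "T42"),
        ],
    )
    return conn


def query_as_xml(conn: sqlite3.Connection, model: str) -> str:
    """Run a relational query and serialize the matching rows as XML."""
    cur = conn.execute(
        "SELECT id, experiment, resolution FROM dataset "
        "WHERE model = ? ORDER BY id",
        (model,),
    )
    root = ET.Element("datasets", model=model)
    for ds_id, experiment, resolution in cur:
        ET.SubElement(root, "dataset", id=ds_id,
                      experiment=experiment, resolution=resolution)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(query_as_xml(build_db(), "CCSM"))
```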

Location and replica metadata:

Indicates the physical locations of the many copies of a single logical file

Stored in a system of distributed RLS (Replica Location Services): cross-updating, grid-enabled MySQL databases installed at each site

Any RLI (Replica Location Index) in the system can be used as the starting point for obtaining all replicas (at any site) of a given lfn (logical file name)
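
The two-level lookup described above can be sketched in memory: each site keeps a Local Replica Catalog (LRC) mapping logical file names to physical file names, and every LRC reports its logical names to every Replica Location Index (RLI), so a query can begin at any RLI. This is a toy model, not the Globus RLS client API; the class names, site names, logical name, and `gsiftp://…example.org` URLs are hypothetical, and updates are pushed immediately here, whereas deployed RLS servers send them periodically as soft state.

```python
"""Toy two-level replica location service (in-memory; not the Globus RLS API).

Each site runs a Local Replica Catalog (LRC) mapping logical file names (lfns)
to physical file names (pfns). Every LRC reports its lfns to every Replica
Location Index (RLI), so any RLI can locate all replicas. Updates are pushed
immediately here for simplicity. All names and URLs are hypothetical.
"""
from collections import defaultdict


class ReplicaLocationIndex:
    def __init__(self) -> None:
        self.lfn_to_lrcs: dict[str, set[str]] = defaultdict(set)

    def update(self, lrc_name: str, lfn: str) -> None:
        self.lfn_to_lrcs[lfn].add(lrc_name)


class LocalReplicaCatalog:
    def __init__(self, name: str, rlis: list[ReplicaLocationIndex]) -> None:
        self.name = name
        self.rlis = rlis
        self.lfn_to_pfns: dict[str, set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        self.lfn_to_pfns[lfn].add(pfn)
        for rli in self.rlis:  # keep every index in sync ("cross-updating")
            rli.update(self.name, lfn)


def find_replicas(lfn: str, rli: ReplicaLocationIndex,
                  lrcs: dict[str, LocalReplicaCatalog]) -> set[str]:
    """Start at any RLI, then ask each LRC it names for the physical copies."""
    pfns: set[str] = set()
    for lrc_name in rli.lfn_to_lrcs.get(lfn, set()):
        pfns |= lrcs[lrc_name].lfn_to_pfns.get(lfn, set())
    return pfns


if __name__ == "__main__":
    rlis = [ReplicaLocationIndex(), ReplicaLocationIndex()]
    lrcs = {name: LocalReplicaCatalog(name, rlis) for name in ("NCAR", "ORNL")}
    lfn = "ccsm/example-run/atm.0001.nc"  # hypothetical logical file name
    lrcs["NCAR"].register(lfn, "gsiftp://ncar.example.org/data/atm.0001.nc")
    lrcs["ORNL"].register(lfn, "gsiftp://ornl.example.org/data/atm.0001.nc")
    # Either index leads to both physical copies.
    print(sorted(find_replicas(lfn, rlis[1], lrcs)))
```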

Page 27: The Earth System Grid (ESG)

ESG Metadata Schema

Page 28: The Earth System Grid (ESG)

ESG Technologies: Metadata

THREDDS metadata catalogs:

Generated from collection-level + location/replica metadata
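
Generating a catalog from the two metadata sources amounts to a join: each logical dataset from the collection-level metadata becomes a catalog entry, and each replica becomes an access point on that entry. The Python sketch below assumes element names from the THREDDS InvCatalog 1.0 schema (`catalog`, `service`, `dataset`, `access`); the dataset ID, description, sites, service names, and `gsiftp://…example.org` bases are hypothetical.

```python
"""Sketch: assemble a THREDDS-style catalog from two metadata sources.

Element names follow the THREDDS InvCatalog 1.0 schema; the dataset ID,
description, sites, service names, and URLs are hypothetical.
"""
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Collection-level metadata: logical dataset ID -> description.
COLLECTION = {"ccsm.example-run.atm": "CCSM example run, atmosphere monthly means"}

# Location/replica metadata: dataset ID -> (site, path relative to service base).
REPLICAS = {"ccsm.example-run.atm": [("NCAR", "ccsm/atm.nc"),
                                     ("ORNL", "esg/ccsm/atm.nc")]}

SERVICE_BASE = {"NCAR": "gsiftp://ncar.example.org/",
                "ORNL": "gsiftp://ornl.example.org/"}


def build_catalog() -> ET.Element:
    ET.register_namespace("", NS)
    cat = ET.Element(f"{{{NS}}}catalog", name="ESG example catalog")
    for site, base in SERVICE_BASE.items():
        ET.SubElement(cat, f"{{{NS}}}service", name=f"gridftp-{site}",
                      serviceType="GridFTP", base=base)
    for ds_id, title in COLLECTION.items():
        ds = ET.SubElement(cat, f"{{{NS}}}dataset", name=title, ID=ds_id)
        for site, path in REPLICAS.get(ds_id, []):  # one <access> per replica
            ET.SubElement(ds, f"{{{NS}}}access",
                          serviceName=f"gridftp-{site}", urlPath=path)
    return cat


if __name__ == "__main__":
    print(ET.tostring(build_catalog(), encoding="unicode"))
```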

NcML metadata (NetCDF-specific):

Describes the specific content of each file

Used to create virtual dataset aggregations
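
A virtual aggregation presents many files as one dataset without copying data; the reader only needs to know which file holds a given record. The Python sketch below parses a small NcML `joinExisting` aggregation along `time` (the file names and `ncoords` record counts are hypothetical) and maps a global time index to a (file, local index) pair. It reads no netCDF data.

```python
"""Sketch: resolve indices in a virtual NcML joinExisting aggregation.

The NcML concatenates hypothetical yearly files along "time". No data is read;
the code only maps a global time index to (file, local index).
"""
import bisect
import xml.etree.ElementTree as ET

NCML = """\
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <netcdf location="atm.1990.nc" ncoords="12"/>
    <netcdf location="atm.1991.nc" ncoords="12"/>
    <netcdf location="atm.1992.nc" ncoords="6"/>
  </aggregation>
</netcdf>
"""
NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}


class JoinExisting:
    def __init__(self, ncml: str) -> None:
        agg = ET.fromstring(ncml).find("nc:aggregation", NS)
        self.files: list[str] = []
        self.starts: list[int] = []  # global index of each file's first record
        total = 0
        for member in agg.findall("nc:netcdf", NS):
            self.files.append(member.get("location"))
            self.starts.append(total)
            total += int(member.get("ncoords"))
        self.length = total

    def locate(self, i: int) -> tuple[str, int]:
        """Map a global index along the joined dimension to (file, local index)."""
        if not 0 <= i < self.length:
            raise IndexError(i)
        k = bisect.bisect_right(self.starts, i) - 1
        return self.files[k], i - self.starts[k]


if __name__ == "__main__":
    agg = JoinExisting(NCML)
    print(agg.length, agg.locate(0), agg.locate(12), agg.locate(29))
```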

Page 29: The Earth System Grid (ESG)

ESG Technologies: Data Transport

SRM (Storage Resource Manager):

Middleware that allows seamless access to data resources whether they are stored on rotating or deep storage

File transfer between any deep storage (NCAR MSS, ORNL HPSS, NERSC) and local cache

Reliable, high-performance transfer between sites via GridFTP

Robust, efficient cache management capabilities
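
The cache-management idea behind SRM can be illustrated with a toy disk cache in front of deep storage: a requested file is staged into a fixed-size cache if absent, pinned while a client uses it, and unpinned files are evicted least-recently-used first when space is needed. This is a hypothetical Python model of that general policy, not the SRM interface; the class name, sizes, and eviction rule are assumptions.

```python
"""Toy disk cache in front of deep (tape) storage, loosely modeled on SRM ideas.

Files are staged into a fixed-size cache on request and pinned while in use;
unpinned files are evicted least-recently-used first. Names, sizes, and the
policy are illustrative assumptions, not the SRM interface.
"""
from collections import OrderedDict


class StagingCache:
    def __init__(self, capacity_gb: float) -> None:
        self.capacity = capacity_gb
        self.files: OrderedDict[str, float] = OrderedDict()  # name -> size, LRU order
        self.pins: dict[str, int] = {}
        self.staged_from_tape = 0

    def used(self) -> float:
        return sum(self.files.values())

    def request(self, name: str, size_gb: float) -> None:
        """Make a file available on disk and pin it for the caller."""
        if name in self.files:
            self.files.move_to_end(name)  # cache hit: mark as recently used
        else:
            self._make_room(size_gb)
            self.files[name] = size_gb    # simulate staging from deep storage
            self.staged_from_tape += 1
        self.pins[name] = self.pins.get(name, 0) + 1

    def release(self, name: str) -> None:
        """Drop one pin; the file stays cached but is evictable at zero pins."""
        self.pins[name] -= 1
        if self.pins[name] == 0:
            del self.pins[name]

    def _make_room(self, size_gb: float) -> None:
        for victim in list(self.files):  # least recently used first
            if self.used() + size_gb <= self.capacity:
                return
            if victim not in self.pins:
                del self.files[victim]
        if self.used() + size_gb > self.capacity:
            raise RuntimeError("not enough unpinned space in cache")
```

Pinning is what makes the cache safe to share: a file being read by one client cannot be evicted to make room for another client's request.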

OPeNDAP-g:

Integration of the OPeNDAP API with Globus technologies (GSI authentication and GridFTP data transfer)

Extension for aggregation of NetCDF data
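
OPeNDAP's role in subsetting is that a client names only the slice it wants, and the server returns just that slice. The Python sketch below builds a standard DAP2 request URL with a hyperslab constraint of the form `var[start:stride:stop]` (stop inclusive). The server URL and the variable name `TS` are hypothetical, and the GSI/GridFTP transport that OPeNDAP-g adds is not shown.

```python
"""Sketch: build a DAP2 constraint expression that subsets one variable.

OPeNDAP servers accept hyperslabs of the form var[start:stride:stop], with the
stop index inclusive. The server URL and variable name are hypothetical.
"""


def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url: str, var: str,
               ranges: list[tuple[int, int]], suffix: str = ".ascii") -> str:
    """Return a URL requesting var restricted to inclusive (start, stop) ranges."""
    slabs = "".join(hyperslab(lo, hi) for lo, hi in ranges)
    return f"{dataset_url}{suffix}?{var}{slabs}"


if __name__ == "__main__":
    # 12 time steps and a small latitude/longitude box of a hypothetical file.
    url = subset_url("http://opendap.example.org/ccsm/atm.1990.nc", "TS",
                     [(0, 11), (10, 20), (30, 40)])
    print(url)
```

Swapping the `.ascii` suffix for `.dods` requests the same slice in DAP2's binary encoding.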

Page 30: The Earth System Grid (ESG)

ESG Technologies: Web Portal

Main entry point into the ESG system: provides simple, convenient web-based access to a wide range of data services for climate model data

Integrates and makes use of all other ESG technologies

Main ESG web portal at NCAR: gateway to distributed climate model datasets (PCM and CCSM data stored at NCAR, ORNL, NERSC, and LLNL)

Same software being deployed by LLNL/PCMDI to serve locally stored IPCC data worldwide

Page 31: The Earth System Grid (ESG)

ESG Technologies: Aggregation/subsetting

Page 32: The Earth System Grid (ESG)

ESG Metrics (November 2004)

Community Climate System Model: 28.4 Terabytes, including 21 simulations, 141 datasets, and 289,374 files

Parallel Climate Model: 20.42 Terabytes, including 98 simulations, 434 datasets, and 44,000 files

Total: 48.8 Terabytes, 119 simulations, 575 datasets, in over 333,872 files

167 registrations, 132 approved, 154.2GB downloaded to date

Plus new IPCC data: 150 user registrations, 1.1TB of data downloaded, in 16,000 files
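
The totals on this slide can be cross-checked against the two per-model lines. The short Python check below sums the figures exactly as quoted: the terabyte (48.82, shown as 48.8), simulation (119), and dataset (575) totals agree, while the file counts sum to 333,374 rather than 333,872, which suggests the Parallel Climate Model figure of 44,000 files is an approximation. No figures beyond those on the slide are assumed.

```python
"""Cross-check the November 2004 ESG totals against the per-model figures."""

HOLDINGS = {
    # model: (terabytes, simulations, datasets, files), as quoted on the slide
    "Community Climate System Model": (28.4, 21, 141, 289_374),
    "Parallel Climate Model": (20.42, 98, 434, 44_000),
}


def totals(holdings: dict[str, tuple[float, int, int, int]]) -> tuple[float, int, int, int]:
    """Sum each column; terabytes are rounded to two decimals."""
    tb = round(sum(h[0] for h in holdings.values()), 2)
    sims = sum(h[1] for h in holdings.values())
    datasets = sum(h[2] for h in holdings.values())
    files = sum(h[3] for h in holdings.values())
    return tb, sims, datasets, files


if __name__ == "__main__":
    tb, sims, datasets, files = totals(HOLDINGS)
    print(f"{tb} TB, {sims} simulations, {datasets} datasets, {files:,} files")
```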

Page 33: The Earth System Grid (ESG)

The Importance of Community: Collaborations & Relationships

GO-ESSP (multi-agency, intl.)

CCSM Data Management Group

IPCC

Globus Project

OPeNDAP/DODS (multi-agency)

NCAR’s Community Data Portal (CDP)

NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)

U.K. e-Science & British Atmospheric Data Center

NOAA NOMADS and CEOS-grid

VSTO (new NSF/NMI-funded project)

Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit

Page 34: The Earth System Grid (ESG)

‘ing Our Data

Page 35: The Earth System Grid (ESG)

Data->Knowledge

Mass Storage System (2.0PB)

Petascale Knowledge Environment

Establish new paradigms for managing and accessing scientific data based on semantic organization.

Page 36: The Earth System Grid (ESG)

www.earthsystemgrid.org