The Earth System Grid (ESG)
Don Middleton
On behalf of many project collaborators and a lot of great work!
NCAR Scientific Computing Division
Section Head, Visualization & Enabling Technologies
APAN eScience Workshop, January 27, 2005
Supercomputing • Communications • Data
NCAR Scientific Computing Division
7.5GB/yr, 100 years -> 0.75TB for one run
T85 CCSM (140km): 29GB/yr, 100 years -> 2.9TB for one run
T170 CCSM (70km): 110GB/yr, 100 years -> 11TB for one run
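The per-run totals above follow directly from the yearly output rate times the run length. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB); the function name and the unit convention are assumptions for illustration, not something the slides state:

```python
# Yearly output rates quoted on the slide, in GB per simulated year.
RATES_GB_PER_YEAR = [7.5, 29.0, 110.0]


def run_volume_tb(gb_per_year: float, years: int = 100) -> float:
    """Total output of one run in terabytes, assuming 1 TB = 1000 GB."""
    return gb_per_year * years / 1000.0


if __name__ == "__main__":
    for rate in RATES_GB_PER_YEAR:
        print(f"{rate} GB/yr over 100 years: {run_volume_tb(rate):.2f} TB")
```

Each doubling of horizontal resolution roughly quadruples the yearly output, which is why a single century-long run grows from under a terabyte to about 11 TB.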
CCM at T170 Resolution
Advances at the Earth Simulator
ESC Climate Model at T1279 (approx. 10km)
Longer-term Missions - Observation of Key Earth System Interactions
Terra
Aura
Aqua
Landsat 7
Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies
GRACE
PICASSO
Cloudsat
QuikScat
EO-1
ICEsat
Jason-1
SRTM
VCL
We Will Examine Practically Every Aspect of the Earth System from Space in This Decade
U.S. DOE SciDAC funded R&D effort - a “Collaboratory Pilot Project”
Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data
Build upon Globus Toolkit and DataGrid technologies and deploy
Potential broad application to other areas
http://www.earthsystemgrid.org
ESG People
PIs: Ian Foster (ANL), Don Middleton (NCAR), Dean Williams (LLNL)
Team: Veronika Nefedova (ANL), Luca Cinquini (NCAR), Gary Strand (NCAR), Peter Fox (NCAR), Jose Garcia (NCAR), Rob Markel (NCAR), Bob Drach (LLNL), David Bernholdt (ORNL), Kasidit Chanchio (ORNL), Line Pouchard (ORNL), Carl Kesselman (ISI), Ann Chervenak (ISI), Arie Shoshani (LBNL), Alex Sim (LBNL)
ESG: Challenges
Enabling the simulation and data management team
Enabling the core research community in analyzing and visualizing results
Enabling broad multidisciplinary communities to access simulation results
We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.
ESG: Strategies
Keep track of what we have, particularly what’s on deep storage
  Metadata and Replica Catalogs
Move data a minimal amount, keep it close to computational point of origin when possible
  Data access protocols, distributed analysis
When we must move data, do it fast and with a minimum amount of human intervention
  Storage Resource Management, fast networks
Harness a federation of sites, web portals
  Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
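The first two strategies (know where every copy lives, and move data as little as possible) can be illustrated with a toy replica lookup. This is only a sketch under stated assumptions: the `Replica` class, the catalog contents, the file names, and the preference order are all hypothetical and do not reflect the actual ESG or Globus replica catalog interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str       # hypothetical site label, e.g. "NCAR"
    on_disk: bool   # True for rotating storage, False for deep (tape) storage


# Hypothetical catalog: logical file name -> known physical copies.
CATALOG = {
    "run01/atm.nc": [Replica("ORNL", on_disk=False), Replica("NCAR", on_disk=True)],
    "run01/ocn.nc": [Replica("NERSC", on_disk=False)],
}


def pick_replica(logical_name, local_site, catalog=CATALOG):
    """Prefer a copy at the local site, then a copy on disk, then any copy."""
    replicas = catalog.get(logical_name, [])
    if not replicas:
        return None
    # False sorts before True, so local and disk-resident copies win.
    return min(replicas, key=lambda r: (r.site != local_site, not r.on_disk))
```

For example, `pick_replica("run01/atm.nc", "LLNL")` returns the NCAR copy, because neither copy is local and the NCAR one is already on disk.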
ESG Home
PCM Metadata
PCM Files and MSS
ESG CCSM
CCSM Datasets
Subsetting List
Subsetting Interface
IPCC DataClick
ESG architecture
ESG topology
ESG Technologies: Security
Core security infrastructure provided by Globus GSI: digital certificates, public/private keys, proxies
ESG web-based digital registration system:
  Hides from user complex details of digital certificate generation
  Allows easy web access by common users to ESG data services
Generated from collection-level + location/replica metadata
NcML metadata: NetCDF specific
  Describes specific content of each file
  Used to create virtual dataset aggregations
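To make the "virtual dataset aggregation" idea concrete, here is a minimal sketch that writes an NcML document joining several NetCDF files along an existing dimension. The `joinExisting` aggregation element is standard NcML; the helper name `build_aggregation`, the file names, and the choice of `time` as the joined dimension are assumptions for illustration, and the slides do not say which aggregation type ESG used.

```python
import xml.etree.ElementTree as ET

# NcML 2.2 namespace as published by Unidata.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_aggregation(file_locations, dim_name="time"):
    """Return NcML text that joins the given files along an existing dimension."""
    root = ET.Element("netcdf", {"xmlns": NCML_NS})
    agg = ET.SubElement(
        root, "aggregation", {"dimName": dim_name, "type": "joinExisting"}
    )
    for location in file_locations:
        # Each nested <netcdf> element points at one physical file.
        ET.SubElement(agg, "netcdf", {"location": location})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names, one per simulated year.
    print(build_aggregation(["atm_0001.nc", "atm_0002.nc"]))
```

A client that understands NcML can then open the single aggregation document and see one dataset spanning every listed file.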
ESG Technologies: Data Transport
SRM (Storage Resource Manager)
  Middleware that allows seamless access to data resources whether they are stored on rotating or deep storage
  File transfer between any deep storage (NCAR MSS, ORNL HPSS, NERSC) and local cache
  Reliable, high performance transfer between sites via GridFTP
  Robust, efficient cache management capabilities
OPeNDAP-g
  Integration of OPeNDAP API with Globus technologies (GSI authentication and GridFTP data transfer)
  Extension for aggregation of NetCDF data
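The cache management role described for SRM can be pictured as a disk cache sitting in front of deep storage. The sketch below uses a least-recently-used eviction policy purely as an illustration; the `StagingCache` class, its method names, and the LRU policy are assumptions and are not taken from the SRM specification or any SRM implementation.

```python
from collections import OrderedDict


class StagingCache:
    """Toy disk cache in front of deep storage, evicting least recently used files."""

    def __init__(self, capacity_gb, fetch_from_deep_storage):
        self.capacity_gb = capacity_gb
        self.fetch = fetch_from_deep_storage  # callable: file name -> size in GB
        self.files = OrderedDict()            # file name -> size in GB, oldest first

    def request(self, name):
        """Return 'hit' if the file is already cached, else stage it and return 'staged'."""
        if name in self.files:
            self.files.move_to_end(name)      # mark as most recently used
            return "hit"
        size = self.fetch(name)               # slow path: retrieval from tape
        if size > self.capacity_gb:
            raise ValueError(f"{name} does not fit in the cache")
        # Evict least recently used files until the new one fits.
        while sum(self.files.values()) + size > self.capacity_gb:
            self.files.popitem(last=False)
        self.files[name] = size
        return "staged"
```

With a 10 GB cache and 4 GB files, requesting `a`, `b`, `a`, `c` stages `c` by evicting `b`, since `a` was touched more recently.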
ESG Technologies: Web Portal
Main entry point into ESG system: provides simple, convenient web-based access to wide range of data services to access climate model data
Integrates and makes use of all other ESG technologies
Main ESG web portal at NCAR: gateway to distributed climate model datasets (PCM, CCSM data stored at NCAR, ORNL, NERSC, LLNL)
Same software under deployment by LLNL/PCMDI to serve locally stored IPCC data worldwide
ESG Technologies: Aggregation/subsetting
ESG Metrics (November 2004)
Community Climate System Model
  28.4 Terabytes, including 21 simulations, 141 datasets, and 289,374 files
Parallel Climate Model
  20.42 Terabytes, including 98 simulations, 434 datasets, and 44,000 files
Total
  48.8 Terabytes, 119 simulations, 575 datasets, in over 333,872 files
  167 registrations, 132 approved, 154.2GB downloaded to date
Plus new IPCC Data
  150 user registrations, 1.1TB of data downloaded, in 16,000 files
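As a quick cross-check, the CCSM and PCM figures can be summed and compared with the stated totals. This is a minimal sketch using only the numbers quoted above; the dictionary layout and the `totals` helper are assumptions for illustration.

```python
# Per-collection figures as quoted on the slide.
COLLECTIONS = {
    "CCSM": {"terabytes": 28.4, "simulations": 21, "datasets": 141, "files": 289_374},
    "PCM": {"terabytes": 20.42, "simulations": 98, "datasets": 434, "files": 44_000},
}


def totals(collections=COLLECTIONS):
    """Sum each metric across all collections."""
    keys = ("terabytes", "simulations", "datasets", "files")
    return {key: sum(c[key] for c in collections.values()) for key in keys}


if __name__ == "__main__":
    for key, value in totals().items():
        print(f"{key}: {value}")
```

The volume, simulation, and dataset sums match the stated totals (48.82 TB, 119, and 575). The file counts sum to 333,374, a little under the stated 333,872; one possible explanation is that the PCM figure of 44,000 is rounded, but the slide does not say.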
The Importance of Community: Collaborations & Relationships
GO-ESSP (multi-agency, intl.)
CCSM Data Management Group
IPCC
Globus Project
OPeNDAP/DODS (multi-agency)
NCAR’s Community Data Portal (CDP)
NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)
U.K. e-Science & British Atmospheric Data Center
NOAA NOMADS and CEOS-grid
VSTO (new NSF/NMI-funded project)
Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit
‘ing Our Data
Data->Knowledge
Mass Storage System (2.0PB)
Petascale Knowledge Environment
Establish new paradigms for managing and accessing scientific data based on semantic organization.