NCAR Cyberinfrastructure for Earth System Modeling Don Middleton NCAR Scientific Computing Division APAN eScience Workshop, Honolulu January 28, 2004.

Apr 01, 2015

Transcript
Page 1: NCAR Cyberinfrastructure for Earth System Modeling

Don Middleton
NCAR Scientific Computing Division
APAN eScience Workshop, Honolulu
January 28, 2004

Page 2: Cyberinfrastructure for Earth System Modeling

Supercomputers
High-bandwidth networks
Models
Data centers and Grids
Collaboratories
Analysis and Visualization

Page 3: “Atkins Report”

“A new age has dawned…”

“The Panel’s overarching recommendation is that the National Science Foundation should establish and lead a large-scale, interagency, and internationally coordinated Advanced Cyberinfrastructure Program (ACP) to create, deploy, and apply cyberinfrastructure in ways that radically empower all scientific and engineering research and allied education. We estimate that sustained new NSF funding of $1 billion per year is needed to achieve critical mass and to leverage the coordinated co-investment from other federal agencies, universities, industry, and international sources necessary to empower a revolution. The cost of not acting quickly or at a subcritical level could be high, both in opportunities lost and in increased fragmentation and balkanization of the research.”

Atkins Report, Executive Summary

Page 4: Characteristics of Infrastructure (from Kim Mish workshop presentation)

Essential
– So important that it becomes ubiquitous
Reliable
– Example: the built environment of the Roman Empire
Expensive
– Nothing succeeds like excess (e.g. Interstate system)
– Inherently one-off (often, few economies of scale)
Clear factorization between research and practice
– Generally deploy what provably works

Page 5: A Global Coupled Climate Model

(Animation not reproduced in this transcript.)

Page 6: Climate Model Data Production

T42 CCSM (current, 280 km)
– 7.5 GB/yr, 100 years -> 0.75 TB
T85 CCSM (140 km)
– 29 GB/yr, 100 years -> 2.9 TB
T170 CCSM (70 km)
– 110 GB/yr, 100 years -> 11 TB
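The totals on this slide follow from simple arithmetic, sketched here in Python. The sketch assumes decimal units (1 TB = 1000 GB), which is what the slide's rounding implies; the function and variable names are illustrative only.

```python
# Yearly output per resolution, in GB/yr, as quoted on the slide.
GB_PER_YEAR = {"T42": 7.5, "T85": 29.0, "T170": 110.0}


def run_volume_tb(gb_per_year, years=100):
    """Total output of one run in TB, assuming 1 TB = 1000 GB."""
    return gb_per_year * years / 1000.0


volumes = {name: run_volume_tb(rate) for name, rate in GB_PER_YEAR.items()}
for name, tb in volumes.items():
    print(f"{name}: {tb:.2f} TB per 100-year run")

# Growth in yearly output for each halving of the grid spacing.
ratio_t85 = GB_PER_YEAR["T85"] / GB_PER_YEAR["T42"]
ratio_t170 = GB_PER_YEAR["T170"] / GB_PER_YEAR["T85"]
print(f"T42 -> T85: x{ratio_t85:.1f}, T85 -> T170: x{ratio_t170:.1f}")
```

Both ratios come out a little under 4. That is roughly what one would expect if halving the grid spacing quadruples the number of horizontal grid points while everything else stays the same, though the slide does not state this reasoning.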

Page 7: Capacity-related Improvements

Increased turnaround, model development, ensemble of runs
Increase by a factor of 10, linear data
Current T42 CCSM
– 7.5 GB/yr, 100 years -> 0.75 TB * 10 = 7.5 TB

Page 8: CCM at T170 Resolution

(Animation not reproduced in this transcript.)

Page 9: Capability-related Improvements

Spatial Resolution: T42 -> T85 -> T170
– Increase by factor of ~10-20, linear data
Temporal Resolution: Study diurnal cycle, 3 hour data
– Increase by factor of ~4, linear data

Figure: CCM3 at T170 (70 km)

Page 10: Capability-related Improvements

Quality: Improved boundary layer, clouds, convection, ocean physics, land model, river runoff, sea ice
– Increase by another factor of 2-3, data flat
Scope: Atmospheric chemistry (sulfates, ozone…), biogeochemistry (carbon cycle, ecosystem dynamics), Middle Atmosphere Model…
– Increase by another factor of 10+, linear data

Page 11: Model Improvement Wishlist

Grand Total: Increase compute by a factor of O(1000-10000)
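One way to read the grand total is as the product of the individual factors quoted on the preceding slides. The sketch below assumes that every quoted factor applies to compute, that the factors are independent and simply multiply, and that "10+" can be treated as 10. All three are assumptions made for illustration, not statements from the presentation.

```python
from math import prod

# (low, high) factors quoted on the preceding slides.
FACTORS = {
    "capacity": (10, 10),
    "spatial resolution": (10, 20),
    "temporal resolution": (4, 4),
    "quality": (2, 3),
    "scope": (10, 10),  # the slide says "10+"; 10 is used as a floor
}

low = prod(lo for lo, _ in FACTORS.values())
high = prod(hi for _, hi in FACTORS.values())
print(f"combined factor: {low} to {high}")
```

Under these assumptions the product lands around 10^4, the same order of magnitude as the upper end of the range on the slide.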

Page 12: Advances at the Earth Simulator

ESC Climate Model at T1279 (approx. 10 km)

Page 13: We Will Examine Practically Every Aspect of the Earth System from Space in This Decade

Longer-term Missions - Observation of Key Earth System Interactions
Terra, Aura, Aqua, Landsat 7

Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies
GRACE, PICASSO, Cloudsat, QuikScat, EO-1, ICEsat, Jason-1, SRTM, VCL, Triana

Courtesy of Tim Killeen, NCAR

Page 14: The Earth System Grid

U.S. DOE SciDAC funded R&D effort - a “Collaboratory Pilot Project”
Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data
Build upon Globus Toolkit and DataGrid technologies and deploy
Potential broad application to other areas

http://www.earthsystemgrid.org

Page 15: ESG Team

ANL
– Ian Foster (PI)
– Veronika Nefedova
– (John Bresnahan)
– (Bill Allcock)
LBNL
– Arie Shoshani
– Alex Sim
ORNL
– David Bernholdt
– Kasidit Chanchio
– Line Pouchard
LLNL/PCMDI
– Bob Drach
– Dean Williams (PI)
USC/ISI
– Anne Chervenak
– Carl Kesselman
– (Laura Pearlman)
NCAR
– David Brown
– Luca Cinquini
– Peter Fox
– Jose Garcia
– Don Middleton (PI)
– Gary Strand

Page 17: ESG Scenario

End 2002: 1.2 million files comprising ~75 TB of data at NCAR, ORNL, LANL, NERSC, and PCMDI
End 2007: As much as 3 PB (3,000 TB) of data (!)
Current practice is already broken – the future will be even worse if something isn’t done…
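To put the scenario's numbers in perspective, the short Python sketch below derives the average file size at the end of 2002 and the growth the 2007 projection implies. It assumes decimal units (1 TB = 1,000,000 MB) and a constant growth rate between the two dates; neither assumption comes from the slide.

```python
files_2002 = 1.2e6        # files at end of 2002 (from the slide)
tb_2002 = 75.0            # ~75 TB at end of 2002
tb_2007 = 3000.0          # "as much as 3 PB" by end of 2007
years = 2007 - 2002

# Average file size, assuming 1 TB = 1,000,000 MB.
avg_file_mb = tb_2002 * 1e6 / files_2002

# Overall growth and the equivalent constant yearly growth factor.
growth = tb_2007 / tb_2002
yearly_factor = growth ** (1 / years)

print(f"average file size: {avg_file_mb:.1f} MB")
print(f"growth 2002-2007: x{growth:.0f}")
print(f"equivalent yearly growth: x{yearly_factor:.2f}")
```

A 40-fold increase over five years corresponds to the holdings slightly more than doubling every year.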

Page 18: ESG: Challenges

Enabling the simulation and data management team
Enabling the core research community in analyzing and visualizing results
Enabling broad multidisciplinary communities to access simulation results

We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.

Page 19: ESG: Strategies

Harness a federation of sites, web portals
– Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
Move data a minimal amount, keep it close to computational point of origin when possible
– Data access protocols, distributed analysis
When we must move data, do it fast and with a minimum amount of human intervention
– Storage Resource Management, fast networks
Keep track of what we have, particularly what’s on deep storage
– Metadata and Replica Catalogs
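The ideas of moving data as little as possible and tracking copies in a replica catalog can be illustrated with a toy example. This is a minimal sketch, not ESG code: the `Replica` class, the `choose_replica` function, the site names, and the URLs are all hypothetical, and the real system relied on Globus replica services whose API is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str        # site holding this copy
    url: str         # physical location (hypothetical)
    on_disk: bool    # False means the copy sits on deep (tape) storage


# Toy replica catalog: logical file name -> known physical copies.
CATALOG = {
    "run01/atm_0001.nc": [
        Replica("siteA", "gsiftp://siteA.example.org/mss/atm_0001.nc", False),
        Replica("siteB", "gsiftp://siteB.example.org/disk/atm_0001.nc", True),
    ],
}


def choose_replica(logical_name, local_site):
    """Prefer a local copy, then a disk copy, so data moves as little as possible."""
    replicas = CATALOG[logical_name]
    # Sort key: local copies first, then disk-resident copies.
    return min(replicas, key=lambda r: (r.site != local_site, not r.on_disk))


print(choose_replica("run01/atm_0001.nc", local_site="siteA").url)
print(choose_replica("run01/atm_0001.nc", local_site="siteC").url)
```

A client at siteA gets the local copy even though it is on tape, while a client at a third site is pointed to the disk-resident copy at siteB. Whether local tape should beat remote disk is a policy choice made here only for illustration.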

Page 21: Storage/Data Management

Tools for reliable staging, transport, and replication
Diagram labels: Client (Selection, Control, Monitoring) with HRM; two Servers, each with a Tera/Peta-scale Archive and HRM

Page 22: OPeNDAP

An Open Source Project for a Network Data Access Protocol
(originally DODS, the Distributed Oceanographic Data System)

Page 23: Distributed Data Access Services

Typical Application: Application -> netCDF lib -> Data (local)
Distributed Application: Application -> OPeNDAP Client -> OPeNDAP via http -> OPeNDAP Server -> Data (remote)
ESG (ESG + DODS): Application -> ESG client -> OPeNDAP via Grid -> ESG Server -> Big Data (remote)
OPeNDAP-g: Transparency, Performance, Security, Authorization, (Processing)
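With OPeNDAP over http, a client asks the server for part of a remote variable by appending a constraint expression to the dataset URL. The sketch below builds such a URL using the DAP2 hyperslab form `[start:stride:stop]`. The server address, file name, variable name, and helper functions are hypothetical, and the Grid-enabled variant (OPeNDAP-g) is not covered.

```python
def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range as [start:stride:stop] (stop is inclusive)."""
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url, variable, ranges, response="ascii"):
    """Build a URL asking the server for part of one variable.

    ranges is a list of (start, stop) or (start, stop, stride) tuples,
    one per dimension of the variable.
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical server, file, and variable names.
url = subset_url(
    "http://data.example.org/opendap/ccsm/run01.nc",
    "TS",
    [(0, 11), (0, 63, 2), (0, 127, 2)],
)
print(url)
```

Because only the requested slice crosses the network, this style of access fits the goal of moving data a minimal amount.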

Page 24: ESG: NcML Core Schema

For XML encoding of metadata (and data) of any generic netCDF file
Objects: netCDF, dimension, variable, attribute
Beta version reference implementation as Java Library (http://www.scd.ucar.edu/vets/luca/netcdf/extract_metadata.htm)

Schema diagram elements: netCDF (nc:netCDFType), nc:dimension, nc:variable (nc:VariableType), nc:attribute, nc:values
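To make the four object types concrete, the sketch below builds a small XML document with Python's standard library. The element names follow the slide (netCDF, dimension, variable, attribute, values), but the XML attribute names (`location`, `name`, `length`, `shape`, `type`, `value`), the missing namespace, and the sample content are assumptions for illustration rather than the actual NcML schema.

```python
import xml.etree.ElementTree as ET

# Root element for one netCDF file; the location is a hypothetical path.
root = ET.Element("netCDF", {"location": "run01.nc"})

# Dimensions: name and length.
ET.SubElement(root, "dimension", {"name": "time", "length": "12"})
ET.SubElement(root, "dimension", {"name": "lat", "length": "64"})

# A global attribute attached to the file itself.
ET.SubElement(root, "attribute", {"name": "title", "value": "example run"})

# A variable with its own attribute and an optional values element.
var = ET.SubElement(root, "variable", {"name": "lat", "shape": "lat", "type": "float"})
ET.SubElement(var, "attribute", {"name": "units", "value": "degrees_north"})
values = ET.SubElement(var, "values")
values.text = "-87.9 -85.1"  # made-up, truncated sample values

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Attributes can appear both at the file level and inside a variable, which matches the two nc:attribute boxes in the schema diagram.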

Page 25: Metadata class diagram

Classes and attributes, with cardinality in brackets:
Object: [1] id
Activity: [0,1] name, [0,1] description, [0,1] rights, [0,n] date type=, [0,n] note, [0,n] participant role=, [0,n] reference uri=
Project: [0,n] topic type=, [0,1] funding
Simulation: [0,n] simulationInput type=, [0,n] simulationHardware
Dataset: [0,1] type, [0,1] conventions, [0,n] date type=, [0,n] format type= uri=, [0,1] timeCoverage, [0,1] spaceCoverage
Person: [0,1] firstName, [0,1] lastName, [0,1] contact
Institution: [0,1] name, [0,1] type, [0,1] contact
Service: [0,1] name, [0,1] description
Classes without listed attributes: Investigation, Ensemble, Campaign, Observation, Experiment, Analysis
Relationships: isA, isPartOf, hasParent, hasChild, hasSibling, generatedBy, worksFor, participant role=, serviceId
Legend: Class, AbstractClass, inheritance, association
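The bracketed cardinalities in the diagram map naturally onto typed fields: [1] is required, [0,1] is optional, and [0,n] is a list. The Python sketch below shows that mapping for a few of the attributes. It is one plausible reading of the diagram, not the project's schema: the assumptions that Activity and Dataset both inherit from Object, that generatedBy links a Dataset to an Activity, and that hasChild links Datasets to each other are mine, as are the sample values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataObject:
    id: str  # [1] id: exactly one, required


@dataclass
class Activity(MetadataObject):
    name: Optional[str] = None          # [0,1]
    description: Optional[str] = None   # [0,1]
    rights: Optional[str] = None        # [0,1]
    notes: List[str] = field(default_factory=list)  # [0,n]


@dataclass
class Dataset(MetadataObject):
    type: Optional[str] = None            # [0,1]
    conventions: Optional[str] = None     # [0,1]
    time_coverage: Optional[str] = None   # [0,1]
    space_coverage: Optional[str] = None  # [0,1]
    generated_by: Optional[Activity] = None                   # generatedBy (assumed target)
    children: List["Dataset"] = field(default_factory=list)   # hasChild (assumed target)


sim = Activity(id="act-1", name="control run")
ds = Dataset(id="ds-1", conventions="CF-1.0", generated_by=sim)
ds.children.append(Dataset(id="ds-1a"))
print(ds.generated_by.name, len(ds.children))
```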

Page 26: ESG Current Topology

Sites: LBNL, ISI, LLNL, NCAR, ORNL, ANL
Diagram components:
– Storage: MSS with HRM, HPSS with HRM (two instances), DISK with HRM, DISK
– Replica location: four RLI and four LRC nodes (cross-update)
– Data transfer: four gridFTP servers, gridFTP links
– Metadata: OGSA-DAI, MySQL RDBMS (query)
– Security: MyProxy (authenticate), CAS
– Job execution: GRAM gatekeeper (submit, execute)
– Portal and visualization: ESG web portal (Tomcat/Struts), LAS server (visualize)

Page 27: Data->Knowledge

Mass Storage System (1.3 PB)
Petascale Knowledge Repository
Establish new paradigms for managing and accessing scientific data based on semantic organization.

Page 28: Collaborations & Relationships

CCSM Data Management Group
The Globus Project
Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit
OPeNDAP/DODS (multi-agency)
NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)
U.K. e-Science and British Atmospheric Data Center
NOAA NOMADS and CEOS-grid
Earth Science Portal group (multi-agency, intnl.)
ESMF (emerging)

Page 29: NCAR Command Language (NCL)

Page 36: NCL: Core

Approx. 500 built-in functions and procedures
– File I/O & data model for Earth sciences
– Unique grids, Climate-modeling routines
– Spherical harmonics, Regridding and interpolation
– Graphics (wind barbs, simple 3D plots)
36 NCL core visual representations
– Contours, XY plots, vectors, streamlines, maps, histograms, text, markers, polygons
Supported on Unix, Linux, Mac, and PC

10 years, 20 people involved with development, 50 person-years of effort, about 1.5 million lines of source, 500K lines of documentation

Page 37: NCL as CI for a Community

CAM & CCSM Processor – 100 functions, 200 examples, 20K lines of NCL code (CGD)
WGNE Climate Diagnostics Processor – 10K lines of NCL code (CGD)
Award-winning Aviation Weather Site (RAP)
MM5 Analysis Package (RIP)
Weather Research & Forecast Model: Initial community analysis software and RIP
Community Data Portal (SCD)

Page 38: NCL

http://ngwww.ucar.edu/ncl

Page 39: Collaborative Environments and the AccessGrid

Science Portals + AccessGrid:
University of Michigan (Knoop, Hardin)
Vegetation & Ecosystem Mapping Program (VEMAP)
NCAR/SCD VETS/KEG
Argonne National Labs

Page 40: END