NCAR Cyberinfrastructure for Earth System Modeling
Don Middleton
NCAR Scientific Computing Division
APAN eScience Workshop, Honolulu
January 28, 2004
NCAR Cyberinfrastructure for Earth System Modeling Don Middleton NCAR Scientific Computing Division APAN eScience Workshop, Honolulu January 28, 2004.
Cyberinfrastructure for Earth System Modeling
Supercomputers
High-bandwidth networks
Models
Data centers and Grids
Collaboratories
Analysis and Visualization
NCAR
“Atkins Report”
“A new age has dawned…”
“The Panel’s overarching recommendation is that the National Science Foundation should establish and lead a large-scale, interagency, and internationally coordinated Advanced Cyberinfrastructure Program (ACP) to create, deploy, and apply cyberinfrastructure in ways that radically empower all scientific and engineering research and allied education. We estimate that sustained new NSF funding of $1 billion per year is needed to achieve critical mass and to leverage the coordinated co-investment from other federal agencies, universities, industry, and international sources necessary to empower a revolution. The cost of not acting quickly or at a subcritical level could be high, both in opportunities lost and in increased fragmentation and balkanization of the research.”
Atkins Report, Executive Summary
Characteristics of Infrastructure (from Kim Mish workshop presentation)
Essential
– So important that it becomes ubiquitous
Reliable
– Example: the built environment of the Roman Empire
Expensive
– Nothing succeeds like excess (e.g. Interstate system)
– Inherently one-off (often, few economies of scale)
Clear factorization between research and practice
– Generally deploy what provably works
A Global Coupled Climate Model
Climate Model Data Production
T42 CCSM (current, 280km)
– 7.5GB/yr, 100 years -> .75TB
T85 CCSM (140km)
– 29GB/yr, 100 years -> 2.9TB
T170 CCSM (70km)
– 110GB/yr, 100 years -> 11TB
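The arithmetic behind these figures can be checked with a short script. This is a minimal sketch, assuming decimal units (1 TB = 1000 GB); the function and dictionary names are illustrative and do not come from the slides.

```python
# Per-year output volumes quoted on the slide (GB per simulated year).
GB_PER_YEAR = {"T42": 7.5, "T85": 29.0, "T170": 110.0}


def run_volume_tb(gb_per_year: float, years: int = 100) -> float:
    """Total output of one run in TB, assuming 1 TB = 1000 GB."""
    return gb_per_year * years / 1000.0


for name, rate in GB_PER_YEAR.items():
    print(f"{name}: {run_volume_tb(rate):.2f} TB per 100-year run")
```

On these numbers, each halving of the grid spacing multiplies the yearly volume by a little under four (29/7.5 and 110/29 are both roughly 3.8 to 3.9).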
Capacity-related Improvements
Increased turnaround, model development, ensemble of runs
Increase by a factor of 10, linear data
Current T42 CCSM
– 7.5GB/yr, 100 years -> .75TB * 10 =
Model Improvement Wishlist
Grand Total:
Increase compute by a Factor O(1000-10000)
Advances at the Earth Simulator
ESC Climate Model at T1279 (approx. 10km)
Longer-term Missions - Observation of Key Earth System Interactions
Terra
Aura
Aqua
Landsat 7
Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies
GRACE
PICASSO
Cloudsat
QuikScat
EO-1
ICEsat
Jason-1
SRTM
VCL
We Will Examine Practically Every Aspect of the Earth System from Space in This Decade
Triana
Courtesy of Tim Killeen, NCAR
The Earth System Grid
U.S. DOE SciDAC funded R&D effort - a “Collaboratory Pilot Project”
Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data
Build upon Globus Toolkit and DataGrid technologies and deploy
Potential broad application to other areas
LBNL
– Arie Shoshani
– Alex Sim
ORNL
– David Bernholdt
– Kasidit Chanchio
– Line Pouchard
LLNL/PCMDI
– Bob Drach
– Dean Williams (PI)
USC/ISI
– Anne Chervenak
– Carl Kesselman
– (Laura Pearlman)
NCAR
– David Brown
– Luca Cinquini
– Peter Fox
– Jose Garcia
– Don Middleton (PI)
– Gary Strand
ESG Scenario
End 2002: 1.2 million files comprising ~75TB of data at NCAR, ORNL, LANL, NERSC, and PCMDI
End 2007: As much as 3 PB (3,000 TB) of data (!)
Current practice is already broken – the future will be even worse if something isn’t done…
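To put the scenario in perspective, the two data points imply an average file size and a yearly growth rate. The sketch below assumes decimal units, takes the 3 PB upper bound at face value, and assumes steady compounding between the end of 2002 and the end of 2007; the function names are illustrative only.

```python
def mean_file_size_mb(total_tb: float, n_files: float) -> float:
    """Average file size in MB, assuming 1 TB = 1,000,000 MB."""
    return total_tb * 1_000_000 / n_files


def annual_growth_factor(start_tb: float, end_tb: float, years: int) -> float:
    """Constant yearly multiplier that takes start_tb to end_tb in `years` years."""
    return (end_tb / start_tb) ** (1.0 / years)


# Average file size at the end of 2002: 75 TB spread over 1.2 million files.
print(f"{mean_file_size_mb(75, 1.2e6):.1f} MB per file")

# Yearly growth implied by going from 75 TB to 3,000 TB in five years.
print(f"x{annual_growth_factor(75, 3000, 5):.2f} per year")
```

Under these assumptions the holdings would have to roughly double every year, which is the scale of growth the slide calls out.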
ESG: Challenges
Enabling the simulation and data management team
Enabling the core research community in analyzing and visualizing results
Enabling broad multidisciplinary communities to access simulation results
We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.
ESG: Strategies
Harness a federation of sites, web portals
– Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
Move data a minimal amount, keep it close to computational point of origin when possible
– Data access protocols, distributed analysis
When we must move data, do it fast and with a minimum amount of human intervention
– Storage Resource Management, fast networks
Keep track of what we have, particularly what’s on deep storage
– Metadata and Replica Catalogs
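The replica-catalog idea in the last bullet can be illustrated with a toy in-memory version. This is only a sketch of the concept, not the ESG or Globus implementation: the class names, the selection rule (disk before deep storage, then local before remote), and the URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    site: str        # sample site label, e.g. "NCAR"
    url: str         # physical location (hypothetical URLs below)
    on_disk: bool    # False means the copy sits on deep (tape) storage


@dataclass
class ReplicaCatalog:
    """Toy map from a logical file name to its physical replicas."""
    entries: dict[str, list[Replica]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        self.entries.setdefault(logical_name, []).append(replica)

    def best_replica(self, logical_name: str, local_site: str) -> Replica:
        """Prefer disk over deep storage, then the local site over remote ones."""
        replicas = self.entries[logical_name]
        return min(replicas, key=lambda r: (not r.on_disk, r.site != local_site))


catalog = ReplicaCatalog()
catalog.register("ccsm/run1/atm.nc",
                 Replica("NCAR", "gsiftp://mss.example.org/run1/atm.nc", on_disk=False))
catalog.register("ccsm/run1/atm.nc",
                 Replica("ORNL", "gsiftp://disk.example.org/run1/atm.nc", on_disk=True))

# The remote disk copy is chosen over the local tape copy.
print(catalog.best_replica("ccsm/run1/atm.nc", local_site="NCAR").site)
```

Ranking disk copies first reflects the slide's concern with deep storage: a tape recall is typically slower than a wide-area transfer, though a real system would weigh measured costs rather than a fixed rule.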
Storage/Data Management
[Diagram: a Client (Selection, Control, Monitoring) with an HRM, and two Servers, each with an HRM in front of a Tera/Peta-scale Archive. Caption: Tools for reliable staging, transport, and replication.]
OPeNDAP
An Open Source Project for a Network Data Access Protocol
(originally DODS, the Distributed Oceanographic Data System)
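A data access protocol of this kind lets a client ask the server for a sub-array instead of the whole file. As a rough illustration, the sketch below builds a DAP2-style request URL with an index-range constraint written as [start:stride:stop]. It only constructs the string and makes no network call; the server, file, and variable names are hypothetical, and the helper functions are not part of any OPeNDAP library.

```python
def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """One DAP2 array index range, written [start:stride:stop] (stop inclusive)."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url: str, variable: str,
               ranges: list[tuple[int, int]], response: str = "ascii") -> str:
    """Build a request URL for a sub-array of one variable."""
    constraint = variable + "".join(hyperslab(a, b) for a, b in ranges)
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical server, file, and variable names, for illustration only.
url = subset_url("http://example.org/opendap/ccsm/run1.nc", "TS",
                 [(0, 11), (0, 63), (0, 127)])
print(url)
```

Only the requested indices cross the network, which fits the strategy of moving as little data as possible.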
[Metadata schema diagram. Legend: Class, AbstractClass, inheritance, association]
Dataset: [0,1] type, [0,1] conventions, [0,n] date type=, [0,n] format type= uri=, [0,1] timeCoverage, [0,1] spaceCoverage
Person: [0,1] firstName, [0,1] lastName, [0,1] contact
Institution: [0,1] name, [0,1] type, [0,1] contact
Service: [0,1] name, [0,1] description
Relationship labels: isA, generatedBy, isPartOf, worksFor, participant role=, serviceId
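One possible way to read the attribute lists in this schema is as record types, with [0,1] mapped to an optional field and [0,n] to a list. The sketch below is an interpretation under that assumption, not the project's actual schema code. The relationship labels are left out because the extracted diagram does not show which classes they connect, and field names are adapted to Python style.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Institution:
    name: Optional[str] = None
    institution_type: Optional[str] = None   # "type" in the diagram
    contact: Optional[str] = None


@dataclass
class Person:
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    contact: Optional[str] = None


@dataclass
class Service:
    name: Optional[str] = None
    description: Optional[str] = None


@dataclass
class Dataset:
    dataset_type: Optional[str] = None        # "type" in the diagram
    conventions: Optional[str] = None
    dates: list[tuple[str, str]] = field(default_factory=list)    # (type, value) pairs
    formats: list[tuple[str, str]] = field(default_factory=list)  # (type, uri) pairs
    time_coverage: Optional[str] = None
    space_coverage: Optional[str] = None


# Sample values are made up for illustration.
ds = Dataset(formats=[("netCDF", "http://example.org/data/run1.nc")])
print(ds)
```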
ESG Current Topology
[Diagram. Sites: LBNL, ISI, LLNL, NCAR, ORNL, ANL. Components: ESG web portal (Tomcat/Struts); OGSA-DAI with MySQL RDBMS; MyProxy (authenticate); CAS; GRAM gatekeeper (submit, execute); LAS server (visualize); gridFTP servers; RLI and LRC nodes (cross-update, query); HRMs in front of MSS, HPSS, and disk storage.]
Data->Knowledge
Mass Storage System (1.3PB)
Petascale Knowledge Repository
Establish new paradigms for managing and accessing scientific data based on semantic organization.
Collaborations & Relationships
CCSM Data Management Group
The Globus Project
Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit
OPeNDAP/DODS (multi-agency)
NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)
U.K. e-Science and British Atmospheric Data Center
NOAA NOMADS and CEOS-grid
Earth Science Portal group (multi-agency, intnl.)
ESMF (emerging)
NCAR Command Language (NCL)
NCL: Core
Approx. 500 built-in functions and procedures
– File I/O & data model for Earth sciences
– Unique grids, Climate-modeling routines
– Spherical harmonics, Regridding and interpolation
– Graphics (wind barbs, simple 3D plots), maps, histograms, text, markers, polygons
Supported on Unix, Linux, Mac, and PC
10 years, 20 People involved with development, 50 person-years of effort, about 1.5 million lines of source, 500K lines of documentation
NCL as CI for a Community
CAM & CCSM Processor – 100 functions, 200