Anatomy of a Climate Science-centric Workflow
Harinarayan Krishnan, CAlibrated and Systematic Characterization, Attribution, and Detection of Extremes (CASCADE Team)
Kevin Bensema, Surendra Byna, Soyoung Jeon, Karthik Kashinath, Burlen Loring, Pardeep Pall, Prabhat, Alexandru Romosan, Oliver Ruebel, Daithi Stone, Travis O'Brien, Christopher Paciorek, Michael Wehner, Wes Bethel, William Collins
Challenges
• Scale of data is already at TBs and will only grow larger.
• Frequent processing of data at three- to six-hour intervals.
• Focus is now on high resolution, 1/4 to 1/8 degree; extensible to higher.
• High-resolution and high-frequency analysis add several orders of magnitude.
Proposed Strategy
• Identification of use cases, extraction of common computational algorithms, scaling & optimization of current work.
• Template workflow configurations of common use cases.
• Abstraction of services to HPC environments.
• Easy-to-use archiving, distribution, and verification strategies.
• Standardization of the parallel work environment.
What it is/What it is not
• What it is not
  – Not a general workflow
  – Not a general infrastructure – balancing between performance & exploratory science.
• What it is
  – …
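For Example:
import cascade
import workflow
t = cascade.Teca()                       # TECA analysis task
t['filename'] = 'myfile'                 # input file for the task
writer = cascade.Writer(cascade.ESGF)    # writer that targets ESGF
writer['input'] = t['out']               # connect the task output to the writer
n = workflow.NERSC(<resources>, writer)  # <resources> is a placeholder, not literal syntax
n.execute()
Note: Active work in progress & ongoing…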
[Figure: workflow diagram with the stages Start (Stage Data); Schedule Job (Time | Log); Load & Run Teca task (Time | Log), each followed by Verify & Validate (shown five times); Write & Distribute to ESGF; Publish using ESGF node; Record Workflow.]
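The stages named in the diagram (run a task with timing and logging, verify and validate its output, publish, record the workflow) can be sketched in plain Python. This is a minimal illustration, not the CASCADE implementation: the function names run_stage and run_workflow, the task callables, and the ordering of stages are all assumptions made for the example.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("workflow-sketch")


def run_stage(name, func, data, record):
    """Run one stage, log how long it took, and append it to the workflow record."""
    start = time.perf_counter()
    result = func(data)
    elapsed = time.perf_counter() - start
    log.info("%s finished in %.3f s", name, elapsed)
    record.append({"stage": name, "seconds": elapsed})
    return result


def run_workflow(staged_data, tasks, verify, publish):
    """Run each task on the staged data, verify its output, then publish.

    tasks   -- list of (name, callable) pairs; hypothetical stand-ins for analysis tasks
    verify  -- callable returning True when a task's output is acceptable
    publish -- callable that receives the list of verified outputs
    """
    record = []   # the "Record Workflow" step: one entry per executed stage
    outputs = []
    for name, task in tasks:
        result = run_stage(name, task, staged_data, record)
        if not verify(result):
            raise ValueError(f"verification failed after {name}")
        outputs.append(result)
    run_stage("publish", publish, outputs, record)
    return outputs, record
```

A call such as run_workflow([1.0, 2.0, 3.0], [("max", max)], lambda v: v == v, print) runs one task, rejects a NaN result, and hands the verified output to the publish callable.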
What it is/What it is not
• What it is not
  – Not a general workflow
  – Not a general infrastructure – balancing between performance & exploratory science.
• What it is
  – A highly customized climate-centric API (Zonal Mean Averages, GEV, etc…)
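As an illustration of one operation named above, a zonal mean averages a field over longitude at each latitude. The sketch below uses plain Python lists and a made-up toy field; the function name zonal_mean and the use of None for missing values are assumptions for the example, not the CASCADE API.

```python
def zonal_mean(field):
    """Average a 2-D field over longitude.

    `field` is a list of rows, one row per latitude, each row holding the
    values at every longitude. Missing values are given as None and skipped.
    Returns one mean per latitude (None where a row has no valid values).
    """
    means = []
    for row in field:
        valid = [v for v in row if v is not None]
        means.append(sum(valid) / len(valid) if valid else None)
    return means


# Toy 3-latitude x 4-longitude temperature field (values are made up).
field = [
    [280.0, 282.0, 284.0, 286.0],
    [290.0, None, 294.0, 292.0],
    [300.0, 300.0, 300.0, 300.0],
]
print(zonal_mean(field))  # [283.0, 292.0, 300.0]
```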
• Confluence – Portal to publish and collaborate with team members
• Jira – Bug & Issue tracking portal.
• CDash/Jenkins – Infrastructure to report status of software build & regression tests.
• BitBucket – Main software repository.
• ESGF service – Service for distribution of data generated by CASCADE.
CASCADE Team
• Detection & Attribution Team – Characterization, detection, and attribution of simulated and observed extremes in a variety of different contexts -- Analysis Algorithms
• Model Fidelity – Evaluation and improvement of model fidelity in simulating extremes
• Statistics – Development of statistical frameworks for extremes analysis, uncertainty quantification, and model evaluation
• Formulation of highly parallel software for analysis and uncertainty quantification of extremes
Analysis Infrastructure Tasks
• Development of new climate-centric algorithms and evaluation of current ones. Implement scalable, parallel versions as needed.
• Performance analysis and data management.
• Deployment and Maintenance on HPC environments.
• Creating a standardized environment – Provide the same execution environment on all deployed platforms, and seamlessly bridge different technologies (Python <-> R).
• User Support.
Detection & Attribution
• Single Program, Multiple Data (SPMD) scripts – refactoring current algorithms to work in parallel.
• Distribution/Staging – Functionality to distribute generated data through ESGF and to stage data at NERSC.
• TECA – Active development of the Parallel Toolkit for Extreme Climate Analysis.
• Teleconnections – Ensemble analysis & software solutions to investigate the frequency of teleconnection events.
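The SPMD pattern mentioned above can be sketched as follows: every rank runs the same program but works only on its own block of the data. This is an illustrative sketch, not the team's scripts; block_range and local_max are hypothetical names, and the ranks are emulated in a single process rather than launched through MPI.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open index range [start, stop) owned by `rank`.

    Items are split into contiguous blocks; the first `n_items % n_ranks`
    ranks receive one extra item, so block sizes differ by at most one.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_max(values, n_ranks, rank):
    """The same program on every rank: reduce only this rank's block."""
    start, stop = block_range(len(values), n_ranks, rank)
    block = values[start:stop]
    return max(block) if block else None


# Emulate four ranks in one process; under MPI each rank would run this once.
values = [3, 9, 1, 7, 4, 8, 2, 6, 5, 0]
partials = [local_max(values, 4, r) for r in range(4)]
global_max = max(p for p in partials if p is not None)
print(partials, global_max)  # [9, 8, 6, 5] 9
```

Contiguous blocks are used here because time-ordered climate data is usually read most efficiently in contiguous chunks; a round-robin split would be an equally valid choice for other access patterns.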
Model Fidelity
• ILIAD workflow
  – The parallelization of the generation of initial conditions.
  – Dynamic building, compilation & execution of CESM.
  – Module verification – monitor execution status & successful completion.
  – Module for automation of archiving of output
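Monitoring execution status and successful completion can be sketched with the Python standard library. This is a generic illustration, not the ILIAD module: run_and_verify is a hypothetical name, the completion marker string is an assumption, and a child Python process stands in for a model run.

```python
import subprocess
import sys


def run_and_verify(command, success_marker, timeout=60):
    """Run `command`, then check its exit status and look for a completion marker.

    Returns (ok, reason). `success_marker` is a hypothetical string that the
    job is assumed to print when it finishes cleanly.
    """
    try:
        done = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if done.returncode != 0:
        return False, f"exit status {done.returncode}"
    if success_marker not in done.stdout:
        return False, "completion marker not found"
    return True, "completed"


# Stand-in for a model run: a child Python process that prints a marker.
fake_job = [sys.executable, "-c", "print('RUN COMPLETE')"]
print(run_and_verify(fake_job, "RUN COMPLETE"))
```

Checking both the exit status and a marker in the output guards against jobs that exit with status 0 without actually finishing their work.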