
DET: Testing and Evaluation Plan

Feb 24, 2016


Page 1: DET: Testing and Evaluation Plan

DET: Testing and Evaluation Plan

Barbara Brown (1), Ed Tollerud (2), and Tara Jensen (1)

(1) NCAR/RAL, Boulder, CO and DTC
(2) NOAA/GSD, Boulder, CO and DTC

Wally Clark

Page 2: DET: Testing and Evaluation Plan

DTC and DET Testing and Evaluation

- T&E is one of the most important activities undertaken by the DTC
- DTC testing has involved WRF core comparisons, boundary layer schemes, and other aspects of NWP
- DTC has created “Reference Configurations” (RCs) that are to be re-tested in conjunction with model changes
- DET infrastructure is being developed to allow testing, evaluation, and intercomparison of ensemble systems and system components

Page 3: DET: Testing and Evaluation Plan

Major categories of testing

- Forecasting system comparisons
  - Compare forecasts based on one configuration with forecasts based on a different model configuration
  - Examples
    - Two types of model initialization
    - Two or more methods of statistical post-processing
- Individual reference configuration
  - Model “setup” is evaluated
  - Setup is re-evaluated when model changes are implemented
  - Reference configurations may be defined by
    - Operational centers
    - Users
  - RCs may also be community-contributed
- Forecasts contributed by a modeling group
  - Ex: Forecasts evaluated in HWT and HMT projects

Page 4: DET: Testing and Evaluation Plan

DTC Testing and Evaluation Principles

- A formal test plan is developed, defining all of the important aspects of the testing and evaluation
  - Developer may have a role in helping to create the test plan
- Execution of test is independent of the developer
- Focus of test depends on the questions that are of interest
  - Module being used
  - Variables of interest
- Many cases evaluated for statistical significance
  - Not just a few case studies
  - Multiple seasons, times of day, etc.
- Meaningful stratifications
  - Location/region
  - Season
  - Other user-based criteria
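One way to make "many cases evaluated for statistical significance" and "meaningful stratifications" concrete is a paired bootstrap over matched cases, computed separately for each stratum. The sketch below is illustrative only: the function names, the use of a per-case error score, and the percentile bootstrap are assumptions and are not methods named in the presentation.

```python
import random
from statistics import mean


def paired_bootstrap_ci(errors_a, errors_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference (A minus B).

    errors_a and errors_b hold one error score per case for two
    configurations, matched case by case (hypothetical inputs).
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need equal-length, non-empty matched samples")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the paired differences with replacement and collect the means.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


def stratified_comparison(cases, **kwargs):
    """Run the paired bootstrap separately for each stratum.

    cases is an iterable of (stratum, error_a, error_b) tuples, where the
    stratum might be a region or a season.
    """
    by_stratum = {}
    for stratum, err_a, err_b in cases:
        a_list, b_list = by_stratum.setdefault(stratum, ([], []))
        a_list.append(err_a)
        b_list.append(err_b)
    return {s: paired_bootstrap_ci(a, b, **kwargs) for s, (a, b) in by_stratum.items()}
```

An interval that excludes zero within a stratum suggests a systematic difference between the two configurations for that subset. A stratum with only a handful of cases will usually give an interval too wide to support a conclusion, which is the motivation for evaluating many cases.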

Page 5: DET: Testing and Evaluation Plan

Components of a test plan (example)

- Goals
- Experiment design
- Codes
  - Specification of the codes that will be run as part of the test
- Model output
  - What kinds of output will be produced?
- Forecast periods
- Post-processing
- Verification
  - Statistical methods and measures
- Graphics generation and display
- Data archival and dissemination of results
- Computer resources
- Deliverables

Example from QNSE evaluation (surface T and wind)
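The components listed on this slide can be treated as a checklist. As a minimal sketch, assuming a plan is recorded as a simple Python structure, the class and function below flag components that have not been filled in yet. `EvaluationPlan`, its field names, and `missing_components` are hypothetical names chosen for illustration, and the sample values are placeholders rather than content from any actual DTC test plan.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class EvaluationPlan:
    """Hypothetical container mirroring the example test-plan components."""
    goals: List[str] = field(default_factory=list)
    experiment_design: str = ""
    codes: List[str] = field(default_factory=list)
    model_output: List[str] = field(default_factory=list)
    forecast_periods: List[str] = field(default_factory=list)
    post_processing: List[str] = field(default_factory=list)
    verification_measures: List[str] = field(default_factory=list)
    graphics: str = ""
    archival_and_dissemination: str = ""
    computer_resources: str = ""
    deliverables: List[str] = field(default_factory=list)


def missing_components(plan):
    """Return the names of components that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


# Placeholder draft: only two components have been specified so far.
draft = EvaluationPlan(
    goals=["Compare two post-processing methods"],
    verification_measures=["reliability"],
)
print(missing_components(draft))
```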

Page 6: DET: Testing and Evaluation Plan

Questions to address when developing a test plan

- Which aspect(s) (or modules) of the ensemble system will be evaluated?
- What performance aspects are we trying to compare or evaluate?
- Who are the “users”?
- What are the variables of interest?

Answers to these questions will lead to determination of the other aspects of the plan

Page 7: DET: Testing and Evaluation Plan

Considerations for ensemble T&E

- Number of cases will likely need to be increased (over non-ensemble evaluations)
  - Many probabilistic and ensemble verification scores (e.g., reliability) require relatively large subsamples
  - Subsamples must be large enough to assess statistical significance
  - But sampling must be focused enough for representativeness
- Verification approaches and metrics are somewhat unique
- Computer resources may be a limitation
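To illustrate why reliability needs relatively large subsamples, the sketch below bins forecast probabilities, compares the mean forecast probability in each bin with the observed event frequency, and flags bins with too few cases. The function name, the equal-width binning, and the default `min_count=30` are assumptions made for illustration; the presentation does not specify a threshold.

```python
def reliability_table(probs, outcomes, n_bins=5, min_count=30):
    """Build a simple reliability table from probability forecasts.

    probs: forecast probabilities in [0, 1], for example the fraction of
           ensemble members exceeding a threshold.
    outcomes: matching observations, truthy where the event occurred.
    min_count: illustrative minimum number of cases per bin (an assumption).
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    counts = [0] * n_bins
    prob_sums = [0.0] * n_bins
    event_sums = [0] * n_bins
    for p, o in zip(probs, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[b] += 1
        prob_sums[b] += p
        event_sums[b] += 1 if o else 0
    rows = []
    for b in range(n_bins):
        n = counts[b]
        rows.append({
            "bin": (b / n_bins, (b + 1) / n_bins),
            "count": n,
            "mean_forecast": prob_sums[b] / n if n else None,
            "observed_freq": event_sums[b] / n if n else None,
            "enough_samples": n >= min_count,
        })
    return rows
```

Splitting the sample further by region or season divides each bin's count again, so a reliability estimate per stratum can require many more cases than a single overall score.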

Page 8: DET: Testing and Evaluation Plan

Other considerations

- Real-time vs. post-analysis
  - DTC intensive tests generally done in post-analysis
  - Real-time demonstrations also have many benefits (e.g., HMT, HWT)
- Subjective evaluations: should these be considered for DET T&E?
- How much rigorous end-to-end testing is required vs. evaluation of individual components?

Example for HMT evaluation – winter 2010