Final Report for the WRAP Regional Modeling Center for the Project Period March 1, 2004 – February 28, 2005

Appendix F: Appendix to Section 11, “Task 10: Continued Improvement to Model Evaluation Software”

The attached appendix is referred to in Section 11, which covers Task 10. This appendix is a draft of the model performance evaluation (MPE) documentation prepared by RMC staff at CE-CERT, University of California, Riverside and titled “User’s Guide: Air Quality Model Evaluation Software, Version 2.0.1.”

Note that the main body of this report is contained in a separate file located at http://pah.cert.ucr.edu/aqm/308/reports/final/2004_RMC_final_report_main_body.pdf
Air Quality Modeling Group CE-CERT, University of California Riverside
1084 Columbia Ave. Riverside, CA 92507
Voice: 909-781-5791 | Fax: 909-781-5790
May 24, 2005
UCR Model Evaluation Software 5/24/2005
Copyright (c) 2003 - 2004
Air Quality Modeling Group College of Engineering Center for Environmental Research and Technology
University of California Riverside
This program is free software; permission to use, copy, modify, and distribute this software and its documentation for any purpose without fee is hereby granted, provided that both the above copyright notice and this permission notice remain untouched in all copies and in supporting documentation. This software is provided "as is" without express or implied warranty of any kind.
TABLE OF CONTENTS

1. Introduction
2. Ambient Monitoring Data
3. Model Performance Metrics
4. Software Design Approach
5. Preparation of Model Performance Evaluation
   5.1. Steps to Install
   5.2. Steps to Run
      5.2.1. Preparation of Data Input Files
      5.2.2. Preparation of Monitoring Station Location Information File
      5.2.3. Preparation of Species Mapping File
      5.2.4. Preparation of Model Output File or Model Input File
      5.2.5. Define the Model Evaluation Configuration in the Driven Script File
Appendix A: Sample Driven Script
Appendix B: Sample Ambient Data Input Files
Appendix C: Sample Output Files
Acknowledgement
1. Introduction

Air quality models (AQMs) are widely used for developing emissions reduction strategies to attain National Ambient Air Quality Standards (NAAQS) for ambient air pollutants. AQMs are typically applied to an historical air pollution episode using event-specific emissions and meteorology input data, and the model predictions are then compared with ambient monitoring data to validate the usefulness of the model for other applications. For regulatory applications, the validated model is typically used to predict the benefits of emissions controls for reducing pollutants for a hypothetical future air pollution episode, with the assumption that future meteorological conditions will be similar to those of the historical episode.

The comparison of model predictions to ambient concentrations for the historical episode has been variously described as model “verification”, “validation” or “evaluation”. Oreskes et al. (1994) discuss the significance of each of these terms and argue that, for complex environmental models, the most accurate descriptor is “model evaluation”, where the evaluation is designed to test the usefulness of the model for a particular application. Consistent with Oreskes et al., we use the term model evaluation to describe the comparison of model predictions to ambient data.

Comparison of AQM predictions to ambient data is not the only criterion that should be used in evaluating air quality models. For example, model formulation, completeness, code design, and the accuracy and robustness of numerical algorithms are all important factors in selecting a particular AQM. However, assuming these choices are sound, comparison of model results to ambient data is perhaps the most important step in evaluating an AQM.
While considerable effort has been made to develop standardized AQMs for general use (e.g., CAMx and CMAQ are two widely used community models), there has not been a similar effort to standardize and make accessible ambient databases and software for performing model evaluations. Nonetheless, there is a critical need for these because AQM evaluation is a complex, resource-intensive task. Moreover, many of these tasks, such as preprocessing and QA of ambient data, need only be performed once and can then be shared among many different users. Issues that must be addressed in performing a model evaluation include the following:
• Access to raw ambient monitoring data is sometimes highly limited, e.g., the AIRS gas phase data is not readily accessible to most modelers.
• Extensive data processing, formatting and QA is required before the ambient data can be used.
• In many cases the ambient trace species data are inconsistent with the representation of the trace species used in AQMs, and specialized knowledge is required to determine how best to compare ambient data to model species.
• Software or other data analysis tools must be used to map monitoring site locations to model grid cells and to handle cases where monitoring sites are located near boundaries of grid cells.
• Software must be developed to match the monitoring data temporal averaging period to the model results and to correctly treat time zone information, where both the time zone and sampling period can vary among monitoring sites for some ambient networks.
• Policy guidance from EPA requires that a variety of model performance statistics be computed for regulatory applications of AQMs, and this requires specialized knowledge.
• A variety of error and bias metrics are used in the modeling community, so modelers must compute a large set of metrics to facilitate comparisons of model performance results for different applications and among different research groups.
Despite the importance and the complexity of model evaluation, there are no standardized data sets or software packages that are widely accessible. In early 2001, with funding from the Western Governors’ Association through its Western Regional Air Partnership (WRAP), we began an effort to perform long-term modeling of gas phase and particulate chemistry and transport in which we routinely operate the CMAQ model for annual simulations for a variety of model sensitivity cases. Because the difficulty and cost of performing a model performance evaluation are compounded by long-term modeling scenarios, we decided to invest considerable effort in developing software packages to automate the model performance evaluation. While this initially required greater effort, the MPE software makes it possible to routinely perform sophisticated model evaluation studies with minimal effort. As we continue to explore methods of presenting model evaluation results, we also continue to modify the MPE software. By releasing an open source version of the software, we hope that the air quality modeling community can adapt and add new features to the MPE software so that all modelers can benefit from others’ experience and resources.

Finally, we note that comparisons of AQMs to ambient data can be classified as objective evaluations, in which modeled species concentrations are compared to ambient observations, and diagnostic evaluations, in which combinations of trace species and reaction kinetics data are used to probe or diagnose the photochemical regimes and the chemical transformations of precursors to secondary air pollutants. In this document we focus specifically on tools for comparing model results to ambient data.

2. Ambient Monitoring Data

We have compiled the ground-level model evaluation database for 2002 using several routine and research-grade ambient monitoring databases, including the following for fine particulate matter:
• Interagency Monitoring of Protected Visual Environments (IMPROVE)
• Clean Air Status and Trends Network (CASTNET)
• EPA Speciation Trends Network (STN)
• National Acid Deposition Network (NADP)
• Southeastern Aerosol Research and Characterization (SEARCH)
In addition, we use EPA’s Aerometric Information Retrieval System (AIRS/AQS) database for archived routine gas-phase concentration measurements of ozone, NO, NO2 and CO. Data from these ambient networks are briefly described in the following sections. Figures 2.1~2.2 display the locations of the monitors for the various monitoring networks operating during 2002. Typically, these networks provide ozone, PM and visibility measurements, and the types of data available from these specialized PM monitoring programs are summarized in Table 2-1. Because of the different lumping schemes in the model chemistry and the concentration units in which models report output, some measured species cannot be compared directly to model species; certain mapping schemes are thus applied for model-to-observation species comparisons (as summarized in Table 2-2).

Table 2-1. Summary of ambient databases used in the evaluation
Monitoring Network | Chemical Species Measured | Sampling Frequency; Duration | Data Availability (sites)
Interagency Monitoring of Protected Visual Environments (IMPROVE) | Speciated PM2.5 and PM10 | 1 in 3 days; 24 hr | ~62
Clean Air Status and Trends Network (CASTNET) | Speciated PM2.5, Ozone | Weekly; Week | ~72
EPA Air Quality System (AQS) | O3, CO, SO2, NO, NOy | Hourly; 1-hr average | ~1536
Speciation Trends Network (STN) | Speciated PM2.5 | Varies; Varies | ~215
National Acid Deposition Network (NADP) | WSO4, WNO3, WNH4 | Weekly | ~100
Southeastern Aerosol Research and Characterization (SEARCH) | | |
IMPROVE

The Interagency Monitoring of Protected Visual Environments monitoring network (http://vista.cira.colostate.edu/improve) reports detailed chemical speciation in its measurements of the major visibility-reducing aerosol species on a one-in-three-day schedule. The PM fine mass species used in the evaluation are as follows:
• Sulfates (SO4), as sulfate ion;
• Nitrates (NO3), as nitrate ion;
• Organic carbon (OC), as organic carbon mass;
• Elemental carbon (EC), as light absorbing carbon or carbon soot;
• Soil (SOIL), as fine soil, the sum of several inorganic elements such as Al, Si, Ca, Fe and Ti.
These species are all measured using a 2.5-micron cut point inlet. The IMPROVE monitors also measure total PM10 and PM2.5 mass. These values are reported as the PM2.5 fine matter (FM) portion of the mass and the coarse matter (CM) portion, computed as PM10 - PM2.5. The mapping of the CMAQ species to their IMPROVE counterparts is summarized in Table 3. Note that CMAQ’s fine particle water species is not included in the mapping of IMPROVE species, because IMPROVE measures only dry particles. In addition, IMPROVE defines SOIL as the fine soil concentration, which is the sum of the concentrations of several inorganic species. Although fine soil is not specifically defined in CMAQ, it is taken as the unspeciated portion of the emitted PM2.5 species. Therefore, the model species A25J+A25I are used as surrogates for IMPROVE’s fine soil concentration.
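The CM and SOIL comparisons above reduce to simple arithmetic; a minimal sketch follows, with illustrative helper names that are not part of the released tool.

```python
# Illustrative helpers, not part of the released MPE tool.
def coarse_matter(pm10, pm25):
    """IMPROVE coarse matter (CM) is reported as PM10 minus PM2.5."""
    return pm10 - pm25

def model_soil_surrogate(a25j, a25i):
    """CMAQ has no explicit fine-soil species; the unspeciated PM2.5
    species (A25J + A25I) serve as the surrogate for IMPROVE SOIL."""
    return a25j + a25i

cm = coarse_matter(21.4, 12.9)          # ug/m3
soil = model_soil_surrogate(0.8, 0.1)   # ug/m3
```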
CASTNET

The Clean Air Status and Trends Network (CASTNET) was developed to monitor dry deposition. It includes measurements of ambient concentration, meteorology and land use, which are then used to calculate dry deposition rates. A majority of the CASTNET sites measure sulfate, nitrate (both gaseous, as HNO3, and aerosol phase) and ammonium in 7-day filter samples. Detailed data collection procedures are described at the EPA CASTNET website: http://www.epa.gov/castnet. In short, atmospheric concentration data are collected at each site with open-faced, 3-stage filter packs. The filter pack contains a Teflon filter for collection of particulate species, a nylon filter for nitric acid and a base-impregnated cellulose (Whatman) filter for sulfur dioxide. Filter packs are exposed for 1-week intervals, and are later extracted and analyzed for certain species.
Because CASTNET reports gaseous species, e.g., SO2 and HNO3, in ug/m3 units, while all models other than REMSAD report gaseous species in ppmV units, conversion factors (assumed under STP conditions) of 2617.6 and 2576.7 are applied to the modeled SO2 and HNO3 species. It has been suggested that total nitrate (NO3 + HNO3), instead of individual aerosol nitrate and gaseous nitric acid, should be used for the observation-model comparison because of possible volatilization loss on the Teflon filter pack used in CASTNET (Ames R. B. and W.C. Malm, Atmospheric Environment, 2001, 905-916). Note that a ratio of 0.9841, the molecular weight ratio of NO3 to HNO3, is applied in CASTNET’s total nitrate calculation.
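As a sketch of the unit handling described above: a ppmV-to-ug/m3 factor is the molecular weight times 1000 divided by the molar volume. Assuming a molar volume near 24.45 L/mol (the exact value the tool uses is only implied by its quoted factors of 2617.6 and 2576.7), the computation looks like this; function names are illustrative.

```python
# Sketch of ppmV <-> ug/m3 handling; the molar volume below is an assumption.
MOLAR_VOLUME = 24.45  # L/mol, assumed STP convention

def ppmv_to_ugm3_factor(mol_weight):
    """Factor converting ppmV to ug/m3: MW * 1000 / molar volume.
    For SO2 (MW ~64.06) this gives ~2620, close to the guide's 2617.6,
    which implies a slightly different molar volume."""
    return mol_weight * 1000.0 / MOLAR_VOLUME

def total_nitrate(no3_ugm3, hno3_ugm3):
    """CASTNET total nitrate: NO3 plus HNO3 scaled by the NO3/HNO3
    molecular weight ratio (~0.9841)."""
    return no3_ugm3 + 0.9841 * hno3_ugm3
```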
AQS
The Air Quality System database is EPA’s repository of “criteria air pollutant” monitoring data: carbon monoxide (CO), nitrogen dioxide (NO2), sulfur dioxide (SO2), ozone (O3), particulate matter (PM10 and PM2.5), and lead (Pb), collected since the 1970s. It replaced the Aerometric Information Retrieval System (AIRS) as EPA’s main repository for ambient air monitoring data, including data from the State and Local Air Monitoring Stations (SLAMS), the National Air Monitoring Stations (NAMS), Photochemical Assessment Monitoring Stations (PAMS), and other sources. Hourly data for ozone and several other gaseous components, including SO2, CO and NO2, can be retrieved from EPA’s web site, http://www.epa.gov/ttn/airs/airsaqs/archived%20data/downloadaqsdata.htm, through data query requests. In this study, only the hourly gaseous species in AQS are used for model evaluations. Unlike CASTNET data, gaseous species in AQS are reported in ppmV units. Therefore, modeled species for REMSAD, which outputs in ug/m3, require the unit conversion factors shown in Table 3.

STN
EPA’s Speciation Trends Network includes about 215 monitoring stations nationwide. These 215 sites may include IMPROVE sites or data from other networks; this, however, needs to be verified. Daily PM2.5 data are measured for 64 species in the STN network. Some archived STN data files were obtained from the website: http://www.epa.gov/ttn/airs/airsaqs/archived%20data/archivedaqsdata.htm.
NADP
The National Atmospheric Deposition Program/National Trends Network (NADP/NTN) is designed to measure wet deposition. The network is a cooperative effort among State Agricultural Experiment Stations, the U.S. Geological Survey, the U.S. Department of Agriculture, and other governmental and private entities. It includes over 200 sites in the continental United States, Alaska, Puerto Rico, and the Virgin Islands. The purpose of the network is to collect data on the chemistry of precipitation for monitoring geographical and temporal long-term trends. The precipitation at each station is collected weekly and analyzed for hydrogen (acidity as pH), sulfate, nitrate, ammonium, chloride, and base cations (such as calcium, magnesium, potassium and sodium). The NADP network includes a quality assurance program, so we expect to use these data without any additional QA. The NADP includes the Mercury Deposition Network (MDN) and the Atmospheric Integrated Research Monitoring Network (AIRMoN), designed to study precipitation chemistry trends with greater temporal resolution. Precipitation samples are collected daily from a network of nine sites and analyzed for the same constituents as the NADP/NTN samples. We are currently investigating the availability of the NADP and AIRMoN data; at present, we have not been able to access these data.
SEARCH
The Southeastern Aerosol Research and Characterization study is a research experiment intended to provide a detailed aerosol climatology for the Southeast. There are currently eight monitoring sites located in four states in the SEARCH network. Most sites provide hourly and daily measurements for both gaseous and aerosol species. Archived data files and collection information for the SEARCH monitoring network can be obtained from the web site: http://www.atmospheric-research.com/studies/SEARCH/index.htm

Ambient samples for SEARCH are collected on a sequential, multi-channel sampler known as the Particle Composition Monitor (PCM), and subsequently analyzed for particle speciation data. SEARCH uses two approaches, FRM Equivalent and Best Estimate, to represent the measurements of particulate matter smaller than 2.5 microns (PM2.5) and its constituents. While FRM Equivalent attempts to replicate the FRM measurement, Best Estimate attempts to represent what is actually in the atmosphere and is therefore used for the model evaluation in this study.

3. Model Performance Metrics

Statistical measures that are frequently used in current PM and visibility model performance evaluation include accuracy, error and bias, as summarized in Table 3-1. The calculations of the error and bias measures are based on the residuals of all pairs of model estimates and observations. Both error and bias measures provide a useful basis for comparison among model simulations across different model episodes. While most model performance evaluations have used the observations to normalize the error and the bias, this approach can lead to misleading conclusions: when normalizing to very low observed concentration values (e.g., clean conditions), model overpredictions are weighted much more strongly than equivalent underpredictions, as suggested by Seigneur et al. (JAWMA, 50, 588-599, 2000). Seigneur et al. (2000) have recommended that peak bias, average fractional bias, average fractional gross error, and regression be included as the key statistics in a model’s operational evaluation to alleviate this problem. As the criteria for model performance evaluation have not been established, we recommend using all statistical measures listed in Table 3-1 in this study.
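A small numerical example of the normalization problem described above: with one very low observed value, a roughly symmetric over/under-prediction pair drives the mean normalized bias strongly positive, while the fractional bias remains bounded between -2 and 2. Function names here are illustrative, not the tool's routines.

```python
# Illustration of the normalization asymmetry discussed above.
def mnb(pred, obs):
    """Mean normalized bias: average of (P - O) / O."""
    return sum((p - o) / o for p, o in zip(pred, obs)) / len(obs)

def mfb(pred, obs):
    """Mean fractional bias: average of 2 (P - O) / (P + O)."""
    return sum(2 * (p - o) / (p + o) for p, o in zip(pred, obs)) / len(obs)

obs  = [0.5, 10.0]
pred = [5.0, 5.5]   # overpredict the clean site, underpredict the other
b_norm = mnb(pred, obs)   # dominated by the low-observation pair
b_frac = mfb(pred, obs)   # stays within [-2, 2] by construction
```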
Table 3-1 Recommended statistical measures for model performance evaluation
Accuracy of unpaired peak (Au):
    (P_upeak - O_peak) / O_peak
    where O_peak = peak observation; P_upeak = unpaired peak prediction within certain surrounding grid cells of the peak observation

Accuracy of paired peak (Ap):
    (P_peak - O_peak) / O_peak
    where P_peak = paired (in both time and space) peak prediction

Coefficient of determination (r^2):
    [ sum_{i=1..N} (P_i - Pbar)(O_i - Obar) ]^2 / [ sum_{i=1..N} (P_i - Pbar)^2 * sum_{i=1..N} (O_i - Obar)^2 ]
    where P_i = prediction at time and location i; O_i = observation at time and location i; Pbar = arithmetic average of P_i, i = 1, 2, ..., N; Obar = arithmetic average of O_i, i = 1, 2, ..., N

Normalized Mean Error (NME):
    sum_{i=1..N} |P_i - O_i| / sum_{i=1..N} O_i
    Reported as %

Root Mean Square Error (RMSE):
    [ (1/N) sum_{i=1..N} (P_i - O_i)^2 ]^(1/2)

Fractional Gross Error (FE):
    (2/N) sum_{i=1..N} |P_i - O_i| / (P_i + O_i)
    Reported as %

Mean Absolute Gross Error (MAGE):
    (1/N) sum_{i=1..N} |P_i - O_i|

Mean Normalized Gross Error (MNGE):
    (1/N) sum_{i=1..N} |P_i - O_i| / O_i
    Reported as %

Mean Bias (MB):
    (1/N) sum_{i=1..N} (P_i - O_i)

Mean Normalized Bias (MNB):
    (1/N) sum_{i=1..N} (P_i - O_i) / O_i
    Reported as %

Mean Fractionalized Bias (Fractional Bias, MFB):
    (2/N) sum_{i=1..N} (P_i - O_i) / (P_i + O_i)
    Reported as %

Normalized Mean Bias (NMB):
    sum_{i=1..N} (P_i - O_i) / sum_{i=1..N} O_i
    Reported as %

Bias Factor (BF):
    (1/N) sum_{i=1..N} (P_i / O_i)
    Bias Factor = 1 + MNB; reported as ratio notation (prediction:observation)
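Most of the Table 3-1 measures can be sketched in a few lines for paired prediction/observation arrays. The function below is an illustration of the definitions, not the tool's own implementation.

```python
# Minimal sketch of several Table 3-1 statistics for paired arrays P, O.
from math import sqrt

def perf_stats(P, O):
    N = len(O)
    resid = [p - o for p, o in zip(P, O)]
    Pbar, Obar = sum(P) / N, sum(O) / N
    cov = sum((p - Pbar) * (o - Obar) for p, o in zip(P, O))
    r2 = cov**2 / (sum((p - Pbar)**2 for p in P)
                   * sum((o - Obar)**2 for o in O))
    return {
        "MB":   sum(resid) / N,
        "MAGE": sum(abs(r) for r in resid) / N,
        "RMSE": sqrt(sum(r * r for r in resid) / N),
        "NMB":  100.0 * sum(resid) / sum(O),                    # %
        "NME":  100.0 * sum(abs(r) for r in resid) / sum(O),    # %
        "MFB":  100.0 * sum(2 * (p - o) / (p + o)
                            for p, o in zip(P, O)) / N,         # %
        "BF":   sum(p / o for p, o in zip(P, O)) / N,
        "r2":   r2,
    }

stats = perf_stats([4.0, 6.0, 9.0], [5.0, 5.0, 10.0])
```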
4. Software Design Approach

Five types of input file:
• Observed Ambient Data File in spreadsheet ASCII format
• Monitor Station Information File in spreadsheet ASCII format
• Air Quality Model Output File(s) in netcdf format
• Air Quality Model Meteorological Input File(s) in netcdf format (optional)
• Ambient/Model Data Species Mapping File(s) in ASCII format

One simple Driven Script, which specifies the following:
• What are the models’ names?
• Where are the model files?
• What is the evaluation period?
• What are the monitoring data network names?
• What types of output plots do you want?

Three types of output files:
• Observed and Model Data Query files in ASCII format, allowing further examination or processing with other database software, e.g., Excel.
• Results of 17 types of Statistical Analysis Matrix based on "One Site All Day", "All Site All Day" and "One Day All Site":
  1. Accuracy of Paired Peak
  2. Coefficient of Determination
  3. Normalized Mean Error
  4. Root Mean Square Error
  5. Fractional Gross Error
  6. Mean Absolute Gross Error
  7. Mean Normalized Gross Error
  8. Mean Bias
  9. Mean Normalized Bias
  10. Fractional Bias
  11. Normalized Mean Bias
  12. Observed Mean
  13. Predicted Mean
  14. Standard Deviation for Observation
  15. Standard Deviation for Prediction
  16. Correlation Variance
  17. Bias Factor
• 4 types of plots in PNG format:
  1. Time Series Plot based on "One Site All Day"
  2. Scatter Plot based on "One Site All Day"
  3. Scatter Plot based on "All Site All Day"
  4. Scatter Plot based on "One Day All Site"

Notes: All plots also show two statistical results chosen from the above 17 types, and all scatter plots show regression analysis results.
5. Preparation of Model Performance Evaluation
5.1. Steps to Install
o Unpack the package with `tar -xzvf Model_Eval_Tool.v2.tar.gz` in the directory where you want to install.
o Use the default pgf90 compiler or define another compiler in “Model_Eval_Tool.v2/Makefile”.
o Go into Model_Eval_Tool.v2 and type `make`; the executable will be in bin/.
o Done, provided gnuplot is already installed.
5.2. Steps to Run
5.2.1. Preparation of Observed Data Input File:
This program supports four types of ambient datasets differentiated by data recording method: “Daily Average”, “Hourly”, “Weekly Total”, and “Weekly Average”.
Table 5-1. Format of the “Daily Average” observation data input files.
Column | Description | Format | Notes
1 | Site Identification | One Text String | String consists of characters and/or numbers
2 | Year | Integer | 4 digits (e.g., 1996)
3 | Julian Date | Integer | <= 3 digits (e.g., 8, 45, 183)
4 | Data Value | Float | Data sample value for 1 species

Table 5-2. Format of the “Hourly” observation data input files.
Column | Description | Format | Notes
1 | Site Identification | One Text String | String consists of characters and/or numbers
2 | Year | Integer | 4 digits (e.g., 1996)
3 | Julian Date | Integer | <= 3 digits (e.g., 8, 45, 183)
4 | Hour | Integer | <= 2 digits, 0 <= Hour <= 23
5 | Data Value | Float | Data sample value for 1 species
Table 5-3. Format of the “Weekly Total” or “Weekly Average” observation data input files.
Column | Description | Format | Notes
1 | Site Identification | One Text String | String consists of characters and/or numbers
2 | Year | Integer | 4 digits (e.g., 1996)
3 | Start Julian Date | Integer | <= 3 digits (e.g., 1, 39, 183)
4 | End Julian Date | Integer | <= 3 digits (e.g., 8, 45, 190)
5 | Start Hour | Integer | <= 2 digits, 0 <= Hour <= 23
6 | End Hour | Integer | <= 2 digits, 0 <= Hour <= 23
7 | Data Value | Float | Data sample value for 1 species
Notes:
a) Each observation data input file always has a header line in row 1 before the data. The header consists of one string for each column:
   i) “Daily Average” data:
      Site Year Jdate $SpeciesName
   ii) “Hourly” data:
      Site Year Jdate Hour $SpeciesName
   iii) “Weekly Total” or “Weekly Average” data:
      Site Year StartJdate EndJdate StartHour EndHour $SpeciesName
b) If there is more than one species in the observation data files, add more data value columns correspondingly.
c) Units should already be converted to ppmV for gas phase species, microgram/m3 for aerosol species, 1/Mm for extinction coefficient species, and kg/ha for wet and dry deposition species.
d) Columns are separated with spaces or tabs.
e) For missing or illegal ambient data, fill in the data field as -999.0.
f) During model evaluation, model data will be processed only for the grid cell(s) in which the monitoring station(s) are located, and only for the corresponding observed sampling periods.
g) In the plotting results, a model data point will be shown only when it has corresponding “available” observed data at the same location with the same sampling period; “available” means the observed data value is greater than 0 (i.e., not the -999.0 missing flag).
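A hypothetical reader for the "Daily Average" layout of Table 5-1, following notes a) through e): one header row, whitespace-separated columns, and -999.0 flagging missing values. This is a sketch, not the tool's own parser.

```python
# Hypothetical reader for "Daily Average" observation files (Table 5-1 layout).
def read_daily_avg(path):
    records = []
    with open(path) as fh:
        header = fh.readline().split()
        species = header[3:]                     # species columns start at 4
        for line in fh:
            cols = line.split()
            if not cols:
                continue
            site, year, jdate = cols[0], int(cols[1]), int(cols[2])
            values = {sp: float(v) for sp, v in zip(species, cols[3:])}
            # drop missing/illegal samples flagged as -999.0
            values = {sp: v for sp, v in values.items() if v != -999.0}
            records.append((site, year, jdate, values))
    return records
```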
5.2.2 Preparation of Monitoring Station Location Information File:
Table 5-4. Format of the Monitor Station Information File.
Column | Description | Format | Notes
1 | Site Identification | One Text String | String consists of characters and/or numbers; should match those used in the Observed Data file(s)
2 | Longitude | Float | Should be negative
3 | Latitude | Float | Should be positive
4 | Time Zone | Integer | 5 – Eastern Time; 6 – Central Time; 7 – Mountain Time; 8 – Pacific Time
5 | Daylight Saving Flag | Integer | 1 – Daylight Saving Enabled; 0 – Daylight Saving is Special (e.g., in parts of Arizona and Indiana)

Notes:
a) Each Monitoring Station Location Information File always has a header line in row 1 before the data. The header consists of one string for each column:
   Site Longitude Latitude TimeZone DayLightSaving
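The time-zone codes above can be read as hours behind GMT in standard time. The sketch below shows the implied local-to-GMT conversion; the tool's exact daylight-saving logic is not documented here, so the DST handling (subtracting one hour from the offset when DST is in effect) is an assumption.

```python
# Sketch of local-to-GMT conversion using the Table 5-4 time-zone codes
# (5 = Eastern ... 8 = Pacific, i.e. hours behind GMT in standard time).
# DST handling is an assumption, not documented tool behavior.
def local_to_gmt_hour(local_hour, zone, dst_in_effect=False):
    offset = zone - (1 if dst_in_effect else 0)
    return (local_hour + offset) % 24

h1 = local_to_gmt_hour(14, 5)         # 2 pm EST -> 19:00 GMT
h2 = local_to_gmt_hour(14, 8, True)   # 2 pm PDT -> 21:00 GMT
```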
5.2.3 Preparation of Species Mapping File
Figure 5.1: A Complete Sample Species Mapping File

Illustration:
a) Reserved keywords: "ambient:", "model:" (case sensitive)
b) Operators allowed: "+", "-", "*", "/", "=" only
c) Species naming convention: names start with a letter [a-zA-Z]
d) Factor Species: species names shown on the right-hand side of an equation
e) Target Species: species names shown on the left-hand side of an equation
f) Ambient Scope: defined between the keywords "ambient:" and "model:". In Ambient Scope, every Factor Species must either appear on the first line of the Observed Ambient Data File, with a corresponding data column provided, or have been defined as a Target Species by one of the preceding equations in Ambient Scope.
g) Model Scope: defined below the keyword "model:". In Model Scope, every Factor Species must either be a model species in one of the model files, or have been defined as a Target Species by one of the preceding equations in Model Scope. If a species is defined by more than one type of model file at the same time, the first definition occurrence is used.
h) fRH is a lifetime exception: its lifetime spans both Ambient Scope and Model Scope.
i) For every Target Species in Ambient Scope, there must be a Target Species in Model Scope in the same sequence order. Each Target Species in Ambient Scope is paired with the corresponding Target Species in Model Scope in the subsequent plotting and statistical analysis.
j) Target Species in Ambient Scope must be in the same order as those in Model Scope. If you run a two-model comparison, the first model’s Ambient Target Species must also be in the same order as the second model’s Ambient Target Species. (The plotting module pairs up the ambient and model Target Species data, or the ambient, model1 and model2 Target Species data when running a two-model comparison, based on this consistent order.)
k) Defining the unit for a Target Species on plots: the program chooses the unit to put on the plots based on how the Target Species is named in the Species Mapping File(s):
Table 5-5. Naming convention to control the unit used on plots.
Target Species | Units | Notes
Extinction Coefficient | 1/Mm | Attach "BEXT_" to the beginning of the Target Species name, e.g., BEXT_becon = 1000*EXT_Recon
Gas Phase | ppmV | Attach "G_" to the beginning of the Target Species name, e.g., G_O3 = O3
Wet Deposition | kg/ha | Attach "_wdep" to the end of the Target Species name, e.g., SO4_wdep = ASO4J + ASO4I
Dry Deposition | kg/ha | Attach "_ddep" to the end of the Target Species name, e.g., SO4_ddep = ASO4J + ASO4I
Aerosol | microgram/m3 | Do nothing special, e.g., CM = ACORS + ASEAS + ASOIL
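Mapping equations of this form can be evaluated with a small interpreter. The sketch below is hypothetical (it uses Python's restricted `eval`, not the tool's Fortran parser): it resolves each Target Species from Factor Species that are already defined, honoring the rule that a Target Species may be reused by later equations.

```python
# Hypothetical mini-evaluator for species-mapping equations.
def eval_mapping(lines, species):
    """lines: equations like 'CM = ACORS + ASEAS + ASOIL';
    species: dict of Factor Species name -> value."""
    env = dict(species)
    for line in lines:
        target, expr = (s.strip() for s in line.split("=", 1))
        # restrict eval to known species names and arithmetic operators only
        env[target] = eval(expr, {"__builtins__": {}}, env)
    return env

env = eval_mapping(
    ["CM = ACORS + ASEAS + ASOIL", "G_O3 = O3"],
    {"ACORS": 3.0, "ASEAS": 1.5, "ASOIL": 0.5, "O3": 0.065},
)
```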
5.2.4 Preparation of Model Output File or Model Input File: CMAQ output/input in netcdf format can be fed in directly. CAMx model binary output needs to be pre-processed by a CAMx_to_netCDF_converter first. Use a similar method to handle other air quality models.
5.2.5 Define the Model Evaluation Configuration in the Driven Script File
Table 5-6. A Complete Configuration Flag Illustration:

MODEL_NUM
  Options: 1, 2
  Total number of models being evaluated; up to 2 models are supported at the same time. e.g., setenv MODEL_NUM 2

FIRST_MODEL
  Options: a string with length <= 10 characters
  First model name that will be shown on plots. e.g., setenv FIRST_MODEL basecase

FIRST_MODEL_GDTYPE
  Options: LAMBERT, UTM, LATLON
  Coordinate system that the first model uses. e.g., setenv FIRST_MODEL_GDTYPE LAMBERT

FIRST_MODEL_FILENUM
  Options: any integer between 1 and 10
  Number of model file types in the evaluation input. e.g., if you want to evaluate the .CONC and .AEROVIS files for CMAQ, then setenv FIRST_MODEL_FILENUM 2; if you only want to evaluate the .CONC file for CMAQ, then setenv FIRST_MODEL_FILENUM 1

FI_FILE${i}
  Options: a string with length <= 256
  First model’s input file(s), with ${i} <= ${FIRST_MODEL_FILENUM}. e.g., if $FIRST_MODEL_FILENUM = 3, then FI_FILE1, FI_FILE2 and FI_FILE3 need to be defined. For a model file name in the format /home/aqm/CCTM_cb4.CONC.YYYYDDD (YYYYDDD is a Julian date such as 1996215), setenv FI_FILE1 /home/aqm/CCTM_cb4.CONC.JDATE. For a model file name in the format /home/aqm/CCTM_cb4.CONC.YYYYMMDD (YYYYMMDD is a general date such as 19960326), setenv FI_FILE1 /home/aqm/CCTM_cb4.CONC.GDATE

SECOND_MODEL
  Needs to be set only if $MODEL_NUM = 2; refer to FIRST_MODEL

SECOND_MODEL_GDTYPE
  Needs to be set only if $MODEL_NUM = 2; refer to FIRST_MODEL_GDTYPE

SECOND_MODEL_FILENUM
  Needs to be set only if $MODEL_NUM = 2; refer to FIRST_MODEL_FILENUM

SE_FILE${i}
  Needs to be set only if $MODEL_NUM = 2; refer to FI_FILE${i}

MODEL_YEAR
  Options: any integer between 1990 and 2004
  Modeling year. e.g., setenv MODEL_YEAR 1996. The program can be scaled up to support other years very easily.

SDATE
  Options: any integer between 1 and 366
  Model evaluation start Julian date. e.g., setenv SDATE 183

EDATE
  Options: any integer between 1 and 366
  Model evaluation end Julian date. e.g., setenv EDATE 213
OBSERVED_NETWORK
  Options: a string with length <= 256
  Any monitoring network name; the value does not affect the model evaluation result. e.g., setenv OBSERVED_NETWORK IMPROVE

EVAL_FREQUENCY
  Options correspond to “daily average”, “hourly”, “weekly average” and “weekly total”, representing how the monitoring network records its data. e.g., setenv EVAL_FREQUENCY DAILY_AVG

TIMEZONE
  Options: GMT, LOCAL
  Represents whether or not the monitoring network records its data in GMT. e.g., setenv TIMEZONE LOCAL

CON_FILE
  Options: a string with length <= 256
  Observed data text file location. e.g., setenv CON_FILE /home/bwang/improve.dat

STN_FILE
  Options: a string with length <= 256
  Monitor station information text file location. e.g., setenv STN_FILE /home/bwang/improve.stn

FIRST_MODEL_MAPPING
  Options: a string with length <= 256
  First model’s Species Mapping File location. e.g., setenv FIRST_MODEL_MAPPING /home/bwang/improve_species_mapping.txt

SECOND_MODEL_MAPPING
  Needs to be set only if $MODEL_NUM = 2; refer to FIRST_MODEL_MAPPING

PLOTTING_SCALE
  Options: LOG, NOLOG
  Show scatter plots in log or non-log scale. e.g., setenv PLOTTING_SCALE NOLOG

PLOT_TIMESERIES
  Options: YES, NO
  Create Time Series plots or not. e.g., setenv PLOT_TIMESERIES YES

PLOT_ALLDAY_ONESITE
  Options: YES, NO
  Create the “All Day One Site” scatter plot and statistical result or not. e.g., setenv PLOT_ALLDAY_ONESITE NO

PLOT_ALLSITE_ONEDAY
  Options: YES, NO
  Create the “All Site One Day” scatter plot and statistical result or not. e.g., setenv PLOT_ALLSITE_ONEDAY YES

PLOT_ALLSITE_ALLDAY
  Options: YES, NO
  Create the “All Site All Day” scatter plot and statistical result or not. e.g., setenv PLOT_ALLSITE_ALLDAY YES

PLINEAR
  Options: YES, NO
  Show the linear regression line and equation on scatter plots or not. e.g., setenv PLINEAR NO

ACCURACY_PAIRED_PEAK, COEF_DETERMINATION, NORM_MEAN_ERROR, ROOT_MEAN_SQR_ERROR, FRAC_GROSS_ERROR, MEAN_ABS_GROSS_ERROR, MEAN_NORM_GROSS_ERROR, MEAN_BIAS, MEAN_NORM_BIAS, MEAN_FRAC_BIAS, NORM_MEAN_BIAS, OBS_MEAN, MOD_MEAN, SD_OBS, SD_MOD, CORRELATION_VARIANCE, BIAS_FACTOR
  Options: YES, NO (each flag is set individually)
Notes:
• Among the statistical options from “ACCURACY_PAIRED_PEAK” to “BIAS_FACTOR”, only two statistical analysis results with the “YES” option will be shown on plots, while all of the above statistical analysis results will be shown in the Statistical Analysis Matrix ASCII output file.
• If more than two statistical options are set to “YES”, the first two “YES” options will be chosen for the plots.
• If fewer than two statistical options are set to “YES”, the program will choose items set to “NO” to fill the deficiency.