Calibration of numerical model output using nonparametric spatial density functions

Jingwen Zhou*1, Montserrat Fuentes1, and Jerry Davis2
1 North Carolina State University, Department of Statistics, NC, 27606
2 U.S. Environmental Protection Agency
* Corresponding author. Email address: [email protected]

May 24, 2011

Abstract

The evaluation of physically based computer models for air quality applications is crucial to assist in control strategy selection. Selecting the wrong control strategy has costly economic and social consequences. The objective comparison of means and variances of modeled air pollution concentrations with the ones obtained from observed field data is the common approach for assessment of model performance. One drawback of this strategy is that it fails to calibrate properly the tails of the modeled air pollution distribution, and improving the ability of these numerical models to characterize high pollution events is of critical interest for air quality management.

In this work we introduce an innovative framework to assess model performance, based not only on the first two moments of models and field data, but on their entire distribution. Our approach also compares the spatial dependence and variability in both models and data. More specifically, we estimate the spatial quantile functions for both models and data, and we apply a nonlinear monotonic regression approach on the quantile functions, taking into account the spatial dependence, to compare the density functions of numerical models and field data. We use a Bayesian approach for estimation and fitting to characterize uncertainties in data and statistical models.

We apply our methodology to assess the performance of the US Environmental Protection Agency (EPA) Community Multiscale Air Quality model (CMAQ) in characterizing ambient ozone concentrations. Our approach shows a 75% reduction in the root mean squared error (RMSE) compared to the default approach based on the first two moments of models and data.

Key Words: Bayesian spatial quantile regression, CMAQ calibration, non-crossing quantile
6 Application: calibration of eastern US ozone data
To compare spatial surfaces and distributions between the observed data and the CMAQ
output, we choose two data sources in the eastern US. The prior distributions of the CMAQ
quantile parameters $\beta$ and calibration parameters $\alpha$ are determined using restricted least
squares, with large variances. We use the Metropolis-Hastings algorithm to update $\beta$, $\alpha$,
$\sigma^2_{mB}$, $\sigma^2_{ms}$, $\rho_{mB}$, and $\rho_{ms}$ individually. The likelihood is calculated by the likelihood
approximation approach for $Q_Y(\tau | Z, s)$ on a grid of 100 equally spaced $\tau_k \in [0, 1]$. The
I-splines have interior knots at (0.2, 0.8). The weight parameters $\omega_U$ and $\omega_L$ are supposed to
have a dense uniform distribution; for computational efficiency, we fix them at a known
value of 1000.
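The parameter-by-parameter Metropolis-Hastings updating described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the sampler used in the paper: the function `componentwise_mh`, the random-walk proposal, the step sizes, and the toy Gaussian target are all hypothetical stand-ins for the actual posterior of $\beta$, $\alpha$, $\sigma^2$, and $\rho$.

```python
import numpy as np

def componentwise_mh(log_post, theta0, step, n_iter, rng):
    """Random-walk Metropolis-Hastings that updates one coordinate at a time."""
    theta = np.array(theta0, dtype=float)
    current = log_post(theta)
    draws = np.empty((n_iter, theta.size))
    for it in range(n_iter):
        for j in range(theta.size):
            proposal = theta.copy()
            proposal[j] += step[j] * rng.normal()
            candidate = log_post(proposal)
            # Symmetric proposal, so the acceptance ratio is the posterior ratio.
            if np.log(rng.uniform()) < candidate - current:
                theta, current = proposal, candidate
        draws[it] = theta
    return draws

def toy_log_post(theta):
    # Hypothetical target for illustration only: independent N(0, 1) coordinates.
    return -0.5 * np.sum(theta ** 2)

rng = np.random.default_rng(0)
draws = componentwise_mh(toy_log_post, [3.0, -3.0], [1.0, 1.0], 5000, rng)
posterior_mean = draws[1000:].mean(axis=0)  # discard the first 1000 draws as burn-in
```

In practice `toy_log_post` would be replaced by the log of the approximate likelihood times the prior, and the step sizes tuned per parameter.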
The estimated CMAQ quantiles and their calibration to the monitoring data are plotted in
Figure 6. Both spatial-quantile processes are obtained by our Bayesian algorithm.
At $\tau$ = 0.05, 0.5, and 0.95, we calculate the empirical root mean integrated squared error

\[ \mathrm{RMISE} = \Big[ n^{-1} \sum_{i=1}^{n} \big( q^{z}_{\tau}(s_i) - q^{y}_{\tau}(s_i) \big)^2 \Big]^{1/2}. \]

The RMISE is 7.13 at the 50th percentile, 13.17 at the 5th percentile, and 15.46 at the
95th percentile. The results show agreement between the distributions of the CMAQ output
and the monitoring data at the median, but large differences in the tails. The contour
plots also show that the CMAQ surface is smoother than the observed spatial structure,
indicating that the physically based numerical model cannot capture both the extreme
values and the spatial correlations present in the monitoring data.
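As a small illustration of the RMISE summary, the sketch below compares two sets of quantile estimates across sites at a single quantile level. It assumes squared differences inside the sum, as in the usual root mean squared error; the function name `rmise` and the four site values are hypothetical and are not taken from the ozone data.

```python
import numpy as np

def rmise(q_model, q_obs):
    """Root mean squared difference between two quantile surfaces over n sites."""
    q_model = np.asarray(q_model, dtype=float)
    q_obs = np.asarray(q_obs, dtype=float)
    return float(np.sqrt(np.mean((q_model - q_obs) ** 2)))

# Hypothetical quantile estimates at four sites for one quantile level tau.
q_z = [60.0, 62.0, 58.0, 65.0]   # CMAQ quantiles
q_y = [57.0, 66.0, 58.0, 60.0]   # monitoring quantiles
value = rmise(q_z, q_y)          # sqrt((9 + 16 + 0 + 25) / 4) = sqrt(12.5)
```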
Given these differences, it is critical to calibrate the CMAQ data while accounting for its
spatial-quantile structure. Based on the estimated CMAQ-monitoring calibration model, we
apply a nonlinear transformation to the CMAQ data,

\[ G(Z_{t,s}, A(\tau, s)) = \alpha_0 + \sum_{m=1}^{M} I_m(Z_{t,s})\, \alpha_m, \]

where the $\alpha$ are the posterior estimates. We then rescale $G(Z_{t,s}, A(\tau, s))$ to its original
range. Because $G$ is a monotonic function, the quantiles of $G(Z_{t,s}, A(\tau, s))$ are equal to
$G(Q_Z(\tau|s), A(\tau, s)) = Q_Y(\tau|s)$. We calculate $q_M(\tau_k, s)$ (the sample quantiles of the
monitoring data), $q_C(\tau_k, s)$ (the quantiles of the Bayesian calibrated data), and $q_L(\tau_k, s)$ (the
quantiles from the linear regression model) at $\tau_k \in [0.01, 0.97]$ and location $s$. The root
mean squared error

\[ \mathrm{RMSE}(q_M, q \,|\, s) = \Big[ K^{-1} \sum_{k=1}^{K} \big( q_M(\tau_k, s) - q(\tau_k, s) \big)^2 \Big]^{1/2} \]

is calculated for both the linear regression method and our Bayesian approach at each
location $s$. Figure 7 shows maps of the above quantiles at $\tau = 0.95$, together with the
relative difference in root mean squared error between the quantile calibration method and
the linear regression method,

\[ \mathrm{DRMSE} = \frac{\mathrm{RMSE}(q_M, q_C \,|\, s) - \mathrm{RMSE}(q_M, q_L \,|\, s)}{\mathrm{RMSE}(q_M, q_L \,|\, s)}. \]

The differences range from -77% to 66%, with an average of -30%. The RMSE is reduced at
57 of the 68 sites (83.8%) under the Bayesian calibration method. As expected, the
performance of the calibrated CMAQ data is consistent with that of the monitoring data
across quantile levels $\tau$.
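The site-level comparison described above can be sketched in a few lines. The example assumes squared differences inside the RMSE and uses hypothetical quantile values on a four-point grid of quantile levels; the function names `rmse` and `drmse` are illustrative only.

```python
import numpy as np

def rmse(q_ref, q):
    """Root mean squared error between two quantile curves on a common tau grid."""
    q_ref = np.asarray(q_ref, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.mean((q_ref - q) ** 2)))

def drmse(q_monitor, q_calibrated, q_linear):
    """Relative RMSE change of quantile calibration versus linear regression."""
    rmse_cal = rmse(q_monitor, q_calibrated)
    rmse_lin = rmse(q_monitor, q_linear)
    return (rmse_cal - rmse_lin) / rmse_lin

# Hypothetical quantile curves at one site.
q_m = [40.0, 60.0, 80.0, 100.0]   # monitoring sample quantiles
q_c = [41.0, 59.0, 81.0, 99.0]    # Bayesian calibrated quantiles (errors of 1)
q_l = [42.0, 58.0, 82.0, 98.0]    # linear regression quantiles (errors of 2)
d = drmse(q_m, q_c, q_l)          # (1 - 2) / 2 = -0.5
```

A negative value means the quantile calibration has the smaller RMSE at that site; here the hypothetical curves give a 50% reduction.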
7 Discussion
In this paper, we propose a Bayesian spatial quantile calibration model for reconciling
CMAQ model output with monitoring data, with a particular focus on calibrating the
extreme values. Instead of using the default approach based on the first two moments of
the models and data, we calibrate the two data sources through their underlying quantile
processes. We investigated two quantile processes: (1) the estimated spatial quantiles for
CMAQ; and (2) the predicted monitoring quantiles based on the CMAQ calibration.
We conclude that the CMAQ and monitoring data are similar around their median values,
but differ substantially in the upper and lower tails over the eastern US. The estimated
transformation between the CMAQ and observed quantile processes is then applied to the
model output, resulting in a calibrated series whose spatial and quantile structure is
consistent with the monitoring data.
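The calibration step relies on the fact that an increasing transformation maps quantiles to quantiles. The sketch below checks this numerically with order-statistic sample quantiles. The map `g` is a hypothetical increasing function standing in for the fitted calibration function, and the gamma-distributed sample is synthetic, not CMAQ output.

```python
import numpy as np

def order_stat_quantile(x, tau):
    """Sample quantile defined as the ceil(n * tau)-th order statistic."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    k = max(int(np.ceil(tau * x_sorted.size)) - 1, 0)
    return x_sorted[k]

def g(z):
    # Hypothetical increasing map on z > 0 (illustration only).
    return 5.0 + 0.8 * z + 0.002 * z ** 2

rng = np.random.default_rng(1)
z = rng.gamma(shape=9.0, scale=6.0, size=500)   # synthetic positive "model output"
taus = [0.05, 0.5, 0.95]

quantiles_of_g = [order_stat_quantile(g(z), t) for t in taus]   # quantiles of G(Z)
g_of_quantiles = [g(order_stat_quantile(z, t)) for t in taus]   # G applied to quantiles of Z
```

The two lists agree because sorting and an increasing map commute. With interpolated sample quantiles the agreement would only be approximate for a nonlinear map.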
Due to the different spatial scales of the CMAQ output and the observations, we assume
that both the CMAQ and observed quantile processes have a spatial structure with
exponential decay parameters. This assumption is made for computational efficiency.
More complicated spatial processes, e.g., a conditional autoregressive (CAR) model for
gridded CMAQ data and spatial linear coregionalization models for calibrating spatial
quantiles, will be considered in future work.
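A spatial structure with exponential decay can be sketched as a covariance matrix of the form $\sigma^2 \exp(-d/\rho)$. This parameterization, the use of Euclidean distance between coordinates, and the function name `exponential_covariance` are assumptions made for illustration; the paper's exact specification may differ.

```python
import numpy as np

def exponential_covariance(coords, sigma2, rho):
    """Covariance sigma2 * exp(-d / rho) for pairwise Euclidean distances d."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    return sigma2 * np.exp(-dist / rho)

# Hypothetical site coordinates with pairwise distances 5, 5, and 10.
sites = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
cov = exponential_covariance(sites, sigma2=2.0, rho=5.0)
```

Correlation between two sites decays from 1 at distance zero to $e^{-1}$ at distance $\rho$, so a single range parameter per process controls the smoothness of the surface.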
Also, temporal components, known to be an important factor in ozone trends, play less of
a role once both quantile and spatial structure are taken into account (see Figure 8).
Another approach is to treat the smoothing spline as a covariate and then evaluate its
effect on the conditional distributions (see Figure 9 for the individual quantile surfaces
of the CMAQ data and the monitoring data at a specific site). However, quantile
calibration, as a simultaneous transformation of one quantile process into another,
requires a valid quantile process that satisfies the non-crossing and monotonicity
constraints. An efficient way to calibrate this type of spatial-temporal-quantile surface
simultaneously is another avenue for future work.
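The non-crossing requirement can be illustrated with a quantile function built from a monotone basis with non-negative coefficients. In this sketch piecewise-linear ramps are used as a simple stand-in for I-splines, with knots at 0, 0.2, 0.8, and 1 and a grid of 100 quantile levels; the coefficient values and function names are hypothetical.

```python
import numpy as np

def ramp_basis(tau, knots):
    """Ramps rising linearly from 0 to 1 between consecutive knots (not true I-splines)."""
    tau = np.asarray(tau, dtype=float)
    lo, hi = knots[:-1], knots[1:]
    return np.clip((tau[:, None] - lo) / (hi - lo), 0.0, 1.0)

def quantile_function(tau, intercept, coefs, knots):
    """Monotone quantile function: intercept plus non-negative weights on a monotone basis."""
    coefs = np.asarray(coefs, dtype=float)
    if np.any(coefs < 0):
        raise ValueError("coefficients must be non-negative to guarantee monotonicity")
    return intercept + ramp_basis(tau, knots) @ coefs

def is_noncrossing(q_values):
    """True when the quantile values never decrease as tau increases."""
    return bool(np.all(np.diff(q_values, axis=0) >= 0))

knots = np.array([0.0, 0.2, 0.8, 1.0])
tau_grid = np.linspace(0.0, 1.0, 100)
q_valid = quantile_function(tau_grid, 30.0, [15.0, 40.0, 35.0], knots)

q_crossed = q_valid.copy()
q_crossed[50] = q_crossed[40] - 1.0   # deliberately break monotonicity
```

Here `is_noncrossing(q_valid)` holds by construction, while the perturbed curve `q_crossed` fails the check, which is the kind of violation unconstrained quantile fits can produce.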
8 Appendix
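If the likelihood is given by formula (18) and $p(\alpha) \propto 1$, then the posterior distribution of $\alpha$,
$\pi(\alpha|Y)$, is proper. In other words:

\[ 0 < \int \pi(\alpha|Y)\, d\alpha < \infty \tag{30} \]

Proof. Suppose $y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}$, and both $\omega_L$ and $\omega_U$ are finite positive numbers.
Let $\bar{y} = n^{-1} \sum_{i=1}^{n} y_i$. We first consider two extreme situations:

(1) $y_i < \alpha_0$ for all $y_i$, $i = 1, 2, \dots, n$. Hence, we have $y_{(n)} < \alpha_0$ and:

\[
\begin{aligned}
\int \pi(\alpha|Y)\, d\alpha = \int \prod_{i=1}^{n} f_Y(y_i|\alpha)\, \pi(\alpha)\, d\alpha
 &\propto \int_{\{\alpha_0 \ge y_{(n)}\}} \exp\Big\{ -\sum_{i} \omega_L (\alpha_0 - y_i) \Big\}\, d\alpha \\
 &\propto \int_{\{\alpha_0 \ge y_{(n)}\}} \exp\{ -n\omega_L (\alpha_0 - \bar{y}) \}\, d\alpha \\
 &\propto \frac{1}{n\omega_L} \exp\{ -n\omega_L (y_{(n)} - \bar{y}) \} \in (0, \infty)
\end{aligned}
\tag{31}
\]

(2) Another situation is: $y_i > \alpha_0 + \sum_m \alpha_m$ for all $y_i$, $i = 1, 2, \dots, n$. As a result, we have
$y_{(1)} > \alpha_0 + \sum_m \alpha_m$ and:

\[
\begin{aligned}
\int \pi(\alpha|Y)\, d\alpha = \int \prod_{i=1}^{n} f_Y(y_i|\alpha)\, \pi(\alpha)\, d\alpha
 &\propto \int_{\{\alpha_0 + \sum_m \alpha_m \le y_{(1)}\}} \exp\Big\{ -\sum_{i} \omega_U \Big( y_i - \big(\alpha_0 + \sum_m \alpha_m\big) \Big) \Big\}\, d\alpha \\
 &\propto \int_{\{\alpha_0 + \sum_m \alpha_m \le y_{(1)}\}} \exp\Big\{ -n\omega_U \Big( \bar{y} - \big(\alpha_0 + \sum_m \alpha_m\big) \Big) \Big\}\, d\alpha \\
 &\propto \frac{1}{n\omega_U} \exp\{ -n\omega_U (\bar{y} - y_{(1)}) \} \in (0, \infty)
\end{aligned}
\tag{32}
\]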
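In general, suppose $y_{(1)}, \dots, y_{(u)} < \alpha_0 \le y_{(u+1)} \le \dots \le y_{(l)} \le \alpha_0 + \sum_m \alpha_m < y_{(l+1)}, \dots, y_{(n)}$ (see
Figure 10), then we have:

\[
\begin{aligned}
\int \pi(\alpha|Y)\, d\alpha \propto{}& \frac{1}{u\,\omega_L} \exp\Big\{ -\omega_L \Big( u\, y_{(u)} - \sum_{i=1}^{u} y_{(i)} \Big) \Big\} \\
 &\times \frac{1}{(n-l)\,\omega_U} \exp\Big\{ -\omega_U \Big( \sum_{i=l+1}^{n} y_{(i)} - (n-l)\, y_{(l+1)} \Big) \Big\} \\
 &\times \int \prod_{i=u+1}^{l} \Big\{ \frac{1}{\frac{\partial}{\partial \tau} Q_Y(\tau)} \Big|_{\tau = \tau(y_{(i)})} \Big\}\, d\alpha
 \in (0, \infty)
\end{aligned}
\tag{33}
\]

The statement is proved.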
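The closed form in (31) can be checked with a short worked step. The derivation below takes $\bar{y}$ to be the sample mean of the observations (an assumption about the notation in (31)) and integrates over $\alpha_0$ only, with the remaining coefficients held fixed.

```latex
\begin{align*}
\sum_{i=1}^{n} \omega_L (\alpha_0 - y_i)
  &= n\omega_L (\alpha_0 - \bar{y}),
  \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \\
\int_{y_{(n)}}^{\infty} \exp\{ -n\omega_L (\alpha_0 - \bar{y}) \}\, d\alpha_0
  &= \left[ -\frac{1}{n\omega_L} \exp\{ -n\omega_L (\alpha_0 - \bar{y}) \} \right]_{\alpha_0 = y_{(n)}}^{\infty} \\
  &= \frac{1}{n\omega_L} \exp\{ -n\omega_L (y_{(n)} - \bar{y}) \}.
\end{align*}
```

Since $y_{(n)} \ge \bar{y}$ and $\omega_L > 0$, the exponent is non-positive, so the value lies in $(0, 1/(n\omega_L)]$ and is therefore finite and strictly positive. The same argument with $y_{(1)} \le \bar{y}$ and $\omega_U$ gives (32).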
[Figure 1 panels: maps "CMAQ 90th quantile, frequentist approach" and "Monitoring 90th quantile, frequentist approach" (Longitude vs. Latitude, with "*" marking the selected site); "Histogram of CMAQ ozone" and "Histogram of monitoring ozone" (vertical axis: Density); "Density comparison" (CMAQ vs. monitoring data); "Sample quantile" (ozone vs. tau; CMAQ vs. monitoring data).]

Figure 1: Maps of the sample 90th quantile levels of the ozone concentration; the "*" represents a randomly selected (i.e., 59th) monitoring site. We draw the maps for both observed and CMAQ data to identify their differences.
[Figure 2 flow chart. Boxes: "MODEL DATA", "MONITORING DATA", "Quantile Process for CMAQ", "Quantile Process for Observations", and "Estimated Parameters" $A(\tau, s_1), A(\tau, s_2), \dots, A(\tau, s_n)$. Steps: "System Calibration: 1. Model CMAQ Quantile"; "System Calibration: 2. Link with Observed Quantile"; "System Calibration: 3. Calibrating CMAQ to Monitoring data".]

Figure 2: A process chart for spatial quantile calibration for going from CMAQ to the observations. We calibrate the original CMAQ data with the corresponding observations through their underlying spatial-quantile processes.
[Figure 3 diagram. Box "Spatial-quantile process for CMAQ": $Q_Z(\tau|s) = \beta_0(s) + \sum_{m=1}^{M} I_m(\tau)\, \beta_m(s)$, where $I_m$ is a monotonic I-spline; spatially variant coefficients $\beta(s)$ for the CMAQ quantile $Q_Z(\tau|s)$; likelihood approximation by $Q_Z(\tau|s)$. Link: $A(\tau, s)$, a monotonic mapping from the predictive posterior quantile for CMAQ to $Q_Y(\tau|s)$. Box "Spatial-quantile process for monitoring data": $Q_Y(\tau|s) = \alpha_0(s) + \sum_{m=1}^{M} I_m(Q_Z(\tau|s))\, \alpha_m(s)$; spatially variant calibration parameters $\alpha(s)$; likelihood approximation by the predictive CMAQ quantile and the monitoring quantile $Q_Y(\tau|s)$.]

Figure 3: The Bayesian framework for the spatial-quantile calibration approach. The left and middle panels present CMAQ quantile and monitoring quantile estimates at the 59th site. The right panel provides the 90th ozone quantile over the eastern U.S. using our Bayesian spatial quantile calibration method.
[Figure 4 plot: panel title "Density"; vertical axis "Density"; simulated points shown as dots.]

Figure 4: Simulation results for the simple quantile functions in Example 1. Interior knots are placed at 0.15, 0.8 with a weight parameter equal to 100.
[Figure 5 panels: "CQRS", "Real process", "BSQ", and "Simulated data"; horizontal axis: time; vertical axis: y.]

Figure 5: Bayesian nonparametric quantile (BSQ) regression from Example 2. Interior knots are placed at 0.2, 0.8 with weight parameter equal to 2000. We add a sine function to mimic the temporal trend in reality. The classic quantile regression spline (CQRS) has crossed quantile curves, which violate the concept of a valid quantile process.
[Figure 6 panels: contour maps (Longitude vs. Latitude) titled "5th CMAQ quantile, Bayesian approach", "5th monitoring quantile, Bayesian approach", "50th CMAQ quantile, Bayesian approach", "50th monitoring quantile, Bayesian approach", "95th CMAQ quantile, Bayesian approach", and "95th monitoring quantile, Bayesian approach".]

Figure 6: Quantile comparison plots. The 5th, 50th and 95th quantile for the Bayesian estimated CMAQ and calibrated monitoring data.
Figure 7: The 95th quantile for the monitoring data, using both the quantile calibration and linear regression method. We compare the differences between the linear regression and the Bayesian quantile calibration methods in terms of the RMSE.
[Figure 8 panels: "CMAQ temporal quantile" (frequentist approach), "CMAQ temporal quantile" (Bayesian approach), "monitoring temporal quantile" (frequentist approach), and "monitoring temporal quantile" (Bayesian approach); horizontal axis: time; vertical axis: ozone.]

Figure 8: The CMAQ and monitoring temporal quantiles at site 4. Under the non-crossing constraints, ozone quantile curves show little trend for both the CMAQ models and the monitoring data.
[Figure 9 panels: "OBS.Quantile surface" and "CMAQ.Quantile surface"; axes: t, τ, and Q(y).]

Figure 9: Temporal quantile surfaces at the 19th location for both the CMAQ data and observed data.
[Figure 10 plot: horizontal axis y; vertical axis p(y); marked points $\alpha_0$, $\alpha_0 + \sum \alpha_m$, $y_u$, and $y_{l+1}$.]

Figure 10: The likelihood approximation using estimated quantile functions.