Modeling Ground Ozone for the Contiguous United States
By Michael Tuffly, Ph.D.
ERIA Consultants, LLC
GIS in the Rockies 2013, Cable Center, Denver, Colorado
10/9/2013
http://www.eriaconsultants.com
Jan 17, 2015
What Is Ozone?
Chemically
It is a molecule containing three oxygen atoms (triatomic oxygen, O3).
Ozone is a powerful oxidizer (i.e. it readily takes electrons from other substances).
Examples of Oxidation
Rust on metal objects
Fire
“Oxidation is an increase in the oxidation number or a real or apparent loss of one or more electrons.” (Miller 1981).
Miller, G. T. 1981. Chemistry: A Basic Introduction. Second Edition. Wadsworth Publishing Company, Belmont, California, USA.
Ozone’s Location
Ozone located in the lower stratosphere (20 – 50 km in elevation) is beneficial to life on Earth.
In the lower stratosphere ozone molecules form a protective layer that filters out much of the high-energy solar ultraviolet radiation.
$$3\,\mathrm{O_2} \xrightarrow{\text{ultraviolet radiation}} 2\,\mathrm{O_3}$$
Ground Ozone
Ozone at ground level can harm the health of plants and animals.
One way ground ozone is formed is via a reaction of nitrogen oxides (NOx), volatile organic compounds (VOCs), and sunlight.
The primary sources of NOx are internal combustion engines (i.e. cars) and coal-fired power plants.
Many sources of VOCs
Methane, CFC, Benzene, Methylene chloride, etc.
VOCs have a high vapor pressure, which corresponds to a low boiling point
A low boiling point allows VOCs to escape to the atmosphere
Some Effects of Ground Ozone
In animals
Lung tissue damage can result from inhalation of ozone
In plants
Leaf surface damage (oxidation)
Disruption of stomatal cell function
This causes excessive water loss, emulating drought conditions (Smith et al. 2008).
Smith, G. C., J. W. Coulston, and B. M. O'Connell. 2008. Ozone Bioindicators and Forest Health: A Guide to the Evaluation, Analysis and Interpretation of the Ozone Injury Data in the Forest Inventory and Analysis Program. United States Department of Agriculture, Forest Service General Technical Report 34
Other ways ozone can be formed
Lightning (natural) (small contributor)
Shorts in electrical equipment (anthropogenic)
Provides that unique smell (very small contributor)
Ozone is also used as a replacement for chlorine (potentially a large contributor, but the amount is really unknown)
In swimming pools
In sewage treatment plants
In domestic water supply as a disinfectant
Modeling Ozone
Source ozone data are from EPA CASTNET
ftp://ftp.epa.gov/castnet/data/
Data are from a single year, 2010
In the summer months during the “Ozone Activity Envelope” (OAE)
June – August from 1:00 PM – 5:00 PM
Base data for ozone are recorded every hour
Only 73 ground ozone collection sites were used
This is part of a larger study covering a ten-year period. These 73 sites were the only sites consistently available from 2002 to 2011.
Five variables were extracted from these data for the OAE and averaged:
Ozone (ppb)
Wind Speed (m/s)
Relative Humidity (% * 100)
Solar Radiation (watts per m²)
Temperature (degrees C * 10)
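The extraction step above can be sketched in Python. This is a minimal illustration, not the author's workflow: the record layout (site, timestamp, ozone) and the function name are hypothetical and do not reflect the CASTNET file format, and treating "1:00 PM – 5:00 PM" as hours 13 through 17 inclusive is an assumption.

```python
from collections import defaultdict
from datetime import datetime

def oae_site_means(records):
    """Average ozone per site over the Ozone Activity Envelope (OAE).

    `records` is an iterable of (site_id, datetime, ozone_ppb) tuples.
    This layout is hypothetical, not the CASTNET file format.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for site, stamp, ozone in records:
        in_summer = 6 <= stamp.month <= 8     # June through August
        in_window = 13 <= stamp.hour <= 17    # assumed reading of 1:00 PM - 5:00 PM
        if in_summer and in_window and ozone is not None:
            sums[site] += ozone
            counts[site] += 1
    return {site: sums[site] / counts[site] for site in sums}

# Made-up hourly readings for two hypothetical sites.
sample = [
    ("A", datetime(2010, 7, 1, 13), 50.0),
    ("A", datetime(2010, 7, 1, 15), 60.0),
    ("A", datetime(2010, 7, 1, 9), 20.0),    # outside the hour window
    ("A", datetime(2010, 12, 1, 14), 10.0),  # outside the summer months
    ("B", datetime(2010, 8, 31, 17), 40.0),
]
means = oae_site_means(sample)
```

The same filter would be applied to each of the five variables before averaging.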
Modeling Methods
Four different modeling methods were investigated:
Inverse Distance Weighting (IDW)
Ordinary Kriging
Generalized Linear Model (GLM)
Geographically Weighted Regression (GWR)
Results for all four modeling methods were compared with a set of sample data not used in model creation, via the Mean Squared Error of Prediction (MSEP) method.
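As a minimal sketch, the comparison statistic is the average squared difference between the held-out observations and the model's predictions at the same locations. The function name and the sample numbers below are made up for illustration.

```python
def msep(observed, predicted):
    """Mean squared error of prediction over held-out points."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared) / len(squared)

# Hypothetical held-out ozone values (ppb) and model predictions.
score = msep([50.0, 40.0, 45.0], [52.0, 37.0, 45.0])
```

A lower value means the model predicts the unseen points more closely.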
Autocorrelation
First, we need to know whether the data are autocorrelated
If the data are autocorrelated, then we can use:
IDW
Kriging
Results from Moran's I (a test for autocorrelation) (Moran 1950)
Data have a strong positive autocorrelation
Data points that are close together have similar values
Index = 0.421; p-value reported as 0 (below the reported precision)
If data were not autocorrelated
Our best estimate using IDW or Kriging would be the mean for the whole study site.
Moran, P.A.P. (1950). Notes on continuous stochastic phenomena, Biometrika 37, pp17-23.
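A minimal sketch of the Moran's I index follows. The slides do not say which spatial weights were used, so the inverse-distance weights here are an assumption, and the sketch computes only the index, not the p-value.

```python
import math

def morans_i(points, values):
    """Moran's I with inverse-distance weights (the weighting is an assumption)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0       # sum of w_ij * dev_i * dev_j over all pairs i != j
    weight_sum = 0.0  # sum of all w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(points[i], points[j])
            cross += w * dev[i] * dev[j]
            weight_sum += w
    return (n / weight_sum) * (cross / sum(d * d for d in dev))

# Six hypothetical sites on a line.
line = [(float(x), 0.0) for x in range(6)]
clustered = morans_i(line, [1, 1, 1, 5, 5, 5])    # similar values sit together
alternating = morans_i(line, [1, 5, 1, 5, 1, 5])  # neighbors differ
```

Clustered values give a positive index and alternating values a negative one, which matches the reading of the 0.421 result above.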
IDW
Called a deterministic function
Using the same input parameters will get the same results.
Data need to be spatially autocorrelated
Three basic parameters are required
Number of nearest neighbors
Power
Study area boundary
Useful for Continuous data (e.g. rainfall, elevation)
Not useful for: Categorical, Binary, Ordinal
Identifying IDW Parameters
Cross Validation
Remove one data point at one location
Calculate a new value for that point using the neighboring points
Repeat this for all points
Calculate the mean squared error and variance
Mean Squared Error Predicted (MSEP) gives:
The best number of nearest neighbors
The best power
A smaller number of nearest neighbors produces good local estimates but poor global estimates.
A larger number of nearest neighbors produces good global estimates but poor local estimates.
Need to balance between local and global estimates.
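The cross-validation steps above can be sketched as a leave-one-out loop over candidate parameters. This is an illustrative sketch, not the author's R code: the function names are hypothetical and the estimator is a plain inverse-distance-weighted average of the k nearest neighbors.

```python
import math

def idw_estimate(target, points, values, k, power):
    """IDW estimate at `target` from its k nearest neighbors."""
    nearest = sorted(
        (math.dist(target, p), v) for p, v in zip(points, values)
    )[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # target coincides with an observed point
    weights = [1.0 / d ** power for d, _ in nearest]
    weighted = sum(w * v for w, (_, v) in zip(weights, nearest))
    return weighted / sum(weights)

def loo_mse(points, values, k, power):
    """Leave-one-out mean squared error for one (k, power) pair."""
    errors = []
    for i in range(len(points)):
        others = points[:i] + points[i + 1:]       # drop point i
        other_values = values[:i] + values[i + 1:]
        predicted = idw_estimate(points[i], others, other_values, k, power)
        errors.append((values[i] - predicted) ** 2)
    return sum(errors) / len(errors)

def best_parameters(points, values, ks, powers):
    """Return (mse, k, power) with the lowest leave-one-out MSE."""
    return min((loo_mse(points, values, k, p), k, p) for k in ks for p in powers)

# Four hypothetical sites on a line with made-up ozone values.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [10.0, 20.0, 30.0, 40.0]
best = best_parameters(pts, vals, ks=[1, 2], powers=[1])
```

In this toy data set two neighbors beat one, because averaging both sides of an interior point recovers the linear trend.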
IDW
$$x = \frac{\sum_{i=1}^{n} Z_i / D_i^{y}}{\sum_{i=1}^{n} 1 / D_i^{y}}$$
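where x is the estimate at the unobserved location, Z_i is the observed value at neighbor i, D_i is the distance from neighbor i to that location, n is the number of nearest neighbors, and y is the exponent (power), usually 1 or 2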
Distance is calculated using the Pythagorean Theorem
[Figure: points A, B, C, and D, and a right triangle with sides a, b, and c]
$$a^2 + b^2 = c^2$$
For Distance A to x (C): $1.58^2 + 1.58^2 = 2.4964 + 2.4964 = 4.9928$, and $4.9928^{0.5} \approx 2.23$
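To make the weighting concrete, the short sketch below computes the distances with the Pythagorean theorem and turns them into normalized 1/D^y weights. The two-neighbor layout is made up for illustration; only the 1.58 offsets come from the slide.

```python
import math

def idw_weights(target, neighbors, power):
    """Distances to `target` and the normalized IDW weights 1 / D**power."""
    dists = [math.hypot(x - target[0], y - target[1]) for x, y in neighbors]
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    return dists, [w / total for w in raw]

# Hypothetical layout: one neighbor offset by 1.58 in each direction,
# a second neighbor exactly twice as far away.
layout = [(1.58, 1.58), (3.16, 3.16)]
dists, w_power1 = idw_weights((0.0, 0.0), layout, power=1)
_, w_power2 = idw_weights((0.0, 0.0), layout, power=2)
```

With power 1 the nearer point carries two thirds of the weight; with power 2 it carries four fifths, so a higher power makes the estimate more local.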
[Figure: two panels of IDW cross-validation MSE (y-axis, about 35 to 60) against num_neighbors (x-axis, 0 to 70). Panel titles, truncated in the source: "Year = 2010 Power = 1, MSE Resi" and "Year = 2010 Power = 2, MSE Resd". Annotated values: 41.8 at 8 and 43.3 at 8.]
Ordinary Kriging (Krige 1951) (Matheron 1962)
A stochastic (non-deterministic) interpolation process
Estimates at an unobserved location are a weighted average of the values at observed locations
Weights are based upon
The distance separating points
The function fitted to the variogram
A variogram is used to identify key Kriging parameters:
Sill, Range, Nugget, and covariance
Assumes an unknown stationary mean.
A stationary mean is one that stays constant over the study area, so the data behave predictably (e.g. Gaussian).
Considered unbiased
Mean residual is zero
Variance of the error is minimized
BLUE
Best Linear Unbiased Estimator (Isaaks and Srivastava 1989)
Isaaks, E. H., and R. M. Srivastava. 1989. An Introduction to Applied Geostatistics. Oxford University Press.
Krige, D. G. 1951. A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa 52 (6): 119 – 139.
Matheron, G. 1962. Traité de géostatistique appliquée. Editions Technip.
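The weighted-average idea behind ordinary kriging can be sketched as a small linear system: variogram values between observed points, plus one extra row and column that force the weights to sum to one (the unbiasedness condition). This is a teaching sketch under assumptions, not the R routine used in the study: the helper names are hypothetical, and the linear variogram in the example is chosen only to keep the numbers simple.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ordinary_kriging(target, points, values, variogram):
    """Return (estimate, weights) at `target` for a variogram function."""
    n = len(points)
    # Variogram between every pair of observed points, bordered by ones.
    a = [[variogram(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    # Variogram between each observed point and the target.
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # last unknown is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights

# Three hypothetical sites with made-up ozone values and a linear variogram.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ozone = [40.0, 50.0, 60.0]
estimate, weights = ordinary_kriging((0.25, 0.25), sites, ozone, lambda h: h)
at_site, _ = ordinary_kriging((1.0, 0.0), sites, ozone, lambda h: h)
```

With no nugget in the variogram, the estimate at an observed site reproduces the observed value.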
R output from Variogram (Least Squares Estimates)
Model         Nugget    Sill       Range     AICC
Spherical     7.7377    47.48165   1100000   125.5306
Gaussian      13.6845   52.25631   1100000   128.4038
Exponential   9.2776    71.61078   1100000   132.1289
Estimates: Nugget = 15, Sill = 30, Range = 1,100,000
The Spherical and Gaussian AICC values are less than 3 units apart, so there is no meaningful difference between the two models.
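For readers unfamiliar with the three model shapes, the sketch below evaluates them in Python. Two assumptions are labeled loudly: the reported "Sill" is treated as the total plateau (nugget included), and the Gaussian and exponential forms use the plain range parameter without a practical-range factor. The R package that produced the output may use different conventions.

```python
import math

def spherical(h, nugget, sill, rng):
    """Spherical variogram; reaches the sill exactly at the range."""
    if h == 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def gaussian(h, nugget, sill, rng):
    """Gaussian variogram; approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-(h / rng) ** 2))

def exponential(h, nugget, sill, rng):
    """Exponential variogram; approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))

# Fitted spherical parameters from the output above, evaluated at half the range.
half_range = spherical(550000, 7.7377, 47.48165, 1100000)
at_range = spherical(1100000, 7.7377, 47.48165, 1100000)
```

Under these assumptions the spherical model is at about 35.06 halfway to the range and at the sill of 47.48 once the range is reached.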
Graphic R Output
[Figure: "Year = 2010 Krig Raw Data". Variogram of Ozone Values (y-axis, 0 to 70) against Distance Meters (x-axis, 0 to 1,000,000) with fitted Gau, Exp, and Sph curves. Annotations: Nugget 13.6, Sill, Range, and the value 52.7.]
Number of Nearest Neighbors
[Figure: "Kriging Cross Validation, Gaussian Model". var(crossidw$resid) (y-axis, 35 to 41) against No. of Neighbors (x-axis, 5 to 30).]
Generalized Linear Models (GLM)
Similar to linear regression
Different from IDW and Kriging
Needs predictor input variables
Solar radiation and relative humidity proved to be significant predictor variables.
The solar radiation and relative humidity surfaces need to be created via IDW as input to the GLM equation.
The GLM equation is:
Ozone = 45.35 + (SR * 0.0332) + (RH * -0.235), where SR is solar radiation and RH is relative humidity
R² = 0.58
The GLM describes the “Large Scale Variability”
The “Small Scale Variability” is computed by calculating the differences between the observed values and the (GLM) predicted values.
Adding the “Large Scale Variability” to the “Small Scale Variability” can produce a good predictive surface.
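The two-part prediction can be sketched as follows. The coefficients are the ones given on the slide; the function names and the input values are hypothetical, and the inputs are assumed to be in the same units as the averaged CASTNET variables.

```python
def glm_ozone(solar_radiation, relative_humidity):
    """Large-scale component: the GLM equation from the slide."""
    return 45.35 + 0.0332 * solar_radiation - 0.235 * relative_humidity

def residual(observed, solar_radiation, relative_humidity):
    """Small-scale component at an observed site: observed minus GLM."""
    return observed - glm_ozone(solar_radiation, relative_humidity)

def combined(solar_radiation, relative_humidity, interpolated_residual):
    """Final prediction: GLM value plus the interpolated residual."""
    return glm_ozone(solar_radiation, relative_humidity) + interpolated_residual

# Made-up inputs for one site.
large_scale = glm_ozone(600.0, 40.0)
small_scale = residual(60.0, 600.0, 40.0)
prediction = combined(600.0, 40.0, small_scale)
```

At an observed site the two parts add back to the observation, so the accuracy away from the sites depends on how well the residuals interpolate (here, via IDW).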
Geographically Weighted Regression (GWR)
A powerful modeling method that includes:
Linear Regression
Space
In a nutshell
GWR creates a series of local linear equations based upon the spatial parameters of the independent variables:
Kernel Function
Fixed Search Radius
Variable (number of neighbors) (AKA Adaptive)
Bandwidth Method (fixed radius)
Cells located within the search radius will have the same coefficients.
Best if sample points are located in a systematic pattern (e.g. on a grid with fixed distances).
Bandwidth Method (Adaptive or variable search radius)
One that uses the number of nearest neighbors from user input
One that uses a cross validation method which attempts to minimize the collinearity
Best if sample points are randomly located in the study area.
A sample point will be used multiple times to construct multiple linear equations
Each cell may contain different regression coefficients
Each linear equation (fixed radius or adaptive) uses the same global predictor variables as GLM
Solar Radiation and Relative Humidity proved to be the best global independent variables.
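A single local equation can be sketched as a weighted least-squares fit in which nearby sample points count more. The details here are assumptions for illustration rather than the study's settings: a Gaussian kernel, an adaptive bandwidth equal to the distance to the k-th nearest sample point, and hypothetical function names and data.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def gwr_coefficients(target, points, predictors, response, k):
    """Local [intercept, b1, b2, ...] at `target`; k must give a nonzero bandwidth."""
    dists = [math.dist(target, p) for p in points]
    bandwidth = sorted(dists)[k - 1]  # adaptive: distance to k-th nearest point
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d in dists]
    rows = [[1.0] + list(x) for x in predictors]
    size = len(rows[0])
    # Weighted normal equations: (X' W X) beta = X' W y
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(size)] for i in range(size)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, response))
            for i in range(size)]
    return solve(xtwx, xtwy)

# Made-up sites with (solar radiation, relative humidity) predictors.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
predictors = [(500.0, 30.0), (600.0, 40.0), (550.0, 60.0), (700.0, 35.0), (650.0, 50.0)]
ozone = [45.35 + 0.0332 * sr - 0.235 * rh for sr, rh in predictors]
local = gwr_coefficients((0.5, 0.5), sites, predictors, ozone, k=3)
```

Because the toy response follows one global equation exactly, every local fit returns the same coefficients; with real data the coefficients vary from cell to cell.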
Results
Data Issues
1) There should be more data points to create and test the models.
2) Data points should be more evenly distributed over the study area (e.g. there are no points in Oregon, Idaho, etc., and few points in the center of the nation).
3) IDW predictions at the observed points should not differ from the observed values (an MSE of zero). The nonzero MSE is likely due to cell size and rounding errors.
4) The variables temperature and wind speed were tested in the GWR model. Tests with these covariates used both the CV method and the number-of-nearest-neighbors method. Results were very poor and are not shown here.
Test                       Residuals Autocorrelated   MSE     MSE New Points
GLM + IDW                  No                         0.54    196.06
GWR using AICC and 25 nn   No                         21.98   265.09
GWR using CV               No                         38.43   241.2
IDW                        No                         0.6     204.45
Kriging                    No                         6.48    191.86
Take Home Message
Statistical models are an abstraction of reality.
No statistical model is perfect (e.g. every model has errors).
Some models are better than others (Crawley 2007).
The correct model can never be known with complete certainty (Crawley 2007).
The simpler the model the better it is (Crawley 2007).
Models should include the Principle of Parsimony (Occam’s Razor)
Use the fewest number of variables
The correct explanation is the simplest explanation
Make sure that the assumptions of the model are followed.
Are the data IID (independent and identically distributed)?
Are the data spatially autocorrelated?
Are the input variables correct?
Errors in measurement
Using temperature when solar radiation is a better independent variable.
How were the data collected?
Random Sample, Systematic, etc…
Is there bias in the sample data?
Always ask yourself: does this model make sense?
Is the model predicting something where it should not?
Example: a fish population on land.
Crawley, M. J. 2007. The R Book. Imperial College London at Silwood Park, UK.
Final Quote
“Son you're going to drive me to drinking… if you don’t stop driving that hot rod Lincoln.” 1971.