Page 1: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Geostatistic Analysis

Lecture 6-8

Feb25-Mar10, 2008

Page 2: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.
Page 3: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Purposes

• Estimating or interpolating values at unsampled sites within the area covered by existing observations.
– Visiting every location is usually difficult or expensive.
– Assumption: spatially distributed objects are spatially correlated; in other words, things that are close together tend to have similar characteristics (the first law of geography, also called spatial autocorrelation).

Page 4: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Example

• A few sample points are used to fill all the cells of the output surface.

Page 5: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Example: Point elevation to surface

Page 6: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.
Page 7: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Global interpolation

• Global interpolators determine a single function that is mapped across the whole region.
– A change in one input value affects the entire map.
– Global algorithms tend to produce smoother surfaces with less abrupt changes; they are used when there is a hypothesis about the form of the surface, e.g. a trend.

Page 8: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Local interpolation

• Local interpolators apply an algorithm repeatedly to a small portion of the total set of points. On average, values at points closer in space are more likely to be similar than those at points farther apart (spatial autocorrelation).
– A change in an input value only affects the result within the window.

• Two important steps for local interpolation:
– Define the sampling neighborhood.
– Find the points (samples) in the neighborhood.

Page 9: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

If no directional influence

• If there are no directional influences in the data, you want to give equal weight to sample points regardless of their direction from the prediction location. This means that you probably want your neighborhood to be a circle

Page 10: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

If directional influence

• If there is a directional influence in your data (such as might be caused by water draining downhill), then you may want to make the neighborhood an ellipse with the major axis running uphill/downhill.

Page 11: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Sample points

• Once a shape is specified, you can restrict which sample points within the neighborhood are used. You do this by specifying the maximum and minimum numbers of points to use and by dividing the neighborhood into sectors. If the neighborhood is sectored, the maximum and minimum constraints are applied to each sector.
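
The neighborhood step described above is easy to sketch outside ArcGIS. The following Python snippet (illustrative only; the radius, point counts, and random data are assumptions, not values from the slides) selects the samples inside a circular neighborhood, nearest first, subject to maximum and minimum counts:

```python
# Sketch: pick sample points inside a circular search neighborhood,
# keeping at most max_points (nearest first) and requiring min_points.
import numpy as np

rng = np.random.default_rng(7)
pts = rng.uniform(0, 100, size=(200, 2))   # sample locations (synthetic)
s0 = np.array([50.0, 50.0])                # prediction location
radius, max_points, min_points = 15.0, 5, 2   # arbitrary illustrative values

d = np.linalg.norm(pts - s0, axis=1)       # distance to every sample
inside = np.where(d <= radius)[0]          # samples within the circle
chosen = inside[np.argsort(d[inside])][:max_points]   # nearest first
if len(chosen) < min_points:
    raise ValueError("too few samples in the neighborhood")
print("neighborhood samples:", chosen)
```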

Page 12: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

1. Explore data before interpolation

• Before creating a surface, the Explore Data (ED) tools enable you to gain a deeper understanding of the phenomena you are investigating so that you can make better decisions on issues relating to your data.

• Exploration includes examining the distribution of the data, looking for global and local outliers, looking for global trends, examining spatial autocorrelation, and understanding the covariation among multiple datasets.

• The tools include Histogram, Voronoi Map, Normal QQPlot, Trend Analysis, Semivariogram/Covariance Cloud, General QQPlot, and Crosscovariance Cloud.

• They work only on point and polygon layers.

Page 13: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

1.1 Histogram

• Provides a univariate (one-variable) description of your data: it displays the frequency distribution of the dataset of interest and calculates summary statistics.

• Measures of location: mean, median, and quartiles.

• Measures of spread: standard deviation, variance.

• Measures of shape: skewness, kurtosis.
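
These statistics are straightforward to reproduce outside ArcGIS. Below is a minimal numpy/scipy sketch (not from the slides; the lognormal sample is an assumption purely for illustration):

```python
# Summary statistics reported by the Histogram tool, computed with
# numpy/scipy on a synthetic, skewed sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
z = rng.lognormal(mean=3.0, sigma=0.5, size=200)   # made-up sample data

# measures of location
print("mean     :", np.mean(z))
print("median   :", np.median(z))
print("quartiles:", np.percentile(z, [25, 50, 75]))

# measures of spread
print("std dev  :", np.std(z, ddof=1))
print("variance :", np.var(z, ddof=1))

# measures of shape
print("skewness :", stats.skew(z))
print("kurtosis :", stats.kurtosis(z))   # excess kurtosis (normal = 0)
```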

Page 14: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

1.2 QQPlot

The quantile-quantile (q-q) plot is a graphical technique for determining whether two data sets come from populations with a common distribution.

A q-q plot plots the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the fraction (or percent) of points below the given value. That is, the 0.3 quantile (30%) is the value below which 30% of the data fall and above which 70% fall. The 0.3 quantile is also called the 30th percentile, and the 0.4 quantile the 40th percentile. The 25th percentile is called the first quartile, the 50th percentile the second quartile (equal to the median), the 75th percentile the third quartile, and the 100th percentile the fourth quartile.

The q-q plot is used to answer the following questions:
• Do two data sets come from populations with a common distribution?
• Do two data sets have common location and scale?
• Do two data sets have similar distributional shapes?
• Do two data sets have similar tail behavior?
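
As an illustration (not part of the slides), scipy can draw a normal QQ plot with `scipy.stats.probplot`, and a general q-q plot can be built by plotting the sorted values of one dataset against the sorted values of another; the sample data below are made up:

```python
# Normal QQ plot: sample quantiles against standard-normal quantiles.
# If the points fall on a straight line, the data are close to normal.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
z = rng.lognormal(mean=3.0, sigma=0.5, size=200)   # skewed synthetic data

# probplot orders the data and pairs each value with the standard-normal
# quantile at the same cumulative probability
stats.probplot(z, dist="norm", plot=plt)
plt.title("Normal QQ plot (skewed data bends away from the line)")
plt.show()

# A general QQ plot compares the quantiles of two datasets directly;
# with equal sample sizes this is sorted values vs. sorted values.
y = rng.normal(loc=20, scale=5, size=200)
plt.plot(np.sort(z), np.sort(y), "o")
plt.xlabel("quantiles of dataset 1")
plt.ylabel("quantiles of dataset 2")
plt.show()
```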

Page 15: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

• QQPlots are graphs on which quantiles from two distributions are plotted relative to each other.

• A cumulative distribution is produced by ordering the data and graphing the ordered values against their cumulative distribution values.

(Figure: example Normal QQPlot and General QQPlot panels.)

Page 16: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

1.3 Voronoi map

• Voronoi maps are constructed from a series of polygons (Thiessen polygons) formed around the locations of the sample points.

• The value for each polygon can be calculated using any of these methods: simple, mean, mode, cluster, entropy, median, standard deviation, IQR (interquartile range).
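
A rough Python analogue (an assumption for illustration, not the ArcGIS implementation): `scipy.spatial.Voronoi` builds the Thiessen polygons, and the Delaunay triangulation gives each cell's neighbors, from which a "mean" Voronoi-map value can be computed:

```python
# Sketch: Thiessen (Voronoi) polygons around sample points, plus a
# "mean" Voronoi-map value per cell (mean of the cell's own value and
# its neighbors' values). Synthetic data throughout.
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d, Delaunay
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
pts = rng.uniform(0, 100, size=(30, 2))   # sample locations
z = rng.normal(50, 10, size=30)           # measured values

vor = Voronoi(pts)                        # Thiessen polygons
voronoi_plot_2d(vor)
plt.show()

# Voronoi cell neighbors correspond to edges of the Delaunay triangulation
tri = Delaunay(pts)
indptr, indices = tri.vertex_neighbor_vertices
cell_means = []
for i in range(len(pts)):
    neigh = indices[indptr[i]:indptr[i + 1]]
    cell_means.append(np.mean(np.concatenate(([z[i]], z[neigh]))))
print("first five 'mean' cell values:", np.round(cell_means[:5], 2))
```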

Page 17: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

1.4 Trend analysis

• You may be interested in mapping a trend, or you may wish to remove a trend from the dataset before using kriging. The Trend Analysis tool can help identify global trends in the input dataset.

Page 18: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Global trend and anisotropy

• A global trend is an overriding process that affects all measurements; it can be represented by a mathematical formula (a polynomial).

• Anisotropy is a random process that shows higher autocorrelation in one direction than in another (directional autocorrelation, or directional influence). The reason for the directional influence may not be known, but it can be statistically quantified.

Page 19: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

1.5 Semivariogram/covariance cloud

• The semivariogram/covariance cloud shows the empirical semivariogram (half the squared difference) and covariance for all pairs of locations within a dataset and plots them as a function of the distance between the two locations.

• The empirical semivariogram for the (i,j)th pair is simply 0.5·(z(si) − z(sj))², and the empirical covariance is the cross-product (z(si) − z̄)(z(sj) − z̄), where z̄ is the average of the data. The semivariogram/covariance cloud can be used to examine the local characteristics of spatial autocorrelation within a dataset and to look for outliers.
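
A minimal sketch of the semivariogram cloud with numpy/scipy (synthetic data; not from the slides):

```python
# Empirical semivariogram cloud: 0.5 * (z(si) - z(sj))**2 for every
# pair of locations, plotted against the pair's separation distance.
import numpy as np
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
pts = rng.uniform(0, 100, size=(50, 2))            # sample locations
z = pts[:, 0] * 0.5 + rng.normal(0, 5, size=50)    # values with a trend

h = pdist(pts)                                     # pairwise distances
# pdist with "sqeuclidean" on the 1-D values gives (zi - zj)**2 per pair
gamma = 0.5 * pdist(z[:, None], metric="sqeuclidean")

plt.plot(h, gamma, ".", alpha=0.4)
plt.xlabel("separation distance h")
plt.ylabel("semivariance 0.5*(z(si)-z(sj))^2")
plt.show()
```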

Page 20: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Semivariogram depicts the spatial autocorrelation

Creating Variography

Page 21: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Understanding a semivariogram: range, sill, and nugget

The distance at which the model first flattens out is known as the range. The value at which the model attains the range is called the sill. The value at which the model intercepts the y-axis is called the nugget.

Page 22: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Fitting the semivariogram

• Circular, spherical, exponential, Gaussian, and linear.
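
As an illustration, here is one way to fit the spherical model (one of the choices above) to a binned empirical semivariogram with `scipy.optimize.curve_fit`; the lag centers and semivariances below are made-up numbers:

```python
# Fit a spherical semivariogram model to binned empirical semivariances.
import numpy as np
from scipy.optimize import curve_fit

def spherical(h, nugget, sill, rng_):
    """Spherical model: rises from the nugget and flattens at the range."""
    h = np.asarray(h, dtype=float)
    inside = nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_) ** 3)
    return np.where(h < rng_, inside, sill)

# Binned empirical semivariogram (lag centers and mean semivariances);
# these numbers are invented for illustration.
lags = np.array([5, 15, 25, 35, 45, 55, 65], dtype=float)
gamma = np.array([2.1, 4.8, 7.0, 8.6, 9.4, 9.7, 9.8])

popt, _ = curve_fit(spherical, lags, gamma, p0=[1.0, 10.0, 50.0])
print("nugget=%.2f  sill=%.2f  range=%.2f" % tuple(popt))
```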

Page 23: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Making a prediction

• The prediction is based on the fitted semivariogram model and the measured values that are nearby (found using a search radius).

• Fixed search radius: requires a distance and a minimum number of points.

• Variable search radius: the number of points needs to be specified. You can also specify a maximum distance (radius) that the search radius cannot exceed.

Page 24: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

To explore a directional influence in the semivariogram cloud, we use the Search Direction tools.

• The direction the pointer is facing determines which pairs of data locations are plotted on the semivariogram.

• Lag size is the size of a lag distance, used to reduce the large number of possible pair combinations. It is the size of the cells in the semivariogram surface.

• The number of cells is called the number of lags, counted as the number of adjacent cells in a straight horizontal or vertical line from the center to the edge of the figure.

Page 25: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

1.6 Crosscovariance cloud

• The crosscovariance cloud shows the empirical crosscovariance for all pairs of locations between two datasets and plots them as a function of the distance between the two locations.

• Let z(si) denote the value at the i-th location in dataset 1, and let y(tj) denote the value at the j-th location in dataset 2. The empirical crosscovariance for the pair is then the cross-product (z(si) − z̄)(y(tj) − ȳ), where z̄ and ȳ are the averages of the two datasets.

Page 26: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Relation between variogram and covariance

Under second-order stationarity, γ(h) = C(0) − C(h): the semivariogram at lag h equals the sill (the variance C(0)) minus the covariance at that lag.

Page 27: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

2. Inverse Distance Weighted (IDW)

• Each sample point has a local influence that diminishes with distance.

• Weights the points closer to the processing cell more heavily than those farther away.

• The operator controls how the weighting is done:
– Power: a higher power gives more weight to closer points.
– Radius type: controls how far away to look.
– Barrier: the search can be limited by polygon or polyline features.

Page 28: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

IDW Characteristics

• Inverse Distance Weighting (IDW) is a quick deterministic interpolator that is exact. There are very few decisions to make regarding model parameters. It can be a good way to take a first look at an interpolated surface. However, there is no assessment of prediction errors, and IDW can produce "bull's-eyes" around data locations. There are no assumptions required of the data.

Page 29: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

IDW: a local interpolator

The predicted value at s0 is a weighted average of the nearby samples, ẑ(s0) = Σ wi·z(si) / Σ wi, with weights wi = 1/di^r, where di is the distance from sample i to s0 and r is the power.

We usually use power functions with r greater than 1; r = 2 is known as inverse distance squared weighted interpolation.
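
A minimal IDW sketch in Python (no search radius or barriers; the sample data are made up):

```python
# Inverse distance weighted prediction at one location, with power r.
import numpy as np

def idw(pts, z, s0, r=2.0):
    """Predict at s0 as a weighted average with weights 1 / d**r."""
    d = np.linalg.norm(pts - s0, axis=1)
    if np.any(d == 0):                 # exact interpolator: return the
        return z[np.argmin(d)]         # measured value at a data point
    w = 1.0 / d ** r
    return np.sum(w * z) / np.sum(w)

pts = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
z = np.array([1.0, 3.0, 5.0, 7.0])
print(idw(pts, z, np.array([2.0, 3.0])))   # r = 2: inverse distance squared
```

The zero-distance branch is what makes IDW exact: at a measured location the prediction equals the measured value.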

Page 30: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Advantages and disadvantages

Page 31: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.
Page 32: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.
Page 33: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

3. Global Polynomial interpolator

• Global Polynomial (GP) is a quick deterministic global interpolator that is smooth (inexact). There are very few decisions to make regarding model parameters. It is best used for surfaces that change slowly and gradually. However, there is no assessment of prediction errors and it may be too smooth. Locations at the edge of the data can have a large effect on the surface. There are no assumptions required of the data.

Page 34: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

GP interpolation fits a polynomial regression to the x,y coordinates:

First order: Z = β0 + β1x + β2y + ε
Second order: Z = β0 + β1x + β2y + β3x² + β4xy + β5y² + ε
Third order: adds the cubic terms x³, x²y, xy², and y³.
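
A first-order (planar) trend surface can be fit by ordinary least squares; this numpy sketch uses synthetic data, and higher orders simply add columns for x², xy, y², and so on:

```python
# First-order global polynomial (trend surface) fit by least squares.
import numpy as np

rng = np.random.default_rng(4)
pts = rng.uniform(0, 100, size=(50, 2))
x, y = pts[:, 0], pts[:, 1]
z = 2.0 + 0.3 * x - 0.1 * y + rng.normal(0, 1, size=50)  # plane + noise

A = np.column_stack([np.ones_like(x), x, y])   # design matrix [1, x, y]
beta, *_ = np.linalg.lstsq(A, z, rcond=None)
print("fitted plane: Z = %.2f + %.2f x + %.2f y" % tuple(beta))

# predict at an arbitrary location, e.g. (50, 50)
print("prediction at (50, 50):", beta @ np.array([1.0, 50.0, 50.0]))
```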

Page 35: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

4. Local Polynomial interpolator

• Global Polynomial interpolation is the only method in Geostatistical Analyst that does not use a search neighborhood. If you add the idea of a search neighborhood to Global Polynomial interpolation, you get Local Polynomial interpolation;

• Local Polynomial (LP) is a moderately quick deterministic interpolator that is smooth (inexact). It is more flexible than the global polynomial method, but there are more parameter decisions. There is no assessment of prediction errors. The method provides prediction surfaces that are comparable to kriging with measurement errors. Local polynomial methods do not allow you to investigate the autocorrelation of the data, making them less flexible and more automatic than kriging. There are no assumptions required of the data.

Page 36: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

• Local polynomial interpolation creates a surface from many different formulas, each of which is optimized for a neighborhood

• The neighborhood shape, maximum and minimum number of points, and sector configuration can be specified. In addition, as with IDW, the sample points in a neighborhood can be weighted by their distance from the prediction location. Thus, local polynomial interpolation produces surfaces that better account for local variation.

Page 37: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.
Page 38: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

5. Radial Basis Functions

• Radial Basis Functions (RBF) are moderately quick deterministic interpolators that are exact. They are much more flexible than IDW, but there are more parameter decisions. There is no assessment of prediction errors. The method provides prediction surfaces that are comparable to the exact form of kriging. Radial Basis Functions do not allow you to investigate the autocorrelation of the data, making it less flexible and more automatic than kriging. Radial Basis Functions make no assumptions about the data.

• An RBF surface is like a rubber membrane that is fitted through each of the measured data points while minimizing the total curvature of the surface. Because the surface must pass through each sampled point, radial basis functions are exact interpolators.

Page 39: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.
Page 40: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Radial basis functions (Spline)

• Radial basis function (RBF) methods are a series of exact interpolation techniques; that is, the surface must go through each measured sample value.

• There are five different basis functions: thin-plate spline, spline with tension, completely regularized spline, multiquadric function, and inverse multiquadric spline.

• RBFs are conceptually similar to fitting a rubber membrane through the measured sample values while minimizing the total curvature of the surface. IDW will never predict values above the maximum measured value or below the minimum measured value; splines can.

http://www.math.ucla.edu/~baker/java/hoefer/Spline.htm
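
For illustration, `scipy.interpolate.Rbf` implements several of these basis functions; the sketch below (made-up data, not from the slides) fits a thin-plate spline and shows that its grid predictions can fall outside the measured value range, unlike IDW:

```python
# Thin-plate spline RBF: an exact interpolator that can over- and
# undershoot the measured min/max.
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(5)
x = rng.uniform(0, 100, 40)
y = rng.uniform(0, 100, 40)
z = np.sin(x / 20.0) * 10 + y * 0.1 + rng.normal(0, 0.2, 40)

tps = Rbf(x, y, z, function="thin_plate")  # passes through all samples

# exactness check: the surface reproduces the data at sample locations
assert np.allclose(tps(x, y), z)

# predictions on a grid may exceed the measured value range
gx, gy = np.meshgrid(np.linspace(0, 100, 25), np.linspace(0, 100, 25))
pred = tps(gx, gy)
print("data range:", z.min(), z.max(), "| grid range:", pred.min(), pred.max())
```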

Page 41: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Radial basis functions

When running Spline in Spatial Analyst:

weight:
- regularized: 0, 0.001, 0.01, 0.1, 0.5; the higher the weight, the smoother the surface
- tension: 0, 1, 5, 10; the higher the weight, the coarser the surface

number of points:
- used in the calculation; the more points, the smoother the surface


Page 42: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

6. Kriging

• Kriging is a moderately quick interpolator that can be exact or smoothed depending on the measurement error model. It is very flexible and allows you to investigate graphs of spatial autocorrelation. Kriging uses statistical models that allow a variety of map outputs, including predictions, prediction standard errors, and probabilities. The flexibility of kriging can require a lot of decision-making. Kriging assumes the data come from a stationary stochastic process (1. a constant mean throughout the region, and 2. the variance of the difference between any two samples is independent of position and depends only on the separation distance), and some methods assume normally distributed data.

Page 43: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Trend and error

• In kriging, a predicted value depends on two factors: a trend and an additional element of variability. This is an intuitive idea with plenty of analogies in the real world. For instance, if you go from the ocean to the top of a mountain, you have an upward trend in elevation. However, there is likely to be variation along the way: you will go both up and down when crossing valleys, streams, knobs, and other features.

• In kriging, the trend part of a prediction is called the trend. The fluctuation part is called spatially autocorrelated random error. "Error" doesn't mean a mistake; it just means a fluctuation from the trend. Z(s) = μ(s) + ε(s)
– Assumption one: the expected value of ε(s) is zero (positive errors and negative errors balance out).
– Assumption two: the autocorrelation of the error is purely spatial; it depends only on distance and not on any other property (such as position) of a location. The autocorrelation between ε(s1) and ε(s1+h) is the same as that between ε(s2) and ε(s2+h).

Page 44: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Ordinary Kriging

• In many cases, however, there is no trend in the data, or, if there is one, it is weak enough that your predictions are just as good when you ignore it. Assuming that there is no trend in the data is mathematically equivalent to assuming that the data have a constant mean value.

• If the mean is a simple constant, μ(s) = μ (i.e., no trend) for all locations s, and if μ is unknown (you have no prior knowledge of the mean value), then this is the model on which Ordinary Kriging is based.
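
To make the model concrete, here is a small Ordinary Kriging sketch in numpy (not the ArcGIS implementation): it builds the kriging system from a spherical semivariogram with assumed nugget/sill/range values, and adds a Lagrange multiplier that forces the weights to sum to 1, which encodes the unknown constant mean:

```python
# Ordinary kriging at one location: solve the kriging system built
# from a spherical semivariogram model (parameters assumed, not fitted).
import numpy as np
from scipy.spatial.distance import cdist

def spherical(h, nugget=1.0, sill=10.0, rng_=50.0):
    h = np.asarray(h, dtype=float)
    inside = nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_) ** 3)
    # semivariance is 0 at zero distance; flattens to the sill at the range
    return np.where(h <= 0, 0.0, np.where(h < rng_, inside, sill))

def ordinary_kriging(pts, z, s0):
    n = len(pts)
    # left-hand side: semivariances between data points + unbiasedness row
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical(cdist(pts, pts))
    A[n, n] = 0.0
    # right-hand side: semivariances between data points and the target
    b = np.ones(n + 1)
    b[:n] = spherical(cdist(pts, s0[None, :]).ravel())
    sol = np.linalg.solve(A, b)
    w, lagrange = sol[:n], sol[n]
    pred = w @ z
    var = w @ b[:n] + lagrange       # kriging (prediction) variance
    return pred, var

pts = np.array([[0, 0], [30, 0], [0, 30], [40, 40]], dtype=float)
z = np.array([12.0, 15.0, 11.0, 18.0])               # made-up samples
pred, var = ordinary_kriging(pts, z, np.array([20.0, 20.0]))
print("prediction:", pred, "| standard error:", np.sqrt(var))
```

The prediction variance produced here is what the standard error maps described later are built from.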

Page 45: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Universal Kriging

• Sometimes, there is a trend where the data values change consistently with the spatial coordinates. Mathematically, this is represented as a linear regression equation on the spatial x- and y-coordinates. Trends that vary (do not have a constant mean), and for which the regression coefficients are unknown, form models for Universal Kriging.

Page 46: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Simple Kriging and Indicator Kriging

• If the mean is a known constant for the entire dataset, then you have the model for Simple Kriging.

• Look at the left side of the equation Z(s) = μ(s) + ε(s). You can do some useful math operations on Z(s). For example, suppose you want to predict the probability that Z(s) is above or below some threshold value, such as 0.12 ppm for ozone concentration. You can transform Z(s) to an indicator variable, which gets the value 0 if Z(s) is below the threshold and 1 if it is above it. Kriging this indicator is called Indicator Kriging.
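
The indicator transform itself is one line; a tiny sketch with made-up ozone values:

```python
# Indicator transform for Indicator Kriging: recode Z(s) against a
# threshold; kriging the 0/1 values predicts the probability of
# exceeding the threshold.
import numpy as np

threshold = 0.12                                 # ppm, from the example above
z = np.array([0.08, 0.15, 0.11, 0.20, 0.05])     # made-up measurements
indicator = (z > threshold).astype(float)        # 1 above, 0 below
print(indicator)                                 # [0. 1. 0. 1. 0.]
```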

Page 47: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

(Figure: example prediction surfaces from Ordinary, Simple, Universal, and Indicator Kriging.)

Page 48: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

7. Cokriging

• Cokriging is a moderately quick interpolator that can be exact or smoothed depending on the measurement error model. Cokriging uses multiple datasets and is very flexible, allowing you to investigate graphs of cross-correlation and autocorrelation. Cokriging uses statistical models that allow a variety of map outputs including predictions, prediction standard errors, probability, etc. The flexibility of cokriging requires the most decision-making. Cokriging assumes the data come from a stationary stochastic process, and some methods assume normally-distributed data.

Page 49: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

8. Definitions of output maps (for kriging methods)

• Prediction maps (interpolated maps) estimate values at locations where measurements have not been taken.

• Standard error maps (the square root of the variance of a prediction) show the distribution of prediction error for a surface. Error tends to be highest in places where there is little or no sample data.

• Quantile maps show a specified quantile of the prediction distribution: the value below which the true value falls with the specified probability.

• Probability maps show the odds that the true value at a location is greater than a threshold value. The probability of exceeding the threshold is determined from the predicted values, the error distribution, and the specified threshold value.

Page 50: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Kriging family and output maps

• Transformation and trend removal can help justify assumptions of normality and stationarity

Page 51: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Creating a prediction map

• while applying a transformation

• while using detrending

• while considering measurement error for the nugget

Page 52: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

9. Performing diagnostics

• Before producing a final surface, you should know how well the model predicts values at unknown locations. Cross-validation and validation help you make an informed decision as to which model provides the best predictions.

• Cross-validation uses all of the data to estimate the autocorrelation model. Then it removes each data location, one at a time, and predicts the associated data value; the predicted value is compared with the measured value.

• Validation first removes part of the data (the test dataset) using the Create Subset tool, and then uses the rest of the data (the training dataset) to develop the trend and autocorrelation models to be used for prediction.

• In both methods, the graphs and summary statistics used for diagnostics are the same: predicted values, prediction errors (predicted − measured), standardized errors (error / estimated kriging standard error), and the normal QQPlot (standardized errors against the standard normal distribution).
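
Cross-validation is straightforward to sketch. The example below leave-one-out validates a simple IDW interpolator (used here instead of kriging only to keep the code short) on synthetic data:

```python
# Leave-one-out cross-validation: remove each sample in turn, predict
# it from the rest, and compare predicted with measured.
import numpy as np

def idw(pts, z, s0, r=2.0):
    d = np.linalg.norm(pts - s0, axis=1)
    w = 1.0 / d ** r
    return np.sum(w * z) / np.sum(w)

rng = np.random.default_rng(6)
pts = rng.uniform(0, 100, size=(40, 2))
z = pts[:, 0] * 0.2 + rng.normal(0, 1, size=40)   # made-up values

predicted = np.empty_like(z)
for i in range(len(z)):
    mask = np.arange(len(z)) != i                 # hold out sample i
    predicted[i] = idw(pts[mask], z[mask], pts[i])

errors = predicted - z                            # prediction errors
print("mean error:", errors.mean())
print("RMS error :", np.sqrt(np.mean(errors ** 2)))
```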

Page 53: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

create subset for validation

demo

Page 54: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Basic rules for good predictions

• The mean error should be close to 0.
• The RMS (root-mean-square) error, average standard error, and mean standardized error should be as small as possible.
• The root-mean-square standardized error should be close to 1.
• To check the validity of the prediction standard errors, compare the average estimated standard error with the RMS prediction error:
- equal: good
- larger than the RMS: standard errors are overestimated
- less than the RMS: standard errors are underestimated
Or check the RMS standardized error:
- = 1: good
- < 1: standard errors are overestimated
- > 1: standard errors are underestimated
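
Given the cross-validation errors and the estimated kriging standard errors at the same locations (both arrays below are hypothetical), these rules translate into a few lines of numpy:

```python
# Cross-validation diagnostics for the rules above (hypothetical data).
import numpy as np

errors = np.array([0.4, -0.6, 0.2, 0.9, -0.5, -0.3])  # predicted - measured
sigma = np.array([0.5, 0.7, 0.4, 0.8, 0.6, 0.5])      # estimated std errors

mean_error = errors.mean()
rms_error = np.sqrt(np.mean(errors ** 2))
avg_std_err = sigma.mean()
rms_standardized = np.sqrt(np.mean((errors / sigma) ** 2))

print("mean error            :", mean_error)       # want close to 0
print("RMS error             :", rms_error)        # want small
print("average standard error:", avg_std_err)      # want close to RMS error
print("RMS standardized error:", rms_standardized) # want close to 1
```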

Page 55: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

10. Compare model results

• Model surfaces can be compared using cross-validation statistics.

• The previous basic rules for good predictions still apply here.

Page 56: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

11. Displaying geostatistical layers

Page 57: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

12. Comparison of different methods

Page 58: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

In this comparison, "stochastic" means "geostatistical".

Page 59: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Summary

• Kriging performs statistical analysis of the error in its predictions. This allows it to create four kinds of surfaces: prediction, standard error, quantile, and probability.

• Prediction maps estimate values at locations where measurements have not been taken. (All interpolators make prediction maps.)

• Standard error maps show the distribution of prediction error for a surface. Error tends to be highest in places where there is little or no sample data.

• Quantile maps show the values that the true values are unlikely to exceed.

• Probability maps show the odds that the true value at a location is greater than a threshold value.

• The various interpolation methods (Inverse Distance Weighting, Global Polynomial, Local Polynomial, Radial Basis Functions, and Kriging) offer trade-offs in speed, flexibility, and their advantages and disadvantages.

• Fast interpolators produce output surfaces quickly, but are not as good at capturing subtle surface variations.

• Exact interpolators predict values equal to the observed value at all sampled locations. Smooth interpolators do not.

• Flexible interpolators allow users to fine-tune the output, while inflexible interpolators allow users to avoid making lots of choices

Page 60: Geostatistic Analysis Lecture 6-8 Feb25-Mar10, 2008.

Main references

• ESRI, Using ArcGIS Geostatistical Analyst (book).

• ESRI Virtual Campus, campus.esri.com.