Why is it there? (How can a GIS analyze data?) Getting Started, Chapter 6 Paula Messina
Dec 13, 2015
GIS is capable of data analysis
Attribute data: describe with statistics; analyze with hypothesis testing
Spatial data: describe with maps; analyze with spatial analysis
Describing one attribute
Flat File Database
          Attribute   Attribute   Attribute
Record    Value       Value       Value
Record    Value       Value       Value
Record    Value       Value       Value
Attribute Description
The extremes of an attribute are the highest and lowest values, and the range is the difference between them in the units of the attribute.
A histogram is a two-dimensional plot of attribute values grouped by magnitude and the frequency of records in that group, shown as a variable-length bar.
For a large number of records with random errors in their measurement, the histogram resembles a bell curve and is symmetrical about the mean.
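A minimal Python sketch of these three descriptions, assuming the attribute is a plain list of numbers and using equal-width classes for the histogram. The function name `describe`, the class width of 100 and the elevation values are hypothetical, chosen only for illustration.

```python
def describe(values, bin_width):
    """Return the extremes, the range and a histogram (class start -> count)."""
    low, high = min(values), max(values)   # the extremes
    value_range = high - low               # same units as the attribute
    counts = {}
    for v in values:
        # Group each value by magnitude into an equal-width class.
        start = low + ((v - low) // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return low, high, value_range, dict(sorted(counts.items()))


# Hypothetical elevations in meters (illustrative values only).
bin_width = 100
elevations = [247, 300, 420, 455, 460, 470, 520, 610]
low, high, value_range, hist = describe(elevations, bin_width)
print(low, high, value_range)  # 247 610 363
for start, count in hist.items():
    # Each class is drawn as a variable-length bar of '#' marks.
    print(f"{start:>4} to {start + bin_width:<4} {'#' * count}")
```

With only eight records the bars are far from a bell curve; the symmetry described above only emerges with many records and random errors.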
Describing a classed raster grid
[Figure: classed raster grid with legend classes 5, 10, 15 and 20]
% (blue) = 19/48 ≈ 39.6%
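The percentage above is a cell count divided by the total number of cells. A minimal sketch, assuming the classed grid is stored as a list of rows; which class the slide colors blue cannot be recovered, so this hypothetical 6 × 8 grid simply puts 19 of its 48 cells in class 10 to mirror the 19/48.

```python
from collections import Counter


def class_shares(grid):
    """Fraction of cells in each class of a classed raster grid."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in sorted(counts)}


# Hypothetical 6 x 8 grid (48 cells) using the legend classes 5, 10, 15, 20.
grid = [
    [5, 5, 5, 10, 10, 10, 10, 10],
    [5, 5, 10, 10, 10, 10, 15, 15],
    [5, 10, 10, 10, 10, 15, 15, 20],
    [10, 10, 10, 15, 15, 15, 20, 20],
    [10, 10, 10, 15, 15, 20, 20, 20],
    [15, 15, 15, 15, 20, 20, 20, 20],
]
for cls, share in class_shares(grid).items():
    print(f"class {cls}: {share:.1%}")
```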
If the attributes are:
Numbers: statistical description (min, max, range, variance, standard deviation)
Statistical description
Range: max − min
Central tendency: mode, median, mean
Variation: variance, standard deviation
Statistical description
Range: outliers
Central tendency: mode, median, mean
Variation: variance, standard deviation
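The outlier point can be checked with Python's standard `statistics` module. This sketch uses hypothetical values and adds one extreme record: the range and the mean move a lot, while the median and mode barely change. Note that `statistics.variance` and `statistics.stdev` use the n − 1 (sample) form.

```python
import statistics


def summarize(values):
    """Range, central tendency and variation for one numeric attribute."""
    return {
        "range": max(values) - min(values),
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "stdev": statistics.stdev(values),
    }


# Hypothetical values, then the same values with one outlier added.
clean = summarize([2, 4, 4, 4, 5, 5, 7, 9])
with_outlier = summarize([2, 4, 4, 4, 5, 5, 7, 9, 90])
for key in ("range", "mode", "median", "mean"):
    print(key, clean[key], round(with_outlier[key], 2))
```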
Elevation (book example)
GPS Example Data: Elevation
Table 6.2: Sample GPS Readings
Data extreme   Date            Time      Lat. (D M S)   Long. (D M S)   Elev. (m)
Minimum        6/14/95         10:47am   42 30 54.8     75 41 13.8      247
Maximum        6/15/95         10:47pm   42 31 03.3     75 41 20.0      610
Range          1 day 12 hours            0 0 8.5        0 0 6.2         363
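The Range row can be reproduced from the Minimum and Maximum rows. A minimal sketch, assuming the first D M S group is latitude and the second longitude; angles are converted to arc-seconds before subtracting so that minutes and seconds borrow correctly. The helper names are hypothetical.

```python
from datetime import datetime


def dms_to_seconds(d, m, s):
    """Degrees, minutes, seconds -> total arc-seconds."""
    return d * 3600 + m * 60 + s


def seconds_to_dms(total):
    """Total arc-seconds -> (degrees, minutes, seconds rounded to 0.1)."""
    d, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return int(d), int(m), round(s, 1)


# Minimum and maximum readings from Table 6.2.
elapsed = datetime(1995, 6, 15, 22, 47) - datetime(1995, 6, 14, 10, 47)
lat_range = seconds_to_dms(dms_to_seconds(42, 31, 3.3) - dms_to_seconds(42, 30, 54.8))
lon_range = seconds_to_dms(dms_to_seconds(75, 41, 20.0) - dms_to_seconds(75, 41, 13.8))
elev_range = 610 - 247

print(elapsed, lat_range, lon_range, elev_range)
```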
Mean
Statistical average: the sum of the values for one attribute divided by the number of records.
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
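The formula translates directly into code. A minimal sketch (the function name and values are illustrative); it refuses an empty attribute because n = 0 would make the division undefined.

```python
def mean(values):
    """Sum of the values for one attribute divided by the number of records."""
    if not values:
        raise ValueError("the mean of zero records is undefined")
    return sum(values) / len(values)


print(mean([2, 4, 4, 4, 5, 5, 7, 9]))  # 5.0
```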
Variance
The total variance is the sum, over all records, of the mean subtracted from each record's value and the difference multiplied by itself (squared).
To get the standard deviation, divide the total variance by the number of records less one, then take the square root.
Average difference from the mean
Subtract the mean from the value for each record, square each difference, sum the squares, divide by the number of records minus 1, and take the square root:
$$\text{st. dev.} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
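The same steps in Python, as a minimal sketch with illustrative values; the intermediate `total_variance` follows the slide's term for the sum of squared differences.

```python
import math


def std_dev(values):
    """Sample standard deviation: square root of total variance / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two records")
    m = sum(values) / n
    total_variance = sum((x - m) ** 2 for x in values)  # sum of squared differences
    return math.sqrt(total_variance / (n - 1))


print(round(std_dev([2, 4, 4, 4, 5, 5, 7, 9]), 3))  # 2.138
```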
Standard Deviation
GPS Example Data: Elevation Standard Deviation
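The standard deviation has the same units as the values of the records, in this case meters.
Elevation is the mean (459.2 meters) plus or minus the expected error of 82.92 meters, so elevation is most likely to lie between 376.28 meters and 542.12 meters. If the errors are normally distributed, about 68% of readings fall within one standard deviation of the mean. These limits are called the error band, or margin of error.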
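The error band is just the mean plus and minus one standard deviation. A minimal sketch; the multiplier `k` is a hypothetical parameter (k = 1 reproduces the limits above).

```python
def error_band(mean, std_dev, k=1):
    """Mean plus or minus k standard deviations (k = 1 for the band above)."""
    return mean - k * std_dev, mean + k * std_dev


low, high = error_band(459.2, 82.92)
print(f"{low:.2f} m to {high:.2f} m")  # 376.28 m to 542.12 m
```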
Standard Deviations and the Bell Curve
[Figure: bell curve with the mean at 459.2 m, one std. dev. below the mean at 376.3 m and one std. dev. above the mean at 542.1 m]
Testing Means (1)
Mean elevation of 459.2 meters; standard deviation 82.92 meters
What is the chance of a GPS reading of 484.5 meters?
• 484.5 is 25.3 meters above the mean
• 0.31 standard deviations (Z-score)
• 0.1217 of the curve lies between the mean and this value
• 0.3783 lies beyond it
[Figure: bell curve with 12.17% of the area between the mean and 484.5 m, and 37.83% beyond]
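The table lookup can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+). As a sketch: the Z-score is rounded to two decimals to match a printed normal table; with the unrounded Z (about 0.305) the area between is slightly smaller, about 0.12.

```python
from statistics import NormalDist

mean, std_dev, reading = 459.2, 82.92, 484.5

# Z-score: distance from the mean in standard deviations, rounded to two
# decimals as when looking it up in a printed normal table.
z = round((reading - mean) / std_dev, 2)
standard_normal = NormalDist()                 # mean 0, standard deviation 1
between = standard_normal.cdf(z) - 0.5         # area from the mean to the reading
beyond = 1 - standard_normal.cdf(z)            # area beyond the reading
print(z, round(between, 4), round(beyond, 4))  # 0.31 0.1217 0.3783
```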
Testing Means (2)
[Figure: bell curve marking the mean (459.2 m) and the GPS reading (484.5 m)]
Accuracy
Determined by testing measurements against an independent source of higher fidelity and reliability.
Must pay attention to units and significant digits.
Not to be confused with precision!
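One common way to summarize such a test (an assumption here, not a method the slide names) is to pair each measurement with its reference value, then report the mean error (systematic bias) and the root-mean-square error (RMSE). Both lists must be in the same units. The readings and benchmark values below are hypothetical.

```python
import math


def accuracy(measured, reference):
    """Mean error (bias) and RMSE of measurements against a reference source."""
    if not measured or len(measured) != len(reference):
        raise ValueError("need equal-length, non-empty lists in the same units")
    errors = [m - r for m, r in zip(measured, reference)]
    mean_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse


# Hypothetical GPS elevations vs. surveyed benchmark elevations, both in meters.
bias, rmse = accuracy([452.0, 461.5, 470.2], [450.0, 460.0, 468.0])
print(f"bias {bias:.2f} m, RMSE {rmse:.2f} m")
```

A set of readings can be tightly clustered (precise) and still carry a large bias against the reference (inaccurate), which is the distinction the slide draws.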
The difference is the map
GIS data description answers the question: Where?
GIS data analysis answers the question: Why is it there?
GIS data description is different from statistics because the results can be placed onto a map for visual analysis.
Spatial Statistical Description
For coordinates, the means and standard deviations correspond to the mean center and the standard distance.
A centroid is any point chosen to represent a higher dimension geographic feature, of which the mean center is only one choice.
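A minimal sketch of the two spatial statistics, assuming projected (planar) coordinates. Standard distance is computed here as the square root of the mean squared distance from the mean center (dividing by n); some texts use a different denominator, so treat that choice as an assumption.

```python
import math


def mean_center(points):
    """Mean of the x coordinates and mean of the y coordinates."""
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


def standard_distance(points):
    """Spatial counterpart of the standard deviation (divides by n here)."""
    cx, cy = mean_center(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))


# Hypothetical projected coordinates (e.g. meters on a planar grid).
points = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(mean_center(points), standard_distance(points))  # (2.0, 1.5) 2.5
```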
Spatial Statistical Description
For coordinates, the data extremes define the two corners of a bounding rectangle.
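A minimal sketch, assuming planar coordinates that do not wrap around the 180° meridian (where simple minimum and maximum longitudes would give the wrong rectangle). The points are hypothetical.

```python
def bounding_rectangle(points):
    """Lower-left and upper-right corners from the coordinate extremes."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys)), (max(xs), max(ys))


print(bounding_rectangle([(3, 7), (-2, 4), (5, -1), (0, 2)]))  # ((-2, -1), (5, 7))
```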
Geographic extremes
Southernmost point in the continental United States.
Range: e.g., elevation difference; map extent
Depends on projection, datum, etc.
Mean Center
[Figure: point distribution with the mean center at (mean x, mean y)]
Centroid: mean center of a feature
[Figure: a feature labeled "Mean center?"]
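Because the mean center is only one choice of centroid, two reasonable choices can disagree for the same feature. This sketch compares the mean of a polygon's vertices with its area centroid (computed with the shoelace formula) for a hypothetical rectangle digitized with extra vertices along one edge; the vertex mean is pulled toward the densely digitized edge, while the area centroid is not.

```python
def vertex_mean(vertices):
    """Mean center of a polygon's vertices."""
    n = len(vertices)
    return sum(x for x, _ in vertices) / n, sum(y for _, y in vertices) / n


def area_centroid(vertices):
    """Centroid of a simple polygon's area (shoelace formula)."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a /= 2  # signed area
    return cx / (6 * a), cy / (6 * a)


# A 4 x 2 rectangle digitized with extra vertices along its bottom edge.
rect = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 2), (0, 2)]
print(vertex_mean(rect))    # y is pulled toward the bottom edge
print(area_centroid(rect))  # (2.0, 1.0)
```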
Comparing spatial means
Spatial Analysis
Lower 48 United States, 1996
Data from the U.S. Census on gender
Gender ratio = number of females per 100 males
Range is 96.4 to 114.4
What does the spatial distribution look like?
Gender Ratio by State: 1996
Searching for Spatial Pattern
A linear relation is a predictable straight-line link between the values of a dependent and an independent variable (y = a + bx). It is a simple model of correlation.
A linear relation can be tested for goodness of fit with least-squares methods. The coefficient of determination, r-squared, measures the degree of fit: the proportion of the variance in the dependent variable that the relation explains.
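A minimal ordinary least-squares sketch for y = a + bx, returning the intercept, the gradient and r-squared (1 minus the residual sum of squares over the total sum of squares). The observations are hypothetical; an equation such as the gender-ratio relation on the next slide is the output of a fit like this, applied to the state data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                      # gradient
    a = my - b * mx                    # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot   # r-squared: share of variance explained


# Hypothetical observations.
a, b, r2 = fit_line([1, 2, 3, 4], [2, 4, 5, 8])
print(round(a, 3), round(b, 3), round(r2, 4))
```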
Simple linear relation
[Figure: scatter plot of observations, dependent variable against independent variable, with the best-fit regression line y = a + bx; a is the intercept and b the gradient]
Testing the relation
gr = 117.46 + 0.138 × long. (gr = gender ratio; long. = longitude)
GIS and Spatial Analysis
Geographic inquiry examines the relationships between geographic features collectively to help describe and understand the real-world phenomena that the map represents.
Spatial analysis compares maps, investigates variation over space, and predicts future or unknown maps.
Many GIS systems have to be coaxed to generate a full set of spatial statistics.
You can lie with...
Maps
Statistics
Correlation is not causation!
Terrain Analysis
Paula Messina
Introduction to Terrain Analysis
• What is terrain analysis?
• How are data points interpolated to a grid?
• How are topographic data sets produced from non-point data?
• How are derivative data sets (e.g., slope and aspect maps) produced by ArcView?
What is Terrain Analysis?
Terrain analysis: the study of ground-surface relief and pattern by numerical methods (a.k.a. geomorphometry).
Geomorphology = qualitative
Geomorphometry = quantitative
Interpolation to a Grid
Assumptions:
• Elevations are continuously distributed
• The influence of one known point over an unknown point increases as the distance between them decreases
[Figure: known elevations 58, 46, 86, 70 and 97 around an unknown point (?)]
Interpolation Using the Neighborhood Model
[Figure: the known points 58, 46, 86, 70 and 97 around point x]
Inverse-distance theory dictates:
• The value of X > 58
• The value of X < 97
• The value of X is closer to 58 than to 97
$$Z_x = \frac{\sum_{p=1}^{R} Z_p \, d_p^{-n}}{\sum_{p=1}^{R} d_p^{-n}}$$
Zx = elevation at the kernel (point x)
Zp = elevation at known point p
dp = distance from point x to point p
n = “friction of distance” value; usually between 1 and 6
R = number of known points used
Neighborhood Interpolation Using Inverse Distance Weighting
When n=2, the technique is called “inverse-squared distance weighting.”
ArcGIS calls this IDW
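A minimal sketch of the formula above. The coordinates are hypothetical, because the slide's figure gives no distances; they are placed so that 58 is nearest the kernel and 97 a little farther away, and the estimate then satisfies the three inverse-distance conditions listed earlier. If the kernel coincides with a known point, the known value is returned, since the weight d^(−n) is undefined at d = 0.

```python
import math


def idw(x, y, known, n=2):
    """Inverse-distance-weighted estimate at kernel (x, y).

    known: (xp, yp, zp) tuples; n: friction-of-distance exponent.
    """
    num = den = 0.0
    for xp, yp, zp in known:
        d = math.hypot(x - xp, y - yp)
        if d == 0:
            return zp              # kernel sits on a known point
        w = d ** -n                # weight d_p^(-n)
        num += zp * w
        den += w
    return num / den


# Hypothetical coordinates: 58 is nearest the kernel, 97 a little farther.
known = [(1, 0, 58), (0, 3, 46), (-1, 3, 86), (0, -2.5, 70), (-2, 0, 97)]
print(round(idw(0, 0, known), 1))        # inverse-squared distance (n = 2)
print(round(idw(0, 0, known, n=1), 1))
```

Raising n makes the nearest point dominate more strongly; lowering it spreads influence across all R points.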
Types of “Neighborhoods” used with IDW
Nearest n neighbors
• In this example, n = 3
• This method isn’t effective when there are clusters of points; “nearest in quadrants” and “nearest in octants” searches can help
Fixed radius
• A radius is selected
• Points are selected only if they lie within that fixed radius
[Figure: nearest-neighbor and fixed-radius searches around point x among the known points 58, 46, 86, 70 and 97]
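The three searches can be sketched as filters that choose which known points enter the IDW sums. The coordinates are hypothetical, and `per_quadrant` is an illustrative parameter; an octant search would split the plane into eight sectors instead of four.

```python
import math


def _dist(x, y, p):
    return math.hypot(p[0] - x, p[1] - y)


def nearest_n(x, y, known, n):
    """The n known points closest to (x, y)."""
    return sorted(known, key=lambda p: _dist(x, y, p))[:n]


def within_radius(x, y, known, radius):
    """Known points lying within a fixed radius of (x, y)."""
    return [p for p in known if _dist(x, y, p) <= radius]


def nearest_in_quadrants(x, y, known, per_quadrant=1):
    """Nearest points taken separately from each quadrant around (x, y)."""
    quadrants = {}
    for p in known:
        # Points on an axis are assigned to the quadrant on its >= side.
        quadrants.setdefault((p[0] >= x, p[1] >= y), []).append(p)
    chosen = []
    for pts in quadrants.values():
        chosen.extend(nearest_n(x, y, pts, per_quadrant))
    return chosen


# Hypothetical (x, y, elevation) points around the kernel at (0, 0).
known = [(1, 0, 58), (0, 3, 46), (-1, 3, 86), (0, -2.5, 70), (-2, 0, 97)]
print([p[2] for p in nearest_n(0, 0, known, 3)])        # [58, 97, 70]
print([p[2] for p in within_radius(0, 0, known, 2.2)])  # [58, 97]
```

With a cluster of points on one side of the kernel, `nearest_n` can take all its points from that cluster, while `nearest_in_quadrants` still draws from every direction that has data.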
Interpolation using the Spline Method
The spline interpolator fits a minimum-curvature surface through input points (a “rubber sheet” fit).
The spline interpolator fits a mathematical function to a specified number of nearest points
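One common minimum-curvature spline is the thin-plate spline, sketched below in plain Python with a small Gaussian-elimination solver. This is an illustration of the idea, not ArcGIS's implementation: its Spline tool offers regularized and tension variants fitted to a set number of nearest points, so its values will differ. The coordinates are hypothetical.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _phi(r):
    """Thin-plate radial basis r^2 ln r (taken as 0 at r = 0)."""
    return r * r * math.log(r) if r > 0 else 0.0


def thin_plate_spline(known):
    """Fit f(x, y) = a0 + a1*x + a2*y + sum(w_i * phi(|(x, y) - p_i|))."""
    n = len(known)
    size = n + 3
    A = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for i, (xi, yi, zi) in enumerate(known):
        for j, (xj, yj, _) in enumerate(known):
            A[i][j] = _phi(math.hypot(xi - xj, yi - yj))
        for k, v in enumerate((1.0, xi, yi)):  # linear (tilted plane) part
            A[i][n + k] = A[n + k][i] = v
        rhs[i] = zi
    coef = _solve(A, rhs)
    weights, (a0, a1, a2) = coef[:n], coef[n:]

    def surface(x, y):
        z = a0 + a1 * x + a2 * y
        for w, (xi, yi, _) in zip(weights, known):
            z += w * _phi(math.hypot(x - xi, y - yi))
        return z

    return surface


# Hypothetical (x, y, elevation) points; coordinates are illustrative only.
known = [(1, 0, 58), (0, 3, 46), (-1, 3, 86), (0, -2.5, 70), (-2, 0, 97)]
f = thin_plate_spline(known)
print(round(f(0, 0), 1))
```

The fitted surface passes exactly through every input point, which is the "rubber sheet" behavior described above.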
Interpolation Using Kriging
• Based on regionalized variable theory: drift, a random correlated component, and noise
• This method produces a statistically optimal surface, but it is very computationally intensive
• Kriging is used frequently in soil science and geology
Trend Interpolator
• Fits a mathematical function (a polynomial of specified order) to input points
• Points may be chosen by nearest-neighbor or radius searches, or all points may be used
• Uses a least-squares regression fit
• The surface produced does not necessarily pass through the points used
• This is an excellent choice when data points are sparse
• Not available as a menu item in ArcGIS
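A minimal sketch of the simplest case, a first-order (planar) trend surface z = a + bx + cy fitted to all points by least squares. The normal equations are solved with Cramer's rule; higher-order polynomials would add terms such as x², xy and y². The coordinates are hypothetical.

```python
def _det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def trend_surface(known):
    """Least-squares first-order trend surface z = a + b*x + c*y."""
    # Accumulate the normal equations (X^T X) beta = X^T z.
    ata = [[0.0] * 3 for _ in range(3)]
    atz = [0.0] * 3
    for x, y, z in known:
        row = (1.0, x, y)
        for i in range(3):
            atz[i] += row[i] * z
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d = _det3(ata)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; the plane is undefined")
    coefs = []
    for k in range(3):  # Cramer's rule: replace column k with X^T z
        mk = [[atz[i] if j == k else ata[i][j] for j in range(3)] for i in range(3)]
        coefs.append(_det3(mk) / d)
    return tuple(coefs)


known = [(1, 0, 58), (0, 3, 46), (-1, 3, 86), (0, -2.5, 70), (-2, 0, 97)]
a, b, c = trend_surface(known)
for x, y, z in known:
    print(z, round(a + b * x + c * y, 1))  # fitted values need not equal z
```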
Which ArcView menu interpolator is better?
IDW
Assumption: the variable being mapped decreases in influence with distance
• Example: interpolating consumer purchasing power for a retail site analysis
Spline
Assumption: the variable being mapped is a smooth, continuous surface; spline is not particularly good for surfaces with large variability over small horizontal distances
• Examples: terrain, water table heights, pollution concentration, etc.
The Finished Grid
The Messina “Eyeball” Interpolator was used
[Figure: finished grid of interpolated elevations (cell values from 44 to 94) around the known points 58, 46, 86, 70 and 97]
Grids are subject to the “layer cake effect”
Point Data Collection in the Field
• It is critical to obtain data at the corners of the grid extent
• It is advisable to obtain the VIPs (Very Important Points), such as the highest and lowest elevations
Other Continuous Surface Sources
USGS DEMs
• Produced directly from USGS topographic maps
• Elevations of an area are averaged within the grid cell (pixel), so high and low points can never be saved as a grid cell value
• Various techniques (e.g., stereograms) were used to accomplish this process
• The original datum (e.g., NAD27, NAD83) is preserved in the DEM
• Spatial resolution: 30 m (7.5-minute data), 3 arc-seconds (1-degree data), 10 m*, 5 m* (*limited coverage)
Other Continuous Surface Sources
Synthetic Aperture Radar, Side-Looking Airborne Radar
Shuttle missions:
• Shuttle Radar Topography Mission, 2/00
• SIR-C, 1994
Other orbiters:
• Magellan Mapping Mission of Venus, 1990-1994
Airborne radar mappers:
• AirSAR/TopSAR
• GeoSAR: California mapping
How is Slope Computed?
$$\text{Slope} = \arctan\left[\sqrt{\left(\frac{dZ}{dX}\right)^2 + \left(\frac{dZ}{dY}\right)^2}\,\right]$$
100 130 140
120 150 160
160 170 200
Grid cell = 100 m × 100 m
Calculate the slope for the central pixel.
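A worked sketch for the central pixel, assuming x increases east, y increases north (row 0 is the north edge) and that dZ/dX and dZ/dY are estimated by simple central differences of the four edge neighbors. That estimator is an assumption; ArcGIS's Slope tool uses a weighted 3 × 3 window, so its answer can differ. Here dZ/dX = (160 − 120) / 200 = 0.2 and dZ/dY = (130 − 170) / 200 = −0.2.

```python
import math


def slope_degrees(window, cell_size):
    """Slope of the central cell of a 3 x 3 window (row 0 = north edge)."""
    dz_dx = (window[1][2] - window[1][0]) / (2 * cell_size)  # east minus west
    dz_dy = (window[0][1] - window[2][1]) / (2 * cell_size)  # north minus south
    return math.degrees(math.atan(math.sqrt(dz_dx ** 2 + dz_dy ** 2)))


window = [
    [100, 130, 140],
    [120, 150, 160],
    [160, 170, 200],
]
print(round(slope_degrees(window, 100), 2))  # 15.79
```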
How is Aspect Computed?
$$A' = \arctan\left(-\frac{dZ/dY}{dZ/dX}\right)$$
100 130 140
120 150 160
160 170 200
Grid cell = 100 m × 100 m
Calculate the aspect for the central pixel.
If dZ/dX is negative, add 90 to A’
If dZ/dX is positive, and dZ/dY is negative: add 270 to A’
If dZ/dX is positive, and dZ/dY is positive: subtract A’ from 270
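A worked sketch for the same window, under the same assumptions as the slope example (x east, y north, central differences). Instead of handling sign cases one by one, it passes the downslope direction (−dZ/dX, −dZ/dY) to `math.atan2`, which returns a compass bearing measured clockwise from north. For this window dZ/dX = 0.2 and dZ/dY = −0.2, so the slide's formula gives A′ = arctan(−(−0.2)/0.2) = 45°, and adding 270 gives the same 315° (facing northwest, toward the low corner).

```python
import math


def aspect_degrees(window, cell_size):
    """Compass direction (0 = north, clockwise) the central cell faces."""
    dz_dx = (window[1][2] - window[1][0]) / (2 * cell_size)  # east minus west
    dz_dy = (window[0][1] - window[2][1]) / (2 * cell_size)  # north minus south
    if dz_dx == 0 and dz_dy == 0:
        return None  # flat cell: aspect undefined
    # The downslope direction is (-dz_dx, -dz_dy) as (east, north) components;
    # atan2(east, north) turns it into a bearing clockwise from north.
    return math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360


window = [
    [100, 130, 140],
    [120, 150, 160],
    [160, 170, 200],
]
print(round(aspect_degrees(window, 100), 1))  # 315.0
```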