Why Is It There?
Lecture 6
Introduction to Geographic Information Systems
Geography 176A
2006 Summer, Session B
Department of Geography
University of California, Santa Barbara
“A geographic information system is a special case of information systems where the database consists of observations on spatially distributed features, activities or events, which are definable in space as points, lines, or areas. A geographic information system manipulates data about these points, lines, and areas to retrieve data for ad hoc queries and analyses.”
GIS is capable of data analysis
Attribute Data
• Describe with statistics
• Analyze with hypothesis testing
Spatial Data
• Describe with maps
• Analyze with spatial analysis
Describing one attribute
Flat File Database (each row is a record, each column an attribute)
          Attribute   Attribute   Attribute
Record    Value       Value       Value
Record    Value       Value       Value
Record    Value       Value       Value
Attribute Description
The extremes of an attribute are the highest and lowest values, and the range is the difference between them in the units of the attribute.
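A minimal Python sketch of the extremes and range for one attribute column. The helper name and the elevation readings are hypothetical, not the lecture's GPS data; the range comes out in the same units as the values (meters here).

```python
def extremes_and_range(values):
    """Return (lowest, highest, range) for one numeric attribute."""
    lowest = min(values)
    highest = max(values)
    return lowest, highest, highest - lowest

# Hypothetical elevation readings in meters.
elevations = [412.0, 459.0, 503.5, 388.2, 520.1]
low, high, span = extremes_and_range(elevations)
print(f"lowest {low} m, highest {high} m, range {span:.1f} m")
```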
A histogram is a two-dimensional plot of attribute values grouped by magnitude against the frequency of records in each group, with each group shown as a variable-length bar.
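One way to build such a histogram, sketched in Python under the assumption of fixed-width groups; the bin width of 50 m, the starting edge of 350 m and the readings are arbitrary illustrative choices. Each printed row is a text "bar" whose length is the record count.

```python
def histogram(values, bin_width, start):
    """Count records in fixed-width bins; each key is the bin's lower edge."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)      # which bin, counting from `start`
        lower = start + k * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical elevation readings in meters.
elevations = [412.0, 459.0, 503.5, 388.2, 520.1, 455.3]
width = 50
for lower, n in histogram(elevations, width, 350).items():
    # One bar per group; its length is the number of records in the group.
    print(f"{lower}-{lower + width} m | {'#' * n}")
```

Empty groups are simply left out of the result; a plotting routine would usually show them as zero-length bars.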
For a large number of records with random errors in their measurement, the histogram resembles a bell curve and is symmetrical about the mean.
If the records are:
Text
• Semantics of text, e.g. “Hampton”
• Word frequency, e.g. “Creek”, “Kill”
• Address matching
Example: Display all places called “State Street”
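A minimal sketch of word frequency and name selection on text records, assuming the names are plain strings. Only “Creek”, “Kill” and “State Street” come from the slide; the other place names are hypothetical. Real address matching also normalizes abbreviations and spelling, which this does not attempt.

```python
from collections import Counter

# Hypothetical place names.
places = ["Hampton Creek", "Fresh Kill", "Mill Creek", "State Street", "State Street Park"]

# Word frequency: split each name on spaces and count every word.
word_counts = Counter(word for name in places for word in name.split())
print(word_counts["Creek"], word_counts["Kill"])

# Exact-name selection, a crude stand-in for real address matching.
state_street = [name for name in places if name == "State Street"]
print(state_street)
```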
If the records are:
Classes
• Histogram by class
• Numbers in class
• Contiguity description, e.g. average
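For class (categorical) records, the numbers in each class and a histogram by class can be sketched with a simple counter. The land-use classes below are hypothetical; the contiguity description is not covered here.

```python
from collections import Counter

# Hypothetical land-use class for each record.
land_use = ["forest", "urban", "forest", "water", "forest", "urban"]

class_counts = Counter(land_use)            # numbers in each class
for cls, n in class_counts.most_common():   # histogram by class, largest first
    print(f"{cls:<8}{'#' * n}")
```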
• One: all I have! [6:00pm]
• Two: do they agree? [6:00pm; 6:04pm]
• Three: level of agreement [6:00pm; 6:04pm; 7:23pm]
• Many: average all, average without extremes
• Precision: 6:00pm. “About six o’clock”
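The "average all" versus "average without extremes" choice can be tried on the slide's three readings. This sketch assumes times are converted to minutes after midnight; the helper names are hypothetical, and dropping one lowest and one highest value is only one way to exclude extremes.

```python
def to_minutes(hhmm):
    """Convert a 24-hour 'HH:MM' string to minutes after midnight."""
    h, m = map(int, hhmm.split(":"))
    return h * 60 + m

def mean(xs):
    return sum(xs) / len(xs)

def mean_without_extremes(xs):
    """Drop one lowest and one highest value, then average the rest."""
    if len(xs) < 3:
        raise ValueError("need at least three readings")
    trimmed = sorted(xs)[1:-1]
    return sum(trimmed) / len(trimmed)

# The three readings from the slide: 6:00pm, 6:04pm, 7:23pm.
readings = [to_minutes(t) for t in ("18:00", "18:04", "19:23")]
print(mean(readings))                   # 1109.0 minutes, i.e. 6:29pm
print(mean_without_extremes(readings))  # 1084.0 minutes, i.e. 6:04pm
```

The single late reading pulls the plain average almost half an hour later, which is why averaging without extremes is worth considering once there are many readings.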
Statistical description
• Range: min, max, max-min
• Central tendency: mode, median (odd, even), mean
• Variation: variance, standard deviation
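The odd/even distinction for the median can be shown in a few lines. This is a minimal sketch with hypothetical values; Python's `statistics.median` applies the same rule of averaging the two middle values when n is even.

```python
from statistics import mode

def median(xs):
    """Middle value for odd n; mean of the two middle values for even n."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

odd = [3, 1, 4, 1, 5]        # hypothetical values, n = 5
even = [3, 1, 4, 1, 5, 9]    # hypothetical values, n = 6
print(mode(odd), median(odd), median(even), sum(odd) / len(odd))
```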
Statistical description
• Range: outliers
• Central tendency: mode, median, mean
• Variation: variance, standard deviation
Elevation (book example)
GPS Example Data: Elevation
Mean
Statistical average: the sum of the values for one attribute divided by the number of records.
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Computing the Mean
Sum the attribute values across all records, then divide by the number of records.
Add all attribute values down a column, then divide by the number of records.
A representative value; for measurements with normally distributed error, the mean converges on the true reading as more records are added.
A value lacking sufficient data for computation is called a missing value. It is not included in the sum or in n.
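A minimal sketch of a column mean that leaves missing values out of both the sum and n. It assumes missing values are coded as `None`, a hypothetical convention; real datasets may use NaN or a sentinel such as -9999 instead.

```python
def mean_ignoring_missing(values):
    """Mean of one attribute column, skipping missing values (None)."""
    present = [v for v in values if v is not None]   # excluded from the sum...
    if not present:
        raise ValueError("no valid values")
    return sum(present) / len(present)                # ...and from n

column = [450.0, None, 470.0, 460.0]   # hypothetical column with one missing value
print(mean_ignoring_missing(column))   # 460.0
```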
Variance
The total variance is the sum of squared deviations: for each record, subtract the mean from the value and multiply the result by itself, then add these up.
Dividing this total by the number of records less one gives the sample variance, and the standard deviation is the square root of that.
For two values, there is only one variance.
Average difference from the mean
Subtract the mean from the value for each record, square the result, sum the squares, divide by the number of records less one, and take the square root:
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
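A direct transcription of the standard deviation formula into Python, checked against the standard library's `statistics.stdev`, which uses the same n − 1 divisor. The readings are hypothetical.

```python
import math
import statistics

def sample_std_dev(xs):
    """sqrt( sum((x - mean)^2) / (n - 1) ), the sample standard deviation."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(xs) / n
    sum_sq = sum((x - m) ** 2 for x in xs)   # sum of squared deviations from the mean
    return math.sqrt(sum_sq / (n - 1))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical readings
print(sample_std_dev(xs))
print(statistics.stdev(xs))   # standard-library check
```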
Standard Deviation
GPS Example Data: Elevation, standard deviation
Same units as the values of the records, in this case meters.
The average amount readings differ from the average; can be above or below the mean.
Elevation is the mean (459.2 meters) plus or minus the expected error of 82.92 meters.
Elevation is most likely to lie between 376.28 meters and 542.12 meters (for normally distributed error, about 68% of readings fall within one standard deviation of the mean).
These limits are called the error band or margin of error.
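The Bell Curve
[Figure: normal curve for the elevation data, marking the mean (459.2 m) and a reading of 484.5 m; 12.17% of the area lies between them and 37.83% lies beyond 484.5 m.]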
Samples and populations
A sample is a set of measurements taken from a larger group or population.
Sample means and variances can serve as estimates for their populations.
It is easier to measure a sample and then draw conclusions about the entire population.
Testing Means
Mean elevation of 459.2 meters, standard deviation 82.92 meters.
What is the chance of a GPS reading of 484.5 meters?
• 484.5 is 25.3 meters above the mean
• 0.31 standard deviations (Z-score)
• 0.1217 of the curve lies between the mean and this value
• 0.3783 lies beyond it
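The z-score and the two areas can be reproduced with the error function instead of a printed table. This sketch assumes a normal distribution; rounding z to two decimals mimics a table lookup, which is how the slide's 0.1217 is obtained. The helper names are hypothetical.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def area_mean_to_z(z):
    """Fraction of the standard normal curve between the mean and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

z = z_score(484.5, 459.2, 82.92)
between = area_mean_to_z(round(z, 2))   # round as a printed z-table does
beyond = 0.5 - between
print(round(z, 2), round(between, 4), round(beyond, 4))   # 0.31 0.1217 0.3783
```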
Hypothesis testing
Set up the NULL hypothesis (e.g. values or means are the same) as H₀.
Set up the ALTERNATIVE hypothesis, H₁.
Test the hypothesis: try to reject the NULL.
If the null hypothesis is rejected, the alternative is accepted with a calculable level of confidence.
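A minimal sketch of these steps as a two-sided z-test, assuming a normal population with known mean and standard deviation and a significance level of 0.05 (both assumptions; the slide does not fix a level). H₀ is that the reading belongs to the population; it is rejected only if the p-value falls below the chosen level.

```python
import math

def two_sided_z_test(x, mean, sd, alpha=0.05):
    """H0: x comes from a normal population with this mean and sd.
    Returns (z, p_value, reject_h0)."""
    z = (x - mean) / sd
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z is erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, p < alpha

z, p, reject = two_sided_z_test(484.5, 459.2, 82.92)
print(f"z = {z:.2f}, p = {p:.2f}, reject H0: {reject}")
```

For the GPS reading, p is about 0.76, so H₀ is not rejected: a reading of 484.5 m is consistent with random variation about the mean. Failing to reject H₀ is not the same as proving it.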
Testing the Mean
The mathematical form of the normal distribution can be used to compute probabilities associated with measurements whose means and standard deviations are known.
A test of means can establish whether two samples from a population are different from each other, or whether the difference between their measurements is the result of random variation.
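One common test of means for two samples is Welch's t-test, used here as a swapped-in example because it does not assume equal variances; the lecture does not name a specific test. This sketch computes only the statistic and its approximate degrees of freedom, which would then be compared against a t-table critical value. The receiver readings are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical elevation readings (m) from two receivers at the same site.
receiver_1 = [455.0, 462.5, 470.1, 449.8]
receiver_2 = [481.2, 476.0, 490.3, 472.9]
t, df = welch_t(receiver_1, receiver_2)
print(f"t = {t:.2f} with about {df:.1f} degrees of freedom")
```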
Alternative attribute histograms
Accuracy
Determined by testing measurements against an independent source of higher fidelity and reliability.
Must pay attention to units and significant digits.
Can be expressed as a number using statistics.
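One widely used statistic for expressing accuracy as a number is the root-mean-square error (RMSE) against the higher-fidelity reference; the slide does not name a specific statistic, so RMSE is an assumption here. Both lists must be in the same units, and all values below are hypothetical.

```python
import math

def rmse(measured, reference):
    """Root-mean-square error of measurements against a reference source."""
    if len(measured) != len(reference):
        raise ValueError("lists must be the same length")
    sq = [(m - r) ** 2 for m, r in zip(measured, reference)]
    return math.sqrt(sum(sq) / len(sq))

gps = [461.0, 455.5, 470.2]      # hypothetical GPS elevations (m)
survey = [458.0, 457.5, 466.2]   # hypothetical benchmark elevations (m)
print(f"RMSE = {rmse(gps, survey):.2f} m")
```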
The difference is the map
GIS data description answers the question: Where?
GIS data analysis answers the question: Why is it there?
GIS data description is different from statistics because the results can be placed onto a map for visual analysis.
For coordinates, data extremes define two opposite corners of a bounding rectangle: (minimum x, minimum y) and (maximum x, maximum y).
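A minimal bounding-rectangle sketch, assuming planar (projected) coordinates; the points are hypothetical. Longitude/latitude data that cross the 180° meridian would need special handling that this simple min/max does not provide.

```python
def bounding_rectangle(points):
    """Return ((min_x, min_y), (max_x, max_y)) for a list of (x, y) coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys)), (max(xs), max(ys))

# Hypothetical projected coordinates in meters.
pts = [(250.0, 1200.0), (310.5, 980.0), (275.0, 1410.0)]
print(bounding_rectangle(pts))
```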
Geographic extremes
Southernmost point in the continental United States.
Range: e.g. elevation difference; map extent
Depends on projection, datum, etc.
For coordinates, the means and standard deviations correspond to the mean center and the standard distance.
A centroid is any point chosen to represent a higher-dimension geographic feature, of which the mean center is only one choice.
The standard distance for a set of point spatial measurements is the expected spatial error.
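The mean center and standard distance follow directly from the attribute mean and standard deviation, applied to x and y together. This sketch assumes planar coordinates and divides by n; some texts divide by n − 1 or n − 2 instead, so treat the divisor as an assumption. The points are hypothetical corners of a 4 × 3 rectangle.

```python
import math

def mean_center(points):
    """(mean x, mean y) of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """sqrt( sum((x - mx)^2 + (y - my)^2) / n ): spread about the mean center."""
    mx, my = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n)

pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(mean_center(pts), standard_distance(pts))
```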
Mean Center
[Figure: point set with the mean x and mean y marked]
Centroid: mean center of a feature
Mean center?
Comparing spatial means
GIS and Spatial Analysis
Descriptions of geographic properties such as shape, pattern, and distribution are often verbal.
Quantitative measures can be devised, although few are computed by GIS.
GIS statistical computations are most often done using retrieval options such as buffer and spread.
They are also done by manipulating attributes with arithmetic commands (map algebra).
Lower 48 United States
2000 data from the U.S. Census on gender
Gender Ratio = number of males per 100 females
Range is 89.00 - 103.90
What does the spatial distribution look like?
Gender Ratio by State: 1996
Searching for Spatial Pattern
A linear relationship is a predictable straight-line link between the values of a dependent and an independent variable (y = a + bx). It is a simple model of the relationship.
A linear relation can be tested for goodness of fit with least squares methods. The coefficient of determination (r-squared) is a measure of the degree of fit: the proportion of the variance in the dependent variable that the line explains.
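An ordinary least-squares fit of y = a + bx, with r-squared computed as one minus the ratio of unexplained to total variation. This is a minimal sketch with hypothetical points, not the procedure used to produce the lecture's gender-ratio regression.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                 # gradient
    a = my - b * mx               # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # unexplained
    ss_tot = sum((y - my) ** 2 for y in ys)                        # total
    return a, b, 1 - ss_res / ss_tot

a, b, r2 = fit_line([1, 2, 3], [1, 3, 2])   # hypothetical points
print(a, b, r2)   # 1.0 0.5 0.25
```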
Simple linear relationship
[Figure: observations of the dependent variable (y-axis) plotted against the independent variable (x-axis), with the best-fit regression line y = a + bx; a is the intercept and b the gradient.]
Testing the relationship
Gender Ratio = -0.1438 × Longitude + 83.285
R-squared = 61.8%
Patterns in Residual Mapping
Differences between observed values of the dependent variable and those predicted by a model are called residuals.
A GIS allows residuals to be mapped and examined for spatial patterns.
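Residuals for the gender-ratio regression can be sketched as observed minus predicted values. This assumes longitude is in decimal degrees, negative west of Greenwich, which keeps predictions for the lower 48 states inside the reported 89.00-103.90 range. The places "A" and "B" and their values are hypothetical, not census data; in a GIS the residual would be joined back to each state and mapped.

```python
def predicted_gender_ratio(longitude):
    """Regression from the lecture: Gender Ratio = -0.1438 * Longitude + 83.285."""
    return -0.1438 * longitude + 83.285

# Hypothetical places: (longitude, observed gender ratio).
observations = {"A": (-119.0, 101.2), "B": (-75.0, 93.1)}

# Residual = observed - predicted; positive means more males than the model expects.
residuals = {
    name: observed - predicted_gender_ratio(lon)
    for name, (lon, observed) in observations.items()
}
for name, r in residuals.items():
    print(f"{name}: {r:+.2f}")
```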
A model helps explanation and prediction after the GIS analysis.
A model should be simple, should explain what it represents, and should be examined at its limits before use.
We should always examine the limits of the model’s applicability (e.g. does the regression apply to Europe?).
Unexplained variance
• More variables?
• Different extent?
• More records?
• More spatial dimensions?
• More complexity?
• Another model?
• Another approach?
Resolution? Extent? Accuracy? Precision? Boundary effects? Point spacing? Method?
GIS and Spatial Analysis
Geographic inquiry examines the relationships between geographic features collectively to help describe and understand the real-world phenomena that the map represents.
Spatial analysis compares maps, investigates variation over space, and predicts future or unknown maps.
Analytic Tools and GIS
Tools for searching out spatial relationships and for modeling have only recently been integrated into GIS.
Statistical and spatial analytical tools are also only now being integrated into GIS, and many people use separate software systems outside the GIS.
Real geographic phenomena are dynamic, but GISs have been mostly static. Time-slice and animation methods can help in visualizing and analyzing spatial trends.
GIS places real-world data into an organizational framework that allows numerical description and allows the analyst to model, analyze, and predict with both the map and the attribute data.
You can lie with...
Maps
Statistics
• Correlation is not causation!
• Hypothesis vs. Action