Hierarchical Modeling and Analysis for Spatial Data
Bradley P. Carlin, Sudipto Banerjee, and Alan E. Gelfand
University of Minnesota and Duke University
Introduction to spatial data and models
Researchers in diverse areas such as climatology, ecology, environmental health, and real estate marketing are increasingly faced with the task of analyzing data that are:
highly multivariate, with many important predictors and response variables,
geographically referenced, and often presented as maps, and
temporally correlated, as in longitudinal or other time series structures.
⇒ motivates hierarchical modeling and data analysis for complex spatial (and spatiotemporal) data sets.
Introduction (cont’d)
Example: In an epidemiological investigation, we might wish to analyze lung, breast, colorectal, and cervical cancer rates
by county and year in a particular state
with smoking, mammography, and other important screening and staging information also available at some level.
Introduction (cont’d)
Public health professionals who collect such data are charged not only with surveillance, but also statistical inference tasks, such as
modeling of trends and correlation structures
estimation of underlying model parameters
hypothesis testing (or comparison of competing models)
prediction of observations at unobserved times or locations.
⇒ all naturally accomplished through hierarchical modeling implemented via Markov chain Monte Carlo (MCMC) methods!
Existing spatial statistics books
Cressie (1990, 1993): the legendary “bible” of spatial statistics, but
rather high mathematical level
lacks modern hierarchical modeling/computing
Wackernagel (1998): terse; only geostatistics
Chiles and Delfiner (1999): only geostatistics
Stein (1999a): theoretical treatise on kriging
More descriptive presentations: Bailey and Gatrell (1995), Fotheringham and Rogerson (1994), or Haining (1990).
Our primary focus is on the issues of modeling, computing, and data analysis.
Types of spatial data
point-referenced data, where $Y(s)$ is a random vector at a location $s \in \Re^r$, where $s$ varies continuously over $D$, a fixed subset of $\Re^r$ that contains an $r$-dimensional rectangle of positive volume;
areal data, where $D$ is again a fixed subset (of regular or irregular shape), but now partitioned into a finite number of areal units with well-defined boundaries;
point pattern data, where now $D$ is itself random; its index set gives the locations of random events that are the spatial point pattern. $Y(s)$ itself can simply equal 1 for all $s \in D$ (indicating occurrence of the event), or possibly give some additional covariate information (producing a marked point pattern process).
color indicates range of average 2001 level
Areal (lattice) data
Figure 2: ArcView poverty map, regional survey units in Hennepin County, MN.
Notes on areal data
Figure 2 is an example of a choropleth map, which uses shades of color (or greyscale) to classify values into a few broad classes, like a histogram.
From the choropleth map we know which regions are adjacent to (touch) which other regions.
Thus the “sites” $s \in D$ in this case are actually the regions (or blocks) themselves, which we will denote not by $s_i$ but by $B_i$, $i = 1, \ldots, n$.
It may be helpful to think of the county centroids as forming the vertices of an irregular lattice, with two lattice points being connected if and only if the counties are “neighbors” in the spatial map.
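The lattice view of areal data can be made concrete with an adjacency matrix. The sketch below is only an illustration: the four units B1 to B4 and their neighbor lists are hypothetical, not taken from the Hennepin County map, and the helper name `adjacency_matrix` is an assumption of this example.

```python
# Hypothetical neighbor lists for four areal units B1..B4 (illustrative only).
neighbors = {
    "B1": ["B2", "B3"],
    "B2": ["B1", "B3"],
    "B3": ["B1", "B2", "B4"],
    "B4": ["B3"],
}


def adjacency_matrix(neighbors):
    """Return (labels, W), where W[i][j] = 1 if units i and j are neighbors."""
    labels = sorted(neighbors)
    index = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    W = [[0] * n for _ in range(n)]
    for unit, adjacent in neighbors.items():
        for other in adjacent:
            i, j = index[unit], index[other]
            W[i][j] = 1
            W[j][i] = 1  # adjacency is symmetric
    return labels, W


labels, W = adjacency_matrix(neighbors)
```

Each row of `W` corresponds to one lattice vertex, and the row sum gives that unit's number of neighbors.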
Misaligned (point and areal) data
Figure 3: Atlanta zip codes and 8-hour maximum ozone levels (ppm) at 10 sites, July 15, 1995.
Spatial point process data
Exemplified by residences of persons suffering from a particular disease, or by locations of a certain species of tree in a forest.
The response $Y$ is often fixed (occurrence of the event), and only the locations $s_i$ are thought of as random.
Such data are often of interest in studies of event clustering, where the goal is to determine whether points tend to be spatially close to other points, or result merely from a random process operating independently and homogeneously over space.
In contrast to areal data, here (and with point-referenced data as well) precise locations are known, and so must often be protected to preserve the privacy of the persons in the data set.
Spatial point process data (cont’d)
“No clustering” is often described through a homogeneous Poisson process:
$$E[\text{number of occurrences in region } A] = \lambda |A| ,$$
where $\lambda$ is the intensity parameter, and $|A|$ is area($A$).
Visual tests can be unreliable (tendency of the human eye to see clustering), so instead we might rely on Ripley’s K function,
$$K(d) = \frac{1}{\lambda} E[\text{number of points within } d \text{ of an arbitrary point}],$$
where again $\lambda$ is the intensity of the process, i.e., the mean number of points per unit area.
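To see the "no clustering" benchmark in action, the following sketch simulates a homogeneous Poisson process on a rectangle and counts points in a subregion. It uses only the Python standard library; the function names, the rectangular window, and the way the Poisson count is drawn (counting unit-rate exponential arrivals) are choices made for this illustration, not something the slides prescribe.

```python
import random


def poisson_draw(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_homogeneous_poisson(intensity, width, height, rng):
    """Simulate points on [0, width] x [0, height]; intensity is points per unit area."""
    n = poisson_draw(intensity * width * height, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


def count_in_rectangle(points, x0, x1, y0, y1):
    """Count the points falling in the subregion [x0, x1) x [y0, y1)."""
    return sum(1 for x, y in points if x0 <= x < x1 and y0 <= y < y1)
```

Averaging `count_in_rectangle` over many simulated patterns should come out close to the intensity times the area of the subregion, which is the defining property quoted above.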
Spatial point process data (cont’d)
The usual estimator for K is
$$\hat{K}(d) = n^{-2} |A| \sum \sum_{i \neq j} p_{ij}^{-1} I_d(d_{ij}) ,$$
where $n$ is the number of points in $A$, $d_{ij}$ is the distance between points $i$ and $j$, $p_{ij}$ is the proportion of the circle with center $i$ and passing through $j$ that lies within $A$, and $I_d(d_{ij})$ equals 1 if $d_{ij} < d$, and 0 otherwise.
Compare this to, say, $K(d) = \pi d^2$, the theoretical value for nonspatial processes (that is, complete spatial randomness).
Clustered data would have larger K; uniformly spaced data would have a smaller K.
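A direct transcription of the estimator into code might look like the sketch below. As a simplifying assumption it sets every edge-correction weight $p_{ij}$ to 1 unless the caller supplies one; the `edge_weight` argument is a hypothetical hook for this example, and computing the true circle proportions is left out.

```python
import math


def ripley_k_hat(points, d, area, edge_weight=None):
    """Estimate K(d) as n^-2 |A| times the sum over i != j of I(d_ij < d) / p_ij.

    edge_weight(i, j) should return p_ij; if omitted, p_ij = 1 (no edge correction).
    """
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d_ij = math.hypot(xi - xj, yi - yj)
            if d_ij < d:
                p_ij = 1.0 if edge_weight is None else edge_weight(i, j)
                total += 1.0 / p_ij
    return area * total / n ** 2


def k_theoretical(d):
    """K(d) = pi d^2, the reference value under complete spatial randomness."""
    return math.pi * d ** 2
```

Comparing `ripley_k_hat(points, d, area)` with `k_theoretical(d)` over a range of `d` gives the clustered-versus-regular diagnostic described above; without edge correction the estimate will tend to be biased downward for points near the boundary of $A$.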
Spatial point process summary
A popular spatial add-on to the S+ package, S+SpatialStats, allows computation of K for any data set, as well as approximate 95% intervals for it.
Full inference likely requires use of the Splancs software, or perhaps a fully Bayesian approach along the lines of Wakefield and Morris (2001).
We consider only a fixed index set $D$, i.e., random observations at either points $s_i$ or areal units $B_i$; see
Diggle (2003)
Lawson and Denison (2002)
Møller and Waagepetersen (2004)
for recent treatments of spatial point processes and spatial cluster detection and modeling.
Fundamentals of Cartography
The earth is round! So (longitude, latitude) ≠ (x, y)!
A map projection is a systematic representation of all or part of the surface of the earth on a plane.
Theorem: The sphere cannot be flattened onto a plane without distortion.
Instead, use an intermediate surface that can be flattened. The sphere is first projected onto this developable surface, which is then laid out as a plane.
The three most commonly used surfaces are the cylinder, the cone, and the plane itself. Using different orientations of these surfaces leads to different classes of map projections...
Developable surfaces
Figure 4: Geometric constructions of projections
Sinusoidal projection
Writing (longitude, latitude) as $(\lambda, \theta)$, projections are
$$x = f(\lambda, \theta), \quad y = g(\lambda, \theta) ,$$
where $f$ and $g$ are chosen based upon properties our map must possess. This sinusoidal projection preserves area.
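The slide does not spell out $f$ and $g$ for the sinusoidal projection; the standard form is $x = R \lambda \cos\theta$, $y = R\theta$ with angles in radians. The sketch below implements that form. The function name and the mean earth radius of 6371 km are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius; not specified in the slides


def sinusoidal(lon_deg, lat_deg, radius=EARTH_RADIUS_KM):
    """Sinusoidal (equal-area) projection: x = R * lambda * cos(theta), y = R * theta."""
    lam = math.radians(lon_deg)    # longitude, lambda
    theta = math.radians(lat_deg)  # latitude, theta
    return radius * lam * math.cos(theta), radius * theta
```

For example, a fixed longitude difference maps to half the x-distance at latitude 60 degrees that it does at the equator, which is how the projection keeps areas in proportion as the meridians converge.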
Mercator projection
While no projection preserves distance (Gauss’ Theorema Egregium in differential geometry), this famous conformal (angle-preserving) projection distorts areas badly near the poles.
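For comparison, the standard spherical Mercator formulas are $x = R\lambda$ and $y = R \ln \tan(\pi/4 + \theta/2)$, with local scale factor $1/\cos\theta$. These formulas do not appear on the slide, so the sketch below is only an illustration under the same assumed radius as before; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius; not specified in the slides


def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_KM):
    """Spherical Mercator: x = R * lambda, y = R * ln(tan(pi/4 + theta/2))."""
    if abs(lat_deg) >= 90.0:
        raise ValueError("Mercator is undefined at the poles")
    lam = math.radians(lon_deg)
    theta = math.radians(lat_deg)
    return radius * lam, radius * math.log(math.tan(math.pi / 4 + theta / 2))


def mercator_scale(lat_deg):
    """Local linear scale factor 1 / cos(theta); it grows without bound toward the poles."""
    return 1.0 / math.cos(math.radians(lat_deg))
```

The scale factor is 1 at the equator and about 2 at latitude 60 degrees, so areas there are inflated roughly fourfold, which is the distortion the slide refers to.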
Calculation of geodesic distance
Consider two points on the surface of the earth, $P_1 = (\theta_1, \lambda_1)$ and $P_2 = (\theta_2, \lambda_2)$, where $\theta$ = latitude and $\lambda$ = longitude.
The geodesic distance we seek is $D = R\phi$, where
$R$ is the radius of the earth
$\phi$ is the angle subtended by the arc connecting $P_1$ and $P_2$ at the center
Calculation of geodesic distance (cont’d)
From elementary trig, we have
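$$x = R \cos\theta \cos\lambda, \quad y = R \cos\theta \sin\lambda, \quad \text{and} \quad z = R \sin\theta$$
Letting $\mathbf{u}_1 = (x_1, y_1, z_1)$ and $\mathbf{u}_2 = (x_2, y_2, z_2)$, we know
$$\cos\phi = \frac{\langle \mathbf{u}_1, \mathbf{u}_2 \rangle}{\|\mathbf{u}_1\| \, \|\mathbf{u}_2\|}$$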
Calculation of geodesic distance (cont’d)
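We then compute $\langle \mathbf{u}_1, \mathbf{u}_2 \rangle$ as
$$R^2 [\cos\theta_1 \cos\lambda_1 \cos\theta_2 \cos\lambda_2 + \cos\theta_1 \sin\lambda_1 \cos\theta_2 \sin\lambda_2 + \sin\theta_1 \sin\theta_2]$$
$$= R^2 [\cos\theta_1 \cos\theta_2 \cos(\lambda_1 - \lambda_2) + \sin\theta_1 \sin\theta_2] .$$
But $\|\mathbf{u}_1\| = \|\mathbf{u}_2\| = R$, so our final answer is
$$D = R\phi = R \arccos[\cos\theta_1 \cos\theta_2 \cos(\lambda_1 - \lambda_2) + \sin\theta_1 \sin\theta_2] .$$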
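The final formula translates almost line for line into code. The sketch below assumes coordinates given in degrees and a mean earth radius of 6371 km (the slides leave $R$ unspecified); the function name is hypothetical. The clamp on the cosine is an added safeguard against floating-point rounding pushing the argument of arccos slightly outside [-1, 1].

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius; not specified in the slides


def geodesic_distance(lat1_deg, lon1_deg, lat2_deg, lon2_deg, radius=EARTH_RADIUS_KM):
    """D = R * arccos[cos(t1) cos(t2) cos(l1 - l2) + sin(t1) sin(t2)]."""
    theta1, lam1 = math.radians(lat1_deg), math.radians(lon1_deg)
    theta2, lam2 = math.radians(lat2_deg), math.radians(lon2_deg)
    cos_phi = (math.cos(theta1) * math.cos(theta2) * math.cos(lam1 - lam2)
               + math.sin(theta1) * math.sin(theta2))
    cos_phi = max(-1.0, min(1.0, cos_phi))  # guard against rounding error
    return radius * math.acos(cos_phi)
```

As a sanity check, two points on the equator 90 degrees of longitude apart come out a quarter of the earth's circumference apart. For very short distances the arccos form loses precision, so other formulations may be preferable in that regime.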