
Jun 09, 2022


MODULE 5: Spatial Statistics in Epidemiology and Public Health

Lecture 3: Point Processes

Jon Wakefield and Lance Waller


Outline

- Preliminaries
- Random patterns
  - Heterogeneous Poisson process
- Estimating intensities
- Second order properties
  - K functions
  - Monte Carlo envelopes


References

- Baddeley, A., Rubak, E., and Turner, R. (2015) Spatial Point Patterns: Methodology and Applications in R. Boca Raton, FL: CRC/Chapman & Hall.

- Diggle, P.J. (1983) Statistical Analysis of Spatial Point Patterns. London: Academic Press.

- Diggle, P.J. (2013) Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, Third Edition. CRC/Chapman & Hall.

- Waller and Gotway (2004, Chapter 5) Applied Spatial Statistics for Public Health Data. New York: Wiley.

- Møller, J. and Waagepetersen (2004) Statistical Inference and Simulation for Spatial Point Processes. Boca Raton, FL: CRC/Chapman & Hall.


Goals

- Describe basic types of spatial point patterns.

- Introduce mathematical models for random patterns of events.

- Introduce analytic methods for describing patterns in observed collections of events.


Terminology

- Realization: An observed set of event locations (a data set).

- Point: Where an event could occur.

- Event: Where an event did occur.


Complete Spatial Randomness (CSR)

- Start with a model of “lack of pattern”.

- Events equally likely to occur anywhere in the study area (uniform distribution).

- Event locations independent of each other.


Six realizations of CSR

[Figure: six simulated realizations of CSR, one per panel, on the unit square; axes u and v, each from 0.0 to 1.0.]


CSR as a boundary condition

CSR serves as a boundary between:

- Patterns that are more “clustered” than CSR.

- Patterns that are more “regular” than CSR.


Too Clustered (top), Too Regular (bottom)

[Figure: six point patterns on the unit square (axes u and v, each from 0.0 to 1.0); three panels titled "Clustered" (top row) and three panels titled "Regular" (bottom row).]


Spatial Point Processes

- Mathematically, we treat our point patterns as realizations of a spatial stochastic process.

- A stochastic process is a collection of random variables $X_1, X_2, \ldots, X_N$.

- Example: the number of people in line at a grocery store.

- For us, each random variable represents an event location.


CSR as a Stochastic Process

Let N(A) = number of events observed in region A, and λ = a positive constant.

A homogeneous spatial Poisson point process is defined by:

(a) N(A) ∼ Pois(λ|A|), where |A| is the area of A;

(b) given N(A) = n, the locations of the events are uniformly distributed over A.

λ is the intensity of the process (mean number of events expected per unit area).


Is this CSR?

- Criteria (a) and (b) give a “recipe” for simulating realizations of this process:
  * Generate a Poisson random number of events.
  * Distribute that many events uniformly across the study area:

    runif(n, min(x), max(x))
    runif(n, min(y), max(y))
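The two-step recipe can be sketched in base R as follows. This is a minimal illustration assuming a rectangular study area; the function name `simulate_csr` and its arguments are hypothetical and do not come from the lecture.

```r
# Sketch: simulate one realization of CSR on a rectangle.
# simulate_csr, lambda, xrange and yrange are illustrative names (assumptions).
simulate_csr <- function(lambda, xrange = c(0, 1), yrange = c(0, 1)) {
  area <- diff(xrange) * diff(yrange)
  # (a) Poisson number of events with mean lambda * |A|
  n <- rpois(1, lambda * area)
  # (b) given n, locations are independent and uniform over the rectangle
  data.frame(u = runif(n, xrange[1], xrange[2]),
             v = runif(n, yrange[1], yrange[2]))
}

set.seed(1)
pattern <- simulate_csr(lambda = 100)
nrow(pattern)   # a Poisson draw with mean 100, so it varies with the seed
```

Calling `simulate_csr()` repeatedly gives independent realizations, which is what a Monte Carlo comparison against CSR needs.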


Monte Carlo testing

Let T = a random variable representing a test statistic (some numerical summary of the observed data).

What is the distribution of T under H0? Call it F0.

1. Compute t1 from the observed data.

2. Simulate t2, ..., tm under H0; these values will follow F0.

3. p-value = (rank of t1 among t1, ..., tm) / m, ranking from the most extreme value.

Monte Carlo (M.C.) tests are useful in spatial statistics since we can simulate spatial patterns and calculate the statistics.

Example: 592 leukemia cases in ∼ 790 regions...
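A Monte Carlo test of CSR might be sketched as below. The test statistic (mean nearest-neighbour distance), the unit-square study area, and the function names are all assumptions chosen for illustration; the lecture does not prescribe a particular statistic. The simulations condition on the observed number of events n.

```r
# Sketch of a Monte Carlo test of CSR on the unit square.
# mean_nn_dist and mc_test_csr are hypothetical helper names.
mean_nn_dist <- function(u, v) {
  d <- as.matrix(dist(cbind(u, v)))
  diag(d) <- Inf                    # ignore each event's distance to itself
  mean(apply(d, 1, min))            # average nearest-neighbour distance
}

mc_test_csr <- function(u, v, m = 200) {
  n <- length(u)
  t1 <- mean_nn_dist(u, v)                                      # step 1
  t_sim <- replicate(m - 1, mean_nn_dist(runif(n), runif(n)))   # step 2
  # Small values of the statistic indicate clustering, so rank from the smallest.
  rank_t1 <- 1 + sum(t_sim <= t1)
  list(statistic = t1, p.value = rank_t1 / m)                   # step 3
}

set.seed(2)
# A tightly clustered pattern around the centre of the unit square.
u <- rnorm(50, mean = 0.5, sd = 0.03)
v <- rnorm(50, mean = 0.5, sd = 0.03)
res <- mc_test_csr(u, v, m = 200)
res$p.value
```

Ranking from the smallest value makes this a one-sided test against clustering; a test against regularity would rank from the largest value instead.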


Moving beyond CSR

CSR:

1. is the “white noise” of spatial point processes.

2. characterizes the absence of structure (signal) in data.

3. often the null hypothesis in statistical tests to determine if there is clustering in an observed point pattern.

4. not as useful in public health? Why not?


Heterogeneous population density

[Figure: two point patterns on the unit square; axes u and v, each from 0.0 to 1.0.]


Heterogeneous Poisson Process

What if λ, the intensity of the process (mean number of events expected per unit area), varies by location?

1. $N(A) \sim \mathrm{Pois}\left(\int_{s \in A} \lambda(s)\, ds\right)$ (recall $|A| = \int_{s \in A} ds$, so a constant λ gives mean λ|A|).

2. Given N(A) = n, events are distributed in A as an independent sample from a distribution on A with p.d.f. proportional to λ(s).

We still have counts from areas ∼ Poisson, and events are distributed in proportion to the intensity.
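One common way to simulate such a process is thinning: generate a homogeneous process at the maximum intensity, then keep each event with probability λ(s)/λmax. The lecture does not name this method; the sketch below uses it as an assumption, and the intensity function `lambda_example` and all function names are hypothetical.

```r
# Sketch: simulate a heterogeneous Poisson process on a rectangle by thinning.
# simulate_hpp and lambda_example are illustrative names (assumptions).
simulate_hpp <- function(lambda_fun, lambda_max,
                         xrange = c(0, 1), yrange = c(0, 1)) {
  area <- diff(xrange) * diff(yrange)
  n <- rpois(1, lambda_max * area)          # homogeneous process at lambda_max
  u <- runif(n, xrange[1], xrange[2])
  v <- runif(n, yrange[1], yrange[2])
  # keep each candidate event with probability lambda(s) / lambda_max
  keep <- runif(n) < lambda_fun(u, v) / lambda_max
  data.frame(u = u[keep], v = v[keep])
}

# Hypothetical intensity that increases linearly in u (maximum 200 at u = 1).
lambda_example <- function(u, v) 200 * u

set.seed(3)
pts <- simulate_hpp(lambda_example, lambda_max = 200)
mean(pts$u)   # events concentrate toward larger u
```

The method requires `lambda_max` to bound λ(s) over the whole study area; otherwise the retention probability exceeds one and the realized intensity is wrong.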


Example intensity function

[Figure: example intensity function in two panels; one with axes u, v, and lambda, and one with axes u and v, each from 0 to 20.]


Six realizations

[Figure: six realizations, one per panel; axes u and v, each from 0 to 20.]


IMPORTANT FACT!

Without additional information, no analysis can differentiate between:

1. independent events in a heterogeneous (non-stationary) environment

2. dependent events in a homogeneous (stationary) environment


How do we estimate intensities?

Kernel estimators provide a natural approach (Silverman (1986) and Wand and Jones (1995, KernSmooth R library)).

Main idea: Put a little “kernel” of density at each data point, then sum to give the estimate of the overall density function.
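The main idea can be sketched in base R with a product Gaussian kernel. This is an illustration under stated assumptions: the function name `kernel_intensity`, the Gaussian kernel, and the bandwidth value are choices made here, and no edge correction is applied. Because the kernels are summed rather than averaged, the surface integrates to roughly the number of events, so it estimates an intensity rather than a density.

```r
# Sketch: kernel intensity estimate on a grid, product Gaussian kernel.
# kernel_intensity is a hypothetical helper; bandwidth is the kernel's s.d.
kernel_intensity <- function(u, v, grid_u, grid_v, bandwidth) {
  # ku[i, k] = kernel centred at event k, evaluated at grid_u[i]
  ku <- outer(grid_u, u, function(g, x) dnorm(g, mean = x, sd = bandwidth))
  kv <- outer(grid_v, v, function(g, x) dnorm(g, mean = x, sd = bandwidth))
  # entry [i, j] sums the product kernel over all events
  ku %*% t(kv)
}

set.seed(4)
u <- runif(50)
v <- runif(50)
grid <- seq(-0.5, 1.5, by = 0.02)     # padded so the kernels are not truncated
lambda_hat <- kernel_intensity(u, v, grid, grid, bandwidth = 0.1)
total <- sum(lambda_hat) * 0.02^2     # Riemann sum, close to the 50 events
total
```

A larger bandwidth gives a smoother surface; a smaller one follows individual events more closely.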


Kernels and bandwidths

[Figure: four panels of kernel estimates along s (0.0 to 1.0), titled "Kernel variance = 0.02", "Kernel variance = 0.03", "Kernel variance = 0.04", and "Kernel variance = 0.1".]


Kernel estimation in R

base

- density(): one-dimensional kernel density estimation

library(KernSmooth)

- bkde2D(x, bandwidth, gridsize=c(51, 51), range.x=<<see below>>, truncate=TRUE): binned kernel density estimation

library(splancs)

- kernel2d(pts, poly, h0, nx=20, ny=20, kernel='quartic')

library(spatstat)

- ksmooth.ppp(x, sigma, weights, edge=TRUE) (an older function name; later spatstat releases provide this kernel smoother as density.ppp)
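As a small illustration of the bandwidth's effect, the base R function `density()` can be run on the same data with two bandwidths. The data values below are made up for the example. Note that `bw` in `density()` is the standard deviation of the kernel, not its variance.

```r
# Sketch: effect of bandwidth on a one-dimensional kernel estimate.
# The data vector x is hypothetical.
x <- c(0.20, 0.25, 0.30, 0.70)

narrow <- density(x, bw = 0.02)   # small bandwidth: spiky estimate
wide   <- density(x, bw = 0.10)   # large bandwidth: smooth estimate

# The narrow-bandwidth estimate has the higher peak.
c(narrow = max(narrow$y), wide = max(wide$y))
```

Plotting both with `plot(narrow)` and `lines(wide)` shows the same trade-off between detail and smoothness as the panels in the previous slide.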


Data Break: Early Medieval Grave Sites

- Alt and Vach (1991). (Data from Richard Wright, Emeritus Professor, School of Archaeology, University of Sydney.)

- Archaeological dig in Neresheim, Baden-Württemberg, Germany.

- Question: are graves placed according to family units?

- 143 grave sites, 30 with missing or reduced wisdom teeth.

- Could intensity estimates for grave sites with and without missing or reduced wisdom teeth help answer this question?


Plot of the data

[Figure: "Grave locations (* = grave, O = affected)"; axes u and v, each from 4000 to 10000.]


Case intensity

[Figure: two panels. "Estimated intensity function" (axes u, v, Intensity) and "Affected grave locations" (axes u and v, 4000 to 8000).]


Control intensity

[Figure: two panels. "Estimated intensity function" (axes u, v, Intensity) and "Non-affected grave locations" (axes u and v, each from 4000 to 10000).]


What we have/don’t have

- Kernel estimates suggest where there might be differences.

- No significance testing (yet!)


First and Second Order Properties

- The intensity function describes the mean number of events per unit area, a first order property of the underlying process.

- What about second order properties relating to the variance/covariance/correlation between event locations (if events are not independent...)?


Ripley’s K function

Ripley (1976, 1977) introduced the reduced second moment measure, or K function,

$$K(h) = \frac{E[\#\text{ events within } h \text{ of a randomly chosen event}]}{\lambda},$$

for any positive spatial lag h.

- Under CSR, $K(h) = \pi h^2$ (the area of a circle with radius h).

- Clustered? $K(h) > \pi h^2$.

- Regular? $K(h) < \pi h^2$.
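The definition translates directly into a naive estimator: count the events within h of each event, average, and divide by the estimated intensity. The base R sketch below assumes a unit-square study area and applies no edge correction, so it slightly underestimates K(h); the function name `khat_naive` is hypothetical and is not the estimator used by splancs or spatstat.

```r
# Sketch: naive K function estimate without edge correction.
# khat_naive is an illustrative helper name (assumption).
khat_naive <- function(u, v, h, area = 1) {
  n <- length(u)
  d <- as.matrix(dist(cbind(u, v)))
  diag(d) <- Inf                      # an event is not its own neighbour
  lambda_hat <- n / area              # estimated intensity
  # mean number of other events within distance r, divided by the intensity
  sapply(h, function(r) sum(d <= r) / n / lambda_hat)
}

set.seed(5)
u <- runif(500)
v <- runif(500)
h <- c(0.02, 0.05, 0.10)
k_hat <- khat_naive(u, v, h)
ratio <- k_hat / (pi * h^2)           # near 1 under CSR, a bit below due to edges
ratio
```

Events near the boundary have part of their disc outside the study area, which is why the packaged estimators take the boundary as an input.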


Calculating K (h) in R

library(splancs)

- khat(pts, poly, s, newstyle=FALSE)

- poly defines polygon boundary (important!!!).

library(spatstat)

- Kest(X, r, correction=c("border", "isotropic", "Ripley", "translate"))

- Boundary part of X (point process “object”).


Plots with K (h)

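- Under CSR, the plot of (h, K(h)) is a parabola.

- $K(h) = \pi h^2$ implies $\left(K(h)/\pi\right)^{1/2} = h$.

- Besag (1977) suggests plotting h versus $\hat{L}(h)$, where

$$\hat{L}(h) = \left(\frac{\hat{K}_{ec}(h)}{\pi}\right)^{1/2} - h$$

and $\hat{K}_{ec}(h)$ is the edge-corrected estimate of $K(h)$. Under CSR, $\hat{L}(h)$ should be close to 0 at every h.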
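The transformation is a one-line function in R. The sketch below applies it to the theoretical CSR value K(h) = πh², for which the result is zero at every distance; the function name `l_hat` is hypothetical.

```r
# Sketch: the L transformation applied to K values.
# l_hat is an illustrative helper name (assumption).
l_hat <- function(k_hat, h) sqrt(k_hat / pi) - h

h <- seq(0.01, 0.25, by = 0.01)
l_csr <- l_hat(pi * h^2, h)   # theoretical CSR value of K(h)
range(l_csr)                  # zero, up to floating-point rounding
```

Plotting against a horizontal reference line at zero is easier to read than comparing a curve against a parabola, and clustering shows up as values above zero.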


Monte Carlo Variability and Envelopes

- Observe K(h) from data.

- Simulate a realization of events from CSR.

- Find K(h) for the simulated data.

- Repeat simulations many times.

- Create simulation “envelopes” from simulation-based K(h)’s.
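These steps can be sketched in base R as follows. The sketch assumes a unit-square study area, conditions on the observed number of events, and uses a naive estimator without edge correction for both the observed and the simulated patterns; the function names `l_naive` and `csr_envelope` are hypothetical. The envelope here is the pointwise minimum and maximum over the simulations.

```r
# Sketch: Monte Carlo envelopes for the L plot under CSR on the unit square.
# l_naive and csr_envelope are illustrative helper names (assumptions).
l_naive <- function(u, v, h) {
  n <- length(u)
  d <- as.matrix(dist(cbind(u, v)))
  diag(d) <- Inf
  k <- sapply(h, function(r) sum(d <= r)) / n^2   # naive K with area = 1
  sqrt(k / pi) - h
}

csr_envelope <- function(n, h, nsim = 99) {
  # one column per simulated CSR pattern, one row per distance
  sims <- matrix(replicate(nsim, l_naive(runif(n), runif(n), h)),
                 nrow = length(h))
  list(lower = apply(sims, 1, min), upper = apply(sims, 1, max))
}

set.seed(6)
h <- seq(0.02, 0.20, by = 0.02)
# "Observed" data: a clustered pattern, made up for the example.
u <- rnorm(60, mean = 0.5, sd = 0.05)
v <- rnorm(60, mean = 0.5, sd = 0.05)
l_obs <- l_naive(u, v, h)
env <- csr_envelope(n = 60, h = h)
outside <- l_obs < env$lower | l_obs > env$upper
outside   # distances at which the observed curve leaves the envelope
```

Using the same estimator for the data and the simulations means any edge bias affects both in the same way, so the comparison remains fair even without a correction.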


Example: Regular clusters and clusters of regularity

[Figure: two panels plotting sqrt(Khat/pi) − h against Distance (h), from 0.0 to 0.7: "Estimated K function, regular pattern of clusters" and "Estimated K function, cluster of regular patterns".]


Data break: Medieval graves: K functions with polygon adjustment

[Figure: three panels plotting Lhat − distance against Distance (0 to 5000): "L plot for all gravesites, polygon", "L plot for affected gravesites, polygon", and "L plot for non-affected gravesites, polygon".]


Clustering?

- Clustering of cases at the very shortest distances.

- Likely due to two coincident-pair sites (both cases in both pairs).

- Envelopes based on random samples of 30 “cases” from the set of 143 locations.


Notes

- First and second moments do not uniquely define a distribution, and λ(s) and K(h) do not uniquely define a spatial point pattern (Baddeley and Silverman 1984, and in Section 5.3.4).

- Analyses based on λ(s) typically assume independent events.

- Analyses based on K(h) typically assume a stationary process (with constant λ).

- Remember IMPORTANT FACT! above.


What questions can we answer?

- Are events uniformly distributed in space?
  - Test CSR.

- If not, where are events more or less likely?
  - Intensity estimation.

- Do events tend to occur near other events, and, if so, at what scale?
  - K functions with Monte Carlo envelopes.
