Measuring spatial clustering in disease patterns

Peter Congdon, Queen Mary University of London
http://www.geog.qmul.ac.uk/staff/congdonp.html
http://webspace.qmul.ac.uk/pcongdon/

Feb 23, 2016

Page 1: Measuring spatial clustering in disease patterns.

Measuring spatial clustering in disease patterns

Peter Congdon, Queen Mary University of London

http://www.geog.qmul.ac.uk/staff/congdonp.html
http://webspace.qmul.ac.uk/pcongdon/

Page 2: Measuring spatial clustering in disease patterns.

Background: spatial correlation

Tobler’s First Law of Geography: “All places are related but nearby places are more related than distant places”.

Spatial correlation: values are more similar in nearby spatial units than in more distant units.

A common feature of geographically configured datasets (spatial econometrics, area health, political science, etc.).

Correlation can be positive or negative, but positive correlation is most common.

Spatial correlation indices measure correlation while also accounting for the distance between (or contiguity of) spatial units.

Reference (null) pattern: spatial randomness. Values observed at one location do not depend on values observed at neighbouring locations.

Page 3: Measuring spatial clustering in disease patterns.

Background: spatial heterogeneity

Michael Goodchild, in “Challenges in geographical information science” (Proc RSA, 2011), mentions a second principle of spatial data: spatial heterogeneity.

One example of such heterogeneity is local variation in the degree of spatial dependence, which leads to local indices of spatial association (LISA measures).

Page 4: Measuring spatial clustering in disease patterns.

Background: observation types

My focus is on spatial lattice data: N areal subdivisions (e.g. administrative areas) which, taken together, constitute the entire study region.

This differs from point data (e.g. mineral readings in geostatistics), where the major focus is on interpolating a response between observed locations.

Page 5: Measuring spatial clustering in disease patterns.

Global Indices of Spatial Association

Moran Index (for N areas, and continuous centred data $Z_i$)

Page 6: Measuring spatial clustering in disease patterns.

Spatial Weights

Possible options for spatial weights $W = [w_{ij}]$:

Adjacency/contiguity: $w_{ij} = 1$ if area j is adjacent to area i; otherwise $w_{ij} = 0$.

Distance-based: $w_{ij}$ is a function of distance, such as the inverse distance between locations i and j, $w_{ij} = 1/d_{ij}$.
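The Moran index formula itself did not survive in this transcript. As a rough sketch, assuming the standard definition I = (N / sum of all weights) * (sum over i,j of w_ij Z_i Z_j) / (sum over i of Z_i^2), it can be computed as below. The function name `morans_i` and the four-area example are hypothetical and are not taken from the slides.

```python
def morans_i(values, weights):
    """Global Moran's I for area values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # centred data Z_i
    s0 = sum(sum(row) for row in weights)     # sum of all weights w_ij
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical example: four areas in a row (1-2-3-4), binary contiguity weights.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

smooth_trend = [1.0, 2.0, 3.0, 4.0]   # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]    # neighbours always differ

print(morans_i(smooth_trend, W))   # positive spatial correlation
print(morans_i(alternating, W))    # negative spatial correlation
```

With contiguity weights, the index is positive when adjacent areas tend to fall on the same side of the mean and negative when they tend to fall on opposite sides.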

Page 7: Measuring spatial clustering in disease patterns.

Global Indices of Spatial Association: Binary data

Page 8: Measuring spatial clustering in disease patterns.

Global Indices of Spatial Association: Binary data
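The formulas on these two slides are not reproduced in the transcript. The sketch below assumes the usual Cliff and Ord style join counts for binary indicators b_i: BB = (1/2) sum w_ij b_i b_j, WW = (1/2) sum w_ij (1-b_i)(1-b_j) and BW = (1/2) sum w_ij (b_i - b_j)^2. The function name and the example data are hypothetical.

```python
def global_join_counts(b, weights):
    """Return (BB, WW, BW) join counts for binary indicators b."""
    n = len(b)
    bb = ww = bw = 0.0
    for i in range(n):
        for j in range(n):
            w = weights[i][j]
            bb += w * b[i] * b[j]                # both areas coded 1
            ww += w * (1 - b[i]) * (1 - b[j])    # both areas coded 0
            bw += w * (b[i] - b[j]) ** 2         # areas differ (either order)
    # Each join is visited twice with a symmetric weight matrix, so halve.
    return bb / 2, ww / 2, bw / 2


# Hypothetical example: four areas in a row with binary contiguity weights.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

clustered = [1, 1, 0, 0]
alternating = [1, 0, 1, 0]

print(global_join_counts(clustered, W))
print(global_join_counts(alternating, W))
```

For symmetric binary weights the three counts add up to the total number of joins (three in this example), so a clustered pattern shifts joins from BW into BB and WW.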

Page 9: Measuring spatial clustering in disease patterns.

Background: Area health data and spatial correlation

Health data with full population coverage (as opposed to survey data) are often only available for geographic aggregates.

These may be small neighbourhoods, such as English lower super output areas (LSOAs), with an average population of 1500-2000.

Small area units (with relatively homogeneous social structure, physical environment and other exposures) are preferable for epidemiological inference because they reduce ecological bias.

Page 10: Measuring spatial clustering in disease patterns.

Background: Area health data and spatial correlation

Examples of area health data (e.g. for electoral wards, LSOAs): mortality data by cause, cancer incidence data, health prevalence data

Spatial correlation in area health outcomes reflects clustering in risk factors (observed and unobserved), such as deprivation/affluence, health behaviours, environmental factors, neighbourhood social capital, etc

Page 11: Measuring spatial clustering in disease patterns.

Bayesian Relative Risk Models for Area Spatial Data

Bayesian models for area disease risks are now widely applied (e.g. to detect a smooth underlying risk surface over the region).

Assume observed disease counts $y_i$ are Poisson distributed: $y_i \sim \mathrm{Po}(e_i r_i)$, where $e_i$ are expected counts.

Relative risks $r_i$ have average 1 when sum(expected) = sum(observed).

Expected counts (in the demographic sense) are based on applying region-wide disease rates to each small area population.
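A minimal sketch of how expected counts of this kind can be obtained, assuming a single region-wide crude rate with no age or sex standardisation (the slides do not say which form was used). The case and population figures are invented for illustration.

```python
def expected_counts(cases, populations):
    """Expected counts from applying the region-wide rate to each area population."""
    region_rate = sum(cases) / sum(populations)
    return [region_rate * pop for pop in populations]


# Hypothetical data for three small areas.
cases = [12, 30, 18]
populations = [1000, 2000, 3000]

e = expected_counts(cases, populations)
crude_rr = [y / ei for y, ei in zip(cases, e)]

# The e-weighted average of crude relative risks equals sum(y) / sum(e).
weighted_mean_rr = sum(ei * rr for ei, rr in zip(e, crude_rr)) / sum(e)

print(e, crude_rr, weighted_mean_rr)
```

Because the region-wide rate is estimated from the same data, the expected counts sum to the observed total, which is what makes the relative risks average 1 in the expected-count-weighted sense.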

Page 12: Measuring spatial clustering in disease patterns.

Bayesian Relative Risk Models for Area Spatial Data

One option for modelling area relative risks is the convolution scheme (Besag et al, 1991):

$\log(r_i) = \alpha + s_i + u_i$

Spatial error: $s_i \sim$ Conditional Autoregressive (CAR) prior

Heterogeneity/overdispersion error: $u_i \sim$ unstructured normal white noise

Page 13: Measuring spatial clustering in disease patterns.

13

Neighbourhood Clustering in Elevated Risk

Consider binary risk measures: bi=1 if relative risk ri>1, bi=0 otherwise. These binary indicators are latent (unknown) as ri are

latent. Can use other thresholds (e.g. ri>1.5) Interest often in posterior exceedance probabilities

of elevated disease risk Ei=Pr(ri>1|y)=Prob(bi=1|y) in each area separately. Possible rules: area i a hotspot if Ei > 0.9 or if Ei>0.8.

Suitable threshold that Ei must exceed may depend on data frequency (higher thresholds can be set for more frequent data)
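In practice the exceedance probability is estimated as the proportion of posterior draws of r_i that exceed the threshold. The sketch below assumes a list of sampled relative risks for one area is already available. The draws, function names and cut-offs shown are illustrative only.

```python
def exceedance_probability(draws, threshold=1.0):
    """Proportion of posterior draws of r_i above the threshold (mean of b_i)."""
    return sum(1 for r in draws if r > threshold) / len(draws)


def is_hotspot(exceedance, cutoff):
    """Hotspot rule: exceedance probability strictly above the chosen cutoff."""
    return exceedance > cutoff


# Hypothetical posterior draws of the relative risk for a single area.
draws = [0.95, 1.10, 1.30, 1.20, 1.02, 1.40, 1.05, 1.25, 1.15, 1.35]

e_i = exceedance_probability(draws)           # threshold r_i > 1
e_i_strict = exceedance_probability(draws, 1.5)

print(e_i, is_hotspot(e_i, 0.8), is_hotspot(e_i, 0.9), e_i_strict)
```

The same area can be a hotspot under the 0.8 rule and not under the 0.9 rule, which is why the choice of cut-off matters when data are sparse.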

Page 14: Measuring spatial clustering in disease patterns.

Page 15: Measuring spatial clustering in disease patterns.

Neighbourhood Clustering in Elevated Risk

“Hotspot” detection does not measure broader local clustering in relative risks. There can be both high risk and low risk clusters.

High risk cluster centre: area i is embedded in a high risk cluster if both area i and all surrounding areas j have elevated risk ($E_i$ and $E_j$ both high).

By contrast, high risk outlier: area i is high risk ($E_i$ high), but all adjacent areas j are low risk ($E_j$ low).

Also, cluster edge area: area i is high risk ($E_i$ high), but adjacent areas j are a mix of high and low risk.

Page 16: Measuring spatial clustering in disease patterns.

Neighbourhood Clustering in Elevated Risk

Low risk cluster centre: area i is embedded in a low risk cluster, with both area i and surrounding areas j having low risk ($E_i$ and $E_j$ both low).

By contrast, low risk outlier: area i is low risk ($E_i$ low), but all adjacent areas are high risk.

Page 17: Measuring spatial clustering in disease patterns.

Spatial Scan Clusters

The best known approach to spatial clustering of lattice data is the spatial scan method, which produces lists of areas in a cluster at a given significance level, e.g. under a Poisson model for the data.

Spatial scan: a circle (or ellipse) of varying size systematically scans the study region (moving window).

Each geographic unit (e.g. census tract, LSOA) is a potential cluster centre.

Clusters are reported for those circles (or other area shapes) where total observed values within the circle are greater than expected values.

Page 18: Measuring spatial clustering in disease patterns.

Stochastic Approach to Measuring Clustering in Elevated Risk

The method to be described provides a measure of cluster status for each area when relative health risks $r_i$ (and binary health status $b_i$) are unknowns.

It can be considered a method of cluster detection that is included in MCMC updating.

It covers high risk and low risk clustering in a single perspective, and also encompasses outliers (isolated high or low risk hotspots).

Page 19: Measuring spatial clustering in disease patterns.

Synthetic Data

Known adjacency structure: 113 middle level super output areas (MSOAs) in Outer NE London.

15 out of 113 areas have high RR ($r_i$ circa 1.75). The remainder have below average RR ($r_i$ circa 0.9).

The high risk areas are located in three high risk clusters.

The $y_i$ and $e_i$ are known, and hence so are the crude relative risks ($y_i/e_i$), but whether the (latent) RRs are significantly elevated depends on the amount of information in the data (data frequency).

Page 20: Measuring spatial clustering in disease patterns.

Synthetic Data

Assess $E_i$ and $b_i$ (using the Besag et al convolution model) under two levels of expected cases: $e_i = 20.39$ or $e_i = 58.77$.

For $e_i = 20.39$, the $y_i$ are either 18 or 36 (to ensure the sums of observed and expected counts are the same).

For $e_i = 58.77$, the $y_i$ are either 52 or 103.
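The arithmetic behind these two scenarios can be checked directly. The sketch assumes that the 15 high risk areas receive the larger count and the remaining 98 areas the smaller count, which is how the slides describe the design. The function and variable names are illustrative.

```python
N_HIGH, N_LOW = 15, 98   # 15 high risk areas out of 113


def scenario_summary(e, y_low, y_high):
    """Totals and crude relative risks for one synthetic data scenario."""
    return {
        "total_observed": N_HIGH * y_high + N_LOW * y_low,
        "total_expected": (N_HIGH + N_LOW) * e,
        "crude_rr_high": y_high / e,
        "crude_rr_low": y_low / e,
    }


low_freq = scenario_summary(20.39, 18, 36)
high_freq = scenario_summary(58.77, 52, 103)

for label, s in (("e=20.39", low_freq), ("e=58.77", high_freq)):
    print(label, s["total_observed"], round(s["total_expected"], 2),
          round(s["crude_rr_high"], 2), round(s["crude_rr_low"], 2))
```

Under that assumption the observed totals are 2304 and 6641, against expected totals of 2304.07 and 6641.01, and the crude relative risks are about 1.75 to 1.77 in high risk areas and 0.88 elsewhere.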

Page 21: Measuring spatial clustering in disease patterns.

Synthetic Data. Average e=20.39, Known RRs

Page 22: Measuring spatial clustering in disease patterns.

Local Join-Counts to Detect Clustering in Relative Disease Risk

As mentioned above, global join counts (BB-WW-BW) measure global spatial clustering in binary risk indicators bi (note BW statistic combines two types of discrepancy)

To detect local clustering in risk (or outlier status), use local versions of global BB statistics.

Page 23: Measuring spatial clustering in disease patterns.

Local Join-Counts to describe local clustering

Local version of BB statistic: summation only over neighbours of area i (not double summation)

$J_{11i} = b_i \sum_j w_{ij} b_j$

$w_{ij}$ is either distance based or contiguity based ($w_{ij} = 1$ if areas i and j are adjacent, $w_{ij} = 0$ otherwise).

$J_{11i}$ measures high risk “cluster embeddedness”, or high risk cluster centre status. $J_{11i}$ will be high for areas surrounded by other high risk areas.

Page 24: Measuring spatial clustering in disease patterns.

Local Join-Counts to describe local clustering

Local version of BW statistic:

$J_{10i} = b_i \sum_j w_{ij} (1 - b_j)$

Measures high risk outlier status: when area i has elevated risk, but all neighbours have low risk.

Also tends to increase for high risk cluster edges: area i has elevated risk, but many neighbours have low risk.

Page 25: Measuring spatial clustering in disease patterns.

Local Join-Counts for low risk clustering

Local version of WW statistic:

$J_{00i} = (1 - b_i) \sum_j w_{ij} (1 - b_j)$

High when area i and its neighbours both have low risk.

Finally, local WB statistic, which measures the situation of a low risk area that is discrepant from its neighbours:

$J_{01i} = (1 - b_i) \sum_j w_{ij} b_j$

Page 26: Measuring spatial clustering in disease patterns.

Local Join-Counts under Binary Spatial Weights

Consider binary weights $w_{ij}$.

Denote the areas adjacent to area i as its “neighbourhood” $N_i$.

$L_i$ = number of areas adjacent to area i (number of areas in neighbourhood $N_i$).

The common high risk joins formula (local BB count) is now $J_{11i} = b_i \sum_{j \in N_i} b_j$.

Local BW count: $J_{10i} = b_i \sum_{j \in N_i} (1 - b_j)$

Also:

$J_{01i} = (1 - b_i) \sum_{j \in N_i} b_j$

$J_{00i} = (1 - b_i) \sum_{j \in N_i} (1 - b_j)$

Page 27: Measuring spatial clustering in disease patterns.

Local Join-Counts under Binary Spatial Weights

Simple to show (and self-evident):

$L_i = J_{11i} + J_{10i} + J_{01i} + J_{00i}$

Multinomial sampling: the denominators $L_i$ are known, but $\{J_{11i}, J_{10i}, J_{01i}, J_{00i}\}$ are unknowns in a modelling situation where the relative disease risks $r_i$ and risk indicators $b_i$ are unknowns.
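For a single set of indicators b_i (for instance one MCMC iteration), the four local join counts under binary weights can be computed as sketched below. The five-area adjacency structure and the indicator values are hypothetical, and areas are indexed from 0 for convenience.

```python
def local_join_counts(b, neighbours):
    """Local join counts J11, J10, J01, J00 and L_i for each area.

    b          : list of 0/1 risk indicators
    neighbours : neighbours[i] lists the indices of areas adjacent to area i
    """
    counts = []
    for i, nbrs in enumerate(neighbours):
        high = sum(b[j] for j in nbrs)    # neighbours with b_j = 1
        low = len(nbrs) - high            # neighbours with b_j = 0
        counts.append({
            "J11": b[i] * high,
            "J10": b[i] * low,
            "J01": (1 - b[i]) * high,
            "J00": (1 - b[i]) * low,
            "L": len(nbrs),
        })
    return counts


# Hypothetical adjacency lists for five areas (symmetric by construction).
neighbours = [[1, 2, 3], [0, 2], [0, 1, 3], [0, 2, 4], [3]]
b = [1, 1, 1, 0, 0]

counts = local_join_counts(b, neighbours)
for i, c in enumerate(counts):
    print(i, c)
```

For each area only two of the four counts can be non-zero (those matching its own indicator), and the four counts always add up to L_i.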

Page 28: Measuring spatial clustering in disease patterns.

Probabilities of Local Clustering

Proportion $\pi_{11i}$ of joins representing joint high risk, defined by

$E(J_{11i}) = L_i \pi_{11i}$

Estimate during the MCMC run (with $J_{11i}$ and $b_i$ varying by iteration) as

$\pi_{11i} = J_{11i}/L_i = b_i \sum_{j \in N_i} b_j / L_i$

$\pi_{11i}$ estimates the probability that area i is a member of a high risk cluster.

As $\pi_{11i} \to E_i$, area i is likely to be a cluster centre.

The term $\sum_{j \in N_i} b_j / L_i \to 1$ when all adjacent areas have definitive high risk.

Page 29: Measuring spatial clustering in disease patterns.

Probabilities of Local Clustering

Proportion $\pi_{10i}$ of local joins that are (1,0) pairs, defined by

$E(J_{10i}) = L_i \pi_{10i}$

Estimates the probability that area i is a high risk local outlier.

Estimate during the MCMC run:

$\pi_{10i} = J_{10i}/L_i = b_i \sum_{j \in N_i} (1 - b_j) / L_i$

Page 30: Measuring spatial clustering in disease patterns.

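Decomposition of Exceedance Probability

Can show that $E_i = \Pr(r_i > 1 \mid y) = \pi_{11i} + \pi_{10i}$.

Have

$J_{11i} + J_{10i} = b_i \sum_{j \in N_i} b_j + b_i \sum_{j \in N_i} (1 - b_j) = b_i L_i$

so that

$E(J_{11i}) + E(J_{10i}) = E(b_i) L_i = E_i L_i$

Also, by definition,

$E(J_{11i}) + E(J_{10i}) = L_i \pi_{11i} + L_i \pi_{10i}$

Equating the two expressions and dividing by $L_i$ gives $E_i = \pi_{11i} + \pi_{10i}$.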
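The decomposition can be illustrated by averaging the per-iteration proportions over a set of sampled indicator vectors. The sketch below assumes the draws of b are already available as lists of 0/1 values. The three-area layout and the four draws are made up for illustration.

```python
def cluster_probabilities(b_draws, neighbours):
    """Monte Carlo estimates of E_i, pi11_i and pi10_i from draws of b."""
    n_iter = len(b_draws)
    n_area = len(neighbours)
    exceed = [0.0] * n_area
    pi11 = [0.0] * n_area
    pi10 = [0.0] * n_area
    for b in b_draws:
        for i, nbrs in enumerate(neighbours):
            size = len(nbrs)                      # L_i
            high = sum(b[j] for j in nbrs)        # high risk neighbours
            exceed[i] += b[i] / n_iter
            pi11[i] += b[i] * high / size / n_iter
            pi10[i] += b[i] * (size - high) / size / n_iter
    return exceed, pi11, pi10


# Hypothetical layout: three areas in a row, and four sampled indicator vectors.
neighbours = [[1], [0, 2], [1]]
b_draws = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0]]

exceed, pi11, pi10 = cluster_probabilities(b_draws, neighbours)
print(exceed, pi11, pi10)
```

Because J11i + J10i = b_i L_i holds at every iteration, the estimated pi11_i and pi10_i add up to the estimated exceedance probability exactly, not just approximately.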

Page 31: Measuring spatial clustering in disease patterns.

Synthetic Data. Average e=20.39, Known RRs

Page 32: Measuring spatial clustering in disease patterns.

Synthetic Data Example: Cluster Focus

Area 25 is a cluster centre. So is area 23, in that it has only high risk neighbours.

Areas 27 and 28 are cluster edges (they have as many background risk neighbours as high risk neighbours).

Areas 22, 23, 25, 27 and 28 all have true RR of 1.77; surrounding areas have RR of 0.88.

Page 33: Measuring spatial clustering in disease patterns.

Cluster Focus (simulation with average ei=20.39, and bi=1 if ri>1)

Group            Area ID   ri (posterior mean)   Pr(bi=1)=Ei   π11i   π10i
Cluster          22        1.45                  1.00          0.89   0.11
Cluster          23        1.55                  1.00          0.99   0.00
Cluster          25        1.48                  1.00          1.00   0.00
Cluster          27        1.39                  1.00          0.74   0.26
Cluster          28        1.35                  0.99          0.67   0.33
Background risk  24        1.04                  0.58          0.37   0.21
Background risk  29        0.98                  0.40          0.23   0.17
Background risk  31        0.99                  0.46          0.26   0.21
Background risk  33        1.00                  0.47          0.26   0.21
Background risk  39        0.97                  0.39          0.22   0.17
Background risk  40        0.97                  0.37          0.18   0.19

Page 34: Measuring spatial clustering in disease patterns.

Cluster Focus (simulation with average ei=58.77, and bi=1 if ri>1)

Group            Area ID   ri (posterior mean)   Pr(bi=1)=Ei   π11i   π10i
Cluster          22        1.64                  1.00          0.85   0.15
Cluster          23        1.70                  1.00          1.00   0.00
Cluster          25        1.67                  1.00          1.00   0.00
Cluster          27        1.58                  1.00          0.65   0.35
Cluster          28        1.56                  1.00          0.58   0.42
Background risk  24        0.98                  0.39          0.21   0.18
Background risk  29        0.93                  0.24          0.10   0.14
Background risk  31        0.94                  0.30          0.13   0.17
Background risk  33        0.95                  0.29          0.13   0.16
Background risk  39        0.93                  0.23          0.11   0.13
Background risk  40        0.93                  0.22          0.08   0.15

Page 35: Measuring spatial clustering in disease patterns.

Cluster Centres and Edges

Cluster centre status verified: $\pi_{11i} \approx E_i$ for areas 25 and 23.

Cluster edge status becomes clearer with more frequent data (for areas 27 and 28).

Page 36: Measuring spatial clustering in disease patterns.

Cluster Focus (simulation with average $e_i = 20.39$)

Map of High Risk Cluster Probabilities $\pi_{11i}$

Page 37: Measuring spatial clustering in disease patterns.

Cluster Focus (simulation with average $e_i = 58.77$)

Map of High Risk Cluster Probabilities $\pi_{11i}$

Page 38: Measuring spatial clustering in disease patterns.

Another simulation where clustering pattern known: cluster centre status under uneven risk scenario

Performance of $\pi_{11i}$ for measuring cluster centre status in contrasting situations:

(1) EVEN RISK. High risk characterises all neighbours surrounding area i (so area i is cluster centre), and risk evenly distributed among neighbours

(2) UNEVEN RISK. High risk is not common to all neighbours, but unevenly concentrated among a few neighbours, so area i is no longer a cluster centre, and possibly a cluster edge.

Page 39: Measuring spatial clustering in disease patterns.

Even risk vs uneven risk scenarios

Page 40: Measuring spatial clustering in disease patterns.

Page 41: Measuring spatial clustering in disease patterns.

Winbugs code

model {
  for (i in 1:N) {
    y[i] ~ dpois(mu[i])
    mu[i] <- e[i]*r[i]
    log(r[i]) <- alph+s[i]+u[i]
    u[i] ~ dnorm(0,tau.u)
    b[i] <- step(r[i]-1)
    # joins and join counts
    for (j in C[i]+1:C[i+1]) {
      j11[i,j] <- b[i]*b.map[j]
      j10[i,j] <- b[i]*(1-b.map[j])
      j01[i,j] <- (1-b[i])*b.map[j]
      j00[i,j] <- (1-b[i])*(1-b.map[j])
    }
    J11[i] <- sum(j11[i,C[i]+1 : C[i+1]])
    J10[i] <- sum(j10[i,C[i]+1 : C[i+1]])
    J01[i] <- sum(j01[i,C[i]+1 : C[i+1]])
    J00[i] <- sum(j00[i,C[i]+1 : C[i+1]])
    pi.L[1,i] <- J11[i]/L[i]
    pi.L[2,i] <- J10[i]/L[i]
    pi.L[3,i] <- J01[i]/L[i]
    pi.L[4,i] <- J00[i]/L[i]
  }
  # neighbourhood vector of risks and indicators
  for (i in 1:NN) {
    wt[i] <- 1
    r.map[i] <- r[map[i]]
    b.map[i] <- b[map[i]]
  }
  # priors
  alph ~ dflat()
  tau.s ~ dgamma(1,0.001)
  rho ~ dexp(1)
  tau.u <- rho*tau.s
  s[1:N] ~ car.normal(map[], wt[], L[], tau.s)
}
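The model reads three adjacency inputs, map[], L[] and C[], whose construction is not shown on the slide. Judging from how the code indexes them, map[] appears to be the concatenated list of neighbour IDs, L[] the number of neighbours per area, and C[] the cumulative offsets into map[] (starting at 0, with N+1 entries). The Python sketch below builds them under that assumption. The helper name and the three-area example are hypothetical.

```python
def bugs_adjacency(neighbours):
    """Build (map, L, C) vectors from adjacency lists of 1-based area IDs.

    neighbours[k] lists the IDs of the areas adjacent to area k+1.
    Assumed layout: the neighbours of area i sit at positions
    C[i]+1 .. C[i+1] of map when the vectors are read as 1-based arrays.
    """
    map_vec, num, cum = [], [], [0]
    for nbrs in neighbours:
        map_vec.extend(nbrs)
        num.append(len(nbrs))
        cum.append(cum[-1] + len(nbrs))
    return map_vec, num, cum


# Hypothetical example: three areas in a row (1-2-3).
neighbours = [[2], [1, 3], [2]]
map_vec, L, C = bugs_adjacency(neighbours)
NN = len(map_vec)    # total length of the adjacency vector

print(map_vec, L, C, NN)
```

With these inputs, NN equals the sum of L[], and the final element of C[] equals NN.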

Page 42: Measuring spatial clustering in disease patterns.

Real Example: Suicide in North West England

Suicide counts $\{y_i, e_i\}$ for 922 small areas (middle level super output areas, MSOAs) in NW England over 5 years (2006-10).

Model: $y_i \sim \mathrm{Po}(e_i r_i)$, with relative risks $r_i$ averaging 1, and $\log(r_i) = \alpha + s_i + u_i$, $s_i \sim$ CAR, $u_i \sim$ WN.

Overdispersion: $u_i$ is needed as well as the spatial term.

Monitor exceedance and high risk clustering with $b_i = 1$ if $r_i > 1$, $b_i = 0$ otherwise.

Spatial interactions $w_{ij}$ are binary, based on adjacency.

Page 43: Measuring spatial clustering in disease patterns.

Smoothed Suicide Risk

Note the small expected values $e_i$ (average 3.5): this impedes strong inferences about elevated risk, and so also about clustering.

Page 44: Measuring spatial clustering in disease patterns.

Real Example: Suicide in North West England

Flexscan (developed by Toshiro Tango) detects five significant clusters (p value under 0.05). The most likely cluster (albeit of irregular shape) consists of 9 areas in Blackpool.

1. Census areas included .: 587, 588, 590, 591, 593, 594, 595, 597, 599
   Maximum distance.......: 5823.08 (areas: 587 to 599)
   Number of cases .......: 68 (Expected number of cases: 31.0964)
   Overall relative risk .: 2.18675
   Statistic value .......: 16.5159
   Monte Carlo rank ......: 6/1000
   P-value ...............: 0.006

Page 45: Measuring spatial clustering in disease patterns.

High Suicide Risk Cluster, Blackpool and Surrounds

Page 46: Measuring spatial clustering in disease patterns.

Real Example: Suicide in North West England, Areas within the Flexscan cluster

ID_all_922   ARCMAP ID   y_i   e_i    E_i (exceedance prob)   pi11_i (high risk cluster prob)   pi10_i (high risk outlier prob)
587          2           6     3.6    0.84                    0.62                              0.22
588          3           6     3.7    0.80                    0.52                              0.28
590          5           11    3.4    0.99                    0.75                              0.24
591          6           6     2.5    0.89                    0.54                              0.34
593          8           7     3.8    0.87                    0.54                              0.33
594          9           10    3.8    0.98                    0.87                              0.11
595          10          7     3.5    0.91                    0.77                              0.14
597          12          9     3.1    0.97                    0.75                              0.21
599          14          6     3.6    0.86                    0.65                              0.21
Total                    68    31.1
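As a consistency check on the figures reported for this cluster, the short sketch below recomputes the totals, the overall relative risk (68 cases against the expected count of 31.0964 given in the Flexscan output) and the gap between E_i and pi11_i + pi10_i in each row. All numbers are copied from the slides. Only the variable names are new.

```python
# (ID_all_922, y_i, e_i, E_i, pi11_i, pi10_i) as reported for the cluster
rows = [
    (587, 6, 3.6, 0.84, 0.62, 0.22),
    (588, 6, 3.7, 0.80, 0.52, 0.28),
    (590, 11, 3.4, 0.99, 0.75, 0.24),
    (591, 6, 2.5, 0.89, 0.54, 0.34),
    (593, 7, 3.8, 0.87, 0.54, 0.33),
    (594, 10, 3.8, 0.98, 0.87, 0.11),
    (595, 7, 3.5, 0.91, 0.77, 0.14),
    (597, 9, 3.1, 0.97, 0.75, 0.21),
    (599, 6, 3.6, 0.86, 0.65, 0.21),
]

total_cases = sum(r[1] for r in rows)
total_expected = sum(r[2] for r in rows)       # from the rounded e_i values
cluster_rr = total_cases / 31.0964             # expected count from Flexscan

# Largest discrepancy between E_i and pi11_i + pi10_i across the nine areas.
max_gap = max(abs(r[3] - (r[4] + r[5])) for r in rows)

print(total_cases, round(total_expected, 1), round(cluster_rr, 5), round(max_gap, 2))
```

The per-area decomposition holds to within rounding (a gap of at most 0.01), and the rounded expected counts sum to 31.0, close to the 31.0964 used by Flexscan.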

Page 47: Measuring spatial clustering in disease patterns.

Exceedance Probs for Blackpool Suicide Cluster (ARCMAP area IDs)

Possible Questions

What is the most plausible cluster centre (if any)?

Which areas are more likely to be cluster edges?

Of the two areas inside the doughnut, area 7 has the higher exceedance prob ($E_7 = 0.72$, $E_4 = 0.48$).

Area 9 has $E_9 = 0.98$, and five of its 6 neighbours have $E_j > 0.8$. The other neighbour has $E_j = 0.72$. Area 9 has the highest $\pi_{11i}$, namely 0.87.

Area 6 has four neighbours, only two with $E_j > 0.8$, and two with $E_j$ below 0.5 ($E_4 = 0.48$, $E_{41} = 0.26$). It has $\pi_{11i} = 0.54$ and $\pi_{10i} = 0.34$: a cluster edge.

Page 48: Measuring spatial clustering in disease patterns.

Local Join-Counts for Bivariate Clustering

Local BB statistic for two outcomes A, B with event counts $y_{Ai}$, $y_{Bi}$.

Binary indicators: $b_{ABi} = 1$ if both $r_{Ai} > 1$ and $r_{Bi} > 1$; $b_{ABi} = 0$ otherwise.

Bivariate high risk clustering is assessed using the local bivariate join count

$J_{11ABi} = b_{ABi} \sum_j w_{ij} b_{ABj}$

Page 49: Measuring spatial clustering in disease patterns.

Local Join-Counts for Bivariate Clustering

$J_{11ABi}$ is high in a bivariate high risk cluster: when area i, and the neighbours j of area i, all have high risk on both outcomes.

Bivariate high risk clustering probability $\pi_{11ABi}$: the proportion of joins that are joint high risk, defined by

$E(J_{11ABi}) = L_i \pi_{11ABi}$

Estimate during the MCMC run via

$\pi_{11ABi} = J_{11ABi}/L_i$
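For one iteration, the bivariate proportion J11ABi/Li can be computed from sampled relative risks on the two outcomes as sketched below, assuming binary adjacency weights. Averaging the returned values over iterations would give the estimate of pi11ABi. The function name, the three-area layout and the risk values are hypothetical.

```python
def bivariate_join_proportions(r_a, r_b, neighbours, threshold=1.0):
    """J11ABi / L_i for each area, from one draw of the two sets of relative risks."""
    # Joint indicator: 1 only when both outcomes exceed the threshold.
    b_ab = [1 if (ra > threshold and rb > threshold) else 0
            for ra, rb in zip(r_a, r_b)]
    return [b_ab[i] * sum(b_ab[j] for j in nbrs) / len(nbrs)
            for i, nbrs in enumerate(neighbours)]


# Hypothetical example: three areas in a row, one draw of each relative risk.
neighbours = [[1], [0, 2], [1]]
r_a = [1.2, 1.3, 0.9]    # outcome A (e.g. suicide)
r_b = [1.1, 1.4, 1.5]    # outcome B (e.g. self-harm)

proportions = bivariate_join_proportions(r_a, r_b, neighbours)
print(proportions)
```

Area 3 in this example is high risk on outcome B only, so it contributes nothing to joint high risk clustering, and it halves the proportion for its neighbour, area 2.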

Page 50: Measuring spatial clustering in disease patterns.

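Two outcomes: Likelihood and Prior

NW England, MSOAs: $y_A$ suicide deaths, $y_B$ self-harm hospitalisations.

Self-harm is much more frequent than suicide: the average $e_i$ for self-harm is 93.

Likelihood: $y_{Ai} \sim \mathrm{Po}(e_{Ai} r_{Ai})$, $y_{Bi} \sim \mathrm{Po}(e_{Bi} r_{Bi})$

Assume correlated spatial effects:

$\log(r_{Ai}) = \alpha_A + s_{Ai} + u_{Ai}$; $\log(r_{Bi}) = \alpha_B + s_{Bi} + u_{Bi}$

$u_{Ai} \sim$ WN, $u_{Bi} \sim$ WN, $(s_{Ai}, s_{Bi}) \sim$ BVCAR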

Page 51: Measuring spatial clustering in disease patterns.

Example: suicide mortality and self-harm hospitalisations in North West England

Smoothed suicide risk rAi, Wigan and adjacent boroughs

Page 52: Measuring spatial clustering in disease patterns.

Example: suicide mortality and self-harm hospitalisations in North West England

Smoothed self-harm risk, rBi, Wigan and adjacent boroughs

Page 53: Measuring spatial clustering in disease patterns.

Bivariate clustering: suicide and self-harm, Wigan and surrounds

Probabilities π11ABi of joint outcome high risk cluster status

Page 54: Measuring spatial clustering in disease patterns.

Another Bivariate Example: Pre-Primary Obesity (yA) and End-Primary Child Obesity (yB) in NE London

[Map: relative risks in pre-primary obesity by MSOA. Legend classes: 0.68-0.83, 0.84-0.93, 0.94-1.02, 1.03-1.13, 1.14-1.26]

Page 55: Measuring spatial clustering in disease patterns.

RRs for End-Primary Child Obesity (yB)

Relative risks in this outcome show negative skew.

[Map: relative risks in end-primary obesity by MSOA. Legend classes: 0.65-0.88, 0.89-0.97, 0.98-1.05, 1.06-1.10, 1.11-1.19]

Page 56: Measuring spatial clustering in disease patterns.

Probabilities of Joint High Risk Clustering

[Map: high risk cluster probabilities (pi11_AB) by MSOA. Legend classes: 0.00-0.25, 0.25-0.75, over 0.75]

Page 57: Measuring spatial clustering in disease patterns.

Probabilities of Joint Low Risk Clustering

[Map: joint low risk probabilities (pi00_AB) by MSOA. Legend classes: under 0.2, 0.2-0.8, over 0.8]

Page 58: Measuring spatial clustering in disease patterns.

Final Remarks

The cluster status approach can be embedded within different models (including model averaging or covariate impacts). Clustering (as well as exceedance) inferences can be compared, so the approach provides “model based clustering”.

It provides an alternative perspective to the “list of areas” approach, with additional insights regarding cluster centres vs edges, low risk clustering as well as high risk clustering in an integrated perspective, and high/low risk outliers.

The bivariate method can also be applied when outcome A is a disease and outcome B is a risk factor. It then detects varying strength of association between disease and risk factor.