Detecting Spatial Clustering in Matched Case-Control Studies
Andrea Cook, MS
Collaboration with:
Dr. Yi Li
November 4, 2004
Outline
1. Motivation
   • Petrochemical exposure in relation to childhood brain cancer and leukemia
2. Cumulative Geographic Residuals
   • Unconditional
   • Conditional
3. Simulation Results
   • Type I error
   • Power Calculations
4. Application
   • Childhood Leukemia
   • Childhood Brain Cancer
5. Software
6. Discussion
   • Limitations
   • Future Research
Taiwan Petrochemical Study
Matched Case-Control Study
• 3 controls per case
• Matched on Age and Gender
• Resided in one of 26 of the overall 38 administrative districts of Kaohsiung County, Taiwan
• Controls selected using national identity numbers (not dependent on location)
Study Population
Due to dropout, approximately 50% of cases have 3-to-1 matching, 40% have 2-to-1 matching, and 10% have 1-to-1 matching.

               Leukemia    Brain Cancer
Cases          121         111
Controls       287         259
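As a quick consistency check on the table, and assuming the dropout percentages refer to the share of matched sets (each set holding exactly one case), the matching mix implies 0.5·3 + 0.4·2 + 0.1·1 = 2.4 controls per case. The sketch below applies that to the case counts; `MATCHING_MIX` and `expected_controls` are illustrative names, not part of the study software. The expected totals (about 290 and 266) sit slightly above the observed 287 and 259, which fits the "approximately" in the slide.

```python
# Share of matched sets by number of controls per case, from the slide.
# Assumption: the percentages refer to matched sets, each with exactly one case.
MATCHING_MIX = {3: 0.50, 2: 0.40, 1: 0.10}


def expected_controls(n_cases, mix=MATCHING_MIX):
    """Expected number of controls if the matched sets follow `mix`."""
    controls_per_case = sum(k * share for k, share in mix.items())
    return n_cases * controls_per_case


for outcome, n_cases, observed in [("Leukemia", 121, 287), ("Brain cancer", 111, 259)]:
    print(f"{outcome}: expected about {expected_controls(n_cases):.1f}, observed {observed}")
```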
Map of Kaohsiung
[Figure: map of Kaohsiung County showing study participants (#) and petrochemical plants ($); labeled districts include Nantze, Jenwu, Linyuan, and Tsoying]
Cumulative Residuals
Unconditional (Independence)
• Model definition using logistic regression
• Extension to Cluster Detection
Conditional (Matched Design)
• Model definition using conditional logistic regression
• Extension to Cluster Detection
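Logistic Model
Assume the logistic model where
$$L(Y_i \mid p_i) = p_i^{Y_i}(1 - p_i)^{1 - Y_i},$$
and the link function
$$g(p_i) = \operatorname{logit}(p_i) = \mathbf{X}_i\boldsymbol{\beta}.$$
Therefore the likelihood score function for $\boldsymbol{\beta}$ is
$$U(\boldsymbol{\beta}) = \sum_{i=1}^{n} \mathbf{X}_i^T\left(Y_i - \frac{\exp(\mathbf{X}_i\boldsymbol{\beta})}{1 + \exp(\mathbf{X}_i\boldsymbol{\beta})}\right)$$
with information matrix
$$I(\boldsymbol{\beta}) = \sum_{i=1}^{n} \frac{\exp(\mathbf{X}_i\boldsymbol{\beta})}{[1 + \exp(\mathbf{X}_i\boldsymbol{\beta})]^2}\,\mathbf{X}_i^T\mathbf{X}_i.$$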
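The score and information above are exactly what a Newton-Raphson fit iterates on: $\boldsymbol\beta \leftarrow \boldsymbol\beta + I^{-1}(\boldsymbol\beta)U(\boldsymbol\beta)$. The slides do not say how $U(\boldsymbol\beta) = 0$ was solved, so this is a minimal NumPy sketch under that assumption; `logistic_score_info` and `fit_logistic` are hypothetical helper names, and the data are simulated.

```python
import numpy as np


def logistic_score_info(beta, X, y):
    """Score U(beta) and information I(beta); rows of X are the X_i, y is 0/1."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # exp(X_i b) / (1 + exp(X_i b))
    U = X.T @ (y - p)                       # sum_i X_i^T (Y_i - p_i)
    w = p * (1.0 - p)                       # exp(X_i b) / (1 + exp(X_i b))^2
    I = (X * w[:, None]).T @ X              # sum_i w_i X_i^T X_i
    return U, I


def fit_logistic(X, y, tol=1e-10, max_iter=50):
    """Solve U(beta) = 0 by Newton-Raphson: beta <- beta + I^{-1} U."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        U, I = logistic_score_info(beta, X, y)
        step = np.linalg.solve(I, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated example: intercept plus one covariate.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-1.0, 0.8])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))
beta_hat = fit_logistic(X, y)
U_hat, _ = logistic_score_info(beta_hat, X, y)
print(beta_hat, np.abs(U_hat).max())
```

At convergence the score is numerically zero, which is the defining property of $\hat{\boldsymbol\beta}$ used in the residual definition that follows.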
Residual Formulation
Then define a residual as
$$e_i = Y_i - \frac{\exp(\mathbf{X}_i\hat{\boldsymbol{\beta}})}{1 + \exp(\mathbf{X}_i\hat{\boldsymbol{\beta}})},$$
where $\hat{\boldsymbol{\beta}}$ is the solution to $U(\boldsymbol{\beta}) = 0$.
If the model is correctly specified, the residuals should show no systematic pattern.
=> Use residuals to test for misspecification.
Cumulative Residuals for Model Checking; Lin, Wei, Ying 2002
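A toy illustration of the residual definition: with a single binary covariate, $X_i = (1, z_i)$, the model is saturated and $U(\boldsymbol\beta) = 0$ has a closed-form solution (each group's fitted probability equals its observed case proportion). The sketch computes $e_i$ from $\hat{\boldsymbol\beta}$ and checks that $\sum_i \mathbf{X}_i^T e_i = U(\hat{\boldsymbol\beta}) = 0$, so the residuals are centred by construction and any misspecification has to show up as a pattern rather than in their total. The data and helper names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_binary_covariate(y, z):
    """MLE for logit P(Y_i = 1) = beta0 + beta1 * z_i with z_i in {0, 1}.

    The model is saturated, so U(beta) = 0 is solved in closed form: the fitted
    probability in each group equals that group's observed case proportion.
    """
    mean = {g: sum(yi for yi, zi in zip(y, z) if zi == g) / z.count(g) for g in (0, 1)}
    return logit(mean[0]), logit(mean[1]) - logit(mean[0])


def residuals(y, z, beta):
    """e_i = Y_i - exp(X_i beta) / (1 + exp(X_i beta)) with X_i = (1, z_i)."""
    b0, b1 = beta
    return [yi - 1.0 / (1.0 + math.exp(-(b0 + b1 * zi))) for yi, zi in zip(y, z)]


y = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]
z = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
beta_hat = fit_binary_covariate(y, z)
e = residuals(y, z, beta_hat)
# Score at beta_hat: sum_i X_i^T e_i; both components are zero up to rounding.
score = (sum(e), sum(zi * ei for zi, ei in zip(z, e)))
print(beta_hat, score)
```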
Hypothesis Test
Hypothesis of interest: geographic location $(r_i, t_i)$ is independent of the outcome $Y_i \mid \mathbf{X}_i$.
=> The Cumulative Geographic Residual Moving Block Process is patternless.
Unconditional Cluster Detection
Define the Cumulative Geographic Residual Moving Block Process as
$$W_{loc}(x_1, x_2 \mid b_1, b_2) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} I\left(x_1 - b_1 \le r_i \le x_1,\; x_2 - b_2 \le t_i \le x_2\right) e_i.$$
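The process simply sums the residuals of participants who fall in a $b_1 \times b_2$ block whose upper corner is $(x_1, x_2)$, then scales by $n^{-1/2}$. The slides do not say where $(x_1, x_2)$ is evaluated, so this NumPy sketch assumes a user-supplied grid of block corners; `moving_block_process` is a hypothetical name and the four-point data set is made up to make the output easy to check by hand.

```python
import numpy as np


def moving_block_process(r, t, e, grid_x1, grid_x2, b1, b2):
    """W_loc(x1, x2 | b1, b2) evaluated on a grid of block corners (x1, x2).

    Sums the residuals e_i of participants inside [x1 - b1, x1] x [x2 - b2, x2]
    and scales by n^{-1/2}.
    """
    n = len(e)
    W = np.empty((len(grid_x1), len(grid_x2)))
    for a, x1 in enumerate(grid_x1):
        in_r = (r >= x1 - b1) & (r <= x1)
        for c, x2 in enumerate(grid_x2):
            in_block = in_r & (t >= x2 - b2) & (t <= x2)
            W[a, c] = e[in_block].sum() / np.sqrt(n)
    return W


# Toy data: two positive residuals near (1, 1), two negative residuals near (8, 8).
r = np.array([1.0, 1.5, 8.0, 8.5])
t = np.array([1.0, 1.2, 8.0, 8.2])
e = np.array([0.5, 0.5, -0.25, -0.25])
W = moving_block_process(r, t, e, grid_x1=[2.0, 9.0], grid_x2=[2.0, 9.0], b1=2.0, b2=2.0)
print(W)
```

The block ending at (2, 2) collects the two positive residuals, giving $1.0/\sqrt{4} = 0.5$, which is the kind of local excess the test looks for.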
Asymptotic Distribution
However, the asymptotic distribution of $W_{loc}(x_1, x_2 \mid b_1, b_2)$ is difficult to simulate directly, but it has been shown to be equivalent to the following distribution, conditional on the observed data:
$$\hat{W}_{loc}(x_1, x_2 \mid b_1, b_2) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} I\left(x_1 - b_1 \le r_i \le x_1,\; x_2 - b_2 \le t_i \le x_2\right) e_i G_i - \frac{1}{\sqrt{n}}\,\boldsymbol{\eta}^T(x_1, x_2 \mid \hat{\boldsymbol{\beta}})\, I^{-1}(\hat{\boldsymbol{\beta}}) \sum_{i=1}^{n} \mathbf{X}_i^T e_i G_i,$$
where
$$\boldsymbol{\eta}(x_1, x_2 \mid \boldsymbol{\beta}) = \sum_{i=1}^{n} I\left(x_1 - b_1 \le r_i \le x_1,\; x_2 - b_2 \le t_i \le x_2\right) \frac{\exp(\mathbf{X}_i\boldsymbol{\beta})}{[1 + \exp(\mathbf{X}_i\boldsymbol{\beta})]^2}\,\mathbf{X}_i^T$$
and $G_1, \dots, G_n \overset{iid}{\sim} N(0, 1)$ are independent of the observed data.
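One realization of $\hat W_{loc}$ needs only the fitted probabilities, the residuals, and a fresh vector of multipliers $G_i$; the second term corrects for having estimated $\boldsymbol\beta$. This NumPy sketch assumes the same grid-of-block-corners convention as before and uses an intercept-only model (whose MLE is $\hat p = \bar y$) so that no fitting code is needed; `multiplier_realization` is a hypothetical name and the data are simulated under the null.

```python
import numpy as np


def multiplier_realization(r, t, X, e, p_hat, G, grid_x1, grid_x2, b1, b2):
    """One draw of W-hat_loc(x1, x2 | b1, b2) for the unconditional logistic model.

    W-hat = n^{-1/2} sum_i I_i e_i G_i - n^{-1/2} eta^T I^{-1}(beta_hat) sum_i X_i^T e_i G_i,
    with I_i = 1 if participant i lies in [x1 - b1, x1] x [x2 - b2, x2],
    eta = sum_i I_i p_i (1 - p_i) X_i^T and I(beta_hat) = sum_i p_i (1 - p_i) X_i^T X_i.
    """
    n = len(e)
    w = p_hat * (1.0 - p_hat)                          # exp(X_i b) / (1 + exp(X_i b))^2
    info = (X * w[:, None]).T @ X                      # information matrix at beta_hat
    correction = np.linalg.solve(info, X.T @ (e * G))  # I^{-1} sum_i X_i^T e_i G_i
    W_hat = np.empty((len(grid_x1), len(grid_x2)))
    for a, x1 in enumerate(grid_x1):
        for c, x2 in enumerate(grid_x2):
            ind = (r >= x1 - b1) & (r <= x1) & (t >= x2 - b2) & (t <= x2)
            eta = X[ind].T @ w[ind]                    # block sum of p_i (1 - p_i) X_i^T
            W_hat[a, c] = ((e[ind] * G[ind]).sum() - eta @ correction) / np.sqrt(n)
    return W_hat


# Toy null data with an intercept-only model, whose MLE gives p_hat = mean(y).
rng = np.random.default_rng(1)
n = 200
r, t = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y = rng.binomial(1, 0.3, n).astype(float)
X = np.ones((n, 1))
p_hat = np.full(n, y.mean())
e = y - p_hat
grid = np.linspace(1, 10, 10)
G = rng.standard_normal(n)                             # G_1, ..., G_n iid N(0, 1)
W_hat = multiplier_realization(r, t, X, e, p_hat, G, grid, grid, 2.0, 2.0)
print(W_hat.round(2))
```

Two sanity checks follow from the formula: with every $G_i = 1$ the correction term vanishes (because $U(\hat{\boldsymbol\beta}) = 0$) and $\hat W_{loc}$ reproduces the observed $W_{loc}$; and a block covering the whole region gives zero, since there $\boldsymbol\eta = I(\hat{\boldsymbol\beta})$ for the intercept-only model.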
Significance Test
Testing the NULL $H_0$: $Y_i \mid \mathbf{X}_i$ is independent of $(r_i, t_i)$
• Simulate N realizations of $\hat{W}_{loc}(\cdot, \cdot \mid b_1, b_2)$, namely $\hat{W}_{loc,1}(\cdot, \cdot \mid b_1, b_2), \dots, \hat{W}_{loc,N}(\cdot, \cdot \mid b_1, b_2)$, by repeatedly simulating $(G_1, \dots, G_n)$ while fixing the data at their observed values.
• Calculate the p-value:
$$\text{P-value} = \frac{1}{N}\sum_{j=1}^{N} I\left\{\hat{S}_{loc,j}(b_1, b_2) \ge S_{loc}(b_1, b_2)\right\},$$
where $S_{loc}(b_1, b_2) = \sup_{x_1, x_2} W_{loc}(x_1, x_2 \mid b_1, b_2)$ and $\hat{S}_{loc,j}(b_1, b_2) = \sup_{x_1, x_2} \hat{W}_{loc,j}(x_1, x_2 \mid b_1, b_2)$.
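The p-value step reduces each surface to its supremum and counts how often a simulated supremum reaches the observed one. This is a minimal pure-Python sketch, assuming the one-sided supremum (no absolute value) and surfaces stored as nested lists or 2-D arrays; a version using $\sup|W|$ would also flag deficits of cases. The function names and the small surfaces are illustrative only.

```python
def sup_statistic(W):
    """S_loc(b1, b2): supremum of W_loc(x1, x2 | b1, b2) over the evaluation grid."""
    return max(max(row) for row in W)


def multiplier_p_value(W_obs, W_sims):
    """P-value = (1/N) * sum_j I{ S_hat_loc,j >= S_loc } over N simulated surfaces."""
    s_obs = sup_statistic(W_obs)
    exceed = sum(1 for W_j in W_sims if sup_statistic(W_j) >= s_obs)
    return exceed / len(W_sims)


W_obs = [[0.0, 1.0], [0.5, 2.0]]                # observed surface, sup = 2.0
W_sims = [[[1.0, 0.2], [0.0, 0.1]],             # sup = 1.0
          [[3.0, 0.0], [0.0, 0.0]],             # sup = 3.0
          [[0.0, 2.0], [0.0, 0.0]],             # sup = 2.0
          [[0.5, 0.0], [0.0, 0.0]]]             # sup = 0.5
print(multiplier_p_value(W_obs, W_sims))
```

Two of the four simulated suprema are at least 2.0, so the toy p-value is 2/4.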
Conditional Logistic Model
Type of Matching: 1 case to $M_s$ controls
Data Structure: in stratum $s$, $Y_{1s} = 1, Y_{2s} = 0, \dots, Y_{(M_s+1)s} = 0$, so that $\sum_{i=1}^{M_s+1} Y_{is} = 1$.
Assume a logit link with an unobserved stratum-specific intercept $\alpha_s$. Conditioning on $\alpha_s$ and on the stratum containing exactly one case implies
$$E(Y_{is} \mid s) = \frac{\exp(\mathbf{X}_{is}\boldsymbol{\beta})}{\sum_{j=1}^{M_s+1} \exp(\mathbf{X}_{js}\boldsymbol{\beta})}.$$
The conditional likelihood, conditioning on $\sum_{i=1}^{M_s+1} Y_{is} = 1$, is
$$L(\boldsymbol{\beta}) = \prod_{s=1}^{N} \prod_{i=1}^{M_s+1} \left(\frac{\exp(\mathbf{X}_{is}\boldsymbol{\beta})}{\sum_{j=1}^{M_s+1} \exp(\mathbf{X}_{js}\boldsymbol{\beta})}\right)^{Y_{is}}.$$
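Because exactly one $Y_{is}$ equals 1 in each stratum, the conditional likelihood collapses to $\prod_s \exp(\mathbf{X}_{1s}\boldsymbol\beta) / \sum_j \exp(\mathbf{X}_{js}\boldsymbol\beta)$, and $\alpha_s$ has dropped out. The pure-Python sketch below evaluates $\log L(\boldsymbol\beta)$ for a mix of 1:1, 2:1 and 3:1 sets, assuming each stratum is stored as a list of covariate vectors with the case first (a hypothetical layout). At $\boldsymbol\beta = 0$ every member is equally likely to be the case, so $\log L = -\sum_s \log(M_s + 1)$.

```python
import math


def conditional_loglik(beta, strata):
    """log L(beta) for 1:M_s matched sets.

    `strata` is a list of matched sets; each set is a list of covariate vectors
    X_js (lists of floats) with the case stored first (Y_1s = 1, others 0).
    """
    total = 0.0
    for members in strata:
        lin = [sum(b * x for b, x in zip(beta, xj)) for xj in members]
        m = max(lin)                                   # log-sum-exp for stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in lin))
        total += lin[0] - log_denom                    # log[exp(X_1s b) / sum_j exp(X_js b)]
    return total


strata = [
    [[1.0], [0.0]],                   # 1:1 matched set
    [[1.0], [0.0], [1.0]],            # 2:1 matched set
    [[0.0], [1.0], [0.0], [0.0]],     # 3:1 matched set
]
print(conditional_loglik([0.0], strata), -math.log(2 * 3 * 4))
```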
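Score and Information
Denote the conditional likelihood score as
$$U(\boldsymbol{\beta}) = \sum_{s=1}^{N} U_s(\boldsymbol{\beta}) = \sum_{s=1}^{N} \left(\mathbf{X}_{1s}^T - \frac{\sum_{j=1}^{M_s+1} \mathbf{X}_{js}^T \exp(\mathbf{X}_{js}\boldsymbol{\beta})}{\sum_{j=1}^{M_s+1} \exp(\mathbf{X}_{js}\boldsymbol{\beta})}\right),$$
with information matrix
$$I(\boldsymbol{\beta}) = \sum_{s=1}^{N} \left[\frac{\sum_{j=1}^{M_s+1} \mathbf{X}_{js}^T \mathbf{X}_{js} \exp(\mathbf{X}_{js}\boldsymbol{\beta})}{\sum_{j=1}^{M_s+1} \exp(\mathbf{X}_{js}\boldsymbol{\beta})} - \frac{\left(\sum_{j=1}^{M_s+1} \mathbf{X}_{js}^T \exp(\mathbf{X}_{js}\boldsymbol{\beta})\right)\left(\sum_{j=1}^{M_s+1} \mathbf{X}_{js} \exp(\mathbf{X}_{js}\boldsymbol{\beta})\right)}{\left(\sum_{j=1}^{M_s+1} \exp(\mathbf{X}_{js}\boldsymbol{\beta})\right)^2}\right].$$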
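Each stratum contributes its case's covariates minus a probability-weighted covariate mean to the score, and a weighted covariance to the information. The sketch below codes $U_s$ and $I$ term by term and solves $U(\boldsymbol\beta) = 0$ with Newton-Raphson; the solver choice, the helper names, and the simulated matched sets (stored as arrays with the case in the first row) are assumptions for illustration.

```python
import numpy as np


def clogit_score_info(beta, strata):
    """U(beta) = sum_s U_s(beta) and I(beta) for 1:M_s matched sets.

    Each stratum is an (M_s + 1, p) array whose first row is the case.
    """
    p = len(beta)
    U = np.zeros(p)
    I = np.zeros((p, p))
    for Xs in strata:
        lin = Xs @ beta
        w = np.exp(lin - lin.max())                 # proportional to exp(X_js beta)
        pi = w / w.sum()                            # within-stratum probabilities
        mean_x = pi @ Xs                            # sum_j X_js exp(.) / sum_j exp(.)
        U += Xs[0] - mean_x                         # U_s(beta)
        I += (Xs * pi[:, None]).T @ Xs - np.outer(mean_x, mean_x)
    return U, I


def fit_clogit(strata, p, tol=1e-10, max_iter=50):
    """Newton-Raphson for U(beta) = 0."""
    beta = np.zeros(p)
    for _ in range(max_iter):
        U, I = clogit_score_info(beta, strata)
        step = np.linalg.solve(I, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated 1:1, 2:1 and 3:1 matched sets with one covariate.
rng = np.random.default_rng(2)
true_beta = np.array([0.7])
strata = []
for _ in range(300):
    size = int(rng.choice([2, 3, 4]))
    Xs = rng.normal(size=(size, 1))
    probs = np.exp(Xs @ true_beta)
    probs /= probs.sum()
    case = int(rng.choice(size, p=probs))
    order = [case] + [j for j in range(size) if j != case]
    strata.append(Xs[order])                        # case moved to the first row

beta_hat = fit_clogit(strata, p=1)
U_hat, _ = clogit_score_info(beta_hat, strata)
print(beta_hat, np.abs(U_hat).max())
```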
Conditional Residual
Then define a residual as
$$e_{is} = Y_{is} - \frac{\exp(\mathbf{X}_{is}\hat{\boldsymbol{\beta}})}{\sum_{j=1}^{M_s+1} \exp(\mathbf{X}_{js}\hat{\boldsymbol{\beta}})},$$
where $\hat{\boldsymbol{\beta}}$ is the solution to $U(\boldsymbol{\beta}) = 0$.
=> The residuals within a stratum sum to zero, so they are correlated; use these correlated residuals to test for patterns based on location.
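The within-stratum correlation is easy to see numerically: since each stratum has exactly one case and the fitted probabilities sum to one, $\sum_i e_{is} = 1 - 1 = 0$ for any $\boldsymbol\beta$, not only $\hat{\boldsymbol\beta}$. In this pure-Python sketch the coefficient 0.7 is a hypothetical stand-in for $\hat{\boldsymbol\beta}$, and strata use the case-first list layout assumed earlier.

```python
import math


def conditional_residuals(beta, strata):
    """e_is = Y_is - exp(X_is b) / sum_j exp(X_js b), case stored first in each stratum."""
    out = []
    for members in strata:
        lin = [sum(b * x for b, x in zip(beta, xj)) for xj in members]
        m = max(lin)
        w = [math.exp(v - m) for v in lin]
        total = sum(w)
        y = [1] + [0] * (len(members) - 1)
        out.append([yi - wi / total for yi, wi in zip(y, w)])
    return out


strata = [
    [[1.0], [0.0]],                   # 1:1 matched set
    [[1.0], [0.0], [1.0]],            # 2:1 matched set
    [[0.0], [1.0], [0.0], [0.0]],     # 3:1 matched set
]
res = conditional_residuals([0.7], strata)   # 0.7: hypothetical stand-in for beta_hat
print([round(sum(es), 12) for es in res])    # each stratum's residuals sum to zero
```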
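Conditional Cumulative Residual
Define the Conditional Cumulative Residual Moving Block Process as
$$W_{loc}(x_1, x_2 \mid b_1, b_2) = \frac{1}{\sqrt{N}}\sum_{s=1}^{N}\sum_{i=1}^{M_s+1} I\left(x_1 - b_1 \le r_{is} \le x_1,\; x_2 - b_2 \le t_{is} \le x_2\right) e_{is},$$
which has been shown to be asymptotically equivalent to
$$\hat{W}_{loc}(x_1, x_2 \mid b_1, b_2) = \frac{1}{\sqrt{N}}\sum_{s=1}^{N}\left[\sum_{i=1}^{M_s+1} I\left(x_1 - b_1 \le r_{is} \le x_1,\; x_2 - b_2 \le t_{is} \le x_2\right) e_{is} - \boldsymbol{\eta}^T(x_1, x_2 \mid \hat{\boldsymbol{\beta}})\, I^{-1}(\hat{\boldsymbol{\beta}})\, U_s(\hat{\boldsymbol{\beta}})\right] G_s,$$
where
$$\boldsymbol{\eta}(x_1, x_2 \mid \boldsymbol{\beta}) = \sum_{s=1}^{N}\sum_{i=1}^{M_s+1} I\left(x_1 - b_1 \le r_{is} \le x_1,\; x_2 - b_2 \le t_{is} \le x_2\right) \frac{\partial \pi_{is}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}, \qquad \pi_{is}(\boldsymbol{\beta}) = \frac{\exp(\mathbf{X}_{is}\boldsymbol{\beta})}{\sum_{j=1}^{M_s+1} \exp(\mathbf{X}_{js}\boldsymbol{\beta})},$$
and $G_1, \dots, G_N \overset{iid}{\sim} N(0, 1)$ are independent of the observed data.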
Significance Test
Testing the NULL $H_0$: $Y_{is} \mid \mathbf{X}_{is}, s$ is independent of $(r_{is}, t_{is})$
• Simulate N realizations of $\hat{W}_{loc}(\cdot, \cdot \mid b_1, b_2)$, namely $\hat{W}_{loc,1}(\cdot, \cdot \mid b_1, b_2), \dots, \hat{W}_{loc,N}(\cdot, \cdot \mid b_1, b_2)$, by repeatedly simulating $(G_1, \dots, G_N)$ while fixing the data at their observed values.
• Calculate the p-value:
$$\text{P-value} = \frac{1}{N}\sum_{j=1}^{N} I\left\{\hat{S}_{loc,j}(b_1, b_2) \ge S_{loc}(b_1, b_2)\right\},$$
where $S_{loc}(b_1, b_2) = \sup_{x_1, x_2} W_{loc}(x_1, x_2 \mid b_1, b_2)$ and $\hat{S}_{loc,j}(b_1, b_2) = \sup_{x_1, x_2} \hat{W}_{loc,j}(x_1, x_2 \mid b_1, b_2)$.
Simulation
Choice of $G_i$ or $G_{is}$
Unconditional
• Normal: $G_i \sim N(0, 1)$
• Discrete: $G_i = 1$ w.p. 1/2, $G_i = -1$ w.p. 1/2
Conditional
• Normal: $G_s \sim N(0, 1)$
• Discrete:
  1 to 1: $G_s = 1$ w.p. 1/2, $G_s = -1$ w.p. 1/2
  2 to 1: $G_s = 1/\sqrt{2}$ w.p. 2/3, $G_s = -2/\sqrt{2}$ w.p. 1/3
  3 to 1: $G_s = 1/\sqrt{3}$ w.p. 3/4, $G_s = -3/\sqrt{3}$ w.p. 1/4
Simulations: Type I error and Power Calculations
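The discrete multipliers for k-to-1 matching follow one two-point pattern, read here as $G = 1/\sqrt{k}$ with probability $k/(k+1)$ and $G = -k/\sqrt{k}$ with probability $1/(k+1)$; k = 1 gives the $\pm 1$ multiplier. Like $N(0,1)$, each has mean 0 and variance 1, which is what the multiplier approximation relies on. The pure-Python sketch checks this exactly and shows one way to draw from the distribution; the helper names are hypothetical.

```python
import math
import random


def discrete_multiplier_dist(k):
    """Two-point multiplier for k-to-1 matching as (value, probability) pairs:
    1/sqrt(k) w.p. k/(k+1) and -k/sqrt(k) w.p. 1/(k+1)."""
    return [(1 / math.sqrt(k), k / (k + 1)), (-k / math.sqrt(k), 1 / (k + 1))]


def moments(dist):
    """Mean and variance of a discrete distribution given as (value, probability) pairs."""
    mean = sum(v * p for v, p in dist)
    var = sum(v * v * p for v, p in dist) - mean ** 2
    return mean, var


def draw(k, rng):
    """Draw one multiplier G for k-to-1 matching using a random.Random instance."""
    (v_pos, p_pos), (v_neg, _) = discrete_multiplier_dist(k)
    return v_pos if rng.random() < p_pos else v_neg


for k in (1, 2, 3):
    print(k, moments(discrete_multiplier_dist(k)))
```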
Type I error
Unconditional
Generate N locations $(x_i, y_i)$ with $x_i$ and $y_i$ drawn from Unif(0,10).
Type I error is the proportion of simulated datasets in which a significant cluster is found.
Conditional
Generate N locations $(x_{is}, y_{is})$ with $x_{is}$ and $y_{is}$ drawn from Unif(0,10).
Type I error is the proportion of simulated datasets in which a significant cluster is found.
Type I error
Unconditional (columns: number of observations)
                     Normal                    Discrete
Percent of cases     300     500     1000      300     500     1000
20%                  0.016   0.036   0.054     0.146   0.172   0.168
30%                  0.024   0.044   0.054     0.136   0.154   0.138

Conditional (columns: type of matching)
                     Normal                    Discrete
Number of cases      1:1     2:1     3:1       1:1     2:1     3:1
100                  0.010   0.080   0.148     0.020   0.074   0.036
200                  0.012   0.088   0.162     0.030   0.084   0.046
Power Calculations
Two power calculations, each on the 4 × 4 grid of cells below:
13  14  15  16
 9  10  11  12
 5   6   7   8
 1   2   3   4
• Single Hotspot
• Multiple Hotspots
[Figure: the hotspot cells highlighted on the grid for each scenario are not recoverable]
Power Calculations
Unconditional
                     Spatial Scan   Normal   Discrete
Single cluster       0.958          0.964    0.976
Multi cluster        0.852          0.916    0.932

Conditional (columns: type of matching)
                                   Normal                    Discrete
                 Number of cases   1:1     2:1     3:1       1:1     2:1     3:1
Single cluster   100               0.606   0.766   0.828     0.706   0.758   0.750
                 200               0.886   0.964   0.990     0.908   0.950   0.982
Multi cluster    100               0.464   0.704   0.774     0.490   0.672   0.704
                 200               0.844   0.946   0.974     0.854   0.932   0.948
Application
Study: Kaohsiung, Taiwan Matched Case-Control Study
Method: Conditional Cumulative Geographic Residual Test (Normal and Mixed Discrete)
Results
Odds Ratio (p-values)
                 Leukemia                          Brain Cancer
                 Unadjusted      Adjusted          Unadjusted      Adjusted
Discrete         2.10 (0.055)    2.19 (0.143)      1.97 (0.058)    2.08 (0.104)
Normal           2.10 (0.050)    2.19 (0.122)      1.97 (0.052)    2.08 (0.104)

Clustering is marginally significant for both outcomes when smoking history is not adjusted for.
Childhood Leukemia
[Figure: cumulative residual surfaces over map coordinates X1 and X2, with cases, controls, and plants marked. (a) Unadjusted; P-values: Discrete = 0.055, Normal = 0.050. (b) Adjusted; P-values: Discrete = 0.143, Normal = 0.122.]
Childhood Brain Cancer
[Figure: cumulative residual surfaces over map coordinates X1 and X2, with cases, controls, and plants marked. (a) Unadjusted; P-values: Discrete = 0.052, Normal = 0.058. (b) Adjusted; P-values: Discrete = 0.104, Normal = 0.104.]
Software
R macro to handle both unconditional and conditional data
Dataset:
• X and Y coordinates of each participant
• Case/control variable
• Covariate matrix
• Stratum variable for conditional data
Takes just a few minutes to run!
Discussion
Cumulative Geographic Residuals
• Unconditional and conditional methods for binary outcomes
• Can find multiple significant hotspots while holding the type I error at appropriate levels
• Not computationally intensive compared with other cluster detection methods
Taiwan Study
• Found a possible relationship between childhood leukemia and petrochemical exposure, but not between childhood brain cancer and petrochemical exposure.
Discussion
Future Research
• Failure Time Data
• Recurrent Events
• Relocation of Study Participants
• Surveillance