Capture-recapture and Disease Registers
Geraldine Surman, Matthias Pierce
15 March 2010
Nov 18, 2014
Capture-recapture
• What is C-Rc?
• What is C-Rc for?
• Methods
• How useful is it?
What is Capture-recapture?
• Ecologists
• Capture – mark – release – recapture
• % marked among the recaptured animals used to estimate population size
• Epidemiology – increasing use, methods developing
Disease Registers
• List of cases
• Multiple sources of ascertainment (case identification)
Assumptions
• Closed model
• ‘Captures’ are matchable
• Independence of sources
• Homogeneity/equal catchability
Fitting the assumptions to the register
• Closed model – location at birth used to select cases
  – Followed up until age 5 years, even if the child moves out of the area
• Matchability – personal identifiers used
• Homogeneity – every child with CP (cerebral palsy) born in the area should be equally catchable by any one source (NB: severity may affect this)
• Independence – unlikely to be met!
4Child - The Sources
• 22 different source types
• 2337 CP notifications, 1984-2003 births
  – Each child notified up to 5 times
• Sources with < 10 notifications excluded
• Three analytic sources constructed:
  – 1. Child Health Information Systems (45%)
  – 2. Everything else, i.e. all sources except 1 and 3 (40%)
  – 3. Health Visitors (15%)
2 sources of cases
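[Venn diagram: overlapping circles for Source 1 and Source 2; n11 = cases caught by both, n10 = Source 1 only, n01 = Source 2 only, n00 = cases caught by neither]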
Analytical methods used – estimators: 2-source scenario

                      Source 2
                      1       0
Source 1     1        n11     n10     n1+
             0        n01     n00     n0+
                      n+1     n+0     N
Analytical methods used - estimators
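• Assuming the sources are independent, the odds ratio of the 2 × 2 table is 1:

$$ \mathrm{OR} = \frac{n_{11}\, n_{00}}{n_{10}\, n_{01}} = 1 $$

Rearrange to get the (Lincoln-Petersen) estimate:

$$ \hat{n}_{00} = \frac{n_{10}\, n_{01}}{n_{11}}, \qquad \hat{N} = n_{11} + n_{10} + n_{01} + \hat{n}_{00} = \frac{n_{1+}\, n_{+1}}{n_{11}} $$

• For small samples: Chapman's nearly unbiased two-sample estimate:

$$ \hat{N}_{\mathrm{chap}} = \frac{(n_{1+} + 1)(n_{+1} + 1)}{n_{11} + 1} - 1 $$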
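As a quick numerical check of the two estimators, here is a minimal Python sketch. The function names and the counts (n11 = 60, n10 = 40, n01 = 30) are hypothetical, chosen only for illustration; they are not 4Child data.

```python
def lincoln_petersen(n11: int, n10: int, n01: int) -> float:
    """Lincoln-Petersen estimate of N, assuming the two sources are independent."""
    if n11 == 0:
        raise ValueError("no cases caught by both sources; estimate undefined")
    n1_plus = n11 + n10  # cases caught by source 1
    n_plus1 = n11 + n01  # cases caught by source 2
    return n1_plus * n_plus1 / n11


def chapman(n11: int, n10: int, n01: int) -> float:
    """Chapman's nearly unbiased two-sample estimate (small-sample version)."""
    n1_plus = n11 + n10
    n_plus1 = n11 + n01
    return (n1_plus + 1) * (n_plus1 + 1) / (n11 + 1) - 1


# Hypothetical counts, for illustration only
n11, n10, n01 = 60, 40, 30
print(lincoln_petersen(n11, n10, n01))  # 150.0
print(round(chapman(n11, n10, n01), 2))  # 149.67
```

With 130 cases observed, the Lincoln-Petersen estimate implies 150 − 130 = 20 cases missed by both sources; here Chapman's correction gives a slightly lower total.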
3-source scenario

                     Source A: Yes              Source A: No
                     Source B: Yes    No        Source B: Yes    No
Source C: Yes        a                b         c                d
Source C: No         e                f         g                x

x = cases missed by all three sources (unobserved)
3 sources of cases
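[Venn diagram: three overlapping circles for Source 1, Source 2 and Source 3, with regions labelled a to g; x = cases caught by none of the sources]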
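Assessing dependence
If three sources are available, dependence between pairs of sources can be assessed and accounted for by modelling the expected frequencies $F_{ijk}$ of the contingency table with a log-linear model:

$$ \ln F_{ijk} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk} $$

where $\lambda^{A}_{i}$ is the first-order effect of source A at level i, and $\lambda^{AB}_{ij}$ is the second-order (interaction) effect of source A at level i and source B at level j (similarly for AC and BC).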
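Log-linear modelling
• No interaction: sources are independent (1 model)

$$ \ln F_{ijk} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} $$

• Interaction between one pair of sources only (3 models), e.g. A and B:

$$ \ln F_{ijk} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} $$

• Interactions between two pairs of sources (3 models), e.g. A and B, A and C:

$$ \ln F_{ijk} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} $$

• Interactions between all three pairs of sources (1 model – saturated: 7 parameters for the 7 observed cells):

$$ \ln F_{ijk} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk} $$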
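For the saturated model the missing cell has a closed form. The model has no three-way term, so the ratio of odds ratios across the three sources is fixed at 1, which gives x = a·d·f·g / (b·c·e) using the letters of the 3-source table. A minimal Python sketch; the function name and counts are hypothetical, for illustration only.

```python
def missing_cell_saturated(a, b, c, d, e, f, g):
    """Estimate x (missed by all sources) under the all-two-way-interaction model.

    Letters follow the 3-source table:
      a = A, B and C    b = A and C only   c = B and C only   d = C only
      e = A and B only  f = A only         g = B only
    With no three-way interaction, a*d*f*g = b*c*e*x, so x = a*d*f*g / (b*c*e).
    """
    if min(b, c, e) == 0:
        raise ValueError("estimate undefined when b, c or e is zero")
    return a * d * f * g / (b * c * e)


# Hypothetical counts, for illustration only
counts = dict(a=10, b=20, c=15, d=30, e=25, f=40, g=35)
x_hat = missing_cell_saturated(**counts)
print(x_hat)                          # 56.0
print(sum(counts.values()) + x_hat)   # 231.0
```

The other six models have no closed form in general and are fitted iteratively, for example as Poisson regressions.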
Assessing model fit
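• Using the G² goodness-of-fit statistic, summed over the observed cells j:

$$ G^2 = 2 \sum_{j} \mathrm{Obs}_j \ln\!\left(\frac{\mathrm{Obs}_j}{\mathrm{Exp}_j}\right) $$

• Then taking account of the parsimony of the model (simple is best!):

$$ \mathrm{AIC} = G^2 - 2\,(\mathrm{d.f.}) $$

$$ \mathrm{BIC} = G^2 - \ln\!\left(\frac{N_{\mathrm{obs}}}{2\pi}\right)(\mathrm{d.f.}) $$

AIC = Akaike information criterion; BIC = Bayesian information criterion; d.f. = residual degrees of freedom of the model; N_obs = number of observed cases.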
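A minimal Python sketch of the three statistics, assuming the observed and fitted counts of the non-missing cells are already available (for example from a log-linear fit). The function names are hypothetical. The BIC penalty follows this slide's ln(N_obs/2π) form; if your reference uses plain ln(N_obs), change that one line.

```python
import math


def g_squared(observed, expected):
    """G^2 = 2 * sum(Obs * ln(Obs / Exp)); cells with Obs = 0 contribute 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def aic(g2, df):
    """AIC = G^2 - 2 * d.f., where d.f. = observed cells minus fitted parameters."""
    return g2 - 2.0 * df


def bic(g2, df, n_obs):
    """BIC = G^2 - ln(N_obs / 2*pi) * d.f. (penalty form assumed from the slide)."""
    return g2 - math.log(n_obs / (2.0 * math.pi)) * df


# Hypothetical observed and fitted counts for 7 cells, model with 4 parameters
obs = [10, 20, 15, 30, 25, 40, 35]
fit = [11.2, 18.9, 15.4, 29.1, 24.3, 41.0, 35.1]
g2 = g_squared(obs, fit)
print(round(g2, 3), round(aic(g2, 3), 3), round(bic(g2, 3, sum(obs)), 3))
```

Among candidate models, the one with the lowest AIC (or BIC) is preferred; BIC penalises extra parameters more heavily once N_obs exceeds about 46.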
Chapman’s – 4Child data
• Negative dependence between each pair of sources (s1/s2, s1/s3 and s2/s3)
• Population estimates higher than observed
• Fit with assumptions not good
Further Analytical methods
• Log-linear modelling fitted to the data in a contingency table, to account for dependence (interaction) between sources
• Backwards elimination
• Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) evaluate both the likelihood and the parsimony of the model
Further Analytical methods (2)
• Heterogeneity – stratification by:
  – Sex
  – Severity of impairment
  – County of birth
  – Birthweight
• CIs allowing for uncertainty of observed number of cases
• Chi-square test of differences between subgroups
• Revised birth prevalence estimates
Illustration
• Use the poisson command in Stata to fit the independence log-linear model (maximum likelihood, main effects only):
• poisson n cr1 cr2 cr3
• Results: constant β0 = 5.391907
• exp(β0) = 219.6, the estimated number of cases missed by all three sources
• est N = 1355 + 220 = 1575
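For readers without Stata, the same Poisson log-linear fit can be sketched in Python with NumPy using Newton-Raphson (equivalently IRLS). This is an illustration under stated assumptions, not the authors' code: the function name, the capture-history layout and the counts are hypothetical. Because cr1, cr2 and cr3 are coded 0/1, the fitted value for the unobserved (0, 0, 0) cell is exp(β0), as in the Stata output above.

```python
import itertools

import numpy as np


def fit_loglinear(counts, interactions=()):
    """Poisson log-linear fit to the 7 observed cells of a 3-source table.

    counts: dict mapping a capture history (cr1, cr2, cr3), each 0/1, to a count;
            the unobserved (0, 0, 0) cell is left out.
    interactions: pairs of source indices, e.g. [(0, 1)] adds cr1*cr2.
    Returns (beta, x_hat), where x_hat = exp(beta[0]) estimates the missing cell.
    """
    histories = [h for h in itertools.product((0, 1), repeat=3) if h != (0, 0, 0)]
    y = np.array([counts[h] for h in histories], dtype=float)
    # Design matrix: intercept, main effects, then the requested two-way products
    X = np.array([[1.0, *h, *(h[i] * h[j] for i, j in interactions)]
                  for h in histories])
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(100):
        mu = np.exp(X @ beta)
        # Newton-Raphson step for the Poisson log-likelihood with log link
        step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta, float(np.exp(beta[0]))


# Hypothetical counts keyed by (cr1, cr2, cr3), for illustration only
counts = {(1, 1, 1): 10, (1, 0, 1): 20, (0, 1, 1): 15, (0, 0, 1): 30,
          (1, 1, 0): 25, (1, 0, 0): 40, (0, 1, 0): 35}
beta, x_hat = fit_loglinear(counts)                      # main effects only
n_obs = sum(counts.values())
print(round(x_hat, 1), round(n_obs + x_hat))
_, x_sat = fit_loglinear(counts, [(0, 1), (0, 2), (1, 2)])  # saturated model
print(round(x_sat, 1))
```

Passing interaction pairs mirrors adding i.cr1*i.cr2 and similar terms in Stata; the saturated call reproduces the closed-form a·d·f·g / (b·c·e) result.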
Illustration (2)
• Fit the most complex log-linear model (all two-way interactions):
  xi: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr1*i.cr3 i.cr2*i.cr3
• Results indicate that the interactions cr1.cr3 and cr2.cr3 may be dropped
• Running the post-estimation command predict gives a population estimate of 1833
Illustration (3)
• xi: nestreg, qui lr: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr1*i.cr3 i.cr2*i.cr3
• p-values confirm dropping cr1.cr3; dropping cr2.cr3 is marginal
• nestreg was rerun with each 2-way interaction placed last in turn, to find the model with the lowest AIC and BIC: keep cr2.cr3
Illustration (4)
• xi: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr2*i.cr3
• Run estat gof (post-estimation command)
• Results: goodness-of-fit chi2 = .0934347, Prob > chi2(1) = 0.7599
• H0: the model fits; p = 0.76 gives no evidence of lack of fit
Illustration (5)
• xi: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr2*i.cr3
• The post-estimation command predict yields a population estimate of 1860
Confidence Intervals
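$$ \mathrm{Var}(\hat{N}) = \hat{n}_0^2 \times \mathrm{SE}(\hat{\beta}_0)^2 + \frac{N_{\mathrm{obs}} \times \hat{n}_0}{\hat{N}} $$

where $\hat{n}_0 = 505$ is the estimated number of missed cases, $\mathrm{SE}(\hat{\beta}_0) = 0.1732773$ is the standard error of the constant (rounded to 0.17 below), $N_{\mathrm{obs}} = 1355$ and $\hat{N} = 1860$.

Var(estN) = 505² × 0.17² + 1355 × 505/1860 = 7370 + 368 = 7738
SE(estN) = √7738 = 88.0
estN ± 1.96 × SE(estN) = 1860 ± 1.96 × 88 = 1688 to 2032
The Stata command nlcom gives similar CIs: nlcom 1355+exp(_b[_cons]) gives 1688 to 2031.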
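The interval arithmetic can be reproduced with a short Python sketch of the slide's variance formula. The function name is hypothetical; the inputs are the slide's figures, with SE(β0) taken as 0.17, which reproduces the slide's arithmetic.

```python
import math


def ci_total(n_obs, n0_hat, se_b0, z=1.96):
    """Approximate CI for N = n_obs + n0_hat.

    se_b0 is the standard error of the Poisson constant (log scale), so
    n0_hat**2 * se_b0**2 approximates Var(n0_hat) by the delta method;
    n_obs * n0_hat / N is the second term of the slide's variance formula.
    """
    n_hat = n_obs + n0_hat
    var = n0_hat ** 2 * se_b0 ** 2 + n_obs * n0_hat / n_hat
    se = math.sqrt(var)
    return n_hat, se, (n_hat - z * se, n_hat + z * se)


n_hat, se, (lo, hi) = ci_total(1355, 505, 0.17)
print(round(se, 1), round(lo), round(hi))  # 88.0 1688 2032
```

Using the unrounded SE of 0.1732773 instead widens the interval slightly, since the first variance term grows from about 7370 to about 7657.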
Results and Conclusions
• Overall, a significant % of individuals were uncaptured
• Severe motor impairment: low % missing
• Two counties: 6-10% missing
Results and Conclusions (2)
• ‘Corrected’ birth prevalence estimates higher than observed but supported a decline in CP rate over time
• Good ascertainment is possible where resources are focussed
Estimating the UK-wide prevalence of gastroschisis using 3 sources and C/R
• 3 sources:
  – UK Obstetric Surveillance System (UKOSS)
  – British Association of Paediatric Surgeons (BAPS)
  – British Isles Network of Congenital Anomalies Registers (BINOCAR), <50% coverage
• 2 different analyses: BINOCAR areas and non-BINOCAR areas
BINOCAR area analysis
• Two-source capture-recapture analysis between BINOCAR and UKOSS, for all cases in BINOCAR areas
  – Confidence intervals using the G² goodness-of-fit statistic
• Three-source analysis of all live-birth cases to assess independence of sources
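One way to build a goodness-of-fit interval for a two-source estimate is to keep every hypothesised missing count n00 for which the completed 2 × 2 table is not rejected by the G² test of independence (1 d.f., 5% level, critical value 3.841). The exact procedure used in this analysis is not stated, so treat the following Python sketch as an assumption; the function names and counts are hypothetical.

```python
import math


def g2_independence_2x2(n11, n10, n01, n00):
    """G^2 test of independence for a completed 2x2 table (1 d.f.)."""
    table = [[n11, n10], [n01, n00]]
    total = n11 + n10 + n01 + n00
    rows = [n11 + n10, n01 + n00]
    cols = [n11 + n01, n10 + n00]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:  # empty cells contribute 0
                expected = rows[i] * cols[j] / total
                g2 += 2.0 * obs * math.log(obs / expected)
    return g2


def gof_interval(n11, n10, n01, crit=3.841, max_missing=None):
    """Smallest and largest n00 whose completed table has G^2 <= crit."""
    if max_missing is None:
        max_missing = 20 * (n11 + n10 + n01)  # hypothetical search limit
    accepted = [x for x in range(max_missing + 1)
                if g2_independence_2x2(n11, n10, n01, x) <= crit]
    return min(accepted), max(accepted)


# Hypothetical counts; the Lincoln-Petersen point estimate of n00 is 20
lo, hi = gof_interval(60, 40, 30)
print(lo, hi)
```

Adding the observed 130 cases to each end converts the interval for n00 into an interval for N.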
Non-BINOCAR area analysis
• Two-source analysis on live-birth data only
• UKOSS under-ascertainment estimated
• Total estimated cases extrapolated
  – CIs calculated using bootstrapping
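The slides do not say how the bootstrap was done. A minimal Python sketch of one simple version, a percentile bootstrap that resamples the observed cases (conditioning on the observed total) and recomputes Chapman's estimate each time, is shown below as an assumption; the function names, counts and seed are hypothetical.

```python
import random


def chapman(n11, n10, n01):
    """Chapman's two-sample estimate of the total number of cases."""
    return (n11 + n10 + 1) * (n11 + n01 + 1) / (n11 + 1) - 1


def bootstrap_ci(n11, n10, n01, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for Chapman's estimate, resampling observed cases."""
    rng = random.Random(seed)
    cases = ["11"] * n11 + ["10"] * n10 + ["01"] * n01  # capture histories
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(cases, k=len(cases))  # sample with replacement
        estimates.append(chapman(sample.count("11"), sample.count("10"),
                                 sample.count("01")))
    estimates.sort()
    lo = estimates[int(alpha / 2 * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return chapman(n11, n10, n01), (lo, hi)


point, (lo, hi) = bootstrap_ci(60, 40, 30)
print(round(point, 1), round(lo, 1), round(hi, 1))
```

Each bootstrap estimate can also be carried forward into the prevalence calculation, so that its variability feeds the combined variance.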
Calculating prevalence estimate
• Need to accommodate two sources of variation: that in the prevalence estimate and that in the C/R estimate
• To combine these two sources, use techniques borrowed from multiple imputation (Rubin's rules)
• After M bootstraps:
  – V_B = variance in the bootstrapped estimates of the incidence
  – σ²_i = the square of the standard error of the incidence in the ith bootstrap

$$ V_{\mathrm{Tot}} = \frac{1}{M} \sum_{i=1}^{M} \sigma_i^2 + \left(1 + \frac{1}{M}\right) V_B $$
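A minimal Python sketch of this combining rule, assuming each bootstrap replicate already yields an incidence estimate and its standard error. The function name and example numbers are hypothetical. V_B uses the sample variance (denominator M − 1), as in Rubin's rules.

```python
import statistics


def combine_bootstraps(estimates, std_errors):
    """Combine M bootstrap results: V_tot = mean(sigma_i^2) + (1 + 1/M) * V_B.

    estimates: incidence estimate from each bootstrap
    std_errors: standard error of the incidence within each bootstrap
    Returns (pooled estimate, V_tot).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two bootstraps with matching standard errors")
    within = sum(se ** 2 for se in std_errors) / m  # (1/M) * sum of sigma_i^2
    between = statistics.variance(estimates)        # V_B, sample variance
    return statistics.mean(estimates), within + (1 + 1 / m) * between


est, v_tot = combine_bootstraps([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(v_tot, 4))  # 2.0 1.5833
```

The within-bootstrap term carries the sampling error of each prevalence estimate, and the between-bootstrap term carries the capture-recapture uncertainty.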
Findings
• Amongst regions where BINOCAR operates, C/R did not add many cases (0.01%)
  – Not surprising, since BINOCAR caught >95% of cases and 88% of cases were caught by two or more sources
• In non-BINOCAR areas, an extra 15% of cases were added after C/R
  – 49% caught by both registers
Would I use C-Rc again?
• % individuals notified only once
• C-Rc
  – Much more complex initially
  – Once set up, easy to repeat
  – Provides population estimates
  – Comparable with other studies
• Yes, I would use it again, always emphasising the range of estimates according to CIs