Capture Recapture Mar 10

Capture-recapture and Disease Registers

Geraldine Surman, Matthias Pierce
15 March 2010

Capture-recapture

• What is C-Rc?
• What is C-Rc for?
• Methods
• How useful is it?

What is Capture-recapture?

• Ecologists
• Capture – mark – release – recapture
• % marked used to estimate population size
• Epidemiology – increasing use, methods developing

Disease Registers

• List of cases
• Multiple sources of ascertainment (case identification)

Assumptions

• Closed model
• ‘Captures’ are matchable
• Independence of sources
• Homogeneity/equal catchability

Fitting the assumptions to the register

• Closed model – location at birth used to select
  – Followed up till age 5 yrs, even if move out
• Matchability – personal identifiers used
• Every child with CP born in area should be equally catchable by any one source (NB severity)
• Independence – unlikely to be met!

4Child - The Sources

• 22 different source types
• 2337 CP notifications, 1984-2003 births
  – Each child notified up to 5 times
• Sources with < 10 notifications excluded
• Three analytic sources constructed
  – 1. Child Health Information Systems (45%)
  – 2. everything else except (40%)
  – 3. Health Visitors (15%)
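Turning repeated notifications into one capture history per child can be sketched as below. This is only an illustration in Python, not the procedure used for 4Child: the child IDs, the notification list and the function name `capture_histories` are all invented.

```python
from collections import Counter

# Hypothetical notifications: (child_id, analytic_source). The IDs and
# source numbers are invented for illustration, not taken from 4Child.
notifications = [
    ("c01", 1), ("c01", 2),
    ("c02", 1),
    ("c03", 2), ("c03", 3),
    ("c04", 1), ("c04", 2), ("c04", 3),
    ("c05", 3),
    ("c01", 1),  # repeat notification from the same source
]

def capture_histories(records, n_sources=3):
    """Count children by capture history, e.g. (1, 0, 1) = sources 1 and 3 only."""
    seen = {}
    for child, source in records:
        seen.setdefault(child, set()).add(source)
    histories = Counter()
    for sources in seen.values():
        key = tuple(int(s in sources) for s in range(1, n_sources + 1))
        histories[key] += 1
    return histories

table = capture_histories(notifications)
for history, count in sorted(table.items(), reverse=True):
    print(history, count)
```

Repeat notifications from the same source collapse into a single capture, and the history (0, 0, 0) never appears: that is the cell capture-recapture tries to estimate.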

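2 sources of cases

[Venn diagram: overlapping circles for Source 1 and Source 2. n11 = cases in both sources, n10 = Source 1 only, n01 = Source 2 only, n00 = cases missed by both.]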

Analytical methods used – estimators
2 source scenario

                     Source 2
                     1      0
Source 1     1      n11    n10    n1+
             0      n01    n00    n0+
                    n+1    n+0    N

Analytical methods used - estimators

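• Taking assumption of independence of sources, the odds ratio is 1:

\[ OR = \frac{n_{11} \cdot n_{00}}{n_{10} \cdot n_{01}} = 1 \]

Rearrange to get the (Lincoln-Petersen) estimate of the unobserved cell:

\[ \hat{n}_{00} = \frac{n_{10} \cdot n_{01}}{n_{11}} \]

• For small samples: Chapman’s two sample nearly unbiased estimate of the total, using the source totals n1+ and n+1:

\[ \hat{N}_{chap} = \frac{(n_{1+} + 1)(n_{+1} + 1)}{(n_{11} + 1)} - 1 \]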
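A minimal Python sketch of the two estimators, assuming the cell notation of the 2 x 2 table (n11 in both sources, n10 in Source 1 only, n01 in Source 2 only). Chapman's estimator is written in its usual form, using the two source totals. The function names and counts are invented for illustration.

```python
def lincoln_petersen(n11, n10, n01):
    """Return (estimated n00, estimated N) assuming independent sources."""
    if n11 == 0:
        raise ValueError("no overlap between sources: estimate undefined")
    n00 = n10 * n01 / n11
    return n00, n11 + n10 + n01 + n00

def chapman(n11, n10, n01):
    """Chapman's nearly unbiased estimate of N from the source totals."""
    n1_plus = n11 + n10   # all cases in source 1
    n_plus1 = n11 + n01   # all cases in source 2
    return (n1_plus + 1) * (n_plus1 + 1) / (n11 + 1) - 1

# Hypothetical counts, for illustration only
n11, n10, n01 = 60, 30, 20
print(lincoln_petersen(n11, n10, n01))
print(chapman(n11, n10, n01))
```

Unlike Lincoln-Petersen, Chapman's form stays defined when there is no overlap (n11 = 0), which is one reason it is preferred for small samples.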

3 sourced scenario

                           Source A
                    Yes               No
                  Source B          Source B
                 Yes    No         Yes    No
Source C   Yes    a      b          c      d
           No     e      f          g      x

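3 sources of cases

[Venn diagram: three overlapping circles for Source 1, Source 2 and Source 3. Regions a to g hold cases found by at least one source; x is the number of cases missed by all three.]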

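Assessing dependence

If 3 sources are available, two sourced dependence can be assessed and accounted for by modeling the expected frequencies in a contingency table with a log linear model:

\[ \ln F_{ijk} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} \]

Where $F_{ijk}$ is the expected frequency in cell (i, j, k),

$\lambda_i^{A}$ is the first order effect of source A at level i, and

$\lambda_{ij}^{AB}$ is the second order (interaction) effect of sources A at level i and B at level j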

Log linear modeling

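• No interaction: sources are independent (1 model)

\[ \ln F_{ijk} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} \]

• Interaction between 2 sources only (3 models), e.g. A and B:

\[ \ln F_{ijk} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{AB} \]

• Interactions between two pairs of sources (3 models), e.g. AB and AC:

\[ \ln F_{ijk} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} \]

• Interactions between all sources 2 by 2 (1 model - saturated)

\[ \ln F_{ijk} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} \]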
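The eight candidate models (1 + 3 + 3 + 1) can be listed mechanically. The Python sketch below assumes one parameter per term plus an intercept, and 7 observable cells in the 2 x 2 x 2 table, since the cell missed by all three sources is unobserved. The names are illustrative only.

```python
from itertools import combinations

SOURCES = ("A", "B", "C")
PAIRS = list(combinations(SOURCES, 2))  # AB, AC, BC

def candidate_models():
    """List every model: main effects plus any subset of two-way interactions."""
    models = []
    for k in range(len(PAIRS) + 1):
        for chosen in combinations(PAIRS, k):
            terms = list(SOURCES) + ["".join(p) for p in chosen]
            models.append(terms)
    return models

for terms in candidate_models():
    n_params = 1 + len(terms)      # intercept + one parameter per term
    df = 7 - n_params              # 7 observed cells in the 2x2x2 table
    print(" + ".join(terms), "| d.f. =", df)
```

The independence model has 3 degrees of freedom. The model with all three pairwise interactions uses all 7 cells, which is why it is saturated.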

Assessing model fit

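• Using the G-squared goodness-of-fit statistic:

\[ G^2 = 2 \sum_j Obs_j \ln(Obs_j / Exp_j) \]

• Then using the parsimony of the model (simple is best!):

\[ AIC = G^2 - 2(d.f.) \]

\[ BIC = G^2 - [\ln(N_{obs} / 2\pi)](d.f.) \]

AIC = Akaike information criterion
BIC = Bayesian information criterion
d.f. = degrees of freedom, N_obs = number of observed cases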
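The three statistics are straightforward to compute once fitted cell counts are available. This Python sketch assumes the forms G² = 2 Σ Obs ln(Obs/Exp), AIC = G² − 2(d.f.) and BIC = G² − ln(N_obs/2π)(d.f.). The observed and fitted counts are invented and do not come from the 4Child data.

```python
import math

def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(obs * ln(obs / exp))."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def aic(g2, df):
    """AIC in the G^2 - 2(d.f.) form; lower is better."""
    return g2 - 2 * df

def bic(g2, df, n_obs):
    """BIC in the G^2 - ln(N_obs / 2*pi)(d.f.) form; lower is better."""
    return g2 - math.log(n_obs / (2 * math.pi)) * df

# Hypothetical observed and fitted counts for the 7 observable cells
observed = [120, 80, 60, 200, 150, 90, 300]
expected = [115.0, 84.0, 63.0, 196.0, 155.0, 88.0, 299.0]
g2 = g_squared(observed, expected)
print(round(g2, 3), round(aic(g2, 3), 3), round(bic(g2, 3, sum(observed)), 3))
```

Because both criteria subtract a multiple of the degrees of freedom, a simpler model (more d.f.) is rewarded unless its G² is much larger.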

Chapman’s – 4Child data

• Negative dependence between each pair of sources (s1/s2, s1/s3 and s2/s3)
• Population estimates higher than observed
• Fit with assumptions not good

Further Analytical methods

• Log linear modelling fitted to the data in a contingency table, to account for dependence (interaction) between sources
• Backwards elimination
• Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) evaluate the likelihood and the parsimony of the model.

Further Analytical methods (2)

• Heterogeneity – stratification
• Sex
• Severity of impairment
• County of birth
• Birthweight
• CIs allowing for uncertainty of observed number of cases
• Chi square on difference between subgroups
• Revised birth prevalence estimates

Illustration

• Use the Poisson command in STATA for independence log-linear model, maximum likelihood, main effects only.

• poisson n cr1 cr2 cr3
• Results: coeff = 5.391907
• exp β0 = 219.6
• est N = 1355 + 220 = 1575
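The arithmetic behind this estimate is sketched below in Python. It assumes cr1 to cr3 are coded 1 = notified and 0 = not notified, so that the intercept is the log of the expected count in the cell missed by all three sources. The variable names are illustrative.

```python
import math

n_observed = 1355        # children seen by at least one source (from the slide)
beta0 = 5.391907         # intercept reported by the independence model

n_missed = math.exp(beta0)          # expected count in the unobserved cell
n_total = n_observed + n_missed
print(round(n_missed, 1), round(n_total))
```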

Illustration (2)

• Fit the most complex log-linear model.
  xi: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr1*i.cr3 i.cr2*i.cr3
• Results indicate that interactions cr1.cr3, and cr2.cr3 may be dropped
• Running the post-estimation command, predict, gives a population estimate of 1833

Illustration (3)

• xi: nestreg, qui lr: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr1*i.cr3 i.cr2*i.cr3
• p-values confirm dropping cr1.cr3, and cr2.cr3 (marginal)

• nestreg was rerun alternating the 2-way interactions in last position to find the lowest AIC and BIC model – keep cr2.cr3

Illustration (4)

• . xi: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr2*i.cr3

• run estat gof (post estimation command)
• Results: Goodness-of-fit chi2 = .0934347
• Prob > chi2(1) = 0.7599
• H0: the model fits. With p = 0.76 there is no evidence of lack of fit
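The reported p-value can be checked by hand. With 1 degree of freedom the chi-square upper tail reduces to the complementary error function, so a short Python sketch needs only the standard library. The function name is illustrative.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2))

p = chi2_sf_1df(0.0934347)
print(round(p, 4))
```

The single degree of freedom matches 7 observable cells minus the 6 fitted parameters (intercept, three main effects, two interactions).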

Illustration (5)

• . xi: poisson n (i.cr1 i.cr2 i.cr3) i.cr1*i.cr2 i.cr2*i.cr3

• Post estimation command, predict, yields a population estimate of 1860

Confidence Intervals

Var(estN) = (est n0)² × (SE)² + 1355 × 505 / 1860
          = 505² × 0.1732773² + 1355 × 505 / 1860
          = 7370 + 368
          = 7739

SE(estN) = √7739 = 88.0

estN ± 1.96 × SE(estN) = 1860 ± 1.96 × 88 = 1688 to 2032

The STATA command nlcom gives similar CIs:
  nlcom 1355+exp(_b[_cons])
Gives: 1688 to 2031
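The interval can be recomputed with a few lines of Python. The sketch assumes SE is the standard error of the intercept on the log scale, so the first term is the delta-method variance of exp(β0). The second term is taken as written on the slide. The function name `ci_total` is illustrative. Rounding the SE to 0.17 reproduces the slide's 7370 and the interval 1688 to 2032. The unrounded 0.1732773 gives a first term of about 7657 and a slightly wider interval.

```python
import math

def ci_total(n_obs, n0_hat, se_log_n0, z=1.96):
    """Variance, SE and 95% CI for N = n_obs + n0_hat (formula from the slide)."""
    n_hat = n_obs + n0_hat
    var = (n0_hat ** 2) * (se_log_n0 ** 2) + n_obs * n0_hat / n_hat
    se = math.sqrt(var)
    return var, se, (n_hat - z * se, n_hat + z * se)

# Slide values, with the SE rounded to 0.17
var, se, (lo, hi) = ci_total(1355, 505, 0.17)
print(round(var), round(se, 1), round(lo), round(hi))

# Same calculation with the unrounded SE 0.1732773
var_full, se_full, (lo_full, hi_full) = ci_total(1355, 505, 0.1732773)
print(round(var_full), round(se_full, 1), round(lo_full), round(hi_full))
```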

Results and Conclusions

• Overall significant % uncaptured individuals
• Severe motor impairment low % missing
• Two counties 6-10% missing

Results and Conclusions (2)

• ‘Corrected’ birth prevalence estimates higher than observed but supported a decline in CP rate over time

• Good ascertainment is possible where resources are focussed

Estimating the UK-wide prevalence of gastroschisis using 3 sources and C/R

• 3 sources:
  – UK Obstetric Surveillance System (UKOSS)
  – British Association of Paediatric Surgeons (BAPS)
  – British Isles Network Of Congenital Anomalies Registers (BINOCAR), <50% coverage

2 different analyses: BINOCAR areas and non-BINOCAR areas

BINOCAR area analysis

• 2 sourced capture recapture analysis between all cases caught by BINOCAR and UKOSS areas
  – Confidence intervals using goodness of fit statistic Gsquared
• 3 sourced analysis for all livebirth cases to assess independence of sources

Non-BINOCAR area analysis

• Two sourced analysis on livebirth data only
• UKOSS underascertainment estimated
• Total estimated cases extrapolated using
  – CIs calculated using bootstrapping

Calculating prevalence estimate

• Need to accommodate two sources of variation – that in the prevalence estimate and that in the c/r estimate
• To combine these two sources, use techniques borrowed from multiple imputation.
• After M bootstraps:

\[ V_{Tot} = \frac{1}{M} \sum_{i=1}^{M} \sigma_i^2 + \frac{M+1}{M} V_B \]

V_B = variance in the bootstrapped estimates of the incidence
σ_i² = the square of the standard error of the incidence in the ith bootstrap
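The combination rule can be sketched in Python as the average within-bootstrap variance plus (M + 1)/M times the between-bootstrap variance, as in multiple imputation. The estimates and standard errors below are invented, and the function name `total_variance` is illustrative.

```python
from statistics import mean, variance

def total_variance(estimates, standard_errors):
    """Combine within- and between-bootstrap variance (multiple-imputation style)."""
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need M >= 2 estimates with matching standard errors")
    within = mean(se ** 2 for se in standard_errors)   # (1/M) * sum of sigma_i^2
    between = variance(estimates)                      # V_B, sample variance
    return within + (m + 1) / m * between

# Hypothetical bootstrap output: an estimate and its SE from each replicate
estimates = [4.0, 4.4, 3.8, 4.2, 4.1]
standard_errors = [0.30, 0.32, 0.29, 0.31, 0.30]
print(round(total_variance(estimates, standard_errors), 4))
```

The (M + 1)/M factor inflates the between-bootstrap variance slightly to allow for using a finite number of replicates.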

Findings

• Amongst regions where BINOCAR operates, C/R did not add many cases (0.01%).
  – Not surprising since BINOCAR caught >95% of cases and 88% of cases were caught by two or more sources
• In non-BINOCAR areas, an extra 15% of cases were added after C/R
  – 49% caught by both registers

Would I use C-Rc again?

• % individuals notified only once
• C-Rc
  – Much more complex initially
  – Once set up, easy to repeat
  – Provides population estimates
  – Comparable with other studies

• Yes, I would use it again, always emphasising the range of estimates according to CIs
