An example of a study problem: Bias or truth?
Jan 04, 2016
How it all started 1968/69
Invited:
1622 women aged 38, 46, 50, 54 and 60 years
Examined:
1462 women (90.1%)
Psychiatric follow-up
1992/93:
Those born 1908, 1914, 1918 and 1922, i.e. the four oldest age cohorts, aged 84, 78, 74 and 70 respectively
Interview
526 of the survivors agreed (89.6%)
CT scan:
277 women agreed
Samples
• 120 ml blood
• Serum stored at –20°C in 2.5 ml covered polystyrene cups, enclosed together in small batches in firmly tied plastic bags, for 28 years.
• Thawed once for other analyses at 25 years, then stored again for two years at –80°C.
CT scan and Lacunar Infarcts (LI)
• Endpoints:
  – Lacunar infarcts
  – White matter lesions
Logistic regression
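$$E(Y \mid x_1, x_2, \ldots, x_p) = p(x_1, x_2, \ldots, x_p) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p}}$$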
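To make the link between the logistic model and an odds ratio concrete, here is a minimal Python sketch. The intercept of -2.0 is purely hypothetical, and the single coefficient is set to ln(3.65) only so that it matches an odds ratio of 3.65 (the value reported for the third tHcy tertile adjusted for age alone); neither number is a fitted estimate from the study.

```python
import math


def logistic_probability(intercept, coefficients, values):
    """Return p(x_1..x_p) = e^eta / (1 + e^eta), eta = b0 + sum(b_i * x_i)."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-eta))


def odds(probability):
    """Convert a probability into odds p / (1 - p)."""
    return probability / (1.0 - probability)


# Hypothetical model: one indicator variable (1 = highest tertile, 0 = lowest).
intercept = -2.0              # assumed value, for illustration only
coefficients = [math.log(3.65)]

p_unexposed = logistic_probability(intercept, coefficients, [0])
p_exposed = logistic_probability(intercept, coefficients, [1])

# The ratio of the two odds equals exp(coefficient), whatever the intercept.
ratio_of_odds = odds(p_exposed) / odds(p_unexposed)
```

In this model the exponentiated coefficient of a predictor is the odds ratio for a one-unit increase in that predictor, with the other predictors held fixed.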
Predictors – independent variables
• Homocysteine was analysed in tertiles using cut points previously calculated for the whole sample.
• Covariates included age, basic cardiovascular risk factors and influential factors for tHcy.
• The basic CVD risk factors were systolic blood pressure, diastolic blood pressure, serum cholesterol, serum triglycerides, BMI and smoking.
• Factors considered to be influential for tHcy were serum B-12, serum creatinine, coffee consumption and dietary folate.
Odds ratio for LI according to tHcy
                            1st tHcy tertile   2nd tHcy tertile    3rd tHcy tertile
Range                       3.3-9.7            9.8-12.6            12.7-41.5
Median                      8.31               11.2                14.8
n                           87                 110                 80
LI (n)
Covariate studied
Age alone                   1                  1.54 (0.54-4.38)    3.65 (1.34-9.90)
Systolic blood pressure*    1                  1.50 (0.52-4.24)    3.60 (1.32-9.85)
Diastolic blood pressure*   1                  1.44 (0.50-4.13)    3.50 (1.28-9.55)
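The intervals in the table are wide, which is a statement about precision. A rough way to read them is sketched below in Python, under the assumption (not stated on the slide) that they are 95% Wald intervals computed on the log-odds scale. If that holds, the point estimate should sit near the geometric mean of the two limits, and the standard error of ln(OR) can be recovered from the interval width.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Return (standard error of ln(OR), geometric midpoint of the interval).

    Assumes the interval is exp(ln(OR) +/- z * SE), i.e. a Wald interval.
    """
    standard_error = (math.log(upper) - math.log(lower)) / (2.0 * z)
    midpoint = math.sqrt(lower * upper)
    return standard_error, midpoint


# "Age alone" row of the table: (odds ratio, lower limit, upper limit).
age_alone = {
    "2nd tertile": (1.54, 0.54, 4.38),
    "3rd tertile": (3.65, 1.34, 9.90),
}

summaries = {}
for tertile, (point, lower, upper) in age_alone.items():
    se, midpoint = log_scale_summary(lower, upper)
    summaries[tertile] = (point, midpoint, se)
    print(f"{tertile}: OR={point}, geometric midpoint={midpoint:.2f}, SE(ln OR)={se:.2f}")
```

The reported odds ratios agree with the geometric midpoints to within rounding, which is consistent with the assumed interval type but does not prove it.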
Study design
• Type of study?
• Possible biases?
True or false
• The data must be representative if we want to generalize the results
• A homogeneous sample makes generalization easier
Estimate of OR
[Figure: axis for the OR running from 0 to ∞, marking OR = 1 and the true value]
Estimation, precision
[Figure: a sample gives a point estimate, shown with its confidence interval on an axis from 0 to ∞ together with the true value]
95% confidence interval: in repeated sampling, 95% of the intervals constructed this way will contain the true value
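The repeated-sampling interpretation of a 95% confidence interval can be checked by simulation. The Python sketch below is an illustration only: it uses a normally distributed measurement with known standard deviation rather than an odds ratio, and the true mean, sample size, number of repetitions and seed are all arbitrary assumptions.

```python
import math
import random


def coverage(true_mean=0.0, sigma=1.0, n=50, repetitions=2000, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)  # known-sigma interval
    hits = 0
    for _ in range(repetitions):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repetitions


observed_coverage = coverage()
print(f"Share of intervals containing the true value: {observed_coverage:.3f}")
```

The share should land close to 0.95. Any single interval either contains the true value or does not; the 95% describes the procedure, not one particular interval.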
Precision and validity
• Measures of populations
  – precision: random error (statistics)
  – validity: systematic error (epidemiology)
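[Figure: the estimate and the true value; precision is the spread around the estimate, bias is the distance between the estimate and the true value]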
True or false
• It takes 2 to tango
• It takes 3 chords to play the blues
• It takes 4 numbers to be an epidemiologist
2 x 2 table
Exposure/test   Disease   No disease
positive        a         b
negative        c         d
Odds ratio for the study population; hypothesis: no effect
$$OR_S = \frac{A \cdot D}{B \cdot C}$$
No effect, OR = 1
Odds ratio for the sample: observed effect
$$OR_D = \frac{a \cdot d}{b \cdot c}$$
OR > 1
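The sample odds ratio is simple arithmetic on the four cells of the 2 x 2 table. The Python sketch below computes it together with a 95% interval using the standard log (Woolf) method. The cell counts are hypothetical and chosen only to give round numbers; they are not the study's data.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a 2 x 2 table.

    a: exposed, diseased      b: exposed, not diseased
    c: unexposed, diseased    d: unexposed, not diseased
    The interval is exp(ln(OR) +/- z * SE), SE = sqrt(1/a + 1/b + 1/c + 1/d).
    """
    estimate = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper


# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio_with_ci(a=20, b=40, c=10, d=60)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Four numbers are enough to obtain both the estimate and its precision, which is the point of the "4 numbers" remark.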
Odds ratio for the sample
$$OR_D = \frac{a \cdot d}{b \cdot c}$$
OR > 1
d too large? Those without disease and low tHcy more likely to be included?
Odds ratio for the sample
$$OR_D = \frac{a \cdot d}{b \cdot c}$$
OR > 1
a too large? Those with disease and high tHcy more likely to be included?
Conclusion 1
• If those ill and exposed are more likely to be examined, the effect will be overestimated.
• Likewise if those unaffected and unexposed are more likely to be examined.
Odds ratio for the sample
$$OR_D = \frac{a \cdot d}{b \cdot c}$$
OR > 1
a too small? Those with disease and high tHcy less likely to be included?
Conclusion 2
• If those ill and exposed are less likely to be examined, the effect estimate will be attenuated
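Both conclusions follow from one piece of arithmetic: if each cell of the study population enters the sample with its own selection probability, the sample odds ratio equals the population odds ratio multiplied by (s_a · s_d) / (s_b · s_c). The Python sketch below illustrates this with expected cell counts. The population counts and selection probabilities are hypothetical, chosen only to make the direction of the bias visible.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio of a 2 x 2 table."""
    return (a * d) / (b * c)


def sample_odds_ratio(population, selection):
    """Odds ratio among those examined, using expected cell counts.

    population: (A, B, C, D) cell counts in the study population
    selection:  (s_a, s_b, s_c, s_d) probability of being examined, per cell
    """
    expected = [count * s for count, s in zip(population, selection)]
    return odds_ratio(*expected)


# Conclusion 1: no true effect, but ill and exposed people are over-included.
no_effect = (100, 400, 200, 800)            # population OR = 1
inflated_by_a = sample_odds_ratio(no_effect, (0.9, 0.6, 0.6, 0.6))

# Same bias when the unaffected and unexposed are over-included.
inflated_by_d = sample_odds_ratio(no_effect, (0.6, 0.6, 0.6, 0.9))

# Conclusion 2: a true effect, but ill and exposed people are under-included.
true_effect = (200, 400, 200, 800)          # population OR = 2
attenuated = sample_odds_ratio(true_effect, (0.3, 0.6, 0.6, 0.6))

print(inflated_by_a, inflated_by_d, attenuated)
```

If all four selection probabilities are equal, the factor is 1 and the sample odds ratio is unbiased however low the participation rate is; only differential selection across cells distorts the estimate.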
Generalization
• Do the results apply outside the sample?
• Statistical generalization
  – Representative sample
• Biological generalization
  – Information from outside the study
  – Homogeneous sample