BIOSTATISTICS II Capita Selecta, 2009 Part I Analysis of Variance Part II Generalized Linear Models Part III Multiple regression and model building Part IV Sample size calculations Part V Measuring agreement Part VI Systematic review and meta-analysis Søren Lundbye Christensen Johannes J. Struijk
Page 1: BIOSTATISTICS

BIOSTATISTICS II
Capita Selecta, 2009

Part I Analysis of Variance
Part II Generalized Linear Models
Part III Multiple regression and model building
Part IV Sample size calculations
Part V Measuring agreement
Part VI Systematic review and meta-analysis

Søren Lundbye Christensen
Johannes J. Struijk

Page 2: BIOSTATISTICS

Part III
Multiple regression & model building

Literature: any serious book on statistics

Martin Bland, ”Introduction to medical statistics”, Oxford Univ. Press, 2000, chapter 17.

Page 3: BIOSTATISTICS

Multiple regression & model building

Basic model:

$$Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k + e$$

$$Y = Xb + e$$

Best (minimum mean square error) estimator:

$$\hat{Y} = X\hat{b}$$

Solution for b:

$$X^T Y = X^T X\,\hat{b}$$

$$S_{xx}\,\hat{b} = S_{xy}$$

$$\hat{b} = S_{xx}^{-1} S_{xy}$$

We immediately see a problem: if some of the independent variables are linearly related then the inverse of the covariance matrix $S_{xx}$ doesn’t exist.

Page 4: BIOSTATISTICS

Multiple regression & model building

Maximum voluntary contraction (MVC) of the quadriceps muscle as a function of age and height of 41 alcoholics.

Page 5: BIOSTATISTICS

Multiple regression & model building

Page 6: BIOSTATISTICS

Multiple regression & model building

Model: MVC = b0 + b1 × Height + b2 × Age

Multiple correlation coefficient R: R² = SSReg / SST (proportion of variability accounted for)
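As a minimal sketch of the least-squares fit and of R² = SSReg / SST, assuming NumPy is available: the data below are synthetic stand-ins (the 41-patient MVC data set is not reproduced in these slides), and the function name `fit_linear_model` is hypothetical.

```python
import numpy as np

def fit_linear_model(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Returns the coefficient vector (intercept first) and R^2 = SSReg / SST.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # column of ones for b0
    b, *_ = np.linalg.lstsq(design, y, rcond=None)  # solves X^T X b = X^T y
    fitted = design @ b
    ss_reg = np.sum((fitted - y.mean()) ** 2)       # variability explained
    ss_tot = np.sum((y - y.mean()) ** 2)            # total variability
    return b, ss_reg / ss_tot

# Synthetic illustration only: made-up heights (cm), ages (years) and MVC.
rng = np.random.default_rng(0)
height = rng.normal(170, 8, size=41)
age = rng.normal(45, 10, size=41)
mvc = 50 + 3.0 * height - 4.0 * age + rng.normal(0, 20, size=41)

b, r2 = fit_linear_model(np.column_stack([height, age]), mvc)
print("b0, b1 (Height), b2 (Age):", b)
print("R^2:", r2)
```

`np.linalg.lstsq` is used rather than an explicit matrix inverse because it still returns a solution when the columns of X are close to linearly dependent, which is the problem noted on the earlier slide.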

Page 7: BIOSTATISTICS

Multiple regression & model building

Page 8: BIOSTATISTICS

Multiple regression & model building

Interaction:

MVC = b0 + b1 × Height + b2 × Age + b3 × Height × Age

Note: adjusted R²: Ra² = 1 − (1 − R²)(n − 1)/(n − p − 1), with n observations and p predictors
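The adjusted R² formula in the note can be sketched as a small function. The R² value of 0.30 used below is purely hypothetical; it only illustrates that adding a predictor (here the interaction term, p = 3 instead of p = 2) lowers the adjusted value unless R² itself rises enough.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical R^2 = 0.30 with n = 41 subjects.
print(adjusted_r2(0.30, n=41, p=2))  # model with Height and Age
print(adjusted_r2(0.30, n=41, p=3))  # same R^2 after adding the interaction
```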

Page 9: BIOSTATISTICS

Multiple regression & model building

Polynomial regression: MVC = b0 + b1 × Height + b2 × Height²

Page 10: BIOSTATISTICS

Multiple regression & model building

Page 11: BIOSTATISTICS

Multiple regression & model building

Dichotomous variables

Examples:
- Sex: man / woman
- Liver disease: yes / no

Assign 0’s and 1’s to those variables and use the standard techniques

Page 12: BIOSTATISTICS

Multiple regression & model building

Variance inflation factor: VIF = 1/(1 − Ri²), where Ri² is the R² obtained by regressing xi on the other x’s

VIF > 10 is a real problem (Ri² > 90%: more than 90% of the variability of xi is explained by the other x’s)

Leverage: Cook’s distance (influential points)
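A minimal sketch of the VIF calculation, assuming NumPy: each predictor is regressed on all the others and VIF = 1/(1 − Ri²) is computed from that fit. The function name and the synthetic predictors are assumptions made for illustration only.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_i = 1 / (1 - R_i^2) for every column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for i in range(k):
        target = X[:, i]
        others = np.delete(X, i, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        # R_i^2 of the regression of x_i on the remaining predictors
        r2_i = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2_i))
    return np.array(vifs)

# Synthetic illustration: x3 is almost a linear combination of x1 and x2.
rng = np.random.default_rng(1)
x1 = rng.normal(size=41)
x2 = rng.normal(size=41)
x3 = x1 + x2 + 0.1 * rng.normal(size=41)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)  # values far above 10 signal the collinearity
```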

Page 13: BIOSTATISTICS

Multiple regression & model building

Many variables?

- Step-up (forward)
- Step-down (backward)
- Forward-backward
- Best subset

$$F_{1,\,n-q} = \frac{SSE(q) - SSE(q+1)}{SSE(q+1)/(n-q)}$$
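The F statistic for adding one variable can be written down directly from the formula above. This is only a sketch: the slide does not define q, so the function simply passes it through and treats n − q as the denominator degrees of freedom, exactly as written. The sums of squares in the example are made up.

```python
def partial_f(sse_q, sse_q_plus_1, n, q):
    """F statistic for going from the model with SSE(q) to the one with SSE(q+1).

    Implements F = (SSE(q) - SSE(q+1)) / (SSE(q+1) / (n - q)) literally;
    the meaning of q is taken from the slide and is not defined there.
    """
    if n - q <= 0:
        raise ValueError("n - q must be positive")
    return (sse_q - sse_q_plus_1) / (sse_q_plus_1 / (n - q))

# Hypothetical residual sums of squares before and after adding a variable.
print(partial_f(sse_q=120.0, sse_q_plus_1=100.0, n=41, q=1))
```

In a step-up procedure the candidate variable with the largest F is added as long as that F exceeds the chosen critical value; step-down applies the same statistic to the variable whose removal costs least.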

Page 14: BIOSTATISTICS

Part IV
Sample Size Calculation

Literature:

Machin et al., (1997), ”Sample size tables for clinical studies”, Blackwell, Oxford

Altman (1982), ”How large a sample?” In: Statistics in Practice (Eds. Gore & Altman), Blackwell Publishing Ltd., London

Lehr (1992), ”Sixteen s squared over d squared: a relation for crude sample size estimates”, Stat. in Med., 11:1099-1102

Martin Bland, ”Introduction to medical statistics”, Oxford Univ. Press, 2000, chapter 18.

Page 15: BIOSTATISTICS

Sample Size Calculation

Importance of sample size

A common error is to have a sample that is too small: this gives low power and hence a high risk of a Type II error (no rejection of a false null hypothesis).

Page 16: BIOSTATISTICS

Sample Size Calculation

A little taxonomy of sample size calculations

- Power: chance of rejecting the null hypothesis if it is false
- Significance level: cut-off level of the p-value below which we reject the null hypothesis
- Variability: e.g., standard deviation for numerical data
- Smallest effect of interest: magnitude of the effect that we want to be able to detect as being statistically significant

Page 17: BIOSTATISTICS

Sample Size Calculations

Sample size calculations are important for:
- Estimation: effect on confidence intervals
  - Examples: estimation of population mean, estimation of correlation coefficient
- Tests: effect on confidence level and power
  - Example: 1-sample test

Literature: Martin Bland, ”Introduction to medical statistics”, Oxford Univ. Press, 2000, chapter 18.

Page 18: BIOSTATISTICS

Sample Size Calculations

Methods of Sample Size Calculations

- Do the math
- Special tables
- Nomograms
- Simulation
- Computer software

Page 19: BIOSTATISTICS

Sample Size Calculations

Estimation of population mean μ.

Assume sample size = n.

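Estimated mean:

$$M = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Estimated variance:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - M)^2$$

Estimated standard error:

$$SE(M) = S/\sqrt{n}$$

Confidence interval:

$$\left[\, M - z_{\alpha/2}\,SE(M),\; M + z_{\alpha/2}\,SE(M) \,\right]$$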

Page 20: BIOSTATISTICS

Sample Size Calculations

Estimation of population mean μ.

Width of the confidence interval:

$$\text{Width} = 2\,z_{\alpha/2}\,\frac{S}{\sqrt{n}}$$

For a desired width, $W_d$, of the CI we can thus calculate n:

$$n = \left(\frac{2\,z_{\alpha/2}\,S}{W_d}\right)^2$$

Thus, n depends on
• confidence level,
• desired width of the confidence interval,
• variance,
• distribution of the data.
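The relation between the desired CI width and n can be sketched with the Python standard library alone. The function name and the example values (S = 10, desired width 4) are assumptions chosen for illustration, and the result is rounded up to the next whole subject.

```python
import math
from statistics import NormalDist

def n_for_ci_width(sd, width, confidence=0.95):
    """Smallest n for which the CI of the mean is no wider than `width`.

    Uses n = (2 * z_{alpha/2} * S / W_d)^2 with a normal quantile.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((2 * z * sd / width) ** 2)

# Hypothetical values: S = 10 and a desired total CI width of 4.
print(n_for_ci_width(sd=10, width=4))
# Halving the desired width roughly quadruples the required sample size.
print(n_for_ci_width(sd=10, width=2))
```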

Page 21: BIOSTATISTICS

Sample Size Calculations

Estimation of correlation coefficient ρ.

Assume sample size = n

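Estimated correlation = r (has a very nasty distribution)

Fisher’s z transformation:

$$z = \frac{1}{2}\ln\frac{1+r}{1-r}$$

has a normal distribution with

Mean:

$$E(z) = \frac{1}{2}\ln\frac{1+\rho}{1-\rho} + \frac{\rho}{2(n-1)} \approx \frac{1}{2}\ln\frac{1+\rho}{1-\rho}$$

SE:

$$SE(z) = \frac{1}{\sqrt{n-3}}$$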

Page 22: BIOSTATISTICS

Sample Size Calculations

Estimation of correlation coefficient ρ.

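Confidence interval:

$$\left[\, z - \frac{z_{\alpha/2}}{\sqrt{n-3}},\; z + \frac{z_{\alpha/2}}{\sqrt{n-3}} \,\right]$$

Example: expected r = 0.5; desired 95% CI = [0.4, 0.6]

z0.4 = 0.424; z0.5 = 0.549; z0.6 = 0.693

z0.6 − z0.5 = 0.144; z0.5 − z0.4 = 0.126

$$\frac{1.96}{\sqrt{n-3}} = 0.126 \;\Rightarrow\; n = 246$$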
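The example can be reproduced with a short sketch using only the standard library (`math.atanh` is Fisher’s z transformation). The function name is hypothetical. Because the interval is asymmetric on the z scale, the smaller of the two half-widths is used, as in the example; the sketch works with unrounded z values, which is why it lands slightly above 246.

```python
import math
from statistics import NormalDist

def n_for_correlation_ci(r, lower, upper, confidence=0.95):
    """Sample size so that the CI for rho around r fits inside [lower, upper].

    Works on Fisher's z scale, where SE(z) = 1 / sqrt(n - 3).
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = min(math.atanh(upper) - math.atanh(r),
                     math.atanh(r) - math.atanh(lower))
    # z_crit / sqrt(n - 3) = half_width  =>  n = 3 + (z_crit / half_width)^2
    return 3 + (z_crit / half_width) ** 2

n = n_for_correlation_ci(0.5, 0.4, 0.6)
print(n)  # between 246 and 247; round up in practice
```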

Page 23: BIOSTATISTICS

Sample Size Calculations

Paired-sample test.

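Test statistic:

$$z = \frac{\bar{d}}{se(\bar{d})}$$

[Figure: distribution of the test statistic under H0 (centred at 0) and under the alternative (centred at μd/se(d)); the point −zβ + μd/se(d) and the area β are marked.]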

Page 24: BIOSTATISTICS

Sample Size Calculations

Altman’s nomogram
Altman (1982), ”How large a sample?”, in Statistics in practice, eds. Gore & Altman, BMA London.

Example: difference of capillary density (per mm²) in the feet of ulcerated patients (better foot minus worse foot):
- Min. diff. to be detected = 4 mm⁻²
- SD(difference) = 6.1
- Standardized difference = 2 × (4/6.1) = 1.31
- Required power = 0.80
- Significance level = 0.05

Page 25: BIOSTATISTICS

Sample Size Calculations

Using the formula:

$$n = \frac{(z_\alpha + z_\beta)^2\,VAR(d)}{\mu_d^2}$$

zα = 1.96 (α = 0.05)

zβ = 0.84 (Power = 80%)

Min. μd = 4.0

VAR(d) = 6.1² = 37.21

n ≈ 18
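A minimal sketch of this calculation, using the normal approximation n = (zα + zβ)² · VAR(d) / μd² with a two-sided significance level; the function name is hypothetical and the inputs are the capillary-density values from the nomogram example.

```python
from statistics import NormalDist

def n_paired(min_diff, sd_diff, alpha=0.05, power=0.80):
    """Number of pairs needed to detect a mean difference `min_diff`.

    Normal approximation: n = (z_alpha + z_beta)^2 * SD^2 / min_diff^2,
    with z_alpha the two-sided critical value.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    return (z_alpha + z_beta) ** 2 * sd_diff ** 2 / min_diff ** 2

n_pairs = n_paired(min_diff=4.0, sd_diff=6.1)
print(n_pairs)  # a little above 18
```

Being a normal approximation, the result is a lower bound for a t-test based analysis; rounding up rather than down is the cautious choice.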

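[Figure: distribution of the test statistic under H0 (centred at 0) and under the alternative (centred at μd/se(d)); the point −zβ + μd/se(d) and the area β are marked.]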

Page 26: BIOSTATISTICS

Part V
Measuring agreement

Literature:

Bland, Altman, (1999), ”Measuring agreement in method comparison studies”, Stat Meth Med Res, 8:135-160

Landis, Koch, (1977), ”The measurement of observer agreement for categorical data”, Biometrics, 33:159-174

Page 27: BIOSTATISTICS

Measuring agreement

Methods used in the literature:

Data             Method
Ordinal          Cohen’s kappa
                 Spearman’s rank-order correlation coefficient
                 Kendall’s tau
                 Kendall’s coefficient of concordance
Interval/ratio   Pearson’s correlation coefficient
                 Intraclass correlation coefficient
                 Tukey’s mean-difference plot (Bland-Altman plot)

Page 28: BIOSTATISTICS

Measuring agreement

Page 29: BIOSTATISTICS

Measuring agreement

Cohen’s kappa (Ordinal data)

Doctor 1

Doctor 2

             Schizo-   Bipolar   Other   Row sum
Schizo-         31        4        2        37
Bipolar          6       29        8        43
Other           10        7        3        20
Column sum      47       40       13       100

agreement rate = 0.63
κ = 0.41
σκ = 0.077
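The agreement rate and κ can be recomputed from the cell counts of the table. In this sketch the function name is hypothetical, and the standard error uses one common large-sample approximation, sqrt(po(1 − po) / (n(1 − pe)²)); the slides do not say which formula was used, but this one gives a value close to the 0.077 reported.

```python
import math

def cohen_kappa(table):
    """Observed agreement, Cohen's kappa and an approximate SE for a square table."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    p_obs = sum(table[i][i] for i in range(len(table))) / n        # diagonal share
    p_exp = sum(r * c for r, c in zip(row_sums, col_sums)) / n**2  # chance agreement
    kappa = (p_obs - p_exp) / (1 - p_exp)
    se = math.sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))   # approximation
    return p_obs, kappa, se

# Cell counts from the two-doctor diagnosis table.
diagnoses = [
    [31, 4, 2],
    [6, 29, 8],
    [10, 7, 3],
]
p_obs, kappa, se = cohen_kappa(diagnoses)
print(p_obs, round(kappa, 2), round(se, 3))
```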

Page 30: BIOSTATISTICS

Measuring agreement

More than two judges (Ordinal data)
For example: Kendall’s coefficient of concordance (related to Friedman’s two-way ANOVA on ranks)

MGP 2009 - Song

District    1     2     3     4     5    Totals
NJutl       1     2     3     5     4
MJutl       1     2     4     3     5
SJutl       1     2     3     5     4
Sjæll       1     2     3     4     5
Cophn       1     2     4     3     5
Sum         5    10    17    20    23    T=75
Sumsq      25   100   289   400   529    U=1343

Page 31: BIOSTATISTICS

Measuring agreement

Kendall’s coefficient of concordance, W

m = number of raters
n = number of classes

W = (U − T²/n) / (m²(n³ − n)/12) = 218 / 250 = 0.872
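The value W = 218 / 250 can be reproduced from the rank table with the usual formula W = 12·S / (m²(n³ − n)), where S = U − T²/n is the sum of squared deviations of the rank sums from their mean. The sketch below assumes there are no tied ranks (no tie correction is applied); the function name is hypothetical.

```python
def kendalls_w(ranks):
    """Kendall's coefficient of concordance for a raters-by-items table of ranks.

    Assumes every rater ranks all n items without ties.
    """
    m = len(ranks)                               # number of raters
    n = len(ranks[0])                            # number of ranked items
    totals = [sum(col) for col in zip(*ranks)]   # rank sum per item
    t = sum(totals)                              # T in the table
    u = sum(x * x for x in totals)               # U in the table
    s = u - t * t / n                            # sum of squared deviations
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Ranks given by the five districts to the five songs.
song_ranks = [
    [1, 2, 3, 5, 4],  # NJutl
    [1, 2, 4, 3, 5],  # MJutl
    [1, 2, 3, 5, 4],  # SJutl
    [1, 2, 3, 4, 5],  # Sjæll
    [1, 2, 4, 3, 5],  # Cophn
]
w = kendalls_w(song_ranks)
print(w)
```

W runs from 0 (no agreement among raters) to 1 (all raters give identical rankings).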

Page 32: BIOSTATISTICS

Measuring agreement

NUMERICAL VARIABLES

Correlation coefficient

Intraclass correlation coefficient

Bland-Altman plot (Tukey plot)

[Figure: scatter plot of automated versus manual measurements, with the identity line.]

Page 33: BIOSTATISTICS

Measuring agreement

Pearson’s product-moment correlation coefficient

Ignores bias and gain!
Only for two raters.
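A small made-up example shows why a high correlation does not imply agreement. Here a hypothetical automated method reads exactly twice the manual value plus 5 (gain 2, bias 5), yet Pearson’s r is 1. All numbers and names are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements: automated = 2 * manual + 5.
manual = [10.0, 12.0, 15.0, 19.0, 24.0]
automated = [2.0 * v + 5.0 for v in manual]

r = pearson_r(manual, automated)
mean_diff = sum(a - m for a, m in zip(automated, manual)) / len(manual)
print(r)          # perfect correlation
print(mean_diff)  # yet the methods differ by 21 units on average
```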

Page 34: BIOSTATISTICS

Measuring agreement

Intraclass correlation coefficient (also for multiple raters) = Between pairs variance / Total variance.

k = number of subjects (or measured objects)
n = number of raters (or methods)

This takes into account the systematic difference!

Page 35: BIOSTATISTICS

Measuring agreement

Bland-Altman plot (Tukey mean-difference plot)

Page 36: BIOSTATISTICS

Measuring agreement

Bias
Proportional error
Heterogeneous variance

Page 37: BIOSTATISTICS

Part VI
Systematic review and meta-analysis

Literature:

Chalmers, Altman, (eds), (1995), ”Systematic reviews”, Br. Med. J. Publ. Group, London

Higgins et al., (2003), ”Measuring inconsistency in meta-analyses”, Br. Med. J., 327:557-560

Cochrane Handbook: at http://www.cochrane.org

Page 38: BIOSTATISTICS

Systematic review and meta-analysis

Systematic review =

Formalized and stringent process of combining the information from all (published and unpublished) studies of the same health condition.

Page 39: BIOSTATISTICS

Systematic review and meta-analysis

Why systematic reviews?

- Reduction of information
- Generalization to a wider population
- Consistency by comparing different studies
- Reliability of recommendations
- Power and precision increase

Page 40: BIOSTATISTICS

Systematic review and meta-analysis

Meta-analysis =

Systematic review with focus on numerical results

To combine results from individual studies to estimate an overall / average effect of interest (example: the relative risk of getting cancer because of using mobile phones)

Page 41: BIOSTATISTICS

Systematic review and meta-analysis

Meta-analysis

From a statistical angle, meta-analysis is an application of multifactorial methods:

Multiple studies of the same thing. Combine the results of the studies:
- Treatment / risk factor is one independent factor
- Study is a second independent factor

Page 42: BIOSTATISTICS

Systematic review and meta-analysis

Meta-analysis

Clear definition of the question / effect of interest.
Example:
- Does lowering serum cholesterol reduce risk of dying from coronary artery disease?
- Does a diet to lower serum cholesterol reduce risk of dying from coronary artery disease?

Should a study in which the attempt to lower cholesterol failed be included?

Page 43: BIOSTATISTICS

Systematic review and meta-analysis

Meta-analysis – PUBLICATION BIAS

Simple literature search is not good enough!
- Bias towards positive results (sometimes to negative results)
- More positive results in English literature?
- Unpublished studies are important.

Page 44: BIOSTATISTICS

Systematic review and meta-analysis

Meta-analysis – Example from M. Bland, ch. 17

Page 45: BIOSTATISTICS

Systematic review and meta-analysis

Meta-analysis – Example from M. Bland, ch. 17

Page 46: BIOSTATISTICS

Systematic review and meta-analysis

Meta-analysis – Example from M. Bland, ch. 17

$$\ln(o) = b_0 + b_1 T + b_2 S_1 + \dots + b_5 S_4 + b_6 S_5 + b_7 T S_1 + \dots + b_{11} T S_5$$
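How the twelve coefficients b0 … b11 line up with the data can be sketched by building one row of the design matrix. The coding below is an assumption: with an intercept and five study indicators S1 … S5, it is taken that there are six studies, that study 0 is the reference, and that T is 1 for treated and 0 for control. The function name is hypothetical.

```python
def design_row(treated, study, n_studies=6):
    """Row of the design matrix for ln(o) = b0 + b1*T + sum b*S_j + sum b*T*S_j.

    Assumed coding: T = 1 for treated, 0 for control; studies are numbered
    0 .. n_studies - 1, and study 0 is the reference (all indicators 0).
    """
    t = 1 if treated else 0
    dummies = [1 if study == s else 0 for s in range(1, n_studies)]  # S1..S5
    interactions = [t * d for d in dummies]                          # T*S1..T*S5
    return [1, t] + dummies + interactions

# Treated group in study 3: intercept, T, S3 and T*S3 are switched on.
print(design_row(treated=True, study=3))
# Control group in the reference study: only the intercept is on.
print(design_row(treated=False, study=0))
```

With this coding, b1 is the treatment effect (log odds ratio) in the reference study and b7 … b11 measure how the treatment effect differs in the other studies.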

Page 47: BIOSTATISTICS

Systematic review

Example (Mailis-Gagnon et al., (2004), ”Spinal cord stimulation for chronic pain”, The Cochrane Library, issue 3)

1692 papers: only 2 admitted to the review
Result: further study needed(!)

http://thecochranelibrary.com