Basic Biostatistics Prof Paul Rheeder Division of Clinical Epidemiology

Jan 02, 2016


Lisa Summers

Basic Biostatistics

Prof Paul Rheeder

Division of Clinical Epidemiology


Overview
• Bias vs chance
• Types of data
• Descriptive statistics
• Histograms and boxplots
• Inferential statistics
• Hypothesis testing: P and CI
• Comparing groups
• Correlation and regression


Research Questions?
• Does CK level predict in hospital mortality post MI?
• Is there an association between troponin I and renal function?
• What is the Incidence of amputation in diabetics with renal failure?

HOW ARE THEY MEASURED???


Research question
• Does aspirin reduce CV mortality in diabetics when used for primary prevention?
• Is cell phone use associated with an increased risk of brain cancer?
• Does level of SES correlate with depression?


Research question
• So your research question must be phrased in such a manner that you can answer YES or NO or provide some quantification of sorts.


Data analysis
• Aim: to provide information on the study sample and to answer the research question!


Problems !


Problems
• Bias and confounding, also called systematic error. Typically dealt with in the planning and execution of the study; can also be controlled for in the data analysis (e.g. multivariate analysis).
• Chance, also called random error. Classically, P values (and CIs) are used to judge the role of chance.


First important issues
• What type of data are you collecting?
• Typically one has an outcome variable and one or more exposure variables.
• How and with what are they measured?


Outcome and exposure?

• Does CK level predict in hospital mortality post MI?

• Is there an association between troponin I and renal function?

• What is the Incidence of amputation in diabetics with renal failure?

HOW ARE THEY MEASURED???


Types of data
• Categorical: HT yes or no, sex, smoking status (usually a %)
• Ordinal versus nominal
• Continuous data
• Spread of continuous data


Data analysis
• Descriptive stats

• Mean/median

• SD or range


Hypothesis testing
• Differences between groups:
• Examples:
• T test/Mann Whitney (2 groups)
• ANOVA/Kruskal Wallis (>2 groups)
• Chi square if it is %


• Associations between variables
• Does coffee cause cancer (OR, RR)
• Efficacy of Rx (RRR, ARR, NNT)
• Is BMI associated with BP (correlation and regression)


2 X 2 table

              Cancer    No cancer
Smoke            a          b
Non smoker       c          d

RR = [a/(a+b)] / [c/(c+d)]
OR = (a/b) / (c/d) = ad/bc
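The two formulas can be checked with a short Python sketch. The function names and the counts are hypothetical and only illustrate the arithmetic of the 2 X 2 table.

```python
def risk_ratio(a, b, c, d):
    """Risk in the exposed row divided by risk in the unexposed row."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds in the exposed row divided by odds in the unexposed row."""
    return (a / b) / (c / d)


# Hypothetical counts: 30 of 100 smokers and 10 of 100 non smokers have cancer.
a, b, c, d = 30, 70, 10, 90
rr = risk_ratio(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {odds:.2f}")  # RR = 3.00, OR = 3.86
```

With these counts the OR is larger than the RR, which is typical when the outcome is common.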


TYPES OF DATA


DESCRIPTIVE STATS


Graphics

[Figure: age in years (y-axis, ticks at 40, 50, 60 and 70) for groups 1, 2 and 3]


Using the SD and the Normal Curve


• Mean ± 1.96 SD = range containing about 95% of the sample values (if the data are approximately Normal)

• Mean ± 1.96 SEM = 95% confidence interval for the mean (SEM = SD/√n)
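A minimal Python sketch of the difference between the two ranges. The ages are made up for illustration, and the multiplier 1.96 is taken from the slide; with a sample this small a t multiplier would usually be preferred.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical ages in years (not from the study data).
ages = [48, 52, 55, 57, 60, 61, 63, 66, 70, 58]

n = len(ages)
m = mean(ages)
sd = stdev(ages)      # sample SD (n - 1 in the denominator)
sem = sd / sqrt(n)    # standard error of the mean

ref_range = (m - 1.96 * sd, m + 1.96 * sd)   # describes spread of individuals
ci95 = (m - 1.96 * sem, m + 1.96 * sem)      # describes uncertainty of the mean

print(f"mean = {m}, SD = {sd:.2f}, SEM = {sem:.2f}")
print(f"95% range of sample: {ref_range[0]:.1f} to {ref_range[1]:.1f}")
print(f"95% CI for the mean: {ci95[0]:.1f} to {ci95[1]:.1f}")
```

The confidence interval is always narrower than the sample range because the SEM shrinks as n grows while the SD does not.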


One of many samples


95% Confidence Intervals


Hypothesis Testing


Type I & II Errors Have an Inverse Relationship

If you reduce the probability of one error, the probability of the other increases, provided everything else is unchanged.


Factors Affecting Type II Error

• True value of population parameter
  – β increases when the difference between the hypothesized parameter and its true value decreases

• Significance level
  – β increases when α decreases

• Population standard deviation
  – β increases when σ increases

• Sample size
  – β increases when n decreases
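These four relationships can be illustrated with a small Python sketch. It assumes a one-sided, one-sample z-test with known σ, which is a simplification chosen for illustration; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type2_error(delta, sigma, n, alpha=0.05):
    """Type II error (beta) of a one-sided one-sample z-test.

    delta: true difference from the hypothesized value (assumed > 0)
    sigma: population standard deviation (assumed known)
    n:     sample size
    alpha: significance level
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)            # rejection threshold under H0
    return z.cdf(z_crit - delta * sqrt(n) / sigma)


beta_base = type2_error(delta=5, sigma=10, n=30, alpha=0.05)
print(f"baseline beta      : {beta_base:.3f}")
print(f"smaller difference : {type2_error(2, 10, 30):.3f}")
print(f"smaller alpha      : {type2_error(5, 10, 30, alpha=0.01):.3f}")
print(f"larger sigma       : {type2_error(5, 20, 30):.3f}")
print(f"smaller n          : {type2_error(5, 10, 10):.3f}")
```

Each of the four changes gives a beta larger than the baseline, that is, lower power (power = 1 − β).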


Examples
• Difference in glucose between survivors and non survivors = 5 mmol/l (95% CI -5 to 10 mmol/l)
• RR for cancer = 1.4 (95% CI 0.7 to 1.3)


P value
• The H0 is NO difference
• BUT I can find a difference by chance
• E.g. WHAT is the probability that you find a difference between groups of 5 mmol/l (or more) when in TRUTH the difference is ZERO?
• P=0.10


+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |       0=L E=1
      Y/NR |         0          1 |     Total
-----------+----------------------+----------
         N |        28         20 |        48
           |     53.85      44.44 |     49.48
-----------+----------------------+----------
         Y |        24         25 |        49
           |     46.15      55.56 |     50.52
-----------+----------------------+----------
     Total |        52         45 |        97
           |    100.00     100.00 |    100.00

Pearson chi2(1) = 0.8530   Pr = 0.356
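The chi-square statistic and P value in this output can be reproduced by hand. The sketch below is Python rather than Stata and uses the shortcut formula for a 2 X 2 table without continuity correction; the function name is hypothetical.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Frequencies copied from the table above.
chi2, p = pearson_chi2_2x2(28, 20, 24, 25)
print(f"Pearson chi2(1) = {chi2:.4f}   Pr = {p:.3f}")  # Pearson chi2(1) = 0.8530   Pr = 0.356
```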


Differences between groups


Parametric comparisons


?


T-test?


What about 3 groups?

anova age ethngr, cat(ethngr)

                   Number of obs =      37     R-squared     =  0.0621
                   Root MSE      =  7.7883     Adj R-squared =  0.0069

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  136.560095     2  68.2800477       1.13     0.3362
           |
    ethngr |  136.560095     2  68.2800477       1.13     0.3362
           |
  Residual |  2062.35882    34  60.6576125
-----------+----------------------------------------------------
     Total |  2198.91892    36  61.0810811
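The summary figures in this ANOVA table all follow from the sums of squares and degrees of freedom. A short Python check, using only the numbers printed above (the variable names are my own):

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_model, df_model = 136.560095, 2
ss_resid, df_resid = 2062.35882, 34

ms_model = ss_model / df_model            # mean square between groups
ms_resid = ss_resid / df_resid            # mean square within groups
f_stat = ms_model / ms_resid              # F = MS model / MS residual

ss_total = ss_model + ss_resid
r2 = ss_model / ss_total                  # share of variance explained
adj_r2 = 1 - (1 - r2) * (df_model + df_resid) / df_resid
root_mse = ms_resid ** 0.5

print(f"F = {f_stat:.2f}, R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}, root MSE = {root_mse:.4f}")
# F = 1.13, R2 = 0.0621, adj R2 = 0.0069, root MSE = 7.7883
```

The P value (0.3362) additionally needs the F distribution with 2 and 34 degrees of freedom, which is not computed here.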


Differences between the 3

. regress

      Source |       SS       df       MS              Number of obs =      37
-------------+------------------------------           F(  2,    34) =    1.13
       Model |  136.560095     2  68.2800477           Prob > F      =  0.3362
    Residual |  2062.35882    34  60.6576125           R-squared     =  0.0621
-------------+------------------------------           Adj R-squared =  0.0069
       Total |  2198.91892    36  61.0810811           Root MSE      =  7.7883

------------------------------------------------------------------------------
         age        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
       _cons         56.6   2.462877    22.98   0.000     51.59483    61.60517
      ethngr
           1     4.635294   3.103845     1.49   0.145    -1.672479    10.94307
           2          2.5   3.483034     0.72   0.478    -4.578376    9.578376
           3    (dropped)
------------------------------------------------------------------------------
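One way to read these coefficients, sketched in Python: the dropped group (ethngr 3) is the reference, so _cons is its mean age and each other coefficient is that group's difference from the reference. The variable names are illustrative.

```python
# Coefficients copied from the regression output above.
cons = 56.6
coef = {1: 4.635294, 2: 2.5, 3: 0.0}   # group 3 was dropped, so it is the reference

# Mean age per group = constant + that group's coefficient.
group_means = {g: cons + b for g, b in coef.items()}

for g, m in group_means.items():
    print(f"ethngr {g}: mean age = {m:.2f}")
# ethngr 1: mean age = 61.24
# ethngr 2: mean age = 59.10
# ethngr 3: mean age = 56.60
```

The t tests on the coefficients therefore compare groups 1 and 2 with group 3, not with each other.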


Repeated measures
• One group of schoolkids
• Muscle strength in January
• Muscle strength again in March
• Did things change significantly over time?
• Paired t-test
• Two or more groups: RM ANOVA
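A minimal sketch of the paired t-test in Python. The strength values are invented for illustration; the point is that the test works on the within-child differences, not on the two columns separately.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical muscle strength for the same six children, in the same order.
january = [20, 22, 19, 24, 21, 23]
march = [22, 25, 20, 27, 22, 26]

# One difference per child; the pairing is what makes this a paired test.
diffs = [m - j for j, m in zip(january, march)]
n = len(diffs)

t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))   # mean difference / its standard error
df = n - 1

print(f"mean difference = {mean(diffs):.2f}, t = {t_stat:.2f}, df = {df}")
```

With 5 degrees of freedom the two-sided 5% critical value from t tables is 2.571, so a t statistic beyond that would be called significant at P < 0.05.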


Non-parametric comparisons

• Two groups

ranksum age, by(menopaus)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

    menopaus |      obs    rank sum    expected
-------------+---------------------------------
           0 |       19         210       826.5
           1 |       67        3531      2914.5
-------------+---------------------------------
    combined |       86        3741        3741

unadjusted variance     9229.25
adjustment for ties      -28.04
                     ----------
adjusted variance       9201.21

Ho: age(menopaus==0) = age(menopaus==1)
             z =  -6.427
    Prob > |z| =   0.0000
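The expected rank sum, the unadjusted variance and z in this output follow from the group sizes and the observed rank sum. A Python sketch using the printed numbers (the tie-adjusted variance is copied from the output rather than recomputed, since the raw data are not available):

```python
from math import sqrt

# Group sizes and observed rank sum for menopaus == 0, copied from the output.
n0, n1 = 19, 67
rank_sum0 = 210
N = n0 + n1

expected0 = n0 * (N + 1) / 2            # rank sum expected under H0
var_unadj = n0 * n1 * (N + 1) / 12      # variance of the rank sum without ties
var_adj = 9201.21                       # tie-adjusted variance, copied from the output

z = (rank_sum0 - expected0) / sqrt(var_adj)
print(f"expected = {expected0}, unadjusted variance = {var_unadj}, z = {z:.3f}")
# expected = 826.5, unadjusted variance = 9229.25, z = -6.427
```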


Non Parametric
• Three groups

kwallis s_tg, by(ethngr)

Test: Equality of populations (Kruskal-Wallis test)

  +-------------------------+
  | ethngr | Obs | Rank Sum |
  |--------+-----+----------|
  |      1 |  17 |   381.00 |
  |      2 |  10 |   149.50 |
  |      3 |  10 |   172.50 |
  +-------------------------+

chi-squared =     3.350 with 2 d.f.
probability =     0.1873

chi-squared with ties =     3.352 with 2 d.f.
probability =     0.1871
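The uncorrected Kruskal-Wallis statistic can be recomputed from the rank sums in the table. This Python sketch uses the standard H formula; the correction for ties is not reproduced because it needs the raw data.

```python
from math import exp

# ethngr: (number of observations, rank sum), copied from the output above.
groups = {1: (17, 381.0), 2: (10, 149.5), 3: (10, 172.5)}

N = sum(n for n, _ in groups.values())

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
h = 12 / (N * (N + 1)) * sum(r ** 2 / n for n, r in groups.values()) - 3 * (N + 1)

# With 3 groups H is referred to a chi-square with 2 df, whose upper tail is exp(-H / 2).
p = exp(-h / 2)

print(f"chi-squared = {h:.3f} with 2 d.f., probability = {p:.4f}")
# chi-squared = 3.350 with 2 d.f., probability = 0.1873
```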


Summary
• Continuous, non-Normal
  – 2 groups: Mann Whitney
  – 3 groups: Kruskal Wallis

• Continuous, Normal
  – 2 groups: T tests
  – 3 groups: ANOVA


Categorical data


Relationships


Linear Regression


• Here the DEPENDENT (logTG) and INDEPENDENT (waist) VARIABLES are both continuous

• The beta coefficient tells you how much logTG increases when waist increases by 1 cm
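A minimal least-squares sketch in Python showing where the beta coefficient comes from. The waist and logTG values are invented, and `fit_line` is a hypothetical helper, not part of any package.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + beta * x (one predictor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sxy / sxx                 # change in y per 1-unit change in x
    intercept = my - beta * mx
    return intercept, beta


# Hypothetical data: waist in cm and log triglycerides.
waist = [80, 85, 90, 95, 100, 105]
log_tg = [0.10, 0.18, 0.22, 0.35, 0.38, 0.47]

intercept, beta = fit_line(waist, log_tg)
print(f"logTG = {intercept:.3f} + {beta:.4f} * waist")
print(f"a 1 cm larger waist goes with a {beta:.4f} higher logTG")
```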


What if the INDEPENDENT variable is categorical?

regress age menop

      Source |       SS       df       MS              Number of obs =      86
-------------+------------------------------           F(  1,    84) =  135.01
       Model |  3499.71205     1  3499.71205           Prob > F      =  0.0000
    Residual |  2177.49725    84  25.9225863           R-squared     =  0.6164
-------------+------------------------------           Adj R-squared =  0.6119
       Total |   5677.2093    85  66.7906977           Root MSE      =  5.0914

------------------------------------------------------------------------------
         age |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    menopaus |   15.37628   1.323348    11.62   0.000     12.74465     18.0079
       _cons |   46.57895   1.168053    39.88   0.000     44.25615    48.90175
------------------------------------------------------------------------------

Menop = 0 or 1… INTERPRETATION??
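One possible reading, sketched in Python with the numbers from the output: with a 0/1 predictor the constant is the mean age when menopaus = 0, and the coefficient is the difference in mean age between the two groups. The squared t statistic also matches the overall F, as it does for any regression with a single predictor.

```python
# Coefficients and standard error copied from the regression output above.
cons = 46.57895
coef, se = 15.37628, 1.323348

mean_age_menop0 = cons            # menopaus == 0
mean_age_menop1 = cons + coef     # menopaus == 1

t = coef / se                     # same test as a two-sample t test with equal variances

print(f"mean age, menopaus = 0: {mean_age_menop0:.2f}")   # 46.58
print(f"mean age, menopaus = 1: {mean_age_menop1:.2f}")   # 61.96
print(f"t = {t:.2f}, t squared = {t ** 2:.1f}")           # t = 11.62, t squared = 135.0
```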


Logistic regression
• Outcome is heart disease (Yes/No… ?)
• Independent var = age

. logistic CVD age

Logistic regression                               Number of obs   =         48
                                                  LR chi2(1)      =       2.51
                                                  Prob > chi2     =     0.1133
Log likelihood = -29.945379                       Pseudo R2       =     0.0402

------------------------------------------------------------------------------
        died | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.093467    .064069     1.52   0.127     .9748363    1.226535
------------------------------------------------------------------------------

?
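A sketch of how the z statistic and confidence interval relate to the odds ratio, in Python. It assumes the usual approach of working on the log-odds scale, with the standard error of the log odds ratio obtained as SE(OR)/OR; 1.96 is used as the Normal multiplier.

```python
from math import exp, log

# Odds ratio per year of age and its standard error, copied from the output above.
odds_ratio, se_or = 1.093467, 0.064069

log_or = log(odds_ratio)          # coefficient on the log-odds scale
se_log = se_or / odds_ratio       # delta-method standard error of the log odds ratio

z = log_or / se_log               # should be close to the z column (1.52)
lo = exp(log_or - 1.96 * se_log)
hi = exp(log_or + 1.96 * se_log)

print(f"z = {z:.3f}")
print(f"95% CI {lo:.3f} to {hi:.3f}")  # 95% CI 0.975 to 1.227
```

Each extra year of age multiplies the odds of the outcome by about 1.09, but the interval includes 1, which agrees with P = 0.127.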
