
Correlation and Linear Regression Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.

Dec 13, 2015

Page 1

Correlation and Linear Regression

Peter T. Donnan

Professor of Epidemiology and Biostatistics

Statistics for Health Research

Page 2

CONTENTS

• Correlation coefficients
  • meaning
  • values
  • role
  • significance

• Regression
  • line of best fit
  • prediction
  • significance

Page 3

INTRODUCTION

• Correlation
  • the strength of the linear relationship between two variables

• Regression analysis
  • determines the nature of the relationship

• For example: is there a relationship between the number of units of alcohol consumed and the likelihood of developing cirrhosis of the liver?

Page 4

PEARSON’S COEFFICIENT OF CORRELATION (r)

• Measures the strength of the linear relationship between one dependent and one independent variable
  • curvilinear relationships need other techniques

• Values lie between +1 and -1
  • perfect positive correlation: r = +1
  • perfect negative correlation: r = -1
  • no linear relationship: r = 0
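
The three reference values of r can be reproduced with a few lines of code. This is a minimal sketch using the standard product-moment formula; the function name `pearson_r` and the data are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                        # hypothetical data
r_pos = pearson_r(x, [2, 4, 6, 8, 10])     # y rises exactly in line with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])     # y falls exactly in line with x
r_zero = pearson_r(x, [4, 1, 0, 1, 4])     # y = (x - 3)^2: related, but not linearly

print(r_pos, r_neg, r_zero)
```

The last case is a curvilinear relationship: y is completely determined by x, yet r is zero because there is no linear trend.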

Page 5

PEARSON’S COEFFICIENT OF CORRELATION

[Figure: example plots for r = +1, r = -1, r = 0.6 and r = 0]

Page 6

SCATTER PLOT

[Figure: scatter plot of BMD (dependent variable, the one we make inferences about) against calcium intake (independent variable)]

Page 7

NON-NORMAL DATA

Page 8

NORMALISED WITH LOG TRANSFORMATION
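
The effect of a log transformation on right-skewed data can be sketched as follows. The values are hypothetical, and sample skewness is used here only as a rough indicator of asymmetry; it is not a method taken from the slides.

```python
import math

def skewness(values):
    """Moment-based sample skewness: third central moment / variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 2, 3, 4, 5, 8, 15, 40, 120]   # hypothetical right-skewed data
logged = [math.log(v) for v in raw]        # natural log; requires all values > 0

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

The transformation pulls in the long right tail, so the logged values are much closer to symmetric than the raw ones.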

Page 9

SPSS OUTPUT: SCATTER PLOT

Page 10

SPSS OUTPUT: CORRELATIONS

Page 11

Interpreting correlation

A large or significant r does not necessarily imply:

• strong correlation
  • the statistical significance of r, not its size, tends to increase with sample size

• cause and effect
  • e.g. a strong correlation between the number of televisions sold and the number of cases of paranoid schizophrenia
  • this does not mean that watching TV causes paranoid schizophrenia
  • may be due to an indirect relationship
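
The sample-size point can be checked numerically. This is a minimal sketch, assuming the usual t test of zero correlation, t = r * sqrt((n - 2) / (1 - r^2)); the value r = 0.2 and the sample sizes are hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = 0.2  # hypothetical weak correlation, held fixed
for n in (10, 100, 1000):
    # The same r gives a larger test statistic as the sample grows
    print(n, round(t_statistic(r, n), 2))
```

The same weak correlation is nowhere near significant in a small sample and highly significant in a large one, so significance alone says little about strength.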

Page 12

Interpreting correlation

Variation in dependent variable due to:
• relationship with independent variable: r²
• random noise: 1 - r²

r² is the Coefficient of Determination or Variation explained

e.g. r = 0.661, r² = 0.661² ≈ 0.44
• less than half of the variation (44%) in the dependent variable is due to the independent variable
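
The arithmetic in this example is easy to verify. A short sketch, using the r = 0.661 quoted above:

```python
r = 0.661                      # correlation from the example
r_squared = r ** 2             # coefficient of determination
unexplained = 1 - r_squared    # share attributed to random noise

print(f"r^2 = {r_squared:.2f}, 1 - r^2 = {unexplained:.2f}")
# prints: r^2 = 0.44, 1 - r^2 = 0.56
```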

Page 14

Agreement

Correlation should never be used to determine the level of agreement between repeated measures:
• measuring devices
• users
• techniques

It measures the degree of linear relationship

You can have high correlation with poor agreement
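
A small sketch of high correlation with poor agreement. The two devices and their readings are hypothetical: device B always reads 10 units higher than device A.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

device_a = [100, 110, 120, 130, 140]     # hypothetical readings
device_b = [a + 10 for a in device_a]    # same subjects, constant offset of 10

r = pearson_r(device_a, device_b)
mean_difference = sum(b - a for a, b in zip(device_a, device_b)) / len(device_a)

print(f"r = {r:.2f}, mean difference = {mean_difference:.1f}")
```

The correlation is perfect, yet the devices never give the same reading; only the differences between paired measurements reveal that.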

Page 15

Non-parametric correlation

• Make no assumptions about the distribution of the data
• Carried out on ranks
• Spearman’s ρ
  • easy to calculate
• Kendall’s τ
  • has some advantages over ρ
  • distribution has better statistical properties
  • easier to identify concordant / discordant pairs
• Usually both lead to same conclusions
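
Both rank-based coefficients can be sketched in a few lines. This assumes no tied values (ties need average ranks and a corrected formula, which are not handled here); the function names and data are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's coefficient: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # pair ordered the same way in x and y
        elif direction < 0:
            discordant += 1      # pair ordered in opposite ways
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4, 5]      # hypothetical data
y = [2, 1, 4, 3, 5]

rho = spearman_rho(x, y)
tau = kendall_tau(x, y)
print(f"Spearman = {rho:.1f}, Kendall = {tau:.1f}")
```

The two coefficients are on different scales, so their values differ, but here both point to the same positive association.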

Page 16

Role of regression

Shows how one variable changes with another
• By determining the line of best fit
  • Default is linear
  • Curvilinear?

Page 17

Line of best fit

• Simplest case: linear
• Line of best fit between:
  • dependent variable Y: BMD
  • independent variable X: dietary intake of calcium

Y = a + bX

• a: value of Y when X = 0
• b: change in Y when X increases by 1
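
The coefficients a and b can be estimated as sketched below. This assumes the line of best fit is the ordinary least-squares line (the usual choice, though the slides do not name the method); the calcium and BMD values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                # slope: change in Y per unit increase in X
    a = mean_y - b * mean_x      # intercept: value of Y when X = 0
    return a, b

calcium = [400, 600, 800, 1000, 1200]     # hypothetical dietary calcium intake
bmd = [0.80, 0.85, 0.95, 1.00, 1.10]      # hypothetical BMD values

a, b = fit_line(calcium, bmd)
print(f"Y = {a:.2f} + {b:.6f} X")
```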

Page 18

Role of regression

• Used to predict or explore associations
  • the value of the dependent variable when the value of the independent variable(s) is known
  • within the range of the known data
    • extrapolation is risky!
    • e.g. relation between age and bone age

• Does not imply causality
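
One way to respect the warning about extrapolation is to refuse predictions outside the observed range of X. A minimal sketch; the coefficients, the range and the function name `predict` are all hypothetical.

```python
def predict(a, b, x_new, x_min, x_max):
    """Predict Y = a + bX, refusing to extrapolate beyond the observed X range."""
    if not x_min <= x_new <= x_max:
        raise ValueError(
            f"X = {x_new} is outside the observed range [{x_min}, {x_max}]"
        )
    return a + b * x_new

a, b = 0.64, 0.000375        # hypothetical fitted intercept and slope
x_min, x_max = 400, 1200     # hypothetical range of X in the fitted data

inside = predict(a, b, 900, x_min, x_max)    # within the data: allowed

try:
    predict(a, b, 3000, x_min, x_max)        # far beyond the data: rejected
    outside_message = None
except ValueError as err:
    outside_message = str(err)

print(inside, outside_message)
```

Raising an error is a deliberately strict design choice; a warning would also do, as long as the user is told the fitted line has not been checked in that region.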

Page 19

SPSS OUTPUT: REGRESSION

Page 20

Multiple regression

• Later: more than one independent variable
• BMD may be dependent on:
  • age
  • gender
  • calorific intake
  • use of bisphosphonates
  • exercise
  • etc.

Page 21

Summary

• Correlation
  • strength of linear relationship between two variables
  • Pearson’s: parametric
  • Spearman’s / Kendall’s: non-parametric
  • Interpret with care!

• Regression
  • line of best fit
  • prediction
  • Multiple regression
  • logistic