Biostatistics Unit 9 – Regression and Correlation
Page 1: Biostatistics Unit 9 – Regression and Correlation.

Biostatistics

Unit 9 – Regression and Correlation

Page 2: Biostatistics Unit 9 – Regression and Correlation.

Regression and Correlation

Introduction

Regression and correlation analysis studies the relationships between variables.

Page 3: Biostatistics Unit 9 – Regression and Correlation.

This area of statistics was started in the late nineteenth century by Francis Galton (1822-1911), who was also Charles Darwin's cousin.

Page 4: Biostatistics Unit 9 – Regression and Correlation.

Nature of Data

The data are in the form of (x, y) pairs.

Page 5: Biostatistics Unit 9 – Regression and Correlation.

Graphical Representation

A scatter plot (x-y plot) is used to display regression and correlation data. The regression line has the form

y = mx + b

In actual practice, various forms are used such as y = ax + b and y = a + bx.

Page 6: Biostatistics Unit 9 – Regression and Correlation.

General Regression Line

y = α + βx + ε

α is the y-intercept

β is the slope

ε is the error term

Page 7: Biostatistics Unit 9 – Regression and Correlation.

Calculations

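For each point, the vertical distance from the point to the regression line is squared. Adding these squared distances gives the sum of squares. The regression line is the line that makes this sum as small as possible (the least-squares line).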
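The sum-of-squares calculation can be sketched in a few lines of Python. The data values and the two candidate lines below are made up for illustration, and the function name `sum_of_squares` is not from the slides.

```python
def sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared vertical distances from each (x, y) point to the line."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = slope * x + intercept   # height of the line at this x
        total += (y - predicted) ** 2       # squared vertical distance
    return total

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# Two candidate lines: the one with the smaller sum fits these points better
print(sum_of_squares(xs, ys, slope=2.0, intercept=0.0))
print(sum_of_squares(xs, ys, slope=1.0, intercept=1.0))
```

Comparing the two printed values shows why the sum of squares is a useful measure of fit: the line that passes closer to the points gives the smaller total.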

Page 8: Biostatistics Unit 9 – Regression and Correlation.

Regression Analysis

Regression analysis allows the experimenter to predict one value based on the value of another.

Page 9: Biostatistics Unit 9 – Regression and Correlation.

Data

Data are in the form of (x,y) pairs.

Page 10: Biostatistics Unit 9 – Regression and Correlation.

Regression Equation

Page 11: Biostatistics Unit 9 – Regression and Correlation.

Using the regression equation

• Interpolation is used to find values of points between the data points.

• Extrapolation is used to find values of points outside the range of the data.

Check that the calculated values are realistic, especially when extrapolating, because the fitted line may not hold outside the range of the data.
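As a rough sketch of the difference, the function below predicts y from the regression equation y = 4.53x – 1.57 calculated later in this unit and reports whether the prediction is an interpolation or an extrapolation. The data range (x from 1 to 5) is an assumption made for the example, and the name `predict` is hypothetical.

```python
SLOPE = 4.53       # from the regression equation y = 4.53x - 1.57
INTERCEPT = -1.57

def predict(x, x_min, x_max):
    """Return the predicted y and whether x lies inside the observed x range."""
    y = SLOPE * x + INTERCEPT
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Assumed range of the observed x values (not taken from the slides)
X_MIN, X_MAX = 1.0, 5.0

for x in (2.0, 10.0):
    y, kind = predict(x, X_MIN, X_MAX)
    print(f"x = {x}: predicted y = {y:.2f} ({kind})")
```

A prediction flagged as an extrapolation deserves an extra check for realism before it is used.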

Page 12: Biostatistics Unit 9 – Regression and Correlation.

Significance of regression analysis

It is possible to perform the linear regression t test. In this test:

β is the population regression coefficient

ρ is the population correlation coefficient

Page 13: Biostatistics Unit 9 – Regression and Correlation.

Hypotheses

H0: β = 0 and ρ = 0

HA: β ≠ 0 and ρ ≠ 0
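For simple linear regression, the test statistic for these hypotheses can be written in terms of the sample correlation coefficient r and the number of data pairs n: t = r·sqrt(n – 2) / sqrt(1 – r²), with n – 2 degrees of freedom. The sketch below assumes that standard formula. The value r = 0.974 is the one reported later in this unit, while n = 10 is an assumed sample size, and the function name is hypothetical.

```python
import math

def regression_t_statistic(r, n):
    """t statistic and degrees of freedom for H0: rho = 0 (equivalently beta = 0)."""
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when r is exactly +1 or -1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# r from this unit; n = 10 is an assumption for illustration
t, df = regression_t_statistic(0.974, 10)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The resulting t is compared with the critical value of the t distribution for n – 2 degrees of freedom, or converted to a p-value, which is what a calculator's linear regression t test reports.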

Page 14: Biostatistics Unit 9 – Regression and Correlation.

Calculations and Results

Calculator setup

Page 15: Biostatistics Unit 9 – Regression and Correlation.

Calculations and Results

Results

Page 16: Biostatistics Unit 9 – Regression and Correlation.

Correlation

Correlation is used to give information about the relationship between x and y. When the regression equation is calculated, the correlation results indicate the nature and strength of the relationship.

Page 17: Biostatistics Unit 9 – Regression and Correlation.

Correlation Coefficient

The correlation coefficient, r, indicates the direction and strength of the linear relationship. Values of r range from -1 to +1. A correlation coefficient of 0 means that there is no linear relationship.
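A minimal sketch of how r can be computed from (x, y) pairs, using the usual Pearson formula r = Sxy / sqrt(Sxx·Syy). The three small data sets are invented to show the extreme cases, and `pearson_r` is a hypothetical name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data sets, for illustration only
print(pearson_r([1, 2, 3], [2, 4, 6]))    # points on a rising line
print(pearson_r([1, 2, 3], [6, 4, 2]))    # points on a falling line
print(pearson_r([-1, 0, 1], [1, 0, 1]))   # V-shaped pattern, no linear trend
```

The third data set has r = 0 even though y clearly depends on x, which is why r = 0 only rules out a linear relationship.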

Page 18: Biostatistics Unit 9 – Regression and Correlation.

Correlation Coefficient

Perfect negative correlation, r = -1.

Page 19: Biostatistics Unit 9 – Regression and Correlation.

Correlation Coefficient

No correlation, r = 0.

Page 20: Biostatistics Unit 9 – Regression and Correlation.

Correlation Coefficient

Perfect positive correlation, r = +1.

Page 21: Biostatistics Unit 9 – Regression and Correlation.

Coefficient of Determination

The coefficient of determination is r². It has values between 0 and 1. The value of r² indicates the proportion of the variation in y that is explained by the linear relationship with x.
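One way to see this interpretation is to compute the coefficient of determination as 1 – SSE/SST, where SSE is the sum of squared distances from the points to the least-squares line and SST is the sum of squared distances from the points to the mean of y. For a straight-line fit this equals the square of r. The sketch below uses made-up data, and the function name is an assumption.

```python
def coefficient_of_determination(xs, ys):
    """Return 1 - SSE/SST for the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after fitting the line
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
print(f"coefficient of determination = {coefficient_of_determination(xs, ys):.3f}")
```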

Page 22: Biostatistics Unit 9 – Regression and Correlation.

Graphs

Scatter plot

Page 23: Biostatistics Unit 9 – Regression and Correlation.

Graphs

Scatter plot with regression line

Page 24: Biostatistics Unit 9 – Regression and Correlation.

Data for calculations

Page 25: Biostatistics Unit 9 – Regression and Correlation.

Calculations

Calculate the regression equation

Page 26: Biostatistics Unit 9 – Regression and Correlation.

Calculations

Calculate the regression equation

Page 27: Biostatistics Unit 9 – Regression and Correlation.

Calculations

Calculate the regression equation

y = 4.53x – 1.57
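An equation of this kind is normally obtained by least squares: slope = Sxy / Sxx and intercept = ȳ – slope·x̄. The sketch below applies those standard formulas to a small made-up data set, not to the data from these slides, so it will not reproduce 4.53 and –1.57. The function name `least_squares_fit` is hypothetical.

```python
def least_squares_fit(xs, ys):
    """Slope and intercept of the least-squares line through (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope, intercept = least_squares_fit(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```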

Page 28: Biostatistics Unit 9 – Regression and Correlation.

Calculations

Calculate the correlation coefficient

Page 29: Biostatistics Unit 9 – Regression and Correlation.

Coefficient of Determination

The coefficient of determination is r². It indicates the proportion of the variation in y that is explained by the linear relationship with x. With r = .974, the coefficient of determination r² = .948. This means that about 95% of the variation in y is explained by the temperature.

Page 30: Biostatistics Unit 9 – Regression and Correlation.

Residuals

The vertical distance that each point lies above or below the line is called a residual. With a good fit, the residuals are randomly scattered around zero. If the residual plot shows a pattern instead, another factor or effect (for example, a nonlinear relationship) may be involved and needs attention.
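Residuals are simple to compute once a line is available: residual = observed y – predicted y. The sketch below uses made-up data together with the least-squares line for that data (y = 1.9x), and both are assumptions for illustration only.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for each data pair."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical data and its least-squares line y = 1.9x + 0
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
res = residuals(xs, ys, slope=1.9, intercept=0.0)

for x, r in zip(xs, res):
    # Positive: point is above the line; negative: point is below it
    print(f"x = {x}: residual = {r:+.2f}")
```

Plotting these values against x gives the residual plot. For a least-squares line the residuals sum to zero, so what matters is whether they show a pattern rather than their total.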

Page 31: Biostatistics Unit 9 – Regression and Correlation.

Calculate the residual variance

Page 32: Biostatistics Unit 9 – Regression and Correlation.

Calculate the residual variance
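A minimal sketch of the calculation, assuming the usual definition for simple linear regression: the residual variance is the sum of squared residuals divided by n – 2, because two parameters (slope and intercept) are estimated from the data. The data and the function name are made up for illustration.

```python
def residual_variance(xs, ys):
    """Sum of squared residuals about the least-squares line, divided by n - 2."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sse / (n - 2)

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
print(f"residual variance = {residual_variance(xs, ys):.3f}")
```

The square root of this value is the standard error of the estimate, which feeds into the linear regression t test.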

Page 33: Biostatistics Unit 9 – Regression and Correlation.

Results of linear regression t test

Page 34: Biostatistics Unit 9 – Regression and Correlation.

Results of linear regression t test

Page 35: Biostatistics Unit 9 – Regression and Correlation.

fin