Biostatistics Unit 9 Regression and Correlation 1
Jan 03, 2016

stone-phillips

Page 1: Biostatistics

Biostatistics

Unit 9

Regression and Correlation

Page 2: Biostatistics

Regression and Correlation

• Regression and correlation analysis studies the relationships between variables.

• This area of statistics was started in the 1860s by Francis Galton (1822-1911), who was also Darwin’s cousin.

Page 3: Biostatistics

Data for Regression and Correlation

• Data are in the form of (x,y) pairs.

• A scatter plot (x-y plot) is used to display regression and correlation data.

• The regression line has the form

y = mx + b

• In practice, two forms are used: y = ax + b, where a is the slope, and y = a + bx, where b is the slope.

Page 4: Biostatistics

General Regression Line

y = α + βx + ε

α is the y-intercept

β is the slope

ε is the error term

Page 5: Biostatistics

Calculations

• For each (x,y) point, the vertical distance from the point to the regression line is squared.

• Adding these squared distances gives the sum of squares. The regression line is the line that makes this sum as small as possible.

• Regression analysis allows the experimenter to predict one value based on the value of another.

• A similar procedure is used in biochemistry with standard curves.
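
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the TI-83 procedure itself: the data points are hypothetical (the slides do not list theirs), and the function names `fit_line` and `sum_of_squares` are invented for this example.

```python
# Hypothetical (x, y) pairs, invented for illustration; not the slides' data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

slope, intercept = fit_line(xs, ys)
sse = sum_of_squares(xs, ys, slope, intercept)
print(f"y = {slope:.3f}x + {intercept:.3f}, sum of squares = {sse:.4f}")
```

Nudging the slope or intercept away from the fitted values and calling `sum_of_squares` again should always give a larger sum, which is what "least squares" means.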

Page 6: Biostatistics

Data

Data are in the form of (x,y) pairs. List L1 contains the x values and List L2 contains the y values.

Page 7: Biostatistics

Calculation of regression equation using TI-83

• The Linear Regression test is used.

• Conclusion: The equation of the regression line is y = 4.54x – 1.57

Page 8: Biostatistics

Using the regression equation

• Interpolation is used to find values of points between the data points. This is a relatively safe and accurate process.

• Extrapolation is used to find values of points outside the range of the data. This process is riskier, especially the farther you go beyond the ends of the data range.

Be careful to make sure that the calculations give realistic results.
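
As a rough sketch of how the distinction might be enforced in practice, the following Python snippet uses the regression equation from the slides and labels each prediction as interpolation or extrapolation. The bounds `X_MIN` and `X_MAX` are assumptions: the slides do not give the range of the data, so placeholder values are used here.

```python
# Regression equation reported in the slides: y = 4.54x - 1.57
SLOPE = 4.54
INTERCEPT = -1.57

# ASSUMPTION: the slides do not list the data range, so these bounds are
# hypothetical placeholders for the smallest and largest observed x.
X_MIN = 0.0
X_MAX = 30.0

def predict(x):
    """Return (predicted y, kind), where kind is 'interpolation' or 'extrapolation'."""
    kind = "interpolation" if X_MIN <= x <= X_MAX else "extrapolation"
    return SLOPE * x + INTERCEPT, kind

for x in (10.0, 45.0):
    y_hat, kind = predict(x)
    print(f"x = {x}: predicted y = {y_hat:.2f} ({kind})")
```

Flagging extrapolated values, rather than silently returning them, makes it easier to check whether a prediction is realistic.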

Page 9: Biostatistics

Significance of regression analysis

It is possible to perform the linear regression t test to give a probability. In this test:

β is the population regression coefficient

ρ is the population correlation coefficient

The hypotheses are:

H0: β and ρ = 0

HA: β and ρ ≠ 0
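
One reason a single test can cover both the regression coefficient and the correlation coefficient is that, in simple linear regression, the t statistic computed from the slope equals the one computed from r. The sketch below checks this numerically on hypothetical data (not the slides' data) using only the standard library. Converting t to a p-value additionally requires the t distribution with n - 2 degrees of freedom, which is not computed here.

```python
import math

# Hypothetical (x, y) pairs, invented for illustration; not the slides' data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 4.5, 9.0, 8.5, 14.0, 15.5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

slope = sxy / sxx                    # sample estimate of the regression coefficient
intercept = y_bar - slope * x_bar
r = sxy / math.sqrt(sxx * syy)       # sample estimate of the correlation coefficient

# Residual standard deviation with n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))

# t statistic computed from the slope and its standard error.
t_from_slope = slope / (s / math.sqrt(sxx))

# t statistic computed from the correlation coefficient.
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"t from slope = {t_from_slope:.4f}, t from r = {t_from_r:.4f}, df = {n - 2}")
```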

Page 10: Biostatistics

Calculations and Results

Calculator setup

Page 11: Biostatistics

Calculations and Results

Results

Conclusion: p < .001 (p = .000206)

Page 12: Biostatistics

Correlation

Correlation is used to give information about the relationship between x and y. When the regression equation is calculated, the correlation results indicate the nature and strength of the relationship.

Page 13: Biostatistics

Correlation Coefficient

The correlation coefficient, r, indicates the direction and strength of the linear relationship. Values of r range from -1 to +1. A correlation coefficient of 0 means that there is no linear relationship.

Page 14: Biostatistics

Correlation Coefficient

Perfect negative correlation, r = -1.

Page 15: Biostatistics

Correlation Coefficient

No correlation, r = 0.

Page 16: Biostatistics

Correlation Coefficient

Perfect positive correlation, r = +1.
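
The three cases (r = -1, r = 0, r = +1) can be reproduced with a short Python sketch. The function name `pearson_r` and all data sets are invented for illustration. The r = 0 example deliberately uses y = x², which shows that a zero correlation rules out a straight-line relationship only, not every kind of relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired lists xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data sets, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(xs, [2 * x + 1 for x in xs])     # points on a rising line
r_neg = pearson_r(xs, [-2 * x + 1 for x in xs])    # points on a falling line

sym_xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_zero = pearson_r(sym_xs, [x ** 2 for x in sym_xs])  # symmetric curve, no linear trend

print(f"rising line: {r_pos:.3f}, falling line: {r_neg:.3f}, parabola: {r_zero:.3f}")
```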

Page 17: Biostatistics

Coefficient of Determination

The coefficient of determination is r². It has values between 0 and 1. The value of r² indicates the proportion of the variation in y that is explained by the linear relationship with x.

Page 18: Biostatistics

Graphs

Scatter plot

Page 19: Biostatistics

Graphs

Scatter plot with regression line

Page 20: Biostatistics

Data for calculations

Page 21: Biostatistics

Calculations

Calculate the regression equation

Page 22: Biostatistics

Calculations

Calculate the regression equation

Result: The regression equation is

y = 4.54x – 1.57

Page 23: Biostatistics

Calculations

Calculate the correlation coefficient

Page 24: Biostatistics

Coefficient of Determination

• The coefficient of determination is r². It indicates the proportion of the variation in y that is explained by the linear relationship with x.

• With r = .974, the coefficient of determination r² = .948.

• This means that about 95% of the variation in y is explained by its linear relationship with the temperature.
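
To make the interpretation concrete, the sketch below computes the coefficient of determination two ways on hypothetical data (not the slides' temperature data): by squaring r, and as 1 minus the ratio of the residual sum of squares to the total sum of squares. For a least-squares line the two should agree, which is why r² can be read as the share of the variation in y accounted for by the line.

```python
import math

# Hypothetical (x, y) pairs, invented for illustration; not the slides' data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 4.5, 9.0, 8.5, 14.0, 15.5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total sum of squares of y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / math.sqrt(sxx * syy)

# Residual sum of squares: variation in y left over after fitting the line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared_from_r = r ** 2
r_squared_from_ss = 1 - sse / syy

print(f"r^2 from r = {r_squared_from_r:.4f}, from sums of squares = {r_squared_from_ss:.4f}")
```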

Page 25: Biostatistics

Residuals

• The vertical distance that each point lies above or below the line is called a residual.

• With a good fit, the residuals will be randomly scattered around zero.

• If the residual plot is not random, then there is another factor or effect involved that needs attention.
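
A minimal sketch of how residuals might be computed, using hypothetical data rather than the slides' data. Each residual is the observed y minus the value the fitted line predicts, so points above the line have positive residuals and points below have negative ones. For a least-squares line that includes an intercept, the residuals sum to zero up to rounding error.

```python
# Hypothetical (x, y) pairs, invented for illustration; not the slides' data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 4.5, 9.0, 8.5, 14.0, 15.5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y (positive above the line).
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.3f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

Plotting `residuals` against `xs` gives the residual plot; a visible curve or funnel shape in that plot is the kind of non-random pattern the slide warns about.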

Page 26: Biostatistics

Calculate the residual variance

Page 27: Biostatistics

Calculate the residual variance

Result: The residual variance is 56.1366. The residual standard deviation is its square root, 7.4924, which is the value the TI-83 gives.
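
The sketch below shows one common way to compute the residual variance: the sum of squared residuals divided by n - 2. The n - 2 divisor is an assumption here, since the slides do not state the formula they use, and the data are hypothetical. The last lines check that the two figures quoted above are consistent with each other.

```python
import math

def residual_variance(xs, ys):
    """Sum of squared residuals about the least-squares line, divided by n - 2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sse / (n - 2)

# Hypothetical (x, y) pairs, invented for illustration; not the slides' data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 4.5, 9.0, 8.5, 14.0, 15.5]

variance = residual_variance(xs, ys)
sd = math.sqrt(variance)
print(f"residual variance = {variance:.4f}, residual SD = {sd:.4f}")

# Consistency check on the values quoted in the slides.
slide_sd = math.sqrt(56.1366)
print(f"square root of 56.1366 = {slide_sd:.4f}")
```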

Page 28: Biostatistics

Results of linear regression t test

Page 29: Biostatistics

Results of linear regression t test

Page 30: Biostatistics

fin
