MATH 2400 Chapter 5 Notes

Feb 23, 2016

hasana

Transcript
Page 1: MATH 2400

MATH 2400 Chapter 5 Notes

Page 2: MATH 2400

Regression Line
Uses data to create a linear equation in the form y = ax + b, where
“a” is the slope of the line (unit rate of change)
“b” is the y-intercept (initial value)

Can be used to generalize a set of data, to estimate a value, or to predict a value.

Page 3: MATH 2400

Example 1 (Exercise 5.1)
We expect a car’s highway gas mileage to be related to its city gas mileage. Data for all 1040 vehicles in the government’s 2010 Fuel Economy Guide give the regression line

HWY MPG = 6.554 + (1.016 × CITY MPG)

for predicting highway mileage from city mileage.
a) What is the slope of this line? Say in words what the numerical value of the slope tells you.
b) What is the intercept? Explain why the value of the intercept is not statistically meaningful.
c) Find the predicted highway mileage for a car that gets 16 mpg in the city. Do the same for a car with city mileage 28 mpg.
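Part (c) is a direct substitution into the regression line, which can be sketched in a few lines of Python. The function name predict_highway_mpg is made up for this illustration; the slope and intercept are the ones given in the example.

```python
def predict_highway_mpg(city_mpg):
    """Regression line from Example 1: HWY MPG = 6.554 + 1.016 * CITY MPG."""
    intercept = 6.554  # predicted highway mpg at 0 city mpg (outside the data)
    slope = 1.016      # change in predicted highway mpg per 1 mpg of city mileage
    return intercept + slope * city_mpg


for city in (16, 28):
    hwy = predict_highway_mpg(city)
    print(f"city {city} mpg -> predicted highway {hwy:.2f} mpg")
```

This gives about 22.81 mpg for the 16 mpg car and about 35.00 mpg for the 28 mpg car. The intercept is the prediction at 0 city mpg, a value no real car has, which is why it is not statistically meaningful.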

Page 4: MATH 2400

Example 2 (Exercise 5.2…sort of)
You use the same bottle of body wash every day. The volume was initially 355 ml. You estimate you use 7 ml of body wash each day. What is the equation of the regression line for predicting the volume of body wash left in the bottle after each day?
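One way to read this example, assuming the 7 ml per day estimate is constant: the initial value 355 ml is the y-intercept and the daily use is a slope of -7 ml per day, so the line is volume = -7 × days + 355. The sketch below uses a hypothetical function name, volume_left, to evaluate that line.

```python
def volume_left(days):
    """Predicted volume (ml) after `days` days: y = ax + b with a = -7, b = 355."""
    slope = -7       # ml used per day (the volume decreases, so the slope is negative)
    intercept = 355  # initial volume in ml
    return slope * days + intercept


for day in (0, 1, 10):
    print(f"day {day}: {volume_left(day)} ml left")
```

The line only makes sense while the bottle is not yet empty; past roughly 50 days it would predict a negative volume.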

Page 5: MATH 2400

Least-Squares Regression Line

ŷ = ax + b

Where a = r(Sy/Sx) and b = ȳ − a·x̄.

Sy represents the standard deviation of the response variable.
Sx represents the standard deviation of the explanatory variable.
r represents the correlation coefficient.
x̄ represents the mean of the explanatory variable.
ȳ represents the mean of the response variable.
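The quantities listed above are enough to compute a least-squares line. The sketch below assumes the standard formulas, slope = r·(Sy/Sx) and intercept = ȳ − slope·x̄, written in the y = ax + b form used earlier in these notes. The function name and the small data set are invented for illustration and are not from the slides.

```python
from statistics import mean, stdev


def least_squares_line(xs, ys):
    """Return (a, b, r) for y-hat = a*x + b, using a = r*Sy/Sx and b = y_bar - a*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    # Correlation: average product of standardized values, dividing by n - 1.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    a = r * s_y / s_x      # slope
    b = y_bar - a * x_bar  # y-intercept
    return a, b, r


# Hypothetical, perfectly linear data: y = 2x + 1
a, b, r = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"slope = {a:.3f}, intercept = {b:.3f}, r = {r:.3f}")
```

The sketch assumes at least two data points and that neither variable is constant, since a standard deviation of zero would cause a division by zero.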

Page 6: MATH 2400

Example 3
This table displays data on 8 U.S. airports and their total number of passengers for the years 1992 and 2005. Use the 1992 data as the explanatory variable and the 2005 data as the response variable. Create a least-squares regression line and use that line to estimate how many passengers Raleigh-Durham International had in 2005 if the airport had 4.9 million passengers in 1992.

Page 7: MATH 2400

r and r²

• r tells us if there is a positive or negative relationship between the explanatory variable and the response variable.
• r also tells us how strong a linear relationship the variables have.
• r² tells us what proportion of the variation in the response variable can be explained by its linear relationship with the explanatory variable.
• 1 – r² tells us what proportion of that variation cannot be explained by the explanatory variable.
Ex: If r = 0.6, r² = 0.36. 36% of the variation in the response variable can be explained by the explanatory variable and 64% cannot be explained.
Ex: If r = -1, r² = 1. 100% of the variation in the response variable can be explained by the explanatory variable and 0% cannot be explained.
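The arithmetic in the two examples above can be checked with a short sketch. The function name explained_fractions is made up for this illustration.

```python
def explained_fractions(r):
    """Return (r^2, 1 - r^2): the explained and unexplained fractions."""
    r_squared = r ** 2
    return r_squared, 1 - r_squared


for r in (0.6, -1):
    explained, unexplained = explained_fractions(r)
    print(f"r = {r}: explained = {explained:.0%}, unexplained = {unexplained:.0%}")
```

Because r is squared, the sign of the relationship drops out: r = -1 and r = 1 both give r² = 1.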

Page 8: MATH 2400

Example 4

Page 9: MATH 2400

Example 5

Page 10: MATH 2400

Residuals
A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is, a residual is the prediction error that remains after we have chosen the regression line:
Residual = observed y – predicted y
= y – ŷ

Page 11: MATH 2400

Residuals…continued
A residual plot makes it easier to see unusual observations and patterns. In a residual plot, the regression line corresponds to the horizontal line at residual = 0 (think about it…).

Page 12: MATH 2400

Residual Graphing
Use the following data to create a least-squares regression line and plot the residuals on the graph provided.

AGE   HEIGHT
0     20
1     31
2     36
3     39
4     43
5     46
6     48
7     51
8     54
9     56
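As a rough check on a hand calculation, the sketch below fits a least-squares line to the age and height data and lists the residuals (observed minus predicted). It computes the slope as Sxy/Sxx, which is algebraically the same as r·(Sy/Sx); the helper name least_squares_line is an assumption for this illustration.

```python
from statistics import mean

ages = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
heights = [20, 31, 36, 39, 43, 46, 48, 51, 54, 56]


def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a*x + b fitted by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx        # slope
    b = y_bar - a * x_bar  # y-intercept
    return a, b


a, b = least_squares_line(ages, heights)
# Residual = observed y - predicted y
residuals = [y - (a * x + b) for x, y in zip(ages, heights)]

print(f"y-hat = {a:.3f}x + {b:.3f}")
for x, res in zip(ages, residuals):
    print(f"age {x}: residual {res:+.2f}")
```

The fitted line comes out to roughly ŷ = 3.576x + 26.309. The residuals are negative at both ends of the age range and positive in between, the kind of curved pattern a residual plot makes easy to spot.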

Page 13: MATH 2400

CAUTION!!!
• Correlation and regression lines describe only linear relationships.
• Correlation and least-squares regression lines are not resistant to influential data (data drastically outside the norm). We should always plot our data and look for observations that might be influential.
• Ecological Correlation is based on averages rather than on individuals.
Ex: There is a large positive correlation between average income and number of years of education. The correlation is smaller if we compare the incomes of individuals with number of years of education. The correlation based on average income ignores the large variation in the incomes of individuals having the same amount of education.
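The caution about influential data can be illustrated with a small sketch. The data below are invented purely for illustration (five points on the line y = 2x, plus one point far from the rest), and the helper name slope_of is hypothetical.

```python
from statistics import mean


def slope_of(xs, ys):
    """Least-squares slope, computed as Sxy / Sxx."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


# Made-up data: five points exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(f"slope without the unusual point: {slope_of(xs, ys):.3f}")

# Add one hypothetical point far outside the pattern: (10, 0)
print(f"slope with the unusual point:    {slope_of(xs + [10], ys + [0]):.3f}")
```

A single point is enough here to change the slope from 2 to a slightly negative value, which is why plotting the data first matters.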

Page 14: MATH 2400

CAUTION!!!
Extrapolation is the use of a regression line for prediction far outside the range of values of the explanatory variable that you used to obtain the line.
Ex: Using the least-squares regression line for the height of a child from ages 0-9 to predict the child’s height at age 30.
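To see why this kind of extrapolation goes wrong, the sketch below refits a least-squares line to the age 0-9 height data from the Residual Graphing slide and evaluates it at age 30. The helper name fit_line is an assumption for this illustration.

```python
from statistics import mean

ages = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
heights = [20, 31, 36, 39, 43, 46, 48, 51, 54, 56]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    a = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return a, y_bar - a * x_bar


a, b = fit_line(ages, heights)
predicted_at_30 = a * 30 + b  # far outside the 0-9 range used to fit the line
print(f"largest observed height: {max(heights)}")
print(f"predicted height at age 30: {predicted_at_30:.1f}")
```

The prediction is about 133.6, in the same units as the table, more than twice the largest observed height of 56. The line has no way of knowing that growth slows and eventually stops.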

Lurking Variables should always be considered before drawing conclusions based on correlation or regression.

Page 15: MATH 2400

Correlation = Causation???
NO!!!

A serious study once found that people with two cars live longer than people who own only one car. Owning three cars is even better, and so on. There is a substantial positive correlation between number of cars x and length of life y.

Lurking variables?

Page 16: MATH 2400

HW 5.17

Page 17: MATH 2400

HW 5.25

Page 18: MATH 2400

HW 5.27

Page 19: MATH 2400

HW 5.29

Page 20: MATH 2400

HW 5.53