The correlation between two scores X and Y equals 0.8. If both the X scores and the Y scores are converted to z-scores, then the correlation between the z-scores for X and the z-scores for Y would be:
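Converting to z-scores subtracts each variable's mean and divides by its standard deviation. Shifting a variable and rescaling it by a positive factor does not change the correlation, so the correlation between the z-scores is still 0.8. The Python sketch below checks this numerically on a small made-up data set. The helpers `correlation` and `z_scores` are illustrative and are not a library API.

```python
from math import sqrt
from statistics import mean, pstdev


def correlation(xs, ys):
    """Pearson r from sums of cross-products and squared deviations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def z_scores(values):
    """Standardize each value: (value - mean) / SD (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data, invented only for this illustration.
x = [2, 4, 5, 7, 9]
y = [3, 5, 4, 8, 9]

r_raw = correlation(x, y)
r_std = correlation(z_scores(x), z_scores(y))
print(f"r on raw scores: {r_raw:.4f}")
print(f"r on z-scores:   {r_std:.4f}")  # same value as the line above
```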
The linear model (line of best fit, least squares line, regression line) is just an equation of a straight line through the data to show us how the values are associated.
Using this line we will be able to predict values. Predicted values are denoted ŷ (called “y-hat”).
The hat tells you they are predicted values. The difference between the observed value and the predicted value is called the residual: residual = y − ŷ.
Some residuals are positive and others are negative, and on average they cancel each other out. To measure how well the line fits the data, we square the residuals (to eliminate the negatives) and then add up the squares.
The smaller this sum, the better the fit. The regression line is the line that makes the sum of squared residuals as small as possible, which is why it is also called the least squares line.
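The least squares line can be computed directly. The slope is b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept is b0 = ȳ − b1·x̄. This Python sketch uses made-up data to fit the line, confirm that the residuals sum to (essentially) zero, and show that nudging the line increases the sum of squared residuals. The function names are illustrative.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1*x minimizing the sum of squared residuals."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Residual = observed y minus predicted y-hat."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_sq(values):
    """Sum of squares, used here for the sum of squared residuals."""
    return sum(v ** 2 for v in values)


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

b0, b1 = least_squares(x, y)
res = residuals(x, y, b0, b1)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")                 # y-hat = 1.80 + 0.80x
print(f"sum of residuals: {sum(res):.6f}")              # essentially 0
print(f"sum of squared residuals: {sum_sq(res):.2f}")   # 2.40

# Any other line fits worse: shift the intercept or tilt the slope.
print(sum_sq(residuals(x, y, b0 + 0.5, b1)) > sum_sq(res))  # True
print(sum_sq(residuals(x, y, b0, b1 + 0.1)) > sum_sq(res))  # True
```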
Example: A scatterplot of house price vs. house size shows a relationship that is straight, with only moderate scatter and no outliers. The correlation between house price and house size is 0.77.
a. You go to an open house and find the house is 1 standard deviation above the mean in size. What would you guess about its price?
b. You read an ad for a house priced 2 standard deviations below the mean. What would you guess about its size?
c. A friend tells you about a house whose size in square meters (he’s European) is 1.5 standard deviations above the mean. What would you guess about its size in square feet?
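In standardized units the regression line is simply ẑ_y = r·z_x. Each prediction is therefore r times as many standard deviations from the mean as the known value. With r = 0.77 this answers a and b. For c, converting square meters to square feet multiplies every size by the same positive constant, which leaves z-scores unchanged. The Python sketch below checks both ideas. It uses hypothetical house sizes and a conversion factor of about 10.7639 square feet per square meter.

```python
from statistics import mean, pstdev

R = 0.77  # correlation between house price and house size (from the example)


def predicted_z(z_known, r=R):
    """Predicted z-score of the other variable: z-hat = r * z_known."""
    return r * z_known


# a. Size is 1 SD above the mean -> predicted price
print(round(predicted_z(1), 2))    # 0.77 SD above the mean price
# b. Price is 2 SD below the mean -> predicted size (same r works in either direction)
print(round(predicted_z(-2), 2))   # -1.54, i.e. 1.54 SD below the mean size


def z_scores(values):
    """Standardize each value: (value - mean) / SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# c. Unit conversion does not change z-scores.
SQFT_PER_SQM = 10.7639
sizes_sqm = [85, 110, 140, 175, 230]            # hypothetical sizes
sizes_sqft = [s * SQFT_PER_SQM for s in sizes_sqm]
same = all(abs(a - b) < 1e-9
           for a, b in zip(z_scores(sizes_sqm), z_scores(sizes_sqft)))
print(same)  # True: 1.5 SD above the mean in m^2 is 1.5 SD above in ft^2
```

Both predicted values in a and b are closer to the mean than the values they come from. This is regression toward the mean.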
First make sure a regression is appropriate. Since regression and correlation are closely related, we need to check the same conditions for regression as we did for correlation:
Quantitative Variables Condition (both variables must be quantitative)
Straight Enough Condition (look at the scatterplot)
Outlier Condition (look at the scatterplot)
Example: Using the relationship between house price (in thousands of dollars) and house size (in thousands of square feet), the regression model is:
a. What is the slope and what does it mean?
b. What are the units of the slope?
c. Your house is 2000 square feet bigger than your neighbor’s house. How much more do you expect it to be worth?
d. Is the y-intercept of -3.117 meaningful? Explain.
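Size is measured in thousands of square feet, so a 2000-square-foot difference is 2 units of x. The predicted price difference is therefore the slope times 2, in thousands of dollars, and the slope's units are thousands of dollars per thousand square feet. The slope itself is not given in this excerpt. The Python sketch below therefore takes it as a parameter and uses a made-up placeholder value of 90.0. Only the intercept −3.117 comes from the text.

```python
B0 = -3.117  # y-intercept from the example, in thousands of dollars


def predicted_price(size, b1, b0=B0):
    """Predicted price (thousands of $) for a size in thousands of sq ft."""
    return b0 + b1 * size


B1_PLACEHOLDER = 90.0  # hypothetical slope, NOT the value from the text

gap = 2000 / 1000  # 2000 sq ft expressed in thousands of sq ft
# The difference in predictions is b1 * gap, whatever the two sizes are.
diff = predicted_price(3.5, B1_PLACEHOLDER) - predicted_price(1.5, B1_PLACEHOLDER)
print(round(diff, 3), B1_PLACEHOLDER * gap)  # 180.0 180.0 (thousands of $)

# The intercept is the prediction for a house of size 0. That is a negative
# price, and no house has size 0, so it only anchors the line.
print(predicted_price(0, B1_PLACEHOLDER))  # -3.117
```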
Hurricane Katrina had a central pressure measured at 920 millibars. What does our regression model predict for her maximum wind speed? How good is that prediction, given that Katrina’s actual wind speed was measured at 110 knots?
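To judge the Katrina prediction, compute a residual: observed minus predicted. The regression coefficients for wind speed on central pressure are not included in this excerpt, so the Python sketch below leaves them as parameters. The values 820.0 and −0.75 are placeholders only, chosen to show the arithmetic.

```python
def predict(x, b0, b1):
    """Predicted y-hat = b0 + b1 * x."""
    return b0 + b1 * x


def residual(observed, predicted):
    """Residual = observed - predicted."""
    return observed - predicted


PRESSURE = 920   # Katrina's central pressure, millibars (from the text)
ACTUAL = 110     # Katrina's measured maximum wind speed, knots (from the text)

# Placeholder coefficients, NOT the model from the text.
b0, b1 = 820.0, -0.75

y_hat = predict(PRESSURE, b0, b1)
e = residual(ACTUAL, y_hat)
print(y_hat, e)  # 130.0 -20.0
# A negative residual means the model over-predicted; the smaller |e| is
# compared with typical residuals, the better the prediction.
```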
Look at the scatterplot to make sure it is linear. Sometimes this is hard to tell, so after you run a regression, make a residual plot. If the residual plot shows no pattern (just random scatter around zero), that is good evidence the relationship is linear.
The calculator automatically stores the residuals in a list named RESID after you run a regression. To look at them, press STAT, choose EDIT, and cursor over to RESID.
To create the residual plot, go to STAT PLOT, choose Plot2, and set Xlist: YR (the list of x-values used in the regression) and Ylist: RESID.
The Y= editor may still contain your regression line. Either turn it off or delete it so it is not drawn on the residual plot.
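The calculator draws the residual plot for you, but the idea is easy to reproduce. In this Python sketch, the made-up data follow y = x², which is clearly curved by design. Forcing a straight line through the data produces residuals in a U-shaped pattern (positive, then negative, then positive) rather than random scatter. That pattern is the warning sign to look for.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (b0, b1), the intercept and slope of the least squares line."""
    mx, my = mean(xs), mean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


x = list(range(7))           # 0, 1, ..., 6
y = [v ** 2 for v in x]      # clearly curved, hypothetical data

b0, b1 = least_squares(x, y)
res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Crude text "residual plot": one row per x, bar length = size of residual.
for xi, e in zip(x, res):
    bar = "+" * round(e) if e > 0 else "-" * round(-e)
    print(f"x={xi}  resid={e:+.1f}  {bar}")
# Residuals go +5, 0, -3, -4, -3, 0, +5: a U-shape, so a line is not appropriate.
```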
The square of the correlation, r², gives the fraction of the data’s variance accounted for by the model. All regression analyses include this statistic, although by tradition it is written R² (pronounced “R-squared”). An R² of 0 means that none of the variance in the data is in the model; all of it is still in the residuals.
When interpreting a regression model, you need to tell what R² means: “R² is the percent of the variability in y that is explained by x.”
R² is always between 0% and 100%. What makes a “good” R² value depends on the kind of data you are analyzing and on what you want to do with it.
Always report the slope, the intercept, and R² for a regression so that readers can judge for themselves how successful the regression is at fitting the data.
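R² can be computed as 1 − SSE/SST. Here SSE is the sum of squared residuals and SST is the sum of squared deviations of y from its mean. For a least squares line with one predictor, this equals the square of the correlation r. The Python sketch below uses a small made-up data set (x = 1 to 5, y = 2, 4, 5, 4, 6) and reports the slope, intercept, and R² together. The function name `regression_report` is illustrative.

```python
from math import sqrt
from statistics import mean


def regression_report(xs, ys):
    """Return (b0, b1, r_squared) for the least squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)   # SST: total variability in y
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - sse / syy              # share of variability explained
    return b0, b1, r_squared


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 6]

b0, b1, r2 = regression_report(x, y)
mx, my = mean(x), mean(y)
r = (sum((a - mx) * (b - my) for a, b in zip(x, y))
     / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)))

print(f"y-hat = {b0:.2f} + {b1:.2f}x, R^2 = {r2:.1%}")  # y-hat = 1.80 + 0.80x, R^2 = 72.7%
print(abs(r2 - r ** 2) < 1e-9)  # True: R^2 equals r squared for one predictor
```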