Psych 230 Psychological Measurement and Statistics Pedro Wolf September 30, 2009
Homework
Question 2 a.
Homework
2b. r = 0.96617372
2c. r² = .93. Ninety-three percent of the variance in test Y can be predicted from test X.
2d. Yes. The large correlation indicates that scores on test X can be used to predict scores on test Y.
Homework
• 6a
– This statement implies a cause-and-effect relationship.
– The correlation by itself does not establish this.
– It may be that people with more education have more access to mental health care and therefore more opportunity to use it.
Homework
• 6b
– Again, this statement implies a cause-and-effect relationship.
– There could be other factors.
– Children who play an instrument may have parents who stress the value of practice and education.
• 6c
– This statement is false.
– A negative correlation still means there is a relationship: as one variable increases, the other tends to decrease.
Homework
• 8a. Test 4, r = .633: r² = .4007
• 8b. .4007 × 2 = .8014; r = √.8014 = .8952
• 8c. They ask the same type of question.
Last Time….
• Correlation
• The r value indicates:
– The direction of the relationship between two variables
– The strength of the relationship between two variables
Today….
Regression
• Correlation tells us about the strength of the relationship between two variables
• It does not, by itself, let us predict one variable from the other
• We can use linear regression to do this
Correlation
• When you run a correlation, you convert everything to z scores: $r = \frac{\sum z_X z_Y}{N}$
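A minimal Python sketch of the z-score formula above. It assumes the z scores are computed with the population standard deviation (dividing by N), which is what makes the sum of z-score products divided by N equal r. The data are made up for illustration.

```python
from math import sqrt


def z_scores(values):
    """Convert raw scores to z scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """r = (sum of z_X * z_Y) / N."""
    zx = z_scores(x)
    zy = z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical scores, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))
```

If your course computes z scores with the sample standard deviation (dividing by N − 1), the sum of products has to be divided by N − 1 instead to get the same r.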
Regression
• We build on correlation by adding a “line of best fit” to the data
• The previous plot was on a standardized scale
• Any known X score lets us predict a Y score
Line of best fit
• Remember this from high school? $Y = mX + b$
• We use: $Y = a + b_Y X + \text{error}$
• Where $b_Y$ is the slope of the line
• $a$ is the Y intercept (where the line crosses the Y axis)
• error is the unexplained variation: how far each actual Y score falls from the line
Slope
• Slope ($b_Y$) is the steepness of the line
• Slope = change in Y / change in X
• The more Y changes for every one-unit change in X, the steeper the slope
Y-intercept
• This is where the line crosses the Y axis
• When X = 0, the value of Y is the intercept
Line of best fit
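• The resulting line comes as close as possible to the existing data points: it minimizes the sum of the squared vertical distances between the points and the line (the least-squares criterion)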
Determining the Regression Line
• The following is the formula for determining the slope
• For the intercept
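The slope and intercept formulas on this slide did not survive extraction. As an assumption, the sketch below uses one common least-squares form computed from raw sums, $b_Y = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$ and $a = \bar{Y} - b_Y\bar{X}$. The slide may have shown an equivalent version. The data are hypothetical.

```python
def regression_line(x, y):
    """Return (slope, intercept) of the least-squares line Y' = a + b_Y * X."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope from raw sums (assumed computational form)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(Y) - b_Y * mean(X)
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = regression_line(x, y)
print(f"Y' = {intercept:.2f} + {slope:.2f}X")
```

For these made-up data this prints `Y' = 2.20 + 0.60X`.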
Y prime
• The line formula gives us the value of Y we would predict for a given X
• We write this as Y′
• We have to distinguish it from the actual Y, because our estimate Y′ is not perfectly accurate
Why predict Y?
• We already have Y scores
• Y′ isn’t as good as Y
• But regression lets you predict new data, for example:
– Use SAT scores to predict college performance
– Use morbidity data to predict the longevity of smokers
– Use the past status of markets to predict their future status
Making predictions
• You can rewrite the line formula as:
$Y' = \bar{Y} + r\frac{S_Y}{S_X}(X - \bar{X})$
• The slope is the middle term: $b_Y = r\frac{S_Y}{S_X}$
• Get the intercept by rearranging: $a = \bar{Y} - b_Y\bar{X}$
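To check that the rewritten formula describes the same line as $Y' = a + b_Y X$, here is a short Python sketch using made-up summary statistics. It computes Y′ once in deviation form and once with $b_Y = r(S_Y/S_X)$ and $a = \bar{Y} - b_Y\bar{X}$. The function names are hypothetical.

```python
def predict_deviation_form(x, mean_x, sd_x, mean_y, sd_y, r):
    """Y' = mean_Y + r * (S_Y / S_X) * (X - mean_X)"""
    return mean_y + r * (sd_y / sd_x) * (x - mean_x)


def predict_line_form(x, mean_x, sd_x, mean_y, sd_y, r):
    """Y' = a + b_Y * X, with b_Y = r * (S_Y / S_X) and a = mean_Y - b_Y * mean_X"""
    slope = r * (sd_y / sd_x)
    intercept = mean_y - slope * mean_x
    return intercept + slope * x


# Hypothetical summary statistics, for illustration only
stats = dict(mean_x=50, sd_x=10, mean_y=100, sd_y=15, r=0.5)
for x in (30, 50, 65):
    print(x, predict_deviation_form(x, **stats), predict_line_form(x, **stats))
```

Both columns agree for every X. Note that when X equals the mean of X, both forms predict the mean of Y.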
Example
• Jessica wants to predict her final exam grade from the midterm
• She earned a 74 on the midterm
• The mean grade on the midterm was 70 and s = 4
• In previous years, the mean on the final was 75 and s = 4. The correlation between the two tests was r = .60
• What score can Jessica predict?
• Y′ = 75 + .6(4/4)(74 – 70)
Example
• Y′ = 75 + .6(4/4)(74 – 70)
• Y′ = 75 + (.6)(1)(4)
• Y′ = 75 + 2.4
• Y′ = 77.4
• What if the correlation between the midterm and final was 1?
Example
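• $Y' = \bar{Y} + r\frac{S_Y}{S_X}(X - \bar{X})$
• Y′ = 75 + (1)(4/4)(74 – 70)
• Y′ = 75 + 4 = 79
• The correlation is perfect here
• The difference between 74 and 79 reflects only a difference in scale (the two tests have different means)
• The distance from the mean, in standard deviations, is identical: (74 – 70)/4 = 1 and (79 – 75)/4 = 1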
Example
• $Y' = \bar{Y} + r\frac{S_Y}{S_X}(X - \bar{X})$
• What if the correlation between the midterm and final was 0?
Example
• $Y' = \bar{Y} + r\frac{S_Y}{S_X}(X - \bar{X})$
• Y′ = 75 + (0)(4/4)(74 – 70)
• Y′ = 75
• When the variables are uncorrelated, or the correlation is unknown, the best prediction is the mean
• When r ≠ 0, regression lets us do better than simply predicting the mean
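The three versions of Jessica's example can be reproduced with a few lines of Python. This is a sketch of the prediction formula only. The function name `predict` is a hypothetical helper, and the numbers are the ones given in the example.

```python
def predict(x, mean_x, sd_x, mean_y, sd_y, r):
    """Y' = mean_Y + r * (S_Y / S_X) * (X - mean_X)"""
    return mean_y + r * (sd_y / sd_x) * (x - mean_x)


# Numbers from the Jessica example
MIDTERM_MEAN, MIDTERM_SD = 70, 4
FINAL_MEAN, FINAL_SD = 75, 4
jessica_midterm = 74

for r in (0.6, 1.0, 0.0):
    y_prime = predict(jessica_midterm, MIDTERM_MEAN, MIDTERM_SD, FINAL_MEAN, FINAL_SD, r)
    print(f"r = {r}: Y' = {y_prime:.1f}")
```

This reproduces the 77.4, 79, and 75 worked out by hand. As r shrinks toward 0, the prediction is pulled toward the final-exam mean.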
Variation
• If r = ±1, all variation is explained; if r = 0, all variation is unexplained
• The proportion of variation explained is r²: the closer the points fall to the regression line, the greater the variation explained
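A small Python check of the link between r² and how close the points fall to the line. It assumes least-squares regression and compares r² with 1 − (squared errors around the line) / (squared deviations around the mean of Y). The helper name and data are hypothetical.

```python
from math import sqrt


def variance_explained(x, y):
    """Return (r squared, 1 - SS_error / SS_total) for the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)  # total variation in Y
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]  # Y' for each X
    ss_error = sum((yi - yp) ** 2 for yi, yp in zip(y, predicted))  # unexplained
    return r ** 2, 1 - ss_error / syy


# Hypothetical data, for illustration only
print(variance_explained([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

Both numbers come out as 0.6 (up to rounding) for these data. With points exactly on a line, both are 1.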
Causation
• As with correlation, we can’t infer causation with regression
• We’re observing variables that correlate, not running experiments
• Beware of lurking variables. Another explanation may fit the data better
Midterm
• For the midterm, you will have to integrate what you have learned.
• You will be given one or more research problems with small datasets.
– Because all you know how to do right now is descriptive statistics and correlation/regression analysis, these will be correlational designs.
• You will have to run all the descriptive statistics you know (e.g., the mean, standard deviation, range, and mode of the two variables).
• Draw a scatterplot.
• You will then calculate the correlation and report whether or not it is significant.
• You will then do a regression, calculate the slope and intercept, and draw the line of best fit through the scatterplot.
– I may give you a value for X and ask you to predict the corresponding Y value from your regression line.
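For self-checking your hand calculations, here is a rough Python sketch of the computational steps, using the standard `statistics` module. Assumptions: the standard deviation divides by N (`pstdev`; swap in `stdev` if your course uses N − 1), the dataset is made up, and the helper names are hypothetical. The scatterplot and the significance test (which needs a table of critical r values) are not covered here.

```python
from statistics import mean, mode, pstdev


def describe(values):
    """Descriptive statistics for one variable."""
    return {
        "mean": mean(values),
        # Assumption: population SD (divide by N); use statistics.stdev for N - 1
        "sd": pstdev(values),
        "range": max(values) - min(values),
        "mode": mode(values),
    }


def correlation_and_line(x, y):
    """Return (r, slope, intercept) for predicting Y from X."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    n = len(x)
    # r: mean cross-product of deviations, divided by both SDs
    r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n * sx * sy)
    slope = r * sy / sx          # b_Y = r (S_Y / S_X)
    intercept = my - slope * mx  # a = mean_Y - b_Y * mean_X
    return r, slope, intercept


# Hypothetical dataset, for illustration only
x = [2, 3, 3, 5, 7]
y = [1, 3, 3, 4, 6]
print("X:", describe(x))
print("Y:", describe(y))
r, slope, intercept = correlation_and_line(x, y)
print(f"r = {r:.3f}, Y' = {intercept:.3f} + {slope:.3f}X")
print("Predicted Y for X = 4:", intercept + slope * 4)
```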
Homework
• 2a–d
• 10
• 15