Psych 230 Psychological Measurement and Statistics

Pedro Wolf

September 30, 2009

Homework

Question 2 a.

Homework

• 2 b. r = 0.96617372
• 2 c. r² = .93. Ninety-three percent of the variance in test Y can be predicted from test X.
• 2 d. Yes, the large correlation indicates that one can predict scores on test Y from scores on test X.

Homework

• 6 a.
– This statement implies a cause-and-effect relation.
– The correlation by itself does not imply this.
– It may be that people with more education have more access to mental healthcare and therefore have the opportunity to use it.

Homework

• 6 b.
– Again, this statement implies a cause-and-effect relation.
– There could be other factors.
  • Children who play an instrument may have parents who stress the value of practice and education.

• 6 c.
– This statement is false.
– A negative correlation means there is a relationship.

Homework

• 8 a. Test 4, r = .633; r² = .4007
• 8 b. .4007 × 2 = .8014; r = √.8014 = .8952
• 8 c. They ask the same type of question.
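The arithmetic in answer 8 can be checked with a few lines of Python. This is only a sketch that reproduces the numbers above; the variable names are illustrative, and reading part b as "double the variance explained" is an assumption based on the answer shown.

```python
import math

r_test4 = 0.633                        # correlation reported for Test 4
r_squared = round(r_test4 ** 2, 4)     # proportion of variance explained
doubled = round(r_squared * 2, 4)      # assumed: twice the variance explained (part b)
r_needed = round(math.sqrt(doubled), 4)  # correlation that would explain that much

print(r_squared, doubled, r_needed)
```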

Last Time….

• Correlation

• r value indicates
– Direction of relationship between two variables
– Strength of relationship between two variables

Today….

Regression

Regression

• Correlation tells us about the strength of the relationship between 2 variables

• It does not let us predict
• We can use linear regression to do this

Correlation

• When you run a correlation you convert everything to z scores
• r = (ΣZxZy) / N
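The z-score formula for r can be sketched in Python as follows. The function names and the five pairs of scores are made up for illustration, and the standard deviation is assumed to be the population form (dividing by N) so that it matches the N in the formula.

```python
import math

def z_scores(values):
    """Convert raw scores to z scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """r = sum(Zx * Zy) / N."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical scores, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```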

Regression

• We build on correlation by adding a “line of best fit” to the data
• The previous plot was on a standardized scale
• Any known X score lets us predict the Y score

Line of best fit

• Remember this from high school?
• Y = mX + b
• We use: Y = a + by(X) + error
• Where by is the slope of the line
• a is the Y-intercept (where the line hits the Y-axis)
• error is the unexplained variance

Slope

• Slope (by) is the steepness of the line
• Change in Y / Change in X
• The more Y changes for every unit change of X, the steeper the slope

Y-intercept

• This is where the line crosses the Y-axis
• When X = 0, the value of Y is the intercept
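Both ideas, slope as change in Y over change in X and the intercept as the value of Y at X = 0, can be read off a line directly. A small sketch, with a hypothetical intercept of 2.0 and slope of 0.5 chosen only for illustration:

```python
def line(x, a, b):
    """Y = a + b * X (the error term is left out)."""
    return a + b * x

a, b = 2.0, 0.5  # hypothetical intercept and slope

# Intercept: the value of Y when X = 0
intercept = line(0, a, b)

# Slope: change in Y divided by change in X between any two points on the line
slope = (line(5, a, b) - line(3, a, b)) / (5 - 3)

print(intercept, slope)
```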

Line of best fit

• The resulting line comes as close as possible to the existing data points
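What "as close as possible" means can be made concrete. The sketch below assumes the usual least-squares criterion (the smallest sum of squared vertical distances between the points and the line) and shows that nudging the fitted line in any direction makes that sum larger. The data and function names are hypothetical.

```python
def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical distances between the points and Y = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return a, b

# Hypothetical scores, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
best = sum_squared_errors(x, y, a, b)

# Shift the intercept and slope a little; every nudged line fits worse
worse = [sum_squared_errors(x, y, a + da, b + db)
         for da in (-0.5, 0.5) for db in (-0.2, 0.2)]
print(round(best, 2), [round(w, 2) for w in worse])
```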

Determining the Regression Line

• The following is the formula for determining the slope

• For the intercept
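One way to compute both values is sketched below. It assumes the slope form by = r(Sy/Sx) given later in this lecture and the intercept a = Ybar – by(Xbar) that follows from it; the function names are hypothetical, and the numbers are the ones from the midterm/final example further on.

```python
def slope(r, s_x, s_y):
    """by = r * (Sy / Sx)."""
    return r * (s_y / s_x)

def intercept(mean_x, mean_y, b):
    """a = Ybar - by * Xbar."""
    return mean_y - b * mean_x

# Values from the lecture's example: r = .60, both SDs = 4,
# midterm mean = 70, final mean = 75
b = slope(r=0.60, s_x=4, s_y=4)
a = intercept(mean_x=70, mean_y=75, b=b)

print(round(b, 2), round(a, 2))
```

With these values the line is roughly Y’ = 33 + .6(X), which gives the same prediction as the rewritten formula used in the example.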

Y prime

• The line formula gives us the value of Y we would predict if given X
• We write this as Y’
• We have to distinguish it from the actual Y, because our estimate Y’ is not totally accurate

Why predict Y?

• We already have Y scores
• Y’ isn’t as good as Y
• But the regression lets you predict new data
– Use SAT scores to predict college performance
– Use morbidity data to predict longevity of smokers
– Use past status of markets to predict their future status

Making predictions

• You can rewrite the line formula as:

Y’ = Ybar + r(Sy / Sx)(X – Xbar)

• The slope is the middle term: by = r(Sy / Sx)
• Get the intercept by moving stuff around
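The rearrangement that recovers the intercept can be written out step by step. A sketch in LaTeX, writing the means as \bar{X} and \bar{Y} and the slope as b_y:

```latex
\begin{align*}
Y' &= \bar{Y} + r\,\frac{S_y}{S_x}\,(X - \bar{X}) \\
   &= \bar{Y} + b_y\,(X - \bar{X}) && \text{since } b_y = r\,\frac{S_y}{S_x} \\
   &= (\bar{Y} - b_y \bar{X}) + b_y X \\
   &= a + b_y X && \text{with } a = \bar{Y} - b_y \bar{X}
\end{align*}
```

So the rewritten form and Y = a + by(X) describe the same line; the intercept is simply the part that does not multiply X.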

Example

• Jessica wants to predict her final exam grade from the midterm
• She earned a 74 on the midterm
• The mean grade on the midterm was 70 and s = 4
• In previous years, the mean on the final was a 75 and s = 4. The correlation between the two tests was r = .60
• What score can Jessica predict?
• Y’ = Ybar + r(Sy / Sx)(X – Xbar)
• Y’ = 75 + .6(4/4)(74 – 70)

Example

• Y’ = Ybar + r(Sy / Sx)(X – Xbar)
• Y’ = 75 + .6(4/4)(74 – 70)
• Y’ = 75 + (.6)(1)(4)
• Y’ = 75 + 2.4
• Y’ = 77.4

• What if the correlation between the midterm and final was 1?
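The worked example above can be reproduced with a short Python function. The function and parameter names are hypothetical; the numbers are Jessica's.

```python
def predict_y(x, mean_x, mean_y, s_x, s_y, r):
    """Y' = Ybar + r * (Sy / Sx) * (X - Xbar)."""
    return mean_y + r * (s_y / s_x) * (x - mean_x)

# Midterm score 74; midterm mean 70, SD 4; final mean 75, SD 4; r = .60
predicted_final = predict_y(x=74, mean_x=70, mean_y=75, s_x=4, s_y=4, r=0.60)
print(round(predicted_final, 1))  # 77.4
```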

Example

• Y’ = Ybar + r(Sy / Sx)(X – Xbar)
• Y’ = 75 + (1)(4/4)(74 – 70)
• Y’ = 75 + 4 = 79

• The correlation is perfect here
• A difference in score values reflects a difference in scale
• The distance from the mean is identical

Example

• Y’ = Ybar + r(Sy / Sx) (X – Xbar)

• What if the correlation between the midterm and final was 0?

Example

• Y’ = Ybar + r(Sy / Sx)(X – Xbar)
• Y’ = 75 + (0)(4/4)(74 – 70)
• Y’ = 75

• The best prediction is the mean when the variables are uncorrelated, or the correlation is unknown.

• Regression allows us to beat the mean
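The three cases (r = 0, r = .60, r = 1) can be compared side by side by working in standard-deviation units. Dividing the prediction formula through by Sy gives a predicted z score on Y equal to r times the z score on X. The sketch below uses that rearrangement; the function name is hypothetical.

```python
def predicted_z(z_x, r):
    """Predicted z score on Y: z_Y' = r * z_X."""
    return r * z_x

# Jessica's midterm score in SD units: (74 - 70) / 4 = 1 SD above the mean
z_x = (74 - 70) / 4

predictions = {}
for r in (0.0, 0.60, 1.0):
    z_y = predicted_z(z_x, r)
    # Convert back to final-exam points (mean 75, SD 4)
    predictions[r] = round(75 + z_y * 4, 1)

print(predictions)  # {0.0: 75.0, 0.6: 77.4, 1.0: 79.0}
```

With r = 0 the prediction stays at the mean, with r = 1 Jessica is predicted to be just as far above the mean on the final as she was on the midterm, and anything in between pulls the prediction part of the way back toward the mean.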

Variation

• If r = ±1, all variation is explained; if r = 0, all variation is unexplained

• The closer the points fall to the regression line, the greater the variation explained
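The link between r and explained variation can be checked numerically. The sketch below splits the total variation in Y into the part captured by the regression line and the part left over, then compares the explained proportion with r². The data are hypothetical, made up for illustration.

```python
import math

# Hypothetical scores, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_total = sum((yi - mean_y) ** 2 for yi in y)   # all variation in Y

b = sp / ss_x                  # slope of the least-squares line
a = mean_y - b * mean_x        # intercept
predicted = [a + b * xi for xi in x]

ss_explained = sum((p - mean_y) ** 2 for p in predicted)
ss_unexplained = sum((yi - p) ** 2 for yi, p in zip(y, predicted))

r = sp / math.sqrt(ss_x * ss_total)
print(round(r ** 2, 4), round(ss_explained / ss_total, 4))
```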

Causation

• As with correlation, we can’t infer causation with regression

• We’re observing variables that correlate, not running experiments

• Beware of lurking variables. Another explanation may fit the data better

Midterm

• For the midterm you are going to have to integrate what you have learned.
• You are going to be given one or more research problems with small datasets.
– Because all you know how to do right now is descriptive statistics and correlation/regression analyses, they will be correlational designs.
• You are going to have to run all the descriptive statistics you know (e.g., the mean, standard deviation, range, mode, etc. for the two variables).
• Draw a scatterplot.
• You will then calculate the correlation and report whether or not it is significant.
• You will then do a regression, calculate the slope and intercept, and draw the line of best fit through the scatterplot.
– I may give you a value for X and ask you to predict a corresponding Y value given your regression line.
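The numerical steps listed above can be strung together in one short script. This is a sketch only: the dataset is hypothetical, the standard deviation is assumed to be the population form (dividing by N), and the scatterplot and the significance check are left out because the slides do not say which tools or tables to use. It relies on Python's standard `statistics` module (`multimode` needs Python 3.8 or later).

```python
import statistics

# Hypothetical small dataset, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def describe(values):
    """Descriptive statistics for one variable."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.pstdev(values),        # population SD (divide by N)
        "range": max(values) - min(values),
        "mode": statistics.multimode(values),   # list, since ties are possible
    }

dx, dy = describe(x), describe(y)

# Correlation: mean of the products of paired z scores
n = len(x)
r = sum(((xi - dx["mean"]) / dx["sd"]) * ((yi - dy["mean"]) / dy["sd"])
        for xi, yi in zip(x, y)) / n

# Regression: slope and intercept of the line of best fit
b = r * (dy["sd"] / dx["sd"])
a = dy["mean"] - b * dx["mean"]

def predict(new_x):
    """Y' for a given X using the fitted line."""
    return a + b * new_x

print(dx, dy)
print(round(r, 4), round(b, 4), round(a, 4), round(predict(6), 4))
```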

Homework

• 2 a-d
• 10
• 15
