Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions -- Any model has predicted values and residuals. (Do we always want a model with small residuals ? ) -- The “regression effect” (Why did Galton call these things “regressions” ? ) -- Pitfalls: Outliers -- Pitfalls: Extrapolation -- Conditions for a good regression
27
Embed
Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Correlations and scatterplots-- Optical illusion ?-- Finding the marginal distributions in the scatterplots
(shoe size vs. hours of TV)
Regressions-- Any model has predicted values and residuals.
(Do we always want a model with small residuals ? )-- The “regression effect”
(Why did Galton call these things “regressions” ? )-- Pitfalls: Outliers-- Pitfalls: Extrapolation-- Conditions for a good regression
Which looks like a stronger relationship?
-1
1
3
-0.43 0.32 1.07 1.82
X
Y
-6
-4
-2
0
2
4
6
8
-4 -3 -2 -1 0 1 2 3 4 5
X
Y
Mortality vs. Education
9
9.5
10
10.5
11
11.5
12
12.5
13
800 850 900 950 1000 1050 1100 1150
Education
Mortality vs. Education
9
9.5
10
10.5
11
11.5
12
12.5
13
800 850 900 950 1000 1050 1100 1150
Education
Optical Illusion ?
-1
0
1
2
-1 0 1 2
X
Y
correlation = .97
-1
0
1
2
-2 0 2
X
Y
correlation = .71
Shoe size vs. hours of TV…
Linear models and non-linear models
Model A: Model B:
y = a + bx + error y = a x1/2 + error
Model B has smaller errors. Is it a better model?
aa opas asl poasie ;aaslkf 4-9043578
y = 453209)_(*_n &*^(*LKH l;j;)(*&)(*& + error
This model has even smaller errors. In fact, zero errors.
Tradeoff: Small errors vs. complexity.
(We’ll only consider linear models.)
The “Regression” Effect
A preschool program attempts to boost children’s reading scores.
Children are given a pre-test and a post-test.
Pre-test: mean score ≈ 100, SD ≈ 10Post-test: mean score ≈ 100, SD ≈ 10
The program seems to have no effect.
A closer look at the data shows a surprising result:
Children who were below average on the pre-test tended to gain about 5-10 points on the post-test
Children who were above average on the pre-test tended to lose about 5-10 points on the post-test.
A closer look at the data shows a surprising result:
Children who were below average on the pre-test tended to gain about 5-10 points on the post-test
Children who were above average on the pre-test tended to lose about 5-10 points on the post-test.
Maybe we should provide the program only for children whose pre-test scores are below average?
Fact:In most test–retest and analogous situations, the
bottom group on the first test will on average tend to improve, while the top group on the first test will on average tend to do worse.
Other examples:• Students who score high on the midterm tend on
average to score high on the final, but not as high.
• An athlete who has a good rookie year tends to slump in his or her second year. (“Sophomore jinx”, "Sports Illustrated Jinx")
• Tall fathers tend to have sons who are tall, but not as tall. (Galton’s original example!)
80
90
100
110
120
130
80 90 100 110 120
pre-test
post-test
JPM (vertical axis) vs. DJI (horizontal axis) daily changes
-10.0000
-8.0000
-6.0000
-4.0000
-2.0000
0.0000
2.0000
4.0000
6.0000
8.0000
10.0000
-6 -4 -2 0 2 4 6
DJI
JPM
JPM (vertical axis) vs. DJI (horizontal axis) daily changes
-10.0000
-8.0000
-6.0000
-4.0000
-2.0000
0.0000
2.0000
4.0000
6.0000
8.0000
10.0000
-6 -4 -2 0 2 4 6
DJI
JPM
It works the other way, too:
• Students who score high on the final tend to have scored high on the midterm, but not as high.
• Tall sons tend to have fathers who are tall, but not as tall.
• Students who did well on the post-test showed improvements, on average, of 5-10 points, while students who did poorly on the post-test dropped an average of 5-10 points.
Students can do well on the pretest…-- because they are good readers, or-- because they get lucky.
The good readers, on average, do exactly as well on the post-test. The lucky group, on average, score lower.
Students can get unlucky, too, but fewer of that group are among the high-scorers on the pre-test.
So the top group on the pre-test, on average, tends to score a little lower on the post-test.