Chapter 15
Describing Relationships: Regression, Prediction, and Causation

Dec 13, 2015


Reynold Jacobs

Thought Question 2

From past natural disasters, a strong positive correlation has been found between the amount of aid sent and the number of deaths. Would you interpret this to mean that sending more aid causes more people to die? Explain.


Thought Question 3

Studies have shown a negative correlation between the amount of food consumed that is rich in beta carotene and the incidence of lung cancer in adults. Does this correlation provide evidence that beta carotene is a contributing factor in the prevention of lung cancer? Explain.

Linear Regression

Objective: To quantify the linear relationship between an explanatory variable and a response variable.

We can then predict the average response for all subjects with a given value of the explanatory variable.

Regression equation: y = a + bx
– x is the value of the explanatory variable
– y is the average value of the response variable
– note that a and b are just the intercept and slope of a straight line
– note that r and b are not the same thing, but their signs will agree
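The point that r and b differ but share a sign can be checked numerically. The sketch below is illustrative only: the data values and the helper name `slope_and_correlation` are made up for this example and do not come from the slides. Both quantities share the same numerator (the sum of cross-products), which is why their signs must agree.

```python
import math

def slope_and_correlation(xs, ys):
    """Return (b, r): the least squares slope and the correlation."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx                    # slope of the line y = a + bx
    r = sxy / math.sqrt(sxx * syy)   # correlation, always between -1 and 1
    return b, r

# Hypothetical data with a downward trend (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]
b, r = slope_and_correlation(xs, ys)
print(f"slope b = {b:.3f}, correlation r = {r:.3f}")
```

Here b depends on the units of x and y, while r does not, so the two numbers differ in size even though both are negative.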


Least Squares Regression

Used to determine the “best” line

We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict)

Least Squares: use the line that minimizes the sum of the squares of the vertical distances of the data points from the line
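As a rough illustration of the least squares idea, the sketch below fits a line using the standard closed-form formulas (b = Sxy / Sxx, a = ȳ − b·x̄) and then checks that nudging the intercept or slope only increases the sum of squared vertical distances. The data set and function names are assumptions made for this example, not part of the original slides.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_least_squares(xs, ys)
best = sum_squared_errors(a, b, xs, ys)

# Any other line we try should do no better than the fitted one.
nudged = [
    sum_squared_errors(a + da, b + db, xs, ys)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]
print(f"a = {a:.2f}, b = {b:.2f}, SSE = {best:.2f}")
print("every nudged line is worse:", all(s > best for s in nudged))
```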


Prediction via Regression Line

Husband and Wife: Ages

Hand, et al., A Handbook of Small Data Sets, London: Chapman and Hall

The regression equation is y = 3.6 + 0.97x
– y is the average age of all husbands who have wives of age x

For all women aged 30, we predict the average husband age to be 32.7 years:
3.6 + (0.97)(30) = 32.7 years

Suppose we know that an individual wife’s age is 30. What would we predict her husband’s age to be? How old is her husband?
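The arithmetic on this slide can be reproduced with a short sketch. The intercept and slope come from the slide; the function and constant names are invented for illustration. The same number, 32.7, serves as the prediction for an individual wife aged 30, but the line gives only an average, so her husband's actual age cannot be read off from it.

```python
INTERCEPT = 3.6   # a, from the slide's regression equation
SLOPE = 0.97      # b, from the slide's regression equation

def predict_average_husband_age(wife_age):
    """Average husband age predicted for wives of the given age."""
    return INTERCEPT + SLOPE * wife_age

prediction = predict_average_husband_age(30)
print(round(prediction, 1))  # 32.7
```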

Coefficient of Determination (R²)

Measures usefulness of regression prediction

R² (or r², the square of the correlation): measures the percentage of the variation in the values of the response variable (y) that is explained by the regression line
– r = 1: R² = 1: regression line explains all (100%) of the variation in y
– r = .7: R² = .49: regression line explains almost half (50%) of the variation in y
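One way to see what "variation explained" means is to compute R² directly as 1 − SSE/SST and compare it with the squared correlation. The sketch below does this on a small made-up data set; the data and the function names are assumptions for illustration only.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared(xs, ys):
    """Share of the variation in y explained by the least squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total
    return 1 - sse / sst

# Hypothetical data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r2 = r_squared(xs, ys)
print(f"r = {r:.4f}, r squared = {r * r:.4f}, R2 = {r2:.4f}")
```

For a straight-line fit with one explanatory variable, the two calculations agree, which is why the slide treats R² and r² as interchangeable.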

Income versus Assets

[Figure: scatterplot of income (millions) versus assets (billions)]

Income = a + b Assets

Assets vary from 3.4 billion to 49 billion

Income varies from bank to bank, even among those with similar assets

Statistical relationship

A Caution: Beware of Extrapolation

Sarah’s height was plotted against her age

Can you predict her height at age 42 months?

Can you predict her height at age 30 years (360 months)?

[Figure: Sarah’s height (cm) versus age (months)]

A Caution: Beware of Extrapolation

Regression line: y = 71.95 + .383 x

height at age 42 months? y = 88 cm.

height at age 30 years? y = 209.8 cm.
– She is predicted to be 6' 10.5" at age 30.

[Figure: regression line extended to age 390 months; height (cm) versus age (months)]
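The two predictions can be reproduced with the slide's regression line, and a simple range check flags which one is an extrapolation. In the sketch below the coefficients come from the slide, but the observed age range is an assumption: the slides do not list Sarah's data, so the range is taken loosely from the plot's age axis (about 30 to 65 months). The function names are likewise invented for illustration.

```python
INTERCEPT = 71.95   # from the slide's regression line
SLOPE = 0.383       # from the slide's regression line

# Assumption: the actual data range is not given in the slides; this is
# only the approximate span of the plotted age axis.
OBSERVED_AGE_RANGE = (30, 65)

def predict_height_cm(age_months):
    """Height predicted by the regression line at the given age."""
    return INTERCEPT + SLOPE * age_months

def is_extrapolation(age_months, observed_range=OBSERVED_AGE_RANGE):
    """True when the age lies outside the range used to fit the line."""
    low, high = observed_range
    return not (low <= age_months <= high)

for age in (42, 360):
    height = predict_height_cm(age)
    note = "extrapolation, do not trust" if is_extrapolation(age) else "within data range"
    print(f"age {age} months: {height:.1f} cm ({note})")
```

The line itself cannot signal that 360 months is far outside the data; the check has to be made separately.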

Correlation Does Not Imply Causation

Even very strong correlations may not correspond to a real causal relationship.

Evidence of Causation

A properly conducted experiment establishes the connection

Other considerations:
– A reasonable explanation for a cause and effect exists
– The connection happens in repeated trials
– The connection happens under varying conditions
– Potential confounding factors are ruled out
– Alleged cause precedes the effect in time

Reasons Two Variables May Be Related (Correlated)

Explanatory variable causes change in response variable

Response variable causes change in explanatory variable

Explanatory variable may be a cause, but is not the sole cause, of changes in the response variable

Confounding variables may exist

Both variables may result from a common cause
– such as both variables changing over time

The correlation may be merely a coincidence

Explanatory causes Response

Explanatory: pollen count from grasses
Response: percentage of people suffering from allergy symptoms

Explanatory: amount of food eaten
Response: hunger level

Response causes Explanatory

Explanatory: Hotel advertising dollars
Response: Occupancy rate

Positive correlation? – more advertising leads to increased occupancy rate?

Actual correlation is negative: lower occupancy leads to more advertising

Explanatory is not Sole Contributor

Explanatory: Consumption of barbecued foods
Response: Incidence of stomach cancer

Barbecued foods are known to contain carcinogens, but other lifestyle choices may also contribute

Confounding Variables

Meditation vs. Aging

Explanatory: Meditation
Response: Aging (measurable aging factor)

General concern for one’s well being may be confounded with decision to try meditation

Common Response (both variables change due to common cause)

Explanatory: Divorce among men
Response: Percent abusing alcohol

Both may result from an unhappy marriage.

Both Variables are Changing Over Time

Both divorces and suicides have increased dramatically since 1900.

Are divorces causing suicides?

Are suicides causing divorces?

The population has increased dramatically since 1900 (causing both to increase).

Better to investigate: Has the rate of divorce or the rate of suicide changed over time?
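The difference between counts and rates can be illustrated with a toy calculation. All numbers below are invented for this sketch and are not real divorce or suicide figures: the population grows steadily, while the two rates follow unrelated patterns. The counts end up strongly correlated purely because both scale with population, while the rates show no correlation at all.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical figures for eight time periods (not real data).
population_thousands = [100, 200, 300, 400, 500, 600, 700, 800]
divorce_rate = [2, 3, 2, 3, 2, 3, 2, 3]   # per thousand people
suicide_rate = [1, 1, 2, 2, 1, 1, 2, 2]   # per thousand people

# Counts are rate times population, so both grow as the population grows.
divorce_count = [p * r for p, r in zip(population_thousands, divorce_rate)]
suicide_count = [p * r for p, r in zip(population_thousands, suicide_rate)]

r_counts = pearson_r(divorce_count, suicide_count)
r_rates = pearson_r(divorce_rate, suicide_rate)
print(f"correlation of counts: {r_counts:.2f}")
print(f"correlation of rates:  {r_rates:.2f}")
```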


The Relationship May Be Just a Coincidence

We will see some strong correlations (or apparent associations) just by chance, even when the variables are not related in the population

Coincidence (?): Vaccines and Brain Damage

A required whooping cough vaccine was blamed for seizures that caused brain damage
– led to reduced production of vaccine (due to lawsuits)

Study of 38,000 children found no evidence for the accusations (reported in New York Times)
– “people confused association with cause-and-effect”
– “virtually every kid received the vaccine…it was inevitable that, by chance, brain damage caused by other factors would occasionally occur in a recently vaccinated child”

Issues with this analysis?
– Prevalence

Case Study

Social Relationships and Health

House, J., Landis, K., and Umberson, D. “Social Relationships and Health,” Science, Vol. 241 (1988), pp. 540-545.

Does lack of social relationships cause people to become ill?

Or, are unhealthy people less likely to establish and maintain social relationships?

Or, is there some other factor that predisposes people both to have lower social activity and become ill?

Key Concepts

Least Squares Regression Equation
R²
Correlation does not imply causation
Confirming causation
Reasons variables may be correlated

Cautions about Correlation and Regression

only describe linear relationships
are both affected by outliers
always plot the data before interpreting
beware of extrapolation
– predicting outside of the range of x
beware of lurking variables
– have important effect on the relationship among the variables in a study, but are not included in the study
association does not imply causation
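The first two cautions can be demonstrated with small made-up data sets (none of the values below come from the slides). In the first, y is completely determined by x through a curve, yet the correlation is zero because the relationship is not linear. In the second, a single outlier turns a perfect positive correlation into a negative one.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Caution 1: a perfect curved relationship, y = x squared.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Caution 2: four points on a line, then the same points plus one outlier.
line_x, line_y = [1, 2, 3, 4], [1, 2, 3, 4]
r_line = pearson_r(line_x, line_y)
r_outlier = pearson_r(line_x + [10], line_y + [0])

print(f"curved relationship:   r = {r_curve:.2f}")
print(f"points on a line:      r = {r_line:.2f}")
print(f"with one outlier added: r = {r_outlier:.2f}")
```

Plotting either data set would reveal the problem immediately, which is the reason for the advice to plot before interpreting.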

Least Squares Regression

A least squares regression line makes the vertical distances from the data points to the line small.

A few explanations for an observed association

A dashed line shows an association. An arrow shows a cause-and-effect link.

Variable x is explanatory, y is a response variable, and z is a lurking variable.