Copyright ©2011 Brooks/Cole, Cengage Learning Relationships Between Quantitative Variables Chapter 3 1.

Dec 19, 2015

Katrina Fox
Copyright ©2011 Brooks/Cole, Cengage Learning

Relationships Between Quantitative Variables

Chapter 3

Principal Idea:

The description and confirmation of relationships between variables are very important in research.

Three Tools we will use …

• Scatterplot, a two-dimensional graph of data values

• Correlation, a statistic that measures the strength and direction of a linear relationship between two quantitative variables.

• Regression equation, an equation that describes the average relationship between a quantitative response variable and an explanatory variable.

3.1 Looking for Patterns with Scatterplots

Questions to Ask about a Scatterplot

• What is the average pattern? Does it look like a straight line, or is it curved?

• What is the direction of the pattern?

• How much do individual points vary from the average pattern?

• Are there any unusual data points?

Positive/Negative Association/Linear Relationship

• Two variables have a positive association when the values of one variable tend to increase as the values of the other variable increase.

• Two variables have a negative association when the values of one variable tend to decrease as the values of the other variable increase.

• Two variables have a linear relationship when the pattern of their relationship resembles a straight line.

Example 3.1 Height and Handspan

Data shown are the first 12 observations of a data set that includes the heights (in inches) and fully stretched handspans (in centimeters) of 167 college students.

Example 3.1 Height and Handspan

Taller people tend to have greater handspan measurements than shorter people do.

When two variables tend to increase together, we say that they have a positive association.

The handspan and height measurements may have a linear relationship.

Example 3.2 Driver Age and Maximum Legibility Distance of Highway Signs

• A research firm determined the maximum distance at which each of 30 drivers could read a newly designed sign.

• The 30 participants in the study ranged in age from 18 to 82 years old.

• We want to examine the relationship between age and the sign legibility distance.

Example 3.2 Driver Age and Maximum Legibility Distance of Highway Signs

• We see a negative association with a linear pattern.

• We will use a straight-line equation to model this relationship.

Example 3.3 The Development of Musical Preferences

• The 108 participants in the study ranged in age from 16 to 86 years old.

• We want to examine the relationship between song-specific age (age in the year the song was popular) and musical preference (positive score above average, negative score below average).

Example 3.3 The Development of Musical Preferences

• Popular music preferences appear to be acquired in late adolescence and early adulthood.

• The association is nonlinear.

Groups and Outliers

• Use different plotting symbols or colors to represent different subgroups.

• Look for outliers: points that have an unusual combination of data values.

3.2 Describing Linear Patterns with a Regression Line

Two purposes of the regression line:

• to estimate the average value of y at any specified value of x

• to predict the value of y for an individual, given that individual’s x value

When the best equation for describing the relationship between x and y is a straight line, the equation is called the regression line.

Example 3.5 Height and Handspan (cont)

Based on the line, at a height of 60 inches the handspan is about 18 cm, and at a height of 70 inches the handspan is about 21.5 cm. So a change in height of 10 inches corresponds to a change in handspan of about 3.5 cm, and 3.5/10 = 0.35. The estimated slope is about 0.35 cm per inch.

Scatterplot with ‘best’ regression line (via Minitab)
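
The slope estimate on this slide is a rise-over-run calculation between two points read off the line. A minimal sketch in Python, using only the two readings quoted above (the function name `slope_between` is an illustrative choice, not something from the slides):

```python
def slope_between(x1, y1, x2, y2):
    """Change in y per one-unit change in x between two points on a line."""
    return (y2 - y1) / (x2 - x1)

# Readings quoted on the slide: (60 in, 18 cm) and (70 in, 21.5 cm).
slope = slope_between(60, 18.0, 70, 21.5)
print(round(slope, 2))  # 0.35
```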

The Equation for the Regression Line

$\hat{y} = b_0 + b_1 x$

$\hat{y}$ is spoken as “y-hat,” and it is also referred to as either predicted y or estimated y.

$b_0$ is the intercept of the straight line. The intercept is the value of y when x = 0.

$b_1$ is the slope of the straight line. The slope tells us how much of an increase (or decrease) there is for the y variable when the x variable increases by one unit. The sign of the slope tells us whether y increases or decreases when x increases.

Example 3.6 Writing the Regression Equation for Height and Handspan

Regression equation: Handspan = -3 + 0.35 Height

Estimate the average handspan for people 60 inches tall: Average handspan = -3 + 0.35(60) = 18 cm.

Predict the handspan for someone who is 60 inches tall: Predicted handspan = -3 + 0.35(60) = 18 cm.

Note: in a statistical relationship, there is variation from the average pattern.
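
The same equation serves both purposes (estimating an average and predicting an individual value). A small sketch, assuming the intercept and slope given on this slide; the function name `predict_handspan` is hypothetical:

```python
def predict_handspan(height_in):
    """Predicted (or estimated average) handspan in cm for a height in inches."""
    b0, b1 = -3.0, 0.35  # intercept and slope from the slide
    return b0 + b1 * height_in

print(round(predict_handspan(60), 1))  # 18.0
print(round(predict_handspan(70), 1))  # 21.5
```

The function returns one number per height; the note about variation from the average pattern still applies to any individual prediction.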

Interpreting the y-Intercept and the Slope

b0 = -3 is the y-intercept, the estimated or predicted handspan for someone whose height (x) is 0 inches. It has no meaningful interpretation in this example.

b1 = 0.35 is the slope. We estimate that handspan increases by 0.35 cm, on average, for each increase of 1 inch in height.

Regression equation: Handspan = -3 + 0.35 Height

Example 3.7 Driver Age and Maximum Legibility Distance of Highway Signs

Regression equation: Distance = 577 - 3 Age

Estimate the average distance for 20-year-old drivers: Average distance = 577 – 3(20) = 517 ft.

Predict the legibility distance for a 20-year-old driver: Predicted distance = 577 – 3(20) = 517 ft.

Slope of –3 tells us that, on average, the legibility distance decreases 3 feet when age increases by 1 year.

Prediction Errors and Residuals

• Prediction error = difference between the observed value of y and the predicted value $\hat{y}$.

• Residual = $y - \hat{y}$

Example 3.8 Prediction Errors for the Highway Sign Data

Regression equation: $\hat{y} = 577 - 3x$

The residual can be computed for all 30 observations. A positive residual means the observed value is higher than predicted. A negative residual means the observed value is lower than predicted.

x = Age    y = Distance    $\hat{y} = 577 - 3x$    Residual = $y - \hat{y}$
18         510             577 – 3(18) = 523       510 – 523 = -13
20         590             577 – 3(20) = 517       590 – 517 = 73
22         516             577 – 3(22) = 511       516 – 511 = 5
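
The three rows shown for this example can be reproduced with a few lines of Python. This is only a sketch over the three observations listed on the slide, not the full set of 30; the names `predict_distance` and `observations` are illustrative:

```python
def predict_distance(age):
    """Predicted legibility distance (ft) from the slide's line: 577 - 3 * age."""
    return 577 - 3 * age

# (age, observed distance) for the three rows shown on the slide.
observations = [(18, 510), (20, 590), (22, 516)]

# Residual = observed y minus predicted y.
residuals = [y - predict_distance(x) for x, y in observations]
print(residuals)  # [-13, 73, 5]
```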

The Least Squares Estimation Criterion

Least Squares Regression Line: minimizes the sum of squared prediction errors.

SSE: Sum of squared prediction errors.

Formulas for Slope and Intercept:

$b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$

$b_0 = \bar{y} - b_1 \bar{x}$
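
The slope and intercept formulas translate directly into code. The sketch below uses a small made-up data set (hypothetical, not from the slides) and plain Python; `least_squares` and `sse` are illustrative names:

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared prediction errors for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
```

Nudging either coefficient away from the fitted values and calling `sse` again should give a larger sum, which is what the least squares criterion means.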

3.3 Measuring Strength and Direction with Correlation

• The strength of the relationship is determined by the closeness of the points to a straight line.

• The direction is determined by whether one variable generally increases or generally decreases when the other variable increases.

Correlation r indicates the strength and the direction of a straight-line relationship.

Interpreting the Correlation Coefficient

• r is always between –1 and +1

• magnitude indicates the strength

• r = –1 or +1 indicates a perfect linear relationship

• sign indicates the direction

• r = 0 indicates a slope of 0, so knowing x does not change the predicted value of y

• Formula for correlation:

$r = \frac{1}{n-1} \sum_i \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
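
The correlation formula averages the products of standardized x and y values, dividing by n - 1. A minimal sketch with hypothetical data (not from the slides); `sample_sd` and `correlation` are illustrative names:

```python
import math

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correlation(xs, ys):
    """Correlation r as the sum of standardized products over n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(round(r, 3))  # 0.775
```

Flipping the sign of every y value flips the sign of r but leaves its magnitude unchanged, matching the "sign indicates the direction" bullet.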

Example 3.10 Correlation Between Handspan and Height

Regression equation: Handspan = -3 + 0.35(Height)

Correlation r = +0.74, a somewhat strong positive linear relationship.

Example 3.11 Correlation Between Age and Sign Legibility Distance

Regression equation: Distance = 577 – 3(Age)

Correlation r = -0.8, a somewhat strong negative linear association.

Example 3.12 Left and Right Handspans

If you know the span of a person’s right hand, can you accurately predict his/her left handspan?

Correlation r = +0.95, a very strong positive linear relationship.

Example 3.13 Verbal SAT and GPA

Grade point averages (GPAs) and verbal SAT scores for a sample of 100 university students.

Correlation r = 0.485, a moderately strong positive linear relationship.

Example 3.14 Age and Hours of TV Watching per Day

Relationship between age and hours of daily television viewing for 1299 survey respondents.

Correlation r = 0.136, a weak connection.

Note: a few claimed to watch more than 24 hours/day!

Example 3.15 Hours of Sleep and Hours of Study

Relationship between reported hours of sleep the previous 24 hours and the reported hours of study during the same period for a sample of 116 college students.

Correlation r = –0.36, a not too strong negative association.

Interpretation of and Formula for $r^2$

Squared correlation $r^2$ is between 0 and 1 and indicates the proportion of variation in the response explained by x.

SSTO = sum of squares total = sum of squared differences between observed y values and $\bar{y}$.

SSE = sum of squared errors (residuals) = sum of squared differences between observed y values and predicted values based on least squares line.

$r^2 = \frac{SSTO - SSE}{SSTO}$
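
The squared correlation can be computed from the two sums of squares defined on this slide. The sketch below fits the least squares line first and then forms SSTO and SSE; the data are hypothetical and `r_squared` is an illustrative name:

```python
def r_squared(xs, ys):
    """Proportion of variation in y explained by x: (SSTO - SSE) / SSTO."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Least squares slope and intercept.
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    # SSTO: squared differences from the mean of y.
    ssto = sum((y - y_bar) ** 2 for y in ys)
    # SSE: squared differences from the least squares predictions.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return (ssto - sse) / ssto

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(round(r_squared(xs, ys), 2))  # 0.6
```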

Interpretation of $r^2$

Height and Right Handspan: $r^2$ = 0.55. Height explains 55% of the variation among observed right handspans.

TV viewing and Age: $r^2$ = 0.0185, only about 1.85%; knowing a person’s age doesn’t help much in predicting amount of daily TV viewing.
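
As a quick arithmetic check, the two values on this slide are consistent with squaring the correlations reported earlier, assuming they come from the same data as Examples 3.10 and 3.14. The variable names are illustrative:

```python
r_handspan_height = 0.74  # correlation reported in Example 3.10
r_tv_age = 0.136          # correlation reported in Example 3.14

r2_handspan_height = r_handspan_height ** 2
r2_tv_age = r_tv_age ** 2

print(round(r2_handspan_height, 2))  # 0.55
print(round(r2_tv_age, 4))           # 0.0185
```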

Reading Computer Results for Regression

3.4 Regression and Correlation Difficulties and Disasters

• Extrapolating too far beyond the observed range of x values

• Allowing outliers to overly influence results

• Combining groups inappropriately

• Using correlation and a straight-line equation to describe curvilinear data

Extrapolation

• Risky to use a regression equation to predict values far outside the range where the original data fell (called extrapolation).

• No guarantee that the relationship will continue beyond the range for which we have observed data.
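
One way to make the extrapolation risk concrete is to flag any prediction requested outside the observed range of x. The sketch below reuses the highway sign equation and the 18 to 82 age range from Example 3.2; the function name and the returned flag are illustrative choices, not part of the slides:

```python
AGE_MIN, AGE_MAX = 18, 82  # observed age range in Example 3.2

def predict_distance(age):
    """Return (predicted distance in ft, True if age is outside the observed range)."""
    is_extrapolation = not (AGE_MIN <= age <= AGE_MAX)
    return 577 - 3 * age, is_extrapolation

print(predict_distance(20))   # (517, False)
print(predict_distance(200))  # (-23, True)
```

The second call shows why the flag matters: the line happily returns a negative distance for an age far beyond the data.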

Example 3.17 Height and Foot Length

Regression equation

uncorrected data: 15.4 + 0.13 height

corrected data: -3.2 + 0.42 height

Correlation

uncorrected data: r = 0.28

corrected data: r = 0.69

Three outliers were data entry errors.
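
A single data entry error can be enough to change a correlation substantially. The sketch below uses made-up heights and foot lengths (hypothetical, not the data behind this example) in which one height of 64 is mistyped as 84. It computes r with the algebraically equivalent form Sxy / sqrt(Sxx * Syy):

```python
import math

def correlation(xs, ys):
    """Correlation r computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
foot_cm = [23, 24, 26, 26, 28, 29]
height_corrected = [62, 64, 66, 68, 70, 72]
height_uncorrected = [62, 84, 66, 68, 70, 72]  # 64 mistyped as 84

r_corrected = correlation(height_corrected, foot_cm)
r_uncorrected = correlation(height_uncorrected, foot_cm)
print(round(r_corrected, 2), round(r_uncorrected, 2))  # 0.98 0.02
```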

Example 3.18 Earthquakes in US

Correlation

all data: r = 0.26

without SF: r = –0.824

Example 3.19 Height and Lead Feet

Scatterplot of all data: College student heights and responses to the question “What is the fastest you have ever driven a car?”

Scatterplot by gender: Combining two groups led to illegitimate correlation.
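
The effect of combining groups can be reproduced with a toy data set. In the sketch below (all numbers are hypothetical, chosen only to illustrate the idea) there is no association within either group, yet pooling the groups produces a strong positive correlation because the groups differ in both height and speed:

```python
import math

def correlation(xs, ys):
    """Correlation r computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (inches) and fastest speeds (mph) for two groups.
height_a, speed_a = [64, 65, 66], [90, 85, 90]
height_b, speed_b = [70, 71, 72], [110, 105, 110]

r_a = correlation(height_a, speed_a)  # within group A
r_b = correlation(height_b, speed_b)  # within group B
r_all = correlation(height_a + height_b, speed_a + speed_b)  # groups pooled

print(round(r_a, 2), round(r_b, 2), round(r_all, 2))
```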

Example 3.20 U.S. Population Predictions

Correlation: r = +0.96

Regression line: population = –2348 + 1.289(Year)

Poor prediction for year 2030 = –2348 + 1.289(2030), or about 269 million, due to the curved (not linear) pattern.

Population of US (in millions) for each census year between 1790 and 2000.
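
The 2030 figure follows from plugging the year into the fitted line. A small sketch of that arithmetic, using the coefficients quoted on this slide (the function name is illustrative):

```python
def predict_population(year):
    """Predicted U.S. population in millions from the slide's straight-line fit."""
    return -2348 + 1.289 * year

print(round(predict_population(2030)))  # 269
```

The calculation itself is correct; the slide's point is that a straight line is a poor model for a curved pattern, so the number should not be trusted.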

3.5 Correlation Does Not Prove Causation

Interpretations of an Observed Association

1. Causation

2. Confounding factors present

3. Explanatory and response variables are both affected by other variables

4. Response variable is causing a change in the explanatory variable

Case Study 3.1 A Weighty Issue

Relationship between Actual and Ideal Weight

[Figure panels: Females, Males]