Chapter 14: Describing Relationships: Scatterplots and Correlation
Thought Question 1
For all cars manufactured in the U.S., there is a positive correlation between the size of the engine and horsepower. There is a negative correlation between the size of the engine and gas mileage. What does it mean for two variables to have a positive correlation or a negative correlation?
Scatterplot
A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.
Figure 14.9 Scatterplot of average SAT Mathematics score for each state against the proportion of the state’s high school seniors who took the SAT. The light-colored point corresponds to two states. (This figure was created using the Minitab software package.)
Figure 14.3 Scatterplot of the life expectancy of people in many nations against each nation’s gross domestic product per person. (This figure was created using the Minitab software package.)
Examining a Scatterplot
In any graph of data, look for the overall pattern and for striking deviations from that pattern.
You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship.
An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of the relationship.
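Positive Association, Negative Association
Two variables are positively associated when above-average values of one tend to accompany above-average values of the other. The scatterplot slopes upward as we move from left to right.
Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other. The scatterplot slopes downward as we move from left to right.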
• Our scatterplot of the SAT scores shows two clusters of states.
• The one with the GDP shows a curved relationship.
• The strength of a relationship in a scatterplot is determined by how closely the points follow a clear form.
• The relationships in both of our plots are not strong.
– States with similar percentages show quite a bit of scatter in their average scores.
– Nations with similar GDPs can have quite different life expectancies.
A scatterplot with a strong relationship
Figure 14.5 Scatterplot of the lengths of two bones in 5 fossil specimens of the extinct beast Archaeopteryx.
Statistical versus Deterministic Relationships
• Distance versus Speed (when travel time is constant).
• Income (in millions of dollars) versus total assets of banks (in billions of dollars).
Distance versus Speed
• Distance = Speed × Time
• Suppose time = 1.5 hours
• Each subject drives a fixed speed for the 1.5 hrs.
– speed chosen for each subject varies from 10 mph to 50 mph
• Distance does not vary for those who drive the same fixed speed
• Deterministic relationship
[Plot: distance (vertical axis, 0 to 80) versus speed (horizontal axis, 0 to 60).]
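The deterministic relationship on this slide can be sketched in a few lines of Python. Only the formula Distance = Speed × Time and the 1.5-hour travel time come from the slide; the function name `distance_traveled` and the list of speeds are illustrative assumptions.

```python
TRAVEL_TIME_HOURS = 1.5  # fixed travel time taken from the slide


def distance_traveled(speed_mph, hours=TRAVEL_TIME_HOURS):
    """Distance = Speed x Time; no randomness is involved."""
    return speed_mph * hours


# Hypothetical subjects; two of them drive the same fixed speed.
speeds = [10, 20, 30, 30, 50]
distances = [distance_traveled(s) for s in speeds]

for s, d in zip(speeds, distances):
    print(f"{s} mph for {TRAVEL_TIME_HOURS} h -> {d} miles")
```

The two subjects who drive 30 mph cover exactly the same distance, which is what makes the relationship deterministic rather than statistical.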
Income versus Assets
• Income = a + b × Assets?
• Assets vary from 3.4 billion to 49 billion
• Income varies from bank to bank, even among those with similar assets
• Statistical relationship
[Plot: income in millions (vertical axis, 0 to 300) versus assets in billions (horizontal axis, 0 to 60).]
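For contrast with the deterministic case, here is a minimal simulation sketch of a statistical relationship. The intercept `a`, slope `b`, and noise level `noise_sd` are made-up values chosen only for illustration; they are not estimates from the bank data on the slide.

```python
import random


def simulated_income(assets_billions, rng, a=10.0, b=5.0, noise_sd=20.0):
    """Income (millions) = a + b * assets + random scatter.

    a, b, and noise_sd are hypothetical values for illustration only.
    """
    return a + b * assets_billions + rng.gauss(0.0, noise_sd)


rng = random.Random(14)  # fixed seed so the run is repeatable

# Two hypothetical banks with identical assets (20 billion dollars).
income_bank_1 = simulated_income(20.0, rng)
income_bank_2 = simulated_income(20.0, rng)

print(income_bank_1, income_bank_2)
```

Even with identical assets the two simulated incomes differ, so knowing the assets narrows down the income without fixing it exactly.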
Linear Relationship
Some relationships are such that the points of a scatterplot tend to fall along a straight line: a linear relationship.
Measuring Strength & Direction of a Linear Relationship
• How closely does a non-horizontal straight line fit the points of a scatterplot?
• The correlation coefficient (often referred to as just correlation) r is a measure of:
– the strength of the relationship: the stronger the relationship, the larger the magnitude of r.
– the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship.
Correlation Coefficient
• Special values for r:
– A perfect positive linear relationship would have r = +1.
– A perfect negative linear relationship would have r = -1.
– If there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0.
– Note: r must be between -1 and +1, inclusive.
• r > 0: as one variable changes, the other variable tends to change in the same direction.
• r < 0: as one variable changes, the other variable tends to change in the opposite direction.
Figure 14.7 How correlation measures the strength of a straight-line relationship. Patterns closer to a straight line have correlations closer to 1 or −1.
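Correlation Calculation
• Suppose we have data on variables X and Y for n individuals: x1, x2, … , xn and y1, y2, … , yn
• Each variable has a mean and std dev: (x̄, s_x) and (ȳ, s_y) (see ch. 12 for the mean and standard deviation)

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$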
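The calculation can be written out directly in Python. This is a minimal sketch: the function names `mean`, `sample_std`, and `correlation` and the small data set are illustrative assumptions, and the standard deviation is assumed to use the n − 1 divisor so that it matches the 1/(n − 1) factor in r.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """Standard deviation with the n - 1 divisor."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values.

    Raises ZeroDivisionError if either variable has no spread
    (for example, when all points lie on a horizontal line).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_std(xs), sample_std(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Made-up data lying exactly on straight lines.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])     # upward line
r_down = correlation(xs, [10 - 3 * x for x in xs])  # downward line
print(r_up, r_down)
```

Up to floating-point rounding, the upward line gives r = +1 and the downward line gives r = -1, the two extreme values of the correlation coefficient.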
Case Study
Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe
Case Study
Country           Per Capita GDP (x)   Life Expectancy (y)
Austria           21.4                 77.48
Belgium           23.2                 77.53
Finland           20.0                 77.32
France            22.7                 78.63
Germany           20.8                 77.17
Ireland           18.6                 76.39
Italy             21.5                 78.51
Netherlands       22.0                 78.15
Switzerland       23.8                 78.99
United Kingdom    21.2                 77.37
Case Study
x       y        (x_i − x̄)/s_x   (y_i − ȳ)/s_y   product
21.4    77.48    -0.078          -0.345           0.027
23.2    77.53     1.097          -0.282          -0.309
20.0    77.32    -0.992          -0.546           0.542
22.7    78.63     0.770           1.102           0.849
20.8    77.17    -0.470          -0.735           0.345
18.6    76.39    -1.906          -1.716           3.271
21.5    78.51    -0.013           0.951          -0.012
22.0    78.15     0.313           0.498           0.156
23.8    78.99     1.489           1.555           2.315
21.2    77.37    -0.209          -0.483           0.101
x̄ = 21.52, ȳ = 77.754, sum of products = 7.285
s_x = 1.532, s_y = 0.795
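The table's arithmetic can be checked with a short Python sketch. The data are the ten (x, y) pairs from the case study; the variable names are illustrative, and `statistics.stdev` is used because it applies the n − 1 divisor.

```python
from statistics import mean, stdev

# Per capita GDP (x) and life expectancy (y) from the case-study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)

# Product of the standardized x and y values for each country.
products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(gdp, life)]

r = sum(products) / (n - 1)

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.3f}")
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}")
print(f"sum of products = {sum(products):.3f}, r = {r:.3f}")
```

Dividing the sum of products, 7.285, by n − 1 = 9 gives r of about 0.81.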
Case Study
There is a strong, positive linear relationship between Per Capita GDP (x) and Life Expectancy (y).
Problems with Correlations
• Outliers can inflate or deflate correlations.
• Groups combined inappropriately may mask relationships (a third variable).
– groups may have different relationships when separated.
Figure 14.8 Moving one point reduces the correlation from r = 0.994 to r = 0.640.
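The effect of a single outlier can be reproduced with a small sketch. The six points below are made up for illustration and are not the data behind Figure 14.8, so the resulting correlations differ from the 0.994 and 0.640 in the caption; `pearson_r` is a hypothetical helper name.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Correlation as the average product of standardized values (n - 1 divisor)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical points that lie close to a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
r_before = pearson_r(xs, ys)

# Move only the last point far below the line, turning it into an outlier.
ys_moved = ys[:-1] + [1.0]
r_after = pearson_r(xs, ys_moved)

print(f"before: r = {r_before:.3f}, after: r = {r_after:.3f}")
```

One moved point out of six is enough to pull a correlation near 1 down to a weak one, which is why a scatterplot should be examined alongside r.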
Not all Relationships are Linear
Miles per Gallon versus Speed
• Linear relationship? MPG = a + b × Speed?
• Speed chosen for each subject varies from 20 mph to 60 mph.
• MPG varies from trial to trial, even at the same speed.
• Statistical relationship
Not all Relationships are Linear
Miles per Gallon versus Speed
• Curved relationship (r is misleading)
• Speed chosen for each subject varies from 20 mph to 60 mph.
• MPG varies from trial to trial, even at the same speed.
• Statistical relationship
[Plot: miles per gallon (vertical axis, 0 to 35) versus speed (horizontal axis, 0 to 100).]
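A short sketch shows why r can mislead for a curved pattern. The mileage curve below is an invented, noise-free parabola peaking at 40 mph, chosen only so the arithmetic is easy to follow; it is not the data plotted on the slide, and `pearson_r` is a hypothetical helper name.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Correlation as the average product of standardized values (n - 1 divisor)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical speeds (mph) and a made-up curved mileage pattern.
speeds = [20, 30, 40, 50, 60]
mpg = [30 - 0.02 * (s - 40) ** 2 for s in speeds]

r = pearson_r(speeds, mpg)
print(mpg)
print(f"r = {r:.3f}")
```

Here mileage is completely determined by speed, yet r is zero, because the rising half and the falling half of the curve cancel. The correlation coefficient measures only straight-line association.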
Price of Books versus Size
• Relationship between price of books and the number of pages?
• Positive?
• Look at paperbacks:
• Look at hardcovers:
• All books together:
• Overall correlation is negative!
[Plot: price in dollars (vertical axis, 0 to 140) versus number of pages (horizontal axis, 0 to 400).]
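The reversal on this slide can be illustrated with invented numbers. The page counts and prices below are hypothetical, chosen so that hardcovers are short but expensive and paperbacks are long but cheap; they are not the books plotted on the slide, and `pearson_r` is a hypothetical helper name.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Correlation as the average product of standardized values (n - 1 divisor)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical (pages, price in dollars) data for two kinds of books.
hard_pages, hard_price = [50, 100, 150], [80, 90, 100]
paper_pages, paper_price = [250, 300, 350], [10, 15, 20]

r_hard = pearson_r(hard_pages, hard_price)
r_paper = pearson_r(paper_pages, paper_price)
r_all = pearson_r(hard_pages + paper_pages, hard_price + paper_price)

print(f"hardcovers: {r_hard:.3f}, paperbacks: {r_paper:.3f}, all: {r_all:.3f}")
```

Within each group price rises with page count, but combining the groups produces a negative correlation, because the type of book acts as a third variable.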
Key Concepts
• Statistical vs. Deterministic Relationships
• Statistically Significant Relationship
• Strength of Linear Relationship
• Direction of Linear Relationship
• Correlation Coefficient
• Problems with Correlations