Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data.
Post on 14-Jan-2016
Bivariate Data
AS 90425 (3 credits)
Complete a statistical investigation involving bi-variate data
Scatter Plots
The scatter plot is the basic tool used to investigate relationships between two quantitative variables.
Types of Variables
Quantitative (measurements/counts)
Qualitative (groups)
What do I see in these scatter plots?
There appears to be a linear trend.
There appears to be moderate constant scatter about the trend line.
Negative Association.
No outliers or groupings visible.
[Figure: Mean January Air Temperatures for 30 New Zealand Locations. x-axis: Latitude (°S); y-axis: Temperature (°C)]
What do I see in these scatter plots?
There appears to be a non-linear trend.
There appears to be non-constant scatter about the trend line.
Positive Association. One possible outlier
(Large GDP, low % Internet Users).
[Figure: % of population who are Internet Users vs GDP per capita for 202 Countries. x-axis: GDP per capita (thousands of dollars); y-axis: Internet Users (%)]
What do I see in these scatter plots?
Two non-linear trends (Male and Female).
Very little scatter about the trend lines
Negative association until about 1970, then a positive association.
Gap in the data collection (Second World War).
[Figure: Average Age New Zealanders are First Married. x-axis: Year; y-axis: Age]
What do I look for in scatter plots?
Trend Do you see
a linear trend… straight line
OR
a non-linear trend?
What do I look for in scatter plots?
Trend Do you see
a positive association… as one variable gets bigger, so does the other
OR a negative association?
as one variable gets bigger, the other gets smaller
What do I look for in scatter plots?
Scatter Do you see
a strong relationship… little scatter
OR
a weak relationship? lots of scatter
What do I look for in scatter plots?
Scatter Do you see
constant scatter… roughly the same amount of scatter as you look across the plot
or non-constant scatter?
the scatter looks like a “fan” or “funnel”
What do I look for in scatter plots?
Anything unusual Do you see
any outliers? unusually far from the trend
any groupings?
Rank these relationships from weakest (1) to strongest (4):
Answers: 2, 1, 4, 3
How did you make your decisions? The less scatter there is about the
trend line, the stronger the relationship is.
Describing scatterplots
Trend – linear or non-linear
Trend – positive or negative
Scatter – strong or weak
Scatter – constant or non-constant
Anything unusual – outliers or groupings
Correlation
Correlation measures the strength of the linear association between two quantitative variables.
Get the correlation coefficient (r) from your calculator or computer.
r has a value between -1 and +1.
Correlation has no units.
Calculating the coefficient of correlation (r)
The coefficient of correlation (r) is given by the following formula:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$
Fortunately you will have Excel do all the work for you!
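If you want to see what the calculator or spreadsheet is doing, the formula can be followed step by step. The sketch below is only an illustration: the function name `correlation` and the five data points are made up for this example and do not come from the slides.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson's r, computed term by term from the formula for r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_bar = sum(xs) / len(xs)          # mean of x
    y_bar = sum(ys) / len(ys)          # mean of y
    # Numerator: sum of (x - x_bar)(y - y_bar)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator parts: sums of squared deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A variable that never changes makes the denominator zero
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Made-up illustration data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))
```

Points that lie exactly on a rising straight line give r = +1, and points exactly on a falling straight line give r = -1.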
What can go wrong?
Use correlation only if you have two quantitative variables (variables that can be measured)
There is an association between gender and weight, but there isn’t a correlation between gender and weight! Gender is not quantitative.
Use correlation only if the relationship is linear
Beware of outliers!
Deceptive situations
r is a measure of how one variable varies in a linear relation to the other
An obvious pattern does not always indicate a high value of the coefficient of correlation (r)
A horizontal or vertical trend indicates that there is no relationship, and so r is (close to) zero
Always plot the data before looking at the correlation!
r = 0: No linear relationship, but there is a relationship!
r = 0.9: No linear relationship, but there is a relationship!
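Both deceptive situations can be reproduced with a perfect curve. This is a rough sketch with invented numbers (the helper `pearson_r` and both data sets are assumptions made for illustration): the same curve y = x² gives r = 0 when x is spread evenly either side of zero, and a high r when only the rising half is sampled.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Case 1: a perfect U-shaped curve, symmetric about x = 0
xs_u = [-3, -2, -1, 0, 1, 2, 3]
ys_u = [x ** 2 for x in xs_u]
r_u = pearson_r(xs_u, ys_u)

# Case 2: the rising half of the same curve only
xs_half = [1, 2, 3, 4, 5, 6, 7, 8]
ys_half = [x ** 2 for x in xs_half]
r_half = pearson_r(xs_half, ys_half)

print("U-shaped curve:", round(r_u, 2))
print("Rising curve:  ", round(r_half, 2))
```

In both cases y is completely determined by x, yet neither value of r says so. That is why the plot has to come first.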
For example: Rata's bean sprouts: r = 0
[Figure: x-axis: Minutes spent on measuring the plants; y-axis: Height of sprout (mm)]
For example: Jenny's bean sprouts: r = 0
[Figure: x-axis: Soil depth (cm); y-axis: Height of sprout (mm)]
Tick the plots where it would be OK to use a correlation coefficient to describe the strength of the relationship:
[Figure: Distances of Planets from the Sun. x-axis: Position Number; y-axis: Distance (million miles)]
[Figure: Reaction Times (seconds) for 30 Year 10 Students. x-axis: Non-dominant Hand; y-axis: Dominant Hand]
[Figure: Mean January Air Temperatures for 30 New Zealand Locations. x-axis: Latitude (°S); y-axis: Temperature (°C)]
[Figure: Average Weekly Income for Employed New Zealanders in 2001. x-axis: Female ($); y-axis: Male ($)]
Notes on the answers: one plot is not linear; in another, once two outliers are removed there is nothing happening (no relationship is left).
What do I see in this scatter plot?
Appears to be a linear trend, with a possible outlier (a tall person with a small foot size).
Appears to be constant scatter.
Positive association.
[Figure: Height and Foot Size for 30 Year 10 Students. x-axis: Foot size (cm); y-axis: Height (cm)]
What will happen to the correlation coefficient if the tallest Year 10 student is removed?
It will get smaller
It won’t change
It will get bigger
Answer: It will get bigger.
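The effect of an outlier that sits away from the trend can be checked numerically. The sketch below uses invented foot sizes and heights, not the Year 10 data from the plot, and the helper `pearson_r` is a name chosen for this example. The last point is a very tall student with a small foot size.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: foot size (cm) and height (cm); the last point is the outlier
foot = [22, 23, 24, 25, 26, 27, 28, 23]
height = [155, 160, 163, 168, 172, 176, 181, 195]

r_with = pearson_r(foot, height)
# Drop the last point (the tallest student) and recompute
r_without = pearson_r(foot[:-1], height[:-1])

print("with the tallest student:   ", round(r_with, 2))
print("without the tallest student:", round(r_without, 2))
```

With these numbers r rises once the tallest student is taken out, because that point lies far from the trend and adds scatter.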
What do I see in this scatter plot?
Appears to be a strong linear trend.
Outlier in X (the elephant).
Appears to be constant scatter.
Positive association.
[Figure: Life Expectancies and Gestation Period for a sample of non-human Mammals. x-axis: Gestation (Days); y-axis: Life Expectancy (Years); the elephant is labelled]
What will happen to the correlation
coefficient if the elephant is removed?
It will get smaller
It won’t change
It will get bigger
Answer: It will get smaller.
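An outlier in X that lies along the trend works the other way: it stretches the pattern out and pushes r up. The sketch below uses invented gestation periods and life expectancies, not the sample in the plot, and `pearson_r` is a helper name chosen for this example. The last point plays the part of the elephant.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: gestation (days) and life expectancy (years)
# The last point stands in for the elephant: far out in X, but on the trend
gestation = [30, 60, 90, 110, 150, 200, 240, 280, 645]
life = [8, 5, 12, 7, 15, 10, 20, 12, 40]

r_with = pearson_r(gestation, life)
# Drop the last point (the "elephant") and recompute
r_without = pearson_r(gestation[:-1], life[:-1])

print("with the elephant:   ", round(r_with, 2))
print("without the elephant:", round(r_without, 2))
```

With these numbers r drops once the elephant is removed, because the remaining points show more scatter relative to their much shorter range of X values.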