Linear Correlation To accompany Hawkes lesson 12.1 Original content by D.R.S.
Linear Correlation
• Input: A bunch of data points
  – Take two measurements from each member in your sample.
  – Example: Weight and Blood Pressure
• Output: “There is / is not a significant linear relationship between x and y.”
Visual Assessment of Correlation
• A Scatter Plot of the (x,y) ordered pairs in your sample data can give you a notion of what the relationship might be.
• Do the points line up in a straight line?
  – Or in sort-of a straight-ish line?
  – Or all over the place with no apparent relationship between x and y?
  – Or in a curvy curve pattern?
Types of Relationships:
HAWKES LEARNING SYSTEMS
math courseware specialists
Regression, Inference, and Model Building
12.1 Scatter Plots and Correlation
[Figure: four example scatter plots, labeled Strong Linear Relationship, Weak Linear Relationship, Non-Linear Relationship, and No Relationship]
The Horses Example
• Some horses were measured
  – Height (in hands?), Girth (inches), Length (inches), Weight (pounds)
  – Put these data values into your TI-84 lists L1, L2, L3, L4.
• Original data source and idea for this problem is “Elementary Statistics” by Johnson & Kuby, 10th Edition, © Brooks-Cole-Thomson, Page 702.
Question: “Is Girth related to Weight?”
• We wonder: is the girth of a horse related to its weight? Significantly so?
• ρ (Greek letter rho) is the population parameter for the Correlation Coefficient
• r (our alphabet’s letter r) is the sample statistic for the Correlation Coefficient
• We use our sample r to estimate the population’s parameter ρ
The Correlation Coefficient
• If r = 0, it means there is absolutely no linear relationship between Girth (x) and Weight (y).
• If r = 1, it means there is perfect positive correlation between girth and weight.
• If r = –1, it means there is perfect negative correlation between girth and weight.
• There’s an awful formula to compute r.
• Remember: sample r estimates population ρ.
• Pearson Correlation Coefficient, ρ – the parameter that measures the strength of a linear relationship for the population.
• Correlation Coefficient, r – measures how strongly one variable is linearly dependent upon the other for a sample.
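To make the three landmark values of r concrete, here is a minimal Python sketch. The function name pearson_r and the little data lists are made up for illustration (they are NOT the horse data), and the code uses the "sum of deviations" definition of r, which may be arranged differently from the formula in the book.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired measurements."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements, NOT the horse data from the textbook.
x = [1, 2, 3, 4, 5]
perfect_up = [2 * v + 10 for v in x]    # points exactly on a rising line
perfect_down = [50 - 3 * v for v in x]  # points exactly on a falling line
u_shape = [(v - 3) ** 2 for v in x]     # symmetric curve, no linear trend

print(round(pearson_r(x, perfect_up), 3))    # 1.0
print(round(pearson_r(x, perfect_down), 3))  # -1.0
print(round(pearson_r(x, u_shape), 3))       # 0.0
```

The last case is worth a look: the points follow a clear curve, yet r = 0, because r only measures the linear part of a relationship.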
Correlation coefficient:
When calculating the correlation coefficient, round your answers to three decimal places.
• –1 ≤ r ≤ 1
• Close to –1 means a strong negative correlation.
• Close to 0 means no correlation.
• Close to 1 means a strong positive correlation.
Hypothesis Test for significant ρ
• Null Hypothesis: “No relationship”
• Alternative: “But there IS a significant relationship!”
• There’s some level of significance α specified in advance, like α = 0.05 or α = 0.01
• It involves calculating a t value and finding “what is the p-value of this t?”
• And if p-value < α, reject the null hypothesis
  – If so, then we say “Yes, significant relationship!”
Hypothesis Test for significant ρ
• Usually we do this two-tailed test:
  – Null Hypothesis H0: ρ = 0, “No relationship”
  – Alternative Hypothesis Ha: ρ ≠ 0, “There is a significant linear relationship.”
• Be aware of a couple one-tailed variations:
  – Test for significant POSITIVE correlation only: using H0: ρ ≤ 0 and Ha: ρ > 0
  – Test for significant NEGATIVE correlation only: using H0: ρ ≥ 0 and Ha: ρ < 0
“Is a horse’s Girth significantly correlated to its Weight?”
• Here’s how we do the Hypothesis Test for ρ.
• Let’s suppose a level of significance α that requires strong evidence.
• STAT, TESTS, F:LinRegTTest
  – Shortcut instead of scrolling: ALPHA F directly.
  – But it might be option E on TI-83/Plus.
LinRegTTest inputs
• Here are the inputs:
• Xlist and Ylist – where you put the data
  – Shortcut: 2ND 2 puts L2
• Freq: 1 (unless…)
• β & ρ: ≠ 0
  – This is the Alternative Hypothesis
• RegEQ: VARS, right arrow to Y-VARS, 1, 1 (this selects Y1)
  – Just put it in for later
• Highlight “Calculate”
• Press ENTER
LinRegTTest Outputs, first screen
• y = a + bx and β ≠ 0 and ρ ≠ 0 – the model and the Alternative Hypothesis, repeated back to you
• t= the t statistic value for this test (the formula is in the book)
• p = the p-value for this t test statistic
• df = n – 2 in this kind of a test
• a later – for regression
LinRegTTest Outputs, second screen
• b later, for Regression
• s much later, for advanced Regression
• r² = how much of the variation in the output variable (weight) is explained by the input variable (girth)
• r = the correlation coefficient for the sample
  – Close to 1 – strong positive relationship
  – Or close to –1 – strong negative relationship
Making the Decision
• We will use the p-value method.
• Compare the p-value (as calculated by the TI-84) to the Level Of Significance value α for this experiment.
• In this example, α was chosen during the design of the experiment, and the calculator computed p = 5.3448432E-5, that is, p ≈ 0.0000534.
• Since p < α, reject the null hypothesis. There IS a significant linear relationship between girth and weight.
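A rough Python sketch of the p-value method, for anyone without the calculator at hand. Assumptions: the function names are made up for illustration; the area under the t distribution is found by plain numerical integration (Simpson's rule), which is not necessarily how the TI-84 does it; and the inputs r = 0.52, n = 19 are borrowed from the Hawkes-style exercise in these notes, because the horse data itself is not listed here.

```python
import math

def t_statistic(r, n):
    """t value for testing rho = 0 (undefined when r is exactly 1 or -1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # math.gamma overflows for df in the hundreds; fine for classroom-size samples.
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: the area beyond -|t| and +|t|, by Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    middle_area = 2 * (total * h / 3)  # area between -|t| and +|t|
    return max(0.0, 1 - middle_area)

def decide(r, n, alpha):
    """Return (reject_null, p_value) for the two-tailed test of rho = 0."""
    p = two_tailed_p(t_statistic(r, n), n - 2)
    return p < alpha, p

reject_05, p_value = decide(0.52, 19, 0.05)
reject_01, _ = decide(0.52, 19, 0.01)
print("p-value:", round(p_value, 4))
print("significant at 0.05?", reject_05)  # True
print("significant at 0.01?", reject_01)  # False
```

The same r can be significant at one α and not at another, which is why α has to be fixed before looking at the data.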
How did the calculator get r and r2?
• Here is the awful formula:
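A standard computational form of the sample correlation coefficient is sketched below. It is assumed, not confirmed, to match the arrangement printed in the book; every sum runs over the n data pairs (x, y).

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\;
          \sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}
```

Once r is known, r² is simply r multiplied by itself.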
How did the calculator compute t ?
• Here is the awful formula:
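The usual t statistic for testing whether ρ = 0 is sketched below. The layout is an assumption (the book may arrange it differently), but it uses only the sample r and the sample size n.

```latex
t = r\,\sqrt{\frac{n-2}{1-r^{2}}}
  = \frac{r}{\sqrt{\dfrac{1-r^{2}}{n-2}}},
\qquad \text{df} = n - 2
```

For instance, with r = 0.52 and n = 19 (values taken from the Hawkes-style exercise in these notes), this gives t ≈ 2.51 with df = 17.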
Another test: Girth and Length
• Is there a significant relationship between a horse’s girth and length?
• What do you expect?– Think about people: do you expect a significant
relationship between waist size and height?
TI-84 Inputs and Outputs for the Girth and Length question
Inputs
• (Data already in lists)
Outputs
• First screen
• Second screen
Girth and Length conclusions
Conclusions
• What do these outputs tell you in this particular case?
• At the chosen level of significance α, is there a significant relationship between a horse’s girth and its length?
An extra problem type in Hawkes
• They tell you three pieces of information:
  – The Level of Significance chosen, α
  – The correlation coefficient calculated, r
  – The sample size, n
• They ask you “Is this significant?”
• Use Table I on Page 777 to determine this. Look up the critical value in the α column and the n row.
• If |r| > the critical value from the table, then Yes, significant.
Determine the significance:
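Determine whether the following values of r are statistically significant.
a. r = 0.52, n = 19, α = 0.05
Critical value = 0.456. Yes, since |0.52| > 0.456.
b. r = 0.52, n = 19, α = 0.01
Critical value = 0.575. No, since |0.52| < 0.575.
c. r = –0.44, n = 35, α = 0.01
Critical value = 0.430. Yes, since |–0.44| = 0.44 > 0.430.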
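The table lookup boils down to one comparison, sketched here in Python. The dictionary holds only the three critical values quoted in the answers above, keyed by (n, α); it is an excerpt for illustration, not the full Table I, and the names CRITICAL_R and table_says_significant are made up.

```python
# Critical values copied from the three worked answers above (Table I lookups).
# Keys are (n, alpha). Only a three-entry excerpt, not the full table.
CRITICAL_R = {
    (19, 0.05): 0.456,
    (19, 0.01): 0.575,
    (35, 0.01): 0.430,
}

def table_says_significant(r, n, alpha):
    """Yes/No decision: is |r| larger than the table's critical value?"""
    return abs(r) > CRITICAL_R[(n, alpha)]

problems = [("a", 0.52, 19, 0.05), ("b", 0.52, 19, 0.01), ("c", -0.44, 35, 0.01)]
for label, r, n, alpha in problems:
    answer = "Yes" if table_says_significant(r, n, alpha) else "No"
    print(label, answer)
# prints: a Yes / b No / c Yes (one per line)
```

Note the absolute value: a negative r is compared by its size, so r = –0.44 counts as 0.44.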