Stor 155, Section 2, Last Time
• 2-way Tables
– Sliced populations in 2 different ways
– Look for independence of factors
– Chi Square Hypothesis test
• Simpson’s Paradox
– Aggregating can give opposite impression
• Inference for Regression
– Sampling Distributions
– TDIST & TINV
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 634-667 & Review
Approximate Reading for Next Class:
Pages 634-667 & Review
Inference for Regression
Chapter 10
Recall:
• Scatterplots
• Fitting Lines to Data
Now study statistical inference associated with fit lines
E.g. When is slope statistically significant?
Recall Scatterplot
For data (x,y)
View by plot:
(1,2)
(3,1)
(-1,0)
(2,-1)
[Figure: “Toy Scatterplot, Separate Points” – the four points above plotted with x on the horizontal axis and y on the vertical axis]
Recall Linear Regression
Idea:
Fit a line to data in a scatterplot
• To learn about “basic structure”
• To “model data”
• To provide “prediction of new values”
Recall Linear Regression
Given a line y = b x + a, “indexed” by slope b and intercept a

Define “residuals” = “data Y” – “Y on line”
                   = y_i – (b x_i + a),  at each data point (x_i, y_i)

Now choose a & b to make these “small”
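As a small illustration (not from the slides), these residuals can be computed directly; the line y = 0.5 x used here is an arbitrary candidate chosen for this sketch, not the least squares fit:

```python
# Toy data from the scatterplot slide
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]

# A candidate line y = b*x + a (values chosen only for illustration)
b, a = 0.5, 0.0

# Residual at each point: observed y minus the y on the line
residuals = [y - (b * x + a) for x, y in points]
print(residuals)
```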
Recall Linear Regression
Make Residuals > 0, by squaring

Least Squares: adjust a & b to

Minimize the “Sum of Squared Errors”

SSE = Σ_{i=1}^{n} ( y_i – (b x_i + a) )²
Least Squares in Excel
Computation:
1. INTERCEPT (computes y-intercept a)
2. SLOPE (computes slope b)
Revisit Class Example 14:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg14.xls
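The numbers Excel’s SLOPE and INTERCEPT return can be sketched in plain Python with the textbook least squares formulas b = Sxy / Sxx and a = ȳ – b x̄ (a sketch, using the toy points from the scatterplot slide):

```python
# Least squares slope b and intercept a, mirroring Excel's SLOPE / INTERCEPT
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]  # toy data from earlier slides

n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# b = Sxy / Sxx, then a = y_bar - b * x_bar
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
s_xx = sum((x - x_bar) ** 2 for x, _ in points)
b = s_xy / s_xx
a = y_bar - b * x_bar
print(b, a)
```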
Inference for Regression
Formula for SD of b̂:

SD(b̂) = σ_e / √( Σ_{i=1}^{n} (x_i – x̄)² )

• Big (small) for big (small, resp.) σ_e
  – Accurate data → Accurate est. of slope
• Small for x’s more spread out
  – Data more spread → More accurate
• Small for more data
  – More data → More accuracy
Inference for Regression
Formula for SD of â:

SD(â) = σ_e √( 1/n + x̄² / Σ_{i=1}^{n} (x_i – x̄)² )

• Big (small) for big (small, resp.) σ_e
  – Accurate data → Accurate est. of intercept
• Smaller for x̄ near 0
  – Centered data → More accurate intercept
• Smaller for more data
  – More data → More accuracy
Inference for Regression
One more detail:

Need to estimate σ_e using data

For this use:

s_e = √( (1/(n – 2)) Σ_{i=1}^{n} ( y_i – (â + b̂ x_i) )² )

• Similar to earlier sd estimate, s
• Except variation is about the fit line
• n – 2 is similar to n – 1 from before
Inference for Regression
Now for Probability Distributions,
Since we are estimating σ_e by s_e

Use TDIST and TINV

With degrees of freedom = n – 2
Inference for Regression
Convenient Packaged Analysis in Excel:

Tools → Data Analysis → Regression
Illustrate application using:
Class Example 32,
Old Text Problem 10.12
Inference for Regression
Class Example 32, Old Text Problem 10.12

Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:
Inference for Regression
Data for October through June are:
Inference for Regression
Class Example 33, (10.23 – 10.25)

Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are:
Inference for Regression
Class Example 33,
(10.23 – 10.25)
The data are:
Year Lean
75 642
76 644
77 656
78 667
79 673
80 688
81 696
82 698
83 713
84 717
85 725
86 742
87 757
Inference for Regression
Class Example 33, (10.23 – 10.25):
(a) Plot the data. Does the trend in lean over time appear to be linear?
(b) What is the equation of the least squares fit line?
(c) Give a 95% confidence interval for the average rate of change of the lean.
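Parts (b) and (c) can be sketched in Python instead of Excel. Years are kept in the coded 75–87 form from the table, and the t critical value 2.201 for df = n – 2 = 11 is a standard t-table value (what TINV(0.05, 11) returns):

```python
import math

# Leaning Tower of Pisa data from the slide: (coded year, lean)
data = [(75, 642), (76, 644), (77, 656), (78, 667), (79, 673),
        (80, 688), (81, 696), (82, 698), (83, 713), (84, 717),
        (85, 725), (86, 742), (87, 757)]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n
s_xx = sum((x - x_bar) ** 2 for x, _ in data)

# (b) Least squares fit: lean = a + b * year
b = sum((x - x_bar) * (y - y_bar) for x, y in data) / s_xx
a = y_bar - b * x_bar

# (c) 95% CI for the slope: b +/- t* * SE(b), with df = n - 2
sse = sum((y - (a + b * x)) ** 2 for x, y in data)
s_e = math.sqrt(sse / (n - 2))
se_b = s_e / math.sqrt(s_xx)
t_star = 2.201  # t table value for df = 11, i.e. TINV(0.05, 11)
ci = (b - t_star * se_b, b + t_star * se_b)
print(round(b, 4), round(a, 2), ci)
```

The slope comes out near 9.32 tenths of a millimeter per year, so the CI for the average rate of change excludes zero by a wide margin.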