Stat 31, Section 1, Last Time
• 2-way tables
  – Testing for Independence
  – Chi-Square distance between data & model
  – Chi-Square Distribution
  – Gives P-values (CHIDIST)
• Simpson’s Paradox:
  – Lurking variables can reverse comparisons
• Recall Linear Regression
  – Fit a line to a scatterplot
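As a reminder of last time's chi-square test, here is a minimal Python sketch for a 2 × 2 table. It assumes df = 1. In that case the upper-tail chi-square probability can be written with `math.erfc`, so the P-value should agree with Excel's CHIDIST(x, 1). The function name `chi_square_2x2` and the counts in the usage line are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Chi-square test of independence for a 2 x 2 table of counts.

    Returns (chi-square distance, degrees of freedom, P-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if len(row_totals) != 2 or len(col_totals) != 2:
        raise ValueError("this sketch only handles 2 x 2 tables")
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the independence model
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (2 - 1) * (2 - 1)
    # For df = 1, P(chi-square > x) = erfc(sqrt(x / 2)), the tail area CHIDIST(x, 1) gives
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, df, p_value


# Hypothetical counts, for illustration only
chi2, df, p = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi-square = {chi2:.3f}, df = {df}, P-value = {p:.4f}")
```

A table whose rows are exactly proportional (for example [[10, 20], [20, 40]]) matches the independence model perfectly. It gives a chi-square distance of 0 and a P-value of 1.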
Recall Linear Regression
Idea:
Fit a line to data in a scatterplot
Recall Class Example 14: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg14.xls
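For reference, the least squares line can be computed directly from the usual formulas. Below is a minimal Python sketch. It writes the line as y = a x + b, matching the slope $\hat{a}$ and intercept $\hat{b}$ used in these slides. The function name is hypothetical, and the results should agree with Excel's SLOPE and INTERCEPT worksheet functions.

```python
def least_squares_fit(xs, ys):
    """Return (a_hat, b_hat) for the line y = a*x + b that minimizes
    the sum of squared vertical distances from the points to the line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx               # slope
    b_hat = y_bar - a_hat * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return a_hat, b_hat


# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1
print(least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7]))   # prints (2.0, 1.0)
```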
$$SD(\hat{a}) = \frac{\sigma_e}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
• Big (small) for $\sigma_e$ big (small, resp.)
  – Accurate data ⇒ accurate estimate of slope
• Small for $x_i$'s more spread out
  – Data more spread ⇒ more accurate
• Small for more data
  – More data ⇒ more accuracy
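The three bullets can be checked numerically. Below is a minimal Python sketch of the formula for $SD(\hat{a})$. It treats $\sigma_e$ as known, and the x values and $\sigma_e$ settings are hypothetical.

```python
import math


def sd_slope(xs, sigma_e):
    """SD of the slope estimate a_hat: sigma_e / sqrt(sum (x_i - x_bar)^2)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma_e / math.sqrt(sxx)


xs = [1, 2, 3, 4, 5]                                  # hypothetical design points
base = sd_slope(xs, sigma_e=1.0)
spread = sd_slope([2 * x for x in xs], sigma_e=1.0)   # x's twice as spread out
more = sd_slope(xs * 4, sigma_e=1.0)                  # list repeated: four times the data
noisy = sd_slope(xs, sigma_e=2.0)                     # less accurate data
print(base, spread, more, noisy)
```

Doubling the spread halves the SD, and so does collecting four times as much data. Doubling $\sigma_e$ doubles it.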
Inference for Regression
Formula for SD of $\hat{b}$:
$$SD(\hat{b}) = \sigma_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
• Big (small) for $\sigma_e$ big (small, resp.)
  – Accurate data ⇒ accurate estimate of intercept
• Smaller for $\bar{x}$ closer to 0
  – Centered data ⇒ more accurate intercept
• Smaller for more data
  – More data ⇒ more accuracy
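The effect of centering can be seen by evaluating $SD(\hat{b})$ before and after shifting the x values so that $\bar{x} = 0$. Below is a minimal Python sketch. It assumes $\sigma_e = 1$ (hypothetical) and borrows the years 75 to 87 from the Class Example 28 data as x values.

```python
import math


def sd_intercept(xs, sigma_e):
    """SD of the intercept estimate b_hat: sigma_e * sqrt(1/n + x_bar^2 / Sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma_e * math.sqrt(1 / n + x_bar ** 2 / sxx)


years = list(range(75, 88))          # x values far from 0 (x_bar = 81)
centered = [x - 81 for x in years]   # same spread, but x_bar = 0

print(sd_intercept(years, sigma_e=1.0))     # intercept is an extrapolation to x = 0
print(sd_intercept(centered, sigma_e=1.0))  # reduces to sigma_e / sqrt(n)
```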
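Inference for Regression
One more detail:
Need to estimate $\sigma_e$ using data
For this use:
$$s_e = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{a}x_i - \hat{b}\right)^2}$$
• Similar to earlier sd estimate,
• Except variation is about fit line
• $n - 2$ is similar to $n - 1$ from before (two fitted parameters, $\hat{a}$ and $\hat{b}$, instead of one)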
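Below is a minimal Python sketch of $s_e$, using hypothetical data. It differs from the earlier sd in two ways: it uses the divisor $n - 2$ (rather than $n - 1$), and it measures residuals about the fitted line. For the same data, Excel's STEYX worksheet function should give the same value.

```python
import math


def residual_sd(xs, ys):
    """s_e: square root of the sum of squared residuals about the LS line, divided by n - 2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b_hat = y_bar - a_hat * x_bar
    # Residuals measure variation about the fit line, not about y_bar
    sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


# Hypothetical data: fit line is y = 2.2x + 0.7, residuals 0.3, 0.1, -1.1, 0.7
print(residual_sd([0, 1, 2, 3], [1, 3, 4, 8]))   # sqrt(1.8 / 2) = sqrt(0.9)
```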
Inference for Regression
Now for Probability Distributions,
Since we are estimating $\sigma_e$ by $s_e$,
Use TDIST and TINV
With degrees of freedom = $n - 2$
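Outside Excel, the same t-distribution calculations can be done numerically. Below is a minimal stdlib-only Python sketch. It assumes Excel's conventions: TDIST(x, df, 2) is the two-tailed area beyond x, and TINV(p, df) is the two-tailed critical value. The function names are hypothetical. Simpson's rule plus bisection is just one simple approach; a statistics library would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def tdist_two_tailed(x, df, steps=2000):
    """P(|T| > x) for x >= 0, like Excel's TDIST(x, df, 2)."""
    h = x / steps
    # Composite Simpson's rule (steps must be even) for the area from 0 to x
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 1 - 2 * area


def tinv(p, df):
    """Two-tailed critical value t* with P(|T| > t*) = p, like Excel's TINV(p, df)."""
    lo, hi = 0.0, 1.0
    while tdist_two_tailed(hi, df) > p:   # grow the bracket until it contains t*
        hi *= 2
    for _ in range(50):                   # bisection: the tail area decreases in x
        mid = (lo + hi) / 2
        if tdist_two_tailed(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 13                           # e.g. the 13 years in Class Example 28
t_star = tinv(1 - 0.95, n - 2)
print(round(t_star, 3))          # about 2.201 for 11 degrees of freedom
```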
Inference for Regression
Convenient Packaged Analysis in Excel:
Tools → Data Analysis → Regression
Illustrate application using:
Class Example 27,
Old Text Problem 8.6 (now 10.12)
Inference for Regression
Class Example 27,
Old Text Problem 8.6 (now 10.12)
Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:
Inference for Regression
Data for October through June are:
Inference for Regression
Class Example 28, (now 10.13 – 10.15)
Old 10.8:
Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point's position if the tower were straight and its actual position. It is recorded in tenths of a millimeter in excess of 2.9 meters (so 642 means 2.9642 m). The data are:
Inference for Regression
Class Example 28,
(now 10.13 – 10.15)
Old 10.8:
The data are:
Year Lean
75 642
76 644
77 656
78 667
79 673
80 688
81 696
82 698
83 713
84 717
85 725
86 742
87 757
Inference for Regression
Class Example 28, (now 10.13 – 10.15)
Old 10.8:
(a) Plot the data. Does the trend in lean over time appear to be linear?
(b) What is the equation of the least squares fit line?
(c) Give a 95% confidence interval for the average rate of change of the lean.
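Parts (b) and (c) can be checked with a short Python sketch on the data above; part (a), the plot, is not reproduced. The sketch follows the formulas for $\hat{a}$, $\hat{b}$, $s_e$ and $SD(\hat{a})$. The value t* = 2.201 is TINV(0.05, 11) taken from a t table and hardcoded here rather than computed. The results should be comparable to the Coefficients, Standard Error, Lower 95% and Upper 95% entries of Excel's Regression output.

```python
import math

# Leaning Tower of Pisa data from Class Example 28 (year - 1900, coded lean)
years = list(range(75, 88))
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(lean) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, lean))

a_hat = sxy / sxx                 # slope: change in coded lean per year
b_hat = y_bar - a_hat * x_bar     # intercept (coded lean extrapolated to 1900)

resid = [y - (a_hat * x + b_hat) for x, y in zip(years, lean)]
s_e = math.sqrt(sum(r * r for r in resid) / (n - 2))
se_slope = s_e / math.sqrt(sxx)   # estimated SD of a_hat

t_star = 2.201                    # TINV(0.05, 11) from a t table, df = n - 2 = 11
ci = (a_hat - t_star * se_slope, a_hat + t_star * se_slope)
print(f"lean = {b_hat:.2f} + {a_hat:.4f} * year")
print(f"95% CI for slope: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The fitted line is about lean = -61.12 + 9.3187 × year. The 95% interval for the yearly rate of change runs from about 8.64 to 10.00 coded units (tenths of a millimeter) per year.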
define: etymology
• The history of words; the study of the history of words. (csmp.ucop.edu/crlp/resources/glossary.html)
• The history of a word shown by tracing its development from another language. (www.animalinfo.org/glosse.htm)
And Now for Something Completely Different
What is “etymology”?
• Etymology is derived from the Greek word ἔτυμον (etymon) meaning "a sense" and λόγος (logos) meaning "word." Etymology is the study of the original meaning and development of a word tracing its meaning back as far as possible. (www.two-age.org/glossary.htm)
And Now for Something Completely Different
Google response to: define: and now for something completely different
• And Now For Something Completely Different is a film spinoff from the television comedy series Monty Python's Flying Circus. The title originated as a catchphrase in the TV show. Many Python fans feel that it excellently describes the nonsensical, non sequitur feel of the program. (en.wikipedia.org/wiki/And_Now_For_Something_Completely_Different)
And Now for Something Completely Different
Google Search for:
“And now for something completely different”
Gives more than 100 results…
A perhaps interesting one:
http://www.mwscomp.com/mpfc/mpfc.html
And Now for Something Completely Different
Google Search for:
“Stat 31 and now for something completely different”
Gives:
[PPT] Slide 1
File Format: Microsoft Powerpoint 97 - View as HTML
... But what is missing? And now for something completely different… Review Ideas on State Lotteries, from our study of Expected Value ...
https://www.unc.edu/~marron/UNCstat31-2005/Stat31-05-03-31.ppt
Prediction in Regression
Idea: Given data $(X_1, Y_1), \ldots, (X_n, Y_n)$,
can find the Least Squares Fit Line, and do inference for the parameters.
Given a new X value, say $X_0$, what will the new Y value be?
Prediction in Regression
Dealing with variation in prediction:
Under the model:
$$Y_i = aX_i + b + e_i$$
A sensible guess about $Y_0$, based on the given $X_0$, is:
$$\hat{Y}_0 = \hat{a}X_0 + \hat{b}$$
(point on the fit line above $X_0$)
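Below is a minimal Python sketch of the point prediction. It uses the least squares coefficients for the Class Example 28 data, rounded to $\hat{a} \approx 9.3187$ and $\hat{b} \approx -61.12$. The new year $X_0 = 88$ (1988), one year past the data, is a hypothetical choice.

```python
A_HAT = 9.3187    # slope: coded lean per year (Class Example 28 fit, rounded)
B_HAT = -61.12    # intercept (Class Example 28 fit, rounded)


def predict(x0, a_hat=A_HAT, b_hat=B_HAT):
    """Point on the fitted line above x0: Y0_hat = a_hat * x0 + b_hat."""
    return a_hat * x0 + b_hat


y0_hat = predict(88)          # hypothetical new year: 1988
print(f"{y0_hat:.2f}")        # 758.93
```

At $X_0 = \bar{x} = 81$ the prediction is (up to rounding) $\bar{y} \approx 693.69$, because the fit line passes through $(\bar{x}, \bar{y})$.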
Prediction in Regression
What about variation about this guess?
Natural Approach: present an interval
(as done with Confidence Intervals)
Careful: Two Notions of this:
1. Confidence Interval for mean of $Y_0$
2. Prediction Interval for value of $Y_0$
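Prediction in Regression
1. Confidence Interval for mean of $Y_0$:
Use:
$$\hat{Y}_0 \pm t^* \, SE_{\hat{Y}}$$
where:
$$SE_{\hat{Y}} = s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
and where
$$t^* = \mathrm{TINV}(1 - 0.95,\ n - 2)$$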
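Below is a minimal Python sketch of this interval for the Class Example 28 data. It uses rounded summary values from the least squares fit: n = 13, $\bar{x} = 81$, $\sum(x_i - \bar{x})^2 = 182$, $s_e \approx 4.181$, $\hat{a} \approx 9.3187$, $\hat{b} \approx -61.12$. It also uses t* = TINV(0.05, 11) ≈ 2.201. The new year $X_0 = 88$ is a hypothetical choice.

```python
import math

# Summary values for Class Example 28 (rounded where noted)
N, X_BAR, SXX = 13, 81.0, 182.0
S_E = 4.181                     # residual SD s_e (rounded)
A_HAT, B_HAT = 9.3187, -61.12   # fitted slope and intercept (rounded)
T_STAR = 2.201                  # TINV(0.05, 11), from a t table


def mean_ci(x0):
    """95% confidence interval for the mean of Y at x0."""
    y0_hat = A_HAT * x0 + B_HAT
    se_yhat = S_E * math.sqrt(1 / N + (x0 - X_BAR) ** 2 / SXX)
    return y0_hat - T_STAR * se_yhat, y0_hat + T_STAR * se_yhat


low, high = mean_ci(88)         # hypothetical new year: 1988
print(f"({low:.2f}, {high:.2f})")
```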
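Prediction in Regression
Interpretation of:
$$SE_{\hat{Y}} = s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
• Smaller for $x_0$ closer to $\bar{x}$
• But never 0 (at $x_0 = \bar{x}$ it equals $s_e/\sqrt{n}$)
• Smaller for $x_i$'s more spread out
• Larger for larger $s_e$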
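The bullets can be checked by evaluating $SE_{\hat{Y}}$ at several values of $x_0$. Below is a minimal Python sketch. It uses rounded Class Example 28 summary values: n = 13, $\bar{x} = 81$, $\sum(x_i - \bar{x})^2 = 182$, $s_e \approx 4.181$. The $x_0$ values are hypothetical.

```python
import math

N, X_BAR, SXX, S_E = 13, 81.0, 182.0, 4.181   # Class Example 28 summaries (s_e rounded)


def se_yhat(x0, s_e=S_E, sxx=SXX):
    """Standard error of the fitted mean at x0."""
    return s_e * math.sqrt(1 / N + (x0 - X_BAR) ** 2 / sxx)


for x0 in (81, 84, 87, 90):
    print(x0, round(se_yhat(x0), 3))   # grows as x0 moves away from x_bar

print(round(S_E / math.sqrt(N), 3))    # smallest value, at x0 = x_bar: never 0
```

Passing `sxx=4 * SXX` mimics x's twice as spread out, which lowers the SE. Passing a larger `s_e` raises it.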
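Prediction in Regression
2. Prediction Interval for value of $Y_0$
Use:
$$\hat{Y}_0 \pm t^* \, SE_{Y_0}$$
where:
$$SE_{Y_0} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
And again
$$t^* = \mathrm{TINV}(1 - 0.95,\ n - 2)$$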
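The minimal Python sketch below puts both intervals side by side for a hypothetical new year $X_0 = 88$ (1988). It uses rounded Class Example 28 summary values from the least squares fit: n = 13, $\bar{x} = 81$, $\sum(x_i - \bar{x})^2 = 182$, $s_e \approx 4.181$, $\hat{a} \approx 9.3187$, $\hat{b} \approx -61.12$, and t* ≈ 2.201.

```python
import math

N, X_BAR, SXX = 13, 81.0, 182.0
S_E = 4.181                     # residual SD s_e (rounded)
A_HAT, B_HAT = 9.3187, -61.12   # fitted slope and intercept (rounded)
T_STAR = 2.201                  # TINV(0.05, 11)


def intervals(x0):
    """Return (confidence interval for the mean, prediction interval) at x0."""
    y0_hat = A_HAT * x0 + B_HAT
    leverage = 1 / N + (x0 - X_BAR) ** 2 / SXX   # the terms shared by both SEs
    se_mean = S_E * math.sqrt(leverage)          # SE for the mean of Y0
    se_pred = S_E * math.sqrt(1 + leverage)      # SE for a single new Y0
    ci = (y0_hat - T_STAR * se_mean, y0_hat + T_STAR * se_mean)
    pi = (y0_hat - T_STAR * se_pred, y0_hat + T_STAR * se_pred)
    return ci, pi


ci, pi = intervals(88)          # hypothetical new year: 1988
print("CI for mean:", [round(v, 2) for v in ci])
print("PI for value:", [round(v, 2) for v in pi])
```

Both intervals share the same center $\hat{Y}_0$. The prediction interval is about twice as wide here, because the extra "1 +" term dominates the SE.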
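Prediction in Regression
Interpretation of:
$$SE_{Y_0} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
• Similar remarks to above …
• Additional “1 + ” accounts for added variation in $Y_0$ compared to $\hat{Y}_0$ (a new observation carries its own error $e_0$, with SD $\sigma_e$, on top of the uncertainty in the fit line)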
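One way to see what the “1 + ” adds: the squared prediction SE exceeds the squared mean SE by exactly $s_e^2$. So however much data is collected, the prediction SE never drops below $s_e$. Below is a minimal Python sketch with hypothetical, evenly spaced x values on [0, 1], $s_e = 1$, and $x_0 = 0.9$.

```python
import math


def standard_errors(n, x0, s_e=1.0):
    """SEs for the mean (CI) and for a new value (PI) at x0, with n evenly spaced x's on [0, 1]."""
    xs = [i / (n - 1) for i in range(n)]
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    return s_e * math.sqrt(leverage), s_e * math.sqrt(1 + leverage)


for n in (10, 100, 1000, 10000):
    se_mean, se_pred = standard_errors(n, x0=0.9)
    print(n, round(se_mean, 4), round(se_pred, 4))   # se_mean -> 0, se_pred -> s_e = 1
```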
Prediction in Regression
Revisit Class Example 28,
(now 10.13 – 10.15) Old 10.8:
Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point's position if the tower were straight and its actual position. It is recorded in tenths of a millimeter in excess of 2.9 meters. The data are listed above…
Prediction in Regression
Class Example 28, (now 10.13 – 10.15)
Old 10.9:
(a) Plot the data. Does the trend in lean over time appear to be linear?
(b) What is the equation of the least squares fit line?
(c) Give a 95% confidence interval for the average rate of change of the lean.