
Chapter 15

Inference for Regression

The Regression Model

• We can consider each coordinate pair as a part of a random sample

• For each x-coordinate, there are a number of possible y-coordinates that could have been recorded.

• Because each experiment involves randomness, we could get a different regression line every time the experiment is conducted

The regression model

• There must be some “true regression model” that we are approximating with our experiments: μy = α + βx

μy = “the mean of the response variable for a given value of x”

α = “the true value of the intercept”
β = “the true value of the slope”

• The std dev of y is the same for all values of x
• For a fixed value of x, the responses (y) vary according to a Normal distribution
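To see why each experiment gives a different regression line, here is a minimal Python sketch (not part of the original slides) that simulates repeated experiments from one fixed "true" model. The values α = 2, β = 0.5, σ = 1 and the x-values 1 to 10 are made-up assumptions. The helper names are also hypothetical.

```python
import random

def simulate_responses(alpha, beta, sigma, xs, rng):
    """One hypothetical experiment: y = alpha + beta*x plus Normal(0, sigma) scatter."""
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]

def least_squares_slope(xs, ys):
    """Slope b of the least-squares regression line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Assumed "true" model, made up for illustration
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0
xs = list(range(1, 11))
rng = random.Random(15)  # fixed seed so reruns print the same numbers

# Five repeated experiments at the same x-values give five different slopes b
slopes = [least_squares_slope(xs, simulate_responses(ALPHA, BETA, SIGMA, xs, rng))
          for _ in range(5)]
for b in slopes:
    print(f"b = {b:.3f}")
```

Each b is an estimate of the single true β. That sample-to-sample variation is what the inference procedures in this chapter account for.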

Confidence Interval for β

This is a PANIC procedure (Parameter, Assumptions, Name of the interval, Interval calculations, Conclusion). As always, some of these steps are the same for a significance test.

Parameter: We are constructing a confidence interval for the value of β.

“β is the value of the slope of the true regression line of (response var) on (explanatory var)”

Confidence Interval for β

Assumptions
(1) Observations are independent
(2) The true relationship is linear
- check scatterplot: “scatterplot appears linear”
- check residuals: “residuals do not show any pattern”

Confidence Interval for β

Assumptions (cont.)
(3) The std dev is the same everywhere
- check residuals: “residuals do not show an increasing/decreasing fan pattern”
(4) The response varies Normally about the true regression line
- check the histogram of the residuals: “histogram of residuals shows a single-peaked, symmetric distribution”
This distribution may be slightly skewed, but there should be NO OUTLIERS.

Confidence Interval for β

Name of the Interval: “We are constructing a (CL)% confidence interval for the value of β.”

Example: “We are constructing a 90% confidence interval for the value of β.”

Confidence Interval for β

Interval calculations: The calculations here get messy. Most of the time, we read standard errors from printouts or calculator output. There are actually 3 standard errors on a typical printout:

std error of ‘a’ = “SEa” (we will never use this)

std error of ‘b’ = “SEb”

std error of residuals = “s”

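Confidence Interval for β

Interval calculations (cont.) Calculation of standard errors (laborious calculations, indeed!):

$$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}}$$

$$SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$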
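The two formulas can be checked with a short Python sketch on a small made-up data set. The function name `regression_standard_errors` is hypothetical. The sketch fits the least-squares line, computes s from the residuals with n − 2 in the denominator, and then divides s by the square root of Σ(x − x̄)² to get SE_b.

```python
import math

def regression_standard_errors(xs, ys):
    """Return (a, b, s, se_b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # residual = observed y minus predicted y-hat
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    se_b = s / math.sqrt(sxx)
    return a, b, s, se_b

# Made-up data, small enough to check by hand
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 4]
a, b, s, se_b = regression_standard_errors(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.4f}, SEb = {se_b:.4f}")
# a = 2.000, b = 0.700, s = 1.0724, SEb = 0.4796
```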

PHANTOMS again

Interval Calculations (cont.) From a printout:

[Annotated printout not reproduced. Labeled entries: a, b, r², SEb, and s (SE of residuals); the remaining entries are IGNORED.]

Confidence Interval for β

Interval calculations: Using your calculator to find SEb will be shown later.

CI = b ± t* (SEb), where t* is the critical value from the t distribution with df = n − 2

Conclusion: “We are (CL)% confident that the value of the slope of the true regression line of (response) on (explanatory) is in the interval (CI).”
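Here is a minimal Python sketch of the interval formula. It assumes t* has already been read from a chart or calculator. `slope_confidence_interval` is a hypothetical helper, and the inputs are the numbers from the print-out example in these slides.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Confidence interval b ± t* * SE_b for the true slope beta."""
    margin = t_star * se_b
    return b - margin, b + margin

# Print-out example: b = 0.018, SEb = 0.0024, n = 16 (df = 14), t* = 2.145
lower, upper = slope_confidence_interval(0.018, 0.0024, 2.145)
print(f"({lower:.3f}, {upper:.3f})")  # (0.013, 0.023)
```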


Confidence Interval for β

Example from print-out: b = 0.018, SEb = 0.0024, n = 16, df = 14
For a 95% CI, t* = 2.145 (chart, or “-invT(0.05/2, 14)” on the calculator)

CI = 0.018 ± 2.145(0.0024) = (0.013, 0.023)

“We are 95% confident that the value of the slope of the true regression line of BAC level on number of beers drunk is in the interval (0.013, 0.023).”
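If no chart or calculator is handy, t* can be approximated numerically. This Python sketch is a stand-in for invT and is not how the calculator works internally. It integrates the t density with Simpson's rule to get the CDF, then uses bisection to find the value that leaves (1 − CL)/2 in the upper tail. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): 0.5 plus the area under the density from 0 to t (Simpson's rule)."""
    h = t / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Find t* with P(-t* < T < t*) = confidence, by bisection on [0, 100]."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 14), 3))  # 2.145
```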

PHANTOMS again

Parameter: For these procedures, we are conducting a significance test on the value of β.

“β is the value of the slope of the true regression line of (response var) on (explanatory var)”

PHANTOMS again

Hypotheses: When β = 0, there is no linear relationship between the two variables.

H0: β = 0 (there is no linear relationship)

Ha: β ≠ 0 (there is a linear relationship), or

Ha: β < 0 (there is a negative linear relationship), or

Ha: β > 0 (there is a positive linear relationship)

PHANTOMS again

Assumptions
(1) Observations are independent
(2) The true relationship is linear
- check scatterplot: “scatterplot appears linear”
- check residuals: “residuals do not show any pattern”

PHANTOMS again

Assumptions (cont.)
(3) The std dev is the same everywhere
- check residuals: “residuals do not show an increasing/decreasing fan pattern”
(4) The response varies Normally about the true regression line
- check the histogram of the residuals: “histogram of residuals shows a single-peaked, symmetric distribution”
This distribution may be slightly skewed, but there should be NO OUTLIERS.

PHANTOMS again

Name of Test: “t-test for the slope of a linear regression”

PHANTOMS again

Test Statistic: The calculations here get messy. Most of the time, we read standard errors from printouts or calculator output. There are actually 3 standard errors on a typical printout:

std error of ‘a’ = “SEa” (we will never use this)

std error of ‘b’ = “SEb”

std error of residuals = “s”

PHANTOMS again

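Test Statistic (cont.) Calculation of standard errors (laborious calculations, indeed!):

$$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}}$$

$$SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$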

PHANTOMS again

Test Statistic (cont.)

In actuality, a printout will have the test statistic almost completely calculated for you!

$$t = \frac{b}{SE_b}, \qquad df = n - 2$$
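Here is the arithmetic of the test statistic as a small Python sketch. It also shows the rearrangement SE_b = b / t, which is useful when a printout reports t but not SE_b. The helper names are hypothetical. The inputs reuse the print-out example from the confidence-interval slides.

```python
def slope_t_statistic(b, se_b):
    """Test statistic for H0: beta = 0."""
    return b / se_b

def se_b_from_t(b, t):
    """Rearranged t = b / SE_b, for when t and b are known but SE_b is not."""
    return b / t

# Print-out example values: b = 0.018, SEb = 0.0024, n = 16
n = 16
df = n - 2
t = slope_t_statistic(0.018, 0.0024)
print(f"t = {t:.2f}, df = {df}")  # t = 7.50, df = 14
```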

PHANTOMS again

Interval Calculations (cont.) From a printout:

[Annotated printout not reproduced. Labeled entries: a, b, r², SEb, and s (SE of residuals); the remaining entries are IGNORED.]

PHANTOMS again

P Value (T is the observed value of the test statistic)
Ha: β < 0; p-value = P(t < T)
Ha: β > 0; p-value = P(t > T)
Ha: β ≠ 0; p-value = 2 × P(t > |T|)
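The three p-value rules can be written as one function. This Python sketch approximates the t CDF with Simpson's rule as a stand-in for tcdf; it is not the calculator's own algorithm. The `alternative` strings are made-up labels. The usage lines plug in T = 3.065 with df = 36 from Problem 15.1, so the printed p-value should land near the 0.004 reported there.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def slope_p_value(T, df, alternative):
    """p-value for H0: beta = 0, given the observed test statistic T."""
    if alternative == "less":        # Ha: beta < 0
        return t_cdf(T, df)
    if alternative == "greater":     # Ha: beta > 0
        return 1 - t_cdf(T, df)
    if alternative == "two-sided":   # Ha: beta != 0
        return 2 * (1 - t_cdf(abs(T), df))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Problem 15.1: T = 3.065, n = 38 so df = 36; decision at alpha = 0.01
p = slope_p_value(3.065, 36, "two-sided")
alpha = 0.01
print(f"p = {p:.4f}; reject H0: {p < alpha}")
```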

PHANTOMS again

Decision: As with the other tests, reject the null hypothesis when the p-value is below the chosen significance level (α).

Summary: Use the same 3-part summary:
1) Interpret the p-value with respect to the sampling distribution
2) Make a decision with reference to an alpha level
3) Summarize the results in the context of the problem

Calculator Methods

• On the TI-83/84/89, the data must be entered in lists (list1/list2)

• TI-83/84: [STAT] -> “TESTS” -> “LinRegTTest”

• TI-89: [APPS] -> “Stat/List Editor” -> [TESTS] -> “LinRegTTest”

• Select Xlist, Ylist, and Ha

• “Calculate”

• There are 2 screens of output
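For readers without a TI, here is a rough Python analogue of what LinRegTTest reports for a two-sided Ha (t, p, df, a, b, s, r², r). It is a sketch under stated assumptions: the data are made up, `lin_reg_t_test` is a hypothetical name, and only the two-sided alternative is handled. The t CDF is approximated with Simpson's rule rather than whatever method the calculator uses internally.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def lin_reg_t_test(xs, ys):
    """Two-sided t-test for H0: beta = 0, reporting LinRegTTest-style values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    df = n - 2
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / df)            # std error of residuals
    t = b / (s / math.sqrt(sxx))       # t = b / SE_b
    p = 2 * (1 - t_cdf(abs(t), df))    # Ha: beta != 0
    r = sxy / math.sqrt(sxx * syy)
    return {"t": t, "p": p, "df": df, "a": a, "b": b, "s": s, "r2": r * r, "r": r}

# Made-up data standing in for list1 (x) and list2 (y)
result = lin_reg_t_test([1, 2, 3, 4], [2, 4, 5, 4])
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```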

Problem 15.1

• We will try to determine whether a linear regression will fit the data

Problem 15.1

Begin by inputting the data in our lists.

Parameter
“β is the slope of the true regression line of IQ on peaks of infant crying in their most active 20-second interval”

Hypotheses
H0: β = 0
Ha: β ≠ 0

Problem 15.1

Assumptions
(1) Independence: “The peaks of infant crying are independent from infant to infant”
(2) Linearity (note: you will need to run a regression before you can analyze the residuals)
“The scatterplot appears moderately linear”
“The residual plot shows no obvious pattern”

Problem 15.1

Assumptions (cont.)
(3) Equal standard deviations: “The residual plot does not show a fan pattern; the standard deviation is most likely the same along the line”
(4) Normal responses: “The histogram of the residuals is right-skewed, but our procedure is robust enough to handle the skewness (n = 38)”

Problem 15.1

Name of the Test: We will perform a “t-test for the slope of a linear regression”

Test Statistic: t = b / SEb

Use your calculator to find t and p: t = 3.065 and p = 0.004. If needed, use the equation for t to solve for SEb.

Problem 15.1

Obtain p-value: p-value = 2 × P(t > 3.065) with df = 38 − 2 = 36
“2*tcdf(3.065,1E99,36)” gives p-value = 0.004

Make decision: Reject the null hypothesis (0.004 < α = 0.01)

Problem 15.1

Summary
“If there were truly no linear relationship (β = 0), a random sample of 38 would produce a test statistic at least as extreme as 3.065 only about 0.4% of the time.”
“Because this p-value is less than an alpha of 0.01, we reject the null hypothesis.”
“We have strong evidence of a linear relationship between the peaks of a baby's crying in a 20-second interval and the baby's IQ.”

OMG WE FINISHED THE BOOK
