Chapter 12 Section 1 Inference for Linear Regression.

Inference for Linear Regression

Students will be able to

check conditions for performing inference about the slope (beta) of the population (true) regression line,

interpret computer output from a least-squares regression analysis,

construct and interpret a confidence interval for the slope (beta) of the population (true) regression line,

perform a significance test about the slope (beta) of a population (true) regression line.

Inference for Linear Regression

In the scatterplot on p. 739, the line that is drawn is known as the population regression line because it uses all the observations in the population.

If we take a sample of size n from the population, we still use the equation $\hat{y} = a + bx$ for the sample regression line. More than likely, the slope of the sample line will vary depending on which sample is chosen. The pattern of variation in the slope b is described by its sampling distribution.

Sampling Distribution of b

Confidence intervals and significance tests about the slope of the population regression line are based on the sampling distribution of b, the slope of the sample regression line.


Describing the approximate sampling distribution:

Shape – a strong linear pattern in the Normal probability plot of the sample slopes tells us that the approximate sampling distribution is close to Normal.

Center – calculate the mean of the sample slopes: it should be close to the slope of the population regression line.

Spread – calculate the standard deviation of the sample slopes: the same rule applies, so it should be close to the standard deviation of the sampling distribution of b.
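The idea of a sampling distribution of b can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population line (alpha = 2, beta = 3), the value sigma = 4, the sample size, and the helper names `sample_slope` and `simulate_slopes` are all hypothetical and do not come from the textbook example.

```python
import random
import statistics


def sample_slope(xs, ys):
    """Least-squares slope b for one sample of (x, y) pairs."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(alpha, beta, sigma, n, reps, seed=0):
    """Draw `reps` random samples of size n from a hypothetical population
    that satisfies the regression model and return the slope of each sample."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        # Each response is the true line plus Normal variation with SD sigma.
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(sample_slope(xs, ys))
    return slopes


# Hypothetical population: true line y = 2 + 3x with sigma = 4.
slopes = simulate_slopes(alpha=2.0, beta=3.0, sigma=4.0, n=20, reps=1000)
print("center (mean of the sample slopes):", statistics.fmean(slopes))
print("spread (SD of the sample slopes):", statistics.stdev(slopes))
```

The mean of the simulated slopes should land near the true slope used to generate the data, which is the "center" check described above.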

Conditions for Regression Inference

Conditions:

Linear – the actual relationship between x and y is linear. For any fixed value of x, the mean response $\mu_y$ falls on the population (true) regression line $\mu_y = \alpha + \beta x$. The slope $\beta$ and intercept $\alpha$ are usually unknown parameters.

Independent – individual observations are independent of each other (one does not affect another).

Normal – for any fixed value of x, the response y varies according to a normal distribution.

Equal variance – the standard deviation of y (call it sigma) is the same for all values of x. The common standard deviation sigma is usually an unknown parameter.

Random – the data come from a well-designed random sample or randomized experiment.

Conditions for Regression Inference

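What the regression model tells us: for each value of x, the population regression line gives the mean response, and the individual y values vary about that mean according to a Normal distribution with standard deviation sigma.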

**** Remember to always check conditions before doing inference about the regression model.

Take a look at the example on pp. 743-744.

Estimating the Parameters

When the conditions are met, we can proceed to estimating the unknown parameters.

If we calculate the least-squares regression line, the slope b is an unbiased estimator of the true slope and the y intercept a is an unbiased estimator of the true y intercept. The remaining parameter is the standard deviation (sigma), which describes the variability of the response y about the population (true) regression line.

Residuals estimate how much y varies about the population line. Because sigma is the standard deviation of responses about the population regression line, we estimate it with the standard deviation of the residuals, using the formula at the top of p. 745.
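As a rough illustration of these estimates, the sketch below fits a least-squares line and computes the standard deviation of the residuals as the square root of the sum of squared residuals divided by n - 2. The data values and the function names `fit_line` and `residual_sd` are made up for this example and are not taken from the textbook.

```python
import math


def fit_line(xs, ys):
    """Return (a, b): the least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residual_sd(xs, ys):
    """Estimate sigma with s = sqrt(sum of squared residuals / (n - 2))."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r ** 2 for r in residuals) / (len(xs) - 2))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
s = residual_sd(xs, ys)
print("a =", round(a, 4), "b =", round(b, 4), "s =", round(s, 4))
```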

Estimating the Parameters

Take a look at the example on p. 745.

It is possible to do inference about any of the three parameters. However, the slope (beta) is usually the most important parameter in a regression problem. So try to stick with that one.


For spread – since we do not know the standard deviation sigma, we estimate it with the standard deviation of the residuals, s. We then estimate the spread of the sampling distribution of b with the standard error of the slope (formula on p. 746).

If we standardize b, we use the middle formula on p. 746, which translates to the last formula (use this one). When calculating the degrees of freedom, take the sample size n and subtract 2 (we subtract 2 instead of 1; the full explanation is more involved than we need here).
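The standard error of the slope can be sketched in code. This sketch assumes the common form SE_b = s / sqrt(sum of (x - x-bar)^2), which is algebraically the same as s / (s_x * sqrt(n - 1)); check it against the formula on p. 746 before relying on it. The function names and the numbers are hypothetical.

```python
import math


def standard_error_of_slope(s, xs):
    """SE_b = s / sqrt(sum of squared deviations of x from its mean)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return s / math.sqrt(sxx)


def degrees_of_freedom(n):
    """Regression inference for the slope uses n - 2 degrees of freedom."""
    return n - 2


# Hypothetical values: five x-values and a residual SD of about 0.894.
xs = [1, 2, 3, 4, 5]
s = math.sqrt(0.8)
se_b = standard_error_of_slope(s, xs)
df = degrees_of_freedom(len(xs))
print("SE_b =", round(se_b, 4), "df =", df)
```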

Constructing a Confidence Interval for the Slope

The slope (beta) is the rate of change of the mean response as the explanatory variable increases: $\mu_y = \alpha + \beta x$

A confidence interval is more useful than the point estimate because it shows how precise the estimate b is likely to be.

statistic $\pm$ (critical value) $\cdot$ (standard deviation of statistic)

$b \pm t^* \, SE_b$

Take a look at the yellow box on p. 747 and the example on pp. 747-748.
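The interval b plus/minus t* times SE_b is simple enough to sketch in a few lines. In this sketch the critical value t* is passed in by hand (for example, from a t table with n - 2 degrees of freedom); the function name `slope_confidence_interval` and all numbers are hypothetical.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b +/- t* * SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


# Hypothetical values: b = 0.6, SE_b = 0.2828, n = 5 so df = 3.
# t* = 3.182 is the table value for 95% confidence with 3 degrees of freedom.
lower, upper = slope_confidence_interval(b=0.6, se_b=0.2828, t_star=3.182)
print("95% CI for the slope:", round(lower, 3), "to", round(upper, 3))
```

If the interval contains 0, as it does with these made-up numbers, the data do not rule out a true slope of 0 at that confidence level.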

Performing a Significance Test for the Slope

The null hypothesis has the general form $H_0: \beta = \beta_0$, where $\beta_0$ is the hypothesized value.

To do the test:

test statistic = (statistic - parameter) / (standard deviation of statistic)

$t = \dfrac{b - \beta_0}{SE_b}$

To find the P-value, use the t distribution with n - 2 degrees of freedom.

Take a look at the yellow box on p. 751.

Take a look at the remainder of the examples in this section for clarification.
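The test statistic and a two-sided P-value can be sketched as follows. This is an illustration only: the P-value is approximated by numerically integrating the t density with Simpson's rule, which stands in for a t table or calculator, and the function names and input numbers are hypothetical.

```python
import math


def t_statistic(b, beta0, se_b):
    """t = (b - beta0) / SE_b."""
    return (b - beta0) / se_b


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return max(0.0, 1 - 2 * area)


# Hypothetical values: test H0: beta = 0 with b = 0.6, SE_b = 0.2828, n = 5.
t = t_statistic(b=0.6, beta0=0.0, se_b=0.2828)
p = two_sided_p_value(t, df=5 - 2)
print("t =", round(t, 3), "two-sided P-value =", round(p, 4))
```

For a one-sided alternative, half of the two-sided value applies when the sign of t agrees with the direction of the alternative hypothesis.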
