August 2016
Introduction to Econometrics is an introductory book for undergraduate students in economics
and finance. It is designed to give students an understanding of why econometrics is necessary,
and to provide them with a working knowledge of basic econometric tools so that (1) they can
apply these tools to modeling, estimation, inference, and forecasting in the context of real-world
economic problems; (2) they can evaluate critically the results and conclusions from others who
use basic econometric tools; (3) they have a foundation and understanding for further study of
econometrics, …. It is assumed that students have taken courses in the principles of economics
and had review of probability and statistics. You will need to remember this material from previous semesters to do well in this class.
• There are three types of data which econometricians might use for analysis:
  1. Time series data
  2. Cross-sectional data
  3. Panel data, a combination of 1 and 2.
• The data may be quantitative (e.g. exchange rates, stock prices, number of
shares outstanding), or qualitative (e.g. day of the week).
• Examples of time series data:
  Series                           Frequency
  GNP or unemployment              monthly or quarterly
  government budget deficit       annually
  money supply                     weekly
  value of a stock market index    as transactions occur
• Continuous data can take on any value and are not confined to take specific numbers.
• Their values are limited only by precision. o For example, the rental yield on a property could be 6.2%, 6.24%, or 6.238%.
• On the other hand, discrete data can only take on certain values, which are usually integers o For instance, the number of people in a particular underground carriage or the number
of shares traded during a day.
• They do not necessarily have to be integers (whole numbers), though, and are often defined to be count numbers. o For example, until recently when they became 'decimalised', many financial asset prices were quoted to the nearest 1/16 or 1/32 of a dollar.
• Another way in which we could classify numbers is according to whether they are
cardinal, ordinal, or nominal.
• Cardinal numbers are those where the actual numerical values that a particular variable takes have meaning, and where there is an equal distance between the numerical values.
o Examples of cardinal numbers would be the price of a share or of a building, and the
number of houses in a street.
• Ordinal numbers can only be interpreted as providing a position or an ordering.
o Thus, for cardinal numbers, a figure of 12 implies a measure that is `twice as good' as a figure of 6. On the other hand, for an ordinal scale, a figure of 12 may be viewed as `better' than a figure of 6, but could not be considered twice as good. Examples of ordinal numbers would be the position of a runner in a race.
• Nominal numbers occur where there is no natural ordering of the values at all.
o Such data often arise when numerical values are arbitrarily assigned, such as telephone
numbers or when codings are assigned to qualitative data (e.g. when describing the exchange that a US stock is traded on).
• Cardinal, ordinal and nominal variables may require different modelling approaches or at least different treatments, as should become evident in the subsequent chapters.
• It is preferable not to work directly with asset prices, so we usually convert the raw prices into a series of returns. There are two ways to do this:

  Simple returns: R_t = ((p_t − p_{t−1})/p_{t−1}) × 100%;  log returns: R_t = ln(p_t/p_{t−1}) × 100%,

  where R_t denotes the return at time t, p_t denotes the asset price at time t, and ln denotes the natural logarithm.
• We also ignore any dividend payments, or alternatively assume that the price series have already been adjusted for dividends. A short Python sketch of both return definitions follows.
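A minimal sketch of the two return definitions; the price series here is invented purely for illustration, and numpy is an assumed dependency:

import numpy as np

# Hypothetical daily closing prices (illustrative values only)
p = np.array([100.0, 101.5, 99.8, 102.3])

# Simple returns: R_t = (p_t - p_{t-1}) / p_{t-1} x 100%
simple_returns = (p[1:] - p[:-1]) / p[:-1] * 100

# Log returns: R_t = ln(p_t / p_{t-1}) x 100%
log_returns = np.log(p[1:] / p[:-1]) * 100

print(simple_returns)  # approximately [1.5, -1.675, 2.505]
print(log_returns)     # close to the simple returns when changes are small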
• Overall task: analyze data to inform a (business) decision.
• Assume data relevant to the problem has been collected.
• Intermediate task: identify and summarize the data.
• Example: we've moved to a new city and wish to buy a home.
• Data: Y = selling price (in $ thousands) for n = 30 randomly sampled single-family homes.
• Consider the lowest home price, represented by "1" in the stem and "6" in the leaf.
• This represents a number between 155 and 164.9 (thousand dollars).
• In particular, it is the lowest price of $155,500.
• What does this graph tell you about home prices in this market?
• Sample mean, mY, measures the "central tendency" of the Y-values.
• Median also measures central tendency, but is less sensitive to very small/large values.
• Sample standard deviation, sY, measures spread/variation.
• Minimum and maximum.
• Percentiles, e.g., 25th percentile: 25% of Y-values are smaller and 75% of Y-values are larger.
• Question: what's another name for the 50th percentile?
• Standardizing calibrates a list of numbers (Y) to a common scale.
• Subtract the mean and divide by the standard deviation:

  Z = (Y − mY) / sY.

• Sample mean of the Z-values? 0.
• Sample standard deviation of the Z-values? 1.
• Exercise: use statistical software to create graphs, find summary statistics, and calculate standardized values (one way to do this in Python is sketched below).
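A sketch of the summary statistics and the standardizing step, using a hypothetical sample in place of the course data file:

import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(280, 54, size=30)       # hypothetical home prices, $ thousands

m_y = y.mean()                         # sample mean, mY
med = np.median(y)                     # 50th percentile (the median)
s_y = y.std(ddof=1)                    # sample standard deviation, sY
q25, q75 = np.percentile(y, [25, 75])  # 25th and 75th percentiles

z = (y - m_y) / s_y                    # standardized values
print(z.mean(), z.std(ddof=1))         # approximately 0 and 1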
• Population: entire collection of objects of interest.
• Sample: (random) subset of the population.
• Statistical thinking: draw inferences about the population by using sample data.
• Model: mathematical abstraction of the real world used to make statistical inferences.
• Assumptions:
  ◦ the model provides a reasonable fit to the sample data;
  ◦ the sample is representative of the population.
• Normal distribution: simple, effective model ("bell-curve").
• Drawback to the CLT: we need to know the population standard deviation, SD(Y), to use it.
• Since we rarely know SD(Y), what would be a good estimate to use instead? The sample s.d., sY.
• Replacing SD(Y) with sY requires use of a t-distribution rather than the normal:
  ◦ the t-distribution is like the normal but more spread out (fatter tails) to reflect additional uncertainty;
  ◦ the additional uncertainty is due to using sY instead of assuming we know SD(Y);
  ◦ sY is a better estimate of SD(Y) for large n;
  ◦ the t-distribution accounts for this using degrees of freedom (df = n − 1 in this case);
  ◦ as df becomes large, the t-distribution looks more and more like the normal.
• Horizontal axis values are called critical values.
• Tail areas (under the density curve) represent probabilities.
• Example: Pr(t29 > 1.699) = 0.05.
• Note that critical values get closer to those for the normal as df gets larger (easy to verify with software, as sketched below).
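These tail areas and percentiles can be checked directly; scipy is an assumed choice here, and any statistical software works:

from scipy import stats

print(stats.t.sf(1.699, df=29))    # Pr(t29 > 1.699), approximately 0.05
print(stats.t.ppf(0.95, df=29))    # 95th percentile of t29, approximately 1.699

# critical values approach the normal's as df grows
print(stats.t.ppf(0.95, df=1000))  # approximately 1.646
print(stats.norm.ppf(0.95))        # approximately 1.645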
• Randomly sample Y1, Y2, ..., Yn from a population with mean E(Y).
• CLT: t-statistic = (mY − E(Y)) / (sY/√n) ∼ t_{n−1} (a t-distribution with n − 1 df).
• Assume home prices Y1, ..., Y30 have E(Y) = 280.
• Sample standard deviation, sY, is 53.8656.
• What is the 95th percentile of mY?
• Pr(t29 > 1.699) = 0.05.
• Goal: estimate the population mean E(Y).
• Best point estimate: the sample mean mY.
• How far off might we be? Can we quantify our uncertainty?
• Confidence interval: point estimate ± uncertainty.
• Example: the 80% confidence interval for E(Y) in the home prices application is 278.603 ± 12.893 = (265.710, 291.496).
• In other words, based on this dataset, we are 80% confident that the population mean home price is between $266,000 and $291,000.
• This leaves quite a bit of room for error (20%), so 90% and 95% intervals are more common.
• Question: will a 90% interval be narrower or wider than the 80% interval?
• Example: home prices Y1, ..., Y30.
• Sample mean, mY, is 278.603.
• Sample standard deviation, sY, is 53.8656.
• Calculate an 80% confidence interval for E(Y).
• The 90th percentile of t29 is 1.311.
• mY ± 90th percentile × (sY/√n) = 278.603 ± 1.311 × (53.8656/√30) = 278.603 ± 12.893 = (265.710, 291.496).
• Calculate a 90% confidence interval for E(Y) (a software sketch follows).
• Loosely speaking: based on this dataset, we are 80% confident that the population mean home price is between $266,000 and $291,000.
• More precisely: if we were to take a large number of random samples of size 30 from a population of sale prices and calculate an 80% confidence interval for each, then 80% of those confidence intervals would contain the (unknown) population mean.
• E.g., 10 confidence intervals for samples from a population with E(Y) marked by the vertical line:
• 8 of the intervals contain E(Y), while 2 don't.
• Confidence intervals tell us a range of plausible values for E(Y) with a specified confidence level.
• By contrast, hypothesis tests ask whether a particular value is plausible or not.
• Example: does a population mean of $255,000 seem plausible given our sample of 30 home prices?
  ◦ Upper-tail test: can we reject the possibility that E(Y) = 255 in favor of E(Y) > 255?
  ◦ Lower-tail test: can we reject the possibility that E(Y) = 255 in favor of E(Y) < 255?
  ◦ Two-tail test: can we reject the possibility that E(Y) = 255 in favor of E(Y) ≠ 255?
• Upper-tail test: null hypothesis NH: E(Y) = 255 versus alternative hypothesis AH: E(Y) > 255.
• If NH is true, then the sampling distribution of the t-statistic = (mY − E(Y)) / (sY/√n) is t_{n−1}.
• Recall that t_{n−1} has a bell shape centered at zero with most of its area (≈ 95%) between −2 and +2.
• So, if the value of the t-statistic is "not too far" from zero, we cannot reject NH.
• Conversely, a t-statistic much larger than zero favors AH (larger since this is an upper-tail test).
• How large does the t-statistic have to be before we reject NH in favor of AH?
• The significance level (e.g., 5%) determines a rejection region beyond a critical value (e.g., the 95th percentile of t_{n−1}).
• Upper-tail test: null hypothesis NH: E(Y) = 255 versus alternative hypothesis AH: E(Y) > 255.
• t-statistic = (mY − E(Y)) / (sY/√n) = (278.603 − 255) / (53.8656/√30) = 2.40.
• Significance level = 5%.
• The critical value is the 95th percentile of t29, which is 1.699.
• Since the t-statistic (2.40) > critical value (1.699), we reject NH in favor of AH.
• In other words, the sample data suggest that the population mean is greater than $255,000 (at a 5% significance level).
• Upper-tail test: null hypothesis NH: E(Y) = 255 versus alternative hypothesis AH: E(Y) > 255.
• If NH is true, then the sampling distribution of the t-statistic = (mY − E(Y)) / (sY/√n) is t_{n−1}.
• Recall that t_{n−1} has a bell shape centered at zero with most of its area (≈ 95%) between −2 and +2.
• So, if the upper-tail area beyond the t-statistic is "not too small," we cannot reject NH.
• Conversely, a very small upper-tail area favors AH.
• How small does the upper-tail area, called the p-value, have to be before we reject NH in favor of AH?
• Smaller than the significance level (e.g., 5%).
• Upper-tail test: null hypothesis NH: E(Y) = 255 versus alternative hypothesis AH: E(Y) > 255.
• t-statistic = (mY − E(Y)) / (sY/√n) = (278.603 − 255) / (53.8656/√30) = 2.40.
• Significance level = 5%.
• Since the t-statistic (2.40) is between 2.045 and 2.462, the p-value must be between 0.01 and 0.025.
• Since p-value < significance level, we reject NH in favor of AH.
• In other words, the sample data suggest that the population mean is greater than $255,000 (at a 5% significance level). An exact p-value is easy to obtain with software, as sketched below.
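A sketch of the exact p-value, which the table only brackets between 0.01 and 0.025:

from scipy import stats
import math

n, m_y, s_y, mu0 = 30, 278.603, 53.8656, 255

t_stat = (m_y - mu0) / (s_y / math.sqrt(n))  # 2.40
p_value = stats.t.sf(t_stat, df=n - 1)       # upper-tail area beyond 2.40
print(t_stat, p_value)                       # p is roughly 0.012

# reject NH at the 5% significance level since p_value < 0.05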
                    Do not reject NH     Reject NH
                    in favor of AH       in favor of AH
Reality  NH true    correct decision     type 1 error
         NH false   type 2 error         correct decision

• Pr(type 1 error) = significance level; the analyst selects this.
• But setting it too low can increase the chance of a type 2 error occurring.
• Trade-off: set the significance level at 5% (sometimes 1% or 10%); reduce the chance of a type 2 error by having n as large as possible and by using sound statistical methods.
• Also, use hypothesis tests judiciously and always keep in mind the possibility of making these errors.
• New problem: predict an individual Y-value picked at random from the population.
• Is this easier or more difficult than estimating the population mean?
• More difficult: imagine predicting the sale price of a new home on the market (versus estimating the average sale price of homes in this market); which answer would you be less certain about?
• Approach: calculate a prediction interval, like a confidence interval but with a larger range of uncertainty.
• Confidence interval: point estimate ± estimation uncertainty.
• Prediction interval: point estimate ± prediction uncertainty.
• Model: Yi = E(Y) + ei (i = 1, ..., n).
• Y-value to be predicted: Y* = E(Y) + e*.
• Point estimate of Y*? The sample mean, mY.
• Prediction error: Y* − mY = (E(Y) − mY) + e*.
• Variance of the estimation error (E(Y) − mY): sY²/n.
• Variance of the random error (e*): sY².
• Variance of the prediction error (Y* − mY): sY²(1 + 1/n).
• Confidence interval for E(Y): mY ± t-percentile × (sY/√n).
• Prediction interval for Y*: mY ± t-percentile × (sY √(1 + 1/n)).
• Example: home prices Y1, ..., Y30.
• Sample mean, mY, is 278.603.
• Sample standard deviation, sY, is 53.8656.
• Calculate an 80% prediction interval for Y.
• The 90th percentile of t29 is 1.311.
• mY ± 90th percentile × (sY √(1 + 1/n)) = 278.603 ± 1.311 × (53.8656 √(1 + 1/30)) = 278.603 ± 71.785 = (206.818, 350.388).
• We're 80% confident the sale price of an individual, randomly selected home in this market will be between $207,000 and $350,000.
• Calculate a 90% prediction interval for Y (a software sketch follows).
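The same arithmetic in Python; note how much wider this interval is than the corresponding confidence interval:

from scipy import stats
import math

n, m_y, s_y = 30, 278.603, 53.8656
t_90 = stats.t.ppf(0.90, df=n - 1)           # 1.311

half_width = t_90 * s_y * math.sqrt(1 + 1 / n)
print((m_y - half_width, m_y + half_width))  # roughly (206.8, 350.4)

# the 80% confidence interval half-width was only 12.893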
• Y is a quantitative response variable (a.k.a. dependent, outcome, or output variable).
• X is a quantitative predictor variable (a.k.a. independent or input variable, or covariate).
• The two variables play different roles, so it is important to identify which is which and define them carefully, e.g.:
  ◦ Y is sale price, in $ thousands;
  ◦ X is floor size, in thousands of square feet.
• How much do we expect Y to change by when we change the value of X?
• What do we expect the value of Y to be when we set the value of X at 2?
• Note: association (observational data), not causation (experimental data).
• Simple linear regression models straight-line relationships (like the upper two plots on the last slide).
• Suppose sale price is (on average) $190,300 plus 40.8 times floor size.
  ◦ E(Y|Xi) = 190.3 + 40.8Xi, where E(Y|Xi) means "the expected value of Y given that X is equal to Xi".
• Individual sale prices can deviate from this expected value by an amount ei (called a "random error").
  ◦ Yi|Xi = 190.3 + 40.8Xi + ei (i = 1, ..., n).
  ◦ Yi|Xi = deterministic part + random error.
• The error, ei, represents variation in Y due to factors other than X which we haven't measured, e.g., lot size, # beds/baths, age, garage, schools.
• Estimated equation: Ŷ = b0 + b1X = 190.3 + 40.8X.
• We expect Y = b0 when X = 0, but only if this makes sense and we have data close to X = 0 (not the case here).
• We expect Y to change by b1 when X increases by one unit, i.e., we expect sale price to increase by $40,800 when floor size increases by 1000 sq. feet.
• For this example, it is more meaningful to say we expect sale price to increase by $4080 when floor size increases by 100 sq. feet.
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.972 a      0.945       0.927                2.7865
a Predictors: (Intercept), X.
• The regression standard error, s, estimates the standard deviation of the simple linear regression random errors:

  s = √(SSE / (n − 2)).

• The unit of measurement for s is the same as the unit of measurement for Y.
• Approximately 95% of the observed Y-values lie within plus or minus 2s of their fitted Y-values.
• Since 2s = 5.57, we can expect to predict an unobserved sale price from a particular floor size to within approx. ±$5570 (at a 95% confidence level). A fitting sketch in Python follows.
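A sketch of fitting the straight line by least squares; the five (floor size, sale price) pairs are invented stand-ins for the course data file, and statsmodels is an assumed library choice:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: x = floor size (k sq. feet), y = sale price ($k)
x = np.array([1.683, 1.812, 1.950, 2.100, 2.269])
y = np.array([258.0, 264.5, 270.1, 276.0, 283.5])

fit = sm.OLS(y, sm.add_constant(x)).fit()
b0, b1 = fit.params                  # intercept and slope estimates
s = np.sqrt(fit.ssr / (len(y) - 2))  # regression standard error
print(b0, b1, s, fit.rsquared)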
• Without the model, we estimate Y with the sample mean mY.
• With the model, we estimate Y using the fitted Y-value, Ŷ.
• How much do we reduce our error when we do this?
• Total error without the model: TSS = Σ_{i=1}^{n} (Yi − mY)², the variation in Y about mY.
• Remaining error with the model: SSE = Σ_{i=1}^{n} (Yi − Ŷi)², the unexplained variation.
• Proportional reduction in error: R² = (TSS − SSE)/TSS.
• Home prices example: R² = (423.4 − 23.3)/423.4 = 0.945.
• 94.5% of the variation in sale price (about its mean) can be explained by a straight-line relationship between sale price and floor size.
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.972 a      0.945       0.927                2.7865
a Predictors: (Intercept), X.
• R² measures the proportion of variation in Y (about its mean) that can be explained by a straight-line relationship between Y and X.
• If TSS = SSE then R² = 0: using X to predict Y hasn't helped and we might as well use mY to predict Y regardless of the value of X.
• If SSE = 0 then R² = 1: using X allows us to predict Y perfectly (with no random errors).
• Such extremes rarely occur; usually R² lies between zero and one, with higher values of R² indicating a more useful model.
• Significance level = 5%.
• The critical value is 3.182 (the 97.5th percentile of t3).
• Since the t-statistic (7.18) is between 5.841 and 10.215, the two-tail p-value is between 0.002 and 0.01.
• Since the t-statistic (7.18) > critical value (3.182) and the p-value < significance level, we reject NH in favor of AH.
• In other words, the sample data favor a nonzero slope (at a significance level of 5%).
• Exercise: do an upper-tail test for this example.
• Loosely speaking: based on this dataset, we are 95% confident that the population slope, b1, is between 22.7 and 58.9.
• More precisely: if we were to take a large number of random samples of size 5 from our population of homes and calculate a 95% confidence interval for each, then 95% of those confidence intervals would contain the (unknown) population slope.
• Exercise: calculate a 90% confidence interval for b1.
Four assumptions about the random errors, e = Y − E(Y) = Y − b0 − b1X:
• the probability distribution of e at each value of X has a mean of zero;
• the probability distribution of e at each value of X has constant variance;
• the probability distribution of e at each value of X is normal;
• the value of e for one observation is independent of the value of e for any other observation.
Checking the model assumptions
• Calculate the residuals, ê = Y − Ŷ = Y − b0 − b1X.
• Draw a residual plot with ê along the vertical axis and X along the horizontal axis (sketched in code below).
  ◦ Assess the zero mean assumption: do the residuals average out to zero as we move across the plot from left to right?
  ◦ Assess the constant variance assumption: is the (vertical) variation of the residuals similar as we move across the plot from left to right?
  ◦ Assess the independence assumption: do the residuals look "random," with no systematic patterns?
• Draw a histogram and QQ-plot of the residuals.
  ◦ Assess the normality assumption: does the histogram look approximately bell-shaped and symmetric, and do the QQ-plot points lie close to the line?
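One way to produce these diagnostic plots; the data are simulated from the text's fitted model rather than taken from the course data file, so this is a sketch only:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(1.6, 2.3, size=30)             # hypothetical floor sizes
y = 190.3 + 40.8 * x + rng.normal(0, 2.8, 30)  # text's model plus noise

fit = sm.OLS(y, sm.add_constant(x)).fit()

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(x, fit.resid)                  # zero mean? constant variance?
axes[0].axhline(0, color="gray")
axes[1].hist(fit.resid)                        # roughly bell-shaped?
sm.qqplot(fit.resid, line="s", ax=axes[2])     # points close to the line?
plt.show()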
• Assessing the assumptions in practice can be difficult and time-consuming.
• Taking the time to check the assumptions is worthwhile and can provide additional support for any modeling conclusions.
• A clear violation of one or more assumptions could mean the results are questionable and should probably not be used (possible remedies to come in Chapters 3 and 4).
• Regression results tend to be quite robust to mild violations of the assumptions.
• Checking assumptions when n is very small (or very large) can be particularly challenging.
• Example: CARS2 data file; is weight or horsepower better for predicting cost?
Interpreting model results
• We found a statistically significant straight-line relationship (at a 5% significance level) between Y = sale price ($k) and X = floor size (k sq. feet).
• Estimated equation: Ŷ = b0 + b1X = 190.3 + 40.8X.
• X = 0 does not make sense for this application, nor do we have data close to X = 0, so we cannot meaningfully interpret b0 = 190.3.
• We expect sale price to increase $4080 when floor size increases 100 sq. feet, for 1683–2269 sq. feet homes (95% confident sale price increases between $2270 and $5890 when floor size increases 100 sq. feet).
• We can expect a prediction of an unobserved sale price from a particular floor size to be accurate to within approximately ±$5570 (with 95% confidence).
• 94.5% of the variation in sale price (about its mean) can be explained by a straight-line relationship between sale price and floor size.
• Recall the confidence interval for a univariate population mean, E(Y): mY ± t-percentile × (sY/√n).
• Also, a prediction interval for an individual univariate Y-value: mY ± t-percentile × (sY √(1 + 1/n)).
• A similar distinction between confidence and prediction intervals holds for simple linear regression.
• The confidence interval for the population mean, E(Y), at a particular X-value is Ŷ ± t-percentile × (s_Ŷ).
• The prediction interval for an individual Y-value at a particular X-value is Ŷ ± t-percentile × (s_Ŷ*).
• Which should be wider? Is it harder to estimate a mean or predict an individual value?
3.1 Probability model for (X1, X2, ...) and Y
Multiple linear regression
• Y is a quantitative response variable (a.k.a. dependent, outcome, or output variable).
• (X1, X2, ...) are quantitative predictor variables (a.k.a. independent/input variables, or covariates).
• It is important to identify the variables and define them carefully, e.g.:
  ◦ Y is final exam score, out of 100;
  ◦ X1 is time spent partying during the last week of term, in hours;
  ◦ X2 is average time spent studying during term, in hours per week.
• How much do we expect Y to change by when we change the values of X1 and/or X2?
• What do we expect the value of Y to be when X1 = 7.5 and X2 = 1.3?
• A matrix of scatterplots shows all bivariate relationships in a multivariate dataset (e.g., the previous slide).
• However, patterns cannot tell us whether a multiple linear regression model can provide a useful mathematical approximation to these bivariate relationships.
• It is primarily useful for identifying any strange patterns or odd-looking values that might warrant further investigation before we start modeling.
• Home prices example: no odd values to worry about.
• The random errors, e, represent variation in Y due to factors other than X1 and X2 that we haven't measured, e.g., numbers of bedrooms/bathrooms, property age, garage size, or nearby schools.
• Use least squares to estimate the deterministic part of the model, E(Y), as Ŷ = b0 + b1X1 + b2X2.
  ◦ i.e., use statistical software to find the values of b0, b1, and b2 that minimize SSE = Σ_{i=1}^{n} (Yi − b0 − b1X1i − b2X2i)².
• Fitted model: Ŷ = 122.36 + 61.98X1 + 7.09X2 (a fitting sketch in Python follows).
• We expect Y = b0 when X1 = X2 = 0, but only if this makes sense and we have data close to X1 = X2 = 0.
• We expect Y to change by b1 when X1 increases by one unit and the other predictor X-variables stay constant, i.e., we expect sale price to increase $6200 when floor size increases 100 sq. feet and lot size stays constant.
• We expect Y to change by b2 when X2 increases by one unit and the other predictor X-variables stay constant, i.e., we expect sale price to increase $7090 when lot size increases one category and floor size stays constant.
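A sketch of the two-predictor fit; the six rows of data are hypothetical placeholders for the course data file:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: X1 = floor size (k sq. feet), X2 = lot size category
X = np.column_stack([
    [1.7, 1.9, 2.0, 2.1, 2.3, 2.2],  # X1
    [2.0, 3.0, 3.0, 4.0, 4.0, 5.0],  # X2
])
y = np.array([252.0, 262.5, 270.0, 278.0, 285.5, 290.0])

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.params)                      # b0, b1, b2 minimizing SSE
print(fit.rsquared, fit.rsquared_adj)  # R-squared and adjusted R-squared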
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.986 a      0.972       0.953                2.4753
a Predictors: (Intercept), X1, X2.
• The regression standard error, s, estimates the standard deviation of the multiple linear regression random errors:

  s = √(SSE / (n − k − 1)).

• The unit of measurement for s is the same as the unit of measurement for Y.
• Approximately 95% of the observed Y-values lie within plus or minus 2s of their fitted Y-values.
• 2s = 4.95, so we expect to predict an unobserved sale price from particular floor and lot size values to within approx. ±$4950 (at a 95% confidence level).
• Without the model, we estimate Y with the sample mean mY.
• With the model, we estimate Y using the fitted Y-value, Ŷ.
• How much do we reduce our error when we do this?
• Total error without the model: TSS = Σ_{i=1}^{n} (Yi − mY)², the variation in Y about mY.
• Remaining error with the model: SSE = Σ_{i=1}^{n} (Yi − Ŷi)², the unexplained variation.
• Proportional reduction in error: R² = (TSS − SSE)/TSS.
• Home prices example: R² = 0.972.
• 97.2% of the variation in sale price (about its mean) can be explained by a multiple linear regression relationship between sale price and (floor size, lot size).
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.986 a      0.972       0.953                2.4753
a Predictors: (Intercept), X1, X2.
• R² measures the proportion of variation in Y (about its mean) that can be explained by a multiple linear regression relationship between Y and (X1, X2, ...).
• If TSS = SSE then R² = 0: using (X1, X2, ...) to predict Y hasn't helped and we may as well use mY to predict Y regardless of the (X1, X2, ...) values.
• If SSE = 0 then R² = 1: using (X1, X2, ...) allows us to predict Y perfectly (with no random errors).
• Such extremes rarely occur; usually R² lies between zero and one, with higher values of R² indicating a more useful model.
• Model building: what is the best way to model the relationship between Y and (X1, X2, ..., Xk)?
  ◦ e.g., should we use all k predictors, or just a subset?
• Consider a sequence of nested models, with each model in the sequence adding predictors to the previous model.
• Which model would R² say is the "best" model? The final model with all k predictors.
• Geometrical argument: start with a regression line on a 2D scatterplot, then add a second predictor to make the line a plane in a 3D scatterplot.
• In other words, R² always increases (or stays the same) as you add predictors to a model.
• R² has a clear interpretation since it represents the proportion of variation in Y (about its mean) explained by a multiple linear regression relationship between Y and (X1, X2, ...).
• But R² is not appropriate for finding a model that captures the major, important population relationships without overfitting every slight twist and turn in the sample relationships.
• We need an alternative criterion, one which penalizes models that contain too many unimportant predictor variables:

  adjusted R² = 1 − ((n − 1)/(n − k − 1))(1 − R²).

• In practice, we can obtain the value for adjusted R² directly from statistical software (or compute it by hand, as sketched below).
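The formula is simple enough to check directly; a small helper using the numbers from the home prices example (n = 6 is inferred from the ANOVA table shown later):

def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared = 1 - ((n-1)/(n-k-1)) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Two-predictor home prices model: n = 6, k = 2, R^2 = 0.972
print(adjusted_r2(0.972, n=6, k=2))  # approximately 0.953, matching the table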
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.826 a      0.682       0.603                7.1775
2       0.986 b      0.972       0.953                2.4753
a Predictors: (Intercept), X1.
b Predictors: (Intercept), X1, X2.

• Since adjusted R² is 0.603 for the single-predictor model but 0.953 for the two-predictor model, the two-predictor model is better than the single-predictor model (according to this criterion).
• In other words, there is no indication that adding X2 = lot size to the model causes overfitting.
• What happens to R² and s?
• Y = weekly labor hours.
• X1 = total weight shipped, in thousands of pounds.
• X2 = proportion shipped by truck.
• X3 = average shipment weight, in pounds.
• X4 = week.
• Compare two models:
• Since adjusted R² is 0.786 for the two-predictor model but 0.771 for the four-predictor model, the two-predictor model is better than the four-predictor model (according to this criterion).
• In other words, there is a suggestion that adding X2 = truck proportion and X4 = week to the model causes overfitting.
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.986 a      0.972       0.953                2.4753
a Predictors: (Intercept), X1, X2.
• The multiple correlation coefficient, multiple R, measures the strength and direction of linear association between the observed Y-values and the fitted Ŷ-values from the model.
• Multiple linear regression: multiple R = +√R².
  ◦ e.g., 0.986 = √0.972 for the home prices example above.
• Beware: intuition about correlation can be seriously misleading when it comes to multiple linear regression (see the next two slides).
Model         Sum of Squares   df   Mean Square   Global F-stat   Pr(>F)
1 Regression  630.259          2    315.130       51.434          0.005 b
  Residual    18.381           3    6.127
  Total       648.640          5
a Response variable: Y.
b Predictors: (Intercept), X1, X2.

• Global F-stat = ((TSS − SSE)/k) / (SSE/(n − k − 1)) = ((648.640 − 18.381)/2) / (18.381/(6 − 2 − 1))
                = (R²/k) / ((1 − R²)/(n − k − 1)) = (0.97166/2) / ((1 − 0.97166)/(6 − 2 − 1)) = 51.4.
• The critical value, FINV(0.05, 2, 3), is 9.55.
• The p-value, FDIST(51.4, 2, 3), is 0.005.
• Reject NH in favor of AH; at least one of the predictors, (X1, X2), is linearly related to Y (a software sketch follows).
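The same global usefulness test in Python; FINV and FDIST above are Excel-style functions, and scipy's f.ppf and f.sf are their counterparts:

from scipy import stats

TSS, SSE, n, k = 648.640, 18.381, 6, 2

f_stat = ((TSS - SSE) / k) / (SSE / (n - k - 1))  # 51.4
crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)    # FINV(0.05, 2, 3) = 9.55
p_val = stats.f.sf(f_stat, dfn=k, dfd=n - k - 1)  # FDIST(51.4, 2, 3) = 0.005
print(f_stat, crit, p_val)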
Model         Sum of Squares   df   Mean Square   Global F-stat   Pr(>F)
1 Regression  5646.052         4    1411.513      ?               ? b
  Residual    1242.898         15   82.860
  Total       6888.950         19
a Response variable: Y.
b Predictors: (Intercept), X1, X2, X3, X4.

• Global F-stat = ((TSS − SSE)/k) / (SSE/(n − k − 1)) = (R²/k) / ((1 − R²)/(n − k − 1)) = ?
• The critical value, FINV(0.05, 4, 15), is 3.06.
• The p-value, FDIST(?, 4, 15), is ?
• Reject NH in favor of AH; at least one of the predictors, (X1, X2, X3, X4), is linearly related to Y.
• Suppose a global usefulness test suggests at least one of (X1, X2, ..., Xk) is linearly related to Y.
• Can a reduced model with fewer than k predictor variables be better than the complete k-predictor model?
  ◦ Yes, if a subset of the X's provides no useful information about Y beyond the information provided by the other X's.
• Complete k-predictor model: SSE_C. Reduced r-predictor model: SSE_R.
• Which is larger? (Recall the geometric argument.)
• Which model is favored if it is a lot larger? Which model is favored if it is just a little larger?
• NH: b2 = b4 = 0; AH: at least one of b2 or b4 is not equal to 0.
• Nested F-stat = 0.472.
• Significance level = 5%.
• The critical value, FINV(0.05, 2, 15), is 3.68.
• The p-value, FDIST(0.472, 2, 15), is 0.633.
• We cannot reject NH in favor of AH.
• Neither X2 nor X4 appears to provide useful information about Y beyond the information provided by X1 and X3.
Model   R Squared   Adjusted R Squared   Std. Error   F-stat (change)   df1   df2   Pr(>F)
R       0.808 a     0.786                8.815
C       0.820 b     0.771                9.103        0.472             2     15    0.633
a Predictors: (Intercept), X1, X3.
b Predictors: (Intercept), X1, X2, X3, X4.
• There is a suggestion that adding X2 = truck proportion and X4 = week to the model causes overfitting. Why?
  ◦ Adjusted R² is higher for the reduced model.
  ◦ The regression standard error, s, is lower for the reduced model.
  ◦ The nested F-stat is not significant (high p-value), so the reduced model is favored (a software sketch of this test follows).
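A sketch of the nested-test arithmetic with scipy, using the degrees of freedom from this example; when both fitted models are available, statsmodels' compare_f_test method does the same job:

from scipy import stats

# Complete model: k = 4 predictors; reduced model: r = 2; sample size n = 20
f_stat = 0.472                        # nested F-statistic from the text
dfn, dfd = 4 - 2, 20 - 4 - 1          # 2 and 15

crit = stats.f.ppf(0.95, dfn, dfd)    # FINV(0.05, 2, 15), about 3.68
p_val = stats.f.sf(f_stat, dfn, dfd)  # FDIST(0.472, 2, 15), about 0.633
print(crit, p_val)                    # cannot reject NH: favor reduced model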
• Which predictors should we test in a nested model test?
• One possible approach is to consider the regression parameters individually.
• What do the sample estimates, b̂1, b̂2, ..., b̂k, tell us about likely values for the population parameters, b1, b2, ..., bk?
• An individual t-test for bp considers whether there is evidence that Xp provides useful information about Y beyond the information provided by the other k − 1 predictors. In other words:
  ◦ should we retain Xp in the model with the other k − 1 predictors (evidence suggests bp ≠ 0);
  ◦ or should we consider removing Xp from the model and retain only the other k − 1 predictors (evidence suggests bp = 0)?
• Significance level = 5%.
• The critical value, TINV(0.05, 15), is 2.13.
• The p-value, TDIST(2.28, 15, 2), is 0.038.
• Since the t-statistic (2.28) > critical value (2.13) and the p-value < significance level, we reject NH in favor of AH.
• The sample data favor b1 ≠ 0 (at a 5% significance level).
• There appears to be a linear relationship between Y and X1, once X2, X3, and X4 have been accounted for (or holding X2, X3, and X4 constant).
• The last two columns show individual t-stats and two-tail p-values.
• Low p-values indicate potentially useful predictors that should be retained (i.e., X1 and X3 here).
• High p-values indicate possible candidates for removal from the model (i.e., X2 and X4 here).
• However, the high p-value for X2 means we can remove X2, but only if we retain X1, X3, and X4.
• Similarly, the high p-value for X4 means we can remove X4, but only if we retain X1, X2, and X3.
• We can do individual regression parameter t-tests to:
  ◦ remove just one redundant predictor at a time;
  ◦ or identify which predictors to investigate with a nested model F-test.
• We need to do a nested model F-test to remove more than one predictor at a time.
• Using nested model F-tests allows us to use fewer hypothesis tests overall to help identify redundant predictors (so that the remaining predictors appear to explain Y adequately).
  ◦ This also lessens the chance of making any hypothesis test errors.
• Loosely speaking: based on this dataset, we are 95% confident that the population regression parameter, b1, is between 0.40 and 11.75 in the model E(Y) = b0 + b1X1 + b2X2 + b3X3 + b4X4.
• More precisely: if we were to take a large number of random samples of size 20 from our population of shipping numbers and calculate a 95% confidence interval for b1 in each, then 95% of those confidence intervals would contain the true (unknown) population regression parameter.
• What happens to this interval in the model E(Y) = b0 + b1X1 + b3X3?
• Use a global usefulness test to determine whether any of the potential predictors in a dataset are useful.
• Use nested model F-tests and individual parameter t-tests to identify the most important predictors.
• Employ tests judiciously to avoid conducting too many tests and to reduce the chance of making mistakes.
• If possible, identification of the important predictors should also be guided by practical considerations and background knowledge about the application.
• When k is very large, computer-intensive methods can help get things started:
  ◦ Forward selection: predictors added sequentially to an initial zero-predictor model;
  ◦ Backward elimination: predictors excluded sequentially from the full k-predictor model;
  ◦ Combined stepwise method: can proceed forwards or backwards at each stage;
  ◦ Other machine learning/data mining methods.
Four assumptions about the random errors, e = Y − E(Y) = Y − b0 − b1X1 − ··· − bkXk:
• the probability distribution of e at each set of values (X1, X2, ..., Xk) has a mean of zero;
• the probability distribution of e at each set of values (X1, X2, ..., Xk) has constant variance;
• the probability distribution of e at each set of values (X1, X2, ..., Xk) is normal;
• the value of e for one observation is independent of the value of e for any other observation.
Checking the model assumptions
• Calculate the residuals, ê = Y − Ŷ = Y − b0 − b1X1 − ··· − bkXk.
• Draw a residual plot with ê along the vertical axis and a function of (X1, X2, ..., Xk) along the horizontal axis (e.g., Ŷ or one of the X's).
  ◦ Assess the zero mean assumption: do the residuals average out to zero as we move across the plot from left to right?
  ◦ Assess the constant variance assumption: is the (vertical) variation of the residuals similar as we move across the plot from left to right?
  ◦ Assess the independence assumption: do the residuals look "random," with no systematic patterns?
• Draw a histogram and QQ-plot of the residuals.
  ◦ Assess the normality assumption: does the histogram look approximately bell-shaped and symmetric, and do the QQ-plot points lie close to the line?
• Assessing the assumptions in practice can be difficult and time-consuming.
• Taking the time to check the assumptions is worthwhile and can provide additional support for any modeling conclusions.
• A clear violation of one or more assumptions could mean the results are questionable and should probably not be used.
• Possible remedy: try a different subset of the available predictors (further ideas to come in Chapter 4).
• Regression results tend to be quite robust to mild violations of the assumptions.
• Checking assumptions when n is very small (or very large) can be particularly challenging.
• Example: MLRA data file.
Model 1 on the left: E(Y) = b0 + b1X1 + b2X2. Model 2 on the right: E(Y) = b0 + b1X1 + b2X2 + b3X3.

[Figure: residual plots for Model 1 (left) and Model 2 (right), with residuals on the vertical axis and X3 (0.0 to 0.8) on the horizontal axis.]

The plots include "loess fitted lines" (a computational method for applying the "slicing/averaging" technique). Do either of the models fail the zero mean assumption?
There is no evidence at the 5% significance level that X2 (proportion shipped by truck) or X4 (week) provide useful information about Y (weekly labor hours) beyond the information provided by X1 (total weight shipped in thousands of pounds) and X3 (average shipment weight in pounds).
Interpreting model results
• We found a statistically significant straight-line relationship (at a 5% significance level) between Y and X1 (holding X3 constant) and between Y and X3 (holding X1 constant).
• Estimated equation: Ŷ = 110.43 + 5.00X1 − 2.01X3.
• X1 = X3 = 0 makes no sense for this application, nor do we have data close to X1 = X3 = 0, so we cannot meaningfully interpret b0 = 110.43.
• We expect an increase of 5 weekly labor hours when total weight increases 1000 pounds and average shipment weight remains constant, for total weights of 2000–10,000 pounds and average weights of 10–30 pounds (95% confident the increase is 0.23–9.77).
• We expect a decrease of 2.01 weekly labor hours when average weight increases 1 pound and total weight remains constant, for total weights of 2000–10,000 pounds and average weights of 10–30 pounds (95% confident the decrease is 0.60–3.42).
• We can expect a prediction of unobserved weekly labor hours from particular values of total weight shipped and average shipment weight to be accurate to within approximately ±17.6 (with 95% confidence).
• 80.8% of the variation in weekly labor hours (about its mean) can be explained by a multiple linear regression relationship between labor hours and (total weight shipped, average shipment weight).
• Estimate the mean (or expected) value of Y at particular values of (X1, X2, ..., Xk).
• Formula: Ŷ ± t-percentile × (s_Ŷ).
• The interval is narrower:
  ◦ when n is large;
  ◦ when the X's are close to their sample means;
  ◦ when the regression standard error, s, is small;
  ◦ for lower levels of confidence.
• Example: for the shipping example's two-predictor model, the 95% confidence interval for E(Y) when X1 = 6 and X3 = 20 is (95.4, 105.0).
• Interpretation: we're 95% confident that expected weekly labor hours is between 95.4 and 105.0 when total weight shipped is 6000 pounds and average shipment weight is 20 pounds.
Prediction interval for an individual Y-value
• Predict an individual value of Y at particular values of (X1, X2, ..., Xk).
• Formula: Ŷ ± t-percentile × (s_Ŷ*).
• The interval is narrower:
  ◦ when n is large;
  ◦ when the X's are close to their sample means;
  ◦ when the regression standard error, s, is small;
  ◦ for lower levels of confidence.
• Since s_Ŷ* > s_Ŷ, the prediction interval is wider than the confidence interval.
• Example: for the shipping example's two-predictor model, the 95% prediction interval for Y* when X1 = 6 and X3 = 20 is (81.0, 119.4).
• Interpretation: we're 95% confident that actual labor hours in a week is between 81.0 and 119.4 when total weight shipped is 6000 pounds and average shipment weight is 20 pounds. (A software sketch of both intervals follows.)
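Statistical software computes both intervals from the fitted model; a sketch with statsmodels, using invented shipping data in place of the course file:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: X1 = total weight (k lbs), X3 = ave. shipment weight (lbs)
X = np.column_stack([[2, 4, 5, 6, 8, 10], [12, 18, 20, 22, 26, 30]])
y = np.array([112.0, 120.5, 118.0, 125.0, 133.5, 140.0])

fit = sm.OLS(y, sm.add_constant(X)).fit()
pred = fit.get_prediction([[1.0, 6.0, 20.0]])  # constant term, X1 = 6, X3 = 20

print(pred.conf_int(alpha=0.05))            # 95% confidence interval for E(Y)
print(pred.conf_int(obs=True, alpha=0.05))  # wider 95% prediction interval for Y*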
Table B.1 contains critical values or percentiles for t-distributions; a description of how to use the table precedes it. Figure B.1 illustrates how to use the table to find bounds for an upper-tail p-value. Bounds for a lower-tail p-value involve a similar procedure for the negative (left-hand) side of the density curve. To find bounds for a two-tail p-value, multiply each bound for the corresponding upper-tail p-value by 2; for example, the two-tail p-value for the situation in Figure B.1 lies between 0.05 and 0.10.

Use Table B.1 and Figure B.2 to find critical values or percentiles for t-distributions; each row of the table corresponds to a t-distribution with the degrees of freedom shown in the left-hand column. The critical values in the body of the table represent values along the horizontal axis of the figure. Each upper-tail significance level in bold at the top of the table represents the area under the curve to the right of a critical value. For example, if the curve in the figure represents a t-distribution with 60 degrees of freedom, the right-hand shaded area under the curve to the right of the critical value 2.000 represents an upper-tail significance level of 0.025. Each two-tail significance level in bold at the bottom of the table represents the sum of the areas to the right of a critical value and to the left of the negative of that critical value. For example, for a t-distribution with 60 degrees of freedom, the sum of the shaded areas under the curve to the right of the critical value 2.000 and to the left of −2.000 represents a two-tail significance level of 0.05.

For t-distributions with degrees of freedom not in the table (e.g., 45), to be conservative you should use the table row corresponding to the next lowest number (i.e., 40 for 45 degrees of freedom), although you will lose some accuracy when you do this. Alternatively, use
computer help #8 in the software information files available from the book website to find exact percentiles (or critical values). For example, computer software will show that the 97.5th percentile of the t-distribution with 40 degrees of freedom is 2.021, while the 97.5th percentile of the t-distribution with 45 degrees of freedom is 2.014. Be careful to input the correct significance level when using software to find t-percentiles. For example, if you enter "0.05," software that expects a one-tail significance level will return the 95th percentile, whereas software that expects a two-tail significance level will return the 97.5th percentile. Use computer help #9 to turn these calculations around and find tail areas (or p-values). Again, be careful about whether the software is working with one- or two-tail areas. For example, the upper-tail area corresponding to the test statistic 2.021 for a t-distribution with 40 degrees of freedom is 0.025, while the two-tail area corresponding to the test statistic 2.021 for a t-distribution with 40 degrees of freedom is 0.05.
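For readers working in Python rather than the book's software, scipy reproduces these lookups (the numbers below are the ones quoted in the passage):

from scipy import stats

print(stats.t.ppf(0.975, df=40))     # 97.5th percentile with 40 df: 2.021
print(stats.t.ppf(0.975, df=45))     # 97.5th percentile with 45 df: 2.014

print(stats.t.sf(2.021, df=40))      # upper-tail area: about 0.025
print(2 * stats.t.sf(2.021, df=40))  # two-tail area: about 0.05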
Figure B.1 Density curve for a t-distribution showing two critical values from Table B.1 to the left and to the right of a calculated test statistic. The upper-tail p-value is between the corresponding upper-tail significance levels at the top of the table, in this case 0.025 and 0.05.

Figure B.2 Density curve for a t-distribution showing critical values (or percentiles or t-statistics) along the horizontal axis and significance levels (or probabilities or p-values) as areas under the curve.
Table B.1 Percentiles or critical values for t-distributions. The final row of the table, labeled Z, represents the standard normal distribution (equivalent to a t-distribution with infinite degrees of freedom).
t-distribution upper-tail significance level:
df        0.1      0.05     0.025    0.01     0.005    0.001
2         1.886    2.920    4.303    6.965    9.925    22.327
3         1.638    2.353    3.182    4.541    5.841    10.215
4         1.533    2.132    2.776    3.747    4.604    7.173
5         1.476    2.015    2.571    3.365    4.032    5.893
6         1.440    1.943    2.447    3.143    3.707    5.208
7         1.415    1.895    2.365    2.998    3.499    4.785
8         1.397    1.860    2.306    2.896    3.355    4.501
9         1.383    1.833    2.262    2.821    3.250    4.297
10        1.372    1.812    2.228    2.764    3.169    4.144
11        1.363    1.796    2.201    2.718    3.106    4.025
12        1.356    1.782    2.179    2.681    3.055    3.930
13        1.350    1.771    2.160    2.650    3.012    3.852
14        1.345    1.761    2.145    2.624    2.977    3.787
15        1.341    1.753    2.131    2.602    2.947    3.733
16        1.337    1.746    2.120    2.583    2.921    3.686
17        1.333    1.740    2.110    2.567    2.898    3.646
18        1.330    1.734    2.101    2.552    2.878    3.610
19        1.328    1.729    2.093    2.539    2.861    3.579
20        1.325    1.725    2.086    2.528    2.845    3.552
21        1.323    1.721    2.080    2.518    2.831    3.527
22        1.321    1.717    2.074    2.508    2.819    3.505
23        1.319    1.714    2.069    2.500    2.807    3.485
24        1.318    1.711    2.064    2.492    2.797    3.467
25        1.316    1.708    2.060    2.485    2.787    3.450
26        1.315    1.706    2.056    2.479    2.779    3.435
27        1.314    1.703    2.052    2.473    2.771    3.421
28        1.313    1.701    2.048    2.467    2.763    3.408
29        1.311    1.699    2.045    2.462    2.756    3.396
30        1.310    1.697    2.042    2.457    2.750    3.385
40        1.303    1.684    2.021    2.423    2.704    3.307
50        1.299    1.676    2.009    2.403    2.678    3.261
60        1.296    1.671    2.000    2.390    2.660    3.232
70        1.294    1.667    1.994    2.381    2.648    3.211
80        1.292    1.664    1.990    2.374    2.639    3.195
90        1.291    1.662    1.987    2.368    2.632    3.183
100       1.290    1.660    1.984    2.364    2.626    3.174
200       1.286    1.653    1.972    2.345    2.601    3.131
500       1.283    1.648    1.965    2.334    2.586    3.107
1000      1.282    1.646    1.962    2.330    2.581    3.098
Z         1.282    1.645    1.960    2.326    2.576    3.090
          0.2      0.1      0.05     0.02     0.01     0.002
t-distribution two-tail significance level.
Notation and Formulas

Upper-tail critical value for testing E(Y): t-percentile from t_{n−1} (significance level = area to the right)
Lower-tail critical value for testing E(Y): t-percentile from t_{n−1} (significance level = area to the left)
Two-tail critical value for testing E(Y): t-percentile from t_{n−1} (significance level = sum of tail areas)
Upper-tail p-value for testing E(Y): area under the t_{n−1} curve to the right of the t-statistic
Lower-tail p-value for testing E(Y): area under the t_{n−1} curve to the left of the t-statistic
Two-tail p-value for testing E(Y): 2 × area under the t_{n−1} curve beyond the t-statistic
Model for univariate data: Y = E(Y) + e
Point estimate for E(Y): mY
Confidence interval for E(Y): mY ± (t-percentile from t_{n−1})(sY/√n)
Point estimate for Y* (prediction): mY
Prediction interval for Y*: mY ± (t-percentile from t_{n−1})(sY √(1 + 1/n))
C.2 SIMPLE LINEAR REGRESSION

Notation and Formulas

Response values: Y; predictor values: X; sample size: n
Simple linear regression model: Y = E(Y) + e = b0 + b1X + e
Fitted regression model for E(Y): Ŷ = b̂0 + b̂1X
Estimated errors or residuals: ê = Y − Ŷ
Residual sum of squares: RSS = Σ_{i=1}^{n} êᵢ²
Regression standard error: s = √(RSS/(n − 2))
(with 95% confidence, we can expect to predict Y to within approx. ±2s)
Total sum of squares: TSS = Σ_{i=1}^{n} (Yi − mY)²
Coefficient of determination: R² = 1 − RSS/TSS
(linear association between Y and X explains R² of the variation in Y)
Coefficient of correlation: r = Σ_{i=1}^{n} (Yi − mY)(Xi − mX) / √(Σ_{i=1}^{n} (Yi − mY)² Σ_{i=1}^{n} (Xi − mX)²)
(r tells us the strength and direction of any linear association between Y and X)
t-statistic for testing b1: (b̂1 − b1)/s_{b̂1} (the test value, b1, is usually 0)
Upper-tail critical value for testing b1: t-percentile from t_{n−2} (significance level = area to the right)
Lower-tail critical value for testing b1: t-percentile from t_{n−2} (significance level = area to the left)
Two-tail critical value for testing b1: t-percentile from t_{n−2} (significance level = sum of tail areas)
Upper-tail p-value for testing b1: area under the t_{n−2} curve to the right of the t-statistic
Lower-tail p-value for testing b1: area under the t_{n−2} curve to the left of the t-statistic
Two-tail p-value for testing b1: 2 × area under the t_{n−2} curve beyond the t-statistic
Confidence interval for b1: b̂1 ± (t-percentile from t_{n−2})(s_{b̂1})
Point estimate for E(Y) (estimation) at Xp: Ŷ = b̂0 + b̂1Xp
Confidence interval for E(Y) at Xp: Ŷ ± (t-percentile from t_{n−2})(s_Ŷ)
Point estimate for Y* (prediction) at Xp: Ŷ = b̂0 + b̂1Xp
Prediction interval for Y* at Xp: Ŷ ± (t-percentile from t_{n−2})(s_Ŷ*)
Standard error of estimation: s_Ŷ = s √(1/n + (Xp − mX)²/Σ_{i=1}^{n} (Xi − mX)²)
Standard error of prediction: s_Ŷ* = s √(1 + 1/n + (Xp − mX)²/Σ_{i=1}^{n} (Xi − mX)²)
C.3 MULTIPLE LINEAR REGRESSION

Notation and Formulas

Response values: Y; predictor values: X1, X2, ..., Xk; sample size: n
Multiple linear regression model: Y = E(Y) + e = b0 + b1X1 + b2X2 + ··· + bkXk + e
Interpreting regression parameters in models such as E(Y) = b0 + b1X1 + b2X2:
b1 = expected change in Y when X1 increases by one unit (and X2 stays fixed)
Fitted regression model for E(Y): Ŷ = b̂0 + b̂1X1 + b̂2X2 + ··· + b̂kXk
Estimated errors or residuals: ê = Y − Ŷ
Residual sum of squares: RSS = Σ_{i=1}^{n} êᵢ²
Regression standard error: s = √(RSS/(n − k − 1))
(with 95% confidence, we can expect to predict Y to within approx. ±2s)
Total sum of squares: TSS = Σ_{i=1}^{n} (Yi − mY)²
Coefficient of determination: R² = 1 − RSS/TSS
[the linear regression model for (X1, ..., Xk) explains R² of the variation in Y]
Adjusted R² = 1 − ((n − 1)/(n − k − 1))(1 − R²)
Multiple R = √R² (the correlation between the observed Y-values and the fitted Ŷ-values)
Global F-statistic for testing b1 = b2 = ··· = bk = 0:
((TSS − RSS)/k) / (RSS/(n − k − 1)) = (R²/k) / ((1 − R²)/(n − k − 1))
Critical value: F-percentile from F_{k, n−k−1} (significance level = area to the right)
p-value: area under the F_{k, n−k−1} curve to the right of the F-statistic
Critical value: F-percentile from F^- r^ - i (significance level = area to the right) 105 p-value: area under the Ft-r^-i curve to the right of the F-statistic 105
bn ~bn
t-statistic for testing b„: -*■ - (the test value, b„, is usually 0) 109 Upper-tail critical value for testing bp: t-percentile from f„_/t_i
(significance level = area to the right) 111 Lower-tail critical value for testing bp: t-percentile from fn-t-i
(significance level = area to the left) 111 Two-tail critical value for testing bp: t-percentile from f„-*_i
(significance level = sum of tail areas) 110 Upper-tail p-value for testing bp: area under /„_*-] curve to right of t-statistic 111 Lower-tail p-value for testing bp: area under f„_*_i curve to left of t-statistic 111 Two-tail p-value for testing bp: 2 x area under f„_t_i curve beyond t-statistic 111 Confidence interval for bp: bp ± (t-percentile from /„_*_!) (s^ ) 113 Regression parameter standard errors: square roots of the diagonal entries of s2 (XTX) ~' 118 Point estimate for E(Y) (estimation) at (X\ ,X2,...,X^):
Y = b0 + blXi+b2X2+-+bkXk 126 Confidence interval forE{Y) at(Xi,X2,...,Xk):
Y ± (t-percentile from f„_n )(sf) ^6
Point estimate for Y* (prediction) at (X\ ,X2, ■ ■. ,Xk):
Y = bo + blXl+b2X2 + --+bkXk 128 Prediction interval for Y* at (X) ,X2,... ,Xk):
Y ± (t-percentile from r^t- i ) isf«) 128 Standard error of estimation: Sy = s\/xT(XJX)~lx 129 Standard error of prediction: sY, = s\/l +xT(XTX)_ 1x 129 Models with loge(F) as the response, for example, E(loge(y)) = bo + b\ X\ + b2X2:
exp(fc]) — 1 = proportional change in Y when X\ increases by 1 unit 155 (and X2 stays fixed)
Standardized residual: r,- = ,'< . 193
Studentized residual: /, = n J " ^ ^ 193
Leverages: diagonal entries of H = X(XTX)" * XT 196 Leverage (alternate formula): hi = , . j ; . ,2 'A 199
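The matrix formulas in this table are straightforward to check numerically. The sketch below (Python with numpy and scipy; the data and variable names are hypothetical, ours rather than the book's) computes the least squares estimates, the global F-statistic with its p-value, the parameter standard errors from $s^2 (X^{\mathsf{T}} X)^{-1}$, and the leverages and standardized residuals:

```python
import numpy as np
from scipy import stats

# Hypothetical data: n observations, k = 2 predictors
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0])
Y  = np.array([3.1, 3.9, 6.2, 6.8, 9.1, 9.7, 12.3])
n, k = len(Y), 2

X = np.column_stack([np.ones(n), X1, X2])   # design matrix with intercept
XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ Y                   # least squares estimates

resid = Y - X @ b_hat
rss = np.sum(resid ** 2)
tss = np.sum((Y - Y.mean()) ** 2)
s = np.sqrt(rss / (n - k - 1))              # regression standard error
r2 = (tss - rss) / tss

# Global F-statistic for b1 = b2 = 0 and its p-value
F = ((tss - rss) / k) / (rss / (n - k - 1))
p_value = stats.f.sf(F, k, n - k - 1)       # area to the right of F

# Parameter standard errors: sqrt of diagonal of s^2 (X'X)^{-1}
se_b = s * np.sqrt(np.diag(XtX_inv))
t_stats = b_hat / se_b                      # t-statistics for testing b_p = 0

# Leverages: diagonal of the hat matrix H = X (X'X)^{-1} X'
h = np.diag(X @ XtX_inv @ X.T)
r_std = resid / (s * np.sqrt(1 - h))        # standardized residuals

print(b_hat, s, r2, F, p_value, se_b, t_stats)
```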
GLOSSARY

ANOVA test See Global usefulness test and Nested model test.
Autocorrelation Data collected over time can result in regression model residuals that violate the independence assumption because they are highly dependent across time (p. 202). Also called serial correlation.
Average See Mean.
Bivariate Datasets with two variables measured on a sample of observations (p. 35).
Categorical See Qualitative.
Collinearity See Multicollinearity.
Confidence interval A range of values that we are reasonably confident (e.g., 95%) contains an unknown population parameter such as a population mean or a regression parameter (p. 16). Also called a mean confidence interval.
Cook's distance A measure of the potential influence of an observation on a regression
model, due to either outlyingness or high leverage (p. 196).
Correlation A measure of linear association between two quantitative variables (p. 50).
Covariate(s) See Predictor variable(s).
Critical value A percentile from a probability distribution (e.g., t or F) that defines the
rejection region in a hypothesis test (p. 20).
Degrees of freedom Whole numbers for t, F, and $\chi^2$ distributions that determine the
shape of the density function, and therefore also critical values and p-values (p. 14).
Density curve Theoretical smoothed histogram for a probability distribution that shows the relative frequency of particular values for a random variable (p. 6).
Dependent variable See Response variable.
Distribution Theoretical model that describes how a random variable varies, that is, which values it can take and their associated probabilities (p. 5).
Dummy variables See Indicator variables.
Expected value The population mean of a variable.
Extrapolation Using regression model results to estimate or predict a response value for an observation with predictor values that are very different from those in our sample (p. 213).
Fitted value The estimated expected value, $\hat{Y}$, of the response variable in a regression model (p. 88). Also called an (unstandardized) predicted value.
Global usefulness test Hypothesis test to see whether any of the predictors in a multiple
linear regression model are significant (p. 101). An example of an ANOVA test.
Hierarchy A modeling guideline that suggests including lower-order predictor terms when also using higher-order terms, for example, keep $X_1$ when using $X_1^2$, keep $X_1$ and $X_2$ when using $X_1 X_2$, and keep $X_2$ when using $D X_2$ (p. 145).
Histogram A bar chart showing relative counts (frequencies) within consecutive ranges
(bins) of a variable (p. 3).
Hypothesis test A method for deciding which of two competing hypotheses about a
population parameter seems more reasonable (p. 19).
Imputation One method for dealing with missing data by replacing the missing values with imputed numbers, which might be sample means, model predictions, and so on (p. 215).
Independent variable(s) See Predictor variable(s).
Indicator variables Variables derived from qualitative variables that have values of 1 for
one category and 0 for all other categories (p. 167). Also called dummy variables.
Individual prediction interval See Prediction interval.
Input variable(s) See Predictor variable(s).
Interaction When the effect of one predictor variable on a response variable depends on
the value of another predictor variable (p. 159).
Least squares The computational criterion used to derive regression parameter estimates by minimizing the residual sum of squares, where the residuals are the differences between observed $Y$-values and fitted $\hat{Y}$-values (p. 88).
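As a small illustration of this criterion (a sketch with hypothetical data, using numpy's general-purpose least squares routine rather than anything specific to this book):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

A = np.column_stack([np.ones_like(X), X])      # columns: intercept, X
b_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)  # minimizes the RSS
resid = Y - A @ b_hat
print(b_hat, np.sum(resid ** 2))               # estimates and minimized RSS
```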
Leverage A measure of the potential influence of a sample observation on a fitted regression model (p. 194).
Loess fitted line A smooth line for a scatterplot that fits a general nonlinear curve
representing the association between the variables on the two axes (p. 120).
Mean A measure of the central tendency of a variable, also known as the average (p. 4).
Median An alternative measure of the central tendency of a variable, which is greater
than half the sample values and less than the other half (p. 4).
Multicollinearity When there is excessive correlation between quantitative predictor variables that can lead to unstable multiple regression models and inflated standard errors (p. 206). Also called collinearity.
Multiple R The correlation between the observed $Y$-values and the fitted $\hat{Y}$-values from a regression model (p. 100).
Multivariate Datasets with two or more variables measured on a sample of observations
(p. 83).
Natural logarithm transformation A mathematical transformation for positive-valued quantitative variables which spreads out low values and pulls in high values; that is, it makes positively skewed data look more normal (p. 142).
Nested model test Hypothesis test to see whether a subset of the predictors in a multiple linear regression model is significant (p. 104). An example of an ANOVA test. Also called an R-squared change test.
Nominal See Qualitative.
Normal probability plot See QQ-plot.
Observed significance level See p-value.
Ordinal See Qualitative.
Outcome variable See Response variable.
Outlier A sample observation in a linear regression model with a studentized residual less than $-3$ or greater than $+3$ (p. 190).
Output variable See Response variable.
p-value The probability of observing a test statistic as extreme as the one observed or
even more extreme (in the direction that favors the alternative hypothesis) (p. 21).
Parameter A numerical summary measure for a population such as a population mean
or a regression parameter (p. 11).
Percentile A number that is greater than a certain percentage (say, 95%) of the sample
values and less than the remainder (5% in this case) (p. 4). Also called a quantile.
Point estimate A single number used as an estimate of a population parameter. For
example, the sample mean is a point estimate of the population mean (p. 15).
Polynomial transformation A mathematical transformation involving increasing powers
of a quantitative variable, for example, $X$, $X^2$, and $X^3$ (p. 144).
Population The entire collection of objects of interest about which we would like to
make statistical inferences (p. 5).
Predicted value See Fitted value.
Prediction interval A range of values that we are reasonably confident (e.g., 95%) contains an unknown data value (such as for univariate data or for a regression response variable) (p. 25). Also called an individual prediction interval.
Predictor effect plot A line graph that shows how a regression response variable varies
with a predictor variable holding all other predictors constant (p. 224).
Predictor variable(s) Variable(s) in a regression model that we use to help estimate or predict the response variable; also known as independent or input variable(s), or covariate(s) (p. 83).
Probability Mathematical method for quantifying the likelihood of particular events
occurring (p. 9).
QQ-plot A scatterplot used to assess the normality of some sample values (p. 8).
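For example, a QQ-plot of regression residuals might be drawn as follows (a sketch assuming scipy and matplotlib are available; the residuals here are simulated stand-ins):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Simulated stand-in for residuals from a fitted regression model
resid = np.random.default_rng(1).normal(size=50)

# Sample quantiles plotted against theoretical normal quantiles;
# points close to the reference line support the normality assumption
stats.probplot(resid, dist="norm", plot=plt)
plt.show()
```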
Quadratic A particular type of polynomial transformation that uses a variable and its square, for example, $X$ and $X^2$ (p. 145).
Qualitative Data variable that contains labels for categories to which each sample observation belongs (p. 166). Also called categorical, nominal (if there is no natural order to the categories, e.g., male/female), or ordinal (if there is a natural order to the categories, e.g., small/medium/large).
Quantile See Percentile.
Quantitative Data variable that contains meaningful numerical values that measure some
characteristic for each sample observation. Also called a scale measure (p. 35).
R-squared ($R^2$) The proportion of variation in a regression response variable (about its
mean) explained by the model (p. 94).
R-squared change test See Nested model test.
Reciprocal transformation A mathematical transformation that divides a quantitative variable into 1, for example, $1/X$ (p. 147).
Reference level One of the categories of a qualitative variable selected to be the comparison level for all the other categories. It takes the value zero for each of the indicator variables used (p. 174).
Regression coefficients See Regression parameters.
Regression parameters The numbers multiplying the predictor values in a multiple linear regression model, that is, $(b_1, b_2, \ldots)$ in $E(Y) = b_0 + b_1 X_1 + b_2 X_2 + \cdots$. Also called (unstandardized) regression coefficients (p. 86).
Regression standard error (s) An estimate of the standard deviation of the random errors in a multiple linear regression model (p. 93). Also called standard error of the estimate in SPSS, root mean squared error in SAS, and residual standard error in R.
Rejection region The range of values for a probability distribution that leads to rejection of a null hypothesis if the test statistic falls in this range (p. 20).
Residual The difference, $\hat{e}$, between a response $Y$-value and a fitted $\hat{Y}$-value in a regression model (p. 119).
Residual standard error R terminology for regression standard error.
Response variable Variable, Y, in a regression model that we would like to estimate or
predict (p. 83). Also known as a dependent, outcome, or output variable.
Root mean squared error SAS terminology for regression standard error.
Sample A (random) subset of the population for which we have data values (p. 11).
Sampling distribution The probability distribution of a test statistic under (hypothetical)
repeated sampling (p. 12).
Scatterplot A graph representing bivariate data with one variable on the vertical axis and
the other on the horizontal axis (p. 37).
Scatterplot matrix A matrix of scatterplots representing all bivariate associations in a
set of variables (p. 89).
Serial correlation See Autocorrelation.
Significance level The probability of falsely rejecting a null hypothesis when it is true— used as a threshold for determining significance when a p-value is less than this (p. 20).
Standardize Rescale a variable by subtracting a sample mean value and dividing by a sample standard deviation value. The resulting Z-value has a mean equal to 0 and a standard deviation equal to 1 (p. 4).
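A quick numerical check of this definition (hypothetical values):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = (x - x.mean()) / x.std(ddof=1)  # subtract mean, divide by sample SD
print(np.round(z.mean(), 12), z.std(ddof=1))  # approximately 0 and 1
```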
Standard deviation A measure of the spread of a variable, with most of the range of a
normal random variable contained within 3 standard deviations of the mean (p. 4).
Standard error An estimate of a population standard deviation, often used to quantify the sampling variability of a test statistic or model estimate (p. 26).
Standard error of a regression parameter A standard deviation estimate used in hypothesis tests and confidence intervals for regression parameters (p. 111).
Standard error of estimation A standard deviation estimate used in hypothesis tests and
confidence intervals for a univariate population mean (p. 26).
Standard error of estimation for regression A standard deviation estimate used in
confidence intervals for the population mean in a regression model (p. 126).
Standard error of prediction A standard deviation estimate used in prediction intervals
for a univariate prediction (p. 26).
Standard error of prediction for regression A standard deviation estimate used in
prediction intervals for an individual response value in a regression model (p. 128).
Standard error of the estimate SPSS terminology for regression standard error.
Statistic A numerical summary measure for a sample such as a sample mean or an
estimated regression parameter (p. 11).
Stem-and-leaf plot A variant on a histogram where numbers in the plot represent actual
sample values or rounded sample values (p. 2).
Test statistic A rescaled numerical summary measure for a sample that has a known sampling distribution under a null hypothesis, for example, a t-statistic for a univariate mean or a t-statistic for a regression parameter (p. 19).
Unbiased When a statistic is known to estimate the value of the population parameter
correctly on average under repeated sampling (p. 11).
Univariate Datasets with a single variable measured on a sample of observations (p. 1).
Variance The square of the standard deviation (p. 10).
Variance inflation factor (VIF) An estimate of how much larger the variance of a regression parameter estimate becomes when the corresponding predictor is included in the model (p. 206).
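One common way to compute a VIF, sketched below with hypothetical data (the vif helper is our own illustration, not a function from any particular library), is to regress each predictor on all the others and set $\mathrm{VIF}_j = 1/(1 - R_j^2)$:

```python
import numpy as np

def vif(X):
    """VIF for each column of predictor matrix X (no intercept column):
    regress column j on the other columns and return 1 / (1 - R_j^2)."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ b
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)

# Hypothetical, strongly correlated predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)   # nearly collinear with x1
print(vif(np.column_stack([x1, x2])))  # both VIFs come out very large
```

Because x2 is nearly a copy of x1 here, both VIFs are very large, reflecting exactly the instability and inflated standard errors that the Multicollinearity entry warns about.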
Z-value See Standardize.