CHAPTER 6
Further Inference in the Multiple
Regression Model
CHAPTER OUTLINE
6.1 The F-Test
6.1.1 Testing the Significance of the Model
6.1.2 Relationship between t- and F-tests
6.1.3 More General F-tests
6.2 Nonsample Information
6.3 Model Specification
6.3.1 Omitted Variables
6.3.2 Irrelevant Variables
6.3.3 Choosing the Model
Model Selection Criteria
RESET
6.4 Poor Data, Collinearity, and Insignificance
Key Terms
Chapter 6 do-file
6.1 THE F-TEST
The example used in this chapter is a model of sales for Big Andy's Burger Barn considered in
Chapter 5. The model includes three explanatory variables and a constant:
SALESi = β1 + β2 PRICEi + β3 ADVERTi + β4 ADVERTi^2 + ei
where SALESi is monthly sales in a given city and is measured in $1,000 increments, PRICEi is
price of a hamburger measured in dollars, ADVERTi is the advertising expenditure also measured
in thousands of dollars and i=1, 2, … , N.
The null hypothesis is that advertising has no effect on average sales. Since the marginal effect
of advertising is ∂E(SALESi)/∂ADVERTi = β3 + 2β4 ADVERTi, for this marginal effect to be zero
for all values of advertising requires β3 = 0 and β4 = 0. The alternative is β3 ≠ 0 or β4 ≠ 0. The
parameters of the model under the null hypothesis are restricted to be zero and the parameters
under the alternative are unrestricted.
The F-test compares the sum of squared errors from the unrestricted model to that of the
restricted model. A large difference is taken as evidence that the restrictions are false. The
statistic used to test the null hypothesis (restrictions) is
F = [(SSER − SSEU) / J] / [SSEU / (N − K)],
which has an F-distribution with J numerator and N−K denominator degrees of freedom when the
restrictions are true.
The statistic is computed by running two regressions. The first is unrestricted; the second has
the restrictions imposed. Save the sum of squared errors from each regression, the degrees of
freedom from the unrestricted regression (N−K), and the number of independent restrictions
imposed (J). Then, compute the following:
F = [(SSER − SSEU) / J] / [SSEU / (N − K)] = [(1896.391 − 1532.084) / 2] / [1532.084 / (75 − 4)] = 8.44
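The arithmetic above can be verified outside Stata. The following is a minimal Python sketch using the sums of squared errors, restriction count, and degrees of freedom given in the text:

```python
# Verify the F-statistic arithmetic from the restricted/unrestricted SSEs.
sse_r = 1896.391   # restricted sum of squared errors
sse_u = 1532.084   # unrestricted sum of squared errors
J = 2              # number of restrictions (beta3 = 0 and beta4 = 0)
N, K = 75, 4       # observations and parameters in the unrestricted model

F = ((sse_r - sse_u) / J) / (sse_u / (N - K))
print(round(F, 2))  # 8.44
```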
To estimate this model, load the data file andy.dta
use andy, clear
In Stata’s variables window, you’ll see that the data contain three variables: sales, price, and
advert. These are used with the regress command to estimate the unrestricted model
regress sales price advert c.advert#c.advert
Save the sum of squared errors into a new scalar called sseu using the stored result e(rss) and
the residual degrees of freedom from the analysis of variance table into a scalar called
df_unrest using e(df_r).
scalar sseu = e(rss)
scalar df_unrest = e(df_r)
Next, impose the restriction on the model and reestimate it using least squares. Again, save the
sum of squared errors and the residual degrees of freedom.
regress sales price
scalar sser = e(ssr)
scalar df_rest = e(df_r)
The saved residual degrees of freedom from the restricted model can be used to obtain the
number of restrictions imposed. Each unique restriction in a linear model reduces the number of
parameters in the model by one. So, imposing two restrictions on a four parameter unrestricted
model (e.g., Big Andy’s) reduces the number of parameters in the restricted model to two. Let Kr
be the number of parameters in the restricted model and Ku the number in the unrestricted model.
Subtracting the degrees of freedom in the unrestricted model (N−Ku) from those of the restricted
model (N−Kr) will yield the number of restrictions you’ve imposed, i.e., (N−Kr) − (N−Ku) =
(Ku−Kr) = J. In Stata,
scalar J = df_rest - df_unrest
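The degrees-of-freedom bookkeeping can be sketched with the numbers from this example (N = 75, four parameters unrestricted, two restricted):

```python
# Recover the number of restrictions J from the two residual df values.
N = 75                  # sample size
K_u = 4                 # parameters in the unrestricted model
K_r = 2                 # parameters in the restricted model
df_unrest = N - K_u     # 71, from the unrestricted regression
df_rest = N - K_r       # 73, from the restricted regression

J = df_rest - df_unrest
print(J)  # 2
```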
Then, the F-statistic can be computed
scalar fstat = ((sser-sseu)/J)/(sseu/(df_unrest))
The critical value from the F(J,N−K) distribution and the p-value for the computed statistic can be
computed in the usual way. In this case, invFtail(J,N-K,α) generates the α-level critical value
from the F-distribution with J numerator and N−K denominator degrees of freedom. The
Ftail(J,N-K,fstat) function works similarly to return the p-value for the computed statistic,
fstat.
scalar crit1 = invFtail(J,df_unrest,.05)
scalar pvalue = Ftail(J,df_unrest,fstat)
scalar list sseu sser J df_unrest fstat pvalue crit1
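As a cross-check on invFtail() and Ftail(), note that for 2 numerator degrees of freedom the F survival function has a closed form, so the p-value and 5% critical value can be computed in plain Python (this shortcut is specific to J = 2, the case in this example):

```python
# Closed-form check of the F-test p-value and critical value.
# For an F(2, m) distribution, P(F > x) = (1 + 2x/m)^(-m/2), so with
# J = 2 both quantities have closed forms (valid only for J = 2).
J, m = 2, 71            # numerator and denominator degrees of freedom
fstat = 8.44            # F-statistic computed above
alpha = 0.05

pvalue = (1 + J * fstat / m) ** (-m / 2)        # mirrors Ftail(2,71,fstat)
crit = (m / J) * ((1 / alpha) ** (J / m) - 1)   # mirrors invFtail(2,71,.05)
print(round(crit, 3), round(pvalue, 4))
```

Since 8.44 exceeds the critical value of roughly 3.13, the null hypothesis is rejected at the 5% level.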
The output lists the saved scalars sseu, sser, J, df_unrest, fstat, pvalue, and crit1.
The dialog boxes can also be used to test restrictions on the parameters of the model. The
first step is to estimate the model using regress. This proceeds just as it did in section 5.1 above.
Select Statistics > Linear models and related > Linear regression from the pull-down menu.
This reveals the regress dialog box. Using sales as the dependent variable and price, advert,
and the interaction c.advert#c.advert as independent variables in the regress–Linear
regression dialog box, run the regression by clicking OK. Once the regression is estimated, post-
estimation commands are used to test the hypothesis. From the pull-down menu select Statistics
> Postestimation > Tests > Test parameters, which brings up the testparm dialog box:
KEY TERMS
constraint
e(df_r)
e(r2)
e(r2_a)
e(rank)
e(rss)
estat ovtest
estimates store
estimates table
irrelevant variables
joint significance test
lincom
Manage constraints
model selection
omitted variables
overall F-test
predict, xb
program
restricted sum of squares
Schwarz criterion
test (hypoth 1)(hypoth 2)
testparm varlist
t-ratio
ttail(df,tstat)
unrestricted sum of squares
CHAPTER 6 DO-FILE [CHAP06.DO]
* file chap06.do for Using Stata for Principles of Econometrics, 4e
* cd c:\data\poe4stata
* Stata do-file
* copyright C 2011 by Lee C. Adkins and R. Carter Hill
* used for "Using Stata for Principles of Econometrics, 4e"
* by Lee C. Adkins and R. Carter Hill (2011)
* John Wiley and Sons, Inc.
* setup
version 11.1
capture log close
set more off

* open log
log using chap06, replace text

use andy, clear

* -------------------------------------------
* The following block estimates Andy's sales
* and uses the difference in SSE to test
* a hypothesis using an F-statistic
* -------------------------------------------
* Unrestricted Model
regress sales price advert c.advert#c.advert
scalar sseu = e(rss)
scalar df_unrest = e(df_r)

* Restricted Model
regress sales price
scalar sser = e(rss)
scalar df_rest = e(df_r)
scalar J = df_rest - df_unrest

* F-statistic, critical value, pvalue
scalar fstat = ((sser-sseu)/J)/(sseu/(df_unrest))
scalar crit1 = invFtail(J,df_unrest,.05)
scalar pvalue = Ftail(J,df_unrest,fstat)
scalar list sseu sser J df_unrest fstat pvalue crit1

* -------------------------------------------
* Here, we use Stata's test statement
* to test hypothesis using an F-statistic
* Note: Three versions of the syntax
* -------------------------------------------
regress sales price advert c.advert#c.advert
testparm advert c.advert#c.advert
test (advert=0)(c.advert#c.advert=0)
test (_b[advert]=0)(_b[c.advert#c.advert]=0)

* -------------------------------------------
* Overall Significance of the Model
* Uses same Unrestricted Model as above
* -------------------------------------------
* Unrestricted Model (all variables)
regress sales price advert c.advert#c.advert
scalar sseu = e(rss)
scalar df_unrest = e(df_r)

* Restricted Model (no explanatory variables)
regress sales
scalar sser = e(rss)
scalar df_rest = e(df_r)
scalar J = df_rest - df_unrest

* F-statistic, critical value, pvalue
scalar fstat = ((sser-sseu)/J)/(sseu/(df_unrest))
scalar crit2 = invFtail(J,df_unrest,.05)
scalar pvalue = Ftail(J,df_unrest,fstat)
scalar list sseu sser J df_unrest fstat pvalue crit2

* -------------------------------------------
* Relationship between t and F
* -------------------------------------------
* Unrestricted Regression
regress sales price advert c.advert#c.advert
scalar sseu = e(rss)
scalar df_unrest = e(df_r)
scalar tratio = _b[price]/_se[price]
scalar t_sq = tratio^2
* Restricted Regression
regress sales advert c.advert#c.advert
scalar sser = e(rss)
scalar df_rest = e(df_r)
scalar J = df_rest - df_unrest

* F-statistic, critical value, pvalue
scalar fstat = ((sser-sseu)/J)/(sseu/(df_unrest))
scalar crit = invFtail(J,df_unrest,.05)
scalar pvalue = Ftail(J,df_unrest,fstat)
scalar list sseu sser J df_unrest fstat pvalue crit tratio t_sq

* -------------------------------------------
* Optimal Advertising
* Uses both syntaxes for test
* -------------------------------------------
* Equivalent to Two sided t-test
regress sales price advert c.advert#c.advert
test _b[advert]+3.8*_b[c.advert#c.advert]=1
test advert+3.8*c.advert#c.advert=1

* t stat for Optimal Advertising (use lincom)
lincom _b[advert]+3.8*_b[c.advert#c.advert]-1
lincom advert+3.8*c.advert#c.advert-1
scalar t = r(estimate)/r(se)
scalar pvalue2tail = 2*ttail(e(df_r),t)
scalar pvalue1tail = ttail(e(df_r),t)
scalar list t pvalue2tail pvalue1tail

* t stat for Optimal Advertising (alternate method)
gen xstar = c.advert#c.advert-3.8*advert
gen ystar = sales - advert
regress ystar price advert xstar
scalar t = (_b[advert])/_se[advert]
scalar pvalue = ttail(e(df_r),t)
scalar list t pvalue

* One-sided t-test
regress sales price advert c.advert#c.advert
lincom advert+3.8*c.advert#c.advert-1
scalar tratio = r(estimate)/r(se)
scalar pval = ttail(e(df_r),tratio)
scalar crit = invttail(e(df_r),.05)
scalar list tratio pval crit

* Joint Test
regress sales price advert c.advert#c.advert
test (_b[advert]+3.8*_b[c.advert#c.advert]=1) (_b[_cons]+6*_b[price]+1.9*_b[advert] ///
   +3.61*_b[c.advert#c.advert]= 80)

* -------------------------------------------
* Nonsample Information
* -------------------------------------------
use beer, clear
gen lq = ln(q)
gen lpb = ln(pb)
gen lpl = ln(pl)
gen lpr = ln(pr)
gen li = ln(i)
constraint 1 lpb+lpl+lpr+li=0
cnsreg lq lpb lpl lpr li, c(1)

* -------------------------------------------
* MROZ Examples
* -------------------------------------------
use edu_inc, clear
regress faminc he we
regress faminc he

* correlations among regressors
correlate
* Irrelevant variables
regress faminc he we kl6 x5 x6

* -------------------------------------------
* Stata uses the estat ovtest following
* a regression to do a RESET(3) test.
* -------------------------------------------
regress faminc he we kl6
estat ovtest

program modelsel
  scalar aic = ln(e(rss)/e(N))+2*e(rank)/e(N)
  scalar bic = ln(e(rss)/e(N))+e(rank)*ln(e(N))/e(N)
  di "r-square = "e(r2) " and adjusted r-square " e(r2_a)
  scalar list aic bic
end

quietly regress faminc he
di "Model 1 (he) "
modelsel
estimates store Model1

quietly regress faminc he we
di "Model 2 (he, we) "
modelsel
estimates store Model2

quietly regress faminc he we kl6
di "Model 3 (he, we, kl6) "
modelsel
estimates store Model3

quietly regress faminc he we kl6 x5 x6
di "Model 4 (he, we, kl6, x5, x6) "
modelsel
estimates store Model4

estimates table Model1 Model2 Model3 Model4, b(%9.3f) stfmt(%9.3f) se ///
   stats(N r2 r2_a aic bic)

regress faminc he we kl6
predict yhat
gen yhat2=yhat^2
gen yhat3=yhat^3
summarize faminc he we kl6

*-------------------------------
* Data are ill-conditioned
* Reset test won't work here
* Try it anyway!
*-------------------------------
regress faminc he we kl6 yhat2
test yhat2
regress faminc he we kl6 yhat2 yhat3
test yhat2 yhat3

*----------------------------------------
* Drop the previously defined predictions
* from the dataset
*----------------------------------------
drop yhat yhat2 yhat3

*--------------------------------
* Recondition the data by
* scaling FAMINC by 10000
* -------------------------------
gen faminc_sc = faminc/10000
regress faminc_sc he we kl6
predict yhat
gen yhat2 = yhat^2
gen yhat3 = yhat^3
summarize faminc_sc faminc he we kl6 yhat yhat2 yhat3
regress faminc_sc he we kl6 yhat2
test yhat2
regress faminc_sc he we kl6 yhat2 yhat3
test yhat2 yhat3
* Extraneous regressors
regress faminc he we kl6 x5 x6

* -------------------------------------------
* Cars Example
* -------------------------------------------
use cars, clear
summarize
corr
regress mpg cyl
regress mpg cyl eng wgt
test cyl
test eng
test eng cyl

* Auxiliary regressions for collinearity
* Check: r2 >.8 means severe collinearity
regress cyl eng wgt
scalar r1 = e(r2)
regress eng wgt cyl
scalar r2 = e(r2)
regress wgt eng cyl
scalar r3 = e(r2)
scalar list r1 r2 r3

log close
program drop modelsel
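The auxiliary-regression diagnostic at the end of the do-file can be illustrated outside Stata. The sketch below uses made-up data (x1 and x2 are hypothetical regressors, not the cars variables) and the fact that when the auxiliary regression has a single right-hand-side variable, its R-squared equals the squared correlation between the two regressors:

```python
# Auxiliary-regression collinearity check with hypothetical data.
# With one "other" regressor, the auxiliary R-squared is just the
# squared Pearson correlation between the two regressors.

def r_squared(x, y):
    """Squared Pearson correlation of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical regressors: x2 is nearly an exact linear function of x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]

r2_aux = r_squared(x1, x2)
print(r2_aux > 0.8)  # True: severe collinearity by the rule of thumb
```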