8/8/2019 L4&5 Multiple Regression 2010B
Chi-square goodness-of-fit test
The chi-square test can be used to determine whether sample data conform to an expected distribution when the data are categorical (nominal or ordinal). The test determines whether the data fit a given distribution, such as uniform or normal.
χ² = Σ (f_o - f_e)² / f_e,   df = k - 1 - m

Where:
f_o = frequency of observed (or actual) values
f_e = frequency of expected (or theoretical) values
k = number of categories
m = number of parameters being estimated from the sample data
Chi-square test for independence
The chi-square test for independence is based on the counts in a contingency (or cross-tabs) table. It tests whether the counts for the row categories are probabilistically independent of the counts for the column categories.
χ² = Σ_ij (O_ij - E_ij)² / E_ij,   df = (rows - 1)(cols - 1)

Where:
O_ij = observed number of observations in cell (i, j)
E_ij = expected number of observations in cell (i, j)
Chi-square test - Local survey
• In a national survey, consumers were asked: "In general, how would you rate the level of service that businesses in this country provide?"
• The distribution of responses is in the National column.
• Suppose a manager wants to find out whether this result applies to the customers of her store in the city.
• She put a similar survey to 207 randomly selected customers in her store and observed the results in the Local column.
• She can use the chi-square test to see if her observed frequencies of responses are the same as the frequencies that would be expected from the national survey.

Response      National   Local (of 207 asked)
Excellent     8%         21
Pretty good   47%        109
Only fair     34%        62
Poor          11%        15
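The local-survey calculation can be reproduced as a goodness-of-fit test in Python; a sketch, assuming scipy is available, using the proportions and counts from the table above:

```python
from scipy import stats

# National distribution (expected proportions) and local observed counts
national = {"Excellent": 0.08, "Pretty good": 0.47, "Only fair": 0.34, "Poor": 0.11}
observed = [21, 109, 62, 15]              # local responses, n = 207

n = sum(observed)
expected = [p * n for p in national.values()]

# Goodness-of-fit: does the local store match the national distribution?
chi2, p_value = stats.chisquare(observed, f_exp=expected)
```

With df = k - 1 = 3 and a chi-square statistic of about 6.25, p is roughly 0.10, so at the 5% level the manager would not reject the hypothesis that her customers match the national pattern.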
Clive Morley 4
Hypothesis Testing Local Survey Example
• Example using Excel
Steps in Hypothesis Testing
1. State the null and alternative hypotheses.
2. Make a judgment about the population distribution and the level of measurement, then select the appropriate statistical test.
3. Decide upon the desired level of significance.
4. Collect data from a sample and compute the test statistic to see if the level of significance is met.
5. Accept or reject the null hypothesis.
Contingency Tables
Two-way table
Test whether rows and columns are associated (or independent)
Can calculate expected numbers in each cell if rows and columns are independent; compare with actual (observed)

χ² = Σ (O_i - E_i)² / E_i
Contingency Tables - Example
Two-way table, e.g. responses to question 6 (a, b, or c) by two groups:

Q6    Group 1   Group 2
a     10        18
b     12        22
c     15        26

The numbers in the table are counts (frequencies) of the number falling into each category
Contingency Tables - Example
Q6    Group 1   Group 2
a     27%       27%
b     32%       33%
c     41%       39%
Contingency Tables - Example
Q6    Group 1   Group 2
a     10        18
b     12        22
c     15        26

Chi-square test statistic = 0.0142
p-value = 0.9929
Not significant
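The same numbers come out of scipy's contingency-table test; a sketch using the Q6 counts above:

```python
import numpy as np
from scipy import stats

# Q6 responses (rows a, b, c) by group (columns Group 1, Group 2)
table = np.array([[10, 18],
                  [12, 22],
                  [15, 26]])

# df = (rows - 1)(cols - 1) = 2; chi2 ~ 0.0142, p ~ 0.9929: not significant
chi2, p_value, dof, expected = stats.chi2_contingency(table)
```

`expected` holds the cell counts implied by independence (row total × column total / grand total), which is exactly the E_ij in the formula above.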
Statistical Decision
For the t-test (for a mean or proportion):
Null hypothesis: no-change situation
For the chi-square test:
Null hypothesis: the two variable sets are independent

Test value t > t critical value (usually about 2): reject the null hypothesis
p-value < alpha (usually 0.05): reject the null hypothesis
Test value chi-square > chi-square critical value: reject the null hypothesis
Type I and Type II errors
Two ways a hypothesis test result can be wrong:
I - find hypothesis is wrong, when it is correct
II - find hypothesis is correct, when it is wrong
Type I and Type II errors
                            REALITY
TEST FINDS                  Hypothesis correct          Hypothesis wrong
Hypothesis correct          (correct decision)          type II error
Hypothesis wrong            type I error                (correct decision)
                            (test significance level)
Type I and Type II errors
Prob value = observed probability of a type I error

In control charts, control limits are often set at 3 standard deviations, equivalent to setting the probability of a type I error at 0.003
- minimises reacting when we don't need to

Using t = 2 is equivalent to setting the probability of a type I error at 0.05
BUSM 4074 Management Decision Making

Prof. Clive Morley
Graduate School of Business
BUSM 4074 Management Decision Making
4. Multiple regression
5. Multiple regression (cont.)
Unit 4&5 - Learning Objectives
• To understand the use of the multiple regression technique, including linear, log-log, logit, autoregressive and time series models
• To be able to carry out straightforward multiple regression model estimation
• To be able to interpret standard computer output from a multiple regression exercise, including to assess variables for significance, estimate the size of an explanatory variable's impact on the dependent variable, assess model fit, and use the model to estimate values of the dependent variable
"As my salary increases, computers are getting cheaper; therefore, to get cheaper computers, pay me more."
What is wrong with this (very attractive) argument?
Multiple regression
A very powerful, widely used statistical technique with many applications in all sorts of areas
Used to estimate the relationship between variables
For example, Y might be the sales of a certain item and X the price of it. The linear relationship is estimated:
Y = a + bX
Multiple regression
The parameters a and b are estimated from data on the variables X and Y
Correlation establishes whether a linear relationship exists and how strong it is
regression estimates what the relationship is
Multiple regression
The model is readily extended to include other explanatory variables: for example, sales (Y) might depend on price (X1), buyers' incomes (X2) and advertising expenditure (X3), giving the equation to be estimated

Y = a + b1·X1 + b2·X2 + b3·X3

Data on a number of cases (e.g. various sales areas or different times) for all the variables is needed
Multiple regression
The explanatory variables do not exactly predict the value of Y
- due to random effects
- due to the impacts of other (hopefully minor) variables, etc.
So the equation does not exactly fit: residuals
Purposes of multiple regression
• to estimate the equation, so we can predict Y for given values of the explanatory variables, or
• to estimate the effects of variables on Y (through the b parameters of the variables of interest, and also through the variables' correlations with Y), or
• to determine which potential explanatory variables have a significant impact on Y (through testing the significance of the relevant b values).
Theory - least squares

The computer finds the values for the parameters that give the line of best fit
Best fit is defined as minimising the sum of squared errors (SSE)
Theory - model specification

Y is some function of a lot of explanatory variables
Narrow the "lot of" explanatory variables down to those expected to be important (ignore others)
Then specify the functional form of the relationship; linear is the usual starting point for regression
(but see the discussion of log-log models below)
Theory - model specification

Model specification (which variables, linear or other functional form, etc.) is based on relevant theory
The estimated relationship is then based on data
Multiple regression
The overall fit of the estimated equation is measured by R-squared (R²), the proportion of the variation in Y explained by the equation
It is also the square of the correlation between the fitted and actual Y values
Each parameter estimated (and hence each variable) can be tested for individual significance
Linear Regression Example
Data:
House Price (y)   Sq Feet (x)
245               1400
312               1600
279               1700
308               1875
199               1100
219               1550
405               2350
324               2450
319               1425
255               1700
Linear Regression Example

[Plot of the data with fitted line: y = 75.814 + 0.123 × Sq. Feet]
Simple Linear Regression Model

y_i = β0 + β1·x_i + ε_i

where β0 = intercept, β1 = slope, and ε_i = the random error for this x_i value.

[Figure: observed Y values scattered around the fitted line Ŷ = β0 + β1·X]
X (SqFt)   Y ($000)   Predicted Ŷ   Residual (Y - Ŷ)
1400 245 251.92 -6.92316
1600 312 273.88 38.12329
1700 279 284.85 -5.85348
1875 308 304.06 3.93716
1100 199 218.99 -19.99284
1550 219 268.39 -49.38832
2350 405 356.20 48.79749
2450 324 367.18 -43.17929
1425 319 254.67 64.33264
1700 255 284.85 -29.85348
Excel Residual Output for the House Price model
It shows how well the regression line fits the data points. The best and worst predictions had residuals of 3.94 and 64.33, respectively.
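The fitted line and the residual column can be reproduced with numpy; a sketch using the ten data rows listed earlier (the estimates it returns, intercept ≈ 98.25 and slope ≈ 0.1098, are the ones the residual table is based on):

```python
import numpy as np

sqft = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])

# Least-squares fit: price = b0 + b1 * sqft
b1, b0 = np.polyfit(sqft, price, 1)

predicted = b0 + b1 * sqft
residuals = price - predicted     # first residual ~ -6.923, as in the table
```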
Measures of variation

SS_yy = Σ(Y_i - Ȳ)²   (total variation of the Y_i around their mean)
SSR  = Σ(Ŷ_i - Ȳ)²    (Sum of Squares of Regression)
SSE  = Σ(Y_i - Ŷ_i)²  (Sum of Squares of Error)

[Figure: for a point (X_i, Y_i), the deviation from the mean Ȳ splits into the part explained by the fitted line and the residual]
Measures of variation

• Total variation is made up of two parts: SS_yy = SSR + SSE

SS_yy = Σ(Y_i - Ȳ)²   (Total Sum of Squares)
SSR  = Σ(Ŷ_i - Ȳ)²    (Regression Sum of Squares)
SSE  = Σ(Y_i - Ŷ_i)²  (Error Sum of Squares)

Where: Ȳ = average value of the dependent variable
Y_i = observed values of the dependent variable
Ŷ_i = predicted value of Y for the given X_i value

SS_yy measures the variation of the Y_i values around their mean Ȳ
SSR is the explained variation, attributable to the relationship between X and Y
SSE is the variation attributable to factors other than the relationship between X and Y
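The identity SS_yy = SSR + SSE can be checked numerically; a sketch using the house-price data again:

```python
import numpy as np

sqft = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])

b1, b0 = np.polyfit(sqft, price, 1)
predicted = b0 + b1 * sqft

ss_yy = np.sum((price - price.mean()) ** 2)      # total variation
ssr = np.sum((predicted - price.mean()) ** 2)    # explained variation
sse = np.sum((price - predicted) ** 2)           # unexplained variation

r_squared = ssr / ss_yy   # proportion of variation explained
```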
Standard Error of the Estimate

The standard error of the estimate is the standard deviation of the errors of a regression model; it tells us how spread out the errors are.

s_e = √( SSE / (n - 2) )

where SSE = Σy² - b0·Σy - b1·Σxy
Computer output:
Correlation R = 0.837, R-squared = 0.700

           Coefficient   t       sig
Constant   75.813        2.508   0.0204
Sq. feet   0.123         7.009   0.0000
Linear Regression Example
Linear regression - Example
The model estimated is: House Price = 75.813 + 0.123 Sq. Feet
• The correlation between House Price and Sq. Feet is high, at 0.837, and the fit of the regression model is quite strong: R² = 0.700, i.e. 70%
• The Sq. Feet variable is highly significant: t = 7.009, p = 0.0000
• The implicit hypothesis is that the coefficient is zero, i.e. the variable has no impact
Linear regression - Example
Add another variable to the data - Location

Price   Sq. Feet   Location
245     1400       2
312     1600       3
279     1700       4
308     1875       3
199     1100       5
219     1550       1
405     2350       1
324     2450       5
319     1425       4
etc.
Linear regression - Example
Computer output:
Correlation R = 0.839, R-squared = 0.705
(Without Location: Correlation R = 0.837, R-squared = 0.700)

           Coefficient   t       sig
Constant   73.510        2.366   0.0282
Sq. Feet   0.120         6.475   0.0000
Location   2.283         0.525   0.6050
Linear Regression Example

Slight improvement in R²
Location not significant (sig or p-value high)
- consider dropping it from the model

           Coefficient   t       sig
Constant   73.510        2.366   0.0282
Sq. Feet   0.120         6.475   0.0000
Location   2.283         0.525   0.6050
Linear regression - example
Model Market to Book Value (MBV) as a function of Revenue

Data:
Company   MBV     Revenue
1         2.011   39.505
2         1.814   4.165
3         1.522   10.406
4         1.826   7.602
5         1.824   2.942
6         1.337   5.228
7         1.650   1.697
etc.
Linear regression - example
Output: Dep Var: MBV, N: 71
Multiple R: 0.318
Squared multiple R: 0.101

Variable   Coefficient   t value   sig
Constant   2.010         11.465    0.000
Revenue    0.046         2.789     0.007
Linear regression - example
The model is: MBV = 2.010 + 0.046 Revenue
The fit is not great (R² = 0.101, i.e. about 10%) but significant (F = 7.778, p = 0.007)
The Revenue variable is significant (t = 2.789, p = 0.007)
Linear regression - example
More factors (variables) impact on MBV and need to be considered
*** WARNING ***
Case 1 has large leverage (Leverage = 0.243)
Case 8 has large leverage (Leverage = 0.163)
Case 56 is an outlier (Standardized Residual = 5.167)
Durbin-Watson D Statistic: 1.682
First Order Autocorrelation: 0.140
Multiple regression
Avoid step-wise regression
Look for non-linear patterns in the scatter plot
Diagnostic checks:
• Multicollinearity (different x's move together in a systematic way)
• Autocorrelation (successive error terms are correlated with each other)
• Outliers (data points that are not together with the rest)
• Heteroscedasticity (non-constant variance)
• Leverage (observations with large effects on outcomes)
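As one illustration of these checks, the Durbin-Watson statistic can be computed directly from the residuals; a sketch (the residual series here is made up, used only to show the mechanics; values near 2 suggest no autocorrelation):

```python
import numpy as np

def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residual series with alternating signs
e = np.array([1.2, -0.8, 0.5, -1.1, 0.9, -0.3, 0.7, -0.6])
d = durbin_watson(e)   # alternating signs push d above 2 (negative autocorrelation)
```

d ranges from 0 (strong positive autocorrelation) to 4 (strong negative autocorrelation).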
Multiple regression - Example
Hospital and Nursing Salary example
(9.10 of textbook)
Multiple regression - Example
Dialogue box
Dependent: Annual Nursing Salary
Independents: Number of beds in home; Annual medical in-patient days; Annual total patient days; Rural (1) and non-rural (0) homes
Multiple regression - Example
Model Summary
R = 0.8803, R Square = 0.775, Adjusted R Square = 0.7557
Std. Error of the Estimate = $82,024.63
ANOVA: F = 40.4375, Sig = 0.000
Multiple regression - Example
R = 0.88: coefficient of correlation, the strength of the relationship between the two variables. R = -1: strong negative relationship; R = +1: strong positive relationship; R = 0: no relationship.
R Square = 0.775: coefficient of determination. 77.5% of the variation in Y can be explained by changes in the X's; the other 22.5% is due to other factors. This fit is quite strong.
Adjusted R Square = 0.7557: adjusted for multiple variables. A decrease in Adjusted R Square means a newly added variable is not significant.
Multiple regression - Example
Std. Error of the Estimate = $82,024.63
Sig = 0.000: significant fit
p = 0.1799 (beds): too high (compared to α), so consider dropping beds as a variable
Multiple regression - Example
                                        Coefficient   Standard Error   t value   p value (sig)
Constant (Intercept)                    113.5003      495.4654         0.2291    0.8198
Number of beds in home                  9.6399        7.0804           1.3615    0.1799
Annual medical in-patient days (100s)   -7.4072       2.4012           -3.0848   0.0034
Annual total patient days (100s)        15.7674       2.7550           5.7232    0.0000
Rural (1) and non-rural (0) homes       -79.5796      288.1857         -0.2761   0.7837
Multiple regression - Example
The interpretation of the coefficients is that if in-patient days, total patient days and the rural factor are held constant, then annual nursing salary is expected to increase by $9.64 for each extra bed in the home. Similarly, annual nursing salary is expected to change by -$740.72, +$1,576.74 and -$79.58 for each extra (100) in-patient days, (100) total patient days and the rural factor, respectively, other variables held constant. The $11,300 can be interpreted as the annual base salary.

                                        Coefficient   Standard Error   t value   p value (sig)
Constant (Intercept)                    113.5003      495.4654         0.2291    0.8198
Number of beds in home                  9.6399        7.0804           1.3615    0.1799
Annual medical in-patient days (100s)   -7.4072       2.4012           -3.0848   0.0034
Annual total patient days (100s)        15.7674       2.7550           5.7232    0.0000
Rural (1) and non-rural (0) homes       -79.5796      288.1857         -0.2761   0.7837
Multiple regression - Example
• Compare the intercept and slopes of the multiple regression with those of the simple linear regression: changes have occurred (difficult to analyse in detail)
• s_e is still the standard error of the estimate. Note that the multiple regression yields a better s_e than the simple linear regression
• R² similarly (but it would increase with extra x's)
• Adjusted R²: a decrease indicates an added x that does not belong in the equation
Multiple regression - Example
• Tolerance stats OK (> 0.1), so no multicollinearity issue. If an individual R² is too high (almost equal to the R² of the multiple regression): suspect multicollinearity!
• Durbin-Watson stat d = 2.4789: somewhat of a negative autocorrelation issue. A d close to 2 would indicate no autocorrelation concern.
• Outliers: see the graphs of residuals. Look for a normal shape on the histogram and randomness (no pattern) on the scatter plots.
Multiple regression - Example
[Histogram of regression standardized residuals - Dependent Variable: Current Salary; Mean = 0.00, Std. Dev = 1.00, N = 474]

The standardized residual distribution is relatively normal: a relatively good fit.
Multiple regression - Example
Scatter plot: randomly distributed, on both sides of 0.00

[Scatterplot of regression standardized residuals - Dependent Variable: Current Salary]

The red plot is an example of heteroscedasticity (an unequal variance distribution).
Multiple regression - Example
Dummy variables: categorical data related to the dependent variable
Other names: indicators, 0-1 variables
If the dummy variable = 1: in the category; dummy variable = 0: not in that category
The coefficient of this variable indicates the difference in the dependent variable due to this (dummy) variable
Multiple regression - Example

Salary = 113.50 + 9.64 Bed - 7.41 InPtDay + 15.77 TotPtDay - 79.58 Rural

Rural = 0 vs Rural = 1: salary difference = -$7,958 (rural is lower)

Two or more categorical variables can be involved. The coefficient indicates the difference in y when everything else is the same.

                                        Coefficient   Standard Error   t value   p value (sig)
Constant (Intercept)                    113.5003      495.4654         0.2291    0.8198
Number of beds in home                  9.6399        7.0804           1.3615    0.1799
Annual medical in-patient days (100s)   -7.4072       2.4012           -3.0848   0.0034
Annual total patient days (100s)        15.7674       2.7550           5.7232    0.0000
Rural (1) and non-rural (0) homes       -79.5796      288.1857         -0.2761   0.7837
Analysing a Regression
• p-value of the regression
• p-value of each x, to consider dropping it or not
• Adjusted R-square value
• Standard error of the regression estimate
• Scatter plot of residuals: randomness, outliers, heteroscedasticity (non-equal variance)
• Histogram of the residuals
• Durbin-Watson statistic (d)
Linear, Quadratic and Log regression - example
The Public Service Electric Company produces different quantities of electricity each month, depending on demand. The file Poly and Log examples - Power.xls lists the number of units of electricity produced (Units) and the total cost of producing them (Cost) for a 36-month period. How can regression be used to analyse the relationship between Cost and Units?
Multiple regression - Example
R Square = 0.7359, Standard Error = 2733.7424
R Square = 0.8216, Standard Error = 2280.7998
Log model
Very often we use multiple regression to fit a multiplicative model:

Y = a · X1^b1 · X2^b2 · X3^b3

If any explanatory variable changes by 1%, the dependent variable changes by a constant percentage (the corresponding b)

This can be estimated by making a logarithmic transformation of the equation, which gives:

ln(Y) = ln(a) + b1·ln(X1) + b2·ln(X2) + b3·ln(X3)
Log model
Thus we can calculate ln(Y), ln(X1), ln(X2), ln(X3) and regress these variables in the usual way, to estimate the parameters of the original equation.
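A sketch of the transformation with one explanatory variable; the data are synthetic (a_true and b_true are made-up values, used only to check that the log-log regression recovers them):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multiplicative relationship Y = a * X^b with small noise
a_true, b_true = 2.0, 1.5
X = rng.uniform(1.0, 10.0, 200)
Y = a_true * X ** b_true * np.exp(rng.normal(0.0, 0.05, 200))

# Regress ln(Y) on ln(X): the slope estimates b, the intercept estimates ln(a)
b_est, ln_a_est = np.polyfit(np.log(X), np.log(Y), 1)
a_est = np.exp(ln_a_est)
```

b_est is the elasticity: a 1% change in X changes Y by about b_est percent.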
Log model example
The file CarSales.xls contains annual data (1970-1999) on domestic auto sales in the United States. The variables are defined as:
Sales: annual domestic auto sales (in number of units)
PriceIndex: consumer price index of transportation
Income: real disposable income
Interest: prime interest rate
Multiple regression - Example

Regression and Correlation
Observations 30, Multiple R 0.7358, R Square 0.5414, Adjusted R Square 0.4680, Standard Error 758049.7773

LogRegres         Coefficients     t value    p value
Intercept         -110360558.48    -45.9500   0.0000
Log(Sales)        7522741.47       54.4195    0.0000
Log(PriceIndex)   35983.70         0.2297     0.8202
Log(Income)       -162258.29       -0.6222    0.5395
Log(Interest)     -13588.13        -0.2133    0.8328

Regression and Correlation
Observations 30, Multiple R 0.9978, R Square 0.9956, Adjusted R Square 0.9949, Standard Error 74199.1103

MultiRegres   Coefficients    t value   p value
Intercept     513941538.55    0.7356    0.4688
Year          -258651.57      -0.7234   0.4761
PriceIndex    -18121.97       -0.4786   0.6364
Income        2175.75         1.1204    0.2732
Interest      -8895378.05     -1.4810   0.1511
Multiple regression - Example
Log model: probably the slightly better model
R-square = 0.99: good
Fewer outliers: slightly better
Residual plots: not necessarily better
Multiple Regression Goal
Remove any unimportant variables (or ones with multicollinearity or autocorrelation problems, etc.) from the equation and decide which variable(s) are important for the regression model.
Use that model for your prediction.
Multiple regression time series example
Plot CarSales.xls data, Year vs. Sales
Multiple regression time series example
Period             Sales (000)
2003 Quarter I     25.4
2003 Quarter II    23.8
2003 Quarter III   22.0
2003 Quarter IV    28.6
2004 Quarter I     28.5
2004 Quarter II    27.0
etc.
Multiple regression time series example
[Time series plot: SALES (vertical axis, 20-45) against TIME (horizontal axis, 0-20)]
Multiple regression time series example
Create dummy variables for the Quarters and time period
Period       Sales   Time   QII   QIII   QIV
2003 Q I     25.4    1      0     0      0
2003 Q II    23.8    2      1     0      0
2003 Q III   22.0    3      0     1      0
2003 Q IV    28.6    4      0     0      1
2004 Q I     28.5    5      0     0      0
2004 Q II    27.0    6      1     0      0
etc.
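The dummy columns can be built from the period labels; a sketch with pandas, using only the six rows shown (QI is dropped as the base category):

```python
import pandas as pd

df = pd.DataFrame({
    "period": ["2003 Q I", "2003 Q II", "2003 Q III",
               "2003 Q IV", "2004 Q I", "2004 Q II"],
    "sales":  [25.4, 23.8, 22.0, 28.6, 28.5, 27.0],
})
df["time"] = range(1, len(df) + 1)

# 0-1 indicator columns, one per quarter; keep QII, QIII, QIV only
quarter = df["period"].str.split().str[-1]          # "I", "II", "III", "IV"
dummies = pd.get_dummies(quarter, prefix="Q").astype(int)
df[["QII", "QIII", "QIV"]] = dummies[["Q_II", "Q_III", "Q_IV"]].to_numpy()
```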
Multiple regression time series example
Squared multiple R: 0.987

Effect     Coefficient   t      P
CONSTANT   23.679        50.5   0.000
TIME       1.005         28.5   0.000
QII        -2.525        -5.2   0.000
QIII       -5.070        -9.8   0.000
QIV        0.450         0.9    0.401

Could drop QIV and re-estimate
Multiple regression time series example
The model as estimated is:

Sales = 23.679 + 1.005 Time - 2.525 QII - 5.070 QIII + 0.450 QIV

Say the data ended at Time = 24, i.e. 2008 QIV. Use the model to forecast, e.g. forecast sales in 2009 in quarters I and II.
Multiple regression time series example
2009 quarter I is Time = 25, QI = 1, QII = 0, QIII = 0, QIV = 0
Sales = 23.679 + 1.005 × 25 - 0 - 0 + 0
      = 48.804, i.e. $48,800

2009 quarter II is Time = 26, QI = 0, QII = 1, QIII = 0, QIV = 0
Sales = 23.679 + 1.005 × 26 - 2.525 - 0 + 0
      = 47.284, i.e. $47,300
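The two forecasts can be checked by coding the estimated equation directly (the function name is just for illustration):

```python
def forecast_sales(time, qii=0, qiii=0, qiv=0):
    """Estimated model: quarterly sales in $000."""
    return 23.679 + 1.005 * time - 2.525 * qii - 5.070 * qiii + 0.450 * qiv

sales_2009_q1 = forecast_sales(25)          # 2009 QI: about 48.8, i.e. $48,800
sales_2009_q2 = forecast_sales(26, qii=1)   # 2009 QII: about 47.3, i.e. $47,300
```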
Autoregression
Another way of dealing with time series is autoregression
Often used when Durbin-Watson indicates autocorrelation (a common issue with time series data)
Or because it makes theoretical sense that one period's value depends (partly) on the previous value of the series
Use previous (lagged) values as an explanatory variable
Autoregression

In the example, add another variable, which is the lagged sales:

Period       Sales   Time   QII   QIII   QIV   lagSales
2003 Q I     25.4    1      0     0      0     -
2003 Q II    23.8    2      1     0      0     25.4
2003 Q III   22.0    3      0     1      0     23.8
2003 Q IV    28.6    4      0     0      1     22.0
2004 Q I     28.5    5      0     0      0     28.6
2004 Q II    27.0    6      1     0      0     28.5
etc.
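The lagged column is a single `shift` in pandas; a sketch on the six sales values shown:

```python
import pandas as pd

sales = pd.Series([25.4, 23.8, 22.0, 28.6, 28.5, 27.0])
lag_sales = sales.shift(1)   # first entry is NaN: that data point is lost

# Rows usable for the autoregression (drops the first observation)
usable = pd.DataFrame({"sales": sales, "lag_sales": lag_sales}).dropna()
```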
Autoregression

The lagged variable would replace the Time (trend) variable
The first data point is lost, as we don't have a lagged value for it
Seasonality can be handled by having another variable: Sales lagged by the seasonality period (e.g. 4 terms)
Logit regression

If the dependent variable is categorical, not metric
e.g. for accounting graduates, membership of CPA Aust (or not) is the dependent variable; the X variables might be gender, age, importance of joining cost, importance of brand status, etc.
Regression is possible, with special technical issues
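A minimal sketch of a logit model with scikit-learn; the data here are synthetic (membership probability rising with age is an invented pattern, used only to show the mechanics):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical: one X variable (age), binary dependent variable (member or not)
age = rng.uniform(22.0, 60.0, 300)
p_member = 1.0 / (1.0 + np.exp(-(age - 40.0) / 5.0))
member = rng.binomial(1, p_member)

model = LogisticRegression().fit(age.reshape(-1, 1), member)
prob_at_50 = model.predict_proba([[50.0]])[0, 1]   # fitted membership probability
```

Unlike linear regression, the fitted values are probabilities between 0 and 1, so the model never predicts an impossible value for a 0-1 dependent variable.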
Reference
Ragsdale (2008), chapter 9, plus pp. 522-28.