Business School, J. Watson, Semester 1 2015
Quantitative Methods for Business COMM5005 Lecture 12
In this final lecture we will
- look at how to interpret the Excel output for a multiple regression
- see how to separate the trend and cyclical components in time series data for forecasting
- see how to calculate index numbers
Readings
For today's topics: Berenson et al., Ch 12.7; Ch 13.1-13.4, 13.6; Ch 14.1-14.4; Ch 14.9
1. Multiple regression example
We will now look at an example of a multiple regression where we have an employee's superannuation account balance as the dependent variable. There are data from 20 employees.
We are trying to estimate the effect that the independent variables (years in workforce, gender and current salary) have on the superannuation balance.
Gender is a qualitative variable. We represent it by using a dummy variable which has a value of 1 for a male and 0 for a female employee.
Superannuation data

Years in Workforce   Gender   Salary $'000   Superannuation Balance $'000
25                   1        50.6           117.9
31                   1        75.2           417.1
37                   1        48.3           156.2
37                   1        52.3           202.9
40                   1        106.2          506.2
30                   1        61.3           255
32                   1        52.6           179.8
26                   1        48.9           82.6
29                   1        42.6           47.3
36                   1        89.5           488.5
28                   1        33.1           70.5
29                   1        35.6           120.1
10                   0        31.2           15.6
15                   0        33.9           8.9
30                   0        49.7           124.7
28                   0        69.3           223.4
35                   0        86.4           301.6
17                   0        28.1           52.8
25                   0        46.2           67.9
22                   0        50.7           89.5
Excel output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.951399308
R Square            0.905160643
Adjusted R Square   0.887378264
Standard Error      50.11767261
Observations        20

ANOVA
             df   SS            MS         F          Significance F
Regression   3    383564.8798   127855     50.90211   2.09087E-08
Residual     16   40188.49773   2511.781
Total        19   423753.3775

                       Coefficients   Standard Error   t Stat      P-value    Lower 95%      Upper 95%
Intercept              -196.0738799   45.52498098      -4.306951   0.000543   -292.582528    -99.56523
Years in Workforce     -0.604845409   2.633748738      -0.229652   0.821272   -6.18814328    4.978452
Gender                 59.58681943    29.84990611      1.996215    0.06322    -3.6921543     122.8658
Current Salary $'000   6.480588884    0.799454349      8.106265    4.67E-07   4.785821383    8.175356
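As a cross-check, the coefficients in the Excel output can be reproduced outside Excel. The sketch below is only an illustration in plain Python, not the method used in the lecture: it solves the least squares normal equations for the 20 observations of the superannuation data. The helper names `solve` and `fit_ols` are our own.

```python
# Superannuation data: (years in workforce, gender dummy, salary $'000, balance $'000)
DATA = [
    (25, 1, 50.6, 117.9), (31, 1, 75.2, 417.1), (37, 1, 48.3, 156.2),
    (37, 1, 52.3, 202.9), (40, 1, 106.2, 506.2), (30, 1, 61.3, 255.0),
    (32, 1, 52.6, 179.8), (26, 1, 48.9, 82.6), (29, 1, 42.6, 47.3),
    (36, 1, 89.5, 488.5), (28, 1, 33.1, 70.5), (29, 1, 35.6, 120.1),
    (10, 0, 31.2, 15.6), (15, 0, 33.9, 8.9), (30, 0, 49.7, 124.7),
    (28, 0, 69.3, 223.4), (35, 0, 86.4, 301.6), (17, 0, 28.1, 52.8),
    (25, 0, 46.2, 67.9), (22, 0, 50.7, 89.5),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(data):
    """Return [b0, b1, b2, b3] that minimise the sum of squared residuals."""
    rows = [[1.0, x1, x2, x3] for x1, x2, x3, _ in data]
    y = [obs[3] for obs in data]
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


coefficients = fit_ols(DATA)
print([round(b, 4) for b in coefficients])
```

The printed values can be compared with the Coefficients column of the output (intercept, years in workforce, gender, current salary).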
Regression equation
The sample regression equation is now

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$$

We can no longer draw this as a line because it is in four dimensional space. From the Excel output we see that the equation should be

$$\hat{Y}_i = -196.07 - 0.6048 X_{1i} + 59.5868 X_{2i} + 6.4806 X_{3i}$$

where
$\hat{Y}_i$ = predicted superannuation balance $'000
$X_{1i}$ = years in workforce
$X_{2i}$ = gender
$X_{3i}$ = salary $'000
Estimate 1

So a male employee with a current salary of $50,000 who has worked for 30 years is estimated to have a superannuation balance of

$$\hat{Y} = -196.07 - 0.6048(30) + 59.5868(1) + 6.4806(50) \approx 169.397$$

or $169,397 (calculated with the unrounded Excel coefficients).
Estimate 2

However, a female employee with the same salary and period in the workforce is estimated to have a superannuation balance of

$$\hat{Y} = -196.07 - 0.6048(30) + 59.5868(0) + 6.4806(50) \approx 109.810$$

or only $109,810.
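The two estimates can be checked with a few lines of Python. This is a minimal sketch: the function name `predict_balance` is our own, and the coefficients are the unrounded values from the Excel output.

```python
# Unrounded coefficients copied from the Excel output.
B0 = -196.0738799
B_YEARS = -0.604845409
B_GENDER = 59.58681943
B_SALARY = 6.480588884


def predict_balance(years, gender, salary_thousands):
    """Predicted superannuation balance in $'000 (gender: 1 = male, 0 = female)."""
    return B0 + B_YEARS * years + B_GENDER * gender + B_SALARY * salary_thousands


male = predict_balance(30, 1, 50)
female = predict_balance(30, 0, 50)
print(round(male, 3), round(female, 3))  # 169.397 109.81
```

The difference between the two predictions is exactly the gender coefficient, because the other two variables are held at the same values.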
Interpretation of coefficients

Each b coefficient can be seen as a partial derivative, i.e. the rate at which the super balance changes if all other variables are kept constant.
So we can interpret $b_3 = 6.4806$ to mean that the estimated superannuation account balance will increase by 6.4806 thousand dollars for every extra thousand dollars of current salary, keeping gender and working years constant.
Residuals (slide 10)
Observation   Predicted Superannuation Balance   Residuals
1 176.3096018 -58.40960183
2 332.1030159 84.99698407
3 154.1461025 2.053897502
4 180.068458 22.83154197
5 527.5576627 -21.35766266
6 242.6276759 12.37232415
7 185.0368617 -5.236861742
8 164.6877553 -82.08775532
9 122.0455091 -74.74550913
10 421.7512099 66.74879007
11 61.08476014 9.415239862
12 76.68138694 43.41861306
13 0.072039187 15.52796081
14 14.54540213 -5.645402131
15 107.8660254 16.83397463
16 236.0952583 -12.69525831
17 342.6794104 -41.07941037
18 -24.25170421 77.05170421
19 88.20819132 -20.30819132
20 119.1853775 -29.68537752
Residual plots
We should check the residuals against $\hat{Y}_i$ plus (separately) against each of the independent variables. These plots are shown below and on the next slide.
[Figure: Residuals versus predicted Y. Horizontal axis: predicted super; vertical axis: residuals.]
Check either side of the 0 level.
Residuals versus X variables
[Figures: Current Salary $'000 Residual Plot; Years in Workforce Residual Plot; Gender Residual Plot. Each shows the residuals (vertical axis) against the named X variable (horizontal axis).]
Residuals normal?

We also check to see if the residuals are normally distributed. The histogram here, created from the data on slide 10, shows that they are.
[Figure: Histogram of the residuals. Horizontal axis: Residual (bins -100, -50, 0, 50, 100, More); vertical axis: Frequency.]
2. R Square
The R Square value tells us that 90.5% of the variation in superannuation balances is explained by this model.
Remember, though, that if we were comparing alternative models with different numbers of variables it would be preferable to use the adjusted R Square, which tells us that 88.7% of the variation in superannuation balances is explained.
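Both figures can be recovered from the ANOVA section of the Excel output. A small sketch, using the usual definitions R Square = SSR/SST and adjusted R Square = 1 - (1 - R Square)(n - 1)/(n - k - 1); the variable names are our own.

```python
SSR = 383564.8798   # regression sum of squares from the ANOVA table
SST = 423753.3775   # total sum of squares from the ANOVA table
n = 20              # observations
k = 3               # independent variables

r_square = SSR / SST
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(round(r_square, 4), round(adjusted_r_square, 4))  # 0.9052 0.8874
```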
3. Which variables are significant?
Look at the t Stat and P-value columns. Will we reject the hypotheses that $\beta_0$ and $\beta_3$ are zero using two tail tests and a 1% level of significance?

                       Coefficients   Standard Error   t Stat     P-value
Intercept              -196.0738799   45.52498098      -4.30695   0.000543
Years in Workforce     -0.604845409   2.633748738      -0.22965   0.821272
Gender                 59.58681943    29.84990611      1.996215   0.06322
Current Salary $'000   6.480588884    0.799454349      8.106265   4.67E-07
Which variables are significant? (cont.)
Yes, we can reject $\beta_0 = 0$ and $\beta_3 = 0$ at the 1% level.
The reason is that for the Intercept the P-value is 0.000543, and for Current Salary it is 4.67E-07; both are below 0.01.
However, we can only reject the hypothesis that the gender coefficient $\beta_2 = 0$ using a two tail test at the 10% level, since its P-value is 0.06322.
We are not able to reject the hypothesis that the years in the workforce coefficient $\beta_1 = 0$ even at the 10% level.
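The decisions above follow mechanically from the output: the t Stat is the coefficient divided by its standard error, and a coefficient is significant at level alpha when its two tail P-value is below alpha. A short sketch (the dictionary `OUTPUT` and the function `significant_at` are our own names):

```python
# (coefficient, standard error, P-value) copied from the Excel output
OUTPUT = {
    "Intercept": (-196.0738799, 45.52498098, 0.000543),
    "Years in Workforce": (-0.604845409, 2.633748738, 0.821272),
    "Gender": (59.58681943, 29.84990611, 0.06322),
    "Current Salary $'000": (6.480588884, 0.799454349, 4.67e-07),
}

# t Stat = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se, _) in OUTPUT.items()}


def significant_at(alpha):
    """Names of the terms whose two tail P-value is below alpha."""
    return [name for name, (_, _, p) in OUTPUT.items() if p < alpha]


for alpha in (0.01, 0.05, 0.10):
    print(alpha, significant_at(alpha))
```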
4. Confidence intervals for $\beta$s
Excel also gives us another way of analysing the beta coefficients. We can find a confidence interval for each population coefficient.
The endpoints of these are shown in the columns Lower 95% and Upper 95%.
We can also check these to see which intervals contain zero and which do not.

                       Lower 95%      Upper 95%
Intercept              -292.582528    -99.5652
Years in Workforce     -6.18814328    4.978452
Gender                 -3.6921543     122.8658
Current Salary $'000   4.785821383    8.175356
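The interval endpoints are the coefficient plus or minus a critical t value times the standard error. The sketch below assumes the critical value 2.1199 (0.025 in the upper tail, 16 degrees of freedom, read from t tables); that number and the names used are our own additions, not part of the Excel output.

```python
T_CRITICAL = 2.1199  # assumed: t value for 0.025 in the upper tail, 16 degrees of freedom

# (coefficient, standard error) copied from the Excel output
ESTIMATES = {
    "Intercept": (-196.0738799, 45.52498098),
    "Years in Workforce": (-0.604845409, 2.633748738),
    "Gender": (59.58681943, 29.84990611),
    "Current Salary $'000": (6.480588884, 0.799454349),
}


def confidence_interval(coef, se, t_crit=T_CRITICAL):
    """Return (lower, upper) endpoints: coefficient +/- t_crit * standard error."""
    margin = t_crit * se
    return coef - margin, coef + margin


intervals = {name: confidence_interval(c, se) for name, (c, se) in ESTIMATES.items()}
for name, (low, high) in intervals.items():
    contains_zero = low <= 0 <= high
    print(f"{name}: ({low:.4f}, {high:.4f}) contains zero: {contains_zero}")
```

An interval that contains zero corresponds to a coefficient that is not significant at the 5% level in a two tail test.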
5. Analysis of Variance

As mentioned in the last lecture, the ANOVA section of the output refers to analysis of variance and shows sums of squares due to the regression (SSR), residual or errors (SSE) and total sum of squares (SST).
We have already seen how these have been used to give the $R^2$ value.
They are also used to calculate the F value, which is a measure of the overall significance of the regression model.
F test
For a regression with only one independent variable the significance level of F is the same as the p-value for the slope's t test. In a multiple regression the F value is used to perform a joint test of the regression coefficients, i.e. that

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$

against the alternative

$$H_1: \text{at least one of the coefficients } \beta_1, \beta_2, \dots, \beta_k \text{ is non-zero}$$

where the test statistic is

$$F = \frac{SSR/k}{SSE/(n-k-1)} = \frac{MSR}{MSE}$$

From the Excel output we see F = 50.90211
F tables

The F tables are at Table E4 in your textbook. Four separate tables are given for alpha levels of 0.05, 0.025, 0.01 and 0.005 in the upper tail.
The F distribution depends on degrees of freedom for both the numerator and denominator.
If we use a 1% level of significance, 3 degrees of freedom in the numerator and 16 in the denominator, the critical value is

$$F_{0.01,3,16} = 5.29$$

Since F = 50.90211 > 5.29 we reject the null hypothesis.
F test conclusion

Instead of using tables you can simply read off the Significance F relating to F = 50.90211 and compare it with the desired level of significance.
The given value is 2.09087E-08, or $2.09087 \times 10^{-8} < 0.01$.
At the 1% level we would reject $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$.
Conclusion: there is a linear relationship between at least one of the variables and the superannuation balance.
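The F value in the output can be rebuilt from the sums of squares in the ANOVA table, using F = (SSR/k) / (SSE/(n - k - 1)). A minimal sketch with our own variable names; the critical value 5.29 is the 1% table value for 3 and 16 degrees of freedom quoted in the lecture.

```python
SSR = 383564.8798   # regression sum of squares
SSE = 40188.49773   # residual (error) sum of squares
n = 20              # observations
k = 3               # independent variables

msr = SSR / k              # mean square due to regression
mse = SSE / (n - k - 1)    # mean square error
f_stat = msr / mse

F_CRITICAL = 5.29          # 1% level, 3 and 16 degrees of freedom
reject_h0 = f_stat > F_CRITICAL
print(round(f_stat, 3), reject_h0)  # 50.902 True
```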
6. Time Series and Forecasting
Time series data shows values for a variable or set of variables over time.
Graphs are usually drawn with time on the horizontal axis and lines connecting the data points. Often patterns can be determined such as seasonal variations and long term trends.
What do you notice about temperatures here?
Source: Bureau of Meteorology, http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi
Compare this graph
Source: Bureau of Meteorology, http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi
Also compare this one
Source: Bureau of Meteorology, http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi
Trend and Seasonal component

Trend
The trend is the continuous long term movement in a variable over a period of time. If a linear relationship is appropriate, the most widely used technique for isolating the trend is to use a linear regression

$$\hat{Y}_t = b_0 + b_1 t$$

where the independent variable t is time.

Seasonal Component
Many economic variables fluctuate on a regular basis throughout a defined period, usually a year. This may be due to agricultural growing and harvest cycles, holiday periods such as Christmas, or the time when school leavers join the workforce. If only annual data were used the model would not need to include a seasonal component.
Cyclical and irregular components

Cyclical Variation
Business cycles with upswings, peaks, contractions and troughs will produce a wavelike effect on time series over a relatively long period.

Irregular fluctuations
These are often caused by natural disasters such as floods, cyclones and tsunamis, or by man-made disruptions such as wars and elections.
Additive model

A time series can be decomposed into several components. In an additive model these are usually regarded as the Trend (T), the Seasonal component (S), Cyclical variations (C) and some Irregular or random fluctuations (I). Therefore the additive model can be written as

$$Y_t = T_t + S_t + C_t + I_t$$
Additive example

If we were to develop a monthly time series model of the sales value of swimwear in a retail store with the additive model

$$Y_t = T_t + S_t + C_t + I_t$$

we might find that at time t = 12

Y = $2,000 + $3,000 - $500 + $50 = $4,550
The large positive seasonal component ($3,000) reflects that swimwear sales are strongly seasonal and are high in month 12 (December). The negative value for the cyclical component might be the result of a downswing in the business cycle.
Multiplicative model

The additive model suffers from the assumption that the components are independent of each other. A more realistic alternative is the multiplicative model, which can be written as

$$Y_t = T_t \times S_t \times C_t \times I_t$$

Here only T is expressed in original units and the other terms are proportions.
Petrol example

We will use a petrol price example to see how to separate out the trend and (day of week) cyclical components from a set of data and to use this to forecast future prices.

      Week 1   Week 2   Week 3   Week 4
Sun   1.20     1.23     1.25     1.28
Mon   1.18     1.22     1.22     1.26
Tue   1.16     1.21     1.20     1.25
Wed   1.18     1.21     1.32     1.26
Thu   1.27     1.30     1.32     1.35
Fri   1.25     1.28     1.30     1.34
Sat   1.24     1.26     1.29     1.30
Step 1

We need to find the trend line, so we arrange the prices in a single column, number the days from Sunday in Week 1 as shown below, then perform a regression with price as the dependent variable and time as the independent one.

The result is the equation

$$\hat{Y} = 1.1917 + 0.0043t$$

Day   Price     Day   Price     Day   Price     Day   Price
1     1.20      8     1.23      15    1.25      22    1.28
2     1.18      9     1.22      16    1.22      23    1.26
3     1.16      10    1.21      17    1.20      24    1.25
4     1.18      11    1.21      18    1.32      25    1.26
5     1.27      12    1.30      19    1.32      26    1.35
6     1.25      13    1.28      20    1.30      27    1.34
7     1.24      14    1.26      21    1.29      28    1.30
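The trend line can also be obtained without Excel. The sketch below applies the standard simple regression formulas (slope = Sxy/Sxx, intercept = mean of Y minus slope times mean of t) to the 28 prices; the function name `fit_trend` is our own.

```python
PRICES = [
    1.20, 1.18, 1.16, 1.18, 1.27, 1.25, 1.24,  # week 1, Sun to Sat
    1.23, 1.22, 1.21, 1.21, 1.30, 1.28, 1.26,  # week 2
    1.25, 1.22, 1.20, 1.32, 1.32, 1.30, 1.29,  # week 3
    1.28, 1.26, 1.25, 1.26, 1.35, 1.34, 1.30,  # week 4
]


def fit_trend(prices):
    """Least squares line through (t, price) with t = 1, 2, ..., n."""
    n = len(prices)
    days = range(1, n + 1)
    t_mean = sum(days) / n
    y_mean = sum(prices) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(days, prices))
    sxx = sum((t - t_mean) ** 2 for t in days)
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


intercept, slope = fit_trend(PRICES)
print(round(intercept, 4), round(slope, 4))  # 1.1917 0.0043
```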
Trend line

The plot below shows how the daily price fluctuates in a fairly regular pattern around this trend line.
[Figure: Day Line Fit Plot. Horizontal axis: Day; vertical axis: Price; series shown: Price and Predicted Price.]
Step 2: multiplicative model

Assume we have a multiplicative model with a cyclical component but no seasonal one:

$$Y_t = T_t \times C_t \times I_t$$

We will try to adjust the prices to remove the cyclical (day of week) component.

Copy the predicted prices from the regression output onto the data page. Divide the price for each time period by the corresponding predicted price. Thus we have an estimate

$$\frac{Y_t}{T_t} = C_t \times I_t$$
Next find the average (price/predicted price) for each day of the week. For the four Sundays we find the average for observations 1, 8, 15 and 22.

We calculate seven averages in all, one for each day using the four weeks of observations.

These values become the multiplicative cyclical (i.e. daily) index. We can check that the index numbers add to 7. In this case they are very close to 7, so they will not need to be adjusted.
The adjusted series is then found by dividing each daily price by the index for the corresponding day.

For example, to find the adjusted price for Day 1 (a Sunday), as we see on the next slide,

$$\text{adjusted price} = \frac{1.20}{0.998808} = 1.201$$
Adjusting: first 12 days shown

Day   Price   Predicted Price   Price/predicted price   Adjusted series
1     1.20    1.196009852       1.003336216             1.201432
2     1.18    1.200353038       0.983044124             1.204932
3     1.16    1.204696223       0.962898345             1.203508
4     1.18    1.209039409       0.975981421             1.191687
5     1.27    1.213382594       1.046660802             1.220385
6     1.25    1.21772578        1.026503685             1.221727
7     1.24    1.222068966       1.014672686             1.235020
8     1.23    1.226412151       1.002925484             1.231468
9     1.22    1.230755337       0.99126119              1.245777
10    1.21    1.235098522       0.979678931             1.255383
11    1.21    1.239441708       0.976245992             1.221984
12    1.30    1.243784893       1.045196808             1.249213

Day     Average (price/pred price)
Sun     0.998808306
Mon     0.979308748
Tue     0.963849185
Wed     0.990193085
Thu     1.040655105
Fri     1.023141684
Sat     1.00403247
Total   6.999988584

Days 13-28

Day   Price   Predicted Price   Price/predicted price   Adjusted series
13    1.28    1.248128079       1.025535778             1.251049
14    1.26    1.252471264       1.006011104             1.254939
15    1.25    1.25681445        0.994577998             1.251491
16    1.22    1.261157635       0.967365193             1.245777
17    1.20    1.265500821       0.948241186             1.245008
18    1.32    1.269844007       1.03949776              1.333073
19    1.32    1.274187192       1.035954535             1.268432
20    1.30    1.278530378       1.016792423             1.270596
21    1.29    1.282873563       1.005555058             1.284819
22    1.28    1.287216749       0.994393525             1.281527
23    1.26    1.291559934       0.975564483             1.286622
24    1.25    1.29590312        0.964578278             1.296883
25    1.26    1.300246305       0.969047168             1.272479
26    1.35    1.304589491       1.034808274             1.297260
27    1.34    1.308932677       1.023734852             1.309692
28    1.30    1.313275862       0.989891033             1.294779
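The multiplicative adjustment can be sketched end to end in Python: fit the trend, divide each price by its predicted price, average the ratios by day of the week to get the daily index, then divide each price by the index for its day. The variable names are our own, and the trend is refitted here so that the block runs by itself.

```python
PRICES = [
    1.20, 1.18, 1.16, 1.18, 1.27, 1.25, 1.24,  # week 1, Sun to Sat
    1.23, 1.22, 1.21, 1.21, 1.30, 1.28, 1.26,  # week 2
    1.25, 1.22, 1.20, 1.32, 1.32, 1.30, 1.29,  # week 3
    1.28, 1.26, 1.25, 1.26, 1.35, 1.34, 1.30,  # week 4
]
DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# Trend line by least squares, with t = 1, 2, ..., 28
n = len(PRICES)
days = list(range(1, n + 1))
t_mean = sum(days) / n
y_mean = sum(PRICES) / n
slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(days, PRICES))
         / sum((t - t_mean) ** 2 for t in days))
intercept = y_mean - slope * t_mean
predicted = [intercept + slope * t for t in days]

# Price / predicted price estimates C x I for each day
ratios = [price / trend for price, trend in zip(PRICES, predicted)]

# Daily index: average ratio over the four occurrences of each weekday
index = {}
for d, name in enumerate(DAY_NAMES):
    same_day = ratios[d::7]
    index[name] = sum(same_day) / len(same_day)

# Adjusted series: price divided by the index for its day of the week
adjusted = [price / index[DAY_NAMES[i % 7]] for i, price in enumerate(PRICES)]

print({name: round(value, 6) for name, value in index.items()})
print(round(sum(index.values()), 6))   # should be very close to 7
print([round(a, 6) for a in adjusted[:7]])
```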
Forecasting

If, instead of adjusting observed prices, we wish to make a forecast from our trend line we need to include the cyclical component.

If we wish to forecast the price at day 35 we would substitute t = 35 into the regression equation and multiply the result by the index relating to a Saturday.

Thus the forecast of $Y_{35}$ is 1.0040 × (1.1917 + 0.0043(35)) = 1.3476
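The same forecast as a short calculation, using the rounded trend coefficients and the rounded Saturday index quoted above. The function name `forecast_multiplicative` is our own.

```python
INTERCEPT, SLOPE = 1.1917, 0.0043   # rounded trend line from Step 1
SATURDAY_INDEX = 1.0040             # rounded multiplicative index for Saturday


def forecast_multiplicative(t, day_index):
    """Trend value at time t multiplied by the daily index."""
    return day_index * (INTERCEPT + SLOPE * t)


forecast_35 = forecast_multiplicative(35, SATURDAY_INDEX)
print(round(forecast_35, 4))  # 1.3476
```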
Step 2: additive model

With the additive model

$$Y_t = T_t + C_t + I_t$$

after copying the predicted prices to the data page, this time they should be subtracted from the observed prices to give $Y_t - T_t = C_t + I_t$. Then the average (price - predicted price) for each day of the week should be found, giving a daily index.
Adjusting: first 18 days shown

Day   Price   Predicted Price   Price - predicted price   Adjusted series
1     1.20    1.196009852       0.003990148               1.201613
2     1.18    1.200353038       -0.020353038              1.205956
3     1.16    1.204696223       -0.044696223              1.2053
4     1.18    1.209039409       -0.029039409              1.192143
5     1.27    1.213382594       0.056617406               1.218986
6     1.25    1.21772578        0.03227422                1.220829
7     1.24    1.222068966       0.017931034               1.235172
8     1.23    1.226412151       0.003587849               1.231613
9     1.22    1.230755337       -0.010755337              1.245956
10    1.21    1.235098522       -0.025098522              1.2553
11    1.21    1.239441708       -0.029441708              1.222143
12    1.30    1.243784893       0.056215107               1.248986
13    1.28    1.248128079       0.031871921               1.250829
14    1.26    1.252471264       0.007528736               1.255172
15    1.25    1.25681445        -0.00681445               1.251613
16    1.22    1.261157635       -0.041157635              1.245956
17    1.20    1.265500821       -0.065500821              1.2453
18    1.32    1.269844007       0.050155993               1.332143

Day     Average (price - pred. price)
Sun     -0.0016133
Mon     -0.025956486
Tue     -0.045299672
Wed     -0.012142857
Thu     0.051013957
Fri     0.029170772
Sat     0.004827586
Total   -3.88578E-16
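For the additive version the only changes are that the predicted price is subtracted rather than divided out, and the daily index is subtracted from the observed price. A sketch under the same assumptions as before (our own variable names, trend refitted inside the block so that it runs by itself):

```python
PRICES = [
    1.20, 1.18, 1.16, 1.18, 1.27, 1.25, 1.24,  # week 1, Sun to Sat
    1.23, 1.22, 1.21, 1.21, 1.30, 1.28, 1.26,  # week 2
    1.25, 1.22, 1.20, 1.32, 1.32, 1.30, 1.29,  # week 3
    1.28, 1.26, 1.25, 1.26, 1.35, 1.34, 1.30,  # week 4
]
DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# Trend line by least squares, with t = 1, 2, ..., 28
n = len(PRICES)
days = list(range(1, n + 1))
t_mean = sum(days) / n
y_mean = sum(PRICES) / n
slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(days, PRICES))
         / sum((t - t_mean) ** 2 for t in days))
intercept = y_mean - slope * t_mean

# Price - predicted price estimates C + I for each day
differences = [y - (intercept + slope * t) for t, y in zip(days, PRICES)]

# Additive daily index: average difference for each day of the week
additive_index = {}
for d, name in enumerate(DAY_NAMES):
    same_day = differences[d::7]
    additive_index[name] = sum(same_day) / len(same_day)

# Adjusted series: observed price minus the index for its day of the week
adjusted = [y - additive_index[DAY_NAMES[i % 7]] for i, y in enumerate(PRICES)]

print({name: round(value, 6) for name, value in additive_index.items()})
print([round(a, 6) for a in adjusted[:7]])
```

Because least squares residuals sum to zero, the seven additive index values also sum to zero apart from rounding error.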
Forecasting

The daily adjusted series is found by subtracting the index for the relevant day from the observed price.

When forecasting with the additive model it should be assumed that I = 0, and the index should be added to the estimate from the trend.

As day 36 is a Sunday, the forecast of $Y_{36}$ would be 1.1917 + 0.0043(36) + (-0.0016) = 1.3449
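The additive forecast as a short calculation, using the rounded trend coefficients and the rounded Sunday index quoted above. The function name `forecast_additive` is our own.

```python
INTERCEPT, SLOPE = 1.1917, 0.0043   # rounded trend line from Step 1
SUNDAY_INDEX = -0.0016              # rounded additive index for Sunday


def forecast_additive(t, day_index):
    """Trend value at time t plus the daily index (irregular term taken as 0)."""
    return INTERCEPT + SLOPE * t + day_index


forecast_36 = forecast_additive(36, SUNDAY_INDEX)
print(round(forecast_36, 4))  # 1.3449
```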
7. Price indices
What do the CPI, S&P/ASX 200 and the Hang Seng all have in common? They are indices.
A simple price index looks at only one item, e.g. the price of 1 kg of navel oranges in 2012 compared with the base year of 2000:

$$\frac{\text{2012 price}}{\text{2000 price}} \times 100 = \frac{2.99}{1.60} \times 100 = 186.875$$
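As a quick check of the arithmetic, a one-function sketch (the name `simple_price_index` is our own):

```python
def simple_price_index(current_price, base_price):
    """Price in the current period as a percentage of the base period price."""
    return current_price / base_price * 100


oranges_index = simple_price_index(2.99, 1.60)
print(round(oranges_index, 3))  # 186.875
```

An index of 186.875 means the 2012 price is 86.875% higher than the price in the base year.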
Simple aggregate index

A composite index is made up of changes in a number of items.

A simple aggregate index can be found by finding the sum of current prices × 100 divided by the sum of base year prices, or

$$\frac{\sum p_n}{\sum p_0} \times 100$$
Example 1
Construct a simple aggregate index for the following basket of food items
Item                  2000   2012
Zucchini/kg           3.99   5.99
Mushrooms/kg          6.50   7.99
Pink Lady Apples/kg   3.99   5.99
Navel Oranges/kg      1.60   2.99
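One way to work Example 1 is sketched below. The slide does not state the answer, so the printed value is simply what the formula gives for this basket; the dictionary and function names are our own.

```python
# (2000 price, 2012 price) per kg for each item in the basket
BASKET = {
    "Zucchini": (3.99, 5.99),
    "Mushrooms": (6.50, 7.99),
    "Pink Lady Apples": (3.99, 5.99),
    "Navel Oranges": (1.60, 2.99),
}


def simple_aggregate_index(basket):
    """Sum of current prices x 100 divided by the sum of base year prices."""
    base_total = sum(base for base, _ in basket.values())
    current_total = sum(current for _, current in basket.values())
    return current_total / base_total * 100


aggregate_index = simple_aggregate_index(BASKET)
print(round(aggregate_index, 2))  # 142.79
```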
Deficiencies

- The main deficiency of a simple aggregate index is that it takes no account of quantities purchased.
- If prices are quoted in different units, e.g. per mushroom instead of per kilo, the index will be affected and give a different result.
- A large price for one item may dominate.
Weighted indices

Weighted indices allow greater importance to be given to items for which greater quantities are sold or consumed.

The Laspeyres index uses base period quantities ($q_0$) as weights.

It can be used to compare prices between other periods.

$$\text{Laspeyres index} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$$
Example 2: Laspeyres index

Calculate the Laspeyres index for 2012.

Item                  2000 p   2012 p   2000 q   2012 q
Zucchini/kg           3.99     5.99     3.2      4.3
Mushrooms/kg          6.50     7.99     1.2      1.5
Pink Lady Apples/kg   3.99     5.99     5.2      5.6
Navel Oranges/kg      1.60     2.99     6.2      7.0
The Paasche index uses the current period quantities but has some practical problems, such as obtaining quantity data for every period.

$$\text{Paasche index} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100$$
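The two weighted indices differ only in which quantities are used as weights, which a small sketch makes explicit. It uses the basket from Example 2; the slides do not give the answers, so the printed values are just what the formulas produce, and the names are our own.

```python
# (2000 price, 2012 price, 2000 quantity, 2012 quantity) for each item
BASKET = {
    "Zucchini": (3.99, 5.99, 3.2, 4.3),
    "Mushrooms": (6.50, 7.99, 1.2, 1.5),
    "Pink Lady Apples": (3.99, 5.99, 5.2, 5.6),
    "Navel Oranges": (1.60, 2.99, 6.2, 7.0),
}


def laspeyres_index(basket):
    """Current and base prices both weighted by base period quantities."""
    numerator = sum(pn * q0 for _, pn, q0, _ in basket.values())
    denominator = sum(p0 * q0 for p0, _, q0, _ in basket.values())
    return numerator / denominator * 100


def paasche_index(basket):
    """Current and base prices both weighted by current period quantities."""
    numerator = sum(pn * qn for _, pn, _, qn in basket.values())
    denominator = sum(p0 * qn for p0, _, _, qn in basket.values())
    return numerator / denominator * 100


laspeyres = laspeyres_index(BASKET)
paasche = paasche_index(BASKET)
print(round(laspeyres, 2), round(paasche, 2))  # 153.1 152.55
```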
Notices
Don't forget to complete CATEI evaluations for this course on myUNSW. They are carried out anonymously and will help us plan for future changes in the course.

We will hand out the assignments which have been corrected so far during Week 12 tutorials. If there are any we have not managed to mark by Thursday, the rest will be handed out during the Week 13 tutorials or can be picked up from Judith's office after that date.

Stuvac consultations

Judith's consultation times up to the exam will be slightly different:
- Week 13 as normal: Tuesday 2-4, Thursday 4-5
- Tuesday June 9, 2-4
- Thursday June 11, 2-3
- Monday June 15, 2-4
- Thursday June 18, 2-3
- Tuesday June 23, 2-4
- Thursday June 25, 2-3
- and by appointment
Exam
The exam will consist of two parts:
- Part A: 16 multiple choice questions on both maths and statistics, each worth 1 mark. Use a pencil to mark answers and your personal details.
- Part B: 3 written problems, with two of the three based on the statistics section of the course. They are not all of equal marks so plan your time carefully.

Total marks for the exam: 50

Please bring an approved calculator, textbook, notes, pencil, pen, ruler, eraser. No tables will be supplied (you can use the textbook ones).

And last of all
Don't forget that the final Regression eLearning tutorial will run in week 13 to help with your revision.
Thanks for your participation in COMM5005. We hope you have learned some useful skills and that your efforts are rewarded with good results.