Lecture 12

Jan 10, 2016

  • 1 Business School, J. Watson, Semester 1 2015

    1

    Quantitative Methods for Business COMM5005 Lecture 12

    In this final lecture we will look at how to interpret the Excel output for a multiple regression, see how to separate the trend and cyclical components in time series data for forecasting, and see how to calculate index numbers.

    2

    Readings

    For today's topics: Berenson et al., Ch 12.7; Ch 13.1-13.4, 13.6; Ch 14.1-14.4; Ch 14.9

    3

    1. Multiple regression example

    We will now look at an example of a multiple regression where we have an employee's superannuation account balance as the dependent variable. There is data from 20 employees.

    We are trying to estimate the effect that the independent variables (years in workforce, gender and current salary) have on the superannuation balance.

    Gender is a qualitative variable. We represent it by using a dummy variable which has a value of 1 for a male and 0 for a female employee.

    4

    Superannuation data

    Years in Workforce   Gender   Salary $'000   Superannuation Balance $'000
    25                   1        50.6           117.9
    31                   1        75.2           417.1
    37                   1        48.3           156.2
    37                   1        52.3           202.9
    40                   1        106.2          506.2
    30                   1        61.3           255
    32                   1        52.6           179.8
    26                   1        48.9           82.6
    29                   1        42.6           47.3
    36                   1        89.5           488.5
    28                   1        33.1           70.5
    29                   1        35.6           120.1
    10                   0        31.2           15.6
    15                   0        33.9           8.9
    30                   0        49.7           124.7
    28                   0        69.3           223.4
    35                   0        86.4           301.6
    17                   0        28.1           52.8
    25                   0        46.2           67.9
    22                   0        50.7           89.5

  • 2

    5

    Excel output

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R          0.951399308
    R Square            0.905160643
    Adjusted R Square   0.887378264
    Standard Error      50.11767261
    Observations        20

    ANOVA
                 df   SS            MS         F          Significance F
    Regression   3    383564.8798   127855     50.90211   2.09087E-08
    Residual     16   40188.49773   2511.781
    Total        19   423753.3775

                           Coefficients    Standard Error   t Stat      P-value    Lower 95%      Upper 95%
    Intercept              -196.0738799    45.52498098      -4.306951   0.000543   -292.582528    -99.56523
    Years in Workforce     -0.604845409    2.633748738      -0.229652   0.821272   -6.18814328    4.978452
    Gender                 59.58681943     29.84990611      1.996215    0.06322    -3.6921543     122.8658
    Current Salary $'000   6.480588884     0.799454349      8.106265    4.67E-07   4.785821383    8.175356
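The coefficients and R Square in this output can be checked outside Excel. The following is a minimal sketch, not a description of how Excel computes them: it solves the normal equations with plain Gaussian elimination, using the 20 observations from the data slide. The function names (`solve`, `fit_ols`, `r_squared`) are illustrative choices.

```python
# Superannuation data: (years in workforce, gender, salary $'000, balance $'000)
DATA = [
    (25, 1, 50.6, 117.9), (31, 1, 75.2, 417.1), (37, 1, 48.3, 156.2),
    (37, 1, 52.3, 202.9), (40, 1, 106.2, 506.2), (30, 1, 61.3, 255.0),
    (32, 1, 52.6, 179.8), (26, 1, 48.9, 82.6), (29, 1, 42.6, 47.3),
    (36, 1, 89.5, 488.5), (28, 1, 33.1, 70.5), (29, 1, 35.6, 120.1),
    (10, 0, 31.2, 15.6), (15, 0, 33.9, 8.9), (30, 0, 49.7, 124.7),
    (28, 0, 69.3, 223.4), (35, 0, 86.4, 301.6), (17, 0, 28.1, 52.8),
    (25, 0, 46.2, 67.9), (22, 0, 50.7, 89.5),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows):
    """Return [b0, b1, b2, b3] from the normal equations X'X b = X'y."""
    xs = [[1.0, *row[:-1]] for row in rows]
    ys = [row[-1] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, coefs):
    """R Square = 1 - SSE/SST for the fitted coefficients."""
    ys = [row[-1] for row in rows]
    mean_y = sum(ys) / len(ys)
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row[:-1])) for row in rows]
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


b = fit_ols(DATA)
print("coefficients:", [round(v, 4) for v in b])
print("R Square:", round(r_squared(DATA, b), 4))
```

The printed values should agree with the Coefficients column and the R Square line of the Excel output to the displayed precision.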

    6

    Regression equation

    The sample regression equation is now

    \[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} \]

    We can no longer draw this as a line because it is in four dimensional space. From the Excel output we see that the equation should be

    \[ \hat{Y}_i = -196.07 - 0.6048 X_{1i} + 59.5868 X_{2i} + 6.4806 X_{3i} \]

    where
    \(\hat{Y}_i\) = predicted superannuation balance $'000
    \(X_{1i}\) = years in workforce
    \(X_{2i}\) = gender
    \(X_{3i}\) = salary $'000

    7

    Estimate 1

    So a male employee with a current salary of $50,000 who has worked for 30 years is estimated to have a superannuation balance of

    \[ \hat{Y} = -196.07 - 0.6048(30) + 59.5868(1) + 6.4806(50) \approx 169.397 \]

    or $169,397 (using the unrounded Excel coefficients).

    8

    Estimate 2

    However a female employee with the same salary and period in the workforce is estimated to have a superannuation balance of

    \[ \hat{Y} = -196.07 - 0.6048(30) + 59.5868(0) + 6.4806(50) \approx 109.810 \]

    or only $109,810 (using the unrounded Excel coefficients).
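The two estimates can be reproduced with a small helper. This is only a sketch: the function name `predict_balance` is illustrative, and the coefficients are copied unrounded from the Excel output.

```python
# Coefficients as reported in the Excel output (unrounded)
B0 = -196.0738799
B_YEARS = -0.604845409
B_GENDER = 59.58681943
B_SALARY = 6.480588884


def predict_balance(years, gender, salary_k):
    """Predicted superannuation balance in $'000; gender is 1 for male, 0 for female."""
    return B0 + B_YEARS * years + B_GENDER * gender + B_SALARY * salary_k


male = predict_balance(30, 1, 50)
female = predict_balance(30, 0, 50)
print("male:", round(male, 3))
print("female:", round(female, 3))
# With years and salary held equal, the gap is exactly the gender coefficient
print("difference:", round(male - female, 4))
```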

  • 3

    9

    Interpretation of coefficients

    Each b coefficient can be seen as a partial derivative, i.e. the rate at which the super balance changes if all other variables are kept constant.

    So we can interpret \(b_3 = 6.4806\) to mean that the estimated superannuation account balance will increase by 6.4806 thousand dollars for every extra thousand dollars of current salary, keeping gender and working years constant.

    10

    Residuals

    Observation   Predicted Superannuation Balance   Residuals

    1 176.3096018 -58.40960183

    2 332.1030159 84.99698407

    3 154.1461025 2.053897502

    4 180.068458 22.83154197

    5 527.5576627 -21.35766266

    6 242.6276759 12.37232415

    7 185.0368617 -5.236861742

    8 164.6877553 -82.08775532

    9 122.0455091 -74.74550913

    10 421.7512099 66.74879007

    11 61.08476014 9.415239862

    12 76.68138694 43.41861306

    13 0.072039187 15.52796081

    14 14.54540213 -5.645402131

    15 107.8660254 16.83397463

    16 236.0952583 -12.69525831

    17 342.6794104 -41.07941037

    18 -24.25170421 77.05170421

    19 88.20819132 -20.30819132

    20 119.1853775 -29.68537752

    11

    Residual plots

    We should check the residuals against \(\hat{Y}_i\) plus (separately) against each of the independent variables. These plots are shown below and on the next slide.

    [Figure: Residuals versus predicted Y. Horizontal axis: predicted super (-100 to 600); vertical axis: residuals (-100 to 100). Check that the residuals fall on either side of the 0 level.]

    12

    Residuals versus X variables

    [Figure: Current Salary $'000 Residual Plot. Horizontal axis: Current Salary $'000 (0 to 120); vertical axis: Residuals (-100 to 100).]

    [Figure: Years in Workforce Residual Plot. Horizontal axis: Years in Workforce (0 to 50); vertical axis: Residuals (-100 to 100).]

    [Figure: Gender Residual Plot. Horizontal axis: Gender (0 to 1.5); vertical axis: Residuals (-100 to 100).]

  • 4

    13

    Residuals normal?

    We also check to see if the residuals are normally distributed. The histogram here, created from the data on slide 10, suggests that they are approximately normal.

    [Figure: Histogram of the residuals. Horizontal axis: Residual (bins -100, -50, 0, 50, 100, More); vertical axis: Frequency (0 to 8).]
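The histogram counts can be rebuilt from the residuals on slide 10. The sketch below assumes each bin includes its upper edge and adds a final "More" bin, which is how the bin labels on the slide read. The residuals are rounded to two decimals and the helper name `bin_counts` is illustrative.

```python
# Residuals from the regression output, rounded to 2 decimals
RESIDUALS = [
    -58.41, 85.00, 2.05, 22.83, -21.36, 12.37, -5.24, -82.09, -74.75, 66.75,
    9.42, 43.42, 15.53, -5.65, 16.83, -12.70, -41.08, 77.05, -20.31, -29.69,
]
UPPER_EDGES = [-100, -50, 0, 50, 100]


def bin_counts(values, upper_edges):
    """Count values per bin (previous edge, edge], with a final 'More' bin."""
    counts = [0] * (len(upper_edges) + 1)
    for v in values:
        for i, edge in enumerate(upper_edges):
            if v <= edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # larger than every edge
    return counts


counts = bin_counts(RESIDUALS, UPPER_EDGES)
for label, count in zip([*UPPER_EDGES, "More"], counts):
    print(label, count)
```

A roughly symmetric, single-peaked set of counts around zero is what we would hope to see if the residuals are close to normal. With only 20 observations this is an informal check.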

    14

    2. R Square

    The R Square value tells us that 90.5% of the variation in superannuation balances is explained by this model.

    Remember though that if we were comparing alternative models with different numbers of variables it would be preferable to use the adjusted R Square, which allows for the number of independent variables in the model and here has a value of 88.7%.

    15

    3. Which variables are significant?

    Look at the t Stat and P-value columns. Will we reject the hypotheses that \(\beta_0\) and \(\beta_3\) are zero using two tail tests and a 1% level of significance?

                           Coefficients    Standard Error   t Stat     P-value
    Intercept              -196.0738799    45.52498098      -4.30695   0.000543
    Years in Workforce     -0.604845409    2.633748738      -0.22965   0.821272
    Gender                 59.58681943     29.84990611      1.996215   0.06322
    Current Salary $'000   6.480588884     0.799454349      8.106265   4.67E-07

    Which variables are significant (cont.)

    Yes, we can reject \(H_0: \beta_0 = 0\) and \(H_0: \beta_3 = 0\).

    The reason is that both P-values are below 0.01: for the Intercept the P-value is 0.000543, and for Current Salary it is 4.67E-07.

  • 5

    17

    However, we can only reject the hypothesis that the gender coefficient \(\beta_2 = 0\) using a two tail test at the 10% level, since its P-value is 0.06322.

    We are not able to reject the hypothesis that the years in the workforce coefficient \(\beta_1 = 0\) even at the 10% level, since its P-value is 0.821272.
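The decisions above all follow one rule: reject the hypothesis that a coefficient is zero when its two tail P-value is below the chosen level of significance. A short sketch, using the P-values from the Excel output (the helper name `significant` is illustrative):

```python
# Two tail P-values from the Excel output
P_VALUES = {
    "Intercept": 0.000543,
    "Years in Workforce": 0.821272,
    "Gender": 0.06322,
    "Current Salary $'000": 4.67e-07,
}


def significant(p_value, alpha):
    """Reject H0 (coefficient = 0) when the P-value is below alpha."""
    return p_value < alpha


for name, p in P_VALUES.items():
    rejected_at = [alpha for alpha in (0.01, 0.05, 0.10) if significant(p, alpha)]
    print(name, "-> H0 rejected at levels:", rejected_at)
```

An empty list means the hypothesis is not rejected at any of the 1%, 5% or 10% levels.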

    18

    4. Confidence intervals for the \(\beta\)s

    Excel also gives us another way of analysing the beta coefficients. We can find a confidence interval for each population coefficient.

    The endpoints of these are shown in the columns Lower 95% and Upper 95%.

    We can also check these to see which intervals contain zero and which do not.

                           Lower 95%      Upper 95%
    Intercept              -292.582528    -99.5652
    Years in Workforce     -6.18814328    4.978452
    Gender                 -3.6921543     122.8658
    Current Salary $'000   4.785821383    8.175356
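Each interval is the coefficient plus or minus a critical t value times its standard error. The sketch below assumes a critical value of 2.1199 (two tail 5%, 16 degrees of freedom, as read from a t table). The coefficients and standard errors come from the Excel output, and the helper names are illustrative.

```python
T_CRIT = 2.1199  # assumed: t critical value for 95% confidence with 16 df

# (coefficient, standard error) from the Excel output
COEFS = {
    "Intercept": (-196.0738799, 45.52498098),
    "Years in Workforce": (-0.604845409, 2.633748738),
    "Gender": (59.58681943, 29.84990611),
    "Current Salary $'000": (6.480588884, 0.799454349),
}


def confidence_interval(coef, se, t_crit=T_CRIT):
    """Return (lower, upper) = coef -/+ t_crit * se."""
    margin = t_crit * se
    return coef - margin, coef + margin


def contains_zero(interval):
    lower, upper = interval
    return lower <= 0 <= upper


for name, (coef, se) in COEFS.items():
    ci = confidence_interval(coef, se)
    print(name, round(ci[0], 3), round(ci[1], 3), "contains zero:", contains_zero(ci))
```

An interval that contains zero corresponds to a coefficient that is not significant at the 5% level in a two tail test.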

    19

    5. Analysis of Variance

    As mentioned in the last lecture, the ANOVA section of the output refers to analysis of variance and shows sums of squares due to the regression (SSR), residual or errors (SSE) and total sum of squares (SST).

    We have already seen how these have been used to give the \(R^2\) value.

    They are also used to calculate the F value which is a measure of the overall significance of the regression model.

    20

    F test

    For a regression with only one independent variable the significance level of F is the same as the p-value for the slope's t test. In a multiple regression the F value is used to perform a joint test of the regression coefficients, i.e. that

    \[ H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0 \]

    against the alternative

    \[ H_1: \text{at least one of the coefficients } \beta_1, \beta_2, \dots, \beta_k \text{ is non-zero} \]

    where the test statistic is

    \[ F = \frac{SSR / k}{SSE / (n - k - 1)} = \frac{MSR}{MSE} \]

    From the Excel output we see F = 50.90211

  • 6

    21

    F tables

    The F tables are at Table E4 in your textbook. Four separate tables are given for alpha levels of 0.05, 0.025, 0.01 and 0.005 in the upper tail.

    The F distribution depends on degrees of freedom for both the numerator and denominator.

    If we use a 1% level of significance, with 3 degrees of freedom in the numerator and 16 in the denominator, the critical value is

    \[ F_{0.01,3,16} = 5.29 \]

    Since F = 50.90211 > 5.29 we reject the null hypothesis.
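The F value can be recomputed from the sums of squares in the ANOVA table. This is a small sketch with an illustrative function name. The critical value 5.29 is the table value quoted on the slide.

```python
# Sums of squares from the ANOVA section of the Excel output
SSR = 383564.8798
SSE = 40188.49773
N, K = 20, 3          # observations and independent variables
F_CRIT = 5.29         # F(0.01; 3, 16) from the F table


def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE with k and n - k - 1 degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


f_value = f_statistic(SSR, SSE, N, K)
print("F =", round(f_value, 4))
print("reject H0 at 1%:", f_value > F_CRIT)
# The same sums of squares give R Square = SSR / SST
print("R Square =", round(SSR / (SSR + SSE), 4))
```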

    22

    F test conclusion

    Instead of using tables you can simply read off the Significance F relating to F = 50.90211 and compare it with the desired level of significance.

    The given value is 2.09087E-08, or \(2.09087 \times 10^{-8}\), which is less than 0.01.

    At the 1% level we would reject \(H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0\).

    Conclusion: there is a linear relationship between at least one of the independent variables and the superannuation balance.

    23

    6. Time Series and Forecasting

    Time series data shows values for a variable or set of variables over time.

    Graphs are usually drawn with time on the horizontal axis and lines connecting the data points. Often patterns can be determined such as seasonal variations and long term trends.

    What do you notice about temperatures here?

    Source: Bureau of Meteorology http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi

  • 7

    Compare this graph

    Source: Bureau of Meteorology http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi

    25

    Also compare this one

    Source: Bureau of Meteorology http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi

    26

    27

    Trend and Seasonal component

    Trend
    The trend is the continuous long term movement in a variable over a period of time. If a linear relationship is appropriate, the most widely used technique for isolating the trend is to use a linear regression with

    \[ \hat{Y}_t = b_0 + b_1 t \]

    where the independent variable t is time.

    Seasonal Component
    Many economic variables fluctuate on a regular basis throughout a defined period, usually a year. This may be due to agricultural growing and harvest cycles, holiday periods such as Christmas, or times when school leavers join the workforce. If only annual data were used the model would not need to include a seasonal component.

    28

    Cyclical and irregular components

    Cyclical Variation
    Business cycles with upswings, peaks, contractions and troughs will produce a wavelike effect on time series over a relatively long period.

    Irregular fluctuations
    These will often be caused by natural disasters such as floods, cyclones and tsunamis or man-made disruptions such as wars and elections.

  • 8

    29

    additive model

    A time series can be decomposed into several components. In an additive model these are usually regarded as the Trend (T), the Seasonal component (S), Cyclical variations (C) and some Irregular or random fluctuations (I). Therefore the additive model can be written as

    \[ Y_t = T_t + S_t + C_t + I_t \]

    30

    additive example

    If we were to develop a monthly time series model of the sales value of swimwear in a retail store with the additive model

    \[ Y_t = T_t + S_t + C_t + I_t \]

    we might find that at time t = 12

    \[ Y_{12} = \$2,000 + \$3,000 - \$500 + \$50 = \$4,550 \]

    The large positive seasonal component ($3,000) reflects that swimwear sales are strongly seasonal and are high in month 12 (December). The negative value for the cyclical component might be the result of a downswing in the business cycle.

    31

    multiplicative model

    The additive model suffers from the assumption that the components are independent of each other. A more realistic alternative is the multiplicative model which can be written as

    \[ Y_t = T_t \times S_t \times C_t \times I_t \]

    Here only T is expressed in original units and the other terms are proportions.

    32

    Petrol example

    We will use a petrol price example to see how to separate out the trend and (day of week) cyclical components from a set of data and to use this to forecast future prices.

           Week 1   Week 2   Week 3   Week 4
    Sun    1.20     1.23     1.25     1.28
    Mon    1.18     1.22     1.22     1.26
    Tue    1.16     1.21     1.20     1.25
    Wed    1.18     1.21     1.32     1.26
    Thu    1.27     1.30     1.32     1.35
    Fri    1.25     1.28     1.30     1.34
    Sat    1.24     1.26     1.29     1.30

  • 9

    33

    Step 1

    We need to find the trend line, so we will arrange the prices in a single column and number the days from Sunday in Week 1 as shown below, then perform a regression with price as the dependent variable and time as the independent one.

    The result is the equation

    \[ \hat{Y}_t = 1.1917 + 0.0043 t \]

    Day   Price     Day   Price     Day   Price     Day   Price
    1     1.20      8     1.23      15    1.25      22    1.28
    2     1.18      9     1.22      16    1.22      23    1.26
    3     1.16      10    1.21      17    1.20      24    1.25
    4     1.18      11    1.21      18    1.32      25    1.26
    5     1.27      12    1.30      19    1.32      26    1.35
    6     1.25      13    1.28      20    1.30      27    1.34
    7     1.24      14    1.26      21    1.29      28    1.30
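The trend line can be checked with the usual least-squares formulas for a single independent variable. A minimal sketch using the 28 daily prices (the function name `fit_trend` is illustrative):

```python
# Daily petrol prices for days 1-28 (Sunday of Week 1 is day 1)
PRICES = [
    1.20, 1.18, 1.16, 1.18, 1.27, 1.25, 1.24,
    1.23, 1.22, 1.21, 1.21, 1.30, 1.28, 1.26,
    1.25, 1.22, 1.20, 1.32, 1.32, 1.30, 1.29,
    1.28, 1.26, 1.25, 1.26, 1.35, 1.34, 1.30,
]


def fit_trend(prices):
    """Return (b0, b1) for the least-squares line price = b0 + b1 * t, t = 1..n."""
    n = len(prices)
    t_mean = (n + 1) / 2
    y_mean = sum(prices) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(prices, start=1))
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


b0, b1 = fit_trend(PRICES)
print("intercept:", round(b0, 4))
print("slope:", round(b1, 4))
```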

    34

    Trend line

    The plot below shows how the daily price fluctuates in a fairly regular pattern around this trend line.

    [Figure: Day Line Fit Plot. Horizontal axis: Day (0 to 30); vertical axis: Price (1.10 to 1.40); series: Price and Predicted Price.]

    35

    Step 2 - multiplicative model

    Assume we have a multiplicative model with a cyclical component but no seasonal one:

    \[ Y_t = T_t C_t I_t \]

    We will try to adjust the prices to remove the cyclical (day of week) component.

    Copy the predicted prices from the regression output onto the data page. Divide the price for each time period by the corresponding predicted price. Thus we have an estimate

    \[ \frac{Y_t}{T_t} = C_t I_t \]

    36

    Next find the average (price/predicted price) for each day of the week. For the four Sundays we find the average for observations 1,8,15 and 22.

    We calculate seven averages in all, one for each day using the four weeks of observations.

    These values become the multiplicative cyclical (i.e. daily) index. We can check that the index numbers add to 7. In this case they are very close to 7 so will not need to be adjusted.
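The index calculation described above can be sketched as follows. The code refits the trend from the 28 prices rather than copying predicted values, so results may differ from the spreadsheet in the last decimal places. All function names are illustrative.

```python
# Daily petrol prices for days 1-28 (Sunday of Week 1 is day 1)
PRICES = [
    1.20, 1.18, 1.16, 1.18, 1.27, 1.25, 1.24,
    1.23, 1.22, 1.21, 1.21, 1.30, 1.28, 1.26,
    1.25, 1.22, 1.20, 1.32, 1.32, 1.30, 1.29,
    1.28, 1.26, 1.25, 1.26, 1.35, 1.34, 1.30,
]
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def trend_values(prices):
    """Least-squares trend b0 + b1 * t evaluated at t = 1..n."""
    n = len(prices)
    t_mean = (n + 1) / 2
    y_mean = sum(prices) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(prices, start=1))
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return [b0 + b1 * t for t in range(1, n + 1)]


def daily_index(prices, predicted):
    """Average price/predicted ratio for each day of the week (7 values)."""
    ratios = [y / p for y, p in zip(prices, predicted)]
    return [sum(ratios[d::7]) / len(ratios[d::7]) for d in range(7)]


predicted = trend_values(PRICES)
index = daily_index(PRICES, predicted)
# Adjusted series: each price divided by the index for its day of the week
adjusted = [y / index[i % 7] for i, y in enumerate(PRICES)]

for day, value in zip(DAYS, index):
    print(day, round(value, 6))
print("sum of index:", round(sum(index), 6))
print("adjusted price, day 1:", round(adjusted[0], 6))
```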

  • 10

    37

    The adjusted series is then found by dividing each daily price by the index for the corresponding day.

    For example, to find the adjusted price for Day 1 (a Sunday), as we see on the next slide:

    adjusted price = 1.20 / 0.998808 = 1.201

    38

    adjusting: first 12 days shown

    Day   Price   Predicted Price   Price/predicted price   Adjusted series
    1     1.20    1.196009852       1.003336216             1.201432
    2     1.18    1.200353038       0.983044124             1.204932
    3     1.16    1.204696223       0.962898345             1.203508
    4     1.18    1.209039409       0.975981421             1.191687
    5     1.27    1.213382594       1.046660802             1.220385
    6     1.25    1.21772578        1.026503685             1.221727
    7     1.24    1.222068966       1.014672686             1.23502
    8     1.23    1.226412151       1.002925484             1.22591
    9     1.22    1.230755337       0.99126119              1.241043
    10    1.21    1.235098522       0.979678931             1.256623
    11    1.21    1.239441708       0.976245992             1.239778
    12    1.30    1.243784893       1.045196808             1.242045

    Day     Average (price/pred price)
    Sun     0.998808306
    Mon     0.979308748
    Tue     0.963849185
    Wed     0.990193085
    Thu     1.040655105
    Fri     1.023141684
    Sat     1.00403247
    Total   6.999988584

    39

    days 13-28

    Day   Price   Predicted Price   Price/predicted price   Adjusted series
    13    1.28    1.248128079       1.025535778             1.246951
    14    1.26    1.252471264       1.006011104             1.24178
    15    1.25    1.25681445        0.994577998             1.245844
    16    1.22    1.261157635       0.967365193             1.241043
    17    1.20    1.265500821       0.948241186             1.246237
    18    1.32    1.269844007       1.03949776              1.352485
    19    1.32    1.274187192       1.035954535             1.261154
    20    1.30    1.278530378       1.016792423             1.266435
    21    1.29    1.282873563       1.005555058             1.271346
    22    1.28    1.287216749       0.994393525             1.275744
    23    1.26    1.291559934       0.975564483             1.281733
    24    1.25    1.29590312        0.964578278             1.298164
    25    1.26    1.300246305       0.969047168             1.291008
    26    1.35    1.304589491       1.034808274             1.289816
    27    1.34    1.308932677       1.023734852             1.305402
    28    1.30    1.313275862       0.989891033             1.281201

    Note: from Day 8 onward the tabulated Adjusted series values equal the price divided by the Week 1 price/predicted price for the same weekday, not by the daily index. Dividing by the index as described gives, for example, 1.23 / 0.998808 = 1.2315 for Day 8.

    40

    forecasting

    If, instead of adjusting observed prices, we wish to make a forecast from our trend line we need to include the cyclical component.

    If we wish to forecast the price at day 35 we would substitute t = 35 into the regression equation and multiply the result by the index relating to a Saturday.

    Thus the forecast \(\hat{Y}_{35}\) is 1.0040 × (1.1917 + 0.0043(35)) = 1.3476
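The forecasting rule can be written as a small function. This sketch uses the rounded trend coefficients from Step 1 and the daily index values rounded to four decimals. The function name and the weekday lookup by `(t - 1) % 7` are illustrative choices, relying on day 1 being a Sunday.

```python
B0, B1 = 1.1917, 0.0043  # rounded trend line from Step 1

# Multiplicative daily index, Sunday first, rounded to 4 decimals
MULT_INDEX = [0.9988, 0.9793, 0.9638, 0.9902, 1.0407, 1.0231, 1.0040]


def forecast_multiplicative(t):
    """Trend estimate for day t multiplied by the index for that day of the week."""
    day_of_week = (t - 1) % 7  # 0 = Sunday because day 1 is a Sunday
    return MULT_INDEX[day_of_week] * (B0 + B1 * t)


print("day 35 forecast:", round(forecast_multiplicative(35), 4))
```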

  • 11

    41

    Step 2 - additive model

    The additive model, again with a cyclical component but no seasonal one, is

    \[ Y_t = T_t + C_t + I_t \]

    After copying the predicted prices to the data page, this time they should be subtracted from the observed prices to give

    \[ Y_t - T_t = C_t + I_t \]

    Then the average (price - predicted price) for each day of the week should be found, giving a daily index.

    42

    adjusting: first 18 days shown

    Day   Price   Predicted Price   Price - predicted price   Adjusted series
    1     1.20    1.196009852       0.003990148               1.201613
    2     1.18    1.200353038       -0.020353038              1.205956
    3     1.16    1.204696223       -0.044696223              1.2053
    4     1.18    1.209039409       -0.029039409              1.192143
    5     1.27    1.213382594       0.056617406               1.218986
    6     1.25    1.21772578        0.03227422                1.220829
    7     1.24    1.222068966       0.017931034               1.235172
    8     1.23    1.226412151       0.003587849               1.231613
    9     1.22    1.230755337       -0.010755337              1.245956
    10    1.21    1.235098522       -0.025098522              1.2553
    11    1.21    1.239441708       -0.029441708              1.222143
    12    1.30    1.243784893       0.056215107               1.248986
    13    1.28    1.248128079       0.031871921               1.250829
    14    1.26    1.252471264       0.007528736               1.255172
    15    1.25    1.25681445        -0.00681445               1.251613
    16    1.22    1.261157635       -0.041157635              1.245956
    17    1.20    1.265500821       -0.065500821              1.2453
    18    1.32    1.269844007       0.050155993               1.332143

    Day     Average (price - pred. price)
    Sun     -0.0016133
    Mon     -0.025956486
    Tue     -0.045299672
    Wed     -0.012142857
    Thu     0.051013957
    Fri     0.029170772
    Sat     0.004827586
    Total   -3.88578E-16

    43

    forecasting

    The daily adjusted series is found by subtracting the index for the relevant day from the observed price.

    When forecasting with the additive model it should be assumed that I = 0, and the index should be added to the estimate from the trend.

    As day 36 is a Sunday, the forecast \(\hat{Y}_{36}\) would be 1.1917 + 0.0043(36) + (-0.0016) = 1.3449
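Both additive steps, adjusting an observed price and forecasting a future one, can be sketched with the tabulated index. The trend coefficients are the rounded values from Step 1, and the function names are illustrative.

```python
B0, B1 = 1.1917, 0.0043  # rounded trend line from Step 1

# Additive daily index (average of price - predicted price), Sunday first
ADD_INDEX = [
    -0.0016133, -0.025956486, -0.045299672, -0.012142857,
    0.051013957, 0.029170772, 0.004827586,
]


def adjust_additive(price, t):
    """Remove the day-of-week effect from the price observed on day t."""
    return price - ADD_INDEX[(t - 1) % 7]


def forecast_additive(t):
    """Trend estimate for day t plus the index for that day (assuming I = 0)."""
    return B0 + B1 * t + ADD_INDEX[(t - 1) % 7]


print("adjusted price, day 1:", round(adjust_additive(1.20, 1), 6))
print("adjusted price, day 18:", round(adjust_additive(1.32, 18), 6))
print("day 36 forecast:", round(forecast_additive(36), 4))
```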

    44

    7. Price indices

    What do the CPI, S&P/ASX 200 and the Hang Seng all have in common? They are indices.

    A simple price index looks at only one item e.g. the price of 1 kg of navel oranges in 2012 compared with the base year of 2000

    \[ \frac{\text{2012 price}}{\text{2000 price}} \times 100 = \frac{2.99}{1.60} \times 100 = 186.875 \]

  • 12

    45

    simple aggregate index

    A composite index is made up of changes in a number of items.

    A simple aggregate index can be found by finding the sum of current prices x 100 divided by the sum of base year prices, or

    \[ \frac{\sum p_n}{\sum p_0} \times 100 \]

    46

    Example 1

    Construct a simple aggregate index for the following basket of food items

    Item                  2000   2012
    Zucchini/kg           3.99   5.99
    Mushrooms/kg          6.50   7.99
    Pink Lady Apples/kg   3.99   5.99
    Navel Oranges/kg      1.60   2.99
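One way to work Example 1 is sketched below, together with the simple price index for a single item. The function names are illustrative and the prices are those in the table.

```python
PRICES_2000 = {"Zucchini": 3.99, "Mushrooms": 6.50, "Pink Lady Apples": 3.99, "Navel Oranges": 1.60}
PRICES_2012 = {"Zucchini": 5.99, "Mushrooms": 7.99, "Pink Lady Apples": 5.99, "Navel Oranges": 2.99}


def simple_price_index(current_price, base_price):
    """Index for a single item: current price as a percentage of the base price."""
    return current_price / base_price * 100


def simple_aggregate_index(current, base):
    """Sum of current prices as a percentage of the sum of base year prices."""
    return sum(current.values()) / sum(base.values()) * 100


oranges = simple_price_index(PRICES_2012["Navel Oranges"], PRICES_2000["Navel Oranges"])
aggregate = simple_aggregate_index(PRICES_2012, PRICES_2000)
print("navel oranges:", round(oranges, 3))
print("simple aggregate index:", round(aggregate, 2))
```

On these figures the sums are 22.96 and 16.08, so the basket cost about 43% more in 2012 than in 2000.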

    47

    deficiencies

    The main deficiency of a simple aggregate index is that it takes no account of quantities purchased.

    If prices are quoted in different units, e.g. per mushroom instead of per kilo, the index will be affected and give a different result.

    A large price for one item may dominate.

    48

    Weighted indices

    Weighted indices allow greater importance to be given to items for which greater quantities are sold or consumed.

    The Laspeyres index uses base period quantities \((q_0)\) as weights.

    It can be used to compare prices between other periods.

    \[ \text{Laspeyres index} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100 \]

  • 13

    49

    Example 2 - Laspeyres Index 2012

    Calculate the Laspeyres index for 2012.

    Item                  2000 p   2012 p   2000 q   2012 q
    Zucchini/kg           3.99     5.99     3.2      4.3
    Mushrooms/kg          6.50     7.99     1.2      1.5
    Pink Lady Apples/kg   3.99     5.99     5.2      5.6
    Navel Oranges/kg      1.60     2.99     6.2      7.0
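A possible working of Example 2 is sketched below. The tuple layout and the function name `laspeyres_index` are illustrative, and the figures are those in the table.

```python
# (item, 2000 price, 2012 price, 2000 quantity)
BASKET = [
    ("Zucchini", 3.99, 5.99, 3.2),
    ("Mushrooms", 6.50, 7.99, 1.2),
    ("Pink Lady Apples", 3.99, 5.99, 5.2),
    ("Navel Oranges", 1.60, 2.99, 6.2),
]


def laspeyres_index(basket):
    """Cost of the base period quantities at current prices, relative to base prices."""
    current_cost = sum(p_n * q_0 for _, _, p_n, q_0 in basket)
    base_cost = sum(p_0 * q_0 for _, p_0, _, q_0 in basket)
    return current_cost / base_cost * 100


laspeyres = laspeyres_index(BASKET)
print("Laspeyres index 2012:", round(laspeyres, 2))
```

The two sums are 78.442 and 51.236, giving an index of about 153.1.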

    50

    The Paasche index uses the current period quantities \((q_n)\) as weights, but has some practical problems such as obtaining quantity data for every period.

    \[ \text{Paasche index} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100 \]
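For comparison, the Paasche index can be computed for the same basket as Example 2, this time weighting by the 2012 quantities. This is an illustrative sketch: the slides do not work this calculation, and the function name is an assumption.

```python
# (item, 2000 price, 2012 price, 2012 quantity)
BASKET_CURRENT_Q = [
    ("Zucchini", 3.99, 5.99, 4.3),
    ("Mushrooms", 6.50, 7.99, 1.5),
    ("Pink Lady Apples", 3.99, 5.99, 5.6),
    ("Navel Oranges", 1.60, 2.99, 7.0),
]


def paasche_index(basket):
    """Cost of the current period quantities at current prices, relative to base prices."""
    current_cost = sum(p_n * q_n for _, _, p_n, q_n in basket)
    base_cost = sum(p_0 * q_n for _, p_0, _, q_n in basket)
    return current_cost / base_cost * 100


paasche = paasche_index(BASKET_CURRENT_Q)
print("Paasche index 2012:", round(paasche, 2))
```

The two sums are 92.216 and 60.451, giving an index of about 152.5.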

    51

    Notices

    Don't forget to complete CATEI evaluations for this course on myUNSW. They are carried out anonymously and will help us plan for future changes in the course.

    We will hand out the assignments which have been corrected so far during Week 12 tutorials. If there are any we have not managed to mark by Thursday, the rest will be handed out during the Week 13 tutorials or can be picked up from Judith's office after that date.

    Stuvac consultations

    Judith's consultation times up to the exam will be slightly different:
    Week 13 as normal - Tuesday 2-4, Thursday 4-5
    Tuesday June 9, 2-4
    Thursday June 11, 2-3
    Monday June 15, 2-4
    Thursday June 18, 2-3
    Tuesday June 23, 2-4
    Thursday June 25, 2-3
    and by appointment

    52

  • 14

    53

    Exam

    The exam will consist of two parts:

    Part A: 16 multiple choice questions on both maths and statistics, each worth 1 mark. Use a pencil to mark answers and your personal details.

    Part B: 3 written problems with two of the three based on the statistics section of the course. They are not all of equal marks so plan your time carefully.

    Total marks for the exam: 50

    Please bring an approved calculator, textbook, notes, pencil, pen, ruler, eraser. No tables will be supplied (you can use the textbook ones).

    And last of all: don't forget that the final Regression eLearning tutorial will run in week 13 to help with your revision.

    Thanks for your participation in COMM5005. We hope you have learned some useful skills and that your efforts are rewarded with good results.

    54