SADC Course in Statistics Predictions from the regression model (Session 09)

Mar 28, 2015


Alexandra Dolan

SADC Course in Statistics

Predictions from the regression model

(Session 09)


Learning Objectives

At the end of this session, you will be able to

• assess the regression model in terms of its appropriateness as a prediction equation

• distinguish between two types of possible predictions and why their precision differs

• use software to produce predictors and corresponding confidence intervals

• understand the dangers of predicting beyond the range of values used in developing the predictor


An Example

Consider a health survey of a random sample of 20 districts in which one variable of interest was the perinatal mortality rate, defined here as the number of deaths in the first 30 days after birth per 1000 live births.

In all districts, the number of health centres per 1000 households is known.

Suppose a health official wants to predict perinatal mortality in districts not covered by the survey.


The data (fictitious)

mort = perinatal mortality rate; hcentres = health centres per 1000 households

mort   hcentres      mort   hcentres
19.0   11.2          13.8   12.2
15.9   12.2          14.9   13.7
15.4   11.1          16.8   11.2
15.2   12.8          10.3   13.2
12.8   13.0          21.3   10.6
19.6   11.8          14.7   11.3
22.6    9.5          12.8   11.2
17.0   10.5          19.2    9.5
21.3    9.0          12.2   12.2
12.7   13.0          15.3   10.4


Results of a linear regression model

-----------------------------------------
     mort |   Coef.  Std.Err.      t  P>|t|
----------+------------------------------
 hcentres | -1.9139    .4071   -4.70  0.000
   const. |  38.111    4.703    8.10  0.000
-----------------------------------------

Adjusted R² = 52.6%

The equation of the line is:

mort = 38.11 - 1.914 (hcentres)
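
The fitted line can be checked directly from the fictitious data. The sketch below is a minimal illustration in plain Python (the slides themselves use Stata); the function name `fit_line` is an assumption made for this example. It applies the usual least-squares formulae, and the results should agree with the coefficients in the table up to rounding.

```python
# Fictitious survey data from the slides: 20 districts.
mort = [19.0, 15.9, 15.4, 15.2, 12.8, 19.6, 22.6, 17.0, 21.3, 12.7,
        13.8, 14.9, 16.8, 10.3, 21.3, 14.7, 12.8, 19.2, 12.2, 15.3]
hcentres = [11.2, 12.2, 11.1, 12.8, 13.0, 11.8, 9.5, 10.5, 9.0, 13.0,
            12.2, 13.7, 11.2, 13.2, 10.6, 11.3, 11.2, 9.5, 12.2, 10.4]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(hcentres, mort)
print(f"mort = {intercept:.3f} + ({slope:.4f}) * hcentres")
```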


Predictions from the model

There are two types of predictions possible:

1. What would be the average mortality rate in districts where there are only 9 health centres per 1000 households?

2. A particular district has 9 health centres per 1000 households. What would be the predicted mortality rate in this district?

The answer to both questions is the same point prediction:

mortality = 38.11 - 1.914 (9) = 20.88

But would their standard errors be the same?


Standard errors of the predictions

The standard error for (1), the predicted mean, call it “sepred”, can be obtained with statistical software.

The standard error for (2), the predicted individual value, is the square root of: (sepred)² + (residual M.S. in anova)

The residual mean square (RMS) is added to account for the variation of individual values about their mean.

Note: For the case where there is only 1 regressor variable, a simple formula does exist (see the reference formulae at the end of the session).
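
As an illustration, both standard errors can be computed by hand for the health-centre example at 9 health centres per 1000 households. This is only a sketch in plain Python, using the simple-regression formula for the standard error of a predicted mean; the function name `prediction_standard_errors` is an assumption made for this example, and statistical software would normally report these values directly.

```python
import math

# Fictitious survey data from the slides: 20 districts.
mort = [19.0, 15.9, 15.4, 15.2, 12.8, 19.6, 22.6, 17.0, 21.3, 12.7,
        13.8, 14.9, 16.8, 10.3, 21.3, 14.7, 12.8, 19.2, 12.2, 15.3]
hcentres = [11.2, 12.2, 11.1, 12.8, 13.0, 11.8, 9.5, 10.5, 9.0, 13.0,
            12.2, 13.7, 11.2, 13.2, 10.6, 11.3, 11.2, 9.5, 12.2, 10.4]


def prediction_standard_errors(x, y, x0):
    """Return (fitted value, s.e. of mean, s.e. of individual, RMS) at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual mean square: residual sum of squares on n - 2 d.f.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    rms = rss / (n - 2)
    # (1) standard error of the predicted mean ("sepred")
    sepred = math.sqrt(rms * (1 / n + (x0 - x_bar) ** 2 / sxx))
    # (2) standard error of a predicted individual value
    se_individual = math.sqrt(sepred ** 2 + rms)
    return intercept + slope * x0, sepred, se_individual, rms


fitted, sepred, se_individual, rms = prediction_standard_errors(hcentres, mort, 9.0)
print(f"prediction = {fitted:.2f}")
print(f"s.e. (mean) = {sepred:.3f}, s.e. (individual) = {se_individual:.3f}")
```

The point prediction is the same for both questions, but the second standard error is always the larger one because the residual mean square is added.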


Confidence intervals for predictions

The standard error can be used to determine (say) 95% confidence intervals for each type of prediction.

Because individual predictions have a larger standard error, the corresponding 95% C.I. will be wider for such predictions.

The graphs below show the difference in precision (Stata output)
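
The two intervals can also be sketched numerically for the health-centre example at 9 health centres per 1000 households. The code below is an illustration only, in plain Python. It assumes the usual t-based interval (prediction ± t × standard error) with 18 residual degrees of freedom, and the critical value 2.101 is typed in from standard t tables rather than computed.

```python
import math

# Fictitious survey data from the slides: 20 districts.
mort = [19.0, 15.9, 15.4, 15.2, 12.8, 19.6, 22.6, 17.0, 21.3, 12.7,
        13.8, 14.9, 16.8, 10.3, 21.3, 14.7, 12.8, 19.2, 12.2, 15.3]
hcentres = [11.2, 12.2, 11.1, 12.8, 13.0, 11.8, 9.5, 10.5, 9.0, 13.0,
            12.2, 13.7, 11.2, 13.2, 10.6, 11.3, 11.2, 9.5, 12.2, 10.4]

T_CRIT = 2.101  # assumed: upper 2.5% point of t with 18 d.f., from tables
x0 = 9.0        # health centres per 1000 households

n = len(hcentres)
x_bar = sum(hcentres) / n
y_bar = sum(mort) / n
sxx = sum((x - x_bar) ** 2 for x in hcentres)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hcentres, mort))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
rms = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(hcentres, mort)) / (n - 2)

fitted = intercept + slope * x0
se_mean = math.sqrt(rms * (1 / n + (x0 - x_bar) ** 2 / sxx))
se_individual = math.sqrt(se_mean ** 2 + rms)

ci_mean = (fitted - T_CRIT * se_mean, fitted + T_CRIT * se_mean)
ci_individual = (fitted - T_CRIT * se_individual,
                 fitted + T_CRIT * se_individual)

print(f"95% C.I. for the mean:        {ci_mean[0]:.1f} to {ci_mean[1]:.1f}")
print(f"95% C.I. for an individual:   {ci_individual[0]:.1f} to {ci_individual[1]:.1f}")
```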


[Graph (Stata output): confidence intervals for predicted mean values]


[Graph (Stata output): confidence intervals for predicted individual values]


Dangers of extrapolation

It is clear from the graphs above that the confidence bands widen as predictions are made further away from the mean of the x values.

This is also true of predictions made from the results of a multiple linear regression model.

Hence extrapolating to make predictions beyond the range of x values used in the model fitting can be dangerous, since such predictions would be very imprecise!
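
The widening can be seen numerically in the health-centre example. The sketch below, in plain Python, evaluates the standard error of the predicted mean at the mean of x, at the edge of the observed range, and at a few x values well outside it. The values 7, 5 and 3 are hypothetical, chosen only to illustrate extrapolation, and the function name `se_mean` is an assumption made for this example.

```python
import math

# Fictitious survey data from the slides: 20 districts.
mort = [19.0, 15.9, 15.4, 15.2, 12.8, 19.6, 22.6, 17.0, 21.3, 12.7,
        13.8, 14.9, 16.8, 10.3, 21.3, 14.7, 12.8, 19.2, 12.2, 15.3]
hcentres = [11.2, 12.2, 11.1, 12.8, 13.0, 11.8, 9.5, 10.5, 9.0, 13.0,
            12.2, 13.7, 11.2, 13.2, 10.6, 11.3, 11.2, 9.5, 12.2, 10.4]

n = len(hcentres)
x_bar = sum(hcentres) / n
y_bar = sum(mort) / n
sxx = sum((x - x_bar) ** 2 for x in hcentres)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hcentres, mort))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
rms = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(hcentres, mort)) / (n - 2)


def se_mean(x0):
    """Standard error of the predicted mean at x0 (simple regression)."""
    return math.sqrt(rms * (1 / n + (x0 - x_bar) ** 2 / sxx))


# x_bar and 9.0 lie within the observed range; 7, 5 and 3 are hypothetical
# values outside it, used only to show how the standard error grows.
for x0 in (x_bar, 9.0, 7.0, 5.0, 3.0):
    inside = min(hcentres) <= x0 <= max(hcentres)
    label = "within data" if inside else "extrapolation"
    print(f"x0 = {x0:5.2f}  s.e. of mean = {se_mean(x0):.2f}  ({label})")
```

The standard error is smallest at the mean of the x values and grows steadily as x0 moves away from it.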


A real example from Tanzania

One aim of the Adult Morbidity and Mortality Project (AMMP) in Tanzania was to investigate how health outcomes varied across different levels of poverty.

All households in several sentinel sites were monitored over many years…

But initially there was no measure of income poverty with which to address the above objective.

Question: Can a prediction model be developed to give a poverty measure using socio-economic variables?


Results for a model for households in Tabora region

------------------------------------------
   lnexpdf |    Coef.  Std.Err.       t  P>|t|
-----------+------------------------------
    hhsize |  -.22667   .01213   -18.69  0.000
   hhsize2 |   .00956   .00085    11.24  0.000
     water |   .11726   .03007     3.90  0.000
     qmeat |   .11932   .0100     11.93  0.000
     qmilk |   .03158   .00601     5.26  0.000
      iron |   .20931   .03164     6.62  0.000
     table |   .18927   .03559     5.32  0.000
    wheatf |   .29468   .03232     9.12  0.000
     seeds |   .18994   .05095     3.73  0.000
  num_meal |   .15502   .03265     4.75  0.000
    const. |   9.1540   .0930     98.41  0.000
------------------------------------------


Model equation

lnexpdf (y) = 9.154 - 0.227(hhsize) + 0.0096(hhsize)² + 0.117(water)
              + 0.119(qmeat) + 0.032(qmilk) + 0.209(iron) + 0.189(table)
              + 0.295(wheatf) + 0.190(seeds) + 0.155(num_meal)
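
To show how the equation is used, the sketch below applies the rounded coefficients to one household, in plain Python. The household values are hypothetical, and the coding of the variables (for example, whether water, iron and table are 0/1 indicators) is an assumption, since the slides do not define them. The function name `predict_lnexpdf` is also introduced only for this example.

```python
# Rounded coefficients from the model equation above.
COEFFICIENTS = {
    "hhsize": -0.227,
    "hhsize2": 0.0096,   # coefficient of hhsize squared
    "water": 0.117,
    "qmeat": 0.119,
    "qmilk": 0.032,
    "iron": 0.209,
    "table": 0.189,
    "wheatf": 0.295,
    "seeds": 0.190,
    "num_meal": 0.155,
}
CONSTANT = 9.154


def predict_lnexpdf(household):
    """Predicted lnexpdf for one household given its explanatory variables."""
    values = dict(household)
    # The squared term is derived from hhsize rather than supplied separately.
    values["hhsize2"] = values["hhsize"] ** 2
    return CONSTANT + sum(COEFFICIENTS[name] * values[name]
                          for name in COEFFICIENTS)


# Hypothetical household, for illustration only (variable coding assumed).
example_household = {
    "hhsize": 5, "water": 1, "qmeat": 2, "qmilk": 0, "iron": 1,
    "table": 0, "wheatf": 0, "seeds": 0, "num_meal": 3,
}
prediction = predict_lnexpdf(example_household)
print(f"predicted lnexpdf = {prediction:.3f}")
```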


How good is the prediction?

R² = 58%. A plot of predicted versus actual values also indicates how close the predictions are.


But a tabulation of terciles of actual versus predicted values is less promising!

 terciles  |  terciles of prediction  |
of lnexpdf |  Lowest  Middle  Highest |  Total
-----------+--------------------------+-------
    Lowest |     248      80       13 |    341
    Middle |      95     182       74 |    351
   Highest |       9      86      262 |    357
-----------+--------------------------+-------
     Total |     352     348      349 |   1049
-----------+--------------------------+-------

Percent correctly classified = [(248 + 182 + 262)/1049]*100 = 66%
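
The percentage can be reproduced from the table by summing its diagonal. The short Python sketch below does this; the count of households placed two terciles away from their actual tercile is an extra summary added here for illustration and does not appear in the original slides.

```python
# Rows: actual terciles of lnexpdf; columns: terciles of prediction
# (order: Lowest, Middle, Highest), copied from the table above.
terciles = [
    [248, 80, 13],
    [95, 182, 74],
    [9, 86, 262],
]

total = sum(sum(row) for row in terciles)
correct = sum(terciles[i][i] for i in range(3))
percent_correct = 100 * correct / total

# Households predicted in the opposite extreme tercile (extra summary).
off_by_two = terciles[0][2] + terciles[2][0]

print(f"correctly classified: {correct} of {total} = {percent_correct:.0f}%")
print(f"misclassified by two terciles: {off_by_two}")
```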


How many HHs are correctly classified as being below poverty line?

---------------+--------------------+-------
If really below| Below poverty line |
basic needs    |   on predictions   |
poverty line?  |     No       Yes   |  Total
---------------+--------------------+-------
            No |    838        42   |    880
           Yes |     99        70   |    169
---------------+--------------------+-------
         Total |    937       112   |  1,049
---------------+--------------------+-------

Percent correctly classified (as below or not below the line) = [(838 + 70)/1049]*100 = 86.6%
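
The overall figure combines both kinds of household, so it can be useful to look at the poor households separately. The Python sketch below recomputes the overall percentage from the table and adds two further ratios that are not in the original slides: the share of households really below the line that the prediction also places below it, and the share of households predicted below the line that really are below it.

```python
# Counts copied from the table above.
# Keys: (really below line?, predicted below line?)
counts = {
    ("no", "no"): 838,
    ("no", "yes"): 42,
    ("yes", "no"): 99,
    ("yes", "yes"): 70,
}

total = sum(counts.values())
really_below = counts[("yes", "no")] + counts[("yes", "yes")]
predicted_below = counts[("no", "yes")] + counts[("yes", "yes")]

# Overall agreement, as on the slide.
percent_correct = 100 * (counts[("no", "no")] + counts[("yes", "yes")]) / total
# Extra summaries (not in the original slides).
percent_poor_found = 100 * counts[("yes", "yes")] / really_below
percent_predicted_poor_right = 100 * counts[("yes", "yes")] / predicted_below

print(f"overall correctly classified: {percent_correct:.1f}%")
print(f"really below line and predicted below: {percent_poor_found:.1f}%")
print(f"predicted below line and really below: {percent_predicted_poor_right:.1f}%")
```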


Does the prediction address the original objective?

The answer is “yes”.

Although the predictions were not so good at the household level, health outcomes such as mortality rates, percent immunised, etc., were measured in AMMP at the community level.

The predictions, averaged over households in a community, were satisfactory; e.g. further analysis showed that the average percent error of prediction was under 6%.


Key Steps in model prediction…

• Select carefully a set of potential explanatory variables

• Decide the direction of influence expected (+ or – for regression coefficients)

• Use a suitable model selection approach to identify the subset of variables contributing significantly to explaining the variation in the key response (y)

• Conduct a residual analysis and take remedial action if problems occur


Key Steps in model prediction…

• Check whether the final model makes sense in terms of the explanatory variables included and the signs of their coefficients

• Consider whether there are other latent variables which may be non-measurable but influential! Conclusions should mention these!

• Examine how well the prediction performs, studying also the precision of the prediction

• Give the form of the model equation and corresponding conclusions


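Formulae for standard errors for a simple linear regression (for reference only)

If s² represents the Residual Mean Square in the anova, x_0 is the x value at which the prediction is made, and \bar{x} is the mean of the n observed x values, the std. error for the predicted mean =

\[ s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}} \]

Std. error for a predicted individual value =

\[ s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}} \]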


Practical work follows to ensure learning objectives are achieved…