Basics of Regression Analysis

Basics of Regression Analysis. Determination of three performance measures: estimation of the effect of each factor, explanation of the variability, and forecasting.

Dec 31, 2015



Determination of three performance measures

• Estimation of the effect of each factor

• Explanation of the variability

• Forecasting Error


Two Predictor Variables

Population Regression Model:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$, with $e$ following $N(0, \sigma)$

Unknown parameters: $\beta_0, \beta_1, \beta_2$; $\sigma$


From Data to Estimates of Coefficients

Principle: Least Squares

↓ Mathematics

Normal Equation System

↓ Computing Algorithm

Estimates of Coefficients


Least Squares Method

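[Figure: data points (*) around a fitted line for simple regression (axes x, y) and around a fitted plane for multiple regression (axes x1, x2, y), with residuals labeled e.]

Simple regression: $\hat{Y} = b_0 + b_1 X$

Multiple regression: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$

Minimize $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$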
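A minimal sketch of the least-squares principle, assuming NumPy and a small made-up data set (the numbers are illustrative, not from the slides): the objective is the residual sum of squares, and nudging the least-squares coefficients in any direction makes that sum larger.

```python
import numpy as np

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])

# Design matrix: a column of ones for b0, then X1 and X2.
X = np.column_stack([np.ones_like(x1), x1, x2])


def sse(b):
    """Sum of squared residuals, sum_i (Y_i - Yhat_i)^2, for coefficients b = (b0, b1, b2)."""
    residuals = y - X @ b
    return float(residuals @ residuals)


# np.linalg.lstsq returns the coefficient vector that minimizes sse(b).
b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Moving away from the least-squares solution in any direction increases the SSE.
perturbations = [np.array([0.1, 0.0, 0.0]), np.array([0.0, -0.05, 0.0]), np.array([0.0, 0.0, 0.05])]
increases = [sse(b_ls + d) - sse(b_ls) for d in perturbations]
print("b =", b_ls, "SSE =", sse(b_ls))
```

Because X has full column rank, the SSE is strictly convex in b, so the minimizer is unique.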


Matrix Computation for b

• Normal Equation System: $(X^T X)\, b = X^T Y$ (see Text Appendix D.3)

• Solution for b: $b = (X^T X)^{-1} (X^T Y)$
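The matrix computation can be sketched with NumPy on the same kind of made-up data (an assumption for illustration). Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; the inverse form is computed only to show that it agrees.

```python
import numpy as np

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])   # columns: 1, X1, X2

XtX = X.T @ X          # 3 x 3
XtY = X.T @ y          # 3-vector

# Solve the normal equation system (X^T X) b = X^T Y.
b = np.linalg.solve(XtX, XtY)

# The textbook form b = (X^T X)^{-1} (X^T Y) gives the same numbers for this well-conditioned X.
b_via_inverse = np.linalg.inv(XtX) @ XtY

# At the solution the residuals are orthogonal to every column of X: X^T (Y - X b) = 0.
normal_eq_residual = X.T @ (y - X @ b)
print(b)
```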


Standardized Regression Coefficients, $b_k'$

• Definition: $b_k' = \dfrac{s_{X_k}}{s_Y}\, b_k$ for $k = 1, 2$

– $b_0' = 0$

– Also called the beta coefficient

• Used to show the relative weights of the predictors.
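A small sketch of the definition, assuming NumPy, sample standard deviations (`ddof=1`), and made-up data. As a cross-check it regresses the z-scores of Y on the z-scores of X1 and X2: the fitted intercept comes out as 0 and the slopes equal $b_k'$.

```python
import numpy as np

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)

# b_k' = (s_Xk / s_Y) * b_k for k = 1, 2 (sample standard deviations, ddof=1).
s_y = y.std(ddof=1)
s_x = np.array([x1.std(ddof=1), x2.std(ddof=1)])
b_std = b[1:] * s_x / s_y


def zscore(v):
    """Center v and divide by its sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)


# Cross-check: regress z-scores of Y on z-scores of X1, X2 (with an intercept column).
Z = np.column_stack([np.ones_like(x1), zscore(x1), zscore(x2)])
b_z = np.linalg.solve(Z.T @ Z, Z.T @ zscore(y))
print("standardized:", b_std, "intercept of z-fit:", b_z[0])
```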


Estimation of $s_e$: Standard Deviation of Disturbance e

• Forecasting equation: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$

• SS of residuals: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

• Mean SS: $MSE = s_e^2 = \dfrac{SSE}{n - 3}$
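The three steps (forecasting equation, SS of residuals, mean SS) can be sketched in NumPy, again on made-up data. The divisor n − 3 reflects the three estimated coefficients $b_0, b_1, b_2$.

```python
import numpy as np

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ b                          # forecasting equation: Yhat = b0 + b1 X1 + b2 X2
n = len(y)
SSE = float(np.sum((y - y_hat) ** 2))  # SS of residuals
MSE = SSE / (n - 3)                    # mean SS with n - 3 degrees of freedom
s_e = np.sqrt(MSE)                     # estimate of the standard deviation of e
print(f"SSE={SSE:.4f}  MSE={MSE:.4f}  s_e={s_e:.4f}")
```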


Standard Error of Coefficients

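• The variance matrix of b (b is $(K+1) \times 1$, so its variance matrix is $(K+1) \times (K+1)$) is

$\operatorname{Var}(b) = s_e^2 (X^T X)^{-1}$

• $s_{b_k} = s_e \sqrt{c_{kk}}$, where $c_{kk}$ is the diagonal element of $(X^T X)^{-1}$ that corresponds to $b_k$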
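A NumPy sketch of the variance matrix and the coefficient standard errors, assuming made-up data with K = 2 predictors (so the matrix is 3 × 3).

```python
import numpy as np

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
n, p = X.shape                         # p = K + 1 = 3 coefficients
s_e = np.sqrt(resid @ resid / (n - p))

cov_b = s_e ** 2 * XtX_inv             # Var(b) = s_e^2 (X^T X)^{-1}, a 3 x 3 matrix
se_b = np.sqrt(np.diag(cov_b))         # s_{b_k} = s_e * sqrt(c_kk)
print("b =", b)
print("standard errors =", se_b)
```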


The Variability Explained

• First, determine the base variability to be explained by the regression.

Unconditional mean model: $Y = \mu_y + e$, with $e$ following $N(0, \sigma_y)$

LS fit of the model: Pred_Y = $\bar{Y}$

SS of residuals: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

MSS (DF = n − 1): $S_y^2 = \dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$


The Variability Explained – cont.

• Second, subtract the variability still left after fitting the regression.

• In SS: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ and $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

• In variance: $S_y^2 = \dfrac{SST}{n - 1}$ and $S_e^2 = \dfrac{SSE}{n - 3}$


Creating ANOVA Table

Regression model | Unexplained variability in SS | DF | Unexplained variability in variance

Unconditional | SST | n − 1 | $S_y^2$ = MST

Conditional | SSE | n − 3 | $S_e^2$ = MSE

Variability explained | SSR = SST − SSE | 2 | $S_y^2 - S_e^2$

Proportion explained | $R^2 = 1 - \dfrac{SSE}{SST}$ | | Adjusted $R^2 = 1 - \dfrac{S_e^2}{S_y^2}$
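The ANOVA quantities can be sketched in NumPy on made-up data (an assumption for illustration). The check against the squared correlation of Y and $\hat{Y}$ holds because the model includes an intercept.

```python
import numpy as np

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
n = len(y)

SST = float(np.sum((y - y.mean()) ** 2))   # unconditional model, DF n - 1
SSE = float(np.sum((y - y_hat) ** 2))      # conditional (regression) model, DF n - 3
SSR = SST - SSE                            # variability explained, DF 2
S2_y = SST / (n - 1)                       # MST
S2_e = SSE / (n - 3)                       # MSE
R2 = 1 - SSE / SST
adj_R2 = 1 - S2_e / S2_y

print(f"{'Unconditional':<14} SS={SST:9.4f} DF={n - 1}  MS={S2_y:.4f}")
print(f"{'Conditional':<14} SS={SSE:9.4f} DF={n - 3}  MS={S2_e:.4f}")
print(f"{'Explained':<14} SS={SSR:9.4f} DF=2")
print(f"R2={R2:.4f}  adjusted R2={adj_R2:.4f}")
```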


Test of Significance

• F test of significance

• t-test of significance

– Two-sided alternative

– One-sided alternative


F-Test of Significance of the Variability Explained by the Regression

H0: $\beta_1 = \beta_2 = 0$

Ha: At least one coefficient is not 0

$F\text{-stat} = \dfrac{(SST - SSE)/2}{SSE/(n-3)} = \dfrac{MSR}{MSE} = \dfrac{R^2/2}{(1 - R^2)/(n-3)}$

P-value of F-stat = P{F(2, n − 3) > F-stat}
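A sketch of the F test, assuming NumPy, SciPy (`scipy.stats.f.sf` gives the upper-tail probability), and made-up data. It also confirms that the SS form and the $R^2$ form of the statistic agree.

```python
import numpy as np
from scipy import stats

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
n = len(y)

SST = float(np.sum((y - y.mean()) ** 2))
SSE = float(np.sum((y - X @ b) ** 2))
R2 = 1 - SSE / SST

MSR = (SST - SSE) / 2                  # numerator DF = 2 predictors
MSE = SSE / (n - 3)                    # denominator DF = n - 3
F_stat = MSR / MSE
F_from_R2 = (R2 / 2) / ((1 - R2) / (n - 3))
p_value = stats.f.sf(F_stat, 2, n - 3)   # P{F(2, n-3) > F-stat}
print(f"F = {F_stat:.2f}, p-value = {p_value:.3g}")
```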


t-Test of Significance of a Variable, X1 (Two-Sided)

H0: $\beta_1 = 0$

Ha: $\beta_1 \neq 0$

t-stat of X1 = $\dfrac{b_1}{s_{b_1}}$

P-value of t-stat = P{|t(n − 3)| > |t-stat|} = 2 P{t(n − 3) > |t-stat|}
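A two-sided t-test for $b_1$, sketched with NumPy and SciPy on made-up data. The factor 2 accounts for both tails of the t distribution with n − 3 DF.

```python
import numpy as np
from scipy import stats

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
n = len(y)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
s_e = np.sqrt(resid @ resid / (n - 3))
s_b = s_e * np.sqrt(np.diag(XtX_inv))      # standard errors of b0, b1, b2

t_stat = b[1] / s_b[1]                                # t-stat of X1
p_two_sided = 2 * stats.t.sf(abs(t_stat), n - 3)      # P{|t(n-3)| > |t-stat|}
print(f"t = {t_stat:.3f}, two-sided p-value = {p_two_sided:.3g}")
```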


One-Sided Test of Significance of a Variable, X1

H0: $\beta_1 = 0$

Ha: $\beta_1 > 0$ (using prior knowledge)

t-stat of X1 = $\dfrac{b_1}{s_{b_1}}$

p-value of t-stat = P{t(n − 3) > t-stat}
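The one-sided version only changes the tail probability, as in this NumPy/SciPy sketch on made-up data. When the t-stat is positive, the one-sided p-value is half the two-sided one.

```python
import numpy as np
from scipy import stats

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
n = len(y)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
s_e = np.sqrt(resid @ resid / (n - 3))
s_b = s_e * np.sqrt(np.diag(XtX_inv))

t_stat = b[1] / s_b[1]
p_one_sided = stats.t.sf(t_stat, n - 3)           # P{t(n-3) > t-stat}, for Ha: beta1 > 0
p_two_sided = 2 * stats.t.sf(abs(t_stat), n - 3)  # for comparison
print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.3g}, two-sided p = {p_two_sided:.3g}")
```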


Forecasting

• Point forecasting

• Sources of forecasting error

• Interval forecasting


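Forecasting at $x_m$

Data of X for regression: $X = \begin{pmatrix} 1 & X_{11} & X_{12} \\ \vdots & \vdots & \vdots \\ 1 & X_{n1} & X_{n2} \end{pmatrix}$

Value of X for prediction: $x_m = \begin{pmatrix} 1 \\ x_{1m} \\ x_{2m} \end{pmatrix}$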


Sources of Forecasting Error

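• Data: $Y|x_m = \beta_0 + \beta_1 x_{1m} + \beta_2 x_{2m} + e_m$

• Forecast: $\hat{Y}|x_m = b_0 + b_1 x_{1m} + b_2 x_{2m}$

• Forecast error: $Y|x_m - \hat{Y}|x_m = \left[(\beta_0 - b_0) + (\beta_1 - b_1) x_{1m} + (\beta_2 - b_2) x_{2m}\right] + e_m$

The bracketed term comes from estimating the coefficients (variance $\sigma_m^2$); the disturbance $e_m$ contributes variance $\sigma_e^2$.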


Computing Standard Errors

$s_m = s_e \sqrt{x_m^T (X^T X)^{-1} x_m}$

$s_p = \sqrt{s_e^2 + s_m^2}$
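A sketch of point and interval forecasting at a new point, assuming NumPy, SciPy, made-up data, and a hypothetical $x_m$. The 95% level and the t-based interval $\hat{Y} \pm t_{0.975}(n-3)\, s_p$ are the conventional choice, assumed here because the slides do not spell out the interval formula.

```python
import numpy as np
from scipy import stats

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
n = len(y)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
s_e = np.sqrt(resid @ resid / (n - 3))

x_m = np.array([1.0, 4.5, 5.0])                 # hypothetical point: (1, x_1m, x_2m)
y_hat_m = float(x_m @ b)                        # point forecast
s_m = s_e * np.sqrt(x_m @ XtX_inv @ x_m)        # error from estimating the coefficients
s_p = np.sqrt(s_e ** 2 + s_m ** 2)              # adds the disturbance e_m

# Assumed 95% interval forecast using t with n - 3 DF.
t_crit = stats.t.ppf(0.975, n - 3)
lower, upper = y_hat_m - t_crit * s_p, y_hat_m + t_crit * s_p
print(f"forecast {y_hat_m:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```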


Forecasting Performance Analysis

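• $R^2_{pred} = 1 - PRESS / SST$

PRESS = SS of $\{y_i - \hat{y}_{i(i)}\}$ (deleted residuals), where $\hat{y}_{i(i)}$ is the prediction of $y_i$ from the regression fitted without observation i

• Sample splitting

– Analysis sample ($n_1$)

– Validation sample ($n_2$)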
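A NumPy sketch of PRESS and $R^2_{pred}$ on made-up data. It uses the identity $y_i - \hat{y}_{i(i)} = e_i / (1 - h_{ii})$, with $h_{ii}$ the leverage, and checks it against an explicit leave-one-out refit.

```python
import numpy as np

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
n = len(y)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
h = np.diag(X @ XtX_inv @ X.T)             # leverages h_ii
resid = y - X @ b
deleted_resid = resid / (1 - h)            # y_i - yhat_i(i), without refitting
PRESS = float(np.sum(deleted_resid ** 2))
SST = float(np.sum((y - y.mean()) ** 2))
R2_pred = 1 - PRESS / SST

# Brute-force check: refit n times, each time leaving observation i out.
press_brute = 0.0
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    press_brute += float((y[i] - X[i] @ b_i) ** 2)

print(f"PRESS={PRESS:.4f}  R2_pred={R2_pred:.4f}")
```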


Generalization to K Independent Variables

• For t, use n − K − 1 in place of n − 3 as the DF.

• For F, use K for the numerator DF and n − K − 1 for the denominator DF.


Diagnostics

• Assumptions for Disturbance

• Multi-collinearity

• Outliers and Influential Observations


Problematic Data Conditions

• Regression Coefficients Are Sensitive to:

– Highly collinear independent variables

– Contamination by outliers and influential observations


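Detecting Outliers and Influential Data

• Outliers

– Leverage (X-space): distance from the mean of the X's

– Tresid (Y-space): studentized deleted residual, a scaled forecasting error for the observation when it is left out of the fit

• Influential data (idea: compare the fit with and without the observation)

– Cook's D

– DFBETAS

– DFFITS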
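A NumPy sketch of these diagnostics on made-up data, using the standard closed forms (leverage from the hat matrix, studentized deleted residual, Cook's D, DFFITS). DFBETAS is computed by an explicit with/without refit, which also serves to check the closed forms. No cut-off values are applied, since the slides do not give any.

```python
import numpy as np

# Made-up illustrative data (n = 8); not from the original slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 16.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape                                  # p = 3 coefficients

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = float(e @ e) / (n - p)                     # s_e^2 = MSE

# Outliers
h = np.diag(X @ XtX_inv @ X.T)                  # leverage (X-space distance)
s2_del = ((n - p) * s2 - e ** 2 / (1 - h)) / (n - p - 1)   # s_(i)^2 with case i deleted
t_resid = e / np.sqrt(s2_del * (1 - h))         # studentized deleted residual

# Influence
r = e / np.sqrt(s2 * (1 - h))                   # internally studentized residual
cooks_d = r ** 2 * h / (p * (1 - h))
dffits = t_resid * np.sqrt(h / (1 - h))

# With/without comparison done explicitly: checks the closed forms and gives DFBETAS.
cooks_brute = np.empty(n)
s2_brute = np.empty(n)
dfbetas = np.empty((n, p))
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    s2_brute[i] = np.sum((y[keep] - X[keep] @ b_i) ** 2) / (n - 1 - p)
    cooks_brute[i] = np.sum((X @ b - X @ b_i) ** 2) / (p * s2)
    dfbetas[i] = (b - b_i) / np.sqrt(s2_del[i] * np.diag(XtX_inv))

print(np.column_stack([h, t_resid, cooks_d, dffits]).round(3))
```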


Modeling Techniques

• Transformation of variables

– Log

– Others

• Using dummy variables

– Symbolic representation

– Dummy variables for qualitative variables

• Using scores for ordinal variables

• Selection of independent variables

– Forecasting

– Computer intensive

– Analysis of correlation structure of independent variables


Dummy Variables

• $D_k$ = IF(X = k, 1, 0), i.e., 1 if the observation is in category k and 0 otherwise

• Can be used for nominal and also ordinal variables

• Number of dummies $D_k$ = c − 1, where c is the number of categories
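A NumPy sketch of the IF(X = k, 1, 0) construction, with hypothetical category names (`north`, `south`, `east`) and `north` chosen as the baseline. It also shows why only c − 1 dummies are used: with all c dummies plus the intercept column, the dummies sum to the intercept column and $X^T X$ becomes singular.

```python
import numpy as np

# Hypothetical qualitative variable with c = 3 categories.
region = np.array(["north", "south", "east", "south", "north", "east", "east"])
categories = ["north", "south", "east"]         # "north" is the baseline (no dummy)


def dummies(values, cats):
    """One 0/1 column D_k = IF(X = k, 1, 0) for every category except the first."""
    return np.column_stack([(values == k).astype(float) for k in cats[1:]])


D = dummies(region, categories)                 # shape (7, 2): c - 1 columns
ones = np.ones((len(region), 1))
X_ok = np.hstack([ones, D])                     # intercept + c - 1 dummies: full rank

all_c = np.column_stack([(region == k).astype(float) for k in categories])
X_bad = np.hstack([ones, all_c])                # intercept + c dummies: rank deficient

print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])    # 3 3
print(np.linalg.matrix_rank(X_bad), X_bad.shape[1])  # 3 4
```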


Using Scores for Ordinal Variables

• Scoring systems

– 1, 2, 3, …, c

– −2, −1, 0, 1, 2 (c odd; shown for c = 5)


Implications of Variable Selection

Purposes of Regression | Missing Essential Predictors | Including Non-essential Predictors

Prediction of the dependent variable | Increase in the mean squared error of the prediction | Increase in the mean squared error of the prediction

Estimation of the effect of the predictors | Bias in the estimates | Increase in the standard errors of the coefficients


Selection of Variables - 1

• Backward elimination: All X's → (t-test) → Final Regression

• Stepwise (forward) inclusion: Best simple → (max increase in R²) → Best two variables → (max increase in R²) → Best … variables
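A sketch of backward elimination driven by the t-test, assuming NumPy, SciPy, simulated data (x3 is pure noise by construction), and a hypothetical significance level α = 0.05. At each step the predictor with the largest two-sided p-value is dropped if that p-value exceeds α, and the model is refit.

```python
import numpy as np
from scipy import stats

# Simulated data: y depends on x1 and x2; x3 is unrelated noise.
rng = np.random.default_rng(0)
n = 30
predictors = {"x1": rng.normal(size=n), "x2": rng.normal(size=n), "x3": rng.normal(size=n)}
y = 1.0 + 2.0 * predictors["x1"] - 1.5 * predictors["x2"] + rng.normal(scale=0.5, size=n)


def t_pvalues(names):
    """Two-sided t-test p-values for each named predictor in the fitted model."""
    X = np.column_stack([np.ones(n)] + [predictors[k] for k in names])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    df = n - X.shape[1]                         # n - K - 1
    s_e = np.sqrt(resid @ resid / df)
    t = b / (s_e * np.sqrt(np.diag(XtX_inv)))
    p = 2 * stats.t.sf(np.abs(t), df)
    return dict(zip(names, p[1:]))              # skip the intercept


def backward_elimination(names, alpha=0.05):
    names = list(names)
    while names:
        pvals = t_pvalues(names)
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= alpha:
            return names, pvals                 # every remaining predictor is significant
        names.remove(worst)                     # drop the least significant one and refit
    return names, {}


final, final_p = backward_elimination(predictors)
print(final, final_p)
```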


Selection of Variables - 2

• All Possible Regressions

K independent variables → K simple regressions, K(K − 1)/2 two-variable regressions, …, 1 K-variable regression ($2^K - 1$ in total) → Final Regression
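A sketch of all possible regressions, assuming NumPy, `itertools.combinations`, and simulated data with K = 3 candidate predictors. Each of the $2^K - 1$ subsets is fitted and ranked here by adjusted $R^2$, one of the selection criteria listed in this deck; the choice of criterion is an assumption for illustration.

```python
from itertools import combinations

import numpy as np

# Simulated data: y depends on x1 and x2; x3 is unrelated noise.
rng = np.random.default_rng(1)
n = 25
data = {"x1": rng.normal(size=n), "x2": rng.normal(size=n), "x3": rng.normal(size=n)}
y = 0.5 + 1.0 * data["x1"] + 0.8 * data["x2"] + rng.normal(scale=0.7, size=n)
SST = float(np.sum((y - y.mean()) ** 2))


def adjusted_r2(names):
    """Adjusted R^2 = 1 - S_e^2 / S_y^2 for y on an intercept plus the named predictors."""
    X = np.column_stack([np.ones(n)] + [data[k] for k in names])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ b) ** 2))
    p = X.shape[1]                              # number of coefficients, including b0
    return 1 - (sse / (n - p)) / (SST / (n - 1))


K = len(data)
results = {}
for size in range(1, K + 1):                    # 1-variable, 2-variable, ..., K-variable models
    for subset in combinations(data, size):
        results[subset] = adjusted_r2(subset)

best = max(results, key=results.get)
for subset, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(subset, round(score, 4))
print("highest adjusted R^2:", best)
```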


Selection Criteria

• $R^2$

• Adj. $R^2$

• $R^2_{pred}$

• $S_e$

• $C_p$


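$C_p$ (p = # of coefficients)

Select a combination with $C_p$ close to p.

$C_p = \dfrac{SSE_p}{MSE_F} - (n - 2p) = \dfrac{(n - p)\, MSE_p}{MSE_F} - (n - 2p) = p + \dfrac{(n - p)(MSE_p - MSE_F)}{MSE_F}$

where subscript p refers to the candidate model with p coefficients and F to the full model.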
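A sketch of $C_p$ for every subset of K = 3 simulated predictors, assuming NumPy and `itertools.combinations`; p counts the intercept. For the full model $C_p$ equals p exactly, since $SSE_F / MSE_F = n - p$.

```python
from itertools import combinations

import numpy as np

# Simulated data: y depends on x1 and x2; x3 is unrelated noise.
rng = np.random.default_rng(1)
n = 25
data = {"x1": rng.normal(size=n), "x2": rng.normal(size=n), "x3": rng.normal(size=n)}
y = 0.5 + 1.0 * data["x1"] + 0.8 * data["x2"] + rng.normal(scale=0.7, size=n)


def sse_and_p(names):
    """SSE of y on an intercept plus the named predictors, and p = # of coefficients."""
    X = np.column_stack([np.ones(n)] + [data[k] for k in names])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2)), X.shape[1]


full = tuple(data)
sse_full, p_full = sse_and_p(full)
mse_full = sse_full / (n - p_full)              # MSE_F of the full model

cp = {}
for size in range(1, len(full) + 1):
    for subset in combinations(full, size):
        sse_p, p = sse_and_p(subset)
        cp[subset] = (sse_p / mse_full - (n - 2 * p), p)   # (C_p, p)

# List subsets with C_p closest to p first.
for subset, (c, p) in sorted(cp.items(), key=lambda kv: abs(kv[1][0] - kv[1][1])):
    print(subset, "p =", p, "Cp =", round(c, 2))
```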


What to Look for in a Good Regression?

• Remember the three functions of regression:

– Estimation of the effect of each X

– Explaining the variability of Y

– Forecasting

• Population regressions are assumptions and need testing.

• Data might be contaminated.


Extensions

For Other Variable Types of Y


Types of Variable

Variable

• Quantitative

– Continuous

– Discrete (counting)

• Qualitative

– Ordinal

– Nominal


Generalized Linear Models (GLM)

• Regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$, with $e$ following $N(0, \sigma)$

• GLM formulation:

1. Model for Y: Y is $N(\mu, \sigma)$

2. Model for predictors (link function): $\mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$


Forecasting Counting Data

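• Model for Y: Poisson distribution ($\lambda$)

$P(Y_i = y_i) = \dfrac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}$

• Link function:

$\log \lambda_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_K X_{Ki}$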
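A minimal sketch of fitting this Poisson model with the log link by iteratively reweighted least squares (IRLS), assuming NumPy and made-up count data with one predictor. IRLS is the usual GLM fitting method but is an assumption here, as is starting from $\mu = y + 0.5$. At convergence the score equations $X^T (y - \lambda) = 0$ hold.

```python
import math

import numpy as np

# Made-up count data: y_i observed at predictor value x_i.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 6.0, 9.0])
X = np.column_stack([np.ones_like(x), x])       # columns: 1, X1


def poisson_pmf(k, lam):
    """P(Y = k) = exp(-lam) * lam**k / k! for a Poisson(lam) variable."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def fit_poisson_log_link(X, y, max_iter=50, tol=1e-10):
    """Fit log(lambda_i) = x_i^T beta by IRLS."""
    mu = y + 0.5                                # start from the data (avoids log(0))
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu                 # working response
        w = mu                                  # working weights for the Poisson log link
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        step = np.max(np.abs(beta_new - beta))
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)                        # inverse link: lambda_i = exp(eta_i)
        if step < tol:
            break
    return beta, mu


beta, lam = fit_poisson_log_link(X, y)
score = X.T @ (y - lam)                         # should be ~0 at the maximum likelihood fit
print("beta =", beta)
print("P(Y = 3 | x = 4) =", poisson_pmf(3, lam[4]))
```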