Course: Statistics Unit 9 Regression Analysis
Unit 9 Regression SLM

Apr 08, 2015

munmun8327
Page 1: Unit 9 Regression SLM

Course: Statistics

Unit 9

Regression Analysis

Table of Contents

9.1. Learning Objectives
9.2. Introduction
9.3. Regression Analysis
9.4. Regression Lines
9.5. Regression Coefficient
9.6. Differences between Correlation Coefficient and Regression Coefficient
9.7. Examples
9.8. Standard Error of Estimate
9.9. Application in Finance
    9.9.1. Correlation between Two Variables
    9.9.2. Beta (β) of a Stock/Share
9.10. Non-Linear Regression
9.11. Logistic Regression
9.12. Summary

9.1. Learning Objectives

By the end of this unit, you should be able to:

Recognise the need for regression analysis

Apply the regression equations to calculate the correlation coefficient

Calculate the regression equations for a correlation study

Calculate the standard error of the estimate

9.2. Introduction

The word "regress" refers to the tendency of data to move back towards the normal (average) value.

Correlation analysis studies the relationship between two variables x and y. Regression analysis goes a step further: it attempts to predict the average value of one variable for a given value of the other, and so quantifies the dependence of one variable on the other.

A Case

Mr. Ajit is the G.M. of a tyre manufacturing company. He is very happy that the sales of tyres are increasing, and he is of the opinion that the increase is due to the sales force. However, his secretary, Ms. Anitha, points out that the performance record sent by the Marketing Manager does not show any change. Mr. Ajit is curious. While talking to his friend's son, Mr. Suresh, who holds a position in the Motor Vehicle Registration office, he learns that the registration of vehicles is increasing. Mr. Ajit immediately thinks of his statistician, Mr. Satish, and consults him. Mr. Satish promises to come back with a solution to the problem.

(Cont. in topic ‘Differences between Correlation Coefficient and Regression Coefficient’)

There are two variables x and y, and y depends on x. In regression one of the variables is dependent and the others are independent. The dependence is expressed in the form of the following equation, in which a is the intercept and b is the slope of the line.

Y = a + bx

Regression is defined as, “the measure of the average relationship between two or more

variables in terms of the original units of the data.”

9.3. Regression Analysis

Regression Analysis is used to:

Estimate the values of the dependent variables from the values of the independent variables

Get a measure of the error involved while using the regression line as a basis for estimation

The regression coefficients can be used to calculate the correlation coefficient, since their product equals the square of the correlation between the two variables. Regression provides a mathematical relationship between two or more variables and is based on a cause and effect relationship.

9.4. Regression Lines

For a set of paired observations there exist two straight lines of regression: the line of y on x and the line of x on y.

The smaller the angle between these lines, the higher the correlation between the variables. If we fit a straight line to the data of a scatter diagram, some of the points will lie above the line and some below it. The deviation of each point from the line is called the error.

A regression line chosen so that the sum of the squares of these errors is a minimum is said to be fitted by the method of least squares. The slopes of the two lines, ‘byx’ and ‘bxy’, are called the regression coefficients.

The regression model captures the systematic behaviour of data. The non-systematic behaviour of data

cannot be captured and are known as errors. The errors are due to random components that cannot be

predicted. Assuming that the random errors are “Normally distributed” we can construct confidence level

and interval for random errors.

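The regression lines always intersect at the point of means $(\bar{x}, \bar{y})$. Their equations are as follows.

The regression equation of y on x (the simple linear regression model of y on x) is given by

$$Y - \bar{y} = b_{yx}\,(X - \bar{x})$$

The regression equation of x on y (the simple linear regression model of x on y) is given by

$$X - \bar{x} = b_{xy}\,(Y - \bar{y})$$

Where, with $dx$ and $dy$ denoting the deviations of x and y from their assumed means and $N$ the number of pairs,

$$b_{yx} = \frac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dx^2 - \left(\sum dx\right)^2} \quad\text{and}\quad b_{xy} = \frac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dy^2 - \left(\sum dy\right)^2}$$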

The line drawn such that sum of vertical deviation is zero and sum of their squares is minimum

is called Regression line of y on x. It is used to estimate y – values for given x – values.

The line drawn such that sum of horizontal deviation is zero and sum of their squares is

minimum is called Regression line of x on y. It is used to estimate x - values for given y -

values.

9.5. Regression Coefficient

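The regression coefficients $b_{yx}$ and $b_{xy}$ are the slopes of the regression lines of y on x and of x on y. They can be used to calculate the correlation coefficient, because their product is the square of the correlation between the two variables:

$$b_{yx} \cdot b_{xy} = r^2, \qquad r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$

$$b_{yx} \cdot b_{xy} \le 1$$

Both coefficients carry the same sign, which is also the sign of r: if $b_{yx}$ is negative, then $b_{xy}$ is also negative and r is negative.

They can also be expressed as

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

The regression coefficient is an absolute measure.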

9.6. Differences between Correlation Coefficient and Regression Coefficient

Table 9.1

| Correlation Coefficient | Regression Coefficient |
|---|---|
| $r_{xy} = r_{yx}$ | $b_{yx} \ne b_{xy}$ in general |
| $-1 \le r \le 1$ | $b_{yx}$ can be greater than one, but then $b_{xy}$ must be less than one, so that $b_{yx} \cdot b_{xy} \le 1$ |
| It has no units attached to it | It has units attached to it |
| Nonsense correlation can exist | There is no such nonsense regression |
| It is not based on a cause and effect relationship | It is based on a cause and effect relationship |
| It indirectly helps in estimation | It is meant for estimation |

(Cont. from topic ‘A Case’)

Mr. Satish collects data on the number of vehicles registered and the number of tyres sold as follows:

Table 9.2

| Number of vehicles registered in week (X) | 23 | 29 | 29 | 35 | 42 | 46 | 50 | 54 | 64 | 66 | 76 | 78 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of tyres sold per week (Y) | 69 | 96 | 102 | 118 | 125 | 126 | 138 | 178 | 156 | 184 | 176 | 225 |

He worked out the regression equation of sales on number of vehicles registered as follows:

Table 9.3

| X | Y | X² | XY | $\hat{Y}$ | $(Y - \hat{Y})^2$ |
|---|---|---|---|---|---|
| 23 | 69 | 529 | 1587 | 82.432 | 180.4305 |
| 29 | 96 | 841 | 2784 | 95.7959 | 0.0416 |
| 29 | 102 | 841 | 2958 | 95.7959 | 38.4904 |
| 35 | 118 | 1225 | 4130 | 109.1594 | 78.1557 |
| 42 | 125 | 1764 | 5250 | 124.7502 | 0.0624 |
| 46 | 126 | 2116 | 5796 | 133.6592 | 58.6629 |
| 50 | 138 | 2500 | 6900 | 142.5682 | 20.8681 |
| 54 | 178 | 2916 | 9612 | 151.4772 | 703.4609 |
| 64 | 156 | 4096 | 9984 | 173.7497 | 315.0502 |
| 66 | 184 | 4356 | 12144 | 178.2042 | 33.5918 |
| 76 | 176 | 5776 | 13376 | 200.4766 | 599.1060 |
| 78 | 225 | 6084 | 17550 | 204.9311 | 402.7592 |
| Total 592 | 1693 | 33044 | 92071 | | 2430.68 |

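$$b_{yx} = \frac{712.472}{319.889} = 2.2272$$

where 712.472 is the covariance of X and Y and 319.889 is the variance of X.

$$\bar{X} = \frac{592}{12} = 49.33, \qquad \bar{Y} = \frac{1693}{12} = 141.083$$

The regression equation is

$$Y - 141.083 = 2.2272\,(X - 49.33)$$

$$Y = 2.2272\,X + 31.2128$$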
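The fit above can be reproduced from the column totals of Table 9.3. The following is a minimal sketch in Python, not the author's own procedure; the function name `fit_y_on_x` and the variable names are illustrative assumptions. It computes the slope and intercept from the sums and then the sum of squared errors shown in the last column of the table.

```python
# Weekly data from Table 9.2.
vehicles = [23, 29, 29, 35, 42, 46, 50, 54, 64, 66, 76, 78]         # X
tyres = [69, 96, 102, 118, 125, 126, 138, 178, 156, 184, 176, 225]  # Y


def fit_y_on_x(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_y_on_x(vehicles, tyres)

# Estimated sales for each week and the total squared error about the line.
fitted = [a + b * v for v in vehicles]
sse = sum((obs - est) ** 2 for obs, est in zip(tyres, fitted))

print(f"Y = {b:.4f} X + {a:.4f}")
print(f"Sum of squared errors = {sse:.2f}")
```

Because the code keeps full precision instead of rounding the mean of X to 49.33, the intercept it prints differs from 31.2128 in the second decimal place.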

He concludes that there is a good relationship between the variables: an increase in the number of registrations has increased the sales. He further supports this by calculating the correlation coefficient. The calculation through MS Excel is shown later in this unit. This information will help Mr. Ajit to plan his future production.

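The standard error of estimate for the data of Example 9.1 (age of husband X and age of wife Y), with estimated values $\hat{Y} = 0.521X + 7.2775$, is worked out as follows:

Table 9.4

| Y | $\hat{Y}$ | $(Y - \hat{Y})^2$ |
|---|---|---|
| 17 | 16.6555 | 0.1187 |
| 17 | 17.1765 | 0.0312 |
| 18 | 17.6975 | 0.0915 |
| 18 | 18.2185 | 0.0477 |
| 19 | 18.7395 | 0.0679 |
| 19 | 19.2605 | 0.0679 |
| 19 | 19.7815 | 0.6107 |
| 20 | 20.3025 | 0.0915 |
| 21 | 20.8235 | 0.0312 |
| 22 | 21.3445 | 0.4297 |
| Total | | 1.5880 |

$$S_{YX} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N}} = \sqrt{\frac{1.5880}{10}} = \sqrt{0.1588} = 0.398$$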

9.7. Examples

Example 9.1:

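Find the regression equations from the following data and hence calculate the correlation coefficient.

Table 9.5

| Age of Husband | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
|---|---|---|---|---|---|---|---|---|---|---|
| Age of Wife | 17 | 17 | 18 | 18 | 19 | 19 | 19 | 20 | 21 | 22 |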

Solution:

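Table 9.6

| Age of husband (x) | dx = x - 22 | dx² | Age of wife (y) | dy = y - 19 | dy² | dx dy |
|---|---|---|---|---|---|---|
| 18 | -4 | 16 | 17 | -2 | 4 | 8 |
| 19 | -3 | 9 | 17 | -2 | 4 | 6 |
| 20 | -2 | 4 | 18 | -1 | 1 | 2 |
| 21 | -1 | 1 | 18 | -1 | 1 | 1 |
| 22 | 0 | 0 | 19 | 0 | 0 | 0 |
| 23 | 1 | 1 | 19 | 0 | 0 | 0 |
| 24 | 2 | 4 | 19 | 0 | 0 | 0 |
| 25 | 3 | 9 | 20 | 1 | 1 | 3 |
| 26 | 4 | 16 | 21 | 2 | 4 | 8 |
| 27 | 5 | 25 | 22 | 3 | 9 | 15 |
| Total 225 | 5 | 85 | 190 | 0 | 24 | 43 |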

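$$\bar{X} = \frac{225}{10} = 22.5, \qquad \bar{Y} = \frac{190}{10} = 19$$

Regression equation of Y on X is:

$$b_{yx} = \frac{10(43) - (5)(0)}{10(85) - (5)^2} = \frac{430}{825} = 0.521$$

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

$$Y - 19 = 0.521\,(X - 22.5)$$

$$Y = 0.521\,X + 7.2775$$

Regression equation of X on Y is:

$$b_{xy} = \frac{10(43) - (5)(0)}{10(24) - (0)^2} = \frac{430}{240} = 1.792$$

$$X - 22.5 = 1.792\,(Y - 19)$$

$$X = 1.792\,Y - 11.548$$

The correlation coefficient is:

$$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{0.521 \times 1.792} = 0.966$$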
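The assumed-mean calculation of Example 9.1 can be checked with a short script. This is a sketch under stated assumptions rather than a definitive implementation: the function name `regression_coefficients` and its argument names are hypothetical, and the assumed means 22 and 19 are the ones used in Table 9.6.

```python
import math

husband = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]  # x
wife = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]     # y


def regression_coefficients(x, y, assumed_x, assumed_y):
    """Return (byx, bxy) from deviations dx, dy taken about assumed means."""
    n = len(x)
    dx = [v - assumed_x for v in x]
    dy = [v - assumed_y for v in y]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))
    sum_dx2 = sum(u * u for u in dx)
    sum_dy2 = sum(v * v for v in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    byx = numerator / (n * sum_dx2 - sum_dx ** 2)
    bxy = numerator / (n * sum_dy2 - sum_dy ** 2)
    return byx, bxy


byx, bxy = regression_coefficients(husband, wife, 22, 19)

# r takes the common sign of the two regression coefficients.
r = math.copysign(math.sqrt(byx * bxy), byx)

print(f"byx = {byx:.3f}, bxy = {bxy:.3f}, r = {r:.3f}")
```

Any other pair of assumed means gives the same coefficients, because the formula corrects for the shift through the terms in the sums of dx and dy.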

Using MS Excel - Procedure

Regression Analysis

Regression Statistics

Multiple R 0.966353136

R Square 0.933838384

Adjusted R Square 0.925568182

Standard Error 0.445516384

Observations 10

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 22.41212121 | 22.41212121 | 112.9160305 | 5.38409E-06 |
| Residual | 8 | 1.587878788 | 0.198484848 | | |
| Total | 9 | 24 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 7.272727273 | 1.112575252 | 6.536840775 | 0.000180955 | 4.707124143 | 9.838330403 |
| Age of Husband | 0.521212121 | 0.04904974 | 10.62619549 | 5.38409E-06 | 0.408103219 | 0.634321023 |

Residual Output

| Observation | Predicted Age of Wife | Residuals |
|---|---|---|
| 1 | 16.65454545 | 0.345454545 |
| 2 | 17.17575758 | -0.175757576 |
| 3 | 17.6969697 | 0.303030303 |
| 4 | 18.21818182 | -0.218181818 |
| 5 | 18.73939394 | 0.260606061 |
| 6 | 19.26060606 | -0.260606061 |
| 7 | 19.78181818 | -0.781818182 |
| 8 | 20.3030303 | -0.303030303 |
| 9 | 20.82424242 | 0.175757576 |
| 10 | 21.34545455 | 0.654545455 |

Example 9.2:

In a correlation study we have the following data.

Table 9.7

| | Series X | Series Y |
|---|---|---|
| Mean | 65 | 67 |
| S.D | 2.5 | 3.5 |

Correlation coefficient = 0.8

Find the two regression equations.

Solution:

Regression equation of Y on X is:

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

$$Y - 67 = (0.8)\,\frac{3.5}{2.5}\,(X - 65)$$

$$Y - 67 = 1.12\,(X - 65)$$

$$Y = 1.12\,X - 5.8$$

Regression equation of X on Y is:

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

$$X - 65 = (0.8)\,\frac{2.5}{3.5}\,(Y - 67)$$

$$X - 65 = 0.5714\,(Y - 67)$$

$$X = 0.5714\,Y + 26.72$$

Example

A study of wheat prices at Mumbai and Kanpur yields the following data:

| | Mumbai | Kanpur |
|---|---|---|
| Mean | 7.50 | 8.10 |
| Standard Deviation | 0.326 | 0.207 |

The correlation coefficient between the prices at Mumbai and Kanpur is 0.774. Estimate the price at Kanpur, if the price at Mumbai is Rs. 8.

Solution:

Given

$$\bar{X} = 7.5, \quad \bar{Y} = 8.10, \quad \sigma_x = 0.326, \quad \sigma_y = 0.207, \quad r = 0.774$$

The regression equation which we need to find is that of Y on X (where X is the price at Mumbai and Y is the price at Kanpur):

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X}) \qquad \text{eq. (1)}$$

Where,

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

Substituting the values in eq. (1) we get,

$$Y - 8.10 = 0.774 \times \frac{0.207}{0.326}\,(X - 7.50)$$

$$Y - 8.10 = 0.4914\,(X - 7.5)$$

$$Y = 0.4914\,X + 4.4145$$

Estimation of the price at Kanpur when the price at Mumbai is Rs. 8:

$$Y = 0.4914 \times 8 + 4.4145$$

$$Y = 8.3457$$

The estimated price at Kanpur is Rs. 8.35, when the price at Mumbai is Rs. 8.
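When only the means, standard deviations and correlation coefficient are known, the line of Y on X follows from the relation b_yx = r·σy/σx. The sketch below, with the hypothetical helper name `line_y_on_x`, applies that relation to the wheat price figures and evaluates the line at a Mumbai price of Rs. 8.

```python
def line_y_on_x(mean_x, mean_y, sd_x, sd_y, r):
    """Return (intercept a, slope b) of the regression line of Y on X."""
    b = r * sd_y / sd_x        # regression coefficient byx
    a = mean_y - b * mean_x    # the line passes through the point of means
    return a, b


# X = price at Mumbai, Y = price at Kanpur.
a, b = line_y_on_x(mean_x=7.50, mean_y=8.10, sd_x=0.326, sd_y=0.207, r=0.774)

kanpur_price = a + b * 8.0
print(f"Y = {b:.4f} X + {a:.4f}")
print(f"Estimated Kanpur price at X = 8: Rs. {kanpur_price:.2f}")
```

Swapping the roles of the two series (and of the two standard deviations) in the same helper gives the line of X on Y.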

Example

The following table shows the amount spent on advertising and the corresponding sales of the product from 10 companies:

| Company | Sales (Rs. in lakh) | Advertising cost (Rs. in lakh) |
|---|---|---|
| A | 25 | 8 |
| B | 35 | 12 |
| C | 29 | 11 |
| D | 24 | 5 |
| E | 38 | 14 |
| F | 12 | 3 |
| G | 18 | 6 |
| H | 27 | 8 |
| I | 17 | 4 |
| J | 30 | 9 |

a. Plot a scatter diagram showing the relationship between advertising cost and sales of the product.

b. Estimate the equation of the regression line of sales on advertising costs.

c. Use the regression line to forecast sales if advertising costs were Rs. 10 lakh.

Solution:

a. A scatter diagram showing the relationship between advertising cost and sales of the product.

[Figure: scatter diagram of sales (Rs. in lakh, vertical axis, 0 to 40) against advertising cost (Rs. in lakh, horizontal axis, 0 to 15)]

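b. The equation of the regression line of sales (Y) on advertising costs (X).

| Y | X | X² | XY |
|---|---|---|---|
| 25 | 8 | 64 | 200 |
| 35 | 12 | 144 | 420 |
| 29 | 11 | 121 | 319 |
| 24 | 5 | 25 | 120 |
| 38 | 14 | 196 | 532 |
| 12 | 3 | 9 | 36 |
| 18 | 6 | 36 | 108 |
| 27 | 8 | 64 | 216 |
| 17 | 4 | 16 | 68 |
| 30 | 9 | 81 | 270 |
| ΣY = 255 | ΣX = 80 | ΣX² = 756 | ΣXY = 2289 |

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$

$$b = \frac{10 \times 2289 - 80 \times 255}{10 \times 756 - 80^2} = \frac{2490}{1160} = 2.14655$$

$$a = \frac{255}{10} - 2.14655 \times \frac{80}{10} = 25.5 - 17.1724 = 8.3276$$

Y = 8.33 + 2.15x

c. To forecast sales if advertising costs were Rs. 10 lakh, we put X = 10 in the equation:

Y = 8.33 + 2.15 × 10

= 29.83

As the original data was given to the nearest integer (whole number), the forecast of sales = 30 (or Rs. 30 lakh).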

9.8. Standard Error of Estimate

The standard error of estimate helps to measure the accuracy of the estimated figures in regression analysis. If the value of the standard error of estimate is small, the estimates provided by the regression equation lie close to the observed values. If the standard error of estimate is zero, there is no variation about the line and the correlation is perfect.

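The standard error of regression of X values from $X_c$ is:

$$S_{xy} = \sqrt{\frac{\sum (X - X_c)^2}{N}}, \qquad S_{xy} = \sigma_x \sqrt{1 - r^2},$$

and, correspondingly, the standard error of regression of Y values from $Y_c$ is:

$$S_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{N}} \quad\text{and}\quad S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{N}}$$

where $X_c$ and $Y_c$ are the values computed from the regression lines.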

“The standard error of estimate is used to ascertain how good and representative the regression line is as a description of the average relationship between two series.”
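As an illustration, the standard error of Y about its regression line can be computed in two equivalent ways: from the squared residuals, and from the identity S_yx = σy·√(1 − r²). The sketch below uses the husband and wife ages of Example 9.1; the variable names are illustrative assumptions. It divides by N, as the formulas in this section do, and also shows the N − 2 divisor that MS Excel uses for its "Standard Error" figure.

```python
import math

x = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]  # age of husband
y = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]  # age of wife
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((v - mean_x) ** 2 for v in x)
syy = sum((v - mean_y) ** 2 for v in y)
sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))

b = sxy / sxx               # regression coefficient of y on x
a = mean_y - b * mean_x     # intercept
r = sxy / math.sqrt(sxx * syy)

# Sum of squared deviations of the observed y from the computed values Yc.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

se_from_residuals = math.sqrt(sse / n)
se_from_r = math.sqrt(syy / n) * math.sqrt(1 - r * r)
se_excel_style = math.sqrt(sse / (n - 2))

print(f"from residuals:      {se_from_residuals:.4f}")
print(f"from sigma_y and r:  {se_from_r:.4f}")
print(f"with N - 2 divisor:  {se_excel_style:.4f}")
```

The first two figures agree, which is the point of the identity; the third is larger because it divides the same sum of squares by the residual degrees of freedom.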

Example 9.3:

The following results were worked out from scores in Statistics and Mathematics in a

certain examination.

Table 9.8

| | Scores in Statistics (X) | Scores in Mathematics (Y) |
|---|---|---|
| Mean | 40 | 48 |
| Standard Deviation | 10 | 15 |

Karl Pearson’s correlation coefficient between x and y is r = +0.42. Find the regression lines of x on y and of y on x. Use the regression lines to find the value of y when x = 50 and the value of x when y = 30.

Solution:

Given the following data:

$$\bar{X} = 40; \quad \bar{Y} = 48; \quad \sigma_x = 10; \quad \sigma_y = 15; \quad r = 0.42$$

The regression line of x on y is:

$$(X - \bar{X}) = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) \qquad (1)$$

The regression line of y on x is:

$$(Y - \bar{Y}) = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) \qquad (2)$$

Therefore, substituting the values, we get the respective equations as:

$$X = 0.28\,y + 26.56 \qquad (3)$$

and

$$Y = 0.63\,x + 22.80 \qquad (4)$$

Therefore:

When y = 30, x = 34.96 by using equation (3)

When x = 50, y = 54.3 by using equation (4)

Example 9.4:

From the following data obtain the two regression equations

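Table 9.9

| X | 12 | 4 | 20 | 8 | 16 |
|---|---|---|---|---|---|
| Y | 18 | 22 | 10 | 16 | 14 |

Estimate Y for X = 15 and estimate X for Y = 20.

Solution:

$\bar{X}$ = (12 + 4 + 20 + 8 + 16) / 5 = 12 = mean of X

$\bar{Y}$ = (18 + 22 + 10 + 16 + 14) / 5 = 16 = mean of Y

Table 9.10

| X | Y | $X - \bar{X}$ (= X - 12) | $Y - \bar{Y}$ (= Y - 16) | $(X - \bar{X})^2$ | $(Y - \bar{Y})^2$ | $(X - \bar{X})(Y - \bar{Y})$ |
|---|---|---|---|---|---|---|
| 12 | 18 | 0 | 2 | 0 | 4 | 0 |
| 4 | 22 | -8 | 6 | 64 | 36 | -48 |
| 20 | 10 | 8 | -6 | 64 | 36 | -48 |
| 8 | 16 | -4 | 0 | 16 | 0 | 0 |
| 16 | 14 | 4 | -2 | 16 | 4 | -8 |
| Total | | | | 160 | 80 | -104 |

$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{-104}{160} = -0.65 \quad\text{and}\quad b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \frac{-104}{80} = -1.3$$

Regression equation of X on Y is given by:

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

$$X - 12 = -1.3\,(Y - 16)$$

Therefore, X = 32.8 - 1.3 Y

When Y = 20; X = 32.8 - 1.3 × 20 = 6.8

Regression equation of Y on X is given by:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

$$Y - 16 = -0.65\,(X - 12)$$

Therefore, Y = 23.8 - 0.65 X

When X = 15; Y = 23.8 - 0.65 × 15 = 14.05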

9.9. Application in Finance

9.9.1. Correlation between Two Variables

The correlation between two variables can be studied for

Time series data

Cross-sectional data, that is, data about sales revenue and advertisement expenses during a year for a

number of companies

The results and conclusions for time series data are valid for one company only, whereas those for cross-sectional data are valid for a group of companies at industry level.

We may take a particular company and study the correlation between prices of its stock in BSE and NSE.

9.9.2. Beta (β) of a Stock/Share

A stock with a beta of more than one, say 1.10, would be expected to rise 10% more than the market index when the index rises, and to fall 10% more than the index when it falls.

The volatility of a stock is measured by its beta value. Beta represents the risk associated with the stock.

An aggressive investor would opt for a stock with a beta value of more than one.

A conservative investor would opt for a stock with a beta value of less than one.

Beta is measured through regression analysis. The percentage daily/weekly/monthly change in the stock price is taken as the dependent variable and the corresponding change in a market index such as the BSE or NSE index is taken as the independent variable. Then a regression equation of the form Y = α + βX is fitted.

Thus a stock’s β measures the relationship between the stock’s rate of return (Y) and the average rate of return for the market as a whole (X).

The coefficient of determination r² obtained in the study measures the proportion of the volatility in a stock’s price that is explained by the market.

One can determine the regression equation between advertisement expenses and sales revenue for different sectors of industry, say manufacturing, IT, chemical, pharmaceutical etc.

Beta reflects the sensitivity of a stock to movements in the stock market index as a whole, such as the NSE Nifty or the BSE Sensex. The beta value of the market is always taken as one.

Example 9.5:

The following data relate to the closing BSE Sensex and the stock price of RIL for 10 trading days during a period. Find β and interpret.

Table 9.11

| Day | BSE | Stock price of RIL |
|---|---|---|
| 1 | 12342 | 1150 |
| 2 | 12378 | 1163 |
| 3 | 12360 | 1148 |
| 4 | 12461 | 1150 |
| 5 | 12479 | 1147 |
| 6 | 12538 | 1169 |
| 7 | 12730 | 1192 |
| 8 | 12928 | 1213 |
| 9 | 12848 | 1216 |
| 10 | 12885 | 1208 |

Solution:

First we calculate the percentage changes in both BSE (X) and RIL (Y) as follows:

$$\text{percentage change (BSE or RIL)} = \frac{\text{value for 2nd day} - \text{value for 1st day}}{\text{value for 1st day}} \times 100$$

Table 9.12

| X | Y |
|---|---|
| 0.2917 | 1.1304 |
| -0.1454 | -1.2898 |
| 0.8172 | 0.1742 |
| 0.1445 | -0.2609 |
| 0.4728 | 1.9180 |
| 1.5313 | 1.9675 |
| 1.5554 | 1.7617 |
| -0.6188 | 0.2473 |
| 0.2880 | -0.6579 |
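The percentage changes of Table 9.12 and the slope of Y on X can be computed directly from the closing values of Table 9.11. This is a minimal sketch with illustrative names (`pct_changes`, `beta`, `alpha`), assuming day-over-day percentage changes as defined above; it is not the MS Excel procedure itself.

```python
bse = [12342, 12378, 12360, 12461, 12479, 12538, 12730, 12928, 12848, 12885]
ril = [1150, 1163, 1148, 1150, 1147, 1169, 1192, 1213, 1216, 1208]


def pct_changes(values):
    """Percentage change from each day to the next."""
    return [(today - prev) / prev * 100 for prev, today in zip(values, values[1:])]


x = pct_changes(bse)  # market index changes (independent variable)
y = pct_changes(ril)  # stock price changes (dependent variable)
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
sxx = sum((u - mean_x) ** 2 for u in x)

beta = sxy / sxx                 # slope of the fitted line Y = alpha + beta * X
alpha = mean_y - beta * mean_x   # intercept

print(f"beta = {beta:.4f}, alpha = {alpha:.4f}")
```

Ten closing values give nine percentage changes, which is why the regression is fitted on nine observations.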

Using MS Excel - Procedure

Regression Analysis

Regression Statistics

Multiple R 0.657986268

R Square 0.432945929

Adjusted R Square 0.351938204

Standard Error 0.961822395

Observations 9

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 4.9442110 | 4.9442110 | 5.34450178 | 0.05404187 |
| Residual | 7 | 6.4757162 | 0.9251023 | | |
| Total | 8 | 11.419927 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0.0291111 | 0.392985 | 0.0740769 | 0.9430215 | -0.9001508 | 0.958373159 |
| X | 1.0903451 | 0.4716397 | 2.3118178 | 0.0540418 | -0.0249055 | 2.205595895 |

From the output, β = 1.0903 (the coefficient of X).

9.10. Non-Linear Regression

A test of hypothesis on the regression coefficient tells us whether a linear relationship exists or not. If the relation is not linear, it can often be converted to a linear relation by using logarithms.

Consider the relation $y = ab^x$. This can be written as:

$$\log y = \log a + x \log b$$

$$Y = A + BX$$

where Y = log y, A = log a, and B = log b.

Example 9.6:

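Consider the following incentive scheme and the turnover expected.

Table 9.13

| Incentive increase in % of Base Year | Turnover (Rs. in crores) |
|---|---|
| 1 | 110 |
| 2 | 120 |
| 3 | 132 |
| 5 | 160 |
| 8 | 215 |
| 10 | 260 |

Fit a curve of the type $y = ax^b$.

Solution:

Taking logarithms,

log y = log a + b log x

Y = A + BX, where Y = log y, X = log x, A = log a and B = b.

Table 9.14

| X = log x | Y = log y | X² | XY |
|---|---|---|---|
| 0 | 2.04 | 0 | 0 |
| 0.30 | 2.08 | 0.09 | 0.63 |
| 0.48 | 2.12 | 0.23 | 1.01 |
| 0.70 | 2.20 | 0.49 | 1.54 |
| 0.90 | 2.33 | 0.81 | 2.11 |
| 1.00 | 2.41 | 1.00 | 2.41 |
| Total 3.38 | 13.19 | 2.62 | 7.7 |

$$B = \frac{6(7.7) - (3.38)(13.19)}{6(2.62) - (3.38)^2} = \frac{46.2 - 44.5822}{15.72 - 11.4244} = \frac{1.6178}{4.2956} = 0.3766$$

$$A = \frac{\sum Y - B\sum X}{N} = 1.99$$

Taking the antilog of A gives a = 97.72, so the equation is

$$y = 97.72\,x^{0.3766}$$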
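The log-log fit of Example 9.6 can be sketched as follows. The code assumes base-10 logarithms, as the values in Table 9.14 suggest, and fits the straight line log y = log a + b log x by least squares; the variable names are illustrative. Because it uses unrounded logarithms, its coefficients are close to, but not identical with, those obtained from the two-decimal table.

```python
import math

incentive = [1, 2, 3, 5, 8, 10]             # x, % increase over base year
turnover = [110, 120, 132, 160, 215, 260]   # y, Rs. in crores

# Transform both variables so that the power curve becomes a straight line.
log_x = [math.log10(v) for v in incentive]
log_y = [math.log10(v) for v in turnover]
n = len(log_x)

sum_x, sum_y = sum(log_x), sum(log_y)
sum_xx = sum(v * v for v in log_x)
sum_xy = sum(u * v for u, v in zip(log_x, log_y))

b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # exponent of x
log_a = (sum_y - b * sum_x) / n                               # A = log a
a = 10 ** log_a                                               # antilog of A

print(f"y = {a:.2f} * x^{b:.4f}")
```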

Example

Find the second degree regression polynomial $y = a + bx + cx^2$ by the least squares method for the data given below.

| X | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Y | 1 | 0 | 3 | 10 | 21 |

Solution:

We need to fit a second degree regression polynomial of the form $y = a + bx + cx^2$. In order to obtain the values of the constants a, b and c, the normal equations are:

$$\sum y = Na + b\sum x + c\sum x^2$$

$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$

$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$

Calculation

| X | Y | X² | XY | X²Y | X³ | X⁴ |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 2 | 3 | 4 | 6 | 12 | 8 | 16 |
| 3 | 10 | 9 | 30 | 90 | 27 | 81 |
| 4 | 21 | 16 | 84 | 336 | 64 | 256 |
| Total 10 | 35 | 30 | 120 | 438 | 100 | 354 |

Substituting the values in the above equations, we get the simultaneous equations:

35 = 5a + 10b + 30c

120 = 10a + 30b + 100c

438 = 30a + 100b + 354c

Solving them gives:

a = 1

b = - 3

c = 2

Therefore, the second degree parabola is $y = 1 - 3x + 2x^2$.
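The three normal equations can also be built and solved programmatically. The sketch below forms the sums of powers from the data and solves the 3 × 3 system by Gaussian elimination with partial pivoting; the helper names `solve_linear_system`, `power_sum` and `moment` are illustrative assumptions, and any linear solver could be substituted.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(value)] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


x = [0, 1, 2, 3, 4]
y = [1, 0, 3, 10, 21]


def power_sum(p):
    """Sum of x**p over the data (p = 0 gives N)."""
    return sum(v ** p for v in x)


def moment(p):
    """Sum of x**p * y over the data."""
    return sum((xi ** p) * yi for xi, yi in zip(x, y))


# Row i of the normal equations: sum(x^i * y) = a*sum(x^i) + b*sum(x^(i+1)) + c*sum(x^(i+2)).
normal_matrix = [[power_sum(i + j) for j in range(3)] for i in range(3)]
right_side = [moment(i) for i in range(3)]

a, b, c = solve_linear_system(normal_matrix, right_side)
print(f"y = {a:.2f} + ({b:.2f})x + ({c:.2f})x^2")
```

For this data the parabola passes through all five points exactly, so the residuals are zero.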

9.11. Logistic Regression

In the linear regression model the variables are assumed to take continuous values in an interval. However, there are situations in which the dependent variable follows a Binomial distribution. In such cases logistic regression is used.

The relationship between the dependent and independent variables is of the form

$$P = \frac{1}{1 + e^{-y}}$$

where P is the probability of success. Rearranging,

$$1 + e^{-y} = \frac{1}{P} \quad\text{or}\quad e^{-y} = \frac{1}{P} - 1 = \frac{1 - P}{P}$$

$$e^{y} = \frac{P}{1 - P} \quad\text{or}\quad Y = \log_e P - \log_e (1 - P) = A + BX$$

Example 9.7:

Suppose an event is either a success or a failure. These are the values of Y, viz. 1 or 0, taken by the dependent variable. The corresponding revenue (X) is given for twenty events as follows:

| Y | X |
|---|---|
| 0 | 3.45 |
| 1 | 3.36 |
| 0 | 3.12 |
| 0 | 3.15 |
| 0 | 3.14 |
| 1 | 3.48 |
| 1 | 3.42 |
| 1 | 3.32 |
| 0 | 3.31 |
| 1 | 3.29 |
| 1 | 3.46 |
| 1 | 3.34 |
| 0 | 3.25 |
| 1 | 3.41 |
| 1 | 3.48 |
| 1 | 3.21 |
| 1 | 3.25 |
| 1 | 3.16 |
| 1 | 3.28 |
| 0 | 3.22 |

Then the regression equation is Y = 1.881X - 5.566

Note:

It is left as an exercise for the reader to find the regression equation.

This regression equation does not yield Y = 0 or Y = 1. For example, when we put X = 2,

Y = 3.762 - 5.566 = -1.804, which is less than 0.

Therefore we require a different technique to predict the Y-value.

Let us construct class intervals:

| Class interval | Mid X | Prob. of Success P | Y = log (P / (1 - P)) |
|---|---|---|---|
| 3.1-3.2 | 3.15 | 1/4 = 0.25 | -0.477 |
| 3.2-3.3 | 3.25 | 4/6 = 0.67 | 0.308 |
| 3.3-3.4 | 3.35 | 3/4 = 0.75 | 0.477 |
| 3.4-3.5 | 3.45 | 5/6 = 0.83 | 0.689 |
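The grouping step and a straight-line fit to the resulting (mid X, Y) points can be sketched as below. The sketch assumes base-10 logarithms, which reproduce the tabulated Y values (for example, log10(0.25/0.75) is about -0.477), and it uses the unrounded proportions, so its slope and intercept come out close to 3.70 and -11.97 rather than matching a fit made on the rounded table values. The names `counts`, `probs` and `logits` are illustrative.

```python
import math

outcomes = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0]
revenue = [3.45, 3.36, 3.12, 3.15, 3.14, 3.48, 3.42, 3.32, 3.31, 3.29,
           3.46, 3.34, 3.25, 3.41, 3.48, 3.21, 3.25, 3.16, 3.28, 3.22]

# [successes, total] for the classes 3.1-3.2, 3.2-3.3, 3.3-3.4 and 3.4-3.5.
counts = [[0, 0] for _ in range(4)]
for success, value in zip(outcomes, revenue):
    # Work in whole hundredths to avoid floating-point trouble at class edges.
    k = (round(value * 100) - 310) // 10
    counts[k][0] += success
    counts[k][1] += 1

mids = [3.15, 3.25, 3.35, 3.45]
probs = [s / t for s, t in counts]
logits = [math.log10(p / (1 - p)) for p in probs]

# Least-squares line of the log-odds on the class mid-points.
n = len(mids)
mean_x = sum(mids) / n
mean_y = sum(logits) / n
sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(mids, logits))
sxx = sum((u - mean_x) ** 2 for u in mids)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

for mid, p, z in zip(mids, probs, logits):
    print(f"mid X = {mid:.2f}  P = {p:.2f}  Y = {z:.3f}")
print(f"Y = {slope:.3f} X + ({intercept:.3f})")
```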

Note:

There are 4 readings in the interval 3.1-3.2 and only one corresponds to 1, so P = ¼.

Regression equation of Y on X is Y = 3.667X - 11.852

(or)

$$\log \frac{P}{1 - P} = 3.667X - 11.852$$

The P values are given by:

$$P = \frac{e^{3.667X - 11.852}}{1 + e^{3.667X - 11.852}}$$

For example, when X = 2.7, Y = -1.9511 and P = 12%.

9.12. Summary

In this unit we learnt what regression is, how to measure it and how to interpret the regression output of MS Excel. Further, the application of regression in the financial field was explained with an example. We also learnt how to calculate the standard error of the estimate.