Regression Analysis

Feb 23, 2016

vashon

Page 1: Regression Analysis

REGRESSION ANALYSIS

The British biometrician Sir Francis Galton first used the term "regression" in the latter part of the 19th century.

Meaning of Regression Analysis

Regression means estimating or predicting the unknown value of a dependent variable from the known value of an independent variable. In other words, regression analysis is a mathematical measure of the average relationship between two or more variables.

Definition of Regression

By M.M. Blair: Regression is the measure of the average relationship between two or more variables.

By Hirsch: Regression analysis measures the nature and extent of the relation between two or more variables, and thus enables us to make predictions.

Utility of Regression

1. Planning and policy making: Regression analysis is essential for planning and policy making. It is widely used to estimate the coefficients of economic relationships, and these coefficients help in formulating the government's economic policy.

2. Testing economic theory: Regression analysis is one of the basic tools for testing the accuracy of economic theories.

3. Prediction: Regression analysis predicts the value of the dependent variable on the basis of the value of an independent variable. For example, if the price of a commodity rises, regression can predict the probable fall in demand.

4. Studying functional relationships: Regression is used to study the functional relationship between variables. For example, in agriculture it is important to study how crop yield responds to fertilizer.

Difference between Regression and Correlation

1. Degree and nature of relationship

Correlation: measures the degree of relationship between X and Y.

Regression: studies the relationship between two variables so that the value of one variable can be predicted from the value of the other.

2. Cause and effect relationship

Correlation: does not necessarily assume a cause and effect relationship.

Regression: expresses a cause and effect relationship.

3. Prediction

Correlation: does not make any prediction.

Regression: enables us to make predictions.

4. Nonsense correlation

Correlation: sometimes yields a nonsense correlation, such as one between a rise in income and a rise in weight.

Regression: there is no such thing as nonsense regression.

Types of Regression Analysis

1. Simple and multiple regression: In simple regression analysis, we study only two variables at a time, one dependent and the other independent. The relationship between income and expenditure is an example of simple regression. In multiple regression analysis, we study more than two variables at a time, one dependent and the others independent. Studying the effect of rainfall and irrigation on the yield of wheat is an example of multiple regression.

2. Linear and non-linear regression: When one variable changes with the other in a fixed ratio (a constant rate of change), the regression is linear. When one variable changes with the other in a varying ratio, the regression is non-linear.

3. Partial and total regression: When two or more variables are studied for a functional relationship, but only the relationship between two of them is studied at a time while the other variables are held constant, it is known as partial regression. When all the variables are studied simultaneously for the relationship among them, it is called total regression.

Simple Linear Regression

Regression Lines

Regression Equations

Regression Coefficients

Regression Lines

The Regression Line shows the average relationship between two variables. This is also known as the line of Best Fit. If two variables X and Y are given, then there are two regression lines related to them:

1. Regression Line of X on Y: The regression line of X on Y gives the best estimate of X for any given value of Y.

2. Regression Line of Y on X: The regression line of Y on X gives the best estimate of Y for any given value of X.

Methods of Obtaining Regression Lines

1. Scatter diagram method

2. Least squares method

Scatter diagram method

This is the simplest method of constructing regression lines. The paired values of the related variables are plotted on a graph, and a line is then drawn freehand through the plotted points so that it passes as close to them as possible. Depending on the pattern of the points, the regression line can be linear (a straight line) or non-linear (a curve).

Least Squares Method

Regression lines can also be constructed by this method. Under this method, a regression line is fitted through the plotted points in such a way that the sum of the squares of the deviations of the observed values from the fitted line is the least possible. The line drawn by this method is called the line of best fit. Two regression lines are drawn in this way: for the line of Y on X the squared deviations are measured in the Y direction, and for the line of X on Y they are measured in the X direction.
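To make the least squares criterion concrete, here is a small Python sketch (the function name `sum_squared_deviations` is an illustrative choice, not from the deck). It measures deviations in the Y direction, as for the line of Y on X, and uses the data and the fitted line Y = 4.1 + 0.65X from the deck's worked example on obtaining two regression lines. Nudging the intercept or slope away from those values should always increase the sum.

```python
def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical (Y-direction) deviations of the observed
    points from the candidate line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Data from the deck's example "Obtain two lines of equations".
xs = [2, 4, 6, 8, 10]
ys = [5, 7, 9, 8, 11]

best = sum_squared_deviations(xs, ys, 4.1, 0.65)  # least squares line of Y on X

# Perturb the intercept and/or slope; every alternative line fits worse.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.2, -0.03)]:
    other = sum_squared_deviations(xs, ys, 4.1 + da, 0.65 + db)
    print(f"a={4.1 + da:.2f}, b={0.65 + db:.2f}: {other:.3f} vs best {best:.3f}")
```

Because the sum of squares is a quadratic bowl in (a, b), the least squares line is its unique minimum, so any nonzero perturbation raises the value.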

Regression Equations

Regression equations are the algebraic formulation of regression lines: each regression equation represents one regression line. There are two regression equations:

Regression Equation of Y on X

Regression equation of X on Y

Regression Equation of Y on X

This equation is used to estimate the probable value of Y on the basis of given values of X. It can be expressed as:

$$Y = a + bX$$

Here a and b are constants. The regression equation of Y on X can also be written in terms of the means:

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$

Regression Equation of X on Y

This equation is used to estimate the probable value of X on the basis of given values of Y. It can be expressed as:

$$X = a + bY$$

Here a and b are constants. The regression equation of X on Y can also be written in terms of the means:

$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$

Regression Coefficients

A regression coefficient measures the average change in the value of one variable for a unit change in the value of the other variable. It represents the slope of a regression line. There are two regression coefficients:

Regression coefficient of Y on X

Regression coefficient of X on Y

Regression coefficient of Y on X: This coefficient shows the average change in the value of Y for a unit change in the value of X. It is denoted by $b_{yx}$ and is given by:

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

Regression coefficient of X on Y: This coefficient shows the average change in the value of X for a unit change in the value of Y. It is denoted by $b_{xy}$ and is given by:

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

where r is the coefficient of correlation and $\sigma_x$, $\sigma_y$ are the standard deviations of X and Y.
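As a check on these formulas, the following Python sketch computes r and the standard deviations directly from raw data and then forms both coefficients. It uses population standard deviations (dividing by N); because the same divisor appears in r and in both σ values, that choice cancels out of $b_{yx}$ and $b_{xy}$. The helper names are illustrative. The data are from the deck's worked example on obtaining two regression lines, where $b_{yx} = 0.65$ and $b_{xy} = 1.30$.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """Karl Pearson's r, using population (divide-by-N) moments."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

def regression_coefficients(xs, ys):
    """b_yx = r * sigma_y / sigma_x and b_xy = r * sigma_x / sigma_y."""
    r = correlation(xs, ys)
    sigma_x, sigma_y = pstdev(xs), pstdev(ys)
    return r * sigma_y / sigma_x, r * sigma_x / sigma_y

b_yx, b_xy = regression_coefficients([2, 4, 6, 8, 10], [5, 7, 9, 8, 11])
print(round(b_yx, 2), round(b_xy, 2))  # 0.65 1.3
```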

Properties of Regression Coefficients

The regression coefficients have the following properties:

1. The coefficient of correlation is the geometric mean of the regression coefficients:

$$r = \pm\sqrt{b_{xy} \times b_{yx}}$$

with r taking the common sign of the two coefficients.

2. Both regression coefficients must have the same sign: either both are positive or both are negative. In other words, when one regression coefficient is negative, the other is also negative. It is never possible for one regression coefficient to be negative while the other is positive.

3. The coefficient of correlation has the same sign as the regression coefficients: if both regression coefficients are negative, r is negative, and if $b_{yx}$ and $b_{xy}$ are positive, r is also positive.

4. Both regression coefficients cannot be greater than unity: if the regression coefficient of Y on X is greater than unity, the regression coefficient of X on Y must be less than unity. This is because

$$b_{yx} \times b_{xy} = r^2 \le 1$$

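5. The arithmetic mean of the two regression coefficients is greater than or equal to the correlation coefficient:

$$\frac{b_{yx} + b_{xy}}{2} \ge r$$

This holds as written when both coefficients are positive; when both are negative, it holds for the absolute values, $\left|\frac{b_{yx} + b_{xy}}{2}\right| \ge |r|$.

6. A change of origin does not affect the regression coefficients, but a change of scale does: regression coefficients are independent of a change of origin but not of scale. If $u = (X - A)/h$ and $v = (Y - B)/k$, then $b_{yx} = \frac{k}{h}\, b_{vu}$ and $b_{xy} = \frac{h}{k}\, b_{uv}$.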
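The sketch below checks these properties numerically in Python on the data of the deck's deviations-from-actual-means example. The origin shifts (100, 50) and the scale factors h = 2, k = 3 are arbitrary values chosen only for illustration, and `coefficients` is a hypothetical helper name.

```python
from math import copysign, sqrt

def coefficients(xs, ys):
    """Return (b_yx, b_xy) using deviations from the actual means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy

xs = [2, 4, 6, 8, 10, 12]
ys = [4, 2, 5, 10, 3, 6]
b_yx, b_xy = coefficients(xs, ys)

# Properties 1 to 3: r is the geometric mean, carrying the common sign.
r = copysign(sqrt(b_yx * b_xy), b_yx)
print("b_yx =", round(b_yx, 3), " b_xy =", round(b_xy, 3), " r =", round(r, 3))
# Property 4: the product of the coefficients is r^2, which cannot exceed 1.
print("b_yx * b_xy =", round(b_yx * b_xy, 3))
# Property 5: the arithmetic mean is at least r.
print("AM =", round((b_yx + b_xy) / 2, 3))

# Property 6: shift the origin by (100, 50) and change the scale by h, k.
h, k = 2, 3
u = [(x - 100) / h for x in xs]
v = [(y - 50) / k for y in ys]
b_vu, b_uv = coefficients(u, v)
# The origin shift has no effect; the scale change is undone by k/h and h/k.
print(round(b_yx, 4), round(b_vu * k / h, 4))
print(round(b_xy, 4), round(b_uv * h / k, 4))
```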

Methods to obtain Regression Equations

Regression Equations in case of Individual Series

Regression Equations in case of Grouped Data

In Case of Individual Series

In an individual series, regression equations can be worked out by two methods:

Using Normal Equations

Using regression Coefficients

Regression Equations using Normal Equations

This method is also called the least squares method. Under this method, the regression equations are computed by solving two normal equations.

Regression Equation of Y on X

$$Y = a + bX$$

Under the least squares method, the values of a and b are obtained from the following two normal equations:

$$\sum Y = Na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$

Solving these equations gives:

$$b = b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}, \qquad a = \bar{Y} - b\bar{X}$$

Finally, a and b are substituted into the equation $Y = a + bX$.
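The closed-form solution of the normal equations translates directly into a short Python function (`fit_y_on_x` is an illustrative name, not from the deck). It is applied here to the X and Y values of the deck's least squares example; that example fits X on Y, so the Y-on-X line printed here is a different line from the one the deck derives.

```python
def fit_y_on_x(xs, ys):
    """Least squares line Y = a + bX from the solved normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # a = mean(Y) - b * mean(X)
    return a, b

# X and Y values from the deck's least squares example.
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
a, b = fit_y_on_x(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}X")
```

Both normal equations are satisfied by the result: the residuals sum to zero, and so do the residuals weighted by X.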

Regression Equation of X on Y

The regression equation of X on Y is expressed as:

$$X = a + bY$$

Under the least squares method, the values of a and b are obtained from the following normal equations:

$$\sum X = Na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$

Solving these equations gives $b = b_{xy} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2}$ and $a = \bar{X} - b\bar{Y}$. The calculated values of a and b are then substituted into $X = a + bY$.

Example

Calculate the regression equation of X on Y from the following data by the least squares method:

X | Y | X² | Y² | XY
1 | 2 | 1 | 4 | 2
2 | 5 | 4 | 25 | 10
3 | 3 | 9 | 9 | 9
4 | 8 | 16 | 64 | 32
5 | 7 | 25 | 49 | 35
∑X = 15 | ∑Y = 25 | ∑X² = 55 | ∑Y² = 151 | ∑XY = 88

N = 5

Solution

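Regression equation of X on Y: $X = a + bY$

The two normal equations are:

$$\sum X = Na + b\sum Y \quad (1)$$
$$\sum XY = a\sum Y + b\sum Y^2 \quad (2)$$

Substituting the values, we get:

$$15 = 5a + 25b \quad (1)$$
$$88 = 25a + 151b \quad (2)$$

Multiplying equation (1) by 5 gives $75 = 25a + 125b$. Subtracting this from equation (2):

$$88 - 75 = (151 - 125)b \;\Rightarrow\; 13 = 26b \;\Rightarrow\; b = \frac{13}{26} = 0.5$$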

Putting the value of b in equation (1):

$$15 = 5a + 25 \times 0.5 \;\Rightarrow\; 15 = 5a + 12.5 \;\Rightarrow\; 5a = 2.5 \;\Rightarrow\; a = 0.5$$

Therefore:

$$X = 0.5 + 0.5Y$$
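The elimination above can be cross-checked by solving the two normal equations with Cramer's rule. This Python sketch uses exact fractions so that no rounding creeps in; `solve_normal_equations` and its argument order are an illustrative convention, not something from the deck.

```python
from fractions import Fraction

def solve_normal_equations(c1, d1, e1, c2, d2, e2):
    """Solve e1 = c1*a + d1*b and e2 = c2*a + d2*b by Cramer's rule."""
    det = c1 * d2 - d1 * c2
    a = Fraction(e1 * d2 - d1 * e2, det)
    b = Fraction(c1 * e2 - e1 * c2, det)
    return a, b

# Normal equations of the example: 15 = 5a + 25b and 88 = 25a + 151b.
a, b = solve_normal_equations(5, 25, 15, 25, 151, 88)
print(f"X = {float(a)} + {float(b)}Y")  # X = 0.5 + 0.5Y
```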

Regression Equation using Regression Coefficients

Regression equations can also be computed with the help of regression coefficients. For this we need $\bar{X}$, $\bar{Y}$, $b_{yx}$ and $b_{xy}$ from the data. Regression equations can be computed from the regression coefficients by the following methods:

1. Using actual values of X and Y
2. Using deviations from actual means
3. Using deviations from assumed means
4. Using r, $\sigma_x$, $\sigma_y$, $\bar{X}$ and $\bar{Y}$

Using Actual values of X and Y series

In this method, the actual values of X and Y are used to determine the regression equations, which are written as follows.

Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}$$

Regression equation of X on Y:

$$X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2}$$

Example

Obtain the two regression equations from the following data:

X | Y | XY | X² | Y²
2 | 5 | 10 | 4 | 25
4 | 7 | 28 | 16 | 49
6 | 9 | 54 | 36 | 81
8 | 8 | 64 | 64 | 64
10 | 11 | 110 | 100 | 121
∑X = 30 | ∑Y = 40 | ∑XY = 266 | ∑X² = 220 | ∑Y² = 340

N = 5

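Regression coefficient of Y on X:

$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2} = \frac{5 \times 266 - 30 \times 40}{5 \times 220 - 30^2} = \frac{1330 - 1200}{1100 - 900} = \frac{130}{200} = 0.65$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2} = \frac{5 \times 266 - 30 \times 40}{5 \times 340 - 40^2} = \frac{1330 - 1200}{1700 - 1600} = \frac{130}{100} = 1.30$$

Means:

$$\bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8, \qquad \bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6$$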

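Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}(X - \bar{X}) \;\Rightarrow\; Y - 8 = 0.65(X - 6) \;\Rightarrow\; Y = 4.1 + 0.65X$$

Regression equation of X on Y:

$$X - \bar{X} = b_{xy}(Y - \bar{Y}) \;\Rightarrow\; X - 6 = 1.30(Y - 8) \;\Rightarrow\; X = -4.40 + 1.30Y$$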
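The whole actual-values method can be sketched in a few lines of Python (the function name `regression_equations` is hypothetical). It computes both coefficients from the sums, then expands the mean form of each line into intercept and slope, reproducing the two equations of this example.

```python
def regression_equations(xs, ys):
    """Return ((a, b_yx), (a2, b_xy)) for Y = a + b_yx*X and X = a2 + b_xy*Y,
    using the actual-values formulas and the mean form of each line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    b_yx = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b_xy = (n * sxy - sx * sy) / (n * sy2 - sy ** 2)
    x_bar, y_bar = sx / n, sy / n
    # Y - Ybar = b_yx (X - Xbar)  =>  Y = (Ybar - b_yx * Xbar) + b_yx * X
    y_on_x = (y_bar - b_yx * x_bar, b_yx)
    # X - Xbar = b_xy (Y - Ybar)  =>  X = (Xbar - b_xy * Ybar) + b_xy * Y
    x_on_y = (x_bar - b_xy * y_bar, b_xy)
    return y_on_x, x_on_y

(a1, b1), (a2, b2) = regression_equations([2, 4, 6, 8, 10], [5, 7, 9, 8, 11])
print(f"Y = {a1:.2f} + {b1:.2f}X")  # Y = 4.10 + 0.65X
print(f"X = {a2:.2f} + {b2:.2f}Y")  # X = -4.40 + 1.30Y
```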

Using Deviations Taken from Actual Means

When the values of X and Y are large, the method using actual values becomes cumbersome. In such cases, deviations taken from the arithmetic means are used.

Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad b_{yx} = \frac{\sum xy}{\sum x^2}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

Regression equation of X on Y:

$$X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad b_{xy} = \frac{\sum xy}{\sum y^2}$$

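Example: Obtain the two regression equations from the following data ($\bar{X} = 7$, $\bar{Y} = 5$):

X | Y | x = X − X̄ | x² | y = Y − Ȳ | y² | xy
2 | 4 | −5 | 25 | −1 | 1 | 5
4 | 2 | −3 | 9 | −3 | 9 | 9
6 | 5 | −1 | 1 | 0 | 0 | 0
8 | 10 | 1 | 1 | 5 | 25 | 5
10 | 3 | 3 | 9 | −2 | 4 | −6
12 | 6 | 5 | 25 | 1 | 1 | 5
∑X = 42 | ∑Y = 30 | ∑x = 0 | ∑x² = 70 | ∑y = 0 | ∑y² = 40 | ∑xy = 18

N = 6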

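Solution: Since the actual means of X and Y are whole numbers, we take deviations from $\bar{X}$ and $\bar{Y}$.

$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{18}{70} = 0.257, \qquad b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{18}{40} = 0.45$$

Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}(X - \bar{X}) \;\Rightarrow\; Y - 5 = 0.257(X - 7) \;\Rightarrow\; Y - 5 = 0.257X - 1.799 \;\Rightarrow\; Y = 0.257X + 3.201$$

Regression equation of X on Y:

$$X - \bar{X} = b_{xy}(Y - \bar{Y}) \;\Rightarrow\; X - 7 = 0.45(Y - 5) \;\Rightarrow\; X - 7 = 0.45Y - 2.25 \;\Rightarrow\; X = 0.45Y + 4.75$$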
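The deviations-from-actual-means method can be sketched in Python as below (`coefficients_from_mean_deviations` is an illustrative name). Carried at full precision, the Y-on-X intercept comes out as 3.200 rather than 3.201; the small difference comes only from rounding $b_{yx}$ to 0.257 before multiplying by 7.

```python
def coefficients_from_mean_deviations(xs, ys):
    """b_yx = sum(xy)/sum(x^2) and b_xy = sum(xy)/sum(y^2),
    where x and y are deviations from the actual means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    b_yx = sum_xy / sum(a * a for a in dx)
    b_xy = sum_xy / sum(b * b for b in dy)
    return x_bar, y_bar, b_yx, b_xy

x_bar, y_bar, b_yx, b_xy = coefficients_from_mean_deviations(
    [2, 4, 6, 8, 10, 12], [4, 2, 5, 10, 3, 6])
print(f"Y = {b_yx:.3f}X + {y_bar - b_yx * x_bar:.3f}")
print(f"X = {b_xy:.2f}Y + {x_bar - b_xy * y_bar:.2f}")
```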

Using Deviations taken from Assumed Means

When the actual means are fractions rather than whole numbers (for example 24.14 or 56.89), deviations are taken from assumed means instead of actual means. The regression equations are then obtained as follows.

Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$

Regression equation of X on Y:

$$X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$

where $d_x = X - A_x$, $d_y = Y - A_y$, and $A_x$, $A_y$ are the assumed means of X and Y.

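Example: Obtain the two regression equations for the following data (assumed means $A_x = 69$, $A_y = 112$):

X | Y | dx = X − 69 | dx² | dy = Y − 112 | dy² | dx·dy
78 | 125 | 9 | 81 | 13 | 169 | 117
89 | 137 | 20 | 400 | 25 | 625 | 500
97 | 156 | 28 | 784 | 44 | 1936 | 1232
69 | 112 | 0 | 0 | 0 | 0 | 0
59 | 107 | −10 | 100 | −5 | 25 | 50
79 | 136 | 10 | 100 | 24 | 576 | 240
68 | 124 | −1 | 1 | 12 | 144 | −12
61 | 108 | −8 | 64 | −4 | 16 | 32
∑X = 600 | ∑Y = 1005 | ∑dx = 48 | ∑dx² = 1530 | ∑dy = 109 | ∑dy² = 3491 | ∑dxdy = 2159

N = 8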

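Solution:

$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2} = \frac{8 \times 2159 - 48 \times 109}{8 \times 1530 - 48^2} = \frac{17272 - 5232}{12240 - 2304} = \frac{12040}{9936} = 1.212$$

With $\bar{X} = 600/8 = 75$ and $\bar{Y} = 1005/8 = 125.625$, the regression equation of Y on X is:

$$Y - 125.625 = 1.212(X - 75) \;\Rightarrow\; Y - 125.625 = 1.212X - 90.9 \;\Rightarrow\; Y = 1.212X + 34.725$$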

$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2} = \frac{8 \times 2159 - 48 \times 109}{8 \times 3491 - 109^2} = \frac{17272 - 5232}{27928 - 11881} = \frac{12040}{16047} = 0.750$$

Regression equation of X on Y:

$$X - 75 = 0.750(Y - 125.625) \;\Rightarrow\; X - 75 = 0.75Y - 94.219 \;\Rightarrow\; X = 0.75Y - 19.219$$
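The assumed-means formulas can be sketched in Python as follows (`coefficients_from_assumed_means` is an illustrative name). Run on the example data, it gives $b_{yx} \approx 1.212$ and $b_{xy} \approx 0.750$, with $(\sum d_y)^2 = 109^2$ in the denominator of $b_{xy}$. A second call with assumed means of zero (that is, working with the actual values) shows that the choice of origin does not change the coefficients.

```python
def coefficients_from_assumed_means(xs, ys, ax, ay):
    """Regression coefficients from deviations dx = X - ax, dy = Y - ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    b_yx = (n * s_dxdy - s_dx * s_dy) / (n * sum(a * a for a in dx) - s_dx ** 2)
    b_xy = (n * s_dxdy - s_dx * s_dy) / (n * sum(b * b for b in dy) - s_dy ** 2)
    return b_yx, b_xy

xs = [78, 89, 97, 69, 59, 79, 68, 61]
ys = [125, 137, 156, 112, 107, 136, 124, 108]

b_yx, b_xy = coefficients_from_assumed_means(xs, ys, 69, 112)
print(round(b_yx, 3), round(b_xy, 3))  # 1.212 0.75

# Assumed means of zero reproduce the same coefficients (origin independence).
print(coefficients_from_assumed_means(xs, ys, 0, 0))
```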

Obtain Regression Equations from Coefficient of Correlation, Standard Deviations and Arithmetic Means of X and Y

When $\bar{X}$, $\bar{Y}$, $\sigma_x$, $\sigma_y$ and r of the X and Y series are given, the regression equations are expressed as follows:

1. Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

2. Regression equation of X on Y:

$$X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

Example: You are given the following information. Obtain the two regression equations.

Statistic | X | Y
Arithmetic mean | 5 | 12
Standard deviation | 2.6 | 3.6

Correlation coefficient: r = 0.7

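Solution

1. Regression equation of X on Y:

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$$

Putting the values in the equation, we get:

$$X - 5 = 0.7 \times \frac{2.6}{3.6}(Y - 12) \;\Rightarrow\; X - 5 = 0.51(Y - 12) \;\Rightarrow\; X - 5 = 0.51Y - 6.12 \;\Rightarrow\; X = 0.51Y - 1.12$$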

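2. Regression equation of Y on X:

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$

Putting the values in the equation, we get:

$$Y - 12 = 0.7 \times \frac{3.6}{2.6}(X - 5) \;\Rightarrow\; Y - 12 = 0.97(X - 5) \;\Rightarrow\; Y - 12 = 0.97X - 4.85 \;\Rightarrow\; Y = 0.97X + 7.15$$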
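The same calculation in Python, as a sketch (`equations_from_summary` is a hypothetical name). Note that at full precision the X-on-Y intercept is about −1.07; the −1.12 above arises because the slope is rounded to 0.51 before being multiplied by 12. The Y-on-X result, 0.97X + 7.15, agrees at two decimals.

```python
def equations_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return (intercept, slope) pairs for Y on X and for X on Y,
    using b_yx = r * sd_y / sd_x and b_xy = r * sd_x / sd_y."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    y_on_x = (mean_y - b_yx * mean_x, b_yx)
    x_on_y = (mean_x - b_xy * mean_y, b_xy)
    return y_on_x, x_on_y

(a1, b1), (a2, b2) = equations_from_summary(5, 12, 2.6, 3.6, 0.7)
print(f"Y = {b1:.2f}X {a1:+.2f}")
print(f"X = {b2:.2f}Y {a2:+.2f}")
```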

Obtain Regression Equations in case of Grouped Data

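To obtain regression equations from grouped data, we first construct a correlation table. Because regression coefficients are independent of a change of origin but not of scale, a special adjustment is needed when the deviations are taken in class-interval units: the result must be multiplied by the ratio of the class intervals. In grouped data, the regression coefficients are computed with the following formulas.

1. Regression coefficient of X on Y:

$$b_{xy} = \frac{N\sum f d_x d_y - \sum f d_x \sum f d_y}{N\sum f d_y^2 - \left(\sum f d_y\right)^2} \times \frac{i_x}{i_y}$$

2. Regression coefficient of Y on X:

$$b_{yx} = \frac{N\sum f d_x d_y - \sum f d_x \sum f d_y}{N\sum f d_x^2 - \left(\sum f d_x\right)^2} \times \frac{i_y}{i_x}$$

where f is the cell frequency, $N = \sum f$, $d_x$ and $d_y$ are step deviations of the class mid-points from the assumed means, and $i_x$, $i_y$ are the class intervals of X and Y.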
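A Python sketch of the grouped-data formulas follows; `grouped_coefficients` and the (dx, dy, f) cell layout are illustrative assumptions. The deck gives no grouped-data example to check against, so the usage below feeds in the ungrouped data of the worked example on obtaining two regression lines as cells of frequency 1, with step deviations based on $A_x = 6$, $i_x = 2$, $A_y = 8$, $i_y = 1$. The formulas should then reproduce $b_{yx} = 0.65$ and $b_{xy} = 1.30$.

```python
def grouped_coefficients(cells, ix, iy):
    """cells: iterable of (dx, dy, f), where dx, dy are step deviations
    ((mid-point - assumed mean) / class interval) and f is the frequency
    of that cell in the correlation table. Returns (b_yx, b_xy)."""
    n = s_fdx = s_fdy = s_fdx2 = s_fdy2 = s_fdxdy = 0
    for dx, dy, f in cells:
        n += f
        s_fdx += f * dx
        s_fdy += f * dy
        s_fdx2 += f * dx * dx
        s_fdy2 += f * dy * dy
        s_fdxdy += f * dx * dy
    num = n * s_fdxdy - s_fdx * s_fdy
    # Multiplying by the interval ratio undoes the change of scale.
    b_xy = num / (n * s_fdy2 - s_fdy ** 2) * ix / iy
    b_yx = num / (n * s_fdx2 - s_fdx ** 2) * iy / ix
    return b_yx, b_xy

xs = [2, 4, 6, 8, 10]
ys = [5, 7, 9, 8, 11]
cells = [((x - 6) / 2, (y - 8) / 1, 1) for x, y in zip(xs, ys)]
b_yx, b_xy = grouped_coefficients(cells, ix=2, iy=1)
print(round(b_yx, 2), round(b_xy, 2))  # 0.65 1.3
```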

Obtain the Mean values and Correlation Coefficient from the Regression Equations

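1. To find the mean values from the regression equations: the two regression lines intersect at the point $(\bar{X}, \bar{Y})$, so solving the two regression equations simultaneously gives the means of X and Y.

2. To find the coefficient of correlation from the two regression equations: the correlation coefficient can be worked out from the regression coefficients $b_{xy}$ and $b_{yx}$:

$$r = \pm\sqrt{b_{yx} \, b_{xy}}$$

where r takes the common sign of $b_{yx}$ and $b_{xy}$.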
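Both steps can be sketched in Python (the helper names are hypothetical). The usage takes the two equations from the deck's identification example, 2X + 3Y = 42 (X on Y) and X + 2Y = 26 (Y on X), with $b_{xy} = -3/2$ and $b_{yx} = -1/2$ as derived there. Cramer's rule gives the intersection point, i.e. the two means.

```python
from math import copysign, sqrt

def means_from_lines(p1, q1, c1, p2, q2, c2):
    """Solve p1*X + q1*Y = c1 and p2*X + q2*Y = c2 by Cramer's rule.
    The solution is the intersection point (mean of X, mean of Y)."""
    det = p1 * q2 - q1 * p2
    return (c1 * q2 - q1 * c2) / det, (p1 * c2 - c1 * p2) / det

def r_from_coefficients(b_yx, b_xy):
    """r is the geometric mean of b_yx and b_xy, carrying their common sign."""
    return copysign(sqrt(b_yx * b_xy), b_yx)

x_bar, y_bar = means_from_lines(2, 3, 42, 1, 2, 26)
print(x_bar, y_bar)  # 6.0 10.0
print(round(r_from_coefficients(-1 / 2, -3 / 2), 3))  # -0.866
```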

Example: From the following two regression equations, identify which one is X on Y and which one is Y on X:

$$2X + 3Y = 42 \quad (1)$$
$$X + 2Y = 26 \quad (2)$$

Solution: In the absence of any clear indication, let us first assume that equation (1) is the regression equation of Y on X and equation (2) is the regression equation of X on Y.

Taking equation (1) as Y on X:

$$2X + 3Y = 42 \;\Rightarrow\; 3Y = 42 - 2X \;\Rightarrow\; Y = 14 - \frac{2}{3}X$$

so $b_{yx}$ = coefficient of X = $-\frac{2}{3}$.

Taking equation (2) as X on Y:

$$X + 2Y = 26 \;\Rightarrow\; X = 26 - 2Y$$

so $b_{xy}$ = coefficient of Y = $-2$.

Calculating r from these two regression coefficients:

$$r^2 = b_{yx} \, b_{xy} = \left(-\frac{2}{3}\right)(-2) = \frac{4}{3} > 1$$

Here $r^2 > 1$, which is impossible because $r^2 \le 1$. So our assumption is wrong. We now take equation (1) as the regression equation of X on Y and equation (2) as the regression equation of Y on X.

Taking equation (1) as X on Y:

$$2X + 3Y = 42 \;\Rightarrow\; 2X = 42 - 3Y \;\Rightarrow\; X = 21 - \frac{3}{2}Y$$

so $b_{xy}$ = coefficient of Y = $-\frac{3}{2}$.

Taking equation (2) as Y on X:

$$X + 2Y = 26 \;\Rightarrow\; 2Y = 26 - X \;\Rightarrow\; Y = 13 - \frac{1}{2}X$$

so $b_{yx}$ = coefficient of X = $-\frac{1}{2}$.

Now:

$$r^2 = b_{xy} \, b_{yx} = \left(-\frac{3}{2}\right)\left(-\frac{1}{2}\right) = \frac{3}{4} = 0.75$$

Here $r^2 < 1$, which is within the limit $r^2 \le 1$. Hence the first equation is the regression equation of X on Y and the second is the regression equation of Y on X.
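The trial-and-error procedure above can be sketched in Python with exact fractions. Each equation is given as its coefficients (p, q) from pX + qY = c; `slope_on` and `identify` are illustrative names. One limitation worth stating: if both assignments happened to give $r^2 \le 1$, this check alone could not decide, and the sketch would simply return the first one that passes.

```python
from fractions import Fraction

def slope_on(p, q, solve_for):
    """For the line p*X + q*Y = c, solved for `solve_for` ('X' or 'Y'),
    return the coefficient of the other variable."""
    return Fraction(-q, p) if solve_for == "X" else Fraction(-p, q)

def identify(eq1, eq2):
    """Try both assignments and keep the first where r^2 = b_yx * b_xy <= 1."""
    trials = [(eq1, eq2, "1 is Y on X, 2 is X on Y"),
              (eq2, eq1, "1 is X on Y, 2 is Y on X")]
    for y_on_x, x_on_y, label in trials:
        b_yx = slope_on(*y_on_x, solve_for="Y")
        b_xy = slope_on(*x_on_y, solve_for="X")
        r2 = b_yx * b_xy
        if r2 <= 1:
            return label, b_yx, b_xy, r2
    raise ValueError("neither assignment gives r^2 <= 1")

# 2X + 3Y = 42 and X + 2Y = 26 from the example.
label, b_yx, b_xy, r2 = identify((2, 3), (1, 2))
print(label, b_yx, b_xy, r2)
```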
