Chapter 15 Multiple Linear Regression Analysis

Chapter 15 Multiple Linear Regression Analysis. Multiple linear regression Choice of independent variable Application.

Dec 25, 2015


Rolf Nash
Transcript
Page 1: Chapter 15 Multiple Linear Regression Analysis. Multiple linear regression Choice of independent variable Application.

Chapter 15

Multiple Linear Regression Analysis

• Multiple linear regression

• Choice of independent variables

• Application

Goal: construct a multiple linear regression model to assess the relationship between one dependent variable and a set of independent variables.

Data: the dependent variable is quantitative; the independent variables are all or mostly quantitative. Qualitative or ranked variables must be converted to numeric codes first.

Application: explanation and prediction.

Significance: most outcomes are influenced by many factors, so a change in the dependent variable may be driven by several independent variables. For example, the blood sugar of diabetic patients may be affected by many biochemical indices such as insulin, glycosylated hemoglobin, serum total cholesterol, and triglyceride.


§  1 Multiple linear regression

1. Multiple linear regression model

• Variables: one dependent variable and a set of m independent variables, m + 1 variables in total.

• Sample size: n

• Data layout: see Table 15-1

• General form of the regression model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m + e$$

In the model above, the dependent variable Y is expressed approximately as a linear function of the independent variables X1, X2, …, Xm. β0 is the constant term. β1, β2, …, βm are partial regression coefficients: βj is the mean change in Y when Xj increases or decreases by one unit while the other independent variables are held constant. The residual e is the random error, the part of Y that remains after removing the influence of the m independent variables.

Table 15-1 Data layout for multiple regression

Case No.   X1    X2    …   Xm    Y
1          X11   X12   …   X1m   Y1
2          X21   X22   …   X2m   Y2
⋮          ⋮     ⋮     …   ⋮     ⋮
n          Xn1   Xn2   …   Xnm   Yn

Assumptions

(1) There is a linear relationship between Y and X1, X2, …, Xm.

(2) The observed values Yi (i = 1, 2, …, n) of the cases are mutually independent.

(3) The residual e is independent and normally distributed with mean 0 and variance σ². Equivalently, for any values of the independent variables X1, X2, …, Xm, the dependent variable Y is normally distributed with the same variance.

General process

(1) Estimate the partial regression coefficients $b_0, b_1, b_2, \ldots, b_m$ and construct the regression equation:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_m X_m$$

(2) Test and evaluate the regression equation and the effect of each independent variable.

2. Construction of the multiple linear regression equation

Case 15-1 The measured values of serum total cholesterol, triglyceride, insulin, glycosylated hemoglobin, and fasting blood glucose of 27 diabetic patients are listed in Table 15-2. Construct the multiple linear regression equation of blood sugar on the other indices.

Table 15-2 Blood sugar and related measurements of 27 diabetic patients

X1: total cholesterol (mmol/L); X2: triglyceride (mmol/L); X3: insulin (μU/ml); X4: glycosylated hemoglobin (%); Y: blood sugar (mmol/L)

No.   X1      X2      X3      X4     Y
1     5.68    1.90    4.53    8.2    11.2
2     3.79    1.64    7.32    6.9    8.8
3     6.02    3.56    6.95    10.8   12.3
4     4.85    1.07    5.88    8.3    11.6
5     4.60    2.32    4.05    7.5    13.4
6     6.05    0.64    1.42    13.6   18.3
7     4.90    8.50    12.60   8.5    11.1
8     7.08    3.00    6.75    11.5   12.1
9     3.85    2.11    16.28   7.9    9.6
10    4.65    0.63    6.59    7.1    8.4
11    4.59    1.97    3.61    8.7    9.3
12    4.29    1.97    6.61    7.8    10.6
13    7.97    1.93    7.57    9.9    8.4
14    6.19    1.18    1.42    6.9    9.6
15    6.13    2.06    10.35   10.5   10.9
16    5.71    1.78    8.53    8.0    10.1
17    6.40    2.40    4.53    10.3   14.8
18    6.06    3.67    12.79   7.1    9.1
19    5.09    1.03    2.53    8.9    10.8
20    6.13    1.71    5.28    9.9    10.2
21    5.78    3.36    2.96    8.0    13.6
22    5.43    1.13    4.31    11.3   14.9
23    6.50    6.21    3.47    12.3   16.0
24    7.98    7.92    3.37    9.8    13.2
25    11.54   10.89   1.20    10.5   20.0
26    5.84    0.92    8.61    6.4    13.3
27    3.84    1.20    6.45    9.6    10.4

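Principle: least squares. The coefficients are chosen to minimize the residual sum of squares

$$Q = \sum (Y - \hat{Y})^2 = \sum \left[ Y - (b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_m X_m) \right]^2$$

Setting the partial derivatives of Q to zero gives the normal equations

$$\begin{cases}
l_{11} b_1 + l_{12} b_2 + \cdots + l_{1m} b_m = l_{1Y} \\
l_{21} b_1 + l_{22} b_2 + \cdots + l_{2m} b_m = l_{2Y} \\
\qquad \vdots \\
l_{m1} b_1 + l_{m2} b_2 + \cdots + l_{mm} b_m = l_{mY}
\end{cases}$$

$$b_0 = \bar{Y} - (b_1 \bar{X}_1 + b_2 \bar{X}_2 + \cdots + b_m \bar{X}_m)$$

where

$$l_{ij} = \sum (X_i - \bar{X}_i)(X_j - \bar{X}_j) = \sum X_i X_j - \frac{(\sum X_i)(\sum X_j)}{n}, \quad i, j = 1, 2, \ldots, m$$

$$l_{jY} = \sum (X_j - \bar{X}_j)(Y - \bar{Y}) = \sum X_j Y - \frac{(\sum X_j)(\sum Y)}{n}, \quad j = 1, 2, \ldots, m$$

For Case 15-1 the fitted equation is

$$\hat{Y} = 5.9433 + 0.1424 X_1 + 0.3515 X_2 - 0.2706 X_3 + 0.6382 X_4$$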
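The least-squares step can be sketched in plain Python by building the sums of squares and cross-products from Table 15-2 and solving the normal equations. This is an illustrative sketch, not the method used to produce the slides: the helper names `solve` and `fit` are made up for this example, and the small Gauss-Jordan solver stands in for whatever statistical package was actually used. If the table was transcribed faithfully, the printed coefficients should be close to the fitted equation reported for Case 15-1.

```python
# Table 15-2 rows as (X1, X2, X3, X4, Y).
ROWS = [
    (5.68, 1.90, 4.53, 8.2, 11.2), (3.79, 1.64, 7.32, 6.9, 8.8),
    (6.02, 3.56, 6.95, 10.8, 12.3), (4.85, 1.07, 5.88, 8.3, 11.6),
    (4.60, 2.32, 4.05, 7.5, 13.4), (6.05, 0.64, 1.42, 13.6, 18.3),
    (4.90, 8.50, 12.60, 8.5, 11.1), (7.08, 3.00, 6.75, 11.5, 12.1),
    (3.85, 2.11, 16.28, 7.9, 9.6), (4.65, 0.63, 6.59, 7.1, 8.4),
    (4.59, 1.97, 3.61, 8.7, 9.3), (4.29, 1.97, 6.61, 7.8, 10.6),
    (7.97, 1.93, 7.57, 9.9, 8.4), (6.19, 1.18, 1.42, 6.9, 9.6),
    (6.13, 2.06, 10.35, 10.5, 10.9), (5.71, 1.78, 8.53, 8.0, 10.1),
    (6.40, 2.40, 4.53, 10.3, 14.8), (6.06, 3.67, 12.79, 7.1, 9.1),
    (5.09, 1.03, 2.53, 8.9, 10.8), (6.13, 1.71, 5.28, 9.9, 10.2),
    (5.78, 3.36, 2.96, 8.0, 13.6), (5.43, 1.13, 4.31, 11.3, 14.9),
    (6.50, 6.21, 3.47, 12.3, 16.0), (7.98, 7.92, 3.37, 9.8, 13.2),
    (11.54, 10.89, 1.20, 10.5, 20.0), (5.84, 0.92, 8.61, 6.4, 13.3),
    (3.84, 1.20, 6.45, 9.6, 10.4),
]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(xs, ys):
    """Return [b0, b1, ..., bm] by solving the normal equations sum_j l_ij b_j = l_iY."""
    n, k = len(ys), len(xs[0])
    xbar = [sum(r[j] for r in xs) / n for j in range(k)]
    ybar = sum(ys) / n
    l = [[sum((r[i] - xbar[i]) * (r[j] - xbar[j]) for r in xs)
          for j in range(k)] for i in range(k)]
    l_y = [sum((r[j] - xbar[j]) * (y - ybar) for r, y in zip(xs, ys))
           for j in range(k)]
    b = solve(l, l_y)
    b0 = ybar - sum(bj * xj for bj, xj in zip(b, xbar))
    return [b0] + b


xs = [row[:4] for row in ROWS]
ys = [row[4] for row in ROWS]
coef = fit(xs, ys)
fitted = [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in xs]
ss_total = sum((y - sum(ys) / len(ys)) ** 2 for y in ys)
ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
print("coefficients:", [round(c, 4) for c in coef])
print("SS_total:", round(ss_total, 4), "SS_res:", round(ss_res, 4))
```

Solving for b1, …, bm on centered data and recovering b0 from the means mirrors the two-part form of the normal equations, which keeps the system at m by m rather than (m + 1) by (m + 1).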

3. Hypothesis test and evaluation

(1) Regression equation

3.1.1 Analysis of variance:

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_m = 0$$

$$H_1: \beta_j \ (j = 1, 2, \ldots, m) \text{ are not all zero}$$

$$\alpha = 0.05$$

$$SS_{total} = SS_{reg} + SS_{res}$$

$$F = \frac{SS_{reg}/m}{SS_{res}/(n-m-1)} = \frac{MS_{reg}}{MS_{res}}, \qquad F \sim F(m,\ n-m-1)$$

Table 15-3 Framework of the analysis of variance for multiple linear regression

Source of variation   df         SS          MS                  F                P
Total                 n-1        SS_total
Regression            m          SS_reg      SS_reg/m            MS_reg/MS_res
Residual              n-m-1      SS_res      SS_res/(n-m-1)

Table 15-4 Analysis of variance for Case 15-1

Source of variation   df    SS          MS        F      P
Total                 26    222.5519
Regression            4     133.7107    33.4277   8.28   <0.01
Residual              22    88.8412     4.0382

From the table of F critical values, $F_{0.01(4,22)} = 4.31$. Since F > 4.31, P < 0.01. At the α = 0.05 level we reject H0, accept H1, and conclude that the regression equation is statistically significant.
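As a quick arithmetic check, the F statistic for Case 15-1 can be recomputed from the sums of squares in the analysis of variance table. The function name `overall_f` is made up for this sketch, and the critical value 4.31 is the one quoted in the slides rather than something computed here.

```python
def overall_f(ss_reg, ss_res, n, m):
    """Return (MS_reg, MS_res, F) for a regression with m predictors and n cases."""
    ms_reg = ss_reg / m
    ms_res = ss_res / (n - m - 1)
    return ms_reg, ms_res, ms_reg / ms_res


ms_reg, ms_res, f_value = overall_f(ss_reg=133.7107, ss_res=88.8412, n=27, m=4)
F_CRIT_001 = 4.31  # F critical value at 0.01 with (4, 22) df, as quoted in the slides

print("MS_reg =", round(ms_reg, 4), "MS_res =", round(ms_res, 4), "F =", round(f_value, 2))
print("reject H0" if f_value > F_CRIT_001 else "do not reject H0")
```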

A. Coefficient of determination $R^2$:

$$R^2 = \frac{SS_{reg}}{SS_{total}} = 1 - \frac{SS_{res}}{SS_{total}}, \qquad 0 \le R^2 \le 1$$

$R^2$ is the proportion of variation in the dependent variable that is predictable from the best linear combination of the independent variables. The closer $R^2$ is to 1, the better the model fits the data. In this case:

$$R^2 = \frac{133.7107}{222.5519} = 0.6008$$

In this example, 60% of the variation in blood sugar is predictable from insulin, glycosylated hemoglobin, serum total cholesterol, and triglyceride.

B. Multiple correlation coefficient

It measures the strength of the relationship between the dependent variable Y and the set of independent variables, that is, the correlation between the observed Y and the estimated $\hat{Y}$.

Formula: $R = \sqrt{R^2}$. In this example $R = \sqrt{0.6008} = 0.7751$. If m = 1, then $R = |r|$, where r is the simple correlation coefficient.
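The two quantities above follow directly from the sums of squares of Case 15-1. A minimal sketch, using only the standard library and the values quoted in the slides:

```python
import math

SS_REG = 133.7107
SS_RES = 88.8412
SS_TOTAL = 222.5519

r2 = SS_REG / SS_TOTAL            # coefficient of determination
r2_alt = 1 - SS_RES / SS_TOTAL    # equivalent form, 1 - SS_res / SS_total
r = math.sqrt(r2)                 # multiple correlation coefficient

print(round(r2, 4), round(r2_alt, 4), round(r, 4))
```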

(2) Each independent variable

The analysis of variance and the coefficient of determination evaluate the equation as a whole. The effect of each independent variable on Y must also be shown clearly.

A. Partial regression sum of squares

Significance: the partial regression sum of squares of an independent variable Xj measures the contribution of Xj to the dependent variable Y when the other m - 1 independent variables are already in the equation. Equivalently, it is the decrease in the regression sum of squares when Xj is removed from the equation, or the increase in the regression sum of squares when Xj is added to an equation that contains the other m - 1 independent variables.

$$F_j = \frac{SS_{reg}(X_j)/1}{SS_{res}/(n-m-1)}, \qquad \nu_1 = 1,\ \nu_2 = n-m-1$$

$SS_{reg}(X_j)$ is the partial regression sum of squares of Xj. The larger it is, the more important the corresponding independent variable.

In general, the regression sum of squares for the other m - 1 independent variables must be obtained by fitting a new equation, not by simply deleting the Xj term from the equation with m independent variables.

The partial regression sum of squares of each independent variable can be calculated by fitting regression equations with different sets of independent variables. Table 15-5 gives part of the results for Case 15-1.

Table 15-5 Partial results of the regression analysis for Case 15-1

Independent variables in the equation   SS_reg      SS_res
① X1, X2, X3, X4                        133.7107    88.8412
② X2, X3, X4                            133.0978    89.4540
③ X1, X3, X4                            121.7480    100.8038
④ X1, X2, X4                            113.6472    108.9047
⑤ X1, X2, X3                            105.9168    116.6351

Results:

$$SS_{reg}(X_1) = SS_{reg}(X_1, X_2, X_3, X_4) - SS_{reg}(X_2, X_3, X_4) = 133.7107 - 133.0978 = 0.6129$$

$$SS_{reg}(X_2) = SS_{reg}(X_1, X_2, X_3, X_4) - SS_{reg}(X_1, X_3, X_4) = 133.7107 - 121.7480 = 11.9627$$

$$SS_{reg}(X_3) = SS_{reg}(X_1, X_2, X_3, X_4) - SS_{reg}(X_1, X_2, X_4) = 133.7107 - 113.6472 = 20.0635$$

$$SS_{reg}(X_4) = SS_{reg}(X_1, X_2, X_3, X_4) - SS_{reg}(X_1, X_2, X_3) = 133.7107 - 105.9168 = 27.7939$$

$$F_1 = \frac{0.6129/1}{88.8412/(27-4-1)} = 0.152, \qquad F_2 = \frac{11.9627/1}{88.8412/(27-4-1)} = 2.962$$

$$F_3 = \frac{20.0635/1}{88.8412/(27-4-1)} = 4.968, \qquad F_4 = \frac{27.7939/1}{88.8412/(27-4-1)} = 6.883$$
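The partial sums of squares and their F values can be reproduced from Table 15-5 with a few lines of arithmetic. This sketch only re-derives the numbers in the slides; the dictionary name `SS_REG_WITHOUT` is invented for the example and holds the regression sum of squares of each equation that omits one variable.

```python
SS_REG_FULL = 133.7107   # equation with X1, X2, X3, X4
SS_RES_FULL = 88.8412
N, M = 27, 4

# SS_reg of the refitted equation that leaves out the named variable (Table 15-5).
SS_REG_WITHOUT = {"X1": 133.0978, "X2": 121.7480, "X3": 113.6472, "X4": 105.9168}

ms_res = SS_RES_FULL / (N - M - 1)
partial = {}
for name, ss_reduced in SS_REG_WITHOUT.items():
    ss_partial = SS_REG_FULL - ss_reduced          # contribution of this variable
    partial[name] = (ss_partial, ss_partial / ms_res)

for name, (ss_partial, f_value) in partial.items():
    print(name, round(ss_partial, 4), round(f_value, 3))
```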

B. t test

This method is equivalent to the test based on the partial regression sum of squares. The formula is

$$t_j = \frac{b_j}{S_{b_j}}$$

where $b_j$ is the estimate of the partial regression coefficient and $S_{b_j}$ is the standard error of $b_j$.

Hypothesis test: H0: $\beta_j = 0$. Under H0, $t_j$ follows a t distribution with n - m - 1 degrees of freedom. If $|t_j| > t_{\alpha/2,\ n-m-1}$, then at the α (0.05) level we reject H0, accept H1, and conclude that there is a linear relationship between Xj and Y.

Results:

$$t_1 = \frac{0.1424}{0.3656} = 0.390, \qquad t_2 = \frac{0.3515}{0.2042} = 1.721$$

$$t_3 = \frac{-0.2706}{0.1214} = -2.229, \qquad t_4 = \frac{0.6382}{0.2433} = 2.623$$

$t_{0.05/2,\ 22} = 2.074$ and $|t_4| > |t_3| > 2.074$, so P < 0.05 for both. That is, $b_3$ and $b_4$ are statistically significant, but $b_1$ and $b_2$ are not.
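The t statistics can be checked the same way, and squaring them shows the equivalence with the partial F tests (for a single coefficient, t squared equals F up to rounding). The coefficients and standard errors are the ones quoted in the slides; the negative sign on the insulin coefficient is taken from the fitted equation for Case 15-1.

```python
B = {"X1": 0.1424, "X2": 0.3515, "X3": -0.2706, "X4": 0.6382}   # partial regression coefficients
SE = {"X1": 0.3656, "X2": 0.2042, "X3": 0.1214, "X4": 0.2433}   # their standard errors
T_CRIT = 2.074  # two-sided critical value at 0.05 with 22 df, as quoted in the slides

t_values = {name: B[name] / SE[name] for name in B}
significant = [name for name in t_values if abs(t_values[name]) > T_CRIT]

for name, t in t_values.items():
    print(name, "t =", round(t, 3), "t^2 =", round(t * t, 3))
print("significant at 0.05:", significant)
```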

C. Standardized regression coefficient

A standardized variable is obtained by subtracting the mean of the variable from the original data and then dividing by the standard deviation of the variable:

$$X_j' = \frac{X_j - \bar{X}_j}{S_j}$$

The regression equation fitted to the standardized variables is called the standardized regression equation, and its coefficients are called standardized regression coefficients:

$$b_j' = b_j \sqrt{\frac{l_{jj}}{l_{YY}}} = b_j \frac{S_j}{S_Y}$$

A standardized regression coefficient has no unit, so it can be used to compare the strength of the effect of each independent variable Xj on Y. Generally, among statistically significant variables, the larger the absolute value of the standardized regression coefficient, the greater the effect of the corresponding independent variable on Y.

Note:

• An ordinary regression coefficient $b_j$ has units and is used to interpret the effect of each independent variable on the dependent variable: it is the average change in Y when Xj increases or decreases by one unit while the other independent variables are held constant. The $b_j$ cannot be used to compare the effects of different Xj on Y.

• A standardized regression coefficient $b_j'$ has no unit and is used to compare the effects of the independent variables on the dependent variable: the larger $|b_j'|$ is, the larger the effect of Xj on Y.

Results:

$$S_1 = 1.5934, \quad S_2 = 2.5748, \quad S_3 = 3.6706, \quad S_4 = 1.8234, \quad S_Y = 2.9257$$

$$b_1' = 0.1424 \times \frac{1.5934}{2.9257} = 0.0776$$

$$b_2' = 0.3515 \times \frac{2.5748}{2.9257} = 0.3093$$

$$b_3' = -0.2706 \times \frac{3.6706}{2.9257} = -0.3395$$

$$b_4' = 0.6382 \times \frac{1.8234}{2.9257} = 0.3977$$

As the results show, the factors affecting blood sugar rank from largest to smallest effect as follows: glycosylated hemoglobin (X4), insulin (X3), triglyceride (X2), serum total cholesterol (X1).
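The standardization formula b'j = bj × Sj / SY and the resulting ranking can be verified with the standard deviations given above. This is a plain arithmetic sketch; the coefficient signs follow the fitted equation for Case 15-1.

```python
B = {"X1": 0.1424, "X2": 0.3515, "X3": -0.2706, "X4": 0.6382}   # partial regression coefficients
S = {"X1": 1.5934, "X2": 2.5748, "X3": 3.6706, "X4": 1.8234}    # standard deviations of X1..X4
S_Y = 2.9257                                                    # standard deviation of Y

b_std = {name: B[name] * S[name] / S_Y for name in B}

# Rank variables by the absolute size of the standardized coefficient.
ranking = sorted(b_std, key=lambda name: abs(b_std[name]), reverse=True)

for name in ranking:
    print(name, round(b_std[name], 4))
```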

§ 2 Choice of independent variables

Purpose: to make the equation as good as possible for prediction and/or explanation.

1. All-subsets selection

Goal: better prediction.

Significance: compare the regression equations constructed from different combinations of independent variables and select the best one.

Methods:

1. Selection by the adjusted coefficient of determination $R_c^2$

2. Selection by $C_p$

1.1.1. Selection by the adjusted coefficient of determination $R_c^2$. Formula:

$$R_c^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1} = 1 - \frac{MS_{res}}{MS_{total}}$$

Here n is the sample size, $R^2$ is the coefficient of determination of a regression equation that contains p (p ≤ m) independent variables, and $MS_{total} = SS_{total}/(n-1)$. $R_c^2$ behaves as follows: for equal $R^2$, the more independent variables there are, the smaller $R_c^2$ is. The "best" regression equation is the one with the largest $R_c^2$.
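To see how the adjustment penalizes extra variables, the sketch below applies the formula to two equations from Table 15-5: the full equation (X1 to X4) and the equation without X1. The function name `adjusted_r2` is chosen for this example only.

```python
def adjusted_r2(r2, n, p):
    """Adjusted coefficient of determination for p predictors and n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


SS_TOTAL = 222.5519
N = 27

r2_full = 133.7107 / SS_TOTAL   # X1, X2, X3, X4
r2_sub = 133.0978 / SS_TOTAL    # X2, X3, X4

adj_full = adjusted_r2(r2_full, N, p=4)
adj_sub = adjusted_r2(r2_sub, N, p=3)

print("full:", round(r2_full, 4), round(adj_full, 3))
print("without X1:", round(r2_sub, 4), round(adj_sub, 3))
```

Dropping X1 lowers the ordinary R² slightly but raises the adjusted value, which is why the three-variable equation ranks first under this criterion.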

1.1.2. Selection by $C_p$

$$C_p = \frac{(SS_{res})_p}{(MS_{res})_m} - [n - 2(p + 1)]$$

$(SS_{res})_p$ is the residual sum of squares of the regression on p (p ≤ m) independent variables, and $(MS_{res})_m$ is the residual mean square of the regression model with all m independent variables.

When the equation with p independent variables is theoretically the best, the expected value of $C_p$ is p + 1, so the regression equation whose $C_p$ is closest to p + 1 should be chosen as the optimal equation.

$C_p$ should not be used to select independent variables if none of the candidate variables has a major effect on Y.
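The C_p formula can be applied to the residual sums of squares listed in Table 15-5. This sketch only covers the five equations in that table; the function name `cp_statistic` is made up for the example.

```python
def cp_statistic(ss_res_p, ms_res_m, n, p):
    """C_p for an equation with p predictors, given MS_res of the full m-predictor model."""
    return ss_res_p / ms_res_m - (n - 2 * (p + 1))


N, M = 27, 4
MS_RES_FULL = 88.8412 / (N - M - 1)

# Residual sums of squares from Table 15-5, keyed by the variables in the equation.
SS_RES = {
    ("X1", "X2", "X3", "X4"): 88.8412,
    ("X2", "X3", "X4"): 89.4540,
    ("X1", "X3", "X4"): 100.8038,
    ("X1", "X2", "X4"): 108.9047,
    ("X1", "X2", "X3"): 116.6351,
}

cp = {names: cp_statistic(ss, MS_RES_FULL, N, len(names)) for names, ss in SS_RES.items()}

for names, value in cp.items():
    print(", ".join(names), round(value, 2))
```

The full equation always gives C_p = m + 1 by construction, so it carries no information on its own; the comparison is among the smaller equations.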

Case 15-2 Use all-subsets selection to choose the independent variables in Case 15-1.

Independent variables   R_c²     C_p
X2, X3, X4              0.546    3.15
X1, X2, X3, X4          0.528    5.00
X1, X3, X4              0.488    5.96
X1, X2, X4              0.447    7.97
X1, X4                  0.441    7.42
X2, X4                  0.440    7.51
X3, X4                  0.435    7.72
X1, X2, X3              0.408    9.88
X2, X3                  0.408    9.14
X1, X3                  0.375    10.78
X4                      0.347    11.63
X1                      0.284    14.92
X1, X2                  0.275    15.89
X3                      0.231    17.77
X2                      0.179    20.53

Since m = 4, the number of regression equations to fit is $2^m - 1 = 2^4 - 1 = 15$. The best combination is X2, X3, X4; that is, the optimal regression equation relates blood sugar to triglyceride, insulin, and glycosylated hemoglobin.

2. Stepwise selection

2.1. Forward selection: enter the independent variables into the regression equation one at a time. This method does not consider the equation as a whole.

2.2. Backward elimination: place all the independent variables in the equation, then progressively eliminate those that are not statistically significant. At each step, select the variable with the smallest partial regression sum of squares and use an F test to decide whether it should be eliminated. If it is not statistically significant, eliminate it and refit the regression equation with the remaining variables. Repeat this process until no independent variable in the equation can be eliminated. Theoretically this is the best approach, and it is strongly recommended.

2.3. Stepwise regression: this method builds on the two approaches above and screens variables in both directions. Essentially, it is a form of forward selection.
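The backward elimination loop described in 2.2 can be sketched as follows. Everything here is illustrative: the helper names, the fixed removal threshold `F_REMOVE = 4.0` (in practice the critical value would come from an F table at the chosen test level), and the toy data, which are made up so that the first predictor drives Y and the second is unrelated to it.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ss_res(xs, ys, cols):
    """Residual sum of squares of the least-squares fit of y on the chosen columns."""
    n = len(ys)
    ybar = sum(ys) / n
    ss_total = sum((y - ybar) ** 2 for y in ys)
    if not cols:
        return ss_total
    xbar = {j: sum(r[j] for r in xs) / n for j in cols}
    l = [[sum((r[i] - xbar[i]) * (r[j] - xbar[j]) for r in xs) for j in cols] for i in cols]
    l_y = [sum((r[j] - xbar[j]) * (y - ybar) for r, y in zip(xs, ys)) for j in cols]
    b = solve(l, l_y)
    return ss_total - sum(bj * lj for bj, lj in zip(b, l_y))  # SS_total - SS_reg


def backward_eliminate(xs, ys, f_remove):
    """Drop the weakest variable while its partial F is below f_remove."""
    cols = list(range(len(xs[0])))
    n = len(ys)
    while cols:
        full = ss_res(xs, ys, cols)
        ms_res = full / (n - len(cols) - 1)
        # Partial SS of a variable = increase in SS_res when the model is refitted without it.
        partial = {j: ss_res(xs, ys, [c for c in cols if c != j]) - full for j in cols}
        weakest = min(partial, key=partial.get)
        if partial[weakest] / ms_res >= f_remove:
            break
        cols.remove(weakest)
    return cols


# Made-up toy data: y depends on the first column only.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, 1, -1, -1, -1, -1, 1, 1]
noise = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
xs = list(zip(x1, x2))
ys = [1 + 2 * a + e for a, e in zip(x1, noise)]

F_REMOVE = 4.0  # hypothetical threshold, not taken from the slides
kept = backward_eliminate(xs, ys, F_REMOVE)
print("columns kept:", kept)
```

Note that each candidate removal triggers a refit, matching the earlier remark that the reduced equation must be re-estimated rather than obtained by deleting a term.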

Setting the test level: use 0.10 or 0.15 for small samples and 0.05 for large samples.

A lower level means a stricter criterion for selecting variables, so fewer variables are selected. A higher level means a looser criterion, so more variables are selected.

Note: the test level for entering a variable must be lower than or equal to the test level for removing a variable.

Case 15-3 Use stepwise regression to analyze the data in Case 15-1 ($\alpha_{entry} = 0.10$, $\alpha_{removal} = 0.15$).

Table 15-7 The process of stepwise regression

Step l   Variable entered   Variable removed   Number of variables p   R²      SS_reg(Xj) at step l   SS_res at step l   F-value   P-value
1        X4                 -                  1                       0.372   82.714                 139.837            14.788    0.0007
2        X1                 -                  2                       0.484   25.076                 114.762            5.244     0.0311
3        X3                 -                  3                       0.547   13.958                 100.804            3.185     0.0875
4        X2                 -                  4                       0.601   11.963                 88.841             2.962     0.0993
5        -                  X1                 3                       0.598   0.613                  88.841             0.152     0.7006

Table 15-8 Analysis of variance for Case 15-3

Source of variation   df    SS          MS       F       P
Total                 26    222.5519
Regression            3     133.098     44.366   11.41   0.0001
Residual              23    89.454      3.889

The "best" regression equation:

$$\hat{Y} = 6.4996 + 0.4023 X_2 - 0.2871 X_3 + 0.6632 X_4$$

Result: there is a linear relationship between blood sugar and triglyceride, insulin, and glycosylated hemoglobin. Insulin is negatively related to blood sugar. From the standardized regression coefficients, we can conclude that glycosylated hemoglobin has the largest effect on fasting blood glucose.

Table 15-9 Estimates and tests of the regression coefficients in Case 15-3

Variable   Regression coefficient b   Standard error S_b   Standardized regression coefficient b'   t-value   P-value
Constant   6.4996                     2.3962               0                                        2.713     0.0124
X2         0.4023                     0.1540               0.3541                                   2.612     0.0156
X3         -0.2870                    0.1117               -0.3601                                  -2.570    0.0171
X4         0.6632                     0.2303               0.4133                                   2.880     0.0084

§ 3 Application of multiple linear regression and points to note

1. Application

1.1. Analysis of related factors

• For example, many factors can affect hypertension, such as age, diet, habits, smoking, stress, and family history. It is necessary to find out which of these factors are related to hypertension and which matter more.

• In clinical practice, it is difficult to make all groups agree on every characteristic, because conditions are complicated.

• For example, regression can help compare two therapies when the groups differ in age, severity of illness, and so on.

• A simple way to control confounding factors is to enter them into the regression equation and analyze them together with the main variables of interest.

1.2. Estimation and prediction

• For example, estimating the surface area of children's hearts from the transverse cardiac diameter (TCD); predicting infants' weight from gestational age, head diameter, chest diameter, and abdominal girth (AG).

1.3. Statistical control (inverse estimation)

• For example, when a radiofrequency therapy apparatus is used to treat brain tumors, the diameter of the damaged area of the cerebral cortex has a linear regression relationship with the radiofrequency temperature and the exposure time. Once the regression equation is established, it can be used to determine the best settings of radiofrequency temperature and exposure time for a damage diameter specified in advance.

2. Points to note when using multiple regression

2.1. Quantification of indices

• (1) Quantification: transform non-linear relationships into linear ones.

• (2) Convert qualitative indices into quantitative ones using (0, 1) variables, also called dummy variables or indicator variables.

For a two-category classification, use one (0, 1) variable, such as sex:

X = 0 for male, X = 1 for female

For a classification with k categories, use k - 1 (0, 1) variables, such as blood type:

Blood type   X1   X2   X3
O            0    0    0
A            1    0    0
B            0    1    0
AB           0    0    1

Data layout:

No.   X1   X2   X3   Y
1     1    0    0
2     0    0    0
3     0    1    0
n     0    0    1

Fitted regression equation:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$$

b1: the difference between type A and type O

b2: the difference between type B and type O

b3: the difference between type AB and type O
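The k - 1 coding scheme for blood type can be written as a small helper. The function name `dummy_code` and its arguments are invented for this sketch; the reference category (type O here, as in the table above) is the one coded as all zeros.

```python
def dummy_code(value, categories, reference):
    """Return k-1 (0, 1) indicators for `value`; the reference category is all zeros."""
    levels = [c for c in categories if c != reference]
    return tuple(1 if value == level else 0 for level in levels)


BLOOD_TYPES = ["O", "A", "B", "AB"]
coded = {bt: dummy_code(bt, BLOOD_TYPES, reference="O") for bt in BLOOD_TYPES}

for blood_type, indicators in coded.items():
    print(blood_type, indicators)
```

Using k - 1 rather than k indicators avoids an exact linear dependence with the constant term, and it is what makes each coefficient a comparison against the reference category.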

(3) Ranked variables

Ranked categories are usually coded in order as X = 1, 2, 3, … (or X = 0, 1, 2, …). For example, education level can be classified into 4 grades: primary school, middle school (junior or senior), undergraduate, and graduate or above. Y stands for income.

X1 = 1: primary school
X1 = 2: middle school
X1 = 3: undergraduate
X1 = 4: graduate or above

$$\hat{Y} = b_0 + b_1 X_1$$

Explanation: b1 means that when X1 increases by one grade, Y increases by b1 units (for example 500). That is, people with a middle school education earn 500 more than those with a primary school education, undergraduates earn 500 more than those with a middle school education, and so on.

The k grades can also be converted into k - 1 (0, 1) variables:

Education level     X1   X2   X3
Primary school      0    0    0
Middle school       1    0    0
College             0    1    0
Graduate or PhD     0    0    1

b1, b2, b3 represent the income differences of middle school, college, and graduate or PhD compared with primary school.

2.2. Sample size: n should be 5 to 10 times the number of independent variables, n = (5 ~ 10)m.

2.3. Stepwise regression: do not trust the result of stepwise regression blindly. The so-called "best" regression equation is not necessarily the best, and a variable excluded from the equation is not necessarily without statistical significance. For example, in Case 15-3, if the entry level is changed to $\alpha_{entry} = 0.05$ and the removal level to $\alpha_{removal} = 0.10$, the variables finally selected are X1 and X4, rather than X2, X3, X4.

Which regression equation to use should be decided on the basis of professional knowledge.

2.4. Multicollinearity: strong linear relationships may exist among the independent variables.

For example, in a study of hypertension, age, years of smoking, years of drinking, and similar variables are highly correlated with one another, which makes an equation fitted by least squares unreliable. It can lead to the following problems:

• (1) The standard errors of the estimated coefficients become large, so the t values become small.

• (2) The regression equation becomes unstable: the estimates can change markedly when observations are added or removed.

• (3) The t test becomes inaccurate, so important variables that should be in the model are discarded.

• (4) The signs of the estimated coefficients contradict objective reality.

Elimination of multicollinearity: discard the independent variables that cause collinearity; rebuild the regression equation; use stepwise regression.

2.5. Interaction between variables

To test whether there is an interaction between two independent variables, we usually add their product to the equation.

In analyzing the data in Table 15-2, three variables were selected: triglyceride (X2), insulin (X3), and glycosylated hemoglobin (X4). Now add the product X3·X4 to the equation: define the new variable Z = X3·X4 and re-estimate the equation $\hat{Y} = b_0 + b_2 X_2 + b_3 X_3 + b_4 X_4 + b_z Z$. If the hypothesis test rejects H0: βz = 0, we conclude that there is an interaction between insulin and glycosylated hemoglobin in addition to the main effects of X3 and X4. In this case, Z is statistically significant (P < 0.01):

$$\hat{Y} = 0.7898 + 0.3690 X_2 + 1.2267 X_3 + 1.5097 X_4 - 0.1785 Z$$

That means the effect of insulin in diabetic patients depends on the concentration of glycosylated hemoglobin.
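One way to read the interaction term is to compute the slope of predicted blood sugar with respect to insulin at a fixed level of glycosylated hemoglobin, which under this equation is b3 + bz·X4. The sketch below uses the coefficients quoted above; the function names and the example X4 values are chosen only for illustration.

```python
B0, B2, B3, B4, BZ = 0.7898, 0.3690, 1.2267, 1.5097, -0.1785


def predict(x2, x3, x4):
    """Predicted blood sugar from the equation with the product term Z = X3 * X4."""
    return B0 + B2 * x2 + B3 * x3 + B4 * x4 + BZ * x3 * x4


def insulin_slope(x4):
    """Change in predicted Y per unit increase in insulin (X3) at a fixed X4."""
    return B3 + BZ * x4


for x4 in (6.0, 8.0, 10.0, 12.0):
    print("X4 =", x4, "slope of X3 =", round(insulin_slope(x4), 4))
```

Because bz is negative, the slope for insulin decreases as glycosylated hemoglobin rises, which is the dependence the text describes.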

2.6. Residual analysis

The residual is

$$e_i = Y_i - \hat{Y}_i$$

Under regular circumstances, the residuals $e_i$ are normally distributed with mean zero and variance σ². The residual plot takes the standardized residuals

$$e_i' = \frac{e_i}{\sqrt{MS_{res}}}$$

as the vertical axis and $\hat{Y}_i$ as the horizontal axis.
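A standardized residual can be computed for any patient from the fitted equation and the residual mean square of Case 15-1. The sketch below does this for patient 1 of Table 15-2, using the rounded coefficients from the slides, so the result is approximate; the function name is made up for the example.

```python
import math

COEF = (5.9433, 0.1424, 0.3515, -0.2706, 0.6382)  # b0, b1, b2, b3, b4 for Case 15-1
MS_RES = 4.0382                                    # residual mean square from the ANOVA table


def standardized_residual(x, y):
    """Return (Y - Y_hat) / sqrt(MS_res) for one case with predictors x = (X1, X2, X3, X4)."""
    y_hat = COEF[0] + sum(b * xj for b, xj in zip(COEF[1:], x))
    return (y - y_hat) / math.sqrt(MS_RES)


patient_1 = standardized_residual((5.68, 1.90, 4.53, 8.2), 11.2)
print(round(patient_1, 3))
```

Plotting these values against the fitted values for all 27 patients would give the residual plot described above.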
