Linear Lack of Fit (LOF) Test An F test for checking whether a linear regression function is inadequate in describing the trend in the data
Jan 15, 2016
Where does this topic fit in?
• Model formulation
• Model estimation
• Model evaluation
• Model use
Example 1
Do the data suggest that a linear function is inadequate in describing the relationship between skin cancer mortality and latitude?
[Figure: Regression plot of Mortality versus Latitude. Fitted line: Mortality = 389.189 - 5.97764 Latitude; S = 19.1150, R-Sq = 68.0%, R-Sq(adj) = 67.3%]
Example 2
Do the data suggest that a linear function is inadequate in describing the relationship between the length and weight of an alligator?
[Figure: Regression plot of Weight versus Length. Fitted line: Weight = -393.264 + 5.90235 Length; S = 54.0115, R-Sq = 83.6%, R-Sq(adj) = 82.9%]
Example 3
Do the data suggest that a linear function is inadequate in describing the relationship between iron content and weight loss due to corrosion?
[Figure: Regression plot of Weight loss versus Iron content. Fitted line: wgtloss = 129.787 - 24.0199 iron; S = 3.05778, R-Sq = 97.0%, R-Sq(adj) = 96.7%]
Some notation
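[Figure: Scatter plot of Number of new accounts versus Size of minimum deposit, with fitted line $\hat{y} = 50.7 + 0.49x$]
$y_{11} = 28$, $y_{12} = 42$, $\bar{y}_1 = 35$
$y_{61} = 104$, $y_{62} = 124$, $\bar{y}_6 = 114$
$\hat{y}_{11} = \hat{y}_{12} = 87.2$, $\hat{y}_{61} = \hat{y}_{62} = 148.1$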
Decomposing the error
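[Figure: Scatter plot of Number of new accounts versus Size of minimum deposit, with fitted line $\hat{y} = 50.7 + 0.49x$]
$\sum_i \sum_j (y_{ij} - \hat{y}_{ij})^2 = 14742$
$\sum_i \sum_j (\bar{y}_i - \hat{y}_{ij})^2 = 13594$
$\sum_i \sum_j (y_{ij} - \bar{y}_i)^2 = 1148$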
Decomposing the error
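[Figure: Scatter plot of Number of new accounts (vertical axis 80 to 150) versus Size of minimum deposit, with fitted line $\hat{y} = 50.7 + 0.49x$]
$\sum_i \sum_j (y_{ij} - \hat{y}_{ij})^2 = 45.1$
$\sum_i \sum_j (\bar{y}_i - \hat{y}_{ij})^2 = 6.6$
$\sum_i \sum_j (y_{ij} - \bar{y}_i)^2 = 38.5$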
The basic idea
• Break down the residual error (“error sum of squares” – SSE) into two components:
  – a component that is due to lack of model fit (“lack of fit sum of squares” – SSLF)
  – a component that is due to pure random error (“pure error sum of squares” – SSPE)
• If the lack of fit sum of squares is a large component of the residual error, it suggests that a linear function is inadequate.
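The breakdown can be sketched in a few lines of Python. The data values below are an assumption: they are the textbook bank example (size of minimum deposit, number of new accounts) that the plots appear to be drawn from, and they are not listed on the slides themselves. The function names are illustrative only.

```python
from collections import defaultdict

# ASSUMED data (not given on the slides): bank example,
# x = size of minimum deposit, y = number of new accounts.
x = [125, 100, 200, 75, 150, 175, 75, 175, 125, 200, 100]
y = [160, 112, 124, 28, 152, 156, 42, 124, 150, 104, 136]


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def decompose_error(x, y):
    """Return (SSE, SSLF, SSPE) for a simple linear regression of y on x."""
    b0, b1 = fit_line(x, y)

    # Group the responses by distinct predictor value to get the group means.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    group_mean = {xi: sum(ys) / len(ys) for xi, ys in groups.items()}

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sslf = sum((group_mean[xi] - (b0 + b1 * xi)) ** 2 for xi in x)
    sspe = sum((yi - group_mean[xi]) ** 2 for xi, yi in zip(x, y))
    return sse, sslf, sspe


sse, sslf, sspe = decompose_error(x, y)
print(f"SSE = {sse:.1f}, SSLF = {sslf:.1f}, SSPE = {sspe:.1f}")
```

With the assumed data, the three values should agree, up to rounding, with the 14742, 13594 and 1148 quoted for the first "Decomposing the error" plot. Here the lack of fit part makes up most of the residual error.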
A geometric decomposition
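[Figure: Scatter plot of Number of new accounts versus Size of minimum deposit, marking an observation $y_{ij}$, its fitted value $\hat{y}_{ij} = b_0 + b_1 x_{ij}$, and its group mean $\bar{y}_i$]
$(y_{ij} - \hat{y}_{ij}) = (\bar{y}_i - \hat{y}_{ij}) + (y_{ij} - \bar{y}_i)$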
The decomposition holds for the sum of the squared deviations, too:
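$\sum_{i=1}^{c} \sum_{j=1}^{n_i} (y_{ij} - \hat{y}_{ij})^2 = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (\bar{y}_i - \hat{y}_{ij})^2 + \sum_{i=1}^{c} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$
Error sum of squares (SSE) = Lack of fit sum of squares (SSLF) + Pure error sum of squares (SSPE)
$SSE = SSLF + SSPE$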
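The slide does not show why the squared version holds. One way to see it, as a sketch, uses the fact that all observations at the same predictor value share one fitted value (written $\hat{y}_i$ below as shorthand for $\hat{y}_{ij}$), so the cross-product term vanishes:

```latex
\begin{align*}
\sum_{i=1}^{c}\sum_{j=1}^{n_i} (y_{ij}-\hat{y}_{ij})^2
 &= \sum_{i}\sum_{j} \bigl[(\bar{y}_i-\hat{y}_i) + (y_{ij}-\bar{y}_i)\bigr]^2 \\
 &= \sum_{i}\sum_{j} (\bar{y}_i-\hat{y}_i)^2
    + \sum_{i}\sum_{j} (y_{ij}-\bar{y}_i)^2
    + 2\sum_{i} (\bar{y}_i-\hat{y}_i) \sum_{j} (y_{ij}-\bar{y}_i) \\
 &= SSLF + SSPE + 0,
    \quad \text{since } \sum_{j=1}^{n_i} (y_{ij}-\bar{y}_i) = 0 \text{ for every } i.
\end{align*}
```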
Breakdown of degrees of freedom
$(n - 2) = (c - 2) + (n - c)$
• $n - 2$: degrees of freedom associated with SSE
• $c - 2$: degrees of freedom associated with SSLF
• $n - c$: degrees of freedom associated with SSPE
Definitions of Mean Squares
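The lack of fit mean square (MSLF) is defined as:
$MSLF = \frac{\sum_i \sum_j (\bar{y}_i - \hat{y}_{ij})^2}{c - 2} = \frac{SSLF}{c - 2}$
And the pure error mean square (MSPE) is defined as:
$MSPE = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_i)^2}{n - c} = \frac{SSPE}{n - c}$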
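As a worked check, take the sums of squares quoted for the first "Decomposing the error" plot (SSLF = 13594, SSPE = 1148). The counts n = 11 and c = 6 are not stated directly; they are inferred here from the degrees of freedom in the Minitab output for that data set.

```latex
\begin{align*}
MSLF &= \frac{SSLF}{c-2} = \frac{13594}{6-2} = \frac{13594}{4} = 3398.5 \\
MSPE &= \frac{SSPE}{n-c} = \frac{1148}{11-6} = \frac{1148}{5} = 229.6
\end{align*}
```

These match the 3398 and 230 that Minitab reports for this data set, up to rounding.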
Expected Mean Squares
$E(MSLF) = \sigma^2 + \frac{\sum_i n_i (\mu_i - (\beta_0 + \beta_1 X_i))^2}{c - 2}$
$E(MSPE) = \sigma^2$
• If μi = β0+β1Xi, we’d expect the ratio MSLF/MSPE to be …
• If μi ≠ β0+β1Xi, we’d expect the ratio MSLF/MSPE to be …
• Use the ratio MSLF/MSPE to decide whether or not to reject μi = β0+β1Xi.
Expanded Analysis of Variance Table
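Source            DF     SS      MS                    F
Regression        1      SSR     MSR = SSR/1           F = MSR/MSE
Residual error    n-2    SSE     MSE = SSE/(n-2)
  Lack of fit     c-2    SSLF    MSLF = SSLF/(c-2)     F = MSLF/MSPE
  Pure error      n-c    SSPE    MSPE = SSPE/(n-c)
Total             n-1    SSTO
where
$SSR = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (\hat{y}_{ij} - \bar{y})^2$
$SSE = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (y_{ij} - \hat{y}_{ij})^2$
$SSLF = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (\bar{y}_i - \hat{y}_{ij})^2$
$SSPE = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$
$SSTO = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$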
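The table can be filled in programmatically. The sketch below is one possible layout, not Minitab's implementation. It again assumes the textbook bank data (not listed on the slides), and it assumes the data contain replicates (n > c) and more than two distinct predictor values (c > 2), so that no degrees of freedom are zero.

```python
from collections import defaultdict

# ASSUMED data (not given on the slides): bank example.
x = [125, 100, 200, 75, 150, 175, 75, 175, 125, 200, 100]
y = [160, 112, 124, 28, 152, 156, 42, 124, 150, 104, 136]


def expanded_anova(x, y):
    """Return (df, ss, ms, f) dictionaries keyed by ANOVA source name."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]

    # Group means at each distinct predictor value.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    c = len(groups)
    gmean = {xi: sum(ys) / len(ys) for xi, ys in groups.items()}

    ss = {
        "Regression": sum((f - ybar) ** 2 for f in fitted),
        "Residual Error": sum((yi - f) ** 2 for yi, f in zip(y, fitted)),
        "Lack of Fit": sum((gmean[xi] - f) ** 2 for xi, f in zip(x, fitted)),
        "Pure Error": sum((yi - gmean[xi]) ** 2 for xi, yi in zip(x, y)),
        "Total": sum((yi - ybar) ** 2 for yi in y),
    }
    df = {"Regression": 1, "Residual Error": n - 2, "Lack of Fit": c - 2,
          "Pure Error": n - c, "Total": n - 1}
    ms = {k: ss[k] / df[k] for k in ss if k != "Total"}
    f = {"Regression": ms["Regression"] / ms["Residual Error"],
         "Lack of Fit": ms["Lack of Fit"] / ms["Pure Error"]}
    return df, ss, ms, f


df, ss, ms, f = expanded_anova(x, y)
for source in ["Regression", "Residual Error", "Lack of Fit", "Pure Error", "Total"]:
    ms_text = f"{ms[source]:10.1f}" if source in ms else " " * 10
    f_text = f"{f[source]:8.2f}" if source in f else ""
    print(f"{source:<16}{df[source]:>4}{ss[source]:>10.1f}{ms_text}{f_text}")
```

The P column is left out here because it needs an F distribution tail probability.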
The formal lack of fit F-test
Null hypothesis H0: μi = β0+β1Xi
Alternative hypothesis HA: μi ≠ β0+β1Xi
Test statistic: $F^* = \frac{MSLF}{MSPE}$
P-value = What is the probability that we’d get an F* statistic as large as we did, if the null hypothesis is true?
The P-value is determined by comparing F* to an F distribution with c-2 numerator degrees of freedom and n-c denominator degrees of freedom.
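The test can be sketched with the Python standard library alone. The upper-tail F probability is computed through the regularized incomplete beta function, using a standard continued-fraction evaluation. All helper names (`_beta_cf`, `reg_inc_beta`, `f_sf`, `lack_of_fit_test`) are illustrative and not part of any library. The inputs in the usage lines are the SSLF and SSPE values quoted on the slides, with n = 11 and c = 6 inferred from the degrees of freedom in the Minitab output.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even step, then the odd step.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) distribution."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def lack_of_fit_test(sslf, sspe, n, c):
    """Return (F*, P-value) for the lack of fit F-test."""
    df_lf, df_pe = c - 2, n - c
    f_star = (sslf / df_lf) / (sspe / df_pe)
    return f_star, f_sf(f_star, df_lf, df_pe)


f_star, p = lack_of_fit_test(13594, 1148, n=11, c=6)
print(f"F* = {f_star:.2f}, P = {p:.3f}")

f_star2, p2 = lack_of_fit_test(6.6, 38.5, n=11, c=6)
print(f"F* = {f_star2:.2f}, P = {p2:.3f}")
```

If SciPy is available, `scipy.stats.f.sf(f_star, df_lf, df_pe)` can replace the hand-written tail probability.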
LOF Test in Minitab
• Stat >> Regression >> Regression …
• Specify predictor and response.
• Under Options…
  – under Lack of Fit Tests, select the box labeled Pure error.
• Select OK. Select OK.
Decomposing the error
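[Figure: Scatter plot of Number of new accounts versus Size of minimum deposit, with fitted line $\hat{y} = 50.7 + 0.49x$]
$\sum_i \sum_j (y_{ij} - \hat{y}_{ij})^2 = 14742$
$\sum_i \sum_j (\bar{y}_i - \hat{y}_{ij})^2 = 13594$
$\sum_i \sum_j (y_{ij} - \bar{y}_i)^2 = 1148$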
Is there lack of linear fit?
Analysis of Variance
Source          DF     SS    MS      F      P
Regression       1   5141  5141   3.14  0.110
Residual Error   9  14742  1638
  Lack of Fit    4  13594  3398  14.80  0.006
  Pure Error     5   1148   230
Total           10  19883
1 rows with no replicates
Decomposing the error
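[Figure: Scatter plot of Number of new accounts (vertical axis 80 to 150) versus Size of minimum deposit, with fitted line $\hat{y} = 50.7 + 0.49x$]
$\sum_i \sum_j (y_{ij} - \hat{y}_{ij})^2 = 45.1$
$\sum_i \sum_j (\bar{y}_i - \hat{y}_{ij})^2 = 6.6$
$\sum_i \sum_j (y_{ij} - \bar{y}_i)^2 = 38.5$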
Is there lack of linear fit?
Analysis of Variance
Source          DF      SS      MS        F      P
Regression       1  5448.9  5448.9  1087.06  0.000
Residual Error   9    45.1     5.0
  Lack of Fit    4     6.6     1.7     0.21  0.919
  Pure Error     5    38.5     7.7
Total           10  5494.0
1 rows with no replicates
Example 1
Do the data suggest that a linear function is not adequate in describing the relationship between skin cancer mortality and latitude?
[Figure: Regression plot of Mortality versus Latitude. Fitted line: Mortality = 389.189 - 5.97764 Latitude; S = 19.1150, R-Sq = 68.0%, R-Sq(adj) = 67.3%]
Example 1: Mortality and Latitude
Analysis of Variance
Source          DF     SS     MS      F      P
Regression       1  36464  36464  99.80  0.000
Residual Error  47  17173    365
  Lack of Fit   30  12863    429   1.69  0.128
  Pure Error    17   4310    254
Total           48  53637
19 rows with no replicates
Example 2
Do the data suggest that a linear function is not adequate in describing the relationship between the length and weight of an alligator?
[Figure: Regression plot of Weight versus Length. Fitted line: Weight = -393.264 + 5.90235 Length; S = 54.0115, R-Sq = 83.6%, R-Sq(adj) = 82.9%]
Example 2: Alligator length and weight
Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  342350  342350  117.35  0.000
Residual Error  23   67096    2917
  Lack of Fit   17   66567    3916   44.36  0.000
  Pure Error     6     530      88
Total           24  409446
14 rows with no replicates
Example 3
Do the data suggest that a linear function is not adequate in describing the relationship between iron content and weight loss due to corrosion?
[Figure: Regression plot of Weight loss versus Iron content. Fitted line: wgtloss = 129.787 - 24.0199 iron; S = 3.05778, R-Sq = 97.0%, R-Sq(adj) = 96.7%]
Example 3: Iron and corrosion
Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  3293.8  3293.8  352.27  0.000
Residual Error  11   102.9     9.4
  Lack of Fit    5    91.1    18.2    9.28  0.009
  Pure Error     6    11.8     2.0
Total           12  3396.6
2 rows with no replicates
Example 4
Do the data suggest that a linear function is not adequate in describing the relationship between mileage and groove depth?
[Figure: Scatter plot of groove versus mileage]
Example 4: Tread wear
Analysis of Variance
Source          DF     SS     MS       F      P
Regression       1  50887  50887  140.71  0.000
Residual Error   7   2532    362
Total            8  53419
No replicates. Cannot do pure error test.
When is it okay to perform the LOF Test?
• When the “INE” part of the “LINE” assumptions (independent, normally distributed errors with equal variances) is met.
• The LOF test requires repeat observations, called replicates, for at least one of the values of the predictor X.
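Whether a data set has the replicates the test needs can be checked before fitting anything. The sketch below counts distinct predictor values and reports the resulting degrees of freedom. The function name, the bank predictor values, and the nine distinct mileage values are assumptions for illustration, not data taken from the slides. The extra condition c > 2 is added here so that the lack of fit degrees of freedom, c - 2, are positive.

```python
from collections import Counter


def replicate_summary(x):
    """Summarise the replicate structure of the predictor values x."""
    counts = Counter(x)
    n, c = len(x), len(counts)
    return {
        "n": n,
        "c": c,
        # Distinct predictor values that occur exactly once.
        "levels_without_replicates": sum(1 for k in counts.values() if k == 1),
        "pure_error_df": n - c,
        "lack_of_fit_df": c - 2,
        # Need at least one replicate (n > c) and c > 2 for both df to be positive.
        "can_test": n - c > 0 and c - 2 > 0,
    }


# ASSUMED predictor values for the bank example (not listed on the slides).
bank_x = [125, 100, 200, 75, 150, 175, 75, 175, 125, 200, 100]
# HYPOTHETICAL predictor values with no repeats, as in a design without replicates.
distinct_x = [0, 4, 8, 12, 16, 20, 24, 28, 32]

print(replicate_summary(bank_x))
print(replicate_summary(distinct_x))
```

For the second list, `pure_error_df` is 0, which corresponds to Minitab's "No replicates. Cannot do pure error test." message.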