APPLICATION OF FUZZY REGRESSION METHODOLOGY
IN AGRICULTURE USING SAS
Himadri Ghosh and Savita Wadhwa
I.A.S.R.I., Library Avenue, Pusa, New Delhi – 110012
Multiple linear regression modelling is a powerful technique that is used extensively in agricultural research (Lalitha et al. 1999, Guo and Sun 2001). It estimates a linear relationship between a dependent (response) variable and independent (explanatory) variables. If $X_i$, $i = 1, 2, \dots, n$, are the explanatory variables and $Y$ is the response variable, the model is expressed as:

$$Y = b_0 + b_1 X_1 + \dots + b_n X_n + e \qquad (1)$$

where the $b$'s are parameters and $e$ is the error term, assumed to follow a normal distribution. The parameters are generally estimated by the method of least squares. A good description of various aspects of multiple linear regression methodology is given in Draper and Smith (1998).

One drawback of this methodology is that the underlying relationship is assumed to be crisp, or precise, as it gives a precise value of the response for a set of values of the explanatory variables. However, in a realistic situation, the underlying relationship is not a crisp function of a given form; it contains some vagueness or imprecision. So, by assuming a crisp relationship, some vital information may be lost (Slowinski 1998). Fuzzy regression is a promising technique that has been developed to deal with this, and it can be applied to agricultural research problems.

A fuzzy regression model corresponding to equation (1) can be written as:

$$Y = A_0 + A_1 X_1 + \dots + A_n X_n \qquad (2)$$

Here the explanatory variables $X_i$, as before, are assumed to be precise. However, as mentioned above, the response variable $Y$ is not crisp but fuzzy in nature. This implies that the parameters $A_i$ are also fuzzy. Our aim is to estimate these parameters. In the subsequent discussion, it is assumed that the $A_i$ are symmetric fuzzy numbers (i.e. the vagueness extends equally on both sides of the centre) and so can be represented by intervals. For example, $A_1$ can be expressed as the fuzzy set:

$$A_1 = \langle a_{1c}, a_{1w} \rangle \qquad (3)$$

where $a_{1c}$ is the centre and $a_{1w}$ is the radius, or associated vagueness. In general, the fuzzy set $A_i$ describes the belief about the regression coefficient around $a_{ic}$ in terms of a symmetric triangular membership function. Note also that the methodology is applied when the underlying phenomenon is fuzzy, which means that the response variable is fuzzy and the relationship is also considered to be fuzzy. Equation (3) is sometimes also written as:

$$A_1 = [a_{1L}, a_{1R}] \qquad (4)$$

where $a_{1L} = a_{1c} - a_{1w}$ and $a_{1R} = a_{1c} + a_{1w}$ (Kacprzyk and Fedrizzi 1992).
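As a worked example of equations (3) and (4), take the fuzzy intercept reported later in equation (8), $\langle 217.08, 7.97 \rangle$. The membership function written out below is the usual symmetric triangular form; the text names this form but does not spell it out, so the exact expression is an assumption here.

```latex
\[
A_0 = \langle 217.08,\ 7.97 \rangle
\quad\Longleftrightarrow\quad
A_0 = [\,217.08 - 7.97,\ 217.08 + 7.97\,] = [\,209.11,\ 225.05\,]
\]
\[
\mu_{A_0}(a) = \max\!\left(0,\ 1 - \frac{|a - 217.08|}{7.97}\right)
\]
```

Under this form the membership is 1 at the centre 217.08 and falls linearly to 0 at the end points 209.11 and 225.05.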
The method of estimating the parameters of equation (2) differs from that of equation (1). In fuzzy regression methodology, the parameters are estimated by minimizing the total vagueness in the model, i.e. the sum of the radii of the predicted intervals. From equation (2), for the $j$-th data point:

$$Y_j = A_0 + A_1 x_{1j} + \dots + A_n x_{nj}$$

Using equation (3),

$$Y_j = \langle a_{0c}, a_{0w} \rangle + \langle a_{1c}, a_{1w} \rangle x_{1j} + \dots + \langle a_{nc}, a_{nw} \rangle x_{nj} = \langle y_{jc}, y_{jw} \rangle, \text{ say.}$$

Thus

$$y_{jc} = a_{0c} + a_{1c} x_{1j} + \dots + a_{nc} x_{nj} \qquad (5a)$$

$$y_{jw} = a_{0w} + a_{1w} |x_{1j}| + \dots + a_{nw} |x_{nj}| \qquad (5b)$$
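To make (5a) and (5b) concrete, the following worked example plugs in the fuzzy coefficients reported later in equation (8) and the first observation of Table 1 ($x_{11} = 60.41$, $x_{21} = 3.74$). It is only an arithmetic illustration of the two formulas.

```latex
\begin{align*}
y_{1c} &= 217.08 + (-3.06)(60.41) + (59.73)(3.74)
        = 217.08 - 184.8546 + 223.3902 = 255.6156 \\
y_{1w} &= 7.97 + 0 \cdot |60.41| + 0 \cdot |3.74| = 7.97 \\
[\,y_{1c} - y_{1w},\ y_{1c} + y_{1w}\,] &= [\,247.6456,\ 263.5856\,]
\end{align*}
```

These limits agree with the first row of the fuzzy regression columns of Table 2 (247.65 and 263.58) up to rounding of the published coefficients.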
As $y_{jw}$ represents a radius and so cannot be negative, absolute values of $x_{ij}$ are taken on the right-hand side of equation (5b). Suppose there are $m$ data points, each comprising an $(n+1)$-row vector. Then the parameters $A_i$ are estimated by minimizing the quantity in (6), which is the total vagueness of the model-data set combination, subject to the constraint that each data point must fall within the estimated interval of the response variable. This can be formulated as the following linear programming problem (Tanaka 1987):
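Minimize

$$\sum_{j=1}^{m} \left( a_{0w} + a_{1w} |x_{1j}| + \dots + a_{nw} |x_{nj}| \right) \qquad (6)$$

Subject to

$$\left( a_{0c} + \sum_{i=1}^{n} a_{ic} x_{ij} \right) - \left( a_{0w} + \sum_{i=1}^{n} a_{iw} x_{ij} \right) \le Y_j$$

$$\left( a_{0c} + \sum_{i=1}^{n} a_{ic} x_{ij} \right) + \left( a_{0w} + \sum_{i=1}^{n} a_{iw} x_{ij} \right) \ge Y_j$$

and $a_{iw} \ge 0$.

To solve the above linear programming problem, the simplex procedure (Taha 1997) is generally employed.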
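The linear programme can be written almost verbatim in PROC OPTMODEL. The following is a minimal sketch for a single regressor, not the authors' code. The data set `toy`, its five values, and the names `vagueness`, `cover_lo` and `cover_hi` are invented purely for illustration. Because the toy regressor takes negative values, the sketch applies `abs()` to it in the constraints as well as in the objective, following equation (5b).

```sas
/* Toy data, invented for illustration only; x1 is negative for some rows */
data toy;
   input y x1;
   datalines;
2.1 -2
2.9 -1
4.2 0
4.8 1
6.3 2
;
run;

proc optmodel;
   set OBS;                          /* index set of data points j      */
   number y{OBS}, x1{OBS};
   read data toy into OBS=[_n_] y x1;

   var ac{0..1};                     /* centres a_0c, a_1c: unrestricted */
   var aw{0..1} >= 0;                /* radii a_0w, a_1w: non-negative   */

   /* objective (6): sum of the predicted radii over all data points */
   min vagueness = sum{j in OBS} (aw[0] + aw[1]*abs(x1[j]));

   /* each observed y[j] must lie inside its predicted interval */
   con cover_lo{j in OBS}:
      ac[0] + ac[1]*x1[j] - (aw[0] + aw[1]*abs(x1[j])) <= y[j];
   con cover_hi{j in OBS}:
      ac[0] + ac[1]*x1[j] + (aw[0] + aw[1]*abs(x1[j])) >= y[j];

   solve;
   print ac aw;
quit;
```

Since `abs(x1[j])` is applied to data rather than to decision variables, the problem stays linear in `ac` and `aw`.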
ILLUSTRATION 1: The data given in Sengupta et al. (2001) are considered. They studied the effect of sulphur-containing fertilizers on the productivity of rainfed greengram (Phaseolus radiatus L.). The response variable is dry-matter accumulation (Y) and the explanatory variables are plant height (X1) and leaf area index (X2). Only the data pertaining to the maturity level, i.e. 60 days after sowing (DAS), are considered for analysis; they are presented in Table 1 for ready reference.
Table 1: Data on dry-matter accumulation, plant height and leaf area index for the effect of sulphur on growth of greengram crop

Dry-matter accumulation (g/m²)   Plant height (cm)   Leaf area index
247.32                           60.41               3.74
324.52                           61.08               4.80
364.56                           64.98               5.71
328.44                           64.16               5.27
349.48                           62.99               5.45
339.92                           65.20               5.34
320.48                           63.24               5.11
357.16                           67.19               5.66
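The multiple linear regression model and the fuzzy regression model are fitted to the above data using the SAS version 9.2 software package. The SAS code and the results obtained are as follows.

SAS CODES:
Method of multiple linear regression (MLR)
title 'Method of least squares';
ods csv file='resultls.csv';
/* The DATA step and PROC REG call are missing from this transcript. */
/* They are reconstructed here from Table 1 and equation (7) and are */
/* not the authors' original lines. */
data plant;
input y x1 x2;
cards;
247.32 60.41 3.74
324.52 61.08 4.80
364.56 64.98 5.71
328.44 64.16 5.27
349.48 62.99 5.45
339.92 65.20 5.34
320.48 63.24 5.11
357.16 67.19 5.66
;
proc reg data=plant;
model y = x1 x2;
run; quit;

Method of fuzzy regression (FR)
/* The PROC OPTMODEL statement and the array declarations are also reconstructed */
proc optmodel;
number n init 8; /* total number of observations */
number y{1..n}, x1{1..n}, x2{1..n};
read data plant into [_n_] y x1 x2;
print y x1 x2;
/* Decision variables */
var aw{1..3} >= 0; /* radii: bounded below by zero */
var ac{1..3}; /* centres: not bounded */
/* Objective function: total vagueness, equation (6) */
min z1 = aw[1]*n + sum{i in 1..n} x1[i]*aw[2] + sum{i in 1..n} x2[i]*aw[3];
/* Linear constraints: each y[i] must lie inside its predicted interval */
con c{i in 1..n}: ac[1] + x1[i]*ac[2] + x2[i]*ac[3] - aw[1] - x1[i]*aw[2] - x2[i]*aw[3] <= y[i];
con c1{i in 1..n}: ac[1] + x1[i]*ac[2] + x2[i]*ac[3] + aw[1] + x1[i]*aw[2] + x2[i]*aw[3] >= y[i];
expand; /* prints the expanded model with all equations */
solve;
print ac aw;
quit;
ods csv close;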
RESULTS:
Partial SAS output:
Method of multiple linear regression (MLR): parameter estimates

The fitted model for MLR is

$$Y = 186.10 - 3.02 X_1 + 65.22 X_2 \qquad (7)$$

Standard errors: (107.63) (2.16) (7.57)

The fitted model for FR is

$$Y = \langle 217.08, 7.97 \rangle + \langle -3.06, 0 \rangle X_1 + \langle 59.73, 0 \rangle X_2 \qquad (8)$$
In order to compare the performance of the two approaches, viz. multiple linear regression and fuzzy regression, the width of the prediction interval corresponding to each observed value of the response variable is computed. For the former, the upper limit of the prediction interval is computed from prediction equation (7) by taking each coefficient as its estimated value plus its standard error, and the lower limit by taking each estimated value minus its standard error, i.e. using the equations

Y = (186.10 + 107.63) + (-3.02 + 2.16) X1 + (65.22 + 7.57) X2 and
Y = (186.10 - 107.63) + (-3.02 - 2.16) X1 + (65.22 - 7.57) X2

Further, for the fuzzy regression model, the prediction equations for computing the upper and lower limits, obtained from equation (8), are respectively

Y = (217.08 + 7.97) + (-3.06 + 0) X1 + (59.73 + 0) X2 and
Y = (217.08 - 7.97) + (-3.06 - 0) X1 + (59.73 - 0) X2

The width of the prediction interval for the multiple linear regression model and for the fuzzy regression model, corresponding to each set of observed explanatory variables, is computed in MS Excel (the SAS results above can be opened directly in MS Excel), and the results are reported in Table 2. From this table, the average width for the former was found to be 568.00, while that for the latter was only 15.93, indicating the superiority of the fuzzy regression methodology.
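The limits and widths can also be computed inside SAS instead of MS Excel. The following is a minimal sketch of that calculation, not the authors' procedure. The data set name `widths` and the variable names `mlr_upper`, `mlr_lower`, `mlr_width`, `fr_upper`, `fr_lower` and `fr_width` are hypothetical; the coefficients are the rounded values from equations (7) and (8).

```sas
/* Table 1 data: y = dry matter, x1 = plant height, x2 = leaf area index */
data plant;
   input y x1 x2;
   datalines;
247.32 60.41 3.74
324.52 61.08 4.80
364.56 64.98 5.71
328.44 64.16 5.27
349.48 62.99 5.45
339.92 65.20 5.34
320.48 63.24 5.11
357.16 67.19 5.66
;
run;

data widths;
   set plant;
   /* MLR limits: each estimate plus or minus its standard error, from (7) */
   mlr_upper = (186.10 + 107.63) + (-3.02 + 2.16)*x1 + (65.22 + 7.57)*x2;
   mlr_lower = (186.10 - 107.63) + (-3.02 - 2.16)*x1 + (65.22 - 7.57)*x2;
   mlr_width = mlr_upper - mlr_lower;
   /* FR limits: each centre plus or minus its radius, from (8) */
   fr_upper = (217.08 + 7.97) + (-3.06 + 0)*x1 + (59.73 + 0)*x2;
   fr_lower = (217.08 - 7.97) + (-3.06 - 0)*x1 + (59.73 - 0)*x2;
   fr_width = fr_upper - fr_lower;
run;

proc print data=widths noobs;
   var mlr_lower mlr_upper mlr_width fr_lower fr_upper fr_width;
   format _numeric_ 8.2;
run;

/* average widths of the two kinds of interval */
proc means data=widths mean maxdec=2;
   var mlr_width fr_width;
run;
```

With the rounded coefficients the fuzzy width is 2 × 7.97 = 15.94 for every row, so small last-digit differences from the 15.93 reported in Table 2 are to be expected.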
Table 2: Fitting of MLR and FR models

MLR = multiple linear regression model; FR = fuzzy regression model.

MLR lower limit   MLR upper limit   MLR width   FR lower limit   FR upper limit   FR width
-18.84            514.01            532.85      247.65           263.58           15.93
38.80             590.59            551.80      308.91           324.85           15.93
71.06             653.48            582.42      351.33           367.27           15.93
49.94             622.16            572.22      327.56           343.49           15.93
66.37             636.26            569.89      341.89           357.83           15.93
48.59             626.36            577.77      328.56           344.49           15.93
45.48             611.30            565.82      320.82           336.75           15.93
56.72             647.94            591.21      341.58           357.52           15.93
Average width                       568.00      Average width                     15.93
In reality, the underlying phenomenon is fuzzy; therefore, as emphasized above, the correct way to obtain the relationship between the response and explanatory variables is to apply fuzzy regression methodology rather than multiple linear regression methodology.
ILLUSTRATION 2: Length (L) and weight (W) data for a fish species are given below (see the first two columns of Table 3).

Assuming the underlying phenomenon to be fuzzy, fit a fuzzy linear regression, using the method of fuzzy least squares (FLS), to the deterministic allometric model $W = aL^b$.

For estimating the length-weight relationship, the statistical form of the above deterministic allometric model is:

$$\log W = \log a + b \log L + e \qquad \text{(i)}$$

Also compare the results with those obtained by fitting with least squares (LS).

Note: PROC OPTMODEL can also be used for this illustration.
Active Constraints 2        Objective Function 30.221322328
Max Abs Gradient Element 75.94396

All parameters are actively constrained. Optimization cannot proceed.

PROC NLP: Nonlinear Minimization
Optimization Results
Parameter Estimates

N   Parameter   Estimate        Gradient Objective Function   Active Bound Constraint
1   ar          0.145200        16.000000
2   br          -6.34258E-18    75.943960                     Lower BC
3   ac          -11.990000      0                             Equal BC
4   bc          2.960000        0                             Equal BC

Value of Objective Function = 2.3232
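The code that produced this output is not reproduced in the transcript. The output itself gives some hints: `ac` and `bc` carry an "Equal BC" flag, which suggests that the centres were held at -11.99 and 2.96; the gradients 16 and 75.94396 equal the number of observations and the sum of the log lengths in Table 3; and 16 × 0.1452 = 2.3232 is the reported objective value. The following PROC OPTMODEL sketch is built on that reading and should be treated as an assumption, not as the authors' PROC NLP program. The data set name `fish` and the names `len`, `wt`, `vagueness`, `cover_lo` and `cover_hi` are hypothetical.

```sas
/* Length (mm) and weight (g) from Table 3, with natural logs */
data fish;
   input len wt;
   logl = log(len);
   logw = log(wt);
   datalines;
80 3.05
85 3.07
90 3.68
95 4.56
100 4.72
105 6.10
110 6.65
115 7.65
120 9.16
125 10.14
130 10.43
135 12.99
140 14.48
145 15.32
150 17.35
155 20.90
;
run;

proc optmodel;
   set OBS;
   number logl{OBS}, logw{OBS};
   read data fish into OBS=[_n_] logl logw;

   /* centres fixed at the values shown in the output (assumed reading) */
   number ac = -11.99;
   number bc = 2.96;

   var ar >= 0, br >= 0;             /* radii of intercept and slope */

   /* total vagueness: sum of predicted radii on the log scale */
   min vagueness = sum{j in OBS} (ar + br*logl[j]);

   /* every observed log weight must lie inside its predicted interval */
   con cover_lo{j in OBS}: (ac + bc*logl[j]) - (ar + br*logl[j]) <= logw[j];
   con cover_hi{j in OBS}: (ac + bc*logl[j]) + (ar + br*logl[j]) >= logw[j];

   solve;
   print ar br;
quit;
```

Because this is a reconstruction, the radii it returns need not coincide exactly with the PROC NLP estimates shown above.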
Substituting the values of the parameter estimates in model (i) of Illustration 2:

$$\log W = -11.99 + 2.96 \log L$$

Standard errors: (0.15) (0.00)

Now we substitute the parameter values, i.e. $a = \exp(-11.99)$ and $b = 2.96$, and their corresponding standard errors in the deterministic allometric model $W = aL^b$ and calculate the width as follows:

$$\text{Width} = \exp(-11.99 + 0.15)\, L^{2.96} - \exp(-11.99 - 0.15)\, L^{2.96}$$

for the different values of $L$ (length) given in Illustration 2.
Table 3: Width of predicted interval by LS and FLS approach

Length (mm)   Weight (g)   Estimated weight (g)   Width of predicted interval (g)
                                                  LS        FLS
80            3.05         2.67                   4.50      0.65
85            3.07         3.20                   5.43      0.77
90            3.68         3.79                   6.48      0.91
95            4.56         4.45                   7.66      1.06
100           4.72         5.19                   8.98      1.24
105           6.10         6.00                   10.44     1.42
110           6.65         6.89                   12.06     1.63
115           7.65         7.87                   13.84     1.86
120           9.16         8.93                   15.79     2.10
125           10.14        10.09                  17.91     2.37
130           10.43        11.34                  20.22     2.65
135           12.99        12.69                  22.73     2.96
140           14.48        14.14                  25.43     3.29
145           15.32        15.70                  28.35     3.64
150           17.35        17.37                  31.48     4.02
155           20.90        19.15                  34.84     4.42
Average width                                     16.63     2.19
Conclusion: The predicted intervals computed using the "method of fuzzy least squares" have a much shorter average width than those obtained using the "method of least squares". This implies that the former procedure is more efficient than the latter. The main message emerging from this illustration is that the correct methodology for determining the length-weight relationship in fish is "fuzzy least squares" rather than ordinary "least squares".
ILLUSTRATION 3: Wheat crop and spectral vegetation indices, namely the Normalized Difference Vegetation Index (NDVI) and the Ratio Vegetation Index (RVI) (95 days after sowing), were observed in an experimental study at IARI farms, New Delhi, and are listed in the SAS data step below.

Use fuzzy regression methodology (FRM) to fit the data with the linear programming approach available in the SAS software package, and show its superiority over the corresponding multiple linear regression (MLR) model. The models are the same as those given in equations (1) and (2) above.

Note: PROC OPTMODEL can also be used for this illustration.
SAS CODES:
Method of least squares (LS)
data plant;
input Y NDVI RVI; /* one observation per data line */
cards;
52.29 0.52 3.18
43.05 0.54 3.36
44.2 0.52 3.2
44.05 0.48 2.87
36.08 0.48 2.88
40.04 0.49 2.88
46.93 0.54 3.35
47.64 0.52 3.19
46.62 0.53 3.24
46.13 0.49 2.95
29.57 0.38 2.2
45.17 0.46 2.67
;
proc reg data=plant;
model Y = NDVI; /* NDVI alone */
run;
proc reg data=plant;
model Y = RVI; /* RVI alone */
run;
proc reg data=plant;
model Y = NDVI RVI; /* both predictors */
run; quit;
Partial SAS Output:

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    -6.68100             13.45930         -0.50     0.6304
NDVI        1    101.16673            27.04375         3.74      0.0038

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    4.03703              11.51212         0.35      0.7331
RVI         1    13.15890             3.81911          3.45      0.0063

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    -26.36657            25.03924         -1.05     0.3198
NDVI        1    322.31429            238.14074        1.35      0.2089
RVI         1    -30.01393            32.10847         -0.93     0.3743
So our equations are:
i) Y = -6.68 + 101.17 NDVI
Standard errors: (13.46) (27.04)
ii) Y = 4.04 + 13.16 RVI
Standard errors: (11.51) (3.82)

The upper and lower limits of the prediction interval for the multiple linear regression model are computed, respectively, as
Y = (-6.68 + 13.46) + (101.17 + 27.04) NDVI (a)
Y = (-6.68 - 13.46) + (101.17 - 27.04) NDVI (b)

The upper and lower limits of the prediction interval for the fuzzy linear regression model are computed, respectively, as
Y = (-14.55 + 5.13) + (117.42 + 1.25) NDVI (c)
Y = (-14.55 - 5.13) + (117.42 - 1.25) NDVI (d)
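Equations (c) and (d) come from a fuzzy fit whose code is not included in the transcript. The following is a minimal PROC OPTMODEL sketch of how such a fit with NDVI as the only predictor could be set up, using the linear programme in (6). It is an assumption about the approach, not the authors' program, and the names `vagueness`, `cover_lo` and `cover_hi` are hypothetical.

```sas
/* Illustration 3 data, one observation per line */
data plant;
   input Y NDVI RVI;
   datalines;
52.29 0.52 3.18
43.05 0.54 3.36
44.2 0.52 3.2
44.05 0.48 2.87
36.08 0.48 2.88
40.04 0.49 2.88
46.93 0.54 3.35
47.64 0.52 3.19
46.62 0.53 3.24
46.13 0.49 2.95
29.57 0.38 2.2
45.17 0.46 2.67
;
run;

proc optmodel;
   set OBS;
   number Y{OBS}, NDVI{OBS};
   read data plant into OBS=[_n_] Y NDVI;

   var ac{0..1};                     /* centres: intercept and NDVI slope */
   var aw{0..1} >= 0;                /* radii: non-negative               */

   /* total vagueness; NDVI is positive here, so no abs() is needed */
   min vagueness = sum{j in OBS} (aw[0] + aw[1]*NDVI[j]);

   /* each observed Y[j] must lie inside its predicted interval */
   con cover_lo{j in OBS}: ac[0] + ac[1]*NDVI[j] - (aw[0] + aw[1]*NDVI[j]) <= Y[j];
   con cover_hi{j in OBS}: ac[0] + ac[1]*NDVI[j] + (aw[0] + aw[1]*NDVI[j]) >= Y[j];

   solve;
   print ac aw;
quit;
```

Replacing `NDVI` by `RVI`, or adding a second pair of centre and radius variables, gives the corresponding sketches for the other two models.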
By subtracting equation (b) from (a) for each observation and then averaging, we get the average width for the multiple linear regression model; by subtracting equation (d) from (c) and then averaging, we get the average width for the fuzzy linear regression model. Similarly, we can get the average widths for RVI alone and for NDVI and RVI together. The following table shows the average widths for the three sets of predictor variables.
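As a check on this recipe, the NDVI case can be worked by hand. Because the width is linear in NDVI, averaging the widths is the same as evaluating the width at the mean NDVI of the twelve observations in the data step, which is $5.95 / 12 \approx 0.4958$.

```latex
\begin{align*}
\text{(a)} - \text{(b)} &= 2\,(13.46 + 27.04\,\mathrm{NDVI}), &
\text{average} &= 2\,(13.46 + 27.04 \times 0.4958) \approx 53.73 \\
\text{(c)} - \text{(d)} &= 2\,(5.13 + 1.25\,\mathrm{NDVI}), &
\text{average} &= 2\,(5.13 + 1.25 \times 0.4958) \approx 11.50
\end{align*}
```

Both values agree with the NDVI row of Table 4.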
Table 4: Average width for fitted regression models

Predictor variable     Method of Least Squares (LS)   Fuzzy Linear Regression (FLR) Model
NDVI                   53.73                          11.50
RVI                    45.92                          12.02
Both (NDVI & RVI)      478.73                         10.51
Conclusion: Table 4 shows that the average widths for the linear regression models are much higher than those for their fuzzy counterparts. Thus fuzzy regression methodology is more efficient than the multiple linear regression technique. It may also be pointed out that, for the fuzzy approach, the average width when both NDVI and RVI are taken into account is smaller than when only one of them is considered, which is quite logical. This shows that, unlike the multiple linear regression technique, fuzzy regression methodology is capable of handling situations in which the predictor variables are highly correlated.