Theory and Methodology

Decision making using multiple models

Manoj K. Malhotra a,*, Subhash Sharma b, Satish S. Nair c

a Department of Management Science, College of Business Administration, University of South Carolina, Columbia, SC 29208, USA
b Department of Marketing, College of Business Administration, University of South Carolina, Columbia, SC 29208, USA

c Department of Mechanical and Aerospace Engineering, University of Missouri, Columbia, MO 65211, USA

Received 1 June 1996; accepted 1 January 1998

Abstract

Many real world business situations require classification decisions that must often be made on the basis of judgment and past performance. In this paper, we propose a decision framework that combines multiple models or techniques in a complementary fashion to provide input to managers who make such decisions on a routine basis. We illustrate the framework by specifically using five different classification techniques – neural networks, discriminant analysis, quadratic discriminant analysis (QDA), k-nearest neighbor (KNN), and multinomial logistic regression analysis (MNL). Application of the decision framework to actual retail department store data shows that it is most useful in those cases where uncertainty is high and a priori classification cannot be made with a high degree of reliability. The proposed framework thus enhances the value of exception reporting, and provides managers additional insights into the phenomenon being studied. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Classification decisions; Neural nets; Multivariate techniques

1. Introduction

Classification problems deal with the assignment of an object to an appropriate group, and frequently occur in business environments. Managers are continuously called upon to make performance evaluation and classification decisions. For example, brand managers have to make product deletion and introduction decisions. Loan and credit managers have to decide whether loan applicants should or should not be approved for the loan. Financial analysts typically evaluate the financial health of firms, and classify the firms accordingly (i.e., high-risk, medium-risk and low-risk firms). Similarly, retail store managers typically classify different departments such as toys, soft fashion lines, electronics, etc. as high, medium and low performers (Sharma and Achabal, 1982). Several such examples of classification problems exist in other areas as well (Salchenberger et al., 1992).

A number of statistical models such as multiple regression, discriminant analysis, logistic regression, and probit analysis have been suggested for making such decisions (e.g., Altman, 1968; Altman et al., 1977; Barth et al., 1989; Booth et al., 1989; Pantalone and Platt, 1987). More recently, however, researchers have also advocated the use of neural networks as an alternative procedure (Kumar et al., 1995; Archer and Wang, 1993). Research studies done to date have shown that the performance of neural networks, henceforth referred to as neural nets, is often superior to that of traditional statistical methods (e.g., Kumar et al., 1995; Patuwo et al., 1993; Salchenberger et al., 1992; Tam and Kiang, 1992). In addition, these studies have treated statistical and artificial intelligence techniques such as neural nets as competing techniques rather than complementary ones that could mutually assist each other in better decision making.

In order to provide a better motivation for the two issues that form the main focus of this paper, it is important to note that the data sets used in prior studies that have compared the two types of techniques have an a priori known set of outcomes with respect to the group membership of the observations. For example, in the case of thrift failure studies (Salchenberger et al., 1992; Tam and Kiang, 1992) the data sets used to build neural net or statistical models consisted of failed and non-failed thrifts. There was no ambiguity at all regarding which thrifts had or had not failed. But what happens if the group membership is ambiguous or fuzzy with respect to the data that must be used to build statistical or neural net models? For example, it is possible that group membership may need to be defined as thrifts that, according to the management, have or do not have the potential for failure. Managerial judgment and intuition may need to be used in determining potential failure cases. Even though some degree of uncertainty exists, it is becoming increasingly important to make future predictions of outcomes under these situations. The performance of classical statistical as well as artificial intelligence techniques needs to be better understood for such scenarios, which often occur in practice.

Second, and more importantly, the prior studies have generally treated different classification methods as competing techniques. There seems to be no apparent rationale as to why this should be the case. Neural nets and the statistical methods could complement each other in providing further insights into the phenomenon under study. In fact, many researchers have advocated the use of multiple methods for decision making. For example, ASSESSOR, a model to forecast sales of new products, employs two different methods to obtain the forecast (Urban and Hauser, 1992). If the two methods produce divergent results, then further analysis is called for to investigate the reasons for the divergence. Similarly, Fishman et al. (1991) suggest the integration of neural nets and expert systems to forecast market performance indicators such as the S&P 500 stock index.

How can multiple methods be combined for making classification decisions? Under what conditions would such an approach be valuable? What happens if classification methods are conceptually very different and provide different results? Can the differences in results be linked to the inherent characteristics of the methodologies used? This paper attempts to answer these questions by proposing a decision framework that constructively integrates multiple methodologies to assist managers who must work with information that does not always lead to accurate a priori classifications. The decision framework is illustrated by combining the results of neural nets, discriminant analysis, quadratic discriminant analysis, k-nearest neighbor, and multinomial logistic regression techniques on such a data set. We also show how results from these different techniques can be combined to provide insights into the phenomenon under study and assist in better decision making.

2. Decision making framework

Fig. 1 schematically represents a decision making framework that uses multiple models for evaluating performance and providing classification of observations. The solid line arrows in Fig. 1 represent the information flow needed to obtain a classification decision, while the dotted line arrows represent the information flow that occurs in the consensus building process for those observations that do not get similarly classified by different models.

For each observation or case, there are two types of inputs to the classification models being considered – the independent variables and the dependent variable. The independent variables input to each model come from past data, which are often in the form of several performance indicators or measures. The dependent variable is the classification of an observation or case into two or more groups. Managers use their subjective evaluation of the historical data, as well as their own intuition and understanding of their business environment and future performance, to provide a judgment based classification of each observation to the model.

Alternate models based on different techniques are then calibrated or estimated based upon these input data, also known as the estimation or training sample. Estimated or calibrated models are then used to make future classifications for each observation. Cases for which there is consensus of classification among the multiple models, and which also agree with managers' judgment based classifications, do not need further evaluation. However, cases for which there is disagreement in classification among the models and managers must be evaluated further to determine and reconcile the differences. Review and reconciliation henceforth occur on a qualitative basis and may in some cases call for additional information to be gathered. This process could possibly lead to management rethinking and adjusting its classification for some cases, while leaving others as they are. The outcomes are debated until consensus is reached among decision-makers for all the exceptional cases or observations highlighted as a result of applying the framework to the firm's data.

Fig. 1. A framework for decision-making using multiple methods.

The utility of our decision framework lies in the fact that it can be used to isolate those cases for which a more detailed debate must be encouraged, while at the same time identifying those cases for which managers can have confidence in their judgmental classifications. Thus, by employing several models simultaneously, our decision framework confirms or questions managers' classification decisions. Such a process can be extremely useful in focusing management's attention, on an exception basis, on only a few cases. The value of such exception reporting can be particularly high when working with large data sets that necessitate numerous decisions.
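As a concrete illustration, the following minimal Python sketch shows one way such a consensus screen could be implemented. It is our illustration, not part of the original framework: the function name, group labels, and the default agreement threshold are assumptions (the choice of threshold is itself a managerial decision, as discussed later in the paper).

from collections import Counter

def screen_case(model_labels, manager_label, min_agree=3):
    """Route a case to managerial review unless at least min_agree of the
    models agree with the manager's judgment-based classification."""
    agree = sum(1 for label in model_labels if label == manager_label)
    consensus, votes = Counter(model_labels).most_common(1)[0]
    return {"models_agreeing_with_manager": agree,
            "model_consensus": consensus,
            "needs_review": agree < min_agree}

# Hypothetical case: the manager says Medium but four of five models say
# Successful, so the case is flagged for the reconciliation step.
print(screen_case(["S", "S", "S", "U", "S"], manager_label="M"))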

3. Background information and related data sets

In order to more fully discuss the notion of data sets that do not have a priori known outcomes for each observation, we first provide a brief background discussion of the example that we are going to use to illustrate our decision framework. The data set used in this study to illustrate the application of neural nets and the statistical methods within the proposed decision framework is taken from Sharma and Achabal (1982). Here, management of a large chain of retail department stores is interested in developing a systematic procedure for evaluating the performance of various departments (e.g., housewares, linens, toys and appliances). In 1979, an executive of the firm categorized the various departments in the store as successful, medium performance, and unsuccessful departments. This classification was done in conjunction with other executives of the firm and store managers, historical knowledge about the departments and their performance variables, and industry-wide performance metrics. Out of a total of 132 departments used in this study, 56 were classified as successful, 41 were classified as medium performing and 35 were classified as unsuccessful departments.¹ In addition, the executives of the firm identified a total of 12 marketing and financial indicators that they believed to be the most important measures of performance. Measures for these indicators were available for the years 1976–1979.² Table 1 lists the 11 indicators. Historical data for the years 1976–1978 are pooled for use as the training sample to build five models based on different techniques. The training sample, therefore, consists of 396 observations (132 × 3). Data for 1979 are used as the validation or prediction sample.

¹ The original data set contained 154 departments. However, complete data for the years 1976–1979 were available for only 132 departments.
² Data for one of the indicators (prior stock) were available only for 1979, and consequently this indicator was not used in the analysis.

Two possible sources of error could contribute to a discrepancy or a gap between the managers' classification of each observation and the one that is actually correct as interpreted by the underlying trends in the performance indicators. First, there could be judgmental errors in the managers' classification of departments, thereby leading to uncertainty in the outcomes or dependent variables. If the departments are classified incorrectly by the managers, then the relationship between the classification and its indicators will tend to become unclear. In the present case, management could clearly identify some departments as winners or losers (i.e., successful or unsuccessful departments). Classification for other departments is less clear-cut. Some of these departments could potentially have been successful while others could have been unsuccessful. The managers classified these departments as medium performance departments. Thus there is greater classification uncertainty for the medium performance category than for the successful and unsuccessful categories. In contrast, in most of the prior studies that have used neural nets for classification purposes, the categorization of the observations was clear-cut (i.e., failed or non-failed). Therefore, the potential error in the classification can affect the relationship between performance and its indicators, more so for the medium performing departments than others.

Second, in the present study, the models are calibrated using the training sample (i.e., data for 1976–1978). However, as indicated earlier, classification of the departments was done in 1979. This assumes that the departments' classifications for the training sample period are the same as those for the validation sample period (i.e., 1979). Such an assumption may be less valid if underlying trends within the data shift over time. It is then possible that the group membership for the training sample differs from that for the validation sample. In other words, the models may be getting incorrect information for some departments with respect to group membership or outputs. Again, this may be more true for those departments that are making a transition from the medium category to the successful or unsuccessful category over time. Consequently, the classification models may be learning from imperfect relationships between the classification and the performance indicators. This type of mismatch can often occur in practice when management must make predictions and classifications for the future on the basis of historical data that may not be totally accurate in representing the current situation. The relative performance of different models in providing classification is unknown in such cases. Providing insights into this issue is one of the objectives of this paper, which we develop further by first describing the five different classification models used in this study.

4. Classification models and their calibration

4.1. Neural networks model

Neural nets possess the potential for quantifying complex mapping characteristics in a compact and elegant manner. Such representations are very useful in several areas including economic analysis, pattern recognition, speech recognition and synthesis, medical diagnosis, seismic signal recognition, and control systems (Nelson and Illingsworth, 1991). Interest in neural nets is due, in part, to powerful neural models – the multilayer perceptron, the feedback model of Hopfield (1982) and Hopfield and Tank (1985), the Adaptive Resonance Theory (ART) networks, the Kohonen network, etc. – and to learning methods such as back-propagation. It is also due to the rapid recent developments in hardware design that have brought within reach the realization of neural nets with a very large number of nodes.

Table 1
Means for training sample

Indicator                               | Successful | Medium | Unsuccessful | p-value
Maintained mark-on (%)                  | 40.19      | 37.88  | 29.90        | 0.000
Gross margin (%)                        | 45.18      | 42.04  | 33.01        | 0.000
Total selling expenses (%) a            | 8.09       | 9.82   | 11.03        | 0.000
Total operating expenses (%)            | 18.54      | 22.40  | 25.95        | 0.000
Department margin (%) a,b               | 26.63      | 19.64  | 7.23         | 0.000
Stock turnover                          | 4.11       | 3.19   | 2.70         | 0.000
Percent returns (%)                     | 9.36       | 11.78  | 13.41        | 0.000
Gross transactions (000s of units) a,b  | 74.71      | 49.07  | 27.23        | 0.000
Sales per sq. ft. ($) a                 | 116.95     | 92.44  | 111.74       | 0.001
Gross margin return on inventory a,b    | 186.70     | 136.05 | 86.86        | 0.000
Fashion/Basic c                         | 73.20      | 53.70  | 34.30        | 0.001

a Selected by discriminant analysis. b Selected by multinomial logistic regression analysis. c Percent of Fashion departments.

The ability to learn is one of the main advantages that make neural nets so attractive. They also have the capability of performing massive parallel processing and possess significant fault tolerance (Mistry and Nair, 1994). Currently, there is a wide variety of neural nets, as cited earlier, that are being studied or used in applications. However, the most widely used learning rule in neural nets is back-propagation, which is also used for modeling the neural nets in this study.

Back-propagation is an example of a mapping network that learns an approximation to a nonlinear vector function, $y = f(x)$, from sample $(x, y)$ pairs by propagating the errors successively back through the layers, starting at the output (Rumelhart and McClelland, 1987). The neurons in the input layer simply store the input patterns, thus acting as buffers. The hidden and the output layer neurons each carry out two calculations, one linear and the other nonlinear. In the linear part, for every neuron $j$, each of its inputs $z_i$ is multiplied by the corresponding weight $w_{ji}$, a bias term $b_i$ is added, and the result is summed over all inputs:

$$\mathrm{net}_j = \sum_{i=1}^{N} \left( w_{ji} z_i + b_i \right). \qquad (1)$$

This summed quantity, $\mathrm{net}_j$, is passed through a nonlinearity to yield the output $o_j$ as $o_j = f(\mathrm{net}_j)$, where $f(\cdot)$ is the activation function. Several activation functions are in use, a common one being $f(\cdot) = \bigl(1 - e^{-(\cdot)}\bigr)/\bigl(1 + e^{-(\cdot)}\bigr)$. The function $f(\cdot)$ typically is non-decreasing and continuously differentiable. A back-propagation network learns by making changes in its weights and biases in the direction that minimizes the sum of squared errors, $E = 0.5 \sum_{j} (t_j - o_j)^2$, between the network outputs $o_j$ and the targets $t_j$ (supplied through the training data). Assuming that there are $N$ input/output vector pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ available for training the network, the adjustment in the parameters, for a learning rate $\eta$, is given by the following relations:

$$\Delta w_{ji} = -\eta \, \frac{\partial E}{\partial w_{ji}} \quad \forall\, i, j; \qquad \Delta b_i = -\eta \, \frac{\partial E}{\partial b_i} \quad \forall\, i. \qquad (2)$$

The back-propagation algorithm as described above is used to optimize the weights and biases of a multilayer neural network designed to capture the complex relationships between performance and its indicators. The network structure used has 11 inputs, 1 output, and 40 and 20 neurons in the two hidden layers, respectively. The training data, as mentioned earlier, consist of 396 observations over a period of three years. This implies a total of 396 ``patterns'' for neural network learning. The data were normalized before training on a DEC 5000 workstation using code written in the C language.
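To make Eqs. (1) and (2) concrete, here is a minimal numpy sketch of gradient-descent training by back-propagation. It is our illustration, not the authors' C code: the data are synthetic, and for brevity it uses a single hidden layer and a logistic activation rather than the two hidden layers and bipolar activation described above.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the training sample: 396 patterns, 11 indicators,
# one target output in [0, 1].
X = rng.normal(size=(396, 11))
t = (X[:, :3].sum(axis=1) > 0).astype(float).reshape(-1, 1)

def f(a):
    # Logistic activation; its derivative is f(a) * (1 - f(a)).
    return 1.0 / (1.0 + np.exp(-a))

# One hidden layer of 20 neurons for brevity (the study used two: 40 and 20).
W1 = rng.normal(scale=0.1, size=(11, 20))
b1 = np.zeros(20)
W2 = rng.normal(scale=0.1, size=(20, 1))
b2 = np.zeros(1)
eta = 0.01  # learning rate

for epoch in range(2000):
    # Forward pass -- Eq. (1): weighted sum plus bias, then o = f(net).
    z1 = f(X @ W1 + b1)
    o = f(z1 @ W2 + b2)

    # Backward pass -- Eq. (2): step each weight and bias against the
    # gradient of E = 0.5 * sum_j (t_j - o_j)^2.
    d2 = (o - t) * o * (1 - o)
    d1 = (d2 @ W2.T) * z1 * (1 - z1)
    W2 -= eta * z1.T @ d2
    b2 -= eta * d2.sum(axis=0)
    W1 -= eta * X.T @ d1
    b1 -= eta * d1.sum(axis=0)

print("sum of squared errors:", 0.5 * np.sum((t - o) ** 2))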

In order to determine how long the net must be trained, a holdout sample of 20% of randomly selected observations was set aside from the training sample. The net was then trained on the remainder of the training sample. At every 5000 iterations or epochs, the current network was used to classify the holdout sample and determine its error rate. The error in the reduced training sample and in the holdout sample diminished exponentially before flattening out at 50,000 epochs. We went up to 100,000 epochs in order to ensure that training had not stopped due to a local minimum. The error rate did not change in either the reduced training sample or the holdout sample from 50,000 to 100,000 epochs, thereby ensuring that at 100,000 epochs the net is properly trained and no ``grandmothering'' (rote memorization of the training patterns) occurred during the training process. The two samples were then re-combined to obtain the original training sample, which was subsequently used to train the net for 100,000 epochs. The errors dropped rapidly during training, and were less than 1.7% (max) after 100,000 epochs, which was deemed satisfactory.
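In a modern toolkit, the same architecture and holdout logic might be sketched as follows. This is a scikit-learn analogue we supply for illustration; the synthetic data, learning rate, and patience settings are assumptions rather than the study's actual configuration.

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 396-observation, 11-indicator training sample.
X, y = make_classification(n_samples=396, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)

# Two hidden layers of 40 and 20 neurons, as in the study's network.
# early_stopping holds out 20% of the data and stops once the holdout
# score stops improving, mirroring the holdout procedure described above.
net = MLPClassifier(hidden_layer_sizes=(40, 20), activation="logistic",
                    solver="sgd", learning_rate_init=0.05, max_iter=5000,
                    early_stopping=True, validation_fraction=0.20,
                    n_iter_no_change=50, random_state=0)
net.fit(X, y)
print("training classification rate:", net.score(X, y))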

4.2. Discriminant analysis model

Discriminant analysis is a well-known statistical technique. Similar to the procedures used in previous studies (Altman, 1968; Altman et al., 1977; Sharma and Achabal, 1982), we subjected the data in the training sample to a stepwise discriminant analysis. The estimated discriminant function(s) was used to compute discriminant score(s), which can be used as measures of performance and to classify departments into the three performance groups. Table 1 gives the means and p-values for the training sample (i.e., data for the period 1976–1978). The five indicators selected by stepwise discriminant analysis are also shown in Table 1. As can be readily seen, the means of all the indicators are significantly different across the three groups.

Since there are three groups, a maximum of two discriminant functions can be extracted. Both discriminant functions were statistically significant (p < 0.001). However, the first discriminant function accounts for almost 95% of the total differences among the groups. Thus, for all practical purposes, only one discriminant function is necessary to represent most of the differences among the groups.
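A sketch of this step using scikit-learn is given below. It is our illustration on synthetic data; scikit-learn has no built-in stepwise selection, so all predictors are used here.

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=396, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)

lda = LinearDiscriminantAnalysis()
scores = lda.fit(X, y).transform(X)  # discriminant scores, one column per function

# With three groups at most two discriminant functions are extracted; the
# ratio below shows how much of the between-group variation each function
# captures (about 95% for the first function in the paper's data).
print("functions extracted:", scores.shape[1])
print("explained variance ratio:", lda.explained_variance_ratio_)
print("training classification rate:", lda.score(X, y))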

4.3. Quadratic discriminant analysis model

In linear discriminant analysis it is assumed that the covariance matrices of the groups are equal, whereas quadratic discriminant analysis (QDA) makes no such assumption. When the covariance matrices are not equal, in theory the use of QDA will result in better discrimination and classification rates. However, due to the increased number of additional parameters that need to be estimated, it is quite possible for the classification by QDA to be worse than that of linear discriminant analysis (Dillon and Goldstein, 1984; Sharma, 1996).

Since stepwise QDA is not available, and furthermore since QDA is an extension of linear discriminant analysis, we decided to use the same set of variables that were identified by linear discriminant analysis for classifying observations with QDA. As already mentioned, these selected variables are shown in Table 1.
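A corresponding sketch, again on synthetic data of our own making, fits QDA alongside linear discriminant analysis so the effect of the separate group covariance matrices can be compared directly.

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

X, y = make_classification(n_samples=396, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)

# QDA estimates one covariance matrix per group instead of a pooled one,
# so it is more flexible but has many more parameters to estimate.
lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)
print("LDA training classification rate:", lda.score(X, y))
print("QDA training classification rate:", qda.score(X, y))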

4.4. k-nearest neighbor model

The k-nearest neighbor (KNN) method is a distribution free or non-parametric technique for classifying observations (Cover and Hart, 1967; Hand, 1981). It computes a distance or similarity measure between the observation to be classified and the k nearest neighbors whose group membership is known, and assigns the observation to the group to which the majority of those neighbors belong. Use of the KNN technique requires a measure of distance or similarity and a value of k. The most commonly used measures of distance are the Euclidean distance and the Mahalanobis distance; the Euclidean distance is a special case of the Mahalanobis distance. The Mahalanobis distance is preferred since it is scale invariant and takes into consideration the correlations among the variables.

Loftsgaarden and Quesenberry (1965) suggest that for group g a reasonable value of k is √n_g, where n_g is the number of observations in group g. However, since the value of k should depend on the number of variables and the smoothness of the probability density functions, a more reasonable and practical approach is to use trial and error to identify the value of k that gives the fewest misclassifications (Hand, 1981). We performed such a sensitivity analysis, with the results shown in Table 2 for various values of k for the training and the validation samples. The maximum value of 10 was used as it was greater than √n_g for all three groups. As can be seen, a value of 3 gives the highest correct classification (lowest misclassification) rate for the validation sample.
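The trial-and-error search over k with a Mahalanobis metric can be sketched as follows. This is our scikit-learn illustration; the synthetic data and the train/validation split are assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=528, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

# The Mahalanobis metric needs the inverse covariance matrix of the predictors.
VI = np.linalg.inv(np.cov(X_tr, rowvar=False))

for k in range(2, 11):  # trial-and-error search over k, as suggested above
    knn = KNeighborsClassifier(n_neighbors=k, metric="mahalanobis",
                               metric_params={"VI": VI}, algorithm="brute")
    knn.fit(X_tr, y_tr)
    print(f"k={k:2d}  training={knn.score(X_tr, y_tr):.4f}  "
          f"validation={knn.score(X_va, y_va):.4f}")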

4.5. Multinomial logistic regression model

The ordered multinomial logistic regression model (MNL) is an extension of the logistic regression model for the case where the dependent variable is both polytomous and ordinal, and the underlying construct of the ordinal dependent variable is assumed to be continuous. A brief discussion of the MNL model follows. The interested reader is also referred to Agresti (1984, 1990) and Freeman (1987) for further details.

Table 2
Percent of correct classification by KNN model

Value of k | Training sample | Validation sample
2          | 89.65           | 83.33
3          | 89.39           | 84.09
4          | 87.88           | 83.33
5          | 84.34           | 81.82
6          | 82.32           | 80.30
7          | 81.06           | 78.79
8          | 79.29           | 79.55
9          | 79.29           | 78.03
10         | 77.78           | 76.52

Suppose $p_j$ is the probability of any given observation belonging to the ordinal category $j$. The cumulative odds of belonging to category $j$ or below are given by

$$O_{\le j} = \frac{\sum_{g=1}^{j} p_g}{\sum_{g=j+1}^{G} p_g}, \qquad (3)$$

where $O_{\le j}$ is the cumulative odds of belonging to category $j$ or below and $p_g$ is the probability of an observation belonging to category $g$. MNL assumes that each cumulative odds is a function of the independent variables. That is,

$$\log O_{\le j} = \beta_{j0} + \sum_{k=1}^{K} \beta_{jk} X_k, \quad j = 1, \ldots, G-1, \qquad (4)$$

where the $\beta$'s are the parameters, $X_1, X_2, \ldots, X_K$ are the independent variables, $G$ is the number of categories or groups, and $K$ is the number of variables. If it is assumed that the log odds are proportional across the categories, which can be assessed by the chi-square test for proportionality, then the above equation reduces to

$$\log O_{\le j} = \beta_{j0} + \sum_{k=1}^{K} \beta_{k} X_k, \quad j = 1, \ldots, G-1. \qquad (5)$$

The data in the training sample were subjected to the stepwise MNL procedure. The chi-square statistic for the proportionality test was not significant at an alpha level of 0.05, suggesting that the model given by Eq. (5) is appropriate for this data set. Table 1 also gives the variables selected by the stepwise procedure.
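A proportional-odds model of the form of Eq. (5) can be fitted with statsmodels' OrderedModel, as sketched below on synthetic ordinal data of our own making; the indicator names are made up and no stepwise selection is performed.

import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)

# Synthetic ordinal outcome: a latent performance score cut into three
# ordered groups (0 = unsuccessful, 1 = medium, 2 = successful).
X = pd.DataFrame(rng.normal(size=(396, 3)),
                 columns=["dept_margin", "transactions", "gm_roi"])
latent = X["dept_margin"] + 0.5 * X["transactions"] + rng.logistic(size=396)
y = pd.cut(latent, bins=[-np.inf, -0.5, 0.5, np.inf], labels=False)

# Cumulative-logit model with one shared slope vector and separate
# intercepts for each split, i.e., the proportional-odds form of Eq. (5).
result = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
print(result.params)
print(result.predict(X)[:5])  # one probability per category for each case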

5. Classification results

Table 3 gives the overall correct classification results for the training and the validation samples. The correct classification rate in the training sample ranges from a low of 67.70% for the discriminant analysis model to a high of 99.75% for the neural nets model. Although the classification rate for the training sample is generally not high, it is significantly greater than the naive classification rate of 42.4%.³ The predictive ability of the models is assessed by comparing the predicted classification with the actual classification for the validation sample (i.e., 1979 data). The classification results for the validation sample are also shown in Table 3, and range from a low of 77.27% for the neural net model to a high of 88.64% for the multinomial logistic regression model.

It should be noted that the classification rate actually decreases in the validation sample relative to the training sample for two models (neural nets and KNN). This is expected, since in general the fit of any model could be expected to be lower on the validation data than on the training data on which the model was initially calibrated. Yet, a reverse trend is noted for the remaining three models.

Table 3
Overall correct classifications for training and validation samples

Model                 | Training (%) | Validation (%)
Neural net            | 99.75        | 77.27
Discriminant analysis | 67.70        | 86.36
QDA                   | 68.18        | 81.82
KNN                   | 89.39        | 84.09
MNL                   | 69.44        | 88.64

³ The naive classification rate is obtained by assuming that the highest percent of correct classification can be obtained by arbitrarily classifying all departments as successful (Sharma, 1996). This gives a correct classification rate of 42.4% (168/396).

What accounts for these differences? A likely explanation lies in the intrinsic characteristics of the model types and the characteristics of the data sets. Discriminant models explicitly maximize the ratio of between-group differences to within-group differences. Consequently, the classification rate will be higher if the differences among the groups are sharper. Such is the case for the present data set. Recall that the managers classified the departments in 1979. As discussed earlier, it is possible that the differences among the successful, medium, and unsuccessful groups for the validation sample are much greater or sharper than for the training sample. Such a sharper distinction among the groups would aid the discriminant function in providing a better classification of the groups in the validation sample than in the training sample. This assertion can be verified by examining the statistical distance between groups. As already mentioned, the Mahalanobis distance is one such measure of statistical distance. In order to compute the Mahalanobis distance, observations are assumed to be points in a given p-dimensional space whose dimensions are defined by the performance indicators (Johnson and Wichern, 1988; Sharma, 1996). That is, in the present case, the centroid of each group is assumed to be a point in the 11-dimensional space. Table 4 gives the Mahalanobis distance between pairs of groups. It can be clearly seen that the distance between pairs of groups is statistically significant for both the training and the validation samples. However, the distance between pairs of groups for the validation sample is much greater than that for the training sample, thereby accounting for its greater classification rate. MNL and discriminant analysis are statistical techniques whose performance is expected to be very similar (Afifi and Clark, 1984; Sharma, 1996). Consequently, the pattern of results for the MNL model should be, and is actually found to be, very close to that of discriminant analysis.
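The distance computation behind Table 4 can be sketched as follows. This is a numpy/scipy illustration on synthetic groups; the paper does not specify whether a pooled or per-group covariance matrix was used, so a pooled within-group covariance is assumed here.

import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)

# Two synthetic groups of departments in an 11-dimensional indicator space.
g1 = rng.normal(loc=0.0, size=(56, 11))   # e.g., successful
g2 = rng.normal(loc=1.0, size=(35, 11))   # e.g., unsuccessful

# Pooled within-group covariance, then the Mahalanobis distance between
# the two group centroids.
pooled = ((len(g1) - 1) * np.cov(g1, rowvar=False) +
          (len(g2) - 1) * np.cov(g2, rowvar=False)) / (len(g1) + len(g2) - 2)
d = mahalanobis(g1.mean(axis=0), g2.mean(axis=0), np.linalg.inv(pooled))
print(f"Mahalanobis distance between centroids: {d:.3f}")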

Models such as neural nets, which create classifications by estimating non-linear functions on the training sample data, do not show an increased classification rate in the validation sample based on these sample data characteristics. Such results are also fairly common in the forecasting literature, where the mean absolute deviation (MAD) scores in the validation sample are generally higher than the MAD scores in the training sample upon which the forecasting model parameters were estimated. Similar considerations also prevail for the KNN model, which is essentially a nonparametric technique that uses only the k observations identified as the nearest neighbors and ignores the remaining n − k observations. Furthermore, for classifying observations in the validation sample, KNN uses observations in the training sample, not the validation sample, to determine the nearest neighbors. Therefore, this technique is less susceptible to, or influenced by, changes in the differences among groups between the training and the validation samples. Consequently, the classification in the validation sample would be very similar to that of the training sample. Such a result is indeed noted for our sample data set.

The detailed classifications for each performance category and each model are shown in Table 5. No one model completely dominates the others across the board, even though the discriminant analysis and multinomial logistic regression models are slightly better than the others when averaged across all performance categories (see Table 3). As seen in Table 6, the degree of agreement is also the highest between these two models. It is interesting to note that despite the similarity in characteristics of some of these models, complete agreement is not achieved between them. Different models have their relative advantages and disadvantages, and one may not dominate another. This is perhaps true for other data sets to which these models may be applied as well. Thus the models used in this study are sufficiently different from each other in their classification capability. This diversity in classification capability is what we seek to exploit through our decision framework, and in turn provide better decision making capability to managers.

Table 4
Mahalanobis distances

Pairs of groups                  | Training sample | Validation sample
Successful – Medium performing   | 1.467*          | 7.030*
Successful – Unsuccessful        | 6.718*          | 17.626*
Medium performing – Unsuccessful | 1.814*          | 5.876*

* Significant at p < 0.01.

The average correct classification rate of the five models in Table 5 is the best for the successful category (92.1%), followed by the unsuccessful (87.4%) and medium performing (68.9%) departments. This result is not surprising, since medium performance category departments are more likely to drift into the upper or lower performance group over time, and it is here that managers may have the hardest task of making the ``correct'' classification based on judgment alone. After analyzing the underlying data trends, the different models are also less in agreement with the managers, or with one another, for the medium category. Thus it is for the medium performance category departments that the most significant insights into correct classification may be obtained by combining the results of these five models. This is what we proceed to do next.

6. Integrating the models with the decision framework

In order to understand how these models can be integrated with the suggested decision framework, we first attempt to identify those departments on which complete agreement exists between the five models and the managers. Departments for which there is consensus between the classification by the five models and the classification by the managers strengthen management's confidence in its classification; no further evaluation of these departments is needed. Only those departments for which there is disagreement in the classifications by the models and the managers need further resolution.

Table 5
Detailed classifications for validation sample by performance categories

Actual performance | Model                 | Successful | Medium     | Unsuccessful | Total
Successful         | Neural net            | 51 (91.07) | 4 (7.14)   | 1 (1.79)     | 56
                   | Discriminant analysis | 53 (94.64) | 3 (5.36)   | 0 (0.00)     |
                   | QDA                   | 49 (87.50) | 4 (7.14)   | 3 (5.36)     |
                   | KNN                   | 52 (92.86) | 4 (7.14)   | 0 (0.00)     |
                   | MNL                   | 53 (94.64) | 3 (5.36)   | 0 (0.00)     |
Medium             | Neural net            | 13 (31.71) | 20 (48.78) | 8 (19.51)    | 41
                   | Discriminant analysis | 8 (19.51)  | 32 (78.05) | 1 (2.44)     |
                   | QDA                   | 11 (26.83) | 29 (70.73) | 1 (2.44)     |
                   | KNN                   | 5 (12.20)  | 29 (70.73) | 7 (17.07)    |
                   | MNL                   | 10 (24.39) | 31 (75.61) | 0 (0.00)     |
Unsuccessful       | Neural net            | 2 (5.71)   | 2 (5.71)   | 31 (88.57)   | 35
                   | Discriminant analysis | 0 (0.00)   | 6 (17.14)  | 29 (82.86)   |
                   | QDA                   | 0 (0.00)   | 5 (14.29)  | 30 (85.71)   |
                   | KNN                   | 0 (0.00)   | 5 (14.29)  | 30 (85.71)   |
                   | MNL                   | 0 (0.00)   | 2 (5.71)   | 33 (94.29)   |

Numbers in parentheses are percentages.

Table 6
Percent of departments classified into same performance category by each pair of models

Models                | Neural net | Discriminant analysis | MNL   | KNN   | QDA
Neural net            | –          | 71.21                 | 74.24 | 73.48 | 68.19
Discriminant analysis | 71.21      | –                     | 91.67 | 78.03 | 86.36
MNL                   | 74.24      | 91.67                 | –     | 81.82 | 76.52
KNN                   | 73.48      | 78.03                 | 81.82 | –     | 76.52
QDA                   | 68.19      | 86.36                 | 76.52 | 76.52 | –

Table 7 shows that complete consensus as suggested above is obtained for 56.82% of the observations. At least four of the five models agree with the managers in 79.55% (i.e., 56.82 + 22.73) of the cases, while at least three of the five models agree with the managers in 90.15% of the cases. Depending upon the effort and the time that management is willing to spend on evaluating and re-visiting its decisions, we suggest that managers place confidence in their evaluation if at least three models agree with their original classification. Then only 13 departments, or about 10%, need further attention on an exception basis. Of these 13 departments, 9 belong to the medium performance category, 2 belong to the successful category and 2 belong to the unsuccessful category (see Table 7). Thus a disproportionate share of the medium performing departments contributes to the lack of confidence in the original classifications. Given our earlier discussion, these results confirm our expectation that more disagreement and debate will exist for the medium performance category departments.

The identities of these 13 departments, on which management attention is focused on an exception basis, are shown in Table 8, along with how the five models classified each of these departments relative to the managers' classification. A serious misclassification exists when most of the models agree with one another but differ from the managers' classification. This happens for only two departments – #37 and #57. The models strongly suggest that these two departments, which were classified as medium performing by the managers, should actually be considered successful. In such cases of serious misclassification, the managers should be urged to reconsider their earlier classification in the consensus direction suggested by these models. The underlying logic behind such a recommendation is that these models represent sophisticated artificial intelligence or statistical techniques that can perhaps sense the underlying patterns in the input data better than the judgmental classification of the managers.

In those instances where only one model agrees with the managers, there are five cases in which the remaining four models have consensus among themselves (departments #1, #2, #64, #71, and #84). The managers can accept this consensus position if further evaluation of these departments supports such a position (for example, consider department #1 a medium performer rather than a successful one, and department #84 a successful performer rather than a medium one). We recommend a similar resolution for the remaining two departments (#38 and #42), where three models agree with each other but not with the managers' classification. Managers can accept the consensus of the three models by classifying departments #38 and #42 as successful performers rather than as medium ones.

Table 7
Total number of models that agreed with managers' classification a

Manager's classification | 0        | 1         | 2        | 3          | 4          | 5          | Total
High                     | 0 (0.00) | 1 (1.79)  | 1 (1.79) | 2 (3.57)   | 11 (19.64) | 41 (73.21) | 56
Medium                   | 2 (4.88) | 5 (12.20) | 2 (4.88) | 9 (21.95)  | 10 (24.39) | 13 (31.71) | 41
Low                      | 0 (0.00) | 1 (2.86)  | 1 (2.86) | 3 (8.57)   | 9 (25.71)  | 21 (60.00) | 35
Total                    | 2 (1.52) | 7 (5.30)  | 4 (3.03) | 14 (10.61) | 30 (22.73) | 75 (56.82) | 132

a Numbers in parentheses are percents.

The resolution for the remaining four departments, where only two models agree with the managers (departments #25, #36, #70, and #95), is less clear-cut. However, for the illustrative example used here, less than 4% of the observations need such closer scrutiny. Here judgment will have to be used by the managers to see whether a change in the original classification is warranted. The existing quantitative data and additional qualitative information must be carefully reviewed for these exceptional cases. Managers could retain their original classifications if there is strong confidence in their own subjective evaluation and judgmental understanding of the situation under consideration; alternatively, management could engage in further debate to identify whether any other important factors or indicators of performance were inadvertently omitted which, if included in the data and the model, could substantially change the results. This managerial evaluation of the results is schematically indicated by the dotted feedback arrow from the ``quantitative data'' box to the ``reconcile differences'' box in Fig. 1. Such a process can be followed to obtain consensus for all the observations in the data set, and to obtain better insight into the phenomenon being modeled.

We have shown here how, in reference to our decision framework in Fig. 1, the classification results from the multiple models can be integrated to reconcile differences for those observations or departments that do not find agreement between the outcomes of the models and the managers. Such a decision framework thus helps managers in analyzing and better understanding their decisions.

We suggest that the above framework be used in an adaptive mode with a rolling planning horizon. As new data become available, they can be incorporated into the training sample after dropping the most distant period in the past. For example, in making decisions for, say, 1980, management could use the data for the years 1979, 1978 and 1977 for training or estimating the models.
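A minimal sketch of such a rolling horizon follows; the function name and the three-year window width are illustrative assumptions.

def rolling_windows(years, width=3):
    """Yield (training_years, target_year) pairs for a rolling planning horizon."""
    for i in range(len(years) - width):
        yield years[i:i + width], years[i + width]

for train_years, target_year in rolling_windows([1976, 1977, 1978, 1979, 1980]):
    print(f"estimate models on {train_years} -> classify departments for {target_year}")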

7. Generalizability of the framework

Although the framework is illustrated in this study using a data set from one particular company, the framework itself can be set up in any company context. The data requirements are simple enough, and the data can be assembled in a reasonable time period. The models used are well developed and well defined, and hence similarly generalizable across individual firms. It should be noted, however, that the number and type of competing techniques or models to be employed could be varied based upon the firm context. It is suggested, however, that as done in this study, techniques that are sufficiently different in their design from one another should be included in the decision framework.

Table 8
Departments for which only zero, one, or two models agreed with manager's classification

Dept. | Manager's classification | Neural nets | Discriminant analysis | MNL | KNN | QDA

None of the models agreed with manager's classification
#37   | M | S | S | S | U | S
#57   | M | S | S | S | S | S

Only one model agreed with manager's classification
#1    | S | S | M | M | M | M
#2    | M | S | M | S | S | S
#38   | M | U | S | S | M | S
#42   | M | M | S | S | U | S
#64   | M | S | S | S | M | S
#71   | U | U | M | M | M | M
#84   | M | M | S | S | S | S

Only two models agreed with manager's classification
#25   | U | U | M | M | U | M
#36   | S | S | M | M | S | M
#70   | M | M | S | S | M | S
#95   | M | S | M | S | M | S

S = Successful departments; M = Medium performance departments; U = Unsuccessful departments.

The procedures for how the decision framework should be used, and for how the results of the various models should be combined and interpreted, are valid across a wide variety of business contexts. There is obviously no theory or closed form analytical solution available to determine how many techniques or models must agree with each other, since the exact mathematical form of the misclassification cost structure will vary considerably from one organization to another. Consequently, the exact cut-off point for the number of techniques that must agree with one another cannot be generalized or determined a priori, but must be agreed upon by managerial consensus based upon the individual company context.

The important point to remember is that the usefulness of the framework lies in creating a mechanism for exception reporting that leads managers to better insights for decision making. For example, if the most stringent rule (i.e., 5 out of 5 techniques must agree) is used in this study, then 43.18% (see Table 7) of the cases need further detailed evaluation. If agreement among 4 out of 5 techniques is sought, then only 20.45% of the cases need further detailed evaluation. Finally, under the ``3 out of 5'' rule we used for illustrative purposes here, only 9.85% of the cases need further detailed evaluation. The trade-off is thus between increased management attention and the costs of misclassification.

8. Conclusion

The objective of this paper was to combine multiple models to obtain better decision making capability when working with data in which outcomes are uncertain. We also defined two sources of error that can exist in the measurement of variables in data sets obtained from actual real world firms in practice, including ours. Through our decision framework we showed that competing classification techniques can often be used in a complementary fashion. It was also shown how the results of multiple models can be combined in a decision framework and used to provide management with further insights into the business context within which decisions are made.

The suggested approach, which utilizes competing methodologies, has many attractive features. It is essentially based on combining the strengths of two or more techniques in isolating those cases that have an ambiguous relationship between their performance measures and final classifications. We have also shown that the benefits of following a combined approach, as captured in the decision framework of Fig. 1, would be considerable in those cases where it is difficult for managers to make clear cut classifications.

When important decisions must be made regarding eliminating products or closing down departments which appear to be unprofitable, using this framework could focus management's attention on an exception basis on those products or departments which are in need of further consideration and evaluation. Such an input to decision making could save management time, avert potentially costly errors, and lead to more profitable operations in the long run.

References

Afifi, A.A., Clark, V., 1984. Computer-Aided Multivariate Analysis. Lifetime Learning, CA.
Agresti, A., 1984. Analysis of Ordinal Categorical Data. Wiley, New York.
Agresti, A., 1990. Categorical Data Analysis. Wiley, New York.
Altman, E.I., 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance 23 (September), 589–609.
Altman, E.I., Haldeman, R., Narayanan, P., 1977. ZETA analysis: A new model to identify bankruptcy risk of corporations. Journal of Banking and Finance 1 (June), 29–54.
Archer, N.P., Wang, S., 1993. Application of the back propagation neural network algorithm with monotonicity constraints for two-group classification problems. Decision Sciences 24, 60–75.
Barth, J.R., Brumbaugh, R.D., Sauerhaft, D., Wang, G.H.K., 1989. Thrift institution failures: Estimating the regulator's closure rule. In: Kaufman, G.G. (Ed.), Research in Financial Services. JAI Press, Greenwich, CT.
Booth, D.E., Alam, P., Ahkam, S.N., Osyk, B.B., 1989. A robust multivariate procedure for the identification of problem savings and loan institutions. Decision Sciences 20, 320–333.
Cover, T.M., Hart, P.E., 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory IT-13 (January), 21–27.
Dillon, W.R., Goldstein, M., 1984. Multivariate Analysis. Wiley, New York.
Fishman, M.B., Barr, D.S., Loick, W.J., 1991. Using neural nets in market analysis. Technical Analysis of Stocks and Commodities (April), 18–21.
Freeman, D.H., Jr., 1987. Applied Categorical Data Analysis. Dekker, New York.
Hand, D.J., 1981. Discrimination and Classification. Wiley, New York.
Hopfield, J.J., 1982. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79, 2554–2558.
Hopfield, J.J., Tank, D.W., 1985. Neural computation of decisions in optimization problems. Biological Cybernetics 52, 141–152.
Johnson, R.A., Wichern, D.W., 1988. Applied Multivariate Analysis. Prentice-Hall, Englewood Cliffs, NJ.
Kumar, A., Rao, V.R., Soni, H., 1995. An empirical comparison of neural network and logistic regression models. Marketing Letters 6 (4), 251–263.
Loftsgaarden, D.O., Quesenberry, C.P., 1965. A nonparametric estimate of a multivariate density function. Annals of Mathematical Statistics 36, 1049–1051.
Mistry, S.I., Nair, S.S., 1994. Identification and control experiments using neural designs. IEEE Control Systems Magazine 14 (3), 48–57.
Nelson, M.M., Illingsworth, W.T., 1991. A Practical Guide to Neural Nets. Addison-Wesley, Reading, MA.
Pantalone, C., Platt, M., 1987. Predicting failure of savings and loan associations. AREUEA Journal 15, 46–64.
Patuwo, W., Hu, M.Y., Hung, M.S., 1993. Two-group classification using neural networks. Decision Sciences 24, 825–845.
Rumelhart, D., McClelland, J., 1987. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. MIT Press, Cambridge, MA.
Salchenberger, L.M., Cinar, E.M., Lash, N.A., 1992. Neural networks: A new tool for predicting thrift failures. Decision Sciences 23, 899–916.
Sharma, S., 1996. Applied Multivariate Techniques. Wiley, New York.
Sharma, S., Achabal, D.D., 1982. STEMCOM: An analytical model for marketing control. Journal of Marketing 46 (Spring), 104–113.
Tam, K.Y., Kiang, M.Y., 1992. Managerial applications of neural networks: The case of bank failure predictions. Management Science 38 (July), 926–947.
Urban, G.L., Hauser, J.R., 1992. Design and Marketing of New Products. Prentice-Hall, Englewood Cliffs, NJ.
