This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Data mining-based method for identifying discriminant attributes in sensory profiling



Author's personal copy

Data mining-based method for identifying discriminant attributes in sensory profiling

Michel J. Anzanello, Flavio S. Fogliatto ⇑, Karina Rossini
Federal Univ of Rio Grande do Sul, Industrial Engineering, Av Osvaldo Aranha, 99 – 5o andar, 90035-190 Porto Alegre, RS, Brazil

Article info

Article history: Received 20 November 2009; received in revised form 24 August 2010; accepted 26 August 2010; available online 6 September 2010.

Keywords: Attribute selection; Discriminant attributes; Sample classification; Data mining tools

Abstract

Selection of attributes from a group of candidates to be assessed through sensory analysis is an important issue when planning sensory panels. In attribute selection it is desirable to reduce the list of those to be presented to panelists to avoid fatigue, minimize costs and save time. In some applications the goal is to keep attributes that are relevant and non-redundant in the sensory characterization of products. In this paper, however, we are interested in keeping attributes that best discriminate between products. For that we present a data mining-based method for attribute selection in descriptive sensory panels, such as those used in Quantitative Descriptive Analysis. The proposed method is implemented using Principal Component Analysis and the k-Nearest Neighbor classification technique, in conjunction with Pareto Optimal analysis. Objectives are (i) to identify the set of attributes that best discriminate samples analyzed in the panel, and (ii) to indicate the group of panelists that provide consistent evaluations. The method is illustrated through a case study where beef cubes in stew, used as combat rations by the American Army, are characterized in sensory panels using the Spectrum protocol.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Descriptive Analysis (DA) methods aim at providing sensory profiles of products. In essence, DA protocols call for the evaluation of samples regarding the intensity of a typically large set of attributes. Samples are evaluated individually, and results are expressed on a continuous numerical scale. There is a single number indicating the attribute intensity for each sample, and sensory panel data can thus be treated as quantitative data.

Although widely used by practitioners (Murray, Delahunty, & Baxter, 2001), DA methods present some drawbacks. First, the number of attributes to be assessed by panelists is usually large; some DA profiling protocols may present up to 30 attributes (Carbonell, Izquierdo, & Carbonell, 2007). As a result, data collection tends to be tiring, time consuming and costly. Second, it is not guaranteed that a thorough profiling of samples will include attributes with respect to which samples may be discriminated, although in some applications that is the main purpose of the data collection; e.g. Granitto, Gasperi, Biasioli, Trainotti, and Furlanello (2007). Third, DA methods do not offer any structured way to score panelists.

Attribute selection is an important research topic in the area of sensory evaluation. Selection may be aimed at (i) identifying a subset of non-redundant attributes that best describe products, or at (ii) finding attributes that best discriminate between products. With respect to objective (i), authors such as Dijksterhuis, Frost, and Byrne (2002), Westad, Hersleth, Lea, and Martens (2003), and Sahmer and Qannari (2008) have proposed the use of several multivariate projection methods to select a subset of relevant and/or non-redundant attributes from a larger group. With respect to objective (ii), and methodologically aligned with the propositions in this paper, Granitto et al. (2007) introduced the use of data mining tools (more specifically, random forest heuristics) to select attributes that best discriminate products. In all cases, sensory profiling was used to characterize products and the drawbacks listed above were partially addressed.

In this paper we propose the combined use of multivariate projection methods and data mining tools to select relevant attributes in sensory profiling. We consider relevant attributes those that best discriminate products evaluated in a panel. Our method is implemented in six steps. First, panelists are ranked using a consistency index, and those whose evaluations differ from the rest of the group have their data omitted from the dataset, using a "leave one panelist out at a time" approach. S datasets are produced, each comprised of evaluations from a subgroup of panelists; the minimum number of panelists in a subgroup is user-defined. Remaining steps are implemented S times, once for each dataset obtained in the initial step. For a given dataset we run a Principal Component Analysis (PCA) and compute attribute importance indices based on PCA weights. We then classify products in the dataset using the

0950-3293/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.foodqual.2010.08.010

⇑ Corresponding author. Tel.: +55 51 3308 4294; fax: +55 51 3308 4007. E-mail address: [email protected] (F.S. Fogliatto).

Food Quality and Preference 22 (2011) 139–148

Contents lists available at ScienceDirect

Food Quality and Preference

journal homepage: www.elsevier.com/locate/foodqual


k-Nearest Neighbor (KNN) algorithm and compute the classification accuracy. The attribute with the lowest importance index is removed from the dataset; products are classified again and a new accuracy value is produced. The process is repeated until there is only one attribute remaining. Accuracy results are plotted in a graph and the entire procedure is repeated for the next dataset. Results from each iteration are collapsed in a single accuracy plot, and the best subset of attributes and panelists is determined using Pareto Optimal analysis.

We envision three relevant contributions in the method we propose, as explained next.

First, we present an attribute selection method that simultaneously identifies discriminant attributes and consistent panelists. More specifically, the discriminative power of different subsets of attributes is measured in the light of evaluations performed by different groups of panelists. The objective is to simultaneously identify the best set of discriminant attributes, keeping those panelists whose evaluations improve the attributes' capability to discriminate between products. Other attribute selection propositions in the literature focus exclusively on the minimization of retained attributes; e.g. Granitto et al. (2007).

Second, we combine Principal Component Analysis with the KNN classification algorithm to obtain an efficient attribute selection method. Our method uses KNN as the classification technique because of its good performance in practical applications in the data mining field, conceptual simplicity, and wide availability in computational packages; see Chaovalitwongse, Fan, and Sachdeo (2007) and Anzanello, Albin, and Chaovalitwongse (2009).

Third, we use Pareto Optimal (PO) analysis to identify a limited number of distinctive solutions that maximize the classification accuracy and minimize the number of retained attributes, as well as the number of panelists. PO has been employed in a large variety of applications, such as the analysis of life cycles of chemical products in Azapagic (1999), scheduling of manufacturing operations in Taboada and Coit (2008), and reliability optimization in power transmission in Taboada and Coit (2007).

To illustrate the proposed attribute selection method, we apply it in a case study involving the military product beef stew MREs, or meals-ready-to-eat (Fogliatto, Albin, & Tepper, 1999). The product is beef stew in plastic pouches used as field rations for soldiers. The product is made at a pilot plant, funded by the DLA/DOD, and located at Rutgers University, USA. Eight different products were evaluated by a 9-member sensory panel with respect to 26 sensory attributes. Our method reduced the number of panelists and attributes needed to discriminate between products to 5 and 17, respectively.

The rest of this paper is organized as follows. Section 2 gives a brief literature review on attribute selection approaches proposed in the sensory field, and introduces some of the tools applied in the method we propose. Section 3 presents the sequence of steps for applying the method. Section 4 shows the numerical results from application of the method to the case, followed by conclusions in Section 5.

2. Background

Variable and feature selection is an important research topic in areas where large datasets are common, such as Genetics and Linguistics. The objectives of variable selection are to improve the performance of predictors, and to better describe the process generating the data. Strategies for variable selection have been reviewed by Guyon and Elisseeff (2003), with a special emphasis on ranking methods. Such methods are based on ordering variables according to an importance index, and then reducing the dataset by discarding the ones with the smallest index scores (Gauchi & Chagnon, 2001).

In sensory analysis, ranking is commonly performed through ANOVA using the attributes' F-values as an importance index. If a multivariate index is desired, projection methods such as principal component analysis (PCA) or STATIS may be used. We now review some works on attribute selection for discriminating purposes in sensory analysis. With the exception of Granitto et al. (2007), where data mining tools are applied, all approaches are based on combinations of multivariate techniques and ANOVA.

In Linear Discriminant Analysis (LDA), data are assumed to follow a multivariate normal distribution, with a common covariance matrix for all categories (Ripley, 1996). The Mahalanobis distances of each object from the centroids of the categories are computed, and objects are assigned to the category with the smallest distance. The delimiter between two categories is a linear function, which can be a straight line in the case of two independent variables, for example. Ranking of attributes in LDA is typically performed using the prediction error rate in a validation sample as the importance index. A traditional application of LDA in attribute selection may be found in Rason, Marin, Dufour, and Lebecque (2007), while Granitto, Biasioli, Endrizzi, and Gasperi (2008) extend the use of LDA to more elaborate methods. In comparison with our method, LDA-based attribute selection procedures do not test different subsets of discriminant attributes in search of the best subset. The procedure is to score attributes based on their performance in a validation sample and choose a cut-off point to determine the subset of attributes to be retained. On the other hand, our method iteratively assesses the quality of classification by extensively testing different subsets of attributes, in search of the best classification accuracy.
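The LDA assignment rule just described (compute the Mahalanobis distance of an object to each category centroid under a pooled, common covariance matrix, and assign the object to the closest category) can be sketched as follows. This is an illustrative Python sketch of the textbook rule, not code from any of the cited works; all function and variable names are ours.

```python
import numpy as np

def lda_assign(X_train, y_train, x_new):
    """Assign x_new to the class whose centroid is closest in
    Mahalanobis distance, using a pooled covariance matrix common
    to all categories, as LDA assumes. Illustrative sketch only."""
    classes = np.unique(y_train)
    n, p = X_train.shape
    pooled = np.zeros((p, p))
    centroids = {}
    for c in classes:
        Xc = X_train[y_train == c]
        centroids[c] = Xc.mean(axis=0)
        # Accumulate within-class scatter for the pooled estimate.
        pooled += (len(Xc) - 1) * np.cov(Xc, rowvar=False)
    pooled /= (n - len(classes))
    inv = np.linalg.inv(pooled)
    # Squared Mahalanobis distance from x_new to each centroid.
    d = {c: float((x_new - m) @ inv @ (x_new - m))
         for c, m in centroids.items()}
    return min(d, key=d.get)
```

Note that the boundary between any two categories is linear in x_new, as stated in the text, because the quadratic term cancels when the covariance matrix is shared.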

ANOVA may be used solely to determine if significant differences exist between products or panelists. However, applying ANOVA to each sensory attribute increases the probability of Type I error (Tabachnick & Fidell, 1996). To overcome that, ANOVA may be used in conjunction with PCA for attribute selection purposes. A more usual approach consists of applying ANOVA on scores obtained from a PCA on the raw sensory data matrix, as reported by Chabanet (2000) and Westad et al. (2003). The data matrix in this case presents combinations of products and panelists in the rows, and attribute assessments in the columns. In case significant effects are present, Fisher LSD post hoc tests may be employed to explore the differences between individual products or panelists. Luciano and Naes (2009) refer to this approach as PCA-ANOVA.

Alternatively, ANOVA may be used on each attribute data matrix separately, followed by PCA on the matrices of estimated main effects and interactions. Such a procedure, known as ASCA (Jansen et al., 2005), uses PCA to interpret the results from the ANOVA. ASCA and the method in which PCA is followed by ANOVA are compared by Luciano and Naes (2009) in a dataset from a sensory analysis of a candy product, with similar results. In opposition to our proposed method, the aforementioned approaches neither test the performance of subsets of attributes regarding their classification power nor jointly verify the best subset of panelists.

Granitto et al. (2007) proposed the use of random forests (RFs) to select discriminating attributes in samples of cheese. RFs are sets of heuristically created decision trees such that differences between trees are maximized. The discriminating function derived from RFs assigns importance weights to attributes; such weights may be used to select a reduced number of attributes, while keeping the discriminative power of the function at a desired level. The authors were the first to propose the use of data mining tools in the context of sensory profiling; however, in their approach panelist selection is not performed simultaneously with attribute selection, as proposed in our paper. In addition, the RF technique, a rather complex classification tool if compared to the KNN algorithm proposed here, is not available in most statistical packages.

A comprehensive theoretical survey of variable selection approaches is reported in Liu and Yu (2005), while a comparison of algorithms for that purpose is presented by Kudo and Sklansky (2000): the best method combines the leave-one-out rule and the KNN classification technique, backing our proposition.

Some classical approaches for variable selection arise from the food manufacturing industry, although not directly dealing with sensory attribute selection. Several approaches use PCA to identify the variables with most variance and then apply tools for classification and clustering (e.g. data mining, discriminant and clustering techniques) using the selected variables; see Mallet, De Vel, and Coomans (1998) and Guo, Wu, Massart, Boucon, and Jong (2002). A method to reduce dimensionality in industrial wine manufacturing was reported by Urtubia, Perrez-Correa, Soto, and Pszczolkowski (2007): PCA was first used to select the variables carrying most of the metabolite interaction information, and classes of similar behavior were then generated by applying the K-means clustering technique on the lower-dimensioned dataset. In Camara, Alves, and Marques (2006), PCA was associated with discriminant analysis to differentiate and classify wines; coefficients of discriminant functions were used to identify the relevant variables.

A similar study was conducted by Rebolo, Pena, Latorre, Botana, and Herrero (2000) for testing the authenticity of wines produced in a specific region. Variables describing chemical substances were initially selected using cluster analysis and PCA techniques, followed by a stepwise Bayesian analysis. The study also aimed at categorizing the wines in classes according to their origin. With identical purposes, Marini, Bucci, Magri, and Magri (2006) studied two classification techniques, namely Soft Independent Modeling of Class Analogies (SIMCA) and UNEQ, to verify the authenticity of Italian wines. SIMCA describes the similarities among products in a category using a principal component analysis (Wold & Sjostrom, 1977), while UNEQ is a multivariate normal class model assuming an individual dispersion (i.e. unequal dispersion) of each class, similar to a Quadratic Discriminant Analysis (Derde & Massart, 1986). The identification of the relevant variables was performed by means of a stepwise Linear Discriminant Analysis. In Capron, Smeyers-Verbeke, and Massart (2007), a wine dataset consisting of 63 process variables was evaluated using decision trees and modifications of Partial Least Squares (PLS) regression to identify the most important variables aimed at classifying wines into 4 different classes (i.e. countries of origin).

Next we provide some insight on two of the analytical tools used in our method: the KNN classification algorithm and Pareto Optimal analysis.

The k-Nearest Neighbor is a data mining technique for classifying objects based on the closest training examples in the variable space. KNN is among the simplest algorithms for classification of observations (Duda, Hart, & Stork, 2001).

Consider observations in a J-dimensional dataset, corresponding to the J attributes, and two classes of products (A or B). The objective is to classify a new observation in A or B based only on attributes. We consider the k nearest neighbors of the new observation, where distance is measured by Euclidean distance. For each of the k neighbors we identify the class, A or B. One way to classify the new observation is by majority voting: the new observation is in class A if the majority of its k nearest neighbors is in A. If k = 1, then the observation is simply assigned to the class of its nearest neighbor. The number of neighbors, k (a positive integer, typically small), is selected by maximizing the accuracy of classification in the dataset where the class of each observation is known. Further details about the KNN classification technique can be found in Wu et al. (2008).
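The majority-voting rule described above can be sketched in a few lines. This is a minimal illustration of the KNN rule, not the authors' Matlab implementation; the toy data and names are ours.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training
    observations, with Euclidean distance. Minimal sketch of the
    rule described in the text, not a tuned implementation."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest rows
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]        # majority class among neighbors
```

For example, with three training observations near the origin labeled A and three near (5, 5) labeled B, a new observation at (0.2, 0.2) is voted into class A by its three nearest neighbors.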

The advantages of KNN are that it is conceptually simpler and more intuitive than other classification techniques, and widely available in software packages. Further, KNN requires only one parameter, k, and the classification accuracy is not too sensitive to this choice within a reasonable range. Due to its simplicity, KNN has been applied in a wide variety of contexts, including text recognition patterns in Weiss et al. (1999), detection of abnormal brain activity in Chaovalitwongse et al. (2007), and classification of production batches in Anzanello et al. (2009).

Pareto Optimal (PO) analysis identifies a set of distinctive solutions in applications with multiple objective functions. These functions frequently do not present a unique solution, but a set of suitable solutions. Variable selection applications where classification performance measures are maximized and the number of retained variables is minimized are examples of scenarios with several possible solutions. Due to its practical applicability, PO analysis has been widely integrated into algorithms and approaches for optimization purposes, as reported in Deb, Pratap, Agarwal and Meyarivan (2002), and Deb, Thiele, Laumanns and Zitzler (2002).

Solutions identified by Pareto Optimal analysis are named non-dominated, meaning that they cannot be surpassed by other neighbor solutions in the evaluated objectives. That enables a significant reduction in the number of potential solutions, in that the analysis can be focused on a small set of effectively better solutions. These solutions are typically illustrated on a boundary named the Pareto frontier. The identification of the best solution may depend on subjective information, and may become complex as the number of objective functions increases; see Horn, Nafpliotis, and Goldberg (1994), Zitzler and Thiele (1999), and Taboada and Coit (2008) for details.
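Non-dominance can be checked pairwise. The sketch below, with names of our choosing, filters triples of (classification accuracy, number of retained attributes, number of panelists), where accuracy is to be maximized and the other two minimized, keeping only the non-dominated solutions; it is an illustration of the concept, not the analysis code used in the paper.

```python
def pareto_frontier(solutions):
    """Return the non-dominated (accuracy, n_attributes, n_panelists)
    triples: accuracy maximized, the other two objectives minimized.
    A solution is dominated if another one is at least as good on
    every objective and strictly better on at least one."""
    def dominates(a, b):
        at_least_as_good = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
        strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
        return at_least_as_good and strictly_better
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

A solution with lower accuracy but fewer attributes than every other candidate still survives on the frontier, which is exactly why the final choice among frontier points may require subjective information, as noted above.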

3. Method

The method to select attributes that best discriminate products relies on six operational steps: 1. Measure panelists' consistency using a suitable index. 2. Apply a multivariate technique on the dataset consisting of sensory attributes; in our method a Principal Component Analysis (PCA) is used, but other techniques may be considered. 3. Compute a vector of attribute importance indices based on PCA weights. 4. Classify the sensory dataset using the k-Nearest Neighbor (KNN) technique and compute the classification accuracy; then eliminate the attribute with the lowest importance index, classify the dataset again, and re-compute the accuracy, continuing this iterative process until there is only one attribute remaining. 5. Construct an accuracy graph. 6. Remove the least consistent panelist and perform a new attribute selection, repeating Steps 2 to 6. These operational steps are detailed in the subsections to follow; Matlab codes used to perform the analyses are given in the Appendix.

Let p (p = 1, …, P) denote the panelists, j (j = 1, …, J) the attributes, and i (i = 1, …, I) the products analyzed in a sensory panel. Panelists' evaluations of a product regarding all attributes may be repeated d (d = 1, …, D) times. Consider a sensory data matrix X comprised of (P × I × D) rows and J columns, with element x_pid,j denoting panelist p's evaluation of attribute j in the d-th replication on product i. Matrix X may be deployed into P individual matrices X_p, each containing the assessments of a given panelist on all attributes and products. The method's objective is to correctly classify the (P × I × D) observations in X into I product classes by properly choosing discriminant attributes and consistent panelists.


Step 1: Measure panelists’ consistency

Measure panelists' consistency using a suitable index. We suggest using the index associated with the weighted average configuration proposed by Ledauphin, Hanafi, and Qannari (2006). The index has several interesting characteristics. First, location and dispersion effects in panelists' assessments are removed through pre-treatment of the data matrix. Second, using an analytical framework similar to the STATIS method (Lavit, Escoufier, Sabatier, & Traissac, 1994), the index highlights panelists presenting a different understanding of the attributes from the rest of the group; such non-performance indication is aligned with one of our method's objectives, which is to identify the group of best performing panelists. Third, the index implementation requires straightforward calculations, as presented next.

Start by centering and reducing the data matrices X_p. For that, first subtract from all column entries the column average to obtain centered data matrices Xc_p. Next, obtain matrices Y_p by multiplying each matrix Xc_p by a scalar h_p = 1/√(t_p), where t_p is the sum of the squares of all entries in Xc_p, i.e. t_p = trace(Xc_p′ Xc_p), with Xc_p′ denoting the transpose of Xc_p. Formally, Y_p = h_p Xc_p.

The weighted average configuration of the sensory dataset is obtained as follows. Consider matrices Y_k and Y_l from panelists k and l. Determine a (P × P) matrix S with entries corresponding to a similarity measure between panelists k and l, given by s_kl = (1 + t_kl)/2, where t_kl = trace(Y_k′ Y_l), for k, l = 1, …, P. Determine the eigenvector b′ = [b_1, …, b_P] corresponding to the largest eigenvalue of S, normalized such that Σ_{p=1}^{P} b_p = 1.

The weighted average configuration is a compromise matrix C that takes into account the panelists' performances. Formally, C = Σ_{p=1}^{P} b_p Y_p. The performance index for panelist p is given by

α_p = trace(Y_p′ C) / √(trace(C′ C)).   (1)

The alpha values vary in the interval from −1 to +1. A panelist with α close to −1 is in complete disagreement with the rest of the panel, while a panelist with α equal to +1 is in perfect agreement with the rest of the group. Panelists are then ranked according to their alpha values, and the panelist with the smallest α_p is the first to be removed from the analysis. Each panelist removal is followed by an attribute selection procedure, detailed in Steps 2 to 6.
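The calculations of Step 1 (centering, scaling by h_p, similarity matrix S, compromise matrix C, and index α_p) can be sketched as below. This is our Python reading of the formulas above, not the authors' Appendix code; details such as the handling of the eigenvector sign are our assumptions.

```python
import numpy as np

def consistency_indices(X_list):
    """Consistency indices alpha_p for P panelists, following the
    weighted average configuration. X_list holds one assessment
    matrix per panelist (same shape for all). Sketch only."""
    # Pre-treatment: column-center, then scale to unit total sum of
    # squares (Y_p = h_p * Xc_p with h_p = 1/sqrt(t_p)).
    Y = []
    for X in X_list:
        Xc = X - X.mean(axis=0)
        t = np.trace(Xc.T @ Xc)              # sum of squared entries
        Y.append(Xc / np.sqrt(t))
    P = len(Y)
    # Similarity matrix S with s_kl = (1 + trace(Y_k' Y_l)) / 2.
    S = np.array([[(1 + np.trace(Y[k].T @ Y[l])) / 2 for l in range(P)]
                  for k in range(P)])
    # Leading eigenvector of S, normalized so its entries sum to 1
    # (the normalization also fixes the arbitrary sign from eigh).
    _, vecs = np.linalg.eigh(S)
    b = vecs[:, -1]
    b = b / b.sum()
    C = sum(b[p] * Y[p] for p in range(P))   # compromise matrix
    denom = np.sqrt(np.trace(C.T @ C))
    return [np.trace(Y[p].T @ C) / denom for p in range(P)]
```

As a sanity check, two panelists whose matrices differ only by a positive scale factor receive identical Y_p after pre-treatment, and hence α = 1, while a panelist whose (centered) scores are the exact negatives of theirs receives α = −1.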

Step 2: Apply a multivariate technique on the dataset consisting of sensory attributes

Characterize the relationship among attributes in matrix X using a multivariate technique, considering attributes as variables in the analysis. We recommend using PCA on matrix X. PCA outputs of interest are the component weights w_jr and the percentage of variance explained by each retained component r (r = 1, …, R). The number of retained components R is defined based on the amount of variance they explain, as in Montgomery, Peck, and Vining (2001).

Step 3: Generate attribute importance indices (z)

Generate an attribute importance index to guide the removal of attributes not relevant for classification purposes, as proposed in Anzanello et al. (2009). Attribute j's index is denoted by z_j, j = 1, …, J. The higher the value of z_j, the more important the corresponding attribute is for classifying observations into product classes.

The index z_j is generated based on the PCA weights w_jr, as in Eq. (2). Attributes with large weights are preferred. According to Duda et al. (2001), such attributes should lead to a better discrimination of observations into classes of products, although exceptions may occur.

z_j = Σ_{r=1}^{R} |w_jr|,   j = 1, …, J.   (2)
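Eq. (2) can be computed directly once the PCA weights are available. In the sketch below the weights are obtained via an SVD of the column-centered data; that implementation choice (and all names) is ours, not prescribed by the paper.

```python
import numpy as np

def importance_indices(X, R=2):
    """Importance index z_j = sum over the R retained components of
    |w_jr|, where w_jr are the PCA component weights. PCA is done
    here via SVD of the column-centered data; a sketch, not the
    authors' Matlab code."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the component weight vectors w_r.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:R].T                       # (J x R) weights of retained components
    return np.abs(W).sum(axis=1)       # z_j, one value per attribute
```

An attribute with near-zero variance contributes almost nothing to the leading components, so its z_j is close to zero and it is among the first candidates for removal in Step 4.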

Step 4: Classify the dataset using KNN and eliminate irrelevant and noisy attributes

Categorize the (P × I × D) dataset observations into I classes of products on all J attributes using KNN, and compute the classification accuracy. Accuracy is defined as the ratio between the number of correct classifications and the number of performed classifications. Parameter k for the KNN algorithm is selected by cross-validation on the sensory dataset, as in Chaovalitwongse et al. (2007).

Attribute elimination starts by identifying the attribute with the smallest z_j. Remove the selected attribute, perform a new classification using KNN on the J − 1 remaining attributes, and compute the classification accuracy. This procedure is repeated, removing the next attribute with the smallest z_j and applying KNN on the remaining attributes, until there is only one attribute left.
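Step 4's elimination loop can be sketched as below. The accuracy estimator here is a simple leave-one-out KNN, which is our simplification for illustration; the paper defines accuracy as the fraction of correct classifications without prescribing this particular estimation scheme.

```python
import numpy as np
from collections import Counter

def knn_accuracy(X, y, k=3):
    """Leave-one-out KNN accuracy (our simplified estimator)."""
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                           # exclude the observation itself
        nearest = np.argsort(d)[:k]
        vote = Counter(y[j] for j in nearest).most_common(1)[0][0]
        correct += (vote == y[i])
    return correct / len(X)

def backward_elimination(X, y, z, k=3):
    """Repeatedly drop the attribute with the smallest importance
    index z_j, recording KNN accuracy at each stage. Returns a list
    of (retained attribute indices, accuracy) pairs."""
    order = list(np.argsort(z))                 # least important first
    retained = list(range(X.shape[1]))
    profile = []
    while retained:
        profile.append((list(retained), knn_accuracy(X[:, retained], y, k)))
        if len(retained) == 1:
            break
        retained.remove(order.pop(0))           # drop least important attribute
    return profile
```

The resulting list of (attribute subset, accuracy) pairs is exactly what is plotted as one accuracy profile in Step 5.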

Step 5: Construct an accuracy graph

Construct a graph relating classification accuracy to the number of retained attributes. In case classification accuracy is the only optimization criterion considered, the maximum accuracy indicates the best subset of attributes to be retained for classification. In case of alternative subsets with identical accuracy values, choose the one with the smallest number of retained attributes.

Step 6: Remove the least consistent panelist and perform a new attribute selection

Remove the panelist with the lowest α_p and repeat Steps 2 to 6 for the sensory data consisting of the J original attributes and the remaining panelists. Add the new accuracy profile to the accuracy graph in Step 5. Note that since the elimination of each panelist leads to a new attribute selection based on the evaluations of the remaining panelists, one of the procedure outputs will be a set of accuracy profiles. Repeat this iterative procedure until a user-defined lower bound of remaining panelists is reached.

A final solution is obtained by identifying the maximum global accuracy (peak) on the set of accuracy profiles, as exemplified in Fig. 1. The peak identifies both the best group of panelists and the best subset of attributes to be considered in classification procedures. Pareto Optimal (PO) analysis may also be used to identify a limited number of solutions that maximize the classification accuracy and minimize the number of retained attributes. Such analysis may be particularly helpful when the graph presents multiple (local) peaks.

Fig. 1. Accuracy profiles as attributes and panelists are eliminated.

4. Case example

We now apply the suggested method to select discriminant attributes and consistent panelists in a sensory dataset. Twenty-six sensory attributes, listed in Table 1, are evaluated by nine trained subjects in a sensory panel; the resulting dataset is available upon request. Attributes are appearance-, flavor-, and texture-related. Products analyzed are different formulations of stewed beef cubes in gravy, conditioned inside thermostable pouches and processed in a retort. Evaluations are performed following the Spectrum Method, a QDA technique. In the Spectrum Method, products are evaluated individually by each panelist regarding a set of sensory attributes [see Meilgaard, Civille, and Carr (1999) for a description of the method]. Products were prepared through the Combat Ration Advanced Manufacturing Technology Development Program at the Food Manufacturing Technology Facility in Piscataway, NJ, in 1994. Formulations were based on military specifications for beef stew in meal-ready-to-eat pouches. Eight formulations were tested.

In the analyses to follow, attributes marked with (*) were not considered due to missing observations from some panelists. Each panelist repeated product evaluations four times on twenty-four attributes; a total of 768 observations were obtained from each panelist. The dimension of matrix X containing evaluations from all panelists is (288 × 24). In the analysis reports, panelist p is referred to as Pp.

A consistency analysis on panelists using the alpha values in Eq. (1) as consistency index is performed; Table 2 depicts the resulting alpha readings. Panelist 9 (P9) is the least consistent and the first to be removed for attribute selection, followed by P7 and similarly thereafter. In general, panelists display high alpha values, indicating good group consensus in the evaluations performed.

We then apply PCA to the dataset. We set R = 2 as the number of components to retain on each PCA performed after a panelist is removed. This number of components explained 63% or more of the variance in the attributes; additional components explained residual portions of the variance, thus not justifying their inclusion.

As for the KNN, parameter k = 3 was defined using five-fold cross-validation. We first define a suitable interval of odd k values [1–11] and use those k's to classify 80% of the dataset; the k yielding the maximum classification accuracy is then used to classify the remaining 20% of the observations. That procedure is repeated several times, and the k leading to the highest average accuracy on the 20% portion is chosen.

Table 1Sensory attributes evaluated in the experiment.

Appearance attributes Flavor attributes Texture attributes

(1) Ratio of gravy to meat (4) Cooked lean beef (15) Viscosity of thegravy

(2) Visual thickness ofsauce

(5) Beef broth (16) Springiness of themeat

(*) Hue/value/chroma of thegravy

(6) Organy (17) Initialcohesiveness of themeat

(*) Hue/value/chroma of thebeef

(7) Hydrolyzedvegetable protein

(18) Denseness of themeat

(3) Uniformity of size andshape of the beef

(8) Coagulated beefblood

(19) Firmness of themeat

(9) Cardboardy (20) Chewiness of themeat

(10) Browned (21) Fibrousness of themeat

(11) Beef fat (22) Flaking of themeat

(12) Salty (23) Dryness of thebolus

(13) Metallic feelingflavor

(24) Oily film

(14) Heat

Table 2Alpha values for panelists.

Panelist ID Alpha

P1 0.8386P2 0.8205P3 0.8157P4 0.8127P5 0.8104P6 0.8077P7 0.7676P8 0.6959P9 0.0250

0.45

0.5

0.55

0.6

0.65

0.7

0.75

0 5 10 15 20 25

Cla

ssifi

catio

n ac

cura

cy

Number of retained attributes

All

P9 out

P9. P7 out

P9. P7. P6 out

P9. P7. P6. P5 out

P9. P7. P6. P5. P2 out

Frontier Point

Fig. 2. Accuracy profiles as attributes and panelists are eliminated.

M.J. Anzanello et al. / Food Quality and Preference 22 (2011) 139–148 143

Page 7: Data mining-based method for identifying discriminant attributes in sensory profiling

Author's personal copy

eral times, and the k leading to the highest average accuracy on the20% portion is chosen.
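The k-search can be sketched as follows. This is a simplified version of the procedure (function names ours, plain Euclidean KNN in numpy) that evaluates each odd k on repeated 80/20 splits, rather than reproducing the paper's exact five-fold protocol:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority-vote k-nearest-neighbor prediction with Euclidean distance."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(d)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

def select_k(X, y, ks=(1, 3, 5, 7, 9, 11), n_repeats=5, seed=0):
    """Pick the k with highest mean hold-out accuracy over repeated 80/20 splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = {k: [] for k in ks}
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        cut = int(0.8 * n)
        tr, te = idx[:cut], idx[cut:]
        for k in ks:
            acc = np.mean(knn_predict(X[tr], y[tr], X[te], k) == y[te])
            scores[k].append(acc)
    return max(ks, key=lambda k: np.mean(scores[k]))
```

Odd k values avoid voting ties in two-class settings; with eight product classes, ties are still broken arbitrarily by `np.unique` order.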

Attribute selection is performed after each panelist elimination, leading to the set of accuracy profiles given in Fig. 2. Panelist elimination stops when a user-defined lower bound of 5 remaining panelists is reached. Frontier points from the PO analysis are identified in the accuracy graph, where FP1 is the leftmost frontier point. These points are also detailed in Table 3.

A maximum accuracy of 71.4% is obtained when the evaluations of P9, P7, P6, and P5 are removed from the original dataset and 17 of the original 24 attributes are retained. That leads to classifications 12% more accurate than using KNN on all 24 original attributes. Retained attributes are 20, 18, 16, 19, 22, 17, 4, 21, 3, 6, 23, 10, 12, 11, 1, 2, and 8, in decreasing order of relevance. Note that the classification accuracy using 24 attributes and 9 panelists is lower than the one obtained with 17 attributes and 5 panelists; this is explained by the removal of noisy attributes and inconsistent panelists from the dataset.

Alternative solutions correspond to the frontier points in Table 3; one of them clearly appears promising. FP5 displays a situation in which 5 panelists evaluating only 7 attributes attain a classification accuracy of 70.6%. Reducing the original group of attributes by 70% without severely compromising classification accuracy is highly desirable if (i) stress on panelists is to be minimized, and (ii) a low-cost confirmatory sensory panel is to be performed.
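The frontier itself is easy to recover from the accuracy profiles. In this minimal sketch (naming ours), a candidate solution is kept only when no other candidate has both fewer-or-equal attributes and greater-or-equal accuracy:

```python
def pareto_frontier(points):
    """Return the Pareto-optimal (n_attributes, accuracy) pairs:
    fewer attributes and higher accuracy both count as better."""
    frontier = []
    for n_attr, acc in points:
        dominated = any(
            (m <= n_attr and a >= acc) and (m, a) != (n_attr, acc)
            for m, a in points
        )
        if not dominated:
            frontier.append((n_attr, acc))
    return sorted(frontier)
```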

5. Conclusion

Reducing the number of attributes to be analyzed in sensory profiling experiments has been an object of research in recent years. The objective is to identify a subset of meaningful and non-redundant attributes that allows discrimination of the samples tested in the panel. Such optimization is desired to save panelists time and fatigue, leading to less costly data collection procedures. Many of the attribute selection approaches in the literature focus on selecting attributes that carry large amounts of variation. Our scope, though, is twofold: identifying attributes that (i) lead to the most accurate classification of products and (ii) are informative in terms of the proportion of variance explained in the multivariate dataset.

We propose a method for selecting the important attributes to be used in the classification of products assessed in a sensory panel. The method is: (1) measure panelists' consistency using a suitable index; (2) apply PCA to the dataset of sensory attributes; (3) compute a vector of attribute importance indices based on the PCA weights; (4) classify the sensory dataset using the k-Nearest Neighbor (KNN) technique and compute the classification accuracy, then iterate by eliminating the attribute with the lowest importance index, classifying the dataset again, and re-computing the accuracy; (5) construct an accuracy graph; and (6) remove the least consistent panelist and perform a new attribute selection, repeating steps 2 to 6.
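Steps (3)–(5) can be sketched as a backward-elimination loop. In this hedged sketch (numpy, function names ours), attribute importance is taken as the absolute PCA loadings weighted by each retained component's explained variance; the paper's exact importance index may be defined differently, and `accuracy_fn` stands in for any classifier evaluation, such as KNN hold-out accuracy:

```python
import numpy as np

def attribute_importance(X, n_components=2):
    """One common importance index: absolute PCA loadings weighted by the
    explained-variance ratio of the retained components."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = (s**2 / (s**2).sum())[:n_components]
    loadings = np.abs(Vt[:n_components])      # (n_components x n_attributes)
    return var_ratio @ loadings               # weighted score per attribute

def backward_elimination(X, y, accuracy_fn, n_components=2):
    """Repeatedly drop the least important attribute, recording the retained
    attribute indices and the classification accuracy at each step."""
    kept = list(range(X.shape[1]))
    profile = []
    while len(kept) > 1:
        profile.append((list(kept), accuracy_fn(X[:, kept], y)))
        imp = attribute_importance(X[:, kept], n_components)
        kept.pop(int(np.argmin(imp)))
    profile.append((list(kept), accuracy_fn(X[:, kept], y)))
    return profile
```

The resulting `profile` is exactly what the accuracy graph of step (5) plots: accuracy versus number of retained attributes.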

We apply the proposed method to a descriptive analysis dataset comprised of evaluations performed by 9 panelists on 24 attributes of 8 different product formulations. A maximum accuracy of 71.4% is obtained when four panelists' evaluations are removed from the original dataset and 17 of the original 24 attributes are retained. An alternative, more parsimonious solution is identified through Pareto Optimal analysis, in which 5 panelists evaluating only 7 attributes attain a classification accuracy of 70.6%.

Future research includes the development of alternative approaches to select the best subset of attributes for classification. We consider generating and testing other consistency indices to rank attributes and panelists, as well as applying alternative data mining tools such as Support Vector Machines and Probabilistic Neural Networks. We will also explore methods aimed at selecting attributes for prediction purposes. Generating concise and reliable regression models will enable better product characterization as attribute levels are changed.

Acknowledgements

Dr. Fogliatto's research is supported by CNPq (Grant No. 301380/2008-2). We thank the two reviewers for their valuable comments on an earlier version of the manuscript.

Table 3
Information on Pareto optimal frontier points.

Frontier point (FP)   Retained panelists                 Retained attribute IDs                                          Classification accuracy
1                     P1, P2, P3, P4, P8                 16                                                              0.6245
2                     P1, P2, P3, P4, P5, P8             16, 18                                                          0.6375
3                     P1, P2, P3, P4, P5, P6, P8         16, 19, 18                                                      0.6510
4                     P1, P2, P3, P4, P5, P6, P7, P8     22, 11, 21, 16, 18                                              0.6602
5                     P1, P2, P3, P4, P8                 20, 18, 16, 19, 22, 17, 4                                       0.7063
6                     P1, P2, P3, P4, P8                 20, 18, 16, 19, 22, 17, 4, 21, 3, 6, 23, 10, 12, 11, 1, 2, 8    0.7143


Appendix


References

Anzanello, M. J., Albin, S. L., & Chaovalitwongse, W. (2009). Selecting the best variables for classifying production batches into two quality classes. Chemometrics and Intelligent Laboratory Systems, 97(2), 111–117.

Azapagic, A. (1999). Life cycle assessment and its application to process selection, design and optimization. Chemical Engineering Journal, 73(1), 1–21.

Camara, J., Alves, M., & Marques, J. (2006). Multivariate analysis for the classification and differentiation of Madeira wines according to the main grape varieties. Talanta, 68, 1512–1521.

Capron, X., Smeyers-Verbeke, J., & Massart, D. (2007). Multivariate determination of the geographical origin of wines from four different countries. Food Chemistry, 101, 1585–1597.

Carbonell, L., Izquierdo, L., & Carbonell, I. (2007). Sensory analysis of Spanish mandarin juices: Selection of attributes and panel performance. Food Quality and Preference, 18, 329–341.

Chabanet, C. (2000). Statistical analysis of sensory profiling data. Graphs for presenting results (PCA and ANOVA). Food Quality and Preference, 11(1–2), 159–162.

Chaovalitwongse, W., Fan, Y., & Sachdeo, C. (2007). On the time series k-nearest neighbor classification of abnormal brain activity. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 37(6), 1005–1016.

Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182–197.

Deb, K., Thiele, L., Laumanns, M., & Zitzler, E. (2002). Scalable multi-objective optimization test problems. Proceedings of the 2002 Congress on Evolutionary Computation, 1, 825–830.

Derde, M., & Massart, D. (1986). Supervised pattern recognition: The ideal method? Analytica Chimica Acta, 184, 33–51.

Dijksterhuis, G., Frost, M. B., & Byrne, D. V. (2002). Selection of a subset of variables: Minimization of Procrustes loss between a subset and the full set. Food Quality and Preference, 13, 89–97.

Duda, R., Hart, P., & Stork, D. (2001). Pattern classification (2nd ed.). New York: Wiley-Interscience.

Fogliatto, F. S., Albin, S. L., & Tepper, B. J. (1999). A hierarchical approach to optimizing descriptive analysis multiresponse experiments. Journal of Sensory Studies, 14(4), 443–465.

Gauchi, J., & Chagnon, P. (2001). Comparison of selection methods of exploratory variables in PLS regression with application to manufacturing process data. Chemometrics and Intelligent Laboratory Systems, 58, 171–193.

Granitto, P., Biasioli, F., Endrizzi, I., & Gasperi, F. (2008). Discriminant models based on sensory evaluations: Single assessors versus panel average. Food Quality and Preference, 19(6), 589–595.

Granitto, P. M., Gasperi, F., Biasioli, F., Trainotti, E., & Furlanello, C. (2007). Modern data mining tools in descriptive sensory analysis: A case study with a Random forest approach. Food Quality and Preference, 18, 681–689.

Guo, Q., Wu, W., Massart, D., Boucon, C., & Jong, S. (2002). Feature selection in principal component analysis of analytical data. Chemometrics and Intelligent Laboratory Systems, 61, 123–132.

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.

Horn, J., Nafpliotis, N., & Goldberg, D. (1994). A niched Pareto genetic algorithm for multiobjective optimization. Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, 1, 82–87.

Jansen, J., Hoefsloot, H., Greef, J., Timmerman, M., Westerhuis, J., & Smilde, J. (2005). ASCA: Analysis of multivariate data obtained from an experimental design. Journal of Chemometrics, 19(9), 469–481.

Kudo, M., & Sklansky, J. (2000). Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33, 25–41.

Lavit, C., Escoufier, Y., Sabatier, R., & Traissac, P. (1994). The ACT (STATIS method). Computational Statistics & Data Analysis, 18, 97–119.

Ledauphin, S., Hanafi, M., & Qannari, E. M. (2006). Assessment of the agreement among the subjects in fixed vocabulary profiling. Food Quality and Preference, 17(3–4), 277–280.

Liu, H., & Yu, L. (2005). Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491–502.

Luciano, G., & Naes, T. (2009). Interpreting sensory data by combining principal component analysis and analysis of variance. Food Quality and Preference, 20(3), 167–175.

Mallet, Y., De Vel, O., & Coomans, D. (1998). Integrated feature extraction using adaptive wavelets. In H. Liu & H. Motoda (Eds.), Feature extraction, construction and selection: A data mining perspective (pp. 175–189).

Marini, F., Bucci, R., Magri, A., & Magri, A. (2006). Authentication of Italian CDO wines by class-modeling techniques. Chemometrics and Intelligent Laboratory Systems, 84, 164–171.

Meilgaard, M., Civille, G. V., & Carr, B. T. (1999). Sensory evaluation techniques (3rd ed.). Boca Raton: CRC Press.

Montgomery, D., Peck, E., & Vining, G. (2001). Introduction to linear regression analysis. New York: John Wiley.

Murray, J. M., Delahunty, C. M., & Baxter, I. A. (2001). Descriptive sensory analysis: Past, present and future. Food Research International, 34(6), 461–471.

Rason, J., Marin, J., Dufour, E., & Lebecque, A. (2007). Diversity of the sensory characteristics of traditional dry sausages from the centre of France. Relation with regional manufacturing practice. Food Quality and Preference, 18(3), 517–530.

Rebolo, S., Pena, R., Latorre, M., Botana, A., & Herrero, C. (2000). Characterisation of Galician (NW Spain) Ribeira Sacra wines using pattern recognition analysis. Analytica Chimica Acta, 417, 211–220.

Ripley, B. (1996). Pattern recognition and neural networks. Cambridge: Cambridge University Press.

Sahmer, K., & Qannari, E. M. (2008). Procedures for the selection of a subset of attributes in sensory profiling. Food Quality and Preference, 19, 141–145.

Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics. New York: Harper Collins College Publishers.

Taboada, H., & Coit, D. (2007). Data clustering of solutions for multiple objective system reliability optimization problems. Quality Technology & Quantitative Management Journal, 4, 35–54.


Taboada, H., & Coit, D. (2008). Multi-objective scheduling problems: Determination of pruned Pareto sets. IIE Transactions, 40, 552–564.

Urtubia, A., Perez-Correa, J., Soto, A., & Pszczolkowski, P. (2007). Using data mining techniques to predict industrial wine problem fermentation. Food Control, 18, 1512–1517.

Weiss, S., Apte, C., Damerau, F., Johnson, D., Oles, F., Goetz, T., et al. (1999). Maximizing text-mining performance. IEEE Intelligent Systems, 14(4), 63–69.

Westad, F., Hersleth, M., Lea, P., & Martens, H. (2003). Variable selection in PCA in sensory descriptive and consumer data. Food Quality and Preference, 14, 463–472.

Wold, S., & Sjostrom, M. (1977). A method for analyzing chemical data in terms of similarity and analogy. In B. R. Kowalski (Ed.), Chemometrics: Theory and application (ACS Symposium Series 52, pp. 243–282). Washington, DC: American Chemical Society.

Wu, X., Kumar, V., Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37.

Zitzler, E., & Thiele, L. (1999). Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3(4), 257–271.
