    Agarwood Classification: A Case-based Reasoning Approach Based on E-nose

    Muhammad Sharfi Najib1,2,3, Mobyen Uddin Ahmad3, Peter Funk3, Mohd Nasir Taib1, Nor Azah Mohd Ali4

    1Faculty of Electrical Engineering, Universiti Teknologi MARA, 40450, Selangor, Malaysia

    2Faculty of Electrical and Electronics, Universiti Malaysia Pahang, 25000, Pahang, Malaysia

    3School of Innovation, Design and Engineering, Malardalen University, PO Box 883, SE-721 23, Vasteras, Sweden

    4Forest Research Institute Malaysia, 52109, Selangor, Malaysia

    [email protected], [email protected], [email protected], [email protected], [email protected]

    Abstract: Using an array of sensors (E-nose) to classify Agarwood has proven successful, producing performance close to expert level (90% of expert-level performance), but it has proven difficult to eliminate misclassifications without over-fitting. To improve on this result, we explored a self-improving Case-Based Reasoning (CBR) approach and reached 100% correct classification. CBR learns from every newly classified case, which reduces the risk of misclassification. Moreover, when a new case unlike any previously seen case must be classified, the system can avoid misclassification because the similarity measure is low. The approach also allows indeterminate outcomes: in reality, a sample may be close to both a good case and a bad case and need further examination by experts. It also handles natural variation in the wood samples well: both low-quality and high-quality samples may spread considerably in their E-nose readings, and no model of low or high quality is available.

    Keywords-Agarwood; classifications; case-based reasoning; feature selection; e-nose

    I. INTRODUCTION

    Agarwood is an aromatic wood that is usually produced from the diseased wood of Aquilaria (Thymelaeaceae) species [1]. Agarwood can be classified into high-quality and low-quality types. High-quality wood, priced at over US $3000 per kg, is used as incense [2], while low-quality Agarwood is used for essential oil extraction [3]. Agarwood is traded internationally in large volumes, and its quality depends strongly on resin content, aroma and region of origin; the main producing countries include Malaysia, Indonesia and India [3-4]. Agarwood has also been used in several medications and studied in pharmacological research [5], [6]. Several methods can be used to sense the smell of plants, such as fiber optics and Gas Chromatography (GC) [4, 7]. Classifying Agarwood by GC is still ongoing research because of its complex properties.

    Besides GC, the E-nose [8], an electronic instrument, has been used to classify plants [9]. The major component of an E-nose is an array of physical sensors [10]. Since an E-nose is normally over-dimensioned, with a large number of sensors that are not all needed for a specific classification task, it is advantageous to identify and select the significant sensors to be employed in classification. Feature extraction is one of the methods used to select significant sensors in many research areas, including pattern classification, and feature selection techniques have been employed to search for significant features and to yield unbiased error estimates [11-13]. It is not usual practice to use all features or attributes as classifier inputs, since this increases the complexity of the classification process.

    Once the significant E-nose sensors have been selected, a classification system is needed to complete the identification process. A number of methods have been used for classification, such as Principal Component Analysis (PCA) [14], Discriminant Factor Analysis (DFA) [15], k-Nearest Neighbor (k-NN) [16] and Artificial Neural Networks (ANN) [17-20]. ANN and k-NN have previously been applied to Agarwood classification [16, 19].

    One method explored in this research, with promising results, is Case-Based Reasoning (CBR), which has been applied in both medical and industrial domains [21]. Its advantage is that it learns from past experience and uses it to solve a current problem [22]. CBR is especially suitable for domains with a weak domain theory, i.e. domains that are difficult to formalize and are empirical, as is the case in many medical domains, e.g. [23-25]. Here, knowledge is represented as experience: classified sensor signals stored as cases. A survey of recent trends and developments of CBR in medical domains is presented in [26], and a life-cycle model presenting the key processes of the CBR method was introduced by Aamodt and Plaza [27] (further details can be found in [21]).

    2012 IEEE 8th International Colloquium on Signal Processing and its Applications

    978-1-4673-0961-5/12/$31.00 ©2012 IEEE 120

    Figure 1: Experimental setup for measuring Agarwood odor to remove contaminant from E-nose

    This paper presents a novel case-based signal classification method that uses sensor readings from an E-nose developed for Agarwood classification. The approach uses CBR to improve classification performance and to create a dynamic system in which newly classified cases contribute to improved performance. The paper is organized as follows: Section II describes the methodology, Section III the case-based classification, Section IV the results, and Section V concludes the work.

    II. METHODOLOGY

    A. Measurement of Agarwood Sample Odor

    The Agarwood samples were measured according to a standard operating procedure defined by the Forest Research Institute Malaysia (FRIM) [28]; the experimental setup is shown in Figure 1. Ten sets of samples from Malaysia and Indonesia were used.

    E-nose data were collected from the ten Agarwood samples, named DS1, DS2, DS3, DS4, DS5, DS6, DS7, DS8, DS9 and DS10. Experts classified DS1, DS2, DS3 and DS4 as high-quality Agarwood and the remaining samples as low-quality Agarwood. Each 1 kg sample was divided into ten 100 g portions, each transferred into a vial, so each sample has ten repeated E-nose readings. The E-nose consists of an array of 32 sensors that detect the smell of Agarwood simultaneously. Hence, the data for each sample form a 32 x 10 matrix (32 sensor rows x 10 reading columns), giving 320 readings per sample and 3200 E-nose readings in total.
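The data layout described above can be sketched in Python as a nested structure indexed as data[sample][sensor][reading]. This is a minimal sketch: the container, the placeholder zero readings and the names `make_empty_dataset` and `count_readings` are hypothetical; only the dimensions and the expert labels come from the text.

```python
# Dimensions taken from the text; reading values are placeholders.
N_SENSORS = 32   # sensors in the E-nose array
N_READINGS = 10  # repeated readings per sample (ten 100 g portions)

# Expert labels from the text: DS1-DS4 high quality, DS5-DS10 low quality.
LABELS = {f"DS{i}": ("High" if i <= 4 else "Low") for i in range(1, 11)}


def make_empty_dataset():
    """Hypothetical container: data[sample][sensor][reading], filled with 0.0."""
    return {
        name: [[0.0] * N_READINGS for _ in range(N_SENSORS)]
        for name in LABELS
    }


def count_readings(dataset):
    """Total number of individual E-nose readings across all samples."""
    return sum(len(sensor_row) for sample in dataset.values() for sensor_row in sample)


data = make_empty_dataset()
print(len(data["DS1"]), len(data["DS1"][0]))  # 32 10: one 32 x 10 matrix per sample
print(count_readings(data))                    # 3200 readings over the ten samples
```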

    B. Data Preprocessing

    After the raw data had been prepared, a preprocessing step was applied before the CBR case library was created. All sample data sets were normalized, and the 32 normalized sensor values were analyzed for the ten Agarwood samples.

    III. CASE-BASED CLASSIFICATION

    A. Feature Extraction and Selection

    Execution time in a CBR system depends on how much computation is needed to compare two feature values and determine how similar they are, and on how many features a case has. A common practice is to identify and remove features that have no significance for classification (this may change over time, so feature selection may need to be redone regularly in a CBR system as new cases are added to the case library). To keep execution time acceptable as the case library fills up, we selected the most significant sensors for classification, reducing the number of sensors in the array with a weighted-average technique. Of the 32 sensors, 9 were identified as the most significant for separating the high-quality and low-quality Agarwood sample data. The weights were assigned heuristically on a scale from 1 to 20, where 1 indicates the least significant and 20 the most significant sensor. The sensors identified as most significant are S6, S8, S12, S13, S14, S16, S22, S27 and S32. Among these nine, three (S13, S27 and S32) were found to be the most significant based on an initial evaluation of CBR performance. As a result, the data used for CBR classification amount to 9 sensors (rows) x 10 repeated readings (columns) for each Agarwood sample, i.e. 90 E-nose readings per sample. For each selected sensor, a centroid value was computed over its ten readings, so nine sensor centroid values were extracted from each Agarwood sample. These centroid values were used as the features in case formulation.

    If some cases are difficult to classify, the reason may be that the features are not correctly weighted, or that combinations of features must be considered to identify new cases correctly. Methods have been developed to identify which features or combinations of features discriminate between cases (see [29]), but this is beyond the scope of this paper, since more cases would be needed.
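The centroid step can be sketched as follows. The text does not define "sensor centroid" precisely, so this sketch assumes it is the arithmetic mean of each selected sensor's ten repeated readings; the function names and the toy matrix are hypothetical.

```python
from statistics import fmean

# The nine sensors the text reports as most significant.
SELECTED_SENSORS = ["S6", "S8", "S12", "S13", "S14", "S16", "S22", "S27", "S32"]


def sensor_row(name):
    """Map a 1-based sensor name such as 'S13' to its 0-based row in a 32 x 10 matrix."""
    return int(name[1:]) - 1


def centroid_features(sample_matrix):
    """Return the centroid (assumed: arithmetic mean) of the ten readings of each
    selected sensor, giving the nine features of one case."""
    if len(sample_matrix) != 32:
        raise ValueError("expected 32 sensor rows")
    return {name: fmean(sample_matrix[sensor_row(name)]) for name in SELECTED_SENSORS}


# Toy matrix: every reading of sensor row r equals r, so S13 (row 12) -> 12.0.
toy = [[float(r)] * 10 for r in range(32)]
features = centroid_features(toy)
print(len(features), features["S13"])  # 9 12.0
```

Only the nine selected rows are read, so the remaining 23 sensors never enter the case library.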

    B. Case Formulation

    TABLE I. EXTRACTED FEATURES FROM E-NOSE ORIGINAL DATA (OD)

                 Problem                          Solution
    CaseID       F1     F2     ...    F9          Classification
    caseid_001   2.37   1.94   ...    2.09        High
    caseid_002   2.26   1.80   ...    1.79        High
    ...          ...    ...    ...    ...         High
    ...          ...    ...    ...    ...         High
    ...          ...    ...    ...    ...         Low
    caseid_010   2.90   2.17   ...    2.28        Low

    This work was partially supported by the Ministry of Science and Technology Malaysia under eSciencefund Grant, code: 05-03-10-SF0070.

    Table I shows the features extracted from the e-nose signal, based on the sensor centroid [16] of each sensor measurement, for case formulation. The sensor centroids of each sample were treated as the problem part of one case-ID. The data set was thus divided into ten cases, each with its own case-ID, identified and classified by an Agarwood expert: case-ID_001 to case-ID_004 are high-quality and case-ID_005 to case-ID_010 are low-quality Agarwood. The same approach was then applied to all formulated artificial cases and artificial extended cases.
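A case as formulated here pairs a problem part (the nine sensor-centroid features) with a solution part (the expert's quality class). The sketch below is one possible representation, not the authors' implementation; the `Case` class, the naming rule in `case_id_for` and `build_library` are hypothetical. The feature values are the Target and Source columns of Table V, whose first, second and last entries match caseid_001 and caseid_010 in Table I.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Case:
    """One case: problem part (sensor-centroid features) and solution part (class)."""
    case_id: str
    features: Dict[str, float]
    quality: Optional[str] = None  # "High" or "Low"; None until an expert classifies it


def case_id_for(sample_name: str) -> str:
    """Hypothetical naming rule: DS1 -> caseid_001, DS10 -> caseid_010."""
    return f"caseid_{int(sample_name[2:]):03d}"


def build_library(samples: Dict[str, Dict[str, float]], labels: Dict[str, str]) -> List[Case]:
    """Turn per-sample feature dicts into expert-labelled cases."""
    return [
        Case(case_id=case_id_for(name), features=feats, quality=labels[name])
        for name, feats in samples.items()
    ]


SENSORS = ["S6", "S8", "S12", "S13", "S14", "S16", "S22", "S27", "S32"]
samples = {
    "DS1": dict(zip(SENSORS, [2.37, 1.94, 2.21, 2.16, 2.14, 2.13, 2.14, 2.13, 2.09])),
    "DS10": dict(zip(SENSORS, [2.90, 2.17, 2.76, 2.55, 2.61, 2.58, 2.59, 2.45, 2.28])),
}
library = build_library(samples, {"DS1": "High", "DS10": "Low"})
print([(c.case_id, c.quality) for c in library])  # [('caseid_001', 'High'), ('caseid_010', 'Low')]
```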

    Artificial case formulation

    To increase the number of library cases for evaluation, the data had to be extended: the ten original data sets (OD) were extended with two types of artificial data.

    Artificial cases are based on a simple model of the E-nose profiles of high-quality and low-quality Agarwood. They are used to evaluate the CBR approach with more cases, since we have only ten real Agarwood samples, and hence also to evaluate how the approach scales with more cases. Every new real or artificial case increases the knowledge about Agarwood and will improve overall results, provided the artificial cases are based on a model that reflects reality. The artificial cases are added here for evaluation purposes only; we have not validated the model used to produce them against real Agarwood cases.

    The first type of artificial data (ADT1) was created by combining pairs of high-quality cases, and likewise pairs of low-quality cases. The second type (ADT2) was created by adding randomized noise, generated from the variance of the Sensor 1 (S1) measurements of the E-nose, to the original cases. This was done to check that the CBR classification does not over-fit and to validate the robustness of the system. The combinations are presented in Table II. DS1 to DS4 are high-quality data. Each sample has 9 attributes, the extracted sensor-centroid features. Of these nine attributes, the last five were taken from one high-quality case and combined into another high-quality case. The same method was applied to the low-quality data DS5, DS6, DS7, DS8, DS9 and DS10. The artificial high-quality data are labeled DS11, DS12, DS13 and DS14, and the artificial low-quality data DS15, DS16, DS17, DS18, DS19 and DS20. Hence, ten additional artificial cases were formulated for ADT1. Another ten artificial cases were then generated for ADT2 by adding the randomized noise vectors RN1 to RN10: DS21, DS22, DS23 and DS24 (high quality) and DS25, DS26, DS27, DS28, DS29 and DS30 (low quality). For performance comparison, five CBR classifications were implemented, each with a different case library: OD, ADT1, ADT2, EDT1 and EDT2.
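The two artificial data types can be sketched under explicit assumptions. For ADT1, the sketch reads "the last five attributes were taken from one case and combined into another" as keeping the first four attributes of the first case in each Table II pair and taking the last five from the second. For ADT2, it assumes zero-mean Gaussian noise whose variance equals the population variance of the S1 readings; the paper does not state the noise distribution. The function names, the S1 readings and the middle values of the second case are hypothetical.

```python
import random
from statistics import pvariance
from typing import List, Sequence


def combine_cases(base: Sequence[float], donor: Sequence[float], n_tail: int = 5) -> List[float]:
    """ADT1-style case (assumed reading): keep the leading attributes of `base` and
    take the last `n_tail` attributes from `donor`, a case of the same quality."""
    if len(base) != len(donor):
        raise ValueError("cases must have the same number of attributes")
    return list(base[:-n_tail]) + list(donor[-n_tail:])


def add_s1_noise(base: Sequence[float], s1_readings: Sequence[float],
                 rng: random.Random) -> List[float]:
    """ADT2-style case (assumed): add zero-mean Gaussian noise whose variance equals
    the population variance of the Sensor 1 (S1) readings."""
    sigma = pvariance(s1_readings) ** 0.5
    return [x + rng.gauss(0.0, sigma) for x in base]


ds1 = [2.37, 1.94, 2.21, 2.16, 2.14, 2.13, 2.14, 2.13, 2.09]  # Table V target values
# Hypothetical: only F1, F2 and F9 of caseid_002 are given in Table I; the rest are placeholders.
ds2 = [2.26, 1.80, 2.00, 2.00, 2.00, 2.00, 2.00, 2.00, 1.79]
ds11 = combine_cases(ds1, ds2)                       # DS11 = DS1 + DS2 in Table II
s1_hypothetical = [5.81, 5.83, 5.85, 5.86, 5.88]      # hypothetical S1 readings
ds21 = add_s1_noise(ds1, s1_hypothetical, random.Random(42))  # DS21 = DS1 + RN1
```

Passing a seeded `random.Random` keeps the generated ADT2 cases reproducible between evaluation runs.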

    TABLE II. ORIGINAL DATA (OD)

    Quality   OD     ADT1                  ADT2
    High      DS1    DS11 = DS1 + DS2      DS21 = DS1 + RN1
              DS2    DS12 = DS2 + DS3      DS22 = DS2 + RN2
              DS3    DS13 = DS3 + DS4      DS23 = DS3 + RN3
              DS4    DS14 = DS4 + DS1      DS24 = DS4 + RN4
    Low       DS5    DS15 = DS5 + DS6      DS25 = DS5 + RN5
              DS6    DS16 = DS6 + DS7      DS26 = DS6 + RN6
              DS7    DS17 = DS7 + DS8      DS27 = DS7 + RN7
              DS8    DS18 = DS8 + DS9      DS28 = DS8 + RN8
              DS9    DS19 = DS9 + DS10     DS29 = DS9 + RN9
              DS10   DS20 = DS10 + DS1     DS30 = DS10 + RN10

    Table III shows the combination of OD and ADT1 data; the EDT1 case library therefore contains 20 cases in total.

    TABLE III. EXTENDED DATA TYPE 1 (EDT1)

    Quality   Extended Data (OD + ADT1)
    High      DS1, DS2, DS3, DS4, DS11, DS12, DS13, DS14
    Low       DS5, DS6, DS7, DS8, DS9, DS10, DS15, DS16, DS17, DS18, DS19, DS20

    Table IV shows the combination of OD and ADT2 data: the EDT2 case library adds the 10 ADT2 artificial cases to the 10 original cases, giving 20 cases in total.

    TABLE IV. EXTENDED DATA TYPE 2 (EDT2)

    Quality Extended Data (OD + ADT2)

    High      DS1, DS2, DS3, DS4, DS21, DS22, DS23, DS24
    Low       DS5, DS6, DS7, DS8, DS9, DS10, DS25, DS26, DS27, DS28, DS29, DS30

    C. CBR Classification

    The most critical step in a CBR system is retrieval, and many CBR systems contain only a retrieval step and a retain step (storing new cases in the case library), leaving reuse and revision to humans; for example, if more than one case is very close, the solution may be a combination of the most similar cases. Revision is needed to ensure that the suggested solution still matches the new case to be classified. For example, if the case to classify is from Malaysia but all similar cases are from Indonesia, and Malaysian and Indonesian Agarwood are graded differently, an expert may need to adapt the suggested solution.
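A retrieve-and-retain loop of the kind described above, with reuse and revision left to an expert, might look like the following minimal sketch. The similarity function `inverse_l1` is a hypothetical stand-in for the weighted similarity of Eq. 1, and the new reading is invented for illustration; the two stored cases use the Table V feature values.

```python
from typing import Callable, List, Sequence, Tuple

StoredCase = Tuple[List[float], str]  # (feature vector, quality label)


def retrieve(library: List[StoredCase], query: Sequence[float],
             similarity: Callable[[Sequence[float], Sequence[float]], float],
             k: int = 1) -> List[Tuple[float, StoredCase]]:
    """Retrieve step: score every stored case against the query and return the k best."""
    scored = [(similarity(query, case[0]), case) for case in library]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def retain(library: List[StoredCase], features: Sequence[float], label: str) -> None:
    """Retain step: store a newly classified (e.g. expert-confirmed) case."""
    library.append((list(features), label))


def inverse_l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical similarity in (0, 1]: 1 / (1 + sum of absolute differences)."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))


library: List[StoredCase] = [
    ([2.37, 1.94, 2.21, 2.16, 2.14, 2.13, 2.14, 2.13, 2.09], "High"),  # caseid_001
    ([2.90, 2.17, 2.76, 2.55, 2.61, 2.58, 2.59, 2.45, 2.28], "Low"),   # caseid_010
]
new_case = [2.40, 1.95, 2.25, 2.20, 2.15, 2.15, 2.15, 2.15, 2.10]  # hypothetical reading
best_score, (best_features, suggested) = retrieve(library, new_case, inverse_l1, k=1)[0]
# Reuse and revision are left to an expert; once the class is confirmed, retain the case.
retain(library, new_case, suggested)
```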

    In this paper we focus on the retrieval step, which is essential because it computes the similarity between two cases. A popular way to retrieve the most similar cases is for the retrieval algorithm to compute a similarity value for every case in the case library and return the cases most similar to the current problem. Similarity values are usually expressed on a scale of 0 to 1 or 0 to 100, where 0 means no match and 1 (or 100) means a perfect match. One of the most common and well-known retrieval methods is the nearest neighbour (kNN) method [21], which is based on matching a weighted sum of the features. For each feature, a local similarity is computed by comparing the feature values, and a global similarity value is obtained as a weighted combination of the local similarities. A standard equation for the nearest-neighbour calculation is given in Eq. 1.

    $$\mathrm{Similarity}(T,S) = \frac{\sum_{i=1}^{n} f(T_i, S_i)\, w_i}{\sum_{i=1}^{n} w_i} \qquad (1)$$

    In Equation 1, T is the target case, S is the source case, n is the number of attributes in each case, i indexes an individual attribute from 1 to n, f is the similarity function for attribute i in cases T and S, and w_i is the importance weight of attribute i. The weights allocated to the features/attributes give them a range of importance. Determining the weight of a feature is a problem, however; the easy way is to have an expert or user calibrate the weights using domain knowledge. Weights may also be determined by an adaptive learning process, i.e. learning or optimizing the weights using the case library as an information source [13, 29]. Table V presents the similarity calculation for two cases.

    TABLE V. SIMILARITY CALCULATION

    Feature   Source   Target   Sim    Weight   norm_w   sim*norm_w
    S6        2.90     2.37     0.53    1.00    0.08     0.04
    S8        2.17     1.94     0.23    1.00    0.08     0.02
    S12       2.76     2.21     0.55    1.00    0.08     0.04
    S13       2.55     2.16     0.39    8.00    0.62     0.24
    S14       2.61     2.14     0.47    2.00    0.15     0.07
    S16       2.58     2.13     0.45    1.00    0.08     0.03
    S22       2.59     2.14     0.45    1.00    0.08     0.03
    S27       2.45     2.13     0.32   13.00    1.00     0.32
    S32       2.28     2.09     0.18   10.00    0.77     0.14

    Total or global similarity between the two cases: 0.95
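Equation 1 can be sketched directly as a weighted average of local similarities. Table V lists the absolute difference |t - s| in its Sim column; because the paper does not give the local similarity function f explicitly, this sketch plugs in a hypothetical range-scaled similarity, 1 - |t - s| / range with an assumed range of 1.0, so identical values score 1. The names `global_similarity` and `range_scaled` are illustrative.

```python
from typing import Callable, Sequence


def global_similarity(target: Sequence[float], source: Sequence[float],
                      weights: Sequence[float],
                      local_sim: Callable[[float, float], float]) -> float:
    """Equation 1: sum_i f(T_i, S_i) * w_i divided by sum_i w_i."""
    if not (len(target) == len(source) == len(weights)):
        raise ValueError("target, source and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * local_sim(t, s) for t, s, w in zip(target, source, weights))
    return weighted / total_weight


def range_scaled(value_range: float) -> Callable[[float, float], float]:
    """Hypothetical local similarity f: 1 - |t - s| / value_range, clipped at 0."""
    def f(t: float, s: float) -> float:
        return max(0.0, 1.0 - abs(t - s) / value_range)
    return f


target = [2.37, 1.94, 2.21, 2.16, 2.14, 2.13, 2.14, 2.13, 2.09]  # Table V, Target
source = [2.90, 2.17, 2.76, 2.55, 2.61, 2.58, 2.59, 2.45, 2.28]  # Table V, Source
weights = [1, 1, 1, 8, 2, 1, 1, 13, 10]                          # Table V/VI weights
f = range_scaled(1.0)  # assumed value range of 1.0 for every feature
sim = global_similarity(target, source, weights, f)
sim_scaled = global_similarity(target, source, [3 * w for w in weights], f)
```

Because Eq. 1 divides by the sum of the weights, multiplying every weight by the same positive constant leaves the result unchanged, which is why `sim` and `sim_scaled` agree.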

    Table V presents the similarity calculation for two cases, where the target is a new case to be classified and the source is a classified case stored in the case library. Nine features (S6, S8, S12, S13, S14, S16, S22, S27 and S32) are used for both cases. The column Sim gives the local similarity of each feature, calculated from the absolute difference between the two feature values. The column Weight gives the importance of each feature, which is further normalized using formula (2).

    $$\hat{w}_f = \frac{w_f}{\sum_{l=1}^{n} w_l} \qquad (2)$$

    Here, the weights are defined by experts and are assumed to reflect the importance of the corresponding features. A training procedure can then optimize the weight vector of the case to increase CBR accuracy. Nine weights were introduced, one for each of the nine attributes of the case problem; Table VI shows the assignment of weights to attributes. The weights were varied heuristically to emphasize the variation that distinguishes high-quality from low-quality Agarwood.

    TABLE VI. WEIGHT VECTOR ASSIGNMENT

    Weight     Attribute (E-nose sensor)
    W1 = 1     S6
    W2 = 1     S8
    W3 = 1     S12
    W4 = 8     S13
    W5 = 2     S14
    W6 = 1     S16
    W7 = 1     S22
    W8 = 13    S27
    W9 = 10    S32
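The Weight, norm_w and sim*norm_w columns of Table V can be recomputed from the Table VI weights. In the sketch below, `normalize_by_max` divides each weight by the largest weight (13), which reproduces the norm_w column to two decimals; `normalize_by_sum` divides by the total weight, a common alternative shown for comparison. The Sim values are taken from Table V as printed; the function names are hypothetical.

```python
SENSORS = ["S6", "S8", "S12", "S13", "S14", "S16", "S22", "S27", "S32"]
WEIGHTS = [1, 1, 1, 8, 2, 1, 1, 13, 10]                              # Table VI
SIM_COLUMN = [0.53, 0.23, 0.55, 0.39, 0.47, 0.45, 0.45, 0.32, 0.18]  # Table V, Sim


def normalize_by_sum(weights):
    """Divide each weight by the total weight, so the normalized weights sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def normalize_by_max(weights):
    """Divide each weight by the largest weight, so the largest becomes 1.0."""
    top = max(weights)
    return [w / top for w in weights]


norm_w = normalize_by_max(WEIGHTS)
weighted = [s * w for s, w in zip(SIM_COLUMN, norm_w)]
for name, nw, p in zip(SENSORS, norm_w, weighted):
    print(f"{name:>4}  norm_w={nw:.2f}  sim*norm_w={p:.2f}")
```

The two normalizations differ only by a constant factor (13/38), so either choice gives the same result when the normalized weights are plugged into Eq. 1.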

    IV. RESULTS

    Figure 2. Measurement of resistance response of S1 from all dataset samples

    [Plot: "Agarwood sample data sets S1 from all samples"; x-axis: Frequencies (0 to 200); y-axis: E-nose sensor resistance response (about 5.80 to 5.94)]

    Figure 2 shows the series of E-nose measurements of S1 from all dataset samples. S1 was selected as an example for setting the noise.

    [Plot: "First 10 measured data of S1 from entire samples"; panels (a) to (j); x-axis: Frequency (1 to 10); y-axis: E-nose sensor resistance response (about 5.81 to 5.90)]

    Figure 3. Measurement of resistance response of S1 for samples (a) DS1, (b) DS2, (c) DS3, (d) DS4, (e) DS5, (f) DS6, (g) DS7, (h) DS8, (i) DS9, (j) DS10

    Figure 3 shows the data selected from all dataset samples for f = 1 to f = 10, where f is the frequency. These data were selected on the assumption that, for the first readings of S1 in every sample, the heating of the Agarwood had not yet reached steady state and the volatile compounds of the Agarwood resin were just beginning to evaporate; the evaporation of the volatile compounds was assumed to have stabilized for f > 10.

    A. Classifier Performance Evaluation

    The CBR classification was evaluated in terms of sensitivity, specificity and accuracy. Tables VII and VIII compare these measures across the original, artificial and extended case libraries for k = 1 and k = 2, respectively.
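The conclusion reports leave-one-out evaluation: each case is removed from the library in turn and classified by the remaining cases. A minimal sketch follows, assuming a plain L1 distance as a stand-in for the weighted similarity and majority voting among the k nearest cases. How ties are broken when k = 2 is not stated in the paper; here a tie goes to the label of the nearest case, which is an assumption. The toy library is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

StoredCase = Tuple[List[float], str]  # (features, "High" or "Low")


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical stand-in for the weighted similarity of Eq. 1 (smaller = more similar)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(library: List[StoredCase], query: Sequence[float], k: int = 1) -> str:
    """Majority vote over the k nearest cases; a tie goes to the nearest tied label (assumption)."""
    if not library:
        raise ValueError("case library is empty")
    nearest = sorted(library, key=lambda case: l1_distance(case[0], query))[:k]
    labels = [label for _, label in nearest]  # ordered nearest first
    votes = Counter(labels)
    # max() returns the first maximal item, i.e. the nearest label among tied ones.
    return max(labels, key=votes.__getitem__)


def leave_one_out(library: List[StoredCase], k: int = 1) -> Dict[str, int]:
    """Classify each case against all others and tally TP/FP/TN/FN ('High' = positive)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for i, (features, truth) in enumerate(library):
        predicted = knn_classify(library[:i] + library[i + 1:], features, k)
        if predicted == "High":
            counts["TP" if truth == "High" else "FP"] += 1
        else:
            counts["TN" if truth == "Low" else "FN"] += 1
    return counts


toy = [([0.0], "High"), ([0.5], "High"), ([10.0], "Low"), ([10.5], "Low")]  # hypothetical
print(leave_one_out(toy, k=1))  # {'TP': 2, 'FP': 0, 'TN': 2, 'FN': 0}
```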

    TABLE VII. STATISTICAL ANALYSIS OF THE SYSTEM CLASSIFICATION (K=1)

    Criteria/Indices              OD     ADT1   ADT2   EDT1   EDT2
    Total cases                   10     10     10     20     20
    High-quality cases (P)        4      4      4      8      8
    Low-quality cases (N)         6      6      6      12     12
    True positive (TP)            3      3      2      6      8
    False positive (FP)           1      1      2      2      0
    True negative (TN)            5      6      3      11     11
    False negative (FN)           1      0      3      1      1
    Sensitivity = TP/(TP+FN)      0.75   1.00   0.40   0.86   0.89
    Specificity = TN/(FP+TN)      0.83   0.86   0.60   0.85   1.00
    Accuracy = (TP+TN)/(P+N)      0.80   0.90   0.50   0.85   0.95

    OD: Original Data; ADT1, ADT2: Artificial Data Type 1, 2; EDT1, EDT2: Extended Data Type 1, 2.

    From Table VII, among the five case libraries with k = 1, EDT2 obtains the highest accuracy and specificity, while ADT1 shows the highest sensitivity. For the EDT2 system, of the 20 cases, 8 are correctly classified as high-quality (true positives) and only 1 is incorrectly classified as low-quality (false negative). EDT2 reaches 100% specificity because no low-quality case is classified as high-quality (FP = 0). ADT1 reaches 100% sensitivity because no high-quality case is classified as low-quality (FN = 0). The lowest accuracy, specificity and sensitivity are obtained by ADT2, with 50%, 60% and 40% respectively.
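The measures in Tables VII and VIII follow from the confusion counts, with high quality as the positive class. In both tables P + N equals TP + FP + TN + FN, so the sketch below divides accuracy by the total count. The helper names are hypothetical; the counts in the example are the EDT2 (k = 1) column of Table VII.

```python
import math


def ratio(numerator: int, denominator: int) -> float:
    """Division that returns NaN when the denominator is zero (measure undefined)."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Sensitivity, specificity and accuracy as defined in Tables VII and VIII."""
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, fp + tn)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# EDT2 with k = 1 (Table VII): TP = 8, FP = 0, TN = 11, FN = 1.
sens, spec, acc = classification_metrics(8, 0, 11, 1)
print(f"{sens:.2f} {spec:.2f} {acc:.2f}")  # 0.89 1.00 0.95
```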

    TABLE VIII. STATISTICAL ANALYSIS OF THE SYSTEM CLASSIFICATION (K=2)

    Criteria/Indices              OD     ADT1   ADT2   EDT1   EDT2
    Total cases                   10     10     10     20     20
    High-quality cases (P)        4      4      4      8      8
    Low-quality cases (N)         6      6      6      12     12
    True positive (TP)            3      4      3      8      8
    False positive (FP)           1      0      1      0      0
    True negative (TN)            6      6      5      12     11
    False negative (FN)           0      0      1      0      1
    Sensitivity = TP/(TP+FN)      1.00   1.00   0.75   1.00   0.89
    Specificity = TN/(FP+TN)      0.86   1.00   0.83   1.00   1.00
    Accuracy = (TP+TN)/(P+N)      0.90   1.00   0.80   1.00   0.95

    OD: Original Data; ADT1, ADT2: Artificial Data Type 1, 2; EDT1, EDT2: Extended Data Type 1, 2.

    From Table VIII, among the five case libraries with k = 2, ADT1 and EDT1 obtain the highest accuracy, specificity and sensitivity. In both the ADT1 and EDT1 case libraries, every case is correctly classified as high-quality (true positive) or low-quality (true negative), so both systems reach 100% accuracy, specificity and sensitivity.

    TABLE IX. ACCURACY

    Data Type   Accuracy (%) K=1   Accuracy (%) K=2
    OD          80                 90
    ADT1        90                 100
    ADT2        50                 80
    EDT1        85                 100
    EDT2        95                 95

    TABLE X. SPECIFICITY

    Data Type   Specificity (%) K=1   Specificity (%) K=2
    OD          83                    86
    ADT1        86                    100
    ADT2        60                    83
    EDT1        85                    100
    EDT2        100                   100


    TABLE XI. SENSITIVITY

    Data Type   Sensitivity (%) K=1   Sensitivity (%) K=2
    OD          75                    100
    ADT1        100                   100
    ADT2        40                    75
    EDT1        86                    100
    EDT2        89                    89

    Tables IX, X and XI summarize the accuracy, specificity and sensitivity, respectively, of the OD, ADT1, ADT2, EDT1 and EDT2 systems. The summary shows that a misclassified high-quality sample (a false negative) recurs in most configurations.

    Figure 4. Details of resistance response of S11 and S18 for samples from datasets DS1, DS2, DS3, DS4

    Figure 4 shows three-dimensional details of the resistance responses of S1 and S18 from all datasets.

    Figure 5. Details of resistance response of S11 and S18 for samples from datasets DS1, DS2, DS3, DS4

    Figure 5 shows that one sensor-centroid pair makes the features of sample DS3 differ from those of the other samples. This pair was identified as S1 and S18. For this pair, the DS3 sample shows a negative slope, whereas the other high-quality samples and the low-quality samples show zero slopes. All low-quality samples have lower sensor centroids than the high-quality samples, except for DS4. This indicates that the features of sample DS3 are unique: DS3 belongs neither to the high-quality group nor to the low-quality group, and this irregularity can be treated as a special group. If an expert classified this feature as an identifier of high-quality wood, this knowledge could be added to the similarity function. It may also be possible to discover such a feature automatically, as proposed in [29]; if, for example, this combined feature occurs only in high-grade samples, the problem would be solved. If the feature is merely more common in high-grade samples, it should receive a high weight for similarity with high-grade samples (assuming there are more similar cases in the case library). Under this assumption, which needs to be confirmed by additional cases or by experts, we reach 100% classification accuracy, which shows the potential and flexibility of a CBR approach for classification in a complex domain such as Agarwood.

    V. CONCLUSION

    In this paper we have demonstrated the successful application of signal-based classification of Agarwood samples into high or low grades from E-nose responses, using a Case-Based Reasoning approach. We achieved higher performance than previous approaches and reached 100% accuracy on a small set of real cases (leave-one-out: each case is removed and the system classifies it). We also extended the case set with artificial cases for evaluation purposes, and 95% of all cases were correctly classified, still higher than previous classification approaches.

    The main reason for this performance is that we identified one significant combined feature, of S1 and S18, that occurred only in the misclassified sample. This shows that the case belongs to a unique cluster (currently containing only one case) classified by experts as high-grade. If more similar cases occurred, the CBR system would classify them correctly. Such combined features can be discovered automatically in a CBR system (in this work we used MATLAB). In future work, the technique can be refined to produce finer grading and to integrate other identification features, such as the origin of the Agarwood, using intelligent feature selection techniques as proposed in [11-13].

    We also reduced the 32-sensor array to the 9 sensors found to be most significant by weight vector analysis, which helps keep classification fast even if the case library grows to tens of thousands of classified Agarwood cases. A further desirable property of case-based classification is that, as newly classified cases are added to the case library, the system extends its coverage and its classification accuracy.
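The reduction from 32 sensors to 9 can be sketched as ranking sensors by the magnitude of their entries in a weight vector and keeping the top k. This is a minimal sketch assuming the weight vector has already been obtained (how it is computed is not detailed here); the function names and the weights in the usage example are hypothetical.

```python
from typing import List, Sequence


def select_top_sensors(weights: Sequence[float], k: int) -> List[int]:
    """Return the 1-based indices (S1, S2, ...) of the k sensors with the largest |weight|."""
    if not 0 < k <= len(weights):
        raise ValueError("k must be between 1 and the number of sensors")
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return sorted(i + 1 for i in ranked[:k])


def reduce_case(readings: Sequence[float], selected: List[int]) -> List[float]:
    # Keep only the readings of the selected sensors (1-based indices).
    return [readings[i - 1] for i in selected]


# Hypothetical weight vector for a 32-sensor array.
weights = [0.01] * 32
for index, value in [(0, 0.9), (3, 0.5), (5, -0.6), (7, 0.4), (10, 0.3),
                     (17, -0.8), (20, 0.35), (25, 0.45), (31, 0.25)]:
    weights[index] = value
selected = select_top_sensors(weights, k=9)
print(selected)
```

Ranking by absolute value treats strongly negative weights as significant too; every stored and incoming case is then reduced with `reduce_case`, so retrieval compares 9 readings instead of 32.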

    ACKNOWLEDGMENT

    This work used data gathered at the Forest Research Institute Malaysia (FRIM) in collaboration with the Advance Signal Processing (ASP) research group, Faculty of Electrical Engineering, UiTM, Malaysia; Universiti Malaysia Pahang, Malaysia; the Ministry of Higher Education Malaysia; and the School of Innovation, Design and Engineering, Mälardalen University, Sweden. The authors would like to thank all members of the ASP research group at UiTM and FRIM for supporting this research.

    2012 IEEE 8th International Colloquium on Signal Processing and its Applications

    REFERENCES

    [1] J. Ueda, L. Imamura, Y. Tezuka, Q. La Tran, M. Tsuda, and S. Kadota, "New sesquiterpene from Vietnamese agarwood and its induction effect on brain-derived neurotrophic factor mRNA expression in vitro," Bioorganic & Medicinal Chemistry, vol. 14, pp. 3571-3574, 2006.

    [2] T. Hiroaki, M. Ito, T. Shiraki, T. Yagura, and G. Honda, "Sedative effects of vapor inhalation of agarwood oil and spikenard extract and identification of their active components," vol. 62. Springer, 2008.

    [3] M. A. Nor Azah, J. Mailina, A. Abu Said, J. Abd. Majid, S. Saidatul Husni, H. Nor Hasnida, and Y. Nik Yasmin, "Comparison of chemical profiles of selected Gaharu oils from peninsular Malaysia," Malaysian Journal of Analytical Sciences, vol. 12, pp. 338-340, 2008.

    [4] Q. Shu-yuan, "III Aquilaria species: in vitro culture and the production of eaglewood (agarwood)," vol. 33. British Library, 1995.

    [5] Z. Jinhua, Z. Chunshan, J. Xinyu, and X. Lianwu, "Extraction of essential oil from shaddock peel and analysis of its components by gas chromatography mass spectrometry," J. Cent. South Univ. Technol., vol. 13, 2006.

    [6] S. A. Rezzoug, C. Boutekedjiret, and K. Allaf, "Optimization of operating conditions of rosemary essential oil extraction by a fast controlled pressure drop process using response surface methodology," Journal of Food Engineering, vol. 71, pp. 9-17, 2005.

    [7] M. Ishihara and T. Tsuneya, "Components of the agarwood smoke on heating," Journal of Essential Oil Research, vol. 5, pp. 419-423, 1993.

    [8] S. S. Susan, G. K. Bahram, and H. Troy Nagle, "Analysis of medication off-odors using an electronic nose." Oxford University Press, 1997.

    [9] B. Nicole, B. Mark, and R. Michael, "A novel electronic nose based on miniaturized SAW sensor arrays coupled with SPME enhanced headspace-analysis and its use for rapid determination of volatile organic compounds in food quality monitoring," Sensors and Actuators B: Chemical, vol. 114, pp. 482-488, 2006.

    [10] O. Canhoto and N. Magan, "Electronic nose technology for the detection of microbial and chemical contamination of potable water," Sensors and Actuators B: Chemical, vol. 106, pp. 3-6, 2005.

    [11] N. Xiong, "A hybrid approach to input selection for complex processes," Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on, vol. 32, pp. 532-536, 2002.

    [12] N. Xiong and P. Funk, "Construction of fuzzy knowledge bases incorporating feature selection," Soft Comput., vol. 10, pp. 796-804, 2006.

    [13] N. Xiong and P. Funk, "Combined feature selection and similarity modelling in case-based reasoning using hierarchical memetic algorithm," in Evolutionary Computation (CEC), 2010 IEEE Congress on, 2010, pp. 1-6.

    [14] M. A. Markom, A. Y. M. Shakaff, A. H. Adom, M. N. Ahmad, W. Hidayat, A. H. Abdullah, and N. A. Fikri, "Intelligent electronic nose system for basal stem rot disease detection," Computers and Electronics in Agriculture, vol. 66, pp. 140-146, 2009.

    [15] M. Lebrun, A. Plotto, K. Goodner, M.-N. Ducamp, and E. Baldwin, "Discrimination of mango fruit maturity by volatiles using the electronic nose and gas chromatography," Postharvest Biology and Technology, vol. 48, pp. 122-131, 2008.

    [16] M. S. Najib, N. A. M. Ali, M. N. M. Arip, A. M. Jalil, and M. N. Taib, "Classification of Malaysian and Indonesia agarwood using k-NN," in International Symposium on Forestry and Forest Products 2010, Kuala Lumpur, 2010.

    [17] F. B. M. Suah, M. Ahmad, and M. N. Taib, "Optimisation of the range of an optical fibre pH sensor using feed-forward artificial neural network," Sensors and Actuators B: Chemical, vol. 90, pp. 175-181, 2003.

    [18] D. Luo, H. G. Hosseini, and J. R. Stewart, "Application of ANN with extracted parameters from an electronic nose in cigarette brand identification," Sensors and Actuators B: Chemical, vol. 99, pp. 253-257, 2004.

    [19] M. S. Najib, N. A. M. Ali, M. N. M. Arip, A. M. Jalil, and M. N. Taib, "Classification of Agarwood region using ANN," presented at the IEEE Control & System Graduate Research Colloquium 2010, Shah Alam, Selangor, 2010.

    [20] M. S. Najib, M. N. Taib, N. A. M. Ali, M. N. M. Arip, and A. M. Jalil, "Classification of Agarwood grades using ANN," in Electrical, Control and Computer Engineering (INECCE), 2011 International Conference on, 2011, pp. 367-372.

    [21] M. U. Ahmed, S. Begum, E. Olsson, N. Xiong, and P. Funk, Case-Based Reasoning for Medical and Industrial Decision Support Systems. Germany: Springer-Verlag, 2010.

    [22] M. J. Demirali, Reason: An introduction to critical thinking, 1 ed.: Analogical Reasoning Institute, 2011.

    [23] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and B. v. Schele, "A case-based decision support system for individual stress diagnosis using fuzzy similarity matching," Computational Intelligence, vol. 25, pp. 180-195, 2009.

    [24] M. U. Ahmed, S. Begum, P. Funk, N. Xiong, and B. v. Schele, "Case-based Reasoning for Diagnosis of Stress using Enhanced Cosine and Fuzzy Similarity," International journal of Transactions on Case-Based Reasoning on Multimedia Data, vol. 1, pp. 3-19, 2008.

    [25] M. U. Ahmed, S. Begum, P. Funk, N. Xiong, and B. v. Schele, "A Multi-Module Case Based Biofeedback System for Stress Treatment," International journal of Artificial Intelligence in Medicine, vol. 51, pp. 107-115, 2010.

    [26] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and M. Folke, "Case-Based Reasoning Systems in the Health Sciences: A Survey of Recent Trends and Developments," Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 41, pp. 421-434, 2011.

    [27] A. Aamodt and E. Plaza, "Case-based reasoning: Foundational issues, methodological variations, and system approaches," AI Communications, vol. 7, pp. 39-59, 1994.

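    [28] Forest Research Institute Malaysia, "Arahan Kerja Proses Penentuan Kualiti Sampel dari Cyranose 320" [Work instruction for the process of determining sample quality from the Cyranose 320], Document AK(04)PK(O).FRIM.UFPP.01, 2008, p. 2.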

    [29] P. Funk and N. Xiong, "Case-based reasoning and knowledge discovery in medical applications with time series," Computational Intelligence, vol. 22, pp. 238-253, 2006.
