Brasov, 2011
“Transilvania” University of Braşov
Faculty of Electrical Engineering and Computer
Science
Applications of computational intelligence in data mining
By
Ioan Bogdan CRIVAŢ
A thesis submitted in partial
fulfillment of the requirements for
the degree of
PhD
Advisor: Prof. Univ. Dr. Razvan Andonie
Brasov, 2011
Abstract
The objective of this work is a synthesis of some of the recent efforts in
the domain of predictive and associative rules extraction and processing
as well as a presentation of certain original contributions to the area.
The first two chapters of the thesis present data mining and
some recent results in the area of rule extraction. The second chapter,
“Rules in the Data Mining Context” introduces data mining with a focus
on rule extraction. We discuss association rules and their properties as
well as some notions of fuzzy modeling and fuzzy rules. The third
chapter, “Methods for Rules Extraction”, presents the most commonly
used methods for extracting rules. A special section describes the
specifics of rules analysis in Microsoft SQL Server. The following chapters
contain some original contributions in their context. The fourth chapter,
“Contributions to Rules Generalization”, reviews some of the existing
methods for simplifying rule models, and focuses on measures for
detecting rule similarity. Similar rules can be merged, resulting in
simpler rule systems. The fifth chapter, “Measuring the Usage
Prediction Accuracy of Recommendation Systems”, presents the area of
accuracy measurements for recommendation systems, one of the most
common applications of association rules. A new instrument for
assessing the accuracy of a recommender is presented, together with
some experimental results. The sixth chapter presents some
experimental results for the techniques introduced in the third and
fourth chapters. The results are detailed for datasets used in presenting
the methods or compared against results from other authors. The last
chapter contains conclusions of this thesis as well as certain directions
for further research.
Contents
Contents .......................................................................................................... iii
List of figures ................................................................................................... 1
3 Methods for Rule Extraction ................................................................... 31
3.1 Extraction of Association Rules ................................................... 31
3.1.1 The Apriori algorithm ................................................................................ 31
3.1.2 The FP-Growth algorithm ......................................................................... 35
3.1.3 Other algorithms and a performance comparison ................................... 38
3.1.4 Problems raised by Minimum Support itemset extraction systems ................................................................................................................. 40
3.2 An implementation perspective: Support for association analysis in Microsoft SQL Server ® 2008 ................................................. 45
3.3 Rules as expression of patterns detected by other algorithms .. 50
3.3.1 Rules based on Decision Trees .................................................................. 51
3.3.2 Rules from Neural Networks ..................................................................... 52
4 Contributions to Rule Generalization ..................................................... 59
4.5.1 Future directions for the basic rule generalization algorithm ............................................................................................................... 80
4.5.2 Further work for the apriori specialization of the RGA ............................ 84
5 Measuring the Usage Prediction Accuracy of Recommendation Systems ......................................................................................................... 85
5.1 Association Rules as Recommender Systems ............................. 86
5.2 Evaluating Recommendation Systems ........................................ 86
5.3 Instruments for offline measuring the accuracy of usage predictions .............................................................................................. 88
5.3.1 Accuracy measurements for a single user ................................................ 89
5.3.2 Accuracy Measurements for Multiple Users ............................................ 92
5.4 The Itemized Accuracy Curve ...................................................... 93
5.4.1 A visual interpretation of the itemized accuracy curve ............................ 98
5.4.2 Impact of the N parameter on the Lift and Area Under Curve measures .................................................................................................... 99
5.5 An Implementation for the Itemized Accuracy Curve .............. 101
Figure 3-2 An FP-Tree structure ........................................................................................ 36
Figure 3-3 A mining case containing tabular features ...................................................... 46
Figure 3-4 A RDBMS representation of the data supporting mining cases with nested tables ................................................................................................. 47
Figure 3-5 Using a structure nested table as source for multiple model nested tables ......................................................................................................... 50
Figure 3-6 A decision tree built for rules extraction (part of a SQL Server forest) .................................................................................................................... 52
Figure 3-7 An artificial neural network ............................................................................. 53
Figure 4-1 - Creating a fuzzy set C to replace two similar sets A and B ............................ 69
Figure 4-2 Merging of similar rules ................................................................................... 70
Figure 4-3 A visual representation of the RGA ................................................................. 74
Figure 4-4 A finer grain approach to rule generalization ................................................. 80
Figure 4-5 Accuracy of a fuzzy rule as a measure of similarity with the universal set .......................................................................................................... 83
Figure 5-1 Example of ROC Curve ..................................................................................... 91
Figure 5-2 Itemized Accuracy Curve for a top-N recommender ....................................... 98
Figure 5-3 Evolution of Lift and Area Under Curve for different values of N ................. 100
Figure 5-4 Aggregated Itemized Accuracy Curve based on the Movie Recommendations dataset (for N=5 recommendations) ................................... 105
Figure 6-2 Evolution of Lift for various values of N for test models (Movie Recommendations dataset) ................................................................................ 116
Figure 6-3 Evolution of Lift for various values of N for test models (Movie Lens dataset) ....................................................................................................... 117
Acknowledgments
I would like to express my deepest gratitude to Prof. Dr. Răzvan Andonie for
his guidance, patience and encouragements. Above all, I would like to thank
him for rekindling my passion for academic research after years of industrial
experience.
Deep thanks also go to the Faculty of Electrical Engineering and Computer
Science at the “Transilvania” University for their help and advice with the
intermediate steps of the doctoral research as well as to dr. Daniela Drăgoi,
always a tremendous help for the doctoral program procedures.
I am also grateful to the amazing people that I met in my academic life,
particularly to prof. Petru Moroșanu and prof. dr. Tudor Bălănescu, and to
the wonderful colleagues at Microsoft Corporation and Predixion Software,
for their friendship, knowledge and experience.
Last, but certainly not least, my heartfelt thanks go to my family, Irinel
and Cosmin, for their constant help and support.
Publications, Patents and Patent Applications by the
Author
Books
1. MacLennan Jamie, Crivat Bogdan and Tang ZhaoHui Data Mining with Microsoft SQL Server 2008 [Book]. - Indianapolis, Indiana, United States of America : Wiley Publishing, Inc., 2009. - 978-0-470-27774-4.
2. Crivat Bogdan, Grewal Jasjit Singh, Kumar Pranish and Lee Eric ATL Server: High Performance C++ on .Net [Book]. – Berkeley, CA, United States of America : APress, Inc., 2003. - 1-59059-128-3.
Articles
3. Andonie Razvan, Crivat B [et al.] Fuzzy ARTMAP rule extraction in computational chemistry [Conference] // IJCNN. - 2009. - pp. 157-163. - DOI: 10.1109/IJCNN.2009.5179007.
4. Crivat Ioan Bogdan SQL Server Data Mining Programmability [Online] March 2005. [Cited: June 22, 2011.] http://msdn.microsoft.com/en-US/library/ms345148(v=SQL.90).aspx.
Issued Patents (United States Patents and Trademark Office)
5. Crivat Ioan B, Petculescu Cristian and Netz Amir Explaining changes
in measures thru data mining [Patent] : 7899776. - United States of America, 2011.
6. Crivat Ioan B, Petculescu Cristian and Netz Amir Random access in run-length encoded structures [Patent] : 7952499. - United States of America, 2011.
7. Crivat Ioan B, Iyer Raman and MacLennan C James Detecting and displaying exceptions in tabular data [Patent] : 7797264. - United States of America, 2010.
8. Crivat Ioan B, Iyer Raman and MacLennan C. James Dynamically detecting exceptions based on data changes [Patent] : 7797356. - United States of America, 2010.
9. Crivat Ioan B, Iyer Raman and MacLennan James Partitioning of a data mining training set [Patent] : 7756881. - United States of America, 2010.
10. Crivat Ioan B, Petculescu Cristian and Netz Amir Efficient Column Based Data Encoding for Large Scale Data Storage [Patent] : 20100030796. - United States of America, 2010.
11. Crivat Ioan B [et al.] Extensible data mining framework [Patent] : 7383234. - United States of America, 2008.
12. Crivat Ioan Bogdan [et al.] Systems and methods that facilitate data mining [Patent] : 7398268. - United States of America, 2008.
13. Crivat Ioan B [et al.] Using a rowset as a query parameter [Patent] : 7451137. - United States of America, 2008.
14. Crivat Ioan B, MacLennan C. James and Iyer Raman Goal seeking using predictive analytics [Patent] : 7788200. - United States of America, 2010.
15. Crivat Ioan Bogdan [et al.] Unstructured data in a mining model language [Patent] : 7593927. - United States of America, 2009.
16. Crivat Ioan Bogdan, Cristofor Elena D. and MacLennan C. James Analyzing mining pattern evolutions by comparing labels, algorithms, or data patterns chosen by a reasoning component [Patent] : 7636698. - United States of America, 2009.
17. Crivat Bogdan [et al.] Systems and methods of utilizing and expanding standard protocol [Patent] : 7689703. - United States of America, 2010.
17. Crivat Bogdan [et al.] Systems and methods of utilizing and expanding standard protocol [Patent] : 7689703. - United States of America, 2010.
Pending patent applications (United States Patents and
Trademark Office)
18. Crivat Ioan Bogdan [et al.] Techniques for Evaluating Recommendation Systems [Patent Application] : 20090319330. - United States of America, 2009.
1 Introduction
1.1 Objectives
The objective of this work is a synthesis of some of the recent efforts in the
domain of predictive and associative rules extraction and processing as well
as a presentation of certain original contributions to the area.
As used in this work, data mining is the process of analyzing data in order to
find hidden patterns using automatic methodologies. Due in part to major
computational advances in recent decades, extensive research in the area
of data mining has led to the development of many classes of pattern extraction
algorithms. These algorithms are often employed in systems that yield highly
accurate predictions, but the patterns detected by such algorithms are,
more often than not, difficult to interpret.
A direct consequence of this difficulty is the high barrier data mining
encounters to acceptance in the common information worker's toolset.
The author spent most of the last decade as one of the principal
designers and implementers of the Microsoft SQL Server Data Mining
platform, a product whose goal is to make data mining more accessible to
information workers. This work is strongly influenced by this industrial
perspective.
1.2 Contributions
This work synthesizes the original contributions of the author over a period
of time longer than the actual doctoral studies, as illustrated by the author's
publication record.
Setnes et al., in [68], describe some of the problems associated with using
these measures. The paper defines a set of criteria that such a measure
should satisfy and introduces a measure that meets them, which will be
discussed in detail in Section 4.3 below.
4.1.3 Interpolation based rule generalization techniques
Takagi-Sugeno and Mamdani models perform inferences under the
assumption that the rule set completely covers the inference space (i.e. it is
dense). Interpolative reasoning methods address the problem of sparse rule
sets, which do not cover the whole inference space.
Mizumoto and Zimmermann, in [69], analyze the properties of rule
models and the possibility to interpolate new rules in the “generalized
modus tollens”. A modus tollens rule may be written, in logical operator
notation, as

((A ⇒ B) ∧ ¬B) ⇒ ¬A (4.3)
In 1993, in [70], Kóczy and Hirota propose a method (KH-rule
interpolation) for interpolations where results are inferred based on
computation at each α-cut level, and the resulting points are connected by
linear pieces to yield an approximate conclusion.
4.2 Rule Model Simplification Techniques
Extensive research is available for rule model simplification techniques.
Such techniques may target the feature set considered for rule inference,
the definition of the fuzzy sets participating in the rules, or the structure of
the rule models.
4.2.1 Feature set alterations
Feature set alteration techniques share the goal of reducing the number of
features that participate in the inference process. Applying such techniques
results in simplified rule systems, because a reduction in the number of
features implies a smaller number of predicates in the rules' premises. Such
alterations can be classified as Feature Extraction or Feature Selection
techniques.
Feature Extraction techniques synthesize a new, lower-dimension
feature set which encompasses all or most of the variance of the
original feature set (i.e. the original information is preserved or the loss is
minimal). Such techniques include Principal Component Analysis (also
known as the Karhunen-Loève transform), described in [71], which consists
of identifying the eigenvectors of the covariance matrix of the training data
and projecting the data onto these eigenvectors. The eigenvalues associated
with these eigenvectors measure the variance of the whole system
along these vectors and consequently allow sorting the new coordinates
(the eigenvectors) in decreasing order of variance. Frequently, for real data
sets, a small number of eigenvectors can account for 95% or more of the
variance in the data.
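As a concrete illustration, the projection step can be sketched in a few lines of Python (NumPy). This is a generic sketch, not the implementation referenced in [71]; the function name and the default variance threshold are illustrative:

```python
import numpy as np

def pca_feature_extraction(data, variance_threshold=0.95):
    """Project the data onto the eigenvectors of its covariance matrix,
    keeping just enough components to explain the requested variance."""
    centered = data - data.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, variance_threshold)) + 1
    return centered @ eigvecs[:, :k], eigvals
```

On data with strongly correlated columns, the cumulative explained-variance curve rises quickly, so only a few components are retained.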
A similar feature extraction technique is Sammon's non-linear
projection [72]. In this approach, a set of high-dimensional vectors is
projected into a low-dimensional space (2 or 3 dimensions) and a gradient
descent technique is used to adjust the projections so that the distances
between projections are as close as possible to the distances between the
original pairs of vectors. As the preservation of semantic meaning is a
major advantage of fuzzy rule models, feature transformation techniques
(which inherently alter the model's semantics) are not treated in depth in
this work.
Feature Selection techniques do not create new features, but rather
identify the most significant features to be used in building a model. On
real data sets, this approach often provides very good results because of the
redundancy, collinearity or irrelevance of certain data dimensions. Dash
and Liu, in [73], provide an extensive overview of the feature selection
techniques commonly used in classification systems. A very popular
technique for feature selection is the information gain method, introduced
in [54]. The information gain feature selection method sorts the input
features by the amount of entropy they remove from the whole system and
can be used to determine which features should be retained, by keeping
those whose information gain exceeds a predetermined threshold.
Feature selection does not affect the semantic meaning of the rule model
and is therefore well suited for rule simplification.
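The information gain computation described above can be sketched as follows. This is an illustrative Python implementation for discrete features; the function names and the threshold parameter are ours, not taken from [54] or [73]:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy achieved by splitting on a discrete feature."""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

def select_features(columns, labels, threshold):
    """Keep the features whose information gain exceeds the threshold."""
    return [name for name, values in columns.items()
            if information_gain(values, labels) > threshold]
```

A feature that perfectly determines the labels has a gain equal to the full label entropy, while a constant feature has a gain of zero.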
4.2.2 Changes to the Fuzzy Set Definitions
Song et al., in [74], suggest using supervised learning to adapt the
parameters of the fuzzy membership functions defining the
components of the rules. Under the assumption that the inference surface is
relatively smooth, over-fitting of the fuzzy system can be detected in two
ways: two membership functions coming sufficiently close to each other
can be fused into a single membership function, and membership functions
becoming too narrow can be deleted. In both cases, this adaptive pruning
improves the interpretability of the fuzzy system. This approach is related to
our proposed method for rule generalization, and the two methods are
compared in Section 4.4 below.
4.2.3 Merging and Removal Based Reduction
Automatically generated rule systems often produce redundant, similar,
inconsistent or inactive rules. The handling of similar rules is detailed in the
next section, covering Similarity Measures and Rule Base Simplification.
Inconsistent rules destroy the logical consistency of the models. Xiong and
Litz, in [75], propose a “consistency index”, a numerical assessment which
helps measure the level of consistency or inconsistency of a rule base. They
use this index in the fitness function of a genetic algorithm which searches
for a set of optimal rules under two criteria: good accuracy and minimal
inconsistency.
4.3 Similarity Measures and Rule Base Simplification
Setnes et al., in [68], propose a similarity measure for rules in a model.
Based on this measure, similar fuzzy sets are merged to create a common
fuzzy set that replaces them in the rule base, with the goal of creating a more
efficient and more linguistically tractable model.
A similarity measure for two fuzzy sets, A and B, is defined as a function

S : F(X) × F(X) → [0, 1] (4.4)

where F(X) denotes the family of fuzzy sets defined on the universe X.
A set of four criteria for a similarity measure is first introduced in [68]:
- Non-overlapping fuzzy sets should be totally non-equal. That is,

S(A, B) = 0 ⇔ µA(x) · µB(x) = 0, ∀x ∈ X (4.5)

- Overlapping fuzzy sets should have a similarity value greater than 0:

S(A, B) > 0 ⇔ ∃x ∈ X : µA(x) · µB(x) ≠ 0 (4.6)

- Only equal fuzzy sets should have a similarity value of 1:

S(A, B) = 1 ⇔ µA(x) = µB(x), ∀x ∈ X (4.7)

- Similarity between two fuzzy sets should not be influenced by scaling or shifting the domain on which they are defined.
With these criteria, [68] proposes a new similarity measure, based on set
theory, defined as:

S(A, B) = |A ∩ B| / |A ∪ B| (4.8)
This measure is, therefore, the ratio between the cardinalities of the
intersection and the union of the sets. When the equation is rewritten using
the membership functions, in a discrete space X = (x1, x2, …, xn), it becomes:

S(A, B) = Σᵢ [µA(xᵢ) ˄ µB(xᵢ)] / Σᵢ [µA(xᵢ) ˅ µB(xᵢ)] (4.9)

The operators are, respectively, minimum (˄) and maximum (˅). This
similarity measure complies with the four criteria above and reflects the
idea of a gradual transition from equal to completely non-equal fuzzy sets.
With this measure defined, [68] proceeds to simplify the rule base. Rules
whose fuzzy sets are similar to the universal fuzzy set (S(A, U) ≈ 1, where
µU(x) = 1, ∀x ∈ X) can, for example, be removed.
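In code, the discrete form (4.9) of the measure is straightforward. The following Python sketch assumes the two fuzzy sets are given as membership vectors sampled over the same discrete space X:

```python
def fuzzy_similarity(mu_a, mu_b):
    """Set-theoretic similarity (4.9) of two discrete fuzzy sets given as
    membership vectors: |intersection| / |union|, with pointwise min as
    intersection and pointwise max as union."""
    union = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    if union == 0:
        return 1.0  # convention: two empty fuzzy sets are identical
    return sum(min(a, b) for a, b in zip(mu_a, mu_b)) / union
```

Identical sets score 1, non-overlapping sets score 0, and partially overlapping sets fall strictly in between, matching the four criteria.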
The paper also provides a solution for merging similar rules. For this, it uses
a parametric trapezoidal representation of fuzzy sets, each set being
described by four parameters (a1, a2, a3, a4):

µ(x; a1, a2, a3, a4) = max(0, min((x − a1)/(a2 − a1), 1, (a4 − x)/(a4 − a3))) (4.10)

The merging of two similar fuzzy sets, A and B, defined by µA(x; a1, a2, a3,
a4) and µB(x; b1, b2, b3, b4), is defined as a new fuzzy set, C, given by µC(x;
c1, c2, c3, c4), where:

c1 = min(a1, b1)
c2 = λ2a2 + (1 − λ2)b2
c3 = λ3a3 + (1 − λ3)b3
c4 = max(a4, b4)
(4.11)

In the definition of the C fuzzy set, λ2 and λ3 are between 0 and 1 and
determine which fuzzy set, A or B, has more influence on the newly
generated set C; the default value for both is 0.5.
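The merging rule (4.11) translates directly into code. In the sketch below the trapezoids are plain 4-tuples; the function and parameter names are illustrative:

```python
def merge_trapezoids(a, b, lam2=0.5, lam3=0.5):
    """Merge two trapezoidal fuzzy sets A = (a1, a2, a3, a4) and
    B = (b1, b2, b3, b4) into C, per (4.11): the support of C covers both
    supports, and the core is a weighted average of the two cores."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (min(a1, b1),
            lam2 * a2 + (1 - lam2) * b2,
            lam3 * a3 + (1 - lam3) * b3,
            max(a4, b4))
```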
Figure 4-1 - Creating a fuzzy set C to replace two similar sets A and B (from [68])
With the merging solution described above, the authors propose an
algorithm for simplifying the rules in the model. The algorithm performs the
following steps:
- Select the most similar pair of fuzzy sets.
- If the similarity score exceeds a certain threshold, λ, then merge the two fuzzy sets and update the rule set.
- Repeat until no pair of fuzzy sets exceeds the λ threshold.
- For each rule in the system, compute the similarity with the universal set (U, with µU(x) = 1, ∀x ∈ X). If the similarity with the universal set exceeds a certain threshold, then remove the rule from the set (it is too general to be informative).
- Merge the rules with identical premise parts.
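The iterative merging loop above can be sketched as follows. This is an illustrative Python outline, with the similarity and merging operations passed in as functions; the data representation (set names mapped to membership vectors) is our assumption, not the one used in [68]:

```python
def simplify_fuzzy_sets(sets, similarity, merge, lam=0.8):
    """Iteratively merge the most similar pair of fuzzy sets until no pair
    exceeds the lambda threshold. `sets` maps set names to membership
    vectors; returns the reduced dictionary plus a substitution map that
    tells which merged set replaces each original one."""
    sets = dict(sets)
    substitution = {}
    while True:
        names = list(sets)
        best, best_pair = 0.0, None
        for i in range(len(names)):          # scan all pairs for the most similar
            for j in range(i + 1, len(names)):
                s = similarity(sets[names[i]], sets[names[j]])
                if s > best:
                    best, best_pair = s, (names[i], names[j])
        if best_pair is None or best <= lam:
            return sets, substitution        # no pair exceeds the threshold
        p, q = best_pair
        merged = p + "+" + q
        sets[merged] = merge(sets[p], sets[q])
        for old in (p, q):
            del sets[old]
            substitution[old] = merged
```

After the loop, the substitution map is used to rewrite the rule premises, after which rules with identical premises can be merged.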
Figure 4-2 Merging of similar rules (from [68])
Further work in [76] refines the method in [68] by the following steps:
- Reduce the feature set by feature selection
- Apply the method in [68]
- Apply a Genetic Algorithm to improve the accuracy of the
rules. To maintain the interpretability of the rule set, the
genetic algorithm step is restricted to the neighborhood
of the initial rule set
4.4 Rule Generalization
In [1], four molecular descriptors are used (molecular weight, number of H-
bond donors and acceptors, and ClogP) to predict biological activity (IC50). In
the paper, we introduced a novel rule generalization algorithm and a rule
inference procedure able to improve the rules extracted from a neural
network. This section describes the rule generalization algorithm, discusses
the results and proposes some directions for further research.
4.4.1 Problem and context
In [1], the IC50 prediction task uses a FAM-type prediction technique called
Fuzzy ARTMAP with Relevance (FAMR).
The Adaptive Resonance Theory (ART), described in detail in [57], is a
special kind of neural network with sequential learning ability. ART’s pattern
recognition features are enhanced with fuzzy logic in the Fuzzy ART model,
introduced in [77].
The FAMR is an incremental, neural network-based learning system used for
classification, probability estimation, and function approximation,
introduced in [78]. The FAMR architecture is able to sequentially
accommodate input-output sample pairs. Each such pair may be assigned a
relevance factor, proportional to the importance of that pair during the
learning phase.
FAM networks can easily expose the learned knowledge
in the form of fuzzy IF/THEN rules; several authors have addressed this issue
for classification tasks, such as [79], [80]. The final goal in generating such
rules would be to explain, in human-comprehensible form, how the
network arrives at a particular decision, and to provide insight into the
influence of the input features on the target. To the best of our knowledge,
no author has discussed FAM rule extraction for function approximation
tasks, such as IC50 prediction.
Carpenter and Tan, in [79] and [81], were the first to introduce a FAM
rule extraction procedure. To reduce the complexity of the fuzzy ARTMAP, a
pruning procedure was also introduced. In [1] we adapt Carpenter and Tan's
rule extraction method for function approximation tasks with the FAMR.
4.4.2 The rule generalization algorithm
Let O be the set of rules extracted from the FAMR model. In this section,
the quality of the rules in O is analyzed from the perspective of the
confidence (conf) and support (supp) properties described in Section 2.3.1
above.
The rules in O have support between 0.0% and 16.47%, and confidence
between 0.00% and 100.00%. To ensure the quality of the final rule set, we
use a minimum confidence and a minimum support criterion for the output
rules and prune from the extracted set the rules which do not meet these
criteria.
The set of rules extracted this way has the following characteristics:
- All rules are complete with regard to the input descriptors (the antecedent of each rule contains, therefore, one predicate for each descriptor), a consequence of the rule extraction algorithm.
- Certain descriptor fuzzy categories do not appear in any rule.
To further analyze this rule set, we introduce two new measures for the rule
set:
- Coverage: The percentage of training data points which have the following property: There exists at least one rule for which the molecule‘s descriptors fall within the range of the antecedent (i.e. the percentage of points for which at least one rule is triggered).
- Accuracy: The percentage of training data points which have the following property: There exists at least one rule for which the molecule‘s descriptors fall within the range of all antecedents and, in addition, the output falls within the range of the consequent (i.e. the percentage of points for which a correct rule is triggered).
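A sketch of these two measures follows, assuming each rule stores one numeric range per descriptor plus a range for the consequent (a wildcard predicate can be encoded as an unbounded range); this representation is illustrative, not the exact one used in [1]:

```python
def rule_triggers(rule, point):
    """A rule triggers for a data point when every descriptor value falls
    inside the corresponding antecedent range."""
    return all(lo <= x <= hi for (lo, hi), x in zip(rule["antecedent"], point))

def coverage(rules, points):
    """Fraction of (descriptors, output) points with at least one triggered rule."""
    return sum(any(rule_triggers(r, p) for r in rules) for p, _ in points) / len(points)

def accuracy(rules, points):
    """Fraction of points for which at least one rule triggers AND the
    observed output falls inside that rule's consequent range."""
    def correct(r, p, y):
        lo, hi = r["consequent"]
        return rule_triggers(r, p) and lo <= y <= hi
    return sum(any(correct(r, p, y) for r in rules) for p, y in points) / len(points)
```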
Assuming that some rules are too specific to the training set (over-fitting),
we attempt to generalize them by applying a greedy Rule Generalization
Algorithm (RGA). The RGA is applied to each rule in the set.
Rule Generalization Algorithm (RGA). Let a rule R be represented as

R: (X1 = x1, X2 = x2, . . . , Xn = xn) ⇒ (Y = y) (4.12)

Relax R by replacing one predicate Xi = xi with a wildcard value,
representing any possible state and designated by the (Xi = *) notation. By
definition, the newly formed rule has the same or better support, as its
antecedent is less restrictive. If the newly formed rule's confidence meets
the minimum confidence criterion, then keep it in a pool of candidates. This
procedure is applied for all the predicates in the rule, resulting in at most n
generalized rules (where n is the number of predicates in the original rule),
each with support better than or equal to that of the original rule. If the
candidate pool is not empty, replace the original with the candidate which
maximizes the confidence. The algorithm is applied recursively to the best
generalization and it stops when the candidate pool is empty (no better
generalization can be found).
The RGA’s goal is to relax the rules by trying to improve, at each step, the
rule support, without sacrificing accuracy beyond the minimum acceptable
confidence level.
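The RGA itself can be outlined in a few lines of Python. The sketch below works on categorical predicates, with antecedents as tuples of values and the training data as (row, label) pairs; this representation is our illustration, not the exact one used in [1]:

```python
WILDCARD = "*"

def matches(antecedent, row):
    """True when every non-wildcard predicate in the antecedent holds for the row."""
    return all(v == WILDCARD or row[i] == v for i, v in enumerate(antecedent))

def confidence(antecedent, consequent, data):
    """conf(R) = support(antecedent and consequent) / support(antecedent)."""
    covered = [(row, y) for row, y in data if matches(antecedent, row)]
    if not covered:
        return 0.0
    return sum(y == consequent for _, y in covered) / len(covered)

def generalize_rule(antecedent, consequent, data, min_conf):
    """Greedy RGA: repeatedly replace one predicate with a wildcard, keeping
    the relaxation with the highest confidence among those that still meet
    the minimum confidence criterion. Support can only grow or stay equal."""
    while True:
        candidates = []
        for i, v in enumerate(antecedent):
            if v == WILDCARD:
                continue
            relaxed = antecedent[:i] + (WILDCARD,) + antecedent[i + 1:]
            c = confidence(relaxed, consequent, data)
            if c >= min_conf:
                candidates.append((c, relaxed))
        if not candidates:
            return antecedent  # no admissible generalization remains
        antecedent = max(candidates)[1]
```

On a toy data set where the second descriptor is irrelevant to the target, the algorithm replaces exactly that predicate with a wildcard and then stops.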
Figure 4-3 A visual representation of the RGA
Figure 4-3 provides a visual representation of the way the RGA works.
Consider a rule R: (X=High, Y=High) ⇒ (Target = t). If, after relaxing the
Y=High condition, the new rule R': (X=High, Y=*) ⇒ (Target = t) has sufficient
accuracy (the support is already guaranteed), then R' becomes a candidate
for replacing R.
In the worst case, the number of predicate replacements for each rule is in
O(n²). Any relaxation of a rule increases (or does not change) the support of
that rule; the minimum confidence criterion ensures that this gain in
support is not paid for with an unacceptable loss in accuracy.
Example of iteratively applying the RGA: This example is extracted from the
original experimental results presented in [1]. Let R be a complete rule in
the original O set. As mentioned previously, all rules contain one predicate
for each of the four inputs.
The values of each descriptor are binned into 5 buckets (B1-B5); see
Chapter 6 below, which presents the experimental results, for details.
Rules {O1, . . . , O13} have support between 0.0% and 16.47%, and confidence
between 0.00% and 100.00%. In order to remove irrelevant rules (pruning),
we introduce a minimum confidence criterion of 25% and a minimum
support criterion of 2.5%. Rule O3 does not meet these criteria and was
removed from the set.
After applying the algorithm described in 4.4.2 above to the rule set
{O1, . . . , O13} − {O3}, the following generalized rules are obtained:
G1 : (*, Low-Medium, *, *) ⇒ Excellent
G2 : (*, Medium, Low-Medium, *) ⇒ Excellent
G3 : (Medium, *, Medium, *) ⇒ Excellent
G4 : (*, Low, Low, Medium) ⇒ Mediocre
G5 : (*, Medium-High, Medium-High, *) ⇒ Terrible
As certain descriptor values do not appear in any rule, simple one-predicate
rules were produced to cover those slices of the descriptor space. Only one
such rule is produced for this dataset (after pruning those which do not
meet the minimum confidence and support criteria):
I1 : (Low,*, *, *) ⇒ Terrible
The combined rule set {G1, . . . , G5} ∪ {I1} is our end result. Finally, we
compared our FAMR rule extractor to the FNN [6]–[8] and to the following
standard decision tree implementations:
- CART (WEKA implementation - simpleCart) trees [107]
- Microsoft SQL Server 2008 Decision Trees [2]
For the decision trees, rules were extracted from each non-root node.
Naturally, the decision-tree derived rules have 100% coverage. The
complete comparison results are presented in Table 6-1.
Method / rule set               | Training Set Coverage | Training Set Accuracy | Test Set Coverage | Test Set Accuracy
FAMR: {O1, . . . , O13}         | 57.39%                | 36.93%                | 20%               | 20%
FAMR: {G1, . . . , G5}          | 86.36%                | 65.34%                | 90%               | 75%
FAMR: {G1, . . . , G5} ∪ {I1}   | 88.64%                | 67.61%                | 90%               | 75%
CART                            | 100%                  | 64.20%                | 100%              | 75%
Microsoft Decision Trees        | 100%                  | 69.32%                | 100%              | 80%
Table 6-1 Rules set comparison
The FAMR {G1, . . . , G5} ∪ {I1} rule set has very good coverage and accuracy. For the test set, the {G1, . . . , G5} ∪ {I1} rules have almost the same accuracy as the rule sets derived from the classic decision tree systems (the test set consists of 20 molecules, so a difference of 5% translates to one incorrect prediction). This is rather surprising, considering the fact that decision trees are a dedicated tool for rule generation, whereas the FAMR was essentially designed as a primary prediction/classification model.
6.2.2 Results for the apriori post-processing algorithm
We present here some experimental results obtained after applying the
Rule Generalization Algorithm on various datasets:
Dataset                                                                      | Apriori params          | Initial Rules | Rules after generalization
IC50                                                                         | minconf=60%             | 135           | 31
Movie Recommendations                                                        | minconf=60%, minsup=3   | 18436         | 1788
Demographics, predicting Home Ownership (Movie Recommendations, associative) | minconf=60%, minsup=10  | 25058         | 5677
Iris (discretized)                                                           | minconf=60%             | 208           | 38
For the movie recommendation dataset, the demographics table has been
used. The Apriori algorithm was employed to extract rules predicting home
ownership status from the other demographic attributes.
6.3 Experimental results for the Itemized Accuracy Curve
A Windows application has been developed to illustrate and test the
Itemized Accuracy Curve concepts. The application functions as a client for
the Microsoft SQL Server Analysis Services platform, which allows
instantiation of multiple data mining algorithms on the same datasets.
Multiple association rules models were investigated using the IAC client
application. The application uses DMX [108] statements for executing the
recommendation queries.
The UI of the application is presented in Figure 6-1. The
application uses the True Positive count as its accuracy metric and sorts the
items in the product catalog on the abscissa, in descending order of their
popularity. The dominant curve (red line) is associated with an ideal
recommendation system which produces zero False Negatives (hence
the curve is identical to the popularity curve). The green curve, present in
the left part of the diagram, is associated with the Most Frequent n-Items
Recommender. The other lines are associated with different
recommendation systems. Clicking any point on the chart surface
presents the item rendered at the specified location on the abscissa,
together with the number of True Positives yielded by each of the
recommenders, as in Table 6-2.
Model | Correct Recommendations
(Ideal Model) | 233
(MFnR) | 0
MA_apriori_p20 | 120
MA_Trees_2048 | 165
Table 6-2 True Positive counts for the selected item
Figure 6-1 Itemized Accuracy Chart for n=3 (Movie recommendations)
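The per-item True Positive counts behind such a chart can be sketched in Python. The hold-one-out evaluation protocol and all function names below are illustrative assumptions, not the thesis implementation:

```python
from collections import Counter

def itemized_true_positives(test_transactions, recommend, n):
    """Per-item True Positive counts for an Itemized Accuracy Chart:
    for each item held out of a test basket, count how often the
    recommender's top-n list (built from the remaining items) recovers it."""
    tp, popularity = Counter(), Counter()
    for basket in test_transactions:
        for held_out in basket:
            popularity[held_out] += 1
            visible = [i for i in basket if i != held_out]
            if held_out in recommend(visible, n):
                tp[held_out] += 1
    # Abscissa order of the chart: items in descending order of popularity.
    order = [item for item, _ in popularity.most_common()]
    return order, tp

def make_mfn_recommender(transactions):
    """Most Frequent n-Items Recommender: always suggests the globally
    most popular items not already in the basket."""
    counts = Counter(i for t in transactions for i in t)
    ranked = [item for item, _ in counts.most_common()]
    def recommend(visible, n):
        return [i for i in ranked if i not in visible][:n]
    return recommend
```

Plotting `tp[item]` for each item in `order` gives the per-recommender curves; the popularity counts themselves form the ideal (red) curve.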
6.3.1 Movie Recommendation Results
We have built four recommendation models using Microsoft SQL Server:
MA_apriori_p20 and MA_apriori_p40 use the Microsoft Association Rules algorithm, an optimized implementation of the Apriori algorithm, with minimum rule probability thresholds of 0.2 and 0.4, respectively. Both use a minimum support of 10 (approximately 0.3% for this dataset).
MA_Trees_256 and MA_Trees_2048 use the Microsoft Decision Trees algorithm to build a forest of trees to be used for recommendations. They build 256 (the default) and 2048 trees, respectively.
Figure 6-2 presents the lift of the 4 models as a function of n, the number of
recommendations:
Figure 6-2 Evolution of Lift for various values of N for test models (Movie Recommendations dataset)
6.3.2 Movie Lens Results
We have built four recommendation models using Microsoft SQL Server:
apriori, apriori_min_supp_10, and apriori_min_supp_100 use the Microsoft Association Rules algorithm, all with a minimum rule probability threshold of 0.2 and with minimum support thresholds of 1000, 10, and 100, respectively.
DecisionTrees uses the Microsoft Decision Trees algorithm to build a forest of 2048 trees to be used for recommendations.
Figure 6-3 presents the lift of the 4 models as a function of n, the number of recommendations:
Figure 6-3 Evolution of Lift for various values of N for test models (Movie Lens dataset)
It is interesting to note that the decision tree model outperforms the apriori models and that some of the apriori models actually perform worse than the Most Frequent n-Item Recommender.
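The lift values plotted above can be sketched as the ratio between a model's hold-out accuracy and the baseline recommender's accuracy at the same n. The test-case layout below is an illustrative assumption:

```python
def accuracy_at_n(test_cases, recommend, n):
    """Fraction of (visible_items, held_out_item) test cases where the
    held-out item appears in the top-n recommendation list."""
    hits = sum(1 for visible, held_out in test_cases
               if held_out in recommend(visible, n))
    return hits / len(test_cases)

def lift_at_n(test_cases, model, baseline, n):
    """Lift of a model over a baseline recommender at a given n. A value
    below 1 means the model performs worse than the baseline, as observed
    for some apriori models against the MFnR."""
    base = accuracy_at_n(test_cases, baseline, n)
    return accuracy_at_n(test_cases, model, n) / base if base else float("inf")
```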
7 Conclusions and directions for further research
The thesis presents a synthesis of recent research in the area of associative and predictive rules and the post-processing of these rules. The original contributions are focused on practical improvements of rule systems.
7.1 Conclusions
In Chapter 4, we introduced a novel method for post-processing a set of rules in order to improve its generalization capability. The method was developed specifically for rules extracted from a fuzzy ARTMAP incremental learning system used for classification, hence for rules generated indirectly (as Fuzzy ARTMAP does not directly produce rules).
We also proposed an algorithm for generalizing rule sets produced by common rule extraction algorithms, such as apriori. The experimental results for this algorithm look very promising, as it reduces the size of rule sets by a factor of 5 to 10. More work is necessary to fully determine the properties of this generalization algorithm, as shown in the next section.
In the second part of the thesis, in Chapter 5, we proposed a novel instrument for evaluating the quality of recommendation systems, in the context of recent research regarding the accuracy of such systems. The instrument was introduced in a patent application [16]. The Itemized Accuracy Curve has certain interesting properties. Among them:
- It provides an intuitive way of comparing different recommendation systems
- It allows aggregation of the accuracy metrics across the items dimension
7.2 Further Work
The Rules Generalization Algorithm introduced in Chapter 4 works by
eliminating entire slices of the premise space from the rule antecedents.
While this approach produced good results in our experiments, it is
probably too coarse. A better solution, although more computationally
intensive, may be to check the neighborhood of the initial antecedent and
merge those areas which, when added to the antecedent, keep the rule’s
accuracy above the minimum confidence criteria. Section 4.5.1 suggests
refinement of the algorithm which would result in rules such as:
R′′: (X = High, Y ∈ {High, Medium, Very High}) ⇒ (Target = t).    (4.20)
This consists, in essence, of merging the antecedents of two rules as long as they are adjacent, they share the same consequent, and the resulting rule does not fall below the minimum confidence threshold.
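A minimal sketch of this merge step, under the assumption that an antecedent is a mapping from attributes to sets of admissible values and that a `support` callback queries the data (all names and the toy dataset are illustrative):

```python
def merge_adjacent_rules(rule_a, rule_b, support, min_conf):
    """Merge two rules that share a consequent and differ in exactly one
    antecedent attribute, keeping the result only if the widened rule
    stays at or above min_conf. A rule is (antecedent, consequent);
    `support` returns the number of records matching the antecedent
    (and the consequent, unless the consequent is None)."""
    ant_a, cons_a = rule_a
    ant_b, cons_b = rule_b
    if cons_a != cons_b or set(ant_a) != set(ant_b):
        return None
    differing = [k for k in ant_a if ant_a[k] != ant_b[k]]
    if len(differing) != 1:  # "adjacent": exactly one attribute differs
        return None
    merged = dict(ant_a)
    merged[differing[0]] = ant_a[differing[0]] | ant_b[differing[0]]
    conf = support((merged, cons_a)) / support((merged, None))
    return (merged, cons_a) if conf >= min_conf else None

# Illustrative data and support function (values are made up).
DATA = [
    {"X": "High", "Y": "High",   "Target": "t"},
    {"X": "High", "Y": "Medium", "Target": "t"},
    {"X": "High", "Y": "Medium", "Target": "u"},
]

def toy_support(rule):
    ant, cons = rule
    return sum(1 for row in DATA
               if all(row[a] in vals for a, vals in ant.items())
               and (cons is None or row[cons[0]] == cons[1]))
```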
Section 4.5.1 also describes a research direction: whether the refinement may be applied to fuzzy rule sets, as a way of merging adjacent fuzzy sets that serve as premises for Takagi-Sugeno rules with similar consequents.
From an implementation perspective, it is interesting to note that the algorithm allows block evaluation of multiple measurements. In a typical relational database, all the neighbors of the premise space could be evaluated in a single pass over the data using GROUP BY relational algebra constructs, which will likely produce significant performance gains. Recent developments in the space of in-memory database systems (see [82], [83]) may be useful in addressing the cost of computing the accuracy and support while relaxing predicates.
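The single-pass GROUP BY evaluation can be sketched with an in-memory SQLite table. The schema, the values, and the candidate rule are illustrative, not the thesis datasets:

```python
import sqlite3

# Illustrative schema and values; the point is the access pattern, not the data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (X TEXT, Y TEXT, Target TEXT)")
con.executemany("INSERT INTO records VALUES (?, ?, ?)", [
    ("High", "High",   "t"),
    ("High", "Medium", "t"),
    ("High", "Medium", "u"),
    ("High", "Low",    "u"),
])

# One GROUP BY pass returns, for every Y slice adjacent to a candidate rule
# (X = 'High' AND Y = ? => Target = 't'), both the slice support and the
# consequent hits, so the confidence of every widened antecedent is derived
# from a single scan instead of one query per neighbor.
rows = con.execute("""
    SELECT Y,
           COUNT(*) AS support,
           SUM(CASE WHEN Target = 't' THEN 1 ELSE 0 END) AS hits
    FROM records
    WHERE X = 'High'
    GROUP BY Y
""").fetchall()
confidence = {y: hits / support for y, support, hits in rows}
```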
In Section 4.4.3 we proposed an algorithm for generalizing the rule sets
produced by algorithms such as apriori, with significant reduction in the
number of rules, as presented by the experimental results. This reduction
makes the rules set more accessible and easier to interpret. Additional work
is required, though, to estimate the predictive power of the reduced rule
set and to measure the accuracy tradeoff that is being introduced by this
rule set simplification technique. The greedy nature of the algorithm prevents detection of all possible generalizations of the rule set. A different direction for further work is investigating whether a more complex data structure, possibly combined with a new sort order which takes into account the antecedent's length before the lexicographic order, may address this issue.
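Such a sort order is a one-line key function in, for example, Python. Whether shorter or longer antecedents should come first is left open above, so shorter-first below is only an illustration:

```python
# Antecedents as tuples of item names; the key orders by antecedent length
# first and only then lexicographically, so rules of equal specificity are
# grouped together before the alphabetical tie-break.
antecedents = [("milk",), ("bread", "milk"), ("bread",), ("bread", "eggs", "milk")]
ordered = sorted(antecedents, key=lambda a: (len(a), a))
```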
More work is also needed to study the possibility of applying the rule
generalization algorithm to the area of multiple-level association rules
described in [84] (and also in section 2.3.2 above).
Chapter 5 introduced the Itemized Accuracy Curve as an intuitive way to
compare recommendation systems. The Itemized Accuracy Curve, however,
does not take into account the ranking of an item in the recommendation
list. Investigating accuracy measures that can be used with the Itemized
Accuracy Curve in conjunction with the ranking of items may provide more
value.
Another direction of further research is integrating, into the algorithm for computing the itemized accuracy diagram, the evaluation of other performance characteristics of recommendation systems, such as the degree to which a recommendation system covers the entire set of items (see [104]), the computing time, the novelty of recommendations, or its robustness [105].
Appendix A: Key Algorithms
Apriori
The following pseudo-code generates all of the qualified association rules from the frequent itemsets:
Foreach frequent itemset f
    Foreach non-empty proper subset x of f
        Let y = f - x
        If Support(f)/Support(x) > Minimum_Probability Then
            x => y is a qualified association rule,
            with probability = Support(f)/Support(x)
        End If
    Next
Next
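A direct Python rendering of this procedure might look like the sketch below, assuming `frequent` already holds the support counts of all frequent itemsets and their subsets:

```python
from itertools import combinations

def generate_rules(frequent, min_probability):
    """Generate qualified association rules from frequent itemsets,
    following the pseudo-code above. `frequent` maps frozenset itemsets
    to their support counts and must be closed under subsets."""
    rules = []
    for f, supp_f in frequent.items():
        for size in range(1, len(f)):
            for x in map(frozenset, combinations(f, size)):
                probability = supp_f / frequent[x]
                if probability > min_probability:
                    rules.append((x, f - x, probability))  # rule: x => y = f - x
    return rules
```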
FP-Growth
As discussed in Section 3.1.2 above, the FP-growth algorithm extracts the frequent items into a frequent pattern tree (FP-tree), retaining the itemset association information, then divides the database into a set of conditional databases, each associated with one frequent item, and mines each such database separately.
An FP-tree is populated in the following steps [38]. A procedure called BuildFrequentItemsList is assumed to exist; it scans the transaction space, creating a list of items sorted in descending order of support, and eliminates infrequent items. The procedure is not part of the implementation, as it can often be optimized in a database or platform (e.g., SQL Server Analysis Services). Another procedure, Sort, is assumed to sort the items in a transaction in the order specified in the list argument.
Procedure FP_Create(TransactionSpace)
Let Tree = new node
Tree.item-name = null
Let L = BuildFrequentItemsList(TransactionSpace)
Foreach Trans in TransactionSpace
Let SortedTrans = Sort (Trans, L)
FP_Insert(Tree,SortedTrans)
Next
End Procedure
Procedure FP_Insert(Tree, Trans)
    If Trans is empty Then Return
    Let p = First item in Trans
    Let q = Remainder of Trans (excluding p)
    If Tree has a child node N such that N.item-name = p.item-name Then
        N.count++
    Else
        Create new node N, child of Tree
        N.item-name = p.item-name
        N.count = 1
    End If
    FP_Insert(N, q)
End Procedure
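A compact Python rendering of FP_Create and FP_Insert, with the output of BuildFrequentItemsList passed in as a precomputed list (class and function names are illustrative):

```python
class FPNode:
    """FP-tree node: an item name, a count, and children keyed by item."""
    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, {}

def fp_insert(tree, trans):
    """FP_Insert in Python: walk or extend the prefix path for a
    transaction already sorted in the global frequency order."""
    if not trans:
        return
    p, rest = trans[0], trans[1:]
    child = tree.children.get(p)
    if child is None:
        child = tree.children[p] = FPNode(p)
    child.count += 1  # creating with count 0 then incrementing matches the pseudo-code
    fp_insert(child, rest)

def fp_create(transactions, frequent_items):
    """FP_Create in Python; `frequent_items` plays the role of
    BuildFrequentItemsList's output (most frequent item first)."""
    root = FPNode()
    rank = {item: i for i, item in enumerate(frequent_items)}
    for trans in transactions:
        sorted_trans = sorted((i for i in trans if i in rank), key=rank.get)
        fp_insert(root, sorted_trans)
    return root
```

Filtering on membership in `rank` also drops the infrequent items, mirroring the elimination step described above.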
Mining of an FP-tree is performed by calling FP_Growth(FP_tree, null), implemented as below (as described in [38]):
Procedure FP_Growth(Tree, x)
    If Tree contains a single path P Then
        Foreach combination β of the nodes in the path P
            Generate pattern β ∪ x with supp = minimum support of nodes in β
        Next
    Else
        Foreach ai in the header of Tree
            Generate pattern β = ai ∪ x with supp = supp(ai)
            Construct β's conditional pattern base
            Construct β's conditional FP-tree Treeβ
            If Treeβ is not empty Then
                Call FP_Growth(Treeβ, β)
            End If
        Next
    End If
End Procedure
Bibliography
[1] Razvan Andonie, Levente Fabry-Asztalos, Ioan Bogdan Crivat, Sarah Abdul-Wahid, and Badi Abdul-Wahid, "Fuzzy ARTMAP rule extraction in computational chemistry," in Proceedings of the International Joint Conference on Neural Networks (IJCNN), Atlanta, GA, 2009, pp. 157-163.
[2] Jamie MacLennan, Ioan Bogdan Crivat, and ZhaoHui Tang, Data Mining with Microsoft SQL Server 2008. Indianapolis, Indiana, United States of America: Wiley Publishing, Inc., 2009.
[3] Ioan Bogdan Crivat, Paul Sanders, Mosha Pasumansky, Marius Dumitru, Adrian Dumitrascu, Cristian Petculescu, Akshai Mirchandani, T.K Anand, Richard Tkachuk, Raman Iyer, Thomas Conlon, Alexander Berger, Sergei Gringauze, James MacLennan, and Rong Guan, "Systems and methods of utilizing and expanding standard protocol," USPTO Patent/Application Nbr. 7689703, 2010.
[4] Ioan B Crivat, Raman Iyer, and C James MacLennan, "Detecting and displaying exceptions in tabular data," USPTO Patent/Application Nbr. 7797264, 2010.
[5] Ioan B Crivat, Raman Iyer, and C. James MacLennan, "Dynamically detecting exceptions based on data changes," USPTO Patent/Application Nbr. 7797356, 2010.
[6] Ioan B Crivat, Raman Iyer, and James MacLennan, "Partitioning of a data mining training set," USPTO Patent/Application Nbr. 7756881, 2010.
[7] Ioan B Crivat, Cristian Petculescu, and Amir Netz, "Efficient Column Based Data Encoding for Large Scale Data Storage," USPTO Patent/Application Nbr. 20100030796 , 2010.
[8] Ioan B Crivat, Cristian Petculescu, and Amir Netz, "Explaining changes in measures thru data mining," USPTO Patent/Application Nbr. 7899776, 2011.
[9] Ioan B Crivat, Cristian Petculescu, and Amir Netz, "Random access in run-length encoded structures," USPTO Patent/Application Nbr. 7952499, 2011.
[10] Ioan B. Crivat, Raman Iyer, C. James MacLennan, Scott Oveson, Rong Guan, Zhaohui Tang, Pyungchul Kim, and Irina Gorbach, "Extensible data mining framework ," USPTO Patent/Application Nbr. 7383234, 2008.
[11] Ioan Bogdan Crivat, Pyungchul Kim, ZhaoHui Tang, James MacLennan, Raman Iyer, and Irina Gorbach, "Systems and methods that facilitate data mining," USPTO Patent/Application Nbr. 7398268, 2008.
[12] Ioan Bogdan Crivat, C. James MacLennan, Yue Liu, and Michael Moore, "Techniques for Evaluating Recommendation Systems," Application USPTO Patent/Application Nbr. 20090319330, 2009.
[13] Ioan B. Crivat, C. James MacLennan, and Raman Iyer, "Goal seeking using predictive analytics," USPTO Patent/Application Nbr. 7788200, 2010.
[14] Ioan Bogdan Crivat, Elena D. Cristofor, and C. James MacLennan, "Analyzing mining pattern evolutions by comparing labels, algorithms, or data patterns chosen by a reasoning component," USPTO Patent/Application Nbr. 7636698, 2009.
[15] Ioan Bogdan Crivat, C. James MacLennan, ZhaoHui Tang, and Raman S. Iyer, "Unstructured data in a mining model language," USPTO Patent/Application Nbr. 7593927, 2009.
[16] Ioan Bogdan Crivat, C. James MacLennan, Yue Liu, and Michael Moore, "Techniques for Evaluating Recommendation Systems," Patent Application (USPTO) USPTO Patent/Application Nbr. 20090319330, 2009.
[17] Jeff Davis. (2002, July) Data Mining with Access Queries [Online]. http://www.techrepublic.com/article/data-mining-with-access-queries/1043734
[18] DevExpress. Pivot Table® Style Data Mining Control for ASP.NET AJAX [Online]. http://www.devexpress.com/Products/NET/Controls/ASP/Pivot_Grid/
[19] Laura W. Murphy. (2010) Testimony Regarding Civil Liberties and National Security: Stopping the Flow of Power to the Executive Branch [Online].
[20] Intel Corporation. (2005) Excerpts from A Conversation with Gordon Moore: Moore's Law [Online]. ftp://download.intel.com/museum/Moores_Law/Video-Transcripts/Excepts_A_Conversation_with_Gordon_Moore.pdf
[21] Chip Walter. (2005, July) Kryder's Law [Online]. http://www.scientificamerican.com/article.cfm?id=kryders-law
[22] John Gantz and David Reinsel. (2010, May) The Digital Universe Decade – Are You Ready? [Online]. http://idcdocserv.com/925
[23] Roger E. Bohn and James E. Short. (2010, January) How Much Information? 2009 [Online]. http://hmi.ucsd.edu/pdf/HMI_2009_ConsumerReport_Dec9_2009.pdf
[24] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth, "Knowledge Discovery and Data Mining: Towards a Unifying Framework," in KDD, 1996.
[25] 11 Ants Analytics. www.11antsanalytics.com [Online]. http://www.11antsanalytics.com/products/default.aspx
[29] Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami, "Mining association rules between sets of items in large databases," in International Conference on Management of Data - SIGMOD, vol. 22, 1993, pp. 207-216.
[30] Jiawei Han and Micheline Kamber, Data Mining Concepts and Techniques. San Diego, CA, USA: Academic Press, 2001.
[31] Microsoft Corporation. Maximum Capacity Specifications for SQL Server [Online]. http://msdn.microsoft.com/en-us/library/ms143432.aspx
[33] Ramakrishnan Srikant and Rakesh Agrawal, "Mining quantitative association rules in large relational tables," in International Conference on Management of Data - SIGMOD, vol. 25, 1996, pp. 1-12.
[34] Nikola K. Kasabov, Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering.: Massachusetts Institute of Technology, 1998.
[35] E.H. Mamdani, "Application of Fuzzy Logic to Approximate Reasoning Using Linguistic Synthesis," IEEE Transactions on Computers - TC, vol. 26, no. 12, pp. 1182-1191.
[36] T. Takagi and M Sugeno, "Fuzzy identification of systems and its applications to modelling and control," IEEE Transactions on Systems, Man and Cybernetics, no. 15, pp. 116-132, 1985, http://pisis.unalmed.edu.co/vieja/cursos/s4405/Lecturas/Takagi%20Sugeno%20Modelling.pdf.
[37] Rakesh Agrawal and Ramakrishnan Srikant, "Fast Algorithms for Mining Association Rules," in Very Large Databases VLDB, 1994, http://www.eecs.umich.edu/~jag/eecs584/papers/apriori.pdf.
[38] Jiawei Han, Jian Pei, and Yiwen Yin, "Mining frequent patterns without candidate generation," in International Conference on Management of Data - SIGMOD, vol. 29, 2000, pp. 1-12.
[39] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao, "Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach," Data Mining and Knowledge Discovery, vol. 8, pp. 53-87, 2004.
[40] Ashok Savasere, Edward Omiecinski, and Shamkant B. Navathe, "An Efficient Algorithm for Mining Association Rules in Large Databases," in Very large Databases VLDB, 1995, pp. 432-444, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.5437&rep=rep1&type=pdf.
[41] Ramesh, C. Agarwal, Charu C. Aggarwal, and V.V.V. Prasad, "A Tree
Projection Algorithm For Generation of Frequent Itemsets," Journal of Parallel and Distributed Computing , 1999.
[42] Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal, "Efficient Mining of Association Rules Using Closed Itemset Lattices," Information Systems - IS, vol. 24, no. 1, pp. 25-46, 1999, http://cchen1.csie.ntust.edu.tw:8080/students/2009/Efficient%20mining%20of%20association%20rules%20using%20closed%20itemset%20lattices.pdf.
[43] Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal, "Discovering Frequent Closed Itemsets for Association Rules," International Conference on Database Theory - ICDT, pp. 398-416, 1999, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.37.1102&rep=rep1&type=pdf.
[44] Mohammed Javeed Zaki and Ching-Jui Hsiao, "CHARM: An Efficient Algorithm for Closed Itemset Mining," in SIAM International Conference on Data Mining - SDM, 2002.
[45] Zijian Zheng, Ron Kohavi, and Llew Mason, "Real world performance of association rule algorithms," in Knowledge Discovery and Data Mining - KDD, 2001, pp. 401-406.
[46] Yun Sing Koh and Nathan Rountree, Rare Association Rule Mining And Knowledge Discovery - Technologies for Infrequent and Critical Event Detection. Hershey, PA: Information Science Reference, 2010.
[47] Bing Liu, Wynne Hsu, and Yiming Ma, "Mining association rules with multiple minimum supports," in Knowledge Discovery and Data Mining - KDD, 1999, pp. 337-341.
[48] Hyunyoon Yun, Danshim Ha, Buhyun Hwang, and Keun Ho Ryu, "Mining association rules on significant rare data using relative support," Journal of Systems and Software - JSS, vol. 67, no. 3, pp. 181-191, 2003.
[49] Ke Wang, Yu He, and Jiawei Han, "Pushing Support Constraints Into Association Rules Mining," IEEE Transactions on Knowledge and Data Engineering : TKDE, pp. 642-658, 2003.
[50] Masakazu Seno and George Karypis, "LPMiner: An Algorithm for Finding Frequent Itemsets Using Length-Decreasing Support," in IEEE International Conference on Data Mining - ICDM, 2001.
[51] E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J.D. Ullman, and C. Yang, "Finding interesting associations without support pruning," IEEE Transactions on Knowledge and Data Engineering - TKDE, vol. 13, no. 1, pp. 64-78, 2001, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.7294&rep=rep1&type=pdf.
[52] Yun Sing Koh and Nathan Rountree, "Finding Sporadic Rules Using Apriori-Inverse," Lecture Notes in Computer Science, vol. 3518/2005, pp. 153-168, 2005.
[53] L. Szathmary, A. Napoli, P. Valtchev, and Vandceuvre-les-Nancy LORIA, "Towards Rare Itemset Mining," in IEEE International Conference on Tools with Artificial Intelligence - ICTAI 2007, 2007, pp. 305-312, http://hal.archives-ouvertes.fr/docs/00/18/94/24/PDF/szathmary-ictai07.pdf.
[54] J. R. Quinlan, "Induction of Decision Trees," Machine Learning - ML, vol. 1, no. 1, pp. 81-106, 1986.
[55] Leo Breiman, Jerome Friedman, Charles J Stone, and R A Olshen, Classification and Regression Trees.: Chapman & Hall, 1984.
[56] Cristopher M. Bishop, Neural Networks for Pattern Recognition. New York: Oxford University Press, Inc, 1995.
[57] G.A. Carpenter and S Grossberg, The Handbook of Brain Theory and Neural Networks, Michael A. Arbib, Ed. Cambridge, MA: MIT Press, 2003, http://cns.bu.edu/Profiles/Grossberg/CarGro2003HBTNN2.pdf.
[58] Robert Andrews, Joachim Diederich, and Alan B. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge Based Systems - KBS, vol. 8, no. 6, pp. 373-389, 1995.
[59] Alan B. Tickle, Robert Andrews, Mostefa Golea, and Joachim Diederich, "The Truth Will Come to Light: Directions and Challenges in Extracting the Knowledge Embedded Within Trained Artificial Neural Networks," IEEE Transactions on Neural Networks, vol. 9, no. 6, 1998.
[60] K Saito and R. Nakano, "Medical diagnosis expert system based on PDP model," in IEEE International Conference on Neural Networks, New York, 1988, pp. 1255-1262.
[61] Kurt Hornik, Maxwell B. Stinchcombe, and Halbert White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
[62] Bart Kosko, "Fuzzy Systems as Universal Approximators," IEEE Transactions on Computers - TC, vol. 43, no. 11, pp. 1329-1333, 1994, http://sipi.usc.edu/~kosko/FuzzyUniversalApprox.pdf.
[63] J. J. Buckley, Y. Hayashi, and E. Czogala, "On the equivalence of neural nets and fuzzy expert systems," Fuzzy Sets and Systems, vol. 53, no. 2, pp. 129-134, 1993.
[64] J.M. Benitez, J.L. Castro, and I. Requena, "Are artificial neural networks black boxes?," IEEE Transactions on neural Networks, pp. 1156 - 1164 , 1997, http://www.imamu.edu.sa/Scientific_selections/abstracts/Math/Are%20Artificial%20Neural%20Networks%20Black%20Boxes.pdf.
[65] S. Mitra and Y. Hayashi, "Neuro-fuzzy rule generation: survey in soft computing framework," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 748-768, 2000.
[66] Razvan Andonie, Levente Fabry-asztalos, Catharine Collar, Sarah Abdul-wahid, and Nicholas Salim, "Neuro-fuzzy Prediction of Biological Activity and Rule Extraction for HIV-1 Protease Inhibitors," in Symposium on Computational Intelligence in Bioinformatics and Computational Biology - CIBCB, 2005, pp. 113-120.
[67] J. Chorowski and J. M. Zurada, "Extracting Rules from Neural Networks as Decision Diagrams," IEEE Transactions on Neural Networks, vol. PP, no. 99, pp. 1-12, 2011.
[68] Magne Setnes, Robert Babuska, Uzay Kaymak, and Hans R. van Nauta Lemke, "Similarity Measures in Fuzzy Rule Base Simplification," IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, vol. 28, no. 3, June 1998.
[69] M. Mizumoto and H. J. Zimmermann, "Comparison of fuzzy reasoning methods," Fuzzy Sets and Systems - FSS, vol. 8, no. 3, pp. 253-283, 1982.
[70] László T. Kóczy and Kaoru Hirota, "Approximate reasoning by linear rule interpolation and general approximation," International Journal of Approximate Reasoning - IJAR , vol. 9, no. 3, pp. 197-225, 1993.
[71] I.T. Jolliffe, Principal Component Analysis.: Springer, 2002.
[72] J.W. Sammon, "A Nonlinear Mapping for Data Structure Analysis," IEEE Transactions on Computers - TC, vol. C-18, no. 5, pp. 401-409, 1969, http://www.mec.ita.br/~rodrigo/Disciplinas/MB213/Sammon1969.pdf.
[73] Manoranjan Dash and Huan Liu, "Feature Selection for Classification," Intelligent Data Analysis - IDA, vol. 1, no. 1-4, pp. 131-156, 1997, http://reference.kfupm.edu.sa/content/f/e/feature_selection_for_classification__39093.pdf.
[74] B.G. Song, R.J., II Marks, S. Oh, P. Arabshahi, T.P. Caudell, and J.J. Choi, "Adaptive membership function fusion and annihilation in fuzzy if-then rules," in Second IEEE International Conference on Fuzzy Systems, vol. 2, 1993, pp. 961 - 967.
[75] N. Xiong and Lothar Litz, "Reduction of fuzzy control rules by means of premise learning - method and case study," Fuzzy Sets and Systems - FSS, vol. 132, no. 2, pp. 217-231, 2002, http://www.sciencedirect.com/science/article/pii/S0165011402001124.
[76] Johannes A. Roubos, Magne Setnes, and János Abonyi, "Learning fuzzy classification rules from labeled data," Information Sciences - ISCI, vol. 150, no. 1-2, pp. 77-93, 2003, http://sci2s.ugr.es/keel/pdf/specific/articulo/15-E.pdf.
[77] Gail A. Carpenter, Stephen Grossberg, and David B. Rosen, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system," Neural Networks, vol. 4, no. 6, pp. 759-771, 1991.
[78] R Andonie and L. Sasu, "Fuzzy ARTMAP with input relevances," IEEE Transactions on Neural Networks, vol. 17, pp. 929–941, 2006.
[79] Gail Carpenter and H. A. Tan, "Rule Extraction: From Neural Architecture to Symbolic Representation," Connection Science, vol. 7, no. 1, pp. 3-27, 1995.
[80] S. C. Tan, Chee Peng Lim, and M. V. C. Rao, "A hybrid neural network model for rule generation and its application to process fault detection and diagnosis," Engineering Applications of Artificial Intelligence - EAAI, vol. 20, no. 2, pp. 203-213, 2007.
[81] G. A Carpenter and A.-H. Tan, "Rule Extraction, Fuzzy ARTMAP and medical databases," in Proceedings of the World Congress on Neural Networks, Portland, Oregon; Hillsdale, NJ, 1993, pp. 501-506, http://digilib.bu.edu/journals/ojs/index.php/trs/article/view/430.
[82] Ioan B Crivat, Cristian Petculescu, and Amir Netz, "Efficient Column Based Data Encoding for Large Scale Data Storage," Patent Application (USPTO) USPTO Patent/Application Nbr. 20100030796, 2010.
[83] Ioan B Crivat, Cristian Petculescu, and Amir Netz, "Random access in run-length encoded structures," Patent (USPTO) USPTO Patent/Application Nbr. 7952499, 2011.
[84] Jiawei Han and Yongjian Fu, "Discovery of Multiple-Level Association Rules from Large Databases," in Very Large Databases - VLDB, 1995, pp. 420-431, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.64.3214&rep=rep1&type=pdf.
[85] Greg Linden, B. Smith, and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," Internet Computing, IEEE , vol. 7, no. 1, pp. 76 - 80, January 2003.
[87] David Goldberg, David A. Nichols, Brian M. Oki, and Douglas Terry, "Using collaborative filtering to weave an information tapestry," Communications of the ACM - CACM, vol. 35, no. 12, pp. 61-70,
[88] Xiaoyuan Su and Taghi M. Khoshgoftaar, "A Survey of Collaborative Filtering Techniques," Advances in Artificial Intelligence, no. January 2009, 2009, http://www.hindawi.com/journals/aai/2009/421425/.
[89] Badrul Sarwar, George Karypis, Joseph Konstan, and John Reidl, "Item-based collaborative filtering recommendation algorithms," in World Wide Web Conference Series - WWW, 2001, pp. 285-295, http://glaros.dtc.umn.edu/gkhome/fetch/papers/www10_sarwar.pdf.
[90] Jeff J. Sandvig, Bamshad Mobasher, and Robin D. Burke, "Robustness of collaborative recommendation based on association rule mining," in Conference on Recommender Systems - RecSys, 2007, pp. 105-112, http://maya.cs.depaul.edu/~mobasher/papers/smb-recsys07.pdf.
[91] R Andonie, J.E. Russo, and R. Dean, "Crossing the Rubicon: A Generic Intelligent Advisor," International Journal of Computers, Communications & Control, vol. 2, pp. 5-16, 2007, http://www.cwu.edu/~andonie/MyPapers/Advisor%202005.pdf.
[92] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl, "Evaluating collaborative filtering recommender systems," ACM Transactions on Information Systems - TOIS, vol. 22, no. 1, pp. 5-53, 2004, http://web.engr.oregonstate.edu/~herlock/papers/tois2004.pdf.
[93] Asela Gunawardana and Guy Shani, "A Survey of Accuracy Evaluation Metrics of Recommendation Tasks," Journal of Machine Learning Research - JMLR, vol. 10, pp. 2935-2962, 2009, http://research.microsoft.com/pubs/118124/gunawardana09a.pdf.
[94] Ron Kohavi, Roger Longbotham, Dan Sommerfield, and Randal M. Henne, "Controlled experiments on the web: survey and practical guide," Data Mining and Knowledge Discovery, vol. 18, no. 1, pp. 140-181, http://www.springerlink.com/content/r28m75k77u145115/fulltext.pdf.
[95] Cyril W. Cleverdon and Michael Keen, "Aslib Cranfield research project - Factors determining the performance of indexing systems; Volume 2, Test results," 1966.
[96] Daniel Billsus and Michael J. Pazzani, "Learning Collaborative Information Filters," in International Conference on Machine Learning - ICML, 1998, pp. 46-54, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.4781&rep=rep1&type=pdf.
[97] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, "Analysis of recommendation algorithms for e-commerce," in ACM Conference on Electronic Commerce - EC, 2000, pp. 158-167.
[98] C. J. Van Rijsbergen, Information Retrieval.: Butterworth-Heinemann, 1979.
[99] Yiming Yang and Xin Liu, "A re-examination of text categorization methods," in Research and Development in Information Retrieval - SIGIR, 1999, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.11.9519&rep=rep1&type=pdf.
[100] John A. Swets, "EFFECTIVENESS OF INFORMATION RETRIEVAL METHODS," 1969.
[101] James A. Hanley and Barbara J. McNeil, "The Meaning and Use of the Area undera Receiver Operating Characteristics (ROC) Curve," Radiology, vol. 143, no. 1, pp. 29-36, April 1982, http://www.medicine.mcgill.ca/epidemiology/hanley/software/Hanley_McNeil_Radiology_82.pdf.
[102] Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, David M. Pennock, and David Ungar, "Methods and metrics for cold-start recommendations," in Research and Development in Information Retrieval - SIGIR, 2002.
[103] Ellen M. Voorhees, "Overview of the TREC 2002 Question Answering Track," in Text Retrieval Conference - TREC, 2002, http://trec.nist.gov/pubs/trec11/papers/QA11.pdf.
[104] Bamshad Mobasher, Honghua Dai, Tao Luo, and Miki Nakagawa, "Effective personalization based on association rule discovery from web usage data," in Web Information and Data Management - WIDM, 2001, pp. 9-15.
[105] François Fouss and Marco Saerens, "Evaluating Performance of Recommender Systems: An Experimental Comparison," in Web Intelligence - WI, 2008, pp. 735-738.
[106] B. J. Dahlen, J. A. Konstan, J. L. Herlocker, N. Good, A. Borchers, and Riedl J., "Jump-starting movielens: user benefits of starting a collaborative filtering system with "dead data"," , 1998.
[107] Ian, H. Witten and Eibe Frank, Data Mining - Practical Machine Learning Tools and Techniques. San Francisco, CA, USA: Morgan Kauffman, 2005.
[108] Microsoft Corp. Data Mining Extensions (DMX) Reference [Online]. http://msdn.microsoft.com/en-us/library/ms132058.aspx
[109] Usama Fayyad, Georges G. Grinstein, and Andreas Wierse, Information Visualization in Data Mining and Knowledge Discovery. San Diego, CA, USA: Academic Press, 2002.
[110] D. Bamber, "The area above the ordinal dominance graph and the area below the receiver operating characteristic graph.," Journal of Mathematical Psychology, vol. 12, pp. 387-415, 1975.
[112] Microsoft Academic Search. [Online]. http://academic.research.microsoft.com/
[113] Google Scholar. [Online]. http://scholar.google.com/
[114] Ioan B. Crivat, C. James MacLennan, Raman Iyer, and Marius Dumitru, "Using a rowset as a query parameter," USPTO Patent/Application Nbr. 7451137, 2008.