Table of Contents

Chapter 4. Clustering and Association Analysis
  4.1. Cluster Analysis or Clustering
    4.1.1. Distance and similarity measurement
    4.1.2. Clustering Methods
    4.1.3. Partition-based Methods
    4.1.4. Hierarchical-based clustering
    4.1.5. Density-based clustering
    4.1.6. Grid-based clustering
    4.1.7. Model-based clustering
  4.2. Association Analysis and Frequent Pattern Mining
    4.2.1. Apriori algorithm
    4.2.2. FP-Tree algorithm
    4.2.3. CHARM algorithm
    4.2.4. Association Rules with Hierarchical Structure
    4.2.5. Efficient Association Rule Mining with Hierarchical Structure
  4.3. Historical Bibliography
  Exercise
Figure 4-2: A dissimilarity (distance) matrix and similarity matrix
The first approach is to normalize all attributes into a fixed standard scale, say 0.0 to 1.0 (or -1.0
to 1.0) and then use a distance measure, such as Euclidean distance or Manhattan distance, or use
a similarity measure, such as cosine similarity, to determine the proximity between a pair of objects. The details of this approach are described in Section 2.5.2. The second approach is to use different measurements for different types of attributes, as follows.
No. Type Method
1. Interval-scaled attributes
[Normalization Step]
Option 1: transformation of the original value $x_{if}$ to the standardized value $z_{if}$ using the mean absolute deviation of the attribute $f$, denoted by $s_f$, and the mean value of the attribute $f$, denoted by $m_f$:

$z_{if} = \dfrac{x_{if} - m_f}{s_f}$, where $s_f = \dfrac{1}{n}\sum_{i=1}^{n} |x_{if} - m_f|$ and $m_f = \dfrac{1}{n}\sum_{i=1}^{n} x_{if}$.

Option 2: transformation of the original value $x_{if}$ to the standardized value $z_{if}$ using the standard deviation of the attribute $f$, denoted by $\sigma_f$, and the mean value of the attribute $f$, denoted by $m_f$:

$z_{if} = \dfrac{x_{if} - m_f}{\sigma_f}$, where $\sigma_f = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n} (x_{if} - m_f)^2}$.
[Distance Measurement Step] For distance measurement in Figure 4-2, we can use a standard measure such as Euclidean distance or Manhattan distance between two objects, say $x_i = (x_{i1}, \ldots, x_{in})$ and $x_j = (x_{j1}, \ldots, x_{jn})$, as follows.

Euclidean Distance: $d(i,j) = \sqrt{\sum_{f=1}^{n} (x_{if} - x_{jf})^2}$

Manhattan Distance: $d(i,j) = \sum_{f=1}^{n} |x_{if} - x_{jf}|$
Both Euclidean distance and Manhattan distance satisfy the following mathematical requirements of a distance function.
1. $d(i,j) \ge 0$: The distance is a nonnegative number.
2. $d(i,i) = 0$: The distance from an object to itself is zero.
3. $d(i,j) = d(j,i)$: The distance is a symmetric function.
4. $d(i,j) \le d(i,h) + d(h,j)$: The distance satisfies the triangle inequality. The direct distance from object i to object j is not larger than any indirect path through another object h.
1. Interval-scaled attributes (continued)
It is also possible to measure similarity instead of distance. As for Figure 4-2, two common similarity measures are the dot product and cosine similarity. Their formulae are given below. When the object vectors are normalized to unit length, the dot product and the cosine similarity become identical.

Dot Product: $s(i,j) = \sum_{f=1}^{n} x_{if}\, x_{jf}$

Cosine Similarity: $s(i,j) = \dfrac{\sum_{f=1}^{n} x_{if}\, x_{jf}}{\sqrt{\sum_{f=1}^{n} x_{if}^2}\,\sqrt{\sum_{f=1}^{n} x_{jf}^2}}$

Note that the dot product has no bound, but the cosine similarity ranges between -1 and 1 (between 0 and 1 for nonnegative attribute values, as in this task).
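To make these measures concrete, the following minimal Python sketch (the function and variable names are our own, not part of the text) computes the Euclidean distance, Manhattan distance, dot product, and cosine similarity between two attribute vectors that have already been normalized into the range 0.0 to 1.0.

import math

def euclidean(x, y):
    # d(i, j) = sqrt(sum_f (x_f - y_f)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # d(i, j) = sum_f |x_f - y_f|
    return sum(abs(a - b) for a, b in zip(x, y))

def dot_product(x, y):
    # s(i, j) = sum_f x_f * y_f  (unbounded)
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    # s(i, j) = dot(x, y) / (|x| * |y|), ranges between -1 and 1
    norm_x = math.sqrt(dot_product(x, x))
    norm_y = math.sqrt(dot_product(y, y))
    return dot_product(x, y) / (norm_x * norm_y)

# Example: two objects described by three interval-scaled attributes.
x_i = [0.2, 0.8, 0.5]
x_j = [0.3, 0.6, 0.9]
print(euclidean(x_i, x_j), manhattan(x_i, x_j), cosine(x_i, x_j))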
2. Categorical attributes

A categorical attribute can be viewed as a generalization of the binary attribute in that it can take more than two states. For example, 'product brand' is a categorical attribute that may take one value from a set of more than two possible values, say Oracle, Microsoft, Google, and Facebook. For the distance of a categorical attribute, it is possible to use an approach analogous to the interval-scaled case by setting the distance to 0 if two objects have the same value and 1 otherwise. That is, when the value of a categorical attribute A of two objects, $x_i$ and $x_j$, is the same, the dissimilarity and the similarity between these two objects for that attribute are set to 0 and 1, respectively. Otherwise they are 1 and 0, respectively:

$d_A(i,j) = 0$ and $s_A(i,j) = 1$ when the object $x_i$ and the object $x_j$ have the same value for attribute A.
$d_A(i,j) = 1$ and $s_A(i,j) = 0$ when the object $x_i$ and the object $x_j$ have different values for attribute A.
3. Ordinal attributes

A discrete ordinal attribute lies between a categorical attribute and a numeric-valued attribute in the sense that an ordinal attribute has a number of discrete values (like a categorical attribute) but they can be ordered in a meaningful sequence (like a numeric-valued attribute). An ordinal attribute is useful for recording subjective assessments of qualities that cannot be measured objectively. For example, 'height' can be high, middle or low, and 'weight' can be heavy, medium or light. This ordinal property presents continuity on an unknown scale, but the actual magnitude is not known. To handle the scale of an ordinal attribute, we can treat an ordinal variable by normalization as follows. First, the values of the ordinal attribute are mapped to ranks. For example, suppose that the $M_f$ ordered values of an ordinal attribute $f$ have been mapped to the ranks $1, 2, \ldots, M_f$, so that the value of attribute $f$ for the i-th object corresponds to a rank $r_{if} \in \{1, \ldots, M_f\}$. Second, each rank is mapped to a value between 0 and 1 as follows:

$z_{if} = \dfrac{r_{if} - 1}{M_f - 1}$

Third, the dissimilarity can then be computed using any of the distance measures for interval-scaled variables, using $z_{if}$ to represent the value for the i-th object.
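As an illustration of this rank-based normalization (a small sketch with made-up attribute levels, not taken from the text), the following Python snippet maps ordinal values to ranks and then to the interval [0, 1].

# Ordered levels of an ordinal attribute, e.g. 'height'.
levels = ["low", "middle", "high"]          # ranks 1, 2, 3  (M = 3)
rank = {v: r for r, v in enumerate(levels, start=1)}

def normalize_ordinal(value, num_levels):
    # z = (r - 1) / (M - 1): the lowest rank maps to 0.0 and the highest to 1.0
    return (rank[value] - 1) / (num_levels - 1)

objects = ["low", "high", "middle"]
z = [normalize_ordinal(v, len(levels)) for v in objects]
print(z)   # [0.0, 1.0, 0.5]; these values can now be compared with Euclidean distance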
4. Binary attributes

Even though it is possible to use the same approach as with interval-scaled attributes, treating binary variables as if they were interval-scaled may not be suitable, since it may lead to improper clustering results. Here, it is necessary to use more suitable methods specific to binary data for computing dissimilarities. As one approach, a dissimilarity matrix can be calculated from the given binary data. Normally we consider all binary variables to have the same weight. With this setting, a 2-by-2 contingency table can be constructed to calculate the dissimilarity between object i and object j as follows.
                       Object j
                  1        0        Total
Object i    1     a        b        a+b
            0     c        d        c+d
            Total a+c      b+d      a+b+c+d
Here, two types of dissimilarity measures for binary attributes are the symmetric binary dissimilarity and the asymmetric binary dissimilarity:

$d_{sym}(i,j) = \dfrac{b + c}{a + b + c + d}$,   $d_{asym}(i,j) = \dfrac{b + c}{a + b + c}$

The asymmetric binary dissimilarity is used when the positive and negative outcomes of a binary attribute are not equally important, such as the positive and negative outcomes of a disease test. That is, the value 1 of an attribute has a different importance level from the value 0 of that attribute. For example, we may give more importance to the outcome of HIV positive (1), which occurs rarely, and less importance to the outcome of HIV negative (0), which is the usual result. As the negated versions of these dissimilarities, we can calculate the symmetric binary similarity and the asymmetric binary similarity as follows:

$s_{sym}(i,j) = \dfrac{a + d}{a + b + c + d} = 1 - d_{sym}(i,j)$,   $s_{asym}(i,j) = \dfrac{a}{a + b + c} = 1 - d_{asym}(i,j)$
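The sketch below (variable names are ours) builds the 2-by-2 contingency table for two objects described by binary attributes and derives the symmetric and asymmetric dissimilarities defined above.

def contingency(obj_i, obj_j):
    # a: both 1, b: i=1 and j=0, c: i=0 and j=1, d: both 0
    a = sum(1 for x, y in zip(obj_i, obj_j) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(obj_i, obj_j) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(obj_i, obj_j) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(obj_i, obj_j) if x == 0 and y == 0)
    return a, b, c, d

def symmetric_dissimilarity(obj_i, obj_j):
    a, b, c, d = contingency(obj_i, obj_j)
    return (b + c) / (a + b + c + d)

def asymmetric_dissimilarity(obj_i, obj_j):
    # the number of matching 0s (d) is ignored, since negative outcomes carry little information
    a, b, c, d = contingency(obj_i, obj_j)
    return (b + c) / (a + b + c)

# Two patients described by five binary test results (1 = positive, 0 = negative).
p1 = [1, 0, 1, 0, 0]
p2 = [1, 1, 0, 0, 0]
print(symmetric_dissimilarity(p1, p2))   # (1+1)/5 = 0.4
print(asymmetric_dissimilarity(p1, p2))  # (1+1)/3 = 0.667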
5. Ratio-scaled attributes

A ratio-scaled attribute takes a positive value on a nonlinear scale, such as an exponential scale. The following is the typical form of a ratio-scaled value:

$x = A e^{Bt}$

Here, A is a positive numeric constant, B is a numeric constant, and t is the variable of interest. It is not good to treat ratio-scaled attributes like interval-scaled attributes, since the scale is likely to be distorted by the exponential growth. There are two common methods to compute the dissimilarity between objects described by ratio-scaled attributes.
1. The first method is to apply a logarithmic transformation to the value of a ratio-scaled attribute for object i, say $x_{if}$, by using the formula $y_{if} = \log(x_{if})$. The transformed value $y_{if}$ can then be treated as an interval-valued attribute. However, for some data it may be more suitable to use another transformation, such as a log-log transformation.
2. The second method is to treat $x_{if}$ as a continuous ordinal attribute and treat its ranks as interval-valued.
4.1.2. Clustering Methods

Although there are many existing clustering algorithms, the major clustering methods can be classified into partition-based methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. Their details are discussed below.
1. Partitioning methods

A partitioning method divides n objects (data tuples) into k partitions of the data, where each partition represents a cluster and k ≤ n. This method needs to specify k, the number of partitions, beforehand. Usually clustering assigns one object to only one cluster, but it is possible to allow an object to be assigned to several clusters, as in fuzzy partitioning techniques. The steps of partitioning methods are as follows. First, the partitioning method assigns each object, randomly or heuristically, to a cluster as the initial partition. Here each cluster has at least one object assigned from the beginning. Second, the method relocates objects among clusters iteratively, attempting to improve the partitioning by moving objects from one group to another based on a predefined criterion, namely that objects in the same cluster are close to each other, whereas they are far apart or very different from objects in different clusters. At present, there are a few popular heuristic methods, such as (1) the k-means algorithm, where each cluster is represented by the mean value of the objects in the cluster, and (2) the k-medoids algorithm, where each cluster is represented by one of the objects located near the center of the cluster. The partitioning methods seem to work well for constructing a number of spherical-shaped clusters in small- to medium-sized databases. However, these methods need some modification to deal with clusters of complex shapes and to cluster very large data sets.
2. Hierarchical methods
Unlike a partitioning method, a hierarchical method does not specify the number of
clusters beforehand but attempts to create a hierarchical structure for the given set of data
objects. Two types of hierarchical methods are agglomerative and divisive. The
agglomerative method is the bottom-up approach where the process starts with each
object forming a separate group and then successively merges the objects or groups that
are close to one another, until all of the groups are merged into one (the topmost level of
the hierarchy), or until a termination condition holds. On the other hand, the divisive
method is the top-down approach where the procedure begins with all of the objects in the
same cluster and then for each successive iteration, a cluster is split up into smaller
clusters, until eventually each object is in one cluster, or until a termination condition holds.
Although hierarchical methods have small computation costs, since they avoid considering a combinatorial number of different merging or splitting choices, they may suffer from erroneous decisions at each merging or splitting step, since once a step is done it can never be undone. To avoid this problem, two solutions are (1) to perform careful analysis
of linkages among objects at each hierarchical partitioning, as in Chameleon, or (2) to
integrate hierarchical agglomeration and other approaches by first using a hierarchical
agglomerative algorithm to group objects into microclusters, and then performing
macroclustering on the microclusters using another clustering method such as iterative
relocation, as in BIRCH.
3. Density-based methods
Most methods which use distance between objects for clustering will tend to find clusters
with a spherical shape. However, in general clusters can have arbitrary shapes. To address this, the notion of density can be applied to group objects into clusters of any shape. The general
idea is to start from a single point in a cluster and then to grow the given cluster as long as
the density (number of data points or objects) in the “neighborhood” exceeds a threshold.
For each data point within a given cluster, the neighborhood of a given radius has to
contain at least a minimum number of points. A density-based method tries to filter out
noise (outliers) and discover clusters of arbitrary shape. Examples of the density-based approach are DBSCAN, its extension OPTICS, and DENCLUE.
4. Grid-based methods
While the point-to-point (object pair similarity) calculation in most clustering methods is slow, a grid-based method first divides the object space into a finite number of cells that form a grid structure. Then clustering operations are applied on this grid structure. Grid-based methods are superior in their fast processing time. Rather than on the number of data objects, the time complexity depends only on the number of cells in each dimension of the quantized space. A typical grid-based method is STING. It is also possible to combine grid-based and density-based approaches, as done in WaveCluster.
5. Model-based methods:
Instead of using a simple similarity definition, a model-based method predefines a suitable model for each of the clusters and then finds the best fit of the data to the given model. The model may form clusters by constructing a density function that reflects the spatial distribution of the data points. With statistical criteria, we can automatically detect the number of clusters and obtain more robustness. Some well-known model-based methods are EM, COBWEB and SOM.
It is hard to say which type of clustering fits a given task best. The choice depends both on the type of data available and on the particular purpose of the application. It is possible to explore the methods one by one, inspect their resulting clusters, and compare them to find the most practical one. Some clustering methods may combine the ideas of several clustering methods and become mixed-type clustering. Moreover, some clustering tasks, such as text clustering or DNA
microarray clustering, may have high dimension, causing difficulty in clustering since the data
become sparse. Clustering high-dimensional data is challenging due to the curse of
dimensionality. Many dimensions may not be relevant. As the number of dimensions increases,
the data become increasingly sparse so that the distance measurement between pairs of points
becomes meaningless and the average density of points anywhere in the data is likely to be low.
For this task, two influential subspace clustering methods are CLIQUE and PROCLUS. Rather than
searching over the entire data space, they search for clusters in subspaces (or subsets of
dimensions) of the data. Frequent pattern–based clustering, another clustering methodology,
extracts distinct frequent patterns among subsets of dimensions that occur frequently. It uses
such patterns to group objects and generate meaningful clusters. pCluster is an example of
frequent pattern–based clustering that groups objects based on their pattern similarity. Beyond
simple clustering, constraint-based clustering performs clustering under user-specified or
application-oriented constraints. Users may have some preferences on clustering data and
specify them as constraints in clustering process. A constraint is a user’s expectation or describes
“properties” of the desired clustering results. For example, objects in a space are clustered under
the existence of obstacles or they are clustered when some objects are known to be in or not in
the same cluster.
4.1.3. Partition-based Methods

A partitioning method divides n objects (data tuples) into k partitions of the data, where each partition represents a cluster and k ≤ n. This method needs to specify k, the number of partitions, beforehand. As the most classic partitional clustering method, the k-means method receives in advance the number of clusters k to construct. With this parameter, k points are chosen at random as cluster centers.
Next all instances are assigned to their closest cluster center according to the ordinary distance
metric, such as Euclidean distance. Next, the centroid, or mean, of the instances in each cluster is
calculated to be a new centroid. These centroids are taken to be new center values for their
respective clusters. The whole process is repeated with the new cluster centers. Finally, iteration
continues until the same points are assigned to each cluster in consecutive rounds, at which stage
the cluster centers have stabilized and will remain the same forever. In summary, four steps of
the k-means algorithm are as follows.
1. Partition objects into k non-empty subsets
2. Compute seed points as the centroids of the clusters of the current partition. The
centroid is the center (mean point) of the cluster.
3. Assign each object to the cluster with the nearest seed point.
4. Go back to Step 2; stop when there are no more new assignments, that is, when no member changes its group.
The formal description of k-means can be given as follows. Given a set of objects without class labels, denoted by $T = \{x_1, x_2, \ldots, x_N\}$, each object $x_i$ is represented by an n-dimensional attribute vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, depicting the measured values of the n attributes $A_1, A_2, \ldots, A_n$. Algorithm 4.1 shows a pseudocode of the k-means method.
Algorithm 4.1. k-means Algorithm
Input: T is a dataset, where T = {x_1, x_2, ..., x_N}, and k is the number of clusters.
Output: A set of clusters C = {c_1, c_2, ..., c_k}, where each element c_j is a cluster with its members.
Procedure:
(1) FOREACH x_i IN T { cluster(x_i) := random(1..k); }           // Randomly assign x_i to a class
(2) WHILE some members change their groups {
(3)   FOREACH c_j IN C { mu_j := mean of the members of c_j; }   // Calc centroid of each cluster
(4)   FOREACH x_i IN T {
(5)     FOREACH c_j IN C { d_ij := distance(x_i, mu_j); }        // Calc distance to each cluster
(6)     best := argmin_j d_ij;                                   // Select the best cluster for x_i
(7)     cluster(x_i) := best; }                                  // Assign x_i to the best cluster
(8) }
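For illustration, the following short Python sketch (the function and variable names are our own, not part of Algorithm 4.1) follows the same steps: it randomly assigns objects to k clusters, recomputes centroids, and reassigns objects until no membership changes.

import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def k_means(data, k, seed=0):
    random.seed(seed)
    # Step (1): randomly assign each object to one of the k clusters.
    assignment = [random.randrange(k) for _ in data]
    changed = True
    while changed:                       # Step (2): repeat while members change groups
        # Step (3): compute the centroid of each (non-empty) cluster.
        centroids = []
        for j in range(k):
            members = [x for x, c in zip(data, assignment) if c == j]
            centroids.append(centroid(members) if members else random.choice(data))
        changed = False
        # Steps (4)-(7): reassign every object to its nearest centroid.
        for i, x in enumerate(data):
            best = min(range(k), key=lambda j: euclidean(x, centroids[j]))
            if best != assignment[i]:
                assignment[i] = best
                changed = True
    return assignment, centroids

# Toy two-dimensional data, similar in spirit to Figure 4-3.
data = [[1, 8], [2, 7], [2, 10], [3, 2], [3, 9], [4, 3], [8, 2], [8, 3], [9, 3], [10, 4]]
labels, centers = k_means(data, k=2)
print(labels, centers)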
Figure 4-3: A graphical example of k-means clustering (rounds 1 to 4)
For more clarity, Figure 4-3 and Figure 4-4 show a graphical example of the k-means algorithm and its calculation at each step, respectively. Compared with other methods, the k-means clustering method is simple and effective. Its principle is to swap the clusters' members among clusters until the total distance from each of the cluster's points to its center becomes minimal and no more swapping is needed. However, it is possible to have situations in which k-means fails to find a good clustering, as in Figure 4-5. This figure shows a local optimum due to improper initial clusters. We can view that these four objects are arranged at the vertices of a
rectangle in two-dimensional space. In the figure, two initial clusters are A and B, where P1 and
P3 are in the cluster A, and P2 and P4 are grouped in the cluster B. Graphically the two initial
cluster centers fall at the middle points of the long sides. This clustering result seems stable.
However, the two natural clusters should be formed by grouping together the two vertices at
either end of a short side. That is, P1 and P2 are in the cluster A, and P3 and P4 are in the cluster
B. In the k-means method, the final clusters are quite sensitive to the initial cluster centers.
Completely different clustering results may be obtained even if slight changes are made in the
initial random cluster assignment. To increase the chance of finding a global minimum, one can
execute this algorithm several times with different initial choices and choose the best final result,
the one with the smallest total distance.
[Figure 4-4 consists of four tables, one per round (rounds one to four). Each table lists the twenty objects with columns No., x, y, the distance to centroid A, the distance to centroid B, and the assigned cluster, together with the coordinates of the centroids A and B recomputed in that round.]
Figure 4-4: Numerical calculation of the k-means clustering in Figure 4-3
Figure 4-5: A local optimum due to improper initial clusters.
Many variants of the basic k-means method have been developed. Some of them try to produce a hierarchical clustering result (shown in the next section) with a cutting point at k groups and then perform k-means clustering on the result. However, all of these methods still leave open the question of how large k should be, and it is hard to estimate the likely number of clusters. One solution is to try different values of k and choose the best clustering result, i.e., the one with the largest intra-cluster similarity and the smallest inter-cluster similarity. Another solution for finding k is to begin by finding a few clusters and determining whether it is worth splitting them. For example, we choose k=2, perform k-means clustering until it terminates, and then consider splitting each cluster.
4.1.4. Hierarchical-based clustering
As one of the common clustering methods, hierarchical clustering groups data objects into a tree
of clusters, without a predefined number of clusters. Its basic operation is to merge similar objects or object groups, or to split dissimilar objects or object groups into different groups. However, the most serious drawback of a pure hierarchical clustering method is its inability to reassign objects once a merge or split decision has been executed. If a particular merge or split decision later turns out to be wrong, the method cannot correct it. To solve this, it is possible to incorporate some iterative relocation mechanism into the original version. In general, there are two types of hierarchical clustering methods, agglomerative and divisive, depending
on whether the hierarchical structure (tree) is formed in either bottom-up (merging) or top-
down (splitting) style. As for the bottom-up fashion, the agglomerative hierarchical clustering
approach starts with having each object in its own cluster and then merges these atomic clusters
into larger and larger clusters, until all of the objects are included in a single cluster or until some
certain termination conditions are satisfied. Most hierarchical clustering methods belong to this
type. However, there may be various definitions of intercluster similarity. On the other hand, as
for the top-down fashion, the divisive hierarchical clustering approach performs the reverse of
agglomerative hierarchical clustering by starting with all objects in one cluster. It subdivides the
cluster into smaller and smaller pieces, until each object forms a cluster on its own or until it
satisfies certain termination conditions, such as a desired number of clusters is obtained or the
diameter of each cluster is within a certain threshold.
Agglomerative versus divisive hierarchical clustering
Hierarchical clustering can proceed in two directions, bottom-up and top-down, and both can be represented in the form of a tree structure. In general, such a tree structure is called a dendrogram. It is commonly used to represent the process of hierarchical clustering. It shows
how objects are grouped together step by step. Figure 4-6 shows a dendrogram for seven objects
in (a) agglomerative clustering and (b) divisive clustering. Here, Step 0 is the initial stage while
Step 6 is the final stage when a single cluster is constructed. The agglomerative hierarchical
clustering starts by placing each object into a cluster of its own. Then it tries to merge step-by-step
according to some criterion. For example, in Figure 4-6 (a) at step 4, the agglomerative hierarchical clustering method attempts to merge {a,b,c} with {d} and form {a,b,c,d} in the bottom-up manner. This approach is also known as AGNES (AGglomerative NESting). In Figure 4-6 (b) at step 3, the divisive hierarchical clustering method tries to divide {e,f,g} into {e,f} and {g}. This
approach is also called DIANA (DIvisive ANAlysis). However, in either agglomerative or divisive
hierarchical clustering, the user can specify the desired number of clusters as a termination
condition. That is, it is possible to terminate at any step to obtain clustering results. If the user
requests three clusters, the agglomerative method will terminate at Step 4 while the divisive
method will terminate at Step 2.
Figure 4-6: Dendrogram: agglomerative (bottom-up) vs. divisive (top-down) clustering
Distance measurement among clusters
As stated in Section 4.1.1, there are several possible distance definitions to express the distance between two single objects. However, in hierarchical clustering, an additional requirement is to define the distance between two clusters, each of which may include more than one object. Four widely-used measures are single linkage, complete linkage, centroid comparison and element comparison. Figure 4-7 shows the graphical representation of these four methods. The formulation of each measure can be defined as follows.
No. Cluster Distance Distance definition
1. Single linkage (minimum distance): $d_{min}(C_i, C_j) = \min_{p \in C_i,\, q \in C_j} d(p, q)$
2. Complete linkage (maximum distance): $d_{max}(C_i, C_j) = \max_{p \in C_i,\, q \in C_j} d(p, q)$
3. Centroid comparison (mean distance): $d_{mean}(C_i, C_j) = d(m_i, m_j)$, where $m_i$ is the centroid of $C_i$ and $m_j$ is the centroid of $C_j$.
4. Element comparison (average distance): $d_{avg}(C_i, C_j) = \dfrac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$
Figure 4-7: Graphical representation of four definitions of cluster distances
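The four cluster-distance definitions above can be written compactly in Python. The sketch below (our own naming, assuming Euclidean distance between individual objects) computes the single-linkage, complete-linkage, centroid, and average distances between two clusters of points.

import math
from itertools import product

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(ci, cj):
    # minimum distance between any pair of objects from the two clusters
    return min(dist(p, q) for p, q in product(ci, cj))

def complete_linkage(ci, cj):
    # maximum distance between any pair of objects from the two clusters
    return max(dist(p, q) for p, q in product(ci, cj))

def centroid_distance(ci, cj):
    # distance between the two cluster centroids
    mi = [sum(col) / len(ci) for col in zip(*ci)]
    mj = [sum(col) / len(cj) for col in zip(*cj)]
    return dist(mi, mj)

def average_distance(ci, cj):
    # average over all cross-cluster object pairs
    return sum(dist(p, q) for p, q in product(ci, cj)) / (len(ci) * len(cj))

c1 = [[1, 1], [2, 1], [1, 2]]
c2 = [[6, 5], [7, 6]]
for f in (single_linkage, complete_linkage, centroid_distance, average_distance):
    print(f.__name__, round(f(c1, c2), 3))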
Firstly, for the single linkage, if we use the minimum distance, $d_{min}(C_i, C_j)$, to measure the distance between clusters, it is sometimes called a nearest-neighbor clustering algorithm. In this
method, the clustering process is terminated when the distance between nearest clusters
exceeds an arbitrary threshold. It is possible to view the data points as nodes of a graph where
edges form a path between the nodes in a cluster. When two clusters, and , are merged, an
edge is added between the nearest pair of nodes in and . This merging process will result in a
tree-like graph. An agglomerative hierarchical clustering algorithm that uses the minimum
distance measure is also known as a minimal spanning tree algorithm.
Secondly, for the complete linkage, an algorithm uses the maximum distance, $d_{max}(C_i, C_j)$, to
measure the distance between clusters. Also called a farthest-neighbor clustering algorithm, the
clustering process is terminated when the maximum distance between the nearest clusters
exceeds a predefined threshold. In this method, each cluster can be viewed as a complete
subgraph where there exist edges connecting all of the nodes in the clusters. The distance
between two clusters is determined by the most distant nodes in the two clusters. These farthest-neighbor algorithms aim to keep the increase in the diameter of the clusters at each iteration as small as possible. They perform well when the true clusters are compact and approximately equal in
size. Otherwise, the clusters produced can be meaningless. The nearest-neighbor clustering and
the farthest-neighbor clustering express two extreme cases for defining the distance between
clusters. They are quite sensitive to outliers or noisy data.
Rather than these two methods, sometimes it is better to use mean or average distance
instead, in order to compromise between the minimum and maximum distances and to overcome
the outlier and noise problem. The mean distance comes from the calculation of the centroid of
each cluster and then the measurement of the distance between a pair of centroids. Also known
as centroid comparison, this method is computationally simple and cheap. Every time two
clusters are merged (for agglomerative) or a cluster is split (for divisive), a new centroid will be
calculated for the newly merged cluster or two centroids will be calculated for the newly split
two clusters. However, agglomerative clustering is slightly simpler than divisive clustering since it
is possible to use a so-called weighted combination technique to calculate the centroid of the
newly merged cluster but it cannot be applied for the divisive approach. For both methods, it is
necessary to calculate the distance between the new centroid(s) with the other centroids.
Compared to the centroid comparison, the element comparison calculates the distance between two clusters by finding the average distance among all pairs of elements across the two clusters. This step is much more computationally expensive than the centroid comparison. Moreover, the average distance is advantageous in that it can handle categorical as well as numeric data. The mean vector for categorical data can be difficult or impossible to define, but the average distance can still be computed.
Problems in the hierarchical approach
While hierarchical clustering is simple, it has a drawback in how to select points to merge or split.
Each merging or splitting step is important since the next step will proceed based on the newly
generated clusters and it will never reconsider the result of the previous steps or swap objects
between clusters. If the previous merge or split decisions are not well determined, low-quality
clusters may be generated. To solve this problem, it is possible to perform multiple-phase
clustering by incorporating other clustering techniques into hierarchical clustering. Three
common methods are BIRCH, ROCK and Chameleon. BIRCH partitions objects hierarchically using tree structures whose leaf nodes (or low-level nonleaf nodes) are treated as microclusters, depending on the scale of resolution. After that, it applies other clustering algorithms to perform macroclustering on the microclusters. ROCK merges clusters based on their interconnectivity after hierarchical clustering. Chameleon explores dynamic modeling in hierarchical clustering.
4.1.5. Density-based clustering
Unlike partition-based or hierarchical-based clustering which tends to discover clusters with a
spherical shape, density-based clustering methods have been designed to discover clusters with
arbitrary shape. In this approach, dense regions of objects in the data space will be separated by
regions of low density. As density-based methods, DBSCAN grows clusters according to a density-based connectivity analysis. OPTICS extends DBSCAN to produce a cluster ordering obtained
from a wide range of parameter settings. DENCLUE clusters objects based on a set of density
distribution functions. In this book, we describe DBSCAN.
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
DBSCAN is a density-based clustering algorithm that groups regions containing objects with sufficiently high density into clusters. By this nature, it discovers clusters of arbitrary shape in spatial databases with noise. In this method, a cluster is defined as a maximal set of density-connected points. The following are definitions of the concepts used in the method.
ε-neighborhood: The set of objects within a radius ε of a given object is called the ε-neighborhood of the object.
Core object: When an object has an ε-neighborhood containing at least a minimum number of objects, MinObjs, it is called a core object.
Directly density-reachable: Given a set of objects D, an object p is directly density-reachable from the object q if p is within the ε-neighborhood of q, and q is a core object.
Density-reachable: An object p is density-reachable from object q with respect to ε and MinObjs in a set of objects, D, if there is a chain of objects p_1, ..., p_m, where p_1 = q and p_m = p, such that p_{i+1} is directly density-reachable from p_i with respect to ε and MinObjs, for 1 ≤ i ≤ m-1.
Density-connected: An object p is density-connected to object q with respect to ε and MinObjs in a set of objects, D, if there is an object o in D such that both p and q are density-reachable from o with respect to ε and MinObjs.
A density-based cluster: A density-based cluster is a set of density-connected objects that is maximal with respect to density-reachability. Every object not contained in any cluster is considered to be noise.
Note that density reachability is the transitive closure of direct density reachability, and this
relationship is asymmetric. Only core objects are mutually density reachable. Density
connectivity, however, is a symmetric relation.
Figure 4-8: An example of density-based clustering
For example, given twenty objects (a - t) as shown in Figure 4-8, the neighbor elements of each object are shown in the left list in the figure. They are derived based on the chosen radius ε. Moreover, the core objects (marked with '*') are a, b, d, f, g, i, j, k, m, o, p, q and s, since each of them includes at least three neighbors (MinObjs = 3).
The directly density-reachable objects of each core object (q) can be listed as follows.
Object (q)  Directly density-reachable objects      Object (q)  Directly density-reachable objects
a* b, c, d k* i, j, m, o, p
b* a, d, f m* k, o, p
d* a, b, c, f, h o* k, m, p, q, s
f* b, d, h p* k, m, o, q
g* e, i, j, k q* o, p, s, t
i* g, j, k s* o, q, t
j* e, g, i, k, m
The density reachable objects of each core object (q) can be listed as follows.
Object (q) Density reachable objects Object (q) Density reachable objects
a* b, c, d, f, h k* e, g, i, j, m, o, p , q, s, t
b* a, c, d, f, h m* e, g, i, j, k, o, p, q, s, t
d* a, b, c, f, h o* e, g, i, j, k, m, p, q, s, t
f* a, b, c, d, h p* e, g, i, j, k, m, o, q, s, t
g* e, i, j, k, m, o, p, q, s, t q* e, g, i, j, k, m, o, p, s, t
i* e, g, j, k, m, o, p, q, s, t s* e, g, i, j, k, m, o, p, q, t
j* e, g, i, k, m, o, p, q, s, t
The clustering result can be defined by the density-connected property. In this example, the result is the two clusters listed below, and the objects l, n and r become noise.

Cluster  Cluster members
1        a, b, c, d, f, h
2        g, e, i, j, k, m, o, p, q, s, t
If a spatial index is used, the computational complexity of DBSCAN is O(n log n), where n is the number of database objects. Without any index, it is O(n^2). With appropriate settings of the user-defined parameters ε and MinObjs, the algorithm is effective at finding
arbitrary-shaped clusters.
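As a minimal sketch of the DBSCAN idea described above (not an optimized implementation; function and variable names are ours), the following Python code finds core objects from their ε-neighborhoods and grows density-connected clusters from them, labeling the remaining objects as noise.

import math

def dbscan(points, eps, min_objs):
    n = len(points)
    dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # epsilon-neighborhood of each object (excluding the object itself)
    neigh = [[j for j in range(n) if j != i and dist(points[i], points[j]) <= eps]
             for i in range(n)]
    # core objects: neighborhoods with at least min_objs members, as in the example above
    core = [i for i in range(n) if len(neigh[i]) >= min_objs]
    labels = [None] * n            # None = unvisited; stays None for noise
    cluster_id = 0
    for c in core:
        if labels[c] is not None:
            continue
        cluster_id += 1
        labels[c] = cluster_id
        frontier = [c]
        while frontier:            # grow the cluster through density-reachability
            q = frontier.pop()
            if len(neigh[q]) >= min_objs:      # only core objects spread the cluster
                for p in neigh[q]:
                    if labels[p] is None:
                        labels[p] = cluster_id
                        frontier.append(p)
    return labels

points = [[1, 1], [1, 2], [2, 1], [2, 2], [8, 8], [8, 9], [9, 8], [9, 9], [5, 15]]
print(dbscan(points, eps=1.5, min_objs=2))   # two clusters plus one noise object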
4.1.6. Grid-based clustering
In contrast with partition-based, hierarchical-based and density-based methods, the grid-based
clustering approach uses a multiresolution grid data structure to quantize the object space into a
finite number of cells that form a grid structure on which all of the operations for clustering are
performed. This approach aims to improve the processing time. Its time complexity is typically
independent of the number of data objects; instead, it depends on the number of cells in each
dimension in the quantized space. Some typical grid-based methods are STING, WaveCluster and
CLIQUE. Here, STING explores statistical information stored in the grid cells, WaveCluster
clusters objects using a wavelet transformation method, and CLIQUE represents a grid- and
density-based approach for clustering in high-dimensional data space.
STING: STatistical INformation Grid
STING is a grid-based multiresolution clustering technique in which the spatial area is divided
into rectangular cells. There are usually several levels of such rectangular cells corresponding to
different levels of resolution, and these cells form a hierarchical structure: each cell at a high level
is partitioned to form a number of cells at the next lower level. Statistical information regarding
the attributes in each grid cell (such as the mean, maximum, and minimum values) is
precomputed and stored. These statistical parameters are useful for query processing, as
described below.
Figure 4-9: Grid structure in grid-based clustering
Figure 4-9 shows a hierarchical structure for STING clustering. Statistical parameters and
characteristics of higher-level cells can easily be computed from those of the lower-level cells.
These parameters can be the attribute-independent parameters such as count; the attribute-
dependent parameters such as mean, stdev (standard deviation), min (minimum), max
(maximum); and the distribution type of the attribute value in the cell such as normal, uniform,
exponential, or none (if the distribution is unknown). When the data are loaded into the database,
the parameters (e.g., count, mean, stdev, min, and max) of the bottom-level cells can be calculated
directly from the data.
Since STING uses a multiresolution approach to cluster analysis, the quality of STING
clustering depends on the granularity of the lowest level of the grid structure. When the
granularity is too fine, the cost of processing will increase substantially. On the other hand, when
the bottom level of the grid structure is too coarse, it may reduce the quality of cluster analysis.
Since STING does not take into consideration the spatial relationship between the children and their neighboring cells when constructing a parent cell, the shapes of the resultant clusters are isothetic; that is, all of the cluster boundaries are either horizontal or vertical, and no diagonal boundary is detected. Owing to this characteristic, the quality and accuracy of the clusters may be lower, as a trade-off for the fast processing time.
4.1.7. Model-based clustering
Instead of using a simple similarity definition, a model-based method predefines a suitable model
for each of the clusters and then finds the best fit of the data to the given model. Some well-known
model-based methods are EM, COBWEB and SOM.
EM Algorithm: Expectation-Maximization method
The EM (Expectation-Maximization) algorithm (Algorithm 4.2) is a popular iterative refinement algorithm used in several applications, such as speech recognition and image processing. It was developed to estimate suitable values of unknown parameters by maximizing the expected likelihood.
Algorithm 4.2. The EM Algorithm
1. Initialization step: To obtain the seed for probability calculation, we start by making an initial guess of the parameter vector. Although there are several possible choices for this, one simple approach is to randomly partition the objects into k groups and then, for each group (cluster), calculate its mean (its center), similar to k-means partitioning.
2. Repetition step: To improve the initial cluster parameters, we iteratively refine the parameters (or clusters) by alternating between an expectation step and a maximization step.
(a) Expectation Step
The probability that an object $x_i$ belongs to a cluster $C_k$ is defined as follows:

$P(C_k \mid x_i) = \dfrac{P(C_k)\,P(x_i \mid C_k)}{P(x_i)}$

Here, $P(x_i \mid C_k)$ is the probability that the object $x_i$ occurs in the cluster $C_k$. We can define it to follow the normal (Gaussian) distribution with the mean $\mu_k$ and the standard deviation $\sigma_k$ of the cluster:

$P(x_i \mid C_k) = \dfrac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left(-\dfrac{(x_i - \mu_k)^2}{2\sigma_k^2}\right)$

$P(C_k)$ is the prior probability that the cluster $C_k$ will occur. Without bias, we can set all clusters to have the same prior probability. The denominator $P(x_i)$ does not depend on any cluster, so it can be ignored in the comparison. Finally, it is possible to use only $P(x_i \mid C_k)$ as the (unnormalized) probability that the object $x_i$ belongs to the cluster $C_k$.
(b) Maximization Step
Using the above probability estimates, we can re-estimate or refine the model parameters, for example the cluster means, as follows:

$\mu_k = \dfrac{\sum_{i=1}^{N} P(C_k \mid x_i)\, x_i}{\sum_{i=1}^{N} P(C_k \mid x_i)}$

As its name suggests, this step maximizes the likelihood of the distributions given the data.
Although there are several variants of the EM method, most of them can be viewed as an extension of the k-means algorithm, which assigns an object to the cluster that is the closest to it, based on the cluster mean or the cluster representative. Instead of assigning each object to a single dedicated cluster, the EM method assigns each object to a cluster according to a weight representing the probability of membership. That is, no rigid boundaries between clusters are defined. Afterwards, new means are computed based on the weighted measures.
Since we do not know which objects should be grouped into the same group beforehand, the
EM begins with an initial estimate (or guess) for each parameter in the mixture model
(collectively referred to as the parameter vector). The parameters can be set by randomly
grouping objects into k clusters, or just selecting k objects in the set of objects to be the means of
the clusters. After this initial setting, the EM algorithm iteratively rescores the objects against the mixture density produced by the parameter vector. The rescored objects are then used to update the parameter estimates. During the calculation, each object is assigned a probability of how likely it is to belong to a given cluster, as described in Algorithm 4.2 above.
While the EM algorithm is simple, easy to implement, and converges fast, sometimes it falls into a local optimum. Convergence is guaranteed for certain forms of optimization functions, and the computational complexity is $O(d \cdot N \cdot t)$, where d is the number of input features, N is the number of objects, and t is the number of iterations.
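To illustrate the expectation and maximization steps of Algorithm 4.2, here is a small one-dimensional Gaussian-mixture EM sketch in plain Python (equal priors, our own naming), kept deliberately simple rather than numerically robust.

import math
import random

def gaussian(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em_1d(data, k, iterations=50, seed=0):
    random.seed(seed)
    # Initialization: guess the means from random objects, use a common spread.
    mus = random.sample(data, k)
    sigmas = [1.0] * k
    for _ in range(iterations):
        # Expectation step: membership weight of each object in each cluster, P(C_j | x_i).
        weights = []
        for x in data:
            p = [gaussian(x, mus[j], sigmas[j]) for j in range(k)]   # equal priors cancel out
            total = sum(p) or 1e-12
            weights.append([pj / total for pj in p])
        # Maximization step: re-estimate means and standard deviations from the weights.
        for j in range(k):
            wsum = sum(w[j] for w in weights) or 1e-12
            mus[j] = sum(w[j] * x for w, x in zip(weights, data)) / wsum
            var = sum(w[j] * (x - mus[j]) ** 2 for w, x in zip(weights, data)) / wsum
            sigmas[j] = math.sqrt(var) or 1e-6
    return mus, sigmas

data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.8, 5.1]
print(em_1d(data, k=2))   # recovers two components around 1.0 and 5.0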
A related family of methods, sometimes known as Bayesian clustering, focuses on the computation of class-conditional probability densities. They are commonly used in the statistics community. In industry, AutoClass is a popular Bayesian clustering method that uses a variant of the EM algorithm. The best clustering maximizes the ability to predict the attributes of an object given the correct cluster of the object. AutoClass can also estimate the number of clusters. It has been applied to several domains and was able to discover a new class of stars based on infrared astronomy data.
Conceptual Clustering
Unlike conventional clustering which does not focus on detailed description of a cluster,
conceptual clustering forms a conceptual tree by also considering characteristic descriptions for
each group, where each group corresponds to a node (concept or class) in the tree. In other
words, conceptual clustering has two steps: clustering and characterization. Clustering quality is not solely a function of the individual objects but also of the generality and simplicity of the derived concept descriptions. Most conceptual clustering methods use probability measurements to
determine the concepts or clusters. As an example of this type, COBWEB is a popular and simple
method of incremental conceptual clustering. Its input objects are expressed by categorical
attribute-value pairs. COBWEB creates a hierarchical clustering in the form of a classification tree.
Figure 4-10 shows an example of a classification tree for a set of animal data. A classification tree differs from a decision tree in the sense that an intermediate node in a classification tree specifies a concept, whereas one in a decision tree indicates an attribute test. In a classification tree, each node refers to a concept and contains a probabilistic description of that concept, which summarizes the objects classified under the node. In summary, COBWEB works as follows. Given a set of objects $T = \{x_1, x_2, \ldots, x_N\}$, each object $x_i$ is represented by an n-dimensional attribute vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, depicting the measured values of the n attributes $A_1, A_2, \ldots, A_n$ of the object. The Bayesian (statistical) classifier assigns (or predicts) a class $C_k$ for the object $x_i$ when that class has the highest posterior probability over the others, conditioned on the object's attribute values. That is, the Bayesian classifier predicts that the object $x_i$ belongs to the class $\arg\max_k P(C_k \mid x_{i1}, x_{i2}, \ldots, x_{in})$.
Figure 4-10: A classification tree. This figure is based on (Fisher, 1987)
The probabilistic description includes the probability of the concept and conditional probabilities of the form $P(A_i = v_{ij} \mid C_k)$, where $A_i = v_{ij}$ is an attribute-value pair (that is, the i-th attribute takes its j-th possible value) and $C_k$ is the concept class. Normally, the counts are accumulated and stored at each node for probability calculation. The sibling nodes at a given level of a classification tree form a partition. To classify an object using a
classification tree, a partial matching function is employed to descend the tree along a path of
best matching nodes. COBWEB uses a heuristic evaluation measure called category utility to
guide the construction of the tree. Category utility (CU) is defined as follows:

$CU = \dfrac{1}{n} \sum_{k=1}^{n} P(C_k) \left[ \sum_{i} \sum_{j} P(A_i = v_{ij} \mid C_k)^2 - \sum_{i} \sum_{j} P(A_i = v_{ij})^2 \right]$

where n is the number of nodes (also called concepts, or categories) forming a partition, $\{C_1, C_2, \ldots, C_n\}$, at the given level of the tree. In other words, category utility measures the increase in the expected number of attribute values that can be correctly guessed given a partition, $\sum_k P(C_k) \sum_i \sum_j P(A_i = v_{ij} \mid C_k)^2$, over the expected number of correct guesses with no such knowledge, $\sum_i \sum_j P(A_i = v_{ij})^2$.
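Category utility can be evaluated directly from attribute-value counts. The sketch below (hypothetical animal data and our own function names, not the data of Figure 4-10) computes CU for a candidate partition of categorical objects.

from collections import Counter

def category_utility(partition):
    # partition: list of clusters; each cluster is a list of objects,
    # each object a tuple of categorical attribute values.
    objects = [obj for cluster in partition for obj in cluster]
    total = len(objects)
    n_attrs = len(objects[0])
    # Expected number of correct guesses with no cluster knowledge: sum_i sum_j P(A_i = v_ij)^2
    base = sum((cnt / total) ** 2
               for i in range(n_attrs)
               for cnt in Counter(o[i] for o in objects).values())
    cu = 0.0
    for cluster in partition:
        p_c = len(cluster) / total
        # sum_i sum_j P(A_i = v_ij | C_k)^2 within this cluster
        within = sum((cnt / len(cluster)) ** 2
                     for i in range(n_attrs)
                     for cnt in Counter(o[i] for o in cluster).values())
        cu += p_c * (within - base)
    return cu / len(partition)

# Objects: (body cover, can fly); a two-cluster candidate partition.
cluster_a = [("feathers", "yes"), ("feathers", "yes"), ("feathers", "no")]
cluster_b = [("fur", "no"), ("fur", "no")]
print(category_utility([cluster_a, cluster_b]))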
As an incremental approach, when a new object comes to COBWEB, it descends the tree
along an appropriate path, updates counts along the way and tries to search for the best node to
place the object in. This decision is done by selecting the situation that has the highest category
utility of the resulting partition. Indeed, COBWEB also computes the category utility of the
partition that would result if a new node were to be created for the object. Therefore, the object
is placed in an existing class, or a new class is created for it, based on the partition with the
highest category utility value. COBWEB has the ability to automatically adjust the number of
classes in a partition. That is, unlike k-means or hierarchical clustering cut at a fixed level, there is no need to provide the number of clusters in advance.
However, the COBWEB operators are highly sensitive to the input order of the objects. To
solve this problem, COBWEB has two additional operators, called merging and splitting. When an
object is incorporated, the two best hosts are considered for merging into a single class.
Moreover, COBWEB considers splitting the children of the best host among the existing
categories, based on category utility. The merging and splitting operators implement a
bidirectional search. That is, a merge can undo a previous split, and a split can undo a previous merge.
However, COBWEB still has a number of limitations. Firstly, it assumes that the probability distributions of separate attributes are statistically independent of one another, which is not always
true. Secondly, it is expensive to store the probability distribution representation of clusters,
especially when the attributes have a large number of values. The time and space complexities
depend not only on the number of attributes, but also on the number of values for each attribute.
Moreover, the classification tree is not height-balanced for skewed input data, which may cause
the time and space complexity to degrade dramatically.
As an extension to COBWEB, CLASSIT deals with continuous (or real-valued) data. It stores a
continuous normal distribution (i.e., mean and standard deviation) for each individual attribute
in each node and applies a generalized category utility measure by an integral over continuous
attributes instead of a sum over discrete attributes as in COBWEB. While conceptual clustering is
popular in the machine learning community, both COBWEB and CLASSIT suffer when clustering large databases.
4.2. Association Analysis and Frequent Pattern Mining
Another form of knowledge that we can mine from data is frequent patterns or associations
which include frequent itemsets, subsequences, substructures and association rules. For example,
in a supermarket database, a set of items such as bread and butter is likely to appear frequently together. This set is called a frequent itemset. Moreover, in an electronics shop database, there may be a frequent pattern that a PC, then a digital camera, and then a memory card are bought in that order. This is called a frequent subsequence. A substructure is a more complex
pattern. It can refer to different structural forms, such as subgraphs, subtrees, or sublattices,
which may be combined with itemsets or subsequences. If a substructure is found frequently,
that substructure is called a frequent structured pattern. Such frequent patterns are important in
mining associations, correlations, and many other interesting relationships among data.
Moreover, again in a supermarket database, if one buys ice, he (or she) is likely to buy water at
the supermarket. This is called an association rule.
After performing frequent itemset mining, we can use the result to discover associations and
correlations among items in large transactional or relational data sets. The discovery of
interesting correlation relationships among huge amounts of business transaction records can
help in many business decision-making processes, such as catalog design, cross-marketing, and
customer shopping behavior analysis. A typical example of frequent itemset mining and
association rule mining is market basket analysis. Its formulation can be summarized as follows.
Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of possible items, $T = \{t_1, t_2, \ldots, t_n\}$ be a set of database transactions where each transaction $t_j$ includes a number of items, and an itemset $A \subseteq I$ be a set of items. A transaction $t$ is said to contain the itemset A if and only if $A \subseteq t$. Let $T_A$ be the set of the transactions that contain A. The support of an itemset A can be defined as the number of transactions that include all items in A, divided by the total number of transactions. It corresponds to the probability that A will occur in a transaction (i.e., $P(A)$), as follows:

$support(A) = \dfrac{|T_A|}{|T|} = P(A)$

An itemset A is called a frequent itemset if and only if the support of A is greater than or equal to a threshold called minimum support (minsup), i.e. $support(A) \ge minsup$. Note that an itemset represents a set of items. When the itemset contains k items, it is called a k-itemset. For example, in a supermarket database, the set {bread, butter} is a 2-itemset. The occurrence frequency of an itemset is the number of transactions that contain the itemset, and the support is its ratio compared to the total number of transactions.
An association rule is an implication of the form $A \Rightarrow B$, where $A \subset I$, $B \subset I$, and $A \cap B = \emptyset$ (i.e., A and B are two non-overlapping itemsets). The support of the rule $A \Rightarrow B$ is denoted by $support(A \Rightarrow B)$. It corresponds to $support(A \cup B)$, the support of the union of the sets A and B. This is taken to be the probability $P(A \cup B)$. Moreover, as another measure, the confidence of the rule $A \Rightarrow B$ is denoted by $confidence(A \Rightarrow B)$, specifying the percentage of transactions in T containing A that also contain B. It corresponds to the conditional probability $P(B \mid A)$. The formal descriptions are as follows:

$support(A \Rightarrow B) = P(A \cup B) = \dfrac{|T_{A \cup B}|}{|T|}$

$confidence(A \Rightarrow B) = P(B \mid A) = \dfrac{support(A \cup B)}{support(A)}$

An association rule is called a frequent (strong) rule if and only if $A \cup B$ is a frequent itemset and its confidence is greater than or equal to a threshold called minimum confidence (minconf), i.e. $confidence(A \Rightarrow B) \ge minconf$.

Besides support and confidence, another important measure is lift. Theoretically, if the value of lift is lower than one, i.e., $lift(A \Rightarrow B) < 1$, the probability that the conclusion (B) will occur under the condition (A), i.e., $P(B \mid A)$, is lower than the probability of the conclusion without the precondition, i.e., $P(B)$. This makes the rule meaningless. Therefore, we always expect an association rule with a lift larger than or equal to a threshold called minimum lift (minlift). The formal description of lift is shown below:

$lift(A \Rightarrow B) = \dfrac{P(B \mid A)}{P(B)} = \dfrac{confidence(A \Rightarrow B)}{support(B)}$
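Using these definitions, the following Python sketch (our own function names) computes the support, confidence, and lift of a candidate rule over a small transaction database like the one in Figure 4-11 (a).

def support(itemset, transactions):
    # fraction of transactions that contain every item of the itemset
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # support(A union B) / support(A)
    return support(set(antecedent) | set(consequent), transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    # confidence(A => B) / support(B)
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

transactions = [
    {"coke", "ice", "paper", "shoes", "water"},
    {"ice", "orange", "shirt", "water"},
    {"paper", "shirt", "water"},
    {"coke", "orange", "paper", "shirt", "water"},
    {"ice", "orange", "shirt", "shoes", "water"},
    {"paper", "shirt", "water"},
]
print(support({"ice", "water"}, transactions))        # 3/6 = 0.5
print(confidence({"ice"}, {"water"}, transactions))   # 3/3 = 1.0
print(lift({"ice"}, {"water"}, transactions))         # 1.0 / (6/6) = 1.0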
In general, as a process to find frequent association rules, association rule mining can be
viewed as a two-step process.
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
The first step finds, from the transactional database, a set of frequent itemsets which occur at least as frequently as a predetermined minimum support (i.e., minsup), while the second step generates strong association rules from the frequent itemsets obtained in the first step. The strong association rules are supposed to have support greater than the minimum support (minsup), confidence greater than the minimum confidence (minconf), and lift greater than the minimum lift (minlift).
In principle, the second step is much less costly than the first, so the overall performance of mining association rules is normally dominated by the first step. One research issue in mining frequent itemsets from a large data set is that a huge number of itemsets may satisfy a low minimum support. Furthermore, given a frequent itemset, each of its subsets is frequent as well. A long itemset will contain a combinatorial number of shorter, frequent sub-itemsets. For example, a frequent itemset with a length of 30 includes up to $2^{30} - 1 \approx 10^9$ frequent sub-itemsets, as follows:

$\binom{30}{1} + \binom{30}{2} + \cdots + \binom{30}{30} = 2^{30} - 1$

The first term, $\binom{30}{1}$, comes from the frequent 1-itemsets, the second term, $\binom{30}{2}$, from the frequent 2-itemsets, and so on. To solve the problem of such an extremely large
number of itemsets, the concepts of closed frequent itemset and maximal frequent itemset are
introduced. Here, we describe frequent itemsets, association rules as well as closed frequent
itemsets and maximal frequent itemsets, using the example in Figure 4-11.
Transaction ID   Items
1                coke, ice, paper, shoes, water
2                ice, orange, shirt, water
3                paper, shirt, water
4                coke, orange, paper, shirt, water
5                ice, orange, shirt, shoes, water
6                paper, shirt, water

(a) A toy example of a transactional database for retailing
Itemset                  Trans        Freq.
coke                     1,4          2
ice                      1,2,5        3
orange                   2,4,5        3
paper                    1,3,4,6      4
shirt                    2,3,4,5,6    5
shoes                    1,5          2
water                    1,2,3,4,5,6  6
ice, orange              2,5          2
ice, paper               1            1
ice, shirt               2,5          2
ice, water               1,2,5        3
orange, paper            4            1
orange, shirt            2,4,5        3
orange, water            2,4,5        3
paper, shirt             3,4,6        3
paper, water             1,3,4,6      4
shirt, water             2,3,4,5,6    5
orange, shirt, water     2,4,5        3
paper, shirt, water      3,4,6        3

(b) Candidate itemsets with their transaction lists and frequencies; those with frequency of at least 3 are frequent (minimum support = 3, i.e., 50%)
No.  Left Items      Right Items     Support    Confidence   Lift
1    ice             water           3/6=0.50   3/3=1.00     (3/3)/(6/6)=1.0
2    water           ice             3/6=0.50   3/6=0.50     (3/6)/(3/6)=1.0
3    orange          shirt           3/6=0.50   3/3=1.00     (3/3)/(5/6)=1.2
4    shirt           orange          3/6=0.50   3/5=0.60     (3/5)/(3/6)=1.2
5    orange          water           3/6=0.50   3/3=1.00     (3/3)/(6/6)=1.0
6    water           orange          3/6=0.50   3/6=0.50     (3/6)/(3/6)=1.0
7    paper           shirt           3/6=0.50   3/4=0.75     (3/4)/(5/6)=0.9
8    shirt           paper           3/6=0.50   3/5=0.60     (3/5)/(4/6)=0.9
9    paper           water           4/6=0.67   4/4=1.00     (4/4)/(6/6)=1.0
10   water           paper           4/6=0.67   4/6=0.67     (4/6)/(4/6)=1.0
11   shirt           water           5/6=0.83   5/5=1.00     (5/5)/(6/6)=1.0
12   water           shirt           5/6=0.83   5/6=0.83     (5/6)/(5/6)=1.0
13   orange, shirt   water           3/6=0.50   3/3=1.00     (3/3)/(6/6)=1.0
14   orange, water   shirt           3/6=0.50   3/3=1.00     (3/3)/(5/6)=1.2
15   shirt, water    orange          3/6=0.50   3/5=0.60     (3/5)/(3/6)=1.2
16   water           orange, shirt   3/6=0.50   3/6=0.50     (3/6)/(3/6)=1.0
17   shirt           orange, water   3/6=0.50   3/5=0.60     (3/5)/(3/6)=1.2
18   orange          shirt, water    3/6=0.50   3/3=1.00     (3/3)/(5/6)=1.2
19   paper, shirt    water           3/6=0.50   3/3=1.00     (3/3)/(6/6)=1.0
20   paper, water    shirt           3/6=0.50   3/4=0.75     (3/4)/(5/6)=0.9
21   shirt, water    paper           3/6=0.50   3/5=0.60     (3/5)/(4/6)=0.9
22   water           paper, shirt    3/6=0.50   3/6=0.50     (3/6)/(3/6)=1.0
23   shirt           paper, water    3/6=0.50   3/5=0.60     (3/5)/(4/6)=0.9
24   paper           shirt, water    3/6=0.50   3/4=0.75     (3/4)/(5/6)=0.9

(c) Candidate association rules derived from the frequent itemsets (minimum confidence = 66.67%)
Figure 4-11: Frequent itemset and frequent rules (association rules)
Given the transaction database in Figure 4-11 (a), we can find a set of frequent itemsets shown in
Figure 4-11 (b) when the minimum support is set to 0.5. The set of frequent rules (association
rules) is found as displayed in Figure 4-11 (c) when the minimum confidence is set to 0.66.
Moreover, a valid rule also needs to have a lift of at least 1.0. In this example, the thirteen frequent
itemsets are summarized in Figure 4-12.
No. Frequent Itemset Transaction Set Support
1. ice 125 3/6
2. orange 245 3/6
3. paper 1346 4/6
4. shirt 23456 5/6
5. water 123456 6/6
6. ice, water 125 3/6
7. orange, shirt 245 3/6
8. orange, water 245 3/6
9. paper, shirt 346 3/6
10. paper, water 1346 4/6
11. shirt, water 23456 5/6
12. orange, shirt, water 245 3/6
13. paper, shirt, water 346 3/6
Figure 4-12: Summary of frequent itemsets with their transaction sets and supports.
The concepts of closed frequent itemsets and maximal frequent itemsets are defined as follows.
[Closed Frequent Itemset]
An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has
the same support count as X in S. An itemset X is a closed frequent itemset in set S if X is both
closed and frequent in S.
[Maximal Frequent Itemset]
An itemset X is a maximal frequent itemset (or max-itemset) in set S if X is frequent, and
there exists no super-itemset Y such that X is a proper subset of Y and Y is frequent in S.
Normally it is possible to recover the whole set of frequent itemsets, together with their supports,
from the set of closed frequent itemsets, but it is not possible to do so from the set of maximal
frequent itemsets. The set of closed frequent itemsets contains complete information regarding its
corresponding frequent itemsets, whereas the set of maximal frequent itemsets records only which
itemsets are frequent, not the exact supports of their subsets.
Using the above example, the closed frequent itemsets are shown in Figure 4-13. Here, the
frequent itemsets which are not closed are indicated by strikethrough. There are six closed
frequent itemsets. The column 'Transaction Set' indicates the set of transactions that include
that frequent itemset. For example, the frequent itemset {orange, water} occurs in the 2nd, 4th and 5th
transactions in Figure 4-11 (a). Moreover, the frequent itemset {orange} is not closed since it
has the same transaction set as its superset, the frequent itemset {orange, shirt, water}.
Note that the smallest superset of {orange} that is closed is the frequent itemset {orange, shirt,
water}. Therefore, the closed frequent itemset (closure) of {orange} is {orange, shirt, water}.
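The two definitions above can be checked mechanically. Below is a minimal Python sketch (illustrative only; the function name is ours, not the book's) that, given all frequent itemsets with their support counts, marks each one as closed and/or maximal.

def closed_and_maximal(freq):
    """freq: dict mapping frozenset(itemset) -> support count."""
    closed, maximal = set(), set()
    for X, sup_x in freq.items():
        supersets = [Y for Y in freq if X < Y]          # frequent proper supersets of X
        if all(freq[Y] != sup_x for Y in supersets):    # closed: no superset with same support
            closed.add(X)
        if not supersets:                               # maximal: no frequent superset at all
            maximal.add(X)
    return closed, maximal

# Using the thirteen frequent itemsets of Figure 4-12 (support counts out of 6):
freq = {
    frozenset(["ice"]): 3, frozenset(["orange"]): 3, frozenset(["paper"]): 4,
    frozenset(["shirt"]): 5, frozenset(["water"]): 6,
    frozenset(["ice", "water"]): 3, frozenset(["orange", "shirt"]): 3,
    frozenset(["orange", "water"]): 3, frozenset(["paper", "shirt"]): 3,
    frozenset(["paper", "water"]): 4, frozenset(["shirt", "water"]): 5,
    frozenset(["orange", "shirt", "water"]): 3,
    frozenset(["paper", "shirt", "water"]): 3,
}
closed, maximal = closed_and_maximal(freq)
# closed has six members, as in Figure 4-13; maximal contains only
# {ice, water}, {orange, shirt, water} and {paper, shirt, water}.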
No. No. (closed) Frequent Itemset Transaction Set Support
1. ice 125 3/6
2. orange 245 3/6
3. paper 1346 4/6
4. shirt 23456 5/6
5. 1 water 123456 6/6
6. 2 (ice, water), (ice) 125 3/6
7. orange, shirt 245 3/6
8. orange, water 245 3/6
9. paper, shirt 346 3/6
10. 3 (paper, water), (paper) 1346 4/6
11. 4 (shirt, water), (shirt) 23456 5/6
12. 5 (orange, shirt, water), (orange, water), (orange, shirt), (orange) 245 3/6
13. 6 (paper, shirt, water), (paper, shirt) 346 3/6
Figure 4-13: Six closed itemsets with their transaction sets and supports.
In general, the set of closed frequent itemsets contains complete information regarding the
frequent itemsets. For example, the transaction sets and supports of the frequent itemsets {orange},
{orange, shirt} and {orange, water} are equivalent to those of the closed frequent itemset
{orange, shirt, water}, since no smaller closed frequent itemset includes them. That is,
TransactionSet({orange}) = TransactionSet({orange, shirt}) = TransactionSet({orange, water}) =
TransactionSet({orange, shirt, water}) = {2, 4, 5}, and their common support is 3/6.
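As a small illustration of this property, the following Python sketch (an illustrative example, not the book's code) recovers the support of any frequent itemset from the six closed frequent itemsets alone: the support equals the largest support among the closed itemsets that contain it.

closed = {   # the six closed frequent itemsets of Figure 4-13 (counts out of 6)
    frozenset(["water"]): 6,
    frozenset(["ice", "water"]): 3,
    frozenset(["paper", "water"]): 4,
    frozenset(["shirt", "water"]): 5,
    frozenset(["orange", "shirt", "water"]): 3,
    frozenset(["paper", "shirt", "water"]): 3,
}

def support_from_closed(itemset):
    # Support of a frequent itemset = max support over its closed supersets.
    sups = [s for c, s in closed.items() if frozenset(itemset) <= c]
    return max(sups) if sups else 0    # 0 means the itemset is not frequent

# support_from_closed({"orange"}) == 3, the support of its closure
# {orange, shirt, water}, in agreement with Figure 4-12.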
Procedure CheckNoInfrequentSubset(c, Lk-1)    # c: a candidate k-itemset
                                              # Lk-1: the set of frequent (k-1)-itemsets
foreach (k-1)-subset s of c {
    if s is not in Lk-1 then
        return FALSE
}
return TRUE
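A small Python sketch of the corresponding candidate-generation step (the join step followed by the prune test above) might look as follows. The function names are ours and this is a sketch rather than the book's implementation.

from itertools import combinations

def has_no_infrequent_subset(candidate, prev_frequent):
    """Return True if every (k-1)-subset of the candidate k-itemset is frequent."""
    k = len(candidate)
    return all(frozenset(s) in prev_frequent for s in combinations(candidate, k - 1))

def apriori_gen(prev_frequent, k):
    """prev_frequent: set of frozensets of size k-1; returns candidate k-itemsets."""
    items = sorted({tuple(sorted(x)) for x in prev_frequent})
    candidates = set()
    for a, b in combinations(items, 2):
        if a[:-1] == b[:-1]:                             # join step: same (k-2)-prefix
            c = frozenset(a) | frozenset(b)
            if len(c) == k and has_no_infrequent_subset(c, prev_frequent):
                candidates.add(c)                        # prune step passed
    return candidates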
Transaction ID   Items
1                coke, ice, paper, shoes, water
2                ice, orange, shirt, water
3                paper, shirt, water
4                coke, orange, paper, shirt, water
5                ice, orange, shirt, shoes, water
6                paper, shirt, water
Step 1: Scan the database to count the frequency of the 1-itemsets and generate the set of the candidate 1-itemset, C1.
Output: C1
Itemset    Support
{coke}     2
{ice} 3
{orange} 3
{paper} 4
{shirt} 5
{shoes} 2
{water} 6
Step 2: From the set of the candidate 1-itemset C1, identify the frequent ones and generate the frequent 1-itemset, L1. Here the infrequent ones are omitted.
Step 4: From the set of the candidate 2-itemset C2, identify the frequent ones and generate the frequent 2-itemset, L2. Here the infrequent ones are omitted.
Step 5: From the set of the frequent 2-itemset L2, generate the set of the candidate 3-itemset, C3 and count their frequencies. Here, perform join and prune steps as shown above.
For the join step, {orange,shirt} and {orange,water} can be used to generate {orange,shirt,water}, and {paper,shirt} and {paper,water} can be used to generate {paper,shirt,water}. Moreover, for the prune step, since {shirt,water} is also frequent (it exists in L2), every 2-item subset of {orange,shirt,water} and of {paper,shirt,water} is frequent, so neither candidate is pruned.
Step 6: From the set of the candidate 3-itemset C3, identify the frequent ones and generate the frequent 3-itemset, L3. Here, all 3-itemsets are frequent.
Output:
C3                              L3
Itemset                Support  Itemset                Support
{orange,shirt,water}   3        {orange,shirt,water}   3
{paper,shirt,water}    3        {paper,shirt,water}    3
Step 7: From the set of the frequent 3-itemsets L3, try to generate the set of the candidate 4-itemsets, C4, but no candidate can be generated.
Output:
L3                              C4
Itemset                Support  Itemset    Support
{orange,shirt,water}   3        (none)
{paper,shirt,water}    3
Finally, the set of all frequent itemsets is the union L1, L2 and L3. They can be listed as follows.
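Given these frequent itemsets and their support counts, the second step of the mining process (rule generation) can be sketched in a few lines of Python. The thresholds and the function name below are illustrative, and the freq dictionary is assumed to hold the itemsets of Figure 4-12 with their counts.

from itertools import combinations

def generate_rules(freq, n_transactions, minconf=2/3, minlift=1.0):
    # freq: dict of frozenset -> support count, containing all frequent itemsets
    # (every subset of a frequent itemset is present, by the downward-closure property).
    rules = []
    for itemset, sup_xy in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for left in combinations(itemset, r):
                X = frozenset(left)
                Y = itemset - X
                conf = sup_xy / freq[X]                    # P(Y | X)
                lift = conf / (freq[Y] / n_transactions)   # P(Y | X) / P(Y)
                if conf >= minconf and lift >= minlift:
                    rules.append((X, Y, sup_xy / n_transactions, conf, lift))
    return rules

# With the frequent itemsets of Figure 4-12 (counts out of n_transactions = 6),
# this reproduces rules such as {orange} => {shirt, water} with confidence 1.0
# and lift 1.2, while dropping rules below the thresholds.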
(a) Frequent itemsets when the minimum support is set to 50% (one-level higher)
Rule                          Confidence     Rule                               Confidence
clothing → drink              6/6=1.00       fruit → stationary                 6/7=0.86
drink → clothing              6/10=0.60      stationary → fruit                 6/9=0.67
clothing → stationary         6/6=1.00       clothing, drink → stationary       6/6=1.00
stationary → clothing         6/9=0.67       clothing, stationary → drink       6/6=1.00
drink → fruit                 7/10=0.70      drink, stationary → clothing       6/9=0.67
fruit → drink                 7/7=1.00       drink, fruit → stationary          6/7=0.86
drink → stationary            9/10=0.90      drink, stationary → fruit          6/9=0.67
stationary → drink            9/9=1.00       fruit, stationary → drink          6/6=1.00
(b) Possible association rules when the minimum confidence is set to 80% (one-level higher)
Items       Transactions              Items                   Transactions
apple       1356       (1-1)          coke, orange            2478A      (2-1)
coke        23478A     (1-2)          coke, paper             23478A     (2-2)
orange      124578A    (1-3)          coke, shirt             478        (2-3)
paper       123478A    (1-4)          coke, water             23         (2-4)
ruler       269A       (1-5)          orange, paper           12478A     (2-5)
shirt       146789     (1-6)          orange, shirt           1478       (2-6)
shoes       3679       (1-7)          orange, water           125        (2-7)
water       123569     (1-8)          paper, shirt            1478       (2-8)
                                      paper, water            123        (2-9)
                                      shirt, shoes            679        (2-10)
                                      shirt, water            169        (2-11)
                                      coke, orange, paper     2478A      (3-1)
(c) Frequent itemsets when the minimum support is set to 50% (infrequent itemsets are struck through)
Rule                 Confidence     Rule                 Confidence     Rule                        Confidence
coke → orange        5/6=0.83       orange → coke        5/7=0.71       coke, orange → paper        5/5=1.00
coke → paper         6/6=1.00       paper → coke         6/7=0.86       coke, paper → orange        5/6=0.83
orange → paper       6/7=0.86       paper → orange       6/7=0.86       orange, paper → coke        5/6=0.83
(d) Possible association rules when the minimum confidence is set to 80%
Figure 4-16: An example of mining association rules with the hierarchical structure
Such multiple-level or multilevel association rules can be mined efficiently at multiple levels of
abstraction using concept hierarchies under a support-confidence framework. As shown above, in
general, a top-down strategy can be used together with existing association rule mining
algorithms, such as Apriori, FP-tree, CHARM and their variations. That is, during mining, the
counts are accumulated for the calculation of frequent itemsets at each concept level, starting
at the top concept level and working downward in the hierarchy toward the more specific concept
levels, until no more frequent itemsets can be found. For the example in Figure 4-16, the frequent
itemsets and rules in (a)-(b) are mined first, before the frequent itemsets and rules in (c)-(d).
A number of variations of this approach can be applied. The major ones are enumerated as follows.
1. Uniform Minimum Support
With a uniform minimum support for all levels, the same minimum support threshold is
used when mining at each level of abstraction. For the above example, the minimum support
threshold is set to 0.5 for all levels. The mining results for level 1 and level 2 are
shown in Figure 4-16.
In this example, when a minimum support threshold of 0.5 is used for 'apple,' 'orange,'
'coke,' 'water,' 'ruler,' 'paper,' 'shoes,' and 'shirt,' the infrequent items, i.e., 'apple
(sup=4/10),' 'ruler (sup=4/10),' and 'shoes (sup=4/10),' are eliminated. Moreover, when
the same threshold is applied to 'fruit,' 'drink,' 'stationary' and 'clothing,' all of these
general concepts are found to be frequent, even though the three items above are not.
The search under a uniform minimum support threshold is simple. When we apply
an Apriori-like optimization technique, we can restrict the search to avoid examining
itemsets containing any item whose ancestors do not have minimum support. This is possible
since the frequent itemsets at a higher level are found before those at a lower level.
However, the uniform support approach has a number of drawbacks. For example, if the
minimum support threshold is set too high, it could miss some meaningful associations
occurring at low abstraction levels. If the threshold is set too low, it may generate many
uninteresting associations occurring at high abstraction levels.
2. Reduced Minimum Support
Using reduced minimum support at lower levels (referred to as reduced support), each
level of abstraction has its own minimum support threshold. The deeper the level of
abstraction, the smaller the corresponding threshold is. For example, given the following
Sponsored by AIAT.or.th and KINDML, SIIT
214
hierarchy with reduced minimum support, we can mine items in different levels with
different thresholds.
In this hierarchy, the minimum support thresholds for levels 1, 2 and 3 are 0.5, 0.8 and 0.9,
respectively. With these thresholds, as shown in Figure 3.52, only the two higher concepts 'drink' and
'stationary' are considered. In Figure 3.52 (d)-(e), only frequent itemsets and frequent
rules involving 'drink' and 'stationary' are considered. Moreover, the level-1 items can be mined in
the same way but with a lower support (i.e., 0.5), as shown in Figure 3.52 (f)-(g).
3. Level-cross filtering by single items
Similar to the reduced minimum support at lower levels, each level of abstraction has its
own minimum support threshold where the deeper the level of abstraction, the smaller the
corresponding threshold is. However, the results of the higher-level mining are also used as a
filter. For example, given the following hierarchy with reduced minimum support, we can eliminate
some trivial itemsets and rules as follows. In this hierarchy, the minimum support thresholds for
levels 1, 2 and 3 are 0.5, 0.8 and 0.9, respectively. With these thresholds, only the two higher
concepts 'drink' and 'stationary' are considered, while 'fruit' and 'clothing' are eliminated.
Therefore, 'apple,' 'orange,' 'shoes,' and 'shirt' are not considered for mining. In this way, a
pruning mechanism is provided (a small sketch of this filtered, level-wise search is given after
this list).
4. Level-cross filtering by k-itemset
This case is similar to level-cross filtering by single items, but it filters k-itemsets instead of
single items. If a k-itemset at the higher level does not pass the minimum support, the k-itemsets
at the lower level under those concepts are not examined. For example, given the following
hierarchy with reduced minimum support, we examine all 2-itemsets at the lower level in case (a):
since 'drink, stationary' passes the minimum support, 'coke, ruler,' 'coke, paper,' 'water, ruler'
and 'water, paper' will be examined. On the other hand, we eliminate all 2-itemsets at the lower
level in case (b): since 'drink, fruit' does not pass the minimum support, 'coke, apple,'
'coke, orange,' 'water, apple' and 'water, orange' are pruned out without examination.
(a) The higher concept passes minimum support, all lower items are examined.
(b) The higher concept does not pass minimum support, all lower items are
pruned out.
5. Controlled level-cross filtering by 1-itemset (single item) or k-itemset
This case is similar to level-cross filtering by single items or k-itemsets, but a separate
threshold is used to decide whether the lower level should be examined; this threshold is set
independently of the minimum support. The following shows the case of a 1-itemset (single
item) and the case of a 2-itemset.
(a) Controlled level-cross filtering by 1-itemset (single item)
(b) Controlled level-cross filtering by k-itemset
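The following Python sketch illustrates the idea behind reduced minimum support with level-cross filtering for 1-itemsets. The taxonomy (item-to-concept mapping) and the two thresholds are assumptions chosen only for illustration; they are not taken from Figure 4-16.

parent = {                       # hypothetical two-level taxonomy (leaf item -> parent concept)
    "coke": "drink", "water": "drink",
    "ruler": "stationary", "paper": "stationary",
    "apple": "fruit", "orange": "fruit",
    "shoes": "clothing", "shirt": "clothing",
}

def frequent_items_by_level(transactions, minsup_concept=0.8, minsup_item=0.5):
    n = len(transactions)
    count = {}
    for t in transactions:
        # A transaction supports its items and the concepts above them.
        extended = set(t) | {parent[i] for i in t if i in parent}
        for x in extended:
            count[x] = count.get(x, 0) + 1
    # Higher level: concepts that meet the (higher) concept-level threshold.
    concepts = {c for c in set(parent.values())
                if count.get(c, 0) / n >= minsup_concept}
    # Lower level: leaf items examined only if their parent concept survived
    # (level-cross filtering) and they meet the (lower) item-level threshold.
    items = {i for i, p in parent.items()
             if p in concepts and count.get(i, 0) / n >= minsup_item}
    return concepts, items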
4.2.5. Efficient Association Rule Mining with Hierarchical Structure
Association rule mining (ARM) is a process to find the set of all subsets of items (called
itemsets) that frequently occur in the database records or transactions, and then to extract
rules telling us how a subset of items influences the presence of another subset. However, such
association rules may not provide the desired knowledge since they are limited by the
granularity of the items. For example, a rule "5% of customers who buy wheat bread also buy
chocolate milk" is less expressive and less useful than the more general rule "30% of customers
who buy bread also buy milk". For this purpose, generalized association rule mining (GARM) was
developed using the information of a pre-defined taxonomy over the items. The taxonomy is a
piece of knowledge, e.g., the classification of products (or items) into brands, categories,
product groups, and so forth. Given a taxonomy where only the leaf nodes (leaf items) are present
in the transactional database, more informative, intuitive and flexible rules (called generalized
association rules) can be mined from the database.
Generalized Association Rules and Generalized Frequent Itemsets
With the presence of a concept hierarchy or taxonomy, the formal problem description of
generalized association rule mining is different from that of association rule mining. For clarity,
all explanations in this section are illustrated using the example shown in Figure 4-17. Let T be a
concept hierarchy or taxonomy, i.e., a directed acyclic graph on items which represents the is-a
relationship by its edges, e.g., Figure 4-17 (a). The items in T are composed of a set of leaf items
(here, A, B, C, D and E) and a set of non-leaf items (here, U, V and W).
(a) concept hierarchy or taxonomy
TID  Items      Item  Tids        Item  Tids
1    ACDE       A     1245        A     1245
2    ABC        B     2356        B     2356
3    BCDE       C     123456      C     123456
4    ACD        D     13456       D     13456
5    ABCDE      E     1356        E     1356
6    BCDE                         U     123456
                                  V     123456
                                  W     13456
(b) horizontal database (left) vs. vertical database (middle) vs. extended vertical database (right)
Figure 4-17: An example of mining generalized association rules
Let I be a set of distinct items, consisting of the leaf items and the non-leaf items, and let T be a
set of transaction identifiers (tids). In this example, the leaf items are {A, B, C, D, E}, the
non-leaf items are {U, V, W}, and T = {1, 2, 3, 4, 5, 6}. A subset of I is called an itemset and a
subset of T is called a tidset. Normally, a transactional database is represented in the horizontal
database format, where each transaction corresponds to an itemset, as shown in the left table of
Figure 4-17 (b). An alternative to the horizontal database format is the vertical database format,
where each item corresponds to the tidset of the transactions that contain that item, as shown in
the middle table of Figure 4-17 (b). Note that the original database contains only leaf items. It is
possible to represent the original vertical database by extending it to cover non-leaf items, where
a transaction that supports an item also supports the ancestors of that item in the taxonomy, as
shown in the right table of Figure 4-17 (b). The extended database is the binary relation between
items and tids obtained in this way; an item x is said to be supported by a tid y when x is related
to y in the extended database. Here, apart from the concrete elements of I (written A, B, C, ...),
lower-case letters are used to denote items and upper-case letters to denote itemsets. For two
items x and y, x is an ancestor of y (conversely, y is a descendant of x) when there is a path from
x to y in the taxonomy. For any item x, the set of its ancestors (respectively, descendants) is
denoted by ancestor(x) (respectively, descendant(x)).
For example, the ancestors and descendants of each item can be read directly from the taxonomy in
Figure 4-17 (a). A generalized itemset is an itemset in which no element is an ancestor of any
other element. For example, an itemset containing a leaf item together with a non-leaf item that is
not its ancestor is a generalized itemset. The set of all generalized itemsets is finite and is a
subset of the power set of I. The support of a generalized itemset G, denoted by sup(G), is defined
as the percentage of the number of transactions in which G occurs as a subset to the total number
of transactions; equivalently, it is the size of the intersection of the tidsets of the members of
G divided by the total number of transactions. Any generalized itemset G is called a generalized
frequent itemset (GFI) when its support is at least a user-specified minimum support (minsup)
threshold.
In GARM, a meaningful rule is an implication of the form X => Y, where X and Y are disjoint
generalized itemsets and no item in Y is an ancestor of any item in X. A rule whose consequent
contains an ancestor of an item in its antecedent is meaningless because its support is redundant
with the support of the antecedent itself (such a rule trivially holds with 100% confidence). The
support of a rule X => Y, defined as sup(X => Y) = sup(X and Y together), is the percentage of the
number of transactions containing both X and Y to the total number of transactions. The confidence
of a rule, defined as conf(X => Y) = sup(X => Y)/sup(X), is the conditional probability that a
transaction contains Y, given that it contains X. For example, in Figure 4-17 (b), the support of
the rule A => C is 4/6 and its confidence is (4/6)/(4/6) = 1 or 100%. A meaningful rule is called a
generalized association rule (GAR) when its confidence is at least a user-specified minimum
confidence (minconf) threshold. The task of GARM is to discover all GARs whose supports and
confidences are at least minsup and minconf, respectively.
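The following Python sketch illustrates how the support of a generalized itemset can be computed from the extended vertical database by tidset intersection. The leaf tidsets are those of Figure 4-17 (b); the parent-child edges of the taxonomy are an assumption chosen for illustration, since only the resulting tidsets are given in the figure.

tidset = {                       # vertical database of the leaf items (Figure 4-17 (b), middle)
    "A": {1, 2, 4, 5}, "B": {2, 3, 5, 6}, "C": {1, 2, 3, 4, 5, 6},
    "D": {1, 3, 4, 5, 6}, "E": {1, 3, 5, 6},
}
children = {"U": ["A", "B"], "V": ["C"], "W": ["D", "E"]}   # hypothetical taxonomy edges

# Extended vertical database: a non-leaf item is supported by every transaction
# that supports one of its descendants.
extended = dict(tidset)
for nonleaf, kids in children.items():
    extended[nonleaf] = set().union(*(tidset[k] for k in kids))

def support(generalized_itemset, n_transactions=6):
    # Support = size of the intersection of the members' tidsets / number of transactions.
    tids = set.intersection(*(extended[x] for x in generalized_itemset))
    return len(tids) / n_transactions

# Example: support({"A", "W"}) = |{1,2,4,5} & {1,3,4,5,6}| / 6 = 3/6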
Here, it is possible to consider two relationships, namely the subset-superset and the ancestor-
descendant relationships, based on lattice theory. Similar to ARM, GARM exploits the subset-
superset relationship, which forms a lattice of generalized itemsets. As the second relationship,
the ancestor-descendant relationship is introduced to represent a set of k-generalized itemset
taxonomies. With these relationships, it is possible to mine a smaller set of generalized closed
frequent itemsets instead of mining the large set of conventional generalized frequent itemsets.
Two algorithms, called SET and cSET, were introduced to mine generalized frequent itemsets and
generalized closed frequent itemsets, respectively. In a number of experiments, SET and cSET
outperformed previous well-known algorithms in both computational time and memory utilization,
and the number of generalized closed frequent itemsets was much smaller than the number of
generalized frequent itemsets.
4.3. Historical Bibliography
As unsupervised learning, clustering has been studied extensively in many disciplines due to its
broad applications. Several textbooks are dedicated to the methods of cluster analysis, including
Hartigan (1975), Jain and Dubes (1988), Kaufman and Rousseeuw (1990), and Arabie, Hubert,
and De Soete (1996). Many survey articles on different aspects of clustering methods include
those done by Jain, Murty and Flynn (1999) and Parsons, Haque, and Liu (2004). As a
partitioning method, Lloyd (1957) and later MacQueen (1967) introduced the k-means algorithm.
Later, Bradley, Fayyad, and Reina (1998) proposed a k-means–based scalable clustering
algorithm. Instead of using means, Kaufman and Rousseeuw (1990) proposed to use the object
nearest to the mean as the cluster center and called the method the k-medoids algorithm, with the
two versions PAM and CLARA. To cluster categorical data (in contrast with numerical data),
Chaturvedi, Green, and Carroll (1994, 2001) proposed the k-modes clustering algorithm. Later
Huang (1998) independently proposed the k-modes (for clustering categorical data) and k-
prototypes (for clustering hybrid data) algorithms. As an extension to CLARA, the CLARANS
Sponsored by AIAT.or.th and KINDML, SIIT
219
algorithm was later proposed by Ng and Han (1994). Ester, Kriegel, and Xu (1995) proposed
efficient spatial access techniques, such as R*-tree and focusing techniques, to improve the
performance of CLARANS. An early survey of agglomerative hierarchical clustering algorithms
was conducted by Day and Edelsbrunner (1984). Kaufman and Rousseeuw (1990) also
introduced agglomerative hierarchical clustering, such as AGNES, and divisive hierarchical
clustering, such as DIANA. Later, Zhang, Ramakrishnan, and Livny (1996) proposed the BIRCH
algorithm which integrates hierarchical clustering with distance-based iterative relocation or
other nonhierarchical clustering methods to improve the clustering quality of hierarchical
clustering methods. The BIRCH algorithm partitions objects hierarchically using tree structures
whose leaf nodes (or low-level nonleaf nodes) are treated as microclusters, depending on the scale of
resolution. After that, it applies other clustering algorithms to perform macroclustering on the
microclusters. Proposed by Guha, Rastogi, and Shim (1998, 1999), CURE and ROCK utilized
linkage or nearest-neighbor analysis and its transformation to improve the conventional
hierarchical clustering. Exploring dynamic modeling in hierarchical clustering, the Chameleon
was proposed by Karypis, Han, and Kumar (1999). As an early density-based clustering method,
Ester, Kriegel, Sander, and Xu (1996) proposed DBSCAN, the first algorithm to utilize density in
clustering, with some parameters that need to be specified. After that, Ankerst, Breunig,
Kriegel, and Sander (1999) proposed a cluster-ordering method, namely OPTICS, which
facilitates density-based clustering while easing the burden of parameter setting. Almost at the
same time, Hinneburg and Keim (1998) proposed the DENCLUE algorithm, which uses a set of
density distribution functions to glue similar objects together. As a grid-based multi-resolution
approach, STING was proposed by Wang, Yang, and Muntz (1997) to cluster objects using
statistical information collected in grid cells. Instead of the original feature space, Sheikholeslami,
Chatterjee, and Zhang (1998) applied wavelet transform to implement a multi-resolution
clustering method, namely WaveCluster, which is a combination of grid- and density-based
approach. As another hybrid of a grid- and density-based approach, CLIQUE was designed based
on Apriori by Agrawal, Gehrke, Gunopulos, and Raghavan (1998) to cope with high-dimensional
clustering using dimension-growth subspace clustering. As model-based clustering, Dempster,
Laird, and Rubin (1977) proposed a well-known statistics-based method, namely the EM
(Expectation-Maximization) algorithm. Handling missing data in EM methods was presented by
Lauritzen (1995). As a variant of the EM algorithm, AutoClass was proposed by Cheeseman and
Stutz (1996) with incorporation of Bayesian Theory. While conceptual clustering was first
introduced by Michalski and Stepp (1983), the first example is COBWEB invented by Fisher
(1987). A succeeding version is CLASSIT by Gennari, Langley, and Fisher (1989). The task of
association rule mining was first introduced by Agrawal, Imielinski, and Swami (1993). The
Apriori algorithm for frequent itemset mining and a method to generate association rules from
frequent itemsets were presented by Agrawal and Srikant (1994a, 1994b). Agrawal and Srikant
(1994b), Han and Fu (1995), and Park, Chen, and Yu (1995) described transaction reduction
techniques in their papers. Later, Pasquier, Bastide, Taouil, and Lakhal (1999) proposed a
method to mine frequent closed itemsets, namely A-Close, based on Apriori algorithm. Later, Pei,
Han, and Mao (2000) proposed CLOSET, an efficient closed itemset mining algorithm based on
the frequent pattern growth method. As a further refined algorithm, CLOSET+ was invented by
Wang, Han, and Pei (2003). Savasere, Omiecinski, and Navathe (1995) introduced the
partitioning technique. Toivonen (1996) explored the sampling techniques while Brin, Motwani,
Ullman, and Tsur (1997) provided a dynamic itemset counting approach. Han, Pei, and Yin
(2000) proposed the FP-growth algorithm, a pattern-growth approach for mining frequent
itemsets without candidate generation. Grahne and Zhu (2003) introduced FPClose, a prefix-tree-
based algorithm for mining closed itemsets using the pattern-growth approach. Zaki (2000)
proposed an approach for mining frequent itemsets by exploring the vertical data format, called
ECLAT. Zaki and Hsiao (2002) presented an extension for mining closed frequent itemsets with
the vertical data format, called CHARM. Bayardo (1998) gave the first study on mining max-
patterns. Multilevel association mining was studied in Han and Fu (1995) and Srikant and
Agrawal (1995, 1997). In Srikant and Agrawal (1995, 1997), five algorithms named Basic,
Cumulate, Stratify, Estimate and EstMerge were proposed. These algorithms apply the horizontal
database format and a breadth-first search strategy like Apriori-based algorithms. Later, Hipp,
Myka, Wirth, and Guntzer (1998) proposed a method, namely Prutax, which uses hash-tree checking
with the vertical database format to avoid generating meaningless itemsets and thereby to reduce
the computational time needed for scanning the database multiple times. Lui and Chung (2000) proposed
an efficient method to discover generalized association rules with multiple minimum supports. A
parallel algorithm for generalized association rule mining (GARM) has also been proposed by
Shintani and Kitsuregawa (1998). Some recent applications that utilize a GARM are shown by
Michail (2000) and Hwang and Lim (2002). Later, Sriphaew and Theeramunkong (2002, 2003,
2004) introduced two types of constraints on two generalized itemset relationships, called
subset-superset and ancestor-descendant constraints, to mine only a small set of generalized
closed frequent itemsets instead of mining a large set of conventional generalized frequent
itemsets. Two algorithms, named SET and cSET, are proposed by Sriphaew and Theeramunkong
(2004) to efficiently find generalized frequent itemsets and generalized closed frequent itemsets,
respectively.
Exercise
1. Apply k-means algorithm to cluster the following data.
AreaType Humidity Temperature
Sea 75 40
Mountain 40 20
Mountain 45 25
Mountain 70 40
Sea 70 25
Here, use the following approaches as the method for clustering
Distance: Euclidean distance, d_ij = sqrt( (x_i1 - x_j1)^2 + (x_i2 - x_j2)^2 + ... + (x_ip - x_jp)^2 )
For nominal attributes: use 0 and 1
For numeric attributes: decimal scaling normalization
2. Apply the hierarchical-based clustering for the previous problem.
3. Explain the merits and demerits of the grid-based clustering.
4. Apply the DBSCAN to cluster the following data points. Here, set the minimum number of
objects to three and the radius of neighborhood objects to .
5. Explain the concepts of conceptual clustering and the EM algorithm, including their merits and
demerits.
6. Assume that the database is as follows. Find frequent itemsets and frequent rules when
minimum support and minimum confidence are set to 60% and 80%, respectively. Show the
process of Apriori, FP-Tree and CHARM methods.
TID ITEMS
1 apple, bacon, bread, pizza, potato, tuna, water
2 bacon, bread, cookie, corn, nut, shrimp, water
3 apple, bread, cookie, nut, pizza, potato, shrimp, tuna, water
4 apple, bacon, bread, cookie, nut, potato, water
5 bread, cookie, corn, nut, pizza, potato, water
6 apple, cookie, corn, nut, potato, shrimp, water
7 bacon, bread, cookie, corn, nut, tuna, water
8 apple, bread, nut, potato, shrimp, water
7. From the table in the previous question, describe how to calculate support, confidence,
interestingness (correlation) and conviction. Enumerate a number of frequent rules and a
number of interesting rules.
8. Explain applications of generalized association rules in sales databases and mobile