Chapter-3
ROUGH SET THEORY AND REDUCT
BASED RULE GENERATION
3.1 Introduction
Rough Set Theory (RST), proposed by Zdzislaw Pawlak in 1982, has been in a state
of constant development ever since. Its methodology is concerned with the
classification and analysis of imprecise, uncertain or incomplete information and
knowledge, and it is considered one of the first non-statistical approaches to data
analysis [107]. The fundamental concept behind RST is the approximation of a set by
two other sets, its lower and upper approximations, which formally classify the
knowledge available about the domain of interest. The lower approximation is
characterized by objects that definitely form part of the subset of interest, whereas
the upper approximation is characterized by objects that possibly form part of it.
Every subset defined through its upper and lower approximations is known as a Rough
Set [107]. Over the years RST has become a valuable tool for solving various
problems, such as the representation of uncertain or imprecise knowledge; knowledge
analysis; evaluation of the quality and availability of information with respect to the
consistency and presence of data patterns; identification and evaluation of data
dependencies; and reasoning with uncertain and reduced information [105].
The range of rough set applications is much wider today than in the past, principally
in the areas of medicine, analysis of database attributes and process control. RST has
some overlap with other methods of data analysis, e.g., statistics, cluster analysis,
fuzzy sets and evidence theory, but it can be viewed in its own right as an independent
discipline [114]. The rough set approach is of fundamental importance to AI and the
cognitive sciences, especially in the areas of machine learning, knowledge acquisition,
decision analysis, knowledge discovery from databases, expert systems, inductive
reasoning and pattern recognition. It is of particular importance to decision support
systems and data mining [103]. The theory has been successfully applied to many
real-life problems in medicine, pharmacology, engineering, banking, and financial and
market analysis, among others [115].
An early application of this theory was to classify imprecise and incomplete information.
Reduct and Core are two important concepts in rough set theory. RST is an elegant
theory when applied to small data sets because it can always find a minimal reduct and
generate minimal rule sets. However, the general problem of finding a minimal reduct
is NP-hard [78]: no polynomial-time algorithm for it is known, and none is believed to
exist. In other words, as data sets grow large both in dimension and volume, finding
the minimal reduct becomes computationally infeasible.
The rough set approach has many advantages. The most important ones are [115]:
• Synthesis of efficient algorithms for finding hidden patterns in data;
• Identification of relationships that would not be found using statistical methods;
• Representation and processing of both qualitative and quantitative parameters, and mixing of user-defined and measured data;
• Reduction of data to a minimal representation (data reduction);
• Evaluation of the significance of data;
• Synthesis of sets of classification or decision rules from data;
• Legibility and straightforward interpretation of the synthesized models and obtained results;
• Ease of understanding.
Rough set theory can therefore be very useful in many intelligent industrial
applications, either as an independent approach or combined with other areas of soft
computing, e.g., fuzzy sets, neural networks, logistic regression, etc.
3.2 Basic Philosophy of Rough Set
In this section, all the important concepts related to rough set theory are defined
[106].
3.2.1 Equivalence Relation
Let U be a non-empty set and let p, q and r be elements of U. Consider a binary
relation R on U, writing pRq if and only if (p, q) is in R. R is an equivalence
relation if it satisfies the following three properties:
i) Reflexive Property: (p, p) is in R for all p in U.
ii) Symmetric Property: if (p, q) is in R, then (q, p) is in R.
iii) Transitive Property: if (p, q) and (q, r) are in R, then (p, r) is in R.
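These three properties can be checked mechanically when R is given as a finite set of ordered pairs. The following is a minimal sketch; the universe and relation shown are illustrative, not taken from the chapter:

```python
# Hypothetical universe and relation: "same group", with groups {p, q} and {r}.
U = {"p", "q", "r"}
R = {("p", "p"), ("q", "q"), ("r", "r"), ("p", "q"), ("q", "p")}

def is_equivalence(U, R):
    """Check the reflexive, symmetric and transitive properties of R on U."""
    reflexive = all((x, x) in R for x in U)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all((x, z) in R
                     for (x, y) in R for (y2, z) in R if y == y2)
    return reflexive and symmetric and transitive

print(is_equivalence(U, R))  # True: R partitions U into {p, q} and {r}
```

An equivalence relation always induces such a partition of U into disjoint classes, which is exactly the structure the indiscernibility relation below exploits.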
3.2.2 Decision Table or Information System and Indiscernibility Relation
Let T = (U, A, Q, ρ) be an information system, where U is a non-empty finite set of
objects called the universe, A is a set of attributes, Q is the union of the domains of
the attributes in A, and ρ : U × A → Q is a total description function. For the
classification of objects, the attribute set A is divided into condition attributes,
denoted CON, and a decision attribute, denoted DEC. In the context of classification
the information table is known as a decision table. The elements of U are called
objects, cases, instances or observations [110]. Attributes are interpreted as features,
variables or characteristic conditions. Each feature a is a function
a : U → Va for a ∈ A, where Va is called the value set of a.
Let a ∈ A and P ⊆ A. The indiscernibility relation IND(P) is defined as:
IND(P) = {(x, y) ∈ U × U : for all a ∈ P, a(x) = a(y)}
In simple words, two objects are indiscernible if we cannot differentiate between
them, because they do not differ on any attribute in the subset P.
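In code, IND(P) amounts to grouping objects by their values on the attributes in P. The sketch below uses a hypothetical three-object table, not data from the chapter:

```python
# A minimal sketch of IND(P): objects are indiscernible w.r.t. P when they
# agree on every attribute in P. The table is hypothetical.
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 1},
    "x3": {"a": 1, "b": 0},
}

def ind_classes(table, P):
    """Return the equivalence classes of IND(P) as a list of sets."""
    classes = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in P)       # the object's signature on P
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

# On {a} alone all three objects share a = 1, so IND({a}) has one class;
# adding b separates x2 from the indiscernible pair {x1, x3}.
```

Here x1 and x3 remain indiscernible even with respect to {a, b}, because they agree on both attributes.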
3.2.3 Lower Approximation of a Subset
Let B ⊆ C and X ⊆ U. The B-lower approximation of X is the set of all elements
of U which can be classified with certainty as elements of X:
B(X) = {x ∈ U : B(x) ⊆ X}
where B(x) denotes the equivalence class of x under IND(B).
3.2.4 Upper Approximation of a Subset
The B-upper approximation of X is the set of all elements of U that can possibly
belong to the subset of interest X [111]:
B̄(X) = {x ∈ U : B(x) ∩ X ≠ ∅}
3.2.5 Boundary Region of a Subset
It is the collection of elementary sets defined by:
BN_B(X) = B̄(X) − B(X)
The boundary region consists of those objects that we cannot decisively classify into
X using the attributes in B.
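The lower approximation, upper approximation and boundary region can all be computed from the equivalence classes of IND(B). A minimal sketch on a hypothetical table (the object and attribute names are illustrative):

```python
# Lower/upper approximations and boundary region of X w.r.t. attributes B.
# The table and the target set X are hypothetical.
table = {"x1": {"a": 1}, "x2": {"a": 1}, "x3": {"a": 2}}
X = {"x1", "x3"}

def ind_classes(table, B):
    """Equivalence classes of IND(B)."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in B), set()).add(obj)
    return list(classes.values())

def approximations(table, B, X):
    lower, upper = set(), set()
    for c in ind_classes(table, B):
        if c <= X:        # class entirely inside X -> certainly in X
            lower |= c
        if c & X:         # class overlaps X -> possibly in X
            upper |= c
    return lower, upper, upper - lower  # boundary = upper minus lower

lower, upper, boundary = approximations(table, ["a"], X)
```

Since the boundary {x1, x2} is non-empty here, X is rough with respect to {a}: x1 cannot be separated from x2 on that attribute alone.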
3.2.6 Rough Set
A subset defined through its lower and upper approximations is called a Rough Set.
When the boundary region is a non-empty set, that is B(X) ≠ B̄(X), the set is
called a Rough Set.
3.2.7 Crisp Set
A set is called a Crisp set when its boundary region is empty, that is B(X) = B̄(X).
3.2.8 Positive Region of a Subset
It is the set of all objects from the universe U which can be classified with certainty
into the classes of U/D employing the attributes from C:
POS_C(D) = ∪_{X ∈ U/D} C(X)
where C(X) denotes the lower approximation of the set X with respect to C. The
positive region of a set X belonging to the partition U/D is simply its lower
approximation. The positive region of a decision attribute with respect to a subset C
approximately represents the quality of C. The union of the positive and boundary
regions constitutes the upper approximation [109].
3.2.9 Negative Region of a Subset
The negative region consists of those elementary sets that have no predictive power
for a subset X given a concept R; it consists of all classes that have no overlap with
the concept. That is,
NEG_R(X) = U − R̄(X)
3.2.10 Reduct
A system T = (U, A, C, D) is independent if every c in C is indispensable. A set of
features R ⊆ C is called a reduct of C if T' = (U, A, R, D) is independent and
POS_R(D) = POS_C(D). Furthermore, there is no T ⊂ R such that
POS_T(D) = POS_C(D).
A reduct is thus a minimal set of features that preserves the indiscernibility relation
produced by the partition induced by C. There may be several such subsets R. Similar
or indiscernible objects may be represented several times in an information table, and
some of the attributes may be superfluous or irrelevant; they can be removed without
loss of classification performance [140].
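For a small table, reducts can be found by brute force: enumerate subsets of C in order of size and keep those that preserve the positive region and contain no smaller such subset. The sketch below uses a hypothetical four-object table; as noted earlier, the general problem is NP-hard, so this approach does not scale to large attribute sets.

```python
from itertools import combinations

# Hypothetical decision table; "d" is the decision attribute.
table = {
    "x1": {"a": 1, "b": 0, "d": "yes"},
    "x2": {"a": 1, "b": 1, "d": "no"},
    "x3": {"a": 2, "b": 0, "d": "no"},
    "x4": {"a": 2, "b": 1, "d": "no"},
}
C = ["a", "b"]

def pos(table, B, d):
    """Positive region of decision d with respect to attribute subset B."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[x] for x in B), set()).add(obj)
    region = set()
    for c in classes.values():
        if len({table[o][d] for o in c}) == 1:  # class consistent on d
            region |= c
    return region

def reducts(table, C, d):
    """All minimal subsets of C preserving POS_C(d), by exhaustive search."""
    full = pos(table, C, d)
    out = []
    for r in range(1, len(C) + 1):
        for R in combinations(C, r):
            if pos(table, list(R), d) == full \
                    and not any(set(q) < set(R) for q in out):
                out.append(R)
    return out

print(reducts(table, C, "d"))  # -> [('a', 'b')]: both attributes are needed here
```

Neither {a} nor {b} alone preserves the positive region in this table, so the only reduct is the full set {a, b}.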
3.2.11 Core
The set of all features indispensable in C is denoted by CORE(C). We have
CORE(C) = ∩ RED(C)
where RED(C) is the set of all reducts of C. Thus, the Core is the intersection of all
reducts of an information system [112]. The Core contains no dispensable features,
and any reduct can be obtained by expanding it.
3.2.12 The Dependency Coefficient
Let T = (U, A, C, D) be a decision table. The Dependency Coefficient between the
condition attributes C, and the decision attribute D is given by
γ(C,D) = |POSC(D)| / |U|
The dependency coefficient varies between 0 and 1, since it expresses the proportion
of objects correctly classified, relative to the total, using the conditional feature set.
If γ = 1, D depends totally on C; if 0 < γ < 1, D depends partially on C; and if
γ = 0, then D does not depend on C [113]. A decision attribute depends on the set of
conditional features if all values of the decision feature D are uniquely determined by
the values of the conditional attributes; that is, there exists a dependency between the
values of the decision and conditional features.
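The coefficient γ(C, D) = |POS_C(D)| / |U| can be sketched as follows, again on a hypothetical four-patient table:

```python
# Dependency coefficient gamma(C, D) = |POS_C(D)| / |U|.
# The table is hypothetical; "d" is the decision attribute.
table = {
    "x1": {"a": 1, "b": 0, "d": "yes"},
    "x2": {"a": 1, "b": 1, "d": "no"},
    "x3": {"a": 2, "b": 0, "d": "no"},
    "x4": {"a": 2, "b": 1, "d": "no"},
}

def gamma(table, C, d):
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in C), set()).add(obj)
    # An indiscernibility class lies in the positive region when all of
    # its members share the same decision value.
    pos = sum(len(c) for c in classes.values()
              if len({table[o][d] for o in c}) == 1)
    return pos / len(table)
```

With both attributes the indiscernibility classes are singletons, so γ = 1 (total dependency); with {a} alone, the class {x1, x2} is inconsistent on d, giving γ = 0.5 (partial dependency).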
3.2.13 Accuracy of the Approximation
The accuracy of the approximation of a set X from the elementary subsets is measured
as the ratio of the sizes of the lower and upper approximations. The ratio equals 1 if
no boundary region exists, which indicates a perfect classification; in this case,
deterministic rules for data classification can be generated [112].
α(X) = |B(X)| / |B̄(X)|
Thus, a set X with accuracy equal to 1 is crisp; otherwise X is rough. Obviously
0 ≤ α(X) ≤ 1.
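Given the two approximations, the accuracy is a simple ratio of cardinalities. A short sketch on a hypothetical partition of the universe into elementary sets:

```python
# Accuracy alpha(X) = |lower(X)| / |upper(X)| on a hypothetical partition.
elementary = [{"x1", "x2"}, {"x3"}, {"x4"}]  # equivalence classes under B
X = {"x1", "x3"}

lower, upper = set(), set()
for c in elementary:
    if c <= X:        # class contained in X -> part of the lower approximation
        lower |= c
    if c & X:         # class overlapping X -> part of the upper approximation
        upper |= c

alpha = len(lower) / len(upper)  # 1/3 here, so X is rough, not crisp
```

Because α < 1 in this example, X has a non-empty boundary region and only non-deterministic rules could be generated for it.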
3.2.14 Significance of Attributes
One of the first ideas was to consider as relevant the features in the core of an
information system, i.e., the features that belong to the intersection of all reducts of
the information system [108]. It can easily be checked that several definitions of
relevant features used by the machine learning community can be interpreted by
choosing a relevant decision system corresponding to the given information system [9].
It is also possible to find the relevant features from approximate reducts of
sufficiently high quality. In attribute reduction, some of the attributes can be
eliminated from the information table without losing the relevant information
contained in the table [8].
The idea of attribute reduction can be generalized by introducing the concept of
significance of attributes, which enables attributes to be evaluated not only on a two-
valued scale, {dispensable, indispensable}, but by associating with each attribute a
real number from the closed interval [0, 1]; this number expresses the importance of
the attribute in the information table.
Let C and D be the sets of condition and decision attributes respectively, and let a ∈ C
be a condition attribute. As indicated earlier, the number γ(C, D) expresses the degree
of consistency of the decision table, or the degree of dependency between the
attributes C and D, or the accuracy of the approximation of U/D by C. We can ask
how the coefficient γ(C, D) changes when the attribute a is removed, i.e., what is the
difference between γ(C, D) and γ(C − {a}, D). The difference can be normalized, and
the significance of the attribute a is defined as
σ(C,D)(a) = (γ(C, D) − γ(C − {a}, D)) / γ(C, D) = 1 − γ(C − {a}, D) / γ(C, D),
where C and D are the condition and decision attributes respectively. In general,
0 ≤ σ(C,D)(a) ≤ 1 for any sets C and D.
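The significance of an attribute can be sketched by comparing the dependency coefficient before and after removing it. The table below is hypothetical, and the sketch assumes γ(C, D) > 0 so the normalization is well defined:

```python
# Significance of attribute a: the normalized drop in gamma when a is removed.
# Hypothetical decision table; "d" is the decision attribute.
table = {
    "x1": {"a": 1, "b": 0, "d": "yes"},
    "x2": {"a": 1, "b": 1, "d": "no"},
    "x3": {"a": 2, "b": 0, "d": "no"},
    "x4": {"a": 2, "b": 1, "d": "no"},
}

def gamma(table, C, d):
    """Dependency coefficient |POS_C(d)| / |U|."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[x] for x in C), set()).add(obj)
    pos = sum(len(c) for c in classes.values()
              if len({table[o][d] for o in c}) == 1)
    return pos / len(table)

def significance(table, C, d, a):
    """sigma(a) = (gamma(C, D) - gamma(C - {a}, D)) / gamma(C, D)."""
    g = gamma(table, C, d)  # assumed > 0
    return (g - gamma(table, [c for c in C if c != a], d)) / g

sig_a = significance(table, ["a", "b"], "d", "a")
```

In this table removing either attribute halves the dependency coefficient, so both attributes have significance 0.5 and neither is dispensable.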
3.3 Information System of a Patient Dataset
A data set is represented as a decision table, in which each row represents a case, an
event, a patient, or simply an object, and each column represents an attribute (a
variable, an observation, a property, etc.) and its value for the different objects. The
attribute values may be supplied by a human expert or user. Such a table is called an
information system or decision table [159].
A decision table is used to specify which conditions lead to which decisions. A
decision table is defined as T = (U, A, Q, ρ), where U is the set of objects in the
table, A is a set of attributes, Q is the union of the domains of the attributes in A,
and ρ : U × A → Q is a total description function. For the classification of objects,
the attribute set A is divided into condition attributes, denoted CON, and a decision
attribute, denoted DEC, with A = CON ∪ DEC and CON ∩ DEC = φ [58]. As an
example, we have collected the data of 15 patients from a hospital, as shown in
Table 3.1. A patient may or may not have a heart problem depending on the values of
the condition attributes. There are seven condition attributes and one decision
attribute. Each condition attribute takes some value from the domain of that
particular attribute.
Let B = { P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, P13, P14, P15} be the set
of 15 patients.
The set of Condition attributes of Information System C = {Heart Palpitation, Blood