Data mining II: The fuzzy way
Włodzisław Duch
Dept. of Informatics, Nicholas Copernicus University, Toruń, Poland
http://www.phys.uni.torun.pl/~duch
ISEP Porto, 8-12 July 2002
Basic ideas
• Complex problems cannot be analyzed precisely.
• Knowledge of an expert may be approximated using imprecise concepts.
If the weather is nice and the place is attractive then not many participants stay at the school.
Fuzzy logic/systems include:
• Mathematics of fuzzy sets/systems, fuzzy logics.
• Fuzzy knowledge representation for clusterization, classification and regression.
• Extraction of fuzzy concepts and rules from data.
• Fuzzy control theory.
Types of uncertainty
• Stochastic uncertainty: rolling dice, accidents, insurance risk – probability theory.
• Measurement uncertainty: about 3 cm; about 20 degrees – statistics.
• Information uncertainty: trustworthy client, known constraints – data mining.
• Linguistic uncertainty: small, fast, low price – fuzzy logic.
Crisp sets
Membership function χ_young(x):
young = { x ∈ M | age(x) ≤ 20 }
χ_young(x) = 1 if age(x) ≤ 20; 0 if age(x) > 20.
[Figure: the MF of the crisp set A = "young" as a step function of x [years], dropping from 1 to 0 at age 20.]
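The crisp/fuzzy contrast above can be sketched in a few lines of Python. The step function mirrors the slide's "young" set; the triangular tail of the fuzzy variant (membership fading to 0 at age 30) is an illustrative assumption, not a value from the slide.

```python
def crisp_young(age):
    # Crisp set: characteristic function jumps from 1 to 0 at age 20.
    return 1.0 if age <= 20 else 0.0

def fuzzy_young(age):
    # Fuzzy variant (illustrative): full membership up to 20,
    # linearly decreasing to 0 at age 30.
    if age <= 20:
        return 1.0
    if age >= 30:
        return 0.0
    return (30 - age) / 10.0

print(crisp_young(19), crisp_young(21))  # 1.0 0.0
print(fuzzy_young(25))                   # 0.5
```

The crisp set answers only yes/no; the fuzzy set grades the boundary, which is exactly what the MF adds.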
Fuzzy sets
X – universum, space; x ∈ X.
A – linguistic variable, concept, fuzzy set.
μ_A – a Membership Function (MF), determining the degree to which x belongs to A.
Linguistic variables, concepts – sums of fuzzy sets.
Logical predicate functions with continuous values.
Membership value: different from probability.
μ(bold) = 0.8 does not mean bold 1 in 5 cases.
Probabilities are normalized to 1, MFs are not.
Fuzzy concepts are subjective and context-dependent.
MFs are usually convex, with a single maximum.
MFs for similar numbers overlap.
Numbers: core = point x with μ(x) = 1.
MFs decrease monotonically on both sides of the core.
Typically: triangular functions (a, b, c) or singletons.
Example: the fuzzy number 7 as a discrete fuzzy set over {5, 6, 7, 8, 9}:
F = 1/3 / 5 + 2/3 / 6 + 1 / 7 + 2/3 / 8 + 1/3 / 9
General notation:
discrete: A = Σ_{x_i ∈ X} μ_A(x_i) / x_i
continuous: A = ∫_X μ_A(x) / x
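A discrete fuzzy set like the one above is naturally a mapping from elements to membership degrees. A minimal sketch, using the fuzzy number 7 from the slide (the helper name `mu` is mine):

```python
# Discrete fuzzy set for "about 7" over the universe {5, ..., 9},
# written as element -> membership degree.
about_7 = {5: 1 / 3, 6: 2 / 3, 7: 1.0, 8: 2 / 3, 9: 1 / 3}

def mu(fuzzy_set, x):
    """Membership degree of x in a discrete fuzzy set (0 outside the support)."""
    return fuzzy_set.get(x, 0.0)

print(mu(about_7, 7))   # 1.0 (the core)
print(mu(about_7, 6))   # 2/3
print(mu(about_7, 12))  # 0.0
```

Note the degrees sum to more than 1: unlike probabilities, MFs are not normalized.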
Fuzzy rules
Commonsense knowledge may sometimes be captured in a natural way using fuzzy rules.
What does it mean for fuzzy rules:
IF x is A THEN y is B?
Fuzzy implication
If => means correlation, a T-norm T(μ_A, μ_B) is sufficient.
A=>B has many realizations.
Interpretation of implication
If x is A then y is B: correlation or implication.
A=>B ≡ not-A or B (A entails B)
A=>B ≡ A and B (correlation)
[Figure: the two interpretations shown as regions A, B in the (x, y) plane.]
Types of rules
Mamdani type: IF MF_A(x)=high THEN MF_B(y)=medium.
Takagi-Sugeno type: IF MF_A(x)=high THEN y=f_A(x).
Linear f_A(x) – first-order Sugeno type.
FIR, Fuzzy Implication Rules: logic of implications between fuzzy facts.
FIS, Fuzzy Inference Systems: combine fuzzy rules to calculate final decisions.
Fuzzy systems F: ℝ^n → ℝ^p use m rules to map the vector x on the output F(x), vector or scalar.
3. Inference
Calculate the degree of truth of the rule conclusion: use T-norms such as MIN or product to combine the degree of fulfillment of the conditions with the MF of the conclusion.
Example: for the rule conclusion THEN Heating=full, with degree of fulfillment of the conditions cond = 0.3:
Inference MIN: μ_concl(h) = min{cond, μ_full(h)}
Inference PROD: μ_concl(h) = cond · μ_full(h)
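The two T-norms are one-liners; a minimal sketch, where the triangular "full" MF over the heating level h is an assumed shape for illustration:

```python
def tri(x, a, b, c):
    # Triangular MF with support [a, c] and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

cond = 0.3  # degree of fulfillment of the rule conditions
h = 0.8     # heating level being evaluated
full = tri(h, 0.5, 1.0, 1.5)  # "full" MF, peak at h = 1 (hypothetical shape)

print(min(cond, full))  # MIN inference: conclusion MF clipped at cond
print(cond * full)      # PROD inference: conclusion MF scaled by cond
```

MIN clips the conclusion MF flat at the firing strength; PROD keeps its shape but rescales it.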
4. Aggregation
Aggregate all rule conclusions (THEN Heating=full, THEN Heating=medium, THEN Heating=no) using the MAX operator to form the combined output MF over h.
5. Defuzzification
Calculate a crisp value/decision using, for example, the "Center of Gravity" (COG) method over the aggregated μ_concl(h).
For discrete sets a "center of singletons"; for continuous MFs:
h = Σ_i μ_i · A_i · c_i / Σ_i μ_i · A_i
where μ_i = degree of membership in set i, A_i = area under the MF for set i, c_i = center of gravity for set i.
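On a sampled output MF the COG reduces to a weighted average; a minimal sketch (the sample values are made up for illustration):

```python
def cog(mu_samples, h_samples):
    """Center-of-gravity defuzzification on a sampled MF:
    h* = sum(mu_i * h_i) / sum(mu_i)."""
    num = sum(m * h for m, h in zip(mu_samples, h_samples))
    den = sum(mu_samples)
    return num / den if den > 0 else 0.0

# Symmetric aggregated MF centered on h = 2 -> COG is exactly 2.
hs = [0, 1, 2, 3, 4]
mus = [0.0, 0.5, 1.0, 0.5, 0.0]
print(cog(mus, hs))  # 2.0
```

The "center of singletons" variant is the same formula with each μ_i attached to a single point instead of an area.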
FIS for heating
Fuzzification → Inference → Defuzzification
The measured temperature T is fuzzified by the MFs freeze, cold, warm, e.g. μ_freeze = 0.7, μ_cold = 0.2, μ_hot = 0.0.
Rule base:
if temp=freezing then valve=open
if temp=cold then valve=half open
if temp=warm then valve=closed
The output MFs (full, half, closed) over the valve position v, weighted by the degrees 0.7 and 0.2, are aggregated and defuzzified into the output that controls the valve position.
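A toy end-to-end version of this heating controller can be sketched as follows. All MF shapes, breakpoints, and the singleton valve positions are my assumptions for illustration, not the values from the slide; defuzzification uses the center-of-singletons shortcut.

```python
def fis_heating(temp):
    """Toy heating FIS: fuzzify temp (deg C), fire three rules,
    defuzzify by center of singletons. All shapes are assumed."""
    # Fuzzification: degrees of freezing / cold / warm.
    freezing = max(0.0, min(1.0, (5.0 - temp) / 10.0))
    cold = max(0.0, 1.0 - abs(temp - 10.0) / 10.0)
    warm = max(0.0, min(1.0, (temp - 15.0) / 10.0))
    # Rule base -> singleton valve positions: open=1.0, half=0.5, closed=0.0.
    rules = [(freezing, 1.0), (cold, 0.5), (warm, 0.0)]
    num = sum(mu * v for mu, v in rules)
    den = sum(mu for mu, _ in rules)
    return num / den if den > 0 else 0.0

print(fis_heating(0.0))   # only "freezing" fires -> valve fully open
print(fis_heating(20.0))  # only "warm" fires -> valve closed
```

The same three-stage pipeline (fuzzify, infer, defuzzify) scales to any number of inputs and rules.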
Takagi-Sugeno rules
Mamdani rules conclude that: IF X1=A1 and X2=A2 … Xn=An THEN Y=B
TS rules conclude some functional dependence f(x_i): IF X1=A1 and X2=A2 … Xn=An THEN Y=f(x1, x2, …, xn)
TS rules are usually based on piecewise linear functions (equivalent to linear splines approximation): IF X1=A1 and X2=A2 … Xn=An THEN Y=a0 + a1·x1 + … + an·xn
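A first-order TS system outputs a firing-strength-weighted average of the rules' linear models. A minimal two-rule sketch; the MFs and all coefficients a0, a1, a2 are illustrative assumptions:

```python
def ts_first_order(x1, x2):
    """First-order Takagi-Sugeno sketch with two rules.
    Output = weighted average of the rules' linear consequents."""
    # Rule firing strengths from simple ramp MFs of x1 (assumed shapes).
    w_low = max(0.0, min(1.0, (5.0 - x1) / 5.0))   # "x1 is low"
    w_high = max(0.0, min(1.0, x1 / 5.0))          # "x1 is high"
    # Linear consequents: y = a0 + a1*x1 + a2*x2 (assumed coefficients).
    y_low = 1.0 + 0.1 * x1 + 0.2 * x2
    y_high = 3.0 + 0.5 * x1 - 0.1 * x2
    den = w_low + w_high
    return (w_low * y_low + w_high * y_high) / den if den > 0 else 0.0

print(ts_first_order(0.0, 5.0))  # only the "low" rule fires
print(ts_first_order(5.0, 0.0))  # only the "high" rule fires
```

Between the MF cores the output blends the two linear models, which is why TS systems act as smooth piecewise-linear approximators.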
Induction of fuzzy rules
All this may be presented in the form of networks.
Choices/adaptive parameters in fuzzy rules:
• The number of rules (nodes).
• The number of terms for each attribute.
• Position of the membership function (MF).
• MF shape for each attribute/term.
• Type of rules (conclusions).
• Type of inference and composition operators.
• Induction algorithms: incremental or refinement.
• Type of learning procedure.
Feature space partition
Regular grid vs. independent functions.
MFs on a grid
• Advantage: simplest approach.
• Regular grid: divide each dimension into a fixed number of MFs and assign an average value from all samples that belong to the region.
• Irregular grid: find the largest error, divide the grid there into two parts, adding a new MF.
• Mixed method: start from a regular grid, adapt parameters later.
• Disadvantages: for k dimensions and N MFs in each, N^k areas are created! Poor quality of approximation.
Optimized MFs
• Advantages: higher accuracy, better approximation, fewer MFs needed.
• Optimized MFs may come from:
Neurofuzzy systems – equivalent to an RBF network with Gaussian functions (several proofs); FSM models with triangular or trapezoidal functions; modified MLP networks with bicentral functions, etc.
• Disadvantages: extraction of rules is hard, optimized MFs are more difficult to create.
Improving sets of rules
• How to improve known sets of rules?
• Use minimization methods to improve parameters of fuzzy rules: usually non-gradient methods are used, most often genetic algorithms.
• Change rules into a neural network, train the network and convert it into rules again.
• Use heuristic methods for local adaptation of parameters of individual rules.
• Fuzzy logic – good for modeling imprecise knowledge, but…
• What do the decision borders of a FIS look like? Is it worthwhile to make the input fuzzy and the output crisp?
• Is it the best approximation method?
Fuzzy rules and data uncertainty
Data has been measured with unknown error. Assume a Gaussian distribution:
G_x = G(y; x, s_x)
x – fuzzy number with Gaussian membership function.
A set of logical rules R is used for fuzzy input vectors: Monte Carlo simulations for an arbitrary system => p(C_i|X).
Analytical evaluation of p(C|X) is based on the cumulant:
∫_a^∞ G(y; x, s_x) dy = (1/2) [1 + erf((x − a)/(√2 s_x))] ≈ σ(β(x − a)), with β = 2.4/(√2 s_x).
The error function is practically identical to the logistic function σ; the difference is < 0.02.
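The cumulant and its logistic approximation are easy to check numerically. A minimal sketch comparing the two curves on a grid (the grid range is my choice):

```python
import math

def gauss_cumulant(x, a, s):
    """P(y > a) for y ~ N(x, s^2): (1/2) * (1 + erf((x - a) / (s * sqrt(2))))."""
    return 0.5 * (1.0 + math.erf((x - a) / (s * math.sqrt(2.0))))

def logistic_approx(x, a, s):
    """Logistic approximation with beta = 2.4 / (sqrt(2) * s)."""
    beta = 2.4 / (math.sqrt(2.0) * s)
    return 1.0 / (1.0 + math.exp(-beta * (x - a)))

# Largest gap between the two curves on x in [-5, 5] (a=0, s=1).
worst = max(abs(gauss_cumulant(x / 10.0, 0.0, 1.0) - logistic_approx(x / 10.0, 0.0, 1.0))
            for x in range(-50, 51))
print(worst)  # stays below the 0.02 bound quoted above
```

So replacing the erf-based cumulant with a sigmoid loses almost nothing, which is what makes the neural-network view of these rules possible.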
Fuzzification of crisp rules
Rule R_a(x) = {x > a} is fulfilled by G_x with probability:
p(R_a | G_x) = ∫_a^∞ G(y; x, s_x) dy ≈ σ(β(x − a))
The error function is approximated by the logistic function; assuming the error distribution σ(x)(1 − σ(x)), for s² = 1.7 it approximates a Gaussian to within 3.5%.
Rule R_ab(x) = {b > x ≥ a} is fulfilled by G_x with probability:
p(R_ab | G_x) = ∫_a^b G(y; x, s_x) dy ≈ σ(β(x − a)) − σ(β(x − b))
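The interval-rule probability above is just a difference of two sigmoids; a minimal sketch, with a, b, s_x chosen arbitrarily for illustration:

```python
import math

def sigma(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def p_interval_rule(x, a, b, s):
    """Probability that rule {a <= x < b} holds for a Gaussian-blurred input:
    sigma(beta*(x - a)) - sigma(beta*(x - b)), beta = 2.4 / (sqrt(2) * s)."""
    beta = 2.4 / (math.sqrt(2.0) * s)
    return sigma(beta * (x - a)) - sigma(beta * (x - b))

print(p_interval_rule(5.0, 0.0, 10.0, 1.0))  # x deep inside [a, b): near 1
print(p_interval_rule(0.0, 0.0, 10.0, 1.0))  # x at the lower edge: near 0.5
```

This is exactly the soft-trapezoid MF of the next slide: crisp interval rules plus Gaussian input noise behave like fuzzy rules with sigmoidal edges.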
Soft trapezoids and NN
The difference of two sigmoids, σ(x − a) − σ(x − b), makes a soft trapezoidal membership function.
Conclusion: fuzzy logic with σ(x − a) − σ(x − b) MFs is equivalent to crisp logic + Gaussian input uncertainty. Gaussian classifiers (RBF) are equivalent to fuzzy systems with Gaussian membership functions.
Optimization of rules
Fuzzy: large receptive fields, rough estimations. G_x – uncertainty of inputs, small receptive fields.
Minimization of the number of errors – difficult, non-gradient; but now Monte Carlo or analytical p(C|X; M) is available. Minimize:
E({X}; R, s_x) = (1/2) Σ_X Σ_i [ p(C_i|X; M) − δ(C(X), C_i) ]²
• Gradient optimization works for a large number of parameters.
• Parameters s_x are known for some features; use them as optimization parameters for the others!
• Probabilities instead of 0/1 rule outcomes.
• Vectors that were not classified by crisp rules now have non-zero probabilities.
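The error function E above can be sketched directly from its definition; the input format (per-vector class-probability lists plus true class indices) is my choice for illustration:

```python
def rule_error(p_matrix, true_classes):
    """E = (1/2) * sum_X sum_i (p(C_i|X; M) - delta(C(X), C_i))^2.
    p_matrix: list of per-class probability lists, one per vector X;
    true_classes: list of true class indices C(X)."""
    E = 0.0
    for probs, c in zip(p_matrix, true_classes):
        for i, p in enumerate(probs):
            target = 1.0 if i == c else 0.0  # delta(C(X), C_i)
            E += 0.5 * (p - target) ** 2
    return E

print(rule_error([[1.0, 0.0]], [0]))  # perfect prediction -> 0.0
print(rule_error([[0.5, 0.5]], [0]))  # maximally uncertain -> 0.25
```

Because the probabilities vary smoothly with the rule parameters and s_x, this E is differentiable, which is what makes gradient optimization applicable where counting 0/1 errors was not.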
Summary
• Fuzzy sets/logic is a useful form of knowledge representation, allowing for approximate but natural expression of some types of knowledge.
• An alternative way is to include uncertainty of the input data while using crisp logic rules.
• Adaptation of fuzzy rule parameters leads to neurofuzzy systems; the simplest are the RBF networks and Separable Function Networks (SFN), equivalent to any fuzzy inference system.
• Results may sometimes be better than with other systems since it is easier to include a priori knowledge in fuzzy systems.
Disclaimer
A few slides/figures were taken from various presentations found on the Internet; unfortunately I cannot identify the original authors at the moment, since these slides went through different iterations; one source seems to be J.-S. Roger Jang from NTHU, Taiwan. I apologize for that.