Data mining II: The fuzzy way
Włodzisław Duch
Dept. of Informatics, Nicholas Copernicus University, Toruń, Poland
http://www.phys.uni.torun.pl/~duch
ISEP Porto, 8-12 July 2002
Basic ideas
• Complex problems cannot be analyzed precisely.
• Knowledge of an expert may be approximated using imprecise concepts.
If the weather is nice and the place is attractive then not many participants stay at the school.
Fuzzy logic/systems include:
• Mathematics of fuzzy sets/systems, fuzzy logics.
• Fuzzy knowledge representation for clusterization, classification and regression.
• Extraction of fuzzy concepts and rules from data.
• Fuzzy control theory.
Types of uncertainty
• Stochastic uncertainty: rolling dice, accidents, insurance risk – probability theory.
• Measurement uncertainty: about 3 cm; about 20 degrees – statistics.
• Information uncertainty: trustworthy client, known constraints – data mining.
• Linguistic uncertainty: small, fast, low price – fuzzy logic.
Crisp sets
Membership function χ_young(x):
young = { x ∈ M | age(x) ≤ 20 }
χ_young(x) = 1 if age(x) ≤ 20; 0 if age(x) > 20.
[Figure: the MF of the crisp set A = "young" as a step function of x [years], dropping from 1 to 0 at age 20.]
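The crisp/fuzzy contrast above can be sketched in a few lines of Python. The step function mirrors the slide's "young" set; the triangular tail of the fuzzy variant (membership fading to 0 at age 30) is an illustrative assumption, not a value from the slide.

```python
def crisp_young(age):
    # Crisp set: characteristic function jumps from 1 to 0 at age 20.
    return 1.0 if age <= 20 else 0.0

def fuzzy_young(age):
    # Fuzzy variant (illustrative): full membership up to 20,
    # linearly decreasing to 0 at age 30.
    if age <= 20:
        return 1.0
    if age >= 30:
        return 0.0
    return (30 - age) / 10.0

print(crisp_young(19), crisp_young(21))  # 1.0 0.0
print(fuzzy_young(25))                   # 0.5
```

The crisp set answers only yes/no; the fuzzy set grades the boundary, which is exactly what the MF adds.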
Fuzzy sets
X – universum, space; x ∈ X.
A – linguistic variable, concept, fuzzy set.
μ_A – a Membership Function (MF), determining the degree to which x belongs to A.
Linguistic variables, concepts – sums of fuzzy sets.
Logical predicate functions with continuous values.
Membership value: different from probability.
μ(bold) = 0.8 does not mean bold 1 in 5 cases.
Probabilities are normalized to 1, MFs are not.
Fuzzy concepts are subjective and context-dependent.
MFs are usually convex, with a single maximum.
MFs for similar numbers overlap.
Numbers: core = point x with μ(x) = 1.
MFs decrease monotonically on both sides of the core.
Typically: triangular functions (a, b, c) or singletons.
Example: the fuzzy number 7 as a discrete fuzzy set over {5, 6, 7, 8, 9}:
F = 1/3 / 5 + 2/3 / 6 + 1 / 7 + 2/3 / 8 + 1/3 / 9
General notation:
discrete: A = Σ_{x_i ∈ X} μ_A(x_i) / x_i
continuous: A = ∫_X μ_A(x) / x
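A discrete fuzzy set like the one above is naturally a mapping from elements to membership degrees. A minimal sketch, using the fuzzy number 7 from the slide (the helper name `mu` is mine):

```python
# Discrete fuzzy set for "about 7" over the universe {5, ..., 9},
# written as element -> membership degree.
about_7 = {5: 1 / 3, 6: 2 / 3, 7: 1.0, 8: 2 / 3, 9: 1 / 3}

def mu(fuzzy_set, x):
    """Membership degree of x in a discrete fuzzy set (0 outside the support)."""
    return fuzzy_set.get(x, 0.0)

print(mu(about_7, 7))   # 1.0 (the core)
print(mu(about_7, 6))   # 2/3
print(mu(about_7, 12))  # 0.0
```

Note the degrees sum to more than 1: unlike probabilities, MFs are not normalized.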
Fuzzy rules
Commonsense knowledge may sometimes be captured in a natural way using fuzzy rules.
What does it mean for fuzzy rules:
IF x is A THEN y is B?
Fuzzy implication
If => means correlation, a T-norm T(μ_A, μ_B) is sufficient.
A=>B has many realizations.
Interpretation of implication
If x is A then y is B: correlation or implication.
A=>B ≡ not-A or B (A entails B)
A=>B ≡ A and B (correlation)
[Figure: the two interpretations shown as regions A, B in the (x, y) plane.]
Types of rules
Mamdani type: IF MF_A(x)=high THEN MF_B(y)=medium.
Takagi-Sugeno type: IF MF_A(x)=high THEN y=f_A(x).
Linear f_A(x) – first-order Sugeno type.
FIR, Fuzzy Implication Rules: logic of implications between fuzzy facts.
FIS, Fuzzy Inference Systems: combine fuzzy rules to calculate final decisions.
Fuzzy systems F: ℝ^n → ℝ^p use m rules to map the vector x on the output F(x), vector or scalar.
3. Inference
Calculate the degree of truth of the rule conclusion: use T-norms such as MIN or product to combine the degree of fulfillment of the conditions with the MF of the conclusion.
Example: for the rule conclusion THEN Heating=full, with degree of fulfillment of the conditions cond = 0.3:
Inference MIN: μ_concl(h) = min{cond, μ_full(h)}
Inference PROD: μ_concl(h) = cond · μ_full(h)
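The two T-norms are one-liners; a minimal sketch, where the triangular "full" MF over the heating level h is an assumed shape for illustration:

```python
def tri(x, a, b, c):
    # Triangular MF with support [a, c] and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

cond = 0.3  # degree of fulfillment of the rule conditions
h = 0.8     # heating level being evaluated
full = tri(h, 0.5, 1.0, 1.5)  # "full" MF, peak at h = 1 (hypothetical shape)

print(min(cond, full))  # MIN inference: conclusion MF clipped at cond
print(cond * full)      # PROD inference: conclusion MF scaled by cond
```

MIN clips the conclusion MF flat at the firing strength; PROD keeps its shape but rescales it.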
4. Aggregation
Aggregate all rule conclusions (THEN Heating=full, THEN Heating=medium, THEN Heating=no) using the MAX operator to form the combined output MF over h.
5. Defuzzification
Calculate a crisp value/decision using, for example, the "Center of Gravity" (COG) method over the aggregated μ_concl(h).
For discrete sets a "center of singletons"; for continuous MFs:
h = Σ_i μ_i · A_i · c_i / Σ_i μ_i · A_i
where μ_i = degree of membership in set i, A_i = area under the MF for set i, c_i = center of gravity for set i.
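On a sampled output MF the COG reduces to a weighted average; a minimal sketch (the sample values are made up for illustration):

```python
def cog(mu_samples, h_samples):
    """Center-of-gravity defuzzification on a sampled MF:
    h* = sum(mu_i * h_i) / sum(mu_i)."""
    num = sum(m * h for m, h in zip(mu_samples, h_samples))
    den = sum(mu_samples)
    return num / den if den > 0 else 0.0

# Symmetric aggregated MF centered on h = 2 -> COG is exactly 2.
hs = [0, 1, 2, 3, 4]
mus = [0.0, 0.5, 1.0, 0.5, 0.0]
print(cog(mus, hs))  # 2.0
```

The "center of singletons" variant is the same formula with each μ_i attached to a single point instead of an area.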
FIS for heating
Fuzzification → Inference → Defuzzification
The measured temperature T is fuzzified by the MFs freeze, cold, warm, e.g. μ_freeze = 0.7, μ_cold = 0.2, μ_hot = 0.0.
Rule base:
if temp=freezing then valve=open
if temp=cold then valve=half open
if temp=warm then valve=closed
The output MFs (full, half, closed) over the valve position v, weighted by the degrees 0.7 and 0.2, are aggregated and defuzzified into the output that controls the valve position.
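A toy end-to-end version of this heating controller can be sketched as follows. All MF shapes, breakpoints, and the singleton valve positions are my assumptions for illustration, not the values from the slide; defuzzification uses the center-of-singletons shortcut.

```python
def fis_heating(temp):
    """Toy heating FIS: fuzzify temp (deg C), fire three rules,
    defuzzify by center of singletons. All shapes are assumed."""
    # Fuzzification: degrees of freezing / cold / warm.
    freezing = max(0.0, min(1.0, (5.0 - temp) / 10.0))
    cold = max(0.0, 1.0 - abs(temp - 10.0) / 10.0)
    warm = max(0.0, min(1.0, (temp - 15.0) / 10.0))
    # Rule base -> singleton valve positions: open=1.0, half=0.5, closed=0.0.
    rules = [(freezing, 1.0), (cold, 0.5), (warm, 0.0)]
    num = sum(mu * v for mu, v in rules)
    den = sum(mu for mu, _ in rules)
    return num / den if den > 0 else 0.0

print(fis_heating(0.0))   # only "freezing" fires -> valve fully open
print(fis_heating(20.0))  # only "warm" fires -> valve closed
```

The same three-stage pipeline (fuzzify, infer, defuzzify) scales to any number of inputs and rules.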
Takagi-Sugeno rules
Mamdani rules conclude that: IF X1=A1 and X2=A2 … Xn=An THEN Y=B
TS rules conclude some functional dependence f(x_i): IF X1=A1 and X2=A2 … Xn=An THEN Y=f(x1, x2, …, xn)
TS rules are usually based on piecewise linear functions (equivalent to linear splines approximation): IF X1=A1 and X2=A2 … Xn=An THEN Y=a0 + a1·x1 + … + an·xn
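A first-order TS system outputs a firing-strength-weighted average of the rules' linear models. A minimal two-rule sketch; the MFs and all coefficients a0, a1, a2 are illustrative assumptions:

```python
def ts_first_order(x1, x2):
    """First-order Takagi-Sugeno sketch with two rules.
    Output = weighted average of the rules' linear consequents."""
    # Rule firing strengths from simple ramp MFs of x1 (assumed shapes).
    w_low = max(0.0, min(1.0, (5.0 - x1) / 5.0))   # "x1 is low"
    w_high = max(0.0, min(1.0, x1 / 5.0))          # "x1 is high"
    # Linear consequents: y = a0 + a1*x1 + a2*x2 (assumed coefficients).
    y_low = 1.0 + 0.1 * x1 + 0.2 * x2
    y_high = 3.0 + 0.5 * x1 - 0.1 * x2
    den = w_low + w_high
    return (w_low * y_low + w_high * y_high) / den if den > 0 else 0.0

print(ts_first_order(0.0, 5.0))  # only the "low" rule fires
print(ts_first_order(5.0, 0.0))  # only the "high" rule fires
```

Between the MF cores the output blends the two linear models, which is why TS systems act as smooth piecewise-linear approximators.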
Induction of fuzzy rules
All this may be presented in the form of networks.
Choices/adaptive parameters in fuzzy rules:
• The number of rules (nodes).
• The number of terms for each attribute.
• Position of the membership function (MF).
• MF shape for each attribute/term.
• Type of rules (conclusions).
• Type of inference and composition operators.
• Induction algorithms: incremental or refinement.
• Type of learning procedure.
Feature space partition
Regular grid vs. independent functions.
MFs on a grid
• Advantage: simplest approach.
• Regular grid: divide each dimension into a fixed number of MFs and assign an average value from all samples that belong to the region.
• Irregular grid: find the largest error, divide the grid there into two parts, adding a new MF.
• Mixed method: start from a regular grid, adapt parameters later.
• Disadvantages: for k dimensions and N MFs in each, N^k areas are created! Poor quality of approximation.
Optimized MFs
• Advantages: higher accuracy, better approximation, fewer MFs needed.
• Optimized MFs may come from:
Neurofuzzy systems – equivalent to an RBF network with Gaussian functions (several proofs); FSM models with triangular or trapezoidal functions; modified MLP networks with bicentral functions, etc.
• Disadvantages: extraction of rules is hard, optimized MFs are more difficult to create.
Improving sets of rules
• How to improve known sets of rules?
• Use minimization methods to improve parameters of fuzzy rules: usually non-gradient methods are used, most often genetic algorithms.
• Change rules into a neural network, train the network and convert it into rules again.
• Use heuristic methods for local adaptation of parameters of individual rules.
• Fuzzy logic – good for modeling imprecise knowledge, but…
• What do the decision borders of a FIS look like? Is it worthwhile to make the input fuzzy and the output crisp?
• Is it the best approximation method?
Fuzzy rules and data uncertainty
Data has been measured with unknown error. Assume a Gaussian distribution:
G_x = G(y; x, s_x)
x – fuzzy number with Gaussian membership function.
A set of logical rules R is used for fuzzy input vectors: Monte Carlo simulations for an arbitrary system => p(C_i|X).
Analytical evaluation of p(C|X) is based on the cumulant:
∫_a^∞ G(y; x, s_x) dy = (1/2) [1 + erf((x − a)/(√2 s_x))] ≈ σ(β(x − a)), with β = 2.4/(√2 s_x).
The error function is practically identical to the logistic function σ; the difference is < 0.02.
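The cumulant and its logistic approximation are easy to check numerically. A minimal sketch comparing the two curves on a grid (the grid range is my choice):

```python
import math

def gauss_cumulant(x, a, s):
    """P(y > a) for y ~ N(x, s^2): (1/2) * (1 + erf((x - a) / (s * sqrt(2))))."""
    return 0.5 * (1.0 + math.erf((x - a) / (s * math.sqrt(2.0))))

def logistic_approx(x, a, s):
    """Logistic approximation with beta = 2.4 / (sqrt(2) * s)."""
    beta = 2.4 / (math.sqrt(2.0) * s)
    return 1.0 / (1.0 + math.exp(-beta * (x - a)))

# Largest gap between the two curves on x in [-5, 5] (a=0, s=1).
worst = max(abs(gauss_cumulant(x / 10.0, 0.0, 1.0) - logistic_approx(x / 10.0, 0.0, 1.0))
            for x in range(-50, 51))
print(worst)  # stays below the 0.02 bound quoted above
```

So replacing the erf-based cumulant with a sigmoid loses almost nothing, which is what makes the neural-network view of these rules possible.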
Fuzzification of crisp rules
Rule R_a(x) = {x > a} is fulfilled by G_x with probability:
p(R_a | G_x) = ∫_a^∞ G(y; x, s_x) dy ≈ σ(β(x − a))
The error function is approximated by the logistic function; assuming the error distribution σ(x)(1 − σ(x)), for s² = 1.7 it approximates a Gaussian to within 3.5%.
Rule R_ab(x) = {b > x ≥ a} is fulfilled by G_x with probability:
p(R_ab | G_x) = ∫_a^b G(y; x, s_x) dy ≈ σ(β(x − a)) − σ(β(x − b))
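The interval-rule probability above is just a difference of two sigmoids; a minimal sketch, with a, b, s_x chosen arbitrarily for illustration:

```python
import math

def sigma(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def p_interval_rule(x, a, b, s):
    """Probability that rule {a <= x < b} holds for a Gaussian-blurred input:
    sigma(beta*(x - a)) - sigma(beta*(x - b)), beta = 2.4 / (sqrt(2) * s)."""
    beta = 2.4 / (math.sqrt(2.0) * s)
    return sigma(beta * (x - a)) - sigma(beta * (x - b))

print(p_interval_rule(5.0, 0.0, 10.0, 1.0))  # x deep inside [a, b): near 1
print(p_interval_rule(0.0, 0.0, 10.0, 1.0))  # x at the lower edge: near 0.5
```

This is exactly the soft-trapezoid MF of the next slide: crisp interval rules plus Gaussian input noise behave like fuzzy rules with sigmoidal edges.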
Soft trapezoids and NN
The difference of two sigmoids, σ(x − a) − σ(x − b), makes a soft trapezoidal membership function.
Conclusion: fuzzy logic with σ(x − a) − σ(x − b) MFs is equivalent to crisp logic + Gaussian input uncertainty. Gaussian classifiers (RBF) are equivalent to fuzzy systems with Gaussian membership functions.
Optimization of rules
Fuzzy: large receptive fields, rough estimations. G_x – uncertainty of inputs, small receptive fields.
Minimization of the number of errors – difficult, non-gradient; but now Monte Carlo or analytical p(C|X; M) is available. Minimize:
E({X}; R, s_x) = (1/2) Σ_X Σ_i [ p(C_i|X; M) − δ(C(X), C_i) ]²
• Gradient optimization works for a large number of parameters.
• Parameters s_x are known for some features; use them as optimization parameters for the others!
• Probabilities instead of 0/1 rule outcomes.
• Vectors that were not classified by crisp rules now have non-zero probabilities.
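The error function E above can be sketched directly from its definition; the input format (per-vector class-probability lists plus true class indices) is my choice for illustration:

```python
def rule_error(p_matrix, true_classes):
    """E = (1/2) * sum_X sum_i (p(C_i|X; M) - delta(C(X), C_i))^2.
    p_matrix: list of per-class probability lists, one per vector X;
    true_classes: list of true class indices C(X)."""
    E = 0.0
    for probs, c in zip(p_matrix, true_classes):
        for i, p in enumerate(probs):
            target = 1.0 if i == c else 0.0  # delta(C(X), C_i)
            E += 0.5 * (p - target) ** 2
    return E

print(rule_error([[1.0, 0.0]], [0]))  # perfect prediction -> 0.0
print(rule_error([[0.5, 0.5]], [0]))  # maximally uncertain -> 0.25
```

Because the probabilities vary smoothly with the rule parameters and s_x, this E is differentiable, which is what makes gradient optimization applicable where counting 0/1 errors was not.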
Summary
• Fuzzy sets/logic is a useful form of knowledge representation, allowing for approximate but natural expression of some types of knowledge.
• An alternative way is to include uncertainty of the input data while using crisp logic rules.
• Adaptation of fuzzy rule parameters leads to neurofuzzy systems; the simplest are the RBF networks and Separable Function Networks (SFN), equivalent to any fuzzy inference system.
• Results may sometimes be better than with other systems since it is easier to include a priori knowledge in fuzzy systems.
Disclaimer
A few slides/figures were taken from various presentations found on the Internet; unfortunately I cannot identify the original authors at the moment, since these slides went through different iterations; one source seems to be J.-S. Roger Jang from NTHU, Taiwan. I apologize for that.