Chase Methods based on Knowledge Discovery

Agnieszka Dardzinska & Zbigniew W. Ras

X    Faculty-Name   Dept.   Chair
x1   Bob            null    null
x2   John           null    Jones
x3   Mike           null    null
x4   null           EE      null
x5   Tom            EE      null

GIVEN: Incomplete Information System (IIS) and constraints (functional dependencies, ...) which IIS satisfies:

[ Dept → Chair, Chair * Dept → Faculty-Name, …, Dept(x1) = Dept(x2) ]

Algorithm Chase

Tableau System for IIS – information system with null values replaced by variables

X    Faculty Name   Department   Chair
x1   Bob            vd           n1
x2   John           vd           Jones
x3   Mike           n2           n3
x4   vE             EE           n4
x5   Tom            EE           n5

distinguished variables, one for each attribute (if b is an attribute of interest, then vb is the corresponding distinguished variable)

nondistinguished variables (there are countably many of them: n1, n2, n3, …)

Variables in Tableaux System


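Tableau system S:

X    Faculty Name   Department   Chair
x1   Bob            vd           n1
x2   John           vd           Jones
x3   Mike           n2           n3
x4   vE             EE           n4
x5   Tom            EE           n5

After chasing S with the functional dependencies below:

X    Faculty Name   Department   Chair
x1   Bob            vd           Jones
x2   John           vd           Jones
x3   Mike           n2           n3
x4   Tom            EE           n4
x5   Tom            EE           n4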

Functional Dependencies:

[Department → Chair]

[Department *Chair → Faculty Name]

Input: tableaux system S and set of functional dependencies F

Output: tableaux system CHASE_F(S)

Begin
  S1 := S;
  while there are t1, t2 ∈ S1 and (B → b) ∈ F
        such that t1[B] = t2[B] and t1[b] < t2[b]
    do change all the occurrences of the value t2[b] in S1 to t1[b]
  CHASE_F(S) := S1
End

Algorithm Chase

The algorithm always terminates if applied to a finite tableaux system.

If one execution of the algorithm generates a tableaux system satisfying F, then every execution of the algorithm generates the same tableaux system.
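The loop above can be sketched in Python. This is a minimal illustration rather than the authors' implementation. It assumes an ordering in which constants come before distinguished variables, which come before nondistinguished variables n1, n2, ... ordered by index. It also assumes that two distinct constants are never merged, so such a pair is simply left as a violation. The names `rank` and `chase` and the row representation (one dict per tuple) are hypothetical.

```python
import re

def rank(symbol, distinguished):
    """Assumed order: constants < distinguished variables < n1 < n2 < ..."""
    m = re.fullmatch(r"n(\d+)", symbol)
    if m:
        return (2, int(m.group(1)), symbol)
    if symbol in distinguished:
        return (1, 0, symbol)
    return (0, 0, symbol)

def chase(tableau, fds, distinguished=frozenset()):
    """Apply the FDs (lhs attributes, rhs attribute) until nothing changes."""
    rows = [dict(r) for r in tableau]  # work on a copy, like S1 := S
    changed = True
    while changed:
        changed = False
        for lhs, b in fds:
            for t1 in rows:
                for t2 in rows:
                    if any(t1[a] != t2[a] for a in lhs):
                        continue
                    keep, drop = t1[b], t2[b]
                    if rank(keep, distinguished) >= rank(drop, distinguished):
                        continue  # equal values, or the pair is handled in the other order
                    if rank(drop, distinguished)[0] == 0:
                        continue  # assumption: two distinct constants are not merged
                    for r in rows:  # change all occurrences of drop to keep
                        for attr in list(r):
                            if r[attr] == drop:
                                r[attr] = keep
                    changed = True
    return rows

# The Faculty Name / Department / Chair tableau from the slides.
S = [
    {"Faculty Name": "Bob",  "Department": "vd", "Chair": "n1"},
    {"Faculty Name": "John", "Department": "vd", "Chair": "Jones"},
    {"Faculty Name": "Mike", "Department": "n2", "Chair": "n3"},
    {"Faculty Name": "vE",   "Department": "EE", "Chair": "n4"},
    {"Faculty Name": "Tom",  "Department": "EE", "Chair": "n5"},
]
FDS = [(("Department",), "Chair"), (("Department", "Chair"), "Faculty Name")]
result = chase(S, FDS, distinguished={"vd", "vE"})
```

Each replacement removes one symbol from a finite tableau, which is why the loop stops. In this sketch x1 and x2 keep different names although they agree on Department and Chair, because both names are constants.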

Algorithm Chase 1

1. Chase 1 identifies all incomplete attributes in IS (their values are called concepts).

2. Main Algorithm
- Extraction of rules from IS describing these concepts,
- Null values in IS are replaced by values suggested by these rules.

3. These two steps are repeated till a fixpoint is reached.

Chase supported by rules extracted from IIS (Chase 1)

S = (X, A, V)

X = {x1, x2, x3, x4, x5, x6, x7, x8, x9, x10}

A = {b, c, d, e, f, g}

X     b      c      d      e      f      g
x1    b1     c1     null   e2     f1     null
x2    b2     c2     d2     e1     f2     g3
x3    b1     c1     d3     e1     f1     g1
x4    b3     c3     d3     e3     f1     g3
x5    b2     c2     null   e3     f1     g2
x6    null   c1     d2     null   f2     g1
x7    b1     null   d2     e2     f4     g1
x8    null   null   d2     e2     f2     g3
x9    b3     c1     d1     null   f2     null
x10   b2     c1     null   e3     f4     g2

Attribute b

Rules extracted for attribute b:
e2 → b1 (support 2)
c1 * f1 → b1 (support 2)
g2 → b2 (support 2)
c3 → b3 (support 1)
c2 → b2 (support 2)
g3 * d2 → b2 (support 1)
e3 * d3 → b3 (support 1)
f2 * d2 → b2 (support 1)

Example (Chase1)

Attribute b

Two null values in S: b(x6), b(x8).

b(x6): x6 = (c1, d2, f2, g1) matches only the rule f2 * d2 → b2 (support 1), so b(x6) = b2.

b(x8): x8 = (d2, e2, f2, g3) matches the rules e2 → b1 (support 2), g3 * d2 → b2 (support 1) and f2 * d2 → b2 (support 1).

Example (Chase1)

b(x6) = b2

Attribute c

Two null values in S: c(x7), c(x8).

Rules extracted for attribute c:
b1 → c1 (support 2)
e2 → c1 (support 1)
f4 → c1 (support 1)
g1 → c1 (support 2)
b2 * d2 → c2 (support 1)
b2 * e1 → c2 (support 1)
b2 * f2 → c2 (support 1)
b2 * g3 → c2 (support 1)
d2 * e1 → c2 (support 1)
d2 * g3 → c2 (support 1)

c(x7): x7 = (b1, d2, e2, f4, g1) matches b1 → c1, e2 → c1, f4 → c1 and g1 → c1, so c(x7) = c1.

c(x8): x8 = (d2, e2, f2, g3) matches e2 → c1 (support 1) and d2 * g3 → c2 (support 1).

Example (Chase1)

Input: System S = (X, A, V)
       Set of incomplete attributes In(A) = {a1, a2, …, ak}
       Set of rules L(D)

Output: System Chase1(S)

begin
  j := 1;
  while j ≤ k do
  begin
    Sj := S;
    for all v ∈ Vaj do
      while there is x ∈ X and rule (t → v) ∈ L(D) such that x ∈ NSj(t) and card(aj(x)) ≠ 1
      begin
        aj(x) := v;
      end
    j := j + 1
  end
  S := ⋃{Sj : 1 ≤ j ≤ k};
  Chase1(S, In(A), L(D))
end

Algorithm Chase1(S, In(A), L(D))
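One pass of rule-based null replacement can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. Rules are given by hand (the ones listed for attributes b and c in the example) instead of being extracted. A null is filled only when all matching rules agree on one value. The slides do not say how conflicting rules are resolved, so conflicts are left as nulls here. The names `chase1_pass`, `TABLE` and `RULES` are hypothetical.

```python
def chase1_pass(table, rules):
    """Fill null cells (None) for which all matching rules suggest one value."""
    filled = {}
    for obj, row in table.items():
        for attr, value in row.items():
            if value is not None:
                continue
            candidates = {
                v
                for premise, (a, v) in rules
                if a == attr and all(row.get(k) == pv for k, pv in premise.items())
            }
            if len(candidates) == 1:  # assumption: act only on unanimous rules
                filled[(obj, attr)] = candidates.pop()
    for (obj, attr), v in filled.items():
        table[obj][attr] = v
    return filled

# Rows x6, x7, x8 of the example system (None marks a null value).
TABLE = {
    "x6": {"b": None, "c": "c1", "d": "d2", "e": None, "f": "f2", "g": "g1"},
    "x7": {"b": "b1", "c": None, "d": "d2", "e": "e2", "f": "f4", "g": "g1"},
    "x8": {"b": None, "c": None, "d": "d2", "e": "e2", "f": "f2", "g": "g3"},
}
RULES = [
    ({"e": "e2"}, ("b", "b1")), ({"c": "c1", "f": "f1"}, ("b", "b1")),
    ({"g": "g2"}, ("b", "b2")), ({"c": "c3"}, ("b", "b3")),
    ({"c": "c2"}, ("b", "b2")), ({"g": "g3", "d": "d2"}, ("b", "b2")),
    ({"e": "e3", "d": "d3"}, ("b", "b3")), ({"f": "f2", "d": "d2"}, ("b", "b2")),
    ({"b": "b1"}, ("c", "c1")), ({"e": "e2"}, ("c", "c1")),
    ({"f": "f4"}, ("c", "c1")), ({"g": "g1"}, ("c", "c1")),
    ({"b": "b2", "d": "d2"}, ("c", "c2")), ({"b": "b2", "e": "e1"}, ("c", "c2")),
    ({"b": "b2", "f": "f2"}, ("c", "c2")), ({"b": "b2", "g": "g3"}, ("c", "c2")),
    ({"d": "d2", "e": "e1"}, ("c", "c2")), ({"d": "d2", "g": "g3"}, ("c", "c2")),
]

# Repeat until a pass fills nothing (fixpoint). A full Chase 1 would also
# re-extract the rules between passes; that step is omitted in this sketch.
while chase1_pass(TABLE, RULES):
    pass
```

In this run b(x6) becomes b2 and c(x7) becomes c1, as in the example, while both nulls of x8 stay open because the matching rules disagree.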

A1 - "the no. of different attributes used in a query"
A2 - "the percent of null values in a queried IS"
A3 - "the no. of objects returned by QAS when IS-complete"
A4 - "the no. of objects returned by QAS when IS-incomplete" (optimistic interpretation)
A5 - "the no. of objects returned by QAS based on rule-based chase algorithm" (pessimistic interpretation)
A6 - "the no. of bad objects retrieved"
A7 - "the no. of passes of rule-based chase algorithm"

ZOO Database: results for queries q1, q2, q3 (table columns: query, A1, A2, A3, A4, A5, A6, A7).

Rules Discovery from partially Incomplete Information Systems

Information System S = (X, A, V)

X - finite set of objects,
A - finite set of attributes,
V = ⋃{Va : a ∈ A} - set of their values.

Assumption

1. For any a ∈ A, x ∈ X:
   a(x) = {(ai, pi) : i ∈ Ja(x), ai ∈ Va, Σ(i ∈ Ja(x)) pi = 1}

2. For any a ≠ b: Va ∩ Vb = ∅

Data (Incomplete)

X    a                    b                    c                    d      e
x1   (a1,1/3),(a2,2/3)    (b1,2/3),(b2,1/3)    c1                   d1     (e1,1/2),(e2,1/2)
x2   (a2,1/4),(a3,3/4)    (b1,1/3),(b2,2/3)    null                 d2     e1
x3   a1                   b2                   (c1,1/2),(c3,1/2)    d2     e3
x4   a3                   null                 c2                   d1     (e1,2/3),(e2,1/3)
x5   (a1,2/3),(a2,1/3)    b1                   c2                   null   e1
x6   a2                   b2                   c3                   d2     (e2,1/3),(e3,2/3)
x7   a2                   (b1,1/4),(b2,3/4)    (c1,1/3),(c2,2/3)    d2     e2
x8   a3                   b2                   c1                   d1     e3

Example

Goal: Describe e in terms of {a,b,c,d}


a1* = {(x1, 1/3), (x3, 1), (x5, 2/3)}

a2* = {(x1, 2/3), (x2, 1/4), (x5, 1/3), (x6, 1), (x7, 1)}

a3* = {(x2, 3/4), (x4, 1), (x8, 1)}

b1* = {(x1, 2/3), (x2, 1/3), (x4, 1/2), (x5, 1), (x7, 1/4)}

b2* = {(x1, 1/3), (x2, 2/3), (x3, 1), (x4, 1/2), (x6, 1), (x7, 3/4), (x8, 1)}

Algorithm ERID for Extracting Rules from partially Incomplete Information System



c1* = {(x1, 1), (x2, 1/3), (x3, 1/2), (x7, 1/3), (x8, 1)}

c2* = {(x2, 1/3), (x4, 1), (x5, 1), (x7, 2/3)}

c3* = {(x2, 1/3), (x3, 1/2), (x6, 1)}

d1* = {(x1, 1), (x4, 1), (x5, 1/2), (x8, 1)}

d2* = {(x2, 1), (x3, 1), (x5, 1/2), (x6, 1), (x7, 1)}

Algorithm ERID for Extracting Rules from partially Incomplete Information System



For the values of the decision attribute we have:

e1* = {(x1, 1/2), (x2, 1), (x4, 2/3), (x5, 1)}

e2* = {(x1, 1/2), (x4, 1/3), (x6, 1/3), (x7, 1)}

e3* = {(x3, 1), (x6, 2/3), (x8, 1)}

Algorithm ERID

Goal: Describe e in terms of {a,b,c,d}.


2. Check the relationship “ ”

between values of classification

attributes {a,b,c,d} and values

of decision attribute e

Algorithm ERID



Algorithm ERID



Let ci* = {(xi, pi)}i∈N, ej* = {(yj, qj)}j∈N.

We say that the relationship holds between ci* and ej* iff support and confidence of the rule ci → ej are above some threshold values.

How to define support and confidence of a rule ci → ej?

Algorithm ERID

To define support and confidence of the rule a1 → e3 we compute:

a1* = {(x1, 1/3), (x3, 1), (x5, 2/3)}

e3* = {(x3, 1), (x6, 2/3), (x8, 1)}

Support of the rule: sup(a1 → e3) = (1/3) · 0 + 1 · 1 + (2/3) · 0 = 1

Support of the term a1: sup(a1) = 1/3 + 1 + 2/3 = 2

Confidence of the rule: conf(a1 → e3) = sup(a1 → e3) / sup(a1)

Definition of Support and Confidence (by example)
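The computation above can be sketched in Python with exact fractions. This is an illustration, not the ERID implementation. The table `S` is the example system. A null cell is treated as a uniform distribution over the attribute's values, an assumption that reproduces the weights in the sets a1*, b1*, c1*, ... listed earlier. The names `DOMAINS`, `w`, `weights`, `support` and `confidence` are hypothetical.

```python
from fractions import Fraction as Fr

DOMAINS = {"a": ("a1", "a2", "a3"), "b": ("b1", "b2"), "c": ("c1", "c2", "c3"),
           "d": ("d1", "d2"), "e": ("e1", "e2", "e3")}

def w(*pairs):
    """Build a weighted cell from (value, numerator, denominator) triples."""
    return {v: Fr(n, d) for v, n, d in pairs}

S = {
    "x1": {"a": w(("a1", 1, 3), ("a2", 2, 3)), "b": w(("b1", 2, 3), ("b2", 1, 3)),
           "c": "c1", "d": "d1", "e": w(("e1", 1, 2), ("e2", 1, 2))},
    "x2": {"a": w(("a2", 1, 4), ("a3", 3, 4)), "b": w(("b1", 1, 3), ("b2", 2, 3)),
           "c": None, "d": "d2", "e": "e1"},
    "x3": {"a": "a1", "b": "b2", "c": w(("c1", 1, 2), ("c3", 1, 2)),
           "d": "d2", "e": "e3"},
    "x4": {"a": "a3", "b": None, "c": "c2", "d": "d1",
           "e": w(("e1", 2, 3), ("e2", 1, 3))},
    "x5": {"a": w(("a1", 2, 3), ("a2", 1, 3)), "b": "b1", "c": "c2",
           "d": None, "e": "e1"},
    "x6": {"a": "a2", "b": "b2", "c": "c3", "d": "d2",
           "e": w(("e2", 1, 3), ("e3", 2, 3))},
    "x7": {"a": "a2", "b": w(("b1", 1, 4), ("b2", 3, 4)),
           "c": w(("c1", 1, 3), ("c2", 2, 3)), "d": "d2", "e": "e2"},
    "x8": {"a": "a3", "b": "b2", "c": "c1", "d": "d1", "e": "e3"},
}

def weights(cell, attr):
    """Return the value -> weight map of one cell."""
    if cell is None:  # assumption: a null is spread uniformly over the domain
        return {v: Fr(1, len(DOMAINS[attr])) for v in DOMAINS[attr]}
    if isinstance(cell, str):
        return {cell: Fr(1)}
    return cell

def support(table, term):
    """Sum over objects of the product of the weights of the term's values."""
    total = Fr(0)
    for row in table.values():
        p = Fr(1)
        for attr, value in term.items():
            p *= weights(row[attr], attr).get(value, Fr(0))
        total += p
    return total

def confidence(table, premise, decision):
    """sup(premise -> decision) / sup(premise)."""
    return support(table, {**premise, **decision}) / support(table, premise)

sup_rule = support(S, {"a": "a1", "e": "e3"})
conf_rule = confidence(S, {"a": "a1"}, {"e": "e3"})
```

Using `Fraction` keeps values such as 5/6 or 17/12 exact, so comparisons against the thresholds are not affected by rounding.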



Thresholds (provided by user):

Minimal support (λ1 = 1)

Minimal confidence (λ2 = ½)

a1 → e1: sup = 5/6 < 1 - marked negative

a1 → e2: sup = 1/6 < 1 - marked negative

a1 → e3: sup = 1 ≥ 1 and conf = 0.5 - marked positive

Extracting Rules from partially Incomplete Information System(Algorithm ERID(λ1, λ2))
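The marking decision used in the example can be written as a small function. This is a sketch that assumes the three outcomes shown on the slides: negative when support is below λ1, positive when both thresholds are met, and unmarked otherwise (the term is then extended). The name `mark` and its parameters are hypothetical.

```python
from fractions import Fraction as Fr

def mark(sup_rule, sup_term, lam1, lam2):
    """Classify a candidate rule from its support and its premise support."""
    if sup_rule < lam1:
        return "negative"   # too little support, the term is dropped
    if sup_rule / sup_term >= lam2:
        return "positive"   # a rule is constructed from this relation
    return "unmarked"       # enough support, confidence too low: extend the term

LAM1, LAM2 = 1, Fr(1, 2)
marks = {
    "a1 -> e1": mark(Fr(5, 6), Fr(2), LAM1, LAM2),
    "a1 -> e3": mark(Fr(1), Fr(2), LAM1, LAM2),
    "a3 -> e3": mark(Fr(1), Fr(11, 4), LAM1, LAM2),
}
```

Here sup(a3) = 3/4 + 1 + 1 = 11/4 is read from a3*, which gives the confidence 0.36 reported for a3 → e3.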


a3 → e3: sup = 1 ≥ 1 but conf = 0.36 < 0.5
b2 → e1: sup = 7/6 ≥ 1 but conf = 0.22 < 0.5
b2 → e2: sup = 17/12 ≥ 1 but conf = 0.27 < 0.5
c1 → e3: sup = 3/2 ≥ 1 but conf = 0.47 < 0.5
c2 → e2: sup = 1 ≥ 1 but conf = 0.33 < 0.5
d1 → e1: sup = 5/3 ≥ 1 but conf = 0.48 < 0.5
d1 → e3: sup = 1 ≥ 1 but conf = 0.28 < 0.5
d2 → e1: sup = 3/2 ≥ 1 but conf = 0.33 < 0.5
d2 → e3: sup = 5/3 ≥ 1 but conf = 0.37 < 0.5

Algorithm ERID(λ1, λ2)

None of these relations is marked.



(a3 * c1) → e3: sup = 1 ≥ 1 and conf = 0.8 ≥ 0.5
(a3 * d1) → e3: sup = 1 ≥ 1 and conf = 0.5 ≥ 0.5
(a3 * d2) → e3: sup = 0 < 1
(b2 * d2) → e1: sup = 2/3 < 1
(b2 * c2) → e2: sup = 1/2 < 1


(a3 * c1) → e3 and (a3 * d1) → e3 are marked positive.


(a3 * d2) → e3, (b2 * d2) → e1 and (b2 * c2) → e2 are marked negative.



The algorithm continues for terms

of length 3, 4, … till all of them

have either positive or negative

marks.

Rules are automatically constructed

from relations marked positive.


Algorithm Chase 2(for Partially IIS)

S = (X, A, V) - partially incomplete information system of type λ, if S is incomplete and the following three conditions hold:

- aS(x) is defined for any x ∈ X, a ∈ A,
- (∀x ∈ X)(∀a ∈ A)[(aS(x) = {(ai, pi) : 1 ≤ i ≤ m}) → Σ pi = 1],
- (∀x ∈ X)(∀a ∈ A)[(aS(x) = {(ai, pi) : 1 ≤ i ≤ m}) → (∀i)(pi ≥ λ)].

Algorithm Chase 2

S1, S2 - partially incomplete, both of type λ and both classifying the same sets of objects (from X) using the same sets of attributes (A).

Let aS1(x) = {(a1i, p1i) : 1 ≤ i ≤ m1} and aS2(x) = {(a2i, p2i) : 1 ≤ i ≤ m2}.

The pair (S1, S2) satisfies containment relation Ψ (or Ψ(S1) = S2) if:

- (∀x ∈ X)(∀a ∈ A)[card(aS1(x)) ≥ card(aS2(x))],
- (∀x ∈ X)(∀a ∈ A)[[card(aS1(x)) = card(aS2(x))] → [Σ(i≠j) |p2i − p2j| > Σ(i≠j) |p1i − p1j|]].

We also denote that fact by (∀x ∈ X)(∀a ∈ A)[Ψ(aS1(x)) = aS2(x)].

Algorithm Chase 2

X    a                    b                    c                    d      e
x1   (a1,1/3),(a2,2/3)    (b1,2/3),(b2,1/3)    c1                   d1     (e1,1/2),(e2,1/2)
x2   (a2,1/4),(a3,3/4)    (b1,1/3),(b2,2/3)    null                 d2     e1
x3   a1                   b2                   (c1,1/2),(c3,1/2)    d2     e3
x4   a3                   null                 c2                   d1     (e1,2/3),(e2,1/3)
x5   (a1,2/3),(a2,1/3)    b1                   c2                   null   e1
x6   a2                   b2                   c3                   d2     (e2,1/3),(e3,2/3)
x7   a2                   (b1,1/4),(b2,3/4)    (c1,1/3),(c2,2/3)    d2     e2
x8   a3                   b2                   c1                   d1     e3

System S1 System S2

x8

c2x7

x6

x5

x4

x3

x2

x1

edcbaX

),( 32

2a

1e

1c1d

),( 21

1e),( 2

12e

),( 31

1a ),( 32

1b),( 3

12b

),( 41

2a),( 4

33a 2d

1a 2b ),( 31

1c),( 3

23c 2d

3e

3a 2c1d

),( 32

1a),( 3

12a 1b 2c

1e

2a 2b 3c 2d),( 3

12e

),( 32

3e

2a ),( 41

1b),( 4

32b 2d 2e

2b 1c1d 3e),( 3

21a

),( 31

2a

2b),( 3

11c

),( 32

2c

2e

Assumptions:

S = (X, A, V) - information system of type λ

L(D) = {(t → vc) ∈ D : c ∈ In(A)} - set of all pairwise independent rules extracted by ERID from S

NS(t) - standard interpretation of term t in S, meaning that:

NS(v) = {(x, p) : (v, p) ∈ a(x)}, for any v ∈ Va

NS(t1 + t2) = NS(t1) ⊕ NS(t2)

NS(t1 * t2) = NS(t1) ⊗ NS(t2)

where for any NS(t1) = {(xi, pi)}i∈I, NS(t2) = {(xj, qj)}j∈J we have:

NS(t1) ⊗ NS(t2) = {(xi, pi · qi)}i∈(I∩J)

NS(t1) ⊕ NS(t2) = {(xj, qj)}j∈(J\I) ∪ {(xi, pi)}i∈(I\J) ∪ {(xi, max(pi, qi))}i∈(I∩J)
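The two operations on interpretations can be sketched over dictionaries that map an object to its weight. This is an illustration only. The function names `n_and` (for ⊗) and `n_or` (for ⊕) are hypothetical, and the sample sets are a3* and c1* from the ERID example.

```python
from fractions import Fraction as Fr

def n_and(n1, n2):
    """N(t1 * t2): objects present in both sets, weights multiplied."""
    return {x: p * n2[x] for x, p in n1.items() if x in n2}

def n_or(n1, n2):
    """N(t1 + t2): objects present in either set, max weight where both occur."""
    out = dict(n1)
    for x, q in n2.items():
        out[x] = max(out[x], q) if x in out else q
    return out

a3 = {"x2": Fr(3, 4), "x4": Fr(1), "x8": Fr(1)}
c1 = {"x1": Fr(1), "x2": Fr(1, 3), "x3": Fr(1, 2), "x7": Fr(1, 3), "x8": Fr(1)}

a3_and_c1 = n_and(a3, c1)
a3_or_c1 = n_or(a3, c1)
```

The weights of `a3_and_c1` add up to 5/4, which is the premise support behind conf((a3 * c1) → e3) = 1 / (5/4) = 0.8.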

In(A) = {a1, … , ak} - incomplete attributes in S

..................

bj(x) := ∅; nj := 0;
for all v ∈ Vaj do
begin
  if card(aj(x)) ≠ 1 and {(ti → v) : i ∈ I} is a maximal subset of rules from L(D) such that (x, pi) ∈ NS(ti)
  then if Σ(i∈I) [pi · conf(ti → v) · sup(ti → v)] ≥ λ then
  begin
    bj(x) := bj(x) ∪ {(v, Σ(i∈I) [pi · conf(ti → v) · sup(ti → v)])};
    nj := nj + Σ(i∈I) [pi · conf(ti → v) · sup(ti → v)]
  end
end
pj := pj + nj;
end
if Ψ(aj(x)) = [bj(x)/pj] /containment relation holds between aj(x), [bj(x)/pj]/
then aj(x) := [bj(x)/pj]

..................

Algorithm Chase 2 (S, In(A), L(D))


Incomplete Information System S of type λ = 0.3

ERID(λ1, λ2) with λ1 = 1, λ2 = 0.5:

r1 = [a1 → e3], sup(r1) = 1, conf(r1) = 0.5
r2 = [a2 → e2], sup(r2) = 5/3, conf(r2) = 0.51
r4 = [b1 → e1], sup(r4) = 2, conf(r4) = 0.72
r5 = [b2 → e3], sup(r5) = 8/3, conf(r5) = 0.51
r10 = [c1 * d1 → e3], sup(r10) = 1, conf(r10) = 0.5


Algorithm Chase 2 will try to replace

e(x1) = {(e1, 1/2), (e2, 1/2)}

by enew(x1) = {(e1, ), (e2, ), (e3, )}.

We will show that Ψ(e(x1)) = enew(x1) (the value e(x1) will be changed by Chase 2).


For x1:

(e3, [(1/3) · 1 · (1/2) + (1/3) · (8/3) · (51/100) + 1 · 1 · (1/2)] / (14/3)) = (e3, 0.24)

(e2, 0.97)

(e1, [(2/3) · 2 · (72/100)] / 2) = (e1, 0.48)

we have: e(x1) = {(e1, 0.48), (e2, 0.97), (e3, 0.24)}

Because the confidence assigned to e3 is below the threshold λ, only two values remain: (e1, 0.48), (e2, 0.97).

The value of attribute e assigned to x1 is: {(e1, 0.33), (e2, 0.67)}.

Incomplete Information System S of type λ = 0.3
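The last step of the example, dropping values below λ and rescaling the rest so that they sum to 1, can be sketched as follows. This is an illustration of that single step, not of the whole Chase 2 algorithm. The name `prune_and_normalize` is hypothetical, and the input weights are the ones stated for x1.

```python
def prune_and_normalize(weighted, lam):
    """Keep values with weight >= lam, then rescale the kept weights to sum to 1."""
    kept = {v: p for v, p in weighted.items() if p >= lam}
    total = sum(kept.values())
    return {v: p / total for v, p in kept.items()} if total else {}

e_x1 = prune_and_normalize({"e1": 0.48, "e2": 0.97, "e3": 0.24}, lam=0.3)
rounded = {v: round(p, 2) for v, p in e_x1.items()}
```

With λ = 0.3 the value e3 is removed and the remaining weights are divided by 0.48 + 0.97 = 1.45.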

Distributed Chase Algorithm (Chase 3)

Let:

DS = ({Si}i∈I, L) - distributed autonomous information systems

Si = (Xi, Ai, Vi) - information system for any i ∈ I (I - set of sites)

Di = ⋃{Dk,i : k ∈ I} - knowledge-base at site i ∈ I

Dk,i - set of (k, i)-rules (constructed at site k and sent to site i)

Strategy for constructing knowledge-base Di = ⋃{Dk,i : k ∈ I} and Algorithm Chase 3

Notation:

qS=[a, b, c : d, e] - request by S for definitions of a, b, c

with additional information that d, e

are complete attributes in S.

Global Ontology

System S2:

g    a      b      c
g1   null   b2     null
g1   a2     b1     c2
g1   a2     null   c1
g1   a1     b1     c1

System S1:

b      a      d    e
null   a1     d2   null
b2     a2     d2   e2
b1     a2     d1   e1
null   null   d1   null

System S:

a      b    c      d
a1     b2   null   null
null   b1   c2     null
a2     b2   null   d2
a2     b1   c1     null

qS = [a, c, d : b]

rule            support   system
b1 → a2         1         S
b2 * d2 → a2    1         S
b2 → a2         1         S1
c1 * b1 → a1    1         S2

KBS: r1, r2

Assumption: Si = (Xi, Ai, Vi).

Di - the granularity level of values of attributes used in rules from Di may differ from the granularity level of values of attributes used in descriptions of objects in Si.

For the Chase 3 algorithm to be applicable to Si, it has to be based on rules from Di satisfying the following two conditions:

- the attribute value used in the decision part of a rule has a granularity level either equal to or finer than the granularity level of the corresponding attribute in Si,
- the granularity level of any attribute used in the classification part of a rule is either equal to or coarser than the granularity level of the corresponding attribute in Si.

Hierarchical attributes: age, salary

Rule in Di: (age, young) → (salary, 40k)

age:
- young: 18 … 29
- middle-aged: 30 … 60
- old: 61 … 80

salary:
- low: 10k … 40k
- medium: 50k, 60k, 70k
- high: 80k … 100k

Example

Assumption: tuple t in Si supports rule (s → d) ∈ Di(Si).

Two cases:

1. An overlapping attribute between the rule and the tuple is the decision attribute in s → d.
If the two attributes involved in that match have different granularities, then the decision value d has to be replaced by a coarser value whose granularity matches the granularity of the corresponding attribute in Si.

2. An overlapping attribute between the rule and the tuple is a classification attribute in s → d.
If the two attributes involved in that match have different granularities, then the value of attribute a has to be replaced by a finer value whose granularity matches the granularity of a in Si.

Algorithm Chase 3 (Construction of new Di followed by Chase 2)
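The two adjustments can be sketched on the age and salary hierarchies of the example. This is an illustration under assumptions: only the hierarchy values shown explicitly on the slide are included, ages are taken to be whole years, and the names `AGE_RANGES`, `SALARY_LEVEL`, `coarsen_salary` and `refine_age` are hypothetical.

```python
# Age labels with their ranges, as listed in the example hierarchy.
AGE_RANGES = {"young": (18, 29), "middle-aged": (30, 60), "old": (61, 80)}

# Salary values that appear explicitly in the example hierarchy.
SALARY_LEVEL = {"10k": "low", "40k": "low",
                "50k": "medium", "60k": "medium", "70k": "medium",
                "80k": "high", "100k": "high"}

def coarsen_salary(value):
    """Case 1: replace a fine decision value by its coarser label."""
    return SALARY_LEVEL[value]

def refine_age(label):
    """Case 2: replace a coarse classification value by the finer values it covers."""
    low, high = AGE_RANGES[label]
    return list(range(low, high + 1))

# Rule in Di: (age, young) -> (salary, 40k)
decision_for_coarse_site = coarsen_salary("40k")
premise_for_fine_site = refine_age("young")
```

A site that stores salary only as low, medium or high would use the coarsened decision, while a site that stores exact ages would match the rule against any of the refined premise values.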

Chase 4 (All Information Systems are equally involved in chase)

Systems and their attributes:
- S1: a, b, c, d
- S2: b, a, d, e
- S3: g, a, b, c

Queries:

qS1 = [a, c, d : b]

qS2 = [b, a, e : d]

qS3 = [a, b, c : g]

Rules:
- r1, r2 - extracted from S2
- r3, r4 - extracted from S3
- r5, r6 - extracted from S1

Knowledge bases:
- KB at S1: r1, r2, r3, r4
- KB at S2: r3, r4, r5, r6
- KB at S3: r1, r2, r5, r6
