Chase Methods based on Knowledge Discovery Agnieszka Dardzinska & Zbigniew W. Ras [email protected] & [email protected]
Jan 21, 2016
X    Faculty-Name   Dept.   Chair
x1   Bob            null    null
x2   John           null    Jones
x3   Mike           null    null
x4   null           EE      null
x5   Tom            EE      null
GIVEN: an Incomplete Information System (IIS) and the constraints (functional dependencies, ...) which the IIS satisfies:
[Dept → Chair, Chair * Dept → Faculty-Name, …, Dept(x1) = Dept(x2)]
Algorithm Chase
Tableau System for IIS – information system with null values replaced by variables
X Faculty Name Department Chair
x1 Bob vd n1
x2 John vd Jones
x3 Mike n2 n3
x4 vE EE n4
x5 Tom EE n5
distinguished variables, one for each attribute (if b is an attribute
of interest, then vb is the corresponding distinguished variable)
nondistinguished variables (there are countably many of them:
n1, n2, n3, ….)
Variables in Tableaux System
X Faculty Name Department Chair
x1 Bob vd n1
x2 John vd Jones
x3 Mike n2 n3
x4 vE EE n4
x5 Tom EE n5
X Faculty Name Department Chair
x1 Bob vd Jones
x2 John vd Jones
x3 Mike n2 n3
x4 Tom EE n4
x5 Tom EE n4
Functional Dependencies:
[Department → Chair]
[Department *Chair → Faculty Name]
Algorithm Chase
Input: tableaux system S and set of functional dependencies F
Output: tableaux system CHASEF(S)
Begin
  S1 := S;
  while there are t1, t2 ∈ S1 and (B → b) ∈ F
        such that t1[B] = t2[B] and t1[b] < t2[b]
  do change all the occurrences of the value t2[b] in S1 to t1[b];
  CHASEF(S) := S1
End
The algorithm always terminates if applied to a finite tableaux system.
If one execution of the algorithm generates a tableaux system satisfying F, then every execution of the algorithm generates the same tableaux system.
Algorithm Chase
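The loop above can be sketched in Python on the Faculty/Department/Chair tableau. This is only an illustrative sketch under stated assumptions: variables are encoded as tuples (("v", "d") for a distinguished variable, ("n", 1) for a nondistinguished one), constants are plain strings, and the ordering "<" is assumed to rank constants before distinguished variables before nondistinguished variables. The names is_var, rank and chase are hypothetical, and the choice to leave two different constants untouched is an assumption made here, not something the slides state.

```python
def n(i):
    """Nondistinguished variable n_i (hypothetical encoding)."""
    return ("n", i)

def is_var(value):
    """Variables are tuples; constants are plain strings."""
    return isinstance(value, tuple)

def rank(value):
    """Assumed ordering: constant < distinguished variable < n1 < n2 < ..."""
    if not is_var(value):
        return (0, 0)
    kind, index = value
    return (1, 0) if kind == "v" else (2, index)

def chase(rows, fds):
    """Apply the functional dependencies until no value can be changed."""
    rows = [dict(r) for r in rows]                  # S1 := S (work on a copy)
    changed = True
    while changed:
        changed = False
        for lhs, b in fds:
            for t1 in rows:
                for t2 in rows:
                    if t1[b] == t2[b] or any(t1[a] != t2[a] for a in lhs):
                        continue
                    low, high = sorted((t1[b], t2[b]), key=rank)
                    if not is_var(high):
                        continue                    # two different constants: skipped
                    for r in rows:                  # replace every occurrence of high
                        for a in r:
                            if r[a] == high:
                                r[a] = low
                    changed = True
    return rows

V_D, V_E = ("v", "d"), ("v", "E")
S = [
    {"Faculty": "Bob",  "Dept": V_D,  "Chair": n(1)},
    {"Faculty": "John", "Dept": V_D,  "Chair": "Jones"},
    {"Faculty": "Mike", "Dept": n(2), "Chair": n(3)},
    {"Faculty": V_E,    "Dept": "EE", "Chair": n(4)},
    {"Faculty": "Tom",  "Dept": "EE", "Chair": n(5)},
]
F = [(("Dept",), "Chair"), (("Dept", "Chair"), "Faculty")]
result = chase(S, F)
for row in result:
    print(row)
```

Under these assumptions n1 becomes Jones, n5 becomes n4 and vE becomes Tom, while x3 keeps its variables.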
Algorithm Chase 1
1. Chase 1 identifies all incomplete attributes in IS (their values are called concepts).
2. Main Algorithm
- Rules describing these concepts are extracted from IS.
- Null values in IS are replaced by the values suggested by these rules.
3. The two steps of the main algorithm are repeated until a fixpoint is reached.
Chase supported by rules extracted from IIS (Chase 1)
S = (X, A, V)
X = {x1, x2, x3, x4, x5, x6, x7, x8, x9, x10}
A = {b, c, d, e, f, g}

X     b    c    d    e    f    g
x1    b1   c1   -    e2   f1   -
x2    b2   c2   d2   e1   f2   g3
x3    b1   c1   d3   e1   f1   g1
x4    b3   c3   d3   e3   f1   g3
x5    b2   c2   -    e3   f1   g2
x6    -    c1   d2   -    f2   g1
x7    b1   -    d2   e2   f4   g1
x8    -    -    d2   e2   f2   g3
x9    b3   c1   d1   -    f2   -
x10   b2   c1   -    e3   f4   g2
(- marks a null value)

Attribute b
e2 → b1 (support 2), c1*f1 → b1 (support 2), g2 → b2 (support 2), c3 → b3 (support 1), c2 → b2 (support 2), g3*d2 → b2 (support 1), e3*d3 → b3 (support 1), f2*d2 → b2 (support 1).
Example (Chase1)
Attribute b
Two null values in S: b(x6), b(x8)
b(x6):
e2 → b1 (support 2), c1*f1 → b1 (support 2), g2 → b2 (support 2), c3 → b3 (support 1), c2 → b2 (support 2), g3*d2 → b2 (support 1), e3*d3 → b3 (support 1), f2*d2 → b2 (support 1).
Example (Chase1)
Attribute b
Two null values in S: b(x6), b(x8)
b(x6) = b2
b(x8):
e2 → b1 (support 2),
c1*f1 → b1 (support 2),
g2 → b2 (support 2),
c3 → b3 (support 1),
c2 → b2 (support 2),
g3*d2 → b2 (support 1),
e3*d3 → b3 (support 1),
f2*d2 → b2 (support 1).
Example (Chase1)
Two null values in S: c(x7), c(x8).
b(x6) = b2
c(x7):
b1 → c1 (support 2), e2 → c1 (support 1), f4 → c1 (support 1), g1 → c1 (support 2), b2*d2 → c2 (support 1), b2*e1 → c2 (support 1), b2*f2 → c2 (support 1), b2*g3 → c2 (support 1), d2*e1 → c2 (support 1), d2*g3 → c2 (support 1).
Example (Chase1)
Two null values in S: c(x7), c(x8).
b(x6) = b2, c(x7) = c1
c(x8):
b1 → c1 (support 2),
e2 → c1 (support 1),
f4 → c1 (support 1),
g1 → c1 (support 2),
b2*d2 → c2 (support 1),
b2*e1 → c2 (support 1),
b2*f2 → c2 (support 1),
b2*g3 → c2 (support 1),
d2*e1 → c2 (support 1),
d2*g3 → c2 (support 1).
Example (Chase1)
Input: System S = (X, A, V)
       Set of incomplete attributes In(A) = {a1, a2, …, ak}
       Set of rules L(D)
Output: System Chase1(S)
begin
  j := 1;
  while j ≤ k do
  begin
    Sj := S;
    for all v ∈ Vaj do
      while there is x ∈ X and a rule (t → v) ∈ L(D) such that x ∈ NSj(t) and card(aj(x)) ≠ 1
      begin
        aj(x) := v;
      end
    j := j + 1
  end
  S := {Sj : 1 ≤ j ≤ k},
  Chase1(S, In(A), L(D))
end
Algorithm Chase1(S, In(A), L(D))
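A minimal Python sketch of the rule-based filling step, applied to the three objects of the example that have null values in b or c. The rules are the ones listed in the example. The function name chase1_fill and the data layout are hypothetical, and one policy is an assumption made only for this sketch: a null is filled when all applicable rules suggest the same value, and left unchanged when they disagree. The slides do not state how conflicting suggestions are resolved.

```python
def chase1_fill(table, rules):
    """Fill nulls (None) using rules of the form (premise dict, (attribute, value))."""
    table = {x: dict(row) for x, row in table.items()}      # work on a copy
    changed = True
    while changed:                                          # repeat until a fixpoint
        changed = False
        for x, row in table.items():
            for attr in [a for a, v in row.items() if v is None]:
                suggested = {
                    value
                    for premise, (target, value) in rules
                    if target == attr
                    and all(row.get(k) == v for k, v in premise.items())
                }
                if len(suggested) == 1:                     # assumption: unanimous rules only
                    row[attr] = suggested.pop()
                    changed = True
    return table

TABLE = {
    "x6": {"b": None, "c": "c1", "d": "d2", "e": None, "f": "f2", "g": "g1"},
    "x7": {"b": "b1", "c": None, "d": "d2", "e": "e2", "f": "f4", "g": "g1"},
    "x8": {"b": None, "c": None, "d": "d2", "e": "e2", "f": "f2", "g": "g3"},
}
RULES = [
    ({"e": "e2"}, ("b", "b1")), ({"c": "c1", "f": "f1"}, ("b", "b1")),
    ({"g": "g2"}, ("b", "b2")), ({"c": "c3"}, ("b", "b3")),
    ({"c": "c2"}, ("b", "b2")), ({"g": "g3", "d": "d2"}, ("b", "b2")),
    ({"e": "e3", "d": "d3"}, ("b", "b3")), ({"f": "f2", "d": "d2"}, ("b", "b2")),
    ({"b": "b1"}, ("c", "c1")), ({"e": "e2"}, ("c", "c1")),
    ({"f": "f4"}, ("c", "c1")), ({"g": "g1"}, ("c", "c1")),
    ({"b": "b2", "d": "d2"}, ("c", "c2")), ({"b": "b2", "e": "e1"}, ("c", "c2")),
    ({"b": "b2", "f": "f2"}, ("c", "c2")), ({"b": "b2", "g": "g3"}, ("c", "c2")),
    ({"d": "d2", "e": "e1"}, ("c", "c2")), ({"d": "d2", "g": "g3"}, ("c", "c2")),
]
filled = chase1_fill(TABLE, RULES)
print(filled["x6"]["b"], filled["x7"]["c"])
```

With this conservative policy the sketch reproduces b(x6) = b2 and c(x7) = c1. For x8 the applicable rules disagree, so this sketch leaves its nulls in place; a support-weighted vote would be another possible policy.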
A1 - "the no. of different attributes used in a query"
A2 - "the percent of null values in a queried IS"
A3 - "the no. of objects returned by QAS when IS is complete"
A4 - "the no. of objects returned by QAS when IS is incomplete" (optimistic interpretation)
A5 - "the no. of objects returned by QAS based on rule-based chase algorithm" (pessimistic interpretation)
A6 - "the no. of bad objects retrieved"
A7 - "the no. of passes of rule-based chase algorithm"
[Table: columns query, A1, A2, A3, A4, A5, A6, A7 for the queries q1, q2, q3; cell values not recoverable]
ZOO Database
Rules Discovery from partially Incomplete Information Systems
Information System S = (X, A, V)
X - finite set of objects,
A - finite set of attributes,
V = ∪{Va : a ∈ A} - set of their values.
Assumption
1. For any a ∈ A, x ∈ X:
   a(x) = {(ai, pi) : i ∈ Ja(x) ∧ (∀i ∈ Ja(x))[ai ∈ Va] ∧ Σ{i ∈ Ja(x)} pi = 1}
2. For any a ≠ b: Va ∩ Vb = Ø
Data (Incomplete)
X    a                      b                      c                      d    e
x1   {(a1,1/3),(a2,2/3)}    {(b1,2/3),(b2,1/3)}    c1                     d1   {(e1,1/2),(e2,1/2)}
x2   {(a2,1/4),(a3,3/4)}    {(b1,1/3),(b2,2/3)}    -                      d2   e1
x3   a1                     b2                     {(c1,1/2),(c3,1/2)}    d2   e3
x4   a3                     -                      c2                     d1   {(e1,2/3),(e2,1/3)}
x5   {(a1,2/3),(a2,1/3)}    b1                     c2                     -    e1
x6   a2                     b2                     c3                     d2   {(e2,1/3),(e3,2/3)}
x7   a2                     {(b1,1/4),(b2,3/4)}    {(c1,1/3),(c2,2/3)}    d2   e2
x8   a3                     b2                     c1                     d1   e3
(- marks a null value)
Example
Goal: Describe e in terms of {a, b, c, d}
a1* = {(x1, 1/3), (x3, 1), (x5, 2/3)}
a2* = {(x1, 2/3), (x2, 1/4), (x5, 1/3), (x6, 1), (x7, 1)}
a3* = {(x2, 3/4), (x4, 1), (x8, 1)}
b1* = {(x1, 2/3), (x2, 1/3), (x4, 1/2), (x5, 1), (x7, 1/4)}
b2* = {(x1, 1/3), (x2, 2/3), (x3, 1), (x4, 1/2), (x6, 1), (x7, 3/4), (x8, 1)}
Algorithm ERID for Extracting Rules from partially Incomplete Information System
c1* = {(x1, 1), (x2, 1/3), (x3, 1/2), (x7, 1/3), (x8, 1)}
c2* = {(x2, 1/3), (x4, 1), (x5, 1), (x7, 2/3)}
c3* = {(x2, 1/3), (x3, 1/2), (x6, 1)}
d1* = {(x1, 1), (x4, 1), (x5, 1/2), (x8, 1)}
d2* = {(x2, 1), (x3, 1), (x5, 1/2), (x6, 1), (x7, 1)}
Algorithm ERID for Extracting Rules from partially Incomplete Information System
For the values of the decision attribute we have:
e1* = {(x1, 1/2), (x2, 1), (x4, 2/3), (x5, 1)}
e2* = {(x1, 1/2), (x4, 1/3), (x6, 1/3), (x7, 1)}
e3* = {(x3, 1), (x6, 2/3), (x8, 1)}
Algorithm ERID
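The sets v* can be computed mechanically from the data table. The sketch below is illustrative only: the compact cell syntax ("a1:1/3 a2:2/3", empty string for a null), the names cell, ROWS, DOMAIN and granule, and the treatment of a null as a uniform distribution over the attribute's domain are assumptions of this sketch. The uniform treatment is chosen because it agrees with the weights listed for the null cells, for example the weight 1/3 of x2 in each of c1*, c2*, c3*.

```python
from fractions import Fraction as Fr

DOMAIN = {"a": ("a1", "a2", "a3"), "b": ("b1", "b2"), "c": ("c1", "c2", "c3"),
          "d": ("d1", "d2"), "e": ("e1", "e2", "e3")}

def cell(text):
    """Parse 'a1', 'a1:1/3 a2:2/3' or '' (null) into {value: weight} or None."""
    if not text:
        return None
    parts = [p.split(":") for p in text.split()]
    return {p[0]: Fr(p[1]) if len(p) == 2 else Fr(1) for p in parts}

ROWS = {
    "x1": ("a1:1/3 a2:2/3", "b1:2/3 b2:1/3", "c1", "d1", "e1:1/2 e2:1/2"),
    "x2": ("a2:1/4 a3:3/4", "b1:1/3 b2:2/3", "", "d2", "e1"),
    "x3": ("a1", "b2", "c1:1/2 c3:1/2", "d2", "e3"),
    "x4": ("a3", "", "c2", "d1", "e1:2/3 e2:1/3"),
    "x5": ("a1:2/3 a2:1/3", "b1", "c2", "", "e1"),
    "x6": ("a2", "b2", "c3", "d2", "e2:1/3 e3:2/3"),
    "x7": ("a2", "b1:1/4 b2:3/4", "c1:1/3 c2:2/3", "d2", "e2"),
    "x8": ("a3", "b2", "c1", "d1", "e3"),
}
S = {x: dict(zip("abcde", map(cell, row))) for x, row in ROWS.items()}

def granule(value):
    """Return v* as {object: weight} for an attribute value such as 'a1'."""
    attr = value[0]
    result = {}
    for x, row in S.items():
        weights = row[attr]
        if weights is None:                      # null: uniform weights (assumption)
            weights = {v: Fr(1, len(DOMAIN[attr])) for v in DOMAIN[attr]}
        if value in weights:
            result[x] = weights[value]
    return result

print(granule("a1"))
print(granule("c2"))
```

Exact fractions are used so that the computed weights can be compared directly with the listed sets.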
2. Check the relationship "⊆" between values of classification attributes {a, b, c, d} and values of decision attribute e.
Algorithm ERID
Let ci* = {(xi, pi) : i ∈ N} and ej* = {(yj, qj) : j ∈ N}.
We say that ci* ⊆ ej* iff the support and confidence of the rule ci → ej are above some threshold values.
Algorithm ERID
How to define support and confidence of a rule ci → ej?
Algorithm ERID
To define support and confidence of the rule a1 → e3 we compute:
a1* = {(x1, 1/3), (x3, 1), (x5, 2/3)}
e3* = {(x3, 1), (x6, 2/3), (x8, 1)}
Support of the rule: sup(a1 → e3) = 1/3 · 0 + 1 · 1 + 2/3 · 0 = 1
Support of the term a1: sup(a1) = 1/3 + 1 + 2/3 = 2
Confidence of the rule: conf(a1 → e3) = sup(a1 → e3) / sup(a1) = 1/2
Definition of Support and Confidence (by example)
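The definition by example translates into a few lines of Python. This is a small sketch, with hypothetical function names, that represents a1* and e3* as dictionaries from objects to weights and multiplies the weights of objects present in both sets.

```python
from fractions import Fraction as Fr

a1 = {"x1": Fr(1, 3), "x3": Fr(1), "x5": Fr(2, 3)}   # a1*
e3 = {"x3": Fr(1), "x6": Fr(2, 3), "x8": Fr(1)}      # e3*

def sup_term(granule):
    """Support of a term: the sum of its weights."""
    return sum(granule.values(), Fr(0))

def sup_rule(cond, dec):
    """Support of cond -> dec: sum of weight products over shared objects."""
    return sum((p * dec[x] for x, p in cond.items() if x in dec), Fr(0))

def conf_rule(cond, dec):
    """Confidence of cond -> dec."""
    return sup_rule(cond, dec) / sup_term(cond)

print(sup_rule(a1, e3), sup_term(a1), conf_rule(a1, e3))
```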
Thresholds (provided by user):
Minimal support (λ1 = 1)
Minimal confidence (λ2 = ½)
a1* ⊆ e1* (sup = 5/6 < 1) - marked negative
a1* ⊆ e2* (sup = 1/6 < 1) - marked negative
a1* ⊆ e3* (sup = 1 ≥ 1, conf = 0.5) - marked positive
Extracting Rules from partially Incomplete Information System(Algorithm ERID(λ1, λ2))
a3* ⊆ e3* (sup = 1 ≥ 1) but (conf = 0.36)
b2* ⊆ e1* (sup = 7/6 ≥ 1) but (conf = 0.22)
b2* ⊆ e2* (sup = 17/12 ≥ 1) but (conf = 0.27)
c1* ⊆ e3* (sup = 3/2 ≥ 1) but (conf = 0.47)
c2* ⊆ e2* (sup = 1 ≥ 1) but (conf = 0.33)
d1* ⊆ e1* (sup = 5/3 ≥ 1) but (conf = 0.48)
d1* ⊆ e3* (sup = 1 ≥ 1) but (conf = 0.28)
d2* ⊆ e1* (sup = 3/2 ≥ 1) but (conf = 0.33)
d2* ⊆ e3* (sup = 5/3 ≥ 1) but (conf = 0.37)
Algorithm ERID(λ1, λ2)
None of these relations is marked: each has sufficient support, but its confidence is below λ2.
Algorithm ERID(λ1, λ2)
(a3 * c1)* ⊆ e3* (sup = 1 ≥ 1) and (conf = 0.8)
(a3 * d1)* ⊆ e3* (sup = 1 ≥ 1) and (conf = 0.5)
(a3 * d2)* ⊆ e3* (sup = 0 < 1)
(b2 * d2)* ⊆ e1* (sup = 2/3 < 1)
(b2 * c2)* ⊆ e2* (sup = 1/2 < 1)
Algorithm ERID(λ1, λ2)
The first two relations are marked positive. The remaining three are marked negative.
Algorithm ERID(λ1, λ2)
The algorithm continues for terms of length 3, 4, … till all of them have either positive or negative marks.
Rules are automatically constructed from relations marked positive.
Algorithm ERID(λ1, λ2)
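The marking step can be sketched as follows. This is not the authors' implementation; it is a hedged illustration in which a relation is marked negative when its support is below λ1, positive when support and confidence both reach their thresholds, and otherwise left unmarked so that its premise can be extended by conjunction. The names conj, sup and mark are hypothetical, and the product of weights for a conjunction is taken from the interpretation of t1 * t2 used in these slides.

```python
from fractions import Fraction as Fr

def conj(g1, g2):
    """Weighted objects shared by both granules, with multiplied weights."""
    return {x: p * g2[x] for x, p in g1.items() if x in g2}

def sup(granule):
    return sum(granule.values(), Fr(0))

def mark(cond, dec, min_sup, min_conf):
    """Return 'negative', 'positive' or 'unmarked' for the relation cond* vs dec*."""
    s = sup(conj(cond, dec))
    if s < min_sup:
        return "negative"
    if s / sup(cond) >= min_conf:
        return "positive"
    return "unmarked"

a1 = {"x1": Fr(1, 3), "x3": Fr(1), "x5": Fr(2, 3)}
a3 = {"x2": Fr(3, 4), "x4": Fr(1), "x8": Fr(1)}
c1 = {"x1": Fr(1), "x2": Fr(1, 3), "x3": Fr(1, 2), "x7": Fr(1, 3), "x8": Fr(1)}
e1 = {"x1": Fr(1, 2), "x2": Fr(1), "x4": Fr(2, 3), "x5": Fr(1)}
e2 = {"x1": Fr(1, 2), "x4": Fr(1, 3), "x6": Fr(1, 3), "x7": Fr(1)}
e3 = {"x3": Fr(1), "x6": Fr(2, 3), "x8": Fr(1)}

L1, L2 = Fr(1), Fr(1, 2)                      # thresholds λ1, λ2
marks = {
    "a1->e1": mark(a1, e1, L1, L2),
    "a1->e2": mark(a1, e2, L1, L2),
    "a1->e3": mark(a1, e3, L1, L2),
    "a3->e3": mark(a3, e3, L1, L2),
    "a3*c1->e3": mark(conj(a3, c1), e3, L1, L2),
}
print(marks)
```

Only unmarked relations such as a3 and e3 need longer premises; here the extension a3 * c1 reaches confidence 0.8 and is marked positive.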
Algorithm Chase 2(for Partially IIS)
S = (X, A, V) - partially incomplete information system of type λ, if S is incomplete and the following three conditions hold:
- aS(x) is defined for any x ∈ X, a ∈ A,
- (∀x ∈ X)(∀a ∈ A)[(aS(x) = {(ai, pi) : 1 ≤ i ≤ m}) → Σ{i=1..m} pi = 1],
- (∀x ∈ X)(∀a ∈ A)[(aS(x) = {(ai, pi) : 1 ≤ i ≤ m}) → (∀i)(pi ≥ λ)].
Algorithm Chase 2
S1, S2 - partially incomplete, both of type λ and both classifying the same sets of objects (from X) using the same sets of attributes (A).
Let aS1(x) = {(a1i, p1i) : 1 ≤ i ≤ m1} and aS2(x) = {(a2i, p2i) : 1 ≤ i ≤ m2}.
The pair (S1, S2) satisfies containment relation Ψ (or Ψ(S1) = S2) if:
- (∀x ∈ X)(∀a ∈ A)[card(aS1(x)) ≥ card(aS2(x))],
- (∀x ∈ X)(∀a ∈ A)[[card(aS1(x)) = card(aS2(x))] → [Σ{i≠j} |p2i − p2j| > Σ{i≠j} |p1i − p1j|]].
We also denote that fact by (∀x ∈ X)(∀a ∈ A)[Ψ(aS1(x)) = aS2(x)].
Algorithm Chase 2
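For a single object and attribute, the two conditions of the containment relation can be checked as in the sketch below. It assumes the reading "the new value has no more alternatives than the old one, and if it has the same number, its weights are spread further apart". The names spread and psi_holds are hypothetical, and the weights for c are illustrative values chosen for this sketch.

```python
from fractions import Fraction as Fr
from itertools import permutations

def spread(weights):
    """Sum of |p_i - p_j| over all ordered pairs i != j."""
    return sum(abs(p - q) for p, q in permutations(weights, 2))

def psi_holds(old, new):
    """Check the containment conditions for one cell: old = a_S1(x), new = a_S2(x)."""
    if len(old) < len(new):
        return False                              # cardinality must not grow
    if len(old) == len(new):
        return spread(new.values()) > spread(old.values())
    return True                                   # strictly fewer alternatives

e_old = {"e1": Fr(1, 2), "e2": Fr(1, 2)}
e_new = {"e1": Fr(33, 100), "e2": Fr(67, 100)}
c_old = {"c1": Fr(1, 3), "c2": Fr(2, 3)}          # illustrative values
c_new = {"c2": Fr(1)}

print(psi_holds(e_old, e_new), psi_holds(e_new, e_old), psi_holds(c_old, c_new))
```

The first check corresponds to replacing {(e1, 1/2), (e2, 1/2)} by a less uniform distribution over the same two values, which the relation accepts; the reverse replacement is rejected.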
System S1: the incomplete information system S used in the ERID example.
System S2:
x8
c2x7
x6
x5
x4
x3
x2
x1
edcbaX
),( 32
2a
1e
1c1d
),( 21
1e),( 2
12e
),( 31
1a ),( 32
1b),( 3
12b
),( 41
2a),( 4
33a 2d
1a 2b ),( 31
1c),( 3
23c 2d
3e
3a 2c1d
),( 32
1a),( 3
12a 1b 2c
1e
2a 2b 3c 2d),( 3
12e
),( 32
3e
2a ),( 41
1b),( 4
32b 2d 2e
2b 1c1d 3e),( 3
21a
),( 31
2a
2b),( 3
11c
),( 32
2c
2e
Assumptions:
S = (X, A, V) - information system of type λ
L(D) = {(t → vc) ∈ D : c ∈ In(A)} - set of all pairwise independent rules extracted by ERID from S
NS(t) - standard interpretation of term t in S, meaning that:
- NS(v) = {(x, p) : (v, p) ∈ a(x)}, for any v ∈ Va
- NS(t1 + t2) = NS(t1) ⊕ NS(t2)
- NS(t1 * t2) = NS(t1) ⊗ NS(t2)
where for any NS(t1) = {(xi, pi)}i∈I, NS(t2) = {(xj, qj)}j∈J we have:
- NS(t1) ⊕ NS(t2) = {(xj, qj)}j∈(J\I) ∪ {(xi, pi)}i∈(I\J) ∪ {(xi, max(pi, qi))}i∈I∩J
- NS(t1) ⊗ NS(t2) = {(xi, pi · qi)}i∈I∩J
In(A) = {a1, … , ak} - incomplete attributes in S
..................
bj(x) := Ø; nj := 0;
for all v ∈ Vaj do
begin
  if card(aj(x)) ≠ 1 and {(ti → v) : i ∈ I} is a maximal subset of rules from L(D)
     such that (x, pi) ∈ NSj(ti)
  then if Σ{i∈I} [pi · conf(ti → v) · sup(ti → v)] ≥ λ then
  begin
    bj(x) := bj(x) ∪ {(v, Σ{i∈I} [pi · conf(ti → v) · sup(ti → v)])};
    nj := nj + Σ{i∈I} [pi · conf(ti → v) · sup(ti → v)]
  end
end
pj := pj + nj;
end
if Ψ(aj(x)) = [bj(x)/pj] /containment relation holds between aj(x), [bj(x)/pj]/ then
  aj(x) := [bj(x)/pj]
..................
Algorithm Chase 2 (S, In(A), L(D))
Incomplete Information System S of type λ = 0.3
Rules extracted by ERID(λ1, λ2) with λ1 = 1, λ2 = 0.5:
r1 = [a1 → e3], sup(r1) = 1, conf(r1) = 0.5
r2 = [a2 → e2], sup(r2) = 5/3, conf(r2) = 0.51
r4 = [b1 → e1], sup(r4) = 2, conf(r4) = 0.72
r5 = [b2 → e3], sup(r5) = 8/3, conf(r5) = 0.51
r10 = [c1 * d1 → e3], sup(r10) = 1, conf(r10) = 0.5
Algorithm Chase 2 will try to replace e(x1) = {(e1, 1/2), (e2, 1/2)} by enew(x1) = {(e1, …), (e2, …), (e3, …)}.
We will show that Ψ(e(x1)) = enew(x1) (the value e(x1) will be changed by Chase 2).
For x1:
(e3, (1/3 · 1 · 1/2 + 1/3 · 8/3 · 51/100 + 1 · 1 · 1/2) / (14/3)) = (e3, 0.24)
(e2, (2/3 · 5/3 · 51/100) / (5/3)) = (e2, 0.97)
(e1, (2/3 · 2 · 72/100) / 2) = (e1, 0.48)
we have: e(x1) = {(e1, 0.48), (e2, 0.97), (e3, 0.24)}
Because the confidence assigned to e3 is below the threshold λ, only two values remain: (e1, 0.48), (e2, 0.97).
The value of attribute e assigned to x1 is: {(e1, 0.33), (e2, 0.67)}.
Incomplete Information System S of type λ = 0.3
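The last step of the example, dropping values whose weight is below λ and rescaling the rest so that the weights sum to 1, can be sketched as below. The function name prune_and_normalize is hypothetical, and the intermediate weights 0.48, 0.97 and 0.24 are taken from the example as given rather than recomputed.

```python
def prune_and_normalize(weights, lam):
    """Keep values with weight >= lam and rescale them to sum to 1 (rounded to 2 digits)."""
    kept = {value: w for value, w in weights.items() if w >= lam}
    if not kept:
        return {}                                  # nothing survives the threshold
    total = sum(kept.values())
    return {value: round(w / total, 2) for value, w in kept.items()}

candidate = {"e1": 0.48, "e2": 0.97, "e3": 0.24}   # weights from the example
new_value = prune_and_normalize(candidate, lam=0.3)
print(new_value)
```

With λ = 0.3 the value e3 is dropped and the remaining weights are rescaled, giving the value assigned to x1 in the example.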
Distributed Chase Algorithm (Chase 3)
Let:
DS = ({Si}i∈I, L) - distributed autonomous information systems
Si = (Xi, Ai, Vi) - information system for any i ∈ I (I - set of sites)
Di = ∪{Dk,i : k ∈ I} - knowledge-base at site i ∈ I
Dk,i - set of (k, i)-rules (constructed at site k and sent to site i)
Strategy for constructing knowledge-base Di = ∪{Dk,i : k ∈ I} and Algorithm Chase 3
Notation:
qS = [a, b, c : d, e] - request by S for definitions of a, b, c with additional information that d, e are complete attributes in S.
Global Ontology
g    a    b    c
g1   -    b2   -
g1   a2   b1   c2
g1   a2   -    c1
g1   a1   b1   c1
S2
b    a    d    e
-    a1   d2   -
b2   a2   d2   e2
b1   a2   d1   e1
-    -    d1   -
S1
a    b    c    d
a1   b2   -    -
-    b1   c2   -
a2   b2   -    d2
a2   b1   c1   -
(- marks a null value)
rule          support   system
b1 → a2       1         S
b2*d2 → a2    1         S
b2 → a2       1         S1
c1*b1 → a1    1         S2
KBS: r1, r2
qS = [a, c, d : b]
Assumption:
Si = (Xi, Ai, Vi)
Di - the granularity level of values of attributes used in rules from Di may differ from the granularity level of values of attributes used in descriptions of objects in Si.
Chase 3 algorithm, to be applicable to Si, has to be based on rules from Di satisfying the following two conditions:
- the attribute value used in the decision part of a rule has a granularity level either equal to or finer than the granularity level of the corresponding attribute in Si,
- the granularity level of any attribute used in the classification part of a rule is either equal to or softer than the granularity level of the corresponding attribute in Si.
Hierarchical attributes: age, salary
Rule in Di: (age, young) → (salary, 40k)
age: young (18 … 29), middle-aged (30 … 60), old (61 … 80)
salary: low (10k … 40k), medium (50k, 60k, 70k), high (80k … 100k)
Example
Assumption: tuple t in Si supports rule (s → d) ∈ Di(Si).
Two cases:
1. An overlapping attribute between the rule and the tuple is the decision attribute in s → d.
If the two attributes involved in that match have different granularities, then the decision value d has to be replaced by a softer value whose granularity matches the granularity of the corresponding attribute in Si.
2. An overlapping attribute between the rule and the tuple is a classification attribute in s → d.
If the two attributes involved in that match have different granularities, then the value of attribute a has to be replaced by a finer value whose granularity matches the granularity of a in Si.
Algorithm Chase 3 (Construction of new Di followed by Chase 2)
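The two granularity adjustments can be illustrated on the age and salary hierarchies of the example. This is a sketch under assumptions: the hierarchies are stored as dictionaries from a soft label to its finer values, salaries are given in thousands and assumed to step by 10k inside each range, and the function names soften and refine are hypothetical.

```python
AGE = {
    "young": range(18, 30),
    "middle-aged": range(30, 61),
    "old": range(61, 81),
}
SALARY = {                       # in thousands; 10k steps are an assumption
    "low": (10, 20, 30, 40),
    "medium": (50, 60, 70),
    "high": (80, 90, 100),
}

def soften(value, hierarchy):
    """Case 1: replace a fine value by the softer label that covers it."""
    for label, members in hierarchy.items():
        if value in members:
            return label
    return None                  # value not covered by the hierarchy

def refine(label, hierarchy):
    """Case 2: list the finer values covered by a soft label."""
    return list(hierarchy[label])

# Rule (age, young) -> (salary, 40k) applied to a site that stores age in years
# and salary as low/medium/high.
decision = soften(40, SALARY)            # softer decision value for the site
matches = 25 in refine("young", AGE)     # a tuple with age 25 matches "young"
print(decision, matches)
```

In case 2 the tuple's own value selects which of the finer values replaces the soft label; the sketch only checks that the tuple's value is among them.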
Chase 4 (All Information Systems are equally involved in chase)
S1 (attributes a, b, c, d): KB contains r1, r2, r3, r4; r5, r6 are extracted from S1
S2 (attributes b, a, d, e): KB contains r3, r4, r5, r6; r1, r2 are extracted from S2
S3 (attributes g, a, b, c): KB contains r1, r2, r5, r6; r3, r4 are extracted from S3
qS1 = [a, c, d : b]
qS2 = [b, a, e : d]
qS3 = [a, b, c : g]