Data Mining
Nguyễn Nhật [email protected]
School of Information and Communication Technology, Hanoi University of Science and Technology
Academic year 2010-2011

Course contents:
- Introduction to data mining
- Introduction to the WEKA tool
- Data preprocessing
- Association rule mining
- Classification and prediction techniques
- Clustering techniques

Association rule mining: Introduction
The association rule mining problem: given a set of transactions, find rules that predict the occurrence of some items in a transaction based on the occurrence of other items.

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Examples of association rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Basic definitions (1)
The examples below refer to the five example transactions (TID 1 to 5).
- Itemset: a set of one or more items. Example: {Milk, Bread, Diaper}
- k-itemset: an itemset that contains k items
- Support count ($\sigma$): the number of transactions in which an itemset occurs. Example: $\sigma$({Milk, Bread, Diaper}) = 2
- Support (s): the fraction of transactions that contain an itemset. Example: s({Milk, Bread, Diaper}) = 2/5
- Frequent (large) itemset: an itemset whose support is greater than or equal to a threshold minsup

Basic definitions (2)
- Association rule: an implication of the form X → Y, where X and Y are itemsets. Example: {Milk, Diaper} → {Beer}
- Rule evaluation measures:
  - Support (s): the fraction of all transactions that contain both X and Y
  - Confidence (c): the fraction of the transactions containing X that also contain Y

Example for the rule {Milk, Diaper} → {Beer} on the five example transactions:
$s = \frac{\sigma(\{\text{Milk}, \text{Diaper}, \text{Beer}\})}{|T|} = \frac{2}{5} = 0.4$
$c = \frac{\sigma(\{\text{Milk}, \text{Diaper}, \text{Beer}\})}{\sigma(\{\text{Milk}, \text{Diaper}\})} = \frac{2}{3} \approx 0.67$
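
The two measures can be checked on the five example transactions with a few lines of Python. This is only an illustrative sketch; the function names `support_count`, `support`, and `confidence` are chosen here and do not come from the slides.

```python
# The five example transactions (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Among transactions containing `lhs`, the fraction that also contain `rhs`."""
    both = support_count(set(lhs) | set(rhs), transactions)
    return both / support_count(lhs, transactions)

s = support({"Milk", "Diaper", "Beer"}, transactions)
c = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)
print(f"s = {s:.2f}, c = {c:.2f}")  # s = 0.40, c = 0.67
```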

The association rule mining task
Given a set of transactions T, the goal of association rule mining is to find all rules that have:
- support ≥ the threshold minsup, and
- confidence ≥ the threshold minconf

Brute-force approach:
- List all possible association rules
- Compute the support and confidence of each rule
- Discard the rules whose support is below minsup or whose confidence is below minconf

This brute-force approach is computationally prohibitive and cannot be applied in practice.

Mining association rules
Example rules from the five example transactions:
{Milk, Diaper} → {Beer}   (s=0.4, c=0.67)
{Milk, Beer} → {Diaper}   (s=0.4, c=1.0)
{Diaper, Beer} → {Milk}   (s=0.4, c=0.67)
{Beer} → {Milk, Diaper}   (s=0.4, c=0.67)
{Diaper} → {Milk, Beer}   (s=0.4, c=0.5)
{Milk} → {Diaper, Beer}   (s=0.4, c=0.5)

Observations:
- All of the rules above are binary partitions (splits into two subsets) of the same itemset: {Milk, Diaper, Beer}
- Rules generated from the same itemset have identical support but may differ in confidence
- Therefore, the support and confidence requirements can be handled separately during rule mining
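
The six rules and their measures can be reproduced by enumerating every binary partition of {Milk, Diaper, Beer}. The sketch below is illustrative only; `rules_from_itemset` is a name introduced here, and the code assumes the itemset occurs in at least one transaction (otherwise the confidence would divide by zero).

```python
from itertools import combinations

# The five example transactions (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def rules_from_itemset(itemset):
    """Return (lhs, rhs, support, confidence) for every binary partition of `itemset`."""
    itemset = frozenset(itemset)
    s = support_count(itemset) / len(transactions)  # identical for every rule
    rules = []
    for r in range(1, len(itemset)):  # size of the left-hand side
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            c = support_count(itemset) / support_count(lhs)
            rules.append((lhs, itemset - lhs, s, c))
    return rules

rules = rules_from_itemset({"Milk", "Diaper", "Beer"})
for lhs, rhs, s, c in rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)}  (s={s:.1f}, c={c:.2f})")
```

Every printed rule has s=0.4, while the confidence varies with the left-hand side, which is why the two thresholds can be checked in separate phases.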

Mining association rules: a two-step process
1. Frequent itemset generation
   - Generate all itemsets whose support ≥ minsup
2. Rule generation
   - From each frequent itemset found in step 1, generate all high-confidence rules (confidence ≥ minconf)
   - Each rule is a binary partition (a split into two parts) of a frequent itemset

Frequent itemset generation (step 1) is still computationally expensive.

The lattice of candidate itemsets
[Figure: itemset lattice over the items A, B, C, D, E, running from null at the top through all 1-, 2-, 3-, and 4-itemsets down to ABCDE]
Given d items, there are 2^d possible candidate itemsets.

Frequent itemset generation
Brute-force approach:
- Every itemset in the lattice is a candidate
- Compute the support of each candidate by scanning all transactions
- Match each transaction against every candidate
- Complexity ~ O(N·M·w), where N is the number of transactions, M the number of candidates, and w the maximum transaction width
- With M = 2^d, this complexity is far too high

Frequent itemset generation strategies
- Reduce the number of candidate itemsets (M)
  - A complete search has M = 2^d
  - Use pruning techniques to reduce M
- Reduce the number of transactions (N)
  - Reduce N as the size (number of items) of the itemsets increases
- Reduce the number of comparisons between candidates and transactions (N·M)
  - Use efficient data structures to store the candidates or the transactions
  - There is then no need to match every candidate against every transaction

Reducing the number of candidates
The Apriori principle (support-based pruning):
- If an itemset is frequent, then all of its subsets are also frequent
- If an itemset is infrequent, then all of its supersets are also infrequent

The Apriori principle follows from the anti-monotone property of support:
$\forall X, Y : (X \subseteq Y) \Rightarrow s(X) \ge s(Y)$
The support of an itemset never exceeds the support of any of its subsets.

Apriori: support-based pruning
[Figure: itemset lattice in which the itemset AB is found to be infrequent, so all supersets of AB are pruned]

Apriori: support-based pruning (example)
minsup = 3 (as a support count)

1-itemsets:
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

2-itemsets (candidates containing Coke or Eggs need not be considered):
Itemset           Count
{Bread, Milk}     3
{Bread, Beer}     2
{Bread, Diaper}   3
{Milk, Beer}      2
{Milk, Diaper}    3
{Beer, Diaper}    3

3-itemsets:
Itemset                 Count
{Bread, Milk, Diaper}   2   (below minsup, so this candidate is not frequent)

Number of candidates if every itemset of up to 3 items is considered: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41
Number of candidates with support-based pruning: 6 + 6 + 1 = 13

The Apriori algorithm
- Generate all frequent 1-itemsets (frequent itemsets that contain a single item)
- Set k = 1
- Repeat until no new frequent itemsets are found:
  - From the frequent k-itemsets, generate the candidate (k+1)-itemsets
  - Prune the candidate (k+1)-itemsets that contain an infrequent subset of size k
  - Compute the support of each remaining candidate (k+1)-itemset by scanning all transactions
  - Discard the infrequent candidates, leaving the frequent (k+1)-itemsets
  - Set k = k + 1
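
The steps above can be sketched in Python as follows. This is a minimal, unoptimized sketch rather than a reference implementation: candidates are generated by joining pairs of frequent k-itemsets, `minsup_count` is taken as an absolute support count, and the name `apriori` is simply chosen here.

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent_k = {s: c for s, c in counts.items() if c >= minsup_count}
    all_frequent = dict(frequent_k)

    k = 1
    while frequent_k:
        # Candidate generation: join two frequent k-itemsets into a (k+1)-itemset.
        candidates = set()
        for a, b in combinations(list(frequent_k), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Pruning: every k-subset of the candidate must itself be frequent.
            if all(frozenset(sub) in frequent_k for sub in combinations(union, k)):
                candidates.add(union)

        # Support counting: one scan over all transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # Keep only the frequent (k+1)-itemsets and move to the next level.
        frequent_k = {s: c for s, c in counts.items() if c >= minsup_count}
        all_frequent.update(frequent_k)
        k += 1
    return all_frequent

# The five example transactions (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
result = apriori(transactions, minsup_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The inner loop that matches every candidate against every transaction is the naive comparison step; a hash-based structure for the candidates would replace it in a more careful implementation.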

Reducing the number of comparisons
- Comparisons (matchings) between candidate itemsets and transactions: all transactions must be scanned to compute the support of each candidate
- To reduce the number of comparisons, store the candidates in a hash structure
  - Instead of matching each transaction against every candidate, match it only against the candidates stored in the corresponding hashed buckets

Generating a hash tree
Suppose there are 15 candidate 3-itemsets:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

Building the hash tree requires:
- A hash function. Example: h(p) = p mod 3, so items 1, 4, 7 hash to one branch, items 2, 5, 8 to a second, and items 3, 6, 9 to a third
- A maximum leaf size: the maximum number of itemsets stored in a leaf node (if the number of itemsets exceeds this value, the node is split further). Example: max leaf size = 3

[Figure: the resulting hash tree, with the 15 candidates distributed over its leaves]

Association rule discovery with a hash tree (1) to (3)
[Figure, three slides: the hash tree that stores the candidate itemsets, with one branch of the hash function highlighted on each slide: (1) hashing on items 1, 4, or 7; (2) hashing on items 2, 5, or 8; (3) hashing on items 3, 6, or 9]

The k-itemsets contained in a transaction
Given a transaction t, which 3-itemsets does it contain?
Assume that the items within each itemset are listed in lexicographic order.

Identifying itemsets with the hash tree (1) and (2)
Transaction t = {1, 2, 3, 5, 6}
- At the root, t is split by its first item: 1 + {2 3 5 6}, 2 + {3 5 6}, 3 + {5 6}
- At the second level, the branch for item 1 is split again: 1 2 + {3 5 6}, 1 3 + {5 6}, 1 5 + {6}
[Figure: the hash tree traversed with these prefixes, using the hash function branches 1,4,7 / 2,5,8 / 3,6,9]
The transaction t needs to be compared with only 11 of the 15 candidates.
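
As a rough illustration of the subset operation, the sketch below enumerates the 3-itemsets contained in t = {1, 2, 3, 5, 6} and looks each one up among the 15 candidates. It deliberately swaps the hash tree for a plain Python set (itself hash-based), so it does not reproduce the "11 of 15" comparison count; the `buckets` dictionary only mimics the first level of the tree using h(p) = p mod 3 on the first item.

```python
from itertools import combinations

# The 15 candidate 3-itemsets, each listed in increasing item order.
candidates = {
    (1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
    (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
    (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8),
}

# Root level of the hash tree: group candidates by h(first item) = first item mod 3.
buckets = {}
for c in sorted(candidates):
    buckets.setdefault(c[0] % 3, []).append(c)

# Subset operation: every 3-itemset contained in transaction t.
t = (1, 2, 3, 5, 6)
subsets = list(combinations(sorted(t), 3))

# Candidates whose support count would be incremented by t.
matched = sorted(s for s in subsets if s in candidates)

print(len(subsets), "subsets of size 3")
print("matched candidates:", matched)
```

Here 10 subsets are generated and three of them, {1 2 5}, {1 3 6}, and {3 5 6}, are candidates contained in t.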

Apriori: factors affecting complexity
- Choice of the minsup threshold
  - A very low minsup produces many frequent itemsets
  - This can increase both the number of candidates and the maximum length (size) of the frequent itemsets
- Number of items in the database (the transactions)
  - More memory is needed to store the support count of each item
  - If the number of frequent items (frequent 1-itemsets) increases, both the computation cost and the I/O cost (scanning the transactions) increase
- Size of the database (the transactions)
  - Apriori scans the database multiple times, so its computation cost grows with the number of transactions
- Average transaction width
  - As the average width (number of items) of the transactions increases, the maximum length of the frequent itemsets tends to increase, and so does the cost of traversing the hash tree

Representing frequent itemsets
- In practice, the number of frequent itemsets generated from a transaction database can be very large
- A compact representation is needed: a (small) set of representative frequent itemsets from which all other frequent itemsets can be derived
- There are two such representations:
  - Maximal frequent itemsets
  - Closed frequent itemsets

Maximal frequent itemsets
A frequent itemset is maximal if every one of its supersets is infrequent.
[Figure: itemset lattice with a border separating the frequent itemsets from the infrequent itemsets; the maximal frequent itemsets lie directly on the frequent side of the border]

Closed frequent itemsets
A frequent itemset is closed if none of its supersets has the same support as the itemset itself.

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,B,C,D}
4    {A,B,D}
5    {A,B,C,D}

Itemset     Support
{A}         4
{B}         5
{C}         3
{D}         4
{A,B}       4
{A,C}       2
{A,D}       3
{B,C}       3
{B,D}       4
{C,D}       3
{A,B,C}     2
{A,B,D}     3
{A,C,D}     2
{B,C,D}     3
{A,B,C,D}   2
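
Frequent itemsets: maximal vs. closed (1)

TID  Items
1    ABC
2    ABCD
3    BCE
4    ACDE
5    DE

[Figure: itemset lattice over A, B, C, D, E in which each node is annotated with the IDs of the transactions that contain it]
Transaction IDs per itemset, as labeled in the figure:
A: 1,2,4   B: 1,2,3   C: 1,2,3,4   D: 2,4,5   E: 3,4,5
AB: 1,2   AC: 1,2,4   AD: 2,4   AE: 4   BC: 1,2,3   BD: 2   BE: 3   CD: 2,4   CE: 3,4   DE: 4,5
ABC: 1,2   ABD: 2   ACD: 2,4   ACE: 4   ADE: 4   BCD: 2   BCE: 3   CDE: 4
ABCD: 2   ACDE: 4
Not supported by any transaction: ABE, BDE, ABCE, ABDE, BCDE, ABCDE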

Frequent itemsets: maximal vs. closed (2)
minsup = 2
[Figure: the same annotated lattice with the closed frequent itemsets highlighted; the labels distinguish itemsets that are "closed but not maximal" from itemsets that are "closed and maximal"]
Number of closed frequent itemsets = 9
Number of maximal frequent itemsets = 4

Frequent itemsets: maximal vs. closed (3)
- Every maximal frequent itemset is also a closed frequent itemset
- The representation based on maximal frequent itemsets does not retain the support of the subsets of each maximal frequent itemset
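
The counts from the maximal vs. closed example (transactions ABC, ABCD, BCE, ACDE, DE with minsup = 2) can be verified by brute force. The sketch below enumerates all itemsets, which is only reasonable for such a tiny example; checking immediate supersets is sufficient because support is anti-monotone. The helper name `immediate_supersets` is introduced here for illustration.

```python
from itertools import combinations

transactions = [set("ABC"), set("ABCD"), set("BCE"), set("ACDE"), set("DE")]
minsup = 2  # absolute support count
items = sorted(set().union(*transactions))

# Support count of every frequent itemset (brute-force enumeration).
support = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        n = sum(1 for t in transactions if itemset <= t)
        if n >= minsup:
            support[itemset] = n

def immediate_supersets(itemset):
    """All itemsets obtained by adding exactly one item."""
    return [itemset | {i} for i in items if i not in itemset]

# Maximal: no immediate superset is frequent.
maximal = [s for s in support
           if not any(sup in support for sup in immediate_supersets(s))]

# Closed: no immediate superset has the same support count.
closed = [s for s in support
          if not any(support.get(sup) == support[s] for sup in immediate_supersets(s))]

print("frequent:", len(support), "closed:", len(closed), "maximal:", len(maximal))
print("maximal itemsets:", sorted("".join(sorted(s)) for s in maximal))
```

The result also shows that every maximal itemset appears in the closed list, while the closed list additionally keeps the support counts needed to recover the support of every frequent itemset.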

The FP-Growth algorithm
- An alternative method for finding frequent itemsets
  - Recall: Apriori uses a generate-and-test approach (it generates candidate itemsets and tests whether each one is frequent)
- FP-Growth represents the transaction data with a data structure called an FP-tree
- FP-Growth uses the FP-tree to extract the frequent itemsets directly

FP-tree representation
- For each transaction, the FP-tree contains a path in the tree
- Two transactions that share some items have paths with a common part (segment)
  - The more the paths overlap, the more compact (compressed) the FP-tree representation becomes
- If the FP-tree is small enough to fit in working memory, FP-Growth can extract the frequent itemsets directly from the in-memory FP-tree
  - There is no need to scan the data stored on disk repeatedly

FP-tree construction (1)
- Initially, the FP-tree contains only the root node (denoted null)
- The transaction database is scanned a first time to determine the support of each item
  - Infrequent items are discarded
  - Frequent items are sorted in decreasing order of support
  - In the example on the following slides, the decreasing support order is A, B, C, D, E
- The transaction database is scanned a second time to build the FP-tree

FP-tree construction (2)

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {A}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

After reading transaction 1:
null
  A:1
    B:1

After reading transaction 2:
null
  A:1
    B:1
  B:1
    C:1
      D:1

After reading transaction 3:
null
  A:2
    B:1
    C:1
      D:1
        E:1
  B:1
    C:1
      D:1

FP-tree construction (3)
FP-tree after reading all 10 transactions of the transaction database:
null
  A:8
    B:5
      C:3
        D:1
      D:1
    C:1
      D:1
        E:1
    D:1
      E:1
  B:2
    C:2
      D:1
      E:1

Pointer (header) table: one entry per item (A, B, C, D, E), each pointing to the nodes of that item in the tree.
The pointers are used by FP-Growth while generating the frequent itemsets.
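
The two-scan construction can be sketched in Python as follows. The class `FPNode` and the functions `build_fp_tree` and `show` are names introduced for this illustration; the header table is modeled as a plain list of nodes per item instead of a linked pointer chain, and a support-count threshold of 2 is assumed (any threshold up to 3 yields the same tree for this data).

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item        # None for the root ("null")
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode

def build_fp_tree(transactions, minsup_count):
    # Scan 1: support count of each item; drop infrequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= minsup_count}
    # Decreasing support; ties broken alphabetically (an assumption of this sketch).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None)
    header = {item: [] for item in order}  # item -> nodes holding that item

    # Scan 2: insert each transaction as a path, sharing common prefixes.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

def show(node, depth=0):
    """Print the tree with one 'item:count' entry per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"A"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
root, header = build_fp_tree(transactions, minsup_count=2)
show(root)
```

Summing the counts of the nodes listed under an item in `header` gives that item's support count, which is how the pointers are used during itemset generation.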

FP-Growth: frequent itemset generation
- FP-Growth generates the frequent itemsets directly from the FP-tree, working bottom-up from the leaves towards the root
  - In the example above, FP-Growth first finds the frequent itemsets ending in E, then those ending in D, C, B, and finally A
- Because each transaction is represented by a path in the FP-tree, the frequent itemsets ending in a given item (e.g., E) can be found by examining the paths that contain that item
  - These paths are easy to locate using the pointers associated with the item's nodes

Paths ending in an item
[Figure: five panels showing the FP-tree paths ending in e, in d, in c, in b, and in a]

Finding the frequent itemsets
- FP-Growth finds all frequent itemsets ending in a given item using a divide-and-conquer strategy
- Example: find all frequent itemsets ending in e
  - First, check whether the 1-itemset {e} is frequent
  - If it is, consider the subproblems: find all frequent itemsets ending in de, in ce, in be, and in ae
  - Each of these subproblems is in turn decomposed into smaller subproblems
  - Combining the solutions of the subproblems gives all frequent itemsets ending in e

Example: frequent itemsets ending in e
- Find all paths in the FP-tree that end in e: these are the prefix paths for e
  - null → a:8 → c:1 → d:1 → e:1
  - null → a:8 → d:1 → e:1
  - null → b:2 → c:2 → e:1
- From the prefix paths for e, the support of e is obtained by summing the support counts of the e nodes
- Assuming minsup = 2, the itemset {e} is frequent (its support count, 3, exceeds minsup)
[Figure: prefix paths for e]

Example: frequent itemsets ending in e (continued)
- Because {e} is frequent, FP-Growth must solve the subproblems: find the frequent itemsets ending in de, in ce, in be, and in ae
- First, the prefix paths of e are converted into a conditional FP-tree
  - It has the same structure as an FP-tree
  - It is used to find the frequent itemsets ending in a particular item

Building the conditional FP-tree
- Update the support counts along the prefix paths, because some counts include transactions that do not contain e
  - Example: the path null → b:2 → c:2 → e:1 also counts the transaction {b,c,d}, which does not contain e. The counts on this path must therefore be set to 1, the number of transactions that contain {b,c,e}
- Remove the e nodes from the prefix paths
- After the support counts have been updated, some items may no longer be frequent; these are removed
  - Example: node b now has a support count of 1
[Figure: conditional FP-tree for e]

Example: frequent itemsets ending in e (continued)
- FP-Growth uses the conditional FP-tree for e to solve the subproblems: find the frequent itemsets ending in de, in ce, in be, and in ae
- Example: to find the frequent itemsets ending in de, the prefix paths for d are built from the conditional FP-tree for e
  - Summing the support counts of the d nodes gives the support of {d,e}
  - The support count of {d,e} is 2, so it is a frequent itemset
[Figure: prefix paths for de]
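
The divide-and-conquer procedure can be sketched as a short recursive function. This is a compact illustration under stated assumptions, not a tuned implementation: `Node`, `build_tree`, and `fp_growth` are names introduced here, each conditional FP-tree is rebuilt from its weighted prefix paths (with items re-sorted by their conditional support), and item labels are upper-case as in the transaction table.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted_paths, minsup):
    """Build an FP-tree from (items, count) pairs and return its header table."""
    support = defaultdict(int)
    for items, count in weighted_paths:
        for item in items:
            support[item] += count
    keep = {i for i, c in support.items() if c >= minsup}  # drop infrequent items
    root = Node(None, None)
    header = defaultdict(list)  # item -> nodes holding that item
    for items, count in weighted_paths:
        # Decreasing support; ties broken by item name (an assumption of this sketch).
        ordered = sorted((i for i in items if i in keep), key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header

def fp_growth(weighted_paths, minsup, suffix=frozenset()):
    """Yield (itemset, support count) for every frequent itemset extending `suffix`."""
    header = build_tree(weighted_paths, minsup)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, sum(n.count for n in nodes)
        # Prefix paths of `item`, each weighted by the count stored at its node.
        conditional = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                conditional.append((path, n.count))
        # Subproblem: frequent itemsets that end in `itemset`.
        yield from fp_growth(conditional, minsup, itemset)

transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"A"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
found = list(fp_growth([(t, 1) for t in transactions], minsup=2))
result = dict(found)
ending_in_e = sorted("".join(sorted(s)) for s in result if "E" in s)
print(ending_in_e)
```

For the 10 example transactions with minsup = 2, the frequent itemsets containing E come out as {E}, {A,E}, {C,E}, {D,E}, and {A,D,E}, in line with the supports of {e} and {d,e} worked out above.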

Rule generation (1)
- For each frequent itemset L, find all non-empty proper subsets f ⊂ L such that the rule f → L − f satisfies the minimum confidence requirement
- Example: for the frequent itemset {A,B,C,D}, the candidate rules are:
  ABC → D, ABD → C, ACD → B, BCD → A,
  A → BCD, B → ACD, C → ABD, D → ABC,
  AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB
- If |L| = k, there are 2^k − 2 candidate association rules (ignoring the two rules L → ∅ and ∅ → L)

Rule generation (2)
- How can rules be generated efficiently from the frequent itemsets?
- In general, confidence does not have the anti-monotone property
  - c(ABC → D) can be larger or smaller than c(AB → D)
- However, the confidence of rules generated from the same frequent itemset does have the anti-monotone property
  - Example: for L = {A,B,C,D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
  - Confidence is anti-monotone with respect to the number of items on the right-hand side of the rule
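
Apriori: rule generation (1)
[Figure: lattice of the rules generated from one frequent itemset; one rule is marked as a low-confidence rule, and the rules below it in the lattice are marked as pruned]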

Apriori: rule generation (2)
- A candidate rule is generated by merging two rules that share the same prefix in their consequent (the right-hand side of the rule)
- Example: merging the two rules CD → AB and BD → AC produces the candidate rule D → ABC
- The rule D → ABC is pruned if any of its sub-rules (AD → BC, BCD → A, ...) does not have high confidence (< minconf)
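
Level-wise rule generation with confidence-based pruning might look as follows. This sketch assumes the given itemset occurs in at least one transaction, grows consequents by merging surviving consequents that differ in one item (a simplification of the shared-prefix merge), and uses a hypothetical function name, `generate_rules`. It is run on the Bread/Milk/Diaper/Beer transactions from the earlier slides.

```python
from itertools import combinations

# The five example transactions (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def generate_rules(itemset, minconf):
    """Return (lhs, rhs, confidence) for the rules of `itemset` with confidence >= minconf."""
    itemset = frozenset(itemset)
    total = support_count(itemset)
    rules = []
    consequents = [frozenset([i]) for i in itemset]  # start with 1-item consequents
    while consequents and len(consequents[0]) < len(itemset):
        kept = set()
        for rhs in consequents:
            lhs = itemset - rhs
            conf = total / support_count(lhs)
            if conf >= minconf:
                rules.append((lhs, rhs, conf))
                kept.add(rhs)
        # Merge surviving consequents into consequents with one more item.
        size = len(consequents[0]) + 1
        merged = {a | b for a, b in combinations(kept, 2) if len(a | b) == size}
        # Prune: every smaller consequent must have survived (anti-monotone confidence).
        consequents = [m for m in merged
                       if all(frozenset(sub) in kept for sub in combinations(m, size - 1))]
    return rules

rules = generate_rules({"Milk", "Diaper", "Beer"}, minconf=0.6)
for lhs, rhs, conf in rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)}  (c={conf:.2f})")
```

With minconf = 0.6, four of the six possible rules survive; raising minconf to 0.7 leaves only {Milk, Beer} → {Diaper}, and no larger consequent is even generated because fewer than two 1-item consequents survive.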

References
P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining (chapter 6). Addison-Wesley, 2005.