ER-Miner: A New Method to Mine Essential Rules and Constrained Essential Rules Donghui Zhang CCIS, Northeastern University Unpublished work of our group.

Mar 26, 2015

Transcript
Page 1: ER-Miner: A New Method to Mine Essential Rules and Constrained Essential Rules Donghui Zhang CCIS, Northeastern University Unpublished work of our group.

ER-Miner: A New Method to Mine Essential Rules and Constrained Essential Rules

Donghui Zhang

CCIS, Northeastern University

Unpublished work of our group

Page 2:

Outline

Association rule mining
Existing work on mining essential rules
Our new scheme – ER-Miner
Count the number of strong rules – ER-Counter
Mining constrained essential rules
Performance

Page 3:

What Is Association Mining?

Given a transactional database (TDB), where each transaction contains some items,

Find association rules, e.g. {bread} → {milk, egg}.

Strong rule: confidence and support are both high.

Page 4:

Why Is Association Mining an Essential Task in Data Mining?

Foundation for many essential data mining tasks
• Association, correlation, causality
• Sequential patterns, temporal or cyclic association, partial periodicity, spatial and multimedia association
• Associative classification, cluster analysis, iceberg cube, fascicles (semantic data compression)

Broad applications
• Basket data analysis, cross-marketing, catalog design, sale campaign analysis
• Web log (click stream) analysis, DNA sequence analysis, etc.

Page 5:

Basic Concepts: Frequent Itemsets and Association Rules

Itemset X = {x_1, …, x_k}

Find all the rules X → Y with min confidence and support
• support, s: probability that a transaction contains X ∪ Y
• confidence, c: conditional probability that a transaction having X also contains Y.

Let min_support = 50%, min_conf = 50%:
• A → C (50%, 66.7%)
• C → A (50%, 100%)

[Figure: Venn diagram of "Customer buys beer" and "Customer buys diaper", overlapping in "Customer buys both".]

Transaction-id   Items bought
10               A, B, C
20               A, C
30               A, D
40               B, E, F

Page 6:

Mining Association Rules: an Example

For rule A → C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.7%

Min. support 50%
Min. confidence 50%

Transaction-id   Items bought
10               A, B, C
20               A, C
30               A, D
40               B, E, F

Frequent itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%
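The numbers on this page can be reproduced with a few lines of Python. This is only a minimal sketch: the names `TDB`, `support` and `confidence` are illustrative and do not come from the slides.

```python
# Transactions from the example table (transaction-id -> items bought).
TDB = {
    10: {"A", "B", "C"},
    20: {"A", "C"},
    30: {"A", "D"},
    40: {"B", "E", "F"},
}

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in TDB.values() if items <= t) / len(TDB)

def confidence(antecedent, consequent):
    """support(X u Y) / support(X) for the rule X -> Y."""
    x, y = set(antecedent), set(consequent)
    return support(x | y) / support(x)

print(support("AC"))         # support of {A, C}
print(confidence("A", "C"))  # confidence of A -> C
print(confidence("C", "A"))  # confidence of C -> A
```

Passing a string such as `"AC"` is a shortcut for the itemset {A, C}; it only works because every item here is a single character.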

Page 7:

Mining Association Rules: A Two-Phase Approach

Frequent Itemset Generation. Scan through the TDB and find all itemsets whose support is no less than minsupport.
• Apriori, FP-Growth, …

Rule Generation. Given a frequent itemset S, for every non-empty proper subset S’ of S, try to generate a rule that splits S into S’ and S − S’ (controlling parameter is minconfidence).
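A rough sketch of the two phases on the four-transaction example follows. Phase 1 is done here by exhaustive enumeration, which is only a stand-in for Apriori or FP-Growth, and phase 2 is assumed to emit rules of the form S’ → S − S’. All function names are hypothetical.

```python
from itertools import combinations

TDB = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]

def support(itemset):
    return sum(1 for t in TDB if itemset <= t) / len(TDB)

def frequent_itemsets(min_support):
    """Phase 1 by brute force (NOT Apriori/FP-Growth, just an illustration)."""
    items = sorted(set().union(*TDB))
    found = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            if support(frozenset(combo)) >= min_support:
                found.append(frozenset(combo))
    return found

def generate_rules(frequent, min_conf):
    """Phase 2: split each frequent itemset S into antecedent and consequent."""
    rules = []
    for s in frequent:
        for size in range(1, len(s)):
            for ante in combinations(sorted(s), size):
                x = frozenset(ante)
                if support(s) / support(x) >= min_conf:
                    rules.append((x, s - x))
    return rules

frequent = frequent_itemsets(0.5)
rules = generate_rules(frequent, 0.5)
for x, y in rules:
    print("".join(sorted(x)), "->", "".join(sorted(y)))
```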

Page 8:

Outline

Association rule mining
Existing work on mining essential rules
Our new scheme – ER-Miner
Count the number of strong rules – ER-Counter
Mining constrained essential rules
Performance

Page 9:

A Critical Observation

Rule      Support     Confidence
A → BC    sup(ABC)    sup(ABC)/sup(A)
AB → C    sup(ABC)    sup(ABC)/sup(AB)
AC → B    sup(ABC)    sup(ABC)/sup(AC)
A → B     sup(AB)     sup(AB)/sup(A)
A → C     sup(AC)     sup(AC)/sup(A)

A → BC has support and confidence no larger than those of each of the other rules, regardless of the TDB.

Rules AB → C, AC → B, A → B and A → C are redundant with regard to A → BC.

While mining association rules, a large percentage of rules may be redundant.

Page 10:

Formal Definition of Essential Rule

Definition 1. Rule r_1 implies another rule r_2 if support(r_1) ≤ support(r_2) and confidence(r_1) ≤ confidence(r_2), independent of the TDB.

Denote as r_1 ⇒ r_2.

Definition 2. Rule r_1 is an essential rule if r_1 is strong and there is no other strong rule r_2 s.t. r_2 ⇒ r_1.

Page 11:

Generate Strong Rules from an Essential Rule

Theorem 1. Rule X → Y ⇒ X’ → Y’ iff X ⊆ X’, Y ⊇ Y’, and X ∪ Y ⊇ X’ ∪ Y’.

Given an essential rule A → BCD, all rules it can imply can be derived by deleting a non-empty subset (e.g. {BC}) from its consequent, and adding part of the deleted subset (e.g. ∅, {BC}) to its antecedent.

E.g., both ABC → D and A → D can be implied by A → BCD.
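Theorem 1 turns the implication test into three subset checks. A minimal sketch, assuming rules are represented as pairs of frozensets (the helper names `rule` and `implies` are illustrative):

```python
def rule(antecedent, consequent):
    """Build a rule X -> Y as a pair of frozensets (one character per item)."""
    return frozenset(antecedent), frozenset(consequent)

def implies(r1, r2):
    """Theorem 1: X -> Y implies X' -> Y' iff X <= X', Y' <= Y, X' u Y' <= X u Y."""
    x, y = r1
    x2, y2 = r2
    return x <= x2 and y2 <= y and (x2 | y2) <= (x | y)

essential = rule("A", "BCD")
print(implies(essential, rule("ABC", "D")))  # delete BC, add BC to the antecedent
print(implies(essential, rule("A", "D")))    # delete BC, add nothing
print(implies(rule("A", "D"), essential))    # the converse does not hold
```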

Page 12:

What Are the Benefits of Mining Essential Rules?

A strong rule whose consequent contains k items makes 3^k − 2^k − 1 other strong rules redundant.

E.g., A → BCD makes 18 rules redundant.

The set of essential rules is much more compact, and we have simple methods to derive the redundant rules.
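The count 3^k − 2^k − 1 can be checked by enumeration: each consequent item is kept, moved to the antecedent, or deleted (3^k choices); the 2^k choices that leave the consequent empty are dropped, and so is the rule itself. The sketch below is an illustration under that reading, with a hypothetical function name.

```python
from itertools import product

def derived_rules(antecedent, consequent):
    """All rules implied by X -> Y, excluding X -> Y itself."""
    x, y = frozenset(antecedent), sorted(consequent)
    out = set()
    for choice in product(("keep", "move", "delete"), repeat=len(y)):
        new_x = x | {i for i, c in zip(y, choice) if c == "move"}
        new_y = frozenset(i for i, c in zip(y, choice) if c == "keep")
        # Skip empty consequents and the all-"keep" choice (the rule itself).
        if new_y and new_y != frozenset(y):
            out.add((new_x, new_y))
    return out

redundant = derived_rules("A", "BCD")
print(len(redundant))
```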

Page 13:

Existing Work – Adjacency Lattice

Aggarwal and Yu [AY98] proposed to pre-compute the in-memory structure adjacency lattice.

Every node corresponds to an itemset whose support is no less than a small threshold (primary support).

A direct link points from itemset X to Y if X ⊂ Y and |X| = |Y| − 1.

At query time, any minsupport provided by the user is no less than the primary support.

Page 14:

Existing Work – Adjacency Lattice

Null

A 1%    B 2%    C 2%    D 1%

AB 0.5%    AC 0.7%    BC 0.4%    BD 0.6%

ABC 0.3%

Page 15:

Existing Work – Mining Essential Rules

The existing algorithm to mine essential rules works as follows.

Given an itemset X and minconfidence, start from node X, browse the lattice upward to find every ancestor X’ s.t.
• Confidence(X’ → X − X’) ≥ minconfidence
• This is not true for any parent node of X’

We report X’ → X − X’ as an essential rule.

Page 16:

Existing Work – Adjacency Lattice

Null

A 1%    B 2%    C 2%    D 1%

AB 0.5%    AC 0.7%    BC 0.4%    BD 0.6%

ABC 0.3%

minconfidence = 60%
Input = {ABC}

ERs: AB → C, BC → A
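The example can be replayed in a short sketch. It enumerates the subsets of the input itemset directly rather than following lattice links, and it uses exact fractions so that the 60% boundary case (AB → C) is not affected by rounding. The names `SUPPORT`, `confident` and `essential_rules_for` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Supports (in percent) read off the example adjacency lattice.
SUPPORT = {frozenset(k): Fraction(v) for k, v in {
    "A": "1", "B": "2", "C": "2", "D": "1",
    "AB": "0.5", "AC": "0.7", "BC": "0.4", "BD": "0.6", "ABC": "0.3",
}.items()}

def confident(x_prime, x, minconf):
    """True if X' -> X - X' reaches minconf; the Null node never qualifies."""
    return bool(x_prime) and SUPPORT[x] / SUPPORT[x_prime] >= minconf

def essential_rules_for(x, minconf):
    x = frozenset(x)
    rules = []
    for size in range(1, len(x)):
        for combo in combinations(sorted(x), size):
            xp = frozenset(combo)
            parents = [xp - {item} for item in xp]
            # Report X' only if it qualifies and none of its parents does.
            if confident(xp, x, minconf) and not any(
                confident(p, x, minconf) for p in parents
            ):
                rules.append((xp, x - xp))
    return rules

ers = essential_rules_for("ABC", Fraction(60, 100))
for xp, rest in ers:
    print("".join(sorted(xp)), "->", "".join(sorted(rest)))
```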

Page 17:

Existing Work – Limitations

It only finds essential rules whose antecedent union consequent is EQUAL to some input itemset.

E.g., if the input is {ABC}, it cannot find the essential rule A → C.

In order to find all essential rules, it needs to take as input all the frequent itemsets.

A frequent itemset with n items has 2^n subsets, and all of them are frequent (except for ∅)!

Page 18:

Outline

Association rule mining
Existing work on mining essential rules
Our new scheme – ER-Miner
Count the number of strong rules – ER-Counter
Mining constrained essential rules
Performance

Page 19:

Basic Concept: Max Itemset

Given a TDB, a max itemset is a frequent itemset none of whose proper supersets is frequent.

The number of max itemsets is much smaller than the number of frequent itemsets.
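As a small illustration of the definition, the max itemsets can be filtered out of a list of frequent itemsets with a quadratic scan. This is only a sketch of the definition, not how a maximal-pattern miner would compute them; the function name is an assumption.

```python
def max_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    frequent = [frozenset(f) for f in frequent]
    return [f for f in frequent if not any(f < g for g in frequent)]

# Frequent itemsets of the earlier four-transaction example (min. support 50%).
maximal = max_itemsets(["A", "B", "C", "AC"])
print(["".join(sorted(m)) for m in maximal])
```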

Page 20:

Our New Scheme – ER-Miner

ER-Miner takes as input the set of max itemsets.

ER-Miner is based on
• the ERM-tree
• pruning techniques
• duplicate avoidance
that minimize the number of rules we examine.

Page 21:

Lattice of Rules

Defined with:
• set = set of all rules
• op = implication

Root (virtual): ∅ → ABC.

To generate a child rule: either delete an item from the consequent, or move it to the antecedent.

Page 22:

Example of a Lattice of Rules

[Figure: lattice of rules over {A, B, C}; X(Y) denotes the rule X → Y.]
Root (virtual): (ABC)
Level 1: A(BC), B(AC), C(AB)
Level 2: AB(C), AC(B), BC(A), A(B), A(C), B(A), B(C), C(A), C(B)

Idea: browse top-down; prune a sub-tree whenever an essential rule is found.

Page 23:

Lattice → ERM-tree

[Figure: the lattice unfolded into a tree; AB(C), AC(B) and BC(A) each appear twice.]
Root (virtual): (ABC)
Level 1: A(BC), B(AC), C(AB)
Level 2: A(B), A(C), AB(C), AC(B), B(A), B(C), AB(C), BC(A), C(A), C(B), AC(B), BC(A)

Page 24:

Duplicate Elimination by the Moving Principle

AB(C) can be generated from A(BC) by moving B (from consequent to antecedent), and it can also be generated from B(AC) by moving A.

Moving principle: Move an item only if the item is bigger than all items in the antecedent.

Page 25:

Example of a Preliminary Version of ERM-tree

[Figure: the tree of the previous page, with the same node labels.]
Root (virtual): (ABC)
Level 1: A(BC), B(AC), C(AB)
Level 2: A(B), A(C), AB(C), AC(B), B(A), B(C), AB(C), BC(A), C(A), C(B), AC(B), BC(A)

Page 26:

The Deletion Principle

Keep track of the most recently deleted item r.

Pass r to the child nodes to be generated later.

When expanding a node, only delete an item if it is larger than r.

[Figure: A(BCD) has children A(CD) (delete B) and A(BD) (delete C). A(D) appears under both, reached by "delete C" and by "delete B"; the node annotation reads r=C.]

Page 27:

The Order Principle

Apply deletion before moving.

Within each category, process smaller items first.

[Figure: A(BCD) has children AB(CD) (move B) and A(BD) (delete C). AB(D) appears under both, reached by "delete C" and by "move B". Annotation: "After deletion is processed, here r=D".]

Page 28:

Examine the ERM-tree

By incorporating the three principles, our ERM-tree does not have duplicates, while it contains all rules from a given max itemset.

We do not pre-build the whole tree. Instead, our ERM-Tree-Examine algorithm examines each node in a breadth-first order.

In this way, we guarantee:
• If r_1 ⇒ r_2, r_1 is examined before r_2;
• Each rule is examined at most once.

Page 29:

Algorithm ERM-Tree-Examine

Input: a max itemset M, an examine procedure.
• enqueue [∅(M), -1] into a queue Q;
• while Q is not empty
    dequeue an element [H(T), r] from Q;
    examine H→T unless H=∅;
    if |T|=1, continue loop;
    for every item m in T in increasing order
      • if H≠∅ and m>r
          change r to m;
          enqueue [H(T-m), r] into Q;
        end if
    end for
    for every item m in T in increasing order
      • if ∀h∈H, m>h
          enqueue [H∪{m}(T-m), r] into Q
        end if
    end for
  end while
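The pseudocode translates almost line by line into Python. In the sketch below, items are assumed to be single-character strings compared alphabetically, and the empty string stands in for the initial r = -1 because it sorts before every item; `erm_tree_examine` and `visited` are illustrative names.

```python
from collections import deque

def erm_tree_examine(max_itemset, examine):
    """Breadth-first walk of the ERM-tree rooted at the virtual rule (M)."""
    queue = deque([(frozenset(), frozenset(max_itemset), "")])  # "" plays r = -1
    while queue:
        head, tail, r = queue.popleft()
        if head:                      # skip the virtual root
            examine(head, tail)
        if len(tail) == 1:
            continue
        for m in sorted(tail):        # deletion first, smaller items first
            if head and m > r:
                r = m
                queue.append((head, tail - {m}, r))
        for m in sorted(tail):        # then moving, smaller items first
            if all(m > h for h in head):
                queue.append((head | {m}, tail - {m}, r))

visited = []
erm_tree_examine(
    "ABC",
    lambda h, t: visited.append(("".join(sorted(h)), "".join(sorted(t)))),
)
for h, t in visited:
    print(f"{h}({t})")
```

For a max itemset of three items the walk should touch each of the 12 possible rules exactly once, with larger consequents first.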

Page 30:

Example of ERM-tree

[Figure: ERM-tree for the max itemset ABC.]
(ABC), -1
A(BC), C    B(AC), C    C(AB), B
AC(B),C    AB(C),C    A(B),C    A(C),C    B(A),C    B(C),C    BC(A),C    C(A),B    C(B),B

Each node is of the form [X(Y), r], where X(Y) represents a rule X→Y, and r is the most recently deleted item. The root contains a virtual rule ∅→Y, and its r is initially -1.

Page 31:

Mining Essential Rules from a Single Max Itemset

Algorithm ER-Miner-Single mines essential rules from a single max itemset, by modifying ERM-Tree-Examine in the following ways.
• Instantiate the examine procedure as the one to compute the confidence of a given rule;
• Whenever a strong rule is identified, we report it as an essential rule and omit examining the sub-tree (i.e. prune the sub-tree).

Page 32:

Local False Positive Elimination

Local false positives can be eliminated by comparing each reported rule to the existing essential rules to see if any essential rule can imply it.

Since the number of essential rules is not large, even a straightforward scan through all essential rules gives acceptable performance.

An optimized method is also proposed so that only part of the existing essential rules need to be compared.

[Figure: ERM-tree fragment.]
(ABC), -1
A(BC), C    B(AC), C    C(AB), B
BC(A), C

Local false essential rule BC→A
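Putting the tree walk, the pruning step and the straightforward local scan together gives roughly the following. This is a sketch under stated assumptions, not the group's implementation: the four-transaction database is made up for the illustration, supports are counted by scanning it, and the optimized comparison mentioned above is not attempted.

```python
from collections import deque

def implies(r1, r2):
    """Theorem 1 on rules stored as (antecedent, consequent) frozenset pairs."""
    (x, y), (x2, y2) = r1, r2
    return x <= x2 and y2 <= y and (x2 | y2) <= (x | y)

def er_miner_single(max_itemset, tdb, minconf):
    def count(itemset):
        return sum(1 for t in tdb if itemset <= t)

    essential = []
    queue = deque([(frozenset(), frozenset(max_itemset), "")])  # "" plays r = -1
    while queue:
        head, tail, r = queue.popleft()
        if head and count(head | tail) >= minconf * count(head):
            candidate = (head, tail)
            # Local false positive check: scan the rules reported so far.
            if not any(implies(e, candidate) for e in essential):
                essential.append(candidate)
            continue                  # strong rule: prune its sub-tree
        if len(tail) == 1:
            continue
        for m in sorted(tail):        # deletion principle
            if head and m > r:
                r = m
                queue.append((head, tail - {m}, r))
        for m in sorted(tail):        # moving principle
            if all(m > h for h in head):
                queue.append((head | {m}, tail - {m}, r))
    return essential

# Hypothetical database in which ABC is a max itemset.
TDB = [set("ABC"), set("ABC"), set("AB"), set("B")]
found = er_miner_single("ABC", TDB, 0.9)
for x, y in found:
    print("".join(sorted(x)), "->", "".join(sorted(y)))
```

In this toy database C → AB is strong, so BC → A and AC → B are also strong when they are reached through other branches; the scan recognizes both as local false positives.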

Page 33:

Mining Essential Rules from Multiple Max Itemsets

Given a TDB and a set of max itemsets, the straightforward approach is to run ER-Miner-Single on each max itemset individually to find all essential rules.

Two problems with the straightforward solution:
• Duplicate examination
• Global false positives

ER-Miner-Multiple integrates the solutions to the above two problems with ER-Miner-Single.

Page 34:

Duplicate Examine Avoidance

A sub-tree may exist in multiple ERM-trees. Avoid such duplicate examinations as follows.
• When we visit a node X(Y) of an ERM-tree, it is compared to previously examined max itemsets.
• If some max itemset contains X ∪ Y, omit examining this node (as well as the sub-tree).

[Figure: max itemsets (ABCDF), (ABCDEH) and (ABCEG), with the shared itemsets {ABCD}, {ABCE} and {ABC}.]

Page 35:

Global False Positive Elimination

One ERM-tree may report an essential rule which implies previously identified “essential” rules from some other ERM-tree (global false positives).

To eliminate the global false positives, each essential rule is compared to the existing essential rules obtained from other examined ERM-trees.

We only need to check whether an existing reported rule can be implied by the newly reported rule.

[Figure: (ABCDF) reports B→CD; (ABCDEH) reports B→CDH.]
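The check described on this page can be sketched as a small update step: when a tree reports a new rule, previously reported rules that it implies are dropped. The function name `report` and the list representation are assumptions made for the illustration.

```python
def implies(r1, r2):
    """Theorem 1 on rules stored as (antecedent, consequent) frozenset pairs."""
    (x, y), (x2, y2) = r1, r2
    return x <= x2 and y2 <= y and (x2 | y2) <= (x | y)

def report(essential, new_rule):
    """Add new_rule and drop earlier rules that it implies (global false positives)."""
    kept = [old for old in essential if not implies(new_rule, old)]
    kept.append(new_rule)
    return kept

rules = []
rules = report(rules, (frozenset("B"), frozenset("CD")))   # from the tree of ABCDF
rules = report(rules, (frozenset("A"), frozenset("F")))    # an unrelated rule
rules = report(rules, (frozenset("B"), frozenset("CDH")))  # from the tree of ABCDEH
for x, y in rules:
    print("".join(sorted(x)), "->", "".join(sorted(y)))
```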

Page 36:

Outline

Association rule mining
Existing work on mining essential rules
Our new scheme – ER-Miner
Count the number of strong rules – ER-Counter
Mining constrained essential rules
Performance

Page 37:

Counting the Number of Strong Rules

In an OLAP environment, the user may also be interested in knowing the number of strong rules in order to refine minsupport and minconfidence.

The set of essential rules along with the total number of strong rules will give users a better idea of how to adjust these two thresholds.

Page 38:

Problem Definition

Definition 3. Given a set of strong rules R = {r_1, …, r_n}, the derived set of R, denoted as S(r_1, …, r_n), is the set of all strong rules, each of which either belongs to R, or can be implied by some r_i ∈ R.

Our ER-Counter takes as input the set of essential rules R = {r_1, …, r_n}, and returns |S(r_1, …, r_n)|.

Page 39:

Non-intersected Derived Sets

For n input strong rules R = {r_1, …, r_n}, if no two derived sets S(r_i) and S(r_j) intersect, then |S(r_1, …, r_n)| = Σ_{i=1}^{n} (3^{|Y_i|} − 2^{|Y_i|}), where Y_i is the consequent of r_i.

In particular, the size of the derived set of a single rule r is 3^k − 2^k, where k is the size of the consequent of r.

Page 40:

Overlap Among Derived Sets

If S(r_i) and S(r_j) intersect, the formula on the previous page only gives an upper bound on the exact number.

For example, given two essential rules A→BC and B→AC, both rules imply AB→C, and AB→C is counted twice.

Theorem 2. Given two strong rules r_1: X_1→Y_1 and r_2: X_2→Y_2, S(r_1) intersects with S(r_2) if and only if r_1 and r_2 satisfy all three conditions: (1) Y_1 ∩ Y_2 ≠ ∅, (2) X_1 − X_2 ⊆ Y_2, (3) X_2 − X_1 ⊆ Y_1.

Page 41:

Overlap Among Derived Sets

Theorem 3. Given two strong rules r_1: X_1→Y_1 and r_2: X_2→Y_2, if S(r_1) ∩ S(r_2) ≠ ∅, then r’ = X_1 ∪ X_2 → Y_1 ∩ Y_2, denoted as r_1 ∩ r_2, must be a valid strong rule, and S(r’) = S(r_1) ∩ S(r_2).

Then the number of derived rules of two intersecting strong rules is |S(r_1)| + |S(r_2)| − |S(r_1 ∩ r_2)|.
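The two theorems can be sanity-checked by brute force on the earlier example A→BC and B→AC. The sketch below enumerates each derived set explicitly (keep, move or delete every consequent item) and compares the intersection with the derived set of the combined rule; `derived_set` is a hypothetical helper.

```python
from itertools import product

def derived_set(antecedent, consequent):
    """S(r) for r = X -> Y: r itself plus every rule it implies."""
    x, y = frozenset(antecedent), sorted(consequent)
    out = set()
    for choice in product("kmd", repeat=len(y)):  # keep / move / delete
        new_y = frozenset(i for i, c in zip(y, choice) if c == "k")
        if new_y:
            new_x = x | {i for i, c in zip(y, choice) if c == "m"}
            out.add((new_x, new_y))
    return out

s1 = derived_set("A", "BC")
s2 = derived_set("B", "AC")
common = s1 & s2
combined = derived_set("AB", "C")  # X1 u X2 -> Y1 n Y2
print(len(s1), len(s2), len(common), len(s1 | s2))
```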

Page 42:

Overlap Graph

Each node represents an essential rule. If S(r_i) ∩ S(r_j) ≠ ∅, there is an edge between r_i and r_j.

[Figure: overlap graph over the essential rules r_1, …, r_8.]

Page 43:

Overlap Among Derived Sets

Consider the complex example: r_4 intersects with more than one rule, i.e. r_4 has more than one neighbor.

Recursively call ER-Counter to compute the exact number of overlapping strong rules between r_4 and its neighbors.

In this example, it equals |S(r_4 ∩ r_3, r_4 ∩ r_5, r_4 ∩ r_6)|.

Page 44:

Algorithm ER-Counter

Input: a set of essential rules {r_1, …, r_n}.
• sum = 0;
• for every rule r_i in increasing order
    sum = sum + 3^k − 2^k, where k is the number of items in the consequent of r_i.
    Find the neighbors of r_i among {r_{i+1}, …, r_n} in the conceptual overlap graph. Let the set of neighbors be {r_{j1}, …, r_{jt}}.
    if the set of neighbors is not empty
      • sum = sum − ER-Counter(r_i ∩ r_{j1}, …, r_i ∩ r_{jt})
    end if
  end for
• return sum;
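A compact Python rendering of the recursion is given below as a sketch. It assumes the neighbor test of Theorem 2 and the combined rule of Theorem 3, with rules stored as pairs of frozensets; the function names are illustrative.

```python
def overlaps(r1, r2):
    """Theorem 2: do the derived sets of r1 and r2 intersect?"""
    (x1, y1), (x2, y2) = r1, r2
    return bool(y1 & y2) and (x1 - x2) <= y2 and (x2 - x1) <= y1

def combine(r1, r2):
    """Theorem 3: the rule whose derived set is S(r1) n S(r2)."""
    (x1, y1), (x2, y2) = r1, r2
    return x1 | x2, y1 & y2

def er_counter(rules):
    """Size of the derived set S(r_1, ..., r_n)."""
    total = 0
    for i, r in enumerate(rules):
        k = len(r[1])
        total += 3 ** k - 2 ** k
        # Neighbors are searched only among the rules that come after r.
        shared = [combine(r, s) for s in rules[i + 1:] if overlaps(r, s)]
        if shared:
            total -= er_counter(shared)
    return total

def rule(antecedent, consequent):
    return frozenset(antecedent), frozenset(consequent)

print(er_counter([rule("A", "BC"), rule("B", "AC")]))
```

For A→BC and B→AC the two derived sets have 5 rules each and share AB→C, so the count is 5 + 5 − 1.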

Page 45:

Outline

Association rule mining
Existing work on mining essential rules
Our new scheme – ER-Miner
Count the number of strong rules – ER-Counter
Mining constrained essential rules
Performance

Page 46:

Problem Definition

The extended problem of finding constrained essential rules is: among all strong rules that satisfy given constraints, find those that cannot be implied by any other such rule.

Example of the constraint: the rule antecedent should contain soda.

Page 47:

Mining Constrained Essential Rules

Our ER-Miner scheme is extended as follows.
• Only those max itemsets that satisfy the itemset constraints will be chosen to generate ERM-trees.
• When browsing the ERM-tree, at each node we check whether the constraint is satisfied prior to the confidence checking.
• Try to prune the sub-tree according to the constraint, if applicable.

Page 48:

Outline

Association rule mining
Existing work on mining essential rules
Our new scheme – ER-Miner
Count the number of strong rules – ER-Counter
Mining constrained essential rules
Performance

Page 49:

Performance Comparison

[Figure: execution time (ms; axis ticks 10, 100, 1000, 10000, 100000) versus # max patterns (32, 54, 260, 1083) for ER-Miner and Lattice.]

Page 50:

[Figure: execution time (ms; axis ticks 10, 100, 1000, 10000, 100000) versus confidence (0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.98). Series: Mushroom Lattice, Chess Lattice, Connect Lattice, Mushroom ER-Miner, Chess ER-Miner, Connect ER-Miner.]