CS 478 – Tools for Machine Learning and Data Mining Association Rule Mining

Jan 05, 2016


Austin Wilkins

CS 478 – Tools for Machine Learning and Data Mining

Association Rule Mining


Association Rule Mining

• Clearly not limited to market-basket analysis
• Associations may be found among any set of attributes
  – If a representative votes Yes on issue A and No on issue C, then he/she votes Yes on issue B
  – People who read poetry and listen to classical music also go to the theater
• May be used in recommender systems


A Market-Basket Analysis Example


Terminology

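• Item: a single element of interest (e.g., one product in a basket)
• Itemset: a set of items
• Transaction: the set of items observed together in one context (e.g., one basket)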


Association Rules

• Let $U$ be a set of items
  – Let $X, Y \subseteq U$
  – $X \cap Y = \emptyset$
• An association rule is an expression of the form $X \Rightarrow Y$, whose meaning is:
  – If the elements of $X$ occur in some context, then so do the elements of $Y$


Quality Measures

• Let $T$ be the set of all transactions
• We define:
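
The definitions themselves did not survive in this transcript. A reconstruction using the standard definitions, which agree with the supports and confidences reported in the Running Apriori slides, would read as follows for a rule $X \Rightarrow Y$ (the symbol $t$ for a single transaction is an assumption introduced here):

```latex
\[
\mathrm{support}(X \Rightarrow Y)
  = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|T|}
\qquad
\mathrm{confidence}(X \Rightarrow Y)
  = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|\{t \in T : X \subseteq t\}|}
\]
```

Support is thus the fraction of all transactions that contain both sides of the rule, and confidence is the fraction of the transactions containing $X$ that also contain $Y$.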


Learning Associations

• The purpose of association rule learning is to find “interesting” rules, i.e., rules that meet the following two user-defined conditions:
  – $\mathrm{support}(X \Rightarrow Y) \geq \mathrm{MinSupport}$
  – $\mathrm{confidence}(X \Rightarrow Y) \geq \mathrm{MinConfidence}$
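
A minimal Python sketch of the two conditions, assuming the usual definitions (support as the fraction of transactions containing all items of X and Y, confidence as that fraction divided by the support of X). The function names and the `baskets` data are hypothetical and do not come from the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Among transactions containing X, the fraction that also contain Y."""
    x, y = frozenset(x), frozenset(y)
    return support(x | y, transactions) / support(x, transactions)


def is_interesting(x, y, transactions, min_support, min_confidence):
    """Check the two user-defined thresholds for the rule X => Y."""
    rule_support = support(frozenset(x) | frozenset(y), transactions)
    return (rule_support >= min_support
            and confidence(x, y, transactions) >= min_confidence)


# Hypothetical baskets for illustration only; they do not come from the slides.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread"}),
]

print(support({"bread", "milk"}, baskets))       # 2 of the 5 baskets
print(confidence({"milk"}, {"bread"}, baskets))  # 2 of the 3 milk baskets
print(is_interesting({"milk"}, {"bread"}, baskets, 0.3, 0.6))
```

Note that the two directions of a rule share the same support but generally differ in confidence, which is why both thresholds are needed. The sketch does not guard against an antecedent that never occurs (division by zero).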


Basic Idea

• Generate all frequent itemsets satisfying the condition on minimum support

• Build all possible rules from these itemsets and check them against the condition on minimum confidence

• All the rules above the minimum confidence threshold are returned for further evaluation
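
The second step (building rules from frequent itemsets and filtering by confidence) can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's implementation: `rules_from_itemsets` is a hypothetical helper, confidence is taken as support(X ∪ Y) / support(X), and the support values are the rounded ones reported in the Running Apriori slides.

```python
from itertools import combinations


def rules_from_itemsets(supports, min_confidence):
    """Build every rule X => Y from each frequent itemset; keep the confident ones.

    `supports` maps frozenset itemsets to their support and must contain
    every non-empty subset of each itemset (true for Apriori output).
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                y = itemset - x
                conf = sup / supports[x]
                if conf >= min_confidence:
                    rules.append((x, y, conf))
    return rules


# Rounded supports as reported in the Running Apriori slides.
supports = {
    frozenset({"DL=High"}): 0.5,
    frozenset({"C=None"}): 0.79,
    frozenset({"DL=High", "C=None"}): 0.43,
}

for x, y, conf in rules_from_itemsets(supports, min_confidence=0.8):
    print(sorted(x), "=>", sorted(y), round(conf, 2))
```

Only itemsets of size two or more yield rules, and each such itemset of size k yields 2^k − 2 candidate rules, so this phase is cheap only when the frequent itemsets are small.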


Apriori Principle

• Theorem:
  – If an itemset is frequent, then all of its subsets must also be frequent (the proof is straightforward)
• Corollary:
  – If an itemset is not frequent, then none of its supersets will be frequent
• In a bottom-up approach, we can discard all non-frequent itemsets
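
The proof rests on one observation: every transaction that contains an itemset also contains each of its subsets, so a subset's support can never be lower than the itemset's. The corollary gives a cheap pruning test, sketched below in Python. The helper name `has_infrequent_subset` and the example itemsets are hypothetical.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-subset of `candidate` is not in `frequent_smaller`.

    By the corollary, such a candidate cannot be frequent, so its support
    does not need to be counted at all.
    """
    k = len(candidate)
    return any(frozenset(s) not in frequent_smaller
               for s in combinations(sorted(candidate), k - 1))


# Hypothetical frequent 2-itemsets; {b, c} is deliberately missing.
frequent_pairs = {frozenset({"a", "b"}), frozenset({"a", "c"})}

print(has_infrequent_subset({"a", "b", "c"}, frequent_pairs))  # {b, c} is not frequent
```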


AprioriAll

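• $L_1 \leftarrow \emptyset$
• For each item $I_j \in I$
  – $\mathrm{count}(\{I_j\}) = |\{T_i : I_j \in T_i\}|$
  – If $\mathrm{count}(\{I_j\}) \geq \mathrm{MinSupport} \times m$
    • $L_1 \leftarrow L_1 \cup \{(\{I_j\}, \mathrm{count}(\{I_j\}))\}$
• $k \leftarrow 2$
• While $L_{k-1} \neq \emptyset$
  – $L_k \leftarrow \emptyset$
  – For each $(l_1, \mathrm{count}(l_1)), (l_2, \mathrm{count}(l_2)) \in L_{k-1}$
    • If $l_1 = \{j_1, \ldots, j_{k-2}, x\} \wedge l_2 = \{j_1, \ldots, j_{k-2}, y\} \wedge x \neq y$
      – $l \leftarrow \{j_1, \ldots, j_{k-2}, x, y\}$
      – $\mathrm{count}(l) \leftarrow |\{T_i : l \subseteq T_i\}|$
      – If $\mathrm{count}(l) \geq \mathrm{MinSupport} \times m$
        • $L_k \leftarrow L_k \cup \{(l, \mathrm{count}(l))\}$
  – $k \leftarrow k + 1$
• Return $L_1 \cup L_2 \cup \ldots \cup L_{k-1}$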
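
The pseudocode above can be sketched in Python as follows. This is one possible reading, with assumptions: m is taken to be the number of transactions, itemsets are stored as sorted tuples so that two itemsets sharing their first k−2 items can be joined by comparing prefixes, and the function name `apriori_all` and the toy transactions are hypothetical.

```python
def apriori_all(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support * m."""
    transactions = [frozenset(t) for t in transactions]
    m = len(transactions)
    threshold = min_support * m
    items = sorted({item for t in transactions for item in t})

    # L1: frequent single items.
    level = {}
    for item in items:
        count = sum(item in t for t in transactions)
        if count >= threshold:
            level[(item,)] = count

    frequent = {}
    while level:  # while L_{k-1} is not empty
        frequent.update(level)
        next_level = {}
        keys = sorted(level)
        for i, l1 in enumerate(keys):
            for l2 in keys[i + 1:]:
                # Join only itemsets that agree on their first k-2 items.
                if l1[:-1] != l2[:-1]:
                    continue
                candidate = l1 + (l2[-1],)
                count = sum(set(candidate) <= t for t in transactions)
                if count >= threshold:
                    next_level[candidate] = count
        level = next_level
    return frequent


# Toy transactions for illustration only; they do not come from the slides.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
for itemset, count in sorted(apriori_all(toy, min_support=0.5).items()):
    print(itemset, count)
```

Like the pseudocode, the sketch counts every joined candidate directly. Many implementations additionally skip candidates that have an infrequent (k−1)-subset before counting, which saves passes over the data.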


Illustrative Training Set


Running Apriori (I)

• Items:
  – (CH=Bad, .29) (CH=Unknown, .36) (CH=Good, .36)
  – (DL=Low, .5) (DL=High, .5)
  – (C=None, .79) (C=Adequate, .21)
  – (IL=Low, .29) (IL=Medium, .29) (IL=High, .43)
  – (RL=High, .43) (RL=Moderate, .21) (RL=Low, .36)

• Choose MinSupport=.4 and MinConfidence=.8


Running Apriori (II)

• L1 = {(DL=Low, .5); (DL=High, .5); (C=None, .79); (IL=High, .43); (RL=High, .43)}

• L2 = {(DL=High + C=None, .43)}

• L3 = {}
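
The first level can be checked directly from the item supports listed on the previous slide. The Python sketch below only filters those reported values against MinSupport = .4; the variable names are hypothetical. The training set itself is not part of this transcript, so L2 cannot be recomputed here.

```python
# Item supports as reported on the "Running Apriori (I)" slide.
item_supports = {
    "CH=Bad": 0.29, "CH=Unknown": 0.36, "CH=Good": 0.36,
    "DL=Low": 0.5, "DL=High": 0.5,
    "C=None": 0.79, "C=Adequate": 0.21,
    "IL=Low": 0.29, "IL=Medium": 0.29, "IL=High": 0.43,
    "RL=High": 0.43, "RL=Moderate": 0.21, "RL=Low": 0.36,
}
min_support = 0.4

# L1 keeps only the single items whose support reaches the threshold.
l1 = {item: s for item, s in item_supports.items() if s >= min_support}
print(l1)
```

Eight of the thirteen items fall below the threshold, so by the Apriori principle no itemset containing any of them needs to be considered at level 2.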


Running Apriori (III)

• Two possible rules:
  – DL = High $\Rightarrow$ C = None (A)
  – C = None $\Rightarrow$ DL = High (B)

• Confidences:
  – Conf(A) = .86: Retain
  – Conf(B) = .54: Ignore
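
These confidences can be reproduced from the rounded supports in L1 and L2, assuming confidence is computed as the support of the whole itemset divided by the support of the antecedent:

```latex
\[
\mathrm{Conf}(A) = \frac{\mathrm{support}(\text{DL=High},\ \text{C=None})}{\mathrm{support}(\text{DL=High})}
                 = \frac{.43}{.5} = .86 \geq .8
\]
\[
\mathrm{Conf}(B) = \frac{\mathrm{support}(\text{DL=High},\ \text{C=None})}{\mathrm{support}(\text{C=None})}
                 = \frac{.43}{.79} \approx .54 < .8
\]
```

Both rules come from the same itemset and therefore share its support; only the denominator changes with the direction of the rule.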


Summary

• Note the following about Apriori:
  – A “true” data mining algorithm
  – Despite popularity, real reported applications are few
  – Easy to implement with a sparse matrix and simple sums
  – Computationally expensive
    • Actual run-time depends on MinSupport
    • In the worst case, time complexity is $O(2^n)$
  – Not strictly an associations learner
    • Induces rules, which are inherently unidirectional
    • There are alternatives (e.g., GRI)
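
The worst-case bound can be made concrete with a short Python sketch, assuming n denotes the number of distinct items (the slide does not define it). If MinSupport is low enough that nothing is pruned, every non-empty subset of the items becomes a candidate. The helper `all_itemsets` is hypothetical.

```python
from itertools import combinations


def all_itemsets(items):
    """Enumerate every non-empty subset of `items` (the unpruned search space)."""
    items = sorted(items)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


for n in (5, 10, 15):
    count = len(all_itemsets(range(n)))
    print(n, count, count == 2 ** n - 1)
```

Raising MinSupport shrinks L1, and with it the space of candidates that later levels can reach, which is why the actual run-time depends so strongly on that threshold.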
