Association Rule Mining Ayesha Ali

Association Rule Mining in Data Mining

Apr 14, 2017


Education

Ayesha Ali
Transcript
Page 1: Association Rule Mining in Data Mining

Association Rule Mining

Ayesha Ali

Page 2: Association Rule Mining in Data Mining

Association Analysis

• Discovery of association rules: attribute-value conditions that occur frequently together in a set of data, e.g. market basket
– Given a set of data, find rules that will predict the occurrence of a data item based on the occurrences of other items in the data

• A rule has the form body ⇒ head
– buys(Omar, “milk”) ⇒ buys(Omar, “sugar”)

Page 3: Association Rule Mining in Data Mining

Association Analysis

Page 4: Association Rule Mining in Data Mining

Association Analysis

Location  Business Type
1         Barber, Bakery, Convenience Store, Meat Shop, Fast Food
2         Bakery, Bookstore, Petrol Pump, Convenience Store, Library, Fast Food
3         Carpenter, Electrician, Barber, Hardware Store
4         Bakery, Vegetable Market, Flower Shop, Sweets Shop, Meat Shop
5         Convenience Store, Hospital, Pharmacy, Sports Shop, Gym, Fast Food
6         Internet Café, Gym, Games Shop, Sports Shop, Fast Food, Bakery

Association Rule: X ⇒ Y, e.g. (Fast Food, Bakery) ⇒ (Convenience Store)

Support S: fraction of locations that contain both X and Y, S = P(X ∪ Y)
S(Fast Food, Bakery, Convenience Store) = 2/6 ≈ 0.33

Confidence C: how often the items in Y appear in locations that contain X, C = P(X ∪ Y) / P(X)
C[(Fast Food, Bakery) ⇒ (Convenience Store)] = 0.33 / 0.50 ≈ 0.67
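The support and confidence figures above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `locations`, `support` and `confidence` are illustrative choices, not part of the slides, and each location is treated as one transaction.

```python
# Location table from the slide: each location is one "transaction".
locations = {
    1: {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    2: {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    3: {"Carpenter", "Electrician", "Barber", "Hardware Store"},
    4: {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    5: {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    6: {"Internet Café", "Gym", "Games Shop", "Sports Shop", "Fast Food", "Bakery"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """support(X and Y together) / support(X); assumes X occurs at least once."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


x = {"Fast Food", "Bakery"}
y = {"Convenience Store"}
s = support(x | y, locations)
c = confidence(x, y, locations)
print(f"support = {s:.2f}, confidence = {c:.2f}")
```

The rule's body occurs in 3 of the 6 locations and the full itemset in 2, so the script reports a support of 2/6 and a confidence of 2/3.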

Page 5: Association Rule Mining in Data Mining

Association Analysis

• Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold

• Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds

⇒ Computationally prohibitive!
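To get a feel for why listing every rule is prohibitive, the sketch below counts the rules X ⇒ Y with non-empty, disjoint X and Y over d items. The closed form 3^d − 2^(d+1) + 1 is a standard counting result rather than something stated on the slides, so the code checks it against direct enumeration for small d. The function names are illustrative.

```python
from itertools import combinations


def count_rules_by_enumeration(d):
    """Enumerate every rule X => Y with X, Y non-empty and disjoint over d items."""
    items = range(d)
    total = 0
    for k in range(1, d):                          # size of the body X
        for body in combinations(items, k):
            rest = [i for i in items if i not in body]
            for j in range(1, len(rest) + 1):      # size of the head Y
                for _head in combinations(rest, j):
                    total += 1
    return total


def count_rules_closed_form(d):
    """Closed-form count of the same rules: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1


# The two counts agree for small item sets.
for d in range(2, 7):
    assert count_rules_by_enumeration(d) == count_rules_closed_form(d)

print(count_rules_closed_form(20))
```

For 20 distinct items the closed form gives 3,484,687,250 possible rules, each of which would need its support and confidence computed under the brute-force approach.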

Page 6: Association Rule Mining in Data Mining

Association Analysis

Location  Business Type
1         Barber, Bakery, Convenience Store, Meat Shop, Fast Food
2         Bakery, Bookstore, Petrol Pump, Convenience Store, Library, Fast Food
3         Carpenter, Electrician, Barber, Hardware Store, Meat Shop
4         Bakery, Vegetable Market, Flower Shop, Sweets Shop, Meat Shop
5         Convenience Store, Hospital, Pharmacy, Sports Shop, Gym, Fast Food
6         Internet Café, Gym, Sweets Shop, Sports Shop, Fast Food, Bakery

Association Rules:
(Fast Food, Bakery) ⇒ (Convenience Store)   Support S: .33   Confidence C: .67
(Convenience Store, Bakery) ⇒ (Fast Food)   Support S: .33   Confidence C: 1
(Fast Food, Convenience Store) ⇒ (Bakery)   Support S: .33   Confidence C: .67
(Convenience Store) ⇒ (Fast Food, Bakery)   Support S: .33   Confidence C: .67
(Fast Food) ⇒ (Convenience Store, Bakery)   Support S: .33   Confidence C: .50
(Bakery) ⇒ (Fast Food, Convenience Store)   Support S: .33   Confidence C: .50

Page 7: Association Rule Mining in Data Mining

Association Analysis

Association Rules:
(Fast Food, Bakery) ⇒ (Convenience Store)   Support S: .33   Confidence C: .67
(Convenience Store, Bakery) ⇒ (Fast Food)   Support S: .33   Confidence C: 1
(Fast Food, Convenience Store) ⇒ (Bakery)   Support S: .33   Confidence C: .67
(Convenience Store) ⇒ (Fast Food, Bakery)   Support S: .33   Confidence C: .67
(Fast Food) ⇒ (Convenience Store, Bakery)   Support S: .33   Confidence C: .50
(Bakery) ⇒ (Fast Food, Convenience Store)   Support S: .33   Confidence C: .50

Observations

• The above rules are binary partitions of the same itemset (Fast Food, Bakery, Convenience Store)
• They have identical support but different confidence
• The support and confidence thresholds may be set to different values
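The observations can be checked by enumerating every binary partition of the itemset and computing support and confidence for each. The following is a sketch under the assumption that each location is one transaction; `occurrences` and `rules_from` are hypothetical helper names.

```python
from itertools import combinations

# Location table from the slide; each set is one location's business types.
locations = [
    {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    {"Carpenter", "Electrician", "Barber", "Hardware Store", "Meat Shop"},
    {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    {"Internet Café", "Gym", "Sweets Shop", "Sports Shop", "Fast Food", "Bakery"},
]


def occurrences(itemset):
    """Number of locations that contain every item in `itemset`."""
    return sum(1 for businesses in locations if itemset <= businesses)


def rules_from(itemset):
    """Yield (body, head, support, confidence) for each binary partition.

    Assumes the itemset occurs in at least one location, so no body has count 0.
    """
    items = frozenset(itemset)
    both = occurrences(items)
    for k in range(1, len(items)):
        for body in combinations(sorted(items), k):
            body = frozenset(body)
            head = items - body
            yield body, head, both / len(locations), both / occurrences(body)


rules = list(rules_from({"Fast Food", "Bakery", "Convenience Store"}))
for body, head, s, c in rules:
    print(f"{sorted(body)} => {sorted(head)}  S={s:.2f}  C={c:.2f}")
```

A 3-item set has 2^3 − 2 = 6 binary partitions. Every rule shares the numerator (the count of the full itemset), which is why support is identical, while the denominator (the count of the body) changes from rule to rule.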

Page 8: Association Rule Mining in Data Mining

Mining Association Rules

• Two-step approach:

Step 1. Frequent Itemset Generation: generate all itemsets whose support ≥ minsup

Step 2. Rule Generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Note: Frequent itemset generation is still computationally expensive

Page 9: Association Rule Mining in Data Mining

Mining Association Rules

• Frequent Itemset Generation

[Figure: lattice graph of possible itemsets]

Page 10: Association Rule Mining in Data Mining

Mining Association Rules

• Brute-force approach:
– Each node in the lattice graph is a candidate frequent itemset
– Count the support of each candidate by scanning the database
– N = 6 (number of locations)
– w = 20 (number of unique items): Barber, Bakery, Convenience Store, Meat Shop, Fast Food, Bookstore, Petrol Pump, Library, Carpenter, Electrician, Hardware Store, Vegetable Market, Flower Shop, Sweets Shop, Games Shop, Hospital, Pharmacy, Sports Shop, Gym, Internet Café
– M = 2^20 = 1,048,576 (number of candidate itemsets)
– Complexity ~ O(NMw)
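The size of the brute-force search can be worked out directly from the three quantities on the slide. The snippet below is plain arithmetic; the variable `work` is an illustrative name for the product N·M·w that the O(NMw) bound refers to.

```python
N = 6          # locations (transactions) in the table
w = 20         # unique business types (items)
M = 2 ** w     # nodes in the itemset lattice, as counted on the slide
work = N * M * w

print(f"M = {M:,}")
print(f"N * M * w = {work:,}")

# Each additional item doubles the number of candidate itemsets.
for items in (10, 20, 30):
    print(items, "items ->", f"{2 ** items:,}", "candidates")
```

Even for this six-row table the bound comes to 125,829,120 elementary comparisons, and the candidate count doubles with every extra item.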

Page 11: Association Rule Mining in Data Mining

Mining Association Rules

w = number of unique items in the item set

Page 12: Association Rule Mining in Data Mining

Mining Association Rules

• Frequent Itemset Generation
– Reduce the number of candidates (M)
– Reduce the number of transactions/locations (N)
– Reduce the number of comparisons (NM)
  • Use efficient data structures to store the candidates
  • No need to match every candidate against every transaction/location
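One way to avoid matching every candidate against every location is to store the candidates in a hash-based structure and look up only the subsets that a location actually contains. The slides do not name a specific data structure, so the Python dict used here is an assumption, and `count_with_lookup` is a hypothetical helper.

```python
from itertools import combinations

# Location table from the slide; each set is one location's business types.
transactions = [
    {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    {"Carpenter", "Electrician", "Barber", "Hardware Store"},
    {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    {"Internet Café", "Gym", "Games Shop", "Sports Shop", "Fast Food", "Bakery"},
]


def count_with_lookup(transactions, candidates):
    """Count equal-length candidates by hashing each transaction's k-subsets."""
    counts = {frozenset(c): 0 for c in candidates}
    if not counts:
        return counts
    k = len(next(iter(counts)))              # all candidates share one length
    for t in transactions:
        for subset in combinations(sorted(t), k):
            key = frozenset(subset)
            if key in counts:                # average-case O(1) hash lookup
                counts[key] += 1
    return counts


# Candidate pairs built from four business types, as an example.
survivors = ["Barber", "Bakery", "Convenience Store", "Fast Food"]
pairs = list(combinations(survivors, 2))
pair_counts = count_with_lookup(transactions, pairs)
for pair, n in sorted(pair_counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), n)
```

The work per location now depends on how many k-subsets that location has rather than on the total number of candidates, which pays off when locations are short and the candidate list is long.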

Page 13: Association Rule Mining in Data Mining

Reducing the number of candidates

• Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent

• Important support property:
– Support of an itemset never exceeds the support of its subsets
– This is known as the anti-monotone property of support
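The anti-monotone property can be verified on the location table: every subset of an itemset occurs at least as often as the itemset itself. This is a small sketch with an illustrative helper `occurrences`; it also shows the pruning consequence, namely that an infrequent itemset rules out all of its supersets.

```python
from itertools import combinations

# Location table from the slide; each set is one location's business types.
locations = [
    {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    {"Carpenter", "Electrician", "Barber", "Hardware Store"},
    {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    {"Internet Café", "Gym", "Games Shop", "Sports Shop", "Fast Food", "Bakery"},
]


def occurrences(itemset):
    """Number of locations that contain every item in `itemset`."""
    return sum(1 for businesses in locations if set(itemset) <= businesses)


itemset = ("Bakery", "Convenience Store", "Fast Food")
for k in range(1, len(itemset)):
    for subset in combinations(itemset, k):
        # Anti-monotone property: a subset occurs at least as often as its superset.
        assert occurrences(subset) >= occurrences(itemset)
        print(subset, occurrences(subset), ">=", occurrences(itemset))

# Pruning consequence: once an itemset falls below the minimum count m,
# no superset can reach m, so supersets never need to be counted.
m = 2
infrequent = ("Barber", "Bakery")
superset = infrequent + ("Fast Food",)
assert occurrences(infrequent) < m
assert occurrences(superset) <= occurrences(infrequent)
```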

Page 14: Association Rule Mining in Data Mining

Reducing the number of candidates

Applying Apriori principle

Page 15: Association Rule Mining in Data Mining

Reducing the number of candidates

• N = 20 (number of unique items)
• All possible candidate sets:
– NC1 + NC2 + NC3 + … + NCN = 2^N − 1, where NCk is the number of ways to choose k of the N items
• Minimum occurrence based filtering:

Set m = 2 and L = 1
While (L < N) {
    Scan DB: List = occurrence frequency table of candidate sets of length L
    If no candidate in List then Break
    Filter out all candidate sets with occurrence frequency < m
    L = L + 1
    Create new candidate sets of length L from List
}
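The filtering loop above can be sketched in Python as a level-wise search. This is one possible reading of the pseudocode, not a definitive implementation: the function name `frequent_itemsets` is hypothetical, and the way new candidates are built (combining surviving items and keeping only sets whose shorter subsets all survived) is an assumption based on the Apriori principle, since the slide only says that new candidates are created from the list.

```python
from itertools import combinations

# Location table from the slide; each set is one location's business types.
locations = [
    {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    {"Carpenter", "Electrician", "Barber", "Hardware Store"},
    {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    {"Internet Café", "Gym", "Games Shop", "Sports Shop", "Fast Food", "Bakery"},
]


def frequent_itemsets(transactions, m):
    """Return {itemset: count} for every itemset occurring in at least m transactions."""
    # Length-1 candidates: every distinct item in the database.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    length = 1
    while candidates:
        # Scan DB: occurrence frequency table for candidates of this length.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Filter out candidates with occurrence frequency < m.
        level = {c: n for c, n in counts.items() if n >= m}
        if not level:
            break
        frequent.update(level)
        # Build candidates of the next length from the surviving items, keeping
        # only sets whose shorter subsets all survived (Apriori principle).
        length += 1
        survivors = set(level)
        items = sorted(set().union(*survivors))
        candidates = {
            frozenset(c)
            for c in combinations(items, length)
            if all(frozenset(s) in survivors for s in combinations(c, length - 1))
        }
    return frequent


result = frequent_itemsets(locations, m=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With m = 2 the search reaches (Bakery, Convenience Store, Fast Food), which occurs in 2 locations, without ever counting the supersets of pairs that were already filtered out.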

Page 16: Association Rule Mining in Data Mining

Reducing the number of candidates

Scan 1: occurrence count of each business type

Business Type        Count
Barber               2
Bakery               2
Bookstore            1
Carpenter            1
Convenience Store    3
Electrician          1
Fast Food            3
Flower Shop          1
Gym                  1
Games Shop           1
Hardware Store       1
Hospital             1
Internet Café        1
Library              1
Meat Shop            1
Petrol Pump          1
Pharmacy             1
Sports Shop          1
Sweets Shop          1
Vegetable Market     1

Filter minimum occurrences (remove entries with count < m, m = 2), giving L1:

Business Type        Count
Barber               2
Bakery               2
Convenience Store    3
Fast Food            3

Pairs of two items from L1: 4C2 = 6

Business Type                     Count
(Barber, Bakery)                  1
(Barber, Convenience Store)       1
(Barber, Fast Food)               1
(Bakery, Convenience Store)       2
(Bakery, Fast Food)               3
(Convenience Store, Fast Food)    3

Filter minimum occurrences (remove entries with count < m, m = 2), giving L2:

Business Type                     Count
(Bakery, Convenience Store)       2
(Bakery, Fast Food)               3
(Convenience Store, Fast Food)    3