Page 1: Association Rule Mining

Tilani Gunawardena

Algorithms: Mining Association Rules

Page 2: Association Rule Mining

• Data Mining: uncovering and discovering hidden and potentially useful information from your data

• Descriptive Information
– Find patterns that are human-interpretable
– Ex: Clustering, Association Rule Mining

• Predictive Information
– Find the value of an attribute using the values of other attributes
– Ex: Classification, Regression

Descriptive & Predictive Information/Model

Page 3: Association Rule Mining

• A typical and widely used example of an association rules application is market basket analysis.
– Frequent patterns are patterns that appear frequently in a data set.
– Ex: Milk and Bread, which appear frequently together in a transaction data set.

• Other names: Frequent Item Set Mining, Association Rule Mining, Market Basket Analysis, Link Analysis, etc.

Introduction

Page 4: Association Rule Mining
Page 5: Association Rule Mining

• Association Rules: describe association relationships among the attributes in a set of relevant data.

• The goal is to find relationships between objects that are frequently used together.

• Association rule mining finds all sets of items (itemsets) that have support greater than the minimum support, then uses those large itemsets to generate the desired rules that have confidence greater than the minimum confidence.

Association Rules

Page 6: Association Rule Mining

• If a customer buys milk, then he may also buy cereal; or, if a customer buys a tablet computer, then he may also buy a case (cover).

• There are two basic criteria that association rules use: Support and Confidence.
– They identify the relationships and rules generated by analysing data for frequently used if/then patterns.
– Association rules usually need to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time.

Page 7: Association Rule Mining

• Rule: X ⇒ Y
• Support: probability that a transaction contains X and Y = applicability of the rule
– Support = P(X ∪ Y) = (number of transactions containing both X and Y) / (total number of transactions)
• Confidence: conditional probability that a transaction having X also contains Y = strength of the rule
– Confidence = P(Y | X) = support(X ∪ Y) / support(X)

Coverage = support
Accuracy = confidence

Concepts
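These two measures are easy to compute directly. A minimal Python sketch (not from the slides; the function names are illustrative):

    def support(transactions, items):
        """Fraction of transactions that contain every item in `items`."""
        items = set(items)
        return sum(1 for t in transactions if items <= set(t)) / len(transactions)

    def confidence(transactions, lhs, rhs):
        """Strength of lhs => rhs: P(rhs | lhs), i.e. the support of the
        whole rule divided by the support of the left-hand side."""
        return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)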

Page 8: Association Rule Mining

• Both Confidence and Support should be large.
• By convention, confidence and support values are written as percentages (%).

• Item Set: a set of items
• k-Item Set: an item set that contains k items
– {A,B} is a 2-item set.
• Frequency, Support Count, Count: the number of transactions that contain the item set
• Frequent Itemsets: itemsets that occur frequently (more than the minimum support)

– A set of all items in a store: I = {i1, i2, i3, …, im}
– A set of all transactions (Transaction Database T):
• T = {t1, t2, t3, t4, …, tn}
• Each ti is a set of items such that ti ⊆ I
• Each transaction ti has a transaction ID (TID)

Concepts

Page 9: Association Rule Mining

Example:

Rule Support Confidence

A ⇒ D 2/5 2/3

C ⇒ A 2/5 2/4

A ⇒ C 2/5 2/3

B & C ⇒ D 1/5 1/3

Page 10: Association Rule Mining

TID Items Bought

2000 A,B,C

1000 A,C

4000 A,D

5000 B,E,F

Example:

Minimum Support = 50%, Minimum Confidence = 50%

A ⇒ C (Sup = 50%, Conf = 66.6%)

C ⇒ A (Sup = 50%, Conf = 100%)
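A quick check of these numbers with the support/confidence sketch from Page 7 (the transaction list mirrors the table above):

    transactions = [{'A', 'B', 'C'}, {'A', 'C'}, {'A', 'D'}, {'B', 'E', 'F'}]
    print(support(transactions, {'A', 'C'}))       # 0.5    -> Sup = 50%
    print(confidence(transactions, {'A'}, {'C'}))  # 0.666  -> Conf = 66.6%
    print(confidence(transactions, {'C'}, {'A'}))  # 1.0    -> Conf = 100%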

Page 11: Association Rule Mining

• Naive method for finding association rules:
– Use the separate-and-conquer method
– Treat every possible combination of attribute values as a separate class
• Two problems:
– Computational complexity
– The resulting number of rules (which would have to be pruned on the basis of support and confidence)
• But: we can look for high-support rules directly!

Association Rules

Page 12: Association Rule Mining

• Support: the number of instances correctly covered by an association rule
– The same as the number of instances covered by all tests in the rule (LHS and RHS!)
• Item: one test / attribute-value pair
• Item set: all items occurring in a rule
• Goal: only rules that exceed a pre-defined support
– ⇒ Do it by finding all item sets with the given minimum support and generating rules from them!

Item Sets

Page 13: Association Rule Mining

Example: Weather data

Outlook Temp Humidity Windy Play

Sunny Hot High False No

Sunny Hot High True No

Overcast Hot High False Yes

Rainy Mild High False Yes

Rainy Cool Normal False Yes

Rainy Cool Normal True No

Overcast Cool Normal True Yes

Sunny Mild High False No

Sunny Cool Normal False Yes

Rainy Mild Normal False Yes

Sunny Mild Normal True Yes

Overcast Mild High True Yes

Overcast Hot Normal False Yes

Rainy Mild High True No

Page 14: Association Rule Mining

1-item sets: Outlook=Sunny (5); Temperature=Cool (4); …
2-item sets: Outlook=Sunny Temperature=Hot (2); Outlook=Sunny Humidity=High (3); …
3-item sets: Outlook=Sunny Temperature=Hot Humidity=High (2); Outlook=Sunny Humidity=High Windy=False (2); …
4-item sets: Outlook=Sunny Temperature=Hot Humidity=High Play=No (2); Outlook=Rainy Temperature=Mild Windy=False Play=Yes (2); …

Item sets for weather data

In total (with a minimum support of two): 12 one-item sets, 47 two-item sets, 39 three-item sets, 6 four-item sets, and 0 five-item sets.

Page 15: Association Rule Mining

• Once all item sets with minimum support have been generated, we can turn them into rules.
• Example:
– Humidity=Normal, Windy=False, Play=Yes (4)
• Seven (2^N − 1, with N = 3 items) potential rules (see the sketch after this list):
– If Humidity=Normal and Windy=False then Play=Yes 4/4
– If Humidity=Normal and Play=Yes then Windy=False 4/6
– If Windy=False and Play=Yes then Humidity=Normal 4/6
– If Humidity=Normal then Windy=False and Play=Yes 4/7
– If Windy=False then Play=Yes and Humidity=Normal 4/8
– If Play=Yes then Humidity=Normal and Windy=False 4/9
– If – then Humidity=Normal and Windy=False and Play=Yes 4/14

Generating rules from an item set
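A small sketch (not from the slides; names are illustrative) that enumerates all 2^N − 1 candidate rules of an N-item set by letting the antecedent range over every proper subset, including the empty one:

    from itertools import combinations

    def candidate_rules(itemset):
        """Yield (lhs, rhs) for every split of `itemset` with a nonempty rhs."""
        items = sorted(itemset)
        for k in range(len(items)):              # antecedent size 0 .. N-1
            for lhs in combinations(items, k):
                rhs = tuple(i for i in items if i not in lhs)
                yield lhs, rhs

    rules = list(candidate_rules({'Humidity=Normal', 'Windy=False', 'Play=Yes'}))
    print(len(rules))   # 7 == 2**3 - 1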

Page 16: Association Rule Mining

• Rules with support ≥ 2 and confidence = 100%:

Rules for weather data

Association rule Sup. Conf.

1 Humidity=Normal Windy=False ⇒ Play=Yes 4 100%

2 Temperature=Cool ⇒ Humidity=Normal 4 100%

3 Outlook=Overcast ⇒ Play=Yes 4 100%

4 Temperature=Cool Play=Yes ⇒ Humidity=Normal 3 100%

… … … …

58 Outlook=Sunny Temperature=Hot ⇒ Humidity=High 2 100%

• In total: 3 rules with support four, 5 with support three, and 50 with support two.

Page 17: Association Rule Mining

• Item set:
– Temperature=Cool, Humidity=Normal, Windy=False, Play=Yes (2)

• Resulting rules (all with 100% confidence):
– Temperature=Cool, Windy=False ⇒ Humidity=Normal, Play=Yes
– Temperature=Cool, Windy=False, Humidity=Normal ⇒ Play=Yes
– Temperature=Cool, Windy=False, Play=Yes ⇒ Humidity=Normal

• Due to the following "frequent" item sets:
– Temperature=Cool, Windy=False (2)
– Temperature=Cool, Humidity=Normal, Windy=False (2)
– Temperature=Cool, Windy=False, Play=Yes (2)

Example rules from the same set

Page 18: Association Rule Mining

• How can we efficiently find all frequent item sets?
• Finding one-item sets is easy.
• Idea: use one-item sets to generate two-item sets, two-item sets to generate three-item sets, …
– If (A B) is a frequent item set, then (A) and (B) have to be frequent item sets as well!
– In general: if X is a frequent k-item set, then all (k−1)-item subsets of X are also frequent.
– ⇒ Compute k-item sets by merging (k−1)-item sets.

Generating item sets efficiently

Page 19: Association Rule Mining

• Given: five three-item sets
– (A B C), (A B D), (A C D), (A C E), (B C D)

• Candidate four-item sets:
– (A B C D) OK because of (A C D), (B C D)
– (A C D E) not OK because of (C D E)

• Final check by counting instances in the dataset!
• The (k−1)-item sets are stored in a hash table.

Example
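A minimal Python sketch of this candidate-generation step (names are illustrative, not from the slides): merge two frequent (k−1)-item sets that share their first k−2 items, then prune any candidate that has an infrequent (k−1)-subset. The surviving candidates still need the final support count against the dataset.

    from itertools import combinations

    def gen_candidates(freq_km1):
        """freq_km1: frozensets of size k-1; returns candidate k-item sets."""
        prev = sorted(tuple(sorted(s)) for s in freq_km1)
        cands = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                a, b = prev[i], prev[j]
                if a[:-1] == b[:-1]:                  # share the first k-2 items
                    cand = frozenset(a) | frozenset(b)
                    # prune: every (k-1)-subset must itself be frequent
                    if all(frozenset(sub) in freq_km1
                           for sub in combinations(sorted(cand), len(a))):
                        cands.add(cand)
        return cands

    three = {frozenset(s) for s in ['ABC', 'ABD', 'ACD', 'ACE', 'BCD']}
    print(gen_candidates(three))
    # only {A, B, C, D} survives; (A C D E) is pruned because a 3-item
    # subset such as (C D E) is not among the frequent 3-item sets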

Page 20: Association Rule Mining

• Two steps:
– Find all itemsets that have minimum support (frequent itemsets, also called large itemsets)
– Use the frequent itemsets to generate rules

• Key idea: a subset of a frequent itemset must also be a frequent itemset.
– If {I1, I2} is a frequent itemset, then {I1} and {I2} must be frequent itemsets as well.

• An iterative approach to find frequent itemsets (sketched below)

Apriori Algorithm
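A compact sketch of that iterative loop (illustrative code, not the slides' own; it reuses gen_candidates from the Page 19 sketch):

    def apriori(transactions, min_count):
        """Return {frequent itemset: support count} at the given minimum count."""
        counts, singles = {}, {}
        for t in transactions:                       # count the 1-itemsets
            for item in t:
                s = frozenset([item])
                singles[s] = singles.get(s, 0) + 1
        freq = {s for s, c in singles.items() if c >= min_count}
        counts.update({s: singles[s] for s in freq})
        while freq:                                  # grow k by one each pass
            cand = gen_candidates(freq)
            cand_counts = {c: sum(1 for t in transactions if c <= set(t))
                           for c in cand}            # final check on the data
            freq = {c for c, n in cand_counts.items() if n >= min_count}
            counts.update({c: cand_counts[c] for c in freq})
        return counts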

Page 21: Association Rule Mining

TID Items

100 1 3 4

200 2 3 5

300 1 2 3 5

400 2 5

500 1 3 5

Apriori Algorithm Example 2:

Minimum Support Count = 2

Candidate list of 1-itemsets:

Itemset Support
{1} 3
{2} 3
{3} 4
{4} 1
{5} 4

Frequent list of 1-itemsets:

Itemset Support
{1} 3
{2} 3
{3} 4
{5} 4

Candidate list of 2-itemsets:

Itemset Support
{1,2} 1
{1,3} 3
{1,5} 2
{2,3} 2
{2,5} 3
{3,5} 3

Frequent list of 2-itemsets:

Itemset Support
{1,3} 3
{1,5} 2
{2,3} 2
{2,5} 3
{3,5} 3

A subset of a frequent itemset must also be a frequent itemset.

Page 22: Association Rule Mining

TID Items

100 1 3 4

200 2 3 5

300 1 2 3 5

400 2 5

500 1 3 5

Apriori Algorithm Example:

Minimum Support Count =2

Candidate list of 3-itemsets (prune any candidate whose 2-item subsets are not all in FI2, the frequent list of 2-itemsets):

Itemset 2-item subsets In FI2?
{1,2,3} {1,2},{1,3},{2,3} No
{1,2,5} {1,2},{1,5},{2,5} No
{1,3,5} {1,3},{1,5},{3,5} Yes
{2,3,5} {2,3},{2,5},{3,5} Yes

Frequent list of 3-itemsets:

Itemset Support
{1,3,5} 2
{2,3,5} 2

Frequent list of 2-itemsets (for reference):

Itemset Support
{1,3} 3
{1,5} 2
{2,3} 2
{2,5} 3
{3,5} 3

A subset of a frequent itemset must also be a frequent itemset.

Page 23: Association Rule Mining

TID Items

100 1 3 4

200 2 3 5

300 1 2 3 5

400 2 5

500 1 3 5

Apriori Algorithm Example:

Minimum Support Count =2

Candidate list of 4-itemsets (prune any candidate whose 3-item subsets are not all in FI3, the frequent list of 3-itemsets):

Itemset 3-item subsets In FI3?
{1,2,3,5} {1,2,3},{1,2,5},{1,3,5},{2,3,5} No

Itemset Support
{1,2,3,5} 1

Frequent list of 4-itemsets:

Itemset Support
(empty)

Frequent list of 3-itemsets (for reference):

Itemset Support
{1,3,5} 2
{2,3,5} 2

A subset of a frequent itemset must also be a frequent itemset; a candidate joins the frequent list only if its support is large enough.
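Running the apriori sketch from Page 20 on this transaction database reproduces the walkthrough:

    db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
    freq = apriori(db, min_count=2)
    print(freq[frozenset({1, 3, 5})])        # 2
    print(freq[frozenset({2, 3, 5})])        # 2
    print(frozenset({1, 2, 3, 5}) in freq)   # False: no frequent 4-itemset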

Page 24: Association Rule Mining

• The Apriori algorithm takes advantage of the fact that any subset of a frequent itemset is also a frequent itemset.

• The algorithm can therefore reduce the number of candidates being considered by exploring only the itemsets whose support count meets the minimum support count.

• Any itemset with an infrequent subset must itself be infrequent, so all such itemsets can be pruned.

Apriori Algorithm

Page 25: Association Rule Mining

• Build a candidate list of k-itemsets, then extract a frequent list of k-itemsets using the support count.

• After that, we use the frequent list of k-itemsets to determine the candidate and frequent lists of (k+1)-itemsets.

• We use pruning to do that.
• We repeat until we have an empty candidate or frequent list of k-itemsets.
– Then we return the list of (k−1)-itemsets.

Algorithm

Page 26: Association Rule Mining

• Now we have the list of frequent itemsets.

• Generate all nonempty subsets for each frequent itemset I (see the sketch below):
– For I = {1,3,5}, all nonempty subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}
– For I = {2,3,5}, all nonempty subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}

Generate Association Rules

Frequent list of 3-itemsets:

Itemset Support
{1,3,5} 2/5
{2,3,5} 2/5
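A one-function sketch (illustrative name) of enumerating the nonempty proper subsets of a frequent itemset:

    from itertools import combinations

    def nonempty_proper_subsets(itemset):
        """Yield every nonempty proper subset of `itemset` as a frozenset."""
        items = sorted(itemset)
        for k in range(1, len(items)):
            for sub in combinations(items, k):
                yield frozenset(sub)

    print(sorted(tuple(sorted(s)) for s in nonempty_proper_subsets({1, 3, 5})))
    # [(1,), (1, 3), (1, 5), (3,), (3, 5), (5,)]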

Page 27: Association Rule Mining

• For a rule X ⇒ Y, Confidence = support(X ∪ Y) / support(X)

• For every nonempty subset s of I, output the rule:

s ⇒ (I − s)

if Confidence ≥ min_confidence, where min_confidence is the minimum confidence threshold.

Let us assume:
• The minimum confidence threshold is 60%.
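Tying the sketches together (again with illustrative names): for each frequent itemset I, emit every rule s ⇒ (I − s) that clears the confidence threshold. Note that counts[s] always exists, because every subset of a frequent itemset is itself frequent.

    def generate_rules(transactions, min_count, min_conf):
        """Return (lhs, rhs, confidence) triples for all qualifying rules."""
        counts = apriori(transactions, min_count)
        rules = []
        for itemset, count in counts.items():
            if len(itemset) < 2:
                continue
            for s in nonempty_proper_subsets(itemset):
                conf = count / counts[s]      # support(I) / support(s)
                if conf >= min_conf:
                    rules.append((set(s), set(itemset - s), conf))
        return rules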

Page 28: Association Rule Mining

• R1: 1 & 3 ⇒ 5 – Confidence =
• R2: 1 & 5 ⇒ 3 – Confidence =
• R3: 3 & 5 ⇒ 1 – Confidence =
• R4: 1 ⇒ 3 & 5 – Confidence =
• R5: 3 ⇒ 1 & 5 – Confidence =
• R6: 5 ⇒ 1 & 3 – Confidence =

TID Items

100 1 3 4

200 2 3 5

300 1 2 3 5

400 2 5

500 1 3 5

For I = {1,3,5}, all nonempty subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}

Page 29: Association Rule Mining

• R1: 1 & 3 ⇒ 5 – Confidence = 2/3 = 66.66% – R1 is selected
• R2: 1 & 5 ⇒ 3 – Confidence = 2/2 = 100% – R2 is selected
• R3: 3 & 5 ⇒ 1 – Confidence = 2/3 = 66.66% – R3 is selected
• R4: 1 ⇒ 3 & 5 – Confidence = 2/3 = 66.66% – R4 is selected
• R5: 3 ⇒ 1 & 5 – Confidence = 2/4 = 50% – R5 is rejected
• R6: 5 ⇒ 1 & 3 – Confidence = 2/4 = 50% – R6 is rejected

TID Items

100 1 3 4

200 2 3 5

300 1 2 3 5

400 2 5

500 1 3 5

For I = {1,3,5}, all nonempty subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}

Page 30: Association Rule Mining

• R7: 2 & 3 ⇒ 5 – Confidence = 2/2 = 100% – R7 is selected
• R8: 2 & 5 ⇒ 3 – Confidence = 2/3 = 66.66% – R8 is selected
• R9: 3 & 5 ⇒ 2 – Confidence = 2/3 = 66.66% – R9 is selected
• R10: 2 ⇒ 3 & 5 – Confidence = 2/3 = 66.66% – R10 is selected
• R11: 3 ⇒ 2 & 5 – Confidence = 2/4 = 50% – R11 is rejected
• R12: 5 ⇒ 2 & 3 – Confidence = 2/4 = 50% – R12 is rejected

TID Items

100 1 3 4

200 2 3 5

300 1 2 3 5

400 2 5

500 1 3 5

For I = {2,3,5}, all nonempty subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}
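Running the generate_rules sketch on the same database with a minimum support count of 2 and a 60% confidence threshold reproduces this outcome: from the two frequent 3-itemsets, R1–R4 and R7–R10 are selected, while the four 50%-confidence rules are rejected. (The call also emits rules from the frequent 2-itemsets, which the slides do not walk through, so the example filters those out.)

    db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
    for lhs, rhs, conf in generate_rules(db, min_count=2, min_conf=0.6):
        if len(lhs) + len(rhs) == 3:          # rules from the 3-itemsets only
            print(sorted(lhs), '=>', sorted(rhs), f'{conf:.2%}')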