Rule Mining

Apr 03, 2018

    Association Rule Mining


The Task: Two Ways of Defining It

General
Input: a collection of instances
Output: rules to predict the values of any attribute(s) (not just the class attribute) from the values of other attributes
E.g. if temperature = cool then humidity = normal
If the right-hand side of a rule has only the class attribute, then the rule is a classification rule
Distinction: classification rules are applied together, as sets of rules

Specific: market-basket analysis
Input: a collection of transactions
Output: rules to predict the occurrence of any item(s) from the occurrence of other items in a transaction
E.g. {Milk, Diaper} -> {Beer}

    General rule structure: Antecedents -> Consequents


    Association Rule Mining

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

    Market-Basket transactions

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

    Example of Association Rules

{Diaper} -> {Beer}
{Milk, Bread} -> {Eggs, Coke}
{Beer, Bread} -> {Milk}

Implication means co-occurrence, not causality!


    Definition: Frequent Itemset

Itemset
A collection of one or more items. Example: {Milk, Bread, Diaper}

k-itemset
An itemset that contains k items

Support count (σ)
Frequency of occurrence of an itemset
E.g. σ({Milk, Bread, Diaper}) = 2

Support (s)
Fraction of transactions that contain an itemset
E.g. s({Milk, Bread, Diaper}) = 2/5

Frequent Itemset
An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke


    Definition: Association Rule

Association Rule
An implication expression of the form X -> Y, where X and Y are itemsets.
Example: {Milk, Diaper} -> {Beer}

Rule Evaluation Metrics

Support (s)
Fraction of transactions that contain both X and Y:
s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4

Confidence (c)
Measures how often items in Y appear in transactions that contain X:
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67

    TID Items

    1 Bread, Milk

    2 Bread, Diaper, Beer, Eggs

    3 Milk, Diaper, Beer, Coke

    4 Bread, Milk, Diaper, Beer

    5 Bread, Milk, Diaper, Coke
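
To make the two metrics concrete, here is a minimal Python sketch (mine, not from the slides; the helper names are illustrative) that recomputes them for {Milk, Diaper} -> {Beer} over the five example transactions:

```python
# Minimal sketch: support and confidence of a rule X -> Y over the
# example market-basket transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(X, Y, transactions):
    """s(X -> Y): fraction of transactions containing both X and Y."""
    return support_count(X | Y, transactions) / len(transactions)

def confidence(X, Y, transactions):
    """c(X -> Y): sigma(X union Y) / sigma(X)."""
    return support_count(X | Y, transactions) / support_count(X, transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}
print(support(X, Y, transactions))     # 0.4
print(confidence(X, Y, transactions))  # 0.666...
```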


    Association Rule Mining Task

Given a set of transactions T, the goal of association rule mining is to find all rules having
support ≥ minsup threshold
confidence ≥ minconf threshold

    Brute-force approach:

    List all possible association rules

    Compute the support and confidence for each rule

Prune rules that fail the minsup and minconf thresholds

    Computationally prohibitive!
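
To quantify "prohibitive": each of the d items can appear in the antecedent, in the consequent, or in neither, which gives 3^d assignments; subtracting the assignments with an empty antecedent or empty consequent yields R = 3^d - 2^(d+1) + 1 possible rules. A two-line check of this standard count (not stated on these slides):

```python
# Total possible association rules over d distinct items.
def total_rules(d):
    return 3**d - 2**(d + 1) + 1

print(total_rules(6))  # 602 candidate rules for just 6 items
```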


    Mining Association Rules

Example of Rules:
{Milk, Diaper} -> {Beer} (s=0.4, c=0.67)
{Milk, Beer} -> {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} -> {Milk} (s=0.4, c=0.67)
{Beer} -> {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} -> {Milk, Beer} (s=0.4, c=0.5)
{Milk} -> {Diaper, Beer} (s=0.4, c=0.5)

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

    Observations:

All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}

Rules originating from the same itemset have identical support but can have different confidence

Thus, we may decouple the support and confidence requirements


    Mining Association Rules

Two-step approach:

1. Frequent Itemset Generation
Generate all itemsets whose support ≥ minsup

2. Rule Generation
Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive.


Power Set

Given a set S, the power set P is the set of all subsets of S.
Known property of power sets: if S has n elements, P has N = 2^n elements.

Examples:
For S = {}, P = {{}}, N = 2^0 = 1
For S = {Milk}, P = {{}, {Milk}}, N = 2^1 = 2
For S = {Milk, Diaper}, P = {{}, {Milk}, {Diaper}, {Milk, Diaper}}, N = 2^2 = 4
For S = {Milk, Diaper, Beer}, P = {{}, {Milk}, {Diaper}, {Beer}, {Milk, Diaper}, {Diaper, Beer}, {Beer, Milk}, {Milk, Diaper, Beer}}, N = 2^3 = 8
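
A one-function Python sketch (illustrative, standard library only) that enumerates a power set; each subset is a candidate itemset for the brute-force approach on the next slide:

```python
from itertools import chain, combinations

def power_set(s):
    """All 2^n subsets of s, including the empty set."""
    s = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(s, k) for k in range(len(s) + 1))]

print(power_set({"Milk", "Diaper", "Beer"}))  # 2^3 = 8 subsets
```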



Brute-Force Approach to Frequent Itemset Generation

For an itemset with 3 elements, we have 8 subsets. Each subset is a candidate frequent itemset which needs to be matched against each transaction.

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

1-itemsets:
Itemset   Count
{Milk}    4
{Diaper}  4
{Beer}    3

2-itemsets:
Itemset         Count
{Milk, Diaper}  3
{Diaper, Beer}  3
{Beer, Milk}    2

3-itemsets:
Itemset               Count
{Milk, Diaper, Beer}  2

Important observation: counts of subsets can't be smaller than the count of an itemset!
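
A self-contained sketch of this brute-force matching (illustrative names, standard library only): every non-empty subset is counted against every transaction.

```python
from itertools import chain, combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
items = {"Milk", "Diaper", "Beer"}

# Every non-empty subset of the itemset is a candidate that must be
# matched against each transaction.
candidates = chain.from_iterable(
    combinations(sorted(items), k) for k in range(1, len(items) + 1))
for c in candidates:
    count = sum(1 for t in transactions if set(c) <= t)
    print(c, count)
```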


    Reducing Number of Candidates

Apriori principle:
If an itemset is frequent, then all of its subsets must also be frequent.

The Apriori principle holds due to the following property of the support measure:
Support of an itemset never exceeds the support of its subsets.
This is known as the anti-monotone property of support:

∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)


    Illustrating Apriori Principle

Minimum Support = 3

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets):
(No need to generate candidates involving Coke or Eggs)
Itemset          Count
{Bread, Milk}    3
{Bread, Beer}    2
{Bread, Diaper}  3
{Milk, Beer}     2
{Milk, Diaper}   3
{Beer, Diaper}   3

Triplets (3-itemsets):
Write all possible 3-itemsets and prune the list based on infrequent 2-itemsets.
Itemset                Count
{Bread, Milk, Diaper}  3

If every subset is considered: 6C1 + 6C2 + 6C3 = 41 candidates
With support-based pruning: 6 + 6 + 1 = 13


    Apriori Algorithm

Method:

Let k = 1
Generate frequent itemsets of length 1
Repeat until no new frequent itemsets are identified:
  Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  Prune candidate itemsets containing subsets of length k that are infrequent
  Count the support of each candidate by scanning the DB
  Eliminate candidates that are infrequent, leaving only those that are frequent

Note: this algorithm makes several passes over the transaction list.
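
The following is a compact, illustrative Python implementation of this method, not the original authors' code; it assumes a minimum support count rather than a fraction:

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return all frequent itemsets (frozensets) with their support counts."""
    # k = 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup_count}
    result = dict(frequent)
    k = 1
    while frequent:
        # Generate length-(k+1) candidates by joining length-k frequent itemsets.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune candidates that contain an infrequent length-k subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        # Count support with one pass over the transaction list per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= minsup_count}
        result.update(frequent)
        k += 1
    return result

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
for itemset, count in sorted(apriori(transactions, 3).items(),
                             key=lambda kv: len(kv[0])):
    print(sorted(itemset), count)
```

With minsup count 3 on the example data this reproduces the previous slide: four frequent items, four frequent pairs, and the single frequent triplet {Bread, Milk, Diaper}.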


    Rule Generation

Given a frequent itemset L, find all non-empty proper subsets f ⊂ L such that f -> L − f satisfies the minimum confidence requirement.

If {A,B,C,D} is a frequent itemset, the candidate rules are:
ABC -> D, ABD -> C, ACD -> B, BCD -> A,
A -> BCD, B -> ACD, C -> ABD, D -> ABC,
AB -> CD, AC -> BD, AD -> BC, BC -> AD,
BD -> AC, CD -> AB

If |L| = k, then there are 2^k − 2 candidate association rules (ignoring L -> {} and {} -> L).

Because rules are generated from frequent itemsets, they automatically satisfy the minimum support threshold. Rule generation therefore only needs to ensure that rules satisfy the minimum confidence threshold.
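
An illustrative sketch of this enumeration (my names; support counts are recomputed here for clarity, whereas a real miner would reuse the counts from the frequent-itemset phase):

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count of an itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rules(L, minconf):
    """Yield all 2^k - 2 candidate rules f -> L - f meeting minconf."""
    L = frozenset(L)
    for r in range(1, len(L)):
        for antecedent in map(frozenset, combinations(L, r)):
            conf = sigma(L) / sigma(antecedent)
            if conf >= minconf:
                yield sorted(antecedent), sorted(L - antecedent), conf

for lhs, rhs, conf in rules({"Bread", "Milk", "Diaper"}, minconf=0.7):
    print(lhs, "->", rhs, round(conf, 2))
```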


    Rule Generation

    How to efficiently generate rules from frequent itemsets?

In general, confidence does not have an anti-monotone property:
c(ABC -> D) can be larger or smaller than c(AB -> D)

But the confidence of rules generated from the same itemset does have an anti-monotone property.
E.g., for L = {A,B,C,D}:
c(ABC -> D) ≥ c(AB -> CD) ≥ c(A -> BCD)

Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule.

Example: consider the following two rules:
{Milk} -> {Diaper, Beer} has c_m = 0.5
{Milk, Diaper} -> {Beer} has c_md = 0.67 > c_m


    Computing Confidence for Rules

Unlike computing support, computing confidence does not require several passes over the transaction list. Supports computed during frequent itemset generation can be reused.

The tables below show support values for all the (non-null) subsets of the itemset {Bread, Milk, Diaper}. Confidence values are shown for the (2^3 − 2) = 6 rules generated from this frequent itemset.

Itemset   Support
{Milk}    4/5 = 0.8
{Diaper}  4/5 = 0.8
{Bread}   4/5 = 0.8

Itemset          Support
{Milk, Diaper}   3/5 = 0.6
{Diaper, Bread}  3/5 = 0.6
{Bread, Milk}    3/5 = 0.6

Itemset                Support
{Bread, Milk, Diaper}  3/5 = 0.6

Example of Rules:
{Bread, Milk} -> {Diaper} (s=0.6, c=1.0)
{Milk, Diaper} -> {Bread} (s=0.6, c=1.0)
{Diaper, Bread} -> {Milk} (s=0.6, c=1.0)
{Bread} -> {Milk, Diaper} (s=0.6, c=0.75)
{Diaper} -> {Milk, Bread} (s=0.6, c=0.75)
{Milk} -> {Diaper, Bread} (s=0.6, c=0.75)


Rule Generation in the Apriori Algorithm

For each frequent k-itemset, where k > 2:
Generate high-confidence rules with one item in the consequent.
Using these rules, iteratively generate high-confidence rules with more than one item in the consequent.
If any rule has low confidence, then all the other rules containing its consequent can be pruned (not generated).

Example for {Bread, Milk, Diaper}:

1-item rules:
{Bread, Milk} -> {Diaper}
{Milk, Diaper} -> {Bread}
{Diaper, Bread} -> {Milk}

2-item rules:
{Bread} -> {Milk, Diaper}
{Diaper} -> {Milk, Bread}
{Milk} -> {Diaper, Bread}


    Evaluation

Support and confidence, as used by Apriori, allow a lot of rules which are not necessarily interesting.

Two options to extract interesting rules:
Using subjective knowledge
Using objective measures (measures better than confidence)

Subjective approaches:
Visualization: users are allowed to interactively verify the discovered rules
Template-based approach: filter out rules that do not fit user-specified templates
Subjective interestingness measure: filter out rules that are obvious (bread -> butter) and rules that are non-actionable (do not lead to profits)


    Drawback of Confidence

          Coffee   No Coffee   Total
Tea       15       5           20
No Tea    75       5           80
Total     90       10          100

Association Rule: Tea -> Coffee

Confidence = P(Coffee | Tea) = 15/20 = 0.75
but P(Coffee) = 0.9, and P(Coffee | No Tea) = 75/80 = 0.9375

Although its confidence is high, the rule is misleading.
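
In lift terms (introduced on the next slide): lift(Tea -> Coffee) = 0.75 / 0.9 ≈ 0.83 < 1, so buying tea is associated with a lower chance of buying coffee than independence would predict.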


    Objective Measures

Confidence estimates rule quality in terms of the antecedent support but not the consequent support. As seen on the previous slide, the support for the consequent (P(Coffee)) is higher than the rule confidence (P(Coffee | Tea)).

Weka uses other objective measures:
Lift(A -> B) = confidence(A -> B) / support(B) = support(A -> B) / (support(A) * support(B))
Leverage(A -> B) = support(A -> B) - support(A) * support(B)
Conviction(A -> B) = support(A) * support(not B) / support(A, not B)
Conviction inverts the lift ratio and also computes the support for the RHS not being true.
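
As a closing sketch (illustrative, not Weka's implementation), the three measures for Tea -> Coffee, with supports read off the contingency table on the previous slide:

```python
# Lift, leverage, and conviction for Tea -> Coffee.
s_A = 20 / 100        # support(Tea)
s_B = 90 / 100        # support(Coffee)
s_AB = 15 / 100       # support(Tea and Coffee)
s_A_notB = 5 / 100    # support(Tea and not Coffee)

confidence = s_AB / s_A                  # 0.75
lift = s_AB / (s_A * s_B)                # ~0.83: < 1 flags negative association
leverage = s_AB - s_A * s_B              # -0.03: below independence
conviction = s_A * (1 - s_B) / s_A_notB  # 0.4: < 1 also flags a poor rule
print(confidence, lift, leverage, conviction)
```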