Association Rule Mining

Feb 14, 2016

Page 1: Association Rule Mining

CS685 : Special Topics in Data Mining, UKY

The UNIVERSITY of KENTUCKY

Association Rule Mining

CS 685: Special Topics in Data Mining, Spring 2008

Jinze Liu

Page 2: Association Rule Mining

Outline

• What is association rule mining?
• Methods for association rule mining
• Extensions of association rules

Page 3: Association Rule Mining

What Is Association Rule Mining?

• Frequent patterns: patterns (set of items, sequence, etc.) that occur frequently in a database [AIS93]
• Frequent pattern mining: finding regularities in data
– What products were often purchased together?
  • Beer and diapers?!
– What are the subsequent purchases after buying a car?
– Can we automatically profile customers?

Page 4: Association Rule Mining

Basics

• Itemset: a set of items
– E.g., acm = {a, c, m}
• Support of an itemset: the number of transactions that contain it
– Sup(acm) = 3
• Given min_sup = 3, acm is a frequent pattern
• Frequent pattern mining: find all frequent patterns in a database

Transaction database TDB

TID   Items bought
100   f, a, c, d, g, i, m, p
200   a, b, c, f, l, m, o
300   b, f, h, j, o
400   b, c, k, s, p
500   a, f, c, e, l, p, m, n
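
The support count on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original deck: the names `TDB`, `support`, and `is_frequent` are chosen here for illustration, and the item printed as "I" in transaction 100 is assumed to be a lowercase "i".

```python
# Transaction database TDB from the slide.
# Assumption: the "I" in TID 100 is the item "i".
TDB = {
    100: {"f", "a", "c", "d", "g", "i", "m", "p"},
    200: {"a", "b", "c", "f", "l", "m", "o"},
    300: {"b", "f", "h", "j", "o"},
    400: {"b", "c", "k", "s", "p"},
    500: {"a", "f", "c", "e", "l", "p", "m", "n"},
}


def support(itemset, db):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in db.values() if items <= t)


def is_frequent(itemset, db, min_sup):
    """An itemset is frequent when its support reaches min_sup."""
    return support(itemset, db) >= min_sup


print(support("acm", TDB))         # 3 (transactions 100, 200, 500)
print(is_frequent("acm", TDB, 3))  # True
```

Passing the string "acm" works only because every item here is a single character; with longer item names, pass a set or list instead.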

Page 5: Association Rule Mining

Frequent Pattern Mining: A Road Map

• Boolean vs. quantitative associations
– age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “car”) [support = 1%, confidence = 75%]
• Single-dimensional vs. multi-dimensional associations
• Single-level vs. multiple-level analysis
– What brands of beers are associated with what brands of diapers?

Page 6: Association Rule Mining

Extensions & Applications

• Correlation, causality analysis & mining interesting rules
• Maxpatterns and frequent closed itemsets
• Constraint-based mining
• Sequential patterns
• Periodic patterns
• Structural patterns
• Computing iceberg cubes

Page 7: Association Rule Mining

Frequent Pattern Mining Methods

• Apriori and its variations/improvements
• Mining frequent patterns without candidate generation
• Mining max-patterns and closed itemsets
• Mining multi-dimensional, multi-level frequent patterns with flexible support constraints
• Interestingness: correlation and causality

Page 8: Association Rule Mining

Apriori: Candidate Generation-and-test

• Any subset of a frequent itemset must also be frequent (an anti-monotone property)
– A transaction containing {beer, diaper, nuts} also contains {beer, diaper}
– If {beer, diaper, nuts} is frequent, then {beer, diaper} must also be frequent
• No superset of any infrequent itemset should be generated or tested
– Many item combinations can be pruned
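
The pruning argument above can be illustrated with a small sketch. The four transactions below are hypothetical toy data, not taken from the slides, and the helper names `support` and `has_infrequent_subset` are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy transactions, chosen only to illustrate the property.
transactions = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper"},
    {"beer", "nuts", "milk"},
    {"diaper", "milk"},
]


def support(itemset, db):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if set(itemset) <= t)


def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-subset of a k-item candidate is not known frequent."""
    k = len(candidate)
    return any(
        frozenset(s) not in frequent_smaller
        for s in combinations(candidate, k - 1)
    )


# Anti-monotonicity: a subset's support is never below its superset's.
big = {"beer", "diaper", "nuts"}
for r in range(1, len(big)):
    for sub in combinations(sorted(big), r):
        assert support(sub, transactions) >= support(big, transactions)

# With min_sup = 2, {diaper, nuts} is infrequent, so the 3-item superset
# can be discarded without scanning the database for it.
min_sup = 2
frequent_pairs = {
    frozenset(p)
    for p in combinations(sorted(big), 2)
    if support(p, transactions) >= min_sup
}
print(has_infrequent_subset(frozenset(big), frequent_pairs))  # True
```

The check only inspects subsets one item smaller than the candidate, which is enough when the mining proceeds level by level.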

Page 9: Association Rule Mining

Apriori-based Mining

• Generate length (k+1) candidate itemsets from length k frequent itemsets, and

• Test the candidates against DB

Page 10: Association Rule Mining

Apriori Algorithm

• A level-wise, candidate-generation-and-test approach (Agrawal & Srikant 1994)

Database D (Min_sup = 2)

TID   Items
10    a, c, d
20    b, c, e
30    a, b, c, e
40    b, e

Scan D: 1-candidates

Itemset   Sup
a         2
b         3
c         3
d         1
e         3

Freq 1-itemsets

Itemset   Sup
a         2
b         3
c         3
e         3

2-candidates

Itemset
ab
ac
ae
bc
be
ce

Scan D: counting

Itemset   Sup
ab        1
ac        2
ae        1
bc        2
be        3
ce        2

Freq 2-itemsets

Itemset   Sup
ac        2
bc        2
be        3
ce        2

3-candidates

Itemset
bce

Scan D: Freq 3-itemsets

Itemset   Sup
bce       2

Page 11: Association Rule Mining

The Apriori Algorithm

• Ck: candidate itemsets of size k
• Lk: frequent itemsets of size k

• L1 = {frequent items};
• for (k = 1; Lk != ∅; k++) do
– Ck+1 = candidates generated from Lk;
– for each transaction t in database do increment the count of all candidates in Ck+1 that are contained in t
– Lk+1 = candidates in Ck+1 with min_support
• return ∪k Lk;
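
The pseudocode above can be sketched in Python as follows. This is one possible rendering under stated assumptions, not the author's implementation: the function name `apriori` and the dictionary-based bookkeeping are choices made here, candidates are formed by a simple pairwise union rather than an optimized join, and the test data is the four-transaction database D with Min_sup = 2 from the worked example in this deck.

```python
from itertools import combinations


def apriori(db, min_sup):
    """Return {frozenset: support} for every frequent itemset in `db`."""
    # L1: count single items in one scan of the database.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L_k = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(L_k)

    k = 1
    while L_k:
        # C_{k+1}: unions of two frequent k-itemsets that have k+1 items
        # and whose k-subsets are all frequent.
        C_next = set()
        for a, b in combinations(L_k, 2):
            cand = a | b
            if len(cand) == k + 1 and all(
                frozenset(s) in L_k for s in combinations(cand, k)
            ):
                C_next.add(cand)
        # Scan the database; count each candidate contained in a transaction.
        counts = {c: 0 for c in C_next}
        for t in db:
            for c in C_next:
                if c <= t:
                    counts[c] += 1
        # L_{k+1}: candidates that reach the minimum support.
        L_k = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(L_k)
        k += 1
    return frequent


# Database D from the worked example (TIDs 10, 20, 30, 40).
D = [{"a", "c", "d"}, {"b", "c", "e"}, {"a", "b", "c", "e"}, {"b", "e"}]
result = apriori(D, min_sup=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), result[itemset])
```

On D this yields the frequent itemsets a, b, c, e, ac, bc, be, ce, and bce, matching the tables of the worked example.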

Page 12: Association Rule Mining

Important Details of Apriori

• How to generate candidates?
– Step 1: self-joining Lk
– Step 2: pruning
• How to count supports of candidates?
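
The two candidate-generation steps can be sketched as below. This is a hedged illustration rather than the deck's own code: the function name `apriori_gen`, the representation of itemsets as sorted tuples, and the second input `L3` are assumptions made for the example. The self-join pairs itemsets that agree on their first k-1 items; the prune step drops any candidate with an infrequent k-subset.

```python
from itertools import combinations


def apriori_gen(L_k):
    """Generate (k+1)-candidates from frequent k-itemsets (sorted tuples)."""
    L_k = sorted(L_k)
    known = set(L_k)
    candidates = []
    for i, p in enumerate(L_k):
        for q in L_k[i + 1:]:
            # Step 1 (self-join): p and q must agree on the first k-1 items.
            if p[:-1] != q[:-1]:
                break  # sorted order: no later q shares p's prefix
            cand = p + (q[-1],)
            # Step 2 (prune): keep cand only if every k-subset is frequent.
            if all(s in known for s in combinations(cand, len(p))):
                candidates.append(cand)
    return candidates


# Frequent 2-itemsets from the worked example on database D.
L2 = [("a", "c"), ("b", "c"), ("b", "e"), ("c", "e")]
print(apriori_gen(L2))  # [('b', 'c', 'e')]

# Hypothetical frequent 3-itemsets: the join proposes abcd and acde,
# and acde is pruned because its subset ade is not frequent.
L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")]
print(apriori_gen(L3))  # [('a', 'b', 'c', 'd')]
```

Keeping itemsets as sorted tuples lets the join stop early once the shared prefix changes, so not every pair has to be compared.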