
Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree

Wei Fan, Kun Zhang, Hong Cheng,

Jing Gao, Xifeng Yan, Jiawei Han,

Philip S. Yu, Olivier Verscheure

How to find good features from semi-structured raw data for classification?

Feature Construction

Most data mining and machine learning models assume structured data of the form (x1, x2, ..., xk) -> y, where the xi's are independent variables and y is the dependent variable.

If y is drawn from a discrete set, the task is classification; if y is drawn from a continuous range, it is regression.

When the feature vectors are good, the differences in accuracy among learners are small.

Question: where do good features come from?

Frequent Pattern-Based Feature Extraction

Some data does not come in pre-defined feature vectors:

Transactions

Biological sequences

Graph databases

Frequent patterns are good candidates for discriminative features. So, how do we mine them?

FP: Sub-graph

[Figure: a discovered sub-graph pattern shared by compounds NSC 4960, NSC 191370, NSC 40773, NSC 164863, and NSC 699181; chemical structure drawings omitted. Example borrowed from George Karypis' presentation.]

Frequent Pattern Feature Vector Representation

        P1  P2  P3
Data1    1   1   0
Data2    1   0   1
Data3    1   1   0
Data4    0   0   1
...
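As a minimal sketch (my own illustration, not the authors' code) of how such a feature-vector table is produced: each example is mapped to a binary vector indicating which mined frequent patterns it contains. The itemsets P1, P2, P3 below are hypothetical.

# Minimal sketch (not the authors' implementation): represent each transaction
# as a binary vector indicating which mined frequent itemsets it contains.
def pattern_features(transactions, patterns):
    """transactions: list of item collections; patterns: list of frequent itemsets."""
    vectors = []
    for t in transactions:
        items = set(t)
        vectors.append([1 if set(p) <= items else 0 for p in patterns])
    return vectors

# Toy example reproducing the table above (P1, P2, P3 are hypothetical itemsets).
patterns = [{"a"}, {"a", "b"}, {"c"}]
data = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"c", "d"}]
print(pattern_features(data, patterns))
# -> [[1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]]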

[Figure: decision tree on the iris data, splitting on Petal.Length < 2.45 and Petal.Width < 1.75 into setosa, versicolor, and virginica.]

In this representation, any classifier you can name applies: NN, DT, SVM, LR.

Mining these predictive features is an NP-hard problem.

100 examples can yield up to 10^10 patterns.

Most are useless

Example: 192 examples.

At 12% support (at least 12% of the examples contain the pattern), itemset mining returns 8,600 patterns. 192 examples vs. 8,600 patterns?

At 4% support, 92,000 patterns. 192 vs. 92,000??

Most patterns have no predictive power and cannot be used to construct features.

Our algorithm finds only 20 highly predictive patterns, which can construct a decision tree with about 90% accuracy.

Data in a "bad" feature space

Discriminative patterns are a non-linear combination of single features; they increase the expressive and discriminative power of the feature space.

An example

 X   Y   C
 0   0   0
 1   1   1
-1   1   1
 1  -1   1
-1  -1   1

The data is not linearly separable in (x, y).

[Figure: the five points plotted in the (x, y) plane.]

New Feature Space

Data is linearly separable in (x, y, F)

Mine & Transform: solving the problem by mapping the data to a different space.

 X   Y   C
 0   0   0
 1   1   1
-1   1   1
 1  -1   1
-1  -1   1

With the new feature F (x = 0, y = 0):

 X   Y   F   C
 0   0   1   0
 1   1   0   1
-1   1   0   1
 1  -1   0   1
-1  -1   0   1

[Figure: the points plotted with the new feature F, now linearly separable.]

Itemset: F: x = 0, y = 0. Association rule: F: x = 0 => y = 0.
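A small sketch (my own illustration, not from the paper) of this mine & transform step on the toy data: appending the itemset feature F = [x = 0 and y = 0] makes the two classes separable by a single threshold on F.

# Sketch (assumption, not the paper's code): add the mined itemset feature
# F = 1 if (x == 0 and y == 0) else 0 and observe that the classes, which are
# not linearly separable in (x, y), are separated by the plane F = 0.5.
data = [(0, 0, 0), (1, 1, 1), (-1, 1, 1), (1, -1, 1), (-1, -1, 1)]  # (x, y, class)

def add_pattern_feature(x, y):
    return (x, y, 1 if (x == 0 and y == 0) else 0)

for x, y, c in data:
    x_, y_, f = add_pattern_feature(x, y)
    predicted = 0 if f > 0.5 else 1          # linear rule on the new feature F
    print((x_, y_, f, c), "predicted:", predicted)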

Computational Issues

A pattern is measured by its "frequency" or support. E.g., frequent subgraphs with sup >= 10%: at least 10% of the examples contain these patterns.

"Ordered" enumeration: one cannot enumerate patterns at sup = 10% without first enumerating all patterns with support above 10%.

This is an NP-hard problem, easily reaching 10^10 patterns for a realistic problem. Most patterns are non-discriminative, yet low-support patterns can have high discriminative power. Random sampling does not work since it is not exhaustive.

Most patterns are useless. Randomly sampling patterns (or blindly enumerating without considering frequency) is useless.

With a small number of examples: if only a subset of the vocabulary is searched, the search is incomplete; if the complete vocabulary is searched, it does not help much but introduces a sample selection bias problem, in particular missing low-support but high-information-gain patterns.

1. Mine frequent patterns (>sup)

[Figure: the dataset is mined into frequent patterns 1-7; patterns 1, 2, and 4 are selected as discriminative.]

2. Select most discriminative patterns;

3. Represent data in the feature space using such patterns;

4. Build classification models.

        F1  F2  F4
Data1    1   1   0
Data2    1   0   1
Data3    1   1   0
Data4    0   0   1
...

[Figure: decision tree on the iris data, splitting on Petal.Length < 2.45 and Petal.Width < 1.75.]

In this representation, any classifier you can name applies: NN, DT, SVM, LR.

Conventional Procedure

Feature Construction and Selection

Two-Step Batch Method
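Before turning to its problems, here is a minimal self-contained sketch of this two-step batch baseline (my own illustration, not the authors' code): mine frequent itemsets above a support threshold, rank them by information gain computed on the whole dataset, and keep the top k as binary features for any off-the-shelf classifier.

from itertools import combinations
from math import log2

def mine_frequent_itemsets(transactions, min_sup, max_len=2):
    """Step 1 (naive enumeration; real systems use Apriori or FP-growth)."""
    items = sorted({i for t in transactions for i in t})
    frequent = []
    for k in range(1, max_len + 1):
        for cand in combinations(items, k):
            sup = sum(set(cand) <= set(t) for t in transactions) / len(transactions)
            if sup >= min_sup:
                frequent.append(frozenset(cand))
    return frequent

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(l) for l in set(labels)) if c)

def info_gain(pattern, transactions, labels):
    """Step 2 criterion: information gain of the pattern over the WHOLE dataset."""
    covered = [l for t, l in zip(transactions, labels) if pattern <= set(t)]
    rest = [l for t, l in zip(transactions, labels) if not pattern <= set(t)]
    n = len(labels)
    cond = (len(covered) / n) * entropy(covered) + (len(rest) / n) * entropy(rest)
    return entropy(labels) - cond

def batch_select(transactions, labels, min_sup=0.1, k=20):
    """Two-step batch method: mine everything first, then select the top-k."""
    patterns = mine_frequent_itemsets(transactions, min_sup)
    ranked = sorted(patterns, key=lambda p: info_gain(p, transactions, labels),
                    reverse=True)
    return ranked[:k]

The selected patterns would then be turned into binary features (as in the table above) and fed to NN, DT, SVM, or LR.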

Two Problems

Mine step: combinatorial explosion.

[Figure: the dataset is mined into frequent patterns 1-7.]

1. Exponential explosion. 2. Patterns are not considered if the minimum support isn't small enough.

Two Problems: Select step

Issue of discriminative power.

[Figure: from frequent patterns 1-7, patterns 1, 2, and 4 are selected as discriminative.]

3. Information gain is evaluated against the complete dataset, NOT on subsets of examples. 4. Correlation is not directly evaluated on the patterns' joint predictability.

Direct Mining & Selection via Model-based Search Tree: Basic Flow

Mined discriminative patterns: a compact set of highly discriminative patterns drawn from the enumeration 1, 2, 3, 4, 5, 6, 7, ...

Divide-and-Conquer Based Frequent Pattern Mining

[Figure: model-based search tree. Node 1 runs "Mine & Select, P: 20%" on the full dataset and splits on the most discriminative feature F (chosen by information gain) into a Y branch and an N branch. Child nodes 2, 3, 4, 5, 6, 7, ... repeat "Mine & Select, P: 20%" on their own subsets, stopping when a node is pure (+) or has few data. Each node acts as both a feature miner and a classifier.]

Global Support: 10 * 20% / 10000 = 0.02%
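A minimal recursive sketch of this basic flow (my reading of the diagram, not the authors' implementation). mine_and_select and contains are hypothetical helpers: the first mines frequent patterns at local support p on the node's data and returns the single pattern with the highest information gain, the second tests whether an example contains a pattern; the "few data" threshold is also an assumption.

# Sketch of the model-based search tree: mine & select at each node, split the
# node's data on the chosen pattern, and recurse on the Y / N branches.
def build_mbt(examples, labels, p=0.20, min_size=10):
    # Stop when the node is pure or has too few examples ("Few Data").
    if len(set(labels)) == 1 or len(examples) < min_size:
        return {"leaf": True, "prediction": max(set(labels), key=labels.count)}

    # Hypothetical helper: mine patterns with support >= p on THIS node's data
    # and return the most discriminative one by information gain.
    pattern = mine_and_select(examples, labels, local_support=p)
    if pattern is None:
        return {"leaf": True, "prediction": max(set(labels), key=labels.count)}

    # Split into the Y branch (contains the pattern) and the N branch.
    yes = [(e, l) for e, l in zip(examples, labels) if contains(e, pattern)]
    no = [(e, l) for e, l in zip(examples, labels) if not contains(e, pattern)]
    if not yes or not no:                     # degenerate split, stop here
        return {"leaf": True, "prediction": max(set(labels), key=labels.count)}

    return {"leaf": False, "pattern": pattern,
            "yes": build_mbt([e for e, _ in yes], [l for _, l in yes], p, min_size),
            "no": build_mbt([e for e, _ in no], [l for _, l in no], p, min_size)}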

Analyses (I)

1. Scalability (Theorem 1): an upper bound, and a "scale down" ratio for obtaining extremely low-support patterns (see the worked example after this list).

2. Bound on the number of returned features (Theorem 2).

4. Non-overfitting.

5. Optimality under exhaustive search.
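To make the "scale down" intuition and the global-support figure from the diagram concrete (using the numbers quoted on the slide: a node with 10 examples, a 20% local support threshold, and 10,000 training examples in total):

\[
\text{global support} \;=\; \frac{p_{\text{node}} \times n_{\text{node}}}{N} \;=\; \frac{20\% \times 10}{10\,000} \;=\; 0.02\%.
\]

So a pattern mined at a comfortable 20% local support deep in the tree corresponds to an extremely low 0.02% support on the full dataset.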

Analyses (II)

3. Subspace is important for discriminative patterns.

On the original set there is no information gain if the pattern occurs in the same proportion of both classes (P1/C1 = P0/C0), where C1 and C0 are the numbers of examples belonging to class 1 and class 0, P1 is the number of examples in C1 that contain a pattern α, and P0 is the number of examples in C0 that contain the same pattern α.

Subsets could still have information gain:
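A hedged numeric illustration (my own numbers, not from the paper): with C1 = C0 = 50 and a pattern α covering P1 = P0 = 10 examples, the information gain on the full set is zero,

\[
IG_{\text{full}}(\alpha) = H\!\left(\tfrac{50}{100}\right) - \tfrac{20}{100}\,H\!\left(\tfrac{10}{20}\right) - \tfrac{80}{100}\,H\!\left(\tfrac{40}{80}\right) = 1 - 0.2 - 0.8 = 0,
\]

but on a subset with 30 class-1 and 10 class-0 examples, in which α covers 10 class-1 and 0 class-0 examples, the gain is positive:

\[
IG_{\text{subset}}(\alpha) = H\!\left(\tfrac{30}{40}\right) - \tfrac{10}{40}\,H(1) - \tfrac{30}{40}\,H\!\left(\tfrac{20}{30}\right) \approx 0.811 - 0 - 0.689 \approx 0.12 > 0.
\]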

Experimental Studies: Itemset Mining (I)

Scalability Comparison

[Figure: bar charts comparing Log(DT #Pat) vs. Log(MbT #Pat) and Log(DT Abs Support) vs. Log(MbT Abs Support) on Adult, Chess, Hypo, Sick, and Sonar.]

Dataset   #Pat using MbT sup   Ratio (MbT #Pat / #Pat using MbT sup)
Adult     252,809              0.41%
Chess     +∞                   ~0%
Hypo      423,439              0.0035%
Sick      4,818,391            0.00032%
Sonar     95,507               0.00775%


Experimental Studies: Itemset Mining (II)

Accuracy of Mined Itemsets

[Figure: bar chart of DT Accuracy vs. MbT Accuracy (70%-100%) on Adult, Chess, Hypo, Sick, and Sonar: 4 wins, 1 loss.]

[Figure: Log(DT #Pat) vs. Log(MbT #Pat) on the same datasets: MbT uses a much smaller number of patterns.]

Experimental Studies: Itemset Mining (III)

Convergence

Experimental Studies: Graph Mining (I)

9 NCI anti-cancer screen datasets (The PubChem Project, pubchem.ncbi.nlm.nih.gov); active (positive) class: around 1% - 8.3%.

2 AIDS anti-viral screen datasets (http://dtp.nci.nih.gov); H1: CM+CA, 3.5%; H2: CA, 1%.


Experimental Studies: Graph Mining (II) Scalability

[Figure: bar charts of DT #Pat vs. MbT #Pat and Log(DT Abs Support) vs. Log(MbT Abs Support) on NCI1, NCI33, NCI41, NCI47, NCI81, NCI83, NCI109, NCI123, NCI145, H1, and H2.]


Experimental Studies: Graph Mining (III) AUC and Accuracy

[Figure: AUC (0.5-0.8) and Accuracy (0.88-1.0) of DT vs. MbT on NCI1, NCI33, NCI41, NCI47, NCI81, NCI83, NCI109, NCI123, NCI145, H1, and H2: 11 wins on one metric; 10 wins and 1 loss on the other.]

Experimental Studies: Graph Mining (IV): MbT vs. Benchmarks (AUC)

7 wins, 4 losses.

Summary: Model-based Search Tree

Integrated feature mining and construction. Dynamic support: can mine patterns with extremely small support. Both a feature constructor and a classifier. Not limited to one type of frequent pattern: plug-and-play.

Experimental results: itemset mining and graph mining.

Software and datasets available from: www.cs.columbia.edu/~wfan
