© 2008 IBM Corporation Mining Significant Graph Patterns by Leap Search Xifeng Yan (IBM T. J. Watson) Hong Cheng, Jiawei Han (UIUC) Philip S. Yu (UIC)

Dec 20, 2015

Transcript
Page 1

Mining Significant Graph Patterns by Leap Search

Xifeng Yan (IBM T. J. Watson) Hong Cheng, Jiawei Han (UIUC) Philip S. Yu (UIC)

Page 2

IBM T. J. Watson Research Center

Graph Pattern Mining | © 2008 IBM Corporation

Graph Patterns

Interestingness measures / Objective functions

• Frequency: frequent graph pattern

• Discriminative: information gain, Fisher score

• Significance: G-test

• …
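Every measure in the list above can be computed from two numbers per pattern: its frequency p among the positive graphs and its frequency q among the negative graphs. The following is a minimal Python sketch under stated assumptions. The function names are hypothetical. The G-test form is one common formulation, assumed here: the negative-set frequency q serves as the expected frequency, n_pos is the number of positive graphs, and 0·ln 0 is taken as 0.

```python
import math


def g_test(p: float, q: float, n_pos: int) -> float:
    """G-test score for a pattern with frequency p in the positive set and q in
    the negative set (assumed form: q is used as the expected frequency)."""

    def term(obs: float, exp: float) -> float:
        if obs == 0.0:
            return 0.0  # convention: 0 * ln(0 / x) = 0
        if exp == 0.0:
            return math.inf  # observed where nothing was expected
        return obs * math.log(obs / exp)

    return 2.0 * n_pos * (term(p, q) + term(1.0 - p, 1.0 - q))


def fisher_score(p: float, q: float, n_pos: int, n_neg: int) -> float:
    """Fisher score of the binary feature 'graph contains the pattern'."""
    mu = (n_pos * p + n_neg * q) / (n_pos + n_neg)  # overall mean
    between = n_pos * (p - mu) ** 2 + n_neg * (q - mu) ** 2
    within = n_pos * p * (1 - p) + n_neg * q * (1 - q)  # Bernoulli variances
    if within == 0.0:
        return math.inf if between > 0 else 0.0
    return between / within


# A pattern found in 80% of the positives and 20% of the negatives (10 graphs each).
print(round(g_test(0.8, 0.2, 10), 4))            # 16.6355
print(round(fisher_score(0.8, 0.2, 10, 10), 4))  # 0.5625
```

Both scores are 0 when p = q, that is, when the pattern does not separate the two classes.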

Page 3

Frequent Graph Pattern

Page 4

Optimal Graph Pattern (this work)

Page 5

Objective Functions

Challenge: Not Anti-Monotonic


Page 6

Challenge: Not Anti-Monotonic

(Figure: Anti-Monotonic vs. Non-Monotonic)

Non-monotonic: enumerate all subgraphs, then check their scores?

Enumerate subgraphs from small to large size
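A toy example shows why small-to-large enumeration cannot stop early for a score such as information gain. Support can only shrink as a pattern grows, but the score can rise. To keep the sketch short, it substitutes itemsets for subgraphs and a subset test for subgraph containment. The databases are hypothetical.

```python
import math


def entropy(pos: int, neg: int) -> float:
    """Binary class entropy (bits) of a set with pos positive and neg negative items."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            frac = c / total
            h -= frac * math.log2(frac)
    return h


def info_gain(pattern, pos_db, neg_db):
    """Information gain of the binary feature 'transaction contains pattern'."""
    a = sum(pattern <= t for t in pos_db)  # positives containing the pattern
    b = sum(pattern <= t for t in neg_db)  # negatives containing the pattern
    n_pos, n_neg = len(pos_db), len(neg_db)
    n = n_pos + n_neg
    return (entropy(n_pos, n_neg)
            - (a + b) / n * entropy(a, b)
            - (n - a - b) / n * entropy(n_pos - a, n_neg - b))


# Hypothetical data: every transaction contains "a", only positives contain "b".
pos_db = [frozenset("ab")] * 4
neg_db = [frozenset("a")] * 4

for pat in (frozenset("a"), frozenset("ab")):
    support = sum(pat <= t for t in pos_db + neg_db)
    print(sorted(pat), support, info_gain(pat, pos_db, neg_db))
# ['a'] 8 0.0
# ['a', 'b'] 4 1.0
```

Support drops from 8 to 4 when "b" is added, while information gain rises from 0 to 1 bit. A score threshold therefore cannot be used to prune supersets the way a frequency threshold can.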

Page 7

Frequent Pattern Based Mining Framework

Exploratory task

Graph clustering

Graph classification

Graph index

(SIGMOD’04, ’05; ISMB’05, ’07)

Graph Database → Frequent Patterns → Optimal Patterns

1. Bottleneck: millions, even billions, of patterns

2. No guarantee of quality

Page 8

Direct Pattern Mining Framework

Exploratory task

Graph clustering

Graph classification

Graph index

Graph Database → (Direct) → Optimal Patterns

How?

Page 9

Upper-Bound

Page 10

Upper-Bound: Anti-Monotonic (cont.)

Rule of thumb: if the frequency difference of a graph pattern between the positive and negative datasets increases, the pattern becomes more interesting.

We can recycle the existing graph mining algorithms to accommodate non-monotonic functions.
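The rule of thumb can be turned into an anti-monotonic upper bound. The sketch below makes several assumptions: information gain is the objective F, and a pattern is scored only by its support counts a (positive graphs) and b (negative graphs). The names are hypothetical. Any supergraph g′ of g has a′ ≤ a and b′ ≤ b. Under the rule of thumb, the best supergraph one could hope for keeps all of one side and none of the other, which gives the bound max(F(a, 0), F(0, b)).

```python
import math


def entropy(pos: float, neg: float) -> float:
    """Binary class entropy in bits; 0 for an empty set."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c > 0:
            frac = c / total
            h -= frac * math.log2(frac)
    return h


def info_gain(a: int, b: int, n_pos: int, n_neg: int) -> float:
    """IG of a pattern contained in a of n_pos positive and b of n_neg negative graphs."""
    n = n_pos + n_neg
    return (entropy(n_pos, n_neg)
            - (a + b) / n * entropy(a, b)
            - (n - a - b) / n * entropy(n_pos - a, n_neg - b))


def ig_upper_bound(a: int, b: int, n_pos: int, n_neg: int) -> float:
    """Upper bound on the IG of every supergraph of a pattern with support (a, b).

    IG is convex in (a, b), so its maximum over [0, a] x [0, b] lies at a corner.
    The corner (0, 0) scores 0, and (a, b) never beats the larger of the two
    'pure' corners, where the frequency difference is largest.
    """
    return max(info_gain(a, 0, n_pos, n_neg), info_gain(0, b, n_pos, n_neg))


print(info_gain(3, 2, 6, 5), ig_upper_bound(3, 2, 6, 5))
```

Because F(a, 0) and F(0, b) cannot increase as a and b shrink, the bound is anti-monotonic even though F itself is not. This is what allows an existing frequent-subgraph miner to be reused with the bound as its pruning test.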

Page 11

Vertical Pruning

Large ← Small
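Vertical pruning can be sketched as a branch-and-bound, depth-first search that grows patterns from small to large and skips a branch when its upper bound cannot beat the best score found so far. The sketch makes these assumptions: itemsets stand in for subgraphs, subset tests stand in for subgraph isomorphism, and information gain with the max(F(a, 0), F(0, b)) bound is the objective. The function names and data are hypothetical.

```python
import math


def entropy(pos, neg):
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c > 0:
            frac = c / total
            h -= frac * math.log2(frac)
    return h


def info_gain(a, b, n_pos, n_neg):
    n = n_pos + n_neg
    return (entropy(n_pos, n_neg)
            - (a + b) / n * entropy(a, b)
            - (n - a - b) / n * entropy(n_pos - a, n_neg - b))


def best_pattern(pos_db, neg_db, items):
    """Return ((score, pattern), number of patterns scored) for the highest-IG itemset."""
    n_pos, n_neg = len(pos_db), len(neg_db)
    best = (0.0, frozenset())
    visited = 0

    def grow(pattern, start):
        nonlocal best, visited
        for i in range(start, len(items)):  # each itemset is generated exactly once
            child = pattern | {items[i]}
            a = sum(child <= t for t in pos_db)
            b = sum(child <= t for t in neg_db)
            if a + b == 0:
                continue  # zero support: child and all its supersets score 0
            visited += 1
            score = info_gain(a, b, n_pos, n_neg)
            if score > best[0]:
                best = (score, child)
            # Vertical pruning: no superset of child can score above this bound.
            bound = max(info_gain(a, 0, n_pos, n_neg), info_gain(0, b, n_pos, n_neg))
            if bound > best[0]:
                grow(child, i + 1)

    grow(frozenset(), 0)
    return best, visited


pos_db = [frozenset(s) for s in ("abc", "ab", "bd", "abd")]
neg_db = [frozenset(s) for s in ("bc", "cd", "b", "ad")]
items = sorted("abcd")
(best_score, best_set), visited = best_pattern(pos_db, neg_db, items)
print(sorted(best_set), round(best_score, 3), "patterns scored:", visited)
```

The pruning is exact. A skipped superset has a score no higher than the bound of its ancestor, and that bound was already no higher than the best score at that point in the search.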

Page 12

Horizontal Pruning: Structural Proximity

Page 13

Structural Proximity: Another Perspective

# of frequent patterns >> # of possible frequency pairs

Many patterns share the same score
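A score that depends only on a pattern's support pair (a, b) has at most (n₊ + 1)(n₋ + 1) distinct values, where n₊ and n₋ are the sizes of the positive and negative sets. For example, 1,000 positives and 1,000 negatives allow at most 1,002,001 pairs, however many patterns are mined. The sketch below groups patterns by support pair and evaluates the score once per group. The function names are hypothetical.

```python
from collections import defaultdict


def max_distinct_pairs(n_pos: int, n_neg: int) -> int:
    """a ranges over 0..n_pos and b over 0..n_neg."""
    return (n_pos + 1) * (n_neg + 1)


def group_by_support_pair(supports):
    """supports: pattern id -> (a, b). Returns (a, b) -> list of pattern ids."""
    groups = defaultdict(list)
    for pid, pair in supports.items():
        groups[pair].append(pid)
    return dict(groups)


def score_all(supports, score):
    """Call score(a, b) once per distinct pair and share the result across patterns."""
    results = {}
    for (a, b), pids in group_by_support_pair(supports).items():
        s = score(a, b)
        for pid in pids:
            results[pid] = s
    return results


print(max_distinct_pairs(1000, 1000))  # 1002001
```

Patterns that share a pair are interchangeable as far as the objective is concerned. Structural proximity exploits this: many of them need not be examined one by one.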

Page 14

Frequency Envelope

Page 15

Structural Leap Search
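The idea of leaping over structurally close siblings can be sketched as follows. A sibling's subtree is skipped when its supporting sets are nearly the same as those of a sibling already explored, in both the positive and the negative data. The proximity test used here is an assumption chosen for illustration: Dice distance on the sets of supporting graph ids, compared against a leap length σ. It may differ from the exact condition in the LEAP paper. Function names and data are hypothetical.

```python
def dice_distance(s: set, t: set) -> float:
    """1 - Dice similarity of two supporting sets (0 = identical, 1 = disjoint)."""
    if not s and not t:
        return 0.0
    return len(s ^ t) / (len(s) + len(t))


def leap_siblings(siblings, sigma):
    """siblings: (pattern_id, positive graph ids, negative graph ids) in search order.
    Returns the ids whose subtrees would be explored; the rest are leapt over."""
    kept = []
    for pid, pos, neg in siblings:
        near_kept = any(
            dice_distance(pos, kpos) <= sigma and dice_distance(neg, kneg) <= sigma
            for _, kpos, kneg in kept
        )
        if not near_kept:
            kept.append((pid, pos, neg))
    return [pid for pid, _, _ in kept]


siblings = [
    ("g1", {1, 2, 3, 4}, {10, 11}),
    ("g2", {1, 2, 3, 4}, {10, 11}),      # same support as g1: leapt over
    ("g3", {1}, {10, 11, 12, 13}),       # far from g1: explored
]
print(leap_siblings(siblings, sigma=0.1))  # ['g1', 'g3']
```

Unlike vertical pruning, leaping is a heuristic. A skipped subtree could still contain the optimum, and σ trades search time against that risk.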

Page 16

Frequency Association

Significant patterns often fall into the high quantiles of frequency

Start with the most frequent patterns

Page 17

Descending Leap Mine

1. Structural Leap Search with frequency threshold

2. Support-Descending Mining

F(g*) converges

3. Structural Leap Search
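The support-descending loop can be sketched as follows: start with a high frequency threshold, lower it step by step, and stop once the best score F(g*) stops improving. Several parts are assumptions. The `mine(theta)` callback is hypothetical and stands for a search (for example, structural leap search) restricted to patterns with frequency ≥ theta. The halving schedule is a choice made for illustration, and so is the tolerance.

```python
def descending_leap_mine(mine, theta_start, theta_min, tol=1e-9):
    """mine(theta) -> best score among patterns with frequency >= theta (hypothetical).
    Returns (best score, threshold at which F(g*) was judged to have converged)."""
    theta = theta_start
    best = mine(theta)  # step 1: search at a high frequency threshold
    while theta / 2 >= theta_min:
        theta /= 2  # step 2: descend the support threshold
        score = mine(theta)
        if score <= best + tol:
            break  # F(g*) converged: no gain from the lower threshold
        best = score
    return best, theta


# Hypothetical (frequency, score) pairs standing in for a pattern miner.
patterns = [(0.9, 0.1), (0.5, 0.4), (0.3, 0.6), (0.1, 0.6), (0.01, 0.9)]
mine = lambda theta: max((s for f, s in patterns if f >= theta), default=0.0)

print(descending_leap_mine(mine, theta_start=0.8, theta_min=0.05))  # (0.6, 0.1)
```

The toy run stops at 0.6 and never reaches the rare pattern that scores 0.9. Convergence of F(g*) is a stopping heuristic motivated by frequency association, not a guarantee of optimality.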

Page 18

Results: NCI Anti-Cancer Screen Datasets

Name       # of Compounds   Tumor Description
MCF-7      27,770           Breast
MOLT-4     39,765           Leukemia
NCI-H23    40,353           Non-Small Cell Lung
OVCAR-8    40,516           Ovarian
P388       41,472           Leukemia
PC-3       27,509           Prostate
SF-295     40,271           Central Nervous System
SN12C      40,004           Renal
SW-620     40,532           Colon
UACC257    39,988           Melanoma
YEAST      79,601           Yeast anti-cancer

Link: http://pubchem.ncbi.nlm.nih.gov

Chemical Compounds: anti-cancer or not

# of vertices: 10 to 200

Page 19

Efficiency

Vertical Pruning

Horizontal Pruning

Page 20

Effectiveness (runtime)

frequency descending

frequency descending + leap mine

Page 21

Effectiveness (accuracy)

slightly different

Page 22

Graph Classification

Name            OA Kernel   LEAP   OA Kernel (6x)   LEAP (6x)
Average (AUC)   0.70        0.72   0.75             0.77

* OA Kernel: Optimal Assignment Kernel; LEAP: LEAP search

Page 23

Scalability Means Something!

(Figure: runtime of LEAP, OA, LEAP (6x), and OA (6x); annotations ~20 sec, ~100 sec, ~200 sec, ~8000 sec; trend labels Linear and Quadratic)

Page 24

Direct Pattern Mining Framework

Exploratory task

Graph clustering

Graph classification

Graph index

Graph Database → (Direct) → Optimal Graph Patterns

Page 25

Beyond Graph Patterns

Exploratory task

Clustering

Classification

Index

Itemset/Sequence/Tree Database → (Direct) → Optimal Patterns

1. Direct mining can be applied to itemsets, sequences, and trees.

2. Existing algorithms can be recycled to mine patterns with sophisticated measures.

3. Pattern-based methods, including indexing and classification, are competitive.

Page 26

Thank you

Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree

SIGKDD’08 @ Las Vegas

Page 27

Graph Classification: Kernel Approach

Kernel-based Graph Classification

Optimal Assignment Kernel (Fröhlich et al. ICML’05)
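The optimal assignment idea can be sketched by brute force. Each vertex of the smaller graph is matched to a distinct vertex of the larger one so that the sum of a vertex-level base kernel is maximal. This sketch simplifies heavily. Graphs are reduced to lists of (label, degree) pairs, and the base kernel `label_degree_kernel` is hypothetical; Fröhlich et al. use richer atom and bond similarities. Exhaustive search over permutations replaces an efficient assignment solver such as the Hungarian algorithm, so it only suits tiny graphs.

```python
from itertools import permutations


def label_degree_kernel(u, v):
    """Hypothetical base kernel: 0 for different labels, else decays with degree gap."""
    if u[0] != v[0]:
        return 0.0
    return 1.0 / (1.0 + abs(u[1] - v[1]))


def oa_kernel(xs, ys, base=label_degree_kernel):
    """Optimal assignment score: best one-to-one matching of the smaller vertex
    list into the larger one, maximizing the summed (symmetric) base kernel."""
    if len(xs) > len(ys):
        xs, ys = ys, xs
    if not xs:
        return 0.0
    return max(
        sum(base(x, ys[j]) for x, j in zip(xs, perm))
        for perm in permutations(range(len(ys)), len(xs))
    )


g1 = [("C", 2), ("C", 3), ("O", 1)]
g2 = [("C", 3), ("O", 1)]
print(oa_kernel(g1, g2))  # 2.0
```

Computing such a score for every pair of graphs makes training cost grow with the square of the dataset size. By contrast, a pattern-based classifier computes one feature vector per graph.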