Mining Tree Queries in a Graph. Bart Goethals, Eveline Hoekx and Jan Van den Bussche. KDD’05. Presenter: Ming Jing Tsai

Dec 26, 2015


Todd Stokes
Page 1:

Mining Tree Queries in a Graph

Bart Goethals, Eveline Hoekx and Jan Van den Bussche

KDD’05
Presenter: Ming Jing Tsai

Page 2:

Introduction

Mining tree patterns T in a single graph
Incremental in the number of nodes
Unordered, rooted trees

For each tree T, all conjunctive queries are generated

Evaluated with SQL

Page 3:

Tree query pattern example

Selected node (constant): 0, 8
Existential node: ∃
Distinguished node: x

Page 4:

Matching

A query Q matches in a graph G via a homomorphism h
For every edge (i, j) ∈ Q, (h(i), h(j)) ∈ G

Matchings are distinguished by the value they assign to the distinguished node x
Different values on existential nodes are not counted separately

Page 5:

[Figure: query Q, with nodes labeled ∃, 0 and 8, matched against graph G]

Frequency = 3 (values 4, 5, 8)

Page 6:

Generate all trees

Increasing number of nodes
Canonically ordered

Level sequence: the i-th number is the depth of the i-th node in preorder
Canonical ordering: the lexicographically maximal level sequence

Level sequence 012212 > 012122
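
A small Python sketch of the canonical ordering described above. The helper names `parse_level_sequence` and `canonical_level_sequence` are assumptions made for this illustration, and the approach (recursively sorting child subtrees in descending order) is one common way to obtain the lexicographically maximal level sequence, not necessarily the procedure used in the paper.

```python
def parse_level_sequence(seq):
    """Turn a preorder depth sequence into nested lists of children."""
    root = []
    stack = [root]  # stack[d] holds the child list of the latest node at depth d
    for depth in seq[1:]:
        node = []
        del stack[depth:]             # climb back up to the parent at depth - 1
        stack[depth - 1].append(node)
        stack.append(node)
    return root

def canonical_level_sequence(children, depth=0):
    """Return the lexicographically maximal level sequence of an unordered tree."""
    subsequences = sorted(
        (canonical_level_sequence(child, depth + 1) for child in children),
        reverse=True,                 # larger subtree sequences come first
    )
    result = (depth,)
    for sub in subsequences:
        result += sub
    return result

a = canonical_level_sequence(parse_level_sequence([0, 1, 2, 2, 1, 2]))
b = canonical_level_sequence(parse_level_sequence([0, 1, 2, 1, 2, 2]))
print(a, b)
```

Both inputs describe the same unordered tree, so both calls return the same sequence, the larger of the two orderings shown on the slide.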

Page 7:

Queries

Levelwise: fix a tree T, and find all queries based on T whose frequency in G is at least k

Q{∏, ∑, λ}
∏: existential nodes
∑: selected nodes
λ: labels of the selected nodes

Page 8:

Page 9:

To generate candidates efficiently, candidacy tables and frequency tables are used

Page 10:

CanTab ∏,∑

Parents

Each candidacy table can be computed by taking the natural join of the frequency tables of its parents (∏’, ∑’)

CanTab ∅,{x} is defined as the table with a single column x, holding all nodes of the graph G being mined
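
The natural-join step can be illustrated with plain Python dictionaries. This is only a sketch under assumptions: the function `natural_join`, the column names `x1`, `x2`, `x3`, and the contents of the three parent frequency tables are invented here, and the slides do not say how the tables are actually stored.

```python
from functools import reduce

def natural_join(left, right):
    """Join two tables (lists of dicts) on all the columns they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[col] == r_row[col] for col in shared):
                joined.append({**l_row, **r_row})
    return joined

# Hypothetical frequency tables of three parents, each missing one selected node.
freq_x1_x2 = [{"x1": 0, "x2": 5}, {"x1": 0, "x2": 6}]
freq_x1_x3 = [{"x1": 0, "x3": 8}]
freq_x2_x3 = [{"x2": 5, "x3": 8}]

# Candidate label combinations for the selected nodes {x1, x2, x3}.
candidates = reduce(natural_join, [freq_x1_x2, freq_x1_x3, freq_x2_x3])
print(candidates)  # [{'x1': 0, 'x2': 5, 'x3': 8}]
```

Only the combination whose every sub-combination is frequent in a parent survives the join, so the row with x2 = 6 is pruned before its frequency is ever counted.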

Page 11:

Example: ∏ = {x2}, ∑ = {x1, x3}

The expression is formulated in SQL

Candidacy table

Frequency table
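
To make the SQL step concrete, here is a hedged sketch using Python's built-in `sqlite3` module instead of DB2. The tree shape is an assumption, since the slide's figure is not in the transcript: a path x1 -> x2 -> x3 -> x4 with x2 existential, x1 and x3 selected, and x4 distinguished. The table names `G` and `CanTab`, their columns, the data, and the threshold are all invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Edge relation of a small invented graph.
conn.execute("CREATE TABLE G (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO G VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (3, 5), (1, 6), (6, 3)],
)

# Candidacy table: label combinations for the selected nodes x1 and x3.
conn.execute("CREATE TABLE CanTab (x1 INTEGER, x3 INTEGER)")
conn.executemany("INSERT INTO CanTab VALUES (?, ?)", [(1, 3), (2, 4)])

# Frequency table: for each candidate, count the distinct values of the
# distinguished node x4, ignoring which node the existential x2 maps to.
frequency_sql = """
    SELECT c.x1, c.x3, COUNT(DISTINCT e3.dst) AS freq
    FROM CanTab AS c
    JOIN G AS e1 ON e1.src = c.x1
    JOIN G AS e2 ON e2.src = e1.dst AND e2.dst = c.x3
    JOIN G AS e3 ON e3.src = c.x3
    GROUP BY c.x1, c.x3
    HAVING COUNT(DISTINCT e3.dst) >= ?
"""
k = 2  # frequency threshold
frequency_table = conn.execute(frequency_sql, (k,)).fetchall()
print(frequency_table)  # [(1, 3, 2)]
```

The candidate (1, 3) is reached through two different x2 values (2 and 6), yet its frequency is 2 because only the distinct x4 values 4 and 5 are counted. The candidate (2, 4) has no match and drops out.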

Page 12:

Equivalent queries

Avoid generating a query Q2 that is equivalent to an earlier query Q1

A containment mapping from Q1 to Q2 is a homomorphism that maps the distinguished variables of Q1 one-to-one to those of Q2, and likewise for the selected nodes

Case 1: Q1 has fewer nodes than Q2
Case 2: Q1 and Q2 have the same number of nodes

Page 13:

Case 1: redundancy checking

Q2 contains redundant subtrees, such that removing them yields an equivalent query

Redundancy: a subtree C in the form of a linear chain of existential nodes, such that the parent of C has another subtree that is at least as deep as C

[Figure: example queries Q1 and Q2]
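
The redundancy condition can be checked directly on a small tree structure. The sketch below follows the wording of the slide literally. The `Node` class, the one-letter kinds (`"E"` existential, `"C"` selected, `"X"` distinguished) and the function names are assumptions made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "E" existential, "C" selected, "X" distinguished
    children: list = field(default_factory=list)

def depth(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    return 1 + max((depth(child) for child in node.children), default=0)

def is_existential_chain(node):
    """True if the subtree is a linear chain made only of existential nodes."""
    while True:
        if node.kind != "E" or len(node.children) > 1:
            return False
        if not node.children:
            return True
        node = node.children[0]

def has_redundant_subtree(node):
    """True if some existential chain has a sibling subtree at least as deep."""
    kids = node.children
    for i, chain in enumerate(kids):
        if is_existential_chain(chain):
            chain_depth = depth(chain)
            if any(depth(other) >= chain_depth
                   for j, other in enumerate(kids) if j != i):
                return True
    return any(has_redundant_subtree(child) for child in kids)

redundant = Node("X", [Node("E"), Node("C")])
minimal = Node("X", [Node("E", [Node("E")]), Node("C")])
print(has_redundant_subtree(redundant), has_redundant_subtree(minimal))  # True False
```

In the first tree the single existential child can be mapped onto its sibling, so it adds nothing. In the second tree the existential chain is deeper than its sibling and is kept.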

Page 14:

Case 2: canonical forms

Q1 and Q2 are isomorphic trees
Canonical forms

Existential nodes -> ∃
Selected nodes -> C
Distinguished nodes -> X

[Figure: example trees annotated with the label pairs C,∃; ∃,C; ∃,X; C,X; X,C; X,X]
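
One way to test whether two labeled trees are isomorphic is to compare order-independent encodings. The sketch below uses a standard sorted-children encoding, which is a stand-in technique and not necessarily the canonical form defined in the paper. The tuple layout `(label, children)` and the label strings `"∃"`, `"X"` and `"C:<constant>"` are assumptions for the illustration.

```python
def encode(tree):
    """Encode a labeled unordered tree so that isomorphic trees encode equally.

    A tree is a pair (label, children), where children is a list of trees.
    Labels must not contain parentheses.
    """
    label, children = tree
    # Sorting the child encodings removes any dependence on sibling order.
    return "(" + label + "".join(sorted(encode(child) for child in children)) + ")"

q1 = ("X", [("C:0", []), ("∃", [("X", [])])])
q2 = ("X", [("∃", [("X", [])]), ("C:0", [])])   # same tree, siblings swapped
q3 = ("X", [("C:0", [("X", [])]), ("∃", [])])   # different tree

same = encode(q1) == encode(q2)
different = encode(q1) != encode(q3)
print(same, different)  # True True
```

Keeping one representative per encoding is enough to skip a query Q2 that differs from an earlier Q1 only in the order of its siblings.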

Page 15:

Experiment

Pentium 4, 2.8 GHz
1 GB main memory
Linux 2.6
C++ with embedded SQL
Relational database: DB2 UDB v8.2

Page 16:

Real datasets

A food web, a protein interaction graph, and a citation graph

k: frequency threshold
Size: maximal size of trees in the run
All runs take several hours

Page 17:

Food web

154 species dependent on Scotch Broom
Label 20 occurs in many frequent patterns: Orthotylus adenocarpi (an omnivorous plant pest)

Frequency 176

Page 18:

Protein interaction graph

1870 proteins of Saccharomyces cerevisiae, a fermenting yeast (used to leaven bread)

A small number of highly connected nodes occur

Page 19:

Citation graph

KDD Cup 2003
2500 papers in high-energy physics
350,000 cross-references

Frequency 1655

Page 20:

Synthetic data, web graphs
Tree size: 5
Minsup: 4, 10, 25

Page 21:

Uniform random graphs

Dense, uniform
Minsup: 10, 25
Edges: 47,264,997