
New Algorithms for Enumerating All Maximal Cliques. Kazuhisa Makino (Osaka University, JAPAN), Takeaki Uno (National Institute of Informatics, JAPAN).

Dec 14, 2015

Page 1: New Algorithms for Enumerating All Maximal Cliques

Kazuhisa Makino (Osaka University, JAPAN)

Takeaki Uno (National Institute of Informatics, JAPAN)

9/Jul/2004 SWAT 2004

Page 2: Background

Recently, enumeration algorithms have become interesting

・ There are still many nice unsolved problems (unlike in ordinary discrete algorithms)

・ The recent increase in computing power makes many enumeration problems practically solvable; many applications have been appearing, such as genomics, data mining, clustering, and so on

・ Some (theoretical) algorithms use enumeration as a subroutine (e.g., recognition of perfect graphs)

Page 3: Background (cont.)

・ My institute has 100 researchers of informatics

・ At least 5 researchers (independently) use implementations of enumeration algorithms

・ Suppose that there are 100,000 researchers of informatics in the world

Then 5,000 researchers use enumeration algorithms?

Page 4: Problems and Results

Problem 1: for a given graph G = (V, E), enumerate all maximal cliques in G

Problem 2: for a given bipartite graph G = (V1 ∪ V2, E), enumerate all maximal bipartite cliques in G

(Problem 2 is a special case of Problem 1)

・ We propose algorithms for solving these problems that reduce the time complexity in dense cases and sparse cases

・ Computational experiments on random graphs and real-world data

Page 5: Difficulty

・ Consider branch-and-bound type enumeration: divide the maximal cliques into two groups, those including v and those not including v

・ If a group includes no maximal clique, cut off the branch

Finding a maximal clique not including the given vertices of S is NP-complete

So we cannot cut off the subproblems (branches) that include no maximal clique

[Figure: search tree branching on v1 ∈ K, then on v2 ∈ K]

Page 6: Existing Studies and Ours

O(|V||E|): Tsukiyama, Ide, Ariyoshi & Shirakawa

O(|V||E|), lexicographic order: Johnson, Yannakakis & Papadimitriou

O(a(G)|E|): Chiba & Nishizeki

( a(G): arboricity of G, with m/(n-1) ≦ a(G) ≦ m^(1/2) )

・ many heuristic algorithms in data mining, for the bipartite case

Ours:

O(|V|^2.376) (dense case)

O(Δ^4) (sparse case)

O((Δ*)^4 + θ^3) (θ vertices have degree > Δ*)

O(Δ^3) (bipartite case)

O(Δ^2) (bipartite case, using much memory)

Page 7: Enumeration of Maximal Cliques

・ Improved version of the algorithm of Tsukiyama et al.

Idea: Construct a route on all maximal cliques to be traversed

・ For a maximal clique K of G = (V, E):

C(K): lexicographically maximum maximal clique including K

K≦i: vertices of K with indices ≦ i

i(K): minimum index i s.t. C(K≦i) = K

parent of a maximal clique K: C(K≦i(K)-1)

・ The parent is lexicographically larger than K

[Figure: example graph on vertices 1 to 11 showing a maximal clique K and its index i(K)]

Lexicographically larger: {1,3,6} > {1,4,5} and {1,2,3} > {1,2,4}
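The definitions on this slide can be tried out on a small graph. The following is only a minimal sketch, not the authors' implementation: it assumes vertices are numbered 1..n, that C(K) is computed greedily by scanning indices in increasing order, and that i(K) is read as the smallest i with C(K≦i) = K. The helper names (`make_adj`, `complete`, `prefix`, `parent_index`, `parent`) and the example graph are hypothetical.

```python
def make_adj(n, edges):
    """Adjacency sets for vertices 1..n (vertex i stands for v_i)."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def complete(adj, clique):
    """C(K): scan vertices by increasing index and add each one that is
    adjacent to every vertex collected so far."""
    result = set(clique)
    for v in sorted(adj):
        if v not in result and result <= adj[v]:
            result.add(v)
    return frozenset(result)


def prefix(clique, i):
    """K<=i: the vertices of K with index at most i."""
    return frozenset(v for v in clique if v <= i)


def parent_index(adj, K):
    """i(K), read here as the smallest i with C(K<=i) == K (assumption)."""
    for i in range(len(adj) + 1):
        if complete(adj, prefix(K, i)) == K:
            return i
    raise ValueError("K is not a maximal clique of this graph")


def parent(adj, K):
    """Parent C(K<=i(K)-1); the root C(empty set) has no parent."""
    i = parent_index(adj, K)
    return None if i == 0 else complete(adj, prefix(K, i - 1))


# Hypothetical example: a triangle 1-2-3 with a tail 3-4-5.
adj = make_adj(5, [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
root = complete(adj, set())
print(sorted(root), sorted(parent(adj, frozenset({3, 4}))))
```

In this example the root is {1,2,3}, which is also the parent of {3,4}; the clique {4,5} has parent {3,4}.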

Page 8: Graph Representation of Relation

・ The parent-child relation is acyclic

Its graph representation forms a tree (the enumeration tree)

Visit all maximal cliques by depth-first search

・ We need to find the children of a maximal clique

Page 9: Child of Maximal Clique

Γ(vi): vertices adjacent to vi

K[i] = C( (K≦i ∩ Γ(vi)) ∪ {vi} )

・ H is a child of K only if H = K[i] for some i > i(K)

(H is a child of K if the parent of K[i] is K)

・ i(K[i]) = i

・ construct K[i] in O(|E|) time

・ construct the parent in O(|E|) time (O(Δ^2) time)

・ for i = i(K)+1, …, |V|: O(|V||E|) time in total

Enumerate in O(|V||E|) time per maximal clique

[Figure: example graph with a maximal clique K, i(K) = 6, and the candidate child K[8]]
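The child rule above leads directly to a depth-first traversal of the enumeration tree. The code below is a minimal sketch under stated assumptions, not the authors' implementation: it recomputes the parent of every candidate K[i] from scratch (so it matches the simple O(|V||E|)-style scheme, not the faster bounds), it reads i(K) as the smallest i with C(K≦i) = K, and all function names and the example graphs are hypothetical.

```python
def make_adj(n, edges):
    """Adjacency sets for vertices 1..n (vertex i stands for v_i)."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def complete(adj, clique):
    """C(K): greedy completion by increasing vertex index."""
    result = set(clique)
    for v in sorted(adj):
        if v not in result and result <= adj[v]:
            result.add(v)
    return frozenset(result)


def prefix(clique, i):
    return frozenset(v for v in clique if v <= i)


def parent_index(adj, K):
    """Smallest i with C(K<=i) == K (assumed reading of i(K))."""
    for i in range(len(adj) + 1):
        if complete(adj, prefix(K, i)) == K:
            return i
    raise ValueError("K is not a maximal clique of this graph")


def parent(adj, K):
    i = parent_index(adj, K)
    return None if i == 0 else complete(adj, prefix(K, i - 1))


def child_candidate(adj, K, i):
    """K[i] = C((K<=i intersect Gamma(v_i)) union {v_i})."""
    seed = {v for v in K if v <= i and v in adj[i]} | {i}
    return complete(adj, seed)


def enumerate_maximal_cliques(adj):
    """Depth-first search over the parent-child tree, starting at the root."""
    root = complete(adj, set())
    stack, found = [root], []
    while stack:
        K = stack.pop()
        found.append(K)
        for i in range(parent_index(adj, K) + 1, len(adj) + 1):
            if i in K:
                continue
            H = child_candidate(adj, K, i)
            # Accept H only if it really is a child of K reached through
            # its own index, so that each clique is reported exactly once.
            if parent_index(adj, H) == i and parent(adj, H) == K:
                stack.append(H)
    return found


# Hypothetical example graphs (not from the slides).
path_graph = make_adj(5, [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
cycle_graph = make_adj(4, [(1, 2), (2, 3), (3, 4), (4, 1)])
print([sorted(K) for K in enumerate_maximal_cliques(path_graph)])
```

Because every non-root maximal clique has exactly one parent, no memory of previously reported cliques is needed; the stack only holds cliques waiting to be expanded.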

Page 10: Characterization of Child

The parent of K[i] = K ⇔

(1) no vj, j<i, is adjacent to all vertices in (K≦i ∩ Γ(vi)) ∪ {vi}

(2) no vj, j<i, is adjacent to all vertices in (K≦i ∩ Γ(vi)) ∪ K≦j

(1) is not satisfied ⇔ K[i] and the parent of K[i] include some vj ∉ K

(2) is not satisfied ⇔ the parent of K[i] includes some vj ∉ K

[Figure: example graph highlighting (K≦10 ∩ Γ(v10)) ∪ {v10} and K≦5]

Example: K = {3,4,7,9}, K[10] = {3,7,10}, K≦5 = {3,4}, K≦7 ∩ Γ(v10) = {3,7}

Page 11: Use of Matrix Multiplication

・ Check conditions (1) and (2) by matrix multiplication

(1) no vj, j<i, is adjacent to all vertices in (K≦i ∩ Γ(vi)) ∪ {vi}

ith row of the left matrix ⇒ (K≦i ∩ Γ(vi)) ∪ {vi}

jth column of the right matrix ⇒ Γ(vj)

ij cell of the product ⇒ | ((K≦i ∩ Γ(vi)) ∪ {vi}) ∩ Γ(vj) |

Test for each j<i: is | ((K≦i ∩ Γ(vi)) ∪ {vi}) ∩ Γ(vj) | = | (K≦i ∩ Γ(vi)) ∪ {vi} | ?

Checked in O(|V|^2.376) time ⇒ time complexity is O(|V|^2.376) for each

Condition (2) can be checked in the same way
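To make the matrix formulation concrete, here is a small sketch that checks condition (1) for all i at once through a single 0/1 matrix product. It is only an illustration under assumptions: the product is computed naively in pure Python, whereas the stated bound relies on a fast matrix multiplication routine, and the function name `condition1_by_product` and the example graph are hypothetical.

```python
def condition1_by_product(adj, K):
    """For every i with v_i outside K, report whether condition (1) holds,
    i.e. no v_j with j < i is adjacent to all of S_i, where
    S_i = (K<=i intersect Gamma(v_i)) union {v_i}."""
    n = len(adj)
    idx = range(1, n + 1)
    S = {i: {v for v in K if v <= i and v in adj[i]} | {i} for i in idx}

    # Left matrix: row i is the indicator vector of S_i.
    left = [[1 if v in S[i] else 0 for v in idx] for i in idx]
    # Right matrix: column j is the indicator vector of Gamma(v_j).
    right = [[1 if u in adj[j] else 0 for j in idx] for u in idx]

    # Naive product (stand-in for fast matrix multiplication):
    # cell (i, j) equals |S_i intersect Gamma(v_j)|.
    prod = [[sum(left[a][k] * right[k][b] for k in range(n))
             for b in range(n)] for a in range(n)]

    holds = {}
    for i in idx:
        if i in K:
            continue
        holds[i] = all(prod[i - 1][j - 1] != len(S[i]) for j in range(1, i))
    return holds


# Hypothetical example: triangle 1-2-3 with a tail 3-4-5, K = {1, 2, 3}.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(condition1_by_product(adj, {1, 2, 3}))
```

For K = {1,2,3} the condition holds for i = 4 but fails for i = 5, because v4 is adjacent to every vertex of S_5 = {v5}.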

Page 12: Sparse Cases

・ If vi is adjacent to no vertex in K:

K[i] = C( (K≦i ∩ Γ(vi)) ∪ {vi} ) = C({vi})

parent of K[i] = C( C({vi})≦i-1 )

If C({vi})≦i-1 = φ, the parent of K[i] is K0

If C({vi})≦i-1 ≠ φ, (1) is not satisfied

So if K ≠ K0, K[i] is not a child of K

・ Since |K| ≦ Δ+1, at most Δ(Δ+1) vertices are adjacent to K

・ Each K[i] takes O(Δ^2) time to construct the parent, so O(Δ^4) time per maximal clique

Δ: max. degree

O((Δ*)^4 + |Θ|^3) if partially dense

Δ*: max. degree in V \ Θ
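The counting argument above can be illustrated with a short sketch: for K ≠ K0, only indices of vertices adjacent to some vertex of K need to be tried, and there are at most Δ(Δ+1) of them. The function name `candidate_indices` and the example graph are hypothetical, and the sketch only shows the pruning step, not the O(Δ^2) parent construction.

```python
def candidate_indices(adj, K):
    """Indices i with v_i outside K and adjacent to at least one vertex of K.
    For K other than the root, no other index can yield a child."""
    touched = set()
    for v in K:
        touched |= adj[v]
    return sorted(touched - set(K))


# Hypothetical example: triangle 1-2-3 with a tail 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
max_degree = max(len(neighbors) for neighbors in adj.values())

candidates = candidate_indices(adj, {3, 4})
print(candidates, "bound:", max_degree * (max_degree + 1))
```

With at most Δ(Δ+1) candidates and O(Δ^2) work per candidate, the product gives the O(Δ^4) bound per maximal clique.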

Page 13: Bipartite Clique

・ Enumerate maximal bipartite cliques in G = (V1 ∪ V2, E)

( = maximal cliques in G' = (V1 ∪ V2, E ∪ V1×V1 ∪ V2×V2) )

enumerated in O(|V|^2.376) time for each

・ But G' is dense even if the bipartite graph G is sparse

We need some improvements for sparse cases

[Figure: bipartite graph with sides V1 and V2]
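The reduction to G' can be sketched as follows. This is a minimal illustration with hypothetical names (`to_clique_graph`, the example vertex sets and edges); it simply adds all edges inside V1 and inside V2, which also shows why G' becomes dense.

```python
def to_clique_graph(v1, v2, edges):
    """Build G' = (V1 ∪ V2, E ∪ V1×V1 ∪ V2×V2) as adjacency sets
    (self-loops are left out)."""
    adj = {v: set() for v in list(v1) + list(v2)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for side in (v1, v2):
        for u in side:
            adj[u] |= set(side) - {u}   # make each side a clique
    return adj


# Hypothetical example: V1 = {1, 2}, V2 = {3, 4}.
g_prime = to_clique_graph([1, 2], [3, 4], [(1, 3), (2, 3), (2, 4)])
edge_count = sum(len(neighbors) for neighbors in g_prime.values()) // 2
print(g_prime, edge_count)
```

The number of added edges is |V1|(|V1|-1)/2 + |V2|(|V2|-1)/2, so the degrees in G' grow with the side sizes even when every degree in G is small.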

Page 14: Fast Construction of K[i]

・ For any maximal bipartite clique K:

K ∩ V2 = ∩_{v ∈ K∩V1} Γ(v)

K ∩ V1 = ∩_{v ∈ K∩V2} Γ(v)

・ K[i] ∩ V1 for all i are computed in O(Δ^2) time

・ K[i] for all i are computed in O(Δ^3) time

[Figure: bipartite graph with V1 = {1, 2, 3, 4} and V2 containing v1, v2, vi, v5, v6; the neighborhoods Γ(1), Γ(2), Γ(3), Γ(4) and the cliques K[v1], K[v6] are marked]
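The two identities say that each side of a maximal bipartite clique is the common neighborhood of the other side. The sketch below illustrates this with hypothetical helpers (`common_neighbors`, `closure`, `is_maximal_biclique`) on a hypothetical graph; it assumes the common neighborhood of an empty set is the whole opposite side, and it does not reproduce the O(Δ^2) bookkeeping from the slide.

```python
def common_neighbors(adj, vertices, other_side):
    """Intersection of Gamma(v) over the given vertices, within other_side."""
    result = set(other_side)
    for v in vertices:
        result &= adj[v]
    return result


def closure(adj, v1, v2, x):
    """Extend X ⊆ V1 to a maximal bipartite clique (X', Y):
    Y = common neighbors of X, then X' = common neighbors of Y."""
    y = common_neighbors(adj, x, v2)
    x_closed = common_neighbors(adj, y, v1)
    return x_closed, y


def is_maximal_biclique(adj, v1, v2, x, y):
    """Check both identities from the slide."""
    return (set(y) == common_neighbors(adj, x, v2)
            and set(x) == common_neighbors(adj, y, v1))


# Hypothetical example: V1 = {1, 2}, V2 = {3, 4}, edges 1-3, 2-3, 2-4.
v1, v2 = {1, 2}, {3, 4}
adj = {1: {3}, 2: {3, 4}, 3: {1, 2}, 4: {2}}
print(closure(adj, v1, v2, {1}))
```

Here the closure of {1} is ({1, 2}, {3}): vertex 2 is added because it is also adjacent to every common neighbor of {1}.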

Page 15: Checking the Parent

・ Assign small indices to V1 and large indices to V2

K[i] is a child of K ⇔ K[i]≦i = K≦i

checked in O(Δ) time

[Figure: vertices of V1 indexed 1, 2, 3, …, |V1|-1, |V1| and vertices of V2 indexed |V1|+1, |V1|+2, …, with a vertex vi in V2]

Enumerated in O(Δ^3) time for each; O(Δ^2) by using memory

Page 16: Computational Experiments

・ for randomly generated graphs

・ vertex vi is connected to each of the vertices from i-r to i+r with probability 1/2

[Figure: CPU time per 10,000 maximal cliques versus degree; series: Tsukiyama r=10, Ours r=10, Tsukiyama r=30, Ours r=30]

・ Faster than Tsukiyama's algorithm

・ Computation time is linear in maximum degree
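The random instances described above can be generated along these lines. This is a sketch under assumptions: the slides do not say how the generator was implemented, so the function name `random_band_graph`, the seed handling, and the choice to decide each pair {vi, vj} with |i-j| ≦ r exactly once are all hypothetical.

```python
import random


def random_band_graph(n, r, p=0.5, seed=0):
    """Edges (i, j) with i < j <= i + r, each kept with probability p."""
    rng = random.Random(seed)      # fixed seed for reproducibility
    edges = set()
    for i in range(1, n + 1):
        for j in range(i + 1, min(n, i + r) + 1):
            if rng.random() < p:
                edges.add((i, j))
    return edges


edges = random_band_graph(n=1000, r=10)
print(len(edges))
```

The parameter r controls the degree: every vertex has at most 2r neighbors, so the expected degree is about r when p = 1/2.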

Page 17: Benchmark Problems

・ Problem of finding frequent closed item sets in a database; equivalent to maximal bipartite clique enumeration

・ Data sets used in KDD Cup (data mining algorithm competition):

BMS-WebView1 (from Web-log data): |V| = 60,000, ave. degree 2.5

BMS-WebView2 (from Web-log data): |V| = 80,000, ave. degree 5

BMS-POS (from POS data): |V| = 510,000, ave. degree 6

IBM-Artificial (artificial data): |V| = 100,000, ave. degree 10

Page 18: Results

[Figure: four plots of time (sec) versus threshold (%), one each for BMS-POS, IBM-artificial, BMS-WebView1, and BMS-WebView2; series: Apriori, FP-growth, closet, CHARM, Ours (the BMS-POS plot has no closet series)]

Page 19: Conclusion and Future Work

・ Proposed fast algorithms for enumerating

maximal cliques: O(|V|^2.376), O(Δ^4), O((Δ*)^4 + θ^3)

maximal bipartite cliques: O(|V|^2.376), O(Δ^3), O(Δ^2)

・ Examined benchmark problems of data mining, and showed that our algorithm performs well

Future work:

・ Can we improve further? What is the difficulty?

・ Can we enumerate other maximal (minimal) graph objects?

・ Can we apply matrix multiplication to other enumeration problems?

・ What can be enumerated efficiently in practice?

Page 20: Frequent Sets

Input graph:

An item and a customer are connected iff the customer purchased the item

In a maximal bipartite clique:

Customers: have similar favorites

Items: frequently purchased together

[Agrawal et al. 96, Zaki et al. 02, Pei 00, Han 00, … ]

[Figure: bipartite graph between customer1, customer2, customer3, customer4 and the items beer, nappy, milk]

Page 21: Few Large Degree Vertices

・ Very few vertices (denoted by Θ) have large degrees

・ Divide the maximal cliques into two groups:

(a) cliques not included in Θ

(b) cliques included in Θ

・ (a) can be enumerated in O(Δ'^4) time for each

・ A maximal clique K in the graph induced by Θ is a maximal clique of G ⇔ K is not included in any clique of (a); O(|Θ|^3) time for each

O(Δ'^4 + |Θ|^3) per maximal clique

[Figure: graph split into vertices of small degree (< Δ') and vertices of large degree]

Page 22: Avoid Duplications by Using Memory

・ We can avoid duplications by storing all maximal bipartite cliques

・ Since K ∩ V1 = Γ(K ∩ V2), it suffices to store all K ∩ V1

1. Get a K from memory that has not been processed yet

2. Generate all K[i] ∩ V1

3. Store each K[i] ∩ V1 if it is not in memory

4. Go to 1 if some maximal clique is still unprocessed

Enumerated in O(Δ^2) time for each
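Steps 1 to 4 amount to a worklist search with a lookup table of the V1-sides seen so far. The code below is a simplified sketch, not the authors' procedure: instead of the exact K[i] it generates, for each vi in V2 outside K, the V1-side (K ∩ V1) ∩ Γ(vi), which is again the V1-side of a maximal bipartite clique. It assumes the search starts from the clique whose V1-side is all of V1 and that the common neighborhood of an empty set is the whole opposite side. All names and the example graph are hypothetical.

```python
from collections import deque


def common_neighbors(adj, vertices, other_side):
    """Intersection of Gamma(v) over the given vertices, within other_side."""
    result = set(other_side)
    for v in vertices:
        result &= adj[v]
    return frozenset(result)


def enumerate_bicliques_with_memory(v1, v2, adj):
    """Return all pairs (K ∩ V1, K ∩ V2) of maximal bipartite cliques."""
    start = frozenset(v1)
    memory = {start}                  # stored V1-sides
    queue = deque([start])            # stored but not yet processed
    while queue:                      # step 4: repeat while work remains
        x = queue.popleft()           # step 1: take an unprocessed K
        y = common_neighbors(adj, x, v2)
        for v in v2:                  # step 2: generate candidates
            if v in y:
                continue
            x_new = x & adj[v]        # V1-side after forcing v into K
            if x_new not in memory:   # step 3: store only new ones
                memory.add(x_new)
                queue.append(x_new)
    return {(x, common_neighbors(adj, x, v2)) for x in memory}


# Hypothetical example: V1 = {1, 2}, V2 = {3, 4}, edges 1-3, 2-3, 2-4.
v1, v2 = {1, 2}, {3, 4}
adj = {1: {3}, 2: {3, 4}, 3: {1, 2}, 4: {2}}
for x, y in sorted(enumerate_bicliques_with_memory(v1, v2, adj), key=lambda p: sorted(p[0])):
    print(sorted(x), sorted(y))
```

The membership test against `memory` replaces the parent check, which is the trade of extra space for a cheaper test that the slide describes; with this convention a clique with an empty side can appear when one side has no common neighbor.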