Association Mining via Co-clustering of Sparse Matrices
Brian Thompson*, Linda Ness†, David Shallcross†, Devasis Bassu†


Jan 04, 2016



Definitions

Let $M$ be an $m \times n$ matrix. A bicluster of $M$ is a subset of matrix entries formed by the intersection of a set of rows $I$ and a set of columns $J$, and is denoted by $M_{I,J}$.
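To make the definition concrete, here is a minimal Python sketch. It stores a sparse binary $M$ as a set of nonzero positions and measures how dense a bicluster $M_{I,J}$ is. Density is taken here as nonzeros divided by $|I|\cdot|J|$, which is one illustrative notion of "dense." The function name and the set-of-positions representation are our assumptions and do not come from the talk.

```python
from typing import Iterable, Set, Tuple

def bicluster_density(nonzeros: Set[Tuple[int, int]],
                      rows: Iterable[int], cols: Iterable[int]) -> float:
    """Fraction of the entries of M_{I,J} that are nonzero.

    `nonzeros` holds the (row, col) positions of the nonzero entries of a
    sparse binary matrix M; `rows` and `cols` are the index sets I and J.
    (Hypothetical helper for illustration.)
    """
    I, J = set(rows), set(cols)
    if not I or not J:
        return 0.0
    # Count the nonzeros that fall inside the intersection of rows I and columns J.
    hits = sum(1 for (i, j) in nonzeros if i in I and j in J)
    return hits / (len(I) * len(J))

# Usage: a 4 x 4 matrix whose top-left 2 x 2 block is full.
M = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 3)}
print(bicluster_density(M, [0, 1], [0, 1]))        # 1.0
print(bicluster_density(M, [0, 1, 2], [0, 1, 3]))  # 5 of 9 entries, about 0.556
```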


Motivation

Matrices can represent: binary relations, objects and attributes, terms and documents, gene expression, recommender systems, ...

Dense biclusters indicate strong associations

[Figure: matrix $M$ with a bicluster $M_{I,J}$]



Co-Clustering

Co-clustering: Given a matrix, cluster the rows and columns to form large, dense biclusters

Challenges:
• Don’t know the number or sizes of clusters a priori
• Want the solution to be efficient and scalable
• Matrix may be sparse

[Figure: matrix with row clusters R1, R2, R3 and column clusters C1, C2, C3]
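A co-clustering assigns each row to a row cluster (e.g. R1, R2, R3) and each column to a column cluster (e.g. C1, C2, C3). Every (row cluster, column cluster) pair is then a bicluster. The Python sketch below tallies the density of every such bicluster in a single pass over the nonzeros, which is one way to respect the "matrix may be sparse" challenge. It is our own illustration with hypothetical names. It only evaluates a given co-clustering and does not search for one.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def block_densities(nonzeros: Set[Tuple[int, int]],
                    row_labels: List[int],
                    col_labels: List[int]) -> Dict[Tuple[int, int], float]:
    """Density of every bicluster (row cluster a, column cluster b).

    row_labels[i] is the cluster of row i; col_labels[j] is the cluster of column j.
    Cost is O(nnz + m + n + k_r * k_c), so the matrix is never densified.
    (Hypothetical helper for illustration.)
    """
    row_sizes = Counter(row_labels)
    col_sizes = Counter(col_labels)
    # One pass over the nonzeros: each nonzero lands in exactly one bicluster.
    hits = Counter((row_labels[i], col_labels[j]) for (i, j) in nonzeros)
    return {
        (a, b): hits[(a, b)] / (row_sizes[a] * col_sizes[b])
        for a in row_sizes for b in col_sizes
    }

# Usage: two row clusters and two column clusters on a 4 x 4 matrix.
M = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (3, 2), (3, 3)}
print(block_densities(M, [0, 0, 1, 1], [0, 0, 1, 1]))
# {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.75}
```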


Our Approach

We propose a two-step approach:

1. Define a quality metric for bicluster partitions

We consider metrics of the form

(Motivation for this choice is in the 15-minute version of the talk...)

2. Find a co-clustering that maximizes the value of this metric

We propose the CC-MACS algorithm

(Co-Clustering via Maximal Anti-Chain Search)


The CC-MACS Algorithm

1. Build randomized k-d trees on rows (), cols ()

2. Populate for via DP

3. Initialize MACs and heaps ;

4. While at least one of and is non-empty:

• WLOG let

• Update data structures and variables:, , , for

• If , add to

5. Return co-clustering formed by
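The algorithm's name refers to maximal anti-chains (the MACs initialized in step 3). In a rooted tree ordered by ancestry, an anti-chain is a set of nodes none of which is an ancestor of another. Such an anti-chain is maximal exactly when every leaf lies below exactly one of its nodes. So a maximal anti-chain in a tree whose leaves are the rows picks out a partition of the rows, and the same holds for columns.

The Python sketch below illustrates that correspondence and nothing more. It builds a simple randomized k-d-style tree, splitting on a randomly chosen coordinate at the median position, and then takes one particular maximal anti-chain: the nodes at a fixed depth, plus any leaves that end above that depth. The tree construction, the depth cut, and all names are our assumptions for illustration. They are not the CC-MACS procedure, which updates its anti-chains using the DP table and heaps listed above.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

@dataclass
class Node:
    items: List[int]                  # row indices stored under this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_random_kd_tree(vectors: Sequence[Sequence[float]], items: List[int],
                         rng: random.Random, leaf_size: int = 1) -> Node:
    """Hypothetical randomized k-d-style tree: split on a random coordinate."""
    node = Node(items)
    if len(items) <= max(1, leaf_size):
        return node
    d = rng.randrange(len(vectors[items[0]]))  # random split coordinate
    # Stable sort by that coordinate (ties keep index order), cut at the median position.
    ordered = sorted(items, key=lambda i: vectors[i][d])
    mid = len(ordered) // 2
    node.left = build_random_kd_tree(vectors, ordered[:mid], rng, leaf_size)
    node.right = build_random_kd_tree(vectors, ordered[mid:], rng, leaf_size)
    return node

def antichain_at_depth(root: Node, depth: int) -> List[Node]:
    """Nodes at `depth` plus leaves that end above it: a maximal anti-chain."""
    if depth == 0 or root.left is None:  # internal nodes always have two children
        return [root]
    return (antichain_at_depth(root.left, depth - 1)
            + antichain_at_depth(root.right, depth - 1))

# Usage: rows of a small binary matrix; the anti-chain's nodes partition the rows.
rows = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 0]]
tree = build_random_kd_tree(rows, list(range(len(rows))), random.Random(0))
parts = [sorted(n.items) for n in antichain_at_depth(tree, 1)]
print(parts)  # two disjoint groups covering rows 0..4
```

A deeper cut refines the partition, and a shallower one coarsens it. Any search over maximal anti-chains is therefore a search over nested row (or column) partitions.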


Experiments: Synthetic Data

• Generate a matrix with biclusters whose sizes are selected at random; non-bicluster entries are 0, and each bicluster entry is 1 with probability $p$

• Want co-clustering output to match ground truth

• Compare via $F$-score
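The synthetic setup can be sketched in Python as follows. Several details are our assumptions, not the talk's:

- rows and columns are each split into k contiguous groups of random size;
- row group t paired with column group t forms the t-th planted bicluster, whose entries are 1 with probability p (our name) and whose complement is all 0;
- agreement with ground truth is measured by a pairwise F-score over matrix entries, where two entries form a positive pair when they fall in the same bicluster.

The talk does not state the size distribution or the exact F-score variant, and all helper names here are hypothetical.

```python
import random
from collections import Counter
from itertools import product
from math import comb
from typing import Hashable, List, Sequence

def random_group_labels(n: int, k: int, rng: random.Random) -> List[int]:
    """Split indices 0..n-1 into k contiguous, non-empty groups of random size (k <= n)."""
    cuts = sorted(rng.sample(range(1, n), k - 1))
    labels, group = [], 0
    for i in range(n):
        while group < k - 1 and i >= cuts[group]:
            group += 1
        labels.append(group)
    return labels

def planted_matrix(m: int, n: int, k: int, p: float, rng: random.Random):
    """Bicluster t = (row group t) x (column group t); its entries are 1 w.p. p, all others 0."""
    row_labels = random_group_labels(m, k, rng)
    col_labels = random_group_labels(n, k, rng)
    nonzeros = {(i, j) for i in range(m) for j in range(n)
                if row_labels[i] == col_labels[j] and rng.random() < p}
    return nonzeros, row_labels, col_labels

def entry_labels(row_labels: Sequence[int], col_labels: Sequence[int]) -> List[tuple]:
    """Label entry (i, j), in row-major order, by its (row cluster, column cluster)."""
    return list(product(row_labels, col_labels))

def pairwise_f_score(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable]) -> float:
    """Pair-counting F-score: a pair of items is positive when both share a cluster."""
    joint = Counter(zip(true_labels, pred_labels))
    tp = sum(comb(c, 2) for c in joint.values())
    if tp == 0:
        return 0.0
    pred_pairs = sum(comb(c, 2) for c in Counter(pred_labels).values())
    true_pairs = sum(comb(c, 2) for c in Counter(true_labels).values())
    precision, recall = tp / pred_pairs, tp / true_pairs
    return 2 * precision * recall / (precision + recall)

# Usage: plant 3 biclusters in a 30 x 20 matrix; the ground truth scores 1.0 against itself.
rng = random.Random(1)
nz, rows, cols = planted_matrix(30, 20, 3, 0.8, rng)
truth = entry_labels(rows, cols)
print(pairwise_f_score(truth, truth))  # 1.0
```

Counting pairs through the contingency table avoids enumerating all pairs of entries. The cost is linear in the number of labeled entries rather than quadratic.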


Experiments: Real-World Data

• Matrices from domains of finite element modeling and quantum chemistry [src: NIST Matrix Market repository]

[Table of matrix images; columns: Dataset | Original Matrix | Cross-Association | CC-MACS () | CC-MACS () | CC-MACS ()]
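For co-clustering, only the sparsity pattern of these Matrix Market matrices matters. A minimal stdlib-only Python reader for the Matrix Market coordinate format might look like the sketch below. It assumes the coordinate (not array) format and treats every stored entry as a 1. Symmetric, skew-symmetric, and Hermitian files store only one triangle, so the sketch adds the mirror entries back. The function name is ours. In practice, `scipy.io.mmread` reads these files directly.

```python
import io
from typing import Set, TextIO, Tuple

def read_mm_pattern(f: TextIO) -> Tuple[int, int, Set[Tuple[int, int]]]:
    """Return (rows, cols, nonzero positions) from a Matrix Market coordinate file.

    Hypothetical helper: values are ignored, so every stored entry counts as a 1.
    """
    header = f.readline().lower().split()
    # Banner: %%MatrixMarket matrix coordinate <field> <symmetry>
    if len(header) < 5 or header[0] != "%%matrixmarket" or header[2] != "coordinate":
        raise ValueError("expected a Matrix Market coordinate file")
    symmetry = header[4]
    line = f.readline()
    while line and (line.startswith("%") or not line.strip()):  # skip comments/blanks
        line = f.readline()
    m, n, nnz = (int(x) for x in line.split())
    pattern: Set[Tuple[int, int]] = set()
    for _ in range(nnz):
        parts = f.readline().split()
        i, j = int(parts[0]) - 1, int(parts[1]) - 1  # file indices are 1-based
        pattern.add((i, j))
        if symmetry != "general" and i != j:
            pattern.add((j, i))  # only one triangle is stored in the file
    return m, n, pattern

# Usage with a tiny symmetric example.
sample = """%%MatrixMarket matrix coordinate real symmetric
% tiny example
3 3 3
1 1 4.0
2 1 1.5
3 3 2.0
"""
m, n, nz = read_mm_pattern(io.StringIO(sample))
print(m, n, sorted(nz))  # 3 3 [(0, 0), (0, 1), (1, 0), (2, 2)]
```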


Concluding Thoughts

• The CC-MACS algorithm runs in time.

• Our approach compared favorably to state-of-the-art and baseline methods for a classification task on synthetic data.

• Choice of metric can affect quality and granularity of results; different metrics may be appropriate for different applications.

• The CC-MACS algorithm effectively identified large, dense biclusters in the datasets evaluated.


Acknowledgements/Disclaimer

This research was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA8650-10-C-706. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL, or the U.S. Government.

Any misinformation, mistakes, or misunderstanding resulting from this talk are solely the fault of the speaker.


Questions?


Example Matrices

Spectral methods, which try to rearrange rows and columns to form a block-diagonal matrix, would not perform well on this matrix.

The dashed lines suggest a good co-clustering.
