
Graph Theoretic Latent Class Discovery and Its Robustness to Minimal Dominating Set Choice

J. L. Solka, C. E. Priebe, and D. J. Marchette

[email protected];[email protected]

NSWCDD

Interface04 – p.1/24


Agenda

What is latent class discovery?

What are some approaches to the latent class discovery process?

The class cover catch digraph classifier.

Latent class discovery results on a gene expression data set.

Wrap-up and conclusions.


Acknowledgments

Michael C. Minnotte and Jurgen Symanzik, and others, for organizing the conference

Office of Naval Research, through their ILIR Program, for funding this effort


What is Latent Class Discovery?

A latent class is a class of observations that resides undiscovered within a known class of observations.

Develop a general methodology for the discernment of latent class structure during discriminant analysis.

Moderately large hyperdimensional data sets.

During training or testing.

Explore applications of the developed methodologies to the analysis of data sets in the areas of hyperdimensional image analysis, artificial olfactory systems, computer security data, gene expression data, and text data mining.


Flow Chart

[Figure: flow chart with the boxes Hyperdimensional Data, Graph Theoretic Discriminant Analysis, Metric Space Adaptation, Latent Classes, Nonlinear Dimensionality Reduction, Multidimensional Scaling, and Insights.]


Dominating Set

[Figure: two-class data and covering discs; dominating set.]
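The covering-disc picture can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the usual class cover catch digraph construction: each target-class point gets an open ball whose radius is the distance to the nearest point of the other class, and a vertex dominates every target point inside its ball. The function names, the toy data, and the greedy tie-breaking rule (first maximal vertex) are all assumptions made for the example.

```python
import math

def cccd_radii(target, other):
    """Radius of each target point: distance to the nearest point of the other class."""
    return [min(math.dist(x, y) for y in other) for x in target]

def cccd_cover(target, radii):
    """cover[i] holds i itself plus every target index strictly inside the ball around target[i]."""
    return [
        {i} | {j for j, xj in enumerate(target) if math.dist(xi, xj) < r}
        for i, (xi, r) in enumerate(zip(target, radii))
    ]

def greedy_dominating_set(cover):
    """Repeatedly pick the vertex whose ball covers the most still-uncovered points."""
    uncovered = set(range(len(cover)))
    chosen = []
    while uncovered:
        best = max(range(len(cover)), key=lambda i: len(cover[i] & uncovered))
        chosen.append(best)
        uncovered -= cover[best]
    return chosen

# Toy two-class data (hypothetical values, chosen only for illustration).
target = [(0, 0), (1, 0), (0, 1), (5, 5), (5, 6)]
other = [(3, 3)]

radii = cccd_radii(target, other)
cover = cccd_cover(target, radii)
dom = greedy_dominating_set(cover)
print(dom, [round(radii[i], 3) for i in dom])
```

Here two balls suffice to cover all five target points. A different tie-breaking rule would return a different dominating set of the same size, which is the ambiguity examined later in the talk.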


CCCD-Based Latent Class Discovery

[Figure: plot with x-axis ticks from −6 to 4 and y-axis ticks from −7 to 3.]


ALL/AML Leukemia Gene Expression Analysis

72 patients

7129 genes

Apply CCCD to ALL observations

Cluster CCCD solution based on radii

Examine clusters for latent class structure

Ascertain significance of latent class structure

[Figure legend: AML, ALL B-cell, ALL T-cell]
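The "cluster based on radii" step can be sketched as follows. The slides do not say which clustering method is used, so this example assumes single-linkage clustering of the one-dimensional radii, which is equivalent to cutting the sorted radii at the largest gaps. The function name `cluster_radii` and the radii values are hypothetical.

```python
def cluster_radii(radii, k):
    """Split radii into k groups by cutting the sorted values at the k-1 largest gaps.

    For one-dimensional data this matches cutting a single-linkage dendrogram
    at k clusters (an assumed choice of clustering method).
    """
    order = sorted(range(len(radii)), key=lambda i: radii[i])
    gaps = [(radii[order[p + 1]] - radii[order[p]], p) for p in range(len(order) - 1)]
    cuts = {p for _, p in sorted(gaps, reverse=True)[: k - 1]}
    labels = [0] * len(radii)
    label = 0
    for p, i in enumerate(order):
        labels[i] = label
        if p in cuts:
            label += 1  # start a new cluster after each cut position
    return labels

# Hypothetical radii of five dominating-set balls.
example_radii = [0.5, 0.6, 2.0, 2.1, 5.0]
labels3 = cluster_radii(example_radii, 3)
print(labels3)
```

Varying `k` from 1 up to the number of balls gives one cluster map per value of `k`, which can then be examined for latent class structure.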


Resubstitution Error Rate Estimate

For each cluster map dimension $k$, an empirical risk (resubstitution error rate estimate) $\hat{L}_k$ is calculated.

[Equation for $\hat{L}_k$ not recoverable from the extracted text.]


Classification Dimension

We proceed by defining the “scale dimension” $\hat{d}^*$ to be the cluster map dimension that minimizes a dimensionality-penalized empirical risk;

$$\hat{d}^* := \min\{\arg\min_k \hat{L}_k + \alpha \cdot k\}$$

for some penalty coefficient $\alpha \in [0, 1]$.
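The selection rule can be illustrated with a short sketch. It assumes the penalized risk has the form (empirical risk for dimension k) + (penalty coefficient) × k, and that ties are broken toward the smallest k. The function name `scale_dimension` and the risk values are hypothetical, not results from the talk.

```python
def scale_dimension(risks, alpha):
    """Return the smallest k minimizing risks[k-1] + alpha * k.

    risks[k-1] is the empirical (resubstitution) risk for cluster map
    dimension k = 1, 2, ...; alpha is the penalty coefficient.
    """
    penalized = {k: risk + alpha * k for k, risk in enumerate(risks, start=1)}
    best = min(penalized.values())
    return min(k for k, value in penalized.items() if value == best)

# Hypothetical risks for k = 1..5.
example_risks = [0.30, 0.12, 0.05, 0.05, 0.04]
print(scale_dimension(example_risks, 0.02))
print(scale_dimension(example_risks, 0.0))
```

With no penalty the rule simply picks the dimension with the lowest risk. A positive penalty trades a small increase in risk for a lower-dimensional cluster map.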


ALL/AML Classification Dimension Plot


Gene Latent Class Discovery


ALL/AML MDS Plot


How Robust is the Methodology?

One other “success” story using artificial nose data.

What if we had used another dominating set in our analysis?

Is the discovered latent class structure independent of the dominating set used?


An Exhaustive Enumeration of All Possible Dominating Sets for the Gene Data

180 21-node solutions

16 of the nodes remain fixed across the solutions

14 greedy solutions
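A brute-force enumeration of this kind can be sketched on a toy digraph. This is only an illustration under stated assumptions. It searches subsets in order of increasing size and keeps every dominating set of the smallest feasible size. The search is exponential in the number of vertices, so it says nothing about how the enumeration for the gene data was actually carried out. The `toy_cover` structure (each vertex mapped to the set of vertices it dominates, itself included) is hypothetical.

```python
from itertools import combinations

def minimum_dominating_sets(cover):
    """Return all dominating sets of minimum size.

    cover[i] is the set of vertices dominated by vertex i (including i).
    """
    n = len(cover)
    everything = set(range(n))
    for size in range(1, n + 1):
        found = [
            subset
            for subset in combinations(range(n), size)
            if set().union(*(cover[i] for i in subset)) == everything
        ]
        if found:
            return found
    return []

# Toy digraph: vertices 0-2 dominate each other, 3-4 dominate each other,
# and vertex 5 is dominated by no one but itself.
toy_cover = [{0, 1, 2}, {0, 1, 2}, {0, 1, 2}, {3, 4}, {3, 4}, {5}]

solutions = minimum_dominating_sets(toy_cover)
fixed = set.intersection(*(set(s) for s in solutions))
print(len(solutions), fixed)
```

In the toy example vertex 5 appears in every solution because no other vertex dominates it. The same reasoning explains why some nodes stay fixed across all minimum dominating sets.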


Classification Space Curves for the 180 Solutions

[Figure: curves for the 180 solutions; x-axis ticks from 5 to 20, y-axis ticks from 0.00 to 0.30.]


Classification Dimension for the 180 Solutions (Red o Greedy Solutions, Green * Previous Solution)

[Figure: classification dimension for each solution; x-axis ticks from 0 to 180, y-axis ticks from 2 to 7.]


Number of Dominating Sets for Each Vertex

[Figure: number of dominating sets for each vertex; x-axis "Vertex" (ticks 0 to 40), y-axis "# Dominating Sets" (ticks 0 to 150); legend: T-Cell, B-Cell, In-degree 0.]


Digraph Analysis

Figure 1: Unique induced subgraphs on the changing vertices in the 180 dominating sets.

Figure 2: Unique induced subgraphs on the changing vertices in the 14 dominating sets that could result from a greedy algorithm.


Latent Class Discovery Figures of Merit

How can we be assured that all of the greedy dominating set solutions discover the same latent classes?

Previous greedy solution had 3 clusters that are pure B and 1 cluster that contained 8/9 of the T observations

Percentage of B points that are in pure B clusters and the highest percentage of T points in any one cluster
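The two figures of merit can be computed as in the sketch below. It assumes each ALL observation already carries a cluster label and a class label of "B" or "T". The function name `figures_of_merit` and the toy labels are hypothetical, and the values are returned as fractions rather than percentages.

```python
def figures_of_merit(cluster_labels, class_labels):
    """Return (bpercent, tpercent) as fractions.

    bpercent: fraction of B points that fall in clusters containing only B points.
    tpercent: largest fraction of T points found in any single cluster.
    """
    clusters = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        clusters.setdefault(cluster, []).append(cls)
    n_b = class_labels.count("B")
    n_t = class_labels.count("T")
    b_in_pure = sum(len(members) for members in clusters.values() if set(members) == {"B"})
    t_best = max(members.count("T") for members in clusters.values())
    return b_in_pure / n_b, t_best / n_t

# Toy labels: clusters 0 and 1 are pure B, cluster 2 is mixed, cluster 3 is a lone T.
toy_clusters = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
toy_classes = ["B", "B", "B", "B", "B", "B", "T", "T", "T", "T"]

bpercent, tpercent = figures_of_merit(toy_clusters, toy_classes)
print(bpercent, tpercent)
```

Computing this pair for every dominating set solution gives one point per solution, so the solutions can be compared on a single scatter plot.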


Purity (Latent Class Discovery) for the Golub Gene Data, Red Triangles are the Greedy Solutions

[Figure: scatter plot of the solutions; x-axis "bpercent" (ticks 0.4 to 0.9), y-axis "tpercent" (ticks 0.80 to 1.00).]


Remaining Questions

Demonstrated similar latent class discovery among all of the greedy dominating set solutions

Many of the 7129 variates (genes) are superfluous to the discriminant analysis problem

Work is ongoing to examine the discovered latent classes based on subsets of the genes

Various figures of merit have been used to choose the subsets of the genes


Conclusions

Developed a new concept for latent class discovery during discriminant analysis

Illustrated one graph theoretic methodology for the discovery of the latent classes

Illustrated this methodology with a gene expression data set.

Presented some preliminary results examining the robustness of the discovery process to the CCCD process


Readings

C. E. Priebe, J. L. Solka, D. J. Marchette, and B. T. Clark, “Class Cover Catch Digraphs for Latent Class Discovery in Gene Expression Monitoring by DNA Microarrays,” to appear in the Special Issue of Computational Statistics and Data Analysis on Statistical Visualization, 2002+.

J. L. Solka, C. E. Priebe, and B. T. Clark, “A Visualization Framework for the Analysis of Hyperdimensional Data,” in International Journal of Image and Graphics, Special Issue on Data Mining, 2002.

D. J. Marchette and C. E. Priebe, “Characterizing the scale dimension of a high-dimensional classification problem,” in Pattern Recognition, 2002.
