Top Banner
Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin
35

Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Dec 22, 2015

Download

Documents

Dwain Stokes
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Multiclassifier Systems: Back to the Future

Joydeep Ghosh

The University of Texas at Austin

Page 2: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Agenda

• MCS at crossroads– Even part of SAS,….. but what’s next?

• Historical Tidbits– Selected (old) highlights– Themes worth re-visiting

• Broadening the scope– Combining multiple clusterings

• Knowledge transfer/reuse

– Exploiting output space• Limits to performance, confidences, added classes,…

– Modular approaches revisited

Page 3: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Combining Votes/Ranks• Roots in French revolution?

– Jean-Claude de Borda, 1781– Condorcet’s rule, 1785

• Duncan Black (1958): Condorcet then Borda

– Condorcet’s Jury Theorem, 1785

• Social choice functions or group consensus functions– Arrow’s impossibility theorem (1963)

• Even 3 classes can be problematic

Did not have “true class”

• Linear opinion pool: Laplace

Page 4: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Multi-class Winner-Take All

• Selfridge’s PANDEMONUIM (1958)– Ensembles of specialized demons

– Hierarchy: Data, computational and cognitive demons

– Decision : pick demon that “shouted the loudest”• Hill climbing; re-constituting useless demons, ….

• Nilsson’s Committee Machine (1965)– Pick max of C linear discriminant functions:

g i (X) = Wi T X + wi0

Page 5: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Hybrid PR in the 70’s and 80’s

Theory: “No single model exists for all pattern recognition problems and no single technique is applicable to all problems. Rather what we have is a bag of tools and a bag of problems” Kanal, 1974

Practice: multimodal inputs; multiple representations,…

• Syntactic + structural + statistical approaches (Bunke 86)• multistage models:

– progressively identify/reject subset of classes– invoke KNN if linear classifier is ambiguous

• Combining multiple output types: 0/1; [0,1],…

Designs were typically specific to application

Page 6: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Combining in Other Areas

PR/Vision: Data /sensor/decision fusion

AI: evidence combination (Barnett 81)

Econometrics: combining estimators (Granger 89)

Engineering: Non-linear control

Statistics: model-mix methods, Bayesian Model Averaging,..

Software: diversity

….

Page 7: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

mid 90’s-: Competing vs. Cooperating Models

Data, Knowledge, Sensors,…

Classification/ Regression Model 1

Model 2

Model n

Combiner

Confidence, ROC,

Final Decision

Feedback

Less diversity nowadays??

Page 8: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Motivation for Modular Networks

(Sharkey 97)– More interpretable localized models (Divide and conquer)– Incorporate prior knowledge– Better modeling of inverse problems, discontinuous maps,

switched time series, ..– Future (localized) modifications– Neurobiological plausibility

Varieties:

Cooperative, successive, supervisory,..Automatic or explicit decomposition

• Progress in MCS:– Local selection (Woods et al 97);– dynamic classifiers (Giacinto & Roli, 00)

Page 9: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

DARPA Sonar Transients Classification Program (1989-)

J. Ghosh, S. Beck and L. Deuser, IEEE Jl. of Ocean Engineering, Vol 17, No. 4, October 1992, pp. 351-363.

Ave/median/..

MLP RBF Classifer N

FFT

Pre-processsed Data from Observed Phenomenon

. . .

. . . . . .

Gabor Wavelets

Feature Set M

Page 10: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Ensembles: Insights and Lessons (Ho, MCS 2001)

Additional Observations

• Coverage Optimization:– Bagging/arcing/.. Most popular in machine learning and neural

network communities!

– sweet spot in training data set size

• Decision Optimization:– Usually simple averaging adequate (Kittler et al, 96,98)

– Highly correlated outputs

– Diversity from feature and classifier choices more effective than diversity from samples/training

Page 11: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Cluster Ensembles

• Given a set of provisional partitionings, we want to aggregate them into a single consensus partitioning, even without access to original features .

Clusterer #1

(individual cluster labels)

(consensus labels)

Page 12: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Cluster Ensemble Problem

• Let there be r clusterings (r) with k(r) clusters each

• What is the integrated clustering that optimally summarizes the r given clusterings using k clusters?

Much more difficult than Classification ensembles

Page 13: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Application Scenarios

• Improve quality and robustness– Reduce variance

– Good results on a wide range of data using a diverse portfolio of algorithms

• Knowledge reuse– Consolidate legacy clusterings where original object descriptions

are no longer available

• Distributed Clustering (one clusterer/ node)– Only some features available per clusterer

– Only some objects available per clusterer

– Hybrids

Page 14: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Average Norm. Mutual Info. (ANMI)

• Normalized mutual information between clusterings a, b

• Other normalizations, e.g. using geometric mean, possible

• Proposed: Optimal k consensus clustering

• Empirical validation

Page 15: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Designing a Consensus Function

• Direct optimization – impractical

• Three efficient heuristics– Cluster-based Similarity Partitioning Alg. (CSPA)

• O( n2 k r)

– HyperGraph Partitioning Alg. (HGPA)• O( n k r)

– Meta-Clustering Alg. (MCLA)• O( n k2 r2)

All 3 exploit a hypergraph representation of the sets of cluster labels (input to consensus function)

See AAAI 2002 paper for details.

• Supra-consensus function : performs all three and picks the one with highest ANMI (fully unsupervised)

Page 16: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Hypergraph Representation

• One hyperedge/cluster

• Example:

Page 17: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Applications and Experiments

• Data-setsa) 2-dimensional bi-modal Gaussian simulated data

(k=2, d=2, n=1000)

b) 5 Gaussians in 8-dimensions (k=5, d=8, n=1000)

c) Pen digit data (k=3, d=4, n=7494)

d) Yahoo news web-document data (k=40, d=2903, n=2340)

• application setups– Robust Consensus Clustering (RCC)

– Feature Distributed Clustering (FDC)

Page 18: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Robust Consensus Clustering (RCC)

• Goal: Create an `auto-focus´ clusterer that works for a wide variety of data-sets

• Diverse portfolio of 10 approaches– SOM, HGP

– GP (Eucl, Corr, Cosi, XJac)

– KM (Eucl, Corr, Cosi, XJac)

• Each approach is run on the same subsample of the data and the 10 clusterings combined using our supra-consensus function

• Evaluation using increase in NMI of supra-consensus results increase over Random

Page 19: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Robustness Summary

• Avg. qualityversusensemblequality

• For severalsamplesizes n(50,100,200,400,800)

• 10-fold exp.• ±1 standard

deviation bars

Page 20: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Feature-Distributed Clustering (FDC)

Federated cluster analysis with partial feature views

• Experimental scenario– Portfolio of r clusterers receiving random subset of features for all

objects

• Approach– identical individual clustering algorithm (graph partitioning) and

same k

– Use supra-consensus function for combining

• Evaluation– NMI of consensus with category labels

Page 21: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

FDC Example

• Data: 5 Gaussians in 8 dimensions

• Experiment: 5 clusterings in 2-dimensional subspaces

• Result: Avg. ind. 0.70, best ind. 0.77, ensemble 0.99

Page 22: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Experimental Results FDC• Reference clustering and consensus clustering

• Ensemble always equal or better than individual:

• More than double the avg. individual quality in YAHOO!

Page 23: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Remarks

• Cluster ensembles – Improve quality & robustness

– Enable knowledge reuse

– Work with distributed data

– Are yet largely unexplored

• Future work– Soft in/output clusterings

– What if (some) Features are known?

– Bioinformatics

• Papers, data, demos & code at http://strehl.com/

Page 24: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Solving Related Classification Problems

• Real-world problems are often not isolated

• History:– Compound decision theory (Abend, 68)

90s: Life-long learning, learning to learn, … (Pratt, Thrun,..)

Page 25: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Knowledge transfer or reuse

• Leveraging a set of previously existing solutions for (possibly) related problems

• Scarce new data prior knowledge

ExistingSUPPORT TARGET

Classifier j

Size Color Shape

Classifier i

Size Color Shape

Knowledge Transfer

Page 26: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Supra-Classifier Architecture

K. Bollacker and J. Ghosh, "Knowledge reuse in multiclassifier systems", Pattern Recognition Letters, 18 (11-13), Nov 1997, 1385-90.

Page 27: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Output Space Decomposition

History: • Pandemonium, committee machine

– “1 class vs. all others”

• Pairwise classification (how to combine?)• Limited

• Application specific solutions (80’s)

• Error correcting output coding (Dietterich & Bakhiri, 95)+ve: # of meta-classifiers can be less; can tailor features-ve : groupings may be forced

• Desired: a general framework for natural grouping of classes– Hierarchical with variable resolution– Custom features

Page 28: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Hierarchical Grouping of Classes

• Top down: Solve 3 coupled problems– group classes into two meta-classes– design feaure extractor tailored for the 2 meta-classes (e.g. Fisher)– design the 2-metaclass classifier (Bayesian)

• Solution using Deterministic Annealing :– Softly associate each class with both partitions– Compute/update the most discriminating features– Update associations;

• For hard associations: also lower temperature

– Recurse

– Fast convergence, computation at macro-level

Page 29: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Binary Hierarchical Classifier

• Building the tree:– Bottom-Up– Top-Down

• Hard & soft variants

• Provides valuable domain knowledge

• Simplified feature extraction at each stage

Page 30: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

The Future: Scaling to Large, Non-Stationary Datasets

• Can build a knowledge base of – discriminating features

– Typical class pairings

• More amenable to changing mix of classes, changing class statistics

• Integrate with semi-supervised learning methods

Page 31: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Re-visiting Mixtures of Experts (MoEs)

• Hierarchical versions possible

Expert 1

Expert 2

Expert K

Gating Network

x …

gk

g2

g1(x)

y2

yk

y1Y(x) =

gi (x)yi

(x)

Page 32: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Beyond Mixtures of Experts

• Problems with soft-max based gating network

• Alternative: use normalized Gaussians– Structurally adaptive: add/delete experts

• on-line learning versions

• hard vs. soft switching; error bars, etc

– Piaget’s assimilation & accomodation

V. Ramamurti and J. Ghosh, "Structurally Adaptive Modular Networks for Non-Stationary Environments", IEEE Trans. Neural Networks, 10(1), Jan 1999, pp. 152-60.

Page 33: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Generalizing MoE models

• Mixtures of X

– X = HMMs, factor models, trees, principal components…

• State dependent gating networks

– Sequence classification

• Mixture of Kalman Filters

– Outperformed NASA’s McGill filter bank!

W. S. Chaer, R. H. Bishop and J. Ghosh, "Hierarchical Adaptive Kalman Filtering for Interplanetary Orbit Determination", IEEE Trans. on Aerospace and Electronic Systems, 34(3), Aug 1998, pp. 883-896. 

Page 34: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Some Directions for MCS

– Extend to “Multi-learner systems”

– Develop a Meta-theory based on data properties

- for Classification:• Catering to changing statistics and changing questions

(“concept drift”)• Maintaining explainability (cf. Brieman’s constant)• Classification of sequences

– Online evidence accumulation• distributed data mining and scalability issues• Active learning• Implications for feature selection• Computational aspects

Page 35: Joydeep Ghosh UT-ECE Multiclassifier Systems: Back to the Future Joydeep Ghosh The University of Texas at Austin.

Joydeep Ghosh UT-ECE

Acknowledgements:

Completed PhD theses:

Alexander Strehl, (cluster ensembles), May 02Shailesh Kumar, “Modular Learning Through Output Space

Transformations”, 2000

Viswanath Ramamurti, Modular Networks, 1997  Kurt D. Bollacker, “A Supra-Classifier Framework for Knowledge Reuse”,

1998   Kagan Tumer, math analysis of ensembles, 1996  Ismail Taha, symbolic + connectionist, 1997 Papers at: http://www.lans.ece.utexas.edu