Bell Laboratories Intrinsic complexity of classification problems Tin Kam Ho With contributions from Mitra Basu, Ester Bernado-Mansilla, Richard Baumgartner, Martin Law, Erinija Pranckeviciene, Albert Orriols-Puig, Nuria Macia

Jan 20, 2016

Page 1

Bell Laboratories

Intrinsic complexity of classification problems

Tin Kam Ho

With contributions from

Mitra Basu, Ester Bernado-Mansilla, Richard Baumgartner, Martin Law, Erinija Pranckeviciene, Albert Orriols-Puig, Nuria Macia

Page 2

2 All Rights Reserved © Alcatel-Lucent 2008

Supervised Learning: Many Methods, Data Dependent Performances

Bayesian classifiers, logistic regression, linear & polynomial discriminators, nearest-neighbors, decision trees & forests, neural networks, support vector machines, ensemble methods, …

Classification accuracy (%) on UCI benchmark problems ("-" = no result reported):

Dataset ZeroR   NN1   NNK    NB  C4.5  PART   SMO   XCS
aud      25.3  76.0  68.4  69.6  79.0  81.2     -  57.7
aus      55.5  81.9  85.4  77.5  85.2  83.3  84.9  85.7
bal      45.0  76.2  87.2  90.4  78.5  81.9     -  79.8
bpa      58.0  63.5  60.6  54.3  65.8  65.8  58.0  68.2
bps      51.6  83.2  82.8  78.6  80.1  79.0  86.4  83.3
bre      65.5  96.0  96.7  96.0  95.4  95.3  96.7  96.0
cmc      42.7  44.4  46.8  50.6  52.1  49.8     -  52.3
gls      34.6  66.3  66.4  47.6  65.8  69.0     -  72.6
h-c      54.5  77.4  83.2  83.6  73.6  77.9     -  79.9
hep      79.3  79.9  80.8  83.2  78.9  80.0  83.9  83.2
irs      33.3  95.3  95.3  94.7  95.3  95.3     -  94.7
krk      52.2  89.4  94.9  87.0  98.3  98.4  96.1  98.6
lab      65.4  81.1  92.1  95.2  73.3  73.9  93.2  75.4
led      10.5  62.4  75.0  74.9  74.9  75.1     -  74.8
lym      55.0  83.3  83.6  85.6  77.0  71.5     -  79.0
mmg      56.0  63.0  65.3  64.7  64.8  61.9  67.0  63.4
mus      51.8 100.0 100.0  96.4 100.0 100.0 100.0  99.8
mux      49.9  78.6  99.8  61.9  99.9 100.0  61.6 100.0
pmi      65.1  70.3  73.9  75.4  73.1  72.6  76.7  76.0
prt      24.9  34.5  42.5  50.8  41.6  39.8     -  43.7
seg      14.3  97.4  96.1  80.1  97.2  96.8     -  96.1
sick     93.8  96.1  96.3  93.3  98.4  97.0  93.8  96.7
soyb     13.5  89.5  90.3  92.8  91.4  90.3     -  76.2
tao      49.8  96.1  96.0  80.8  95.1  93.6  83.6  88.4
thy      19.5  68.1  65.1  80.6  92.1  92.1     -  86.3
veh      25.1  69.4  69.7  46.2  73.6  72.6     -  72.2
vote     61.4  92.4  92.6  90.1  96.3  96.5  95.6  95.4
vow       9.1  99.1  96.6  65.3  80.7  78.3     -  87.6
wne      39.8  95.6  96.8  97.8  94.6  92.9     -  96.3
zoo      41.7  94.6  92.5  95.4  91.6  92.5     -  92.6
Avg      44.8  80.0  82.4  78.0  82.1  81.8  84.1  81.7

• No single method is best for all problems

• For many practical problems, accuracy hits a ceiling even with the best-known method
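The Python sketch below makes the first point concrete. It takes five rows copied from the table and lists the top-scoring method or methods for each. `METHODS`, `ROWS`, and `winners` are illustrative names, and "-" entries are treated as missing. These five rows already produce five different outcomes: NB, XCS, SMO, NN1, and a C4.5/PART tie.

```python
# Accuracy (%) for a few rows copied from the table above.
# Column order matches the table; None stands for a "-" entry.
METHODS = ["ZeroR", "NN1", "NNK", "NB", "C4.5", "PART", "SMO", "XCS"]
ROWS = {
    "bal": [45.0, 76.2, 87.2, 90.4, 78.5, 81.9, None, 79.8],
    "krk": [52.2, 89.4, 94.9, 87.0, 98.3, 98.4, 96.1, 98.6],
    "mmg": [56.0, 63.0, 65.3, 64.7, 64.8, 61.9, 67.0, 63.4],
    "vow": [9.1, 99.1, 96.6, 65.3, 80.7, 78.3, None, 87.6],
    "thy": [19.5, 68.1, 65.1, 80.6, 92.1, 92.1, None, 86.3],
}


def winners(scores):
    """Return every method tied for the highest reported accuracy."""
    best = max(s for s in scores if s is not None)
    return [m for m, s in zip(METHODS, scores) if s == best]


for name, scores in ROWS.items():
    print(name, winners(scores))
```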

Page 3

Accuracy Depends on the Goodness of Match between Classifiers and Problems

[Figure: Problem A: NN error = 0.06%, XCS error = 1.9%, NN is better. Problem B: XCS error = 0.6%, NN error = 0.7%, XCS is better.]

Page 4

Measuring Geometrical Complexity of Classification Problems

Our goal: tools and languages for studying

Characteristics of geometry & topology of high-dim data sets

How they change with feature transformations and sampling

How they interact with classifier geometry

We want to know:

What are real-world problems like? What is my problem like? What can be expected of a method on a specific problem?

Page 5

Parameterization of Data Complexity

Page 6

Some Useful Measures of Geometric Complexity

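Fisher’s Discriminant Ratio

$f = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$, where $\mu_i$ and $\sigma_i^2$ are the mean and variance of class $i$ on one feature

Classical measure of class separability

Maximize over all features to find the most discriminating one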
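The following is a minimal Python sketch of the ratio, taking the maximum over features as the slide suggests. It assumes population variances (the slide does not say which estimator is used), and the helper names are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(a, b):
    """Fisher's ratio (mu1 - mu2)^2 / (sigma1^2 + sigma2^2) for one feature.

    a and b hold that feature's values for class 1 and class 2;
    population variances are an assumption of this sketch.
    """
    gap = (mean(a) - mean(b)) ** 2
    spread = pvariance(a) + pvariance(b)
    if spread == 0:
        # Both classes are constant on this feature: perfectly separable
        # if their values differ, not separable at all if they coincide.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def max_fisher_ratio(class1, class2):
    """Maximize the ratio over features; class1/class2 are lists of feature vectors."""
    n_features = len(class1[0])
    return max(
        fisher_ratio([x[j] for x in class1], [x[j] for x in class2])
        for j in range(n_features)
    )


class1 = [(0.0, 5.0), (2.0, 6.0)]
class2 = [(10.0, 5.5), (12.0, 5.5)]
print(max_fisher_ratio(class1, class2))  # feature 0 wins: 100 / 2 = 50.0
```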

Degree of Linear Separability

Find separating hyper-plane by linear programming

Error counts and distances to plane measure separability
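The slide finds the separating hyperplane by linear programming. To keep this sketch free of solver dependencies, the Python below swaps in a perceptron, which is a different technique. When the classes are linearly separable, the perceptron is guaranteed to reach zero training errors (the perceptron convergence theorem). On non-separable data, however, its error count after a fixed number of epochs is only a rough indicator, not the LP objective. The function name and the epoch cap are hypothetical.

```python
def perceptron_errors(points, labels, epochs=1000):
    """Train a perceptron and count the training points it still misclassifies.

    labels must be -1 or +1. Zero errors certifies linear separability;
    a positive count after `epochs` passes is only suggestive.
    """
    w = [0.0] * len(points[0])
    b = 0.0

    def margin(x, y):
        # Signed margin of (x, y) under the current weights and bias.
        return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if margin(x, y) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return sum(1 for x, y in zip(points, labels) if margin(x, y) <= 0)


separable = [(0, 0), (1, 0), (3, 3), (4, 3)]
xor = [(0, 0), (1, 1), (0, 1), (1, 0)]
print(perceptron_errors(separable, [-1, -1, 1, 1]))  # 0
print(perceptron_errors(xor, [-1, -1, 1, 1]))        # at least 1
```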

Length of Class Boundary

Compute minimum spanning tree

Count class-crossing edges
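Below is a Python sketch of the boundary measure. It assumes Euclidean distance and builds the minimum spanning tree with Prim's O(n²) algorithm. The slide only says to count class-crossing edges. The second returned value, the fraction of points touching such an edge, is one possible normalization and should be treated as an assumption. The function name is hypothetical.

```python
import math


def mst_boundary(points, labels):
    """Build the Euclidean minimum spanning tree (Prim's algorithm) and
    return (number of class-crossing edges, fraction of points on such edges)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex into the tree
    parent = [-1] * n
    best[0] = 0.0
    crossing_edges = 0
    boundary_points = set()
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        p = parent[u]
        if p >= 0 and labels[p] != labels[u]:
            crossing_edges += 1
            boundary_points.update((p, u))
        # Relax edges from u to the vertices still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return crossing_edges, len(boundary_points) / n


line = [(0,), (1,), (2,), (10,), (11,), (12,)]
print(mst_boundary(line, [0, 0, 0, 1, 1, 1]))  # (1, 0.3333333333333333)
print(mst_boundary(line, [0, 1, 0, 1, 0, 1]))  # (5, 1.0)
```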

Shapes of Class Manifolds

Cover same-class points with maximal balls

Ball counts describe shape of class manifold
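The slide gives only the idea, so the Python below is a deliberately simplified, hypothetical version rather than the exact published procedure. Each point gets a ball that extends to its nearest opposite-class point. Any ball lying inside a larger same-class ball is absorbed, and the function reports the fraction of balls that survive. Fewer surviving balls suggest simpler class manifolds.

```python
import math


def retained_ball_fraction(points, labels):
    """Simplified ball-covering measure (an assumption, not the published algorithm).

    Each point's ball reaches its nearest opposite-class point; balls inside
    a larger same-class ball are absorbed. Returns surviving balls / n.
    Both classes must be present.
    """
    n = len(points)
    radius = [
        min(math.dist(points[i], points[j]) for j in range(n) if labels[j] != labels[i])
        for i in range(n)
    ]

    def absorbed(i):
        for j in range(n):
            if j == i or labels[j] != labels[i]:
                continue
            inside = math.dist(points[i], points[j]) + radius[i] <= radius[j]
            # Identical balls: keep only the one with the lowest index.
            if inside and (radius[i] < radius[j] or j < i):
                return True
        return False

    return sum(1 for i in range(n) if not absorbed(i)) / n


line = [(0,), (1,), (2,), (10,), (11,), (12,)]
print(retained_ball_fraction(line, [0, 0, 0, 1, 1, 1]))  # one ball per class: 2 / 6
```

With alternating labels on `[(0,), (1,), (2,), (3,)]`, no ball absorbs another, so every ball survives and the result is 1.0.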

Page 7

Using Complexity Measures to Study Problem Distributions

Real-World Data Sets:

Benchmarking data from the UC Irvine archive

844 two-class problems: 452 are linearly separable, 392 non-separable

Synthetic Data Sets:

Random labeling of randomly located points: 100 problems in 1-100 dimensions

[Figure: problems plotted by Complexity Metric 1 vs. Metric 2; groups: random labeling, linearly separable real-world data, linearly non-separable real-world data]
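The random-labeling problems can be generated roughly as follows. The slide does not give the point distribution, sample size, or label balance. The unit hypercube, fair-coin labels, and 500 points per problem below are therefore assumptions, as is the function name.

```python
import random


def random_labeling_problem(n_points, dim, seed=None):
    """One random-labeling problem: uniform points in the unit hypercube,
    each assigned class 0 or 1 by a fair coin flip (both choices are assumptions)."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n_points)]
    labels = [rng.randint(0, 1) for _ in range(n_points)]
    return points, labels


# 100 problems in 1 to 100 dimensions; 500 points each is a hypothetical choice.
problems = [random_labeling_problem(500, d, seed=d) for d in range(1, 101)]
```

Because the labels ignore location, any structure a complexity measure detects in these problems is an artifact of sampling. This makes them a reference point for "no learnable structure".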

Page 8

Measures of Geometrical Complexity

Page 9

Distribution of Problems in Complexity Space

[Figure legend: lin.sep, lin.nonsep, random]

Page 10

The First 6 Principal Components

Page 11

Interpretation of the First 4 PCs

PC 1: 50% of variance: Linearity of boundary and proximity of opposite class neighbor

PC 2: 12% of variance: Balance between within-class scatter and between-class distance

PC 3: 11% of variance: Concentration & orientation of intrusion into opposite class

PC 4: 9% of variance: Within-class scatter
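Each percentage above is a component's share of total variance: its eigenvalue of the covariance matrix of the complexity measures, divided by the sum of all eigenvalues (the trace). The small Python sketch below computes this share for the first component using power iteration. These slides do not say whether the measures were standardized before the PCA, so the sketch works on the raw covariance. The function names are hypothetical.

```python
import math


def covariance(rows):
    """Population covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]


def first_pc_share(rows, iterations=200):
    """Share of total variance captured by the first principal component.

    Power iteration finds the largest eigenvalue of the covariance matrix;
    total variance is its trace (the sum of all eigenvalues).
    """
    cov = covariance(rows)
    d = len(cov)
    v = [1.0] * d  # assumes this start vector is not orthogonal to the top eigenvector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    top = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return top / sum(cov[a][a] for a in range(d))


# Four "problems" described by two measures each; the variances are 2.0 and 0.5.
print(first_pc_share([(2, 0), (-2, 0), (0, 1), (0, -1)]))  # about 0.8
```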

Page 12

Problem Distribution in 1st & 2nd Principal Components

• Continuous distribution

• Known easy & difficult problems occupy opposite ends

• Few outliers

• Empty regions

[Figure legend: random labels, linearly separable]

Page 13

Relating Classifier Behavior to Data Complexity

Page 14

Class Boundaries Inferred by Different Classifiers

XCS: a genetic-algorithm-based classifier

Nearest neighbor classifier

Linear classifier

Page 15

Domains of Competence of Classifiers

• Which classifier works best for a given classification problem?

• Can data complexity give us a hint?

[Figure: complexity space (complexity metric 1 vs. metric 2) with regions labeled NN, LC, XCS, and Decision Forest, and one region marked "?"]

Page 16

Domain of Competence Experiment

Use a set of 9 complexity measures: Boundary, Pretop, IntraInter, NonLinNN, NonLinLP, Fisher, MaxEff, VolumeOverlap, Npts/Ndim

Characterize 392 two-class problems from UCI data, all shown to be linearly non-separable

Evaluate 6 classifiers:

• NN (1-nearest neighbor)

• LP (linear classifier by linear programming)

• Odt (oblique decision tree)

• Pdfc (random subspace decision forest), an ensemble method

• Bdfc (bagging-based decision forest), an ensemble method

• XCS (a genetic-algorithm-based classifier), an ensemble method
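As one example of the nine measures, the Python below sketches IntraInter. The slide lists only the measure's name, so the definition here is an assumption: the sum of each point's distance to its nearest same-class neighbor, divided by the sum of each point's distance to its nearest opposite-class neighbor. The function name is hypothetical.

```python
import math


def intra_inter(points, labels):
    """Sum of distances to each point's nearest same-class neighbour divided by
    the sum of distances to its nearest opposite-class neighbour.

    Small values mean tight classes that sit far apart. Every class needs
    at least two points.
    """
    intra = inter = 0.0
    for i, (p, y) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, z) in enumerate(zip(points, labels))
                if j != i and z == y]
        other = [math.dist(p, q) for q, z in zip(points, labels) if z != y]
        intra += min(same)
        inter += min(other)
    return intra / inter


line = [(0,), (1,), (2,), (10,), (11,), (12,)]
print(intra_inter(line, [0, 0, 0, 1, 1, 1]))  # 6 / 54, about 0.11
```

With interleaved labels on `[(0,), (1,), (2,), (3,)]`, the same-class neighbors are twice as far away as the opposite-class ones, and the ratio rises to 2.0.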

Page 17

Identifiable Domains of Competence by NN and LP

Best Classifier for Benchmarking Data

Page 18

Less Identifiable Domains of Competence

Regions in complexity space where the best classifier is (nn, lp, or odt) vs. an ensemble technique

[Figure: three panels, Boundary vs. NonLinNN, IntraInter vs. Pretop, MaxEff vs. VolumeOverlap; legend: ensemble, nn/lp/odt]

Page 19

Difficulties in Estimating Data Complexity

Page 20

Apparent vs. True Complexity: Uncertainty in Measures due to Sampling Density

[Figure: the same problem sampled with 2, 10, 100, 500, and 1000 points]

A problem may appear deceptively simple or complex when only a small sample is available
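The effect can be reproduced by re-estimating a measure on random subsamples of increasing size. The slide does not say which measure the figure tracks. This Python sketch uses the leave-one-out 1-nearest-neighbor error as a stand-in and a hypothetical 4 x 4 checkerboard problem; the function names are hypothetical too.

```python
import math
import random


def loo_1nn_error(points, labels):
    """Leave-one-out error rate of the 1-nearest-neighbour rule."""
    errors = 0
    for i, p in enumerate(points):
        nearest = min((j for j in range(len(points)) if j != i),
                      key=lambda j: math.dist(p, points[j]))
        errors += labels[nearest] != labels[i]
    return errors / len(points)


def error_by_sample_size(points, labels, sizes, seed=0):
    """Re-estimate the measure on one random subsample per size (each size >= 2)."""
    rng = random.Random(seed)
    result = {}
    for m in sizes:
        idx = rng.sample(range(len(points)), m)
        result[m] = loo_1nn_error([points[i] for i in idx], [labels[i] for i in idx])
    return result


# Hypothetical test problem: a 4 x 4 checkerboard on the unit square.
rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(400)]
labs = [(int(4 * x) + int(4 * y)) % 2 for x, y in pts]
print(error_by_sample_size(pts, labs, [10, 50, 400]))
```

With only a handful of points, the checkerboard's sixteen cells are barely sampled. The estimate then says little about the true boundary, and it can change noticeably with the seed.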

Page 21

Uncertainty of Estimates at Two Levels

Sparse training data in each problem & complex geometry cause ill-posedness of class boundaries

(uncertainty in feature space)

Sparse sample of problems causes difficulty in identifying regions of dominant competence

(uncertainty in complexity space)

Page 22

Complexity Estimates and Dimensionality Reduction

Feature selection/transformation may change the difficulty of a classification problem by:

• Widening the gap between classes

• Compressing the discriminatory information

• Removing irrelevant dimensions

It is often unclear to what extent these happen. We seek a quantitative description of such changes.

[Figure labels: feature selection, discrimination]

Page 23

[Figure: Boundary (x-axis) vs. 1NN error (y-axis) for forward feature selection (FFS) subsets of all datasets: spectra1, colon, spectra2, eogat, ovarian, spectra3]

Spread of classification accuracy and geometrical complexity due to forward feature selection
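Forward feature selection can be sketched as a greedy loop that records every subset it visits. Each recorded subset could then be scored with both 1NN error and a complexity measure, as in the figure. The slide does not state the selection criterion. This Python sketch assumes leave-one-out 1NN error, and the function names are hypothetical.

```python
import math


def loo_1nn_error(points, labels):
    """Leave-one-out error rate of the 1-nearest-neighbour rule."""
    errors = 0
    for i, p in enumerate(points):
        nearest = min((j for j in range(len(points)) if j != i),
                      key=lambda j: math.dist(p, points[j]))
        errors += labels[nearest] != labels[i]
    return errors / len(points)


def forward_feature_selection(points, labels, k):
    """Greedy forward selection: at each step add the feature whose inclusion
    gives the lowest leave-one-out 1NN error. Returns (subset, error) per step."""
    selected, history = [], []
    remaining = list(range(len(points[0])))
    while remaining and len(selected) < k:
        scores = {}
        for f in remaining:
            cols = selected + [f]
            projected = [[p[c] for c in cols] for p in points]
            scores[f] = loo_1nn_error(projected, labels)
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), scores[best]))
    return history


# Feature 0 is noise (1NN error 1.0 alone); feature 1 separates the classes.
points = [(0.0, 0.0), (2.2, 0.1), (4.7, 0.2), (1.0, 10.0), (3.1, 10.1), (6.0, 10.2)]
labels = [0, 0, 0, 1, 1, 1]
print(forward_feature_selection(points, labels, 2))  # [([1], 0.0), ([1, 0], 0.0)]
```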

Page 24

Conclusions

Page 25

Summary: Early Discoveries

• Problems are distributed along a continuum in complexity space

• Several key measures provide independent characterizations

• There exist identifiable domains of classifiers’ dominant competence

• Sparse sampling, feature selection, and feature transformation induce variability in complexity estimates

Page 26

For the Future

Further progress in statistical learning will need systematic, scientific evaluation of the algorithms with problems that are difficult for different reasons.

A “problem synthesizer” will be useful to provide a complete evaluation platform, and reveal the “blind spots” of current learning algorithms.

Rigorous statistical characterization of complexity estimates from limited training data will help gauge their uncertainty and determine the applicability of data complexity methods.

Ongoing: DCol (Data Complexity Library); ICPR 2010 Contest on Domain of Dominant Competence