New Formulations for Predictive Learning

Vladimir Cherkassky
University of Minnesota

Tutorial at IJCNN-05
July 31, 2005

Original PDF: ewh.ieee.org/cmte/cis/mtsc/ieeecis/Vladimir Cherkassky_IJCNN05.pdf

Outline
- Motivation and Background
- Standard Inductive Learning Formulation
- Alternative Formulations
  - non-inductive types of inference
  - non-standard inductive formulations
- Predictive models for interpretation
- Conclusions

Motivation: Importance of Problem Formulation

Traditional (Simplistic) View
- ‘Useful’ = ‘Predictive’
- May lead to misconceptions:
  - Inductive models are completely data-driven
  - The goal is to design better algorithms

Motivation: philosophical
- Karl Popper: Science starts from problems, and not from observations
- Confucius: Learning without thought is useless, thought without learning is dangerous
- What to do vs how to do

Motivation: another view of Predictive Learning
- Importance of problem formulation (vs algorithm)
- Just a few known formulations
- Thousands of algorithms

Background: historical

The problem of predictive learning
- Given past data + reasonable assumptions
- Estimate unknown dependency for future predictions
- Driven by applications (not theory)

Historical Development
- Statistics (mathematical science)
  Goal: model identification, density estimation
- Neural Networks (empirical science)
  Goal of learning: generalization, risk minimization
- Statistical Learning (VC theory) (natural science)
  Goal of learning: generalization for distinct learning problem formulations

Standard Inductive Learning
- The learning machine observes samples (x, y), and returns an estimated response $\hat{y} = f(\mathbf{x}, w)$
- Two modes of inference: identification vs imitation
- Risk: $\int \mathrm{Loss}(y, f(\mathbf{x}, w)) \, dP(\mathbf{x}, y) \to \min$

[Figure: block diagram with Generator of samples, System, and Learning Machine; signals x, y, and $\hat{y}$]

Two Learning Problems
- Learning ~ estimating mapping x → y (in the sense of risk minimization)
- Binary Classification: estimating an indicator function (with 0/1 loss)
- Regression: estimating a real-valued function (with squared loss)
- Assumptions: iid data, training/test, loss function
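The two loss functions named on this slide can be made concrete with a minimal sketch. It computes the empirical (sample-average) risk, which stands in for the expected risk when only finite data are available. The toy models `classifier` and `regressor` and the data sets are hypothetical and chosen only for illustration.

```python
def zero_one_loss(y_true, y_pred):
    """0/1 loss for classification: 1 if the labels differ, else 0."""
    return 0.0 if y_true == y_pred else 1.0


def squared_loss(y_true, y_pred):
    """Squared loss for regression."""
    return (y_true - y_pred) ** 2


def empirical_risk(loss, f, samples):
    """Average loss of model f over a list of (x, y) samples."""
    return sum(loss(y, f(x)) for x, y in samples) / len(samples)


# Hypothetical toy models (assumptions, not from the tutorial).
def classifier(x):
    return 1 if x > 0.5 else 0


def regressor(x):
    return 2.0 * x


class_data = [(0.1, 0), (0.4, 1), (0.6, 1), (0.9, 1)]
reg_data = [(0.0, 0.0), (0.5, 1.5), (1.0, 2.0)]

class_risk = empirical_risk(zero_one_loss, classifier, class_data)
reg_risk = empirical_risk(squared_loss, regressor, reg_data)
```

Only the loss function changes between the two problems; the risk computation is the same, which is why the loss is treated as part of the problem formulation rather than of the algorithm.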

Contributions of VC-theory
- The Goal of Learning: system imitation vs system identification
- Two factors responsible for generalization
- Keep-It-Direct Principle (Vapnik, 1995): Do not solve a problem of interest by solving a more general (harder) problem as an intermediate step
- Clear Distinction between
  - problem setting
  - solution approach (inductive principle)
  - learning algorithm

Alternative Formulations
Re-examine assumptions behind standard inductive learning
1. Finite training + large unknown test set
   - non-inductive inference (transduction, …)
2. Particular loss functions
   - new inductive formulations (application-driven)
3. Single model
   - multiple model estimation

1. Transduction
- How to incorporate unlabeled test data into the learning process
- Estimating function at given points
- Given: training data $(X_i, y_i)$, $i = 1, \dots, n$ and unlabeled test points $X_{n+j}$, $j = 1, \dots, k$
- Estimate: class labels at these test points
- Note: need to predict only at given test points $X_{n+j}$, not for every possible input X

Transduction vs Induction

[Figure: diagram relating a priori knowledge/assumptions, training data, estimated function, and predicted output. Induction goes from training data to the estimated function, deduction from the estimated function to the predicted output, and transduction directly from training data to the predicted output.]

Transduction based on size of margin
- The problem: Find class label of test input X
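The idea of choosing a test label by margin size can be illustrated with a minimal sketch. It assumes one-dimensional inputs, so that a linear decision boundary is just a threshold and the margin is half the gap between the two classes. The function names and data are hypothetical; this is a brute-force toy, not the SVM-based procedure used in practice.

```python
def margin_1d(points):
    """Half the gap between the two classes on a line.

    points: list of (x, label) with label in {-1, +1}.
    Returns 0.0 if no single threshold separates the classes.
    """
    neg = [x for x, y in points if y == -1]
    pos = [x for x, y in points if y == +1]
    if not neg or not pos:
        return 0.0
    # Either the positives lie to the right of the negatives, or the reverse.
    gap = max(min(pos) - max(neg), min(neg) - max(pos))
    return max(gap, 0.0) / 2.0


def transduce(train, x_test):
    """Try both labels for x_test and keep the one with the larger margin."""
    return max((-1, +1), key=lambda label: margin_1d(train + [(x_test, label)]))


# Hypothetical training data (assumption, for illustration only).
train = [(0.0, -1), (1.0, -1), (4.0, +1), (5.0, +1)]
label_near_neg = transduce(train, 1.5)
label_near_pos = transduce(train, 3.8)
```

The label is assigned only at the given test input; no decision function for arbitrary inputs is returned, which is the point of the transductive setting.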

Many potential applications
- Prediction of molecular bioactivity for drug discovery
- Training data ~1,909; test ~634 samples
- Input space ~139,351-dimensional
- Prediction accuracy: SVM induction ~74.5%; transduction ~82.3%

Ref: J. Weston et al, KDD cup 2001 data analysis: prediction of molecular bioactivity for drug design – binding to thrombin, Bioinformatics 2003

Beyond Transduction: Selection

Selection Problem
- Given: training data $(X_i, y_i)$, $i = 1, \dots, n$ and unlabeled test points $X_{n+j}$, $j = 1, \dots, k$
- Select: a subset of m test points with the highest probability of belonging to one class
- Note: selective inference needs only to select a subset of m test points, rather than assign class labels to all test points.
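A minimal sketch of the selection setting follows. It assumes one-dimensional inputs and uses a hypothetical score (distance to the negative-class mean minus distance to the positive-class mean) as a crude stand-in for the probability of belonging to the class of interest; the tutorial does not specify a scoring rule.

```python
def select_top_m(train, test_points, m):
    """Return the m test points that score highest for class 1.

    train: list of (x, label) with label in {0, 1}.
    The score is a hypothetical proxy, not a calibrated probability.
    """
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)

    def score(x):
        # Larger when x is far from the class-0 mean and near the class-1 mean.
        return abs(x - neg_mean) - abs(x - pos_mean)

    return sorted(test_points, key=score, reverse=True)[:m]


# Hypothetical data (assumption, for illustration only).
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)]
test_points = [0.5, 2.0, 3.0, 4.5, 6.0]
selected = select_top_m(train, test_points, 2)
```

Note that the remaining test points are never labeled: only a ranking is needed, which is a weaker requirement than transduction.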

Hierarchy of Types of Inference
- Identification
- Imitation
- Transduction
- Selection
- .....

Implications: philosophical, human learning

2. Application-driven formulations

[Figure: block diagram with APPLICATION NEEDS; Loss Function; Input, output, other variables; Training/test data; Admissible Models; FORMAL PROBLEM STATEMENT; LEARNING THEORY]

Inductive Learning System (revised)
- The learning machine observes samples (x, y), and returns an estimated response $\hat{y}$ to minimize application-specific Loss[f(x,w), y]

[Figure: block diagram with Generator of samples, System, and Learning Machine; signals x, y, and $\hat{y}$; application-specific Loss[f(x,w), y]]

Application: financial engineering
- Asset management via daily trading: non-standard learning formulation

[Figure: loop diagram with blocks PREDICTIVE MODEL y=f(x), TRADING DECISION (Buy/sell/hold), MARKET, and GAIN/LOSS; input x (indicators), prediction y]

Example: timing of mutual funds
- Background: buy-and-hold vs trading
- Recent scandals in mutual fund industry
- Daily trading scenario

[Figure: Proprietary Exchange Strategy; buy-or-sell and sell-or-buy exchanges between a Money Market account and an Index or Fund]

Example of Actual Trading
- Improved return + reduced risk/volatility

Learning formulation for fund trading

Given
- Daily % price changes of a fund: $q_i = (p_i - p_{i-1}) / p_{i-1}$
- Time series of daily values of input variables $X_i$
- Indicator decision function (1/0 ~ Buy/Sell): $y_i = f(x_i, w)$

Objective: maximize total return over n-day period
$Q(w) = \sum_{i=1}^{n} f(x_i, w) \, q_i$
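The objective on this slide, the sum over i of f(x_i, w) times q_i, can be evaluated with a minimal sketch. It assumes q_i is the relative price change (p_i - p_{i-1}) / p_{i-1}. The decision rule `threshold_rule`, the price series, and the indicator values are hypothetical and serve only to show how the quantity is computed.

```python
def daily_changes(prices):
    """Relative daily changes q_i = (p_i - p_{i-1}) / p_{i-1}, for i = 1..n."""
    return [(p - prev) / prev for prev, p in zip(prices, prices[1:])]


def total_return(decisions, changes):
    """Q(w) = sum_i f(x_i, w) * q_i, with each decision in {0, 1}."""
    return sum(d * q for d, q in zip(decisions, changes))


def threshold_rule(x, w):
    """Hypothetical decision function: 1 (buy) if indicator x exceeds w, else 0 (sell)."""
    return 1 if x > w else 0


# Hypothetical price series and indicator values (assumptions).
prices = [100.0, 102.0, 99.96, 104.958]
indicators = [0.8, -0.3, 0.5]

changes = daily_changes(prices)
decisions = [threshold_rule(x, 0.0) for x in indicators]
q_total = total_return(decisions, changes)
```

As written, Q(w) adds up daily relative changes on the days the fund is held; it does not compound them. The quantity being optimized is a trading gain, so neither the 0/1 loss nor the squared loss applies directly.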

Non-standard inductive formulation

[Figure: loop diagram with blocks PREDICTIVE MODEL y=f(x), TRADING DECISION (Buy/sell/hold), MARKET, and GAIN/LOSS; input x (indicators), prediction y]

- Maximize total account value: $Q(w) = \sum_{i=1}^{n} f(x_i, w) \, q_i$
- Neither classification, nor regression

3. Multiple Model Estimation
- Single-model formulation: Estimate unknown dependency x → y
- Multiple-model approach: Available data can be ‘explained’ using several models

Example data sets: Regression

[Figure: two panels, labeled "Two regression models" and "Single complex model"]

Multiple Model Formulation
- Available (training) data are generated by several (unknown) regression models: $y = t_m(\mathbf{x}) + \xi_m$, $\mathbf{x} \in X_m$
- Goals of learning:
  - Partition available data (clustering, segmentation)
  - Estimate a model for each subset of data (supervised learning)
- Assumption: Majority of the data samples can be explained (described) by a single model.
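The two goals above (partition the data, then estimate a model per subset) can be sketched under strong simplifying assumptions: one-dimensional inputs, linear models, and a known tolerance `tol` for deciding whether a sample is explained by a model. The robust step here is an exhaustive two-point consensus search (similar in spirit to RANSAC), swapped in for illustration; it is not the SVM-based method of the cited papers. All names and data are hypothetical.

```python
from itertools import combinations


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def major_model(points, tol):
    """Find the line through two samples that explains the most samples,
    then refit it by least squares on the samples it explains."""
    best = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, not a function of x
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        explained = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(explained) > len(best):
            best = explained
    if not best:
        return None, []
    return fit_line(best), best


def estimate_models(points, tol, max_models=2):
    """Repeatedly estimate the major model and remove the data it explains."""
    models, remaining = [], list(points)
    while len(remaining) >= 2 and len(models) < max_models:
        model, explained = major_model(remaining, tol)
        if model is None:
            break
        models.append(model)
        remaining = [p for p in remaining if p not in explained]
    return models


# Hypothetical noise-free data: 6 samples from y = 2x + 1 (major model)
# and 4 samples from y = -x + 3 (minor model).
major = [(i / 5, 2 * (i / 5) + 1) for i in range(6)]
minor = [(x, -x + 3) for x in (0.1, 0.3, 0.5, 0.7)]
models = estimate_models(major + minor, tol=0.05)
```

The sketch relies on the stated assumption that a majority of the samples come from a single model: the first pass recovers that major model, and the minor model is estimated only from what is left over.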

Experimental Results: Linear

[Figure: four panels (a)-(d), horizontal axis from 0 to 1, vertical axis from 0 to 3.5; legend: M1 estimate, M2 estimate]

Experimental Results: Non-Linear

[Figure: four panels (a)-(d), horizontal axis from 0 to 1, vertical axis from -2 to 2; legend: M1 estimate, M2 estimate]

Multiple Model Classification
- Single-model approach: complex model
- Multiple-model approach: two simple models

Procedure for MMC
- Initialization: Available data = all training samples.
- Step 1: Estimate major model, i.e. apply robust classification to available data. Here, ‘robustness’ is with respect to variations of data generated by minor model(s).
- Step 2: Partition available data (from one class) into two subsets
- Step 3: Remove subset of data (from one class) classified by the major model from available data.
- Iterate


Example of MMC: XOR data set

Training phase

Comparison for toy data set

[Figure: two panels, horizontal axis from -1 to 0.5, vertical axis from -0.2 to 1.2: (a) MMC hyperplanes H(1) and H(2); (b) RBF-SVM]

Comparison continued

[Figure: panel (c) SVM polynomial kernel, horizontal axis from -1 to 0.5, vertical axis from -0.2 to 1.2]

Prediction Accuracy
Method   Error   (%SV)
RBF      0.058   25.5%
Poly     0.067   26.4%
MMC      0.055   14.5%

Summary for Multiple Model Estimation
- Improvements due to novel problem formulation, not sophisticated algorithms
- Practical learning algorithm based on (linear) SVM
- Resulting model has hierarchical structure
- Advantages:
  - Interpretation
  - No Kernel Selection

Prediction and interpretation
- Many applications are intrinsically difficult to formalize
- Two practical goals of learning:
  - prediction (objective loss function)
  - interpretation, understanding (subjective)
- Most algorithms are developed for predictive settings, but used for interpretation and human decision making
- Rationale: good predictive model ~ true

Example: functional neuroimaging
- Understanding fMRI image data: estimate ‘good’ Brain Activation Maps showing brain activity (colored patches) in response to specific tasks
- Measure of goodness: predictability, reproducibility

Predictive models for understanding
- Always assume inductive formulation
- What if transduction yields much better prediction?
- Fundamental problem (classical view):
  - human reasoning ~ logic + induction
  - transduction does not fit this paradigm
- Goal of science: understanding
- Goal of science: perform/act well

Conclusions
- Methodological shift: think first about the problem formulation, rather than learning algorithms
- Importance of problem formulation
  - for empirical comparisons
  - the limits of predictive models
- Philosophical impact of Vapnik’s new types of (non-inductive) inference

References
- VC-Theory: V. Vapnik, Statistical Learning Theory, Wiley, NY
- Transduction: V. Vapnik (1998), Statistical Learning Theory, Wiley, + many recent papers
- Timing of Mutual Funds: E. Zitzewitz (2002), Who cares about shareholders: arbitrage-proofing mutual funds. Journal of Law, Economics and Organization, 19, 2, pp. 245-280
- Multiple Model Estimation: Y. Ma and V. Cherkassky (2003), Multiple model classification using SVM-based approach, in Proc. IJCNN
- Multiple Model Estimation: V. Cherkassky and Y. Ma (2005), Multiple model regression estimation, IEEE TNN, 14, 4, pp. 785-798