1/14/03 1 Math Models for Learning and Discovery Kristin P. Bennett Mathematical Sciences Department Rensselaer Polytechnic Institute
Dec 20, 2015

1/14/03 1

Math Models for Learning and Discovery

Kristin P. Bennett
Mathematical Sciences Department
Rensselaer Polytechnic Institute

1/18/05 2

The Learning Problem

The problem of understanding intelligence is said to be the greatest problem in science today and “the” problem for this century – as deciphering the genetic code was for the second half of the last one…the problem of learning represents a gateway to understanding intelligence in man and machines.

-- Tomaso Poggio and Steve Smale, 2003

1/18/05 3

What do these problems have in common?

• Design and Discovery of Pharmaceuticals
• Target Marketing in Business
• Diagnosis of Breast Cancer
• Discovery of Novel Superconductors
• Detection of Anthrax using TZ spectroscopy
• Modeling and predicting global trade
• RNA Transcription

1/18/05 4

DRUG TRIVIA (2000 old info)

• In USA $25B/yr for R&D of pharmaceuticals (33% clinicals)
• Worth their weight in gold
• 10-15 years from conception to market for a drug
• Development cost $0.5B/drug
• First-year sales > $1B/drug
• 1 drug approved/5000 compounds tested
• 1 out of 100 drugs succeeds to market
• 19 Alzheimer’s drugs in development
• 20,000,000 Americans with Alzheimer’s by 2050

1/18/05 5

1/18/05 6

HIV Reverse-Transcriptase Inhibition modeling:

Have a few Molecules that have been tested:

Can we predict if new molecule will inhibit HIV?

TOWARDS TREATING THE HIV EPIDEMIC

[Figure: chemical structure diagrams of the tested molecules]

1/18/05 7

What do we know?

The bioactivities of a small set of molecules

Many possible descriptors for each molecule:
• Molecular weight
• Electrostatic potential
• Ionization potential

Can we predict a molecule's bioactivity?

1/18/05 8

Database Marketing

A bank has a $1.7 billion portfolio of home mortgages.

When a customer refinances, the bank may lose that customer.

Question: will a customer refinance?

If so, offer that customer a good deal on refinancing.

1/18/05 9

What do we know?

For many customers, we know if they refinanced or not.

We know attributes of each customer:
• Income
• Age
• Residential area
• Payment history

Can we predict behavior of future customers?

1/18/05 10

Breast Cancer Diagnosis

Fine needle aspirate of breast tumor.

Is tumor benign or malignant?

1/18/05 11

What do we know?

For patients in initial study, we know whether tumor was benign or malignant.

Have a digital image of tumor aspirate.

Know characteristics doctors look at:
• Uniformity of cell shape
• Uniformity of cell size
• Cell mitosis

1/18/05 13

Superconductivity

Superconductivity is the ability of a material to conduct current with no resistance and extremely low loss.

A few high temperature superconductors have been found.

What other compounds are superconductors?

1/18/05 14

Applications of Superconductivity:

Magnetic Resonance Imaging

1/18/05 15

Applications of Superconductivity

Maglev Trains

1/18/05 16

Applications of Superconductivity

• Very small and efficient motors
• Better power transmission cables
• Better cellular phone service

Find a cheap high-temperature superconductor and you will get the NOBEL PRIZE.

1/18/05 17

What do we know?

Many compounds have been tested to see if they are superconductors.

Many descriptors exist for these compounds based on molecular properties.

1/18/05 18

What do all these problems have in common?

Each problem:
• Can be posed as a “yes” or “no” question.
• Has examples known to be of the “yes” type or the “no” type.
• Each example has an associated set of descriptors.

Learn Classification Function!

1/18/05 19

Data Mining

• Each problem has data.
• Our job is to “mine” information from this data.
• Information depends on the question asked.
• In this case we must produce a predictive yes/no model (a.k.a. a classification model) based on the data.

1/18/05 20

Mathematical Model

Have data $(x_1, y_1), \ldots, (x_m, y_m)$

Construct a predictive function f that maps x to y

Solve a mathematical model to find f:

$$\min_f \sum_{i=1}^{m} \left( f(x_i) - y_i \right)^2 + \lambda \|f\|_K^2$$

Want f to generalize well on future data
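
As a rough illustration of solving such a model, the sketch below assumes the simplest case: a linear function f(x) = w·x, squared loss, and a penalty lam * ||w||^2 standing in for the regularization term. The toy data, the value of lam, and the function name are hypothetical and not taken from the course.

```python
import numpy as np

def fit_regularized_least_squares(X, y, lam):
    """Minimize sum_i (w.x_i - y_i)^2 + lam * ||w||^2 over w (closed form)."""
    n_features = X.shape[1]
    # Setting the gradient to zero gives the linear system (X^T X + lam I) w = X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical toy data: m = 5 examples, 2 descriptors each, labels +1 / -1.
X = np.array([[2.0, 1.0], [1.5, 2.0], [3.0, 0.5], [-1.0, -2.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0])

w = fit_regularized_least_squares(X, y, lam=0.1)

# Turn the real-valued output f(x) = w.x into a yes/no prediction by taking its sign.
predictions = np.sign(X @ w)
```

A larger lam shrinks w toward zero, trading training fit for a simpler function; how to choose it so that f generalizes well is a separate question.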

1/18/05 21

Types of Learning Problems

Classification: $y_i = 1$ or $-1$

Regression: $y_i \in \mathbb{R}$

Clustering: $y_i$ unknown

Ranking: relative order of label pairs ($y_1$ vs. $y_2$, ..., $y_k$ vs. $y_j$)

1/18/05 22

Data Mining

Classification = yes/no models
• Start with examples of yes and no.
• Associate a set of descriptors with each example. Descriptors must be appropriate for the question you are asking.
• Construct a model to split the two sets.
• Use the model to predict new examples.
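
The four steps can be sketched end to end on made-up data. The example below uses a nearest-class-mean rule purely as a stand-in for "a model to split the two sets"; it is not the method the course develops, and the descriptor values and function names are hypothetical.

```python
import numpy as np

# Steps 1-2: examples of "yes" (+1) and "no" (-1), each with two hypothetical descriptors.
X_train = np.array([[5.1, 1.2], [4.8, 0.9], [5.5, 1.4],   # "yes" examples
                    [1.0, 3.9], [1.3, 4.2], [0.8, 3.5]])  # "no" examples
y_train = np.array([1, 1, 1, -1, -1, -1])

def fit_centroid_classifier(X, y):
    """Return (w, b) for the hyperplane halfway between the two class means."""
    mean_yes = X[y == 1].mean(axis=0)
    mean_no = X[y == -1].mean(axis=0)
    w = mean_yes - mean_no                     # normal vector pointing toward "yes"
    b = -0.5 * (w @ (mean_yes + mean_no))      # plane passes through the midpoint
    return w, b

def predict(X, w, b):
    """Classify each row as +1 ("yes") or -1 ("no")."""
    return np.where(X @ w + b >= 0, 1, -1)

# Step 3: construct the model that splits the two sets.
w, b = fit_centroid_classifier(X_train, y_train)

# Step 4: use the model on new, unlabeled examples.
X_new = np.array([[5.0, 1.0], [1.1, 4.0]])
new_labels = predict(X_new, w, b)
```

If the descriptors carried no information about the question being asked, the two class means would nearly coincide and no such split would predict well, which is why the choice of descriptors matters.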

1/18/05 23

Learning Model

• What kind of learning task is it?
• What sort of f should we use? Kernel function: $f(x) = \sum_i \alpha_i K(x, x_i)$
• What loss function to use?
• What regularization function?
• How can we solve this learning model?
• How well will the model predict new points?
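
To make the kernel form of f concrete, here is a minimal sketch that evaluates f(x) as a weighted sum of kernel values K(x, x_i) over the training points. It assumes a Gaussian kernel, squared loss, and a regularization weight lam, in which case the coefficients (called alpha here) solve the linear system (K + lam I) alpha = y. The kernel choice, parameter values, and function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def fit_alpha(X, y, lam, sigma=1.0):
    """Coefficients of the kernel expansion for squared loss plus lam * ||f||_K^2."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def evaluate(X_new, X_train, alpha, sigma=1.0):
    """f(x) = sum_i alpha_i K(x, x_i) for each row x of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# XOR-like toy data: no single line separates the +1 points from the -1 points.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y_train = np.array([1.0, 1.0, -1.0, -1.0])

alpha = fit_alpha(X_train, y_train, lam=0.01)
scores = evaluate(X_train, X_train, alpha)
labels = np.sign(scores)
```

The same questions from the slide show up as knobs in the sketch: the kernel fixes the family of f, the squared error is the loss, lam weights the regularizer, and the linear solve is how this particular model gets solved.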

1/18/05 24

Class information

See course web page http://www.rpi.edu/~bennek/class/mmld/index.htm

1/18/05 25

Assignment for Friday

Read and be prepared to discuss Chapter 1, Shawe-Taylor and Cristianini

Lecturer: Gautam Kunapuli