Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications. Andrei Zinovyev, Institut des Hautes Études Scientifiques, France

Dec 19, 2015

Page 1:

Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications

Andrei Zinovyev
Institut des Hautes Études Scientifiques, France

Page 2:

Plan of the talk

Object of study
Definition of principal manifold (PM)
Constructing PMs: elastic maps
Examples of biomedical applications

Page 3:

Principal manifolds: elastic maps framework

[Diagram: non-linear data-mining methods. Labels: SVM; principal manifolds; regression, approximation; supervised classification; K-means; SOM; clustering; multidimensional scaling; visualization; PCA; factor analysis; LLE; ISOMAP]

Page 4:

Finite set of objects in $\mathbb{R}^N$: $X_i$, $i = 1, \dots, m$

IRIS database

Sepal height  Sepal width  Petal height  Petal width  Species
4.9           3            1.4           0.2          Iris-setosa
4.7           3.2          1.3           0.3          Iris-setosa
4.6           3.1          1.5           0.2          Iris-setosa
7             3.2          4.7           1.4          Iris-versicolor
6.4           3.2          4.5           1.5          Iris-versicolor
6.9           3.1          4.9           1.5          Iris-versicolor
6.3           3.3          6             2.5          Iris-virginica
5.8           2.7          X             1.9          Iris-virginica
7.1           3            5.9           2.1          Iris-virginica
6.3           2.9          5.6           1.8          Iris-virginica

Page 5:

Mean point

$$\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i$$

$$\sum_{i=1}^{m} \left( X_i - \bar{X} \right)^2 \to \min$$

K-means clustering

$$\sum_{i=1}^{m} \left( X_i - Y_i^{\mathrm{closest}} \right)^2 \to \min$$
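The two minimization problems on this slide can be checked numerically. The following is a minimal sketch in plain Python, not the author's code: the function names are hypothetical, the data are the complete rows of the Iris excerpt from Page 4 (the row with the missing value is left out), and the K-means part is a single standard Lloyd iteration, which is assumed here as the update rule.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    m = len(points)
    return [sum(p[d] for p in points) / m for d in range(len(points[0]))]


def kmeans_objective(points, centroids):
    """Sum over points of the squared distance to the closest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd_step(points, centroids):
    """One Lloyd iteration: assign each point to its closest centroid,
    then move every centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for p in points:
        k = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        clusters[k].append(p)
    # An empty cluster keeps its previous centroid.
    return [mean_point(c) if c else list(old)
            for c, old in zip(clusters, centroids)]


# Complete rows of the Iris excerpt (the row with the missing value is omitted).
IRIS_ROWS = [
    [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.3], [4.6, 3.1, 1.5, 0.2],
    [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5],
    [6.3, 3.3, 6.0, 2.5], [7.1, 3.0, 5.9, 2.1], [6.3, 2.9, 5.6, 1.8],
]

if __name__ == "__main__":
    start = [IRIS_ROWS[0], IRIS_ROWS[3], IRIS_ROWS[6]]  # arbitrary initial centroids
    after = lloyd_step(IRIS_ROWS, start)
    print(kmeans_objective(IRIS_ROWS, start), kmeans_objective(IRIS_ROWS, after))
```

With a single centroid, `kmeans_objective` reduces to the mean-point criterion, so the mean point is the one-cluster special case of K-means.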

Page 6:

Principal “Object”

[Formula: $\sum_{i=1}^{m} (\cdot)^2 \to \min$; the summand is not recoverable from the transcript]

Page 7:

Principal Component Analysis

[Figure labels: maximal dispersion; 1st principal axis; 2nd principal axis]
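The "maximal dispersion" label can be illustrated with a short computation. This is a minimal sketch in plain Python under stated assumptions: the first principal axis is taken as the leading eigenvector of the covariance matrix, it is found by power iteration (one possible method, not necessarily the one used in the talk), and all function names are hypothetical.

```python
import math


def centered(points):
    """Subtract the mean point from every point."""
    m = len(points)
    mean = [sum(p[d] for p in points) / m for d in range(len(points[0]))]
    return [[p[d] - mean[d] for d in range(len(mean))] for p in points]


def covariance(points):
    """Covariance matrix of a list of points, normalised by the number of points."""
    c = centered(points)
    m, n = len(c), len(c[0])
    return [[sum(row[a] * row[b] for row in c) / m for b in range(n)]
            for a in range(n)]


def first_principal_axis(points, iterations=500):
    """Unit vector of maximal dispersion, found by power iteration."""
    cov = covariance(points)
    n = len(cov)
    # Starting guess; assumed not to be orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # the covariance maps v to zero (e.g. all points coincide)
            break
        v = [x / norm for x in w]
    return v


def dispersion_along(points, axis):
    """Variance of the data projected onto a unit vector."""
    c = centered(points)
    return sum(sum(r[d] * axis[d] for d in range(len(axis))) ** 2
               for r in c) / len(c)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    axis = first_principal_axis(pts)
    print(axis, dispersion_along(pts, axis))
```

The 2nd principal axis would be obtained the same way after removing the component along the first axis from the data.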

Page 8:

Principal manifold

Page 9:

What do we want?

Non-linear surface (1D, 2D, 3D …)
Smooth and not twisted
The data model is unknown
Speed (time linear in $Nm$)
Uniqueness
Fast way to project data points

Page 10:

Metaphor of elasticity

[Figure labels: data points; graph nodes; $U^{(Y)}$; $U^{(E)}$, $U^{(R)}$]

Page 11:

Constructing elastic nets

[Figure labels: node $y$; edge ends $E(0)$, $E(1)$; rib nodes $R(1)$, $R(0)$, $R(2)$]

Page 12:

Definition of elastic energy

$$U = U^{(Y)} + U^{(E)} + U^{(R)}$$

$$U^{(Y)} = \frac{1}{N} \sum_{i=1}^{p} \sum_{X^{(j)} \in K^{(i)}} \left\| X^{(j)} - y^{(i)} \right\|^2$$

$$U^{(E)} = \sum_{i=1}^{s} \lambda_i \left\| E^{(i)}(1) - E^{(i)}(0) \right\|^2$$

$$U^{(R)} = \sum_{i=1}^{r} \mu_i \left\| R^{(i)}(1) + R^{(i)}(2) - 2R^{(i)}(0) \right\|^2$$

$\lambda_i = \lambda_0$, $\mu_i = \mu_0$

[Figure labels: node $y$; data point $X^{(j)}$; edge ends $E(0)$, $E(1)$; rib nodes $R(1)$, $R(0)$, $R(2)$]
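The energy on this slide can be evaluated directly for a small net. Below is a minimal sketch in plain Python, assuming the usual reading of the three terms: an approximation term (mean squared distance from each data point to its closest node), a stretching term summed over edges, and a bending term summed over ribs, with the same moduli `lam` and `mu` for every edge and rib. The normalisation by the number of data points and all names in the code are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def elastic_energy(data, nodes, edges, ribs, lam, mu):
    """Return the three energy terms (U_Y, U_E, U_R) of an elastic net.

    data    : list of data points
    nodes   : list of node positions
    edges   : list of (i, j) node-index pairs, the two ends of each edge
    ribs    : list of (centre, end1, end2) node-index triples
    lam, mu : stretching and bending moduli, assumed equal for all edges and ribs
    """
    # Approximation term: each data point contributes the squared distance to its
    # closest node, which is the same as summing over the clusters of the nodes.
    u_y = sum(min(sq_dist(x, y) for y in nodes) for x in data) / len(data)
    # Stretching term: squared edge lengths.
    u_e = sum(lam * sq_dist(nodes[i], nodes[j]) for i, j in edges)
    # Bending term: squared deviation of each rib from a straight, evenly spaced triple.
    u_r = 0.0
    for c, a, b in ribs:
        bend = [pa + pb - 2.0 * pc
                for pa, pb, pc in zip(nodes[a], nodes[b], nodes[c])]
        u_r += mu * sum(t * t for t in bend)
    return u_y, u_e, u_r


def chain(n):
    """Edges and ribs of a 1D chain of n nodes (hypothetical helper)."""
    edges = [(i, i + 1) for i in range(n - 1)]
    ribs = [(i, i - 1, i + 1) for i in range(1, n - 1)]
    return edges, ribs


if __name__ == "__main__":
    edges, ribs = chain(3)
    data = [[0.0, 1.0], [2.0, -1.0]]
    straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    bent = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
    print(elastic_energy(data, straight, edges, ribs, 0.5, 0.5))
    print(elastic_energy(data, bent, edges, ribs, 0.5, 0.5))
```

A straight, evenly spaced chain has zero bending energy, so the bending term penalises only curvature and uneven spacing, while the stretching term penalises total length.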

Page 13:

Elastic manifold

Page 14:

Global minimum and softening

[Figure panels: $\lambda_0, \mu_0 \approx 10^{3}$; $\lambda_0, \mu_0 \approx 10^{2}$; $\lambda_0, \mu_0 \approx 10^{1}$; $\lambda_0, \mu_0 \approx 10^{-1}$]

Page 15:

Adaptive algorithms

Growing net

Adaptive net

Refining net:

Idea of scaling:

Page 16:

Projection onto the manifold

Closest node of the net

Closest point of the manifold
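The difference between the two projection rules can be shown on a 1D example. This is a minimal sketch in plain Python, assuming the manifold is approximated by a polyline through the nodes (a piecewise-linear interpolation chosen here for illustration; the talk does not specify the interpolation). Function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_node(x, nodes):
    """Index of the node closest to x (piecewise-constant projection)."""
    return min(range(len(nodes)), key=lambda i: sq_dist(x, nodes[i]))


def project_on_segment(x, a, b):
    """Orthogonal projection of x onto the segment [a, b], clamped to its ends."""
    ab = [q - p for p, q in zip(a, b)]
    length2 = sum(t * t for t in ab)
    if length2 == 0.0:  # degenerate segment: both ends coincide
        return list(a)
    t = sum((xi - ai) * d for xi, ai, d in zip(x, a, ab)) / length2
    t = max(0.0, min(1.0, t))
    return [ai + t * d for ai, d in zip(a, ab)]


def closest_point_on_polyline(x, nodes):
    """Closest point to x on the polyline through consecutive nodes."""
    candidates = [project_on_segment(x, nodes[i], nodes[i + 1])
                  for i in range(len(nodes) - 1)]
    return min(candidates, key=lambda p: sq_dist(x, p))


if __name__ == "__main__":
    nodes = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0]]
    x = [0.9, 1.0]
    print(closest_node(x, nodes), closest_point_on_polyline(x, nodes))
```

Projecting to the closest node costs one distance per node but places many data points on the same node; projecting to the closest point of the interpolated manifold gives each data point its own internal coordinate.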

Page 17:

Colorings: visualize any function

Page 18:

Density visualization

Page 19:

Example: different topologies

[Figure labels: $\mathbb{R}^N$; $\mathbb{R}^2$]

Page 20:

VIDAExpert tool and elmap C++ package

Page 21:

Regression and principal manifolds

Regression: $\sum_i \left( x_i - F(x_i) \right)^2 \to \min$

Principal component: $\sum_i \left\| x_i - P x_i \right\|^2 \to \min$

[Figure labels: axes $x$ and $F(x)$; legend: data, gen. curve, grid]
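The contrast between the two criteria on this slide can be seen on a straight-line fit. The following is a minimal sketch in plain Python for 2D points written as (x, y) pairs, which is a notation assumed here: regression minimises squared vertical residuals, the principal component minimises squared orthogonal distances. Function names are hypothetical, and the principal-axis slope is undefined for a vertical axis.

```python
import math


def moments(points):
    """Centred second moments (S_xx, S_xy, S_yy) of 2D points."""
    m = len(points)
    mx = sum(p[0] for p in points) / m
    my = sum(p[1] for p in points) / m
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return sxx, sxy, syy


def regression_slope(points):
    """Slope of the line minimising the sum of squared vertical residuals."""
    sxx, sxy, _ = moments(points)
    return sxy / sxx


def principal_axis_slope(points):
    """Slope of the line minimising the sum of squared orthogonal distances."""
    sxx, sxy, syy = moments(points)
    # Orientation of the first principal axis of the 2x2 scatter matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.tan(angle)  # not meaningful when the axis is vertical


if __name__ == "__main__":
    noisy = [[0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [3.0, 3.0]]
    print(regression_slope(noisy), principal_axis_slope(noisy))
```

On exactly collinear data the two slopes coincide; on noisy data the regression slope is pulled toward zero relative to the principal axis, because regression treats x as error-free.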

Page 22:

Image skeletonization or clustering around curves

Page 23:

Approximation of molecular surfaces

Page 24:

Application: economic data

[Figure panels: gross output; density; profit; growth rate]

Page 25:

Medical table: 1700 patients with myocardial infarction

[Figure panels: lethal cases; patients map, density]

Page 26:

Medical table: 1700 patients with myocardial infarction

128 indicators

[Figure panels: age; number of infarctions in anamnesis; stenocardia functional class]

Page 27:

Codon usage in all genes of one genome

Escherichia coli Bacillus subtilis

Majority of genes

Highly expressed genes

“Foreign” genes

“Hydrophobic” genes

Page 28:

Golub’s leukemia dataset: 3051 genes, 38 samples (ALL/B-cell, ALL/T-cell, AML)

[Figure panels: ALL sample; AML sample]

Map of genes. Legend: vote for ALL; vote for AML; used by T. Golub; used by W. Lie

Page 29:

Golub’s leukemia dataset: map of samples (AML, ALL/B-cell, ALL/T-cell)

[Figure panels: density; Cystatin C; Retinoblastoma binding protein P48; CA2 Carbonic anhydrase II; X-linked Helicase II]

Page 30:

Thank you for your attention!

Questions?