Page 1

Dr. Gari D. Clifford, University Lecturer & Associate Director, Centre for Doctoral Training in Healthcare Innovation, Institute of Biomedical Engineering, University of Oxford

Information Driven Healthcare: Data Visualization & Classification

Lecture 1: Introduction & preprocessing

Centre for Doctoral Training in Healthcare Innovation

Page 2

The course

A practical overview of (a subset of) classifiers and visualization tools

Data preparation, PCA, K-means clustering, KNN

Statistics, regression, LDA, logistic regression

Neural networks

Gaussian mixture models, EM

Support vector machines

Labs – try to 1) classify flowers (a classic dataset), … then 2) predict mortality in the ICU! (… & publish if you do well!)

Page 3

Workload

Two lectures each morning

Five 4-hour labs (each afternoon)

Read one article each evening (optional)

Page 4

Assessment / assignments

Class interaction

Lab diary – write up notes as you perform your investigations; submit your lab code (m-file) and a Word/OO document answering the questions at 5pm each day …

No paper please!

Absolutely no homework!

... but you can write a paper afterwards if your results are good!

Page 5

Course texts

Ian Nabney, NETLAB: Algorithms for Pattern Recognition, in the series Advances in Pattern Recognition, Springer (2001), ISBN 1-85233-440-1. http://www.ncrg.aston.ac.uk/netlab/book.php

Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer (2006), ISBN 0-387-31073-8. http://research.microsoft.com/en-us/um/people/cmbishop/PRML/index.htm

Press, Teukolsky, Vetterling & Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd Edition, Cambridge University Press, 1992. [Ch. 2.6, 10.5 (p414-417), 11.0 (p465-460), 15.4 (p671-688), 15.5 (p681-688), 15.6 & 15.7 (p689-700)] Online at http://www.nrbook.com/a/bookcpdf.php

L. Tarassenko, A Guide to Neural Computation, John Wiley & Sons (February 1998) Ch. 7 (p77-101)

Ian Nabney, Netlab2? - when available!

Page 6

Syllabus – Week 1

Monday – Data exploration [GDC]
Lecture 1 (9.30-10.30am): Introduction, probabilities, entropy, preprocessing, normalization, segmenting data (PCA, ICA)
Lecture 2 (11am-12pm): Feature extraction, visualization (K-means, SOM, GTM, Neuroscale)
Lab 1 (1-5pm): Preprocessing of data & visualization – segmentation (train, test, evaluation), PCA & K-means with 2 classes
Reading for tomorrow: Bishop PRML Ch4.1 p179-196, Ch4.3.2 p205-206, Ch2.3.7 p102-103, 691; Netlab Ch3.5-3.6 p101-107

Tuesday – Clinical Statistics & Classifiers [IS]
Lecture 3 (9.30-10.30am): Clinical statistics: t-test, χ² test, Wilcoxon rank-sum test, linear regression, bootstrap, jackknife
Lecture 4 (11am-12pm): Clinical classifiers: LDA, KNN, logistic regression
Lab 2 (1-5pm): P-values, statistical testing, LDA, KNN and logistic regression
Reading for tomorrow: Netlab Ch5.1-5.6 p165-167, Ch6 p191-221

Wednesday – Optimization and Neural Networks [GDC]
Lecture 5 (9.30-10.30am): ANNs – RBFs and MLPs – choosing an architecture, balancing the data
Lecture 6 (11am-12pm): Training & optimization, N-fold validation
Lab 3 (1-5pm): Training an MLP to classify flower types and then mortality – partitioning and balancing data
Reading for tomorrow: Netlab Ch3.1-3.4 p79-100

Thursday – Probabilistic Methods [DAC]
Lecture 7 (9.30-10.30am): GMM, MCMC, density estimation
Lecture 8 (11am-12pm): EM, variational Bayes, missing data
Lab 4 (1-5pm): GMM and EM
Reading for tomorrow: Bishop Ch7 p325-345 (SVM)

Friday – Support Vector Machines [CO/GDC]
Lecture 9 (9.30-10.30am): SVMs and constrained optimization
Lecture 10 (11am-12pm): Wrap-up
Lab 5 (1-5pm): Use the SVM toolbox and vary 2 parameters for regression & classification (1-class: death, then alive; then 2-class)

Page 7

Overview of data for lab

You will be given two datasets:

1. A simple dataset for learning – Fisher’s Iris dataset

2. A complex ICU database (if this works – publish!!!)

In each lab you will use dataset 1 to understand the problem, then dataset 2 to see how you can apply the same methods to more challenging data.

Page 8

So let’s start … what are we doing?

Trying to learn classes from data, so that when we see new data we can make a good guess about its class membership (e.g. is this patient part of the set of people likely to die, and if so, can we change his/her treatment?)

How do we do this?

Supervised – use labelled data to train an algorithm.

Unsupervised – use heuristics or metrics to look for clusters in data (K-means clustering, KNN, SOMs, GMM, …)
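For a concrete feel for the unsupervised route, here is a minimal K-means sketch in MATLAB (a sketch only: X is assumed to be an N-by-d data matrix with one row per example; the choice K = 2, the fixed iteration cap, and the absence of empty-cluster handling are all simplifications for illustration):

  K = 2;                                    % number of clusters (illustrative)
  p = randperm(size(X,1));
  C = X(p(1:K), :);                         % random initial cluster centres
  for iter = 1:100                          % fixed iteration cap
      D2 = zeros(size(X,1), K);             % squared distance to each centre
      for k = 1:K
          d = X - repmat(C(k,:), size(X,1), 1);
          D2(:,k) = sum(d.^2, 2);
      end
      [mn, labels] = min(D2, [], 2);        % assign each point to its nearest centre
      for k = 1:K
          C(k,:) = mean(X(labels == k, :), 1);  % move each centre to its cluster mean
      end
  end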

Page 9

Data preprocessing/manipulation

Filter data to remove outliers (reject obviously large/small values)

Normalize to zero mean and unit variance if the parameters are not in the same units!

Compress data into lower dimensions to reduce workload or to visualize data relationships

Rotate data, or expand into higher dimensions to improve the separation between classes.
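A minimal MATLAB sketch of the first two steps above (assumptions: X is an N-by-d data matrix, one row per example; the 3-standard-deviation rejection threshold is an illustrative choice, not a prescription):

  mu = repmat(mean(X), size(X,1), 1);        % per-feature means
  sd = repmat(std(X), size(X,1), 1);         % per-feature standard deviations
  keep = all(abs(X - mu) <= 3*sd, 2);        % reject rows with any gross outlier
  X = X(keep, :);
  X = (X - repmat(mean(X), size(X,1), 1)) ./ repmat(std(X), size(X,1), 1);  % zero mean, unit variance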

Page 10

The curse of dimensionality

Richard Bellman (1953) coined the term The Curse of Dimensionality (or Hughes effect)

It’s the problem caused by the exponential increase in volume associated with adding extra dimensions to a (mathematical) space.

Bellman gives the following example:

100 evenly spaced sample points suffice to sample the unit interval with no more than 0.01 distance between points;

An equivalent sampling of a 10-dimensional unit hypercube, with a lattice spacing of 0.01 between adjacent points, would require 10^20 sample points;

Therefore, at this spatial sampling resolution, the 10-dimensional hypercube is a factor of 10^18 ‘larger’ than the unit interval.
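Written out, the arithmetic of the example is:

\[ N_{1D} = \frac{1}{0.01} = 10^{2}, \qquad N_{10D} = \left(10^{2}\right)^{10} = 10^{20}, \qquad \frac{N_{10D}}{N_{1D}} = \frac{10^{20}}{10^{2}} = 10^{18} \]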

[Image: muppet.wikia.com]

Page 11

So what does that mean for us? We need to think about how much data we have and how many parameters we use.

“Rule of thumb”: you need at least 10 training samples of each class per input feature dimension (although this depends on the separability of the data, and can be up to 30 for complex problems and as low as 2-5 for simple ones [*])

So for the Iris dataset – we have 4 measured features on 50 examples of each of the three classes … so we have enough!

For the ICU data we have 1400 patients: 970 survived and 430 died … so, taking the minimum of these, we could use up to 43 of the 112 features

Generally though you need more data …

Or you compress the data into a smaller number of dimensions
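As a quick sanity check of the rule of thumb on the ICU class counts above (a sketch; the counts are those quoted on this slide):

  n_survived = 970;  n_died = 430;                  % class counts from above
  max_feats = floor(min(n_survived, n_died) / 10);  % 10 samples per class per feature
  fprintf('Rule of thumb supports up to %d features\n', max_feats);   % prints 43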

[*] Thomas G. Van Niel, Tim R. McVicar and Bisun Datt, On the relationship between training sample size and data dimensionality: Monte Carlo analysis of broadband multi-temporal classification, Remote Sensing of Environment, 98(4):468-480, 30 October 2005. doi:10.1016/j.rse.2005.08.011

Page 12

Principal Component Analysis (PCA)

Standard signal/noise separation method

Compress data into lower dimensions to reduce workload or to visualize data relationships

Rotate data to improve the separation between classes

Also known as the Karhunen-Loève (KL) transform or the Hotelling transform; often identified with Singular Value Decomposition (SVD), although SVD is actually a mathematical method for computing PCA

Page 13

Principal Component Analysis (PCA)

A form of Blind Source Separation – an observation, X, can be broken down into a mixing matrix, A, and a set of basis functions, Z:

X=AZ

Second-order decorrelation = independence

Find a set of orthogonal axes in the data (independence metric = variance)

Project data onto these axes to decorrelate

Independence is forced onto the data through the orthogonality of the axes
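A minimal MATLAB sketch of this idea on an invented two-dimensional cloud (the data, and computing PCA via an eigendecomposition of the covariance matrix, are for illustration only):

  X = randn(500, 2) * [2 1; 0 0.5];         % correlated 2D Gaussian cloud (toy)
  X = X - repmat(mean(X), size(X,1), 1);    % zero-mean the data
  [V, D] = eig(cov(X));                     % orthogonal axes and their variances
  [srt, idx] = sort(diag(D), 'descend');    % order axes by decreasing variance
  V = V(:, idx);                            % principal axes
  Y = X * V;                                % project onto the axes: decorrelated
  disp(cov(Y))                              % off-diagonal terms are ~0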

Page 14

Two dimensional example

Where are the principal components?

Hint: axes of maximum variation, and orthogonal

Page 15

Two dimensional example

Gives the best axis onto which to project: minimum RMS error

Data becomes ‘sphered’ or whitened / decorrelated

Page 16

Singular Value Decomposition (SVD)

Decompose the observation X = AZ into …

X = U S Vᵀ

S is a diagonal matrix of singular values with elements arranged in descending order of magnitude (the singular spectrum)

The columns of V are the eigenvectors of C = XᵀX (the orthogonal subspace … dot(vi, vj) = 0 for i ≠ j) … they ‘demix’ or rotate the data

U is the matrix of projections of X onto the eigenvectors of C … the ‘source’ estimates
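These relationships are easy to verify numerically; a sketch (the random X is illustrative only):

  X = randn(7, 5);                          % any observation matrix
  [U, S, V] = svd(X, 'econ');               % X = U*S*V'
  norm(X - U*S*V')                          % ~0: exact reconstruction
  norm(V'*V - eye(5))                       % ~0: columns of V are orthonormal
  [Vc, D] = eig(X'*X);                      % eigendecomposition of C = X'X
  sort(diag(D), 'descend') - diag(S).^2     % ~0: eigenvalues of C = Sii^2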

Page 17

SVD – matrix algebra

Decompose the observation X = AZ into …

X = U S Vᵀ

Page 18

Eigenspectrum of decomposition

S = the matrix of singular values … zeros everywhere except on the leading diagonal

Sij (i = j) are the square roots of the eigenvalues (eigenvalues^½)

Placed in order of descending magnitude

Correspond to the magnitude of projected data along each eigenvector

Eigenvectors are the axes of maximal variation in the data

Variance = power (analogous to Fourier components in power spectra)

Eigenspectrum = a plot of the eigenvalues [MATLAB: stem(diag(S).^2)]
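A common companion calculation (a sketch, assuming S from the decomposition above; the 95% threshold is an illustrative choice) is the cumulative variance explained, which suggests how many components to retain:

  ev = diag(S).^2;                  % eigenvalues = squared singular values
  stem(ev)                          % the eigenspectrum
  cumvar = cumsum(ev) / sum(ev);    % fraction of variance in the first p axes
  p = find(cumvar >= 0.95, 1);      % e.g. keep 95% of the variance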

Page 19

SVD: Method for PCA

See BSS notes and example at end of presentation

Page 20

SVD for noise/signal separation

To perform SVD filtering of a signal, use a truncated SVD decomposition (using the first p eigenvectors)

Y = U Sp Vᵀ

[Reduce the dimensionality of the data by discarding the noise projections (Snoise = 0), then reconstruct the data with just the signal subspace]

Most of the signal is contained in the first few principal components. Discarding the remaining components and projecting back into the original observation space effects a noise filtering, or a noise/signal separation
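A minimal sketch of the truncation in MATLAB (assuming the observation matrix X and the number of retained components p are already defined):

  [U, S, V] = svd(X, 'econ');
  Sp = S;
  Sp(p+1:end, p+1:end) = 0;         % zero the noise singular values
  Y = U * Sp * V';                  % rank-p signal estimate, same size as X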

Page 21

e.g.

Imagine a ‘spectral decomposition’ of the matrix:

X =
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1

  = u1 σ1 v1ᵀ + u2 σ2 v2ᵀ   (a sum of rank-one terms, one per singular value)

Page 22

SVD – Dimensionality reduction

How exactly is dimension reduction performed?

A: Set the smallest singular values to zero:

X =
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1

  = U S Vᵀ, where

U =
  0.18 0
  0.36 0
  0.18 0
  0.90 0
  0    0.53
  0    0.80
  0    0.27

S =
  9.64 0
  0    5.29

Vᵀ =
  0.58 0.58 0.58 0    0
  0    0    0    0.71 0.71

Page 23

SVD – Dimensionality reduction

… note approximation sign

X =
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1

  ≈ u1 σ1 v1ᵀ, where

u1 =
  0.18
  0.36
  0.18
  0.90
  0
  0
  0

σ1 = 9.64

v1ᵀ = [ 0.58 0.58 0.58 0 0 ]

Page 24

SVD - Dimensionality reduction

… and the resultant matrix is an approximation using only a single eigenvector

X =
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1

  ≈

  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 0 0
  0 0 0 0 0
  0 0 0 0 0
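This worked example is easy to reproduce (a sketch using the matrix shown above):

  X = [1 1 1 0 0; 2 2 2 0 0; 1 1 1 0 0; 5 5 5 0 0; ...
       0 0 0 2 2; 0 0 0 3 3; 0 0 0 1 1];
  [U, S, V] = svd(X, 'econ');
  diag(S)'                          % ~[9.64 5.29 0 0 0]
  X1 = U(:,1) * S(1,1) * V(:,1)';   % keep only the largest singular value
  round(X1)                         % recovers the first block of X; the rest ~0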

Page 25

Real ECG data example

[Figure: real ECG data – the original observations X, the eigenspectrum S², and truncated reconstructions Xp = U Sp Vᵀ for p = 2 and p = 4]

Page 26

Recap - PCA

Second-order decorrelation = independence

Find a set of orthogonal axes in the data (independence metric = variance)

Project data onto these axes to decorrelate

Independence is forced onto the data through the orthogonality of the axes

Conventional noise / signal separation technique

Often used as a method of initializing weights for neural networks and other learning algorithms (see Wed lectures).

Page 27

Appendix

Worked example (see lecture notes)

http://www.robots.ox.ac.uk/~gari/cdt/IDH/docs/ch14_ICASVDnotes_2009.pdf

Pages 28-31

Worked example (see the notes linked above)