COMP60431 Machine Learning Advanced Computer Science MSc

Post on 30-Nov-2014

Lecturers: Magnus Rattray & Gavin Brown

What is Machine Learning?

1. Software that adapts to (“learns” from) data

2. Concerned with creating and using mathematical “data structures” that allow a computer to exhibit behaviour that would normally require a human.

Applications

Speech and handwriting recognition
Autonomous robot control
Data mining and bioinformatics
Playing games
Fault detection
Clinical diagnosis
Spam email detection
Inverse kinematics

Applications are diverse, algorithms are generic.

What will you be doing?

We will introduce the concepts and details behind various ML methods, including how they work, and use existing software packages to illustrate how they are applied to data.

Projects: explore the field, reinvent if you want

Machine Learning Methods

Learning from labelled data (supervised learning) (e.g. trying to predict the weather from a dataset of historical patterns)

Learning from unlabelled data (unsupervised learning) (e.g. trying to identify natural patterns in sales of books on Amazon.com)

Learning from sequential data (e.g. Speech recognition, DNA sequence analysis)

Statistical Learning

Different machine learning methods can be unified within a statistical framework:

Data is considered to be drawn from a probability distribution.
Typically, we don’t expect perfect learning but only “probably correct” learning.
Statistical concepts are the key to measuring our expected future performance.
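One common way to estimate expected future performance is to hold back part of the data, train on the rest, and count mistakes on the held-back part. The sketch below is only an illustration in Python, not the course's own material: the function names (`holdout_error`, `fit_threshold`, `predict_threshold`), the one-dimensional toy data and the threshold "learner" are all assumptions made up for this example.

```python
import random

def holdout_error(examples, train_fn, predict_fn, test_fraction=0.3, seed=0):
    """Estimate the future error rate using examples not seen during training."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    model = train_fn(train)
    mistakes = sum(1 for x, y in test if predict_fn(model, x) != y)
    return mistakes / n_test

def fit_threshold(train):
    """Hypothetical learner: threshold halfway between the two class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def predict_threshold(threshold, x):
    # Assumes class 1 tends to have larger x than class 0.
    return 1 if x > threshold else 0

# Invented data: two overlapping classes drawn from probability distributions.
data_rng = random.Random(1)
examples = [(data_rng.gauss(0.0, 1.0), 0) for _ in range(100)]
examples += [(data_rng.gauss(2.0, 1.0), 1) for _ in range(100)]

error = holdout_error(examples, fit_threshold, predict_threshold)
print(f"estimated error rate: {error:.2f}")
```

Because the two classes overlap, the estimate will generally not be zero, which is the sense in which learning is only "probably correct".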

Important: If you’re not prepared to get into a bit of maths (linear algebra, calculus, statistics), don’t take this course.

Example 1: Hand-written digits

Data: Greyscale images

Task: Classification (0, 1, 2, ..., 9)

Problem features: Highly variable inputs from the same class, including some “weird” inputs.

Methods: K-Nearest Neighbour or Support Vector Machines

US Postal Service Digits
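As a rough illustration of the K-Nearest Neighbour idea, the sketch below classifies a new input by majority vote among its k closest training examples. It is a minimal Python sketch, not the course's implementation: the function name `knn_predict`, the tiny four-pixel "images" and their labels are invented for the example, and real digit images would have many more pixels.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples."""
    # Rank training examples by Euclidean distance between pixel vectors.
    neighbours = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Invented 2x2 greyscale "images", flattened to 4 pixel intensities in [0, 1].
train = [
    ((0.9, 0.1, 0.9, 0.1), "stroke on left"),
    ((0.8, 0.0, 1.0, 0.2), "stroke on left"),
    ((1.0, 0.2, 0.8, 0.0), "stroke on left"),
    ((0.1, 0.9, 0.1, 0.9), "stroke on right"),
    ((0.0, 0.8, 0.2, 1.0), "stroke on right"),
    ((0.2, 1.0, 0.0, 0.8), "stroke on right"),
]

print(knn_predict(train, (0.7, 0.3, 0.9, 0.1)))  # prints "stroke on left"
```

Using an odd k with two classes avoids tied votes; variable inputs from the same class are handled only to the extent that similar examples appear in the training set.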

Example 2: Predicting heart disease

1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0, 1, 2)
8. maximum heart rate achieved
9. exercise-induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) coloured by fluoroscopy

age  sex  ch   bp   sc  fb  ele   mx  ang  old  slo  maj  typ  r
 67    0   3  115  564   0    2  160    0  1.6    2    0    7  1
 57    1   2  124  261   0    0  141    0  0.3    1    0    7  2
 64    1   4  128  263   0    0  105    1  0.2    2    1    7  1
 74    0   2  120  269   0    2  121    1  0.2    1    1    3  1
 65    1   4  120  177   0    0  140    0  0.4    1    0    7  1
 56    1   3  130  256   1    2  142    1  0.6    2    1    6  2
 59    1   4  110  239   0    2  142    1  1.2    2    1    7  2
…
 63    0   4  150  407   0    2  154    0    4    2    3    7  2
 59    1   4  135  234   0    0  161    0  0.5    2    0    7  1
 53    1   4  142  226   0    2  111    1    0    1    0    7  1
 44    1   3  140  235   0    2  180    0    0    1    0    3  1
 61    1   1  134  234   0    0  145    0  2.6    2    2    3  2
 57    0   4  128  303   0    2  159    0    0    1    1    3  1
 71    0   4  112  149   0    0  125    0  1.6    2    0    3  1

(2% of full dataset shown)

“Heuristics that make us smart”

Example 3: DNA microarrays

DNA from ~10,000 genes attached to a glass slide called a “microarray”.

Green and red labels attached to mRNA from two different sample tissues.

Tasks: Sample classification, gene classification, visualisation and clustering of genes/samples.

Problem features:
Very high-dimensional data (many features) but a relatively small number of examples (samples)
Extremely noisy data (noise ~ signal)
Lack of good domain knowledge

Projecting the 10,000-dimensional data onto 2D using PCA effectively separates cancer subtypes.
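The projection step can be sketched in a few lines. The Python example below is a toy illustration under stated assumptions, not the analysis behind the slide: it uses six invented five-dimensional samples instead of 10,000-dimensional microarray data, and it finds the top two principal components by power iteration with deflation, which is just one simple way to compute them. All function names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def centre(rows):
    """Subtract the mean of each feature (column)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centred):
    """Sample covariance matrix of already-centred data."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenvector(matrix, rng, iterations=500):
    """Power iteration: repeatedly multiply a random vector by the matrix."""
    v = [rng.gauss(0.0, 1.0) for _ in range(len(matrix))]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return v, eigenvalue

def pca_2d(rows, seed=0):
    """Project each row onto the two directions of largest variance."""
    rng = random.Random(seed)
    centred = centre(rows)
    cov = covariance(centred)
    v1, lam1 = leading_eigenvector(cov, rng)
    d = len(cov)
    # Deflation: remove the first component so the second becomes the leader.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2, _ = leading_eigenvector(deflated, rng)
    projected = [(dot(r, v1), dot(r, v2)) for r in centred]
    return projected, (v1, v2)

# Invented samples: the first three and last three mimic two subtypes.
samples = [
    [5.0, 5.2, 0.1, 0.0, 1.0], [5.1, 4.9, 0.0, 0.2, 1.1], [4.8, 5.0, 0.2, 0.1, 0.9],
    [0.1, 0.0, 5.0, 5.1, 1.0], [0.0, 0.2, 4.9, 5.0, 1.2], [0.2, 0.1, 5.2, 4.8, 0.8],
]
projected, (v1, v2) = pca_2d(samples)
for point in projected:
    print(f"{point[0]:+.2f} {point[1]:+.2f}")
```

In this toy data the first coordinate has one sign for the first three samples and the opposite sign for the last three, so the two groups separate along the first principal component. With real microarray data (far more features than samples) a library routine based on the singular value decomposition would be the more practical choice.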

Relevant disciplines

Algorithms
Artificial intelligence
Control
Physics
Information theory
Dynamical systems
Neurobiology
Signal processing
Statistics
Linear algebra
etc.

Researchers in ML come from a variety of different backgrounds.

Prerequisites

Need: Reasonable knowledge of calculus and matrix/vector algebra.

Don’t need: Previous experience of Matlab programming – this will be learned during the course.

Module structure

Assessed exercises (20%)
Project (30%)
January examination (50%)

Period 1 (Tuesdays) 28th Sept – 3rd Nov

Resources

We’ll provide full slides and notes.

If you want a book, this is a suggestion:

E. Alpaydin

“Introduction to Machine Learning”

What now?

Web page: http://intranet.cs.man.ac.uk/mlo/comp60431/

The course begins on Tuesday 29th Sept.

If you want to take the course: check the primer tutorial on the required maths and practise with Matlab (tutorial on the website).

Questions?

Example: Speech recognition

Data: features from spectral analysis of speech signals (two in this simple example).

Task: Classification of vowel sounds in words of the form “h-?-d”, e.g. head, hid, had etc.

Problem features:
Highly variable data with the same classification
Good feature selection is very important
This task is a small part of a larger task

Method: Multilayer neural network
