EE3J2 Data Mining
Lecture 15: Introduction to Artificial Neural Networks
Martin Russell
Dec 20, 2015

Slide 2: Objectives

- Unsupervised and supervised learning
- Modelling and discrimination
- Introduction to Artificial Neural Networks (ANNs)

Slide 3: Unsupervised learning

- So far we have looked at techniques which try to discover structure in 'raw' data, i.e. data with no information about classes:
  - Gaussian Mixture Modelling
  - Clustering
- We treat the whole data set as a single entity and try to discover underlying structure
- Because no class information is used, this automatic learning of the structure of the data is called unsupervised learning

Slide 4: Supervised learning

- In some cases additional information is available
- For example, for speech data we might know who was speaking, or what he or she said
- This is information about the class of each piece of data
- When the analysis is driven by class labels, it is called supervised learning

Slide 5: Modelling and Discrimination

- In supervised learning we can:
  - analyse the data for each class separately
  - try to discover how to distinguish between classes
- We could apply GMMs or clustering separately to model each class
- Alternatively, we could try to find a method to discriminate between the classes

Slide 6: Modelling and Discrimination

[Figure: separate class models contrasted with a single decision boundary between the classes]

Slide 7: Discrimination

- In the simplest cases we can discriminate between two classes using a class boundary
- A point is allocated to a class according to which side of the boundary it lies on
- The boundary may be linear or non-linear

[Figure: a linear decision boundary and a non-linear decision boundary]

Slide 8: Artificial Neural Networks

- There are many approaches to discrimination
- A common class of approaches is based on the idea of Artificial Neural Networks (ANNs)
- Inspiration for the basic element of an ANN (the artificial neuron) comes from biology...
- ...but the analogy really stops there: ANNs are just a computational device for processing patterns, not "artificial brains"

Slide 9: A model of a neuron

[Figure: a model of a neuron]

Slide 10: An Artificial Neuron

- A simple artificial neuron
- Basic idea:
  - if the input to unit u4 is big enough, then the neuron 'fires'
  - otherwise nothing happens
- How do we calculate the input to u4?

[Figure: input units with inputs i1, i2, i3, connected to unit u4 by weights w1,4, w2,4, w3,4]

Slide 11: An Artificial Neuron (2)

- Suppose that the inputs to units 1, 2 and 3 are i1, i2 and i3
- Then the input to u4 is:

  i4 = i1 w1,4 + i2 w2,4 + i3 w3,4

- In general, for an artificial neuron with N input units, the input to unit k is:

  ik = Σ (n = 1 to N) in wn,k

[Figure: the same three-input unit u4 with weights w1,4, w2,4, w3,4]
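The weighted-sum rule above is easy to express directly; a minimal Python sketch (the function name `neuron_input` and the example numbers are illustrative, not from the slides):

```python
def neuron_input(inputs, weights):
    """Weighted sum of inputs: i_k = sum over n of i_n * w_{n,k}."""
    return sum(i * w for i, w in zip(inputs, weights))

# Three-input case from the slide: i4 = i1*w1,4 + i2*w2,4 + i3*w3,4
print(neuron_input([1.0, 2.0, 3.0], [0.5, -1.0, 0.25]))  # 0.5 - 2.0 + 0.75 = -0.75
```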

Slide 12: The 'threshold' activation function

- The activation function decides whether the neuron should "fire"
- A suitable activation function is the threshold function g:

  g(x) = 1 if x >= 0
       = 0 if x < 0

- The output of u4 is then:

  o4 = g(i4)

[Figure: the same three-input unit u4]

Slide 13: Other activation functions

- Linear:

  g(x) = x

- Sigmoid:

  g(x) = 1 / (1 + e^(-kx))

[Figure: plot of the sigmoid activation function]
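The three activation functions met so far can be sketched in Python (function names are illustrative; k = 1 is an assumed default for the sigmoid):

```python
import math

def threshold(x):
    """Threshold activation: 1 if x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0

def linear(x):
    """Linear activation: g(x) = x."""
    return x

def sigmoid(x, k=1.0):
    """Sigmoid activation: g(x) = 1 / (1 + exp(-k*x))."""
    return 1.0 / (1.0 + math.exp(-k * x))

print(threshold(-0.1))  # 0.0
print(sigmoid(0.0))     # 0.5 -- the sigmoid is a smooth version of the threshold
```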

Slide 14: The 'bias'

- As described, the neuron will 'fire' only if its input is greater than or equal to 0
- We can change the firing point by introducing a bias
- This is an additional input unit whose input is fixed at 1

[Figure: unit u4 with inputs i1, i2, i3 (weights w1,4, w2,4, w3,4) plus a bias input fixed at 1 with weight wb,4]

Slide 15: How the bias works

- The artificial neuron 'fires' if the input to u4 is greater than or equal to 0, i.e. if:

  i4 = i1 w1,4 + i2 w2,4 + i3 w3,4 + wb,4 >= 0

- Or, equivalently, if:

  i1 w1,4 + i2 w2,4 + i3 w3,4 >= -wb,4
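The firing condition with a bias can be checked in a few lines of Python (the function name `fires` is illustrative; the weights 3, 1 and bias weight -2 anticipate the 2D example that follows):

```python
def fires(inputs, weights, bias_weight):
    """A unit fires when i1*w1 + ... + iN*wN + bias_weight >= 0,
    i.e. when the unbiased weighted sum reaches -bias_weight."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias_weight
    return total >= 0

# With bias weight -2 the unit fires only once the weighted sum reaches 2
print(fires([1.0, 1.0], [3.0, 1.0], -2.0))  # 3 + 1 - 2 = 2 >= 0 -> True
print(fires([0.0, 0.0], [3.0, 1.0], -2.0))  # -2 < 0 -> False
```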

Slide 16: Example (2D)

- Suppose u has a threshold or sigmoid activation function, with weights 3 and 1 on inputs x and y, and bias weight -2
- u will 'fire' if:

  3x + y - 2 >= 0, i.e. y >= 2 - 3x

[Figure: unit u with inputs x and y, weights 3 and 1, and bias weight -2]
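The firing rule for this unit can be tested point by point; a small Python sketch (the probe points are arbitrary):

```python
def fires(x, y):
    """The example unit: weights 3 and 1, bias weight -2.
    Fires on or above the line y = 2 - 3x."""
    return 3 * x + 1 * y - 2 >= 0

print(fires(1, 0))  # 3 - 2 = 1 >= 0  -> True
print(fires(0, 0))  # -2 < 0          -> False
print(fires(0, 2))  # on the boundary -> True
```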

Slide 17: Example (continued)

[Figure: input units u1 (x), u2 (y) and bias unit u3 feed u4 with weights 3, 1 and -2; the decision boundary is the line y = 2 - 3x, crossing the x-axis at x = 2/3 and the y-axis at y = 2]

Slide 18: Example (continued)

- Assume:
  - linear activation functions for units u1, u2 and u3
  - a sigmoid activation function for u4
- If the input to u1 is 2 and the input to u2 is 2, then:
  - the input to u4 is 2 × 3 + 2 × 1 + 1 × (-2) = 6
  - hence the output from u4 is g(6) = 0.998
- If the input to u1 is -2 and the input to u2 is -2, then:
  - the input to u4 is -2 × 3 + (-2) × 1 + 1 × (-2) = -10
  - hence the output from u4 is g(-10) = 4.54 × 10^-5
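These two calculations can be verified directly (a sigmoid with k = 1 is assumed, and the helper name `u4_output` is illustrative):

```python
import math

def sigmoid(x):
    """Sigmoid activation with k = 1 (assumed)."""
    return 1.0 / (1.0 + math.exp(-x))

def u4_output(i1, i2):
    """Linear units pass their inputs through, so the input to u4 is
    i1*3 + i2*1 + 1*(-2); the output is the sigmoid of that sum."""
    return sigmoid(i1 * 3 + i2 * 1 + 1 * (-2))

print(round(u4_output(2, 2), 3))  # 0.998, matching the slide
print(u4_output(-2, -2))          # ~4.54e-5, matching the slide
```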

Slide 19: Example 2

- A unit u4 with weights 2 and -1 on inputs x and y, and bias weight -1
- u4 will 'fire' if:

  2x - y - 1 >= 0, i.e. y <= 2x - 1

[Figure: the decision boundary y = 2x - 1, crossing the x-axis at x = 1/2 and the y-axis at y = -1]

Slide 20: Combining 2 Artificial Neurons

[Figure: the two example units side by side: one with weights 3, 1 and bias weight -2 (boundary intercepts 2/3 and 2), one with weights 2, -1 and bias weight -1 (boundary intercepts 1/2 and -1)]

Slide 21: Combining neurons - artificial neural networks

[Figure: inputs x and y feed units u4 (weights 3, 1, bias weight -2) and u5 (weights 2, -1, bias weight -1); u4 and u5 feed u6 with weights 20 and -20 and bias weight -2]

Slide 22: Combining neurons

- The input to u4 is 3 × x + 1 × y - 2
- The input to u5 is 2 × x + (-1) × y - 1
- When x = 3, y = 0:
  - the input to u4 is 7 and the input to u5 is 5
  - the output from u4 is g(7) ≈ 1.00 and the output from u5 is g(5) ≈ 0.99
  - the input to u6 is 1.00 × 20 + 0.99 × (-20) - 2 ≈ -1.88 (keeping full precision in g(7) and g(5))
  - the output from u6 is g(-1.88) ≈ 0.13

Slide 23: Outputs

  i1     i2     o6
   3      0     0.13
   0.5    2     1.00
   0.5   -2     0.00
  -1      0     0.06
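This outputs table can be reproduced with a forward pass through the two-layer combination of neurons, assuming sigmoid activations (k = 1) on u4, u5 and u6:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def network(x, y):
    """Forward pass: u4 has weights 3, 1 and bias weight -2; u5 has
    weights 2, -1 and bias weight -1; u6 combines their outputs with
    weights 20, -20 and bias weight -2."""
    o4 = sigmoid(3 * x + 1 * y - 2)
    o5 = sigmoid(2 * x - 1 * y - 1)
    return sigmoid(20 * o4 - 20 * o5 - 2)

for x, y in [(3, 0), (0.5, 2), (0.5, -2), (-1, 0)]:
    print(x, y, round(network(x, y), 2))
# 3 0 0.13 / 0.5 2 1.0 / 0.5 -2 0.0 / -1 0 0.06
```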

Slide 24: Combining neurons

[Figure: the two decision boundaries from the examples drawn in the plane, with the network's 'firing region' shaded]

Slide 25: Single-layer Multi-Layer Perceptron (MLP)

[Figure: network with an input layer, one hidden layer and an output layer]

Slide 26: Single-Layer MLP

- Can characterize arbitrary convex regions
- Defines the region using linear decision boundaries

Slide 27: Two-layer MLP

[Figure: network with two hidden layers between the input and output layers]

Slide 28: Two-Layer MLP

- An MLP with two hidden layers can characterize arbitrary shapes
- The first hidden layer characterizes convex regions
- The second hidden layer combines these convex regions
- There is no advantage in having more than two hidden layers

Slide 29: MLP training

- To define an MLP we must decide:
  - the number of layers
  - the number of input units
  - the number of hidden units
  - the number of output units
- Once these are fixed, the properties of the MLP are completely determined by the values of the weights
- How do we choose the weight values?

Slide 30: MLP training (continued)

- MLP weights are learnt automatically from training data
- We have already seen computational techniques for estimating:
  - the parameters of GMMs
  - centroid positions in clustering
- Similarly, there is an iterative computational technique for estimating MLP weights: "Error Back-Propagation"

Slide 31: Error back-propagation (EBP)

- EBP is a 'gradient descent' method, like others we have seen
- The first stage is to choose initial values for the weights
- The EBP algorithm then changes the weights incrementally to identify the class boundaries
- It is only guaranteed to find a local optimum
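EBP itself is not derived in these slides, but the gradient-descent idea can be illustrated on a single sigmoid neuron trained on a two-point toy set (the data, initial weights, learning rate and squared-error loss are all illustrative assumptions, not the lecture's algorithm):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy training set: one point from each class, with target outputs 1 and 0
data = [((2.0, 2.0), 1.0), ((-2.0, -2.0), 0.0)]
w = [0.1, 0.1, 0.0]  # initial weights [w1, w2, bias weight], chosen arbitrarily
rate = 0.5           # learning rate (an assumption)

for _ in range(1000):
    for (x, y), target in data:
        out = sigmoid(w[0] * x + w[1] * y + w[2])
        # Gradient of the squared error 0.5*(out - target)^2 w.r.t. each weight,
        # using d(sigmoid)/dx = out * (1 - out)
        delta = (out - target) * out * (1.0 - out)
        w[0] -= rate * delta * x
        w[1] -= rate * delta * y
        w[2] -= rate * delta * 1.0   # the bias input is fixed at 1

# After training, the neuron's outputs should be close to the targets
print(sigmoid(w[0] * 2 + w[1] * 2 + w[2]))
print(sigmoid(w[0] * (-2) + w[1] * (-2) + w[2]))
```

Like EBP, this only follows the local gradient, so a poor initial choice of weights can leave it in a local optimum.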

Slide 32: Other types of ANN

- Multi-Layer Perceptrons (MLPs) are not the only type of ANN
- There are lots of others:
  - Radial Basis Function (RBF) networks
  - Support Vector Machines (SVMs)
  - ...
- There are also ANN interpretations of other methods

Slide 33: Summary

- Discrimination versus modelling
- Brief introduction to neural networks
- Definition of an 'artificial neuron'
- Activation functions: linear and sigmoid
- Linear boundary defined by a single neuron
- Convex region defined by a single-layer MLP
- Two-layer MLPs