EE3J2 Data Mining
Lecture 15: Introduction to Artificial Neural Networks
Martin Russell
Dec 20, 2015

Slide 2: Objectives

- Unsupervised and supervised learning
- Modelling and discrimination
- Introduction to Artificial Neural Networks (ANNs)

Slide 3: Unsupervised learning

- So far we have looked at techniques which try to discover structure in 'raw' data, i.e. data with no information about classes:
  - Gaussian Mixture Modelling
  - Clustering
- We treat the whole data set as a single entity and try to discover underlying structure
- Because no class information is used, this automatic learning of the structure of the data is called unsupervised learning

Slide 4: Supervised learning

- In some cases additional information is available
- For example, for speech data we might know who was speaking, or what he or she said
- This is information about the class of each piece of data
- When the analysis is driven by class labels, it is called supervised learning

Slide 5: Modelling and Discrimination

- In supervised learning we can:
  - analyse the data for each class separately
  - try to discover how to distinguish between classes
- We could apply GMMs or clustering separately to model each class
- Alternatively, we could try to find a method to discriminate between the classes

Slide 6: Modelling and Discrimination

[Figure: separate class models contrasted with a single decision boundary between the classes]

Slide 7: Discrimination

- In the simplest cases we can discriminate between two classes using a class boundary
- A point is allocated to a class according to which side of the boundary it lies on
- The boundary may be linear or non-linear

[Figure: a linear decision boundary and a non-linear decision boundary]

Slide 8: Artificial Neural Networks

- There are many approaches to discrimination
- A common class of approaches is based on the idea of Artificial Neural Networks (ANNs)
- Inspiration for the basic element of an ANN (the artificial neuron) comes from biology...
- ...but the analogy really stops there: ANNs are just a computational device for processing patterns, not "artificial brains"

Slide 9: A model of a neuron

[Figure: a model of a neuron]

Slide 10: An Artificial Neuron

- A simple artificial neuron
- Basic idea:
  - if the input to unit u4 is big enough, then the neuron 'fires'
  - otherwise nothing happens
- How do we calculate the input to u4?

[Figure: input units with inputs i1, i2, i3, connected to unit u4 by weights w1,4, w2,4, w3,4]

Slide 11: An Artificial Neuron (2)

- Suppose that the inputs to units 1, 2 and 3 are i1, i2 and i3
- Then the input to u4 is:

  i4 = i1 w1,4 + i2 w2,4 + i3 w3,4

- In general, for an artificial neuron with N input units, the input to unit k is:

  ik = Σ (n = 1 to N) in wn,k

[Figure: the same three-input unit u4 with weights w1,4, w2,4, w3,4]
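The weighted-sum rule above is easy to express directly; a minimal Python sketch (the function name `neuron_input` and the example numbers are illustrative, not from the slides):

```python
def neuron_input(inputs, weights):
    """Weighted sum of inputs: i_k = sum over n of i_n * w_{n,k}."""
    return sum(i * w for i, w in zip(inputs, weights))

# Three-input case from the slide: i4 = i1*w1,4 + i2*w2,4 + i3*w3,4
print(neuron_input([1.0, 2.0, 3.0], [0.5, -1.0, 0.25]))  # 0.5 - 2.0 + 0.75 = -0.75
```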

Slide 12: The 'threshold' activation function

- The activation function decides whether the neuron should "fire"
- A suitable activation function is the threshold function g:

  g(x) = 1 if x >= 0
       = 0 if x < 0

- The output of u4 is then:

  o4 = g(i4)

[Figure: the same three-input unit u4]

Slide 13: Other activation functions

- Linear:

  g(x) = x

- Sigmoid:

  g(x) = 1 / (1 + e^(-kx))

[Figure: plot of the sigmoid activation function]
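The three activation functions met so far can be sketched in Python (function names are illustrative; k = 1 is an assumed default for the sigmoid):

```python
import math

def threshold(x):
    """Threshold activation: 1 if x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0

def linear(x):
    """Linear activation: g(x) = x."""
    return x

def sigmoid(x, k=1.0):
    """Sigmoid activation: g(x) = 1 / (1 + exp(-k*x))."""
    return 1.0 / (1.0 + math.exp(-k * x))

print(threshold(-0.1))  # 0.0
print(sigmoid(0.0))     # 0.5 -- the sigmoid is a smooth version of the threshold
```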

Slide 14: The 'bias'

- As described, the neuron will 'fire' only if its input is greater than or equal to 0
- We can change the firing point by introducing a bias
- This is an additional input unit whose input is fixed at 1

[Figure: unit u4 with inputs i1, i2, i3 (weights w1,4, w2,4, w3,4) plus a bias input fixed at 1 with weight wb,4]

Slide 15: How the bias works

- The artificial neuron 'fires' if the input to u4 is greater than or equal to 0, i.e. if:

  i4 = i1 w1,4 + i2 w2,4 + i3 w3,4 + wb,4 >= 0

- Or, equivalently, if:

  i1 w1,4 + i2 w2,4 + i3 w3,4 >= -wb,4
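The firing condition with a bias can be checked in a few lines of Python (the function name `fires` is illustrative; the weights 3, 1 and bias weight -2 anticipate the 2D example that follows):

```python
def fires(inputs, weights, bias_weight):
    """A unit fires when i1*w1 + ... + iN*wN + bias_weight >= 0,
    i.e. when the unbiased weighted sum reaches -bias_weight."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias_weight
    return total >= 0

# With bias weight -2 the unit fires only once the weighted sum reaches 2
print(fires([1.0, 1.0], [3.0, 1.0], -2.0))  # 3 + 1 - 2 = 2 >= 0 -> True
print(fires([0.0, 0.0], [3.0, 1.0], -2.0))  # -2 < 0 -> False
```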

Slide 16: Example (2D)

- Suppose u has a threshold or sigmoid activation function, with weights 3 and 1 on inputs x and y, and bias weight -2
- u will 'fire' if:

  3x + y - 2 >= 0, i.e. y >= 2 - 3x

[Figure: unit u with inputs x and y, weights 3 and 1, and bias weight -2]
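The firing rule for this unit can be tested point by point; a small Python sketch (the probe points are arbitrary):

```python
def fires(x, y):
    """The example unit: weights 3 and 1, bias weight -2.
    Fires on or above the line y = 2 - 3x."""
    return 3 * x + 1 * y - 2 >= 0

print(fires(1, 0))  # 3 - 2 = 1 >= 0  -> True
print(fires(0, 0))  # -2 < 0          -> False
print(fires(0, 2))  # on the boundary -> True
```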

Slide 17: Example (continued)

[Figure: input units u1 (x), u2 (y) and bias unit u3 feed u4 with weights 3, 1 and -2; the decision boundary is the line y = 2 - 3x, crossing the x-axis at x = 2/3 and the y-axis at y = 2]

Slide 18: Example (continued)

- Assume:
  - linear activation functions for units u1, u2 and u3
  - a sigmoid activation function for u4
- If the input to u1 is 2 and the input to u2 is 2, then:
  - the input to u4 is 2 × 3 + 2 × 1 + 1 × (-2) = 6
  - hence the output from u4 is g(6) = 0.998
- If the input to u1 is -2 and the input to u2 is -2, then:
  - the input to u4 is -2 × 3 + (-2) × 1 + 1 × (-2) = -10
  - hence the output from u4 is g(-10) = 4.54 × 10^-5
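These two calculations can be verified directly (a sigmoid with k = 1 is assumed, and the helper name `u4_output` is illustrative):

```python
import math

def sigmoid(x):
    """Sigmoid activation with k = 1 (assumed)."""
    return 1.0 / (1.0 + math.exp(-x))

def u4_output(i1, i2):
    """Linear units pass their inputs through, so the input to u4 is
    i1*3 + i2*1 + 1*(-2); the output is the sigmoid of that sum."""
    return sigmoid(i1 * 3 + i2 * 1 + 1 * (-2))

print(round(u4_output(2, 2), 3))  # 0.998, matching the slide
print(u4_output(-2, -2))          # ~4.54e-5, matching the slide
```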

Slide 19: Example 2

- A unit u4 with weights 2 and -1 on inputs x and y, and bias weight -1
- u4 will 'fire' if:

  2x - y - 1 >= 0, i.e. y <= 2x - 1

[Figure: the decision boundary y = 2x - 1, crossing the x-axis at x = 1/2 and the y-axis at y = -1]

Slide 20: Combining 2 Artificial Neurons

[Figure: the two example units side by side: one with weights 3, 1 and bias weight -2 (boundary intercepts 2/3 and 2), one with weights 2, -1 and bias weight -1 (boundary intercepts 1/2 and -1)]

Slide 21: Combining neurons - artificial neural networks

[Figure: inputs x and y feed units u4 (weights 3, 1, bias weight -2) and u5 (weights 2, -1, bias weight -1); u4 and u5 feed u6 with weights 20 and -20 and bias weight -2]

Slide 22: Combining neurons

- The input to u4 is 3 × x + 1 × y - 2
- The input to u5 is 2 × x + (-1) × y - 1
- When x = 3, y = 0:
  - the input to u4 is 7 and the input to u5 is 5
  - the output from u4 is g(7) ≈ 1.00 and the output from u5 is g(5) ≈ 0.99
  - the input to u6 is 1.00 × 20 + 0.99 × (-20) - 2 ≈ -1.88 (keeping full precision in g(7) and g(5))
  - the output from u6 is g(-1.88) ≈ 0.13

Slide 23: Outputs

  i1     i2     o6
   3      0     0.13
   0.5    2     1.00
   0.5   -2     0.00
  -1      0     0.06
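This outputs table can be reproduced with a forward pass through the two-layer combination of neurons, assuming sigmoid activations (k = 1) on u4, u5 and u6:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def network(x, y):
    """Forward pass: u4 has weights 3, 1 and bias weight -2; u5 has
    weights 2, -1 and bias weight -1; u6 combines their outputs with
    weights 20, -20 and bias weight -2."""
    o4 = sigmoid(3 * x + 1 * y - 2)
    o5 = sigmoid(2 * x - 1 * y - 1)
    return sigmoid(20 * o4 - 20 * o5 - 2)

for x, y in [(3, 0), (0.5, 2), (0.5, -2), (-1, 0)]:
    print(x, y, round(network(x, y), 2))
# 3 0 0.13 / 0.5 2 1.0 / 0.5 -2 0.0 / -1 0 0.06
```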

Slide 24: Combining neurons

[Figure: the two decision boundaries from the examples drawn in the plane, with the network's 'firing region' shaded]

Slide 25: Single-layer Multi-Layer Perceptron (MLP)

[Figure: network with an input layer, one hidden layer and an output layer]

Slide 26: Single-Layer MLP

- Can characterize arbitrary convex regions
- Defines the region using linear decision boundaries

Slide 27: Two-layer MLP

[Figure: network with two hidden layers between the input and output layers]

Slide 28: Two-Layer MLP

- An MLP with two hidden layers can characterize arbitrary shapes
- The first hidden layer characterizes convex regions
- The second hidden layer combines these convex regions
- There is no advantage in having more than two hidden layers

Slide 29: MLP training

- To define an MLP we must decide:
  - the number of layers
  - the number of input units
  - the number of hidden units
  - the number of output units
- Once these are fixed, the properties of the MLP are completely determined by the values of the weights
- How do we choose the weight values?

Slide 30: MLP training (continued)

- MLP weights are learnt automatically from training data
- We have already seen computational techniques for estimating:
  - the parameters of GMMs
  - centroid positions in clustering
- Similarly, there is an iterative computational technique for estimating MLP weights: "Error Back-Propagation"

Slide 31: Error back-propagation (EBP)

- EBP is a 'gradient descent' method, like others we have seen
- The first stage is to choose initial values for the weights
- The EBP algorithm then changes the weights incrementally to identify the class boundaries
- It is only guaranteed to find a local optimum
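EBP itself is not derived in these slides, but the gradient-descent idea can be illustrated on a single sigmoid neuron trained on a two-point toy set (the data, initial weights, learning rate and squared-error loss are all illustrative assumptions, not the lecture's algorithm):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy training set: one point from each class, with target outputs 1 and 0
data = [((2.0, 2.0), 1.0), ((-2.0, -2.0), 0.0)]
w = [0.1, 0.1, 0.0]  # initial weights [w1, w2, bias weight], chosen arbitrarily
rate = 0.5           # learning rate (an assumption)

for _ in range(1000):
    for (x, y), target in data:
        out = sigmoid(w[0] * x + w[1] * y + w[2])
        # Gradient of the squared error 0.5*(out - target)^2 w.r.t. each weight,
        # using d(sigmoid)/dx = out * (1 - out)
        delta = (out - target) * out * (1.0 - out)
        w[0] -= rate * delta * x
        w[1] -= rate * delta * y
        w[2] -= rate * delta * 1.0   # the bias input is fixed at 1

# After training, the neuron's outputs should be close to the targets
print(sigmoid(w[0] * 2 + w[1] * 2 + w[2]))
print(sigmoid(w[0] * (-2) + w[1] * (-2) + w[2]))
```

Like EBP, this only follows the local gradient, so a poor initial choice of weights can leave it in a local optimum.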

Slide 32: Other types of ANN

- Multi-Layer Perceptrons (MLPs) are not the only type of ANN
- There are lots of others:
  - Radial Basis Function (RBF) networks
  - Support Vector Machines (SVMs)
  - ...
- There are also ANN interpretations of other methods

Slide 33: Summary

- Discrimination versus modelling
- Brief introduction to neural networks
- Definition of an 'artificial neuron'
- Activation functions: linear and sigmoid
- Linear boundary defined by a single neuron
- Convex region defined by a single-layer MLP
- Two-layer MLPs