Neural networks 1

G51IAI Introduction to AI

Andrew Parkes

Neural Networks

• AIMA – Section 20.5 of 2003 edition

• Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Fausett, L., 1994

• An Introduction to Neural Networks (2nd Ed). Morton, IM, 1995

Brief History

• Try to create artificial intelligence based on the natural intelligence we know:

• The brain – massively interconnected neurons


Neural Networks

Natural Neural Networks

• Signals “move” via electrochemical signals

• The synapses release a chemical transmitter – the sum of which can cause a threshold to be reached – causing the neuron to “fire”

• Synapses can be inhibitory or excitatory


• We are born with about 100 billion neurons

• A neuron may connect to as many as 100,000 other neurons


• McCulloch & Pitts (1943) are generally recognised as the designers of the first neural network

• Many of their ideas are still used today, e.g.
– many simple units, “neurons”, combine to give increased computational power
– the idea of a threshold


Modelling a Neuron

• a_j : Activation value of unit j

• W_{j,i} : Weight on link from unit j to unit i

• in_i : Weighted sum of inputs to unit i

• a_i : Activation value of unit i

• g : Activation function

$in_i = \sum_j W_{j,i} \, a_j$
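The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers are hypothetical, and it assumes the usual convention that unit i's activation is obtained by applying g to in_i.

```python
def weighted_input(weights, activations):
    """in_i: sum over j of W[j,i] * a[j], for one receiving unit i."""
    return sum(w * a for w, a in zip(weights, activations))


def unit_activation(weights, activations, g):
    """a_i = g(in_i), assuming the standard convention for unit output."""
    return g(weighted_input(weights, activations))


# Hypothetical numbers, chosen only for illustration.
a = [1, 0, 1]            # activations a_j of the sending units
w = [0.5, -0.3, 0.25]    # weights W[j,i] on the links into unit i

in_i = weighted_input(w, a)
a_i = unit_activation(w, a, lambda x: 1 if x >= 0.5 else 0)
```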


Activation Functions

• Step_t(x) = 1 if x ≥ t, else 0 (threshold = t)

• Sign(x) = +1 if x ≥ 0, else –1

• Sigmoid(x) = 1/(1 + e^(–x))
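The three activation functions can be written directly from their definitions. A minimal sketch in Python follows; the function names are illustrative and the default threshold of 0 for the step function is an assumption.

```python
import math


def step(x, t=0.0):
    """Step_t(x): 1 if x >= t, else 0. The default t = 0 is an assumption."""
    return 1 if x >= t else 0


def sign(x):
    """Sign(x): +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1


def sigmoid(x):
    """Sigmoid(x): 1 / (1 + e^(-x)), a smooth version of the step."""
    return 1.0 / (1.0 + math.exp(-x))
```

Step and Sign give hard 0/1 or –1/+1 outputs, while Sigmoid varies smoothly between 0 and 1 and equals 0.5 at x = 0.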

Building a Neural Network

1. “Select Structure”: Design the way that the neurons are interconnected

2. “Select weights” – decide the strengths with which the neurons are interconnected

– weights are selected so as to get a “good match” to a “training set”

– “training set”: set of inputs and desired outputs

– often use a “learning algorithm”

Neural Networks

• Hebb (1949) developed the first learning rule – on the premise that if two neurons were active at the same time the strength between them should be increased

Neural Networks

• During the 50’s and 60’s many researchers worked, amidst great excitement, on a particular net structure called the “perceptron”.

• Minsky & Papert (1969) demonstrated a strong limit on the power of perceptrons
– saw the death of neural network research for about 15 years

• Only in the mid 80’s (Parker and LeCun) was interest revived, because of their learning algorithm for a better design of net
– in fact Werbos had discovered the algorithm in 1974

Basic Neural Networks

• Will first look at simplest networks

• “Feed-forward”

– Signals travel in one direction through net

– Net computes a function of the inputs


The First Neural Networks

Neurons in a McCulloch-Pitts network are connected by directed, weighted paths

[Figure: McCulloch-Pitts neuron Y with inputs X1, X2 and X3; link weights 2, 2 and -1]



If the weight on a path is positive the path is excitatory, otherwise it is inhibitory




The activation of a neuron is binary. That is, the neuron either fires (activation of one) or does not fire (activation of zero).




For the network shown here the activation function for unit Y is

f(y_in) = 1 if y_in ≥ θ, else 0

where y_in is the total input signal received and θ is the threshold for Y




Originally, all excitatory connections into a particular neuron had the same weight, although connections into different neurons could have different weights.

Later, weights were allowed to be arbitrary.




Each neuron has a fixed threshold. If the net input into the neuron is greater than or equal to the threshold, the neuron fires




The threshold is set such that any non-zero inhibitory input will prevent the neuron from firing
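The firing rule and the effect of an inhibitory input can be checked with a short sketch. It assumes that the weights 2, 2 and -1 belong to X1, X2 and X3 respectively, and that the threshold is 4; neither the assignment nor the threshold value is stated explicitly in the text, so treat both as assumptions.

```python
from itertools import product

WEIGHTS = (2, 2, -1)   # assumed: X1 -> Y, X2 -> Y (excitatory), X3 -> Y (inhibitory)
THETA = 4              # assumed threshold for Y


def fires(inputs, weights=WEIGHTS, theta=THETA):
    """Return 1 if the net input y_in reaches the threshold, else 0."""
    y_in = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y_in >= theta else 0


# Evaluate the neuron on all eight binary input combinations.
table = {xs: fires(xs) for xs in product((0, 1), repeat=3)}
```

With these assumed values, Y fires only when both excitatory inputs are on and the inhibitory input is off: the largest possible excitatory input is 4, and any inhibition drops the net input to 3, below the threshold.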


Building Logic Gates

• Computers are built out of “logic gates”

• Can we use neural nets to represent logical functions?

• Use threshold (step) function for activation function
– all activation values are 0 (false) or 1 (true)



AND Function

[Figure: X1 and X2 each connect to Y with weight 1]

X1  X2  Y
1   1   1
1   0   0
0   1   0
0   0   0

Threshold(Y) = 2



OR Function

[Figure: X1 and X2 each connect to Y with weight 2]

X1  X2  Y
1   1   1
1   0   1
0   1   1
0   0   0

Threshold(Y) = 2



AND NOT Function

[Figure: X1 connects to Y with weight 2; X2 connects to Y with weight -1]

X1  X2  Y
1   1   0
1   0   1
0   1   0
0   0   0

Threshold(Y) = 2
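The three gates can be checked against their truth tables with one small helper. This is a sketch: the helper name mp_unit is hypothetical, and for AND NOT it assumes the weight 2 is on X1 and the weight -1 on X2, which is the assignment that reproduces the truth table.

```python
def mp_unit(weights, threshold):
    """Build a threshold unit: fires (1) if the weighted sum >= threshold."""
    def unit(*inputs):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0
    return unit


AND = mp_unit((1, 1), 2)
OR = mp_unit((2, 2), 2)
AND_NOT = mp_unit((2, -1), 2)   # X1 AND NOT X2

# Print one row per input pair: X1, X2, AND, OR, AND NOT.
for x1 in (1, 0):
    for x2 in (1, 0):
        print(x1, x2, AND(x1, x2), OR(x1, x2), AND_NOT(x1, x2))
```

All three gates share the threshold 2; only the weights differ.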


Simple Networks

          AND        OR         NOT
Input 1   0 0 1 1    0 0 1 1    0 1
Input 2   0 1 0 1    0 1 0 1
Output    0 0 0 1    0 1 1 1    1 0


Simple Networks

[Figure: a single threshold unit with t = 0.0; inputs x and y plus a constant input of -1; link weights W = 1 and W = 1.5]


Perceptron

• Synonym for Single-Layer, Feed-Forward Network

• First studied in the 50’s

• Other networks were known about but the perceptron was the only one capable of learning and thus all research was concentrated in this area


Perceptron

• A single weight only affects one output, so we can restrict our investigations to a model as shown on the right

• Notation can be simpler, i.e.

$O = \mathrm{Step}_0\left(\sum_j W_j I_j\right)$
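A single-output perceptron in this notation is just a step at 0 applied to a weighted sum. The sketch below assumes one common way to keep the threshold at 0: a constant extra input of -1 whose weight plays the role of the threshold. The particular weights (1.5 on the constant input, 1 on each real input) are an assumed example that happens to compute AND.

```python
def step0(x):
    """Step_0(x): 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def perceptron(weights, inputs):
    """O = Step_0(sum over j of W_j * I_j)."""
    return step0(sum(w * i for w, i in zip(weights, inputs)))


# Assumed example: the first input is a constant -1 acting as a bias.
W = (1.5, 1.0, 1.0)


def and_gate(x, y):
    """AND as a perceptron: fires only when x + y - 1.5 >= 0."""
    return perceptron(W, (-1, x, y))
```

Folding the threshold into a weight means every unit can use the same Step_0 function, so learning only has to adjust weights.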


What can perceptrons represent?

          AND        XOR
Input 1   0 0 1 1    0 0 1 1
Input 2   0 1 0 1    0 1 0 1
Output    0 0 0 1    0 1 1 0



[Figure: the input points (0,0), (0,1), (1,0) and (1,1) plotted in two panels, AND and XOR]

• Functions which can be separated in this way are called Linearly Separable

• Only linearly separable functions can be represented by a perceptron

• XOR cannot be represented by a perceptron
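A small search makes the difference between AND and XOR concrete. The sketch below tries every combination of two weights and a threshold on a coarse grid (the grid range and step size are arbitrary assumptions) and reports whether any single threshold unit reproduces the target outputs. Finding nothing on a grid is an illustration, not a proof.

```python
from itertools import product

POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]


def find_separator(targets):
    """Search a coarse grid for (w1, w2, t) such that the unit
    'fire if w1*x1 + w2*x2 >= t' matches targets on all four points."""
    grid = [k / 2 for k in range(-6, 7)]   # -3.0 to 3.0 in steps of 0.5
    for w1, w2, t in product(grid, repeat=3):
        outputs = [1 if w1 * x1 + w2 * x2 >= t else 0 for x1, x2 in POINTS]
        if outputs == targets:
            return w1, w2, t
    return None


and_solution = find_separator(AND)   # some (w1, w2, t) triple
xor_solution = find_separator(XOR)   # None: no triple on the grid works
```

The XOR failure does not depend on the grid. Output 0 at (0,0) needs t > 0; output 1 at (0,1) and (1,0) needs w2 ≥ t and w1 ≥ t; but then w1 + w2 ≥ 2t > t, so the unit also fires at (1,1), which XOR forbids.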



Linear separability also applies in more than 3 dimensions – but it is harder to visualise

XOR

• XOR is not “linearly separable”
– Cannot be represented by a perceptron

• What can we do instead?
1. Convert to logic gates that can be represented by perceptrons
2. Chain together the gates

• Make sure you understand the following
– check it using truth tables

X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1)



XOR Function

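[Figure: two-layer network in which X1 and X2 feed hidden units Z1 and Z2, and Z1 and Z2 feed Y; four links have weight 2 and two links have weight -1]

X1  X2  Y
1   1   0
1   0   1
0   1   1
0   0   0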

X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1)
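The decomposition can be checked by chaining the gates in code. This sketch assumes the wiring implied by the formula: Z1 computes X1 AND NOT X2, Z2 computes X2 AND NOT X1, and Y computes Z1 OR Z2, with every unit using threshold 2 as in the earlier gate slides.

```python
def unit(weights, inputs, threshold=2):
    """Threshold unit: fires (1) if the weighted sum >= threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def xor(x1, x2):
    z1 = unit((2, -1), (x1, x2))    # hidden unit Z1: X1 AND NOT X2
    z2 = unit((-1, 2), (x1, x2))    # hidden unit Z2: X2 AND NOT X1
    return unit((2, 2), (z1, z2))   # output unit Y: Z1 OR Z2


# Print the truth table: X1, X2, Y.
for x1 in (1, 0):
    for x2 in (1, 0):
        print(x1, x2, xor(x1, x2))
```

Z1 and Z2 are the hidden units: their values are computed on the way to Y but never appear as outputs of the network.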

Single- vs. Multiple-Layers

• Once we chain together the gates we have “hidden layers”
– layers that are “hidden” from the output lines

• Have just seen that hidden layers allow us to represent XOR
– Perceptron is single-layer
– Multiple layers increase the representational power, so e.g. can represent XOR

• Generally useful nets have multiple layers
– typically 2-4 layers

Expectations

• Be able to explain the terminology used, e.g.
– activation functions
– step and threshold functions
– perceptron
– feed-forward
– multi-layer, hidden layers
– linear separability

• XOR
– why perceptrons cannot cope with XOR
– how XOR is possible with hidden layers

Questions?
