Page 1:

Neural Networks

Pabitra Mitra
Computer Science and Engineering
IIT Kharagpur
pabitra@gmail.com

Page 3:

The Neuron

• The neuron is the basic information processing unit of a NN. It consists of:
  1. A set of synapses or connecting links, each link characterized by a weight: w1, w2, ..., wm
  2. An adder function (linear combiner) which computes the weighted sum of the inputs: u = Σ_{j=1..m} w_j x_j
  3. An activation function (squashing function) for limiting the amplitude of the output of the neuron: y = φ(u + b) (a short code sketch follows below)
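As a concrete illustration (not from the slides), here is a minimal NumPy sketch of the neuron just described: a linear combiner, a bias, and a squashing function. The function name, the example weights, and the choice of tanh as φ are illustrative assumptions.

```python
import numpy as np

def neuron_output(x, w, b, phi=np.tanh):
    """Basic neuron: weighted sum of the inputs plus a bias,
    passed through an activation (squashing) function phi."""
    u = np.dot(w, x)   # adder / linear combiner: u = sum_j w_j * x_j
    v = u + b          # local field after applying the bias
    return phi(v)      # squashed output y = phi(u + b)

# Illustrative example: 3 inputs with arbitrary weights and bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron_output(x, w, b=0.3))
```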

Page 4:

Computation at Units

• Compute a 0-1 or a graded function of the weighted sum of the inputs: w·x = Σ_i w_i x_i
• g(·) is the activation function; the unit outputs g(w·x)

[Figure: a unit with inputs x1, x2, ..., xn and weights w1, w2, ..., wn producing the output g(w·x)]

Page 5:

The Neuron

[Figure: neuron model — input signals x1, x2, ..., xm; synaptic weights w1, w2, ..., wm; summing function; bias b; activation function φ(·) applied to the local field v to produce the output y]

Page 6:

Common Activation Functions

• Step function: g(x) = 1 if x >= t (t is a threshold); g(x) = 0 if x < t
• Sign function: g(x) = 1 if x >= t (t is a threshold); g(x) = -1 if x < t
• Sigmoid function: g(x) = 1/(1+exp(-x)) (a sketch of all three follows below)
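The three activation functions above can be written directly; this short sketch assumes NumPy and a default threshold t = 0 for the step and sign functions.

```python
import numpy as np

def step(x, t=0.0):
    """Step function: 1 if x >= t, else 0 (t is the threshold)."""
    return np.where(x >= t, 1.0, 0.0)

def sign_fn(x, t=0.0):
    """Sign function: 1 if x >= t, else -1."""
    return np.where(x >= t, 1.0, -1.0)

def sigmoid(x):
    """Sigmoid (logistic) function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))
```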

Page 7:

Bias of a Neuron

• The bias b has the effect of applying an affine transformation to the combiner output u = Σ_{j=1..m} w_j x_j:
    v = u + b
• v is the induced field of the neuron

[Figure: v plotted against u for different values of the bias b]

Page 8:

Bias as extra input

• The bias is an external parameter of the neuron; it can be modeled by adding an extra input (see the sketch below)
• Set x0 = +1 and w0 = b, so that v = Σ_{j=0..m} w_j x_j

[Figure: the neuron model redrawn with an extra input x0 = +1 whose weight w0 plays the role of the bias b; the summing function feeds the activation function φ(·), whose output is y]
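A short sketch of the trick described on this slide: the bias becomes the weight w0 on a constant extra input x0 = +1. The helper name and the use of NumPy are illustrative.

```python
import numpy as np

def local_field_with_bias(x, w, b):
    """Treat the bias as an extra weight w0 = b on a constant input x0 = +1,
    so that v = sum_{j=0..m} w_j x_j."""
    x_aug = np.concatenate(([1.0], x))   # prepend x0 = +1
    w_aug = np.concatenate(([b], w))     # prepend w0 = b
    return np.dot(w_aug, x_aug)          # identical to w.x + b
```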

Page 9:

Face Recognition

• 90% accuracy at learning head pose and at recognizing 1 of 20 faces

Page 10:

Handwritten digit recognition

Page 11:

Computing with spaces

[Figure: a unit mapping perceptual features x1, x2 to an output y (+1 = cat, -1 = dog), with the dog and cat regions separated by a line in the (x1, x2) space]

• Output: y = g(Wx)
• Error: E = (y - g(Wx))^2

Page 12:

Can Implement Boolean Functions

• A unit can implement And, Or, and Not
• Need a mapping of True and False to numbers:
  – e.g. True = 1.0, False = 0.0
• (Exercise) Use a step function and show how to implement various simple Boolean functions (a sketch follows below)
• Combining the units, we can get any Boolean function of n variables – logical circuits can be obtained as a special case
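For the exercise above, here is one possible solution sketch using a step-activation threshold unit; the particular weights and thresholds are just one choice that works.

```python
def threshold_unit(inputs, weights, t):
    """Linear threshold unit with a step activation: outputs 1.0 iff w.x >= t."""
    return 1.0 if sum(w * x for w, x in zip(weights, inputs)) >= t else 0.0

# AND: both inputs must be 1, so weights (1, 1) and threshold 2
AND = lambda x1, x2: threshold_unit([x1, x2], [1, 1], t=2)
# OR: at least one input must be 1, so threshold 1
OR  = lambda x1, x2: threshold_unit([x1, x2], [1, 1], t=1)
# NOT: a negative weight flips the input, threshold 0
NOT = lambda x1:     threshold_unit([x1],     [-1],   t=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), NOT(a))
```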

Page 13:

Network Structures

• Feedforward (no cycles): less power, easier to understand
  – Input units
  – Hidden layers
  – Output units
• Perceptron: no hidden layer, so it basically corresponds to one unit, and basically computes a linear threshold function (LTF)
• LTF: defined by weights w and a threshold t; the value is 1 iff w·x >= t, otherwise 0

Page 14:

Single Layer Feed-forward

[Figure: an input layer of source nodes fully connected to an output layer of neurons]

Page 15:

Multi layer feed-forward

[Figure: a 3-4-2 network — input layer (3 nodes), hidden layer (4 neurons), output layer (2 neurons)]

Page 16:

Network Structures

• Recurrent (cycles exist): more powerful, as they can implement state, but harder to analyze. Examples:
  – Hopfield network: symmetric connections, interesting properties, useful for implementing associative memory
  – Boltzmann machines: more general, with applications in constraint satisfaction and combinatorial optimization

Page 17:

Simple recurrent networks (Elman, 1990)

[Figure: an Elman network — input layer (x1, x2), hidden layer, output layer (z1, z2); context units hold a copy of the previous step, so the current input x(i) is processed together with x(i-1)]

Page 18:

Perceptron Capabilities

• Quite expressive: many, but not all, Boolean functions can be expressed. Examples:
  – conjunctions and disjunctions, e.g. x1 ∨ x2 is true iff x1 + x2 >= 1
  – more generally, can represent functions that are true if and only if at least k of the inputs are true: x1 + x2 + ... + xn >= k
  – can't represent XOR

Page 19:

Representable Functions

• Perceptrons have a monotonicity property: if a link has positive weight, activation can only increase as the corresponding input value increases (irrespective of the other input values)
• Can't represent functions where input interactions can cancel one another's effect (e.g. XOR)

Page 20:

Representable Functions

• Can represent only linearly separable functions
• Geometrically: only if there is a line (plane) separating the positives from the negatives
• The good news: such functions are PAC learnable and learning algorithms exist

Page 21:

Linearly Separable

[Figure: points labeled + and - in the plane, with a straight line separating the two classes]

Page 22:

NOT linearly Separable

[Figure: + and - points interleaved in the plane so that no single straight line separates the two classes]

Page 23:

Problems with simple networks

• Some kinds of data are not linearly separable

[Figure: decision regions in the (x1, x2) plane for AND, OR, and XOR — AND and OR are linearly separable, XOR is not]

Page 24:

A solution: multiple layers

[Figure: a network with input layer (x1, x2), hidden layer (z1, z2), and output layer (y)]
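As an illustration of why the extra layer helps, here is a hand-wired two-layer threshold network that computes XOR; the particular weights and thresholds are one choice among many and are not taken from the slides.

```python
def step(v, t):
    """Step activation: 1 if v >= t, else 0."""
    return 1.0 if v >= t else 0.0

def xor_net(x1, x2):
    """Hand-wired two-layer network for XOR:
    z1 = OR(x1, x2), z2 = AND(x1, x2), y = 1 iff z1 and not z2."""
    z1 = step(x1 + x2, t=1)    # hidden unit 1: OR
    z2 = step(x1 + x2, t=2)    # hidden unit 2: AND
    return step(z1 - z2, t=1)  # output: z1 AND NOT z2  ==  XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```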

Page 25:

The Perceptron Learning Algorithm

• An example of current-best-hypothesis (CBH) search (so it is incremental, etc.):
  – Begin with a hypothesis (a perceptron)
  – Repeat over all examples several times
    • Adjust weights as examples are seen
  – Until all examples are correctly classified or a stopping criterion is reached

Page 26:

Method for Adjusting Weights

• One weight update possibility:
  – If the classification is correct, don't change the weights
  – Otherwise:
    • If false negative, add the input:      w_j <- w_j + x_j
    • If false positive, subtract the input: w_j <- w_j - x_j
• Intuition: for instance, if the example is positive, strengthen/increase the weights corresponding to the positive attributes of the example

Page 27:

Properties of the Algorithm

• In general, also apply a learning rate α: w_j <- w_j + α x_j
• The adjustment is in the direction of minimizing error on the example
• If the learning rate is appropriate and the examples are linearly separable, then after a finite number of iterations the algorithm converges to a linear separator (a sketch of the full algorithm follows below)
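Putting the update rule and the learning rate together, a minimal sketch of the perceptron learning algorithm. The 0/1 labels, the bias folded in as w0 on a constant input, the function name, and the toy OR data set are illustrative assumptions.

```python
import numpy as np

def train_perceptron(X, y, alpha=0.1, epochs=100):
    """Perceptron learning: if a positive example is misclassified, add alpha*x
    to the weights; if a negative one is misclassified, subtract alpha*x.
    X carries a constant first column of 1s so the bias is learned as w0.
    y contains 0/1 labels."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for x_e, target in zip(X, y):
            pred = 1.0 if np.dot(w, x_e) >= 0 else 0.0
            if pred != target:
                w += alpha * x_e if target == 1 else -alpha * x_e
                errors += 1
        if errors == 0:   # all examples correctly classified
            break
    return w

# Illustrative usage on a linearly separable toy set (the OR function)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)  # first column is x0 = 1
y = np.array([0, 1, 1, 1], dtype=float)
print(train_perceptron(X, y))
```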

Page 28:

Another Algorithm (the least-sum-squares algorithm)

• Define and minimize an error function
• S is the set of examples, f(·) is the ideal function, h(·) is the linear function corresponding to the current perceptron
• Error of the perceptron (over all examples): E(h) = (1/2) Σ_{e∈S} (f(e) - h(e))^2
• Note: h(e) = w·x(e) = Σ_i w_i x_i(e)

Page 29:

The Delta Rule

• A unit with activation g maps perceptual features x1, x2 to an output y (+1 = cat, -1 = dog)
• Error: E = (y - g(Wx))^2
• Gradient: ∂E/∂w_ij = -2 (y - g(Wx)) g'(Wx) x_j
• Update (move against the gradient): Δw_ij ∝ (y - g(Wx)) g'(Wx) x_j
  – (y - g(Wx)) is the output error; x_j is the influence of input j
• Holds for any function g with derivative g' (a one-step code sketch follows below)
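A one-step sketch of the delta rule for a single sigmoid unit; eta, the function names, and the choice of the sigmoid as g (with g' = g(1-g)) are illustrative assumptions.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def delta_rule_update(W, x, y, eta=0.1):
    """One delta-rule step for a single unit with activation g = sigmoid:
    W <- W + eta * (y - g(W.x)) * g'(W.x) * x."""
    out = sigmoid(np.dot(W, x))
    g_prime = out * (1.0 - out)   # derivative of the sigmoid at W.x
    return W + eta * (y - out) * g_prime * x
```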

Page 30:

Derivative of Error

• Gradient (derivative) of E: ∇E(w) = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]
• Take the steepest descent direction: w_i <- w_i + Δw_i, where Δw_i = -α ∂E/∂w_i
• ∂E/∂w_i is the gradient along w_i; α is the learning rate

Page 31:

Gradient Descent

• The algorithm: pick an initial random perceptron and repeatedly compute the error and modify the perceptron (take a step along the reverse of the gradient)

[Figure: error surface E over the weights, with the gradient direction pointing uphill and the descent direction pointing downhill]

Page 32:

General-purpose learning mechanisms

[Figure: the error E plotted against a weight w_ij; the sign of ∂E/∂w_ij tells which way to move the weight, and ∂E/∂w_ij = 0 at a minimum]

• Δw_ij = -η ∂E/∂w_ij   (η is the learning rate)

Page 33:

Gradient Calculation

∂E/∂w_i = ∂/∂w_i [ (1/2) Σ_{e∈S} (f(e) - h(e))^2 ]
        = (1/2) Σ_{e∈S} ∂/∂w_i (f(e) - h(e))^2
        = Σ_{e∈S} (f(e) - h(e)) · ∂/∂w_i (f(e) - h(e))
        = Σ_{e∈S} (f(e) - h(e)) · (∂f(e)/∂w_i - ∂h(e)/∂w_i)

Page 34:

Derivation (cont.)

∂E/∂w_i = Σ_{e∈S} (f(e) - h(e)) · (0 - ∂(w·x(e))/∂w_i)   since f(e) is a constant, so ∂f(e)/∂w_i = 0, and h(e) = w·x(e)
        = -Σ_{e∈S} (f(e) - h(e)) · x_i(e)                 as ∂(w·x(e))/∂w_i = x_i(e)
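The derived gradient, ∂E/∂w_i = -Σ_{e∈S} (f(e) - h(e)) x_i(e), translates directly into batch gradient descent for a linear unit. This sketch assumes NumPy; the function name, alpha, the iteration count, and the toy linear target are illustrative.

```python
import numpy as np

def lms_gradient_descent(X, f, alpha=0.01, iters=1000):
    """Batch gradient descent on E = 1/2 * sum_e (f(e) - w.x(e))^2,
    using the gradient just derived: dE/dw_i = -sum_e (f(e) - h(e)) * x_i(e)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        h = X @ w              # current predictions h(e) = w.x(e)
        grad = -(f - h) @ X    # dE/dw, one component per weight
        w -= alpha * grad      # step along the negative gradient
    return w

# Illustrative usage: recover a noiseless linear target f(e) = 2*x1 - 3*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
f = X @ np.array([2.0, -3.0])
print(lms_gradient_descent(X, f))
```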

Page 35:

Properties of the algorithm

• The error function has no local minima (it is quadratic)
• The algorithm is a gradient descent method to the global minimum, and will asymptotically converge
• Even if the data are not linearly separable, it can find a good (minimum-error) linear classifier
• Incremental?

Page 36:

Multilayer Feed-Forward Networks

• Multiple perceptrons, layered
• Example: a two-layer network with 3 inputs, one output, and one hidden layer (two hidden units)

[Figure: inputs x1, x2, x3 feeding a hidden layer of two units, which feeds a single output unit]

Page 37:

Power/Expressiveness

• Can represent interactions among inputs (unlike perceptrons)
• Two-layer networks can represent any Boolean function, and continuous functions (within a tolerance), as long as the number of hidden units is sufficient and appropriate activation functions are used
• Learning algorithms exist, but with weaker guarantees than the perceptron learning algorithms

Page 38:

Back-Propagation

• Similar to the perceptron learning algorithm and gradient descent for perceptrons
• Problem to overcome: how to adjust the internal links (how to distribute the "blame", i.e. the error)
• Assumption: internal units use differentiable, nonlinear activation functions
  – sigmoid functions are convenient

Page 39:

Recurrent Network

• A recurrent network with hidden neuron(s): the unit-delay operator z^-1 implies a dynamic system

[Figure: input, hidden, and output units with feedback connections passing through unit delays z^-1]

Page 40:

Back-Propagation (cont.)

• Start with a network with random weights
• Repeat until a stopping criterion is met:
  – For each example, compute the network output and, for each unit i, its error term δ_i
  – Update each weight w_ij (weight of the link going from node i to node j):
      w_ij <- w_ij + Δw_ij,  where Δw_ij = α δ_j o(i)
    (o(i) is the output of unit i)

Page 41:

The Error Term

• δ_i = Err(i) · o'(i)
• Err(i) = f_i(e) - h_i(e),  if i is an output node
• Err(i) = Σ_j w_ij δ_j,     if i is an internal node
• o'(i) is the derivative of node i
• For the sigmoid, o'(i) = o(i) (1 - o(i))

Page 42:

Derivation

• Write the error for a single training example; as before, use the sum of squared errors (as it is convenient for differentiation, etc.):
    E = (1/2) Σ_{i ∈ output units} (f_i(e) - h_i(e))^2
• Differentiate with respect to each weight. For example, for the weight w_ij connecting node j to output node i we get:
    ∂E/∂w_ij = -o(j) o'(i) Err(i) = -o(j) δ_i
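Combining the error terms from the previous slide with the weight-update rule, here is a sketch of one back-propagation step for a small network with one sigmoid hidden layer and a single sigmoid output. The array shapes, the names, and the omission of bias terms are simplifying assumptions for illustration.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, target, W_hidden, W_out, alpha=0.1):
    """One back-propagation update for a two-layer sigmoid network
    (one hidden layer, one output unit), following the slides' rules:
    delta = Err * o'(i), with Err = (f - h) at the output node and
    Err = sum_j w_ij * delta_j at internal nodes."""
    # Forward pass
    h_out = sigmoid(W_hidden @ x)   # hidden-unit outputs o(i)
    y = sigmoid(W_out @ h_out)      # network output

    # Error terms (deltas)
    delta_out = (target - y) * y * (1.0 - y)                     # output node
    delta_hidden = (W_out * delta_out) * h_out * (1.0 - h_out)   # internal nodes

    # Weight updates: w_ij <- w_ij + alpha * delta_j * o(i)
    W_out = W_out + alpha * delta_out * h_out
    W_hidden = W_hidden + alpha * np.outer(delta_hidden, x)
    return W_hidden, W_out

# Illustrative usage: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.5, size=(3, 2))
W_out = rng.normal(scale=0.5, size=3)
W_hidden, W_out = backprop_step(np.array([1.0, 0.0]), 1.0, W_hidden, W_out)
```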

Page 43:

Properties

• Converges to a minimum, but it could be a local minimum
• Could be slow to converge (note: training even a three-node net is NP-complete!)
• Must watch for over-fitting, just as with decision trees (use validation sets, etc.)
• Network structure? Often two layers suffice; start with relatively few hidden units

Page 44:

Properties (cont.)

• Many variations on basic back-propagation exist, e.g. adding momentum:
    Δw_ij(n) = α δ_j o(i) + β Δw_ij(n-1),  0 <= β < 1
  where Δw_ij(n) is the nth update amount and β is a constant (a short sketch follows below)
• Reduce the learning rate α with time (applies to perceptrons as well)
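A tiny sketch of the momentum variant, assuming the current gradient term α δ_j o(i) has already been computed for the weight; the parameter values and the function name are illustrative.

```python
def momentum_update(delta_w_prev, grad_term, alpha=0.1, beta=0.9):
    """Back-propagation with momentum: the nth update mixes in a fraction of
    the previous one, Dw(n) = alpha * grad_term + beta * Dw(n-1), 0 <= beta < 1.
    grad_term stands for delta_j * o(i); alpha and beta are illustrative."""
    return alpha * grad_term + beta * delta_w_prev
```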

Page 45:

Networks, features, and spaces

• Artificial neural networks can represent any continuous function...
• Simple algorithms for learning from data
  – fuzzy boundaries
  – effects of typicality

Page 46:

NN properties

• Can handle domains with
  – continuous and discrete attributes
  – many attributes
  – noisy data
• Could be slow at training but fast at evaluation time
• Human understanding of what the network does could be limited

Page 48:

Networks, features, and spaces

• Artificial neural networks can represent any continuous function...
• Simple algorithms for learning from data
  – fuzzy boundaries
  – effects of typicality
• A way to explain how people could learn things that look like rules and symbols...
• Big question: how much of cognition can be explained by the input data?

Page 49:

Challenges for neural networks

• Being able to learn anything can make it harder to learn specific things
  – this is the "bias-variance tradeoff"