Lecture 1: Neural Networks
Prof. Ruchi Sharma, Bharati Vidyapeeth College of Engineering, New Delhi

Transcript
Page 1

Neural Networks

Prof. Ruchi Sharma
Bharati Vidyapeeth College of Engineering, New Delhi

Page 2

Plan

Requirements, Background
Brains and Computers
Computational Models
Learnability vs. Programming
Representability vs. Training Rules
Abstractions of Neurons
Abstractions of Networks
Completeness of 3-Level McCulloch-Pitts Neurons
Learnability in Perceptrons

Page 3

Brains and Computers

What is computation? The basic computational model (like C, Pascal, C++, etc.; see a course on Models of Computation):

Analysis of the problem and reduction to an algorithm.

Implementation of the algorithm by a programmer.

Page 4

Computation

The brain also computes, but its programming seems different:

Flexible, fault tolerant (neurons die every day),
automatic programming, learns from examples, generalizability.

Page 5

Brain vs. Computer

The brain works on slow components (10**-3 sec).

Computers use fast components (10**-9 sec).

The brain is more efficient (a few joules per operation, a factor of about 10**10).

It uses a massively parallel mechanism.

Can we copy its secrets?

Page 6

Brain vs. Computer

Areas where the brain is better:
Sense recognition and integration
Working with incomplete information
Generalizing
Learning from examples
Fault tolerance (regular programming is notoriously fragile)

Page 7

AI vs. NNs

AI relates to cognitive psychology: chess, theorem proving, expert systems, intelligent agents (e.g. on the Internet).

NNs relate to neurophysiology: recognition tasks, associative memory, learning.

Page 8

How can NNs work?

Look at the brain: 10**10 neurons (10 gigabytes) and about 10**20 possible connections, with different numbers of dendrites (real-valued).

Actually about 6 x 10**13 connections (i.e. 60,000 hard discs for one snapshot of the contents of one brain!).

Page 9

Brain

Complex
Non-linear (more later!)
Parallel processing
Fault tolerant
Adaptive
Learns
Generalizes
Self-programs

Page 10

Abstracting

Note: Size may not be crucial (the Aplysia or the crab does many things).

Look at simple structures first

Page 11

Real and Artificial Neurons

Page 12

One Neuron: the McCulloch-Pitts Neuron

The real neuron is very complicated, but abstracting away the details, we have:

[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn feeding an integrate-and-threshold unit: the integrate-and-fire neuron.]
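
As a concrete illustration, here is a minimal sketch (Python; the function name and the ">= threshold" firing convention are choices made for this example) of the abstraction above: a weighted sum followed by a threshold.

```python
def mp_neuron(weights, threshold, inputs):
    """McCulloch-Pitts unit: fire (output 1) iff the weighted sum of the
    inputs reaches the threshold; otherwise output 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# With weights (1.0, 1.0) and threshold 1.5 this computes AND (see the slides below):
print(mp_neuron([1.0, 1.0], 1.5, [1, 1]))   # 1
print(mp_neuron([1.0, 1.0], 1.5, [1, 0]))   # 0
```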

Page 13

Representability

What functions can be represented by a network of McCulloch-Pitts neurons?

Theorem: Every logic function of an arbitrary number of variables can be represented by a three-level network of such neurons.

Page 14

Proof

Show the simple functions: AND, OR, NOT, IMPLIES.

Recall the representability of logic functions in DNF (disjunctive normal form).

Page 15

AND, OR, NOT

[Figure: a threshold unit with two inputs x1, x2, weights 1.0 and 1.0, and threshold 1.5; this computes AND.]

Page 16

AND, OR, NOT

[Figure: a threshold unit with two inputs x1, x2, weights 1.0 and 1.0, and threshold 0.9; this computes OR.]

Page 17

Page 18

AND, OR, NOT

[Figure: a threshold unit with a single input x1, weight -1.0, and threshold -0.5; this computes NOT.]
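
The weights and thresholds on these three slides can be checked mechanically; a small sketch, assuming the same ">= threshold" firing convention as before:

```python
from itertools import product

def fires(weights, threshold, inputs):
    """Threshold unit: 1 iff the weighted input sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

for x1, x2 in product([0, 1], repeat=2):
    assert fires([1.0, 1.0], 1.5, [x1, x2]) == (x1 and x2)   # AND: threshold 1.5
    assert fires([1.0, 1.0], 0.9, [x1, x2]) == (x1 or x2)    # OR: threshold 0.9
for x1 in (0, 1):
    assert fires([-1.0], -0.5, [x1]) == 1 - x1               # NOT: weight -1.0, threshold -0.5
print("AND, OR, and NOT all check out")
```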

Page 19

DNF and All Functions

Theorem: Any logic (Boolean) function of any number of variables can be represented by a network of McCulloch-Pitts neurons. In fact, the depth of the network is three.

Proof: Use the DNF form together with the AND, OR, NOT representations above.
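
To make the construction concrete, here is a sketch (Python, with illustrative names) of the three-level network: one AND unit per true row of the truth table (a minterm over the literals), then a single OR unit. XOR is used as the test function.

```python
def fires(weights, threshold, inputs):
    """Threshold unit: 1 iff the weighted input sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def dnf_network(truth_table):
    """truth_table maps 0/1 input tuples to 0/1. Returns the function computed by
    the 3-level network: inputs -> one AND unit per true row -> a single OR unit."""
    minterms = [row for row, out in truth_table.items() if out == 1]

    def net(inputs):
        layer2 = []
        for row in minterms:
            # Weight +1 where the row has a 1, -1 where it has a 0; the threshold is
            # the number of 1s, so this unit fires on exactly that input row.
            weights = [1.0 if bit else -1.0 for bit in row]
            layer2.append(fires(weights, sum(row), inputs))
        # Final OR unit over the minterm outputs.
        return fires([1.0] * len(layer2), 0.9, layer2) if layer2 else 0

    return net

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
xor_net = dnf_network(xor_table)
assert all(xor_net(row) == out for row, out in xor_table.items())
print("3-level network reproduces XOR")
```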

Page 20

Other Questions?

What if we allow REAL numbers as inputs/outputs?

What real functions can be represented?

What if we replace the threshold with some other function, so the output is not in {0, 1}? What functions can be represented then?
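
One common answer, given here only as an illustration (the lecture has not introduced it yet), is to replace the hard threshold with a smooth squashing function such as the logistic sigmoid, so the output becomes a real number in (0, 1):

```python
import math

def sigmoid_neuron(weights, bias, inputs):
    """Like the threshold unit, but with a logistic sigmoid in place of the hard
    threshold: the output is a real number in (0, 1) rather than 0 or 1."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

print(sigmoid_neuron([1.0, 1.0], -1.5, [1, 1]))   # about 0.62, a "soft" AND
print(sigmoid_neuron([1.0, 1.0], -1.5, [1, 0]))   # about 0.38
```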

Page 21

Representability and Generalizability

Page 22

Learnability and Generalizability

The previous theorem tells us that neural networks are potentially powerful, but it doesn't tell us how to use them.

We desire simple networks with uniform training rules.

Page 23

One Neuron (Perceptron)

What can be represented by one neuron?

Is there an automatic way to learn a function by examples?

Page 24

Perceptron Training Rule

Loop: Take an example and apply it to the network. If the answer is correct, return to Loop. If incorrect, go to Fix.

Fix: Adjust the network weights by the input example. Go to Loop.

Page 25

Example of Perceptron Learning

x1 = 1 (+), x2 = -.5 (-), x3 = 3 (+), x4 = -2 (-)

Expanded vectors (a constant 1 appended to each):
y1 = (1, 1) (+), y2 = (-.5, 1) (-)
y3 = (3, 1) (+), y4 = (-2, 1) (-)

Random initial weight w1 = (-2.5, 1.75)

Page 26

Graph of Learning

Page 27

Trace of the Perceptron

w1 · y1 = (-2.5, 1.75) · (1, 1) < 0: wrong, so w2 = w1 + y1 = (-1.5, 2.75)
w2 · y2 = (-1.5, 2.75) · (-.5, 1) > 0: wrong, so w3 = w2 - y2 = (-1, 1.75)
w3 · y3 = (-1, 1.75) · (3, 1) < 0: wrong, so w4 = w3 + y3 = (2, 2.75)
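
The three updates are easy to check numerically; a small sketch (Python, using the example data from the previous slides):

```python
# Expanded examples from the previous slide: (vector, sign of the class).
examples = [((1, 1), +1), ((-0.5, 1), -1), ((3, 1), +1), ((-2, 1), -1)]
w = (-2.5, 1.75)                      # random initial weight from the slide

for y, sign in examples[:3]:          # the trace covers the first three presentations
    score = w[0] * y[0] + w[1] * y[1]
    wrong = (score <= 0) if sign > 0 else (score > 0)
    if wrong:                         # add positive examples, subtract negative ones
        w = (w[0] + sign * y[0], w[1] + sign * y[1])
    print(y, "wrong" if wrong else "ok", "w =", w)
# prints the weight sequence (-1.5, 2.75), (-1.0, 1.75), (2.0, 2.75)
```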

Page 28

Perceptron Convergence Theorem

If the concept is representable by a perceptron, then the perceptron learning rule will converge in a finite amount of time.

(To be made precise and proved.)

Page 29

What is a Neural Network?

What is an abstract Neuron?

What is a neural network? How do they compute? What are the advantages? Where can they be used?

Agenda: what to expect.

Page 30

Perceptron Algorithm

Start: Choose an arbitrary value for the weights W.
Test: Choose an arbitrary example X. If X is positive and WX > 0, or X is negative and WX <= 0, go to Test; otherwise go to Fix.
Fix: If X is positive, W := W + X; if X is negative, W := W - X. Go to Test.
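
A minimal sketch of this loop in Python (the stopping check, the random choice of a wrong example, and the names are additions for illustration; the slide leaves the choice of example arbitrary):

```python
import random

def perceptron(examples, max_rounds=1000):
    """examples: list of (vector, label) with label +1 or -1. Returns W with
    W·X > 0 on the positive examples and W·X <= 0 on the negative ones."""
    W = [0.0] * len(examples[0][0])        # Start: arbitrary initial weights
    for _ in range(max_rounds):
        # Test: collect the examples the current W still gets wrong.
        wrong = [(x, s) for x, s in examples
                 if (s > 0) != (sum(w * xi for w, xi in zip(W, x)) > 0)]
        if not wrong:
            return W
        x, s = random.choice(wrong)        # Fix: adjust W by one wrong example
        W = [w + s * xi for w, xi in zip(W, x)]
    return None                            # no separating W found within the cap

# The AND examples used later in the lecture (last component = constant bias input):
data = [((0, 0, 1), -1), ((0, 1, 1), -1), ((1, 0, 1), -1), ((1, 1, 1), +1)]
print(perceptron(data))
```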

Page 31

Perceptron Conv. Thm.

Let F be a set of unit-length vectors. If there is a vector V* and a value δ > 0 such that V*·X > δ for all X in F, then the perceptron program goes to FIX only a finite number of times.

Page 32

Applying Algorithm to “And”

W0 = (0, 0, 1) or random.
X1 = (0, 0, 1), result 0
X2 = (0, 1, 1), result 0
X3 = (1, 0, 1), result 0
X4 = (1, 1, 1), result 1

Page 33

“And” continued

W0 · X1 > 0: wrong; W1 = W0 - X1 = (0, 0, 0)
W1 · X2 = 0: OK (boundary)
W1 · X3 = 0: OK
W1 · X4 = 0: wrong; W2 = W1 + X4 = (1, 1, 1)
W3 · X1 = 1: wrong; W4 = W3 - X1 = (1, 1, 0)
W4 · X2 = 1: wrong; W5 = W4 - X2 = (1, 0, -1)
W5 · X3 = 0: OK
W5 · X4 = 0: wrong; W6 = W5 + X4 = (2, 1, 0)
W6 · X1 = 0: OK
W6 · X2 = 1: wrong; W7 = W6 - X2 = (2, 0, -1)

Page 34

“And”, page 3

W8 · X3 = 1: wrong; W9 = W8 - X3 = (1, 0, 0)
W9 · X4 = 1: OK
W9 · X1 = 0: OK
W9 · X2 = 0: OK
W9 · X3 = 1: wrong; W10 = W9 - X3 = (0, 0, -1)
W10 · X4 = -1: wrong; W11 = W10 + X4 = (1, 1, 0)
W11 · X1 = 0: OK
W11 · X2 = 1: wrong; W12 = W11 - X2 = (1, 0, -1)
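
Checking the long trace by hand is tedious; a short sketch runs the same rule on the AND examples with the cyclic presentation the slides use (the exact weights visited depend on the presentation order and the starting W):

```python
# Perceptron rule on the AND examples, presented cyclically X1, X2, X3, X4, ...
# The last component of each vector is the constant bias input.
examples = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 1), 0), ((1, 1, 1), 1)]
W = [0, 0, 1]                                  # W0 from the earlier slide

changed = True
while changed:                                 # stop after a full pass with no Fix
    changed = False
    for x, target in examples:
        score = sum(w * xi for w, xi in zip(W, x))
        if target == 1 and score <= 0:         # positive example misclassified
            W = [w + xi for w, xi in zip(W, x)]
            changed = True
        elif target == 0 and score > 0:        # negative example misclassified
            W = [w - xi for w, xi in zip(W, x)]
            changed = True

print("converged to W =", W)                   # W·X > 0 only for the (1, 1) example
```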

Page 35

Proof of the Convergence Theorem

Notes:
1. By hypothesis, there is a δ > 0 such that V*·X > δ for all X in F.
2. We can eliminate the threshold (add an additional dimension to the input): W·(x, y, z) > threshold if and only if W'·(x, y, z, 1) > 0, where W' carries -threshold as the extra weight.
3. We can assume all examples are positive ones (replace each negative example by its negated vector): W·(x, y, z) < 0 if and only if W·(-x, -y, -z) > 0.

Page 36

Proof (cont.): Consider the quotient V*·W / |W|. (Note: this is the multidimensional cosine of the angle between V* and W.) Recall that V* is a unit vector.

The quotient is therefore <= 1.

Page 37

Proof (cont.): Each time FIX is visited, W changes by an addition:
V*·W(n+1) = V*·(W(n) + X) = V*·W(n) + V*·X >= V*·W(n) + δ.
Hence V*·W(n) >= n·δ.   (*)

Page 38

Proof (cont.): Now consider the denominator:
|W(n+1)|**2 = W(n+1)·W(n+1) = (W(n) + X)·(W(n) + X) = |W(n)|**2 + 2·W(n)·X + 1
(recall |X| = 1, and W(n)·X <= 0 since X is a positive example and we are in FIX)
<= |W(n)|**2 + 1.
So after n fixes (taking W(0) = 0), |W(n)|**2 <= n.   (**)

Page 39

Proof (cont.): Putting (*) and (**) together:
Quotient = V*·W(n) / |W(n)| >= n·δ / sqrt(n) = sqrt(n)·δ.
Since the quotient is <= 1, this means n <= 1/δ**2, so we enter FIX only a bounded number of times. Q.E.D.
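
The whole argument in one line (LaTeX, with δ the margin from the hypothesis):

```latex
\[
\underbrace{V^{*}\!\cdot W(n)\;\ge\;n\delta}_{(*)}
\quad\text{and}\quad
\underbrace{\lVert W(n)\rVert^{2}\;\le\;n}_{(**)}
\;\Longrightarrow\;
1\;\ge\;\frac{V^{*}\!\cdot W(n)}{\lVert W(n)\rVert}\;\ge\;\frac{n\delta}{\sqrt{n}}\;=\;\sqrt{n}\,\delta
\;\Longrightarrow\;
n\;\le\;\frac{1}{\delta^{2}}.
\]
```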

Page 40

Geometric Proof

See hand slides.

Page 41

Perceptron Diagram 1

[Diagram: weight space, with a region of solution vectors marked "Solution OK" and the surrounding regions marked "No examples".]

Page 42

[Diagram: as on the previous slide, a "Solution OK" region surrounded by regions marked "No examples".]

Page 43

Page 44

Additional Facts

Note: If the X's are presented in a systematic way, then a solution W is always found.
Note: It is not necessarily the same as V*.
Note: If F is not finite, we may not obtain a solution in finite time.
The algorithm can be modified in minor ways and stays valid (e.g. bounded rather than unit-length examples, or changes in the update of W(n)).

Page 45

Perceptron Convergence Theorem

If the concept is representable by a perceptron, then the perceptron learning rule will converge in a finite amount of time.

(To be made precise and proved.)

Page 46

Important Points

The theorem only guarantees a result IF the concept is representable!

Usually we are not interested in just representing something but in its generalizability: how will it work on examples we haven't seen?

Page 47

Percentage of Boolean Functions Representable by a Perceptron

Inputs | Representable by a perceptron | All Boolean functions
1      | 4                             | 4
2      | 14                            | 16
3      | 104                           | 256
4      | 1,882                         | 65,536
5      | 94,572                        | ~10**9
6      | 15,028,134                    | ~10**19
7      | 8,378,070,864                 | ~10**38
8      | 17,561,539,552,946            | ~10**77
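
A quick calculation (a sketch, using the representable counts from the table and 2**(2**n) total Boolean functions of n inputs) shows how fast the representable fraction shrinks:

```python
# Threshold-function counts from the table above vs. all 2**(2**n) Boolean functions.
representable = {1: 4, 2: 14, 3: 104, 4: 1882, 5: 94572,
                 6: 15028134, 7: 8378070864, 8: 17561539552946}
for n, r in representable.items():
    total = 2 ** (2 ** n)
    print(f"n={n}: {r} of {total:.3g} functions, fraction {r / total:.3g}")
```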

Page 48

Generalizability

Typically we train a network on a sample set of examples and then use it on the general class.

Training can be slow, but execution is fast.

Page 49

Perceptron: Pattern Identification

(Note: the neuron is trained.)

[Figure: a perceptron with trained weights over a receptive field; the weighted sum Σ wi·xi reaches the threshold exactly when the letter A is in the receptive field.]

Page 50

What won't work?

Example: Connectedness with a bounded-diameter perceptron.

Compare with convexity (which can be handled using sensors of order three).

Page 51

Biases and Thresholds

We can replace the threshold with a bias. A bias acts exactly as a weight on a connection from a unit whose activation is always 1:

sum_{i=1..n} wi·xi >= θ   ⟺   sum_{i=1..n} wi·xi - θ >= 0   ⟺   sum_{i=0..n} wi·xi >= 0,   where x0 = 1 and w0 = -θ.

Page 52

Perceptron

Loop: Take an example and apply it to the network. If the answer is correct, return to Loop. If incorrect, go to Fix.

Fix: Adjust the network weights by the input example. Go to Loop.

Page 53

Perceptron Algorithm

Let v be arbitrary.
Choose: choose x in F.
Test:
If x is in F+ and v·x > 0, go to Choose.
If x is in F+ and v·x <= 0, go to Fix plus.
If x is in F- and v·x <= 0, go to Choose.
If x is in F- and v·x > 0, go to Fix minus.
Fix plus: v := v + x; go to Choose.
Fix minus: v := v - x; go to Choose.

Page 54

Perceptron Algorithm: conditions for the algorithm to converge

Condition 1: There is a vector v* and a δ > 0 such that v*·x > δ for every x in F.
Condition 2: We choose F to be a set of unit vectors.

Page 55

Geometric viewpoint

[Figure: the vectors v*, x, w_n, and w_{n+1} = w_n + x; each Fix update moves the weight vector toward the solution direction v*.]

Page 56

Perceptron Algorithm

Based on these conditions, the number of times we enter the Loop is finite.

Proof: We are given a solution direction A* that separates the two classes: A*·x > 0 for the positive examples and A*·y < 0 for the negative ones.

[Figure: the world of examples, split into positive examples and negative examples.]

Page 57

Perceptron Algorithm: Proof

We replace the threshold with a bias (a constant input 1 appended to every vector in F). We assume F is a set of unit vectors.

Page 58

Perceptron Algorithm: Proof

We reduce what we have to prove by eliminating all the negative examples and placing their negations among the positive examples: if A*·y < 0 for a negative example y, then A*·(-y) > 0, so we may replace F by F* = F+ ∪ { -y : y in F- } and require A·x > 0 for every x in F*.

Page 59

Perceptron Algorithm: Proof

The numerator: each time we enter Fix, A changes to A + x, so
A*·A(n) = A*·(A(n-1) + x) = A*·A(n-1) + A*·x >= A*·A(n-1) + δ.
After n changes, A*·A(n) >= n·δ.

Page 60

Perceptron Algorithm: Proof

The denominator: when we enter Fix, A(t)·x <= 0 and |x| = 1, so
|A(t+1)|**2 = |A(t) + x|**2 = |A(t)|**2 + 2·A(t)·x + |x|**2 <= |A(t)|**2 + 1.
After n changes, |A(n)|**2 <= n.

Page 61

Perceptron Algorithm: Proof

From the numerator: A*·A(n) >= n·δ.
From the denominator: |A(n)| <= sqrt(n).
Since 1 >= A*·A(n) / |A(n)| >= n·δ / sqrt(n) = sqrt(n)·δ, we get n <= 1/δ**2, so n is finite.

Page 62

Example - AND

x1  x2  bias | AND
0   0   1    | 0
0   1   1    | 0
1   0   1    | 0
1   1   1    | 1

[Table: trace of the perceptron algorithm on these four examples, starting from w = (0, 0, 0); each step shows the current w, the example presented, whether it was classified correctly, and, when wrong, the FIX update w := w + x (positive example) or w := w - x (negative example), "etc." until no example is wrong.]

Page 63

AND: Bipolar Solution

[Table: the same procedure run on the AND examples coded in bipolar form (-1/+1 instead of 0/1); after a few FIX corrections on wrongly classified examples the run continues and ends in success.]

Page 64

Problem

[Proof sketch: write the unit's weighted sum on each of the three patterns as LHS_i + MHS_i + RHS_i, the contributions of the left, middle, and right parts of the receptive field. The middle contributions are equal (MHS_1 = MHS_2 = MHS_3), the left parts of patterns 1 and 2 look identical (LHS_1 = LHS_2), and the right parts of patterns 2 and 3 look identical (RHS_2 = RHS_3). The classification requirements then force both LHS_2 + RHS_2 >= 0 and LHS_2 + RHS_2 < 0: contradiction!]

Page 65

Linear Separation

Every perceptron determines a classification of vector inputs, given by a hyperplane.

Two-dimensional examples (add algebra):

[Figure: the 2-D input points for OR, AND, and XOR; OR and AND are linearly separable, XOR is not possible.]

Page 66

Linear Separation in Higher Dimensions

In higher dimensions it is still linear separation, but it is harder to tell which concepts are separable.

Example: Connectedness vs. convexity: which can be handled by a perceptron with local sensors, and which cannot?

Note: Define local sensors.

Page 67

What won't work?

Try XOR.
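
A small sketch (the iteration cap is arbitrary) makes the point empirically: the perceptron rule never settles on XOR, because no weight vector classifies all four points correctly.

```python
# Perceptron rule applied to XOR (with a bias input): it never converges,
# because XOR is not linearly separable.
examples = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 0)]
W = [0.0, 0.0, 0.0]

for step in range(10_000):                       # arbitrary cap
    wrong = [(x, t) for x, t in examples
             if (t == 1) != (sum(w * xi for w, xi in zip(W, x)) > 0)]
    if not wrong:
        print("converged at step", step)
        break
    x, t = wrong[0]                              # Fix using the first wrong example
    W = [w + (xi if t == 1 else -xi) for w, xi in zip(W, x)]
else:
    print("still misclassifying after 10,000 steps, W =", W)
```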

Page 68

Limitations of Perceptron

Representability: only concepts that are linearly separable.

Compare: convex versus connected. Examples: XOR vs. OR.