Page 1:


Hebbian Model Learning

Janusz A. Starzyk

http://grey.colorado.edu/CompCogNeuro/index.php/CECN_CU_Boulder_OReilly
http://grey.colorado.edu/CompCogNeuro/index.php/Main_Page

Based on courses taught by Prof. Randall O'Reilly, University of Colorado, Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika, and http://wikipedia.org/

Cognitive Neuroscience and Embodied Intelligence

Page 2:


So far

Elements: neurons, ions, channels, membranes, conductivity, impulse generation...

Neural networks: signal transformation, filtering specific information, amplification, contrast, network stability, winner takes most (WTM), noise, network attractors...

Many specific mechanisms, e.g. mechano-electrical transduction of sensory signals: hair cells in the ear open ion channels with the help of proteins functioning like springs attached to the ion channels, converting mechanical vibrations into electrical impulses.

How do network configurations that do interesting things come about? Learning is necessary!

Page 3:


Learning: types

1. How should an ideal learning system look?

2. How does a human being learn?

Detectors (neurons) can change local parameters but we want to achieve a change in the functioning of the entire information processing network.

We will consider two types of learning, requiring different mechanisms:

1. Learning an internal model of the environment (spontaneous).
2. Learning a task set for the network (supervised).
3. A combination of both.

Page 4:


Model learning

Internal representations of patterns appearing in incoming signals in the environment of a given neural group. Discovering correlations between signals.

(figure: positive correlation)

Elements of images, movements, animal behavior or emotions: we can correlate everything, creating a behavioral model.

Only strong correlations are relevant; there are too many weak ones, and they can be coincidental.

Example: hebb_correl.proj, in Chapter 4

Page 5:


Simulation

Select: hebb_correl.proj, in Chapter 4

Click on r.wt in the network window; after clicking on the hidden neuron we see the weights of the entire network initialized to 0.5.

Click on act in the network window, then on run in the control window.

As a result we get binary weights =>

lrate = ε (the learning rate); p_right = probability of the first event

Defaults: change p_right from 1 to 0.7.

Page 6:


Biological foundations: LTP, LTD

Long-Term Potentiation (LTP) was discovered in 1966, first in the hippocampus, then in the cortex. Stimulating a neuron with a current of ~100 Hz for 1 second increases synaptic efficacy by 50-100%; it is a long-term effect.

Opposite effect: LTD, Long-Term Depression.

The most common form of LTP/LTD is related to NMDA receptors. Activity of NMDA channels requires presynaptic as well as postsynaptic activity, and so complies with the rule introduced by Donald Hebb in 1949, tersely summarized as:

Neurons that fire together wire together.

Neurons showing simultaneous activity strengthen their bonds.

Page 7:


NMDA receptors

1. Mg++ ions block NMDA channels. An increase in the postsynaptic potential is necessary to remove them and enable interactions with glutamate.

2. Presynaptic activity is necessary to release the glutamate, which opens NMDA channels.

3. Ca++ ions enter through these channels, triggering a series of chemical reactions which are not yet fully understood.

The effect is nonlinear: small amounts of Ca++ give LTD and large amounts give LTP. Many other processes play a role in LTP.

More detailed information on LTP/LTD.

Page 8:


Hebbian Correlation

From a theoretical point of view the details of the biological LTP mechanism are not very relevant; we examine only the simplest versions.

Simple Hebb's rule: Δwij = ε ai aj

Change in weights is proportional to pre- and post-synaptic activity.

Weights increase for neurons with strongly correlated activity and don't change for neurons whose activity shows no correlation.

yj = Σi xi wij
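
A minimal numpy sketch of this rule (illustrative variable names, not the simulator's code): one linear unit with weights starting at 0.5, as in hebb_correl.proj.

```python
# Simple Hebb rule: dw_i = eps * x_i * y, for a linear unit y = sum_k x_k w_k.
import numpy as np

eps = 0.1                       # learning rate (lrate)
w = np.full(3, 0.5)             # weights initialized to 0.5
x = np.array([1.0, 1.0, 0.0])   # two co-active inputs, one silent

y = x @ w                       # postsynaptic activity: y = 1.0
w += eps * x * y                # only the co-active inputs strengthen
print(w)                        # -> [0.6 0.6 0.5]
```

Repeating the update makes the first two weights grow without bound, which motivates the normalization on the next slide.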

Page 9:


Hebb - normalization

Simple Hebb's rule: Δwij = ε xi yj

leads to an infinite increase in weights.

This can be avoided in many ways;

often employed is a normalization of weights:

Δwij = ε (xi − wij) yj

This has a biological justification:

when x and y are large we have a strong LTP, much Ca++

when y is large but x is small we have LTD, some Ca++; when y is small nothing happens, because Mg++ ions block the NMDA channels.
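
A sketch of the same unit under the normalized rule (a toy: the activity is clipped at 1 for stability, which the slide does not specify):

```python
# Normalized Hebb rule: dw_i = eps * (x_i - w_i) * y keeps weights in [0, 1].
import numpy as np

eps = 0.1
w = np.full(3, 0.5)
x = np.array([1.0, 1.0, 0.0])
for _ in range(200):
    y = min(x @ w, 1.0)         # clipped linear activity
    w += eps * (x - w) * y      # w is pulled toward x, gated by y
print(w.round(3))               # -> approx [1. 1. 0.]: bounded, not divergent
```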

Page 10:


Model learning

Hebb's mechanism allows for learning correlations.

What happens if we add more postsynaptic neurons?

They will learn the same correlations!

If we use kWTA then output units will compete with each other.

Learning = survival of the fittest (Darwin's mechanism) + specialization.

Learning based on self-organization

Inhibition of kWTA: only the strongest units remain active. Hebbian learning: the winners become even stronger. Result: different neurons react to different signal properties.
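
A toy sketch of this self-organization loop, combining kWTA (with k = 1) and the normalized Hebbian rule; the patterns and layer sizes are illustrative assumptions, not the simulator's setup:

```python
# Self-organization: kWTA competition + normalized Hebbian learning.
import numpy as np

rng = np.random.default_rng(1)
eps, k = 0.05, 1
patterns = np.array([[1, 1, 0, 0, 0, 0],           # "line" 1
                     [0, 0, 0, 0, 1, 1]], float)   # "line" 2
W = rng.uniform(0.4, 0.6, size=(6, 3))             # 6 inputs, 3 hidden units

for _ in range(500):
    x = patterns[rng.integers(len(patterns))]
    h = x @ W
    y = (h >= np.sort(h)[-k]).astype(float)        # kWTA: only top-k units fire
    W += eps * (x[:, None] - W) * y[None, :]       # winners learn their pattern
print(W.round(2))   # columns specialize: different units react to different lines
```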

Page 11:


What do we want from model learning?

The environment supplies a lot of information, but the signals are variable and of poor quality; identifying objects and the relationships between them isn't possible without extensive knowledge of what to expect.

We need a model of the environment's state biased toward recognition and correct behavior; correlations are a necessary (but not sufficient) condition for causal relationships.

Page 12:


What do we want from model learning?

This experience (bias) can also be a factor limiting recognition, as when we stubbornly look for old solutions in a new game.

We assume that in genetic development nature worked out proven mechanisms of getting to know the world.

- problem: these mechanisms aren't obvious or easy to identify.

Nativists (psychologists who stress genetic influences on behavior) assume that people are born with specified knowledge about the world

- this isn't genetically justified

In contrast, a genetic encoding of connection structures is possible and can constitute genetically encoded knowledge (for example, how to breathe or nurse).

Expectations based on previous experience can ease adaptation to a new situation

Example – it's easier to learn a new video game if you've already played other video games and the designers keep similar game elements.

Page 13:


What do we want from model learning?

This leads to a discrepancy between the model and reality, also called the bias-variance dilemma:
- a precise model hinders generalization
- an oversimplified model prevents correct representation

A simple (parsimonious) model was preferred in the 14th century by William of Ockham, leading to Occam's razor, which cuts in favor of the simplest explanation of a phenomenon.

It's more pragmatic to consider the necessity of having initial knowledge introduced by the model designer.

The designer must replace the mechanism of feature selection with his own model. This is why many people avoid introducing preliminary assumptions (biases), preferring general machine learning mechanisms.

Page 14:


Standard PCA

Principal component analysis (PCA) is a mathematical technique for finding linear combinations of signals with the greatest variance.

The first neuron should learn the most important correlations, so first we calculate the correlations of its inputs averaged over time, Cik = ⟨xi xk⟩t, for the first element; then for the next one, but each neuron should be independent, so it should compute orthogonal combinations.

For a set of images the consecutive components look like this => (figure: successive PCA components)

How to do this with the help of neurons?
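
For reference, standard PCA can be computed directly from the time-averaged correlation matrix Cik = ⟨xi xk⟩t; a numpy sketch on synthetic data:

```python
# Standard PCA via the eigenvectors of C_ik = <x_i x_k>_t.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 5))                    # 1000 "images", 5 inputs
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(1000)   # inputs 0 and 1 correlate

C = X.T @ X / len(X)              # time-averaged correlation matrix
vals, vecs = np.linalg.eigh(C)    # orthogonal components, ascending eigenvalues
print(vecs[:, -1].round(2))       # strongest component: the 0-1 direction
```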

Page 15:


PCA on one neuron

Let's assume that the environment is composed of diagonal lines. Let's adopt a linear activation for moment t (image no. t):

yj = Σk xk wkj

Let the change in weights be specified by the simple Hebb's rule:

wij(t+1) = wij(t) + ε xi yj

After presentation of all the images:

Δwij = ε Σt xi yj = (1/n) Σt xi yj = ⟨xi yj⟩t   (taking ε = 1/n)

The change in weights is proportional to the average of the product of the inputs and outputs; correlation can replace the average.
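
A sketch of this result: a single linear neuron trained with the simple Hebb rule aligns with the strongest principal component. An explicit renormalization step is added here only to keep the weights finite; the slide's point about unbounded growth still stands.

```python
# Simple Hebb on one linear neuron finds the first PCA component.
import numpy as np

rng = np.random.default_rng(3)
eps = 0.01
w = rng.standard_normal(3)
for _ in range(2000):
    x = rng.standard_normal(3)
    x[1] = x[0]                 # inputs 0 and 1 fully correlated
    y = x @ w                   # y_j = sum_k x_k w_kj
    w += eps * x * y            # dw_ij = eps * x_i * y_j
    w /= np.linalg.norm(w)      # keep |w| = 1; only the direction matters
print(w.round(2))               # aligns (up to sign) with the correlated direction
```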

Page 16:


Hebbian Correlations

If the averages are zero and the variances are one, then the average of the product is the correlation; the change in weights is proportional to:

Δwij ~ ⟨xi yj⟩t = ⟨xi Σk xk wkj⟩t = Σk ⟨xi xk⟩t wkj = Σk Cik wkj

Cik = ⟨xi xk⟩t are the correlations between inputs; the average of the weights changes slowly. The change in weight for input i is then the weighted average of the correlations between the activity of this input and the remaining ones.

After the presentation of many images, the weights will be dominated by the strongest correlations, and yj will compute the strongest PCA component.

Correlation in the general case:

Cij = ⟨(xi − ⟨xi⟩)(xj − ⟨xj⟩)⟩t / (σi σj)

Page 17:


Example

The first two inputs are completely correlated; the third is uncorrelated.

Changes follow Hebb's rule with ε = 1.

Let's assume that the signals have a zero average (xi=+1 the same number of times as xi=-1); for each vector x =(x1,x2,x3), y is calculated, and then the new weights.

Correlated units determine the sign and magnitude of the weights, and the weights of these inputs grow quickly, whereas the weight of the uncorrelated input x3 decreases.

The weights of unit j change in this way: w(t+1) = w(t) + C w(t)
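
A quick numeric check of this iteration (with the step scaled down to 0.1 and the weight vector renormalized each step, so that the direction is easy to read off):

```python
# w(t+1) = w(t) + eps * C w(t): the correlated pair dominates the weights.
import numpy as np

C = np.array([[1.0, 1.0, 0.0],    # x1 and x2 fully correlated
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])   # x3 correlates only with itself
w = np.array([0.3, 0.1, 0.8])     # arbitrary start, x3 initially strongest
for _ in range(100):
    w = w + 0.1 * C @ w
    w /= np.linalg.norm(w)        # fix the scale to watch the direction
print(w.round(3))                 # -> approx [0.707 0.707 0.]
```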

Page 18:


Normalization

The simplest normalization avoiding an infinite increase in weights: Δwij = ε (xi − wij) yj

Erkki Oja (1982) proposed: Δwij = ε (xi − yj wij) yj

For one unit, after learning the weights stop changing:

Δwij = 0 = ε (xi − yj wij) yj

Weight wij = xi / yj = xi / Σk xk wkj

The weight of a given input signal is then a fraction of the complete weighted activity of all the signals.

This rule also leads to the calculation of the most important principal component. How can we calculate the remaining components?
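
A sketch of Oja's rule showing the built-in normalization (synthetic correlated inputs; names are illustrative):

```python
# Oja's rule: dw = eps * (x - y*w) * y. The -y^2 w decay term keeps |w| ~ 1,
# and w converges to the first principal component without blowing up.
import numpy as np

rng = np.random.default_rng(4)
eps = 0.01
w = 0.1 * rng.standard_normal(3)
for _ in range(5000):
    x = rng.standard_normal(3)
    x[1] = x[0] + 0.1 * rng.standard_normal()   # strong 0-1 correlation
    y = x @ w
    w += eps * (x - y * w) * y                  # Hebb term minus y^2 decay
print(w.round(2), round(float(np.linalg.norm(w)), 2))   # first PC, |w| ~ 1
```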

Page 19:


Problems of PCA

How do we generate the succeeding PCA components in neural networks? Numerically we perform orthogonalization of the successive yj, but this is not easy to do with a neural network. Sequential PCA orders components from the most important to the least; this can be achieved by introducing connections between hidden neurons, but this is an artificial solution.

PCA assumes a hierarchical structure: the most important component for all images. In effect, e.g. for image analysis, we get successive components resembling checkerboards with an increasing number of squares, since pixel correlations over a large number of images average out.

The problem with PCA can be characterized as: PCA calculates correlations in the entire input space, whereas useful correlations exist in local subspaces. Natural images create heterarchies: different combinations are equally important for different images, and subsets of features relevant for certain categories are not important for differentiating others.

Page 20:


Conditional PCA

Conditional principal component analysis (CPCA): calculate correlations not for all features, but only for those features which are present.

PCA functions on all features, giving orthogonal components.

CPCA functions on subsets of features, ensuring that different components encode different interesting combinations of signal features, e.g. edges.

The competition realized with the help of kWTA will ensure the activity of different neurons for different images.

As a result: an encoding of images => How to do this with the help of neurons?

Page 21:


CPCA equations

A neuron is trained only on a subset of images with predetermined features, e.g. edges slanting in a certain way.

Normalized Hebb's rule: Δwij = ε (xi − wij) yj

The weights move in the direction of xi, conditioned on the activity of yj.

In effect we get the conditional probability:

P(xi=1|yj=1) = P(xi|yj) = wij

The weight wij = the probability that the input unit xi is active given that the receiving unit yj is also active.
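
A sketch checking this claim: one receiving unit trained with the normalized rule on binary events, with an input that is on 70% of the time the receiver is active (echoing the p_right = 0.7 simulation; the exact probabilities are illustrative):

```python
# CPCA equilibrium: w_ij -> P(x_i = 1 | y_j = 1).
import numpy as np

rng = np.random.default_rng(5)
eps = 0.005
w = np.full(2, 0.5)
for _ in range(20000):
    y = 1.0                                      # receiver active (conditioning)
    x = np.array([1.0,                           # x0 on whenever y is on
                  float(rng.random() < 0.7)])    # x1 on with probability 0.7
    w += eps * (x - w) * y
print(w.round(2))                                # -> approx [1.0 0.7]
```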

Page 22:


Probabilistic interpretation

The success of CPCA depends on the selection of the function determining the activity of the neurons; an automatic determination process is possible in a few ways: self-organization or error correction.

Activations averaged over time are represented by the probabilities P(xi|t), P(yj|t). The change in weights over all images t, appearing with probability P(t):

Δwij = ε Σt [P(yj|t) P(xi|t) − P(yj|t) wij] P(t)

In a state of equilibrium Δwij = 0, so:

wij = Σt P(yj|t) P(xi|t) P(t) / Σt P(yj|t) P(t) = Σt P(yj, xi, t) / Σt P(yj, t) = P(xi, yj) / P(yj) = P(xi|yj)

The weight wij = the conditional probability of xi given yj.

How to biologically justify normalization?

Page 23:


Biological interpretation

Normalized Hebb's rule: Δwij = ε (xi − wij) yj

Let's assume that the weights are wij ≈ 0.5; there are then 3 possibilities:

1. xi, yj ≈ 1 (strong pre- and postsynaptic activity), so xi > wij: the weights increase and we have LTP, as in NMDA channels.

2. yj ≈ 1 but xi < wij: the weights decrease and we have LTD; a weak input signal suffices to unblock the Mg++ ions of the NMDA channels.

Strong postsynaptic activity can also unblock other voltage-dependent channels and admit a small amount of Ca++.

3. Activity yj ≈ 0 gives no changes; the voltage-gated and NMDA channels aren't active.

Learning happens faster for small wij, because then xi > wij more often. This is qualitatively consistent with observations of weight saturation.

Page 24:


Simulations

Select: hebb_correl.proj, in Chapter 4

Description: Chapter 4.6. Look at Events / Evt Label, and within this Freq is 1 for Right and 0 for Left.

Change in weight values: Graph_log

lrate = 0.005, try 0.1. Change p_right from 1 to 0.7 and to 0.5.

Change Env_type from One_line to Three_lines and p_right=0.7

Notice that the weights become small and diffuse, because the conditional probabilities for images when learning entire categories become small; the output unit contributes to this because it has low selectivity.

Page 25:


Normalization of weights in CPCA

CPCA weights are not very selective and don't lead to image differentiation – they lack dynamic range; in typical situations P(xi|yj) is small, but we want it around 0.5.

Solution: renormalization of weights and contrast enhancement.

Normalization: uncorrelated signals should have a weight of 0.5, but in simulations with rarely appearing signals xi the weights approach ~0.1-0.2. Let's factorize the weight change into two terms:

Δwij = ε (xi − wij) yj = ε [(1 − wij) xi yj + (1 − xi)(0 − wij) yj]

The first term causes an increase in the weights in the direction of 1, the second causes a decrease in the direction of 0; if we want to maintain average weights around 0.5, we must increase the first term, e.g.:

Δwij = ε [(0.5/α)(1 − wij) xi yj + (1 − xi)(0 − wij) yj], where α is the average activity of the input signals

The linear correlation is then still wij = P(xi|yj) · 0.5/α.

The simulator has a parameter savg_cor ∈ [0, 1] determining the degree of normalization.
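
A sketch of this correction under the stated assumptions (α is taken to be the probability that the input is active; note that in this toy the equilibrium lands near, though not exactly at, 0.5):

```python
# Renormalization: boost the weight-increase term by m = 0.5/alpha so a
# rarely active input gets a mid-range weight instead of ~0.2.
import numpy as np

rng = np.random.default_rng(6)
eps, alpha = 0.01, 0.2
m = 0.5 / alpha                   # corrective gain on the increase term
w, y = 0.5, 1.0
for _ in range(20000):
    x = float(rng.random() < alpha)              # input active 20% of the time
    w += eps * (m * (1 - w) * x * y + (1 - x) * (0 - w) * y)
print(round(w, 2))                # ~0.38: pushed up from 0.2 toward 0.5
```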

Page 26:


Contrast in CPCA

Instead of a linear weight change we want to ignore weak correlations and strengthen strong ones – to increase the contrast between the interesting aspects of the signals and the rest. This simplifies the connectivity (weak connections can be skipped) and accelerates learning, helping the weights decide what to do.

Contrast enhancement: instead of a linear weight change use a sigmoidal one:

ŵij = 1 / (1 + (wij / (θ (1 − wij)))^(−γ))

Two parameters: gain γ and offset θ, where θ > 1 imposes a higher threshold.

Attention: this is a scaling of individual weights, not of activations!
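
The contrast-enhancement function in code, using the simulator's parameter names wt_gain (γ) and wt_off (θ) from the next slide:

```python
# Sigmoidal contrast enhancement of individual weights:
# w_eff = 1 / (1 + (w / (off * (1 - w)))**(-gain))
import numpy as np

def contrast(w, gain=6.0, off=1.25):
    return 1.0 / (1.0 + (w / (off * (1.0 - w))) ** (-gain))

w = np.linspace(0.05, 0.95, 7)
print(np.round(contrast(w), 3))   # weak weights -> ~0, strong -> ~1;
                                  # off > 1 raises the threshold on mid weights
```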

Page 27:


Simulations

Select: hebb_correl.proj, in Chapter 4

Description: Chapter 4.6. Change Env_type from One_line to Five_lines and p_right = 0.7.

For these lines CPCA gives identical weights around 0.2.

Change the normalization, setting savg_cor = 1. The weights should be around 0.5. The parameter savg_cor allows us to influence the number of features used by the hidden units.

Contrast: set wt_gain = 6 instead of 1; PlotEffWt will show the curve of effective weights. Influence on learning: for Three_lines, savg_cor = 1. Change wt_off from 1 to 1.25.