Sparse Neural Systems: The Ersatz Brain gets Thinner James A. Anderson Department of Cognitive and Linguistic Sciences Brown University Providence, Rhode Island [email protected]

Dec 20, 2015

Page 1

Sparse Neural Systems: The Ersatz Brain gets Thinner

James A. Anderson

Department of Cognitive and Linguistic Sciences

Brown University

Providence, Rhode Island

[email protected]

Page 2

Speculation Alert!

Rampant speculation follows.

Page 3

Biological Models

The human brain is composed of on the order of 10^10 neurons, connected together with at least 10^14 neural connections. (Probably underestimates.)

Biological neurons and their connections are extremely complex electrochemical structures. The more realistic the neuron approximation, the smaller the network that can be modeled.

There is good evidence that for cerebral cortex a bigger brain is a better brain.

Projects that model neurons are of scientific interest. They are not large enough to model or simulate interesting cognition.

Page 4

Neural Networks

The most successful brain-inspired models are neural networks.

They are built from simple approximations of biological neurons: nonlinear integration of many weighted inputs.

Throw out all the other biological detail.

Page 5

Neural Network Systems

Units with these approximations can build systems that
• can be made large,
• can be analyzed,
• can be simulated,
• can display complex cognitive behavior.

Neural networks have been used to model important aspects of human cognition.

Page 6

Most neural nets assume full connectivity between layers.

A fully connected neural net uses lots of connections!

A Fully Connected Network

Page 7

Sparse Connectivity

The brain is sparsely connected. (Unlike most neural nets.)

A neuron in cortex may have on the order of 100,000 synapses. There are more than 10^10 neurons in the brain. Fractional connectivity is very low: 0.001%.

Implications:
• Connections are expensive biologically since they take up space, use energy, and are hard to wire up correctly.
• Therefore, connections are valuable.
• The pattern of connection is under tight control.
• Short local connections are cheaper than long ones.

Our approximation makes extensive use of local connections for computation.
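The fractional-connectivity figure above follows from a one-line order-of-magnitude calculation, sketched here using the slide's round numbers:

```python
# Order-of-magnitude estimate of cortical fractional connectivity,
# using the round numbers from the text.
synapses_per_neuron = 1e5   # a cortical neuron may have ~100,000 synapses
neurons_in_brain = 1e10     # more than 10^10 neurons in the brain

# Fraction of all neurons that any one neuron contacts directly.
fraction = synapses_per_neuron / neurons_in_brain
print(f"fractional connectivity: {fraction:.0e} ({fraction * 100:.3f}%)")
```

This prints 0.001%, the figure quoted on the slide.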

Page 8

Few active units represent an event.

“In recent years a combination of experimental, computational, and theoretical studies have pointed to the existence of a common underlying principle involved in sensory information processing, namely that information is represented by a relatively small number of simultaneously active neurons out of a large population, commonly referred to as ‘sparse coding.’”

Bruno Olshausen and David Field (2004, p. 481).

Sparse Coding

Page 9

There are numerous advantages to sparse coding.

Sparse coding:
• increases storage capacity in associative memories,
• is easy to work with computationally.
We will make use of these properties.

Sparse coding also:
• “makes structure in natural signals explicit”,
• is energy efficient.

Best of all: It seems to exist!

Higher levels (further from sensory inputs) show sparser coding than lower levels.

Advantages of Sparse Coding
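One way to see the capacity advantage numerically (my illustration, not from the talk): random sparse activity patterns overlap far less than dense ones, so associations stored over them interfere less.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000  # units in the population

def random_pattern(n_active):
    """Binary pattern with n_active of the N units on."""
    v = np.zeros(N)
    v[rng.choice(N, size=n_active, replace=False)] = 1.0
    return v

def mean_overlap(n_active, n_pairs=200):
    """Average cosine overlap between independent random patterns."""
    total = 0.0
    for _ in range(n_pairs):
        a, b = random_pattern(n_active), random_pattern(n_active)
        total += (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return total / n_pairs

sparse = mean_overlap(10)    # 1% of units active: sparse coding
dense = mean_overlap(500)    # 50% of units active: dense coding
print(f"mean overlap, sparse patterns: {sparse:.3f}")
print(f"mean overlap, dense patterns:  {dense:.3f}")
```

Lower overlap means less crosstalk between stored associations, which is the storage-capacity point above.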

Page 10

See if we can make a learning system that starts from the assumption of both
• sparse connectivity and
• sparse coding.

If we use simple neural net units it doesn’t work so well.

But if we use our Network of Networks approximation, it works better and makes some interesting predictions.

Sparse Connectivity + Sparse Coding

Page 11

The simplest sparse system has a single active unit connecting to a single active unit.

If the potential connection does exist, simple outer-product Hebb learning can learn it easily.

Not very interesting.

The Simplest Connection

Page 12

A useful notion in sparse systems is the idea of a path.

A path connects a sparsely coded input unit with a sparsely coded output unit.

Paths have strengths just as connections do.

Strengths are based on the entire path, from input to output, which may involve intermediate connections.

It is easy for Hebb synaptic learning to learn paths.

Paths

Page 13

One of many problems.

Suppose there is a common portion of a path for two single-active-unit associations:

a with d (a>b>c>d) and e with f (e>b>c>f).

We cannot selectively weaken or strengthen the common part of the path (b>c) because it is used by multiple associations.

Common Parts of a Path

Page 14

Some speculations: If independent paths are desirable, an initial construction bias would be to make available as many potential paths as possible.

In a fully connected system, adding more units than contained in the input and output layers would be redundant.

They would add no additional processing power.

Obviously not so in sparse systems!

Fact: There is a huge expansion in number of units going from retina to thalamus to cortex.

In V1, a million input fibers drive 200 million V1 neurons.

Make Many, Many Paths!

Page 15

Network of Networks Approximation

Single units do not work so well in sparse systems.

Let us use our Network of Networks approximation and see if we can do better.

Network of Networks: the basic computing units are not neurons, but small (~10^4 neurons) attractor networks.

Basic Network of Networks architecture:
• 2-dimensional array of modules
• Locally connected to neighbors

Page 16

Received wisdom has it that neurons are the basic computational units of the brain. The Ersatz Brain Project is based on a different assumption.

The Network of Networks model was developed in collaboration with Jeff Sutton (Harvard Medical School, now NSBRI).

Cerebral cortex contains intermediate-level structure, between neurons and an entire cortical region. Examples of intermediate structure are cortical columns of various sizes (mini-, plain, and hyper-).

Intermediate-level brain structures are hard to study experimentally because they require recording from many cells simultaneously.

The Ersatz Brain Approximation: The Network of Networks

Page 17

Cortical Columns: Minicolumns

“The basic unit of cortical operation is the minicolumn … It contains of the order of 80-100 neurons except in the primate striate cortex, where the number is more than doubled. The minicolumn measures of the order of 40-50 μm in transverse diameter, separated from adjacent minicolumns by vertical, cell-sparse zones … The minicolumn is produced by the iterative division of a small number of progenitor cells in the neuroepithelium.” (Mountcastle, p. 2)

VB Mountcastle (2003). Introduction [to a special issue of Cerebral Cortex on columns]. Cerebral Cortex, 13, 2-4.  

Figure: Nissl stain of cortex in planum temporale.

Page 18

Columns: Functional 

Groupings of minicolumns seem to form the physiologically observed functional columns. Best known example is orientation columns in V1.

They are significantly bigger than minicolumns, typically around 0.3-0.5 mm.

Mountcastle’s summation:

“Cortical columns are formed by the binding together of many minicolumns by common input and short range horizontal connections. … The number of minicolumns per column varies … between 50 and 80. Long range intracortical projections link columns with similar functional properties.” (p. 3)


Cells in a column ≈ (80 minicolumns) × (100 neurons each) = 8000

Page 19

Interactions between Modules

Modules [columns?] look a little like neural net units.

But interactions between modules are vector, not scalar!

We gain greater path selectivity this way.

Interactions between modules are described by state interaction matrices instead of simple scalar weights.

Page 20

Columnar identity is maintained in both forward and backward projections

“The anatomical column acts as a functionally tuned unit and point of information collation from laterally offset regions and feedback pathways.” (p. 12)

“… feedback projections from extra-striate cortex target the clusters of neurons that provide feedforward projections to the same extra-striate site. … .” (p. 22).

Lund, Angelucci and Bressloff (2003). Cerebral Cortex, 13, 15-24.

Columns and Their Connections

Page 21

Return to the simplest situation for layers:

Modules a and b can display two orthogonal patterns, A and C on a and B and D on b.

The same pathways can learn to associate A with B and C with D.  Path selectivity can overcome the limitations of scalar systems.

Paths are both upward and downward.

Sparse Network of Networks
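A minimal linear-associator sketch of this slide (my construction; patterns A, C on module a and B, D on module b, with A and C orthogonal):

```python
import numpy as np

# Two orthogonal states of module a, and two states of module b.
A = np.array([1.0, 0.0, 0.0, 0.0])
C = np.array([0.0, 1.0, 0.0, 0.0])
B = np.array([0.0, 0.0, 1.0, 1.0]) / np.sqrt(2.0)
D = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2.0)

# A single state-interaction matrix on the one a->b pathway learns BOTH
# associations as summed outer products (Hebb learning).
W = np.outer(B, A) + np.outer(D, C)

# Because A and C are orthogonal, each input retrieves its own partner
# with no crosstalk -- path selectivity a scalar weight cannot provide.
print("W A recovers B:", np.allclose(W @ A, B))
print("W C recovers D:", np.allclose(W @ C, D))
```

A scalar connection between the two modules could carry only one strength, so it could not keep the two associations distinct.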

Page 22

Consider the common path situation again.

We want to associate patterns on two paths, a-b-c-d and e-b-c-f with link b-c in common.

Parts of the path are physically common but they can be functionally separated if they use different patterns.  Pattern information propagating forwards and backwards can sharpen and strengthen specific paths without interfering with the strengths of other paths.

Common Paths Revisited

Page 23

Just stringing together simple associators works:

For module b:
Change in coupling from a to b: Δ(S_ab) = η b a^T
Change in coupling from c to b: Δ(T_cb) = η b c^T

For module c:
Change in coupling from d to c: Δ(U_dc) = η c d^T
Change in coupling from b to c: Δ(T_bc) = η c b^T

(Similarly, at module d, the forward coupling learns Δ(U_cd) = η d c^T.)

If pattern a is presented at layer 1, then:

Pattern on d = (U_cd)(T_bc)(S_ab) a = η^3 (d c^T)(c b^T)(b a^T) a = (constant) d

Associative Learning along a Path
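The chain above can be checked directly (a sketch with unit-norm random patterns and η = 1; matrix names follow the slide):

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(n):
    """Random unit-norm pattern for one module."""
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)

a, b, c, d = (unit(8) for _ in range(4))
eta = 1.0

# Outer-product Hebb learning of each forward link along a > b > c > d.
S_ab = eta * np.outer(b, a)
T_bc = eta * np.outer(c, b)
U_cd = eta * np.outer(d, c)

# Presenting pattern a at the input yields a constant times d:
out = U_cd @ T_bc @ S_ab @ a
# Here the constant eta^3 (c.c)(b.b)(a.a) is 1, since all norms are 1.
print("output is a constant times d:", np.allclose(out, d))
```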

Page 24

Because information propagates backward and forward, closed loops are possible and likely.

Tried before: Hebb's cell assemblies were self-exciting neural loops. They corresponded to cognitive entities: for example, concepts.

Hebb's cell assemblies are hard to make work because they use scalar connections between units. But module assemblies can become a powerful feature of the sparse approach.

We have more selective connections.

See if we can integrate relatively dense local connections with relatively sparse projections to and from other layers to form module assemblies.

Module Assemblies

Page 25

Biological Evidence:Columnar Organization in IT

Tanaka (2003) suggests a columnar organization of different response classes in primate inferotemporal cortex.


There seems to be some internal structure in these regions: for example, spatial representation of orientation of the image in the column.

Page 26

IT Response Clusters: Imaging

Tanaka (2003) used intrinsic optical imaging of cortex: a video camera trained on the exposed cortex can pick up cell activity.

• At least a factor of ten higher resolution than fMRI.
• The size of a response region is around the size of functional columns seen elsewhere: 300-400 microns.

Page 27

Columns: Inferotemporal Cortex

Responses of a region of IT to complex images involve discrete columns.

• The response to a picture of a fire extinguisher shows how regions of activity are determined.
• Boundaries are where the activity falls by half.
• Note: some spots are roughly equally spaced.

Page 28

Active IT Regions for a Complex Stimulus

Note the large number of roughly equally spaced spots (about 2 mm apart) for a familiar complex image.

Page 29

Intralayer connections are sufficiently dense so that active modules a little distance apart can become associatively linked.

Recurrent collaterals of cortical pyramidal cells form relatively dense projections around a pyramidal cell. The extent of lateral spread of recurrent collaterals in cortex seems to be over a circle of roughly 3 mm diameter.

If we assume that:
• a column is roughly a third of a mm across,
• there are roughly 10 columns in a square mm,
• a 3 mm diameter circle has an area of roughly 10 square mm,

then a column projects locally to about 100 other columns.

Intralayer Connections
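Redoing the slide's arithmetic exactly (the slide rounds each factor to ~10; the exact figures give the same order of magnitude):

```python
import math

column_diameter_mm = 1.0 / 3.0   # a column is roughly a third of a mm across
spread_diameter_mm = 3.0         # lateral spread of recurrent collaterals

columns_per_sq_mm = 1.0 / column_diameter_mm ** 2            # ~9, rounded to ~10
spread_area_sq_mm = math.pi * (spread_diameter_mm / 2) ** 2  # ~7, rounded to ~10

local_targets = columns_per_sq_mm * spread_area_sq_mm
print(f"exact estimate: ~{local_targets:.0f} columns "
      f"(~100 with the slide's rounding)")
```

The exact product is in the 60-70 range; with each factor rounded to 10 it becomes the ~100 quoted above. Either way, order 100 local targets per column.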

Page 30

If the modules are simultaneously active the pairwise associations forming the loop abcda are learned.

The path closes on itself.

Consider a. After traversing the linked path a>b>c>d>a, the pattern arriving at a around the loop is a constant times the pattern on a.

If the constant is positive there is the potential for positive feedback if the total loop gain is greater than one.

Loops
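A sketch of the loop-gain point (my illustration; the couplings around a closed path a>b>c>d>a are outer products scaled so the total loop gain is a chosen constant):

```python
import numpy as np

rng = np.random.default_rng(2)

def unit(n):
    """Random unit-norm pattern for one module."""
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)

a, b, c, d = (unit(6) for _ in range(4))

def loop_matrix(total_gain):
    """One full traversal a > b > c > d > a with the given loop gain."""
    g = total_gain ** 0.25   # split the gain evenly over the four links
    return (g * np.outer(a, d)) @ (g * np.outer(d, c)) \
         @ (g * np.outer(c, b)) @ (g * np.outer(b, a))

for total_gain in (0.5, 2.0):
    x, M = a.copy(), loop_matrix(total_gain)
    for _ in range(10):      # ten traversals of the loop
        x = M @ x
    print(f"loop gain {total_gain}: activity after 10 loops = "
          f"{np.linalg.norm(x):.4g}")
```

Below gain one, activity dies out; above one, it grows each traversal until the nonlinearities of the attractor modules limit it.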

Page 31

Loops can be kept separate even with common modules.

If the b pattern is different in the two loops, there is no problem. The selectivity of links will keep activities separate.

Activity from one loop will not spread into the other (unlike Hebb cell assemblies). 

Loops with Common Modules

If b is identical in the two loops, b is ambiguous. There is no a priori reason to activate Loop 1, Loop 2, or both. Selective loop activation is still possible, though it requires additional assumptions to accomplish.

Page 32

More complex connection patterns are possible.

Richer interconnection patterns might have all connections learned.

Ambiguous module b will receive input from d as well as a and c.

A larger context would allow better loop disambiguation by increasing the coupling strength of modules.

Richly Connected Loops

Page 33

Putting It All Together:

Sparse interlayer connections and dense intralayer connections work together.  Once a coupled module assembly is formed, it can be linked to by other layers.

The system becomes a dynamic, adaptive computational architecture that is both workable and interesting.

Working Together

Page 34

Two Parts …

Suppose we have two such assemblies that co-occur frequently.

Parts of an object, say …

Page 35

As learning continues: Groups of module assemblies bind together through Hebb associative learning.

The small assemblies can act as the “sub-symbolic” substrate of cognition, and the larger assemblies as symbols and concepts.

Note the many new interconnections.

Make a Whole!

Page 36

Conclusion (1)

• The binding process looks like compositionality.

• The virtues of compositionality are well known.

• It is a powerful and flexible way to build cognitive information processing systems.

• Complex mental and cognitive objects can be built from previously constructed, statistically well-designed pieces.

Page 37

Conclusion (2)

• We are suggesting here a possible model for the dynamics and learning in a compositional-like system.

• It is built based on constraints derived from connectivity, learning, and dynamics and not as a way to do optimal information processing.

• Perhaps this property of cognitive systems is more like a splendid bug fix than a well chosen computational strategy. 

• Sparseness is an idea worth pursuing.
• It may be a way to organize and teach a cognitive computer.