This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the author's institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Lateral thinking, from the Hopfield model to cortical dynamics



Research Report

Lateral thinking, from the Hopfield model to cortical dynamics

Athena Akrami, Eleonora Russo, Alessandro Treves⁎

SISSA, Cognitive Neuroscience sector, via Bonomea 265, 34136 Trieste, Italy

ARTICLE INFO

Article history: Accepted 13 July 2011; available online 23 July 2011.

ABSTRACT

Self-organizing attractor networks may comprise the building blocks for cortical dynamics, providing the basic operations of categorization, including analog-to-digital conversion, association and auto-association, which are then expressed as components of distinct cognitive functions depending on the contents of the neural codes in each region. To assess the viability of this scenario, we first review how a local cortical patch may be modeled as an attractor network, in which memory representations are not artificially stored as prescribed binary patterns of activity as in the Hopfield model, but self-organize as continuously graded patterns induced by afferent input. Recordings in macaques indicate that such cortical attractor networks may express retrieval dynamics over cognitively plausible rapid time scales, shorter than those dominated by neuronal fatigue. A cortical network encompassing many local attractor networks, and incorporating a realistic description of adaptation dynamics, may be captured by a Potts model. This network model has the capacity to engage long-range associations into sustained iterative attractor dynamics at a cortical scale, in what may be regarded as a mathematical model of spontaneous lateral thought.

This article is part of a Special Issue entitled: Neural Coding.

© 2011 Elsevier B.V. All rights reserved.

Keywords: Neural computation; Associative memory; Cortical dynamics; Modular network

1. Introduction: a universal cortical transaction?

Information-processing models of cognitive functions, a most productive approach developed over the last few decades, have usually described those functions in terms of sequences of specialized routines, conceptually akin to components of a complex computer code. For example, the computation of the trajectory to reach a particular goal in space may be described as entailing the transformation of spatial information arriving through the senses from sensor-based to allocentric coordinates, then the construction or extraction from a memory store of the relevant map of the environment, then the geometric calculation of the available paths connecting the current position and the goal, and of their properties, such as time needed, energy expenditure, chances of failure (Hikosaka et al., 1999; Kawato, 1999; Tanji, 2001; Wolpert, 1997). Reading written text, instead, may be described as entailing the extraction of line and corner elements, the recognition of the abstract invariants characterizing each letter, the composition of individual letters with error correction to form meaningful words, a further error-correction stage that takes into account neighboring words, and a cascade of higher-level lexical and semantic processes (Plaut, 1999). Associative processes have often been seen as alternative side paths to the orderly usage of such specialized routines, "lateral thinking" that may occasionally provide a shortcut to, and more frequently derail, the successful execution of a task.

Brain Research 1434 (2012) 4–16

⁎ Corresponding author. Fax: +39 040 3787528. E-mail address: [email protected] (A. Treves). URL: http://www.sissa.it/~ale/ (A. Treves).

0006-8993/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.brainres.2011.07.030

Available at www.sciencedirect.com; www.elsevier.com/locate/brainres

The ready availability of functional imaging techniques has encouraged the further elaboration of such information-processing models, promising to assign distinct cortical areas as the theatres for the operation of several of the routines. Yet activation patterns observed with fMRI cannot resolve single-neuron activity, which would be necessary in order to test information-processing models at the algorithmic level. Evidence from neurophysiological recordings in brain slices, in rats and in macaques, and sparsely in human patients, on the other hand, has essentially shown no neuronal operation taking place in the cortex other than associative processes: associative synaptic plasticity and associative retrieval. The hypothesis has to be entertained, therefore, that the cerebral cortex may contribute nothing but associative network processes, although they may be "dressed up" in different guises depending on the connectivity of each cortical area and on the codes it expresses. For such a hypothesis to be subject to validation or falsification, however, the notion of associative processes has to be made precise, for example in terms of a mathematically defined network model.

The Hopfield (1982) model meets the requirements for a mathematically well-defined model of associative memory retrieval, as it could be implemented in a local cortical network. Its cortical plausibility has been questioned, however, because of several dramatic simplifying assumptions it relied on, at least in the original version, as analyzed mathematically by Amit, Gutfreund and Sompolinsky (1985, 1987). Moreover, it is a simplified model of memory retrieval based on an even cruder model of associative memory storage. Over the nearly three decades since it was put forward, the effect of many of those simplifications has been analyzed, mathematically and with computer simulations, and overall it has been found not to alter the qualitative import of the model (see Rolls and Treves, 1998). In this contribution, based in part on a PhD Thesis (Akrami, unpublished) and including some original results, we discuss quantitatively, with computer simulations and with reference to recordings in monkeys, some of the crucial conceptual steps that bridge the gap between the Hopfield model and local cortical circuits, particularly with regard to how memory representations may be stored, and to the time scale for retrieval dynamics. The aim is to assess the validity of a yet more abstract model of a local cortical network, a single Potts unit, as a building block of models of extended cortical networks, which operate exclusively through associative processes.

2. Results

In the original Hopfield model, a memory item is retrieved from the network when neural activity, stimulated by a partial cue (usually given as a starting condition for the network), evolves into a pattern strongly correlated with one of the p representations which have been stored on the synaptic weights. Daniel Amit (1995) and others have pointed at such "attractor dynamics" as a robust universal mechanism for memory retrieval in the cerebral cortex, and in the hippocampus. How smoothly this retrieval operation can proceed, and how wide the "basins of attraction" of the p memory states are, should depend on how memory representations are established during the storage phase, which determines whether other attractors may hinder or obstruct retrieval. In the hippocampus, new memory representations are believed to be established under the dominant and decorrelating influence of the specialized dentate gyrus preprocessor, with its strong, sparse connectivity to CA3, so an ad hoc analysis is required (Cerasti and Treves, 2010). To assess, instead, how the storage process affects retrieval capacity in the cortex, where no dentate inputs are available, it is necessary to consider first the main factors that determine the effectiveness of attractor dynamics: connectivity, representational sparseness, and the presence of noise.

2.1. Effective retrieval capacity with cortically realistic storage processes

A most important factor that determines retrieval is the degree of connectivity in the network. In the original Hopfield model the connectivity is complete, i.e. each of the N units in the network receives input from all other N−1 units (Hopfield, 1982). This simplifying assumption was linked to imposing symmetry on the coupling constants, that is, the synaptic weights, which in turn led to a great clarification of the properties of auto-associative neural networks (Hopfield, 1982; Amit et al., 1985, 1987). The analysis of network performance derived from statistical physics and applicable in the "thermodynamic" limit N→∞ can however be extended to the case where the number C of inputs per unit is smaller than N, but still regarded as very large, C→∞ (Sompolinsky, 1986); and even to the so-called "highly diluted" limit (C→∞ but C/N→0) considered by Derrida et al. (1987). The symmetry of the weights can likewise be discarded, leading to the characterization of some interesting dynamical properties of asymmetric networks (Derrida and Pomeau, 1986; Sompolinsky and Kanter, 1986; Derrida et al., 1987; Kree and Zippelius, 1987; Crisanti and Sompolinsky, 1987; Gutfreund et al., 1988). Overall, the main insight gained by introducing incomplete or sparse connectivity, C<N−1, is rather simple: the capacity of networks with symmetric or asymmetric weights is primarily determined by C, and to a lesser extent by N. This result was derived many years ago, for example from a signal-to-noise analysis, applicable when the synaptic weights encode p uncorrelated memory patterns represented as sparse activity distributions of sparsity a (Treves and Rolls, 1991). "Noise" here denotes the interference due to other memory patterns, what in the physics jargon is dubbed quenched noise. The analysis shows that the signal scales as C, and the noise as √(pC). The relation between the maximum number p_c of patterns that can be turned into dynamical attractors, i.e. that can be associatively retrieved, and the number C of connections per receiving unit takes the form, for sparsely coded patterns (in the limit a→0; Treves and Rolls, 1991)

p_c ≃ k C / (a ln(1/a))

where k is a numerical factor of order 0.1–0.2. In this limit p_c is independent of N.
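The scaling above is easy to explore numerically; the following sketch (our own illustrative code, with `storage_capacity` a hypothetical name and k = 0.15 an assumed value within the quoted 0.1–0.2 range) evaluates the estimate for CA3-like and IT-like sparsity levels:

```python
import math

def storage_capacity(C, a, k=0.15):
    """Estimated maximum number of retrievable patterns,
    p_c ~ k * C / (a * ln(1/a))  (Treves and Rolls, 1991),
    valid for sparsely coded patterns, a -> 0. The name and the
    default k = 0.15 are our own illustrative choices."""
    return k * C / (a * math.log(1.0 / a))

# capacity is set by C, not N, and grows as coding gets sparser:
sparse = storage_capacity(C=10_000, a=0.05)  # CA3-like sparsity
dense = storage_capacity(C=10_000, a=0.5)    # IT-like sparsity
```

Note that the formula ceases to be accurate outside the sparse-coding limit; for a = 0.5 it should be read only as an order-of-magnitude indication.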

The essential advantage introduced by the sparse connectivity, if randomly diluted, is that quenched noise has less of an opportunity to reverberate coherently. The signal from the partial correlation with the item to be retrieved does summate coherently, and proportionally to the C inputs to each unit, independently of the exact organization of the connections in the network. The fluctuations in the overlaps with the other patterns represent the sole noise source when the so-called "fast" noise is low (T→0). These fluctuations dissipate away the more the C connections are dispersed among many potential presynaptic units, so that fluctuations come from different sources and average out. A rigorous analysis, which revises the earlier notion of the dominance of feedback loops, has been put forward recently by Roudi and Hertz (2011). In any case, the interference caused by the propagation of the noise is stronger the higher the proportion of units active along the propagation path (the higher is a). For a given load (fixed p/C), diluted connectivity therefore reduces the influence of this "quenched", i.e. static, noise, and performance is better the larger is N, if a>>0; if a→0, N makes no difference. In the rat hippocampus, for example, in particular in region CA3, the sparsity a has been estimated to be of order 0.05 (Papp and Treves, 2008) and C/N of order 0.04, and the main factor in determining capacity is clearly C, not N. In the inferotemporal cortex of monkeys, instead, the sparsity a is difficult to estimate, because stimuli are used that tend to activate the cells being recorded, to avoid wasting time on mostly silent responses; still, it is expected to be higher, perhaps even of order 0.5 (Rolls and Tovee, 1995; Franco et al., 2007), and C/N is estimated in the range 0.1–0.2. These two factors, higher a and higher C/N, combine to greatly reduce the retrieval capacity of local neocortical networks, even for pyramidal cells endowed with the same connectivity C as hippocampal ones.

2.1.1. Fast noise and partial cues

The older studies have mainly focused on networks that are assigned memory representations by fiat, with a prescribed statistics, operate in the absence of fast noise (the variability due to synaptic failures and to stochastic emission of action potentials), and are given as initial condition a nearly full cue. To obtain a quantitative assessment of a realistic cortical scenario, one needs to consider a connectivity that interpolates between the fully recurrent and symmetric attractor network studied by Amit et al. (1987), the highly diluted model of Derrida et al. (1987) and the strictly feed-forward attractor network studied by Domany et al. (1989), but one also needs to use a partial cue, given as a brief external input to a subset of the units in the network. One should note that although the network may successfully retrieve a pattern once given a full cue as the input, it may fail to evolve to the right configuration if the initial input is only partially correlated with the stored pattern. That is, depending on the size of the basin of attraction, the network may or may not be able to complete a distorted pattern. An auto-associative memory that only retrieves a stored activity pattern if given a full cue does not serve any useful purpose. Thus, the ability of the network to complete a partial cue needs to be assessed. To this end, in the original analyses summarized in Fig. 1 (top row), we have presented the network with noisy cues. This form of noise is different from one trial to the next (unlike the quenched noise, which is due to synaptic weights that encode also other patterns) but it does not vary in time within the trial (unlike what is properly called "fast" noise, see below); it is introduced by replacing a portion (1−q) of the full input vector by random values. Its level is then quantified by the value 1−q, between 0 and 1, indicating the proportion of units receiving noise instead of the cue. In addition, one may model other sources of trial-to-trial variability, those like synaptic noise that are usually called "fast" noise in the physics jargon, by introducing an effective temperature. The variability modeled by a temperature is a major element in networks of binary units, which otherwise display, at steady state, only two implausibly distinct activity levels (Amit, 1989). It is less essential in networks of graded-response units, which display continua of activity levels. Here we adopt a threshold-linear model for single units (Treves, 1990).
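A minimal sketch of these two ingredients, in Python, might look as follows; the paper does not specify the distribution of the "random values", so resampling them from the pattern itself is our assumption, and both function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def threshold_linear(h, thr=0.0, gain=1.0):
    """Threshold-linear single-unit transfer: zero below threshold,
    linear above it (Treves, 1990)."""
    return gain * np.maximum(h - thr, 0.0)

def make_partial_cue(pattern, q):
    """Keep a fraction q of the stored pattern; replace the remaining
    fraction 1 - q with values resampled from the pattern itself
    (an assumed choice for the 'random values')."""
    cue = pattern.copy()
    noisy = rng.choice(pattern.size, size=round((1 - q) * pattern.size),
                       replace=False)
    cue[noisy] = rng.choice(pattern, size=noisy.size)
    return cue

pattern = rng.exponential(size=1000)
cue = make_partial_cue(pattern, q=0.7)   # 30% of units receive noise
```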

In the rest of this subsection, a threshold-linear auto-associative network is described in its capacity to retrieve patterns, as determined by the type of storage process. The results of averaging several simulations are plotted in two-dimensional "phase diagrams", which chart the degree of successful retrieval as a function of the number of connections C, on the x-axis, and either the cue level q or the number of stored patterns p, on the y-axis. The simple model used has been discussed in detail elsewhere (Treves and Rolls, 1991).

We focus on the three comparisons schematized in Table 1, which contrast:

• Pattern statistics: a Binary Distribution (BP) vs. a Continuous Distribution (CP).

• Pattern generation: a Self-Organized network (SO) vs. a network with an External Assignment (EA) of patterns.

• Architecture: a 1-layered network (1 L) vs. a 2-layered network (2 L), which receives the external input via feed-forward afferents.

In all simulations reported here, the level of sparseness in the network is regulated, by acting on the common threshold, at a=0.2, while the input patterns which in different ways are proposed to the network are less sparse, a_input=0.3. This choice emphasizes the beneficial effect of allowing the network to store patterns not identical to those it receives, which is the core advantage of self-organization. The size of the network is also set to N=1000, whereas the number of recurrent connections per unit is varied, varying as a consequence also the degree of "dilution", 1−C/N, virtually from 0 to 1 (actually, to 99%, when C=10, approaching the highly diluted limit). Besides varying the connectivity, the cue level q is varied from full to minimal cues (in the top row of Fig. 1 the fraction of units receiving random noise as input varies from 0 to 90%, for p=40), or the memory load is varied from minimal to extensive (in the bottom row of Fig. 1 the number of memories stored in the network varies from p=1 to p=70, with full cues, q=1). The dependent variable is the fraction of runs with identical parameters that terminate in a steady state highly overlapping (m>0.8, see Experimental procedures) with the cued memory representation. Results are smoothed to suppress the considerable variability due to the limited size of the network.
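The protocol can be condensed into a toy retrieval simulation, sketched below under several simplifying assumptions of ours: the network is scaled down for speed, patterns are binary, storage uses a standard Hebbian covariance rule, and a rank-based common threshold stands in for the gain/threshold regulation described above (the overlap criterion m > 0.8 is the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
N, C, p, a = 200, 50, 5, 0.2        # scaled-down network for illustration

# p sparse binary patterns, stored with a Hebbian covariance rule on a
# randomly diluted connectivity matrix (C presynaptic inputs per unit)
patterns = (rng.random((p, N)) < a).astype(float)
mask = np.zeros((N, N))
for i in range(N):
    others = np.delete(np.arange(N), i)
    mask[i, rng.choice(others, C, replace=False)] = 1.0
W = mask * ((patterns - a).T @ (patterns - a)) / C

def retrieve(cue, steps=30):
    """Iterate threshold-linear dynamics; the common threshold is set at
    each step so that roughly a fraction a of units stays active."""
    r = cue.copy()
    for _ in range(steps):
        h = W @ r
        r = np.maximum(h - np.quantile(h, 1 - a), 0.0)
        if r.sum() > 0:
            r *= a * N / r.sum()    # crude gain regulation
    return r

def overlap(r, xi):
    """Correlation between the final state and a stored pattern."""
    return np.corrcoef(r, xi)[0, 1]

cue = patterns[0] * (rng.random(N) < 0.7)   # partial cue, q = 0.7
m = overlap(retrieve(cue), patterns[0])
# at this light load (p well below capacity), m typically exceeds 0.8
```

With these parameters p/C = 0.1, far below the capacity estimate of the previous subsection, so retrieval from a partial cue is expected to succeed on most runs.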

2.1.2. 1st comparison: binary vs. continuous patterns

Memory representations are assigned to the network as prescribed activity patterns, hence with an external assignment (EA) procedure, but either as binary patterns (BP), with a randomly picked fraction a_input of the units in a state of activity 1, and the rest quiescent (state 0), or as continuous patterns (CP), with the activity state assigned to each active unit independently of other units, from a continuous distribution, constrained to have equal mean and variance (see Experimental procedures). The comparison is between the leftmost and the second column of Fig. 1. One can see that

• Both with binary and with continuous input patterns, higher connectivity is beneficial, either in completing noisier cues (top row) or in maintaining a steady attractor state in the face of increasing memory load (bottom row).

• Attractors produced by binary input patterns are not as well retrieved as those produced by continuous ones because, due to the sparsity mismatch we have chosen, the network has to suppress, at retrieval, part of the externally assigned input pattern (1/3, with our parameters).

• Neither in the binary nor in the continuous case are the attractors really affected by noise in the cue: at p=40, activity tends to always converge to the correct basin of attraction, if the connectivity is sufficient, except when the cue is minimal, q<0.2–0.3. That is, basins of attraction are wide.

• In the binary case, the storage capacity of a network with 1000 units appears to be effectively limited to about 40 patterns, with full connectivity, whereas with continuous patterns the limit is higher, at about 70 patterns.
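The two kinds of externally assigned patterns can be generated as follows (a sketch; the exponential distribution for the graded values is an assumed choice, and only the means of BP and CP are matched here, whereas the paper also constrains the variance, see its Experimental procedures):

```python
import numpy as np

rng = np.random.default_rng(2)
N, a_input = 1000, 0.3

def binary_pattern():
    """BP: a random fraction a_input of units set to 1, the rest to 0."""
    x = np.zeros(N)
    x[rng.choice(N, round(a_input * N), replace=False)] = 1.0
    return x

def continuous_pattern():
    """CP: the same active fraction, but with graded values assigned
    independently to each active unit (exponential is our assumed
    choice), rescaled to unit mean so BP and CP patterns match."""
    x = np.zeros(N)
    idx = rng.choice(N, round(a_input * N), replace=False)
    v = rng.exponential(size=idx.size)
    x[idx] = v / v.mean()
    return x

bp, cp = binary_pattern(), continuous_pattern()
```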

Fig. 1 – The retrieval capacity of an auto-associative network depends, but only quantitatively, on the character of the storage process. Four storage procedures are contrasted in the four columns, as explained in the text. Retrieval capacity is quantified by the fraction of runs that end with an overlap >0.8 with the attractor correlated to the cue. Averages are taken over 10 simulations with different random seeds for each set of parameters, and a smoothing algorithm is applied among nearby sets. The white regions then correspond to prevailingly successful retrieval, as a function of the number of connections per unit (x-axes) and of the noise in the cue (y-axes, top row) or the memory load (y-axes, bottom row).

Table 1 – The 3 comparisons illustrated in Fig. 1.

1st comparison: Pattern statistics, Binary (BP) ↕ Continuous (CP); Pattern generation, EA; Architecture, 1 L.
2nd comparison: Pattern statistics, CP; Pattern generation, External assignment (EA) ↕ Self-organized (SO); Architecture, 1 L.
3rd comparison: Pattern statistics, CP; Pattern generation, SO; Architecture, 1-layer (1 L) ↕ 2-layer (2 L).

2.1.3. 2nd comparison: external assignment vs. self-organization

The second term in the comparison above serves as the first term in the next comparison: continuous memory patterns, assigned by fiat to the network, as they are randomly drawn from a certain distribution. The comparison is with a self-organizing network, which receives the very same memory patterns, but then reverberates them along its recurrent collateral connections already during the storage phase, continuously regulating the sparseness of their activity by acting on the common gain and threshold of all the units. This reverberation, before setting the recurrent connection weights with the usual "Hebbian" learning rule, allows reaching a steady state which may be compatible, in some sense, with the other memories already stored on the same connections. Usually, however, reverberation through excitatory connections tends to make attractors coalesce with each other, and is better suppressed. Competitive interactions, expressed in our model by threshold regulation, instead are beneficial, if they allow the storage of sparser patterns than those provided in the input. The suppression of recurrent excitatory activity during information storage is thought to be achieved via acetylcholine in the real brain (Hasselmo et al., 1995), and it is modeled in our simulation via a factor M (see Experimental procedures). For clarity, we present in Fig. 1 results for M=0, complete suppression, so that only the beneficial effect of threshold regulation is manifest. With the threshold imposing sparser patterns at storage, they interfere less with each other than when they are imposed from outside with no chance for adjustments. The comparison is between the two central columns of Fig. 1. One can see that

• Self-organization, in its core form of threshold setting, may have a dramatic beneficial effect: the same continuous memory attractors that have to be thresholded only at retrieval, if patterns are imposed as they are (EA, left), can be thresholded, and hence become sparser, already at storage, making them more robust to interference, if they are allowed to self-organize (SO, right).

• With full cues, self-organization (in fact, threshold setting) roughly doubles the maximum number of memories that can be retrieved with a given connectivity (bottom row).
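The storage-phase sparsification, in the M=0 case where only threshold regulation acts, can be sketched as follows (our own simplified stand-in, using a rank-based common threshold):

```python
import numpy as np

rng = np.random.default_rng(3)

def sparsify(pattern, a):
    """Apply a common threshold chosen so that only a fraction ~a of the
    units stays active: a stand-in for the gain/threshold regulation
    applied during storage (with recurrent excitation suppressed, M = 0)."""
    thr = np.quantile(pattern, 1 - a)
    return np.maximum(pattern - thr, 0.0)

# an input pattern with a_input ~ 0.3 active units...
x = rng.exponential(size=1000) * (rng.random(1000) < 0.3)
stored = sparsify(x, a=0.2)   # ...is stored at the sparser level a = 0.2
```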

2.1.4. 3rd comparison: 1-layer vs. 2-layer

In the last comparison, memory attractors self-organize through threshold setting, but from inputs either received directly by the network, as in the second term of the comparison above, or received by units in a separate input layer, and relayed through a set of feed-forward weights set at temporally constant but random values (see Experimental procedures). Note that we assess the effect of adding an input layer by comparing self-organizing networks, in order to isolate it from the effect of self-organization itself, in the previous comparison. The feed-forward weights result in a truncated Gaussian distribution of activity levels at different units. In terms of first-order statistics, this distribution is similar to the exponential distribution (subsequently thresholded at zero) with which it is compared; and both are similar, only sparser, to that externally assigned with the continuous distribution of values. Subtle correlations between different units and different memory patterns are however induced by the fact that the same input units determine, through the afferent connections with constant weights, the activity distributions to be encoded in the recurrent weights, after reverberation. In the 1-layer case, there are no correlations in the inputs.

In the 2-layer network, the feed-forward layer therefore replaces the random number generator in creating input patterns that are then thresholded (and in general, when M≠0, also self-organize) in the second layer, with the usual procedure. The comparison is thus between two slightly different versions of a random generator, a synthetic computer routine (instructed with the continuous distribution described in Experimental procedures and churning out independent numbers every time) and a neurally plausible one (generating a somewhat different distribution). The comparison is between the two rightmost columns of Fig. 1. One can see that

• Performance is similar: inserting the input layer does not make much of a difference, as expected.

• Still, the 1-layer network performs better with noisy cues and with extensive memory loads, compared to the 2-layer network, in that the connectivity required for effective retrieval, in the presence of noise and with substantial load, is significantly higher with the additional input layer.

• Subsequent analyses show that the 1 L advantage is not due to the above pairwise correlations, but to the tail of the Gaussian distribution, which produces a different number of units firing at very high rates than in the 1 L case: curtailing the firing with a pre-set saturation level makes the two networks equivalent (not shown).
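The relay stage can be sketched as follows (our own minimal construction: fixed random Gaussian feed-forward weights followed by the same rank-based thresholding; since each second-layer activation sums many random contributions, the pre-threshold values are approximately Gaussian, yielding the truncated Gaussian rate distribution mentioned above):

```python
import numpy as np

rng = np.random.default_rng(4)
N_in, N, a = 1000, 1000, 0.2

# temporally constant, random feed-forward weights
W_ff = rng.normal(0.0, 1.0 / np.sqrt(N_in), size=(N, N_in))

# an input-layer pattern with a_input ~ 0.3 active units
x_in = rng.exponential(size=N_in) * (rng.random(N_in) < 0.3)

# each second-layer activation sums many random contributions, hence is
# approximately Gaussian; thresholding keeps the top fraction a active
h = W_ff @ x_in
r = np.maximum(h - np.quantile(h, 1 - a), 0.0)
```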

Comparisons across the overall sequence of 4 stages, as illustrated in Fig. 1, indicate that the details of the storage process have a surprisingly limited quantitative effect on the retrieval performance of an attractor network, and they do not fundamentally alter the capacity for retrieval of an associative mechanism based on modifying synaptic weights. In fact, the enhanced performance of the "self-organized" networks, i.e. the last two columns in Fig. 1, is entirely due to the chosen mismatch between a=0.2 and a_input=0.3, which is better addressed by threshold setting before storage, rather than after. Whichever the storage process, connectivity is the main determinant of retrieval capacity. With our parameters, replacing the cortically implausible binary patterns, assumed in the early analyses of the Hopfield model, with continuous ones leads to higher capacity, because it facilitates sparsification at retrieval (Treves and Rolls, 1991). Capacity becomes even higher when allowing already during storage for sparsification through threshold setting (which can be implemented in spiking networks via feedback inhibition; Battaglia and Treves, 1998), and it is maintained if the inputs are not provided directly to individual units in the network, but fed by an input layer and relayed through plausible connectivity.

2.2. Time scales for getting in and out of attractor dynamics

In the comparisons above, retrieval occurred in the absence of neural fatigue and other adaptation processes, hence making the attractor states genuine long-term steady states of the system. In the cortex, a variety of adaptation effects alter the biophysics of neurons and synapses over a range of time scales, as a function of their recent activity, making the idea of a steady state a purely theoretical construct (Gros, 2007). Is the notion of dynamical attractor still relevant?

A theoretical analysis of the dynamics of a spiking recurrent network suggests that it is. The time scale for the convergence of network activity to a steady state was found to be determined primarily by excitatory synaptic conductance inactivation times, with only a very minor modulation by the prevailing membrane time constants and inverse firing rates (Treves, 1993). An unsophisticated model of neuronal firing rate adaptation can be included in the spiking network model without altering the convergence to steady state, which to a large extent occurs before adaptation has exerted any major effect; the "steady" state is simply not steady, and firing rates decrease, possibly leading to a rapid transition to another network state, after some time (Fig. 4), once the boundaries of the relevant basin of attraction for the fast dynamics are reached from the inside, that is, once an escape route has been found. A subsequent analysis demonstrated that "sliding" to the bottom of the basin of attraction can be even faster, essentially instantaneous, if the network receives an orienting afferent input when primed in a balanced state, in which it is most sensitive to fluctuations, since they can grow very rapidly (Tsodyks and Sejnowski, 1995). Computer simulations of an auto-associative network of simple integrate-and-fire units indicate that the two regimes may correspond to two different phases of convergence (Fig. 2). The match between the network state and the stored memory, quantified by an information measure, initially rises with an exponential ("mean-field") approach to a provisionally stable state, with a time constant which in the simulations turns out to be approximately 2.5 times the synaptic inactivation time constant. The state reached when the cue is still active is however distorted, because the cue is only partial, and as soon as the cue is removed the information measure rises again, this time instantaneously; the apparent slope stems in fact solely from the resolution of the algorithm. This is a non-mean-field, fluctuation-driven, switch-like event.

Can a similar time course of convergence to the attractor, including perhaps the two steps seen in the computer simulations of Fig. 2, be observed also in the cortex? Since a dynamical attractor could not possibly be a steady state in the real brain, one cannot see the convergence to a plateau as in Fig. 2; but one may use cues with variable degrees of noise, and test whether neuronal responses eventually become noise-invariant (Amit et al., 1997). Since neurons in inferior temporal (IT) cortex tend to respond to complex visual stimuli, one can probe their response dynamics by using, instead of variable noise levels, stimuli that are morphed or interpolated between pairs of familiar images. These stimuli can be conceived as generating cues, to putative attractor networks in IT, with variable levels of correlation to the two familiar images, which hence may have a memory representation. While one can set a precise stimulus onset and offset, cue onset and cue removal are not well defined, as they depend on the dynamics of neural circuits preceding IT in the information flow. In the study by Akrami et al. (2009), averages of about 100 single-unit responses tended to peak ~120 ms following stimulus onset, with firing rates that depended almost linearly on the degree of morph (Fig. 3, left). The responses then declined, but remained above baseline for several hundred milliseconds. This sustained component remained linearly dependent on morph level for stimuli more similar to the less effective of the two images, but progressively converged to a single response profile, independent of morph level, for stimuli more similar to the more effective image. Thus, these neurons appear to demonstrate the convergence posited by network models of attractor dynamics, gradual like the initial approach to the attractor state in Fig. 2. No instantaneous switch, like the second step in Fig. 2, can be observed, possibly because cue removal (the subsiding afferent excitation from neurons upstream) is neither sharp nor reproducibly timed, and because adaptation effects are already operating to bring down the average firing rates (note that individual units show an extreme variety of firing time courses, see Akrami et al., 2009).

A simple model of neuronal firing rate fatigue in fact allows one to interpret the results of the neurophysiological recordings in monkeys (Fig. 3, right). The model suggests that convergent dynamics could be produced by attractor states and firing rate adaptation within the population of IT neurons. The convergence may be relatively rapid, occurring over less than a hundred milliseconds, but delayed until stimulus offset, as the cue effectively clamps network activity in a state of partial correlation with the attractor, as seen before cue removal in Fig. 2.

The model-based interpretation of the recordings in Fig. 3 (see Akrami et al., 2009) indicates that the activity configuration of an attractor network depends on the balance, in the instantaneous input current that enters each unit, between the contribution of afferent inputs, relaying the cue, and that of recurrent connections with memory-structured weights. The competition between feed-forward and recurrent inputs determines the network trajectory in its phase-space: whether to fully converge to the memory attractor, or to stay close to the activity distribution dictated by the external input. A network with stronger recurrent weights can evolve faster towards its asymptotic state. When the external input is totally washed away, the asymptotic state is the same for any finite value of recurrent strength, above the minimum required to access it.

Fig. 2 – A rapid initial approach to an attractor state, well described by mean-field theory, is adjusted almost instantaneously once the noisy cue is removed. The graph is adapted from Fig. 3 of Battaglia and Treves (1998), and the legend details the synaptic inactivation time constants used in 5 different sets of simulations of a recurrent spiking network (each curve averages all runs in a set). The thick black line indicates the time when a noisy cue current (with a 0.3 correlation to the attractor state) is applied to some of the units in the network, and then suddenly removed. No firing rate adaptation is included in the model.

Brain Research 1434 (2012) 4–16


2.2.1. Latching dynamics

If the external input is exactly half-way between two attracting states, the network can be kept in equilibrium until the cue begins to weaken, and a fluctuation tips the balance one way or the other. This is shown in the simulation of Fig. 4a, where a fluctuation favoring one of the two cued memories, visible already after about 100 simulated milliseconds, grows as the external input weakens, and almost reaches full correlation with the memory itself. By that time, however, the units that had been active from the beginning are turned inactive by fatigue, and the network temporarily stabilizes into a (different) state with intermediate overlap, until the units activated later are also turned off. At that time the network ceases to show any overlap with either attractor state; activity however persists in some of the units. This is an uncorrelated network state, akin to that described by Amit and Brunel (1997). In our simulation, given the continuing (uncorrelated) drive to its units, it is unstable. Then, around 500 ms into the simulation, a correlation with one of the two memory patterns develops spontaneously, because of the instability of the uncorrelated state. The correlation stays up until turned off by firing rate adaptation, by which time the other attractor state turns on, and so it proceeds in a flip-flop fashion. The protracted uncorrelated period and the alternation between the two attractor states are due to their being the only memories stored in the network. In Fig. 4b, with 6 memories, the network does not sit in the uncorrelated state, but it still ends up alternating between two attractors, likely slightly more correlated or anti-correlated between themselves than with the others (see Russo et al., 2008), one of which happens to be one of the two cued ones. Finally, in Fig. 4c, with 15 memories, the network hops freely from attractor to attractor, once each is made unstable by firing rate adaptation: the network then shows latching dynamics. Note that latching is less regular than alternation, and each latching transition, which can be conceptualized as the decaying attractor acting as an internal cue for the next, follows a somewhat different time course.

2.3. Latching dynamics at a cortical scale

In a single auto-associative memory network, as discussed above, latching dynamics are observed if either the uncorrelated

Fig. 3 – Convergent average single-unit responses to morphed stimuli, seen in monkey IT cortex (left), are reproduced in an attractor network model including firing rate adaptation (right). The experimental study and the model are described in detail by Akrami et al. (2009). In the model, firing rate adaptation is introduced as a difference of exponentials, with time constants 50 and 100 ms (see Experimental procedures). The cue is removed gradually and at somewhat different times for different units (the gray dashed line is the average time course) to mimic the neural response to stimulus offset at 312 ms post onset (black bar). Attractor convergence is gradual, both in the real data and in the model, and the rapid switch demonstrated in the simulations in Fig. 2 is not observed, likely because of the gradual subsiding of the input cue.

Fig. 4 – Latching dynamics can follow cue-driven retrieval, if a network is unstable in its uncorrelated state. Increasing the memory load of a small recurrent network from 2 to 15 memories makes the spontaneous dynamics, which follow the successful or aborted retrieval elicited by an ambiguous cue (thin gray line), turn from a simple alternation into a diverse hopping sequence, from attractor to attractor. The network is the same as the one in Fig. 3, right, only half its size, N = 500, with the same firing rate adaptation model, and it is externally assigned binary patterns.

state is unstable (in which case a very small random fluctuation, correlated with one of the memory patterns, can be amplified and cue its retrieval in an unpredictable manner), or else if the uncorrelated state is stable and has a finite basin of attraction of its own, but the memory patterns are strongly correlated with each other, so that the decaying pattern itself cues the next one. While the first type of transition can be regarded as induced by noise, the second type explicitly reflects the correlational structure of the stored memories. For both types, the first units to change state in a self-reinforcing manner are some among the inactive ones, which are activated by the latching cue.

Similar latching dynamics have been described for the Potts auto-associative network (Kanter, 1988), which is intended to model a network of many local auto-associative networks in the cortex (O'Kane and Treves, 1992a; Fulvi Mari and Treves, 1998). The Potts model reduces each local network to a single unit of a particular type, a Potts graded-response variable, which can take graded activation values 0<σk<1 in S different states, k=1,…,S, as well as remain in the inactive state to a degree σ0=1−Σkσk. It offers the advantage of simplifying the analysis of auto-association mechanisms by removing local dynamics and focusing attention on global dynamics. A global cortical activity pattern is interpreted by the model as the composition of several active states, each expressed in a cortical patch or small network, whose internal dynamics are not described, except by means of the collective variables σk. Other, non-active small networks are taken to be in the inactive state, that is, σ0≈1. The retrieval capacity of a non-adapting version of the Potts model has been analysed in detail (Kropff and Treves, 2005). When endowed with a model of firing rate adaptation, the Potts network exhibits latching dynamics with the same types of transitions, between uncorrelated and between correlated memory patterns, as seen in a model of a single cortical patch (Russo et al., 2008). In addition, the Potts network can exhibit "pathological" oscillations between highly correlated patterns, which can also be observed in models of a single local network.
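A Potts unit's state can be represented as a vector of S graded activations together with the derived degree of inactivity. A minimal sketch in Python (the function name and the rescaling rule for over-full activation are our own choices, not the paper's):

```python
import numpy as np

def potts_state(activations):
    """Clip S graded activations sigma_k to be non-negative and derive the
    degree of inactivity sigma_0 = 1 - sum_k sigma_k; if the summed
    activation exceeds 1, it is rescaled (a choice of ours, to keep the
    constraint 0 <= sigma_0 <= 1)."""
    sigma = np.clip(np.asarray(activations, dtype=float), 0.0, None)
    total = sigma.sum()
    if total > 1.0:
        sigma = sigma / total
        total = 1.0
    return sigma, 1.0 - total

sigma, sigma0 = potts_state([0.2, 0.1, 0.0, 0.05, 0.0, 0.0])  # S = 6 states
```

Each local network is thus summarized by a single composite variable, with the constraint σ0 + Σk σk = 1 holding by construction.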

To introduce firing rate adaptation in a network in which single neuron-like units are not represented, two distinct processes are summarily described as activity-dependent thresholds. One, driven with time constant τ2 by the activation σk of each state, represents fatigue specifically in the neurons active in that state; the input activation feeding into that state is then compared to that specific threshold. The other, driven with time constant τ3 by the summed activation Σkσk across states, represents overall resource consumption at the local patch level, as well as slow non-specific inhibition; a general activity-dependent term is then added to the fixed (activity-independent) common threshold of each Potts unit. Neuronal dynamics is taken to evolve more rapidly than adaptation effects, at a characteristic time scale τ1, at which input activations are reflected into the corresponding activation values of each state. It is at this rapid time scale that, as discussed above, attractor dynamics take place (a different model, in which the time constant τ3 is short and represents fast inhibition, is discussed in a thorough analysis by Russo and Treves, now in preparation). In the Potts auto-associative model, the positive feedback characteristic of attractor dynamics is represented by a self-reinforcing term, of strength w, with which each activation σk feeds into itself.

When τ1<<τ2<τ3, several different phases characterize latching dynamics (Russo and Treves, unpublished results). If w is small, no latching occurs, unless the inactive state of each Potts unit is artificially made unstable by low or even negative thresholds, in which case latching is among uncorrelated states. As w grows, the network enters a phase of phasic latching, in which it latches for only a few transitions, between correlated memories, followed by a tonic phase in which it latches until the generic thresholds prevent further transitions, hence for a time of order τ3. If w is even larger, latching can be sustained for an indefinitely long time, as the generic thresholds become ineffective at stopping it. In addition, several other parameters affect latching duration, and in particular a

Fig. 5 – Global latching in the Potts network can terminate abruptly after a few transitions or carry on for a long time, irrespective of the typical interval between individual transitions, which is regulated by the time constant τ2. In the simulations exemplified here, a network of 300 Potts units, each connected to 70 others, has been structured with p global memory patterns, p=70 above and p=90 below. S=6, τ1=16 ms, τ2=500 ms, while τ3=250 s is effectively irrelevant, as latching here is of a phasic nature.

higher memory load leads to longer latching, as illustrated in Fig. 5.

As for a local auto-associative network, each latching transition in the global Potts network is initiated by the self-reinforcing activation of some of the units previously in the quiescent state. It can be shown (Russo and Treves, unpublished) that these units are those with the lowest threshold to be activated into the next global pattern, and their activation triggers a cascade of positive feedback that culminates in the "flip" into another attractor state, although in practice the transition can be much messier than a clean flip, as Fig. 5 illustrates.

3. Discussion

Although the notion of associative retrieval is firmly ensconced in our intellectual and literary awareness, with Proust's madeleine a much abused cliché, its contribution to shaping ideas about the cortical network operations underlying thought processes has remained unclear. Grand theories of conscious (Dehaene et al., 1998) or specifically linguistic (Hauser et al., 2002) processing seem compatible with an implementation in terms of associative mechanisms, but a precise correspondence has been obstructed by a number of factors.

First, most modeling studies focus on a single local network in which the memory states are prescribed and stored as given in the connection weights, hence implicitly expressing a supervised form of training. It has not been clear how well an auto-associative network can self-organize, in a substantially unsupervised manner, especially when embedded in a global network of cortical patches, remote from the sensory periphery and its clear-cut input signals.

Second, an early understanding of associative retrieval was based on mathematical network models evolving in discrete time steps, leading to the mistaken notion that attractor dynamics require many "iterations" and long times to unfold, and in the end reach a stable state that would be in stark contrast with the ever-changing nature, on a sub-second time scale, of cortical activity. How fast attractor dynamics would cascade across an extended network of cortical areas has been a particularly elusive issue to grasp, amid persistent notions of cortical hierarchies that would have to be traversed in strict sequence.

Third, while it is clear what associative memory retrieval from a cue provided by a sensory stimulus is, the notion of internally generated cues has remained rather fuzzy, particularly in the scenario which envisages parallel tumultuous processing of multiple internal cues at distinct locations in the cortex. It appears unlikely that spontaneous and seemingly haphazard processing of that sort can submit to the orderly rigidity of rational thought; at most, it would seem, it may subserve intuition and day-dreaming.

A series of modeling studies, which we have partly reported and partly reviewed here, contributes somewhat to bridging the gap between the neuronal and the cognitive level, by addressing the first, the second and the third issue above. The resulting perspective, while abstract, is consistent with a number of disparate observations.

Anatomically, Braitenberg (1978) observed long ago that long-range cortical connections, which he denotes as the A system, do not appear to express a qualitatively distinct functional role from the B system of local connections among pyramidal cells, connections that do not leave the gray matter. Although the A system is more structured, its biophysical and biochemical properties seem broadly the same, and the A and B systems have been modeled as implementing the same auto-associative operations on two different spatial scales (O'Kane and Treves, 1992b). This leads to the speculation that global network operations may be reducible to the combination of multiple instances of a universal local operation, the "cortical transaction", implemented with distinct flavors along the wiring diagram relevant to distinct cognitive processes.

Neurophysiologically, associative processes appear suited to describe cross-modal interactions, whether studied with lesions in the rat (Winters and Reid, 2010), with single-unit recordings in monkeys (Rolls et al., 1996) or with imaging in humans (Goldberg et al., 2006).

Iterative associative retrieval, or latching dynamics as we have called it, endows analog neuronal processing with some of the digital character of symbolic manipulation, as illustrated in Figs. 4 and 5. Something similar occurs in the interesting model proposed by De Almeida et al. (2006). The discretization of a highly dimensional continuum of firing rate configurations is approached via attractor dynamics. As a quasi-discrete process, latching dynamics can be characterized as approximately recursive. Recursion, understood as the generation of infinite sequences of elements drawn from a finite alphabet, is an abstract and very loose notion. In the language domain, it has often been taken to refer to a number of specific expressions, like the embedding of clauses one inside the other in syntax; but attempts to identify language- or syntax-specific recursive neural network operations are likely to be ill-directed, as discussed elsewhere (Treves, 2005). More promising appears to be the approach taken forty years ago by David Marr (1970), who proposed to regard the cerebral cortex in terms of its ability to decode the outside world using memory and associative processes. Braitenberg and Schüz (1991) also concluded, from a series of anatomical observations, that in general terms the cortex must operate as an associative memory machine. The "synfire chain" model by Moshe Abeles (1982) and Abeles et al. (1993) gives associative retrieval a dynamical character, although over time scales that may be taken to be much shorter than those, related to firing rate adaptation, that pace latching dynamics. Pulvermuller (2002) has indeed observed that synfire chains, or objects of a similar nature, could be at the basis of the human language faculty, even though the mechanisms underlying syntactic operations remain to be identified.

Unlike the state of a single neuron in a local network, however, the activation of a Potts unit can be thought to correspond, extrapolating the model to the real cortex, to a meaningful local pattern of activation, which can in principle be decoded and perhaps lexicalized or verbally described. In other words, while the firing rates r of individual neurons underlie psychological thought processes in an implicit, non-transparent manner, for the variables σk of the Potts model there is potentially an explicit correspondence with surface phenomena of mental behavior.

In conclusion, the idea that a blind, uncontrolled process such as associative latching dynamics may be all there is to compositionality, and ultimately to cortical cognition, may sound disappointing. Latching dynamics, though, emerges naturally from the associative processes thought to be implemented pervasively throughout the cortex (Renart et al., 1999), and requires only the freely available ingredients of extensive connectivity, associative plasticity and, at most, firing rate adaptation. As discussed elsewhere, latching may extend to longer and longer times and become potentially infinite, and thus more distinctly recursive, with a suitably extended connectivity, with which humans appear to be equipped (Treves, 2005). If more sophisticated cortical mechanisms exist, to support rational cognition, it appears that they have not been identified yet.

4. Experimental procedures

The neurophysiological recording experiments, the spiking network simulations, the mean-field analysis and the Potts network study mentioned in the text are all described in detail in the publications cited. In this section, we describe the firing rate simulations providing the core original results reported here.

We have studied a simple but broad class of sparse, asymmetric random networks in which all specific connections are excitatory, and inhibition is provided by a global mechanism that simulates a single interneuron: it receives excitation from all primary neurons and it inhibits them all equally at the next time step. The network was adapted to make three comparisons (as sketched in Table 1), based on "pattern statistics", "pattern generation", and "architecture".

4.1. Pattern statistics

Each pattern of activity μ, with μ=1,…,p, is represented by a vector ξμ=(ξ1μ, ξ2μ, …, ξNμ), where ξiμ represents the level of activity of neuron i.

Continuous Distribution (CP): p uncorrelated graded activity patterns are generated using a common exponential distribution, obtained by setting for each input unit

ξiμ = −(1/2) log(1 − x/a)

if x < a, and ξiμ = 0 if x > a, where x is a random value with a uniform distribution between zero and one. Note that with this procedure

p(ξiμ) = a exp(−2ξiμ) + (1 − a) δ(ξiμ).    (1)

The sparseness a of the representation is defined as

a = (Σi ri / N)² / (Σi ri² / N).

Binary Distribution (BP): in this scheme, instead, the units are randomly set to zero or one according to the probability distribution

p(ξiμ) = a δ(ξiμ − 1) + (1 − a) δ(ξiμ).    (2)

Note that with either assignment, the "quenched" variables ξiμ are positive (or zero) and satisfy approximately the constraints ⟨ξ⟩=⟨ξ²⟩=a, where ⟨·⟩ stands for an average over their distribution p(ξ).
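The two pattern statistics, and the sparseness measure they are designed to satisfy, can be sketched numerically (a toy illustration in Python; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
N, a = 10000, 0.2

# Continuous Distribution (CP): with probability a, draw from the
# exponential tail xi = -(1/2) log(1 - x/a); otherwise xi = 0 (Eq. 1)
x = rng.uniform(0.0, 1.0, N)
xi_cp = np.zeros(N)
mask = x < a
xi_cp[mask] = -0.5 * np.log(1.0 - x[mask] / a)

# Binary Distribution (BP): 1 with probability a, 0 otherwise (Eq. 2)
xi_bp = (rng.uniform(0.0, 1.0, N) < a).astype(float)

def sparseness(r):
    """a = (sum_i r_i / N)^2 / (sum_i r_i^2 / N), as defined in the text."""
    return r.mean() ** 2 / (r ** 2).mean()
```

For binary patterns the measure reduces to the fraction of active units, so it should come out close to a = 0.2.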

4.2. Pattern generation

For this comparison, a 1-layer Hopfield-like network, in which the inputs are directly imposed on the recurrent layer (EA), is contrasted with an identical network that receives the very same memory patterns, except that during the storage phase, before setting the synaptic weights according to a "Hebbian" learning rule, the network reverberates activity along its recurrent connections (with weights scaled by a factor M; in Fig. 1 we set M=0, see below), and it regulates its common gain and threshold to reach a steady state (SO). To differentiate these two types of patterns, one may use the notation η for the patterns already reverberated, adjusted and ready to be stored, whereas ξ denotes the original patterns assigned with the statistics described by Eqs. (1), (2). In the case of EA, η=ξ, while for SO, η≠ξ.

4.3. Architecture

The network model comprises either 1 or 2 layers. The 1-layer network (1L) includes N units labeled by an index i=1…N. Each unit receives CRC recurrent collateral connections from other units. The probability that two units are connected does not depend on their indices.

In the 2-layer network (2L), the first layer functions as an input stage that projects afferent inputs to the second layer, in analogy to the input from earlier visual areas to more advanced areas. Units in the second layer receive inputs from the first layer, which includes another N=1000 units, as well as from units in the same layer. Each unit in the (output) patch receives CFF=350 feed-forward connections from the input array, and CRC recurrent collateral connections from other units in the patch. Both sets of connections are assigned to each receiving unit at random. Weights are originally set at a uniform constant value, to which a random component is added of similar mean square amplitude, to generate an approximately exponential distribution of initial weights onto each unit.
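The random assignment of connections can be sketched as drawing, for each receiving unit, a fixed number of presynaptic partners (the value of C_RC below is illustrative; the text fixes only C_FF = 350):

```python
import numpy as np

rng = np.random.default_rng(1)
N, C_FF, C_RC = 1000, 350, 300   # C_RC is an illustrative choice

def random_connections(n_pre, c, n_post, rng):
    """Assign each receiving unit c presynaptic partners drawn uniformly
    at random, independently of unit indices; returns a binary mask."""
    mask = np.zeros((n_post, n_pre), dtype=bool)
    for i in range(n_post):
        mask[i, rng.choice(n_pre, size=c, replace=False)] = True
    return mask

c_ff = random_connections(N, C_FF, N, rng)   # input array -> output patch
c_rc = random_connections(N, C_RC, N, rng)   # recurrent collaterals
```

Drawing without replacement fixes the in-degree of every unit exactly, while leaving the identity of the presynaptic partners fully random, as the text requires.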

4.4. The input and output of each unit

Once a pattern is imposed on the input layer, the activity circulates in the network for 100 simulation time steps, each taken to correspond to ca. 12.5 ms (Treves, 2004). Each updating of unit i amounts to summing all excitatory inputs as

hi = ηi^input + M Σj wij^rc rj^rc − b (1/N) Σk rk^rc    (1L)

hi = Σk wik^ff ηk^input + M Σj wij^rc rj^rc − b (1/N) Σk rk^rc    (2L)    (3)

The first two terms enable the memories encoded in the weights to determine the dynamics; the third term is unrelated to the memory patterns, but is an addition to the fixed threshold, designed to regulate the activity of the network, so that at any moment in time x ≡ (1/N) Σi ri and y ≡ (1/N) Σi ri² both approach the prescribed sparseness value a. The simulation assumes a threshold-linear activation function for each unit, r(t) = g (h(t) − Thr) Θ(h(t) − Thr), where Thr is the fixed threshold below which the input elicits no output, and g is a gain parameter. In the simulations, the induced activation in each unit is followed by a competitive algorithm that normalizes the mean activity of the (output) units, and also sets their sparseness to a constant a=0.2 (Treves and Rolls, 1991). The algorithm represents a combination of subtractive and divisive feedback inhibition, and operates by iteratively adjusting the gain g and threshold Thr+b of the threshold-linear transfer function. In Eq. (3), M can be any value between 0 and 1, and parametrizes the contribution of recurrent collaterals in driving the activity of each unit. As previously shown (Treves, 2004; Menghini et al., 2007), the best performance is obtained when collaterals are suppressed during training, in line with the Hasselmo argument about the role of cholinergic modulation of recurrent connections (Barkai and Hasselmo, 1994). In the simulations reported in Fig. 1, M=0 during learning, for simplicity, in all models, and M=1 during testing; in Fig. 6 we show the effect of increasing M during learning, which effectively leads, if collaterals are not strongly suppressed, to reduced storage capacity.

4.5. Synaptic weights

Recurrent connections, which are the storage site for the memory patterns, have their baseline weight modified according to a covariance "Hebbian" learning rule:

wij = (1/(Ca)) Σμ=1…p cij ξiμ (ξjμ − ξ̄)    (4)

where cij is a binary variable equal to 1 if there is a connection running from neuron j to neuron i, and 0 otherwise, and ξ̄ is the mean activity of unit j over all memory patterns.
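Eq. (4) can be sketched in vectorized form (a toy illustration; normalizing by each unit's own connection count C is our reading of the 1/(Ca) prefactor):

```python
import numpy as np

def hebbian_weights(xi, c, a):
    """Eq. (4): w_ij = (1/(C a)) sum_mu c_ij xi_i^mu (xi_j^mu - xi_bar),
    with xi of shape (p, N), c a binary connectivity mask (c_ij = 1 if a
    connection runs from j to i), and xi_bar the mean presynaptic
    activity over patterns."""
    C = c.sum(axis=1, keepdims=True).clip(min=1)   # connections per unit
    xi_bar = xi.mean(axis=0)
    return (xi.T @ (xi - xi_bar)) * c / (C * a)

xi = np.array([[1.0, 0.0],
               [0.0, 1.0]])          # p = 2 patterns, N = 2 units
c = np.array([[0.0, 1.0],
              [1.0, 0.0]])           # binary mask, no self-connections
w = hebbian_weights(xi, c, a=0.5)
```

With these two anti-correlated patterns the covariance rule yields negative cross-weights, as each unit's activity predicts the other's silence.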

4.6. Overlaps

One of the relevant order parameters measuring the quality of retrieval is the overlap between the microscopic state of the network and each pattern. To assess the storage capacity of our model, for each value of p we gave the trained network a full cue, corresponding to one of the stored patterns, and after 100 synchronous updates we measured the final overlap of the network state with the presented pattern, i.e. the cosine of the two population vectors. If the final overlap was larger than 0.8, retrieval was deemed successful. Repeating this process for 10 different seeds of the random number generator and p different patterns, the average ratio of successful retrievals is plotted in Fig. 1. Note that in Fig. 5, overlaps are instead defined by subtracting the average activation of each Potts state, so they can be negative.

4.7. Implementation of firing rate decay (frequency adaptation)

We implemented adaptation by subtracting from the input activation of each unit a term proportional to the recent activation of the unit. The term is a difference of two exponentials with different time constants:

ri(t) = g [hi(t) − α τ1τ2/(τ2 − τ1) (ri,2(t) − ri,1(t))]

ri,1(t) = ri,1(t − 1) exp(−1/τ1) + ri(t − 1)

ri,2(t) = ri,2(t − 1) exp(−1/τ2) + ri(t − 1)

where, in units of time steps (note the ms used in the figures), the time constants are set as τ1=4, τ2=8 and α=0.02. The input to each unit is then affected by its firing rate at all previous time steps. The exponential decay makes its activity at the last time step more influential than all others. The difference of the two exponentials means that the effect of adaptation appears only after the second iteration. Note that this formulation reduces the effectiveness of adaptation when t is small.
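The adaptation scheme translates directly into a discrete-time update (a sketch; the rectification at zero is our own addition for stability, and the fixed threshold Thr is omitted):

```python
import numpy as np

def step_adaptation(r_prev, r1_prev, r2_prev, h, g=1.0,
                    alpha=0.02, tau1=4.0, tau2=8.0):
    """One discrete-time update of the adapting firing rate:
       r1(t) = r1(t-1) exp(-1/tau1) + r(t-1)
       r2(t) = r2(t-1) exp(-1/tau2) + r(t-1)
       r(t)  = g (h(t) - alpha tau1 tau2/(tau2 - tau1) (r2(t) - r1(t)))
    rectified at zero."""
    r1 = r1_prev * np.exp(-1.0 / tau1) + r_prev
    r2 = r2_prev * np.exp(-1.0 / tau2) + r_prev
    adapt = alpha * tau1 * tau2 / (tau2 - tau1) * (r2 - r1)
    r = max(g * (h - adapt), 0.0)
    return r, r1, r2

r, r1, r2 = 0.0, 0.0, 0.0
rates = []
for _ in range(20):
    r, r1, r2 = step_adaptation(r, r1, r2, h=1.0)
    rates.append(r)
# the rate holds at g*h for two steps, then sags as adaptation builds up
```

Because the two auxiliary traces start equal, their difference only becomes non-zero at the third update, which reproduces the delayed onset of adaptation noted in the text.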

Acknowledgments

We are indebted to the late Daniel Amit, to Edmund Rolls, Francesco Battaglia, Emilio Kropff, Bharathi Jagadeesh and other colleagues for their contributions to a series of related studies.

Fig. 6 – Increasing the weight of recurrent connections during storage leads to a capacity decrease, and to more sensitivity to noise. Here the third diagram in the lower row of Fig. 1, for the 1-layer "self-organizing" network with M=0, is contrasted with the equivalent diagrams for M=0.1, 0.5 and 1.

References

Abeles, M., 1982. Local Cortical Circuits. Springer, New York.Abeles, M., Bergman, H., Margalit, E., Vaadia, E., 1993.

Spatiotemporal firing patterns in the frontal cortex ofbehaving monkeys. J. Neurophysiol. 70, 1629–1638.

Akrami, A., Liu, Y., Treves, A., Jagadeesh, B., 2009. Convergingneuronal activity in inferior temporal cortex during theclassification of morphed stimuli. Cereb. Cortex 19 (4),760–776.

Amit, D.J., 1995. The Hebbian paradigm reintegrated: localreverberations as internal representations. Behav. BrainSci. 18, 617–657.

Amit, D.J., 1989. Modeling Brain Function. Cambridge UniversityPress, New York.

Amit, D.J., Brunel, N., 1997. Model of global spontaneous activityand local structured activity during delay periods in thecerebral cortex. Cereb. Cortex 7, 237–252.

Amit, D.J., Gutfreund, H., Sompolinsky, H., 1985. Storing infinitenumbers of patterns in a spin-glass model of neural networks.Phys. Rev. Lett. 55 (14), 1530–1533.

Amit, D.J., Gutfreund, H., Sompolinsky, H., 1987. Informationstorage in neural networks with low levels of activity. Phys.Rev. A 35 (5), 2293–2303.

Amit, D.J., Fusi, S., Yakovlev, V., 1997. Paradigmatic workingmemory (attractor) cell in IT cortex. Neural Comput. 9 (5),1071–1092.

Barkai, E., Hasselmo, M.E., 1994. Modulation of the input/outputfunction of rat piriform cortex pyramidal cells. J. Neurophysiol.72 (2), 644.

Battaglia, F.P., Treves, A., 1998. Stable and rapid recurrentprocessing in realistic autoassociative memories. NeuralComput. 10, 431–450.

Braitenberg, V., 1978. Cortical architectonics: general and arealIn: Brazier, M.A.B., Petsche, H. (Eds.), Architectonics of theCerebral Cortex. Raven, New York, pp. 443–465.

Braitenberg, V., Schüz, A., 1991. Anatomy of the Cortex.Springer-Verlag, Berlin.

Cerasti, E., Treves, A., 2010. How informative are spatial CA3representations established by the dentate gyrus? PLoSComput. Biol. 6, e1000759.

Crisanti, A., Sompolinsky, H., 1987. Dynamics of spin systemswith randomly asymmetric bonds: Langevin dynamics and aspherical model. Phys. Rev. A 36 (10), 4922–4939.

De Almeida, R.M.C., Espinosa, A., Idiart, M.A.P., 2006.Concatenated retrieval of correlated stored informationin neural networks. Phys. Rev. E 74, 041912.

Dehaene, S., Kerszberg, M., Changeaux, J.P., 1998. A neuronalmodel of a global workspace in effortful cognitive tasksProc. Natl. Acad. Sci. U. S. A. 95, 14529–14534.

Derrida, B., Pomeau, Y., 1986. Random networks of automata: asimple annealed approximation. Europhys. Lett. 1, 45–49.

Derrida, B., Gardner, E., Zippelius, A., 1987. An exactly solubleasymmetric neural network model. Europhys. Lett. 4,167–174.

Domany, E., Kinzel, W., Meir, R., 1989. Layered neural networks.J. Phys. A 22, 2081–2102.

Franco, L., Rolls, E.T., Aggelopoulos, N.C., Jerez, J.M., 2007.Neuronal selectivity, population sparseness, and ergodicity inthe inferior temporal visual cortex. Biol. Cybern. 96 (6),547–560.

Fulvi Mari, C., Treves, A., 1998. Modeling neocortical areas with amodular neural network. Biosystems 48, 47–55.

Goldberg, R.F., Perfetti, C.A., Schneider, W., 2006. Perceptualknowledge retrieval activates sensory brain regions. J. Neurosci.26, 4917–4921.

Gros, C., 2007. Neural networks with transient state dynamics.New J. Phys. 9 art.109.

Gutfreund, H., Reger, J.D., Young, A.P., 1988. The nature ofattractors in an asymmetric spin glass with deterministicdynamics. J. Phys. A Math. Gen. 21, 2775–2797.

Hasselmo, M., Schnell, E., Barkai, E., 1995. Dynamics of learningand recall at excitatory recurrent synapses and cholinergicmodulation in rat hippocampal region CA3. J. Neurosci. 15,5249–5262.

Hauser, M.D., Chomsky, N., Fitch, W.T., 2002. The faculty oflanguage: what is it, who has it, and how did it evolve? Science298, 1569–1579.

Hikosaka,O.,Nakahara,H., Rand,M.K., Sakai, K., Lu,X.,Nakamura,K.,Miyachi, S., Doya, K., 1999. Parallel neural networks for learningsequential procedures. Trends Neurosci. 22 (10), 464–471.

Hopfield, J.J., 1982. Neural networks and physical systems withemergent collective computational abilities. Proc. Natl. Acad.Sci. U. S. A. 79, 2554–2558.

Kanter, I., 1988. Potts-glass models of neural networks. Phys. Rev. A 37, 2739–2742.

Kawato, M., 1999. Internal models for motor control and trajectory planning. Curr. Opin. Neurobiol. 9, 718–727.

Kree, R., Zippelius, A., 1987. Continuous-time dynamics of asymmetrically diluted neural networks. Phys. Rev. A 36 (9), 4421–4427.

Kropff, E., Treves, A., 2005. The storage capacity of Potts models for semantic memory retrieval. J. Stat. Mech. Theory Exp. P08010.

Marr, D., 1970. A theory for cerebral neocortex. Proc. R. Soc. Lond. B 176, 161–234.

Menghini, F., van Rijsbergen, N.J., Treves, A., 2007. Modelling adaptation aftereffects in associative memory. Neurocomputing 70, 2000–2004.

O'Kane, D., Treves, A., 1992a. Why the simplest notion of neocortex as an autoassociative memory would not work. Network 3, 379–384.

O'Kane, D., Treves, A., 1992b. Short and long range connections in autoassociative memory. J. Phys. A 25, 5055–5069.

Papp, G., Treves, A., 2008. Network analysis of the significance of hippocampal subfields. In: Mizumori, S.J.I. (Ed.), Hippocampal Place Fields: Relevance to Learning and Memory. Oxford University Press, New York, pp. 328–342.

Plaut, D.C., 1999. A connectionist approach to word reading and acquired dyslexia: extension to sequential processing. Cogn. Sci. 23, 543–568.

Pulvermüller, F., 2002. A brain perspective on language mechanisms: from discrete neuronal ensembles to serial order. Prog. Neurobiol. 67, 85–111.

Renart, A., Parga, N., Rolls, E.T., 1999. Associative memory properties of multiple cortical modules. Network 10, 237–255.

Rolls, E.T., Tovee, M.J., 1995. Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. J. Neurophysiol. 73, 713–726.

Rolls, E.T., Treves, A., 1998. Neural Networks and Brain Function. Oxford University Press, Oxford.

Rolls, E.T., Critchley, H., Mason, R., Wakeman, E.A., 1996. Orbitofrontal cortex neurons: role in olfactory and visual association learning. J. Neurophysiol. 75, 1970–1981.

Roudi, Y., Hertz, J., 2011. Dynamical TAP equations for non-equilibrium Ising spin glasses. J. Stat. Mech. Theory Exp. P03031.

Russo, E., Namboodiri, V.M.K., Treves, A., Kropff, E., 2008. Free association transitions in models of cortical latching dynamics. New J. Phys. 10 (1), 015008.

Sompolinsky, H., 1986. Neural networks with non-linear synapses and a static noise. Phys. Rev. A 34, 2571.

Sompolinsky, H., Kanter, I., 1986. Temporal association in asymmetric neural networks. Phys. Rev. Lett. 57 (22), 2861–2864.

Tanji, J., 2001. Sequential organization of multiple movements: involvement of cortical motor areas. Annu. Rev. Neurosci. 24, 631–651.

Brain Research 1434 (2012) 4–16

Treves, A., 1990. Threshold-linear formal neurons in auto-associative nets. J. Phys. A 23, 2631–2650.

Treves, A., 1993. Mean-field analysis of neuronal spike dynamics. Netw. Comput. Neural Syst. 4, 259–284.

Treves, A., 2004. Computational constraints between retrieving the past and predicting the future, and the CA3–CA1 differentiation. Hippocampus 14, 539–556.

Treves, A., 2005. Frontal latching networks: a possible neural basis for infinite recursion. Cogn. Neuropsychol. 21, 276–291.

Treves, A., Rolls, E.T., 1991. What determines the capacity of autoassociative memories in the brain? Network 2, 371–397.

Tsodyks, M.V., Sejnowski, T., 1995. Rapid state switching in balanced cortical network models. Netw. Comput. Neural Syst. 6, 111–124.

Winters, B.D., Reid, J.M., 2010. A distributed cortical representation underlies crossmodal object recognition in rats. J. Neurosci. 30, 6253–6261.

Wolpert, D.M., 1997. Computational approaches to motor control. Trends Cogn. Sci. 1 (6), 209–216.