

A Pulse-Coupled Neural Network as a Simplified Bottom-Up Visual Attention Model

Marcos Quiles, Roseli Romero, and Liang Zhao
Department of Computer Science

Institute of Mathematics and Computer Science
University of São Paulo - São Carlos, SP, Brazil

{quiles, rafrance, zhao}@icmc.usp.br

Abstract

This work presents a bottom-up visual attention model based on a Pulse-Coupled Neural Network for scene segmentation. Each object in a given scene is represented by a synchronized pulse train, while different objects fire at different phases. Taking this into account, the model focuses on one object at a time. Using this model, the limit of linear non-separability can be easily overcome, and computer simulations show its good performance.

1 Introduction

The development of computer vision systems still proves to be a difficult task for computer scientists, due to all the steps involved as well as the complexity and the great amount of data that needs to be analysed at the same time. To reduce the amount of incoming visual data obtained from the environment, the human visual system is able to select only the relevant information necessary to perform tasks such as object recognition [16]. Visual attention is the mechanism that humans and other biological systems use to select relevant visual information [15].

Selective visual attention is used by biological systems to optimize their processing capacity. Selective attention is responsible for identifying the part of the visual input where processing is performed while, at the same time, suppressing irrelevant visual information [2]. Selective visual attention is generated by a combination of information from the retina and early visual cortical areas (bottom-up attention) as well as feedback signals from areas outside the visual cortex (top-down attention) [8].

Bottom-up attention is driven by simple features extracted from the image, such as intensity, stereo disparity, colour, orientation, and others. All this information is combined to create a saliency map, which represents the points of interest in the visual input. Top-down attention signals are responsible for biasing the competition among the points generated by the saliency map. This information can come, for example, from a visual search for a specific object.

Because of this biological evidence, selective visual attention can be used to develop artificial vision systems [2]. In this case, selective visual attention is used to reduce the amount of incoming data by selecting only part of the visual information for further processing, improving the performance or efficiency of the system.

Several models of visual attention have been proposed, and they can be divided into two groups. The first belongs to the computational neuroscience area, where the computational models are used to simulate and understand biological systems [5, 4, 8].

The second group is related to the computer vision area, where the models are applied to reduce the amount of data analysed by the system [16, 10]. In this case, the models do not need to have strong biological plausibility.

The model proposed in this paper belongs to the second group. Two aspects motivate the present work. The first is to explore the parallel architecture typically observed in neural network models, and the second is to use the developed model to tackle nonlinearly separable problems, such as the double spiral problem [3].

In this paper, a visual attention model based on a Pulse-Coupled Neural Network is proposed. The Pulse-Coupled Neural Network (PCNN) is a neural network composed of spiking neurons. It is based on the linking-field neural network model proposed by Eckhorn [6]. The linking-field neural network was designed as a minimal model to explain the experimentally observed synchronous feature-dependent activity of neural assemblies over large cortical distances in the cat cortex [9]. Considering that this neural network was proposed to study the cat visual cortex, that is, the part of the brain that processes the information deriving

Proceedings of the Ninth Brazilian Symposium on Neural Networks (SBRN'06), 0-7695-2680-2/06 $20.00 © 2006


Figure 1. The PCNN Neuron [9]

from the eye, Johnson developed the PCNN model, which was applied to image processing [9].

When compared to other neural network models, the PCNN has some differences:

• PCNN does not require a learning process;

• The PCNN architecture is simpler than that of other neural networks: it has just one layer, which works as both an input and an output layer;

• The PCNN has only one laterally connected layer of neurons, where each neuron is related to a pixel of the input image.

Many researchers have extended Johnson's work, applying this neural network model to several applications: image fusion [1], image segmentation [12], foveation [11], image shadow removal [7], contour and motion matching [17], clustering [14], etc.

In this work, a bottom-up visual attention model based on a Pulse-Coupled Neural Network, applied to nonlinear object separation, is proposed. In this context, the proposed visual attention model is used to perform a temporal scene segmentation of the objects in the visual field, i.e., the attention mechanism highlights one of the several objects present in the image at a given instant t.

This paper is organized as follows. In Section 2, a brief description of the PCNN model proposed by Johnson [9] is presented. Afterwards, Section 3 describes the proposed modifications of the PCNN model. The experiments performed are presented in Section 4. Finally, Section 5 concludes the article and suggests future work.

2 The Pulse-Coupled Neural Network

The Pulse-Coupled Neural Network was first defined by Johnson [9] using the neuron model proposed by Eckhorn et al. [6]. The neuron, presented in Figure 1, consists of three parts: the dendritic tree, the linking modulation, and the pulse generator.

The dendritic tree consists of two main branches: the Feeding input F_j and the Linking input L_j, described by equations (1) and (2) respectively. The Linking input is a weighted sum of the input signals arriving from the neighbouring neurons, and the Feeding input is a weighted sum of the input signals arriving from the neighbouring neurons plus an additional input I_j. The inputs (synapses) can be visualized as leaky integrators (RC circuits). These synapses are charged by pulses: their output amplitudes rise steeply according to the amplitude-gain factor, followed by an exponential decay determined by the decay time constants (Feeding decay α^F_kj and Linking decay α^L_kj). In most implementations for image processing, these synapses are charged to a constant value; that is, when a synapse receives a pulse, it is instantaneously set to this constant and does not rise steeply. The decaying signal is responsible for maintaining a prolonged "post-synaptic" activity of an input signal (a persistent signal [1]).

The decay time constants α^F and α^L and the amplitude-gain factors characterize the signals. The feeding inputs have a smaller decay time constant than the linking inputs, which makes the feeding signal more persistent than the signals of the linking inputs.

The feeding and the linking inputs are described respectively by:

F_j = Σ_k F_kj = Σ_k [M_kj exp(−α^F_kj t)] ∗ Y_k(t) + I_j    (1)

L_j = Σ_k L_kj = Σ_k [W_kj exp(−α^L_kj t)] ∗ Y_k(t)    (2)

where M_kj and W_kj are the weights¹ for the feeding and the linking inputs, Y_k(t) is the input signal (spike) arriving from the k-th neuron, α^F_kj and α^L_kj are the decay time constants of the feeding and linking inputs respectively, I_j is an external input, and ∗ is the convolution operator.

The linking modulation part is responsible for generating the total internal activity U_j of the neuron, denoted by:

U_j = F_j (1 + β_j L_j)    (3)

where β_j describes the linking strength. According to Broussard et al. [1], the feeding inputs are modulated by the linking inputs. This means that the internal activity U_j is always raised above the feeding input level when a linking input occurs [9]. The linking connections force nearby neurons with related characteristics to fire in unison. This effect is responsible for the synchronous activity observed in these neural networks.

¹ They are also called kernels [13].



The pulse generator also consists of a leaky integrator, which creates a dynamic threshold θ_j with decay time constant α^T_j. When U_j exceeds the threshold θ_j, the neuron fires. The spike of the neuron immediately charges the leaky integrator to a constant value θ_start. This raises θ_j to a high value, which prevents the neuron from firing again immediately after the first spike. The period in which the neuron cannot fire is called the refractory period.
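As a concrete illustration, the discrete-time update of a single PCNN neuron described in this section can be sketched as follows. This is a minimal sketch under our own naming conventions (the amplitude gains V_F, V_L and the parameter dictionary are ours), not the authors' implementation; the leaky integrators of Eqs. (1)-(3) and the pulse generator are discretized in the usual way, with decay implemented as multiplication by exp(−α).

```python
import math

def pcnn_neuron_step(F, L, theta, I, feed_pulses, link_pulses, p):
    """One discrete-time update of a single PCNN neuron.

    F, L, theta  -- current feeding input, linking input, and threshold
    I            -- external (pixel) input
    feed_pulses  -- weighted sum of spikes arriving at the feeding synapses
    link_pulses  -- weighted sum of spikes arriving at the linking synapses
    p            -- parameter dict: decay constants, gains, start threshold
    """
    # Leaky-integrator dynamics: exponential decay plus incoming pulses (Eqs. 1-2).
    F = F * math.exp(-p["alpha_F"]) + p["V_F"] * feed_pulses + I
    L = L * math.exp(-p["alpha_L"]) + p["V_L"] * link_pulses
    # Linking modulation: total internal activity (Eq. 3).
    U = F * (1.0 + p["beta"] * L)
    # Pulse generator: fire when the internal activity exceeds the threshold.
    fired = U > theta
    if fired:
        theta = p["theta_start"]                 # recharge: refractory period
    else:
        theta = theta * math.exp(-p["alpha_T"])  # threshold decays over time
    return F, L, theta, fired
```

Iterating this update over all neurons, with each neuron's spikes feeding its neighbours' `feed_pulses` and `link_pulses` on the next step, reproduces the pulse-train behaviour discussed above.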

3 The Modified Model

Due to the limitation of the original model in desynchronizing the different assemblies of neurons activated by different objects, we propose some modifications in order to overcome this limitation and to adapt the model to the visual attention problem:

Figure 2. The Modified PCNN

• The feeding inputs were modified to respond only to the external signal I, and not to the signals arriving from the neighbours:

F_j = I_j    (4)

• The weight matrix W, which represents the connection strength between neighbouring neurons i and j, has been defined as:

w_j,i = 1 / r_i,j    (5)

where r_i,j is the distance between neurons i and j.

• As described in Section 2, the proposed implementation does not consider the rise (governed by the amplitude-gain factor) of the inputs or of the threshold of the pulse generator. When a pulse arrives at an input, the input is instantaneously charged to a constant. The same happens to the threshold: when the neuron generates a pulse, the threshold is set to a start value θ_start. In other words, θ is not increased by a constant, but set to a constant value.

• The threshold θ is also influenced by the pulses that arrive at the linking input (Figure 2), i.e., for each pulse that arrives at the linking input from the neighbouring neurons, the threshold θ is decreased by a factor γ ∈ [0, 1], as given by the following equation:

θ = θ · γ    (6)

applied for every signal (spike) which arrives at the linking synapses.

The implementation of the neural network is performedas follows:

1. Set up the initial parameters of the network
2. Unmark all neurons
3. Advance all neurons by one time unit, t → t + 1
4. Check whether any unmarked neuron is able to fire. If a neuron is able to fire, mark it and insert it into a queue
5. Return to step 4 until all neurons able to fire have been marked
6. Remove a neuron from the queue
7. Propagate the signal of this neuron to all its neighbours
8. Check whether any neuron that receives the signal is able to fire; if so, mark it and put it in the queue
9. Return to step 8 until all neurons in the neighbourhood have been verified
10. Return to step 6 until the queue is empty
11. Return to step 2
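The queue-based steps above can be sketched as an event-driven loop. This is a minimal sketch with names of our own over a generic network: a single `potential`/`threshold` pair per neuron and a fixed `boost` per arriving pulse stand in for the full linking-input and threshold updates of Section 3.

```python
from collections import deque

def simulate_step(potential, threshold, neighbours, boost):
    """One outer iteration (steps 2-11): fire all eligible neurons and
    propagate their pulses until the queue is empty.

    potential, threshold -- dicts indexed by neuron id
    neighbours           -- dict: neuron id -> list of neighbouring ids
    boost                -- amount a pulse adds to a neighbour's potential
    Returns the set of neurons that fired during this time step.
    """
    marked = set()                      # step 2: unmark all neurons
    queue = deque()
    # Steps 4-5: mark every neuron already able to fire.
    for n in potential:
        if n not in marked and potential[n] > threshold[n]:
            marked.add(n)
            queue.append(n)
    # Steps 6-10: propagate pulses; firing can recruit neighbours,
    # which is the source of the synchrony within one object.
    while queue:
        n = queue.popleft()             # step 6: remove a neuron from the queue
        for m in neighbours[n]:         # step 7: propagate to neighbours
            potential[m] += boost       # pulse raises the neighbour's activity
            if m not in marked and potential[m] > threshold[m]:
                marked.add(m)           # step 8: neighbour recruited
                queue.append(m)
    return marked                       # step 11 would advance time and repeat
```

Because recruitment only spreads through the neighbour graph, neurons belonging to one connected object end up firing in the same step, while disconnected objects drift to different phases.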

Taking this implementation into consideration, several tests were performed, as described in the following section.

4 Experiments

In this section, we show some experimental results obtained by using the proposed model. The methodology for the simulations and a discussion of the obtained results are also presented.

4.1 Methodology

In this model, each pixel in the input image is associated with a neuron in the neural network. Each pixel value is fed as the external input to the corresponding neuron. Stimulated neurons generate pulses, and those pulses are propagated to their neighbours. Taking this into account, coupled neurons, which represent a coherent object in the scene, are

Proceedings of the Ninth Brazilian Symposium on Neural Networks (SBRN'06)0-7695-2680-2/06 $20.00 © 2006

Page 4: A Pulse-Coupled Neural Network as A Simplified Bottom-Up Visual Attention Model

Figure 3. Image used in simulation I: Two Objects

Figure 4. Image used in simulation II: Seven Objects

synchronized and always fire at the same time. Different objects may have different sizes or shapes; thus, each synchronized pulse train will have a different phase. Therefore, all objects can be separated.

For the feeding input, as described in Section 3, only the value of the image pixel is considered, and not the spikes of the neighbours (F_j = I_j). Moreover, in the experiments the feeding input is also weighted by a constant c.

Several tests have been performed with numerous variations of the parameter values, and the best results were achieved with the following: linking strength β = 0.01, linking decay α_L = 0.40, threshold decay α_T = 0.20, feeding weight c = 0.25, neighbourhood radius = 4, start threshold θ_start = 5, and γ = 0.99.
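For reference, these reported best values could be collected in a single configuration object (a sketch; the key names and the interpretation of c as the multiplier on the pixel input are ours):

```python
# Parameter values reported as giving the best results (Section 4.1).
PARAMS = {
    "beta": 0.01,        # linking strength
    "alpha_L": 0.40,     # linking decay
    "alpha_T": 0.20,     # threshold decay
    "c": 0.25,           # feeding weight applied to the pixel input
    "radius": 4,         # neighbourhood radius
    "theta_start": 5.0,  # start threshold after a spike
    "gamma": 0.99,       # threshold attenuation per incoming linking spike
}
```

Note that γ is close to 1, so a neuron's threshold is lowered only slightly by each neighbour's spike; synchrony emerges from many such nudges rather than from any single pulse.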

Linear non-separability problems, such as the spiral problem, have caught the attention of researchers in

Figure 5. Image used in simulation III: Ten Gray-Level Objects

the neural network field since the publication of the influential book Perceptrons by Minsky and Papert in 1969. The spiral problem consists of distinguishing, on a two-dimensional plane, between a connected single spiral and a disconnected double spiral [3]. This problem is often used as a benchmark to test the ability of neural network learning algorithms to discriminate between nonlinearly separable classes. In our tests, images similar to the spiral problem are used.
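A test image in the spirit of this benchmark could be generated as follows. This is a sketch with arbitrary geometry of our own choosing (two interleaved Archimedean spiral arms), not the exact images used in the experiments.

```python
import math

def double_spiral_image(size=64, turns=2.0, points=2000):
    """Rasterize two interleaved Archimedean spiral arms into a label image.

    Returns a size x size grid: 0 background, 1 first arm, 2 second arm.
    The two arms are the same spiral rotated by pi, so they never touch
    except at the centre, yet no straight line separates them.
    """
    img = [[0] * size for _ in range(size)]
    cx = cy = size // 2
    max_r = size // 2 - 2
    for i in range(points):
        t = turns * 2 * math.pi * i / points   # angle along the spiral
        r = max_r * i / points                 # radius grows linearly with angle
        for label, phase in ((1, 0.0), (2, math.pi)):  # arms offset by pi
            x = int(round(cx + r * math.cos(t + phase)))
            y = int(round(cy + r * math.sin(t + phase)))
            if 0 <= x < size and 0 <= y < size:
                img[y][x] = label
    return img
```

Feeding such an image to the network, each arm forms one connected assembly of coupled neurons, so the two arms should settle into pulse trains of different phases, as in Figure 6.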

4.2 Results

In the experiments carried out in this paper, images with several linearly nonseparable objects (Figs. 3, 4 and 5) have been used to distinguish a given object from the others in the same image. Each object is highlighted at a different instant. Therefore, it can be said that the proposed neural network model acts as a selective visual attention system.

In the first experiment, performed with the image presented in Figure 3, the two spirals in the image are distinguished by two pulse trains of different phases, as shown in Figure 6. It can be observed from Figure 6 that, at the beginning of the simulation, both objects fire together, but due to the mechanism of pulse propagation, the objects start to fire at different phases.

The proposed model is also able to separate the objects presented in Figure 4, which consists of seven objects with nonlinearly separable characteristics. The segmentation results can be observed in the time series presented in Figure 7, where it can be seen that the neural network is able to desynchronize the various objects, highlighting each object at a different instant of time.

The model can also be applied to gray-level scene segmentation. If we consider the image presented in Figure 5 as the input image, the segmentation result is presented by the numbered objects in Figure 8, in which it is possible to

Proceedings of the Ninth Brazilian Symposium on Neural Networks (SBRN'06)0-7695-2680-2/06 $20.00 © 2006

Page 5: A Pulse-Coupled Neural Network as A Simplified Bottom-Up Visual Attention Model

observe that the neural network is able to highlight the objects at different instants. Moreover, two interesting phenomena are observed in this experiment. In some instants of time t, the object formed by parts 7, 8, 9 and 10 fires all at once and is considered by the neural network as a single object (see arrow (a) in Figure 8). In other instants, each part fires at a different time, and the parts are therefore considered as different objects (see arrow (b) in Figure 8).

This finding has a biologically plausible explanation: for example, when we look at a hand, we may see the hand as a whole at one instant, while at another we may see the details of each of its parts, thus considering the palm and the fingers as distinct objects. We can also observe, in the time series presented in Figure 8, that objects with a bright gray level or a large area catch "attention" at a higher frequency than objects with a dark gray level or a small area.

5 Conclusion

This paper proposed a bottom-up visual attention model based on a Pulse-Coupled Neural Network. The model was applied to distinguish objects in a given scene, even when they are linearly nonseparable.

Computer simulations also showed that the proposed model is able to perform selective visual attention over each object in the input image; this visual selection is carried out by pulse trains, where each group of neurons representing an object fires at a different instant of time.

Another interesting phenomenon observed during the experiments was the ability of the neural network to consider an object as a whole at some instants, while at other instants the details of the object are shown. This is consistent with the features of human (and animal) visual attention.

As for future work, firstly, we intend to use the developed model as part of an object recognition system, where the proposed model will be used to direct attention to one of the several objects present in the input image. Secondly, a mathematical analysis will be carried out to explain the parameters set empirically in the experiments. Finally, we will include other properties of the image as input to the neural network, such as colour, depth, etc. This will make it possible to apply the model to real images.

References

[1] R. P. Broussard, S. K. Rogers, M. E. Oxley, and G. L. Tarr. Physiologically motivated image fusion for object detection using a pulse coupled neural network. IEEE Transactions on Neural Networks, 10(3):554–563, 1999.

[2] L. Carota, G. Indiveri, and V. Dante. A software-hardware selective attention system. Neurocomputing, 58-60:647–653, 2004.

[3] K. Cheng and D. Wang. Perceiving geometric patterns: From spirals to inside-outside relations. IEEE Transactions on Neural Networks, 12(5):1084–1102, 2001.

[4] G. Deco and E. T. Rolls. Attention, short-term memory, and action selection: A unifying theory. Progress in Neurobiology, 76:236–256, 2005.

[5] G. Deco, E. T. Rolls, and J. Zihl. Neurobiology of Attention, chapter 97 - A Neurodynamical Model of Visual Attention, pages 593–599. Elsevier, Oxford, 2005.

[6] R. Eckhorn, H. J. Reitboeck, M. Arndt, and P. Dicke. Feature linking via synchronization among distributed assemblies: Simulations of results from cat visual cortex. Neural Computation, 2:293–307, 1990.

[7] X. Gu, D. Yu, and L. Zhang. Image shadow removal using pulse coupled neural network. IEEE Transactions on Neural Networks, 16(3):692–698, 2005.

[8] L. Itti and C. Koch. Computational modelling of visual attention. Nature Reviews Neuroscience, 2:194–203, 2001.

[9] J. L. Johnson. Pulse-coupled neural nets: translation, rotation, scale, distortion, and intensity signal invariance for images. Applied Optics, 33(26):6239–6253, 1994.

[10] T. Jost, N. Ouerhani, R. von Wartburg, R. Muri, and H. Hugli. Assessing the contribution of color in visual attention. Computer Vision and Image Understanding, 100:107–123, 2005.

[11] J. M. Kinser. Foveation by a pulse-coupled neural network. IEEE Transactions on Neural Networks, 10(3):621–625, 1999.

[12] G. Kuntimad and H. S. Ranganath. Perfect image segmentation using pulse coupled neural networks. IEEE Transactions on Neural Networks, 10(3):591–598, 1999.

[13] T. M. Nazmy. Evaluation of the PCNN standard model for image processing purposes. IJICIS, 4(2):101–111, 2004.

[14] M. B. H. Rhouma and H. Frigui. Self-organization of pulse-coupled oscillators with application to clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):180–195, 2001.

[15] J. K. Tsotsos. On the relative complexity of active vs. passive visual search. International Journal of Computer Vision, 7:127–141, 1992.

[16] D. Walther, U. Rutishauser, C. Koch, and P. Perona. Selective visual attention enables learning and recognition of multiple objects in cluttered scenes. Computer Vision and Image Understanding, 100:41–63, 2005.

[17] B. Yu and L. Zhang. Pulse-coupled neural networks for contour and motion matchings. IEEE Transactions on Neural Networks, 15(5):1186–1201, 2004.



Figure 6. Temporal activities of the units presented in Figure 3 for objects 1 and 2, respectively

Figure 7. Temporal activities of the units presented in Figure 4 for all the objects labeled in the image

Figure 8. Temporal activities of the units presented in Figure 5 for the 10 objects in the image
