Final Year Project report
“Multi-agents for Image Understanding”
Group members:
Alaa Sukarieh (ID: 200300509, Major: EE, e-mail: [email protected])
Abbas Darwish (ID: 200300207, Major: CCE, e-mail: [email protected])
Mohamad Darwish (ID: 200300208, Major: EE, e-mail: [email protected])
Advisors: Prof. Karim Kabalan (American University of Beirut), Prof. Walid Smari (University of Dayton)
Report Date: May 23, 2006
where V_i is the i-th point of the active contour, and α and β are model parameters that play an important role in the convergence to the desired solution.
Here, we summarize the consistency and stability analysis of active contours presented in [9]. That work considers the external energies of active contours formulated as Euclidean arc-length integrals, and shows that this representation of the external energy is biased. This bias is
American University of Beirut-EECE 502-FYP report
evident when the minimum of the external energy does not occur at an image edge. Moreover, the active contours of these images are sometimes unstable when initialized at the true edge of the image. An example of this is the external energy function −∫_0^L ||∇I|| ds. Thus, the
authors of this paper opted to use a non-Euclidean arc length, which remains unaffected by motion of the contour, in an attempt to remove this bias. To do this, they had to create a new formulation of the active contours. In summary, they defined the arc length in such a way that the length of an infinitesimal piece of contour does not change when it is moved in the normal direction. Moreover, the contour is then an integral curve of a vector field which is the gradient of a local energy function. In this manner, the active contour has no global energy function, which is why their method is not biased.
In the following approach, we present image recognition using vector-valued active contours, as described in [10]. In this paper, the author presents a framework of active contours for object segmentation in vector-valued images. Using geodesic snakes, he shows that the solution of the deformable-contour approach to boundary detection is in fact given by a geodesic curve in a Riemannian space. This curve results from a new metric derived from the vector image; the metric is based on the edges of vector-valued images and on classical Riemannian geometry. The uses of this technique are vast: it can sample images at different scales, allowing better segmentation, and it can combine different image modalities to produce a vector image.
Chapter 3
System Design
In this chapter, we discuss the design of our system, including the implementation approach we used in building it, and we state the language used in the implementation.
3.1 Implementation approach:
After surveying the available approaches in this domain, we had to choose one of them to implement in our project. The method we decided on is a multi-agent system for image recognition. This approach arose as a solution to the problem of recognizing objects in a given image that are subject to scaling and translation. The main goal, then, is the automatic recognition of objects in a given image invariant to their position, scale, and orientation.
This approach is based on software agents: multi-agent systems are developed for image segmentation and interpretation. To achieve the desired goal, the agents need some prior knowledge about the object to facilitate recognition. For this reason, we interpret the images using the multi-agent system in a distributed architecture combined with neural networks. The system can be implemented on a cluster computer with the MPICH2 library.
We now present the architecture of the proposed system for pattern recognition in images using multi-agents complemented by neural networks. This proposed multi-agent system (as in figure 3.1) consists of one server and many clients. The server contains a supervisor for the system, responsible for information management, while each client is composed of two parts; the first part includes a
supervisor responsible for the recognition module and for the management of local agents, and the second part consists of agents responsible for object detection. We use a blackboard architecture that allows us to keep information about the progress of processing: for each pixel in the image, the blackboard records the pixel’s state (whether or not it has been processed by another agent). In the next step, we discuss the detailed role of each part of the architecture.
Fig. 3.1: Multi-agent architecture. The server contains the supervisor of the system and the information management; each of the N clients contains a local supervisor and its agents.
The main role of the server is to collect information from the clients regarding the state of the processed pixels and to update the blackboard according to those new states. The information collected by the server from the clients can be the locations of the recognized objects in the given image, the regions processed by each agent, and the
location of each agent. After collecting this information and updating the blackboard, the server sends the updated blackboard back to each client. This step tells each client which regions the other agents have already processed, so that it does not work on those pixels again.
The goal of the client is to collect local data (such as the location of each agent and the regions it has processed) from the regional agents and send them to the main server. This is done with the help of the supervisor found in each client, which serves as an intermediary between its agents and the server.
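As an illustration of the blackboard exchange, here is a minimal single-process sketch; the class and method names (Blackboard, merge_report, snapshot) are our own assumptions, not the project’s actual code:

```python
# Minimal sketch of the server-side blackboard update described above.
# The names and report format are illustrative assumptions.

UNPROCESSED, PROCESSED = 0, 1

class Blackboard:
    def __init__(self, width, height):
        # One state per pixel: whether some agent has processed it.
        self.state = [[UNPROCESSED] * width for _ in range(height)]
        self.objects = []  # locations/labels of recognized objects

    def merge_report(self, report):
        """Fold one client's report into the global blackboard."""
        for (x, y) in report["processed_pixels"]:
            self.state[y][x] = PROCESSED
        self.objects.extend(report["objects"])

    def snapshot(self):
        """What the server sends back to every client."""
        return {"state": [row[:] for row in self.state],
                "objects": list(self.objects)}

# A client reports the pixels its agents processed and the objects found.
board = Blackboard(4, 4)
board.merge_report({"processed_pixels": [(0, 0), (1, 0)], "objects": ["apple"]})
view = board.snapshot()
```

Each client would receive `view` and skip the pixels already marked PROCESSED.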
Analyzing the above design, we can see that the idea of this architecture originated mainly from biology, and in particular from the procedure ants follow to find their way back home, which is a form of indirect communication through the environment. As ants move from the nest toward their food, they deposit a volatile substance, called pheromone, along their path. The quantity of deposited pheromone depends on the distance and the time expected to finish the whole operation; moreover, in later stages, this pheromone can guide new operations. Similar to this procedure, we have to design cooperation between the different agents. In our case, each agent is responsible for depositing two kinds of pheromone for other agents to check: repulsive and attractive. Repulsive pheromones tell other agents not to process a region that has already been processed, while attractive pheromones request help from other agents, inside or outside the same client, in processing a given area.
Each agent, while processing a given area of the image, can take three actions: move, locate, and mark pheromone; each action is explained below. The move action means the agent moves on to process pixels in the image that have not yet been processed by other agents. The locate action positions the agent on a given pixel inside the specified area. Pheromone marking is divided into two kinds: attractive marking tells the agents which pixels should be processed first or need the help of a complementary detection method, while repulsive marking records on the blackboard the pixels that this agent has already processed.
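As a hypothetical sketch (none of these names come from the actual implementation), one agent step combining the three actions might look like:

```python
# Illustrative sketch of one agent step: locate at a pixel, mark pheromone,
# and move to a free neighbor. The grid values and names are assumptions.

FREE, REPULSIVE, ATTRACTIVE = 0, 1, 2

def agent_step(board, pos, needs_help):
    """Process the pixel at pos, mark pheromone, and move to a free neighbor.

    board      -- 2D list of pheromone marks shared via the blackboard
    pos        -- (x, y) pixel the agent is currently located at
    needs_help -- True if a complementary detection method is required here
    """
    x, y = pos
    # Mark the current pixel: attractive asks other agents for help,
    # repulsive tells them the pixel is already handled.
    board[y][x] = ATTRACTIVE if needs_help else REPULSIVE
    # Move: pick any 4-neighbor that is still unprocessed (FREE).
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= ny < len(board) and 0 <= nx < len(board[0]) and board[ny][nx] == FREE:
            return (nx, ny)
    return pos  # nowhere free to move

board = [[FREE] * 3 for _ in range(3)]
new_pos = agent_step(board, (0, 0), needs_help=False)
```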
3.2 Detection Process:
The recognition process is done in two steps. First, we describe an object by a set of invariant features using the Zernike moment descriptors explained above. Then these parameters are fed to a designed neural network, which matches them to the closest image in the database.
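To make the two steps concrete, here is a minimal sketch in which a plain nearest-neighbor match stands in for the trained network; the function and the toy feature vectors are our own illustrations, not the project’s code:

```python
# Sketch of the two-step recognition pipeline: describe an object by a
# feature vector, then match it to the closest database entry. In the real
# system the features are Zernike moments and the matcher is a neural
# network; a squared-distance nearest neighbor stands in here.

def match(features, database):
    """Return the label of the database entry closest to `features`."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(database, key=lambda label: sqdist(features, database[label]))

database = {"apple": [0.9, 0.1], "pear": [0.2, 0.8]}  # toy feature vectors
label = match([0.85, 0.15], database)
```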
As stated previously, our implementation is based on Harris key points, Zernike moments, and the back-propagation algorithm for neural networks. Below is a short overview of these methods.
Zernike
In the implementation phase of our FYP we used Zernike polynomials and moments to extract certain features from the images.
There are two types of Zernike polynomials, even and odd.
• The even Zernike polynomials are given by the following equation:
o Z_n^m(ρ, φ) = R_n^m(ρ) cos(mφ)
• The odd Zernike polynomials are given by the following equation:
o Z_n^{-m}(ρ, φ) = R_n^m(ρ) sin(mφ)
We should note that the above variables are defined below:
• m and n are nonnegative integers with n ≥ m
• φ is the azimuthal angle in radians, 0 ≤ φ < 2π
• ρ is the normalized radial distance, 0 ≤ ρ ≤ 1
• The radial polynomials R_n^m are defined as
o R_n^m(ρ) = Σ_{k=0}^{(n−m)/2} [ (−1)^k (n − k)! / ( k! ((n + m)/2 − k)! ((n − m)/2 − k)! ) ] ρ^{n−2k}, if n − m is even
o R_n^m(ρ) = 0, if n − m is odd
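The radial-polynomial definition above translates directly into code. The following is a minimal sketch (our own helper, not the project’s actual routine):

```python
from math import factorial

# Direct implementation of the radial polynomial R_n^m defined above.
def radial_poly(n, m, rho):
    """Evaluate the Zernike radial polynomial R_n^m at rho (0 <= rho <= 1)."""
    if (n - m) % 2 != 0:
        return 0.0  # R_n^m vanishes when n - m is odd
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        total += coeff * rho ** (n - 2 * k)
    return total

# For example, R_2^0(rho) = 2*rho^2 - 1.
```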
Neural networks:
In general, a neural network (NN) is made up of a group of directly connected neurons. A single neuron can be connected to many other neurons, and the total number of neurons and connections in a network can be very large. In this document, terms such as neuron, neural network, learning, or experience should be taken in the context of NNs. We preferred NNs because of their ability to learn by training: a NN can recognize the image of an apple if it has previously been trained on examples of apples. Further explanation will be given through the explanation of the code.
We will start with an overview of the different components of a neural network, i.e. the sigmoid function, neurons, and back-propagation neural networks.
o Sigmoid Function
A sigmoid function is usually defined as:
f(x) = 1 / (1 + e^(−ax))
where a is a real constant, usually taken between 0.5 and 2. When starting a NN from scratch, a is typically set to 1 and then adjusted until the function behaves as needed. The sigmoid function is applied to the combined input of a neuron, as described next.
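A one-line implementation of this function, with a defaulting to 1 as suggested above (the helper name is our own):

```python
from math import exp

# The sigmoid f(x) = 1 / (1 + e^(-a*x)) with slope parameter a.
def sigmoid(x, a=1.0):
    return 1.0 / (1.0 + exp(-a * x))

# sigmoid(0) is 0.5; large positive x approaches 1, large negative x approaches 0.
```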
o Neuron
We can think of a neuron as a black box that has one or many inputs but just one output. The output of a neuron is calculated as follows:
1) Multiply each input of the neuron by its corresponding weight.
2) Add these products together, then scale their sum to a number between 0 and 1.
Fig 3.2: Tree architecture.
Let d = (x1 * w1) + (x2 * w2) + (x3 * w3).
In general:
d = Σ_{i=1}^{n} x_i * w_i
Let θ be a real number which we will call the threshold. The value of θ is usually taken to vary between 0.25 and 1.
Thus the output of the neuron is z = s(d + θ); that is, the output z is the result of applying the sigmoid function s to (d + θ). What we need to do is find the right values of the weights and the threshold. As a starting point, the weights are taken to be random numbers between 0 and 1.
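The neuron computation above can be sketched as follows (the function names are illustrative, not taken from the project code):

```python
from math import exp
import random

# Sketch of the neuron described above: weighted sum plus threshold,
# squashed by the sigmoid function.
def sigmoid(x, a=1.0):
    return 1.0 / (1.0 + exp(-a * x))

def neuron_output(inputs, weights, theta):
    """z = s(d + theta) with d = sum of x_i * w_i."""
    d = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(d + theta)

# Weights start as random numbers between 0 and 1, as the text suggests.
weights = [random.random() for _ in range(3)]
z = neuron_output([1.0, 0.5, -0.25], weights, theta=0.5)
```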
o Back Propagation:
Back-Propagation is a supervised learning¹ technique that we will use to train our neural network.
A pseudo-code of the technique is as follows:
1. Load a training sample into the neural network.
2. Compare the network's output with the desired output for that sample.
a. Calculate the error in each output neuron.
3. For each neuron:
a. Compute what the output should be (the desired output).
¹ In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the aim is to find a function f that infers the mapping implied by the data; the cost function measures the mismatch between our mapping and the data.
b. Compute the local error. The local error reflects the amount to add to or subtract from the actual output to make it match the desired one.
4. Correct the weights of each neuron so that the local error is lowered.
5. Assign a coefficient k for the local error to the neurons at the previous level; k should be distributed according to the weights, where neurons connected by stronger weights are given a larger k.
6. Repeat the steps above on the neurons at the previous level, using k as their error.
Thus the errors propagate backwards from the output nodes to the inner nodes. Back-propagation usually converges quickly when run on networks that have no feedback.
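A compact, self-contained sketch of these steps for a network with one hidden layer is given below; the layer sizes mirror Fig. 3.3 (3 inputs, 2 hidden, 2 outputs), while the learning rate, helper names, and training sample are our own illustrative choices:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    # One weight row per neuron; the last entry of each row is the threshold.
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + row[-1])
            for row in weights]

def forward(x, w_hidden, w_out):
    h = layer(x, w_hidden)
    return h, layer(h, w_out)

def train_step(x, target, w_hidden, w_out, lr=0.5):
    h, y = forward(x, w_hidden, w_out)
    # Steps 2-3: local error at each output neuron (sigmoid derivative y(1-y)).
    delta_out = [(t - yi) * yi * (1 - yi) for t, yi in zip(target, y)]
    # Step 5: distribute the error to hidden neurons according to the weights.
    delta_hid = [h[j] * (1 - h[j]) * sum(delta_out[k] * w_out[k][j]
                                         for k in range(len(w_out)))
                 for j in range(len(h))]
    # Steps 4/6: correct the weights (and thresholds) to lower the error.
    for k, row in enumerate(w_out):
        for j in range(len(h)):
            row[j] += lr * delta_out[k] * h[j]
        row[-1] += lr * delta_out[k]
    for j, row in enumerate(w_hidden):
        for i in range(len(x)):
            row[i] += lr * delta_hid[j] * x[i]
        row[-1] += lr * delta_hid[j]

# Weights start as random numbers between 0 and 1, as suggested earlier.
w_hidden = [[random.random() for _ in range(4)] for _ in range(2)]  # 3 inputs + bias
w_out = [[random.random() for _ in range(3)] for _ in range(2)]     # 2 hidden + bias

sample, target = [1.0, 0.0, 1.0], [1.0, 0.0]
_, before = forward(sample, w_hidden, w_out)
for _ in range(500):
    train_step(sample, target, w_hidden, w_out)
_, after = forward(sample, w_hidden, w_out)
# After training, the squared error on the training sample shrinks.
```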
The following diagram shows a Back Propagation NN:
Fig 3.3: Network design
This NN consists of three layers:
1. Input layer with three neurons.
2. Hidden layer with two neurons.
3. Output layer with two neurons.
The number of neurons in the input layer depends on the number of possible inputs we have, while the number of neurons in the output layer depends on the number of desired outputs. The number of hidden layers and the number of neurons in each hidden layer cannot be well defined in advance, and may change with the network configuration and the type of data. In general, adding a hidden layer allows the network to learn more complex patterns, but at the same time decreases its performance.
NNs are used in many applications because of their ability to learn by example. This ability makes them very attractive in environments where other methods fail to do the job or their running time is very high. They have been used extensively in the fields of image recognition, speech recognition, adaptive control, and computer vision.
3.3 Parameters Selection
As mentioned in the report, there are three main stages in our design for object detection. In every stage there are parameters that affect the recognition rate; we also have to take care of complexity and execution time.
In the first stage, Harris key-point detection, we have to agree on the average number of key points to be detected in the used images. The number of key points is important because the Zernike moments are computed on a neighborhood of each detected key point. We found that we cannot know the number of key points needed
until we test the software and observe the effect of increasing or decreasing the number of key points. In the second stage, we have to set the order of the Zernike moments to be computed on the pixels around each key point. Here we also need to fix the neighborhood size in pixels, since it affects the detection accuracy; for example, 11x11 pixels gives different results than 19x19.
After a certain number of experiments we reached the best combination of the parameters mentioned above, and we now give samples of these results. To find the number of key points that maximizes the recognition rate, we followed this procedure: first we fixed the order of the Zernike moments at 30 and computed these moments on a 9x9-pixel neighborhood of each detected key point, while varying the number of key points.
Table 3.1: Software parameters (varying number of Harris points)

                            Set 1  Set 2  Set 3  Set 4  Set 5  Set 6
Num. of Harris key points     40     35     30     25     20     15
Order of Zernike moments      30     30     30     30     30     30
Range of pixels              9x9    9x9    9x9    9x9    9x9    9x9
Recognition rate             55%    62%    73%    79%    83%    81%
As we can see, the best results occur when the number of key points is between 15 and 25. To be on the safe side, we varied the other parameters to be surer about the correct number of key points: we decided to vary both the order of the Zernike moments and the range of pixels, but one at a time.
In the following table we keep the same parameters as in the table above, except that the order of the Zernike moments is changed from 30 to 20.
Table 3.2: Software parameters (varying number of Harris points)

                            Set 1  Set 2  Set 3  Set 4  Set 5  Set 6
Num. of Harris key points     40     35     30     25     20     15
Order of Zernike moments      20     20     20     20     20     20
Range of pixels              9x9    9x9    9x9    9x9    9x9    9x9
Recognition rate             60%    67%    77%    83%    85%    83%
It is again clear that we get the highest rate when there are 20 key points in the used images. But we still have to check the correctness of this number when we vary the range of pixels. This is shown in the next table, where we modified the range of pixels to 11x11 rather than 9x9.
Table 3.3: Software parameters (varying number of Harris points)

                            Set 1  Set 2  Set 3  Set 4  Set 5  Set 6
Num. of Harris key points     40     35     30     25     20     15
Order of Zernike moments      30     30     30     30     30     30
Range of pixels            11x11  11x11  11x11  11x11  11x11  11x11
Recognition rate             57%    65%    74%    82%    85%    83%
Based on these results we found that the best number of key points in the image is 20 on average, so we fixed this number while finding the other parameters (order of moments and range of pixels).
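The one-parameter-at-a-time search described above can be sketched as follows; the recognition rates here are hard-coded stand-ins taken from Table 3.1, whereas in the project each rate came from actually running the recognition software:

```python
# Sketch of the one-at-a-time parameter search described above. The
# recognition rates are a toy lookup mimicking Table 3.1 (order 30,
# 9x9 neighborhood); in practice each score requires a software run.

def best_setting(values, evaluate):
    """Return the parameter value that maximizes the recognition rate."""
    return max(values, key=evaluate)

rates = {40: 0.55, 35: 0.62, 30: 0.73, 25: 0.79, 20: 0.83, 15: 0.81}
best_keypoints = best_setting(rates, lambda n: rates[n])
# best_keypoints matches the table's winner of 20 key points.
```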
Now we have to agree on the order of the Zernike moments. As we can see from the tables above, order 20 gave a higher recognition rate than order 30; moreover, after some research we found that the best order is 15, so we made this order as
ESTIMATION FOR ACCURATE LOCALISATION OF ACTIVE CONTOURS”, pp. 781-784, IEEE, 2001
[9] Tianyun Ma and Hemant D. Tagare, “Consistency and Stability of Active Contours with Euclidean and Non-Euclidean Arc Lengths”, IEEE Transactions on Image Processing, vol. 8, no. 11, pp. 1549-1559, November 1999.
[10] Guillermo Sapiro, “Vector-Valued Active Contours”, pp. 680-685, IEEE, 1996.
[11] Choksuriwong, Anant, Christophe Rosenberger, and Waleed W. Smari, “Multi-agents System for Image Understanding”, pp. 01-06.
[12] “XITE” (2002), University of Oslo. Retrieved Dec. 25, 2005, from http://www.ifi.uio.no/forskning/grupper/dsb/Software/Xite/ftp/
[13] MPICH2. Retrieved Dec. 25, 2005, from http://www-
[14] 2005.10.05. Retrieved 22 May 2006 from <http://www.tek271.com/articles/neuralNet/IntoToNeuralNets.html>.
[15] Wikipedia contributors, “Supervised learning”, Wikipedia, The Free Encyclopedia, 7 May 2006, 13:07 UTC, <http://en.wikipedia.org/w/index.php?title=Supervised_learning&oldid=51975666> [accessed 22 May 2006].
[16] Wikipedia contributors, “Backpropagation”, Wikipedia, The Free Encyclopedia, 3 May 2006, 04:41 UTC, <http://en.wikipedia.org/w/index.php?title=Backpropagation&oldid=51316816> [accessed 22 May 2006].
[17] Abdallah, Samer, “Object Recognition via Invariance”, PhD thesis, 2000.
[18] A. Choksuriwong, H. Laurent, C. Rosenberger, and C. Maaoui, “Object Recognition using Local characterization and Zernike moments”, Laboratoire Vision et Robotique, UPRES EA 2078, ENSI de Bourges - Université d’Orléans, 10 boulevard Lahitolle, 18020 Bourges Cedex, France.