A PROJECT REPORT
ON
MOVING OBJECT DETECTION FROM VIDEO USING
NEURAL NETWORK
Submitted in partial fulfillment for the requirement of the award of
DEGREE
IN
MASTER OF SCIENCE IN COMPUTER SCIENCE
Paper Code:- MCS 1005
Submitted by
Shantasree Kar
Roll: 101611 No.:02220123
Regn No.: 22-110021754 of 2011-2012
Under the supervision of
Dr.Prodipto Das,
Asst. Professor
Department of Computer Science
Assam University , Silchar
Date: ………………….
CERTIFICATE
This is to certify that the project paper work entitled “MOVING
OBJECT DETECTION FROM VIDEO USING NEURAL
NETWORK” submitted by Shantasree Kar (Roll: 101611 No.:
02220123) is hereby recommended to be accepted for the partial
fulfillment of the requirements for the M.Sc. (Computer Science) degree
from Assam University.
( Dr.Bipul Syam Purkayastha ) Head of the Department
Department of Computer Science
Assam University, Silchar
Pin – 788011
DEPARTMENT OF COMPUTER SCIENCE
SCHOOL OF PHYSICAL SCIENCES
ASSAM UNIVERSITY SILCHAR
A CENTRAL UNIVERSITY CONSTITUTED UNDER ACT XIII OF 1989
ASSAM, INDIA, PIN - 788011
Date: ………………….
CERTIFICATE
This is to certify that the project paper work entitled “MOVING
OBJECT DETECTION FROM VIDEO USING NEURAL
NETWORK” submitted by Shantasree Kar (Roll: 101611 No.:
02220123) is hereby recommended to be accepted for the partial
fulfillment of the requirements for the M.Sc. (Computer Science) degree
from Assam University.
(Dr.Prodipto Das)
Assistant Professor
Department of Computer Science
Assam University, Silchar
Pin – 788011
DEPARTMENT OF COMPUTER SCIENCE
SCHOOL OF PHYSICAL SCIENCES
ASSAM UNIVERSITY SILCHAR
A CENTRAL UNIVERSITY CONSTITUTED UNDER ACT XIII OF 1989
ASSAM, INDIA, PIN - 788011
DECLARATION
I, SHANTASREE KAR do hereby declare that the project paper work
entitled “MOVING OBJECT DETECTION FROM VIDEO USING
NEURAL NETWORK” has been carried out by me under the guidance of
Dr.Prodipto Das, Assistant Professor, Department Of Computer Science,
Assam University, Silchar. Whenever I have used materials (data,
theoretical analysis, figures, and text) from other sources, I have given due
credit to them by citing them in the text of this report and giving their
details in the references.
Date…………………. Shantasree Kar
Roll :-101611 No.: 02220123
Regn No.:-22110021754
Department of Computer Science
ACKNOWLEDGEMENT
My sincere gratitude and thanks go to my project paper guide, Dr. Prodipto
Das, Assistant Professor, Department of Computer Science, Assam University,
Silchar, Assam, India.
It was only with his backing and support that I could complete the report. He
provided me with all sorts of help and corrected me whenever I seemed to make
mistakes. I have no words to express my gratitude.
I acknowledge my sincere gratitude to the HOD of the Computer Science
Department, Assam University, Silchar. He gave me permission to do the
project work. Without his support I could not even have started the work, so I am
grateful to him.
I acknowledge my sincere gratitude to the lecturers, research scholars and
lab technicians for their valuable guidance and helpful attitude even during their
very busy schedules.
Last but not least, I thank my dearest parents for being such a great source of
encouragement and moral support, which helped me tremendously in this work.
I also declare to the best of my knowledge and belief that the Project Work has
not been submitted anywhere else.
Place: SHANTASREE KAR
Date: Roll : 101611 No.:02220123
Department of Computer Science
Chapter-1
INTRODUCTION
Moving object detection is a computer technology, related to computer vision, image
processing and neural networks, that deals with detecting instances of semantic objects
of a certain class (such as humans, cars, etc.) in digital images or video. Well-researched
domains include vehicle detection and pedestrian detection. Moving object detection has
many applications in the domain of computer vision, including image retrieval and
video surveillance.
Moving object detection for real-world applications is still a challenging problem.
While recent research datasets increase the number of training and testing
examples to get closer to real-world problems, the ability of detectors to process
large datasets in reasonable time becomes another important issue besides accuracy.
It is not only the number of training examples that matters, but also the number of classes.
Moving object detection involves locating objects in the frames of a video sequence.
Every tracking method requires an object detection mechanism either in every frame
or when the object first appears in the video. For moving object detection, various
background subtraction techniques available in the literature were simulated.
Background subtraction takes the absolute difference between the current image
and a reference background that is updated over a period of time. A good background
subtraction method should be able to overcome the problems of varying illumination
conditions, background clutter, shadows, camouflage and bootstrapping, and at the same
time motion segmentation of the foreground object should be done in real time.
Moving object tracking in video has attracted a great deal of interest in
computer vision. Object tracking is the first step in surveillance systems, navigation
systems and object recognition. Object tracking in a real-time environment is highly
significant because it enables several important applications: security and
surveillance, to recognize people and provide a better sense of security using visual
information; retail, to analyze the shopping behaviour of customers; video abstraction,
to obtain automatic annotation of videos and generate object-based summaries; traffic
management, to analyze flow and detect accidents; and video editing, to eliminate
cumbersome human operator interaction and design futuristic video effects.
NEURAL NETWORK
A neural network is a massively parallel distributed processor made up of simple
processing units, which has a natural propensity for storing experiential knowledge
and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment through a learning
process.
2. Interneuron connection strengths, known as synaptic weights, are used to store
the acquired knowledge.
The procedure used to perform the learning process is called a learning
algorithm, the function of which is to modify the synaptic weights of the
network in an orderly fashion to attain a desired design objective.
The modification of synaptic weights provides the traditional method for the
design of neural networks. However, it is also possible for a neural network to
modify its own topology, which is motivated by the fact that neurons in the
human brain can die and new synaptic connections can grow.
Neural networks are also referred to in the literature as neurocomputers,
connectionist networks, parallel distributed processors, etc.
ADVANTAGE OF NEURAL NETWORK
Neural networks are an approach to computing that involves developing mathematical
structures with the ability to learn. Neural networks have a remarkable ability to
derive meaning from complicated or imprecise data and can be used to extract patterns
and detect trends that are too complex to be noticed by either humans or other
computer techniques. A trained neural network can be thought of as an expert in the
category of information it has been given to analyze.
Neural networks have broad applicability to real-world business problems and have
already been successfully applied in many industries.
Neural networks use a set of processing elements analogous to neurons in the brain.
These processing elements are interconnected in a network that can identify patterns
in data. This distinguishes neural networks from other computing programs that
simply follow instructions in a fixed sequential order.
The structure of a neural network looks something like the following:-
Fig1:- Interconnected Neural Network.
Here the first layer is the input layer, the second layer is the hidden layer and the
third layer is the output layer.
Each node in the hidden layer is fully connected to the inputs, which means that what is
learned in a hidden node is based on all the inputs taken together. Statisticians
maintain that the network can pick up the interdependencies in the model. The
following diagram gives an idea of what goes on inside the hidden layer.
Fig2:- Interconnection weights of neural network.
The weighted sum is computed as X1*W1 + X2*W2 + … + Xn*Wn.
This weighted sum is computed for each node in the hidden layer, and that is how the
interconnection is represented.
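The weighted sum can be illustrated with a short sketch. The input values, the weight matrix and the function name hidden_net_input below are hypothetical and chosen only for illustration; they are not taken from the project.

```python
import numpy as np

def hidden_net_input(x, W):
    """Return X1*W1 + X2*W2 + ... + Xn*Wn for every hidden node.

    x: input vector of shape (n,)
    W: weight matrix of shape (n_hidden, n); row j holds the weights of node j
    """
    return W @ x

# Hypothetical example: 3 inputs feeding 2 hidden nodes.
x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.5, -1.0, 0.25],
              [1.0,  0.0, -1.0]])
net = hidden_net_input(x, W)
print(net)
```

Each row of W plays the role of the weights W1..Wn of one hidden node, so a single matrix product evaluates the sum for all hidden nodes at once.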
Chapter-2
FUNDAMENTALS OF NEURAL NETWORK
There are many different types of neural networks, but they all have four basic
attributes:-
A set of processing units;
A set of connections;
A computing procedure;
A training procedure.
Processing Units:-
A neural network contains a potentially huge number of very simple processing units.
All these units operate simultaneously, supporting massive parallelism. All
computation in the system is performed by these units. At each moment in time, each
unit simply computes a scalar function of its local inputs and broadcasts the result,
called the activation value, to its neighbouring units. The units are typically divided
into input units, which receive data from the environment; hidden units, which may
internally transform the data representation; and output units, which represent
decisions or control signals.
Connections:-
The units in a neural network are organized in a given topology by a set of
connections, or weights. Each weight has a real value, typically ranging from -∞ to +∞,
although sometimes the range is limited. The value of a weight describes how
much influence a unit has on its neighbour. The values of all the weights
predetermine the network’s computational reaction to any arbitrary input pattern.
Weights can change as a result of training, but they tend to change slowly.
A network can be connected with any kind of topology. Common topologies include
unstructured, layered, modular and recurrent.
Computation:-
Computation always begins by presenting an input pattern to the network. Then the
activations of all the remaining units are computed. A given unit is typically updated in
two stages: first we compute the unit’s net input, and then we compute its output
activation.
In the standard case, the net input Xj for unit j is just the weighted sum of its inputs:
Xj = ∑i Yi Wji
where Yi is the output activation of an incoming unit i, and Wji is the weight from unit i
to unit j.
Once we have computed the unit’s net input Xj, we compute the output activation Yj
as a function of Xj. This activation function, also called a transfer function, can be
either deterministic or stochastic. Deterministic local activation functions usually take
one of three forms: linear, threshold or sigmoidal, as shown in the figure.
(a) (b) (c)
Fig3:- Deterministic local activation functions: (a) linear (b) threshold (c) sigmoidal
The simplest form of non-linearity is provided by the threshold activation function:-
y = 0 if X <= 0
y = 1 if X > 0
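The two-stage update of a unit and the three deterministic activation functions can be sketched as follows. The function names and the example weights are assumptions made for illustration only; the sigmoid is taken to be the logistic function, which the text does not specify.

```python
import numpy as np

def linear(x):
    # Linear activation: the output equals the net input.
    return x

def threshold(x):
    # Threshold activation: y = 0 if x <= 0, y = 1 if x > 0.
    return np.where(x > 0, 1.0, 0.0)

def sigmoid(x):
    # Sigmoidal activation (logistic function assumed here).
    return 1.0 / (1.0 + np.exp(-x))

def unit_output(y_in, w_j, activation):
    """Stage 1: net input Xj = sum_i Yi * Wji. Stage 2: Yj = f(Xj)."""
    x_j = np.dot(w_j, y_in)
    return activation(x_j)

# Hypothetical unit with three incoming activations.
y_in = np.array([1.0, 0.0, 1.0])
w_j = np.array([0.5, 0.5, -1.0])
print(unit_output(y_in, w_j, threshold), unit_output(y_in, w_j, sigmoid))
```

Unlike the threshold function, the sigmoid is differentiable everywhere, which matters for gradient-based training.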
Training:-
Training of a network, in the most general sense, means adapting its connections so
that the network exhibits the desired computational behaviour for all input patterns.
The process usually involves modifying the weights, but sometimes it also involves
modifying the actual topology of the network. Topological changes can improve both
generalization and the speed of learning by constraining the class of functions that the
network is capable of learning.
TAXONOMY OF NEURAL NETWORK
There are three main classes of learning procedures:-
Supervised learning, in which a teacher provides an output target for each input
pattern and corrects the network’s errors explicitly.
Semi-supervised (or reinforcement) learning, in which a teacher merely
indicates whether the network’s response to a training pattern is good or bad.
Unsupervised learning, in which there is no teacher, and the network must find
regularities in the training data by itself.
Supervised learning
Perceptrons are the simplest type of feedforward network that uses supervised
learning. In the case of a single-layer perceptron, the Delta rule can be applied directly.
Because a perceptron’s activations are binary, this general learning rule reduces to the
perceptron learning rule, which says that if an input is active (Yi = 1) and the output Yj
is wrong, then Wji should be either increased or decreased by a small amount ε,
depending on whether the desired output is 1 or 0, respectively.
Multilayer perceptrons (MLPs) can theoretically learn any function, but they are more
complex to train. The Delta rule cannot be applied directly to MLPs because there are
no targets in the hidden layers. However, if an MLP uses continuous rather than
discrete activation functions, i.e. sigmoids rather than threshold functions, then it
becomes possible to use partial derivatives and the chain rule to derive the influence of
any weight on any output activation, which in turn indicates how to modify that
weight in order to reduce the network’s error. This generalization of the Delta rule is
known as backpropagation.
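The perceptron learning rule described above can be sketched on a toy problem. The AND data set, the step size eps and the bias term b are assumptions introduced for this illustration (the bias is treated as a weight on an always-active input); none of them come from the project.

```python
import numpy as np

def train_perceptron(X, targets, eps=0.25, epochs=50):
    """Perceptron learning rule for a single threshold output unit."""
    w = np.zeros(X.shape[1])
    b = 0.0  # assumed bias, i.e. a weight on an input that is always 1
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = 1 if np.dot(w, x) + b > 0 else 0
            if y != t:
                # Output is wrong: raise the weights if the target is 1,
                # lower them if the target is 0.
                step = eps if t == 1 else -eps
                w += step * x  # only active inputs (x_i = 1) are changed
                b += step
    return w, b

# Hypothetical training set: logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, t)
pred = [1 if np.dot(w, x) + b > 0 else 0 for x in X]
print(w, b, pred)
```

AND is linearly separable, so a single-layer perceptron can represent it; a function such as XOR would require hidden units and backpropagation.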
Semi-Supervised learning
Semi-supervised learning is a class of supervised learning tasks that also makes use of
unlabelled data for training, typically a small amount of labelled data with a large
amount of unlabelled data. The problem of semi-supervised learning is reduced to the
problem of supervised learning by setting the training targets to be either the actual
outputs or their negations, depending on whether the network’s behaviour was judged
good or bad. The network is then trained using the Delta rule, where the targets are
compared against the network’s mean outputs, and the error is backpropagated through
the network if necessary.
Unsupervised learning
In unsupervised learning there is no teacher, and a network must detect regularities in
the input data by itself. Such self-organizing networks can be used for compressing,
clustering, quantizing, classifying, or mapping input data. One way to perform
unsupervised training is to recast it into the paradigm of supervised training, by
designating an artificial target for each input pattern, and applying backpropagation.
In particular we can train a network to reconstruct the input pattern on the output
layer, while passing the data through a bottleneck of hidden units. Such a network
learns to preserve as much information as possible in the hidden layer. This type of
network is often called an encoder, especially when the inputs/outputs are binary
vectors.
Chapter-3
LITERATURE SURVEY
1. Takaya et al. (2005):- This paper uses a neural network to track the
motion of moving objects recorded in a sequence of video images. Given the
motion vector and colour of an arbitrary pixel, the neural network determines
whether the pixel belongs to the moving object or not.
2. Deng et al. (2010):- This paper presents a study on vehicle and pedestrian
recognition based on a back propagation (BP) neural network. First, the moving
objects are extracted from the image sequence using background subtraction.
Second, a subset of the objects is selected for further detection. Third, several
significant eigenvalues are extracted from the remaining objects, which indicate
the differences in contour between pedestrians and vehicles. Finally, an
eigenvector is formed and used as the input of the back propagation neural
network, the output of which is the detection result.
3. Meenatchi et al. (2014):- This paper presents a robust, real-time method for
tracking objects. The proposed algorithm includes two stages: object
tracking and object segmentation. Based on the segmented object shape, a
prediction method based on the Kalman filter is proposed. The Kalman filter
model is used to track and predict the trace of an object. Image enhancement is
the process of adjusting tracked frame images so that the results are more
suitable for display. An edge detection technique is used to find discontinuities
in the gray-level images of tracked frames. Finally, the segmented frames are
converted into a video sequence.
4. Tiwari et al. (2012):- This paper addresses the problem of scene understanding
based on neural networks and image segmentation. The authors use the
backpropagation algorithm to train the network, and features are extracted
using colours in the RGB colour space.
Chapter-4
SEGMENTATION
Segmentation is the process of partitioning a digital image into multiple regions or sets
of pixels.
The goal of segmentation is to simplify and/or change the representation of an
image into something that is more meaningful and easier to analyze. The result of
segmentation is a set of segments that collectively cover the entire scene, or a set of
contours extracted from the scene. Each of the pixels in a region is similar with respect
to some characteristic or computed property, such as colour, intensity or texture.
Adjacent regions are significantly different with respect to the same characteristic(s).
SEGMENTATION OF OBJECTS IN IMAGE SEQUENCES
Images are segmented into objects to achieve efficient compression by coding the
contour and texture separately. As the purpose is to achieve high compression
performance, the segmented objects may not be semantically meaningful to human
observers. More recent applications, such as content-based image/video retrieval
and image/video composition, require that the segmented objects be semantically
meaningful. Finding moving objects in image sequences is one of the most important
tasks in computer vision and image processing. The background subtraction approach
computes the stationary background image and identifies the moving objects
as those pixels in the image that differ significantly from the background. Background
subtraction can provide an effective means of locating a moving object.
ALGORITHM OF GAUSSIAN MIXTURE MODEL:
To give a better understanding of the algorithm used for background subtraction, its
steps are listed below:
1. First, each input pixel is compared to the mean 'µ' of each of the associated
components. If the value of the pixel is close enough to a chosen component's mean,
then that component is considered the matched component. To be a matched
component, the difference between the pixel and the mean must be less than the
component's standard deviation scaled by a factor D in the algorithm.
2. Second, the Gaussian weight, mean and standard deviation (variance) are updated to
reflect the newly obtained pixel value. For non-matched components, the weights 'w'
decrease, whereas the mean and standard deviation stay the same. How fast they
change depends on the learning component 'p'.
3. Third, the components that are part of the background model are identified. To
do this, a threshold value is applied to the component weights 'w'.
4. Fourth, in the final step, the foreground pixels are determined. Pixels identified as
foreground are those that do not match any component determined to be part of the
background.
Background modeling by Gaussian mixtures is a pixel-based process. Let x be a
random process representing the value of a given pixel over time. A convenient
framework to model the probability density function of x is the parametric Gaussian
mixture model, where the density is composed of a sum of Gaussians. Let p(x) denote
the probability density function of a Gaussian mixture comprising K component
densities:
$$p(x) = \sum_{k=1}^{K} w_k \, N(x; \mu_k, \sigma_k)$$
where $w_k$ are the weights, and $N(x; \mu_k, \sigma_k)$ is the normal density with mean
$\mu_k$ and covariance matrix $\Sigma_k = \sigma_k^2 I$ ($I$ denotes the identity matrix).
First, the parameters are initialized with $w_k = w_0$, $\mu_k = \mu_0$ and
$\sigma_k = \sigma_0$. If there is a match, i.e.
$$\frac{\lVert x - \mu_j \rVert}{\sigma_j} < \tau \quad \text{for some } j \in [1..K]$$
where $\tau$ (> 0) is some threshold value, then the parameters of the mixture are
updated as follows:
$$w_k(t) = (1 - \alpha)\, w_k(t-1) + \alpha\, M_k(t),$$
$$\mu_k(t) = (1 - \beta)\, \mu_k(t-1) + \beta\, x,$$
$$\sigma_k^2(t) = (1 - \beta)\, \sigma_k^2(t-1) + \beta\, \lVert x - \mu_k(t) \rVert^2,$$
where $M_k(t)$ is equal to 1 for the matching component j and 0 otherwise, and the mean
and variance are updated only for the matching component. If there is no
match, the component with the lowest weight $w_k$ is re-initialized with $w_k = w_0$,
$\mu_k = x$ and $\sigma_k = \sigma_0$. The learning rate $\alpha$ is constant and $\beta$ is
defined as:
$$\beta = \alpha\, N(x; \mu_k, \sigma_k).$$
Finally, the weights $w_k$ are normalized at each iteration to add up to 1.
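The update rules above can be sketched for a single grey-level pixel. This is only an illustration under stated assumptions: the pixel is scalar, the closest matching component is chosen when several match, and the values of alpha, tau, w0 and sigma0 as well as the function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # One-dimensional normal density N(x; mu, sigma).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def gmm_update(x, w, mu, sigma, alpha=0.05, tau=2.5, w0=0.05, sigma0=30.0):
    """Update the mixture of one pixel in place with a new pixel value x.

    Returns the index of the matched component, or -1 if there is no match.
    """
    d = np.abs(x - mu) / sigma                 # |x - mu_k| / sigma_k
    matches = np.flatnonzero(d < tau)
    if matches.size > 0:
        j = matches[np.argmin(d[matches])]     # assumption: closest match wins
        M = np.zeros_like(w)
        M[j] = 1.0
        w[:] = (1.0 - alpha) * w + alpha * M   # all weights are updated
        beta = alpha * normal_pdf(x, mu[j], sigma[j])
        mu[j] = (1.0 - beta) * mu[j] + beta * x
        var = (1.0 - beta) * sigma[j] ** 2 + beta * (x - mu[j]) ** 2
        sigma[j] = np.sqrt(var)
    else:
        j = -1
        k = np.argmin(w)                       # replace the weakest component
        w[k], mu[k], sigma[k] = w0, x, sigma0
    w /= w.sum()                               # weights add up to 1
    return j

# Hypothetical mixture with K = 3 components for one pixel.
w = np.array([0.6, 0.3, 0.1])
mu = np.array([50.0, 120.0, 200.0])
sigma = np.array([10.0, 10.0, 10.0])
j1 = gmm_update(52.0, w, mu, sigma)    # close to component 0: a match
j2 = gmm_update(250.0, w, mu, sigma)   # far from every component: no match
print(j1, j2, w, mu)
```

A pixel value that matches a high-weight component would be treated as background, while a value that triggers re-initialization is a candidate foreground pixel.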
Fig4:- Background subtraction model
EXTRACTION OF FRAMES
Fig5:- Background subtraction
TEMPORAL DIFFERENCING
In temporal differencing, moving regions are detected by taking the pixel-by-pixel
difference of consecutive frames (two or three) in a video sequence. Temporal
differencing is the most common method for moving object detection in
scenarios where the camera is moving. Here, the moving object is detected by taking
the difference of the consecutive image frames t-1 and t.
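A minimal sketch of two-frame temporal differencing on grey-level images is given below. The threshold value and the synthetic 5x5 frames are assumptions made for illustration; real frames would come from the video.

```python
import numpy as np

def temporal_difference(prev_frame, curr_frame, threshold=25):
    """Return a boolean mask of pixels that changed between frames t-1 and t."""
    # Cast to a signed type so that the subtraction of uint8 images cannot wrap.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Hypothetical frames: one bright pixel appears at position (2, 2).
prev_frame = np.zeros((5, 5), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[2, 2] = 200
motion_mask = temporal_difference(prev_frame, curr_frame)
print(motion_mask.astype(int))
```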
FOREGROUND DETECTION
In this step, the pixels in the frame that belong to the foreground are identified.
Foreground detection compares the video frame with the background model and
identifies candidate foreground pixels from the frame. A commonly used approach for
foreground detection is to check whether a pixel is significantly different from the
corresponding background estimate.
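The check described above can be sketched as a thresholded difference against a background estimate. The running-average background update shown with it is an assumption added for illustration, as are the threshold, the update rate and the function names.

```python
import numpy as np

def detect_foreground(frame, background, threshold=30.0):
    # A pixel is a candidate foreground pixel if it differs
    # significantly from the corresponding background estimate.
    return np.abs(frame.astype(float) - background) > threshold

def update_background(background, frame, mask, rate=0.05):
    """Blend the new frame into the background, except at foreground pixels."""
    blended = (1.0 - rate) * background + rate * frame
    return np.where(mask, background, blended)

# Hypothetical data: a flat background with one object pixel and one small change.
background = np.full((4, 4), 100.0)
frame = np.full((4, 4), 100.0)
frame[1, 1] = 200.0   # object pixel
frame[0, 0] = 110.0   # small illumination change
fg_mask = detect_foreground(frame, background)
background = update_background(background, frame, fg_mask)
print(fg_mask.astype(int))
```

Excluding foreground pixels from the update keeps a moving object from being absorbed into the background estimate too quickly.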
Chapter-5
OBJECT TRACKING
After object detection is achieved, the problem of establishing a correspondence
between object masks in consecutive frames arises. Obtaining the correct track
information is crucial for subsequent actions, such as object identification and activity
recognition. For this purpose, the Kalman filtering technique is used. The Kalman
filter is a recursive two-stage filter. At each iteration, it performs a predict step and an
update step.
The predict step predicts the current location of the moving object based on previous
observations. For instance, if an object is moving with constant acceleration, we can
predict its current location, based on its previous location, $\hat{X}_{t-1}$, using the
equations of motion.
The update step takes the measurement of the object’s current location (if available),
and combines this with the predicted current location, $\hat{X}_t$, to obtain an a posteriori
estimated current location of the object.
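The equations that govern the Kalman filter are given below.
1. Predict stage:
A. Predicted (a priori) state:
$$\hat{X}_{t|t-1} = F \hat{X}_{t-1|t-1} + B U_t$$
B. Predicted (a priori) estimate covariance:
$$P_{t|t-1} = F P_{t-1|t-1} F^T + Q$$
2. Update stage:
A. Innovation or measurement residual:
$$Y_t = Z_t - H \hat{X}_{t|t-1}$$
B. Innovation (or residual) covariance:
$$S_t = H P_{t|t-1} H^T + R$$
C. Optimal Kalman gain:
$$K_t = P_{t|t-1} H^T S_t^{-1}$$
D. Updated (a posteriori) state estimate:
$$\hat{X}_{t|t} = \hat{X}_{t|t-1} + K_t Y_t$$
E. Updated (a posteriori) estimate covariance:
$$P_{t|t} = (I - K_t H) P_{t|t-1}$$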
They can be difficult to understand at first, so let us first take a look at what each of
these variables is used for:
$\hat{X}_t$ is the current state vector, as estimated by the Kalman filter, at time t.
$Z_t$ is the measurement vector taken at time t.
$P_t$ measures the estimated accuracy of $\hat{X}_t$ at time t.
F describes how the system moves (ideally) from one state to the next, i.e. how
one state vector is projected to the next, assuming no noise (e.g. no acceleration).
H defines the mapping from the state vector, $X_t$, to the measurement vector, $Z_t$.
Q and R define the Gaussian process and measurement noise, respectively, and
characterise the variance of the system.
B and U are control-input parameters that are only used in systems that have an input;
these can be ignored in the case of an object tracker.
The two stages of the filter correspond to the state-space model typically used to
model linear dynamical systems. The first stage solves the process equation:
$$X_t = F X_{t-1} + B U_t + W_t$$
The process noise W is additive white Gaussian noise (AWGN) with zero mean and
covariance defined by:
$$E[W_t W_t^T] = Q$$
The second one is the measurement equation:
$$Z_t = H X_t + V_t$$
The measurement noise V is also AWGN with zero mean and covariance defined by:
$$E[V_t V_t^T] = R$$
In order to implement a Kalman filter, we have to define several variables that model
the system. We have to choose the variables contained in $X_t$ and $Z_t$, and also
choose suitable values for F, H, Q and R, as well as an initial value for $P_t$.
We will define our measurement vector as:
$$Z_t = [\, x_{1,t},\ y_{1,t},\ x_{2,t},\ y_{2,t} \,]^T$$
where $(x_{1,t}, y_{1,t})$ and $(x_{2,t}, y_{2,t})$ are
the upper-left and lower-right corners of the bounding box around the detected object,
respectively.
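The predict and update stages can be sketched in a few lines. To keep the example small it tracks a single coordinate with an assumed constant-velocity model rather than the full bounding box; the matrices F, H, Q and R, the initial values and the noise-free measurements are illustrative assumptions, and the control input B U is omitted because it can be ignored for an object tracker.

```python
import numpy as np

def kalman_predict(x, P, F, Q):
    """Predict stage: a priori state and estimate covariance."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kalman_update(x_pred, P_pred, z, H, R):
    """Update stage: combine the prediction with the measurement z."""
    y = z - H @ x_pred                        # innovation
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ y                    # a posteriori state
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new

# Assumed model: state = [position, velocity], measurement = position only.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[1.0]])

x = np.array([0.0, 0.0])       # initial state estimate
P = 100.0 * np.eye(2)          # large initial uncertainty
for t in range(1, 21):
    z = np.array([2.0 * t])    # object moving 2 pixels per frame
    x, P = kalman_predict(x, P, F, Q)
    x, P = kalman_update(x, P, z, H, R)
print(x)
```

For the four-element bounding-box measurement the same two functions would apply unchanged; only the sizes of the state vector and of F, H, Q and R would grow.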
Chapter-6
BLOB ANALYSIS
For image processing, a blob is defined as a region of connected pixels. Blob analysis
is the identification and study of these regions in an image. The algorithms discern
pixels by their value and place them in one of two categories: the foreground
(typically pixels with a nonzero value) or the background (pixels with a zero value).
Blob analysis is used in finding blobs whose spatial characteristics satisfy certain
criteria. In many applications where computation is time consuming, blob analysis is
used to eliminate blobs that are of no interest based on their spatial characteristics, and
keep only the relevant blobs for further analysis. It can also be used to find statistical
information such as the size of the blobs or the number, location, and the presence of
blob regions.
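A minimal sketch of blob analysis on a binary foreground mask is shown below. It labels 4-connected regions with a breadth-first search and discards blobs below a minimum area; the connectivity, the min_area criterion and the function name find_blobs are assumptions chosen for illustration.

```python
from collections import deque
import numpy as np

def find_blobs(mask, min_area=1):
    """Return the area and bounding box of every 4-connected foreground blob."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or visited[r, c]:
                continue
            # Breadth-first search over the connected foreground pixels.
            queue = deque([(r, c)])
            visited[r, c] = True
            pixels = []
            while queue:
                i, j = queue.popleft()
                pixels.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and mask[ni, nj] and not visited[ni, nj]):
                        visited[ni, nj] = True
                        queue.append((ni, nj))
            if len(pixels) >= min_area:       # drop blobs that are too small
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                blobs.append({"area": len(pixels),
                              "bbox": (min(xs), min(ys), max(xs), max(ys))})
    return blobs

# Hypothetical mask: one 2x2 object and one isolated noise pixel.
mask = np.zeros((6, 6), dtype=bool)
mask[1:3, 1:3] = True
mask[4, 4] = True
blobs = find_blobs(mask, min_area=2)
print(blobs)
```

The bounding box is given as (left, top, right, bottom), i.e. the upper-left and lower-right corners of the blob.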
Another typical problem of any motion detection system is its reliability in the
presence of sudden changes in lighting conditions. This is typical of indoor environments
(owing to light switches) but also of outdoor contexts, owing, for example, to a
sudden cloud. In such cases, the system should be able to re-establish normal
conditions as soon as possible. This kind of situation can be easily detected by
continuously monitoring the variation in the number of moving points between
consecutive images.
The difference between the percentages of moving points in $I_t^m$ and $I_{t-1}^m$ has been
evaluated as follows:
$$D_t = \frac{\lVert \mathrm{Num}(I_t^m) - \mathrm{Num}(I_{t-1}^m) \rVert}{\mathrm{dim}(I)}$$
where Num() is the number of black points (i.e. moving points)
in the image and dim() is the image dimension.
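The measure D_t can be computed directly from two binary motion masks. In this sketch moving points are represented by True values rather than black pixels, and the mask sizes and counts are hypothetical.

```python
import numpy as np

def moving_point_change(mask_t, mask_prev):
    """D_t = |Num(I_t^m) - Num(I_{t-1}^m)| / dim(I) for boolean motion masks."""
    num_t = int(mask_t.sum())        # number of moving points at time t
    num_prev = int(mask_prev.sum())  # number of moving points at time t-1
    return abs(num_t - num_prev) / mask_t.size

# Hypothetical 10x10 masks: 5 moving points, then suddenly 65.
mask_prev = np.zeros((10, 10), dtype=bool)
mask_prev.flat[:5] = True
mask_t = np.zeros((10, 10), dtype=bool)
mask_t.flat[:65] = True
D_t = moving_point_change(mask_t, mask_prev)
print(D_t)
```

A large jump in D_t between consecutive frames would suggest a global lighting change rather than genuine object motion.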
Fig6:- Background subtraction of new position
Chapter-7
TRAINING OF NEURAL NETWORK
BACKPROPAGATION
The term “backpropagation” is short for backward propagation of
errors. It is a supervised learning method for training artificial
neural networks, used in conjunction with an optimization method such as gradient
descent. Backpropagation requires that the activation function used by the artificial
neurons be differentiable. The backpropagation learning algorithm has two
phases: propagation and weight update.
Propagation
Each propagation involves the following steps:
1. Forward propagation of a training pattern’s input through the neural network in
order to generate the propagation's output activations.
2. Backward propagation of the propagation’s output activations through the
neural network using the training pattern's target in order to generate the deltas of
all output and hidden neurons.
Weight update
For each weight (synapse), follow these steps:
1. Multiply its output delta and input activation to get the gradient of the
weight.
2. Move the weight in the opposite direction of the gradient by subtracting a
ratio of the gradient from the weight.
This ratio influences the speed and quality of learning; it is called the learning rate.
The sign of the gradient of a weight indicates the direction in which the error is
increasing, which is why the weight must be updated in the opposite direction.
Repeat the two phases until the performance of the network is satisfactory.
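The two phases can be sketched for a small network with one hidden layer of sigmoid units. The XOR data, the layer sizes, the learning rate, the random seed and the squared-error measure are all assumptions made for this illustration and are not the configuration used in the project.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(X, T, W1, b1, W2, b2, lr=0.5):
    """One propagation phase followed by one weight update (in place)."""
    # Phase 1a: forward propagation of the training patterns.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # Phase 1b: backward propagation to obtain the deltas.
    delta_out = (Y - T) * Y * (1.0 - Y)
    delta_hid = (delta_out @ W2.T) * H * (1.0 - H)
    # Phase 2: gradient = input activation * output delta; step against it.
    n = X.shape[0]
    W2 -= lr * (H.T @ delta_out) / n
    b2 -= lr * delta_out.mean(axis=0)
    W1 -= lr * (X.T @ delta_hid) / n
    b1 -= lr * delta_hid.mean(axis=0)
    return 0.5 * np.mean((Y - T) ** 2)   # error before this update

# Hypothetical training set: XOR of two binary inputs.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1 = rng.normal(0.0, 0.5, (2, 3))
b1 = np.zeros(3)
W2 = rng.normal(0.0, 0.5, (3, 1))
b2 = np.zeros(1)
losses = [train_step(X, T, W1, b1, W2, b2) for _ in range(2000)]
print(losses[0], losses[-1])
```

The loop repeats the two phases a fixed number of times; in practice the stopping criterion would be the point at which the error is judged satisfactory.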
BACKPROPAGATION ALGORITHM
The error signal at the output of neuron j at iteration n (i.e, presentation of the nth