VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Belgaum-590 014

2012 - 2013

A Report on

“Intelligent Traffic Analysis & Monitoring

System”

Submitted to RVCE (Autonomous Institution Affiliated to Visvesvaraya Technological

University (VTU), Belgaum) in partial fulfillment of the requirements for the award of

degree of

BACHELOR OF ENGINEERING

in

COMPUTER SCIENCE AND ENGINEERING

by

Mayank Darbari 1RV09CS058

Shruti V Kamath 1RV09CS102

Under the guidance

of

Dr. Rajashree Shettar

Professor

Dept. of CSE, RVCE

R. V. College of Engineering, (Autonomous Institution Affiliated to VTU)

Department of Computer Science and Engineering, Bangalore – 560059

VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELGAUM

R.V. COLLEGE OF ENGINEERING, (Autonomous Institution Affiliated to VTU)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, Mysore Road, R.V. Vidyaniketan Post, Bangalore - 560059

CERTIFICATE

Certified that the project work entitled “Intelligent Traffic Analysis & Monitoring

System” carried out by Mr. Mayank Darbari & Ms. Shruti V Kamath, USN: 1RV09CS058 & 1RV09CS102, bonafide students of R.V. College of Engineering,

Bangalore in partial fulfilment for the award of Bachelor of Engineering in Computer

Science and Engineering of the Visvesvaraya Technological University, Belgaum

during the year 2012-2013. It is certified that all corrections/suggestions indicated for

internal assessment have been incorporated in the report deposited in the departmental

library. The project report has been approved as it satisfies the academic requirement in

respect of project work prescribed for the said degree.

Name of the Examiners Signature with Date

1.____________________ __________________

2.____________________ __________________

Dr. Rajashree Shettar

Professor,

Department of CSE,

R.V.C.E, Bangalore –59

Dr. N. K. Srinath

Head of Department,

Department of CSE,

R.V.C.E, Bangalore –59

Dr. B. S. Satyanarayana

Principal,

R.V.C.E,

Bangalore –59

VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELGAUM

R.V. COLLEGE OF ENGINEERING, (Autonomous Institution Affiliated to VTU)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, Mysore Road, R.V. Vidyaniketan Post, Bangalore - 560059

DECLARATION

We, Mayank Darbari & Shruti V Kamath, students of Eighth Semester B.E., in the

Department of Computer Science and Engineering, R.V. College of Engineering,

Bangalore declare that the project entitled “Intelligent Traffic Analysis &

Monitoring System” has been carried out by us and submitted in partial fulfillment of

the course requirements for the award of degree in Bachelor of Engineering in

Computer Science and Engineering of Visvesvaraya Technological University,

Belgaum during the academic year 2012-2013. The matter embodied in this report has

not been submitted to any other university or institution for the award of any other

degree or diploma.

Mayank Darbari 1RV09CS058

Shruti V Kamath 1RV09CS102

Department of Computer Science and Engineering,

R.V. College of Engineering,

Bangalore-560059

ACKNOWLEDGMENT

Any achievement, be it scholastic or otherwise, does not depend solely on individual effort but on the guidance, encouragement and cooperation of intellectuals, elders and friends. A number of personalities, in their own capacities, have helped us in carrying out this project work.

We would like to take this opportunity to thank them all.

First and foremost we would like to thank Dr. B. S. Satyanarayana, Principal, R.V.C.E,

Bengaluru, for his moral support towards completing our project work.

We would like to thank Dr. N. K. Srinath, Head of Department, Computer Science &

Engineering, R.V.C.E, Bengaluru, for his valuable suggestions and expert advice.

We deeply express our sincere gratitude to our project mentor Dr. Rajashree Shettar, Associate

Dean & Professor, Department of CSE, R.V.C.E, Bengaluru, for her able guidance, regular

source of encouragement and assistance throughout this project.

We thank our Parents, and all the Faculty members of the Department of Computer Science &

Engineering for their constant support and encouragement.

Last, but not least, we would like to thank our peers and friends who provided us with

valuable suggestions to improve our project.

Mayank Darbari

1RV09CS058

8th Sem, CSE

Shruti V Kamath

1RV09CS102

8th Sem, CSE

ABSTRACT

The recent progress in the field of computer science has led to reduced cost and growing computing power of hardware, making vision-based technologies prominent and popular solutions for surveillance and control systems. Visual vehicle surveillance is gaining prominence in traffic analysis and monitoring. It is useful for criminal investigation and for the traffic police department, who need to search a video by its content in order to analyze the objects in it. A common need is to find a vehicle in these videos because of a car crash, speeding, a truck in a no-truck zone, or a particular type of vehicle that the user may be interested in. A user would otherwise have to rewind the video to look for an event which happened at a previous time. This would be a very labor-intensive and tedious process, and events could be overlooked due to human error, if there is no effective content-based method for indexing and retrieval. This project involves a coherent system consisting of object detection, object tracking, classification of objects and their indexing from vehicle surveillance videos, along with vehicle speed-limit warning, determination of traffic flow direction, traffic density estimation and accident detection in real time.

The objects are detected using the optical flow method and tracked using the Kalman filtering method. The extracted objects are classified using 10 features, including shape-based features such as area, height, width, compactness factor, elongation factor, skewness, perimeter, orientation, aspect ratio and extent. A comparative analysis is presented in this project for the classification of objects (car, truck, auto, human, motorcycle, none) based on Multi-class SVM (one vs. all), Back-propagation, and Adaptive Hierarchical Multi-class SVM. The system has different components for traffic analysis and monitoring, such as determining the type of each detected object along with a count for each type computed in real time. The traffic flow direction is determined as top-to-bottom or left-to-right, or their reverse. The traffic density is determined as either low, medium or high. Two algorithms are used for accident detection, one based on motion direction and the other in which a cumulative set of features is calculated to detect an accident.

The results obtained for the classification methods have an accuracy of 92% for Multi-class SVM (one vs. all), 87.8% for Adaptive Hierarchical Multi-class SVM, and 82% for Back-propagation. Using the trained classifier obtained with Multi-class SVM (one vs. all), the objects are classified in real time. In addition, objects are indexed using the type of object, size, colour and the frames in which they appear in their respective videos.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

Chapter 1

Introduction

Detecting and recognizing moving vehicles in traffic scenes for traffic surveillance, traffic control, and road traffic information systems is an emerging research area for Intelligent Transportation Systems. Owing to the reducing cost and growing computing power of hardware, vision-based technologies have become popular solutions for traffic surveillance and control systems. Visual vehicle surveillance videos are widely used by the police for criminal investigation, by traffic monitoring systems, and for detection of abnormal activities and events such as accidents. A common need is to find a vehicle in these videos because of a car crash, speeding, a truck in a no-truck zone, or a particular type of vehicle that the user may be interested in. Analysis of traffic, such as the direction of flow and the density of traffic, is also an important aspect of traffic monitoring systems. A user would otherwise have to rewind the video to look for an event which happened at a previous time. This would be a very labor-intensive and tedious process, and events could be overlooked due to human error, if there is no effective content-based method for indexing and retrieval. The proposed traffic monitoring system greatly reduces this human effort.

Visual vehicle surveillance is one of the fastest growing segments of the security

industry. Some of the prominent commercial vehicle surveillance systems include the IBM

S3 [1] and the Hitachi Data Systems Solutions for Video Surveillance [2]. These systems not

only provide the capability to automatically monitor a scene but also the capability to manage

surveillance data and perform event based retrieval. In India, Aftek, Logica and Traffline are

some of the widely used traffic systems. The most recent research work in visual vehicle

surveillance includes real-time vehicle detection by parts [3], integrated lane and vehicle

detection and tracking [4] and occluded vehicle recognition and tracking [5].

The proposed system is a smart surveillance system which works for both real-time and prerecorded traffic videos. Moving objects are detected and tracked in the given input video. Detected objects are classified based on their types using a robust selection of features. Functionalities related to event detection and traffic analysis, such as accident detection, speed estimation, traffic flow direction determination and traffic density estimation, are also implemented. In case of prerecorded videos, if the videos are large, to save time and

resources, shots are detected from them using a colour histogram method, which computes the histogram difference between consecutive frames. A threshold value is then set, and the frames where this difference is above the threshold are identified as shot boundaries. There are great redundancies among the frames in the same shot; therefore, certain frames that best reflect the shot contents are selected as key frames to succinctly represent the shot. For detection of objects, the Optical Flow Model is used. The objects detected in the video are tracked by Kalman filtering.
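As a rough illustration of this preprocessing step, a minimal MATLAB sketch of the colour-histogram shot boundary detection is given below; the input file name, bin count and threshold are assumptions for illustration, not the values used in the project.

% Minimal sketch: colour-histogram shot boundary detection (assumed parameters).
vid   = VideoReader('traffic.avi');          % hypothetical input video
nBins = 64;                                  % assumed number of histogram bins per channel
thr   = 0.3;                                 % assumed normalised difference threshold
prevH = [];
shotStarts = [];                             % frame indices identified as shot boundaries
for k = 1:vid.NumberOfFrames
    f = read(vid, k);
    h = [imhist(f(:,:,1), nBins); imhist(f(:,:,2), nBins); imhist(f(:,:,3), nBins)];
    h = h / numel(f);                        % normalised colour histogram of the frame
    if ~isempty(prevH) && sum(abs(h - prevH)) > thr
        shotStarts(end+1) = k;               % difference above threshold => new shot
    end
    prevH = h;
end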

Objects detected are classified using Multi-class Support Vector Machine and Back-

propagation algorithm, based on a number of features including shape based features. Initially

the Multi-class SVM (one vs. all), Adaptive Hierarchical Multi-class SVM decision tree

approach [6] and Back-propagation algorithm are trained using samples of the objects to be

detected by it. The trained classifier is then used to predict objects under a predetermined number of classes, based on the type of the object. The classification results of these algorithms are

compared. The system contains separate modules for accident detection, traffic density

measurement, traffic flow direction determination and speed estimation.

Accident detection is achieved by using a cumulative set of features such as

orientation, position, area, change in bounding box of vehicles and speed. When the change

in these features for an object exceeds their respective thresholds across consecutive frames,

then an index is set. When the overall index value exceeds a particular threshold, an accident

is detected. For traffic density estimation, the number of objects per frame is counted and, based on threshold values, the density is estimated to be low, medium or high. The direction of motion vectors across consecutive frames is taken into consideration for estimating the traffic direction to be north-to-south, east-to-west, or their reverse. The change in position of a tracked vehicle across frames per second is calculated to estimate the speed in pixels per second. By using a suitable scale factor, it is converted to kilometres per hour.

The proposed system also has a querying module, which can be used to query

detected objects using their dimension, colour or type. The frames in which only a particular

queried object appears can be viewed as well.

1.1 Definitions

The following are the definitions relevant to this project:

1.1.1 Computer Vision

Computer vision [7] is a field in Computer Science that includes methods for

acquiring, processing, analyzing, and understanding images from the real world in

order to produce numerical or symbolic information. A theme in the development of

this field has been to duplicate the abilities of human vision by electronically

perceiving and understanding an image. Computer vision has applications such as object detection, object recognition and object classification.

1.1.2 Video Analysis

The process of developing algorithms for the purpose of processing digital video data

with the objective of extracting the information conveyed by the data is known as

video analysis [8]. This feature is used in a wide range of domains including health

care, entertainment, safety and security, retail and transport. Algorithms for video

analysis may be implemented as software on computers for general purpose, or as

hardware on processing units specialized for handling video sequences.

1.1.3 Object Detection

Object detection [9] is a computer technology related to computer vision and image

processing that attempts to associate a region of interest in the image or video with a

potential object (such as cars, vegetation, humans, buildings). Well-researched

domains of object detection include face detection and vehicle detection. Object detection is used in many applications of computer vision, including image retrieval and video surveillance.

1.1.4 Video Tracking

Video tracking [10] is the task of estimating over time the position of objects of

interest in a video sequence. It has a variety of uses, some of which are: security and

surveillance, human-computer interaction, video communication and compression,

traffic control, medical imaging and video editing.

1.1.5 Feature Extraction

When the input data to an algorithm is too large to be processed and is suspected to be highly redundant, it is transformed into a reduced representation set of features, also called a feature vector. Transforming the input data into the set of features is called feature extraction. The features used in this project are

aspect ratio, height, width, elongation, skewness, compactness, orientation, area,

perimeter and extent.
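For illustration, a minimal MATLAB sketch of extracting such shape features from a binary foreground mask is shown below (using regionprops). The exact formulas used in the project for compactness, elongation and skewness are not reproduced here, so the derived measures below are assumptions.

% Minimal sketch: shape-feature extraction from a binary foreground mask.
% 'mask' is a logical image with one blob per detected object; 'grayFrame' is the
% corresponding grayscale frame. Both are assumed inputs.
stats = regionprops(mask, 'Area', 'Perimeter', 'BoundingBox', 'Orientation', ...
                    'Extent', 'MajorAxisLength', 'MinorAxisLength', 'PixelIdxList');
features = zeros(numel(stats), 10);
for i = 1:numel(stats)
    s  = stats(i);
    w  = s.BoundingBox(3);                               % bounding-box width
    h  = s.BoundingBox(4);                               % bounding-box height
    compactness = (s.Perimeter ^ 2) / s.Area;            % assumed definition
    elongation  = s.MajorAxisLength / s.MinorAxisLength; % assumed definition
    skew        = skewness(double(grayFrame(s.PixelIdxList)));  % intensity skewness (assumed)
    features(i, :) = [s.Area, h, w, compactness, elongation, skew, ...
                      s.Perimeter, s.Orientation, w / h, s.Extent];
end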

1.1.6 Object Recognition

Object recognition [11] is a process for identifying a specific object in a digital image

or video. Object recognition algorithms rely on matching or learning algorithms using

appearance-based or feature-based techniques. Object recognition is useful in applications such as airport surveillance, automated vehicle parking systems, and bio-imaging.

1.1.7 Supervised Learning

Supervised learning [12] is an approach in which a model is estimated from data for mapping explanatory variables to predicted variables. Supervised learning takes a

known set of input data and known responses to the data, and seeks to build a

predictor model that generates reasonable predictions for the response to new data. A

supervised learning algorithm analyzes the training data and produces an inferred

function, which can be used for mapping new examples. An optimal scenario will

allow for the algorithm to correctly determine the class labels for unseen instances.

1.1.8 Artificial Neural Network

An Artificial Neural Network [13] (ANN) is a computational model based on biological neural networks. To build an artificial neural network, artificial neurons, also called nodes, are interconnected. The architecture of an ANN is very important for performing a particular computation. Some neurons are arranged to take inputs from the outside environment. These neurons are not connected with each other, so they are arranged in a layer, called the input layer. All the neurons of the input layer produce an output, which becomes the input to the next layer. The architecture of an ANN can be single-layer or multilayer. In a single-layer neural network there is only one input layer and one output layer, while in a multilayer neural network there can be one or more hidden layers.

1.1.9 Video Indexing

Physical features can be extracted to partition video data into useful footage segments

and store the segment attribute information, or annotations, as indexes. These indexes

should describe the essential information pertaining to the video segments, and should

be content based. Indexes can be visualized through the interface so users can perform

various functions.

1.1.10 Video Retrieval

Video Retrieval refers to the provision of search facilities over archives of digital

video sequences, where these search facilities are based on the outcome of an analysis of digital video content to extract indexable data for the search process.

1.2 Literature Survey

Vehicle surveillance systems incorporate electronic, computer, and communication

technologies into vehicles and roadways for monitoring traffic conditions, reducing

congestion, enhancing mobility, and detecting accidents. To achieve these goals, in past

decades, there have been many approaches proposed for tackling problems related to visual

vehicle surveillance. Among them, the vision based approach has the advantages of easy

maintenance and high flexibility in traffic monitoring and has thus become one of the most popular techniques used in visual vehicle surveillance for traffic control.

Surveillance and monitoring systems often require online segmentation of all moving

objects in a video sequence. Background subtraction is a simple approach to detect moving

objects in video sequences. The basic idea is to subtract the current frame from a background

image and to classify each pixel as foreground or background by comparing the difference

with a threshold [14]. Morphological operations followed by a connected component analysis

are used to compute all active regions in the image. In practice, several difficulties arise

during Background Subtraction. To deal with these difficulties several methods have been

proposed in [15]. Most works however rely on statistical models of the background. A

Gaussian Mixture Model [16] may be used to detect objects. Another set of algorithms is

based on spatio-temporal segmentation of the video signal. These methods try to detect

moving regions taking into account not only the temporal evolution of the pixel intensities

and colour but also their spatial properties.

Most commercial surveillance systems rely on background modelling for detection of

moving objects, in particular vehicles. However they fail to handle crowded scenes as

multiple objects close to each other are often merged into a single motion blob.

Environmental factors such as shadow effects, rain, snow, etc. also cause issues for object

segmentation. Various models and methods have been proposed for appearance-based object

detection, in particular vehicle detection. Examples include the seminal work of Viola and

Jones [17] and many extensions using different features, such as edgelets and strip features,

as well as different boosting algorithms like Real Adaboost and GentleBoost. Support vector

machines with histograms of oriented gradients have also been a popular choice for object

detection. In earlier work, Schneiderman et al. in [18], showed good vehicle detection results

using statistical learning of object parts.

A survey of the research in object tracking [19] discusses new research in the area of moving object tracking as well as in the field of computer vision. Research on object tracking can be classified as point tracking, kernel tracking and contour tracking, according to how the target object is represented. In the point tracking approach, statistical filtering methods are used to estimate the state of the target object. In the kernel tracking approach, various estimation methods are used to find the region corresponding to the target object. Contour tracking is based on the fact that when an object deforms arbitrarily, each contour point can move independently within it.

A vehicle tracking system [20] has also been developed to deal with night-time traffic videos. The Kalman filter, a type of kernel tracking, and the particle filter, a type of contour tracking, are the most popular tracking methods. The Kalman filter uses a series of measurements observed over time to produce estimates that are more accurate than those based on a single measurement. The particle filter uses a sophisticated model estimation technique based on simulation. In the proposed system, Kalman filtering is used for object tracking due to its predictive nature and its ability to handle occlusions.

Methods for occlusion handling in object detection generally rely on object part

decomposition and modelling. In this project, however, these methods are not well suited due

to the low-resolution vehicle images. Video-based occlusion handling from the tracking

perspective has been addressed by Senior [21], but it assumes objects are initially far apart

before the occlusion occurs. Large-scale learning is an emerging research topic in computer

vision. Recent methods have been proposed to deal with a large number of object classes and

large amounts of data. In contrast, this approach deals with large-scale feature selection,

showing that a huge amount of local descriptors over multiple feature planes coupled with

parallel machine learning algorithms can handle occlusion effectively to an extent. The most

difficult problem associated with vehicle tracking is the occlusion effect among vehicles. In order to solve this problem, an algorithm referred to as a spatio-temporal Markov random field [22] has been developed for traffic images at intersections. This algorithm models the tracking problem by determining the state of each pixel in an image and how such states transition along both the image axes and the time axis.

To feed the classifier with information to use for classification, mathematical measures

called features need to be extracted from the objects to be classified. Feature extraction in

terms of supervised learning [23] can be described as, given a set of candidate features, the selection of the subset of features most suitable for the classification algorithm to be used. Features are generally divided with respect to the shape and texture of the object. Shape features are based on the object's geometry, captured by both the boundary and the interior region. Texture features, on the other hand, depend on the grayscale values of the interior.

Some of the latest feature extraction methods include SIFT descriptor [24], SURF descriptor

[25], GLOH features [26] and HOG features [27]. The features used in this project are a

combination of shape based and texture based features. These include area, perimeter, height,

width, orientation, compactness, extent, skewness, elongation and aspect ratio.

A semi real-time vehicle tracking algorithm [28] to determine the speed of the vehicles in

traffic from traffic cam video has also been developed. This method involves object feature

identification, detection, and tracking in multiple video frames. Speed calculations are made

based on the calibrated pixel distances. Optical flow images have been computed and used

for blob analysis to extract features representing moving objects. Some challenges exist in distinguishing among vehicles in a uniform flow of traffic when the objects are too close, are in low contrast with one another, or travel at the same or nearly the same speed. In the absence of a ground truth for the actual speed of the tracked vehicles, accuracy cannot be determined.

Extensive research has also been done to try to address the problem of estimating the

traffic on the road. Most of the methods used for traffic density estimation either use pre-learnt models [29] (which are built by training the system on pre-classified samples and known object shapes before using it for general classification) or use the temporal data [30] available over time to estimate the vehicles and the background. The problem with these techniques is that they fail when untrained examples appear, such as accidents or changes in weather conditions. Other works mostly use the temporal data available over time through these cameras to estimate the traffic.

A new approach to describe traffic scenes, including vehicle collisions and vehicle

anomalies [31] at intersections by video processing and motion statistic techniques has been

developed. Detecting and analysing accident events are done by observing partial vehicle

trajectories and motion characteristics. Hwang et al. in [32] propose a method which

generates and evolves structure of dynamic Bayesian network to deal with uncertainty and

dynamic properties in real world using genetic algorithm. Effectiveness of the generated

structure of dynamic Bayesian network is evaluated in terms of evolution process and the

accuracy in the domain of traffic accident detection. Jung Lee in [33] considers a video image detector system using tracking techniques that overcome shadows, occlusions and the absence of lighting at night. It derives the traffic information, volume count, speeds and occupancy time under kaleidoscopic environments, and proposes an accident detection system. A system

which uses a Hidden Markov Model [34] has also been developed. The system learns various

event behaviour patterns of each vehicle in the HMM chains and then, using the output from

the tracking system, identifies current event chains. The current system can recognize

bumping, passing, and jamming. However, by including other event patterns in the training

set, the system can be extended to recognize those other events, e.g., illegal U-turns or

reckless driving. There are two methods implemented for accident detection in this paper.

The first one is based on the accident detection module from [35] with additional features

such as change in bounding box in our method. The second method utilizes a block matching

technique for motion estimation as described in [36]. Here motion vectors are taken into

consideration and depending on their changes, occurrence of an accident is determined.

Through this module, we demonstrate the effectiveness of these two simple methods for

accident detection in traffic video sequences.

An approach for visual detection and attribute-based classification of vehicles in

crowded surveillance scenes is explained in [37]. Large-scale processing is addressed along

two dimensions: 1) large scale indexing, and 2) learning vehicle detectors with large-scale

feature selection, using a feature pool containing millions of feature descriptors. This method

for vehicle detection also explicitly models occlusions and multiple vehicle types (e.g., buses, trucks, SUVs, cars), while requiring very little manual labelling. Artificial Neural Networks have been widely used for classification of objects. Bayesian Networks have also been used extensively for object classification. Some of the most popular classification techniques for vehicles include SVM [38], Back-propagation and Adaboost [39]. An Adaptive Hierarchical Multi-class SVM method [6] has also been discussed.

For indexing purposes [40], the vehicles are tracked over time, and each vehicle is given

a unique ID. In addition, the location and the bounding box of each vehicle are output for

each frame. This data format includes the ID, the first frame the vehicle appears, the vehicle’s

position and size for each frame until it disappears, and its average size and type. These data for each detected vehicle are saved as a text file, and the vehicle itself is stored into a metadata repository with its ID as its name. In the proposed system, a similar approach with additional

parameters such as dimensions of vehicles is used for indexing and retrieval. Each detected

object is given a video ID, a position ID, file ID, type ID and colour ID. These IDs for a

detected object along with its image and dimension are stored in a .mat file for querying

purposes.
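A minimal MATLAB sketch of writing such per-object index records to a .mat file is shown below; the field names and example values are illustrative assumptions rather than the exact format used in the project.

% Minimal sketch: storing one per-object index record for later querying.
record.videoID    = 3;                     % assumed identifiers for illustration
record.fileID     = 17;
record.typeID     = 'car';
record.colourID   = 'white';
record.positionID = [120 45];              % first-seen centroid, in pixels
record.dims       = [80 40];               % bounding-box width and height
record.frames     = 215:260;               % frames in which the object appears
record.thumbnail  = objImage;              % cropped image of the detected object (assumed variable)

if exist('index.mat', 'file')
    S     = load('index.mat', 'index');
    index = [S.index, record];             % append to the existing index
else
    index = record;
end
save('index.mat', 'index');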

After thorough research, the following algorithms were concluded to be the best suited for this project: the Optical Flow Model [41] for object detection and Kalman filtering [42] for tracking objects, including in real time. A comparative analysis of the classification algorithms Multi-class SVM (one vs. all), Adaptive Hierarchical Multi-class SVM and Back-propagation

is presented for categorizing detected objects. The approaches mentioned above have been

incorporated for accident detection and the traffic analysis module.

1.3 Motivation

CCTV (Closed-Circuit Television) cameras are becoming increasingly common and widespread, due to the increasing traffic in most cities. Used for traffic management, the

cameras allow operators to monitor traffic conditions visually and the police to detect crimes

such as speeding or banned vehicles. The large number of cameras makes it impractical for

each to be monitored at all times by an operator, and as such it is common for an event of

interest (e.g. an accident) to be neglected. A surveillance monitoring and analysis system

benefits traffic and police authorities in order to serve the society better. For example, if an

accident occurs and is detected, the appropriate authorities can be notified in real time so that

quick action can be taken. Another example application is that illegal vehicles on the road, such as a truck in a no-truck zone, may be detected and flagged in the video in real time. As manual monitoring of multiple CCTV cameras is impractical, it is necessary to

develop a content based indexing and retrieval system to make this tedious process easy.

With suitable processing and analysis it is possible to extract a lot of useful information on

traffic from the videos, e.g. the number, type, and speed of vehicles using the road. Computer

Vision being a new and upcoming field in Computer Science, and visual vehicle surveillance being a relatively new area of exploration, a system using its principles and methods has been developed. The main goal of this system is to build a coherent traffic analysis and monitoring system which will contribute to a better-run city.

1.4 Problem Statement

The requirement is to build a system for road video surveillance which tackles the common issues and helps provide useful inputs for managing and controlling the traffic. The

main aim of the project is to build an Intelligent Traffic Analysis & Monitoring System. It

should be able to detect and track objects in the given input videos. The detected objects

should also be classified based on their type, using a robust selection of features.

Functionalities related to event detection and traffic analysis like accident detection, speed

estimation, traffic flow direction determination and traffic density estimation also need to be

implemented. The project tries to develop and combine algorithms which lead to an efficient

system in terms of detection, tracking and classification of objects. The system should work

in real time and for prerecorded videos as well. The system should have a querying module which can extract objects from a video based on their dimension, colour and type. It should also be able to show the frames in which a particular queried object appears.

1.5 Objectives

The objectives of this project are as follows:

1. Detect objects in a video.

2. Track objects in a video.

3. Distinguish between humans and vehicles in a video.

4. Amongst determined vehicles, classify them according to their type.

5. In case of vehicles, index them based on their dimensions, type and colour.

6. Classify vehicles based on these parameters.

7. Retrieve vehicles based on these parameters.

8. Detect the occurrence of an accident.

9. Detect speed of the vehicle.

10. Detect the density of traffic.

11. Detect the direction of flow of the traffic.

1.6 Scope

This project presents a novel vehicle surveillance video indexing and retrieval system

based on object type measurement. The system works for real time video sequences, as well

as prerecorded videos. Firstly, all moving objects are detected from videos using Optical

Flow Model, followed by Kalman Filtering for tracking detected objects. Then each moving

object is segmented, and its features are extracted. Both the vehicle image and its features are

stored in the metadata repository. During retrieval, when the user has selected a particular

type of vehicle, the system returns the most qualified vehicles without re-processing the videos. The video clip which contains the vehicle selected by the user is then replayed, and the trajectory is depicted on the frame simultaneously. Experimental results show that this system is

an effective approach for video surveillance and interactive indexing and retrieval. The

system also determines the density of traffic and the speed of vehicles, detects accidents, and determines the traffic flow direction in a chosen video, which are most commonly required for traffic analysis.

1.7 Methodology

The preprocessing of a surveillance video, that is, shot boundary detection and key frame extraction, is done to reduce redundant frames, if necessary. If the video duration is short, we proceed directly to the next step. To detect the objects in the video, the Optical Flow Model is used. Optical flow is the distribution of apparent velocities of movement of brightness patterns in an image. Optical flow can arise from the relative motion of objects and the viewer, and reflects the image changes due to motion during a time interval. The optical flow field is represented as a field of velocity vectors: the length of each vector gives the magnitude of the velocity, and its direction gives the direction of motion.
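As an illustration of how such a flow field can be computed, the sketch below gives a bare-bones block-wise Lucas-Kanade estimate between two consecutive grayscale frames in MATLAB. This is one classical optical-flow formulation and not necessarily the exact variant used in the project; the block size and motion threshold are assumptions.

% Minimal sketch: block-wise Lucas-Kanade optical flow between frames f1 and f2 (double, grayscale).
blk = 8;                                        % assumed block size in pixels
Ix  = conv2(f1, [-1 1; -1 1] / 4, 'same');      % horizontal spatial gradient
Iy  = conv2(f1, [-1 -1; 1 1] / 4, 'same');      % vertical spatial gradient
It  = f2 - f1;                                  % temporal gradient
u = zeros(floor(size(f1,1)/blk), floor(size(f1,2)/blk));   % horizontal flow per block
v = zeros(size(u));                                         % vertical flow per block
for r = 1:size(u, 1)
    for c = 1:size(u, 2)
        rr = (r-1)*blk + (1:blk);
        cc = (c-1)*blk + (1:blk);
        A  = [reshape(Ix(rr,cc), [], 1), reshape(Iy(rr,cc), [], 1)];
        b  = -reshape(It(rr,cc), [], 1);
        if rank(A' * A) == 2                    % flow is well defined only where gradients vary
            uv = (A' * A) \ (A' * b);           % least-squares solution of A * [u; v] = b
            u(r, c) = uv(1);
            v(r, c) = uv(2);
        end
    end
end
mag    = sqrt(u.^2 + v.^2);                     % velocity magnitude per block
moving = mag > 1;                               % assumed threshold (pixels/frame) for moving regions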

After objects are detected, they are tracked using Kalman Filtering. This algorithm uses a

series of measurements of position of the object that has been detected in the frame observed

over time, containing noise and other inaccuracies, and produces estimates of unknown

variables that tend to be more precise than those based on a single measurement alone. Thus, the position estimate in the next frame is determined. The weights are then updated once the position in the next frame is known (it becomes the present frame): higher weights are given to those detections with a higher certainty of belonging to a track, and vice versa. The predicted tracks are assigned to the detections using an assignment algorithm called the Hungarian algorithm, and thus the most optimal tracks are obtained. The bounding box and trajectory for each object are drawn.
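A minimal MATLAB sketch of the predict and update steps of a constant-velocity Kalman filter for a single track is shown below. The noise covariances, frame interval and the 'detections' input are assumptions for illustration, and the Hungarian data-association step is omitted.

% Minimal sketch: constant-velocity Kalman filter for one track.
% State x = [px; py; vx; vy]; measurement z = [px; py]. z0 and detections are assumed inputs.
dt = 1;                                          % one frame between measurements
A  = [1 0 dt 0; 0 1 0 dt; 0 0 1 0; 0 0 0 1];     % state transition model
H  = [1 0 0 0; 0 1 0 0];                         % measurement model
Q  = 0.01 * eye(4);                              % assumed process-noise covariance
R  = eye(2);                                     % assumed measurement-noise covariance
x  = [z0; 0; 0];                                 % initialise from the first detection z0 (2x1)
P  = eye(4);                                     % initial state covariance
predicted = zeros(2, numel(detections));
for k = 1:numel(detections)                      % detections: cell array of 2x1 centroids (or [])
    x = A * x;                                   % predict the state for the current frame
    P = A * P * A' + Q;
    z = detections{k};
    if ~isempty(z)                               % update with the assigned detection, if any
        K = P * H' / (H * P * H' + R);           % Kalman gain
        x = x + K * (z - H * x);
        P = (eye(4) - K * H) * P;
    end
    predicted(:, k) = H * x;                     % tracked position for this frame
end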

When the vehicles come closer to the camera, or somewhere midway in case of two way

traffic, the vehicles are extracted. Features are extracted from each detected object, namely aspect ratio, height, width, elongation, perimeter, area, compactness, extent, skewness and orientation. These features are then used to train the Multi-class SVM (one vs. all). Around 1000 samples are taken to train the Multi-class SVM (one vs. all). These samples were

manually labeled into the following classes: Car, Bike, Truck/Bus, Human, Auto & Junk. The Multi-class SVM (one vs. all) model was trained using a Gaussian Radial Basis Function (RBF) kernel. 500 samples were used for testing the trained classifier.
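A minimal MATLAB sketch of the one-vs-all training and prediction is shown below. It assumes the svmtrain/svmclassify functions shipped with that generation of MATLAB, X and y as the feature matrix and label vector, and resolves ties simply by taking the first accepting classifier, which is a simplification.

% Minimal sketch: one-vs-all multi-class SVM with an RBF kernel.
% X: N x 10 feature matrix, y: N x 1 labels in 1..6 (car, bike, truck/bus, human, auto, junk).
nClasses = 6;
models   = cell(nClasses, 1);
for c = 1:nClasses
    models{c} = svmtrain(X, double(y == c), 'Kernel_Function', 'rbf');   % class c vs. rest
end

% Prediction for a test set Xtest (M x 10): first accepting binary classifier wins (simplification).
pred = zeros(size(Xtest, 1), 1);
for i = 1:size(Xtest, 1)
    for c = 1:nClasses
        if svmclassify(models{c}, Xtest(i, :)) == 1
            pred(i) = c;
            break;
        end
    end
end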

Adaptive Hierarchical Multi-class SVM is used to train and test the samples mentioned above as well. The training and testing are done in the form of a binary tree, with an RBF kernel being used.

These results were compared to results obtained, using Back-propagation. The algorithm

used was Levenberg-Marquardt Back-propagation algorithm with an input layer consisting of

10 nodes, hidden layer consisting of 12 nodes and output layer consisting of 6 nodes.
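A minimal MATLAB sketch of this network using the Neural Network Toolbox is shown below; the variables X, T and Xtest are assumed inputs (feature vectors as columns, one-hot class targets).

% Minimal sketch: Levenberg-Marquardt back-propagation with one hidden layer of 12 nodes.
% X: 10 x N feature matrix (one column per sample), T: 6 x N one-hot class targets.
net          = feedforwardnet(12);    % single hidden layer with 12 nodes
net.trainFcn = 'trainlm';             % Levenberg-Marquardt back-propagation
net          = train(net, X, T);
scores       = net(Xtest);            % 6 x M network outputs for the test samples
[~, pred]    = max(scores, [], 1);    % predicted class index per test sample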

These three classification techniques were compared, and since Multi-Class SVM (one

vs. all) gave better results, it was incorporated into the real time detection, tracking and

classification module. When a video input is given to this module, the features of detected

objects are extracted and, based on these features and the trained Multi-class SVM (one vs. all), their classes are predicted as one of those mentioned above.

The traffic analysis module contains the following three functionalities, namely Speed

Estimation, Traffic Flow Direction Determination & Traffic Density Estimation. For speed

estimation in a particular frame, the change in pixels per second of the object between two

consecutive frames (the previous frame and the current frame) is calculated. For traffic flow

estimation, the video frame is divided into blocks. For each block, a motion vector is

calculated and the motion between the video frames is estimated. This estimation is done

using a block matching method by moving a block of pixels over a search region. Traffic

Density is calculated by counting the number of vehicles per frame and using appropriate

threshold values. The density is determined to be low, medium or high.
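A minimal MATLAB sketch of the speed and density estimates is given below; the frame rate, metres-per-pixel scale factor, density thresholds and the tracked-centroid inputs are all assumptions for illustration.

% Minimal sketch: per-frame vehicle speed and traffic-density estimation.
fps   = 30;                                     % assumed frame rate
scale = 0.05;                                   % assumed metres per pixel
pxPerFrame = norm(centroidNow - centroidPrev);  % displacement of one tracked vehicle (pixels/frame)
speedKmph  = pxPerFrame * fps * scale * 3.6;    % pixels/frame -> metres/second -> km/h

nVehicles = numel(tracks);                      % vehicles present in the current frame (assumed)
if nVehicles < 5                                % assumed thresholds
    density = 'low';
elseif nVehicles < 15
    density = 'medium';
else
    density = 'high';
end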

Accident detection for an object is done by calculating the change in speed, area, position, bounding box size and orientation of a particular vehicle across the consecutive frames it is present in. These feature changes are then accumulated and compared with a threshold; if the accumulated value exceeds the threshold, an accident is signalled. An accident can also be detected by observing the random change in direction of motion vectors in the video when an accident occurs. The results for both methods are compared as well.
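A minimal MATLAB sketch of the cumulative accident index for one tracked vehicle is shown below; all per-feature thresholds, the overall threshold and the frame-to-frame feature values are assumptions for illustration.

% Minimal sketch: cumulative accident index for one tracked vehicle across two frames.
dSpeed  = abs(speedNow   - speedPrev);          % change in estimated speed
dArea   = abs(areaNow    - areaPrev);           % change in blob area
dPos    = norm(posNow    - posPrev);            % change in centroid position
dBox    = abs(boxAreaNow - boxAreaPrev);        % change in bounding-box size
dOrient = abs(orientNow  - orientPrev);         % change in orientation

changes    = [dSpeed, dArea, dPos, dBox, dOrient];
thresholds = [20, 300, 25, 250, 30];            % assumed per-feature thresholds
accIndex   = sum(changes > thresholds);         % one vote per feature exceeding its threshold

if accIndex >= 3                                % assumed overall threshold
    disp('Accident detected');
end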

1.8 Organization of the Report

To make the understanding of the realised work easier, the report contains eight chapters

which can be explained as follows:

Chapter 1: The introduction, methodology and objectives of this project are described in this

chapter. The difficulty is not only to track and detect the objects, but also to associate the data

to different objects. This includes a number of problems such as classification, validation and

occlusion handling. Separate modules for accident detection and traffic analysis are also developed.

Chapter 2: This chapter describes the software requirements specification, which lists all requirements necessary for the development of the project.

Chapter 3: This chapter explains the high-level design of the project. It describes the input data, the output and the transformations necessary to visualize the results. This chapter

also describes the system architecture and the data flow diagrams.

Chapter 4: Gives a detailed design and theoretical description of the various algorithms used in this project, be it for tracking, detection or classification.

Chapter 5: This chapter describes the languages and the environment used for developing and implementing the project. Here an overview of MATLAB is given, defining its coding standards, syntax and limitations.

Chapter 6: This chapter has a detailed overview of the various tests performed on the

system.

Chapter 7: In this chapter the results of the different aspects of the system are tabulated and

presented. The errors and the accuracy of the algorithms used are investigated and then the

influences of different parameters are tested.

Chapter 8: Finally this chapter ends the report with a conclusion and future works.

Chapter 2

Software Requirements Specification

2.1 Overall Description

This section gives a brief description of the proposed system.

2.1.1 Product Perspective

This project presents a novel vehicle surveillance video indexing and retrieval system

based on object type measurement. The system works for real time video sequences, as well

as pre-recorded videos. Firstly, all moving objects are detected from videos using Optical

Flow Model, followed by Kalman Filtering for tracking the detected objects. Then each

moving object is segmented, and its features are extracted. Both the vehicle image and its

features are stored in the metadata repository. During retrieval, when the user selects a

particular vehicle, based on either type or colour, the system would return the most qualified

vehicles without re-processing the videos. The video clip which contains the vehicle selected by the user is then replayed, and the trajectory is depicted on the frame simultaneously. Experimental results show that this system is an effective approach for video surveillance and interactive indexing and retrieval. The system also incorporates commonly useful functionalities such as the classification of vehicles, estimation of traffic density and vehicle speeds, detection of accidents, and determination of traffic flow direction in a chosen video.

2.1.2 Product Functions

In this project, traffic videos are captured by stationary cameras gazing at an angle toward the ground plane. The system works for both pre-recorded and real-time videos. The system can be divided into 5 modules:

Object Detection and Tracking Module: This module uses the Optical Flow Model to detect objects and Kalman filtering to track them. This module works for both real-time and pre-recorded videos.

Classification Module: This module classifies the detected object into six classes, namely

car, bike, truck/bus, human, auto and junk, based on the features of the object. This module

uses three algorithms for classification: Multi-class SVM (one vs. all), Adaptive Hierarchical Multi-class SVM and the Back-propagation algorithm. This module works for both real-time and pre-recorded videos.

Accident Detection Module: This module checks the video for accidents and signals if an

accident occurs. This module implements two approaches for accident detection, one based

on motion vectors of the detected object, and the other based on calculation of overall

accident index for all the detected objects.

Traffic Analysis Module: This module has three functionalities: Traffic Flow Direction

Estimation, Traffic Density Estimation and Vehicle Speed Estimation.

Querying Module: This module can be used to query detected objects based on their

dimension, colour and type. This module also shows the frames in which a particular object

appears.

2.1.3 User Characteristics

The program provides very easy-to-use functions to input the video and does not

expect any extra technical knowledge from the user. A basic understanding of all the options provided in the program helps the user analyse the video to the best possible extent. Since input is command-driven, it is sufficiently easy for any kind of end user to execute.

2.1.4 Constraints

1. The software only works for videos captured by stationary cameras.

2. Foreground blobs vary according to the quality of video, not being well defined in

certain cases even after applying morphological operations.

3. Occlusions are partially handled in certain situations; because of the stationary camera and its position, the vehicles do not diverge while they are captured.

4. Managing huge amounts of training data for Multi-class SVM and Back-propagation

is tedious.

5. The height of the camera is required to get real world coordinates for the objects.

2.1.5 Assumptions and Dependencies

1. The camera for recording videos is stationary.

2. The video contains moving prominent vehicles, which are not heavily occluded.

3. Scale factor is assumed, as the height of the camera is not known.

4. Certain dimensions are approximated based on other dimensions.

2.2 Specific Requirements

2.2.1 Functional Requirements

1. The system handles both real time, and pre-recorded video.

2. The system is able to take snapshots from the video.

3. The system is capable of video playback at 30fps with a resolution of at least

320x240.

4. The system should be compatible with AVI, WMV formats.

2.2.2 Performance Requirements

The device must be running at least on a 2.67 GHz CPU, with 4 GB of RAM, for the

application to run optimally. Power supply (220 Volts A.C.) should be provided to the

system.

2.2.3 Supportability

The supportability or maintainability of the system being built, including the coding

standards, naming conventions, class libraries, maintenance access and maintenance utilities, can be enhanced by implementing it like a real-world application, with the use of optimal high-end hardware and software. One or two lines of documentation must be provided along with

the functions to indicate what they are trying to achieve. Documentation must be provided for

every module.

2.2.4 Software Requirements

o Windows XP/7/8 OS

o MATLAB 2011a, with Image Processing Toolkit

o SmartDraw

2.2.5 Hardware Requirements

o Intel ® Core ™ i5 CPU, M 560 @ 2.67 GHz

o 4GB RAM

o 500GB HDD

o A colour monitor with a minimum resolution of 1024x768 pixels

2.2.6 Design Constraints

1. Efficient usage of memory is required for processing images and videos.

2. As the program needs to run even on low-end machines, the code should be efficient and optimal with minimal redundancy.

3. Needless to say, the program is made to be robust and fast.

4. It is assumed that the standard output device, namely the monitor, supports colours.

5. One of the requirements in the file saving and retrieval process is that the required file be in the current directory.

2.2.7 Interfaces

The interface developed is user friendly and very easy to understand. Even a new user

can easily understand the complete functionalities of the application. A Graphical User

Interface (GUI) is implemented for this purpose.

2.3 Concluding Remarks

This chapter describes the software requirements specification, which lists all requirements necessary for the development of the project. It gives a brief module-wise description of the project, as well as specifying the software, hardware and design requirements for the project.

Chapter 3

High Level Design

This chapter deals with the design constraints of the system, the architectural strategy adopted, the system architecture, and the data flow diagrams for the different modules of the system.

3.1 Design Constraints

This section addresses the issues that need to be discussed or resolved before

attempting to devise a complete design solution.

3.1.1 General Constraints

The following constraints must be kept in mind while developing the code:

1. The system must have MATLAB 2011a or higher with Image Processing Toolbox,

Computer Vision Toolbox & Neural Network Toolbox installed.

2. The program must be robust to handle multiple conditions and sub-conditions.

3. The operating system in use must be Windows 7 (or any equivalent) or higher.

3.1.2 Development Methods

The whole project has been implemented on MATLAB 2011a. The user interface will

be coded in the same using GUIDE (GUI Development Environment) as its core. Different

modules are required for different aspects of the project. Their integration needs to be done in

a manner such that they do not interfere with the other functionalities.

3.2 Architectural Strategies

This section gives a description about the architectural strategies adopted for the

development of this project. It mentions the programming language used, the future plans for

the project, how data is stored and managed and describes the software model of the proposed

system.

3.2.1 Programming Language

The programming language plays a major role in the efficiency as well as the future

development of the project. As such, we have chosen MATLAB as the programming

environment to be used. We made minimal use of the in-built functions of MATLAB.

The algorithms for all the parts of the proposed system have been coded manually.

3.2.2 Future Plans

While this project exploits the manipulation of the various parameters, some features

may affect the optimal classification of objects more than others. As part of our future

enhancements, we aim to find these features and optimize them so as to find the most

accurate solution for classification. Furthermore, it would be worthwhile to run this system

with a feed from a greater variety of cameras, as well as using moving cameras. Most likely,

this would aid in complete handling of occlusion and would lead to improved detection and

classification results. Data storage should be as efficient as possible, in spite of having a large

number of training samples.

3.2.3 Data Storage Management

Data storage management is essential for the efficient running of the program. It has

been ensured that all dynamically allocated variables and objects are efficiently cleaned up

and de-allocated. Various techniques must be used to guarantee a certain amount of available

RAM to the program memory space.

3.2.4 Software Model

The described system follows a common data repository model. This model consists

of a set of mechanisms and data structures that allows software to handle data and execute

functions in an effective manner. In this model, there is a central data repository which holds

all the data. This system has a central repository of image and video files as can be seen in

Fig 3.1. The user can choose a video from this repository for processing according to her

needs. There also exist .mat files which contain a video ID, a position ID, file ID, type ID and

colour ID for each detected object. Real-time videos after processing are stored back into the

repository.

3.3 System Organization

This section gives a description about the organization of the various system

components and the order in which the input to the system is processed. Fig 3.1 demonstrates

the framework for the project.

Figure 3.1 Framework for Traffic Analysis & Monitoring System

The proposed system works for both real time and pre-recorded traffic videos. In case

of pre-recorded videos, if the videos are large, to save time and resources, shots are detected from them using a colour histogram method, which computes the histogram difference between consecutive frames. A threshold value is then set, and the frames where this difference is above the threshold are identified as shot boundaries. There are great redundancies among the frames in the same shot; therefore, certain frames that best reflect the shot contents are selected as key frames to succinctly represent the shot. For detection of objects, the Optical Flow Model is used, which has been found to be more efficient in comparison to the Gaussian Mixture Model.

The objects detected in the video are tracked by Kalman Filtering. Objects detected

are classified using Multi-class SVM (one vs. all), Adaptive Hierarchical Multi-class SVM and Back-propagation, based on a number of features. Initially, the Multi-class SVM (one vs. all), Adaptive Hierarchical Multi-class SVM and Back-propagation algorithm are trained using samples of the objects to be detected. The trained classifiers are then used to predict objects

under a predetermined number of classes, based on the type of the object. Certain features such as dimensions and colour are used for querying vehicles from a given surveillance video.

The system also contains modules for Accident Detection and Traffic Analysis. The

Traffic Analysis module consists of three functionalities, namely Traffic Density

Measurement, Traffic Flow Direction Determination and Vehicle Speed Estimation.

The system contains a Querying module as well. This module can be used to query

detected objects based on their dimension, colour and type. This module also shows the

frames in which a particular object appears.

3.4 Data Flow Diagrams

A data flow diagram (DFD) is a graphical representation of the "flow" of data through

an information system, modelling its process aspects. DFDs can also be used for the

visualization of data processing (structured design). A DFD shows what kinds of information

will be input to and output from the system, where the data will come from and go to, and

where the data will be stored. The DFDs have been drawn using the SmartDraw software.

3.4.1 DFD Level – 0

The DFD Level-0 consists of two external entities, the GUI and the Output Video, along with

a process block 1.0, representing the Intelligent Traffic Analysis & Monitoring System as

shown in Fig 3.2. The ITAMS consists of the Tracking, Querying, Accident Detection and

Traffic Analysis modules. The user selects one of these modules and the desired output is

displayed.

Figure 3.2 Data Flow Diagram Level – 0

3.4.2 DFD Level – 1

The DFD Level-1 consists of two external entities, the GUI and the Output Video, along with

six process blocks, representing the internal workings of the Intelligent Traffic Analysis & Monitoring System as shown in Fig 3.3. Initially, objects are detected in the input video in process block 1.1. After this, objects are tracked in process block 1.3. From this step, the user can proceed in one of three ways: she can choose the accident detection module

(process block 1.2), the traffic analysis module (process block 1.4), or the classification

module (process block 1.6), which is preceded by the feature extraction module (process

block 1.5).

Figure 3.3 Data Flow Diagram Level – 1


3.4.3 DFD Level – 2 (Traffic Analysis)

The DFD Level-2 for Traffic Analysis consists of two external entities, the GUI and the

Output Video, along with three process blocks, representing the three functionalities of the

Traffic Analysis Module, namely Speed Estimation (process block 1.4.1), Traffic Flow

Direction Determination (process block 1.4.2) and Traffic Density Estimation (process block

1.4.3) as shown in Fig 3.4. Any one of these functionalities may be chosen by the user, and the corresponding analysis takes place on the selected input video.

Figure 3.4 Data Flow Diagram Level – 2 (for Traffic Analysis)


3.4.4 DFD Level – 2 (Object Tracking)

The DFD Level-2 for Object Tracking consists of two external entities, the GUI and the

Output Video, along with five process blocks, representing the working of the Kalman Filter for Object Tracking, as shown in Fig 3.5. The prior position of the object is noted (process block 1.3.1). The filter consists of two operations, predict (process block 1.3.3) and update (process block 1.3.2), wherein the tracks of objects are predicted in the next frame (process block 1.3.4) and later the weights are updated for more accurate prediction in the subsequent frames.

Figure 3.5 Data Flow Diagram Level – 2 (for Object Tracking)


3.4.5 DFD Level – 2 (Accident Detection)

The DFD Level-2 for Accident Detection consists of two external entities, the GUI and the

Output Video, along with six process blocks, to calculate the overall accident index, and

signal whether an accident has occurred or not as shown in Fig 3.6. After the speed (1.2.1),

area (1.2.2), bounding box size (1.2.3), orientation (1.2.4) and position (1.2.5) of the detected

vehicles have been determined, they are all compared with preset threshold values.

Depending on this comparison, the overall accident index is calculated (process block 1.2.6),

and if it exceeds the total threshold value, then an accident is signaled.

Figure 3.6 Data Flow Diagram Level – 2 (for Accident Detection)

3.5 Concluding Remarks

This chapter explains the high-level design of the project. It describes the input data, the output, and the transformations necessary to visualize the results. It also describes the system architecture and the data flow diagrams.


Chapter 4

Detailed Design

This chapter describes each system component in detail. Fig 4.1 shows the

organization of the various system components. First object detection is done using Optical

Flow Model, followed by tracking, which is achieved through Kalman Filtering. After this,

features are extracted, and are used to train the Support Vector Machine and Back-

propagation Algorithm, which are used to classify objects into predetermined categories.

Modules for accident detection and traffic analysis are integrated after the features have been

extracted. A module for Querying is present as well.

Figure 4.1 Organization of System Components


4.1 Pre-processing

Pre-processing may be required for pre-recorded video sequences. Shot boundary

detection [16] is performed to extract a scene of interest from a given input video. This is

followed by key frame extraction [16] if the resultant video has redundant frames.

4.1.1 Shot Boundary Detection

A shot is a consecutive sequence of frames captured by a camera action that takes

place between start and stop operations, which mark the shot boundaries. There are strong

content correlations between frames in a shot. Therefore, shots are considered to be the

fundamental units to organize the contents of video sequences and the primitives for higher

level semantic annotation and retrieval tasks. Generally, shot boundaries are classified as cuts, in which the transition between successive shots is abrupt, and gradual transitions, which include dissolves, fades in/out, wipes, etc., stretching over a number of frames. Cut

detection is easier than gradual transition detection. The algorithm implemented in this

project follows a threshold based approach.

The threshold-based approach detects shot boundaries by comparing the measured

pair-wise similarities between frames with a predefined threshold: When a similarity is less

than the threshold, a boundary is detected. The threshold can be global, adaptive, or global

and adaptive combined. The global threshold-based algorithms use the same threshold, which

is generally set empirically, over the whole video, as in. The major limitation of the global

threshold-based algorithms is that local content variations are not effectively incorporated

into the estimation of the global threshold, therefore influencing the boundary detection

accuracy.
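As an illustration of the threshold-based approach, a minimal MATLAB sketch of cut detection by histogram differencing is given below. It assumes the video has already been read into an H-by-W-by-3-by-N array named frames and that T is an empirically chosen global threshold; the function name detectShots and these variable names are illustrative and not part of the project code.

function boundaries = detectShots(frames, T)
    % Threshold-based cut detection by comparing grey-level histograms
    % of consecutive frames (sketch under the assumptions stated above).
    nFrames = size(frames, 4);
    boundaries = [];
    prevHist = imhist(rgb2gray(frames(:, :, :, 1)));
    for k = 2:nFrames
        currHist = imhist(rgb2gray(frames(:, :, :, k)));
        % Pair-wise dissimilarity between frame k-1 and frame k
        diffVal = sum(abs(currHist - prevHist));
        if diffVal > T
            boundaries(end + 1) = k;   % large change => shot boundary (cut)
        end
        prevHist = currHist;
    end
end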

4.1.2 Key Frame Extraction

There are great redundancies among the frames in the same shot; therefore, certain

frames that best reflect the shot contents are selected as key frames to succinctly represent the

shot. The extracted key frames should contain as much salient content of the shot as possible

Page 40: 8th Sem Project ITAMS

Detailed Design ITAMS

Dept. of CSE, RVCE Feb-May 2013 Page 29

and avoid as much redundancy as possible. The implemented algorithm adopts the colour feature to extract key frames. The following algorithm has been used in this project (a minimal sketch follows the steps below):

1. Choose the first frame as the standard frame that is used to compare with the

following frames.

2. Get the corresponding pixel values in both frames one by one, and compute their differences.

3. After finishing step 2, add all the results of step 2 together. The sum is the difference between the two frames.

4. Finally, if the sum is larger than the chosen threshold, select frame (1+i) as a key frame; frame (1+i) then becomes the new standard frame. Repeat steps 1 to 4 until no more frames remain.
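The following is a minimal MATLAB sketch of the above key-frame procedure, assuming frames is an H-by-W-by-3-by-N array and T the chosen difference threshold; the function name extractKeyFrames is illustrative.

function keyIdx = extractKeyFrames(frames, T)
    % Key-frame selection by summed pixel differences (steps 1-4 above).
    nFrames  = size(frames, 4);
    keyIdx   = 1;                                    % step 1: first frame is the standard frame
    standard = double(rgb2gray(frames(:, :, :, 1)));
    for k = 2:nFrames
        curr    = double(rgb2gray(frames(:, :, :, k)));
        diffSum = sum(abs(curr(:) - standard(:)));   % steps 2 and 3
        if diffSum > T                               % step 4
            keyIdx(end + 1) = k;                     % frame k is a key frame
            standard = curr;                         % it becomes the new standard frame
        end
    end
end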

4.2 Optical Flow Model

The Optical Flow Model described in [41] is used for object detection. Optical flow is

the distribution of apparent velocities of movement of brightness patterns in an image.

Optical flow can arise from the relative motion of objects and the viewer. Discontinuities in the optical flow can help in segmenting images into regions that correspond to different objects.

To compute the optical flow between two images, the following optical flow

constraint equation is evaluated:

Ixu+Iyv+It=0 (4.2.1)

In equation (4.2.1), the following values are represented:

Ix, Iy and It are the spatiotemporal image brightness derivatives

u is the horizontal optical flow

v is the vertical optical flow

As this equation is under constrained, there are several methods to solve for u and v.

Horn-Schunck Method is used in this project.


By assuming that the optical flow is smooth over the entire image, the Horn-Schunck

method computes an estimate of the velocity field.

In Horn-Schunck method, u and v are solved as follows:

1. Compute Ix and Iy using the Sobel convolution kernel: [-1 -2 -1; 0 0 0; 1 2 1], and its

transposed form for each pixel in the first image.

2. Compute It between images 1 and 2 using the [-1 1] kernel.

3. Assume the previous velocity to be 0, and compute the average velocity for each pixel

using [0 1 0; 1 0 1; 0 1 0] as a convolution kernel.

4. Iteratively solve for u and v.

Blob analysis, or blob detection, is used to detect and analyse connected regions in a frame so that the vehicles can be tracked. The optical flow vectors are stored as complex numbers. Their

magnitude squared is computed, which is used for thresholding the frames. Median filtering

is applied to remove speckle noise. Morphological operations are done to remove small

objects and holes. The bounding box is drawn for blobs with extent ratio above 0.4 and those

of suitable size. The motion vectors are drawn for these detected vehicles as well.
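A minimal MATLAB sketch of the Horn-Schunck computation described above is shown below. It assumes I1 and I2 are consecutive grey-scale frames stored as double matrices, and alpha and nIter are the smoothing weight and iteration count; these parameter names and values are illustrative and do not reflect the exact settings used in the project.

function [u, v] = hornSchunck(I1, I2, alpha, nIter)
    % Step 1: spatial derivatives with the Sobel kernel and its transpose
    sobel = [-1 -2 -1; 0 0 0; 1 2 1];
    Ix = conv2(I1, sobel', 'same');
    Iy = conv2(I1, sobel,  'same');
    % Step 2: temporal derivative (equivalent to the [-1 1] kernel in time)
    It = I2 - I1;
    % Step 3: neighbourhood-averaging kernel, velocities initialised to 0
    avgK = [0 1 0; 1 0 1; 0 1 0] / 4;
    u = zeros(size(I1));
    v = zeros(size(I1));
    % Step 4: iterative update of the flow field
    for k = 1:nIter
        uAvg = conv2(u, avgK, 'same');
        vAvg = conv2(v, avgK, 'same');
        num  = Ix .* uAvg + Iy .* vAvg + It;
        den  = alpha^2 + Ix.^2 + Iy.^2;
        u = uAvg - Ix .* (num ./ den);
        v = vAvg - Iy .* (num ./ den);
    end
end

The squared magnitude u.^2 + v.^2 can then be thresholded, cleaned with medfilt2 and bwareaopen, and passed to regionprops to obtain the bounding boxes and extent ratios of the moving blobs, as described above.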

4.3 Kalman Filter Tracking

The Kalman filter, also known as linear quadratic estimation (LQE), is an algorithm

that uses a series of measurements observed over time, containing noise (random variations)

and other inaccuracies, and produces estimates of unknown variables that tend to be more

precise than those based on a single measurement alone. The filter is named after Rudolf

(Rudy) E. Kálmán, one of the primary developers of its theory [42] [43].

This algorithm uses a series of measurements of the position of the object detected in each frame, observed over time and containing noise and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those based on a single measurement alone. Thus, the position estimate in the next frame is determined. Then, the weights are updated when the position in the next frame is known (it becomes the present frame). Higher weights are given to detections with higher certainty of belonging to a particular track, and vice versa. The predicted tracks are assigned to the detections using an assignment


algorithm called the Hungarian algorithm. Thus the most optimal tracks are obtained, and the bounding box and the trajectory of each object are drawn.

x = [p; v], where p = position and v = velocity    (4.3.1)

p_k = p_(k-1) + v_(k-1)·Δt + ½·a·Δt²    (4.3.2)

v_k = v_(k-1) + a·Δt    (4.3.3)

The above equations (4.3.1), (4.3.2) and (4.3.3) are used to estimate the position and

velocity of an object, which are nothing but the equations of kinematics, and they are used for

prediction of the aforementioned values.

The algorithm works in a two-step process. In the prediction step, the Kalman filter

produces estimates of the current state variables, along with their uncertainties. Once the

outcome of the next measurement (necessarily corrupted with some amount of error,

including random noise) is observed, these estimates are updated using a weighted average,

with more weight being given to estimates with higher certainty. Because of the algorithm's

recursive nature, it can run in real time using only the present input measurements and the

previously calculated state; no additional past information is required.

From a theoretical standpoint, the main assumption of the Kalman filter is that the

underlying system is a linear dynamical system and that all error terms and measurements

have a Gaussian distribution (often a multivariate Gaussian distribution).

The weights are calculated from the covariance, a measure of the estimated

uncertainty of the prediction of the system's state. The result of the weighted average is a new

state estimate that lies in between the predicted and measured state, and has a better estimated

uncertainty than either alone. This process is repeated every time step, with the new estimate

and its covariance informing the prediction used in the following iteration. This means that

the Kalman filter works recursively and requires only the last "best guess", rather than the

entire history, of a system's state to calculate a new state.


A simple step-by-step guide for Kalman filtering is mentioned below [44] [45]:

1. Building a Model

First we have to check whether the Kalman filtering conditions fit the problem.

The two equations of Kalman Filter are as follows:

xk = Axk-1 + Buk + wk-1 (4.3.4)

zk = Hxk + vk (4.3.5)

Each xk may be evaluated by using a linear stochastic equation (4.3.4). Any xk is a linear

combination of its previous value plus a control signal uk and a process noise.

The second equation (4.3.5) tells that any measurement value is a linear combination of the

signal value and the measurement noise. They are both considered to be Gaussian. The

process noise and measurement noise are statistically independent.

The entities A, B and H are in general form matrices. While these values may change

between states, most of the time, they are assumed to be constant.

If the problem fits into this model, the only thing left is to estimate the mean and standard

deviation of the noise functions wk-1 and vk. The better the noise parameters are estimated, the better the estimates that are obtained.

2. Starting the Process

The next step is to determine the necessary parameters and the initial values if the model fits

the Kalman Filter.

There are two distinct set of equations: Time Update (prediction) and Measurement Update

(correction), as presented in Table 4.1. Both equation sets are applied at each kth

state.


Time Update (prediction):

x_k^- = A·x_(k-1) + B·u_k

P_k^- = A·P_(k-1)·A' + Q

Measurement Update (correction):

K_k = P_k^-·H'·(H·P_k^-·H' + R)^-1

x_k = x_k^- + K_k·(z_k - H·x_k^-)

P_k = (I - K_k·H)·P_k^-

Table 4.1 Equations for Prediction and Correction

R is rather simple to find out, because information about the noise in the environment is

generally available. But finding out Q is not so obvious. To start the process, x0 and P0 are

estimated.

3. Iteration

Here, x_k^- is the "prior estimate", which in a way means the rough estimate before the measurement update correction, and P_k^- is the "prior error covariance". These "prior" values are used in the Measurement Update equations.

In the Measurement Update equations, x_k is determined, which is the estimate of x at time k. Also, P_k is determined, which is necessary for the k+1 (future) estimate, together with x_k. The Kalman Gain (K_k) that is evaluated is not needed for the next iteration step. The values evaluated at the Measurement Update stage are also called "posterior" values.

The position of an object is obtained from the centroid of its blob in a particular frame of the input video, where the detections are obtained using the optical flow method. Thus, by using the Kalman filter tracking approach, the predicted value for the

position of the vehicle in the particular frame in a video is obtained. The predicted tracks are

assigned to the detections using an assignment algorithm called Hungarian algorithm [46].

The distance method used is the Euclidean Method, to calculate the cost matrix. The cost

value threshold is set to 60 in this approach. If the cost is within this value, then the

prediction holds good, else it is discarded. Each track is assigned a unique colour and is

mapped to their corresponding objects in the video.
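A minimal MATLAB sketch of one predict/update cycle for a constant-velocity Kalman filter is shown below. The state vector layout, the noise covariances Q and R and the function name kalmanStep are illustrative assumptions; the project's actual filter parameters may differ.

function [x, P] = kalmanStep(x, P, z, dt)
    % State x = [px; py; vx; vy]; z is the measured blob centroid [px; py].
    A = [1 0 dt 0; 0 1 0 dt; 0 0 1 0; 0 0 0 1];   % state transition matrix
    H = [1 0 0 0; 0 1 0 0];                        % only position is measured
    Q = 0.01 * eye(4);                             % process noise (assumed)
    R = 10 * eye(2);                               % measurement noise (assumed)
    % Time update (prediction)
    x = A * x;
    P = A * P * A' + Q;
    % Measurement update (correction)
    K = P * H' / (H * P * H' + R);                 % Kalman gain
    x = x + K * (z - H * x);
    P = (eye(4) - K * H) * P;
end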


The detected object is tracked for a set number of frames; in this approach, the value is set between 4 and 15. If the object disappears for up to this many frames and then reappears, the occlusion is handled appropriately; if it does not reappear, the object is assumed to have gone out of sight.

4.4 Feature Extraction

The essential task for the video processing system will be to take an object region in a

video and classify it, thereby recognizing it. In other words, a collection of classes is

generated, namely "junk", "bike", "truck/bus", "car", "human" and "auto", and the detected object in a video is taken and it is determined to which, if any, of these classes the object belongs. Such a mechanism is called a classifier. To feed the classifier information to

use for classification, mathematical measurements (features) from that object are extracted.

When the object is close to the camera, it is captured such that the size of the bounding box is

the largest, so that the object is extracted in whole. The features selected for classification

are stored in a feature vector. The following features were extracted from detected objects in this system for classification (a sketch of the corresponding feature-vector computation follows the list):

1. Area:

Area is a scalar. It is the actual number of pixels in the region.

2. Extent:

Extent is a scalar. It is the proportion of the pixels in the bounding box that also belong to the region. It is computed by

dividing the area of the object (blob) by the area of the bounding box.

3. Perimeter:

The perimeter is the distance around the boundary of the region. It is computed by calculating the distance between each adjoining pair of pixels around the border of the region.

4. Aspect Ratio:

The Aspect Ratio is obtained by dividing the width of the bounding box, by the height.


5. Height:

It is the height of the bounding box.

6. Width:

It is the width of the bounding box.

Centroid:

The centroid (centre of mass) (xc, yc) of a shape Q with area S is given by equation 4.4.1:

xc = (1/S) · Σ_(x,y)∈Q x,   yc = (1/S) · Σ_(x,y)∈Q y    (4.4.1)

Central Moment:

The central moment of order pq for an object (region) Q is defined by equation 4.4.2:

μ_pq = Σ_(x,y)∈Q (x − xc)^p · (y − yc)^q    (4.4.2)

where S is the area of Q (the number of pixels in Q), (xc, yc) is the centroid of Q, and p, q = 0, 1, …

7. Compactness:

Compactness is given by equation 4.4.3:

Compactness = Perimeter² / Area    (4.4.3)

8. Elongation:

Also known as elongatedness, it is taken here as the ratio of the lengths of the major and minor axes of the ellipse that has the same second moments as the region, as shown in equation 4.4.4:

Elongation = (major axis length) / (minor axis length)    (4.4.4)

9. Orientation:

Orientation of object can be defined as angle between x axis and principal axis, axis around

which the object can be rotated with minimum inertia. It is a Scalar; the angle (in degrees


ranging from -90 to 90 degrees) between the x-axis and the major axis of the ellipse that has

the same second-moments as the region. It is given by equation 4.4.5:

θ = ½ · arctan( 2·μ_11 / (μ_20 − μ_02) )    (4.4.5)

10. Skewness:

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution,

or data set, is symmetric if it looks the same to the left and right of the center point. The

skewness for a normal distribution is zero, and any symmetric data should have skewness

near zero. Negative values for the skewness indicate data that are skewed left and positive

values for the skewness indicate data that are skewed right. By skewed left, it is meant that

the left tail is long relative to the right tail. It is given by equation 4.4.6:

skewness = (1/l) · Σ_(i=1..l) ((x_i − x̄) / σ)³    (4.4.6)

where l is the length (number of samples) of the data, x̄ is their mean and σ their standard deviation.
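A minimal MATLAB sketch of extracting a subset of these descriptors with regionprops is shown below; the binary mask variable and the feature ordering are illustrative. Descriptors such as compactness, elongation and skewness would be derived from these raw measurements in the same loop.

stats = regionprops(mask, 'Area', 'Extent', 'Perimeter', ...
                    'BoundingBox', 'Centroid', 'Orientation');
features = zeros(numel(stats), 7);
for i = 1:numel(stats)
    bb = stats(i).BoundingBox;               % [x y width height]
    features(i, :) = [stats(i).Area, ...     % 1. area
                      stats(i).Extent, ...   % 2. extent
                      stats(i).Perimeter, ...% 3. perimeter
                      bb(3) / bb(4), ...     % 4. aspect ratio = width / height
                      bb(4), ...             % 5. height of the bounding box
                      bb(3), ...             % 6. width of the bounding box
                      stats(i).Orientation]; % 9. orientation in [-90, 90] degrees
end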

4.5 Accident Detection

Having discussed the advantages of video-based traffic surveillance systems, a new paradigm can be added to the application of video surveillance systems if accidents can

be detected at traffic intersections and reported to the concerned authorities so that necessary

action can be taken.

An important stage in automatic vehicle crash monitoring systems is the detection of

vehicles in each video frame and accurately tracking the vehicles across multiple frames.

With such tracking, vehicle information such as speed, change in speed and change in

orientation can be determined to facilitate the process of crash detection. The region of

interest which incorporates the road is taken into consideration.


4.5.1 Accident Detection using Overall Accident Index

The detailed description of this module is as follows:

1. The first step of the process is the frame extraction step. In this frames are extracted

from the video camera input.

2. The second step of the process is the vehicle detection step. Here the already stored

background frame is subtracted from the input frame to detect the moving regions in

the frame. The difference image is further thresholded to detect the vehicle regions in

the frame. Hence the vehicles in each frame are detected.

3. In the third step low-level features such as area, centroid, orientation, luminance and

colour of the extracted vehicle regions are computed. And also for each of the region

detected in frame at time t, similarity index is computed with all of the regions

detected in frame at time t+1 using human vision based model analysis.

4. In the tracking stage, Euclidean distance is computed between the low-level features

of each vehicle in frame n and all the other vehicles detected in frame n+1. This

Euclidean distance vector is combined with the already computed similarity index for

a particular vehicle region in frame n. Based on the minimum distance between vehicle regions, tracking is done.

5. In the next step, the centroid position of a tracked vehicle in each frame is computed

and based on this information and the frame rate, the speed of the tracked vehicle is

computed in terms of pixels/second.

6. Since the position of the video camera is fixed, the camera parameters such as focal length, pan and tilt angle remain constant.

7. From all this information, the pixel coordinates of the vehicle in each frame are converted to real-world coordinates. By this conversion, the speed of the vehicle in

terms of km/hr is computed.


8. Based on the velocity information, position and low-level features of the tracked

vehicle suitable thresholds are defined to determine the occurrence of accidents.

The accident detection algorithm is summarized as follows:

1. Speeds of the tracked vehicles are calculated. Refer to equation 4.5.1.

2. Velocity, Area, Position, size of Bounding box and Orientation Indexes are

calculated.

3. Overall Accident Index is calculated using the sum of individual indexes and

occurrence of accident is identified.

Speed(Xi) = sqrt( (x(Xi) − x(Xj))² + (y(Xi) − y(Xj))² ) × frame rate    (4.5.1)

where Xi and Xj denote the same tracked vehicle in consecutive frames and (x, y) its centroid.

The differences in area, position, bounding box size, orientation and speed of the object across consecutive frames are used to calculate the indexes. Each

computed value of the feature is compared with a threshold value and an index is set. If the

computed value exceeds the threshold value, then that particular index is set to 1, otherwise 0.

When all 5 indexes mentioned above are calculated, they are summed up and this new value

called the Overall Accident Index is compared against a predefined Accident Threshold. All

these thresholds are set based on the input video. In this system, the Accident Threshold is set

to 3. If it exceeds this value, an accident is signalled.
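A minimal MATLAB sketch of the Overall Accident Index computation is given below; the struct fields and threshold names are illustrative placeholders for the per-feature values and thresholds described above.

function isAccident = accidentIndex(prev, curr, thr, accidentThreshold)
    % prev and curr hold the features of one tracked vehicle in consecutive frames.
    idx = zeros(1, 5);
    idx(1) = abs(curr.speed       - prev.speed)       > thr.speed;
    idx(2) = abs(curr.area        - prev.area)        > thr.area;
    idx(3) = norm(curr.position   - prev.position)    > thr.position;
    idx(4) = abs(curr.bboxSize    - prev.bboxSize)    > thr.bbox;
    idx(5) = abs(curr.orientation - prev.orientation) > thr.orientation;
    overallIndex = sum(idx);                         % each index is 0 or 1
    isAccident = overallIndex > accidentThreshold;   % e.g. threshold of 3
end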

4.5.2 Accident Detection using block Matching

The Block Matching Algorithm [36] estimates motion between two images or two

video frames using blocks of pixels. The Block Matching block matches the block of pixels

in frame k to a block of pixels in frame k+1 by moving the block of pixels over a search

region.


Assuming that the input to the block is frame k, the Block Matching block performs the following steps:

1. The block subdivides this frame using the values entered for the Block size [height

width] and Overlap [r c] parameters. The Block size and Overlap used in this

implementation is [15 15] and [0 0] respectively.

2. For each subdivision or block in frame k+1, the Block Matching block establishes a

search region based on the value entered for the Maximum displacement [r c]

parameter. Here the Maximum displacement was set to [7 7].

3. The block searches for the new block location using either the Exhaustive or Three-

step search method.

In this project, the search method used is the Exhaustive search method. In the Exhaustive

search method, the block selects the location of the block of pixels in frame k+1 by moving

the block over the search region 1 pixel at a time. This process is computationally expensive

as compared to the Three-step search method.

A Block matching criteria parameter is used to specify how the block measures the

similarity of the block of pixels in frame k to the block of pixels in frame k+1. The block matching criterion used in this project is the Mean Square Error (MSE), which causes

the Block Matching block to estimate the displacement of the center pixel of the block as the

(d1 d2) values that minimize the following MSE equation 4.5.2:

MSE(d1, d2) = (1 / (N1·N2)) · Σ_(n1,n2)∈B [ s(n1, n2, k) − s(n1 + d1, n2 + d2, k+1) ]²    (4.5.2)

In equation 4.5.2, B is an N1xN2 block of pixels, and s(x,y,k) denotes a pixel location at (x,y)

in frame k.

The Velocity output parameter is used to specify the block's output. Here the

parameter was chosen as "Horizontal and vertical components in complex form". This makes

the block output the optical flow matrix where each element is of the form u+jv. The real part

of each value is the horizontal velocity component and the imaginary part of each value is the

vertical velocity component. If the directions of the motion vectors are random, that is, they point in different directions, then an accident is detected. In other words, when the count of the motion vectors in the


four approximate directions is almost the same, the vectors are pointing in different directions, as tends to happen due to a crash, and an accident is signalled.
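The following minimal MATLAB sketch illustrates this direction-spread check; the tolerance parameter and the quadrant binning with histc are illustrative choices, not the project's exact implementation.

function isAccident = accidentFromVectors(mv, uniformityTol)
    % mv is the complex block-matching output (u + jv) for one frame.
    u = real(mv(:));
    v = imag(mv(:));
    moving = (u.^2 + v.^2) > 0;            % ignore stationary blocks
    ang = atan2(v(moving), u(moving));     % direction of each motion vector
    if isempty(ang)
        isAccident = false;
        return;
    end
    % Count the vectors falling in each of the four quadrants
    counts = histc(ang, [-pi, -pi/2, 0, pi/2, pi]);
    counts = counts(1:4);
    spread = max(counts) - min(counts);
    % Nearly equal counts in all four directions => vectors point randomly
    isAccident = spread <= uniformityTol * sum(counts);
end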

4.6 Traffic Analysis

Video traffic analysis is the capability of automatically analysing traffic videos to detect and determine properties of traffic, such as traffic flow, speed of vehicles and density of traffic, in real time rather than from a single image. This section has the following 3 components:

Traffic Flow Direction Estimation

Vehicle Speed Estimation

Traffic Density Estimation

4.6.1 Traffic Flow Direction Estimation

An image (frame) is extracted from the input video, and is divided into blocks as

mentioned in section 4.5.2. The region of interest which incorporates the road is taken into

consideration. To get the direction of traffic flow, motion is computed between two objects

using exhaustive block matching technique. Thus the motion of traffic flow is determined.

Vectors, which are divided into real and imaginary parts, are drawn around the objects. The

vector is clustered into 4 quadrants in the Cartesian coordinate system, based on the direction

of motion. If the imaginary part of the vector is positive, the direction of flow is taken as North to South, otherwise South to North; similarly, if the real part of the vector is negative, the direction of flow is taken as West to East, otherwise East to West.
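A minimal MATLAB sketch of this direction determination is given below, assuming mv is the complex motion-vector matrix (u + jv) from block matching; the dominant-component logic and the function name flowDirection are illustrative.

function dirStr = flowDirection(mv)
    u = real(mv(:));
    v = imag(mv(:));
    moving = (u.^2 + v.^2) > 0;                 % keep only moving blocks
    if ~any(moving)
        dirStr = 'No motion';
        return;
    end
    meanU = mean(u(moving));
    meanV = mean(v(moving));
    if abs(meanV) >= abs(meanU)                 % vertical motion dominates
        if meanV > 0                            % image y-axis points downwards
            dirStr = 'North to South';
        else
            dirStr = 'South to North';
        end
    else                                        % horizontal motion dominates
        if meanU < 0
            dirStr = 'West to East';
        else
            dirStr = 'East to West';
        end
    end
end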

4.6.2 Vehicle Speed Estimation

Speed of a particular vehicle region Xi in frame It is computed using the distance

travelled by the vehicle in frame It+1 and the frame rate of the video from which the image

sequence is extracted. The distance travelled by the vehicle is computed using the centroid


position (x,y) of the vehicle in It and It+1. Let Xi denote a particular vehicle detected in It and

Xj denote the same vehicle detected in It+1, assuming the correspondence between the

vehicles is determined using the vehicle tracking step. Speed of a particular vehicle region Xi

is given by equation 4.6.1:

Speed(Xi) = sqrt( (x(Xi) − x(Xj))² + (y(Xi) − y(Xj))² ) × frame rate    (4.6.1)

The scale factor that relates pixel distance to real-world distance was approximately found to be 100 pixels ≃ 120 m; therefore a scale factor of 0.8 is used.
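A minimal MATLAB sketch of equation 4.6.1 with the scale-factor conversion is given below; the argument names are illustrative.

function kmph = estimateSpeed(c1, c2, fps, metresPerPixel)
    % c1, c2 are [x y] centroids of the same vehicle in frames It and It+1.
    pixelDist = sqrt((c2(1) - c1(1))^2 + (c2(2) - c1(2))^2);   % equation 4.6.1
    mps  = pixelDist * metresPerPixel * fps;                   % metres per second
    kmph = mps * 3.6;                                          % convert to km/hr
end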

4.6.3 Traffic Density Estimation

Traffic density is calculated by counting the number of vehicles per frame and using appropriate threshold values, as sketched after the list below. The density is determined to be either:

1. Low

2. Medium

3. High
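A minimal MATLAB sketch of this classification is given below; the vehicle-count thresholds of 5 and 15 are illustrative, not the project's calibrated values.

function density = trafficDensity(vehicleCount)
    if vehicleCount < 5
        density = 'Low';
    elseif vehicleCount < 15
        density = 'Medium';
    else
        density = 'High';
    end
end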

4.7 Object Classification

Once the objects are detected and tracked, they are categorised into one of the following

six categories:

1. Car

2. Bike

3. Bus/Truck

4. Human

5. Auto

6. Junk

The algorithms used for classification are Multi-class SVM (one vs. all), Adaptive Hierarchical Multi-class SVM and the Back-propagation Algorithm.


4.7.1 Support Vector Machine

The Support Vector Machine (SVM) was first introduced by Boser, Guyon, and Vapnik at COLT-92 in 1992 [36] [47]. Support vector machines (SVMs) are a set of related

supervised learning methods used for classification and regression [36] [47]. Support Vector

machines can be defined as systems which use a hypothesis space of linear functions in a

high dimensional feature space, trained with a learning algorithm from optimization theory

that implements a learning bias derived from statistical learning theory. It is being used for

many applications, such as hand writing analysis, face analysis and so forth, especially for

pattern classification and regression based applications. SVM gained popularity due to many

promising features such as better empirical performance.

The goals of SVM are to separate the data with a hyperplane and to extend this to non-linear boundaries using the kernel trick. For calculating the SVM, the goal is to correctly classify all the data. For the mathematical formulation the following constraints are used:

[a] If yi = +1: w·xi + b ≥ 1

[b] If yi = -1: w·xi + b ≤ -1

[c] For all i: yi·(w·xi + b) ≥ 1

In these constraints, x is a data point (a vector) and w is the weight vector. To separate the data, the quantity in [a] should always be greater than zero. Among all possible hyperplanes, SVM selects the one for which the distance from the hyperplane to the nearest data points is as large as possible. If the training data are good, every test vector is located within a radius r of a training vector, and the chosen hyperplane is located as far as possible from the data. This desired hyperplane, which maximizes the margin, also bisects the line between the closest points on the convex hulls of the two classes; this gives rise to constraints [a], [b] and [c].

SVM can be represented as:

SVM classification, equation 4.7.1:

min_(f, ξ)  ||f||²_K + C · Σ_(i=1..l) ξi,   subject to  yi·f(xi) ≥ 1 − ξi  and  ξi ≥ 0 for all i    (4.7.1)


SVM classification, dual formulation, equation 4.7.2:

min_α  (1/2) · Σ_(i=1..l) Σ_(j=1..l) αi·αj·yi·yj·K(xi, xj) − Σ_(i=1..l) αi,   subject to  0 ≤ αi ≤ C for all i  and  Σ_(i=1..l) αi·yi = 0    (4.7.2)

Variables ξi are called slack variables and they measure the error made at point (xi, yi).

Training SVM becomes quite challenging when the number of training points is large.

Kernel: If data is linear, a separating hyper plane may be used to divide the data. However it

is often the case that the data is far from linear and the datasets are inseparable. To allow for

this kernels are used to non-linearly map the input data to a high-dimensional space.

Feature Space: Transforming the data into feature space makes it possible to define a

similarity measure on the basis of the dot product. If the feature space is chosen suitably,

pattern recognition can be easy.

Kernel Functions: The idea of the kernel function is to enable operations to be performed in

the input space rather than the potentially high dimensional feature space. Hence the inner

product does not need to be evaluated in the feature space. The goal of the function is to

perform mapping of the attributes of the input space to the feature space. The kernel function

plays a critical role in SVM and its performance. It is based upon reproducing Kernel Hilbert

Spaces, equation 4.7.3:

K(x, x′) = ⟨Φ(x), Φ(x′)⟩    (4.7.3)

If K is a symmetric positive definite function, which satisfies Mercer’s Conditions, then the

kernel represents a legitimate inner product in feature space. A training set that is not linearly separable in the input space may become linearly separable in the feature space. This is called the "kernel trick". The kernel trick allows SVMs to form nonlinear boundaries.

The kernel function used in this project is the Gaussian Radial Basis Function, as represented

by equation 4.7.4:

K(x, x′) = exp( −‖x − x′‖² / (2σ²) )    (4.7.4)


SVM is a useful technique for data classification. Even though neural networks are often considered easier to use, they sometimes give unsatisfactory results. A classification task usually involves training and testing data consisting of a number of data instances. Each instance in the training set contains one target value and several attributes. The goal of SVM is to produce a model which predicts the target value of data instances in the testing set, given only the attributes.

Classification in SVM is an example of supervised learning. Known labels help indicate whether the system is performing correctly or not. This information points to a desired response, validating the accuracy of the system, or can be used to help the system learn to act correctly. A step in SVM classification involves identification of the features which are intimately connected to the known classes. This is called feature selection or feature extraction. Feature selection and SVM classification together are useful even when prediction of unknown samples is not necessary: they can be used to identify the key feature sets involved in whatever processes distinguish the classes.

The Multi-class SVM (one vs. all) method has been used. Here, the training samples belong to n classes, and n binary classifiers are constructed during training; classifier i takes the samples of class i as positive samples and the rest as negative samples.
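A minimal MATLAB sketch of the one-vs-all scheme using the svmtrain/svmclassify functions with an RBF kernel is shown below; the variables X (N-by-d feature matrix), y (labels 1..nClasses), xTest and the tie-breaking rule (first accepting classifier, with 0 treated as junk) are illustrative simplifications, not the project's exact implementation.

% Training: one binary RBF-kernel SVM per class (class c vs the rest).
nClasses = max(y);
models = cell(nClasses, 1);
for c = 1:nClasses
    models{c} = svmtrain(X, double(y == c), 'Kernel_Function', 'rbf');
end

% Testing: report the first class whose binary classifier accepts the sample.
predicted = 0;                     % 0 = none of the classes (junk)
for c = 1:nClasses
    if svmclassify(models{c}, xTest) == 1
        predicted = c;
        break;
    end
end

In practice the decision values of the binary classifiers would normally be compared and the largest chosen, rather than stopping at the first acceptance.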

The Adaptive Hierarchical Multi-class SVM method in [26] uses a top-down approach for training and testing, and has been implemented here using a decision tree. The dataset is divided into two clusters using the k-means clustering algorithm (k=2), with the mean of each class group given as input to the clustering algorithm. An SVM, which is a binary classifier, is then trained with one cluster as positive samples and the other as negative samples. This is repeated for each cluster until only one class remains. For testing, a binary decision tree is built and traversed top-down: starting from the root node, the sample goes down the left or right sub-tree until it reaches a leaf, which is the class it belongs to. The trained classifier has 5 internal nodes.

The major strengths of SVM are that training is relatively easy and that the optimization problem is convex, so there are no local optima, unlike in neural networks. It scales relatively well to high-dimensional data, and the trade-off between classifier complexity and error can be controlled explicitly. Its main weakness is the need for a good kernel function.


4.7.2 Back-propagation

Back-propagation [48], an abbreviation for "backward propagation of errors", is a

common method of training artificial neural networks. From a desired output, the network

learns from many inputs, similar to the way a child learns to identify a dog from examples of

dogs. It is a supervised learning method. It requires a dataset of the desired output for many

inputs, making up the training set. It is most useful for feed-forward networks (networks that

have no feedback, or simply, that have no connections that loop). Back-propagation requires

that the activation function used by the artificial neurons (or "nodes") be differentiable.

The sigmoid equation (4.7.5) is what is typically used as a transfer function between

neurons. It is similar to the step function, but is continuous and differentiable. One useful

property of this transfer function is the simplicity of computing its derivative.

f(x) = 1 / (1 + e^(−x)),   with derivative f′(x) = f(x)·(1 − f(x))    (4.7.5)

A neuron is the smallest unit in any neural network. A multi-input neuron consists of multiple inputs p_i, each of which is assigned a weight w_i. The output a of the multi-input neuron is obtained by applying the transfer function f to the weighted sum of the inputs, together with a bias b. The following equation (4.7.6) defines Figure 4.2:

a = f( Σ_i w_i·p_i + b )    (4.7.6)

Figure 4.2 Multi Input Neuron


a. Notation: Following are the notations used to explain the Back-propagation Algorithm:

x_j^l - Input to node j of layer l

W_ij^l - Weight from node i of layer l-1 to node j of layer l

f(x) = 1/(1 + e^(−x)) - Sigmoid transfer function

b_j^l - Bias of node j of layer l

O_j^l - Output of node j in layer l

t_j - Target value of node j of the output layer

b. Error Calculation:

Given a set of training data points tk and output layer output Ok, the error can be written as

shown in equation 4.7.7:

E = ½ · Σ_k (O_k − t_k)²    (4.7.7)

Let the error of the network for a single training iteration be denoted by E. It is required to calculate ∂E/∂W_ij, the rate of change of the error with respect to the given connection

weight, so that it can be minimized. Two cases are considered: The node is an output node, or

it is in a hidden layer.

c. Output layer node:

An output layer node can be defined by equation 4.7.8:

∂E/∂W_jk = δ_k · O_j    (4.7.8)

where δ_k = O_k·(1 − O_k)·(O_k − t_k).


d. Hidden layer node:

A hidden layer node can be defined by equation 4.7.9:

∂E/∂W_ij = δ_j · O_i,   where δ_j = O_j·(1 − O_j)·Σ_k δ_k·W_jk    (4.7.9)

e. The Algorithm:

1. Run the network forward with your input data to get the network output

2. For each output node, compute δ_k as shown in equation 4.7.10:

δ_k = O_k·(1 − O_k)·(O_k − t_k)    (4.7.10)

3. For each hidden node, calculate δ_j as shown in equation 4.7.11:

δ_j = O_j·(1 − O_j)·Σ_k δ_k·W_jk    (4.7.11)

4. Update the weights and biases as shown in equations 4.7.12 and 4.7.13, where η is the learning rate:

ΔW_ij = −η·δ_j·O_i    (4.7.12)

Δb_j = −η·δ_j    (4.7.13)

The Levenberg-Marquardt Back-propagation Method is implemented in this project. The

number of input nodes is 10, number of hidden layer nodes is 12 and number of output nodes

is 6 as shown in Fig 4.3. The number of patterns used is 1016.

Figure 4.3 Parameters for Back-propagation Algorithm
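A minimal MATLAB sketch of setting up and training this 10-12-6 network with Levenberg-Marquardt back-propagation using the Neural Network Toolbox is given below; the variable names inputs and targets (a 10-by-1016 feature matrix and a 6-by-1016 one-hot target matrix) are assumptions about how the training data would be arranged.

net = feedforwardnet(12, 'trainlm');       % one hidden layer with 12 nodes, Levenberg-Marquardt
net = train(net, inputs, targets);         % inputs: 10-by-1016, targets: 6-by-1016
scores = net(inputs);                      % network outputs for the training patterns
[~, predictedClass] = max(scores, [], 1);  % winning output node gives the class label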


The classification result with highest accuracy is taken as the trained classifier for

classifying the moving objects in real time. The objects are captured when close to the

camera, and the features mentioned in section 4.4 are calculated and passed to the trained classifier to determine the category each object belongs to and to update the per-category count.

4.8 Estimation of Car Dimensions

To estimate the length and width of the detected cars, their orientation and the width

and height of their bounding boxes were considered. To get the length of the car, the

Pythagoras Theorem was applied as shown in Fig 4.4, and the length was obtained by dividing the height of the bounding box by the cosine of the angle of orientation, as shown in equation 4.8.1.

Length of Car = Height of its bounding box / cos(car's orientation)    (4.8.1)

Similarly if the orientation of the car is less than 85 degrees, the width of the

bounding box is considered to be the width of the car, otherwise the width is approximated by

the length of the car.

Figure 4.4 Calculation of the length of the car
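A minimal MATLAB sketch of this estimation is given below; the function and argument names are illustrative.

function [len, wid] = carDimensions(bbHeight, bbWidth, thetaDeg)
    % Equation 4.8.1: length from the bounding-box height and orientation angle.
    len = bbHeight / cosd(thetaDeg);
    if abs(thetaDeg) < 85
        wid = bbWidth;      % bounding-box width approximates the car width
    else
        wid = len;          % near-90-degree orientation: approximate width by the length
    end
end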


4.9 Estimation of Car Colour

Once the foreground blobs are extracted from the video and are classified as cars,

their colours are estimated based on the following method. The dominant colour for each

vehicle is extracted, allowing the user to search for vehicles based on five colours—black,

white, red, grey and blue. The dominant colour is computed by initially converting each input

video frame into (hue, saturation, luminance) HSL space from its original RGB space, and

then quantizing the HSL space into six colours. Before this quantization is done, a convex

hull is drawn over the car image and the pixels (extracted blob) falling under it are

considered. This ensures only the pixels within the contour of the object are considered for

colour estimation. Low values of luminance are closer to black and higher values are closer to

white. An approximate hue and saturation value range is associated for the colour required.

Starting with lightness value of the centroid pixel of the blocks, a histogram is computed for

the values using the approximate ranges. The values which satisfy a particular range of

luminance for a colour are considered for comparing with saturation range, and those

satisfying the saturation range of the same colour, are considered for comparison with hue

range of that particular colour. The colour corresponding to the bin which receives the

majority of votes is then assigned as the dominant colour of the object belonging to a

specified track.
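A minimal MATLAB sketch of the voting scheme is given below. It uses rgb2hsv as a stand-in for the HSL conversion described above, and the hue, saturation and lightness ranges are illustrative; carPixels is assumed to be an M-by-3 matrix of RGB values (scaled to [0, 1]) taken from inside the blob's convex hull.

hsv = rgb2hsv(carPixels);
h = hsv(:, 1); s = hsv(:, 2); v = hsv(:, 3);
names = {'black', 'white', 'grey', 'red', 'blue'};
votes = zeros(1, numel(names));
votes(1) = sum(v < 0.2);                           % low lightness            -> black
votes(2) = sum(v > 0.85 & s < 0.2);                % high lightness, low sat. -> white
votes(3) = sum(v >= 0.2 & v <= 0.85 & s < 0.2);    % mid lightness, low sat.  -> grey
votes(4) = sum(s >= 0.2 & (h < 0.05 | h > 0.95));  % hue near red             -> red
votes(5) = sum(s >= 0.2 & h > 0.55 & h < 0.72);    % hue near blue            -> blue
[~, best] = max(votes);
dominantColour = names{best};                      % bin with the majority of votes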

4.10 Indexing

Indexing of extracted objects is done in the following manner. Each detected object is

given a video ID, a position ID, file ID, type ID and colour ID. These IDs for a detected

object along with its image and dimension are stored in a .mat file. The frames in which an

object appears in its respective video are also stored. As more and more vehicles are detected, they are added to the .mat file. The Junk class as classified by the algorithm has been left out, and a few cases where occlusion was not handled were manually pruned.
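A minimal MATLAB sketch of appending one detected object to the .mat repository is given below; every ID value, field name and the file name index.mat are hypothetical placeholders used only for illustration.

entry.videoID    = 3;                         % hypothetical IDs for illustration
entry.positionID = 17;
entry.fileID     = 'traffic_clip.avi';
entry.typeID     = 1;                         % e.g. 1 = car
entry.colourID   = 4;                         % e.g. 4 = red
entry.dimensions = [4.1 1.7];                 % estimated length and width
entry.frames     = 120:178;                   % frames in which the object appears
entry.image      = zeros(64, 64, 3, 'uint8'); % cropped object image (placeholder)

if exist('index.mat', 'file')
    load('index.mat', 'objIndex');            % append to the existing repository
    objIndex(end + 1) = entry;
else
    objIndex = entry;                         % create the repository on first use
end
save('index.mat', 'objIndex');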

4.11 Querying

The proposed system also has a querying module which can be used to query objects

based on their dimension, colour or type. The .mat file created during the indexing of


detected objects is used for querying. The user may view the frames in which a particular

queried object appears.

4.12 Concluding Remarks

This chapter gives a detailed design and theoretical description of the various

algorithms used in this project, whether for detection, tracking, classification or analysis. The chapter explains the working of each module implemented in this project in

detail, giving a description about the input data and the resultant output data of each module

as well.


Chapter 5

Implementation

5.1 Programming Language Selection

In developing any application, the programming language plays a significant role. The

choice of the programming language has to be made based on the requirements of the

application at hand. In this application, MATLAB programming [49] is used as it is very

convenient to develop video processing applications with it, owing to its wide variety of

available toolkits and libraries. However, only basic library functions have been used, such as those for arithmetic calculations and display, along with certain functions from the Computer Vision and Image Processing toolboxes. The implementation of the algorithms has been done stepwise.

5.2 Platform Selection

The platform selected for the project is Windows 7, as it is a compatible OS for running

MATLAB. Also, the input video for the application is AVI (Audio Video Interleave) format,

which is a multimedia container format introduced by Microsoft in November 1992 as part of

its Video for Windows technology. AVI files can contain both audio and video data in a file

container that allows synchronous audio-with-video playback. AVI format is compatible both

with Windows 7 and MATLAB.

5.3 Code Conventions

5.3.1 Naming Conventions

File names can be whatever the user wants them to be (usually simpler is better though),

with a few exceptions:

1. MATLAB for Windows retains the file naming constraints set by DOS. The following

characters cannot be used in filenames:

a. " / : * < > | ?


2. It is not allowed to use the name of a reserved word as the name of a file. For

example, while.m is not a valid file name because while is one of MATLAB's

reserved words.

3. When an m-file function is declared, the m-file must be the same name as the function

or MATLAB will not be able to run it. For example, for a function called 'factorial':

a. function Y = factorial(X)

b. it must be saved as "factorial.m" in order to use it.

5.3.2 File Organization

Efficient file organization in MATLAB may be achieved as follows:

1. Use a separate folder for each project

2. Write header comments, especially H1

3. Save frequent console commands as a script

1. Managing Folders:

The Current Folder Browser provides a few features to make managing separate

folders easier. The tree views multiple projects from a root directory. Having a top-down

hierarchical view makes it easy to move files between project directories. The address bar is

used for quickly switching back and forth between project directories. This allows keeping

only one of these folders on the MATLAB search path at the same time. If there is a nested

folder structure of useful functions that needs to be accessed (for example a hierarchical tools

directory), “Add with Subfolders” from the context menu can be used to quickly add a whole

directory tree to the MATLAB search path.

2. Write Header Comments:

Having comment lines in files enables the functions, scripts, and classes to participate

in functions like help and lookfor. When a directory is supplied to the help function it reads

out a list of functions in that directory. The display may be customized with a Contents.m

file, which can be generated with the Contents Report.


3. Save Console commands as a script:

Several commands in the Command History may be selected to create a script out of.

The method used is as follows:

(1) Delete from the Command History the unwanted commands so that the ones of interest

are left as a contiguous block.

(2) Then create a file in the Editor.

(3) Select in the History those commands.

(4) Drag the whole block into the Editor.

5.3.3 Properties Declaration

1. Arithmetic and Relational Operators

MATLAB has all the standard arithmetic and relational operators. It is important to note, however, that unless told otherwise, all such operations are done on entire arrays, and use the matrix definitions.

Add, Subtract, multiply, divide, exponent operators:

%addition

a = 1 + 2

%subtraction

b = 2 - 1

%matrix multiplication

c = a * b

%matrix division (pseudoinverse)

d = a / b

%exponentiation

e = a ^ b

Equality '==' returns the value "TRUE" (1) if both arguments are equal. This must not be

confused with the assignment operator '=' which assigns a value to a variable.


Greater than, less than and greater than or equal to, less than or equal to are given by >, <, >=,

<= respectively. All of them return a value of true or false.

2. Boolean Operators

The boolean operators are & (boolean AND) | (boolean OR) and ~ (boolean NOT /negation).

A value of zero means false, any non-zero value (usually 1) is considered true.

The negation operation in MATLAB is given by the symbol ~, which turns any FALSE

values into TRUE and vice versa.

The NOT operator has precedence over the AND and OR operators in MATLAB unless the

AND or OR statements are in parenthesis.

3. Declaring Strings

MATLAB can also manipulate strings. They should be enclosed in single quotes:

>> fstring = 'hello'

fstring =

hello

4. Displaying values of String Variables

If it is needed to display the value of a string, the semicolon is omitted as is standard in

MATLAB.

If it is needed to display a string in the command window in combination with other text, one

way is to use array notation combined with either the 'display' or the 'disp' function:

>> fstring = 'hello';

>> display( [ fstring 'world'] )

helloworld

MATLAB doesn't put the space in between the two strings. If a space is required, it must be

done explicitly.


5. Comparing Strings

Unlike with numeric arrays, strings will not be compared correctly with the relational operator, because this will treat the string as an array of characters. To compare whole strings, the strcmp function is used as follows:

>> string1 = 'a';

>> strcmp(string1, 'a')

ans = 1

>> strcmp(string1, 'A')

ans = 0

6. Anonymous functions

An anonymous function can be created at the command or in a script:

>>f = @(x) 2*x^2-3*x+4;

>>f(3)

ans = 13

To make an anonymous function of multiple variables, use a comma-separated list to declare

the variables:

>>f = @(x,y) 2*x*y;

>>f(2,2)

ans = 8

7. Function Handles

A function handle passes an m-file function into another function. The functionality it offers

is similar to that of function pointers in C++.

To pass an m-file to a function, first the m-file must be written:

function xprime = f(t,x)

xprime = x;


Save it as myfunc.m. To pass this to another function, @ symbol is used as follows:

>> ode45(@myfunc, [0:15], 1)

5.3.4 Class Declarations

Users can create their own MATLAB classes. For example, a class to represent polynomials

could be defined. This class could define the operations typically associated with MATLAB

classes, like addition, subtraction, indexing, displaying in the command window, and so on.

However, these operations would need to perform the equivalent of polynomial addition,

polynomial subtraction, and so on. For example, when two polynomial objects are added:

p1 + p2

the plus operation would know how to add polynomial objects because the polynomial class

defines this operation.

2. MATLAB Classes — Key Terms

MATLAB classes use the following words to describe different parts of a class definition and

related concepts.

Class definition — Description of what is common to every instance of a class.

Properties — Data storage for class instances

Methods — Special functions that implement operations that are usually performed

only on instances of the class

Events — Messages that are defined by classes and broadcast by class instances when

some specific action occurs

Attributes — Values that modify the behavior of properties, methods, events, and

classes

Listeners — Objects that respond to a specific event by executing a callback function

when the event notice is broadcast

Objects — Instances of classes, which contain actual data values stored in the objects'

properties


Subclasses — Classes that are derived from other classes and that inherit the methods,

properties, and events from those classes (subclasses facilitate the reuse of code

defined in the superclass from which they are derived).

Superclasses — Classes that are used as a basis for the creation of more specifically

defined classes (i.e., subclasses).

Packages — Folders that define a scope for class and function naming

5.3.5 Comments

Comment lines begin with the character '%', and anything after a '%' character is

ignored by the interpreter. The % character itself only tells the interpreter to ignore the

remainder of the same line.

In the MATLAB Editor, commented areas are printed in green by default, so they

should be easy to identify. There are two useful keyboard shortcuts for adding and removing

chunks of comments. Select the code to be commented or uncommented, and then press Ctrl-R (⌘-/ on a Mac) to place one '%' symbol at the beginning of each line, and Ctrl-T (⌘-T on a Mac) to do the opposite.

1. Common uses

Comments are useful for explaining what function a certain piece of code performs especially

if the code relies on implicit or subtle assumptions or otherwise perform subtle actions. For

example,

% Calculate the net force, assuming the acceleration is constant
% and a frictionless environment.
force = mass * acceleration

It is common and highly recommended to include as the first lines of text a block of

comments explaining what an M file does and how to use it. MATLAB will output the

comments leading up to the function definition or the first block of comments inside a

function definition when this is typed:

>> help functionname


All of MATLAB's own functions written in MATLAB are documented this way as well.

Comments can also be used to identify authors, references, licenses, and so on. Such text is

often found at the end of an M file though also can be found at the beginning. Finally,

comments can be used to aid in debugging.

5.4 The Computer Vision System Toolbox

Computer Vision System Toolbox provides primitives for the design and simulation

of computer vision and video processing systems. The toolbox includes methods for feature

extraction, motion detection, object detection, object tracking, stereo vision, video

processing, and video analysis. Tools include video file I/O, video display, drawing graphics,

and compositing. Capabilities are provided as MATLAB functions, MATLAB System

objects, and Simulink blocks. For rapid prototyping and embedded system design, the system

toolbox supports fixed-point arithmetic and C code generation.

Key Features:

Feature detection, including FAST, Harris, Shi & Tomasi, SURF, and MSER

detectors.

Feature extraction and putative feature matching.

Object detection and tracking, including Viola-Jones detection and CAMShift

tracking.

Motion estimation, including block matching, optical flow, and template matching.

RANSAC-based estimation of geometric transformations or fundamental matrices.

Video processing, video file I/O, video display, graphic overlays, and compositing.

Block library for use in Simulink.

5.5 The Image Acquisition Toolbox

Image Acquisition Toolbox enables the user to acquire images and video from

cameras and frame grabbers directly into MATLAB. The user can detect hardware

automatically and configure hardware properties. Advanced workflows let the user trigger

acquisition while processing in-the-loop, perform background acquisition, and synchronize

sampling across several multimodal devices. With support for multiple hardware vendors and


industry standards, the user can use imaging devices ranging from inexpensive Web cameras

to high-end scientific and industrial devices that meet low-light, high-speed, and other

challenging requirements.

Key Features:

Support for industry standards, including DCAM, Camera Link, and GigE Vision.

Support for common OS interfaces for webcams, including Direct Show, QuickTime,

and video4linux2.

Support for a range of industrial and scientific hardware vendors.

Multiple acquisition modes and buffer management options.

Synchronization of multimodal acquisition devices with hardware triggering.

Interactive tool for rapid hardware configuration, image acquisition, and live video

previewing.

Support for C code generation in Simulink.

5.6 The Neural Network Toolbox

Neural Network Toolbox provides functions and apps for modeling complex

nonlinear systems that are not easily modeled with a closed-form equation. The toolbox can

be used to design, train, visualize, and simulate neural networks. The user can use Neural

Network Toolbox for applications such as data fitting, pattern recognition, clustering, time-

series prediction, and dynamic system modeling and control.

Key Features:

Supervised networks, including multilayer, radial basis, learning vector quantization

(LVQ), time-delay, nonlinear autoregressive (NARX), and layer-recurrent.

Unsupervised networks, including self-organizing maps and competitive layers.

Apps for data-fitting, pattern recognition, and clustering.

Parallel computing and GPU support for accelerating training (using Parallel

Computing Toolbox).

Preprocessing and postprocessing for improving the efficiency of network training

and assessing network performance.


Modular network representation for managing and visualizing networks of arbitrary

size.

Simulink® blocks for building and evaluating neural networks and for control systems

applications.
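
As a hedged illustration of how such a network could be trained for a classification task like the one described later (not the project's actual script), the sketch below uses patternnet; the variables features (attributes x samples) and targets (classes x samples, one-of-N coded) are assumed placeholders.

    % Sketch: pattern-recognition training with the Neural Network Toolbox.
    net = patternnet(10);                         % feed-forward net, 10 hidden neurons (assumed)
    net.divideParam.trainRatio = 0.70;            % portion of samples used for training
    net.divideParam.valRatio   = 0.15;            % portion used for validation
    net.divideParam.testRatio  = 0.15;            % portion held out for testing
    [net, tr] = train(net, features, targets);    % back-propagation training
    outputs = net(features);                      % network responses for every sample
    [~, predictedClass] = max(outputs, [], 1);    % index of the winning class per sample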

5.7 GUIs in MATLAB

A MATLAB GUI is a figure window to which the user adds user-operated controls. These components can be selected, sized, and positioned as desired, and callbacks make them respond when the user clicks them or manipulates them with keystrokes.

The user can build MATLAB GUIs in two ways:

Use GUIDE (GUI Development Environment), an interactive GUI construction kit.

Create code files that generate GUIs as functions or scripts (programmatic GUI

construction).

The first approach starts with a figure that the user populates with components from within

a graphic layout editor. GUIDE creates an associated code file containing callbacks for the

GUI and its components. GUIDE saves both the figure (as a FIG-file) and the code file.

Opening either one also opens the other to run the GUI.

In the second, programmatic GUI-building approach, the user creates a code file that

defines all component properties and behaviors; when a user executes the file, it creates a

figure, populates it with components, and handles user interactions. The figure is not

normally saved between sessions because the code in the file creates a new one each time it

runs.
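
A minimal programmatic sketch of this second approach is given below, assuming a single push button whose callback only opens a file-selection dialog; the names and layout are illustrative and are not the ITAMS GUI itself.

    function simple_gui
        % Create the figure and one user-operated control with a callback.
        f = figure('Name', 'Demo GUI', 'NumberTitle', 'off');
        uicontrol(f, 'Style', 'pushbutton', 'String', 'Load Video', ...
                  'Position', [20 20 100 30], 'Callback', @onLoadVideo);
    end

    function onLoadVideo(~, ~)
        % Callback executed when the button is clicked.
        [name, path] = uigetfile('*.avi', 'Select a video file');
        if ischar(name)
            disp(['Selected: ' fullfile(path, name)]);   % placeholder action
        end
    end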

5.8 Concluding Remarks

This chapter describes the language and environment used for developing and implementing the project. An overview of MATLAB is given, along with its coding standards, syntax, and limitations.


Chapter 6

SOFTWARE TESTING

The user may select any one of the modules from the User Interface, and then choose an input video for it.

Beforehand, a training set of 1016 samples is prepared using 13 different datasets

(videos). The training data consists of two arrays, one representing the features of the

different detected objects, and the other specifying to which group they belong (car, bike,

truck/bus, human, auto, junk).

The functionalities that may be chosen are Speed Estimation, Traffic Direction Determination, Density Estimation, and Classification of the detected objects.

6.1 Testing Environment

Testing was done on a Dell XPS laptop, connected to a power supply (not on battery

power), with the following specifications:

Hardware Specifications:

o Intel® Core™ i7 CPU

o 4.00 GB RAM

o 500GB HDD

Software Specifications:

o Windows 7 OS

o MATLAB 2011a

6.2 Unit Testing

For the unit testing of the modules, the videos viptraffic.avi and t.avi were chosen, which together contain 35 cars. The videos were separately given as input to each of the modules, and the results were tabulated as shown below:


6.2.1 Detection Module

Object Detection is implemented using the Optical Flow method and is tested with the videos viptraffic.avi and t.avi, which have 35 samples in total. The actual output matches the expected output. The unit test for the detection module is shown in Table 6.1.

S/No. of Test Case:     1
Name of Test Case:      Object Detection Test
Feature being Tested:   Object Detection (using Optical Flow Model)
Description:            Tests whether objects are detected accurately or not
Sample Input:           Traffic video containing 35 objects
Expected Output:        35 detections
Actual Output:          35 detections
Remarks:                Success

Table 6.1 Detection Module Test

The object detection test was a success, with an accuracy of 100%.

6.2.2 Tracking Module

Object Tracking is implemented using the Kalman Filter and is tested with the videos viptraffic.avi and t.avi, which have 35 samples in total. The actual output matches the expected output. The unit test for the tracking module is shown in Table 6.2.

S/No. of Test Case:     2
Name of Test Case:      Object Tracking Test
Feature being Tested:   Object Tracking (using Kalman Filtering)
Description:            Tests whether objects are tracked accurately or not
Sample Input:           Traffic video containing 35 objects
Expected Output:        35 objects tracked
Actual Output:          35 objects tracked
Remarks:                Success

Table 6.2 Tracking Module Test

The object tracking test was a success, with an accuracy of 100%.
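
The report states that tracking uses a Kalman filter. The following is a generic constant-velocity predict/update step for one tracked centroid, given only to clarify the mechanism being tested; the state model, noise covariances and variable names are assumptions, not the project's exact implementation.

    % One predict/update cycle of a constant-velocity Kalman filter (all values assumed).
    dt = 1;                                          % one frame between updates
    A  = [1 0 dt 0; 0 1 0 dt; 0 0 1 0; 0 0 0 1];     % state transition for [x y vx vy]'
    H  = [1 0 0 0; 0 1 0 0];                         % only the centroid (x, y) is measured
    Q  = 0.01 * eye(4);                              % process noise covariance
    R  = eye(2);                                     % measurement noise covariance

    x = A * x;                                       % predicted state
    P = A * P * A' + Q;                              % predicted error covariance

    K = P * H' / (H * P * H' + R);                   % Kalman gain
    x = x + K * (z - H * x);                         % correct with measured centroid z = [cx; cy]
    P = (eye(4) - K * H) * P;                        % corrected error covariance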


6.2.3 Speed Estimation Module

The speed estimation test was done on the viptraffic.avi video. The speed of each of the 10 cars was first calculated in pixels/second. A scale factor of 0.08 is then applied to convert the speed into km/hr. The results of the test are shown in Table 6.3, and the unit testing of the module is summarised in Table 6.4.

Pixels/Second    Kilometers/Hour
1141.8           91.34
873.6            72
1411.1           112.88
265.4            21.23
99.3             7.94
115.1            9.21
115.1            9.21
84.8             6.78
173.2            13.85
142.3            11.38

Table 6.3 Speed Estimation Module Results
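
A hedged sketch of the conversion applied above: the centroid displacement between two tracked positions is turned into pixels/second using the frame rate, and then scaled into km/hr. The variable names and the 15 fps frame rate are assumptions for the example; 0.08 is the factor consistent with Table 6.3.

    % Sketch: per-vehicle speed from two tracked centroid positions (names assumed).
    fps        = 15;                                    % assumed frame rate of the test video
    dispPx     = hypot(cx2 - cx1, cy2 - cy1);           % centroid displacement in pixels
    speedPxSec = dispPx * fps / nFrames;                % pixels/second over nFrames of travel
    speedKmHr  = speedPxSec * 0.08;                     % scale factor matching Table 6.3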

S/No. of Test Case:     3
Name of Test Case:      Speed Estimation Test
Feature being Tested:   Vehicle Speed Estimation
Description:            Tests whether speed is estimated for all objects or not
Sample Input:           Traffic video containing 10 vehicles
Expected Output:        10 speed estimations
Actual Output:          10 speed estimations
Remarks:                Success

Table 6.4 Speed Estimation Module Test

The speed estimation test was a success, with the actual output matching the expected output.


6.2.4 Density Estimation Module

The unit test for the density estimation module is shown in Table 6.5.

S/No. of Test Case:     4
Name of Test Case:      Density Estimation Test
Feature being Tested:   Traffic Density Estimation
Description:            Tests whether traffic density is estimated accurately or not
Sample Input:           Traffic video with low traffic density
Expected Output:        Low Density
Actual Output:          Low Density
Remarks:                Success

Table 6.5 Density Estimation Module Test

Traffic density for the input is Low, and that determined by the system is the same.

6.2.5 Traffic Flow Direction Estimation Module

The unit test for the direction estimation module is shown in Table 6.6.

S/No. of Test Case:     5
Name of Test Case:      Direction Estimation Test
Feature being Tested:   Traffic Flow Direction Estimation
Description:            Tests the traffic flow direction
Sample Input:           Traffic video with traffic flow direction from N-S
Expected Output:        North to South
Actual Output:          North to South
Remarks:                Success

Table 6.6 Traffic Direction Estimation Module Test

In the test video, the traffic direction is North to South, and that determined by the

system is the same.


6.2.6 Classification Module

The unit test for the classification module is shown in Table 6.7. The training set consists of

1016 samples and the testing set consists of 35 samples. An accuracy of 91.42% was

achieved.

S/No. of Test Case:     6
Name of Test Case:      Object Classification Test
Feature being Tested:   Object Classification
Description:            Classifies the detected objects
Sample Input:           Traffic video with 35 objects
Expected Output:        35 Accurate Classifications
Actual Output:          32 Accurate Classifications
Remarks:                Accuracy – 91.42%

Table 6.7 Classification Module Test

6.2.7 Accident Detection Module

The videos used for this module were a2.avi and a4.avi. Accidents were detected

accurately in both these test videos. The unit test for the accident detection module is shown

in Table 6.8.

S/No. of Test Case:     7
Name of Test Case:      Accident Detection Test
Feature being Tested:   Accident Detection
Description:            Detects accidents in the video
Sample Input:           Traffic video with an accident
Expected Output:        Accident Signalled
Actual Output:          Accident Signalled
Remarks:                Success

Table 6.8 Accident Detection Module Test


6.2.8 Colour Estimation Module

The unit test for the colour estimation module is shown in Table 6.9. Colour estimation was

done based on the approach described in section 4.9. An accuracy of 85.71% was achieved.

S/No. of Test Case:     8
Name of Test Case:      Colour Estimation Test
Feature being Tested:   Object Colour Estimation
Description:            Estimates colour of cars in the video
Sample Input:           Traffic video with 35 cars
Expected Output:        35 Accurate Estimations
Actual Output:          30 Accurate Estimations
Remarks:                Accuracy – 85.71%

Table 6.9 Colour Estimation Module Test

6.2.9 Summary

A summary of all the unit tests is shown in Table 6.10.

S/No.   Name of Unit Test            Remarks
1       Detection Test               Accuracy – 100%
2       Tracking Test                Accuracy – 100%
3       Speed Estimation Test        Success
4       Density Estimation Test      Success
5       Direction Estimation Test    Success
6       Classification Test          Accuracy – 91.42%
7       Accident Detection Test      Success
8       Colour Estimation Test       Accuracy – 85.71%

Table 6.10 Summary of the Unit Tests


6.3 SYSTEM & GUI TESTING

The Intelligent Traffic Analysis & Monitoring System has been abbreviated as

ITAMS.

Figure 6.1 GUI for ITAMS

The home screen of ITAMS is shown in Fig 6.1. The left hand side of the screen has Traffic

Analysis, Accident Detection and Tracking modules. The center of the screen has a window

to display the selected input video. The right hand side of the screen has options for querying

various types of objects detected from the input video.

Figure 6.2 Interface to select a Video File

When the Load Video button is pressed, an interface as shown in Fig 6.2 appears for the

selection of a video.


Figure 6.3 Selected Video being Played in the GUI

Once a video is selected, it is played in the GUI after the Play button is pressed, as shown in

the Fig 6.3.

Figure 6.4 Traffic Analysis Module (Estimation of Density)

As shown in Fig 6.4, any one option may be chosen from the Traffic Analysis Module with

the help of a radio button. Once the required option is selected, on pressing the Analyze

button, the selected video is played in the GUI, and the appropriate analysis may be observed.


Figure 6.5 Traffic Analysis Module (Estimation of Direction)

In Fig 6.5, the user has chosen the Estimate Direction option to analyse the direction of traffic flow in the selected traffic video.

Figure 6.6 Traffic Analysis Module (Estimation of Speed)

In Fig 6.6, the user has chosen the Estimate Speed option to analyse the speed of vehicles in the selected traffic video.


Figure 6.7 Accident Detection Module

When the Accident Detection button is pressed, an interface to choose a video, similar to that

shown in Fig 6.2 appears. Once the video is selected, it is analysed for accidents, and if an

accident is detected, it is signalled, as shown in Fig 6.7.

Figure 6.8 Tracking Module

The tracking of the various objects detected in the video is shown in Fig 6.8. The trail of the

tracked object is also highlighted, as shown by the yellow trail of the black car in the above

figure. The count of objects of each type is also displayed.


Figure 6.9 Interface to select an Image File

When the Load Image button is pressed, an interface as shown in Fig 6.9 appears for the

selection of an image file.

Figure 6.10 Viewing Images

The images in a selected .mat file may be viewed as shown in Fig 6.10. The Previous and

Next Button may be used to cycle through all the images in the .mat file.


Figure 6.11 Querying Options

For specific querying of objects, the object type and its colour may be chosen. Once the Done

button is pressed, the objects may be viewed in the panel below the querying options as shown

in Fig 6.11. Similar specified objects may be cycled through using the Previous and Next

buttons. If a car is selected, then its dimensions may be viewed in the dimensions panel.

Figure 6.12 Playing Video of Queried Image

On pressing the Show Frames button, a short video clip is played, only showing those frames

in which the selected object appears, as shown in Fig 6.12.


6.4 Concluding Remarks

This chapter gives a detailed overview of the various unit tests, system tests and GUI

tests performed on the system, and their resultant outcome.


Chapter 7

Experimental Analysis and Results

Once the objects have been detected in the input video, their features are extracted

and used for classification. First, a training set of 1016 samples is prepared using 13 different datasets (videos), acquired from an academic institute website. They have a combined duration of 12 minutes and an average frame rate of 29 Hz. The total

number of frames processed was around 20880. The training data consists of two arrays, one

representing the features of the different detected objects, and the other specifying to which

group they belong (car, bike, truck/bus, human, auto, junk). Additionally, a testing set of 500

samples is created using 4 different datasets (videos). These video sequences have a

combined duration of 4 minutes and an average frame rate of 29Hz, containing around 6960

frames. The testing data also consists of two arrays, one representing the features of the

different detected objects, and the other specifying to which group they belong.

The training data, along with the test array representing the features, is given as input to the SVM, Adaptive Hierarchal Multi-class SVM and Back-propagation algorithms. An appropriate kernel must be selected for the SVM algorithms; in the present work, the Gaussian kernel is used. The results of the classification are stored in a .mat file, which is later used to calculate accuracy and precision, using the testing group array for the actual values.
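
As a hedged illustration of the binary building block behind the one-vs-all scheme (not the project's actual code), the fragment below trains a single SVM with a Gaussian (RBF) kernel using the svmtrain and svmclassify functions available in MATLAB 2011a; trainFeat, isCar and testFeat are assumed placeholder names.

    % Sketch: one binary SVM of the one-vs-all ensemble, with a Gaussian kernel.
    model     = svmtrain(trainFeat, isCar, 'Kernel_Function', 'rbf');   % isCar: 1 vs rest labels
    predicted = svmclassify(model, testFeat);                           % 0/1 decision per test sample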

7.1 Evaluation Metric

Confusion Matrix was used as the evaluation metric. A confusion matrix is a specific

table layout that allows visualization of the performance of an algorithm. Each column of the

matrix represents the instances in a predicted class, while each row represents the instances in

an actual class. The name stems from the fact that it makes it easy to see if the system is

confusing two classes (i.e. commonly mislabelling one as another).
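
For instance, with the Statistics Toolbox such a matrix can be built directly from the two group arrays; the sketch below assumes actualGroup and predictedGroup are cell arrays of class names for the test samples, and the class labels are placeholders.

    % Sketch: confusion matrix from actual and predicted class labels.
    order = {'car', 'bike', 'truck/bus', 'human', 'auto', 'junk'};   % fixed class order (assumed labels)
    C = confusionmat(actualGroup, predictedGroup, 'order', order);
    % Rows of C correspond to actual classes and columns to predicted classes,
    % matching the layout of Tables 7.1 to 7.3.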


7.1.1 Confusion Matrix for Multi-class SVM (one vs. all)

The confusion matrix for Multi-class SVM (one vs. all) is shown in Table 7.1.

                         Predicted class
Actual Class    Car    Bike    Bus/Truck    Human    Auto    Junk
Car             199    2       1            1        0       22
Bike            4      86      0            0        0       5
Bus/Truck       0      0       15           0        1       1
Human           0      0       0            10       0       0
Auto            0      0       0            0        2       0
Junk            3      0       0            0        0       148

Table 7.1 Confusion Matrix for Multi-class SVM (Accuracy = 92%)

7.1.2 Confusion Matrix for Adaptive Hierarchal Multi-class SVM

The confusion matrix for Adaptive Hierarchal Multi-class SVM is shown in Table

7.2.

                         Predicted class
Actual Class    Car    Bike    Bus/Truck    Human    Auto    Junk
Car             171    0       0            1        0       12
Bike            3      82      0            0        0       1
Bus/Truck       5      0       13           0        0       0
Human           0      3       0            9        0       1
Auto            5      0       3            0        3       1
Junk            22     3       0            1        0       161

Table 7.2 Confusion Matrix for Adaptive Hierarchal SVM (Accuracy = 87.80%)


7.1.3 Confusion Matrix for Back-propagation

The confusion matrix for Back-propagation is shown in Table 7.3.

                         Predicted class
Actual Class    Car    Bike    Bus/Truck    Human    Auto    Junk
Car             178    7       5            1        0       28
Bike            7      75      0            2        0       1
Bus/Truck       2      0       10           0        3       2
Human           0      0       0            3        0       0
Auto            0      0       0            0        0       0
Junk            19     6       1            5        0       145

Table 7.3 Confusion Matrix for Back-propagation (Accuracy = 82.20%)

7.2 Experimental Dataset

The dataset consists of 17 videos in total, of which 13 are used for creating the

training data, and 4 are used for creating the testing data. The reason for using so many

videos is that they provide features of objects detected from various orientations and during

different lighting conditions. This leads to stronger training of the machine learning

algorithms, and hence to better prediction results.

Training Set - 1016 Samples

Testing Set - 500 Samples


7.3 Performance Analysis

The accuracy and precision for each of the three classification algorithms

implemented in this project, i.e. Multi-class SVM (one vs. all), Adaptive Hierarchal Multi-

class SVM and Back-propagation, are calculated in this section.

7.3.1 Accuracy

Accuracy is the overall correctness of the model and is calculated as the sum of correct

classifications divided by the total number of classifications. It is described by equation

(7.3.1), and accuracy of the classification algorithms is shown in Table 7.4.

Accuracy = number of correct classifications / total number of classifications (7.3.1)

S/No.   Classification Algorithm               Accuracy    Percentage
1       Multi-class SVM (one vs. all)          460/500     92%
2       Adaptive Hierarchal Multi-class SVM    439/500     87.80%
3       Back-propagation                       410/500     82.20%

Table 7.4 Accuracy of Classification Algorithms

7.3.2 Precision

Precision is a measure of the accuracy provided that a specific class has been predicted. It is

defined by equation (7.3.2), where tp and fp are the numbers of true positive and false positive

predictions for the considered class.

Precision = tp/(tp + fp) (7.3.2)

In the confusion matrices above, the precision would be calculated as shown in Table 7.5.

                         Multi-class SVM    Adaptive Hierarchal    Back-propagation
                         (one vs. all)      Multi-class SVM        Algorithm
Precision (Car)          96.60%             83.00%                 86.40%
Precision (Bike)         97.72%             93.18%                 85.22%
Precision (Bus/Truck)    93.75%             81.25%                 60.25%
Precision (Human)        90.90%             81.81%                 27.27%
Precision (Auto)         66.66%             100%                   0%
Precision (Junk)         84.90%             91.47%                 82.38%

Table 7.5 Precision for Classified Objects
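
Both metrics follow directly from such a confusion matrix C (rows = actual classes, columns = predicted classes); the lines below are a minimal sketch with assumed variable names.

    % Sketch: accuracy and per-class precision from a confusion matrix C.
    accuracy  = sum(diag(C)) / sum(C(:));      % correct classifications / total classifications, eq. (7.3.1)
    precision = diag(C) ./ sum(C, 1)';         % per predicted class: tp / (tp + fp), eq. (7.3.2)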


7.4 Inference from the Results

After running the Multi-class SVM (one vs. all), the Adaptive Hierarchal Multi-class SVM (which uses a binary tree of SVMs) and the Back-propagation Algorithm on the same dataset, it was found that the accuracy was highest for the Multi-class SVM (one vs. all) at 92%, followed by the Adaptive Hierarchal Multi-class SVM with 88% and the Back-propagation Algorithm with 82%. Hence we infer from the results that the features used for classification are best suited to the Multi-class SVM (one vs. all).


Chapter 8

CONCLUSION

The goal of this project is to develop an Intelligent Traffic Analysis & Monitoring

System, which is capable of operating in real-time, as well as with pre-recorded video

sequences, with a good performance rate. The proposed system provides an efficient and

interesting object-based video indexing and retrieval approach. All concepts are implemented

in MATLAB using our own code, with only minimal use of built-in functions.

The performance of the detection and tracking algorithms was found to be 100% for

three test videos used for testing. For the three classification algorithms implemented in this

project, it was found that the accuracy was highest for Multi-SVM, 92%, followed by

Adaptive Hierarchal Multi-SVM with 88% and then the Back-propagation Algorithm, 82%.

Classification of objects is done in real time providing the count for each type of object.

The system also has two additional modules for Accident Detection and Traffic

Analysis. The Traffic Analysis module consists of the following functionalities: Traffic

Density Measurement, Traffic Flow Direction Determination and Speed Estimation.

For the testing of the accident detection module, two videos were used, and it was

found that the accidents were detected accurately. The traffic analysis module was also

tested, and it was found that the density, traffic flow and speed estimations were accurate. An

algorithm for object colour estimation has also been implemented, which provided an

accuracy of around 85%.

The system also has a Querying module, which is able to query objects based on their

dimension, colour and type. It also has the capability to display the frames in which a

particular queried object appears. The accuracy of this module was found to be 100%.

A graphical user interface has also been implemented. It enables fast search and retrieval of vehicles of interest based on colour, size or type; real-time detection, tracking and counting of objects of each type; playback of the video clip that contains a selected vehicle; and analysis using the implemented algorithms. The proposed system demonstrates a strong capability for digital surveillance.


8.1 Limitations of the Project

1. The software is not designed for multiple cameras.

2. The program works only for videos from a stationary camera.

3. Foreground blobs vary with the quality of the video and are not well defined in certain cases, even after applying morphological operations.

4. Occlusions are only partially handled in certain situations: because of the stationary camera's position, the vehicles do not diverge while they are captured.

5. Managing huge amounts of training data for SVM and Back-propagation is tedious.

6. The height of the camera is required to get real world coordinates for the objects.

8.2 Future Enhancements

While this project exploits the manipulation of the various parameters, some features

may affect the optimal classification of objects more than others. As part of our future

enhancements, we aim to find these features and optimize them so as to find the most

accurate solution for classification.

Furthermore, it would be worthwhile to run this system with a feed from a greater

variety of cameras, as well as using moving cameras. Most likely, this would aid in complete

handling of occlusion and would lead to improved detection and classification results.

Data storage should also be made as efficient as possible, despite the large number of training samples.


References

[1] Arun Hampapur, “S3-R1: The IBM Smart Surveillance System-Release 1,” IBM T.J.

Watson Research Centre, New York, U.S.A, 2006.

[2] Scott Bradley and Peter DeCoursey, “Hitachi Data Systems Solutions for Video

Surveillance,” Hitachi Data Systems Corporation, 2011.

[3] Sayanan Sivaraman and Mohan M. Trivedi, "Vehicle Detection by Independent Parts for

Urban Driver Assistance,” IEEE Transactions on Intelligent Transportation Systems,

Anchorage, Alaska, U.S.A., 2013.

[4] Sayanan Sivaraman and Mohan M. Trivedi, "Integrated Lane and Vehicle Detection,

Localization, and Tracking: A Synergistic Approach,” IEEE Transactions on Intelligent

Transportation Systems, Anchorage, Alaska, U.S.A., 2013.

[5] Eshed Ohn-Bar, Sayanan Sivaraman, and Mohan M. Trivedi, “Partially Occluded Vehicle

Recognition and Tracking in 3D,” IEEE Intelligent Vehicles Symposium, San Diego,

California, 2013.

[6] Song Liu, Haoran Yi, Liang-Tien Chia, and Deepu Rajan, “Adaptive Hierarchical Multi-

class SVM Classifier for Texture-based Image Classification,” IEEE International

Conference on Multimedia & Expo, NTU, Singapore, 2005.

[7] Linda G. Shapiro and George C. Stockman, “Computer Vision,” Prentice Hall, 2001,

ISBN 0-13-030796-3.

[8] Alan Hanjalic, “Content-based Analysis of Digital Video,” Kluwer Academic Publishers,

2004, ISBN 1-4020-8115-4.

[9] Xiaochao Yao, “Object Detection and Tracking for Intelligent Vehicle Systems,”

University of Michigan-Dearborn – Technology and Engineering, 2006.

[10] Emilio Maggio and Andrea Cavallaro, “Video Tracking: Theory and Practice,” Wiley,

2011, ISBN 978-0-470-74964-7.

[11] M. Bennamoun and G.J. Mamic, “Object Recognition: Fundamentals and Case Studies,”

Springer-Verlag, 2002, ISBN 1-85233-398-7.

[12] Zheng Rong Yang, “Machine Learning Approaches to Bioinformatics,” World Scientific

Publishing Co. Pte. Ltd., 2010, ISBN 978-981-4287-30-2.


[13] Gaurang Panchal, Amit Gantara, Parth Shah and Devyani Panchal, “Determination of Over-

Learning and Over-Fitting Problem in Backpropagation Neural Network,” International Journal on Soft

Computing, Vol. 2(2), 2011.

[14] Kirk James, F. O’Brien and David A. Forsyth, “Skeletal Parameter Estimation from

Optical Motion Capture Data,” University of California, Berkeley, 2008.

[15] J. Komala Lakshmi and M. Punithavalli, “A Survey on Performance Evaluation of

Object Detection Techniques in Digital Image Processing,” International Journal of

Computer Science, pp. 562-568, Vol. 7(6), 2010.

[16] Shruti V Kamath, Mayank Darbari and Rajashree Shettar, “Content Based Indexing and

Retrieval from Vehicle Surveillance Videos Using Gaussian Mixture Method,” International

Journal of Computer Engineering & Technology, Vol. 4(1), pp. 420-429, 2013.

[17] Paul Viola and Michael Jones, “Robust Real-time Object Detection,” International

Workshop on Statistical and Computing Theories of Vision, Vancouver, Canada, 2001.

[18] H. Schneiderman and T. Kanade “Object Detection Using the Statistics of Parts,”

International Journal of Computer Vision, Vol. 56(3), pp. 151-177, 2004.

[19] Kinjal A Joshi and Darshak G Thakore, “A Survey on Moving Object Detection and

Tracking in Video Surveillance System,” International Journal of Soft Computing and

Engineering, Vol. 2(3), 2012.

[20] Hsu-Yung Cheng, Po-Yi Liu and Yen-Ju Lai, “Vehicle Tracking in Daytime and

Nighttime Traffic Surveillance Videos,” 2nd

International Conference on Education

Technology and Computer (ICETC), Taiwan, 2010.

[21] Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti and Ruud

Bolle, “Appearance Models for Occlusion Handling,” IEEE Workshop on Performance

Evaluation of Tracking and Surveillance, New York, U.S.A., 2001.

[22] Kamijo, S., Matsushita, Y., Ikeuchi, K. and Sakauchi, M., “Traffic Monitoring and

Accident Detection at Intersections,” IEEE Intelligent Transportation Systems, 2000.

[23] Luis Carlos Molina, Lluís Belanche and Àngela Nebot, “Feature Selection Algorithms:

A Survey and Experimental Evaluation,” Universitat Politècnica de Catalunya, Departament

de Llenguatges i Sistemes Informátics, Barcelona, Spain.

[24] David Lowe, "Object recognition from local scale-invariant features," Proceedings of the

International Conference on Computer Vision, Vol. 2, pp. 1150-1157, 1999.


[25] Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, "SURF: Speeded Up

Robust Features," Computer Vision and Image Understanding, Vol. 110, No. 3, pp. 346-359,

2008.

[26] Krystian Mikolajczyk and Cordelia Schmid, "A performance evaluation of local

descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10(27),

pp. 1615-1630, 2005.

[27] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,”

Computer Vision and Pattern Recognition, 2005.

[28] Mehrube Mehrubeoglu and Lifford McLauchlan, “Determination of vehicle speed in

traffic video,” SPIE Digital Library, 2009.

[29] Deng-Yuan Huang, Chao-Ho Chen, Wu-Chih Hu, Shu-Chung Yi and Yu-Feng Lin,

“Feature-Based Vehicle Flow Analysis and Measurement for a Real-Time Traffic

Surveillance System,” Journal of Information Hiding and Multimedia Signal Processing, Vol.

3(3), 2012.

[30] Erhan Ince, “Measuring traffic flow and classifying vehicle types: A surveillance video

based approach,” Turkish Journal on Electrical Engineering & Comp Science, Vol. 19(4),

2011.

[31] Ö. Aköz and M.E. Karsligil, “Severity Detection of Traffic Accidents at Intersections

based on Vehicle Motion Analysis and Multiphase Linear Regression,” Annual Conference

on Intelligent Transportation Systems, Madeira Island, Portugal, 2010.

[32] Ju-Won Hwang, Young-Seol Lee and Sung-Bae Cho, “Structure Evolution of Dynamic

Bayesian Network for Traffic Accident Detection,” IEEE Congress on Evolutionary

Computation, Seoul, Korea, 2011.

[33] In Jung Lee, “An Accident Detection System on Highway Using Vehicle Tracking

Trace,” IEEE International Conference on ICT Convergence, South Korea, 2011.

[34] Shunsuke Kamijo, Yasuyuki Matsushita, Katsushi Ikeuchi and Masao Sakauchi, “Traffic

Monitoring and Accident Detection at Intersections,” IEEE Transactions on Intelligent

Transportation Systems, Vol. 1(2), 2000.

[35] Logesh Vasu, “An Effective Step to Real-Time Implementation of Accident Detection

System Using Image Processing,” Masters Thesis, Oklahoma State University, 2010.

[36] Aroh Barjatya, “Block Matching Algorithm for Motion Estimation,” DIP Spring Project,

2004.

[37] Feris, R.S., Siddiquie, B., Petterson, J., Yun Zhai, Datta, A., Brown, L.M. and Pankanti,

S. “Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos,”

IEEE Transactions on Multimedia, 2012.


[38] Corinna Cortes and Vladimir N. Vapnik, “Support-Vector Networks,” Machine

Learning, 20, 1995.

[39] Freund, Yoav, Schapire and Robert E, “A Decision-Theoretic Generalization of on-Line

Learning and an Application to Boosting,” 1995.

[40] Chang Liu, George Chen, Yingdong Ma and Xiankai Chen, “A System for Indexing and

Retrieving Vehicle Surveillance Videos,” 4th International Congress on Image and Signal

Processing, 2011.

[41] Shruti V Kamath, Mayank Darbari and Rajashree Shettar, “Content Based Indexing and

Retrieval from Vehicle Surveillance Videos Using Optical Flow Method,” International

Journal of Scientific Research, Vol. 2(4), 2013.

[42] R.E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of

Basic Engineering, Vol. 82(1), pp. 35–45, 1960.

[43] R.E. Kalman and R.S. Bucy, “New Results in Linear Filtering and Prediction Theory,”

Research Institute For Advanced Study, Baltimore, Maryland, 1961.

[44] Greg Welch, Gary Bishop, “An Introduction to the Kalman Filter,” University of North

Carolina at Chapel Hill, Department of Computer Science, 2001.

[45] M.S. Grewal and A.P. Andrews, “Kalman Filtering - Theory and Practice Using

MATLAB,” Wiley, 2001.

[46] András Frank, “On Kuhn's Hungarian Method – A tribute from Hungary,” Egervary

Research Group, Budapest, Hungary, 2004.

[47] Vikramaditya Jakkula, “Tutorial on SVM,” Washington State University, 2006.

[48] Raúl Rojas, “The backpropagation algorithm of Neural Networks - A Systematic

Introduction,” Springer-Verlag, Berlin, 1996.

[49] David Houcque, “Introduction to MATLAB for engineering students,” Northwestern

University, 2005.