
Camera handoff and placement for automated tracking systems with multiple omnidirectional cameras

Chung-Hao Chen *, Yi Yao, David Page, Besma Abidi, Andreas Koschan, Mongi Abidi
Imaging, Robotics, and Intelligent Systems Laboratory, Department of Electrical Engineering and Computer Science, The University of Tennessee, Knoxville, TN 37996, USA

Article history: Received 12 March 2008; Accepted 13 April 2009; Available online 8 July 2009

Keywords: Omnidirectional camera; Consistent labeling; Camera placement; Camera handoff; Multi-object multi-camera tracking; Automated surveillance systems

1077-3142/$ - see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.cviu.2009.04.004

* Corresponding author. Fax: +1 865 974 5459. E-mail addresses: [email protected] (C.-H. Chen), [email protected] (Y. Yao), [email protected] (D. Page), [email protected] (B. Abidi), [email protected] (A. Koschan), [email protected] (M. Abidi).

Abstract

In a multi-camera surveillance system, both camera handoff and placement play an important role in generating the automated and persistent object tracking typical of most surveillance requirements. Camera handoff should comprise three fundamental components: the time to trigger the handoff process, the execution of consistent labeling, and the selection of the next optimal camera. In this paper, we design an observation measure that quantitatively formulates the effectiveness of object tracking, so that we can trigger camera handoff in a timely manner and select the next camera appropriately before the tracked object falls out of the field of view (FOV) of the currently observing camera. In the meantime, we present a novel solution to the consistent labeling problem in omnidirectional cameras. A spatial mapping procedure is proposed that considers both the noise inherent to the tracking algorithms used by the system and the lens distortion introduced by omnidirectional cameras. This not only avoids a tedious manual process but also increases the accuracy in obtaining the correspondence between omnidirectional cameras without human intervention. We also propose the use of the Wilcoxon Signed-Rank Test to improve the accuracy of trajectory association between pairs of objects. In addition, since a certain amount of time is needed to successfully carry out the camera handoff procedure, we introduce an additional constraint to optimally reserve sufficient overlapped FOVs between cameras during camera placement. Experiments show that our proposed observation measure quantitatively formulates the effectiveness of tracking, so that camera handoff can smoothly transfer objects of interest. Meanwhile, our proposed consistent labeling approach performs as accurately as the geometry-based approach without tedious calibration processes and outperforms Calderara's homography-based approach. Our proposed camera placement method exhibits a significant increase in the camera handoff success rate at the cost of slightly decreased coverage, as compared to Erdem and Sclaroff's method, which does not consider the requirement on overlapped FOVs.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

Due to their panoramic views, omnidirectional cameras have been widely used in surveillance systems [1–4]. The literature reports intensive research interest in projection modeling, object tracking, and stereo vision for omnidirectional cameras. With the increased scale of practical surveillance systems, even when equipped with a 360° × 90° view, a single omnidirectional camera is incapable of monitoring the entire environment. A network of multiple omnidirectional cameras emerges for improved overall coverage and configuration flexibility. Even though the use of multiple cameras is popular, the discussion regarding systems using multiple omnidirectional cameras is relatively underdeveloped. In particular, the non-uniform resolution and severe distortion resulting from non-perspective projection cause considerable difficulties in establishing such a surveillance system.

To set up an automated surveillance system using multiple omnidirectional cameras, we encounter the same issues as systems based on multiple perspective cameras, including camera placement, camera handoff, and object tracking. Camera placement, as the first step in setting up a surveillance system, determines the cameras' configuration, including intrinsic and extrinsic parameters, according to the geometry of the environment and the requirements on system performance. Camera handoff, as the dynamic online coordination center, determines when and which camera will track and monitor the object of interest. Consistent labeling, an important step in camera handoff, builds the connections between instances of the same object in different cameras' FOVs. Object tracking, as the fundamental online tracking function in a single camera, lays the foundation for keeping track of objects of interest and understanding their behaviors in advanced applications. With the proper functioning of these units, an automated surveillance system based on multiple omnidirectional cameras is able to fulfill tasks such as activity monitoring, behavior understanding, and threat awareness. Table 1 summarizes and compares the functions of these units. The difficulties caused by the use of omnidirectional cameras are listed as well.

In general, camera handoff regulates the collaboration among multiple cameras and answers the questions of When and Who: when a handoff request should be triggered to secure sufficient time for successful consistent labeling, and who is the most qualified camera to take over the object of interest before it falls out of the FOV of the currently observing camera. In other words, camera handoff should comprise three fundamental components: the time to trigger the handoff process, the execution of consistent labeling, and the selection of the next camera. Nevertheless, most existing camera handoff algorithms discussed for systems based on multiple perspective cameras concentrate on the execution of consistent labeling and ignore the interrelation among those three fundamental components. As a result, there is no clear formulation to govern the transition between adjacent cameras. For instance, a consistent labeling request in the work of Khan and Shah [12] is triggered when the object is too close to the edge of the camera's FOV (EFOV). No quantitative measure is given to describe what is considered too close to the EFOV or which camera is the most qualified to track the handoff object.

Thus, in this paper, we first propose an observation measure with two components, resolution and distance to the EFOV, each of which describes a different aspect of object tracking, to quantitatively formulate the effectiveness of a camera in observing the tracked object. Equipped with this quantified and comprehensive measure of the effectiveness of object tracking, we can answer the questions of When and Who with an optimized solution. However, different from systems with perspective cameras, omnidirectional cameras suffer from the severe distortion introduced to realize a wide FOV, and frequently different cameras follow different projection models, which makes deriving the observation measure difficult. In this paper, we adopt a polynomial model to approximate the various projection models of omnidirectional cameras. In addition, a statistical measure is used to automatically select the optimal polynomial degree. Based on this polynomial approximation with automated model selection, we are able to derive a unified definition of the observation measure for various types of omnidirectional cameras.

In the meantime, since omnidirectional cameras have non-uniform resolution and severe distortion, we build a spatial mapping procedure to automatically obtain correspondence information between overlapped omnidirectional cameras without human intervention. This spatial mapping method considers both the noise inherent to the tracking algorithms used by the system and the lens distortion introduced by the omnidirectional cameras. With this spatial mapping method, the system can identify the trajectories of the same object from different omnidirectional camera views and avoid the necessity of fully calibrating cameras as in the works of Black and Ellis [10] and Kelly et al. [11]. Accordingly, our proposed consistent labeling is divided into two phases, the spatial mapping phase and the pair matching phase. In the spatial mapping phase, the homography between two overlapped FOVs is automatically recovered. The Wilcoxon Signed-Rank Test [25] is then used to match objects to increase the success rate of consistent labeling in the pair matching phase.

Table 1. Comparison of the functions of camera placement, camera handoff, and object tracking in a surveillance system.

Unit              | Range            | Execution | Difficulties
Camera placement  | Multiple cameras | Offline   | Various projection models
Camera handoff    | Adjacent cameras | Online    | Nonuniform resolution; distorted appearance; various projection models
Object tracking   | Single camera    | Online    | Nonuniform resolution; distorted appearance

A complete camera handoff process including the three aforementioned fundamental components needs a certain amount of time to be executed successfully, especially the time needed to execute consistent labeling. Thus, a sufficiently large overlapped FOV should be reserved so that consistent labeling can be carried out successfully before the object falls out of the FOV of the observing camera. Even though the works of Javed et al. [19,20], Kang et al. [27], and Lim et al. [28] can consistently label an object across the disjoint views of two cameras, those algorithms still need an amount of time to recover an untracked object after a camera sees the object. In particular, constraints in their works, such as restraining the size of the disjoint views and requiring that the locations of exits and entrances across cameras be correlated, lead to more complicated questions: how to determine the size of the disjoint views and where to place them in the monitored environment without deteriorating the performance of the automated surveillance system in terms of the continuity of the tracked object. In addition, those tracking systems cannot detect the occurrence of unusual events due to the lack of clear and continuous views of the object, which may cause a serious loophole in a surveillance system. Thus, we propose a camera placement approach that finds an optimized tradeoff between the overall coverage and the size of overlapped FOVs to maximize the performance of the automated surveillance system in terms of the continuity of the tracked object.

Since the focus of this paper is not developing a multi-object tracking algorithm for the omnidirectional camera, we assume that a reasonably correct object tracking result is available through whatever method is preferred by the user; in practice, we use the work of Cui et al. [24] for target detection and tracking in an omnidirectional camera. In summary, the contributions of this paper are as follows: (1) An observation measure is introduced to quantitatively formulate the effectiveness of a camera in observing the tracked object. This gives a quantified metric to direct camera handoff for continuous and automated tracking before the tracked object falls out of the FOV of the currently observing camera. (2) Our proposed consistent labeling algorithm performs as accurately as the geometry-based approach [10,11] without tedious calibration processes and outperforms Calderara's homography-based approach [14]. (3) Our proposed camera placement approach features a significant improvement in the handoff success rate at the cost of slightly decreased coverage as compared to Erdem and Sclaroff's method [22], which does not consider the necessary overlapped FOVs.

The remainder of this paper is organized as follows: Section 2 gives a brief introduction to related work and the motivation for our approach. The camera handoff approach, including the consistent labeling method and the observation measure, is discussed in Section 3. Section 4 describes the camera placement approach. Section 5 illustrates our experimental results, and Section 6 concludes this paper.

2. Related work

As mentioned before, most existing camera handoff algorithms concentrate on the execution of consistent labeling in systems based on multiple perspective cameras and ignore the interrelation among the three fundamental components. Thus, our review of related work starts with the consistent labeling problem in systems based on multiple perspective cameras and discusses the difficulties in applying these methods to omnidirectional cameras. Afterwards, we explain why existing camera placement approaches fail to address the problem of determining the size of overlapped views.


In the literature, consistent labeling methods can be grouped into five categories: feature-based, geometry-based, alignment-based, homography-based, and hybrid approaches. In feature-based approaches [7–9], color or other distinguishing features of the tracked objects are matched, generating correspondence among cameras. In the geometry-based approach [10,11], consistent labeling can be established by projecting the trace of the tracked object back into the world coordinate system, and then establishing equivalence between objects projected onto the same location. In the alignment-based approach [5,6,12,13], the tracks of the same object are recovered across different cameras after alignment by the geometric transformation between cameras. The homography-based approach [14,15] obtains correspondences between overlapped views in the 2D image plane. The hybrid approach [16,17] is often a combination of geometry- and feature-based methods.

The consistent labeling approaches mentioned above are not efficient in some cases. The geometry-based approach usually needs an expensive process in real-life surveillance applications to fully calibrate each camera in order to derive 3D information, as pointed out by Khan and Shah [12]. The work of Cai and Aggarwal [40,41] pointed out that modeling the environment in 3D and establishing a world coordinate system are computationally expensive and do not adapt to changes in dynamic environments. This is unnecessary for a camera surveillance system, because most of the information needed can be extracted by observing motion over a period of time [12]. A feature-based approach may become unreliable when the disparity increases, which includes illumination changes, increases in viewpoint angle [18], objects with different colors or textiles on front and back, and so forth. Consider the work of Lowe [7], generally abbreviated as SIFT (scale-invariant feature transform), as an example. Not only does the stability of keypoint detection decrease as the viewpoint angle increases, but the keypoints are also not robust to severely distorted images such as those acquired by omnidirectional cameras. Fig. 1 illustrates that even though SIFT can find keypoints in each of two omnidirectional images of the same scene, it cannot find any comparable keypoints in both images. Similarly, Zhou and Aggarwal [42] integrate spatial position, shape, and color to track objects across multiple cameras. Their method cannot be applied to omnidirectional cameras due to severe lens distortion and low image resolution.

Fig. 1. Illustration of the SIFT approach for images acquired by two omnidirectional cameras of the same scene. (a) Keypoint locations in the images taken by omnidirectional cameras one and two and (b) no common keypoints are found in both images taken by cameras one and two.

The work of Khan and Shah [12] in the alignment-based approach has its limits: if two or more objects cross simultaneously, an incorrect labeling can be established, as pointed out by the homography-based approach of Calderara et al. [14]. As mentioned before, the feature-based approach is not always robust enough in many situations, especially with the severely distorted images of omnidirectional cameras. Even though the accuracy of fully calibrated cameras is promising for consistent labeling in omnidirectional cameras, fully calibrating cameras is often impractical in a real-time case where multiple cameras are used. The works of Gandhi and Trivedi [33], Calderara et al. [14], and Lee et al. [15] therefore inspire our solution to the consistent labeling problem in omnidirectional cameras. Nevertheless, Calderara et al. [14] and Lee et al. [15] require considerable manual intervention to obtain the correspondence between two images and consider neither the noise inherent to the tracking algorithms nor the lens distortion introduced by omnidirectional cameras, which can reduce the accuracy of consistent labeling in a real-time case. Moreover, the non-uniform resolution of omnidirectional cameras increases the difficulty of finding pixel-to-pixel correspondences between two omnidirectional images. However, even though Gandhi and Trivedi [33] can identify each target in the scene without tedious camera calibration or any manual intervention, they do not consider the case of two overlapped omnidirectional cameras.

Moreover, regardless of which approach is used to solve the consistent labeling problem, both multi-frame solutions such as the alignment-based and homography-based approaches and time-consuming solutions such as the feature-based and hybrid approaches need a certain amount of time to execute consistent labeling successfully. The overlapped FOV, therefore, should be large enough for consistent labeling to be carried out successfully before the object falls out of the FOV of the observing camera. However, most existing camera placement approaches, such as the art gallery problem (AGP) [21] and its variants, are mainly used to determine the minimum number of guards/cameras and their static positions for the maximum coverage of a given area. Even though the works of Erdem and Sclaroff [22] and Mittal and Davis [23] consider camera constraints such as FOV, spatial resolution, depth of field (DOF), and minimal cost to solve the camera placement problem in a surveillance system, they still do not provide a solution to optimally determine the size of overlapped FOVs for carrying out camera handoff successfully. In conclusion, for the purpose of automated and persistent object tracking, we introduce an approach to optimally reserve sufficient overlapped FOVs between cameras for carrying out camera handoff successfully, and then propose an approach to formulate when camera handoff should be triggered and to which camera the object should be handed off. On the other hand, our proposed camera handoff algorithm can hand off to-be-unseen objects to an adjacent camera beforehand, and the works of Javed et al. [19,20], Kang et al. [27], and Lim et al. [28] can be used for compensation purposes.

3. Camera handoff

The flow chart of the proposed camera handoff algorithm is depicted in Fig. 2, where operations are carried out at the handoff request and handoff response ends.


Fig. 2. Flow chart of the proposed camera handoff algorithm.


Let the jth camera be the handoff request end and the ith object be the one that needs a transfer. A handoff request for the ith object is triggered and broadcast by the jth camera to adjacent cameras if Sij ≤ ST, where Sij is the observation measure of the ith object in the jth camera and ST represents the trigger threshold. The trigger threshold is determined by the average moving speed of the object of interest in the ground plane and the time needed to execute camera handoff successfully. Afterwards, the jth camera keeps tracking the ith object and waits for positive responses from adjacent cameras while the object is still visible. Fig. 3 illustrates the concept of ST. Fig. 3a and b demonstrate scenarios with Sij > ST, where the object of interest remains in the field of view of the observing camera and presents an acceptable resolution. Fig. 3c demonstrates a scenario with Sij ≤ ST, where camera handoff is necessary due to the rapidly decreasing resolution although the object of interest is still in the field of view of the observing camera.

Let the (j′)th camera be the handoff response end. Once a positive response is received, the jth and (j′)th cameras perform consistent labeling to identify the ith object. If the association of the ith object is established successfully, the (j′)th camera becomes a valid candidate; otherwise, the handoff request is rejected. Back at the handoff request end, among all valid candidate cameras, the (j*)th camera with the highest observation measure, j* = argmax_{j′} {Sij′}, is selected as the most appropriate camera to take over the ith object from the pool of candidate cameras. If no positive response is received, the jth camera continues tracking the ith object and broadcasting the handoff request to adjacent cameras until the target falls out of its FOV or a positive handoff response is granted. A handoff failure is finally issued when the target becomes untraceable.

Fig. 3. Illustration of ST. (a and b) Demonstrate scenarios with Sij > ST, where the object of interest remains in the field of view of the observing camera and presents an acceptable resolution. (c) Demonstrates a scenario with Sij ≤ ST, where camera handoff is necessary due to the rapidly decreasing resolution although the object of interest is still in the field of view of the observing camera.
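The decision logic above can be summarized in a few lines of code. The sketch below is a minimal, hypothetical rendering of the handoff loop (the names handoff, S, and consistent_labeling are ours, not part of any published implementation); it assumes an observation measure S(cam, obj) in [0, 1] and a trigger threshold S_T are available.

```python
# Minimal sketch of the handoff decision described above (hypothetical API).
def handoff(object_id, current_cam, adjacent_cams, S, S_T, consistent_labeling):
    """S(cam, obj) -> observation measure in [0, 1]; S_T -> trigger threshold."""
    if S(current_cam, object_id) > S_T:
        return current_cam                      # no handoff needed yet

    # Broadcast the handoff request; keep only the cameras that can
    # re-identify (consistently label) the object.
    candidates = [cam for cam in adjacent_cams
                  if consistent_labeling(current_cam, cam, object_id)]
    if not candidates:
        return None                             # handoff failure (or keep retrying)

    # Select the next camera: j* = argmax_{j'} S_ij'
    return max(candidates, key=lambda cam: S(cam, object_id))
```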

3.1. Observation measure

To maintain persistent and continuous object tracking, a handoff request is triggered before the object of interest becomes untraceable or unidentifiable. The object of interest may become untraceable or unidentifiable for the following reasons: (1) the object is leaving the camera's FOV, or (2) the object's resolution is getting low. Accordingly, two criteria are defined in the observation measure to determine when to trigger a handoff request: resolution Sr and distance to the edge of the camera's FOV Sd. Both Sr and Sd are scaled to [0, 1], where zero means that the object is untraceable or unidentifiable and one means that the camera has the best effectiveness in tracking the object.



The geometry of an omnidirectional camera is depicted in Fig. 4. Given a point P(X, Y, Z) in the world coordinate system, the pan angle θP and tilt angle θT are θP = tan⁻¹(X/Y) and θT = tan⁻¹(R/Z), respectively, with $R = \sqrt{X^2 + Y^2}$. The imaging process of an omnidirectional camera does not comply with traditional perspective projection. Let r denote the distance between the projected point p(x, y) and the principal point, and θ the angle between the incoming ray and the optical axis. Perspective projection is characterized by r = f tan θ. To realize a wider opening angle, this relation is changed. Various projection models exist in the literature [34], such as the equidistance projection r = fθ and the general polynomial model $r = f \sum_{k=1,\, k\ \mathrm{odd}}^{K} \lambda_k \theta^{k}$, where the λk denote the approximation coefficients.

The use of a polynomial model provides the flexibility of modeling and unifying omnidirectional cameras with various projection models, which is important for the integration of multiple omnidirectional cameras. For the purpose of camera handoff and camera placement, we need a unified basis on which omnidirectional cameras with various projection models can communicate. However, the difficulty lies in the selection of an appropriate polynomial degree that avoids both over-fitting and under-fitting. To meet this requirement, the work of Orekhov et al. [36] proved that Akaike's information criterion (AIC) [30] is efficient in selecting the optimal polynomial degree [31] compared with conventional methods [37–39]. Therefore, the AIC criterion is used in our model selection algorithm.

Statistical model selection is used to optimize the model parameters when several competing models can be used to explain an observation. In our application, model selection optimizes the polynomial degree K. The AIC criterion, −2 log L(x; pi) + 2N, is used following the work of Akaike [30], where L(x; pi) is the likelihood of the model parameters x (including a total of N camera intrinsic and extrinsic parameters). Assuming a Gaussian distribution of pi:

$$\Pr(p_i \mid \mathbf{x}) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-e_i^2 / (2\sigma^2)\right), \qquad (1)$$

where $e_i = \| p_i - \hat{p}_i \|$ and $\hat{p}_i$ is the estimated projection based on x, we have

$$\log L(\mathbf{x}; p_i) = \log \Pr(p_i \mid \mathbf{x}) \propto -\frac{1}{2\sigma^2} \sum_i e_i^2, \qquad (2)$$

and the AIC criterion reduces to $\frac{1}{\sigma^2}\sum_i e_i^2 + 2N$. Our model selection algorithm optimizes the polynomial degree by minimizing the AIC criterion and proceeds as follows:

Fig. 4. Illustration of the geometry of an omnidirectional camera.

(1) Increase K.
(2) Perform camera calibration and obtain x.
(3) Compute the corresponding model selection measure AIC.
(4) If the measure keeps decreasing, go to step (1). Otherwise, stop and output the model obtained in the previous iteration as the final camera calibration result.

As the polynomial degree K increases, the corresponding model varies from an under-fitting to an over-fitting one. As a result, the AIC criterion first decreases and then increases. Since our model selection algorithm starts from a small K (usually the initial K is set to 1), it is sufficient to stop the process once the AIC criterion begins to increase and take the model obtained in the previous iteration as the optimal solution. A detailed discussion regarding the performance of the aforementioned polynomial approximation can be found in [31,36], where an accuracy of 94.8% is reported. Another concern regarding the model selection algorithm is the assumption of a Gaussian distribution. An extension of the AIC criterion, information complexity (ICOM), has been proposed [32], in which the Gaussian assumption is relaxed. However, in our experiments, the use of ICOM does not introduce noticeable performance improvement while the computational complexity increases significantly. Therefore, considering both accuracy and computational cost, the AIC with the Gaussian assumption is exploited.
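A compact sketch of this stopping rule is given below, under simplifying assumptions: instead of a full intrinsic/extrinsic calibration in step (2), it only fits the odd-power radial polynomial to sampled (θ, r) pairs by least squares, and it takes a user-supplied noise estimate sigma for the reduced AIC form; the function name and arguments are ours.

```python
import numpy as np

def select_projection_degree(theta, r, sigma=1.0, K_max=9):
    """AIC-driven selection of the odd polynomial degree K for r = f*sum(lambda_k*theta^k).
    theta, r: 1-D arrays of ray angles and radial image distances (a sketch, not the
    full calibration of step (2)); sigma: assumed reprojection noise; returns
    (powers, coefficients) of the model chosen before the AIC starts to increase."""
    best, prev_aic = None, np.inf
    for K in range(1, K_max + 1, 2):                    # step (1): increase K (odd degrees only)
        powers = np.arange(1, K + 1, 2)
        A = np.stack([theta**k for k in powers], axis=1)
        coeffs, *_ = np.linalg.lstsq(A, r, rcond=None)  # step (2): fit lambda_k * f
        rss = float(np.sum((r - A @ coeffs) ** 2))
        aic = rss / sigma**2 + 2 * len(coeffs)          # step (3): (1/sigma^2)*sum(e_i^2) + 2N
        if aic >= prev_aic:                             # step (4): stop once the AIC increases
            return best
        best, prev_aic = (powers, coeffs), aic
    return best
```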

Based on the polynomial approximation with automated model selection, we are able to define our observation measure for omnidirectional cameras. The image resolution of the ith object in the jth camera is the partial derivative of r with respect to R:

$$S_{r,ij} = \alpha \frac{\partial r}{\partial R} = \alpha \frac{f Z}{Z^2 + R^2} \sum_{k=1,\, k\ \mathrm{odd}} \lambda_k\, k\, \theta^{k-1}, \qquad (3)$$

where α is a normalization coefficient. The distance to the EFOV of the currently observing camera for the ith object in the jth camera is given by

$$S_{d,ij} = \beta \left(1 - r/r_o\right)^2, \qquad (4)$$

where ro represents the image size of the omnidirectional camera and β is a normalization coefficient. The observation measure is given by

$$S_{ij} = w_r S_{r,ij} + w_d S_{d,ij}, \qquad (5)$$

where wd and wr are importance weights whose sum is one.
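The sketch below evaluates Eqs. (3)–(5) for a single object, assuming the polynomial coefficients λk from the model selection step, the object's ground-plane distance R and camera height Z, and its radial image distance r are available; the default weights follow the values used later in the experiments (α = 9.4 × 10⁻³, wr = 0.25, wd = 0.75), and the function name is ours.

```python
import numpy as np

def observation_measure(R, Z, f, lambdas, r, r_o,
                        alpha=9.4e-3, beta=1.0, w_r=0.25, w_d=0.75):
    """Sketch of Eqs. (3)-(5). lambdas: dict mapping odd power k -> lambda_k;
    r: radial image distance of the object (pixels); r_o: image radius (pixels)."""
    theta = np.arctan2(R, Z)                   # tilt angle of the incoming ray
    # Eq. (3): S_r = alpha * f*Z/(Z^2+R^2) * sum_k lambda_k*k*theta^(k-1)
    S_r = alpha * f * Z / (Z**2 + R**2) * sum(k * lam * theta**(k - 1)
                                              for k, lam in lambdas.items())
    # Eq. (4): S_d = beta * (1 - r/r_o)^2
    S_d = beta * (1.0 - r / r_o) ** 2
    # Eq. (5): weighted combination with w_r + w_d = 1
    return w_r * S_r + w_d * S_d
```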

3.2. Consistent labeling

Our consistent labeling algorithm can be divided into two phases, the spatial mapping phase and the pair matching phase. Fig. 5 illustrates the flow charts of these two phases. In Fig. 5a, the purpose of the spatial mapping phase is to automatically obtain the homography functions

$$\hat{x}_n = F_x(x_m, y_m) \quad \text{and} \quad \hat{y}_n = F_y(x_m, y_m) \qquad (6)$$

between any two adjacent omnidirectional cameras with overlapped FOVs, where (xm, ym) represents the image coordinates of a single object seen in the mth camera and (x̂n, ŷn) represents the estimated image coordinates of the same object in the nth camera.

As the first step, a single object moves around randomly in the overlapped FOVs of the mth and nth omnidirectional cameras to collect its motion trajectory as tracked by the two cameras, (xm, ym) and (xn, yn). Then, the homography functions are obtained via multiple regression models. The justification for using multiple regression models is addressed in Section 3.2.1. Afterwards, the estimated homography functions are deployed in the system until the configuration of the system is changed.

Fig. 5. Illustration of our consistent labeling algorithm including the spatial mapping phase and the pair matching phase. (a) The spatial mapping phase and (b) the pair matching phase.

In Fig. 5b, the purpose of the pair matching phase is to utilize the derived homography functions to match any pair of objects in the nth camera, such as (xni, yni) and (x̂ni, ŷni), where (xni, yni) represents the image coordinates of the ith object seen in the nth camera and (x̂ni, ŷni) represents the estimated image coordinates of the ith object in the nth camera, derived from the mth camera by the homography functions in Eq. (6). In essence, the Wilcoxon Signed-Rank Test is incorporated into the pair matching phase to increase the accuracy of matching pairs of objects. The Wilcoxon Signed-Rank Test is detailed in Section 3.2.2.

Our spatial mapping method proceeds without knowledge of either the cameras' projection models or their relative positions. Polynomial approximation is used to directly model the relation between (xm, ym) and (xn, yn), yielding Eq. (6). Compared with the geometry-based approach built on camera calibration, our approach not only offers high flexibility and autonomy but also absorbs the noise inherent to the tracking algorithms used by the system and the lens distortion introduced by omnidirectional cameras. In other words, our approach directly identifies objects via their image coordinates, which already incorporate the tracking noise and the lens distortion. Due to the imperfection inherent in the fitting model used by consistent labeling methods in all categories, the least square error and similar approaches are inefficient in matching pairs of objects, as detailed in the following section. According to our experiments, the Wilcoxon Signed-Rank Test, which tests whether paired samples have the same distribution versus the alternative that the distributions differ in location, proves robust and efficient.

3.2.1. Spatial mapping phase

At first, a single object moves around randomly in the overlapped FOVs of the mth and nth omnidirectional cameras to collect its motion trajectory as tracked by the two cameras, (xm, ym) and (xn, yn). Since the focus of this paper is not developing multi-object tracking approaches, we simply utilize the algorithm discussed by Cui et al. [24]. Once (xm, ym) and (xn, yn) are collected, we want to find a suitable fitting method to derive the homography functions. Thus, we first study the correlation between (xm, ym) and (xn, yn). Table 2 shows the mean correlation values averaged across a variety of omnidirectional camera system setups in which the cameras are placed overhead with various relative distances and heights. Rotations are not considered since an omnidirectional camera has a 360° FOV. This configuration is commonly used in most surveillance systems. Fig. 6 shows one scatter plot matrix between (xm, ym) and (xn, yn) for one system setup.

Table 2. Averaged correlation between (xm, ym) and (xn, yn).

       xm        ym
xn     0.6961    0.1027
yn    −0.1374    0.7126

In Table 2, we can see that the correspondences between (xm, ym) and (xn, yn) are highly correlated, reaching 0.69 between xm and xn and 0.71 between ym and yn. Two variables can be considered linearly related if the absolute value of their estimated sample correlation is greater than 0.6 [35]. More specifically, the confidence level of a population correlation based on the derived sample correlation can be computed [35]. With 25–40 training data trials for each camera setup and estimated correlations of 0.69 between xm and xn and 0.71 between ym and yn, both yield a 95% confidence level of linearity.

Fig. 6. Illustration of one set of the scatter plot matrix between (xm, ym) and (xn, yn). The correspondences between (xm, ym) and (xn, yn) are highly correlated, which validates the use of the multiple regression model.

In Fig. 6, taking the plot of xm in the first row against xn in the third column as an example, we can see a strong linear tendency between xm and xn. Similarly, viewing yn in the fourth column against ym in the second row, we observe a linear tendency between ym and yn. The scattered points in Fig. 6 are caused by non-ideal tracking; since developing new tracking algorithms with improved performance is not the focus of this paper, we choose to employ existing tracking algorithms. Given the linear tendency shown in Table 2 and the scatter plots in Fig. 6, the multiple regression model [25] appears to be a good candidate for deriving the homography functions, considering both accuracy and computational complexity.
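The linearity check behind Table 2 amounts to computing sample correlations between the trajectories recorded by the two cameras. A minimal sketch (function name ours) is:

```python
import numpy as np

def trajectory_correlations(xm, ym, xn, yn):
    """Sample correlations between the image coordinates of the same object
    tracked in cameras m and n (the quantities reported in Table 2)."""
    corr = np.corrcoef(np.vstack([xm, ym, xn, yn]))   # 4 x 4 correlation matrix
    # corr(xm, xn) and corr(ym, yn); values above ~0.6 suggest an approximately
    # linear relation, supporting the multiple regression model.
    return corr[0, 2], corr[1, 3]
```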

Since the derivations of the Fx and Fy functions are similar, in the following discussion we take x̂n = Fx(xm, ym) as an example to save space. In general, we first fit a model with all possible predictor variables, such as xm, ym, xm², xmym, ym², ..., xmⁿ, xmym^(n−1), ..., and ymⁿ. Let wi, with i = 1, ..., k, represent these k predictor variables. The complete model can then be expressed as

$$x_n = \beta_0 + \beta_1 w_1 + \beta_2 w_2 + \cdots + \beta_k w_k + \varepsilon_C, \qquad (7)$$

where βi denotes a model fitting parameter and εC is a random error term with E{εC} = 0. Usually not all predictor variables are equally significant. A subset of these variables can be found, forming a reduced model:

$$x_n = \beta_0 + \beta_1 w_1 + \beta_2 w_2 + \cdots + \beta_g w_g + \varepsilon_R, \qquad (8)$$

where g < k and εR is a random error term with E{εR} = 0. Let SSEC and SSER denote the sums of squared errors of the complete and reduced models:

$$SSE_C = \mathbf{H}_{P,C}^{T}\mathbf{H}_{P,C} - \mathbf{H}_{P,C}^{T}\mathbf{W}_{P,C}\left(\mathbf{W}_{P,C}^{T}\mathbf{W}_{P,C}\right)^{-1}\mathbf{W}_{P,C}^{T}\mathbf{H}_{P,C},$$
$$SSE_R = \mathbf{H}_{P,R}^{T}\mathbf{H}_{P,R} - \mathbf{H}_{P,R}^{T}\mathbf{W}_{P,R}\left(\mathbf{W}_{P,R}^{T}\mathbf{W}_{P,R}\right)^{-1}\mathbf{W}_{P,R}^{T}\mathbf{H}_{P,R}, \qquad (9)$$

where HP,C/HP,R is the vector of all response variables in the complete/reduced model and WP,C/WP,R is the matrix of the predictor variables wk/wg in the complete/reduced model.

Intuitively, if w1, w2, ..., wk are important information-contributing variables, the complete model in Eq. (7) should have a smaller prediction error than the reduced model in Eq. (8): SSEC ≤ SSER. The greater the difference (SSER − SSEC), the stronger the evidence supporting the alternative hypothesis that wg+1, wg+2, ..., wk are significant information-contributing terms, and hence rejecting the null hypothesis:

$$H_0:\ \beta_{g+1} = \beta_{g+2} = \cdots = \beta_k = 0. \qquad (10)$$

Conversely, acceptance of the null hypothesis suggests that the additional predictors in the complete model, wg+1, wg+2, ..., wk, introduce no improvement in fitting accuracy; the predictors w1, w2, ..., wg in the reduced model are sufficient and contribute more significant information than the predictors wg+1, wg+2, ..., wk.

To determine whether to accept or reject the null hypothesis, the F* test is used:

$$F^{*} = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C / \left(N_p - (k + 1)\right)}, \qquad (11)$$

where Np represents the number of training data trials. If the null hypothesis H0 is true, F* follows an F distribution with (k − g) numerator degrees of freedom and Np − (k + 1) denominator degrees of freedom. A large value of (SSER − SSEC), or equivalently a large value of F*, leads to the rejection of the null hypothesis. Let α denote the level of significance, i.e., the probability of rejecting a hypothesis when it is true. In general, we want this probability to be low; in this paper, we set α to 0.01. If a test with level of significance α is used, F* > Fα, where Fα can be found in the percentiles of the F distribution, is the appropriate rejection region.

Consider an example of establishing x̂n = Fx(xm, ym) in which the complete model has three predictors, ym, ym², and ym³, and 20 training data trials (Np = 20). The level of significance is α = 0.01. We want to test whether the third predictor ym³ can be dropped. The remaining parameters are: k = 3, g = 2, Fα = 8.53, SSER = 109.95, and SSEC = 98.41. According to Eq. (11), F* is 1.88, which is less than Fα. As a result, the null hypothesis is accepted; in other words, the predictor ym³ can be dropped, forming a reduced model with two predictors, ym and ym².
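A sketch of this model reduction test is given below; the function names are ours, the intercept β0 is added inside the helper, and scipy.stats.f supplies the Fα percentile. The printed check reproduces the worked example above.

```python
import numpy as np
from scipy import stats

def _sse(W, y):
    """Least-squares SSE of y on the predictors W (intercept column prepended)."""
    X = np.column_stack([np.ones(len(y)), W])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def partial_f_test(W_complete, W_reduced, y, alpha=0.01):
    """Sketch of Eqs. (9)-(11): returns True when H0 is rejected, i.e. the extra
    predictors of the complete model should be kept."""
    Np, k = W_complete.shape                    # k predictors plus the implicit intercept
    g = W_reduced.shape[1]
    sse_c, sse_r = _sse(W_complete, y), _sse(W_reduced, y)
    F = ((sse_r - sse_c) / (k - g)) / (sse_c / (Np - (k + 1)))
    F_crit = stats.f.ppf(1.0 - alpha, k - g, Np - (k + 1))
    return F > F_crit

# Check against the worked example: k = 3, g = 2, Np = 20, SSE_R = 109.95, SSE_C = 98.41
F_star = ((109.95 - 98.41) / 1) / (98.41 / 16)
print(round(F_star, 2), round(stats.f.ppf(0.99, 1, 16), 2))   # ~1.88 < ~8.53 -> drop y_m^3
```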

3.2.2. Pair matching phase

The least square error and similar approaches have been widely used to match pairs of objects in the geometry-based or homography-based approach. Fig. 7 illustrates the problem with using the least square error or similar approaches in the homography-based approach. In Fig. 7, p1m and p1n denote the pixel locations of object 1 in the mth and nth cameras, respectively; p2m and p2n denote the pixel locations of object 2 in the mth and nth cameras, respectively. p̂1n and p̂2n denote the estimated pixel locations of objects 1 and 2 in the nth camera, derived from p1m and p2m, respectively, by the correspondence functions in Eq. (6). Both estimated pixel locations, p̂1n and p̂2n, suffer degradation caused by image noise and distortion, and the precision of the calibration method cannot be guaranteed. As a result, the distances between p̂1n and p1n and between p̂1n and p2n (and likewise between p̂2n and p1n and between p̂2n and p2n) can be nearly the same, so the least square error and similar approaches are unable to match the pairs appropriately.

Fig. 7. Illustration of the problem caused by the least square error or similar approaches for matching pairs. Because the distances between p̂1n and p1n, and p̂1n and p2n (p̂2n and p1n, and p̂2n and p2n) are similar due to image noise, the least square error or similar approaches cannot match the pairs appropriately.

To overcome the problem that the least square error and similar approaches face, the Wilcoxon Signed-Rank Test is used [25], because it can test whether paired samples have the same distribution versus the alternative that the distributions differ in location. In addition, since the distribution of each pair is unknown, a nonparametric statistical approach should be used in this case instead of a parametric approach (such as small-sample hypothesis testing based on a normal distribution assumption [25]). To carry out the Wilcoxon Signed-Rank Test, we calculate the differences for each pair in the pool of collected object motion trajectories. We then rank the absolute values of the differences, assigning a 1 to the smallest, a 2 to the second smallest, and so on. If two or more absolute differences are tied for the same rank, the average of the ranks that would be assigned to these differences is assigned to each member of the tied group. We use T [25] as a test statistic to test the null hypothesis that the two population relative frequency histograms are identical. The smaller the value of T, the greater the evidence favoring rejection of the null hypothesis. Hence, we reject the null hypothesis if T is less than or equal to a value T0 determined by the assigned significance level α. Since each object in our case has a 2D position in image coordinates, the positions along the x-axis and y-axis are calculated and tested separately; only when the two tests show the same evidence are the objects paired.

For clear presentation, an example of how the Wilcoxon Signed-Rank Test is incorporated into the pair matching phase is illustrated. Since the calculations for positions along the x-axis and y-axis are similar, only the computation along the x-axis is presented to save space. We test the hypothesis that there is no difference in the population distributions of positions along the x-axis for a matched-pairs experiment involving six acquired positions over six frames, one for object A and the other for object B in each pair, in images with 320 × 240 resolution. Table 3 shows the paired data and the calculation for the Wilcoxon Signed-Rank Test.

Table 3. Paired data and the calculation for the Wilcoxon Signed-Rank Test.

Object A | Object B | Difference (A − B) | Absolute difference | Rank of absolute difference
135      | 129      | 6                  | 6                   | 3
102      | 120      | −18                | 18                  | 5
108      | 112      | −4                 | 4                   | 1.5
141      | 152      | −11                | 11                  | 4
131      | 135      | −4                 | 4                   | 1.5
144      | 163      | −19                | 19                  | 6

The null hypothesis to be tested is that the two population distributions of positions along the x-axis are identical; the alternative hypothesis is that the distributions differ in location. We conduct our test with α = 0.1. According to the work of Wackerly et al. [25], T0 is equal to 2; hence, the null hypothesis is rejected if T ≤ 2. Because the only positive difference has rank 3, T+ = 3 and T− = 18 (5 + 1.5 + 4 + 1.5 + 6 = 18), so T = 3 by the rule T = min(T+, T−). Since the observed value of T exceeds T0, there is not sufficient evidence to indicate a difference between the two population distributions of positions along the x-axis. If the same result holds for the two population distributions of positions along the y-axis, we can claim that these two objects, A and B, are the same, and consistent labeling is established.
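The test is easy to reproduce in code. The sketch below implements the ranking rule described above (average ranks for ties, T = min(T+, T−)) and replays the Table 3 data along the x-axis; the function name and the use of scipy.stats.rankdata are our choices.

```python
import numpy as np
from scipy import stats

def same_object(observed, estimated, T0):
    """Wilcoxon signed-rank pair matching sketch: returns True when T > T0,
    i.e. there is no evidence that the two position samples differ in location."""
    d = np.asarray(observed, float) - np.asarray(estimated, float)
    d = d[d != 0]                              # zero differences carry no rank
    ranks = stats.rankdata(np.abs(d))          # tied values receive the average rank
    T_plus, T_minus = ranks[d > 0].sum(), ranks[d < 0].sum()
    return min(T_plus, T_minus) > T0

# Table 3 example (x positions of objects A and B over six frames, alpha = 0.1 -> T0 = 2):
A = [135, 102, 108, 141, 131, 144]
B = [129, 120, 112, 152, 135, 163]
print(same_object(A, B, T0=2))                 # True: T = 3 > 2, so A and B are paired
```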

4. Camera placement

As discussed before, a complete camera handoff process, including the three aforementioned components, needs a certain amount of time to be executed successfully, especially the time needed to execute consistent labeling. This confirms the need for sufficient overlapped FOVs. Most existing camera placement approaches, such as the art gallery problem [21] and the works of Erdem and Sclaroff [22] and Mittal and Davis [23], do not provide a solution to optimally determine the size of overlapped FOVs for carrying out consistent labeling successfully.

Assume that a polygonal floor plan is represented as an occupancy grid. Let A1 represent the grid coverage, with a1,ij = 1 if Sij > SF and a1,ij = 0 otherwise. Two additional matrices, A2 and A3, are constructed. The matrix A2 has a2,ij = 1 if SF < Sij ≤ ST and a2,ij = 0 otherwise, where SF and ST denote the failure and triggering thresholds. The following relation holds: ST = SF + l·Vm·TH, where Vm represents the average moving speed of the object of interest, TH denotes the average duration of a successful camera handoff, and l is a conversion scalar. SF is the failure threshold, which can simply be interpreted as marking invisible areas. In doing so, the time margin necessary for executing camera handoff is converted into the thresholds that trigger camera handoff. The matrix A3 has a3,ij = 1 if Sij ≥ ST and a3,ij = 0 otherwise. Matrices A2 and A3 represent the handoff safety margin and the visible area, respectively. The solution vector x specifies a set of chosen camera configurations, with the corresponding element xj = 1 if the configuration is chosen and xj = 0 otherwise. Let a1,i, a2,i, and a3,i denote the ith rows of the coefficient matrices A1, A2, and A3. The objective function is formulated as:

$$c_i = w_1\,(\mathbf{a}_{1,i}\mathbf{x} > 0) + w_2\,(\mathbf{a}_{2,i}\mathbf{x} = 2) - w_3\,(\mathbf{a}_{3,i}\mathbf{x} > 1), \qquad (12)$$

where w1, w2, and w3 are predefined importance weights. The operation (a1,i x > 0) denotes the indicator

$$(\mathbf{a}_{1,i}\mathbf{x} > 0) = \begin{cases} 1 & \mathbf{a}_{1,i}\mathbf{x} > 0, \\ 0 & \text{otherwise.} \end{cases}$$

The first term in the objective function considers coverage, the second term produces sufficient overlapped handoff safety margins, and the third term penalizes excessive overlapped visible areas.

Let the cost associated with the jth camera configuration be ωj. Given the maximum cost Cmax, the Max-Coverage problem can be described by:

$$\max \sum_i c_i, \quad \text{subject to} \quad \sum_j \omega_j x_j \le C_{\max}. \qquad (13)$$

Given a specified coverage vector bC,o or a minimum overall coverage Cmin, the Min-Cost problem can be modeled as:

$$\min \sum_j \omega_j x_j \ \text{ and then } \ \max \sum_i c_i, \quad \text{subject to} \quad \mathbf{A}_1\mathbf{x} \ge \mathbf{b}_{C,o} \ \text{ or } \ \sum_i b_i \ge C_{\min}. \qquad (14)$$
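A minimal sketch of evaluating the objective in Eq. (12) for a candidate solution vector x is given below (names ours); the search over x under the cost or coverage constraints of Eqs. (13) and (14), e.g. by binary integer programming or a greedy heuristic, is not shown.

```python
import numpy as np

def placement_objective(A1, A2, A3, x, w1=1.0, w2=2.0, w3=5.0):
    """Eq. (12) summed over all grid points. A1, A2, A3: 0/1 matrices of shape
    (grid points, camera configurations) for coverage, handoff safety margin,
    and visible area; x: 0/1 vector of chosen configurations."""
    a1x, a2x, a3x = A1 @ x, A2 @ x, A3 @ x
    c = (w1 * (a1x > 0)        # covered by at least one camera
         + w2 * (a2x == 2)     # exactly two cameras overlap in the safety margin
         - w3 * (a3x > 1))     # penalize excessive overlap of visible areas
    return float(c.sum())
```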

To validate our objective function, we consider the positioning of two cameras as an example. Since the resolution and distance components, Sr,ij and Sd,ij, for omnidirectional cameras are radially symmetric, it is sufficient to study the variations along the radial direction. Fig. 8a shows the FOVs of two omnidirectional cameras placed a distance ΔR apart. We want to examine the behavior of our objective function F = Σi ci with varying ΔR. The contours defined by Sij = SF and Sij = ST are concentric circles with radii RF and RT, respectively. Fig. 8e depicts the values of the objective function as a function of ΔR. Different choices of w1 are used to illustrate their influence on the optimal camera position. The optimal camera position is achieved with 2RT ≤ ΔR* ≤ RF + RT; the actual position depends on the w1 used. As in the case of perspective cameras, a smaller w1 results in a camera placement with a smaller ΔR*. The exact expressions and derivatives of the objective function for omnidirectional cameras are given in Eqs. (15) and (16), respectively. The derivation of the following equations involves the computation of the overlapped areas of two adjacent discs with radii RF and RT. To facilitate the understanding of Eq. (15), the computations of the overlapped areas are illustrated in Fig. 8b–d.

$$F_1 = w_1 \pi R_F^2 + (w_2 - w_1)\!\left[ R_F^2 \cos^{-1}\!\left(\frac{\Delta R}{2R_F}\right) - \frac{\Delta R}{2}\sqrt{R_F^2 - \left(\frac{\Delta R}{2}\right)^{2}} \right], \quad R_F + R_T \le \Delta R \le 2R_F,$$

$$F_2 = F_1 - w_2\!\left[ R_T^2 \cos^{-1}\!\left(\frac{\Delta R^2 - R_T^2 + R_F^2}{2\,\Delta R\, R_T}\right) + R_F^2 \cos^{-1}\!\left(\frac{\Delta R^2 - R_F^2 + R_T^2}{2\,\Delta R\, R_F}\right) - \frac{1}{2}\sqrt{(-\Delta R + R_F + R_T)(\Delta R + R_F - R_T)(\Delta R - R_F + R_T)(\Delta R + R_F + R_T)} \right], \quad 2R_T \le \Delta R < R_F + R_T,$$

$$F_3 = F_2 + (w_2 - w_3)\!\left[ R_T^2 \cos^{-1}\!\left(\frac{\Delta R}{2R_T}\right) - \frac{\Delta R}{2}\sqrt{R_T^2 - \left(\frac{\Delta R}{2}\right)^{2}} \right], \quad R_T \le \Delta R < 2R_T. \qquad (15)$$

$$F_1' = -(w_2 - w_1)\sqrt{R_F^2 - \left(\frac{\Delta R}{2}\right)^{2}}, \quad R_F + R_T \le \Delta R \le 2R_F,$$

$$F_2' = F_1' + w_2\,\frac{\sqrt{-\Delta R^4 - R_F^4 - R_T^4 + 2\Delta R^2 R_F^2 + 2\Delta R^2 R_T^2 + 2R_F^2 R_T^2}}{\Delta R}, \quad 2R_T \le \Delta R < R_F + R_T,$$

$$F_3' = F_2' - (w_2 - w_3)\sqrt{R_T^2 - \left(\frac{\Delta R}{2}\right)^{2}}, \quad R_T \le \Delta R < 2R_T. \qquad (16)$$

Fig. 8. (a) Illustration of the FOVs in the ground plane (Z = 0) of two omnidirectional cameras. The position of camera 1 is fixed while camera 2 is free to translate. (b–d) The computation of the overlapped areas. (e) The objective function for omnidirectional cameras with varying ΔR and different choices of w1, the weight assigned to the coverage term in Eq. (12); w2 = 2, w3 = 5, RF = 1, RT = 0.5.
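Rather than evaluating the closed forms above, the behavior shown in Fig. 8e can also be checked numerically by sampling the ground plane on a grid and applying the definitions of the coverage, safety-margin, and visible regions directly (concentric circles of radii RF and RT around each camera). The sketch below does this for two cameras a distance ΔR apart; it is an approximation of the analytic objective, and all names are ours.

```python
import numpy as np

def objective_two_cameras(dR, RF=1.0, RT=0.5, w1=1.0, w2=2.0, w3=5.0, n=400):
    """Grid-sampled approximation of the two-camera objective (camera 1 at the
    origin, camera 2 at (dR, 0)); sweeping dR reproduces the trend in Fig. 8e."""
    xs = np.linspace(-RF - dR, RF + dR, n)
    X, Y = np.meshgrid(xs, xs)
    d1, d2 = np.hypot(X, Y), np.hypot(X - dR, Y)
    a1 = (d1 <= RF).astype(int) + (d2 <= RF).astype(int)                  # S > SF (covered)
    a2 = ((d1 > RT) & (d1 <= RF)).astype(int) + ((d2 > RT) & (d2 <= RF)).astype(int)
    a3 = (d1 <= RT).astype(int) + (d2 <= RT).astype(int)                  # S >= ST (visible)
    c = w1 * (a1 > 0) + w2 * (a2 == 2) - w3 * (a3 > 1)
    return float(c.sum()) * (xs[1] - xs[0]) ** 2   # approximate integral of c_i
```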

5. Experimental results

In this section, we first verify the effectiveness of our proposed camera handoff algorithm. For consistent labeling, our spatial mapping method is compared with the geometry-based and homography-based approaches in the spatial mapping stage, while our Wilcoxon Signed-Rank Test method is compared with the least square error approach in the pair matching stage. In the meantime, we study the individual and combined effects of the two components, Sr,ij and Sd,ij, defined in the observation measure using a real-time video sequence. To show the effectiveness of our proposed camera placement method, the algorithm presented by Erdem and Sclaroff [22] is implemented and used as a comparison reference. The performance of the two algorithms is compared in terms of coverage and handoff success rate:

$$\text{Handoff success rate} = \frac{\text{Number of successfully carried out handoff requests}}{\text{Number of handoff requests}}. \qquad (17)$$

5.1. Experiments on observation measure

In the following experiments, we study the behavior of the newly defined observation measure, as shown in Figs. 9 and 10. In Fig. 9, we illustrate the resolution Sr,ij and the distance to the boundary of the camera's FOV Sd,ij using a video sequence. As expected, Sr,ij increases as the target moves toward the omnidirectional camera along the optical axis, and Sd,ij increases as the target moves toward the image center. In order to provide a clearer view of Sij in a generic case, Fig. 10 shows the plot of the corresponding Sij values. In Fig. 10, an omnidirectional camera is placed at Tc = [0 0 3 m]^T. The image size is 640 × 640. Points are uniformly sampled in the ground plane (Z = 0) with X: −6 m to 6 m and Y: −6 m to 6 m. The normalization coefficient for the resolution component is given by α = 6/640 = 9.4 × 10⁻³. Other parameters used are: Δ = 1, wr = 0.25, and wd = 0.75. The best observation area is in the vicinity of [0, 0, 0]. As the object moves away from this area, the Sij value decreases. The proposed observation measure gives a quantified measure of the tracking or observation suitability, which also agrees with our intuition and visual inspection.

5.2. Experiments on consistent labeling

To show how the homography functions defined in Eq. (6) are derived, we consider an indoor real-time surveillance environment with dimensions of 30 m × 15 m × 3 m, in which two omnidirectional cameras (IQeye3) are placed 3 m apart. Background differencing and radial profile analysis [24] are used for target detection and tracking. Omnidirectional images, with a resolution of 320 × 320, are obtained via an intranet connection at four frames per second. During the spatial mapping phase, a single object moves around randomly in the environment while its relative coordinates are simultaneously collected by the two IQeye3 cameras to derive the homography functions defined in Eq. (6). From our experimental results, the homography functions are given in Eq. (18):

$$x_2=-755.23+2.88x_1+0.03(x_1-225.7)^2+0.02(y_1-174.42)^2$$
and
$$y_2=-281.1+0.77x_1+1.2y_1+0.03(y_1-174.42)^2+0.02(y_1-174.42)(x_1-225.7). \tag{18}$$
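As an illustration of how such homography functions could be derived from simultaneously collected coordinates, the sketch below fits a second-order polynomial mapping by ordinary least squares. The basis terms mirror the form of Eq. (18), but the variable names and the synthetic correspondences are hypothetical, and the authors' actual regression and model-selection procedure may differ.

```python
import numpy as np

def fit_polynomial_mapping(p1, p2):
    """Least-squares fit of a second-order polynomial mapping (x1, y1) -> (x2, y2).

    p1, p2: (N, 2) arrays of simultaneously observed image coordinates of the
    same object in camera 1 and camera 2. The basis below mirrors the kinds of
    terms appearing in Eq. (18), up to a reparameterization of the centering."""
    x1, y1 = p1[:, 0], p1[:, 1]
    # Design matrix: constant, linear, and second-order terms (including the cross term).
    A = np.column_stack([np.ones_like(x1), x1, y1, x1**2, y1**2, x1 * y1])
    coeff_x, *_ = np.linalg.lstsq(A, p2[:, 0], rcond=None)
    coeff_y, *_ = np.linalg.lstsq(A, p2[:, 1], rcond=None)
    return coeff_x, coeff_y

def apply_mapping(coeff, x1, y1):
    basis = np.array([1.0, x1, y1, x1**2, y1**2, x1 * y1])
    return float(basis @ coeff)

# Example with synthetic correspondences (replace with tracked coordinates):
rng = np.random.default_rng(0)
p1 = rng.uniform(0, 320, size=(200, 2))
true = np.column_stack([0.9 * p1[:, 0] + 0.001 * p1[:, 1]**2,
                        1.1 * p1[:, 1] + 0.002 * p1[:, 0] * p1[:, 1]])
p2 = true + rng.normal(scale=1.0, size=true.shape)
cx, cy = fit_polynomial_mapping(p1, p2)
print(apply_mapping(cx, 160.0, 160.0), apply_mapping(cy, 160.0, 160.0))
```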

The polynomial model used in this paper is intended to estimate model parameters in situations where the error terms, eR and eC, are normally distributed and the variance of the error terms does not depend on the value of any independent variable.


Fig. 8. (a) Illustration of the FOVs in the ground plane (Z = 0) of two omnidirectional cameras. The position of camera 1 is fixed while camera 2 is free to translate. (b–d) The computation of the overlapped areas. (e) The objective function for omnidirectional cameras with varying ΔR and different choices of w1, the weight assigned to the coverage term in Eq. (12); w2 = 2, w3 = 5, RF = 1, RT = 0.5.


Generally, assessments of the validity of our polynomial model assumptions are based on analyses of residuals, the differences between the observed and predicted values of the response variable. Data points with unusually large residuals may be outliers, indicating that something went wrong when the model was derived; in other words, the error terms eR and eC are not normally distributed, or the variance of the error terms depends on the value of some independent variable. This can be caused by the nature of the collected data. The root mean squared error (RMSE) between the observed and predicted response variables along the x-axis is between 10 and 17 pixels in our data set. It is calculated as $\sqrt{\tfrac{1}{N}\sum_{i=1}^{N}(\hat{x}_{2i}-x_{2i})^2}$, where $\hat{x}_{2i}$ and $x_{2i}$ represent the ith predicted and observed response variables, corresponding to a maximal relative error of 5.3% (17/320) when normalized with respect to the image width. Similarly, the RMSE between the observed and predicted response variables along the y-axis is between 8 and 14 pixels, corresponding to a maximal relative error of 5.7% when normalized with respect to the image height. No unusually large residuals appear according to the criteria of Wackerly et al. [25], which validates the accuracy of the selected model.
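The residual analysis described above can be sketched as follows; the outlier rule (flagging residuals more than three standard deviations from the mean) is a simple stand-in for the criteria of Wackerly et al. [25], and the helper names are hypothetical.

```python
import numpy as np

def residual_check(predicted, observed, k=3.0):
    """Residuals, RMSE, and a simple flag for unusually large residuals
    (here: more than k standard deviations from the mean residual)."""
    residuals = np.asarray(observed) - np.asarray(predicted)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    z = (residuals - residuals.mean()) / (residuals.std() + 1e-12)
    outliers = np.flatnonzero(np.abs(z) > k)
    return residuals, rmse, outliers

# e.g., along the x-axis of camera 2, with predictions from the fitted mapping
# (x2_predicted and x2_observed are hypothetical arrays of pixel coordinates):
# residuals, rmse, outliers = residual_check(x2_predicted, x2_observed)
# relative_error = rmse / 320.0   # normalized by the 320-pixel image width
```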

To further understand, in terms of statistical comparisons, how the Wilcoxon Signed-Rank Test improves the accuracy of pair matching as compared with the least square error approach in conjunction with different calibration methods, we conduct experiments in an indoor environment with dimensions of 50 m × 35 m × 3 m, where 10 people walk in both random and regular ways under five omnidirectional cameras (three and two joint views) at a speed of less than 4 km/h.


Fig. 9. Illustration of the resolution and the distance to the boundary of the camera's FOV. (a) Sr,ij = 0.35 and Sd,ij = 0.36, (b) Sr,ij = 0.54 and Sd,ij = 0.59, (c) Sr,ij = 0.74 and Sd,ij = 0.79, (d) Sr,ij = 0.89 and Sd,ij = 0.88, (e) Sr,ij = 0.59 and Sd,ij = 0.6, and (f) Sr,ij = 0.18 and Sd,ij = 0.15.

Fig. 10. Graphical illustration of the observation measure Sij and the handoff safety margin for the omnidirectional camera (ground-plane axes X (m) and Y (m); annotations mark the visible area, the handoff safety margin, and the thresholds ST and SF).

Table 4
Specification of tested methods for the study of the performance of various consistent labeling methods.

Tested methods                       A   B   C   D   E
Spatial mapping
  Our method                         x           x
  Geometry-based approach                x   x
  Calderara's method [14]                            x
Pair matching
  Wilcoxon Signed-Rank Test          x   x
  The least square error                     x   x   x


Background differencing and radial profile analysis [24] are used for target detection and tracking. Omnidirectional images, with a resolution of 320 × 320, are obtained via an intranet connection at four frames per second. The configuration between any two omnidirectional cameras is identical to that of the abovementioned experiment, so the homography functions listed in Eq. (18) are applied here as well. The performance of the proposed consistent labeling algorithm is compared with two reference algorithms: the geometry-based method and Calderara's homography-based method [14]. For the geometry-based method, Zhang's calibration algorithm [26] is implemented to recover the 3D information of tracked objects by learning each camera's intrinsic and extrinsic parameters and distortion model from a total of eight images (resolution: 800 × 800 pixels) of a planar checkerboard with 49 control points per image. Table 4 specifies the methods used in the calibration and pair matching stages for each tested method. For example, method C employs the geometry-based approach and the least square error approach for calibration and pair matching, respectively.

Fig. 11 shows the success rate of consistent labeling with respect to the number of frames used for pair matching. In general, the success rate of consistent labeling increases with the number of frames used for pair matching for all tested methods. Method C has the highest success rate of consistent labeling (61%) when only one frame is used, because it can recover the 3D position of the tracked object with better accuracy. Our method (method A) achieves up to a 90% success rate of consistent labeling when at least 10 frames are used and yields a performance similar to that of method B. Since Calderara's method uses the least square error method to match pairs and is inefficient in finding pixel-to-pixel correspondences between two omnidirectional camera images, our method (method A) outperforms it when more than three frames are used to match pairs. Moreover, the Wilcoxon Signed-Rank Test used in method A improves the success rate of consistent labeling over method D, where the least square error approach is used, which validates the effectiveness of the Wilcoxon Signed-Rank Test. However, when only one frame/trial datum is collected, the Wilcoxon Signed-Rank Test cannot rank the absolute differences, so its success rate of consistent labeling is approximately zero; the more trial data are collected, the higher the success rate of consistent labeling becomes. In conclusion, even though the geometry-based approach (method C) can reach an 80% success rate of consistent labeling with six frames, which is only four frames fewer than what our proposed consistent labeling method (method A) needs, it requires an expensive procedure to calibrate each camera, which is almost impractical in real-time surveillance.


1 For interpretation of color in Figs. 13–15, the reader is referred to the web version of this article.

Fig. 11. Performance of various consistent labeling methods based on the handoff success rate (%) versus the number of frames used for pair matching: our consistent labeling method (method A), the geometry-based approach with the Wilcoxon Signed-Rank Test (method B), the geometry-based approach with the least square error (method C), our calibration phase with the least square error (method D), and Calderara's method (method E).


The performance of our proposed consistent labeling method relies on the Wilcoxon Signed-Rank Test to compensate for the imperfection of the homography functions derived by multiple regression models. This method is not only computationally efficient (O(n), where n is the number of observations/frames) [43], but also robust to varying proximity and trajectory difference between objects. In essence, the common parametric alternative to this nonparametric method is the t test, which is the statistical test of choice if the data are normally distributed. The t test, however, has been found to be unsatisfactory when the distributions of the variables being considered have heavy tails [44]. Blair and Higgins [44] used Monte Carlo methods to assess the relative power of the t test and the Wilcoxon Signed-Rank Test under 10 different distributional shapes. They concluded that (1) when the data are not normally distributed, the Wilcoxon Signed-Rank Test is more powerful than the t test, and (2) the magnitude of the Wilcoxon test's power often increases with sample size. The Wilcoxon Signed-Rank Test is therefore more suitable for this application, because Helbing [45] has shown that the distribution of pedestrian trajectories is unlikely to be normal.

Nevertheless, the Wilcoxon Signed-Rank Test is less robust to clustered data where pairs might be dependent [46,47]. This dependence translates to the proximity and trajectory difference between objects in our experiments. One solution [44,48,49] to improve the robustness of the Wilcoxon Signed-Rank Test is to increase its sample size (i.e., the number of frames in our experiments), which agrees with our experiments. However, no theoretical derivations are available in the literature to determine the appropriate sample size; instead, it is usually obtained empirically. In general, the larger the proximity and trajectory difference between objects, the smaller the number of frames needed. Fig. 12 illustrates two examples with different levels of proximity and trajectory difference between objects that the Wilcoxon Signed-Rank Test can and cannot tolerate. In Fig. 12a, even though two objects are close to each other and their trajectories are identical, the Wilcoxon Signed-Rank Test is still capable of distinguishing the two objects across cameras. However, in Fig. 12b, when the two objects are within 30 cm of each other in ground space, causing partial occlusions, and their trajectories are identical, the Wilcoxon Signed-Rank Test is incapable of distinguishing them before they fall out of the FOV of the observing camera, since the overlapped FOVs are not sufficiently large. This is caused by a tie in T, the statistic used to test whether or not the two populations are identical [25]. Consistent labeling may fail when the tie in T is not resolved before the objects fall out of the FOV of the observing camera. However, the Wilcoxon Signed-Rank Test remains robust to the case where the objects' paths intersect, as long as the paths share only a few overlapping points and the rest of their trajectories do not reduce to the situation above, where the two objects stay within 30 cm of each other with identical trajectories and partial occlusions. The main challenge in that situation is how to differentiate the paths after they merge; the Wilcoxon Signed-Rank Test fails in such scenarios, and a compensation method will be investigated in future work.
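As an illustrative sketch of the pair matching stage (not the authors' implementation), the following fragment applies scipy.stats.wilcoxon to the per-frame coordinate differences between a camera-1 trajectory already mapped into camera 2 and each candidate trajectory, and keeps the candidate most consistent with a zero median difference. The decision rule, the use of the minimum p-value over the two coordinates, and the synthetic data are assumptions.

```python
import numpy as np
from scipy.stats import wilcoxon

def match_pairs(mapped_traj, candidates):
    """Illustrative pair-matching rule: for each candidate trajectory in camera 2,
    test the per-frame coordinate differences against the mapped camera-1
    trajectory and keep the candidate most consistent with a zero median
    difference (largest p-value)."""
    scores = []
    for cand in candidates:
        dx = mapped_traj[:, 0] - cand[:, 0]
        dy = mapped_traj[:, 1] - cand[:, 1]
        # A systematic offset in either coordinate argues against the match.
        p = min(wilcoxon(dx).pvalue, wilcoxon(dy).pvalue)
        scores.append(p)
    return int(np.argmax(scores)), scores

# Synthetic example: candidate 0 is the true match (zero-mean noise only),
# candidate 1 is a nearby object walking a parallel path (constant offset).
rng = np.random.default_rng(1)
traj = np.cumsum(rng.normal(size=(20, 2)), axis=0) + 160.0
candidates = [traj + rng.normal(scale=1.0, size=traj.shape),
              traj + np.array([8.0, 8.0]) + rng.normal(scale=1.0, size=traj.shape)]
best, pvals = match_pairs(traj, candidates)
print(best, pvals)
```

Increasing the number of frames per trajectory increases the power of the test, which mirrors the behavior observed in Fig. 11.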

5.3. Experiments on camera handoff

To clearly illustrate how a camera handoff is triggered, three cases are examined: two omnidirectional cameras are used in the first two cases and three omnidirectional cameras in the third. In case 1, two objects walk in opposite directions. In case 2, four objects walk in the same direction. In case 3, two objects walk in the same direction within three joint omnidirectional views. The threshold ST is 0.3 to comply with the time needed for executing camera handoff (10 frames) and the maximal moving speed of the objects (0.6 m/s). Fig. 13a and b shows the sampled frames for the first two cases, respectively. In both cases, solid green,1 blue, yellow, and purple circles/rectangles represent objects 1, 2, 3, and 4, respectively. Solid green, blue, yellow, or purple circles/rectangles with red circles or rectangles outside indicate that the object is under camera handoff.


Fig. 12. Performance illustration of the Wilcoxon Signed-Rank Test. (a) Two objects are close to each other and their trajectories are identical; (b) two objects are within 30 cm of each other in ground space, causing partial occlusion, and their trajectories are identical.


Fig. 14 shows the sampled frames for the third case. In this case, solid green and blue circles/rectangles represent objects 1 and 2, and solid green and blue circles/rectangles with red circles or rectangles outside indicate that the object is under camera handoff.

In Fig. 13a, since both objects are close to the EFOV of camera 1 (S11 = 0.2 and S21 = 0.2) in frame f0, they are under the camera handoff process, and camera 2 is only capable of tracking object 1. Since object 2 is not seen by camera 2, it is tracked by camera 1 until it becomes untraceable. From frame f0+10 to f0+20, object 2 is no longer under the camera handoff process, while object 1 (S12 = 0.22) is, since it is close to the EFOV of camera 2. In frame f0+30, object 1 is handed over to camera 1 and object 2 (S21 = 0.29) is under the camera handoff process. In frame f0+45, objects 1 and 2 are tracked by cameras 1 and 2, respectively. In Fig. 13b, four objects are tracked by camera 1 from frame f0+100 to f0+130. In frame f0+145, because the four objects are close to the EFOV of camera 1 and their resolutions are deteriorating (S11 = 0.27, S21 = 0.2, S31 = 0.2, and S41 = 0.6), they are under the handoff process. In frame f0+155, the handoff process is carried out successfully for each object and the four objects are tracked by camera 2.

In Fig. 14, both objects are tracked by camera 1 from frame f0 to frame f0+30 while moving toward the EFOV of camera 1. In frame f0+30, even though objects 1 and 2 are visible to cameras 2 and 3, they are not yet tracked by cameras 2 and 3, respectively, because camera 1 is still able to track both objects in frame f0+30 (S11 = 0.38 and S21 = 0.32), so camera handoff requests are not triggered. In frame f0+40, since they are close to the EFOV of camera 1 (S11 = 0.19 and S21 = 0.21), they are under the camera handoff process and both cameras 2 and 3 are capable of tracking both objects. According to the criterion given in Section 3, the camera with the highest observation measure, $j^* = \arg\max_{j'} S_{ij'}$, is selected from the pool of candidate cameras j' as the most appropriate camera to take over the ith object. Thus, object 1 is assigned to camera 2, because its observation measure is highest in camera 2 (S12 = 0.68 and S13 = 0.28), and object 2 is assigned to camera 3, because its observation measure is highest in camera 3 (S22 = 0.29 and S23 = 0.8). In conclusion, our proposed consistent labeling algorithm can perform as accurately as the geometry-based approach without tedious calibration processes and outperforms Calderara's homography-based approach [14]. In the meantime, our observation measure quantitatively formulates the effectiveness of a camera in observing the tracked object, so that camera handoff can smoothly transfer objects for automated and persistent object tracking. In addition, our system design follows a distributed approach, where cameras only exchange information with adjacent cameras. Usually, one camera communicates with two to four other cameras, which is optimized by camera placement. As the scale of the camera network increases, the whole network can always be divided into several subnets, where each camera communicates with a limited number of adjacent cameras. Therefore, due to the distributed nature of our system, computations are carried out in each subnetwork independently; in this sense, our system readily adopts parallel computations.
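The trigger-and-selection logic described in this section can be summarized by the following minimal sketch, using the threshold ST = 0.3 and the frame f0+40 scores of Fig. 14; the function and variable names are hypothetical.

```python
S_T = 0.3   # trigger threshold used in these experiments

def handoff_decision(scores, current_cam):
    """scores: dict camera_id -> observation measure S_ij for object i.
    Returns (needs_handoff, next_cam), illustrating the trigger rule and the
    selection j* = argmax_{j'} S_{ij'}."""
    if scores.get(current_cam, 0.0) >= S_T:
        return False, current_cam                       # current camera still adequate
    candidates = {j: s for j, s in scores.items() if j != current_cam and s > 0.0}
    if not candidates:
        return True, current_cam                        # keep tracking until untraceable
    return True, max(candidates, key=candidates.get)    # best-observing candidate takes over

# Frame f0+40 of Fig. 14 for object 1: S11 = 0.19, S12 = 0.68, S13 = 0.28.
print(handoff_decision({1: 0.19, 2: 0.68, 3: 0.28}, current_cam=1))   # (True, 2)
```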

5.4. Experiments on camera placement

In this section, we conduct camera placement experiments on two indoor floor plans. Our proposed camera placement method is compared with the reference algorithm proposed by Erdem and Sclaroff [22]. The floor plans under test are shown in Fig. 15a and b. The floor plan in Fig. 15a represents two types of indoor areas encountered in practical surveillance: space with obstacles (region A, illustrated in yellow) and open space where pedestrians can move freely (region B, illustrated in green). Region B is deliberately included since it imposes more challenges on camera placement when the handoff success rate is considered.

Fig. 13. Illustration of the camera handoff procedure in a real-time system for two cases. Solid green, blue, yellow, and purple circles/rectangles represent tracked objects 1, 2, 3, and 4, respectively. Solid green, blue, yellow, or purple circles/rectangles with red circles or rectangles outside indicate that the object is under camera handoff. (a) Case 1: two objects walking in opposite directions; (b) case 2: four objects walking in the same direction.

Fig. 15b illustrates an environment with a predefined path along which workers proceed in a predefined sequence.

To obtain a statistically valid estimate of the handoff success rate, simulations are carried out to enable a large number of tests under various conditions. The pedestrian behavior simulator of Antonini et al. [29] is implemented so that the simulations closely resemble experiments in real environments and, in turn, yield an accurate estimate of the handoff success rate. In our experiments, pedestrian arrivals follow a Poisson distribution with an average arrival rate of 0.05 person/second. The average moving speed is 0.5 m/s, obtained by dividing the length of the target's trajectory in the ground plane by the time duration from the moment the target first appears in the environment until the moment it disappears. During its stay in the environment, the target may walk and stop for other activities, so the average moving speed is lower than the average walking speed of a pedestrian. Different behaviors are purposefully included in the experiments to test the performance of our algorithm under various conditions beyond the standard test where pedestrians walk at a normal speed. It is also worth pointing out that the moving speed used in our experiments is not necessarily the same as the speed used in the definition of the trigger threshold: in our implementation, a speed of 2 m/s is used in the trigger threshold ST, which guarantees successful handoff between adjacent cameras when pedestrians walk at a normal speed.

Fig. 14. Illustration of the camera handoff procedure in a real-time system with two objects walking in the same direction within three joint omnidirectional views. Solid green and blue circles/rectangles represent tracked objects 1 and 2, respectively. Solid green and blue circles/rectangles with red circles or rectangles outside indicate that the object is under camera handoff.

Three hundred pedestrian traces are randomly generated for our simulation; several randomly generated points of interest form each pedestrian trace. Fig. 15a depicts some randomly generated pedestrian traces.
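A minimal sketch of the arrival and trace-generation skeleton used in the simulation is shown below (Poisson arrivals at 0.05 person/s, an average speed of 0.5 m/s, and randomly generated points of interest). The floor-plan extent and waypoint count are assumptions, and the sketch does not reproduce the full pedestrian behavior model of Antonini et al. [29].

```python
import numpy as np

rng = np.random.default_rng(0)
ARRIVAL_RATE = 0.05      # persons per second (Poisson arrivals)
AVG_SPEED = 0.5          # metres per second
FLOOR = (20.0, 15.0)     # assumed floor-plan extent in metres (illustrative)

def generate_traces(n_traces=300, n_points=5):
    """Generate arrival times and piecewise-linear traces through random points
    of interest; only the arrival statistics and trace skeleton are modeled."""
    arrivals = np.cumsum(rng.exponential(1.0 / ARRIVAL_RATE, size=n_traces))
    traces = []
    for t0 in arrivals:
        waypoints = rng.uniform([0.0, 0.0], FLOOR, size=(n_points, 2))
        seg_len = np.linalg.norm(np.diff(waypoints, axis=0), axis=1)
        duration = seg_len.sum() / AVG_SPEED        # time spent walking the trace
        traces.append({"t_start": t0, "waypoints": waypoints, "duration": duration})
    return traces

traces = generate_traces()
print(len(traces), round(traces[0]["duration"], 1))
```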

Figs. 16 and 17 show the optimal camera arrangements for the two indoor floor plans shown in Fig. 15a and b, respectively. In Fig. 16, at the cost of a slight decrease in coverage, the handoff success rate increases significantly from 52.8% to 79.0%. A similar result holds in Fig. 17: the handoff success rate increases from 50% to 92.6% at the cost of a slight decrease in coverage, from 92.1% to 81.5%. These experiments validate the importance of reserving sufficiently overlapped FOVs for improving the overall performance of the automated surveillance system in terms of handoff success rate. Our proposed camera placement method exhibits a significant increase in the camera handoff success rate at the cost of slightly decreased coverage, as compared to Erdem and Sclaroff's method, which does not consider the necessary overlapped FOVs.

Fig. 18 illustrates the effect of the case shown in Fig. 17 on a real-time system. ΔRa and ΔRb are 10 m and 7 m, respectively. In this experiment, the threshold ST is 0.3 to comply with the time needed for executing camera handoff (10 frames) and the maximal moving speed of the objects (0.6 m/s). In Fig. 18a, the positions of the two omnidirectional cameras are determined by Erdem and Sclaroff's method. The object is tracked by camera 1 from frame f0+10 to f0+20. In frame f0+30, since the object is close to the EFOV of camera 1 and its resolution is deteriorating (S11 = 0.24), it is under the handoff process. However, since the overlapped FOV is not large enough, camera 2 cannot track the handoff object with sufficient resolution even in frame f0+40. As a result, camera handoff fails and the track of the object is lost. Fig. 18b illustrates a similar scenario with a camera placement generated by our method. As expected, camera handoff is successfully carried out from f0+30 to f0+40 because the size of the overlapped FOV is optimized, and the object of interest is tracked continuously across the two cameras.
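To illustrate the tradeoff quantified above, the sketch below estimates, for a candidate placement, the fraction of sampled floor points covered by at least one camera (coverage) and by at least two cameras (the overlapped FOV available for handoff), assuming a circular ground-plane FOV for each omnidirectional camera. The layout, radius, and spacing values are hypothetical.

```python
import numpy as np

def coverage_stats(camera_xy, fov_radius, floor=(20.0, 15.0), step=0.1):
    """Fraction of sampled floor points covered by >=1 camera and by >=2 cameras,
    assuming a circular ground-plane FOV of radius fov_radius per camera."""
    xs = np.arange(0.0, floor[0], step)
    ys = np.arange(0.0, floor[1], step)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)
    cams = np.asarray(camera_xy)
    dists = np.linalg.norm(pts[:, None, :] - cams[None, :, :], axis=2)
    counts = (dists <= fov_radius).sum(axis=1)
    return (counts >= 1).mean(), (counts >= 2).mean()

# Hypothetical two-camera layout: a larger spacing raises coverage but shrinks
# the overlapped FOV needed to carry out handoff.
for spacing in (7.0, 10.0):
    cov, overlap = coverage_stats([(5.0, 7.5), (5.0 + spacing, 7.5)], fov_radius=6.0)
    print(spacing, round(cov, 3), round(overlap, 3))
```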

6. Conclusions

A complete camera handoff comprises three fundamental components: the time to trigger the handoff process, the execution of consistent labeling, and the selection of the next camera. Because omnidirectional cameras suffer from non-uniform resolution and severe distortion, existing homography-based and feature-based consistent labeling methods are not robust. Even though the accuracy of fully calibrated cameras is promising for handling this problem, fully calibrating every camera is impractical in a real-time setting where multiple omnidirectional cameras are used. Thus, we presented a novel consistent labeling algorithm, comprising a spatial mapping phase and a pair matching phase, to address the problem. In the spatial mapping phase, the homography functions across omnidirectional cameras are automatically recovered.


Fig. 15. Illustration of the two indoor floor plans (a) and (b) (room height 3 m; annotations mark entrances/exits, obstacles, operational tables, paths, trajectories, and points of interest).

Fig. 16. Optimal camera positioning for the first indoor floor plan using omnidirectional cameras: (a) Erdem and Sclaroff's method (coverage: 88.4%, handoff success rate: 52.8%) and (b) our method (coverage: 86.0%, handoff success rate: 79.0%).

Fig. 17. Optimal camera positioning for the second indoor floor plan using omnidirectional cameras: (a) Erdem and Sclaroff's method (coverage: 92.1%, handoff success rate: 50%) and (b) our method (coverage: 81.5%, handoff success rate: 92.6%). The camera separations ΔRa and ΔRb are marked.


In the pair matching phase, the Wilcoxon Signed-Rank Test is used to match objects for an improved success rate of consistent labeling. Experimental results verified that our proposed consistent labeling algorithm can perform as accurately as the geometry-based approach without tedious calibration processes and outperforms Calderara's homography-based approach.

Since most existing camera handoff algorithms do not clearly define the time to trigger handoff or the selection of the next camera, in this paper we designed an observation measure that quantitatively formulates the effectiveness of object tracking, so that camera handoff can be triggered in a timely manner and the next camera can be selected appropriately before the tracked object falls out of the field of view (FOV) of the currently observing camera.


Fig. 18. Illustration of the effect of two camera placement methods on consistent labeling in a real-time system: (a) Erdem and Sclaroff's method and (b) our method.


The effectiveness of our proposed complete camera handoff method was validated on a real-time multi-camera, multi-object tracking system, where camera handoffs were executed smoothly and successfully.

In addition, most existing camera placement algorithms do not consider a sufficient size of overlapped FOVs, which is necessary to secure the success of camera handoff. We proposed a camera placement approach that searches for the optimal tradeoff between overall coverage and the size of overlapped FOVs so as to maximize the performance of the automated surveillance system in terms of the continuity of object tracking. Experimental results exhibited a significant increase in the handoff success rate at the cost of slightly decreased coverage as compared to Erdem and Sclaroff's method.

Acknowledgment

This work was supported in part by the University Research Program in Robotics under Grant DOE-DE-FG52-2004NA25589.

References

[1] S. Morita, K. Yamazawa, N. Yokoya, Networked video surveillance using multiple omnidirectional cameras, in: IEEE International Symposium on Computational Intelligence in Robotics and Automation, Japan, July 2003.



[2] K. Iwata, Y. Satoh, I. Yoda, K. Sakaue, Hybrid camera surveillance system by using stereo omnidirectional system and robust human detection, in: IEEE Pacific-Rim Symposium on Image and Video Technology (PSIVT 2006), Taiwan, December 2006.

[3] T.E. Boult, X. Gao, R. Micheals, M. Eckmann, Omnidirectional visual surveillance, Image and Vision Computing 22 (2004) 515–534.

[4] T.E. Boult, R. Micheals, X. Gao, P. Lewis, C. Power, W. Yin, A. Erkan, Frame-rate omnidirectional surveillance and tracking of camouflaged and occluded targets, in: IEEE Workshop on Visual Surveillance, June 1999.

[5] T. Zhao, M. Aggarwal, R. Kumar, H. Sawhney, Real-time wide area multi-camera stereo tracking, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 05), USA, June 2005.

[6] S. Guler, J.M. Griffith, I.A. Pushee, Tracking and handoff between multiple perspective camera views, in: Proceedings of the 32nd IEEE Applied Imagery Pattern Recognition Workshop (AIPR 03), USA, October 2003.

[7] D.G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.

[8] M. Balcells, D. DeMenthon, D. Doermann, An appearance-based approach for consistent labeling of humans and objects in video, Pattern Analysis and Applications (2005) 373–385.

[9] H. Bay, T. Tuytelaars, L.V. Gool, SURF: speeded up robust features, in: 9th European Conference on Computer Vision, 2006.

[10] J. Black, T. Ellis, Multiple camera image tracking, in: Proceedings of the Performance Evaluation of Tracking and Surveillance Conference (PETS 2001), with CVPR 2001, December 2001.

[11] P. Kelly, A. Katkere, D. Kuramura, S. Moezzi, S. Chatterjee, R. Jain, An architecture for multiple perspective interactive video, in: Proceedings of ACM Multimedia 95, May 1995.

[12] S. Khan, M. Shah, Consistent labeling of tracked objects in multiple cameras with overlapping fields of view, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (10) (2003) 1355–1360.

[13] Y. Caspi, M. Irani, A step towards sequence-to-sequence alignment, in: IEEE Conference on Computer Vision and Pattern Recognition, June 2000.

[14] S. Calderara, A. Prati, R. Vezzani, R. Cucchiara, Consistent labeling for multi-camera object tracking, in: 13th International Conference on Image Analysis and Processing, September 2005.

[15] L. Lee, R. Romano, G. Stein, Monitoring activities from multiple video streams: establishing a common coordinate frame, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 758–767.

[16] J. Kang, I. Cohen, G. Medioni, Continuous tracking within and across camera streams, in: IEEE International Conference on Computer Vision and Pattern Recognition, June 2003.

[17] S. Chang, T.-H. Gong, Tracking multiple people with a multi-camera system, in: IEEE Workshop on Multi-Object Tracking, July 2001.

[18] T. Tuytelaars, L.V. Gool, Matching widely separated views based on affine invariant regions, International Journal of Computer Vision 59 (1) (2004) 61–85.

[19] O. Javed, Z. Rasheed, K. Shafique, M. Shah, Tracking across multiple cameras with disjoint views, in: IEEE International Conference on Computer Vision, October 2003.

[20] O. Javed, K. Shafique, M. Shah, Appearance modeling for tracking in multiple non-overlapping cameras, in: IEEE Conference on Computer Vision and Pattern Recognition, June 2005.

[21] J. O'Rourke, Art Gallery Theorems and Algorithms, Oxford University Press, New York, 1987.

[22] U.M. Erdem, S. Sclaroff, Automated camera layout to satisfy task-specific and floor plan-specific coverage requirements, Computer Vision and Image Understanding 103 (3) (2006) 156–169.

[23] A. Mittal, L.S. Davis, Visibility analysis and sensor planning in dynamic environments, in: European Conference on Computer Vision, May 2004.

[24] Y. Cui, S. Samarasekera, Q. Huang, M. Greiffenhagen, Indoor monitoring via the collaboration between a peripheral sensor and a foveal sensor, in: IEEE Workshop on Visual Surveillance, January 1998.

[25] D.D. Wackerly, W. Mendenhall III, R.L. Scheaffer, Mathematical Statistics with Applications, sixth ed., Duxbury Advanced Series, 2002.

[26] Z. Zhang, A flexible new technique for camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (11) (2000) 1330–1334.

[27] J. Kang, I. Cohen, G. Medioni, Persistent objects tracking across multiple non-overlapping cameras, in: IEEE Workshop on Motion and Video Computing, 2005.

[28] F.L. Lim, W. Leoputra, T. Tan, Non-overlapping distributed tracking system utilizing particle filter, The Journal of VLSI Signal Processing 49 (3) (2007) 343–362.

[29] G. Antonini, S. Venegas, M. Bierlaire, J. Thiran, Behavioral priors for detection and tracking of pedestrians in video sequences, International Journal of Computer Vision 69 (2) (2006) 159–180.

[30] H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control 19 (6) (1974) 716–723.

[31] Y. Yao, B. Abidi, M. Abidi, Fusion of omnidirectional and PTZ cameras for accurate cooperative tracking, in: IEEE International Conference on Advanced Video and Signal Based Surveillance, Sydney, Australia, November 2006.

[32] H. Bozdogan, Akaike's information criterion and recent developments in information complexity, Journal of Mathematical Psychology 44 (2000) 62–69.

[33] T. Gandhi, M.M. Trivedi, Person tracking and reidentification: introducing panoramic appearance map for feature representation, Machine Vision and Applications 18 (2007) 207–220.

[34] J. Kannala, S. Brandt, A generic camera calibration method for fish-eye lenses, in: International Conference on Pattern Recognition, Cambridge, UK, August 2004.

[35] D. Johnson, Applied Multivariate Methods for Data Analysis, Duxbury, 1998.

[36] V. Orekhov, B. Abidi, C. Broaddus, M. Abidi, Universal camera calibration with automatic distortion model selection, in: Proceedings of the IEEE International Conference on Image Processing (ICIP 2007), vol. VI, San Antonio, TX, August 2007.

[37] J. Rissanen, Modeling by shortest data description, Automatica 14 (1978) 465–471.

[38] G. Schwarz, Estimating the dimension of a model, Annals of Statistics 6 (1978) 461–464.

[39] H. Bozdogan, Model selection and Akaike's information criterion, Psychometrika 52 (3) (1987) 345–370.

[40] Q. Cai, J.K. Aggarwal, Automatic tracking of human motion in indoor scenes across multiple synchronized video streams, in: International Conference on Computer Vision, Bombay, January 1998.

[41] Q. Cai, J.K. Aggarwal, Tracking human motion in structured environments using a distributed-camera system, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (12) (1999) 1241–1247.

[42] Q. Zhou, J.K. Aggarwal, Object tracking in an outdoor environment using fusion of features and cameras, Image and Vision Computing 24 (2006) 1244–1255.

[43] M.A. Pett, Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions, SAGE, 1997.

[44] R.C. Blair, J.J. Higgins, Comparison of the power of the paired samples t test to that of Wilcoxon's signed-ranks test under various population shapes, Psychological Bulletin 97 (1) (1985) 119–128.

[45] D. Helbing, A mathematical model for the behavior of pedestrians, Behavioral Science 36 (1991) 298–310.

[46] S. Datta, G.A. Satten, A signed-rank test for clustered data, Biometrics 64 (2008) 501–507.

[47] B. Rosner, R.J. Glynn, M.-L.T. Lee, The Wilcoxon Signed Rank Test for paired comparisons of clustered data, Biometrics 62 (2006) 185–192.

[48] W.J. Conover, Practical Nonparametric Statistics, third ed., Wiley, 1999.

[49] N. Neumann, Some procedures for calculating the distributions of elementary non-parametric test statistics, Statistical Software Newsletter 14 (3) (1988) 120–126.

Chung-Hao Chen received his B.S. and M.S. in Computer Science and Information Engineering from Fu-Jen University, Taiwan, in 1997 and 2001, respectively, and his Ph.D. in Electrical Engineering from the University of Tennessee, Knoxville, in 2009. His research interests include object tracking, robotics, and image processing.

Yi Yao received her B.S. and M.S. in Electrical Engineering from Nanjing University of Aeronautics and Astronautics, China, in 1996 and 2000, respectively, and her Ph.D. degree in Electrical Engineering from the University of Tennessee, Knoxville, in 2008. Currently, she works at the Global Research Center of General Electric. Her research interests include object tracking, sensor planning, and multi-camera surveillance systems.

David Page received his B.S. and M.S. degrees in electrical engineering from Tennessee Technological University in 1993 and 1995, respectively. After graduation, he worked as a civilian research engineer with the Naval Surface Warfare Center in Dahlgren, Virginia. In 1997, he returned to academia and in 2003 completed his Ph.D. in electrical engineering at the University of Tennessee, Knoxville. From 2003 to 2008, he served as a research assistant professor in the Imaging, Robotics, and Intelligent Systems Laboratory at UT. Currently, he is a partner in Third Dimension Technologies LLC, a Knoxville-based startup. His research interests are in 3D scanning and modeling for computer vision applications, robotic vision systems, and 3D shape analysis for object description.



Besma Abidi received two M.S. degrees, in 1985 and 1986, in image processing and remote sensing, with honors, from the National Engineering School of Tunis. She received her Ph.D. from the University of Tennessee, Knoxville, in 1995. Currently, she is a Research Assistant Professor with the Department of Electrical and Computer Engineering at the University of Tennessee, Knoxville. Her general areas of research are sensor positioning and geometry, video tracking, sensor fusion, nano-vision, and biometrics. She is a senior member of IEEE and a member of SPIE, Tau Beta Pi, Eta Kappa Nu, Phi Kappa Phi, and The Order of the Engineer.


Andreas Koschan received his Diplom (M.S.) in Computer Science and his Dr.-Ing. (Ph.D.) in Computer Engineering from the Technical University Berlin, Germany, in 1985 and 1991, respectively. Currently he is a Research Associate Professor in the Department of Electrical and Computer Engineering at the University of Tennessee, Knoxville. His work has primarily focused on color image processing and 3D computer vision, including stereo vision and laser range finding techniques. He is a coauthor of two textbooks on 3D image processing and a member of IS&T and IEEE.

Mongi Abidi received his Principal Engineer diploma in Electrical Engineering from the National Engineering School of Tunis, Tunisia, in 1981, and his M.S. and Ph.D. degrees in Electrical Engineering from the University of Tennessee, Knoxville, in 1985 and 1987, respectively. Currently, he is Professor and Associate Department Head in the Department of Electrical and Computer Engineering, directing activities in the Imaging, Robotics, and Intelligent Systems Laboratory. He conducts research in the field of 3D imaging, specifically in the areas of scene building, scene description, and data visualization.
