Online Multi-Person Tracking Using Variance Magnitude of Image Colors and Solving Short Minimum Clique Problem

Pourya Jafarzadeh (1), Bijan Shoushatrain (2)

1- MSc Student, University of Isfahan, Iran, Ahwaz
2- Assistant Professor, University of Isfahan, Iran, Isfahan

Corresponding Author's E-mail: [email protected]
Abstract
Multi-object tracking (MOT) is an essential but challenging task in many computer vision applications. Numerous studies have addressed this topic, in which objects are first detected independently in each frame (object detection) and the detected objects are then linked together into trajectories (data association). In complex scenes, MOT remains a difficult task due to many problems, including long-term occlusion by clutter or other objects, similar appearances of different objects, crowded scenes, etc. In this paper, data association is formulated as a Short Minimum Clique Problem (SMCP). Using three consecutive frames, three clusters are created, where each clique between these clusters is a tracklet (partial trajectory) of a person. For this purpose, a fast and simple method is proposed for creating cliques by pruning the extra edges between clusters. For the edge weights, color-histogram similarities and the similarity of the eigenvalues of people's bounding boxes are used. Moreover, a reliable and fast occlusion-handling method is applied, which works by saving the color histograms of people. The proposed algorithm is evaluated on three challenging sequences, TUD-Crossing, TUD-Stadtmitte and PETS 2009, and then compared to state-of-the-art methods, where promising results are obtained.
Keywords: Multi-object tracking, Clique, Short Minimum Clique Problem (SMCP)
1. INTRODUCTION
One of the most important tasks in many computer vision applications is multi-object tracking (MOT). It has
wide applications including various video analysis scenarios, such as motion and scene analysis, video
indexing, activity recognition, video surveillance and traffic monitoring, among which traffic video
surveillance motivates most of the investigations on multi-object tracking. Using MOT, the states of multiple
objects are estimated while their identifications are conserved under appearance and motion variations with
time. In complex scenes, MOT is still a difficult task due to many problems, including long-term occlusion by clutter or other objects, ID switching, crowded scenes, and so on.
“Multi-object tracking (MOT) aims to estimate object trajectories according to the identities in image
sequences. Recently, thanks to the advances of object detectors [1], [2], numerous tracking-by-detection
approaches have been developed for MOT. In this type of approaches, target objects are detected first and
tracking algorithms estimate their trajectories using detection results, this part is called data association.
Tracking-by-detection methods can be broadly categorized into online and offline (batch or semi-batch)
tracking methods. Offline MOT methods generally utilize detection results from past and future frames.
Tracklets are first generated by linking individual detections in a number of frames, and then iteratively
associated to construct long trajectories of objects in the entire sequence, or in a time-sliding window with a
temporal delay (e.g.,[3], [4]). On the other hand, online MOT algorithms estimate object trajectories using
only detections from the current as well as past frames (e.g. [5]–[7]), and online MOT algorithms are more
applicable to real-time applications such as advanced driving assistant systems and robot navigation” [8].
Data association techniques are divided into two groups: temporally local and temporally global. Bipartite matching is the most popular method for temporally local approaches, for which the Hungarian algorithm provides an exact solution. The other group, temporally global data association, is better able to deal with the challenges, and the popularity of tracking methods based on global data association has recently increased. In global data association, optimization is performed over a batch of frames instead of just two or a few consecutive frames [3], [7], [9]–[11].
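The temporally local case can be made concrete with a toy sketch. In practice the Hungarian algorithm solves this assignment in polynomial time; an exhaustive search over permutations, shown here only for clarity with a made-up cost matrix, defines the same optimum:

```python
from itertools import permutations

def bipartite_match(cost):
    """Exhaustively find the assignment of detections in frame t to
    detections in frame t+1 that minimizes the total cost.
    (The Hungarian algorithm solves the same problem in O(n^3);
    brute force is shown here only for clarity.)"""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_perm, best_cost

# Toy cost matrix: cost[i][j] = dissimilarity of detection i (frame t)
# and detection j (frame t+1).
cost = [[1.0, 9.0, 8.0],
        [9.0, 2.0, 9.0],
        [8.0, 9.0, 3.0]]
assignment, total = bipartite_match(cost)
print(assignment, total)  # (0, 1, 2) 6.0
```

Here the diagonal is cheapest, so each detection keeps its index from one frame to the next.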
The authors of [10], [11] proposed global trackers that find minimum/maximum cliques in graphs, where each clique corresponds to the tracklet of a person across the frames. In [11], cliques are found by optimizing motion and appearance; the generalized minimum clique problem (GMCP) has also been used in the fields of biology and telecommunication [12]. In [10], binary integer programming is used to find cliques by considering all possible connections in the graph.
Since only the current and past frames are used, the proposed method is an online method. In the proposed method, tracking is based on solving a minimum clique problem. For this purpose, three consecutive frames are used as three clusters, where the cliques are considered the tracklets of a person in those frames. By finding and analyzing cliques, occluded and visible (unoccluded) objects are determined. Moreover, the new approach is based on the similarity of eigenvalues along with the color histograms of the bounding boxes of corresponding people in the three frames. The proposed tracker eliminates the extra edges of the graph which have no effect on the vital comparisons in the clique problem.
The rest of the paper is organized as follows. In Section 2, our tracking method based on the Short Minimum Clique Problem (SMCP) is introduced. Section 3 presents the experiments, in which the proposed tracker is compared with the state-of-the-art methods. Finally, the conclusion is given in Section 4.
2. Problem Formulation
Our proposed tracker is based on three-partite matching. A frame is considered a cluster in which each detected person is defined as a node of that cluster. A node in a cluster does not have any connection with other nodes of the same cluster, but it has a number of connections to nodes in the other clusters (see figure 1(b)). Suppose there are K people in three consecutive frames. The goal is to find sub-graphs which form cliques in which the sum of the edge weights is minimized and exactly K nodes are selected from each cluster. Thus, K cliques are obtained, where each clique has three nodes. Each clique indicates the tracklet of one person from the K people in the video (see figure 1(a)).
To give a more formal definition, the input to the SMCP is a graph G = (V, E, W), where V, E and W are the nodes, edges and their corresponding weights, respectively. V is divided into a set of disjoint clusters of nodes, where v^i_j is defined as the j-th node in the i-th cluster. The goal is now to pick a set of K cliques, by selecting exactly K nodes from each cluster, that minimizes the total score. The minimum cliques, according to the weights of the edges of the graph, are then selected as explained in the following sections.
2.1 Finding Tracklets Using SMCP
The proposed method operates on three consecutive frames in each step in order to find the corresponding people in those frames. The performance of the proposed method is comparable with the results of the aforementioned global methods.
Consider three consecutive frames F_t, F_{t+1}, F_{t+2}. By comparing the detected pedestrians in these frames, a graph is created in which the edges represent all the connections between the corresponding bounding boxes. After finding the minimum cliques in the graph, the corresponding pedestrians in the three frames are obtained, and the tracker continues with the next three frames, where the first frame of the new triple is the last frame of the previous one.
The graphs of some scenes may become large, with a huge number of edges. For instance, in the TUD-Crossing dataset, some frames (clusters) contain 6 pedestrians, so each cluster has 6 nodes. In the full graph of the three frames there are 108 edges, and the problem space becomes huge. Hence, in building the tracker's graph, an efficient method for pruning the edges of the graph is applied. The distance between pedestrians in the frames is used to decide which edges to create: the Euclidean distance (equation 1) between two bounding boxes is computed, a predefined distance is set as a threshold, and the tracker only considers edges between pedestrians (nodes) whose Euclidean distance is less than this threshold. As a result, the graph with 108 edges is reduced to a graph with approximately 30 edges, i.e., about 27% of the original (see figure 1). The problem space is thus reduced substantially, and the tracker then finds the minimum cliques in the new graph using the weights of the edges.
d(v_i, v_j) = sqrt((x_i - x_j)^2 + (y_i - y_j)^2)    (1)
(a)
(b)
Figure 1. In (a) we show the full graph with 108 edges; in (b), by taking the distance into account, the sparse graph is created, with two cliques shown as sample solutions. The cliques are indicated by the bold edges (blue and orange). The sparse graph has 30 edges, about 27% of the edges in (a).
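As a sketch of this pruning step (function names and toy coordinates are ours; detections are represented only by the bounding-box positions used in equation (1)):

```python
from math import hypot

def build_sparse_edges(dets_a, dets_b, threshold=40.0):
    """Create edges only between detections in two clusters (frames)
    whose bounding-box positions are closer than `threshold` pixels."""
    edges = []
    for i, (xa, ya) in enumerate(dets_a):
        for j, (xb, yb) in enumerate(dets_b):
            if hypot(xa - xb, ya - yb) < threshold:
                edges.append((i, j))
    return edges

# Two frames with three detections each; far-apart pairs are pruned.
frame_t  = [(10, 10), (100, 50), (200, 80)]
frame_t1 = [(12, 14), (105, 48), (300, 90)]
print(build_sparse_edges(frame_t, frame_t1))  # [(0, 0), (1, 1)]
```

Only nearby pairs survive, so the third detection in each frame (which moved far or left the scene) produces no edge at all.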
Page 4
2.2 Calculation of Edges’ Weights
The intersection of the color histograms of the bounding boxes and the similarity of eigenvalues are employed for computing the weights of the edges. For the appearance representation of a node, the color histogram [13] is utilized. Formula (2) is used to compute the intersection of the color histograms of two bounding boxes, in which h_p(v^i_j) denotes the histogram of the p-th part of the j-th node of frame i:

Cost_hist(v^i_j, v^m_n) = Σ_p k(h_p(v^i_j), h_p(v^m_n))    (2)

where k represents the histogram-intersection kernel; the root mean square error (RMSE) kernel is utilized in the proposed method.
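A minimal sketch of formula (2) with the RMSE kernel, assuming eight per-part normalized histograms (names and toy values are ours; only two parts are shown for brevity):

```python
from math import sqrt

def rmse(h1, h2):
    """Root-mean-square error between two (normalized) histograms;
    smaller values mean more similar appearance."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)) / len(h1))

def appearance_cost(parts_a, parts_b):
    """Sum the RMSE over the corresponding parts of two bounding
    boxes, as in formula (2) with k = RMSE."""
    return sum(rmse(ha, hb) for ha, hb in zip(parts_a, parts_b))

# Toy 4-bin histograms per part; two parts shown instead of eight.
person_a = [[0.5, 0.3, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
person_b = [[0.5, 0.3, 0.1, 0.1], [0.40, 0.30, 0.20, 0.10]]
print(appearance_cost(person_a, person_a))  # 0.0 (identical)
```

A zero cost means identical appearance; the cost grows as the per-part histograms diverge.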
The eigenvalues of a covariance matrix represent the variance magnitude in the directions of the largest spread of the data; the direction of the largest spread of the data is the direction of the eigenvector of the corresponding eigenvalue [14]. So the covariance matrix of each part of the bounding box is computed, and then the eigenvalues of each covariance matrix are computed and sorted in descending order. The sorted eigenvalues of each part are then compared with the eigenvalues of the corresponding part of the other bounding box. Formula (3) is used to compare the similarity of the eigenvalues of two bounding boxes:

Cost_eig(v^i_j, v^m_n) = Σ_p K(EigsofCov_p(v^i_j), EigsofCov_p(v^m_n))    (3)

where EigsofCov_p denotes the sorted eigenvalues of the covariance matrix of the p-th part, and K represents the kernel for comparing eigenvalues; again, the root mean square error (RMSE) kernel is utilized in the proposed method.
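A sketch of formula (3), assuming the covariance is taken over the RGB pixel values of each bounding-box part (the data layout is our assumption, since the paper does not spell it out):

```python
import numpy as np

def eig_descriptor(patch):
    """Eigenvalues of the color-covariance matrix of a bounding-box
    part, sorted in descending order.  `patch` is an (N, 3) array of
    RGB pixels; the covariance is 3x3, so three eigenvalues result."""
    cov = np.cov(patch, rowvar=False)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def eig_cost(patch_a, patch_b):
    """RMSE between the sorted eigenvalues of two parts (formula (3))."""
    da, db = eig_descriptor(patch_a), eig_descriptor(patch_b)
    return float(np.sqrt(np.mean((da - db) ** 2)))

rng = np.random.default_rng(0)
part_a = rng.normal(size=(50, 3))
part_b = part_a + rng.normal(scale=0.01, size=(50, 3))  # near-identical
part_c = rng.normal(scale=5.0, size=(50, 3))            # very different
print(eig_cost(part_a, part_b) < eig_cost(part_a, part_c))  # True
```

Because the eigenvalues are sorted, the descriptor is invariant to how the principal directions happen to be ordered, capturing only the variance magnitudes the text describes.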
In the algorithm, three frames are compared with each other; i.e., F_t with F_{t+1}, F_t with F_{t+2}, and F_{t+1} with F_{t+2}. If the Euclidean distance between two nodes is less than a predefined threshold, an edge between them is created. It should be noted that W(v^t_m, v^{t+1}_n) denotes the weight of the edge between the m-th node in the t-th cluster (frame) and the n-th node in the (t+1)-th cluster (frame); similarly, W(v^t_m, v^{t+2}_n) is the weight of the edge between the nodes v^t_m and v^{t+2}_n. Now assume that node v^t_m in F_t has two connections, with nodes v^{t+1}_a and v^{t+1}_b in F_{t+1}. The algorithm then computes W(v^t_m, v^{t+1}_a) and W(v^t_m, v^{t+1}_b). First, the intersections of the histograms between v^t_m and the two candidate nodes are computed, and the two comparison values are sorted in ascending order of RMSE (a smaller RMSE value indicates more similar color histograms). Second, the similarity of the eigenvalues between the same bounding boxes is computed, and these comparison values are sorted in the same way (the smaller the RMSE, the more similar the eigenvalues). Suppose that v^{t+1}_a is located at the top of both sorted lists; in other words, it is ranked first for both the histogram intersection and the eigenvalue similarity, while v^{t+1}_b is ranked second in both lists. The edge weights are then computed from these ranks:

W(v^t_m, v^{t+1}_n) = (R_hist(n) + R_eig(n)) / 2    (4)

where R_hist(n) and R_eig(n) are the ranks of the candidate node n in the two sorted lists. In this case, W(v^t_m, v^{t+1}_a) = 1 and W(v^t_m, v^{t+1}_b) = 2: the weight of the edge between v^t_m and v^{t+1}_a is 1 and the weight of the edge between v^t_m and v^{t+1}_b is 2. Thus, the node v^t_m is more similar to the node v^{t+1}_a than to the node v^{t+1}_b.
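The rank-based weighting just described can be sketched as follows; averaging the two ranks is one consistent reading of the formula (it reproduces the example weights of 1 and 2), and all names are ours:

```python
def rank_weights(hist_costs, eig_costs):
    """Combine the two appearance cues into edge weights by ranking.
    `hist_costs[n]` and `eig_costs[n]` are the RMSE values between a
    node v and each candidate neighbor n; lower RMSE means more
    similar.  Each candidate's weight is the average of its rank
    (1 = most similar) in the two sorted lists."""
    hist_rank = {n: r for r, n in
                 enumerate(sorted(hist_costs, key=hist_costs.get), 1)}
    eig_rank = {n: r for r, n in
                enumerate(sorted(eig_costs, key=eig_costs.get), 1)}
    return {n: (hist_rank[n] + eig_rank[n]) / 2 for n in hist_costs}

# Candidate 'a' is most similar under both cues, 'b' second.
hist = {"a": 0.10, "b": 0.40}
eig = {"a": 0.05, "b": 0.30}
print(rank_weights(hist, eig))  # {'a': 1.0, 'b': 2.0}
```

Using ranks rather than raw RMSE values makes the two cues commensurable even though histogram and eigenvalue errors live on different scales.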
The proposed tracker creates all possible edges in the graph subject to the distance threshold. All possible cliques over the three clusters are found by a simple iterative algorithm, and the clique with the minimum total edge weight is selected. This minimum clique is the tracklet of a person in the three frames (in figure 1(a), two cliques are illustrated). When the minimum clique is found, all of its nodes and connected edges are removed; as a result, the problem space is further reduced. In the next step, the next minimum clique is found. This process continues until all cliques are found; the algorithm then analyzes the remaining nodes in the three frames for occlusion handling, as elucidated in Section 2.3. Finally, the tracker selects the next three frames for tracking, where the first frame of the next triple is the last frame of the previous one.
In the proposed method, each clique consists of three nodes, each of which has two edges. The sum of the outgoing edges from one node into another cluster is therefore less than or equal to one, so one clique does not include more than one node in each cluster [10].
Σ_n e(v^i_m, v^j_n) ≤ 1,  for each node v^i_m and each cluster j ≠ i    (5)
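The iterative extraction of minimum cliques over the three clusters, subject to this one-node-per-cluster constraint, might look like the following sketch (weights and names are ours):

```python
def min_clique_tracklets(w12, w23, w13):
    """Greedy SMCP solver over three clusters.  w12[(i, j)] is the
    weight of the edge between node i in cluster 1 and node j in
    cluster 2 (missing keys mean the edge was pruned); similarly for
    w23 and w13.  Repeatedly pick the 3-node clique with the smallest
    total weight and remove its nodes from further consideration."""
    cliques = []
    used1, used2, used3 = set(), set(), set()
    while True:
        best, best_cost = None, float("inf")
        for (i, j), c12 in w12.items():
            if i in used1 or j in used2:
                continue
            for k in {k for (jj, k) in w23 if jj == j} - used3:
                if (i, k) not in w13:
                    continue
                cost = c12 + w23[(j, k)] + w13[(i, k)]
                if cost < best_cost:
                    best, best_cost = (i, j, k), cost
        if best is None:
            return cliques
        cliques.append(best)
        used1.add(best[0]); used2.add(best[1]); used3.add(best[2])

# Two people tracked across three frames (toy weights).
w12 = {(0, 0): 1, (1, 1): 1, (0, 1): 5, (1, 0): 5}
w23 = {(0, 0): 1, (1, 1): 1}
w13 = {(0, 0): 1, (1, 1): 1}
print(min_clique_tracklets(w12, w23, w13))  # [(0, 0, 0), (1, 1, 1)]
```

Each returned triple is a tracklet: one node per cluster, i.e., the same person in the three consecutive frames.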
2.3 Occlusion Handling Model
Occlusion handling is said to be the most essential issue in tracking. For example, a person may go behind another person or an object in the t-th frame and then reappear in the (t+k)-th frame. A precise tracker should be able to handle such occlusions by recognizing the person in frame t+k as the same person seen in frame t. Basically, the proposed algorithm performs occlusion handling by analyzing the remaining nodes in the graph (the nodes that do not belong to any clique).
As mentioned in Section 2.1, our tracker deals with 3 frames at a time. In the simplest situation, the same K people are present in all three frames. As stated before, the algorithm selects the minimum cliques one after another; in this situation the tracker finds K cliques in the graph, which are the tracklets of the people in the frames. In another situation, a person may be occluded in the third frame: the tracker then discovers K-1 cliques and a single edge between the first and the second frames (see figure 2(a)). If the person has not exited the scene, the average of the histograms of the two nodes (bounding boxes) of this edge is saved in a buffer, along with the person's location in the second frame and the second frame's number. In yet another situation, the person may be present in the first frame but occluded in the second and third frames (figure 2(b)): the tracker finds K-1 cliques among the frames, and a single node remains in the first frame. The algorithm saves the histograms of the eight partitions of the single node, the coordinates of its bounding box and the frame number in the buffer. Generally, after finding the full cliques, the remaining nodes in the first and second clusters determine which nodes are stored in the buffer.
In a sample situation, there are K, K and K+1 people in frames F_t, F_{t+1} and F_{t+2}, respectively; so there are K cliques and a single node in the third frame. Meanwhile, in another situation, there are K, K+1 and K+1 people in the three frames, respectively, and there exist K cliques and an edge between frames F_{t+1} and F_{t+2}. These conditions indicate that one person has been added to the scene in the third frame, or in the second and third frames. By checking its coordinates, the tracker decides whether it is a new arrival in the scene. If it is a new arrival, it is assigned a new ID and identified as a new person; if not, it is identified as a person who has been released from occlusion. Therefore, the single node in the third frame, or the node of the edge between the second and third frames, is compared with the
buffer. As mentioned above, the histograms of the occluded people, along with their coordinates and frame numbers, are stored in the buffer. The histograms of the bounding boxes of newly appearing people are therefore compared with the histograms of the bounding boxes of the people stored in the buffer, and the stored person with the greatest similarity to the new person is released from the buffer. For example, suppose a person is occluded in frame t and is identified again in frame t+k by the tracker. The coordinates of the person in frame t+k are available, and the coordinates in frame t are obtained from the buffer. Now the person's locations in the intermediate frames (i.e., the frames in which the person was occluded) should be determined. Formulas (6)-(9) can be used to estimate the location of the person in the occluded frames t+1, ..., t+k-1. Similar to the methods of [10], [11], a constant velocity model is employed; formulas (6)-(9) assume a constant velocity for the people in the sequences. The tracker computes the rate of spatial movement in the x direction as V_x and in the y direction as V_y:

V_x = (x_{t+k} - x_t) / k    (6)
V_y = (y_{t+k} - y_t) / k    (7)
x_{t+i} = x_t + i * V_x,  i = 1, ..., k-1    (8)
y_{t+i} = y_t + i * V_y,  i = 1, ..., k-1    (9)

These two rates are used to locate the person in the occluded frames. Figures 4, 5 and 6 show occlusion-handling examples of the proposed method.
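The constant-velocity model amounts to a simple linear interpolation between the last and first visible positions; a sketch with hypothetical coordinates:

```python
def interpolate_occluded(x_t, y_t, x_tk, y_tk, k):
    """Estimate the positions of a person during an occlusion that
    lasts from frame t+1 to frame t+k-1, assuming constant velocity.
    Returns one (x, y) pair per occluded frame."""
    vx = (x_tk - x_t) / k  # rate of spatial movement in x
    vy = (y_tk - y_t) / k  # rate of spatial movement in y
    return [(x_t + i * vx, y_t + i * vy) for i in range(1, k)]

# Last seen at (100, 50) in frame t, reappears at (140, 70) in t+4.
print(interpolate_occluded(100, 50, 140, 70, 4))
# [(110.0, 55.0), (120.0, 60.0), (130.0, 65.0)]
```

The three returned positions fill the trajectory gap for frames t+1, t+2 and t+3.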
(a)
(b)
Figure 2. In (a), the man (blue bounding box) is occluded in frame t+2. In (b), the man (blue bounding box) is occluded in frame t+1 and frame t+2.
2.4 Structure of the Buffer
The histograms of the bounding boxes of occluded people are stored in a buffer; when they appear in the scene again, their bounding-box histograms are compared with those stored in the buffer. In this section, the structure of the buffer is delineated. The entry for each person in the buffer consists of: a table holding the histograms of the eight parts of the person's bounding box (the histograms are computed by the method of [13]); the coordinates of the upper-left corner of the person's bounding box; and the frame number.
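The buffer entry just described can be sketched as a small record type (the field names are ours):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BufferEntry:
    """One occluded person, as stored in the buffer: the histograms
    of the eight parts of the bounding box, the upper-left corner of
    the box, and the number of the frame where the person was last
    seen."""
    part_histograms: List[List[float]]  # 8 per-part color histograms
    x: float                            # upper-left corner, x
    y: float                            # upper-left corner, y
    frame: int                          # last frame number

# A buffer is just a list of such entries.
buffer = [BufferEntry([[0.5, 0.5]] * 8, 120.0, 40.0, 57)]
print(buffer[0].frame)  # 57
```

When a candidate reappears, its part histograms are compared against each entry's `part_histograms`, and the best match is removed from the list.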
The steps of the proposed tracker are as follows:

Algorithm 1
Input: a sequence of T frames
Output: the tracks of the people in the sequence
1- Input 3 frames
2- Create the sparse graph along with the weights of the edges
3- Find all the possible cliques in the graph
4- Select the minimum clique
5- Remove the nodes and edges of the minimum clique from the graph, and remove all the edges connected to its nodes
6- If there is another clique, go to step 4; otherwise go to step 7
7- Analyze the nodes and edges in the 1st and 2nd frames for storing in the buffer as occluded people
8- Analyze the nodes and edges in the 2nd and 3rd frames for occlusion handling
9- Update the buffer by releasing the identified occluded people and storing the new occluded people
10- If frames remain, advance to the next triple (whose first frame is the last frame of the current one) and go to step 1; otherwise exit
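The ten steps above can be sketched as a single control loop; every component here is a stub callable, since the real detector, graph builder and clique solver are described elsewhere in the paper:

```python
def track_sequence(frames, detect, build_graph, find_cliques,
                   handle_occlusion, update_buffer):
    """High-level control flow of Algorithm 1.  Every callable is a
    placeholder for a component described in the text: detection,
    sparse-graph construction with edge weights, iterative minimum-
    clique extraction, and buffer-based occlusion handling."""
    buffer, tracks, t = [], [], 0
    while t + 2 < len(frames):                                # step 10
        triple = [detect(frames[t + d]) for d in range(3)]    # step 1
        graph = build_graph(*triple)                          # step 2
        cliques, leftovers = find_cliques(graph)              # steps 3-6
        tracks.extend(cliques)
        handle_occlusion(leftovers, buffer)                   # steps 7-8
        update_buffer(buffer, leftovers)                      # step 9
        t += 2  # the last frame of this triple starts the next one
    return tracks

# Smoke test with trivial stub components:
frames = [0, 1, 2, 3, 4]
tracks = track_sequence(
    frames,
    detect=lambda f: [f],
    build_graph=lambda a, b, c: (a, b, c),
    find_cliques=lambda g: ([g], []),
    handle_occlusion=lambda leftovers, buf: None,
    update_buffer=lambda buf, leftovers: None,
)
print(len(tracks))  # 2
```

Note the step `t += 2`: advancing by two frames makes consecutive triples overlap in exactly one frame, which is how tracklets are chained into full trajectories.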
3. Tracking Evaluation
In this section, the experimental evaluation of the proposed tracking algorithm and a comparison against state-of-the-art methods are presented. We carry out experiments on three publicly available sequences which provide a wide range of significant challenges: TUD-Crossing, TUD-Stadtmitte and sequence S2L1 from the PETS 2009 benchmark. We compare our method with the state-of-the-art trackers, borrowing the numbers from the authors' papers.
The trackers are as follows: GMMCP [10], GMCP [11], WMWIS [15], MAT [16], DCPF [17],
OMTUC [18], OMPTH [19], DLP [20], MTEM [21], GORMT [22], KSO [9], GAC [23], MOTMM [24].
3.1 Implementation
We used the deformable part-based model [1] to obtain the detection hypotheses in each frame. The coefficient for computing the edge weights was chosen via a sensitivity analysis on the TUD-Stadtmitte dataset, in which the people in each pair of consecutive frames were compared for different values of the coefficient; according to the curve in figure 3, the best value is the one at which the recall is maximal. The distance threshold for edge creation is set to 40 pixels, since most of the time a pedestrian does not move more than 40 pixels between two consecutive frames.
3.2 Evaluation Metrics
The standard CLEAR MOT metrics [25] are used for evaluation. MOTA accounts for false positives, false negatives (missed people) and ID switches. MOTP is defined as the average distance between the ground-truth and estimated target locations; it shows the ability of the tracker to estimate the precise location of the object, regardless of its accuracy at recognizing object configurations, keeping consistent trajectories, and so forth. MOTA has therefore been widely accepted in the literature as the main measure of the performance of tracking methods.
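For reference, MOTA combines these per-frame error counts (false negatives FN_t, false positives FP_t, ID switches IDSW_t, ground-truth objects GT_t) into a single score, following the CLEAR MOT protocol [25]:

```latex
\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t}
```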
Figure 3. Sensitivity analysis on the TUD-Stadtmitte dataset. The recall value is computed for different values of the coefficient; the best value is the one at which the recall curve reaches its maximum.
TUD Data Set
TUD-Crossing and TUD-Stadtmitte are two sequences in this data set, with a low camera angle and frequent occlusions. Crossing and Stadtmitte include 201 and 179 frames, respectively.

PETS 2009, S2L1, View 1
The sequence consists of 795 frames and contains occlusions and people crossing each other many times. Due to its length, it is very useful for testing methods. The proposed method has few ID switches on this sequence, like many state-of-the-art methods.

Table 1. Tracking results for the TUD-Crossing sequence.
Method | MOTA% | MOTP% | Prec. | Rec. | IDSW
GMMCP | 91.9 | 70 | - | - | 2
WMWIS | 85.9 | 73 | 89.2 | 98.8 | 2
GMCP | 91.63 | 75.6 | 98.6 | 92.83 | 0
DCPF | 84.3 | 71.0 | 85.1 | 98.6 | 2
MAT | 90.6 | 76.9 | - | - | 8
OMPTH | 71.3 | 67.5 | - | - | 11
OMTUC | 84.3 | 71.0 | - | - | 2
Our method | 92.31 | 73.6 | 94.11 | 94.64 | 2
Table 2. Tracking results for the TUD-Stadtmitte sequence.
Method | MOTA% | MOTP% | Prec. | Rec. | IDSW
GMMCP | 82.4 | 73.9 | - | - | 0
DLP | 79.3 | 73.9 | - | - | 4
GMCP | 77.7 | 63.4 | 95.6 | 85.4 | 0
MTEM | 60.5 | 65.8 | - | - | 7
MAT | 75.4 | 70 | - | - | 3
OMPTH | 75.0 | 59.8 | - | - | 2
GORMT | 68.6 | 64.0 | - | - | -
Our method | 80.83 | 73 | 100 | 70 | 0
Table 3. Tracking results for the PETS 2009 S2L1 sequence.
Method | MOTA% | MOTP% | Prec. | Rec. | IDSW
GMCP | 90.3 | 69.02 | 93.64 | 96.45 | 8
KSO | 80.0 | 58.00 | 81.00 | 60.00 | 28
GAC | 81.46 | 58.38 | 90.66 | 90.81 | 19
MTEM | 82.84 | 73.93 | 96.28 | 85.13 | 15
MOTMM | 84.77 | 68.74 | 92.40 | 94.03 | 10
MAT | 92.8 | 74.3 | - | - | 8
OMPTH | 91.0 | 66.1 | - | - | 10
GORMT | 88.3 | 75.7 | - | - | -
Our method | 96.57 | 85.33 | 97.7 | 98.7 | 6
4. Conclusion
In this work, we propose to formulate multi-person tracking as a Short Minimum Clique Problem, which is solved over sparse graphs using a comparison of the eigenvalues of covariance matrices, in addition to color histograms, for finding the corresponding people. We then show that by using three frames in each step, together with an efficient occlusion handler, satisfactory results can be achieved. In our experiments, the tracker is compared with state-of-the-art methods on three challenging sequences, and the results demonstrate that our method is effective and efficient.
Frame Number = 61
Frame Number = 68
Frame Number = 75
Frame Number = 83
Frame Number = 90
Frame Number = 103
Figure 4. The woman with ID=7, marked with a yellow bounding box, is occluded for 43 frames. The occlusion-handling method preserves her ID. Moreover, the other pedestrians are tracked correctly with their IDs from frame #61 to frame #103.
Frame Number = 87
Frame Number = 95
Frame Number = 105
Frame Number = 109
Frame Number = 117
Frame Number = 128
Figure 5. The woman with ID=10, marked with a yellow bounding box, is occluded many times, but her ID is preserved by the proposed occlusion-handling method.
Frame Number = 292
Frame Number = 293
Frame Number = 294
Frame Number = 295
Frame Number = 296
Frame Number = 297
Figure 6. The person with ID=10, indicated by a black arrow, is occluded behind the person with ID=9. The occlusion-handling method preserves his ID. Moreover, the other pedestrians are tracked correctly with their IDs from frame #292 to frame #297.
REFERENCES
[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object Detection with Discriminatively Trained Part-Based Models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, 2010.
[2] Y. Amit and P. Felzenszwalb, “Object Detection,” Comput. Vis. A Ref. Guid., no. Springer US, pp.
537–542, 2014.
[3] H. Pirsiavash, D. Ramanan, and C. C. Fowlkes, “Globally-Optimal Greedy Algorithms for Tracking
a Variable Number of Objects,” 2011.
[4] J. Xing, H. Ai, and S. Lao, “Multi-object tracking through occlusions by local tracklets filtering and
global tracklets association with detection responses,” 2009 IEEE Comput. Soc. Conf. Comput. Vis.
Pattern Recognit. Work. CVPR Work. 2009, pp. 1200–1207, 2009.
[5] A. Dehghan and M. Shah, “Binary Quadratic Programing for Online Tracking of Hundreds of People
in Extremely Crowded Scenes,” vol. 14, no. 8, pp. 1–14, 2016.
[6] T. Wu, Y. Lu, and S.-C. Zhu, “Online Object Tracking, Learning and Parsing with And-Or Graphs,”
pp. 1–14, 2015.
[7] Z. Wu, J. Zhang, and M. Betke, “Online Motion Agreement Tracking,” Procedings Br. Mach. Vis.
Conf. 2013, pp. 63.1–63.11, 2013.
[8] J. H. Yoon and C. Lee, “Online Multi-Object Tracking via Structural Constraint Event Aggregation,”
IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2016.
[9] J. Berclaz, F. Fleuret, E. Türetken, and P. Fua, “Multiple object tracking using k-shortest paths
optimization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 9, pp. 1806–1819, 2011.
[10] A. Dehghan, “GMMCP Tracker : Globally Optimal Generalized Maximum Multi Clique Problem for
Multiple Object Tracking,” Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR 2015),
2015.
[11] A. Roshan Zamir, A. Dehghan, and M. Shah, “GMCP-tracker: Global multi-object tracking using
generalized minimum clique graphs,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif.
Intell. Lect. Notes Bioinformatics), vol. 7573 LNCS, no. PART 2, pp. 343–356, 2012.
[12] C. Feremans, M. Labbé, and G. Laporte, “Generalized network design problems,” Eur. J. Oper. Res.,
vol. 148, no. 1, pp. 1–13, 2003.
[13] J. Domke and Y. Aloimonos, “Deformation and Viewpoint Invariant Color Histograms,” Procedings
Br. Mach. Vis. Conf. 2006, pp. 53.1–53.10, 2006.
[14] V. Spruyt, “A geometric interpretation of the covariance matrix,” 2014. [Online]. Available: http://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/.
[15] W. Brendel, M. Amer, and S. Todorovic, “Multiobject tracking as maximum weight independent
set,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1273–1280, 2011.
[16] Z. Wu, J. Zhang, and M. Betke, “Online Motion Agreement Tracking,” Procedings Br. Mach. Vis.
Conf. 2013, pp. 63.1–63.11, 2013.
[17] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool, “Robust Tracking-by-Detection Using a Detector Confidence Particle Filter,” IEEE Int. Conf. Comput. Vis., 2009.
[18] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool, “Online multiperson
tracking-by-detection from a single, uncalibrated camera,” IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 33, no. 9, pp. 1820–1833, 2011.
[19] J. Zhang, L. Lo Presti, S. Sclaroff, C. Street, and B. Ma, “Online Multi-Person Tracking by Tracker
Hierarchy,” pp. 379–385, 2012.
[20] K. C. A. Kumar and C. De Vleeschouwer, “Discriminative label propagation for multi-object
tracking with sporadic appearance features,” Proc. IEEE Int. Conf. Comput. Vis., pp. 2000–2007,
2013.
[21] A. Andriyenko and K. Schindler, “Multi-target tracking by continuous energy minimization,” Proc.
IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1265–1272, 2011.
[22] A. Andriyenko, S. Roth, and K. Schindler, “An analytical formulation of global occlusion reasoning
for multi-target tracking,” Proc. IEEE Int. Conf. Comput. Vis., no. November, pp. 1839–1846, 2011.
[23] H. Ben Shitrit, J. Berclaz, F. Fleuret, and P. Fua, “Tracking multiple people under global appearance
constraints,” Proc. IEEE Int. Conf. Comput. Vis., no. 247022, pp. 137–144, 2011.
[24] J. F. Henriques, R. Caseiro, and J. Batista, “Globally optimal solution to multi-object tracking with
merged measurements,” Proc. IEEE Int. Conf. Comput. Vis., pp. 2470–2477, 2011.
[25] R. Kasturi, D. Goldgof, V. Korzhova, and J. Zhang, “Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 319–336, 2009.