Improving Robustness of Infrared Target Tracking Algorithms Based on Template Matching
A methodology for improving the robustness of target tracking algorithms for forward looking infrared (FLIR) imagery is presented. The designed technique exploits a motion prediction metric to identify the occurrence of false alarms and to control the activation of a template matching (TM) based phase. The proposed approach is particularly suited to high speed algorithms in which tracking is generally performed by using a computationally efficient target detection (TD) step and TM only acts as a recovery means. In such frameworks, the activation strategy has a strong impact on tracking performance, as an improper activation pattern could both affect execution speed and result in critical tracking failures. The designed strategy is compared with a reference implementation relying on a distance-based activation logic, showing improved performance and self-adaptability to variations both in image sequence and target characteristics, which would be required in most high speed automatic target tracking scenarios.
Manuscript received October 13, 2009; revised May 19, 2010; released for publication August 6, 2010. IEEE Log No. T-AES/47/2/940858. Refereeing of this contribution was handled by D. Salmond. 0018-9251/11/$26.00 © 2011 IEEE
I. INTRODUCTION
Target tracking in forward looking infrared (FLIR) imagery is a challenging task. Image sequences obtained from IR sensors are often characterized by low signal-to-noise ratio and heavy background clutter. Furthermore, when scenarios with nonstationary cameras are considered, tracking requires properly dealing with sensor ego-motion through suitable estimation and compensation techniques. Further challenges are posed by imagery with multiple and possibly moving target and nontarget objects, which can blend into the background, change their signature, size, and shape, and even overlap during their motion. Finally, specific applications could introduce stringent real-time constraints, thus requiring tracking techniques with a reduced computational footprint.
Some techniques providing significant results under the above conditions already exist. In particular, the techniques proposed in [1]-[3] are capable of ensuring high speed processing of FLIR image sequences recorded from an airborne platform; this goal is achieved by splitting tracking operations into two main conceptual phases, namely a target detection (TD) phase and a template matching (TM) phase. The first phase is expected to be carried out in a very efficient way, e.g. by exploiting techniques based on fringe-adjusted joint transform correlation (FJTC) [1] or intensity variation function (IVF) [2, 3]. Despite their impressive detection performance and limited computational requirements, the above techniques may occasionally generate false alarms in the TD phase. In these cases, a second correlation-based recovery phase is activated to restore target position and avoid possible target losses. By relying on target shape information, this second phase is capable of coping with marked variations in target signature and also acts as a powerful ego-motion compensation mechanism (at the cost of a significantly higher complexity). Thus, for a given TD strategy, tracking robustness and performance are strongly related to the metric used for controlling TM intervention.
In this paper, a probabilistic metric relying on motion prediction is proposed, which provides improved robustness with respect to existing implementations of IVF-based target tracking algorithms. Basically, motion prediction is used to estimate the probability associated with a given target position; this information is then considered with respect to a confidence index that is used to control the activation of the TM phase.
Experimental tests were performed using FLIR image sequences from the Army Missile Command (AMCOM). Results obtained by testing the proposed solution against an IVF-based reference implementation showed a significantly improved ability of the tracking algorithm to cope with sudden frame and target changes (basically due to the low signal-to-noise ratio as well as to variations in size, position and thermal imprint). Moreover, the designed algorithm exhibited an increased self-adaptability to target and sequence characteristics that could be of particular interest when automatic target tracking scenarios are considered (for example, a single confidence index allowed correct tracking of all the targets in the image sequences belonging to the experimental data set).
This paper is organized as follows. Section II reviews the main works related to IR target tracking. In Section III, issues related to false alarm generation are specifically addressed. Section IV describes the implementation chosen as a reference for comparison with the proposed methodology. The designed metric for controlling the activation of the TM phase is described in Section V. Finally, experimental results are discussed in depth in Section VI.
II. BACKGROUND
Although a number of techniques for tracking
targets in visual images already exist, only a limited
number of works address the particular issue of
target tracking for FLIR imagery. Moreover, some
of the above approaches can only be used when the
sensor ego-motion is very limited or when target
features cannot change noticeably during the tracking.
However, in real scenarios, the above assumptions
may not be applicable [29].
When targets do not change their size significantly
and ego-motion is small, detection and tracking can
be efficiently performed by exploiting morphological
operators and thresholding techniques. As a matter of
example, in [4] morphological connected operators
designed by means of general size, connectivity
and motion criteria are used in the context of target
tracking in FLIR imagery to reduce the background
clutter. In [24] and [25], morphological filters are
used in conjunction with multiscale decomposition
and adaptive thresholding techniques to enhance
signal-to-noise ratio and detect targets coarse-to-fine.
Detected targets are then tracked using different
strategies, e.g. based on spatiotemporal correlation
techniques, on the mean-shift algorithm, etc. In
particular, with the mean-shift algorithm, tracking
is achieved by iteratively translating a kernel in the
image space such that the past and current target
observations are similar [7]. Mean-shift based
approaches are widely used both in visible and IR
imagery, as they provide for a general optimization
solution that is independent from target features
[28, 29].
However, the standard mean-shift algorithm is
affected by a severe drawback, i.e., it requires that at
least some part of the target in the successive frame
reside inside the kernel [6]. When imagery is affected
by strong sensor ego-motion, this assumption might
not hold; moreover, in [17] it was demonstrated
that even in case of clutter, occlusion, or rapidly
moving objects the mean-shift cannot guarantee
global optimality, resulting in failures that cannot
be recovered. Adaptive versions of the mean-shift
approach capable of tackling the above limitations
are presented in [26] and [27]. In particular, in [27]
multiple features are extracted from both target and
background during the tracking process, and an online
feature ranking method is deployed to adaptively
select the most discriminative features, which are
then utilized for mean-shift iteration in the next
frame. Nonetheless, when shape and size of the target
change, the additional problem of adaptively adjusting
the size of the region used to track the target is
introduced. The above issue can be tackled by taking
into account the complete contour of the tracked
object. For instance, in [30] a Bayesian approach
using the probability density function of texture and
color features is adopted to dynamically adapt the
target window as the camera approaches the target.
A similar approach exploiting binary classification
techniques is presented in [11], whereas in [15] a
deformable template representation accommodating
both geometric and signature variability is discussed.
Target tracking in FLIR images has been
historically analyzed also as a template correlation
process [19], in which a template containing a
representative target signature is first initialized.
Tracking is then performed by trying to maximize
the matching with the above template. When target
signature does not exhibit significant temporal
evolution, after initialization the template can be
maintained unchanged for the whole tracking phase.
However, one of the main challenges in FLIR target
tracking is that two consecutive frames may exhibit
little correlation between regions of interest (i.e.,
image regions containing the target). In this scenario,
in order to prevent the template from becoming
stale (i.e., it no longer provides for an accurate
characterization of the observed target) specific
strategies addressing the so-called “template update
problem” have to be designed [12]. For instance, in
[14] and [16] a dual domain approach is presented
to improve tracking accuracy by automatically
detecting when a template update is needed through a
combination of pixel domain and modulation domain
correlation trackers. As the failure modes in the two
domains are rarely the same, the above technique
proved to be able to hold on to the target in almost all
the considered frames from the experimental data set.
The tracking technique selected as a reference for
the present work represents an attractive solution to
the template update problem with a specific focus
on computational complexity saving [2, 3]. In fact,
a compact signature based on target IVF is generally
used for fast frame-to-frame detection and tracking.
When reliability of the IVF-based tracking cannot
be guaranteed, the TM phase is executed using an
up-to-date template. The above approach proved to be
capable of effectively dealing with challenging FLIR
sequences with low signal-to-noise ratios and high
ego-motion. Although tracking accuracy is strongly
dependent on the strategy used for activating the TM
phase, a high processing speed can be achieved by
getting rid of redundant image processing steps and
by minimizing the number of computations. Further
optimization approaches of the technique in [2, 3]
exploiting genetic algorithms and telemetry data based
ego-motion compensation strategies can be found in
[18] and [21], respectively.
The approaches summarized above are generally
referred to as target representation and localization
based techniques, as they follow a strategy that tries
to identify target position based on its appearance.
Alternative strategies, which are often used (possibly
in combination with the above techniques), are
represented by the so-called filtering and data
association algorithms [6]. In this case, prediction
strategies are applied to estimate updated target
features including position, appearance, size, etc.
1468 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 47, NO. 2 APRIL 2011
Fig. 1. False alarms generated in tracking phases (black boxes), and correct target positions (white boxes). (a) Sequence mwir 1415,
Kalman filters have traditionally been used in
automatic target tracking with the aim of estimating
the target kinematics [20, 8]. In [26], an adaptive
prediction method relying on a Kalman filter has been
integrated into a mean-shift based tracking algorithm
to adaptively predict the initial searching point.
Recently, Kalman filters have also been employed for
designing on-the-fly appearance learning algorithms
[23]. An alternative to Kalman filters is represented
by particle filters; despite their higher computational
complexity, they can provide for higher accuracy, thus
possibly resulting in improved tracking effectiveness.
As a matter of example, in [5] dynamic models taking
into account target aspect change, target motion,
and background clutter are combined to improve
tracking performance and let the particle filter deal
also with strong ego-motion of the sensor. In [22] and
[13], multiple particle filters are run concurrently to
estimate target appearance and kinematics using a
state vector encompassing target position, velocity,
and size both in the pixel and modulation domains.
The methodology presented in this work moves
from the strategies discussed above and is targeted
at improving the robustness of tracking algorithms
relying on the TM technique by incorporating an
adaptive activation metric based on motion prediction.
As the proposed approach builds upon a high
performance TD technique that aims at keeping
computational complexity low by trying to limit
the overhead associated with the intervention of
the TM algorithm, a probability-based strategy
relying on an efficient Kalman estimator is used
to control the activation of the recovery phase.
Although in the current experimental setup, targeted
to observer-in-the-loop high speed target tracking
applications, the selected prediction strategy proved
to be capable of achieving effective performance,
different filtering techniques could possibly be
considered for dealing with alternative (e.g. with
sensor-in-the-loop) scenarios.
III. FALSE ALARMS
In order to clarify the relation between the metric
for TM activation and the overall algorithm behavior,
before discussing the specific implementation and
introducing the designed strategy for improvement,
it is better to focus on the categories of false alarms
that may occur during the tracking phases.
Basically, the need for a recovery is generally due
to three main failure modes [9], which are illustrated
in Fig. 1 (making reference to sample frames from
sequences in the AMCOM data set). The first failure
mode (Fig. 1(a)) is related to abrupt discontinuities
in target position due to camera ego-motion. The
second failure mode (Fig. 1(b)) occurs when the
TD phase fails in determining the correct target
location (e.g., because of the presence of other targets
the third failure mode (Fig. 1(c)—(d)), known as the
“drifting problem” [10], is related to the behavior of
the TM phase. Specifically, when the TM technique
is activated, it uses information from the target in
the previously determined position to recover from
the false alarm. However, because of changes in
target shape, size, orientation, etc., a partially wrong
template can be gathered in some cases; when this
occurs over consecutive frames, small tracking errors
are accumulated and recovery leads to (ever more)
incorrect results.
In summary, on the one hand an effective
activation metric should allow the TM to intervene
every time a false alarm belonging to the first and
second categories occurs. On the other hand, in
the above situations the metric should be capable
of detecting those cases in which the activation of
the TM technique would lead (possibly after some
frames) to a target loss due to the third failure mode.
Furthermore, in high speed tracking scenarios, the
TM activation metric should also take into account
the trade-off between algorithm robustness and
computational complexity.
IV. REFERENCE IMPLEMENTATION
In order to present the proposed technique and
compare it with existing approaches, the tracking
algorithm described in [3] was selected. It is worth
observing that, as the proposed approach deals with
the TM phase, it could be easily applied to other
works relying on a different implementation of the
TD phase. In [3], the detection phase is based, as in
many IR target tracking algorithms, on the hot spot
technique, which assumes that the target is brighter
than the background and noise effects. Specifically,
in order to determine the new target position, a
target intensity variation function characterizing the
similarity with respect to the local maximum value
describing the target in the previous frame is used.
The IVF technique requires two parameters to be
defined, i.e., the target window size and the subframe
size. The target window roughly corresponds to
the rectangular area including the target, while the
subframe defines the area where the target is searched
for in the new frame. The subframe size has to be
large enough to allow for abrupt camera movements
(in [3], it was set to 33 × 33 pixels).
When tracking has to be performed on a given
frame n, the subframe is positioned at the reference
target location from the previous frame ($p_{n-1}$) and the
intensity variation function $F_n$ is computed for the
subframe matrix $S_n$ as
$$F_n(v,z) = \frac{1}{\Lambda} \sum_{j=1}^{l} \sum_{i=1}^{k} \left| S_n(i+v, j+z) - \omega_{n-1} \right| \qquad (1)$$
where (v,z) are the spatial coordinates in the subframe
and $\Lambda$ is the size (k × l) of the target window; finally,
$\omega_{n-1}$ is defined as a 3 × 3 pixels matrix representing
the local maximum of the target window in the
previous (n − 1)th frame.
To represent the candidate target coordinates in the
form of a peak value, an exponential function called
correlation output plane (CF) is used, which is defined
as
$$\mathrm{CF}(v,z) = e^{-\lambda F_n(v,z)} \qquad (2)$$
where $\lambda$ is a constant value and (v,z) represent the
coordinates of the target window in the subframe.
The CF generates the maximum peak (in the subframe
coordinate system) where the candidate target intensity
variation is close to that in the reference target
position. The coordinates of the peak are therefore
converted into the candidate target position $\bar{p}_n^{\mathrm{IVF}}$.
On the other hand, the TM technique can be
modeled as
$$T_n(v,z) = \frac{1}{\Lambda} \sum_{j=1}^{l} \sum_{i=1}^{k} \left| S_n(i+v, j+z) - W_{n-1} \right| \qquad (3)$$
where (v,z), $\Lambda$, and (k, l) have the same meaning as in
(1). However, in the TM phase, the target window
W is used instead of the local maximum window
matrix $\omega$. Although the use of the target window
increases the computational complexity, W provides
essential information concerning target shape. In fact,
since W is defined as two to three pixels larger than the
reference target size, it contains both target and
surrounding background information, which makes it
possible to distinguish between target and nontarget objects.
As with the IVF technique, a correlation output plane
can be computed by using (2) and replacing $F_n(v,z)$
with $T_n(v,z)$; the candidate target position $\bar{p}_n^{\mathrm{TM}}$ is
then located where the correlation output plane yields
the maximum peak (after conversion to the image
coordinate system).
Fig. 2. Application of reference tracking technique to sample
frame from sequence lwir 1520. (a) Frame 119. (b) Results
generated by IVF algorithm.
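Equations (1)-(3) share the same structure: a mean absolute difference between a reference patch and every same-sized patch of the subframe, mapped to a peak through the exponential in (2). The following is a minimal pure-Python sketch under stated assumptions, not the implementation of [3]. The function names, the toy data, and the value of `lam` are hypothetical, and the sum is taken over the reference patch itself (3 × 3 for the hot-spot matrix, larger for the target window), since the text does not spell out how the 3 × 3 matrix is combined with the k × l window.

```python
import math

def variation_map(subframe, ref):
    """Mean absolute difference between `ref` and each same-sized patch of
    `subframe`, in the spirit of (1) and (3). Offsets (v, z) index rows/columns."""
    k, l = len(ref), len(ref[0])
    area = k * l  # plays the role of Lambda (assumed: size of the reference patch)
    out = []
    for v in range(len(subframe) - k + 1):
        row = []
        for z in range(len(subframe[0]) - l + 1):
            total = sum(abs(subframe[v + i][z + j] - ref[i][j])
                        for i in range(k) for j in range(l))
            row.append(total / area)
        out.append(row)
    return out

def correlation_plane(fmap, lam=0.1):
    """Exponential mapping of (2): a small variation gives a value close to 1."""
    return [[math.exp(-lam * f) for f in row] for row in fmap]

def peaks(plane):
    """All offsets (v, z) attaining the maximum of the plane."""
    top = max(max(row) for row in plane)
    return [(v, z) for v, row in enumerate(plane)
            for z, val in enumerate(row) if val == top]

def demo():
    # Toy subframe: a 3x3 target (rows 2-4, cols 1-3) and a larger 5x5 hot object.
    sub = [[0] * 12 for _ in range(8)]
    for r in range(2, 5):
        for c in range(1, 4):
            sub[r][c] = 9
    for r in range(1, 6):
        for c in range(6, 11):
            sub[r][c] = 9
    omega = [[9] * 3 for _ in range(3)]      # hot-spot reference (hypothetical values)
    window = [[0] * 5 for _ in range(5)]     # target plus a one-pixel background border
    for r in range(1, 4):
        for c in range(1, 4):
            window[r][c] = 9
    ivf_peaks = peaks(correlation_plane(variation_map(sub, omega)))
    tm_peaks = peaks(correlation_plane(variation_map(sub, window)))
    return ivf_peaks, tm_peaks
```

In this toy case the hot-spot reference produces several equal peaks (the target and every 3 × 3 patch inside the larger object), whereas the window that includes a background border produces a single peak. This is only meant to illustrate why shape information can remove an ambiguity; real FLIR data will not match exactly.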
In Fig. 2, the application of the technique in [3] to
a sample frame from the AMCOM data set is shown.
The black box indicates the (wrong) candidate target
position generated by the IVF technique, whereas the
white box points out the real target position that can
be properly identified through the activation of the
TM-based recovery phase. The correlation output
planes resulting from the application of the IVF
and TM techniques are illustrated in Fig. 3. From
Fig. 3(a) it can be observed that the IVF technique
generated two peaks; the above ambiguity is properly
solved through the application of the TM algorithm
(Fig. 3(b)). It is worth observing that, although the use
of a larger neighborhood (e.g. 5 × 5, 7 × 7, etc.) in the
IVF technique could possibly lead to different results,
in this work the high performance configuration used
in [3] was chosen to allow for a fair comparison.
In [3] the TM activation strategy relies on a
controller module that identifies the occurrence of a
false alarm in the TD phase by using a metric based
on the Euclidean distance $\beta$ (Fig. 4(a)) between the
candidate target position in the current frame ($\bar{p}_n^{\mathrm{IVF}}$)
and the reference target position in the previous frame
($p_{n-1}$). If the distance exceeds a constant threshold
(expressed in number of pixels) specified for the
considered image sequence, the TM phase is activated,
$\bar{p}_n^{\mathrm{TM}}$ is computed, and the position closest to $p_{n-1}$ is
selected as the reference target position (i.e., either
$\bar{p}_n^{\mathrm{IVF}}$ or $\bar{p}_n^{\mathrm{TM}}$). However, according to the authors,
finding the best criterion valid for all the sequences is
a hard task to accomplish. Thus, in [3] the threshold is
chosen by considering the trade-off between tracking
effectiveness and computational complexity. Although
the authors demonstrated the robustness of their
technique with selected thresholds, the need for a
specific parameter configuration depending on the
particular sequence being considered could limit
the applicability of such a technique when high speed
automatic target tracking scenarios are considered.
Fig. 3. Correlation output planes produced by IVF and TM techniques on sequence lwir 1520, frame 119. (a) False alarm due to
presence of two maximum peaks. (b) Improved result obtained by TM technique.
Fig. 4. Strategies for activation of TM phase. (a) Euclidean
distance. (b) Motion prediction.
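The distance-based controller described for [3] can be sketched as follows. This is an illustrative reading of the text, not the code of [3]: `select_position`, its arguments, and the threshold values a caller would pass are hypothetical.

```python
import math

def select_position(p_prev, p_ivf, run_tm, threshold):
    """Return (reference position, tm_activated) for the current frame.

    p_prev    -- reference target position in the previous frame, (x, y)
    p_ivf     -- candidate position produced by the detection phase
    run_tm    -- zero-argument callable returning the TM candidate; it is
                 invoked only when needed, since TM is the expensive phase
    threshold -- per-sequence distance threshold, in pixels
    """
    beta = math.dist(p_prev, p_ivf)      # Euclidean distance between positions
    if beta <= threshold:
        return p_ivf, False              # detection accepted, TM skipped
    p_tm = run_tm()
    # Keep whichever candidate lies closest to the previous reference position.
    closest = min((p_ivf, p_tm), key=lambda p: math.dist(p_prev, p))
    return closest, True
```

Passing the TM step as a callable keeps the costly correlation out of the common path, which mirrors the role of TM as a recovery phase. The fixed `threshold` argument is exactly the per-sequence parameter whose tuning is discussed above.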
V. MOTION PREDICTION-BASED METRIC
In this paper, a different TM activation metric is
proposed, aimed at removing the need for identifying
a particular threshold for any image sequence. By
exploiting a motion prediction technique, the new
strategy is capable of linking the activation of the
TM technique to the history of target positions,
thus adapting its behavior to the particular frame or
sequence being considered (Fig. 4(b)).
Basically, for a given frame n, besides the
candidate target position $\bar{p}_n^{\mathrm{IVF}}$ obtained through
the IVF, a predicted target position $\hat{p}_n$ is estimated
using a Kalman filter (that provided the best results
even in sequences affected by significant camera
ego-motion). The candidate target position $\bar{p}_n^{\mathrm{IVF}}$,
through its associated motion vector ($p_{n-1}$, $\bar{p}_n^{\mathrm{IVF}}$), is
then evaluated against the predicted motion vector
($p_{n-1}$, $\hat{p}_n$).
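The text states only that a Kalman filter supplies the predicted position; it gives neither the state model nor the noise settings. As a hedged illustration, the sketch below assumes a constant-velocity model with one independent filter per image axis and a time step of one frame. The class `AxisKalman`, the function `predict_next`, and the values of `q`, `r`, and `v_var` are all hypothetical.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (assumed model)."""

    def __init__(self, z0, q=0.01, r=1.0, v_var=100.0):
        self.p, self.v = float(z0), 0.0        # position and velocity estimates
        self.P = [[r, 0.0], [0.0, v_var]]      # estimate covariance
        self.q, self.r = q, r                  # process / measurement noise (assumed)

    def predict(self):
        """Advance the state by one frame and return the predicted position."""
        (a, b), (c, d) = self.P
        self.p += self.v
        # P <- F P F^T + Q with F = [[1, 1], [0, 1]] and Q = diag(q, q)
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.p

    def update(self, z):
        """Correct the predicted state with the measured position z."""
        (a, b), (c, d) = self.P
        s = a + self.r                         # innovation variance
        k0, k1 = a / s, c / s                  # Kalman gains for position, velocity
        y = z - self.p                         # innovation
        self.p += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P with H = [1, 0]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def predict_next(track):
    """Predict the next (x, y) position from a list of past reference positions."""
    fx, fy = AxisKalman(track[0][0]), AxisKalman(track[0][1])
    for x, y in track[1:]:
        fx.predict()
        fx.update(x)
        fy.predict()
        fy.update(y)
    return fx.predict(), fy.predict()
```

For a target that has moved at constant velocity, `predict_next` returns a point close to the linear extrapolation of the track. The predicted motion vector is then the displacement from the last reference position to that point.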
According to the terminology used in Fig. 4(b),
the reliability of candidate target position is measured