Improving Robustness of Infrared Target Tracking Algorithms Based on Template Matching


A methodology for improving the robustness of target tracking

algorithms for forward looking infrared (FLIR) imagery is

presented. The designed technique exploits a motion prediction

metric to identify the occurrence of false alarms and to control

the activation of a template matching (TM) based phase.

The proposed approach is particularly suited to high speed

algorithms in which tracking is generally performed by using

a computationally efficient target detection (TD) step and TM

only acts as a recovery means. In such frameworks, the activation

strategy has a strong impact on tracking performance, as an

improper activation pattern could both affect execution speed

and result in critical tracking failures. The designed strategy

is compared with a reference implementation relying on a

distance-based activation logic, showing improved performance

and self-adaptability to variations both in image sequence and

target characteristics, which would be required in most high

speed automatic target tracking scenarios.

I. INTRODUCTION

Target tracking in forward looking infrared (FLIR)

imagery is a challenging task. Image sequences

obtained from IR sensors are often characterized

by low signal-to-noise ratio and heavy background

cluttering. Furthermore, when scenarios with

nonstationary cameras are considered, tracking

requires properly dealing with sensor ego-motion

through suitable estimation and compensation

techniques. Moreover, further challenges are posed by

imagery with multiple and possibly moving target and

nontarget objects, which can blend into the background,

change their signature, size, shape, and even overlap

during their motion. Finally, specific applications

could introduce cumbersome real-time constraints,

thus requiring tracking techniques with a reduced

computational footprint.

Some techniques providing significant results

under the above conditions already exist. In particular,

techniques proposed in [1]–[3] are capable of ensuring

high speed processing of FLIR image sequences

recorded from an airborne platform; this goal is

achieved by splitting tracking operations into two

main conceptual phases, namely a target detection

(TD) phase and a template matching (TM) phase.

Manuscript received October 13, 2009; revised May 19, 2010;

released for publication August 6, 2010.

IEEE Log No. T-AES/47/2/940858.

Refereeing of this contribution was handled by D. Salmond.

0018-9251/11/$26.00 © 2011 IEEE

The first phase is expected to be carried out in a very

efficient way, e.g. by exploiting techniques based on

fringe-adjusted joint transform correlation (FJTC) [1]

or intensity variation function (IVF) [2, 3]. Despite

their impressive detection performance and limited

computational requirements, the above techniques

may occasionally generate false alarms in the TD

phase. In these cases, a second correlation-based

recovery phase is activated to restore target position

and avoid possible target losses. By relying on target

shape information, this second phase is capable of

coping with marked variations in target signature and

also acts as a powerful ego-motion compensation

mechanism (at the cost of a significantly higher

complexity). Thus, for a given TD strategy, tracking

robustness and performance are strongly related to the

metric used for controlling TM intervention.

In this paper, a probabilistic metric relying on

motion prediction is proposed, which provides improved

robustness with respect to existing implementations

of IVF-based target tracking algorithms.

Basically, motion prediction is used to estimate the

probability associated with a given target position;

this information is then considered with respect to a

confidence index that is used to control the activation

of the TM phase.

Experimental tests were performed using FLIR

image sequences from the Army Missile Command

(AMCOM). Results obtained by testing the proposed

solution with respect to an IVF-based reference

implementation showed a significantly improved

ability of the tracking algorithm to cope with sudden

frame and target changes (basically due to the low

signal-to-noise ratio as well as to variations in size,

position and thermal imprint). Moreover, the designed

algorithm exhibited an increased self-adaptability to

target and sequence characteristics that could be of

particular interest when automatic target tracking

scenarios are considered (for example, a

single confidence index allowed correct tracking of

all the targets in the image sequences belonging to the

experimental data set).

This paper has been organized as follows.

Section II reviews the main works related to IR

target tracking. In Section III, issues related to

false alarm generation are specifically addressed.

Section IV describes the implementation chosen

as a reference for comparing with the proposed

methodology. The designed metric for controlling the

activation of the TM phase is described in Section V.

Finally, experimental results are discussed in depth in

Section VI.

II. BACKGROUND

Although a number of techniques for tracking

targets in visual images already exist, only a limited

CORRESPONDENCE 1467

number of works address the particular issue of

target tracking for FLIR imagery. Moreover, some

of the above approaches can only be used when the

sensor ego-motion is very limited or when target

features cannot change noticeably during the tracking.

However, in real scenarios, the above assumptions

may not be applicable [29].

When targets do not change their size significantly

and ego-motion is small, detection and tracking can

be efficiently performed by exploiting morphological

operators and thresholding techniques. For

example, in [4] morphological connected operators

designed by means of general size, connectivity

and motion criteria are used in the context of target

tracking in FLIR imagery to reduce the background

clutter. In [24] and [25], morphological filters are

used in conjunction with multiscale decomposition

and adaptive thresholding techniques to enhance

signal-to-noise ratio and detect targets coarse-to-fine.

Detected targets are then tracked using different

strategies, e.g. based on spatiotemporal correlation

techniques, on the mean-shift algorithm, etc. In

particular, with the mean-shift algorithm, tracking

is achieved by iteratively translating a kernel in the

image space such that the past and current target

observations are similar [7]. Mean-shift based

approaches are widely used both in visible and IR

imagery, as they provide for a general optimization

solution that is independent from target features

[28, 29].

However, the standard mean-shift algorithm is

affected by a severe drawback, i.e., it requires that at

least some part of the target in the successive frame

reside inside the kernel [6]. When imagery is affected

by strong sensor ego-motion, this assumption might

not hold; moreover, in [17] it was demonstrated

that even in case of clutter, occlusion, or rapidly

moving objects the mean-shift cannot guarantee

global optimality, resulting in failures that cannot

be recovered. Adaptive versions of the mean-shift

approach capable of tackling the above limitations

are presented in [26] and [27]. In particular, in [27]

multiple features are extracted from both target and

background during the tracking process, and an online

feature ranking method is deployed to adaptively

select the most discriminative features, which are

then utilized for mean-shift iteration in the next

frame. Nonetheless, when shape and size of the target

change, the additional problem of adaptively adjusting

the size of the region used to track the target is

introduced. The above issue can be tackled by taking

into account the complete contour of the tracked

object. For instance, in [30] a Bayesian approach

using the probability density function of texture and

color features is adopted to dynamically adapt the

target window as the camera approaches the target.

A similar approach exploiting binary classification

techniques is presented in [11], whereas in [15] a

deformable template representation accommodating both geometric and signature variability is discussed.

Target tracking in FLIR images has been

historically analyzed also as a template correlation

process [19], in which a template containing a

representative target signature is first initialized.

Tracking is then performed by trying to maximize

the matching with the above template. When target

signature does not exhibit significant temporal

evolution, after initialization the template can be

maintained unchanged for the whole tracking phase.

However, one of the main challenges in FLIR target

tracking is that two consecutive frames may exhibit

little correlation between regions of interest (i.e.,

image regions containing the target). In this scenario,

in order to prevent the template from becoming

stale (i.e., it no longer provides for an accurate

characterization of the observed target) specific

strategies addressing the so-called “template update

problem” have to be designed [12]. For instance, in

[14] and [16] a dual domain approach is presented

to improve tracking accuracy by automatically

detecting when a template update is needed through a

combination of pixel domain and modulation domain

correlation trackers. As the failure modes in the two

domains are rarely the same, the above technique

proved able to hold on to the target in almost all

the considered frames from the experimental data set.

The tracking technique selected as a reference for

the present work represents an attractive solution to

the template update problem with a specific focus

on computational complexity saving [2, 3]. In fact,

a compact signature based on target IVF is generally

used for fast frame-to-frame detection and tracking.

When reliability of the IVF-based tracking cannot

be guaranteed, the TM phase is executed using an

up-to-date template. The above approach proved to be

capable of effectively dealing with challenging FLIR

sequences with low signal-to-noise ratios and high

ego-motion. Although tracking accuracy is strongly

dependent on the strategy used for activating the TM

phase, a high processing speed can be achieved by

getting rid of redundant image processing steps and

by minimizing the number of computations. Further

optimization approaches of the technique in [2, 3]

exploiting genetic algorithms and telemetry data based

ego-motion compensation strategies can be found in

[18] and [21], respectively.

The approaches summarized above are generally referred to as target representation and localization-based techniques, as they follow a strategy that tries to identify target position based on its appearance. Alternative strategies, which are often used (possibly in combination with the above techniques), are represented by the so-called filtering and data association algorithms [6]. In this case, prediction strategies are applied to estimate updated target features, including position, appearance, size, etc.

1468 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 47, NO. 2 APRIL 2011

Fig. 1. False alarms generated in tracking phases (black boxes), and correct target positions (white boxes). (a) Sequence mwir 1415,

frame 4. (b) Sequence lwir 1618, frame 9. (c)–(d) Sequence lwir 1608, frames 7 and 19.

Kalman filters have traditionally been used in automatic target tracking with the aim of estimating the target kinematics [20, 8]. In [26], an adaptive prediction method relying on a Kalman filter has been integrated into a mean-shift based tracking algorithm to adaptively predict the initial searching point. Recently, Kalman filters have also been employed for designing on-the-fly appearance learning algorithms [23]. An alternative to Kalman filters is represented by particle filters; despite their higher computational complexity, they can provide for higher accuracy, thus possibly resulting in improved tracking effectiveness. As a matter of example, in [5] dynamic models taking into account target aspect change, target motion, and background clutter are combined to improve tracking performance and let the particle filter deal also with strong ego-motion of the sensor. In [22] and [13], multiple particle filters are run concurrently to estimate target appearance and kinematics using a state vector encompassing target position, velocity, and size both in the pixel and modulation domains.

The methodology presented in this work moves from the strategies discussed above and is targeted at improving the robustness of tracking algorithms relying on the TM technique by incorporating an adaptive activation metric based on motion prediction. As the proposed approach builds upon a high performance TD technique that aims at keeping computational complexity low by trying to limit the overhead associated with the intervention of the TM algorithm, a probability-based strategy relying on an efficient Kalman estimator is used to control the activation of the recovery phase. Although in the current experimental setup, targeted to observer-in-the-loop high speed target tracking applications, the selected prediction strategy proved to be capable of achieving effective performance, different filtering techniques could possibly be considered for dealing with alternative (e.g. with sensor-in-the-loop) scenarios.

III. FALSE ALARMS

In order to clarify the relation between the metric

for TM activation and the overall algorithm behavior,

before discussing the specific implementation and

introducing the designed strategy for improvement,

it is better to focus on the categories of false alarms

that may occur during the tracking phases.

Basically, the need for a recovery is generally due

to three main failure modes [9], which are illustrated

in Fig. 1 (making reference to sample frames from

sequences in the AMCOM data set). The first failure

mode (Fig. 1(a)) is related to abrupt discontinuities

in target position due to camera ego-motion. The

second failure mode (Fig. 1(b)) occurs when the

TD phase fails in determining the correct target

location (e.g., because of the presence of other targets

as well as of other unwanted image artifacts like

blurring, cluttering, intensity variation, etc.). Finally,

the third failure mode (Fig. 1(c)–(d)), known as the

“drifting problem” [10], is related to the behavior of

the TM phase. Specifically, when the TM technique

is activated, it uses information from the target in

the previously determined position to recover from

the false alarm. However, because of changes in

target shape, size, orientation, etc., a partially wrong

template can be gathered in some cases; when this

occurs over consecutive frames, small tracking errors

are accumulated and recovery leads to (ever more)

incorrect results.

In summary, on the one hand an effective

activation metric should allow the TM to intervene

every time a false alarm belonging to the first or

second category occurs. On the other hand, in

the above situations the metric should be capable

of detecting those cases in which the activation of

the TM technique would lead (possibly after some

frames) to a target loss due to the third failure mode.

Furthermore, in high speed tracking scenarios, the

TM activation metric should also take into account

the trade-off between algorithm robustness and

computational complexity.

IV. REFERENCE IMPLEMENTATION

In order to present the proposed technique and

compare it with existing approaches, the tracking

algorithm described in [3] was selected. It is worth


observing that, as the proposed approach deals with

the TM phase, it could be easily applied to other

works relying on a different implementation of the

TD phase. In [3], the detection phase is based, as in

many IR target tracking algorithms, on the hot spot

technique, which assumes that the target is brighter

than the background and noise effects. Specifically,

in order to determine the new target position, a

target intensity variation function characterizing the

similarity with respect to the local maximum value

describing the target in the previous frame is used.

The IVF technique requires two parameters to be

defined, i.e., the target window size and the subframe

size. The target window roughly corresponds to

the rectangular area including the target, while the

subframe defines the area where the target is searched

for in the new frame. The subframe size has to be

large enough to allow for abrupt camera movements

(in [3], it was set to 33 × 33 pixels).

When tracking has to be performed on a given frame $n$, the subframe is positioned at the reference target location from the previous frame ($p_{n-1}$) and the intensity variation function $F_n$ is computed for the subframe matrix $S_n$ as

$$F_n(v,z) = \frac{1}{\Lambda} \sum_{j=1}^{l} \sum_{i=1}^{k} \left| S_n(i+v, j+z) - \omega_{n-1} \right| \tag{1}$$

where $(v,z)$ are the spatial coordinates in the subframe and $\Lambda$ is the size ($k \times l$) of the target window; finally, $\omega_{n-1}$ is defined as a 3 × 3 pixel matrix representing the local maximum of the target window in the previous $(n-1)$th frame.

To represent the candidate target coordinates in the form of a peak value, an exponential function called correlation output plane (CF) is used, which is defined as

$$CF(v,z) = e^{-\lambda F_n(v,z)} \tag{2}$$

where $\lambda$ is a constant value and $(v,z)$ represent the

coordinates of the target window in the subframe.

The CF generates the maximum peak (in the subframe

coordinate system) where the candidate target intensity

variation is close to that in the reference target

position. The coordinates of the peak are therefore

converted into the candidate target position $\bar{p}_n^{IVF}$.

On the other hand, the TM technique can be

modeled as

$$T_n(v,z) = \frac{1}{\Lambda} \sum_{j=1}^{l} \sum_{i=1}^{k} \left| S_n(i+v, j+z) - W_{n-1} \right| \tag{3}$$

where $(v,z)$, $\Lambda$, and $(k,l)$ have the same meaning as in

(1). However, in the TM phase, the target window

$W$ is used instead of the local maximum window

matrix $\omega$. Although the use of the target window

increases the computational complexity, W provides

essential information concerning target shape. In fact,

since $W$ is defined as two to three pixels larger than the

Fig. 2. Application of reference tracking technique to sample

frame from sequence lwir 1520. (a) Frame 119. (b) Results

generated by IVF algorithm.

reference target size, it contains both target and

surrounding background information, which makes it

possible to distinguish between target and nontarget objects.

As with the IVF technique, a correlation output plane

can be computed by using (2) and replacing $F_n(v,z)$

with $T_n(v,z)$; the candidate target position $\bar{p}_n^{TM}$ is

then located where the correlation output plane yields

the maximum peak (after conversion to the image

coordinate system).
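The computation in (1)-(3) can be sketched in a few lines of plain Python. This is only an illustrative reading of the equations, not the implementation of [3]: the function names are hypothetical, $\omega_{n-1}$ is simplified to a single scalar intensity, the template $W_{n-1}$ is compared pixel by pixel, and window offsets are zero-based. The toy subframe contains a bright square (nontarget) and a diagonal target, so that the intensity-only criterion locks onto the wrong object while the template recovers the right one.

```python
import math

def difference_plane(subframe, k, l, reference):
    """Mean absolute difference for every window offset (v, z).

    `reference` is a scalar (IVF case, standing in for omega) or a
    k-by-l nested list (TM case, standing in for the template W).
    """
    rows, cols = len(subframe), len(subframe[0])
    area = k * l  # Lambda in (1) and (3)
    plane = []
    for v in range(rows - k + 1):
        row = []
        for z in range(cols - l + 1):
            total = 0.0
            for i in range(k):
                for j in range(l):
                    ref = reference[i][j] if isinstance(reference, list) else reference
                    total += abs(subframe[v + i][z + j] - ref)
            row.append(total / area)
        plane.append(row)
    return plane

def correlation_plane(plane, lam=1.0):
    """CF(v, z) = exp(-lambda * F(v, z)), as in (2)."""
    return [[math.exp(-lam * f) for f in row] for row in plane]

def peak(cf):
    """Offset (v, z) where the correlation output plane is maximum."""
    offsets = ((v, z) for v in range(len(cf)) for z in range(len(cf[0])))
    return max(offsets, key=lambda p: cf[p[0]][p[1]])

# Toy 6x6 subframe: bright 2x2 square at the top left, diagonal target lower right.
subframe = [
    [9, 9, 0, 0, 0, 0],
    [9, 9, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 0, 0],
    [0, 0, 0, 0, 9, 0],
    [0, 0, 0, 0, 0, 0],
]
template = [[9, 0], [0, 9]]  # hypothetical target window W

ivf_peak = peak(correlation_plane(difference_plane(subframe, 2, 2, 9.0)))
tm_peak = peak(correlation_plane(difference_plane(subframe, 2, 2, template)))
```

In this toy case the scalar reference is attracted by the uniformly bright square, whereas the template, which also encodes shape, peaks on the diagonal target.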

In Fig. 2, the application of the technique in [3] to

a sample frame from the AMCOM data set is shown.

The black box indicates the (wrong) candidate target

position generated by the IVF technique, whereas the

white box points out the real target position that can

be properly identified through the activation of the

TM-based recovery phase. The correlation output

planes resulting from the application of the IVF

and TM techniques are illustrated in Fig. 3. From

Fig. 3(a) it can be observed that the IVF technique

generated two peaks; the above ambiguity is properly

solved through the application of the TM algorithm

(Fig. 3(b)). It is worth observing that, although the use

of a larger neighborhood (e.g. 5 × 5, 7 × 7, etc.) in the IVF technique could possibly lead to different results,

in this work the high performance configuration used

in [3] was chosen to allow for a fair comparison.

In [3] the TM activation strategy relies on a

controller module that identifies the occurrence of a

false alarm in the TD phase by using a metric based

on the Euclidean distance $\beta$ (Fig. 4(a)) between the

candidate target position in the current frame ($\bar{p}_n^{IVF}$)

and the reference target position in the previous frame

($p_{n-1}$). If the distance exceeds a constant threshold (expressed in number of pixels) specified for the

considered image sequence, the TM phase is activated,

$\bar{p}_n^{TM}$ is computed, and the position closest to $p_{n-1}$ is

selected as the reference target position (i.e., either

$\bar{p}_n^{IVF}$ or $\bar{p}_n^{TM}$). However, according to the authors,

finding the best criterion valid for all the sequences is

hard to accomplish. Thus, in [3] the threshold is

chosen by considering the trade-off between tracking

effectiveness and computational complexity. Although


Fig. 3. Correlation output planes produced by IVF and TM techniques on sequence lwir 1520, frame 119. (a) False alarm due to

presence of two maximum peaks. (b) Improved result obtained by TM technique.

Fig. 4. Strategies for activation of TM phase. (a) Euclidean

distance. (b) Motion prediction.

the authors demonstrated the robustness of their

technique with selected thresholds, the need for a

specific parameter configuration depending on the

particular sequence being considered could limit

the applicability of such a technique when high speed

automatic target tracking scenarios are considered.
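The distance-based activation logic described above can be summarized by the following sketch. It only mirrors the textual description of the controller in [3]; the function name, the `run_tm` callback and the sample coordinates are hypothetical.

```python
import math

def euclidean_activation(p_prev, p_ivf, beta, run_tm):
    """Return (reference_position, tm_activated).

    p_prev: reference target position in the previous frame.
    p_ivf:  candidate position produced by the IVF step.
    beta:   fixed activation threshold, in pixels.
    run_tm: callable returning the TM candidate; called only when needed.
    """
    if math.dist(p_prev, p_ivf) <= beta:
        return p_ivf, False          # IVF candidate accepted, no recovery
    p_tm = run_tm()                  # costly template matching phase
    # Keep whichever candidate lies closest to the previous position.
    best = min((p_ivf, p_tm), key=lambda p: math.dist(p_prev, p))
    return best, True

accepted = euclidean_activation((10, 10), (10, 12), 5, lambda: (0, 0))
recovered = euclidean_activation((10, 10), (20, 10), 5, lambda: (11, 10))
```

Because `beta` is a constant, the same jump in position is treated identically whether the target is nearly still or moving fast, which is the limitation the motion prediction-based metric is meant to remove.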

V. MOTION PREDICTION-BASED METRIC

In this paper, a different TM activation metric is

proposed, aimed at removing the need for identifying

a particular threshold for each image sequence. By

exploiting a motion prediction technique, the new

strategy is capable of linking the activation of the

TM technique to the history of target positions,

thus adapting its behavior to the particular frame or

sequence being considered (Fig. 4(b)).

Basically, for a given frame $n$, besides the

candidate target position $\bar{p}_n^{IVF}$ obtained through

the IVF, a predicted target position $\hat{p}_n$ is estimated

using a Kalman filter (that provided the best results

even in sequences affected by significant camera

ego-motion). The candidate target position $\bar{p}_n^{IVF}$,

through its associated motion vector ($p_{n-1}$, $\bar{p}_n^{IVF}$), is then evaluated against the predicted motion vector

($p_{n-1}$, $\hat{p}_n$).
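The text does not specify the state model or the noise settings of the Kalman filter, so the following is only a minimal sketch under stated assumptions: a constant-velocity model with a one-frame time step, one independent filter per image axis, and hypothetical noise parameters `q` and `r`. The class name is also an illustration, not part of the original work.

```python
class ConstantVelocityKalman1D:
    """Scalar-measurement Kalman filter with state [position, velocity]."""

    def __init__(self, x0, q=1e-2, r=1.0):
        self.x = [x0, 0.0]                    # initial position, zero velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]     # state covariance
        self.q, self.r = q, r                 # assumed process/measurement noise

    def predict(self):
        """Propagate the state one frame ahead and return the predicted position."""
        pos, vel = self.x
        (a, b), (c, d) = self.P
        self.x = [pos + vel, vel]
        # P <- F P F^T + Q, with F = [[1, 1], [0, 1]] and Q = q * I
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.x[0]

    def update(self, z):
        """Correct the state with the reference position z chosen for the frame."""
        (a, b), (c, d) = self.P
        s = a + self.r                        # innovation covariance (H = [1, 0])
        k0, k1 = a / s, c / s                 # Kalman gain
        y = z - self.x[0]                     # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]

# Target moving 2 pixels per frame along x; positions 2, 4, ..., 20 are observed.
kf_x = ConstantVelocityKalman1D(0.0)
for z in range(2, 22, 2):
    kf_x.predict()
    kf_x.update(float(z))
predicted_x = kf_x.predict()   # plays the role of the x component of the prediction
```

Running a second instance on the y coordinate would give the full predicted position, from which the predicted motion vector length and direction follow.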

According to the terminology used in Fig. 4(b),

the reliability of candidate target position is measured

through a probabilistic approach as

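$$P(\bar{p}_n^{IVF}) = P(d_{IVF} \cap \alpha_{IVF}) = P(d_{IVF}) \times P(\alpha_{IVF} \mid d_{IVF}) \tag{4}$$

where $P(d_{IVF})$ and $P(\alpha_{IVF} \mid d_{IVF})$ are defined as

$$P(d_{IVF}) = \begin{cases} 1 - \dfrac{\hat{d} - d_{IVF}}{\hat{d}} & \text{if } d_{IVF} \le \hat{d} \\ 1 & \text{if } d_{IVF} = \hat{d} \\ 0 & \text{if } d_{IVF} = d_{max} \\ 1 - \dfrac{d_{IVF} - \hat{d}}{d_{max} - \hat{d}} & \text{otherwise} \end{cases} \tag{5}$$

$$P(\alpha_{IVF} \mid d_{IVF}) = \begin{cases} \dfrac{|\alpha_{IVF} - 180^\circ|}{180^\circ} \times \left( 1 - \dfrac{d_{max} - d_{IVF}}{d_{max}} \right) & \text{if } \alpha_{IVF} \ne 0,\ d_{IVF} > 0 \\ 1 & \text{otherwise} \end{cases} \tag{6}$$

and $d_{max}$ is the maximum distance at which the target

can be found given the size of the subframe.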

In (4), $P(d_{IVF})$ takes into account the relation

between the lengths of the candidate and predicted

motion vectors. Probability is maximized when

lengths are equal and it is minimized when $d_{IVF}$ is

maximum. Probability values in the range ]0,1[ are

linearly assigned based on the relative difference in

motion vector lengths. Similarly, $P(\alpha_{IVF})$ is mapped to

angular differences between the above motion vectors,

modulo $180^\circ$. In (6), angular differences are weighted to minimize the contribution of $P(\alpha_{IVF})$ on the overall

probability for small motion vector lengths.
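As a worked illustration of the length term in (5) and of the product in (4), the sketch below uses hypothetical function names and sample values ($\hat{d} = 5$ and $d_{max} = 16$ pixels are arbitrary). The angular term of (6) is deliberately not re-implemented and is passed in as a number; clamping lengths beyond $d_{max}$ to zero is an added assumption.

```python
def p_distance(d, d_hat, d_max):
    """Length term of (5).

    Equals 1 when the candidate length d matches the predicted length d_hat,
    and decreases linearly to 0 both at d = 0 and at d = d_max.
    """
    if d == d_hat:
        return 1.0
    if d >= d_max:                     # assumption: clamp anything at or past d_max
        return 0.0
    if d < d_hat:
        return 1.0 - (d_hat - d) / d_hat
    return 1.0 - (d - d_hat) / (d_max - d_hat)

def p_candidate(p_d, p_alpha_given_d):
    """Joint reliability of a candidate position, as in (4)."""
    return p_d * p_alpha_given_d

# Candidate vectors shorter and longer than the predicted 5-pixel motion.
shorter = p_distance(2.5, 5.0, 16.0)
longer = p_distance(10.5, 5.0, 16.0)
joint = p_candidate(shorter, 0.9)      # 0.9 is an arbitrary angular term
```

Both sample candidates sit halfway along their respective linear ramps, so they receive the same length probability even though their absolute errors differ.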


Fig. 5. Tracking results with Euclidean-based metric in [3] on sequence lwir 1913 for different values of $\beta$. For every $\beta$, the frame

where target loss occurs is shown (black boxes indicate the tracked position, whereas white boxes indicate the reference target position).

(a) $\beta = 2$, frame 89. (b) $\beta = 6$, frame 171. (c) $\beta = 10$, frame 147. (d) $\beta = 14$, frame 136.

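TABLE I

Relation Between $\beta$ and Tracking Performance using the Euclidean Distance-Based Activation Metric

| Sequence  | Target     | Frames | $\beta = 2$ | $\beta = 3$ | $\beta = 4$ | $\beta = 5$ | $\beta = 6$ | $\beta = 7$ | $\beta = 8$ |
|-----------|------------|--------|------|------|------|------|------|------|------|
| lwir 1520 | tank       | 215    | 134  | 76   | 33   | 26   | 16   | 15   | FAIL |
| lwir 15NS | tank       | 230    | FAIL | 94   | FAIL | 70   | FAIL | FAIL | FAIL |
| lwir 1604 | tank       | 279    | FAIL | FAIL | 77   | 71   | 66   | FAIL | FAIL |
| lwir 1608 | apc        | 81     | FAIL | 22   | FAIL | 6    | 4    | 2    | 2    |
|           | truck      | 101    | FAIL | FAIL | FAIL | FAIL | FAIL | FAIL | FAIL |
| lwir 1618 | apc        | 300    | FAIL | FAIL | 31   | 15   | 5    | 3    | 2    |
|           | truck      | 18     | 10   | FAIL | 2    | 1    | 1    | 0    | 0    |
| lwir 1701 | bradley    | 388    | FAIL | FAIL | 117  | 92   | 75   | 41   | 40   |
|           | pickup     | 31     | FAIL | 27   | 20   | FAIL | 0    | 0    | 0    |
| lwir 1720 | target/m60 | 778    | FAIL | 126  | 43   | 44   | FAIL | FAIL | FAIL |
| lwir 1807 | bradley    | 240    | FAIL | 102  | 94   | 79   | FAIL | FAIL | FAIL |
| lwir 1816 | tank       | 328    | FAIL | FAIL | FAIL | 96   | 74   | 56   | 46   |
|           | m60        | 208    | 131  | 106  | 31   | 27   | 27   | FAIL | FAIL |
| lwir 1906 | apc        | 208    | FAIL | 40   | 9    | 6    | 6    | 6    | 6    |
| lwir 1913 | apc        | 265    | FAIL | FAIL | 33   | 39   | 24   | 23   | 20   |
|           | tank       | 265    | FAIL | FAIL | FAIL | FAIL | FAIL | FAIL | FAIL |
| lwir 2115 | bradley    | 333    | FAIL | FAIL | FAIL | 23   | 5    | 1    | 0    |
| lwir 2117 | apc        | 360    | FAIL | 16   | FAIL | 0    | 0    | 0    | 0    |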

Note: When tracking succeeds, the number of false alarms, i.e., the number of calls to the TM technique, is indicated.

By setting a confidence level $\mu$ on $P(\bar{p}_n^{IVF})$ it

is possible to decide whether to elect $\bar{p}_n^{IVF}$ as the

reference position for the current frame or to activate

the TM technique. In the latter case, $P(\bar{p}_n^{TM})$ is computed

(by replacing $d_{IVF}$ and $\alpha_{IVF}$ with $d_{TM}$ and $\alpha_{TM}$,

respectively, in the above equations) and assessed

against $P(\bar{p}_n^{IVF})$, so that the position characterized by

the highest probability is selected.
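The selection rule just described can be written as a short sketch. The function name and the `run_tm` callback are hypothetical, and the probabilities are assumed to have been computed beforehand through (4)-(6).

```python
def select_position(p_ivf, prob_ivf, mu, run_tm):
    """Return (reference_position, tm_activated).

    p_ivf:    IVF candidate position for the current frame.
    prob_ivf: its reliability P(p_ivf), already computed.
    mu:       confidence level controlling TM activation.
    run_tm:   callable returning (p_tm, prob_tm); called only when needed.
    """
    if prob_ivf >= mu:
        return p_ivf, False            # IVF candidate trusted, TM skipped
    p_tm, prob_tm = run_tm()           # recovery phase
    if prob_ivf >= prob_tm:
        return p_ivf, True             # TM computed, IVF still preferred
    return p_tm, True

direct = select_position((40, 52), 0.93, 0.85, lambda: ((0, 0), 0.0))
recovered = select_position((47, 60), 0.30, 0.85, lambda: ((41, 53), 0.88))
kept = select_position((41, 54), 0.80, 0.85, lambda: ((45, 58), 0.55))
```

The three calls correspond to the three outcomes discussed in the experiments: IVF accepted directly, TM candidate selected, and TM computed but IVF candidate retained.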

VI. EXPERIMENTAL RESULTS

In order to illustrate the robustness improvement

provided by the proposed metric with respect to the

Euclidean-based activation strategy, single target and

multi-target imagery used in [3] were considered. The

data set consisted of 21 image sequences from the

AMCOM, including 33 different targets.

The analysis was started by running the Euclidean

distance-based metric using parameters in [3] and

varying the activation threshold $\beta$. Results obtained

for the most representative sequences are reported

in Table I. It can be easily observed that tracking

performance strongly depends on the value of $\beta$. In

fact, a single value that is valid for all the sequences

cannot be found. Moreover, a configuration working

for a particular target could possibly fail for other

targets within the same sequence. Finally, there

exist targets that cannot be tracked at all if $\beta$ is kept

constant for all the frames of the considered sequence.

Further insight into the above considerations can

be gained from Fig. 5, where

tracking results obtained on sequence lwir 1913 for

different values of $\beta$ are reported. Frames where target

losses occur are shown; black boxes correspond to

the position determined by the tracking algorithm,

whereas white boxes indicate the reference target

position. As anticipated, none of the considered values


Fig. 6. Behavior of proposed metric for activation of recovery phase on sequence lwir 1906, with gray areas corresponding to frames

where it was $P(\bar{p}_n^{IVF}) < \mu$. (a) Reliability of candidate target position resulting from application of the IVF and TM techniques,

expressed in terms of $P(\bar{p}_n^{IVF})$ and $P(\bar{p}_n^{TM})$. (b) Tracked x-location w.r.t. AMCOM ground truth. (c) Target y-position w.r.t. AMCOM

ground truth.

of $\beta$ allows the tracking algorithm to hold on to the

target for the whole duration of the sequence. For

instance, when $\beta = 2$ frequent activations of the TM

technique have the effect of propagating a wrong

template, which finally leads to a target loss at frame

89 (Fig. 5(a)). On the contrary, when $\beta = 6$ a less

constrained activation metric lets the target window

move to a brighter nontarget object that is entering the

subframe at frame 171 (Fig. 5(b)); although the reason

for failure is different, the initially tracked object is

again definitely lost. Similar considerations also apply

to Fig. 5(c)–(d) where, for the sake of completeness,

tracking results for larger values of $\beta$ are depicted.

Experimental tests were therefore repeated on

the same sequences by replacing the reference

activation methodology with the technique presented

in this work, and by looking for a confidence value

$\mu$ enabling the proposed metric both to mimic the

behavior of the Euclidean-based strategy in [3]

and to overcome at the same time its limitations on

critical frames. A confidence value $\mu = 0.85$ was

experimentally found, which allowed the proposed

metric to properly cope with all the sequences in the

considered data set.

In Fig. 6 and Fig. 7, tracking results achieved

on sequence lwir 1906 by means of the designed

probabilistic metric are compared with those obtained

with the strategy in [3]. A value of $\beta = 5$ was selected

for the reference technique, in order to avoid target

losses and to investigate the ability of the designed

approach to replicate the results that could be obtained

by the distance-based metric. In Fig. 6(a), the values of

$P(\bar{p}_n^{IVF})$ and $P(\bar{p}_n^{TM})$ for each frame are shown.

Gray areas indicate frames where the recovery

phase was activated, i.e., where $P(\bar{p}_n^{IVF}) < \mu$. More

specifically, light gray areas correspond to frames


Fig. 7. Behavior of Euclidean distance-based metric in [3] on sequence lwir 1906 and tracking errors with respect to AMCOM

ground truth.

where the candidate target position identified by

the TM technique was selected, whereas dark gray

areas indicate frames where, after activating the TM

phase, the candidate target position identified by

the IVF technique was actually selected, i.e., it was

$P(\bar{p}_n^{IVF}) \ge P(\bar{p}_n^{TM})$. Similarly, in Fig. 7(a), the values of $d_{IVF}$ and $d_{TM}$ obtained using the approach in [3]

are depicted, and frames where the TM technique

was activated ($d_{IVF} > \beta$) are highlighted. Moreover,

in Fig. 6(b)–(c) and Fig. 7(b)–(c) the reference target

positions obtained by the two approaches for any

given frame are plotted, together with ground truth

data by AMCOM.

It can be easily observed that in this case, since

a suitable value of $\beta$ was used, the performance of the

proposed approach is almost indistinguishable from

that obtained with the Euclidean distance-based

metric. This is confirmed by the mean absolute

errors (MAE) of target position in the horizontal and

vertical directions, which are equal to 0.52 and 0.98

pixels for the Euclidean distance-based metric, and

to 0.49 and 0.98 pixels for the probabilistic metric,

respectively. Tracking results on some interesting

frames of the considered sequence are reported in

Fig. 8 in order to allow for visual inspection. White

boxes indicate the target positions identified by the

designed probabilistic metric (which, in this case,

overlap those generated by the activation

strategy in [3]). White and black pointers indicate

the IVF and TM (when computed) candidate target

positions, respectively. Finally, gray boxes indicate

the predicted target positions (for the probabilistic

approach). It can be observed that, at frame 204, both

approaches considered the candidate target position

identified by the IVF. At frame 205, the candidate

target position produced by the TM technique was

used. At frame 206, the recovery phase was activated,

but the IVF candidate target position was chosen.

Finally, at frame 207 the IVF technique was directly

selected by both approaches. For this sequence,


Fig. 8. Tracking results with proposed metric (white boxes) on sequence lwir 1906 and activation of recovery phase relying on

predicted (gray boxes) and IVF/TM-based candidate target positions (white/black pointers). (a) Frame 204, IVF. (b) Frame 205, TM.

(c) Frame 206, computed TM, then selected IVF. (d) Frame 207, IVF.

Fig. 9. Frame-to-frame tracking jitter on sequence lwir 1906 using proposed metric (ground truth jitter is also shown). (a) x-direction.

(b) y-direction. (c) Absolute tracking jitter.

the frame-to-frame tracking jitter is also reported

in Fig. 9, as it is considered to be a key parameter

for the design of sensor-in-the-loop applications. It

is worth observing that, at frames 204 and 205, an

abrupt change in target position was experienced;

nonetheless, this situation was properly handled by the

subframe size that was selected for the experimental

tests [3].
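As a minimal sketch of how the two quantities discussed above could be computed, the following Python fragment evaluates the per-axis MAE of a tracked trajectory against a ground truth and the frame-to-frame jitter of a trajectory. It assumes that jitter is the displacement of the tracked position between consecutive frames and that the absolute jitter is its Euclidean norm; the function names and the toy coordinates are hypothetical and are not taken from the experimental data.

```python
from math import hypot


def mean_absolute_error(tracked, truth):
    """Per-axis MAE between tracked and ground-truth (x, y) positions."""
    if len(tracked) != len(truth) or not tracked:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(tracked)
    mae_x = sum(abs(t[0] - g[0]) for t, g in zip(tracked, truth)) / n
    mae_y = sum(abs(t[1] - g[1]) for t, g in zip(tracked, truth)) / n
    return mae_x, mae_y


def frame_to_frame_jitter(positions):
    """Displacement between consecutive frames as (dx, dy, absolute).

    Assumption: the absolute jitter is the Euclidean norm of (dx, dy).
    """
    jitter = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        dx, dy = x1 - x0, y1 - y0
        jitter.append((dx, dy, hypot(dx, dy)))
    return jitter


# Toy trajectories (hypothetical values, in pixels).
truth = [(10, 10), (11, 10), (14, 14), (14, 15)]
tracked = [(10, 11), (12, 10), (14, 13), (13, 15)]

print("MAE (x, y):", mean_absolute_error(tracked, truth))
print("Ground truth jitter:", frame_to_frame_jitter(truth))
print("Tracked jitter:", frame_to_frame_jitter(tracked))
```

Plotting the tracked jitter against the ground truth jitter, as done in Fig. 9, makes abrupt changes in target position easy to spot.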

A totally different scenario is depicted in Fig. 10,

where tracking results on sequence lwir 1913 are


Fig. 10. Behavior of proposed metric for activation of recovery phase on sequence lwir 1913, with gray areas corresponding to frames

where P(p̄_n^IVF) < μ. (a) Reliability of candidate target position resulting from application of IVF and TM techniques, expressed in

terms of P(p̄_n^IVF) and P(p̄_n^TM). (b) Tracked x-location w.r.t. AMCOM ground truth. (c) Target y-position w.r.t. AMCOM ground truth.

reported. As previously seen, this sequence is

particularly useful for showing the effectiveness

of the proposed approach, since no value of β

allows the Euclidean distance-based metric to

track the target properly over the whole sequence.

In contrast, using the selected confidence

index, the proposed metric results in successful

tracking. It is worth observing that, at frame 242, an

abrupt change in target location is experienced, which

is again properly managed thanks to the selected

subframe size [3]. Some representative frames of

the considered sequence are illustrated in Fig. 11.

In particular, in Fig. 11(b) it can be noticed that

an improper candidate target position identified by

the IVF was recovered through the activation of the

TM phase. Moreover, Fig. 11(d)-(h) show several

frames where the target is making a sharp turn and

another moving object is overlapping it. At frames 173

and 179 (Fig. 11(d)-(e)), thanks to the activation

of the TM mechanism, the tracker held on to the

target (while the IVF technique was trying to move

it over the approaching object). Similarly, at frame

231 (Fig. 11(f)) the probabilistic metric prevented the

target window from moving on a nontarget region

showing the best matching intensity variation. At

frame 232 (Fig. 11(g)), the IVF technique was used,

and a reference target position corresponding to the

target hot spot (at the rear of the target itself) was

identified, as expected. It is worth observing that, as

the AMCOM ground truth does not correspond to

the target hot spot tracked by the IVF technique, tracking

errors computed with respect to these reference data

are expected to grow as the camera approaches the

target (i.e., when strong magnifications occur in the

sequence). Nevertheless, regardless of the above

error (whose management is outside the scope of this


Fig. 11. Tracking results with proposed metric (white boxes) on sequence lwir 1913 and activation of recovery phase relying on

predicted (gray boxes) and IVF/TM-based candidate target positions (white/black pointers). (a) Frame 86, IVF. (b) Frame 87, TM.

(c) Frame 88, IVF. (d) Frame 173, TM. (e) Frame 179, TM. (f) Frame 231, TM. (g) Frame 232, IVF. (h) Frame 264, IVF.

TABLE II

Target Tracking Accuracy: Percentages of Correctly Tracked Targets with the Two Metrics

Metric                 Euclidean Distance-Based Metric                              Proposed Metric
Parameter              β = 2    β = 3    β = 4    β = 5    β = 6    β = 7    β = 8    μ = 0.85
Correctly tracked (%)  39.4     57.6     75.8     90.9     84.8     75.8     72.7     100.0

work), the tracker correctly follows the target until the

end of the sequence (Fig. 11(h)).

Percentages of correctly tracked targets using the

two metrics are reported in Table II. The table shows

that, although the strategy in [3] reaches a tracking

accuracy of 90.9% on the considered sequences with

β = 5, the proposed technique reaches 100.0% with

μ = 0.85, thus ensuring a clearly higher

robustness, especially when automatic target tracking

is of interest.

From the examples above it should be evident

that the improved robustness of the proposed metric

derives from a TM activation pattern that differs

from the one resulting from the application of the

strategy in [3]. In order to further analyze this aspect,

sequence lwir 15NS is considered, and tracking results

obtained by the application of the two strategies are

visually compared by specifically considering some

representative frames. In particular, Fig. 12(a)-(d)

shows the tracking results obtained with the approach in

[3] using a critical value for the recovery phase

activation threshold (namely, β = 6). In the

considered frames, the TM technique was used only at

frame 88. However, as can be seen in Fig. 12(c), at

frame 89 the candidate position identified by the IVF

was actually outside the target, but the selected value

of β did not trigger the activation of the recovery

phase. From Fig. 12(d) it can be observed that,

at frame 94, the target was actually lost. Results

obtained on the same set of frames using the proposed

metric are illustrated in Fig. 12(e)-(h). Except for

frame 86, where the reference target position was

generated by the IVF (as with the metric in [3]), the

TM phase was activated in all the remaining frames.

Specifically, at frame 89 (Fig. 12(g)) the candidate

target position identified by the TM technique was

selected rather than the one generated by the IVF

(as in Fig. 12(c)). Thus, as can be noticed from

Fig. 12(h), at frame 94 the target was still correctly

tracked.
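The difference between the two activation patterns can be illustrated with a short Python sketch. It is not the implementation used in the experiments: the distance rule assumes that TM is activated when the IVF candidate lies more than β pixels from a reference position (here, hypothetically, the previous target position), while the probabilistic rule assumes that TM is computed when the IVF reliability falls below μ and that the more reliable candidate is then retained. The reliability values are passed in as plain numbers, and all function names are hypothetical.

```python
from math import hypot


def distance_rule_needs_tm(ivf_pos, ref_pos, beta):
    """Distance-based logic (assumed form): activate TM when the IVF
    candidate is farther than beta pixels from the reference position."""
    return hypot(ivf_pos[0] - ref_pos[0], ivf_pos[1] - ref_pos[1]) > beta


def probabilistic_select(p_ivf, compute_p_tm, mu=0.85):
    """Probabilistic logic (assumed form): compute TM only if P(IVF) < mu,
    then keep the candidate with the higher reliability.

    compute_p_tm is a callable so that the costly TM step runs only
    when the recovery phase is actually activated.
    Returns (selected_technique, tm_was_computed).
    """
    if p_ivf >= mu:
        return "IVF", False
    p_tm = compute_p_tm()
    return ("TM" if p_tm > p_ivf else "IVF"), True


# Hypothetical frame: the IVF candidate drifts 5 pixels off the target.
# With beta = 6 the distance rule does not react ...
print(distance_rule_needs_tm((105, 100), (100, 100), beta=6))
# ... whereas a low IVF reliability triggers TM and selects its candidate.
print(probabilistic_select(0.40, lambda: 0.90))
```

In this sketch a fixed β has to be chosen per sequence, whereas the probabilistic rule only depends on how well each candidate agrees with the predicted position.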

Results concerning recovery phase activation

rates for the same sequences analyzed in Table II

are reported in Table III. Specifically, the fourth

column reports the number of times the recovery

phase was activated, i.e., TM was computed. The fifth

column tabulates the number of times the candidate


Fig. 12. Tracking results (white boxes), IVF and TM candidate target positions (white and black pointers) and predicted target

positions (gray boxes) on frames 86, 88, 89, 94 from sequence lwir 15NS. (a)-(d) Distance-based metric. (e)-(h) Proposed approach.

TABLE III

TM Activation Rates (Proposed Metric) Compared with Results Obtained with the Euclidean Distance-Based Approach (β = 5)

Sequence    Target        Frames   TM Comp.   TM Sel.   TM Diff. (β = 5)   TM Rate (%)
lwir 1520   tank          215      32         28        +6                 14.88
lwir 15NS   tank          230      90         72        +20                39.13
lwir 1604   tank          279      68         60        -3                 24.37
lwir 1608   apc           81       10         9         +4                 12.35
            truck         101      62         48        N/A                61.39
lwir 1618   apc           300      28         25        +13                9.33
            truck         18       3          2         +2                 16.67
lwir 1701   bradley       388      113        104       +21                29.12
            pickup        31       24         22        N/A                77.42
lwir 1720   target/m60    778      52         45        +8                 6.68
lwir 1807   bradley       240      89         82        +10                37.08
lwir 1816   tank          328      88         76        -8                 26.83
            m60           208      31         29        +4                 14.90
lwir 1906   apc           240      11         10        +5                 4.58
lwir 1913   apc           265      39         33        0                  14.72
            tank          265      103        99        N/A                38.87
lwir 2115   bradley       333      33         31        +10                9.91
lwir 2117   apc           360      1          1         +1                 0.28

target position identified by the TM technique was

actually elected as the reference target position for

the considered frame, i.e., P(p̄_n^TM) > P(p̄_n^IVF). Column

six reports the difference in the number of activations

with respect to the Euclidean distance-based metric;

the reference value β = 5 was considered in this

case. Finally, column seven tabulates the overall

recovery phase activation rate for each considered

sequence.
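The values in column seven appear consistent with the ratio between the number of TM computations (column four) and the number of frames (column three). The following Python sketch, whose function name is hypothetical, checks this reading on a few rows copied from Table III.

```python
def tm_activation_rate(frames, tm_computed):
    """Recovery phase activation rate in percent, assumed here to be
    the number of TM computations divided by the number of frames."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return round(100.0 * tm_computed / frames, 2)


# (sequence, target, frames, TM computed, reported TM rate) from Table III.
rows = [
    ("lwir 1520", "tank", 215, 32, 14.88),
    ("lwir 15NS", "tank", 230, 90, 39.13),
    ("lwir 1906", "apc", 240, 11, 4.58),
    ("lwir 2117", "apc", 360, 1, 0.28),
]

for sequence, target, frames, computed, reported in rows:
    rate = tm_activation_rate(frames, computed)
    print(f"{sequence} ({target}): computed {rate:.2f}%, reported {reported:.2f}%")
```

Under this reading, the rate directly measures how often the costlier TM step is executed on top of the IVF-based technique.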

As a general consideration, from Table III it can

be observed that the application of the proposed

metric often results in a higher recovery phase activation rate, i.e.,

the improved robustness generally comes at the cost of

the computational overhead associated with the

execution of a larger number of TM steps. However,

for all the considered sequences the proposed

approach preserves a generally low activation

rate by frequently reducing the tracking phase to


Fig. 13. Tracking results with proposed metric (white boxes) compared with the approach in [3] (black boxes) for given value of β.

(a)-(d) Sequence lwir 2115, β = 4. (e)-(h) Sequence lwir 1816, β = 7. (a) Frame 120. (b) Frame 122. (c) Frame 123. (d) Frame 163.

(e) Frame 48. (f) Frame 49. (g) Frame 50. (h) Frame 130.

the execution of the low complexity IVF-based

technique.

For the sake of completeness, several image sequences

presenting a direct comparison between tracking

results achieved with the two metrics are available

for download from: http://gohan.polito.it:8080/taes/.

Moreover, several representative frames are also

reported in Fig. 13.

VII. CONCLUSION

In this paper, a strategy for improving the

robustness of IR target tracking algorithms based on

TM has been presented. The proposed methodology

is developed in the context of high speed target

tracking applications. For this reason, in order to assess the

improvement ensured by the designed approach, a

reference algorithm relying on an efficient mechanism

for TD and tracking was chosen. In the selected

implementation, tracking is generally accomplished

by exploiting a reduced complexity signature based

on target intensity variation. Failures of the above

technique are identified based on a distance metric

and they are recovered by means of a TM algorithm

relying on target shape information.

In the designed approach, the logic controlling the

activation of the TM phase is replaced by a novel

strategy relying on a motion prediction technique.

The proposed solution was tested on challenging IR

image sequences, characterized by low signal-to-noise

ratio, highly nonstationary signature evolution,

and complex kinematics arising both from the

target and the sensor motion. Experimental tests

performed in an observer-in-the-loop configuration

showed that the self-adaptability of the designed

activation mechanism definitely contributes to

improving the algorithm accuracy, thus representing

an interesting methodology for the design of more

robust real-time solutions for automatic target tracking

applications.

Future work will be aimed at investigating the

effectiveness of the proposed metric in alternative

application contexts, by focusing on the more general

template update problem and taking into account

both the trade-off between algorithm accuracy and

computational complexity and the specific

effects of particular changes in target signature.

Moreover, experimental tests are currently being

performed to assess the behavior of the designed

methodology in a closed-loop scenario, with the

IR sensor mounted on a mobile robot platform.

Results achieved under the above conditions

will eventually drive the experimentation on the

infrared-enabled unmanned aerial vehicle (UAV)

presented in [21].

FABRIZIO LAMBERTI

ANDREA SANNA

GIANLUCA PARAVATI

Dipartimento di Automatica e Informatica

Politecnico di Torino

C.so Duca degli Abruzzi 24

I-10129 Torino

Italy

E-mail: ([email protected])


REFERENCES

[1] Alam, M. S. and Bal, A.
Improved multiple target tracking via global motion compensation and optoelectronic correlation.
IEEE Transactions on Industrial Electronics, 54, 1 (2007), 522-529.

[2] Alam, M. S. and Bal, A.
Automatic target tracking in FLIR image sequences.
In Proceedings of SPIE, Automatic Target Recognition XIV, vol. 5426, 2004, 30-36.

[3] Bal, A. and Alam, M. S.
Automatic target tracking in FLIR image sequences using intensity variation function and template modeling.
IEEE Transactions on Instrumentation and Measurement, 54, 5 (2005), 1846-1852.

[4] Braga-Neto, U., Choudhary, M., and Goutsias, J.
Automatic target detection and tracking in forward-looking infrared image sequences using morphological connected operators.
Journal of Electronic Imaging, 13, 4 (2004), 802-813.

[5] Bruno, M. G. S.
Bayesian methods for multiaspect target tracking in image sequences.
IEEE Transactions on Signal Processing, 52, 7 (2004), 1848-1861.

[6] Comaniciu, D., Ramesh, V., and Meer, P.
Kernel-based object tracking.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 5 (2003), 564-577.

[7] Comaniciu, D., Ramesh, V., and Meer, P.
Real-time tracking of non-rigid objects using mean shift.
Computer Vision and Pattern Recognition, 2 (2000), 142-149.

[8] Davies, D., Palmer, P., and Mirmehdi, M.
Detection and tracking of very small low contrast objects.
In Proceedings of the 9th British Machine Vision Conference, 1998, 599-608.

[9] Dawoud, A., Alam, M. S., and Bal, A.
Target tracking in infrared imagery using weighted composite reference function-based decision fusion.
IEEE Transactions on Image Processing, 15, 2 (2006), 404-410.

[10] Han, T. S., Liu, M., and Huang, T. S.
A drifting-proof framework for tracking and online appearance learning.
In Proceedings of the 8th IEEE Workshop on Applications of Computer Vision, 2007, 10.

[11] Hu, T., Liu, E., and Yang, J.
Multi-feature based ensemble classification and regression tree (ECART) for target tracking in infrared imagery.
Journal of Infrared, Millimeter and Terahertz Waves, 30, 5 (2009), 484-495.

[12] Matthews, I., Ishikawa, T., and Baker, S.
The template update problem.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 6 (2004), 810-815.

[13] Johnston, C. M., Mould, N. A., and Havlicek, J. P.
Multichannel dual domain infrared target tracking for highly evolutionary target signatures.
In Proceedings of the 16th IEEE International Conference on Image Processing, 2009, 4117-4120.

[14] Johnston, C. M., Mould, N., and Havlicek, J. P.
Dual domain auxiliary particle filter with integrated target signature update.
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2009, 54-59.

[15] Miller, P. C., Royce, M., and Virgo, P.
Evaluation of an optical correlator target recognition system for acquisition and tracking in densely cluttered natural scenes.
Optical Engineering, 38, 11 (1999), 1814-1825.

[16] Mould, N. A., Nguyen, C. T., and Havlicek, J. P.
Infrared target tracking with AM-FM consistency checks.
In Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, 2008, 5-8.

[17] Nummiaro, K., Koller-Meier, E., and Van Gool, L.
An adaptive color-based particle filter.
Image and Vision Computing, 21 (2003), 99-110.

[18] Paravati, G., Sanna, A., and Pralio, B.
A genetic algorithm for target tracking in FLIR video sequences using intensity variation function.
IEEE Transactions on Instrumentation and Measurement, 58, 10 (2009), 3457-3467.

[19] Parry, H. S., Marshall, A., and Markham, K. C.
Tracking targets in FLIR images by region template correlation.
In Proceedings of SPIE, Acquisition, Tracking, and Pointing XI, vol. 3086, 1997, 221-232.

[20] Salmond, D.
Target tracking: Introduction and Kalman tracking filters.
In Proceedings of the IEE Workshop on Target Tracking: Algorithms and Applications, 2001, 1-16.

[21] Sanna, A., Pralio, B., and Lamberti, F.
A novel ego-motion compensation strategy for automatic target tracking in FLIR video sequences taken from UAVs.
IEEE Transactions on Aerospace and Electronic Systems, 45, 2 (2009), 723-734.

[22] Venkataraman, V., Fan, G., and Fan, X.
Target tracking with online feature selection in FLIR imagery.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007, 1-8.

[23] Venkataraman, V., Fan, G., and Fan, V.
Appearance learning by adaptive Kalman filters for FLIR tracking.
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2009, 46-53.

[24] Xin, H.
A novel infrared target detection and tracking algorithm based on morphological filters.
In Proceedings of the International Workshop on Information Security and Application, 2009, 260-263.

[25] Wei, C. and Jiang, S.
Automatic target detection and tracking in FLIR image sequences using morphological connected operator.
In Proceedings of the International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2008, 414-417.

[26] Yang, W., Li, J., and Shi, D.
Mean shift based target tracking in FLIR imagery via adaptive prediction of initial searching points.
In Proceedings of the 2nd International Symposium on Intelligent Information Technology Application, 2008, 852-855.

[27] Yin, Y. and Man, H.
Adaptive mean shift for target-tracking in FLIR imagery.
In Proceedings of the 18th Annual Conference on Wireless and Optical Communications, 2009, 1-3.

[28] Yilmaz, A., Shafique, K., and Lobo, N.
Target-tracking in FLIR imagery using mean-shift and global motion compensation.
In Proceedings of the IEEE Workshop on Computer Vision Beyond the Visible Spectrum, 2001, 54-58.

[29] Yilmaz, A., Shafique, K., and Shah, M.
Tracking in airborne forward looking infrared imagery.
Image and Vision Computing Journal, 21, 7 (2003), 623-635.

[30] Yilmaz, A., Li, X., and Shah, M.
Object contour tracking using level sets.
In Proceedings of the Asian Conference on Computer Vision, 2004.

