Hindawi Publishing Corporation
The Scientific World Journal
Volume 2013, Article ID 704504, 19 pages
http://dx.doi.org/10.1155/2013/704504

Review Article
A Review of Data Fusion Techniques
Federico Castanedo
Deusto Institute of Technology, DeustoTech, University of
Deusto, Avenida de las Universidades 24, 48007 Bilbao, Spain
Correspondence should be addressed to Federico Castanedo;
[email protected]
Received 9 August 2013; Accepted 11 September 2013
Academic Editors: Y. Takama and D. Ursino
Copyright 2013 Federico Castanedo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The integration of data and knowledge from several sources is known as data fusion. This paper summarizes the state of the data fusion field and describes the most relevant studies. We first enumerate and explain different classification schemes for data fusion. Then, the most common algorithms are reviewed. These methods and algorithms are presented using three different categories: (i) data association, (ii) state estimation, and (iii) decision fusion.
1. Introduction
In general, all tasks that demand any type of parameter estimation from multiple sources can benefit from the use of data/information fusion methods. The terms information fusion and data fusion are typically employed as synonyms; but in some scenarios, the term data fusion is used for raw data (obtained directly from the sensors) and the term information fusion is employed to define already processed data. In this sense, the term information fusion implies a higher semantic level than data fusion. Other terms associated with data fusion that typically appear in the literature include decision fusion, data combination, data aggregation, multisensor data fusion, and sensor fusion.
Researchers in this field agree that the most accepted definition of data fusion was provided by the Joint Directors of Laboratories (JDL) workshop [1]: "A multilevel process dealing with the association, correlation, and combination of data and information from single and multiple sources to achieve refined position and identity estimates, and complete and timely assessments of situations, threats, and their significance."
Hall and Llinas [2] provided the following well-known definition of data fusion: "data fusion techniques combine data from multiple sensors and related information from associated databases to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor alone."
Briefly, we can define data fusion as a combination of multiple sources to obtain improved information; in this context, improved information means less expensive, higher quality, or more relevant information.
Data fusion techniques have been extensively employed in multisensor environments with the aim of fusing and aggregating data from different sensors; however, these techniques can also be applied to other domains, such as text processing. The goal of using data fusion in multisensor environments is to obtain a lower detection error probability and a higher reliability by using data from multiple distributed sources.
The available data fusion techniques can be classified into three nonexclusive categories: (i) data association, (ii) state estimation, and (iii) decision fusion. Because of the large number of published papers on data fusion, this paper does not aim to provide an exhaustive review of all of the studies; instead, the objective is to highlight the main steps that are involved in the data fusion framework and to review the most common techniques for each step.
The remainder of this paper continues as follows. The next section provides various classification categories for data fusion techniques. Then, Section 3 describes the most common methods for data association tasks. Section 4 provides a review of techniques under the state estimation category. Next, the most common techniques for decision fusion are enumerated in Section 5. Finally, the conclusions obtained from reviewing the different methods are highlighted in Section 6.
2. Classification of Data Fusion Techniques
Data fusion is a multidisciplinary area that involves several fields, and it is difficult to establish a clear and strict classification. The employed methods and techniques can be divided according to the following criteria:
(1) attending to the relations between the input data sources, as proposed by Durrant-Whyte [3]. These relations can be defined as (a) complementary, (b) redundant, or (c) cooperative data;
(2) according to the input/output data types and their nature, as proposed by Dasarathy [4];
(3) following an abstraction level of the employed data: (a) raw measurements, (b) signals, and (c) characteristics or decisions;
(4) based on the different data fusion levels defined by the JDL;
(5) depending on the architecture type: (a) centralized, (b) decentralized, or (c) distributed.
2.1. Classification Based on the Relations between the Data Sources. Based on the relations of the sources (see Figure 1), Durrant-Whyte [3] proposed the following classification criteria:
(1) complementary: when the information provided by the input sources represents different parts of the scene and could thus be used to obtain more complete global information. For example, in the case of visual sensor networks, the information on the same target provided by two cameras with different fields of view is considered complementary;
(2) redundant: when two or more input sources provide information about the same target and could thus be fused to increment the confidence. For example, the data coming from overlapped areas in visual sensor networks are considered redundant;
(3) cooperative: when the provided information is combined into new information that is typically more complex than the original information. For example, multimodal (audio and video) data fusion is considered cooperative.
2.2. Dasarathy's Classification. One of the most well-known data fusion classification systems was provided by Dasarathy [4] and is composed of the following five categories (see Figure 2):

(1) data in-data out (DAI-DAO): this type is the most basic or elementary data fusion method that is considered in classification. This type of data fusion process inputs and outputs raw data; the results are typically more reliable or accurate. Data fusion at this level is conducted immediately after the data are gathered from the sensors. The algorithms employed at this level are based on signal and image processing algorithms;
(2) data in-feature out (DAI-FEO): at this level, the data fusion process employs raw data from the sources to extract features or characteristics that describe an entity in the environment;
(3) feature in-feature out (FEI-FEO): at this level, both the input and output of the data fusion process are features. Thus, the data fusion process addresses a set of features to improve, refine, or obtain new features. This process is also known as feature fusion, symbolic fusion, information fusion, or intermediate-level fusion;
(4) feature in-decision out (FEI-DEO): this level obtains a set of features as input and provides a set of decisions as output. Most of the classification systems that perform a decision based on a sensor's inputs fall into this category of classification;
(5) decision in-decision out (DEI-DEO): this type of classification is also known as decision fusion. It fuses input decisions to obtain better or new decisions.

The main contribution of Dasarathy's classification is the specification of the abstraction level either as an input or an output, providing a framework to classify different methods or techniques.
2.3. Classification Based on the Abstraction Levels. Luo et al. [5] provided the following four abstraction levels:

(1) signal level: directly addresses the signals that are acquired from the sensors;
(2) pixel level: operates at the image level and could be used to improve image processing tasks;
(3) characteristic level: employs features that are extracted from the images or signals (i.e., shape or velocity);
(4) symbol level: at this level, information is represented as symbols; this level is also known as the decision level.

Information fusion typically addresses three levels of abstraction: (1) measurements, (2) characteristics, and (3) decisions. Other possible classifications of data fusion based on the abstraction levels are as follows:

(1) low level fusion: the raw data are directly provided as an input to the data fusion process, which provides more accurate data (a lower signal-to-noise ratio) than the individual sources;
(2) medium level fusion: characteristics or features (shape, texture, and position) are fused to obtain features that could be employed for other tasks. This level is also known as the feature or characteristic level;
Figure 1: Durrant-Whyte's classification based on the relations between the data sources.
Figure 2: Dasarathy's classification.
(3) high level fusion: this level, which is also known as decision fusion, takes symbolic representations as sources and combines them to obtain a more accurate decision. Bayesian methods are typically employed at this level;
(4) multiple level fusion: this level addresses data provided from different levels of abstraction (i.e., when a measurement is combined with a feature to obtain a decision).
2.4. JDL Data Fusion Classification. This classification is the most popular conceptual model in the data fusion community. It was originally proposed by JDL and the American Department of Defense (DoD) [1]. These organizations classified the data fusion process into five processing levels, an associated database, and an information bus that connects the five components (see Figure 3). The five levels can be grouped into two groups, low-level fusion and high-level fusion, which comprise the following components:
(i) sources: the sources are in charge of providing the input data. Different types of sources can be employed, such as sensors, a priori information (references or geographic data), databases, and human inputs;
(ii) human-computer interaction (HCI): HCI is an interface that allows inputs to the system from the operators and produces outputs to the operators. HCI includes queries, commands, and information on the obtained results and alarms;
(iii) database management system: the database management system stores the provided information and the fused results. This system is a critical component because of the large amount of highly diverse information that is stored.
In contrast, the five levels of data processing are defined as follows:

(1) level 0 (source preprocessing): source preprocessing is the lowest level of the data fusion process, and it includes fusion at the signal and pixel levels. In the case of text sources, this level also includes the information extraction process. This level reduces the amount of data and maintains useful information for the high-level processes;
(2) level 1 (object refinement): object refinement employs the processed data from the previous level. Common procedures of this level include spatiotemporal alignment, association, correlation, clustering or grouping techniques, state estimation, the removal of false positives, identity fusion, and the combining of features that were extracted from images. The output
Figure 3: The JDL data fusion framework.
results of this stage are the object discrimination (classification and identification) and object tracking (state of the object and orientation). This stage transforms the input information into consistent data structures;
(3) level 2 (situation assessment): this level focuses on a higher level of inference than level 1. Situation assessment aims to identify the likely situations given the observed events and obtained data. It establishes relationships between the objects. Relations (i.e., proximity, communication) are valued to determine the significance of the entities or objects in a specific environment. The aim of this level includes performing high-level inferences and identifying significant activities and events (patterns in general). The output is a set of high-level inferences;
(4) level 3 (impact assessment): this level evaluates the impact of the detected activities in level 2 to obtain a proper perspective. The current situation is evaluated, and a future projection is performed to identify possible risks, vulnerabilities, and operational opportunities. This level includes (1) an evaluation of the risk or threat and (2) a prediction of the logical outcome;
(5) level 4 (process refinement): this level improves the process from level 0 to level 3 and provides resource and sensor management. The aim is to achieve efficient resource management while accounting for task priorities, scheduling, and the control of available resources.
High-level fusion typically starts at level 2 because the type, localization, movement, and quantity of the objects are known at that level. One of the limitations of the JDL method is how the uncertainty about previous or subsequent results could be employed to enhance the fusion process (feedback loop). Llinas et al. [6] proposed several refinements and extensions to the JDL model. Blasch and Plano [7] proposed to add a new level (user refinement) to support a human user in the data fusion loop. The JDL model represents the first effort to provide a detailed model and a common terminology for the data fusion domain. However, because its roots originate in the military domain, the employed terms are oriented to the risks that commonly occur in these scenarios. The Dasarathy model differs from the JDL model with regard to the adopted terminology and employed approach. The former is oriented toward the differences among the input and output results, independent of the employed fusion method. In summary, the Dasarathy model provides a method for understanding the relations between the fusion tasks and employed data, whereas the JDL model presents an appropriate fusion perspective to design data fusion systems.
2.5. Classification Based on the Type of Architecture. One of the main questions that arise when designing a data fusion system is where the data fusion process will be performed. Based on this criterion, the following types of architectures could be identified:

(1) centralized architecture: in a centralized architecture, the fusion node resides in the central processor that receives the information from all of the input sources. Therefore, all of the fusion processes are executed in a central processor that uses the provided raw measurements from the sources. In this schema, the sources obtain only the observations as measurements and transmit them to a central processor, where the data fusion process is performed. If we assume that data alignment and data association are performed correctly and that the required time to transfer the data is not significant, then the centralized scheme is theoretically optimal. However, the previous assumptions typically do not hold for real systems. Moreover, the large amount of bandwidth that is required to send raw data through the network is another disadvantage for the centralized approach. This issue becomes a bottleneck when this type of architecture is employed for fusing data in visual sensor networks. Finally, the time delays when transferring the information between the different sources are variable and affect
the results in the centralized scheme to a greater degree than in other schemes;
(2) decentralized architecture: a decentralized architecture is composed of a network of nodes in which each node has its own processing capabilities and there is no single point of data fusion. Therefore, each node fuses its local information with the information that is received from its peers. Data fusion is performed autonomously, with each node accounting for its local information and the information received from its peers. Decentralized data fusion algorithms typically communicate information using the Fisher and Shannon measurements instead of the object's state [8]. The main disadvantage of this architecture is the communication cost, which is O(n^2) at each communication step, where n is the number of nodes; additionally, the extreme case is considered, in which each node communicates with all of its peers. Thus, this type of architecture could suffer from scalability problems when the number of nodes is increased;
(3) distributed architecture: in a distributed architecture, measurements from each source node are processed independently before the information is sent to the fusion node; the fusion node accounts for the information that is received from the other nodes. In other words, the data association and state estimation are performed in the source node before the information is communicated to the fusion node. Therefore, each node provides an estimation of the object state based only on its local views, and this information is the input to the fusion process, which provides a fused global view. This type of architecture provides different options and variations that range from only one fusion node to several intermediate fusion nodes;
(4) hierarchical architecture: other architectures comprise a combination of decentralized and distributed nodes, generating hierarchical schemes in which the data fusion process is performed at different levels in the hierarchy.
In principle, a decentralized data fusion system is more difficult to implement because of the computation and communication requirements. However, in practice, there is no single best architecture, and the selection of the most appropriate architecture should be made depending on the requirements, demand, existing networks, data availability, node processing capabilities, and organization of the data fusion system.
The reader might think that the decentralized and distributed architectures are similar; however, they have meaningful differences (see Figure 4). First, in a distributed architecture, a preprocessing of the obtained measurements is performed, which provides a vector of features as a result (the features are fused thereafter). In contrast, in the decentralized architecture, the complete data fusion process is conducted in each node, and each of the nodes provides a globally fused result. Second, the decentralized fusion algorithms typically communicate information, employing the Fisher and Shannon measurements. In contrast, distributed algorithms typically share a common notion of state (position, velocity, and identity) with their associated probabilities, which are used to perform the fusion process [9]. Third, because the decentralized data fusion algorithms exchange information instead of states and probabilities, they have the advantage of easily separating old knowledge from new knowledge. Thus, the process is additive, and the associative meaning is not relevant when the information is received and fused. However, in the distributed data fusion algorithms (i.e., the distributed Kalman filter), the state that is going to be fused is not associative, and when and how the fused estimates are computed is relevant. Nevertheless, in contrast to the centralized architectures, the distributed algorithms reduce the necessary communication and computational costs because some tasks are computed in the distributed nodes before data fusion is performed in the fusion node.
3. Data Association Techniques
The data association problem must determine the set of measurements that correspond to each target (see Figure 5). Let us suppose that there are n targets that are being tracked by only one sensor in a cluttered environment (by a cluttered environment, we refer to an environment that has several targets that are too close to each other). Then, the data association problem can be defined as follows:
(i) each sensor's observation is received in the fusion node at discrete time intervals;
(ii) the sensor might not provide observations at a specific interval;
(iii) some observations are noise, and other observations originate from the detected target;
(iv) for any specific target and in every time interval, we do not know (a priori) the observations that will be generated by that target.
Therefore, the goal of data association is to establish the set of observations or measurements that are generated by the same target over time. Hall and Llinas [2] provided the following definition of data association: the process of assigning and computing the weights that relate the observations or tracks (a track can be defined as an ordered set of points that follow a path and are generated by the same target) from one set to the observations or tracks of another set.
As an example of the complexity of the data association problem, if we take a frame-to-frame association and assume that n possible points could be detected in all frames, then the number of possible sets is (n!)^(n-1). Note that from all of these possible solutions, only one set establishes the true movement of the points.
Data association is often performed before the state estimation of the detected targets. Moreover, it is a key step because the estimation or classification will behave incorrectly if the data association phase does not work coherently. The data association process could also appear in all of the fusion levels, but the granularity varies depending on the objective of each level.
Figure 4: Classification based on the type of architecture.
In general, an exhaustive search of all possible combinations grows exponentially with the number of targets; thus, the data association problem becomes NP-complete. The most common techniques that are employed to solve the data association problem are presented in the following sections (from Sections 3.1 to 3.7).
3.1. Nearest Neighbors and K-Means. Nearest neighbor (NN) is the simplest data association technique. NN is a well-known clustering algorithm that selects or groups the most similar values. How close one measurement is to another depends on the employed distance metric and typically depends on the threshold that is established by the designer. In general, the employed criteria could be based on (1) an absolute distance, (2) the Euclidean distance, or (3) a statistical function of the distance.

NN is a simple algorithm that can find a feasible (approximate) solution in a small amount of time. However, in a cluttered environment, it could provide many pairs that have the same probability and could thus produce undesirable
Figure 5: Conceptual overview of the data association process from multiple sensors and multiple targets. It is necessary to establish the set of observations over time from the same object that forms a track.
error propagation [10]. Moreover, this algorithm has poor performance in environments in which false measurements are frequent, that is, in highly noisy environments.
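A minimal greedy NN association step can be sketched as follows. This is a hypothetical illustration: the function name, the choice of the Euclidean metric, and the gate threshold are our own, not from the text.

```python
import numpy as np

def nearest_neighbor_association(tracks, observations, gate=5.0):
    """Greedy nearest-neighbor association (illustrative sketch).

    tracks       : (T, d) array of predicted track positions
    observations : (O, d) array of sensor measurements
    gate         : maximum Euclidean distance accepted by the designer

    Returns a dict mapping each track index to an observation index
    (or None when no observation falls inside the gate).
    """
    assignments, used = {}, set()
    for t, pred in enumerate(tracks):
        # Euclidean distance from this track's prediction to every observation.
        dists = np.linalg.norm(observations - pred, axis=1)
        assignments[t] = None
        for o in np.argsort(dists):
            if dists[o] <= gate and int(o) not in used:
                assignments[t] = int(o)  # closest unused observation in the gate
                used.add(int(o))
                break
    return assignments

tracks = np.array([[0.0, 0.0], [10.0, 10.0]])
obs = np.array([[9.5, 10.2], [0.3, -0.1], [30.0, 30.0]])  # third point is clutter
result = nearest_neighbor_association(tracks, obs)
print(result)  # {0: 1, 1: 0}
```

Note how the clutter point, which lies outside the gate, is left unassigned; in a denser scene several observations would tie for the same track, which is exactly the ambiguity discussed above.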
The all-neighbors approach uses a similar technique, in which all of the measurements inside a region are included in the tracks. The K-Means [11] method is a well-known modification of the NN algorithm. K-Means divides the dataset values into K different clusters. The K-Means algorithm finds the best localization of the cluster centroids, where best means a centroid that is in the center of the data cluster. K-Means is an iterative algorithm that can be divided into the following steps:

(1) obtain the input data and the number of desired clusters (K);
(2) randomly assign the centroid of each cluster;
(3) match each data point with the centroid of each cluster;
(4) move the cluster centers to the centroid of the cluster;
(5) if the algorithm does not converge, return to step (3).
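The five steps above map directly onto Lloyd's iteration. The sketch below is one minimal realisation; the seeding of centroids from the data points, the empty-cluster guard, and the convergence test are implementation choices of ours:

```python
import numpy as np

def k_means(data, k, iters=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) following the five steps above."""
    rng = np.random.default_rng(seed)
    # Step (2): randomly pick k input points as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        # Step (3): match each data point with its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step (4): move each cluster center to the centroid of its cluster
        # (keeping the old center when a cluster ends up empty).
        new = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        # Step (5): stop on convergence, otherwise iterate again.
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
centroids, labels = k_means(pts, k=2)
```

On this toy dataset the two tight groups of points end up in separate clusters regardless of which points seed the centroids.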
K-Means is a popular algorithm that has been widely employed; however, it has the following disadvantages:

(i) the algorithm does not always find the optimal solution for the cluster centers;
(ii) the number of clusters must be known a priori and one must assume that this number is the optimum;
(iii) the algorithm assumes that the covariance of the dataset is irrelevant or that it has been normalized already.
There are several options for overcoming these limitations. For the first one, it is possible to execute the algorithm several times and keep the solution that has the least variance. For the second one, it is possible to start with a low value of K and increment the value of K until an adequate result is obtained. The third limitation can be easily overcome by multiplying the data with the inverse of the covariance matrix.
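The third fix amounts to whitening the data before clustering. One common realisation, sketched below on illustrative synthetic data, multiplies the points by a square root of the inverse covariance matrix so that Euclidean distance on the whitened points behaves like the Mahalanobis distance on the originals:

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated synthetic data (the mixing matrix below is arbitrary).
data = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

cov = np.cov(data, rowvar=False)
# Multiply the data by a square root of the inverse covariance matrix:
# if L @ L.T = inv(cov), then cov(data @ L) is the identity.
L = np.linalg.cholesky(np.linalg.inv(cov))
whitened = data @ L

print(np.cov(whitened, rowvar=False).round(2))  # identity matrix
```

After this normalization the covariance of the dataset no longer distorts the distance computations inside K-Means.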
Many variations have been proposed to Lloyd's basic K-Means algorithm [11], which has a computational upper-bound cost of O(Kn), where n is the number of input points and K is the number of desired clusters. Some algorithms modify the initial cluster assignments to improve the separations and reduce the number of iterations. Others introduce soft or multinomial clustering assignments using fuzzy logic, probabilistic, or Bayesian techniques. However, most of the previous variations still must perform several iterations through the data space to converge to a reasonable solution. This issue becomes a major disadvantage in several real-time applications. A new approach that is based on having a large (but still affordable) number of cluster candidates compared to the desired K clusters is currently gaining attention. The idea behind this computational model is that the algorithm builds a good sketch of the original data while reducing the dimensionality of the input space significantly. In this manner, a weighted K-Means can be applied to the large candidate clusters to derive a good clustering of the original data. Using this idea, [12] presented an efficient and scalable K-Means algorithm that is based on random projections. This algorithm requires only one pass through the input data to build the clusters. More specifically, if the input data distribution holds some separability requirements, then the number of required candidate clusters grows only according to O(K log n), where n is the number of observations in the original data. This salient feature makes the algorithm scalable in terms of both the memory and computational requirements.
3.2. Probabilistic Data Association. The probabilistic data association (PDA) algorithm was proposed by Bar-Shalom and Tse [13] and is also known as the modified filter of all neighbors. This algorithm assigns an association probability to each hypothesis from a valid measurement of a target. A valid measurement refers to an observation that falls in the validation gate of the target at that time instant. The validation gate, γ, which is centered around the predicted measurement of the target, is used to select the set of valid measurements and is defined as

(z_i(k) − ẑ(k|k−1))' S(k)^{-1} (z_i(k) − ẑ(k|k−1)) ≤ γ, (1)

where k is the temporal index, S(k) is the innovation covariance, and γ determines the gating or window size. The set of valid measurements at time instant k is defined as

Z(k) = {z_i(k), i = 1, ..., m(k)}, (2)
where z_i(k) is the i-th measurement in the validation region at time instant k. We give the standard equations of the PDA algorithm next. For the state prediction, consider

x̂(k|k−1) = F(k−1) x̂(k−1|k−1), (3)

where F(k−1) is the transition matrix at time instant k−1. To calculate the measurement prediction, consider

ẑ(k|k−1) = H(k) x̂(k|k−1), (4)

where H(k) is the linearization measurement matrix. To compute the gain or the innovation of the i-th measurement, consider

v_i(k) = z_i(k) − ẑ(k|k−1). (5)

To calculate the covariance prediction, consider

P(k|k−1) = F(k−1) P(k−1|k−1) F(k−1)' + Q(k), (6)

where Q(k) is the process noise covariance matrix. To compute the innovation covariance S(k) and the Kalman gain K(k), consider

S(k) = H(k) P(k|k−1) H(k)' + R,
K(k) = P(k|k−1) H(k)' S(k)^{-1}. (7)

To obtain the covariance update in the case in which the measurement originated by the target is known, consider

P^0(k|k) = P(k|k−1) − K(k) S(k) K(k)'. (8)

The total update of the covariance is computed as

v(k) = Σ_{i=1}^{m(k)} β_i(k) v_i(k),
P(k) = β_0(k) P(k|k−1) + (1 − β_0(k)) P^0(k|k) + K(k) [Σ_{i=1}^{m(k)} β_i(k) v_i(k) v_i(k)' − v(k) v(k)'] K(k)', (9)

where m(k) is the number of valid measurements at the instant k. The equation to update the estimated state, which is formed by the position and velocity, is given by

x̂(k|k) = x̂(k|k−1) + K(k) v(k). (10)

Finally, the association probabilities of PDA are as follows:

β_i(k) = e_i(k) / Σ_{j=0}^{m(k)} e_j(k), (11)

where

e_i(k) = λ (2π)^{M/2} |S(k)|^{1/2} (1 − P_D P_G) / P_D,  if i = 0,
e_i(k) = exp[−(1/2) v_i(k)' S(k)^{-1} v_i(k)],  if i ≠ 0,
e_i(k) = 0,  in other cases, (12)

where M is the dimension of the measurement vector, λ is the density of the clutter environment, P_D is the detection probability of the correct measurement, and P_G is the validation probability of a detected value.

In the PDA algorithm, the state estimation of the target is computed as a weighted sum of the estimated states under all of the hypotheses. The algorithm can associate different measurements to one specific target. Thus, the association of the different measurements to a specific target helps PDA to estimate the target state, and the association probabilities are used as weights. The main disadvantages of the PDA algorithm are the following:
(i) loss of tracks: because PDA ignores the interference with other targets, it sometimes could wrongly classify the closest tracks. Therefore, it provides a poor performance when the targets are close to each other or crossed;
(ii) suboptimal Bayesian approximation: when the source of information is uncertain, PDA is a suboptimal Bayesian approximation to the association problem;
(iii) one target: PDA was initially designed for the association of one target in a low-cluttered environment. The number of false alarms is typically modeled with the Poisson distribution, and they are assumed to be distributed uniformly in space. PDA behaves incorrectly when there are multiple targets because the false alarm model does not work well;
(iv) track management: because PDA assumes that the track is already established, algorithms must be provided for track initialization and track deletion.

PDA is mainly good for tracking targets that do not make abrupt changes in their movement patterns. PDA will most likely lose the target if it makes abrupt changes in its movement patterns.
3.3. Joint Probabilistic Data Association. Joint probabilistic data association (JPDA) is a suboptimal approach for tracking multiple targets in cluttered environments [14]. JPDA is similar to PDA, with the difference that the association probabilities are computed using all of the observations and all of the targets. Thus, in contrast to PDA, JPDA considers various hypotheses together and combines them. JPDA determines the probability $\beta_{jt}(k)$ that measurement $j$ originated from target $t$, accounting for the fact that, under this hypothesis, the measurement cannot be generated by other targets. Therefore, for a known number of targets, it evaluates the different options of the measurement-target association (for the most recent set of measurements) and combines them into the corresponding state estimation. If the association probabilities are known, then the Kalman filter updating equation of the track can be written as
$$\hat{x}_t(k \mid k) = \hat{x}_t(k \mid k-1) + K_t(k)\, v_t(k), \quad (13)$$

where $\hat{x}_t(k \mid k)$ and $\hat{x}_t(k \mid k-1)$ are the estimation and prediction of target $t$, and $K_t(k)$ is the filter gain. The weighted sum of the residuals associated with the $m(k)$ validated observations of target $t$ is

$$v_t(k) = \sum_{j=1}^{m(k)} \beta_{jt}(k)\, v_{jt}(k), \quad (14)$$

where $v_{jt}(k) = z_j(k) - H \hat{x}_t(k \mid k-1)$. Therefore, this method incorporates all of the observations (inside the neighborhood of the target's predicted position) to update the estimated position by using a posterior probability that is a weighted sum of residuals.
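As a minimal sketch, the weighted-residual update of Eqs. (13) and (14) can be written in a few lines of Python with NumPy; the scalar dimensions and association weights below are hypothetical, and the covariance update is omitted for brevity:

```python
import numpy as np

def pda_update(x_pred, P_pred, H, R, measurements, betas):
    # betas[j]: association probability that measurement j originated from
    # the target; the weighted residual of Eq. (14) feeds the Kalman-style
    # update of Eq. (13). (The covariance update is omitted for brevity.)
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # filter gain K(k)
    v = sum(b * (z - H @ x_pred) for b, z in zip(betas, measurements))
    return x_pred + K @ v
```

For a 1-D target with two validated measurements at +1 and -1 and weights 0.6 and 0.2, the weighted residual is 0.4 and the updated state moves only part of the way toward the dominant measurement.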
The main restrictions of JPDA are the following:
(i) a measurement cannot come from more than one target;
(ii) two measurements cannot originate from the same target (at one time instant);
(iii) the sum of the probabilities of all of the measurements assigned to one target must be 1: $\sum_{j=0}^{m(k)} \beta_{jt}(k) = 1$.
The main disadvantages of JPDA are the following:
(i) it requires an explicit mechanism for track initialization. Similar to PDA, JPDA cannot initialize new tracks or remove tracks that leave the observation area;
(ii) JPDA is computationally expensive in environments that have multiple targets because the number of hypotheses grows exponentially with the number of targets.
In general, JPDA is more appropriate than MHT in situations in which the density of false measurements is high (e.g., sonar applications).
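The exponential growth in (ii) is easy to see by enumerating the feasible joint association events directly. The following Python sketch (with invented problem sizes) applies the two JPDA restrictions that a measurement serves at most one target and a target generates at most one measurement:

```python
def joint_events(n_targets, n_measurements):
    # Enumerate feasible joint association events: each target is assigned
    # either 0 (no detection) or the index of an unused measurement, so a
    # measurement serves at most one target and a target generates at most
    # one measurement per time instant.
    events = []
    def assign(t, used, current):
        if t == n_targets:
            events.append(tuple(current))
            return
        assign(t + 1, used, current + [0])  # target t undetected
        for m in range(1, n_measurements + 1):
            if m not in used:
                assign(t + 1, used | {m}, current + [m])
    assign(0, frozenset(), [])
    return events
```

Two targets and two measurements already yield 7 feasible events, and three of each yield 34; the count grows combinatorially with the number of targets and measurements.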
3.4. Multiple Hypothesis Test. The underlying idea of the multiple hypothesis test (MHT) is to use more than two consecutive observations to make an association, with better results. Algorithms that use only two consecutive observations have a higher probability of generating an error. In contrast to PDA and JPDA, MHT evaluates all of the possible hypotheses and maintains new hypotheses in each iteration.
MHT was developed to track multiple targets in cluttered environments; as a result, it combines the data association problem and tracking into a unified framework, becoming an estimation technique as well. The Bayes rule or Bayesian networks are commonly employed to calculate the MHT hypotheses. In general, researchers have claimed that MHT outperforms JPDA for lower densities of false positives. However, the main disadvantage of MHT is its computational cost when the number of tracks or false positives increases. Pruning the hypothesis tree using a window can mitigate this limitation.
The Reid [15] tracking algorithm is considered the standard MHT algorithm, but the initial integer programming formulation of the problem is due to Morefield [16]. MHT is an iterative algorithm in which each iteration starts with a set of correspondence hypotheses. Each hypothesis is a collection of disjoint tracks, and the prediction of each target in the next time instant is computed for each hypothesis. Next, the predictions are compared with the new observations by using a distance metric. The set of associations established in each hypothesis (based on the distance) introduces new hypotheses in the next iteration. Each new hypothesis represents a new set of tracks that is based on the current observations.
Note that each new measurement could come from (i) a new target in the field of view, (ii) a target being tracked, or (iii) noise in the measurement process. It is also possible that a measurement is not assigned to a target, because the target disappears or because it is not possible to obtain a target measurement at that time instant.
MHT maintains several correspondence hypotheses for each target in each frame. If the set of hypotheses at time instant $k$ is represented by $\Theta(k) = \{\Theta_j(k),\ j = 1, \ldots, n\}$, then the probability of hypothesis $\Theta_j(k)$ can be expressed recursively using the Bayes rule as follows:

$$P(\Theta_j(k) \mid Z(k)) = P(\Theta_j(k-1), \theta_j(k) \mid Z(k)) = \frac{1}{c}\, P(Z(k) \mid \Theta_j(k-1), \theta_j(k))\, P(\theta_j(k) \mid \Theta_j(k-1))\, P(\Theta_j(k-1)), \quad (15)$$

where $\Theta_j(k-1)$ is the hypothesis of the complete set until the time instant $k-1$; $\theta_j(k)$ is the $j$th possible association of the tracks to the objects; $Z(k)$ is the set of detections of the current frame, and $c$ is a normalizing constant.
The first term on the right side of the previous equation is the likelihood function of the measurement set $Z(k)$ given the joint likelihood and the current hypothesis. The second term is the probability of the association hypothesis of the current data given the previous hypothesis $\Theta_j(k-1)$. The third term is the probability of the previous hypothesis from which the current hypothesis is calculated.
The MHT algorithm has the ability to detect a new track while maintaining the hypothesis tree structure. The probability of a true track is given by the Bayes decision model as
$$P(T \mid D) = \frac{P(D \mid T)\, P(T)}{P(D)}, \quad (16)$$

where $P(D \mid T)$ is the probability of obtaining the set of measurements $D$ given the track $T$, $P(T)$ is the a priori probability of the source signal, and $P(D)$ is the probability of obtaining the set of detections $D$.
MHT considers all of the possibilities, including both track maintenance and the initialization and removal of tracks, in an integrated framework. MHT calculates the possibility of having an object after the generation of a set of measurements using an exhaustive approach, and the algorithm does not assume a fixed number of targets. The key challenge of MHT is effective hypothesis management. The baseline MHT algorithm can be extended as follows: (i) use hypothesis aggregation for missed target births, cardinality tracking, and closely spaced objects; (ii) apply a multistage MHT for improving the performance and robustness in challenging settings; and (iii) use a feature-aided MHT for extended object surveillance.
The main disadvantage of this algorithm is the computational cost, which grows exponentially with the number of tracks and measurements. Therefore, the practical implementation of this algorithm is limited because it is exponential in both time and memory.
With the aim of reducing the computational cost, [17] presented a probabilistic MHT algorithm, known as PMHT, in which the associations are considered to be statistically independent random variables and in which an exhaustive search enumeration is avoided. The PMHT algorithm assumes that the number of targets and measurements is known. With the same goal of reducing the computational cost, [18] presented an efficient implementation of the MHT algorithm. This implementation was the first version to be applied to perform tracking in visual environments. They employed the Murty [19] algorithm to determine the best set of hypotheses in polynomial time, with the goal of tracking points of interest.
MHT typically performs the tracking process by employing only one characteristic, commonly the position. A Bayesian combination to use multiple characteristics was proposed by Liggins II et al. [20].
A linear-programming-based relaxation approach to the optimization problem in MHT tracking was proposed independently by Coraluppi et al. [21] and Storms and Spieksma [22]. Joo and Chellappa [23] proposed an association algorithm for tracking multiple targets in visual environments. Their algorithm is based on an MHT modification in which a measurement can be associated with more than one target, and several targets can be associated with one measurement. They also proposed a combinatorial optimization algorithm to generate the best set of association hypotheses. Their algorithm always finds the best hypothesis, in contrast to other models, which are approximate. Coraluppi and Carthel [24] presented a generalization of the MHT algorithm using a recursion over hypothesis classes rather than over a single hypothesis. This work has been applied to a special case of the multi-target tracking problem, called cardinality tracking, in which the number of sensor measurements is observed instead of the target states.
3.5. Distributed Joint Probabilistic Data Association. The distributed version of the joint probabilistic data association (JPDA-D) was presented by Chang et al. [25]. In this technique, the estimated state of the target (using two sensors) after being associated is given by

$$E\{x \mid Z^1, Z^2\} = \sum_{j_1=0}^{m_1} \sum_{j_2=0}^{m_2} E\{x \mid \chi^1_{j_1}, \chi^2_{j_2}, Z^1, Z^2\}\, P\{\chi^1_{j_1}, \chi^2_{j_2} \mid Z^1, Z^2\}, \quad (17)$$
where $z^i$, $i = 1, 2$, is the last set of measurements of sensors 1 and 2; $Z^i$, $i = 1, 2$, is the set of accumulated data; and $\chi^i_{j_i}$ is the association hypothesis. The first term on the right side of the equation is calculated from the associations that were made earlier. The second term is computed from the individual association probabilities as follows:

$$P(\chi^1_{j_1}, \chi^2_{j_2} \mid Z^1, Z^2) = \sum_{X^1} \sum_{X^2} P(X^1, X^2 \mid Z^1, Z^2)\, \omega^1_{j_1 t}(X^1)\, \omega^2_{j_2 t}(X^2),$$
$$P(X^1, X^2 \mid Z^1, Z^2) = \frac{1}{c}\, P(X^1 \mid Z^1)\, P(X^2 \mid Z^2)\, \varepsilon(X^1, X^2), \quad (18)$$

where $X^i$ are the joint hypotheses involving all of the measurements and all of the targets, and $\omega^i_{jt}(X^i)$ are the binary indicators of the measurement-target association. The additional term $\varepsilon(X^1, X^2)$ depends on the correlation of the individual hypotheses and reflects the influence of the localization of the current measurements on the joint hypotheses.
These equations are obtained assuming that communication exists after every observation; there are only approximations for the case in which communication is sporadic and a substantial amount of noise occurs. Therefore, this algorithm is a theoretical model that has some limitations in practical applications.
3.6. Distributed Multiple Hypothesis Test. The distributed version of the MHT algorithm (MHT-D) [26, 27] follows a similar structure as the JPDA-D algorithm. Let us assume the case in which one node must fuse two sets of hypotheses and tracks. If the hypothesis and track sets are represented by $\Theta^i(k)$ and $T^i(k)$ with $i = 1, 2$; the hypothesis probabilities are represented by $P^i$; and the state distributions of the tracks $T^i(k)$ are represented by $P(T^i)$ and $p(x \mid T^i, Z^i)$; then the maximum available information in the fusion node is $Z = Z^1 \cup Z^2$. The data fusion objective of the MHT-D is to obtain the set of hypotheses $\Theta(k)$, the set of tracks $T(k)$, the hypothesis probabilities $P(\Theta \mid Z)$, and the state distributions $p(x \mid T, Z)$ for the observed data.
The MHT-D algorithm is composed of the following steps:
(1) hypothesis formation: for each hypothesis pair $\Theta^1_j$ and $\Theta^2_l$ that could be fused, a track is formed by associating the pairs of tracks $T^1$ and $T^2$, where each pair comes from one node and could originate from the same target. The final result of this stage is a set of hypotheses denoted by $\Theta(k)$ and the fused tracks $T(k)$;
(2) hypothesis evaluation: in this stage, the association probability of each hypothesis and the estimated state of each fused track are obtained. A distributed estimation algorithm is employed to calculate the likelihood of the possible associations and the estimations obtained for each specific association.
Using the information model, the probability of each fused hypothesis is given by

$$P(\Theta \mid Z) = \frac{1}{c}\, L(\Theta^1, \Theta^2)\, P(\Theta^1 \mid Z^1)\, P(\Theta^2 \mid Z^2), \quad (19)$$

where $c$ is a normalizing constant and $L(\Theta^1, \Theta^2)$ is the likelihood of each hypothesis pair.
The main disadvantage of the MHT-D is its high computational cost, which is on the order of $O(M^N)$, where $M$ is the number of possible associations and $N$ is the number of variables to be estimated.
3.7. Graphical Models. Graphical models are a formalism for representing and reasoning with probabilities and independence. A graphical model represents a conditional decomposition of the joint probability. It can be depicted as a graph in which the nodes denote random variables; the edges denote possible dependences between the random variables, and plates denote the replication of a substructure, with the appropriate indexing of the relevant variables. The graph captures the joint distribution over the random variables, which can be decomposed into a product of factors that each depend on only a subset of the variables. There are two major classes of graphical models: (i) Bayesian networks [28], also known as directed graphical models, and (ii) Markov random fields, also known as undirected graphical models. Directed graphical models are useful for expressing causal relationships between random variables, whereas undirected models are better suited for expressing soft constraints between random variables. We refer the reader to the book of Koller and Friedman [29] for more information on graphical models.
A framework based on graphical models can solve the problem of distributed data association in synchronized sensor networks with overlapping areas in which each sensor receives noisy measurements; this solution was proposed by Chen et al. [30, 31]. Their work is based on graphical models that are used to represent the statistical dependence between random variables. The data association problem is treated as an inference problem and solved by using the max-product algorithm [32]. Graphical models represent statistical dependencies between variables as graphs, and the max-product algorithm converges when the graph is a tree. Moreover, the employed algorithm can be implemented in a distributed manner by exchanging messages between the source nodes in parallel. With this algorithm, if each sensor has $M$ possible combinations of associations and there are $N$ variables to be estimated, the complexity is $O(NM^2)$, which is reasonable and less than the $O(M^N)$ complexity of the MHT-D algorithm. However, special attention must be given to correlated variables when building the graphical model.
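To illustrate why tree-structured graphs matter, the following toy Python implementation runs max-product on a chain of binary variables, where the algorithm is exact; the node and pairwise potentials are invented for the example:

```python
import numpy as np

def max_product_chain(phi, psi):
    # phi: list of node potentials (length-2 arrays); psi: 2x2 pairwise
    # potential shared by all edges of the chain x_0 - x_1 - ... - x_{n-1}.
    n = len(phi)
    msg = [np.ones(2) for _ in range(n)]  # msg[i]: message into node i from the right
    for i in range(n - 2, -1, -1):
        msg[i] = np.array([max(psi[xi, xj] * phi[i + 1][xj] * msg[i + 1][xj]
                               for xj in range(2)) for xi in range(2)])
    # Decode the MAP assignment from left to right.
    assignment = [int(np.argmax(phi[0] * msg[0]))]
    for i in range(1, n):
        prev = assignment[-1]
        assignment.append(int(np.argmax([psi[prev, xj] * phi[i][xj] * msg[i][xj]
                                         for xj in range(2)])))
    return assignment
```

With node potentials favoring state 0 at the first node and state 1 at the last, and a pairwise potential favoring agreement, the returned assignment is the exact MAP configuration of the chain; on graphs with cycles the same message passing is only approximate.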
4. State Estimation Methods
State estimation techniques aim to determine the state of the target under movement (typically the position) given the observations or measurements. State estimation techniques are also known as tracking techniques. In their general form, it is not guaranteed that the target observations are relevant, which means that some of the observations could actually come from the target and others could be only noise. The state estimation phase is a common stage in data fusion algorithms because the target observations could come from different sensors or sources, and the final goal is to obtain a global target state from the observations.
The estimation problem involves finding the values of the vector state (e.g., position, velocity, and size) that fit as much as possible with the observed data. From a mathematical perspective, we have a set of redundant observations, and the goal is to find the set of parameters that provides the best fit to the observed data. In general, these observations are corrupted by errors and by the propagation of noise in the measurement process. State estimation methods fall under level 1 of the JDL classification and can be divided into two broader groups:
(1) linear dynamics and measurements: here, the estimation problem has a standard solution. Specifically, when the equations of the object state and the measurements are linear, the noise follows the Gaussian distribution, and the environment is not cluttered, the optimal theoretical solution is based on the Kalman filter;
(2) nonlinear dynamics: the state estimation problem becomes difficult, and there is no analytical solution that solves the problem in a general manner. In principle, there are no practical algorithms available to solve this problem satisfactorily in the general case.
Most of the state estimation methods are based on control theory and employ the laws of probability to compute a vector state from a vector measurement or a stream of vector measurements. Next, the most common estimation methods are presented: maximum likelihood and maximum posterior (Section 4.1), the Kalman filter (Section 4.2), the particle filter (Section 4.3), the distributed Kalman filter (Section 4.4), the distributed particle filter (Section 4.5), and covariance consistency methods (Section 4.6).
4.1. Maximum Likelihood and Maximum Posterior. The maximum likelihood (ML) technique is an estimation method based on probabilistic theory. Probabilistic estimation methods are appropriate when the state variable follows an unknown probability distribution [33]. In the context of data fusion, $x$ is the state that is being estimated, and $z = \{z(1), \ldots, z(k)\}$ is a sequence of $k$ previous observations of $x$. The likelihood function $\lambda(x)$ is defined as the probability density function of the sequence of observations given the true value of the state $x$:

$$\lambda(x) = p(z \mid x). \quad (20)$$

The ML estimator finds the value of $x$ that maximizes the likelihood function:

$$\hat{x}(k) = \arg\max_{x}\ p(z \mid x), \quad (21)$$
which can be obtained from the analytical or empirical models of the sensors. This function expresses the probability of the observed data. The main disadvantage of this method in practice is that it requires the analytical or empirical model of the sensor to be known in order to provide the prior distribution and compute the likelihood function. This method can also systematically underestimate the variance of the distribution, which leads to a bias problem. However, the bias of the ML solution becomes less significant as the number of data points increases, and in the limit $k \to \infty$ the estimated variance equals the true variance of the distribution that generated the data.
The maximum posterior (MAP) method is based on Bayesian theory. It is employed when the parameter $x$ to be estimated is the output of a random variable that has a known probability density function $p(x)$. In the context of data fusion, $x$ is the state that is being estimated, and $z = \{z(1), \ldots, z(k)\}$ is a sequence of $k$ previous observations of $x$. The MAP estimator finds the value of $x$ that maximizes the posterior probability distribution:

$$\hat{x}(k) = \arg\max_{x}\ p(x \mid z). \quad (22)$$
Both methods (ML and MAP) aim to find the most likely value for the state $x$. However, ML assumes that $x$ is a fixed but unknown point of the parameter space, whereas MAP considers $x$ to be the output of a random variable with a known a priori probability density function. The two methods are equivalent when there is no a priori information about $x$, that is, when there are only observations.
4.2. The Kalman Filter. The Kalman filter is the most popular estimation technique. It was originally proposed by Kalman [34] and has been widely studied and applied since then. The Kalman filter estimates the state $x$ of a discrete-time process governed by the following space-time model:

$$x(k+1) = \Phi(k)\, x(k) + G(k)\, u(k) + w(k) \quad (23)$$

with the observations or measurements at time $k$ of the state represented by

$$z(k) = H(k)\, x(k) + v(k), \quad (24)$$

where $\Phi(k)$ is the state transition matrix, $G(k)$ is the input transition matrix, $u(k)$ is the input vector, $H(k)$ is the measurement matrix, and $w$ and $v$ are random Gaussian variables with zero mean and covariance matrices $Q(k)$ and $R(k)$, respectively. Based on the measurements and on the system parameters, the estimation of $x(k)$, which is represented by $\hat{x}(k \mid k)$, and the prediction of $x(k+1)$, which is represented by $\hat{x}(k+1 \mid k)$, are given by

$$\hat{x}(k \mid k) = \hat{x}(k \mid k-1) + K(k)\, [z(k) - H(k)\, \hat{x}(k \mid k-1)],$$
$$\hat{x}(k+1 \mid k) = \Phi(k)\, \hat{x}(k \mid k) + G(k)\, u(k), \quad (25)$$

respectively, where $K$ is the filter gain determined by

$$K(k) = P(k \mid k-1)\, H^{T}(k)\, [H(k)\, P(k \mid k-1)\, H^{T}(k) + R(k)]^{-1}, \quad (26)$$

where $P(k \mid k-1)$ is the prediction covariance matrix, which can be determined by

$$P(k+1 \mid k) = \Phi(k)\, P(k \mid k)\, \Phi^{T}(k) + Q(k) \quad (27)$$

with

$$P(k \mid k) = P(k \mid k-1) - K(k)\, H(k)\, P(k \mid k-1). \quad (28)$$
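Equations (23)-(28) translate almost line by line into code. Below is a minimal Python (NumPy) sketch of one predict-update cycle; the constant-velocity model and the noise values in the usage note are illustrative assumptions, and the control input $u$ is omitted:

```python
import numpy as np

def kalman_step(x, P, z, Phi, H, Q, R):
    # Prediction, Eqs. (25) and (27), with no control input u.
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q
    # Filter gain, Eq. (26).
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # Update, Eqs. (25) and (28).
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = P_pred - K @ H @ P_pred
    return x_new, P_new
```

With $\Phi = [[1, 1], [0, 1]]$ (1-D constant velocity, unit time step) and $H = [[1, 0]]$, feeding noise-free positions 0, 1, ..., 9 drives the estimate toward position 9 and velocity 1.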
The Kalman filter is mainly employed to fuse low-level data. If the system can be described by a linear model and the error can be modeled as Gaussian noise, then the recursive Kalman filter obtains statistically optimal estimations [35]. However, other methods are required to address nonlinear dynamic models and nonlinear measurements. The modified Kalman filter known as the extended Kalman filter (EKF) is a standard approach for implementing nonlinear recursive filters [36]. The EKF is one of the most often employed methods for fusing data in robotic applications. However, it has some disadvantages, because the computations of the Jacobians are extremely expensive. Some attempts have been made to reduce the computational cost, such as linearization, but these attempts introduce errors in the filter and can make it unstable.
The unscented Kalman filter (UKF) [37] has gained popularity because it avoids the linearization step and the associated errors of the EKF [38]. The UKF employs a deterministic sampling strategy to establish a minimal set of points around the mean. This set of points captures the true mean and covariance completely. These points are then propagated through the nonlinear functions, and the covariance of the estimations can be recovered. Another advantage of the UKF is its suitability for parallel implementations.
4.3. Particle Filter. Particle filters are recursive implementations of sequential Monte Carlo methods [39]. This method builds the posterior density function using several random samples called particles. Particles are propagated over time with a combination of sampling and resampling steps. At each iteration, the resampling step discards some particles, increasing the relevance of the regions with a higher posterior probability. In the filtering process, several particles of the same state variable are employed, and each particle has an associated weight that indicates its quality. Therefore, the estimation is the result of a weighted sum of all of the particles. The standard particle filter algorithm has two phases: (1) the prediction phase and (2) the updating phase. In the prediction phase, each particle is modified according to the existing model, adding random noise to simulate the noise effect. Then, in the updating phase, the weight of each particle is reevaluated using the latest available sensor observation, and particles with low weights are removed. Specifically, a generic particle filter comprises the following steps.
(1) Initialization of the particles:
(i) let $N$ be the number of particles;
(ii) $x^{(i)}(1) = [x(1), y(1), 0, 0]$ for $i = 1, \ldots, N$.
(2) Prediction step:
(i) for each particle $i = 1, \ldots, N$, evaluate the state $x^{(i)}(k+1 \mid k)$ of the system using the state at time instant $k$ and the noise of the system at time $k$:

$$x^{(i)}(k+1 \mid k) = \Phi(k)\, x^{(i)}(k) + (\text{cauchy-distribution-noise})^{(i)}(k), \quad (29)$$

where $\Phi(k)$ is the transition matrix of the system.
(3) Evaluate the particle weights. For each particle $i = 1, \ldots, N$:
(i) compute the predicted observation of the system using the current predicted state and the noise at instant $k$:

$$z^{(i)}(k+1 \mid k) = H(k+1)\, x^{(i)}(k+1 \mid k) + (\text{gaussian-measurement-noise})(k+1); \quad (30)$$

(ii) compute the likelihoods (weights) according to the given distribution:

$$\text{likelihood}^{(i)} = N(z^{(i)}(k+1 \mid k);\ z(k+1), \text{var}); \quad (31)$$

(iii) normalize the weights as follows:

$$w^{(i)} = \frac{\text{likelihood}^{(i)}}{\sum_{j=1}^{N} \text{likelihood}^{(j)}}. \quad (32)$$

(4) Resampling/selection: multiply particles with higher weights and remove those with lower weights. The current state must be adjusted using the computed weights of the new particles.
(i) Compute the cumulative weights:

$$\text{CumWt}^{(i)} = \sum_{j=1}^{i} w^{(j)}. \quad (33)$$

(ii) Generate uniformly distributed random variables $u^{(i)} \sim U(0, 1)$, with the number of draws equal to the number of particles.
(iii) Determine which particles should be multiplied and which ones removed.
(5) Propagation phase:
(i) incorporate the new values of the state after the resampling at instant $k$ to calculate the value at instant $k+1$:

$$x^{(1:N)}(k+1 \mid k+1) = x(k+1 \mid k); \quad (34)$$

(ii) compute the posterior mean:

$$\hat{x}(k+1) = \text{mean}\,[x^{(i)}(k+1 \mid k+1)], \quad i = 1, \ldots, N; \quad (35)$$

(iii) repeat steps 2 to 5 for each time instant.
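The steps above condense into a short bootstrap particle filter. The Python sketch below assumes a 1-D random-walk state and Gaussian noise for both the dynamics and the measurement; the generic algorithm admits other distributions (e.g., the Cauchy process noise of step 2), and the seed and noise levels are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(measurements, n_particles=500, proc_std=1.0, meas_std=1.0):
    # Initialization (step 1): spread particles around the first measurement.
    particles = rng.normal(measurements[0], meas_std, n_particles)
    estimates = []
    for z in measurements:
        # Prediction (step 2): propagate each particle through the dynamics
        # (identity here) plus process noise.
        particles = particles + rng.normal(0.0, proc_std, n_particles)
        # Weighting (step 3): Gaussian likelihood of each particle, normalized.
        w = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
        w /= w.sum()
        # Resampling/selection (step 4): draw particles proportionally to weight.
        particles = rng.choice(particles, size=n_particles, p=w)
        # Posterior mean (step 5).
        estimates.append(particles.mean())
    return estimates
```

Feeding a constant measurement sequence concentrates the particle cloud around the measured value; with fewer particles, the variance of the estimate visibly increases, illustrating the particle-count trade-off discussed below.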
Particle filters are more flexible than Kalman filters and can cope with nonlinear dependencies and non-Gaussian densities in the dynamic model and in the noise error. However, they have some disadvantages. A large number of particles is required to obtain a small variance in the estimator. It is also difficult to establish the optimal number of particles in advance, and the number of particles affects the computational cost significantly. Earlier versions of particle filters employed a fixed number of particles, but recent studies have started to use a dynamic number of particles [40].
4.4. The Distributed Kalman Filter. The distributed Kalman filter requires correct clock synchronization between the sources, as demonstrated in [41]. In other words, to correctly use the distributed Kalman filter, the clocks of all of the sources must be synchronized. This synchronization is typically achieved by using protocols that employ a shared global clock, such as the network time protocol (NTP). Synchronization problems between clocks have been shown to have an effect on the accuracy of the Kalman filter, producing inaccurate estimations [42].
If the estimations are consistent and the cross-covariance is known (or the estimations are uncorrelated), then it is possible to use distributed Kalman filters [43]. However, the cross-covariance must be determined exactly, or the observations must be consistent.
We refer the reader to Liggins II et al. [20] for more details about the Kalman filter in distributed and hierarchical architectures.
4.5. Distributed Particle Filter. Distributed particle filters have gained attention recently [44–46]. Coates [45] used a distributed particle filter to monitor an environment that could be captured by a Markovian state-space model involving nonlinear dynamics and observations and non-Gaussian noise.
In contrast, earlier attempts to handle out-of-sequence measurements with particle filters are based on regenerating the probability density function at the time instant of the out-of-sequence measurement [47]. In a particle filter, this step requires a large computational cost, in addition to the space needed to store the previous particles. To avoid this problem, Orton and Marrs [48] proposed to store the information on the particles at each time instant, saving the cost of recalculating this information. This technique is close to optimal, and when the delay increases, the result is only slightly affected [49]. However, it requires a very large amount of space to store the state of the particles at each time instant.
4.6. Covariance Consistency Methods: Covariance Intersection/Union. Covariance consistency methods (intersection and union) were proposed by Uhlmann [43] and are general and fault-tolerant frameworks for maintaining covariance means and estimations in a distributed network. These methods are not estimation techniques per se; rather, they are techniques for fusing estimations. The distributed Kalman filter requirement of independent measurements or known cross-covariances is not a constraint with this method.
4.6.1. Covariance Intersection. If the Kalman filter is employed to combine two estimations, $(x_1, P_1)$ and $(x_2, P_2)$, then it is assumed that the joint covariance has the following form:

$$\begin{bmatrix} P_1 & P_{12} \\ P_{12}^{T} & P_2 \end{bmatrix}, \quad (36)$$

where the cross-covariance $P_{12}$ must be known exactly so that the Kalman filter can be applied without difficulty. Because the computation of the cross-covariances is computationally intensive, Uhlmann [43] proposed the covariance intersection (CI) algorithm.
Let us assume that a joint covariance can be defined with diagonal blocks $\tilde{P}_1 \geq P_1$ and $\tilde{P}_2 \geq P_2$:

$$\begin{bmatrix} \tilde{P}_1 & 0 \\ 0 & \tilde{P}_2 \end{bmatrix} \quad (37)$$

for every possible instance of the unknown cross-covariance; then, the components of this matrix can be employed in the Kalman filter equations to provide a fused estimation $(x, P)$ that is considered consistent. The key point of this method relies on generating a joint covariance matrix that can represent a useful fused estimation (in this context, useful refers to an estimation with a lower associated uncertainty). In summary, the CI algorithm computes the joint covariance matrix for which the Kalman filter provides the best fused estimation $(x, P)$ with respect to a fixed measure of the covariance matrix (e.g., the minimum determinant).
A specific covariance criterion must be established because there is no unique minimum joint covariance in the order of the positive semidefinite matrices. Moreover, although the joint covariance is the basis of the formal analysis of the CI algorithm, the actual result is a nonlinear mixture of the information stored in the estimations being fused, following the equations

$$P = \left(\omega_1 H_1^{T} P_1^{-1} H_1 + \omega_2 H_2^{T} P_2^{-1} H_2 + \cdots + \omega_n H_n^{T} P_n^{-1} H_n\right)^{-1},$$
$$x = P \left(\omega_1 H_1^{T} P_1^{-1} x_1 + \omega_2 H_2^{T} P_2^{-1} x_2 + \cdots + \omega_n H_n^{T} P_n^{-1} x_n\right), \quad (38)$$

where $H_i$ is the transformation of the fused state-space estimation to the space of the estimated state $i$. The values of $\omega_i$ can be calculated to minimize the covariance determinant using convex optimization packages and semidefinite programming. The result of the CI algorithm has different characteristics compared to the Kalman filter. For example, if two estimations $(a, P)$ and $(b, P)$ with equal covariances are provided, then, since the Kalman filter is based on the statistical independence assumption, it produces a fused estimation with covariance $P/2$. In contrast, the CI method does not assume independence and, thus, must be consistent even in the case in which the estimations are completely correlated, giving the estimated fused covariance $P$. In the case of estimations in which $P_a < P_b$, the CI algorithm does not gain information from the estimation $(b, P_b)$; thus, the fused result is $(a, P_a)$.
Every consistent joint covariance is sufficient to produce a fused estimation that guarantees consistency. However, it is also necessary to guarantee a lack of divergence. Divergence is avoided in the CI algorithm by choosing a specific measure (e.g., the determinant), which is minimized in each fusion operation. This measure represents a nondivergence criterion, because the size of the estimated covariance according to this criterion will not increase.
The application of the CI method guarantees consistency and nondivergence for every sequence of mean- and covariance-consistent estimations. However, this method does not work well when the measurements to be fused are inconsistent.
4.6.2. Covariance Union. CI solves the problem of correlated inputs but not the problem of inconsistent inputs (inconsistent inputs refer to different estimations, each of which has a high accuracy (small variance) but also a large difference from the states of the others); thus, the covariance union (CU) algorithm was proposed to solve the latter [43]. CU addresses the following problem: two estimations, $(x_1, P_1)$ and $(x_2, P_2)$, relate to the state of an object and are mutually inconsistent. This issue arises when the difference between the estimated means is larger than the provided covariances allow. Inconsistent inputs can be detected using the Mahalanobis distance [50] between them, which is defined as

$$d = (x_1 - x_2)^{T} (P_1 + P_2)^{-1} (x_1 - x_2), \quad (39)$$

and checking whether this distance is larger than a given threshold.
The Mahalanobis distance accounts for the covariance information when computing the distance. If the difference between the estimations is high but their covariances are also high, the Mahalanobis distance yields a small value. In contrast, if the difference between the estimations is small but the covariances are also small, it can produce a larger distance value. A high Mahalanobis distance could indicate that the estimations are inconsistent; however, a specific threshold must be established by the user or learned automatically.
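The inconsistency test of Eq. (39) is straightforward to implement; in the Python sketch below, the threshold value is an assumed user choice (e.g., a chi-square quantile), as the text notes it must be set by the user or learned:

```python
import numpy as np

def mahalanobis_sq(mu1, P1, mu2, P2):
    # Squared Mahalanobis distance between two estimates, Eq. (39).
    d = mu1 - mu2
    return float(d.T @ np.linalg.inv(P1 + P2) @ d)

def inconsistent(mu1, P1, mu2, P2, threshold=9.0):
    # threshold is an assumed user-chosen value, not prescribed by the text.
    return mahalanobis_sq(mu1, P1, mu2, P2) > threshold
```

Two unit-variance scalar estimates at 0 and 10 give a squared distance of 50 and are flagged as inconsistent, while estimates at 0 and 1 give 0.5 and are not.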
The CU algorithm aims to solve the following problem: suppose that a filtering algorithm provides two observations with means and covariances $(x_1, P_1)$ and $(x_2, P_2)$, respectively. It is known that one of the observations is correct and the other is erroneous; however, the identity of the correct estimation is unknown and cannot be determined. In this situation, if both estimations are employed as input to the Kalman filter, there will be a problem, because the Kalman filter only guarantees a consistent output if the track is updated with a measurement that is consistent with it. In the specific case in which the measurements correspond to the same object but are acquired from two different sensors, the Kalman filter can only guarantee that the output is consistent if it is consistent with both measurements separately. Because it is not possible to know which estimation is correct, the only way to combine the two estimations rigorously is to provide an estimation $(u, U)$ that is consistent with both estimations and obeys the following properties:
1+ (
1) (
1)
,
2+ (
2) (
2)
,
(40)
where some measure of the matrix size (i.e., the determinant) is
minimized.

In other words, the previous equations indicate that if the
estimation (\mu_1, P_1) is consistent, then translating the vector
\mu_1 to u requires increasing the covariance by adding a matrix at
least as large as the product (u - \mu_1)(u - \mu_1)^T in order to
remain consistent. The same situation applies to the measurement
(\mu_2, P_2) in order for it to be consistent.
A simple strategy is to choose the mean of one of the measurements as
the fused value (u = \mu_1). In this case, the value of U must be
chosen such that the estimation is consistent with the worst case
(the correct measurement is \mu_2). However, it is possible to assign
u an intermediate value between \mu_1 and \mu_2 to decrease the value
of U. Therefore, the CU algorithm establishes the mean fused value u
that has the least covariance U while remaining sufficiently large
for both measurements (\mu_1 and \mu_2) to be consistent.
Because the matrix inequalities presented in the previous equations
are convex, convex optimization algorithms must be employed to solve
them. The value of U can be computed with the iterative method
described by Julier et al. [51]. The obtained covariance could be
significantly larger than any of the initial covariances and is an
indicator of the existing uncertainty between the initial
estimations. One of the advantages of the CU method is that the same
process can easily be extended to more than two inputs.
5. Decision Fusion Methods
A decision is typically taken based on the knowledge of the perceived
situation, which is provided by many sources in the data fusion
domain. These techniques aim to make a high-level inference about the
events and activities that are produced from the detected targets.
They often use symbolic information, and the fusion process requires
reasoning while accounting for uncertainties and constraints. These
methods fall under level 2 (situation assessment) and level 4 (impact
assessment) of the JDL data fusion model.
5.1. The Bayesian Methods. Information fusion based on the Bayesian
inference provides a formalism for combining evidence according to
the rules of probability theory. Uncertainty is represented using
conditional probability terms that describe beliefs and take on
values in the interval [0, 1], where zero indicates a complete lack
of belief and one indicates an absolute belief. The Bayesian
inference is based on the Bayes rule as follows:

P(H | E) = \frac{P(E | H) P(H)}{P(E)}, (41)

where the posterior probability, P(H | E), represents the belief in
the hypothesis H given the information E. This probability is
obtained by multiplying the a priori probability of the hypothesis
P(H) by the probability of having E given that H is true, P(E | H).
The value P(E) is used as a normalizing constant. The main
disadvantage of the Bayesian inference is that the probabilities P(H)
and P(E | H) must be known. To estimate the conditional
probabilities, Pan et al. [52] proposed the use of NNs, whereas Coue
et al. [53] proposed Bayesian programming.
Hall and Llinas [54] described the following problems associated with
Bayesian inference.
(i) Difficulty in establishing the value of a priori probabilities.
(ii) Complexity when there are multiple potential hypotheses and a
substantial number of events that depend on the conditions.
(iii) The hypotheses should be mutually exclusive.
(iv) Difficulty in describing the uncertainty of the decisions.
5.2. The Dempster-Shafer Inference. The Dempster-Shafer inference is
based on the mathematical theory introduced by Dempster [55] and
Shafer [56], which generalizes the Bayesian theory. The
Dempster-Shafer theory provides a formalism that can be used to
represent incomplete knowledge, update beliefs, and combine evidence,
and it allows us to represent uncertainty explicitly [57].

A fundamental concept in the Dempster-Shafer reasoning is the frame
of discernment, which is defined as follows. Let \Theta = \{\theta_1,
\theta_2, \ldots, \theta_N\} be the set of all possible states that
define the system, and let \Theta be exhaustive and mutually
exclusive, because the system can be in only one state \theta_i,
where 1 \le i \le N. The set \Theta is called a frame of discernment,
because its elements are employed to discern the current state of the
system.
The elements of the power set 2^\Theta are called hypotheses. In the
Dempster-Shafer theory, based on the evidence E, a probability is
assigned to each hypothesis A \in 2^\Theta according to the basic
probability assignment, or mass function, m : 2^\Theta \to [0, 1],
which satisfies

m(\emptyset) = 0. (42)
Thus, the mass function of the empty set is zero. Furthermore, the
mass function of a hypothesis is larger than or equal to zero for all
of the hypotheses. Consider

m(A) \ge 0, \forall A \in 2^\Theta. (43)

The sum of the mass function over all the hypotheses is one. Consider

\sum_{A \in 2^\Theta} m(A) = 1. (44)

To express incomplete beliefs in a hypothesis A, the Dempster-Shafer
theory defines the belief function bel : 2^\Theta \to [0, 1] over
\Theta as

bel(A) = \sum_{B \subseteq A} m(B), (45)

where bel(\emptyset) = 0 and bel(\Theta) = 1. The doubt level in A
can be expressed in terms of the belief function by

dou(A) = bel(\neg A) = \sum_{B \subseteq \neg A} m(B). (46)

To express the plausibility of each hypothesis, the function pl :
2^\Theta \to [0, 1] over \Theta is defined as

pl(A) = 1 - dou(A) = \sum_{B \cap A \ne \emptyset} m(B). (47)

Intuitively, plausibility indicates that there is less uncertainty in
hypothesis A if it is more plausible. The confidence interval
[bel(A), pl(A)] defines the true belief in hypothesis A. To combine
the effects of two mass functions m_1 and m_2, the Dempster-Shafer
theory defines the combination rule m_1 \oplus m_2 as

m_1 \oplus m_2(\emptyset) = 0,

m_1 \oplus m_2(A) = \frac{\sum_{B \cap C = A} m_1(B) m_2(C)}
{1 - \sum_{B \cap C = \emptyset} m_1(B) m_2(C)}. (48)
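A minimal sketch of the combination rule (48) over a small frame of discernment, with hypotheses represented as frozensets and illustrative mass values:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions over the same frame, Eq. (48).

    m1, m2: dicts mapping frozenset hypotheses to mass values."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc  # mass that would go to the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    # Normalize by 1 - K, where K is the total conflict.
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

# Frame of discernment {ship, plane}; two sources of evidence.
ship, plane = frozenset({"ship"}), frozenset({"plane"})
both = ship | plane
m1 = {ship: 0.6, both: 0.4}          # source 1 leans toward ship
m2 = {ship: 0.5, plane: 0.3, both: 0.2}
fused = dempster_combine(m1, m2)
```

Note how mass on the full set `both` encodes ignorance rather than being forced onto a specific hypothesis, which is the key difference from assigning Bayesian priors.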
In contrast to the Bayesian inference, a priori probabilities are not
required in the Dempster-Shafer inference, because they are assigned
at the instant at which the information is provided. Several studies
in the literature have compared the use of the Bayesian inference and
the Dempster-Shafer inference, such as [58–60]. Wu et al. [61] used
the Dempster-Shafer theory to fuse information in context-aware
environments. This work was extended in [62] to dynamically modify
the weights associated with the sensor measurements. Therefore, the
fusion mechanism is calibrated according to the recent measurements
of the sensors (in cases in which the ground-truth is available). In
the military domain [63], the Dempster-Shafer reasoning is used with
the a priori information stored in a database for classifying
military ships. Morbee et al. [64] described the use of the
Dempster-Shafer theory to build 2D occupancy maps from several
cameras and to evaluate the contribution of subsets of cameras to a
specific task. Each task is the observation of an event of interest,
and the goal is to assess the validity of a set of hypotheses that
are fused using the Dempster-Shafer theory.
5.3. Abductive Reasoning. Abductive reasoning, or inferring the best
explanation, is a reasoning method in which a hypothesis is chosen
under the assumption that, if it is true, it explains the observed
event most accurately [65]. In other words, when an event is
observed, the abduction method attempts to find the best explanation.

In the context of probabilistic reasoning, abductive inference finds
the maximum likelihood (ML) posterior configuration of the system
variables given some observed variables. Abductive reasoning is more
a reasoning pattern than a data fusion technique. Therefore,
different inference methods, such as NNs [66] or fuzzy logic [67],
can be employed.
5.4. Semantic Methods. Decision fusion techniques that employ
semantic data from different sources as an input could provide more
accurate results than those that rely on single sources alone. There
is a growing interest in techniques that automatically determine the
presence of semantic features in videos to bridge the semantic gap
[68].
Semantic information fusion is essentially a scheme in which raw
sensor data are processed such that the nodes exchange only the
resultant semantic information. Semantic information fusion typically
covers two phases: (i) building the knowledge and (ii) pattern
matching (inference). The first phase (typically offline)
incorporates the most appropriate knowledge into semantic
information. Then, the second phase (typically online or in
real-time) fuses relevant attributes and provides a semantic
interpretation of the sensor data [69–71].
Semantic fusion can be viewed as a way of integrating and translating
sensor data into formal languages. The language obtained from the
observations of the environment is then compared with similar
languages that are stored in the database. The key to this strategy
is that similar behaviors represented by formal languages are also
semantically similar. This type of method provides savings in the
cost of transmission, because the nodes need only transmit the formal
language structure instead of the raw data. However, a known set of
behaviors must be stored in a database in advance, which might be
difficult in some scenarios.
6. Conclusions
This paper reviews the most popular methods and techniques for
performing data/information fusion. To determine whether the
application of data/information fusion methods is feasible, we must
evaluate the computational cost of the process and the delay
introduced in the communication. A centralized data fusion approach
is theoretically optimal when there is no cost of transmission and
there are sufficient computational resources. However, this situation
typically does not hold in practical applications.

The selection of the most appropriate technique depends on the type
of problem and the established assumptions of each technique.
Statistical data fusion methods (e.g., PDA, JPDA, MHT, and Kalman)
are optimal under specific conditions [72]. First, the assumption
that the targets are moving
independently and that the measurements are normally distributed
around the predicted position typically does not hold. Second,
because the statistical techniques model all of the events as
probabilities, they typically have several parameters and a priori
probabilities for false measurements and detection errors that are
often difficult to obtain (at least in an optimal sense). For
example, in the case of the MHT algorithm, specific parameters must
be established that are nontrivial to determine and are very
sensitive [73]. In contrast, statistical methods that optimize over
several frames are computationally intensive, and their complexity
typically grows exponentially with the number of targets. For
example, in the case of particle filters, tracking several targets
can be accomplished jointly as a group or individually. If several
targets are tracked jointly, the necessary number of particles grows
exponentially. Therefore, in practice, it is better to track them
individually, under the assumption that the targets do not interact
with one another.
In contrast to centralized systems, the distributed data fusion
methods introduce some challenges into the data fusion process, such
as (i) spatial and temporal alignment of the information, (ii)
out-of-sequence measurements, and (iii) data correlation, as reported
by Castanedo et al. [74, 75]. The inherent redundancy of distributed
systems could be exploited with distributed reasoning techniques and
cooperative algorithms to improve the individual node estimations, as
reported by Castanedo et al. [76]. In addition to the previous
studies, a new trend based on the geometric notion of a
low-dimensional manifold is gaining attention in the data fusion
community. An example is the work of Davenport et al. [77], which
proposes a simple model that captures the correlation between the
sensor observations by matching the parameter values for the
different obtained manifolds.
Acknowledgments
The author would like to thank Jesús García, Miguel A. Patricio, and
James Llinas for their interesting and related discussions on several
of the topics presented in this paper.
References
[1] JDL, Data Fusion Lexicon, Technical Panel For C3, F. E. White,
San Diego, Calif, USA, Code 420, 1991.
[2] D. L. Hall and J. Llinas, "An introduction to multisensor data
fusion," Proceedings of the IEEE, vol. 85, no. 1, pp. 6–23, 1997.
[3] H. F. Durrant-Whyte, "Sensor models and multisensor integration,"
International Journal of Robotics Research, vol. 7, no. 6, pp.
97–113, 1988.
[4] B. V. Dasarathy, "Sensor fusion potential exploitation-innovative
architectures and illustrative applications," Proceedings of the
IEEE, vol. 85, no. 1, pp. 24–38, 1997.
[5] R. C. Luo, C.-C. Yih, and K. L. Su, "Multisensor fusion and
integration: approaches, applications, and future research
directions," IEEE Sensors Journal, vol. 2, no. 2, pp. 107–119, 2002.
[6] J. Llinas, C. Bowman, G. Rogova, A. Steinberg, E. Waltz, and
F. White, "Revisiting the JDL data fusion model II," Technical
Report, DTIC Document, 2004.
[7] E. P. Blasch and S. Plano, "JDL level 5 fusion model: user
refinement issues and applications in group tracking," in Proceedings
of the Signal Processing, Sensor Fusion, and Target Recognition XI,
pp. 270–279, April 2002.
[8] H. F. Durrant-Whyte and M. Stevens, "Data fusion in decentralized
sensing networks," in Proceedings of the 4th International Conference
on Information Fusion, pp. 302–307, Montreal, Canada, 2001.
[9] J. Manyika and H. Durrant-Whyte, Data Fusion and Sensor
Management: A Decentralized Information-Theoretic Approach, Prentice
Hall, Upper Saddle River, NJ, USA, 1995.
[10] S. S. Blackman, "Association and fusion of multiple sensor
data," in Multitarget-Multisensor Tracking: Advanced Applications,
pp. 187–217, Artech House, 1990.
[11] S. Lloyd, "Least squares quantization in PCM," IEEE Transactions
on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
[12] M. Shindler, A. Wong, and A. Meyerson, "Fast and accurate
k-means for large datasets," in Proceedings of the 25th Annual
Conference on Neural Information Processing Systems (NIPS '11), pp.
2375–2383, December 2011.
[13] Y. Bar-Shalom and E. Tse, "Tracking in a cluttered environment
with probabilistic data association," Automatica, vol. 11, no. 5, pp.
451–460, 1975.
[14] T. E. Fortmann, Y. Bar-Shalom, and M. Scheffe, "Multi-target
tracking using joint probabilistic data association," in Proceedings
of the 19th IEEE Conference on Decision and Control including the
Symposium on Adaptive Processes, vol. 19, pp. 807–812, December 1980.
[15] D. B. Reid, "An algorithm for tracking multiple targets," IEEE
Transactions on Automatic Control, vol. 24, no. 6, pp. 843–854, 1979.
[16] C. L. Morefield, "Application of 0-1 integer programming to
multitarget tracking problems," IEEE Transactions on Automatic
Control, vol. 22, no. 3, pp. 302–312, 1977.
[17] R. L. Streit and T. E. Luginbuhl, "Maximum likelihood method for
probabilistic multihypothesis tracking," in Proceedings of the Signal
and Data Processing of Small Targets, vol. 2235 of Proceedings of
SPIE, p. 394, 1994.
[18] I. J. Cox and S. L. Hingorani, "Efficient implementation of
Reid's multiple hypothesis tracking algorithm and its evaluation for
the purpose of visual tracking," IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 18, no. 2, pp. 138–150, 1996.
[19] K. G. Murty, "An algorithm for ranking all the assignments in
order of increasing cost," Operations Research, vol. 16, no. 3, pp.
682–687, 1968.
[20] M. E. Liggins II, C.-Y. Chong, I. Kadar et al., "Distributed
fusion architectures and algorithms for target tracking," Proceedings
of the IEEE, vol. 85, no. 1, pp. 95–106, 1997.
[21] S. Coraluppi, C. Carthel, M. Luettgen, and S. Lynch,
"All-source track and identity fusion," in Proceedings of the
National Symposium on Sensor and Data Fusion, 2000.
[22] P. Storms and F. Spieksma, "An LP-based algorithm for the data
association problem in multitarget tracking," in Proceedings of the
3rd IEEE International Conference on Information Fusion, vol. 1,
2000.
[23] S.-W. Joo and R. Chellappa, "A multiple-hypothesis approach for
multiobject visual tracking," IEEE Transactions on Image Processing,
vol. 16, no. 11, pp. 2849–2854, 2007.
[24] S. Coraluppi and C. Carthel, "Aggregate surveillance: a
cardinality tracking approach," in Proceedings of the 14th
International Conference on Information Fusion (FUSION '11), July
2011.
[25] K. C. Chang, C. Y. Chong, and Y. Bar-Shalom, "Joint
probabilistic data association in distributed sensor networks," IEEE
Transactions on Automatic Control, vol. 31, no. 10, pp. 889–897,
1986.
[26] C. Y. Chong, S. Mori, and K. C. Chang, "Information fusion in
distributed sensor networks," in Proceedings of the 4th American
Control Conference, Boston, Mass, USA, June 1985.
[27] C. Y. Chong, S. Mori, and K. C. Chang, "Distributed multitarget
multisensor tracking," in Multitarget-Multisensor Tracking: Advanced
Applications, vol. 1, pp. 247–295, 1990.
[28] J. Pearl, Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference, Morgan Kaufmann, San Mateo, Calif,
USA, 1988.
[29] D. Koller and N. Friedman, Probabilistic Graphical Models:
Principles and Techniques, MIT Press, 2009.
[30] L. Chen, M. Cetin, and A. S. Willsky, "Distributed data
association for multi-target tracking in sensor networks," in
Proceedings of the 7th International Conference on Information Fusion
(FUSION '05), pp. 9–16, July 2005.
[31] L. Chen, M. J. Wainwright, M. Cetin, and A. S. Willsky, "Data
association based on optimization in graphical models with
application to sensor networks," Mathematical and Computer Modelling,
vol. 43, no. 9-10, pp. 1114–1135, 2006.
[32] Y. Weiss and W. T. Freeman, "On the optimality of solutions of
the max-product belief-propagation algorithm in arbitrary graphs,"
IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 736–744,
2001.
[33] C. Brown, H. Durrant-Whyte, J. Leonard, B. Rao, and B. Steer,
"Distributed data fusion using Kalman filtering: a robotics
application," in Data Fusion in Robotics and Machine Intelligence,
M. A. Abidi and R. C. Gonzalez, Eds., pp. 267–309, 1992.
[34] R. E. Kalman, "A new approach to linear filtering and prediction
problems," Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45,
1960.
[35] R. C. Luo and M. G. Kay, "Data fusion and sensor integration:
state-of-the-art 1990s," in Data Fusion in Robotics and Machine
Intelligence, pp. 7–135, 1992.
[36] G. Welch and G. Bishop, An Introduction to the Kalman Filter,
ACM SIGGRAPH 2001 Course Notes, 2001.
[37] S. J. Julier and J. K. Uhlmann, "A new extension of the Kalman
filter to nonlinear systems," in Proceedings of the International
Symposium on Aerospace/Defense Sensing, Simulation and Controls, vol.
3, 1997.
[38] E. A. Wan and R. Van Der Merwe, "The unscented Kalman filter for
nonlinear estimation," in Proceedings of the Adaptive Systems for
Signal Processing, Communications, and Control Symposium (AS-SPCC
'00), pp. 153–158, 2000.
[39] D. Crisan and A. Doucet, "A survey of convergence results on
particle filtering methods for practitioners," IEEE Transactions on
Signal Processing, vol. 50, no. 3, pp. 736–746, 2002.
[40] J. Martinez-del Rincon, C. Orrite-Urunuela, and J. E.
Herrero-Jaraba, "An efficient particle filter for color-based
tracking in complex scenes," in Proceedings of the IEEE Conference on
Advanced Video and Signal Based Surveillance, pp. 176–181, 2007.
[41] S. Ganeriwal, R. Kumar, and M. B. Srivastava, "Timing-sync
protocol for sensor networks," in Proceedings of the 1st
International Conference on Embedded Networked Sensor Systems
(SenSys '03), pp. 138–149, November 2003.
[42] M. Manzo, T. Roosta, and S. Sastry, "Time synchronization in
networks," in Proceedings of the 3rd ACM Workshop on Security of Ad
Hoc and Sensor Networks (SASN '05), pp. 107–116, November 2005.
[43] J. K. Uhlmann, "Covariance consistency methods for
fault-tolerant distributed data fusion," Information Fusion, vol. 4,
no. 3, pp. 201–215, 2003.
[44] S. Bashi, V. P. Jilkov, X. R. Li, and H. Chen, "Distributed
implementations of particle filters," in Proceedings of the 6th
International Conference of Information Fusion, pp. 1164–1171, 2003.
[45] M. Coates, "Distributed particle filters for sensor networks,"
in Proceedings of the 3rd International Symposium on Information
Processing in Sensor Networks (ACM '04), pp. 99–107, New York, NY,
USA, 2004.
[46] D. Gu, "Distributed particle filter for target tracking," in
Proceedings of the IEEE International Conference on Robotics and
Automation (ICRA '07), pp. 3856–3861, April 2007.
[47] Y. Bar-Shalom, "Update with out-of-sequence measurements in
tracking: exact solution," IEEE Transactions on Aerospace and
Electronic Systems, vol. 38, no. 3, pp. 769–778, 2002.
[48] M. Orton and A. Marrs, "A Bayesian approach to multi-target
tracking and data fusion with Out-of-Sequence Measurements," IEE
Colloquium, no. 174, pp. 15/1–15/5, 2001.
[49] M. L. Hernandez, A. D. Marrs, S. Maskell, and M. R. Orton,
"Tracking and fusion for wireless sensor networks," in Proceedings of
the 5th International Conference on Information Fusion, 2002.
[50] P. C. Mahalanobis, "On the generalized distance in statistics,"
Proceedings of the National Institute of Sciences of India, vol. 2,
no. 1, pp. 49–55, 1936.
[51] S. J. Julier, J. K. Uhlmann, and D. Nicholson, "A method for
dealing with assignment ambiguity," in Proceedings of the American
Control Conference (ACC '04), vol. 5, pp. 4102–4107, July 2004.
[52] H. Pan, Z.-P. Liang, T. J. Anastasio, and T. S. Huang, "Hybrid
NN-Bayesian architecture for information fusion," in Proceedings of
the International Conference on Image Processing (ICIP '98), pp.
368–371, October 1998.
[53] C. Coue, T. Fraichard, P. Bessière, and E. Mazer, "Multi-sensor
data fusion using Bayesian programming: an automotive application,"
in P