Sensor Fusion, 2014 Lecture 5: 1
Lecture 5:

Whiteboard:
- Derivation framework for KF, EKF, UKF

Slides:
- Kalman filter summary: main equations, robustness, sensitivity, divergence monitoring, user aspects
- Nonlinear transforms revisited
- Application to derivation of EKF and UKF
- User guidelines and interpretations
Sensor Fusion, 2014 Lecture 5: 2
Lecture 4: Summary
- Detection problems as hypothesis tests:
  H0: y = e,
  H1: y = x + e = h(x) + e.
- Neyman-Pearson's lemma: T(y) = p_e(y - h(x0)) / p_e(y) maximizes PD for a given PFA (best ROC curve).
- Intuitive work flow of the nonlinear filter:
  - MU: estimation from y_k = h(x_k) + e_k and fusion with xhat_{k|k-1}
  - TU: nonlinear transformation z = f(x_k) and diffusion from the motion model

z = simulate(m, 20);
xhat1 = kalman(m, z, 'alg', 1);  % stationary KF
xhat2 = kalman(m, z, 'alg', 4);  % smoother
xplot2(z, xhat1, xhat2, 'conf', 90, [1 2])
Sensor Fusion, 2014 Lecture 5: 7
Covariance illustrated as confidence ellipsoids in 2D plots or confidence bands in 1D plots.

xplot(z, xhat1, xhat2, 'conf', 99)
Sensor Fusion, 2014 Lecture 5: 8
Tuning the KF
- The SNR ratio ||Q||/||R|| is the most crucial; it sets the filter speed. Note the difference between the real system and the model used in the KF.
- Recommendation: fix R according to the sensor specification/performance, and tune Q (motion models are anyway subjective approximations of reality).
- High SNR in the model gives a fast filter that is quick to adapt to changes/maneuvers, but with larger uncertainty (small bias, large variance).
- Conversely, low SNR in the model gives a slow filter that is slow to adapt to changes/maneuvers, but with small uncertainty (large bias, small variance).
- P0 reflects the belief in the prior x_1 ∈ N(xhat_{1|0}, P0). It is possible to choose P0 very large (and xhat_{1|0} arbitrary) if no prior information exists.
- Tune the covariances in large steps (orders of magnitude)!
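The speed/variance trade-off above can be illustrated with a minimal scalar sketch (a hypothetical random-walk model, not from the slides): iterating the Riccati recursion to its fixed point shows that a larger Q/R ratio gives a larger steady-state Kalman gain, i.e. a faster filter.

```python
import numpy as np

# Hypothetical scalar model: x[k+1] = x[k] + v[k], y[k] = x[k] + e[k],
# with Var(v) = Q and Var(e) = R.
def steady_state_gain(Q, R, iters=500):
    """Iterate the scalar Riccati recursion to its fixed point."""
    P = 1.0
    for _ in range(iters):
        P = P - P**2 / (P + R) + Q   # measurement update, then time update
    return P / (P + R)               # steady-state Kalman gain K = P/(P+R)

K_fast = steady_state_gain(Q=1.0,  R=1.0)   # high SNR: large gain, fast filter
K_slow = steady_state_gain(Q=0.01, R=1.0)   # low SNR: small gain, slow filter
```

Only the ratio Q/R matters for the gain, which is why the slide recommends fixing R and tuning Q.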
Sensor Fusion, 2014 Lecture 5: 9
Optimality properties
- For a linear model, the KF provides the WLS solution.
- The KF is the best linear unbiased estimator (BLUE).
- It is the Bayes optimal filter for a linear model when x_0, v_k, e_k are Gaussian variables:

  x_{k+1} | y_{1:k} ∈ N(xhat_{k+1|k}, P_{k+1|k})
  x_k | y_{1:k} ∈ N(xhat_{k|k}, P_{k|k})
  ε_k ∈ N(0, S_k)
Sensor Fusion, 2014 Lecture 5: 10
Robustness and sensitivity
The following concepts are relevant for all filtering applications, but they are most explicit for the KF.
- Observability.
- Divergence tests: monitor performance measures and restart the filter after divergence.
- Outlier rejection: monitor sensor observations.
- Bias error: an incorrect model gives bias in the estimates.
- Sensitivity analysis: an uncertain model contributes to the total covariance.
- Numerical issues: may give complex estimates.
Sensor Fusion, 2014 Lecture 5: 11
Observability
1. Snapshot observability if H_k has full rank. WLS can be applied to estimate x.
2. Classical observability for the time-invariant and time-varying cases:

   O = [H; HF; HF^2; ...; HF^{n-1}],

   O_k = [H_{k-n+1};
          H_{k-n+2} F_{k-n+1};
          H_{k-n+3} F_{k-n+2} F_{k-n+1};
          ...;
          H_k F_{k-1} ... F_{k-n+1}].

3. The covariance matrix P_{k|k} extends the observability condition by weighting with the measurement noise and forgetting old information according to the process noise. Thus, (the condition number of) P_{k|k} is the natural indicator of observability!
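As a sketch of point 2, the stacked observability matrix can be built and rank-tested numerically (a hypothetical constant-velocity model with measured position is assumed here):

```python
import numpy as np

# Hypothetical constant-velocity model: x = [position; velocity],
# only position is measured.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])

# Classical observability matrix O = [H; HF; ...; HF^{n-1}].
n = F.shape[0]
O = np.vstack([H @ np.linalg.matrix_power(F, i) for i in range(n)])
rank = np.linalg.matrix_rank(O)   # full rank => the pair (F, H) is observable
```

Here the rank equals n = 2, so the velocity is observable through the position measurements.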
Sensor Fusion, 2014 Lecture 5: 12
Divergence tests
When is ε_k ε_k^T significantly larger than its computed expected value S_k = E(ε_k ε_k^T) (note that ε_k ∈ N(0, S_k))?

Principal reasons:
- Model errors.
- Sensor model errors: offsets, drifts, incorrect covariances, scaling factor in all covariances.
- Sensor errors: outliers, missing data.
- Numerical issues.

In the first two cases, the filter has to be redesigned. In the last two cases, the filter has to be restarted.
Sensor Fusion, 2014 Lecture 5: 13
Outlier rejection
Let H0: ε_k ∈ N(0, S_k). Then

T(y_k) = ε_k^T S_k^{-1} ε_k ∼ χ²(dim(y_k))

if everything works fine and there is no outlier. If T(y_k) > h_PFA, this is an indication of an outlier, and the measurement update can be omitted.

In the case of several sensors, each sensor i should be monitored for outliers:

T(y_k^i) = (ε_k^i)^T S_k^{-1} ε_k^i ∼ χ²(dim(y_k^i)).
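A minimal sketch of the gating test, with the χ² threshold hard-coded for dim(y) = 2 (9.21 is the 99% quantile of χ²(2)); the function name is ours, not from any toolbox:

```python
import numpy as np

def is_outlier(eps, S, h):
    """Chi-square gating: T(y) = eps^T S^-1 eps > h flags an outlier,
    in which case the measurement update can be skipped."""
    T = float(eps @ np.linalg.solve(S, eps))
    return T > h

S = np.eye(2)
h = 9.21                                              # 99% quantile of chi2(2)
flag_small = is_outlier(np.array([1.0, 1.0]), S, h)   # T = 2: accept
flag_large = is_outlier(np.array([4.0, 4.0]), S, h)   # T = 32: reject
```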
Sensor Fusion, 2014 Lecture 5: 14
Divergence monitoring
Related to outlier detection, but performance is monitored over a longer time horizon.

One way to modify the chi-square test into a Gaussian test, using the central limit theorem:

T = (1/N) Σ_{k=1}^N (1/dim(y_k)) ε_k^T S_k^{-1} ε_k ∼ N(1, 2 / Σ_{k=1}^N dim(y_k)).

If

(T - 1) sqrt(Σ_{k=1}^N dim(y_k) / 2) > h_PFA,

filter divergence can be concluded, and the filter must be restarted. Instead of all data, a long sliding window or an exponential window (forgetting factor) can be used.
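A sketch of the statistic on simulated, consistent innovations (all numbers are hypothetical): for a healthy filter, T should be close to 1 and the normalized statistic approximately N(0, 1).

```python
import numpy as np

# Simulated innovations from a consistent filter: eps_k ~ N(0, S),
# so eps_k^2 / S averages to 1.
rng = np.random.default_rng(0)
N, d = 2000, 1                 # N innovations, each of dimension d
S = 1.0
eps = rng.normal(0.0, np.sqrt(S), N)

T = np.mean(eps**2 / S) / d                  # normalized test statistic
stat = (T - 1.0) * np.sqrt(N * d / 2.0)      # approx N(0, 1) under H0
```

Comparing `stat` against a Gaussian threshold h_PFA implements the divergence test above.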
Sensor Fusion, 2014 Lecture 5: 15
Sensitivity analysis: parameter uncertainty
Sensitivity analysis can be done with respect to uncertainparameters with known covariance matrix using for instance Gaussapproximation formula.Assume F (θ),G (θ),H(θ),Q(θ),R(θ) have uncertain parameters θwith E(θ) = θ and Cov(θ) = Pθ.The state estimate xk is a stochastic variable with four stochasticsources, vk , ek , x1 at one hand, and θ on the other hand. . The lawof total variance (Var(X ) = EVar(X |Y ) + VarE(X |Y )) and Gaussapproximation formula (Var(h(Y )) ≈ h′Y (Y )Var(Y )(h′Y (Y ))T )gives
Cov(xk|k) ≈ Pk|k +dxk|k
dθPθ
(dxk|k
dθ
)T
.
The gradient dxk|k/dθ can be computed numerically bysimulations.
Sensor Fusion, 2014 Lecture 5: 16
Numerical issues
Some simple fixes if problems occur:
- Assure that the covariance matrix is symmetric: P = 0.5*P + 0.5*P'.
- Assure that the covariance matrix is positive definite by setting negative eigenvalues in P to zero or small positive values.
- Avoid singular R = 0, even for constraints.
- Dithering: increase Q and R if needed. This can account for all kinds of model errors.
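The first two fixes can be sketched in a few lines (the eigenvalue floor 1e-9 is an arbitrary choice):

```python
import numpy as np

def repair_covariance(P, floor=1e-9):
    """Symmetrize P and clip negative eigenvalues to a small positive floor."""
    P = 0.5 * (P + P.T)              # enforce symmetry
    w, V = np.linalg.eigh(P)         # eigendecomposition of the symmetric part
    w = np.maximum(w, floor)         # clip eigenvalues from below
    return V @ np.diag(w) @ V.T

P_bad = np.array([[1.0, 2.0],
                  [2.0, 1.0]])       # indefinite: eigenvalues 3 and -1
P_ok = repair_covariance(P_bad)
```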
Sensor Fusion, 2014 Lecture 5: 17
EKF and UKF
Theory from Chapter 8:
- Nonlinear transformations.
- Details of the EKF algorithms.
- Numerical methods to compute the Jacobian and Hessian in the Taylor expansion.
- An alternative EKF version without the Riccati equation.
- The unscented Kalman filter (UKF).

Applications from Chapter 16:
- Automotive applications: yaw rate, roll angle, friction estimation.
- Rocket application: integration of IMU and GPS.
Sensor Fusion, 2014 Lecture 5: 18
Non-linear transformations
z = g(x) = g(xbar) + g'(xbar)(x - xbar) + (1/2)(x - xbar)^T g''(ξ)(x - xbar),

where the last term is the rest term r(x; xbar, g''(ξ)).

The rest term is negligible and the EKF works fine if
- the model is almost linear,
- or the SNR is high, so that ||x - xbar|| can be considered small.

The size of the rest term can be approximated a priori. Note: the size may depend on the choice of state coordinates!

If the rest term is large, use either of
- the second order compensated EKF, which compensates for the mean and covariance of r(x; xbar, g''(ξ)) ≈ r(x; xbar, g''(xbar)),
- the unscented KF (UKF).
Sensor Fusion, 2014 Lecture 5: 19
TT1: first order Taylor approximation
Gauss' approximation formula:

x ∈ N(xbar, P) → N(g(xbar), [g'_i(xbar) P (g'_j(xbar))^T]_{ij}) = N(g(xbar), g'(xbar) P (g'(xbar))^T).

Here [A]_{ij} means element i, j of the matrix A. This is used in EKF1 (EKF with first order Taylor expansion), and leads to a KF where nonlinear functions are approximated by their Jacobians.

Compare with the linear transformation rule:

z = Gx, x ∈ N(xbar, P) → z ∈ N(G xbar, G P G^T).

Note that G P G^T can be written [G_i P G_j^T]_{ij}, where G_i denotes row i of G.
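A sketch of TT1 with a central-difference Jacobian (the function names are ours); for a linear g(x) = Gx the rule is exact, which gives a convenient sanity check:

```python
import numpy as np

def jacobian(g, x, h=1e-6):
    """Central-difference Jacobian of g at x."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = h
        cols.append((g(x + dx) - g(x - dx)) / (2 * h))
    return np.column_stack(cols)

def tt1(g, xbar, P):
    """First order Taylor (Gauss approximation): N(g(xbar), g' P g'^T)."""
    G = jacobian(g, xbar)
    return g(xbar), G @ P @ G.T

# Linear sanity check: for g(x) = Gx, TT1 must reproduce N(G xbar, G P G^T).
G = np.array([[1.0, 2.0], [0.0, 1.0]])
mu, Pz = tt1(lambda x: G @ x, np.array([1.0, 1.0]), np.eye(2))
```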
Sensor Fusion, 2014 Lecture 5: 20
TT2: second order Taylor approximation
x ∈ N(xbar, P) → N(g(xbar) + (1/2)[tr(g''_i(xbar) P)]_i, [g'_i(xbar) P (g'_j(xbar))^T + (1/2) tr(P g''_i(xbar) P g''_j(xbar))]_{ij})

This is used in EKF2 (EKF with second order Taylor expansion), and leads to a KF where nonlinear functions are approximated by their Jacobians and Hessians.

The UKF tries to do this approximation numerically, without forming the Hessian g''(xbar) explicitly. This reduces the n_x^5 complexity of [tr(P g''_i(xbar) P g''_j(xbar))]_{ij} to n_x^3 complexity.
Sensor Fusion, 2014 Lecture 5: 21
MC: Monte Carlo sampling
Generate N samples, transform them, and fit a Gaussian distribution:

x^(i) ∈ N(xbar, P),
z^(i) = g(x^(i)),
μ_z = (1/N) Σ_{i=1}^N z^(i),
P_z = (1/(N-1)) Σ_{i=1}^N (z^(i) - μ_z)(z^(i) - μ_z)^T.

Not commonly used in nonlinear filtering, but a valid and solid approach!
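A sketch of the MC transform on a hypothetical polar-to-Cartesian example (the distribution parameters are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_transform(g, xbar, P, N=100_000):
    """Monte Carlo transform: sample, push through g, fit a Gaussian."""
    x = rng.multivariate_normal(xbar, P, size=N)   # x^(i) ~ N(xbar, P)
    z = np.apply_along_axis(g, 1, x)               # z^(i) = g(x^(i))
    mu = z.mean(axis=0)
    Pz = np.cov(z, rowvar=False)                   # 1/(N-1) normalization
    return mu, Pz

# Hypothetical range/bearing to Cartesian transformation:
g = lambda x: np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])
mu, Pz = mc_transform(g, np.array([10.0, 0.0]), np.diag([0.1, 0.01]))
```

Note that the MC mean falls slightly short of 10 in the first component: the curvature of the transformation biases the mean, which is exactly what the rest term in TT2 accounts for.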
Sensor Fusion, 2014 Lecture 5: 22
UT: the unscented transform

At first sight, similar to MC: generate 2 n_x + 1 so-called sigma points, transform these, and fit a Gaussian distribution:

x^0 = xbar,
x^{±i} = xbar ± sqrt(n_x + λ) P^{1/2}_{:,i}, i = 1, 2, ..., n_x,
z^i = g(x^i),

E(z) ≈ (λ/(n_x + λ)) z^0 + Σ_{i=±1}^{±n_x} (1/(2(n_x + λ))) z^i,

Cov(z) ≈ (λ/(n_x + λ) + (1 - α² + β)) (z^0 - E(z))(z^0 - E(z))^T
         + Σ_{i=±1}^{±n_x} (1/(2(n_x + λ))) (z^i - E(z))(z^i - E(z))^T.
Sensor Fusion, 2014 Lecture 5: 23
UT: design parameters
- λ is defined by λ = α²(n_x + κ) - n_x.
- α controls the spread of the sigma points and is suggested to be chosen around 10^{-3}.
- β compensates for the distribution, and should be chosen as β = 2 for Gaussian distributions.
- κ is usually chosen as zero.

Note that
- n_x + λ = α² n_x when κ = 0.
- The weights sum to one for the mean, but sum to 2 - α² + β ≈ 4 for the covariance. Note also that the weights are not in [0, 1].
- The mean has a large negative weight!
- If n_x + λ → 0, then UT and TT2 (and hence UKF and EKF2) are identical for n_x = 1, and otherwise closely related!
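The transform and its design parameters can be sketched as follows (a sketch under the parameter choices above; for a linear g the UT reproduces the mean and covariance exactly, which is used as a check):

```python
import numpy as np

def unscented_transform(g, xbar, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Sketch of the UT with the design parameters from the slide."""
    n = xbar.size
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)        # columns: sqrt(n+lam) P^{1/2}_{:,i}
    sigmas = [xbar] + [xbar + L[:, i] for i in range(n)] \
                    + [xbar - L[:, i] for i in range(n)]
    Wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)                      # the large negative mean weight
    Wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    Z = np.array([g(s) for s in sigmas])
    mu = Wm @ Z
    d = Z - mu
    Pz = sum(Wc[i] * np.outer(d[i], d[i]) for i in range(2 * n + 1))
    return mu, Pz

# Linear sanity check: for g(x) = Gx the UT is exact.
G = np.array([[1.0, 1.0], [0.0, 1.0]])
mu, Pz = unscented_transform(lambda x: G @ x, np.array([1.0, 2.0]), np.eye(2))
```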
Sensor Fusion, 2014 Lecture 5: 24
Example 1: squared norm
z = g(x) = x^T x, x ∈ N(0, I_n) ⇒ z ∈ χ²(n).

The theoretical distribution is χ²(n), with mean n and variance 2n. The mean and variance are summarized below as a Gaussian distribution. The number of Monte Carlo simulations is 10 000.
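The χ²(n) moments can be checked directly by simulation (we use n = 3 and more samples than the slide's 10 000 to tighten the estimate):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 100_000
x = rng.standard_normal((N, n))
z = np.sum(x**2, axis=1)     # z = x^T x ~ chi2(n)
zm = z.mean()                # should be close to n
zv = z.var()                 # should be close to 2n
```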
DOA example: g(x) = arctan2(x1, x2), x ∈ N([3; 0], [10, 0; 0, 1]):

  TT1: N(0, 0.111)
  TT2: N(0, 0.235)
  UT2: N(0.524, 1.46)
  MCT: N(0.0702, 1.6)
Sensor Fusion, 2014 Lecture 5: 27
EKF1 and EKF2 principle
Apply TT1 and TT2, respectively, to the dynamic and observation models. For instance,

x_{k+1} = f(x_k) + v_k = f(xbar) + f'(xbar)(x_k - xbar) + (1/2)(x_k - xbar)^T f''(ξ)(x_k - xbar) + v_k.

- EKF1 neglects the rest term.
- EKF2 compensates with the mean and covariance of the rest term, using ξ = xbar.
Sensor Fusion, 2014 Lecture 5: 28
EKF1 and EKF2 algorithm
S_k = h'_x(xhat_{k|k-1}) P_{k|k-1} (h'_x(xhat_{k|k-1}))^T + h'_e(xhat_{k|k-1}) R_k (h'_e(xhat_{k|k-1}))^T
      + (1/2) [tr(h''_{i,x}(xhat_{k|k-1}) P_{k|k-1} h''_{j,x}(xhat_{k|k-1}) P_{k|k-1})]_{ij}

K_k = P_{k|k-1} (h'_x(xhat_{k|k-1}))^T S_k^{-1}

ε_k = y_k - h(xhat_{k|k-1}, 0) - (1/2) [tr(h''_{i,x}(xhat_{k|k-1}) P_{k|k-1})]_i

xhat_{k|k} = xhat_{k|k-1} + K_k ε_k

P_{k|k} = P_{k|k-1} - P_{k|k-1} (h'_x(xhat_{k|k-1}))^T S_k^{-1} h'_x(xhat_{k|k-1}) P_{k|k-1}

xhat_{k+1|k} = f(xhat_{k|k}, 0)

P_{k+1|k} = f'_x(xhat_{k|k}) P_{k|k} (f'_x(xhat_{k|k}))^T + f'_v(xhat_{k|k}) Q_k (f'_v(xhat_{k|k}))^T
            + (1/2) [tr(f''_{i,x}(xhat_{k|k}) P_{k|k} f''_{j,x}(xhat_{k|k}) P_{k|k})]_{ij}.
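A sketch of the measurement update for EKF1, i.e. with the Hessian terms of EKF2 dropped (the function name and the linear test case are ours):

```python
import numpy as np

def ekf1_measurement_update(xhat, P, y, h, H, R):
    """EKF1 measurement update: the Hessian terms of EKF2 set to zero.
    h is the measurement function, H its Jacobian at xhat (additive noise)."""
    S = H @ P @ H.T + R               # innovation covariance S_k
    K = P @ H.T @ np.linalg.inv(S)    # gain K_k = P H^T S^-1
    eps = y - h(xhat)                 # innovation eps_k
    xnew = xhat + K @ eps
    Pnew = P - K @ S @ K.T            # equals P - P H^T S^-1 H P
    return xnew, Pnew

# Linear check: position measured; the velocity estimate is updated only via P.
H = np.array([[1.0, 0.0]])
xnew, Pnew = ekf1_measurement_update(
    np.array([0.0, 0.0]), np.eye(2), np.array([2.0]),
    lambda x: H @ x, H, np.eye(1))
```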
Sensor Fusion, 2014 Lecture 5: 29
Comments
- EKF1, using the TT1 transformation, is obtained by letting both Hessians f''_x and h''_x be zero.
- Analytic Jacobians and Hessians are needed. If not available, use numerical approximations (done by default in the Signal and Systems Lab!).
- The complexity of EKF1 is n_x^3, as in the KF, due to the F P F^T operation.
- The complexity of EKF2 is n_x^5, due to the F_i P F_j^T operations for i, j = 1, ..., n_x.
- Dithering is good! That is, increase Q and R from the simulated values to account for the approximation errors.
Sensor Fusion, 2014 Lecture 5: 30
EKF variants
- The standard EKF linearizes around the current state estimate.
- The linearized Kalman filter linearizes around some reference trajectory.
- The error state Kalman filter, also known as the complementary Kalman filter, estimates the state error x~_k = x_k - xbar_k with respect to some approximate or reference trajectory. Feedforward or feedback configurations.

linearized Kalman filter = feedforward error state Kalman filter
EKF = feedback error state Kalman filter
Sensor Fusion, 2014 Lecture 5: 31
Derivative-free algorithms
Numeric derivatives are preferred in the following cases:
- The nonlinear function is too complex.
- The derivatives are too complex functions.
- A user-friendly algorithm is desired, with as few user inputs as possible.

This can be achieved with either numerical approximation or using sigma points!
Sensor Fusion, 2014 Lecture 5: 32
KF, EKF and UKF in one framework
First, recall the lemma

(X; Y) ∈ N((μ_x; μ_y), (P_xx, P_xy; P_yx, P_yy)).

Then the conditional distribution of X, given the observed Y = y, is Gaussian:

X | Y = y ∈ N(μ_x + P_xy P_yy^{-1}(y - μ_y), P_xx - P_xy P_yy^{-1} P_yx).
Time update: the transformation approximation (UT, MC, TT1, TT2), applied to z = f(x_k, u_k, v_k) with the stacked variable x = (x_k; v_k), gives

z ∼ N(xhat_{k+1|k}, P_{k+1|k}).
Sensor Fusion, 2014 Lecture 5: 34
Measurement update: let

x = (x_k; e_k) ∈ N((xhat_{k|k-1}; 0), (P_{k|k-1}, 0; 0, R_k)),

z = (x_k; y_k) = (x_k; h(x_k, u_k, e_k)) = g(x).

The transformation approximation (UT, MC, TT1, TT2) gives

z ∼ N((xhat_{k|k-1}; yhat_{k|k-1}), (P^xx_{k|k-1}, P^xy_{k|k-1}; P^yx_{k|k-1}, P^yy_{k|k-1})).

The measurement update is now

K_k = P^xy_{k|k-1} (P^yy_{k|k-1})^{-1},
xhat_{k|k} = xhat_{k|k-1} + K_k (y_k - yhat_{k|k-1}).
Sensor Fusion, 2014 Lecture 5: 35
Linear case
Time update: f(x_k, u_k, v_k) = A x_k + v_k gives

z ∼ N(A xhat_{k|k}, A P_{k|k} A^T + Q_k) = N(xhat_{k+1|k}, P_{k+1|k}).

This is the KF time update.

Measurement update: h(x_k, e_k) = H x_k + e_k gives

z ∼ N((xhat_{k|k-1}; H xhat_{k|k-1}), (P_{k|k-1}, P_{k|k-1} H^T; H P_{k|k-1}, H P_{k|k-1} H^T + R)).

This gives the KF measurement update!
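The block structure above can be verified numerically: stacking (x; e) and pushing it through the linear map g(x, e) = (x, Hx + e) reproduces exactly the joint covariance on the slide (the numbers are arbitrary):

```python
import numpy as np

P = np.array([[2.0, 0.5], [0.5, 1.0]])   # P_{k|k-1}
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
n, m = 2, 1

# Linear map of the stacked variable (x; e): z = (x; Hx + e) = G (x; e).
G = np.block([[np.eye(n), np.zeros((n, m))],
              [H,         np.eye(m)      ]])
Pjoint = G @ np.block([[P, np.zeros((n, m))],
                       [np.zeros((m, n)), R]]) @ G.T

Pxy = Pjoint[:n, n:]          # should equal P H^T
Pyy = Pjoint[n:, n:]          # should equal H P H^T + R
K = Pxy @ np.linalg.inv(Pyy)  # gain K = P^xy (P^yy)^-1
```

The extracted gain coincides with the classical KF gain P H^T (H P H^T + R)^{-1}, which is the point of the one-framework view.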
Sensor Fusion, 2014 Lecture 5: 36
Comments
- The filter obtained using TT1 is equivalent to the standard EKF1.
- The filter obtained using TT2 is equivalent to EKF2.
- The filter obtained using UT is equivalent to the UKF.
- The Monte Carlo approach should be the most accurate, since it asymptotically computes the correct first and second order moments.
- There is freedom to mix transform approximations in the time and measurement updates.
Sensor Fusion, 2014 Lecture 5: 37
Choice of nonlinear filter

- Depends mainly on (i) the SNR, (ii) the degree of nonlinearity, and (iii) the degree of non-Gaussian noise, in particular whether any distribution is multi-modal (has several local maxima).
- SNR and the degree of nonlinearity are connected through the rest term, whose expected value is E r(x; xbar, g''(ξ)) ≈ (1/2)[tr(g''_i(xbar) P)]_i. A small rest term requires either high SNR (small P) or almost linear functions (small f'' and h'').
- If the rest term is small, use EKF1.
- If the rest term is large and the nonlinearities are essentially quadratic (e.g. x^T x), use EKF2.
- If the rest term is large and the nonlinearities are not essentially quadratic, try the UKF.
- If the functions are severely nonlinear or any distribution is multi-modal, consider filter banks or the particle filter.
Sensor Fusion, 2014 Lecture 5: 38
Course estimation in cars
A car with a course gyro and wheel speed sensors. The gyro has a drift. The wheel speeds can be converted to a course angular rate, but with a (speed-dependent) drift. The EKF can correct the drifts!
http://youtu.be/d9rzCCIBS9I

Applications: ABS, traction control, headlight control.
Sensors: many possible configurations of gyroscopes and accelerometers.
http://youtu.be/hT6S1FgHxOc