21st Electronics New Zealand Conference (ENZCon) University of
Waikato, Hamilton, 20-21 November, 2014
Oral Session 3 (Measuring Things)
Grid Based Sequential Inference
Malcolm Morrison, Colin Fox
Electronics Group, Physics Department
University of Otago
Dunedin, New Zealand
Email: [email protected]
Abstract: Sequential inference and filtering methods often
approximate and simplify the system dynamics they model in order to
be as efficient as possible. This limits accuracy and can even give
wrong results when non-linearities dominate the dynamics of the
system. In this paper we acknowledge that computers have vastly
improved since methods such as the Unscented Kalman Filter (UKF)
were devised, and create a new filtering method that takes full
advantage of this fact. This is done by discretizing the system of
interest and evolving the resultant continuity equation with a PDE
solver, while updating the state with observations by directly
applying Bayes’ Theorem. We use a simple finite volume method on a
pendulum discretized on a square grid, and show through comparison
that this method is superior to the UKF, provided computational
time is not an issue.
I. INTRODUCTION
Sequential inference methods have been dominated by the Kalman
filter [1] and, from the late 1990s on, the unscented Kalman filter
(UKF) [2]. These filters have been used to great effect in
prediction and guidance systems, such as earthquake early warning
[3] and vision tracking [4] systems. However, the Kalman filter only
works well for approximately linear systems with zero-mean Gaussian
noise [5]. While the UKF represents dynamics and probability
distributions perfectly up to second order [2], it too breaks down
in situations where the higher order dynamics dominate, as will be
shown here.
In this paper we propose an alternative that, while lacking the
elegance and speed of the UKF, has no theoretical limit to its
accuracy. This is done by using the properties of probability
directly to construct a partial differential equation (PDE), which
is then solved with a grid based method. Hence the accuracy is
entirely dependent on the solver that is used, and can be increased
by devoting more computational power to it. This method is well
suited to problems that have no time constraints and whose dynamics
are highly non-linear.
For the purposes of this paper we use a very simple PDE solver as
an example of how this may be done, and then compare it with the
UKF. We focus on the dynamics of a pendulum, as these range from
approximately linear, where the small angle approximation holds, to
highly non-linear when the pendulum is completely vertical.
II. THE METHOD
We perform inference on a system by evolving the probability in
phase-space directly, by solving the continuity equation:
$$\frac{\partial \rho}{\partial t} = -\nabla \cdot (\rho f) \qquad (1)$$
for some quantity ρ on a velocity field f. This equation is often
used in areas such as fluid dynamics, or any field that deals with
the transport of some quantity over space. In layman's terms it
says that if something enters or leaves a volume, then the amount
of stuff in that volume must change by the same amount. Here ρ is
the probability density function, and this is the equation we shall
solve to evolve a probability distribution in time.
A finite volume method is a sensible choice for solving this, as
such methods are derived from this exact equation. Here are the
details and approximations of our solver:
• Our space is divided into a square grid of cells of width ∆x,
with each dimension ranging from −π to π.
• This discretization must be fine enough that ρ is approximately
constant within a cell and f is approximately constant across a
boundary.
• We use the method of up-streaming to calculate the flux at the
boundary between cells (i, j) and (i + 1, j). That is:
$$\rho|_{x_{i+1/2}} = \frac{P_{i,j}}{\Delta x^2} \quad \text{if } f \cdot n \ge 0, \qquad (2)$$
$$\rho|_{x_{i+1/2}} = \frac{P_{i+1,j}}{\Delta x^2} \quad \text{if } f \cdot n < 0, \qquad (3)$$
where $P_{i,j}$ is the total probability in cell (i, j), and the
notation $x_{i+1/2}$ refers to the boundary between cells (i, j) and
(i + 1, j), with associated outward pointing normal vector n.
• We use a simple Euler step for the time derivative:
$$\frac{\partial}{\partial t} P_{i,j} \approx \frac{P^{k+1}_{i,j} - P^{k}_{i,j}}{\Delta t} \qquad (4)$$
with time step ∆t.
• The time step size is limited by:
$$\Delta t \le \frac{\Delta x}{2 N \max(|f \cdot n|)} \qquad (5)$$
in order to preserve positivity, where n is the outward pointing
normal vector of a cell and N is the number of cells neighbouring
a single one, 4 in our case.
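The time step bound in equation (5) can be sketched in a few lines of Python. This is only an illustration: the function name `max_time_step` is hypothetical, and the usage assumes that the largest face-normal speed is π, i.e. the edge of a velocity axis spanning −π to π, which the paper does not state explicitly.

```python
import math

def max_time_step(dx, max_normal_speed, num_neighbours=4):
    """Upper bound on the time step from equation (5): dx / (2 N max|f.n|)."""
    if max_normal_speed <= 0.0:
        raise ValueError("max_normal_speed must be positive")
    return dx / (2.0 * num_neighbours * max_normal_speed)

# Hypothetical usage: each dimension spans -pi to pi, and the fastest
# normal speed is ASSUMED to be pi (the edge of the velocity axis).
for n in (50, 100, 400):
    dx = 2.0 * math.pi / n
    print(n, max_time_step(dx, max_normal_speed=math.pi))
```

Under these assumptions the bound shrinks in proportion to ∆x, so refining the grid by a factor of 4 per dimension also requires roughly 4 times as many time steps.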
This all leads to the equation that takes each cell from time k to
k + 1:
$$P^{k+1}_{i,j} = P^{k}_{i,j} - \Delta t \left( F_{v-1/2} + F_{v+1/2} + F_{x-1/2} + F_{x+1/2} \right) \qquad (6)$$
where the F's refer to the outward fluxes through each boundary and
the subscripts x and v refer to the space and velocity directions
respectively.
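A minimal Python sketch of one such update is given here for a pendulum with velocity field (v, −sin x). It is not the authors' solver. It assumes periodic boundaries in the angle x and closed (zero-flux) boundaries in the velocity v, neither of which is specified in the paper, and the name `upwind_step` is hypothetical.

```python
import math

def upwind_step(P, dx, dt):
    """One Euler step of the up-streamed finite volume update.

    P[i][j] is the total probability in cell (i, j); i indexes x, j indexes v.
    ASSUMPTIONS: periodic in x, zero flux through the v edges of the grid.
    """
    n = len(P)
    centres = [-math.pi + (k + 0.5) * dx for k in range(n)]
    new = [row[:] for row in P]
    for i in range(n):
        for j in range(n):
            # Face between (i, j) and (i+1, j): the normal speed is v.
            ip = (i + 1) % n
            u = centres[j]
            donor = P[i][j] if u >= 0 else P[ip][j]  # up-stream cell
            moved = dt * u * donor / dx  # signed transfer (i,j) -> (ip,j)
            new[i][j] -= moved
            new[ip][j] += moved
            # Face between (i, j) and (i, j+1): the normal speed is -sin(x).
            if j + 1 < n:
                w = -math.sin(centres[i])
                donor = P[i][j] if w >= 0 else P[i][j + 1]
                moved = dt * w * donor / dx
                new[i][j] -= moved
                new[i][j + 1] += moved
    return new

# Hypothetical usage: a point mass on a 20 x 20 grid.
n = 20
dx = 2.0 * math.pi / n
dt = dx / (2.0 * 4 * math.pi)  # positivity bound with max|f.n| assumed = pi
P = [[0.0] * n for _ in range(n)]
P[12][10] = 1.0
P1 = upwind_step(P, dx, dt)
total = sum(map(sum, P1))
```

Because every face moves probability from one cell to its neighbour, the total is conserved up to rounding, and with the time step at or below the bound no cell can lose more than it holds.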
If we create a vector that contains all of the $P_{i,j}$, this can
then be summed up in the matrix equation:
$$P^{k+1} = (I - \Delta t A) P^{k}, \qquad (7)$$
where I is the identity and A is a matrix that contains the
up-streaming information plus the associated velocities for each
cell.
We can now use Bayes’ Theorem [5] to update our probability from an
observation through direct element-wise vector multiplication:
$$P_{\text{new}} \propto P_{\text{observation}} \cdot P_{\text{old}} \qquad (8)$$
These two sides just differ by a normalization term.
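As a small illustration of equation (8), the following Python sketch multiplies a prior vector by a likelihood vector cell by cell and renormalizes. The function name `bayes_update` and the example numbers are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Element-wise Bayes update of a discretized probability vector."""
    posterior = [p * l for p, l in zip(prior, likelihood)]
    norm = sum(posterior)
    if norm == 0.0:
        raise ValueError("observation is incompatible with the prior")
    # Dividing by the sum restores the normalization dropped in equation (8).
    return [p / norm for p in posterior]

# Hypothetical usage: a flat prior over four cells and a peaked likelihood.
prior = [0.25, 0.25, 0.25, 0.25]
likelihood = [0.1, 0.4, 0.4, 0.1]
posterior = bayes_update(prior, likelihood)
```

With a flat prior the posterior simply follows the shape of the likelihood, which is a quick sanity check for an implementation.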
A. Kullback-Leibler Divergence
In order to measure how close we are to the true solution we use
the Kullback-Leibler divergence [6], which quantifies how different
one probability distribution is from another. The divergence of the
discrete probability distribution Q from P is:
$$D_{KL}(P \| Q) = \sum_{i} \log\left( \frac{P(i)}{Q(i)} \right) P(i). \qquad (9)$$
Note that this is not symmetric. Again, this is the divergence of Q
from P and not the other way around.
It is only defined when Q(i) = 0 implies P(i) = 0. Terms with
P(i) = 0 contribute nothing, since $\lim_{x \to 0} x \log(x) = 0$,
so infinities are avoided. Also note that both probabilities must
have the same discretization.
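A minimal Python sketch of the discrete divergence is shown here, with the zero-probability cases handled as described. The function name `kl_divergence` and the example distributions are hypothetical.

```python
import math

def kl_divergence(P, Q):
    """Kullback-Leibler divergence of Q from P for discrete distributions."""
    total = 0.0
    for p, q in zip(P, Q):
        if p == 0.0:
            continue  # x log(x) -> 0 as x -> 0, so the term vanishes
        if q == 0.0:
            return math.inf  # Q(i) = 0 with P(i) > 0: divergence is infinite
        total += p * math.log(p / q)
    return total

# Hypothetical usage showing that the divergence is not symmetric.
P = [0.5, 0.5]
Q = [0.9, 0.1]
forward = kl_divergence(P, Q)
backward = kl_divergence(Q, P)
```

Here `forward` and `backward` differ, which is why the order of the arguments has to be stated whenever a value is reported.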
In the cases where an analytical solution exists, we calculate the
divergence of our numerical solution from the analytical one. For
the majority of cases, where no analytical solution exists, we use
the fact that as our discretization becomes finer and finer, the
numerical solution converges to the true solution. If we take our
“true” solution, P, to be another simulation run with a finer grid
for the same time, then we should get an estimate of how close we
are to the real true solution.
For the purposes of this paper we compare a simulation with one
where each cell is divided into 16 smaller ones. For later analysis
we refer to the “fineness” of a grid by the number of cells per
dimension, n. For the most part we compare simulations of n cells
per dimension with ones of 4n.
The two distributions must also share the same discretization. So
we match each cell of the n grid to the 16 cells of the 4n grid
that it contains, and sum those before calculating the divergence.
This gives:
$$D_{KL}(P_n \| P_{4n}) = \sum_{i}^{n^2} \log\left( \frac{P_n(i)}{\sum_{j}^{16} P_{4n}(j)} \right) P_n(i), \qquad (10)$$
where the subscript 4n refers to the 4x finer grid and the inner
sum runs over the 16 fine cells contained in coarse cell i.
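The block summation in equation (10) can be sketched in Python as follows. This is an illustration under the assumption that both grids are stored as nested lists and that the fine grid is aligned with the coarse one. The names `coarsen` and `kl_against_finer` are hypothetical.

```python
import math

def coarsen(P_fine, factor=4):
    """Sum factor x factor blocks of a fine grid onto the coarse grid."""
    m = len(P_fine)
    n = m // factor
    coarse = [[0.0] * n for _ in range(n)]
    for a in range(m):
        for b in range(m):
            coarse[a // factor][b // factor] += P_fine[a][b]
    return coarse

def kl_against_finer(P_n, P_fine, factor=4):
    """Equation (10): divergence between P_n and the block-summed fine grid."""
    Q = coarsen(P_fine, factor)
    total = 0.0
    for row_p, row_q in zip(P_n, Q):
        for p, q in zip(row_p, row_q):
            if p == 0.0:
                continue  # vanishing term
            if q == 0.0:
                return math.inf
            total += p * math.log(p / q)
    return total

# Hypothetical usage: a uniform 2 x 2 grid against a uniform 8 x 8 grid.
P_n = [[0.25, 0.25], [0.25, 0.25]]
P_fine = [[1.0 / 64.0] * 8 for _ in range(8)]
divergence = kl_against_finer(P_n, P_fine)
```

Two uniform grids agree exactly after block summation, so the divergence in this toy case is zero.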
III. SIMULATIONS
(a) x0 = 0.2π (b) x0 = 0.6π
Fig. 1. The final snapshot of the simulations on a pendulum in
phase-space, showing both the n = 100 and the n = 400 grids that
are used to calculate the divergence $D_{KL}$.
For the following simulations the initial state is a
two-dimensional Gaussian in phase-space with covariance Σ = 0.2I
and mean µ = (x0, 0). These are then run until t = 2π and the
relevant values are calculated. All other constants are set to
unity, so this would be a full period for a simple harmonic
oscillator.
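One way to set up such an initial state on the grid is sketched here in Python. The sketch evaluates the Gaussian at cell centres, truncates it to the domain from −π to π without periodic wrapping, and renormalizes. These choices and the name `initial_state` are assumptions for illustration only.

```python
import math

def initial_state(n, x0, variance=0.2):
    """Gaussian with mean (x0, 0) and covariance variance * I on an n x n grid."""
    dx = 2.0 * math.pi / n
    centres = [-math.pi + (k + 0.5) * dx for k in range(n)]
    # Unnormalized density at cell centres; the constant prefactor cancels.
    P = [[math.exp(-((x - x0) ** 2 + v ** 2) / (2.0 * variance))
          for v in centres] for x in centres]
    total = sum(map(sum, P))
    return [[p / total for p in row] for row in P]

# Hypothetical usage for one of the starting positions used in the paper.
P0 = initial_state(n=100, x0=0.2 * math.pi)
```

Normalizing by the discrete sum means the cell values are total probabilities per cell, matching the definition of the cell quantities used by the solver.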
The velocity field that we are using is of the form:
$$f = (v, -\sin(x)), \qquad (11)$$
where x and v are the angular displacement and velocity
respectively.
Figure 1 shows two examples of what these simulations look like.
Notice that figure 1(a) is approximately Gaussian; this is because
for a small angular displacement the dynamics are approximately
linear, so a Gaussian keeps its shape. It is clear from the shape
of the PDF in figure 1(b) that the dynamics there are highly
non-linear.
Fig. 2. The divergence after t = π using the 4x finer grid as
comparison for different starting positions. All are exponential
decays, slowing down the further from x0 = 0 we go, excluding x0 =
π. That one grows slightly from n = 50 to 100, then drops off ever
so slightly at 400.
Fig. 3. Computing the divergence of all of the t = π states
starting at x0 = π from one with n = 3200 produces an exponential
decay, which shows that the method does indeed converge.
A. Rate Of Convergence
The first thing to note is that the up-streaming that we have
implemented causes an exaggerated spreading out of the
distribution. This is particularly noticeable in the lower grid
resolution simulations, like the top plots in figure 1.
Calculating the divergence from the 4x finer grid as discussed
earlier, we see in figure 2 that most of the initial conditions
lead to a roughly exponential decay. However, the further away from
0 we go, the slower this decay is. For the case where x0 = π there
doesn’t seem to be any decay whatsoever.
Fig. 4. Expectation values over time for position and velocities of
some finite volume simulations with observations on a pendulum with
grid size n. The solid line is the expectation value, the dotted
lines show the variance about the expected value. The plus signs
are the observations made based on the true value (the dash-dotted
line). This simulation is for x0 = 0.2π with n = 100, and 20
observations were made at regular intervals.
Fig. 5. As in figure 4, only with n = 400.
In figure 3 we look at the x0 = π case, but rather than taking the
4x finer grid to be the “true” value we take one with n = 3200.
Here we do see a decay, so the solutions do get closer as we
increase the fineness of the grid. The behaviour displayed in
figure 2 can be explained by the fact that the lower values of n do
not produce the convergent behaviour used to justify the 4x finer
grid method when run for t = π. However, this is not a problem
because we will be making observations on our system more often
than once every π seconds. For the simulations shown in this paper
we make 30 observations every π seconds, hence the PDE solver only
needs to run accurately for π/30 seconds. For this, even as low as
n = 100 is sufficient to demonstrate the advantages of our
inference method.
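As a rough illustration of the arithmetic, the following Python sketch estimates how many Euler steps fit between two observations spaced π/30 apart on an n = 100 grid. It assumes the time step sits at the positivity bound and that the largest face-normal speed is π (the edge of a velocity axis spanning −π to π). Both are assumptions for illustration, not values reported in the paper.

```python
import math

n = 100
dx = 2.0 * math.pi / n
max_speed = math.pi  # ASSUMED largest |f.n| on the grid
dt_max = dx / (2.0 * 4 * max_speed)  # positivity bound with N = 4

interval = math.pi / 30.0  # time between observations
steps = math.ceil(interval / dt_max)  # whole number of Euler steps
dt = interval / steps  # actual step, never larger than the bound

print(steps, dt)
```

Under these assumptions only a few dozen solver steps separate consecutive observations, so discretization error has little time to accumulate before the next update.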
B. Comparison With UKF
For all of the simulations presented here the initial conditions
are a Gaussian with covariance matrix Σ = 0.2I as above, with all
observations being Gaussian distributed variables with covariance
Σz = 0.1I.
Figures 4 and 5 show our finite volume method’s position
expectation values with initial condition x0 = 0.2π for n = 100 and
n = 400 respectively. Note the closer fit to the true value in the
latter.
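Expectation values and variances of this kind can be read off the grid directly. The Python sketch below is one way to do so, treating the angle as an ordinary variable on the interval from −π to π rather than a circular one, which is an assumption. The name `grid_moments` is hypothetical.

```python
import math

def grid_moments(P, dx):
    """Mean and variance of x and v from cell probabilities P[i][j]."""
    n = len(P)
    centres = [-math.pi + (k + 0.5) * dx for k in range(n)]
    cells = [(i, j) for i in range(n) for j in range(n)]
    mean_x = sum(centres[i] * P[i][j] for i, j in cells)
    mean_v = sum(centres[j] * P[i][j] for i, j in cells)
    var_x = sum((centres[i] - mean_x) ** 2 * P[i][j] for i, j in cells)
    var_v = sum((centres[j] - mean_v) ** 2 * P[i][j] for i, j in cells)
    return (mean_x, mean_v), (var_x, var_v)

# Hypothetical usage: probability split evenly between two cells.
n = 4
dx = 2.0 * math.pi / n
P = [[0.0] * n for _ in range(n)]
P[1][2] = 0.5
P[2][2] = 0.5
mean, var = grid_moments(P, dx)
```

Since the full distribution is available at every step, other summaries such as quantiles or the probability of a region could be computed in the same way.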
Fig. 6. Expectation values over time for position and velocities
from the UKF for a pendulum. x0 = 0.2π with 30 observations.
Fig. 7. As in figure 6, but we have told the filter that the
dynamics are noisy (σx = 0.01) for a better result. x0 = 0.2π with
30 observations.
We now compare this with the UKF run for the same length of time,
shown in figure 6. Here we see that the predicted path that the
filter outputs overestimates the points of greatest curvature. When
run for longer times this forms a positive feedback and increases
with every oscillation. The variance decreases quickly for the
first few observations, and stays small despite the estimate being
far away from the observed data and the true path.
In figure 7 we have introduced a fudge factor in the form of
process noise. A simple pendulum has no noise in the dynamics, but
this does lead to a more precise fit to the path.
Figures 8 and 9 show the finite volume method run for higher
initial positions. The initial position of x0 = 0.8π in figure 8
leads to non-linear dynamics, and yet the true data stays within
the variances of the filter. Taking this to the extreme, figure 9
is run from x0 = π, which is an unstable equilibrium. Even in this
situation, after a brief run-in time, the filter keeps the true
values within the variances.
Fig. 8. Expectation values over time for position and velocities
from the FV method for a pendulum. x0 = 0.8π with 30 observations.
Fig. 9. Finite Volume simulation with x0 = π and 30 observations.
In stark contrast to that, figures 10 and 11 show the predicted
values wandering off from the true data completely. This is
lessened when observations are made more regularly, but even this
is limited, as shown by figure 11 where even 300 observations do
not keep it from drifting away.
Running the same conditions with the dynamic noise as above
produces figures 12 and 13. Figure 12 is an adequate fit, but does
tend to drift away from the true values briefly.
Fig. 10. x0 = 0.8π with 30 observations on the UKF.
Fig. 11. x0 = π and 300 observations on the UKF.
Fig. 12. x0 = 0.8π and 30 observations on the UKF with fudge factor
σx = 0.01.
Fig. 13. x0 = 0.8π and 30 observations on the UKF with fudge factor
σx = 0.01.
However, for our unstable equilibrium in figure 13 the expectation
value moves around the true value, but the variance is
underestimated as earlier.
IV. CONCLUSION
In this paper we have demonstrated a new way to perform inference
on a dynamic system. We have shown through comparison that it has
several advantages over the unscented Kalman filter:
• While the UKF is far more computationally efficient, it tends to
underestimate the uncertainties for the pendulum, both in the
non-linear and even the mostly linear regions.
• Even for approximately linear dynamics, the UKF needs some
process noise in order to accurately estimate the state of the
system. There is no process noise in a simple pendulum.
• Our grid-based method can accurately evolve the system for some
time without performing an observation, whereas the UKF must
perform an observation every time step. This can lead to the true
value being well outside the estimated statistics when observations
are infrequent.
• For both methods, the more non-linear the dynamics, the less
accurate the result. However, for the grid based method one can
counter this by devoting more computational power to the problem.
These differences lead us to conclude that our method is best
suited for inference problems where the system is highly
non-linear, accuracy is valued over speed, and observations are
limited or obtained sporadically. This method also stands out if
one cares about more than just the statistics. In this grid based
method one can extract arbitrary information from the state at any
point in time, not only at the observation steps.
ACKNOWLEDGMENT
The authors would like to thank Richard Norton for helping polish
the rough edges off the finite volume solver, and the Otago
Electronics group in general for their support.
REFERENCES
[1] R. E. Kalman, "A New Approach to Linear Filtering and
Prediction Problems," Journal of Basic Engineering 82 (1960):
35-45.
[2] S. J. Julier and J. K. Uhlmann, "A New Extension of the Kalman
Filter to Nonlinear Systems," Proc. of AeroSense: The 11th Int.
Symp. on Aerospace/Defense Sensing, Simulations and Controls,
1997.
[3] Y. Bock, B. Crowell, F. Webb, S. Kedar, R. Clayton, B.
Miyahara, "Fusion of High-Rate GPS and Seismic Data: Applications
to Early Warning Systems for Mitigation of Geological Hazards,"
American Geophysical Union, 2008.
[4] M. M. Shaikh, Wook Bahn, Changhun Lee, Kwang-soo Kim, Tae-jae
Lee, Kwang-soo Kim, Dongil Cho, "Mobile robot vision tracking
system using Unscented Kalman Filter," System Integration (SII),
2011 IEEE/SICE International Symposium on.
[5] Z. Chen, "Bayesian filtering: From Kalman filters to particle
filters, and beyond," Statistics 182.1 (2003): 1-69.
[6] S. Kullback and R. A. Leibler, "On Information and
Sufficiency," Annals of Mathematical Statistics 22 (1) (1951):
79-86.