Kalman Filtering and Model Estimation
Steven Lillywhite
Steven Lillywhite () Kalman Filtering and Model Estimation 1 / 29
Introduction
We aim to do the following.
Explain the basics of the Kalman filter.
Explain the relationship with MLE estimation.
Show some real applications.
Overview
1 Some Applications
2 Some History
3 Minimum Variance Estimation
4 Kalman Filter: State-Space Form, Kalman Filter Algorithm, Initial State Conditions, Stability
5 Maximum Likelihood Estimation
6 Estimating Commodities Models
Applications
Keywords: estimation, control theory, signal processing, filtering, linear stochastic systems.
Navigational systems. Used in the Apollo lunar landing. From the NASA Ames website: "The Kalman-Schmidt filter was embedded in the Apollo navigation computer and ultimately into all air navigation systems, and laid the foundation for Ames' future leadership in flight and air traffic research."
Satellite tracking. Missile tracking. Radar tracking.
Computer vision. Robotics. Speech enhancement.
Economics, Math Finance.
Gauss was no dummy!
Late 1700s. Problem: estimate planet and comet motion using data from telescopes.
1795. Gauss first uses the least-squares method at the age of 18.
1912. Fisher introduces the method of maximum likelihood.
1940s. Wiener-Kolmogorov linear minimum variance estimation technique. Signal processing. Unwieldy for large data sets.
1960. Kalman introduces Kalman filter.
Minimum Variance Estimators
Let (Ω,P) be a probability space.
Definition (Estimator)
Let X, Y1, Y2, . . . , Yn ∈ L2(P) be random variables. Let Y def= (Y1, Y2, . . . , Yn). By an estimator X̂ for X given Y we mean a random variable of the form X̂ = g(Y), where g : Rn → R is a given Borel-measurable function.
Definition (Minimum Variance Estimator)
An estimator X̂ of X given Y is called a minimum variance estimator if
‖X̂ − X‖ ≤ ‖h(Y) − X‖ (1)
for all Borel-measurable h. Let us denote MVE(X|Y) def= X̂.
Minimum Variance Estimators
MVE(X|Y) = E(X|Y)
Let M(Y) def= {g(Y) | g Borel-measurable, g(Y) ∈ L2(P)}. Then M(Y) is a closed subspace of L2(P), and MVE(X|Y) is the projection of X onto M(Y).
As a corollary, MVE(X|Y) exists, is unique, and is characterized by the condition:
(MVE(X|Y) − X) ⊥ M(Y) (2)
Linear Minimum Variance Estimators
Definition (Estimator)
Let X, Y1, Y2, . . . , Yn ∈ L2(P) be random variables. Let Y def= (Y1, Y2, . . . , Yn). By a linear estimator X̂ for X given Y we mean a random variable of the form X̂ = gY, where g : Rn → R is a given linear function.
Let’s ramp it up a bit by letting X be multi-dimensional.
Definition (Best Linear Minimum Variance Estimator)
Let X ∈ L2(P)n, Y ∈ L2(P)m. A linear estimator X̂ of X given Y is called a best linear minimum variance estimator if
‖X̂ − X‖ ≤ ‖hY − X‖ (3)
for all linear h. Let us denote BLMVE(X|Y) def= X̂. Here h is given by an n × m matrix.
Linear Minimum Variance Estimators
If X and Y are multivariate normal, then MVE(X|Y) = BLMVE(X|Y) (up to a constant term).
State-Space Form
Definition (State-Space Form)
The state-space form is defined by the following pair of equations:
xi+1 = Jixi + gi + ui (state)
zi = Hixi + bi + wi (observation)
Here xi, zi are vectors of discrete random variables.
In general the elements of xi are not observable.
We assume that the elements of zi are observable.
ui and wi are white noise processes.
We assume that all vectors and matrices take values in Euclidean space and can vary with i, but, apart from xi and zi, they vary only in a deterministic manner.
State-Space Form
Furthermore, we denote Qi def= E(ui uiᵀ) and Ri def= E(wi wiᵀ), and assume that the following hold:
E(ui x0ᵀ) = 0 (4)
E(wi x0ᵀ) = 0 (5)
E(ui wjᵀ) = 0 for all i, j (6)
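To make the state-space form concrete, here is a minimal NumPy sketch that simulates a pair (xi, zi) with Gaussian white noise ui ∼ N(0, Q), wi ∼ N(0, R); the function name and the Gaussian assumption are illustrative, not from the slides:

```python
import numpy as np

def simulate_state_space(x0, J, g, H, b, Q, R, n, seed=0):
    """Simulate n steps of the state/observation pair in state-space form."""
    rng = np.random.default_rng(seed)
    xs, zs = [], []
    x = x0
    for _ in range(n):
        # observation: z_i = H x_i + b + w_i
        z = H @ x + b + rng.multivariate_normal(np.zeros(R.shape[0]), R)
        xs.append(x)
        zs.append(z)
        # state transition: x_{i+1} = J x_i + g + u_i
        x = J @ x + g + rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
    return np.array(xs), np.array(zs)
```

With Q = R = 0 the recursion is deterministic, which is handy for sanity checks.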
Kalman Filter Notation
Definition
Denote Yj def= (z0, z1, . . . , zj). By x̂i|j (resp. ẑi|j) we shall mean the best linear minimum variance estimate (BLMVE) of xi (resp. zi) based on Yj. We also define
Pi|j def= E[(xi − x̂i|j)(xi − x̂i|j)ᵀ] (7)
and call this the error matrix. When i = j, the estimate is called a filtered estimate; when i > j, a predicted estimate; and when i < j, a smoothed estimate.
Discrete Kalman Filter
Theorem (Kalman, 1960)
The BLMVE x̂i|i may be generated recursively by
x̂i+1|i = Ji x̂i|i + gi (predicted state)
Pi+1|i = Ji Pi|i Jiᵀ + Qi (predicted state error matrix)
ẑi+1|i = Hi+1 x̂i+1|i + bi+1 (predicted observation)
ri+1 def= zi+1 − ẑi+1|i (predicted obs error)
Σi+1 def= Hi+1 Pi+1|i Hi+1ᵀ + Ri+1 (predicted obs error matrix)
Ki+1 = Pi+1|i Hi+1ᵀ Σi+1⁻¹ (Kalman gain)
x̂i+1|i+1 = x̂i+1|i + Ki+1 ri+1 (next filtered state)
Pi+1|i+1 = [I − Ki+1 Hi+1] Pi+1|i (next filtered state error matrix)
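The recursion in the theorem maps line-for-line onto code. A minimal NumPy sketch of one step, assuming time-invariant J, g, H, b, Q, R for simplicity (the function name is illustrative):

```python
import numpy as np

def kalman_step(x_filt, P_filt, z_next, J, g, H, b, Q, R):
    """One recursion of the discrete Kalman filter.

    x_filt, P_filt : filtered state estimate and its error matrix at step i
    z_next         : the new observation z_{i+1}
    """
    # predicted state and predicted state error matrix
    x_pred = J @ x_filt + g
    P_pred = J @ P_filt @ J.T + Q
    # predicted observation, its error, and the error matrix Sigma
    z_pred = H @ x_pred + b
    r = z_next - z_pred
    S = H @ P_pred @ H.T + R
    # Kalman gain and the next filtered quantities
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_next = x_pred + K @ r
    P_next = (np.eye(len(x_filt)) - K @ H) @ P_pred
    return x_next, P_next, r, S
```

Returning r and S alongside the state is deliberate: the MLE section below uses exactly these by-products.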
Discrete Kalman Filter
If the initial state x0 and the innovations ui, wi are multivariate Gaussian, then the forecasts x̂i|j (resp. ẑi|j) are minimum variance estimators (MVE).
Note that the updated filtered state estimate is the sum of the predicted state estimate and the predicted observation error weighted by the gain matrix.
Observe that the gain matrix is proportional to the predicted state error covariance matrix, and inversely proportional to the predicted observation error covariance matrix. Thus, in updating the state estimator, more weight is given to the observation error when the error in the predicted state estimate is large, and less when the observation error is large.
Kalman Filter Initial State Conditions
To run the Kalman filter, we begin with the pair x̂0|0, P0|0 (alternatively, one may also use x̂1|0, P1|0). A difficulty with the Kalman filter is the determination of these initial conditions. In many real applications, the distribution of x0 is unknown. Several approaches are possible.
For stationary state series, we can compute x̂0|0, P0|0 directly.
Prior information.
Diffuse prior: x̂0|0 = 0, and P0|0 = kI with k ≫ 0. The details are more involved.
Or one may treat x0 as a fixed vector, taking x̂0|0 = x0 and P0|0 = 0, and estimate its components by treating them as extra parameters in the model. The details are more involved.
A general rule of thumb is that for long time series, the initial state conditions will have little impact.
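For the stationary case, the initial conditions can be computed directly: the unconditional mean solves x = Jx + g, and the unconditional covariance solves the discrete Lyapunov equation P = JPJᵀ + Q. A sketch by fixed-point iteration, assuming a time-invariant, stable state equation (spectral radius of J below 1); the function name is illustrative:

```python
import numpy as np

def stationary_init(J, g, Q, tol=1e-12, max_iter=10_000):
    """Initial conditions x0, P0 for a stationary state series.

    Solves x = J x + g directly, and P = J P J^T + Q (the discrete
    Lyapunov equation) by fixed-point iteration.
    """
    n = J.shape[0]
    x0 = np.linalg.solve(np.eye(n) - J, g)  # unconditional mean
    P = Q.copy()
    for _ in range(max_iter):
        P_new = J @ P @ J.T + Q             # Lyapunov fixed-point step
        if np.max(np.abs(P_new - P)) < tol:
            break
        P = P_new
    return x0, P
```

For production use one would solve the Lyapunov equation by a direct method, but the fixed-point form mirrors the state recursion itself.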
Kalman Filter Stability
Under certain conditions, the error matrices Pi+1|i (equivalently Pi|i) will stabilize:
limi→∞ Pi+1|i = P (8)
with P independent of P1|0. Convergence is often exponentially fast.
This means that for stable filters, the initial state conditions won't have much impact so long as we have enough data to reach the stable state. One needs to be more concerned with initial state conditions in small samples.
We gain a significant computational advantage by exploiting convergence in the filter. In particular, when the matrices are time-invariant, the predicted observation error matrix and the Kalman gain stabilize, too. See next slide.
Kalman Filter Stability
This part is independent of the data:
Pi+1|i = Ji Pi|i Jiᵀ + Qi (predicted state error matrix)
Σi+1 def= Hi+1 Pi+1|i Hi+1ᵀ + Ri+1 (predicted obs error matrix)
Ki+1 = Pi+1|i Hi+1ᵀ Σi+1⁻¹ (Kalman gain)
Pi+1|i+1 = [I − Ki+1 Hi+1] Pi+1|i (next filtered state error matrix)
This part depends on the data:
x̂i+1|i = Ji x̂i|i + gi (predicted state)
ẑi+1|i = Hi+1 x̂i+1|i + bi+1 (predicted observation)
ri+1 def= zi+1 − ẑi+1|i (predicted obs error)
x̂i+1|i+1 = x̂i+1|i + Ki+1 ri+1 (next filtered state)
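Since none of the equations in the first group touch the observations, the entire sequence of error matrices and gains can be computed offline, before any data arrive. A minimal sketch for time-invariant matrices (names illustrative):

```python
import numpy as np

def precompute_gains(P0, J, H, Q, R, n_steps):
    """Run only the data-independent half of the filter.

    Returns the sequence of Kalman gains K_1, ..., K_n and the final
    filtered error matrix; no observation z_i is ever needed.
    """
    P = P0
    gains = []
    for _ in range(n_steps):
        P_pred = J @ P @ J.T + Q                 # predicted state error matrix
        S = H @ P_pred @ H.T + R                 # predicted obs error matrix
        K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
        gains.append(K)
        P = (np.eye(P.shape[0]) - K @ H) @ P_pred
    return gains, P
```

Once the gains have converged numerically, one can reuse the steady-state gain for all remaining steps instead of recomputing it.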
Kalman Filter Divergence
Numerical instability in the algorithm, round-off errors, etc., can cause divergence in the filter.
Model fit. If the underlying state model does not fit the real-world process well, then the filter can diverge.
Observability. If we cannot observe some of the state variables (or linear combinations of them), then we can get divergence in the filter.
Kalman Filter Other Items
Kalman advantage: real-time updating. No need to store past data to update current estimates.
Can handle missing data, since the matrices in the algorithm can vary over time.
Smoothing. The filter algorithm above gives the BLMVE at time t based on data up to time t. However, once all the data is in, we can make better estimates of the state variables at time t using also the data after time t.
Alternative forms for the filter algorithm arise from algebraic manipulation of the equations. The information filter computes Pi|i⁻¹. Depending on the situation, this can be more (or less) useful. The square-root filter propagates square roots of Pi|i. It is more computationally burdensome, but can improve numerical instability problems.
Kalman Filter Other Items
Non-linear state-space filters. This is called the Extended Kalman Filter. Here, we allow arbitrary functions in the state-space formulation, rather than the linear functions above.
xi+1 = f(xi, gi, ui) (state)
zi = h(xi, bi, wi) (observation)
One proceeds by linearizing the functions about the estimates at each step, and thereby obtains an analogous filter algorithm.
There is a continuous version of the filter due to Kalman and Bucy.
Maximum Likelihood Estimation
If the initial state x0 and the innovations ui, wi are multivariate Gaussian, then the distribution of zi conditional on the set Yi−1 is also Gaussian, and the error matrices above are covariance matrices of the error random variables:
zi | Yi−1 ∼ N(ẑi|i−1, Σi) (9)
Now let us suppose that the state-space vectors and matrices depend on certain unknown parameters. Let us denote by θ the vector of these parameters. We may form the likelihood function by taking the joint probability density function (pdf):
L(z; θ) = ∏i=1,...,n pdf(zi | Yi−1) (10)
Maximum Likelihood Estimation
Then the log of the likelihood function is
log(L(z; θ)) = −(mn/2) log(2π) − (1/2) ∑i=1,...,n [ log(det(Σi)) + riᵀ Σi⁻¹ ri ] (11)
By maximizing log(L(z; θ)) with respect to θ for a particular realization of z, we obtain the maximum likelihood estimates for the parameters. The main point of this section is that log(L(z; θ)) may be computed via the Kalman filter, since the algorithm naturally computes both ri and Σi.
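A sketch of formula (11) built from the filter by-products; it takes the innovations ri and their covariance matrices Σi as inputs, so any filter implementation can feed it (the function name is illustrative):

```python
import numpy as np

def gaussian_loglik(residuals, covariances, m):
    """Log-likelihood (11) from the Kalman filter by-products.

    residuals   : list of predicted observation errors r_i (length-m vectors)
    covariances : list of predicted obs error matrices Sigma_i (m x m)
    m           : dimension of the observation vector z_i
    """
    n = len(residuals)
    ll = -0.5 * m * n * np.log(2.0 * np.pi)
    for r, S in zip(residuals, covariances):
        # log det(Sigma_i) + r_i^T Sigma_i^{-1} r_i, summed over i
        ll -= 0.5 * (np.log(np.linalg.det(S)) + r @ np.linalg.solve(S, r))
    return ll
```

In practice one would hand this function to a numerical optimizer over θ, re-running the filter at each trial parameter vector.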
Estimating Commodities Models
Kalman filtering with maximum likelihood can be used to estimate parameters in various models in financial engineering applications.
One such use is the estimation of parameters in commodities models. Here the state system would model the spot price, and the observation system would be futures prices. Kalman filtering can be useful here for the following reasons:
It is not uncommon that there is no true spot price process in the real world.
Even if there is a spot price process, it can be highly illiquid, error prone, and unreliable for modelling.
In multi-factor models, we may have the spot price divided into unobservable components.
A Two-Factor Model of Schwartz-Smith for Oil prices
Combining geometric Brownian motion with mean-reversion, this is the short-long model of Schwartz-Smith. The idea is that short-term variations revert back to an equilibrium level. But the equilibrium level is uncertain, and follows a Brownian motion process with drift.
ln(St) = ξt + χt (12)
dξt = µξdt + σξdWξ (13)
dχt = −κχtdt + σχdWχ (14)
dWχdWξ = ρχξdt (15)
A Two-Factor Model of Schwartz-Smith for Oil prices
We shall use the risk-neutral framework to price derivatives. We shall assume that the market price of risk is constant, and we write:
dξt = (µξ − λξ)dt + σξdWξ (16)
dχt = (−κχt − λχ)dt + σχdWχ (17)
Here (λξ/σξ, λχ/σχ) gives the change of measure according to the Girsanov theorem. We shall denote µ∗ξ = µξ − λξ. Under these assumptions, we obtain that the distribution of ln(St) is normal, and
ln(F(t,T)) = e−κ(T−t)χt + ξt + A(T − t) (18)
A(T − t) = µ∗ξ(T − t) − (1 − e−κ(T−t))λχ/κ + (1/2)(1 − e−2κ(T−t))σ2χ/2κ + (1/2)σ2ξ(T − t) + (1 − e−κ(T−t))ρχξσχσξ/κ
Parameter Estimation Via Kalman filter MLE
State Equation: discretize the model for the spot price
Measurement Equation: discretize the formula for the futures price in terms of the spot. Add a noise term.
We estimate the parameters in the model using maximum likelihoodwith the Kalman filter.
State-Space Formulation
xt = Jxt−1 + g + ut−1 (19)
where
xt = [ χt ]      g = [ 0     ]      J = [ e−κ∆t  0 ]
     [ ξt ],         [ µξ∆t ],          [ 0      1 ]   (20)
and ut ∼ N(0,Q), with
Q = [ (1 − e−2κ∆t)σ2χ/2κ       (1 − e−κ∆t)ρχξσχσξ/κ ]
    [ (1 − e−κ∆t)ρχξσχσξ/κ     σ2ξ∆t                ]   (21)
Here, ∆t represents the data frequency, or time between observations. Note that J, g, and Q do not vary with time.
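A sketch that assembles J, g, and Q of (20)-(21) from the model parameters; the function name and argument names are illustrative:

```python
import numpy as np

def schwartz_smith_state(kappa, mu_xi, sigma_chi, sigma_xi, rho, dt):
    """Time-invariant J, g, Q of (19)-(21) for the state x_t = (chi_t, xi_t)."""
    e = np.exp(-kappa * dt)
    J = np.array([[e,   0.0],
                  [0.0, 1.0]])
    g = np.array([0.0, mu_xi * dt])
    # covariance of the discretized innovation u_t
    q11 = (1.0 - np.exp(-2.0 * kappa * dt)) * sigma_chi**2 / (2.0 * kappa)
    q12 = (1.0 - e) * rho * sigma_chi * sigma_xi / kappa
    Q = np.array([[q11, q12],
                  [q12, sigma_xi**2 * dt]])
    return J, g, Q
```

These matrices plug directly into the state equation of the Kalman filter recursion.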
State-Space Formulation
For the observation equation, we have
zt = Htxt + bt + wt (22)
where
zt = [log FT1(t)  log FT2(t)  . . .  log FTm(t)]ᵀ (23)
bt = [A(φ(t,T1) − t)  A(φ(t,T2) − t)  . . .  A(φ(t,Tm) − t)]ᵀ (24)
Ht = [ e−κ(φ(t,T1)−t)  1 ]
     [ e−κ(φ(t,T2)−t)  1 ]
     [ . . .             ]
     [ e−κ(φ(t,Tm)−t)  1 ]   (25)
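The matrices (24)-(25) stack one row per futures contract. A sketch in terms of the times to maturity τk = φ(t,Tk) − t, where `A` stands for the function of (18); both names are placeholders, not from the slides:

```python
import numpy as np

def observation_matrices(taus, kappa, A):
    """H_t and b_t of (24)-(25): one row per futures maturity tau_k."""
    taus = np.asarray(taus, dtype=float)
    # first column: e^{-kappa * tau_k}; second column: coefficient 1 on xi_t
    H = np.column_stack([np.exp(-kappa * taus), np.ones_like(taus)])
    b = np.array([A(tau) for tau in taus])
    return H, b
```

Because the observed contracts roll over time, τk (and hence Ht, bt) changes with t even though the underlying parameters do not.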
State-Space Formulation
Here wt ∼ N(0,R) represents measurement error, which could come about via error in price reporting, or alternatively can represent errors in fitting the model. To simplify the problem, it is common practice to take the matrix R to be diagonal, or even a constant times the identity. This corresponds to assuming that the measurement errors are uncorrelated, resp. that they are uncorrelated and equal for all maturities. This assumption has the effect of introducing m, resp. 1, extra parameter(s) to be estimated with the model.