arXiv:physics/0511236v1 [physics.data-an] 28 Nov 2005
Efficient Data Assimilation for Spatiotemporal Chaos: a
Local Ensemble Transform Kalman Filter
Brian R. Hunt
Institute for Physical Science and Technology and Department of
Mathematics
University of Maryland, College Park MD 20742
September 3, 2018
Abstract
Data assimilation is an iterative approach to the problem of estimating the state of a dynamical system using both current and past observations of the system together with a model for the system’s time evolution. Rather than solving the problem from scratch each time new observations become available, one uses the model to “forecast” the current state, using a prior state estimate (which incorporates information from past data) as the initial condition, then uses current data to correct the prior forecast to a current state estimate. This Bayesian approach is most effective when the uncertainty in both the observations and in the state estimate, as it evolves over time, is accurately quantified. In this article, I describe a practical method for data assimilation in large, spatiotemporally chaotic systems. The method is a type of “Ensemble Kalman Filter”, in which the state estimate and its approximate uncertainty are represented at any given time by an ensemble of system states. I discuss both the mathematical basis of this approach and its implementation; my primary emphasis is on ease of use and computational speed rather than improving accuracy over previously published approaches to ensemble Kalman filtering.
1 Introduction
Forecasting a physical system generally requires both a model
for the time evolution of the system
and an estimate of the current state of the system. In some
applications, the state of the system can be
measured accurately at any given time, but in other
applications, such as weather forecasting, direct
measurement of the system state is not feasible. Instead, the
state must be inferred from available
data. While a reasonable state estimate based on current data may be possible, in general one can obtain a better estimate by using both current and past data. “Data assimilation” provides such an estimate on an ongoing basis, iteratively alternating between a forecast step and a state estimation step; the latter step is often called the “analysis”. The analysis step combines information from current data and from a prior short-term forecast (which is based on past data), producing a current state estimate. This estimate is used to initialize the next short-term forecast, which is subsequently used in the next analysis, and so on. The data assimilation procedure is itself a dynamical system driven by the physical system, and the practical problem is to achieve good “synchronization” [33] between the two systems.
Data assimilation is widely used to study and forecast geophysical systems [10, 21]. The analysis step is generally a statistical procedure (specifically, a Bayesian maximum likelihood estimate) involving a prior (or “background”) estimate of the current state based on past data, and current data (or “observations”) that are used to improve the state estimate. This procedure requires quantification of the uncertainty in both the background state and the observations. While quantifying the observation uncertainty can be a nontrivial problem, in this article I will consider that problem to be solved, and instead concentrate on the problem of quantifying the background uncertainty.
There are two main factors that create background uncertainty.
One is the uncertainty in the
initial conditions from the previous analysis, which produced
the background state via a short-term
forecast. The other is “model error”, the unknown discrepancy
between the model dynamics and
actual system dynamics. Quantifying the uncertainty due to model
error is a challenging problem,
and while this problem generally can’t be ignored in practice, I
will discuss only crude ways of
accounting for it in this article. For the time being, let us
consider an idealized “perfect model”
scenario, in which there is no model error.
The main purpose of this article is to describe a practical
framework for data assimilation that
is both relatively easy to implement and computationally
efficient, even for large, spatiotemporally
chaotic systems. (By “spatiotemporally chaotic” I mean a
spatially extended system that exhibits
temporally chaotic behavior with weak long-range spatial
correlations.) The emphasis here is on
methodology that scales well to high-dimensional systems and
large numbers of observations, rather
than on what would be optimal given unlimited
computational resources. Ideally one would keep
track of a probability distribution of system states,
propagating the distribution using the Fokker-
Planck-Kolmogorov equation during the forecast step. While this
approach provides a theoretical
basis for the methods used in practice [18], it would be
computationally expensive even for a low-
dimensional system and not at all feasible for a
high-dimensional system. Instead one can use
a Monte Carlo approach, using a large ensemble of system states
to approximate the distribution
(see [5] for an overview), or a parametric approach like the
Kalman Filter [19, 20], which assumes
Gaussian distributions and tracks their mean and covariance.
(The latter approach was derived
originally for linear problems, but serves as a reasonable
approximation for nonlinear problems
when the uncertainties remain sufficiently small.)
The methodology of this article is based on the Ensemble Kalman
Filter [6, 8], which has elements of both approaches: it uses the Gaussian approximation,
and follows the time evolution of
the mean and covariance by propagating an ensemble of states.
The ensemble can be reasonably
small relative to other Monte Carlo methods because it is used
only to parametrize the distribution,
not to sample it thoroughly. The ensemble should be large enough
to approximately span the space
of possible system states at a given time, because the analysis
essentially determines which linear
combination of the ensemble members form the best estimate of
the current state, given the current
observations.
Many variations on the Ensemble Kalman Filter have been
published in the geophysical literature, and this article draws ideas from a number of them [1, 2,
3, 12, 14, 15, 23, 29, 30, 38, 40].
These articles in turn draw ideas both from earlier work on
geophysical data assimilation and from
the engineering and mathematics literature on nonlinear
filtering. For the most part, I will limit my
citations to ensemble-based articles rather than attempt to
trace all ideas to their original sources. I
call the method described here a Local Ensemble Transform Kalman
Filter (LETKF), because it is
most closely related to the Ensemble Transform Kalman Filter [3]
and the Local Ensemble Kalman
Filter [29, 30].
In Section 2, I start by posing a general problem about which trajectory of a dynamical system “best fits” a time series of data; this problem is solved exactly for linear problems by the Kalman Filter and approximately for nonlinear problems by ensemble Kalman filters. Next I derive the Kalman Filter equations as a guide for what follows. Then I discuss ensemble Kalman filters in general and the issue of “localization”, which is important for applications to spatiotemporally chaotic systems. Finally, I develop the basic LETKF equations, which provide a framework for data assimilation that allows a system-dependent localization strategy to be developed and tuned. I also discuss several options for “covariance inflation” to compensate for the effects of model error and the deficiencies of small sample size and linear approximation that are inherent to ensemble Kalman filters.
In Section 3, I give step-by-step instructions for efficient implementation of the approach developed in Section 2, and discuss options for further improving computational speed in certain cases. Then in Section 4, I present a generalization that allows observations gathered at different times to be assimilated simultaneously in a natural way. Finally, in Section 5 I briefly discuss preliminary results and work in progress with the LETKF approach. The notation in this article is based largely on that proposed in [17], with some elements from [30].
2 Mathematical Formulation

Consider a system governed by the ordinary differential equation

\[ \frac{dx}{dt} = F(t, x), \tag{1} \]

where x is an m-dimensional vector representing the state of the system at a given time. Suppose we are given a set of (noisy) observations of the system made at various times, and we want to determine which trajectory {x(t)} of (1) “best” fits the observations. For any given t, this trajectory gives an estimate of the system state at time t.
To formulate this problem mathematically, we need to define “best fit” in this context. Let us assume that the observations are the result of measuring quantities that depend on the system state in a known way, with Gaussian measurement errors. In other words, an observation at time t_j is a triple (y^o_j, H_j, R_j), where y^o_j is a vector of observed values, and H_j and R_j describe the relationship between y^o_j and x(t_j):

\[ y^o_j = H_j(x(t_j)) + \varepsilon_j, \]

where ε_j is a Gaussian random variable with mean 0 and covariance matrix R_j. Notice that I am assuming a perfect model here: the observations are based on a trajectory of (1), and our problem is simply to infer which trajectory produced the observations. In a real application, the observations come from a trajectory of the physical system for which (1) is only a model. So a more realistic (but more complicated) problem would be to determine a pseudo-trajectory of (1), or a trajectory of an associated stochastic differential equation, that best fits the observations. Formulating this problem mathematically then requires some assumption about the size and nature of the model error. I will use the perfect model problem as motivation and defer the consideration of model error until later.
Given our assumptions about the observations, we can formulate a maximum likelihood estimate for the trajectory of (1) that best fits the observations at times t_1 < t_2 < · · · < t_n. The likelihood of a trajectory x(t) is proportional to

\[ \prod_{j=1}^{n} \exp\left(-[y^o_j - H_j(x(t_j))]^T R_j^{-1} [y^o_j - H_j(x(t_j))]\right). \]

The most likely trajectory is the one that maximizes this expression, or equivalently minimizes the “cost function”

\[ J^o(\{x(t)\}) = \sum_{j=1}^{n} [y^o_j - H_j(x(t_j))]^T R_j^{-1} [y^o_j - H_j(x(t_j))]. \tag{2} \]

Thus, the “most likely” trajectory is also the one that best fits the observations in a least-squares sense.
Notice that (2) expresses the cost J^o as a function of the trajectory {x(t)}. To minimize the cost, it is more convenient to write it as a function of the system state at a particular time t. Let M_{t,t'} be the map that propagates a solution of (1) from time t to time t'. Then

\[ J^o_t(x) = \sum_{j=1}^{n} [y^o_j - H_j(M_{t,t_j}(x))]^T R_j^{-1} [y^o_j - H_j(M_{t,t_j}(x))] \tag{3} \]

expresses the cost in terms of the system state x at time t. Thus to estimate the state at time t, we attempt to minimize J^o_t.
For a nonlinear model, there is no guarantee that a unique minimum exists. And even if it does, evaluating J^o_t is apt to be computationally expensive, and minimizing it may be impractical. But if both the model and the observation operators H_j are linear, the minimization is quite tractable, because J^o_t is then quadratic. Furthermore, one can compute the minimum by an iterative method, namely the Kalman Filter [19, 20], which I will now describe. This method forms the basis for the approach we will use in the nonlinear scenario.
2.1 Linear Scenario: the Kalman Filter

In the linear scenario, we can write M_{t,t'}(x) = M_{t,t'} x and H_j(x) = H_j x, where M_{t,t'} and H_j are matrices. Using the terminology from the introduction, we now describe how to perform a forecast step from time t_{n−1} to time t_n followed by an analysis step at time t_n, in such a way that if we start with the most likely system state, in the sense described above, given the observations up to time t_{n−1}, we end up with the most likely state given the observations up to time t_n. The forecast step propagates the solution from time t_{n−1} to time t_n, and the analysis step combines the information provided by the observations at that time with the propagated information from the prior observations. This iterative approach requires that we keep track not only of the most likely state, but also of its uncertainty, in the sense described below. (Of course, the fact that the Kalman Filter computes the uncertainty in its state estimate may be viewed as a virtue.)
Suppose the analysis at time t_{n−1} has produced a state estimate x̄^a_{n−1} and an associated covariance matrix P^a_{n−1}. In probabilistic terms, x̄^a_{n−1} and P^a_{n−1} represent the mean and covariance of a Gaussian probability distribution that represents the relative likelihood of the possible system states given the observations from time t_1 to t_{n−1}. Algebraically, what we assume is that for some constant c,

\[ \sum_{j=1}^{n-1} [y^o_j - H_j M_{t_{n-1},t_j} x]^T R_j^{-1} [y^o_j - H_j M_{t_{n-1},t_j} x] = [x - \bar{x}^a_{n-1}]^T (P^a_{n-1})^{-1} [x - \bar{x}^a_{n-1}] + c. \tag{4} \]

In other words, the analysis at time t_{n−1} has “completed the square” to express the part of the quadratic cost function J^o_{t_{n−1}} that depends on the observations up to that time as a single quadratic form plus a constant. The Kalman Filter determines x̄^a_n and P^a_n such that an analogous equation holds at time t_n.
First we propagate the analysis state estimate x̄^a_{n−1} and its covariance matrix P^a_{n−1} using the forecast model to produce a background state estimate x̄^b_n and covariance P^b_n for the next analysis:

\[ \bar{x}^b_n = M_{t_{n-1},t_n} \bar{x}^a_{n-1}, \tag{5} \]
\[ P^b_n = M_{t_{n-1},t_n} P^a_{n-1} M_{t_{n-1},t_n}^T. \tag{6} \]

Under a linear model, a Gaussian distribution of states at one time propagates to a Gaussian distribution at any other time, and the equations above describe how the model propagates the mean and covariance of such a distribution. (Often a constant matrix is added to the right side of (6) to represent additional uncertainty due to model error.)
Next, we want to rewrite the cost function J^o_{t_n} given by (3) in terms of the background state estimate and the observations at time t_n. (This step is often formulated as applying Bayes’ Rule to the corresponding probability density functions.) In (4), x represents a hypothetical system state at time t_{n−1}. In our expression for J^o_{t_n}, we want x to represent instead a hypothetical system state at time t_n, so we first replace x by M_{t_n,t_{n-1}} x = M_{t_{n-1},t_n}^{-1} x in (4). Then using (5) and (6) yields

\[ \sum_{j=1}^{n-1} [y^o_j - H_j M_{t_n,t_j} x]^T R_j^{-1} [y^o_j - H_j M_{t_n,t_j} x] = [x - \bar{x}^b_n]^T (P^b_n)^{-1} [x - \bar{x}^b_n] + c. \]

It follows that

\[ J^o_{t_n}(x) = [x - \bar{x}^b_n]^T (P^b_n)^{-1} [x - \bar{x}^b_n] + [y^o_n - H_n x]^T R_n^{-1} [y^o_n - H_n x] + c. \tag{7} \]
To complete the data assimilation cycle, we determine the state estimate x̄^a_n and its covariance P^a_n so that

\[ J^o_{t_n}(x) = [x - \bar{x}^a_n]^T (P^a_n)^{-1} [x - \bar{x}^a_n] + c' \]

for some constant c'. Equating the terms of degree 2 in x, we get

\[ P^a_n = \left[ (P^b_n)^{-1} + H_n^T R_n^{-1} H_n \right]^{-1}. \tag{8} \]

Equating the terms of degree 1, we get

\[ \bar{x}^a_n = P^a_n \left[ (P^b_n)^{-1} \bar{x}^b_n + H_n^T R_n^{-1} y^o_n \right]. \tag{9} \]

This equation in some sense (consider, for example, the case where H_n is the identity matrix) expresses the analysis state estimate as a weighted average of the background state estimate and the observations, weighted according to the inverse covariance of each.
Equations (8) and (9) can be written in many different but equivalent forms, and it will be useful later to rewrite both of them now. Using (8) to eliminate (P^b_n)^{−1} from (9) yields

\[ \bar{x}^a_n = \bar{x}^b_n + P^a_n H_n^T R_n^{-1} (y^o_n - H_n \bar{x}^b_n). \tag{10} \]

The matrix P^a_n H_n^T R_n^{−1} is called the “Kalman gain”. It multiplies the difference between the observations at time t_n and the values predicted by the background state estimate to yield the increment between the background and analysis state estimates. Next, multiplying (8) on the right by (P^b_n)^{−1} P^b_n and combining the inverses yields

\[ P^a_n = (I + P^b_n H_n^T R_n^{-1} H_n)^{-1} P^b_n. \tag{11} \]

This expression is better than the previous one from a practical point of view, since it does not require inverting P^b_n.
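Concretely, one forecast–analysis cycle of these equations amounts to a few lines of linear algebra. The following Python/NumPy sketch of equations (5), (6), (10), and (11) is my own illustration (the function and variable names are assumptions, not from the paper):

```python
import numpy as np

def kalman_cycle(x_a, P_a, M, H, R, y_o):
    """One Kalman Filter cycle: forecast (5)-(6), then analysis (10)-(11).

    x_a, P_a : previous analysis mean (m,) and covariance (m, m)
    M        : linear model matrix propagating t_{n-1} -> t_n
    H        : linear observation operator (l, m)
    R        : observation error covariance (l, l)
    y_o      : observations at time t_n (l,)
    """
    # Forecast step, equations (5) and (6).
    x_b = M @ x_a
    P_b = M @ P_a @ M.T
    # Analysis covariance, equation (11); note that P_b is never inverted.
    m = P_b.shape[0]
    P_a_new = np.linalg.solve(np.eye(m) + P_b @ H.T @ np.linalg.solve(R, H), P_b)
    # Kalman gain and analysis mean, equation (10).
    K = P_a_new @ H.T @ np.linalg.inv(R)
    x_a_new = x_b + K @ (y_o - H @ x_b)
    return x_a_new, P_a_new
```

In the scalar case with M = H = R = 1 and a prior of mean 0 and variance 1, an observation y^o = 2 yields the analysis mean 1 and variance 1/2, the equal-weight average one expects from (9).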
Initialization. The derivation above of the Kalman Filter avoided the issue of how to initialize the iteration. In order to solve the best-fit problem we originally posed, we should make no assumptions about the system state prior to the analysis at time t_1. Formally we can regard the background covariance P^b_1 to be infinite, and for n = 1 use (8) and (9) with (P^b_1)^{−1} = 0. This works if there are enough observations at time t_1 to determine (aside from the measurement errors) the system state; that is, if H_1 has rank equal to the number of model variables m. The analysis then determines x̄^a_1 in the appropriate least-squares sense. However, if there are not enough observations, then the matrix to be inverted in (8) does not have full rank. To avoid this difficulty, one can assume a prior background distribution at time t_1, with P^b_1 reasonably large but finite. This adds a small quadratic term to the cost function being minimized, but with sufficient observations over time, the effect of this term on the analysis at time t_n decreases in significance as t increases.
2.2 Nonlinear Scenario: Ensemble Kalman Filtering

Many approaches to data assimilation for nonlinear problems are based on the Kalman Filter, or at least on minimizing a cost function similar to (7). At a minimum, a nonlinear model forces a change in the forecast equations (5) and (6), while nonlinear observation operators H_n force a change in the analysis equations (10) and (11). The Extended Kalman Filter (see, for example, [18]) computes x̄^b_n = M_{t_{n−1},t_n}(x̄^a_{n−1}) using the nonlinear model, but computes P^b_n using the linearization of M_{t_{n−1},t_n} around x̄^a_{n−1}. The analysis then uses the linearization of H_n around x̄^b_n. This approach is problematic for complex, high-dimensional models such as a global weather model for (at least) two reasons. First, it is not easy to linearize such a model. Second, the number of model variables m is several million, and as a result the m × m matrix inverse required by the analysis cannot be performed in a reasonable amount of time.
Approaches used in operational weather forecasting generally eliminate for pragmatic reasons the time iteration of the Kalman Filter. The U.S. National Weather Service performs data assimilation every 6 hours using the “3D-VAR” method [25, 31], in which the background covariance P^b_n in (7) is replaced by a constant matrix B representing typical uncertainty in a 6-hour forecast. This simplification allows the analysis to be formulated in a manner that does not require a large matrix to be inverted each time. The 3D-VAR cost function also includes a nonlinear observation operator H_n, and is minimized numerically to produce the analysis state estimate x̄^a_n.
The “4D-VAR” method [24, 35] used by the European Centre for Medium-Range Weather Forecasts uses a cost function that includes a constant-covariance background term as in 3D-VAR together with a sum like (2) accounting for the observations collected over a 12-hour time span. Again the cost function is minimized numerically; this procedure is computationally intensive, because both the nonlinear model and its linearization must be integrated over the 12-hour interval to compute the gradient of the 4D-VAR cost function, and this procedure is repeated until a satisfactory approximation to the minimum is found.
The key idea of ensemble Kalman filtering [6] is to choose at time t_{n−1} an ensemble of initial conditions whose spread around x̄^a_{n−1} characterizes the analysis covariance P^a_{n−1}, propagate each ensemble member using the nonlinear model, and compute P^b_n based on the resulting ensemble at time t_n. Thus, like the Extended Kalman Filter, the (approximate) uncertainty in the state estimate is propagated from one analysis to the next, unlike 3D-VAR (which does not propagate the uncertainty at all) or 4D-VAR (which propagates it only for a limited time). Furthermore, it does this without requiring a linearized model. While these are advantages over the other methods, there are some potential disadvantages as well.
Perhaps the most important difference between ensemble Kalman filtering and the other methods described above is that it only quantifies uncertainty in the space spanned by the ensemble. Assuming that computational resources restrict the number of ensemble members k to be much smaller than the number of model variables m, this can be a severe limitation. On the other hand, if this limitation can be overcome (see the section on “Localization” below), then the analysis can be performed in a much lower-dimensional space (k versus m). Thus, ensemble Kalman filtering has the potential to be more computationally efficient than the other methods. Indeed, the main point of this article is to describe how to do ensemble Kalman filtering efficiently without sacrificing accuracy.
Notation. We start with an ensemble {x^{a(i)}_{n−1} : i = 1, 2, . . . , k} of m-dimensional model state vectors at time t_{n−1}. One approach would be to let one of the ensemble members represent the best estimate of the system state, but here we assume the ensemble to be chosen so that its average represents the analysis state estimate. We evolve each ensemble member according to the nonlinear model to obtain a background ensemble {x^{b(i)}_n : i = 1, 2, . . . , k} at time t_n:

\[ x^{b(i)}_n = M_{t_{n-1},t_n}(x^{a(i)}_{n-1}). \]

For the rest of this article, I will discuss what to do at the analysis time t_n, and so I now drop the subscript n. Thus, for example, H and R will represent respectively the observation operator and the observation error covariance matrix at the analysis time. Let ℓ be the number of scalar observations used in the analysis.
For the background state estimate and its covariance we use the sample mean and covariance of the background ensemble:

\[ \bar{x}^b = k^{-1} \sum_{i=1}^{k} x^{b(i)}, \]
\[ P^b = (k-1)^{-1} \sum_{i=1}^{k} (x^{b(i)} - \bar{x}^b)(x^{b(i)} - \bar{x}^b)^T = (k-1)^{-1} X^b (X^b)^T, \tag{12} \]

where X^b is the m × k matrix whose ith column is x^{b(i)} − x̄^b. The analysis must determine not only a state estimate x̄^a and covariance P^a, but also an ensemble {x^{a(i)} : i = 1, 2, . . . , k} with the appropriate sample mean and covariance:

\[ \bar{x}^a = k^{-1} \sum_{i=1}^{k} x^{a(i)}, \]
\[ P^a = (k-1)^{-1} \sum_{i=1}^{k} (x^{a(i)} - \bar{x}^a)(x^{a(i)} - \bar{x}^a)^T = (k-1)^{-1} X^a (X^a)^T, \tag{13} \]

where X^a is the m × k matrix whose ith column is x^{a(i)} − x̄^a. In Section 2.3, I will describe how to determine x̄^a and P^a for a (possibly) nonlinear observation operator H in a way that agrees with the Kalman Filter equations (10) and (11) in the case that H is linear.
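The sample statistics in (12) can be computed directly from the ensemble stored as a matrix of state columns. A minimal NumPy sketch (the function and variable names are my own, not the paper’s):

```python
import numpy as np

def ensemble_stats(ensemble):
    """Sample mean and perturbation matrix of a background ensemble.

    ensemble: m x k array whose i-th column is the member x^{b(i)}.
    Returns (x_mean, X, P) where X has columns x^{b(i)} - x_mean and
    P = (k - 1)^{-1} X X^T, as in equation (12).
    """
    k = ensemble.shape[1]
    x_mean = ensemble.mean(axis=1)           # sample mean of the members
    X = ensemble - x_mean[:, np.newaxis]     # perturbations; columns sum to zero
    P = X @ X.T / (k - 1)                    # sample covariance, rank at most k - 1
    return x_mean, X, P
```

Note that P is never formed explicitly in the efficient implementation; the analysis below works with X directly.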
Choice of analysis ensemble. Once x̄^a and P^a are specified, there are still many possible choices of an analysis ensemble (or equivalently, a matrix X^a that satisfies (13) and the sum of whose columns is zero). A variety of ensemble Kalman filters have been proposed, and one of the main differences among them is how the analysis ensemble is chosen. The simplest approach is to apply the Kalman filter update (10) separately to each background ensemble member (rather than their mean) to get the corresponding analysis ensemble member. However, this results in an analysis ensemble whose sample covariance is smaller than P^a, unless the observations are artificially perturbed so that each ensemble member is updated using a different random realization of the perturbed observations [4, 14]. Ensemble square-root filters [1, 38, 3, 36, 29, 30] instead use more involved but deterministic algorithms to generate an analysis ensemble with the desired sample mean and covariance. As such, their analyses coincide exactly with the standard Kalman Filter in the linear scenario of the previous section. I will use this deterministic approach below.
Localization. Another important issue in ensemble Kalman filtering of spatiotemporally chaotic systems is spatial localization. If the ensemble has k members, then the background covariance matrix P^b given by (12) describes nonzero uncertainty only in the k-dimensional subspace spanned by the ensemble, and a global analysis will allow adjustments to the system state only in this subspace. If the system is high-dimensionally unstable, then forecast errors will grow in directions not accounted for by the ensemble, and these errors will not be corrected by the analysis. On the other hand, in a sufficiently small local region, the system may behave like a low-dimensionally unstable system driven by the dynamics in neighboring regions; such behavior was observed for a global weather model in [32]. Performing the analysis locally requires only that the ensemble in some sense span the local unstable space; by allowing the local analyses to choose different linear combinations of the ensemble members in different regions, the global analysis is not confined to the k-dimensional ensemble space and instead explores a much higher-dimensional space [9, 29, 30]. Another view on the necessity of localization for spatiotemporally chaotic systems is that the limited sample size provided by an ensemble will produce spurious correlations between distant locations in the background covariance matrix P^b [14, 12]. Unless they are suppressed, these spurious correlations will cause observations from one location to affect, in an essentially random manner, the analysis an arbitrarily large distance away. If the system has a characteristic “correlation distance”, then the analysis should ignore ensemble correlations over much larger distances. In addition to providing better results in many cases, localization allows the analysis to be done more efficiently as a parallel computation [23, 29, 30].
Localization is generally done either explicitly, considering only the observations from a region surrounding the location of the analysis [22, 14, 23, 1, 29, 30], or implicitly, by multiplying the entries in P^b by a distance-dependent function that decays to zero beyond a certain distance, so that observations do not affect the model state beyond that distance [15, 12, 38]. I will follow the explicit approach here, doing a separate analysis for each spatial grid point of the model. (My use of “grid point” assumes the model to be a discretization of a partial differential equation, or otherwise be defined on a lattice, but the method is also applicable to systems with other geometries.) The choice of which observations to use for each grid point is up to the user of the method, and a good choice will depend both on the particular system being modeled and on the size of the ensemble (more ensemble members will generally allow more distant observations to be used gainfully). It is important, however, to have significant overlap between the observations used for one grid point and the observations used for a neighboring grid point; otherwise the analysis ensemble may change suddenly from one grid point to the next. For an atmospheric model, a reasonable approach would be to use observations within a cylinder of a given radius and height and determine empirically which values of these two parameters work best. At its simplest, the method I describe gives all of the chosen observations the same weight, but I will also describe how to make the weights given to the observations decay more smoothly to zero as the distance from the analysis location increases.
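As an illustration of the explicit approach, selecting the observations for a single grid point might look like the following sketch; the cutoff radius and the coordinate arrays are my own illustrative assumptions, not prescribed by the paper:

```python
import numpy as np

def select_local_obs(grid_point, obs_locations, y_o, R, radius):
    """Keep only the observations within `radius` of the analysis grid point.

    grid_point   : (d,) coordinates of the grid point being analyzed
    obs_locations: (l, d) coordinates of the l scalar observations
    y_o          : (l,) observed values
    R            : (l, l) observation error covariance
    Returns the truncated (y_o, R) to use in the local analysis.
    """
    dist = np.linalg.norm(obs_locations - grid_point, axis=1)
    keep = dist <= radius
    # Truncating R this way discards correlations between the selected
    # observations and the discarded ones.
    return y_o[keep], R[np.ix_(keep, keep)]
```

The observation operator H (or, below, the rows of Y^b) must be truncated to the same subset of observations.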
2.3 LETKF: A local ensemble transform Kalman filter

I will now describe an efficient means of performing the analysis that transforms a background ensemble {x^{b(i)} : i = 1, 2, . . . , k} into an appropriate analysis ensemble {x^{a(i)} : i = 1, 2, . . . , k}, using the notation defined above. I assume that the number of ensemble members k is smaller than both the number of model variables m and the number of observations ℓ, even when localization has reduced the effective values of m and ℓ considerably compared to a global analysis. (In this section I will assume the choice of observations to use for the local analysis to have been performed already, and consider y^o, H and R to be truncated to these observations; as such, correlations between errors in the chosen observations and errors in other observations are ignored.) Most of the analysis will take place in a k-dimensional space, with as few operations as possible in the model and observation spaces.
Formally, we want the analysis mean x̄^a to minimize the Kalman filter cost function (7), modified to allow for a nonlinear observation operator H:

\[ J(x) = (x - \bar{x}^b)^T (P^b)^{-1} (x - \bar{x}^b) + [y^o - H(x)]^T R^{-1} [y^o - H(x)]. \tag{14} \]

However, the m × m background covariance matrix P^b = (k−1)^{−1} X^b (X^b)^T can have rank at most k − 1, and is therefore not invertible. Nonetheless, its inverse is well-defined on the space S spanned by the background ensemble perturbations, that is, the columns of X^b. Thus J is also well-defined for x − x̄^b in S, and the minimization can be carried out in this subspace.^1 As we have said, this reduced dimensionality is an advantage from the point of view of efficiency, though the restriction of the analysis mean to S is sure to be detrimental if k is too small.
In order to perform the analysis on S, we must choose an appropriate coordinate system. A natural approach is to use the singular vectors of X^b (the eigenvectors of P^b) to form a basis for S [1, 29, 30]. Here we avoid this step by using instead the columns of X^b to span S, as in [3]. One conceptual difficulty in this approach is that the sum of these columns is zero, so they are necessarily not linearly independent. We could assume the first k − 1 columns to be independent and use them as a basis, but this assumption is unnecessary and clutters the resulting equations. Instead, we regard X^b as a linear transformation from a k-dimensional space S̃ onto S, and perform the analysis in S̃. Let w denote a vector in S̃; then X^b w belongs to the space S spanned by the background ensemble perturbations, and x = x̄^b + X^b w is the corresponding model state.

^1 Considerably more general cost functions could be used in the relatively low-dimensional ensemble space S. In particular, one could consider non-Gaussian background distributions as follows. Given a distribution that can be parametrized solely by a mean and covariance matrix, substitute the negative logarithm of its probability distribution function for the first term on the right side of (14). Though the formulas below that determine the analysis mean and covariance would not be valid, one could numerically determine the appropriate mean and covariance. In principle, distributions that are parametrized by higher-order moments could be considered, but this would require significantly larger ensembles.
Notice that if w is a Gaussian random vector with mean 0 and covariance (k − 1)^{−1} I, then x = x̄^b + X^b w is Gaussian with mean x̄^b and covariance P^b = (k − 1)^{−1} X^b (X^b)^T. This motivates the cost function

\[ \tilde{J}(w) = (k-1) w^T w + [y^o - H(\bar{x}^b + X^b w)]^T R^{-1} [y^o - H(\bar{x}^b + X^b w)] \tag{15} \]

on S̃. In particular, I claim that if w̄^a minimizes J̃, then x̄^a = x̄^b + X^b w̄^a minimizes the cost function J. Substituting the change of variables formula into (14) and using (12) yields the identity

\[ \tilde{J}(w) = (k-1) w^T \left( I - (X^b)^T [X^b (X^b)^T]^{-1} X^b \right) w + J(\bar{x}^b + X^b w). \tag{16} \]

The matrix I − (X^b)^T [X^b (X^b)^T]^{−1} X^b is the orthogonal projection onto the null space N of X^b. (Generally N will be one-dimensional, spanned by the vector (1, 1, . . . , 1)^T, but it could be higher-dimensional.) Thus, the first term on the right side of (16) depends only on the component of w in N, while the second term depends only on its component in the space orthogonal to N (which is in one-to-one correspondence with S under X^b). Thus if w̄^a minimizes J̃, then it must be orthogonal to N, and the corresponding vector x̄^a minimizes J.
Nonlinear Observations. The most accurate way to allow for a nonlinear observation operator H would be to numerically minimize J̃ in the k-dimensional space S̃, as in [40]. If H is sufficiently nonlinear, then J̃ could have multiple minima, but a numerical minimization using w = 0 (corresponding to x = x̄b) as an initial guess would still be a reasonable approach. Having determined w̄a in this manner, one would compute the analysis covariance P̃a in S̃ from the second partial derivatives of J̃ at w̄a, then use Xb to transform the analysis results into the model space, as below. But in order to formulate the analysis more explicitly, we now linearize H about the background ensemble mean x̄b. Of course, if H is linear then we will find the minimum of J̃ exactly. And if the spread of the background ensemble is not too large, the linearization should be a decent approximation, similar to the approximation we have already made that a linear combination of background ensemble members is also a plausible background model state.
Since we only need to evaluate H in the ensemble space (or equivalently, to evaluate H(x̄b + Xbw) for w in S̃), the simplest way to linearize H is to apply it to each of the ensemble members xb(i) and interpolate. To this end, we define an ensemble yb(i) of background observation vectors by

yb(i) = H(xb(i)). (17)

We also define their mean ȳb and the ℓ × k matrix Yb whose ith column is yb(i) − ȳb. We then make the linear approximation

H(x̄b + Xbw) ≈ ȳb + Ybw. (18)

The same approximation is used in, for example, [15], and is equivalent to the joint state-observation space method in [1].
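For concreteness, the construction of ȳb and Yb in (17)–(18) can be sketched in a few lines of numpy. The function name and the convention of storing ensemble members as columns are my own illustrative choices, not prescribed by the text.

```python
import numpy as np

def background_obs_ensemble(H, xb_ens):
    """Apply the (possibly nonlinear) observation operator H to each
    background ensemble member, as in (17), and return the mean
    background observation vector and the perturbation matrix Yb.

    xb_ens : (m, k) array whose columns are the members xb(i).
    H      : function mapping an m-vector to an l-vector.
    """
    yb_ens = np.column_stack([H(xb_ens[:, i]) for i in range(xb_ens.shape[1])])
    yb_mean = yb_ens.mean(axis=1)      # the mean background observation
    Yb = yb_ens - yb_mean[:, None]     # ith column is yb(i) minus the mean
    return yb_mean, Yb
```

When H is linear this reproduces Yb = H Xb exactly; for nonlinear H it gives the interpolation-based linearization (18).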
Analysis. The linear approximation we have just made yields the quadratic cost function

J̃∗(w) = (k − 1)wTw + [yo − ȳb − Ybw]TR−1[yo − ȳb − Ybw]. (19)

This cost function is in the form of the Kalman filter cost function (7), using the background mean w̄b = 0 and background covariance matrix P̃b = (k − 1)−1I, with Yb playing the role of the observation operator. The analogues of the analysis equations (10) and (11) are then

w̄a = P̃a(Yb)TR−1(yo − ȳb), (20)

P̃a = [(k − 1)I + (Yb)TR−1Yb]−1. (21)

In model space, the analysis mean and covariance are then

x̄a = x̄b + Xbw̄a, (22)

Pa = XbP̃a(Xb)T. (23)

In order to initiate the ensemble forecast that will produce the background for the next analysis, we must choose an analysis ensemble whose sample mean and covariance are equal to x̄a and Pa. As mentioned above, this amounts to choosing a matrix Xa so that the sum of its columns is zero and (13) holds. Then one can form the analysis ensemble by adding x̄a to each of the columns of Xa.
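In matrix form, (20)–(23) amount to a handful of products and one k × k inverse. The numpy sketch below assumes the centered matrices Xb and Yb defined earlier; all names are illustrative.

```python
import numpy as np

def ensemble_space_analysis(Xb, Yb, yo, yb_mean, Rinv):
    """Analysis in the ensemble space, following (20)-(23).

    Xb : (m, k) background state perturbations
    Yb : (l, k) background observation perturbations
    yo : (l,) observations; yb_mean : (l,) mean background observation
    Rinv : (l, l) inverse observation error covariance
    """
    k = Xb.shape[1]
    Pa_tilde = np.linalg.inv((k - 1) * np.eye(k) + Yb.T @ Rinv @ Yb)  # (21)
    wa_mean = Pa_tilde @ Yb.T @ Rinv @ (yo - yb_mean)                 # (20)
    Pa = Xb @ Pa_tilde @ Xb.T                                         # (23)
    # The analysis mean is xb_mean + Xb @ wa_mean, per (22).
    return wa_mean, Pa_tilde, Pa
```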
Symmetric Square Root. Our choice of analysis ensemble is described by Xa = XbWa, where

Wa = [(k − 1)P̃a]1/2 (24)

and by the 1/2 power of a symmetric matrix we mean its symmetric square root. Then P̃a = (k − 1)−1Wa(Wa)T, and (13) follows from (23). The use of the symmetric square root to determine Wa from P̃a (as compared to, for example, a Cholesky factorization, or the choice described in [3]) is important for two main reasons. First, as we will see below, it ensures that the sum of the columns of Xa is zero, so that the analysis ensemble has the correct sample mean (this is also shown for the symmetric square root in [37]). Second, it ensures that Wa depends continuously on P̃a; while this may be a desirable property in general, it is crucial in a local analysis scheme, so that neighboring grid points with slightly different matrices P̃a do not yield very different analysis ensembles.
Another potentially desirable property of the symmetric square root is that it minimizes the (mean-square) distance between Wa and the identity matrix [29, 30], though because of the different choice of basis, it does not minimize the same quantity, and thus does not yield the same analysis ensemble, as in that article. Numerical experiments to be published elsewhere produce similar quality results to other reasonable choices of the analysis ensemble in a square-root filter; see Section 5.
To see that the sum of the columns of Xa is zero, we express this condition as Xav = 0, where v is a column vector of k ones: v = (1, 1, . . . , 1)T. Notice that by (21), v is an eigenvector of P̃a with eigenvalue (k − 1)−1:

(P̃a)−1v = [(k − 1)I + (Yb)TR−1Yb]v = (k − 1)v,

because the sum of the columns of Yb is zero. Then by (24), v is also an eigenvector of Wa with eigenvalue 1. Since the sum of the columns of Xb is zero, Xav = XbWav = Xbv = 0 as desired.
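The symmetric square root in (24) is readily computed from an eigendecomposition of the symmetric positive definite matrix (k − 1)P̃a. The helper below is a minimal sketch, and the eigenvector property Wa v = v argued above can be checked numerically whenever the columns of Yb sum to zero.

```python
import numpy as np

def symmetric_sqrt(A):
    """Symmetric square root of a symmetric positive definite matrix A:
    if A = V diag(d) V^T, return V diag(sqrt(d)) V^T."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

# Usage, as in (24): Wa = symmetric_sqrt((k - 1) * Pa_tilde)
```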
Finally, notice that we can form the analysis ensemble first in S̃ by adding w̄a to each of the columns of Wa; let {wa(i)} be the columns of the resulting matrix. These “weight” vectors specify what linear combinations of the background ensemble perturbations to add to the background mean in order to get the analysis ensemble in model space:

xa(i) = x̄b + Xbwa(i). (25)
Local Implementation. Notice that once the background ensemble has been used to form ȳb and Yb, it is no longer needed in the analysis, except in (25) to translate the results from S̃ to model space. This point is useful to keep in mind when implementing a local filter that computes a separate analysis for each model grid point. In principle, one should form a global background observation ensemble yb(i)[g] from the global background vectors, though in practice this can be done locally when the global observation operator H[g] uses local interpolation. After the background observation ensemble is formed, the analyses at different grid points are completely independent of each other and can be computed in parallel. The observations chosen for a given grid point will dictate which coordinates of yb(i)[g] are used to form the local background observation ensemble yb(i) for that analysis, and the analysis in S̃ will produce the weight vectors {wa(i)} for that grid point. Thus computing the analysis ensemble {xa(i)} for that grid point using (25) requires only the background model states at that grid point.
As long as the sets of observations used at a pair of neighboring grid points overlap heavily, the linear combinations used at the two grid points will be similar, and thus the global analysis ensemble members formed by these spatially varying linear combinations will change slowly from one grid point to the next. In a local region of several grid points, they will be approximately linear combinations of the background ensemble members, and thus should represent reasonably “physical” initial conditions for the forecast model. However, if the model requires of its initial conditions high-order smoothness and/or precise conformance to a conservation law, it may be necessary to post-process the analysis ensemble members to smooth them and/or project them onto the manifold determined by the conserved quantities before using them as initial conditions (this procedure is often called “balancing” in geophysical data assimilation).
In other localization approaches [15, 12, 38], the influence of an observation at a particular point on the analysis at a particular model grid point decays smoothly to zero as the distance between the two points increases. A similar effect can be achieved here by multiplying the entries in the inverse observation error covariance matrix R−1 by a factor that decays from one to zero as the distance of the observations from the analysis grid point increases. This “smoothed localization” corresponds to gradually increasing the uncertainty assigned to the observations until, beyond a certain distance, they have infinite uncertainty and therefore no influence on the analysis.
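A minimal sketch of this smoothed localization for a diagonal R follows. The cosine-shaped decay function and the cutoff parameter are my own illustrative choices; the text only requires a factor decaying from one to zero.

```python
import numpy as np

def taper_Rinv_diag(Rinv_diag, dists, cutoff):
    """Multiply the diagonal entries of R^{-1} by a factor that decays
    smoothly from 1 (at the analysis grid point) to 0 (at the cutoff
    distance), so that distant observations lose their influence."""
    x = np.clip(dists / cutoff, 0.0, 1.0)
    factor = 0.5 * (1.0 + np.cos(np.pi * x))  # 1 at x = 0, 0 at x >= 1
    return Rinv_diag * factor
```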
Covariance Inflation. In practice, an ensemble Kalman filter that adheres strictly to the Kalman filter equations (10) and (11) may fail to synchronize with the “true” system trajectory that produces the observations. One reason for this is model error, but even with a perfect model the filter will tend to underestimate the uncertainty in its state estimate [38]. Regardless of the cause, underestimating the uncertainty leads to overconfidence in the background state estimate, and hence to underweighting of the observations by the analysis. If the discrepancy becomes too large over time, the observations are essentially ignored by the analysis, and the dynamics of the data assimilation system become decoupled from the truth.
Generally this effect is countered by an ad hoc procedure (with at least one tunable parameter) that inflates either the background covariance or the analysis covariance during each data assimilation cycle. One “hybrid” approach adds a multiple of the background covariance matrix B from the 3D-VAR method to the background covariance prior to the analysis [11]. “Multiplicative inflation” [2, 12] instead multiplies the background covariance matrix (or equivalently, the differences or “perturbations” between the background ensemble members and their mean) by a constant factor larger than one. “Additive inflation” adds a small multiple of the identity matrix to either the background covariance or the analysis covariance during each cycle [29, 30]. Finally, if one chooses the analysis ensemble in such a way that each member has a corresponding member of the background ensemble, then one can inflate the analysis ensemble by “relaxation” toward the background ensemble: replacing each analysis perturbation from the mean by a weighted average of itself and the corresponding background perturbation [39].
Within the framework described in this article, the hybrid approach is not feasible because it requires the analysis to consider uncertainty outside the space spanned by the background ensemble. However, once the analysis ensemble is formed, one could develop a means of inflating it in directions (derived from the 3D-VAR background covariance matrix B or otherwise) outside the ensemble space, so that uncertainty in these directions is reflected in the background ensemble at the next analysis step. Additive inflation is feasible, but requires substantial additional computation in order to determine the adjustment necessary in the k-dimensional space S̃ that corresponds to adding a multiple of the identity matrix to the model space covariance Pb or Pa. Relaxation is simple to implement, and is most efficiently done in S̃ by replacing Wa with a weighted average of it and the identity matrix.
Multiplicative inflation can be performed most easily on the analysis ensemble by multiplying Wa by an appropriate factor (namely √ρ, in order to multiply the analysis covariance by ρ). To perform multiplicative inflation on the background covariance instead, one should theoretically multiply Xb by such a factor, and adjust the background ensemble {xb(i)} accordingly before applying the observation operator H to form the background observation ensemble {yb(i)}. However, a more efficient approach, which is equivalent if H is linear, and is a close approximation even for nonlinear H if the inflation factor ρ is close to one, is simply to replace (k − 1)I by (k − 1)I/ρ in (21), since (k − 1)I is the inverse of the background covariance matrix P̃b in the k-dimensional space S̃. One can check that this has the same effect on the analysis mean x̄a and covariance Pa as multiplying Xb and Yb by √ρ.
3 Efficient Computation of the Analysis
Here is a step-by-step description of how to perform the analysis described in the previous section, designed for efficiency both in ease of implementation and in the amount of computation and memory usage. Of course there are some trade-offs between these objectives, so in each step I first describe the simplest approach and then, in some cases, mention alternate approaches and possible gains in computational efficiency. I also give a rough accounting of the computational complexity of each step, and at the end discuss the overall computational complexity. After that, I describe an approach that in some cases will produce a significantly faster analysis, at the expense of more memory usage and a more difficult implementation, by reorganizing some of the steps. As before, I will use “grid point” in this section to mean a spatial location in the forecast model, whether or not the model is actually based on a grid geometry; I will use “array” to mean a vector or matrix. The use of “columns” and “rows” below is for exposition only; one should of course store arrays in whatever manner is most efficient for one’s computing environment.
The inputs to the analysis are a background ensemble of m[g]-dimensional model state vectors {xb(i)[g] : i = 1, 2, . . . , k}, a function H[g] from the m[g]-dimensional model space to the ℓ[g]-dimensional observation space, an ℓ[g]-dimensional vector yo[g] of observations, and an ℓ[g] × ℓ[g] observation error covariance matrix R[g]. The subscript g here signifies that these inputs reflect the global model state and all available observations, from which a local subset should be chosen for each local analysis. How to choose which observations to use is entirely up to the user of this method, but a reasonable general approach is to choose those observations made within a certain distance of the grid point at which one is doing the local analysis, and determine empirically which value of the cutoff distance produces the “best” results. If one deems localization to be unnecessary in a particular application, then one can ignore the distinction between local and global and skip Steps 3 and 9 below.
Steps 1 and 2 are essentially global operations, but may be done locally in a parallel implementation. Steps 3–8 should be performed separately for each local analysis (generally this means for each grid point, but see the parenthetical comment at the end of Step 3). Step 9 simply combines the results of the local analyses to form a global analysis ensemble {xa(i)[g]}, which is the final output of the analysis.
1. Apply H[g] to each xb(i)[g] to form the global background observation ensemble {yb(i)[g]}, and average the latter vectors to get the ℓ[g]-dimensional column vector ȳb[g]. Subtract this vector from each yb(i)[g] to form the columns of the ℓ[g] × k matrix Yb[g]. (This subtraction can be done “in place”, since the vectors {yb(i)[g]} are no longer needed.) This requires k applications of H, plus 2kℓ[g] (floating-point) operations. If H is an interpolation operator that requires only a few model variables to compute each observation variable, then the total number of operations for this step is proportional to kℓ[g] times the average number of model variables required to compute each scalar observation.
2. Average the vectors {xb(i)[g]} to get the m[g]-dimensional vector x̄b[g], and subtract this vector from each xb(i)[g] to form the columns of the m[g] × k matrix Xb[g]. (Again the subtraction can be done “in place”; the vectors {xb(i)[g]} are no longer needed.) This step requires a total of 2km[g] operations. (If H is linear, one can equivalently perform Step 2 before Step 1, and obtain ȳb[g] and Yb[g] by applying H to x̄b[g] and Xb[g].)
3. This step selects the necessary data for a given grid point (whether it is better to form the local arrays described below explicitly, or to select them later as needed from the global arrays, depends on one’s implementation). Select the rows of x̄b[g] and Xb[g] corresponding to the given grid point, forming their local counterparts: the m-dimensional vector x̄b and the m × k matrix Xb, which will be used in Step 8. Likewise, select the rows of ȳb[g] and Yb[g] corresponding to the observations chosen for the analysis at the given grid point, forming the ℓ-dimensional vector ȳb and the ℓ × k matrix Yb. Select the corresponding rows of yo[g] and rows and columns of R[g] to form the ℓ-dimensional vector yo and the ℓ × ℓ matrix R. (For a high-resolution model, it may be reasonable to use the same set of observations for multiple grid points, in which case one should select here the rows of Xb[g] and x̄b[g] corresponding to all of these grid points.)
4. Compute the k × ℓ matrix C = (Yb)TR−1. Since this is the only step in which R is used, it may be most efficient to compute C by solving the linear system RCT = Yb rather than inverting R. In some applications R may be diagonal, but in others R will be block diagonal, with each block representing a group of correlated observations. As long as the size of each block is relatively small, inverting R or solving the linear system above will not be computationally expensive. Furthermore, many or all of the blocks that make up R may be unchanged from one analysis time to the next, so that their inverses need not be recomputed each time. Based on these considerations, the number of operations required (at each grid point) for this step in a typical application should be proportional to kℓ, multiplied by a factor related to the typical block size of R.
5. Compute the k × k matrix P̃a = [(k − 1)I/ρ + CYb]−1, as in (21). Here ρ > 1 is a multiplicative covariance inflation factor, as described at the end of the previous section. Though trying some of the other approaches described there may be fruitful, a reasonable general approach is to start with ρ > 1 and increase it gradually until one finds a value that is optimal according to some measure of analysis quality. Multiplying C and Yb requires less than 2k2ℓ operations, while the number of operations needed to invert the k × k matrix is proportional to k3.
6. Compute the k × k matrix Wa = [(k − 1)P̃a]1/2, as in (24). Again the number of operations required is proportional to k3; it may be most efficient to compute the eigenvalues and eigenvectors of [(k − 1)I/ρ + CYb] in the previous step and then use them to compute both P̃a and Wa.
7. Compute the k-dimensional vector w̄a = P̃aC(yo − ȳb), as in (20), and add it to each column of Wa, forming a k × k matrix whose columns are the analysis vectors {wa(i)}. Computing the formula for w̄a from right to left, the total number of operations required for this step is less than 3k(ℓ + k).
8. Multiply Xb by each wa(i) and add x̄b to get the analysis ensemble members {xa(i)} at the analysis grid point, as in (25). This requires 2k2m operations.
9. After performing Steps 3–8 for each grid point, the outputs of Step 8 form the global analysis ensemble {xa(i)[g]}.
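Steps 4–8 can be collected into one local-analysis routine. The numpy sketch below uses the eigendecomposition shortcut suggested in Step 6 (computing both P̃a and Wa from the same eigenpairs) and the inflation factor of Step 5; all names and the column-oriented array layout are illustrative choices, not part of the text.

```python
import numpy as np

def letkf_local_analysis(Xb, xb_mean, Yb, yb_mean, yo, R, rho=1.0):
    """One local analysis (Steps 4-8). Shapes: Xb (m, k), Yb (l, k),
    yo and yb_mean (l,), R (l, l); rho >= 1 is the inflation factor."""
    k = Xb.shape[1]
    # Step 4: C = (Yb)^T R^{-1}, via the linear solve R C^T = Yb
    # (valid because R is symmetric).
    C = np.linalg.solve(R, Yb).T
    # Steps 5-6: one eigendecomposition yields both P~a and Wa.
    vals, vecs = np.linalg.eigh((k - 1) * np.eye(k) / rho + C @ Yb)
    Pa_tilde = vecs @ np.diag(1.0 / vals) @ vecs.T            # (21), inflated
    Wa = vecs @ np.diag(np.sqrt((k - 1) / vals)) @ vecs.T     # (24)
    # Step 7: mean weight vector, added to each column of Wa.
    wa_mean = Pa_tilde @ (C @ (yo - yb_mean))
    W = Wa + wa_mean[:, None]
    # Step 8: back to model space, as in (25); columns are the xa(i).
    return xb_mean[:, None] + Xb @ W
```

With ρ = 1 the resulting ensemble has sample mean x̄a and covariance Pa, per (22) and (23).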
We now summarize the overall computational complexity of the algorithm described above. If p is the number of local analyses performed (equal to the number of grid points in the most basic approach), then notice that pm = m[g], while pℓ̄ = qℓ[g], where ℓ̄ is the average number of observations used in a local analysis and q is the average number of local analyses in which a particular observation is used. If ℓ̄ is large compared to k and m, then the most computationally expensive step is either Step 5, requiring approximately 2k2pℓ̄ = 2k2qℓ[g] operations over all the local analyses, or Step 4, whose overall number of operations is proportional to kpℓ̄ = kqℓ[g], but with a proportionality constant dependent on the correlation structure of R[g]. In any case, as long as the typical number of correlated observations in a block of R[g] remains constant, the overall computation time grows at most linearly with the total number ℓ[g] of observations. It also grows at most linearly with the total number m[g] of model variables; if this is large enough compared to the number of observations, then the most expensive step is Step 8, with 2k2m[g] overall operations. The computation time tends to be roughly quadratic in the number k of ensemble members, though for a very large ensemble the terms of order k3 above would become significant.
Batch Processing of Observations. Some of the steps above have a q-fold redundancy, in that computations involving a given observation are repeated over an average of q different local analyses. For a general observation error covariance matrix R[g] this redundancy may be unavoidable, but it can be avoided as described below if the global observations can be partitioned into local groups (or “batches”) numbered 1, 2, . . . that meet the following conditions. First, all of the observations in a given batch must be used in exactly the same subset of the local analyses. Second, observations in different batches must have uncorrelated errors, so that each batch j corresponds to a block Rj in a block diagonal decomposition of R[g]. (These conditions can always be met if R[g] is diagonal, by making each batch consist of a single observation. However, as explained below, for efficiency one should make the batches as large as possible while still meeting the first condition.) Then at Step 3, instead of selecting (overlapping) submatrices of ȳb[g], Yb[g], yo[g], and R[g] for each grid point, let ȳbj, Ybj, and yoj represent the rows corresponding to the observations in batch j, and do the following for each batch. Compute and store the k × k matrix CjYbj and the k-dimensional vector Cj(yoj − ȳbj), where Cj = (Ybj)TR−1j as in Step 4. (This can be done separately for each batch, in parallel, and the total number of operations is roughly 2k2ℓ[g].) Then do Steps 5–8 separately for each local analysis; when CYb and C(yo − ȳb) are required in Steps 5 and 7, compute them by summing the corresponding arrays CjYbj and Cj(yoj − ȳbj) over the batches j of observations that are used in the local analysis. To avoid redundant addition in these steps, batches that are used in exactly the same subset of the local analyses should be combined into a single batch. The total number of operations required by the summations over batches is roughly k2ps, where s is the average number of batches used in each local analysis. Both this and the 2k2ℓ[g] operations described before are smaller than the roughly 2k2pℓ̄ = 2k2qℓ[g] operations they combine to replace.
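A sketch of the per-batch precomputation follows (illustrative names): each batch j contributes a k × k matrix and a k-vector, stored once, and each local analysis sums the contributions of the batches it uses in place of forming CYb and C(yo − ȳb) directly.

```python
import numpy as np

def batch_terms(Yb_j, R_j, yo_j, yb_mean_j):
    """Quantities computed once per batch j: Cj Ybj (k x k) and
    Cj (yoj - ybarj) (k,), where Cj = (Ybj)^T Rj^{-1}."""
    Cj = np.linalg.solve(R_j, Yb_j).T
    return Cj @ Yb_j, Cj @ (yo_j - yb_mean_j)

def combine_batches(terms):
    """Sum the stored batch terms used by one local analysis, giving
    the CYb and C(yo - ybar) needed in Steps 5 and 7."""
    CY = sum(t[0] for t in terms)
    cd = sum(t[1] for t in terms)
    return CY, cd
```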
This approach has similarities with the “sequential” approach of [15] and [38], in which observations are divided into uncorrelated batches and a separate analysis is done for each batch; the analysis is done in the observation space, whose dimension is the number of observations in a batch. However, in the sequential approach, the analysis ensemble for one batch of observations is used as the background ensemble for the next batch of observations. Since batches with disjoint local regions of influence can be analyzed separately, some parallelization is possible, though the LETKF approach described above is more easily distributed over a large number of processors. For a serial implementation, either approach may be faster depending on the application and the ensemble size.
4 Asynchronous Observations: 4D-LETKF
In theory, one can perform a new analysis each time new observations are made. In practice, this is a good approach if observations are made at regular and sufficiently infrequent time intervals. However, in many applications, such as weather forecasting, observations are much too frequent for this approach. Imagine, for example, a 6-hour interval between analyses, as at the National Weather Service. Since weather can change significantly over such a time interval, it is important to consider observations taken at intermediate times in a more sophisticated manner than to pretend they occurred at the analysis time (or to simply ignore them). Operational versions of 3D-VAR and 4D-VAR (see Section 2.2) do take into account the timing of the observations, and one of the primary strengths of 4D-VAR is that it does so in a precise manner, by considering which forecast model trajectory best fits the observations over a given time interval (together with assumed background statistics at the start of this interval).
We have seen that the analysis step in an ensemble Kalman filter considers model states that are linear combinations of the background ensemble states at the analysis time, and compares these model states to observations taken at the analysis time. Similarly, we can consider approximate model trajectories that are linear combinations of the background ensemble trajectories over an interval of time, and compare these approximate trajectories with the observations taken over that time interval. Instead of asking which model trajectory best fits the observations, we ask which linear combination of the background ensemble trajectories best fits the observations. As before, this is a relatively low-dimensional optimization problem that is much more computationally tractable than the full nonlinear problem.
This approach is similar to that of an ensemble Kalman smoother [7, 8], but over a much shorter time interval. As compared to a “filter”, which estimates the state of a system at time t using observations made up to time t, a “smoother” estimates the system state at time t using observations made before and after time t. Over a long time interval, one must generally take a more sophisticated approach to smoothing than simply to consider linear combinations of an ensemble of trajectories generated over the entire interval, both because the trajectories may diverge enough that linear combinations of them will not approximate model trajectories, and because in the presence of model error there may be no model trajectory that fits the observations over the entire interval. Over a short enough time interval, however, the approximation of true system trajectories by linear combinations of model trajectories with similar initial conditions is quite reasonable.
While this approach to assimilating asynchronous observations is suitable for any ensemble Kalman filter [16], it is particularly simple to implement in the LETKF framework; I will call this extension 4D-LETKF. To be specific, suppose that we have observations (τj, yoτj) taken at various times τj since the previous analysis. Let Hτj be the observation operator for time τj and let Rτj be the error covariance matrix for these observations. In Section 2.3, we mapped a vector w in the k-dimensional space S̃ into observation space using the formula ȳb + Ybw, where the background observation mean ȳb and perturbation matrix Yb were formed by applying the observation operator H to the background ensemble at the analysis time. So now, for each time τj, we apply Hτj to the background ensemble at time τj, calling the mean of the resulting vectors ȳbτj and forming their differences from the mean into the matrix Ybτj.
We now form a combined observation vector yo by concatenating (vertically) the (column) vectors yoτj; similarly, by vertical concatenation of the vectors ȳbτj and matrices Ybτj respectively, we form the combined background observation mean ȳb and perturbation matrix Yb. We form the corresponding error covariance matrix R as a block diagonal matrix with blocks Rτj (this assumes that observations taken at different times have uncorrelated errors, though such correlations, if present, could be included in R).
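The concatenation just described can be sketched as follows, taking a list of per-time tuples (yoτj, ȳbτj, Ybτj, Rτj); the list-of-tuples interface is my own illustrative choice.

```python
import numpy as np

def concat_asynchronous(obs_by_time):
    """Stack per-time observation pieces into the combined yo, ybar,
    Yb, and block-diagonal R used by 4D-LETKF.

    obs_by_time : list of tuples (yo_j, ybar_j, Yb_j, R_j).
    """
    yo = np.concatenate([t[0] for t in obs_by_time])
    ybar = np.concatenate([t[1] for t in obs_by_time])
    Yb = np.vstack([t[2] for t in obs_by_time])
    sizes = [t[3].shape[0] for t in obs_by_time]
    n = sum(sizes)
    R = np.zeros((n, n))   # block diagonal: cross-time errors uncorrelated
    start = 0
    for (_, _, _, R_j), nj in zip(obs_by_time, sizes):
        R[start:start + nj, start:start + nj] = R_j
        start += nj
    return yo, ybar, Yb, R
```

The combined arrays then feed directly into the analysis equations based on (19), with no other changes.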
Given this notation, we can then use the same analysis equations as in the previous sections, which are based on minimizing the cost function J̃∗ given by (19). (We could instead write down the appropriate analogue to (15) and minimize the resulting nonlinear cost function J̃; this would be no harder than in the case of synchronous observations.) Referring to Section 3, the only change is in Step 1, which one should perform for each observation time τj (using the background ensemble and observation operator for that time) and then concatenate the results as described above. Step 2 still only needs to be done at the analysis time, since its output is used only in Step 8 to form the analysis ensemble in model space. All of the intermediate steps work exactly the same, in terms of the output of Step 1.
In practice, the model will be integrated with a discrete time step that in general will not coincide with the observation times τj. One should either interpolate the background ensemble trajectories to the observation times, or simply round the observation times off to the nearest model integration time. In either case, one must either store the background ensemble trajectories until the analysis time, or perform Step 1 of Section 3 during the model integration and store its results. The latter approach will require less storage if the number of observations per model integration time step is less than the number of model variables.
5 Summary, Results, and Acknowledgments
In this article I have described a general framework for data assimilation in spatiotemporally chaotic systems using an ensemble Kalman filter that, in its basic version (Section 3), is relatively efficient and simple to implement. In a particular application, one may be able to improve accuracy by experimenting with different approaches to localization (see the discussion in Sections 2.2 and 2.3), covariance inflation (see the end of Section 2.3), and/or asynchronous observations (Section 4). For very large systems and/or when maximum efficiency is important, one should consider carefully the comments about implementation in Section 3 (and at the end of Section 4, if applicable). One can also apply this method to low-dimensional chaotic systems, without using localization.
Results using the LETKF approach will be reported elsewhere. The quality of these results is similar to other published results for square-root ensemble Kalman filters [38, 30, 34]. In particular, J. Harlim has obtained results [13] comparable to those in [38, 30] for a perfect model scenario, using a system with one spatial dimension proposed by E. Lorenz in 1995 [26, 27]. Also, E. Kostelich and I. Szunyogh have obtained preliminary results comparable to those in [34] using the LEKF approach of [29, 30] for the National Weather Service’s global forecast model, again in a perfect model scenario, with the LETKF approach running several times faster. Thus, this article does not describe a fundamentally new method for data assimilation, but rather a refinement of existing approaches that combines simplicity with the flexibility to adapt to a variety of applications.
I thank my collaborators for their ideas and encouragement. In particular, E. Kostelich, I. Szunyogh, and G. Gyarmati have put considerable effort into a parallel LETKF implementation for global forecast models. J. Harlim has performed a 4D-LETKF implementation and done extensive testing on several models. E. Fertig has tested the approach described here for a nonlinear observation operator, and is working with H. Li and J. Liu on applying our parallel LETKF implementation to the NASA global forecast model. M. Cornick has done an LETKF implementation for Rayleigh-Bénard convection. Using the LEKF of [29, 30], T. Miyoshi has tested the smoothed localization described in Section 2.3 [28], and T. Sauer helped develop and test the 4D extension described here [16]. E. Kalnay, E. Ott, D. Patil, and J. Yorke have provided numerous insights that form the basis for this work. I am grateful to all for their feedback, which has helped to refine the approach described in this paper.
References

[1] J. L. Anderson, An ensemble adjustment Kalman filter for data assimilation, Mon. Wea. Rev. 129, 2884–2903 (2001).

[2] J. L. Anderson & S. L. Anderson, A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts, Mon. Wea. Rev. 127, 2741–2758 (1999).

[3] C. H. Bishop, B. J. Etherton, S. J. Majumdar, Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects, Mon. Wea. Rev. 129, 420–436 (2001).

[4] G. Burgers, P. J. van Leeuwen, G. Evensen, Analysis scheme in the ensemble Kalman filter, Mon. Wea. Rev. 126, 1719–1724 (1998).

[5] A. Doucet, N. de Freitas, N. Gordon, eds., Sequential Monte Carlo Methods in Practice, Springer-Verlag (2001).

[6] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res. 99, 10143–10162 (1994).

[7] G. Evensen & P. J. van Leeuwen, An ensemble Kalman smoother for nonlinear dynamics, Mon. Wea. Rev. 128, 1852–1867 (2000).

[8] G. Evensen, The ensemble Kalman filter: theoretical formulation and practical implementation, Ocean Dynam. 53, 343–367 (2003).

[9] I. Fukumori, A partitioned Kalman filter and smoother, Mon. Wea. Rev. 130, 1370–1383 (2002).

[10] M. Ghil & P. Malanotte-Rizzoli, Data assimilation in meteorology and oceanography, Adv. Geophys. 33, 141–266 (1991).

[11] T. M. Hamill & C. Snyder, A hybrid ensemble Kalman filter–3D variational analysis scheme, Mon. Wea. Rev. 128, 2905–2919 (2000).

[12] T. M. Hamill, J. S. Whitaker, C. Snyder, Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter, Mon. Wea. Rev. 129, 2776–2790 (2001).

[13] J. Harlim & B. R. Hunt, A local ensemble transform Kalman filter: an efficient scheme for assimilating atmospheric data, preprint.

[14] P. L. Houtekamer & H. L. Mitchell, Data assimilation using an ensemble Kalman filter technique, Mon. Wea. Rev. 126, 796–811 (1998).
[15] P. L. Houtekamer & H. L. Mitchell, A sequential ensemble Kalman filter for atmospheric data assimilation, Mon. Wea. Rev. 129, 123–137 (2001).

[16] B. R. Hunt, E. Kalnay, E. J. Kostelich, E. Ott, D. J. Patil, T. Sauer, I. Szunyogh, J. A. Yorke, A. V. Zimin, Four-dimensional ensemble Kalman filtering, Tellus A 56, 273–277 (2004).

[17] K. Ide, P. Courtier, M. Ghil, A. C. Lorenc, Unified notation for data assimilation: operational, sequential, and variational, J. Meteo. Soc. Japan 75, 181–189 (1997).

[18] A. H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press (1970).

[19] R. E. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME, Ser. D: J. Basic Eng. 82, 35–45 (1960).

[20] R. E. Kalman & R. S. Bucy, New results in linear filtering and prediction theory, Trans. ASME, Ser. D: J. Basic Eng. 83, 95–108 (1961).

[21] E. Kalnay, Atmospheric Modeling, Data Assimilation and Predictability, Cambridge Univ. Press (2002).

[22] E. Kalnay & Z. Toth, Removing growing errors in the analysis, Proc. of the 10th Conf. on Numerical Weather Prediction, Amer. Meteo. Soc., Portland, Oregon (1994).

[23] C. L. Keppenne, Data assimilation into a primitive-equation model with a parallel ensemble Kalman filter, Mon. Wea. Rev. 128, 1971–1981 (2000).

[24] F.-X. Le Dimet & O. Talagrand, Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects, Tellus A 38, 97–110 (1986).

[25] A. C. Lorenc, A global three-dimensional multivariate statistical interpolation scheme, Mon. Wea. Rev. 109, 701–721 (1981).

[26] E. N. Lorenz, Predictability: a problem partly solved, in Proc. of the ECMWF Seminar on Predictability, vol. 1, Reading, UK (1996).

[27] E. N. Lorenz & K. A. Emanuel, Optimal sites for supplementary weather observations: simulation with a small model, J. Atmos. Sci. 55, 399–414 (1998).

[28] T. Miyoshi, Ensemble Kalman filter experiments with a primitive-equation global model, Ph.D. dissertation, University of Maryland (2005).
[29] E. Ott, B. R. Hunt, I. Szunyogh, M. Corazza, E. Kalnay, D. J. Patil, J. A. Yorke, A. V. Zimin, E. J. Kostelich, Exploiting local low dimensionality of the atmospheric dynamics for efficient ensemble Kalman filtering, arXiv:physics/0203058v3 (2002).

[30] E. Ott, B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. J. Patil, J. A. Yorke, A local ensemble Kalman filter for atmospheric data assimilation, Tellus A 56, 415–428 (2004).

[31] D. F. Parrish & J. C. Derber, The National Meteorological Center’s spectral statistical-interpolation analysis system, Mon. Wea. Rev. 120, 1747–1763 (1992).

[32] D. J. Patil, B. R. Hunt, E. Kalnay, J. A. Yorke, E. Ott, Local low dimensionality of atmospheric dynamics, Phys. Rev. Lett. 86, 5878–5881 (2001).

[33] L. M. Pecora & T. L. Carroll, Synchronization in chaotic systems, Phys. Rev. Lett. 64, 821–824 (1990).

[34] I. Szunyogh, E. J. Kostelich, G. Gyarmati, D. J. Patil, B. R. Hunt, E. Kalnay, E. Ott, J. A. Yorke, Assessing a local ensemble Kalman filter: perfect model experiments with the National Centers for Environmental Prediction global model, Tellus A 57, 528–545 (2005).

[35] O. Talagrand & P. Courtier, Variational assimilation of meteorological observations with the adjoint vorticity equation I: theory, Quart. J. Roy. Meteo. Soc. 113, 1311–1328 (1987).

[36] M. K. Tippett, J. L. Anderson, C. H. Bishop, T. M. Hamill, J. S. Whitaker, Ensemble square-root filters, Mon. Wea. Rev. 131, 1485–1490 (2003).

[37] X. Wang, C. H. Bishop, S. J. Julier, Which is better, an ensemble of positive-negative pairs or a centered spherical simplex ensemble?, Mon. Wea. Rev. 132, 1590–1605 (2004).

[38] J. S. Whitaker, T. M. Hamill, Ensemble data assimilation without perturbed observations, Mon. Wea. Rev. 130, 1913–1924 (2002).

[39] F. Zhang, C. Snyder, J. Sun, Impacts of initial estimate and observation availability on convective-scale data assimilation with an ensemble Kalman filter, Mon. Wea. Rev. 132, 1238–1253 (2004).

[40] M. Zupanski, Maximum likelihood ensemble filter: theoretical aspects, Mon. Wea. Rev. 133, 1710–1726 (2005).