EnKF Localization Techniques and Balance


Steven Greybush

Eugenia Kalnay, Kayo Ide, Takemasa Miyoshi, and Brian Hunt

Weather Chaos Meeting – September 21, 2009

Data Assimilation Equation

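Scalar form:

$x_a = x_b + w\,(y_o - h(x_b)), \qquad w = \dfrac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}$

Matrix form:

$\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\,(\mathbf{y}_o - H(\mathbf{x}_b)), \qquad \mathbf{K} = \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R})^{-1}$

The analysis is equal to the background plus a weighted sum of observation increments (observation minus background).

B = background error covariance matrix

R = observation error covariance matrix

$\mathbf{H}\mathbf{B}\mathbf{H}^T$ = background error covariance in observation space

$\mathbf{B}\mathbf{H}^T$ = background error covariance between model variables and observed variables

In an ensemble filter these are estimated from the ensemble: $\mathbf{B}\mathbf{H}^T \approx \mathbf{X}(\mathbf{H}\mathbf{X})^T/(p-1)$ and $\mathbf{H}\mathbf{B}\mathbf{H}^T \approx (\mathbf{H}\mathbf{X})(\mathbf{H}\mathbf{X})^T/(p-1)$, where

X = ensemble perturbation matrix

p = number of ensemble members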
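The update above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a single observation of one model variable (so h is the identity in the scalar case) and estimates the covariance entries from a small hypothetical ensemble with divisor p - 1. The function names are hypothetical.

```python
from statistics import mean


def scalar_analysis(xb, yo, var_b, var_o):
    """Scalar form xa = xb + w * (yo - h(xb)), taking h as the identity (assumption)."""
    w = var_b / (var_b + var_o)
    return xb + w * (yo - xb), w


def sample_cov(a, b):
    """Sample covariance of two equal-length lists of ensemble values (divisor p - 1)."""
    p = len(a)
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (p - 1)


def ensemble_gain(state_values, obs_values, var_o):
    """Gain for one model variable from one observation, with covariances from the ensemble.

    state_values: the model variable in each of the p members
    obs_values:   the observed quantity h(x) in each of the p members
    """
    bht = sample_cov(state_values, obs_values)   # entry of B H^T
    hbht = sample_cov(obs_values, obs_values)    # H B H^T (a scalar for one observation)
    return bht / (hbht + var_o)


xa, w = scalar_analysis(xb=10.0, yo=12.0, var_b=1.0, var_o=1.0)
print(xa, w)  # 11.0 0.5

# Hypothetical 5-member ensemble: an unobserved variable that covaries with the observed one.
k = ensemble_gain([2.0, 4.0, 6.0, 8.0, 10.0], [1.0, 2.0, 3.0, 4.0, 5.0], var_o=0.5)
print(k)
```

Equal background and observation error variances give w = 0.5, so the analysis lands halfway between background and observation.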

Motivation for Localization

•  Distance-dependent assumption (reasoning empirically):

•  (typically large) covariances between nearby locations are physically valid

•  whereas (typically small) covariances between locations that are far away are more noise than signal, and thus spurious. (Hamill et al., 2001)

Covariance Localization

•  A modification of the covariance matrices in the Kalman gain formula that reduces the influence of distant regions. (Houtekamer and Mitchell, 2001)

•  Removes spurious long-distance correlations caused by sampling error when the model covariance is estimated from a finite ensemble. (Anderson, 2007)

•  Takes advantage of the atmosphere’s lower dimensionality in local regions. (Hunt et al., 2007)

•  Ultimately creates a more accurate analysis (reduces RMSE) – it is a practical necessity.

The Notion of “Balance”

•  An atmospheric state that approximately follows physical balance equations appropriate to the scale and location

•  A forecast started from a balanced state will not have spurious oscillations in time.

•  Example: geostrophic balance between wind and mass (temperature / height) field

Geostrophic Adjustment

[Figure: temperature (mass field) and wind magnitude (wind field) at T = 0, 2, 4, 6, and 8 hours, with gravity waves indicated.]

Coping with Imbalance

•  Richardson’s failed forecast

•  Simplified models

•  Initialization step

•  Penalty methods

Balance vs. Accuracy

•  Observations are noisy, and hence unbalanced. Therefore an analysis that fits the observations too closely (i.e., is "accurate" with respect to them) will not be balanced.

•  Additionally, data assimilation techniques can introduce imbalance.

Covariance Localization

•  Accomplished by taking a Schur (element-wise) product between the model covariance matrix and a matrix whose elements depend on the distance between the corresponding grid points:

$\mathbf{B}_{loc} = \mathbf{B} \circ \boldsymbol{\rho}, \qquad \rho_{ij} = \exp\left(-\dfrac{(r_i - r_j)^2}{2L^2}\right)$, where L is the localization distance.

(Hamill et al., 2001)
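A minimal sketch of Gaussian B localization on a hypothetical 1-D grid: the ensemble-estimated covariance is multiplied element-wise by the distance-dependent taper. Grid positions, ensemble size, and function names are illustrative assumptions.

```python
import math
import random
from statistics import mean


def sample_covariance_matrix(ensemble):
    """B estimated from ensemble[k][i] (member k, grid point i), with divisor p - 1."""
    p, n = len(ensemble), len(ensemble[0])
    means = [mean(member[i] for member in ensemble) for i in range(n)]
    return [[sum((m[i] - means[i]) * (m[j] - means[j]) for m in ensemble) / (p - 1)
             for j in range(n)] for i in range(n)]


def gaussian_localize(B, positions, L):
    """Schur (element-wise) product: B_loc[i][j] = B[i][j] * exp(-(r_i - r_j)^2 / (2 L^2))."""
    n = len(B)
    return [[B[i][j] * math.exp(-((positions[i] - positions[j]) ** 2) / (2.0 * L ** 2))
             for j in range(n)] for i in range(n)]


rng = random.Random(0)
positions = [0.0, 250.0, 500.0, 750.0, 1000.0]   # km, hypothetical 1-D grid
ensemble = [[rng.gauss(0.0, 1.0) for _ in positions] for _ in range(5)]  # p = 5 members
B = sample_covariance_matrix(ensemble)
B_loc = gaussian_localize(B, positions, L=500.0)
```

Because the taper equals 1 at zero distance, the variances on the diagonal are unchanged, while the covariance between points 1000 km apart is scaled by exp(-2) ≈ 0.135 when L = 500 km.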

Problems with Localization

•  Lorenc (2003) and Kepert (2006) argue that localization reduces the balance information encoded in the model covariance matrix.

•  Houtekamer and Mitchell (2005) noted balance problems when applying a localized EnKF to the Canadian GCM.

•  Imbalanced analyses project information onto inertial-gravity waves, which are filtered out (geostrophic adjustment, digital filtering, etc.), resulting in a loss of information and a suboptimal analysis.

Localization Methods

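•  B Localization - Error covariances between model grid points are tapered toward zero as the distance between the points increases.

$\mathbf{B}_{loc} = \mathbf{B} \circ \boldsymbol{\rho}, \qquad \rho_{ij} = \exp\left(-\dfrac{(r_i - r_j)^2}{2L^2}\right)$

•  R Localization - The error variance of an observation is inflated toward infinity as its distance d from the analyzed grid point increases.

$R_{loc} = R \exp\left(+\dfrac{d^2}{2L^2}\right)$

R localization can be used with the LETKF. (Hunt 2005; Miyoshi 2005)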
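R localization can be sketched for the local analysis at one grid point: each observation's error variance is inflated by exp(+d²/2L²), so its precision 1/R_loc (the weight it receives in the local analysis) decays like the B-localization taper. The helper name, the 1-D geometry, and the overflow cutoff are assumptions for illustration.

```python
import math


def r_localized_variances(grid_pos, obs_positions, obs_error_vars, L):
    """R localization at one grid point: R_loc,k = R_k * exp(+d_k^2 / (2 L^2)),
    where d_k is the distance from the grid point to observation k.
    """
    out = []
    for s, r in zip(obs_positions, obs_error_vars):
        exponent = (grid_pos - s) ** 2 / (2.0 * L ** 2)
        # Avoid math.exp overflow: very distant observations get infinite error variance.
        out.append(r * math.exp(exponent) if exponent < 700.0 else math.inf)
    return out


# Hypothetical grid point at 0 km, observations at 0, 500, 1000, and 2000 km, R = 1, L = 500 km.
r_loc = r_localized_variances(0.0, [0.0, 500.0, 1000.0, 2000.0], [1.0] * 4, L=500.0)
precisions = [1.0 / r for r in r_loc]   # equals exp(-d^2 / 2L^2) here, since R = 1
print(precisions)
```

Working with the precisions shows the link to B localization: with R = 1 they reproduce the Gaussian taper exp(-d²/2L²), and an observation with infinite error variance simply contributes zero weight.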

Research Questions

•  How does localization introduce imbalance into an analysis? Can it be avoided?

•  How do the analyses produced by B-localization and R-localization EnKF compare in terms of accuracy (RMSE) and (geostrophic) balance?

Part I: Simple Model

The starting point is the shallow water equations in a rotating, inviscid fluid.

•  Consider variation only along the x-axis.

•  The variables of interest are thus h and v.

•  Linearize the equations, and apply a harmonic form to the solution.

The geostrophic balance is thus:

$f v = g \dfrac{\partial h}{\partial x}$, where f is the Coriolis parameter and g is the gravitational acceleration.

Substituting into the governing equations and assuming geostrophic balance yields wave solutions for h and v.
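To make the balance constraint concrete, here is a hedged sketch that builds a geostrophically balanced harmonic wave and measures the residual f v - g ∂h/∂x. It assumes h = A sin(kx), so balance gives v = (g k A / f) cos(kx); the constants f and g, the amplitude, and the 25 km grid spacing are illustrative choices (only the 2000 km wavelength comes from the slides).

```python
import math

G = 9.81      # gravitational acceleration (m s^-2)
F = 1.0e-4    # Coriolis parameter (s^-1), hypothetical f-plane value


def balanced_wave(x, amplitude, wavelength):
    """h = A sin(kx) and the geostrophically balanced v = (g k A / f) cos(kx)."""
    k = 2.0 * math.pi / wavelength
    h = [amplitude * math.sin(k * xi) for xi in x]
    v = [G * k * amplitude / F * math.cos(k * xi) for xi in x]
    return h, v


def rms_imbalance(h, v, dx):
    """RMS of the geostrophic residual f v - g dh/dx (centred differences, periodic domain)."""
    n = len(h)
    residual = [F * v[i] - G * (h[(i + 1) % n] - h[i - 1]) / (2.0 * dx) for i in range(n)]
    return math.sqrt(sum(r * r for r in residual) / n)


dx = 2.5e4                                         # 25 km grid spacing (hypothetical)
wavelength = 2.0e6                                 # 2000 km
x = [i * dx for i in range(int(wavelength / dx))]  # exactly one wavelength, periodic
h, v = balanced_wave(x, amplitude=10.0, wavelength=wavelength)
scale = math.sqrt(sum((F * vi) ** 2 for vi in v) / len(v))  # RMS of the Coriolis term
print(rms_imbalance(h, v, dx) / scale)             # small: only truncation error remains
```

Halving v while keeping h fixed breaks the balance, and the relative imbalance jumps to roughly 0.5.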

Experimental Design (L = 500 km)

Analysis Increments

[Figure: analysis increments; circles are observations.]

Analysis Error

[Figure: analysis error; circles are observations.] Here, RMS error: B localization ≈ R localization < no localization (L = 500 km).

Analysis Imbalance

[Figure: analysis imbalance; black circles are observation locations.] Imbalance: B localization >> R localization > no localization (L = 500 km).

Covariance Localization; Varying the Localization Distance L

• Wavelength (W) = 2000 km

• Distance between obs (D) = 250 km

• Number of Ensemble Members (p) = 5

[Figure: RMS error (left) and RMS imbalance (right) versus localization distance L (km).]

• Results are the mean over 100 random simulations.

• LETKF is used for R-localization to avoid undesirable statistical properties (an asymmetric B matrix). Its results are very similar to the EnKF's, so the comparison is fair.

Simple Model Conclusions

•  Both types of localization do introduce imbalance into analysis increments, especially for short localization distances.

•  R localization is more balanced than B localization for the same L, but is slightly less accurate.

•  The two methods have differing optimal localization length scales (L).

Why Does Localization Produce Imbalance?

L = 250 km

Example: Apply a Gaussian localization function, based on the distance from the origin, to h and v waveforms.

[Figure: original (solid) and localized (dashed) waveforms; imbalance of the original and localized waveforms.]

Example adapted from Lorenc 2003.

Analogy:

• Assimilate height observation at origin.

• Waveforms are proportional to analysis increments.

• The example considers a modification of K, regardless of whether it comes from modifying B or R.
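The reason is visible from the product rule. If h and v are balanced (f v = g ∂h/∂x) and both are multiplied by a taper ρ(x), then g ∂(ρh)/∂x - f ρ v = g h ∂ρ/∂x, which is nonzero wherever h and ∂ρ/∂x are. A minimal numerical check, assuming a cosine increment centered on an observation at the origin and hypothetical f-plane constants:

```python
import math

G, F = 9.81, 1.0e-4  # hypothetical f-plane constants (m s^-2, s^-1)


def geostrophic_residual(h, v, dx):
    """f v - g dh/dx at interior points, using centred differences."""
    return [F * v[i] - G * (h[i + 1] - h[i - 1]) / (2.0 * dx) for i in range(1, len(h) - 1)]


def rms(values):
    return math.sqrt(sum(r * r for r in values) / len(values))


dx = 1.0e4                                   # 10 km grid spacing (hypothetical)
x = [i * dx for i in range(-200, 201)]       # -2000 km ... 2000 km, observation at the origin
k = 2.0 * math.pi / 2.0e6                    # 2000 km wavelength
A = 1.0
h = [A * math.cos(k * xi) for xi in x]                # height increment peaked at the origin
v = [-G * k * A / F * math.sin(k * xi) for xi in x]   # balanced: f v = g dh/dx

L = 2.5e5                                    # localization distance L = 250 km
rho = [math.exp(-xi ** 2 / (2.0 * L ** 2)) for xi in x]
h_loc = [r * hi for r, hi in zip(rho, h)]
v_loc = [r * vi for r, vi in zip(rho, v)]

original = rms(geostrophic_residual(h, v, dx))        # only finite-difference error
localized = rms(geostrophic_residual(h_loc, v_loc, dx))
print(original, localized)
```

The localized waveforms carry an imbalance several orders of magnitude larger than the truncation error of the original balanced pair.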

Why do the optimum length scales differ?

•  Two grid points, observation at grid point 1.

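•  K for grid point 2:

–  B localization: $K_2 = f_{Bloc}(d_{12})\,B_{12}\,(B_{11} + R_1)^{-1}$

–  R localization: $K_2 = B_{12}\,(B_{11} + f_{Rloc}(d_{12})\,R_1)^{-1} = f_{Bloc}(d_{12})\,B_{12}\,(f_{Bloc}(d_{12})\,B_{11} + R_1)^{-1}$

where $f_{Bloc}(d_{ij}) = \exp\left(-\dfrac{d_{ij}^2}{2L^2}\right)$ and $f_{Rloc}(d_{ij}) = \exp\left(+\dfrac{d_{ij}^2}{2L^2}\right) = 1/f_{Bloc}(d_{ij})$.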
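The two gains can be compared numerically. Because f_Bloc(d12) ≤ 1, the R-localization denominator f_Bloc B11 + R1 is no larger than B11 + R1, so for the same L, R localization damps the influence on grid point 2 less than B localization does; one reading is that R localization therefore needs a shorter L to reach a comparable taper. The values of B11, B12, R1, and d12 below are hypothetical.

```python
import math


def k2_b_localization(B11, B12, R1, d12, L):
    """K2 = f_Bloc(d12) * B12 / (B11 + R1)."""
    f_bloc = math.exp(-d12 ** 2 / (2.0 * L ** 2))
    return f_bloc * B12 / (B11 + R1)


def k2_r_localization(B11, B12, R1, d12, L):
    """K2 = B12 / (B11 + f_Rloc(d12) * R1), with f_Rloc = exp(+d^2 / 2L^2)."""
    f_rloc = math.exp(d12 ** 2 / (2.0 * L ** 2))  # overflows only for d12 / L above ~37
    return B12 / (B11 + f_rloc * R1)


# Hypothetical two-point example: d12 = 500 km, varying the localization distance L (km).
for L in (250.0, 500.0, 1000.0):
    b = k2_b_localization(1.0, 0.8, 1.0, 500.0, L)
    r = k2_r_localization(1.0, 0.8, 1.0, 500.0, L)
    print(L, round(b, 4), round(r, 4))
```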

Measuring Balance in Full Model

•  Background can no longer be considered to be balanced.

•  Natural imbalance vs. imbalance induced by data assimilation

Methods:

•  Magnitude of the ageostrophic wind

•  2nd derivative of surface pressure

•  Difference between the original analysis and the initialized analysis (with digital filter)
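Two of these metrics can be sketched for the 1-D setting of the simple model. This assumes the ageostrophic wind is measured as v minus the geostrophic wind (g/f) ∂h/∂x, and that the second derivative of surface pressure is taken in time with a centered difference; the digital-filter metric is not shown. Constants and function names are hypothetical.

```python
G, F = 9.81, 1.0e-4  # hypothetical f-plane constants (m s^-2, s^-1)


def mean_ageostrophic_v(h, v, dx):
    """Mean |v - v_g| over interior points, with v_g = (g / f) dh/dx from centred differences."""
    vals = [abs(v[i] - G / F * (h[i + 1] - h[i - 1]) / (2.0 * dx)) for i in range(1, len(h) - 1)]
    return sum(vals) / len(vals)


def ps_second_time_derivative(ps_prev, ps_now, ps_next, dt):
    """Centred estimate of d^2 ps / dt^2 at each grid point from three consecutive snapshots."""
    return [(a - 2.0 * b + c) / dt ** 2 for a, b, c in zip(ps_prev, ps_now, ps_next)]


# Hypothetical snapshots (hPa) one time unit apart: only the first point oscillates.
print(ps_second_time_derivative([1000.0, 1000.0], [1001.0, 1000.0], [1000.0, 1000.0], 1.0))
```

Large values of either quantity flag imbalance; a point whose pressure rises and falls back within two steps shows up as a large second derivative.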

SPEEDY Model

•  Simplified Parametrizations, primitivE-Equation DYnamics (SPEEDY)

•  Atmospheric general circulation model

•  Seven vertical levels using the sigma coordinate system

•  Horizontal spectral resolution of T30, which corresponds to a standard 96x48 Gaussian grid

•  Leapfrog time step

•  Five dynamical variables are included in the output: zonal wind (u), meridional wind (v), temperature (T), specific humidity (q), and surface pressure (ps).

SPEEDY Evaluation Metrics

•  Compare EnSRF B-localization vs. LETKF R-localization for accuracy and balance at level 4 (~500 hPa) in the mid-latitudes of both hemispheres.

•  Observing system: rawinsonde distribution, with observations located on grid points

•  Experiment length: 2 months (Feb. and Mar.)

SPEEDY Results: Southern Hemisphere

Dark solid line = nature run; dotted line = free run.

SPEEDY Results: Northern Hemisphere

Dark solid line = nature run; dotted line = free run.

SPEEDY Results

Results are averaged from Feb. 20 to Mar. 20, after spin-up.

SPEEDY Conclusions

•  Error is greater in the Southern Hemisphere (fewer observations) than in the Northern Hemisphere.

•  Imbalance is greater in the Northern Hemisphere (presence of the Tibetan Plateau; seasonal dependence).

•  The optimal localization scale for LETKF R-localization is shorter (~300-500 km) than for EnKF B-localization (~500-750 km). This agrees with the earlier simple-model results.

•  Both localization methods introduce similar imbalance when the optimal length scale of each technique is considered.

•  Additional balance metrics must be evaluated.

Advanced Localization Methods

•  Adaptive localization (does not require the distance-dependent assumption)

–  Anderson: hierarchical filter

–  Bishop and Hodyss: raising correlations to a power

–  Miyoshi

•  Variable transformations

–  Kepert: transform variables to streamfunction and velocity potential

•  Variable localization

–  Kang et al.

B Localization vs. K Localization

•  B cannot be expressed explicitly for high-dimensional systems. Therefore, $\mathbf{B}\mathbf{H}^T$ and $\mathbf{H}\mathbf{B}\mathbf{H}^T$ are determined directly from the ensemble, using

$\mathbf{Y} = \mathbf{H}\mathbf{X}$

Kalman gain: $\mathbf{K} = \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R})^{-1}$

In K-localization, elements of K are localized based on the distance between the observation and the model grid point. It is thus a hybrid of B-localization and R-localization, and in practice it is used in place of B-localization in several EnKF variants.

K localization reduces to B localization for EnSRF if observations are located on grid points.
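A sketch of why the two coincide for a single observation located on a grid point, as in a serial EnSRF that assimilates observations one at a time (an assumption about the setup): with H selecting grid point o, B localization gives K_j = ρ(d_jo) B_jo / (ρ(0) B_oo + R), and since ρ(0) = 1 this equals the K-localized gain ρ(d_jo) B_jo / (B_oo + R). The covariance values and helper names are hypothetical.

```python
import math


def gaussian(d, L):
    """Gaussian taper exp(-d^2 / 2L^2); equals 1 at d = 0."""
    return math.exp(-d * d / (2.0 * L * L))


def k_localized_gain(B, positions, o, R, L):
    """Unlocalized gain K_j = B[j][o] / (B[o][o] + R), then each K_j tapered by obs-to-point distance."""
    return [gaussian(positions[j] - positions[o], L) * B[j][o] / (B[o][o] + R)
            for j in range(len(B))]


def b_localized_gain(B, positions, o, R, L):
    """Localize B first (Schur product), then K = B_loc H^T (H B_loc H^T + R)^-1 for H selecting o."""
    b_loc_col = [gaussian(positions[j] - positions[o], L) * B[j][o] for j in range(len(B))]
    b_loc_oo = gaussian(0.0, L) * B[o][o]   # taper is 1 at zero distance
    return [b / (b_loc_oo + R) for b in b_loc_col]


B = [[1.0, 0.6, 0.2], [0.6, 1.0, 0.6], [0.2, 0.6, 1.0]]  # hypothetical covariance
positions = [0.0, 250.0, 500.0]                          # km
kk = k_localized_gain(B, positions, o=0, R=0.5, L=500.0)
kb = b_localized_gain(B, positions, o=0, R=0.5, L=500.0)
print(kk, kb)
```

If the observation lay between grid points, H B_loc H^T would mix tapered entries and the two gains would generally differ.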

Discussion Questions

•  How does one avoid / mitigate / cope with assimilation-induced imbalance?

•  Will adaptive localization schemes improve accuracy and balance? Are they practical?

•  Should an initialization step (digital filters) be used with assimilation?

•  Are there other ways of encouraging balance within the LETKF?
