EnKF Localization Techniques and Balance
Steven Greybush
Eugenia Kalnay, Kayo Ide, Takemasa Miyoshi, and Brian Hunt
Weather Chaos Meeting – September 21, 2009
Data Assimilation Equation
The analysis is equal to the background plus a weighted sum of observation increments.
Scalar form: $x_a = x_b + w\,(y_o - h(x_b))$, $w = \sigma_b^2 / (\sigma_b^2 + \sigma_o^2)$
Matrix form: $x_a = x_b + K\,(y_o - H(x_b))$, $K = BH^T (HBH^T + R)^{-1}$
$B$ = background error covariance matrix
$R$ = observation error covariance matrix
$HBH^T$ = background error covariance in observation space
$BH^T$ = background error covariance between model variables and observed variables
$X$ = ensemble perturbation matrix
$p$ = number of ensemble members
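The scalar and matrix updates can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the experiments in this talk: the function names and the numerical values are hypothetical, the scalar observation operator h is taken to be the identity, and H is assumed linear.

```python
import numpy as np

def scalar_analysis(xb, yo, var_b, var_o):
    """Scalar update xa = xb + w * (yo - xb), with h assumed to be the identity."""
    w = var_b / (var_b + var_o)          # weight from background and observation variances
    return xb + w * (yo - xb)

def matrix_analysis(xb, yo, B, R, H):
    """Matrix update with a linear observation operator H (an assumption)."""
    S = H @ B @ H.T + R                  # HBH^T + R
    K = B @ H.T @ np.linalg.inv(S)       # Kalman gain K = BH^T (HBH^T + R)^-1
    return xb + K @ (yo - H @ xb), K

# Illustrative numbers only: equal background and observation variances.
xa_scalar = scalar_analysis(xb=10.0, yo=12.0, var_b=1.0, var_o=1.0)

# Two model variables, one observation of the first variable.
B = np.array([[1.0, 0.5], [0.5, 1.0]])
R = np.array([[1.0]])
H = np.array([[1.0, 0.0]])
xa, K = matrix_analysis(np.array([10.0, 5.0]), np.array([12.0]), B, R, H)
```

With equal variances the scalar analysis lands halfway between background and observation, and the unobserved second variable is corrected only through the off-diagonal element of B.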
Motivation for Localization
• Distance-dependent assumption (reasoning empirically):
• (typically large) covariances between nearby locations are physically valid
• whereas (typically small) covariances between locations that are far away are more noise than signal, and thus spurious. (Hamill et al., 2001)
Covariance Localization
• A modification of the covariance matrices in the Kalman gain formula that reduces the influence of distant regions. (Houtekamer and Mitchell, 2001)
• Removes spurious long distance correlations due to sampling error of the model covariance from finite ensemble size. (Anderson, 2007)
• Takes advantage of the atmosphere’s lower dimensionality in local regions. (Hunt et al., 2007)
• Ultimately creates a more accurate analysis (reduces RMSE) – it is a practical necessity.
The Notion of “Balance”
• An atmospheric state that approximately follows physical balance equations appropriate to the scale and location
• Forecast will not have spurious time oscillations.
• Example: geostrophic balance between wind and mass (temperature / height) field
Geostrophic Adjustment
[Figure sequence: temperature (mass field) and wind magnitude (wind field) at T = 0, 2, 4, 6, and 8 hours.]
Gravity Waves
Coping with Imbalance
• Richardson’s failed forecast
• Simplified models
• Initialization step
• Penalty methods
Balance vs. Accuracy
• Observations are noisy, and hence unbalanced. Therefore an analysis that fits the observations too closely (accurate) is not balanced.
• Additionally, data assimilation techniques can introduce imbalance.
Covariance Localization
• Accomplished by taking a Schur product between the model covariance matrix and a matrix whose elements are dependent upon the distance between the corresponding grid points:
$(B_{loc})_{ij} = B_{ij} \exp(-(r_i - r_j)^2 / (2L^2))$, where $L$ is the localization distance.
(Hamill et al., 2001)
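The Schur (element-wise) product can be sketched as follows. This is a hedged illustration on a hypothetical 1-D grid: the grid spacing, ensemble size, random perturbations, and the helper name `gaussian_taper` are assumptions made for the example, not values from the experiments.

```python
import numpy as np

def gaussian_taper(r, L):
    """Matrix of exp(-(r_i - r_j)^2 / (2 L^2)) for grid positions r (same units as L)."""
    d = r[:, None] - r[None, :]          # pairwise separations via broadcasting
    return np.exp(-d**2 / (2.0 * L**2))

rng = np.random.default_rng(0)
r = np.arange(0.0, 4000.0, 250.0)        # hypothetical grid positions in km
p = 5                                    # small ensemble, so B is noisy

# Sample covariance from random ensemble perturbations (illustrative only).
X = rng.standard_normal((r.size, p))
X -= X.mean(axis=1, keepdims=True)       # remove the ensemble mean
B = X @ X.T / (p - 1)

# Schur product: variances are untouched, distant covariances are damped.
B_loc = B * gaussian_taper(r, L=500.0)
```

The diagonal of `B_loc` equals that of `B`, while the covariance between the two ends of the grid is driven to nearly zero.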
Problems with Localization
• Lorenc (2003) and Kepert (2006) argue that localization reduces the balance information encoded in the model covariance matrix.
• Houtekamer and Mitchell (2005) noted balance problems when applying a localized EnKF to the Canadian GCM.
• Imbalanced analyses project information onto inertial-gravity waves, which are filtered out (geostrophic adjustment, digital filtering, etc.), resulting in a loss of information and a suboptimal analysis.
Localization Methods
• B Localization - Model grid points that are far apart have zero error covariance.
$(B_{loc})_{ij} = B_{ij} \exp(-(r_i - r_j)^2 / (2L^2))$
• R Localization - Observations that are far away from a grid point have infinite error covariance.
$R_{loc} = R \exp(+d^2 / (2L^2))$
R localization can be used with LETKF. (Hunt 2005, Miyoshi 2005)
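R localization can be illustrated by inflating the observation error variance with distance from the grid point being analyzed. The sketch below assumes a diagonal R (uncorrelated observation errors); the function name `localized_obs_variance` and all numbers are hypothetical.

```python
import numpy as np

def localized_obs_variance(obs_var, d, L):
    """Inflate observation error variance by exp(+d^2 / (2 L^2)).

    obs_var : observation error variances (diagonal of R, an assumption)
    d       : distance of each observation from the analysis grid point
    L       : localization distance, same units as d
    """
    return obs_var * np.exp(d**2 / (2.0 * L**2))

d = np.array([0.0, 250.0, 500.0, 1000.0, 2000.0])   # km, illustrative
R_diag = np.full(d.size, 1.0)
R_loc_diag = localized_obs_variance(R_diag, d, L=500.0)
```

An observation at the grid point keeps its original variance, while one several localization distances away is given such a large variance that it has almost no weight in the analysis.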
Research Questions
• How does localization introduce imbalance into an analysis? Can it be avoided?
• How do the analyses produced by B-localization and R-localization EnKF compare in terms of accuracy (RMSE) and (geostrophic) balance?
Part I: Simple Model
The shallow water equations in a rotating, inviscid fluid:
• Consider variation only along the x-axis.
• The variables of interest are thus h and v.
• Linearize the equations, and apply a harmonic form to the solution:
The geostrophic balance is thus:
Substituting into the governing equations, and assuming geostrophic balance, yields the following solutions for h and v:
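For reference, a standard textbook form of these relations is given below. It is an assumption about what the simple model looks like, not a transcription of the slide: the symbols $u$ (wind along x), $f$ (Coriolis parameter), $g$ (gravity), $H$ (mean depth), $h_0$ (amplitude), and $k$ (wavenumber) are introduced here for illustration, and the phase of the harmonic solution is arbitrary.

```latex
% Linearized shallow water equations with variation along x only
\begin{align}
\frac{\partial u}{\partial t} - f v &= -g \frac{\partial h}{\partial x}, &
\frac{\partial v}{\partial t} + f u &= 0, &
\frac{\partial h}{\partial t} + H \frac{\partial u}{\partial x} &= 0
\end{align}

% Geostrophic balance (steady state, u = 0)
\begin{equation}
f v = g \frac{\partial h}{\partial x}
\end{equation}

% A balanced harmonic solution
\begin{align}
h &= h_0 \sin(kx), &
v &= \frac{g k h_0}{f} \cos(kx)
\end{align}
```

In this form the imbalance of any pair (h, v) can be measured by the residual $f v - g\,\partial h / \partial x$, which vanishes for the balanced solution.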
Experimental Design
Analysis Increments (L = 500 km)
Circles are observations.
Analysis Error (L = 500 km)
Circles are observations.
Here, RMS error: B localization ~ R localization < no localization.
Analysis Imbalance (L = 500 km)
Black circles are observation locations.
Imbalance: B localization >> R localization > no localization.
Covariance Localization; Varying the Localization Distance L
• Wavelength (W) = 2000 km
• Distance between obs (D) = 250 km
• Number of Ensemble Members (p) = 5
[Figure: RMS error (left) and RMS imbalance (right) as functions of localization distance L (km).]
• Results taken as mean over 100 random simulations.
• Use LETKF for R-localization to avoid undesired statistical properties (asymmetric B-matrix). Results are very similar to EnKF, so the comparison is fair.
Simple Model Conclusions
• Both types of localization do introduce imbalance into analysis increments, especially for short localization distances.
• R localization is more balanced than B localization for the same L, but is slightly less accurate.
• The two methods have differing optimal localization length scales (L).
Why Does Localization Produce Imbalance?
Example: Apply a Gaussian localization function (L = 250 km) to an h and v waveform based upon the distance from the origin:
[Figure: original (solid) and localized (dashed) waveforms; imbalance of original and localized waveforms.]
Example adapted from Lorenc (2003).
Analogy:
• Assimilate height observation at origin.
• Waveforms are proportional to analysis increments.
• Example considers modification of K, irrespective of modification of B or R.
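The waveform example can be reproduced numerically as a rough sketch. It assumes the 1-D geostrophic relation f v = g dh/dx as the balance measure, a Coriolis parameter of 1e-4 s^-1, a 2000 km wavelength, and a unit-amplitude height wave peaked at the origin; these values and the helper names are illustrative choices, not taken from the slides.

```python
import numpy as np

g = 9.81                  # gravity, m s^-2
f = 1.0e-4                # Coriolis parameter, s^-1 (assumed mid-latitude value)
W = 2000.0e3              # wavelength, m (assumed)
L = 250.0e3               # localization distance, m
k = 2.0 * np.pi / W

x = np.linspace(-2000.0e3, 2000.0e3, 2001)
h = np.cos(k * x)                     # height waveform, peaked at the origin
v = -(g * k / f) * np.sin(k * x)      # wind in geostrophic balance with h

def imbalance(h, v, x):
    """Residual of geostrophic balance, f*v - g*dh/dx, by finite differences."""
    return f * v - g * np.gradient(h, x)

def rms(a):
    return np.sqrt(np.mean(a**2))

# Multiply both waveforms by the same Gaussian function of distance from the origin.
taper = np.exp(-x**2 / (2.0 * L**2))
rms_original = rms(imbalance(h, v, x))
rms_localized = rms(imbalance(taper * h, taper * v, x))
```

The original pair is balanced up to finite-difference error. After tapering, the derivative of the taper adds a term to dh/dx that has no counterpart in v, so the localized pair no longer satisfies the balance relation.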
Why do the optimum length scales differ?
• Two grid points, observation at grid point 1.
• K for grid point 2:
– B localization: $K_2 = f_{Bloc}(d_{12})\, B_{12}\, (B_{11} + R_1)^{-1}$
– R localization: $K_2 = B_{12}\, (B_{11} + f_{Rloc}(d_{12})\, R_1)^{-1} = f_{Bloc}(d_{12})\, B_{12}\, (f_{Bloc}(d_{12})\, B_{11} + R_1)^{-1}$
$f_{Bloc} = \exp(-d_{ij}^2 / (2L^2))$, $f_{Rloc} = \exp(+d_{ij}^2 / (2L^2))$
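The two-grid-point comparison can be checked numerically. In this sketch the covariance values, the distance, and the function names are hypothetical; only the two gain formulas come from the slide.

```python
import numpy as np

def f_bloc(d, L):
    """Gaussian localization factor exp(-d^2 / (2 L^2))."""
    return np.exp(-d**2 / (2.0 * L**2))

def k2_b_localization(B11, B12, R1, d12, L):
    """Gain at grid point 2 when B12 is damped (B localization)."""
    return f_bloc(d12, L) * B12 / (B11 + R1)

def k2_r_localization(B11, B12, R1, d12, L):
    """Gain at grid point 2 when R1 is inflated (R localization)."""
    f_rloc = 1.0 / f_bloc(d12, L)        # exp(+d^2 / (2 L^2))
    return B12 / (B11 + f_rloc * R1)

# Illustrative values only.
B11, B12, R1 = 1.0, 0.8, 1.0
d12, L = 500.0, 500.0

k2_b = k2_b_localization(B11, B12, R1, d12, L)
k2_r = k2_r_localization(B11, B12, R1, d12, L)
```

For the same L, the R-localized gain has the same numerator as the B-localized gain but a smaller denominator, so it gives the distant observation more weight. That is consistent with R localization needing a shorter L to achieve a comparable effect.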
Measuring Balance in Full Model
• Background can no longer be considered to be balanced.
• Natural imbalance vs. imbalance induced by data assimilation
Methods:
• Magnitude of the Ageostrophic Wind
• 2nd Derivative of Surface Pressure
• Difference between original analysis and initialized analysis (with digital filter)
SPEEDY Model
• Simplified Parametrizations, primitivE-Equation DYnamics (SPEEDY)
• Atmospheric General Circulation Model
• Seven vertical levels using the sigma coordinate system
• Horizontal spectral resolution of T30, which corresponds to a standard 96x48 Gaussian grid
• Leapfrog time step
• There are five dynamical variables included in the output: zonal wind (u), meridional wind (v), temperature (T), specific humidity (q), and surface pressure (ps).
SPEEDY Evaluation Metrics
• Compare EnSRF B-localization vs. LETKF R-localization for accuracy and balance at level 4 (~500 hPa) mid-latitudes in both hemispheres.
Observing system: rawinsonde distribution located on grid points
Experiment length: 2 months (Feb. and Mar.)
SPEEDY Results: Southern Hemisphere
[Figure legend: dark solid line = nature run; dotted line = free run.]
SPEEDY Results: Northern Hemisphere
[Figure legend: dark solid line = nature run; dotted line = free run.]
SPEEDY Results
Results averaged between Feb 20 and Mar 20, which is after spin-up.
SPEEDY Conclusions
• Error is greater in the Southern Hemisphere (fewer observations) than in the Northern Hemisphere.
• Imbalance is greater in the Northern Hemisphere (presence of Tibetan plateau; seasonal dependence).
• Optimal localization scale for LETKF R-localization is shorter (~300-500 km) than for EnKF B-localization (~500-750 km). This agrees with previous results with the simple model.
• Both localization methods introduce similar imbalance when the optimal length scale of each technique is considered.
• Additional balance metrics must be evaluated.
Advanced Localization Methods
• Adaptive Localization (does not require the distance-dependent assumption)
– Anderson – hierarchical filter
– Bishop and Hodyss – raising correlations to a power
– Miyoshi
• Variable Transformations
– Kepert – transform variables to streamfunction and velocity potential
• Variable Localization
– Kang et al.
B Localization vs. K Localization
• B cannot be expressed explicitly for high-dimensional systems. Therefore, $BH^T$ and $HBH^T$ are determined directly from the ensemble.
$Y = HX$
Kalman Gain: $K = BH^T (HBH^T + R)^{-1}$
In K-localization, elements of K are localized based upon the distance between the observation and the model grid point. It is thus a hybrid of B-localization and R-localization, and is used in place of B-localization in practice for several variations of EnKF.
K localization reduces to B localization for EnSRF if observations are located on grid points.
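A minimal sketch of K localization is shown below. It assumes a 1-D grid, observations located on grid points, a linear H, an identity R, a Gaussian localization function, and a 1/(p - 1) normalization of the sample covariances; all of these, along with the variable names, are illustrative choices rather than the configuration used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.arange(0.0, 4000.0, 250.0)     # hypothetical grid positions, km
obs_idx = np.array([2, 8, 14])           # observations located on grid points
p = 5                                    # ensemble members
L = 500.0                                # localization distance, km

# Ensemble perturbation matrix X (illustrative random perturbations).
X = rng.standard_normal((grid.size, p))
X -= X.mean(axis=1, keepdims=True)

# Linear observation operator that selects the observed grid points.
H = np.zeros((obs_idx.size, grid.size))
H[np.arange(obs_idx.size), obs_idx] = 1.0
Y = H @ X                                # perturbations in observation space

# BH^T and HBH^T from the ensemble, without ever forming B.
BHt = X @ Y.T / (p - 1)
HBHt = Y @ Y.T / (p - 1)
R = np.eye(obs_idx.size)

K = BHt @ np.linalg.inv(HBHt + R)

# K localization: damp each element of K by the grid-point-to-observation distance.
d = np.abs(grid[:, None] - grid[obs_idx][None, :])
K_loc = K * np.exp(-d**2 / (2.0 * L**2))
```

Only the n-by-m matrices $BH^T$ and $K$ are formed (n grid points, m observations), which is why this route remains practical when B itself is too large to store.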
Discussion Questions
• How does one avoid / mitigate / cope with assimilation-induced imbalance?
• Will adaptive localization schemes improve accuracy and balance? Are they practical?
• Should an initialization step (digital filters) be used with assimilation?
• Are there other ways of encouraging balance within the LETKF?