How to deal with model error in data assimilation

Jacky Goddard, Patrick Laloyaux, Simon Lang, Martin Leutbecher


ECMWF is developing an updated 4D-Var data assimilation system for its Integrated Forecasting System (IFS) that takes model error into account when estimating the initial conditions at the start of a forecast. This ‘weak-constraint 4D-Var’ has recently been implemented for the stratosphere, but further work is needed before it can be made operational for the troposphere.

Weak- and strong-constraint 4D-Var compared. The plot shows analysis and background mean departures with respect to GPS-RO measurements for weak-constraint 4D-Var and strong-constraint 4D-Var from 1 January to 30 April 2016. Weak-constraint 4D-Var produces a better analysis with a smaller bias in the stratosphere.

Strong-constraint 4D-Var

Strong-constraint 4D-Var was implemented at ECMWF in 1997 to produce a more accurate and physically consistent estimate of the state of the atmosphere at the start of a forecast. This method was designed to optimally blend information from the observations and the model in the presence of random (zero-mean) errors.

In reality, many conventional and satellite observations contain systematic errors due to instrument configuration or approximations in radiative transfer calculations. To account for these observation biases, a variational bias correction scheme (VarBC) was embedded in the 4D-Var system. The scheme estimates the instrument biases by finding corrections that minimise the systematic observation departures.
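The idea of estimating a bias correction together with the state can be illustrated with a deliberately small sketch. It is not the VarBC scheme used in the IFS: it assumes a scalar state, a single constant bias for one instrument, no prior constraint on that bias, and a second set of unbiased 'anchor' observations. The function name `varbc_toy` and all of its parameters are hypothetical.

```python
import numpy as np


def varbc_toy(x_b, sigma_b, y_anchor, sigma_a, y_biased, sigma_s):
    """Jointly estimate a scalar state x and a constant instrument bias beta.

    Minimises the quadratic cost
        (x - x_b)^2 / sigma_b^2
        + sum (y_anchor - x)^2 / sigma_a^2
        + sum (y_biased - x - beta)^2 / sigma_s^2
    by solving the equivalent weighted linear least-squares problem.
    This is an illustrative toy, not the operational VarBC formulation.
    """
    y_anchor = np.asarray(y_anchor, dtype=float)
    y_biased = np.asarray(y_biased, dtype=float)

    # Each row holds the coefficients of (x, beta), scaled by 1/sigma.
    rows = [[1.0 / sigma_b, 0.0]]          # background term
    rhs = [x_b / sigma_b]
    for y in y_anchor:                     # unbiased observations see x only
        rows.append([1.0 / sigma_a, 0.0])
        rhs.append(y / sigma_a)
    for y in y_biased:                     # biased observations see x + beta
        rows.append([1.0 / sigma_s, 1.0 / sigma_s])
        rhs.append(y / sigma_s)

    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return float(sol[0]), float(sol[1])


# Noise-free example: true state 250.0, true instrument bias 1.5.
x_hat, beta_hat = varbc_toy(
    x_b=250.0, sigma_b=1.0,
    y_anchor=[250.0, 250.0], sigma_a=0.5,
    y_biased=[251.5, 251.5, 251.5], sigma_s=0.3,
)
print(x_hat, beta_hat)
```

In this toy the bias is only identifiable because the background and the anchor observations pin down the state; without them, a shift in the state and a shift in the bias would fit the biased instrument equally well.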

Strong-constraint 4D-Var relies on the assumption that the numerical model's representation of the evolution of atmospheric flow is perfect. The error in the model trajectory thus depends only on the background (short-range forecast) error. As data assimilation processes have advanced, it is no longer possible to ignore the model error which accumulates during the 12-hour assimilation window due to inaccurate surface forcing, simplified representations of moist physics and clouds, and various other imperfections. To address this issue, ECMWF has been developing a weak-constraint 4D-Var system where the model error is explicitly taken into account in the data assimilation.

Weak-constraint 4D-Var

Model error covariance matrix

Each term in the 4D-Var formulation requires an error covariance matrix that describes the error statistics of the corresponding source of information. Weak-constraint 4D-Var therefore requires a model error covariance matrix in addition to the usual background and observation error covariance matrices. This model error covariance matrix describes how fast the model error can change between assimilation cycles and how model errors at different levels are correlated.

A large set of samples is required to generate covariance statistics for model errors. It is not possible to explicitly generate a sample of model errors, so instead a proxy has to be used. To create this proxy set of statistics we have run ECMWF ensemble forecasts without initial perturbations. In these runs, members diverge from each other solely due to the stochastic representation of model uncertainties, which is used operationally in the ensemble forecasts. The differences between members after 12 hours of model integration are used to construct the model error covariance matrix as they provide an estimate of the integrated effect of model error over 12 hours.
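As a rough sketch of how member differences could be turned into a covariance matrix, the snippet below forms all pairwise differences between members that started from identical initial conditions. The function name, the array layout (members by levels) and the factor of one half are assumptions made for illustration: the half reflects the fact that the difference of two independent draws has twice the variance of a single draw. The operational procedure may differ.

```python
import numpy as np


def model_error_covariance(members):
    """Estimate a covariance matrix from pairwise member differences.

    members: array of shape (n_members, n_levels) holding, for example,
    12-hour forecasts that differ only through stochastic model
    perturbations (hypothetical layout).
    """
    members = np.asarray(members, dtype=float)
    n_members = members.shape[0]

    # Indices of all unordered member pairs (i < j).
    i, j = np.triu_indices(n_members, k=1)
    diffs = members[i] - members[j]        # shape (n_pairs, n_levels)

    # Average outer product of the differences, halved because
    # Var(a - b) = 2 Var(a) for independent, identically distributed a and b.
    return diffs.T @ diffs / (2.0 * diffs.shape[0])


# Synthetic stand-in for ensemble output: 10 members, 5 vertical levels.
rng = np.random.default_rng(0)
members = rng.standard_normal((10, 5))
Q = model_error_covariance(members)
print(Q.shape)
```

With this normalisation the result coincides with the usual unbiased sample covariance of the members, so the pairwise-difference view is simply another way of reading the ensemble spread.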

The perfect model assumption is relaxed in weak-constraint 4D-Var by adding a correction term to the model integration to account for the different sources of model error. The 4D-Var control variable is augmented by this correction term, and a corresponding term is added to the cost function which constrains the model error in accordance with its covariance statistics (see box). In order to account for systematic model error and to make the implementation affordable on today’s supercomputers, the model error is assumed to be constant over each 12-hour assimilation window. With this assumption the size of the control variable almost doubles compared to strong-constraint 4D-Var.
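In notation commonly used for the forcing formulation of weak-constraint 4D-Var, the augmented cost function can be sketched as follows. The symbols are assumptions introduced here for illustration: x_0 is the initial state, x_b the background, \eta the model error correction held constant over the window, \eta_b its prior estimate, B, R_k and Q the background, observation and model error covariance matrices, H_k the observation operator, M_k the model step and y_k the observations at time step k.

```latex
J(x_0, \eta) =
    \tfrac{1}{2} (x_0 - x_b)^{\mathrm{T}} B^{-1} (x_0 - x_b)
  + \tfrac{1}{2} \sum_{k=0}^{K}
      \bigl( H_k(x_k) - y_k \bigr)^{\mathrm{T}} R_k^{-1} \bigl( H_k(x_k) - y_k \bigr)
  + \tfrac{1}{2} (\eta - \eta_b)^{\mathrm{T}} Q^{-1} (\eta - \eta_b),
\qquad
x_k = M_k(x_{k-1}) + \eta, \quad k = 1, \dots, K .
```

Setting \eta = 0 and dropping the last term gives the strong-constraint cost function. The control variable grows from x_0 to (x_0, \eta), which is consistent with its size almost doubling.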

Weak-constraint 4D-Var has been evaluated against strong-constraint 4D-Var. Experiments have shown that in the current setup the model error forcing needs to be restricted to the stratosphere above the 40 hPa level (that is, at pressures lower than 40 hPa) to avoid the erroneous interpretation of aircraft observation error as model error.
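Why observation error can be mistaken for model error can be seen in a toy problem. The sketch below assumes a scalar linear model x_{k+1} = a x_k + eta, direct observations of the state at every step, and a least-squares solver in place of the IFS minimisation; `fit_window` and every parameter value are hypothetical. Two experiments are run: one with a genuine model error and unbiased observations, and one with a perfect model and biased observations.

```python
import numpy as np


def fit_window(y, a=0.5, x_b=0.0, sigma_b=1.0, sigma_o=0.1, sigma_q=None):
    """Toy 4D-Var over one window for the model x_{k+1} = a * x_k + eta.

    sigma_q=None gives strong-constraint 4D-Var (eta fixed at zero).
    Otherwise eta is added to the control variable with a zero-mean prior
    of standard deviation sigma_q. Returns the estimates (x0, eta).
    """
    y = np.asarray(y, dtype=float)
    steps = np.arange(y.size)

    # x_k = a**k * x0 + eta * sum_{j<k} a**j, so the problem is linear.
    col_x0 = a ** steps
    col_eta = np.array([np.sum(a ** np.arange(n)) for n in steps])

    if sigma_q is None:
        A = np.vstack([col_x0[:, None] / sigma_o, [[1.0 / sigma_b]]])
        rhs = np.concatenate([y / sigma_o, [x_b / sigma_b]])
        sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
        return float(sol[0]), 0.0

    A = np.vstack([
        np.column_stack([col_x0, col_eta]) / sigma_o,  # observation terms
        [1.0 / sigma_b, 0.0],                          # background term
        [0.0, 1.0 / sigma_q],                          # model error term
    ])
    rhs = np.concatenate([y / sigma_o, [x_b / sigma_b, 0.0]])
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return float(sol[0]), float(sol[1])


a, n_steps = 0.5, 13
steps = np.arange(n_steps)

# Experiment 1: true model error eta = 0.2, unbiased noise-free observations.
truth = 0.2 * (1.0 - a ** steps) / (1.0 - a)
_, eta_strong = fit_window(truth, a=a)
_, eta_weak = fit_window(truth, a=a, sigma_q=1.0)

# Experiment 2: perfect model (truth stays at 0), observations biased by +0.4.
biased_obs = np.full(n_steps, 0.4)
_, eta_spurious = fit_window(biased_obs, a=a, sigma_q=1.0)

print(eta_strong, eta_weak, eta_spurious)
```

In the first experiment the weak-constraint fit recovers a forcing close to the true value. In the second it returns a forcing of similar size even though the model is perfect, because a constant forcing is the cheapest way to fit persistently biased observations.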

GPS Radio Occultation (GPS-RO) measurements are extremely valuable for assessing improvements in analyses and short-range forecasts for the stratosphere as they are considered to be bias-free. They are based on analysing the bending caused by the atmosphere along paths between a GPS satellite and a receiver on a low-Earth-orbiting satellite. As illustrated in the first figure, such measurements show that weak-constraint 4D-Var produces an analysis and a background with a smaller systematic error than strong-constraint 4D-Var. Similar improvements have been found with other instruments, including AMSU-A and radiosondes.

Results from operations

Weak-constraint 4D-Var became operational for the stratosphere in IFS Cycle 43r1, launched in November 2016. Very few diagnostics currently exist to quantify the actual model error and to monitor the performance of weak-constraint 4D-Var. We present here a first attempt, in which the model error correction term estimated by the weak-constraint 4D-Var system is compared with the forecast error, computed as the difference between the analysis and the forecast after 12 hours. Although the forecast error is due to analysis error as well as model error, we expect weak-constraint 4D-Var to estimate some of the systematic features of the forecast error. The top panel of the second figure shows how the model error correction term has evolved since its introduction into operations. As expected, it evolves slowly, reflecting the systematic errors in the model, and there is no model error estimate below the 40 hPa level (at pressures higher than 40 hPa) because weak-constraint 4D-Var is active only in the stratosphere. The forecast error in the bottom panel shows more day-to-day variability. However, its systematic signals, mainly between 5 hPa and 1 hPa, are captured by weak-constraint 4D-Var, which estimates a model error correction term to correct them.


The weak-constraint formulation is an ongoing development at the cutting edge of 4D-Var. It reduces analysis and background biases in the stratosphere and improves the fit to satellite and radiosonde observations. Characterising the statistical properties of model error is one of the main current challenges. Another is to correctly separate errors in the observations from errors in the model. At the moment this prevents us from activating the weak-constraint formulation in the troposphere, where aircraft observation error can be erroneously interpreted as model error.

Evolution of model error correction term and forecast error. The charts show time series of the temperature model error correction term estimated by the weak-constraint 4D-Var system (top) and the temperature forecast error (bottom) after 12 hours over the south polar region. Positive (negative) model error correction terms estimated by the weak-constraint 4D-Var system correct for the systematic negative (positive) forecast errors, especially in the upper stratosphere between 5 hPa and 1 hPa.