Newsletter No. 149

Use of forecast departures in verification against observations

Mohamed Dahoui
Gabor Radnoti
Sean Healy
Lars Isaksen
Thomas Haiden


Forecast users need to know how well ECMWF forecasts predict the actual observed conditions. Recent work makes it possible to assess the performance of our forecasts through detailed and accurate comparisons against all available observations. The forecast verification has thus been extended to incorporate all the observations used and quality-controlled by the data assimilation system (4DVAR).


Box A Main operations to compare forecasts with observations in the IFS

The differences between observations and the short-range forecast are the most important input for the data assimilation process. Their computation relies on a sophisticated infrastructure involving the following operations:

  • Interpolation from forecast time to observation time (in 4DVAR this means running the forecast model over the assimilation window)
  • Horizontal and vertical interpolations
  • Vertical integration
  • Horizontal integration for limb geometry observations
  • Converting model variables to the observed geophysical quantity (not needed when the observed quantity is directly represented by the model)
  • Computing the differences between observed and simulated quantities (background departures)
  • Quality control checks (first-guess checks plus variational quality control)
  • Data thinning (avoiding over-sampling and problems due to correlated errors)
  • Data blacklisting (for systematic poor performance or ongoing assessment)
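
As a rough illustration of the departure and first-guess check steps listed above, the following Python sketch interpolates a model profile to the observation pressure, forms the observation-minus-background difference, and rejects observations whose departure is large compared with the expected error. It is not the IFS implementation: the function names, the linear interpolation in log-pressure, the threshold of 3 and the error standard deviations are all assumptions made for illustration.

```python
import numpy as np


def background_departures(obs_value, obs_pressure, model_pressure, model_value):
    """Return observation minus model equivalent (hypothetical helper).

    The model profile is interpolated linearly in log-pressure to the
    observation pressure; this stands in for the vertical interpolation step.
    """
    model_pressure = np.asarray(model_pressure, dtype=float)
    model_value = np.asarray(model_value, dtype=float)
    # np.interp requires increasing x values, so sort the levels by pressure.
    order = np.argsort(model_pressure)
    simulated = np.interp(
        np.log(np.asarray(obs_pressure, dtype=float)),
        np.log(model_pressure[order]),
        model_value[order],
    )
    return np.asarray(obs_value, dtype=float) - simulated


def first_guess_check(departure, sigma_obs, sigma_bg, threshold=3.0):
    """Return True where the departure passes a simple first-guess check.

    An observation is kept when |departure| does not exceed `threshold`
    times the expected departure standard deviation (assumed values).
    """
    expected = np.sqrt(sigma_obs**2 + sigma_bg**2)
    return np.abs(departure) <= threshold * expected


# Illustrative temperature profile (K) on three pressure levels (hPa).
model_p = [1000.0, 500.0, 100.0]
model_t = [290.0, 250.0, 210.0]
dep = background_departures([251.5, 260.0], [500.0, 500.0], model_p, model_t)
keep = first_guess_check(dep, sigma_obs=1.0, sigma_bg=1.0)
```

Here the second observation differs from the model equivalent by far more than the assumed errors allow, so it is flagged for rejection while the first is kept.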

Forecast verification is routinely performed against a subset of observations: radiosonde, SYNOP station and buoy data for upper-air, near-surface and wave forecasts, respectively. These observations provide independent verification, but their temporal and spatial coverage is limited, leading to sampling issues. Extending the verification to other observation types, such as GPS radio occultation (GPS-RO) data, is very useful where the coverage of radiosondes is insufficient.

Computing forecast departures

The first step in data assimilation produces a precise comparison between observations and their counterparts from a short-range forecast (see Box A). This procedure has now been applied to forecast steps up to day 10, leading to the computation of forecast departures (observation minus forecast) against all quality-controlled observations. The computation of forecast departures is performed with respect to observed quantities (e.g. satellite radiances, GPS-RO bending angles). For ranges beyond 12 hours, the forecast is independent of the set of observations against which it is verified. This is also true of the quality control tests applied to observations, which are based on recent short-range forecasts. The availability of such forecast departures has a number of benefits for the verification of forecasts:

  • The verifying observations are to a large extent independent from the forecasts being verified.
  • Verification can be carried out against a wide range of observation types with good availability in time and space. The variety and redundancy of the observing system helps users to disentangle forecast and observation errors.
  • For longer ranges (typically beyond 48 hours), forecast errors are significantly larger than typical observation errors. This significantly reduces the undesirable effect of observation errors masking forecast improvements.
  • It is possible to estimate the forecast error growth rate and model activity.
  • There is increased synergy between observation monitoring activities and forecast verification activities.
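
The two statistics used throughout this article, the mean (bias) and the standard deviation (random error) of observation-minus-forecast departures, can be sketched in a few lines of Python. The function name and the flat-array input layout are assumptions for illustration; the sketch takes model equivalents that have already been computed in observation space.

```python
import numpy as np


def departure_statistics(lead_hours, observed, forecast_equivalent):
    """Group departures by forecast range and return bias, std and count.

    All three inputs are 1D arrays of equal length; `forecast_equivalent`
    holds the forecast already converted to the observed quantity.
    """
    lead_hours = np.asarray(lead_hours)
    departures = np.asarray(observed, dtype=float) - np.asarray(
        forecast_equivalent, dtype=float
    )
    stats = {}
    for lead in np.unique(lead_hours):
        sample = departures[lead_hours == lead]
        stats[int(lead)] = {
            "bias": float(sample.mean()),  # systematic error
            "std": float(sample.std()),    # random error (population std)
            "count": int(sample.size),
        }
    return stats


# Made-up numbers: two observations each at the 24-hour and 72-hour ranges.
stats = departure_statistics(
    lead_hours=[24, 24, 72, 72],
    observed=[2.0, 4.0, 5.0, 9.0],
    forecast_equivalent=[1.0, 1.0, 1.0, 1.0],
)
```

Plotting `bias` and `std` against the forecast range gives curves of the kind described for Figure 1, and the growth of `std` with range is one simple view of forecast error growth.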

Figure 1 Statistics of forecast departures for radiances from all used channels from Metop-A/AMSU-A for successive forecast ranges over the southern hemisphere extratropics, showing (a) the standard deviation (random error) and (b) the bias (mean error). The forecasts were produced between 1 September and 30 September 2015 using a lower-resolution version of IFS Cycle 41r1.

Figure 2 Statistics of differences between day-3 absolute mean forecast departures for radiances from Metop-A/AMSU-A channel 14 between an experiment based on IFS Cycle 41r2 and a control experiment based on IFS Cycle 41r1. Negative values indicate that the mean forecast departures using IFS Cycle 41r2 are smaller. Dots indicate areas where the differences are statistically significant at the 95% confidence level. The forecasts cover the period from 31 August to 1 October 2015.

Figure 3 Statistics of the differences in day-3 forecast departures, for radiances from all used channels from Metop-A/AMSU-A, between an experiment based on IFS Cycle 41r2 and a control experiment based on IFS Cycle 41r1 over the northern hemisphere extratropics, showing (a) the standard deviation of the forecast departures from the experiment normalised by the standard deviation (random error) of forecast departures from the control experiment and (b) the bias (mean error) for IFS Cycle 41r2 and IFS Cycle 41r1. The forecasts cover the period from 1 September to 30 September 2015.

Figure 1 shows an example of statistics of forecast departures for successive forecast ranges up to three days for AMSU-A radiances. It highlights the increase in the random and systematic components of the forecast error as the range increases. Figure 2 shows the difference in day-3 forecast departures for radiances from Metop-A/AMSU-A channel 14 between two model cycles (IFS Cycles 41r1 and 41r2). The plot shows a statistically significant reduction in the upper stratospheric temperature bias caused by a slight cooling in IFS Cycle 41r2. Figure 3 shows the same comparison for all used Metop-A/AMSU-A channels over the northern hemisphere extratropics, highlighting the change in temperature bias.
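
The two comparison metrics used in Figures 2 and 3 can be sketched as follows, assuming that departures from the experiment and the control are available for the same set of observations. The function and key names are hypothetical, and the statistical significance test behind the dots in Figure 2 is not reproduced here because the article does not specify it.

```python
import numpy as np


def compare_experiments(exp_departures, ctl_departures):
    """Compare forecast departures from an experiment and a control.

    Returns the experiment standard deviation normalised by the control
    standard deviation, and the difference in absolute mean departures.
    """
    exp = np.asarray(exp_departures, dtype=float)
    ctl = np.asarray(ctl_departures, dtype=float)
    return {
        # Values below 1 mean a smaller random error in the experiment.
        "normalised_std": float(exp.std() / ctl.std()),
        # Negative values mean a smaller absolute bias in the experiment.
        "abs_mean_difference": float(abs(exp.mean()) - abs(ctl.mean())),
    }


# Made-up departures for one channel at one forecast range.
result = compare_experiments(
    exp_departures=[1.0, -1.0, 1.0, -1.0],
    ctl_departures=[3.0, -1.0, 3.0, -1.0],
)
```

In this made-up case the experiment has half the random error of the control and removes its bias, so both metrics indicate an improvement.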


Box B Procedure to compute forecast departures against GPS-RO bending angles and retrieved temperatures

  • Extraction of surface pressure, surface elevation, pressure, temperature and humidity on model levels from the desired forecast range and a short-range (6-hour) forecast.
  • Retrieval of the GPS-RO data from the BUFR file.
  • Use of a 1D operator to compute the bending angles from the desired forecast range and from the 6-hour forecast. There is no time or horizontal interpolation (the forecast time and the nearest grid point are used). The 2D aspects are ignored (no horizontal integration is performed).
  • Use of a 1D operator to retrieve temperatures in the stratosphere from the desired forecast range and from the 6-hour forecast. The procedure starts by deriving the refractive index profile, which involves the use of a priori information. The pressure (at each impact height) is derived using the relation between refractivity, pressure and temperature and assuming dry conditions. The temperature is then computed using the ideal gas law.
  • Quality control of observations based on the observation fit to the 6-hour forecast.
  • Computation of bending angle departures for the desired forecast range.
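
The dry temperature step described above can be illustrated with a minimal Python sketch. It assumes the textbook dry-air relation N = k1 P / T (P in hPa, T in K, k1 = 77.6 K/hPa), under which refractivity is proportional to density, and it obtains pressure by integrating the hydrostatic equation downwards from a top-level pressure supplied by the prior. The hydrostatic integration, the constant gravity, the trapezoidal rule and all names are assumptions for illustration; the operational operator may differ in detail.

```python
import numpy as np

K1 = 77.6          # dry refractivity coefficient, K/hPa (standard value)
R_DRY = 287.05     # specific gas constant for dry air, J/(kg K)
GRAVITY = 9.80665  # m/s^2, taken as constant with height (simplification)


def dry_temperature(height_m, refractivity, p_top_hpa):
    """Retrieve dry pressure (hPa) and temperature (K) from refractivity.

    `height_m` must increase with index. `p_top_hpa` is the pressure at the
    highest level, taken from a priori information.
    """
    z = np.asarray(height_m, dtype=float)
    n = np.asarray(refractivity, dtype=float)
    # Trapezoidal integral of refractivity over each layer.
    layer = 0.5 * (n[1:] + n[:-1]) * np.diff(z)
    # Integral from each level up to the top level (zero at the top).
    above = np.concatenate((np.cumsum(layer[::-1])[::-1], [0.0]))
    # Hydrostatic balance with density proportional to refractivity:
    # dP/dz = -g N / (K1 R_d), with P in hPa.
    pressure = p_top_hpa + GRAVITY / (K1 * R_DRY) * above
    # Ideal gas law in refractivity form: T = K1 P / N.
    temperature = K1 * pressure / n
    return pressure, temperature
```

Applying the same function to refractivity simulated from the forecast and to the observed refractivity, with the same prior for the top pressure, gives temperatures whose difference is a temperature departure. The dependence on `p_top_hpa` weakens at lower levels, where the integrated term dominates the pressure.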

Stratospheric forecast verification

The general availability of observation-minus-forecast differences could also allow the relative impact of assimilated observations on forecast quality to be estimated (Todling, 2012). This method is being explored.

The verification of forecasts in the stratosphere is best performed against GPS-RO-derived observations. GPS-RO data have good vertical resolution as well as a global and homogeneous distribution (around 3,000 profiles daily) and, most importantly, their biases are small enough for the data to be assimilated without bias correction. Initially, forecast departures were produced against bending angles only (the quantity being assimilated). However, bending angle statistics are not easy to interpret when dealing with biases, mainly because temperature and moisture both affect bending angles. To address this limitation, the procedure for computing departures has been extended to enable the comparison of any forecast range (up to day 10) against temperature retrievals from GPS-RO (see Box B). This extension offers a good method to assess the impact of model changes on systematic errors. Since GPS-RO temperature retrievals require a priori information on the upper atmosphere, it is important to restrict the use of temperature retrievals to the atmospheric region less constrained by the prior (below the 5 hPa level). Furthermore, standard GPS-RO temperature retrievals assume dry conditions, which makes them less valid in the troposphere. Despite the good accuracy of temperature retrievals in the mid- to lower stratosphere, they remain to some extent dependent on the quality of the prior information used. For this reason, in order to obtain robust results when comparing model cycles, it is important to use the same prior (preferably the operational short-range forecasts) for GPS-RO temperature retrievals based on different model versions.

Figure 4 shows a comparison of day-3 forecast departures for GPS-RO temperature retrievals in November 2014 to January 2015 between the then operational cycle 40r1 and the then pre-operational cycle 41r1 over the northern hemisphere extratropics. It shows a reduction in the standard deviation in the pre-operational cycle, which is an indication of improvement.

Figure 4 Vertical profile of statistics of the differences in day-3 forecast departures, for temperature retrieved from GPS-RO instruments, between an experiment based on IFS Cycle 41r1 and a control experiment based on IFS Cycle 40r1 over the northern hemisphere extratropics, showing (a) the standard deviation from the experiment normalised by the standard deviation from the control experiment and (b) the bias (mean error) for IFS Cycle 41r1 and IFS Cycle 40r1. The forecasts cover the period 1 November 2014 to 30 January 2015.


There are plans to routinely compute and archive the differences between forecasts and observations (the departures). These departures will be used for forecast verification and for assessments of observation impact on forecast quality. The details of the implementation (resolution of the model and the set of forecast ranges to consider) will be defined in due course.

Further Reading

Todling, R., 2012: Comparing two approaches for assessing observations impact. Monthly Weather Review, 141, 1484–1505.