A new tool to understand changes in ensemble forecast skill

Martin Leutbecher, Thomas Haiden


The continuous ranked probability score (CRPS) is a scoring rule that is popular for assessing the quality of ensemble forecasts. The CRPS can be used to compare different ensemble forecasts and it plays an important role in guiding the development process for forecast systems at ECMWF and beyond. A new tool has been developed that makes it easier to understand the reasons for differences in the CRPS between sets of forecasts.


The tool consists of two elements. First, the sample of forecast–observation pairs is approximated by a homogeneous Gaussian (hoG) distribution. For each location and lead time, the distribution is characterised by the mean and variance of the error of the ensemble mean together with the ensemble spread. The second element is a closed-form expression for the expected CRPS of the hoG distribution. This expression depends on three variables: the ensemble mean error variance, the spread–error ratio, and the mean error of the ensemble mean (the bias).

Within this approximation, any difference in the CRPS between two sets of forecasts can therefore be attributed to changes in these three variables.
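The closed-form expression can be sketched in a few lines of code. The formula below is derived from the Gaussian CRPS kernel identity CRPS = E|X − y| − ½E|X − X′|, assuming the forecast distribution is Gaussian with the given spread and the error of the ensemble mean is Gaussian with the given bias and variance. The function name and parameterisation are illustrative; the published expression is written in terms of the spread–error ratio and normalised bias, which is an equivalent parameterisation.

```python
import math

def hog_crps(sigma_e, spread, bias):
    """Expected CRPS of a homogeneous Gaussian (hoG) forecast model (a sketch).

    sigma_e : standard deviation of the error of the ensemble mean
    spread  : ensemble spread (standard deviation of the forecast Gaussian)
    bias    : mean error of the ensemble mean
    """
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # normal pdf
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # normal cdf
    v = math.hypot(sigma_e, spread)  # std dev of forecast-minus-observation
    z = bias / v
    # E|X - y| for X - y ~ N(bias, v^2), minus 0.5 * E|X - X'| = spread / sqrt(pi)
    return bias * (2.0 * Phi(z) - 1.0) + 2.0 * v * phi(z) - spread / math.sqrt(math.pi)
```

For an unbiased system with perfect dispersion (bias = 0, spread = sigma_e), this expression reduces to sigma_e/√π, the expected CRPS of a calibrated Gaussian forecast.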

In order to examine how well this approximation works, medium-range scores for several upper-air variables have been computed and compared with the hoG approximation. Although the actual forecast–observation sample is simplified considerably, the approximation works well. It captures the geographical variations of the CRPS and it gives useful approximations of score changes due to, for example, changes in bias or ensemble spread. It thus enables users and developers to better understand the reasons for score differences between two ensemble systems. It is planned to add this new diagnostic to the existing verification software used routinely at ECMWF for the development and monitoring of the forecast system.

CRPS and its approximation compared. Geographical distribution of CRPS differences (left) and hoG approximation of the CRPS differences (right) between model cycles 47r1 and 46r1 for 5-day ensemble forecasts of temperature at 250 hPa. Each model cycle has been verified against its own analysis, and the verification period is 16 July to 31 October 2019.


A recent example illustrates how the new diagnostic works in practice. Parallel runs with Cycle 47r1 of ECMWF’s Integrated Forecasting System showed improvements compared to the previous Cycle 46r1 (see the article comparing the two cycles in Newsletter No. 164). However, in the tropics at 250 hPa, the CRPS for temperature showed a degradation of several percent. The hoG approximation of the CRPS reproduces the geographical distribution of the CRPS changes well, as can be seen in the map plot (first figure). The bar diagram (second figure) shows that for the tropics as a whole, defined here as latitudes between –20° and +20°, the increase in CRPS is well approximated by the hoG model (compare the red and the first blue bar). The decomposition of the total CRPS change into the three contributions reveals that by far the largest contribution, 86%, comes from a change in the bias; 14% is due to an increase in ensemble mean error variance, while changes in spread contribute less than 0.1%. The absolute change in the bias is small (less than 0.1 K). Yet it has a considerable impact on the relative CRPS change, since the bias is of similar magnitude to the standard deviation of the error of the ensemble mean.

CRPS change decomposed. The chart shows the CRPS change for temperature at 250 hPa in the tropics, based on the same data as in the first figure (ΔCRPS, red). It also shows the change in the hoG approximation of the CRPS (ΔC, blue) and its decomposition into contributions from changes in the ensemble mean error variance (ΔC1), changes in the spread–error ratio (ΔC2), and changes in the normalised bias (ΔC3). This decomposition of the approximated CRPS change is exact in the sense that ΔC = ΔC1 + ΔC2 + ΔC3.
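The exactness property ΔC = ΔC1 + ΔC2 + ΔC3 can be mimicked in a short sketch. The code below uses an illustrative Gaussian closed form for the hoG score together with a simple sequential-substitution attribution over the three variables; the published decomposition is constructed differently, so this is a conceptual illustration rather than ECMWF's implementation, and the numbers in the example are made up.

```python
import math

def hog_crps(sigma_e, spread, bias):
    # Expected CRPS of the hoG model (a sketch from the Gaussian kernel identity)
    v = math.hypot(sigma_e, spread)
    z = bias / v
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return bias * (2.0 * Phi - 1.0) + 2.0 * v * phi - spread / math.sqrt(math.pi)

def crps_from_params(sigma_e, ratio, beta):
    # Same score expressed in the three variables of the decomposition:
    # error std dev sigma_e, spread-error ratio, and normalised bias beta
    return hog_crps(sigma_e, ratio * sigma_e, beta * sigma_e)

def decompose(old, new):
    """Attribute dC = C(new) - C(old) to the three variables by substituting
    them one at a time; the telescoping sum makes the split exact, although
    it depends on the substitution order (the published decomposition is
    constructed differently but is exact in the same sense)."""
    (s0, r0, b0), (s1, r1, b1) = old, new
    c = [crps_from_params(s0, r0, b0), crps_from_params(s1, r0, b0),
         crps_from_params(s1, r1, b0), crps_from_params(s1, r1, b1)]
    return c[1] - c[0], c[2] - c[1], c[3] - c[2]   # dC1, dC2, dC3
```

By construction, the three terms sum exactly to the total change in the approximated score, mirroring the property stated in the caption.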

Deeper implications

In addition to the new diagnostic, this work has potentially deeper implications regarding the criteria used to determine improvements in the representation of uncertainties in ensemble forecasts. At present, the CRPS of raw model output is the main metric guiding the development of medium-range weather forecasts. It will remain a relevant metric, as the monitoring and understanding of model biases will always play an important role in model development. However, in the presence of a bias, focusing solely on this metric implies convergence towards a system with a spread–error ratio larger than one. As a consequence, users who bias-correct their forecasts will end up with an overdispersive probabilistic forecast. To address this issue, it is recommended to also use the CRPS of bias-corrected forecasts to guide the representation of uncertainties in the medium range. Since extended-range forecasts are already bias-corrected, this has the added advantage of making medium-range and extended-range development more consistent.
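The overdispersion argument can be checked numerically. Using the same illustrative Gaussian closed form for the hoG score (an assumption for this sketch, not the operational verification code) and made-up numbers, scanning over the spread for a fixed bias shows that the raw CRPS is minimised by a spread larger than the ensemble mean error standard deviation, whereas after bias correction the optimum sits at a spread–error ratio of one.

```python
import math

def hog_crps(sigma_e, spread, bias):
    # Expected CRPS of the hoG model (a sketch from the Gaussian kernel identity)
    v = math.hypot(sigma_e, spread)
    z = bias / v
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return bias * (2.0 * Phi - 1.0) + 2.0 * v * phi - spread / math.sqrt(math.pi)

def best_spread(bias, sigma_e=1.0):
    # Spread that minimises the expected CRPS, found by a coarse grid scan
    grid = [0.5 + 0.001 * i for i in range(2001)]   # spreads from 0.5 to 2.5
    return min(grid, key=lambda s: hog_crps(sigma_e, s, bias))

raw = best_spread(bias=1.0)        # raw forecast with a substantial bias
corrected = best_spread(bias=0.0)  # after bias correction
# Minimising the raw CRPS rewards overdispersion (raw > sigma_e);
# the bias-corrected optimum is spread = sigma_e
assert raw > 1.0 and abs(corrected - 1.0) < 0.01
```

This is the behaviour described above: with a bias present, the CRPS-optimal raw system is overdispersive, and bias correction removes the incentive.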

Further reading

Further information can be found in an article by Martin Leutbecher and Thomas Haiden in the Quarterly Journal of the Royal Meteorological Society.