The ECMWF Technical Advisory Committee (TAC) Subgroup on Verification Measures has recommended the introduction of two new headline scores for the monitoring of forecast skill in the medium and extended range. The new scores are user-oriented and contribute to the overall evaluation of progress towards ECMWF’s strategic goals.
Both new scores measure ensemble forecast skill, one in the medium range, one in the extended range. Both are based on the verification of 2 m temperature against SYNOP weather station observations. In other respects they are quite different. The additional headline measure for the medium range is the percentage of large errors, defined by the continuous ranked probability score (CRPS) exceeding 5 K at day 5 in the extratropics. It is sensitive to developments in boundary-layer physics as well as overall forecast system improvements, such as model resolution increases. Over the last ten years, the occurrence of large errors defined in this way has decreased from 6–7% to about 5% in the annual average. A large fraction of these errors occur in stable boundary-layer situations, where 2 m temperature is particularly sensitive to errors in low cloudiness, wind speed or snow cover.
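To make the medium-range measure concrete, the following is a minimal sketch of how a percentage of large errors could be computed for a set of station cases. It assumes the standard empirical-CDF form of the CRPS for a finite ensemble; whether the operational score uses this estimator or a variant is not stated here, and the function names and sample values are purely illustrative.

```python
def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one observation (same units, e.g. K).

    Assumed estimator: mean|x_i - y| - 0.5 * mean|x_i - x_j|, i.e. the CRPS
    of the empirical CDF defined by the ensemble members.
    """
    m = len(members)
    term_obs = sum(abs(x - obs) for x in members) / m
    term_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term_obs - term_spread


def large_error_percentage(cases, threshold=5.0):
    """Percentage of cases whose CRPS exceeds the threshold (5 K by default).

    `cases` is a list of (members, obs) pairs, one per station and date.
    """
    scores = [crps_ensemble(members, obs) for members, obs in cases]
    exceed = sum(1 for s in scores if s > threshold)
    return 100.0 * exceed / len(scores)


# Illustrative (made-up) 2 m temperature cases in kelvin.
cases = [
    ([270.0, 271.0, 272.0], 271.0),  # small error
    ([260.0, 261.0], 270.0),         # large error: ensemble far too cold
    ([280.0, 282.0], 281.0),         # small error
    ([275.0], 276.0),                # single member: CRPS equals absolute error
]
print(large_error_percentage(cases))  # 25.0
```

Only the second case exceeds the 5 K threshold, so one in four cases counts as a large error.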
The additional headline measure for the extended range is the discrete ranked probability skill score (RPSSD) for terciles of the weekly mean 2 m temperature in the northern extratropics in week 3 of the forecast (days 15–21). Unlike the medium-range score, which is based on real-time forecasts, this score is based on the evaluation of re-forecasts covering the preceding 18 to 20 years in order to increase sample size and improve the signal-to-noise ratio. The re-forecasts, run with the same model version as the real-time forecast, are used in the generation of the operational extended-range forecast products: forecasts are presented as anomalies relative to the re-forecast climate to account for model bias. The headline score is insensitive to bias because model quantiles are used for the forecast and analysis quantiles for the verification. The focus on week 3 means that the score targets a forecast range which still has relatively low skill, although this has improved substantially over the last ten years. The downward trend from 2012 to 2015 visible in the plot is due to interannual atmospheric variability, which is driven by large-scale phenomena such as the El Niño Southern Oscillation (ENSO) or the Madden–Julian Oscillation (MJO). This trend occurs in spite of the relatively long (18-year to 20-year) re-forecast period on which each skill score value is based.
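The tercile-based score described above can be sketched as follows. This is an illustration under stated assumptions rather than the operational implementation: members are categorised against tercile boundaries from the model (re-forecast) climate, the verifying value against boundaries from the analysis climate, and the skill score uses a commonly cited discrete correction for finite ensemble size, D = (K² − 1)/(6KM) for K equiprobable categories and M members. All function names and the choice of correction term are assumptions.

```python
def tercile_category(value, lower, upper):
    """Return 0 (below), 1 (middle) or 2 (above) for the given tercile bounds."""
    if value < lower:
        return 0
    if value > upper:
        return 2
    return 1


def tercile_probs(members, lower, upper):
    """Forecast probabilities per tercile; bounds come from the model climate."""
    counts = [0, 0, 0]
    for x in members:
        counts[tercile_category(x, lower, upper)] += 1
    return [c / len(members) for c in counts]


def rps(probs, obs_cat):
    """Ranked probability score: squared differences of cumulative probabilities."""
    cum_f = 0.0
    cum_o = 0.0
    total = 0.0
    for k, p in enumerate(probs):
        cum_f += p
        cum_o += 1.0 if k == obs_cat else 0.0
        total += (cum_f - cum_o) ** 2
    return total


def rpss_d(forecasts, obs_cats, n_members):
    """Discrete RPSS against an equiprobable tercile climatology.

    Assumed correction: D = (K^2 - 1) / (6 K M) is added to the reference RPS
    to compensate for the finite ensemble size M.
    """
    k = 3
    clim = [1.0 / k] * k
    n = len(obs_cats)
    mean_rps = sum(rps(p, o) for p, o in zip(forecasts, obs_cats)) / n
    mean_clim = sum(rps(clim, o) for o in obs_cats) / n
    d = (k * k - 1) / (6 * k * n_members)
    return 1.0 - mean_rps / (mean_clim + d)
```

A forecast case would then be built as `tercile_probs(members, model_lower, model_upper)` paired with `tercile_category(analysis_value, analysis_lower, analysis_upper)`. Using separate boundaries for the two is what removes the dependence on model bias in this sketch.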
Another TAC Subgroup recommendation is to provide information on the partitioning of ENS improvements over time into resolution and reliability components. The former represents the information content (or discriminating ability) of the forecast, while the latter shows how well the forecast is calibrated. In addition to 2 m temperature, the skill of the forecast for 10 m wind speed and precipitation in week 3 will be monitored. The Subgroup agreed that the four Euro–Atlantic weather regimes are relevant for users. It encouraged ECMWF to continue its work on regimes and regime transitions, some of which could be carried out within the sub-seasonal to seasonal (S2S) project. Furthermore, it recommended that the MJO and North Atlantic regime real-time indices be included as supplementary scores. This would make it possible to monitor the skill of ECMWF’s Integrated Forecasting System (IFS) with respect to one of the main sources of extended-range predictability and its effects on European weather.
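The distinction between resolution and reliability mentioned above can be illustrated with the classical decomposition of the Brier score for a binary event, in which the score equals reliability minus resolution plus uncertainty. This is a stand-in chosen for simplicity: the text does not say which score or decomposition is used for the ENS, and the function name is hypothetical. Forecasts are grouped by their exact probability value, which makes the decomposition exact for the sample.

```python
from collections import defaultdict


def brier_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes (0 or 1).

    Brier score = reliability - resolution + uncertainty.
    A smaller reliability term means better calibration; a larger resolution
    term means the forecasts discriminate better between event and non-event.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n

    # Group outcomes by the distinct forecast probability that was issued.
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)

    reliability = 0.0
    resolution = 0.0
    for p, obs in groups.items():
        freq = sum(obs) / len(obs)  # observed frequency when p was forecast
        reliability += len(obs) * (p - freq) ** 2
        resolution += len(obs) * (freq - base_rate) ** 2

    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Illustrative (made-up) sample: perfectly calibrated, partly sharp forecasts.
probs = [0.0, 0.0, 1.0, 1.0, 0.5, 0.5]
outcomes = [0, 0, 1, 1, 0, 1]
rel, res, unc = brier_decomposition(probs, outcomes)
brier = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```

In this sample the reliability term is zero because each issued probability matches the observed frequency, so all of the skill relative to climatology comes from the resolution term.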
ECMWF would like to take this opportunity to thank the representatives of the Member States and verification experts for their work within the TAC Subgroup.