Adapting the AIFS for 50r1

12 May 2026
Matthew Chantry
Mariana Clare
Sara Hahner
Harrison Cook
Simon Lang
Zied Ben Bouallègue
Lorenzo Zampieri

The implementation of the Integrated Forecasting System (IFS) Cycle 50r1 introduces improvements, including stronger ocean–atmosphere coupling and updated sea-ice representations.

These changes alter the characteristics of the analysis used to initialise forecasts and improve the representation of key physical processes. However, they also introduce downstream effects on data-driven models, including the Artificial Intelligence Forecasting System (AIFS). To ensure the AIFS continues to perform well under the new conditions, we explored several strategies for adapting its training and fine-tuning.  

Here we explain in more detail what we did to reduce the impact for the next AIFS version (AIFS v2) and what can be expected going forward. 

Adapting AIFS training for 50r1  

Having identified that the AIFS (both Single v1.1 and ENS v1) was adversely affected by the new analysis, we explored several training strategies for the AIFS. We utilised five months of prototype data from early versions of IFS Cycle 50r1 to test and implement these strategies. We tested two main approaches: 

  1. Introducing a new fine-tuning step at the end of training, using only prototype data. 
  2. Adding the prototype data alongside the operational data used in the final steps of training.  

The second strategy performed better. The first approach either overfitted or failed to adequately learn the dynamics of the new analysis, depending on the learning rate. 
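As a minimal sketch of the second strategy, the prototype data can be interleaved with the operational data during the final training steps rather than used in a separate fine-tuning stage. The sample names, mixing fraction, and function below are illustrative assumptions, not the actual AIFS training configuration:

```python
import random

def mix_training_samples(operational, prototype, proto_fraction, n_samples, seed=0):
    """Draw training samples from a mix of operational (49r1-era) and
    prototype (50r1) analyses. Each sample is taken from the prototype
    pool with probability `proto_fraction`, otherwise from the
    operational pool. In a real pipeline the pools would be dataset
    readers rather than lists of identifiers."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        pool = prototype if rng.random() < proto_fraction else operational
        samples.append(rng.choice(pool))
    return samples

# Toy illustration: a short prototype record (five months in the text)
# mixed into a longer operational record.
ops = [f"49r1_{i:03d}" for i in range(300)]
proto = [f"50r1_{i:03d}" for i in range(50)]
batch = mix_training_samples(ops, proto, proto_fraction=0.3, n_samples=1000)
frac = sum(s.startswith("50r1") for s in batch) / len(batch)  # roughly proto_fraction
```

This keeps the model exposed to both analysis versions simultaneously, which is one plausible reason the mixed approach avoids the overfitting seen when fine-tuning on prototype data alone.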

The results presented below are explored through a series of scorecards that provide a broad view of forecast performance, verified against analysis and observations. Colour maps present the underlying data, showing both raw and relative skill of the two forecasting systems. 

What’s the outcome for AIFS Single? 

This scorecard compares AIFS Single v2, initialised with 50r1 analysis, against a baseline of AIFS Single v1.1, initialised with 49r1 analysis. In both configurations, the initialising analysis has been used as the definition of truth.  

Broadly, over the first five days of the forecast, the scorecard is predominantly blue, indicating improvement. The columns showing the standard deviation of forecast anomalies are purple, indicating an increase in forecast activity, i.e. reduced forecast smoothing. The trade-off of this increased activity is a small reduction in skill, as measured by root-mean-squared error (RMSE) or the anomaly correlation coefficient (ACC). However, we view this as making AIFS Single v2 a more useful forecast, even if it is less skilful by those metrics.

Some of the strongest degradations are observed when scoring 2 m temperature (2t) against analysis in the Arctic, with degradations of 30% reported (see Figure 1). This indicates that v2 has not fully learnt the new near-surface temperature profile induced by the improved snow representation on sea ice in IFS Cycle 50r1. The degradations are captured in the combined northern hemisphere metrics (first columns). Degradation in near-surface temperatures can also be seen in the tropics, driven by changes to ocean coupling in the IFS. 


Figure 1: Normalised change in RMSE for 2 m temperature relative to the initial condition (50r1 for AIFS Single v2 and 49r1 for AIFS Single v1.1) for the period 1 January – March 2026. Blue colours indicate improved skill in AIFS Single v2 relative to AIFS Single v1.1, while red colours indicate a degradation in skill.
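The quantity plotted in Figure 1 can be read as the RMSE difference expressed as a percentage of the baseline RMSE; the exact normalisation used in the scorecards is an assumption here, and the numbers below are illustrative only:

```python
def normalised_rmse_change(rmse_new, rmse_ref):
    """Percentage change in RMSE relative to a reference system.
    Negative values (blue in Figure 1) mean lower error in the new
    system; positive values (red) mean a degradation."""
    return 100.0 * (rmse_new - rmse_ref) / rmse_ref

# Illustrative numbers: a 30% degradation, as reported for Arctic 2t.
change = normalised_rmse_change(1.3, 1.0)  # +30% (degradation)
```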

However, fine-tuning on prototype 50r1 data clearly has a positive impact. If we compare the same models (AIFS Single v2 vs v1.1) while initialising and verifying both with the same analysis (50r1), we see a different story. This indicates a positive response to incorporating IFS 50r1 data during training, and it demonstrates a clear improvement over not implementing a new AIFS Single version alongside IFS 50r1.

What’s the outcome for AIFS ENS? 

We see the same phenomena in AIFS ENS. 

Comparing AIFS ENS v2 (initialised from 50r1 analysis) with AIFS ENS v1 (initialised from 49r1 analysis), the strongest degradations again appear in the Arctic. Switching the view to initialising both models with 50r1 data, we again see that using 50r1 prototype data has improved the situation. If we examine the score of AIFS ENS compared with IFS 50r1, we see that for 2t in the Arctic, using observations as the truth, the AIFS still generally shows better forecast scores.

Outlook 

Whilst the scorecards indicate some areas where forecast skill is reduced, we view the upgrade to version 2 of the AIFS as the right step forward. The new versions of AIFS Single and AIFS ENS introduce new parameters and improve physical consistency. The degradations are more pronounced in near-surface temperatures and are less evident when compared with observations.  

Cycle 50r1 marks a significant step forward for the IFS. The improved coupling between the atmosphere and the ocean introduced in 50r1 also paves the way for ocean–atmosphere coupled versions of the AIFS. As Cycle 50r1 uses a new version of the NEMO ocean model and a new sea-ice model, biases are changing significantly, more than in typical IFS model upgrades. 

Overall, the scorecards above show that AIFS v2 has learnt to perform better than current versions of the AIFS when initialised from 50r1, although there is still room for improvement. We see evidence that exposure to more 50r1 data (an increase from the five months currently used and covering more of the annual cycle) will further improve the AIFS, and this information will be used in the development of AIFS v3. 

For those who want to explore AIFS v2 themselves, Hugging Face pages for AIFS Single v2 and AIFS ENS v2 are now available.

DOI
10.21957/3c84860825