Recently, weather forecasting models based on machine learning (ML) have surpassed physics-based models in their ability to predict large-scale weather patterns. However, deterministic ML forecast models have a tendency to unrealistically damp small-scale features, which impairs their ability to represent extreme events. In addition, they are usually run at lower resolution than physics-based models. Here, we build a hybrid forecasting system by combining our physics-based numerical weather prediction (NWP) model, the Integrated Forecasting System (IFS), with the deterministic version of our ML model, the Artificial Intelligence Forecasting System – Single (AIFS Single). The result is a forecasting system that inherits the large-scale skill of the AIFS and the physics-based realism of the IFS for small-scale weather systems and extreme events. To combine the two forecasting systems, we make use of spectral nudging. This technique constrains the large-scale components of virtual temperature and vorticity in the IFS forecast to follow the AIFS forecast. By constraining only the large scales, we allow small-scale features to evolve freely under the IFS's physics-based dynamics.
Our approach
A similar hybrid system has already been pioneered at Environment and Climate Change Canada (ECCC) using a variant of the GraphCast ML model and ECCC's GEM NWP model. We are also collaborating and sharing experiences on a hybrid approach with national meteorological services in our Member States, such as the UK Met Office and Météo-France. A major difference between our approach and the one used in Canada is that we use a version of AIFS Single that produces forecasts directly on IFS model levels instead of pressure levels. This ensures vertical consistency and computational efficiency within the nudging framework and enables nudging to be applied from the surface to the tropopause.
We find that nudging the zonal wavenumber-2 divergent wind component leads to temporal aliasing of the semi-diurnal tidal signal, due to the availability of AIFS Single forecasts only at 6-hourly intervals. For this reason, only virtual temperature and the rotational wind component (vorticity) are spectrally nudged up to total wavenumber 21 (approximately 2,000 km length scale), using a nudging timescale of 12 hours. Moreover, we introduce a gradual ramp-up to the nudging so that full strength is reached after 12 hours of forecast lead time. This is because, at very short lead times, the un-nudged model performs better than the nudged one when initialised from the operational analysis.
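The nudging settings described above can be illustrated with a minimal sketch. It is not the IFS implementation: the function names, the linear shape of the ramp, the simple relaxation form of the tendency and the conversion from wavenumber to length scale (Earth's circumference divided by total wavenumber) are all assumptions made for illustration. Only the cut-off wavenumber (21), the nudging timescale (12 hours) and the ramp length (12 hours) come from the text.

```python
import numpy as np

TAU_HOURS = 12.0         # nudging timescale (from the text)
RAMP_HOURS = 12.0        # lead time at which full strength is reached (from the text)
N_MAX = 21               # highest nudged total wavenumber (from the text)
EARTH_RADIUS_KM = 6371.0


def ramp_factor(lead_hours):
    """Ramp from 0 to 1 over RAMP_HOURS (the linear shape is an assumption)."""
    return float(np.clip(lead_hours / RAMP_HOURS, 0.0, 1.0))


def nudging_tendency(coeffs, ref_coeffs, total_wavenumber, lead_hours):
    """Relaxation tendency (per hour) towards the reference coefficients.

    Only coefficients with total wavenumber <= N_MAX are nudged;
    smaller scales receive a zero tendency and evolve freely.
    """
    strength = ramp_factor(lead_hours) / TAU_HOURS
    nudged = total_wavenumber <= N_MAX
    return np.where(nudged, strength * (ref_coeffs - coeffs), 0.0)


def approx_wavelength_km(total_wavenumber):
    """Rough length scale: Earth's circumference divided by total wavenumber."""
    return 2.0 * np.pi * EARTH_RADIUS_KM / total_wavenumber


# Toy example: one coefficient per total wavenumber, n = 0..39.
n = np.arange(40)
model = np.zeros(40)      # hypothetical IFS spectral coefficients
reference = np.ones(40)   # hypothetical AIFS Single spectral coefficients
tendency = nudging_tendency(model, reference, n, lead_hours=6.0)
```

In this toy example, the ramp is at half strength at a lead time of 6 hours, so the nudged wavenumbers receive half of the full relaxation tendency, while all wavenumbers above 21 are left untouched.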
Results
Validation over 1.5 years of medium-range control forecasts at 9 km resolution shows that the hybrid model improves northern hemisphere skill scores by 15–20% for upper-air and surface fields (see the first figure). The biggest gains in large-scale skill occur in the tropics (up to 30% improvement, not shown). The better large-scale representation leads to better tropical cyclone (TC) track forecasting due to a more accurate steering flow. At day 5, the hybrid model reduces track errors by about 50 km (left-hand plot in the second figure) and halves the mean propagation speed error compared to the operational control forecast. Moreover, TC intensity is unaffected (right-hand plot in the second figure), unlike in deterministic ML models, which tend to produce storms that are too weak. The hybrid approach thus combines the track accuracy of the ML model with the realistic intensities of the physics-based model.


It is important to ensure that the nudging framework does not introduce unintended effects, such as degrading the conservation of axial angular momentum or total energy compared to the un-nudged model. Our checks confirm that this is not the case: both energy and angular momentum conservation are similar to, or slightly better than, those in the operational forecasts.
Outlook
At present, nudging is not applied at scales below about 2,000 km (total wavenumber 21), because the deterministic AIFS model increasingly damps features smaller than this with lead time. However, ongoing research is exploring the use of alternative loss functions for training deterministic ML models. This could help reduce the excessive smoothing of small-scale features and potentially allow nudging to extend to finer spatial scales. Additional developments include the use of hourly AIFS output to mitigate the current issue of temporal aliasing. Finally, the nudging framework is being investigated for ensemble forecasting, combining the AIFS-Continuous Ranked Probability Score (CRPS) model with the IFS ensemble system. Since the AIFS-CRPS does not exhibit the same level of small-scale smoothing as the AIFS Single, it offers a promising avenue for a more spatially detailed hybrid ensemble.
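To illustrate why hourly output could help with temporal aliasing, the sketch below samples an idealised semi-diurnal (12-hour period) signal at 6-hourly and hourly intervals and reconstructs it by linear interpolation in time. This is a toy calculation, not the method used in the nudging framework: the cosine signal, the chosen phase, the function names and the use of linear interpolation between reference states are all assumptions made for illustration.

```python
import numpy as np

PERIOD_HOURS = 12.0  # period of the semi-diurnal tide


def tide(t_hours, phase=0.0):
    """Idealised semi-diurnal signal with unit amplitude."""
    return np.cos(2.0 * np.pi * t_hours / PERIOD_HOURS + phase)


def interpolation_error(sample_step_hours, phase, total_hours=48.0):
    """Maximum absolute error of a linear-in-time reconstruction.

    The signal is sampled every sample_step_hours and linearly
    interpolated onto a 15-minute grid, then compared with the truth.
    """
    t_fine = np.arange(0.0, total_hours + 1e-9, 0.25)
    t_samples = np.arange(0.0, total_hours + 1e-9, sample_step_hours)
    reconstructed = np.interp(t_fine, t_samples, tide(t_samples, phase))
    return float(np.max(np.abs(reconstructed - tide(t_fine, phase))))


# A phase of pi/2 places every 6-hourly sample on a zero crossing.
error_6h = interpolation_error(6.0, phase=np.pi / 2.0)
error_1h = interpolation_error(1.0, phase=np.pi / 2.0)
```

With 6-hourly sampling there are only two samples per tidal period, so how much of the signal is captured depends entirely on its phase. In the case chosen here the samples miss the tide completely, and the reconstruction error is as large as the signal itself. With hourly sampling the error drops to a few percent of the amplitude.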