Severe storms in the age of machine learning

Machine learning (ML) weather models are evolving rapidly, and a key question is: how well do these systems represent the fast-evolving dynamics that define severe mid-latitude storms?

We explored this question using Storm Amy, the first named storm of the 2025–26 season for the western group of the European storm-naming initiative. Amy brought disruptive winds across northwestern Europe in early October 2025.

We compared the performance of three ECMWF forecasting systems: the physics-based Integrated Forecasting System Control forecast (IFS-CF, Cycle 49r1), the ML-based Artificial Intelligence Forecasting System deterministic forecast (AIFS Single), and the control member of the AIFS Ensemble forecast (AIFS ENS CF). To put the results in context, we also looked at Storm Éowyn, a more classical winter storm.

Capturing Storm Amy’s evolution

Storm Amy developed rapidly from the interaction of the remnants of tropical cyclones Imelda and Humberto, southwest of the North Atlantic storm track (Figure 1).

By 3 October, Amy’s centre had crossed the northern fringes of Scotland as a deepening cyclone, before stalling north of Shetland and turning towards Norway.

All three forecasting systems captured the storm track, pressure pattern, and depth of the low reasonably well for these short lead times. In particular, ML-based systems continue to perform well on these larger-scale flow features. The main differences appear in how well each system captures localised wind maxima linked, in part, to Amy’s strong frontal gradients.

Figure 1: Animation of mean sea-level pressure (MSLP) and 10 m wind speed for Storm Amy, based on forecasts initialised on 2 October 2025 at 00 UTC. The analysis is shown in the top-right panel; the bottom row displays IFS Control (IFS CF), AIFS Single, and AIFS Ensemble Control forecast (AIFS ENS CF).

Wind and pressure verification

Focusing on the passage of Amy over the region in Figure 2a, all models captured the storm’s overall intensity, but some struggled with the timing of its rapid deepening. On 3 October at 18 UTC, the AIFS Single initially underestimated the depth of the low, due in part to land influence (see also Figure 3c). Although the forecast for midnight improved, its low was still a few hectopascals shallower than in the IFS analysis. Later, the AIFS Single recovered rapidly and captured the depth of the low remarkably well during the storm’s mature phase. Overall, the ensemble systems performed better, particularly during the earlier stages of the storm’s evolution.

Four-panel figure showing time series of minimum sea‑level pressure and maximum 10‑metre wind speed for two storm periods. Each panel compares observations with analysis, physics‑based forecasts, and machine‑learning forecasts, with shaded areas showing ensemble spread and small inset maps marking station locations and data outages.

Figure 2: (a-b) Storm Amy: (a) minimum MSLP and (b) maximum 10 m wind speed forecasts over the thick black box in panel a (54°N–61.3°N, 10°W–0.5°W) from IFS CF (red), AIFS Single (green) and AIFS ENS CF (blue), initialised on 2 October 2025, 00 UTC. Shaded envelopes indicate the ensemble spread: orange shading denotes the IFS ENS 5th–95th percentile range (light) and 25th–75th percentile range (dark), while purple shading shows the corresponding percentile ranges for AIFS ENS. (c-d) Same as a-b but for Storm Éowyn over the black box in panel c (51°N–56°N, 11°W–5°W), with forecasts initialised on 22 January 2025, 00 UTC. Only stations at altitudes below 500 m were included in the observation data (solid black line).
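The box diagnostics in Figure 2 can be sketched in a few lines. This is a minimal illustration on synthetic data, not the code used to produce the figure: the function names, the array layout (time, latitude, longitude), the 0.25 degree grid and the 50-member stand-in ensemble are all assumptions made for the example. Only the box limits and the percentile levels come from the caption.

```python
import numpy as np

# Box limits quoted in the Figure 2 caption for Storm Amy (degrees; west is negative).
LAT_S, LAT_N = 54.0, 61.3
LON_W, LON_E = -10.0, -0.5


def box_extremes(mslp, wind, lats, lons):
    """Return (minimum MSLP, maximum wind speed) per time step inside the box.

    mslp, wind: arrays shaped (time, lat, lon); lats, lons: 1-D arrays in degrees.
    """
    in_lat = (lats >= LAT_S) & (lats <= LAT_N)
    in_lon = (lons >= LON_W) & (lons <= LON_E)
    sub_mslp = mslp[:, in_lat][:, :, in_lon]
    sub_wind = wind[:, in_lat][:, :, in_lon]
    return sub_mslp.min(axis=(1, 2)), sub_wind.max(axis=(1, 2))


def spread_envelopes(member_series):
    """Percentile envelopes across members; member_series is shaped (member, time)."""
    p5, p25, p75, p95 = np.percentile(member_series, [5, 25, 75, 95], axis=0)
    return {"5-95": (p5, p95), "25-75": (p25, p75)}


# Synthetic stand-in fields (NOT real forecast output) on an assumed 0.25 degree grid.
rng = np.random.default_rng(0)
lats = np.arange(50.0, 65.25, 0.25)
lons = np.arange(-15.0, 5.25, 0.25)
mslp = 1000.0 + rng.normal(0.0, 1.0, size=(3, lats.size, lons.size))
wind = np.abs(rng.normal(10.0, 3.0, size=mslp.shape))

# Plant a deep low and a wind maximum inside the box at the second time step,
# plus an even deeper value outside the box that the diagnostic must ignore.
i_in = np.argmin(np.abs(lats - 58.0))
j_in = np.argmin(np.abs(lons + 5.0))
mslp[1, i_in, j_in] = 950.0
wind[1, i_in, j_in] = 40.0
mslp[1, 0, 0] = 900.0

min_mslp, max_wind = box_extremes(mslp, wind, lats, lons)

# Hypothetical ensemble of box-minimum MSLP series: 50 members, 3 time steps.
members = rng.normal(960.0, 3.0, size=(50, 3))
envelopes = spread_envelopes(members)
```

Taking the extreme over a box rather than at a fixed point keeps the comparison tolerant of small position errors in the low, which is why a better-located low can still matter for the wind maximum inside the box.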

The wind verification shows ML forecasts underestimating peak winds. The AIFS ENS CF produced stronger values (~27 m/s) than the AIFS Single and benefited from its better-located low. However, the “observed” peaks were very likely under-captured because of sparse station coverage and, ironically, a wind-related outage of the station in Tiree, Scotland.

For Storm Éowyn, the ML systems performed more consistently. Both AIFS forecasts represented the storm’s temporal evolution more accurately than they did for Amy. Nevertheless, wind maxima from both ML models were still underestimated, despite their pressure being only a few hectopascals higher. The observed wind-speed drop on 24 January at 06 UTC reflects a station outage at Mace Head (see this Newsletter article featuring recovered data from Met Éireann).

It is also worth noting that verification only used stations below 500 m, likely missing Amy’s strongest winds at higher elevations. In addition, forecast peaks occurred over sea, underscoring a common challenge for both physics-based and ML-based forecasts in capturing extreme winds over complex terrain at current resolution.

Amy’s wind patterns suggest a sting-jet-related enhancement, prompting further analysis of how models represent storm structure. Although Amy and Éowyn followed similar synoptic evolution, they formed in different thermodynamic environments, offering a useful contrast for understanding model behaviour.

Storm structure

Applying diagnostics similar to those used in analyses of Storm Ciarán, we examined how the AIFS Single and AIFS ENS CF represented the internal structure of storms Amy (Figure 3) and Éowyn (Figure 4).

For Storm Amy, both AIFS systems captured the broad geometry correctly, including its frontal layout and deeper-core structure. However, they lacked the mesoscale dynamical sharpness seen in the IFS analysis. For the deterministic AIFS Single, this is in line with the smoother gradients and lower wind extremes documented in earlier evaluations of deterministic ML-forecast models trained with a mean-squared-error (MSE) loss or similar. In contrast, the AIFS ENS CF (a product of probabilistic training) shows increased fine-scale spatial variability in the pressure field, resulting in more detailed but locally noisier contours, which appear to contribute to the disjointed nature of its weak vorticity field.

Multi-panel maps comparing Storm Amy analysis and forecasts, showing pressure, wind speeds, and vorticity at two lead times; ML models capture structure but appear smoother and less intense than IFS.

Figure 3: Storm Amy. (a-d) Maps of 850 hPa wind speed above 36 m/s (shading), MSLP (thin black contours, 960 hPa in bold), 700 hPa relative humidity (grey shading over regions above 80%), and 850 hPa wet-bulb potential temperature (θ_w; contours at 2 °C intervals, coloured blue for θ_w = 2 °C to 10 °C inclusive and orange for θ_w ≥ 12 °C). Green shading shows the vertical component of relative vorticity at 850 hPa. Fields are valid at 3 October 2025, 18 UTC, from the IFS Control run analysis (a) and T+42 forecasts from the IFS CF (b), AIFS Single (c) and AIFS ENS CF (d), which all start from the aforementioned analysis, albeit at lower-resolution equivalents in the case of (c) and (d). (e-h) Same as a-d but valid at 4 October 2025, 00 UTC, with f-h being from the same forecast data time (i.e. now T+48 fields). For simplicity and computational reasons, all models in this figure were regridded to a regular 0.25 degree latitude/longitude grid before calculating the fields shown on the map. This makes vorticity values directly comparable. Methods follow Charlton-Perez et al., 2024.
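To make the role of the common grid concrete, the sketch below computes the vertical component of relative vorticity from wind components already on a regular latitude/longitude grid, using centred finite differences in spherical coordinates. It is an assumed, simplified stand-in for the diagnostic in Figure 3, not the method of Charlton-Perez et al.: the function name, the Earth radius value and the finite-difference scheme are choices made for this example, and the regridding step itself is taken as already done. The test wind field is solid-body rotation, for which the vorticity is known analytically.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (assumed value for this sketch)


def relative_vorticity(u, v, lats_deg, lons_deg):
    """Vertical relative vorticity (s^-1) on a regular lat/lon grid.

    u, v: zonal and meridional wind (m/s), shaped (lat, lon).
    Uses zeta = [dv/dlambda - d(u cos(phi))/dphi] / (a cos(phi)).
    Not valid at the poles, where cos(phi) vanishes.
    """
    phi = np.deg2rad(lats_deg)
    lam = np.deg2rad(lons_deg)
    cos_phi = np.cos(phi)[:, None]
    dv_dlam = np.gradient(v, lam, axis=1)
    ducos_dphi = np.gradient(u * cos_phi, phi, axis=0)
    return (dv_dlam - ducos_dphi) / (EARTH_RADIUS_M * cos_phi)


# Check against solid-body rotation, u = U cos(phi), v = 0,
# whose exact vorticity is 2 U sin(phi) / a.
lats = np.arange(40.0, 70.25, 0.25)
lons = np.arange(-30.0, 10.25, 0.25)
U = 20.0
ones_lon = np.ones((1, lons.size))
u = U * np.cos(np.deg2rad(lats))[:, None] * ones_lon
v = np.zeros_like(u)

zeta = relative_vorticity(u, v, lats, lons)
expected = 2.0 * U * np.sin(np.deg2rad(lats))[:, None] / EARTH_RADIUS_M * ones_lon
```

Because finite differences average over the grid spacing, the same wind field differenced on a finer grid would generally show sharper vorticity maxima. Differencing every model on the same 0.25 degree grid removes that effect from the comparison.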

We see a similar pattern for Storm Éowyn, with both ML systems reproducing the synoptic geometry realistically. The AIFS Single showed the smoothest features, but still gave a coherent, well-organised depiction of the storm, while the AIFS ENS CF showed more fluctuations (noise) in the contours. As Éowyn crossed Ireland (T+54), both AIFS forecasts showed a weakening of 850 hPa winds near the centre over land, unlike the IFS or the analysis. This occurs despite the central pressure of the low in those simulations being only about 4 hPa too shallow, and at a time when high-impact winds were affecting Ireland.

Both AIFS systems exhibit reduced dynamical sharpness expressed through weaker vorticity cores, smoother frontal gradients, and attenuated 850 hPa wind maxima near the cyclone centre and when the storm crosses land. Despite these limitations, the ML models reliably captured the storm’s overall synoptic structure, even under rapidly evolving conditions.

Multi-panel maps comparing Storm Éowyn analysis and forecasts, showing pressure, wind speeds, and vorticity at two lead times; ML models capture structure but appear smoother and less intense than IFS.

Figure 4: Same as Figure 3 but for Storm Éowyn with forecasts initialised on 22 January 2025, 00 UTC at steps T+48 (a-d) and T+54 (e-h). Magenta contours show θ_w = 0 °C.

What these storms tell us about forecast capabilities

Systematic intensity differences arise naturally from the structural design of the AI forecasting systems. The AIFS Single, trained with an MSE loss, favours smooth, spatially averaged representations of fields. This enhances large-scale forecast performance but dampens sharp gradients and narrow wind maxima typical of rapidly evolving extratropical storms. The AIFS ENS CF, trained using the almost fair continuous ranked probability score (afCRPS), captures forecast uncertainty and often depicts the extremes better than the AIFS Single. However, both models remain constrained by ~32 km grid spacing, six-hourly output, and the absence of a wind-gust diagnostic. As a stochastic system designed to sample from a learned probability distribution of atmospheric states, AIFS ENS CF is not directly comparable to deterministic forecasts such as the AIFS Single or IFS Control; a more appropriate comparison considers the ensemble distribution rather than a single realisation.
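A toy example helps show why an MSE loss favours smooth fields. Suppose a narrow wind maximum is certain to occur but its position is uncertain. The single forecast that minimises the mean squared error over all plausible outcomes is their average, which is broader and much weaker than any individual outcome. The sketch below is purely illustrative and is not AIFS training code: the one-dimensional transect, the Gaussian jet shape, and the width, peak and position-spread values are all assumptions.

```python
import numpy as np

# One-dimensional transect across a hypothetical wind maximum (distances in km).
x = np.linspace(0.0, 1000.0, 501)
JET_WIDTH_KM = 30.0       # assumed width of the narrow wind maximum
JET_PEAK_MS = 40.0        # assumed peak wind speed (m/s)
POSITION_SPREAD_KM = 60.0  # assumed uncertainty in where the maximum sits

# Sample many equally plausible "truths" that differ only in jet position.
rng = np.random.default_rng(42)
centres = rng.normal(500.0, POSITION_SPREAD_KM, size=2000)
truths = JET_PEAK_MS * np.exp(
    -0.5 * ((x[None, :] - centres[:, None]) / JET_WIDTH_KM) ** 2
)


def expected_mse(forecast):
    """Mean squared error of one forecast against all sampled truths."""
    return np.mean((truths - forecast[None, :]) ** 2)


# The forecast minimising MSE over the sample is the pointwise mean.
mse_optimal = truths.mean(axis=0)
# A sharp alternative: one realistic-looking realisation.
sharp_guess = truths[0]

peak_smooth = mse_optimal.max()
peak_sharp = sharp_guess.max()
mse_smooth = expected_mse(mse_optimal)
mse_sharp = expected_mse(sharp_guess)
```

The averaged forecast scores better on MSE even though its peak is far below the assumed 40 m/s, while the sharp realisation is penalised twice: once for placing a maximum where none verified, and once for missing the maximum that did. A probabilistic objective evaluated on an ensemble does not reward this averaging in the same way, which is consistent with the sharper extremes described for the AIFS ENS CF.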

Storms Amy and Éowyn illustrate a consistent pattern in current ML-based forecasts:

Large-scale structures such as cyclone tracks, frontal patterns, and synoptic evolution are well predicted, with the ML models routinely outperforming the IFS.
Fine-scale dynamical features such as compact vorticity maxima, sharp fronts, local jets, and sting-jet-related structures are often smoothed or weakened.
Probabilistic training can improve realism but cannot fully compensate for the coarser spatial resolution compared with the IFS.
In the AIFS ENS, small-scale energy is elevated, leading to enhanced high-frequency variability in the fields.
At longer lead times, the AIFS Single becomes progressively smoother.

Although forecasts vary from one initialisation to another, these features are regularly seen.
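One simple way to quantify the smoothing and small-scale energy described in this list is the fraction of a field's variance held at high wavenumbers. The sketch below applies this to a synthetic periodic transect and to smoothed and noisier versions of it. It is an assumed diagnostic for illustration only: the function name, the cutoff wavenumber, the running-mean width and the noise level are hypothetical choices, and none of the signals are model output.

```python
import numpy as np


def small_scale_energy_fraction(field, cutoff_wavenumber):
    """Fraction of variance at wavenumbers >= cutoff in a periodic 1-D field."""
    anomalies = field - field.mean()
    power = np.abs(np.fft.rfft(anomalies)) ** 2
    # rfft keeps only non-negative wavenumbers, so double the interior bins.
    power[1:] *= 2.0
    if field.size % 2 == 0:
        power[-1] /= 2.0  # the Nyquist bin has no mirrored partner
    return power[cutoff_wavenumber:].sum() / power[1:].sum()


# Synthetic periodic transect: two large-scale waves plus one small-scale wave.
n_points = 256
x = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
reference = np.sin(x) + 0.5 * np.sin(3.0 * x) + 0.2 * np.sin(20.0 * x)

# "Smooth" variant: periodic 9-point running mean of the reference.
smooth = np.mean([np.roll(reference, k) for k in range(-4, 5)], axis=0)

# "Noisy" variant: reference plus grid-scale random fluctuations.
rng = np.random.default_rng(1)
noisy = reference + rng.normal(0.0, 0.2, size=n_points)

CUTOFF = 10  # assumed boundary between large and small scales
frac_smooth = small_scale_energy_fraction(smooth, CUTOFF)
frac_reference = small_scale_energy_fraction(reference, CUTOFF)
frac_noisy = small_scale_energy_fraction(noisy, CUTOFF)
```

In this toy setup the smoothed signal holds the smallest share of its variance at small scales and the noisy signal the largest, mirroring the qualitative contrast drawn above between a progressively smoother deterministic forecast and an ensemble member with elevated small-scale energy.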

As ECMWF develops higher-resolution AIFS configurations, improved training strategies, and hybrid modelling approaches, many of these gaps are likely to narrow in the near future. For now, however, traditional numerical weather prediction, particularly the IFS, remains essential for resolving the fine-scale processes responsible for the most damaging winds of severe mid-latitude storms.

DOI

10.21957/69f0f99fdd