The AI Weather Quest: spotlighting the best of machine learning sub-seasonal prediction

13 March 2026
Joshua Talib
Olga Loegel
Frédéric Vitart
Jörn Hoffmann
Jakob Schloer

The AI Weather Quest is an ambitious international initiative led by ECMWF to evaluate the potential of machine learning (ML) methods for sub-seasonal prediction.

Forecasting on sub-seasonal timescales (two to six weeks) is notoriously difficult because the predictive skill available from initial conditions and from slowly evolving environmental conditions is limited. The Quest brings together teams from around the world to enable knowledge exchange across forecasting communities, to advance ML-based Earth system modelling within the Destination Earth (DestinE) initiative, and to benchmark ML forecasting approaches within an operational-style environment. The initiative has now reached the midpoint of its first competitive year.

Over the past six months, participants have submitted global quintile-based probabilistic forecasts at three- and four-week lead times for near-surface temperature, mean sea level pressure, and accumulated precipitation. 
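
To make this submission format more concrete, the short sketch below (not the Quest's actual submission code) shows one way an ensemble forecast could be converted into quintile probabilities, with each grid point receiving the fraction of members falling into each of five climatologically equally likely categories. The array shapes, the synthetic climatology and the function name are illustrative assumptions.

```python
import numpy as np

def quintile_probabilities(ensemble, quintile_edges):
    """Convert an ensemble forecast into probabilities for five
    climatologically equally likely categories.

    ensemble       : array (n_members, n_lat, n_lon), raw forecast values
    quintile_edges : array (4, n_lat, n_lon), the 20th/40th/60th/80th
                     climatological percentiles at each grid point
    Returns an array (5, n_lat, n_lon); the five categories sum to 1.
    """
    n_members = ensemble.shape[0]
    # Count how many quintile edges each member exceeds -> category index 0..4.
    category = (ensemble[:, None, :, :] > quintile_edges[None, :, :, :]).sum(axis=1)
    return np.stack([(category == k).sum(axis=0) / n_members for k in range(5)])

# Toy example: an 11-member ensemble on a 2x3 grid with a synthetic climatology.
rng = np.random.default_rng(0)
ens = rng.normal(size=(11, 2, 3))
edges = np.quantile(rng.normal(size=(500, 2, 3)), [0.2, 0.4, 0.6, 0.8], axis=0)
print(quintile_probabilities(ens, edges).sum(axis=0))  # each grid point sums to 1
```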

We’ve previously introduced the Quest’s design and evaluation framework in a science blog and accompanying publication.

Here we discuss who has taken part, what modelling approaches have been used, and whether initial results show that ML models outperform traditional dynamical approaches. We focus our analysis on the first two 13-week competitive periods: September to November 2025 (SON 2025) and December 2025 to February 2026 (DJF 2025/26).

Competitive participation in the first two periods

A key aim of the Quest is to bring together a diverse community of participants. Since the competition began, 42 teams have submitted at least one forecast for at least one variable or lead time. Throughout the first six months, weekly participation never fell below 25 teams. While most submissions come from Europe, China, and the United States, participation has also come from Niger, Morocco, Kenya, South Africa, Peru, and South Korea (Figure 1a), reflecting the increasingly global reach of ML approaches to forecasting. 

The Quest has attracted a broad mix of contributors, from universities and research institutes to technology companies (Figure 1b). Perhaps surprisingly, meteorological institutions account for only a small share of contributors, suggesting that operational centres may benefit from greater support in adopting and experimenting with ML-based forecasting tools for sub-seasonal timescales.

Each team can submit forecasts from up to three individual models. So far, the Quest has received forecasts under 74 uniquely named models, though several may share similar underlying methodologies. All submitted forecasts can be explored through our dedicated forecast portal.

At the end of each competitive period, teams are invited to provide summaries of their model design. These descriptions, available on individual team web pages, offer insight into the range of infrastructures and approaches used for ML-based forecasting.

Bar chart “Number of teams by region/country”: China 10, Europe 15, USA 11, Other 6.
Pie chart “Teams by organisation type (%)”: Research Organisation 33.3%, Student 26.2%, Meteorological Institution 11.9%, Small to Medium Enterprise 11.9%, Large Tech Company 9.5%, Other 7.1%.

Figure 1: Summary of the characteristics of teams that contributed to the AI Weather Quest during the first two competitive periods. (a) Number of teams grouped by geographic region. The Europe bar is subdivided to distinguish teams in EU member states (striped) from those elsewhere in Europe (solid). (b) Distribution of teams by organisation type. Team location and organisation type are based on the team leader's details. A team is counted if it contributed at least one forecast for any variable or lead time.

An overview of model types 

Figure 2 shows the distribution of model types among systems that forecasted all target variables over a full 13-week evaluation period. Each model is assigned to one of four methodological categories (Table 1), including purely data‑driven systems, ML-based post‑processing, and hybrid approaches.

Across the two competitive periods so far, 39 models meet the ‘period-aggregated’ evaluation criteria. Around half of these models rely exclusively on data‑driven techniques, while the remainder consist of post‑processing systems and hybrid ML-dynamical approaches. This model diversity highlights both the rapid progress in data‑driven prediction and the continued value of combining ML with physics-based forecasting systems. 

In the remainder of this blog, we highlight standout contributions to the Quest and present initial comparisons between ML-based and dynamical forecasting systems.

Bar chart of model counts by approach: Data-driven 19 systems, Post-processing 8, Hybrid 7, Unknown 4.

Figure 2: Total number of models eligible for variable‑averaged, period‑aggregated scores for either the SON 2025 or DJF 2025/26 period, grouped by modelling approach (data‑driven, post‑processing, hybrid, or unknown) inferred from participant‑provided model descriptions.


Table 1: Classification of participating models by methodological approach

Data-driven: Forecasts generated solely from historical observations or reanalysis using statistical or machine-learning methods, without real-time dynamical model inputs. Examples: Team AIFS contributions, including AIFSgaia, AIFShera and AIFSthalassa.
Post-processing: ML or statistical methods used to recalibrate, bias-correct, or combine outputs from dynamical forecast systems, with the dynamical model providing the primary forecast signal. Examples: Team MicroEnsemble contributions, including Huracan, StillLearning, and MicroDuet.
Hybrid: Approaches that integrate dynamical forecasts and historical observations within a unified ML framework. Example: Team CMA and FDU contribution FengshunHybrid.
Unknown: Models for which no description was provided by participants.

Spotlighting top performers

Throughout the Quest, forecasts are evaluated against the near real‑time ERA5T reanalysis using a climatology-based ranked probability skill score (RPSS) metric. Mean sea level pressure is evaluated globally, whereas temperature and precipitation are assessed over land only. A full description of the evaluation framework is available on our website.
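
For readers unfamiliar with the metric, the sketch below outlines a simplified RPSS calculation for quintile forecasts against a climatological reference that assigns 20% probability to each category. The Quest's operational evaluation adds further steps (for example land-only evaluation for temperature and precipitation, and period aggregation) described in the framework linked above, so this function is illustrative rather than the official scoring code.

```python
import numpy as np

def rps(prob_forecast, obs_category, n_categories=5):
    """Ranked probability score for one quintile forecast.

    prob_forecast : length-5 array of category probabilities (sums to 1)
    obs_category  : index (0..4) of the observed quintile
    """
    obs = np.zeros(n_categories)
    obs[obs_category] = 1.0
    # RPS compares the cumulative forecast and observed distributions.
    return np.sum((np.cumsum(prob_forecast) - np.cumsum(obs)) ** 2)

def rpss(prob_forecasts, obs_categories):
    """RPSS = 1 - RPS_forecast / RPS_climatology, averaged over cases.

    The climatological reference assigns 0.2 to every quintile, so
    positive values indicate skill above climatology.
    """
    clim = np.full(5, 0.2)
    rps_fc = np.mean([rps(p, o) for p, o in zip(prob_forecasts, obs_categories)])
    rps_cl = np.mean([rps(clim, o) for o in obs_categories])
    return 1.0 - rps_fc / rps_cl

# A sharp forecast placing 70% in the observed quintile beats climatology.
print(rpss([np.array([0.05, 0.1, 0.7, 0.1, 0.05])], [2]))  # ~0.88
```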

Online leaderboards are updated every Friday with the latest RPSSs, ensuring a transparent and consistent view of forecast performance throughout the competition. 

Figure 3 shows period-aggregated RPSSs, averaged across all three forecasted variables, for the top-performing model from each of the top 10 teams at each lead time and competitive period. For comparison, we also show the corresponding RPSS for ECMWF’s dynamical Integrated Forecasting System (IFS) and a dynamical multi‑model mean. Details of the dynamical forecast and re-forecast configuration are provided on an accompanying Confluence page.

Four-panel chart: variable-averaged RPSSs for the top 10 teams’ best models, for SON 2025 and DJF 2025/26 at week-3 and week-4 lead times, with reference lines marking the dynamical multi-model mean and the ECMWF IFS.

Figure 3: Variable‑averaged, period‑aggregated RPSSs for the best-performing model from each of the top 10 teams, shown for two competitive periods and two forecast lead times. The top and bottom rows show results for SON 2025 and DJF 2025/26 respectively, evaluated at lead times of (left) three and (right) four weeks. Vertical lines show the corresponding RPSSs for a (red) dynamical multi‑model mean (computed using six dynamical models) and (dashed orange) ECMWF’s IFS. Positive values indicate skill above climatology, whilst negative values indicate lower skill. Model types (Table 1) are distinguished using fill patterns: plain fill for data‑driven models, dotted hatching for post‑processing models, and striped hatching for hybrid models. Models appear in the legend by rank, cycling through lead time and competitive period.

During the first competitive period, five models achieved positive skill, meaning their forecasts outperformed climatology, at both lead times (Figures 3a and 3b). In the following period, this increased to seven models (Figures 3c and 3d). Along with higher RPSSs for lower-ranked teams during DJF compared with SON, this suggests that teams continued refining and improving their forecasting systems as the Quest progressed.

Across both periods and lead times, the post‑processing model MicroDuet from team MicroEnsemble consistently led the leaderboard. MicroDuet integrates an ensemble transformer-based post-processing method with an adaptive bias correction applied to ECMWF’s IFS sub-seasonal forecasts. Several data‑driven models, including contributions from teams AIFS and CMAandFDU, also demonstrated notable skill, while hybrid approaches such as FengshunHybrid and LPM achieved competitive performance. An overview of the methodologies from these teams is presented in our end-of-period awards webinars.
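
MicroDuet’s exact pipeline is presented by the team themselves; purely to illustrate the general idea of adaptive bias correction, and not their implementation, the sketch below nudges a raw dynamical ensemble towards recent verifying analyses using an exponentially weighted error estimate. The update rule, window and variable names are assumptions made for this example.

```python
import numpy as np

def adaptive_bias_correction(raw_ensemble, past_forecast_means, past_observations,
                             alpha=0.3):
    """Illustrative adaptive bias correction (not MicroDuet's actual method).

    raw_ensemble        : (n_members, ...) latest dynamical ensemble forecast
    past_forecast_means : (n_weeks, ...) ensemble-mean forecasts from recent weeks
    past_observations   : (n_weeks, ...) verifying analyses for those weeks
    alpha               : weight given to the most recent error estimate
    """
    bias = np.zeros_like(past_observations[0], dtype=float)
    for fc, ob in zip(past_forecast_means, past_observations):
        # Exponentially weighted running estimate of the systematic forecast error.
        bias = (1 - alpha) * bias + alpha * (fc - ob)
    # Shift every ensemble member by the estimated bias before forming quintiles.
    return raw_ensemble - bias
```

A learned post-processing system would replace this simple running estimate with a trained model conditioned on the forecast itself, but the overall flow of estimating and removing systematic error is similar in spirit.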

Did ML-based forecasts outperform dynamical models?

A key question for the Quest is whether ML-based forecasting systems can significantly outperform dynamical systems. Figure 4 shows weekly RPSSs for each variable and lead time across all weeks within the first two competitive periods. We focus on the best-performing forecasting system in each category (Table 1): data-driven (AIFShera), post-processing (MicroDuet), hybrid (FengshunHybrid) and dynamical (ECMWF IFS).

Overall, MicroDuet significantly outperforms the ECMWF IFS for all variables and lead times. In contrast, the best data‑driven and hybrid models show limited significant improvements over the dynamical system, except for precipitation at both lead times. 

These initial results suggest that while ML approaches can enhance sub‑seasonal forecast skill, further development is needed before fully data‑driven systems reliably and significantly outperform dynamical prediction models. Notably, very few ML models significantly surpass the skill of leading dynamical systems, and those that do rely on dynamical inputs through post‑processing or hybrid designs. This highlights the continued effort required to develop truly data‑driven forecasting systems capable of outperforming dynamical modelling approaches.

Six-panel chart: weekly RPSS time series for MicroDuet, AIFShera, FengshunHybrid, and the ECMWF IFS over SON 2025 and DJF 2025/26, for near-surface air temperature, mean sea level pressure, and accumulated precipitation at days 19–25 and days 26–32.

Figure 4: Weekly RPSSs over the first two competitive periods for the top-scoring model in each model type: (dark blue) MicroDuet, a post‑processing approach; (green) AIFShera, a data‑driven ML system; (orchid) FengshunHybrid, a hybrid method; and (yellow) the ECMWF IFS, a dynamical model. Scores are shown for (top) near‑surface air temperature, (middle) mean sea level pressure, and (bottom) accumulated precipitation, evaluated at (left) three-week and (right) four-week lead time. For each variable and lead time, correlation coefficients between each ML model and the ECMWF IFS are shown in the bottom-right corner of each panel. Scatter markers are filled when the model’s weekly RPSS differs significantly from that of the ECMWF IFS at the 0.05 level according to the paired Wilcoxon signed‑rank test; otherwise markers are unfilled.
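
To make the significance test in the caption concrete, comparing two weekly RPSS series with SciPy’s paired Wilcoxon signed-rank test might look like the sketch below; the score arrays are random placeholders rather than actual competition results.

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder weekly RPSS series; in practice the 26 weeks spanning
# SON 2025 and DJF 2025/26 would be used for each variable and lead time.
rng = np.random.default_rng(1)
rpss_ml = rng.normal(loc=0.10, scale=0.05, size=26)
rpss_ifs = rng.normal(loc=0.05, scale=0.05, size=26)

# Paired two-sided test on the weekly score differences.
stat, p_value = wilcoxon(rpss_ml, rpss_ifs)
print(f"statistic={stat:.1f}, p={p_value:.3f}, significant={p_value < 0.05}")
```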

Future challenges and opportunities in ML sub-seasonal forecasting

The first two competitive periods of the Quest have highlighted both the progress and remaining challenges in ML-based sub-seasonal prediction. Post-processing approaches show potential to substantially enhance sub-seasonal forecast skill. However, for near-surface temperature and mean sea level pressure, data-driven systems still need further development to consistently outperform widely used dynamical models. Additionally, a key challenge remains in translating these improvements into actionable guidance issued by national meteorological services that regularly monitor sub-seasonal forecasts.

Opportunities to get involved in the AI Weather Quest

There are still several ways to engage with the AI Weather Quest and its growing network of researchers, developers and practitioners.

First, you can hear directly from some of the top-performing teams in our upcoming DJF Awards Webinar. MicroEnsemble, LP, and AIFS will present their approaches and share insights into the ML methods behind their systems, along with lessons learned from the competition.

Researchers also have the opportunity to contribute to a forthcoming joint special issue of the Royal Meteorological Society journals International Journal of Climatology and Meteorological Applications. The issue, titled Advances in Machine Learning for Weather and Climate: Modelling, Forecasting, and Applications, will bring together peer-reviewed research emerging from the Quest and the wider community working on ML approaches for weather and climate prediction.

Finally, it’s not too late to take part in the competition itself. The third competitive period of the AI Weather Quest is currently under way, and the fourth will begin on 14 May 2026. New teams are encouraged to join and test whether their models can outperform current dynamical and ML-based forecasting systems.

For more information and to join, visit the AI Weather Quest website.


Acknowledgements

All authors and the organisation of the AI Weather Quest are supported by funding from the European Union, provided to ECMWF under the Contribution Agreement between the European Union, represented by the European Commission, and ECMWF on the implementation of the Destination Earth (DestinE) initiative.

We gratefully acknowledge the hard work and enthusiasm of all AI Weather Quest participants. We also extend our thanks to the Advisory Board, whose interdisciplinary expertise in AI and meteorology has helped shape the structure and direction of this challenge.

DOI
10.21957/0fddea4898