ECMWF Newsletter #184

Data assimilation workshop probes traditional and machine learning methods

Massimo Bonavita

Traditional data assimilation aims at a statistically optimal blending of observations and model information to provide the best estimate of the state of the Earth system and its uncertainty, for both monitoring and prediction purposes. The recent advent of machine learning (ML) in numerical weather prediction (NWP) and climate studies has shown that data assimilation and observations can be used to directly improve NWP and climate prediction capabilities, beyond providing initial conditions for both physics-based and data-driven forecast models. A two-day workshop held in Bonn on 9 and 10 April 2025 discussed current development directions in traditional data assimilation and observations for NWP and climate, together with the emerging area of hybridising aspects of the data assimilation workflow with machine learning technologies, or even fully replacing data assimilation with them.

The participants. The workshop involved 23 presentations and had 88 participants.

Main themes

During the first day of the workshop, attention was focused on the current status of Earth system data assimilation systems in major operational NWP centres; the sources and limits of weather predictability and how this could be extended by further gains in the accuracy of initial conditions; and the prospects for improvements in this area from, e.g., advances in resolution, model complexity, coupling, and new observations.

The talks by Massimo Bonavita, George Craig (Meteorological Institute, LMU Munich) and Nedjeljka Žagar (University of Hamburg) discussed the prospects and challenges of bridging the gap between current predictive capabilities (about nine days) and theoretical predictability limits for weather (about two weeks). Improved initial conditions, especially in the tropics, were highlighted as probably the biggest driver of future improvements in predictive capabilities.

One recurrent theme in the talks by representatives of NWP centres present at the workshop (the German National Meteorological Service, DWD; Météo-France; the UK Met Office; the HIRLAM Consortium) was the drive to improve the use of current and future observations in the data assimilation workflow. Two developments are necessary to achieve this objective: an increasingly accurate representation of observation errors (Sarah Dance, University of Reading) and a concurrent increase in the spatial resolution and temporal frequency of the analysis updates (Žiga Zaplotnik and Emiliano Orlandi).

The second day of the workshop was devoted to discussing the rapidly emerging area of hybridising traditional data assimilation methodologies with ML techniques. As discussed by Marc Bocquet (ENPC and ECMWF Fellow) in his introductory talk, the field is rapidly evolving, and it is still too early to say which of the many competing proposals will ultimately become the new paradigm for data assimilation.

Broadly speaking, one large area of development can be framed as ML-aided data assimilation and prediction. Here, ML tools are used to improve the quality and/or the computational efficiency of parts of the standard NWP workflow. Examples of this kind of activity were discussed in the talks of Alban Farchi and Marcin Chrust (model error estimation and correction); Elias Holm and Wei Pan (building an ML emulator for the ECMWF Ensemble of Data Assimilations); Vincent Chabot (Météo-France; estimation of uncertainties in the data assimilation cycle); Alberto Carrassi (University of Bologna; hybrid sea-ice data assimilation and modelling); and Alan Geer (hybrid physical and data-driven observation models). In this area, the importance and the challenges of maintaining physical consistency in the resulting analyses were stressed by many speakers, e.g. Tijana Janjic (MIDS, KU Eichstätt-Ingolstadt).

At the other end of the spectrum, ML can be viewed as a general-purpose tool that can provide end-to-end solutions to both state estimation and state prediction problems. In this area, Jan Keller (DWD) provided an overview of DWD's long-term strategy of transitioning from the current NWP-based data assimilation and forecast system to a fully data-driven one. An even more disruptive idea was presented by ECMWF's Mihai Alexe in his talk on the DOP (Direct Observation Prediction) project, where ML is used to construct a generalised regression tool which, starting from recent and current observations, can extrapolate to predict their future values.

Wide range of views

A panel discussion concluded the second day of the workshop, with the participation of all speakers and the on-site and online audiences. Given the wide range of views expressed in the talks, it was not surprising that a consensus on the way forward proved hard to reach. At the same time, and for similar reasons, it was also clear that data assimilation remains central to the success of the forecasting enterprise and is one of the areas where rapid and exciting progress can be expected. For more details, consult the workshop page on the ECMWF website: https://events.ecmwf.int/event/428/.