Machine learning and physics in weather forecasting: a discussion between Alan Thorpe and Florian Pappenberger

Illustration: two people in conversation. Image: © Elena Medvedeva/iStock/Getty Images Plus

In the rapidly evolving field of meteorology, the integration of machine learning (ML) techniques with traditional numerical weather prediction (NWP) presents both exciting opportunities and complex challenges.

This dialogue between Alan Thorpe (former Director-General at ECMWF, Professor at the University of Reading and co-host of The WeatherPod videocast and podcast) and Florian Pappenberger (Deputy Director-General and Director of Forecasts and Services at ECMWF) explores the landscape of this integration, examining the potential synergies, philosophical divergences, and practical implications of using ML tools for weather forecasting.

Alan Thorpe (left) and Florian Pappenberger (right). 

Part 1: The challenge for weather forecasting

Alan: ML models for weather forecasting are based on neural networks trained on extensive databases of past weather states which come from reanalyses like ERA5 (a combination of observations and short-range NWP forecasts). The resulting ML models are initialised with NWP-based operational analyses of current weather conditions to create an ML prediction, which can be compared to an NWP-based forecast. Figure 1 shows the process for ECMWF’s Artificial Intelligence Forecasting System (AIFS).

Training of the AIFS

Figure 1: The principles of training ECMWF’s AIFS – one example of an ML-based weather prediction model. The weather as recorded in ERA5 is sampled, and ‘input’ and ‘output’ separated by six hours are batched together. These batches are used to train the AIFS so that it can make predictions six hours ahead.
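The pairing and batching described in Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AIFS code: the toy two-variable `archive` stands in for ERA5, a least-squares fit stands in for neural-network training, and the names `make_training_pairs`, `iterate_batches` and `rollout` are hypothetical.

```python
import numpy as np


def make_training_pairs(states, lag_steps=1):
    """Pair each archived state with the state `lag_steps` records later."""
    if lag_steps < 1:
        raise ValueError("lag_steps must be at least 1")
    return states[:-lag_steps], states[lag_steps:]


def iterate_batches(inputs, targets, batch_size, rng):
    """Yield shuffled (input, output) batches for one pass over the data."""
    order = rng.permutation(len(inputs))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield inputs[idx], targets[idx]


def rollout(step_fn, initial_state, n_steps):
    """Apply a six-hour step repeatedly to reach longer lead times."""
    state, trajectory = initial_state, [initial_state]
    for _ in range(n_steps):
        state = step_fn(state)
        trajectory.append(state)
    return np.stack(trajectory)


# Toy archive (assumption): a 2-variable state that rotates by a fixed
# angle every six hours. It stands in for a reanalysis such as ERA5.
angle = 0.3
true_step = np.array([[np.cos(angle), -np.sin(angle)],
                      [np.sin(angle), np.cos(angle)]])
archive = [np.array([1.0, 0.0])]
for _ in range(200):
    archive.append(true_step @ archive[-1])
archive = np.stack(archive)

# 'Input' and 'output' are consecutive records, i.e. six hours apart.
inputs, targets = make_training_pairs(archive)

# In real training each batch would feed a gradient-based optimiser;
# here the batches are only counted to show the mechanics.
rng = np.random.default_rng(0)
n_batches = sum(1 for _ in iterate_batches(inputs, targets, 32, rng))

# Stand-in for training (assumption): least squares, not a neural network.
weights = np.linalg.lstsq(inputs, targets, rcond=None)[0]

# A 24-hour prediction is four successive six-hour steps.
forecast = rollout(lambda state: state @ weights, archive[0], n_steps=4)
```

The point of the sketch is the data flow: the model only ever learns a six-hour step, and longer forecasts come from feeding its own output back in as the next input.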

Alan: From this, I think it is important to recognise that current ML predictions require state-of-the-art NWP models.

Florian: Yes, I agree, although there is much current research exploring ML models and how they can be designed and operated to produce predictions. Also, we should remember that these predictions, be they NWP or ML based, are the raw data from which forecasts for a wide range of users are produced. Weather forecasts need to be as accurate, usable, and reliable as possible.

Alan: I absolutely agree, and I would add a further attribute – trustworthiness. These attributes represent the scientific challenge facing the enterprise of weather forecasting, but they are far from straightforward to measure. I believe that underpinning all of these is the requirement for weather prediction data to be dynamically and physically consistent; that is, they should satisfy the laws of physics. What do you feel about that, Florian?

Florian: My only hesitation in answering in the affirmative is that many users probably do not always recognise or care about dynamical and physical consistency. Could weather predictions be accurate enough even if they are not fully consistent? Measuring dynamical and physical consistency is itself not an easy matter. For NWP, satisfying the laws of physics, at least as far as we know them, is the basis of the algorithm. However, for ML predictions, physical consistency must arise in a different way.

Alan: I am sure we will come back to this during our discussions! 

Part 2: The integration challenge and the essence of ML in weather prediction

Integrating ML and NWP

Alan: I’m a little hazy about what integration of ML prediction and NWP means. Does it mean developing and producing ML and NWP predictions side by side? This sounds like a big increase in the operational and computational resources required. 

Florian: Yes and no. While integration does indeed imply developing ML and NWP predictions side by side, it encompasses more than just running two parallel systems.

Integration would also involve selectively replacing parts of the NWP forecast chain with ML approaches where they can offer improvements or efficiencies (e.g. replacing the radiative transfer physics scheme with a neural network).

GPU-based computing, which excels in handling large parallel computations efficiently, is ideal for the kind of data-intensive tasks ML involves. On the other hand, NWP models currently rely more heavily on CPU-based systems, which are better suited for the sequential processing tasks that are common in traditional forecasting models.

So, integration is not just about more resources; it involves smarter, targeted use of different technologies. We might see a shift where data flows are optimised differently, depending on whether the task is more suitable for ML or NWP. This could mean rethinking how and where processing happens, perhaps moving some processes closer to data sources to reduce latency and increase speed. Such changes will influence how quickly and efficiently we can incorporate new scientific discoveries into operational forecasts.

Incorporating scientific advances

Alan: NWP has progressed over the years as new scientific, computational and observational advances have been absorbed into operational systems on a regular basis – sometimes this is called the ‘quiet revolution’. I think ECMWF is now on Cycle 49 of its Integrated Forecasting System (IFS). So, Florian, I wonder how new scientific discoveries used within physics-based NWP can be efficiently transferred into operational ML models. It seems to me that this presents a substantial challenge. For instance, when a new piece of physics is added to a physics-based NWP model, how do we ensure this knowledge is effectively incorporated into the ML model without delay?

Florian: ECMWF’s current strategy involves initially training our ML model on ERA5 data, followed by fine-tuning with operational analyses. This method allows us to rapidly incorporate new data and even significant shifts in model versions, including resolution changes.

Alan: I wonder though whether incremental additions of data (arising from new science) into the ML training dataset might either conflict with the pre-existing reanalysis data or be swamped by the volume of pre-existing training data. This could potentially slow down the integration of new scientific knowledge. You mentioned fine-tuning, but could you elaborate on how this process helps in aligning the ML models with the latest scientific findings?

Florian: Fine-tuning is a critical technique in ML where we adjust a pre-trained model to better suit specific tasks or incorporate newer data. It leverages the ML model's existing knowledge base, which has already been optimised on a broad dataset. By starting from this established baseline, we can integrate new insights more swiftly, using significantly smaller datasets, and with significantly less computational cost.

When we fine-tune an ML model, we're essentially integrating the latest observational data or new science and outputs from NWP models. What's crucial here is that this process doesn't require a full retraining of the model. Instead, we make targeted adjustments to the model's weights and parameters to reflect the new data and scientific findings. This selective updating helps ensure that the new information is not drowned out by the volume of pre-existing training data and avoids conflicts with established reanalyses. It's an efficient way to keep our ML models at the cutting edge, continuously aligning them with the latest scientific advances.
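The idea of starting from pre-trained weights and making a small number of targeted updates can be illustrated with a toy example. This is a minimal sketch under loud assumptions: a linear model stands in for a neural network, the "old" and "new" datasets are synthetic, and names such as `gradient_steps`, `old_truth` and `new_truth` are hypothetical. It is not a description of ECMWF's fine-tuning procedure.

```python
import numpy as np


def mse(weights, x, y):
    """Mean squared error of a linear model y ~ x @ weights."""
    return float(np.mean((x @ weights - y) ** 2))


def gradient_steps(weights, x, y, learning_rate, n_steps):
    """Run full-batch gradient descent on the MSE, starting from `weights`."""
    w = weights.copy()
    for _ in range(n_steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(x)
        w -= learning_rate * grad
    return w


rng = np.random.default_rng(1)

# Assumption: the "new science" shifts the underlying relationship slightly.
old_truth = np.array([1.0, -2.0, 0.5])
new_truth = np.array([1.1, -1.8, 0.5])

# Large pre-training set (reanalysis-like) and a much smaller new set.
x_old = rng.normal(size=(5000, 3))
y_old = x_old @ old_truth
x_new = rng.normal(size=(100, 3))
y_new = x_new @ new_truth

# Pre-training: fit once on the large dataset.
pretrained = np.linalg.lstsq(x_old, y_old, rcond=None)[0]

# Fine-tuning: a few cheap updates on the small dataset, from the baseline.
fine_tuned = gradient_steps(pretrained, x_new, y_new,
                            learning_rate=0.05, n_steps=10)

# Same small budget, but starting from nothing.
from_scratch = gradient_steps(np.zeros(3), x_new, y_new,
                              learning_rate=0.05, n_steps=10)
```

In this toy setting, ten updates from the pre-trained weights fit the new data better than either the untouched pre-trained model or ten updates from zero. Whether the same holds for a highly nonlinear atmospheric model is exactly the question the dialogue goes on to raise.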

Alan: But I wonder how effective fine-tuning will prove to be in practice. Whilst I get the point about adjusting weights etc. in the ML model, atmospheric processes are highly nonlinear, so there will potentially be inconsistencies created throughout the model from each change. I find it hard to imagine how deleterious unintended consequences could be avoided or managed. But of course, the proof of the pudding will be in the eating! 

Florian: Agreed – and our experience at ECMWF thus far with fine-tuning has indicated it has great promise. 

Physical and dynamical processes in ML models

Alan: OK, let’s talk about another concern: whether ML models just mimic, rather than genuinely discover and incorporate, the dynamical and physical principles underlying weather phenomena. For example, Figure 2 shows the range of Earth-system processes incorporated within the IFS.

The basic premise of the ML approach is that there is sufficient information in the training dataset to enable ML models to predict future behaviour accurately. Yet to me, given the number of degrees of freedom of the atmospheric system and the relative sparseness of the global observations, it seems unlikely that the database contains enough information (and of the right type).

Florian: Indeed, the representation of physical processes in ML models is a challenge. However, research, including into adaptations and fine-tuning, is making strides toward addressing this. It does depend on what scales one wants to resolve, but that is no different to physically-based models.

Earth system diagram

Figure 2: ECMWF’s physically-based IFS represents a whole range of Earth system processes. The nature of ML-based models makes the representation of physical and dynamical processes a challenge.

Alan: As we said earlier, assessing consistency and fidelity is a complex issue, Florian. But there are studies emerging that show certain weaknesses in ML-based predictions, for example in the intensity of smaller-scale features like frontal structures and even in the derived vertical velocity (see Figures 3 and 4). Weather systems develop from a complex interaction between all the meteorological variables like pressure, wind velocity, temperature, humidity, and atmospheric composition. I tend to think of this in terms of the correlations between the multiple variables, for example through fluxes of quantities and conservation principles. I hope more work can be done to investigate these aspects.

Vertical velocity standard deviation, comparison of ERA5, IFS, and Pangu

Figure 3: Studies are showing some limitations in ML-based predictions, for example weak vertical velocity. While vertical velocity is not typically predicted by ML-based weather forecasting models, it can be diagnosed from the predicted divergence field. Bonavita, M., 2024: "On Some Limitations of Current Machine Learning Weather Prediction Models", Geophys. Res. Lett., DOI: 10.1029/2023GL107377.
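One standard way to diagnose vertical velocity from divergence is the kinematic method, which integrates the continuity equation in pressure coordinates, d(omega)/dp = -D, downward from the model top, where D is the horizontal divergence and omega is the pressure vertical velocity. The sketch below assumes that method, a single column, and omega = 0 at the top; the function name `omega_from_divergence` and the idealised profile are hypothetical, and the cited study may have used a different procedure.

```python
import numpy as np


def omega_from_divergence(divergence, pressure_pa, omega_top=0.0):
    """Integrate d(omega)/dp = -divergence downward from the top level.

    divergence:  horizontal divergence (s^-1) on each pressure level
    pressure_pa: level pressures (Pa), ordered from the top to the surface
    Returns omega (Pa/s) on the same levels for one vertical column.
    """
    divergence = np.asarray(divergence, dtype=float)
    pressure_pa = np.asarray(pressure_pa, dtype=float)
    layer_thickness = np.diff(pressure_pa)
    # Trapezoidal rule: mean divergence of each layer times its thickness.
    layer_mean = 0.5 * (divergence[1:] + divergence[:-1])
    increments = -layer_mean * layer_thickness
    return omega_top + np.concatenate([[0.0], np.cumsum(increments)])


# Idealised column (assumption): divergence aloft, convergence near the
# surface, varying linearly between 100 hPa and 1000 hPa.
pressure = np.linspace(10_000.0, 100_000.0, 11)
divergence = np.linspace(1e-5, -1e-5, 11)

omega = omega_from_divergence(divergence, pressure)
```

In this idealised column omega is negative throughout the interior (ascent), strongest at mid-levels, and returns to zero at the surface. Because omega is an integral of divergence, a predicted divergence field that is too smooth or too weak translates directly into weak diagnosed vertical motion.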

AIFS and IFS forecasts for extreme cold in northern Europe January 2024

Figure 4: In this example, the IFS forecast captures the extreme cold across northern Europe during 3 to 5 January 2024 more accurately than the AIFS. Plots are for 3-day average 2-metre temperature. The analysis (top) highlights the severity of cold in northern Sweden and central Finland, where temperatures dipped below -30°C on average over three days. The 2-to-5 day forecast, based on the IFS (bottom left), predicted slightly milder conditions but still expected temperatures to remain below -30°C in some regions. A comparison with the AIFS (bottom right) shows it predicted warmer conditions, with temperatures not falling below -26°C in eastern Finland.

Alan: So, a fundamental question for me: can ML models produce physically and dynamically consistent predictions that are sufficiently skilful for all relevant scales? I think that assessing skill or the quality of a forecast in terms of dynamical and physical consistency provides more discriminating measures than, say, root-mean-square error (RMSE) or the continuous ranked probability score (CRPS), which can be misleading. I hope such measures can be developed and deployed.
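For readers unfamiliar with the two scores, the sketch below shows how RMSE and an ensemble estimate of CRPS are commonly computed, together with one well-known way in which RMSE can mislead: a forecast that has a sharp feature at the right intensity but slightly displaced is penalised twice, so a featureless forecast can score better. The function names and the five-point example are illustrative assumptions, not taken from the dialogue.

```python
import numpy as np


def rmse(forecast, observed):
    """Root-mean-square error between a forecast field and observations."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))


def ensemble_crps(members, observed):
    """CRPS of an ensemble for one scalar observation.

    Uses the common estimator E|X - y| - 0.5 * E|X - X'|, where X and X'
    are drawn independently from the ensemble members.
    """
    members = np.asarray(members, dtype=float)
    error_term = np.mean(np.abs(members - observed))
    spread_term = np.mean(np.abs(members[:, None] - members[None, :]))
    return float(error_term - 0.5 * spread_term)


# Toy 1-D field (assumption): one sharp feature, such as a narrow front.
observed = np.array([0.0, 0.0, 4.0, 0.0, 0.0])
displaced = np.array([0.0, 0.0, 0.0, 4.0, 0.0])  # right intensity, one point off
featureless = np.zeros(5)                        # no feature at all

rmse_displaced = rmse(displaced, observed)
rmse_featureless = rmse(featureless, observed)

# For a single-member "ensemble", CRPS reduces to the absolute error.
crps_single = ensemble_crps([2.0], 3.0)
crps_pair = ensemble_crps([1.0, 3.0], 2.0)
```

Here the featureless forecast has the lower RMSE even though it contains no front at all, which is the kind of behaviour that motivates looking for measures based on physical and dynamical consistency.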

Florian: It's a fair question. While it's true that some ML models have shown limitations, it's also a rapidly evolving field. The balance between skilful predictions and maintaining physical consistency is challenging but I believe not unattainable. We've seen ML models that, when properly trained and fine-tuned, can offer predictions that are not only skilful but also maintain a commendable level of physical consistency. It's about evolving our understanding and application of ML in meteorology.

Part 3: Philosophical divergences, practical applications, and future horizons

Florian: Reflecting on our discussion, it seems we're at a crossroads of philosophy and practicality regarding weather prediction.

In the future, will physics-based modelling still be needed?

Alan: Perhaps there is an elephant in the room we should mention. Would it be possible to only use observations to train and initialise ML models and for those predictions to be sufficiently skilful? In other words, do away with the physics-based modelling entirely. Intuitively, I think that this would be neither possible nor desirable. What do you think, Florian?

Florian: This would be a revolutionary step indeed and is still at the frontiers of science. I am fairly certain that we will see ML forecasts which go from observations to forecasts (they already exist in the area of nowcasting). Whether they are used to create initial conditions for other ML forecasts or to create the forecasts directly from the observations without the in-between step of initial conditions remains to be seen. However, I wouldn’t speculate yet whether we will do away with the physical model entirely, as many hybrid or in-between stages could be possible, and in my opinion it is still too early to say what will really happen. Even at the radical end of the spectrum, it seems likely that some anchoring with physical models (e.g. in training) would remain beneficial.

ML and operational weather forecasting

Florian: Now, let's delve into the practical side of things. How do you see the advances in ML contributing to the operational aspects of weather forecasting?

Alan: The operational benefits of ML, especially its computational efficiency, cannot be overlooked. The ability to produce forecasts in a fraction of the time taken by traditional physics-based NWP models is indeed groundbreaking. However, my caution lies in whether this speed compromises the quality or accuracy of the forecasts and whether there is reduced focus on physics-based NWP. Could ML models, in their pursuit of efficiency, inadvertently overlook complex atmospheric processes that physics-based NWP meticulously simulates?

Florian: A valid concern, Alan, but ML models have shown remarkable skill in certain aspects of weather prediction, such as precipitation forecasting (see Figure 5), arguably outpacing traditional models in terms of development speed. These advances are not just about speed but also about harnessing vast datasets to uncover patterns and insights that might not be immediately apparent through conventional methods.

Graph comparing AIFS v0.2.1 and IFS for 24-hour accumulated total precipitation assessed against station observations using the SEEPS score, aggregated over the year 2022, for northern hemisphere extratropics

Figure 5: ECMWF’s AIFS has shown notable skill in forecasting large-scale precipitation at the medium range. Comparison between the AIFS v0.2.1 and the IFS for 24-hour accumulated total precipitation for the northern hemisphere extratropics assessed against station observations using the SEEPS score (larger numbers indicate a more skilful forecast), aggregated over the year 2022. 

Explainability and ML models

Florian: But, Alan, what about explainability in ML models – can we trust these models without a clear understanding of their decision-making processes?

Alan: Explainability is paramount, especially when we're dealing with phenomena as complex and impactful as weather. If an ML model offers a forecast but can't elucidate the physical reasoning behind it, how can we validate its accuracy or trust its predictions, especially in critical situations? The question then becomes: can ML models evolve to not only predict accurately but also to provide insights into the physical processes driving those predictions?

Florian: I share your concern for explainability, Alan, and it's an area where ML is indeed striving to improve. As ML models become more sophisticated, there's a concerted effort to make their workings more transparent and their predictions more interpretable. This involves developing methods that can trace predictions back to the underlying data, thereby offering a form of ‘reasoning’. This doesn't replace the deep physical understanding provided by physically-based NWP models, but complements it.

What might the future hold?

Florian: Alan, how do you see the future of weather prediction evolving with these technologies?

Alan: If the potential of ML models is realised, I see a future where ML and physics-based NWP modelling coexist and complement each other, each playing to their strengths. The use of ML tools offers the possibility to enhance forecasting capabilities, particularly if the enhancement is in areas where traditional models require improvement. Yet, the depth and rigour of NWP models in capturing and simulating the physical essence of our atmosphere are invaluable. I envisage a future in which physics-based NWP remains the core of the forecasting system, but where deep learning is used to accelerate efficiency and add value.

Florian: I think the synergy between ML and physics-based NWP modelling could usher in a new epoch in meteorology, where we leverage the best of both worlds. As we continue to explore these frontiers, our focus must remain on improving forecast reliability and usefulness for societal needs. The journey may be long, with much to learn and refine, but the potential benefits for humanity and our understanding of the Earth's systems are immense.

Alan: Precisely, Florian. As we navigate these challenges and opportunities, our guiding principle should be the relentless pursuit of scientific knowledge and the application of that knowledge to serve the global community. Weather prediction, at its core, is about safeguarding lives and livelihoods, and so predictions must be held to the highest standard of scientific credibility and utility. If ML and physics-based NWP modelling can together enhance our predictive capabilities, then it's a journey worth undertaking, continuing the long history of discovery and innovation in meteorology. I see this as another step along the road of the quiet revolution.

Florian: Here's to a future where technology and tradition blend seamlessly to advance our understanding and prediction of weather, for the benefit of all.

This dialogue took place following the recording of a videocast on TheWeatherPod hosted by Alan Thorpe and David Rogers.