Flexible data assimilation framework offers opportunities for innovation

Marcin Chrust

Marcin Chrust works in ECMWF's Data Assimilation Methodology Group and is on the organising committee for an ECMWF/ESA workshop on machine learning.

A joint ECMWF/ESA workshop taking place next week (5 to 8 October 2020) will explore the application of machine and deep learning techniques in Earth system observation and prediction (ESOP).

The interest of ESOP scientists in such techniques stems from different perspectives. Marcin Chrust, for example, who is part of this event’s organising committee at ECMWF, sees potential on the data assimilation side. He is interested in exploring whether ECMWF’s object-oriented data assimilation framework could become a common framework for machine learning (ML) algorithms that different applications could use efficiently.

Marcin leads the development of ECMWF’s Object-Oriented Prediction System (OOPS) and is also helping to coordinate work to develop an ensemble of ocean data assimilations at ECMWF.

Object-oriented data assimilation

OOPS is a framework for running different variational data assimilation algorithms with a variety of forecast models, based on an object-oriented software paradigm.

Its design comprises three layers, as shown below. In the middle layer, OOPS defines a set of templated abstract interface classes, referred to as building blocks. They represent the entities manipulated by standard variational algorithms, such as states, increments, control vectors, covariances, models and observation operators.

“This design will also prove useful when we start to work on adapting data assimilation models to the next generation of hardware architectures,” Marcin says. “It can be seen as a divide and conquer strategy, where we break complex code into simple, self-contained pieces that can be modified more independently from each other.”

The top layer combines the building blocks into applications such as the 4D-Var data assimilation system, forecasts and Forecast Sensitivity to Observation Impact (FSOI). Abstract interface classes wrap around model-specific implementations of those classes through the C++ templating mechanism. Each abstract interface class receives a model as a template argument. This template argument defines a set of aliases that link the generic classes to model-specific implementations, which make up the bottom layer of the system. The link is made only at compile time.
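The templating mechanism described above can be illustrated with a minimal sketch. It is not OOPS code: the names ToyState, ToyTraits, State and describe are hypothetical and chosen only to show how a model template argument can carry aliases that bind a generic class to a model-specific implementation at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model-specific implementation (bottom layer): a toy state
// holding a vector of values. Illustrative only, not part of OOPS.
class ToyState {
 public:
  explicit ToyState(std::vector<double> values) : values_(std::move(values)) {}
  double norm2() const {
    double sum = 0.0;
    for (double v : values_) sum += v * v;
    return sum;
  }
  std::size_t size() const { return values_.size(); }

 private:
  std::vector<double> values_;
};

// Hypothetical "model" template argument: its aliases tell the generic
// classes which concrete implementations to use.
struct ToyTraits {
  using State = ToyState;
  static std::string name() { return "ToyModel"; }
};

// Generic building block (middle layer): wraps whichever State the model
// argument names. The alias is resolved at compile time.
template <typename MODEL>
class State {
 public:
  using Impl = typename MODEL::State;
  explicit State(Impl impl) : impl_(std::move(impl)) {}
  double norm2() const { return impl_.norm2(); }
  std::size_t size() const { return impl_.size(); }

 private:
  Impl impl_;
};

// Generic "application" code (top layer): written once, instantiated per model.
template <typename MODEL>
std::string describe(const State<MODEL>& x) {
  assert(x.size() > 0);
  return MODEL::name() + " state with " + std::to_string(x.size()) + " values";
}
```

In this sketch, supporting another model would mean writing a new implementation class and a new traits struct, while State and describe stay unchanged.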

OOPS conceptual design

Conceptual design of OOPS, with the application top layer, the templated building blocks of variational assimilation in the middle layer and the interfaces to model-specific components in the lowest layer.

Work to develop the system was already well under way when Marcin became part of the OOPS team. His first task was to develop the interface layer for the NEMOVAR ocean data assimilation system, which required a major refactoring of the code. He has since taken on responsibility for OOPS development and has become more involved in the development of the interface layer for ECMWF’s Integrated Forecasting System (IFS). His background in computer science, numerical methods and linear algebra – gained from a PhD in computational fluid mechanics, subsequent post-doctoral positions, and industry experience at Bombardier Aerospace – has proven a solid basis for these tasks.

The intention now is to implement OOPS as a new 4D-Var data assimilation framework for the IFS in one of the first IFS upgrades on ECMWF’s new supercomputers in the Bologna data centre. OOPS development has been an international effort, involving major input from ECMWF, Météo-France, CERFACS and the HIRLAM-ALADIN community.

Flexible framework

The new system will allow developers to research and test new ideas more quickly, which matters especially for a system as complex as the IFS. Recent examples include work on weak-constraint 4D-Var and work on the efficient preconditioning of minimisation algorithms using randomised singular value decomposition methods.

The flexibility of OOPS could also make it easier to investigate different algorithms, including machine learning algorithms.

“We could try to push the idea of object-oriented design further down into the model, to expose the full control layer of the IFS model to the OOPS system,” Marcin says. “Firstly, this would offer the opportunity to explore task-based parallelism as proposed recently by colleagues in the modelling section, and secondly we could try to build a common framework for machine learning in OOPS.”

This would involve defining interfaces to different components of the IFS at a fine granularity, down to the level of physics parametrizations or individual observation operators. At present, only a joint observation operator is exposed to OOPS, which further down in the code is composed of observation operators for individual instruments. Exposing each observation operator for each instrument type could pave the way for applying ML algorithms in applications such as bias correction and the quality control of observations, and it could even allow physics-/knowledge-based observation operators to be replaced with ML emulators.

Similar ideas could be applied in the model, where ML emulators could be exploited for physics parametrizations for computational performance reasons, for example. “The advantage would be that the same code used to implement ML algorithms could be applied in all these different applications,” Marcin explains. “We would need to design an interface between OOPS and external artificial intelligence libraries that would facilitate the training of the ML models and subsequently allow the models to be imported and applied in OOPS applications.”
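The idea of exposing one operator per instrument, so that an individual operator can be replaced by an ML emulator, can be sketched as follows. Everything here is an assumption made for illustration: the ObsOperator interface, the two operator classes, the registry and the "sounder" instrument name are hypothetical, the "physical" operator is a plain average, and the emulator is a linear model whose weights stand in for parameters trained in an external library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-instrument interface: maps a model state (here just a
// vector of values) to one simulated observation.
class ObsOperator {
 public:
  virtual ~ObsOperator() = default;
  virtual double simulate(const std::vector<double>& state) const = 0;
};

// Stand-in for a physics-/knowledge-based operator: a plain average.
class PhysicalOperator : public ObsOperator {
 public:
  double simulate(const std::vector<double>& state) const override {
    assert(!state.empty());
    double sum = 0.0;
    for (double v : state) sum += v;
    return sum / static_cast<double>(state.size());
  }
};

// Stand-in for an ML emulator: a linear model. In practice the weights would
// be trained externally and imported; here they are simply passed in.
class EmulatorOperator : public ObsOperator {
 public:
  EmulatorOperator(std::vector<double> weights, double bias)
      : weights_(std::move(weights)), bias_(bias) {}
  double simulate(const std::vector<double>& state) const override {
    assert(state.size() == weights_.size());
    double y = bias_;
    for (std::size_t i = 0; i < state.size(); ++i) y += weights_[i] * state[i];
    return y;
  }

 private:
  std::vector<double> weights_;
  double bias_;
};

// Registry keyed by instrument name, so one instrument's operator can be
// swapped without touching the others or the calling code.
using OperatorRegistry = std::map<std::string, std::unique_ptr<ObsOperator>>;

double simulateFor(const OperatorRegistry& registry,
                   const std::string& instrument,
                   const std::vector<double>& state) {
  return registry.at(instrument)->simulate(state);
}
```

The calling code only sees the interface, so registering an EmulatorOperator under the same instrument name replaces the physical operator without any change to simulateFor.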

But he is cautious about embarking on such developments before it is clear whether they would be genuinely useful: “It remains to be seen if data-driven approaches in modelling or data assimilation can successfully complement process- or knowledge-driven approaches. Once the reliability, robustness, efficacy and quality of such approaches is demonstrated on a wider scale, we can start moving in that direction.”

Another exciting area of work is the development of a CO2 inversion system with the EU-funded Copernicus Atmosphere Monitoring Service (CAMS). “This work will be carried out in OOPS,” he explains. “The flexibility of the system will allow scientists to research algorithms to effectively assimilate human emissions of CO2.”

Ocean data assimilation

Marcin also shares responsibility for coordinating work to develop an ensemble of ocean data assimilations at ECMWF. This work, supported by a contract from the EU-funded Copernicus Climate Change Service (C3S), aims to improve ocean data assimilation capabilities at ECMWF, for both the initialisation of seasonal forecasts and the generation of coupled Earth system analyses and reanalyses.

“A reliable ensemble is vital for obtaining reliable background error statistics, which are fundamental for the effective and efficient use of ocean observations through data assimilation,” says Marcin, explaining that the work seeks to employ methodologies that are already used successfully in atmospheric data assimilation.

The current ECMWF ocean data assimilation suites generate an ensemble of ocean reanalyses, where each member of the ensemble evolves independently. The next step in the development is to use the ensemble perturbations to specify the flow-dependent background-error covariance matrix in the variational data assimilation system NEMOVAR.
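To make the role of ensemble perturbations concrete, the sketch below computes a sample covariance matrix from a small ensemble, using the standard estimate B = 1/(N-1) * sum over members of x' x'^T, where x' is a member's departure from the ensemble mean. It is a toy illustration under stated assumptions, not the NEMOVAR formulation: the function name is hypothetical, and the matrix is formed explicitly here, which would not be feasible for state vectors of realistic size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Sample covariance of an ensemble: subtract the ensemble mean from each
// member, then average the outer products of the resulting perturbations.
Matrix ensembleCovariance(const std::vector<Vector>& members) {
  const std::size_t nMembers = members.size();
  assert(nMembers >= 2);
  const std::size_t n = members[0].size();

  // Ensemble mean.
  Vector mean(n, 0.0);
  for (const Vector& m : members) {
    assert(m.size() == n);
    for (std::size_t i = 0; i < n; ++i) mean[i] += m[i];
  }
  for (double& v : mean) v /= static_cast<double>(nMembers);

  // Accumulate outer products of the perturbations.
  Matrix B(n, Vector(n, 0.0));
  for (const Vector& m : members) {
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        B[i][j] += (m[i] - mean[i]) * (m[j] - mean[j]);
      }
    }
  }

  // Unbiased normalisation by N - 1.
  const double norm = static_cast<double>(nMembers - 1);
  for (Vector& row : B) {
    for (double& v : row) v /= norm;
  }
  return B;
}
```

Because the perturbations change with the ensemble at each analysis time, a covariance estimated this way is flow dependent, in contrast to a static, climatological estimate.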

Ocean grid

The assimilation of satellite sea-surface temperature (SST) observations will complement the existing ocean observing network and allow for a stronger coupling of the upper ocean state and the atmosphere.

The aim is also to improve the impact of surface observations in the ECMWF ocean data assimilation system. The final goal is to move away from the current nudging approach for sea-surface temperature (SST) and start assimilating these observations in the system. These efforts go in tandem with the work that is being carried out at ECMWF on the assimilation of skin temperature in the IFS.

The assimilation of SST will allow strong control of the upper ocean state via the vertical background-error correlations. Ideally, the vertical correlation length scales should be flow dependent and linked to the mixed layer depth in order to effectively propagate the signal into the upper ocean. A solution devised by Marcin and colleagues from CERFACS and INPT-IRIT is described in a paper to appear in the Quarterly Journal of the Royal Meteorological Society.
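The link between vertical correlation length scale and mixed layer depth can be illustrated with a deliberately simple sketch. This is not the solution described in the paper: it assumes a Gaussian correlation shape and sets the length scale equal to the mixed layer depth, and the function name surfaceCorrelation is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Illustrative only: Gaussian correlation between the sea surface and a point
// at `depth` (metres), with the length scale set to the mixed layer depth.
double surfaceCorrelation(double depth, double mixedLayerDepth) {
  assert(depth >= 0.0 && mixedLayerDepth > 0.0);
  const double ratio = depth / mixedLayerDepth;
  return std::exp(-0.5 * ratio * ratio);
}
```

Under these assumptions, a surface observation influences a point at 30 m depth more strongly when the mixed layer is 100 m deep than when it is 20 m deep, which is the qualitative behaviour the text asks for.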

The outcomes of this work will also be showcased at a joint ECMWF/OceanPredict workshop in May 2021.

ECMWF/ESA machine learning workshop

The ECMWF/ESA Workshop on Machine Learning for Earth System Observation and Prediction takes place virtually from 5 to 8 October 2020. Talks will be livestreamed and are open to all.