Land carbon fluxes represent the largest source of uncertainty in the global carbon budget. Improving the estimation of their spatial and temporal variability is therefore crucial for global monitoring of anthropogenic CO2 emissions within the future Copernicus CO2 Monitoring and Verification Support (CO2MVS) service. Gross primary productivity (GPP), the largest biogenic carbon flux, is difficult to model because of the complexity of the processes involved and the lack of direct observations. One of the objectives of the EU-funded CO2MVS Research on Supplementary Observations (CORSO) project is to exploit new types of Earth observations, such as solar-induced chlorophyll fluorescence (SIF), in land data assimilation systems to improve the prediction of GPP in Earth system models.
SIF is the red and far-red electromagnetic radiation emitted by chlorophyll when it is illuminated by visible light. While SIF is directly sensitive to changes in leaf photosynthetic activity, and thus to GPP, canopy structure, represented in Earth system models by the leaf area index (LAI), is the prevailing driver of the SIF signal measured by satellites. Assimilating satellite observations requires an observation operator that computes the model counterpart of the observation (here SIF) from the model fields. Machine-learning (ML)-based observation operators are a good alternative to process-based models, which are generally computationally expensive and associated with large uncertainties over land. Here we present work conducted by ECMWF at global scale and by Météo-France (MF) at regional scale to assimilate SIF observations from the TROPOMI instrument on board the Sentinel-5P satellite. Both approaches use an ML-based observation operator to update LAI, a key driver of GPP.
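A common light-use-efficiency description of canopy SIF, which is not part of the original work and is given here only as an illustrative assumption, makes the link between SIF, GPP and LAI explicit (treating the escape fraction as constant and assuming Beer-Lambert light extinction):

$$
\mathrm{SIF} = \mathrm{PAR}\, f_{\mathrm{APAR}}\, \Phi_F\, f_{\mathrm{esc}}, \qquad
\mathrm{GPP} = \mathrm{PAR}\, f_{\mathrm{APAR}}\, \mathrm{LUE}, \qquad
f_{\mathrm{APAR}} = 1 - e^{-k\,\mathrm{LAI}},
$$

where PAR is the incoming photosynthetically active radiation, $f_{\mathrm{APAR}}$ the fraction of PAR absorbed by the canopy, $\Phi_F$ the fluorescence yield, $f_{\mathrm{esc}}$ the fraction of emitted fluorescence that escapes the canopy, LUE the light use efficiency and $k$ an extinction coefficient. Differentiating with respect to LAI gives

$$
\frac{\partial\,\mathrm{SIF}}{\partial\,\mathrm{LAI}} = \mathrm{PAR}\, \Phi_F\, f_{\mathrm{esc}}\, k\, e^{-k\,\mathrm{LAI}}.
$$

Because SIF and GPP share the factor $\mathrm{PAR}\, f_{\mathrm{APAR}}$, SIF tracks GPP to the extent that $\Phi_F$ and LUE co-vary. Over sparse canopies, changes in LAI dominate the SIF signal; over dense canopies $f_{\mathrm{APAR}}$ saturates, and with an assumed $k = 0.5$ the sensitivity at LAI = 6 is only $e^{-3}/e^{-0.5} \approx 8\%$ of that at LAI = 1, so variations in $\Phi_F$, and hence in light use efficiency, become relatively more important.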
Assimilation of SIF at ECMWF
The representation of vegetation in ECMWF's Integrated Forecasting System (IFS) relies on a satellite-based monthly mean LAI climatology, which cannot capture the impacts of climate anomalies. The approach chosen in this work consists of (1) assimilating SIF in the offline land data assimilation system (LDAS) to update the IFS LAI climatology and (2) using the updated LAI in the coupled IFS to evaluate the impact on GPP forecasts.
XGBoost gradient-boosted trees were used to build an observation operator with the Copernicus Land Monitoring Service (CLMS) satellite LAI dataset and spatiotemporal localisation variables (week of the year, latitude, longitude) as predictors. The ML model was trained over 2019-2020 and tuned over 2021. Evaluation over 2022 shows that the ML model has good global predictive performance (coefficient of determination of 0.83 and root-mean-square error of 0.1 mW m⁻² nm⁻¹ sr⁻¹). It also accurately reproduces the spatial variability and temporal evolution of satellite SIF at global scale.
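XGBoost itself is not reproduced here. To show what a gradient-boosted-tree observation operator does, the sketch below implements plain gradient boosting with squared loss and depth-1 trees (stumps) in standard-library Python, as a simplified stand-in: XGBoost adds deeper trees, second-order gradients and regularisation. The predictors follow the text (LAI, week of the year, latitude, longitude), but the training data are synthetic and the LAI-SIF relation inside `synthetic_sample` is a made-up toy, so the scores it prints say nothing about the real operator. The names `BoostedStumps`, `fit_stump` and `synthetic_sample` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-1 regression tree: one split on one predictor."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x):
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X, residuals, order):
    """Return the split that minimises the squared error of the residuals.

    order[j] holds the sample indices sorted by predictor j, so every
    candidate split is scored in a single pass with running sums.
    """
    n, total = len(residuals), sum(residuals)
    best_score, best = -math.inf, None
    for j, idx in enumerate(order):
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += residuals[idx[k]]
            x_here, x_next = X[idx[k]][j], X[idx[k + 1]][j]
            if x_here == x_next:
                continue  # no split possible between identical values
            n_left, right_sum = k + 1, total - left_sum
            # Maximising this score is equivalent to minimising the split's SSE.
            score = left_sum ** 2 / n_left + right_sum ** 2 / (n - n_left)
            if score > best_score:
                best_score = score
                best = Stump(j, 0.5 * (x_here + x_next),
                             left_sum / n_left, right_sum / (n - n_left))
    return best


class BoostedStumps:
    """Gradient boosting with squared loss (a simplified stand-in for XGBoost)."""

    def __init__(self, n_rounds=200, learning_rate=0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.base = sum(y) / len(y)
        pred = [self.base] * len(y)
        order = [sorted(range(len(X)), key=lambda i, j=j: X[i][j]) for j in range(len(X[0]))]
        self.stumps = []
        for _ in range(self.n_rounds):
            # For squared loss the negative gradient is simply the residual.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals, order)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * stump.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, x):
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


def r2_and_rmse(y_true, y_pred):
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def synthetic_sample(rng):
    """Hypothetical toy data: predictors [LAI, week, lat, lon] -> SIF."""
    lai = rng.uniform(0.0, 6.0)
    week = rng.randint(1, 52)
    lat, lon = rng.uniform(-60.0, 70.0), rng.uniform(-180.0, 180.0)
    season = 1.0 + 0.3 * math.cos(2.0 * math.pi * (week - 28) / 52.0)
    sif = 0.4 * (1.0 - math.exp(-0.5 * lai)) * season + rng.gauss(0.0, 0.02)
    return [lai, float(week), lat, lon], sif


rng = random.Random(0)
train = [synthetic_sample(rng) for _ in range(400)]
test = [synthetic_sample(rng) for _ in range(200)]
model = BoostedStumps().fit([x for x, _ in train], [y for _, y in train])
r2, rmse = r2_and_rmse([y for _, y in test], [model.predict(x) for x, _ in test])
print(f"R2 = {r2:.2f}, RMSE = {rmse:.3f}")
```

A random hold-out set stands in here for the independent evaluation year; with real data, splitting by year as described above keeps temporally autocorrelated samples out of the evaluation set.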
The ML model was then implemented in the ECMWF LDAS to assimilate TROPOMI SIF at 0.1° spatial resolution and update the IFS LAI climatology once a day. The assimilation of SIF generates meaningful spatial patterns of LAI increments, such as the reduction of LAI over western Europe in July 2022 in response to the summer 2022 drought. Evaluation against the satellite-based CLMS LAI dataset, for a year other than those used to train the ML observation operator, shows that assimilating SIF improves LAI over cropland (see the first figure). Over cropland, the strong correlation between the satellite SIF signal and canopy structure variability is well captured by the ML-based observation operator. However, LAI is degraded over rainforest in the Amazon region, central Africa and Indonesia. This is due to greater cloud contamination of the SIF satellite observations there and to the inability of the ML observation operator to resolve the variability of light use efficiency, which has a larger impact on SIF than LAI does over rainforest.
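The article does not state which analysis scheme the ECMWF LDAS uses for SIF. Purely as an illustration, the sketch below assumes an extended-Kalman-filter-style update of the LAI of one grid cell, with the Jacobian of the observation operator estimated by finite differences. `sif_operator` is a hypothetical analytic stand-in for the trained ML model, and the error variances, LAI bounds and perturbation size are assumed values, not operational settings.

```python
import math


def sif_operator(lai, week):
    """Hypothetical stand-in for a trained ML observation operator (SIF from LAI)."""
    season = 1.0 + 0.3 * math.cos(2.0 * math.pi * (week - 28) / 52.0)
    return 0.4 * (1.0 - math.exp(-0.5 * lai)) * season


def analyse_lai(lai_b, sif_obs, week, b_var=0.5 ** 2, r_var=0.05 ** 2,
                delta=0.01, lai_min=0.0, lai_max=8.0):
    """One scalar EKF-style LAI update for a single grid cell.

    b_var and r_var are assumed background (LAI) and observation (SIF)
    error variances; delta is the finite-difference perturbation of LAI.
    """
    h_b = sif_operator(lai_b, week)
    # Jacobian dSIF/dLAI of the operator, from a forward finite difference
    jac = (sif_operator(lai_b + delta, week) - h_b) / delta
    gain = b_var * jac / (jac * b_var * jac + r_var)
    lai_a = lai_b + gain * (sif_obs - h_b)
    # keep the analysis within (assumed) physically plausible bounds
    return min(max(lai_a, lai_min), lai_max)


# Lower-than-simulated SIF in mid-summer pulls LAI down
lai_a = analyse_lai(lai_b=3.0, sif_obs=0.35, week=28)
print(f"LAI: 3.00 -> {lai_a:.2f}")
```

Here the lower-than-expected SIF reduces LAI, the same sign of increment as the drought response described above. Because the toy operator saturates at high LAI, its Jacobian, and in this toy setup the increment for a given SIF departure, shrinks over dense canopies.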

Comparing the GPP forecasts with a satellite-based GPP dataset showed improvements over limited regions, such as central Europe and parts of North America. However, the improvements in LAI do not systematically translate into improved GPP, which is likely due to the dominant effect of other sources of bias in the current coupled IFS.
Assimilation of SIF at Météo-France
To assimilate the TROPOMI SIF product within the Météo-France LDAS, a deep-learning model was trained to reproduce the product and then used as the observation operator, in line with the ECMWF methodology. Although the ML model used in the Météo-France LDAS differs from the one used by ECMWF (XGBoost), both are trained with the same set of predictors, which was found to be the component of the observation operator with the greatest influence on the SIF assimilation results. Once the required level of accuracy was reached, assimilation experiments were conducted over the Ebro basin in Spain at 0.1° resolution from 2018 to 2021.
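The article does not describe the architecture of the Météo-France deep-learning operator. The following is a hypothetical, minimal sketch of the idea: a feed-forward network with one tanh hidden layer, trained by stochastic gradient descent on the same four predictors as the ECMWF operator (LAI, week of the year, latitude, longitude). A real operator would be deeper and built with a deep-learning framework; the input scaling, layer size, learning rate and the synthetic data in `toy_sample` are all assumptions.

```python
import math
import random


def predictors(lai, week, lat, lon):
    """Scale [LAI, week, lat, lon] to roughly unit range (assumed scaling)."""
    return [lai / 6.0, week / 52.0, lat / 90.0, lon / 180.0]


class TinyMLP:
    """One-hidden-layer tanh network trained with plain SGD on squared error."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, out

    def sgd_step(self, x, y, lr):
        hidden, out = self.forward(x)
        err = out - y  # derivative of 0.5 * err**2 with respect to the output
        for k, h in enumerate(hidden):
            # backpropagate through the output weight and the tanh derivative
            grad_pre = err * self.w2[k] * (1.0 - h * h)
            self.w2[k] -= lr * err * h
            self.b1[k] -= lr * grad_pre
            self.w1[k] = [w - lr * grad_pre * xi for w, xi in zip(self.w1[k], x)]
        self.b2 -= lr * err


def mse(model, data):
    return sum((model.forward(x)[1] - y) ** 2 for x, y in data) / len(data)


def toy_sample(rng):
    """Hypothetical synthetic sample, not real TROPOMI data."""
    lai, week = rng.uniform(0.0, 6.0), rng.randint(1, 52)
    lat, lon = rng.uniform(-60.0, 70.0), rng.uniform(-180.0, 180.0)
    season = 1.0 + 0.3 * math.cos(2.0 * math.pi * (week - 28) / 52.0)
    sif = 0.4 * (1.0 - math.exp(-0.5 * lai)) * season + rng.gauss(0.0, 0.02)
    return predictors(lai, week, lat, lon), sif


rng = random.Random(1)
data = [toy_sample(rng) for _ in range(300)]
net = TinyMLP(n_in=4, n_hidden=8, rng=rng)
mse_before = mse(net, data)
for _ in range(100):  # epochs
    rng.shuffle(data)
    for x, y in data:
        net.sgd_step(x, y, lr=0.05)
mse_after = mse(net, data)
mean_y = sum(y for _, y in data) / len(data)
var_y = sum((y - mean_y) ** 2 for _, y in data) / len(data)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}, target variance: {var_y:.4f}")
```

Swapping in a different regression model changes only the function that maps predictors to SIF; keeping the predictors fixed, as both LDAS do, is what the text identifies as mattering most for the assimilation results.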
The Ebro basin was selected because in situ and airborne observations for verification were available from the Land Surface Interactions with the Atmosphere over the Iberian Semi-Arid Environment (LIAISE) field campaign in July 2021. The area contains an intensively irrigated region within a dry zone. The second figure compares the estimated monthly mean LAI for August 2021 without (left panel) and with (right panel) the assimilation of SIF.

The irrigated area between the Ebro and Segre rivers, which is not resolved by the modelled LAI ('Without assimilation of SIF'), is clearly visible on the analysis map ('With assimilation of SIF'). This demonstrates the benefit of assimilating SIF for monitoring vegetation affected by irrigation over cropland.
Lessons learned and future directions
The SIF assimilation studies at ECMWF and Météo-France both demonstrate the potential of ML for exploiting new types of observations in numerical weather prediction systems, at global and regional scales respectively. They also highlight the potential of SIF data assimilation to improve LAI over cropland, which is promising for monitoring anthropogenic emissions over agricultural regions in the context of CO2MVS.
Acknowledgment
The CORSO project (grant agreement No. 101082194) is funded by the European Union. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Commission. Neither the European Union nor the granting authority can be held responsible for them.