ERA5 is one of the most popular products of the Copernicus Climate Change Service, a programme which ECMWF has implemented on behalf of the European Commission since its introduction in 2018. In recent years, one of the most prominent uses of ERA5 has been in machine learning, particularly in training data-driven models for weather forecasting. ERA5 has been key to a series of high-profile papers that showcased the ability to build state-of-the-art forecasting systems trained on this dataset alone. This body of work inspired ECMWF to create the Artificial Intelligence Forecasting System (AIFS), its family of data-driven forecasting systems, which became the first operational models of this kind earlier this year. Both the deterministic and ensemble AIFS systems use ERA5 in the first stage of training.
Working with its Member States, ECMWF has created Anemoi, a framework for training data-driven forecasting models at both global and local scales. Anemoi is used by ECMWF to train and run the AIFS, and by many other partners across Europe to develop their own data-driven forecasting systems. Part of the Anemoi ecosystem is the Anemoi datasets package, which transforms raw meteorological data sources into Zarr datasets – an open-source format optimised for efficient training. Until now, anyone wishing to train a model with Anemoi first had to build a training dataset.
ECMWF has now released the first Anemoi version of the Copernicus ERA5 dataset, spanning 1979 to 2023 and openly accessible to all users under a highly permissive licence (CC-BY-4.0). This dataset, hosted on ECMWF infrastructure, is provided on a grid with approximately 1 degree resolution. Directly derived from ERA5, it has been made available so that researchers can start using Anemoi more quickly to train their own data-driven forecasting systems. This Anemoi view of a subset of ERA5 aims to complement the broader access to the full dataset through the Climate Data Store.
Once the dataset has been downloaded, data-driven models can be trained on it quickly, provided users have suitable hardware. For this first release, a lower-resolution version of ERA5 was chosen. While ERA5 is natively produced on a grid of approximately 0.25 degrees, the coarser grid of the Anemoi version allows faster downloads and more efficient training of experimental systems. This approach is intended to help engage an even wider community in this rapidly evolving field, which is now powering operational forecasting systems.
What's next
It is hoped that the community finds this a useful source for exploring data-driven forecasting. Activity will be monitored to shape further expansions. Depending on uptake, further datasets of this type may be released, including higher-resolution versions, Anemoi versions of ECMWF's operational analysis – designed for fine-tuning data-driven models for live initialisation – and Anemoi versions of the future ERA6. As other Zarr views of ERA5 already exist, ECMWF will also explore ways of coordinating these derived datasets to maintain consistency.
The Anemoi-ERA5 O96 dataset provides data in a convenient Zarr format, optimised for efficient training of data-driven forecasting systems. Comprising roughly 0.5 TB of data across more than 65,000 files, it offers a rich suite of atmospheric variables commonly used in data-driven forecasting. These include six atmospheric variables on 13 pressure levels, and 23 single-level variables such as near-surface winds, temperatures and precipitation.
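The figures quoted above can be cross-checked with a little arithmetic. The sketch below rests on three assumptions that the article does not state: that the dataset covers the full calendar years 1979 to 2023, that the "6h" in the dataset name means one time step every six hours, and that "O96" denotes an octahedral reduced Gaussian grid, for which the point count is 4N(N + 9). The function names are illustrative only.

```python
from datetime import datetime, timedelta


def count_time_steps(first_year: int, last_year: int, step_hours: int) -> int:
    """Number of time steps covering whole calendar years, inclusive."""
    start = datetime(first_year, 1, 1)
    end = datetime(last_year + 1, 1, 1)
    # Dividing one timedelta by another with // yields an integer count.
    return (end - start) // timedelta(hours=step_hours)


def octahedral_grid_points(n: int) -> int:
    """Points in an octahedral reduced Gaussian grid O<n>: 4 * n * (n + 9)."""
    return 4 * n * (n + 9)


def total_variables(pressure_level_vars: int, levels: int, single_level_vars: int) -> int:
    """Each pressure-level variable counts once per level."""
    return pressure_level_vars * levels + single_level_vars


time_steps = count_time_steps(1979, 2023, 6)   # assumed: full years, 6-hourly
grid_points = octahedral_grid_points(96)       # assumed: O96 is octahedral
variables = total_variables(6, 13, 23)         # counts taken from the text

print(f"time steps:  {time_steps}")   # 65744
print(f"grid points: {grid_points}")  # 40320
print(f"variables:   {variables}")    # 101
```

Under these assumptions there are 65,744 time steps, which is consistent with the "more than 65,000 files" quoted above if the store holds roughly one chunk per time step. That chunking is itself an assumption, not something the article confirms.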
How to access the dataset
Access is streamlined through the anemoi-datasets Python package. To retrieve the dataset:
    pip install "anemoi-datasets>=0.5.22"
    anemoi-datasets copy \
        --url https://data.ecmwf.int/anemoi-datasets/era5-096-1979-2023-6h-v8.zarr \
        --target era5-096-1979-2023-6h-v8.zarr
Further details on use and the full list of available variables can be found at https://anemoi.readthedocs.io/projects/training/en/latest/user-guide/download-era5-o96.html.
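Once the copy has finished, the local store can be inspected from Python. The following is a minimal sketch, not an official recipe. It assumes that the anemoi-datasets package exposes open_dataset together with the variables, dates and shape attributes used here, and that the dataset sits at the target path given in the copy command. The helper name summarise_dataset is hypothetical.

```python
from pathlib import Path


def summarise_dataset(path: str) -> dict:
    """Return basic facts about a local Anemoi Zarr dataset."""
    store = Path(path)
    if not store.exists():
        # Fail early with a clear message before importing the heavier package.
        raise FileNotFoundError(f"No dataset found at {store}; run the copy step first.")

    # Imported lazily so the path check works even without the package installed.
    from anemoi.datasets import open_dataset

    ds = open_dataset(str(store))
    return {
        "time_steps": len(ds),
        "shape": ds.shape,
        "first_date": ds.dates[0],
        "last_date": ds.dates[-1],
        "variables": list(ds.variables),
    }


if __name__ == "__main__":
    try:
        # Path assumed to match the target used in the copy command.
        summary = summarise_dataset("era5-096-1979-2023-6h-v8.zarr")
    except FileNotFoundError as err:
        print(err)
    else:
        for key, value in summary.items():
            print(f"{key}: {value}")
```

Checking that the path exists before importing the package keeps the error message readable when the download has not yet been run.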