ERA5 was publicly released for the first time in 2019 and has since become one of the most popular products of the Copernicus Climate Change Service (C3S), implemented by ECMWF on behalf of the European Commission.
The dataset offers a detailed, consistent record of global atmospheric, land, and oceanic conditions from 1940 to the present. In recent years, ERA5 has expanded beyond its original remit as a tool for understanding our climate, becoming a key resource for training data-driven machine learning (ML) weather forecasting models.
Several high-profile publications have demonstrated that ML models trained entirely on ERA5 can produce remarkably skilful forecasts. Examples include the work by Ryan Keisler, NVIDIA’s FourCastNet model, Huawei’s Pangu-Weather and Google DeepMind’s GraphCast.
At ECMWF, this helped to inspire the development of our own Artificial Intelligence Forecasting System (AIFS), which became operational earlier this year. Both the deterministic and ensemble AIFS systems begin their training using ERA5 – a testament to the dataset’s value and reliability.
Collaboration and reducing barriers
As this new era of ML-driven forecasting took shape, we saw an opportunity to build a robust framework with our Member States to support the training and deployment of data-driven forecasting models. That’s how Anemoi was established.
Anemoi – named after the Greek gods of the winds – is a collaborative framework created by ECMWF and several national meteorological services across Europe.
Together, we’ve built an open-source, Python-based ecosystem that makes it easier to train, test and deploy ML weather forecasting models. Launched in 2024, Anemoi provides the core tools: from preparing optimised datasets, to training models, to running them operationally. It’s designed to be modular and, crucially, accessible – lowering the barriers to entry for anyone interested in data-driven forecasting.
Figure 1: Illustration of the stack of variables provided in a slice of the ERA5 Anemoi dataset.
What’s new?
We have now released the Anemoi training-ready version of ERA5, which is openly available to everyone. Covering the period from 1979 to 2023, this dataset provides a ready-to-use, machine-learning-optimised subset of ERA5 data in the Zarr format – an open-source, cloud-friendly structure that enables efficient training. It’s hosted on ECMWF infrastructure and released under a permissive CC-BY-4.0 licence, making it free for all to access and use.
The release removes one of the biggest practical barriers: you no longer need to build your own training dataset from scratch. Once the data is downloaded, data-driven models can be trained quickly. This first release is provided on a grid with approximately 1 degree resolution – coarser than the native 0.25 degree ERA5 grid, but one that significantly reduces data volume and makes training more efficient. At around half a terabyte of data across more than 65,000 files, it offers a rich suite of atmospheric variables commonly used in data-driven forecasting. These include six atmospheric variables on 13 pressure levels, and 23 single-level variables such as near-surface winds, temperatures and precipitation.
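As a rough cross-check of those figures, the sketch below counts the variables and time steps and estimates the uncompressed data volume. The variable counts and the 1979 to 2023 period come from the text above. Everything else is an assumption made for illustration: a 6-hourly time step and an O96 octahedral reduced Gaussian grid (both inferred from the "o96" and "6h" parts of the dataset name), 32-bit floating-point values, and the helper names themselves.

```python
from datetime import datetime, timedelta

# Figures stated in the article
PRESSURE_LEVEL_VARIABLES = 6
PRESSURE_LEVELS = 13
SINGLE_LEVEL_VARIABLES = 23
FIRST_YEAR, LAST_YEAR = 1979, 2023

# Assumptions for illustration (not stated in the article)
STEP_HOURS = 6        # inferred from "6h" in the dataset name
GRID_NUMBER = 96      # inferred from "o96" in the dataset name
BYTES_PER_VALUE = 4   # assumes 32-bit floats


def count_variables():
    """Total number of fields stored per time step."""
    return PRESSURE_LEVEL_VARIABLES * PRESSURE_LEVELS + SINGLE_LEVEL_VARIABLES


def count_time_steps(first_year, last_year, step_hours):
    """Number of time steps from 1 January of first_year to the end of last_year."""
    start = datetime(first_year, 1, 1)
    end = datetime(last_year + 1, 1, 1)
    return int((end - start) / timedelta(hours=step_hours))


def octahedral_grid_points(n):
    """Points on an octahedral reduced Gaussian grid with n rows per hemisphere.

    Row i (counted from the pole) holds 4 * i + 16 points.
    """
    return 2 * sum(4 * i + 16 for i in range(1, n + 1))


def estimate_uncompressed_bytes():
    """Raw data volume: time steps x variables x grid points x bytes per value."""
    return (
        count_time_steps(FIRST_YEAR, LAST_YEAR, STEP_HOURS)
        * count_variables()
        * octahedral_grid_points(GRID_NUMBER)
        * BYTES_PER_VALUE
    )


if __name__ == "__main__":
    print("variables per time step:", count_variables())
    print("time steps:", count_time_steps(FIRST_YEAR, LAST_YEAR, STEP_HOURS))
    print("grid points:", octahedral_grid_points(GRID_NUMBER))
    print("uncompressed size (TB):", estimate_uncompressed_bytes() / 1e12)
```

Under these assumptions the dataset holds 101 variables and 65,744 time steps. That would be consistent with the "more than 65,000 files" quoted above if the store keeps roughly one chunk per time step, which is itself an assumption. The raw estimate comes out close to 1 TB, so the half-terabyte figure suggests the stored data is compressed; treat the numbers as indicative only.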
The release also strengthens the collaborative philosophy behind Anemoi. It’s not just a tool, but a platform for exchanging tools, methods, knowledge and ideas, helping to accelerate the development and improve the accuracy of AI-based forecasting. We’re already seeing this model of collaboration succeed across Europe, where national meteorological services are training and testing their own ML systems using the Anemoi framework, and driving forward the use of ML.
Looking ahead
We’re looking forward to seeing how the community uses this dataset, and we’re keen to hear your feedback.
Depending on uptake, future releases may include higher-resolution versions, Anemoi versions of ECMWF’s operational analyses, and eventually Anemoi versions of ERA6. As other Zarr-based views of ERA5 continue to emerge, we’ll also be exploring ways to coordinate these efforts to maintain consistency.
For now, we hope this new Anemoi ERA5 dataset helps more people take their first steps into the rapidly evolving world of data-driven forecasting.
Access the dataset
Access is streamlined through the Anemoi-datasets Python package. To retrieve the dataset:
pip install "anemoi-datasets>=0.5.22"
anemoi-datasets copy \
    --url https://data.ecmwf.int/anemoi-datasets/era5-o96-1979-2023-6h-v8.zarr \
    --target era5-o96-1979-2023-6h-v8.zarr
Further details on use and the full list of available variables can be found on the Anemoi training website.
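Once the copy has finished, a quick sanity check is to compare the local store against the figures quoted earlier (around half a terabyte across more than 65,000 files). The sketch below uses only the Python standard library and walks the target directory from the command above. The function name summarise_store is hypothetical and not part of Anemoi.

```python
import os


def summarise_store(path):
    """Return (file_count, total_bytes) for every file under path."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return file_count, total_bytes


if __name__ == "__main__":
    # Target directory used in the copy command above
    target = "era5-o96-1979-2023-6h-v8.zarr"
    if os.path.isdir(target):
        count, size = summarise_store(target)
        print(f"{count} files, {size / 1e12:.2f} TB")
    else:
        print(f"{target} not found; run the copy command first")
```

A file count or size well below the quoted figures may indicate an interrupted transfer.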