Data without friction – ECMWF’s multi-faceted approach to improving data usability

Stylised digital globe composed of data points and binary code, with colourful light trails representing fast-flowing global data.

Image: © monsitj / iStock / Getty Images Plus

Every day, ECMWF produces and disseminates hundreds of terabytes of forecast data, along with reference climate datasets and many other data products that inform decisions across society. Maximising the value of that data depends on making it easy for people to find, access and use.

This is why the FAIR principles – that data should be Findable, Accessible, Interoperable and Reusable – sit at the heart of how we operate. In Earth system science, where data is both mission critical and highly diverse, these principles matter more than ever. 

Earth system data takes many forms and comes from many sources, and its users are equally diverse: meteorological services, researchers, industry and members of the public. Their questions, workflows and access patterns vary widely.

Reducing friction in how they find, access and use our data is central to ECMWF's mission. 

Describing and discovering data 

Making data findable, accessible and interoperable requires commonly understood data formats, a consistent way to describe the data, and application programming interfaces (APIs) that enable access. Together, these elements provide a well-defined "language" for interacting with Earth system data. 

ECMWF has designed and built an ecosystem with consistent, well-defined interfaces, in which data, datasets, user requests and processing actions are described using a common, semantically meaningful metadata language. This language (the MARS language) aligns with the metadata concepts in the World Meteorological Organization (WMO) GRIB and BUFR standards and underpins services such as WebMARS, the Copernicus Climate Data Store (CDS) and Destination Earth's (DestinE) Polytope. It can be used to query data availability, receive notifications, filter results, access data, and integrate these steps into automated workflows. 
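As an illustration of what such a request looks like, the sketch below renders a set of key-value pairs into MARS-style request text. The helper name `to_mars_request` and the particular values chosen are assumptions made for this example, not part of any ECMWF client library; which keys and values are valid depends on the dataset being queried.

```python
def to_mars_request(verb, keys):
    """Render a verb and key/value pairs as MARS-style request text.

    Hypothetical helper for illustration only. Lists are joined with "/",
    which is how MARS requests express multiple values for one key.
    """
    items = list(keys.items())
    lines = [f"{verb},"]
    for i, (key, value) in enumerate(items):
        if isinstance(value, (list, tuple)):
            value = "/".join(str(v) for v in value)
        # Every line except the last ends with a comma.
        sep = "," if i < len(items) - 1 else ""
        lines.append(f"    {key} = {value}{sep}")
    return "\n".join(lines)


# Example values are assumptions: 2 m temperature from an operational
# forecast at three lead times.
request = {
    "class": "od",
    "stream": "oper",
    "type": "fc",
    "levtype": "sfc",
    "param": "2t",
    "date": "2025-01-01",
    "time": "00",
    "step": [0, 6, 12],
}

text = to_mars_request("retrieve", request)
print(text)
```

The same set of keys can describe a retrieval, an availability query or a processing step, which is what allows one vocabulary to be shared across services.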

We also need to support users who are not yet familiar with ECMWF data and access systems. This calls for a multi-faceted approach that combines user-friendly discovery with machine-readable interfaces. 

ECMWF operates several data catalogues tailored to different user groups. Some are web interfaces – such as the CDS Dataset Explorer – that help users discover datasets relevant to their needs. Others, such as the MARS catalogue, enable experienced users to locate specific data precisely. We also provide catalogues built on technologies such as the SpatioTemporal Asset Catalog (STAC), offering efficient machine-readable interfaces to search, filter and identify data assets. 
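To give a sense of what a machine-readable search can look like, here is a minimal sketch that builds the body of a STAC API item search. The fields `collections`, `bbox`, `datetime` and `limit` come from the STAC API specification; the collection name is made up, and whether a given ECMWF catalogue exposes this search endpoint is an assumption to check against its documentation.

```python
import json


def build_stac_search(collections, bbox, start, end, limit=10):
    """Build a STAC API item-search body (illustrative helper).

    bbox is (west, south, east, north) in degrees; start and end are
    RFC 3339 timestamps that are combined into a closed interval.
    """
    west, south, east, north = bbox
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        "datetime": f"{start}/{end}",
        "limit": limit,
    }


# "example-reanalysis" is a hypothetical collection identifier.
payload = build_stac_search(
    ["example-reanalysis"],
    (-10.0, 35.0, 30.0, 60.0),
    "2025-01-01T00:00:00Z",
    "2025-01-31T23:59:59Z",
)
body = json.dumps(payload)  # would be sent as the body of POST <catalogue>/search
print(body)
```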

ECMWF reduces barriers to uptake by aligning, as far as possible, with well-defined and widely used standards and by providing tooling that makes our services accessible and practical to use. 

Supporting standards 

ECMWF has long contributed to the development of standards, engaging with the WMO (on GRIB and BUFR), the Open Geospatial Consortium (OGC) and other standards bodies. But formal standards are not enough: to be useful, they must be usable – supported by real implementations and practical tooling. ECMWF invests heavily in these building blocks to enable downstream use of our data. 

Beyond GRIB and BUFR, ECMWF actively champions the adoption of the OGC APIs – RESTful interfaces that provide standardised, programmatic access to geospatial data. We are contributing to their evaluation within the WMO Study Group on Future Data Infrastructure (SG-FIT), where they are expected to be recommended for international data exchange. We also align our NetCDF and Zarr datasets with the Climate and Forecast (CF) conventions, which standardise the description of variables and associated coordinates and underpin a wide range of community workflows. 
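As a rough illustration of what the CF conventions standardise, the sketch below describes three variables with CF-style attributes and flags the ones that lack basic metadata. The function `cf_issues` and the sample variables are invented for this example, and the two checks cover only a small part of the conventions (units for dimensional quantities, and at least one of `standard_name` or `long_name`).

```python
def cf_issues(name, attrs):
    """Return a list of basic CF metadata problems for one variable.

    Illustrative check only; real CF compliance covers far more.
    """
    issues = []
    if "units" not in attrs:
        issues.append(f"{name}: missing 'units'")
    if "standard_name" not in attrs and "long_name" not in attrs:
        issues.append(f"{name}: needs 'standard_name' or 'long_name'")
    return issues


# Sample variables; the attribute values follow CF naming for temperature
# and latitude, while "precip" is deliberately incomplete.
variables = {
    "t2m": {"standard_name": "air_temperature", "units": "K"},
    "latitude": {"standard_name": "latitude", "units": "degrees_north"},
    "precip": {"long_name": "Total precipitation"},
}

problems = [msg for name, attrs in variables.items() for msg in cf_issues(name, attrs)]
print(problems)
```

Because the attribute names and vocabulary are shared, generic tools can identify coordinates and physical quantities without dataset-specific code.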

ECMWF develops robust, full-featured reference implementations of standards for handling meteorological data. The most widely used of these is ecCodes, a library for encoding and decoding GRIB and BUFR, which is used across the global meteorological community. 
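To show the kind of low-level detail such a library takes care of, the following sketch reads only the 16-byte indicator section that opens a GRIB edition 2 message. It uses the Python standard library rather than ecCodes, the function name is made up for this example, and the sample bytes are constructed by hand; real decoding should go through ecCodes, which handles every section, both GRIB editions, and the associated code tables.

```python
import struct


def parse_grib2_indicator(header: bytes) -> dict:
    """Parse Section 0 of a GRIB edition 2 message (illustrative only).

    Layout: "GRIB" (4 bytes), reserved (2), discipline (1), edition (1),
    total message length (8), all big-endian.
    """
    if len(header) < 16:
        raise ValueError("need at least 16 bytes")
    magic, _reserved, discipline, edition, total_length = struct.unpack(
        ">4sHBBQ", header[:16]
    )
    if magic != b"GRIB":
        raise ValueError("not a GRIB message")
    if edition != 2:
        raise ValueError(f"this sketch only handles edition 2, got {edition}")
    return {
        "discipline": discipline,
        "edition": edition,
        "total_length": total_length,
    }


# Hand-built header: discipline 0 (meteorological products), edition 2,
# and an arbitrary message length of 179 bytes.
sample = b"GRIB" + b"\x00\x00" + bytes([0, 2]) + (179).to_bytes(8, "big")
info = parse_grib2_indicator(sample)
print(info)
```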

Working implementations are essential but not sufficient. To bring new users on board, we also need tools that are easy to use, that span different use cases and data types, and that capture domain practice in accessible workflows. In short, we need ergonomic tooling. 

ECMWF has developed earthkit – an open-source, Python-based software ecosystem that makes working with Earth system data easier and more coherent across a wide range of contexts, while keeping the learning curve manageable. It has a component-based architecture that supports reusability, interoperability and open development, and it provides support for cloud-based workflows, GPU-based operations and in-memory processing to maximise performance and minimise unnecessary data movement. 

Earthkit already underpins a wide range of internal operational services and workflows, and we are preparing the release of earthkit 1.0, the first set of stable APIs, planned for Q2 2026, to further support the wider community. 

Figure: The earthkit ecosystem, shown as a globe surrounded by earthkit modules (data, geo, hydro, plots, meteo and transforms).

Making it all work 

Without operational workflows, systems and services that actually deliver data at scale, all of the above remains aspirational. Delivering at that scale is demanding, because meteorological and climate data systems are inherently complex. 

ECMWF has operated large-scale systems for decades. These include highly resource-intensive, time-critical operations that generate forecasts, produce hundreds of terabytes of data each day, and disseminate millions of products to Member States and other external users. ECMWF also operates user-facing components of the EU's Copernicus programme, serving data that monitor the current and historical state of the Earth system to millions of users. 

Our pipelines connect many data sources – through robust ingestion and large, carefully orchestrated suites of scientific models and tools – and distribute outputs via services including WebMARS, ECMWF Production Data Store (ECPDS), Polytope and Open Data. Our systems are designed from the ground up to support everything from individual research queries to large-scale operational workflows, consistently and reliably. 

Into the future 

The world never stands still, and this is as true in Earth system science as anywhere else. New data sources, use cases and standards are constantly emerging, particularly driven by the growth of cloud-based and AI workflows. Looking ahead, both the volume and diversity of the data we handle will continue to grow, as will our users' needs. 

ECMWF forecasts and those produced in DestinE are generating global Earth-system data at kilometre-scale resolution. As resolution rises, the fraction of the data any one user needs for their use case continues to shrink. 

Polytope is a new data service that lets users retrieve only the subsets and features they need – for example, a point time-series, a vertical profile, or all data within a polygon. It provides semantic access to gridded meteorological data at native resolution, without copying or restructuring the data first. 
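The idea of requesting a feature rather than a whole field can be sketched locally. The example below selects, from a small made-up grid, only the points that fall inside a polygon, using a standard ray-casting test. It illustrates the concept only: it is not how Polytope is implemented, the function names are invented here, and the simple planar test ignores complications such as the date line and the poles.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going east."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_within(grid: Dict[Point, float], polygon: List[Point]) -> Dict[Point, float]:
    """Keep only the grid values whose location lies inside the polygon."""
    return {pt: val for pt, val in grid.items() if point_in_polygon(pt, polygon)}


# Made-up 5 x 5 grid of values on integer longitudes 0..4 and latitudes 40..44.
grid = {(lon, lat): 280.0 + lat / 10 for lon in range(5) for lat in range(40, 45)}
polygon = [(0.5, 40.5), (3.5, 40.5), (3.5, 43.5), (0.5, 43.5)]

subset = select_within(grid, polygon)
print(len(grid), "grid points ->", len(subset), "inside the polygon")
```

In a feature-extraction service, this selection happens on the server side, so only the selected values travel to the user.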

Polytope is being deployed at ECMWF’s Bologna data centre and at several other data centres across Europe, where it will support current and future data services. Polytope on the service side and the upcoming earthkit 1.0 release on the user side together position ECMWF well to meet growing data volumes and increasingly diverse user needs. 


Further reading

This article is part of ECMWF’s In Focus series on data, exploring how evolving infrastructure, open data, and AI-ready systems are reshaping access to weather and climate information: