Image: iStock / Getty Images Plus
Every day, ECMWF’s Integrated Forecasting System (IFS) and Artificial Intelligence Forecasting System (AIFS) generate approximately 400 terabytes (TB) of operational data, of which around 100 TB is distributed to users via dissemination. This is some of the highest-quality and most skilful weather forecast output in the world.
For many years, most of that data was subject to a highly restrictive commercial licence, accessible only under tightly controlled conditions. This meant that its potential for broader use in research, AI model development, and downstream applications remained largely unrealised.
The concept of data friction describes the resistance that users, providers, and systems encounter when moving data from where it exists to where it can create value. As the AI weather revolution has demonstrated, deciding which frictions are worth keeping and which are overdue for removal is among the most consequential choices an environmental data institution such as ECMWF can make.
ERA5, produced by ECMWF within the EU’s Copernicus Climate Change Service (C3S) and governed by the Copernicus open data policy from its very first release, offers an instructive contrast. It was released without a licensing barrier – yet it still presented users with real access challenges: registration requirements, API complexity, throttling, and the domain knowledge required to work productively with GRIB-format data. These frictions persist even when a dataset has an unrestricted licence, and they matter because they shape who can work with data and at what scale. The broader story of how ERA5 and the open-source Anemoi framework contributed to the AI weather revolution is explored in an earlier article in this series.
In recent years, ECMWF has taken a deliberate, stepwise approach to opening its IFS products. AIFS launched mid-way through this process and was open by design from the outset. In October 2025, the remaining restrictive licence on IFS operational products was removed, completing the transition and making a substantial volume of data available for uses that were previously not permitted. Both the IFS and ERA5 stories illustrate the same underlying dynamic: the distance between data and its potential value is not fixed. It is shaped by decisions about formats, licences, access mechanisms, APIs, support, documentation, and governance that together determine how much effort users must expend before they can work with the data productively.
What data friction describes
Data friction takes several forms. Some of it is structural: producing FAIR-compliant data – data that is Findable, Accessible, Interoperable, and Reusable – requires sustained investment in metadata, cataloguing, standardised formats, and access infrastructure. Some friction accumulates incidentally, through legacy decisions, format conventions, or API designs that have not kept pace with evolving user expectations.
Some friction is also deliberate and serves a clear purpose. A requirement to register before accessing a dataset adds a step for the user. From the provider’s perspective, it is also a mechanism for understanding who is using their data, for what purposes, and at what scale. More importantly, this knowledge supports funding decisions, informs service improvements, and maintains a productive relationship between provider and user over time. Anonymous access cuts that connection, making it harder to justify investment, demonstrate impact, or improve what is offered. Understood in these terms, registration is friction that sustains the relationship between provider and user over the longer term rather than impeding it.
Rate limits and access prioritisation are another example. Applied transparently, they impose minimal burden on most users while protecting shared infrastructure from degradation and ensuring fair use of the service. At ECMWF, prioritisation mechanisms ensure that Member and Co-operating States can always access the data they need to fulfil their operational obligations, regardless of broader system load.
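The combination described here, a shared rate limit plus a guaranteed path for priority users, can be illustrated with a token bucket. The Python below is a minimal sketch under stated assumptions: the class names, the user identifiers and the rule that priority users bypass the shared limit entirely are hypothetical, and it does not describe how ECMWF’s own systems implement prioritisation.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holding at most `capacity`."""

    rate: float
    capacity: float
    tokens: float = field(init=False)
    last: Optional[float] = field(default=None, init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full so an initial burst is allowed

    def try_consume(self, now: float, cost: float = 1.0) -> bool:
        """Take `cost` tokens at time `now` (in seconds) if enough are available."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            # Refill in proportion to elapsed time, never beyond capacity.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PriorityGate:
    """Hypothetical admission gate: users in `priority_users` are never throttled."""

    def __init__(self, shared: TokenBucket, priority_users: Set[str]) -> None:
        self.shared = shared
        self.priority_users = priority_users

    def admit(self, user: str, now: float) -> bool:
        if user in self.priority_users:
            return True  # operational users are served regardless of load
        return self.shared.try_consume(now)
```

In use, callers would pass `time.monotonic()` as `now`; taking the timestamp as an argument keeps the sketch deterministic and easy to test. A real service would more likely reserve capacity for priority users than let them bypass limits altogether.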
The more challenging category is friction that was once justified but has since become misaligned with current norms, capabilities, or expectations: commercial licensing on publicly funded datasets, access systems requiring institutional affiliation, and proprietary APIs that cannot interoperate with neighbouring services. These may have served clear purposes when established. As the data landscape evolves, they can become barriers to value that are difficult to quantify until removed – at which point the scale of what was being constrained often becomes apparent.
A framework for evaluating dataset friction
Dataset Friction Factors (DFFs) provide a structured framework for examining these barriers across six dimensions, each contributing to the overall effort a user must expend to work with a dataset productively:
- Discoverability and understanding covers documentation quality and data catalogue integration – whether a dataset can be found in an open or federated catalogue, and whether the documentation is structured and accessible to users of different expertise levels.
- Access and delivery examines the complexity of the access method; the type of API, where OGC Environmental Data Retrieval (EDR), STAC (SpatioTemporal Asset Catalog), and RESTful standards represent low friction and custom Command Line Interface (CLI) tools or proprietary scripts represent high friction; and the transparency and generosity of rate limits and throttling.
- Licence and legal covers licence openness (ranging from CC-BY 4.0 at the low-friction end to restricted use with no redistribution at the high-friction end), licence clarity, and the clarity of attribution requirements.
- Data structure and format addresses data format accessibility (GRIB and NetCDF can be made more accessible through tooling such as ecCodes, earthkit, and CDSAPI), file organisation, and the flexibility of temporal and spatial granularity.
- Tooling and support covers the availability of open tools and SDKs, community or peer support, and the responsiveness of institutional helpdesk provision.
- Overall complexity captures consistency across datasets, the learning curve for new users, and update transparency – whether changes are communicated via stable APIs and versioned logs or arrive as silent breaks.
A visual summary of the six Dataset Friction Factors.
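One way to apply the framework is to rate each dimension on the low-to-high friction scale used in the descriptions above, then look for the dimensions where friction is highest. The sketch below assumes a three-point scale and a simple sum; the framework as described does not prescribe a scoring method, so the numeric values, the dimension keys and the example ratings are illustrative only.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Friction(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# The six Dataset Friction Factors, as hypothetical keys.
DIMENSIONS = (
    "discoverability_and_understanding",
    "access_and_delivery",
    "licence_and_legal",
    "data_structure_and_format",
    "tooling_and_support",
    "overall_complexity",
)


def assess(ratings: Dict[str, Friction]) -> Tuple[int, List[str]]:
    """Return a total friction score and the dimension(s) rated highest."""
    missing = set(DIMENSIONS) - set(ratings)
    if missing:
        raise ValueError(f"unrated dimensions: {sorted(missing)}")
    total = sum(int(ratings[d]) for d in DIMENSIONS)
    worst = max(ratings[d] for d in DIMENSIONS)
    hotspots = [d for d in DIMENSIONS if ratings[d] == worst]
    return total, hotspots


# Example: open licence and good documentation, but an awkward file
# organisation and a moderately complex access method.
example = {d: Friction.LOW for d in DIMENSIONS}
example["data_structure_and_format"] = Friction.HIGH
example["access_and_delivery"] = Friction.MEDIUM
```

Returning the highest-rated dimensions alongside the total reflects the idea that frictions compound: a modest overall score can still conceal a single dimension that blocks users.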
Across these six dimensions, frictions compound. A dataset with an open licence and strong documentation may still present significant barriers if its temporal granularity is fixed at inconvenient resolutions or if its format lacks tooling support. A moderately complex format like GRIB can be made substantially more accessible through well-designed tooling – libraries like ecCodes and earthkit, programmatic access via CDSAPI, and API designs aligned with standards like OGC EDR all reduce the effective friction of working with GRIB considerably. The framework focuses attention on where the greatest reductions in user effort are available, and on which dimensions a provider has the most leverage.
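To illustrate what a standards-aligned API looks like from the user’s side, the sketch below builds an OGC API - Environmental Data Retrieval (EDR) position query, which requests selected parameters at a single point. The `position` query type and the `coords`, `parameter-name` and `datetime` query parameters come from the EDR standard; the server URL, collection identifier and parameter names are hypothetical placeholders, not an actual ECMWF endpoint.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode


def edr_position_url(
    base_url: str,
    collection: str,
    lon: float,
    lat: float,
    parameters: Sequence[str],
    when: Optional[str] = None,
) -> str:
    """Build an EDR position query URL for a single point."""
    query = {
        # Well-Known Text point, assuming the default CRS84 axis order
        # (longitude first, then latitude).
        "coords": f"POINT({lon} {lat})",
        "parameter-name": ",".join(parameters),
    }
    if when is not None:
        query["datetime"] = when  # an RFC 3339 instant or a "start/end" interval
    path = f"/collections/{quote(collection, safe='')}/position"
    return base_url.rstrip("/") + path + "?" + urlencode(query)


# Hypothetical server, collection and parameter names.
example_url = edr_position_url(
    "https://edr.example.org/", "hypothetical-ifs-collection",
    1.5, 51.2, ["2t", "10u"], "2025-10-01T00:00:00Z",
)
```

The point of the example is that the same query shape works against any EDR-compliant server; only the base URL and collection identifier change.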
“Dataset Friction Factors give us a language for a set of decisions that have always existed but have rarely been examined systematically. Every data policy choice is a friction choice – the question is whether we are making it deliberately,” said Emma Pidduck, User Solutions Team Leader at ECMWF.
The emergence of AI-assisted interfaces is beginning to change this picture. Conversational tools can help users construct API queries, interpret metadata, or select parameters without the domain knowledge that has traditionally been a prerequisite. These tools depend on well-documented data and services, and they have the potential to reduce unintentional friction considerably, particularly for users approaching complex datasets for the first time. But they also raise new concerns about the accuracy of AI-generated interpretations of scientific data, where errors may not be apparent to users who lack the background to verify them.
Open data at scale: friction as a sustainability instrument
Removing a licensing restriction does not remove all friction – nor should it. The scale of ECMWF’s IFS operational data is substantial and continues to grow. Providing this volume freely and openly, without constraint, to an unbounded global user base would impose significant infrastructure costs on networks, storage, and compute that few public institutions could absorb indefinitely. Managing this reality is itself an exercise in deliberate friction design.
ECMWF’s approach combines several instruments. A defined subset of IFS products is available free of charge, with delivery subject to latency offsets and network restrictions that spread the load and protect operational delivery. Users requiring faster access, higher volumes, additional parameters or enhanced service levels – including commercial developers, large-scale research pipelines, and operational users outside the Member and Co-operating State network – are served through a fee structure calibrated to service level and volume. This is friction working as intended: not to restrict access or usage, but to allocate infrastructure fairly and sustain the quality of service provided.
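The effect of a latency offset can be expressed as a simple availability rule: the same product becomes retrievable at different times depending on the service level. The Python sketch below assumes two hypothetical tiers and an illustrative two-hour offset; the tier names and the offset value are assumptions, not ECMWF’s published service levels.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict


@dataclass(frozen=True)
class ServiceTier:
    name: str
    latency_offset: timedelta  # delay applied after the product is ready


# Hypothetical tiers; the offset below is illustrative only.
TIERS: Dict[str, ServiceTier] = {
    "free": ServiceTier("free", timedelta(hours=2)),
    "fee-based": ServiceTier("fee-based", timedelta(0)),
}


def available_from(product_ready: datetime, tier: str) -> datetime:
    """Earliest time a user on `tier` may retrieve a product ready at `product_ready`."""
    return product_ready + TIERS[tier].latency_offset


def can_retrieve(product_ready: datetime, tier: str, now: datetime) -> bool:
    return now >= available_from(product_ready, tier)


# Example: a product ready at 06:00 UTC.
ready = datetime(2025, 10, 1, 6, 0, tzinfo=timezone.utc)
free_time = available_from(ready, "free")  # 08:00 UTC under the assumed offset
```

Shifting free-tier retrieval into a later window spreads demand on shared networks without changing the content of the product itself.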
A complementary instrument is the fee waiver scheme, which removes financial friction where it is most consequential. Qualifying users can access data at no service cost, ensuring that the ability to pay does not become the determining factor in who can benefit from publicly funded data. Taken together, these mechanisms represent a considered allocation of friction: generous where access should be broad, cost-reflective where demand places a genuine burden on shared systems.
A visual overview of how reducing friction in data access unlocks greater value from ECMWF’s meteorological data. Credit: generated with NotebookLM
Friction between data producers and providers
The DFF framework applies most naturally to individual datasets and the organisations that produce them. A substantial share of friction in the European data landscape, however, arises at the boundaries between organisations, and a particular form of it occurs when a dataset produced by one organisation is subsequently distributed by another.
When a third-party provider redistributes ECMWF products, the original producer has no visibility of how they are served, documented, or supported. Users encountering ECMWF products through a redistribution channel may face a friction profile that differs entirely from what ECMWF itself provides: different API conventions, different metadata quality, different latency characteristics, and different support arrangements. ECMWF cannot speak to the accuracy of redistributed processing, guarantee alignment with its own documentation, or resolve issues that originate downstream. This creates a category of friction that is effectively invisible to the producer: real in its effects on users, but outside the producer’s control. In some cases, however, third-party redistributors remove friction and make access much easier, sometimes free of charge and sometimes for a fee.
More broadly, a data producer that also acts as a direct provider occupies a different position from organisations that distribute data they did not generate. The former is responsible for production, quality assurance, and user support; the latter may serve the same data with lower overhead. Navigating this asymmetry, by maintaining the value of direct provision while supporting the broader reach that redistribution offers, is one of the more complex friction-management questions facing major environmental data institutions.
A user who needs to combine data across multiple providers, each with different access conventions, metadata standards, and licensing frameworks, faces an aggregate friction that no single provider can fully address. This is where cross-institutional standards – OGC EDR, STAC, shared federated identity – and the governance frameworks being developed under the Common European Data Spaces initiative provide the most leverage. Their purpose is precisely to reduce the friction that accumulates at institutional boundaries, and that individual providers cannot resolve alone.
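Shared standards reduce this aggregate friction because a single query shape can be sent to every compliant provider. The sketch below builds one STAC API item search request, using the `collections`, `bbox`, `datetime` and `limit` fields defined by the STAC API specification, and pairs it with each provider’s `/search` endpoint. The provider URLs and the collection identifier are hypothetical; in practice collection names usually differ between catalogues, a residual friction that standards alone do not remove.

```python
import json
from typing import Dict, List, Sequence, Tuple


def stac_search_body(
    collections: Sequence[str],
    bbox: Sequence[float],
    start: str,
    end: str,
    limit: int = 10,
) -> Dict[str, object]:
    """Build a STAC API item-search payload (bbox: min lon, min lat, max lon, max lat)."""
    return {
        "collections": list(collections),
        "bbox": list(bbox),
        "datetime": f"{start}/{end}",  # closed RFC 3339 interval
        "limit": limit,
    }


def fan_out(provider_roots: Sequence[str], body: Dict[str, object]) -> List[Tuple[str, str]]:
    """Pair each provider's /search endpoint with the same JSON payload for a POST."""
    payload = json.dumps(body)
    return [(root.rstrip("/") + "/search", payload) for root in provider_roots]


# Hypothetical providers and collection name.
requests_to_send = fan_out(
    ["https://stac.provider-a.example/", "https://provider-b.example/stac"],
    stac_search_body(["hypothetical-reanalysis"], [-10.0, 35.0, 30.0, 60.0],
                     "2025-01-01T00:00:00Z", "2025-01-31T23:59:59Z"),
)
```

Without a common standard, each provider in the list would need its own request adapter; with one, the only per-provider detail is the root URL.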
“The friction that matters most in environmental data today is rarely within organisations – it is between them. Data spaces represent a serious and timely attempt to address that, and the meteorological community has a real opportunity to shape how they develop,” said Jeremy Tandy, Principal Fellow at the Met Office, UK.
Friction as a design question
Dataset Friction Factors do not resolve data policy questions. Every decision about a licence, an API design, an access mechanism, a metadata standard, or a support model creates or removes friction for users. Bringing these decisions into a common framework makes it easier to ask which frictions are serving their intended purpose, which have accumulated without intention, and which were once justified but are now due for review.
ECMWF’s experience in providing IFS products over the past several years illustrates what this looks like in practice. The decision to move from restrictive licensing to a stepwise opening was not a single policy change but a sequence of decisions, each reducing friction in specific dimensions while preserving the mechanisms – tiered access, fee structures, operational prioritisation, fee waivers – that allow ECMWF to sustain the quality and reliability of the data and services it provides.
The objective of a modern data strategy is not to eliminate all friction, but to ensure that every barrier is deliberate: understood, justified, and kept under review. This requires knowing exactly what each restriction protects and how it affects the users and communities concerned.
As the volume of available environmental data continues to grow and AI-based tools lower the threshold for working with complex datasets, the question of how data reaches its users becomes more consequential, not less. The institutions best placed to create value from their data will be those that have thought carefully about what stands between it and the users who need it.
Further reading
This article is part of ECMWF’s In Focus series on data exploring how evolving infrastructure, open data, and AI-ready systems are reshaping access to weather and climate information: