The living archive: inside ECMWF’s exabyte-scale meteorological data repository

ECMWF's Meteorological Archival and Retrieval System (MARS) is one of the largest meteorological archives in the world.

What began in the mid-1980s as a pioneering archival and retrieval system has evolved into an exabyte-scale archive of operational and research data. The archive grows by roughly 480 terabytes each day and now holds more than 670 billion meteorological fields.

Yet MARS is far more than a long-term vault. It’s a living, continuously accessed archive that supports operational forecasting, scientific research and major European programmes. Its architecture, data model and governance have been deliberately designed to ensure consistency across decades of technological change, while meeting the demands of a global user community. 

From gigabytes to exabytes

When MARS started operating in 1987, it ran on a mainframe equipped with 12.5 gigabytes of disk storage and a mass storage system capable of holding 35 gigabytes. 

Initially, it covered deterministic analyses and medium-range forecasts from ECMWF’s numerical weather prediction system.

Over the decades, however, the archive has grown substantially, both in scale and in the variety of datasets it contains. 

Today MARS holds a wide range of meteorological and climate data, including: 

  • operational forecast outputs 
  • ensemble prediction system data 
  • ensemble data assimilation outputs 
  • seasonal prediction datasets 
  • limited-area model experiments 
  • climate reanalysis products 
  • research experiments 

This includes operational outputs from both ECMWF’s Integrated Forecasting System (IFS) and the Artificial Intelligence Forecasting System (AIFS), which are archived daily as part of ECMWF’s global analyses and forecasts. 

This expanding dataset forms a long-term record of the atmosphere, enabling research on weather and climate processes across decades. 

One of the system’s defining features is continuity. Despite major technological changes over the years, the structure of MARS requests has remained stable. A data request written decades ago can still retrieve the same dataset today.

“What makes MARS unique is that it never sleeps. It’s continuously ingesting new data while simultaneously serving hundreds of thousands of retrievals each day – a true living archive,” said Umberto Modigliani, Acting Director of Forecasts and Services at ECMWF.

Growth of the ECMWF data archive from 2003 to 2025. The curves rise slowly until around 2012 and then accelerate: by 2025, Meteorological Archival and Retrieval System (MARS) Primary storage is just under 600 PB, MARS Primary plus ECMWF File Storage (ECFS) Primary is near 900 PB, and the full system including the secondary data store exceeds 1,200 PB, reflecting the accelerating volume of data generated and preserved by ECMWF.

Most large archives are designed on a “write once, read rarely” principle, storing data for preservation with minimal access.  

MARS, on the other hand, operates very differently, handling a continuous daily stream of retrieval requests from scientists and operational users worldwide.

The archive supports thousands of users, with many more accessing its data via associated platforms. On an average day, the system performs more than 20,000 tape mounts, reflecting high demand for data.

This heavy read workload creates unique challenges. Much of the archive resides on large tape libraries, which provide highly cost-efficient long-term storage but must cope with frequent access. 

To manage this demand, MARS maintains disk-based caches that stage recently produced or frequently accessed data, reducing the need to repeatedly retrieve the same fields from tape. 
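This staging logic can be pictured as a read-through cache sitting in front of tape. The following Python sketch is purely illustrative (the class, names, and eviction policy are assumptions, not ECMWF's implementation); it shows how repeated requests for the same field are served from disk rather than triggering a new tape mount:

```python
from collections import OrderedDict

class StagingCache:
    """Read-through disk cache in front of a slow tape backend (illustrative sketch)."""

    def __init__(self, fetch_from_tape, capacity=4):
        self.fetch_from_tape = fetch_from_tape  # expensive retrieval function (tape mount)
        self.capacity = capacity                # maximum number of fields held on "disk"
        self.cache = OrderedDict()              # field reference -> staged data
        self.tape_reads = 0

    def get(self, ref):
        if ref in self.cache:
            self.cache.move_to_end(ref)         # mark as recently used
            return self.cache[ref]              # fast path: served from disk
        self.tape_reads += 1
        data = self.fetch_from_tape(ref)        # slow path: read from tape
        self.cache[ref] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict the least recently used field
        return data

# Three requests for the same field cost only one tape read.
cache = StagingCache(fetch_from_tape=lambda ref: f"bytes-of-{ref}")
for _ in range(3):
    cache.get("2t/2024-01-01/step=0")
print(cache.tape_reads)  # -> 1
```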

The system operates within ECMWF’s Data Handling System (DHS), which manages the physical storage hierarchy across disks and tape libraries.

Querying the archive 

One of the most distinctive features of MARS is that users do not retrieve files directly. 

Instead, they query the archive using a specialised meteorological description language, which provides semantic access: users describe the data they want, not the files that contain it.

Users specify the characteristics of the data they need, such as: 

  • parameter (for example temperature or wind) 
  • date and time 
  • forecast step 
  • model configuration 
  • vertical level 
  • geographic domain 

The system then translates these requests into the corresponding meteorological fields. 
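A request of this kind is written as keyword=value pairs. The sketch below is a minimal, assumed formatter for that style of request; the keyword names (param, date, time, step, levtype, levelist, grid) are genuine MARS keywords, while the helper function and the example values are illustrative only:

```python
def format_mars_request(verb, **keys):
    """Render a MARS-style request as keyword = value lines (illustrative formatter)."""
    lines = [verb.upper() + ","]
    lines += [f"    {k} = {v}," for k, v in keys.items()]
    lines[-1] = lines[-1].rstrip(",")  # no trailing comma after the last keyword
    return "\n".join(lines)

request = format_mars_request(
    "retrieve",
    param="t",             # temperature
    date="2024-01-01",
    time="00",
    step="0/to/240/by/24", # forecast steps, hours
    levtype="pl",          # pressure levels
    levelist="500/850",
    grid="0.25/0.25",      # regular lat/lon grid
)
print(request)
```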

Behind the system lies a two-layer architecture:

  • The MARS Server contains the semantic knowledge of the data. It knows what a meteorological field is and what a forecast is. 
  • The Data Server manages the physical knowledge. It knows where the data is located – whether on disk, tape or in a cache – and retrieves it accordingly. 

MARS architecture: from requests to references, files and data. The MARS Server handles data references rather than files. When a user request is processed, the MARS Server translates it into a list of data references, which are passed to the Data Server. The Data Server resolves each reference to the corresponding files and returns the data.
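The request-to-references-to-data flow can be sketched as two cooperating layers. Everything below (class names, the index structure, the tiny catalogue) is an illustrative assumption, not ECMWF's code; it only demonstrates how the semantic layer never needs to know where the bytes live:

```python
class DataServer:
    """Physical layer: knows where each reference lives (disk, cache or tape)."""
    def __init__(self, locations):
        self.locations = locations      # data reference -> (medium, payload)

    def read(self, ref):
        medium, payload = self.locations[ref]
        return payload                  # the storage medium is invisible to the user

class MarsServer:
    """Semantic layer: knows what a field is; resolves requests to data references."""
    def __init__(self, index, data_server):
        self.index = index              # list of (metadata, data reference) pairs
        self.data_server = data_server

    def retrieve(self, request):
        refs = [ref for meta, ref in self.index
                if all(meta.get(k) == v for k, v in request.items())]
        return [self.data_server.read(r) for r in refs]

# Tiny illustrative catalogue: two temperature fields on different media.
data = DataServer({"ref-1": ("disk", "t@500hPa"), "ref-2": ("tape", "t@850hPa")})
mars = MarsServer(index=[({"param": "t", "level": "500"}, "ref-1"),
                         ({"param": "t", "level": "850"}, "ref-2")],
                  data_server=data)
print(mars.retrieve({"param": "t"}))  # -> ['t@500hPa', 't@850hPa']
```

Because only the DataServer maps references to media, the physical storage can be reorganised without changing how requests are written.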


Example of an archived object in MARS, stored as hypercubes. Each cube represents a specific time, forecast date, and ensemble member, and is subdivided by parameters, levels, and time steps.
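One consequence of the hypercube layout is that a field's position follows directly from its coordinates along each axis. The sketch below is an assumed, simplified illustration (the axes and their values are examples, not an actual MARS object):

```python
# Illustrative hypercube axes for one archived object (example values only).
axes = {
    "date":   ["2024-01-01", "2024-01-02"],
    "member": [0, 1, 2],           # ensemble members
    "time":   ["00", "12"],
    "level":  [500, 850],
    "param":  ["t", "u"],
}

def flat_index(coords):
    """Map one field's coordinates to its position in the flattened hypercube."""
    idx = 0
    for name, values in axes.items():           # row-major order over the axes
        idx = idx * len(values) + values.index(coords[name])
    return idx

total = 1
for values in axes.values():
    total *= len(values)
print(total)  # -> 48 fields in this small cube
print(flat_index({"date": "2024-01-01", "member": 0,
                  "time": "00", "level": 500, "param": "t"}))  # -> 0
```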

Because these layers are separated, the archive can reorganise its physical storage without affecting how users access the data. 

To make this vast archive accessible, ECMWF provides WebMARS, a web-based interface for searching, browsing and retrieving data without needing to interact directly with the underlying archive.

Within ECMWF’s broader data infrastructure, MARS sits alongside two other specialised systems: 

  • The Fields DataBase (FDB), used for storing and retrieving operational model fields 
  • ECMWF File Storage (ECFS), which complements MARS by providing a logical view of unstructured data, accessible via UNIX-like commands and stored on the same tape hierarchy, managed by the High Performance Storage System (HPSS). 

Together, these systems form the backbone of ECMWF’s data handling capabilities. 

Ensuring trust through metadata governance 

Each field archived in MARS is stored with detailed metadata describing its scientific characteristics such as model cycle, experiment version, parameter, level and time. These metadata conventions, defined through the MARS request and indexing model, allow users to trace data back to the forecasting or assimilation systems that produced them, supporting long-term scientific reproducibility.

This design is reflected in ECMWF’s documentation of MARS keywords (such as expver, stream, type, and date) and the GRIB/BUFR formats used to encode meteorological fields, both of which carry rich descriptive information. 
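A fixed keyword vocabulary is what makes every field traceable. As a hedged illustration, the sketch below validates a metadata record against a few such keys; expver, stream, type, date, param and levtype are real MARS keywords, but the required-key set, the function and the example values are assumptions for demonstration:

```python
# Illustrative subset of the MARS keyword vocabulary (not the full schema).
REQUIRED_KEYS = {"expver", "stream", "type", "date", "param", "levtype"}

def validate_metadata(record):
    """Check that an archived field's metadata carries the keys needed for provenance."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"field not traceable, missing keys: {sorted(missing)}")
    return record

field = validate_metadata({
    "expver": "0001",      # experiment version: operational
    "stream": "oper",      # operational forecast stream
    "type": "fc",          # forecast field
    "date": "2024-01-01",
    "param": "t",
    "levtype": "pl",       # pressure levels
})
```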

A seamless relocation: moving MARS to Bologna 

In 2022, ECMWF completed one of the most complex infrastructure migrations in its history by relocating its operational data archive from Reading in the United Kingdom to its new data centre in Bologna, Italy. 

At the time, the archive contained more than 450 petabytes of primary data (around 700 petabytes including backup copies). 

Tape libraries, disk systems and servers were gradually dismantled in Reading, transferred to Bologna and reassembled – all while maintaining operational forecasting services. 

By the end of the process, the entire archive had been successfully relocated without interrupting ECMWF’s operational forecasts, demonstrating the robustness of the MARS architecture. 

Supporting major programmes 

Today, MARS underpins a number of major European initiatives. 

Large volumes of reanalysis and model data are archived in MARS and partially replicated into the cloud infrastructures powering the Copernicus Climate Change Service (C3S) and Copernicus Atmosphere Monitoring Service (CAMS) data stores. 

The Destination Earth (DestinE) initiative produces enormous Earth system datasets that require long-term preservation, quality control and retrieval. MARS and its associated systems play an increasing role in integrating, indexing and distributing these data holdings. 

Balancing cost and scalability 

Replicating the entire archive into commercial hyperscale cloud platforms would be prohibitively expensive at exabyte scale. MARS’s hybrid architecture, combining high-capacity tape libraries with targeted cloud replicas for user-facing services, provides a far more cost-efficient model while still supporting cloud native access for programmes such as Copernicus. 

A model of modern stewardship 

Despite its scale, MARS operates like a living system, continuously ingesting new model outputs and research experiments. 

Each new dataset and each retrieval request adds another layer to an archive that has been evolving for nearly four decades. 

“By combining innovative architecture, meticulous metadata management, and a pragmatic approach to cost and accessibility, MARS ensures that every byte is not only preserved, but remains discoverable, usable, and trustworthy for decades to come,” said Manuel Fuentes, Principal Analyst & Data Archives and Dissemination Services Team Leader at ECMWF.

As we pivot toward the era of AI forecasting systems, MARS is taking on its most important role yet. AI models are data-hungry: they require decades of high-quality, structured historical data to 'learn' the physics of the atmosphere. MARS is the ultimate training ground for these models, and, in turn, the outputs of these new AI forecasts are already being archived back into MARS every day. 

As we move deeper into an age where AI begins to sift through 40 years of atmospheric memory to predict the next superstorm, MARS stands as the essential infrastructure. It is a reminder that the future of our planet depends on our ability to remember its past – one petabyte at a time. 


Further reading

MARS is part of a broader data ecosystem at ECMWF that is evolving to support AI-driven forecasting and European data services. Discover how these systems interconnect to shape the future of weather and climate prediction in our recent In Focus articles: