For many years, ECMWF has operated a large-scale data handling system (DHS), in which all ECMWF users can store and retrieve the data needed for weather modelling, research in weather modelling and the mining of weather data. ECMWF's meteorological archive contains petabytes of operational and research data.
The user view
From the user viewpoint the DHS supports two main applications developed by ECMWF to hide the complexities of the underlying storage management from users:
MARS, the Meteorological Archival and Retrieval System, provides access to a powerful abstraction engine that allows the more than 7,000 registered users to access the meteorological data that has been collected or generated at ECMWF for almost 40 years. MARS stores GRIB and BUFR data, hiding from its users all of the details concerning the physical location and internal organisation of this data. It manages its own set of disk caches for staging data that has been recently acquired, generated or accessed. However, the bulk of its information is stored in HPSS (see below).
ECFS provides users with a logical view of a seemingly very large file system and is used for data that is not suitable for storing in MARS. UNIX-like commands enable users to copy whole files to and from any of ECMWF's computing platforms. ECFS uses the storage hierarchy of disks and tapes within HPSS to store the files and their associated metadata (file ownership, directory structure, etc.).
IBM's High Performance Storage System (HPSS) is used to manage data on a resilient hardware platform. HPSS keeps track of files that are stored, provides Hierarchical Storage Management (HSM) facilities when needed, and manages activities related to disks, tapes, tape drives and automated tape libraries.
HPSS is based on version 5 of the IEEE Mass Storage Reference Model. It supports a variety of tape drives and automated tape libraries. Data is transferred from multiple storage devices via multiple data streams over multiple network paths; in this way high aggregate transfer rates are achieved. 'Data movers' (specialised software modules), which can execute on different server machines, send streams of data directly between those servers and the client machines requesting the data transfer. This distributed multi-processing nature of HPSS is one of the keys to its scalability.
In turn, HPSS uses IBM's DB2, a high-performance database management system with advanced transaction-processing techniques, to guarantee security, protection and integrity of data.
All of the data is stored on tape. The primary copy is held on IBM 3592 TS1160 tape because it better supports the high levels of read access that ECMWF's workload requires. A secondary copy of the most critical data is kept on separate tapes, some in separate libraries and some off site. LTO tapes are used for the secondary copy, both to avoid holding data on the same type of media in case of an underlying technology fault and for cost-effectiveness, because these tapes need to be read only if a problem occurs.
The tapes are held in ten large automated tape library systems from IBM and SpectraLogic. In front of the tape systems there is a layer of disk storage that provides buffer and cache space. This disk layer is small, only a few percent of the total size of the archive. The buffer space allows data to be collected and then packed together so that it can be written to tape efficiently, and the cache space holds the most frequently requested files and fields to speed up user read requests.
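The buffering idea described above can be illustrated with a minimal sketch. It is not how HPSS or MARS actually implements aggregation; the class name `WriteBuffer`, the `flush_threshold_bytes` parameter and the file sizes are all hypothetical and chosen only to show small items being collected on disk and written out as larger batches.

```python
from dataclasses import dataclass, field


@dataclass
class WriteBuffer:
    """Toy model of a disk buffer that packs small files into tape-sized batches.

    Hypothetical illustration only; it does not reflect HPSS internals.
    """
    flush_threshold_bytes: int                      # assumed tuning parameter
    pending: list = field(default_factory=list)     # files waiting on disk
    pending_bytes: int = 0
    batches_written: list = field(default_factory=list)  # stands in for tape writes

    def add(self, name: str, size_bytes: int) -> None:
        """Stage a file on disk; write a batch once enough data has accumulated."""
        self.pending.append((name, size_bytes))
        self.pending_bytes += size_bytes
        if self.pending_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self) -> None:
        """Write all pending files as one batch (one sequential tape write)."""
        if not self.pending:
            return
        self.batches_written.append(list(self.pending))
        self.pending.clear()
        self.pending_bytes = 0


buf = WriteBuffer(flush_threshold_bytes=100)
for i, size in enumerate([30, 30, 50, 20, 90, 10]):
    buf.add(f"file{i}", size)
buf.flush()  # write whatever is left at the end

for batch in buf.batches_written:
    print([name for name, _ in batch])
```

In this toy run six files result in three batch writes rather than six individual ones, which is the effect the buffer space is intended to achieve for tape.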
A large fleet of dedicated Linux x86 servers from a variety of manufacturers is required to manage and support the system. They all connect to the tape and disk storage via a Fibre Channel Storage Area Network.
Some figures (October 2022)
On an average day the system handles requests for more than 13,000 tape mounts.
In a typical day the archive grows by about 287 TB, and 215 TB is retrieved.
MARS data represents about 75% of the volume of data stored in the DHS, but only about 4% of the number of files. ECFS data represents almost all of the remaining 25% of the data, corresponding to 96% of the files.
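These shares imply a large difference in average file size between the two applications. The short calculation below is a sketch based only on the percentages quoted above; it assumes, as a simplification, that ECFS accounts for exactly 25% of the volume, and the variable and function names are illustrative.

```python
# Shares of the DHS taken from the figures above (rounded values).
mars_volume_share, mars_file_share = 0.75, 0.04
ecfs_volume_share, ecfs_file_share = 0.25, 0.96  # assumes ECFS holds the full 25%


def relative_mean_size(volume_share: float, file_share: float) -> float:
    """Mean file size relative to the mean file size of the whole archive."""
    return volume_share / file_share


mars_rel = relative_mean_size(mars_volume_share, mars_file_share)
ecfs_rel = relative_mean_size(ecfs_volume_share, ecfs_file_share)
size_ratio = mars_rel / ecfs_rel

print(f"MARS mean file size vs archive mean: {mars_rel:.2f}x")
print(f"ECFS mean file size vs archive mean: {ecfs_rel:.2f}x")
print(f"MARS files are on average about {size_ratio:.0f} times larger than ECFS files")
```

Under these assumptions an average MARS file is roughly 72 times larger than an average ECFS file. Because the input percentages are rounded, the result should be read as an order of magnitude rather than a precise figure.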
Total amount of primary data: 510 PB
Secondary data: 188 PB
Number of tapes: 22,300 primary, 24,500 secondary
Number of Linux servers: 290
Number of IBM 3592 tape drives: 396
Number of LTO drives: 50
Total amount of usable disk space: 28 PiB
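A few derived quantities follow from the figures above. The sketch below is plain arithmetic on the quoted numbers, under several assumptions: PB is read as decimal (10^15 bytes) and PiB as binary (2^50 bytes), the tape mount figure is treated as exactly 13,000 although it is quoted as a lower bound, all drives are assumed to be in service, and the daily growth rate is assumed to stay constant. The variable names are illustrative.

```python
# Figures quoted above (October 2022).
primary_pb = 510
secondary_pb = 188
primary_tapes = 22_300
secondary_tapes = 24_500
drives = 396 + 50              # IBM 3592 drives plus LTO drives
mounts_per_day = 13_000        # quoted as "more than", so a lower bound
disk_pib = 28
daily_growth_tb = 287

# Convert the disk figure from PiB (2**50 bytes) to PB (10**15 bytes).
disk_pb = disk_pib * 2**50 / 10**15

# Disk layer as a fraction of all data held on tape.
disk_fraction = disk_pb / (primary_pb + secondary_pb)

# Average daily mounts per drive, assuming every drive is in service.
mounts_per_drive = mounts_per_day / drives

# Average amount of data held per tape, in TB.
tb_per_primary_tape = primary_pb * 1000 / primary_tapes
tb_per_secondary_tape = secondary_pb * 1000 / secondary_tapes

# Growth over one year if the daily rate stayed constant.
yearly_growth_pb = daily_growth_tb * 365 / 1000

print(f"Disk layer: {disk_fraction:.1%} of tape-resident data")
print(f"Mounts per drive per day: {mounts_per_drive:.1f}")
print(f"Data per primary tape: {tb_per_primary_tape:.1f} TB")
print(f"Data per secondary tape: {tb_per_secondary_tape:.1f} TB")
print(f"Growth at constant rate: {yearly_growth_pb:.1f} PB per year")
```

With these assumptions the disk layer comes to about 4.5% of the data held on tape, which is consistent with the "few percent" mentioned earlier, and each drive handles roughly 29 mounts per day on average.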