For many years, ECMWF has operated a large-scale data handling system (DHS), in which all ECMWF users can store and retrieve the data they need for weather modelling, research into weather modelling and mining of weather data.
The user view
From the user's viewpoint, the DHS supports two main applications, developed by ECMWF to hide the complexities of the underlying storage management:
- MARS, the Meteorological Archival and Retrieval System, provides access to a powerful abstraction engine that allows staff and applications to access the meteorological data collected or generated at ECMWF over more than 30 years. MARS stores GRIB and BUFR data, hiding from its users all details of the physical location and internal organisation of this data. It manages its own set of disk caches for staging data that has recently been acquired, generated or accessed. However, the bulk of its information is stored in HPSS (see below).
- ECFS provides users with a logical view of a seemingly very large file system, and is used for data that is not suitable for storing in MARS. UNIX-like commands enable users to copy whole files to and from any of ECMWF's computing platforms. ECFS uses the storage hierarchy of disks and tapes within HPSS to store the files and their associated metadata (file ownership, directory structure, etc.).
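To make the MARS idea more concrete, the following Python sketch shows a caller retrieving data by meteorological attributes alone while the archive keeps the physical placement to itself. It is a toy under stated assumptions, not MARS: the `FieldKey` attributes, the `ToyArchive` class and its one-storage-object-per-date placement policy are all hypothetical, and real MARS requests use their own keyword syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldKey:
    """Hypothetical request attributes describing a field, not a file."""
    param: str  # parameter name, e.g. "2t" (illustrative only)
    date: str   # date as "YYYYMMDD"
    step: int   # forecast step in hours


class ToyArchive:
    """Resolves meteorological keys to hidden physical locations (a toy, not MARS)."""

    def __init__(self) -> None:
        # key -> (storage object name, byte offset, length); callers never see this
        self._index: dict[FieldKey, tuple[str, int, int]] = {}
        # storage object name -> concatenated payloads
        self._objects: dict[str, bytes] = {}

    def archive(self, key: FieldKey, payload: bytes) -> None:
        # Hypothetical placement policy: group all fields of one date in one object.
        obj = f"obj-{key.date}"
        existing = self._objects.get(obj, b"")
        self._index[key] = (obj, len(existing), len(payload))
        self._objects[obj] = existing + payload

    def retrieve(self, key: FieldKey) -> bytes:
        # The caller supplies only meteorological attributes, never a path or offset.
        obj, offset, length = self._index[key]
        return self._objects[obj][offset:offset + length]


archive = ToyArchive()
archive.archive(FieldKey("2t", "20170201", 0), b"field-A")
archive.archive(FieldKey("2t", "20170201", 6), b"field-B")
print(archive.retrieve(FieldKey("2t", "20170201", 6)))  # b'field-B'
```

Because the index is private, the placement policy could change (for example, to move older fields to a different object) without any change to the requests users write, which is the property the MARS abstraction provides.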
Underlying storage management
Supporting ECFS and most of MARS is an underlying file archiving component, IBM's High Performance Storage System (HPSS), in which data is kept and managed. It keeps track of files that are stored, provides Hierarchical Storage Management (HSM) facilities when needed, and it manages activities related to disks, tapes, tape drives and automated tape libraries.
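The HSM behaviour can be pictured as a small disk tier in front of a large tape tier. The sketch below is a minimal illustration assuming a least-recently-used purge policy and a disk capacity counted in files; the `ToyHSM` class and its methods are hypothetical, and the migration and purge policies in HPSS are configurable and considerably richer.

```python
from collections import OrderedDict


class ToyHSM:
    """Two-tier store: a small LRU disk cache in front of a large tape tier."""

    def __init__(self, disk_capacity: int) -> None:
        self.disk_capacity = disk_capacity  # max files held on disk (assumption)
        self.disk: OrderedDict = OrderedDict()  # most recently used entry at the end
        self.tape: dict = {}
        self.tape_recalls = 0  # counts simulated recalls from tape

    def store(self, name: str, data: bytes) -> None:
        # New data is written to tape and kept on disk (write-through for simplicity).
        self.tape[name] = data
        self._cache(name, data)

    def read(self, name: str) -> bytes:
        if name in self.disk:
            self.disk.move_to_end(name)  # cache hit: mark as recently used
            return self.disk[name]
        self.tape_recalls += 1  # cache miss: stage the file back from tape
        data = self.tape[name]
        self._cache(name, data)
        return data

    def _cache(self, name: str, data: bytes) -> None:
        self.disk[name] = data
        self.disk.move_to_end(name)
        while len(self.disk) > self.disk_capacity:
            self.disk.popitem(last=False)  # purge LRU disk copy; the tape copy remains


hsm = ToyHSM(disk_capacity=2)
for name in ("a", "b", "c"):
    hsm.store(name, name.encode())
hsm.read("c")  # served from disk
hsm.read("a")  # purged earlier, so recalled from tape
print(hsm.tape_recalls)
```

With a two-file cache, storing `a`, `b` and `c` purges `a` from disk, so reading it again costs one tape recall while `c` is served directly from disk.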
HPSS is based on version 5 of the IEEE Mass Storage Reference Model. It supports a variety of tape drives and automated tape libraries. Data is transferred from multiple storage devices via multiple data streams over multiple network paths; in this way high aggregate transfer rates are achieved. 'Data movers' (specialised software modules), which can execute on different server machines, send streams of data directly between those servers and the client machines requesting the data transfer. This distributed multi-processing nature of HPSS is one of the keys to its scalability.
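The striping idea behind these parallel streams can be sketched as follows. It assumes a simple layout (fixed-size blocks assigned round-robin to devices) and uses threads as stand-ins for data movers; `stripe`, `mover` and `parallel_read` are hypothetical names. The sketch shows how a client can reassemble data arriving on several independent streams, not how HPSS movers actually communicate or what throughput they reach.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(data: bytes, n_devices: int, block: int) -> list:
    """Assign fixed-size blocks round-robin across devices (layout is an assumption)."""
    devices = [[] for _ in range(n_devices)]
    for i in range(0, len(data), block):
        devices[(i // block) % n_devices].append(data[i:i + block])
    return devices


def mover(device: list) -> list:
    """Stand-in for a data mover: streams one device's blocks to the client."""
    return list(device)


def parallel_read(devices: list) -> bytes:
    # One stream per device, run concurrently, as separate movers would be.
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        streams = list(pool.map(mover, devices))
    n = len(streams)
    total_blocks = sum(len(s) for s in streams)
    # Block k was written to device k % n at position k // n within that device.
    return b"".join(streams[k % n][k // n] for k in range(total_blocks))


payload = bytes(range(256)) * 40
devices = stripe(payload, n_devices=4, block=1024)
assert parallel_read(devices) == payload
```

Since each device holds only every n-th block, the client needs only the block size and the device count to put the streams back in order; the aggregate rate then grows with the number of independent streams, which is the scalability argument made above.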
In turn, HPSS uses IBM's DB2, a high-performance database management system with advanced transaction-processing techniques, to guarantee the security, protection and integrity of data.
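The transaction-processing guarantee that matters most here is atomicity: a file's metadata is either recorded completely or not at all. DB2 itself is not used below. Python's built-in `sqlite3` module is swapped in only to demonstrate the all-or-nothing behaviour, and the `files`/`segments` tables, the paths and the `register_file` function are hypothetical.

```python
import sqlite3

# In-memory database standing in for the metadata store (sqlite3 replaces DB2 here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, size INTEGER NOT NULL)")
conn.execute("CREATE TABLE segments (path TEXT NOT NULL, tape TEXT NOT NULL)")


def register_file(conn: sqlite3.Connection, path: str, size: int, tapes: list) -> None:
    """Record a file and its tape segments in one transaction: all rows or none."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("INSERT INTO files VALUES (?, ?)", (path, size))
        for tape in tapes:
            if tape is None:
                # Simulated failure part-way through the update.
                raise ValueError(f"no tape volume for a segment of {path}")
            conn.execute("INSERT INTO segments VALUES (?, ?)", (path, tape))


register_file(conn, "/demo/a.dat", 100, ["VOL001"])
try:
    register_file(conn, "/demo/b.dat", 200, ["VOL002", None])
except ValueError:
    pass

files = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
segments = conn.execute("SELECT COUNT(*) FROM segments").fetchone()[0]
print(files, segments)  # 1 1
```

The second call fails part-way through, so its transaction is rolled back and neither its `files` row nor its first `segments` row survives; the metadata never describes a half-registered file.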
In addition a secondary copy of the most important data is kept on tape cartridges in the Disaster Recovery System (DRS).
As of spring 2014, the DHS hardware included the following:
- Many servers run the HPSS, MARS and ECFS applications. Most are now Intel-based servers running RHEL6 (used for MARS and most HPSS data handling); these are replacing earlier IBM pSeries servers running AIX, which are still used for ECFS and the core HPSS services.
- A set of four Oracle (Sun) SL8500 tape libraries provides access to the tape cartridges on which the bulk of the DHS data is stored.
- Many IBM V7000 and DDN subsystems provide disk storage that is used to cache data being stored into, or retrieved from, the tape libraries, as well as the metadata needed by HPSS, MARS and ECFS.
The DHS servers are connected to each other, to the DHS clients (including the HPC) and to the Centre's general-purpose servers and desktops through the Centre's main 10-gigabit network. Some 1-gigabit networks provide additional internal control paths.
As with the data, some of this equipment is housed in the DRS.
Some figures (February 2017)
- On an average day the system handles requests for about 15,500 tape mounts (up from 11,500 in 2015), and on some days this can peak at around 25,000.
- In a typical day the archive grows by about 233 TB.
- MARS data represents about 80% of the volume of data stored in the DHS but only about 6% of the files. ECFS data represents the remaining 20% of the volume and 94% of the files.
- The DHS provides access to over 210 PB of primary data (up from 125 PB in 2015). An additional 46 PB of backup copies of part of the primary data (up from 20 PB in 2015) are stored in the DRS. There are about 260 million files in ECFS (previously 204 million) and over 18 million in MARS (previously about 15 million).
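These figures can be cross-checked with some back-of-envelope arithmetic. The sketch assumes decimal units (1 PB = 1000 TB), applies the 80/20 volume split to the 210 PB of primary data only, and treats the daily growth as a steady average rate; the derived values are rough estimates, not published figures.

```python
# Figures quoted above (February 2017). Decimal units are an assumption: 1 PB = 1000 TB.
DAILY_GROWTH_TB = 233
PRIMARY_PB = 210
MARS_FILES, ECFS_FILES = 18e6, 260e6
MARS_VOLUME_SHARE, ECFS_VOLUME_SHARE = 0.80, 0.20
MOUNTS_PER_DAY, PEAK_MOUNTS_PER_DAY = 15_500, 25_000

annual_growth_pb = DAILY_GROWTH_TB * 365 / 1000
# Average file size = (volume share of primary data) / (file count).
mars_avg_gb = PRIMARY_PB * MARS_VOLUME_SHARE * 1e6 / MARS_FILES  # 1 PB = 1e6 GB
ecfs_avg_mb = PRIMARY_PB * ECFS_VOLUME_SHARE * 1e9 / ECFS_FILES  # 1 PB = 1e9 MB
mars_file_share = MARS_FILES / (MARS_FILES + ECFS_FILES)
# Daily growth as an average bit rate: bytes * 8 bits / seconds per day, in Gbit/s.
growth_gbit_s = DAILY_GROWTH_TB * 1e12 * 8 / 86_400 / 1e9
mounts_per_min = MOUNTS_PER_DAY / (24 * 60)
peak_mounts_per_min = PEAK_MOUNTS_PER_DAY / (24 * 60)

print(f"archive growth:  {annual_growth_pb:.1f} PB/year")
print(f"avg MARS file:   {mars_avg_gb:.1f} GB")
print(f"avg ECFS file:   {ecfs_avg_mb:.0f} MB")
print(f"MARS file share: {mars_file_share:.1%}")
print(f"avg growth rate: {growth_gbit_s:.1f} Gbit/s")
print(f"tape mounts/min: {mounts_per_min:.1f} (peak {peak_mounts_per_min:.1f})")
```

On these assumptions the archive grows by roughly 85 PB a year, an average MARS file is around 9 GB against roughly 160 MB for an ECFS file, MARS holds about 6.5% of the files (consistent with the "about 6%" above), the daily growth alone corresponds to an average of about 21.6 Gbit/s, and an average day implies roughly 11 tape mounts a minute.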