[Image: SL8500 tape libraries]
For many years, ECMWF has operated a large-scale Data Handling System (DHS), in which all ECMWF users can store and retrieve data that is needed to perform weather modelling, research in weather modelling and mining of weather data.
The user view
From the user's viewpoint, the DHS supports two main applications, both developed by ECMWF to hide the complexities of the underlying storage management:
- MARS, the Meteorological Archive and Retrieval System, provides access to a powerful abstraction engine that allows staff and applications to access the meteorological data that has been collected or generated at ECMWF for more than 30 years. MARS stores GRIB and BUFR data, hiding from its users all of the details concerning the physical location and internal organisation of this data. It manages its own set of disk caches for staging data that has been recently acquired, generated or accessed. However, the bulk of its information is stored in HPSS (see below).
- ECFS provides users with a logical view of a seemingly very large file system, and is used for data that is not suitable for storing in MARS. UNIX-like commands enable users to copy whole files to and from any of ECMWF's computing platforms. ECFS uses the storage hierarchy of disks and tapes within HPSS to store the files and their associated metadata (file ownership, directory structure, etc.).
Underlying storage management
[Image: XIV disk storage]
Supporting ECFS and most of MARS is an underlying file archiving component, IBM's High Performance Storage System (HPSS), in which data is kept and managed. It keeps track of files that are stored, provides Hierarchical Storage Management (HSM) facilities when needed, and is a single point of control for activities related to tapes, tape drives and automated tape libraries. It also manages disk space on which a significant part of the archive resides.
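The idea behind hierarchical storage management can be illustrated with a toy model. The sketch below is not HPSS code and does not use any HPSS interface; the class `ToyHSM`, its methods and the least-recently-used migration policy are all assumptions made for illustration. It keeps a small "disk" tier in front of an unlimited "tape" tier, migrates the least recently used files to tape when the disk is full, and stages a file back to disk when it is read again.

```python
from collections import OrderedDict


class ToyHSM:
    """Toy two-tier store (hypothetical): a small disk cache in front of tape."""

    def __init__(self, disk_capacity):
        if disk_capacity < 1:
            raise ValueError("disk_capacity must be at least 1")
        self.disk_capacity = disk_capacity  # maximum number of files held on disk
        self.disk = OrderedDict()           # name -> data, least recently used first
        self.tape = {}                      # name -> data, treated as unlimited
        self.tape_recalls = 0               # number of reads that had to go to tape

    def put(self, name, data):
        """Store a file on disk, then migrate older files to tape if needed."""
        self.disk[name] = data
        self.disk.move_to_end(name)
        self._migrate()

    def get(self, name):
        """Return a file, staging it from tape back to disk on a disk miss."""
        if name in self.disk:
            self.disk.move_to_end(name)     # mark as most recently used
            return self.disk[name]
        data = self.tape[name]              # raises KeyError for unknown files
        self.tape_recalls += 1
        self.disk[name] = data              # the tape copy is kept as well
        self._migrate()
        return data

    def _migrate(self):
        """Move least recently used files to tape until the disk fits again."""
        while len(self.disk) > self.disk_capacity:
            old_name, old_data = self.disk.popitem(last=False)
            self.tape[old_name] = old_data
```

In this model the caller never says where a file lives; `get` returns the same data whether it comes from disk or tape, and only the `tape_recalls` counter reveals the difference.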
HPSS is based on version 5 of the IEEE Mass Storage Reference Model. It supports a variety of tape drives and automated tape libraries. Data is transferred from multiple storage devices via multiple data streams over multiple network paths; in this way high aggregate transfer rates are achieved. 'Data movers' (specialised software modules), which can execute on different server machines, send streams of data directly between those servers and the client machines requesting the data transfer. This distributed, multi-processing nature of HPSS is one of the keys to its scalability.
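The multi-stream principle can be sketched in a few lines. This is a simplified illustration, not the HPSS mover protocol: the functions `split_into_stripes`, `mover` and `parallel_transfer` are hypothetical names, threads stand in for mover processes on separate servers, and copying bytes in memory stands in for a network transfer. The payload is cut into stripes, each stripe is handled by its own stream, and the receiver reassembles the stripes by offset.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_stripes(data, n_streams):
    """Cut data into at most n_streams contiguous (offset, chunk) stripes."""
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    if not data:
        return []
    size = -(-len(data) // n_streams)  # ceiling division
    return [(off, data[off:off + size]) for off in range(0, len(data), size)]


def mover(stripe):
    """Stand-in for a data mover: 'sends' one stripe and reports its offset."""
    offset, chunk = stripe
    return offset, bytes(chunk)  # a real mover would write to a network socket


def parallel_transfer(data, n_streams=4):
    """Send all stripes concurrently and reassemble them on the receiving side."""
    stripes = split_into_stripes(data, n_streams)
    received = bytearray(len(data))
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        for offset, chunk in pool.map(mover, stripes):
            received[offset:offset + len(chunk)] = chunk
    return bytes(received)
```

Because each stripe carries its own offset, the stripes can arrive in any order and over different paths; the aggregate rate is then bounded by the sum of the streams rather than by any single one.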
In turn, HPSS uses IBM's DB2, a high performance database management system with advanced transaction-processing techniques, to guarantee security, protection and integrity of data.
In addition, a secondary copy of the most important data is kept on tape cartridges in the Disaster Recovery System (DRS).
[Image: Interior of SL8500 tape library]
The DHS hardware comprises:
- Several IBM pSeries servers running the AIX operating system are used to execute the HPSS, MARS and ECFS applications.
- A set of three Oracle (Sun) SL8500 tape libraries provides access to the tape cartridges on which the bulk of the DHS data is stored.
- Several IBM FAStT and IBM XIV disk subsystems provide disk storage that is used to cache data being stored into, or retrieved from, the tape libraries, as well as the metadata needed by HPSS, MARS and ECFS.
All DHS servers are connected to each other and to the DHS clients through a collection of gigabit-Ethernet networks.
As with the data, some of this equipment is housed in the DRS.
Some figures (Summer 2010)
- On an average day the system handles requests for about 6,000 tape mounts, and on some days this can double.
- In a typical hour 7TB of data are moved.
- MARS data represents about 75% of the volume of data stored in the DHS, but only about 10% of the number of files. ECFS data represents the remaining 25% of the data, corresponding to 90% of the files.
- As of July 2010, the DHS provides access to over 15PB of primary data (up from 4PB in 2006). An additional 5PB of backup copies of part of the primary data (up from 1PB in 2006) are stored in the DRS. There are about 67 million files in ECFS (up from 25 million) and over 5 million in MARS (up from about 2 million).
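A few derived quantities help put these figures in perspective. The calculation below is a rough sketch under stated assumptions: decimal units (1TB = 10^12 bytes, 1PB = 10^15 bytes), the "over" and "about" figures treated as exact values, and the 2006 to 2010 interval taken as four years. The results are therefore order-of-magnitude estimates, not published ECMWF statistics.

```python
TB = 10**12  # assumption: decimal terabyte
PB = 10**15  # assumption: decimal petabyte

PRIMARY_BYTES = 15 * PB  # "over 15PB", treated as exact


def average_file_size(share_of_volume, n_files, total_bytes=PRIMARY_BYTES):
    """Average file size in bytes for a system holding a share of the archive."""
    return total_bytes * share_of_volume / n_files


def annual_growth(start, end, years):
    """Compound annual growth rate between two sizes."""
    return (end / start) ** (1 / years) - 1


mars_avg = average_file_size(0.75, 5_000_000)     # roughly 2.25 GB per MARS file
ecfs_avg = average_file_size(0.25, 67_000_000)    # roughly 56 MB per ECFS file
throughput = 7 * TB / 3600                        # roughly 1.9 GB per second
mounts_per_minute = 6000 / (24 * 60)              # roughly 4 tape mounts a minute
primary_growth = annual_growth(4, 15, 4)          # roughly 39% per year

print(f"MARS average file size: {mars_avg / 1e9:.2f} GB")
print(f"ECFS average file size: {ecfs_avg / 1e6:.0f} MB")
print(f"Sustained throughput:   {throughput / 1e9:.2f} GB/s")
print(f"Tape mounts per minute: {mounts_per_minute:.1f}")
print(f"Primary data growth:    {primary_growth:.0%} per year")
```

On these assumptions an average MARS file is about forty times larger than an average ECFS file, which is consistent with MARS holding most of the volume but only a small fraction of the files.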