Disaster Recovery System (DRS)

ECMWF operates a separate data repository that stores backup copies of critical data together with system backups. It serves both as a first-level backup of systems and data and as a means of establishing a computer service, and subsequently access to that data, in the event of a major disaster.

The DRS is based on IBM servers running Linux, connected to an IBM 3584 tape library containing IBM Ultrium LTO-5 and LTO-6 tape drives and a mix of LTO-5 and LTO-6 media.

Although the library has 1,700 cartridge slots, the DRS holds over 16,000 LTO cartridges, most of which are stored on shelves. The tapes currently being written are held in the library; as they become full, they are moved to shelf space. If a shelved tape is needed, it is loaded into the library manually.
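The movement of cartridges between library slots and shelves can be pictured as a small state model. The sketch below is purely illustrative and assumes hypothetical names (`TapeInventory`, `add_writable`, `mark_full`, `request`, `manual_load`, and a tape identifier string); it is not the DRS's actual inventory software, only a model of the policy described above.

```python
from dataclasses import dataclass, field

LIBRARY_SLOTS = 1_700  # cartridge slots in the tape library (from the text)


@dataclass
class TapeInventory:
    """Hypothetical model of where DRS cartridges are kept: library or shelf."""

    slots: int = LIBRARY_SLOTS
    in_library: set[str] = field(default_factory=set)
    on_shelf: set[str] = field(default_factory=set)

    def _check_free_slot(self) -> None:
        if len(self.in_library) >= self.slots:
            raise RuntimeError("no free library slot")

    def add_writable(self, tape_id: str) -> None:
        # Tapes currently being written are held inside the library.
        self._check_free_slot()
        self.in_library.add(tape_id)

    def mark_full(self, tape_id: str) -> None:
        # A full tape leaves the library and is moved to shelf space.
        self.in_library.remove(tape_id)
        self.on_shelf.add(tape_id)

    def request(self, tape_id: str) -> bool:
        """Return True if an operator must load the tape manually first."""
        if tape_id in self.in_library:
            return False
        if tape_id in self.on_shelf:
            return True
        raise KeyError(tape_id)

    def manual_load(self, tape_id: str) -> None:
        # An operator puts a shelved tape back into a free library slot.
        self._check_free_slot()
        self.on_shelf.remove(tape_id)
        self.in_library.add(tape_id)
```

With 1,700 slots and over 16,000 cartridges, at most roughly a tenth of the media can be inside the library at any one time, which is why the model treats "on shelf, needs manual load" as the normal state for full tapes.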

In the Data Handling System (DHS), about 20% of the meteorological data in MARS and ECFS, the Centre's most critical data, has a secondary copy in the DRS. These copies are managed as additional tiers within the HPSS data archive and written to tape. For both MARS and ECFS, whenever a file cannot be retrieved because of a problem with its primary copy, the backup copy is retrieved automatically from the DRS.
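Conceptually, the automatic fallback behaves like a "try the primary copy, then the backup" read path. The following is a minimal sketch under stated assumptions: the reader callables, `has_drs_copy`, and `RetrievalError` are hypothetical stand-ins, and in practice this logic lives inside the HPSS tiering rather than in user code.

```python
from typing import Callable


class RetrievalError(Exception):
    """Hypothetical error raised when a copy of a file cannot be read."""


def retrieve_with_fallback(
    path: str,
    read_primary: Callable[[str], bytes],
    read_drs_backup: Callable[[str], bytes],
    has_drs_copy: Callable[[str], bool],
) -> bytes:
    """Read a file from its primary copy, falling back to the DRS copy.

    All three callables are hypothetical placeholders for the real storage
    tiers; only files that have a secondary copy can be recovered this way.
    """
    try:
        return read_primary(path)
    except RetrievalError:
        if not has_drs_copy(path):
            # No secondary copy exists, so the original failure is re-raised.
            raise
        # The primary copy is unreadable: fetch the backup from the DRS tier.
        return read_drs_backup(path)
```

The design point is that the caller never chooses a copy: the fallback is transparent, and only data without a DRS copy surfaces the primary-copy error.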

System backups are managed by TSM (Tivoli Storage Manager) running on an IBM Linux server, and are stored on LTO-5 cartridges in the tape library.

