Disaster Recovery System (DRS)

ECMWF operates a separate data repository that stores backup copies of critical data together with system backups. It serves both as a first-level backup of systems and data and as a means of re-establishing a computer service, with access to that data, in the event of a major disaster.

The DRS is based on x64 servers running Linux, connected to an IBM 3584 tape library containing IBM Ultrium LTO-6, LTO-7 and LTO-8 tape drives and a mix of LTO media.

The library has 1,700 cartridge slots, but the DRS holds around 30,500 LTO cartridges in total; most are kept on shelves, and some are held offsite. Tapes that are currently being written stay in the library and are moved to shelf space once they are full. When a shelved tape is needed, it is loaded into the library manually.

In the Data Handling System (DHS), about 40% of the Centre's data in MARS and ECFS has a secondary copy in the DRS. These secondary copies cover the most critical meteorological data; they are managed as additional tiers within the HPSS data archive and are written to tape. For both MARS and ECFS, whenever a file cannot be retrieved because of a problem with the primary copy, the backup copy is retrieved from the DRS instead.
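
The fallback behaviour described above can be illustrated with a minimal Python sketch. It is not the MARS, ECFS or HPSS implementation: the names retrieve_with_fallback, CopyUnavailable and dict_reader are hypothetical, and the two copies are modelled as plain in-memory dictionaries purely for illustration. The sketch also assumes that a file without a secondary copy simply fails when its primary copy is unavailable.

```python
from typing import Callable, Dict, Optional, Tuple


class CopyUnavailable(Exception):
    """Hypothetical error: a reader could not retrieve its copy of a file."""


# A reader takes a file name and returns the file contents.
Reader = Callable[[str], bytes]


def retrieve_with_fallback(
    name: str,
    read_primary: Reader,
    read_secondary: Optional[Reader] = None,
) -> Tuple[bytes, str]:
    """Return (data, source), trying the primary copy first.

    If the primary copy cannot be read and a secondary reader exists,
    the secondary (DRS) copy is read instead. Without a secondary
    reader, the original error is re-raised.
    """
    try:
        return read_primary(name), "primary"
    except CopyUnavailable:
        if read_secondary is None:
            raise
        return read_secondary(name), "secondary"


def dict_reader(store: Dict[str, bytes]) -> Reader:
    """Build a reader over an in-memory dict (stand-in for a tape copy)."""
    def read(name: str) -> bytes:
        try:
            return store[name]
        except KeyError:
            raise CopyUnavailable(name) from None
    return read


if __name__ == "__main__":
    # The primary copy of "analysis" is missing; the secondary one is intact.
    primary = dict_reader({"forecast": b"fc"})
    secondary = dict_reader({"forecast": b"fc", "analysis": b"an"})
    print(retrieve_with_fallback("forecast", primary, secondary))
    print(retrieve_with_fallback("analysis", primary, secondary))
```

In this sketch the caller learns which copy served the request, which would make it possible to log or repair a damaged primary copy afterwards; whether the real systems expose that information is not stated here.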

System backups are managed by Tivoli Storage Manager (TSM), which runs on an IBM server under Linux.