
Data Handling System (DHS)

 
 

Figure: Automated tape library silos

Introduction

For many years, ECMWF has operated a large-scale Data Handling System (DHS), in which all ECMWF users can store and retrieve data that is needed to perform weather modelling, research in weather modelling or weather data-mining.

The Data Handling System is composed of three main components.

IBM’s High Performance Storage System (HPSS) is the underlying archiving system in which data is kept. HPSS keeps track of the files that are stored, provides Hierarchical Storage Management facilities when needed, and is a single point of control for activities related to tapes, tape drives and automated tape libraries. It also manages the disk space on which a significant part of the archive resides. HPSS is based on version 5 of the IEEE Mass Storage Reference Model. It uses 'data movers' (specialised software modules), which can execute on different server machines, to send streams of data directly between those servers and the client machines that request the data. Data can be transferred in parallel, via several data streams over different network paths from multiple storage devices, which provides extremely high transfer rates. This distributed, multi-processing nature of HPSS is one of the keys to its scalability. HPSS uses DB2, a high-performance database management system, together with advanced transaction-processing techniques to guarantee the security, protection and integrity of data. It supports a variety of tape drives and automated tape libraries. On an average day the system handles requests for about 5,000 tape mounts, and on some days this can double.
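The parallel-transfer idea described above can be illustrated with a minimal sketch. This is not the HPSS API: the function names, the in-memory `source` buffer and the use of threads are all assumptions made for illustration. Each worker plays the role of one data mover sending one stream, and the client reassembles the stripes by offset.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_stripes(total_size, stream_count):
    """Return (offset, length) pairs that together cover total_size bytes."""
    base, extra = divmod(total_size, stream_count)
    stripes, offset = [], 0
    for i in range(stream_count):
        # The first `extra` stripes carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            stripes.append((offset, length))
        offset += length
    return stripes


def read_stripe(source, offset, length):
    """Hypothetical stand-in for one data mover sending one stream."""
    return offset, source[offset:offset + length]


def parallel_fetch(source, stream_count=4):
    """Fetch all stripes concurrently and reassemble them in offset order."""
    stripes = split_into_stripes(len(source), stream_count)
    with ThreadPoolExecutor(max_workers=stream_count) as pool:
        parts = list(pool.map(lambda s: read_stripe(source, *s), stripes))
    return b"".join(chunk for _, chunk in sorted(parts))


data = bytes(range(256)) * 10
restored = parallel_fetch(data, stream_count=3)
print(len(restored), restored == data)
```

In a real system the stripes would come from different storage devices over different network paths; here a single byte string stands in for all of them.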

MARS, the Meteorological Archive and Retrieval System, is one of two main applications that have been developed by ECMWF to hide the complexities of HPSS and data management from the users. It provides a powerful virtualisation engine through which scientific and operational staff and applications can access the wealth of meteorological data that has been collected or generated at ECMWF for more than 30 years. MARS hides from its users all of the details concerning the physical location and internal organisation of this data. It manages its own set of disk caches for staging data that has been recently acquired, generated or accessed. However, the bulk of its information is stored in HPSS. MARS data represents about three quarters of the volume of data stored in the DHS, but only about one tenth of the number of files.
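The staging behaviour described above, with disk caches in front of a much larger archive, can be sketched as a generic read-through cache. Everything here is an assumption for illustration: the class name, the request keys, the dictionary standing in for HPSS, and the least-recently-used eviction policy, which the source does not specify. It is not the MARS request language or implementation.

```python
from collections import OrderedDict


class StagingCache:
    """Read-through cache in front of a slower archive (hypothetical sketch)."""

    def __init__(self, archive, capacity=2):
        self.archive = archive        # dict standing in for the tape/disk archive
        self.capacity = capacity      # maximum number of cached items
        self.cache = OrderedDict()    # ordered from least to most recently used
        self.archive_reads = 0        # counts how often the archive was consulted

    def retrieve(self, request):
        """Return data for a request described by metadata, not by location."""
        key = tuple(sorted(request.items()))
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as recently used
            return self.cache[key]
        self.archive_reads += 1
        data = self.archive[key]
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used item
        return data


# Hypothetical archive contents keyed by descriptive metadata.
archive = {
    (("date", "2006-09-01"), ("param", "temperature")): b"field-a",
    (("date", "2006-09-02"), ("param", "temperature")): b"field-b",
}
mars_like = StagingCache(archive)
request = {"param": "temperature", "date": "2006-09-01"}
mars_like.retrieve(request)
mars_like.retrieve(request)
print(mars_like.archive_reads)
```

The caller never names a file, tape or server; the second identical request is served from the cache without touching the archive.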

ECFS is the other application developed by ECMWF. It enables users to store data that is not suitable for storing in MARS. It provides users with a logical view of a seemingly very large file system, with rcp-like tools that enable them to manage their directories and files and to copy whole files to and from any of ECMWF’s computing platforms. ECFS uses the storage hierarchy of disks and tapes within HPSS to store these files as well as the related metadata (file ownership, directory structure, etc.). ECFS data represents about one quarter of the volume of data stored in the DHS, and about nine tenths of the number of files.

As of September 2006, ECMWF's DHS provided access to about four petabytes of primary data, plus an additional petabyte of backup copies of part of the primary data. There were about 25 million files in ECFS and over 2 million in MARS.
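Combining these figures with the volume shares quoted for MARS and ECFS gives a rough idea of average file sizes. The following back-of-the-envelope calculation assumes decimal petabytes and treats the rounded figures as exact; because MARS held "over" 2 million files, its average is an upper estimate.

```python
PB = 10**15  # decimal petabyte, an assumption; the source does not say which unit

primary_bytes = 4 * PB        # about four petabytes of primary data
mars_files = 2_000_000        # "over 2 million", so a lower bound
ecfs_files = 25_000_000       # "about 25 million"

mars_bytes = primary_bytes * 3 // 4   # about three quarters of the volume
ecfs_bytes = primary_bytes // 4       # about one quarter of the volume

mars_avg = mars_bytes / mars_files    # average bytes per MARS file
ecfs_avg = ecfs_bytes / ecfs_files    # average bytes per ECFS file
mars_file_share = mars_files / (mars_files + ecfs_files)

print(f"MARS average file size: {mars_avg / 10**9:.1f} GB")
print(f"ECFS average file size: {ecfs_avg / 10**6:.0f} MB")
print(f"MARS share of file count: {mars_file_share:.1%}")
```

On these assumptions an average MARS file is roughly 1.5 GB and an average ECFS file roughly 40 MB, and MARS accounts for about 7% of the files, consistent with the "about one tenth" quoted above.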

Configuration

As shown in the diagram below, the DHS comprises several key components.

Several IBM pSeries servers running the AIX 5.2 operating system are used to execute the HPSS, MARS and ECFS applications.

Several IBM FAStT disk subsystems provide disk storage that is used to cache data being stored or retrieved from the DHS, as well as the metadata needed by HPSS, MARS and ECFS.

An automated tape silo complex provides access to tape cartridges, on which the bulk of the DHS data is stored.

In addition, a secondary copy of the most important data is kept on tape cartridges in the Disaster Recovery System.

All DHS servers are connected to each other and to the DHS clients through a collection of gigabit-Ethernet networks.

Figure: HPSS Configuration 2006

 


 

11.09.2006, © ECMWF