The IBM Supercomputers 

[Photo: One of the two IBM POWER6 Cluster 1600 systems installed at ECMWF]

ECMWF's High Performance Computing Facility (HPCF) comprises two identical but independent IBM Cluster 1600 supercomputer systems. The computational basis of the HPCF is IBM's POWER6 microprocessor.

Since no single application will require more than half of the total computing resources, it was decided that the system would comprise two independent clusters. This has several advantages. First, two independent clusters add significantly to the resiliency of the system: if one cluster were to suffer a major failure, the other could still provide a service while the fault is being rectified. Second, availability is increased: a system session that requires a whole cluster to be taken out of production for a period of time affects only half of the system, and the other cluster can continue to run production work. Third, there is flexibility in maintaining and upgrading software. A new software release can be installed on one cluster and run in production there, while the other cluster continues to run the earlier release until the time comes for it, too, to be upgraded.

The equipment is based on pSeries p6-575 servers interconnected by a low-latency, high-speed 8-plane IB4x-DDR InfiniBand network. Each compute cluster comprises 286 pSeries p6-575 symmetric multiprocessor (SMP) servers (or nodes):

262 "normal memory" application nodes;
2 "hot" spare nodes;
8 "large memory" application nodes;
2 "service" nodes;
12 network and I/O routing nodes (NIONs).
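The node breakdown above can be checked with a few lines of arithmetic. This is a minimal sketch; the assumption that the "272 compute nodes" in the comparison table are the normal- and large-memory application nodes plus the two hot spares is ours, inferred from the numbers rather than stated in the source.

```python
# Node roles in one POWER6 compute cluster, as listed above.
NODE_ROLES = {
    "normal memory application": 262,
    "hot spare": 2,
    "large memory application": 8,
    "service": 2,
    "network and I/O routing (NION)": 12,
}


def total_nodes(roles: dict[str, int]) -> int:
    """Total number of p6-575 servers in one cluster."""
    return sum(roles.values())


def compute_nodes(roles: dict[str, int]) -> int:
    """Application nodes plus hot spares.

    Assumption (not stated in the source): the table's 272 compute
    nodes are exactly these three groups.
    """
    return (roles["normal memory application"]
            + roles["large memory application"]
            + roles["hot spare"])


if __name__ == "__main__":
    print("total nodes per cluster:", total_nodes(NODE_ROLES))
    print("compute nodes per cluster:", compute_nodes(NODE_ROLES))
    print("cores in compute nodes:", compute_nodes(NODE_ROLES) * 32)
```

With 32 cores per node, the 272 compute nodes give 8,704 cores, which matches the "~8,700 per cluster" figure in the table.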

[Diagram: IBM POWER6 Cluster, schematic configuration diagram]

The "large memory" nodes, which have 256GB of memory compared with 64GB in the "normal memory" nodes, provide greater flexibility, especially for serial (non-parallel) programs that need large amounts of memory but cannot be converted into parallel programs able to use the aggregated memory of multiple nodes.
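The sketch below illustrates why the large memory nodes matter for serial work: a serial program must fit in a single node's memory, whereas a parallel one can spread its data across nodes. The 100GB job is a hypothetical example, and the check deliberately ignores memory used by the operating system.

```python
# Per-node memory sizes stated in the text, in GB.
NORMAL_MEMORY_GB = 64
LARGE_MEMORY_GB = 256


def application_memory_gb(normal_nodes: int, large_nodes: int) -> int:
    """Total memory, in GB, across a cluster's application nodes."""
    return normal_nodes * NORMAL_MEMORY_GB + large_nodes * LARGE_MEMORY_GB


def fits_on_one_node(required_gb: int, node_memory_gb: int) -> bool:
    """A serial program cannot spread its data over several nodes, so its
    whole working set must fit in one node's memory.
    Simplification: ignores memory taken by the OS and other processes."""
    return required_gb <= node_memory_gb


if __name__ == "__main__":
    print(application_memory_gb(262, 8))            # 18816 GB, about 18.8 TB
    print(fits_on_one_node(100, NORMAL_MEMORY_GB))  # False
    print(fits_on_one_node(100, LARGE_MEMORY_GB))   # True
```

The aggregate of roughly 18.8TB across the application nodes is in line with the "~20TB per cluster" quoted in the comparison table.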

Each pSeries p6-575 SMP server has 32 processors (or "cores") with a clock frequency of 4.7GHz; each core has a theoretical peak performance of 18.8GFLOPS. Each core can run two threads concurrently. This simultaneous multi-threading (SMT) makes each node appear to have 64 logical CPUs rather than its 32 physical processors. The cores are packaged in pairs on water-cooled dual-core POWER6 chips; besides its two cores, each chip holds two memory controllers and two separate 4MB level-2 caches (one per core) within its 790 million transistors.
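The peak figures follow from simple multiplication. In this sketch the factor of four floating-point operations per cycle is derived from the stated numbers (18.8 GFLOPS / 4.7 GHz = 4) rather than quoted from the source, and the 272 compute nodes come from the comparison table. Working in MHz and MFLOPS keeps the arithmetic in exact integers.

```python
# Values from the text. FLOPS_PER_CYCLE is derived, not quoted:
# 18.8 GFLOPS / 4.7 GHz = 4 floating-point operations per cycle per core.
CLOCK_MHZ = 4700
FLOPS_PER_CYCLE = 4
CORES_PER_NODE = 32
SMT_THREADS_PER_CORE = 2
COMPUTE_NODES_PER_CLUSTER = 272  # from the comparison table


def peak_mflops(cores: int) -> int:
    """Theoretical peak in MFLOPS; integer arithmetic avoids rounding noise."""
    return cores * CLOCK_MHZ * FLOPS_PER_CYCLE


per_core = peak_mflops(1)
per_node = peak_mflops(CORES_PER_NODE)
per_cluster = peak_mflops(COMPUTE_NODES_PER_CLUSTER * CORES_PER_NODE)
# SMT doubles the logical CPUs the OS sees, but not the peak FLOPS,
# because the floating-point hardware per core is unchanged.
logical_cpus = CORES_PER_NODE * SMT_THREADS_PER_CORE

if __name__ == "__main__":
    print(f"per core:    {per_core / 1e3:.1f} GFLOPS")     # 18.8
    print(f"per node:    {per_node / 1e3:.1f} GFLOPS")     # 601.6
    print(f"per cluster: {per_cluster / 1e6:.1f} TFLOPS")  # 163.6
    print(f"logical CPUs per node: {logical_cpus}")        # 64
```

The per-cluster result of about 164 TFLOPS is the exact product behind the rounded "~160TFLOPS per cluster" in the table.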

The 8-plane IB4x-DDR InfiniBand network connects all the p6-575 nodes within each compute cluster. The eight switch planes provide considerably higher performance than a single plane would, and also make the network more resilient to hardware errors.
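A rough model shows how multiple planes help with both performance and resiliency. It rests on assumptions not stated in the source: each node has one IB4x-DDR link into each of the eight planes, each link carries 16 Gbit/s of data per direction (the 20 Gbit/s 4x DDR signalling rate less 8b/10b encoding overhead), and traffic is striped evenly over the healthy planes. The `stripe` policy is hypothetical and not a description of how IBM's software actually distributes traffic.

```python
# Assumptions, not stated in the source: every node has one IB4x-DDR link
# into each of the eight switch planes, and each link moves 16 Gbit/s of
# data per direction (20 Gbit/s signalling less 8b/10b encoding overhead).
PLANES = 8
LINK_DATA_GBIT_S = 16


def aggregate_gbit_s(failed_planes: int = 0) -> int:
    """Idealised per-node, per-direction bandwidth over the healthy planes."""
    if not 0 <= failed_planes <= PLANES:
        raise ValueError(f"failed_planes must be in 0..{PLANES}")
    return (PLANES - failed_planes) * LINK_DATA_GBIT_S


def stripe(message_bytes: int, healthy_planes: list[int]) -> dict[int, int]:
    """Split a message as evenly as possible across the healthy planes.

    Hypothetical striping policy, for illustration only.
    """
    if not healthy_planes:
        raise ValueError("no healthy planes left")
    base, extra = divmod(message_bytes, len(healthy_planes))
    # The first `extra` planes each carry one additional byte.
    return {plane: base + (1 if i < extra else 0)
            for i, plane in enumerate(healthy_planes)}


if __name__ == "__main__":
    print(aggregate_gbit_s())      # 128 with all eight planes up
    print(aggregate_gbit_s(1))     # 112 with one plane down
    print(stripe(1000, [0, 1, 2, 3, 4, 6, 7]))  # plane 5 has failed
```

Under this model, losing one plane costs an eighth of the bandwidth rather than cutting the node off, which is the resiliency benefit the text describes.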

Each cluster runs the AIX 5.3 operating system and Cluster Systems Management (CSM) software. LoadLeveler is used as the batch subsystem. The clusters use the multi-cluster version of IBM's General Parallel File System (MC-GPFS). This is a distributed, journalled file system that uses token-based mechanisms to ensure file system consistency. It uses the Network Shared Disk (NSD) feature to share data with much higher performance than other file-sharing mechanisms such as NFS.
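The idea behind token-based consistency can be illustrated with a toy token manager: a node must hold a token before caching or modifying a file, many nodes may share read tokens, and a write token forces every other holder to give its token up. This is a deliberately simplified sketch, not GPFS's actual protocol (GPFS uses finer-grained tokens, for example on byte ranges); the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenManager:
    """Toy token manager: many readers or one writer per file."""
    holders: dict[str, dict[str, str]] = field(default_factory=dict)
    revoked: list[tuple[str, str]] = field(default_factory=list)

    def acquire(self, node: str, path: str, mode: str) -> None:
        if mode not in ("read", "write"):
            raise ValueError("mode must be 'read' or 'write'")
        current = self.holders.setdefault(path, {})
        for other, other_mode in list(current.items()):
            if other == node:
                continue
            # A write token conflicts with every other token; read tokens
            # conflict only with a write token.
            if mode == "write" or other_mode == "write":
                # Revocation is where the other node would flush dirty
                # cached data and invalidate its cached copy of the file.
                del current[other]
                self.revoked.append((other, path))
        current[node] = mode


if __name__ == "__main__":
    tm = TokenManager()
    tm.acquire("nodeA", "/fs/data", "read")
    tm.acquire("nodeB", "/fs/data", "read")   # readers share the file
    tm.acquire("nodeC", "/fs/data", "write")  # revokes A's and B's tokens
    print(tm.holders, tm.revoked)
```

The design choice worth noting is that consistency is enforced at token-grant time, so nodes can cache data freely while they hold a token and only coordinate when a conflicting request arrives.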

The whole system has about 1.2PB (1 petabyte = 1 million gigabytes) of SAS (Serial Attached SCSI) disks, connected to two separate storage I/O clusters. Each storage I/O cluster is made up of BIUs (Basic I/O Units), each comprising two pSeries p6-520 servers; each server has four 4.2GHz POWER6 processors and 8GB of memory. The two servers in a BIU are connected redundantly, via RAID controllers, to 14 drawers of disks; each drawer holds 12 300GB SAS disks in a 4D+P+Q RAID-6 configuration (four data disks plus two parity disks per array). The storage I/O clusters serve their data to both compute clusters over a 2-plane IB4x-DDR InfiniBand network, to which the 12 NIONs in each compute cluster are connected. The MC-GPFS architecture enables any node on any cluster to access any file on either of the storage I/O clusters, so users of ECMWF's HPCF can reach all their data from whichever compute cluster they are working on, which increases their productivity.
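The quoted ~1.2PB can be checked against the disk configuration. This sketch takes 36 p6-520 servers per storage cluster from the comparison table (so 18 BIUs per storage cluster at two servers per BIU) and assumes each BIU has its own 14 drawers; the source does not state the BIU count directly. The result lands very close to 1.2PB, which suggests the quoted figure is usable capacity after RAID-6 parity, although the source does not say so.

```python
# Figures from the text and table: 2 storage I/O clusters, each with
# 36 p6-520 servers; 2 servers per BIU; 14 drawers per BIU (assumed to
# be per BIU, not shared); 12 x 300GB SAS disks per drawer; RAID-6 4D+P+Q.
STORAGE_CLUSTERS = 2
SERVERS_PER_STORAGE_CLUSTER = 36
SERVERS_PER_BIU = 2
DRAWERS_PER_BIU = 14
DISKS_PER_DRAWER = 12
DISK_GB = 300
DATA_DISKS, PARITY_DISKS = 4, 2  # 4D+P+Q: 4 data + 2 parity per array

bius = STORAGE_CLUSTERS * SERVERS_PER_STORAGE_CLUSTER // SERVERS_PER_BIU
disks = bius * DRAWERS_PER_BIU * DISKS_PER_DRAWER
raw_gb = disks * DISK_GB
# RAID-6 keeps DATA_DISKS out of every (DATA_DISKS + PARITY_DISKS) disks.
usable_gb = raw_gb * DATA_DISKS // (DATA_DISKS + PARITY_DISKS)

if __name__ == "__main__":
    print(f"BIUs: {bius}, disks: {disks}")
    print(f"raw:    {raw_gb / 1e6:.2f} PB")     # 1.81
    print(f"usable: {usable_gb / 1e6:.2f} PB")  # 1.21
```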

Feature comparison between the new Phase 1 (POWER6) and old Phase 4 (POWER5+) systems

Feature                      New Phase 1 (POWER6)                 Old Phase 4 (POWER5+)
No. of clusters              2 computational clusters             2 computational clusters
                             2 I/O storage clusters               1 small test cluster
                             1 small test system                  1 MC-GPFS "owning" cluster
Performance                  ~20 TFLOPS (sustained)               ~4 TFLOPS (sustained)

Each computational cluster
Operating system             AIX 5.3                              AIX 5.3
I/O (VSD) nodes              N/A                                  8 x 16-way p5-575
Network nodes                12 x 32-way p6-575                   2 x 16-way p5-575+ (10-G-eth)
                             (10-G-eth/2 x IB4x-DDR)
Compute nodes                272 x 32-way p6-575                  155 x 16-way p5-575+
Node interconnect            8-plane IB4x-DDR InfiniBand          pSeries High Performance Switch (dual-plane)

I/O subsystem
Storage cluster              2 of (36 x 4-way p6-520)             N/A
Disk types                   SAS (Serial Attached SCSI)           FAStT900 (DS4500)
Amount of disk               ~1.2PB in total                      ~100TB in total
Filesystem                   MC-GPFS                              MC-GPFS

Each server (p6-575/p5-575+)
Memory                       64GB (~20TB per cluster)             32GB (~4.5TB per cluster)
Dual-core chips              16                                   8
Processors (cores)           32 (~8,700 per cluster)              16 (~2,250 per cluster)
SMP building blocks          16 Dual-Chip Modules (DCMs),         8 Dual-Chip Modules (DCMs),
                             1 dual-core chip on each             1 dual-core chip on each

Each processor (or "core")
Lithography                  65nm                                 90nm
No. of transistors           790 million (per dual-core chip)     276 million (per dual-core chip)
Frequency                    4.7GHz                               1.9GHz
Peak performance             18.8GFLOPS (~160TFLOPS per cluster)  7.6GFLOPS (~19TFLOPS per cluster)
Level-2 cache                2 x 4MB (10-way LRU)                 1.92MB (10-way LRU)
Level-3 cache                32MB (12-way LRU)                    36MB (12-way LRU)
Memory controller            2 x on-chip                          1 x on-chip

Additional POWER6 processor ("core") features
Upwards binary compatibility with POWER5+
Enhanced dynamic electrical power management and reduced power leakage
Water-cooling
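The generational gains in the comparison table can be summarised as ratios. The sketch takes the table's approximate values at face value (treating ~1.2PB as 1,200TB). One observation it makes visible: the peak-per-core ratio equals the clock-frequency ratio, consistent with both processors performing four floating-point operations per cycle by these figures (18.8/4.7 = 7.6/1.9 = 4).

```python
# (Phase 1, Phase 4) values from the comparison table; most are approximate.
COMPARISON = {
    "sustained TFLOPS": (20, 4),
    "cores per server": (32, 16),
    "clock frequency (MHz)": (4700, 1900),
    "peak GFLOPS per core": (18.8, 7.6),
    "memory per server (GB)": (64, 32),
    "total disk (TB)": (1200, 100),
}


def ratios(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Phase 1 value divided by Phase 4 value for each feature."""
    return {feature: new / old for feature, (new, old) in table.items()}


if __name__ == "__main__":
    for feature, ratio in ratios(COMPARISON).items():
        print(f"{feature:24s} {ratio:5.2f}x")
```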

[Photo: IBM POWER6 Server, a view of the IBM p6-575 servers]

02.10.2009
© ECMWF