The IBM Supercomputers

 

 

 
 

[Photo: one of the two IBM POWER7 775 clusters installed at ECMWF]

ECMWF's current High Performance Computing Facility (HPCF) is the result of a competitive procurement carried out in 2007. This resulted in IBM being awarded a two-phase service contract to supply and support an HPCF until mid-2014.

The first phase started operation in 2009 and was based on two identical but independent IBM POWER6 Cluster 1600 supercomputer systems. Starting in October 2012, the operational service was migrated to two new clusters based on IBM POWER7 technology.

The current High Performance Computing Facility continues the successful design of having two independent clusters that can cross-mount storage. Dual clusters add significantly to the resilience of the system: they allow flexibility in performing maintenance and upgrades and, combined with separate resilient power and cooling systems, protect against a wide range of possible failures.

The system is based on nodes made up of four IBM POWER7 processors, each with eight cores. Eight nodes make up a drawer and four drawers make a super-node. A low-latency, high-speed network connects each node to every other node in a super-node and each super-node to every other super-node. The bandwidth of this interconnect is 23 terabytes per second per compute cluster.

Overview

Each compute cluster weighs more than 26 metric tons and comprises (these figures are tallied in the sketch after the list):

  • 24 super-nodes, each made up of 32 nodes;
  • 732 “normal memory” application nodes with 64GiB of memory;
  • 20 “large memory” application nodes with 256GiB of memory;
  • 10 “Availability Plus” spare nodes;
  • 6 “service” nodes;
  • 24,576 POWER7 processor cores;
  • 53TiB of memory.
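
These figures are largely self-consistent. The short Python sketch below tallies them; the only assumption (not stated on this page) is that the spare and service nodes carry 64GiB each, like the normal application nodes, which gives roughly 52TiB, in line with the quoted 53TiB.

    # Tally of the per-cluster figures listed above.
    # Assumption: spare and service nodes have 64GiB each, like the
    # "normal memory" application nodes; this page does not say so explicitly.
    nodes   = {"normal": 732, "large": 20, "spare": 10, "service": 6}
    mem_gib = {"normal": 64, "large": 256, "spare": 64, "service": 64}

    total_nodes = sum(nodes.values())        # 768 = 24 super-nodes x 32 nodes
    total_cores = total_nodes * 4 * 8        # 4 POWER7 chips/node x 8 cores/chip
    total_mem   = sum(nodes[k] * mem_gib[k] for k in nodes) / 1024   # GiB -> TiB

    print(total_nodes, total_cores, round(total_mem, 1))   # 768 24576 51.8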

 

[Figure: schematic diagram of the system]

The system provides almost three times the sustained performance of the previous system on ECMWF codes and has a theoretical peak performance of about 1.5 petaflops.
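
As a rough cross-check, the quoted peak follows from multiplying cores by clock frequency by operations per clock cycle (the last two figures appear in the comparison table further down). The exact node accounting behind the official figure is not given here, so the sketch below simply counts every core in both clusters.

    # Rough derivation of the ~1.5 petaflops theoretical peak,
    # counting every core in both compute clusters.
    cores_per_cluster = 24_576   # from the overview list above
    clock_hz          = 3.8e9    # POWER7 clock frequency
    ops_per_cycle     = 8        # floating-point operations per clock cycle
    clusters          = 2

    peak = cores_per_cluster * clock_hz * ops_per_cycle * clusters
    print(f"{peak / 1e15:.2f} petaflops")   # ~1.49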

Architecture of a compute cluster

The compute clusters are based on nodes made up of four IBM POWER7 processors, each with eight cores. Eight nodes make up a drawer and four drawers make a super-node. The nodes are connected together via the IBM Host Fabric Interface (HFI) interconnect, which was designed specifically for high-end POWER7-based supercomputers in response to the DARPA PERCS (Productive, Easy-to-use, Reliable Computing System) challenge. Each four-socket node has an HFI hub chip, which directly connects the node to the other seven nodes in the same drawer and the other twenty-four nodes in the same super-node. The hub chip also provides optical connections to other super-nodes, so that each super-node is directly connected to every other super-node in the cluster. A simple representation of a super-node is shown below:

 

 

[Figure: simple representation of a super-node]
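
The connectivity described above is a full mesh at two levels: node-to-node within a super-node and super-node-to-super-node within a cluster. The short sketch below only counts the connections implied by that description; the number and type of physical links behind each connection are not covered on this page.

    # Connections implied by the description above: every node is directly
    # connected to every other node in its super-node, and every super-node
    # is directly connected to every other super-node in the cluster.
    from itertools import combinations

    NODES_PER_SUPER_NODE = 32   # 4 drawers x 8 nodes
    SUPER_NODES = 24            # per compute cluster

    intra = len(list(combinations(range(NODES_PER_SUPER_NODE), 2)))
    inter = len(list(combinations(range(SUPER_NODES), 2)))

    print(intra, "node-to-node connections inside each super-node")    # 496
    print(inter, "super-node-to-super-node connections per cluster")   # 276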

Storage

The whole system has more than three petabytes of usable storage, served from two separate storage I/O clusters using IBM GPFS. Each of these storage I/O clusters is made up of two racks, each containing five drawers of disks and two eight-node server drawers.

Each rack of storage has five drawers of disks, and each drawer has eight storage groups, each containing 47 small-form-factor 600GB SAS disks, giving a total of 376 disks per drawer. Each of the storage groups in a drawer also has a solid-state storage device for system data.
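
A back-of-the-envelope tally of the raw disk capacity implied by these numbers is sketched below; parity, spares and metadata overheads are not itemised on this page, so this is raw capacity only, not the more than three petabytes of usable storage quoted above.

    # Raw disk capacity implied by the figures above (both storage I/O clusters).
    io_clusters      = 2
    racks_per_io     = 2
    drawers_per_rack = 5
    disks_per_drawer = 376    # 8 storage groups x 47 disks
    disk_tb          = 0.6    # 600GB SAS disks

    raw_tb = io_clusters * racks_per_io * drawers_per_rack * disks_per_drawer * disk_tb
    print(f"{raw_tb / 1000:.2f} PB raw")   # ~4.51 PB before parity, spares and metadata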

The disk layout is not a traditional RAID layout, where disks are arranged in RAID groups (e.g. RAID6 with 4 data and 2 parity disks) connected to a hardware storage controller. Under GPFS "Native RAID" the data blocks are distributed over all the disks in an array, and error detection and correction is provided by Reed-Solomon encoding. This effectively gives a RAID configuration with 8 data and 3 parity disks, but in the event of a disk failure it allows the data to be rebuilt quickly using all the disks in the array rather than just the disks in the RAID stripe.
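
The practical difference between a fixed RAID group and this declustered layout can be illustrated with a short sketch. The random stripe placement below is for illustration only and is not the actual GPFS Native RAID placement algorithm; the 47-disk array size is taken from the storage-group description above.

    # Declustered 8+3 parity, illustrated: each stripe's 11 strips (8 data,
    # 3 parity) land on a different subset of the array's disks, so a rebuild
    # after one failure reads from almost every surviving disk.
    # Random placement is for illustration only, not the real GPFS Native RAID algorithm.
    import random

    DISKS, STRIPE_WIDTH, STRIPES = 47, 8 + 3, 10_000
    placement = [random.sample(range(DISKS), STRIPE_WIDTH) for _ in range(STRIPES)]

    failed = 0
    affected = [s for s in placement if failed in s]
    helpers = {d for s in affected for d in s if d != failed}

    print(len(affected), "stripes had a strip on the failed disk")
    print(len(helpers), "of", DISKS - 1, "surviving disks take part in the rebuild")
    # A fixed 8+3 RAID group would rebuild from only the 10 other disks in that group.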

The storage is served from 10 of the 16 server nodes in each rack. These nodes act as GPFS Network Shared Disk (NSD) servers; the remaining nodes provide utility functions or are hot spares to enhance the resilience of the service. Each server node is the same as a node in the compute cluster, except that it has 128GiB of memory and two quad-ported SAS adapters. The SAS adapters connect each node to two disk enclosures. Each drawer of nodes in the storage cluster is connected to every super-node in both compute clusters via dedicated HFI links.

The GPFS architecture allows any node on any cluster to access any file on either of the storage I/O clusters, which lets users of ECMWF's high-performance computing facility work with increased productivity.

Phase 1 vs Phase 2

 

                                    Phase 1                              Current System "Phase 2"
Compute Clusters                    2                                    2
Peak Performance (teraflops)        325                                  ~1,478
Sustained Performance (teraflops)   20                                   70

Each compute cluster
Compute Nodes                       272                                  772
Compute cores                       8,702                                24,704
Operating System                    AIX 5.3                              AIX 7.1
Interconnect                        8-plane IB4x DDR                     IBM HFI

Each compute node
Memory in compute node (GiB)        64 (8 nodes with 256)                64 (20 nodes with 256)
Chips per node                      16                                   4
Cores per chip                      2                                    8

Each processor (core)
Threads per core                    1, 2                                 1, 2, 4
Clock frequency (gigahertz)         4.7                                  3.8
Operations per clock cycle          4                                    8
L1, L2, L3 cache                    64KiB/4MiB(shared)/32MiB(off-chip)   32KiB/256KiB(private)/4MiB(shared)

Storage system
Storage (petabytes)                 1.2                                  3.14

 

 


  
