In order to run weather forecast models within a schedule that has reasonably short
timeslots, powerful supercomputer systems are required.
The first version of the ECMWF weather forecasting model was developed
on a Control Data Corporation (CDC) 6600 computer from 1976 to 1978. Although
the CDC 6600 was one of the most powerful systems available at the time,
the forecast model still needed 12 days to produce a 10-day forecast.
However, this showed that useful forecasts could be produced, provided
that a suitably powerful computer could be acquired.

In June 1977, before the opening of its permanent headquarters at Shinfield
Park in Reading, ECMWF contracted for delivery of its own supercomputer,
a CRAY-1A manufactured by Cray Research, which was installed at the
headquarters in late 1978. Before then, the Centre's scientists had access
to "Serial 1", the first production model of the CRAY-1 series, on which
they tested all the programs required to produce a real operational forecast.

ECMWF delivered its first operational medium-range weather forecast to its
Member States on 1 August 1979, produced on the CRAY-1A. A 10-day forecast
used about 5 hours of CPU time, more than 50 times faster than on the CDC 6600,
which made the production of 10-day forecasts a feasible undertaking.
The CRAY-1A was a single-processor computer with
a memory of 8 Mbytes and a disk subsystem totalling 2.4 Gbytes. With
a clock cycle time of 12.5 nanoseconds (equivalent to 80 MHz) and
the ability to produce two results per cycle, the system had a theoretical
peak performance of 160 Megaflops. Running the operational weather
model, the machine sustained about 50 Megaflops (50 million arithmetic
calculations per second).
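The CRAY-1A figures can be checked with simple arithmetic. A theoretical peak rate is the clock frequency multiplied by the number of floating-point results the hardware can deliver per cycle. The minimal Python sketch below reuses only numbers quoted in the text: a 12.5 ns cycle, 2 results per cycle, about 50 Megaflops sustained, and 12 days versus about 5 hours for a 10-day forecast. The function names are illustrative, not taken from any ECMWF tool.

```python
# Minimal sketch: theoretical peak = clock frequency x results produced per cycle.
# All inputs are the CRAY-1A / CDC 6600 figures quoted in the text; names are illustrative.

def clock_mhz(cycle_time_ns: float) -> float:
    """Convert a clock cycle time in nanoseconds to a frequency in MHz."""
    return 1_000.0 / cycle_time_ns


def peak_mflops(cycle_time_ns: float, results_per_cycle: int, cpus: int = 1) -> float:
    """Theoretical peak in Megaflops, assuming every CPU delivers its maximum results every cycle."""
    return clock_mhz(cycle_time_ns) * results_per_cycle * cpus


cray1a_peak = peak_mflops(12.5, 2)           # 80 MHz x 2 results per cycle
cray1a_sustained = 50.0                      # Megaflops on the operational model
efficiency = cray1a_sustained / cray1a_peak  # fraction of peak actually achieved

# Rough speed-up for a 10-day forecast: 12 days on the CDC 6600 vs about 5 hours on the CRAY-1A.
# This compares elapsed time on one machine with CPU time on the other, so it is only indicative.
speedup_vs_cdc6600 = (12 * 24) / 5

print(f"clock {clock_mhz(12.5):.0f} MHz, peak {cray1a_peak:.0f} Mflops, "
      f"sustained {efficiency:.0%} of peak, ~{speedup_vs_cdc6600:.0f}x faster than the CDC 6600")
```

The sustained rate works out to roughly a third of the theoretical peak. This illustrates the gap between what the hardware can do in principle and what a real forecast model achieves.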

In 1984, the CRAY-1A was replaced by a Cray X-MP/22, which had two CPUs
and 16 Mbytes of memory. ECMWF used this system to pioneer the operational
use of multitasking, that is, a programming paradigm in which a single program
makes use of more than one CPU at the same time. The X-MP system
also incorporated a number of additional improvements: an IOS (Input-Output
Subsystem), which allowed the disks to be handled more efficiently,
and an SSD (Solid State Disk), which provided temporary
I/O at speeds substantially faster than disk.
In 1986, the Cray X-MP/22 was replaced by a four-processor Cray X-MP/48.
This system had 4 CPUs, 64 Mbytes of memory
and a cycle time of 9.5 nanoseconds (105 MHz), with a theoretical
peak performance of 800 Megaflops.

The X-MP/48 was replaced in 1990 by a Cray Y-MP 8/8-64. This system had
8 CPUs with a cycle time of 6 nanoseconds (166 MHz) and 512 Mbytes of memory.
It was also the first ECMWF supercomputer to run the Unix operating system.
The previous three Cray systems had all used Cray's own proprietary operating
system, COS, whereas the Y-MP used UNICOS, Cray's implementation of Unix. UNICOS
was based on AT&T System V Unix with Berkeley extensions and many enhancements
developed by Cray Research Inc. This heralded the gradual introduction of Unix
systems at ECMWF; today all of the systems used at ECMWF, from desktop PCs to
supercomputers, run some form of Unix.

In 1992 the Y-MP was replaced by a Cray C90 with 16 CPUs and 16 Gbytes of memory.
Each CPU of the C90 had a theoretical peak performance of 1 Gigaflop
(1000 million arithmetic calculations per second).

Up until this time, all the Cray supercomputers at ECMWF used shared memory,
i.e. each of the processors in the system could directly access any part of
the memory. In 1994 ECMWF entered the new world of distributed memory parallel
processing with the installation of a Cray T3D system. This system comprised
128 Alpha microprocessors, each with 128 Mbytes of memory, connected to form
a 3D torus. The major difference with this system was that, because the
memory was distributed (i.e. attached to individual processors), data had to be
exchanged between processors by passing messages. This meant that substantial
changes to the weather forecasting system were required for it to run
efficiently on this type of architecture. The T3D itself did not have any disks
or network connections; these were provided by a small Y-MP 2E system
connected to the T3D by a 200 Mbytes/sec high-speed channel.

In 1996 the VPP700, the first of three large Fujitsu VPP systems, was installed,
initially with 36 processors. This was also a distributed memory system, each
processor (or PE, processing element) having direct access only to its own memory.
The system also incorporated a very high-speed interconnect (a fully non-blocking
crossbar switch), which allowed messages to be passed from any PE to any other PE
with minimal latency. The VPP700 was expanded to 116 processors in 1997. Each
processor in the VPP700 was capable of 2.2 Gigaflops peak.
In 1998 an additional VPP700E with 48 processors was installed.
The VPP700E was very similar to the VPP700, but with slightly faster processors.

In 1999 the VPP5000 was installed, initially with 38 processors. This system was
upgraded to its final configuration of 100 PEs in 2000. Each processor in the
VPP5000 was capable of a peak performance of 9.6 Gigaflops, more than 4 times
that of a VPP700 processor. The sustained performance of the whole VPP5000 was
almost 300 Gigaflops.
After providing very good service for many years, the Fujitsu VPP systems were
decommissioned at the end of March 2003.

In the second half of 2002, two IBM Cluster 1600 systems, consisting of 30 p690 SMP servers connected by an SP2 switch, were installed and commissioned. These were a departure from the earlier production systems because they were clusters of shared-memory scalar (as opposed to vector) servers. The first operational forecasts using these systems were produced on 4 March 2003.
In the second half of 2004, these two clusters were replaced by two new IBM clusters, each with 70 p690+ servers connected by a pSeries High Performance Switch (also known as the Federation switch).

In the second half of 2006, these two clusters were replaced by two new IBM clusters, each with 155 p5-575+ servers connected by a pSeries High Performance Switch.
Although there are about twice as many servers in this system as in the previous one, each server contains only half as many processors (at the same clock frequency). Nevertheless, new architectural features of the POWER5 processor make this system almost twice as powerful as the system it replaced. The first operational forecasts using this system were produced on 15 August 2006.

In 2009, the two POWER5-based clusters were replaced by two new IBM clusters, each with 286 p6-575 servers connected by an 8-plane IB4x-DDR InfiniBand network.
There are about twice as many servers in this system as in the previous one, and each server contains twice as many processors (with a clock frequency of 4.7 GHz, compared to 1.9 GHz). As a result, for ECMWF's applications this system is about five times as powerful as the system it replaced. The first operational forecasts using this system were produced on 1 April 2009.

In late 2012, the first of these two clusters was replaced by a new IBM cluster; replacement of the second cluster will be completed in early 2013. Each new cluster has 768 POWER7-775 servers connected by the IBM Host Fabric Interface (HFI) interconnect.
For the first time, the processor clock frequency actually decreased, from 4.7 GHz to 3.83 GHz. Despite this, each processor core has a theoretical peak performance 60% greater than that of a POWER6 core. For ECMWF's applications, the system is about three times as powerful as the system it replaced. The first operational forecasts using this system were produced on 24 October 2012.
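One way to see how a slower clock can still give a faster core is to multiply clock frequency by floating-point operations per cycle. The sketch below assumes 8 double-precision flops per cycle for a POWER7 core and 4 for a POWER6 core. Those two figures are not stated in the text, so check them against IBM's processor documentation. It also assumes 32 cores per server, which is only inferred from ~49,000 cores spread over 2 × 768 servers.

```python
# Per-core peak = clock frequency x double-precision flops per cycle.
# ASSUMPTION: 8 flops/cycle for a POWER7 core and 4 for a POWER6 core
# (not stated in the text; verify against IBM documentation).
FLOPS_PER_CYCLE = {"POWER6": 4, "POWER7": 8}
CLOCK_GHZ = {"POWER6": 4.7, "POWER7": 3.83}


def core_peak_gflops(chip: str) -> float:
    """Theoretical peak of one core in GFLOPS."""
    return CLOCK_GHZ[chip] * FLOPS_PER_CYCLE[chip]


p6 = core_peak_gflops("POWER6")   # 18.8 GFLOPS
p7 = core_peak_gflops("POWER7")   # about 30.6 GFLOPS
gain = p7 / p6 - 1.0              # fractional per-core improvement despite the lower clock

# ASSUMPTION: 32 cores per server, inferred from ~49,000 cores over two clusters of 768 servers.
cores = 2 * 768 * 32
system_pflops = cores * p7 / 1e6  # GFLOPS -> PFLOPS

print(f"POWER6 {p6:.1f} GFLOPS/core, POWER7 {p7:.1f} GFLOPS/core, "
      f"+{gain:.0%} per core, {cores} cores, {system_pflops:.2f} PFLOPS peak")
```

Under these assumptions the per-core gain comes out a little above 60%, and the whole system lands at roughly 1.5 PFLOPS peak.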

Comparison of ECMWF's latest supercomputer with its first one

| Specification | Cray-1A | IBM POWER7 system | Approx. ratio |
|---|---|---|---|
| Year installed | 1978 | 2012 | |
| Architecture | Vector processor | Dual cluster of scalar CPUs | |
| Number of cores | 1 | ~49,000 | 49,000:1 |
| Clock speed | 12.5 ns (80 MHz) | 0.26 ns (3.83 GHz) | 48:1 |
| Peak performance per core | 160 MFLOPS | 30 GFLOPS | 190:1 |
| Peak performance per system | 160 MFLOPS | ~1.5 PFLOPS | 9,200,000:1 |
| Sustained performance | ~50 MFLOPS | ~70 TFLOPS | 1,400,000:1 |
| Memory | 8 MiB | ~106 TiB | 13,900,000:1 |
| Disk space | 2.5 GB | ~3.1 PB | 1,250,000:1 |
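The "Approx. ratio" column can be recomputed from the two machines' figures once units are normalised. The sketch below takes memory in binary units (MiB, TiB) and disk in decimal units (GB, PB), as the table labels them. Values marked "~" are approximate, so the recomputed ratios agree with the table only to within that rounding.

```python
# Recompute the comparison table's ratio column from the quoted figures.
# Units are normalised first: counts, MHz, MFLOPS, bytes.
MiB, TiB = 2**20, 2**40   # binary units, as used for memory
GB, PB = 10**9, 10**15    # decimal units, as used for disk space

rows = {
    # name: (Cray-1A value, IBM POWER7 system value)
    "Number of cores":          (1, 49_000),
    "Clock speed (MHz)":        (80, 3_830),
    "Peak per core (MFLOPS)":   (160, 30_000),
    "Peak per system (MFLOPS)": (160, 1.5e9),
    "Sustained (MFLOPS)":       (50, 70e6),
    "Memory (bytes)":           (8 * MiB, 106 * TiB),
    "Disk space (bytes)":       (2.5 * GB, 3.1 * PB),
}

# Ratio of the newer machine's figure to the older one's, per row.
ratios = {name: new / old for name, (old, new) in rows.items()}

for name, r in ratios.items():
    print(f"{name:<26} {r:>14,.0f}:1")
```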