
ECMWF Supercomputer History


Running weather forecast models to an operational schedule with reasonably short timeslots requires powerful supercomputer systems.

The first version of the ECMWF weather forecasting model was developed on a Control Data Corporation 6600 computer from 1976 to 1978. Although the CDC 6600 was one of the most powerful systems available at the time, the forecast model still needed 12 days to produce a 10-day forecast. However, this showed that, provided a suitably powerful computer could be acquired, useful forecasts could be produced.

In June 1977, before the opening of its permanent headquarters at Shinfield Park in Reading, ECMWF contracted for the delivery of its own supercomputer, a Cray-1A manufactured by Cray Research, which was installed at the headquarters in late 1978. Before then, the Centre's scientists had access to "Serial 1", the first production model of the Cray-1 series, in order to test all the programs required to produce a real operational forecast.

ECMWF delivered its first operational medium-range weather forecast to its Member States on 1 August 1979, produced using the Cray-1A. The machine used about 5 hours of CPU time to produce a 10-day forecast, more than 50 times faster than the CDC 6600, thereby making the production of 10-day forecasts a feasible undertaking.

The Cray-1A was a single-processor computer with 8 Mbytes of memory and a disk subsystem totalling 2.4 Gbytes. With a clock cycle time of 12.5 nanoseconds (equivalent to 80 MHz) and the ability to produce two results per cycle, the system had a theoretical peak performance of 160 Megaflops. Running the operational weather model, the machine was capable of a sustained performance of 50 Megaflops (50 million arithmetic calculations per second).
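As a sanity check, the quoted peak figure follows directly from the clock rate and the number of results per cycle. A back-of-the-envelope sketch, using only the numbers given above:

```python
# Peak performance of the Cray-1A derived from its clock and per-cycle throughput.
cycle_time_ns = 12.5                       # clock cycle time in nanoseconds
clock_hz = 1.0 / (cycle_time_ns * 1e-9)   # 1 / 12.5 ns = 80 MHz
results_per_cycle = 2                      # two floating-point results per cycle
peak_flops = clock_hz * results_per_cycle

print(f"Clock: {clock_hz / 1e6:.0f} MHz")          # 80 MHz
print(f"Peak:  {peak_flops / 1e6:.0f} Megaflops")  # 160 Megaflops
```

The sustained 50 Megaflops on the real model, about a third of peak, reflects the usual gap between theoretical and achieved performance.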

In 1984, the Cray-1A was replaced by a dual-processor Cray X-MP/22. This had two CPUs and 16 Mbytes of memory. ECMWF used this system to pioneer the operational use of multitasking, a programming paradigm in which a single program makes use of more than one CPU. The X-MP system also incorporated a number of additional improvements: an IOS (Input-Output Subsystem), which allowed the disks to be handled more efficiently, and an SSD (Solid State Disk), which provided facilities for temporary I/O at speeds substantially faster than disk.

In 1986, the Cray X-MP/22 was replaced by a four processor Cray X-MP/48. This system had 4 CPUs, 64 Mbytes of memory and a cycle time of 9.5 nanoseconds (105 MHz), with a theoretical peak performance of 800 Megaflops.

The X-MP/48 was replaced in 1990 by a Cray Y-MP 8/8-64. This system had 8 CPUs with a cycle time of 6 nanoseconds (166 MHz) and 512 Mbytes of memory. It was also the first ECMWF supercomputer to run the Unix operating system: the previous three Cray systems had all used Cray's own proprietary operating system, COS, while the Y-MP used Cray's implementation of Unix, UNICOS, based on AT&T System V Unix with Berkeley extensions and many enhancements developed by Cray Research Inc. This heralded the gradual introduction of Unix systems at ECMWF; today all of the systems used at ECMWF, from desktop PCs to supercomputers, run some form of Unix.

In 1992 the Y-MP was replaced by a Cray C90, with 16 CPUs and 16 Gbytes of memory. Each CPU of the C90 had a theoretical peak performance of 1 Gigaflop (1000 million arithmetic calculations per second).


Up until this time, all the Cray supercomputers at ECMWF used shared memory: each of the processors in the system could directly access any part of the memory. In 1994 ECMWF entered the new world of distributed-memory parallel processing with the installation of a Cray T3D system. This system comprised 128 Alpha microprocessors, each with 128 Mbytes of memory, connected to form a 3D torus. The major difference was that, since the memory was distributed, i.e. attached to each processor, message passing between the processors was required. This meant that substantial changes to the weather forecasting system were needed for it to operate efficiently on this type of architecture. The T3D itself had no disks or network connections; these were provided by a small Y-MP 2E system connected to the T3D by a 200 Mbytes/sec high-speed channel.

In 1996 the VPP700, the first of three large Fujitsu VPP systems, was installed, initially with 36 processors. This was also a distributed-memory system, each processor (or PE, processor element) having direct access only to its own memory. The system also incorporated a very high-speed interconnect (a fully non-blocking crossbar switch) which allowed messages to be passed from one PE to any other PE with the minimum of latency. The VPP700 was increased to 116 processors in 1997. Each processor in the VPP700 was capable of 2.2 Gigaflops peak.

In 1998 an additional VPP700E with 48 processors was installed. The VPP700E was very similar to the VPP700, but with slightly faster processors.

In 1999 the VPP5000 was installed, initially with 38 processors. This system was upgraded to its final configuration with 100 PEs in 2000. Each processor in the VPP5000 was capable of a peak performance of 9.6 Gigaflops, more than 4 times faster than those in the VPP700. The sustained performance of the whole VPP5000 was almost 300 Gigaflops.

After providing very good service for many years, the Fujitsu VPP systems were eventually decommissioned at the end of March 2003.

In the second half of 2002, two IBM Cluster 1600 systems, consisting of 30 p690 SMP servers connected by an SP2 switch, were installed and commissioned. These were a departure from the earlier production systems in that they were shared memory scalar (as opposed to vector) systems. The first operational forecasts using this system were produced on 4 March 2003.

In the second half of 2004, these two clusters were replaced by two new IBM clusters each with 70 p690+ servers connected by a pSeries High Performance Switch (also known as a Federation switch).

In the second half of 2006, the above two clusters were replaced by two new IBM clusters each with 155 p5-575+ servers connected by a pSeries High Performance Switch.

While this system had about twice as many servers as the previous one, each server contained only half as many processors (at the same clock frequency). Nevertheless, new architectural features of the POWER5 made this system almost twice as powerful as the one it replaced. The first operational forecasts using this system were produced on 15 August 2006.

In 2009, the above two POWER5-based clusters were replaced by two new IBM clusters each with 286 p6-575 servers connected by an 8-plane IB4x-DDR InfiniBand network.

This system had about twice as many servers as the previous one, and each server contained twice as many processors (with a clock frequency of 4.7 GHz, compared to 1.9 GHz). For ECMWF's applications this made it about five times as powerful as the system it replaced. The first operational forecasts using this system were produced on 1 April 2009.


In late 2012/early 2013, the two clusters above were replaced by two new IBM clusters. Each cluster has 768 POWER7-775 servers connected by the IBM Host Fabric Interface (HFI) interconnect.

For the first time the processor clock frequency actually decreased, going from 4.7 GHz to 3.83 GHz; despite this, each processor core has a theoretical peak performance 60% greater than that of the POWER6. For ECMWF's applications the system is about three times as powerful as the system it replaced. The first operational forecasts using this system were produced on 24 October 2012.

Comparison of ECMWF's latest supercomputer with its first one

                          Cray-1A               IBM POWER7 clusters           Approx. ratio
Year installed            1978                  2012
Architecture              Vector processor      Dual cluster of scalar CPUs
Number of cores           1                     —
Clock speed               12.5 nsec (80 MHz)    0.26 nsec (3.83 GHz)          ~48
Peak perf. per core       160 Megaflops         —
Peak perf. per system     160 Megaflops         —
Sustained performance     50 Megaflops          —
Memory                    8 MiBytes             ~106 TiBytes                  ~14,000,000
Disk space                2.5 GBytes            ~3.1 PBytes                   ~1,200,000




© ECMWF, 17.06.2013