
Parallel Applications

 
 

Introduction

Applications running on ECMWF's IBM Supercomputers are mainly written in Fortran 90 and use a combination of the Message Passing Interface (MPI) and OpenMP parallel programming standards to take advantage of the large number of processors.

MPI provides interfaces to send and receive data and to synchronise operations between the tasks of a parallel application, while OpenMP provides a directive-level interface for exploiting shared-memory parallelism within each node.

Information on MPI and OpenMP can be found at www.mpi-forum.org and www.openmp.org respectively.

While MPI alone is sufficient to run ECMWF's parallel applications, experience has shown that combining MPI with OpenMP improves performance by 10% to 25% compared with using MPI exclusively. This gain is attributed largely to the use of OpenMP dynamic scheduling.
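The page does not say why dynamic scheduling helps. One common reason, assumed here, is load imbalance: if some blocks of work cost more than others, a fixed up-front split leaves some threads idle. The sketch below is a simple model, not OpenMP itself and not ECMWF code. It compares a contiguous equal split, modelled on `schedule(static)`, with handing each idle thread the next block, modelled on `schedule(dynamic, 1)`. The function names and the block costs are hypothetical.

```python
import heapq


def static_makespan(costs, n_threads):
    """Finish time when blocks are split into contiguous, near-equal chunks,
    one chunk per thread (modelled on OpenMP schedule(static))."""
    base, extra = divmod(len(costs), n_threads)
    finish_times = []
    start = 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        finish_times.append(sum(costs[start:start + size]))
        start += size
    return max(finish_times)


def dynamic_makespan(costs, n_threads):
    """Finish time when each idle thread takes the next unprocessed block
    (modelled on OpenMP schedule(dynamic, 1))."""
    # Min-heap of the times at which each thread next becomes free.
    free_at = [0] * n_threads
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# Hypothetical per-block costs: four expensive blocks followed by four cheap ones.
costs = [8, 8, 8, 8, 1, 1, 1, 1]
print(static_makespan(costs, 2))   # 32: one thread gets all the expensive blocks
print(dynamic_makespan(costs, 2))  # 18: the work is shared evenly
```

In this model the total work is 36 units, so 18 is the best two threads can achieve. Real dynamic scheduling also carries a per-block dispatch overhead that the sketch ignores.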

Typical IFS applications today use from 768 to 1536 MPI tasks and 8 to 16 OpenMP threads.

Main ECMWF Parallel Applications

Some of the main applications run on the IBM Supercomputers are:

  • 4-Dimensional Variational Analysis (T1279/16 km resolution)
    • uses 1536 processors for operational performance and 768 processors for research experiments
  • Global deterministic atmospheric 10-day forecast (T1279/16 km resolution)
    • coupled with a wave model, uses 1536 processors for operational running and 768 processors for research experiments
  • Ensemble Prediction System
    • 51 10-day forecasts at T639/30 km resolution, each using 128 processors
  • Global Seasonal Forecasts
    • 7-month forecasts (ensemble with 41 members)
    • coupled atmosphere (IFS) and ocean (HOPE)
    • runs on a single node (32 processors using OpenMP)

Special Considerations for Systems with Cache

Because the ECMWF IBM Supercomputers employ three levels of cache, locality of reference is an important consideration in performance programming. Put simply, data located in a cache line should be reused as many times as possible before the hardware evicts that line and it has to be reloaded from main memory.

The IFS supports this by performing time-consuming grid-point calculations (e.g. physics, grid-point dynamics) in fixed-size NPROMA blocks. On a vector machine NPROMA would be set as large as possible (e.g. 1023 or 2047, to keep stack use acceptable), while on a scalar/cache architecture a value between 10 and 40 would be typical.
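The blocking idea can be sketched as follows. This is a minimal illustration, not IFS code: the names `nproma_blocks`, `apply_blocked` and `ngptot` are hypothetical, and allowing a shorter final block when the number of points is not a multiple of NPROMA is an assumption made for the example. Python lists will not show any cache effect; the sketch only shows the loop structure, in which each block is fully processed before the next one is touched.

```python
def nproma_blocks(ngptot, nproma):
    """Yield (start, end) index ranges that cover ngptot grid points
    in consecutive blocks of at most nproma points."""
    if nproma <= 0:
        raise ValueError("nproma must be positive")
    for start in range(0, ngptot, nproma):
        yield start, min(start + nproma, ngptot)


def apply_blocked(field, nproma, kernel):
    """Apply kernel to one block at a time, writing results back in place,
    so the working set at any moment is a single block."""
    for start, end in nproma_blocks(len(field), nproma):
        block = field[start:end]
        field[start:end] = kernel(block)
    return field


# Ten grid points in blocks of four: the last block holds the remaining two.
print(list(nproma_blocks(10, 4)))  # [(0, 4), (4, 8), (8, 10)]

# A stand-in for a grid-point calculation applied block by block.
print(apply_blocked(list(range(5)), 2, lambda b: [2 * x for x in b]))  # [0, 2, 4, 6, 8]
```

With a small NPROMA the data for one block can stay in cache while all the calculations for that block are carried out, which is the reuse described above.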


 

26.08.2010
© ECMWF