Parallel Applications

Introduction

Applications running on ECMWF's IBM supercomputers are mainly written in Fortran 90. They use a combination of the Message Passing Interface (MPI) and OpenMP parallel programming standards to exploit the large number of processors. MPI provides interfaces for sending and receiving data and for synchronising operations between the tasks of a parallel application. OpenMP provides a directive-level interface designed specifically to exploit shared-memory parallelism within each node. Information on MPI and OpenMP can be found at www.mpi-forum.org and www.openmp.org respectively.

Although MPI alone is sufficient to run ECMWF's parallel applications, experience has shown a 10% to 25% performance improvement from using MPI together with OpenMP, compared with using MPI exclusively. This gain is attributed largely to OpenMP dynamic scheduling. Typical IFS applications today use from 32 to 128 MPI tasks, with 2 or 4 OpenMP threads per task.

Main ECMWF Parallel Applications

Some of the main applications run on the IBM supercomputers are:

Special Considerations for Systems with Cache

The ECMWF IBM supercomputers have three levels of cache, so locality of reference is an important consideration in performance programming. Put simply, data in a cache line should be reused as many times as possible before the hardware evicts that line and it has to be reloaded from main memory. The IFS supports this by performing time-consuming grid-point calculations (e.g. physics, grid-point dynamics) in fixed-size NPROMA blocks. On a vector machine, NPROMA would be set as large as possible (e.g. 1023 or 2047, to keep stack use acceptable). On a scalar/cache architecture, a value between 10 and 40 would be typical.
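The blocking scheme described above can be sketched as follows. This is a minimal illustration in C rather than the Fortran 90 used by the IFS. Apart from NPROMA itself, every name in it (`physics_block`, `nblocks`, `grid_point_calcs`, `ngptot`) is hypothetical, and the two-stage kernel is a placeholder for real grid-point calculations. Grid points are processed in blocks of at most `nproma` points. Each block works on a small scratch array, so its intermediate values can be reused while they are still in cache. The blocks are then shared among OpenMP threads with dynamic scheduling. This is one assumed way of combining NPROMA blocks with the OpenMP dynamic scheduling mentioned in the introduction, not a description of the IFS code. The scratch array is a C99 variable-length array on the stack, which shows how stack use grows with NPROMA. If the code is compiled without OpenMP, the pragma is ignored and the loop runs serially, producing the same result.

```c
#include <assert.h>

/* Hypothetical grid-point kernel for one block of len columns.
   The scratch array tmp is sized by the block length and lives on the
   stack, so stack use grows with NPROMA. */
static void physics_block(const double *in, double *out, int len)
{
    double tmp[len];                  /* C99 VLA; len >= 1 whenever called */
    for (int j = 0; j < len; ++j)     /* stage 1: placeholder calculation */
        tmp[j] = 2.0 * in[j];
    for (int j = 0; j < len; ++j)     /* stage 2 reuses tmp while it is in cache */
        out[j] = tmp[j] + 1.0;
}

/* Number of NPROMA blocks needed to cover ngptot grid points;
   the last block may be only partially filled. */
int nblocks(int ngptot, int nproma)
{
    assert(nproma > 0 && ngptot >= 0);
    return (ngptot + nproma - 1) / nproma;
}

/* Runs the kernel over ngptot points in blocks of at most nproma points.
   Blocks are independent, so OpenMP hands them out one at a time to idle
   threads (dynamic scheduling). Without OpenMP the pragma is ignored. */
void grid_point_calcs(const double *in, double *out, int ngptot, int nproma)
{
    int nb = nblocks(ngptot, nproma);
    #pragma omp parallel for schedule(dynamic, 1)
    for (int b = 0; b < nb; ++b) {
        int start = b * nproma;
        int len = (ngptot - start < nproma) ? ngptot - start : nproma;
        physics_block(in + start, out + start, len);
    }
}
```

For example, 100 grid points with `nproma = 32` give four blocks, and the last block holds 4 points. In Fortran, the corresponding OpenMP directive is `!$OMP PARALLEL DO SCHEDULE(DYNAMIC,1)`.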