To make full use of the diverse range of computing hardware available to the EU Destination Earth (DestinE) initiative across various EuroHPC systems, the ECMWF Integrated Forecasting System (IFS) has been adapted for graphics processing unit (GPU) accelerators. The latest version of ECMWF’s open-source operational wave model, ecWAM 1.5 (https://github.com/ecmwf-ifs/ecwam), is now GPU-ready, as part of an ongoing collaborative effort that includes ECMWF, CINECA, Nvidia and AMD.
Scientific significance
Ocean surface waves, generated primarily by the action of wind, are an integral component of the Earth system due to their role in modulating momentum, heat and moisture fluxes between the ocean and atmosphere. Third-generation spectral wave models are the best tools we have for modelling wave generation, transformation, and decay across the global oceans. These models explicitly resolve the full two-dimensional wave action spectrum and evolve it through physically based source terms representing wind input, nonlinear wave–wave interactions, and dissipation, without imposing constraints on spectral shape or wave growth. They also describe key propagation processes such as advection, refraction, and shoaling. ecWAM is ECMWF’s state-of-the-art third-generation spectral wave model and a core component of the IFS Earth system model.
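Schematically, the evolution of the spectrum in such models can be written as a balance equation, shown here in a simplified deep-water form for illustration (the operational formulation is more general):

```latex
\frac{\partial F(f,\theta)}{\partial t} + \mathbf{c}_g \cdot \nabla F
  = S_{\mathrm{in}} + S_{\mathrm{nl}} + S_{\mathrm{ds}},
```

where \(F(f,\theta)\) is the wave spectrum as a function of frequency and direction, \(\mathbf{c}_g\) is the group velocity governing propagation, and \(S_{\mathrm{in}}\), \(S_{\mathrm{nl}}\) and \(S_{\mathrm{ds}}\) are the source terms for wind input, nonlinear wave–wave interactions and dissipation, respectively.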
GPU-adaptation strategy
Because ecWAM contains a diverse set of computational patterns, a mixed strategy was used to adapt it to GPUs. The wave propagation kernel uses a bespoke low-order computational stencil, and was adapted for GPU execution via manual, tailored code optimisations and the insertion of pragma directives.
Source-term computation forms the other major scientific component of ecWAM. It represents the physical processes that generate, dissipate and redistribute wave energy. This is a large and scientifically fast-evolving part of the code, so a manual GPU-adaptation strategy would be unsustainable. Here, ECMWF’s in-house source-to-source translation tool Loki (https://github.com/ecmwf-ifs/loki) is used to generate the GPU code automatically from the central processing unit (CPU) code as a preprocessing step in the ecWAM build procedure. Moreover, as Nvidia and AMD GPUs use distinct programming models, generating the GPU-optimised code with Loki is essential for targeting both architectures without highly intrusive code modifications. As the ecWAM source-term computation kernel features the same “single-column” memory access pattern (with each column comprising one grid point) as the grid-point computations in the IFS dynamics and physics, the same Loki GPU-adaptation recipes can be reused.
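The "single-column" pattern can be illustrated with a minimal sketch (toy source terms and made-up constants, not ecWAM's physics): each grid point touches only its own spectral data, so the outer loop over columns is trivially parallel, which is precisely what lets a source-to-source tool mechanically map it to GPU threads:

```c
#define NPROMA 4   /* block of grid points, mimicking IFS-style blocking (illustrative) */
#define NFRE   3   /* number of spectral frequencies (toy value) */

/* Toy source-term update. Column jl reads and writes only spec[jl][*]
   and wind[jl]: the "single-column" memory access pattern. */
void source_terms(double spec[NPROMA][NFRE], const double wind[NPROMA], double dt) {
    /* Outer loop over independent columns: a translation tool can hoist
       this loop and turn it into the GPU thread-parallel dimension. */
    for (int jl = 0; jl < NPROMA; ++jl) {
        for (int m = 0; m < NFRE; ++m) {
            double sinput  =  0.1  * wind[jl];      /* toy wind-input term */
            double sdissip = -0.05 * spec[jl][m];   /* toy dissipation term */
            spec[jl][m] += dt * (sinput + sdissip);
        }
    }
}
```

Because no column depends on another, the transformation is purely structural and needs no understanding of the physics, which is what makes the automated approach sustainable as the science evolves.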
Data transfers between the separate memory spaces of a CPU and a GPU represent one of the key challenges of adaptation. To address this, the ecWAM data structures have been rewritten around Field-API (https://github.com/ecmwf-ifs/field_api), an array data management abstraction co-developed with Météo-France for the highly bespoke memory data layouts used in the IFS. Field-API provides vendor-agnostic support for highly optimised CPU-to-GPU data transfers and makes this functionality transparent to scientific developers behind an intuitive and non-intrusive abstraction. More details about Loki, Field-API and the wider IFS GPU-adaptation strategy can be found in the article by Lange et al. titled ‘GPU-adaptation of IFS for EuroHPC machines’ in this Newsletter.
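The core idea behind such an abstraction can be sketched as follows. This is a hypothetical miniature, not the real field_api interface: the wrapper owns a host and a device copy of an array, tracks which side is current, and copies only when the requested side is stale (here the "device" buffer is simulated with ordinary host memory for illustration):

```c
#include <stdlib.h>
#include <string.h>

/* Minimal Field-API-style wrapper (hypothetical API for illustration). */
typedef struct {
    double *host;
    double *device;        /* stands in for GPU memory in this sketch */
    size_t  n;
    int     device_stale;  /* host copy is newer than device copy */
    int     host_stale;    /* device copy is newer than host copy */
    int     transfers;     /* count of simulated CPU<->GPU copies */
} Field;

Field *field_new(size_t n) {
    Field *f = malloc(sizeof *f);
    f->host = calloc(n, sizeof *f->host);
    f->device = calloc(n, sizeof *f->device);
    f->n = n; f->device_stale = 1; f->host_stale = 0; f->transfers = 0;
    return f;
}

/* Device-side view: copy host->device only if the device copy is stale. */
double *field_get_device(Field *f) {
    if (f->device_stale) {
        memcpy(f->device, f->host, f->n * sizeof *f->host);  /* simulated H2D */
        f->device_stale = 0; f->transfers++;
    }
    f->host_stale = 1;  /* assume the device kernel will write to it */
    return f->device;
}

/* Host-side view: copy device->host only if the host copy is stale. */
double *field_get_host(Field *f) {
    if (f->host_stale) {
        memcpy(f->host, f->device, f->n * sizeof *f->host);  /* simulated D2H */
        f->host_stale = 0; f->transfers++;
    }
    f->device_stale = 1;
    return f->host;
}

void field_free(Field *f) { free(f->host); free(f->device); free(f); }
```

Scientific code simply asks for a host or device view and never issues transfers explicitly, which is what makes the abstraction non-intrusive: transfer logic lives in one place rather than being scattered through the model.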
Modularisation and open sourcing
Modularisation and open sourcing have also played a critical role in the GPU-adaptation of ecWAM. Modularisation, specifically the ability to build and test ecWAM independently of the IFS, significantly shortens the development cycle, and open sourcing greatly simplifies collaboration with external contributors.
The initial GPU-adaptation of the wave propagation kernel was performed by CINECA under a contracted DestinE activity. Both modularisation and open sourcing were instrumental in enabling this work to be completed in a timely manner. ecWAM has also benefited immensely from direct vendor contributions, initially from Nvidia and, more recently, AMD, which again was facilitated by open sourcing. Perhaps even more importantly, having a modular and open-source ecWAM also greatly lowers the barrier to continued optimisation, both by internal and external developers.
Computational performance
Whilst there is still room for optimisation, significant speedups have been achieved across a variety of GPU architectures, as shown in the first figure. The entire ecWAM timestep is now executed on GPU, and data is exchanged between CPU and GPU only at the coupling points with the IFS, where forcings are passed in either direction (see the second figure). Limiting the CPU–GPU data transfers in this way allows the GPU computational performance gains to translate directly into reduced overall model-execution time.
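The resulting control flow can be summarised in a schematic driver loop (illustrative structure only, with stand-in function names, not actual ecWAM code): transfers occur only at the IFS coupling exchange, while every kernel in between stays on the device:

```c
/* Stub operations with counters, so the transfer pattern is observable. */
static int h2d = 0, d2h = 0, gpu_kernels = 0;

static void receive_ifs_forcing(void)          { h2d++; }         /* CPU -> GPU */
static void propagate_on_gpu(void)             { gpu_kernels++; } /* stays on device */
static void integrate_source_terms_on_gpu(void){ gpu_kernels++; } /* stays on device */
static void send_fields_to_ifs(void)           { d2h++; }         /* GPU -> CPU */

/* Schematic coupled driver: one H2D and one D2H transfer per timestep,
   at the IFS coupling points; the wave timestep itself never leaves the GPU. */
void run_coupled_steps(int nsteps) {
    for (int step = 0; step < nsteps; ++step) {
        receive_ifs_forcing();
        propagate_on_gpu();
        integrate_source_terms_on_gpu();
        send_fields_to_ifs();
    }
}
```

With this structure, transfer cost grows only with the number of coupling exchanges, not with the number of GPU kernels, so kernel-level speedups carry through to the total model runtime.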