3.4 Data structures
Several data structures are used within the various computational phases of the IFS; most belong to the grid-point, Fourier, or spectral domain, and they are described in this section. Because the transposition strategy is combined with the spectral-transform method, very few global data structures are required. The large majority of data structures and variables define quantities for the subdomain belonging to the processor, e.g. the number of grid columns on the processor, the number of local grid-point latitudes, and the number of spectral waves this processor is responsible for.
3.4.1 Grid-point data
The fundamental data organization in grid space stores points so that, for a given field, neighbouring storage locations contain field values in a west-east direction. This storage order reflects the indexing in nearly all inner DO loops of grid-point computational routines. A consequence is that the natural DO loop length is limited to the number of points in each latitude circle. Since the `reduced grid' (reference here) is in widespread use, this leads to a performance penalty on vector-based architectures, because the polar rows of a reduced grid would generate very short vector computations (typically only 18 grid points).
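As a small illustration of this penalty, the sketch below (not IFS code; the row lengths are made up, standing in for the Fortran array NLOENG) shows that the trip count of a per-latitude inner loop equals the row length, which shrinks sharply towards the poles of a reduced grid:

```python
# Illustrative sketch: inner DO loops run west-east along one latitude,
# so the loop/vector length equals the row length NLOENG(nlat).
# The values below are hypothetical reduced-grid row lengths, pole to pole.
nloeng = [18, 25, 36, 40, 40, 36, 25, 18]

def inner_loop_lengths(row_lengths):
    """Loop length seen by a per-latitude west-east computation."""
    return list(row_lengths)

lengths = inner_loop_lengths(nloeng)
print(min(lengths), max(lengths))  # polar rows give very short loops
```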
3.4.2 Grid-point blocking
A solution to this performance problem is to introduce a `blocking' mechanism whereby latitude rows are packed into a `super-row' buffer which is then treated computationally as a single entity. The size of the buffer can be defined at run time by a namelist variable NPROMA and provides a way to trade memory for performance. The basic grid-point data structure reflects this mechanism. We first describe the case LCRAYPVP = TRUE which was used in the original (still functioning) IFS version used on Cray C90 and other shared-memory computers. Here it is required that complete latitudes plus three extra wrap-around points fit into the NPROMA buffer. So the minimum NPROMA size is three more than the number of points on a latitude near the equator. Two extra points are required at the end of each latitude because of the FFT routines, and one extra point at the start and two points at the end are required by the semi-Lagrangian routines to simplify the interpolation routines. The three extra points are kept for all grid-point and FFT arrays to simplify the code design.
An example of the contents of a block with 2 packed latitudes would be:
F[0, 1], F[1, 1], …, F[NLOENG(1), 1], F[NLOENG(1)+1, 1], F[NLOENG(1)+2, 1],
F[0, 2], F[1, 2], …, F[NLOENG(2), 2], F[NLOENG(2)+1, 2], F[NLOENG(2)+2, 2]
where NLOENG(nlat) is the number of points in the (reduced) grid for latitude nlat.
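The packing above can be sketched as follows. This is an illustrative Python model of the LCRAYPVP = TRUE layout, not IFS code; the latitude values and lengths are hypothetical, and it assumes the wrap-around points are copies of the opposite end of the row (point 0 wraps from the east end; the two trailing points wrap from the west end):

```python
# Sketch: each latitude occupies NLOENG(nlat) + 3 slots in the buffer --
# one wrap-around point before the row and two after it.
def pack_latitudes(rows):
    """Pack complete latitudes plus 3 wrap-around points into one buffer."""
    buf = []
    for row in rows:
        buf.append(row[-1])    # slot 0: wraps around from the east end
        buf.extend(row)        # slots 1 .. NLOENG(nlat): the real points
        buf.extend(row[0:2])   # two trailing slots: wrap to the west end
    return buf

lat1 = [10, 11, 12, 13]        # hypothetical latitude, NLOENG(1) = 4
lat2 = [20, 21, 22]            # hypothetical latitude, NLOENG(2) = 3
buf = pack_latitudes([lat1, lat2])
print(len(buf))                # (4 + 3) + (3 + 3) = 13
```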
3.4.3 The NPROMA block
In general, the NPROMA block will contain an arbitrary number of complete latitudes for a given field, with some unused space at the end.
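A minimal sketch of this packing rule (hypothetical row lengths, not IFS code; it assumes every latitude fits in one NPROMA buffer, as the LCRAYPVP case requires):

```python
# Sketch: fill each NPROMA buffer with as many *complete* latitudes as fit;
# the remainder of the buffer is unused slack.
NPROMA = 16
nloeng = [6, 7, 9, 9, 7, 6]   # made-up reduced-grid row lengths

def block_latitudes(row_lengths, nproma):
    """Group latitudes into blocks; returns (latitudes, unused slack) pairs."""
    blocks = []
    current, used = [], 0
    for n in row_lengths:      # assumes n <= nproma for every latitude
        if used + n > nproma:  # next complete latitude does not fit
            blocks.append((current, nproma - used))
            current, used = [], 0
        current.append(n)
        used += n
    blocks.append((current, nproma - used))
    return blocks

blocks = block_latitudes(nloeng, NPROMA)
for lats, slack in blocks:
    print(lats, "unused:", slack)
```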
Figure 3.1 Diagram of the NPROMA block.
NPROMA is contained in namelist NAMDIM. There is a similar variable called NRPROMA which allows control over buffering within the radiation sparse grid.
In the case where LMESSP = TRUE the grid-point arrays do not have wrap-around points, i.e. they only contain the `real' grid points on the globe. The distribution is also generalized so that the NPROMA buffer need no longer hold complete latitudes: NPROMA can take any value, right down to 1. On a vector machine one benefits from choosing as large a value as the memory constraints allow. On a cache-based machine it is favourable to choose a small value so that cache reuse is maximized. The two extra points required for FFT calculations are only present in local short-lived arrays used during the FFT. The three wrap-around points required for the semi-Lagrangian calculations are only present in the semi-Lagrangian buffer, and there in a generalized way (see Section 3.7 below) in order to cope with the more flexible grid-point distribution scheme.
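With complete latitudes no longer required, the blocking reduces to cutting the processor's local grid points into fixed-size chunks, as this sketch shows (hypothetical numbers, not IFS code):

```python
# Sketch of LMESSP = TRUE blocking: the local points are simply cut into
# chunks of length NPROMA; the last chunk may be shorter.
def chunk(ngptot, nproma):
    """Yield (start, length) pairs covering ngptot local grid points."""
    for start in range(0, ngptot, nproma):
        yield start, min(nproma, ngptot - start)

chunks = list(chunk(10, 4))
print(chunks)   # [(0, 4), (4, 4), (8, 2)]
```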
An NPROMA data block is followed in memory-storage order by a similar block from a different field or level. This is described in Fortran terms by the declaration: REAL F(NPROMA,NFIELDS,NGPBLKS) where NFIELDS is the total number of fields and levels, and NGPBLKS is the number of blocks necessary to hold a complete field. Within the IFS, the data is structured in this way after conversion to grid-point space by the FFT routine and reorganization by the transposition routine. Subsequently, the fields are passed to the computational routines block by block, as subroutine arguments. Examples are the local arrays ZGT0, ZGT1 and ZGT5 in SCAN2MDM.
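The declaration above implies the addressing sketched below. This is an illustrative Python model (hypothetical sizes, not IFS code): Fortran arrays are column-major, so the NPROMA index varies fastest, then the field index, then the block index:

```python
# Sketch of REAL F(NPROMA,NFIELDS,NGPBLKS) addressing, using 0-based indices.
NPROMA, NFIELDS = 8, 3
NGPTOT = 20                      # hypothetical number of local grid points
NGPBLKS = -(-NGPTOT // NPROMA)   # ceiling division -> blocks per field

def offset(i, jf, jb):
    """Linear memory offset of F(i+1, jf+1, jb+1) in column-major order."""
    return i + NPROMA * (jf + NFIELDS * jb)

print(NGPBLKS)                   # 3 blocks needed to hold a complete field
print(offset(0, 0, 1))           # first point of field 1 in block 2 -> 24
```

Passing the fields "block by block" then amounts to handing each routine the slice of memory starting at `offset(0, 0, jb)`.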
3.4.4 Fourier data
In Fourier space, the data are organized for the convenience of the FFT routines. For LMESSP = FALSE each latitude has `wrap-around' points, one located before the first point, and two located after the last point. If LMESSP = TRUE only two points at the end are present. Latitudes are located sequentially for a given field, in a buffer of length NPROMAG.
The data layout used for input and output to the FFT routines is shown in Fig. 3.2. Note that, since the number of points N is not the same for each latitude of a reduced grid, there is wasted space at the end of the short rows. Additionally, when the data have been converted to Fourier waves there is again wasted space, since there are always fewer waves than grid points for each latitude. This extra space is required during the FFT, but to save memory a packed version of this layout (GT0BUF) is used for long-term retention of data in Fourier space. In this case each row holds only the wave information appropriate for that latitude; the wave-number cut-off is defined by the array NMEN.
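The saving from the packed layout can be sketched as follows. The row lengths and cut-offs are hypothetical, and the sketch assumes each wave number m up to the cut-off NMEN(nlat) stores a (real, imaginary) pair; it is an illustration of the bookkeeping, not IFS code:

```python
# Sketch: full FFT rows versus the packed Fourier buffer (GT0BUF-style).
nloeng = [18, 36, 40]   # made-up reduced-grid row lengths
nmen   = [5, 11, 13]    # made-up per-latitude wave-number cut-offs

# Full layout: each FFT row needs its grid points plus 2 extra points.
full = sum(n + 2 for n in nloeng)

# Packed layout: waves m = 0..NMEN(nlat), each a (real, imag) pair.
packed = sum(2 * (m + 1) for m in nmen)

print(full, packed)     # the packed buffer is markedly smaller
```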
Figure 3.2 The data layout used for input and output to the FFT routines
The flow of data between these buffers is illustrated in Fig. 3.3:
Figure 3.3 Data flow in the FFT routines.