4. Diagnostic tools for an assimilation system
An operational assimilation system is an ever more complex machinery, comparable with any large-scale industrial application: the scheduling is tight, effective but robust functioning is required, and quick trouble-shooting is needed in case something goes wrong in an operational run. The complexity of the system dictates that several of its aspects have to be monitored and diagnosed to make sure the output is reliable. A number of diagnostic tools are presented in this chapter, collected under headings according to their most obvious use.
4.1 Code development and trouble-shooting
4.1 (a) Test the correctness of tangent linear and adjoint codes
In the IFS there are tangent linear and adjoint codes associated with the forecast model and the observation operators. A test for the correctness of the tangent linear code can be derived from a Taylor expansion for the perturbed non-linear model state,

M(x + αδx) = M(x) + αM'(x)δx + O(α²),

where M is the non-linear model, M' its tangent linear, δx a perturbation and α a small scaling factor. Dividing by the norm of the tangent linear term and reorganizing gives a ratio which behaves asymptotically according to

||M(x + αδx) − M(x)|| / ||αM'(x)δx|| → 1 as α → 0.
It is best to do the test for an individual routine at the time it is written, but the test can also be applied to the whole tangent linear model.
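As an illustration, the test can be sketched in a few lines of Python; a toy non-linear function M and a hand-coded tangent linear stand in for the forecast model (all names here are hypothetical, not IFS routines):

```python
import numpy as np

# Toy non-linear "model": a smooth map standing in for the forecast model.
def M(x):
    return np.sin(x) + 0.5 * x**2

# Hand-coded tangent linear of M about x, applied to a perturbation dx.
def M_tl(x, dx):
    return (np.cos(x) + x) * dx

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
dx = rng.standard_normal(5)

# The ratio ||M(x + a*dx) - M(x)|| / ||a * M'(x)dx|| should tend to 1
# as the perturbation scale a -> 0 (until rounding error takes over).
for a in [1e-1, 1e-3, 1e-5]:
    ratio = np.linalg.norm(M(x + a * dx) - M(x)) / np.linalg.norm(a * M_tl(x, dx))
    print(f"a = {a:.0e}  ratio = {ratio:.8f}")
```

A coding error in the tangent linear shows up as a ratio that does not converge to unity as a is decreased.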
The adjoint and tangent linear codes have to form an adjoint pair, which can be tested using the definition of the adjoint operator,

⟨M'δx, δy⟩_F = ⟨δx, M'*δy⟩_E,

where the inner products are defined in their respective spaces E and F. In practice, δx and δy are (randomly generated) inputs for the tangent linear and adjoint codes (subroutines), respectively, and the two inner products have to give the same value to within the computing accuracy.
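The inner-product identity can likewise be checked numerically; the sketch below uses an explicit random matrix as a stand-in for the linearized operator (in the IFS the operators are code, not stored matrices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linearized operator L: E (dim 7) -> F (dim 4),
# represented here by an explicit matrix for illustration only.
L = rng.standard_normal((4, 7))

dx = rng.standard_normal(7)   # random input in E for the tangent linear
dy = rng.standard_normal(4)   # random input in F for the adjoint

# <L dx, dy>_F must equal <dx, L^T dy>_E to within rounding.
lhs = np.dot(L @ dx, dy)
rhs = np.dot(dx, L.T @ dy)
print(lhs, rhs)
```

If the adjoint code is correct, the two printed values agree to machine precision for any random inputs.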
IFS contains a large number of tangent linear and adjoint routines which are tested at the time of writing. It is best to test each routine individually and also the model as a whole. The IFS has a built-in facility to test the tangent linear and adjoint of the forecast model, but not of the observation operators. From the maintenance point of view, there are frequent changes to the non-linear code, the observation operators for example, and each such change has to be incorporated in the corresponding tangent linear and adjoint routines. Changes in the internal data structures or subroutine arguments also need to be made consistently in the tangent linear and adjoint codes. Currently at ECMWF the tangent linear and adjoint coding is finished; however, adding new features, such as a new observation type which requires a new observation operator, brings along a need for further development of the linear codes.
4.1 (b) Gradient test
Testing the gradient of the cost function is similar to testing the tangent linear code: the gradient of the cost function must asymptotically point in the same direction as the difference between two realizations of the cost function separated by a small perturbation of the model state. A Taylor expansion for the cost function is given by

J(x + αδx) = J(x) + α⟨∇J(x), δx⟩ + O(α²).

The perturbation of the cost function is given by

δJ = J(x + αδx) − J(x),

and therefore the quantity

f(α) = δJ / (α⟨∇J(x), δx⟩)

approaches unity from below. There is a range of orders of magnitude of α for which this is true. Outside that range it fails, either because of the computing accuracy for too small values of α, or because J is non-quadratic for too large values of α. In practice, the value of α is repeatedly decreased by one order of magnitude, resulting in a printout with more and more 9's appearing until the computing accuracy has been reached.
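A minimal sketch of such a gradient test, using a hypothetical non-quadratic cost function in place of the real one (the perturbation is taken along the descent direction, so the ratio approaches one from below):

```python
import numpy as np

# Hypothetical non-quadratic cost function and its exact gradient.
def J(x):
    return 0.5 * np.dot(x, x) + 0.1 * np.sum(x**4)

def gradJ(x):
    return x + 0.4 * x**3

rng = np.random.default_rng(2)
x = rng.standard_normal(6)
dx = -gradJ(x)   # perturb along the descent direction

# The ratio (J(x + a*dx) - J(x)) / (a * <gradJ(x), dx>) approaches 1
# as a decreases; the printout shows more and more 9's.
for k in range(1, 9):
    a = 10.0**(-k)
    ratio = (J(x + a * dx) - J(x)) / (a * np.dot(gradJ(x), dx))
    print(f"a = 1e-{k}  ratio = {ratio:.12f}")
```

Beyond a certain value of k the printed ratio starts to degrade again, because the finite difference in the numerator is dominated by rounding error.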
A failure in the gradient test is a definite signature of an error somewhere in the variational assimilation system and not necessarily just in the tangent linear or adjoint coding. There are many ways of trouble-shooting, one of which is to reduce the dimension of the problem, for instance limiting oneself to a single observation case. The gradient may pass the test if a coding error in the adjoint code creates only a relatively small error in the gradient, so it is important to keep testing the tangent linear and adjoint codes as explained above.
4.1 (c) Convergence checks
The minimization of the cost function is subject to several convergence checks. A trivial test of convergence is to check that the value of the cost function decreases at every iteration. This is actually a built-in feature of the descent algorithm used in the IFS. For quadratic minimization problems, the norm of the gradient of the cost function should also decrease at every iteration, apart from rounding errors. The cost function of the ECMWF assimilation system is non-quadratic, and therefore the norm of the gradient can locally be larger than in previous iterations when entering a new "valley" in the cost function topology. The gradient test described above is performed at the first and the last steps of every minimization. The user also receives a note from the minimization algorithm if the norm of the gradient has not been reduced by more than a predefined factor which depends on the number of iterations.
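The checks can be illustrated on a toy quadratic problem minimized by steepest descent (the IFS uses a more sophisticated descent algorithm; the factor 1e-6 below is an arbitrary stand-in for the predefined reduction factor):

```python
import numpy as np

# Toy quadratic cost J(x) = 0.5 x^T A x - b^T x, minimized by steepest
# descent with exact line search (a stand-in for the IFS algorithm).
A = np.diag([1.0, 2.0, 4.0])
b = np.array([1.0, 1.0, 1.0])

def J(x):    return 0.5 * x @ A @ x - b @ x
def grad(x): return A @ x - b

x = np.zeros(3)
g0 = np.linalg.norm(grad(x))
prev_J = J(x)
for it in range(50):
    g = grad(x)
    if np.linalg.norm(g) < 1e-14:
        break
    step = (g @ g) / (g @ A @ g)   # exact line search for a quadratic
    x = x - step * g
    # Check 1: the cost must decrease at every iteration.
    assert J(x) < prev_J + 1e-12
    prev_J = J(x)

# Check 2: note if the gradient norm was not reduced by a preset factor.
reduction = np.linalg.norm(grad(x)) / g0
print(f"gradient norm reduced by factor {reduction:.2e}")
if reduction > 1e-6:
    print("warning: insufficient gradient reduction")
```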
4.1 (d) Jo break-down and screening statistics
The observation term of the cost function describes the misfit of the model state to the observations, scaled by their relative accuracy; for an individual datum it is

Jo = (H(x) − y)² / σo²,

where H is the observation operator, y the observed value and σo the observation error. The expectation of this term before the minimization, i.e. evaluated for the background, is

E[Jo] = (σb² + σo²) / σo²,

where σb² is the background error variance in observation space, and it should always be greater than one. If the quality of the background and the observations is similar, the value should be around two. The observation term can be broken down into contributions from different observation types, areas and observed variables, and an average Jo contribution for each can be computed by dividing the cost function by the number of observations. A troublesome subset of observations will show up in this way.
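A sketch of such a break-down, with hypothetical departures and observation errors in place of a real observation database:

```python
import numpy as np

# Hypothetical departures (H(x) - y) and observation errors for two
# observation types; in practice these come from the assimilation's
# observation database.
obs = {
    "TEMP":  {"dep": np.array([1.2, -0.8, 0.5, 2.1]), "sigma_o": 1.0},
    "AIREP": {"dep": np.array([0.3, -0.2, 0.4]),      "sigma_o": 0.5},
}

# Jo for one datum is (H(x) - y)^2 / sigma_o^2; the break-down divides
# the summed Jo per observation type by the number of observations.
for name, d in obs.items():
    jo = np.sum((d["dep"] / d["sigma_o"])**2)
    print(f"{name}: Jo/n = {jo / len(d['dep']):.2f}  (n = {len(d['dep'])})")
```

An observation type whose average Jo/n is far above two would stand out as a candidate for closer inspection.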
The printout of screening statistics comprises tables of the number of observations rejected (and for which reason) and the number used in the assimilation, and reveals for instance if an observation type is missing. This diagnostic printout, as well as the Jo break-down, is produced by default in the IFS, and together they tell reliably how the observations have been used in the assimilation.
4.2 Diagnostics in research experimentation
In research experimentation at ECMWF, a common wish for new experiments is that a comparison is available, either with operational products or with another experiment.
4.2 (a) Forecast scores
Modifications to the operational assimilation system are usually justified by positive or neutral forecast scores (defined by anomaly correlation) as compared with the operational scores. A common practice is to perform one or several two-week assimilation experiments in order to see objectively the effect of the changes in the assimilation or the forecast model. Often the experiments are run for different seasons as well. For major changes in the operational suite, a separate e-suite is also run parallel to operations to ensure the quality of the products and a smooth transition to the revised system.
Figure 4 gives an example of the forecast scores in a typical two-week pre-implementation experiment. In this case hourly observation screening is tested in 4D-Var, i.e. more observations from frequently reporting stations are allowed into the assimilation (dotted line). The forecast scores for the Northern Hemisphere are comparable with the 4D-Var experiment using six-hourly observation screening (dashed) and better than 3D-Var (solid), but for the Southern Hemisphere the hourly screening is clearly a bad option for 4D-Var. Based on these experiments it was decided to continue 4D-Var experimentation using the six-hourly screening of observations (or 3D-screening), and to investigate the reasons behind the bad performance in the Southern Hemisphere.
Figure 4 . An example of the forecast scores in a two-week assimilation experiment for the Northern Hemisphere (top panel) and the Southern Hemisphere (bottom panel) for geopotential height at 1000hPa. The solid line is for 3D-Var, the dashed line for 4D-Var using the same observations as 3D-Var (3D-screening) and the dotted line for 4D-Var using extra surface observations from frequently reporting stations (4D-screening).
4.2 (b) Observation r.m.s. fit and histograms
The fit of the observations to the background and analysis can be conveniently examined with r.m.s. plots and histograms, which are automatically generated for each assimilation experiment. An example of the r.m.s. plot for airep wind and temperature observations used in an assimilation experiment is given in Fig. 5 . One can see that the r.m.s. difference is smaller for the analysis departures (dotted lines) than for the background departures (solid lines) - the analysis is said "to have drawn to the data". The biases are also displayed, and they have generally been reduced in the assimilation. Note that in these plots a desirable feature is a small r.m.s. of the background departures. This value is generally smaller, for instance, in 4D-Var than in 3D-Var, indicating improved accuracy of the 4D-Var assimilation compared with 3D-Var. A small r.m.s. of the analysis departures is, however, not a design criterion as such. One could, for instance, specify too small observation errors, which would result in an unrealistically small r.m.s. of the analysis departures but might deteriorate the subsequent short-range forecast, i.e. the r.m.s. of the background departures would increase.
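A minimal sketch of the statistics behind such a plot, using synthetic departure samples in place of real data (the means and spreads below are invented for illustration):

```python
import numpy as np

# Synthetic departures: obs-minus-background and obs-minus-analysis.
rng = np.random.default_rng(3)
bg_dep = rng.normal(loc=0.2, scale=1.0, size=1000)   # background departures
an_dep = rng.normal(loc=0.05, scale=0.6, size=1000)  # analysis departures

def rms(d):  return float(np.sqrt(np.mean(d**2)))
def bias(d): return float(np.mean(d))

print(f"background: rms = {rms(bg_dep):.3f}  bias = {bias(bg_dep):+.3f}")
print(f"analysis:   rms = {rms(an_dep):.3f}  bias = {bias(an_dep):+.3f}")
```

The analysis has "drawn to the data" when its r.m.s. departure is smaller than that of the background, as here.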
A similar diagnostic plot is the histogram of departures, which is usually plotted for single-level observations, such as synop or dribu reports. Figure 6 gives an example of a histogram for satob (or cloud track) wind observations. Both the background and analysis departures are displayed. One can note that the mean and standard deviation of the departure distribution are smaller after the assimilation, which means that information has been extracted from the observations. The distribution of background departures should be approximately Gaussian with a mean near zero.
Figure 5 . An example of an r.m.s. plot for airep wind and temperature observations. r.m.s. on the left and bias on the right, and number of observations used in the assimilation in the middle. Solid line is for background departures and dotted for analysis departures.
4.2 (c) Mean and r.m.s. of analysis increments
The analysis increments can be reconstructed after the assimilation by subtracting the background from the analysis. The mean and r.m.s. of these increment fields can reveal a lot about the performance of the assimilation system. First, large mean increments may result from using biased observations, for instance due to an incorrect bias correction. They may also be a sign of an unsuccessful model change which has introduced a model bias, possibly appearing only locally. For instance, an albedo change over snow-covered areas may cause a bias to appear in the background which the unbiased observations try to correct. Second, the r.m.s. of the analysis increments should be small, which is a sign of consistency between the short-range forecast and the observations.
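A sketch of this diagnostic with hypothetical gridded fields (the field dimensions and values are invented for illustration):

```python
import numpy as np

# Hypothetical analysis and background fields over a sequence of cycles.
rng = np.random.default_rng(4)
ncycles, ny, nx = 28, 12, 24
background = rng.standard_normal((ncycles, ny, nx))
analysis = background + 0.1 * rng.standard_normal((ncycles, ny, nx)) + 0.05

increments = analysis - background               # one field per cycle
mean_inc = increments.mean(axis=0)               # large values hint at biases
rms_inc = np.sqrt((increments**2).mean(axis=0))  # should stay small

print(f"max |mean increment| = {np.abs(mean_inc).max():.3f}")
print(f"max rms increment    = {rms_inc.max():.3f}")
```

In this synthetic example the constant 0.05 plays the role of a bias between background and observations, and it survives in the mean increment field.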
Figure 6 . An example of the histogram of the satob wind (v-component) fit to the analysis (top panel) and background (bottom panel).
When 4D-Var was about to be implemented at ECMWF, one of the strong points in favour of the implementation was the smaller analysis increments in 4D-Var compared with 3D-Var. Later, when a modification of 4D-Var to use more observations from frequently reporting stations by applying serial correlation of observation errors was discussed, one argument for the implementation was the further reduced analysis increments (Fig. 7 ), for instance over the North Atlantic. The impact of adding more observations can be revealed simply by comparing the difference between the analyses from the two assimilation systems in the r.m.s. sense (Fig. 8 ). The largest impact is, as expected, over areas where the conventional observation coverage is not very dense, and over areas where the atmospheric flow tends to be more unstable, such as the storm track areas.
Figure 7 . The improvement of the consistency of the background field with observations when using 4D- screening (plus serial observation error correlation plus joint variational quality control). The quantity is the 1000hPa geopotential difference between r.m.s. of analysis increments in the experiment and its control, for period 11 to 24 December 1997. Contours are +/-0.1, +/-0.25 and +/-0.50 decametres. Green (orange) areas denote smaller (larger) analysis increments in the experiment than in its control.
Figure 8 . The impact on analyses of applying 4D-screening (plus serial observation error correlation plus joint variational quality control). The quantity is the 1000hPa geopotential r.m.s. of analysis differences between the experiment and its control, for the period 11 to 24 December 1997. The contours are 0.35, 0.50, 0.75, 1.00, 1.50, 2.00 and 3.00 decametres. The largest impact is over the areas of sparse conventional observational coverage.
4.3 Operational monitoring
4.3 (a) Cross-validation with satellite products
The operational department at ECMWF constantly monitors the quality of the operational production, e.g. the use and quality of observations, their availability, the character of the analysis increments, etc. Many of the suggestions for improving the assimilation system actually come from the results of this intense monitoring. More details of these activities are given in the appropriate Training course module. One method used by both operations and research is cross-validation with satellite products. There are some parameters for which direct (in situ) observations are scarce, such as clouds or the position of a tropical storm, and for those a visual comparison with satellite products can be very useful.
4.3 (b) Back-tracking problems with sensitivity products
A frequently occurring situation in weather forecasting is an unpredicted small-scale flow pattern, followed by the question of why it was not predicted. In these cases error back-tracking has long been used (even with subjective forecasts). The adjoint model provides one extra tool for doing the back-tracking. Sensitivity to analysis "errors" can be calculated using the adjoint model in the following way. The two-day forecast error is fed to the adjoint model as a forcing, and the adjoint calculations result in a gradient, or sensitivity pattern, with respect to the initial condition. This sensitivity pattern tells where, and in which direction, the initial condition should be perturbed in order to achieve a smaller two-day forecast error. Of course, the two-day forecast error is not entirely due to an inaccurate initial condition but also due to the model error over the two-day integration time. Nevertheless, the sensitivity pattern can give the analyst a useful clue about where the reason for the forecast failure may be found. This method has been used successfully at ECMWF.
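For a linear toy model the calculation reduces to a matrix transpose; the sketch below (all values hypothetical) shows the forecast error being fed backwards through the adjoint and the resulting sensitivity perturbation reducing the error:

```python
import numpy as np

# For a linear model x_t = M x_0, the gradient of the forecast error norm
# J = 0.5 ||x_t - x_ver||^2 with respect to the initial condition is
# M^T (x_t - x_ver): the forecast error fed backwards through the adjoint.
rng = np.random.default_rng(5)
n = 6
Mmat = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # toy "two-day" propagator

x0 = rng.standard_normal(n)     # initial condition (analysis)
x_ver = rng.standard_normal(n)  # verifying analysis two days later

err = Mmat @ x0 - x_ver         # two-day forecast error
sensitivity = Mmat.T @ err      # adjoint run forced by the error

# Perturbing the initial condition against the sensitivity gradient
# should reduce the forecast error.
x0_new = x0 - 0.1 * sensitivity
print(np.linalg.norm(Mmat @ x0_new - x_ver), np.linalg.norm(err))
```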
4.4 Estimation and tuning
4.4 (a) Observation and background errors
The specification of observation and background error covariances for the assimilation system is an essential step which determines the relative weight of the observations and the background, respectively. These statistics are not known exactly but are estimated for each assimilation system. Therefore, as the observing network or the assimilation system changes, the statistics may require tuning for optimal performance.
There is a reliable method (the Hollingsworth-Lönnberg method) for observation and background error estimation over data-rich areas (as explained elsewhere in the Lecture Notes). An example of the behaviour of background error covariances is given in Fig. 9 for airep temperature observations over North America at 200hPa. The background departures are correlated at short distances, and the correlation decreases rapidly with increasing distance. At distances over about 500km there is hardly any correlation left. The estimation method assumes that the observation errors are not correlated between stations. This enables partitioning the perceived short-range forecast error variance into contributions from the observation and background errors. A curve is fitted (dashed line in Fig. 9 ) to the binned covariance values (filled circles in Fig. 9 ), and the intercept of the fitted curve with the ordinate gives an estimate of the background error variance, the rest of the perceived short-range forecast error variance being attributed to the observation error.
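A sketch of the partitioning with synthetic binned covariances constructed to reproduce the numbers quoted in the caption of Fig. 9 (the exponential correlation shape and 300 km length scale are assumptions for illustration, not the shape used operationally):

```python
import numpy as np

# Synthetic binned background-departure covariances as a function of
# station separation, built from assumed "true" values sigma_b^2 = 0.13 K^2
# and a 300 km correlation length (no sampling noise, for clarity).
dist = np.arange(50.0, 1501.0, 100.0)   # bin centres (km)
cov = 0.13 * np.exp(-dist / 300.0)      # binned covariances (K^2)

# Fit c(r) = v_b * exp(-r / L) by least squares in log space and
# extrapolate to zero separation.
coeffs = np.polyfit(dist, np.log(cov), 1)
L_fit = -1.0 / coeffs[0]                # fitted correlation length (km)
vb_fit = np.exp(coeffs[1])              # intercept: background error variance

total = 1.03                            # perceived total variance (K^2)
sigma_b = np.sqrt(vb_fit)
sigma_o = np.sqrt(total - vb_fit)
print(f"background error ~ {sigma_b:.2f} K, observation error ~ {sigma_o:.2f} K")
```

With the assumed values the partitioning recovers the background error of about 0.36 K and observation error of about 0.95 K quoted for Fig. 9.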
4.4 (b) Verification of structure functions
The structure functions are specified from a sample of short-range forecast differences (24-hour minus 48-hour forecast differences in the NMC method). The Hollingsworth-Lönnberg method is not meant for re-tuning or changing them, but it can be used to verify how well the shape of the specified structure functions is supported by the covariance of background departures. An example of a specified structure function is given in Fig. 10 for temperature at model level 10 (about 200hPa) at mid-latitudes. Comparing Figs. 9 and 10 reveals that the horizontal structure of the short-range forecast error as estimated from airep observation departures (calculated at resolution T213) is sharper than that of the modelled structure function at truncation TL159. The difference is partly explained by the resolution. More importantly, the modelled structure function is a global one dominated by Southern Hemisphere mid-latitudes, whereas the estimate is from North America, where a very dense data coverage tends to shorten the horizontal scale of the short-range forecast error.
Figure 9 . An example of background error covariance for airep (acar) temperature observations in 4D-Var over the period 1 September - 14 October 1997 over North America at 200hPa. In this case, the estimated background error variance at zero distance is about 0.13 K², which would indicate a background error of about 0.36 K. As the total perceived error variance is 1.03 K² (not shown), the estimated observation error is therefore 0.95 K.
Figure 10 . The specified structure function for temperature at latitude 50°N. Note that the horizontal scale of the abscissa is different from Fig. 9 .
Training Course Notes Front Page >>