Home page  
Home   Your Room   Login   Contact   Feedback   Site Map   Search:  
Discover this product  
About Us
Overview
Getting here
Committees
Products
Forecasts
Order Data
Order Software
Services
Computing
Archive
PrepIFS
Research
Modelling
Reanalysis
Seasonal
Publications
Newsletters
Manuals
Library
News&Events
Calendar
Employment
Open Tenders
   
Home > Newsevents > Training > Rcourse_notes > DATA_ASSIMILATION > ASSIM_CONCEPTS >  
   

Data assimilation concepts and methods
March 1999

By F. Bouttier and P. Courtier


1. Basic concepts in data assimilation
2. The state vector, control space and observations
3. The modelling of errors
4. Statistical interpolation with least-squares estimation
5. A simple scalar illustration of least-squares estimation
6. Models of error covariance
7. Optimal interpolation (OI) analysis
8. Three-dimensional variational analysis (3D-Var)
9. 1D-Var and other variational analysis systems
10. Four-dimensional variational assimilation (4D-Var)
11. Estimating the quality of the analysis
12. Implementation techniques
13. Dual formulation of 3D/4D-Var (PSAS)
14. The extended Kalman filter (EKF)
15. Conclusion
Appendix A. A primer on linear matrix algebra
Appendix B. Practical adjoint coding
Appendix C. Exercises
Appendix D. Main symbols
References
 
  Training Course Notes Front Page >>
Table of contents >>
Next Section >>
Previous Section >>





12 . Implementation techniques


In most practical applications, numerical cost is an important issue. As shown above, there is a variety of analysis methods available. It does not imply that any of these is the best; they should be regarded as a choice of several compromises between numerical cost, statistical optimality and physical realism of the assimilation system. The sections below describe other features of the analysis algorithms which can be used to further cut down on the numerical cost, without sacrificing too much on the sophistication of the analysis method itself. They are discussed here in the framework of 3D-Var

1, but they can be applied equally well (with a few adaptations) to all related algorithms: 1D-Var, 4D-Var, PSAS or the Kalman filter.



12.1 Minimization algorithms and preconditioning

In a variational analysis system a cost function has to be minimized, usually using an iterative descent algorithm. The cost of the analysis is proportional to the number of evaluations of the cost function and its gradient

2, called the number of simulations. When the state itself is updated, an iteration is performed. Each iteration may require one or more simulations, depending on the minimizing algorithm used. Hence the technical implementation of a variational analysis can be summarized as a simulator operator:



 


How to use the simulator to minimize the cost function is a well-developed area of mathematics (called optimization, a part of numerical analysis). With the analysis methods described above, the cost function will be a scalar function of a real vector in a Euclidean space; in most applications it will be quadratic and will be unconstrained. There are several ready-to-use algorithms that do the minimization, called minimizers. An obvious method, the steepest descent method, is to update by adding a correction that is proportional to . This is usually not very efficient, and more popular algorithms are the conjugate gradient and quasi-Newton methods. They are still being improved. There are more specialized algorithms for situations where is not quadratic or is bounded, e.g. simulated annealing or the simplex, although such methods can be very expensive. The incremental method described below can also be regarded as a particular minimizer. A detailed description of the main minimizing algorithms can be found in dedicated mathematical books. Among the important theoretical results are the optimality properties of the conjugate gradient method in the case of an exactly quadratic cost function, and its equivalence with a Lanczos method for determining eigenvectors of the Hessian matrix. Also, the quasi-Newton methods can be regarded as a preconditioning of the cost function using accumulated information about the second derivatives.

The main aspect of that affects the performance of conventional minimizers (assuming is quadratic or almost) is its condition number. This quantity measures the ellipticity of the iso-surfaces of J, and it describes the difficulty of minimization problem (or ill-conditioning) due to the gradient not pointing accurately toward the minimum (
Fig. 15 ). In this case minimizers have trouble converging, a phenomenon called the narrow valley effect.


Figure 15 . Illustration of the so-called narrow valley effect: in a plane of the control variable space where the convexity of the cost-function depends a lot on direction, the isolines are narrow ellipses, and in most places the gradient of the cost function is nearly orthogonal to the direction of the minimum , which means that minimization algorithms will tend to waste many iterations zigzagging slowly towards the optimum.

Condition number. The condition number of is defined to be the ratio between the largest and the smallest eigenvalue of . The larger the number, the more ill-conditioned the problem is.

If the condition number is equal to one, i.e. is proportional to , the cost function is said to be spherical and the minimum can be found in one iteration because points directly toward the minimum.

In the general case, is elliptic, but it is possible to define a change of minimization space called preconditioning that decreases the condition number. The idea is to present the minimizer with a problem that is not the minimization of , but another easier problem from which can be obtained easily. The mapping between both problems is defined as follows using a preconditioner operator :

12.2 Theorem: preconditioning of a variational analysis


If L is an invertible operator, an equivalent rewriting of the minimization problem:

 
,

  with the initial point ,
is the preconditioned problem:

 
,

  with the initial point .

The solution is given by .


The proof is left as an exercise. In 3D-Var, a simple and efficient preconditioner is the symmetric left-hand square root of , i.e. a matrix
3 L such that . In this case one can show that

 

i.e. the term is now the canonical inner product. An ideal preconditioner would of course be provided by the symmetric square root of the Hessian matrix. Some sophisticated minimizer packages allow the user to provide his own preconditioner to the code, which can take the form of a clever specification of the inner product.

Ref:
Gilbert and Lemaréchal 1989.

12.3 The incremental method

The incremental method is a relatively empirical technique designed to reduce the cost of solving a predefined variational problem, e.g. by reducing the resolution of the increments.

In the introduction it was explained how the control variable could be made smaller than the model state by requiring that the increments can only be non-zero in a subspace of the model. In this case there is no guarantee that the analysis verifies any optimality condition in the full model space. For instance, OI solves the problem separately in a set of subspaces (defined by the observation selection), but the result is not as optimal as a global least-squares analysis. With 3D- or 4D-Var is it usually not affordable to solve the variational problem at the full model resolution. However, it is expected that most of the complexity of the analysis is in the synoptic scales, because this is where most background errors are expected to be. If the increments are right at the synoptic scales, then one can expect the smaller scales to be more or less forced to be realistic features by the model dynamics. It is undesirable, though, to completely neglect the small scales in the analysis procedure because they are important in the comparison of the observations with the background state. In other words, one is looking for a low-resolution correction to a high-resolution background. The incremental method described below has been designed for this particular problem. Mathematically, it can be thought the approximation of a large problem by a sequence of smaller problems. However, there is no proof of the convergence of the general procedure
4.

In the incremental method some high-resolution versions of the cost function, the observation operator and the model state are considered, denoted respectively . We are trying to minimize . One or several successive approximations to this problem are solved successfully. Each one is an inner loop that tries to update a high-resolution state into another one that is more optimal (in the first update, ). The input information to the inner loop is given by the high-resolution departures:

 

and by a low-resolution version of defined by a conversion operator :

 


It is natural to linearize the low-resolution observation operator in the vicinity of which is the best currently available estimate of the analysis
5, which yields a linearized observation operator that depends on the update index , and defined as the differential of in the vicinity of :

 


However, for consistency with the high-resolution problem, one also requires that the low-resolution is kept consistent with the high-resolution one for , so that the linearized departures used at low resolution will be calculated as

 

so that the low-resolution cost-function to minimize in the inner loop is

 

which is exactly quadratic. Its minimum is which can in turn be used to update the high-resolution state using a (possibly nonlinear) ad hoc conversion operator :

 

which ensures that the high-resolution state is not modified if the inner loop minimization does not change the state. From the new high-resolution departures can be calculated and used to define the next low-resolution problem. If then the high- and low-resolution problems are fully consistent with each other and the whole algorithm has converged. However, it not guaranteed that there is a convergence at all. This is why one must be careful about the physical implications of changing the resolution. The intuitively important assumption for convergence (this can be proven in simplified systems) is that

 

i.e. the changes in the model equivalents of the observations should be similar at high and low resolutions. If for instance they are of opposite signs, one can expect the model state at high resolution to go away from the observations during the procedure until it is stopped by the term-a not very desirable behaviour. Whether this is a genuine problem is still an area of research. History has shown so far that 3D-Var with a simple incremental formulation and a rather low resolution of the inner loops can be much better than an OI algorithm at full resolution, for a similar numerical cost.

Ref:
Courtier et al. 1994.

12.4 The adjoint technique

As shown in the explanation of the 4D-Var method, some computational savings can be achieved by a suitable ordering of the algebraic operations, in order to reduce the size and number of the matrix multiplications involved. For minimization problems in particular, when the derivative of a scalar function with respect to a large vector needs to be evaluated (e.g. Jo), it is advantageous to use the chain rule backwards, i.e. from the scalar function to the input vector. Algebraically this means replacing a set of matrices by their transposes, hence the name of adjoint technique. The definition of the adjoint depends on the scalar products
6 used:


Adjoint operator. By definition, given a linear operator going from a space to a space , and scalar products , and in these respective spaces, the adjoint of is the linear operator such that for any vectors in the suitable spaces,

 



Important remarks on the adjoints
    Riesz theorem: The adjoint always exists and it is unique, assuming spaces of finite dimension7. Hence, coding the adjoint does not raise questions about its existence, only questions of technical implementation.
    In the meteorological literature, the term adjoint is often improperly used to denote the adjoint of the tangent linear of a non-linear operator. One must be aware that discussions about the "existence of the adjoint" usually address the existence of the tangent linear operator (or the acceptability of using the adjoint of an improper tangent-linear in order to minimize a 4D-Var cost-function). As explained above, the adjoint itself always exist.
    In general, the adjoint depends on the definition of spaces and . For instance, a canonical injection (i.e. with being a subspace of ) is not necessarily self-adjoint although does not involve any arithmetic operation.
    In general, the adjoint depends on the choice of scalar products, even if . For instance, a symmetric matrix may not be self-adjoint if the scalar product is not the canonical product (see below).


Theorem: adjoint and scalar product change. The operator being identified with its matrix, and the scalar products and being identified with their symmetric positive definite matrices and such that e.g. , the matrix of the adjoint of is

 



The proof is obvious from the definition of :

 

and noting that E is invertible. In most practical cases (such as in the rest of this paper) the implicit scalar product used is the canonical inner product8, so that the transpose is the adjoint: . However, one must take care whenever another scalar product is used, because it has implications on the coding of the adjoint: the scalar product coefficients or their inverses must be used according to the above equation.

Adjoint of a sequence of operators. Like the transpose, the adjoint of a product of operators is the product of the adjoints in the reverse order. The scalar product matrices cancel out each other, so that if is a sequence of operators, its adjoint is

 

which shows that, even if the scalar products are not the canonical inner product, in most of the adjoint coding it can be considered that the adjoint is the transpose. The guidelines for practical adjoint coding are detailed in an appendix.

Ref:
Errico and Vukicevic 1992.


Training Course Notes Front Page >>
Table of contents >>
Next Section >>
Previous Section >>




1 This reflects history. The main step in meteorological data assimilation methods was the move from OI to 3D-Var. It was a major technical challenge in terms of coding and numerical cost at the time, which required some major developments in the fields of adjoint coding, formulation of the incremental technique and design of the preconditioner.
2 Some minimization algorithms also use information about the second derivative of the cost function, which requires the coding of the second-order adjoint of its components.
3 The symmetric square root is not unique, it is defined modulo an orthogonal matrix.
4 It is possible to guarantee convergence for some special forms of the incremental algorithm.
5 One would rather like to use a low-resolution version of the linearized high-resolution in the vicinity of but it would be more expensive than the technique described here.
6 or: inner products
7 It is actually true for all continuous operators in Hilbert spaces, but this is outside the scope of this paper.
8 or: inner dot product, or: Euclidean product.



 

Top of page 03.12.2001
 
   Page Details         © ECMWF
shim shim shim