|
|
Training
Course Notes Front Page >>
Table of contents >>
Next Section >>
Previous Section >>
12 . Implementation techniques
In most practical applications, numerical cost is an
important issue. As shown above, there is a variety of analysis methods
available. It does not imply that any of these is the best; they should
be regarded as a choice of several compromises between numerical cost,
statistical optimality and physical realism of the assimilation system.
The sections below describe other features of the analysis algorithms
which can be used to further cut down on the numerical cost, without sacrificing
too much on the sophistication of the analysis method itself. They are
discussed here in the framework of 3D-Var
1, but they can be applied equally well
(with a few adaptations) to all related algorithms: 1D-Var, 4D-Var, PSAS
or the Kalman filter.
12.1 Minimization algorithms and preconditioning
In a variational analysis system a cost function has
to be minimized, usually using an iterative descent algorithm. The cost
of the analysis is proportional to the number of evaluations of the cost
function and its gradient
2, called the number of simulations.
When the state itself is updated,
an iteration is performed. Each iteration may require one or more
simulations, depending on the minimizing algorithm used. Hence the technical
implementation of a variational analysis can be summarized as a simulator
operator:
| |
|
|
How to use the simulator to minimize the cost function
is a well-developed area of mathematics (called optimization,
a part of numerical analysis). With the analysis methods described
above, the cost function will be a scalar function of a real vector in a
Euclidean space; in most applications it will be quadratic and will be unconstrained.
There are several ready-to-use algorithms that do the minimization, called
minimizers. An obvious method, the steepest descent method,
is to update by adding a correction that is proportional to . This is usually not very efficient, and more popular algorithms
are the conjugate gradient and quasi-Newton methods.
They are still being improved. There are more specialized algorithms for
situations where is not
quadratic or is bounded, e.g. simulated annealing
or the simplex, although such methods can be very expensive. The
incremental method described below can also be regarded as a particular
minimizer. A detailed description of the main minimizing algorithms can
be found in dedicated mathematical books. Among the important theoretical
results are the optimality properties of the conjugate gradient method in
the case of an exactly quadratic cost function, and its equivalence with
a Lanczos method for determining eigenvectors of the Hessian matrix. Also,
the quasi-Newton methods can be regarded as a preconditioning of the cost
function using accumulated information about the second derivatives.
The main aspect of that affects
the performance of conventional minimizers (assuming is quadratic or almost) is its condition number.
This quantity measures the ellipticity of the iso-surfaces of J,
and it describes the difficulty of minimization problem (or ill-conditioning)
due to the gradient not pointing accurately toward the minimum (Fig. 15
). In this case minimizers have trouble converging, a phenomenon called
the narrow valley effect.
Figure 15 . Illustration of the so-called narrow
valley effect: in a plane of the control variable space where the convexity
of the cost-function depends a lot on direction, the isolines are narrow
ellipses, and in most places the gradient of the cost function is nearly
orthogonal to the direction of the minimum , which means that minimization algorithms
will tend to waste many iterations zigzagging slowly towards the optimum.
Condition number. The condition number
of is defined to be the ratio between the largest and
the smallest eigenvalue of . The larger
the number, the more ill-conditioned the problem is.
If the condition number is equal to one, i.e. is proportional to , the cost function is said to be spherical
and the minimum can be found in one iteration because points directly toward the minimum.
In the general case, is elliptic,
but it is possible to define a change of minimization space called preconditioning
that decreases the condition number. The idea is to present the minimizer
with a problem that is not the minimization of , but another
easier problem from which can be
obtained easily. The mapping between both problems is defined as follows
using a preconditioner operator :
12.2 Theorem: preconditioning of a variational analysis
The proof is left as an exercise. In 3D-Var, a simple and
efficient preconditioner is the symmetric left-hand square root of , i.e. a matrix3 L
such that . In this case
one can show that
i.e. the term is now the canonical inner product. An ideal
preconditioner would of course be provided by the symmetric square root
of the Hessian matrix. Some sophisticated minimizer packages allow the user
to provide his own preconditioner to the code, which can take the form of
a clever specification of the inner product.
Ref: Gilbert
and Lemaréchal 1989.
12.3 The incremental method
The incremental method is a relatively empirical technique
designed to reduce the cost of solving a predefined variational problem,
e.g. by reducing the resolution of the increments.
In the introduction it was explained how the control variable
could be made smaller than the model state by requiring that the increments
can only be non-zero in a subspace of the model. In this case there is no
guarantee that the analysis verifies any optimality condition in the full
model space. For instance, OI solves the problem separately in a set of
subspaces (defined by the observation selection), but the result is not
as optimal as a global least-squares analysis. With 3D- or 4D-Var is it
usually not affordable to solve the variational problem at the full model
resolution. However, it is expected that most of the complexity of the analysis
is in the synoptic scales, because this is where most background errors
are expected to be. If the increments are right at the synoptic scales,
then one can expect the smaller scales to be more or less forced to be realistic
features by the model dynamics. It is undesirable, though, to completely
neglect the small scales in the analysis procedure because they are important
in the comparison of the observations with the background state. In other
words, one is looking for a low-resolution correction to a high-resolution
background. The incremental method described below has been designed for
this particular problem. Mathematically, it can be thought the approximation
of a large problem by a sequence of smaller problems. However, there is
no proof of the convergence of the general procedure4.
In the incremental method some high-resolution versions
of the cost function, the observation operator and the model state are considered,
denoted respectively . We are trying
to minimize . One or several successive approximations
to this problem are solved successfully. Each one is an inner loop
that tries to update a high-resolution state into another one that is more
optimal (in the first update, ). The
input information to the inner loop is given by the high-resolution departures:
and by a low-resolution version of defined by a conversion operator :
It is natural to linearize the low-resolution observation
operator in the vicinity of which is the best currently available estimate of the analysis5, which yields a linearized observation
operator that depends
on the update index , and
defined as the differential of in the
vicinity of :
However, for consistency with the high-resolution problem,
one also requires that the low-resolution is kept consistent with the high-resolution
one for , so that
the linearized departures used at low resolution will be calculated as
so that the low-resolution cost-function to minimize in the inner loop is
which is exactly quadratic. Its minimum is which can in
turn be used to update the high-resolution state using a (possibly nonlinear)
ad hoc conversion operator :
which ensures that the high-resolution state is not modified if the inner
loop minimization does not change the state. From the new high-resolution departures can be calculated
and used to define the next low-resolution problem. If then the high- and low-resolution problems are fully
consistent with each other and the whole algorithm has converged. However,
it not guaranteed that there is a convergence at all. This is why one must
be careful about the physical implications of changing the resolution. The
intuitively important assumption for convergence (this can be proven in
simplified systems) is that
| |
|
|
i.e. the changes in the model equivalents of the observations should be
similar at high and low resolutions. If for instance they are of opposite
signs, one can expect the model state at high resolution to go away
from the observations during the procedure until it is stopped by the term-a
not very desirable behaviour. Whether this is a genuine problem is still
an area of research. History has shown so far that 3D-Var with a simple
incremental formulation and a rather low resolution of the inner loops can
be much better than an OI algorithm at full resolution, for a similar numerical
cost.
Ref: Courtier et al.
1994.
12.4 The adjoint technique
As shown in the explanation of the 4D-Var method, some
computational savings can be achieved by a suitable ordering of the algebraic
operations, in order to reduce the size and number of the matrix multiplications
involved. For minimization problems in particular, when the derivative of
a scalar function with respect to a large vector needs to be evaluated (e.g.
Jo), it is advantageous to use the chain rule backwards,
i.e. from the scalar function to the input vector. Algebraically this means
replacing a set of matrices by their transposes, hence the name of adjoint
technique. The definition of the adjoint depends on the scalar products6 used:
Important remarks on the adjoints
| |
• Riesz theorem:
The adjoint always exists and it is unique, assuming spaces of finite
dimension7. Hence, coding the adjoint does not raise questions
about its existence, only questions of technical implementation. |
| |
• In the meteorological
literature, the term adjoint is often improperly used to
denote the adjoint of the tangent linear of a non-linear operator.
One must be aware that discussions about the "existence of the adjoint"
usually address the existence of the tangent linear operator (or the
acceptability of using the adjoint of an improper tangent-linear in
order to minimize a 4D-Var cost-function). As explained above, the
adjoint itself always exist. |
| |
• In general, the adjoint
depends on the definition of spaces and . For instance, a canonical injection (i.e. with being a subspace of ) is not necessarily self-adjoint although does not involve any arithmetic operation. |
| |
• In general, the adjoint
depends on the choice of scalar products, even if . For instance, a symmetric matrix may not be self-adjoint if
the scalar product is not the canonical product (see below). |
The proof is obvious from the definition of :
and noting that E is invertible. In most practical cases (such
as in the rest of this paper) the implicit scalar product used is the canonical
inner product8, so that the transpose is the adjoint: . However, one must take care whenever another scalar product is
used, because it has implications on the coding of the adjoint: the scalar
product coefficients or their inverses must be used according to the above
equation.
Adjoint of a sequence of operators. Like
the transpose, the adjoint of a product of operators is the product of the
adjoints in the reverse order. The scalar product matrices cancel out each
other, so that if is a sequence of operators, its adjoint is
which shows that, even if the scalar products are not the canonical inner
product, in most of the adjoint coding it can be considered that the
adjoint is the transpose. The guidelines for practical adjoint
coding are detailed in an appendix.
Ref: Errico
and Vukicevic 1992.
Training Course Notes Front Page >>
Table of contents >>
Next Section >>
Previous Section >>
1 This reflects history. The main step in meteorological
data assimilation methods was the move from OI to 3D-Var. It was a major
technical challenge in terms of coding and numerical cost at the time, which
required some major developments in the fields of adjoint coding, formulation
of the incremental technique and design of the preconditioner.
2 Some minimization algorithms also use information
about the second derivative of the cost function, which requires the coding
of the second-order adjoint of its components.
3 The symmetric square root is not unique, it
is defined modulo an orthogonal matrix.
4 It is possible to guarantee convergence for
some special forms of the incremental algorithm.
5 One would rather like to use a low-resolution
version of the linearized high-resolution in the vicinity of but it would
be more expensive than the technique described here.
6 or: inner products
7 It is actually true for all continuous operators
in Hilbert spaces, but this is outside the scope of this paper.
8 or: inner dot product, or: Euclidean
product.
|