|
|
Training
Course Notes Front Page >>
Table of contents >>
Next Section >>
Previous Section >>
APPENDIX B. Practical adjoint coding
As explained previously, coding the adjoint is mostly a
problem of coding a transpose. Assuming a linear operator is available as
a piece of code, called direct code, there are two approaches to
implement the code for the adjoint operator. One is to take the operator
as a whole, store its matrix (e.g. by taking the image of each canonical
basis vector; the matrix of a tangent linear operator is called the Jacobian
matrix and its coefficients are partial derivatives of the output with respect
to the input) and code the multiplication by its transpose, which is only
feasible if the matrix can be evaluated and stored at a reasonable price.
The other, more common approach, is to use the rule above
for taking the adjoint of a sequence of operators, and to apply it to each
elementary step of the direct code, called "model" here to fix ideas. Most
of the time there is a piece of adjoint to code for each (or almost) active
instruction of the direct code, considered as elementary linear operators,
each in its little subspace. The concept of `subspace' of a piece of code
is justified by the fact that most components of the state are not modified
by it, so that the corresponding operator is a block-diagonal matrix with
just a little block spanning the variables that are actually used on input
and modified on output:
| |
|
|
From a coding point of view, it is only necessary to code
the action performed by , the other variables are kept unchanged
anyway. This allows one to work locally, by following a few simple rules:
| |
• the adjoint of a sequence
of operations is the reverse sequence of the transposes of each operation. |
| |
• the scalar products need
to be considered only at the beginning and at the end of the code
that is being adjointed (unless one wants to use some special properties
of pieces of code with respect to particular products, like the unitary
character of Fourier transforms with respect to the norm). |
| |
• the input to a piece
of code (e.g. a subroutine) becomes the output of the corresponding
adjoint code, and vice versa. Care must be taken when specifying the
interfaces between subroutines, so what is input and what is output
at each stage must be clear. It means that the adjoint coding is much
easier if good programming principles have been respected in the direct
code to start with, such as modularity, consistent variable naming
and interface control. |
| |
• it is recommended to
use the same variable names for matching direct (i.e. tangent linear)
and adjoint model states, in order to be able to reuse the direct
code for array dimensioning and self- adjoint operations. |
| |
• the actual coding of
the adjoint is performed at the smallest possible level of active
subsets of code (one active instruction, or a small number of instructions
that clearly depict an explicit linear operator) that must each be
a linear operator with known coefficients. Its adjoint is the transpose
operator, taken in the relevant space, which implies the following
items. |
| |
• Each modified variable
is a part of the input space unless this subset of code is the
first time it is used in the whole direct code, i.e. it is being
"defined" at this stage. |
| |
• Each input variable is
a part of the output space unless this subset of code is the last
time it is used in the whole direct code, i.e. it is being "undefined"
at this stage. |
| |
• The adjoint of a variable
"undefinition", i.e. the end of its scope, is its setting to zero. |
| |
• For code robustness,
it is advised to consider that no variable is being undefined anywhere
except at the end of code units like subroutines where they must all
be pre-initialized to zero, so that each adjoint operation will be
written as the addition of something to a variable. |
The last items deserve some illustration. When a new variable
starts to be used at some point in the code, (e.g. an array is allocated,
or a variable is initialized for the first time) we go from a space e.g.
to a bigger space, e.g. . Hence in the adjoint we go from to , which is a projection operator, and is "undefined" in the adjoint code, although no matching instruction
exists in a language like Fortran, so that no specific statement is needed
in the adjoint. The undefinition is usually performed when returning from
an adjoint subroutine. If is used later in the adjoint code, it must have been re-initialized.
When a new variable stops being used, we go from space
to , and this is usually implicit in the direct code
after the last instruction that uses . One
can consider that the definition of a local variable is lost when returning
from a subroutine. This inconspicuous operation in the direct code is mathematically
known as a canonical injection. Its matrix is obtained from the
direct code matrix, which is a projection:
| |
|
|
(So that the transpose operator1 reads,
using the same variable letters (although they do not necessarily have the
same values as in the direct operation):
or, in Fortran:
If this instruction is forgotten it will result in a badly
initialized b variable, with possibly erroneous results if the same variable
name is used for other computations before.
Hence the adjoint of even a simple assignment a=b depends
on the scope of the variables. If the input space is and the output space is , the algebraic
direct operation is
so that the adjoint is trivially
and the adjoint code is b=a. If however may be
used later in the direct code, it is not being undefined, the output space
is and the algebraic direct operation a=b
is now
The adjoint is
| |
|
|
(and the adjoint code is b=b+a, which is quite different, because b is now
both in the output and in the output of the direct code2. If b is used later in the direct code, it will
already contain something which will be used when doing b=b+a in the adjoint
code. Physically speaking, it means that the sensitivity of the output to
(which
is what the adjoint variable b contains) is the sum of the sensitivities
to in all operations that read the value
of b in the direct code.
If one codes b=b+a although b is not used later in the
code, b is still correctly initialized in the adjoint because the adjoint
of its eventual undefinition is b=0 which will be placed before. It can
be difficult to remember in a large code where each variable is used for
the last time. Variable undefinition is usually easy to spot because it
is always at the end of program sections (subroutines) or at variable de-allocation.
If the interface between program sections is clearly documented, this makes
it easy to pre-initialize the adjoint variables to zero at the right place.
Hence the best adjoint programming rule is to always assume
that a variable is being used later, and to set all adjoint code variables
to zero when they are defined.
For instance, the adjoint of a typical line of code like
a=s*b+t*c (the * is the multiplication, s and t are constants) is
|
a=0 ! when a is first defined
in the adjoint code |
|
b=0 ! when b is first defined
in the adjoint code |
|
c=0 ! when c is first defined
in the adjoint code |
If any of the a,b,c variables are defined as input arguments
to a subroutine in the adjoint, then of course their initial value is defined
outside and they should retain their input value.
However, there is no problem of undefinition of a in a
statement like a=s*a+t*c which has the adjoint
|
c=0 ! when c is first defined
in the adjoint code |
because a is both in the input and output spaces. Note that
a conditional like this in the direct code:
defines a non-linear function of a which is not licit. The
problem is in the linearization, not in taking the adjoint.
Most problems with writing adjoint codes are with in the
handling of the trajectory (the linearization coefficients that appear when
taking the differential of a non-linear operator), because the adjoint requires
these values in an order that is the reverse of their computation order.
They need to be stored, or recomputed on the fly, which is usually a matter
of compromising between storage space (or disk I/O) and CPU time, to assess
on a case-to-case basis.
Training Course Notes Front Page >>
Table of contents >>
Next Section >>
Previous Section >>
1 Sometimes called the adjoint of
identity.
2 Whether a is part of the input space in the
direct code is not important, because it is being overwritten. In the adjoint,
putting explicitly a in both input and output spaces would simply result
in the additional useless line of adjoint code: a=a. One should worry more
about the scope of input variables than about output variables when examining
the direct code.
|