Adjoint state method

The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and more recently in neural networks.[2]

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]

Adjoint state techniques allow the use of integration by parts, resulting in a form that explicitly contains the physically interesting quantity. An adjoint state equation is introduced, together with a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters as a constrained optimization problem. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is sought. The adjoint method is derived from the dual problem[4] and is used e.g. in the Landweber iteration method.[5]

The name adjoint state method refers to the dual form of the problem, where the adjoint matrix [math]\displaystyle{ A^*=\overline A ^T }[/math] is used.

When the initial problem consists of calculating the product [math]\displaystyle{ s^T x }[/math], where [math]\displaystyle{ x }[/math] must satisfy [math]\displaystyle{ Ax=b }[/math], the dual problem can be realized as calculating the product [math]\displaystyle{ r^T b }[/math] ([math]\displaystyle{ = s^T x }[/math]), where [math]\displaystyle{ r }[/math] must satisfy [math]\displaystyle{ A^* r = s }[/math]. The vector [math]\displaystyle{ r }[/math] is called the adjoint state vector.
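
For a concrete illustration, the following minimal sketch (assuming NumPy; the matrix [math]\displaystyle{ A }[/math] and the vectors [math]\displaystyle{ b }[/math] and [math]\displaystyle{ s }[/math] are made-up test data, not part of the article) computes [math]\displaystyle{ s^T x }[/math] both directly and through the adjoint state [math]\displaystyle{ r }[/math]:

```python
# Minimal sketch of the adjoint identity s^T x = r^T b, assuming NumPy and
# made-up test data. Solving A^T r = s once lets us evaluate s^T x for many
# right-hand sides b without ever solving A x = b again.
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned real matrix
b = rng.standard_normal(n)
s = rng.standard_normal(n)

x = np.linalg.solve(A, b)    # direct state:  A x = b
r = np.linalg.solve(A.T, s)  # adjoint state: A^* r = s (A^* = A^T for real A)

print(s @ x, r @ b)          # the two products agree up to rounding
```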

General case

The original adjoint calculation method goes back to Jean Cea,[6] who used the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable [math]\displaystyle{ u \in \mathcal{U} }[/math] and an optimization variable [math]\displaystyle{ v\in\mathcal{V} }[/math], an objective functional [math]\displaystyle{ J:\mathcal{U}\times\mathcal{V}\to\mathbb{R} }[/math] is defined. The state variable [math]\displaystyle{ u }[/math] is often implicitly dependent on [math]\displaystyle{ v }[/math] through the (direct) state equation [math]\displaystyle{ D_v(u)=0 }[/math] (usually the weak form of a partial differential equation), thus the considered objective is [math]\displaystyle{ j(v) = J(u_v,v) }[/math]. Usually, one would be interested in calculating [math]\displaystyle{ \nabla j(v) }[/math] using the chain rule:

[math]\displaystyle{ \nabla j(v) = \nabla_v J(u_v,v) + \nabla_u J(u_v,v)\,\nabla_v u_v. }[/math]

Unfortunately, the term [math]\displaystyle{ \nabla_v u_v }[/math] is often very hard to obtain analytically, since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of [math]\displaystyle{ j }[/math], the problem

[math]\displaystyle{ \text{minimize}\ j(v) = J(u_v,v) }[/math]
[math]\displaystyle{ \text{subject to}\ D_v(u_v) = 0 }[/math]

has an associated Lagrangian functional [math]\displaystyle{ \mathcal{L}:\mathcal{U}\times\mathcal{V}\times\mathcal{U} \to \mathbb{R} }[/math] defined by

[math]\displaystyle{ \mathcal{L}(u,v,\lambda) = J(u,v) + \langle D_v(u),\lambda\rangle, }[/math]

where [math]\displaystyle{ \lambda\in\mathcal{U} }[/math] is a Lagrange multiplier, or adjoint state variable, and [math]\displaystyle{ \langle\cdot,\cdot\rangle }[/math] is an inner product on [math]\displaystyle{ \mathcal{U} }[/math]. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

[math]\displaystyle{ \begin{cases} d_u\mathcal{L}(u,v,\lambda;\delta_u) = d_uJ(u,v;\delta_u) + \langle \delta_u,D_v^*(\lambda)\rangle = 0 & \forall \delta_u \in \mathcal{U},\\ d_v\mathcal{L}(u,v,\lambda;\delta_v) = d_vJ(u,v;\delta_v) + \langle d_vD_v(u;\delta_v),\lambda\rangle = 0 & \forall \delta_v\in\mathcal{V},\\ d_\lambda\mathcal{L}(u,v,\lambda;\delta_\lambda) = \langle D_v(u), \delta_\lambda\rangle = 0 \quad & \forall \delta_\lambda \in \mathcal{U}, \end{cases} }[/math]

where [math]\displaystyle{ d_xF(x;\delta_x) }[/math] is the Gateaux derivative of [math]\displaystyle{ F }[/math] with respect to [math]\displaystyle{ x }[/math] in the direction [math]\displaystyle{ \delta_x }[/math]. The last equation is equivalent to [math]\displaystyle{ D_v(u) = 0 }[/math], the state equation, to which the solution is [math]\displaystyle{ u_v }[/math]. The first equation is the so-called adjoint state equation,

[math]\displaystyle{ \langle \delta_u,D_v^*(\lambda)\rangle = -d_uJ(u_v,v;\delta_u) \quad \forall \delta_u \in \mathcal{U}, }[/math]

because the operator involved is the adjoint operator of [math]\displaystyle{ D_v }[/math], [math]\displaystyle{ D_v^* }[/math]. Solving this equation yields the adjoint state [math]\displaystyle{ \lambda_v }[/math]. The gradient of the quantity of interest [math]\displaystyle{ j }[/math] with respect to [math]\displaystyle{ v }[/math] is [math]\displaystyle{ \langle \nabla j(v),\delta_v\rangle = d_v j(v;\delta_v) = d_v \mathcal{L}(u_v,v,\lambda_v;\delta_v) }[/math] (the second equation with [math]\displaystyle{ u=u_v }[/math] and [math]\displaystyle{ \lambda=\lambda_v }[/math]), so it can be identified by successively solving the direct and adjoint state equations. The process is even simpler when the operator [math]\displaystyle{ D_v }[/math] is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
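
As a minimal end-to-end sketch of this recipe, consider the made-up scalar state equation [math]\displaystyle{ D_v(u) = u^3 + u - v = 0 }[/math] with objective [math]\displaystyle{ J(u,v) = u^2 + v^2 }[/math] (a hypothetical toy problem, not taken from the article, chosen so that every step is explicit):

```python
# End-to-end adjoint recipe on a made-up scalar toy problem:
#   state equation  D_v(u) = u^3 + u - v = 0,   objective  J(u, v) = u^2 + v^2.
# Here d_u D = 3u^2 + 1 is scalar (hence self-adjoint), d_u J = 2u, d_v D = -1.

def solve_state(v, u0=0.0, tol=1e-12):
    """Newton's method for the direct state equation u^3 + u - v = 0."""
    u = u0
    for _ in range(100):
        step = (u**3 + u - v) / (3*u**2 + 1)
        u -= step
        if abs(step) < tol:
            break
    return u

def gradient(v):
    u = solve_state(v)            # 1. solve the direct state equation
    lam = -2*u / (3*u**2 + 1)     # 2. adjoint equation: (d_u D) lam = -d_u J
    return 2*v + lam * (-1.0)     # 3. d_v j = d_v J + <d_v D, lam>

# Finite-difference check that the adjoint gradient matches dj/dv
j = lambda w: solve_state(w)**2 + w**2
v, eps = 1.3, 1e-6
print(gradient(v), (j(v + eps) - j(v - eps)) / (2 * eps))  # nearly equal
```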

Example: Linear case

In a real, finite-dimensional linear setting, the objective function could be [math]\displaystyle{ J(u,v) = \langle Au , v \rangle }[/math], for [math]\displaystyle{ v\in\mathbb{R}^n }[/math], [math]\displaystyle{ u\in\mathbb{R}^m }[/math] and [math]\displaystyle{ A \in \mathbb{R}^{n\times m} }[/math], and let the state equation be [math]\displaystyle{ B_vu = b }[/math], with [math]\displaystyle{ B_v \in \mathbb{R}^{m\times m} }[/math] and [math]\displaystyle{ b\in \mathbb{R}^m }[/math].

The Lagrangian functional of the problem is [math]\displaystyle{ \mathcal{L}(u,v,\lambda) = \langle Au,v\rangle + \langle B_vu - b,\lambda\rangle }[/math], where [math]\displaystyle{ \lambda\in\mathbb{R}^m }[/math].

The derivative of [math]\displaystyle{ \mathcal{L} }[/math] with respect to [math]\displaystyle{ \lambda }[/math] yields the state equation as shown before, and the state variable is [math]\displaystyle{ u_v = B_v^{-1}b }[/math]. The derivative of [math]\displaystyle{ \mathcal{L} }[/math] with respect to [math]\displaystyle{ u }[/math] is equivalent to the adjoint equation, which is, for every [math]\displaystyle{ \delta_u \in \mathbb{R}^m }[/math],

[math]\displaystyle{ d_u[\langle B_v\cdot- b, \lambda\rangle](u;\delta_u)= - \langle A^\top v,\delta_u\rangle \iff \langle B_v \delta_u,\lambda\rangle = - \langle A^\top v,\delta_u\rangle \iff \langle B_v^\top \lambda + A^\top v,\delta_u\rangle = 0 \iff B_v^\top \lambda = - A^\top v. }[/math]

Thus, we can write symbolically [math]\displaystyle{ \lambda_v = -B_v^{-\top}A^\top v }[/math]. The gradient would be

[math]\displaystyle{ \langle \nabla j(v),\delta_v\rangle = \langle Au_v,\delta_v\rangle + \langle \nabla_vB_v : \lambda_v\otimes u_v,\delta_v\rangle, }[/math]

where [math]\displaystyle{ (\nabla_v B_v)_{ijk} = \frac{\partial (B_v)_{ij}}{\partial v_k} }[/math] is a third-order tensor, [math]\displaystyle{ \lambda_v \otimes u_v = \lambda_v u_v^\top }[/math] is the dyadic (outer) product between the adjoint and direct states, and [math]\displaystyle{ : }[/math] denotes a double tensor contraction over the indices [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math]. It is assumed that [math]\displaystyle{ B_v }[/math] has a known analytic expression that can be differentiated easily.
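
The following sketch checks this formula numerically, under the assumed affine parameterization [math]\displaystyle{ B_v = B_0 + \sum_k v_k C_k }[/math] (so that [math]\displaystyle{ \partial (B_v)_{ij}/\partial v_k = (C_k)_{ij} }[/math]); the matrices [math]\displaystyle{ B_0 }[/math], [math]\displaystyle{ C_k }[/math], [math]\displaystyle{ A }[/math] and the vector [math]\displaystyle{ b }[/math] are made-up test data, and NumPy is assumed:

```python
# Numerical check of the linear-case adjoint gradient, assuming NumPy and the
# made-up affine parameterization B(v) = B0 + sum_k v_k C_k.
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A  = rng.standard_normal((n, m))           # objective J(u, v) = <A u, v>
b  = rng.standard_normal(m)
B0 = rng.standard_normal((m, m)) + m * np.eye(m)
C  = 0.1 * rng.standard_normal((n, m, m))  # C[k] = dB/dv_k (third-order tensor)

B = lambda v: B0 + np.einsum('k,kij->ij', v, C)
j = lambda v: (A @ np.linalg.solve(B(v), b)) @ v   # j(v) = <A u_v, v>

v   = rng.standard_normal(n)
u   = np.linalg.solve(B(v), b)             # direct state:   B_v u = b
lam = np.linalg.solve(B(v).T, -A.T @ v)    # adjoint state:  B_v^T lam = -A^T v
grad = A @ u + np.einsum('kij,i,j->k', C, lam, u)  # A u_v + dB/dv : lam (x) u_v

# finite-difference check
eps = 1e-6
fd = np.array([(j(v + eps * np.eye(n)[k]) - j(v - eps * np.eye(n)[k])) / (2 * eps)
               for k in range(n)])
print(np.allclose(grad, fd, atol=1e-5))    # True
```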

Numerical consideration for the self-adjoint case

If the operator [math]\displaystyle{ B_v }[/math] were self-adjoint, [math]\displaystyle{ B_v = B_v^\top }[/math], the direct state equation and the adjoint state equation would have the same left-hand side. To avoid ever inverting a matrix, which is numerically expensive, an LU decomposition can be used instead to solve the state equation, in [math]\displaystyle{ O(m^3) }[/math] operations for the decomposition and [math]\displaystyle{ O(m^2) }[/math] operations for each resolution. The same decomposition can then be used to solve the adjoint state equation in only [math]\displaystyle{ O(m^2) }[/math] operations, since the two matrices coincide.
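
A sketch of this factorization reuse, assuming SciPy and made-up test data: scipy.linalg.lu_factor performs the one-time [math]\displaystyle{ O(m^3) }[/math] factorization and lu_solve performs each [math]\displaystyle{ O(m^2) }[/math] resolution. Note that lu_solve can also solve with the transposed factors (trans=1), which extends the same saving to the non-self-adjoint case.

```python
# One LU factorization (O(m^3)) serves both the direct and the adjoint solve
# (O(m^2) each), assuming SciPy and made-up test data.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
m = 200
B = rng.standard_normal((m, m)) + m * np.eye(m)
b = rng.standard_normal(m)          # right-hand side of the state equation
s = rng.standard_normal(m)          # stands in for -A^T v

lu, piv = lu_factor(B)                 # factor once: O(m^3)
u   = lu_solve((lu, piv), b)           # direct state   B u = b,      O(m^2)
lam = lu_solve((lu, piv), s, trans=1)  # adjoint state  B^T lam = s,  O(m^2)
```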

References

  1. Pollini, Nicolò; Lavan, Oren; Amir, Oded (2018-06-01). "Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers" (in en). Structural and Multidisciplinary Optimization 57 (6): 2273–2289. doi:10.1007/s00158-017-1858-2. ISSN 1615-1488. 
  2. Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David (2018). "Neural Ordinary Differential Equations". Advances in Neural Information Processing Systems 31. arXiv:1806.07366.
  3. Plessix, R.-E. (2006). "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications" (in en). Geophysical Journal International 167 (2): 495–503.
  4. McNamara, Antoine; Treuille, Adrien; Popović, Zoran; Stam, Jos (August 2004). "Fluid control using the adjoint method". ACM Transactions on Graphics 23 (3): 449–456. doi:10.1145/1015706.1015744. https://www.dgp.toronto.edu/public_user/stam/reality/Research/pdf/sig04.pdf. Retrieved 28 October 2022. 
  5. Lundvall, Johan (2007). "Data Assimilation in Fluid Dynamics using Adjoint Optimization" (in en). Sweden: Linköping University of Technology. http://liu.diva-portal.org/smash/get/diva2:24091/FULLTEXT01.pdf. 
  6. Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût" (in fr). ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique 20 (3): 371–402. doi:10.1051/m2an/1986200303711. http://www.numdam.org/item/M2AN_1986__20_3_371_0/. 

External links

  • A well-written explanation by Errico: What is an adjoint model?
  • Another well-written explanation with worked examples, written by Bradley
  • More technical explanation: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications
  • MIT course
  • MIT notes