Augmented Lagrangian method

From HandWiki
Short description: Class of algorithms for solving constrained optimization problems

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.

Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an additional penalty term (the augmentation).

The method was originally known as the method of multipliers and was studied in the 1970s and 1980s as a potential alternative to penalty methods. It was first discussed by Magnus Hestenes[1] and then by Michael Powell in 1969.[2] The method was studied by R. Tyrrell Rockafellar in relation to Fenchel duality, particularly in relation to proximal-point methods, Moreau–Yosida regularization, and maximal monotone operators; these methods were used in structural optimization. The method was also studied by Dimitri Bertsekas, notably in his 1982 book,[3] together with extensions involving non-quadratic regularization functions (e.g., entropic regularization). This combined study gives rise to the "exponential method of multipliers" which handles inequality constraints with a twice-differentiable augmented Lagrangian function.

Since the 1970s, sequential quadratic programming (SQP) and interior point methods (IPM) have been given more attention, in part because they more easily use sparse matrix subroutines from numerical software libraries, and in part because IPMs possess proven complexity results via the theory of self-concordant functions. The augmented Lagrangian method was rejuvenated by the optimization systems LANCELOT, ALGENCAN[4][5] and AMPL, which allowed sparse matrix techniques to be used on seemingly dense but "partially-separable" problems. The method is still useful for some problems.[6]

Around 2007, there was a resurgence of augmented Lagrangian methods in fields such as total variation denoising and compressed sensing. In particular, a variant of the standard augmented Lagrangian method that uses partial updates (similar to the Gauss–Seidel method for solving linear equations) known as the alternating direction method of multipliers or ADMM gained some attention.

General method

Consider solving the following constrained optimization problem:

[math]\displaystyle{ \min f(\mathbf{x}) }[/math]

subject to

[math]\displaystyle{ c_i(\mathbf{x}) = 0 ~\forall i \in \mathcal{E}, }[/math]

where [math]\displaystyle{ \mathcal{E} }[/math] denotes the indices for equality constraints. This problem can be solved as a series of unconstrained minimization problems. For reference, we first list the kth step of the penalty method approach:

[math]\displaystyle{ \min \Phi_k (\mathbf{x}) = f (\mathbf{x}) + \mu_k ~ \sum_{i\in \mathcal{E}} ~ c_i(\mathbf{x})^2. }[/math]

The penalty method solves this problem, then at the next iteration it re-solves the problem using a larger value of [math]\displaystyle{ \mu_k }[/math] and using the old solution as the initial guess or "warm start".

The augmented Lagrangian method uses the following unconstrained objective:

[math]\displaystyle{ \min \Phi_k (\mathbf{x}) = f (\mathbf{x}) + \frac{\mu_k}{2} ~ \sum_{i\in \mathcal{E}} ~ c_i(\mathbf{x})^2 + \sum_{i\in \mathcal{E}} ~ \lambda_i c_i(\mathbf{x}) }[/math]

and after each iteration, in addition to updating [math]\displaystyle{ \mu_k }[/math], the variable [math]\displaystyle{ \lambda }[/math] is also updated according to the rule

[math]\displaystyle{ \lambda_i \leftarrow \lambda_i + \mu_k c_i(\mathbf{x}_k) }[/math]

where [math]\displaystyle{ \mathbf{x}_k }[/math] is the solution to the unconstrained problem at the kth step (i.e. [math]\displaystyle{ \mathbf{x}_k=\text{argmin} \Phi_k(\mathbf{x}) }[/math]).

The variable [math]\displaystyle{ \lambda }[/math] is an estimate of the Lagrange multiplier, and the accuracy of this estimate improves at every step. The major advantage of the method is that unlike the penalty method, it is not necessary to take [math]\displaystyle{ \mu \rightarrow \infty }[/math] in order to solve the original constrained problem. Because of the presence of the Lagrange multiplier term, [math]\displaystyle{ \mu }[/math] can stay much smaller, and thus avoiding ill-conditioning.[6] Nevertheless, it is common in practical implementations to project multipliers estimates in a large bounded set (safeguards) which avoids numerical instabilities and leads to strong theoretical convergence.[5]

The method can be extended to handle inequality constraints. For a discussion of practical improvements, see refs.[6][5]

Alternating direction method of multipliers

The alternating direction method of multipliers (ADMM) is a variant of the augmented Lagrangian scheme that uses partial updates for the dual variables. This method is often applied to solve problems such as,

[math]\displaystyle{ \min_x f(x) + g(x). }[/math]

This is equivalent to the constrained problem,

[math]\displaystyle{ \min_{x,y} f(x) + g(y), \quad \text{subject to}\quad x = y. }[/math]

Though this change may seem trivial, the problem can now be attacked using methods of constrained optimization (in particular, the augmented Lagrangian method), and the objective function is separable in x and y. The dual update requires solving a proximity function in x and y at the same time; the ADMM technique allows this problem to be solved approximately by first solving for x with y fixed and then solving for y with x fixed. Rather than iterate until convergence (like the Jacobi method), the algorithm proceeds directly to updating the dual variable and then repeating the process. This is not equivalent to the exact minimization, but the method still converges to the correct solution under some assumptions. Because of this approximation, the algorithm is distinct from the pure augmented Lagrangian method.

The ADMM can be viewed as an application of the Douglas-Rachford splitting algorithm, and the Douglas-Rachford algorithm is in turn an instance of the Proximal point algorithm; details can be found in ref.[7] There are several modern software packages, including YALL1[8] (2009), SpaRSA[9] (2009) and SALSA[10] (2009), which solve Basis pursuit and variants and use the ADMM. There are also packages which use the ADMM to solve more general problems, some of which can exploit multiple computing cores (e.g., SNAPVX[11] (2015), parADMM[12] (2016)).

Stochastic optimization

Stochastic optimization considers the problem of minimizing a loss function with access to noisy samples of the (gradient of the) function. The goal is to have an estimate of the optimal parameter (minimizer) per new sample. With some modifications, ADMM can be used for stochastic optimization. In a stochastic setting, only noisy samples of a gradient are accessible, so an inexact approximation of the Lagrangian is used:

[math]\displaystyle{ \hat{\mathcal{L}}_{\rho,k} = f_1(x_k)+\langle \nabla f(x_k,\zeta_{k+1}),x \rangle+g(y)-z^T (Ax + By - c)+\frac{\rho}{2} \Vert Ax + By - c \Vert^2+\frac{\Vert x-x_k \Vert^2}{2\eta_{k+1}}, }[/math]

where [math]\displaystyle{ \eta_{k+1} }[/math] is a time-varying step size.[13]

ADMM has been applied to solve regularized problems, where the function optimization and regularization can be carried out locally and then coordinated globally via constraints.[14][15][16][17]

Regularized optimization problems are especially relevant in the high-dimensional regime as regularization is a natural mechanism to overcome ill-posedness and to encourage parsimony in the optimal solution (e.g., sparsity and low rank). ADMM's effectiveness for solving regularized problems may mean it could be useful for solving high-dimensional stochastic optimization problems.[18]

Alternative approaches

Software

Open source and non-free/commercial implementations of the augmented Lagrangian method:

  • Accord.NET (C# implementation of augmented Lagrangian optimizer)
  • ALGLIB (C# and C++ implementations of preconditioned augmented Lagrangian solver)
  • PENNON (GPL 3, commercial license available)
  • LANCELOT (free "internal use" license, paid commercial options)
  • MINOS (also uses an augmented Lagrangian method for some types of problems).
  • The code for Apache 2.0 licensed REASON is available online.[19]
  • ALGENCAN (Fortran implementation of augmented Lagrangian method with safeguards). Available online.[20]
  • NLOPT (C++ implementation of augmented Lagrangian optimizer, accessible from different programming languages[21][22])[23]
  • PyProximal (Python implementation of augmented Lagrangian method).[24]

See also

References

  1. Hestenes, M. R. (1969). "Multiplier and gradient methods". Journal of Optimization Theory and Applications 4 (5): 303–320. doi:10.1007/BF00927673. 
  2. Powell, M. J. D. (1969). "A method for nonlinear constraints in minimization problems". in Fletcher, R.. Optimization. New York: Academic Press. pp. 283–298. ISBN 0-12-260650-7. 
  3. Bertsekas, Dimitri P. (1996). Constrained optimization and Lagrange multiplier methods. Athena Scientific. 
  4. Andreani, R.; Birgin, E. G.; Martínez, J. M.; Schuverdt, M. L. (2007). "On Augmented Lagrangian Methods with General Lower-Level Constraints". SIAM Journal on Optimization 18 (4): 1286–1309. doi:10.1137/060654797. http://www.repositorio.unicamp.br/handle/REPOSIP/67595. 
  5. 5.0 5.1 5.2 (Birgin Martínez)
  6. 6.0 6.1 6.2 (Nocedal Wright), chapter 17
  7. Eckstein, J.; Bertsekas, D. P. (1992). "On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators". Mathematical Programming 55 (1–3): 293–318. doi:10.1007/BF01581204. 
  8. "YALL1: Your ALgorithms for L1". http://yall1.blogs.rice.edu/. 
  9. "SpaRSA". http://www.lx.it.pt/~mtf/SpaRSA/. 
  10. "(C)SALSA: A Solver for Convex Optimization Problems in Image Recovery". http://cascais.lx.it.pt/~mafonso/salsa.html. 
  11. "SnapVX". https://snap.stanford.edu/snapvx/. 
  12. "parADMM/engine". February 6, 2021. https://github.com/parADMM/engine. 
  13. Ouyang, H.; He, N.; Tran, L.; Gray, A. G (2013). "Stochastic alternating direction method of multipliers". Proceedings of the 30th International Conference on Machine Learning: 80–88. 
  14. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. (2011). "Distributed optimization and statistical learning via the alternating direction method of multipliers". Foundations and Trends{\textregistered} in Machine Learning 3 (1): 1–122. doi:10.1561/2200000016. 
  15. Wahlberg, B.; Boyd, S.; Annergren, M.; Wang, Y. (2012). "An ADMM algorithm for a class of total variation regularized estimation problems". arXiv:1203.1828 [stat.ML].
  16. Esser, E.; Zhang, X.; Chan, T. (2010). "A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science". SIAM Journal on Imaging Sciences 3 (4): 1015–1046. doi:10.1137/09076934X. 
  17. Mota, J. FC; Xavier, J. MF; Aguiar, P. MQ; Puschel, M. (2012). "Distributed ADMM for model predictive control and congestion control". 2012 IEEE 51st IEEE Conference on Decision and Control (CDC). pp. 5110–5115. doi:10.1109/CDC.2012.6426141. ISBN 978-1-4673-2066-5. 
  18. "Alternating Direction Method of Multipliers - an overview | ScienceDirect Topics". https://www.sciencedirect.com/topics/mathematics/alternating-direction-method-of-multipliers. 
  19. "Bitbucket". https://bitbucket.org/megaDataLab/reason2. 
  20. "TANGO Project". https://www.ime.usp.br/~egbirgin/tango/codes.php. 
  21. Stamm, Aymeric (2022-07-15), nloptr, https://github.com/astamm/nloptr, retrieved 2022-07-19 
  22. The NLopt module for Julia, JuliaOpt, 2022-06-25, https://github.com/JuliaOpt/NLopt.jl, retrieved 2022-07-19 
  23. Johnson, Steven G. (2022-07-14), stevengj/nlopt, https://github.com/stevengj/nlopt, retrieved 2022-07-19 
  24. "PyProximal Project". http://pyproximal.readthedocs.io. 

Bibliography