Moreau envelope

From HandWiki

The Moreau envelope (or the Moreau-Yosida regularization) [math]\displaystyle{ M_f }[/math] of a proper lower semi-continuous convex function [math]\displaystyle{ f }[/math] is a smoothed version of [math]\displaystyle{ f }[/math]. It was proposed by Jean-Jacques Moreau in 1965.[1] The Moreau envelope has important applications in mathematical optimization: minimizing over [math]\displaystyle{ M_f }[/math] and minimizing over [math]\displaystyle{ f }[/math] are equivalent problems in the sense that set of minimizers of [math]\displaystyle{ f }[/math] and [math]\displaystyle{ M_f }[/math] are the same. However, first-order optimization algorithms can be directly applied to [math]\displaystyle{ M_f }[/math], since [math]\displaystyle{ f }[/math] may be non-differentiable while [math]\displaystyle{ M_f }[/math] is always continuously differentiable. Indeed, many proximal gradient methods can be interpreted as a gradient descent method over [math]\displaystyle{ M_f }[/math].

Definition

The Moreau envelope of a proper lower semi-continuous convex function [math]\displaystyle{ f }[/math] from a Hilbert space [math]\displaystyle{ \mathcal{X} }[/math] to [math]\displaystyle{ (-\infty,+\infty] }[/math] is defined as[2]

[math]\displaystyle{ M_{\lambda f}(v) = \inf_{x\in\mathcal{X}} \left(f(x) + \frac{1}{2\lambda} \|x - v\|_2^2\right). }[/math]

Given a parameter [math]\displaystyle{ \lambda \in \mathbb{R} }[/math], the Moreau envelope of [math]\displaystyle{ \lambda f }[/math] is also called as the Moreau envelope of [math]\displaystyle{ f }[/math] with parameter [math]\displaystyle{ \lambda }[/math].[2]

Properties

  • The Moreau envelope can also be seen as the infimal convolution between [math]\displaystyle{ f }[/math] and [math]\displaystyle{ (1/2)\| \cdot \|^2_2 }[/math].
  • The proximal operator of a function is related to the gradient of the Moreau envelope by the following identity:

[math]\displaystyle{ \nabla M_{\lambda f}(x) = \frac{1}{\lambda} (x - \mathrm{prox}_{\lambda f}(x)) }[/math]. By defining the sequence [math]\displaystyle{ x_{k+1} = \mathrm{prox}_{\lambda f}(x_k) }[/math] and using the above identity, we can interpret the proximal operator as a gradient descent algorithm over the Moreau envelope.

[math]\displaystyle{ M_{\lambda f}(v) = \max_{p \in \mathcal X} \left( \langle p, v \rangle - \frac{\lambda}{2} \| p \|^2 - f^*(p)\right), }[/math] where [math]\displaystyle{ f^* }[/math] denotes the convex conjugate of [math]\displaystyle{ f }[/math]. Since the subdifferential of a proper, convex, lower semicontinuous function on a Hilbert space is inverse to the subdifferential of its convex conjugate, we can conclude that if [math]\displaystyle{ p_0 \in \mathcal X }[/math] is the maximizer of the above expression, then [math]\displaystyle{ x_0 := v - \lambda p_0 }[/math] is the minimizer in the primal formulation and vice versa.

See also

References

  1. Moreau, J. J. (1965). "Proximité et dualité dans un espace hilbertien". Bulletin de la Société Mathématique de France 93: 273–299. doi:10.24033/bsmf.1625. ISSN 0037-9484. https://eudml.org/doc/87067. 
  2. 2.0 2.1 Neal Parikh and Stephen Boyd (2013). "Proximal Algorithms". Foundations and Trends in Optimization 1 (3): 123–231. https://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf. Retrieved 2019-01-29. 
  3. Heaton, Howard; Fung, Samy Wu; Osher, Stanley (2022-10-09). "Global Solutions to Nonconvex Problems by Evolution of Hamilton–Jacobi PDEs". arXiv:2202.11014 [math.OC].
  4. Osher, Stanley; Heaton, Howard; Fung, Samy Wu (2022-11-23). "A Hamilton–Jacobi-based Proximal Operator". arXiv:2211.12997 [math.OC].

External links