Chambolle-Pock algorithm

From HandWiki
Short description: Primal-dual algorithm for convex optimization problems
Original test image and damaged one
Example of application of the Chambolle-Pock algorithm to image reconstruction.

In mathematics, the Chambolle-Pock algorithm is an algorithm used to solve convex optimization problems. It was introduced by Antonin Chambolle and Thomas Pock[1] in 2011 and has since become a widely used method in various fields, including image processing,[2][3][4] computer vision,[5] and signal processing.[6]

The Chambolle-Pock algorithm is specifically designed to efficiently solve convex optimization problems that involve the minimization of a non-smooth cost function composed of a data fidelity term and a regularization term.[1] This is a typical configuration that commonly arises in ill-posed imaging inverse problems such as image reconstruction,[2] denoising[3] and inpainting.[4]

The algorithm is based on a primal-dual formulation, which allows for simultaneous updates of primal and dual variables. By employing the proximal operator, the Chambolle-Pock algorithm efficiently handles non-smooth convex regularization terms, such as the total variation, which are central in imaging applications.[1]

Problem statement

Let [math]\displaystyle{ \mathcal{X}, \mathcal{Y} }[/math] be two real vector spaces equipped with an inner product [math]\displaystyle{ \langle \cdot, \cdot \rangle }[/math] and a norm [math]\displaystyle{ \lVert \,\cdot \,\rVert = \langle \cdot, \cdot \rangle^{\frac{1}{2}} }[/math]. From now on, a function [math]\displaystyle{ F }[/math] is called simple if its proximal operator [math]\displaystyle{ \text{prox}_{\tau F} }[/math] has a closed-form representation or can be computed accurately, for [math]\displaystyle{ \tau \gt 0 }[/math],[1] where [math]\displaystyle{ \text{prox}_{\tau F} }[/math] is defined as

[math]\displaystyle{ x = \text{prox}_{\tau F}(\tilde{x}) = \text{arg } \min_{x'\in \mathcal{X}}\left\{ \frac{\lVert x'-\tilde{x}\rVert^2}{2\tau} + F(x') \right\} }[/math]
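For instance, the proximal operator of the [math]\displaystyle{ \ell_1 }[/math] norm reduces to componentwise soft-thresholding, a standard closed form; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def prox_l1(x_tilde, tau):
    """Proximal operator of F(x) = ||x||_1 with step tau.

    Solves argmin_x ||x - x_tilde||^2 / (2*tau) + ||x||_1, whose
    closed form is componentwise soft-thresholding, so F is "simple".
    """
    return np.sign(x_tilde) * np.maximum(np.abs(x_tilde) - tau, 0.0)
```

Each entry is shrunk toward zero by [math]\displaystyle{ \tau }[/math]; entries smaller than [math]\displaystyle{ \tau }[/math] in magnitude are set exactly to zero.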

Consider the following constrained primal problem:[1]

[math]\displaystyle{ \min_{x\in\mathcal{X}} F(Kx) + G(x) }[/math]

where [math]\displaystyle{ K:\mathcal{X} \rightarrow \mathcal{Y} }[/math] is a bounded linear operator, [math]\displaystyle{ F:\mathcal{Y} \rightarrow [0, +\infty), G:\mathcal{X} \rightarrow [0, +\infty) }[/math] are convex, lower semicontinuous and simple.[1]

The minimization problem has its dual corresponding problem as[1]

[math]\displaystyle{ \max_{y\in\mathcal{Y}} -\left(G^*(-K^*y) + F^*(y)\right) }[/math]

where [math]\displaystyle{ F^*, G^* }[/math] are the convex conjugates of [math]\displaystyle{ F, G }[/math], and [math]\displaystyle{ K^* }[/math] is the adjoint of [math]\displaystyle{ K }[/math].[1]

Assume that the primal and the dual problems have at least one solution [math]\displaystyle{ (\hat{x}, \hat{y}) \in \mathcal{X}\times \mathcal{Y} }[/math], that is, they satisfy[7]

[math]\displaystyle{ \begin{align} K\hat{x} &\in \partial F^*(\hat{y})\\ -(K^*\hat{y}) &\in \partial G(\hat{x}) \end{align} }[/math]

where [math]\displaystyle{ \partial F^* }[/math] and [math]\displaystyle{ \partial G }[/math] are the subdifferentials of the convex functions [math]\displaystyle{ F^* }[/math] and [math]\displaystyle{ G }[/math], respectively.[7]

The Chambolle-Pock algorithm solves the so-called saddle-point problem[1]

[math]\displaystyle{ \min_{x\in\mathcal{X}} \max_{y\in\mathcal{Y}} \langle Kx, y \rangle + G(x) - F^*(y) }[/math]

which is a primal-dual formulation of the nonlinear primal and dual problems stated before.[1]
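The saddle-point form follows from replacing [math]\displaystyle{ F }[/math] by its biconjugate, which is legitimate since [math]\displaystyle{ F }[/math] is convex and lower semicontinuous (Fenchel-Moreau):

```latex
% Since F = F^{**} for convex, lower semicontinuous F:
F(Kx) = \max_{y\in\mathcal{Y}} \left\{ \langle Kx, y \rangle - F^*(y) \right\}
\quad\Longrightarrow\quad
\min_{x\in\mathcal{X}} F(Kx) + G(x)
  = \min_{x\in\mathcal{X}} \max_{y\in\mathcal{Y}}
    \langle Kx, y \rangle + G(x) - F^*(y).
```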

Algorithm

The Chambolle-Pock algorithm primarily involves iteratively alternating between ascending in the dual variable [math]\displaystyle{ y }[/math] and descending in the primal variable [math]\displaystyle{ x }[/math] using a gradient-like approach, with step sizes [math]\displaystyle{ \sigma }[/math] and [math]\displaystyle{ \tau }[/math] respectively, in order to simultaneously solve the primal and the dual problem.[2] Furthermore, an over-relaxation technique is employed for the primal variable with the parameter [math]\displaystyle{ \theta }[/math].[1]

Algorithm Chambolle-Pock algorithm
Input:  [math]\displaystyle{  F, G, K, \tau, \sigma \gt 0, \, \theta \in[0,1],\, (x^0,y^0)\in\mathcal{X}\times\mathcal{Y} }[/math] and set [math]\displaystyle{  \overline{x}^0 = x^0 }[/math], stopping criterion.
[math]\displaystyle{  n \leftarrow 0  }[/math]
do while stopping criterion not satisfied
    [math]\displaystyle{  y^{n+1} \leftarrow \text{prox}_{\sigma F^*}\left(y^{n} + \sigma K\overline{x}^{n}\right)  }[/math]
    [math]\displaystyle{  x^{n+1} \leftarrow \text{prox}_{\tau G}\left(x^{n} - \tau K^*y^{n+1}\right)  }[/math]
    [math]\displaystyle{  \overline{x}^{n+1} \leftarrow x^{n+1} + \theta\left( x^{n+1} -x^{n}\right) }[/math]
    [math]\displaystyle{ n\leftarrow n+1 }[/math]
end do
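The loop above can be sketched generically in NumPy. The operator, its adjoint, and the two proximal maps are user-supplied callables; the names are illustrative rather than any library's API:

```python
import numpy as np

def chambolle_pock(K, K_adj, prox_tauG, prox_sigmaFstar,
                   x0, y0, tau, sigma, theta=1.0, n_iters=100):
    """Generic Chambolle-Pock iteration (a sketch).

    K, K_adj       : the linear operator and its adjoint, as callables
    prox_tauG      : x -> prox_{tau G}(x)
    prox_sigmaFstar: y -> prox_{sigma F*}(y)
    """
    x, y = x0.copy(), y0.copy()
    x_bar = x0.copy()
    for _ in range(n_iters):
        y = prox_sigmaFstar(y + sigma * K(x_bar))   # dual ascent step
        x_new = prox_tauG(x - tau * K_adj(y))       # primal descent step
        x_bar = x_new + theta * (x_new - x)         # over-relaxation
        x = x_new
    return x, y
```

As a sanity check, with [math]\displaystyle{ K }[/math] the identity, [math]\displaystyle{ F(y) = \tfrac{1}{2}\lVert y-b\rVert^2 }[/math] and [math]\displaystyle{ G = 0 }[/math], the iterates converge to [math]\displaystyle{ x = b }[/math].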

Chambolle and Pock proved[1] that the sequence of iterates converges if [math]\displaystyle{ \theta = 1 }[/math] and [math]\displaystyle{ \tau \sigma \lVert K \rVert^2 \leq 1 }[/math], with [math]\displaystyle{ \mathcal{O}(1/N) }[/math] as rate of convergence for the primal-dual gap. This has been extended by S. Banert et al.[8] to hold whenever [math]\displaystyle{ \theta\gt 1/2 }[/math] and [math]\displaystyle{ \tau \sigma \lVert K \rVert^2 \lt 4 / (1+2\theta) }[/math].
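Since the step-size condition depends on [math]\displaystyle{ \lVert K \rVert }[/math], which is rarely known exactly, a common heuristic (not part of the original algorithm) is to estimate it by power iteration on [math]\displaystyle{ K^*K }[/math]; a sketch with illustrative names:

```python
import numpy as np

def estimate_operator_norm(K, K_adj, shape, n_iters=50, seed=0):
    """Estimate ||K|| by power iteration on K* K."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for _ in range(n_iters):
        x = K_adj(K(x))
        x /= np.linalg.norm(x)
    # x is now (approximately) a unit top eigenvector of K* K,
    # so ||K* K x|| is the top eigenvalue and its sqrt is ||K||.
    return np.sqrt(np.linalg.norm(K_adj(K(x))))
```

Step sizes can then be chosen, e.g., as [math]\displaystyle{ \tau = \sigma = 1/L }[/math] so that [math]\displaystyle{ \tau\sigma L^2 \leq 1 }[/math].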

The semi-implicit Arrow-Hurwicz method[9] coincides with the particular choice of [math]\displaystyle{ \theta = 0 }[/math] in the Chambolle-Pock algorithm.[1]

Acceleration

There are special cases in which the rate of convergence admits a theoretical speed-up.[1] In fact, if [math]\displaystyle{ G }[/math], respectively [math]\displaystyle{ F^* }[/math], is uniformly convex, then [math]\displaystyle{ G^* }[/math], respectively [math]\displaystyle{ F }[/math], has a Lipschitz continuous gradient. In that case, the rate of convergence can be improved to [math]\displaystyle{ \mathcal{O}(1/N^2) }[/math] with slight changes to the Chambolle-Pock algorithm: the accelerated version of the method consists of choosing [math]\displaystyle{ \tau_n, \sigma_n }[/math], and also [math]\displaystyle{ \theta_n }[/math], iteratively instead of fixing these values.[1]

In case of [math]\displaystyle{ G }[/math] uniformly convex, with [math]\displaystyle{ \gamma\gt 0 }[/math] the uniform-convexity constant, the modified algorithm becomes[1]

Algorithm Accelerated Chambolle-Pock algorithm
Input:  [math]\displaystyle{  F, G, K, \gamma \gt 0, \tau_0, \sigma_0 \gt 0 }[/math] such that [math]\displaystyle{  \tau_0\sigma_0 L^2 \leq 1,\, (x^0,y^0)\in\mathcal{X}\times\mathcal{Y} }[/math] and set [math]\displaystyle{  \overline{x}^0 = x^0 }[/math], stopping criterion.
[math]\displaystyle{  n \leftarrow 0  }[/math]
do while stopping criterion not satisfied
    [math]\displaystyle{  y^{n+1} \leftarrow \text{prox}_{\sigma_n F^*}\left(y^{n} + \sigma_n K\overline{x}^{n}\right)  }[/math]
    [math]\displaystyle{  x^{n+1} \leftarrow \text{prox}_{\tau_n G}\left(x^{n} - \tau_n K^*y^{n+1}\right)  }[/math]
    [math]\displaystyle{  \theta_n \leftarrow \frac{1}{\sqrt{1+2\gamma \tau_n}} }[/math]
    [math]\displaystyle{  \tau_{n+1} \leftarrow  \theta_n \tau_n }[/math]
    [math]\displaystyle{  \sigma_{n+1} \leftarrow  \frac{\sigma_n}{\theta_n} }[/math]
    [math]\displaystyle{  \overline{x}^{n+1} \leftarrow x^{n+1} + \theta_n\left( x^{n+1} -x^{n}\right) }[/math]
    [math]\displaystyle{ n\leftarrow n+1 }[/math]
end do
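The step-size updates above can be sketched as a small helper that produces the sequence [math]\displaystyle{ (\tau_n, \sigma_n, \theta_n) }[/math] (the function name is illustrative). Note that the product [math]\displaystyle{ \tau_n \sigma_n }[/math] is invariant under the update, so the step-size condition [math]\displaystyle{ \tau_0\sigma_0 L^2 \leq 1 }[/math] is preserved at every iteration:

```python
import math

def accelerated_steps(tau0, sigma0, gamma, n_steps):
    """Generate (tau_n, sigma_n, theta_n) for the accelerated variant,
    assuming G is gamma-uniformly convex."""
    tau, sigma = tau0, sigma0
    schedule = []
    for _ in range(n_steps):
        theta = 1.0 / math.sqrt(1.0 + 2.0 * gamma * tau)
        schedule.append((tau, sigma, theta))
        # tau shrinks and sigma grows, while tau * sigma stays constant:
        tau, sigma = theta * tau, sigma / theta
    return schedule
```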

Moreover, the convergence of the algorithm slows down when [math]\displaystyle{ L }[/math], the norm of the operator [math]\displaystyle{ K }[/math], is very large or cannot be estimated easily. By choosing proper preconditioners [math]\displaystyle{ T }[/math] and [math]\displaystyle{ \Sigma }[/math], and modifying the proximal operator through the norms induced by the operators [math]\displaystyle{ T }[/math] and [math]\displaystyle{ \Sigma }[/math], the convergence of the resulting preconditioned algorithm can still be ensured.[10]

Application

Denoising example
Original test image (fishing boat)
Application of the Chambolle-Pock algorithm to the test image with noise.

A typical application of this algorithm is in the image denoising framework, based on total variation.[3] It operates on the concept that signals containing excessive and potentially erroneous details exhibit a high total variation, which is the integral of the norm of the image gradient.[3] Following this principle, the process decreases the total variation of the signal while keeping it close to the original signal, effectively eliminating unwanted details while preserving crucial features like edges. In the classical two-dimensional discrete setting,[11] consider [math]\displaystyle{ \mathcal{X} = \mathbb{R}^{NM} }[/math], where an element [math]\displaystyle{ u\in\mathcal{X} }[/math] represents an image whose pixel values are arranged on a Cartesian grid [math]\displaystyle{ N\times M }[/math].[1]

Define the inner product on [math]\displaystyle{ \mathcal{X} }[/math] as[1]

[math]\displaystyle{ \langle u, v\rangle_{\mathcal{X}} = \sum_{i,j} u_{i,j}v_{i,j},\quad u,v \in \mathcal{X} }[/math]

that induces an [math]\displaystyle{ L^2 }[/math] norm on [math]\displaystyle{ \mathcal{X} }[/math], denoted as [math]\displaystyle{ \lVert \, \cdot \, \rVert_2 }[/math].[1]

The discrete gradient of [math]\displaystyle{ u }[/math] is computed with standard finite differences,

[math]\displaystyle{ \left(\nabla u \right)_{i,j} = \left( \begin{aligned} \left(\nabla u \right)^1_{i,j}\\ \left(\nabla u \right)^2_{i,j} \end{aligned} \right) }[/math]

which is an element of the space [math]\displaystyle{ \mathcal{Y}=\mathcal{X}\times \mathcal{X} }[/math], where[1]

[math]\displaystyle{ \begin{align} & \left( \nabla u \right)_{i,j}^1 = \left\{ \begin{aligned} &\frac{u_{i+1,j}-u_{i,j}}{h} &\text{ if } i\lt M\\ &0 &\text{ if } i=M \end{aligned} \right. ,\\ & \left( \nabla u \right)_{i,j}^2 = \left\{ \begin{aligned} &\frac{u_{i,j+1}-u_{i,j}}{h} &\text{ if } j\lt N\\ &0 &\text{ if } j=N \end{aligned} \right. \end{align} }[/math]
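The discrete gradient above, together with its negative adjoint (a discrete divergence satisfying [math]\displaystyle{ \langle \nabla u, p \rangle = -\langle u, \operatorname{div} p \rangle }[/math]), can be sketched in NumPy (zero-based indexing; [math]\displaystyle{ h }[/math] is the grid spacing):

```python
import numpy as np

def grad(u, h=1.0):
    """Forward-difference gradient, zero at the far boundary,
    following the discretization above; returns shape (2, M, N)."""
    g = np.zeros((2,) + u.shape)
    g[0, :-1, :] = (u[1:, :] - u[:-1, :]) / h  # differences along axis 0
    g[1, :, :-1] = (u[:, 1:] - u[:, :-1]) / h  # differences along axis 1
    return g

def div(p, h=1.0):
    """Discrete divergence: the negative adjoint of grad, so that
    <grad u, p> = -<u, div p> holds exactly."""
    d = np.zeros(p.shape[1:])
    d[:-1, :] += p[0, :-1, :] / h
    d[1:, :] -= p[0, :-1, :] / h
    d[:, :-1] += p[1, :, :-1] / h
    d[:, 1:] -= p[1, :, :-1] / h
    return d
```

The adjoint identity can be checked numerically on random inputs, which is a useful test when implementing [math]\displaystyle{ K }[/math] and [math]\displaystyle{ K^* }[/math] for the algorithm.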

On [math]\displaystyle{ \mathcal{Y} }[/math], an [math]\displaystyle{ L^1 }[/math]-based norm is defined as[1]

[math]\displaystyle{ \lVert p \rVert_1 = \sum_{i,j} \sqrt{\left(p_{i,j}^1\right)^2 + \left(p_{i,j}^2\right)^2}, \quad p\in \mathcal{Y}. }[/math]

Then, the primal problem of the ROF model, proposed by Rudin, Osher, and Fatemi,[12] is given by[1]

[math]\displaystyle{ h^2 \min_{u\in \mathcal{X}} \lVert \nabla u \rVert_1 + \frac{\lambda}{2} \lVert u-g\rVert^2_2 }[/math]

where [math]\displaystyle{ u \in \mathcal{X} }[/math] is the unknown solution, [math]\displaystyle{ g \in \mathcal{X} }[/math] is the given noisy data, and [math]\displaystyle{ \lambda }[/math] describes the trade-off between regularization and data fitting.[1]

The primal-dual formulation of the ROF problem reads[1]

[math]\displaystyle{ \min_{u\in \mathcal{X}}\max_{p\in \mathcal{Y}} -\langle u, \text{div}\, p\rangle_{\mathcal{X}} + \frac{\lambda}{2} \lVert u-g\rVert^2_2 - \delta_P(p) }[/math]

where the indicator function is defined as[1]

[math]\displaystyle{ \delta_P(p) = \left\{ \begin{aligned} &0, & \text{if } p \in P\\ &+\infty,& \text{if } p \notin P \end{aligned} \right. }[/math]

on the convex set [math]\displaystyle{ P = \left\{ p\in \mathcal{Y}\, : \, \max_{i,j}\sqrt{\left(p_{i,j}^1\right)^2 + \left(p_{i,j}^2\right)^2} \leq 1 \right\}, }[/math] which can be seen as the [math]\displaystyle{ L^\infty }[/math]-type unit ball dual to the norm defined on [math]\displaystyle{ \mathcal{Y} }[/math].[1]


Observe that the functions involved in the stated primal-dual formulation are simple, since their proximal operators can be computed in closed form:[1]

[math]\displaystyle{ \begin{align} p &= \text{prox}_{\sigma F^*}(\tilde{p}) &\iff p_{i,j} &= \frac{\tilde{p}_{i,j}}{\max\{1,| \tilde{p}_{i,j}| \}}\\ u &= \text{prox}_{\tau G}(\tilde{u}) &\iff u_{i,j} &= \frac{ \tilde{u}_{i,j}+\tau\lambda g_{i,j}}{1+\tau \lambda} \end{align} }[/math]

The image total-variation denoising problem can also be treated with other algorithms[13] such as the alternating direction method of multipliers (ADMM),[14] projected (sub)-gradient[15] or fast iterative shrinkage thresholding.[16]
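Putting the pieces together, the ROF saddle-point problem can be solved with the plain ([math]\displaystyle{ \theta = 1 }[/math]) iteration using the two closed-form proximal maps. The following is a minimal NumPy sketch with [math]\displaystyle{ h = 1 }[/math] and the standard bound [math]\displaystyle{ \lVert \nabla \rVert^2 \leq 8 }[/math] on the 2D grid, not a reference implementation:

```python
import numpy as np

def rof_denoise(g, lam, n_iters=200):
    """TV (ROF) denoising via Chambolle-Pock: dual projection onto P
    and a pointwise averaging with the data g (h = 1, theta = 1)."""

    def grad(u):  # forward differences, zero at the far boundary
        out = np.zeros((2,) + u.shape)
        out[0, :-1, :] = u[1:, :] - u[:-1, :]
        out[1, :, :-1] = u[:, 1:] - u[:, :-1]
        return out

    def div(p):  # negative adjoint of grad
        d = np.zeros(p.shape[1:])
        d[:-1, :] += p[0, :-1, :]
        d[1:, :] -= p[0, :-1, :]
        d[:, :-1] += p[1, :, :-1]
        d[:, 1:] -= p[1, :, :-1]
        return d

    L2 = 8.0                          # ||grad||^2 <= 8 on the 2D grid
    tau = sigma = 1.0 / np.sqrt(L2)   # so tau * sigma * L^2 = 1
    u = g.copy()
    u_bar = g.copy()
    p = np.zeros((2,) + g.shape)
    for _ in range(n_iters):
        # dual step: project p + sigma * grad(u_bar) onto the set P
        q = p + sigma * grad(u_bar)
        p = q / np.maximum(1.0, np.sqrt(q[0] ** 2 + q[1] ** 2))
        # primal step: pointwise prox of (lambda/2)||u - g||^2; K* p = -div p
        u_new = (u + tau * div(p) + tau * lam * g) / (1.0 + tau * lam)
        u_bar = 2.0 * u_new - u       # over-relaxation with theta = 1
        u = u_new
    return u
```

A constant image is a fixed point (its total variation is zero), and on a noisy image the output is a smoothed version whose empirical variance is reduced.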

Implementation

  • The Manopt.jl[17] package implements the algorithm in Julia
  • Gabriel Peyré implements the algorithm in MATLAB,[note 1] Julia, R and Python[18]
  • In the Operator Discretization Library (ODL),[19] a Python library for inverse problems, chambolle_pock_solver implements the method.


Notes

  1. These codes were used to obtain the images in the article.

References

  1. Chambolle, Antonin; Pock, Thomas (2011-05-01). "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging" (in en). Journal of Mathematical Imaging and Vision 40 (1): 120–145. doi:10.1007/s10851-010-0251-1. ISSN 1573-7683. https://doi.org/10.1007/s10851-010-0251-1. 
  2. Sidky, Emil Y; Jørgensen, Jakob H; Pan, Xiaochuan (2012-05-21). "Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle–Pock algorithm". Physics in Medicine and Biology 57 (10): 3065–3091. doi:10.1088/0031-9155/57/10/3065. ISSN 0031-9155. PMID 22538474. Bibcode2012PMB....57.3065S. 
  3. Fang, Faming; Li, Fang; Zeng, Tieyong (2014-03-13). "Single Image Dehazing and Denoising: A Fast Variational Approach" (in en). SIAM Journal on Imaging Sciences 7 (2): 969–996. doi:10.1137/130919696. ISSN 1936-4954. http://epubs.siam.org/doi/10.1137/130919696. 
  4. Allag, A.; Benammar, A.; Drai, R.; Boutkedjirt, T. (2019-07-01). "Tomographic Image Reconstruction in the Case of Limited Number of X-Ray Projections Using Sinogram Inpainting" (in en). Russian Journal of Nondestructive Testing 55 (7): 542–548. doi:10.1134/S1061830919070027. ISSN 1608-3385. https://doi.org/10.1134/S1061830919070027. 
  5. Pock, Thomas; Cremers, Daniel; Bischof, Horst; Chambolle, Antonin (2009). "An algorithm for minimizing the Mumford-Shah functional". 2009 IEEE 12th International Conference on Computer Vision. pp. 1133–1140. doi:10.1109/ICCV.2009.5459348. ISBN 978-1-4244-4420-5. https://ieeexplore.ieee.org/document/5459348. 
  6. "A Generic Proximal Algorithm for Convex Optimization—Application to Total Variation Minimization". IEEE Signal Processing Letters 21 (8): 985–989. 2014. doi:10.1109/LSP.2014.2322123. ISSN 1070-9908. Bibcode2014ISPL...21..985.. https://ieeexplore.ieee.org/document/6810809. 
  7. Ekeland, Ivar; Témam, Roger (1999) (in en). Convex Analysis and Variational Problems. Society for Industrial and Applied Mathematics. p. 61. doi:10.1137/1.9781611971088. ISBN 978-0-89871-450-0. http://epubs.siam.org/doi/book/10.1137/1.9781611971088. 
  8. Banert, Sebastian; Upadhyaya, Manu; Giselsson, Pontus (2023). "The Chambolle-Pock method converges weakly with [math]\displaystyle{ \theta \gt 1/2 }[/math] and [math]\displaystyle{ \tau \sigma \lVert L \rVert^{2} \lt 4 / ( 1 + 2 \theta ) }[/math]". arXiv:2309.03998 [math.OC].
  9. Uzawa, H. (1958). "Iterative methods for concave programming". in Arrow, K. J.; Hurwicz, L.; Uzawa, H.. Studies in linear and nonlinear programming. Stanford University Press. https://archive.org/details/studiesinlinearn0000arro. 
  10. Pock, Thomas; Chambolle, Antonin (2011-11-06). "Diagonal preconditioning for first order primal-dual algorithms in convex optimization". 2011 International Conference on Computer Vision. pp. 1762–1769. doi:10.1109/ICCV.2011.6126441. ISBN 978-1-4577-1102-2. https://ieeexplore.ieee.org/document/6126441. 
  11. Chambolle, Antonin (2004-01-01). "An Algorithm for Total Variation Minimization and Applications" (in en). Journal of Mathematical Imaging and Vision 20 (1): 89–97. doi:10.1023/B:JMIV.0000011325.36760.1e. ISSN 1573-7683. https://doi.org/10.1023/B:JMIV.0000011325.36760.1e. 
  12. Getreuer, Pascal (2012). "Rudin–Osher–Fatemi Total Variation Denoising using Split Bregman". https://www.ipol.im/pub/art/2012/g-tvd/article_lr.pdf. 
  13. Esser, Ernie; Zhang, Xiaoqun; Chan, Tony F. (2010). "A General Framework for a Class of First Order Primal-Dual Algorithms for Convex Optimization in Imaging Science" (in en). SIAM Journal on Imaging Sciences 3 (4): 1015–1046. doi:10.1137/09076934X. ISSN 1936-4954. http://epubs.siam.org/doi/10.1137/09076934X. 
  14. Lions, P. L.; Mercier, B. (1979). "Splitting Algorithms for the Sum of Two Nonlinear Operators". SIAM Journal on Numerical Analysis 16 (6): 964–979. doi:10.1137/0716071. ISSN 0036-1429. Bibcode1979SJNA...16..964L. https://www.jstor.org/stable/2156649. 
  15. Beck, Amir; Teboulle, Marc (2009). "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems" (in en). SIAM Journal on Imaging Sciences 2 (1): 183–202. doi:10.1137/080716542. ISSN 1936-4954. http://epubs.siam.org/doi/10.1137/080716542. 
  16. Nesterov, Yu. E.. "A method of solving a convex programming problem with convergence rate [math]\displaystyle{ O\bigl(\frac1{k^2}\bigr) }[/math]". Dokl. Akad. Nauk SSSR 269 (3): 543–547. https://www.mathnet.ru/eng/dan46009. 
  17. "Chambolle-Pock · Manopt.jl" (in en). https://docs.juliahub.com/Manopt/h1Pdc/0.3.8/solvers/ChambollePock.html. 
  18. "Numerical Tours - A Numerical Tour of Data Science". http://www.numerical-tours.com/. 
  19. "Chambolle-Pock solver — odl 0.6.1.dev0 documentation". https://odl.readthedocs.io/guide/chambolle_pock_guide.html#chambolle-pock-guide. 


External links

  • EE364b, a Stanford course homepage.