Coordinate-wise descent method

From HandWiki

One of the methods for minimizing a function of several variables based only on the values of the function to be minimized. The method is used when the function is not differentiable or if a calculation of the derivatives involves a large amount of computation. Below the use of the coordinate-wise descent method for minimizing a function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c0264601.png" /> on a set

<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c0264602.png" />

where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c0264603.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c0264604.png" /> are given numbers, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c0264605.png" />, is described and the cases where all or some of the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c0264606.png" /> or <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c0264607.png" /> are not excluded. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c0264608.png" /> be the coordinate vector in which the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c0264609.png" />-th coordinate is equal to 1 and the other coordinates are equal to zero. One specifies an initial approximation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646010.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646011.png" />. Assume that the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646012.png" />-th approximation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646013.png" /> is known and that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646014.png" />, for some <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646015.png" />. Take <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646016.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646017.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646018.png" /> is the integer part of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646019.png" />. Then

<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646020.png" />
<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646021.png" />
<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646022.png" />

i.e. one performs a cyclic selection of the coordinate vectors <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646023.png" />. First one checks if the conditions

<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646024.png" /> (1)

are fulfilled. If (1) is fulfilled, one sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646025.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646026.png" />. If on the other hand (1) is not fulfilled, one checks the condition

<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646027.png" /> (2)

If (2) is fulfilled, one sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646028.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646029.png" />. If conditions (1) and (2) are both not fulfilled, one sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646030.png" />,

<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646031.png" /> (3)

where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646032.png" /> is the parameter of the method, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646033.png" />. Condition (3) means that if at least one of the conditions (1) and (2) is fulfilled in a single cycle of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646034.png" /> iterations involving a selection of all coordinate vectors <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646035.png" /> with step <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646036.png" />, then the length of the step <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646037.png" /> is not reduced and is retained at least during the following cycle of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646038.png" /> iterations; if on the other hand neither (1) nor (2) is ever fulfilled in the subsequent <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646039.png" /> iterations, the step <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646040.png" /> is reduced.

Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646041.png" /> be convex and continuously differentiable on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646042.png" />, let the set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646043.png" /> be bounded and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646044.png" /> be a positive number. Then the methods (1)–(3) converge, i.e.

<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646045.png" />

and the sequence <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646046.png" /> converges to the set of minima for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646047.png" /> in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646048.png" />. If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646049.png" /> is not differentiable on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026460/c02646050.png" />, the method need not converge [1], [2].

References

[1] F.P. Vasil'ev, "Numerical methods for solving extremum problems" , Moscow (1980) (In Russian)
[2] V.G. Karmanov, "Mathematical programming" , Moscow (1975) (In Russian)


Comments

References

[a1] W.I. Zangwill, "Nonlinear programming: a unified approach" , Prentice-Hall (1969)