# Line search

__: Optimization algorithm__

**Short description**

In optimization, **line search** is a basic iterative approach to find a local minimum [math]\displaystyle{ \mathbf{x}^* }[/math] of an objective function [math]\displaystyle{ f:\mathbb R^n\to\mathbb R }[/math]. It first finds a descent direction along which the objective function [math]\displaystyle{ f }[/math] will be reduced, and then computes a step size that determines how far [math]\displaystyle{ \mathbf{x} }[/math] should move along that direction. The descent direction can be computed by various methods, such as gradient descent or quasi-Newton method. The step size can be determined either exactly or inexactly.

## One-dimensional line search

Suppose *f* is a one-dimensional function, [math]\displaystyle{ f:\mathbb R\to\mathbb R }[/math], and assume that it is unimodal, that is, contains exactly one local minimum *x** in a given interval [*a*,*z*]. This means that *f* is strictly decreasing in [a,x*] and strictly increasing in [x*,*z*]. There are several ways to find an (approximate) minimum point in this case.^{[1]}^{:{{{1}}}}

### Zero-order methods

Zero-order methods use only function evaluations (i.e., a value oracle) - not derivatives:^{[1]}^{:{{{1}}}}

- Ternary search: pick some two points
*b,c*such that*a*<*b*<*c*<*z*. If f(*b*)≤f(*c*), then x* must be in [*a*,*c*]; if f(*b*)≥(*c*), then x* must be in [*b*,*z*]. In both cases, we can replace the search interval with a smaller one. If we pick*b*,*c*very close to the interval center, then the interval shrinks by ~1/2 at each iteration, but we need two function evaluations per iteration. Therefore, the method has linear convergence with rate [math]\displaystyle{ \sqrt{0.5}\approx 0.71 }[/math]. If we pick b,c such that the partition a,b,c,z has three equal-length intervals, then the interval shrinks by 2/3 at each iteration, so the method has linear convergence with rate [math]\displaystyle{ \sqrt{2/3}\approx 0.82 }[/math]. - Fibonacci search: This is a variant of ternary search in which the points
*b*,*c*are selected based on the Fibonacci sequence. At each iteration, only one function evaluation is needed, since the other point was already an endpoint of a previous interval. Therefore, the method has linear convergence with rate [math]\displaystyle{ 1/ \varphi \approx 0.618 }[/math] . - Golden-section search: This is a variant in which the points
*b*,*c*are selected based on the golden ratio. Again, only one function evaluation is needed in each iteration, and the method has linear convergence with rate [math]\displaystyle{ 1/ \varphi \approx 0.618 }[/math] . This ratio is optimal among the zero-order methods.

Zero-order methods are very general - they do not assume differentiability or even continuity.

### First-order methods

First-order methods assume that *f* is continuously differentiable, and that we can evaluate not only *f* but also its derivative.^{[1]}^{:{{{1}}}}

- The bisection method computes the derivative of
*f*at the center of the interval,*c*: if f'(c)=0, then this is the minimum point; if f'(*c*)>0, then the minimum must be in [*a*,*c*]; if f'(*c*)<0, then the minimum must be in [*c*,*z*]. This method has linear convergence with rate 0.5.

### Curve-fitting methods

Curve-fitting methods try to attain superlinear convergence by assuming that *f* has some analytic form, e.g. a polynomial of finite degree. At each iteration, there is a set of "working points" in which we know the value of *f* (and possibly also its derivative). Based on these points, we can compute a polynomial that fits the known values, and find its minimum analytically. The minimum point becomes a new working point, and we proceed to the next iteration:^{[1]}^{:{{{1}}}}

- Newton's method is a special case of a curve-fitting method, in which the curve is a degree-two polynomial, constructed using the first and second derivatives of
*f*. If the method is started close enough to a non-degenerate local minimum (= with a positive second derivative), then it has quadratic convergence. - Regula falsi is another method that fits the function to a degree-two polynomial, but it uses the first derivative at two points, rather than the first and second derivative at the same point. If the method is started close enough to a non-degenerate local minimum, then it has superlinear convergence of order [math]\displaystyle{ \varphi \approx 1.618 }[/math].
*Cubic fit*fits to a degree-three polynomial, using both the function values and its derivative at the last two points. If the method is started close enough to a non-degenerate local minimum, then it has quadratic convergence.

Curve-fitting methods have superlinear convergence when started close enough to the local minimum, but might converge otherwise. *Safeguarded curve-fitting methods* simultaneously execute a linear-convergence method in parallel to the curve-fitting method. They check in each iteration whether the point found by the curve-fitting method is close enough to the interval maintained by safeguard method; if it is not, then the safeguard method is used to compute the next iterate.^{[1]}^{:{{{1}}}}

## Multi-dimensional line search

In general, we have a multi-dimensional objective function [math]\displaystyle{ f:\mathbb R^n\to\mathbb R }[/math]. The line-search method first finds a descent direction along which the objective function [math]\displaystyle{ f }[/math] will be reduced, and then computes a step size that determines how far [math]\displaystyle{ \mathbf{x} }[/math] should move along that direction. The descent direction can be computed by various methods, such as gradient descent or quasi-Newton method. The step size can be determined either exactly or inexactly. Here is an example gradient method that uses a line search in step 5:

- Set iteration counter [math]\displaystyle{ k=0 }[/math] and make an initial guess [math]\displaystyle{ \mathbf{x}_0 }[/math] for the minimum. Pick [math]\displaystyle{ \epsilon }[/math] a tolerance.
- Loop:
- Compute a descent direction [math]\displaystyle{ \mathbf{p}_k }[/math].
- Define a one-dimensional function [math]\displaystyle{ h(\alpha_k)=f(\mathbf{x}_k+\alpha_k\mathbf{p}_k) }[/math], representing the function value on the descent direction given the step-size.
- Find an [math]\displaystyle{ \displaystyle \alpha_k }[/math] that minimizes [math]\displaystyle{ h }[/math] over [math]\displaystyle{ \alpha_k\in\mathbb R_+ }[/math].
- Update [math]\displaystyle{ \mathbf{x}_{k+1}=\mathbf{x}_k+\alpha_k\mathbf{p}_k }[/math], and [math]\displaystyle{ k=k+1 }[/math]

- Until [math]\displaystyle{ \|\nabla f(\mathbf{x}_{k+1})\|\lt \epsilon }[/math]

At the line search step (5), the algorithm may minimize *h* *exactly*, by solving [math]\displaystyle{ h'(\alpha_k)=0 }[/math], or *approximately*, by using one of the one-dimensional line-search methods mentioned above. It can also be solved *loosely*, by asking for a sufficient decrease in *h* that does not necessarily approximate the optimum. One example of the former is conjugate gradient method. The latter is called inexact line search and may be performed in a number of ways, such as a backtracking line search or using the Wolfe conditions.

## Overcoming local minima

Like other optimization methods, line search may be combined with simulated annealing to allow it to jump over some local minima.

## See also

- Trust region - a dual approach for finding a local minimum: it first computes a step size, and then determines the descent direction.
- Grid search
- Learning rate
- Pattern search (optimization)
- Secant method

## References

- ↑
^{1.0}^{1.1}^{1.2}^{1.3}^{1.4}Nemirovsky and Ben-Tal (2023). "Optimization III: Convex Optimization". http://www2.isye.gatech.edu/~nemirovs/OPTIIILN2023Spring.pdf.

## Further reading

- Dennis, J. E. Jr.; Schnabel, Robert B. (1983). "Globally Convergent Modifications of Newton's Method".
*Numerical Methods for Unconstrained Optimization and Nonlinear Equations*. Englewood Cliffs: Prentice-Hall. pp. 111–154. ISBN 0-13-627216-9. - Nocedal, Jorge; Wright, Stephen J. (1999). "Line Search Methods".
*Numerical Optimization*. New York: Springer. pp. 34–63. ISBN 0-387-98793-2. - Sun, Wenyu; Yuan, Ya-Xiang (2006). "Line Search".
*Optimization Theory and Methods: Nonlinear Programming*. New York: Springer. pp. 71–117. ISBN 0-387-24975-3.

Original source: https://en.wikipedia.org/wiki/Line search.
Read more |