Numerical certification

Numerical certification is the process of verifying the correctness of a candidate solution to a system of equations. In (numerical) computational mathematics, such as numerical algebraic geometry, candidate solutions are computed algorithmically, but errors may have corrupted the candidates. For instance, in addition to the inexactness of input data and candidate solutions, numerical errors or errors in the discretization of the problem may result in corrupted candidate solutions. The goal of numerical certification is to provide a certificate which proves which of these candidates are, indeed, approximate solutions.

Methods for certification can be divided into two flavors: a priori certification and a posteriori certification. A posteriori certification confirms the correctness of the final answers (regardless of how they are generated), while a priori certification confirms the correctness of each step of a specific computation. A typical example of a posteriori certification is Smale's alpha theory, while a typical example of a priori certification is interval arithmetic.

Certificates

A certificate for a root is a computational proof of the correctness of a candidate solution. For instance, a certificate may consist of an approximate solution [math]\displaystyle{ x }[/math], a region [math]\displaystyle{ R }[/math] containing [math]\displaystyle{ x }[/math], and a proof that [math]\displaystyle{ R }[/math] contains exactly one solution to the system of equations.

In this context, an a priori numerical certificate is a certificate in the sense of correctness in computer science. On the other hand, an a posteriori numerical certificate operates only on solutions, regardless of how they are computed. Hence, a posteriori certification is different from algorithmic correctness – for an extreme example, an algorithm could randomly generate candidates and attempt to certify them as approximate roots using a posteriori certification.

A posteriori certification methods

There are a variety of methods for a posteriori certification, including alpha theory, the interval Newton and Krawczyk methods, and the Miranda test.

Alpha theory

The cornerstone of Smale's alpha theory is bounding the error for Newton's method. Smale's 1986 work[1] introduced the quantity [math]\displaystyle{ \alpha }[/math], which quantifies the convergence of Newton's method. More precisely, let [math]\displaystyle{ F }[/math] be a system of analytic functions in the variables [math]\displaystyle{ x }[/math], [math]\displaystyle{ D }[/math] the derivative operator, and [math]\displaystyle{ N }[/math] the Newton operator, [math]\displaystyle{ N(x) = x - DF(x)^{-1}F(x) }[/math]. The quantities

[math]\displaystyle{ \beta(F,x) = \|x - N(x)\| = \|DF(x)^{-1} F(x)\|, }[/math]

[math]\displaystyle{ \gamma(F,x) = \sup_{k\geq 2}\left\| \frac{DF(x)^{-1}D^kF(x)}{k!} \right\|^\frac{1}{k-1}, }[/math]

and

[math]\displaystyle{ \alpha(F,x) = \beta(F,x) \gamma(F,x) }[/math]

are used to certify a candidate solution. In particular, if

[math]\displaystyle{ \alpha(F,x) \lt \frac{13 - 3\sqrt{17}}{4} \approx 0.1577, }[/math]

then [math]\displaystyle{ x }[/math] is an approximate solution for [math]\displaystyle{ F }[/math], i.e., the candidate is in the domain of quadratic convergence for Newton's method. In other words, if this inequality holds, then there is a root [math]\displaystyle{ x^\ast }[/math] of [math]\displaystyle{ F }[/math] so that iterates of the Newton operator converge as

[math]\displaystyle{ \left\|N^k(x)-x^\ast\right\|\leq\frac{1}{2^{2^k-1}}\|x-x^\ast\|. }[/math]

The software package alphaCertified provides an implementation of the alpha test for polynomials by estimating [math]\displaystyle{ \beta }[/math] and [math]\displaystyle{ \gamma }[/math].[2]
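As an illustration of the quantities above, the following is a minimal Python sketch for a single univariate polynomial, where the supremum in [math]\displaystyle{ \gamma }[/math] reduces to a maximum over [math]\displaystyle{ 2\leq k\leq d }[/math] because higher derivatives vanish. It is not the interface of alphaCertified; the function names (`derivatives_at`, `alpha_beta_gamma`, `passes_alpha_test`) are hypothetical, and the computation uses ordinary floating point, so it only estimates [math]\displaystyle{ \alpha }[/math] and is not itself a rigorous certificate.

```python
from math import factorial, sqrt

# Threshold from the alpha test: (13 - 3*sqrt(17)) / 4, roughly 0.1577.
ALPHA_THRESHOLD = (13 - 3 * sqrt(17)) / 4


def derivatives_at(coeffs, x):
    """Return [f(x), f'(x), ..., f^(d)(x)] for f(t) = sum(coeffs[i] * t**i)."""
    values = []
    c = list(coeffs)
    while c:
        value = 0
        for a in reversed(c):  # Horner evaluation of the current polynomial
            value = value * x + a
        values.append(value)
        # Differentiate: the coefficient of t**(i-1) becomes i * c[i].
        c = [i * a for i, a in enumerate(c)][1:]
    return values


def alpha_beta_gamma(coeffs, x):
    """Floating-point estimates of (alpha, beta, gamma) at the point x."""
    d = derivatives_at(coeffs, x)
    if len(d) < 2 or d[1] == 0:
        raise ValueError("f'(x) must be nonzero")
    beta = abs(d[0] / d[1])
    gamma = max(
        (abs(d[k] / (factorial(k) * d[1])) ** (1 / (k - 1))
         for k in range(2, len(d))),
        default=0.0,  # linear polynomials have no higher derivatives
    )
    return beta * gamma, beta, gamma


def passes_alpha_test(coeffs, x):
    """True if the estimated alpha is below the threshold."""
    alpha, _, _ = alpha_beta_gamma(coeffs, x)
    return alpha < ALPHA_THRESHOLD
```

For [math]\displaystyle{ f(t)=t^2-2 }[/math], i.e. `coeffs = [-2, 0, 1]`, the candidate [math]\displaystyle{ x=1.5 }[/math] gives [math]\displaystyle{ \beta=1/12 }[/math], [math]\displaystyle{ \gamma=1/3 }[/math] and [math]\displaystyle{ \alpha=1/36 }[/math], which is below the threshold, while [math]\displaystyle{ x=1 }[/math] gives [math]\displaystyle{ \alpha=1/4 }[/math] and the test is inconclusive.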

Interval Newton and Krawczyk methods

Suppose [math]\displaystyle{ G:\mathbb{R}^n\rightarrow\mathbb{R}^n }[/math] is a function whose fixed points correspond to the roots of [math]\displaystyle{ F }[/math]. For example, the Newton operator has this property. Suppose that [math]\displaystyle{ I }[/math] is a region. Then:

  1. If [math]\displaystyle{ G }[/math] is continuous, [math]\displaystyle{ I }[/math] is compact and convex, and [math]\displaystyle{ G }[/math] maps [math]\displaystyle{ I }[/math] into itself, i.e., [math]\displaystyle{ G(I)\subseteq I }[/math], then by the Brouwer fixed-point theorem, [math]\displaystyle{ G }[/math] has at least one fixed point in [math]\displaystyle{ I }[/math], and hence [math]\displaystyle{ F }[/math] has at least one root in [math]\displaystyle{ I }[/math].
  2. If [math]\displaystyle{ G }[/math] is contractive in a region containing [math]\displaystyle{ I }[/math], then there is at most one root in [math]\displaystyle{ I }[/math].
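
The uniqueness claim in the second item can be checked directly. Assume, as one common reading of "contractive", that there is a constant [math]\displaystyle{ c\lt 1 }[/math] with [math]\displaystyle{ \|G(x)-G(x')\|\leq c\,\|x-x'\| }[/math] for all [math]\displaystyle{ x,x' }[/math] in the region. If [math]\displaystyle{ x }[/math] and [math]\displaystyle{ x' }[/math] were both fixed points of [math]\displaystyle{ G }[/math] in [math]\displaystyle{ I }[/math], then

[math]\displaystyle{ \|x-x'\| = \|G(x)-G(x')\| \leq c\,\|x-x'\|, \qquad 0\leq c\lt 1, }[/math]

which forces [math]\displaystyle{ \|x-x'\|=0 }[/math], i.e., [math]\displaystyle{ x=x' }[/math]. Since fixed points of [math]\displaystyle{ G }[/math] correspond to roots of [math]\displaystyle{ F }[/math], the region [math]\displaystyle{ I }[/math] contains at most one root.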

There are versions of the following methods over the complex numbers, but both the interval arithmetic and conditions must be adjusted to reflect this case.

Interval Newton method

In the univariate case, Newton's method can be directly generalized to certify a root over an interval. For an interval [math]\displaystyle{ J }[/math], let [math]\displaystyle{ m(J) }[/math] be the midpoint of [math]\displaystyle{ J }[/math]. Then, the interval Newton operator applied to [math]\displaystyle{ J }[/math] is

[math]\displaystyle{ IN(J)=m(J)-F(m(J))/F'(J). }[/math]

In practice, any interval containing [math]\displaystyle{ F'(J) }[/math] can be used in this computation. If [math]\displaystyle{ x\in J }[/math] is a root of [math]\displaystyle{ F }[/math], then by the mean value theorem, there is some [math]\displaystyle{ c\in J }[/math] such that [math]\displaystyle{ F(m(J))-F'(c)(m(J)-x)=F(x)=0 }[/math]. In other words, [math]\displaystyle{ F(m(J))=F'(c)(m(J)-x) }[/math]. Since [math]\displaystyle{ F'(J) }[/math] contains the value of the derivative [math]\displaystyle{ F' }[/math] at every point of [math]\displaystyle{ J }[/math], and in particular [math]\displaystyle{ F'(c) }[/math], it follows that [math]\displaystyle{ m(J)-x\in F(m(J))/F'(J) }[/math]. Therefore, [math]\displaystyle{ x=m(J)-(m(J)-x)\in IN(J) }[/math].

Furthermore, if [math]\displaystyle{ 0\not\in F'(J) }[/math], then either [math]\displaystyle{ m(J) }[/math] is a root of [math]\displaystyle{ F }[/math] and [math]\displaystyle{ IN(J)=\{m(J)\} }[/math] or [math]\displaystyle{ m(J)\not\in IN(J) }[/math]. In the latter case, [math]\displaystyle{ IN(J) }[/math] is an interval lying entirely on one side of the midpoint, so [math]\displaystyle{ J\cap IN(J) }[/math] is at most half the width of [math]\displaystyle{ J }[/math]. Consequently, if there is some root of [math]\displaystyle{ F }[/math] in [math]\displaystyle{ J }[/math], the iterative procedure of replacing [math]\displaystyle{ J }[/math] by [math]\displaystyle{ J\cap IN(J) }[/math] will converge to this root. If, on the other hand, there is no root of [math]\displaystyle{ F }[/math] in [math]\displaystyle{ J }[/math], this iterative procedure will eventually produce an empty interval, a witness to the nonexistence of roots.
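
The iteration described above can be sketched in Python as follows. This is an illustrative sketch only: the function names `interval_newton_step` and `refine` are hypothetical, the caller is assumed to supply an enclosure of [math]\displaystyle{ F'(J) }[/math] through `dF_interval`, and exact rational arithmetic (`fractions.Fraction`) is used so that no rounding error enters. The case [math]\displaystyle{ 0\in F'(J) }[/math] is not handled.

```python
from fractions import Fraction


def interval_newton_step(F, dF_interval, lo, hi):
    """Map J = [lo, hi] to J ∩ IN(J); return None if the intersection is empty."""
    m = (lo + hi) / 2
    d_lo, d_hi = dF_interval(lo, hi)  # enclosure of F'(J), supplied by the caller
    if d_lo <= 0 <= d_hi:
        raise ValueError("0 lies in F'(J); this sketch does not handle that case")
    fm = F(m)
    # F(m)/F'(J): since 1/d is monotone on an interval not containing 0,
    # the quotient set is the interval spanned by the two endpoint quotients.
    q_lo, q_hi = sorted((fm / d_lo, fm / d_hi))
    n_lo, n_hi = m - q_hi, m - q_lo  # IN(J) = m(J) - F(m(J))/F'(J)
    new_lo, new_hi = max(lo, n_lo), min(hi, n_hi)
    if new_lo > new_hi:
        return None  # empty intersection: no root of F in J
    return new_lo, new_hi


def refine(F, dF_interval, lo, hi, steps=5):
    """Apply the step repeatedly; return the final interval or None."""
    J = (Fraction(lo), Fraction(hi))
    for _ in range(steps):
        J = interval_newton_step(F, dF_interval, *J)
        if J is None:
            return None
    return J


# Example: F(x) = x^2 - 2, with F'([lo, hi]) = [2*lo, 2*hi].
F = lambda x: x * x - 2
dF = lambda lo, hi: (2 * lo, 2 * hi)
first = interval_newton_step(F, dF, Fraction(1), Fraction(2))   # (11/8, 23/16)
no_root = interval_newton_step(F, dF, Fraction(2), Fraction(3))  # None
```

Starting from [math]\displaystyle{ J=[1,2] }[/math], one step yields [math]\displaystyle{ [11/8,\,23/16] }[/math], which still contains [math]\displaystyle{ \sqrt{2} }[/math]; starting from [math]\displaystyle{ [2,3] }[/math], the intersection is empty after one step, witnessing that there is no root there.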

See interval Newton method for higher dimensional analogues of this approach.

Krawczyk method

Let [math]\displaystyle{ Y }[/math] be any invertible [math]\displaystyle{ n\times n }[/math] matrix, i.e., an element of [math]\displaystyle{ GL(n,\mathbb{R}) }[/math]. Typically, one takes [math]\displaystyle{ Y }[/math] to be an approximation to [math]\displaystyle{ F'(y)^{-1} }[/math] for a point [math]\displaystyle{ y }[/math] in the region of interest. Then, define the function [math]\displaystyle{ G(x)=x-YF(x). }[/math] We observe that [math]\displaystyle{ x }[/math] is a fixed point of [math]\displaystyle{ G }[/math] if and only if [math]\displaystyle{ x }[/math] is a root of [math]\displaystyle{ F }[/math]. Therefore the approach above can be used to identify roots of [math]\displaystyle{ F }[/math]. This approach is similar to a multivariate version of Newton's method, replacing the inverse of the derivative with the fixed matrix [math]\displaystyle{ Y }[/math].

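Write [math]\displaystyle{ g_1,\dots,g_n }[/math] for the components of [math]\displaystyle{ G }[/math]. We observe that if [math]\displaystyle{ J }[/math] is a compact and convex region and [math]\displaystyle{ y\in J }[/math], then, by the mean value theorem applied to each component, for any [math]\displaystyle{ x\in J }[/math], there exist [math]\displaystyle{ c_1,\dots,c_n\in J }[/math] such that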

[math]\displaystyle{ G(y)-G(x)=\begin{bmatrix}\nabla g_1(c_1)^T\\\vdots\\\nabla g_n(c_n)^T\end{bmatrix}(y-x). }[/math]

Let [math]\displaystyle{ G'(J) }[/math] be the Jacobian matrix of [math]\displaystyle{ G }[/math] evaluated on [math]\displaystyle{ J }[/math]. In other words, the entry [math]\displaystyle{ (G'(J))_{ij} }[/math] consists of the image of [math]\displaystyle{ \frac{\partial g_i}{\partial x_j} }[/math] over [math]\displaystyle{ J }[/math]. It then follows that [math]\displaystyle{ G(x)\in G(y)+G'(J)(x-y), }[/math] where the matrix-vector product is computed using interval arithmetic. Then, allowing [math]\displaystyle{ x }[/math] to vary in [math]\displaystyle{ J }[/math], it follows that the image of [math]\displaystyle{ G }[/math] on [math]\displaystyle{ J }[/math] satisfies the following containment: [math]\displaystyle{ G(J)\subset G(y)+G'(J)(J-y), }[/math] where the calculations are, once again, computed using interval arithmetic. Since [math]\displaystyle{ G(y)=y-YF(y) }[/math] and [math]\displaystyle{ G'(x)=I-YF'(x) }[/math], combining this with the formula for [math]\displaystyle{ G }[/math] gives the Krawczyk operator

[math]\displaystyle{ K_{y,Y}(J)=y-YF(y)+(I-YF'(J))(J-y), }[/math]

where [math]\displaystyle{ I }[/math] is the identity matrix.

If [math]\displaystyle{ K_{y,Y}(J)\subset J }[/math], then [math]\displaystyle{ G }[/math] has a fixed point in [math]\displaystyle{ J }[/math], i.e., [math]\displaystyle{ F }[/math] has a root in [math]\displaystyle{ J }[/math]. On the other hand, if every matrix in [math]\displaystyle{ I-YF'(J) }[/math] has matrix norm less than [math]\displaystyle{ 1 }[/math], where the matrix norm is the one induced by the supremum norm for vectors, then [math]\displaystyle{ G }[/math] is contractive within [math]\displaystyle{ J }[/math], so [math]\displaystyle{ G }[/math] has at most one fixed point in [math]\displaystyle{ J }[/math]. When both conditions hold, the fixed point exists and is unique.

A simpler test, when [math]\displaystyle{ J }[/math] is an axis-aligned parallelepiped, uses [math]\displaystyle{ y=m(J) }[/math], i.e., the midpoint of [math]\displaystyle{ J }[/math]. In this case, there is a unique root of [math]\displaystyle{ F }[/math] in [math]\displaystyle{ J }[/math] if

[math]\displaystyle{ \|K_{y,Y}(J)-m(J)\|\lt \frac{w(J)}{2}, }[/math]

where [math]\displaystyle{ w(J) }[/math] is the length of the longest side of [math]\displaystyle{ J }[/math].
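
The following Python sketch evaluates the Krawczyk operator in the form [math]\displaystyle{ y-YF(y)+(I-YF'(J))(J-y) }[/math], which follows from [math]\displaystyle{ G'(x)=I-YF'(x) }[/math], on a small example. It is a sketch under stated assumptions rather than a reference implementation: intervals are plain pairs of exact rationals, the helper names (`point`, `iadd`, `isub`, `imul`, `krawczyk`) are hypothetical, and the example system [math]\displaystyle{ F(x_1,x_2)=(x_1^2+x_2^2-1,\;x_1-x_2) }[/math], the box, and the matrix [math]\displaystyle{ Y }[/math] are chosen only for illustration.

```python
from fractions import Fraction as Q


def point(c):
    """Degenerate interval [c, c]."""
    return (Q(c), Q(c))


def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])


def isub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def imul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def krawczyk(F, dF_interval, Y, box):
    """Return (K, norm_bound): an enclosure of the Krawczyk operator on box with
    y = midpoint, and a bound on the sup-norm of every matrix in I - Y F'(J)."""
    n = len(box)
    y = [(lo + hi) / 2 for lo, hi in box]
    Fy = F(y)
    A = dF_interval(box)  # n x n matrix of intervals enclosing F'(J)
    K, norm_bound = [], Q(0)
    for i in range(n):
        # Row i of y - Y F(y), computed exactly.
        acc = point(y[i] - sum(Y[i][k] * Fy[k] for k in range(n)))
        row_sum = Q(0)
        for j in range(n):
            m = point(1 if i == j else 0)  # entry (i, j) of I - Y F'(J)
            for k in range(n):
                m = isub(m, imul(point(Y[i][k]), A[k][j]))
            acc = iadd(acc, imul(m, isub(box[j], point(y[j]))))
            row_sum += max(abs(m[0]), abs(m[1]))
        K.append(acc)
        norm_bound = max(norm_bound, row_sum)
    return K, norm_bound


def F(p):
    x1, x2 = p
    return [x1 * x1 + x2 * x2 - 1, x1 - x2]


def dF_interval(box):
    X1, X2 = box
    return [[imul(point(2), X1), imul(point(2), X2)], [point(1), point(-1)]]


box = [(Q(3, 5), Q(4, 5)), (Q(3, 5), Q(4, 5))]
Y = [[Q(5, 14), Q(1, 2)], [Q(5, 14), Q(-1, 2)]]  # exact inverse of F'(7/10, 7/10)
K, norm_bound = krawczyk(F, dF_interval, Y, box)
contained = all(b[0] <= k[0] and k[1] <= b[1] for k, b in zip(K, box))
```

For this example the enclosure is [math]\displaystyle{ [97/140,\,101/140] }[/math] in each coordinate, which lies inside the box [math]\displaystyle{ [3/5,\,4/5]^2 }[/math], and the norm bound is [math]\displaystyle{ 1/7\lt 1 }[/math], so both the existence and the uniqueness conditions hold for the root [math]\displaystyle{ (1/\sqrt{2},\,1/\sqrt{2}) }[/math].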

Miranda test

  • Miranda test (Yap, Vegter, Sharma)

A priori certification methods

  • Interval arithmetic (Moore, Arb, Mezzarobba)
  • Condition numbers (Beltran–Leykin)

Interval arithmetic

Main page: Interval arithmetic

Interval arithmetic can be used to provide an a priori numerical certificate by computing intervals containing unique solutions. By using intervals instead of plain numeric types during path tracking, resulting candidates are represented by intervals. The candidate solution-interval is itself the certificate, in the sense that the solution is guaranteed to be inside the interval.
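
To illustrate how replacing plain numeric types by intervals yields a guaranteed enclosure, the sketch below implements interval addition and multiplication on floating-point endpoints with outward rounding. It assumes Python 3.9 or later (for `math.nextafter`) and IEEE round-to-nearest arithmetic; the helper names `enclose`, `iadd` and `imul` are hypothetical, and dedicated libraries such as Arb handle rounding far more carefully and efficiently than this sketch.

```python
import math
from fractions import Fraction


def enclose(x):
    """Interval between the floats adjacent to x; it contains every real
    number that rounds to x, such as the decimal literal x was parsed from."""
    return (math.nextafter(x, -math.inf), math.nextafter(x, math.inf))


def iadd(a, b):
    # Push each endpoint one float outward so the rounded sum cannot
    # cut off part of the true sum.
    return (math.nextafter(a[0] + b[0], -math.inf),
            math.nextafter(a[1] + b[1], math.inf))


def imul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (math.nextafter(min(p), -math.inf),
            math.nextafter(max(p), math.inf))


# The real number 1/10 is not a float, but it lies inside enclose(0.1).
tenth = enclose(0.1)
total = (0.0, 0.0)
for _ in range(10):
    total = iadd(total, tenth)  # interval guaranteed to contain 10 * (1/10) = 1
square = imul(tenth, tenth)     # interval guaranteed to contain 1/100
```

The resulting intervals are slightly wider than necessary, which is the price paid for the guarantee that the exact values [math]\displaystyle{ 1 }[/math] and [math]\displaystyle{ 1/100 }[/math] lie inside them.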

Condition numbers

Main page: Condition number

Numerical algebraic geometry solves polynomial systems using homotopy continuation and path tracking methods. By monitoring the condition number for a tracked homotopy at every step, and ensuring that no two solution paths ever intersect, one can compute a numerical certificate along with a solution. This scheme is called a priori certified path tracking.[3]

Non-certified numerical path tracking relies on heuristic methods for controlling time step size and precision.[4] In contrast, a priori certified path tracking goes beyond heuristics to provide step size control that guarantees that for every step along the path, the current point is within the domain of quadratic convergence for the current path.

References

  1. Smale, Steve (1986). "Newton’s method estimates from data at one point". The merging of disciplines: new directions in pure, applied, and computational mathematics: 185–196. 
  2. Hauenstein, Jonathan; Sottile, Frank (2012). "Algorithm 921: alphaCertified: certifying solutions to polynomial systems". ACM Transactions on Mathematical Software 38 (4): 28. doi:10.1145/2331130.2331136. 
  3. Beltran, Carlos; Leykin, Anton (2012). "Certified numerical homotopy tracking". Experimental Mathematics 21 (1): 69–83. 
  4. Bates, Daniel; Hauenstein, Jonathan; Sommese, Andrew; Wampler, Charles (2009). "Stepsize control for path tracking". Contemporary Mathematics 496 (21).