Matrix sign function


In mathematics, the matrix sign function is a matrix function on square matrices analogous to the complex sign function.[1]

It was introduced by J. D. Roberts in 1971, in a Cambridge University technical report, as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations; the report was later published in a journal in 1980.[2][3]

Definition

The matrix sign function is a generalization of the complex signum function

[math]\displaystyle{ \operatorname{csgn}(z)= \begin{cases} 1 & \text{if } \mathrm{Re}(z) \gt 0, \\ -1 & \text{if } \mathrm{Re}(z) \lt 0, \end{cases} }[/math]

to the matrix-valued analogue [math]\displaystyle{ \operatorname{csgn}(A) }[/math]. Although the sign function is not analytic on the imaginary axis, the matrix function is well defined for all matrices that have no eigenvalues on the imaginary axis; see, for example, the Jordan-form-based definition (where the derivatives of the sign function are all zero).

Properties

Theorem: Let [math]\displaystyle{ A\in\C^{n\times n} }[/math], then [math]\displaystyle{ \operatorname{csgn}(A)^2 = I }[/math].[1]

Theorem: Let [math]\displaystyle{ A\in\C^{n\times n} }[/math], then [math]\displaystyle{ \operatorname{csgn}(A) }[/math] is diagonalizable and has eigenvalues that are [math]\displaystyle{ \pm 1 }[/math].[1]

Theorem: Let [math]\displaystyle{ A\in\C^{n\times n} }[/math], then [math]\displaystyle{ (I+\operatorname{csgn}(A))/2 }[/math] is a projector onto the invariant subspace associated with the eigenvalues in the right half-plane, and analogously for [math]\displaystyle{ (I-\operatorname{csgn}(A))/2 }[/math] and the left half-plane.[1]

Theorem: Let [math]\displaystyle{ A\in\C^{n\times n} }[/math], and let [math]\displaystyle{ A = P\begin{bmatrix}J_+ & 0 \\ 0 & J_-\end{bmatrix}P^{-1} }[/math] be a Jordan decomposition such that [math]\displaystyle{ J_+ }[/math] corresponds to eigenvalues with positive real part and [math]\displaystyle{ J_- }[/math] to eigenvalues with negative real part. Then [math]\displaystyle{ \operatorname{csgn}(A) = P\begin{bmatrix}I_+ & 0 \\ 0 & -I_-\end{bmatrix}P^{-1} }[/math], where [math]\displaystyle{ I_+ }[/math] and [math]\displaystyle{ I_- }[/math] are identity matrices of sizes corresponding to [math]\displaystyle{ J_+ }[/math] and [math]\displaystyle{ J_- }[/math], respectively.[1]
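
For a diagonalizable matrix the Jordan blocks are trivial, and the formula amounts to applying the scalar sign function to the real parts of the eigenvalues. The following NumPy sketch (an illustrative helper, not taken from the cited references) computes [math]\displaystyle{ \operatorname{csgn}(A) }[/math] in this way, assuming [math]\displaystyle{ A }[/math] is diagonalizable with no eigenvalues on the imaginary axis:

    import numpy as np

    def csgn(A):
        # Matrix sign function via eigendecomposition; assumes A is
        # diagonalizable with no eigenvalues on the imaginary axis.
        d, P = np.linalg.eig(A)
        if np.any(np.isclose(d.real, 0.0)):
            raise ValueError("eigenvalue on (or near) the imaginary axis")
        return P @ np.diag(np.sign(d.real)) @ np.linalg.inv(P)

    A = np.array([[3.0, 1.0], [0.0, -2.0]])
    S = csgn(A)
    print(np.allclose(S @ S, np.eye(2)))  # csgn(A)^2 = I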

Computational methods

The function can be computed with generic methods for matrix functions, but there are also specialized methods.

Newton iteration

The Newton iteration can be derived by observing that [math]\displaystyle{ \operatorname{csgn}(x) = \sqrt{x^2}/x }[/math], which in terms of matrices can be written as [math]\displaystyle{ \operatorname{csgn}(A) = A^{-1}\sqrt{A^2} }[/math], where we use the principal matrix square root. If we apply the Babylonian method to compute the square root of the matrix [math]\displaystyle{ A^2 }[/math], that is, the iteration [math]\displaystyle{ X_{k+1} = \frac{1}{2} \left(X_k + A^2 X_k^{-1}\right) }[/math], and define the new iterate [math]\displaystyle{ Z_k = A^{-1}X_k }[/math], which, being a rational function of [math]\displaystyle{ A }[/math], commutes with [math]\displaystyle{ A }[/math], we arrive at the iteration

[math]\displaystyle{ Z_{k+1} = \frac{1}{2}\left(Z_k + Z_k^{-1}\right) }[/math],

where typically [math]\displaystyle{ Z_0=A }[/math]. Convergence is global, and locally it is quadratic.[1][2]
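
A minimal NumPy sketch of this iteration follows; it is illustrative only, and practical implementations typically add a scaling step to accelerate the initial phase of convergence:

    import numpy as np

    def sign_newton(A, tol=1e-12, max_iter=100):
        # Newton iteration Z <- (Z + Z^{-1})/2 with Z_0 = A.
        Z = np.asarray(A, dtype=complex)
        for _ in range(max_iter):
            Z_next = 0.5 * (Z + np.linalg.inv(Z))
            if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
                return Z_next
            Z = Z_next
        return Z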

The Newton iteration requires the explicit inverse of the iterate [math]\displaystyle{ Z_k }[/math] at every step, which motivates the following inverse-free variant.

Newton–Schulz iteration

To avoid the need for an explicit inverse in the Newton iteration, the inverse can be approximated with one step of the Newton iteration for the inverse, [math]\displaystyle{ Z_k^{-1}\approx Z_k\left(2I-Z_k^2\right) }[/math], derived by Schulz in 1933.[4] Substituting this approximation into the previous method, the new method becomes

[math]\displaystyle{ Z_{k+1} = \frac{1}{2}Z_k\left(3I - Z_k^2\right) }[/math].

Convergence is (still) quadratic, but only local (guaranteed for [math]\displaystyle{ \|I-A^2\|\lt 1 }[/math]).[1]
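
An inverse-free NumPy sketch in the same illustrative spirit; unlike the Newton iteration, it is reliable only when started sufficiently close to the limit, e.g. when [math]\displaystyle{ \|I-A^2\| \lt 1 }[/math]:

    import numpy as np

    def sign_newton_schulz(A, tol=1e-12, max_iter=100):
        # Newton-Schulz iteration Z <- Z(3I - Z^2)/2 with Z_0 = A.
        # Convergence is only local, guaranteed for ||I - A^2|| < 1.
        I = np.eye(A.shape[0])
        Z = np.asarray(A, dtype=complex)
        for _ in range(max_iter):
            Z_next = 0.5 * Z @ (3.0 * I - Z @ Z)
            if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
                return Z_next
            Z = Z_next
        return Z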

Applications

Solutions of Sylvester equations

Theorem:[2][3] Let [math]\displaystyle{ A,B,C\in\R^{n\times n} }[/math] and assume that [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] are stable, that is, all of their eigenvalues have negative real part. Then the unique solution to the Sylvester equation, [math]\displaystyle{ AX +XB = C }[/math], is given by [math]\displaystyle{ X }[/math] such that

[math]\displaystyle{ \begin{bmatrix} -I &2X\\ 0 & I \end{bmatrix} = \operatorname{csgn} \left( \begin{bmatrix} A &-C\\ 0 & -B \end{bmatrix} \right). }[/math]

Proof sketch: The result follows from the similarity transform

[math]\displaystyle{ \begin{bmatrix} A &-C\\ 0 & -B \end{bmatrix} = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} A & 0\\ 0 & -B \end{bmatrix} \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}^{-1}, }[/math]

since

[math]\displaystyle{ \operatorname{csgn} \left( \begin{bmatrix} A &-C\\ 0 & -B \end{bmatrix} \right) = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} -I & 0\\ 0 & I \end{bmatrix} \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix}, }[/math]

due to the stability of [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math], which gives [math]\displaystyle{ \operatorname{csgn}(A) = -I }[/math] and [math]\displaystyle{ \operatorname{csgn}(-B) = I }[/math].

The theorem is, naturally, also applicable to the Lyapunov equation, the special case [math]\displaystyle{ B = A^T }[/math]. Moreover, due to the block structure, the Newton iteration then simplifies to involve only inverses of [math]\displaystyle{ A }[/math] and [math]\displaystyle{ A^T }[/math].
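
As an illustration of how the theorem yields an algorithm, the following sketch (reusing the illustrative sign_newton helper from above) applies the sign function to the block matrix and reads off [math]\displaystyle{ X }[/math] from the upper-right block:

    import numpy as np

    def solve_sylvester_sign(A, B, C):
        # Solve AX + XB = C for stable A and B via the matrix sign function.
        n = A.shape[0]
        M = np.block([[A, -C], [np.zeros((n, n)), -B]])
        S = sign_newton(M)      # equals [[-I, 2X], [0, I]]
        return 0.5 * S[:n, n:]  # upper-right block is 2X

    A = np.array([[-2.0, 1.0], [0.0, -3.0]])   # stable
    B = np.array([[-1.0, 0.0], [2.0, -4.0]])   # stable
    X_true = np.array([[1.0, 2.0], [3.0, 4.0]])
    C = A @ X_true + X_true @ B
    print(np.allclose(solve_sylvester_sign(A, B, C), X_true))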

Solutions of algebraic Riccati equations

There is a similar result applicable to the algebraic Riccati equation, [math]\displaystyle{ A^H P + P A - P F P + Q = 0 }[/math].[1][2] Define [math]\displaystyle{ V,W\in\Complex^{2n\times n} }[/math] as

[math]\displaystyle{ \begin{bmatrix} V & W \end{bmatrix} = \operatorname{csgn} \left( \begin{bmatrix} A^H &Q\\ F & -A \end{bmatrix} \right) - \begin{bmatrix} I &0\\ 0 & I \end{bmatrix}. }[/math]

Under the assumption that [math]\displaystyle{ F,Q\in\Complex^{n\times n} }[/math] are Hermitian and there exists a unique stabilizing solution, in the sense that [math]\displaystyle{ A-FP }[/math] is stable, that solution is given by the over-determined, but consistent, linear system

[math]\displaystyle{ VP=-W. }[/math]

Proof sketch: The similarity transform

[math]\displaystyle{ \begin{bmatrix} A^H &Q\\ F & -A \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} \begin{bmatrix} -(A-FP) & -F\\ 0 & (A-FP)^H \end{bmatrix} \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix}^{-1}, }[/math]

and the stability of [math]\displaystyle{ A-FP }[/math] imply that

[math]\displaystyle{ \left( \operatorname{csgn} \left( \begin{bmatrix} A^H &Q\\ F & -A \end{bmatrix} \right) - \begin{bmatrix} I &0\\ 0 & I \end{bmatrix} \right) \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} = \begin{bmatrix} P & -I\\ I & 0 \end{bmatrix} \begin{bmatrix} 0 & Y \\ 0 & -2I \end{bmatrix}, }[/math]

for some matrix [math]\displaystyle{ Y\in\Complex^{n\times n} }[/math]. Equating the first block columns on both sides yields [math]\displaystyle{ VP + W = 0 }[/math], that is, [math]\displaystyle{ VP = -W }[/math].
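
The following sketch turns the result into a solver, again reusing the illustrative sign_newton helper from above; the over-determined but consistent system [math]\displaystyle{ VP=-W }[/math] is solved by least squares:

    import numpy as np

    def solve_riccati_sign(A, F, Q):
        # Solve A^H P + P A - P F P + Q = 0, assuming F, Q Hermitian
        # and a unique stabilizing solution (A - FP stable) exists.
        n = A.shape[0]
        H = np.block([[A.conj().T, Q], [F, -A]])
        M = sign_newton(H) - np.eye(2 * n)
        V, W = M[:, :n], M[:, n:]
        P, *_ = np.linalg.lstsq(V, -W, rcond=None)
        return P

    # Scalar example: 2p - p^2 + 1 = 0, stabilizing root p = 1 + sqrt(2).
    A = np.array([[1.0]]); F = np.array([[1.0]]); Q = np.array([[1.0]])
    print(solve_riccati_sign(A, F, Q))  # approx. 2.4142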

Computation of the matrix square root

The Denman–Beavers iteration for the square root of a matrix can be derived from the Newton iteration for the matrix sign function by noticing that [math]\displaystyle{ A - PIP=0 }[/math] is a degenerate algebraic Riccati equation,[3] whose solution [math]\displaystyle{ P }[/math] is by definition a square root of [math]\displaystyle{ A }[/math].
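
A minimal sketch of the resulting coupled iteration, assuming [math]\displaystyle{ A }[/math] has no eigenvalues on the closed negative real axis; the iterates [math]\displaystyle{ Y_k }[/math] and [math]\displaystyle{ Z_k }[/math] converge to [math]\displaystyle{ A^{1/2} }[/math] and [math]\displaystyle{ A^{-1/2} }[/math], respectively:

    import numpy as np

    def sqrtm_denman_beavers(A, tol=1e-12, max_iter=100):
        # Denman-Beavers iteration: Y_0 = A, Z_0 = I,
        # Y <- (Y + Z^{-1})/2, Z <- (Z + Y^{-1})/2.
        Y = np.asarray(A, dtype=complex)
        Z = np.eye(A.shape[0], dtype=complex)
        for _ in range(max_iter):
            Y_next = 0.5 * (Y + np.linalg.inv(Z))
            Z_next = 0.5 * (Z + np.linalg.inv(Y))
            if np.linalg.norm(Y_next - Y, 1) <= tol * np.linalg.norm(Y_next, 1):
                return Y_next
            Y, Z = Y_next, Z_next
        return Y

    A = np.array([[4.0, 1.0], [0.0, 9.0]])
    Y = sqrtm_denman_beavers(A)
    print(np.allclose(Y @ Y, A))  # Y is the principal square root of A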

References

  1. Higham, Nicholas J. (2008). Functions of Matrices: Theory and Computation. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). ISBN 978-0-89871-777-8. OCLC 693957820. https://www.worldcat.org/oclc/693957820.
  2. Roberts, J. D. (October 1980). "Linear model reduction and solution of the algebraic Riccati equation by use of the sign function". International Journal of Control 32 (4): 677–687. doi:10.1080/00207178008922881. ISSN 0020-7179. https://www.tandfonline.com/doi/full/10.1080/00207178008922881.
  3. Denman, Eugene D.; Beavers, Alex N. (1976). "The matrix sign function and computations in systems". Applied Mathematics and Computation 2 (1): 63–94. doi:10.1016/0096-3003(76)90020-5. ISSN 0096-3003. https://www.sciencedirect.com/science/article/abs/pii/0096300376900205.
  4. Schulz, Günther (1933). "Iterative Berechung der reziproken Matrix" [Iterative computation of the inverse matrix]. ZAMM – Zeitschrift für Angewandte Mathematik und Mechanik 13 (1): 57–59. doi:10.1002/zamm.19330130111. ISSN 1521-4001. Bibcode: 1933ZaMM...13...57S. https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19330130111.