Sylvester's criterion

In mathematics, Sylvester's criterion is a necessary and sufficient condition for a Hermitian matrix to be positive-definite. It is named after James Joseph Sylvester.

Sylvester's criterion states that an n × n Hermitian matrix M is positive-definite if and only if all the following matrices have a positive determinant:

  • the upper left 1-by-1 corner of M,
  • the upper left 2-by-2 corner of M,
  • the upper left 3-by-3 corner of M,
  • [math]\displaystyle{ {}\quad\vdots }[/math]
  • M itself.

In other words, all of the leading principal minors must be positive. By using appropriate permutations of rows and columns of M, it can also be shown that the positivity of any nested sequence of n principal minors of M is equivalent to M being positive-definite.[1]
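
The criterion translates directly into a numerical test. The following minimal Python/NumPy sketch (not part of the original article; the helper name and example matrix are arbitrary choices) checks the leading principal minors and compares the result with the eigenvalue characterization:

```python
import numpy as np

def sylvester_positive_definite(M):
    """Sylvester's criterion: a Hermitian M is positive-definite iff
    every leading principal minor det(M[:k, :k]) is positive."""
    n = M.shape[0]
    return all(np.linalg.det(M[:k, :k]).real > 0 for k in range(1, n + 1))

# Example: a symmetric (hence Hermitian) positive-definite matrix.
M = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

print(sylvester_positive_definite(M))       # True
print(np.all(np.linalg.eigvalsh(M) > 0))    # True: agrees with the eigenvalue test
```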

An analogous theorem holds for characterizing positive-semidefinite Hermitian matrices, except that it is no longer sufficient to consider only the leading principal minors: a Hermitian matrix M is positive-semidefinite if and only if all principal minors of M are nonnegative.[2][3]
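
The standard 2 × 2 counterexample below illustrates why the leading principal minors alone do not suffice in the semidefinite case. This sketch is illustrative only; the helper `all_principal_minors_nonneg` is a name chosen here, not a library function:

```python
import numpy as np
from itertools import combinations

def all_principal_minors_nonneg(M):
    """Semidefinite test: every principal minor (determinant of the
    submatrix on an arbitrary index subset) must be nonnegative."""
    n = M.shape[0]
    for r in range(1, n + 1):
        for idx in combinations(range(n), r):
            if np.linalg.det(M[np.ix_(idx, idx)]).real < 0:
                return False
    return True

A = np.array([[0.0, 0.0],
              [0.0, -1.0]])
# Both leading principal minors are 0, hence nonnegative, yet A is not
# positive semidefinite: the principal minor on the index set {2} is -1.
print(all_principal_minors_nonneg(A))        # False
print(np.all(np.linalg.eigvalsh(A) >= 0))    # False
```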

Simple proof for special case

Suppose [math]\displaystyle{ M_n }[/math] is an [math]\displaystyle{ n\times n }[/math] Hermitian matrix, i.e. [math]\displaystyle{ M_n^{\dagger}=M_n }[/math]. Let [math]\displaystyle{ M_k,\ k=1,\ldots,n }[/math] be the leading principal submatrices, i.e. the [math]\displaystyle{ k\times k }[/math] upper left corner matrices. It will be shown that if [math]\displaystyle{ M_n }[/math] is positive definite, then the leading principal minors are positive; that is, [math]\displaystyle{ \det M_k\gt 0 }[/math] for all [math]\displaystyle{ k }[/math].

Suppose [math]\displaystyle{ M_n }[/math] is positive definite. Then each [math]\displaystyle{ M_k }[/math] is positive definite: indeed, choosing

[math]\displaystyle{ x=\left( \begin{array}{c} x_1\\ \vdots\\ x_k\\ 0\\ \vdots\\0\end{array} \right) = \left( \begin{array}{c} \vec{x}\\ 0\\ \vdots\\ 0 \end{array} \right) }[/math]

we can notice that [math]\displaystyle{ 0\lt x^{\dagger} M_n x=\vec{x}^{\dagger}M_k\vec{x} }[/math] for every nonzero [math]\displaystyle{ \vec{x} }[/math], so [math]\displaystyle{ M_k }[/math] is positive definite. Equivalently, the eigenvalues of [math]\displaystyle{ M_k }[/math] are positive, and this implies that [math]\displaystyle{ \det M_k\gt 0 }[/math] since the determinant is the product of the eigenvalues.

To prove the reverse implication, we use induction. The general form of an [math]\displaystyle{ (n+1) \times (n+1) }[/math] Hermitian matrix is

[math]\displaystyle{ M_{n+1}= \left( \begin{array}{cc}M_n&\vec{v}\\ \vec{v}^{\dagger}&d\end{array}\right) \qquad (*) }[/math],

where [math]\displaystyle{ M_n }[/math] is an [math]\displaystyle{ n \times n }[/math] Hermitian matrix, [math]\displaystyle{ \vec{v} }[/math] is a vector and [math]\displaystyle{ d }[/math] is a real constant.

Suppose the criterion holds for [math]\displaystyle{ M_n }[/math], and assume that all the leading principal minors of [math]\displaystyle{ M_{n+1} }[/math] are positive. Then [math]\displaystyle{ \det M_{n+1}\gt 0 }[/math] and [math]\displaystyle{ \det M_n\gt 0 }[/math], and [math]\displaystyle{ M_n }[/math] is positive definite by the inductive hypothesis (the leading principal minors of [math]\displaystyle{ M_n }[/math] are among those of [math]\displaystyle{ M_{n+1} }[/math]). Denote

[math]\displaystyle{ x=\left( \begin{array}{c}\vec{x}\\ x_{n+1}\end{array} \right) }[/math]

then

[math]\displaystyle{ x^{\dagger}M_{n+1}x=\vec{x}^{\dagger}M_n\vec{x}+x_{n+1}\vec{x}^{\dagger}\vec{v}+\bar{x}_{n+1}\vec{v}^{\dagger}\vec{x}+d|x_{n+1}|^2 }[/math]

By completing the square, this last expression is equal to

[math]\displaystyle{ (\vec{x}^{\dagger}+\bar{x}_{n+1}\vec{v}^{\dagger}M_n^{-1})M_n(\vec{x}+x_{n+1}M_n^{-1}\vec{v})-|x_{n+1}|^2\vec{v}^{\dagger}M_n^{-1}\vec{v}+d|x_{n+1}|^2 }[/math]
[math]\displaystyle{ =(\vec{x}+\vec{c})^{\dagger}M_n(\vec{x}+\vec{c})+|x_{n+1}|^2(d-\vec{v}^{\dagger}M_n^{-1}\vec{v}) }[/math]

where [math]\displaystyle{ \vec{c}=x_{n+1}M_n^{-1}\vec{v} }[/math] (note that [math]\displaystyle{ M_n^{-1} }[/math] exists because the eigenvalues of [math]\displaystyle{ M_n }[/math] are all positive). The first term is non-negative by the inductive hypothesis, and strictly positive whenever [math]\displaystyle{ \vec{x}+\vec{c}\neq 0 }[/math]. We now examine the sign of the second term. By using the block matrix determinant formula

[math]\displaystyle{ \det \left(\begin{array}{cc}A&B\\C&D\end{array} \right)=\det A\det(D-CA^{-1}B) }[/math]

on [math]\displaystyle{ (*) }[/math] we obtain

[math]\displaystyle{ \det M_{n+1}=\det M_n\,(d-\vec{v}^{\dagger}M_n^{-1}\vec{v})\gt 0 }[/math], which, since [math]\displaystyle{ \det M_n\gt 0 }[/math], implies [math]\displaystyle{ d-\vec{v}^{\dagger}M_n^{-1}\vec{v}\gt 0 }[/math].

Consequently, for any nonzero [math]\displaystyle{ x }[/math]: if [math]\displaystyle{ x_{n+1}\neq 0 }[/math], the second term above is strictly positive and the first is non-negative; if [math]\displaystyle{ x_{n+1}=0 }[/math], then [math]\displaystyle{ \vec{c}=0 }[/math] and [math]\displaystyle{ \vec{x}\neq 0 }[/math], so the first term is strictly positive. In either case [math]\displaystyle{ x^{\dagger}M_{n+1}x\gt 0, }[/math] so [math]\displaystyle{ M_{n+1} }[/math] is positive definite, completing the induction.
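
Both the completed-square identity and the block determinant formula used above are easy to verify numerically. The following sketch is illustrative only; the construction of [math]\displaystyle{ M_{n+1} }[/math], [math]\displaystyle{ \vec{v} }[/math] and [math]\displaystyle{ d }[/math] is an arbitrary choice that satisfies the hypotheses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a Hermitian positive-definite M_n and a bordered M_{n+1} as in (*).
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Mn = B.conj().T @ B + n * np.eye(n)                   # Hermitian positive definite
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
d = (v.conj() @ np.linalg.solve(Mn, v)).real + 1.0    # makes d - v† Mn^{-1} v = 1 > 0
Mn1 = np.block([[Mn, v[:, None]], [v[None, :].conj(), d]])

x = rng.standard_normal(n + 1) + 1j * rng.standard_normal(n + 1)
xv, xlast = x[:n], x[n]

# Left side: x† M_{n+1} x; right side: completed square plus Schur-complement term.
lhs = (x.conj() @ Mn1 @ x).real
c = xlast * np.linalg.solve(Mn, v)
schur = d - (v.conj() @ np.linalg.solve(Mn, v)).real
rhs = ((xv + c).conj() @ Mn @ (xv + c)).real + abs(xlast) ** 2 * schur
print(np.isclose(lhs, rhs))                           # True: completed-square identity

# Block determinant formula: det M_{n+1} = det M_n * (d - v† Mn^{-1} v).
print(np.isclose(np.linalg.det(Mn1).real,
                 np.linalg.det(Mn).real * schur))     # True
```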

Proof for general case

The previous proof applies only to nonsingular Hermitian matrices with coefficients in [math]\displaystyle{ \mathbb{R} }[/math], and therefore only to nonsingular real-symmetric matrices.

Theorem I: A real-symmetric matrix A has nonnegative eigenvalues if and only if A can be factored as [math]\displaystyle{ A = B^TB }[/math], and all eigenvalues are positive if and only if B is nonsingular.[4]

Proof:

Forward implication: If [math]\displaystyle{ A \in \mathbb{R}^{n\times n} }[/math] is symmetric, then, by the spectral theorem, there is an orthogonal matrix P such that [math]\displaystyle{ A = PDP^T }[/math], where [math]\displaystyle{ D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) }[/math] is a real diagonal matrix whose entries are the eigenvalues of A, and the columns of P are corresponding eigenvectors of A. If [math]\displaystyle{ \lambda_i \geq 0 }[/math] for each i, then [math]\displaystyle{ D^{1/2} }[/math] exists, so [math]\displaystyle{ A = PDP^T = PD^{1/2}D^{1/2}P^T = B^TB }[/math] for [math]\displaystyle{ B = D^{1/2}P^T }[/math]; and [math]\displaystyle{ \lambda_i \gt 0 }[/math] for each i if and only if B is nonsingular.

Reverse implication: Conversely, if A can be factored as [math]\displaystyle{ A = B^TB }[/math], then all eigenvalues of A are nonnegative because for any eigenpair [math]\displaystyle{ (\lambda, x) }[/math]:

[math]\displaystyle{ \lambda = {\frac{x^TAx}{x^Tx}}={\frac{x^TB^TBx}{x^Tx}}={\frac{\|Bx\|^2}{\|x\|^2}} \ge 0. }[/math]
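
The construction in the forward implication can be carried out with a symmetric eigendecomposition. A minimal sketch, assuming a real-symmetric positive-semidefinite input (the helper name and example matrix are chosen here for illustration):

```python
import numpy as np

def spectral_sqrt_factor(A):
    """For symmetric A with nonnegative eigenvalues, return B with A = B^T B,
    built as B = D^{1/2} P^T from the spectral decomposition A = P D P^T."""
    eigvals, P = np.linalg.eigh(A)        # columns of P are orthonormal eigenvectors
    assert np.all(eigvals >= -1e-12), "A must have nonnegative eigenvalues"
    B = np.diag(np.sqrt(np.clip(eigvals, 0.0, None))) @ P.T
    return B

A = np.array([[4.0, 2.0], [2.0, 3.0]])
B = spectral_sqrt_factor(A)
print(np.allclose(B.T @ B, A))            # True
```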

Theorem II (The Cholesky decomposition): The symmetric matrix A possesses positive pivots if and only if A can be uniquely factored as [math]\displaystyle{ A = R^TR }[/math], where R is an upper-triangular matrix with positive diagonal entries. This is known as the Cholesky decomposition of A, and R is called the Cholesky factor of A.[5]

Proof:

Forward implication: If A possesses positive pivots (therefore A possesses an LU factorization [math]\displaystyle{ A = LU' }[/math]), then it has an LDU factorization [math]\displaystyle{ A = LDU }[/math] in which [math]\displaystyle{ D = \operatorname{diag}(u_{11}, u_{22}, \ldots, u_{nn}) }[/math] is the diagonal matrix containing the pivots [math]\displaystyle{ u_{ii} \gt 0 }[/math]:

[math]\displaystyle{ \begin{align} \mathbf{A} & =LU'= \begin{bmatrix} 1 & 0 & \cdots & 0\\ \ell_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ \ell_{n1} & \ell_{n2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n}\\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{bmatrix} \\[8pt] & = LDU= \begin{bmatrix} 1 & 0 & \cdots & 0\\ \ell_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ \ell_{n1} & \ell_{n2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} u_{11} & 0 & \cdots & 0\\ 0 & u_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{bmatrix} \begin{bmatrix} 1 & u_{12}/u_{11} & \cdots & u_{1n}/u_{11} \\ 0 & 1 & \cdots & u_{2n}/u_{22} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \end{align} }[/math]

By a uniqueness property of the LDU decomposition, the symmetry of A yields [math]\displaystyle{ U = L^T }[/math]; consequently [math]\displaystyle{ A = LDU = LDL^T }[/math]. Setting [math]\displaystyle{ R = D^{1/2}L^T }[/math], where [math]\displaystyle{ D^{1/2} = \operatorname{diag}(\sqrt{u_{11}},\sqrt{u_{22}},\ldots,\sqrt{u_{nn}}) }[/math], yields the desired factorization, because [math]\displaystyle{ A = LD^{1/2}D^{1/2}L^T = R^TR }[/math], and R is upper triangular with positive diagonal entries.

Reverse implication: Conversely, if [math]\displaystyle{ A = RR^T }[/math], where R is lower triangular with a positive diagonal, then we can factor the diagonal entries out of R as follows:

[math]\displaystyle{ \mathbf{R} =LD= \begin{bmatrix} 1 & 0 & \cdots & 0\\ r_{21}/r_{11} & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ r_{n1}/r_{11} & r_{n2}/r_{22} & \cdots & 1 \end{bmatrix} \begin{bmatrix} r_{11} & 0 & \cdots & 0\\ 0 & r_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{bmatrix}. }[/math]

Here [math]\displaystyle{ R = LD }[/math], where L is a lower triangular matrix with a unit diagonal and D is the diagonal matrix whose diagonal entries are the [math]\displaystyle{ r_{ii} }[/math]'s. Since D has a positive diagonal, it is nonsingular, and hence [math]\displaystyle{ D^2 }[/math] is a nonsingular diagonal matrix. Also, [math]\displaystyle{ L^T }[/math] is an upper triangular matrix with a unit diagonal. Consequently, [math]\displaystyle{ A = LD^2L^T }[/math] is the LDU factorization of A, and thus the pivots must be positive because they are the diagonal entries of [math]\displaystyle{ D^2 }[/math].

Uniqueness of the Cholesky decomposition: If we have another Cholesky decomposition [math]\displaystyle{ A = R_1R_1^T }[/math] of A, where [math]\displaystyle{ R_1 }[/math] is lower triangular with a positive diagonal, then, as above, we may write [math]\displaystyle{ R_1 = L_1D_1 }[/math], where [math]\displaystyle{ L_1 }[/math] is a lower triangular matrix with a unit diagonal and [math]\displaystyle{ D_1 }[/math] is a diagonal matrix whose diagonal entries are the same as the corresponding diagonal entries of [math]\displaystyle{ R_1 }[/math]. Consequently, [math]\displaystyle{ A = L_1D_1^2L_1^T }[/math] is an LDU factorization of A. By the uniqueness of the LDU factorization of A, we have [math]\displaystyle{ L_1 = L }[/math] and [math]\displaystyle{ D_1^2 = D^2 }[/math]. As both [math]\displaystyle{ D_1 }[/math] and D are diagonal matrices with positive diagonal entries, we have [math]\displaystyle{ D_1 = D }[/math]. Hence [math]\displaystyle{ R_1 = L_1D_1 = LD = R }[/math], so A has a unique Cholesky decomposition.
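
In practice the Cholesky factor is computed directly; for instance, NumPy's `np.linalg.cholesky` returns the lower-triangular factor L, so R = L^T in the notation of Theorem II. A short illustrative sketch (the example matrix is an arbitrary positive-definite choice):

```python
import numpy as np

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])

L = np.linalg.cholesky(A)        # lower triangular with positive diagonal
R = L.T                          # upper-triangular Cholesky factor: A = R^T R
print(np.allclose(R.T @ R, A))   # True
print(np.all(np.diag(R) > 0))    # True: positive diagonal, hence the unique factor
```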

Theorem III: Let [math]\displaystyle{ A_k }[/math] be the k × k leading principal submatrix of the n × n matrix A. If A has an LU factorization A = LU, where L is a lower triangular matrix with a unit diagonal, then [math]\displaystyle{ \det(A_k) = u_{11}u_{22}\cdots u_{kk} }[/math], and the k-th pivot is [math]\displaystyle{ u_{kk} = \det(A_1) = a_{11} }[/math] for k = 1 and [math]\displaystyle{ u_{kk} = \det(A_k)/\det(A_{k-1}) }[/math] for k = 2, 3, . . . , n, where [math]\displaystyle{ u_{kk} }[/math] is the (k, k)-th entry of U.[6]
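
Theorem III connects pivots to minors; for a positive-definite matrix the pivots are also the squares of the diagonal entries of the Cholesky factor, since A = R^TR = LDL^T with D = diag(r_{11}^2, . . . , r_{nn}^2). A sketch checking both facts on an arbitrary example:

```python
import numpy as np

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
n = A.shape[0]

# Pivots as ratios of consecutive leading principal minors (Theorem III).
minors = [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]
pivots = [minors[0]] + [minors[k] / minors[k - 1] for k in range(1, n)]

# For positive-definite A, the pivots equal the squared Cholesky diagonal.
R = np.linalg.cholesky(A).T
print(np.allclose(pivots, np.diag(R) ** 2))   # True
```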

Combining Theorem II with Theorem III yields:

Statement I: If the symmetric matrix A can be factored as [math]\displaystyle{ A=R^TR }[/math] where R is an upper-triangular matrix with positive diagonal entries, then all the pivots of A are positive (by Theorem II), therefore all the leading principal minors of A are positive (by Theorem III).

Statement II: If the nonsingular n × n symmetric matrix A can be factored as [math]\displaystyle{ A=B^TB }[/math], then the QR decomposition (closely related to the Gram–Schmidt process) of B (B = QR) yields [math]\displaystyle{ A=B^TB=R^TQ^TQR=R^TR }[/math], where Q is an orthogonal matrix and R is an upper triangular matrix.

As A is nonsingular and [math]\displaystyle{ A=R^TR }[/math], all the diagonal entries [math]\displaystyle{ r_{jj} }[/math], j = 1, 2, . . . , n, of R are nonzero.

Let F be the diagonal matrix whose (j, j)-th entry is [math]\displaystyle{ f_{jj} = 1 }[/math] if [math]\displaystyle{ r_{jj} \gt 0 }[/math] and [math]\displaystyle{ f_{jj} = -1 }[/math] if [math]\displaystyle{ r_{jj} \lt 0 }[/math], for all j = 1, 2, . . . , n. Then [math]\displaystyle{ F^TF=I_n }[/math], the n × n identity matrix.

Let S = FR. Then S is an upper-triangular matrix whose diagonal entries are all positive, and [math]\displaystyle{ A=R^TR=R^TF^TFR=S^TS }[/math].
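
The sign-fixing matrix F corresponds to normalizing a QR factorization so that the triangular factor has a positive diagonal. The following sketch (illustrative only, with an arbitrary random nonsingular B) carries out the construction of Statement II:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))     # nonsingular with probability 1
A = B.T @ B                         # nonsingular symmetric matrix

Q, R = np.linalg.qr(B)              # R may have negative diagonal entries
F = np.diag(np.sign(np.diag(R)))    # diagonal +/-1 matrix, so F^T F = I
S = F @ R                           # upper triangular with positive diagonal
print(np.allclose(S.T @ S, A))      # True: A = R^T R = S^T S
```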

Note that Statement II requires the nonsingularity of the symmetric matrix A.

Combining Theorem I with Statement I and Statement II yields:

Statement III: If the real-symmetric matrix A is positive definite, then A possesses a factorization of the form [math]\displaystyle{ A = B^TB }[/math], where B is nonsingular (Theorem I). By Statement II, A then possesses a factorization of the form [math]\displaystyle{ A = R^TR }[/math], where R is an upper-triangular matrix with positive diagonal entries, and therefore all the leading principal minors of A are positive (Statement I).

In other words, Statement III proves the "only if" part of Sylvester's Criterion for non-singular real-symmetric matrices.

Sylvester's Criterion: The real-symmetric matrix A is positive definite if and only if all the leading principal minors of A are positive.
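
As a sanity check of the criterion as a whole, one can compare the minor test with the eigenvalue definition on random symmetric matrices. A self-contained illustrative sketch (agreement is exact up to floating-point round-off near singular cases):

```python
import numpy as np

rng = np.random.default_rng(2)

def leading_minors_positive(A):
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

agree = 0
for _ in range(1000):
    X = rng.standard_normal((4, 4))
    A = (X + X.T) / 2                # random symmetric matrix
    agree += leading_minors_positive(A) == bool(np.all(np.linalg.eigvalsh(A) > 0))
print(agree, "of 1000 random symmetric matrices agree")   # expect 1000
```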

Proof for the case of positive semidefinite matrices

Let [math]\displaystyle{ M_n }[/math] be an n × n Hermitian matrix. Suppose [math]\displaystyle{ M_n }[/math] is positive semidefinite. Essentially the same proof as for the case that [math]\displaystyle{ M_n }[/math] is strictly positive definite shows that all principal minors (not only the leading principal minors) are non-negative.

For the reverse implication, it suffices to show that if all the principal minors of [math]\displaystyle{ M_n }[/math] are non-negative, then for all t > 0, all leading principal minors of the Hermitian matrix [math]\displaystyle{ M_n+tI_n }[/math] are strictly positive, where [math]\displaystyle{ I_n }[/math] is the n × n identity matrix. Indeed, from the positive definite case, we would know that the matrices [math]\displaystyle{ M_n+tI_n }[/math] are strictly positive definite. Since the limit of positive definite matrices is always positive semidefinite, we can take [math]\displaystyle{ t \to 0 }[/math] to conclude.

To show this, let [math]\displaystyle{ M_k }[/math] be the kth leading principal submatrix of [math]\displaystyle{ M_n. }[/math] We know that [math]\displaystyle{ q_k(t) = \det(M_k + tI_k) }[/math] is a polynomial in t, related to the characteristic polynomial [math]\displaystyle{ p_{M_k} }[/math] via [math]\displaystyle{ q_k(t) = (-1)^kp_{M_k}(-t). }[/math] Using the identity that expresses the coefficients of the characteristic polynomial in terms of traces of exterior powers, we can write [math]\displaystyle{ q_k(t) = \sum_{j=0}^k t^{k-j} \operatorname{tr}\left(\textstyle\bigwedge^j M_k\right), }[/math] where [math]\displaystyle{ \operatorname{tr}\left(\bigwedge^j M_k\right) }[/math] is the trace of the jth exterior power of [math]\displaystyle{ M_k. }[/math]

From the theory of minors, we know that the entries of the matrix of [math]\displaystyle{ \bigwedge^j M_k }[/math] (for j > 0) are just the j × j minors of [math]\displaystyle{ M_k. }[/math] In particular, the diagonal entries are the principal minors of [math]\displaystyle{ M_k }[/math], which of course are also principal minors of [math]\displaystyle{ M_n }[/math] and are thus non-negative. Since the trace of a matrix is the sum of the diagonal entries, it follows that [math]\displaystyle{ \operatorname{tr}\left(\textstyle\bigwedge^j M_k\right) \geq 0. }[/math] Thus the coefficient of [math]\displaystyle{ t^{k-j} }[/math] in [math]\displaystyle{ q_k(t) }[/math] is non-negative for all j > 0. For j = 0, the coefficient is clearly 1. In particular, [math]\displaystyle{ q_k(t) \gt 0 }[/math] for all t > 0, which is what was required to show.
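
This coefficient argument can be illustrated numerically: the coefficients of [math]\displaystyle{ q_k(t) = \det(tI_k + M_k) }[/math] are the coefficients of the characteristic polynomial of [math]\displaystyle{ -M_k }[/math], and for a positive-semidefinite matrix they are all non-negative. A sketch with a rank-deficient example matrix chosen for illustration:

```python
import numpy as np

# A rank-deficient positive-semidefinite matrix (eigenvalues 0 and 2).
M = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# np.poly(-M) gives the coefficients of det(tI - (-M)) = det(tI + M) = q(t),
# in decreasing powers of t; these are the traces of the exterior powers of M.
coeffs = np.poly(-M)
print(coeffs)                       # [1. 2. 0.] : all non-negative
print(np.all(coeffs >= -1e-12))     # True, hence q(t) > 0 for every t > 0
```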

Notes

  1. Horn, Roger A.; Johnson, Charles R. (1985), Matrix Analysis, Cambridge University Press, ISBN 978-0-521-38632-6 . See Theorem 7.2.5.
  2. Carl D. Meyer, Matrix Analysis and Applied Linear Algebra. See section 7.6 Positive Definite Matrices, page 566
  3. Prussing, John E. (1986), "The Principal Minor Test for Semidefinite Matrices", Journal of Guidance, Control, and Dynamics 9 (1): 121–122, doi:10.2514/3.20077, Bibcode: 1986JGCD....9..121P, archived from the original on 2017-01-07, https://web.archive.org/web/20170107084552/http://prussing.ae.illinois.edu/semidef.pdf, retrieved 2017-09-28
  4. Carl D. Meyer, Matrix Analysis and Applied Linear Algebra. See section 7.6 Positive Definite Matrices, page 558
  5. Carl D. Meyer, Matrix Analysis and Applied Linear Algebra. See section 3.10 The LU Factorization, Example 3.10.7, page 154
  6. Carl D. Meyer, Matrix Analysis and Applied Linear Algebra. See section 6.1 Determinants, Exercise 6.1.16, page 474
