Cayley–Hamilton theorem
In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over a commutative ring (such as the real or complex numbers or the integers) satisfies its own characteristic equation.
If A is a given n × n matrix and I_{n} is the n × n identity matrix, then the characteristic polynomial of A is defined as^{[7]} [math]\displaystyle{ p_A(\lambda)=\det(\lambda I_n-A) }[/math], where det is the determinant operation and λ is a variable for a scalar element of the base ring. Since the entries of the matrix [math]\displaystyle{ (\lambda I_n-A) }[/math] are (linear or constant) polynomials in λ, the determinant is also a degree-n monic polynomial in λ, [math]\displaystyle{ p_A(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0~. }[/math] One can create an analogous polynomial [math]\displaystyle{ p_A(A) }[/math] in the matrix A instead of the scalar variable λ, defined as [math]\displaystyle{ p_A(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1A + c_0I_n~. }[/math] The Cayley–Hamilton theorem states that this polynomial expression is equal to the zero matrix, which is to say that [math]\displaystyle{ p_A(A) = \mathbf 0 }[/math].
The theorem allows A^{n} to be expressed as a linear combination of the lower matrix powers of A. When the ring is a field, the Cayley–Hamilton theorem is equivalent to the statement that the minimal polynomial of a square matrix divides its characteristic polynomial.
The theorem was first proven in 1853^{[8]} in terms of inverses of linear functions of quaternions, a noncommutative ring, by Hamilton.^{[4]}^{[5]}^{[6]} This corresponds to the special case of certain 4 × 4 real or 2 × 2 complex matrices. The theorem holds for general quaternionic matrices.^{[9]}^{[nb 1]} Cayley in 1858 stated it for 3 × 3 and smaller matrices, but only published a proof for the 2 × 2 case.^{[2]} The general case was first proved by Ferdinand Frobenius in 1878.^{[10]}
Examples
1 × 1 matrices
For a 1 × 1 matrix A = (a), the characteristic polynomial is given by p(λ) = λ − a, and so p(A) = (a) − a(1) = 0 is trivial.
2 × 2 matrices
As a concrete example, let
 [math]\displaystyle{ A = \begin{pmatrix}1&2\\3&4\end{pmatrix}. }[/math]
Its characteristic polynomial is given by
 [math]\displaystyle{ \begin{align} p(\lambda) &= \det(\lambda I_2-A) = \det\!\begin{pmatrix}\lambda-1&-2\\-3&\lambda-4\end{pmatrix} \\ &=(\lambda-1)(\lambda-4)-(-2)(-3)=\lambda^2-5\lambda-2. \end{align} }[/math]
The Cayley–Hamilton theorem claims that, if we define
 [math]\displaystyle{ p(X)=X^2-5X-2I_2, }[/math]
then
 [math]\displaystyle{ p(A)=A^2-5A-2I_2 = \begin{pmatrix}0&0\\0&0\\\end{pmatrix}. }[/math]
We can verify by computation that indeed,
 [math]\displaystyle{ A^2-5A-2I_2 = \begin{pmatrix}7&10\\15&22\\\end{pmatrix}-\begin{pmatrix}5&10\\15&20\\\end{pmatrix}-\begin{pmatrix}2&0\\0&2\\\end{pmatrix}=\begin{pmatrix}0&0\\0&0\\\end{pmatrix}. }[/math]
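The same verification can be carried out mechanically. The following is a minimal sketch in plain Python; the helper name matmul is an illustrative choice and not part of any library.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
I2 = [[1, 0], [0, 1]]
A2 = matmul(A, A)  # [[7, 10], [15, 22]]

# p(A) = A^2 - 5A - 2I, evaluated entry by entry
p_of_A = [[A2[i][j] - 5 * A[i][j] - 2 * I2[i][j] for j in range(2)]
          for i in range(2)]
print(p_of_A)  # [[0, 0], [0, 0]]
```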
For a generic 2 × 2 matrix,
 [math]\displaystyle{ A=\begin{pmatrix}a&b\\c&d\\\end{pmatrix} , }[/math]
the characteristic polynomial is given by p(λ) = λ^{2} − (a + d)λ + (ad − bc), so the Cayley–Hamilton theorem states that
 [math]\displaystyle{ p(A)=A^2-(a+d)A+(ad-bc)I_2=\begin{pmatrix}0&0\\0&0\\\end{pmatrix}; }[/math]
which is indeed always the case, evident by working out the entries of A^{2}.
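Proof
Working out the entries gives
 [math]\displaystyle{ A^2 = \begin{pmatrix}a^2+bc & ab+bd\\ ca+dc & cb+d^2\end{pmatrix}, \qquad (a+d)A = \begin{pmatrix}a^2+ad & ab+bd\\ ca+dc & ad+d^2\end{pmatrix}, }[/math]
so that
 [math]\displaystyle{ A^2-(a+d)A+(ad-bc)I_2 = \begin{pmatrix}bc-ad & 0\\ 0 & bc-ad\end{pmatrix} + \begin{pmatrix}ad-bc & 0\\ 0 & ad-bc\end{pmatrix} = \begin{pmatrix}0&0\\0&0\end{pmatrix}. }[/math]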
Applications
Determinant and inverse matrix
For a general n × n invertible matrix A, i.e., one with nonzero determinant, A^{−1} can be written as a polynomial expression of degree n − 1 in A. As indicated, the Cayley–Hamilton theorem amounts to the identity
[math]\displaystyle{ p(A)=A^n+c_{n-1}A^{n-1}+\cdots+c_1A+(-1)^n\det(A)I_n =0. }[/math]
The coefficients c_{i} are given by the elementary symmetric polynomials of the eigenvalues of A. Using Newton identities, the elementary symmetric polynomials can in turn be expressed in terms of power sum symmetric polynomials of the eigenvalues:
 [math]\displaystyle{ s_k = \sum_{i=1}^n \lambda_i^k = \operatorname{tr}(A^k), }[/math]
where tr(A^{k}) is the trace of the matrix A^{k}. Thus, we can express c_{i} in terms of the trace of powers of A.
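One way to make this concrete is the recursive form of Newton's identities, k c_{n−k} = −(c_{n−k+1}s_{1} + ⋯ + c_{n}s_{k}) with c_{n} = 1. The sketch below assumes exact rational arithmetic (the division by k requires a ring containing the rational numbers); the function names are illustrative only.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def charpoly_from_traces(A):
    """Return [c_0, ..., c_{n-1}, 1] computed from s_k = tr(A^k)."""
    n = len(A)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    s = [Fraction(0)]  # s[0] is a placeholder so that s[k] = tr(A^k)
    for _ in range(n):
        power = matmul(power, A)
        s.append(sum(power[i][i] for i in range(n)))
    c = [Fraction(0)] * n + [Fraction(1)]  # c[n] = 1 (monic polynomial)
    for k in range(1, n + 1):
        # Newton's identity: k*c_{n-k} = -(c_{n-k+1}*s_1 + ... + c_n*s_k)
        c[n - k] = -sum(c[n - k + i] * s[i] for i in range(1, k + 1)) / k
    return c

coeffs = charpoly_from_traces([[1, 2], [3, 4]])
print([int(x) for x in coeffs])  # [-2, -5, 1], i.e. p = x^2 - 5x - 2
```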
In general, the formula for the coefficients c_{i} is given in terms of complete exponential Bell polynomials as^{[nb 2]}
 [math]\displaystyle{ c_{n-k} = \frac{(-1)^{k}}{k!} B_k(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{k-1}(k-1)! s_k). }[/math]
In particular, the determinant of A equals (−1)^{n}c_{0}. Thus, the determinant can be written as the trace identity:
 [math]\displaystyle{ \det(A) = \frac{1}{n!} B_n(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{n-1}(n-1)! s_n). }[/math]
Likewise, the Cayley–Hamilton identity can be rewritten as
 [math]\displaystyle{ -(-1)^n\det(A)I_n = A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_{1}I_n), }[/math]
and, by multiplying both sides by A^{−1} (note −(−1)^{n} = (−1)^{n−1}), one is led to an expression for the inverse of A as a trace identity,
 [math]\displaystyle{ \begin{align} A^{-1} & = \frac{(-1)^{n-1}}{\det A}(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_{1}I_n), \\[5pt] & = \frac{1}{\det A}\sum_{k=0}^{n-1} (-1)^{n+k-1}\frac{A^{n-k-1}}{k!} B_k(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{k-1}(k-1)! s_k). \end{align} }[/math]
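The first form of this inverse formula can be sketched in code. Since det(A) = (−1)^{n}c_{0}, it is equivalent to A^{−1} = −(A^{n−1} + c_{n−1}A^{n−2} + ⋯ + c_{1}I_{n})/c_{0}. The example below assumes the coefficients c_{i} are already known and passed in as a list; the function name is hypothetical, and exact rational arithmetic is used.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_via_cayley_hamilton(A, c):
    """Invert A given c = [c_0, ..., c_{n-1}, 1], its characteristic coefficients."""
    n = len(A)
    if c[0] == 0:
        raise ValueError("c_0 = 0, so det(A) = 0 and A is not invertible")
    # Horner scheme: after the loop, M = A^{n-1} + c_{n-1} A^{n-2} + ... + c_1 I
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for i in range(n - 1, 0, -1):
        M = matmul(M, A)
        for j in range(n):
            M[j][j] += c[i]
    # A * M = -c_0 * I by the theorem, hence A^{-1} = -M / c_0
    return [[-x / c[0] for x in row] for row in M]

A = [[1, 2], [3, 4]]
A_inv = inverse_via_cayley_hamilton(A, [-2, -5, 1])
print(matmul(A, A_inv) == [[1, 0], [0, 1]])  # True
```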
Another method for obtaining these coefficients c_{k} for a general n × n matrix, provided no root be zero, relies on the following alternative expression for the determinant,
 [math]\displaystyle{ p(\lambda)= \det (\lambda I_n - A) = \lambda^n \exp (\operatorname{tr} (\log (I_n - A/\lambda))). }[/math]
Hence, by virtue of the Mercator series,
 [math]\displaystyle{ p(\lambda)= \lambda^n \exp \left( -\operatorname{tr} \sum_{m=1}^\infty {({A\over\lambda})^m \over m} \right), }[/math]
where the exponential only needs be expanded to order λ^{−n}, since p(λ) is of order n, the net negative powers of λ automatically vanishing by the C–H theorem. (Again, this requires a ring containing the rational numbers.) Differentiation of this expression with respect to λ allows one to express the coefficients of the characteristic polynomial for general n as determinants of m × m matrices,^{[nb 3]}
 [math]\displaystyle{ c_{n-m} = \frac{(-1)^m}{m!} \begin{vmatrix} \operatorname{tr}A & m-1 &0&\cdots\\ \operatorname{tr}A^2 &\operatorname{tr}A& m-2 &\cdots\\ \vdots & \vdots & & & \vdots \\ \operatorname{tr}A^{m-1} &\operatorname{tr}A^{m-2}& \cdots & \cdots & 1 \\ \operatorname{tr}A^m &\operatorname{tr}A^{m-1}& \cdots & \cdots & \operatorname{tr}A \end{vmatrix} ~. }[/math]
 Examples
For instance, the first few Bell polynomials are B_{0} = 1, B_{1}(x_{1}) = x_{1}, B_{2}(x_{1}, x_{2}) = x_{1}^{2} + x_{2}, and B_{3}(x_{1}, x_{2}, x_{3}) = x_{1}^{3} + 3 x_{1}x_{2} + x_{3}.
Using these to specify the coefficients c_{i} of the characteristic polynomial of a 2 × 2 matrix yields
 [math]\displaystyle{ \begin{align} c_2 = B_0 = 1, \\[4pt] c_1 = \frac{-1}{1!} B_1(s_1) = - s_1 = - \operatorname{tr}(A), \\[4pt] c_0 = \frac{1}{2!} B_2(s_1, -1! s_2) = \frac{1}{2}(s_1^2 - s_2) = \frac{1}{2}((\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)). \end{align} }[/math]
The coefficient c_{0} gives the determinant of the 2 × 2 matrix and c_{1} the negative of its trace, while its inverse is given by
 [math]\displaystyle{ A^{-1} = \frac{-1}{\det A}(A + c_1 I_2) = \frac{-2(A - \operatorname{tr}(A) I_2)}{(\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)}. }[/math]
It is apparent from the general formula for c_{n−k}, expressed in terms of Bell polynomials, that the expressions
 [math]\displaystyle{ -\operatorname{tr}(A)\quad \text{and} \quad \tfrac 1 2 (\operatorname{tr}(A)^2 - \operatorname{tr}(A^2)) }[/math]
always give the coefficients c_{n−1} of λ^{n−1} and c_{n−2} of λ^{n−2} in the characteristic polynomial of any n × n matrix, respectively. So, for a 3 × 3 matrix A, the statement of the Cayley–Hamilton theorem can also be written as
 [math]\displaystyle{ A^3- (\operatorname{tr}A)A^2+\frac{1}{2}\left((\operatorname{tr}A)^2-\operatorname{tr}(A^2)\right)A-\det(A)I_3=O, }[/math]
where the right-hand side designates a 3 × 3 matrix with all entries equal to zero. Likewise, the determinant in the n = 3 case is now
 [math]\displaystyle{ \begin{align} \det(A) &= \frac{1}{3!} B_3(s_1, -1! s_2, 2! s_3) = \frac{1}{6}(s_1^3 + 3 s_1 (-s_2) + 2 s_3) \\[5pt] &= \tfrac{1}{6} \left ( (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3) \right ). \end{align} }[/math]
This expression gives the negative of coefficient c_{n−3} of λ^{n−3} in the general case, as seen below.
Similarly, one can write for a 4 × 4 matrix A,
 [math]\displaystyle{ A^4-(\operatorname{tr}A)A^3 + \tfrac{1}{2}\bigl((\operatorname{tr}A)^2-\operatorname{tr}(A^2)\bigr)A^2 - \tfrac{1}{6}\bigl( (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3)\bigr)A + \det(A)I_4 = O, }[/math]
where, now, the determinant is c_{n−4},
 [math]\displaystyle{ \tfrac{1}{24}\! \left( (\operatorname{tr}A)^4-6 \operatorname{tr}(A^2)(\operatorname{tr}A)^2+3(\operatorname{tr}(A^2))^2+8\operatorname{tr}(A^3)\operatorname{tr}(A) - 6\operatorname{tr}(A^4) \right), }[/math]
and so on for larger matrices. The increasingly complex expressions for the coefficients c_{k} are deducible from Newton's identities or the Faddeev–LeVerrier algorithm.
nth power of matrix
The Cayley–Hamilton theorem always provides a relationship between the powers of A (though not always the simplest one), which allows one to simplify expressions involving such powers, and evaluate them without having to compute the power A^{n} or any higher powers of A.
As an example, for [math]\displaystyle{ A = \begin{pmatrix}1&2\\3&4\end{pmatrix} }[/math] the theorem gives
 [math]\displaystyle{ A^2=5A+2I_2\, . }[/math]
Then, to calculate A^{4}, observe
 [math]\displaystyle{ A^3=(5A+2I_2)A=5A^2+2A=5(5A+2I_2)+2A=27A+10I_2, }[/math]
 [math]\displaystyle{ A^4=A^3A=(27A+10I_2)A=27A^2+10A=27(5A+2I_2)+10A=145A+54I_2\, . }[/math]
Likewise,
 [math]\displaystyle{ A^{-1}=\frac{A-5I_2}{2}~. }[/math]
 [math]\displaystyle{ A^{-2}=A^{-1}A^{-1}=\frac{A^2-10A+25I_2}{4}=\frac{(5A+2I_2)-10A+25I_2}{4}=\frac{-5A+27I_2}{4}~. }[/math]
Notice that we have been able to write the matrix power as the sum of two terms. In fact, a matrix power of any order k can be written as a matrix polynomial of degree at most n − 1, where n is the size of the square matrix. This is an instance where the Cayley–Hamilton theorem can be used to express a matrix function, which we will discuss below systematically.
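The reduction used above can be sketched as a short recurrence. For a 2 × 2 matrix the theorem gives A^{2} = (tr A)A − (det A)I_{2}, so if A^{k} = αA + βI_{2} then A^{k+1} = (α tr A + β)A − (α det A)I_{2}. The function name below is an illustrative assumption.

```python
def power_coeffs(tr, det, k):
    """Return (alpha, beta) with A^k = alpha*A + beta*I for a 2x2 matrix A
    having the given trace and determinant, for k >= 0."""
    alpha, beta = 0, 1  # A^0 = I
    for _ in range(k):
        # multiply by A and replace A^2 by tr*A - det*I
        alpha, beta = alpha * tr + beta, -alpha * det
    return alpha, beta

# For A = [[1, 2], [3, 4]]: trace 5, determinant -2
print(power_coeffs(5, -2, 2))  # (5, 2)
print(power_coeffs(5, -2, 3))  # (27, 10)
print(power_coeffs(5, -2, 4))  # (145, 54)
```

Only scalar arithmetic is needed per step, which is the practical point of the reduction: no matrix products are formed until the final combination αA + βI_{2}.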
Matrix functions
Given an analytic function
 [math]\displaystyle{ f(x) = \sum_{k=0}^\infty a_k x^k }[/math]
and the characteristic polynomial p(x) of degree n of an n × n matrix A, the function can be expressed using long division as
 [math]\displaystyle{ f(x) = q(x) p(x) + r(x), }[/math]
where q(x) is some quotient polynomial and r(x) is a remainder polynomial such that 0 ≤ deg r(x) < n.
By the Cayley–Hamilton theorem, replacing x by the matrix A gives p(A) = 0, so one has
 [math]\displaystyle{ f(A) = r(A). }[/math]
Thus, the analytic function of the matrix A can be expressed as a matrix polynomial of degree less than n.
Let the remainder polynomial be
 [math]\displaystyle{ r(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}. }[/math]
Since p(λ) = 0, evaluating the function f(x) at the n eigenvalues of A yields
 [math]\displaystyle{ f(\lambda_i) = r(\lambda_i) = c_0 + c_1 \lambda_i + \cdots + c_{n-1} \lambda_i^{n-1}, \qquad \text{for } i=1,2,...,n. }[/math]
This amounts to a system of n linear equations, which can be solved to determine the coefficients c_{i}. Thus, one has
 [math]\displaystyle{ f(A) = \sum_{k=0}^{n-1} c_k A^k. }[/math]
When the eigenvalues are repeated, that is λ_{i} = λ_{j} for some i ≠ j, two or more equations are identical, and hence the linear equations cannot be solved uniquely. In such cases, for an eigenvalue λ of multiplicity m, the first m − 1 derivatives of p(x) vanish at the eigenvalue. This leads to m − 1 additional linearly independent equations
 [math]\displaystyle{ \frac{\mathrm{d}^k f(x)}{\mathrm{d}x^k}\Big|_{x=\lambda} = \frac{\mathrm{d}^k r(x)}{\mathrm{d}x^k}\Big|_{x=\lambda}\qquad \text{for } k = 1, 2, \ldots, m-1, }[/math]
which, combined with others, yield the required n equations to solve for c_{i}.
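As a small worked illustration of the repeated-eigenvalue case (the matrix is chosen here purely for illustration), take f(A) = e^{At} with
 [math]\displaystyle{ A = \begin{pmatrix}2&1\\0&2\end{pmatrix}, }[/math]
whose characteristic polynomial p(x) = (x − 2)^{2} has the double eigenvalue λ = 2. With r(x) = c_{0} + c_{1}x and f(x) = e^{xt}, the value condition and the first-derivative condition at x = 2 read
 [math]\displaystyle{ e^{2t} = c_0 + 2c_1, \qquad t e^{2t} = c_1, }[/math]
so that c_{1} = te^{2t}, c_{0} = (1 − 2t)e^{2t}, and
 [math]\displaystyle{ e^{At} = c_0 I_2 + c_1 A = e^{2t}\begin{pmatrix}1 & t\\ 0 & 1\end{pmatrix}. }[/math]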
Finding a polynomial that passes through the points (λ_{i}, f (λ_{i})) is essentially an interpolation problem, and can be solved using Lagrange or Newton interpolation techniques, leading to Sylvester's formula.
For example, suppose the task is to find the polynomial representation of
 [math]\displaystyle{ f(A) = e^{At} \qquad \mathrm{where} \qquad A = \begin{pmatrix}1&2\\0&3\end{pmatrix}. }[/math]
The characteristic polynomial is p(x) = (x − 1)(x − 3) = x^{2} − 4x + 3, and the eigenvalues are λ = 1, 3. Let r(x) = c_{0} + c_{1}x. Evaluating f(λ) = r(λ) at the eigenvalues, one obtains two linear equations, e^{t} = c_{0} + c_{1} and e^{3t} = c_{0} + 3c_{1}.
Solving the equations yields c_{0} = (3e^{t} − e^{3t})/2 and c_{1} = (e^{3t} − e^{t})/2. Thus, it follows that
 [math]\displaystyle{ e^{At} = c_0 I_2 + c_1 A = \begin{pmatrix}c_0 + c_1 & 2 c_1\\ 0 & c_0 + 3 c_1\end{pmatrix} = \begin{pmatrix}e^{t} & e^{3t} - e^{t} \\ 0 & e^{3t}\end{pmatrix}. }[/math]
If, instead, the function were f(A) = sin At, then the coefficients would have been c_{0} = (3 sin t − sin 3t)/2 and c_{1} = (sin 3t − sin t)/2; hence
 [math]\displaystyle{ \sin(At) = c_0 I_2 + c_1 A = \begin{pmatrix}\sin t & \sin 3t - \sin t \\ 0 & \sin 3t\end{pmatrix}. }[/math]
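The exponential example can be checked numerically by comparing the two-term expression c_{0}I_{2} + c_{1}A against a truncated power series for e^{At}. This is only a sanity check under illustrative assumptions: the value t = 0.5, the number of series terms, and the helper names are arbitrary choices.

```python
import math

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_series(A, t, terms=60):
    """Truncated power series: sum of (tA)^k / k! for k < terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (tA)^k / k!
        term = [[x * t / k for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[1.0, 2.0], [0.0, 3.0]]
t = 0.5
c0 = (3 * math.exp(t) - math.exp(3 * t)) / 2
c1 = (math.exp(3 * t) - math.exp(t)) / 2
closed_form = [[c0 + c1 * A[0][0], c1 * A[0][1]],
               [c1 * A[1][0], c0 + c1 * A[1][1]]]
series = expm_series(A, t)
max_error = max(abs(closed_form[i][j] - series[i][j])
                for i in range(2) for j in range(2))
print(max_error < 1e-9)  # True
```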
As a further example, when considering
 [math]\displaystyle{ f(A) = e^{At} \qquad \mathrm{where} \qquad A = \begin{pmatrix}0 & 1\\-1 & 0\end{pmatrix}, }[/math]
then the characteristic polynomial is p(x) = x^{2} + 1, and the eigenvalues are λ = ±i.
As before, evaluating the function at the eigenvalues gives the linear equations e^{it} = c_{0} + ic_{1} and e^{−it} = c_{0} − ic_{1}, whose solution is c_{0} = (e^{it} + e^{−it})/2 = cos t and c_{1} = (e^{it} − e^{−it})/2i = sin t. Thus, for this case,
 [math]\displaystyle{ e^{At} = (\cos t) I_2 + (\sin t) A = \begin{pmatrix}\cos t & \sin t\\ -\sin t & \cos t \end{pmatrix}, }[/math]
which is a rotation matrix.
A standard example of such usage is the exponential map from the Lie algebra of a matrix Lie group into the group. It is given by a matrix exponential,
 [math]\displaystyle{ \exp: \mathfrak g \rightarrow G; \qquad tX \mapsto e^{tX} = \sum_{n=0}^\infty \frac{t^nX^n}{n!} = I + tX + \frac{t^2X^2}{2} + \cdots, t \in \mathbb R, X \in \mathfrak g . }[/math]
Such expressions have long been known for SU(2),
 [math]\displaystyle{ e^{i(\theta/2)(\hat n \cdot \sigma)} = I_2 \cos \theta/2 + i(\hat n \cdot \sigma) \sin \theta/2, }[/math]
where the σ are the Pauli matrices and for SO(3),
 [math]\displaystyle{ e^{i\theta(\hat n \cdot \mathbf J)} = I_3 + i(\hat n \cdot \mathbf J) \sin \theta + (\hat n \cdot \mathbf J)^2 (\cos \theta - 1), }[/math]
which is Rodrigues' rotation formula. For the notation, see 3D rotation group#A note on Lie algebras.
More recently, expressions have appeared for other groups, like the Lorentz group SO(3, 1),^{[11]} O(4, 2)^{[12]} and SU(2, 2),^{[13]} as well as GL(n, R).^{[14]} The group O(4, 2) is the conformal group of spacetime, SU(2, 2) its simply connected cover (to be precise, the simply connected cover of the connected component SO^{+}(4, 2) of O(4, 2)). The expressions obtained apply to the standard representation of these groups. They require knowledge of (some of) the eigenvalues of the matrix to exponentiate. For SU(2) (and hence for SO(3)), closed expressions have been obtained for all irreducible representations, i.e. of any spin.^{[15]}
Algebraic number theory
The Cayley–Hamilton theorem is an effective tool for computing the minimal polynomial of algebraic integers. For example, given a finite extension [math]\displaystyle{ \mathbb{Q}[\alpha_1,\ldots,\alpha_k] }[/math] of [math]\displaystyle{ \mathbb{Q} }[/math] and an algebraic integer [math]\displaystyle{ \alpha \in \mathbb{Q}[\alpha_1,\ldots,\alpha_k] }[/math] which is a nonzero linear combination of the [math]\displaystyle{ \alpha_1^{n_1}\cdots\alpha_k^{n_k} }[/math], we can compute the minimal polynomial of [math]\displaystyle{ \alpha }[/math] by finding a matrix representing the [math]\displaystyle{ \mathbb{Q} }[/math]-linear transformation
 [math]\displaystyle{ \cdot \alpha : \mathbb{Q}[\alpha_1,\ldots,\alpha_k] \to \mathbb{Q}[\alpha_1,\ldots,\alpha_k] }[/math]
If we call this transformation matrix [math]\displaystyle{ A }[/math], then we can find the minimal polynomial by applying the Cayley–Hamilton theorem to [math]\displaystyle{ A }[/math].^{[16]}
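As an illustration (the specific number is chosen here and is not taken from the cited source), consider α = √2 + √3 in the extension with basis (1, √2, √3, √6). Multiplication by α sends 1 to √2 + √3, √2 to 2 + √6, √3 to 3 + √6, and √6 to 3√2 + 2√3, and these images form the columns of the matrix below. The sketch computes its characteristic polynomial from traces via Newton's identities, with hypothetical helper names. In general the characteristic polynomial is only guaranteed to be a multiple of the minimal polynomial; here the two coincide because α generates the whole degree-4 extension.

```python
import math
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def charpoly_from_traces(A):
    """Return [c_0, ..., c_{n-1}, 1] using s_k = tr(A^k) and Newton's identities."""
    n = len(A)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    s = [Fraction(0)]  # placeholder so that s[k] = tr(A^k)
    for _ in range(n):
        power = matmul(power, A)
        s.append(sum(power[i][i] for i in range(n)))
    c = [Fraction(0)] * n + [Fraction(1)]
    for k in range(1, n + 1):
        c[n - k] = -sum(c[n - k + i] * s[i] for i in range(1, k + 1)) / k
    return c

# Multiplication by alpha = sqrt(2) + sqrt(3) in the basis (1, sqrt2, sqrt3, sqrt6);
# column j holds the coordinates of alpha times the j-th basis element.
M = [[0, 2, 3, 0],
     [1, 0, 0, 3],
     [1, 0, 0, 2],
     [0, 1, 1, 0]]

coeffs = [int(c) for c in charpoly_from_traces(M)]
print(coeffs)  # [1, 0, -10, 0, 1], i.e. x^4 - 10x^2 + 1

# Floating-point check that alpha is a root of this polynomial
alpha = math.sqrt(2) + math.sqrt(3)
residual = sum(c * alpha ** k for k, c in enumerate(coeffs))
print(abs(residual) < 1e-9)  # True
```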
Proofs
The Cayley–Hamilton theorem is an immediate consequence of the existence of the Jordan normal form for matrices over algebraically closed fields, see Jordan normal form § Cayley–Hamilton theorem. In this section, direct proofs are presented.
As the examples above show, obtaining the statement of the Cayley–Hamilton theorem for an n × n matrix
 [math]\displaystyle{ A = (a_{ij})_{i,j=1}^n }[/math]
requires two steps: first the coefficients c_{i} of the characteristic polynomial are determined by development as a polynomial in t of the determinant
 [math]\displaystyle{ \begin{align} p(t) & = \det(t I_n - A) = \begin{vmatrix}t-a_{1,1}&-a_{1,2}&\cdots&-a_{1,n} \\ -a_{2,1}&t-a_{2,2}&\cdots&-a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n,1}&-a_{n,2}& \cdots& t-a_{n,n} \end{vmatrix} \\[5pt] & = t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0, \end{align} }[/math]
and then these coefficients are used in a linear combination of powers of A that is equated to the n × n zero matrix:
 [math]\displaystyle{ A^n+c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}. }[/math]
The left-hand side can be worked out to an n × n matrix whose entries are (enormous) polynomial expressions in the set of entries a_{i,j} of A, so the Cayley–Hamilton theorem states that each of these n^{2} expressions equals 0. For any fixed value of n, these identities can be obtained by tedious but straightforward algebraic manipulations. None of these computations, however, can show why the Cayley–Hamilton theorem should be valid for matrices of all possible sizes n, so a uniform proof for all n is needed.
Preliminaries
If a vector v of size n is an eigenvector of A with eigenvalue λ, in other words if A⋅v = λv, then
 [math]\displaystyle{ \begin{align} p(A)\cdot v & = A^n\cdot v+c_{n-1}A^{n-1}\cdot v+\cdots+c_1A\cdot v+c_0I_n\cdot v \\[6pt] & = \lambda^nv+c_{n-1}\lambda^{n-1}v+\cdots+c_1\lambda v+c_0 v=p(\lambda)v, \end{align} }[/math]
which is the zero vector since p(λ) = 0 (the eigenvalues of A are precisely the roots of p(t)). This holds for all possible eigenvalues λ, so the two matrices equated by the theorem certainly give the same (null) result when applied to any eigenvector. Now if A admits a basis of eigenvectors, in other words if A is diagonalizable, then the Cayley–Hamilton theorem must hold for A, since two matrices that give the same values when applied to each element of a basis must be equal.
 [math]\displaystyle{ A=XDX^{-1}, \quad D=\operatorname{diag}(\lambda_i), \quad i=1,2,...,n }[/math]
 [math]\displaystyle{ p_A(\lambda)=|\lambda I-A|=\prod_{i=1}^n (\lambda-\lambda_i)\equiv \sum_{k=0}^n c_k\lambda^k }[/math]
 [math]\displaystyle{ p_A(A)=\sum c_k A^k=X p_A(D)X^{-1}=X C X^{-1} }[/math]
 [math]\displaystyle{ C_{ii}=\sum_{k=0}^n c_k\lambda_i^k=\prod_{j=1}^n(\lambda_i-\lambda_j)=0, \qquad C_{i,j\neq i}=0 }[/math]
 [math]\displaystyle{ \therefore p_A(A)=XCX^{-1}=O . }[/math]
Consider now the function [math]\displaystyle{ e\colon M_n \to M_n }[/math] which maps n × n matrices to n × n matrices given by the formula [math]\displaystyle{ e(A)=p_A(A) }[/math], i.e. which takes a matrix [math]\displaystyle{ A }[/math] and plugs it into its own characteristic polynomial. Not all matrices are diagonalizable, but for matrices with complex coefficients many of them are: the set [math]\displaystyle{ D }[/math] of diagonalizable complex square matrices of a given size is dense in the set of all such square matrices^{[17]} (for a matrix to be diagonalizable it suffices for instance that its characteristic polynomial not have any multiple roots). Now viewed as a function [math]\displaystyle{ e\colon \C^{n^2}\to \C ^{n^{2}} }[/math] (since matrices have [math]\displaystyle{ n^2 }[/math] entries) we see that this function is continuous. This is true because the entries of the image of a matrix are given by polynomials in the entries of the matrix. Since
[math]\displaystyle{ e(D) = \left\{\begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}\right\} }[/math]
and since the set [math]\displaystyle{ D }[/math] is dense, by continuity this function must map the entire set of n × n matrices to the zero matrix. Therefore, the Cayley–Hamilton theorem is true for complex matrices, and must therefore also hold for [math]\displaystyle{ \Q }[/math]- or [math]\displaystyle{ \R }[/math]-valued matrices.
While this provides a valid proof, the argument is not very satisfactory, since the identities represented by the theorem do not in any way depend on the nature of the matrix (diagonalizable or not), nor on the kind of entries allowed (for matrices with real entries the diagonalizable ones do not form a dense set, and it seems strange one would have to consider complex matrices to see that the Cayley–Hamilton theorem holds for them). We shall therefore now consider only arguments that prove the theorem directly for any matrix using algebraic manipulations only; these also have the benefit of working for matrices with entries in any commutative ring.
There is a great variety of such proofs of the Cayley–Hamilton theorem, of which several will be given here. They vary in the amount of abstract algebraic notions required to understand the proof. The simplest proofs use just those notions needed to formulate the theorem (matrices, polynomials with numeric entries, determinants), but involve technical computations that render somewhat mysterious the fact that they lead precisely to the correct conclusion. It is possible to avoid such details, but at the price of involving more subtle algebraic notions: polynomials with coefficients in a noncommutative ring, or matrices with unusual kinds of entries.
Adjugate matrices
All proofs below use the notion of the adjugate matrix adj(M) of an n × n matrix M, the transpose of its cofactor matrix. This is a matrix whose coefficients are given by polynomial expressions in the coefficients of M (in fact, by certain (n − 1) × (n − 1) determinants), in such a way that the following fundamental relations hold,
 [math]\displaystyle{ \operatorname{adj}(M)\cdot M=\det(M)I_n=M\cdot\operatorname{adj}(M)~. }[/math]
These relations are a direct consequence of the basic properties of determinants: evaluation of the (i, j) entry of the matrix product on the left gives the expansion by column j of the determinant of the matrix obtained from M by replacing column i by a copy of column j, which is det(M) if i = j and zero otherwise; the matrix product on the right is similar, but for expansions by rows.
Being a consequence of just algebraic expression manipulation, these relations are valid for matrices with entries in any commutative ring (commutativity must be assumed for determinants to be defined in the first place). This is important to note here, because these relations will be applied below for matrices with nonnumeric entries such as polynomials.
A direct algebraic proof
This proof uses just the kind of objects needed to formulate the Cayley–Hamilton theorem: matrices with polynomials as entries. The matrix t I_{n} − A whose determinant is the characteristic polynomial of A is such a matrix, and since polynomials form a commutative ring, it has an adjugate
 [math]\displaystyle{ B=\operatorname{adj}(tI_n-A). }[/math]
Then, according to the right-hand fundamental relation of the adjugate, one has
 [math]\displaystyle{ (t I_n - A)B = \det(t I_n - A) I_n = p(t) I_n. }[/math]
Since B is also a matrix with polynomials in t as entries, one can, for each i, collect the coefficients of t^{i} in each entry to form a matrix B_{i} of numbers, such that one has
 [math]\displaystyle{ B = \sum_{i = 0}^{n - 1} t^i B_i. }[/math]
(The way the entries of B are defined makes clear that no powers higher than t^{n−1} occur). While this looks like a polynomial with matrices as coefficients, we shall not consider such a notion; it is just a way to write a matrix with polynomial entries as a linear combination of n constant matrices, and the coefficient t^{i} has been written to the left of the matrix to stress this point of view.
Now, one can expand the matrix product in our equation by bilinearity:
 [math]\displaystyle{ \begin{align} p(t) I_n &= (t I_n - A)B \\ &=(t I_n - A)\sum_{i = 0}^{n - 1} t^i B_i \\ &=\sum_{i = 0}^{n - 1} tI_n\cdot t^i B_i - \sum_{i = 0}^{n - 1} A\cdot t^i B_i \\ &=\sum_{i = 0}^{n - 1} t^{i + 1} B_i- \sum_{i = 0}^{n - 1} t^i AB_i \\ &=t^n B_{n - 1} + \sum_{i = 1}^{n - 1} t^i(B_{i - 1} - AB_i) - AB_0. \end{align} }[/math]
Writing
 [math]\displaystyle{ p(t)I_n=t^nI_n+t^{n-1}c_{n-1}I_n+\cdots+tc_1I_n+c_0I_n, }[/math]
one obtains an equality of two matrices with polynomial entries, written as linear combinations of constant matrices with powers of t as coefficients.
Such an equality can hold only if in any matrix position the entry that is multiplied by a given power t^{i} is the same on both sides; it follows that the constant matrices with coefficient t^{i} in both expressions must be equal. Writing these equations then for i from n down to 0, one finds
 [math]\displaystyle{ B_{n - 1} = I_n, \qquad B_{i - 1} - AB_i = c_i I_n\quad \text{for }1 \leq i \leq n-1, \qquad -A B_0 = c_0 I_n. }[/math]
Finally, multiply the equation of the coefficients of t^{i} from the left by A^{i}, and sum up:
[math]\displaystyle{ A^n B_{n-1} + \sum\limits_{i=1}^{n-1}\left( A^i B_{i-1} - A^{i+1}B_i\right) -A B_0 = A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I_n. }[/math]
The left-hand sides form a telescoping sum and cancel completely; the right-hand sides add up to [math]\displaystyle{ p(A) }[/math]:
 [math]\displaystyle{ 0 = p(A). }[/math]
This completes the proof.
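The coefficient equations of this proof can be traced on a concrete matrix. In the sketch below the 3 × 3 matrix is an arbitrary illustrative choice, and its characteristic polynomial t^{3} − 6t^{2} + 11t − 7 was worked out by hand for it. The code builds B_{n−1}, …, B_{0} from the equations for t^{n} down to t^{1}, and then checks that the remaining equation for t^{0} holds, which is exactly the content of the theorem.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]
# Coefficients [c_0, c_1, c_2, 1] of t^3 - 6t^2 + 11t - 7 (computed by hand for this A)
c = [-7, 11, -6, 1]
n = len(A)
identity = [[int(i == j) for j in range(n)] for i in range(n)]

B = [None] * n
B[n - 1] = identity  # coefficient of t^n: B_{n-1} = I
for i in range(n - 1, 0, -1):
    # coefficient of t^i: B_{i-1} - A B_i = c_i I, so B_{i-1} = A B_i + c_i I
    AB = matmul(A, B[i])
    B[i - 1] = [[AB[r][s] + c[i] * identity[r][s] for s in range(n)]
                for r in range(n)]

# coefficient of t^0: -A B_0 should equal c_0 I
constant_term = [[-x for x in row] for row in matmul(A, B[0])]
print(B[0])  # [[3, 1, -3], [-1, 2, 1], [1, -2, 6]]
print(constant_term == [[c[0] * e for e in row] for row in identity])  # True
```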
A proof using polynomials with matrix coefficients
This proof is similar to the first one, but tries to give meaning to the notion of polynomial with matrix coefficients that was suggested by the expressions occurring in that proof. This requires considerable care, since it is somewhat unusual to consider polynomials with coefficients in a noncommutative ring, and not all reasoning that is valid for commutative polynomials can be applied in this setting.
Notably, while arithmetic of polynomials over a commutative ring models the arithmetic of polynomial functions, this is not the case over a noncommutative ring (in fact there is no obvious notion of polynomial function in this case that is closed under multiplication). So when considering polynomials in t with matrix coefficients, the variable t must not be thought of as an "unknown", but as a formal symbol that is to be manipulated according to given rules; in particular one cannot just set t to a specific value.
Evaluation does remain compatible with addition, since for polynomials f and g with coefficients f_{i} and g_{i} one has
 [math]\displaystyle{ (f+g)(x) = \sum_i \left (f_i+g_i \right )x^i = \sum_i{f_i x^i} + \sum_i{g_i x^i} = f(x) + g(x). }[/math]
It is compatibility with multiplication that can fail.
Let [math]\displaystyle{ M(n,R) }[/math] be the ring of n × n matrices with entries in some ring R (such as the real or complex numbers) that has A as an element. Matrices whose entries are polynomials in t, such as [math]\displaystyle{ tI_n - A }[/math] or its adjugate B in the first proof, are elements of [math]\displaystyle{ M(n,R[t]) }[/math].
By collecting like powers of t, such matrices can be written as "polynomials" in t with constant matrices as coefficients; write [math]\displaystyle{ M(n,R)[t] }[/math] for the set of such polynomials. Since this set is in bijection with [math]\displaystyle{ M(n,R[t]) }[/math], one defines arithmetic operations on it correspondingly, in particular multiplication is given by
 [math]\displaystyle{ \left( \sum_iM_it^i \right)\!\!\left( \sum_jN_jt^j \right) = \sum_{i,j}(M_i N_j)t^{i+j}, }[/math]
respecting the order of the coefficient matrices from the two operands; obviously this gives a noncommutative multiplication.
Thus, the identity
 [math]\displaystyle{ (t I_n - A)B = p(t) I_n. }[/math]
from the first proof can be viewed as one involving a multiplication of elements in [math]\displaystyle{ M(n,R)[t] }[/math] .
At this point, it is tempting to simply set t equal to the matrix A, which makes the first factor on the left equal to the zero matrix, and the right-hand side equal to p(A); however, this is not an allowed operation when coefficients do not commute. It is possible to define a "right-evaluation map" ev_{A} : M[t] → M, which replaces each t^{i} by the matrix power A^{i} of A, where one stipulates that the power is always to be multiplied on the right to the corresponding coefficient. But this map is not a ring homomorphism: the right-evaluation of a product differs in general from the product of the right-evaluations. This is so because multiplication of polynomials with matrix coefficients does not model multiplication of expressions containing unknowns: a product [math]\displaystyle{ Mt^i Nt^j = (M\cdot N)t^{i+j} }[/math] is defined assuming that t commutes with N, but this may fail if t is replaced by the matrix A.
One can work around this difficulty in the particular situation at hand, since the above right-evaluation map does become a ring homomorphism if the matrix A is in the center of the ring of coefficients, so that it commutes with all the coefficients of the polynomials (the argument proving this is straightforward, exactly because commuting t with coefficients is now justified after evaluation).
Now, A is not always in the center of M, but we may replace M with a smaller ring provided it contains all the coefficients of the polynomials in question: [math]\displaystyle{ I_n }[/math], A, and the coefficients [math]\displaystyle{ B_i }[/math] of the polynomial B. The obvious choice for such a subring is the centralizer Z of A, the subring of all matrices that commute with A; by definition A is in the center of Z.
This centralizer obviously contains [math]\displaystyle{ I_n }[/math], and A, but one has to show that it contains the matrices [math]\displaystyle{ B_i }[/math]. To do this, one combines the two fundamental relations for adjugates, writing out the adjugate B as a polynomial:
 [math]\displaystyle{ \begin{align} \left(\sum_{i = 0}^m B_i t^i\right)\!(t I_n - A) &= (tI_n - A) \sum_{i = 0}^m B_i t^i \\ \sum_{i = 0}^m B_i t^{i + 1} - \sum_{i = 0}^m B_i A t^i &= \sum_{i = 0}^m B_i t^{i + 1} - \sum_{i = 0}^m A B_i t^i \\ \sum_{i = 0}^m B_i A t^i &= \sum_{i = 0}^m A B_i t^i . \end{align} }[/math]
Equating the coefficients shows that for each i, we have AB_{i} = B_{i}A as desired. Having found the proper setting in which ev_{A} is indeed a homomorphism of rings, one can complete the proof as suggested above:
 [math]\displaystyle{ \begin{align} \operatorname{ev}_A\left(p(t)I_n\right) &= \operatorname{ev}_A((tI_n-A)B) \\[5pt] p(A) &= \operatorname{ev}_A(tI_n - A)\cdot \operatorname{ev}_A(B) \\[5pt] p(A) &= (AI_n-A) \cdot \operatorname{ev}_A(B) = O\cdot\operatorname{ev}_A(B)=O. \end{align} }[/math]
This completes the proof.
A synthesis of the first two proofs
In the first proof, one was able to determine the coefficients B_{i} of B based on the right-hand fundamental relation for the adjugate only. In fact the first n equations derived can be interpreted as determining the quotient B of the Euclidean division of the polynomial p(t)I_{n} on the left by the monic polynomial I_{n}t − A, while the final equation expresses the fact that the remainder is zero. This division is performed in the ring of polynomials with matrix coefficients. Indeed, even over a noncommutative ring, Euclidean division by a monic polynomial P is defined, and always produces a unique quotient and remainder with the same degree condition as in the commutative case, provided it is specified at which side one wishes P to be a factor (here that is to the left).
To see that quotient and remainder are unique (which is the important part of the statement here), it suffices to write [math]\displaystyle{ PQ+r = PQ'+r' }[/math] as [math]\displaystyle{ P(Q-Q') = r'-r }[/math] and observe that since P is monic, P(Q−Q′) cannot have a degree less than that of P, unless Q = Q′.
But the dividend p(t)I_{n} and divisor I_{n}t − A used here both lie in the subring (R[A])[t], where R[A] is the subring of the matrix ring M(n, R) generated by A: the R-linear span of all powers of A. Therefore, the Euclidean division can in fact be performed within that commutative polynomial ring, and of course it then gives the same quotient B and remainder 0 as in the larger ring; in particular this shows that B in fact lies in (R[A])[t].
But, in this commutative setting, it is valid to set t to A in the equation
 [math]\displaystyle{ p(t)I_n=(tI_n-A)B; }[/math]
in other words, to apply the evaluation map
 [math]\displaystyle{ \operatorname{ev}_A:(R[A])[t]\to R[A] }[/math]
which is a ring homomorphism, giving
 [math]\displaystyle{ p(A)=0\cdot\operatorname{ev}_A(B)=0 }[/math]
just like in the second proof, as desired.
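The Euclidean division used in this argument can be carried out mechanically. The following is a minimal Python sketch, not taken from the source, that divides p(t)I_{n} on the left by tI_{n} − A through the recurrence B_{i−1} = AB_{i} + c_{i}I_{n}. The helper names (`identity`, `mat_mul`, `mat_add`, `left_divide`) and the 2 × 2 test matrix are illustrative assumptions.

```python
def identity(n, scale=1):
    """Return scale * I_n as a list of rows."""
    return [[scale if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def left_divide(coeffs, A):
    """Divide p(t) I_n by (t I_n - A), with the divisor as a left factor.

    coeffs holds the scalar coefficients of p in ascending order,
    [c_0, c_1, ..., c_m] with m >= 1. Returns (B, R), where
    B = [B_0, ..., B_{m-1}] are the matrix coefficients of the quotient
    and R is the constant remainder, so that
    p(t) I_n = (t I_n - A)(B_0 + B_1 t + ... + B_{m-1} t^(m-1)) + R.
    """
    n = len(A)
    m = len(coeffs) - 1
    B = [None] * m
    B[m - 1] = identity(n, coeffs[m])          # leading coefficient
    for i in range(m - 1, 0, -1):              # compare coefficients of t^i
        B[i - 1] = mat_add(mat_mul(A, B[i]), identity(n, coeffs[i]))
    R = mat_add(mat_mul(A, B[0]), identity(n, coeffs[0]))  # constant term
    return B, R

A = [[1, 2], [3, 4]]
p = [-2, -5, 1]        # p(t) = t^2 - 5t - 2 = det(t I_2 - A)
quotient, remainder = left_divide(p, A)
print(quotient)
print(remainder)
```

The remainder produced by this recurrence is p(A) evaluated by Horner's scheme, so a zero remainder is exactly the statement of the theorem for the chosen A, and the quotient coefficients are visibly polynomials in A.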
In addition to proving the theorem, the above argument tells us that the coefficients B_{i} of B are polynomials in A, while from the second proof we only knew that they lie in the centralizer Z of A; in general Z is a larger subring than R[A], and not necessarily commutative. In particular the constant term B_{0} = adj(−A) lies in R[A]. Since A is an arbitrary square matrix, this proves that adj(A) can always be expressed as a polynomial in A (with coefficients that depend on A).
In fact, the equations found in the first proof allow successively expressing [math]\displaystyle{ B_{n-1}, \ldots, B_1, B_0 }[/math] as polynomials in A, which leads to the identity
[math]\displaystyle{ \operatorname{adj}(-A)=\sum_{i=1}^nc_iA^{i-1}, }[/math]
valid for all n × n matrices, where
 [math]\displaystyle{ p(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0 }[/math]
is the characteristic polynomial of A and c_{n} = 1.
Note that this identity also implies the statement of the Cayley–Hamilton theorem: one may move adj(−A) to the right hand side, multiply the resulting equation (on the left or on the right) by A, and use the fact that
 [math]\displaystyle{ -A\cdot \operatorname{adj}(-A) = \operatorname{adj}(-A)\cdot (-A) = \det(-A)I_n = c_0I_n. }[/math]
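The identity expressing adj(−A) as a polynomial in A can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the source: it obtains the coefficients c_{i} from sums of principal minors (a standard description of the characteristic polynomial, not the method used in the proofs above), evaluates the polynomial by Horner's scheme, and compares the result with a cofactor-based adjugate. All function names and the 3 × 3 test matrix are hypothetical.

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return 1  # determinant of the empty 0 x 0 matrix
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def char_poly(A):
    """Return [c_0, ..., c_{n-1}, 1], the coefficients of det(t I_n - A).

    Uses c_{n-k} = (-1)^k * (sum of all k x k principal minors of A).
    """
    n = len(A)
    coeffs = [0] * (n + 1)
    coeffs[n] = 1
    for k in range(1, n + 1):
        e_k = sum(det([[A[i][j] for j in S] for i in S])
                  for S in combinations(range(n), k))
        coeffs[n - k] = (-1) ** k * e_k
    return coeffs

def adjugate(M):
    """Transpose of the cofactor matrix of M."""
    n = len(M)
    def minor(r, c):
        return [row[:c] + row[c + 1:] for k, row in enumerate(M) if k != r]
    return [[(-1) ** (i + j) * det(minor(j, i)) for j in range(n)]
            for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adj_neg_from_char_poly(A):
    """Evaluate sum_{i=1}^n c_i A^(i-1), with c_n = 1, by Horner's scheme."""
    n = len(A)
    c = char_poly(A)
    result = [[c[n] if r == s else 0 for s in range(n)] for r in range(n)]
    for i in range(n - 1, 0, -1):
        prod = mat_mul(A, result)
        result = [[prod[r][s] + (c[i] if r == s else 0) for s in range(n)]
                  for r in range(n)]
    return result

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
neg_A = [[-x for x in row] for row in A]
lhs = adjugate(neg_A)               # adj(-A) from cofactors
rhs = adj_neg_from_char_poly(A)     # polynomial in A
print(lhs == rhs)
```

Laplace expansion and the principal-minor sums are exponential in n, so this is only meant for small hand-checkable cases.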
A proof using matrices of endomorphisms
As was mentioned above, the matrix p(A) in the statement of the theorem is obtained by first evaluating the determinant and then substituting the matrix A for t; doing that substitution into the matrix [math]\displaystyle{ tI_n-A }[/math] before evaluating the determinant is not meaningful. Nevertheless, it is possible to give an interpretation where p(A) is obtained directly as the value of a certain determinant, but this requires a more complicated setting, one of matrices over a ring in which one can interpret both the entries [math]\displaystyle{ A_{i,j} }[/math] of A, and all of A itself. One could take for this the ring M(n, R) of n × n matrices over R, where the entry [math]\displaystyle{ A_{i,j} }[/math] is realised as [math]\displaystyle{ A_{i,j}I_n }[/math], and A as itself. But considering matrices with matrices as entries might cause confusion with block matrices, which is not intended, as that gives the wrong notion of determinant (recall that the determinant of a matrix is defined as a sum of products of its entries, and in the case of a block matrix this is generally not the same as the corresponding sum of products of its blocks).
It is clearer to distinguish A from the endomorphism φ of an n-dimensional vector space V (or free R-module if R is not a field) defined by it in a basis [math]\displaystyle{ e_1, \ldots, e_n }[/math], and to take matrices over the ring End(V) of all such endomorphisms. Then φ ∈ End(V) is a possible matrix entry, while A designates the element of M(n, End(V)) whose i, j entry is the endomorphism of scalar multiplication by [math]\displaystyle{ A_{i,j} }[/math]; similarly [math]\displaystyle{ I_n }[/math] will be interpreted as an element of M(n, End(V)). However, since End(V) is not a commutative ring, no determinant is defined on M(n, End(V)); this can only be done for matrices over a commutative subring of End(V).
Now the entries of the matrix [math]\displaystyle{ \varphi I_n-A }[/math] all lie in the subring R[φ] generated by the identity and φ, which is commutative. Then a determinant map M(n, R[φ]) → R[φ] is defined, and [math]\displaystyle{ \det(\varphi I_n-A) }[/math] evaluates to the value p(φ) of the characteristic polynomial of A at φ (this holds independently of the relation between A and φ); the Cayley–Hamilton theorem states that p(φ) is the null endomorphism.
In this form, the following proof can be obtained from that of (Atiyah & MacDonald 1969, Prop. 2.4) (which in fact is the more general statement related to the Nakayama lemma; one takes for the ideal in that proposition the whole ring R). The fact that A is the matrix of φ in the basis e_{1}, ..., e_{n} means that
 [math]\displaystyle{ \varphi(e_i) = \sum_{j = 1}^n A_{j,i} e_j \quad\text{for }i=1,\ldots,n. }[/math]
One can interpret these as n components of one equation in V^{n}, whose members can be written using the matrix-vector product M(n, End(V)) × V^{n} → V^{n} that is defined as usual, but with individual entries ψ ∈ End(V) and v in V being "multiplied" by forming [math]\displaystyle{ \psi(v) }[/math]; this gives:
 [math]\displaystyle{ \varphi I_n \cdot E= A^\operatorname{tr}\cdot E, }[/math]
where [math]\displaystyle{ E\in V^n }[/math] is the element whose component i is e_{i} (in other words it is the basis e_{1}, ..., e_{n} of V written as a column of vectors). Writing this equation as
 [math]\displaystyle{ (\varphi I_n-A^\operatorname{tr})\cdot E = 0\in V^n }[/math]
one recognizes the transpose of the matrix [math]\displaystyle{ \varphi I_n-A }[/math] considered above, and its determinant (as element of M(n, R[φ])) is also p(φ). To derive from this equation that p(φ) = 0 ∈ End(V), one left-multiplies by the adjugate matrix of [math]\displaystyle{ \varphi I_n-A^\operatorname{tr} }[/math], which is defined in the matrix ring M(n, R[φ]), giving
 [math]\displaystyle{ \begin{align} 0&=\operatorname{adj}(\varphi I_n-A^\operatorname{tr})\cdot((\varphi I_n-A^\operatorname{tr})\cdot E)\\ &= (\operatorname{adj}(\varphi I_n-A^\operatorname{tr})\cdot(\varphi I_n-A^\operatorname{tr}))\cdot E\\ &= (\det(\varphi I_n-A^\operatorname{tr})I_n)\cdot E\\ &= (p(\varphi)I_n)\cdot E; \end{align} }[/math]
the associativity of matrix-matrix and matrix-vector multiplication used in the first step is a purely formal property of those operations, independent of the nature of the entries. Now component i of this equation says that p(φ)(e_{i}) = 0 ∈ V; thus p(φ) vanishes on all e_{i}, and since these elements generate V it follows that p(φ) = 0 ∈ End(V), completing the proof.
One additional fact that follows from this proof is that the matrix A whose characteristic polynomial is taken need not be identical to the value φ substituted into that polynomial; it suffices that φ be an endomorphism of V satisfying the initial equations
 [math]\displaystyle{ \varphi(e_i) = \sum_j A_{j,i} e_j }[/math]
for some sequence of elements e_{1}, ..., e_{n} that generate V (which space might have smaller dimension than n, or in case the ring R is not a field it might not be a free module at all).
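The remark that the e_{i} need only generate V can be illustrated with a small computation. In the sketch below, which is an assumption-laden example and not part of the source, V is the plane with three generators e_{1} = (1, 0), e_{2} = (0, 1), e_{3} = (1, 1), φ is given by a 2 × 2 matrix, and A is one of several possible 3 × 3 matrices satisfying the initial equations. The characteristic polynomial of A is computed from principal minors, and all names are hypothetical.

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def char_poly(A):
    """Return [c_0, ..., c_{n-1}, 1] for det(t I_n - A), via principal minors."""
    n = len(A)
    coeffs = [0] * (n + 1)
    coeffs[n] = 1
    for k in range(1, n + 1):
        e_k = sum(det([[A[i][j] for j in S] for i in S])
                  for S in combinations(range(n), k))
        coeffs[n - k] = (-1) ** k * e_k
    return coeffs

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at(coeffs, X):
    """Evaluate a polynomial (ascending coefficients) at a square matrix X by Horner."""
    n = len(X)
    result = [[coeffs[-1] if r == s else 0 for s in range(n)] for r in range(n)]
    for c in reversed(coeffs[:-1]):
        prod = mat_mul(X, result)
        result = [[prod[r][s] + (c if r == s else 0) for s in range(n)]
                  for r in range(n)]
    return result

def apply(phi, v):
    """Apply the endomorphism with matrix phi to the vector v."""
    return [sum(phi[r][k] * v[k] for k in range(len(v))) for r in range(len(phi))]

phi = [[1, 2], [3, 4]]                  # endomorphism of the plane
gens = [[1, 0], [0, 1], [1, 1]]         # three generators of a 2-dimensional space
A = [[1, 2, 0],
     [3, 4, 4],
     [0, 0, 3]]                         # column i holds the coefficients of phi(e_i)

# Check the initial equations phi(e_i) = sum_j A[j][i] * e_j.
relations_hold = all(
    apply(phi, gens[i]) == [sum(A[j][i] * gens[j][r] for j in range(3))
                            for r in range(2)]
    for i in range(3)
)

p = char_poly(A)                        # degree-3 polynomial of the 3 x 3 matrix A
value = poly_at(p, phi)                 # evaluated at the 2 x 2 endomorphism phi
print(relations_hold, p, value)
```

Here A is 3 × 3 while φ acts on a 2-dimensional space, so A is not the matrix of φ in any basis, yet its characteristic polynomial still annihilates φ.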
A bogus "proof": p(A) = det(AI_{n} − A) = det(A − A) = 0
One persistent elementary but incorrect argument^{[18]} for the theorem is to "simply" take the definition
 [math]\displaystyle{ p(\lambda) = \det(\lambda I_n - A) }[/math]
and substitute A for λ, obtaining
 [math]\displaystyle{ p(A)=\det(A I_n - A) = \det(A - A) = \det(\mathbf{0}) = 0. }[/math]
There are many ways to see why this argument is wrong. First, in the Cayley–Hamilton theorem, p(A) is an n × n matrix. However, the right hand side of the above equation is the value of a determinant, which is a scalar. So they cannot be equated unless n = 1 (i.e. A is just a scalar). Second, in the expression [math]\displaystyle{ \det(\lambda I_n - A) }[/math], the variable λ actually occurs at the diagonal entries of the matrix [math]\displaystyle{ \lambda I_n - A }[/math]. To illustrate, consider the characteristic polynomial in the previous example again:
 [math]\displaystyle{ \det\!\begin{pmatrix}\lambda-1&-2\\-3&\lambda-4\end{pmatrix}. }[/math]
If one substitutes the entire matrix A for λ in those positions, one obtains
 [math]\displaystyle{ \det\!\begin{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 1 & -2 \\ -3 &\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 4\end{pmatrix}, }[/math]
in which the "matrix" expression is simply not a valid one. Note, however, that if scalar multiples of identity matrices instead of scalars are subtracted in the above, i.e. if the substitution is performed as
 [math]\displaystyle{ \det\!\begin{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - I_2 & -2I_2 \\ -3I_2 &\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 4I_2 \end{pmatrix}, }[/math]
then the determinant is indeed zero, but the expanded matrix in question does not evaluate to [math]\displaystyle{ A I_n-A }[/math]; nor can its determinant (a scalar) be compared to p(A) (a matrix). So the argument that [math]\displaystyle{ p(A) = \det(AI_n-A) = 0 }[/math] still does not apply.
If such an argument were valid, it would also hold when another multilinear form is used instead of the determinant. For instance, if we consider the permanent function and define [math]\displaystyle{ q(\lambda) = \operatorname{perm}(\lambda I_n - A) }[/math], then by the same argument, we should be able to "prove" that q(A) = 0. But this statement is demonstrably wrong: in the 2-dimensional case, for instance, the permanent of a matrix is given by
 [math]\displaystyle{ \operatorname{perm}\!\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad + bc. }[/math]
So, for the matrix A in the previous example,
 [math]\displaystyle{ \begin{align} q(\lambda) & = \operatorname{perm}(\lambda I_2 - A) = \operatorname{perm}\!\begin{pmatrix} \lambda - 1 & -2 \\ -3 & \lambda-4 \end{pmatrix} \\[6pt] & = (\lambda - 1)(\lambda - 4) + (-2)(-3) = \lambda^2 - 5\lambda + 10. \end{align} }[/math]
Yet one can verify that
 [math]\displaystyle{ q(A) = A^2-5A+10I_2=12I_2\not=0. }[/math]
One of the proofs of the Cayley–Hamilton theorem above bears some similarity to the argument that [math]\displaystyle{ p(A)=\det(AI_n-A)=0 }[/math]. By introducing a matrix with non-numeric coefficients, one can actually let A live inside a matrix entry, but then [math]\displaystyle{ A I_n }[/math] is not equal to A, and the conclusion is reached differently.
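The contrast between the determinant and the permanent in this example can be verified directly. The following Python sketch is an added illustration with hypothetical helper names; it evaluates p(t) = t^{2} − 5t − 2 and q(t) = t^{2} − 5t + 10 at the matrix A of the example and also computes the determinant of the 4 × 4 matrix obtained by expanding the block substitution shown above.

```python
def identity(n, scale=1):
    """Return scale * I_n as a list of rows."""
    return [[scale if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def poly_at(coeffs, X):
    """Evaluate a polynomial (ascending coefficients) at a square matrix X by Horner."""
    n = len(X)
    result = identity(n, coeffs[-1])
    for c in reversed(coeffs[:-1]):
        result = mat_add(mat_mul(X, result), identity(n, c))
    return result

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

A = [[1, 2], [3, 4]]
p = [-2, -5, 1]    # det(t I_2 - A)  = t^2 - 5t - 2
q = [10, -5, 1]    # perm(t I_2 - A) = t^2 - 5t + 10

p_of_A = poly_at(p, A)
q_of_A = poly_at(q, A)

# Expand the block substitution into an ordinary 4 x 4 matrix:
# [[A - I_2, -2 I_2], [-3 I_2, A - 4 I_2]].
top = [left + right for left, right in
       zip(mat_add(A, identity(2, -1)), identity(2, -2))]
bottom = [left + right for left, right in
          zip(identity(2, -3), mat_add(A, identity(2, -4)))]
block_det = det(top + bottom)

print(p_of_A, q_of_A, block_det)
```

The block determinant is a single scalar while p(A) is a 2 × 2 matrix, which is why the two cannot be equated even though both vanish here.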
Proofs using methods of abstract algebra
Basic properties of Hasse–Schmidt derivations on the exterior algebra [math]\displaystyle{ A = \bigwedge M }[/math] of some B-module M (supposed to be free and of finite rank) have been used by (Gatto & Salehyan 2016) to prove the Cayley–Hamilton theorem. See also (Gatto & Scherbak 2015).
Abstraction and generalizations
The above proofs show that the Cayley–Hamilton theorem holds for matrices with entries in any commutative ring R, and that p(φ) = 0 will hold whenever φ is an endomorphism of an R-module generated by elements e_{1},...,e_{n} that satisfies
 [math]\displaystyle{ \varphi(e_j)=\sum a_{ij}e_i, \qquad j =1, \ldots, n. }[/math]
This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.
Remarks
 ↑ Due to the non-commutative nature of the multiplication operation for quaternions and related constructions, care needs to be taken with definitions, most notably in this context, for the determinant. The theorem holds as well for the slightly less well-behaved split-quaternions, see (Alagös, Oral & Yüce 2012). The rings of quaternions and split-quaternions can both be represented by certain 2 × 2 complex matrices. (When restricted to unit norm, these are the groups SU(2) and SU(1,1) respectively.) Therefore it is not surprising that the theorem holds.
There is no such matrix representation for the octonions, since the multiplication operation is not associative in this case. However, a modified Cayley–Hamilton theorem still holds for the octonions, see (Tian 2000).
 ↑ An explicit expression for these coefficients is
 [math]\displaystyle{ c_i = \sum_{ k_1,k_2,\ldots ,k_n}\prod_{l=1}^{n} \frac{(-1)^{k_l}}{l^{k_l} k_l!}\operatorname{tr}(A^l)^{k_l}, }[/math]
where the sum runs over all sets of nonnegative integers k_{l} satisfying
 [math]\displaystyle{ \sum_{l=1}^{n}lk_{l} = n-i. }[/math]
 ↑ See, e.g., p. 54 of Brown 1994, which solves Jacobi's formula,
 [math]\displaystyle{ \partial p(\lambda) /\partial \lambda= p(\lambda) \sum^\infty _{m=0}\lambda ^{-(m+1)} \operatorname{tr}A^m = p(\lambda) ~ \operatorname{tr} \frac{I}{\lambda I -A}\equiv\operatorname{tr} B~, }[/math]
where B is the adjugate of λI − A. An equivalent recursion, the Faddeev–LeVerrier algorithm, reads
 [math]\displaystyle{ \begin{align} M_0 &\equiv O & c_n &= 1 \qquad &(k=0) \\[5pt] M_k &\equiv AM_{k-1} + c_{n-k+1} I \qquad \qquad & c_{n-k} &= -\frac 1 k \operatorname{tr}(AM_k) \qquad &k=1,\ldots ,n ~. \end{align} }[/math]
Taking the trace of (λI − A)B = p(λ)I and using the above derivative of p yields
 [math]\displaystyle{ \lambda p' -n p =\operatorname{tr} (AB)~, }[/math] (Hou 1998), and the above recursions, in turn.
Notes
 ↑ ^{1.0} ^{1.1} Crilly 1998
 ↑ ^{2.0} ^{2.1} Cayley 1858, pp. 17–37
 ↑ Cayley 1889, pp. 475–496
 ↑ ^{4.0} ^{4.1} Hamilton 1864a
 ↑ ^{5.0} ^{5.1} Hamilton 1864b
 ↑ ^{6.0} ^{6.1} Hamilton 1862
 ↑ Atiyah & MacDonald 1969
 ↑ Hamilton 1853, p. 562
 ↑ Zhang 1997
 ↑ ^{10.0} ^{10.1} Frobenius 1878
 ↑ Zeni & Rodrigues 1992
 ↑ Barut, Zeni & Laufer 1994a
 ↑ Barut, Zeni & Laufer 1994b
 ↑ Laufer 1997
 ↑ Curtright, Fairlie & Zachos 2014
 ↑ Stein, William. Algebraic Number Theory, a Computational Approach. p. 29. http://wstein.org/books/ant/ant.pdf.
 ↑ Bhatia 1997, p. 7
 ↑ Garrett 2007, p. 381
References
 Alagös, Y.; Oral, K.; Yüce, S. (2012). "Split Quaternion Matrices". Miskolc Mathematical Notes 13 (2): 223–232. doi:10.18514/MMN.2012.364. ISSN 1787-2405 (open access).
 Atiyah, M. F.; MacDonald, I. G. (1969), Introduction to Commutative Algebra, Westview Press, ISBN 9780201407518
 Barut, A. O.; Zeni, J. R.; Laufer, A. (1994a). "The exponential map for the conformal group O(2,4)". J. Phys. A: Math. Gen. 27 (15): 5239–5250. doi:10.1088/0305-4470/27/15/022. Bibcode: 1994JPhA...27.5239B.
 Barut, A. O.; Zeni, J. R.; Laufer, A. (1994b). "The exponential map for the unitary group SU(2,2)". J. Phys. A: Math. Gen. 27 (20): 6799–6806. doi:10.1088/0305-4470/27/20/017. Bibcode: 1994JPhA...27.6799B.
 Bhatia, R. (1997). Matrix Analysis. Graduate texts in mathematics. 169. Springer. ISBN 9780387948461.
 Brown, Lowell S. (1994). Quantum Field Theory. Cambridge University Press. ISBN 9780521469463.
 Cayley, A. (1858). "A Memoir on the Theory of Matrices". Philos. Trans. 148.
 Cayley, A. (1889). The Collected Mathematical Papers of Arthur Cayley. (Classic Reprint). 2. Forgotten books.
 Crilly, T. (1998). "The young Arthur Cayley". Notes Rec. R. Soc. Lond. 52 (2): 267–282. doi:10.1098/rsnr.1998.0050.
 Curtright, T L; Fairlie, D B; Zachos, C K (2014). "A compact formula for rotations as spin matrix polynomials". SIGMA 10 (2014): 084. doi:10.3842/SIGMA.2014.084. Bibcode: 2014SIGMA..10..084C.
 Frobenius, G. (1878). "Ueber lineare Substitutionen und bilineare Formen". J. Reine Angew. Math. 1878 (84): 1–63. doi:10.1515/crll.1878.84.1.
 Gantmacher, F.R. (1960). The Theory of Matrices. NY: Chelsea Publishing. ISBN 9780821813768.
 Gatto, Letterio; Salehyan, Parham (2016), Hasse–Schmidt derivations on Grassmann algebras, Springer, doi:10.1007/978-3-319-31842-4, ISBN 9783319318424
 Gatto, Letterio; Scherbak, Inna (2015), Remarks on the Cayley-Hamilton Theorem
 Garrett, Paul B. (2007). Abstract Algebra. NY: Chapman and Hall/CRC. ISBN 9781584886891.
 Hamilton, W. R. (1853). Lectures on Quaternions. Dublin. https://archive.org/details/bub_gb_TCwPAAAAIAAJ.
 Hamilton, W. R. (1864a). "On a New and General Method of Inverting a Linear and Quaternion Function of a Quaternion". Proceedings of the Royal Irish Academy viii: 182–183. (communicated on June 9, 1862)
 Hamilton, W. R. (1864b). "On the Existence of a Symbolic and Biquadratic Equation, which is satisfied by the Symbol of Linear Operation in Quaternions". Proceedings of the Royal Irish Academy viii: 190–101. (communicated on June 23, 1862)
 Hou, S. H. (1998). "Classroom Note: A Simple Proof of the Leverrier-Faddeev Characteristic Polynomial Algorithm". SIAM Review 40 (3): 706–709. doi:10.1137/S003614459732076X. Bibcode: 1998SIAMR..40..706H.
 Hamilton, W. R. (1862). "On the Existence of a Symbolic and Biquadratic Equation which is satisfied by the Symbol of Linear or Distributive Operation on a Quaternion". The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. series iv 24: 127–128. ISSN 1478-6435. http://zs.thulb.uni-jena.de/rsc/viewer/jportal_derivate_00126615/PMS_1862_Bd24_%200135.tif. Retrieved 2015-02-14.
 Householder, Alston S. (2006). The Theory of Matrices in Numerical Analysis. Dover Books on Mathematics. ISBN 9780486449722.
 Laufer, A. (1997). "The exponential map of GL(N)". J. Phys. A: Math. Gen. 30 (15): 5455–5470. doi:10.1088/0305-4470/30/15/029. Bibcode: 1997JPhA...30.5455L.
 Tian, Y. (2000). "Matrix representations of octonions and their application". Advances in Applied Clifford Algebras 10 (1): 61–90. doi:10.1007/BF03042010. ISSN 0188-7009. Bibcode: 2000math......3166T.
 Zeni, J. R.; Rodrigues, W.A. (1992). "A thoughtful study of Lorentz transformations by Clifford algebras". Int. J. Mod. Phys. A 7 (8): 1793 pp. doi:10.1142/S0217751X92000776. Bibcode: 1992IJMPA...7.1793Z.
 Zhang, F. (1997). "Quaternions and matrices of quaternions". Linear Algebra and Its Applications 251: 21–57. doi:10.1016/0024-3795(95)00543-9. ISSN 0024-3795 (open archive).
External links
 Hazewinkel, Michiel, ed. (2001), "Cayley–Hamilton theorem", Encyclopedia of Mathematics, Springer Science+Business Media B.V. / Kluwer Academic Publishers, ISBN 9781556080104, https://www.encyclopediaofmath.org/index.php?title=p/c120080
 A proof from PlanetMath.
 The Cayley–Hamilton theorem at MathPages
Original source: https://en.wikipedia.org/wiki/Cayley–Hamilton theorem.