# The Cayley-Hamilton Theorem

As promised in class, I will present a proof of the Cayley-Hamilton Theorem. This theorem holds over any commutative ring (don’t worry if you don’t know what that is) in particular it holds over $\mathbb{R}$ and $\mathbb{C}$. I’ll go ahead and prove it over $\mathbb{C}$.

First a quick reminder of the characteristic polynomial, and the method for computing the eigenvalues/vectors of a matrix you learned in class.

We want to solve $Av = \lambda v$ and hence we rearrange to get $Av-\lambda v =0$ $(A-\lambda I)v=0$

Now the equation $Mx=0$ has a non-zero solution if and only if $det M =0$ (exercise!), and hence we set $det(A-\lambda I)=0$

We have seen that the determinant is a polynomial in the entries of the matrix, and so this equation above is a polynomial, called the characteristic polynomial, with a variable $\lambda$ that we are going to try and find. This $\lambda$ will be an eigenvalue of the matrix $A$, and the corresponding $v$ (which we can find by substituting for $\lambda$) is an eigenvector.

e.g. Let’s do a quick example.

Let $A=\left(\begin{array}{cc}2 & 1 \\1 & 2\end{array}\right)$ so the characteristic polynomial is given by $det\left(\begin{array}{cc}2 & 1 \\1 & 2\end{array}\right) = (2-\lambda)(2-\lambda)-1=\lambda^2-4\lambda + 3=0$

Let’s give this polynomial a name, write $p(x)=x^2-4x+3$. We see the roots are 1 and 3, so these are the eigenvalues, and we could go ahead and use these to find their corresponding eigenvectors (exercise). Look what happens when we feed our matrix $A$ into this polynomial, we get, $A^2-4A+3I=\left(\begin{array}{cc}5 & 4 \\4 & 5\end{array}\right)-4\left(\begin{array}{cc}2 & 1 \\1 & 2\end{array}\right)+3I=0$

So the matrix $A$ satisfied it’s own characteristic polynomial, $p(A)=0$. It turns out that this is true in general, and that is the statement of the Cayley-Hamilton theorem.

Theorem. (Cayley-Hamilton) Let $A$ be an $n \times n$ square matrix with characteristic polynomial $p(x)$. Then $p(A)=0$.

The proof of this theorem will go in the following way. First we’ll show it for diagonalizable matrices. This case will be easy. We will then use the fact that we can approximate all matrices by diagonalizable ones, that is, we can find a sequence of diagonalizable matrices that converge to $A$.

Pf. Suppose $A$ is a diagonalisable matrix, that is, there exists $P$ such that $P^{-1}AP=D$

with $D$ diagonal of the form $diag(\lambda_1, \ldots, \lambda_n)$ where $\lambda_i$ are the eigenvalues of $A$. We know then (or else, exercise) that $p(A) = Pp(D)P^{-1}$, and that $p(D)=diag(p(\lambda_1),\ldots,p(\lambda_n))=0$ since $\lambda_i$ are the roots of the characteristic polynomial. Hence we have shown that $p(A)=0$ for all diagonalizable matrices.

Now let $A$ be any $n \times n$ complex matrix. I claim that there exists a sequence of matrices $\{A_k\}$ with the following properties:

1. Each $A_k$ is diagonalisable
2. $\lim_{k \rightarrow \infty} A_k = A$ that is, the (pointwise) limit (meaning, fix an entry of the matrix, say the upper left entry, and consider the sequence of complex numbers defined by $\lim A_k$, this sequence converges to the upper left entry in $A$)

There are a few ways to see why this should be true. Something you have probably heard me say before is that the set of diagonalizable matrices is dense in the set of matrices. Points 1 and 2 are exactly what that means. Once we learn about the Jordan-Normal form for matrices we can prove this precisely, but for now let me justify this as follows: a matrix is diagonalizable if it has distinct eigenvalues. The idea is, if we don’t have distinct eigenvalues, we can make small perturbations (that is, jiggle around the entries a tiny bit) until we get one with distinct entries.

Okay, if we believe 1 and 2 (again, don’t worry if you don’t right now, we will make this precise later) we can complete the proof. Approximate $A = \lim_{k\rightarrow \infty} A_k$. Let $p_k(x)$ be the characteristic polynomial of $A_k$. We have $\lim_{k \rightarrow \infty} p_k(A_k) = p(A)$. But $p_k(A_k)=0$ (since they are diagonalizable by assumption), and thus we have that $p(A)=0$. This completes the proof of the Cayley-Hamilton theorem.