Thursday 21 April 2016

linear algebra - Looking for insightful explanation as to why right inverse equals left inverse for square invertible matrices



The simple proof goes:




Let B be the left inverse of A, C the right inverse.



C = (BA)C = B(AC) = B



This proof relies on associativity yet does not give any insight as to why this surprising fact about matrices is true.



AC means a bunch of linear combinations of columns of A.
CA means a bunch of linear combinations of rows of A. Completely different numbers get multiplied in each case.




The proof above is just a bunch of syntactic steps that do not have much to do with matrices directly; I cannot see from it why CA=AC=I.
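To make this concrete, here is a quick numerical sanity check (an arbitrary 2×2 example using NumPy, purely for illustration): the entries being multiplied in AC and CA are completely different, yet both products come out as the identity.

```python
import numpy as np

# An arbitrary invertible 2x2 matrix and its inverse (hypothetical example).
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
C = np.linalg.inv(A)  # C = [[1, -1], [-1, 2]]

# A @ C combines the *columns* of A; C @ A combines the *rows* of A.
# Different numbers get multiplied, yet both give the identity matrix.
print(A @ C)  # [[1. 0.], [0. 1.]]
print(C @ A)  # [[1. 0.], [0. 1.]]
```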



Can anyone shed some light on this?


Answer



In fact, this isn't about matrices per se, but about inverses in general, and perhaps more specifically about inverses of functions. The same argument works for any function that has a left and a right inverse (and for elements of a monoid or ring, though these can also be interpreted as "functions" via an appropriate setting).



If you really want to try to understand the proof in terms of "meaning", then you should not think of matrices as a bunch of columns or a bunch of numbers, but rather as functions, i.e., as linear transformations.



Say $A$ is an $m\times n$ matrix; then $A$ is "really" a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$: the columns of $A$ are the images of the standard basis vectors of $\mathbb{R}^n$ under the transformation. If $B$ is a right inverse of $A$, then $B$ is $n\times m$, and $AB$ acts like the identity transformation on $\mathbb{R}^m$. In particular, $AB$ has to be onto, so the rank of $AB$ is $m$; since the rank of $AB$ is at most the rank of $A$, the rank of $A$ has to be $m$; since the rank of $A$ is at most $\min(m,n)$, we get $m\leq n$. This tells us that $A$ is onto (full rank), and that it has at least as many columns as it has rows.
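To see the rank argument in action, here is a small sketch (the specific matrices are made up for illustration): a $2\times 3$ matrix $A$ that is onto, with a right inverse $B$. Then $AB$ is the identity on $\mathbb{R}^2$, but $BA$ cannot be the identity on $\mathbb{R}^3$.

```python
import numpy as np

# Hypothetical wide matrix: m = 2 rows, n = 3 columns (so m <= n).
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
# One right inverse of A (right inverses are not unique here).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

print(A @ B)                     # I_2: AB acts as the identity on R^2
print(B @ A)                     # not I_3: BA is not onto, so it cannot be I_3
print(np.linalg.matrix_rank(A))  # 2 = m: A has full row rank, i.e., A is onto
```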




If $C$ is a left inverse of $A$, then $C$ must be an $n\times m$ matrix, and $CA$ acts like the identity on $\mathbb{R}^n$. Because $CA$ is one-to-one, $A$ has to be one-to-one. In particular, its nullspace is trivial. That means it cannot have more columns than rows (that would require a nontrivial nullspace, by the Rank-Nullity Theorem); since it has at least as many columns as it has rows, $A$ has exactly the same number of columns as rows, so $m=n$.
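The mirror image of the previous sketch (again with made-up matrices): a $3\times 2$ matrix that is one-to-one, with a left inverse $C$. Now $CA$ is the identity on $\mathbb{R}^2$, but $AC$ is merely a projection of $\mathbb{R}^3$.

```python
import numpy as np

# Hypothetical tall matrix: m = 3 rows, n = 2 columns; trivial nullspace.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
# One left inverse of A (again, not unique).
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

print(C @ A)  # I_2: CA acts as the identity on R^2
print(A @ C)  # not I_3: AC projects onto a plane, so A has no right inverse
```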



Moreover, $A$ is now one-to-one and onto. So it is in fact bijective. So it is in fact invertible. Invertible matrices have unique inverses, so $B$ and $C$ have to be equal: they have no choice in the matter. It isn't about the details of the "recipe", it's about the properties of functions: once you have a function that is one-to-one and onto, it has an inverse and the inverse is unique.






I honestly think that trying to puzzle out the details of the "recipe" is not insightful here: it is staring at the bark of a single tree instead of trying to appreciate the forest.



But if you must (and I really think you shouldn't), then you want to realize that $AC$ and $CA$ are talking in a different language: the columns of $A$ specify a basis, $\gamma$, and tell you how to express the elements of $\gamma$ in terms of the standard basis $\beta$; it provides a "translation" from $\gamma$ to $\beta$. That is, $A=[\mathrm{Id}]_{\gamma}^{\beta}$. The inverse, $C$, explains how to express the elements of the standard basis $\beta$ in terms of the vectors in $\gamma$, $C=[\mathrm{Id}]_{\beta}^{\gamma}$.
$AC$ talks in the language of the standard basis $\beta$; $CA$ talks in the language of $\gamma$. Then it becomes clear why "the same recipe" (not really) should work. It's not really the same recipe, because in $CA$ you "hear" vectors in $\gamma$, translate them into $\beta$, and then translate them back into $\gamma$. But in $AC$ you "hear" vectors in $\beta$, translate them into $\gamma$, and then back into $\beta$. The "translation recipes" are the same, whether you do $\beta\to\gamma$ first or second (translating English to Russian is the same, whether you are translating something written originally in English into Russian, or something that was translated into English first).




$A$ establishes a bijection between the vectors expressed in terms of $\gamma$, and the vectors expressed in terms of $\beta$. $C$ is the bijection going "the other way". Both $AC$ and $CA$ are the identity, but they are the identity of slightly different structures: $AC$ is the identity of "$\mathbb{R}^n$-with-basis-$\beta$", and $CA$ is the identity of "$\mathbb{R}^n$-with-basis-$\gamma$." Only when you forget that $AC$ and $CA$ are being interpreted as matrices on vector-spaces-with-basis do you realize that in fact you have the same "formula" for the expressions (i.e., that both matrices are "the" identity matrix).
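Here is the "translation" picture as a minimal sketch (the basis $\gamma$ below is hypothetical): the columns of $A$ are the $\gamma$-vectors written in $\beta$-coordinates, so $A$ translates $\gamma$-coordinates into $\beta$-coordinates, and $C=A^{-1}$ translates back.

```python
import numpy as np

# Hypothetical basis gamma, written in the standard basis beta:
# gamma_1 = (1, 0) and gamma_2 = (1, 1) are the columns of A.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])  # A = [Id]_gamma^beta : gamma-coords -> beta-coords
C = np.linalg.inv(A)        # C = [Id]_beta^gamma : beta-coords -> gamma-coords

x_gamma = np.array([3.0, 2.0])  # a vector "heard" in the language of gamma
x_beta = A @ x_gamma            # the same vector spoken in beta: [5, 2]
print(C @ x_beta)               # translate back and recover [3, 2]

# Both round trips land where they started, each in its own "language":
print(C @ A)  # gamma -> beta -> gamma: the identity of R^2-with-basis-gamma
print(A @ C)  # beta -> gamma -> beta: the identity of R^2-with-basis-beta
```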



If you want to stare at the bark so intently that you must think of matrices in terms of "linear combinations of rows" or "linear combinations of columns", you are going to miss a lot of important properties of matrices. Matrices are really functions; multiplication of matrices is really composition of functions. They aren't a bunch of vectors or a bunch of numbers thrown into a box that you multiply based on some ad hoc rule. They are functions.



Compare how easy, and, yes, intuitive, it is to realize that matrix multiplication is associative because it is just "composition of functions", vs. figuring it out by expanding the double summations of an entry-by-entry expression of $(AB)C$ and $A(BC)$: you are going in the wrong direction for 'intuitive' understanding. Staring at those summations will not tell you why multiplication is associative; it will only tell you that it is. Trying to puzzle out why a right inverse is also a left inverse in terms of "adding columns" and "adding rows" is not going to help either: you need to think of it in terms of functions.
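As a small illustration of "multiplication is composition" (with arbitrary example matrices): once each matrix is wrapped as a function, associativity is automatic, because both groupings mean "apply $C$, then $B$, then $A$".

```python
import numpy as np

# Arbitrary matrices, viewed as linear maps on R^2.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.array([[2.0, 0.0], [0.0, 3.0]])
C = np.array([[1.0, 1.0], [0.0, 1.0]])

f = lambda v: A @ v  # the function "multiply by A"
g = lambda v: B @ v  # the function "multiply by B"
h = lambda v: C @ v  # the function "multiply by C"

v = np.array([1.0, 2.0])
print(f(g(h(v))))         # apply h, then g, then f -- no parentheses to argue over
print(((A @ B) @ C) @ v)  # same answer via (AB)C
print((A @ (B @ C)) @ v)  # same answer via A(BC)
```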

