Examples of natural transformations (part 2)

Here’s the most common example of a natural transformation that I know of, and probably the most enlightening. Take a vector space V and let V^* denote its dual. If V is finite dimensional, then V\cong V^*, but not in any “natural” way. Moreover, even if V is infinite dimensional, V embeds in V^*, but again, not in a “natural” way. That is, we have to pick a basis to show how V embeds in V^*. If we pick a different basis, we get a different embedding. We don’t have any method that is inherent to the structure. And it’s not just that we haven’t found one. They don’t exist.

However, V does embed inside V^{**} naturally (i.e., in a way that does not depend on any choices, such as a choice of basis). How are we to describe this? We say that there is a natural transformation between the functors \mbox{id}_{\textsc{Vec}}:\textsc{Vec}\to\textsc{Vec} and -^{**}:\textsc{Vec}\to\textsc{Vec}. I’m suppressing the field over which we’re working in the notation. Oh well. It’s not too important.

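So here goes. Define \eta_V:V\to V^{**} by \eta_V(v)=\mbox{ev}_v. By \mbox{ev}_v what I mean is the map that takes in a functional \phi from V^* and evaluates it at v. That is, \mbox{ev}_v:\phi\mapsto \phi(v). Since \phi\in V^*, and \mbox{ev}_v is a linear map from V^* to the underlying field, \mbox{ev}_v is an element of V^{**}. Does this make \eta a natural transformation? We need to check that, for every linear transformation T:V\to W, the following square commutes: \eta_V:V\to V^{**} goes across the top, T:V\to W goes down the left side, T^{**}:V^{**}\to W^{**} goes down the right side, and \eta_W:W\to W^{**} goes across the bottom.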


If we go across the top and then down, we take v\in V and send it to

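(T^{**}\circ\eta_V)(v)=T^{**}(\mbox{ev}_v)=T^{**}(\phi\mapsto \phi(v))

=(\psi\mapsto (T^*\psi)(v))=(\psi\mapsto\psi(Tv))=\mbox{ev}_{Tv}.

Here \psi ranges over W^*, the dual map is T^*\psi=\psi\circ T, and T^{**} sends an element \xi of V^{**} to \xi\circ T^*.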

If we go down and then across, we take v\in V and send it to

(\eta_W\circ T)v=\mbox{ev}_{Tv}

Woohoo! We just proved naturality. In fact, we did two things. First, we showed that we could map V into V^{**} without making any choices (of a basis or of anything else). This map is canonical, and it really is an embedding: if v\neq 0, then some functional \phi has \phi(v)\neq 0, so \mbox{ev}_v\neq 0. Second, we showed that linear transformations between V and W correspond to linear transformations between V^{**} and W^{**} in a nice way (such that the diagram commutes).
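If you like to poke at things on a computer, here is a small Python sketch of the same check. Everything in it is an illustrative assumption: the sample map T, the vector v, and the three sample functionals are made up, and functionals are modeled as plain Python functions. It only spot-checks the two paths around the square on a few functionals; it is not a proof.

```python
# Vectors in V = R^2 and W = R^3 are tuples; functionals are Python functions.

def T(v):
    """A sample linear map T: R^2 -> R^3 (hypothetical, for illustration)."""
    x, y = v
    return (x + y, 2 * x, 3 * y)

def eta(v):
    """eta(v) = ev_v: take a functional phi and evaluate it at v."""
    return lambda phi: phi(v)

def T_dual(psi):
    """T^*: W^* -> V^*, defined by T^*(psi) = psi o T."""
    return lambda v: psi(T(v))

def T_double_dual(xi):
    """T^**: V^** -> W^**, defined by T^**(xi) = xi o T^*."""
    return lambda psi: xi(T_dual(psi))

v = (2, -1)

# A few sample functionals on W, chosen arbitrarily.
samples = [
    lambda w: w[0],
    lambda w: w[1] - 4 * w[2],
    lambda w: 5 * w[0] + w[1] + w[2],
]

across_then_down = T_double_dual(eta(v))  # T^**(ev_v)
down_then_across = eta(T(v))              # ev_{Tv}

results = [(across_then_down(psi), down_then_across(psi)) for psi in samples]
print(results)
```

Notice that eta never mentions a basis. That is the computational shadow of the word "canonical."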


The Rank-Nullity Theorem

Rank-Nullity Theorem:

Let T:V\to W be a linear transformation between vector spaces V and W. Then

\dim V=\dim\mbox{Im }T+\dim\ker T.


Take a basis \{u_1,\dots, u_m\} for \ker T, and extend it to a basis \{u_1,\dots,u_m,v_1,\dots,v_n\} for V. Now it suffices to show that \{Tv_1,\dots, Tv_n\} form a basis for \mbox{Im }T (since then we would have \dim V=m+n,  \dim\ker T=m, and \dim\mbox{Im}T=n ).

First we will check that \{Tv_1,\dots, Tv_n\} are linearly independent. Suppose c_1 Tv_1+\cdots+c_n Tv_n=0. Then, T(c_1v_1+\cdots+c_nv_n)=0, so c_1v_1+\cdots+c_nv_n\in\ker T. This means it can be represented in terms of the \{u_1,\dots, u_m\}. Write

c_1v_1+\cdots+c_nv_n=b_1u_1+\cdots +b_mu_m,

so that

c_1v_1+\cdots+c_nv_n-b_1u_1-\cdots -b_mu_m=0.

But this is a linear combination of basis vectors of V which sum to zero, so it must be that each of the c_i=0 and b_i=0 (we don’t really care about the b_i, but it is true anyway). The interesting part is that c_i=0 for every i, exactly the condition needed to say that \{Tv_1,\dots, Tv_n\} is linearly independent.

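Now we need only show that \mbox{span}(\{Tv_1,\dots, Tv_n\})=\mbox{Im }T. Each Tv_i lies in \mbox{Im }T, so the span is contained in \mbox{Im }T. Conversely, let w\in\mbox{Im }T. There is some vector v which has Tv=w. Let’s write v=\lambda_1u_1+\cdots+\lambda_mu_m+\mu_1v_1+\cdots+\mu_nv_n. When we apply T, we get

w=Tv=\lambda_1Tu_1+\cdots+\lambda_mTu_m+\mu_1Tv_1+\cdots+\mu_nTv_n.

But since, for each i, Tu_i=0,

w=\mu_1Tv_1+\cdots+\mu_nTv_n\in\mbox{span}(\{Tv_1,\dots,Tv_n\}).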

Therefore, \{Tv_1,\dots,Tv_n\} form a basis for \mbox{Im }T, proving the theorem.



Notice the distinct lack of assumptions. This is true for any linear transformation over any vector spaces over any field. That’s cool.
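To see the "any field" part in action, here is a brute-force sanity check over the field with two elements. This is only a sketch under assumptions of my own: the matrix A is made up, and the helper names apply_map and log_p are hypothetical. Since a d-dimensional space over a field with P elements has exactly P^d vectors, we can read off dimensions by counting.

```python
from itertools import product

P = 2  # the field with two elements (an illustrative choice of field)

# A sample matrix over F_2 representing T: F_2^4 -> F_2^3 in standard bases.
A = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
]

def apply_map(A, v):
    """Apply the matrix A to the vector v, reducing mod P."""
    return tuple(sum(a * x for a, x in zip(row, v)) % P for row in A)

def log_p(size):
    """Return d with P**d == size, i.e. the dimension of a space with `size` vectors."""
    d = 0
    while P ** d < size:
        d += 1
    assert P ** d == size
    return d

n = len(A[0])
domain = list(product(range(P), repeat=n))            # every vector in V
kernel = [v for v in domain if not any(apply_map(A, v))]
image = {apply_map(A, v) for v in domain}

dim_V = n
nullity = log_p(len(kernel))
rank = log_p(len(image))
print(dim_V, rank, nullity)
```

In this example the third row is the sum of the first two, so the rank is 2 and the nullity is 2, which add up to the dimension 4 of the domain.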

Another thing that is cool is that this theorem generalizes wildly. This is the vector space version of what is known as Noether’s first isomorphism theorem. It’s true in the context of many other algebraic structures. We’ll probably see this result soon, at least in the context of groups.

This is the first real theorem about dimension. I wanted you to see this so that you at least understand the flavor of proof. We’ll have another similar proof when we get to Galois Theory.

Kernels and Images

Let’s take a break from matrix representations, and instead think about linear transformations. For the rest of this post, let T:V\to W be a linear transformation. We are going to define images and kernels. Images are slightly easier to understand, so we’ll talk about them first.


The image of T is a subspace of the codomain W. In English, it is the set of all vectors we can get as a result of applying T to something. In math, it is defined by:

\mbox{Im }T=\{Tv\mid v\in V\}

Sometimes we want to talk about the dimension of the image, and want a shorthand for that. This is called the rank of the map, and is written:

\mbox{rank }T=\dim\mbox{Im }T.


The kernel of T is a subspace of the domain V. In English, it is the set of all the things that T sends to zero. In math, it is defined by:

\ker T=\{v\in V\mid Tv=0_W\}

Sometimes we want to talk about the dimension of the kernel, and want a shorthand for that. This is called the nullity of the map, and is written:

\mbox{null }T=\dim\ker T

Some things:

You’ll notice that I’m being careless, and just writing \dim when I should be writing \dim_k. I’m also just saying let V be a vector space, and not saying over what field we are working. Such is mathematics. When things are clear, we don’t like to write out all the details. Linear transformations must be between vector spaces over the same field, and the dimension of a vector space is the dimension over the implied field. I’ll write it all down whenever it isn’t clear.

Although I personally prefer to say \dim\ker T and \dim\mbox{Im }T instead of nullity and rank, you should know the terms. Tomorrow we’ll prove a good theorem relating these things.

Matrix Multiplication part 2

This is part two of the “boring stuff.” I put that in quotes to evoke a sense of perspective. How boring can it be? We’re still doing math.

Suppose I have linear transformations T:U\to V, and S:V\to W. Let \mathcal A, \mathcal B, and \mathcal C be bases for U, V, and W respectively. What happens when I compose T and S? I sure hope that it turns out to be multiplying the matrices (S)_{\mathcal B\to \mathcal C} and (T)_{\mathcal A\to \mathcal B}.

Let \mathcal A=\{a_1,\dots,a_\ell\}, \mathcal B=\{b_1,\dots,b_m\}, and \mathcal C=\{c_1,\dots,c_n\}.

Let M=(S\circ T)_{\mathcal A\to \mathcal C}, and write t_{k,i} and s_{p,k} for the entries of (T)_{\mathcal A\to \mathcal B} and (S)_{\mathcal B\to \mathcal C}, so that Ta_i=\sum_k t_{k,i}b_k and Sb_k=\sum_p s_{p,k}c_p. I want to know what the entry M_{j,i} in the matrix is. Since column i of M records (S\circ T)a_i, I want to find out what S\circ T does to the basis vector a_i and then look at the coefficient of c_j. That will give me the appropriate entry.

(S\circ T)a_i=S(Ta_i)=S\left(\displaystyle\sum_{k=1}^mt_{k,i}b_k\right)=\displaystyle\sum_{k=1}^mt_{k,i}S(b_k)=\sum_{k=1}^mt_{k,i}\left(\sum_{p=1}^ns_{p,k}c_p\right)

Of course, we only care about the coefficient of c_j, so we can disregard any time p\neq j. This gives us:

M_{j,i}=\displaystyle\sum_{k=1}^ms_{j,k}t_{k,i}.

And of course, this is exactly what you would get if you multiplied matrices (S_{\mathcal B\to\mathcal C})\cdot( T_{\mathcal A\to\mathcal B}) the way you were taught.
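Here is a quick numerical check of that claim, as a sketch only. The two matrices are made up, the helper names matvec and matmul are hypothetical, and matrices are stored as lists of rows, with column i holding the image of the i-th basis vector as in the column notation. The check compares the columns of the product with what you get by applying T and then S to each basis vector.

```python
def matvec(M, x):
    """Multiply the matrix M (a list of rows) by the column vector x."""
    return [sum(M[j][i] * x[i] for i in range(len(x))) for j in range(len(M))]

def matmul(S, T):
    """Row-times-column product: entry (j, i) is the sum over k of S[j][k] * T[k][i]."""
    return [
        [sum(S[j][k] * T[k][i] for k in range(len(T))) for i in range(len(T[0]))]
        for j in range(len(S))
    ]

# Sample matrices: T maps a 2-dimensional U to a 3-dimensional V,
# and S maps V to a 2-dimensional W.
T = [[1, 2], [0, 1], [3, -1]]
S = [[2, 0, 1], [1, -1, 4]]

M = matmul(S, T)

# Column i of M should be S(T(a_i)), where a_i is the i-th basis vector of U.
basis = [[1, 0], [0, 1]]
columns_via_composition = [matvec(S, matvec(T, e)) for e in basis]
columns_of_M = [[M[j][i] for j in range(len(M))] for i in range(len(basis))]
print(M)
```

Note the order: the matrix of S\circ T is the matrix of S times the matrix of T, because T acts on the vector first.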

Okay. I’m pretty sure we’re done with annoying computations for a while.

Matrix Multiplication

Warning: Today and tomorrow are going to be painful. We’re checking that multiplying matrices and vectors the way we learned works the way it should. It’s boring, but necessary.

If I have a linear transformation T from V to W, and \mathcal B=\{v_1,\dots,v_n\} is a basis for V and \mathcal C=\{w_1,\dots,w_m\} is a basis for W, I can write the matrix T_{\mathcal B\to\mathcal C}. If I have a vector v\in V, I can write that in its “column notation” in the basis \mathcal B. Let’s say

v=\lambda_1v_1+\cdots+\lambda_nv_n=\left(\begin{array}{c}\lambda_1\\ \vdots \\ \lambda_n\end{array}\right)_\mathcal B,

and say that

Tv_i=a_{1,i}w_1+\cdots+a_{m,i}w_m=\left(\begin{array}{c}a_{1,i}\\ \vdots \\ a_{m,i}\end{array}\right)_\mathcal C.

Then we can use the linearity of T to determine Tv in terms of the Tv_i.

Tv = T\left(\displaystyle\sum_{i=1}^n\lambda_i v_i\right)=\displaystyle\sum_{i=1}^nT(\lambda_i v_i)=\displaystyle\sum_{i=1}^n\lambda_i Tv_i

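but then we know each Tv_i in terms of the w_j, so we can figure out the entire sum in terms of the w_j:

Tv=\displaystyle\sum_{i=1}^n\lambda_i\left(\displaystyle\sum_{j=1}^ma_{j,i}w_j\right)=\displaystyle\sum_{j=1}^m\left(\displaystyle\sum_{i=1}^na_{j,i}\lambda_i\right)w_j.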

If we write this all in the “column notation,” it would say:

\left(\begin{array}{ccc}a_{1,1} & \cdots & a_{1,n}\\ \vdots & \ddots & \vdots\\ a_{m,1} & \cdots & a_{m,n}\end{array}\right)_{\mathcal B\to\mathcal C}\left(\begin{array}{c}\lambda_1\\ \vdots\\ \lambda_n\end{array}\right)_{\mathcal B}=\left(\begin{array}{c}a_{1,1}\lambda_1+\cdots+a_{1,n}\lambda_n\\ \vdots\\ a_{m,1}\lambda_1+\cdots+a_{m,n}\lambda_n\end{array}\right)_{\mathcal C}

You may notice that this is exactly the way you probably learned to multiply a vector by a matrix. What does this mean? It means back in the day, no one told you why you did it that way, but now you know why it’s correct.
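If you want to see the two points of view agree on actual numbers, here is a short sketch. The matrix A and the coefficients lam are made up for illustration, and the helper column is a hypothetical name. One computation adds up \lambda_i times the i-th column (that is, \lambda_i Tv_i); the other uses the row-by-row rule from school.

```python
# A sample 2x3 matrix: column i holds the coordinates of T v_i in the basis C.
A = [
    [1, 2, 0],
    [-1, 3, 4],
]
lam = [2, 1, -1]  # coordinates of v in the basis B

def column(A, i):
    """Return column i of A, i.e. the coordinates of T v_i."""
    return [row[i] for row in A]

# Linearity: Tv is the sum of lam_i * (T v_i).
by_columns = [0] * len(A)
for i, coefficient in enumerate(lam):
    by_columns = [acc + coefficient * entry
                  for acc, entry in zip(by_columns, column(A, i))]

# The rule you learned: entry j is row j of A dotted with the coordinates of v.
by_rows = [sum(a * l for a, l in zip(row, lam)) for row in A]

print(by_columns, by_rows)
```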

Linear transformations as matrices

Like always, suppose we have a finite dimensional vector space V over a field k. And suppose we have a basis \mathcal B=\{v_1,v_2,\dots,v_n\}. Then if I have any vector v\in V, I can write it as

v=c_1v_1+c_2v_2+\cdots+c_nv_n,

where each c_i\in k. In an effort to be lazy, I could skip out on writing the v_i and just record the c_i (of course, this is entirely dependent on my choice of basis). I am going to write them in a vertical column like so:

v=c_1v_1+c_2v_2+\cdots+c_nv_n=\left(\begin{array}{c}c_1\\ c_2\\ \vdots\\ c_n\end{array}\right)_{\mathcal B}

We put the \mathcal B in the subscript to remind us which basis we are working in. A few basic examples:

v_1=\left(\begin{array}{c}1\\ 0\\ 0\\ \vdots\\ 0\end{array}\right)_{\mathcal B}, v_2=\left(\begin{array}{c}0\\ 1\\ 0\\ \vdots\\ 0\end{array}\right)_{\mathcal B}, all the way to v_n=\left(\begin{array}{c}0\\ 0\\ 0\\ \vdots\\ 1\end{array}\right)_{\mathcal B}.

But last time we noticed that if we know what a linear transformation does to a basis, we know what it does to the entire space. So if T:V\to W is a linear transformation, I can  write down Tv_1, Tv_2, … Tv_n in this “column notation.”

Let’s say \mathcal C=\{w_1,w_2,\dots,w_m\} is a basis for W. Notice that the dimensions of V and W need not be the same. Since Tv_i\in W, we can write Tv_i=a_{1,i}w_1+a_{2,i}w_2+\cdots+a_{m,i}w_m. In the column notation, we have:

Tv_i=\left(\begin{array}{c}a_{1,i}\\ a_{2,i}\\ \vdots\\ a_{m,i}\end{array}\right)_{\mathcal C}.

But we have one of these for each i=1,2,\dots,n. There’s no sense in writing so many parentheses and subscripted \mathcal Cs. Let’s just concatenate them. Then we can just express T as the concatenated list of Tv_1, Tv_2,\dots, Tv_n. We’ll write it like

(Tv_1, Tv_2,\dots, Tv_n)_{\mathcal B\to\mathcal C},

where the subscripts tell us that we’re using the basis \mathcal B for V and \mathcal C for W. If we actually write out the entries Tv_i in their column form, we get something like this:

\left(\begin{array}{ccc}a_{1,1}&\cdots&a_{1,n}\\\vdots&\ddots&\vdots\\a_{m,1}&\cdots&a_{m,n}\end{array}\right)_{\mathcal B\to\mathcal C}.

Look familiar? Matrices are representations of linear transformations in a given basis! It will be important to remember that the bases are largely irrelevant, and that two maps are the same if they do the same thing to the space, even if they look different in different bases. For now, this is a side issue, but know that it will become an important question later on. Tomorrow we will look at lots of examples and how to apply matrices to vectors.
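As a concrete sketch of this recipe, take the map T(x,y,z)=(x+2y-z, z-2.5y) from \mathbb R^3 to \mathbb R^2 and use the standard bases on both sides (the choice of bases and the helper names standard_basis and matrix_of are my own assumptions for illustration). Feed in each basis vector, and stack the outputs as columns.

```python
from fractions import Fraction

def T(v):
    """The sample map T(x, y, z) = (x + 2y - z, z - 2.5y), with 2.5 kept exact."""
    x, y, z = v
    return (x + 2 * y - z, z - Fraction(5, 2) * y)

def standard_basis(n):
    """Return the standard basis vectors e_1, ..., e_n as tuples."""
    return [tuple(1 if i == j else 0 for j in range(n)) for i in range(n)]

def matrix_of(T, n):
    """Build the matrix of T (as a list of rows) whose i-th column is T(e_i)."""
    columns = [T(e) for e in standard_basis(n)]
    m = len(columns[0])
    return [[columns[i][j] for i in range(n)] for j in range(m)]

A = matrix_of(T, 3)
for row in A:
    print([str(entry) for entry in row])
```

The columns are T(e_1)=(1,0), T(e_2)=(2,-5/2), and T(e_3)=(-1,1), so the matrix has 2 rows and 3 columns, matching the remark that the dimensions of V and W need not agree.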

A note on the subscripts:

The notation I am using is 100% made up by me. Some people write it like _{\mathcal B}M_{\mathcal C}. I actually like this notation better, but I can’t seem to get the right-side subscripts to work in these posts on a big matrix. Oh well.

Linear Transformations

As I mentioned before, if I have any algebraic object, some interesting things to think about are

  • The object itself
  • Subobjects of the object
  • Structure preserving maps between objects (called homomorphisms)
  • Quotient objects
If our object is vector spaces, then we’ve pretty much exhausted what we can say in any generality about them. The “subobjects” are subspaces and those too seem to not give us much to study. So let’s consider the structure preserving maps between vector spaces. In the context of vector spaces, they are called linear transformations (somehow this is a more informative name than vector space homomorphisms).

A linear transformation between two vector spaces V and W (over a field k) is a function T:V\to W such that

  • For all u,v\in V, T(u+v)=T(u)+T(v)
  • For all v\in V, and c\in k, T(c\cdot v)=c\cdot T(v).
These two conditions imply that T(0_V)=0_W, where 0_V is the special zero element (or additive identity) in V and 0_W is the special zero element in W.
It is common to drop the parentheses and just write Tv when we mean T(v). This may seem silly now, but in a few days you’ll understand why.
Let’s see some examples:
  • T:V\to W by Tv=0_W. This is a trivial transformation, or zero transformation.
  • T:V\to V by Tv=v. This is the identity transformation. We write it as \mbox{id}_V.
  • More concretely, T:\mathbb R^3\to\mathbb R^2 by T(x,y,z)=(x+2y-z, z-2.5y).
  • T:\mathbb R^3\to \mathbb R^3 by T(x,y,z)=(x,y,0). This is an example of a projection. The picture on the Wikipedia page is a good one.
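Here is a small sketch that tests the two defining conditions on the last two examples. The test vectors, the scalar, the helper names (add, scale, looks_linear), and the non-example shift are all made up for illustration. Passing on one pair of vectors does not prove linearity, but failing on one pair does disprove it.

```python
from fractions import Fraction

def T(v):
    """T(x, y, z) = (x + 2y - z, z - 2.5y), with 2.5 kept exact as 5/2."""
    x, y, z = v
    return (x + 2 * y - z, z - Fraction(5, 2) * y)

def proj(v):
    """The projection (x, y, z) -> (x, y, 0)."""
    x, y, z = v
    return (x, y, 0)

def shift(v):
    """A non-example: translating by (1, 0, 0) moves the zero vector."""
    x, y, z = v
    return (x + 1, y, z)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    return tuple(c * a for a in v)

def looks_linear(f, u, v, c):
    """Spot-check f(u + v) = f(u) + f(v) and f(c v) = c f(v) on one sample."""
    additive = f(add(u, v)) == add(f(u), f(v))
    homogeneous = f(scale(c, v)) == scale(c, f(v))
    return additive and homogeneous

u = (1, -2, 3)
v = (4, 0, Fraction(1, 2))
c = Fraction(-3, 7)

print(looks_linear(T, u, v, c), looks_linear(proj, u, v, c), looks_linear(shift, u, v, c))
```

The shift map also fails the quick test T(0_V)=0_W, since it sends (0,0,0) to (1,0,0).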
The magic of bases:

Just a reminder, I only want to think about finite dimensional vector spaces. While much of this works for infinite dimensional spaces, treatment of such spaces requires care for which I don’t want to put forth the effort. I will try to always specify, but you should just assume I mean finite dimensional spaces if I forget. I will be sure to mention specifically any time I want to talk about infinite dimensional spaces.

So let V be a finite dimensional vector space over k, and let \{v_1,v_2,\dots, v_n\} be a basis for V. Then, for any vector v\in V, we can write v=c_1v_1+c_2v_2+\cdots+c_nv_n where each c_i\in k. If T:V\to W is a linear transformation, then

Tv=T\left(\displaystyle\sum_{i=1}^nc_iv_i\right)=\displaystyle\sum_{i=1}^nT(c_iv_i)=\displaystyle\sum_{i=1}^nc_i\cdot Tv_i.

Though this is just a simple application of the rules for a linear transformation, it tells us something interesting. Namely, if I know what T does to an entire basis, then I know what T does to every vector in V! Tomorrow I’ll talk about notation for this. Spoilers can be found here.