Examples of natural transformations (part 2)

Here’s the most common example of a natural transformation that I know of, and probably the most enlightening. Take a vector space V and let V^* denote its dual. If V is finite dimensional, then V\cong V^*, but not in any “natural” way. Moreover, even if V is infinite dimensional, V embeds in V^*, but again, not in a “natural” way. That is, we have to pick a basis to show how V embeds in V^*. If we pick a different basis, we get a different embedding. We don’t have any method that is inherent to the structure. And it’s not just that we haven’t found one. They don’t exist.

However, V does embed inside V^{**} naturally (i.e., in a way that does not depend on any choice of basis). How are we to describe this? We say that there is a natural transformation between the functors \mbox{id}_{\textsc{Vec}}:\textsc{Vec}\to\textsc{Vec} and -^{**}:\textsc{Vec}\to\textsc{Vec}. I’m suppressing the field over which we’re working in the notation. Oh well. It’s not too important.

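So here goes. Define \eta_V:V\to V^{**} by \eta_V(v)=\mbox{ev}_v. By \mbox{ev}_v what I mean is the map that takes in a functional \phi from V^* and evaluates it at v. That is, \mbox{ev}_v:\phi\mapsto \phi(v). Since \mbox{ev}_v is a linear map from V^* to the underlying field, \mbox{ev}_v is an element of V^{**}. Now let T:V\to W be any linear transformation. Does this make the square

[diagram: \eta_V:V\to V^{**} across the top, T:V\to W down the left, T^{**}:V^{**}\to W^{**} down the right, \eta_W:W\to W^{**} across the bottom]

commute?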

If we go across the top and then down, we take v\in V and send it to

(T^{**}\circ\eta_V)(v)=T^{**}(\mbox{ev}_v)=(\phi\mapsto \mbox{ev}_v(T^*\phi))

=(\phi\mapsto (T^*\phi)(v))=(\phi\mapsto\phi(Tv))=\mbox{ev}_{Tv},

where \phi now ranges over W^* and T^*\phi=\phi\circ T.

If we go down and then across, we take v\in V and send it to

(\eta_W\circ T)v=\mbox{ev}_{Tv}

Woohoo! We just proved naturality. In fact, we did two things. First, we showed that we could embed V in V^{**}, and we didn’t have to make any choices about elements to do so. This embedding is canonical. Second, we showed that linear transformations between V and W correspond to linear transformations between V^{**} and W^{**} in a nice way (such that the diagram commutes).
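
If you like to see this sort of thing executed, here is a minimal Python sketch of the computation above, with functionals modelled as plain functions. The particular map T, the functional psi, and the vector v are made-up illustrations (they are not part of the argument), and one numerical check is of course no substitute for the proof.

```python
def ev(v):
    """eta: send a vector v to 'evaluate at v', an element of the double dual."""
    return lambda phi: phi(v)

def dual(T):
    """Given a map T: V -> W, return T*: W* -> V*, which sends psi to psi o T."""
    return lambda psi: (lambda v: psi(T(v)))

# Hypothetical linear map T: R^3 -> R^2.
def T(v):
    x, y, z = v
    return (x + 2 * y - z, z - 3 * y)

# Hypothetical functional psi on W = R^2.
def psi(w):
    a, b = w
    return 5 * a - 7 * b

v = (1, 2, 3)
T_double_dual = dual(dual(T))        # T**: V** -> W**

top_then_down = T_double_dual(ev(v))  # T**(eta_V(v)), an element of W**
down_then_across = ev(T(v))           # eta_W(T v), an element of W**

# Two elements of W** are equal when they agree on every functional;
# here we only spot-check the single functional psi.
print(top_then_down(psi), down_then_across(psi))
```

Notice that `dual` is applied twice to get T^{**}, and that nowhere did we pick a basis.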

The Rank-Nullity Theorem

Rank-Nullity Theorem:

Let T:V\to W be a linear transformation between vector spaces V and W. Then

\dim V=\dim\mbox{Im }T+\dim\ker T.

Proof:

Take a basis \{u_1,\dots, u_m\} for \ker T, and extend it to a basis \{u_1,\dots,u_m,v_1,\dots,v_n\} for V. Now it suffices to show that \{Tv_1,\dots, Tv_n\} form a basis for \mbox{Im }T (since then we would have \dim V=m+n, \dim\ker T=m, and \dim\mbox{Im }T=n).

First we will check that \{Tv_1,\dots, Tv_n\} are linearly independent. Suppose c_1 Tv_1+\cdots+c_n Tv_n=0. Then, T(c_1v_1+\cdots+c_nv_n)=0, so c_1v_1+\cdots+c_nv_n\in\ker T. This means it can be represented in terms of the \{u_1,\dots, u_m\}. Write

c_1v_1+\cdots+c_nv_n=b_1u_1+\cdots +b_mu_m

c_1v_1+\cdots+c_nv_n-b_1u_1-\cdots -b_mu_m=0.

But this is a linear combination of basis vectors of V which sum to zero, so it must be that each of the c_i=0 and b_i=0 (we don’t really care about the b_i, but it is true anyway). The interesting part is that c_i=0 for every i, exactly the condition needed to say that \{Tv_1,\dots, Tv_n\} is linearly independent.

Now we need only show that \mbox{span}(\{Tv_1,\dots, Tv_n\})=\mbox{Im }T. Let w\in\mbox{Im }T. There is some vector v which has Tv=w. Let’s write v=\lambda_1u_1+\cdots+\lambda_mu_m+\mu_1v_1+\cdots+\mu_nv_n. When we apply T, we get

w=Tv=T(\lambda_1u_1+\cdots+\lambda_mu_m+\mu_1v_1+\cdots+\mu_nv_n)

w=\lambda_1Tu_1+\cdots+\lambda_mTu_m+\mu_1Tv_1+\cdots+\mu_nTv_n

But since, for each i, Tu_i=0,

w=\mu_1Tv_1+\cdots+\mu_nTv_n\in\mbox{span}(\{Tv_1,\dots,Tv_n\}).

Therefore, \{Tv_1,\dots,Tv_n\} form a basis for \mbox{Im }T, proving the theorem.

\square
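
To see the theorem in action, here is a small sketch in Python that row-reduces a matrix over the rationals, reads off the rank from the pivots, and builds a kernel basis from the free columns. The matrix A and the helper names rref and kernel_basis are my own choices for illustration; the code checks one example and does not replace the proof.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    A = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(n_cols):
        if r == n_rows:
            break
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, n_rows) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        A[r] = [x / A[r][c] for x in A[r]]          # scale pivot to 1
        for i in range(n_rows):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    return A, pivots

def kernel_basis(rows):
    """One kernel vector per free (non-pivot) column."""
    R, pivots = rref(rows)
    n_cols = len(rows[0])
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -R[r][f]
        basis.append(v)
    return basis

# Hypothetical T: Q^4 -> Q^3 (the third row is the sum of the first two).
A = [[1, 2, 0, 1],
     [0, 1, 1, 0],
     [1, 3, 1, 1]]

rank = len(rref(A)[1])
ker = kernel_basis(A)
nullity = len(ker)

# Every vector in ker really is sent to zero by A.
assert all(sum(a * x for a, x in zip(row, v)) == 0 for v in ker for row in A)
print(rank, nullity, len(A[0]))
```

Here the domain has dimension 4, and the rank and nullity should add up to exactly that.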

Observations:

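Notice the distinct lack of assumptions. This is true for any linear transformation between any vector spaces over any field. (The proof above uses finite bases, in keeping with our standing finite-dimensional assumption.) That’s cool.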

Another thing that is cool is that this theorem generalizes wildly. This is the vector space version of what is known as Noether’s first isomorphism theorem. It’s true in the context of many other algebraic structures. We’ll probably see this result soon, at least in the context of groups.

This is the first real theorem about dimension. I wanted you to see this so that you at least understand the flavor of proof. We’ll have another similar proof when we get to Galois Theory.

Kernels and Images

Let’s take a break from matrix representations, and instead think about linear transformations. For the rest of this post, let T:V\to W be a linear transformation. We are going to define images and kernels. Images are slightly easier to understand, so we’ll talk about them first.

Images:

The image of T is a subspace of the codomain W. In English, it means any vector we can get as a result of applying T to something. In math, it is defined by:

\mbox{Im }T=\{Tv\mid v\in V\}

Sometimes we want to talk about the dimension of the image, and want a shorthand for that. This is called the rank of the map, and is written:

\mbox{rank }T=\dim\mbox{Im }T.

Kernels:

The kernel of T is a subspace of the domain V. In English, it means all of the things that T sends to zero. In math, it is defined by:

\ker T=\{v\in V\mid Tv=0_W\}

Sometimes we want to talk about the dimension of the kernel, and want a shorthand for that. This is called the nullity of the map, and is written:

\mbox{null }T=\dim\ker T
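
As a quick worked example (my choice of map, picked because it is easy to picture), take the projection T:\mathbb R^3\to\mathbb R^3 given by T(x,y,z)=(x,y,0). Its image is the xy-plane and its kernel is the z-axis:

```latex
\mbox{Im }T=\{(x,y,0)\mid x,y\in\mathbb R\},\qquad \mbox{rank }T=2

\ker T=\{(0,0,z)\mid z\in\mathbb R\},\qquad \mbox{null }T=1
```

A basis for the image is \{(1,0,0),(0,1,0)\} and a basis for the kernel is \{(0,0,1)\}.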

Some things:

You’ll notice that I’m being careless, and just writing \dim when I should be writing \dim_k. I’m also just saying let V be a vector space, and not saying over what field we are working. Such is mathematics. When things are clear, we don’t like to write out all the details. Linear transformations must be between vector spaces over the same field, and the dimension of a vector space is the dimension over the implied field. I’ll write it all down whenever it isn’t clear.

I personally prefer to say \dim\ker T and \dim\mbox{Im }T instead of nullity and rank, but you should know the terms. Tomorrow we’ll prove a good theorem relating these things.

Matrix Multiplication part 2

This is part two of the “boring stuff.” I put that in quotes to evoke a sense of perspective. How boring can it be? We’re still doing math.

Suppose I have linear transformations T:U\to V, and S:V\to W. Let \mathcal A, \mathcal B, and \mathcal C be bases for U, V, and W respectively. What happens when I compose T and S? I sure hope that it turns out to be multiplying the matrices (T)_{\mathcal A\to \mathcal B} and (S)_{\mathcal B\to \mathcal C}.

Let \mathcal A=\{a_1,\dots,a_\ell\}, \mathcal B=\{b_1,\dots,b_m\}, and \mathcal C=\{c_1,\dots,c_n\}.

Let M=(S\circ T)_{\mathcal A\to \mathcal C}. I want to know what the entry M_{j,i} in the matrix is (row j, column i). So I want to find out what S\circ T does to the basis vector a_i and then look at the coefficient of c_j. Write t_{k,i} and s_{p,k} for the entries of (T)_{\mathcal A\to \mathcal B} and (S)_{\mathcal B\to \mathcal C}, using the same column convention as before, so that Ta_i=\sum_k t_{k,i}b_k and Sb_k=\sum_p s_{p,k}c_p. Then

(S\circ T)a_i=S(Ta_i)=S\left(\displaystyle\sum_{k=1}^mt_{k,i}b_k\right)=\displaystyle\sum_{k=1}^mt_{k,i}S(b_k)=\sum_{k=1}^mt_{k,i}\left(\sum_{p=1}^ns_{p,k}c_p\right)

Of course, we only care about the coefficient of c_j, so we can disregard any term with p\neq j. This gives us:

M_{j,i}=\displaystyle\sum_{k=1}^ms_{j,k}t_{k,i}

And of course, this is exactly what you would get if you multiplied matrices (S_{\mathcal B\to\mathcal C})\cdot( T_{\mathcal A\to\mathcal B}) the way you were taught.
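
Here is a minimal sketch in Python that checks this on one example, with everything written in the standard bases so that a matrix is just a list of rows. The matrices S and T and the helper names mat_vec and mat_mul are made up for illustration.

```python
def mat_vec(M, v):
    """Apply the matrix M (a list of rows) to the column vector v."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in M]

def mat_mul(S, T):
    """The usual row-times-column product S * T."""
    return [[sum(S[j][k] * T[k][i] for k in range(len(T)))
             for i in range(len(T[0]))]
            for j in range(len(S))]

# Hypothetical maps: T: R^3 -> R^2 and S: R^2 -> R^3.
T = [[1, 2, -1],
     [0, -3, 1]]
S = [[2, 0],
     [1, 1],
     [0, 4]]

M = mat_mul(S, T)

# Column i of M should be what S o T does to the i-th basis vector.
for i in range(3):
    e_i = [1 if r == i else 0 for r in range(3)]
    column_i = [M[j][i] for j in range(3)]
    assert column_i == mat_vec(S, mat_vec(T, e_i))

print(M)
```

Note the order: the matrix of S\circ T is the product with S on the left, matching the fact that T is applied first.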

Okay. I’m pretty sure we’re done with annoying computations for a while.

Matrix Multiplication

Warning: Today and tomorrow are going to be painful. We’re checking that multiplication of matrices and vectors the way we learned works the way it should. It’s boring, but necessary.

If I have a linear transformation T from V to W, and \mathcal B=\{v_1,\dots,v_n\} is a basis for V and \mathcal C=\{w_1,\dots,w_m\} is a basis for W, I can write the matrix T_{\mathcal B\to\mathcal C}. If I have a vector v\in V, I can write that in its “column notation” in the basis \mathcal B. Let’s say

v=\lambda_1v_1+\cdots+\lambda_nv_n=\left(\begin{array}{c}\lambda_1\\ \vdots \\ \lambda_n\end{array}\right)_\mathcal B,

and say that

Tv_i=a_{1,i}w_1+\cdots+a_{m,i}w_m=\left(\begin{array}{c}a_{1,i}\\ \vdots \\ a_{m,i}\end{array}\right)_\mathcal C.

Then we can use the linearity of T to determine Tv in terms of the Tv_i.

Tv = T\left(\displaystyle\sum_{i=1}^n\lambda_i v_i\right)=\displaystyle\sum_{i=1}^nT(\lambda_i v_i)=\displaystyle\sum_{i=1}^n\lambda_i Tv_i

but then we know each Tv_i in terms of the w_j, so we can figure out the entire sum in terms of the w_j.

Tv=\displaystyle\sum_{i=1}^n\lambda_i\left(\displaystyle\sum_{j=1}^ma_{j,i}w_j\right)=\sum_{j=1}^m(a_{j,1}\lambda_1+\cdots+a_{j,n}\lambda_n)w_j

If we write this all in the “column notation,” it would say:

\left(\begin{array}{ccc}a_{1,1} & \cdots & a_{1,n}\\ \vdots & \ddots & \vdots\\ a_{m,1} & \cdots & a_{m,n}\end{array}\right)_{\mathcal B\to\mathcal C}\left(\begin{array}{c}\lambda_1\\ \vdots\\ \lambda_n\end{array}\right)_{\mathcal B}=\left(\begin{array}{c}a_{1,1}\lambda_1+\cdots+a_{1,n}\lambda_n\\ \vdots\\ a_{m,1}\lambda_1+\cdots+a_{m,n}\lambda_n\end{array}\right)_{\mathcal C}

You may notice that this is exactly the way you probably learned to multiply a vector by a matrix. What does this mean? It means back in the day, no one told you why you did it that way, but now you know why it’s correct.
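
The two ways of computing Tv can be compared directly. The sketch below (Python, with a made-up matrix A and coefficient vector lam) computes the result once as a linear combination of the columns Tv_i, and once by the row-by-row rule you learned in school.

```python
# Hypothetical 2x3 matrix: column i holds the coordinates of T(v_i).
A = [[1, 2, -1],
     [0, -3, 1]]
lam = [4, 1, 2]   # coordinates of v in the basis B

m, n = len(A), len(A[0])

# Way 1: Tv = sum_i lam_i * T(v_i), a linear combination of the columns.
by_columns = [0] * m
for i in range(n):
    for j in range(m):
        by_columns[j] += lam[i] * A[j][i]

# Way 2: the school rule, entry j is (row j) dotted with the vector.
by_rows = [sum(A[j][i] * lam[i] for i in range(n)) for j in range(m)]

print(by_columns, by_rows)
```

Both lists hold the coordinates of Tv in the basis \mathcal C.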

Linear transformations as matrices

Like always, suppose we have a finite dimensional vector space V over a field k. And suppose we have a basis \mathcal B=\{v_1,v_2,\dots,v_n\}. Then if I have any vector v, I can write it as

v=c_1v_1+c_2v_2+\cdots+c_nv_n

where each c_i\in k. In an effort to be lazy, I could skip out on writing the v_i and just record the c_i (of course, this is entirely dependent on my choice of basis). I am going to write them in a vertical column like so:

v=c_1v_1+c_2v_2+\cdots+c_nv_n=\left(\begin{array}{c}c_1\\ c_2\\ \vdots\\ c_n\end{array}\right)_{\mathcal B}

We put the \mathcal B in the subscript to remind us over what basis we are working. A few basic examples,

v_1=\left(\begin{array}{c}1\\ 0\\ 0\\ \vdots\\ 0\end{array}\right)_{\mathcal B}, v_2=\left(\begin{array}{c}0\\ 1\\ 0\\ \vdots\\ 0\end{array}\right)_{\mathcal B}, all the way to v_n=\left(\begin{array}{c}0\\ 0\\ 0\\ \vdots\\ 1\end{array}\right)_{\mathcal B}

But last time we noticed that if we know what a linear transformation does to a basis, we know what it does to the entire space. So if T:V\to W is a linear transformation, I can write down Tv_1, Tv_2, \dots, Tv_n in this “column notation.”

Let’s say \mathcal C=\{w_1,w_2,\dots,w_m\} is a basis for W. Notice that the dimensions of V and W need not be the same. Since Tv_i\in W, we can write Tv_i=a_{1,i}w_1+a_{2,i}w_2+\cdots+a_{m,i}w_m. In the column notation, we have:

Tv_i=\left(\begin{array}{c}a_{1,i}\\ a_{2,i}\\ \vdots\\ a_{m,i}\end{array}\right)_{\mathcal C}.

But we have one of these for each i=1,2,\dots,n. There’s no sense in writing so many parentheses and subscripted \mathcal Cs. Let’s just concatenate them. Then we can just express T as the concatenated list of Tv_1, Tv_2,\dots, Tv_n. We’ll write it like

(Tv_1, Tv_2,\dots, Tv_n)_{\mathcal B\to\mathcal C},

where the subscripts tell us that we’re using the basis \mathcal B for V and \mathcal C for W. If we actually write out the entries Tv_i in their column form, we get something like this:

\left(\begin{array}{ccc}a_{1,1}&\cdots&a_{1,n}\\\vdots&\ddots&\vdots\\a_{m,1}&\cdots&a_{m,n}\end{array}\right)_{\mathcal B\to\mathcal C}.

Look familiar? Matrices are representations of linear transformations in a given basis! It will be important to remember that the bases are largely irrelevant, and that two maps are the same if they do the same thing to the space, even if they look different in different bases. For now, this is a side issue, but know that it will become an important question later on. Tomorrow we will look at lots of examples and how to apply matrices to vectors.
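
As a sketch of the recipe "apply T to each basis vector and line up the results as columns," here it is in Python for the map T(x,y,z)=(x+2y-z, z-2.5y), using the standard bases of \mathbb R^3 and \mathbb R^2. The helper name matrix_of is my own, and exact fractions are used in place of 2.5 purely to avoid floating point noise.

```python
from fractions import Fraction

def T(v):
    """The map T(x, y, z) = (x + 2y - z, z - 2.5y)."""
    x, y, z = v
    return (x + 2 * y - z, z - Fraction(5, 2) * y)

def matrix_of(T, n):
    """Matrix of T in the standard bases: column i is T(e_i)."""
    columns = []
    for i in range(n):
        e_i = tuple(Fraction(1) if r == i else Fraction(0) for r in range(n))
        columns.append(T(e_i))
    # Turn the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

M = matrix_of(T, 3)
for row in M:
    print([str(entry) for entry in row])
```

A different choice of bases would give a different-looking matrix for the very same map.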

A note on the subscripts:

The notation I am using is 100% made up by me. Some people write it like _{\mathcal B}M_{\mathcal C}. I actually like this notation better, but I can’t seem to get the right-side subscripts to work in these posts on a big matrix. Oh well.

Linear Transformations

As I mentioned before, if I have any algebraic object, some interesting things to think about are

  • The object itself
  • Subobjects of the object
  • Structure preserving maps between objects (called homomorphisms)
  • Quotient objects
If our object is vector spaces, then we’ve pretty much exhausted what we can say in any generality about them. The “subobjects” are subspaces and those too seem to not give us much to study. So let’s consider the structure preserving maps between vector spaces. In the context of vector spaces, they are called linear transformations (somehow this is a more informative name than vector space homomorphisms).
Definition:

A linear transformation between two vector spaces V and W (over a field k) is a function T:V\to W such that

  • For all u,v\in V, T(u+v)=T(u)+T(v)
  • For all v\in V, and c\in k, T(c\cdot v)=c\cdot T(v).
These two conditions imply that T(0_V)=0_W, where 0_V is the special zero element (or additive identity) in V and 0_W is the special zero element in W.
It is common to sometimes forget about the parentheses and just write Tv when we mean T(v). This may seem silly now, but in a few days you’ll understand why.
Let’s see some examples:
  • T:V\to W by Tv=0_W. This is a trivial transformation, or zero transformation.
  • T:V\to V by Tv=v. This is the identity transformation. We write it as \mbox{id}_V.
  • More concretely, T:\mathbb R^3\to\mathbb R^2 by T(x,y,z)=(x+2y-z, z-2.5y)
  • T:\mathbb R^3\to \mathbb R^3 by T(x,y,z)=(x,y,0). This is an example of a projection. The picture on the Wikipedia page is a good one.
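
If you want to convince yourself that the third example really satisfies both conditions in the definition, here is a small spot-check in Python. The test vectors u, v and the scalar c are arbitrary choices of mine, and checking a few values is only a sanity check, not a proof of linearity.

```python
from fractions import Fraction

def T(v):
    """The map T(x, y, z) = (x + 2y - z, z - 2.5y)."""
    x, y, z = v
    return (x + 2 * y - z, z - Fraction(5, 2) * y)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    return tuple(c * a for a in v)

u = (1, 0, 4)
v = (-2, 3, 5)
c = Fraction(7, 3)

additive = T(add(u, v)) == add(T(u), T(v))       # T(u+v) = T(u) + T(v)
homogeneous = T(scale(c, v)) == scale(c, T(v))   # T(c v) = c T(v)
zero_to_zero = T((0, 0, 0)) == (0, 0)            # T(0_V) = 0_W

print(additive, homogeneous, zero_to_zero)
```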
The magic of bases:

Just a reminder, I only want to think about finite dimensional vector spaces. While much of this works for infinite dimensional spaces, treatment of such spaces requires care for which I don’t want to put forth the effort. I will try to always specify, but you should just assume I mean finite dimensional spaces if I forget. I will be sure to mention specifically any time I want to talk about infinite dimensional spaces.

So let V be a finite dimensional vector space over k, and let \{v_1,v_2,\dots, v_n\} be a basis for V. Then, for any vector v\in V, we can write v=c_1v_1+c_2v_2+\cdots+c_nv_n where each c_i\in k. If T:V\to W is a linear transformation, then

Tv=T\left(\displaystyle\sum_{i=1}^nc_iv_i\right)=\displaystyle\sum_{i=1}^nT(c_iv_i)=\displaystyle\sum_{i=1}^nc_i\cdot Tv_i.

Though this is just a simple application of the rules for a linear transformation, it tells us something interesting. Namely, if I know what T does to an entire basis, then I know what T does to every vector in V! Tomorrow I’ll talk about notation for this. Spoilers can be found here.
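
For a concrete instance (my own choice of numbers), take the map T(x,y,z)=(x+2y-z, z-2.5y) from the examples and the standard basis e_1,e_2,e_3 of \mathbb R^3. Knowing only Te_1, Te_2, and Te_3 is enough to compute T(1,2,3):

```latex
Te_1=(1,0),\qquad Te_2=(2,-2.5),\qquad Te_3=(-1,1)

T(1,2,3)=1\cdot Te_1+2\cdot Te_2+3\cdot Te_3=(1+4-3,\ 0-5+3)=(2,-2)
```

Plugging (1,2,3) straight into the formula for T gives (1+4-3,\ 3-5)=(2,-2) as well.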

More about Dimension

I thought today I could point out some facts about dimension.

  1. If W is a subspace of a vector space V over a field k, then

\dim_k W\leq\dim_kV.

  2. You now know what it means when someone says we live in “3-dimensional” space. You can find three basis vectors for the entire universe. (Strictly speaking, this is probably false, but it at least looks true in what little piece of the universe I can see. More on that when we talk about manifolds.)
  3. Sometimes you may hear people ask “what is the fourth dimension?” To you, the informed reader, this now appears to be an uninformed question. What they are trying to ask is “what does the 4-dimensional vector space \mathbb R^4 look like, and what is its fourth basis vector?” The first part of the question makes sense, but is hard to answer. The second part doesn’t make sense. Bases are just sets and therefore aren’t ordered. I could make “the fourth basis vector” whichever I want. If you ever get this question, just point somewhere and say “that way.” You can’t be wrong.
Now I’d like to point out that we know enough about vector spaces to deal with Galois theory. That being said, there is a lot more to think about in linear algebra, and I want to do some of that first, so as to let this information sink in.

Dimension

So we know what a basis is now. How big can a basis be? If you look at some examples, you’ll notice that for a given vector space, every basis you come up with has the same size. This is no coincidence, and is one of the things that make vector spaces so nice to work with.

Theorem: Let V be a vector space. Then every basis for V must have the same size.

This is true, even for vector spaces with infinite bases, but we won’t prove it here. Let \mathcal B and \mathcal C be arbitrary bases for V. We want to show that |\mathcal B|=|\mathcal C|. Instead, I’m going to prove that |\mathcal B|\leq|\mathcal C|. Since I chose the bases arbitrarily, I just as well could have chosen them in the other order. But if both directions are true, then equality must hold. If you take nothing else away from this post, take away the fact that this trick is really cool.

Take \mathcal B = \{b_1,b_2,\dots,b_n\} and remove a vector, say b_n. What is left is definitely still linearly independent, but it no longer spans all of V. Then there have to be some vectors from \mathcal C=\{c_1,c_2,\dots,c_m\} which I can add to \{b_1,b_2,\dots,b_{n-1}\} while keeping it linearly independent. Why? Well, if each c_j were a linear combination of just the b_i, for 1\leq i\leq n-1, then anything we could write as a linear combination of the c_j we could write with those b_i. This means that

V= \mbox{span}(\mathcal C)\subseteq \mbox{span}(\{b_1,\dots,b_{n-1}\})\subsetneq V,

which is a contradiction.

We add in linearly independent vectors from \mathcal C until they span all of V. This must happen, since all of \mathcal C spans V. We now have a new basis for V. If there are any vectors left that came from \mathcal B, we remove one, and add in vectors of \mathcal C until it forms a basis again.

Eventually we will have removed all of \mathcal B and will be left with \mathcal C. But in each step we removed one vector, and added at least one vector, possibly more. So we have proved that |\mathcal B|\leq|\mathcal C|. By the symmetry argument described above, |\mathcal B|=|\mathcal C|.

\square

What does this mean? It means that the size of a basis is a property of the vector space itself, not the specific basis chosen. Define the dimension of a vector space to be the size of any basis. If V is a vector space over a field k, of dimension n, we write

\dim_kV=n.

Here are some examples. You should be able to come up with a basis for each vector space. Remember, you only need one, since they are all the same size.

  • \dim_{\mathbb R}\mathbb R^n=n
  • \dim_{\mathbb R}\mathbb C=2
  • \dim_{k}\{0\}=0 for any field k
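
In case you want to check your answers, here is one possible basis for each (other choices work just as well, and the names e_i for the standard basis vectors are my notation):

```latex
\mathbb R^n:\quad \{e_1,e_2,\dots,e_n\},\ \mbox{where } e_i \mbox{ has a } 1 \mbox{ in position } i \mbox{ and } 0 \mbox{ elsewhere}

\mathbb C \mbox{ over } \mathbb R:\quad \{1,i\},\ \mbox{since every complex number is uniquely } a\cdot 1+b\cdot i \mbox{ with } a,b\in\mathbb R

\{0\}:\quad \emptyset,\ \mbox{the empty set, whose span is } \{0\}
```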

Unique representations in a vector space

Suppose I have a vector space V over a field k, and a finite basis \mathcal B=\{v_1,v_2,\dots,v_n\}. Then each vector has a unique representation of the form c_1 v_1+c_2 v_2+\cdots+c_nv_n, where the c_i come from the field k.

Clearly it has at least one, since by definition of being a basis, \mbox{span}(\mathcal B)=V. This means that every vector can be written as a linear combination of vectors in \mathcal B.

Suppose v\in V has two such representations:

v=b_1v_1+b_2v_2+\cdots+b_nv_n=c_1v_1+c_2v_2+\cdots+c_nv_n

Subtracting the two representations gives us that

(b_1-c_1)v_1+(b_2-c_2)v_2+\cdots+(b_n-c_n)v_n=0

But, since a basis is also a linearly independent set, this means that for each i=1,2,\dots,n, b_i-c_i=0, so b_i=c_i. But then we don’t have two different representations. So there is exactly one.
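
To make "the" representation concrete, here is a small Python sketch that finds the coefficients of a vector in a basis of \mathbb Q^2 by solving the 2-by-2 system with Cramer’s rule. The function name coords_2d and the particular basis and vector are assumptions of mine for illustration.

```python
from fractions import Fraction

def coords_2d(b1, b2, v):
    """Return (c1, c2) with c1*b1 + c2*b2 == v, for a basis {b1, b2} of Q^2."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 are linearly dependent, so not a basis")
    c1 = Fraction(v[0] * b2[1] - b2[0] * v[1], det)
    c2 = Fraction(b1[0] * v[1] - v[0] * b1[1], det)
    return c1, c2

b1, b2 = (1, 1), (1, -1)   # a hypothetical non-standard basis
v = (3, 1)

c1, c2 = coords_2d(b1, b2, v)

# Rebuild v from its coefficients to confirm the representation.
rebuilt = (c1 * b1[0] + c2 * b2[0], c1 * b1[1] + c2 * b2[1])
print(c1, c2, rebuilt)
```

The nonzero determinant is exactly what makes the solution exist and be unique, which is the matrix-flavored version of the argument above.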

In fact, this is true of vector spaces without finite bases too, but we did not develop any of the necessary tools to talk about infinity yet, so I’m going to restrict our attention to vector spaces with finite bases.

This result is an important fact and will be used over and over again, many times without mention. We’ll just say something like “take a basis, and write v in that basis.” The fact that we can, and that we can do so uniquely, is what we just showed.
