A nonstandard proof

We’re going to use a nonstandard model of the real numbers to make this work. The construction is a bit wacky: take an ultrapower of \mathbb R by a non-principal ultrafilter on \mathbb N. Call the result the hyperreals. Here’s another way to put it:

A hypernumber is a sequence of ordinary real numbers (a_1,a_2,\dots). To make decisions about hypernumbers, we put things to a vote:

Is it true that (3,0,0,\dots)=(0,0,0,\dots)? Well, the first position votes no, because 3\ne 0. The rest of them all vote yes, however, so the vote is infinity to one. They’re equal. Wackier things such as (0,1,0,1,\dots) we won’t get into. If you’re interested, look here. (The basic idea is that the ultrafilter decides whether the evens or the odds win the vote. The ultrafilter always decides which sets of positions are more important. The fact that it’s non-principal means that it prefers cofinite sets to finite ones.)

It’s easy to see that the standard real numbers embed into the hyperreals: a real number x corresponds to the constant sequence (x,x,\dots), the standard real number embedded as a hypernumber.

Nonstandard models are fun because now we really do have “infinitesimal numbers.” The hypernumber (1,\frac12,\frac13,\dots) is smaller than every standard positive number (x,x,x,\dots), but it’s still bigger than (0,0,0,\dots). We can also have infinite numbers like (1,2,3,\dots), and even larger infinite numbers like (2,3,4,\dots).
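To see the voting in action, here’s a toy sketch (the sequences, the Fraction encoding, and the finite spot-check are my own choices; a real ultrafilter can’t be computed, so this only illustrates the cofinite-majority idea):

```python
from fractions import Fraction

# Hypernumbers as coordinate sequences, indexed n = 1, 2, 3, ...
eps = lambda n: Fraction(1, n)       # (1, 1/2, 1/3, ...)
const = lambda x: (lambda n: x)      # standard x as (x, x, x, ...)

def losers(x, up_to):
    """Coordinates n <= up_to that vote AGAINST the claim 'eps < x'."""
    return [n for n in range(1, up_to + 1) if not eps(n) < const(x)(n)]

# For x = 1/100, only the finitely many coordinates n <= 100 vote no,
# so the cofinite majority votes yes: eps < x holds in the hyperreals.
```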

Lots of things you can do in the hyperreals work essentially the same way in the standard real numbers. Some people call this the “transfer principle.” I never liked that name. All it says is that, in model-theoretic terms, the standard real numbers and the hyperreals are “elementarily equivalent”: they have the same first-order theories. This means that anything you can write down using the quantifiers \forall and \exists, where you quantify only over elements (not over subsets of elements), is true in one model if and only if it is true in the other.

Anyway, enough logic. Let P be the set of hyperprimes. These are the hypernatural numbers which aren’t hyperdivisible by any hypernatural numbers other than hyper-one and themselves. That sentence was fun.

Let N=(1!,2!,3!,\dots). Note that N is hyperdivisible by every standard natural number k, since k divides n! for every n\ge k. This means that N+1=(1!+1,2!+1,3!+1,\dots) is not divisible by any standard prime p: for n\ge p, p divides n!, so p cannot divide n!+1. But some hyperprime must divide any hypernatural number greater than 1. This is because every standard natural number greater than 1 is divisible by a standard prime, and since divisibility is a first-order property, this transfers to the hypernaturals. The hyperprime dividing N+1 can’t be standard, so P, the set of hyperprimes, contains some nonstandard element.

Now I just want to argue that a set containing a nonstandard element must have an infinite standard counterpart. The idea is that if the set of standard primes were finite, say \{p_1,\dots,p_k\}, I could define it with a first-order sentence: n is in P if and only if n=p_1 or n=p_2 or … or n=p_k. (I can’t do this if P is infinite, because the sentences I write down must be finite.) Transferring that sentence to the hyperreals, the set of hyperprimes would satisfy it too, and so would contain only those same standard elements.
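Concretely, the defining sentence would look like this (with p_1,\dots,p_k standing in for the finitely many primes):

\forall n\,\big(n\in P \leftrightarrow n=p_1 \vee n=p_2 \vee \cdots \vee n=p_k\big)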

The fact that P contains a nonstandard hyperprime tells us that there must be infinitely many primes. I left out many details, but hopefully the sketch gives you a reasonable idea of what is going on.

At some level, this is no different from Euclid’s proof, but it just seems so much more badass. I mean, we used the axiom of choice (to construct the non-principal ultrafilter) in a number theory proof. That is decidedly awesome.


Why Programming is Hard

Take your favorite programming language and write a program in it. Your computer stores the source code as a string of zeros and ones. We can interpret that string as a number in binary. So to each program we have associated a number. Many different numbers may encode programs that do the same thing, but any program I can write down has a number associated with it. Let P_e be the program encoded by e. It might not do anything; it might be full of spelling mistakes and syntax errors. But every program is P_e for some number e.

Programs take in inputs and spit out outputs. Or at least, sometimes they do. Sometimes programs run forever. Suppose I am handed a bunch of programs and told to run them on some inputs, but I’m told I only have an hour to do this. Obviously I can’t run the ones that go on forever, so how do I know which ones those are? How do I tell the difference between a program that runs for an obscenely long time, and one that will never end?

The short of it is that I can’t. There is no algorithmic way to determine which programs halt and which programs run forever.

Theorem: There is no program H which takes input e (an encoding of a program) and x (some input to the program encoded by e), and spits out 1 if P_e(x) halts and 0 otherwise.

Proof: Suppose I had such a program. I’m going to make a slightly different program. It will take some input e and determine whether or not P_e(e) halts. If P_e(e) never halts, it will return 17. Otherwise, it will enter an infinite loop and never halt. This program sometimes stops and sometimes goes without stopping, so we’ll call it the “Yielding Program.” Here’s some pseudocode for that program:
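In Python (my sketch; H is the hypothetical decider from the theorem, so it is stubbed to raise an error rather than actually decide anything):

```python
def H(e, x):
    """Hypothetical halting decider: 1 if P_e(x) halts, 0 otherwise.

    No such program exists -- that's what we're about to prove."""
    raise NotImplementedError

def Y(e):
    """The Yielding Program."""
    if H(e, e) == 0:   # P_e(e) never halts...
        return 17      # ...so Y(e) halts, returning 17
    else:              # P_e(e) halts...
        while True:    # ...so Y(e) runs forever
            pass
```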

To save space, I’m going to write Y for the “yielding program,” and I’m going to let p be the number encoding it. That is, Y=P_p. Ready for the magic? Does Y(p) halt? That is, does Y(p) return 17? If it did, that would mean that we were in the first case in the if-statement, so H(p,p)=0. But H(p,p)=0 means that P_p(p) (a.k.a. Y(p)) doesn’t halt. That’s a contradiction.

What about the other possibility: Y(p) goes on forever? This means that H(p,p) is not 0, so it must be 1. But this too is a problem, for then P_p(p) (a.k.a. Y(p)) does halt. That’s another contradiction.

Since we used the existence of a program H to construct Y, which led to a contradiction, it must be that H cannot really exist.


If you didn’t like this proof, then perhaps this one in a Dr. Seuss style will suit you better: Scooping the Loop Snooper

Yoneda’s Lemma (part 1)

So I think we’re ready, at least for the statement of Yoneda’s lemma. It says that for any locally small category \mathcal C, any object A in \mathcal C, and any functor F:\mathcal C\to\textsc{Set},

\mbox{Nat}(h^A,F)\cong F(A)

Moreover, the isomorphism is natural in both A and F.

Wow, that looks complicated. Let’s parse the notation I haven’t explained yet. We certainly know what the right-hand side means: it’s F applied to A, which is just a set. As for the left-hand side, I haven’t yet explained \mbox{Nat} or h^A.

So h^A is a functor we mentioned briefly, though I used a different notation: it’s the representable functor \hom(A,-). I write it as h^A here to avoid overusing parentheses and therefore complicating this business well beyond its current level of complication. As a reminder, \hom(A,-) is a functor from \mathcal C to \textsc{Set} (just like F). It takes an object B in \mathcal C to the set of morphisms \hom(A,B). Remember we’re in a locally small category, so \hom(A,B) really is a set. It takes a morphism f:B\to C to a map h^A(f):\hom(A,B)\to\hom(A,C) by sending

h^A(f):\phi\mapsto f\circ\phi

We checked all the necessary details in an earlier post to make sure this really was a functor.
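As a toy sketch in the category of finite sets (the dict encoding of functions is my own choice, not anything canonical), here is h^A on objects and on morphisms:

```python
from itertools import product

def hom(A, B):
    """h^A applied to the object B: the set of all functions A -> B."""
    A, B = list(A), list(B)
    return [dict(zip(A, vals)) for vals in product(B, repeat=len(A))]

def h_A(f):
    """h^A applied to a morphism f: B -> C (as a dict), sending phi to f . phi."""
    return lambda phi: {a: f[b] for a, b in phi.items()}
```

Functoriality here comes down to associativity of composition: h^A(g\circ f)=h^A(g)\circ h^A(f).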

The last thing is \mbox{Nat}. It’s the collection of all natural transformations between the functors h^A and F. So Yoneda’s lemma claims that there is a one-to-one correspondence between the natural transformations from h^A to F and the set F(A).

In particular, it claims that \mbox{Nat}(h^A,F) is a set. This is because \textsc{Set} is a locally small category, so each natural transformation (defined as a collection of morphisms, each of which is a set) is itself a set. And there are only so many collections of morphisms, not all of which are even natural transformations; the collection is small enough to be a set. If you don’t care about this set-theory business, disregard the paragraph you probably just read angrily.

It’s worth mentioning now that Yoneda’s lemma is a generalization of some nice theorems. We can (and will) use it to derive Cayley’s theorem (every group embeds into a symmetric group). We can (and will) use it to derive the important fact that \hom_R(R,M)\cong M in the category of R-modules. I bet in that one you can already start to see the resemblance.
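For the module fact, the isomorphism (a standard one, sketched here to make the resemblance visible) is evaluation at the identity:

\hom_R(R,M)\xrightarrow{\ \cong\ } M,\qquad \varphi\mapsto\varphi(1_R)

with inverse m\mapsto(r\mapsto rm).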

We’ll prove Yoneda’s lemma over the next few posts.

Coproducts in the category of Sets

As with products, not every category has coproducts. However, \textsc{Set} does. I claim that given two sets A and B, their coproduct A\coprod B is the disjoint union of A and B. Since coproducts are unique (up to isomorphism), all we need to do is show that the disjoint union satisfies the defining property of the coproduct. So we can let A\coprod B denote the disjoint union, and then verify that it really is the coproduct.


We have the natural injections i_A:A\to A\coprod B and i_B:B\to A\coprod B. Suppose we have a set C and maps f_A:A\to C and f_B:B\to C. Define g:A\coprod B\to C by g(x)=f_A(a) if x=i_A(a), and g(x)=f_B(b) if x=i_B(b). Notice that every x\in A\coprod B is in the image of precisely one of the maps i_A and i_B, so g is well defined. Now we just need to check that the coproduct diagram commutes. I’ll leave that as an exercise.
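A sketch of this construction in Python (the "A"/"B" tags are my own encoding of the disjoint union):

```python
def i_A(a):
    """Injection of A into the disjoint union."""
    return ("A", a)

def i_B(b):
    """Injection of B into the disjoint union."""
    return ("B", b)

def copair(f_A, f_B):
    """The induced map g with g . i_A = f_A and g . i_B = f_B."""
    def g(x):
        tag, val = x
        return f_A(val) if tag == "A" else f_B(val)
    return g
```

The commuting of the diagram is exactly the two equations g(i_A(a))=f_A(a) and g(i_B(b))=f_B(b), which the tags make automatic.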


More Pedantry

Similar to last time, I want to talk about something mildly pedantic but, again, very important. There’s something in between large categories and small categories. We don’t call it a medium category (I suppose my letters to the AMS and MAA haven’t been that persuasive). We denote the collection of morphisms between X and Y by \hom(X,Y). This may be a set or a proper class. If it is a set, we’ll call it a “hom-set.” Easy enough. What if \hom(X,Y) is a hom-set for every X,Y\in \mathcal C? This is what we call a locally small category. Make sense? We may have way too many objects for it to actually be a small category, but there may be few enough maps between them for it to be locally small. All small categories are locally small. The obvious example of a locally small category which is not small is the category of sets with set maps as the morphisms. There is no “set of all sets,” but given any two sets, the collection of maps between them is a set. Check your set theory axioms.

I’d also like to point out in slightly more detail the mistake I made last time (and have since edited). The definitions of large versus small categories are accurate. Most examples that we know of are large categories, but many of these examples are at least locally small. It seems that smallness is a very restrictive property: too restrictive to say anything about the categories we care about. On the other hand, largeness is just too wide open to say anything interesting. Being locally small is in some sense a balance, and it turns out to have some interesting properties.


We’ve mentioned the category \textsc{Set} (the category of sets with functions as morphisms). Consider another category \textsc{Two}, which has two objects A and B and a morphism between them (along with the required identity morphisms).

First of all, the small-caps notation and naming convention is not standard (there is no standard), but I think it is as good as any other, so I’ll use it.

Secondably [sic], there’s a fundamental difference between these two categories. The difference is in the objects. Not the objects themselves, but Ob(\textsc{Set}) and Ob(\textsc{Two}). The first one is not a set. After all, we can’t talk about the set of all sets; in some sense, the collection of all sets is just too big to be a set. We can, however, talk about the set \{A, B\}. That is, Ob(\textsc{Two}) is a set. (One caveat here is that A and B must be sets themselves, but we can choose them to be, and it’s not the point of this post, nor will it be relevant later.)

This is why, in the definition of categories, I specifically mentioned that we had a collection of objects, not a set of objects. But if it’s not a set, what is it? It’s called a class, and it pushes us closer to axiomatic set theory than I want to go. It does however give rise to the following definitions that the pedantic reader will care to think about:


A category \mathcal C is small if Ob(\mathcal C) is an honest-to-goodness set. Otherwise, we say that \mathcal C is large (that is, Ob(\mathcal C) is a proper class and not a set).

I must say that this stuff really is important, despite how I’ve presented it. But if you trust me not to lie to you (probably a bad move), you can just read on and trust that I’m not breaking any mathematical laws.

Guest post: Isaac Solomon

Read Isaac’s blog here.

From a philosophical point of view, the Axiom of Choice (AC) is often a tough sell. On the one hand, there’s something wrong with vector spaces without bases, or fields without algebraic closures (consequences of \neg AC). On the other hand, non-measurable sets.

The most common example of a non-measurable set, the so-called “Vitali set,” is a transversal of the quotient space \mathbb{R}/\mathbb{Q}. While \mathbb{R}/\mathbb{Q} is certainly a repulsive creature, there’s no reason to believe, a priori, that all other examples of non-measurable sets are as abstruse. To see that this is indeed the case, we look to the opening line of Solovay’s 1970 paper, “A model of set theory in which every set of reals is Lebesgue measurable”:

We show that the existence of a non-Lebesgue measurable set cannot be proved in Zermelo-Frankel set theory (ZF) if the axiom of choice is disallowed.

Ouch. Unfortunately, it gets worse. There is no formula \varphi(x) for which ZFC proves, “there is a unique non-measurable subset of reals satisfying \varphi(x).” In other words, while non-measurable sets are definable in certain models of ZFC, the theory does not have the capacity to prove that a particular formula “talks about” a non-measurable set. Thus, it appears that AC furnishes a very large collection of sets that even it cannot get a handle on.

In a recent conversation, Andy and I were wondering if there was some formal way of saying that a proof is constructive. We suggested that a proof be called constructive if every existential statement in its deduction had a (first-order) definable witness. So, for example, the proof of the compactness of a metric space would be constructive if, given an arbitrary sequence in that space, one could explicitly outline an algorithm for producing a convergent subsequence.

Thinking more about this, I suspect our definition may be flawed, in that it shifts responsibility for definability to (the theory of) the model at hand, as opposed to the theory we used to generate it. By that I mean that in certain models of ZFC, in which non-measurable sets are not definable, the proof of their existence is non-constructive, while in other models of ZFC, in which non-measurable sets are definable (without parameters), the same proof would be constructive. To me, that seems to miss the whole point of why we pretend to care about constructibility in the first place.

Conversely, if you want to switch your perspective to the theory, things become very arbitrary. Given a model M, and a deduction in the theory of that model, do we want to look at provably definable objects in all of \mbox{Th}(M), or in some sub-theory? In that light, we’re stuck with philosophy again, wondering which theory is most natural for considering a certain deduction.

For the philosophically minded, the remedy probably lies in intuitionistic, or constructive, logic. Whereas in classical logic all statements are assumed to be true or false, in constructive logic a statement is only deemed true or false if there is a constructive proof witnessing that assertion. Put differently, the law of the excluded middle and double-negation elimination are not axioms of constructive logic. Thus, a better definition would be that a proof is constructive if and only if its deduction is valid in intuitionistic logic.

Luckily, because we are mathematicians and not philosophers, we don’t have to worry ourselves with these insipid considerations. The workaround is never to adopt a more conservative approach to mathematics, but to rephrase paradoxes as counterexamples and leave reality for the physicists.