A nonstandard proof

We’re going to use a nonstandard model of the real numbers to make this work. The construction is a bit wacky: take an ultrapower of \mathbb R with respect to a non-principal ultrafilter. Call this the hyperreals. Here’s another way to put it:

A hypernumber is a sequence of regular numbers (a_1,a_2,\dots). To make decisions about hypernumbers, we put things to a vote:

Is it true that (3,0,0,\dots)=(0,0,0,\dots)? Well, the first position votes no, because 3\ne 0. The rest of them all vote yes, however, so the vote is infinity to one. They’re equal. Wackier things such as (0,1,0,1,\dots) we won’t get into. If you’re interested, look here. (The basic idea is that the ultrafilter decides whether the evens or the odds win the vote. The ultrafilter always decides which sets of voters are more important. The fact that it’s non-principal means that it prefers cofinite sets to finite ones.)
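In symbols, if \mathcal U is our non-principal ultrafilter on the index set, the vote for equality reads

(a_1,a_2,\dots)=(b_1,b_2,\dots)\iff\{n\in\mathbb N: a_n=b_n\}\in\mathcal U

and the same kind of vote (with = replaced by <) defines the ordering, while addition and multiplication are just done coordinate-by-coordinate.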

It’s easy to see that the standard real numbers are embedded in the hyperreal numbers: a real number x corresponds to the constant hypernumber (x,x,\dots).

Nonstandard models are fun because now we really do have “infinitesimal numbers.” The hypernumber (1,\frac12,\frac13,\dots) is smaller than every standard positive number (x,x,x,\dots), but it’s still bigger than (0,0,0,\dots). We can also have infinite numbers (1,2,3,\dots), and even larger infinite numbers (2,3,4,\dots).

Lots of things you can do in the hyperreals are essentially the same in the standard real numbers. Some people call this the “transfer principle.” I never liked that name. All it says is that, in model theoretic terms, the standard real numbers and the hyperreals are “first-order equivalent” (the official term is “elementarily equivalent”). That is, they have the same first order theories. This means that anything you can write down using quantifiers \forall and \exists, where you only quantify over elements (not subsets of elements), is true in one model if and only if it is true in the other.
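For example, the fact we’ll use below, that every natural number greater than 1 has a prime divisor, is exactly this kind of sentence:

\forall n\,[\,n>1\to\exists p\,(\mathrm{prime}(p)\wedge p\mid n)\,]

where \mathrm{prime}(p) abbreviates the first-order formula saying p>1 and p has no divisors other than 1 and itself. Since this sentence is true of the standard natural numbers, it transfers to the hypernaturals.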

Anyway, enough logic. Let P be the set of hyperprimes. These are the hypernatural numbers which aren’t hyperdivisible by any hypernatural numbers other than hyper-one and themselves. That sentence was fun.

Let N=(1!,2!,3!,\dots). Note that N is hyperdivisible by every standard natural number: the standard number k divides every entry from the k-th one on, and that’s a winning vote. This means that N+1=(1!+1,2!+1,3!+1,\dots) cannot be divisible by any standard prime, since every standard prime p divides N, and p can’t divide both N and N+1. But there does have to be some hyperprime dividing any hypernatural number greater than one. This is because every standard natural number greater than one is divisible by a standard prime, and since this is a first-order property, there must be a hyperprime hyperdividing any hypernatural number greater than one. In particular some hyperprime divides N+1, and it can’t be standard. This tells us that P, the set of hyperprimes, contains some nonstandard element.

Now I just want to argue why a set which contains nonstandard elements must have an infinite standard counterpart. The idea is that, if the set of primes were finite, I could define it by saying n is in P if and only if n=p_1 or n=p_2 or … or n=p_k. (I can’t write such a sentence if P is infinite, because the sentences I write down must be finite.) Transferring this sentence to the hyperreal numbers, the hyperprimes would have to satisfy it too, and so they would contain only those same standard elements p_1,\dots,p_k.

The fact that P contains a nonstandard hyperprime tells us that there must be infinitely many primes. I left out many details, but hopefully the sketch gives you a reasonable idea of what is going on.

At some level, this is no different from Euclid’s proof, but it just seems so much more badass. I mean, we used the axiom of choice (to construct the non-principal ultrafilter) in a number theory proof. That is decidedly awesome.

Why Programming is Hard

Take your favorite programming language and write a program in it. Your computer stores the source code as a string of zeros and ones. We can interpret that code as a number in binary. So to each program, we have a number. Many different numbers may encode programs that do the same thing, but any program I can write down has a number associated with it. Let P_e be the program which is encoded by e. This might not do anything. It might have lots of spelling mistakes and syntactical errors, but every program is P_e, for some number e.
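Just to make that concrete, here’s a toy illustration in Python (my own example, not anything a real compiler does): read the bytes of the source text as one big binary number.

    source = 'print("hello")'            # any program's source text
    bits = source.encode("utf-8")        # the bytes the computer actually stores
    e = int.from_bytes(bits, "big")      # interpret those bytes as a number in binary
    print(e)                             # the number encoding this program

    # Decoding recovers the source, so e really does determine the program P_e.
    print(e.to_bytes((e.bit_length() + 7) // 8, "big").decode("utf-8"))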

Programs take in inputs and spit out outputs. Or at least, sometimes they do. Sometimes programs run forever. Suppose I am handed a bunch of programs and told to run them on some inputs, but I’m told I only have an hour to do this. Obviously I can’t run the ones that go on forever, so how do I know which ones those are? How do I tell the difference between a program that runs for an obscenely long time, and one that will never end?

The short of it is that I can’t. There is no algorithmic way to determine which programs halt and which programs run forever.

Theorem: There is no program H which takes input e (an encoding of a program) and x (some input to the program encoded by e), and spits out 1 if P_e(x) halts and 0 otherwise.

Proof: Suppose I had such a program. I’m going to make a slightly different program. What it will do is take in some input e, and use H to determine whether or not P_e(e) halts. If P_e(e) never halts, it will return 17. Otherwise, it will enter an infinite loop and never halt. This program sometimes stops and sometimes goes without stopping, so we’ll call it the “Yielding Program.” Here’s some pseudocode for that program:
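In Python-ish notation, with halts(e, x) standing in for the hypothetical H (which, of course, is exactly what the theorem says can’t really be written):

    def halts(e, x):
        # The hypothetical decider H: returns 1 if P_e(x) halts and 0 otherwise.
        # The whole point of the theorem is that no such program can exist;
        # this stub is only here so the sketch below reads as a complete program.
        raise NotImplementedError

    def Y(e):
        if halts(e, e) == 0:    # H says P_e(e) never halts...
            return 17           # ...so Y stops and returns 17.
        while True:             # Otherwise H says P_e(e) halts...
            pass                # ...so Y runs forever.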

To save space, I’m going to write Y for the “yielding program,” and I’m going to let p be the number encoding it. That is, Y=P_p. Ready for the magic? Does Y(p) halt? That is, does Y(p) return 17? If it did, that would mean that we were in the first case in the if-statement, so H(p,p)=0. But H(p,p)=0 means that P_p(p) (a.k.a. Y(p)) doesn’t halt. That’s a contradiction.

What about the other possibility: Y(p) goes on forever? This means that H(p,p) is not 0, so it must be 1. But this too is a problem, for then P_p(p) (a.k.a. Y(p)) does halt. That’s another contradiction.

Since we used the existence of a program H to construct Y, which led to a contradiction, it must be that H cannot really exist.

\square

If you didn’t like this proof, then perhaps this one in a Dr. Seuss style will suit you better: Scooping the Loop Snooper

Yoneda’s Lemma (part 1)

So I think we’re ready, at least for the statement of Yoneda’s lemma. It says that for any locally small category \mathcal C, if A is an object in \mathcal C, and F:\mathcal C\to\textsc{Set} is a functor, then

\mbox{Nat}(h^A,F)\cong F(A)

Moreover, the isomorphism is natural in both A and F.

Wow, that looks complicated. Let’s parse some of the notation that I haven’t even explained yet. We certainly know what the right-hand side means. It’s F applied to A. That’s just a set. As for the left-hand side, I haven’t explained \mbox{Nat} or h^A yet.

So h^A is a functor we mentioned briefly, but I used a different notation. It’s the representable functor \hom(A,-). I write it as h^A here to avoid overusing parentheses and therefore complicating this business well beyond its current level of complication. As a reminder, \hom(A,-) is a functor from \mathcal C to \textsc{Set} (the same as F). It takes an object B in \mathcal C to the set of morphisms \hom(A,B). Remember we’re in a locally small category, so \hom(A,B) really is a set. It takes a morphism f:B\to C to a map h^A(f):\hom(A,B)\to\hom(A,C) by sending

h^A(f):\phi\mapsto f\circ\phi

We checked all the necessary details in an earlier post to make sure this really was a functor.

So the last thing is \mbox{Nat}. It’s the collection of all natural transformations between the functors h^A and F. So Yoneda’s lemma claims that there is a one to one correspondence between the natural transformations from h^A to F and the set F(A).
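Spelled out, a natural transformation \alpha:h^A\Rightarrow F assigns to each object B a set map \alpha_B:\hom(A,B)\to F(B), and naturality demands that for every morphism f:B\to C we have

F(f)\circ\alpha_B=\alpha_C\circ h^A(f)

as maps \hom(A,B)\to F(C).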

In particular, it claims that \mbox{Nat}(h^A,F) is a set. This is because \textsc{Set} is a locally small category, and so each natural transformation (defined as a collection of morphisms, each of which was a set) is a set. But there are only so many collections of morphisms, not even all of which are natural transformations. The collection is small enough to be a set. If you don’t care about this set theory business, then disregard the paragraph you probably just read angrily.

It’s worth mentioning now that Yoneda’s lemma is a generalization of some nice theorems. We can (and will) use it to derive Cayley’s theorem (every group embeds into a symmetric group). We can (and will) use it to derive the important fact that \hom_R(R,M)\cong M in the category of R-modules. I bet in that one you can already start to see the resemblance.
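(In that module example, the isomorphism is just evaluation at the identity,

\hom_R(R,M)\xrightarrow{\ \cong\ } M,\qquad \varphi\mapsto\varphi(1_R)

with inverse sending m to the homomorphism r\mapsto rm, which is exactly the flavor of map Yoneda’s lemma produces.)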

We’ll prove Yoneda’s lemma over the next few posts.

Coproducts in the category of Sets

As with products, not every category has coproducts. However, \textsc{Set} does. I claim that given two sets A and B, their coproduct A\coprod B is the disjoint union of A and B. Since coproducts are unique (up to isomorphism), all we need to do is show that the disjoint union satisfies the properties of a coproduct. So we can let A\coprod B denote the disjoint union, and then verify that it really is the coproduct.

Proof:

We have the natural injections i_A:A\to A\coprod B and i_B:B\to A\coprod B. Suppose we have a set C and maps f_A:A\to C and f_B:B\to C. Define g:A\coprod B\to C by g(i_A(a))=f_A(a) and g(i_B(b))=f_B(b). Notice that every x\in A\coprod B is in the image of precisely one of the maps i_A and i_B, so this map g is well defined. Now we just need to check that the coproduct diagram commutes, that is, that g\circ i_A=f_A and g\circ i_B=f_B. I’ll leave that as an exercise.

\square
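If a concrete sketch helps, here is a toy Python version of the disjoint union and the induced map g (the tags and names are my own, just for illustration):

    def disjoint_union(A, B):
        """Tag elements so A and B can't collide, even if they overlap."""
        return {("A", a) for a in A} | {("B", b) for b in B}

    def i_A(a): return ("A", a)      # the natural injection A -> A ⊔ B
    def i_B(b): return ("B", b)      # the natural injection B -> A ⊔ B

    def induced_map(f_A, f_B):
        """Given f_A: A -> C and f_B: B -> C, build g: A ⊔ B -> C."""
        def g(x):
            tag, value = x
            return f_A(value) if tag == "A" else f_B(value)
        return g

    # g ∘ i_A = f_A and g ∘ i_B = f_B, which is exactly the commuting diagram.
    g = induced_map(len, abs)        # f_A = len on strings, f_B = abs on ints
    assert g(i_A("hello")) == len("hello")
    assert g(i_B(-3)) == abs(-3)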

More Pedantry

Similar to last time, I want to talk about something mildly pedantic, but again, very important. There’s something in-between large categories and small categories. We don’t call it a medium category (I suppose my letters to the AMS and MAA haven’t been that persuasive). The collection of morphisms between X and Y we denoted by \hom(X,Y). This may be a set or a proper class. If it is a set, we’ll call it the “hom-set.” Easy enough. What if for every X,Y\in \mathcal C, \hom(X,Y) is a hom-set? This is what we call a locally small category. Make sense? We may have way too many objects for it to actually be a small category, but there may be few enough maps between them to be locally small. All small categories are locally small. The obvious example of a locally small category which is not small is the category of sets with set maps as the morphisms. There is no “set of all sets,” but given any two sets, the collection of maps between them is a set. Check your set theory axioms.

I’d also like to point out in slightly further detail the mistake I made last time (and have since edited). The definitions of large versus small categories are accurate. Most examples that we know of are large categories. But many of these examples are at least locally small. It seems that smallness is a very restrictive property, too restrictive to say anything about the categories we care about. On the other hand, largeness is just too wide open to say anything interesting. Being locally small is in some sense a balance, and it turns out to have some interesting properties.

Pedantry

We’ve mentioned the category \textsc{Set} (the category of sets with functions as morphisms). Consider another category \textsc{Two} which has two objects A and B and a morphism between them (along with the required identity morphisms).

First of all, the small-caps notation and naming convention is not standard (there is no standard), but I think it is as good as any other, so I’ll use it.

Secondably [sic], there’s a fundamental difference between these two categories. The difference is in the objects. Not the objects themselves, but Ob(\textsc{Set}) and Ob(\textsc{Two}). The first one is not a set. After all, we can’t talk about the set of all sets. In some sense, the collection of all sets is just too big to be a set. We can however talk about the set \{A, B\}. That is, Ob(\textsc{Two}) is a set. (One caveat here is that A and B must be sets themselves, but we can choose them to be, and it’s not the point of this post, nor will it be relevant later.)

This is why, in the definition of categories, I specifically mentioned that we had a collection of objects, not a set of objects. But if it’s not a set, what is it? It’s called a class, and it pushes us closer to axiomatic set theory than I want to go. It does however give rise to the following definitions that the pedantic reader will care to think about:

Definition:

A category \mathcal C is small if Ob(\mathcal C) is an honest-to-goodness set. Otherwise, we say that \mathcal C is large (in the situation that Ob(\mathcal C) is a proper class and not a set).

I must say that really this stuff is important despite how I’ve presented it. But if you trust me not to lie to you (probably a bad move), you can just read and trust that I’m not breaking any mathematical laws.

Guest post: Isaac Solomon

Read Isaac’s blog here.

From a philosophical point of view, the Axiom of Choice (AC) is often a tough sell. On the one hand, there’s something wrong with vector spaces without bases, or fields without algebraic closures (consequences of \neg AC). On the other hand, non-measurable sets.

The most common example of a non-measurable set, the so-called “Vitali set,” is a transversal of the quotient space \mathbb{R}/\mathbb{Q}. While \mathbb{R}/\mathbb{Q} is certainly a repulsive creature, there’s no reason to believe, a priori, that all other examples of non-measurable sets are as abstruse. To see that that is indeed the case, we look to the opening line of Solovay’s 1970 paper, “A model of set theory in which every set of reals is Lebesgue measurable”:

We show that the existence of a non-Lebesgue measurable set cannot be proved in Zermelo-Fraenkel set theory (ZF) if the axiom of choice is disallowed.

Ouch. Unfortunately, it gets worse. There is no formula \varphi(x) for which ZFC proves, “there is a unique non-measurable subset of reals satisfying \varphi(x).” In other words, while non-measurable sets are definable in certain models of ZFC, the theory does not have the capacity to prove that a particular formula “talks about” a non-measurable set. Thus, it appears that AC furnishes a very large collection of sets that even it cannot get a handle on.

In a recent conversation, Andy and I were wondering if there was some formal way of saying that a proof is constructive. We suggested that a proof be called constructive if every existential statement in its deduction had a (first-order) definable witness. So, for example, the proof of the compactness of a metric space would be constructive if, given an arbitrary sequence in that space, one could explicitly outline an algorithm for producing a convergent subsequence.

Thinking more about this, I suspect our definition may be flawed, in that it shifts responsibility for definability to (the theory of the) model at hand, as opposed to the theory we used to generate it. By that I mean that in certain models of ZFC, in which non-measurable sets are not definable, the proof of their existence is non-constructive, but in other models of ZFC, in which non-measurable sets are definable (without parameters), the same proof would be constructive. To me, that seems to miss the whole point of why we pretend to care about constructibility in the first place.

Conversely, if you want to switch your perspective to the theory, things become very arbitrary. Given a model M, and a deduction in the theory of that model, do we want to look at provably definable objects in all of \mbox{Th}(M), or in some sub-theory? In that light, we’re stuck with philosophy again, wondering which theory is most natural for considering a certain deduction.

For the philosophically-minded, the remedy probably lies in intuitionistic, or constructive, logic. Whereas in classical logic, all statements are assumed to be true or false, in constructive logic, a statement is only deemed true or false if there is a constructive proof witnessing that assertion. Put differently, the law of the excluded middle, and double negation elimination, are not axioms of constructive logic. Thus, a better definition of constructiveness would be that a deduction is constructive if and only if it is valid in intuitionistic logic.

Luckily, because we are mathematicians and not philosophers, we don’t have to worry ourselves with these insipid considerations. The workaround is never to adopt a more conservative approach to mathematics, but to rephrase paradoxes as counterexamples and leave reality for the physicists.

Philosophy and choice

Seriously, category theory is coming soon. I promise. In the meantime, a friend (who posts on this awesome blog) and I had a conversation about the axiom of choice from a philosophical standpoint. It started when I made the bold claim “whatever, the axiom of choice is obviously true.” Originally, this was meant half as a provocative joke, but really, I think it makes sense. My friend, who I’ll call Steve Neen for the sake of this post, declared, as many mathematicians do, that he preferred constructive mathematics. After all, how can you say something exists without being able to construct it? This is why mathematicians prefer constructive proofs to non-constructive ones.

I had always sort of accepted this as a reasonable point of view (which I still think it is), but the following thought dawned on me: at their core, each of the axioms of ZFC simply says “there exists a something such that something else.” We may have notation for sets constructed from a specific axiom, but notation shouldn’t have any bearing on whether or not an axiom is constructive. There is simply nothing intrinsically constructive or non-constructive about any of the axioms.

Of course, Steve then pointed out that the difference may not be subjective. Look at what computers can do. They can represent unions, pairs, power sets (though not efficiently), etc., all in an effective manner. But look at the axiom of choice. Computers just aren’t built to do this constructively.

Fair point, Steve. One that had me thinking all of last night. One answer is that what computers can do is somewhat arbitrary, but that feels like a cop out. So here’s my two part answer, that I’m reasonably satisfied with.

  • Look at the Axiom of Infinity. It essentially says there is an infinite set. More precisely,

\exists x[\varnothing\in x\wedge\forall y(y\in x\to y\cup\{y\}\in x)]

Here’s an example of a set a computer can’t represent explicitly. One could argue that an implicit representation is possible, because in finite time one could decide whether or not a given set was an element of this one, but there is no bound on how long it would take a computer to make this decision. Fair enough. But the power set of an infinite set (i.e., an uncountable set) cannot have such a decision procedure.
This is all to say that if you discount the axiom of choice for its non-constructibility, you should also discount the axiom of infinity.
  • Jokingly, and to prove his point, Steve asked me to pick one element from each coset of \mathbb R/\mathbb Q. First of all, it’s important to note that one’s ability to do so relies on the axiom of choice. This is one of the reasons I think it should be true. Morally, I can pick out an element from each coset. I told Steve, all I need to do is produce a choice function on the cosets. I claimed (again, jokingly) that I had such a function. In fact, I had uncountably many. If he named a coset, I’d consult my function and tell him the answer.
    Steve did point out, and is correct in doing so, that just because I claim such a function exists, and can answer as many queries as he asks, doesn’t mean that I actually have it. But from a computational point of view, it sort of does. Any partial function that’s consistent with a choice function can be extended (see the toy sketch below). If you accept the axiom of choice, then you can extend it all the way to a choice function. If you don’t, it can still be extended further, so long as the extension is countable. Regardless, all computations are going to be finite, so from a computational point of view, we never really need the axiom of choice, just as we don’t need the axiom of infinity. We only need finite choice, which is provable in ZF, without any choice axiom at all.
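To make the computational point concrete, here is a toy Python sketch (the representation is entirely made up for illustration): model a real symbolically as an irrational “part” plus a rational offset, so two reals lie in the same coset of \mathbb R/\mathbb Q exactly when they share the irrational part. A “choice function” can then be built lazily, one query at a time, and at every finite stage it is consistent with some genuine choice function.

    from fractions import Fraction

    # A "real" is modeled as (irrational_label, rational_offset); for example
    # ("sqrt2", Fraction(3, 4)) stands for sqrt(2) + 3/4.  Two such reals lie in
    # the same coset of R/Q exactly when their irrational labels agree.

    representatives = {}   # the finite partial choice function built so far

    def choose(real):
        """Return a representative of the coset of `real`, consistently."""
        label, _offset = real
        if label not in representatives:
            # First time we see this coset: the queried element itself becomes
            # the chosen representative.  Any finite assignment like this is
            # consistent with (extendable to) a full choice function.
            representatives[label] = real
        return representatives[label]

    x = ("sqrt2", Fraction(3, 4))
    y = ("sqrt2", Fraction(-1, 2))   # same coset as x: they differ by a rational
    z = ("pi", Fraction(0))
    assert choose(x) == choose(y)    # same coset, same representative
    assert choose(x) != choose(z)    # different cosets, different representatives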

So that’s a recap of the discussion we had. I’m not entirely convinced anymore that the axiom of choice is “no different” from the other axioms, but it’s an interesting philosophical question I invite you to think about.

Global choice and algebraic closures

It was pointed out to me today that I glazed over a set-theoretic point in my proof that every field has an algebraic closure. We appealed to Zorn’s lemma, which says:

Given a partially ordered set \mathcal P, if every chain \mathcal C\subseteq\mathcal P has an upper bound, then \mathcal P has a maximal element.

An assumption we made here that seems innocuous is that \mathcal P is a set. If it is a proper class (class and not a set), we may run into problems. So what if the collection of all field extensions of a given field is a proper class? Then we can’t use Zorn’s Lemma.

I think, though I haven’t attempted to prove it, that this collection of field extensions is really a set, meaning we’re safe in applying Zorn’s lemma. But it’s an interesting diversion to see what happens if we try to prove Zorn’s lemma for classes.

It’s well known that Zorn’s lemma is equivalent to the axiom of choice, so it would be nice to find an “axiom of choice for classes.” Such a thing exists, and it’s called the “axiom of global choice.” It says the same thing about classes that the axiom of choice says about sets. So if you accept the axiom of global choice, we just need to prove a “global Zorn’s lemma.”

Unfortunately, the standard proof of Zorn’s lemma goes something like this: Assume it’s false. Then use the axiom of choice to construct (via transfinite induction) larger and larger elements. These form a well-ordered chain, and we can keep going until the chain has too many elements to be contained in the set \mathcal P, which is a contradiction.

The problem is that if we replace the axiom of choice with the axiom of global choice, we’ll be unable to make this well-ordering bigger than our “poclass” \mathcal P.

If you want more information on this sort of set theory stuff, you should read Zach Norwood’s posts (this is a great blog to which you should definitely subscribe):

Basic facts about AC, part I

Basic facts about AC, part II

On an unrelated note, coming soon will be a series of posts on some generalized abstract nonsense.

Every field has an algebraic closure

Yeah. That’s right. Every field. Yesterday we showed how we could add a root of a polynomial to a field and that would make it bigger. So we can just keep adding roots over and over again until we get them all, right? Well, it’s not that simple. See, it’s conceivable that when I add a new root, now I have a lot of new polynomials whose roots I need to add. And if I add all of them, I could get even more polynomials. Will it ever end?

Luckily, we have a handy tool. The careful reader will have noticed that “Every field has an algebraic closure” kind of looks like “Every vector space has a basis.” The choice of titles was not a coincidence. The proofs both rely on Zorn’s lemma. Once again, this is here for the sake of completeness. If you’re comfortable taking this on faith, I won’t hold it against you. Without further ado…

Proof:

Take all the fields that contain a given field k, and order them with E\ge F if E is a field extension of F. Some things might not be comparable. That’s fine. Then neither will be bigger than the other (this is called a partially ordered set, or poset for short). Last time we showed we can always add on a root of a polynomial (or at least waved our hands in that direction). So if we can produce a maximal field, that means that it must already have all the roots. This is exactly what an algebraically closed field is.

Zorn’s lemma gives us a maximal element under certain conditions, so we’ll just have to satisfy those conditions. The conditions are that “every chain has an upper bound.” What’s a chain? It’s what we call a poset that is totally ordered (I voted for toset, but no one else seemed to like that). If you draw a picture, you’ll see why. Everything lies in one line.

So suppose we have a bunch of fields in a chain. That means that if I pick any two fields, one must contain the other. Let’s call them F_\alpha, for \alpha in some indexing set \mathcal A. It’s important to note that if \mathcal A is finite, this is not hard: one of the F_\alphas is a maximum. But if it’s infinite (or really infinite), things aren’t so clear. In any event, like many Zorn’s lemma arguments, unions come to save the day.

\displaystyle\bigcup_{\alpha\in\mathcal A}F_\alpha is an upper bound for every F_\alpha.

Well, duh. The big union contains everything. That was the point. We do however need to check that it’s a field. (Left as an exercise to the reader.) The trick is to take notice that we’re only allowed finite sums/products/etc., and we already know how to do those, since they always come from one of the F_\alphas.
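For instance, closure under sums and products: given a and b in the union, say a\in F_\alpha and b\in F_\beta, the chain condition lets us assume (after relabeling) that F_\alpha\subseteq F_\beta, so

a+b,\ ab\in F_\beta\subseteq\displaystyle\bigcup_{\gamma\in\mathcal A}F_\gamma

and the remaining field axioms are checked the same way.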

Okay, so any chain has an upper bound. We invoke Zorn’s lemma and we have a maximal element. Like I said, maximal means that we can’t extend it any more (so it must have the root of every polynomial).

\square

One can also show that algebraic closures are unique up to isomorphism, though I don’t care to do so. We can actually talk about the algebraic closure of a field k, instead of just talking about an algebraic closure. We’ll write \overline k for the algebraic closure of k. For example, \mathbb C=\overline{\mathbb R}.
