Fundamental Theorem of Calculus, Part 2 (Explaining Calculus #17)

We have now discussed an extremely important tenet in the development of calculus, which is called the first part of the Fundamental Theorem of Calculus. We briefly recall what we deduced in the last post:

Fundamental Theorem of Calculus, Part 1: If A(x) is a function that tells us the area underneath the graph of f(x) from 0 up to the point x, and if F(x) is an antiderivative of f(x) with F(0) = 0, then

A(x) = F(x).

Informally, this told us that the process of calculating areas underneath graphs and the process of finding antiderivatives are really the same process. This is rather surprising at first – derivatives were originally designed to tell us about how things change over time, so it isn’t exactly obvious how reversing that process tells us about areas underneath graphs. There is, however, one example which I find particularly enlightening as to why this is the case. We will now explore this example and afterwards develop some new notation to more conveniently express both the first and second parts of the Fundamental Theorem.
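To make this concrete, here is a small numerical sketch of my own (not part of the original post): we approximate the area function A(x) with thin rectangles and compare it against the antiderivative F(x) = x^3/3 of f(x) = x^2, which satisfies F(0) = 0.

```python
# Illustrative sketch: approximate A(x), the area under f from 0 to x,
# by summing thin rectangles, then compare with F(x) = x^3 / 3, the
# antiderivative of f(x) = x^2 that satisfies F(0) = 0.

def area_under(f, x, steps=100_000):
    """Left-endpoint rectangle approximation of the area from 0 to x."""
    dx = x / steps
    return sum(f(i * dx) * dx for i in range(steps))

f = lambda t: t ** 2
F = lambda t: t ** 3 / 3

for x in (1.0, 2.0, 3.5):
    # A(x) and F(x) agree up to the small error of the rectangles
    assert abs(area_under(f, x) - F(x)) < 1e-3
```

The finer the rectangles, the closer the two values get, which is exactly what Part 1 predicts.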

An Example from Travel

Consider for a moment the following question:

Suppose you are running at a speed of 8 miles per hour for 5 hours. How far have you run?

A moment’s reflection reveals that the answer is 40 miles. How do we know this? Well, we are taught in school that

Distance = Speed x Time.

We are told the speed, which is 8 miles per hour, and the time, which is 5 hours, and so we multiply them to see that the answer must be 8 \times 5 = 40. Now, let us reconsider this formula in terms of a graph. If you draw a graph with speed on the y-axis and time on the x-axis, then the running scenario we just discussed looks like a rectangle with height equal to 8 (the speed) and length equal to 5 (the time). The area of a rectangle is

Area = Length x Width.

This looks a lot like the above formula for distance – in fact, in this case they are the same formula and give the same answer of 40 miles. So we see where the area comes from. But the antiderivative is also quite present in this problem – the idea of speed is that speed tells you how quickly your location is changing. But if we remember that a derivative tells us how quickly something is changing, we should now understand that speed must be the derivative of the “location” function. So, if areas are the same thing as antiderivatives, then the area underneath a speed graph should tell us how much our location has changed. And this is correct – the area indeed tells us how far we have run.

This example should give the basic idea of why we might expect areas and antiderivatives to be connected. This example also hints at an additional way to understand antiderivatives as so-called “net changes” – that is, an antiderivative tells us something about how much some quantity has changed from a “start” time to some “end” time. In the example above, the area of 40 tells us that we have travelled 40 miles between when we started running and when we stopped running. This additional perspective, once understood, suggests that we probably have more to discover. This is true, but in order to write this down conveniently we need some new ways of writing down this type of information.

The Integral Notation

Up to this point, every time we wanted to talk about an antiderivative of a function f(x), we just had to write a new function F(x) and declare that F'(x) = f(x). We will now introduce a different way of writing down this fact.

Indefinite Integrals

This new notation for writing down antiderivatives goes under the name of integral. So, talking about the integral of a function is the same thing as talking about the antiderivative of the function. The way we write down integrals is

\int f(x) dx.

The dx is present in this formula to inform us that x is the correct variable. So, for example, the function \frac{x^3}{3} + C is the most general antiderivative of x^2, and therefore we now write

\int x^2 dx = \dfrac{x^3}{3} + C.

These integrals are called indefinite integrals, or sometimes just integrals, because there is a second way the symbol \int is used.
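As a quick sanity check (my own illustration, not part of the original post), we can verify numerically that x^3/3 + C differentiates back to x^2 no matter which constant C we pick:

```python
# Numeric spot-check that x^3/3 + C is an antiderivative of x^2:
# the central difference quotient of F should be close to x^2,
# for any choice of the constant C.

def numeric_derivative(F, x, h=1e-6):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

for C in (0.0, 1.0, -4.5):
    F = lambda t, C=C: t ** 3 / 3 + C
    for x in (0.5, 1.0, 2.0):
        assert abs(numeric_derivative(F, x) - x ** 2) < 1e-6
```

The constant C disappears under differentiation, which is exactly why the general antiderivative carries a “+ C”.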

Definite Integrals

One of the reasons we introduce this new symbol \int is to define the definite integral. This is connected to the idea of antiderivatives, but is actually just a single number rather than a function. The so-called definite integral of a function f(x) from a to b is written

\int_a^b f(x) dx.

What is this number? The value of \int_a^b f(x) dx is the area underneath the graph of f(x) between the values x = a and x = b.

For example, consider

\int_0^3 2x dx.

The graph of y = 2x is a straight line. When x = 0, we find that y = 0, and similarly when x = 3 we find y = 6. When you sketch the area underneath this graph, you get a triangle. The vertices of this triangle are (0,0), (3,0), and (3,6). Its area is therefore

\dfrac{1}{2} \mathrm{Base} \times \mathrm{Height} = \dfrac{1}{2} \left( 3 \times 6 \right) = \dfrac{18}{2} = 9.

I encourage the reader to sketch this out for themselves.
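For readers who like to check such things by computer, here is a short sketch of my own (not from the original post) that approximates this definite integral with thin rectangles and recovers the triangle’s area of 9:

```python
# Approximate the definite integral of f from a to b with midpoint
# rectangles; for the linear function 2x this is essentially exact.

def definite_integral(f, a, b, steps=100_000):
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(steps))

approx = definite_integral(lambda t: 2 * t, 0, 3)
assert abs(approx - 9) < 1e-6   # matches the triangle area
```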

Second Part of the Fundamental Theorem

We can now discuss the second part of the Fundamental Theorem of Calculus. This part really tells us how to actually calculate definite integrals in terms of antiderivatives – in other words, it tells us that definite and indefinite integrals are sort of two sides of the same coin, and that once again areas and antiderivatives are inseparable from one another.

Fundamental Theorem of Calculus, Part 2: Let F(x) be an antiderivative of the function f(x), so F'(x) = f(x). Suppose a,b are numbers with a < b. Then

\int_a^b f(x) dx = F(b) - F(a).

Before we show the proof of this, let’s go back to the triangle example from earlier and show how we can compute the area of that triangle in this way. Since the integral of 2x is x^2 + C for some unknown constant C, we should have

\int_0^3 2x dx = \left[ x^2 + C \right]_0^3 = \left( 3^2 + C \right) - \left( 0^2 + C \right) = 9 + C - C = 9.

Notice that the +C didn’t contribute at all to the final solution. For this reason, when using the fundamental theorem of calculus we will very often just set C = 0 since it won’t matter what its value is. Also note that this calculation gave the correct answer of 9 for the value of the area of the triangle.
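We can also test Part 2 numerically on a couple of pairs (f, F). This is a sketch of my own, with a midpoint Riemann sum standing in for the “true” area:

```python
# Compare F(b) - F(a) against a midpoint Riemann sum for the area.

def riemann(f, a, b, steps=200_000):
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(steps))

cases = [
    (lambda t: 2 * t,  lambda t: t ** 2,     0, 3),  # the triangle example
    (lambda t: t ** 2, lambda t: t ** 3 / 3, 1, 4),  # a second sample pair
]
for f, F, a, b in cases:
    assert abs(riemann(f, a, b) - (F(b) - F(a))) < 1e-4
```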

Now, let us see why this way of calculating areas works.

Proof: Define the function

A(x) = \int_0^x f(u) du.

Remember that because of Part 1, we know that A(x) is the area underneath the graph of f(x) between 0 and x, and that A(x) = F(x) whenever F(x) is the antiderivative of f(x) with F(0) = 0.

We will first focus on the fact that A(x) counts areas. If we plug in b, then A(b) is the area underneath the graph of f(x) between x = 0 and x = b, and likewise A(a) is the area between x = 0 and x = a. By subtracting, the area underneath the graph of f(x) between x = a and x = b is A(b) - A(a). Now, any antiderivative F(x) of f(x) differs from A(x) only by a constant, and that constant cancels in the subtraction, so this area is also just F(b) - F(a), which is what we claimed it would be.


This finishes our discussion of the fundamental theorem of calculus. We have now reached one of the peaks of the whole of calculus – we’ve discovered that the process of calculating speeds and changes over time is the reverse of the process of counting areas, and we’ve learned how to ‘go back and forth’ between them. Now, since integrals are a relatively new introduction, we will next take some time to practice working with them and learn some useful tricks for dealing with more complicated integrals.

Why Prove a Theorem Twice?

I have written before on the problem of “Pythagorean Triples” and its solution. The problem, based on the Pythagorean theorem a^2 + b^2 = c^2 for right triangles, asks for all possible solutions (a,b,c) to this equation in which a,b,c are all whole numbers. Not just any right triangle works – for instance, if a = b = 1 then c = \sqrt{2}, which is definitely not a whole number. To see a longer statement of the problem, check out this post. You can also check out Part 1 and Part 2 of the first solution to the problem. If you don’t want to read those, don’t worry – I’ll summarize them below.

I am now coming back to write about this problem again. Even though we have solved it, this is a wonderful opportunity to put on display a principle that all mathematicians think about as they are doing their work. It is not always good enough just to solve a problem. Sometimes, it can be extremely important to solve the same problem in more than one way. This may sound a bit strange – why would you spend your time searching out a second solution when you already know the answer? The reason, I think, would be parallel to reading a book or watching a movie multiple times. When you revisit a favorite book, you are likely to see things you didn’t see before. You will find a new, fresh perspective on some character or some event in the book, and because of that new perspective you will understand the book better.

Mathematics is just like that. When you revisit the same problem from different angles, you can learn a lot more about the problem and about the angles you come at it from. The Pythagorean problem is a wonderful example of this. If you’ve read the first solution, it might strike you as odd that geometry played absolutely no part in the solution to this problem. The solution, which I will sketch below, was entirely algebraic! Isn’t that strange? Shouldn’t there be a solution that involves geometry, since this is a geometry problem?

These are all quite worthwhile questions, and mathematicians explore these sorts of ideas all the time. We want to know how different ideas interact – how both algebra and geometry can solve the same problem. We also explore what that means for both algebra and for geometry – maybe there are other algebra problems that geometry could help us solve, and maybe there are geometry problems that algebra can help us solve. There is now a famous history of interaction between the two in mathematics – pretty much every single problem modern mathematicians care about is multi-disciplinary. In other words, if we find a particular geometry problem interesting, it is probably because it also has connections to algebra and other areas of math. Same with algebra problems – we can almost always connect them to lots of other ideas. This is a central aspect of the mathematician’s mind: we don’t merely want to solve problems, we want to connect problems. My goal in this blog post is to explore two totally different solutions to the same problem – the problem of so-called Pythagorean triples. Since I have previously written on the first solution, I will give an outline of that solution and then explain what sorts of “insights” a mathematician would see in it. I will then put forward a completely different solution, one inspired by geometry rather than algebra, and explore how we learn very different sorts of information from the geometric angle than from the algebraic one.

The Problem

If a right triangle has side lengths a,b,c with c the length of the longest side, then the famous Pythagorean theorem tells us that the algebraic equation a^2 + b^2 = c^2 must be true. This is covered in geometry class in school and is famous for its surprising simplicity and great usefulness.

When you start thinking of the equation a^2 + b^2 = c^2 as an algebra equation, you can ask many interesting questions about it. One question is to solve for c if you already have a and b. If you try some random examples of whole numbers a and b, you will begin to notice a pattern that c tends to be a rather ugly number. In particular, it is pretty difficult to find examples where c turns out to be another whole number. The “smallest” example of this happening is 3^2 + 4^2 = 5^2. The “next smallest” is 5^2 + 12^2 = 13^2. There appear not to be very many of these special solutions.

This is interesting to a mathematical explorer – why are there so few of these? And can we find all of them? This is the heart of the problem we wish to consider. To talk about it conveniently, we need a bit of shorthand language. We will call a triple of numbers (a,b,c) a Pythagorean triple if all three of a,b,c are whole numbers and a^2 + b^2 = c^2. In other words, (a,b,c) is a Pythagorean triple if there is a right triangle with whole number side lengths a, b, c. Our goal for the rest of the post is to explain how you can locate every possible Pythagorean triple in two different ways.

The Key Simplifying Idea

Both of the methods involve a key simplifying idea that makes the problem easier. To see what we mean here, consider the two solutions 3^2 + 4^2 = 5^2 and 6^2 + 8^2 = 10^2. Both of these are genuine solutions. However, notice that if you work out all these numbers, you can divide the second equation by 4 on both sides to obtain the first. In other words, if you multiply the equation 3^2 + 4^2 = 5^2 by four on both sides, you just get the equation 6^2 + 8^2 = 10^2. So, there is a sense in which these are really the “same solution”. On the other hand, there isn’t anything like that to relate 3^2 + 4^2 = 5^2 to the solution 5^2 + 12^2 = 13^2. These are really different answers.

To make the problem easier on ourselves, we should take advantage of this observation that some solutions are “the same”. We have just observed that the Pythagorean triples (3,4,5) and (6,8,10) should be viewed as essentially the same solution. This is because if you multiply all the numbers in (3,4,5) by 2, you get (6,8,10). This gives us the idea that we can multiply or divide a solution (a,b,c) by anything we want to get another solution. We will implement this idea in two slightly different ways in the two methods.

The Algebraic Method

Using the key simplifying idea, we’d like to remove any common factors between numbers. So, we don’t care about the triples (na, nb, nc), since those are really just the same thing as (a,b,c). Therefore, we will assume for the rest of this method that a, b, c have no common factors between them.

We can immediately make use of this fact using a key observation about the value of x^2 for even versus odd values of x. If x is even, notice that x^2 must be a multiple of 4 (you get two copies of 2). If x is odd, then both x-1 and x+1 are even, and so x^2-1 = (x-1)(x+1) is a multiple of 4 as well. To put it another way, x^2 is one more than a multiple of 4 if x is odd, and x^2 is a multiple of 4 when x is even. This actually enables us to learn something about Pythagorean triples (a,b,c). If a,b were both odd, then a^2 + b^2 would be two more than a multiple of 4. But c^2 isn’t allowed to be two more than a multiple of 4 because of what we just discussed. And since we’ve already assumed that a,b,c don’t share factors, a,b can’t both be even either.

Therefore, we’ve reached an interesting conclusion: one of a,b must be odd and the other must be even. This tells us a fairly significant amount of information already – and this is to me the major “moral of the story” to this approach to solving the problem. If you look at other mathematical problems that involve solving some kind of equation with powers in it, one thing you’ll very frequently see is that people will consider whether the variables are odd or even and very often you can make some significant progress just by making these considerations.
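The parity fact above is easy to check by brute force. Here is a tiny script of my own (not from the original post) confirming that x^2 is a multiple of 4 when x is even and one more than a multiple of 4 when x is odd:

```python
# x^2 mod 4 is 0 for even x and 1 for odd x, so a^2 + b^2 with a, b
# both odd would be 2 mod 4 -- a value a square can never take.
for x in range(1, 10_000):
    if x % 2 == 0:
        assert x ** 2 % 4 == 0
    else:
        assert x ** 2 % 4 == 1
```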

The rest of this solution involves some manipulation of the original equation. You can subtract a^2 from both sides and factor to show that if (a,b,c) is a Pythagorean triple, then (c-a)(c+a) = c^2 - a^2 = b^2. Since b^2 = (c-a)(c+a), if we divide both sides by b(c-a) we can see that

\dfrac{b}{c-a} = \dfrac{c+a}{b}.

These fractions may or may not be in “lowest terms”, so let’s use the fraction \dfrac{m}{n} to be the “lowest-terms” form of both of the fractions above, so

\dfrac{m}{n} = \dfrac{b}{c-a} = \dfrac{c+a}{b}.

Using \dfrac{n}{m} = \dfrac{c-a}{b} and doing some additional algebra which I will leave to the curious reader, we can show that

\dfrac{c}{b} = \dfrac{m^2 + n^2}{2mn}, \ \ \ \dfrac{a}{b} = \dfrac{m^2 - n^2}{2mn}.

This has now solved our problem – the triples all look like

(a,b,c) = (m^2 - n^2, 2mn, m^2 + n^2).

We haven’t learned anything particularly enlightening from all this algebra, other than perhaps the idea that factorizing polynomials is very useful whenever it is possible. While this solution is a completely correct solution to the problem, it is not my favorite solution. My favorite is the geometric method, which I now demonstrate below.
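The formula (m^2 - n^2, 2mn, m^2 + n^2) is easy to test by computer. Here is a short sketch of my own (not the post’s) that generates triples from it and verifies the Pythagorean equation:

```python
# Build triples from the algebraic formula and verify a^2 + b^2 = c^2.
# We assume m > n > 0 so that the first entry is positive.

def triple(m, n):
    return (m * m - n * n, 2 * m * n, m * m + n * n)

for m in range(2, 30):
    for n in range(1, m):
        a, b, c = triple(m, n)
        assert a * a + b * b == c * c

print(triple(2, 1))  # (3, 4, 5)
```

Note that not every choice of m, n gives a “reduced” triple (for example m = 3, n = 1 gives (8, 6, 10)), but every reduced triple does arise this way.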

The Geometric Method

Recall that we began the algebraic method by using the key idea to simplify to the case where there aren’t any factors in common between a,b,c. For the geometric method, we are going to start differently. We are going to temporarily allow ourselves fractional solutions to the equation in order to “get rid of” c altogether. What I mean by that is that if a^2 + b^2 = c^2, then

\bigg( \dfrac{a}{c} \bigg)^2 + \bigg( \dfrac{b}{c} \bigg)^2 = \dfrac{a^2 + b^2}{c^2} = \dfrac{c^2}{c^2} = 1.

If we use x,y as a sort of shortcut for \dfrac{a}{c}, \dfrac{b}{c}, then our Pythagorean equation is now x^2 + y^2 = 1, and the question is now to ask what the fractional solutions are to this equation. The solution (3,4,5) now translates into the solution (3/5)^2 + (4/5)^2 = 1.

Why is this useful? One way this makes things easier is that now there are only two variables instead of three. It is slightly harder to deal with fractions instead of just whole numbers, but there is a second advantage that makes the difficulty of dealing with fractions well worth it. The important fact to know here is that x^2 + y^2 = 1 is a famous equation in geometry – this is the graph of a circle! This is awesome because now we are allowed to use everything we know about circles to help us out.

The big idea now is to visualize solutions as being points on this circle. To make things a bit easier, we can pick a sort of reference point (1,0) on the circle, which corresponds to a rather silly solution 1^2 + 0^2 = 1^2. Now, as has already been discussed, any Pythagorean triple (a,b,c) corresponds to a solution (a/c)^2 + (b/c)^2 = 1, which we could graph as a point (a/c, b/c) on our circle graph. This is especially nice because the correspondence goes both ways – if we have a point (x,y) on the graph of the circle where both x,y are some kinds of fractions, then (x,y) has to basically look like (a/c, b/c), which leads us to a Pythagorean triple (a,b,c).

To summarize what we’ve just set up, the whole number solutions (a,b,c) to the equation a^2 + b^2 = c^2 are really the same thing as pairs of fractions (x,y) which satisfy x^2 + y^2 = 1. So, if we can identify every pair of fractions on the circle, we can identify all Pythagorean triples! This is one of the deep insights we get from this second method of proof – we can now see much more clearly how geometry plays a role in finding Pythagorean triples. But… how can we make use of this geometry to actually find all the solutions?

This is the next big idea. Imagine we’ve identified some point (x,y) on the circle x^2 + y^2 = 1 but we don’t know yet whether or not x and y are fractions. How could we use geometry to figure that out? The answer, as it turns out, is found in another geometric object: the line. I mentioned earlier that we can find the reference point (1,0) on the circle. What happens if we connect (x,y) to (1,0) by a line? Let’s call that line L. What is the equation for L? Using the “rise over run” formula for the slope of a line, we find that the slope of this line is

\dfrac{y - 0}{x - 1} = \dfrac{y}{x-1}.

Notice that if (x,y) are fractions, so is the slope of the line L. It would be very nice if we could “go the other way” too – that is, could it be true that if I graph a line L going through the point (1,0) and intersect that line with the circle, will that new point (x,y) have fractional entries? If so, then we would have shown that “lines through (1,0) with fraction slopes” are the same thing as the points (x,y) with fractional entries on the circle. If this is so, then we have solved the Pythagorean triples problem in the following steps:

  1. Write down all lines L going through (1,0) with slope m/n.
  2. Translate each line L into the point (x,y) with fractional entries.
  3. Translate each (x,y) into a Pythagorean triple (a,b,c) by the formulas x = a/c and y = b/c.

We’ve actually already shown how to do Step 3. So, all we need to do now is Step 1 and Step 2. This will complete the method.

Step 1 is fairly straightforward. Pick a slope m/n. Then using the ‘point-slope’ formula for a line, the line L going through the point (1,0) with slope m/n has equation

y = \dfrac{m}{n}(x - 1).

Simplifying this equation, the line L has the equation

y = \dfrac{m}{n} x - \dfrac{m}{n}.

This is all of Step 1! Now we can move on to Step 2, where we try to find a point (x,y) which is on the circle x^2 + y^2 = 1 and also on the line y = \dfrac{m}{n} x - \dfrac{m}{n}. Since we have two equations, we can use the method of substitution to solve them. By plugging in the equation for y into the circle, we now need to try to solve the equation

x^2 + \bigg( \dfrac{m}{n} x - \dfrac{m}{n} \bigg)^2 = 1.

By expanding the squared term and making some simplifications, this equation reduces to

\bigg( \dfrac{m^2 + n^2}{n^2} \bigg) x^2 - \dfrac{2m^2}{n^2}x + \dfrac{m^2 - n^2}{n^2} = 0.

If we multiply both sides by n^2, then we find the quadratic equation

(m^2 + n^2)x^2 - 2m^2 x + (m^2 - n^2) = 0.

We can simply solve this using the quadratic formula. By plugging in all the relevant values, the quadratic formula gives us a solution

x = \dfrac{2m^2 \pm \sqrt{(2m^2)^2 - 4(m^2+n^2)(m^2-n^2)}}{2(m^2+n^2)}.

After some rather tedious simplifying,

x = \dfrac{2m^2 \pm 2n^2}{2(m^2 + n^2)},

and so the two solutions are x = 1 and x = \dfrac{m^2-n^2}{m^2 + n^2}. We already knew the first solution – x = 1 was just the point (1,0). But the second solution will lead to a Pythagorean triple. Following the work we’ve already laid out, we should now have a Pythagorean triple (a,b,c) with

x = \dfrac{a}{c} = \dfrac{m^2 - n^2}{m^2 + n^2}.

When you work out the value of b, you get b = 2mn, which is the same solution we got by the algebraic method.
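The whole geometric recipe can be sketched in a few lines of code (my own illustration, not part of the original post): pick a slope m/n, intersect the line through (1, 0) with the unit circle, and read off the triple. Exact fractions come from the standard-library Fraction type.

```python
# Geometric method in code: the second intersection of the line
# y = (m/n)(x - 1) with the circle x^2 + y^2 = 1, using the formula
# x = (m^2 - n^2)/(m^2 + n^2) derived in the text.
from fractions import Fraction

def point_from_slope(m, n):
    """Second intersection point; the first is always (1, 0)."""
    x = Fraction(m * m - n * n, m * m + n * n)
    y = Fraction(m, n) * (x - 1)   # negative: the point lies below the x-axis
    return x, y

for m, n in [(2, 1), (3, 2), (5, 1)]:
    x, y = point_from_slope(m, n)
    assert x * x + y * y == 1      # the point really is on the circle
    a, b, c = m * m - n * n, 2 * m * n, m * m + n * n
    assert (x, abs(y)) == (Fraction(a, c), Fraction(b, c))
```

For instance, the slope 2/1 produces the point (3/5, -4/5), which corresponds (up to sign) to the triple (3, 4, 5).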

Concluding Remarks

We have just shown two different ways of solving the same problem – one algebraic, one geometric. I now want to claim that the geometric approach to solving the problem is far, far more important. But what could that possibly mean? Both methods solve the problem, so how can one be better or more important than the other?

I say this method is better because the “toolkit” we pick up along the way is much more broadly useful. To see what I mean, think about the two solutions. The algebraic proof used some special tricks that don’t really work unless you happen to be dealing with a circle equation. The geometric proof isn’t like that. Its main tool is the line, and lines make perfectly good sense in all sorts of contexts. In fact, this method is helpful for basically any algebraic equation you can graph – you can learn at least something about the points on the graph by drawing lines through it. You don’t always wind up with a total solution to the problem that way, but you always learn something. This makes the geometric approach more useful than the algebraic one, whose tricks don’t transfer nearly as often.

To show how much you really can learn, I’ll give an example of what mathematicians have discovered using this method. The circle belongs to a group of equations called conics in geometry – conics are things like circles, ovals, and parabolas (along with a more difficult shape called the hyperbola). The equations for these things are similar to circles in that all the powers of terms are at most 2, just like in the circle equation x^2 + y^2 = 1 we used to study Pythagorean triples. The full equation that works for any conic at all is

A x^2 + B xy + C y^2 + Dx + Ey + F = 0.

This equation, as you can see pretty easily, is significantly more complicated than a circle. And yet, mathematicians have proven that if you can find even one point that solves this equation, then every single other point can be found by drawing lines just like we did above! In other words, the method we discovered for solving this circle equation actually works for way more equations than just circles.

(By the way, there actually are some conceptually useful tools that show up in the algebraic method, but these are far less obvious and the usefulness of these methods isn’t quite clear until you learn a lot more about equations. This is why I say the geometric method is better – the usefulness of this approach is much more clear in this case. Any graph can be analyzed using this line method, whereas it isn’t very clear which sorts of equations can be handled using the algebraic method I used here.)

I hope you can get an idea now of why mathematicians might sometimes try to prove things they already know in new ways. Sometimes, you can learn a lot of new ideas by tackling the same problem in different ways.

The Realm of Integers (Types of Numbers #3)

To kick off the tower of numbers, so to speak, the last article in this series discussed the “basic” numbers – mainly focusing on the positive whole numbers but also including zero. Now, we haven’t quite explored every aspect of the number zero yet, and we ran into problems with subtraction within the so-called “natural numbers”. The goal of our discussion now is to solve our subtraction problem.

Overview of Previous Layer of Numbers (Natural Numbers)

The so-called natural numbers are often denoted by \mathbb{N} by mathematicians. For my purposes, I will say that \mathbb{N} = \{ 0, 1, 2, 3, \dots \} is the full list of positive whole numbers along with zero. (Sidenote: The font here is called “blackboard bold” and, as its name suggests, began to exist because mathematicians needed a way to write a letter in boldface with chalk on a blackboard.) With positive numbers in \mathbb{N} like 1, 2, 3, and so on, we know how to add and multiply already. We also know that adding by zero is the same as “doing nothing”. We have also learned in our schooling how to multiply by zero, but this isn’t actually very obvious, so we will pretend we don’t know how to do that and talk about multiplying by zero later on.

What the Natural Numbers are Missing

Notice what we know and don’t know how to do “easily” with natural numbers: we know how to multiply any two positive numbers, and we know how to add any two numbers. Isn’t it sort of odd that subtraction is missing completely? Plus, we don’t quite know how to multiply by zero. At least, it isn’t obvious how to multiply by zero. As simple as the natural numbers are, it is annoying that such a simple idea as subtraction just doesn’t work here. There is, for example, no natural number that is equal to 2 - 5. We know that 5 - 2 = 3, but we do not have the foggiest idea yet of what 2 - 5 might be. Our goal now will be to reduce the number of things we don’t know how to do.

How Integers Fill in the Gap

The solution, which might sound weird, is just to define more numbers! In other words, just because 2 - 5 isn’t a natural number doesn’t mean it isn’t a number at all. We will call it an integer (more specifically, a negative number) when all is said and done. This new world of “integers” will eventually be called \mathbb{Z} (which comes from the German word Zahlen, meaning “numbers”).

But how can you just invent new numbers? We don’t even know how to add these new numbers. How can I add a “new number” to an “old number”? That doesn’t make a lot of sense… at least not yet. This is a fair criticism, and therefore in order to call these things genuine numbers we have to learn how to combine them with the old numbers. To do this, we need some kind of motivation for making sense of these new numbers in light of the old ones.

The way that negative numbers initially came to make sense to people was through monetary calculations. People knew that when you add a debt, you lose money, and when you subtract a debt, you gain money. This is the exact opposite of how money normally works, since if you add money to an account you now have more money, not less. Over time, people began to realize that it is possible to think of debt as a “different kind of number” – or at least they would pretend it was in order to make their calculations faster. The fundamental rule for making these “debt numbers” work is the following:

If I add a debt of x dollars to my account and then add x dollars to my account, I have neither gained nor lost any money. That is, the net effect on my account was to add 0 dollars.

Let’s write -x as the symbol to stand in for a debt of x dollars. Then what this sentence tells us is that x + (-x) = 0. This should look familiar, as in the world of natural numbers we already know that x - x = 0. So, what we have really done is this:

We have discovered a way to think about an equation we already knew about from a different angle.

Instead of thinking about subtraction (as in the equation x - x = 0), we are now thinking about addition with our “new” negative numbers, as in x + (-x) = 0. This is the definition of adding negative numbers! For example, we can now say that 2 - 5 = 2 + (-5). Since (2 + (-5)) + 3 = (2 + 3) + (-5) = 5 + (-5) = 0, adding 3 to 2 - 5 gives 0, and so in our new number system we must have 2 - 5 = -3!

To a reader who already has studied negative numbers before, this is relatively straightforward. But remember, we are trying to think about this as if we had never come up with the idea of negative numbers before. I am, of course, moving quite a lot quicker than a teacher would when they teach negative numbers for the first time. I can move faster since this isn’t a classroom, and also because I know most people reading this are probably old enough that they learned about negative numbers sometime in the past. The point I want to make, and which will be made much more explicitly soon, is the concept of extending one world of numbers into another world of numbers.

Making Multiplication Work

Now, the idea of multiplication is defined in the context of positive whole numbers. Multiplication is repeated addition. It makes sense how to calculate 3 \times 5, for instance, because I know what it means to add 3 to itself 5 times. But what can we say now about 0 and negative numbers? What do 2 \times 0, or -3 \times 2, or -7 \times -4 even mean? The idea of repeated addition no longer works as clearly in these situations. So how do we make sense of such multiplications?

Here lies the single most important thing to remember when extending a realm of numbers to a new realm of numbers. Our number one goal must always be this:

Make all the rules that work in the smaller realm of numbers also work in the bigger realm of numbers.

This concept is what we will use to make sense of multiplying by zero and by negative numbers. We have three problems we now have to solve.

Problem 1: Zero times any number

For any positive whole numbers, we have the distributive property a(b+c) = ab + ac. Using our main principle, we would like this to still be true if we make some of these numbers equal to zero instead of a positive number. To do this, we need to mess around with different situations until we learn something useful.

You might first try making a = 0, and then you’d end up with the equation 0 * (b+c) = 0*b + 0*c, but we still don’t really know what any of the three terms are, and we can’t really cancel anything out, so this wasn’t a very useful attempt to understand how multiplying by zero works.

Perhaps eventually you decide to try b = c = 0, so then our equation is a*(0+0) = a*0 + a*0 by the distributive property. Well, this is actually quite helpful, since we already know that 0 + 0 = 0. If we use this fact, then we have a new equation a*0 = a*0 + a*0. Since a*0 is just a number, we can subtract it from both sides of the equation (since we are allowed to subtract now!), and since any number minus itself is equal to 0, we conclude that a*0 = 0. Since multiplication is commutative (that is, a*b = b*a), we also now know that 0 * a = 0.

This should be very interesting if you’ve never seen it before. Notice what I have done – I used formulas like a*(b+c) = a*b + a*c that are true for positive integers, I then assumed that it would also be true for other numbers too, and I used that to figure out how to correctly multiply by zero.

We will now use the same idea – along with our new knowledge that 0 * a = 0, to discover how to correctly multiply by negative numbers.

Problem 2: Negative number times positive number

We now want to know how to multiply (-a) * b. To the curious reader, I encourage you now: try to use the distributive law to figure out how to multiply the negative number -a by the positive number b.

The idea is exactly the same as with Problem 1. We are going to make use of the distributive law a(b+c) = ab + ac. If we assume that this law is also true when some of the numbers are negative, one of the things we learn is that a(b + (-c)) = ab + a(-c). The clever step we will use to discover the solution is to set c = b. When we do this, we learn that a*(b + (-b)) = a*b + a*(-b). Now, since b + (-b) = 0, we have learned that a*0 = a*b + a*(-b), and by our solution to the first problem, that 0 = a*b + a*(-b). We are allowed to subtract from both sides, and therefore by subtracting a*b from both sides we arrive at

a*(-b) = -(a*b).

Ok, so now we know how to multiply one positive number by one negative number. But before we move on, there will be an extremely useful observation we can make from this. If I decide to look at the situation when b = 1, then I conclude that a * (-1) = -a. If you’ve learned about negative numbers before, then this should feel quite obvious. Really, all of this would feel obvious. But remember, someone had to figure all this out for the first time for themselves. It wouldn’t have been obvious to them. By going through all this work, we retrace the steps of brilliant mathematicians of the ancient past.

This innocuous equation a * (-1) = -a will help us greatly simplify our last equation.

Problem 3: Negative number times negative number

The last thing we need to learn how to do is how to multiply (-a) * (-b). We can massively simplify this task by using the commutative property ab = ba along with the rule a * (-1) = -a that we just developed. Using these,

(-a)*(-b) = a * (-1) * b * (-1) = (-1) * (-1) * ab.

Therefore, all we really need to know is how to multiply -1 * -1; then we know everything else. Without walking through all the same steps again, I will show most of the work in one main string of equations, which the reader should slow down to make sure they understand:

0 = -1 * 0 = -1 * (1 - 1) = -1 * (1 + (-1)) = (-1 * 1) + (-1 * -1) = -1 + (-1) * (-1).

Since 0 = -1 + (-1) * (-1), we conclude that (-1)*(-1) = 1. Therefore, (-a)(-b) = ab.
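As a quick sanity check, the three rules we derived can be verified against Python's built-in integer arithmetic (which, of course, already implements them):

```python
# A numerical spot-check of the three rules derived above.
for a in range(1, 6):
    for b in range(1, 6):
        assert a * 0 == 0                # Problem 1: a * 0 = 0
        assert (-a) * b == -(a * b)      # Problem 2: (-a) * b = -(a * b)
        assert (-a) * (-b) == a * b      # Problem 3: (-a) * (-b) = a * b
print("all three sign rules hold")
```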


We have now learned how to add, subtract, and multiply together any two integers, any two numbers in the realm of \mathbb{Z}. We have seen how \mathbb{Z} extends the familiar positive whole numbers into a bigger world that, while it is more complicated in some ways, is actually simpler in some other ways. Next time, we will do the same thing with division that we have just done with subtraction, expanding \mathbb{Z} into a new bigger world where division is (almost) always allowed.

Fundamental Theorem of Calculus, Part 1 (Explaining Calculus #16)

We have recently talked about two concepts that appear rather disparate. We have discussed antiderivatives – the idea of reversing all the rules for taking derivatives – and Riemann sums, which are used to calculate the areas underneath complicated shapes. What do these have to do with each other? It does not look like they are very related.

However, they are actually extremely closely related – in fact, they are essentially the same exact thing! The purpose of our discussion in this part of the series on calculus is to figure out why these two very different concepts are actually just two ways of talking about the same thing.

Functions that Calculate Areas

The first thing we must do, in order to make a connection between functions and areas, is introduce, for a previously known function f(x), the so-called area function A(x). We will define A(x) so that the value of A(c) is the area enclosed by the y-axis, the vertical line x = c, the horizontal line y = 0, and the graph y = f(x). See the blue shaded region below for an example (for now, ignore the red region; it will matter later).

A most helpful visual for the area-counting function A(x) [1]

The nice thing about A(x) is that it enables us to discuss the idea of area (which is geometric) in the language of functions, which is somewhat foreign to geometry. Functions are very algebraic, and areas are very geometric. As usually happens when we find ways to mix together two different types of math, things become a little bit delicate and tricky, but we gain the benefit of being able to use the toolkits of both algebra and geometry when talking about A(x). And now, since we know how to do calculus with functions, perhaps we can try to do some calculus with A(x). This will be our goal.

In particular, we are going to want to learn how to take a derivative of A(x). This involves studying the red sliver in the image above. Recall that the formula for the derivative of a function is

A^\prime(x) = \lim\limits_{h \to 0} \dfrac{A(x+h) - A(x)}{h}.

In order to understand what the value of this derivative might be, we need to get a grip on the region that lies between the areas counted by A(x) and A(x+h). In order to do this, we will think about the problem from two different angles and put them together to see what we can learn.

Estimating a Sliver of Area (Method #1)

Our first method has essentially already been done. I have stated earlier in our discussion that the good thing about having an object that is both algebraic and geometric is that we can look at it from two different angles and learn different kinds of information from those two perspectives.

This first method is the algebraic method. By definition, A(x) tells us about the area from 0 all the way to the point x, and A(x+h) pushes a little bit further to x+h. If we want to know what the red region is – the region beyond x leading up to x+h – we can just subtract the area counted by A(x) from the slightly bigger area counted by A(x+h). In the picture above, we could say

Red Area = (All Area to Left of x+h) – Blue Area = A(x+h) - A(x).

This is one way we could think about calculating the red area.

Estimating a Sliver of Area (Method #2)

The second way we can think about counting up the red area goes back to geometry. Recall that when we wanted to talk about areas underneath complicated-looking curves, we employed the method called Riemann sums, whereby we calculated the area in two steps:

(1) Estimate the area using rectangles,

(2) Take a limit as the number of rectangles becomes infinitely large to make the formula exact.

Well, perhaps we could take a Riemann sum view and estimate the red area using a rectangle. But what would this rectangle be?

In order to calculate the area of a rectangle, we need two pieces of information – the length of the base and the height of the rectangle. The base stretches from x to x+h, which is a base length of (x+h) - x = h. Now, the red region doesn’t exactly have one single “height” that we can use, since the top is curved, but the upper left corner is a convenient value to use since we actually know what it is – we know the height in the upper left-hand corner is f(x). Therefore, in order to get an initial estimate of the red area, we can take our height to be f(x).

So, we can say that

Red Area \approx f(x) * h.

In line with the ideas that we use with Riemann sums, we could transform this \approx to an equals sign by taking a limit as the width of our rectangle becomes infinitely small. If we took this limit right now, we wouldn’t learn very much. The clever thing to do, now, will be to first mix new information into the equation from our first method and only then take the limit as h \to 0.

Comparing the Two Methods

We have just finished discussing two methods for evaluating the red area. Putting the two formulas we have devised together, we can now see that

Red Area = A(x+h) - A(x) \approx f(x) * h.

Now, since the last two terms form a genuine mathematical formula, we can drop our reference to the red area:

A(x+h) - A(x) \approx f(x) * h.

Now, this is looking pretty good. We can divide both sides by h to solve for f(x):

f(x) \approx \dfrac{A(x+h) - A(x)}{h}.

Now, when we employed our second method, I mentioned that we could take a limit as h \to 0 (i.e. taking the limit as the rectangles in a Riemann sum becomes infinitely thin) in order to transform the \approx into a genuine equals sign. Now is the time to finally do this. If we remember the definition of the derivative mentioned earlier in the post,

f(x) = \lim\limits_{h \to 0} \dfrac{A(x+h) - A(x)}{h} = A^\prime(x).

Therefore, we have arrived now at a rather interesting formula, f(x) = A^\prime(x). What are we to make of this new strange formula?
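The conclusion A^\prime(x) = f(x) can be sketched numerically. Below, I pick the concrete example f(x) = x^2 (my choice; the argument above works for a general f), approximate A(x) by a left-endpoint Riemann sum, and check that the difference quotient for A is close to f(x):

```python
# Numerical sketch of A'(x) = f(x) for the example f(x) = x**2.
def f(x):
    return x ** 2

def A(x, n=100_000):
    # left-endpoint Riemann sum for the area under f from 0 to x
    dx = x / n
    return sum(f(i * dx) for i in range(n)) * dx

x, h = 2.0, 1e-3
quotient = (A(x + h) - A(x)) / h   # the difference quotient for A'(x)
print(quotient, f(x))              # both values are close to 4
```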

Importance of the Connection to the Derivative

Let’s pause to remember three things we have learned recently:

  • Remember what A(x) actually is. A(x) is a function that knows how to calculate the area underneath the graph of the function y = f(x).
  • Remember that the only way we know so far to calculate the areas underneath fancy curves is to use the extremely complicated Riemann sum process.
  • Remember, from our previous discussion on antiderivatives, that F(x) is an antiderivative of the function f(x) if and only if F^\prime(x) = f(x).

Let’s now put this information together. We have just learned that A(x) satisfies the equation A^\prime(x) = f(x). This means, because of the third bullet point, A(x) is an antiderivative of f(x). In our last discussion, we talked about how you can actually derive formulas for antiderivatives by taking advantage of rules we already know that work for derivatives. This means that we now have two ways to calculate A(x): the complicated Riemann sum process and a much simpler algebraic process of reversing derivative formulas. True, it is not necessarily easy to reverse derivatives, but it is much, much more straightforward than trying to actually evaluate a Riemann sum! So we now actually have a way better method for calculating complicated areas – we can use antiderivatives to calculate areas!
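Here is that payoff, sketched numerically for the example f(x) = x^2 (again my choice of example): the antiderivative x^3/3, obtained by reversing the derivative rules, reproduces the same areas that the laborious Riemann-sum process computes.

```python
# Two ways to compute the area under f(t) = t**2 from 0 to x:
# the complicated Riemann-sum process, and a simple antiderivative formula.
def riemann_area(f, x, n=100_000):
    # left-endpoint Riemann sum for the area under f from 0 to x
    dx = x / n
    return sum(f(i * dx) for i in range(n)) * dx

def F(x):
    return x ** 3 / 3   # an antiderivative of x**2

for x in [1.0, 2.0, 3.0]:
    print(x, riemann_area(lambda t: t ** 2, x), F(x))  # last two columns agree closely
```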

There remains still one problem. There isn’t just one antiderivative of a function f(x). The antiderivatives of the function f(x) all look like A(x) + C for arbitrary constant numbers C. So, if I calculate an antiderivative F(x) of f(x), how do I know whether I have calculated A(x) or, say, A(x) + 7 or A(x) - 3? The answer is that we need to learn how to evaluate A(0). Since A(0) + C is different for every different value of C, we can tell the difference between all the possible options if we learn what A(0) is.

But what is A(0)? Well, this is one of the places the geometry really helps us. Remember that A(x) is the area underneath a graph from the point 0 up to the point x. So we need to calculate the area between 0 and 0. But that isn’t even an area… that is a vertical line! Lines have no area, and so A(0) = 0. So, we now know what is called the First Part of the Fundamental Theorem of Calculus:

Fundamental Theorem of Calculus, Part 1: If A(x) is a function that tells us the area underneath the graph of f(x) from 0 up to the point x, and if F(x) is an antiderivative of f(x) with F(0) = 0, then

A(x) = F(x).

In words, calculating areas and calculating antiderivatives are the same process.


Part 1 of the Fundamental Theorem of Calculus is hugely important. It is called fundamental for a reason – it is upon the discovery of this massively important fact that calculus really begins to become supremely useful.

Next time, we will discuss part 2 of the fundamental theorem. Up to this point, the fundamental theorem has remained a bit abstract, in the sense that we are still dealing only with variables, and we have required that 0 be our starting point. Part 2 of the fundamental theorem removes these restrictions and enables us to place an actual number on areas underneath complicated curves.

Reference for Image

[1] By Kabel – Own work, CC BY-SA 4.0,

Liar, Lunatic, or Lord?: Overview of Lewis’ Famous Argument

CS Lewis is perhaps the most recognized Christian writer of the twentieth century. From the well-known and loved Narnia books to the more theological and apologetics books like Mere Christianity and The Screwtape Letters, Lewis has captured both the imagination and intellect of his many readers since he began his writing. CS Lewis’ analogies, arguments and stories are frequently quoted because of their clarity and cleverness.

Here, I would like to give an overview of one of the most famous of CS Lewis’ arguments in support of Christianity. This argument is based on comparing the impression that most non-Christians get from reading about Jesus to the perspective that Christians have when we read about Jesus. While it isn’t possible to explore all aspects and nuances of the argument in one article, hopefully this will be a good start.

Background: The Common Non-Christian View of Jesus

Of course, Christians have a very high view of Jesus. This is not surprising, because to Christians, Jesus is God Himself, the perfect sinless man who saved us from our sins. Because this is who Christians believe Jesus to be, obviously Christians view Jesus as a supremely good moral teacher and example for how to live life. As regards figures in other religious traditions, Christians generally agree on some points and disagree on others. You can find significant overlaps between Christian ethics and, say, Muslim ethics or Hindu ethics. And on those points of overlap, Christians will of course agree with Muslims or Hindus. However, there will also be points where these traditions disagree, and of course on those points Christians will tend to voice their disagreement. At the end of the day, then, Christians don’t have a high or low view of other religious figures by default – it would depend on what their teachings are.

As you’d expect, this is generally true of other religions as well. Adherents of, say, Hinduism would be inclined to agree with Christian ethics only insofar as they intersect with Hindu ethics. The same goes for Muslims, and even for the non-religious. Most secularists, for example, have a view of human rights that is extremely similar to the Christian conception of men and women being made in the image of God. While secularists and Christians have some disagreements about the outworkings of this view, there is basic agreement, and a secularist will support Christianity insofar as it agrees with their view.

There is, however, a very strange phenomenon when it comes to the person of Jesus. When you look across the spectrum of religious and ethical perspectives, nearly everyone holds Jesus in extremely high regard. Muslims will say that Jesus is one of the greatest men ever to live, and most secular people would also say that Jesus was a very good moral teacher. Jesus’ goodness often winds up in such high regard that, when Christians disagree with those other groups, the other groups have a tendency to think that Christians are misrepresenting Jesus.

Isn’t that odd? You don’t find that level of assumption of goodness in any other religious figure. I cannot think of any other religious figure in history whose teachings are generally both well-known and well-accepted by those outside of the religion. Even people who do not believe that Jesus is God do tend to believe that he was one of the greatest moral teachers in history, even if they also happen to think that his followers have distorted some of those teachings over time.

The Problem

Ok, so what is the issue? This sounds perfectly reasonable at face value. However, when you look at Jesus’ teachings more closely, we run into a problem. Jesus was not just a moral teacher – his mission actually wasn’t even primarily to be a moral teacher. His ministry had two main goals – to proclaim the coming of the Kingdom of God, and to proclaim his Messiahship and His Divinity. Just read the Bible – you find a lot of moral teachings, but you find a lot of what would now be called eschatological preaching and lots of claims to be God Himself. Although often these claims are hard to see immediately from our cultural context, Jesus claimed very clearly to be the God of the Universe. Consider as just one example Jesus’ trial before the Jewish leaders:

But he remained silent and made no answer. Again the high priest asked him, “Are you the Christ, the Son of the Blessed?”  And Jesus said, “I am, and you will see the Son of Man seated at the right hand of Power, and coming with the clouds of heaven.” And the high priest tore his garments and said, “What further witnesses do we need?  You have heard his blasphemy. What is your decision?” And they all condemned him as deserving death. (Mark 14:61-64, ESV)

Notice that the response of the Jewish leaders to his statement is that Jesus committed blasphemy. Why? To understand this requires some Old Testament background. Jesus actually claimed to be God three times in his statement. First, Jesus’ use of I am is an indirect reference to Exodus 3:14, where God tells Moses from the burning bush that I AM is His name. Though perhaps you think this is a coincidence. Very well, what about his claiming to be “the Son of Man seated at the right hand of Power, and coming with the clouds of heaven?” The Son of Man is a figure prophesied in the Old Testament, more specifically Daniel 7:13-14, and the Son of Man is portrayed in that passage as divine – he receives authority over all creation and is worshipped as God by all mankind. Finally, in Psalm 110, we read that “The LORD says to my Lord: ‘Sit at my right hand, until I make your enemies your footstool.” Again, notice the sitting at the right hand, and notice that there are two figures who are Lord in this passage. Jesus is claiming to be one of them.

Thus, Jesus is claiming to be, quite literally, the God of the entire universe. This is a problem for anyone who views Jesus as a good moral teacher because that just isn’t the sort of thing that a good moral teacher would say about himself. If a random person on the street claimed to be God, then you very likely would not label him a good moral teacher. You’d probably call him a cult leader. And yet people don’t tend to say that about Jesus.

The Quadrilemma

When C.S. Lewis presented his argument based on this observation, he began with the proposition that Jesus is a real historical figure and that the Bible correctly records the basic message he taught his followers. There were some people in Lewis’ time who rejected this, but as a professor of English literature, well-studied in reading medieval and ancient writings, he found the idea that Jesus never lived to be completely unreasonable. Given all the reports of Jesus having been a real human being – in ancient sources both secular and Christian – the idea that there never was any such person just doesn’t make a lot of sense. It amounts to a conspiracy theory. There are lots of arguments for this point – but I won’t rehearse those now because that takes us too far afield from the main goal. So, for the rest of the argument, we take for granted that Jesus lived and that the four gospels record his teaching correctly (notice that we are not assuming he actually rose from the dead, just that Jesus’ words prior to his death are accurately recorded).

We now have a few well-agreed-upon facts to work off of:

  • Jesus was a real person that lived around the years 0-30 AD.
  • Jesus is recognized by nearly everyone as a good moral teacher.
  • Jesus claimed to be God in the flesh and his followers believed this about him after his death.

Now, we run into a bit of a problem. Let us consider Jesus’ claim to be God. Jesus either believed this was true, or he did not. If he did not believe that he was God but claimed to be God anyways, then this is an abhorrently immoral and deceptive lie. But this contradicts everyone who views Jesus as a good moral teacher, and so doesn’t make a lot of sense. Therefore, the most reasonable assumption to make is that Jesus sincerely believed that he was God. Now, continuing in the logical progression, this belief is either true or false. Either Jesus really was God, or he was not. If Jesus was not really God, and yet he believed he was God, then Jesus was a lunatic. But lunatics are also not taken to be good moral teachers, and so this also doesn’t make sense. Therefore, the only option that still makes sense is that Jesus claimed to be God, really believed that about himself, and that this is really true.

But this is the message of Christianity! If Jesus is actually God, then Christianity is true. There is no longer any room for debate if we get this far. If you come to the conclusion that Jesus is the God of the universe, then the only reasonable response is to worship and follow him. Notice too that the three bullet points from earlier are all we started with – there are a large number of people who do not claim to be Christians who believe these three things. Any of those people, then, should become Christians by C.S. Lewis’ reasoning.

A Deductive Form

To conclude, I will put this argument in a deductive form. In other words, I will summarize the reasoning in the form of a step-by-step process using basic rules of logic and supporting the hypotheses with evidence.

  1. If Jesus was a real person, then Jesus claimed to be God and Jesus is a good moral teacher.
  2. Jesus was a real person.
  3. Therefore, Jesus claimed to be God and is a good moral teacher.
  4. If Jesus did not believe he was God, then Jesus is not a good moral teacher.
  5. Therefore, Jesus did believe he was God.
  6. If Jesus is not God, Jesus was insane.
  7. If Jesus was insane, Jesus was not a good moral teacher.
  8. Therefore, Jesus was not insane.
  9. Therefore, Jesus is God.
  10. If Jesus is God, then Christianity is true.
  11. Therefore, Christianity is true.

Now, let’s take the argument step by step:

  1. This is based on the fact that the cumulative evidence of the Bible and other ancient historical writings leads us to conclude that we have accurate reports of Jesus’ actual teachings.
  2. Every respected professional ancient historian on the planet who studies this era of history believes that the historical evidence for Jesus’ existence is too strong to be rationally denied.
  3. This follows from (1) and (2) by the basic rules of logic.
  4. This is grounded in the basic moral fact that intentional deception of this sort is clearly evil.
  5. This follows from (3) and (4) by the basic rules of logic.
  6. This comes from the basic intuition that a normal human being who believes he is God must either be God or suffer from a severe mental illness.
  7. This comes from the basic intuition that people who suffer from severe mental illnesses that cause delusions of this sort, while not necessarily bad people, do not have sufficiently reliable cognitive faculties to be trusted as great moral leaders.
  8. This follows from (3) and (7) by the basic rules of logic.
  9. This follows from (6) and (8) by the basic rules of logic.
  10. This is essentially the definition of Christianity.
  11. This follows from (9) and (10) by the basic rules of logic.

In this logical reasoning, the only points that need to be supported with evidence are 1, 2, 4, 6, 7, and 10. But even 10 doesn’t really have to be supported, because 10 is a definition. Every other point can be established by what looks like pretty basic reasoning about either history, reading comprehension, ethics, or psychology. So, it seems that if you don’t agree with Christianity, there is some explaining to be done.

Antiderivatives (Explaining Calculus #15)

In the last post in the series on calculus, we talked about Riemann sums – which is just a fancy way of describing the estimation of the area of a complicated region by using lots of rectangles to obtain an approximately correct answer. The post I must follow this up with, which for now seems completely out of left field, is about the idea of an antiderivative. In one sentence, we will be talking about pressing a “reverse button” on the process of taking derivatives. I must apologize to anyone reading through the series one post at a time for the apparently sudden and disconnected change of topic – but I promise that soon, we will discover that Riemann sums and antiderivatives have much more in common than is obvious at face value.

But for now, on to antiderivatives!

Reversing Directions

Imagine you are walking to your friend’s house. You know the directions from your house to your friend’s house are:

  1. Walk north 1 block.
  2. Walk east 2 blocks.
  3. Walk north 3 blocks.

If you carefully follow these directions, then you will get to your friend’s house. Now, what if your friend wants to walk to your house? What would his directions be? If you pause and think about it, all your friend would need to do is to take your directions and “reverse” them. His first step should be to “undo” your last step. So, his Step 1 must be “Walk south 3 blocks” since your Step 3 was “Walk north 3 blocks”. His next step should undo your Step 2, and his last step should undo your Step 1. When he writes all of this down, his steps will be

  1. Walk south 3 blocks.
  2. Walk west 2 blocks.
  3. Walk south 1 block.

Notice the patterns between the two sets of instructions. The numbers of blocks flipped their order (123 in your instructions, 321 in his instructions). Also, every direction became its opposite. In other words, you could actually create an instruction manual for reversing any set of directions:

  1. Take the original set of directions and reverse the order of the steps.
  2. After reversing the order of the steps, turn the compass direction in each step to its complete opposite.

To visualize this, here is how this process changes your directions into your friend’s directions:

Your Directions      | (Flipping the Order) | Your Friend’s Directions
Walk north 1 block   | Walk north 3 blocks  | Walk south 3 blocks
Walk east 2 blocks   | Walk east 2 blocks   | Walk west 2 blocks
Walk north 3 blocks  | Walk north 1 block   | Walk south 1 block

The method for reversing your directions

Notice that this method will actually always work, for any set of directions you might have. So, we have devised a process – maybe we can call it the anti-direction process – that reverses any set of directions. If the original directions took me from A to B, then the anti-directions will take me from B to A.
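The anti-direction process described above is mechanical enough that we can sketch it in a few lines of Python (the function name `reverse_directions` and the pair-based representation are my choices for illustration):

```python
# The "anti-direction" process: reverse the order of the steps,
# then flip each compass direction to its opposite.
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}

def reverse_directions(steps):
    """steps is a list of (direction, blocks) pairs."""
    return [(OPPOSITE[d], n) for (d, n) in reversed(steps)]

mine = [("north", 1), ("east", 2), ("north", 3)]
print(reverse_directions(mine))  # [('south', 3), ('west', 2), ('south', 1)]
```

Notice that applying the process twice returns the original directions, which is exactly the "cancelling out" property we will want from antiderivatives.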

What Does “Reversing a Derivative” Mean?

A set of directions is a method by which we can get from one place to another. We can think of this as an analogy for mathematics. When we do some mathematical process, we can think of that process as having a starting point and an ending point. With many mathematical processes, it can be really useful to know how to reverse the process. As we shall see later, the derivative is one of these important processes to learn how to reverse.

The process of reversing a derivative will be called antidifferentiation, and the result we get after we are done is called an antiderivative. So, to give this a definition, the antiderivative of some function f(x) is another function F(x) that I can find by doing a “reverse derivative” on f(x)… whatever that means.

Luckily for us, we can figure out what that means by thinking a bit more about the analogy of directions. Go back to the table, and look at the columns with your directions and your friend’s directions. Notice that, were I to first follow my directions and immediately after that follow my friend’s directions, I would end up right back where I started. This is why the idea of anti-directions made some kind of sense – they literally cancel out the normal directions.

Mathematically, the idea of cancelling out pops up all over the place – subtraction cancels addition and division cancels multiplication. If we want our idea of an antiderivative to make sense, then derivatives should cancel out an antiderivative. In other words, if I take the antiderivative of f(x) and land at the as-of-yet-unknown function F(x), and if I then take the derivative to get F^\prime(x), then everything should cancel out and I should end up back where I started.

This gives us a much clearer definition of the idea of antiderivatives:

Definition: F(x) is an antiderivative of f(x) if the equation F^\prime(x) = f(x) is true.

Very often, mathematicians use the rather strange looking symbol F(x) = \int f(x) dx to represent antiderivatives. There is a reason why we write it like this, but we haven’t yet gone far enough into the world of antiderivatives to explain why this makes sense. So, for now, take me at my word that this is a sensible way to abbreviate antiderivatives.

A Potential Problem with Antiderivatives

Our goal is to reverse the process of the derivative. If we were lucky, there would be exactly one way to do this, no matter where we start. Unfortunately, this is not quite true. This is because some functions have the same derivative, even though they are not the same function.

Take for example F(x) = x^2 + 3 and G(x) = x^2 + 7. Using the rules for taking derivatives, we can determine that F^\prime(x) = G^\prime(x) = 2x, and yet F(x) \not = G(x). This is a problem because, if I ask you to find the antiderivative of 2x, which we also write as \int 2x dx, how could I know whether the answer is F(x), or G(x), or maybe something else altogether?

This is a problem – but fortunately there is a solution. We know the solution because we know exactly how much ambiguity exists when trying to reverse the derivative. The following fact explains this:

Fact: If F(x), G(x) are both antiderivatives of a function f(x), then F(x) - G(x) is a constant.

Proof: Recall that F^\prime(x) = G^\prime(x) = f(x) by the definition of “antiderivative”. Using the equation F^\prime(x) = G^\prime(x), we can arrive at a new equation F^\prime(x) - G^\prime(x) = 0. Rewriting this slightly,

\left( F(x) - G(x) \right)^\prime = F^\prime(x) - G^\prime(x) = 0.

This means that the function F(x) - G(x) has a zero derivative. The only functions with zero derivatives – that is, functions that have a completely flat slope everywhere – are constant functions, whose graphs are horizontal lines. So, F(x) - G(x) must be constant.
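We can check the Fact numerically with the earlier example F(x) = x^2 + 3 and G(x) = x^2 + 7, using a simple difference-quotient approximation of the derivative (a sketch of my own, not part of the post's argument):

```python
# Both F and G are antiderivatives of 2x, and their difference is constant.
def F(x):
    return x ** 2 + 3

def G(x):
    return x ** 2 + 7

def deriv(fn, x, h=1e-6):
    # forward-difference approximation to the derivative
    return (fn(x + h) - fn(x)) / h

for x in [0.0, 1.0, 2.5]:
    assert abs(deriv(F, x) - 2 * x) < 1e-3   # F'(x) = 2x
    assert abs(deriv(G, x) - 2 * x) < 1e-3   # G'(x) = 2x
    assert F(x) - G(x) == -4                 # a constant, as the Fact says
print("F and G have the same derivative and differ by a constant")
```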

Ok, so now we know something. If I know that F(x) is an antiderivative of f(x), then so is F(x) + C for any possible constant number C that I might choose, but at least I now know that this is really a full list of the possibilities. So, we might say

\int f(x) dx = F(x) + C.

If we need a full formula for the antiderivative, then we would need some extra information in order to discover the value of C. When application problems are discussed, we will see how this is done. In the meantime, the important point is that, up to this possible +C, the antiderivative is a well-defined thing that makes sense.

How do we find a formula for this function F(x)? There are a variety of techniques mathematicians use to handle antiderivatives – but for the sake of understanding what these things are, it will suffice to see how it works in the simplest case we can work with. Later on, once we have a better grip on these things, we could learn how to do more complicated situations.

Main Example: Power Functions

The simplest type of function that we know of is the polynomial. And among polynomials, the simplest are the power functions x^n for n some whole number. For this situation, we will walk through the process of figuring out how to calculate F(x), where f(x) = x^n. Remember that taking the derivative would result in the calculation f^\prime(x) = n x^{n-1}. If you don’t remember why, that’s ok. The way we want to visualize this is as some sort of process which starts with x^n and ends with n x^{n-1}.

This is helpful because we know that, whatever antiderivatives do, it should be true that \int n x^{n-1} dx = x^n (there should be a +C, of course, but I will generally leave that off unless it is important). So, our question boils down to discovering some kind of rule that takes us from n x^{n-1} back to x^n.

Here, the idea of reversing the directions is helpful. Recall from the introduction that, if you are reversing directions on a map, you must do two things:

(1) Reverse the order of all the steps.

(2) Do the reverse of each step.

We might think of our effort to find an antiderivative like this as well. If you think about the process x^n \to n x^{n-1}, which represents the derivative, we really have “two steps” going on. First, you multiply the whole expression by the exponent, so x^n \to n x^n is the first step. The second step is to then subtract one from the exponent, so this would take n x^n \to n x^{n-1}. So, the whole process might be visualized by the two arrows:

x^n \to n x^n \to n x^{n-1}.

In order to figure out an antiderivative formula, we would like to run these arrows in the opposite direction. That is, we’d like

n x^{n-1} \to n x^n \to x^n.

Now, let’s think about what each of these arrows would mean. The arrow n x^{n-1} \to n x^n is meant to be the reverse of n x^n \to n x^{n-1}, which was to subtract 1 from the exponent. So, we should be adding one to the exponent instead. Similarly, the arrow n x^n \to x^n is the reverse of the arrow x^n \to n x^n, where the rule was to multiply by the exponent. So, to reverse this rule, we should instead divide by the exponent.

This now gives us a set of rules that, if we do them right, should result in an antiderivative recipe. Now, let’s try it out on x^n. The first step must be to add one to the exponent, and the second step is then to divide by the new exponent. If we do this process (for n x^{n-1} and x^n simultaneously, so the parallels can be seen more clearly) we arrive at

n x^{n-1} \to n x^n \to x^n,

x^n \to x^{n+1} \to \dfrac{1}{n+1} x^{n+1}.

So, hypothetically, the function F(x) = \dfrac{1}{n+1} x^{n+1} should be an antiderivative of f(x) = x^n. To double check that we are really correct, we can take the derivative directly:

F^\prime(x) = \dfrac{1}{n+1} \dfrac{d}{dx} \left[ x^{n+1} \right] = \dfrac{1}{n+1} \left[ (n+1) x^n \right] = x^n.

Alright – so we have our formula! You can do something similar for other types of functions we’ve run into – for example, \int e^x dx = e^x and \int \dfrac{1}{x} dx = \ln(x) (in this case, whenever the input is a positive number).
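If you'd like to double-check the rule "add one to the exponent, then divide by the new exponent" without any algebra, a short numerical sketch can compare a finite-difference slope of F(x) = x^{n+1}/(n+1) against f(x) = x^n (the step size h and the sample points are my own choices):

```python
# Numerically verify that F(x) = x**(n+1) / (n+1) is an antiderivative
# of f(x) = x**n, by comparing an approximate slope of F to f.

def f(x, n):
    return x**n

def F(x, n):
    return x**(n + 1) / (n + 1)

h = 1e-6
for n in range(1, 5):
    for x in [0.5, 1.0, 2.0]:
        slope = (F(x + h, n) - F(x - h, n)) / (2 * h)  # central difference for F'(x)
        assert abs(slope - f(x, n)) < 1e-5, (n, x, slope)

print("F'(x) matches x**n at all sample points")
```

Of course, the symbolic computation F^\prime(x) = x^n above already settles the matter; this is just a second, independent way to see it.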


In this post, we’ve focused on mapping out the beginnings of the idea of an antiderivative. As of yet, we haven’t discussed why this is so important. But now that we have the concept laid out, we can move on and next time discuss why antiderivatives matter so much. In doing so, we will connect the two most recent posts in the series – on antiderivatives and Riemann sums – in a way that is quite surprising.

The Natural Numbers and Zero (Types of Numbers #2)

I now begin the escapade into different sorts of numbers. This may seem a rather strange adventure to anyone who hasn’t listened to many professional mathematicians’ talks, or anyone who is too far removed from high school education. I would guess that most people reading this have some particular picture of number in mind – likely one of the kinds of number I will talk about in this series. And if each person thinks themselves to have the one correct notion of number, then each and every person is wrong. For there is not one correct notion of number in modern mathematics. There are several – it all depends on context. This will be explained as the series goes on.

The Historical Starting Point: Natural Numbers

What are natural numbers? If you know what whole numbers are, you already have the right start. If you don’t remember, rest assured: these are the most intuitive numbers for humans, because they are the numbers of counting. One, two, three, and so on – these are the natural numbers. Simple enough, isn’t it? So one would think.

And yet, as we shall see as we continue onward into ever deeper realms of number, the natural numbers only occasionally live up to their moniker of ‘natural’. Sometimes they really are the most sensible numbers to think about, but sometimes they are not. Consider the following quotation…

“God made the integers; all else is the work of man.” – Leopold Kronecker

This is rather strange. Integers are not natural numbers! In fact, as you may already know, the natural numbers make up just about half of the integers. That may seem rather strange – and this strangeness will be addressed later.

The main point is that, given such a famous quote from such a famous mathematician, it is surprising that I don’t begin with the integers as my object of study. If Kronecker, who is by far my superior in ability, took the integers as his starting point, why not adopt it as well? The reason is historical. For a great many people prior to Kronecker’s life, the “natural numbers” were far more ‘natural’ than the integers that Kronecker considered fundamental. It is a matter of mathematical history that it took quite a long time for the idea of “negative numbers” – which make up half of the integers – to show up on the scene as genuine numbers. Even zero, which is an integer but not negative, took some time to appear on the mathematical scene.

Given that both zero and negative numbers took a long time to show up in mathematics, why include zero in the very first discussion of types of number? That is certainly a valid question. I imagine that a professional historian of mathematics would likely introduce zero separately from the other numbers, because historically the gap between them is highly significant. Modern readers would likely be surprised by how long it took humans to come up with the idea of zero as a number. And yet, as a mathematician myself, I find it convenient to include zero in my first post about the nature of numbers, despite its historical oddities. There are, in the development of the integers, three stages – the positive integers, zero, and then the negative integers. From the modern perspective, it is easy to wrap zero in with the positive integers. It is, in my view, a remarkable accomplishment of the human intellect that zero no longer needs to be introduced as a topic in and of itself.

Anyways, on to a discussion of the so-called natural numbers.

The Positive Whole Numbers

The positive whole numbers are the basic numbers we use to count. These are 1, 2, 3, 4, and so on. The positive whole numbers are built upon the idea of “one after another” – there is the number 1, then the next number, then the next, and so on. As we shall see, this is actually not a concept that number systems always have. Even more special to the positive whole numbers is that there is a first positive whole number – the number 1. Fundamentally, the fact that the words first and next both make sense essentially defines what the positive whole numbers are. First means we have a starting point; next means we have a list, one item after another, running off to infinity.

With the positive whole numbers, there are two basic operations we can perform. We can add two positive whole numbers together to obtain a new positive whole number, and we can multiply two together to get a new positive whole number. In some cases, you can also subtract and divide, but this doesn’t always work. The reason is that, sometimes, the subtraction of two positive whole numbers cannot be a positive whole number, and whole numbers divided by whole numbers might not be whole numbers. The subtraction 1 - 2 and the division \frac{1}{2} would be examples of this problem.

The solution to these problems will come in the future, where we will expand our definition of number in order to solve these complications. But for now, as long as we live in the world of positive whole numbers, subtraction and division should be viewed with some degree of suspicion.

The Number Zero

Along with the positive integers, we have the number zero. Zero is not positive, but neither is it what we will eventually call ‘negative’. Zero is, so to speak, the number that represents nothing at all.

Now, the number zero has two very special properties that deserve a pause. The first is that x + 0 = x no matter what the number x is – adding zero changes nothing. This is part of what defines the number zero. The second – which turns out to follow precisely from the first – is that x * 0 = 0 no matter what x is. Perhaps you’ve never had this identity explained before, but the logic behind it is actually pretty easy to lay out. We start by noticing that 0 = 0 + 0. So, if we multiply both sides by x, then x * 0 = x * (0 + 0). We can then use the distributive law to see that x * 0 = (x*0) + (x*0). Subtracting x*0 from each side, we conclude that x*0 = 0.

We also know that division by zero is forbidden in all of mathematics – and it is the fact that x * 0 = 0 that explains why division by zero is not allowed. Since division doesn’t really fit into a discussion of whole numbers in a nice way, we will leave the problem of dividing by zero for later on, when we focus more on division. The curious reader, however, could quite possibly find the solution for themselves from the equation x*0 = 0.

A Note About Zero

To a modern reader, it might seem like zero comes along pretty naturally with the other whole numbers. This is not so. The discovery of the concept of zero is, in reality, one of the major turning points in the history of mathematics, and it took a long time. To demonstrate how difficult it was to conceptualize the number zero, it will suffice to note that when someone finally did come up with the idea of zero, it wasn’t even treated as a number yet. Zero was at first mainly a positional idea – this means zero was acceptable to use to tell apart 13 from 103 and 1003, but “0” didn’t have any meaning yet in and of itself. Treating zero as a number in its own right took quite a long time – and we should pause and appreciate the conceptual difficulty of actually treating “nothing” like a number.


The numbers 0, 1, 2, 3, … are the beginning of the journey through the many kinds of numbers that mathematicians think about. They have the advantage of being very simple to understand, but with that advantage comes the disadvantage that we can’t do as much with them – we can’t even really understand subtraction yet because subtractions like 1 - 2 do not land in the world of natural numbers, and division ends up having even worse problems. In order to solve these problems, we will have to expand our concept of number. As we do so, the numbers will become gradually more complicated, but we will gain the advantage of being able to do more and more with them. This is the dynamic that always exists within number systems – the “big” systems will be hard to use but very versatile, and the small ones are easy to use but very limited in their use. As this series continues, take time to notice this pattern.

Appendix: Peano’s Axioms of Arithmetic

Between roughly 1850 and 1950, the study of mathematical logic, or the foundations of mathematics, was in vogue. The goal of this study was to make extremely clear what exactly we mean by foundational concepts like numbers and groupings of numbers. Since the counting numbers are among the most basic concepts in mathematics, this was a natural place to begin the process of axiomatic systematization.

But what exactly is this process? I find it helpful to imagine the process of teaching mathematics – either to a person or to a computer. When you do this sort of teaching, there are two basic things you have to do – you have to teach someone what sorts of things you are working with, and you have to give them rules for how to work. For something like multiplication, you might be told that multiplication is like ‘repeated addition’, that you do multiplication with numbers, and then you are given some rules for how to multiply numbers together.

This is something like what axioms are. Axioms are meant to be a completely non-ambiguous set of rules for working within some framework. Because we want these rules to be non-ambiguous, many axioms (once understood) seem so obvious as to not warrant saying out loud. But this is exactly the point – we want to check our intuition by making everything as simple as possible and building up from the foundation.

Giuseppe Peano came up with the first well-known set of axioms that are designed to define the numbers 0, 1, 2, 3, 4, and so on – the natural numbers. Here are the axioms of so-called Peano arithmetic (as taken from Wikipedia, because I like how they are presented there):

  1. 0 is a natural number.
  2. For every natural number x, x = x.
  3. For all natural numbers x and y, if x = y, then y = x.
  4. For all natural numbers x, y, and z, if x = y and y = z, then x = z.
  5. For all a and b, if b is a natural number and a = b, then a is also a natural number.
  6. For every natural number n, S(n) is a natural number.
  7. For all natural numbers m and n, m = n if and only if S(m) = S(n).
  8. For every natural number n, S(n) = 0 is false.

These axioms may look a bit confusing. I’ll give a quick explanation of what they actually mean.

(1) This identifies the “first number” – which is zero.

(2-4) These axioms clarify the meaning of the equals sign =. (2) is called the reflexive property, (3) is called the symmetric property, and (4) is called the transitive property.

(5) This makes clear that things that are equal are either both natural numbers or both not natural numbers.

(6) The symbol S is often called the “successor”. You should think of S(n) as n+1. So, what (6) says is that if n is a natural number, then so is n+1.

(7) The equations m = n and m + 1 = n + 1 are essentially the same equation.

(8) The value n+1 is never equal to zero if n is a natural number (since negatives are not natural numbers).

Why axiomatize something as simple as natural numbers? Doesn’t this make things too complicated?

Well, yes and no. Professional mathematics went through a very important phase in which mathematicians focused for over 100 years on setting up the “foundations” of mathematics. One reason this is so important is that it helps us really understand which ideas about whole numbers are basic and which are actually built up out of more basic pieces.

Here is an example: it is pretty intuitively clear that m + n = n + m. However, notice that this is not actually stated anywhere in the 8 axioms. This is because you don’t actually have to – you can build up the proof of that equation from these 8 axioms along with a method of proof called induction (see my post here on that method). It is interesting that something as intuitively simple as m + n = n + m is not actually something we have to assume to be true in advance – we can build that truth out of still more simple truths. It is one of the many goals of modern mathematics to more fully understand what is and is not one of the ‘simple truths’ and what happens when we limit ourselves to smaller (or expand to a larger) set of ‘simple truths’.
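The successor-based picture can even be sketched in a few lines of Python: zero is a starting object, S wraps it to form the next number, and addition is defined by recursion on the successor, mirroring the axioms. (The tuple encoding of numerals is my own illustrative choice, not part of Peano's axioms.)

```python
# Peano-style naturals: ZERO is a starting object, S(n) is the successor,
# and addition is defined by recursion on the second argument:
#   add(m, 0)    = m
#   add(m, S(k)) = S(add(m, k))

ZERO = ()

def S(n):
    return (n,)  # successor: wrap n in one more layer

def add(m, n):
    if n == ZERO:
        return m            # m + 0 = m
    return S(add(m, n[0]))  # m + S(k) = S(m + k)

def to_int(n):
    """Unwrap a Peano numeral into an ordinary Python int, for display."""
    count = 0
    while n != ZERO:
        count, n = count + 1, n[0]
    return count

two = S(S(ZERO))
three = S(S(S(ZERO)))
print(to_int(add(two, three)))             # 5
print(add(two, three) == add(three, two))  # True: commutativity, on this instance
```

Notice that nothing in the definition of add mentions commutativity – the check at the end only confirms it for one pair of numbers; proving it for all numbers is exactly the job of induction.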

The Kalam Cosmological Argument: A Brief Overview

In a previous post, I discussed the intuition behind cosmological arguments and gave several examples. One of these examples was the Kalam cosmological argument. By way of overview, a cosmological argument is, broadly speaking, one that reasons from facts about the universe we observe, together with metaphysical principles, to argue that God, or at least a being very much like God, must exist. The Kalam cosmological argument relies on the nature of causes and beginnings, approached from both philosophical and physical angles. I would now like to expound my favorite formulation of the Kalam cosmological argument and provide a rough sketch of how to justify the key steps in the argument.

The Argument

The Kalam cosmological argument comes in two stages. Stage one is a deductive argument in the modus ponens form:

  1. Whatever begins to exist has a cause.
  2. The universe began to exist.
  3. Therefore, the universe has a cause.

To be clear on definitions, when we say universe, we mean something like “everything that consists of space, time, matter, or energy” or “everything that can be described in some way by physics.” So if you think there is a multiverse out there, that counts as “the universe” here. We could also replace (1) with “If the universe began to exist, then the universe has a cause” since that would be sufficient to imply (3), but there are compelling reasons to believe (1) in this form, and so we will stick with this. Finally, by “cause” we mean what Aristotle called an efficient cause. An efficient cause is roughly the source of an event. For example, if you have a painting, there is one sense in which the particles in the paint can be said to cause the painting. But this is not what we mean by cause here. The efficient cause of the painting is the painter. The efficient cause of a newborn baby is the baby’s parents, not the atoms that make up its body. The efficient cause of a book is the author, not the paper it is printed on. This is the sort of cause-effect relationship we are talking about here.

Now, if stage one is successful, we know that the universe has a cause. We then enter stage two – understanding what this cause is like. It might seem like we can’t find out anything, but knowing that something is the cause of the universe may imply certain features about that being. For example, the cause of a particular book must know how to write the language the book was written in, and based on experience we know that the cause is probably an adult human being. We can also learn more about the author from the contents of the book – if the book is about science, then maybe the author is a scientist. This sort of reasoning will make up the second stage of the Kalam cosmological argument.

Stage 1: The Deductive Portion

The first stage of the Kalam cosmological argument involves the deductive steps listed above. Premise 1 says that whatever begins to exist has a cause, Premise 2 then tells us that the universe began to exist, and so we conclude that the universe must have had a cause. In order to check the correctness of the first stage, we should convince ourselves of whether or not the two premises are true, and whether the conclusion logically follows from the premises. These three aspects we need to check are discussed in order.

Premise 1: Whatever Begins to Exist Has a Cause

This premise is the less controversial of the two on the level of philosophy. There is a wide variety of evidence for the truth of the statement “Whatever begins to exist has a cause.” We now consider a few pieces of it.

Evidence 1: Our common experience always verifies and never contradicts this statement. If we see a chair, well we know someone made that chair. If we see a person, we know that person has parents. If we see writing on a wall, we know someone wrote it. When we see a social movement begin, we know that certain people began it. Things don’t start for no reason in our common experience – when something starts, there is always a reason why.

Evidence 2: The scientific method presupposes the truth of this. In particular, when we do science, we assume that events we can observe actually do have explanations for why and how they happen. Science is the search for those explanations. By assuming in advance that there is such an explanation, we are assuming that things that begin have causes. When the apple falls from the tree, the beginning of the fall, and in fact the entire process of the fall, must have some sort of cause – which we now call gravity. This is again and again the approach of science.

Evidence 3: Suppose this is actually false – that would mean that certain things can spring into existence for absolutely no reason. If that is so, there is nothing to prevent this from happening all the time, everywhere, with any sort of object. If certain things can arise from absolutely nothing, then it seems like anything and everything could arise this way – after all, nothingness can’t tell the difference! So if this really were possible, why don’t we see horses and villages and basketballs constantly appearing in our homes and workplaces from nothing? The obvious answer would be that things always begin to exist for some reason, and there is no reason or cause why a horse should appear from the void next to you while you read this.

In summary, we find that universal human experience constantly affirms Premise 1. If Premise 1 were false, lots of strange things should be expected to happen and we would have to throw out the entire scientific enterprise. So, it doesn’t make a lot of sense to deny Premise 1.

Premise 2: The Universe Began to Exist

This is more controversial. How could we possibly know that the universe began to exist? We run the danger of potentially begging the question here – we can’t appeal to the Bible or Qur’an for instance because the truth of those accounts would rely on God’s existence, which is the truth we are trying to establish. So, we can’t rely on that sort of argument. How, then, might we argue for this perhaps very odd sounding idea that the universe is only finitely old?

There are in fact several ways we can do this. I will go through six – three that come from the area of philosophy called metaphysics, and three that come from contemporary science. The three metaphysical arguments all attempt to show that the idea of an infinitely long past doesn’t make sense given other things we know. The scientific points reason from a scientific fact we know to the conclusion that the universe must have had a beginning in time.

Evidence 1: In mathematics, the abstract concept of a really infinite number of things is well-understood. We know what it would be like for a really infinite number of things to exist – and it completely confounds all intuition. The thought experiment called Hilbert’s Hotel illustrates very well the problems that arise. In later discussions about the Kalam, I will go more deeply into this paradox, but for now let’s look at just one problem we run into. Imagine that a hotel exists with infinitely many rooms with room numbers 1, 2, 3, 4, and so on, and that this hotel is completely full at the moment. If a new guest shows up, then the hotel can actually still accommodate that guest, even though the hotel is full! For if we move the person in room 1 to room 2, the person in room 2 to room 3, and so on forever, then nobody was kicked out of the hotel, and yet room 1 is now available! This is a rather odd consequence which defies the way we know the physical world ought to work. So, it would be entirely reasonable to say this kind of situation cannot happen.
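The room-shifting trick can be illustrated with a short Python sketch. We can only ever inspect finitely many rooms, so this checks the pattern on a sample window of the hotel rather than the whole infinite thing (the window size 1000 is an arbitrary choice of mine):

```python
# Hilbert's Hotel room shift: every guest in room n moves to room n + 1.
# No one is evicted (the map is one-to-one), yet room 1 becomes free.

def new_room(n):
    return n + 1

rooms = range(1, 1001)  # rooms 1..1000: a finite window into the infinite hotel
assignments = [new_room(n) for n in rooms]

assert len(set(assignments)) == len(assignments)  # no two guests collide
assert 1 not in assignments                       # room 1 is now unoccupied
print("room 1 freed without evicting anyone (on this sample)")
```

In a finite hotel this trick fails – the guest in the last room would have nowhere to go – which is precisely why the result feels so paradoxical for an actually infinite hotel.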

Because really existing infinities don’t make any sense, and an infinite past would be a really existing infinity, an infinite past makes no sense.

Evidence 2: Maybe you are willing to bite the bullet on the oddities that arise from actually existing infinities. But there are additional problems, since an infinite past is more than just an actually existing infinity – it is an actually infinite sequence in time. In order for the past to be infinite, that would have to mean that, by traversing from one moment to the next, we have crossed over an infinite amount of time. But this is extremely problematic – it is like counting to infinity. How could you ever finish? To see the problem, imagine that a man has existed forever and has always been counting down towards zero. You walk up to him today, and find him counting 5, 4, 3, 2, 1, 0. Finally, he is done with his count.

But hold on – why did he finish today? Why not yesterday? After all, an infinite amount of time had already passed yesterday, and an infinite amount of time is enough to count down from infinity. Why did he not finish two thousand years ago? Two million years ago? The same reasoning still applies. It appears to be completely inexplicable why our infinitely old man should finish his count on any particular day. But, as we’ve already discussed, there should really be a reason why he finishes on one day rather than another. Therefore, it makes the most sense to say that, even if a real infinite can exist, you can’t build one by successively adding one thing after another to it. Since an infinite past would be infinite in virtue of moments being added to time one after another, this rules out an infinite past.

Evidence 3: This is similar to, but not quite the same as, the second piece of evidence. The previous piece of evidence states that you can’t arrive at infinity by proceeding one step at a time, so to speak. This piece of evidence looks at the backwards direction instead of the forwards direction. It is quite commonly believed that infinite regresses are bad. If you start asking “Why?” over and over again, you should eventually reach a sort of bedrock answer that doesn’t require any further explanation. But if this is so, then we can ask “Why?” successively about the status of the universe – and if such sequences of cause-and-effect questions must always reach an end (so that we don’t have an infinite regress), then the events in the universe must all trace back to a single point – the beginning of the universe.

We have now covered some philosophical evidence that points out major conceptual difficulties in the idea of an infinite past. We now move forward to scientific difficulties with the idea of an infinite past.

Evidence 4: Edwin Hubble is famous for discovering what is called Hubble’s Law, which says that galaxies are moving away from our galaxy at rates proportional to their distance from us. Hubble was able to reach this conclusion by looking at the red shift of these galaxies, which is a measure of how the speed of those galaxies relative to ours slightly stretches out the light waves via the Doppler effect (think of the sound of a fast-moving car before and after it passes by you). The explanation for this observation is that the universe is expanding in all directions, analogously to a balloon being inflated. Physicists have since accumulated a body of evidence, including theorems by such esteemed physicists as Stephen Hawking and Roger Penrose, that lead very directly to the conclusion that a finite amount of time ago, the universe was infinitely small and sprang into existence in “the Big Bang.” This, of course, would mean that the universe as we know it began to exist. Even if you hold to an inflationary multiverse theory, as many do, a theorem by Vilenkin and others effectively rules out an infinite past for multiverses as well.

Evidence 5: The Big Bang theory predicts that the universe started off in a fireball. If this were so, there should be some leftover radiation from that fireball spread more or less uniformly throughout the entire universe. This leftover radiation, now called the cosmic background radiation, has been discovered and measured to high degrees of accuracy and found to match exactly the standard Big Bang theory, which of course tells us that the universe began to exist.

Evidence 6: One of the most foundational and best established scientific laws of all time is the Second Law of Thermodynamics. Intuitively, the second law tells us that over time, disorder increases. This is exemplified in the wear-and-tear on human structures as well as erosion in natural structures, as well as the aging process for all animals. Things break down over time – this is the second law. Now, if the universe has existed for infinitely long, then shouldn’t there be, so to speak, as much disorder as possible? An infinite amount of time should, if the second law is true, destroy all order and leave the entire universe in a state of complete disorder. We are obviously not in that condition at the moment – a state of complete disorder is embodied by the emptiness of outer space, and there is certainly a lot of order and organization of the universe into galaxies, solar systems, and so on – all of which don’t exist in a totally disordered condition. Therefore, the second law of thermodynamics would then predict that the reason things like planets, stars, and human beings still exist is because there has only been a finite amount of time for the universe to run down, and there is still enough order left for us to exist.

By way of summary, we have the following six pieces of evidence for Premise 2:

  • 1: It doesn’t seem like an infinity of things can really exist.
  • 2: You can’t actually form an infinite by adding on things one after another.
  • 3: Infinite regresses of cause and effect are impossible.
  • 4: The expansion of the universe points to a Big Bang event.
  • 5: The cosmic background radiation points to a primeval fireball, and thus to a Big-Bang event.
  • 6: Matter always becomes disordered over time, and since we are not yet at maximum disorder, it must be that the universe has only had a finite amount of time to move from order to disorder.

To me, this is a strong case – at least strong enough to suspect that probably the universe began to exist. Perhaps you don’t think this is totally conclusive – but remember, it doesn’t have to be 100% beyond-any-possible-doubt proof in order to be accepted. If the evidence is convincing enough to conclude that it is more likely than not Premise 2 is true, then it is completely reasonable then to accept Premise 2 as true. You might, of course, revise that decision later or come to believe it even more firmly as you obtain more and more evidence – that is to be expected. But with this body of evidence, I think it is quite reasonable to conclude that the universe did, in fact, begin to exist.

A Brief Remark: Note that I am not here trying to prove anything, say, about the book of Genesis, or the creation accounts in the Qur’an, or any other religious tradition. The claim that the universe began to exist can be evaluated independently of any religious tradition, and that is what we have done here. Whether or not Genesis or the Qur’an tell the right story would be a completely separate discussion.

Premise 3: The Universe Has a Cause

There isn’t much to this point. The final statement is a deductive conclusion from the first two premises via the logical rule of modus ponens. This rule states that if we know “if A, then B” and we also know A, then we know B. There may be some confusion in seeing that Premise 1 in our argument is actually of the form “if A, then B”. If this is difficult, then rephrase Premise 1 as “if a thing begins to exist, then that thing has a cause.” If we know Premise 2, that the universe began to exist, then the universe is one of the things that Premise 1 tells us must have a cause. This makes the Kalam cosmological argument deductively valid.

Therefore, if we know both Premise 1 and Premise 2, we must accept that the universe has a cause.

Stage 2: The Role of God

But what sort of cause does the universe have? This is the goal of stage 2 of the Kalam cosmological argument. If we know that the universe must have had a cause, what sort of thing could have caused the whole physical universe to spring into existence? Perhaps the idea of “a cause of the universe” seems rather abstruse and hard to grasp, but we can actually do some conceptual analysis and begin to get a grip on what such a cause must be.

How is this so? Well, to begin, recall what we mean by the universe – everything that depends on time, space, matter, and energy for its existence. This means that whatever our cause of the universe is, that being must have caused everything inside spacetime to exist. This means this being itself cannot be made up of that sort of thing – the source of all space, for example, cannot be spatial. That would mean that space is the source of all space, which would mean that space existed before space… which doesn’t make a lot of sense. So it seems only logical to conclude that whatever created the universe must exist beyond space.

For a similar reason, this being must exist beyond time (or in the very least must exist beyond time “prior” to creating time). This is for the same reason as before – the source of all time cannot be temporal, for there wasn’t any time yet, so to speak. So, our cause must be timeless. For again the same reason, our cause must be immaterial – the cause of all material cannot be material itself.

Another rather straightforward point is that whatever caused the universe must have enough power to cause universes. I think it is rather obvious that this is quite a lot of power – certainly no living human being has that sort of power – and so I think it is fair to say that the cause of the universe must be extraordinarily powerful.

So far, we’ve arrived at a “cause of the universe” that exists beyond all space, time, and matter, and possesses an almost incomprehensible amount of power. That sounds somewhat like the idea of God – after all, God is understood by all major religions to exhibit such qualities. But it would be quite a fair point to say that this is not yet a convincing argument – because an extremely important idea about God is yet to be seen. God, as viewed by all major religious groups I know of, is personal – for our purposes, by that I mean that God has the ability to choose one thing over another. For the sake of simplicity, let’s call all of this free will – after all, that’s mostly how we think of free will: as the ability to choose.

Much can be said for why the cause must be personal. By way of summary, I will make only one point. We already know the cause of the universe has always existed. Now, if a mechanistic process has existed forever, you’d expect the product of that mechanistic process to also exist forever. Since the universe hasn’t existed forever, but the cause has existed forever, the cause of the universe can’t be mechanistic. When we then slow down and think about what kinds of cause-and-effect processes exist that are not mechanistic and do not depend on a preexisting material universe (since, remember, the universe does not create itself), the only solution I’ve ever heard that works is that the cause must be the free will of an immaterial mind. Free will clearly allows for something that has always existed to produce an effect that begins to exist. So, it makes a lot of sense that the cause of the universe can truly be said to be a Creator of the universe.


A lot of details have been skimmed over, but this ought to serve as a good jumping off point for thinking about the Kalam cosmological argument in greater detail. In future posts on the blog, I will go into the various points of the argument in much greater detail and provide lots of academic resources from both sides of the discussion on this very interesting argument. But for now, I hope I have given a general idea of why this argument has caught the attention and interest of so many people.

Riemann Sums and Areas (Explaining Calculus #14)

At this point in calculus, we are taking what appears to be a sharp turn in a direction completely away from what we’ve been doing so far. We now turn to talking about areas.

The Problem

Geometry is one of the most important branches of all mathematics, both theoretical and applied. When middle or high school students take classes in geometry, one of the most important focus points in those classes is calculating areas and volumes. In geometry class, the first thing you will learn is how to find the area of a triangle and of a rectangle. After that, you’ll probably learn how to find the area of a circle. Everything else that has an easy area formula is some sort of shape that you can break down into pieces built from triangles and rectangles.

But what about other areas? What if we want to know the area underneath, say, a parabola? What about other more complicated shapes? How might we calculate complicated areas?

The Idea of the Solution

The idea behind solving the problem of calculating complicated areas is called the Riemann sum, named after Bernhard Riemann, one of the most important mathematicians perhaps of all time, certainly of the last two hundred years. The general strategy is one that is employed many, many times in many, many different areas of mathematics. The strategy basically goes as follows.

  1. Find something that we understand well that can approximate something we don’t understand well.
  2. Use that simpler thing to approximate the harder thing.
  3. Using limits, make the approximation “infinitely close”.
  4. Figure out what this “infinitely close approximation” really is.

Step 4 in this process will not be addressed in this post – it will come later. Here, we will address Steps 1-3 in two main stages. Stage 1 will deal with Steps 1 and 2, learning how to estimate areas of complicated shapes using much simpler shapes (namely, rectangles). In Stage 2, we will explain how to apply a limit to this approximation idea, and why the approximation really does become “infinitely close”.

Stage 1: Estimation

We have set out on a task of estimating the area of a complicated region. How might we do this? Well, if we are taking the strategy seriously, we should focus on using a very simple area. We might as well start with the simplest area formula of all – the area of a rectangle, base times height. Now, if we had a weirdly-shaped region, could we use rectangles to get an area that is approximately right? If you are reading this and haven’t heard of this concept before, try it out for yourself. Draw a circle on a piece of paper, and try to use some rectangles to get a feel for approximating the area of that circle. Or you could draw any 2-dimensional region – try using rectangles to approximate that region.

You may have come up with lots of different approaches – and that’s good! That is exploration. Here is what mathematicians have settled on as their standard method of approximating the area under the graph of a function, say y = f(x):

A visual approximation of areas underneath a graph.

Notice the method – we’ve made the base of each rectangle lie along the x-axis, and all the rectangles have bases of the same size. This makes it very easy to set up the bases of the rectangles. The disadvantage is that it isn’t exactly clear how to set up the height of each rectangle. In this picture, the height of each rectangle is the height of the function f(x) at the center of its base. To see this, draw a vertical line down the middle of each rectangle. You will see that your vertical line touches the red curve at the top (or bottom) of the rectangle.

With such a choice of the height of the rectangle, we can pretty easily set out the formula for the area of each rectangle. If the rectangle has corners at x_1, x_2 on the x-axis, then the formula for the area of this rectangle is

\text{Area} = \text{Base} \times \text{Height} = (x_2 - x_1) \times f\left( \dfrac{x_1 + x_2}{2}\right).

If we want to find the area under f(x) between, say, x = -0.5 and x = 0.5, then we could lay out a number of rectangles in this fashion between these two x-values and add up all their areas. In the image above, this has been done with 10 rectangles. The total area of the 10 rectangles comes out to 0.187358, while the “actual area” (which we don’t yet know how to calculate for ourselves) is about 0.193198. These two numbers are pretty close together – so our approximation does seem to be working. You could convince yourself visually that, if we had used 100 rectangles instead of 10, our approximation would probably have been a great deal closer. Then, you’d expect, the more rectangles we use, the closer to the right answer our estimate should be.
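If you’d like to experiment with this yourself, here is a minimal Python sketch of the midpoint-rectangle estimate. The post doesn’t specify its function f, so I use f(x) = x² on [0, 1], whose exact area is 1/3, purely as a stand-in:

```python
def midpoint_riemann(f, a, b, n):
    """Estimate the area under f on [a, b] with n equal-width rectangles,
    each rectangle's height taken from f at the midpoint of its base."""
    width = (b - a) / n               # every base has the same size
    total = 0.0
    for i in range(n):
        midpoint = a + (i + 0.5) * width
        total += f(midpoint) * width  # base times height
    return total

square = lambda x: x ** 2  # stand-in function; exact area on [0, 1] is 1/3
for n in (10, 100, 1000):
    print(n, midpoint_riemann(square, 0.0, 1.0, n))
# the estimates creep closer to 1/3 as n grows
```

Running this with larger and larger n shows exactly the trend described above: more rectangles, better estimate.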

This is exactly the idea that calculus allows us to take advantage of.

Stage 2: Limits

In calculus, we have the tool called the limit, which allows us to take a variable we care about and allow that variable to “approach” some limiting point without ever having to be literally equal to that limiting point. Although this hasn’t been used in quite this way yet, we can use the concept of a limit to allow a variable to approach an infinite value. When we understand how this works, we can use this idea of “limits to \infty” to calculate exact areas.

Up to this point, we have mainly worked with limits going towards finite numbers. For example, the definition of the derivative f^\prime(x) is written as

f^\prime(x) = \lim\limits_{h \to 0} \dfrac{f(x+h) - f(x)}{h}.

This expression has a limit as h approaches zero. What this means is we should think about the trend of the value \dfrac{f(x+h) - f(x)}{h} for values of h like h = 1, 0.1, 0.01, 0.001, 0.0001, \dots. The value of the limit is equal to whatever number these values are trending towards. For example, if the list of numbers we got was 2, 1.5, 1.1, 1.01, 1.001, 1.0000001, \dots, then the limit would end up being exactly equal to 1.
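This numerical way of reading a limit is easy to try on a computer. The sketch below tabulates the difference quotient for f(x) = x² at x = 1 (my own illustrative choices, not from the text); the quotients trend towards 2, the derivative of x² at that point:

```python
def difference_quotient(f, x, h):
    """The quantity (f(x + h) - f(x)) / h whose trend defines f'(x)."""
    return (f(x + h) - f(x)) / h

square = lambda x: x ** 2
for h in (1.0, 0.1, 0.01, 0.001, 0.0001):
    print(h, difference_quotient(square, 1.0, h))
# the quotients trend towards 2, the derivative of x^2 at x = 1
```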

So this is how limits to specific numbers work. But what about limits going to \infty? You can’t quite use the same idea, since you can’t really get “closer” to \infty in the way we can with a number like zero – all numbers are “infinitely far away” from \infty. The way we handle this is, instead of looking at trends as we let input values run through a list like 1, 0.1, 0.01, 0.001, and so on, we look at the trend as we let our input values run through a list like 1, 10, 100, 1000, 10000, 100000, and so on. To be a bit more specific, when we write an expression like

\lim\limits_{x \to \infty} f(x),

what we mean is “what is the value towards which f(x) is trending as we choose super-huge values of x?”
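A quick numerical experiment makes this concrete. For an illustrative function of my own choosing, f(x) = (2x + 1)/x = 2 + 1/x, plugging in super-huge values of x makes the outputs settle towards 2 – so the limit as x goes to \infty is 2:

```python
def f(x):
    # illustrative function: (2x + 1) / x = 2 + 1/x, which settles towards 2
    return (2 * x + 1) / x

for x in (1, 10, 100, 1000, 10000, 100000):
    print(x, f(x))  # the outputs trend towards the limit 2
```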

Ok, so we have this general notion of a limit going towards infinity. What is it good for? Well, remember that in our discussion about rectangles, we noticed that as we allow ourselves more and more rectangles in our estimation, our values get closer and closer to correct. Using the idea of a limit going towards infinity, what if we allowed the number of rectangles we are using to approach infinity? This is the idea that we now call the Riemann sum.

To write everything down explicitly, let’s say we have a graph of a function f(x) between the points x = a and x = b on the x-axis. Let’s say we are using n rectangles to approximate the area. Then the formula for what is called the Riemann sum is often written down as

\lim\limits_{n \to \infty} \sum_{i=1}^n f(x_i^*) \Delta x.

Now, this looks a bit intimidating. To make it more manageable, let’s break down the meaning of each symbol one by one. The \lim\limits_{n \to \infty} tells us we are using the limit going to infinity. The presence of the letter n in this limit, which we have defined as the number of rectangles in our approximation, means that it is that number we are allowing to become super-huge. The symbol \sum_{i=1}^n just means “add together a bunch of stuff”. The letter i is a kind of placeholder that keeps track of which term we are at in our sum, while the 1 and n symbolize the starting and ending points of our sum. So, for instance,

\sum_{i=1}^n x_i = x_1 + x_2 + x_3 + \dots + x_n.

Thus, this \sum_{i=1}^n symbol is telling us that we are about to add together a bunch of areas of rectangles. This tells us that f(x_i^*) \Delta x must be the area of a rectangle. The symbol \Delta x is a shorthand for “the base of the rectangle”, the point x_i^* is the point that we are using to help us find the height of the rectangle (which was the midpoint in the example I gave before), and so f(x_i^*) is the height of the rectangle.
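To see the whole formula in action, here is a short Python sketch that evaluates the sum \sum_{i=1}^n f(x_i^*) \Delta x with midpoints for x_i^*, applied to an example of my own choosing: the upper half of the unit circle, whose exact area is \pi/2 \approx 1.5708. As n grows, the sum creeps towards that value – which is exactly the limit the notation describes:

```python
import math

def riemann_sum(f, a, b, n):
    """Sum of f(x_i*) * dx over n equal-width rectangles on [a, b],
    with each sample point x_i* taken at the midpoint of its base."""
    dx = (b - a) / n
    return sum(f(a + (i - 0.5) * dx) * dx for i in range(1, n + 1))

# Upper half of the unit circle: its exact area is pi/2.
semicircle = lambda x: math.sqrt(1.0 - x * x)
for n in (10, 100, 10000):
    print(n, riemann_sum(semicircle, -1.0, 1.0, n))
# the sums approach math.pi / 2 as n grows
```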

We therefore have this method for calculating areas of strange shapes exactly. The only problem is… how in the world do we evaluate such an odd looking expression? It isn’t at all obvious how to do this. If we could find a way, that would be super helpful – after all, knowing areas of weirdly shaped regions could be very helpful for all kinds of problems in real life. But we don’t know how to do that… yet. We will soon discover that a concept closely related to the derivative will come to the rescue and deliver to us an answer to our question about areas.


I’ve spent so much time talking about derivatives, and yet it will be several posts before they come up directly again. This may be confusing, because we will now be talking about areas – which is quite a different thing from slopes and tangent lines. And yet, a few posts from now, we will see how important derivatives are in dealing with areas. There is, in fact, possibly a visual connection you can already make between the two if you think about it for long enough. For the curious reader, see if you can think of a connection.

What are Cosmological Arguments?

Cosmological arguments are probably the most discussed and most intriguing area of natural theology. While there are a great variety of areas of philosophy that point towards God’s existence, we find that, for whatever reason, the realm of cosmological arguments draws the most interest from both defenders and opponents of the arguments. Since these arguments are so interesting, it is worthwhile to lay out in detail what exactly these arguments are and what kinds of evidence and argument they draw on.

What Are Cosmological Arguments?

The idea behind cosmological arguments is well-expressed in the book of Romans:

“For since the creation of the world God’s invisible qualities—his eternal power and divine nature—have been clearly seen, being understood from what has been made, so that people are without excuse.” – Romans 1:20, NIV

The key point here is the phrase ‘being understood from what has been made’. The apostle Paul is making the point that by observing the created world, there is a very real sense in which you ought to recognize the handiwork of God.

To see this point more clearly, consider an analogy. Look at the painting below:

“Starry Night,” by Vincent van Gogh, 1889.

Recognize this? This is the famous Starry Night, which we know was painted by Vincent van Gogh. Now, look at the next painting:

“Wheatfield with Crows,” by Vincent van Gogh, 1890.

Now, I think it fairly likely you haven’t seen this painting before. But if you already know Vincent van Gogh from his painting Starry Night, I think you can instantly see that this too is a painting by the same man. You can see the artist in the painting. Van Gogh has a unique style, a style that is instantly recognizable. Even if you didn’t know who he was, you could probably put two paintings of his side by side and immediately know that the same person painted both.

This is the idea Paul is conveying about God. As God is the Creator of the universe, we can see many of God’s qualities by observing the things He has made – just like we can see the qualities of an artist by the paintings they produce. And this is the basic idea behind cosmological arguments. Cosmological arguments are arguments that infer from observations we can make about the structure of the world around us that there must be a being very much like God behind it all.

So, what do these look like? There are several very different types of examples. I’ll go over a few in the rest of the article. Even if you don’t find them convincing, I hope this overview will help convey the very sensible intuitions that lie behind these arguments.

Kalam Cosmological Argument

The atheist philosopher Quentin Smith, writing in The Cambridge Companion to Atheism (2006), observed about what is now called the Kalam cosmological argument (put into its current form by philosopher William Lane Craig):

“A count of the articles in philosophy journals shows that more articles have been published about Craig’s defense of the Kalam argument than have been published about any other philosopher’s contemporary formulation of an argument for God’s existence. Surprisingly, this even holds for Plantinga’s ontological argument and Plantinga’s argument that theism is a rationally acceptable basic belief. The fact that theists and atheists alike “cannot leave Craig’s Kalam argument alone” suggests that it may be an argument of unusual philosophical interest or else has an attractive core of plausibility that keeps philosophers turning back to it and examining it again.”

So this is quite the famous debate! What, then, is the debate about?

The Kalam cosmological argument is named after medieval Islamic theology, within which the argument was developed to a high degree of philosophical sophistication well before the modern era. The argument has a shockingly simple form:

  1. Whatever begins to exist has a cause.
  2. The universe began to exist.
  3. Therefore, the universe has a cause.

How much simpler can it get! Well, the first premise essentially points out that, as the entirety of science and philosophy shows us again and again, the events we observe are caused by something else (even if, as in quantum mechanics, you can’t fully determine what that cause is). If everything in our experience only starts existing when something causes it to exist – and indeed this is a fundamental principle in science – then what of the universe? Science appears to point us in the direction of the universe having an absolute beginning in time – the Big Bang. There are also independent philosophical reasons, grounded in the paradoxical nature of past infinite timelines, to think that time had a beginning – but the point is still the same: it really looks like the universe began. If the first premise is right, this means the universe must have a cause.

The first objection might be that this doesn’t seem to say anything about God… but slow down and think. The conclusion of the argument is that the entire cosmos has a cause outside of itself. When you think about what it means to be a Cause of the universe, you end up with a concept that looks a lot like God – such a Cause would need, for example, to exist beyond space and time and yet have such immense power that it can cause time itself to come into existence. That sounds a lot like the divine attribute of omnipotence. There are more sophisticated conceptual arguments showing that this Cause must look even more like a monotheistic God, but going in depth into those would take us too far afield.

Discussing the Kalam argument leads to lots of interesting conversations about the nature of time, science, and the relationship between cause and effect. Lots of interesting stuff to talk about!

The Leibnizian Contingency Argument

The Leibnizian cosmological argument is, like the Kalam and other cosmological arguments, based on observations about the universe in which we live. But unlike the Kalam, this cosmological argument – put forward famously by the co-inventor of calculus Gottfried Wilhelm Leibniz – has as its starting point the distinction between contingent existence and necessary existence. For example, I exist contingently because I could well have not existed. It is entirely possible that my parents never met or never even existed – and so I don’t have to exist. I am contingent. A being that exists necessarily would be the opposite. A thing that exists necessarily (if there are any such things) would be a thing that must exist – it cannot not exist.

Leibniz’ basic insight is that if something exists contingently (if it didn’t have to exist), then there should be some reason why it exists rather than not. The wide variety of ways to express this idea go under the title of the Principle of Sufficient Reason. Leibniz, starting from this idea, observes that the universe appears to be a contingent thing. Saying that the universe could only exist exactly the way it does is very strange – even saying that there must be a universe at all seems rather strange. Leibniz reasons, then, that for a contingent universe there should be a reason outside of itself for why it exists. This explanation, Leibniz would say, is God, who exists necessarily.

From the Stanford Encyclopedia of Philosophy article on this version of cosmological arguments, we find the following presented as one form the argument might take:

  1. If it is possible that it is necessary that a supernatural being of some sort exists, then it is necessary that a supernatural being of that sort exists.
  2. It is possible that it is necessary that a supernatural being of some sort exists.
  3. Therefore, it is necessary that this being exists.

Here is a different way of phrasing the same idea, the phrasing used by William Lane Craig:

1. Anything that exists has an explanation of its existence (either in the necessity of its own nature or in an external cause).
2. If the universe has an explanation of its existence, that explanation is God.
3. The universe exists.
4. Therefore, the universe has an explanation of its existence. (from 1, 3)
5. Therefore, the explanation of the existence of the universe is God. (from 2, 4)

Thomistic Cosmological Argument

There are, again, many variations on the idea behind Thomistic cosmological arguments. The general idea goes back to Aquinas’ Five Ways – the five major proofs of God’s existence that the great philosopher and theologian Thomas Aquinas offered. Several of them fall into the category of cosmological argument, but they take a different flavor than the Kalam or Leibnizian arguments. The basic idea of a Thomistic argument is something like this:

“Change is always the effect of a cause. It would stand to reason then that, on the basic principle that all effects have causes, all change has a cause. Since the idea of an infinite regress of causes is senseless, there must be an Uncaused First Cause that is the sufficient reason for why changing things exist at all.”

Aquinas is a bit hard to read, as he uses very sophisticated and hard-to-penetrate philosophical jargon. So instead of presenting one of Aquinas’ formulations and his defense of it, I’ll present an argument based on the nature of cause and effect put forward by the modern philosopher Alexander Pruss in his book Infinity, Causation, and Paradox, which I have been reading – a deeply fascinating book on the philosophy of mathematics, infinities, and causation. Here is how Pruss formulates his argument.

  1. Nothing has an infinite causal history.
  2. There are no causal loops.
  3. Something has a cause.
  4. Therefore, there is an uncaused cause.

What is Pruss’ idea here? For context, this argument is written down in Chapter 9 of his book. What is he doing in the previous eight chapters? He is explaining, analyzing, and defending a position he calls causal finitism. Causal finitism, as Pruss defines it, is the view that if I trace the causes of some event back in time, the total list, which he calls the causal history of the event, is never infinitely long. Pruss reasons that there are only three possible causal histories an event might have – a loop, an infinite backward chain, or a finite backward chain. Pruss argues extensively that there are no infinite backward chains or loops. To understand these arguments, consider the example of loops. An example of a causal loop would be a statement like

X was caused by Y, which was caused by Z, which was caused by X.

If you take things like this seriously, you end up with insane situations like a human being their own great-grandparent – which surely is not possible. So, since all causal loops involve something like this, there are no causal loops. The arguments Pruss presents against infinite histories are more technical and would take much longer to explain – but Pruss puts forward compelling paradoxes and outright contradictions that show up in those situations as well. Therefore, Pruss reasons, the only type of causal history that is valid is the finite causal history. This reasoning explains (1) and (2).

Now, certainly the idea of cause and effect is real – so (3) is true. Since all causal chains must come to an end, and since there are such things as causal chains, there exists an end to all causal chains. The end of the chain has to be uncaused – otherwise the chain wouldn’t be at its end yet. Therefore, an uncaused cause exists – and we could argue separately that God seems to be by far the best candidate for what an Uncaused Cause would be like.


Here, I’ve laid out some of the main categories of cosmological argument. To see a much more extensive list of cosmological arguments, check out this video, which goes over 20 different cosmological arguments that are discussed to varying degrees in the philosophy of religion today.